MPIEXEC code saturne on multiple nodes

Questions and remarks about code_saturne usage
zmchen
Posts: 2
Joined: Mon Feb 06, 2023 3:14 pm

MPIEXEC code saturne on multiple nodes

Post by zmchen »

I am having trouble running code_saturne 7.2.1 with mpiexec across multiple nodes on the Hartree HPC. I can run with mpiexec on multiple cores/processors of a single node, just not on two or more nodes.

The commands I use, on two nodes with 32 cores per node (64 cores in total), are


cd $CASEDIR
code_saturne run --id=${CASEID} --param=${CASEDIR}/DATA/setup.xml --initialize

cd RESU/${CASEID}
mpiexec -n 64 -- cs_solver --trace

cd ../..
code_saturne run --id=${CASEID} --param=${CASEDIR}/DATA/setup.xml --finalize
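
(For completeness: the non-staged equivalent would be a single code_saturne run call without the --initialize/--finalize stages, letting code_saturne build the mpiexec line itself. A sketch is below; I split the run up above so that I can control the mpiexec call directly, and I have not checked the exact --nprocs option name against code_saturne run --help on this install.)

cd $CASEDIR
code_saturne run --id=${CASEID} --param=${CASEDIR}/DATA/setup.xml --nprocs=64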



The process hangs just before MAIN CALCULATION, after the following step


** Field values on boundary_faces


For the mpiexec (cs_solver) step, I've also tried the following commands, to no avail:


mpiexec -machinefile ${HOSTSFILE} -n 64 -N 32 -- cs_solver --trace

mpiexec -machinefile ${HOSTSFILE} -n 64 --map-by ppr:32:node -- cs_solver --trace
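
Here ${HOSTSFILE} is a plain Open MPI hostfile along the following lines (the node names are placeholders for the two nodes of the allocation):

node001 slots=32
node002 slots=32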



I've compiled code_saturne 7.2.1 myself using the install_saturne.py script, gcc 9.3, Open MPI 4.0.4 and Python 3.9.

The initial setup file is as follows

download yes
debug no
prefix {prefix-path}
compC {gcc-9 path}
mpiCompC {mpicc-4.0.4 path}
compF {gfortran-9 path}
compCxx {g++-9 path}
mpiCompCxx {mpicxx-4.0.4 path}
python {python-3.9 path}
disable_gui yes
disable_frontend no
salome no
hdf5 yes yes {HDF5 install path}
cgns yes yes {CGNS install path}
med yes yes {MED install path}
scotch yes yes {SCOTCH install path}

I can mpiexec on a single node using all its 32 cores, but cannot do the same on two or more nodes.

What am I missing? The architecture info listed in run_solver.log, when run on two nodes, is


Local case configuration:

Date: Mon 06 Feb 2023 12:16:00 GMT
System: Linux 3.10.0-957.12.2.el7.x86_64
Machine: sqg5b90.bullx
Processor: model name : Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz
Memory: 191339 MB
User: zxc09-jxl04 (zxc09-jxl04)
Directory: ....../RESU/run64host
MPI ranks: 64 (appnum attribute: 0)
MPI ranks per node: 32
OpenMP threads: 2
Processors/node: 2

Compilers used for build:
C compiler: gcc (GCC) 9.3.0
C++ compiler: g++ (GCC) 9.3.0
Fortran compiler: GNU Fortran (GCC) 9.3.0

MPI version: 3.1 (Open MPI 4.0.4)
OpenMP version: 4.5

External libraries:
PT-SCOTCH 7.0.1

I/O read method: collective MPI-IO (explicit offsets)
I/O write method: collective MPI-IO (explicit offsets)
I/O rank step: 1
Yvan Fournier
Posts: 4070
Joined: Mon Feb 20, 2012 3:25 pm

Re: MPIEXEC code saturne on multiple nodes

Post by Yvan Fournier »

Hello,

Does the same mpiexec command work with a "hello world" program? Is this a system-wide MPI (which should already be configured) or your local install?
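
For example, a quick cross-node check with the same launcher and hostfile could look like the sketch below (hostname only tests the launcher; a real MPI "hello world" compiled with the same mpicc also checks MPI_Init across nodes; the options and hostfile are the ones from your job script):

mpiexec -machinefile ${HOSTSFILE} -n 64 -N 32 hostname

If this does not print each of the two node names 32 times, the issue is in the MPI setup or the node allocation rather than in code_saturne.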

Can you check the "run_solver" script generated in the run directory and see if the command is the one expected?
Could environment modules, loaded or not loaded, have an impact? Remember that code_saturne loads the environment modules detected at configure time.
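
For example, from the case directory of your first post (assuming the staged run has already generated the script):

grep mpiexec RESU/${CASEID}/run_solver
module list

This shows the launch command code_saturne would use itself, and the modules actually loaded in the job environment.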

If the command you want to use is not the one used, use the post-install setup (<install_path>/etc/code_saturne.cfg or the user's $HOME/.code_saturne.cfg) to force a given MPI command.
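
A minimal sketch of such a setting, assuming the [mpi] section and key names of the code_saturne.cfg.template shipped with 7.x (check the template in <install_path>/etc for the exact names; the path is a placeholder):

[mpi]
# launcher to use instead of the auto-detected one
mpiexec = /path/to/openmpi-4.0.4/bin/mpiexec
# extra options added to every launch, e.g. a mapping policy or hostfile
mpiexec_opts = --map-by ppr:32:node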

I have not needed hostfiles in a long time on our machines (we use SLURM, and MPI libraries configured with direct SLURM support), so this has not been tested recently, but it should still work, as you are simply passing extra options to mpiexec. But first, check that the build-time and run-time MPI match and that your MPI library works across nodes.
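
A quick way to check the match, from inside RESU/${CASEID} (assuming the solver binary or a link to it is present there, as in a staged run):

which mpiexec
mpiexec --version
ldd ./cs_solver | grep -i mpi

The mpiexec found first in the PATH should come from the same Open MPI 4.0.4 the solver is linked against, and the cross-node hostname test above should work with it outside code_saturne.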

Best regards,

Yvan
zmchen
Posts: 2
Joined: Mon Feb 06, 2023 3:14 pm

Re: MPIEXEC code saturne on multiple nodes

Post by zmchen »

Hi Yvan,

Thanks for your quick response.

Will do the checking and testing.

Kind regards,