mpiexec code_saturne on multiple nodes
Posted: Mon Feb 06, 2023 3:45 pm
I am having trouble running code_saturne 7.2.1 with mpiexec across multiple nodes on the Hartree HPC system. I can run with mpiexec on multiple cores/processors of a single node, just not on two or more nodes.
The commands I use, on two nodes with 32 cores per node (64 cores in total), are
cd $CASEDIR
code_saturne run --id=${CASEID} --param=${CASEDIR}/DATA/setup.xml --initialize
cd RESU/${CASEID}
mpiexec -n 64 -- cs_solver --trace
cd ../..
code_saturne run --id=${CASEID} --param=${CASEDIR}/DATA/setup.xml --finalize
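For context, the equivalent single-command run, letting code_saturne build and drive the mpiexec line itself, would be roughly the sketch below (assuming I have the --nprocs option right); I split the run into stages precisely so that I could control the mpiexec line by hand.
cd $CASEDIR
# let code_saturne generate and execute the parallel launch itself
code_saturne run --id=${CASEID} --param=${CASEDIR}/DATA/setup.xml --nprocs=64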
With the staged commands above, the process hangs just before MAIN CALCULATION, after the following step:
** Field values on boundary_faces
For the mpiexec (cs_solver) step, I've also tried the following commands, to no avail:
mpiexec -machinefile ${HOSTSFILE} -n 64 -N 32 -- cs_solver --trace
mpiexec -machinefile ${HOSTSFILE} -n 64 --map-by ppr:32:node -- cs_solver --trace
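For reference, ${HOSTSFILE} lists the two allocated nodes in the usual Open MPI hostfile format (hostnames here are placeholders):
node001 slots=32
node002 slots=32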
I compiled code_saturne 7.2.1 myself using the install_saturne.py script, with gcc 9.3, Open MPI 4.0.4 and Python 3.9. The installer setup file is as follows:
download yes
debug no
prefix {prefix-path}
compC {gcc-9 path}
mpiCompC {mpicc-4.0.4 path}
compF {gfortran-9 path}
compCxx {g++-9 path}
mpiCompCxx {mpicxx-4.0.4 path}
python {python-3.9 path}
disable_gui yes
disable_frontend no
salome no
hdf5 yes yes {HDF5 install path}
cgns yes yes {CGNS install path}
med yes yes {MED install path}
scotch yes yes {SCOTCH install path}
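The build itself was the standard installer invocation, run from the directory containing the setup file above (from memory; paths trimmed):
# install_saturne.py reads the "setup" file from the current directory
python3 install_saturne.py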
I can mpiexec on a single node using all 32 of its cores, but cannot do the same on two or more nodes. What am I missing?
The configuration info listed in run_solver.log for the two-node run is:
Local case configuration:
Date: Mon 06 Feb 2023 12:16:00 GMT
System: Linux 3.10.0-957.12.2.el7.x86_64
Machine: sqg5b90.bullx
Processor: model name : Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz
Memory: 191339 MB
User: zxc09-jxl04 (zxc09-jxl04)
Directory: ....../RESU/run64host
MPI ranks: 64 (appnum attribute: 0)
MPI ranks per node: 32
OpenMP threads: 2
Processors/node: 2
Compilers used for build:
C compiler: gcc (GCC) 9.3.0
C++ compiler: g++ (GCC) 9.3.0
Fortran compiler: GNU Fortran (GCC) 9.3.0
MPI version: 3.1 (Open MPI 4.0.4)
OpenMP version: 4.5
External libraries:
PT-SCOTCH 7.0.1
I/O read method: collective MPI-IO (explicit offsets)
I/O write method: collective MPI-IO (explicit offsets)
I/O rank step: 1
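If it helps narrow things down, an obvious launcher-only sanity check would be the same launch line with a trivial executable in place of cs_solver (a sketch; I have not included its output here):
mpiexec -machinefile ${HOSTSFILE} -n 64 --map-by ppr:32:node hostname
# 32 lines per node from each of the two hosts would confirm the multi-node launch itself works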