I am trying to run the CS/Syrthes coupled test case 3_2D_DISKS_2.
Each simulation works separately as expected: code_saturne 8.0.4, SYRTHES 5.0.
However, when running the coupled simulation, I get the following MPI error:
Code:
mpiexec has exited due to process rank 1 with PID 0 on
node po21210 exiting improperly. There are three reasons this could occur:
1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.
2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"
3. this process called "MPI_Abort" or "orte_abort" and the mca parameter
orte_create_session_dirs is set to false. In this case, the run-time cannot
detect that the abort call was an abnormal termination. Hence, the only
error message you will receive is this one.
This may have caused other processes in the application to be
terminated by signals sent by mpiexec (as reported here).
You can avoid this message by specifying -quiet on the mpiexec command line.
--------------------------------------------------------------------------
solver script exited with status 1.
Error running the coupled calculation.
Either of the coupled codes may have failed.
Check the following log files for details.
Domain FLUID (code_saturne):
run_solver.log, error*.
Domain SOLID (SYRTHES):
syrthes.log / listing
- the run_solver.log file does not exist
- 'listing' is a broken link to that non-existent file
- the preprocessor.log file is present and finishes with "preprocessor finish"
I am running the simulation on Scibian 11. In the topic viewtopic.php?t=3052 I saw that there might be a problem with the MPI compilation of code_saturne on Debian, so I rebuilt code_saturne 8.0.4 entirely from source. This did not help.
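If it helps, I can also post the output of the following quick checks, which should show which MPI runtime is picked up and which MPI library the rebuilt cs_solver is actually linked against (the path to cs_solver is just the one from my execution directory):

Code:
# which MPI launcher is found first on the PATH, and its version
which mpiexec
mpiexec --version

# which MPI shared library the solver binary links against
# (run from the FLUID execution directory of the coupled case)
ldd ./cs_solver | grep -i mpi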
Next, I noticed that the MPMD splitting is done by the mpmd_exec.sh script, which is in turn called by run_solver. So I went on and tried to run the ./cs_solver and ./syrthes commands from this script by hand, each one in its respective execution directory (roughly as sketched below). Surprisingly, the cs_solver command does start and begins writing the expected run_solver.log, until it crashes due to the absence of a running SYRTHES instance (which is expected).
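To be precise, this is roughly what I ran by hand; the command lines are the ones copied from my generated mpmd_exec.sh (arguments omitted here), and the directory layout below is only illustrative:

Code:
# FLUID side alone, from its execution directory
cd RESU_COUPLING/<run_id>/FLUID
./cs_solver        # command copied from mpmd_exec.sh
# -> starts, writes run_solver.log, then aborts because no SYRTHES
#    instance is available for the coupling (expected)

# SOLID side alone, from its execution directory
cd ../SOLID
./syrthes          # command copied from mpmd_exec.sh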
When I replace the cs_solver and syrthes commands in the mpmd_exec.sh script with a simple 'hostname' command, the MPI processes are created correctly and each rank prints the host name (see the sketch below).
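For clarity, my modified mpmd_exec.sh keeps the rank dispatch of the generated script and only swaps the solver commands; the rank variable and rank boundaries below are illustrative, taken from the general shape of the script generated for my case:

Code:
#!/bin/sh
# simplified sketch of the modified mpmd_exec.sh
# (rank variable and rank boundaries are illustrative)
MPI_RANK=${OMPI_COMM_WORLD_RANK:-$PMI_RANK}

if [ "$MPI_RANK" -lt 1 ]; then
  cd FLUID && hostname     # originally: ./cs_solver ...
else
  cd SOLID && hostname     # originally: ./syrthes ...
fi
# -> with this substitution every rank prints the node name, so the
#    MPI launch itself looks fine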
Does anyone have an idea of what might be wrong with my setup?