CS_4.0.2 and syrthes4.3.0 coupling MPI ABORT

ROLLAND
Posts: 17
Joined: Tue Dec 08, 2015 3:48 pm

CS_4.0.2 and syrthes4.3.0 coupling MPI ABORT

Post by ROLLAND »

Hi,

I recently installed CS (version 4.0.2) and Syrthes (version 4.3.0). To check the installation, I ran the test case 3disks2D (Three-2D-disks.pdf).
Each domain (fluid and solid) runs perfectly on its own, without coupling. However, when I couple them by adding an entry under "Conjugate heat exchange" in both the Code_Saturne GUI and the Syrthes GUI, I get this error message:

Code:

rolland@rolland-Precision-WorkStation-T7400:~/Documents/EXEMPLE3/CAS3$ ./runcase
 Coupling execution between: 
   o Code_Saturne [1 domain(s)];
   o SYRTHES      [1 domain(s)];


                      Code_Saturne is running
                      ***********************

 Version: 4.0
 Path:    /home/rolland/Code_Saturne_4.0.2

 Result directory:
   /home/rolland/Documents/EXEMPLE3/CAS3/RESU_COUPLING/20151214-1052


 Single processor Code_Saturne simulation.
 Single processor SYRTHES simulation.


 ****************************
  Preparing calculation data
 ****************************

 SYRTHES4 home directory: /home/rolland/syrthes4.3.0/arch/Linux_x86_64
 MPI home directory: /usr
 Building the executable file syrthes.. 

  *****  SYRTHES compilation and link completed *****

 ***************************
  Preprocessing calculation
 ***************************


  SyrthesCase summary:

    Name =                         solid
    Data file =                    solidcoupling.syd
    Update Data file =             True
    Do preprocessing =             True
    Debug =                        False
    Case dir. =                    /home/rolland/Documents/EXEMPLE3/CAS3/solid
    Execution dir. =               /home/rolland/Documents/EXEMPLE3/CAS3/RESU_COUPLING/20151214-1052/solid
    Data dir. =                    /home/rolland/Documents/EXEMPLE3/CAS3/RESU_COUPLING/20151214-1052/solid
    Source dir. =                  /home/rolland/Documents/EXEMPLE3/CAS3/RESU_COUPLING/20151214-1052/solid/src
    Post dir. =                    /home/rolland/Documents/EXEMPLE3/CAS3/RESU_COUPLING/20151214-1052/solid/POST

    Conduction mesh dir. =         /home/rolland/Documents/EXEMPLE3/CAS3/solid/
    Conduction mesh name =         3rond2d.syr

    Total num. of processes =      1
    Logfile name            =      syrthes.log
    Echo =                         True
    Parallel run =                 False
    Do preprocessing =             True

   SyrthesParam summary
    Param file name =            solidcoupling.syd
    Conduction mesh name =       3rond2d.syr
    Radiation mesh name =        None
    Result prefix. =             resu1
    Restart =                    False
    Coupling =                   True
    Interpreted functions =      False


  ---------------------------
  Start SYRTHES preprocessing
  ---------------------------

Updating the mesh file name.. 
   -> OK


 **********************
  Starting calculation
 **********************

--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI COMMUNICATOR 3 SPLIT FROM 0 
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 18387 on node rolland-Precision-WorkStation-T7400 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
 solver script exited with status 139.

Error running the coupled calculation.

Either Code_Saturne or SYRTHES may have failed.

Check Code_Saturne log (listing) and SYRTHES log (syrthes.log)
for details, as well as error* files.


 ****************************
  Saving calculation results
 ****************************

 Error in calculation stage.

It seems that CS and Syrthes are correctly installed (see my previous post: CS_4.0.2 and syrthes4.3.0 coupling issues). So, to try to solve the problem:

- I created a coupling file as suggested using:

Code:

code_saturne create -c fluid --syrthes solid

- I followed the file Three-2D-disks.pdf to the letter: I set the same Time step, the same Number of iterations and the same Output every 'n' time steps, but the same problem shows up.

The error messages (in the enclosed files) are not really helpful: they only refer to libmpi.so.1 or libsaturne.so.0, without much detail.


If someone has any clue,
it would be very much appreciated! :D

Thanks,

QR
Attachments
syrthes.log
Syrthes log file from RESU_COUPLING-...-solid folder
(7.16 KiB) Downloaded 279 times
error.zip
CS Error file from RESU_COUPLING-...-fluid folder
(572 Bytes) Downloaded 261 times
Yvan Fournier
Posts: 4080
Joined: Mon Feb 20, 2012 3:25 pm

Re: CS_4.0.2 and syrthes4.3.0 coupling MPI ABORT

Post by Yvan Fournier »

Hello,

I am not 100% sure, but this looks quite a bit like a case I have already encountered.

You did not post your setup data, so I can't be sure, but how many time steps did you request for each code? The computation stops at the smaller of the two numbers, but might not stop cleanly if the requested number of time steps is 0 or 1.

There is also a bug in Syrthes which might cause it to not stop cleanly, but the computation should still go through.

Regards,

Yvan
ROLLAND

Re: CS_4.0.2 and syrthes4.3.0 coupling MPI ABORT

Post by ROLLAND »

Hi,

I set the same time step of 0.5 s in both GUIs, and the same number of time steps, 50000. However, switching the number of time steps between 1, 100 and 50000 doesn't change anything. The same goes for time steps from 0.5 up to 10 s.
In the enclosed files, you will find the configuration files for running the case and the configuration files used to install CS and Syrthes. I hope these are the files you requested.

Thank you,

Regards,

QR
Attachments
lauch(CS) et setup(Syrthes).zip
Configuration files for installing CS and Syrthes
(1.84 KiB) Downloaded 263 times
3disks2D-fluid-coupling.xml
Configuration of the fluid domain
(7.5 KiB) Downloaded 352 times
solidcoupling.syd.zip
Configuration of the solid domain
(954 Bytes) Downloaded 242 times
Yvan Fournier

Re: CS_4.0.2 and syrthes4.3.0 coupling MPI ABORT

Post by Yvan Fournier »

Hello,

Could you post the syrthes.log and Code_Saturne listing files again with this setup, to see if there is any difference?

Regards,

Yvan
ROLLAND

Re: CS_4.0.2 and syrthes4.3.0 coupling MPI ABORT

Post by ROLLAND »

Hi,

Attached are the syrthes.log (RESU_COUPLING/.../solid/) and the listing (RESU_COUPLING/.../fluid/) files.

Regards,

QR
Attachments
syrthes.log
Syrthes log file
(7.16 KiB) Downloaded 267 times
listing.zip
CS listing file
(7.31 KiB) Downloaded 277 times
Erwan Le Coupanec
Posts: 45
Joined: Sun Sep 08, 2013 8:50 pm

Re: CS_4.0.2 and syrthes4.3.0 coupling MPI ABORT

Post by Erwan Le Coupanec »

Hi,

If you take a look at the Code_Saturne listing, before the Variables Initialisation section, you will see this warning message (translated here from the French output):

Code_Saturne: ../../../Downloads/code_saturne-4.0.2/src/base/cs_syr4_coupling.c:958: Warning
Coupling with SYRTHES impossible:
308 element centers from mesh "Faces SYRTHES solid"
not located on SYRTHES mesh.

It says that for some coupled faces on the fluid mesh, the matching coupled faces on the solid mesh have not been found. In fact, some coupling references on the solid mesh are missing from the tutorial solution: the complete list of references is 1 4 7 11, not just 1.

Regards,
Erwan.
ROLLAND

Re: CS_4.0.2 and syrthes4.3.0 coupling MPI ABORT

Post by ROLLAND »

Hi,

Thank you for this tip!
After writing 1 4 7 11 instead of 1 under Conjugate heat transfer in the Syrthes GUI, the warning in the listing file disappears!
However, I still get the same error message in the Konsole:

Code:

 **********************
  Starting calculation
 **********************

--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI COMMUNICATOR 3 SPLIT FROM 0 
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 2939 on node rolland-Precision-WorkStation-T7400 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
 solver script exited with status 139.

Error running the coupled calculation.

Either Code_Saturne or SYRTHES may have failed.

Check Code_Saturne log (listing) and SYRTHES log (syrthes.log)
for details, as well as error* files.


 ****************************
  Saving calculation results
 ****************************

 Error in calculation stage.

I tried adding s4bin = '...' and s4home = '...' to the syrthes.py file, following the "Guide to couple Saturne & SYRTHES" tutorial, but nothing changed.
I'm starting to despair :| . Is there possibly any way to bypass the MPI library? Or maybe the meshes (3rond2d.des and 3rond2d_fluide.des) don't even match.

Thank you in advance,

QR
Attachments
listing.zip
listing file
(7.54 KiB) Downloaded 281 times
syrthes.log
syrthes.log file
(7.15 KiB) Downloaded 276 times
Erwan Le Coupanec

Re: CS_4.0.2 and syrthes4.3.0 coupling MPI ABORT

Post by Erwan Le Coupanec »

Hi,

Have you retried with the same number of time steps in both codes and, in CS, with the unsteady algorithm and a constant time step?

The error seems to be on the Syrthes side, and looks like an invalid memory read or write. But it is not clear at all.

Have you also tried using the solution files for this tutorial directly?
The solution parameter files are in the Code_Saturne source directory, under examples/4-2Ddisks.

Regards,
Erwan.
Erwan Le Coupanec

Re: CS_4.0.2 and syrthes4.3.0 coupling MPI ABORT

Post by Erwan Le Coupanec »

By the way, on the CS side, the formula for the density in your XML file is wrong:

Code:

p0 / 287 * temperature + 273;

It should be:

Code:

p0 / (287 * (temperature + 273));
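
The two formulas differ only in operator precedence: without parentheses, the first is evaluated as ((p0 / 287) * temperature) + 273. A quick check in Python illustrates the gap. The numerical values of p0 and temperature below are assumptions chosen for illustration, not taken from the posted XML file.

```python
# Illustrative values only: assumptions, not read from the posted XML file.
p0 = 101325.0        # reference pressure in Pa
temperature = 20.0   # temperature in degrees Celsius

# As written in the XML: evaluated as ((p0 / 287) * temperature) + 273
rho_as_written = p0 / 287 * temperature + 273

# Ideal gas law, with the temperature converted to kelvin before multiplying
rho_corrected = p0 / (287 * (temperature + 273))

print(f"as written: {rho_as_written:.2f}")
print(f"corrected:  {rho_corrected:.4f}")
```

With these values the corrected formula gives roughly 1.2 kg/m^3, the expected order of magnitude for air, while the formula as written gives a value in the thousands.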
Yvan Fournier

Re: CS_4.0.2 and syrthes4.3.0 coupling MPI ABORT

Post by Yvan Fournier »

Hello,

No, there is no way to use the coupling without MPI in current versions of the code(s).

The crash occurs inside Syrthes, but is not easy to pinpoint. I am not even sure it occurs inside MPI, and I doubt MPI is the issue, as mesh location seems to work correctly (Code_Saturne complains inside an MPI-related function, but that may simply be because it was waiting for Syrthes at that point).

If you still have the run directory (/home/rolland/Documents/EXEMPLE3/CAS3/RESU_COUPLING/20151214-1052/solid), could you run "addr2line" inside that directory, with:

addr2line -f -e syrthes 0x41d5f2
addr2line -f -e syrthes 0x414044
addr2line -f -e syrthes 0x40a95b
addr2line -f -e syrthes 0x402c4f

Or re-run the case and use the addresses from the backtrace at the end of the new syrthes.log?

Hopefully, this will at least provide function name info.
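
If there are many addresses to look up, the calls can be scripted. The following is only a sketch, assuming the binutils addr2line tool (which takes the executable through its -e option and accepts several addresses at once) and an executable named syrthes in the current directory. The helper names addr2line_command and resolve are made up for this example.

```python
import shutil
import subprocess

# Addresses quoted in the post above; replace them with those from your own backtrace.
ADDRESSES = ["0x41d5f2", "0x414044", "0x40a95b", "0x402c4f"]


def addr2line_command(executable, addresses):
    """Build a single addr2line call (-f: print function names, -e: executable)."""
    return ["addr2line", "-f", "-e", executable, *addresses]


def resolve(executable, addresses):
    """Run addr2line and return its text output, or None if the tool is missing."""
    if shutil.which("addr2line") is None:
        return None
    result = subprocess.run(
        addr2line_command(executable, addresses),
        capture_output=True, text=True, check=False,
    )
    return result.stdout


# Show the command that would be run from inside the "solid" run directory.
print(" ".join(addr2line_command("syrthes", ADDRESSES)))
```

Calling resolve("syrthes", ADDRESSES) from inside the solid run directory should print a function name and a source location per address, provided the executable was built with debug information.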

An alternative is to re-run the "run_solver" file in the /home/rolland/Documents/EXEMPLE3/CAS3/RESU_COUPLING/20151214-1052 directory, after editing it to insert "valgrind" just before "syrthes" (assuming Valgrind is installed). This will also provide more info.

Regards,

Yvan