CS_4.0.2 and syrthes4.3.0 coupling MPI ABORT

ROLLAND
Posts: 17
Joined: Tue Dec 08, 2015 3:48 pm

Re: CS_4.0.2 and syrthes4.3.0 coupling MPI ABORT

Post by ROLLAND »

Hi,

Thank you for your quick answer. I restarted the model from scratch, directly using the files and meshes from /home/rolland/Downloads/code_saturne-4.0.2/examples/4-2Ddisks/.
The number of iterations and the reference time step are set to 600 and 0.5 respectively in both GUIs. The density formula is written correctly: density = p0 / (287 * (temperature + 273.0));. CS runs with the unsteady flow algorithm, the k-epsilon Linear Production turbulence model and the Constant time step option.
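For what it's worth, this is simply the ideal gas law with the specific gas constant of air (287 J/(kg.K)); below is a minimal standalone C sketch (illustrative values only, not part of the case setup) to sanity-check the law:

Code: Select all

#include <stdio.h>

/* Sanity check of the GUI density law: ideal gas, R = 287 J/(kg.K).
   p0 and the 20 degC sample temperature are illustrative values only. */
int main(void)
{
    double p0 = 101325.0;        /* reference pressure [Pa] */
    double temperature = 20.0;   /* temperature [degC] */
    double density = p0 / (287.0 * (temperature + 273.0));
    printf("density = %g kg/m3\n", density);  /* about 1.20 kg/m3 at 20 degC */
    return 0;
}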
Unfortunately, I removed the folder /home/rolland/Documents/EXEMPLE3/CAS3/RESU_COUPLING/20151214-1052/solid, but I ran the addr2line command in a new case (CAS4); here is the result:

Code: Select all

rolland@rolland-Precision-WorkStation-T7400:~/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-0927/solid$ addr2line -e syrthes 0x41d5f2 -f
ecrire_geom_cplcfd
??:?
rolland@rolland-Precision-WorkStation-T7400:~/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-0927/solid$ addr2line -e syrthes  0x414044 -f
cfd_surf_init
??:?
rolland@rolland-Precision-WorkStation-T7400:~/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-0927/solid$ addr2line -e syrthes  0x40a95b -f
syrthes
??:?
rolland@rolland-Precision-WorkStation-T7400:~/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-0927/solid$ addr2line -e syrthes  0x402c4f -f
main
??:?
I added valgrind in the run_solver file just before the ./syrthes command and ran it:

Code: Select all

rolland@rolland-Precision-WorkStation-T7400:~/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017$ ./run_solver 
==27404== Memcheck, a memory error detector
==27404== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==27404== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==27404== Command: ./syrthes -d tmp.data -n 1 --name solid --log syrthes.log
==27404== 
==27404== Conditional jump or move depends on uninitialised value(s)
==27404==    at 0x5939A03: vfprintf (vfprintf.c:1661)
==27404==    by 0x59F84F4: __vasprintf_chk (vasprintf_chk.c:66)
==27404==    by 0x59F8431: __asprintf_chk (asprintf_chk.c:32)
==27404==    by 0x545C0B5: opal_output_init (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404==    by 0x54594E7: opal_init_util (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404==    by 0x53A8F0A: ompi_mpi_init (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404==    by 0x53C065F: PMPI_Init (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404==    by 0x4029DF: main (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404== 
==27404== Conditional jump or move depends on uninitialised value(s)
==27404==    at 0x5939A03: vfprintf (vfprintf.c:1661)
==27404==    by 0x59F84F4: __vasprintf_chk (vasprintf_chk.c:66)
==27404==    by 0x59F8431: __asprintf_chk (asprintf_chk.c:32)
==27404==    by 0xB9D4C6E: pml_v_output_open (in /usr/lib/openmpi/lib/openmpi/mca_pml_v.so)
==27404==    by 0xB9D49CA: ??? (in /usr/lib/openmpi/lib/openmpi/mca_pml_v.so)
==27404==    by 0x544790B: mca_base_components_open (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404==    by 0x53E6B7B: mca_pml_base_open (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404==    by 0x53A9198: ompi_mpi_init (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404==    by 0x53C065F: PMPI_Init (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404==    by 0x4029DF: main (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404== 
==27404== Conditional jump or move depends on uninitialised value(s)
==27404==    at 0x4C2E0F8: strlen (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==27404==    by 0x597682D: strdup (strdup.c:41)
==27404==    by 0x545BE21: opal_output_open (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404==    by 0xB9D4C7B: pml_v_output_open (in /usr/lib/openmpi/lib/openmpi/mca_pml_v.so)
==27404==    by 0xB9D49CA: ??? (in /usr/lib/openmpi/lib/openmpi/mca_pml_v.so)
==27404==    by 0x544790B: mca_base_components_open (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404==    by 0x53E6B7B: mca_pml_base_open (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404==    by 0x53A9198: ompi_mpi_init (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404==    by 0x53C065F: PMPI_Init (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404==    by 0x4029DF: main (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404== 
==27404== Conditional jump or move depends on uninitialised value(s)
==27404==    at 0x4C2E0F8: strlen (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==27404==    by 0x545BE33: opal_output_open (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404==    by 0xB9D4C7B: pml_v_output_open (in /usr/lib/openmpi/lib/openmpi/mca_pml_v.so)
==27404==    by 0xB9D49CA: ??? (in /usr/lib/openmpi/lib/openmpi/mca_pml_v.so)
==27404==    by 0x544790B: mca_base_components_open (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404==    by 0x53E6B7B: mca_pml_base_open (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404==    by 0x53A9198: ompi_mpi_init (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404==    by 0x53C065F: PMPI_Init (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404==    by 0x4029DF: main (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404== 
==27404== Syscall param writev(vector[...]) points to uninitialised byte(s)
==27404==    at 0x59DF417: writev (writev.c:49)
==27404==    by 0x8340062: mca_oob_tcp_msg_send_handler (in /usr/lib/openmpi/lib/openmpi/mca_oob_tcp.so)
==27404==    by 0x8341225: mca_oob_tcp_peer_send (in /usr/lib/openmpi/lib/openmpi/mca_oob_tcp.so)
==27404==    by 0x83450A5: mca_oob_tcp_send_nb (in /usr/lib/openmpi/lib/openmpi/mca_oob_tcp.so)
==27404==    by 0x8134DE1: orte_rml_oob_send (in /usr/lib/openmpi/lib/openmpi/mca_rml_oob.so)
==27404==    by 0x8135403: orte_rml_oob_send_buffer (in /usr/lib/openmpi/lib/openmpi/mca_rml_oob.so)
==27404==    by 0x8750A0E: ??? (in /usr/lib/openmpi/lib/openmpi/mca_grpcomm_bad.so)
==27404==    by 0x53A94EE: ompi_mpi_init (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404==    by 0x53C065F: PMPI_Init (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404==    by 0x4029DF: main (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404==  Address 0x6ad4541 is 161 bytes inside a block of size 256 alloc'd
==27404==    at 0x4C2CE8E: realloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==27404==    by 0x5430EF9: opal_dss_buffer_extend (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404==    by 0x54312AD: opal_dss_copy_payload (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404==    by 0x540DDAD: orte_grpcomm_base_pack_modex_entries (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404==    by 0x87508EF: ??? (in /usr/lib/openmpi/lib/openmpi/mca_grpcomm_bad.so)
==27404==    by 0x53A94EE: ompi_mpi_init (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404==    by 0x53C065F: PMPI_Init (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404==    by 0x4029DF: main (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404== 
==27404== Invalid read of size 8
==27404==    at 0x41D5E5: ecrire_geom_cplcfd (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404==    by 0x414043: cfd_surf_init (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404==    by 0x40A95A: syrthes (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404==    by 0x402C4E: main (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404==  Address 0x6af3610 is 0 bytes after a block of size 16 alloc'd
==27404==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==27404==    by 0x44D8D4: lire_syrthes (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404==    by 0x4432DE: lire_maill (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404==    by 0x403E4D: syrthes (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404==    by 0x402C4E: main (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404== 
==27404== Invalid read of size 8
==27404==    at 0x41D5EE: ecrire_geom_cplcfd (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404==    by 0x414043: cfd_surf_init (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404==    by 0x40A95A: syrthes (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404==    by 0x402C4E: main (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404==  Address 0xdf1fa70 is 0 bytes after a block of size 16 alloc'd
==27404==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==27404==    by 0x41D54B: ecrire_geom_cplcfd (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404==    by 0x414043: cfd_surf_init (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404==    by 0x40A95A: syrthes (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404==    by 0x402C4E: main (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404== 
==27404== Invalid read of size 8
==27404==    at 0x41D5F2: ecrire_geom_cplcfd (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404==    by 0x414043: cfd_surf_init (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404==    by 0x40A95A: syrthes (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404==    by 0x402C4E: main (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404==  Address 0x50 is not stack'd, malloc'd or (recently) free'd
==27404== 
==27404== 
==27404== Process terminating with default action of signal 11 (SIGSEGV)
==27404==  Access not within mapped region at address 0x50
==27404==    at 0x41D5F2: ecrire_geom_cplcfd (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404==    by 0x414043: cfd_surf_init (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404==    by 0x40A95A: syrthes (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404==    by 0x402C4E: main (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404==  If you believe this happened as a result of a stack
==27404==  overflow in your program's main thread (unlikely but
==27404==  possible), you can try to increase the size of the
==27404==  main thread stack using the --main-stacksize= flag.
==27404==  The main thread stack size used in this run was 8388608.
==27404== 
==27404== HEAP SUMMARY:
==27404==     in use at exit: 2,775,703 bytes in 6,156 blocks
==27404==   total heap usage: 13,511 allocs, 7,355 frees, 13,775,573 bytes allocated
==27404== 
==27404== LEAK SUMMARY:
==27404==    definitely lost: 2,967 bytes in 46 blocks
==27404==    indirectly lost: 30,912 bytes in 4 blocks
==27404==      possibly lost: 0 bytes in 0 blocks
==27404==    still reachable: 2,741,824 bytes in 6,106 blocks
==27404==         suppressed: 0 bytes in 0 blocks
==27404== Rerun with --leak-check=full to see details of leaked memory
==27404== 
==27404== For counts of detected and suppressed errors, rerun with: -v
==27404== Use --track-origins=yes to see where uninitialised values come from
==27404== ERROR SUMMARY: 22 errors from 8 contexts (suppressed: 0 from 0)
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI COMMUNICATOR 3 SPLIT FROM 0 
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 27404 on node rolland-Precision-WorkStation-T7400 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
I hope this helps.
Thank you,

QR
Yvan Fournier
Posts: 4080
Joined: Mon Feb 20, 2012 3:25 pm

Re: CS_4.0.2 and syrthes4.3.0 coupling MPI ABORT

Post by Yvan Fournier »

Hello,

The "addr2line" provided useful info, that is the crash seems to be in a write in "ecrire_geom_cplcfd" in Syrthes, which is a function used to write info about the geometry regarding coupled faces. I'll try to check if this is optional and can be deactivated, and keep you updated (but I may keep you waiting a day or two).

Regards,

Yvan
ROLLAND
Posts: 17
Joined: Tue Dec 08, 2015 3:48 pm

Re: CS_4.0.2 and syrthes4.3.0 coupling MPI ABORT

Post by ROLLAND »

Hi,

I recently re-installed CS and SYRTHES, making sure that all the libraries (scotch, med, hdf5, mpi and metis) match exactly between the CS and SYRTHES configuration files (launch.config and setup.ini respectively).
To my great disappointment, I still get the MPI_ABORT at the beginning of "Starting calculation".

So I changed the configuration in code_saturne.cfg, in the /home/rolland/Code_Saturne_4.0.2/etc/ directory: I replaced mpiexec = mpiexec with mpiexec = mpirun. I'm not sure I understand the difference between the two commands (mpirun and mpiexec both link to the same orterun executable). I now get this error message:

Code: Select all

 **********************
  Starting calculation
 **********************

/home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20160106-1523/mpmd_exec.sh: line 14:  7172 Segmentation fault  (core dumped) ./syrthes -d tmp.data -n 1 --name solid --log syrthes.log $@
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
^Cmpirun: abort is already in progress...hit ctrl-c again to forcibly terminate
I guess that's good news, as a "Primary job" terminated normally, which didn't seem to be the case with mpiexec. At least I now know the problem comes from SYRTHES. mpmd_exec.sh is a new file that appeared (along with run_solver in the RESU_COUPLING directory) when using mpirun instead of mpiexec.
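For reference, the part of code_saturne.cfg I edited looks roughly like the following sketch (the [mpi] section name is from memory and should be checked against the installed file; only the mpiexec key was changed):

Code: Select all

# Excerpt of /home/rolland/Code_Saturne_4.0.2/etc/code_saturne.cfg (from memory)
[mpi]
# mpiexec = mpiexec
mpiexec = mpirun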

Does anyone know how to solve the "Segmentation fault (core dumped)" error, or the failing process returning a non-zero exit code?

Thank you.

QR
Yvan Fournier
Posts: 4080
Joined: Mon Feb 20, 2012 3:25 pm

Re: CS_4.0.2 and syrthes4.3.0 coupling MPI ABORT

Post by Yvan Fournier »

Hello,

I've finally looked into the problem, and there are several issues:

- The crash is due to a bug when writing the coupled area in SYRTHES for a 2D mesh (it does not occur in 3D). The attached "ecrire_fichier.c" should fix this (replace the one from the SYRTHES sources with it and reinstall).

- There is another bug in this version of Syrthes which causes a crash (after the one you obtain) at the end of the computation when closing a file. The attached "syrthes.c" should fix this (same solution, you need to patch and reinstall).

Also, the tutorial seems to be missing some specifications: the SYRTHES mesh faces you need to couple with Code_Saturne are references 1 4 7 11 (the PDF only shows "1" in a screenshot at the end and does not list these references; they are found in the "solid-coupling.syd" of the tutorial example files). References 2, 5 and 8 are only for uncoupled faces.

Combining these fixes should help you get a running computation.

Regards,

Yvan
Attachments
ecrire_fichier.c
(50.45 KiB) Downloaded 432 times
syrthes.c
(45.44 KiB) Downloaded 415 times
ROLLAND
Posts: 17
Joined: Tue Dec 08, 2015 3:48 pm

Re: CS_4.0.2 and syrthes4.3.0 coupling MPI ABORT

Post by ROLLAND »

Hi,

I replaced ecrire_fichier.c and syrthes.c and it's working! Thank you for your time!

Regards,

QR
ROLLAND
Posts: 17
Joined: Tue Dec 08, 2015 3:48 pm

Re: CS_4.0.2 and syrthes4.3.0 coupling MPI ABORT

Post by ROLLAND »

Hi,

I decided to extend from the 2D model (Three-2D-disks tutorial) to a 3D model in order to fully check the installation. I created a very simple model: two volumes (two cubes, one fluid domain and one solid domain) placed side by side with one common interface. This interface reference is set for conjugate heat transfer in both CS and SYRTHES.
Even when using the same instance name (Name of the CFD code instance in SYRTHES), trying different projection axes, and making sure that the selection criteria (CS) and references (SYRTHES) match for the same interface, I still get an error message:

Code: Select all

 **********************
  Starting calculation
 **********************

App launch reported: 1 (out of 1) daemons - 2 (out of 2) procs
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI COMMUNICATOR 3 SPLIT FROM 0 
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[53350,1],0]
  Exit code:    1
--------------------------------------------------------------------------
[rolland-Precision-WorkStation-T7400:05003] 1 more process has sent help message help-mpi-api.txt / mpi-abort
[rolland-Precision-WorkStation-T7400:05003] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
 solver script exited with status 1.

Error running the coupled calculation.

Either Code_Saturne or SYRTHES may have failed.

Check Code_Saturne log (listing) and SYRTHES log (syrthes.log)
for details, as well as error* files.


 ****************************
  Saving calculation results
 ****************************

 Error in calculation stage.
One thing is sure: the error message has changed since I installed the patch you provided. However, it again seems related to the MPI library. I don't know whether I have to remove the new "ecrire_fichier.c" since this is a 3D model, or whether some parameters are wrong in CS or SYRTHES.
A quick search on the internet shows that someone had the same issue (Code_Saturne forum thread: "Install several versions and compile with Salome and Syrthes"). According to that thread, it seems to be an mpicc problem. Running "mpicc -show" gives:

Code: Select all

rolland@rolland-Precision-WorkStation-T7400:~/Documents/EXEMPLE3/TEST2/CAS1$ mpicc -show
gcc -I/home/rolland/syrthes4.3.0/extern-libraries/opt/openmpi-1.8.3/arch/Linux_x86_64/include -pthread -Wl,-rpath -Wl,/home/rolland/syrthes4.3.0/extern-libraries/opt/openmpi-1.8.3/arch/Linux_x86_64/lib -Wl,--enable-new-dtags -L/home/rolland/syrthes4.3.0/extern-libraries/opt/openmpi-1.8.3/arch/Linux_x86_64/lib -lmpi
Nothing looks wrong to me, as all the paths refer to the right folder .../syrthes4.3.0/... .
Another thread suggests running "ldd" on the Code_Saturne (cs_solver) and SYRTHES (syrthes) executables (CS forum thread: "Unable to run a coupled simulation with Syrthes"). Maybe it is a library problem with mpicc? Here is the result:

Code: Select all

rolland@rolland-Precision-WorkStation-T7400:~/Documents/EXEMPLE3/TEST2/CAS1/solid$ ldd syrthes
        linux-vdso.so.1 =>  (0x00007ffee7568000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f664b9c1000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f664b5fc000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f664bcc7000)
rolland@rolland-Precision-WorkStation-T7400:~/Code_Saturne_4.0.2/libexec/code_saturne$ ldd cs_solver
        linux-vdso.so.1 =>  (0x00007ffda245e000)
        libsaturne.so.0 => /home/rolland/Code_Saturne_4.0.2/lib/libsaturne.so.0 (0x00007f7b14972000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f7b145ad000)
        libple.so.1 => /home/rolland/Code_Saturne_4.0.2/lib/libple.so.1 (0x00007f7b1439b000)
        libcgns.so.3.1 => /usr/lib/libcgns.so.3.1 (0x00007f7b140cc000)
        libmedC.so.1 => /home/rolland/syrthes4.3.0/extern-libraries/opt/med-3.0.7/arch/Linux_x86_64/lib/libmedC.so.1 (0x00007f7b13db0000)
        libptscotch.so => /usr/local/lib/libptscotch.so (0x00007f7b13b64000)
        libscotch.so => /usr/local/lib/libscotch.so (0x00007f7b138d8000)
        libmpi.so.1 => /home/rolland/syrthes4.3.0/extern-libraries/opt/openmpi-1.8.3/arch/Linux_x86_64/lib/libmpi.so.1 (0x00007f7b135fd000)
        libxml2.so.2 => /usr/lib/x86_64-linux-gnu/libxml2.so.2 (0x00007f7b13297000)
        libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00007f7b12f7d000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f7b12c77000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f7b12a73000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f7b16383000)
        libhdf5.so.7.4.0 => /home/rolland/salome/Salome-V7_6_0-x86_64/prerequisites/Hdf5-1810/lib/libhdf5.so.7.4.0 (0x00007f7b125b6000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f7b122b2000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f7b1209c000)
        libopen-rte.so.7 => /home/rolland/syrthes4.3.0/extern-libraries/opt/openmpi-1.8.3/arch/Linux_x86_64/lib/libopen-rte.so.7 (0x00007f7b11e20000)
        libopen-pal.so.6 => /home/rolland/syrthes4.3.0/extern-libraries/opt/openmpi-1.8.3/arch/Linux_x86_64/lib/libopen-pal.so.6 (0x00007f7b11b44000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f7b11926000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f7b1170d000)
        liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f7b114eb000)
        libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f7b112af000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f7b110a7000)
        libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007f7b10ea4000)
To me it seems OK: all the common libraries match. But to solve the problem, do I have to add an mpicc path in setup.ini, even though it specifically says not to add a path for an MPI installation? By the way, there is no mpicc installation file in the .../syrthes4.3.0/extern-libraries/src/ folder.

I'm sorry for this long post, but I'm out of options.

Thanks in advance for your reply.

Regards,

QR
Yvan Fournier
Posts: 4080
Joined: Mon Feb 20, 2012 3:25 pm

Re: CS_4.0.2 and syrthes4.3.0 coupling MPI ABORT

Post by Yvan Fournier »

Hello,

Did you check the (fluid) "listing" and (solid) "syrthes.log" files, as indicated in the script's error message?

The problem is probably a mesh location/interface problem, not an MPI problem.

You do not need to change the files I posted last time, as they should work both in 2D and 3D.

But if your case is 3D on the Syrthes side, you should not use a projection on the Code_Saturne side.
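As a minimal sketch (to adapt in cs_user_syrthes_coupling() in cs_user_coupling.c; the "SYRTHES1" instance name and the "solid_wall" boundary group are placeholders for your own setup), a 3D coupling definition with no projection axis would look like:

Code: Select all

/* Minimal sketch: couple with a SYRTHES instance in full 3D, i.e. with no
   projection axis. "SYRTHES1" and "solid_wall" are placeholders for the
   actual instance name and boundary face group of your case. */
cs_syr_coupling_define("SYRTHES1",
                       "solid_wall",  /* boundary criteria */
                       NULL,          /* volume criteria */
                       ' ',           /* projection axis: none for a 3D case */
                       false,         /* allow_nonmatching */
                       0.1,           /* tolerance */
                       1,             /* verbosity */
                       1);            /* visualization output */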

Regards,

Yvan
ROLLAND
Posts: 17
Joined: Tue Dec 08, 2015 3:48 pm

Re: CS_4.0.2 and syrthes4.3.0 coupling MPI ABORT

Post by ROLLAND »

Hi,

Thank you for your very quick answer. I checked the (fluid) "listing" and (solid) "syrthes.log" files. The "syrthes.log" seems fine, but there is indeed an error message in the "listing" file:

Code: Select all

 Warning:
 -------
    A mesh quality defect has been detected

    The mesh should be reviewed according to the indicated criteria.

    The calculation will be run, but the quality of the solution may be degraded...

 Computing geometric quantities (0.122 s)


Code_Saturne: ../../../Downloads/code_saturne-4.0.2/src/base/cs_syr4_coupling.c:958: Warning
Coupling with SYRTHES impossible:
1838 element centers of the mesh "Faces SYRTHES solid"
not located on the SYRTHES mesh.

Code_Saturne: ../../../Downloads/code_saturne-4.0.2/src/base/cs_syr4_coupling.c:1536: Warning
 The message received from SYRTHES: "coupling:error:location"
 indicates that the meshes do not match correctly.

 The calculation will not be run.

                                                             
       ALMAX  =    0.10000E+01 (Characteristic length       )
       ALMAX is the cubic root of the domain volume.
That's weird; I have checked numerous times: the two meshes in Salome (of the two different volumes) match perfectly at the interface (location of the face and location of the mesh nodes) where the heat transfer should occur. Is there a way to name the mesh faces that would avoid conflicts between the two codes? Or is there some kind of "trick" to make the common face between the two meshes recognizable?

Regards,

Q.ROLLAND
Attachments
syrthes.log
solid
(7.3 KiB) Downloaded 357 times
listing.zip
fluid
(7.09 KiB) Downloaded 377 times
Yvan Fournier
Posts: 4080
Joined: Mon Feb 20, 2012 3:25 pm

Re: CS_4.0.2 and syrthes4.3.0 coupling MPI ABORT

Post by Yvan Fournier »

Hello,

Possibly the meshes match, but the selection criteria do not select all the faces?

How many faces from each code are on the interface? If most faces, but not all, are found, increasing the search tolerance (for curved areas) might help (but you need user subroutines to set this).

Both codes allow visualizing which faces are selected for coupling. You may compare this to the full mesh.

Regards,

Yvan
ROLLAND
Posts: 17
Joined: Tue Dec 08, 2015 3:48 pm

Re: CS_4.0.2 and syrthes4.3.0 coupling MPI ABORT

Post by ROLLAND »

Hi,

Again, thank you for your quick answer! I followed your advice: I modified cs_user_coupling.c, put it in the .../fluid/SRC/ folder, and implemented the following in the cs_user_syrthes_coupling(void) function:

Code: Select all

  int  verbosity = 1, plot = 1;
  float tolerance = 0.1;
  bool allow_nonmatching = true;
 
  if (true)
  cs_syr_coupling_define("SYRTHES1",
                           "1",               /* boundary criteria */
                           NULL,              /* volume_criteria */
                           ' ',               /* projection_axis */
                           allow_nonmatching,
                           tolerance,
                           verbosity,
                           plot);
and it's working! To make it work, I had to remove any selection under conjugate heat transfer in the code_saturne GUI, and I had to set bool allow_nonmatching = true; (instead of false) in all circumstances (otherwise the computation systematically crashed). On the other hand, the tolerance doesn't seem to matter for this model.

I then tried to extend this very simple model (two side-by-side cubes: one solid, one fluid) to another model a little more representative of a real case. I created a cylinder (fluid domain with inlet and outlet) and a surrounding block (solid domain with heat generation), with the eventual aim of designing a basic water heating system with a pipe.

However, the computation crashed. The "listing" file ends with "Lecture du fichier : mesh_input" (reading file: mesh_input), with nothing else afterwards and no error reported, whereas for the first model it continues with "Fin de la lecture : mesh_input" (end of reading: mesh_input) all the way to END OF CALCULATION with the final results.

I guess it's a mesh problem. But CS and SYRTHES work fine without coupling, so the meshes should be OK. I tried changing the mesh file names: nothing changed. I also tried adding a cs_sat_coupling_define("SATURNE_01"...) call in the cs_user_saturne_coupling(void) function instead of the "conjugate heat transfer" selection in the SYRTHES GUI, and nothing changed either. Increasing the tolerance value doesn't change anything either.
Is there some problem with CS dealing with closed surfaces?

With all the work that has been achieved, I have a feeling the end is near! :) Thank you for your time!

Regards,

Q. ROLLAND