Re: CS_4.0.2 and syrthes4.3.0 coupling MPI ABORT
Posted: Mon Dec 21, 2015 11:33 am
Hi,
Thank you for your quick answer. I rebuilt the model from scratch, directly using the files and meshes from /home/rolland/Downloads/code_saturne-4.0.2/examples/4-2Ddisks/.
The number of iterations and the reference time step are set to 600 and 0.5, respectively, in both GUIs. The density formula is written correctly: density = p0 /(287*(temperature + 273.0));. CS runs with the unsteady flow algorithm, the k-epsilon Linear Production turbulence model, and the Constant time step option.
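For anyone who wants to sanity-check the density law outside the GUI, here is a minimal sketch of the same expression. It assumes that temperature is in degrees Celsius (suggested by the 273.0 offset) and that p0 is a reference pressure in Pa; the value 101325 Pa is only an illustrative assumption, since the post does not state it.

```python
# Sketch of the density law entered in the GUI:
#   density = p0 / (287 * (temperature + 273.0))
R_AIR = 287.0     # specific gas constant as written in the formula, J/(kg.K)
T_OFFSET = 273.0  # Celsius-to-Kelvin offset as written in the formula


def density(p0: float, temperature_c: float) -> float:
    """Ideal-gas density (kg/m^3) for p0 in Pa and temperature in degrees C."""
    return p0 / (R_AIR * (temperature_c + T_OFFSET))


if __name__ == "__main__":
    P0 = 101325.0  # assumed reference pressure in Pa (not given in the post)
    for t in (20.0, 100.0):
        print(f"T = {t:5.1f} C -> rho = {density(P0, t):.4f} kg/m3")
```

At fixed p0 the density falls as the temperature rises, which is the behaviour expected from this law.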
Unfortunately, I removed the folder /home/rolland/Documents/EXEMPLE3/CAS3/RESU_COUPLING/20151214-1052/solid, so I ran the addr2line command in a new folder (CAS4). I also added valgrind to the run_solver file, just before ./syrthes, and ran the case again. Both outputs follow; I hope they help.
Code (addr2line):
rolland@rolland-Precision-WorkStation-T7400:~/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-0927/solid$ addr2line -e syrthes 0x41d5f2 -f
ecrire_geom_cplcfd
??:?
rolland@rolland-Precision-WorkStation-T7400:~/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-0927/solid$ addr2line -e syrthes 0x414044 -f
cfd_surf_init
??:?
rolland@rolland-Precision-WorkStation-T7400:~/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-0927/solid$ addr2line -e syrthes 0x40a95b -f
syrthes
??:?
rolland@rolland-Precision-WorkStation-T7400:~/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-0927/solid$ addr2line -e syrthes 0x402c4f -f
main
??:?
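The `??:?` lines above mean that addr2line resolved the function names but found no file and line information, which usually indicates a binary built without debug symbols (`-g`). The sketch below is one way to pair up `addr2line -f` output into frames and flag the unresolved ones. The function name `parse_addr2line` and the dictionary keys are hypothetical, chosen only for illustration.

```python
def parse_addr2line(output: str):
    """Pair the lines printed by `addr2line -f`: a function name, then file:line."""
    lines = [ln.strip() for ln in output.splitlines() if ln.strip()]
    frames = []
    # With -f, each address yields two lines: function name, then location.
    for func, loc in zip(lines[0::2], lines[1::2]):
        source, _, lineno = loc.rpartition(":")
        # "??:?" (or "??:0") is printed when no line information is available.
        resolved = not (source.startswith("??") and lineno in ("?", "0"))
        frames.append(
            {"function": func, "source": source, "line": lineno, "resolved": resolved}
        )
    return frames


# Function/location pairs copied from the four addr2line calls in this post.
SAMPLE = """ecrire_geom_cplcfd
??:?
cfd_surf_init
??:?
syrthes
??:?
main
??:?
"""

if __name__ == "__main__":
    for frame in parse_addr2line(SAMPLE):
        print(frame)
```

addr2line also accepts several addresses in one call, for example `addr2line -f -e syrthes 0x41d5f2 0x414044 0x40a95b 0x402c4f`, and the combined output has the same two-lines-per-address layout.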
Code (valgrind):
rolland@rolland-Precision-WorkStation-T7400:~/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017$ ./run_solver
==27404== Memcheck, a memory error detector
==27404== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==27404== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==27404== Command: ./syrthes -d tmp.data -n 1 --name solid --log syrthes.log
==27404==
==27404== Conditional jump or move depends on uninitialised value(s)
==27404== at 0x5939A03: vfprintf (vfprintf.c:1661)
==27404== by 0x59F84F4: __vasprintf_chk (vasprintf_chk.c:66)
==27404== by 0x59F8431: __asprintf_chk (asprintf_chk.c:32)
==27404== by 0x545C0B5: opal_output_init (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404== by 0x54594E7: opal_init_util (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404== by 0x53A8F0A: ompi_mpi_init (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404== by 0x53C065F: PMPI_Init (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404== by 0x4029DF: main (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404==
==27404== Conditional jump or move depends on uninitialised value(s)
==27404== at 0x5939A03: vfprintf (vfprintf.c:1661)
==27404== by 0x59F84F4: __vasprintf_chk (vasprintf_chk.c:66)
==27404== by 0x59F8431: __asprintf_chk (asprintf_chk.c:32)
==27404== by 0xB9D4C6E: pml_v_output_open (in /usr/lib/openmpi/lib/openmpi/mca_pml_v.so)
==27404== by 0xB9D49CA: ??? (in /usr/lib/openmpi/lib/openmpi/mca_pml_v.so)
==27404== by 0x544790B: mca_base_components_open (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404== by 0x53E6B7B: mca_pml_base_open (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404== by 0x53A9198: ompi_mpi_init (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404== by 0x53C065F: PMPI_Init (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404== by 0x4029DF: main (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404==
==27404== Conditional jump or move depends on uninitialised value(s)
==27404== at 0x4C2E0F8: strlen (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==27404== by 0x597682D: strdup (strdup.c:41)
==27404== by 0x545BE21: opal_output_open (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404== by 0xB9D4C7B: pml_v_output_open (in /usr/lib/openmpi/lib/openmpi/mca_pml_v.so)
==27404== by 0xB9D49CA: ??? (in /usr/lib/openmpi/lib/openmpi/mca_pml_v.so)
==27404== by 0x544790B: mca_base_components_open (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404== by 0x53E6B7B: mca_pml_base_open (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404== by 0x53A9198: ompi_mpi_init (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404== by 0x53C065F: PMPI_Init (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404== by 0x4029DF: main (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404==
==27404== Conditional jump or move depends on uninitialised value(s)
==27404== at 0x4C2E0F8: strlen (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==27404== by 0x545BE33: opal_output_open (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404== by 0xB9D4C7B: pml_v_output_open (in /usr/lib/openmpi/lib/openmpi/mca_pml_v.so)
==27404== by 0xB9D49CA: ??? (in /usr/lib/openmpi/lib/openmpi/mca_pml_v.so)
==27404== by 0x544790B: mca_base_components_open (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404== by 0x53E6B7B: mca_pml_base_open (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404== by 0x53A9198: ompi_mpi_init (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404== by 0x53C065F: PMPI_Init (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404== by 0x4029DF: main (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404==
==27404== Syscall param writev(vector[...]) points to uninitialised byte(s)
==27404== at 0x59DF417: writev (writev.c:49)
==27404== by 0x8340062: mca_oob_tcp_msg_send_handler (in /usr/lib/openmpi/lib/openmpi/mca_oob_tcp.so)
==27404== by 0x8341225: mca_oob_tcp_peer_send (in /usr/lib/openmpi/lib/openmpi/mca_oob_tcp.so)
==27404== by 0x83450A5: mca_oob_tcp_send_nb (in /usr/lib/openmpi/lib/openmpi/mca_oob_tcp.so)
==27404== by 0x8134DE1: orte_rml_oob_send (in /usr/lib/openmpi/lib/openmpi/mca_rml_oob.so)
==27404== by 0x8135403: orte_rml_oob_send_buffer (in /usr/lib/openmpi/lib/openmpi/mca_rml_oob.so)
==27404== by 0x8750A0E: ??? (in /usr/lib/openmpi/lib/openmpi/mca_grpcomm_bad.so)
==27404== by 0x53A94EE: ompi_mpi_init (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404== by 0x53C065F: PMPI_Init (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404== by 0x4029DF: main (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404== Address 0x6ad4541 is 161 bytes inside a block of size 256 alloc'd
==27404== at 0x4C2CE8E: realloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==27404== by 0x5430EF9: opal_dss_buffer_extend (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404== by 0x54312AD: opal_dss_copy_payload (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404== by 0x540DDAD: orte_grpcomm_base_pack_modex_entries (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404== by 0x87508EF: ??? (in /usr/lib/openmpi/lib/openmpi/mca_grpcomm_bad.so)
==27404== by 0x53A94EE: ompi_mpi_init (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404== by 0x53C065F: PMPI_Init (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==27404== by 0x4029DF: main (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404==
==27404== Invalid read of size 8
==27404== at 0x41D5E5: ecrire_geom_cplcfd (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404== by 0x414043: cfd_surf_init (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404== by 0x40A95A: syrthes (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404== by 0x402C4E: main (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404== Address 0x6af3610 is 0 bytes after a block of size 16 alloc'd
==27404== at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==27404== by 0x44D8D4: lire_syrthes (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404== by 0x4432DE: lire_maill (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404== by 0x403E4D: syrthes (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404== by 0x402C4E: main (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404==
==27404== Invalid read of size 8
==27404== at 0x41D5EE: ecrire_geom_cplcfd (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404== by 0x414043: cfd_surf_init (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404== by 0x40A95A: syrthes (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404== by 0x402C4E: main (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404== Address 0xdf1fa70 is 0 bytes after a block of size 16 alloc'd
==27404== at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==27404== by 0x41D54B: ecrire_geom_cplcfd (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404== by 0x414043: cfd_surf_init (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404== by 0x40A95A: syrthes (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404== by 0x402C4E: main (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404==
==27404== Invalid read of size 8
==27404== at 0x41D5F2: ecrire_geom_cplcfd (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404== by 0x414043: cfd_surf_init (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404== by 0x40A95A: syrthes (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404== by 0x402C4E: main (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404== Address 0x50 is not stack'd, malloc'd or (recently) free'd
==27404==
==27404==
==27404== Process terminating with default action of signal 11 (SIGSEGV)
==27404== Access not within mapped region at address 0x50
==27404== at 0x41D5F2: ecrire_geom_cplcfd (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404== by 0x414043: cfd_surf_init (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404== by 0x40A95A: syrthes (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404== by 0x402C4E: main (in /home/rolland/Documents/EXEMPLE3/CAS4/RESU_COUPLING/20151221-1017/solid/syrthes)
==27404== If you believe this happened as a result of a stack
==27404== overflow in your program's main thread (unlikely but
==27404== possible), you can try to increase the size of the
==27404== main thread stack using the --main-stacksize= flag.
==27404== The main thread stack size used in this run was 8388608.
==27404==
==27404== HEAP SUMMARY:
==27404== in use at exit: 2,775,703 bytes in 6,156 blocks
==27404== total heap usage: 13,511 allocs, 7,355 frees, 13,775,573 bytes allocated
==27404==
==27404== LEAK SUMMARY:
==27404== definitely lost: 2,967 bytes in 46 blocks
==27404== indirectly lost: 30,912 bytes in 4 blocks
==27404== possibly lost: 0 bytes in 0 blocks
==27404== still reachable: 2,741,824 bytes in 6,106 blocks
==27404== suppressed: 0 bytes in 0 blocks
==27404== Rerun with --leak-check=full to see details of leaked memory
==27404==
==27404== For counts of detected and suppressed errors, rerun with: -v
==27404== Use --track-origins=yes to see where uninitialised values come from
==27404== ERROR SUMMARY: 22 errors from 8 contexts (suppressed: 0 from 0)
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI COMMUNICATOR 3 SPLIT FROM 0
with errorcode 1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 27404 on node rolland-Precision-WorkStation-T7400 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
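In the log above, the reports that concern SYRTHES itself are the three "Invalid read of size 8" entries in ecrire_geom_cplcfd: the first reads just past a 16-byte block allocated in lire_syrthes, and the last dereferences address 0x50, which triggers the SIGSEGV. The earlier "uninitialised value" reports all sit under PMPI_Init inside Open MPI. To pull only the invalid accesses out of a long log, something like the following sketch could be used. The names `invalid_accesses`, `HEADER_RE`, and `FRAME_RE` are hypothetical, and the regular expressions assume the line layout seen in this post.

```python
import re

# Error header, e.g. "==27404== Invalid read of size 8"
HEADER_RE = re.compile(r"^==\d+==\s+(Invalid (?:read|write) of size \d+)\s*$")
# Stack frame, e.g. "==27404== at 0x41D5E5: ecrire_geom_cplcfd (in /path/syrthes)"
FRAME_RE = re.compile(r"^==\d+==\s+(?:at|by) (0x[0-9A-Fa-f]+): (\S+) \(.*\)\s*$")


def invalid_accesses(log: str):
    """Return [(header, [(address, function), ...]), ...] for each invalid
    read/write, keeping only the call stack of the faulty access itself."""
    reports = []
    current = None
    for line in log.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current = (header.group(1), [])
            reports.append(current)
            continue
        frame = FRAME_RE.match(line)
        if current is not None and frame:
            current[1].append((frame.group(1), frame.group(2)))
        else:
            # Any other line (such as "Address 0x... is ...") ends the stack,
            # so the allocation stack that follows is not mixed in.
            current = None
    return reports


# Excerpt of the log in this post; paths shortened to "solid/syrthes".
SAMPLE = """==27404== Invalid read of size 8
==27404== at 0x41D5E5: ecrire_geom_cplcfd (in solid/syrthes)
==27404== by 0x414043: cfd_surf_init (in solid/syrthes)
==27404== by 0x40A95A: syrthes (in solid/syrthes)
==27404== by 0x402C4E: main (in solid/syrthes)
==27404== Address 0x6af3610 is 0 bytes after a block of size 16 alloc'd
==27404== at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==27404== by 0x44D8D4: lire_syrthes (in solid/syrthes)
"""

if __name__ == "__main__":
    for header, stack in invalid_accesses(SAMPLE):
        print(header)
        for address, function in stack:
            print(f"  {address} {function}")
```

The addresses collected this way can then be passed to addr2line, ideally against a binary rebuilt with debug symbols so that file and line numbers are reported.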
Thank you,
QR