Error in parallelization with different numbers of processors
Posted: Fri Oct 03, 2014 6:09 pm
Hello,
I'm trying to simulate the operating conditions of a gas oven. The simulation ran well until I tried to run it in parallel on several processors. With 12 processors, the simulation stops at iteration 1176 with the following error message:
SIGTERM signal (termination) received.
--> computation interrupted by environment.
Call stack:
1: 0x7f543f2f01e0 <opal_progress+0x50> (libmpi.so.1)
2: 0x7f543f23df75 <ompi_request_default_wait_all+0x145> (libmpi.so.1)
3: 0x7f543abf965e <ompi_coll_tuned_sendrecv_actual+0x10e> (mca_coll_tuned.so)
4: 0x7f543ac019aa <ompi_coll_tuned_barrier_intra_bruck+0x9a> (mca_coll_tuned.so)
5: 0x7f543f24b1c2 <PMPI_Barrier+0x72> (libmpi.so.1)
6: 0x7f544024e63d <cs_halo_sync_var_strided+0x78d> (libsaturne.so.0)
7: 0x7f54403e9e52 <+0x25ee52> (libsaturne.so.0)
8: 0x7f54403ee3db <cgdvec_+0x3db> (libsaturne.so.0)
9: 0x7f54404327e9 <grdvec_+0x189> (libsaturne.so.0)
10: 0x7f5440515106 <vissst_+0x246> (libsaturne.so.0)
11: 0x7f54402f6840 <phyvar_+0x1060> (libsaturne.so.0)
12: 0x7f5440324d39 <tridim_+0xe91> (libsaturne.so.0)
13: 0x7f544020b6cd <caltri_+0x27e9> (libsaturne.so.0)
14: 0x7f54401e3725 <cs_run+0xa55> (libsaturne.so.0)
15: 0x7f54401e3885 <main+0x155> (libsaturne.so.0)
16: 0x3fa801ed1d <__libc_start_main+0xfd> (libc.so.6)
17: 0x400809 <> (cs_solver)
End of stack
With 8 processors, the simulation stops at iteration 1428 with the same error message. With a single processor, however, it reaches the full 2000 iterations. Does anybody know why the behavior changes with the number of processors used?
Code_Saturne was compiled with openmpi-1.6.5. I also tried compiling CS against another Open MPI version (1.4.3), but the results are the same. I don't know what the problem could be. Any help or comments would be appreciated.
My version of Code_Saturne is 3.0.5.
Regards,
Juan Felipe Monsalvo