[ask] on parallel computing
Posted: Thu Mar 18, 2010 7:32 pm
Hi,
Today I encountered a strange phenomenon. I have been using a quad-core machine (Intel Q6600) for my simulations, but until now only two of its cores, and those calculations ran fine. Today I switched to 3 and 4 cores and found that the calculations run far longer than expected without ever finishing. Clearly something is wrong, but I don't know what.
With 3 cores, the run stalls at the first time step (I checked the listing file):
MAIN CALCULATION
================
===============================================================
===============================================================
INSTANT 0.100000000E+01 TIME STEP NUMBER 1
=============================================================
--- Phase: 1
---------------------------------
Property Min. value Max. value
---------------------------------
Density 0.8835E+03 0.8835E+03
LamVisc 0.1277E-01 0.1277E-01
---------------------------------
--- Diffusivity:
---------------------------------------
Scalar Number Min. value Max. value
---------------------------------------
TempC 1 0.6311E-04 0.6311E-04
---------------------------------------
** INFORMATION ON BOUNDARY FACES TYPE
----------------------------------
Phase : 1
-------------------------------------------------------------------------
Boundary type Code Nb faces
-------------------------------------------------------------------------
Inlet 2 50
Smooth wall 5 500
Rough wall 6 0
Symmetry 4 25000
Free outlet 3 50
Undefined 1 0
SIGINT signal (Control+C or equivalent) received.
--> computation interrupted by user.
Call stack:
1: 0x7f69e5899340 ? (?)
2: 0x7f69eb0af05a <opal_progress+0x5a> (libopen-pal.so.0)
3: 0x7f69e64e7995 ? (?)
4: 0x7f69e47d774f ? (?)
5: 0x7f69eb9cbe8c <PMPI_Allreduce+0x17c> (libmpi.so.0)
6: 0x7f69ece23d0f <parcpt_+0x2f> (libsaturne.so.0)
7: 0x7f69ecf1c5f5 <typecl_+0x16e5> (libsaturne.so.0)
8: 0x7f69ecd5a831 <condli_+0x1301> (libsaturne.so.0)
9: 0x7f69ecf013d1 <tridim_+0x6931> (libsaturne.so.0)
10: 0x7f69ecd4a0d5 <caltri_+0x5085> (libsaturne.so.0)
11: 0x7f69ecd254db <cs_run+0x83b> (libsaturne.so.0)
12: 0x7f69ecd257c5 <main+0x1f5> (libsaturne.so.0)
13: 0x7f69e92c6abd <__libc_start_main+0xfd> (libc.so.6)
14: 0x4007a9 ? (?)
End of stack
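The 3-core stack shows the ranks blocked inside PMPI_Allreduce (called from parcpt_), i.e. a collective that never completes. To narrow this down, a minimal standalone collective test along the lines of the sketch below should tell whether the hang is in Open MPI itself or in the solver; the file name and the mpicc/mpirun invocations are only my assumptions for a typical Open MPI installation, not anything taken from the Code_Saturne scripts.

/* Minimal Open MPI collective test (my own sketch, not part of Code_Saturne).
 * Build:  mpicc allreduce_test.c -o allreduce_test
 * Run:    mpirun -np 3 ./allreduce_test
 * If this also hangs with 3 ranks, the MPI layer is suspect rather than the solver. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size;
    long local, global;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Something trivial to reduce: the global sum of the rank numbers. */
    local = rank + 1;
    MPI_Allreduce(&local, &global, 1, MPI_LONG, MPI_SUM, MPI_COMM_WORLD);

    printf("rank %d of %d: global sum = %ld\n", rank, size, global);

    MPI_Finalize();
    return 0;
}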
With 4 cores, the run does not even reach the first time step:
...
Directory: /home/salad/tmp_Saturne/duct_2d.MEI.03181539
MPI ranks: 4
I/O mode: MPI-IO, explicit offsets
===============================================================
CALCULATION PREPARATION
=======================
===========================================================
Reading file: preprocessor_output
SIGINT signal (Control+C or equivalent) received.
--> computation interrupted by user.
Call stack:
1: 0x7fba619a73c0 ? (?)
2: 0x7fba674e3aad <mca_io_base_component_run_progress+0x3d> (libmpi.so.0)
3: 0x7fba66ba205a <opal_progress+0x5a> (libopen-pal.so.0)
4: 0x7fba674aa5f5 ? (?)
5: 0x7fba602ccdb6 ? (?)
6: 0x7fba674bf1b7 <MPI_Alltoall+0x107> (libmpi.so.0)
7: 0x7fba619aca3b ? (?)
8: 0x7fba619ae41c <ADIOI_GEN_ReadStridedColl+0xb8c> (mca_io_romio.so)
9: 0x7fba619c2712 <MPIOI_File_read_all+0x122> (mca_io_romio.so)
10: 0x7fba619c2977 <mca_io_romio_dist_MPI_File_read_at_all+0x27> (mca_io_romio.so)
11: 0x7fba674dc88f <MPI_File_read_at_all+0xff> (libmpi.so.0)
12: 0x7fba6856ef78 <fvm_file_read_global+0x178> (libfvm.so.0)
13: 0x7fba688a1a40 <cs_io_read_header+0x90> (libsaturne.so.0)
14: 0x7fba68931218 <ledevi_+0x158> (libsaturne.so.0)
15: 0x7fba6897c2a7 <iniini_+0xce3> (libsaturne.so.0)
16: 0x7fba6897effa <initi1_+0x16> (libsaturne.so.0)
17: 0x7fba68817d3e <cs_run+0x9e> (libsaturne.so.0)
18: 0x7fba688187c5 <main+0x1f5> (libsaturne.so.0)
19: 0x7fba64db9abd <__libc_start_main+0xfd> (libc.so.6)
20: 0x4007a9 ? (?)
End of stack
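Here the hang happens even earlier, inside MPI_File_read_at_all / ROMIO while reading the header of preprocessor_output (the listing reports "I/O mode: MPI-IO, explicit offsets"). A small MPI-IO test such as the sketch below, again my own and taking an arbitrary file as its argument, could show whether collective reads through ROMIO hang on this machine independently of Code_Saturne.

/* Minimal MPI-IO collective-read test (my own sketch, not part of Code_Saturne).
 * Build:  mpicc mpiio_test.c -o mpiio_test
 * Run:    mpirun -np 4 ./mpiio_test some_file
 * The file argument is arbitrary; any readable file will do.
 * If this also hangs, ROMIO / MPI-IO is suspect rather than the solver. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank;
    char buf[64];
    MPI_File fh;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (argc < 2) {
        if (rank == 0)
            fprintf(stderr, "usage: mpiio_test <file>\n");
        MPI_Finalize();
        return 1;
    }

    /* Open the file collectively, then do one collective read at offset 0,
     * mimicking the cs_io_read_header -> MPI_File_read_at_all pattern in the stack. */
    if (MPI_File_open(MPI_COMM_WORLD, argv[1], MPI_MODE_RDONLY,
                      MPI_INFO_NULL, &fh) != MPI_SUCCESS) {
        fprintf(stderr, "rank %d: could not open %s\n", rank, argv[1]);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    MPI_File_read_at_all(fh, 0, buf, (int)sizeof(buf), MPI_CHAR, &status);
    MPI_File_close(&fh);

    printf("rank %d: collective read completed\n", rank);

    MPI_Finalize();
    return 0;
}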
With 2 cores, everything works fine.
Any advice on this?
Many thanks.
Best regards,
Wayne
http://code-saturne.blogspot.com/