HPC parmetis issue
Hello,
I've been trying to get Code_Saturne running on our new departmental cluster.
My latest attempt was to build all of the dependencies individually, creating module files for each, and then to build Code_Saturne 3.0.5 against these dependencies.
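For reference, the configure invocation was along these lines (the prefixes are placeholders for our module install paths, and I may be forgetting a flag or two):

    ./configure --prefix=/shared/code_saturne/3.0.5 \
        --with-hdf5=/path/to/hdf5 --with-med=/path/to/med \
        CC=mpicc CXX=mpicxx FC=mpif90
    make && make install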
The issue occurs at the start of the main solver run and is the result of an MPI failure; please see the attached files.
You may notice that the MPI used here is impi-2013, which is the default MPI on our cluster; the results are the same when using OpenMPI.
The job runs in serial with no errors.
Eagerly awaiting your response, regards,
Martyn
Attachments: errors.tar (63.5 KiB)
Re: HPC parmetis issue
Hello,
As the errors occur in ParMetis, I suspect an installation issue. More specifically, you may be building and running with different builds of ParMetis, assuming other versions of that tool are already installed.
Could you post the "config.log" file from your installation, so I can see how the automatic module detection behaved?
Could you run "ldd ./cs_solver" in your execution directory (assuming you have user subroutines; otherwise, use the cs_solver executable in <install_prefix>/libexec/code_saturne/cs_solver)?
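For example, from the run directory, something like the following should show which MPI and partitioning libraries the executable is actually linked against (the grep patterns are only a suggestion):

    ldd ./cs_solver | grep -i -e mpi -e metis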
Could you also try running Code_Saturne with the built-in partitioner, using the "Morton curve in bounding box" option (in the GUI, under "Calculation Management/Performance tuning/Partitioning", or in user subroutines, cs_user_performance_tuning.c)?
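In user subroutines, the selection looks roughly like this (a sketch based on the examples shipped with the code; check the reference version of cs_user_performance_tuning.c for the exact names in your release):

    /* Sketch for cs_user_performance_tuning.c; names assumed from
       the shipped examples, verify against your version. The file
       template provides the required headers. */

    void
    cs_user_partition(void)
    {
      /* Select the built-in space-filling-curve partitioning
         (Morton curve in bounding box) for the main computation. */
      cs_partition_set_algorithm(CS_PARTITION_MAIN,
                                 CS_PARTITION_SFC_MORTON_BOX,
                                 1,       /* rank step */
                                 false);  /* do not ignore periodicity */
    }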
Regards,
Yvan
Re: HPC parmetis issue
Thanks for your reply, Yvan, very helpful again!
I hadn't noticed it was using ParMetis; in fact, I was having issues with ParMetis originally, which is why I chose to build Code_Saturne myself. I hadn't installed ParMetis (i.e. I didn't configure with '--with-parmetis=...'), but I guess the Code_Saturne install must have picked it up.
I tried building a copy of ParMetis (version 4.0.3) myself, but this resulted in the code hanging upon submission.
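For the record, that build went along these lines (the prefix is a placeholder, and I may have missed an option; ParMetis 4.x uses this make-config convention):

    cd parmetis-4.0.3
    make config cc=mpicc prefix=/path/to/parmetis/4.0.3 shared=1
    make && make install

and I then pointed the Code_Saturne configure at it with '--with-parmetis=/path/to/parmetis/4.0.3'.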
I am happy to report that with the Morton bounding-box option the code is running, albeit seemingly hitting a limit on the number of 'pipes' when I tried to run on 256+ processes; I've reduced this to 128 and it runs nicely. I think this is an MPI issue and I'll look into it in more detail.
The files/outputs you asked for are attached.
Attachments: errors2.tar (200 KiB)
Re: HPC parmetis issue
Hello,
ldd shows nothing related to ParMETIS, so you probably picked up a version based on a static library.
Also, from the config.log, it seems ParMetis was found in the default system directories.
The message about the number of pipes is strange. How many nodes are you running on? How many processes per node?
Be careful: the ldd output you sent seems to indicate you installed your own OpenMPI build in "/shared/code_saturne/software/openmpi/1.6.5/".
On a cluster, you usually do not want to be doing that, but should use the default MPI library (if several are available as modules, any one you want, as long as it is installed by the administrators). Otherwise, the MPI library might not be configured to use the high-speed network (such as InfiniBand) or to interact properly with the resource manager. This could lead to bad performance, and possibly to all instances running on the same node (which could explain the "pipe" message).
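A quick way to check the placement is to run something like the following through the batch system (the process count is illustrative):

    mpirun -np 8 hostname | sort | uniq -c

If every instance reports the same node although several were requested, the MPI library is not interacting with the resource manager.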
Regards,
Yvan