HPC ParMETIS issue

mkendrick

HPC ParMETIS issue

Post by mkendrick »

Hello,

I've been trying to get Code_Saturne running on our new departmental cluster.

My latest attempt was to build all dependencies individually, creating module files for each, and then to build Code_Saturne 3.0.5 against these dependencies.

The issue occurs at the start of the main code and is the result of MPI failing; please see the attached files.
You may notice that the MPI version used here is impi-2013, which is the default MPI on our cluster; the results are the same when using OpenMPI.

The job runs in serial with no errors.

Eagerly awaiting your response, regards,
Martyn
Attachments
errors.tar
(63.5 KiB) Downloaded 244 times
Last edited by mkendrick on Mon Dec 22, 2014 3:59 pm, edited 1 time in total.
Yvan Fournier

Re: HPC ParMETIS issue

Post by Yvan Fournier »

Hello,

As the errors occur in ParMETIS, I suspect an installation issue. More specifically, you may be building and running with different builds of ParMETIS, if other versions of that library are already installed on the system.

Could you post the "config.log" file from your installation, so I can try to see how the automatic module detection behaved?

Could you also run "ldd ./cs_solver" in your execution directory? (This assumes you have user subroutines; otherwise, run ldd on the cs_solver executable in <install_prefix>/libexec/code_saturne/cs_solver.)

Could you also try running Code_Saturne with the built-in partitioner, using the "Morton curve in bounding box" option? In the GUI, this is under "Calculation Management/Performance tuning/Partitioning"; in user subroutines, it is set in cs_user_performance_tuning.c (see the sketch below).
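
For reference, here is a minimal sketch of the user-subroutine route, assuming the 3.0 cs_partition API (cs_partition_set_algorithm() with the CS_PARTITION_SFC_MORTON_BOX option); adapt it to the cs_user_performance_tuning.c template shipped with your installation:

/* Sketch of cs_user_partition() in cs_user_performance_tuning.c,
   assuming the Code_Saturne 3.0 cs_partition API. */

#include "cs_partition.h"

void
cs_user_partition(void)
{
  /* Use the built-in space-filling-curve partitioner
     (Morton curve in bounding box) instead of ParMETIS
     for the main calculation stage. */

  cs_partition_set_algorithm(CS_PARTITION_MAIN,
                             CS_PARTITION_SFC_MORTON_BOX,
                             1,       /* rank_step: partition on all ranks */
                             false);  /* do not ignore periodicity */
}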

Regards,

Yvan
mkendrick

Re: HPC ParMETIS issue

Post by mkendrick »

Thanks for your reply, Yvan; very helpful again!

I hadn't noticed it was using ParMETIS; in fact, I was having issues with ParMETIS originally, which is why I chose to build Code_Saturne myself. I hadn't installed ParMETIS (i.e., I didn't configure with '--with-parmetis=...'), but I guess the Code_Saturne install must have picked it up anyway.
I tried building a copy of ParMETIS (version 4.0.3) myself, but this resulted in the code hanging on submission.

I am happy to report that with the Morton bounding-box partitioner the code is running. It does seem to hit a limit on the number of 'pipes' when I try to run on 256+ processes, so I've reduced this to 128 and it runs nicely; I think this is an MPI issue I'll look into in more detail.

The files/outputs you asked for are attached.
Attachments
errors2.tar
(200 KiB) Downloaded 241 times
Yvan Fournier

Re: HPC ParMETIS issue

Post by Yvan Fournier »

Hello,

ldd shows nothing related to ParMETIS, so you probably picked up a version built as a static library.
Also, from the config.log, it seems ParMETIS was found in the default system directories.

The message about the number of pipes is strange. How many nodes are you running on? How many processes per node?

Be careful: the ldd output you sent seems to indicate you installed your own OpenMPI build in "/shared/code_saturne/software/openmpi/1.6.5/".

On a cluster, you usually do not want to do that; you should use the default MPI library (if several are available as modules, any one you want, as long as it was installed by the administrators). Otherwise, the MPI library might not be configured to use the high-speed network (such as InfiniBand) or to interact properly with the resource manager. This could lead to bad performance, and possibly to all instances running on the same node (which could explain the "pipe" message).
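
To check process placement quickly, here is a small stand-alone MPI test (a sketch, not part of Code_Saturne) which prints where each rank runs; if all ranks report the same node, the MPI build is probably ignoring the resource manager's allocation:

/* placement_check.c: print which node each MPI rank runs on.
   Build with "mpicc placement_check.c -o placement_check" and run it
   through the same job script / MPI launcher as Code_Saturne. */

#include <mpi.h>
#include <stdio.h>

int
main(int argc, char *argv[])
{
  int rank, size, name_len;
  char node_name[MPI_MAX_PROCESSOR_NAME];

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Get_processor_name(node_name, &name_len);

  printf("rank %d of %d on node %s\n", rank, size, node_name);

  MPI_Finalize();
  return 0;
}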

Regards,

Yvan