Issue when running CS on multi-procs

Leo
Posts: 3
Joined: Mon Aug 23, 2021 11:22 am

Post by Leo »

Hello,

We installed Code_Saturne 7.0.0 on CentOS 7, and everything works fine when running in serial mode.
The problem appears when the case is launched on multiple cores:

1 - In the GUI terminal, I got the following error (cf. code_saturne_error).
code_saturne_error.png
2 - I tried changing the MPI module that is loaded by adding
  • setenv OMPI_MCA_pml "ucx"
  • setenv OMPI_MCA_btl "^vader,tcp,openib"
Result: I no longer get any error, but there is no listing file, only the run.cfg (which shows that I ran on 2 procs), the setup.xml, and the summary file.
code_saturne_error_2.PNG
I looked through the logs to see whether any errors were reported, but there are none.
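
For anyone who wants to reproduce the two settings above without editing the module file, here is a minimal Python sketch. It relies only on the fact that Open MPI reads MCA parameters from environment variables named OMPI_MCA_<parameter>; the helper name open_mpi_env is hypothetical and is not part of code_saturne or Open MPI.

```python
import os


def open_mpi_env(mca_params, base_env=None):
    """Return a copy of an environment with Open MPI MCA parameters set.

    Each entry of mca_params becomes a variable named OMPI_MCA_<name>,
    which mirrors what the two setenv lines do in the module file.
    """
    env = dict(os.environ if base_env is None else base_env)
    for name, value in mca_params.items():
        env["OMPI_MCA_" + name] = value
    return env


# Same values as the two setenv lines quoted in the post.
# base_env={} keeps the demonstration independent of the current shell.
env = open_mpi_env({"pml": "ucx", "btl": "^vader,tcp,openib"}, base_env={})
for key in sorted(env):
    print(f"{key}={env[key]}")
```

The returned dictionary can be passed as the env argument of subprocess.run, so the settings apply to a single test launch and leave the module file untouched.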



I then reverted the openmpi module to its original settings, but the error still does not appear.

I also tried running the case on our cluster: it worked on 1 proc, but the same thing happens with more than one:
  • No error in the files generated by SLURM
  • No listing
  • The SLURM log on the master node shows that the run was successful
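
The symptom in the list above (the run reports success but the listing is absent) can be checked automatically after each run. The following is only a sketch: the file names are the ones mentioned in this post, the function name missing_outputs is hypothetical, and the demonstration uses a throwaway directory instead of a real results folder.

```python
from pathlib import Path
import tempfile


def missing_outputs(run_dir, expected=("run.cfg", "setup.xml", "summary", "listing")):
    """Return the expected file names that are absent from run_dir."""
    run_dir = Path(run_dir)
    return [name for name in expected if not (run_dir / name).is_file()]


# Mimic the situation described above: everything is there except the listing.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("run.cfg", "setup.xml", "summary"):
        (Path(tmp) / name).write_text("")
    demo_missing = missing_outputs(tmp)

print(demo_missing)
```

Calling such a check at the end of a batch script turns a silent failure into a visible one, even when the scheduler itself reports a successful job.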
Any help would be appreciated!
Thank you,

Léo
Yvan Fournier
Posts: 4070
Joined: Mon Feb 20, 2012 3:25 pm

Re: Issue when running CS on multi-procs

Post by Yvan Fournier »

Hello,

Do you use a packaged Scotch install, or did you build your own? I invite you to read the notes on Scotch build options in the code_saturne installation documentation: https://github.com/code-saturne/code_sa ... INSTALL.md.

Unfortunately, as far as I know (unless I missed recent updates), Scotch provides no function for querying whether it was built with threading support, which would allow adapting the MPI initialization method accordingly. And since this can degrade performance on many MPI implementations, I would rather lose a bit of performance with unthreaded Scotch once per run than in the code_saturne algorithms for the whole run...

On the cluster, is the post-install configured to use SLURM? Do you have any messages in the SLURM job logs?

Best regards,

Yvan
Leo
Posts: 3
Joined: Mon Aug 23, 2021 11:22 am

Re: Issue when running CS on multi-procs

Post by Leo »

Hello,

Thank you for your time.

We tried both: first with one we built ourselves, then with one installed through the semi-automatic installation (the libraries installed by the Python script were used to compile code_saturne).
In each case, we get no error when running on multiple procs, but there is neither a listing nor any error in the RESU folder.

I tried compiling it without the SCOTCH library, just with ParMETIS, and the same thing occurs.
I really don't know why there are no error messages anymore.

With SLURM, I added the "code_saturne.cfg" file and only set "batch = SLURM". It works fine on one proc, but nothing happens on 2 or more.
I have no listing file and nothing written in the SLURM error log, and here is what I get in the output log file:
code_saturne_3.PNG
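
To confirm that the "batch = SLURM" setting is actually the one being read, the file can be parsed with Python's standard configparser module. This is only a sketch: the [install] section name and the sample contents are assumptions made for illustration, so check them against the real code_saturne.cfg, and the function name batch_system is hypothetical.

```python
import configparser


def batch_system(cfg_text, section="install"):
    """Return the value of the 'batch' key, or None if it is not set.

    The section name is an assumption; adjust it to match the real file.
    """
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return parser.get(section, "batch", fallback=None)


# Illustrative contents only, not a copy of the real file.
sample_cfg = """
[install]
batch = SLURM
"""

print(batch_system(sample_cfg))
```

For a file on disk, the same function can be called with the result of pathlib.Path(...).read_text().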

Best regards,
Léo
Leo
Posts: 3
Joined: Mon Aug 23, 2021 11:22 am

Re: Issue when running CS on multi-procs

Post by Leo »

Hello,

I reinstalled code_saturne using an older version of Open MPI, and it now works in parallel.
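
Since the Open MPI version turned out to matter, it may help to record it alongside each run. Below is a small sketch that extracts the version from text of the kind printed by "mpirun --version" on Open MPI installs; the sample string and its version number are purely illustrative (not the versions used here), and the function name parse_open_mpi_version is hypothetical.

```python
import re


def parse_open_mpi_version(text):
    """Return (major, minor, patch) from Open MPI version text, or None."""
    match = re.search(r"\(Open MPI\)\s+(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


# Illustrative sample; the version number is made up for the example.
sample = "mpirun (Open MPI) 4.1.1"
print(parse_open_mpi_version(sample))
```

The text to parse can be obtained with subprocess.run(["mpirun", "--version"], capture_output=True, text=True).stdout on a machine where mpirun is on the PATH.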

Thanks again for your help !

Regards,
Léo