parallel computing using sgi-mpt

lzhang
Posts: 44
Joined: Mon Nov 06, 2017 2:54 pm

parallel computing using sgi-mpt

Post by lzhang

Hello,

I have a question about parallel computing on a cluster. On my local machine, the installed MPI library is Open MPI, and when I launch "code_saturne run --param case1.xml --nprocs 10", it works correctly.

However, on the cluster, the installed MPI library is sgi-mpt/2.14, and when running the simulation I get the error "MPT ERROR: mpiexec_mpt must be used to launch all MPI applications", followed by "solver script exited with status 255."

I think the "code_saturne run" command uses some default Open MPI settings that need to be changed when using SGI MPT, but I don't know which parameters to modify. Do you have any ideas on how to solve this problem?

Best regards,
Lei
Yvan Fournier
Posts: 4070
Joined: Mon Feb 20, 2012 3:25 pm

Re: parallel computing using sgi-mpt

Post by Yvan Fournier

Hello,

Did you read the "post-installation" section of the installation guide? It explains which file to edit to change the detected defaults and adapt to batch systems (so it is essential for clusters).

Regards,

Yvan
lzhang

Re: parallel computing using sgi-mpt

Post by lzhang

Hello,

I have started reading the post-installation section. In my case, jobs are managed by PBS, and once PBS is properly set up, the following command correctly launches an MPI program:
mpiexec_mpt -n 240 ./prog_mpi_mpt_exe
where prog_mpi_mpt_exe is the executable.
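Rather than hard-coding the process count (240 above), a PBS job script can derive it from the node file PBS creates for each job. The sketch below assumes PBS Pro behaviour, where each chunk of a `select=N:...:mpiprocs=M` request contributes M lines to `$PBS_NODEFILE`; the helper name `count_mpi_ranks` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: count MPI ranks from a PBS node file.
# Assumes PBS Pro semantics (one line per MPI process, i.e. per mpiprocs slot).
count_mpi_ranks() {
  local nodefile="${1:-${PBS_NODEFILE:-}}"
  if [ -z "$nodefile" ] || [ ! -r "$nodefile" ]; then
    echo "count_mpi_ranks: no readable node file" >&2
    return 1
  fi
  # tr strips the padding some wc implementations add
  wc -l < "$nodefile" | tr -d ' '
}

# Inside a PBS job, the launch line could then read:
#   mpiexec_mpt -n "$(count_mpi_ranks)" ./prog_mpi_mpt_exe
```

Deriving the count this way keeps the `-n` value consistent with the resources actually requested in the `#PBS -l select=...` line.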

My question is: how can I launch a Code_Saturne simulation with this kind of command? What I usually do is run, for example, "code_saturne run --param case1.xml --nproc 8". I thought about using the runcase script in SCRIPTS/, but it is obviously not an executable.

Best regards,
Lei
Yvan Fournier

Re: parallel computing using sgi-mpt

Post by Yvan Fournier

Hello,

In that case, set "mpiexec_mpt" as the mpiexec command in the code_saturne.cfg file.

I also recommend setting the batch type to PBS (or to the absolute path of a copied and modified version of extras/batch.PBS from the source directory; the copy's name must end in .PBS), so that the GUI recognises the batch system (for all newly created cases, or for cases updated by running "code_saturne create --import-only" from the CASE directory). With the batch system set up, the GUI switches from "direct execution" to "submission" mode: it creates the execution directory, copies the data and compiles user subroutines, then runs "qsub" on the remaining operations, rather than running "qsub" on the base runcase.
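The two settings described above might look as follows in a configuration file. This is a sketch only: the section and key names (`batch` under `[install]`, `mpiexec` under `[mpi]`) are assumed from the `code_saturne.cfg` template and should be checked against the template shipped with your version. The helper `write_mpt_pbs_cfg` is hypothetical and refuses to overwrite an existing file, so it cannot clobber a site configuration.

```shell
#!/bin/bash
# Hypothetical helper: create a code_saturne configuration file selecting
# the SGI MPT launcher and the PBS batch system.
# Section/key names are assumed from the code_saturne.cfg template.
write_mpt_pbs_cfg() {
  local cfg_file="$1"
  # Do not overwrite an existing (possibly site-wide) configuration.
  if [ -e "$cfg_file" ]; then
    echo "write_mpt_pbs_cfg: $cfg_file already exists" >&2
    return 1
  fi
  cat > "$cfg_file" <<'EOF'
[install]
# Batch system type, or absolute path to a copied template ending in .PBS
batch = PBS

[mpi]
# SGI MPT requires its own launcher
mpiexec = mpiexec_mpt
EOF
}
```

For example, `write_mpt_pbs_cfg <install-prefix>/etc/code_saturne.cfg` would create the file in the installation's `etc/` directory, assuming no configuration file exists there yet; if one does, edit the corresponding keys in place instead.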

Regards,

Yvan
lzhang

Re: parallel computing using sgi-mpt

Post by lzhang

Hello,

I modified the "code_saturne.cfg" file as you suggested.

I also modified the "SCRIPTS/runcase" file as follows to set up PBS:

Code:

#!/bin/bash

#PBS -S /bin/bash
#PBS -N job-VIV
#PBS -o output.txt
#PBS -e error.txt
#PBS -l walltime=02:00:00
#PBS -l select=2:ncpus=8:mpiprocs=8:mem=1gb
#PBS -P projetviv

# Module load
module purge
module load sgi-mpt/2.14

# Go to the directory where the job has been submitted 
cd $PBS_O_WORKDIR

# Ensure the correct command is found:

export PATH=/home/lzhang/Code_Saturne/5.0.8/code_saturne-5.0.8/arch/Linux_x86_64/bin:$PATH

# Run command:
\code_saturne run --param tuto_viv_vimpo.xml --nproc 8

However, I get the following error:

Code:

Traceback (most recent call last):
  File "/home/lzhang/Code_Saturne/5.0.8/code_saturne-5.0.8/arch/Linux_x86_64/bin/code_saturne", line 76, in <module>
    retcode = cs.execute()
  File "/home/lzhang/Code_Saturne/5.0.8/code_saturne-5.0.8/arch/Linux_x86_64/lib/python2.7/site-packages/code_saturne/cs_script.py", line 93, in execute
    return self.commands[command](options)
  File "/home/lzhang/Code_Saturne/5.0.8/code_saturne-5.0.8/arch/Linux_x86_64/lib/python2.7/site-packages/code_saturne/cs_script.py", line 168, in run
    return cs_run.main(options, self.package)
  File "/home/lzhang/Code_Saturne/5.0.8/code_saturne-5.0.8/arch/Linux_x86_64/lib/python2.7/site-packages/code_saturne/cs_run.py", line 387, in main
    return run(argv, pkg)[0]
  File "/home/lzhang/Code_Saturne/5.0.8/code_saturne-5.0.8/arch/Linux_x86_64/lib/python2.7/site-packages/code_saturne/cs_run.py", line 375, in run
    stages=stages)
  File "/home/lzhang/Code_Saturne/5.0.8/code_saturne-5.0.8/arch/Linux_x86_64/lib/python2.7/site-packages/code_saturne/cs_case.py", line 1937, in run
    retcode = self.prepare_data(force_id)
  File "/home/lzhang/Code_Saturne/5.0.8/code_saturne-5.0.8/arch/Linux_x86_64/lib/python2.7/site-packages/code_saturne/cs_case.py", line 1497, in prepare_data
    d.compile_and_link()
  File "/home/lzhang/Code_Saturne/5.0.8/code_saturne-5.0.8/arch/Linux_x86_64/lib/python2.7/site-packages/code_saturne/cs_case_domain.py", line 613, in compile_and_link
    stderr=log)
  File "/home/lzhang/Code_Saturne/5.0.8/code_saturne-5.0.8/arch/Linux_x86_64/lib/python2.7/site-packages/code_saturne/cs_compile.py", line 439, in compile_and_link
    stdout=stdout, stderr=stderr) != 0:
  File "/home/lzhang/Code_Saturne/5.0.8/code_saturne-5.0.8/arch/Linux_x86_64/lib/python2.7/site-packages/code_saturne/cs_exec_environment.py", line 524, in run_command
    p = subprocess.Popen(cmd, universal_newlines=True, env = env, **kwargs)
  File "/usr/lib64/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/lib64/python2.7/subprocess.py", line 1327, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

Do you have any ideas on how to fix this problem?

Best regards,
Lei
Yvan Fournier

Re: parallel computing using sgi-mpt

Post by Yvan Fournier

Hello,

Did you use the "code_saturne submit" command, or did you just submit the runcase?
I am not sure you followed the second part of the recommendation regarding the batch scripts.
Also, why do you modify the loaded modules in the runcase file? Was the code configured with "--with-modules=no" to handle modules externally? Otherwise, are you sure the modules you load are the ones used to build the code?
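The traceback in the previous post ends with `subprocess.Popen` raising `OSError: [Errno 2]` inside `compile_and_link`, which typically means the program being launched (presumably the compiler used to build user subroutines) could not be found, for example because `module purge` removed it from `PATH`. A pre-flight check in the runcase can make such a problem visible before the solver starts. This is a sketch: the helper `check_commands` is hypothetical, and the command names in the usage comment are assumptions; use the compilers and launcher your build was actually configured with.

```shell
#!/bin/bash
# Hypothetical helper: report commands that cannot be found on PATH.
# Returns the number of missing commands (0 if all are found).
check_commands() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing from PATH: $cmd" >&2
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# Example, placed after any "module load" lines in the runcase:
#   check_commands mpicc mpif90 mpiexec_mpt || exit 1
```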

Regards,

Yvan
lzhang

Re: parallel computing using sgi-mpt

Post by lzhang

Hello,

I used "qsub runcase" to submit the job. Maybe the correct way is to use "code_saturne submit"?
Yvan Fournier wrote: "I am not sure you followed the second part of the recommendation regarding the batch scripts."
Do you mean that I need to specify the location of batch.PBS in the code_saturne.cfg file, as I did in the attached file?
Yvan Fournier wrote: "Also, why do you modify the loaded modules in the runcase file? Was the code configured with "--with-modules=no" to handle modules externally? Otherwise, are you sure the modules you load are the ones used to build the code?"
The sgi-mpt/2.14 module is the MPI library I used to build the code. It is not loaded by default on the cluster, so I load it manually to be able to use the "mpiexec_mpt" command.

I tried using "code_saturne submit ../SCRIPTS/runcase" to submit the job, but I get the error "qsub: script file:: No such file or directory".

Thanks in advance for your help!

Best regards,
Lei
Attachment: code_saturne.cfg.txt
Yvan Fournier

Re: parallel computing using sgi-mpt

Post by Yvan Fournier

Hello,

Your configuration file seems OK at first glance.

It might be safer to run "code_saturne submit ./runcase" from the SCRIPTS directory rather than "code_saturne submit ../SCRIPTS/runcase" from another directory, as there might be some path simplification issues (in any case, that is how it is usually tested).
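Following that advice, a small wrapper can change into the case's SCRIPTS directory before submitting. The `cd` happens in a subshell, so the caller's working directory is left unchanged. The function name `submit_case` is hypothetical; only the `code_saturne submit ./runcase` call comes from the reply above.

```shell
#!/bin/bash
# Hypothetical wrapper: submit a case from its own SCRIPTS directory,
# without changing the caller's working directory.
submit_case() {
  local case_dir="$1"
  if [ ! -d "$case_dir/SCRIPTS" ]; then
    echo "submit_case: no SCRIPTS directory in $case_dir" >&2
    return 1
  fi
  # Subshell: the cd does not leak out of this function
  ( cd "$case_dir/SCRIPTS" && code_saturne submit ./runcase )
}
```

Usage: `submit_case /path/to/STUDY/CASE1`, where the path is a placeholder for your own case directory.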

Regarding modules: if a module is loaded when you configure and install the code, the code detects it and reloads the correct modules when you run it (so you can have different builds with different modules, and do not need to change your environment when choosing between them). So unless this causes incorrect module detection (in which case you can use "--with-modules=no" at configuration time and handle modules externally), you should not need to manage modules yourself.

Regards,

Yvan
lzhang

Re: parallel computing using sgi-mpt

Post by lzhang

Hello,

I finally succeeded in running a parallel simulation on the cluster by following the instructions given in https://www.hpc.ntnu.no/display/hpc/Tut ... de+Saturne.

Best regards,
Lei
Yvan Fournier

Re: parallel computing using sgi-mpt

Post by Yvan Fournier

Hello Lei,

Good that you have a working solution. If you have extra time for this, we can keep iterating to try to understand why the solution I suggested did not work. That solution was not available in version 3.0, for which the instructions you found were written. Those instructions are still valid, but my recommended solution requires a bit less script modification from the user (and a bit more from the person installing the code), so I would be happy to have it working on this type of machine in future versions.

If you don't have the time, it's OK, as long as you are able to use the code on that machine.

Best regards,

Yvan