[SOLVED] Code_Saturne + SLURM: Errors

Posted: Wed Nov 21, 2018 7:01 pm
by FredH
Hello Specialists,

I'm trying to install Code_Saturne for one user on my HPC (w/ SLURM).
CentOS Linux release 7.3.1611 (Core)

I "succeeded" once in installing v5.1.5 (on an NFS shared folder) with the auto-install script.
But the user reported a SIGTERM signal (error.png).

So I tried to compile other flavors: first the stable version v5.0.9, then v5.0.9 debug, v5.3.0, and v5.1.5 debug, initially with the auto-install script and now manually.

Now I'm facing this error with every new initialization/version, even a fresh v5.1.5 :shock: :

$ code_saturne run --initialize -p setup.xml --id=test1

                      Code_Saturne
                      ************

 Version:   5.0
 Path:      /work/projects/Code_Saturne/test

 Result directory:
   /scratch/hmzf/Test1/Case1-xml/RESU/test1


 Single processor Code_Saturne simulation.


 ***************************
  Preprocessing calculation
 ***************************

Traceback (most recent call last):
  File "/work/projects/Code_Saturne/test/bin/code_saturne", line 76, in <module>
    retcode = cs.execute()
  File "/work/projects/Code_Saturne/test/lib/python2.7/site-packages/code_saturne/cs_script.py", line 93, in execute
    return self.commands[command](options)
  File "/work/projects/Code_Saturne/test/lib/python2.7/site-packages/code_saturne/cs_script.py", line 168, in run
    return cs_run.main(options, self.package)
  File "/work/projects/Code_Saturne/test/lib/python2.7/site-packages/code_saturne/cs_run.py", line 387, in main
    return run(argv, pkg)[0]
  File "/work/projects/Code_Saturne/test/lib/python2.7/site-packages/code_saturne/cs_run.py", line 375, in run
    stages=stages)
  File "/work/projects/Code_Saturne/test/lib/python2.7/site-packages/code_saturne/cs_case.py", line 1945, in run
    mpiexec_options)
  File "/work/projects/Code_Saturne/test/lib/python2.7/site-packages/code_saturne/cs_case.py", line 1646, in preprocess
    d.preprocess()
  File "/work/projects/Code_Saturne/test/lib/python2.7/site-packages/code_saturne/cs_case_domain.py", line 807, in preprocess
    retcode = run_command(cmd, pkg=self.package)
  File "/work/projects/Code_Saturne/test/lib/python2.7/site-packages/code_saturne/cs_exec_environment.py", line 524, in run_command
    p = subprocess.Popen(cmd, universal_newlines=True, env = env, **kwargs)
  File "/usr/lib64/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/lib64/python2.7/subprocess.py", line 1327, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

Every initialization is done in a fresh session (logout/login), after exporting the version path and alias.


Attached:
- install.txt: what I did for the install (manually this time) + post-install
- mpic_versions.txt: -v of mpicc & mpic++
- configure.log.txt: the output of configure
- config.log
- run.txt: run Code_Saturne as user.
- test1: the RESU of the initialize



I'm really not a Code_Saturne specialist, and neither is my user, so I would be grateful for any help. I've probably missed something...
Thanks a lot.
Regards

Re: Code_Saturne + SLURM: Errors

Posted: Thu Nov 22, 2018 6:51 pm
by Yvan Fournier
Hello,

It seems that in your configuration options, you added --disable-frontend, which means the preprocessor is not available (we need to add a check for this to have a better error message).

So with this option, you can only use pre-imported "mesh_input"/"mesh_output" files or directories.
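
The traceback above ends in subprocess.Popen raising "OSError: [Errno 2]", which is what Python reports when the program named in the command cannot be found. Below is a minimal Python 3 sketch of the kind of check mentioned above. run_command_checked is a hypothetical helper, not part of Code_Saturne, and it only shows the idea of failing early with a readable message.

```python
import shlex
import shutil
import subprocess


def run_command_checked(cmd):
    """Run cmd (a string or an argument list), failing early with a
    readable message if the executable cannot be found.

    Hypothetical helper for illustration; not Code_Saturne code.
    """
    args = shlex.split(cmd) if isinstance(cmd, str) else list(cmd)
    if not args:
        raise ValueError("empty command")
    # shutil.which handles both bare names (searched in PATH) and
    # explicit paths; it returns None when nothing executable is found.
    if shutil.which(args[0]) is None:
        raise RuntimeError(
            "executable not found: %r (was this component built and "
            "installed?)" % args[0]
        )
    return subprocess.call(args)
```

With such a check, a missing preprocessor binary would produce a one-line explanation instead of a bare "No such file or directory" from deep inside subprocess.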

Otherwise, since you are using SLURM, did you do the post-installation step involving code_saturne.cfg?

Regards,

Yvan

Re: Code_Saturne + SLURM: Errors

Posted: Fri Nov 23, 2018 2:38 pm
by FredH
Hello Yvan,

Oh yes, thank you for the clarification.
It's my first install, and this confirms that I'm really not familiar with how Code_Saturne is used.
Neither is my user, really... (no preprocessing at the beginning, preprocessing now... v5.1, v5.0.9 now...).

For the post-install, I followed point 8 of the install manual but missed the compute_versions setting in the cfg file. :oops: (I need a break to clear my head.)

So, two compilations must be done:
  • For Front-end without MPI
  • For Compute Nodes: --disable-frontend + MPI
I'll try to test this as soon as possible and keep you posted.

Thanks for your help.

Re: Code_Saturne + SLURM: Errors

Posted: Sat Nov 24, 2018 1:31 pm
by Yvan Fournier
Hello,

No, you do not need to disable the front-end on most installations. In our case, we usually install a "main" production version and a "debug" build (using --enable-debug). I often add --disable-frontend to the debug build to avoid duplicate installs of the preprocessor and documentation, and add the debug build to the "compute_versions" of the main build. This makes it possible to do everything from the main build, including choosing the compute version from the GUI, but this is an "advanced" (though recommended) install.

Completely different builds on the front-end and compute nodes are only necessary when the two system types are quite different (as on IBM Blue Gene machines, or some Crays, but not most clusters).

Simply installing the code on a cluster and adding SLURM or a path to a SLURM template file to the "batch" entry in code_saturne.cfg should work fine.
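
As a rough illustration, the entries discussed here can be inspected with Python's configparser. This is only a sketch: the [install] section name, the template path, and the debug-build path are assumptions made for the example (check them against the code_saturne.cfg template shipped with your version), and describe_cfg is a hypothetical helper.

```python
import configparser

# Example file contents; the section name and both paths are assumptions.
EXAMPLE_CFG = """
[install]
batch = /opt/site/code_saturne/batch.SLURM
compute_versions = /opt/code_saturne/5.0.9-debug
"""


def describe_cfg(text):
    """Return the batch and compute_versions entries from cfg text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    batch = parser.get("install", "batch", fallback="")
    versions = parser.get("install", "compute_versions", fallback="")
    return {
        "batch": batch,
        # An absolute path points at a site-specific template file;
        # anything else is treated here as a batch system name.
        "batch_is_template_path": batch.startswith("/"),
        "compute_versions": versions,
    }


if __name__ == "__main__":
    print(describe_cfg(EXAMPLE_CFG))
```

A quick check like this can confirm that the "batch" entry is actually set in the file the installed build reads, before creating a new case.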

Best regards,

Yvan

Re: Code_Saturne + SLURM: Errors

Posted: Mon Dec 10, 2018 6:05 pm
by FredH
Thank you very much for the answer.

Sorry for the delay. I hope to be able to do a quick test this week (between maintenance windows and filesystem issues).

Best regards.

Re: Code_Saturne + SLURM: Errors

Posted: Tue Mar 05, 2019 2:44 pm
by FredH
Hello,
sorry for the delay (admin life...)

Here is the last error reported by the user (error2.png)

It seems simple; I'll try to find the cause on my side too.

Regards.

Re: Code_Saturne + SLURM: Errors

Posted: Tue Mar 05, 2019 3:09 pm
by Yvan Fournier
Hello,

For OpenMPI, using SLURM, on at least one of our systems, we set
mpiexec_n_per_node =
(empty string)

in etc/code_saturne.cfg so as to avoid the automatic -ppn setting. This should work here also. Though I don't understand why it appears if you did not set it (in bin/cs_exec_environment.py, it is set if the detected mpiexec is Hydra, from MPICH, so should not appear here).
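
To illustrate why an empty value suppresses the flag, here is a sketch under stated assumptions: the setting is assumed to live in an [mpi] section, and the launcher line is assumed to be built by simple concatenation. mpiexec_args and its default of " -ppn " are hypothetical and do not reproduce the actual logic in cs_exec_environment.py.

```python
import configparser


def mpiexec_args(cfg_text, n_procs, procs_per_node, default_ppn_opt=" -ppn "):
    """Assemble launcher arguments; an empty mpiexec_n_per_node entry
    in the config disables the per-node option entirely.

    Hypothetical helper for illustration; not Code_Saturne code.
    """
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    # A key that is present but empty is read as "", which overrides
    # the default; a missing key falls back to the default.
    ppn_opt = parser.get("mpi", "mpiexec_n_per_node", fallback=default_ppn_opt)
    args = ["mpiexec", "-n", str(n_procs)]
    if ppn_opt.strip():
        args += [ppn_opt.strip(), str(procs_per_node)]
    return args
```

The point of the sketch is the difference between a key that is absent (an automatically chosen default may apply) and a key that is present but empty (the option is dropped).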

Best regards,

Yvan

Re: Code_Saturne + SLURM: Errors

Posted: Tue Mar 05, 2019 5:54 pm
by FredH
Hello,
thanks for the quick answer, the ppn is removed now.

Another quick one:
I've added some #SBATCH lines in code_saturne-5.0.9/share/code_saturne/batch/batch.SLURM, but they do not appear in run_solver after a "code_saturne run --initialize...". Have I missed something again?

Thanks

Re: Code_Saturne + SLURM: Errors

Posted: Wed Mar 06, 2019 12:50 am
by Yvan Fournier
Hello,

Did you reinstall the code after the modification?
As a note, you can use a batch template (named <basename>.SLURM if using SLURM) which is separate from the code sources, and use an absolute path in the code_saturne.cfg "batch" entry to use it (which avoids requiring reinstalling and also allows separating sources from "site" files).

In any case, you also need to build a new case (or import one using "code_saturne create --import-only" from within the base directory of a case) for the batch template to be rebuilt.
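
One way to check whether directives from a template reached a generated script is to compare their #SBATCH lines. A minimal sketch follows; the helper names are hypothetical, and the sample texts are invented stand-ins for the contents of the batch template and of the generated script.

```python
def sbatch_directives(text):
    """Return the #SBATCH lines of a script, stripped, in order."""
    return [
        line.strip()
        for line in text.splitlines()
        if line.strip().startswith("#SBATCH")
    ]


def missing_directives(template_text, script_text):
    """Directives present in the template but absent from the script."""
    present = set(sbatch_directives(script_text))
    return [d for d in sbatch_directives(template_text) if d not in present]


if __name__ == "__main__":
    # Invented sample contents, for illustration only.
    template = "#!/bin/bash\n#SBATCH --nodes=2\n#SBATCH --partition=compute\n"
    script = "#!/bin/bash\n#SBATCH --nodes=2\ncd $SLURM_SUBMIT_DIR\n"
    print(missing_directives(template, script))
```

In practice the two strings would be read from the template file named in the "batch" entry and from the script of a newly created case.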

Best regards,

Yvan

Re: Code_Saturne + SLURM: Errors

Posted: Wed Mar 06, 2019 3:42 pm
by FredH
Dear Yvan,

Great news,
Finally my user was able to run a SLURM job on many nodes/cores.

He is using the run_solver in ../RESU/RUN_TEST of his case...
Adding the #SBATCH options and setting up OpenMPI properly made it run.

Thanks a lot for your time and your support.

Best regards.