Use of Azure Batch or Google Cloud HPC to run simulation on cloud cluster

Miscellaneous discussion topics about Code_Saturne (development, ...)
antoineb
Posts: 26
Joined: Mon Sep 16, 2019 4:06 pm

Post by antoineb »

Hi,

I'm wondering if anyone has ever tried to run code_saturne simulations on an Azure Batch cluster or Google Cloud HPC?
I'm not exactly sure how to set up the whole thing, but before diving into some time-consuming trial and error, I wanted to know if there was any point doing so...

I am referring to the Azure Batch and Google Cloud HPC infrastructures. To give an idea of what I am trying to achieve:
  • Automatically deploy a code_saturne container (I successfully created a Docker image running code_saturne v8.1 and SALOME v9.11 on Debian 12 locally)
  • Upload setup files for mesh generation and simulation onto cloud storage
  • Run the simulation on 16, 64 or 1000 CPUs depending on the case
Best regards,
Antoine
Yvan Fournier
Posts: 4130
Joined: Mon Feb 20, 2012 3:25 pm

Re: Use of Azure Batch or Google Cloud HPC to run simulation on cloud cluster

Post by Yvan Fournier »

Hello,

There have been successful tests on AWS clusters using ARM Graviton processors, including large cases on more than 1000 cores (and we are working on improving the settings for this type of configuration), but I do not know of any runs on Azure or Google cloud infrastructures.

Basically, there is no reason the code should not work, but efficiency and scalability will depend heavily on the underlying hardware (both processor and network). The MPI performance will probably depend heavily on the container type used (for example, on many clusters, Singularity is recommended over Docker, both for security and performance aspects).

Since the latency aspects of the MPI library can have a high impact on performance (especially when running on multiple nodes), you will probably want to check the scalability of the code. If those services allow you to choose the type of hardware (I think this is the case), choose something with a fast interconnect.
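
A scalability check of this kind can be sketched with a small shell helper. This is only an illustration: `scaling_report` is a hypothetical function, and the timings fed to it below are made-up placeholders, not measurements from any cluster. It reads lines of `<cores> <elapsed seconds>` and reports speedup and parallel efficiency relative to the first line.

```shell
#!/bin/bash
# Hypothetical helper: compute speedup and parallel efficiency relative to
# the first (smallest) run. Input lines on stdin: "<cores> <elapsed_seconds>".
scaling_report() {
  awk '
    NR == 1 { base_cores = $1; base_time = $2 }
    {
      speedup = base_time / $2
      efficiency = speedup / ($1 / base_cores)
      printf "%d cores: speedup %.2f, efficiency %.0f%%\n", $1, speedup, efficiency * 100
    }'
}

# Example with made-up timings (placeholders, not real measurements).
printf '%s\n' "64 1000" "128 520" "256 290" | scaling_report
```

An efficiency that drops sharply when going from one node to several may point to the interconnect or the MPI settings rather than to the solver itself.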

Best regards,

Yvan
antoineb
Posts: 26
Joined: Mon Sep 16, 2019 4:06 pm

Re: Use of Azure Batch or Google Cloud HPC to run simulation on cloud cluster

Post by antoineb »

Hi Yvan,

I finally set up the whole thing on an AWS cluster using Graviton CPUs, as suggested.
But I am now facing a challenge launching code_saturne on multiple nodes.

Here is what I have:
  • AWS cluster with a head node and 10 compute nodes of 64 cores each (no multithreading). I used C6gn instances, as they support EFA, which is AWS's fast interconnect.
  • SLURM as job manager
  • A Singularity image with code_saturne and all dependencies installed (it works in a local Singularity environment with no nodes or job manager)
I am trying to launch my calculation with the following scripts:

cs_job.sh

Code:

#!/bin/bash
#SBATCH --job-name=mpi_cs
#SBATCH --output=csout.out
#SBATCH --nodes=2
#SBATCH --cpus-per-task=1
#SBATCH --time=00:10:00
#SBATCH --partition=c6gn

/shared/apps/singularity/4.0.3/bin/singularity exec -e --no-home --bind ../:/TEST,../../:/mnt ../../code_saturne.sif /mnt/cs_compute.sh

exit 0

cs_compute.sh

Code:

#!/bin/bash

shopt -s expand_aliases
alias code_saturne=${cs_path}

code_saturne run

exit 0

My run.cfg file is set up to use 128 processes with 1 thread each.

But I get the following output:

Code:

                      code_saturne
                      ============

Version:   8.1.0
Path:      /opt/code_saturne/8.1.0

Result directory:
  /home/ec2-user/TEST/relief/RESU/20240126-2244

Copying base setup data
-----------------------

Compiling and linking user-defined functions
--------------------------------------------

Preparing calculation data
--------------------------

 Parallel code_saturne on 128 processes.

Preprocessing calculation
-------------------------

Starting calculation
--------------------

--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 128
slots that were requested by the application:

  ./cs_solver

Either request fewer slots for your application, or make more slots
available for use.

A "slot" is the Open MPI term for an allocatable unit where we can
launch a process.  The number of slots available are defined by the
environment in which Open MPI processes are run:

  1. Hostfile, via "slots=N" clauses (N defaults to number of
     processor cores if not provided)
  2. The --host command line parameter, via a ":N" suffix on the
     hostname (N defaults to 1 if not provided)
  3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
  4. If none of a hostfile, the --host command line parameter, or an
     RM is present, Open MPI defaults to the number of processor cores

In all the above cases, if you want Open MPI to default to the number
of hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.

Alternatively, you can use the --oversubscribe option to ignore the
number of available slots when deciding the number of processes to
launch.
--------------------------------------------------------------------------
 solver script exited with status 1.

Error running the calculation.

Check run_solver.log and error* files for details.

Domain None (code_saturne):
  run_solver.log, error*.

Post-calculation operations
---------------------------

Run failed in calculation stage.

I am new to this and I don't really understand how SLURM, OPENMP and code_saturne work together to set the proper number of CPUs, tasks and nodes.

Can you give me a hint on how to make this work?
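
For readers hitting the same message, the mismatch can be made visible with a small check. This is a sketch under assumptions: `check_slots` is a hypothetical helper, and it relies on SLURM exporting `SLURM_NTASKS` inside the job, which may only happen when a task count (for example `--ntasks` or `--ntasks-per-node`) is requested. The cs_job.sh script above asks for 2 nodes but no task count, while run.cfg asks for 128 processes.

```shell
#!/bin/bash
# Hypothetical check: compare the number of MPI ranks requested for the
# solver with the number of tasks SLURM allocated to the current job.
check_slots() {
  local requested=$1
  # Assumption: SLURM_NTASKS is unset outside a job, or when the job script
  # did not request a task count; treat that case as 0 allocated tasks.
  local allocated=${SLURM_NTASKS:-0}
  if [ "$requested" -le "$allocated" ]; then
    echo "ok: $requested ranks fit in $allocated SLURM tasks"
  else
    echo "mismatch: $requested ranks requested, $allocated SLURM tasks allocated"
    return 1
  fi
}

# 128 matches the process count set in run.cfg in this thread.
check_slots 128 || echo "consider requesting tasks explicitly, e.g. --ntasks-per-node=64"
```

Note that this only covers the SLURM side. If the MPI launcher inside the container cannot see the SLURM allocation at all, it may fall back to the local core count, which would give the same error.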

Best regards,
Antoine

----------
EDIT 29/01:
----------

I did some digging on Google (and asked ChatGPT) and got to the point where a calculation runs perfectly on 1 node, with the number of cores set in my SLURM script and the run.cfg file removed from the code_saturne case/DATA directory.

However, when I try to run on 2 nodes I get the following error:

Code:

                      code_saturne
                      ============

Version:   8.1.0
Path:      /opt/code_saturne/8.1.0

Result directory:
  /home/ec2-user/TEST/relief/RESU/20240129-1102_121

Copying base setup data
-----------------------

Preparing calculation data
--------------------------

 Parallel code_saturne on 128 processes.

Preprocessing calculation
-------------------------

Starting calculation
--------------------

--------------------------------------------------------------------------
An ORTE daemon has unexpectedly failed after launch and before
communicating back to mpirun. This could be caused by a number
of factors, including an inability to create a connection back
to mpirun due to a lack of common network interfaces and/or no
route found between them. Please check network connectivity
(including firewalls and network routing requirements).
--------------------------------------------------------------------------
 solver script exited with status 1.

Error running the calculation.

Check run_solver.log and error* files for details.

Domain None (code_saturne):
  run_solver.log, error*.

Post-calculation operations
---------------------------

Run failed in calculation stage.

The main difference since the edit is that I installed SLURM and munge inside the container, and I used the --bind option when running Singularity to share the SLURM and munge configuration from the host with the container.

Best regards,
Antoine
Yvan Fournier
Posts: 4130
Joined: Mon Feb 20, 2012 3:25 pm

Re: Use of Azure Batch or Google Cloud HPC to run simulation on cloud cluster

Post by Yvan Fournier »

Hello,

To understand how code_saturne runs using SLURM, there are 2 important points in the documentation:
- Configuring SLURM in the post-install step: https://github.com/code-saturne/code_sa ... tall-steps. If you do not do this, you will need to work against the main "code_saturne run" command instead of working with it. At the very least, you can set "batch=SLURM" in the configuration file, and it is best to actually point to a SLURM batch template adapted to your machine, rather than to the generic one provided, so that you do not need to edit parameters such as partitions for each new case.

- The following section https://www.code-saturne.org/documentat ... ation.html provides many more details on how the various run steps are organized.

Once the code is configured, we can use "code_saturne submit" instead of "code_saturne run" to run the batch job.

Actually, if you have exclusive use of the VMs and are running just one job on them, SLURM might not be needed. But to use more than one node, you will probably need either to configure SLURM or to pass additional options to the mpiexec command (such as a hosts file) to make sure srun or mpiexec is aware that multiple nodes are available.
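
As an illustration of the hosts file option, here is a minimal sketch that builds an Open MPI style hostfile from a list of node names. `make_hostfile` and the `hostfile.txt` name are hypothetical, and 64 slots per node is simply the core count of the nodes described in this thread.

```shell
#!/bin/bash
# Hypothetical helper: turn node names (one per line on stdin) into an
# Open MPI hostfile with a fixed number of slots per node.
make_hostfile() {
  local slots=$1
  local host
  while read -r host; do
    if [ -n "$host" ]; then
      echo "$host slots=$slots"
    fi
  done
}

# Inside a SLURM job, scontrol can expand the compact node list into names.
# Outside a job this step is skipped.
if [ -n "${SLURM_JOB_NODELIST:-}" ]; then
  scontrol show hostnames "$SLURM_JOB_NODELIST" | make_hostfile 64 > hostfile.txt
fi
```

With Open MPI, such a file can be passed to the launcher with `--hostfile hostfile.txt`. How that option reaches the launcher used by code_saturne depends on the post-install configuration.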

In any case, whatever your configuration choices, when a computation fails in this way, take a look at the "run_solver" script generated in the execution directory. It contains the actual MPI launcher command (srun/mpiexec/...) used, depending on the post-install configuration, and whatever the code managed to auto-detect otherwise.
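
A quick way to look at that launcher command is sketched below. `show_launcher` is a hypothetical helper, and the list of launcher names it searches for is an assumption, not an exhaustive list of what code_saturne may generate.

```shell
#!/bin/bash
# Hypothetical helper: print the lines (with line numbers) of a generated
# run_solver script that mention a common MPI launcher.
show_launcher() {
  local script=$1
  grep -nE '(^|[[:space:]/])(srun|mpiexec|mpirun)([[:space:]]|$)' "$script"
}

# Usage: pass the execution directory as first argument (defaults to ".").
run_dir=${1:-.}
if [ -f "$run_dir/run_solver" ]; then
  show_launcher "$run_dir/run_solver" || echo "no known launcher found in run_solver"
fi
```

The execution directory is the one printed as "Result directory" at the start of the run output.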

Best regards,

Yvan
antoineb
Posts: 26
Joined: Mon Sep 16, 2019 4:06 pm

Re: Use of Azure Batch or Google Cloud HPC to run simulation on cloud cluster

Post by antoineb »

Hi Yvan,

Thank you for your answer.

However I still don't completely get it.

I set up the config file with a custom batch.SLURM file that I bind into the Singularity container so that code_saturne can find it.

Code:

#SBATCH --nodes=2
#SBATCH --ntasks-per-node=64
#SBATCH --cpus-per-task=1
#SBATCH --time=0:10:00
#SBATCH --partition=c6gn
#SBATCH --output=job_%j.out.log
#SBATCH --error=job_%j.err.log
#SBATCH --job-name=csjob

Code:

[install]
batch=/csconfig/batch.SLURM

[mpi]
mpiexec = mpirun
mpiexec_n = ' -np '

I also bound the slurm.conf directory and the munge directory (when I did not, I had errors about the SLURM configuration, cluster name, etc.).

When I run code_saturne submit through the Singularity container with the following scripts:

Code:

#!/bin/bash
#SBATCH --job-name=csjob
#SBATCH -N 2
#SBATCH -n 128
#SBATCH --ntasks-per-node=64

set -x

/shared/apps/singularity/4.0.3/bin/singularity exec --bind /run/munge:/run/munge,/opt/slurm/etc:/opt/slurm/etc,../../csconfig:/csconfig,../../:/mnt ../../code_saturne.sif /mnt/cs_submit.sh

exit 0

Code:

#!/bin/bash
set -x
shopt -s expand_aliases
alias code_saturne=${cs_path}

code_saturne submit --id=testrun --param=setup.xml

exit 0

I get the following error:

Code:

/var/spool/slurmd/job00348/slurm_script: line 9: /opt/code_saturne/8.1.0/bin/code_saturne: No such file or directory

If I understand correctly, the script is trying to call code_saturne on the nodes, but it is not installed there, only in the container.

The SLURM/MPI installation works, as I am able to launch some basic Python MPI jobs across nodes using:

Code:

#!/bin/bash
#SBATCH -N 2                  # Number of nodes
#SBATCH -n 6                # Total number of tasks
#SBATCH --ntasks-per-node=3  # Number of tasks per node

mpirun -np $SLURM_NTASKS /shared/apps/singularity/4.0.3/bin/singularity exec mpi_container.sif python3 hello_mpi.py

I'm not sure what I'm doing wrong with code_saturne...
Yvan Fournier
Posts: 4130
Joined: Mon Feb 20, 2012 3:25 pm

Re: Use of Azure Batch or Google Cloud HPC to run simulation on cloud cluster

Post by Yvan Fournier »

Hello,

To run in a manner similar to non-containerized installs, you would need to call either "code_saturne run" or "code_saturne submit" from within the container.

"code_saturne submit" will first prepare the execution directory "immediately" (i.e. run "code_saturne run --stage"), then submit the remaining steps to SLURM using sbatch.

"code_saturne run" does no SLURM submission, so it needs to be called from a SLURM job (either using sbatch or salloc).

I'll ask colleagues who have more experience with Singularity to check your post (they have built Singularity containers with code_saturne). Since in your case it fails very early (not finding the install in /opt probably means the containerized command is not called correctly), you could probably test this with a simpler example script calling a hello world MPI program before doing this with code_saturne. You will need the code and its prerequisites (which require the container) to run on the nodes.

I can try to check with the people from AWS who ran benchmarks, but in the benchmarking case, the runs probably did not need to be automated in the same way, so lower-level commands in the run directory may have been used.

Best regards,

Yvan
antoineb
Posts: 26
Joined: Mon Sep 16, 2019 4:06 pm

Re: Use of Azure Batch or Google Cloud HPC to run simulation on cloud cluster

Post by antoineb »

Hi Yvan,

I think I have a configuration issue between SLURM on the nodes and MPI inside the container.
For now, I will try to install code_saturne directly onto the nodes since this cluster will be used exclusively for this purpose...

I'll keep you updated.

Thanks for the advice!

Best regards,
Antoine
antoineb
Posts: 26
Joined: Mon Sep 16, 2019 4:06 pm

Re: Use of Azure Batch or Google Cloud HPC to run simulation on cloud cluster

Post by antoineb »

Hi Yvan,

It works perfectly now that I have installed code_saturne on a /shared partition visible to all nodes.
I can just submit code_saturne run through sbatch, and performance improves a lot when running on multiple nodes!
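
For reference, a job script for this kind of setup could look like the sketch below. It is not the exact script used here: the install path under /shared, the `CS_BIN` variable and the `run_case` function are assumptions for illustration, while the node counts, partition name and setup.xml file name are taken from earlier in this thread. It is meant to be submitted with sbatch from the case directory.

```shell
#!/bin/bash
#SBATCH --job-name=csjob
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=64
#SBATCH --cpus-per-task=1
#SBATCH --partition=c6gn

# Hypothetical wrapper: run the case only if the code_saturne launcher is
# reachable from this node (i.e. the shared partition is mounted).
run_case() {
  local cs_bin=$1
  if [ ! -x "$cs_bin" ]; then
    echo "code_saturne not found at $cs_bin (is the shared partition mounted?)"
    return 1
  fi
  "$cs_bin" run --param=setup.xml
}

# Assumed install location on the shared file system; override with CS_BIN.
status=0
run_case "${CS_BIN:-/shared/code_saturne/8.1.0/bin/code_saturne}" || status=$?
echo "code_saturne exit status: $status"
```

Because the install lives on a file system shared by all nodes, the same path resolves on every compute node, which is what the containerized attempt was missing.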

Using AWS's FSx for Lustre file system, I gained 25% in performance on 512 cores with a 30M-cell mesh, thanks to better I/O speeds.

I am now looking into getting the simulation to run as fast as possible to reduce costs. I saw that using PT-Scotch might bring a 10 to 50% gain in speed. I built code_saturne with PT-Scotch support:

Code:

../code_saturne-${CODE_SATURNE_VERSION}/configure --prefix=${cs_path} \
--with-hdf5=${hdf5_path} --with-med=${med_path} \
--with-scotch=${scotch_path} --without-metis \
--disable-static --disable-gui \
PYTHON=/usr/bin/python3 CXX=mpicxx CC=mpicc FC=mpif90 LDFLAGS=-lgdal

PT-Scotch was built with:

Code:

cmake -DCMAKE_INSTALL_PREFIX:PATH=${scotch_path} -DCMAKE_C_FLAGS="-fPIC" ..

The build worked well, but when I run a simulation it gets stuck right after reading mesh_input.csm:

Code:

Local case configuration:

  Date:                Fri Feb  9 15:51:48 2024
  System:              Linux 5.15.0-1051-aws (Ubuntu 22.04.3 LTS)
  Machine:             c6gn-dy-c6gn16xlarge-5
  Processor:

  Memory:              126512 MB
  User:                ubuntu (AWS ParallelCluster user)
  Directory:           /scratch/TEST/altarea_PREPRO/RESU/prepro
  MPI ranks:           256 (appnum attribute: 0)
  MPI ranks per node:  64
  OpenMP threads:      1
  Processors/node:     64

  Compilers used for build:
    C compiler:        gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
    C++ compiler:      g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
    Fortran compiler:  GNU Fortran (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0

  MPI version: 3.1 (Open MPI 4.1.6)
  OpenMP version: 4.5

  External libraries:
    PT-SCOTCH 7.0.4

  I/O read method:     collective MPI-IO (explicit offsets)
  I/O write method:    collective MPI-IO (explicit offsets)
  I/O rank step:        1


 Reading file:        mesh_input.csm
 Finished reading:    mesh_input.csm

I saw your answer here: viewtopic.php?p=17083 about modifying the Scotch MPI rank step, and tried it, but the run is still stuck...

Here is the relevant part of setup.xml:

Code:

<calculation_management>
  <block_io/>
  <partitioning>
    <rank_step>8</rank_step>
    <type>scotch</type>
  </partitioning>
  <run_type>mesh preprocess</run_type>
  <start_restart>
    <frozen_field status="off"/>
  </start_restart>
</calculation_management>

Should I create another post on the forum?

Best regards,
Antoine
Yvan Fournier
Posts: 4130
Joined: Mon Feb 20, 2012 3:25 pm

Re: Use of Azure Batch or Google Cloud HPC to run simulation on cloud cluster

Post by Yvan Fournier »

Hello,

Yes, creating another post in the "code_saturne usage" section will be better. I will probably also move this thread from discussion to "installation", as it started as a general discussion subject but evolved into an install issue.

If the computation is stuck, I recommend trying to switch to the "crystal router" option in "Performance settings/MPI algorithms/All to all data movement". But I can answer that in the "general usage" section.

Best regards,

Yvan