Parallel computing on a cluster

Questions and remarks about code_saturne usage

Parallel computing on a cluster

Post by Ruonan »

Dear developers,

Please could you help me with these questions?

Recently I compiled the latest GitHub version on my university's cluster. This time I did not use the GUI: I copied the mesh, the xml file (generated by the GUI on my desktop PC) and the src files to the corresponding folders on the cluster, edited the node and CPU numbers in DATA/run.cfg, and then ran the "code_saturne run" command in the DATA folder (roughly the sequence sketched below my questions). The case runs, but I have three questions:

1. On the cluster I have two nodes, each with 20 CPUs. In the run.cfg file I wrote "n_procs: 2" and "n_threads: 20". Is this correct? Does "n_threads" mean the total number of CPUs or the number of CPUs per node?

2. Is there anything else I need to specify in the run.cfg file, for example the input/output method, the MPI rank step, etc.?

3. When I want to stop the calculation and save the results, what command should I run in the terminal? In the GUI I can click "Stop now", but in the terminal I do not know how to stop it.
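
For reference, the sequence of commands I mentioned above was roughly the following (the case name, paths and login here are only placeholders):

Code:

# copy the case folder (mesh, xml, user sources) prepared on my desktop PC
scp -r MYCASE user@cluster:/path/to/cases/

# on the cluster: adjust the resources, then launch the run
cd /path/to/cases/MYCASE/DATA
vi run.cfg          # set the node and CPU numbers (n_procs / n_threads)
code_saturne run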

Many thanks and best regards,
Ruonan

Re: Parallel computing on a cluster

Post by Yvan Fournier »

Hello,

n_procs is the number of MPI processes used, and n_threads is the number of OpenMP threads per MPI process.

I do not recommend more than 2 threads per MPI process, as OpenMP is not used everywhere, so (n_procs = 40, n_threads = 1) or (n_procs = 20, n_threads = 2) are the recommended options for your 2 nodes of 20 CPUs.
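
For example, for the 40 CPUs of your two nodes, the relevant lines of run.cfg would be something like this (only a sketch; keep the keys in whichever resource/job section your run.cfg already contains):

Code:

n_procs: 40
n_threads: 1

or, for the 2-threads-per-process variant, "n_procs: 20" and "n_threads: 2".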

There have already been similar questions regarding performance on this forum, so you should find more info by searching.

You will find detailed documentation on the run.cfg file here: https://www.code-saturne.org/documentat ... rg_run_cfg

To stop the code, look here: https://www.code-saturne.org/documentat ... ntrol_file

Regards,

Yvan

Re: Parallel computing on a cluster

Post by Ruonan »

Hi Yvan,

Thanks for your reply! It is very helpful.

Regarding stopping the code, I tried but failed. I created a "control_file", added the line "<time_step_number>1000" to it, and put this file in the DATA folder (I also tried the SRC folder). At that moment the case had already run more than 1000 time steps, so I expected the calculation to stop immediately after the current time step. But the calculation did not stop; nothing happened. Could you please tell me what I did wrong?

Best regards,
Ruonan

Re: Parallel computing on a cluster

Post by Yvan Fournier »

Hello,

The control_file must be placed in the execution folder (RESU/<run_id>) to be used.

If it is placed in DATA, it will be copied to RESU/<run_id> for each subsequent run (probably not what you want).

Regards,

Yvan

Re: Parallel computing on a cluster

Post by Ruonan »

Hi Yvan,

Thanks for your reply! I tried, but when I put the control_file in the RESU/<run_id> folder, the control_file was deleted immediately and the calculation did not stop. Did I write the control_file incorrectly? I only have one line in it:

Code:

<time_step_number>100
Thanks for checking!

Best regards,
Ruonan

Re: Parallel computing on a cluster

Post by Ruonan »

Hi Yvan,

Sorry, I had written the control_file incorrectly. Simply putting the number "1" in the control_file and copying it to the results folder works.
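
In other words, from the case directory, something like the following stops the run cleanly (<run_id> being the id of the running computation):

Code:

echo 1 > RESU/<run_id>/control_file

The solver picks the file up at the end of the current time step (which is why it disappears), then stops and saves the results.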

Many thanks,
Ruonan

Re: Parallel computing on a cluster

Post by Ruonan »

Hello Yvan,

Could you please help me with this error? I ran a test case in parallel on the cluster, using the settings you recommended, but it still fails.

I have 1 node with 27 CPUs. I set "n_procs: 27, n_threads: 1" in run.cfg, but the calculation cannot start. The error is shown below; the run_solver.log file and two error files are attached.

Code:

  ----------------------------------------------------------
 Composing periodicities

 Halo construction with standard neighborhood
 ============================================

 Face interfaces creation
 Definition of periodic vertices
 Vertex interfaces creation
 Halo creation
 Halo definition
    Local halo definition
    Distant halo creation
SIGINT signal (Control+C or equivalent) received.
--> computation interrupted by user.

Call stack:
   1: 0x7fdbf92d9296 <PMPIDI_CH3I_Progress+0x1146>    (libmpi.so.12)
   2: 0x7fdbf93e4c29 <MPIC_Wait+0x39>                 (libmpi.so.12)
   3: 0x7fdbf93e526a <MPIC_Recv+0xea>                 (libmpi.so.12)
   4: 0x7fdbf92bdeef <MPIR_Barrier_intra+0x2ff>       (libmpi.so.12)
   5: 0x7fdbf92bd875 <I_MPIR_Barrier_intra+0x125>     (libmpi.so.12)
   6: 0x7fdbf92bd6cc <MPIR_Barrier+0xc>               (libmpi.so.12)
   7: 0x7fdbf92bd5fc <MPIR_Barrier_impl+0x4c>         (libmpi.so.12)
   8: 0x7fdbf92bf482 <PMPI_Barrier+0x1c2>             (libmpi.so.12)
   9: 0x7fdbfb56bf5f <+0x5f4f5f>                      (libsaturne-7.1.so)
  10: 0x7fdbfb56e229 <cs_mesh_halo_define+0x1139>     (libsaturne-7.1.so)
  11: 0x7fdbfb52e817 <cs_mesh_init_halo+0x1cd7>       (libsaturne-7.1.so)
  12: 0x7fdbfb106aa0 <cs_preprocess_mesh+0x370>       (libsaturne-7.1.so)
  13: 0x7fdbfc156b96 <main+0x2d6>                     (libcs_solver-7.1.so)
  14: 0x7fdbf89e6c05 <__libc_start_main+0xf5>         (libc.so.6)
  15: 0x401879     <>                               (cs_solver)
End of stack
What is strange is that when I decrease the number of processes to "n_procs: 8, n_threads: 1", the calculation runs with no error. I also tried running this case on my desktop PC, and it runs with no error. So I think the case setup is fine and the error is related to parallel running.

(I am using the master version from GitHub. When I compiled the code on the cluster, I used the semi-automatic installation method. PT-Scotch and ParMETIS were installed with no errors.)

Could you please guide me on what is wrong here?

Many thanks,
Ruonan
Attachments
run_solver.log
error_r09.log
error.log

Re: Parallel computing on a cluster

Post by Ruonan »

Hi Yvan,

Thank you! I still have no idea what caused the previous error, but I tried other nodes with different specifications, and all the errors disappeared. I think some nodes on my cluster are not compatible with code_saturne, or perhaps need special settings for some reason.

I tested the parallel performance on the cluster using 2 to 56 cores. Could you tell me, based on your experience, whether this parallel performance is good or not?

Please see the two graphs below. The speedup ratio is calculated as (time using 1 core)/(time using n cores), and the parallel efficiency as (speedup ratio)/(number of cores). I get a parallel efficiency of about 40%. I followed the suggestion of 20,000 to 80,000 cells per core; this suggested region is highlighted in green.
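
As an example of how these values are computed (the times here are made up, just to illustrate the formulas): if 1 core takes 1000 s for the run and 28 cores take 90 s, the speedup ratio is 1000 / 90 ≈ 11.1 and the parallel efficiency is 11.1 / 28 ≈ 0.4, i.e. about 40%.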

I can see that you are very experienced in parallel optimization, with many papers published. I really appreciate your help.

Best regards,
Ruonan
Attachments
parallel efficiency.jpg
speedup ratio.jpg

Re: Parallel computing on a cluster

Post by Yvan Fournier »

Hello,

What type of cluster do you have (processor type, network type, ..., and even which MPI library install)? From one of your older logs I would say Intel(R) Xeon(R) Gold 5120, but that does not tell me whether you have a fast network (InfiniBand, for example) or something with higher latency (Gigabit Ethernet, maybe), or whether the MPI drivers make the best of the network (the compilers and system seem old).

Even on our own clusters (the latest is the one described here: https://top500.org/system/179899/), we can observe a factor of 2 in performance depending on the compilers and especially the MPI library configuration used (between optimized libraries and a "generic" workstation-type configuration).

Partitioning quality may play a major role, as well as load balance. Do you have the performance.log files for some of your runs on different numbers of processes?

Also, are other codes running on the same nodes, or do you have exclusive access (such as when using SLURM's --exclusive option, or whatever equivalent option LSF, Torque, or your scheduler/resource manager may have)?
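
For example, with SLURM a minimal job script requesting exclusive access to a node would look roughly like this (only a sketch; the task count, time limit and paths are placeholders, and the exact way of launching may differ on your site):

Code:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=27
#SBATCH --exclusive
#SBATCH --time=04:00:00

# run from the case's DATA directory so that run.cfg is found
cd /path/to/MYCASE/DATA
code_saturne run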

And finally, some specific models might not scale as well as the commonly used ones. The info in timer_stats.csv and performance.log can provide precious feedback, helping to see where things are slower.

All these factors can be important.

Best regards,

Yvan

Re: Parallel computing on a cluster

Post by Ruonan »

Hello Yvan,

Thanks a lot for your very useful comments. I have attached the performance.log files for the runs using 2 cores, 28 cores and 56 cores. I would really appreciate it if you could check them.

Here are some details of my cluster:
Processor type: Intel(R) Xeon(R) CPU E5-2660 v4 @ 2.00GHz
Network type: op
MPI version: 3.0 (MPICH 3.1.2)

I have "gold-5120" nodes as well, but these nodes give me errors described in the previous post. So I can only use "e5-2660" nodes now.

Yes, I use the "--exclusive" option.

Best regards,
Ruonan
Attachments
56cores-performance.log
28cores-performance.log
2cores-performance.log