Parallel computing on a cluster

Ruonan
Posts: 136
Joined: Mon Dec 14, 2020 11:38 am

Re: Parallel computing on a cluster

Post by Ruonan »

Hello Yvan,

Here are the timer_stats.csv files, in case they help. Thank you!

Best regards,
Ruonan
Attachments
28cores-timer_stats.csv (125.72 KiB)
56cores-timer_stats.csv (270.2 KiB)
2cores-timer_stats.csv (413.96 KiB)
Yvan Fournier
Posts: 4070
Joined: Mon Feb 20, 2012 3:25 pm

Re: Parallel computing on a cluster

Post by Yvan Fournier »

Hello,

Comparison is a bit tricky, since you have 2633 time steps in the 56 cores case, and 1224 in the 28 cores case.

Looking at the averages, I find 0.848 s/time step for the 56 cores case, 1.64 s/time step for the 28 cores case, and 9.34 s/time step for the 2 cores case, so efficiency seems to match your curve. I see no specific operation being slower: we have about 2/3 of the time in the linear solvers and gradients for the 2 procs case, and 3/4 for the 56 procs case, so I see no obvious issue here.
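For readers who want to redo the arithmetic, here is a minimal Python sketch that turns the three averages quoted above into speedup and parallel efficiency relative to the 2-rank run. The function name `relative_scaling` is purely illustrative and not part of code_saturne; only the three timings come from the post.

```python
# Average wall-clock time per time step quoted in the post above (seconds),
# keyed by number of MPI ranks.
seconds_per_step = {2: 9.34, 28: 1.64, 56: 0.848}


def relative_scaling(times, base_ranks):
    """Return {ranks: (speedup, efficiency)} relative to the base_ranks run.

    Hypothetical helper for illustration only.
    """
    base_time = times[base_ranks]
    result = {}
    for ranks, t in sorted(times.items()):
        speedup = base_time / t          # how much faster than the base run
        ideal = ranks / base_ranks       # speedup under perfect scaling
        result[ranks] = (speedup, speedup / ideal)
    return result


scaling = relative_scaling(seconds_per_step, base_ranks=2)
for ranks, (speedup, efficiency) in scaling.items():
    print(f"{ranks:3d} ranks: speedup {speedup:5.2f}, efficiency {efficiency:5.1%}")

# Incremental step from 28 to 56 ranks (ideal value would be 2).
step_28_to_56 = seconds_per_step[28] / seconds_per_step[56]
print(f"28 -> 56 ranks: speedup {step_28_to_56:.2f} (ideal 2.00)")
```

With these numbers, efficiency relative to 2 ranks is roughly 41% at 28 ranks and 39% at 56 ranks, while the step from 28 to 56 ranks alone gives a speedup of about 1.93. In other words, most of the loss occurs between 2 and 28 ranks, and doubling from 28 to 56 scales almost ideally.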

If you are on a single node, memory bandwidth saturation does not seem to be the issue either, because in that case you would have very little additional speedup from 28 to 56. So I would guess some MPI or network driver aspect comes into play here. Do you have "vanilla" MPICH 3.1 on the machine, or some version with optimized drivers for OmniPath? That could be the cause of the performance loss.

Best regards,

Yvan
Ruonan

Re: Parallel computing on a cluster

Post by Ruonan »

Hello Yvan,

Thanks a lot for your comments! I really appreciate your help!

Sorry for not running the same number of time steps in each case. I also used the average time per step, the same as your method.

Actually, not all the cases are on a single node. I have 28 cores per node, so the 2cores and 28cores cases use only one node, but the 56cores case uses two nodes. So could "memory bandwidth saturation" be a problem?

I will check the MPI and network driver question with my IT support and get back to you soon. As you said, if I can increase the parallel performance by a factor of 2, that will be wonderful.

Best regards,
Ruonan
Yvan Fournier

Re: Parallel computing on a cluster

Post by Yvan Fournier »

Hello,

In that case, the drop in performance may be due to saturating the node's memory bandwidth when you move to 28 ranks.

A test to confirm this would be to try running 14 ranks on a single node, and 28 ranks on 2 nodes, and check the performance in those configurations.
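One way to read the outcome of that test is sketched below in Python, as an illustration only. The idea is to compare the same rank count placed on one node versus spread over two: memory bandwidth is a per-node resource, so if halving the ranks per node makes the run clearly faster, node-level saturation is a plausible cause. The helper names and the 1.2 threshold are arbitrary assumptions, not anything prescribed by code_saturne, and the timings have to come from your own runs.

```python
def placement_ratio(t_one_node, t_two_nodes):
    """Ratio of time per step: all ranks on one node vs. same ranks on two nodes.

    Both arguments are measured seconds per time step for the SAME rank count.
    A ratio well above 1 means spreading the ranks over two nodes helped.
    """
    if t_one_node <= 0 or t_two_nodes <= 0:
        raise ValueError("timings must be positive")
    return t_one_node / t_two_nodes


def likely_bandwidth_bound(t_one_node, t_two_nodes, threshold=1.2):
    """Heuristic check; the threshold is an arbitrary assumption.

    Returns True when the two-node placement is faster than the one-node
    placement by more than the given factor.
    """
    return placement_ratio(t_one_node, t_two_nodes) > threshold
```

For the case in this thread, `t_one_node` would be the 1.64 s/time step already measured for 28 ranks on one node, and `t_two_nodes` the value obtained from the proposed 28 ranks on 2 nodes run. The 14 ranks on a single node run can be fed into the same kind of efficiency calculation as the other single-node cases to see where between 2 and 28 ranks the scaling starts to degrade.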

Best regards,

Yvan