Hello Yvan,
Here follow the timer_stats.csv files. They may help. Thank you!
Best regards,
Ruonan
Parallel computing on a cluster
Re: Parallel computing on a cluster
- Attachments:
  - 28cores-timer_stats.csv (125.72 KiB)
  - 56cores-timer_stats.csv (270.2 KiB)
  - 2cores-timer_stats.csv (413.96 KiB)
Re: Parallel computing on a cluster
Hello,
Comparison is a bit tricky, since you have 2633 time steps in the 56-core case and 1224 in the 28-core case.
Looking at the averages, I find 0.848 s per time step for the 56-core case, 1.64 s for the 28-core case, and 9.34 s for the 2-core case, so the efficiency seems to match your curve. I see no specific operation being slower (about 2/3 of the time is spent in the linear solvers and gradients in the 2-rank case, and 3/4 in the 56-rank case), so I see no obvious issue here.
If you are on a single node, memory bandwidth saturation does not seem to be the issue either, because in that case you would see very little additional speedup from 28 to 56 cores. So I would guess some MPI or network driver aspect comes into play here. Do you have "vanilla" MPICH 3.1 on the machine, or a version with optimized drivers for OmniPath? That could be the cause of the performance loss.
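As a quick sketch, the efficiency check behind these numbers can be reproduced as follows (the timings are the ones quoted above; the helper function is just illustrative, not part of code_saturne):

```python
# Speedup/efficiency check from the per-time-step averages quoted above.

def efficiency(base_cores, base_time, cores, time):
    """Parallel efficiency of a run relative to a baseline run."""
    speedup = base_time / time        # how much faster than the baseline
    ideal = cores / base_cores        # perfect scaling would give this speedup
    return speedup / ideal

runs = {2: 9.34, 28: 1.64, 56: 0.848}  # cores -> seconds per time step

for cores, t in runs.items():
    print(cores, "cores:", round(efficiency(2, runs[2], cores, t), 3))

# Incremental scaling from 28 to 56 cores is actually close to ideal:
print("28 -> 56:", round(efficiency(28, runs[28], 56, runs[56]), 3))
```

Note that while the overall efficiency relative to 2 cores is around 0.4, the step from 28 to 56 cores alone scales at roughly 97%, which is what the comparison above relies on.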
Best regards,
Yvan
Re: Parallel computing on a cluster
Hello Yvan,
Thanks a lot for your comments! I really appreciate your help!
Sorry for not running the same number of time steps in each case. I also use the average time per step, the same as your method.
Actually, not all the cases are on a single node. I have 28 cores per node, so for the 2-core and 28-core cases I use only one node, but for the 56-core case I use two nodes. Could "memory bandwidth saturation" still be a problem in that setup?
I will check the MPI and network driver question with my IT support and get back to you soon, because, as you said, if I can increase the parallel performance by a factor of 2, that would be wonderful.
Best regards,
Ruonan
Re: Parallel computing on a cluster
Hello,
In that case, the drop in performance may be due to saturating the node's memory bandwidth when you move to 28 ranks.
A test to confirm this would be to run 14 ranks on a single node and 28 ranks on 2 nodes, and compare the performance in those configurations.
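A possible way to read the outcome of that test, as a sketch: compare seconds per time step for a 28-rank run packed on one node against a 28-rank run spread over two nodes (14 ranks per node). The timings and the 10% tolerance below are made-up illustrations, not measurements from this thread.

```python
# Hedged sketch for interpreting the suggested placement test.

def memory_bandwidth_suspect(t_packed, t_spread, tolerance=1.1):
    """True if 28 ranks packed on 1 node are notably slower per time step
    than 28 ranks spread across 2 nodes, which would point to the packed
    run being memory-bandwidth bound rather than network bound."""
    return t_packed > tolerance * t_spread

# Hypothetical numbers: packed run at 1.64 s/step, spread run at 1.10 s/step.
print(memory_bandwidth_suspect(1.64, 1.10))
```

If the two layouts perform about the same, node memory bandwidth is likely not the bottleneck and the MPI/network driver question above becomes the more promising lead.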
Best regards,
Yvan