Slow computation for a relatively large model with 5m hexa elements

George Xu

Slow computation for a relatively large model with 5m hexa elements

Post by George Xu »

Hi,
I tried to run code_saturne-2.1.0 and previous versions on a relatively large model with 5 million or so hexahedral elements. The simulation involves only fluid flow, with no heat transfer. I found that the computation was tremendously slow: each iteration takes about 2 minutes even when 6 CPUs are used. I have tried to run the same case with FLUENT, and the computational time required is less than one-tenth of that required by code_saturne.
When I run smaller models (only a few hundred thousand elements), the computational efficiency of code_saturne is quite attractive.
Could anybody advise me whether this is common for code_saturne when it is applied to larger models? What should I do to improve the efficiency?
By the way, I use METIS-4.0 to partition my domain, on 64-bit Ubuntu 10.04.
Regards,
George 
 
Yvan Fournier

Re: Slow computation for a relatively large model with 5m hexa elements

Post by Yvan Fournier »

Hello,
Are you using the same mesh with both codes? Are you using default options? Are you running a transient or a steady calculation? Code_Saturne's default options are geared towards more precise but slower calculations, so changing the default solver precision from 1.e-8 to 1.e-5 may help, and using an upwind rather than a centered convection scheme may also help.
Also, the choice of time step and the mesh quality are important. In the log (listing) file, you have some detail of how much time Code_Saturne spends in the different parts of the computation, which may be useful (if convergence of the linear solvers is bad, for example, you would see it there).
Did you compare computation times running on 1, 2, and 6 CPUs? Also, how much memory do you have, and what type of processor? Depending on the hardware and MPI libraries, you may have memory contention or bad cache behavior, which (besides mesh quality differences) could explain a marked difference in performance between two similar cases with different mesh refinements.
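As an illustration, here is the kind of strong-scaling comparison I have in mind, as a small Python sketch; the per-iteration times in it are placeholders only, to be replaced by the values read from your own listing files:

Code:

# Strong-scaling check from per-iteration times (in seconds).
# The timings below are placeholders; replace them with the values
# read from the "listing" files of runs on 1, 2 and 6 CPUs.
timings = {1: 600.0, 2: 320.0, 6: 130.0}

t1 = timings[1]
for nprocs, t in sorted(timings.items()):
    speedup = t1 / t
    efficiency = speedup / nprocs
    print(f"{nprocs} CPU(s): {t:7.1f} s/iter, "
          f"speedup {speedup:4.2f}, efficiency {efficiency:6.1%}")

If the efficiency drops sharply between 2 and 6 CPUs, that points towards memory bandwidth or load-balance issues rather than the solver settings themselves.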
Also, did you check that the multigrid option is activated ?
With version 2.1, if you have doubts about METIS-4, you can run with the built-in partitioning based on a space-filling curve (see the advanced options tab in the calculation management / prepare batch calculation section of the GUI). This is usually slightly less optimal, so a little slower than METIS, but in some cases it may actually perform better, due to a better (case-dependent) load balance.
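In case it helps to picture it, space-filling-curve partitioning simply sorts the cells along a curve that preserves spatial locality and then cuts the sorted list into equal contiguous chunks, one per rank. The toy Python sketch below illustrates that principle with a Morton (Z-order) key; it is only a sketch of the general idea, not the actual code_saturne implementation:

Code:

import numpy as np

def morton_key(point, bits=10):
    """Z-order (Morton) key for a point in the unit cube:
    quantize each coordinate, then interleave the bits."""
    q = np.clip((np.asarray(point) * (2**bits - 1)).astype(int),
                0, 2**bits - 1)
    key = 0
    for b in range(bits):
        for d in range(3):                 # interleave x, y, z bits
            key |= int((q[d] >> b) & 1) << (3 * b + d)
    return key

# Toy example: random "cell centers" in the unit cube.
rng = np.random.default_rng(0)
centers = rng.random((24, 3))

# Sort cells along the curve, then cut the sorted list into
# equal-sized contiguous chunks, one per rank.
order = sorted(range(len(centers)), key=lambda i: morton_key(centers[i]))
for rank, cells in enumerate(np.array_split(order, 6)):
    print(f"rank {rank}: cells {list(cells)}")

Because neighbouring cells tend to end up in the same chunk, communication volume stays reasonable even though the mesh connectivity itself is never used.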
Best regards,
  Yvan
George Xu

Re: Slow computation for a relatively large model with 5m hexa elements

Post by George Xu »

Hi Yvan,
Many thanks for your prompt reply.
For your information, my system configuration is as follows:
OS: Ubuntu 10.04 (64-bit) with Kernel Linux 2.6.32-35-generic. GNOME 2.30.2
Hardware: 32 GB of memory and six Intel Xeon X5670 @ 2.93 GHz cores
In my tests, I have run three cases varying in number of elements: 62K, 1.7M, and 5M, all hexa-dominated. All 6 CPUs were used in each run.
I have made such tests with different versions of code_saturne, including 1.3.3, 1.4, 2.0, and 2.1.0.
In the case settings, steady-state simulations were performed. I did reduce the target residual for the linear solvers to 1.0e-5, with multigrid activated. OpenMPI 1.4.4 was used for compilation and in the simulations. Six CPUs were used throughout all the tests, since we would like to push the limits of our system.
It turns out that the simulation time per iteration, extracted from the log files, is summarized below:
0.56 s for the 62K model;
26 s for the 1.7M model;
120 s for the 5M model.
To me, the first two tests are reasonable, since the computational time is almost proportional to the model size. However, the third case is unacceptably slow. I wonder whether the model is too large for my system's limited resources, or whether certain settings need to be activated in code_saturne in order to simulate such a large model efficiently.
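To put those numbers side by side, here is a quick normalization by mesh size (a small Python sketch using only the values quoted above):

Code:

# Per-iteration cost per cell, using the times quoted above.
cases = {
    "62K":  (62_000,    0.56),   # (cells, seconds per iteration)
    "1.7M": (1_700_000, 26.0),
    "5M":   (5_000_000, 120.0),
}
for name, (cells, t_iter) in cases.items():
    print(f"{name:>4}: {1e6 * t_iter / cells:5.1f} microseconds per cell per iteration")

This gives roughly 9, 15 and 24 microseconds per cell per iteration, so the per-cell cost grows with the mesh size instead of staying flat.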
I have not yet tested the scalability curve over different numbers of CPUs, nor the built-in partitioning method instead of METIS. I may try to do so in the immediate future. Thanks for your information.
I am not sure about the meaning of memory contention and cache behavior for my system. In my tests, I found that only half of the system memory was used during the simulations, which to me implies that our memory is sufficient for these models. Am I right to say that? As for cache behavior, could you please share more about how to tweak the settings of our system in order to improve performance through better cache usage? What is the indicator to check whether the cache settings are sufficient or not?
Thank you so much for your help. 
Best regards,
George 
 
 

 
George Xu

Re: Slow computation for a relatively large model with 5m hexa elements

Post by George Xu »

Hi Yvan,
I have re-checked my runs and found out that almost no swap has been used during the simulation. 
 
Tasks: 253 total,   9 running, 244 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.7%us,  0.5%sy, 97.8%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  33009404k total, 12214196k used, 20795208k free,   533040k buffers
Swap: 32107512k total,        0k used, 32107512k free,  8401972k cached
 
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
21264 georgexu  30  10  484m 272m  10m R  100  0.8   4:02.28 cs_solver          
21265 georgexu  30  10  482m 271m  11m R  100  0.8   4:04.79 cs_solver          
21266 georgexu  30  10  482m 271m  10m R  100  0.8   4:06.41 cs_solver          
21263 georgexu  30  10  483m 272m  11m R   98  0.8   4:08.04 cs_solver          
21262 georgexu  30  10  480m 270m  11m R   95  0.8   4:05.17 cs_solver          
21261 georgexu  30  10  480m 270m  11m R   95  0.8   4:05.46 cs_solver          
20739 georgexu  20   0  243m  25m  19m R    5  0.1   0:33.86 gnome-system-mo    
      
 
Is the slow performance related to the settings in my system? 
regards,
George
Yvan Fournier

Re: Slow computation for a relatively large model with 5m hexa elements

Post by Yvan Fournier »

Hello George,
Output from "top" is not very useful, as CPU usage does not tell you whether your performance is degraded by cache misses or memory contention, and MPI processes can do "active waiting", meaning it does not even tell you anything about load balancing.
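To give an order of magnitude of what I mean by cache behaviour, here is a back-of-the-envelope sketch in Python. The 270 MB figure is the resident size of one cs_solver process in your top output; the 12 MB L3 cache is the published figure for a Xeon X5670, so an assumption about your exact hardware rather than a measurement:

Code:

# Rough cache-fit estimate: per-process working set vs. last-level cache.
resident_mb = 270.0   # RES of one cs_solver process in the "top" output
l3_cache_mb = 12.0    # published L3 size of a Xeon X5670 (assumed hardware)
print(f"Working set is about {resident_mb / l3_cache_mb:.0f}x larger "
      "than the shared L3 cache.")

Since the working set is more than 20 times larger than the cache, most sweeps over the fields stream from main memory, and if all six cores sit on a single socket they also share that socket's memory channels, which is where contention can come from.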
What would be useful would be a "listing" file from a small case and the one from the 5M-element case, so as to compare the two. Also, my previous remarks about comparing options between Code_Saturne and FLUENT still hold.
Finally, what type of hardware are you using? It seems to be a big workstation, but details may be useful.
Best regards,
  Yvan
George Xu

Re: Slow computation for a relatively large model with 5m hexa elements

Post by George Xu »

Hi Yvan,
The CPU times I reported here were extracted from the 'listing' files of the code_saturne simulations.
I have done another round of tests with the 'least squares method over the neighbouring cells' option as the gradient calculation method, instead of the default 'iterative handling for non-orthogonality'. I found that the CPU time required per iteration was substantially reduced. The results are summarized below for the relevant cases:
for the 1.7M model: 8 s (least-squares method) vs 26 s (default option);
for the 5M model: 20 s (least-squares method) vs 120 s (default option).
To me, this set of values is much more reasonable.
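For completeness, the speedup factors implied by those numbers (a small Python sketch):

Code:

# Per-iteration times (seconds) with the two gradient options,
# taken from the listing files quoted above.
times = {
    "1.7M": {"default": 26.0, "least_squares": 8.0},
    "5M":   {"default": 120.0, "least_squares": 20.0},
}
for mesh, t in times.items():
    ratio = t["default"] / t["least_squares"]
    print(f"{mesh}: {ratio:.1f}x faster with the least-squares gradient")

That is about 3x on the 1.7M mesh and 6x on the 5M mesh, so the gain from the least-squares option grows with the mesh size.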
I would appreciate it if you could share why the default option is so much more costly. Could you please also give me some hints about the differences between the two options in terms of solution accuracy?
Best regards,
George 
 