
Parallel computation, computation time and hardware installation

Posted: Fri Jul 09, 2010 11:51 am
by Alicia Consigny
Hi all,
I'm sorry to bother you here again with this kind of subject, but maybe it will interest other Saturne users.
I have three possible configurations for a new computer my boss is planning to buy, and I would like to know which one you think is best for CFD and coupled CFD/thermal computations with Saturne and Syrthes.
Each configuration is a single node with several cores:
1) 6 cores at 3.6 GHz each, with 6 MB of cache memory
2) 8 cores at 2.4 GHz each, with 12 MB of cache memory
3) 12 cores at 1.9 GHz each, with 12 MB of cache memory
In any case it will have 10 GB of RAM.
The calculations I'm interested in are on meshes of 3 to 6 million cells, solving for the velocity, pressure and temperature fields (+ turbulence variables) with Saturne, sometimes coupled with Syrthes. Ideally I would have the results of a steady-state calculation within a few hours (at best 2 hours (lunch time!), at most one night, 10 to 15 hours). Another question is: do you think that is possible? (I'm sure there are other parameters to take into account to get an estimate, but it's just to have a general idea.)
Do you think the combination of core count, frequency and cache memory (for any of the options above) with that amount of RAM is coherent?
Last but not least, is there any specific recommendation about the installation or the use of more than 4 processors for a calculation with Saturne / Syrthes?
Thanks a lot in advance for any information on the subject you can give me,
Alicia

Re: Parallel computation, computation time and hardware installation

Posted: Fri Jul 09, 2010 2:24 pm
by Olivier Geoffroy
Hi,
Just a few hints from a beginner with CS: we ran some benchmarks on a single machine with two 4-core Xeon CPUs (i.e. 8 cores) at 2.2 GHz.
For meshes in the 2-4 million cell range, scaling is good up to 4 cores: 400 % for a simple case (i.e. a pipe), down to 280 % for a complex case.
With 8 cores, the speedup was only 460 % in the good case (and 510 % with 6 cores!), and 330 % for the complex case.
And forget hyper-threading: the speedup goes down to 150 % - 180 % with 16 cores (i.e. 8 cores + 8 HT).
So my conclusion would be: go with configuration 1 or 2.
Olivier
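To make these figures easier to compare, here is a minimal sketch that converts the reported speedups into parallel efficiency (speedup divided by number of cores). The numbers are the ones quoted in the post above; the function name and the dictionary layout are only illustrative, not part of Code_Saturne.

```python
# Speedups quoted in the post, in percent of single-core performance.
# Each entry is (number of cores, speedup in percent).
# "16" means 8 physical cores + 8 hyper-threads.
reported = {
    "simple case": [(4, 400), (6, 510), (8, 460)],
    "complex case": [(4, 280), (8, 330)],
    "hyper-threading (range)": [(16, 150), (16, 180)],
}


def parallel_efficiency(speedup_pct, n_cores):
    """Parallel efficiency as a fraction: (speedup / 100) / n_cores."""
    return (speedup_pct / 100.0) / n_cores


for case, results in reported.items():
    for n_cores, speedup_pct in results:
        eff = parallel_efficiency(speedup_pct, n_cores)
        print(f"{case:25s} {n_cores:2d} cores: "
              f"speedup {speedup_pct} %, efficiency {eff:.0%}")
```

Read this way, going from 4 to 8 cores on the simple case drops the efficiency from 100 % to under 60 %, which is the saturation described in the post.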
 

Re: Parallel computation, computation time and hardware installation

Posted: Fri Jul 09, 2010 5:29 pm
by Alicia Consigny
OK, thank you very much for your answer! It's always good to have feedback from users who run on personal PCs and not on big clusters!

Re: Parallel computation, computation time and hardware installation

Posted: Fri Jul 09, 2010 5:54 pm
by Yvan Fournier
Hello,
Code_Saturne is usually limited more by memory bandwidth and latency than by theoretical peak CPU power, and your results seem in line with this. Having fewer cells per core (in the 20,000 - 50,000 cells/core range) may give better cache performance and less memory contention, so speedup would be better, but this means running on clusters instead of desktops...
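As a rough illustration of that range, the sketch below computes the cells-per-core load for the mesh sizes and core counts mentioned in this thread, and how many cores would be needed to reach 20,000 - 50,000 cells/core. The helper names are hypothetical, and this is plain arithmetic, not a performance model.

```python
def cells_per_core(n_cells, n_cores):
    """Average number of mesh cells handled by each core."""
    return n_cells / n_cores


def cores_for_target(n_cells, target_cells_per_core):
    """Smallest core count giving at most the target load per core."""
    return -(-n_cells // target_cells_per_core)  # ceiling division


for n_cells in (3_000_000, 6_000_000):
    for n_cores in (6, 8, 12):  # the three desktop options discussed
        load = cells_per_core(n_cells, n_cores)
        print(f"{n_cells:>9,d} cells on {n_cores:2d} cores: "
              f"{load:,.0f} cells/core")
    low = cores_for_target(n_cells, 50_000)
    high = cores_for_target(n_cells, 20_000)
    print(f"  20,000 - 50,000 cells/core needs about {low} to {high} cores")
```

Even the 12-core option stays at 250,000 cells/core or more for these meshes, well above the range quoted, which is why that range implies a cluster.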
Olivier's results are not surprising, but be careful: in his case, the 4 and 8 core tests were run on the same machine, with the same amount of cache. In your case, the 6 core option has a faster processor but less cache memory, so it is hard to guess which effect will outweigh the other. Are all options Intel Xeons, AMD Opterons, or do you have a mix? The memory bandwidths may be different, and they change every few generations, so it is very hard to keep up to date with which processor is faster...
Finally, I recently ran a few tests on a 7.9 million cell mesh on both a cluster and a single node, and performance in the first time steps seemed 10 times faster on 32 Xeon 5570 cores (4 nodes with Infiniband network, 8 processors/node, 2.93 GHz, 8 MB cache, 12 GB/node, Intel compiler) than on a single node with 8 Xeon 5504 cores (2 GHz, 4 MB cache, 12 GB, GCC compiler). Still, I ran 50 time steps in 16 hours, so if you run steady simulations, having results overnight seems realistic. Also, 10 GB should be just enough for simulations on 6 million cells.
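For a very rough feel of what "50 time steps in 16 hours" means for the time budgets mentioned earlier in the thread (2 hours, 10 to 15 hours), here is a small sketch. It assumes the cost per time step stays constant and simply rescales the quoted figure; the post does not say which machine the 16 hours refers to, and the mesh there (7.9 million cells) is larger than the 3 to 6 million cells asked about, so treat the output as an order of magnitude only. The function names are illustrative.

```python
def seconds_per_step(total_hours, n_steps):
    """Average wall-clock time per time step, in seconds."""
    return total_hours * 3600.0 / n_steps


def steps_in(hours, total_hours=16.0, n_steps=50):
    """Whole time steps that fit in `hours` at the same rate
    (assumption: constant cost per step)."""
    return int(hours * n_steps / total_hours)


print(f"{seconds_per_step(16.0, 50):.0f} s per time step")
for budget in (2, 10, 15):  # time budgets from the original question
    print(f"{budget:2d} h budget: about {steps_in(budget)} time steps")
```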
Best regards,
  Yvan