code_saturne User's Forum

Posted: **Fri May 03, 2013 10:13 pm**

Hello,
I'm interested in your experience with Code_Saturne: up to how many elements does it work properly? In other words, is it an appropriate solver for a simulation with 100 million elements or even more?

Posted: **Mon May 06, 2013 9:36 am**

Hello,

The biggest calculations run in production with Code_Saturne are about 170 million cells. With that many cells, running on at least 500 cores of a cluster is recommended (typically, 500 to 4000 cores may be appropriate depending on the type of cluster: 500 to 2000 cores for recent Intel processors, 2000 to 4000 for Blue Gene type machines).

"Test" calculations with the code have been run on up to 3 billion cells, but only for a few iterations.

Also, the preprocessor, which imports meshes, is serial and may require a lot of memory. It has not been tested much beyond 80 million cells, as all the bigger calculations use mesh joining (inside Code_Saturne) to join several sub-pieces of a mesh.

Regards,

Yvan

Posted: **Thu Oct 03, 2013 2:41 pm**

Hello,
I performed a scaling test of Code_Saturne, and here are the results:

| CPUs | Execution time (s) | Scaling relative to 2 CPUs (%) |
|-----:|-------------------:|-------------------------------:|
| 2 | 1635 | (reference) |
| 4 | 976 | 84 |
| 8 | 697 | 59 |
| 24 | 809 | 17 |
| 48 | 668 | 10 |

There is a drop between 8 and 24 CPUs, and with 48 CPUs the scaling is only 10%. Our cluster has 12 cores per node. Other specifications of the cluster: http://www.calculquebec.ca/index.php/fr ... rs/briaree

Any ideas?
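The "scaling" column in the table above is consistent with parallel efficiency relative to the 2-CPU run, i.e. (2 * T_2) / (n * T_n). The short Python sketch below recomputes it from the posted timings; the function name is illustrative and not part of Code_Saturne.

```python
# Elapsed times (s) from the scaling table above, keyed by CPU count.
timings = {2: 1635, 4: 976, 8: 697, 24: 809, 48: 668}


def parallel_efficiency(timings):
    """Efficiency (%) of each run relative to the run with the fewest CPUs.

    efficiency(n) = (n_ref * T_ref) / (n * T_n) * 100
    """
    n_ref = min(timings)
    t_ref = timings[n_ref]
    return {n: 100.0 * (n_ref * t_ref) / (n * t) for n, t in sorted(timings.items())}


for n, eff in parallel_efficiency(timings).items():
    speedup = timings[min(timings)] / timings[n]  # speedup relative to 2 CPUs
    print(f"{n:3d} CPUs: speedup {speedup:4.2f}, efficiency {eff:5.1f} %")
```

Rounded, this reproduces 84, 59, 17 and 10 %, and shows that the speedup relative to 2 CPUs never exceeds about 2.5, which is why the 24- and 48-CPU runs look so poor.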

Posted: **Fri Oct 04, 2013 1:14 am**

Hello,

Your cluster configuration seems very similar to that of one of our 2 main clusters.

What mesh size are you using? Below about 10,000 cells per core on Intel/InfiniBand type architectures (3,000 for IBM Blue Gene), scalability drops.
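As a quick check of this rule of thumb, the sketch below computes cells per core and the largest core count that stays above the quoted thresholds (10,000 cells per core for Intel/InfiniBand, 3,000 for Blue Gene). The thresholds are the approximate figures given above, not hard limits, and the 2-million-cell mesh in the usage lines is a hypothetical example.

```python
# Approximate cells-per-core thresholds quoted above; below these,
# scalability is expected to drop. Rules of thumb, not hard limits.
MIN_CELLS_PER_CORE = {
    "intel_infiniband": 10_000,
    "blue_gene": 3_000,
}


def cells_per_core(n_cells, n_cores):
    """Average number of cells assigned to each core."""
    return n_cells / n_cores


def max_useful_cores(n_cells, arch="intel_infiniband"):
    """Largest core count keeping at least the threshold number of cells per core."""
    return max(1, n_cells // MIN_CELLS_PER_CORE[arch])


# Hypothetical 2-million-cell mesh.
n_cells = 2_000_000
print(max_useful_cores(n_cells))               # 200
print(max_useful_cores(n_cells, "blue_gene"))  # 666
print(round(cells_per_core(n_cells, 48)))      # 41667
```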

Depending on how far apart the nodes on which the code runs are, performance may also vary by a factor of 2. If this happens (depending on how your batch system is configured), you would observe quite "noisy" measurements across different runs. Having the code share nodes with other tools would be the worst case, as one node slowed by another tool could slow down all the others, but this should not happen if your batch system is configured correctly.

Also, how was the code installed? Did you use the MPI compiler wrappers provided by the modules on your system (which should be configured to use InfiniBand), did you use the automatic installer, or did you install your own MPI library? I recall you needed to install more recent gcc/gfortran versions, but did you also reinstall your own version of Open MPI? By default, if you do not tell Open MPI's "configure" where the InfiniBand headers and libraries are (unless they are in /usr or /usr/local), it probably won't find them without an appropriate --with-openib=<path> option, and will fall back to Ethernet, which will probably kill your scalability due to higher latencies (tests a few years ago showed a factor of 2). So if your environment modules are well enough written to let you use your local gcc/gfortran with the "system" Open MPI compiler wrappers, that would help. Your administrators may be able to help you here. Otherwise, I might be able to provide recommendations based on the output of "module avail" on your cluster.

Regards,

Yvan

Posted: **Mon Oct 07, 2013 11:20 pm**

Thanks Yvan,

My grid is composed of 500000 nodes.

Code_Saturne was compiled on our cluster with GNU gcc 4.6.3 and Open MPI 1.6.4 (itself compiled with gcc 4.6.3).

I attached the compilation file.

Could you please take a look?

Posted: **Tue Oct 08, 2013 6:39 pm**

Hello,

It seems you used Open MPI 1.6.1 and not 1.6.4 (unless it is installed as 1.6.1), so if there are multiple Open MPI installations, the code may not be using the one you believe.

In any case, the log does not tell me whether Open MPI was built with InfiniBand support, or how that support is configured (I am not an expert on Open MPI installation on clusters, so your administrator may be able to help you here).

Regards,

Yvan

Posted: **Fri Oct 11, 2013 3:11 pm**

Thanks Yvan,

Before recompiling the software, do you think the mesh format may affect the partitioning?

Posted: **Fri Oct 11, 2013 5:54 pm**

Yvan, would it be possible for you to give me a test case whose scaling you have already measured on your cluster? If your cluster is similar to ours, that would show whether we have an installation issue.

Thank you

Posted: **Sat Oct 12, 2013 1:13 pm**

Hello,

I don't have any recent data available for small meshes (you can google for "prace code_saturne scaling" to get data on very large meshes).

For a 500,000-cell mesh, scaling on an Intel Westmere/InfiniBand cluster should be good at least up to 12 cores, and still good to reasonable at 24 cores.

Did you run "ompi_info | grep openib" using the version of Open MPI you installed for Code_Saturne?
As I said before, you might not have InfiniBand support built in (check the Open MPI FAQ and mailing lists for that).
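A scripted version of this check is sketched below in Python. It assumes the Open MPI 1.6-era `ompi_info` output format, where the version appears on an `Open MPI: x.y.z` line and the InfiniBand transport is listed as an `MCA btl: openib ...` component; other releases may format this differently. The path passed to `check_ompi` should be the `ompi_info` from the installation Code_Saturne was actually built against. The helper names are hypothetical.

```python
import re
import shutil
import subprocess


def ompi_version(info_text):
    """Extract the version from an 'Open MPI: x.y.z' line (assumed format)."""
    m = re.search(r"^\s*Open MPI:\s*(\S+)", info_text, re.MULTILINE)
    return m.group(1) if m else None


def has_openib(info_text):
    """True if an 'MCA btl: openib ...' component line is listed (assumed format)."""
    return any(
        re.match(r"\s*MCA btl:\s*openib\b", line)
        for line in info_text.splitlines()
    )


def check_ompi(ompi_info="ompi_info"):
    """Run the given ompi_info and report (version, openib_available).

    Pass the full path to the ompi_info of the Open MPI installation that
    Code_Saturne was built with; returns None if it cannot be found.
    """
    exe = shutil.which(ompi_info)
    if exe is None:
        return None
    out = subprocess.run([exe], capture_output=True, text=True).stdout
    return ompi_version(out), has_openib(out)


if __name__ == "__main__":
    print(check_ompi())
```

Checking the version at the same time helps catch the case where several Open MPI installations coexist and the one on the `PATH` is not the one you expected.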

Also, I forgot to ask you how you made your speedup measurements:

- What execution time did you take into account? The one at the end of the "listing" files? A cluster job's total time (which may include serial steps such as mesh import)?
- How many time steps did you run, and with how much I/O?

As I/O usually does not scale as well as computation (and may depend on the filesystem and its load), and users generally do not write postprocessing output at very short time-step intervals, a scaling measurement representative of "real usage" needs to run for quite a few time steps, or have restart file and postprocessing output disabled.
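To see why a short run can understate scaling, the sketch below applies Amdahl's law, a standard model (not something specific to Code_Saturne) in which a fraction of the run, such as serial mesh import or I/O, does not speed up at all. It also shows how to get a per-time-step cost once a fixed setup time is subtracted. The 10 % serial fraction and the timings in the usage lines are hypothetical.

```python
def amdahl_speedup(serial_fraction, n_procs):
    """Amdahl's law: speedup on n_procs when serial_fraction of the
    single-process run cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_procs)


def time_per_step(total_elapsed, setup_time, n_steps):
    """Average time per time step after removing a fixed setup cost."""
    return (total_elapsed - setup_time) / n_steps


# Hypothetical: 10 % of the run is serial (e.g. mesh import, I/O).
for n in (2, 8, 24, 48):
    s = amdahl_speedup(0.10, n)
    print(f"{n:3d} procs: speedup {s:5.2f}, efficiency {100 * s / n:5.1f} %")

# Hypothetical job: 160 s total, 60 s of setup, 10 time steps -> 10 s/step.
print(time_per_step(160.0, 60.0, 10))
```

With even 10 % serial work, efficiency on 48 processes cannot exceed about 17.5 % under this model, so a total job time that mixes setup and computation says little about how the solver itself scales.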

To really see how scaling behaves, you need to look both at the global timing info at the end of the "listing" files, and at more detailed info in the "performance.log" files. This will enable you to see which parts scale and which don't.
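Once the per-section times have been read from two runs (by hand or with your own parser, since the exact layout of "performance.log" is not shown here), comparing their efficiencies shows which parts stop scaling. The section names and timings below are hypothetical placeholders, not actual Code_Saturne log labels.

```python
def section_efficiency(t_small, n_small, t_large, n_large):
    """Efficiency (%) of each timed section when going from n_small to n_large
    processes; 100 % means perfect scaling for that section."""
    return {
        name: 100.0 * (t_small[name] * n_small) / (t_large[name] * n_large)
        for name in t_small
        if name in t_large and t_large[name] > 0
    }


# Hypothetical per-section times (s) for 4- and 12-process runs.
t4 = {"linear solvers": 400.0, "gradients": 100.0, "halo exchange": 20.0}
t12 = {"linear solvers": 150.0, "gradients": 35.0, "halo exchange": 40.0}

# Print the worst-scaling sections first.
eff = section_efficiency(t4, 4, t12, 12)
for name, e in sorted(eff.items(), key=lambda kv: kv[1]):
    print(f"{name:15s} {e:6.1f} %")
```

A section whose time grows with the process count (like the hypothetical halo exchange here) usually points to communication cost, which is where a missing InfiniBand transport would show up.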

Regards,

Yvan

Posted: **Thu Oct 17, 2013 2:41 am**

Thanks Yvan,
The problem is more severe than the elapsed time for reading the mesh. I used another mesh with 3.5 million nodes, deactivated I/O, and ran 10 time steps. The execution time on 4 CPUs is lower than on 6 or 12 CPUs!
The SCOTCH library is used for partitioning. Could METIS change anything? What about the mesh format?
I attached the configuration files. Do you have any suggestions for this configuration?

I also attached a log file of my calculation on 6 CPUs.
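When comparing partitioners such as SCOTCH and METIS, one simple quality measure is load imbalance: the largest per-rank cell count divided by the mean. The helper below is a hypothetical sketch; it ignores communication volume (edge cut), which also matters, and the per-rank cell counts would have to be obtained from your own run.

```python
def load_imbalance(cells_per_rank):
    """Max/mean ratio of cells per rank; 1.0 means a perfectly balanced partition."""
    if not cells_per_rank:
        raise ValueError("need at least one rank")
    mean = sum(cells_per_rank) / len(cells_per_rank)
    return max(cells_per_rank) / mean


# Hypothetical 4-rank partitions.
print(load_imbalance([100, 100, 100, 100]))  # 1.0
print(load_imbalance([120, 100, 90, 90]))    # 1.2
```

Since ranks synchronize at every time step, the most heavily loaded rank tends to set the pace, so a ratio well above 1.0 limits speedup regardless of the network.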

Maximum number of elements in Saturne