Exceeded memory limit

Questions and remarks about code_saturne usage
konst
Posts: 30
Joined: Sun Sep 17, 2017 7:41 pm

Exceeded memory limit

Post by konst »

Hello!

I have been running CS v5.3 and v5.2 with the RSM LRR turbulence model to compute the turbulent flow around a cylinder, but these calculations stop after ~145000 time steps with the following error:

slurmstepd-atcn451: error: Job 17541583 exceeded memory limit (61472984 > 61440000), being killed
slurmstepd-atcn451: error: Exceeded job memory limit

There is probably a memory leak in the implementation of the RSM model. Is there a way to avoid this problem?
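A quick check of the numbers in the error message above: this sketch parses the slurmstepd line and compares the reported usage with the limit. It assumes (this is not stated in the thread) that Slurm reports both values in KiB. Under that assumption, the job was killed only about 32 MiB above a limit of roughly 58.6 GiB, which suggests memory creeping up slowly over many time steps rather than one large allocation. `parse_overshoot` is a hypothetical helper, not part of Slurm or code_saturne.

```python
import re

# Log line copied from the error message in the thread.
LOG_LINE = ("slurmstepd-atcn451: error: Job 17541583 exceeded memory limit "
            "(61472984 > 61440000), being killed")


def parse_overshoot(line):
    """Return (used, limit) from a slurmstepd 'exceeded memory limit' line.

    Assumption: both values are in KiB.
    """
    m = re.search(r"exceeded memory limit \((\d+) > (\d+)\)", line)
    if m is None:
        raise ValueError("not a memory-limit line")
    return int(m.group(1)), int(m.group(2))


used, limit = parse_overshoot(LOG_LINE)
excess = used - limit  # amount above the limit, in KiB (assumed unit)
print(f"limit: {limit / 1024**2:.2f} GiB, excess: {excess / 1024:.1f} MiB "
      f"({100 * excess / limit:.3f} % of the limit)")
```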

Best regards, Konstantin
Attachments
setup_rij-ssg.xml
(12.35 KiB)
cylinder_eD04_large_zeroBedV2_9_1.med
(7.45 MiB)
Yvan Fournier
Posts: 4070
Joined: Mon Feb 20, 2012 3:25 pm

Re: Exceeded memory limit

Post by Yvan Fournier »

Hello,

Do you have a small test case? We could debug this.

In any case, the LRR model is not recommended. SSG is a more "correct" RSM model (though the memory leak might appear in both).

I won't be able to check before the end of the week, but I'll check if you can provide a small test case.

Best regards,

Yvan
konst
Posts: 30
Joined: Sun Sep 17, 2017 7:41 pm

Re: Exceeded memory limit

Post by konst »

Thank you for your reply, Yvan.

I tried the same test with the k-epsilon model, and it gives me the same "exceeded memory limit" error. So it looks like the problem is not in the turbulence model.

I attached a zip archive with the setup files I ran on the ATHOS cluster using 3 nodes.

Best regards, Konstantin
Attachments
Kiya.zip
(3.24 MiB)
Yvan Fournier
Posts: 4070
Joined: Mon Feb 20, 2012 3:25 pm

Re: Exceeded memory limit

Post by Yvan Fournier »

Hello,

A colleague checked your case and did not find a leak using the "classical" instrumentation, so I'll try with more complete tools and keep you updated.

In any case, your mesh is quite small, so running on 3 nodes seems excessive. One or two ranks on a single node should be enough for 60000 cells (unless you ran on Athos with a bigger mesh).

Best regards,

Yvan
konst
Posts: 30
Joined: Sun Sep 17, 2017 7:41 pm

Re: Exceeded memory limit

Post by konst »

Hello,

Yvan, thank you for spending time on my case. I ran it with a smaller number of processors, as you recommended, and I still get this error. If there is no memory leak, my only guess is that the case stops converging at some point.

Thank you again, and have a nice weekend. :)
Yvan Fournier
Posts: 4070
Joined: Mon Feb 20, 2012 3:25 pm

Re: Exceeded memory limit

Post by Yvan Fournier »

Hello,

Did you at least get similar or better performance with 3 cores? A solution that is not very elegant, but should at least work, is to checkpoint, stop, and restart every 100000 iterations or so.
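A minimal sketch of how a long run could be planned as checkpoint/restart segments of about 100000 iterations, keeping each job well below the ~145000 time steps at which the original run was killed. The helper `restart_segments` is hypothetical: it only computes the time-step ranges, and the actual restart from the previous checkpoint would still need to be set up in code_saturne itself for each submitted job.

```python
def restart_segments(total_steps, chunk=100000):
    """Split time steps 1..total_steps into consecutive (first, last) ranges.

    Each range would correspond to one job that restarts from the checkpoint
    written at the end of the previous range (hypothetical workflow).
    """
    if total_steps < 1 or chunk < 1:
        raise ValueError("total_steps and chunk must be positive")
    segments = []
    first = 1
    while first <= total_steps:
        last = min(first + chunk - 1, total_steps)
        segments.append((first, last))
        first = last + 1
    return segments


if __name__ == "__main__":
    # Example: a 250000-step run split into jobs of at most 100000 steps.
    for first, last in restart_segments(250000):
        print(f"job: time steps {first} to {last}")
```

Since each job is a fresh process, any slowly accumulating memory is released between segments, which is why this workaround can help even if the root cause is not found.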

The fact that I could not reproduce the issue over a small number of time steps does not prove there is no leak, as a leak could be in a function that is only called in certain regimes. Memory fragmentation increasing at each time step is another possibility, though I have encountered only one case where running out of memory was definitely due to this, and it was on a different type of architecture.

Do you get the same error on other machines? The leak could also be in the cluster's MPI libraries, for example. Since Athos is being retired at the end of this month, results on other machines might be more relevant.

To use another type of "external" instrumentation, I am running the case with version 5.3.2 on a laptop with 2 MPI ranks, and I see no change in the memory values reported by "top" after about 120 iterations; I'll let it run a bit longer. I detected no issue with gcc's AddressSanitizer, Valgrind's leak-check, or the CS_MEM_LOG environment variable, so I am running out of ideas for further tests. (I am, however, testing under a different Linux distribution with recent tool versions. The probability that the issue depends on this is low, except as regards MPI drivers, which can be capricious on some HPC systems.)
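The "external" check described above (watching memory in `top` while the case runs) could be automated with a short script. This is a Linux-only sketch and an assumption about how one might do it, not a tool used in the thread. It reads the resident set size (`VmRSS`, the quantity `top` shows as RES) from `/proc/<pid>/status` at regular intervals and fits a least-squares slope. A slope that stays positive over a long run would point to a leak or to fragmentation. The helper names are hypothetical, and for an MPI run each rank's PID would need to be monitored separately.

```python
import os
import time


def read_vmrss_kib(pid):
    """Read the resident set size (VmRSS, reported in kB) of a process from /proc."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    raise RuntimeError("VmRSS not found")


def growth_rate(samples):
    """Least-squares slope of (time, rss) samples, in kB per time unit."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_r = sum(r for _, r in samples) / n
    num = sum((t - mean_t) * (r - mean_r) for t, r in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    return num / den


def sample_rss(pid, n=10, interval=1.0):
    """Collect n (elapsed_seconds, VmRSS) samples, interval seconds apart."""
    samples = []
    start = time.monotonic()
    for _ in range(n):
        samples.append((time.monotonic() - start, read_vmrss_kib(pid)))
        time.sleep(interval)
    return samples


if __name__ == "__main__" and os.path.exists("/proc/self/status"):
    pid = os.getpid()  # replace with the PID of a solver rank
    slope = growth_rate(sample_rss(pid, n=3, interval=0.1))
    print(f"RSS growth: {slope:.1f} kB/s")
```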

Best regards,

Yvan
konst
Posts: 30
Joined: Sun Sep 17, 2017 7:41 pm

Re: Exceeded memory limit

Post by konst »

Yvan, thank you for your help! You were right: it was a problem with ATHOS. I ran this case on EOLE and it works really well.
Thanks again.

Have a nice weekend,
Konstantin