MPI_ABORT error in code-code coupling

Questions and remarks about code_saturne usage
Gaspard_Monge
Posts: 1
Joined: Tue Aug 16, 2022 2:50 pm

Post by Gaspard_Monge »

Dear Code Saturne Team
I'm currently working on coupling two fluid domains in code_saturne, using two embedded Cartesian meshes. One domain has coupled faces and the other has coupled cells. These are defined in the SRC/ directory through the cs_sat_coupling_define function. After a few successful runs, I started to get MPI_ABORT, but only when the coupled faces are not NULL (for domain2, see the attached files). The listing seems to indicate that one of the fields has an excessive or diverging value. I tried to investigate the issue with the debug (dbg) version, but I could not find the right option for MPI_ABORT issues. I also checked that I did not put inconsistent values in my log files, and I have not modified the routines related to coupling by faces (cscfbr.f90, cscpfbr.f90, csc2cl.f90).
Do you know a way to address MPI_ABORT-type errors with the dbg version in order to get more information on the coupling issue?
My Warmest Regards,
M.Monge
Attachments
domain2.png
domain1error.png
Yvan Fournier
Posts: 4070
Joined: Mon Feb 20, 2012 3:25 pm

Re: MPI_ABORT error in code-code coupling

Post by Yvan Fournier »

Hello,

The best approach here is to use the debug build and also run it under a debugger (see the advanced launch settings in the GUI). If you are not familiar with gdb, see here for a few basics: https://www.code-saturne.org/documentat ... gging.html. You will have one gdb window per MPI process, so at first, using 1 process per compute domain is simpler, even if the computation is a bit slower.

The first thing you could do when running a coupled (or parallel) case under a debugger would be to set a breakpoint on MPI_Abort. The code_saturne debugger wrapper has done this by default since v7.1 or 7.2, but earlier versions did not.
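
For older versions where the breakpoint is not set automatically, the gdb commands involved can be sketched as below. This is only an illustration: the helper name write_gdb_commands and the file name mpi_abort.gdb are hypothetical, and how such a file is combined with the code_saturne debugger wrapper is not covered here. The gdb commands themselves (break, run, backtrace, set breakpoint pending) are standard.

```python
from pathlib import Path

# Standard gdb commands: allow a breakpoint on a symbol from a shared
# library that is not loaded yet, stop on MPI_Abort, start the program,
# then print the call stack once the breakpoint is reached.
GDB_COMMANDS = """\
set breakpoint pending on
break MPI_Abort
run
backtrace
"""


def write_gdb_commands(path="mpi_abort.gdb"):
    """Write the gdb command file and return its path (hypothetical helper)."""
    target = Path(path)
    target.write_text(GDB_COMMANDS)
    return target


if __name__ == "__main__":
    print(write_gdb_commands())
```

With plain gdb, such a file can be loaded using gdb -x mpi_abort.gdb --args followed by the solver executable and its arguments; the backtrace then shows which check triggered the abort.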

In your case, the logs already indicate that you have a "runaway" computation, meaning the computation seems to diverge, and that is why it is stopped. In this case, the debugger might not be so useful, because the point where the problem is detected is at a check at the end of a time step, not at the source of the issue.

For runaway computations, you should still be able to visualize the computed results before the stop, so this may help you analyze what might be wrong in your setup (coupling/boundary conditions).
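
As a rough illustration of that kind of analysis, the sketch below scans a per-time-step history of a field's maximum absolute value and reports the first step where it jumps sharply or becomes non-finite. The function name, the growth threshold, and the sample values are all assumptions made for the example; they are not taken from code_saturne or from the logs in this thread.

```python
import math


def first_runaway_step(max_abs_by_step, growth_factor=10.0):
    """Return the index of the first step whose value is NaN/inf or exceeds
    growth_factor times the previous step's value; None if no such step."""
    for i in range(1, len(max_abs_by_step)):
        prev, cur = max_abs_by_step[i - 1], max_abs_by_step[i]
        if math.isnan(cur) or math.isinf(cur):
            return i
        if prev > 0.0 and cur > growth_factor * prev:
            return i
    return None


# Hypothetical per-step maxima of |velocity|, e.g. transcribed by hand
# from a run log; these numbers are made up for illustration.
history = [1.0, 1.1, 1.2, 1.3, 25.0, 4.0e3, 9.0e7]

if __name__ == "__main__":
    print(first_runaway_step(history))
```

The step found this way is usually earlier than the one at which the solver stops, so the output written just before it is the most useful to inspect near the coupled faces and boundaries.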

Best regards,

Yvan