SIGTERM signal in ALE calculation

Roxan
Posts: 5
Joined: Tue Mar 01, 2022 3:07 pm

SIGTERM signal in ALE calculation

Post by Roxan »

Dear all,

I am running a calculation that uses the routine "cs_user_boundary_conditions_ale.f90" to move my mesh.
An error that does not seem to be directly related to my setup stops the calculation after a while, and I can't identify why. The error appears at what seem to be random moments, and I don't notice anything particular at those moments, either in the behaviour of the fluid or in the mesh. In addition, the calculation converges appropriately.

Here is the information I have on the error that stops the calculation:
solver script exited with status 137.

Error running the calculation.

Check code_saturne log (listing) and error* files for details.

Error in calculation stage.
Parallel code_saturne on 12 processes.

Preprocessing calculation
-------------------------
Starting calculation
--------------------
--------------------------------------------------------------------------
mpiexec noticed that process rank 11 with PID 21217 on node node113 exited on signal 9 (Killed).
--------------------------------------------------------------------------
Post-calculation operations
---------------------------
SIGTERM signal (termination) received.
--> computation interrupted by environment.

Call stack:
1: 0x2b0157aa98e3 <+0x1578e3> (libopen-pal.so.20)
2: 0x2b015798fb39 <opal_progress+0xb9> (libopen-pal.so.20)
3: 0x2b015457254d <mca_pml_ucx_recv+0xdd> (libmpi.so.20)
4: 0x2b01544adeac <ompi_coll_base_allreduce_intra_recursivedoubling+0x4dc> (libmpi.so.20)
5: 0x2b01544779d3 <PMPI_Allreduce+0x173> (libmpi.so.20)
6: 0x2b01519a71ab <cs_gdot+0x4b> (libsaturne-7.0.so)
7: 0x2b015168feaa <cs_equation_iterative_solve_vector+0xb8a> (libsaturne-7.0.so)
8: 0x2b0151658dd2 <+0xffdd2> (libsaturne-7.0.so)
9: 0x2b0151774404 <navstv_+0x48a2> (libsaturne-7.0.so)
10: 0x2b01517a098a <tridim_+0x370b> (libsaturne-7.0.so)
11: 0x2b0151612d8b <caltri_+0x1c7b> (libsaturne-7.0.so)
12: 0x2b015082cddb <main+0x6eb> (libcs_solver-7.0.so)
13: 0x2b0155fc8555 <__libc_start_main+0xf5> (libc.so.6)
14: 0x401b49 <> (cs_solver)
End of stack

If you have any idea where the problem comes from and how to fix it, that would be very helpful.

Best regards,
Roxan
Yvan Fournier
Posts: 4070
Joined: Mon Feb 20, 2012 3:25 pm

Re: SIGTERM signal in ALE calculation

Post by Yvan Fournier »

Hello,

What do the error* files say (see the forum recommendations for the list of other recommended information)?

Regards,

Yvan
Roxan
Posts: 5
Joined: Tue Mar 01, 2022 3:07 pm

Re: SIGTERM signal in ALE calculation

Post by Roxan »

Hello,

I only have one error file, containing the "SIGTERM signal" error, and I can't find my problem in the user guide or in other forum topics. The error only appears in the run_solver.log of one process.

In my output I can see that the calculation stops with status 137, but I don't know what that means.
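
As a side note on reading that status: by the usual POSIX shell convention, an exit status above 128 means the process was terminated by signal number (status - 128), so 137 corresponds to signal 9 (SIGKILL). This agrees with the mpiexec line in the log ("exited on signal 9 (Killed)"). Below is a minimal sketch of that decoding. The helper name `describe_exit_status` is hypothetical (it is not part of code_saturne), and the signal names assume a Linux or other POSIX system.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status (hypothetical helper, POSIX convention)."""
    if status == 0:
        return "success"
    if status > 128:
        # Shells report "killed by signal N" as 128 + N.
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited with error code {status}"


if __name__ == "__main__":
    print(describe_exit_status(137))
```

The same helper applied to 143 would report signal 15 (SIGTERM), which is the signal the surviving processes report in the call stack above.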

Regards,

Roxan
Yvan Fournier
Posts: 4070
Joined: Mon Feb 20, 2012 3:25 pm

Re: SIGTERM signal in ALE calculation

Post by Yvan Fournier »

Hello,

SIGTERM means killed by the environment, such as when hitting CTRL+C. It could also happen in some cases if you run out of allocated time. Here it is surprising.
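
To illustrate the general mechanism (this is a sketch of standard POSIX behaviour, not of code_saturne's actual signal handling): a process can install a handler for SIGTERM and report the interruption before stopping, which is why a "SIGTERM signal received" message and a call stack can be written at all. SIGKILL (signal 9), by contrast, cannot be caught, so a process killed that way leaves no message of its own. The function name `demo_sigterm_vs_sigkill` is hypothetical, and the example assumes a POSIX system with the code running in the main thread.

```python
import os
import signal
import time


def demo_sigterm_vs_sigkill():
    """Show that SIGTERM can be intercepted while SIGKILL cannot (POSIX only)."""
    caught = []

    def handler(signum, frame):
        # A real solver would flush its logs or dump a call stack here.
        caught.append(signal.Signals(signum).name)

    previous = signal.signal(signal.SIGTERM, handler)
    try:
        os.kill(os.getpid(), signal.SIGTERM)  # send SIGTERM to ourselves
        for _ in range(100):  # give the interpreter time to run the handler
            if caught:
                break
            time.sleep(0.01)
    finally:
        # Restore the earlier behaviour (fall back to the default action).
        signal.signal(
            signal.SIGTERM,
            previous if previous is not None else signal.SIG_DFL,
        )

    try:
        signal.signal(signal.SIGKILL, handler)
        kill_catchable = True
    except OSError:
        kill_catchable = False  # the kernel refuses handlers for SIGKILL

    return caught, kill_catchable


if __name__ == "__main__":
    print(demo_sigterm_vs_sigkill())
```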

Are you running on a production build or a build configured with --enable-debug?

Best regards,

Yvan