SIGTERM signal in ALE calculation

Posted: Fri Apr 01, 2022 2:30 pm
by Roxan
Dear all,

I ran a calculation using the routine "cs_user_boundary_conditions_ale.f90", which allows me to move my mesh.
An error that does not seem to be directly related to my setup stops the calculation after a while, and I can't identify why. It appears at seemingly random times, and I don't notice anything unusual at that point, either in the behaviour of the fluid or in that of the mesh. In addition, the calculation converges appropriately.

Here is the information I have on the error that stops the calculation:
solver script exited with status 137.

Error running the calculation.

Check code_saturne log (listing) and error* files for details.

Error in calculation stage.
Parallel code_saturne on 12 processes.

Preprocessing calculation
-------------------------
Starting calculation
--------------------
--------------------------------------------------------------------------
mpiexec noticed that process rank 11 with PID 21217 on node node113 exited on signal 9 (Killed).
--------------------------------------------------------------------------
Post-calculation operations
---------------------------
SIGTERM signal (termination) received.
--> computation interrupted by environment.

Call stack:
1: 0x2b0157aa98e3 <+0x1578e3> (libopen-pal.so.20)
2: 0x2b015798fb39 <opal_progress+0xb9> (libopen-pal.so.20)
3: 0x2b015457254d <mca_pml_ucx_recv+0xdd> (libmpi.so.20)
4: 0x2b01544adeac <ompi_coll_base_allreduce_intra_recursivedoubling+0x4dc> (libmpi.so.20)
5: 0x2b01544779d3 <PMPI_Allreduce+0x173> (libmpi.so.20)
6: 0x2b01519a71ab <cs_gdot+0x4b> (libsaturne-7.0.so)
7: 0x2b015168feaa <cs_equation_iterative_solve_vector+0xb8a> (libsaturne-7.0.so)
8: 0x2b0151658dd2 <+0xffdd2> (libsaturne-7.0.so)
9: 0x2b0151774404 <navstv_+0x48a2> (libsaturne-7.0.so)
10: 0x2b01517a098a <tridim_+0x370b> (libsaturne-7.0.so)
11: 0x2b0151612d8b <caltri_+0x1c7b> (libsaturne-7.0.so)
12: 0x2b015082cddb <main+0x6eb> (libcs_solver-7.0.so)
13: 0x2b0155fc8555 <__libc_start_main+0xf5> (libc.so.6)
14: 0x401b49 <> (cs_solver)
End of stack

If you have any idea where the problem comes from and how to fix it, that would be very helpful.

Best regards,
Roxan

Re: SIGTERM signal in ALE calculation

Posted: Fri Apr 01, 2022 3:54 pm
by Yvan Fournier
Hello,

What do the error* files say (see the forum recommendations for the list of other recommended info)?

Regards,

Yvan

Re: SIGTERM signal in ALE calculation

Posted: Mon Apr 04, 2022 9:54 am
by Roxan
Hello,

I only have one error file, containing the "SIGTERM signal" error, and I can't find my problem in the user guide or in other forum topics. The error only appears in the run_solver.log of one processor.

In my output I can see that the calculation stops with status 137, but I don't know what that means.

Regards,

Roxan

Re: SIGTERM signal in ALE calculation

Posted: Wed Apr 06, 2022 11:23 am
by Yvan Fournier
Hello,

SIGTERM means killed by the environment, such as when hitting CTRL+C. It could also happen in some cases if you run out of allocated time. Here it is surprising.

Are you running on a production build or a build configured with --enable-debug?

Best regards,

Yvan
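
On the "status 137" question raised above: POSIX shells conventionally report a process killed by signal N with exit status 128 + N, so 137 corresponds to signal 9 (SIGKILL). This matches the mpiexec message that rank 11 "exited on signal 9 (Killed)". The minimal sketch below decodes such a status. The helper name decode_exit_status is hypothetical, and the sketch assumes a POSIX system (Linux signal numbering) and a launcher that follows the 128 + N convention. It does not identify what sent the signal.

```python
import signal


def decode_exit_status(status: int) -> str:
    """Interpret a shell-style exit status (hypothetical helper).

    Assumes the POSIX shell convention: a status above 128 means the
    process was terminated by signal number (status - 128).
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform
            name = "unknown signal"
        return f"terminated by signal {signum} ({name})"
    return f"exited normally with code {status}"


# Status reported by the solver script in this thread
print(decode_exit_status(137))
```

If this reading is right, the SIGTERM trace in the error file would likely be a consequence rather than the cause: once one rank is killed, mpiexec normally terminates the remaining ranks, which then report SIGTERM. Finding out why rank 11 received SIGKILL would need information from outside the solver (system or batch scheduler logs), which this thread does not provide.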