Coupling Code_Saturne/Syrthes: SIGTERM signal (termination) received

Posted: Thu Nov 03, 2022 3:36 pm
by Kanssoune
Hello
I am trying to couple code_saturne and SYRTHES. But when running, I get the following error message at the "Starting calculation" stage:

Starting calculation
--------------------

--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI COMMUNICATOR 3 SPLIT FROM 0
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
[hotcell:124583] 1 more process has sent help message help-mpi-api.txt / mpi-abort
[hotcell:124583] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
solver script exited with status 1.

Error running the coupled calculation.

Either code_saturne or SYRTHES may have failed.

Check code_saturne log (listing) and SYRTHES log (syrthes.log)
for details, as well as error* files.

Post-calculation operations
---------------------------

Error in calculation stage.

In the fluid error file, I have this message:

SIGTERM signal (termination) received.
--> computation interrupted by environment.

Call stack:
1: 0x7f2fbe956027 <epoll_wait+0x57> (libc.so.6)
2: 0x7f2fa917ea29 <ucs_event_set_wait+0x99> (libucs.so.0)
3: 0x7f2fa95e23eb <uct_tcp_iface_progress+0x7b> (libuct.so.0)
4: 0x7f2fa983cada <ucp_worker_progress+0x2a> (libucp.so.0)
5: 0x7f2fbdb3bf94 <opal_progress+0x34> (libopen-pal.so.40)
6: 0x7f2fbdb429d5 <ompi_sync_wait_mt+0xb5> (libopen-pal.so.40)
7: 0x7f2fbf095659 <ompi_request_default_wait+0x1e9> (libmpi.so.40)
8: 0x7f2fbf0c6a58 <PMPI_Intercomm_create+0x3a8> (libmpi.so.40)
9: 0x7f2fc126cf0a <ple_coupling_mpi_intracomm_create+0xda> (libple.so.2)
10: 0x7f2fc16266b1 <cs_syr4_coupling_init_comm+0x141> (libsaturne-7.0.so)
11: 0x7f2fc1628d49 <cs_syr_coupling_all_init+0x739> (libsaturne-7.0.so)
12: 0x7f2fc25e853c <main+0x27c> (libcs_solver-7.0.so)
13: 0x7f2fbe860d85 <__libc_start_main+0xe5> (libc.so.6)
14: 0x40094e <_start+0x2e> (cs_solver)
End of stack
Attached are the listing files (for fluid and solid).

Does anyone have an idea how to get through this? Any help would be greatly appreciated.

Re: Coupling Code_Saturne/Syrthes: SIGTERM signal (termination) received

Posted: Thu Nov 03, 2022 9:56 pm
by Yvan Fournier
Hello,

I have seen a similar issue several times in the past, though I have not used Syrthes recently. If Syrthes fails, its logs can be limited... Do you have any other error message in the log? Could you run the "run_solver" script in /data/test_CS/RESU_COUPLING/20221103-1522 to see if you get another error log? In some cases, the Syrthes data file can contain a hidden command that gets in the way (set by the Syrthes GUI when checking the mesh or something of the sort).
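
If it helps, here is a minimal sketch of one way to rerun the script by hand while keeping everything it prints. It assumes "run_solver" is a bash script (the error format suggests so); the function name `rerun_solver` and the log file name are only illustrative, not part of code_saturne.

```python
import subprocess
from pathlib import Path


def rerun_solver(result_dir, script="run_solver", log_name="run_solver_manual.log"):
    """Run the generated script from inside result_dir and keep all of its output.

    Assumption: the script is a bash script, so it is launched through bash
    and does not need the execute bit.
    """
    result_dir = Path(result_dir)
    proc = subprocess.run(
        ["bash", script],
        cwd=result_dir,             # the script expects to run in the result directory
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # merge stderr so messages stay in order
        text=True,
    )
    # Save a copy next to the other logs (hypothetical file name).
    (result_dir / log_name).write_text(proc.stdout)
    return proc.returncode, proc.stdout
```

For example, `code, output = rerun_solver("/data/test_CS/RESU_COUPLING/20221103-1522")` returns the exit status and the combined output, so that any message printed before the MPI_ABORT banner is not lost.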

Also make sure you ask for multiple iterations on the Syrthes side.

Best regards,

Yvan

Re: Coupling Code_Saturne/Syrthes: SIGTERM signal (termination) received

Posted: Fri Nov 04, 2022 9:30 am
by Kanssoune
Hello Yvan!

Thank you for your quick reply!

I have no other error messages in the logs.

When I run the "run_solver" script in /data/test_CS/RESU_COUPLING/20221103-1522, it looks like "module" is not recognized:

./run_solver: line 8: module: command not found
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI COMMUNICATOR 3 SPLIT FROM 0
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
[hotcell:135298] 1 more process has sent help message help-mpi-api.txt / mpi-abort
[hotcell:135298] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

Any suggestions?
Thanks!

Kanssoune

Re: Coupling Code_Saturne/Syrthes: SIGTERM signal (termination) received

Posted: Sun Nov 06, 2022 1:36 am
by Yvan Fournier
Hello,

Can you post your "run_solver" script?

Best regards,

Yvan

Re: Coupling Code_Saturne/Syrthes: SIGTERM signal (termination) received

Posted: Wed Nov 16, 2022 9:03 am
by Kanssoune
Hello,

Sorry, I was sick and only read your message today.

The script is attached.

Best regards,

Kanssoune

Re: Coupling Code_Saturne/Syrthes: SIGTERM signal (termination) received

Posted: Mon Nov 21, 2022 3:40 pm
by Yvan Fournier
Hello,

Checking how the "run_solver" script is generated, and looking at your script, I believe the "module: command not found" warning can be ignored.
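
To see whether "module" is defined at all in the kind of shell that runs the script, a quick check along these lines can help. It is only a sketch: `module` is normally a shell function set up by the Environment Modules or Lmod init scripts, so a non-interactive bash may not have it even when your login shell does. The helper name `shell_command_available` is made up for this example.

```python
import subprocess


def shell_command_available(name, shell="bash"):
    """Return True if `name` resolves to a function, builtin, alias or
    executable in a fresh non-interactive shell."""
    proc = subprocess.run(
        [shell, "-c", f"type {name}"],   # `type` exits non-zero when name is unknown
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return proc.returncode == 0


if __name__ == "__main__":
    print("module available:", shell_command_available("module"))
```

If this prints False, the "command not found" message only tells you that the environment modules setup is not loaded in that shell; whether that matters depends on what the script tries to load at that line.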

So I still have the impression the solid domain is causing the issue. Could you post the solid setup, except for the large files (mesh, ...) ?

Best regards,

Yvan Fournier

Re: Coupling Code_Saturne/Syrthes: SIGTERM signal (termination) received

Posted: Tue Nov 22, 2022 11:27 am
by Kanssoune
Hello,

Thank you for your reply.

Some files are attached. If you need any other specific file, do not hesitate to ask.

Best regards,
Kanssoune

Re: Coupling Code_Saturne/Syrthes: SIGTERM signal (termination) received

Posted: Tue Nov 22, 2022 8:01 pm
by Yvan Fournier
Hello,

Could you also post your Syrthes installation setup file (setup.ini) and the syrthes.profile (in the "bin" directory of the Syrthes install), as well as the "config.log" file from the code_saturne installation?

Seeing how early the issue happens, I wonder whether you have an MPI library mismatch, and those files will help me check.
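
In the meantime, one rough way to look for such a mismatch yourself is to compare which MPI shared libraries each executable resolves to with `ldd`. The sketch below only parses `ldd` output; the helper names are illustrative, and the paths to the two executables are placeholders you would need to fill in for your installation.

```python
import subprocess


def mpi_libraries(ldd_output):
    """Map each MPI-looking library name in `ldd` output to its resolved path.

    A library reported as "not found" is mapped to None.
    """
    libs = {}
    for line in ldd_output.splitlines():
        parts = line.split()
        # Typical line: "libmpi.so.40 => /some/path/libmpi.so.40 (0x...)"
        if len(parts) >= 3 and parts[1] == "=>" and "mpi" in parts[0].lower():
            libs[parts[0]] = parts[2] if parts[2].startswith("/") else None
    return libs


def ldd(binary):
    """Return the raw text printed by `ldd` for one executable or library."""
    result = subprocess.run(["ldd", binary], capture_output=True, text=True, check=True)
    return result.stdout
```

Calling `mpi_libraries(ldd(path))` once for cs_solver and once for the Syrthes executable, then comparing the two dictionaries, should show whether both sides pick up the same libmpi. Different paths or versions would point to a mismatch; identical ones do not rule out other causes.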

Best regards,

Yvan

Re: Coupling Code_Saturne/Syrthes: SIGTERM signal (termination) received

Posted: Wed Nov 23, 2022 4:45 pm
by Kanssoune
Hello,

The requested files are attached.

Thanks for your help.

Best regards,

Kanssoune

Re: Coupling Code_Saturne/Syrthes: SIGTERM signal (termination) received

Posted: Thu Nov 24, 2022 4:10 pm
by Yvan Fournier
Hello,

The MPI version seems fine, so I do not see any version mismatch which could explain the issues here.

So back to the beginning...

If your test case is not too large, you can post it or send it to me so that I can see whether I can reproduce the issue.

Regards,

Yvan