Coupling Code_Saturne/Syrthes: SIGTERM signal (termination) received
Hello
I am trying to couple code_saturne with SYRTHES. But when I run the case, I get the following error message at the "Starting calculation" stage:
Starting calculation
--------------------
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI COMMUNICATOR 3 SPLIT FROM 0
with errorcode 1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
[hotcell:124583] 1 more process has sent help message help-mpi-api.txt / mpi-abort
[hotcell:124583] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
solver script exited with status 1.
Error running the coupled calculation.
Either code_saturne or SYRTHES may have failed.
Check code_saturne log (listing) and SYRTHES log (syrthes.log)
for details, as well as error* files.
Post-calculation operations
---------------------------
Error in calculation stage.
In the fluid error file, I have this message:
SIGTERM signal (termination) received.
--> computation interrupted by environment.
Call stack:
1: 0x7f2fbe956027 <epoll_wait+0x57> (libc.so.6)
2: 0x7f2fa917ea29 <ucs_event_set_wait+0x99> (libucs.so.0)
3: 0x7f2fa95e23eb <uct_tcp_iface_progress+0x7b> (libuct.so.0)
4: 0x7f2fa983cada <ucp_worker_progress+0x2a> (libucp.so.0)
5: 0x7f2fbdb3bf94 <opal_progress+0x34> (libopen-pal.so.40)
6: 0x7f2fbdb429d5 <ompi_sync_wait_mt+0xb5> (libopen-pal.so.40)
7: 0x7f2fbf095659 <ompi_request_default_wait+0x1e9> (libmpi.so.40)
8: 0x7f2fbf0c6a58 <PMPI_Intercomm_create+0x3a8> (libmpi.so.40)
9: 0x7f2fc126cf0a <ple_coupling_mpi_intracomm_create+0xda> (libple.so.2)
10: 0x7f2fc16266b1 <cs_syr4_coupling_init_comm+0x141> (libsaturne-7.0.so)
11: 0x7f2fc1628d49 <cs_syr_coupling_all_init+0x739> (libsaturne-7.0.so)
12: 0x7f2fc25e853c <main+0x27c> (libcs_solver-7.0.so)
13: 0x7f2fbe860d85 <__libc_start_main+0xe5> (libc.so.6)
14: 0x40094e <_start+0x2e> (cs_solver)
End of stack
Attached are the listing files (for fluid and solid).
Does anyone have an idea how to get through this? Any help would be greatly appreciated.
Attachments:
- solid_listing.txt (3.11 KiB)
- fluid_listing.txt (10.23 KiB)
Re: Coupling Code_Saturne/Syrthes: SIGTERM signal (termination) received
Hello,
I have seen a similar issue several times in the past, though I have not used Syrthes recently. If Syrthes fails, the logs can be limited... Do you have any other error message in the log? Could you run the "run_solver" script in /data/test_CS/RESU_COUPLING/20221103-1522 to see if you get another error log? In some cases, the Syrthes data file can contain a hidden command that gets in the way (set by the Syrthes GUI when checking the mesh or something of the sort).
Also make sure you ask for multiple iterations on the Syrthes side.
Best regards,
Yvan
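When the logs are sparse, it can help to gather every suspicious line from the run directory in one pass. The following is a minimal sketch, not part of code_saturne or Syrthes: the file names (error*, listing, syrthes.log) come from the runtime message quoted above, while the function name and the keyword list are assumptions chosen for illustration.

```python
from pathlib import Path

# File names taken from the runtime message; the keyword list is an assumption.
LOG_PATTERNS = ("error*", "listing", "syrthes.log")
KEYWORDS = ("error", "abort", "sigterm", "sigsegv")


def collect_error_lines(run_dir):
    """Return (relative file, line number, text) for each suspicious log line."""
    hits = []
    root = Path(run_dir)
    for pattern in LOG_PATTERNS:
        # rglob also descends into per-domain subdirectories (fluid, solid).
        for path in sorted(root.rglob(pattern)):
            if not path.is_file():
                continue
            text = path.read_text(errors="replace")
            for number, line in enumerate(text.splitlines(), start=1):
                if any(word in line.lower() for word in KEYWORDS):
                    hits.append((str(path.relative_to(root)), number, line.strip()))
    return hits
```

Calling collect_error_lines on the result directory (here /data/test_CS/RESU_COUPLING/20221103-1522) and printing each tuple gives a quick overview of which domain reported a problem first.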
Re: Coupling Code_Saturne/Syrthes: SIGTERM signal (termination) received
Hello Yvan!
Thank you for your quick reply!
I have no other error messages in the logs.
When I run the "run_solver" script in /data/test_CS/RESU_COUPLING/20221103-1522, it looks like "module" is not recognized (the shell reports, in French, "line 8: module: command not found"):
./run_solver: ligne 8: module : commande introuvable
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI COMMUNICATOR 3 SPLIT FROM 0
with errorcode 1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
[hotcell:135298] 1 more process has sent help message help-mpi-api.txt / mpi-abort
[hotcell:135298] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
Any suggestions?
Thanks!
Kanssoune
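The "module" command is typically a shell function set up by an environment modules package in login shell startup files, so a script launched from a shell that has not sourced them may not find it. A small sketch, assuming nothing about the actual content of run_solver beyond the message above, lists the lines of a shell script that call "module", to show which environment setup the script expects. The function name is hypothetical.

```python
import re

# Matches a line whose first command word is `module` (e.g. "module load ...").
# Commented-out lines start with "#" and therefore do not match.
MODULE_CALL = re.compile(r"^\s*module\s+\S+")


def find_module_calls(script_text):
    """Return (line number, stripped line) for each `module` call in a shell script."""
    calls = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        if MODULE_CALL.match(line):
            calls.append((number, line.strip()))
    return calls
```

Reading the script with open("run_solver").read() and passing the text to find_module_calls shows which modules the generated script tries to load or purge.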
Re: Coupling Code_Saturne/Syrthes: SIGTERM signal (termination) received
Hello,
Can you post your "run_solver" script?
Best regards,
Yvan
Re: Coupling Code_Saturne/Syrthes: SIGTERM signal (termination) received
Hello,
Sorry, I was sick and only read your message today.
The script is attached.
Best regards,
Kanssoune
Attachments:
- run_solver.txt (1001 Bytes)
Re: Coupling Code_Saturne/Syrthes: SIGTERM signal (termination) received
Hello,
Checking how the "run_solver" script is generated, and looking at your script, I believe the warning message "module not found" can be ignored.
So I still have the impression the solid domain is causing the issue. Could you post the solid setup, except for the large files (mesh, ...)?
Best regards,
Yvan Fournier
Re: Coupling Code_Saturne/Syrthes: SIGTERM signal (termination) received
Hello,
Thanks for your reply.
I have attached some files. If you need any other specific file, do not hesitate to ask.
Best regards,
Kanssoune
Attachments:
- mesh_solide.syr_desc.txt (77 Bytes)
- mesh_solide.syr.txt (3.1 MiB)
- solid.syd.txt (3.13 KiB)
Re: Coupling Code_Saturne/Syrthes: SIGTERM signal (termination) received
Hello,
Could you also post your Syrthes installation setup file (setup.ini) and the syrthes.profile (in the "bin" directory of the Syrthes install), as well as the "config.log" file from the code_saturne installation?
Seeing how early the issue happens, I wonder whether you have an MPI library mismatch, and those files will help me check.
Best regards,
Yvan
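On Linux, one way to look for an MPI library mismatch between the two solvers is to compare the MPI shared libraries each executable resolves at run time, as reported by ldd. The sketch below is only an illustration of that idea under stated assumptions: the function names are hypothetical, and the paths to the cs_solver and Syrthes executables must be supplied by the user.

```python
import re
import subprocess

# Matches ldd lines such as "libmpi.so.40 => /path/libmpi.so.40 (0x...)"
# and also "libmpi.so.40 => not found".
LDD_LINE = re.compile(r"^\s*(libmpi\S*)\s+=>\s+(.+?)(?:\s+\(0x[0-9a-fA-F]+\))?\s*$")


def mpi_libs_from_ldd(ldd_output):
    """Map each libmpi* name in `ldd` output to the path it resolves to."""
    libs = {}
    for line in ldd_output.splitlines():
        match = LDD_LINE.match(line)
        if match:
            libs[match.group(1)] = match.group(2)
    return libs


def mpi_libs(executable):
    """Run `ldd` on an executable and return its MPI libraries (Linux only)."""
    result = subprocess.run(
        ["ldd", executable], capture_output=True, text=True, check=True
    )
    return mpi_libs_from_ldd(result.stdout)


def same_mpi(libs_a, libs_b):
    """True when both binaries resolve the same, non-empty set of MPI libraries."""
    return bool(libs_a) and libs_a == libs_b
```

If same_mpi(mpi_libs(path_to_cs_solver), mpi_libs(path_to_syrthes)) is False, the two codes were probably linked against, or pick up at run time, different MPI installations, which is worth ruling out before looking further.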
Re: Coupling Code_Saturne/Syrthes: SIGTERM signal (termination) received
Hello,
The requested files are attached.
Thanks for your help.
Best regards,
Kanssoune
Attachments:
- syrthes.profile.txt (3.2 KiB)
- setup.ini (4.31 KiB)
- config.log (199.52 KiB)
Re: Coupling Code_Saturne/Syrthes: SIGTERM signal (termination) received
Hello,
The MPI version seems fine, so I do not see any version mismatch which could explain the issues here.
So back to the beginning....
If your test case is not too large, you can post it or send it to me so that I can see if I can reproduce the issue.
Regards,
Yvan