ALE implicit coupling error with EBRSM

daniele
Posts: 148
Joined: Wed Feb 01, 2017 11:42 am

ALE implicit coupling error with EBRSM

Post by daniele »

Hello,

I always get the same error as soon as I activate the ALE implicit coupling in an EBRSM simulation:

Code: Select all

Call stack:
   1: 0x2ac8d66da748 <strdep_+0xdd3>                  (libsaturne-6.0.so)
   2: 0x2ac8d66e17af <tridim_+0x40df>                 (libsaturne-6.0.so)
   3: 0x2ac8d655ffd2 <caltri_+0x20f2>                 (libsaturne-6.0.so)
   4: 0x2ac8d629d835 <cs_run+0x5f5>                   (libcs_solver-6.0.so)
   5: 0x2ac8d629d105 <main+0x175>                     (libcs_solver-6.0.so)
   6: 0x2ac8dabf8b35 <__libc_start_main+0xf5>         (libc.so.6)
   7: 0x4016c9     <>                               (cs_solver)
End of stack
The issue seems to occur inside the strdep.f90 routine, which calculates the structure displacement.
The same ALE simulation runs fine with other turbulence models.
The error appears about 10 time steps after I activate the ALE implicit coupling module (following a fluid initialization without FSI). I do not see any inconsistent values (mesh displacement, etc.) before the error.
Could it be an incompatibility between the EBRSM and the implicit coupling?

Thank you very much in advance.
Best regards,
Daniele
Yvan Fournier
Posts: 4077
Joined: Mon Feb 20, 2012 3:25 pm

Re: ALE implicit coupling error with EBRSM

Post by Yvan Fournier »

Hello

I am not aware of any known incompatibility between ALE and EBRSM.

What type of crash do you have (as reported at the end of the "listing" or in the error_* files)? A floating-point exception, a memory error (SIGSEGV), something else?

Do you have a small test case you could share on which we could reproduce/debug this ?

Best regards,

Yvan
daniele
Posts: 148
Joined: Wed Feb 01, 2017 11:42 am

Re: ALE implicit coupling error with EBRSM

Post by daniele »

Hello Yvan,

Thank you for your answer.
I copied the error in my previous post, but maybe it was not uploaded...
I copy it here again:

Call stack:
1: 0x2ac8d66da748 <strdep_+0xdd3> (libsaturne-6.0.so)
2: 0x2ac8d66e17af <tridim_+0x40df> (libsaturne-6.0.so)
3: 0x2ac8d655ffd2 <caltri_+0x20f2> (libsaturne-6.0.so)
4: 0x2ac8d629d835 <cs_run+0x5f5> (libcs_solver-6.0.so)
5: 0x2ac8d629d105 <main+0x175> (libcs_solver-6.0.so)
6: 0x2ac8dabf8b35 <__libc_start_main+0xf5> (libc.so.6)
7: 0x4016c9 <> (cs_solver)
End of stack

It seems that the code gets stuck inside the strdep.f90 routine.
I do not have a small test case right now. I can try to create one and test it on my own first.
But since I get the error only with the EBRSM, I was wondering whether a variable specific to the EBRSM (used for calculating the force, for example) could cause the issue. It is not a very detailed analysis, I know...

Thank you.
Best regards,
Daniele
Yvan Fournier
Posts: 4077
Joined: Mon Feb 20, 2012 3:25 pm

Re: ALE implicit coupling error with EBRSM

Post by Yvan Fournier »

Hello,

Yes, I saw the call stack, but do you have any associated error message? Possibly in an error_r* file?

I do not think there is anything specific to EBRSM, so I would suspect a memory overwrite error somewhere. Or possibly a name collision. The rest of the run_solver.log may help here.
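As a rough illustration of checking the log for the crash type, here is a minimal Python sketch that scans the text of a log such as run_solver.log for lines naming a POSIX signal (SIGSEGV, SIGFPE, SIGTERM, ...). The function name `find_signal_lines` and the list of signal names are assumptions made for this example; they are not part of code_saturne, and the exact wording of the solver's messages may differ.

```python
import re

# Signal names to look for; this list is an assumption for the example,
# not an exhaustive list of what the solver may report.
SIGNAL_PATTERN = re.compile(r"\bSIG(?:SEGV|FPE|TERM|INT|ABRT|BUS)\b")


def find_signal_lines(log_text):
    """Return (line_number, signal_name, stripped_line) for each line naming a signal."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        match = SIGNAL_PATTERN.search(line)
        if match:
            hits.append((number, match.group(0), line.strip()))
    return hits
```

It could be used, for instance, as `find_signal_lines(open("run_solver.log", errors="replace").read())`. A SIGTERM on one rank usually only means that rank was stopped by the environment, so the rank reporting a different signal or message is the more interesting one.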

Best regards,

Yvan
daniele
Posts: 148
Joined: Wed Feb 01, 2017 11:42 am

Re: ALE implicit coupling error with EBRSM

Post by daniele »

Sorry, I had not understood what you meant.
The error file contains exactly the call stack shown in my previous post.
I had a look inside run_solver.log and actually found something more:

Code: Select all

SIGTERM signal (termination) received.
--> computation interrupted by environment.

Call stack:
   1: 0x2ab30488c983 <+0x156983>                      (libopen-pal.so.20)
   2: 0x2ab304773989 <opal_progress+0xb9>             (libopen-pal.so.20)
   3: 0x2ab303133435 <mca_pml_ob1_recv+0xf5>          (libmpi.so.20)
   4: 0x2ab30307e7ec <ompi_coll_base_allreduce_intra_recursivedoubling+0x4dc> (libmpi.so.20)
   5: 0x2ab303048313 <PMPI_Allreduce+0x173>           (libmpi.so.20)
   6: 0x2ab2ffd00172 <+0x3d4172>                      (libsaturne-6.0.so)
   7: 0x2ab2ffd08120 <cs_sles_it_solve+0x180>         (libsaturne-6.0.so)
   8: 0x2ab2ffced70c <cs_multigrid_solve+0xebc>       (libsaturne-6.0.so)
   9: 0x2ab2ffcee339 <+0x3c2339>                      (libsaturne-6.0.so)
  10: 0x2ab2ffd026a4 <+0x3d66a4>                      (libsaturne-6.0.so)
  11: 0x2ab2ffd08120 <cs_sles_it_solve+0x180>         (libsaturne-6.0.so)
  12: 0x2ab2ffcf371a <cs_sles_solve+0x2aa>            (libsaturne-6.0.so)
  13: 0x2ab2ffcf4bec <cs_sles_solve_native+0x3fc>     (libsaturne-6.0.so)
  14: 0x2ab300140d0b <__cs_c_bindings_MOD_sles_solve_native+0x19b> (libsaturne-6.0.so)
  15: 0x2ab2ffb62495 <resopv_+0xbfbe>                 (libsaturne-6.0.so)
  16: 0x2ab2ffb3dddc <navstv_+0x4e91>                 (libsaturne-6.0.so)
  17: 0x2ab2ffb6e1e0 <tridim_+0x3b10>                 (libsaturne-6.0.so)
  18: 0x2ab2ff9ecfd2 <caltri_+0x20f2>                 (libsaturne-6.0.so)
  19: 0x2ab2ff72a835 <cs_run+0x5f5>                   (libcs_solver-6.0.so)
  20: 0x2ab2ff72a105 <main+0x175>                     (libcs_solver-6.0.so)
  21: 0x2ab304085b35 <__libc_start_main+0xf5>         (libc.so.6)
  22: 0x401729     <>                               (cs_solver)
End of stack
Thank you very much.
Best regards,
Daniele
Yvan Fournier
Posts: 4077
Joined: Mon Feb 20, 2012 3:25 pm

Re: ALE implicit coupling error with EBRSM

Post by Yvan Fournier »

Hello,

What I meant is: are there any other files named "error_r*", that is, error files for ranks other than 0? When they are present, they contain more useful information.

If you have no such files, then you might have some error message in the terminal from which the code was run or, in the case of a batch system, in the job log and error files (this happens for example when the Fortran runtime detects an error).
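To make the check for per-rank error files concrete, here is a small Python sketch that collects every non-empty file whose name starts with "error" in a run directory. The helper name `collect_error_files` and the idea of passing the run directory as an argument are assumptions for this example; only the "error*" naming comes from the messages in this thread.

```python
from pathlib import Path


def collect_error_files(run_dir):
    """Map the name of each non-empty error* file in run_dir to its text content."""
    reports = {}
    for path in sorted(Path(run_dir).glob("error*")):
        # Skip directories and empty files, which carry no diagnostic.
        if path.is_file() and path.stat().st_size > 0:
            reports[path.name] = path.read_text(errors="replace")
    return reports
```

Calling it on the directory where the solver ran and printing each entry shows at a glance whether any rank other than 0 left its own message.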

Best regards,

Yvan
daniele
Posts: 148
Joined: Wed Feb 01, 2017 11:42 am

Re: ALE implicit coupling error with EBRSM

Post by daniele »

Hello,

Unfortunately, no error file was created besides the one that reports exactly the same error as the listing.
The output file of the batch job gives the following message (no error file was created for the job):

Code: Select all

**********************
  Starting calculation
 **********************

--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
 solver script exited with status 1.

Error running the calculation.

Check Code_Saturne log (listing) and error* files for details.


 *****************************
  Post-calculation operations
 *****************************

 Error in calculation stage.
This is probably not enough to help understand the problem... :(

Thank you.
Best regards,
Daniele
Yvan Fournier
Posts: 4077
Joined: Mon Feb 20, 2012 3:25 pm

Re: ALE implicit coupling error with EBRSM

Post by Yvan Fournier »

Hello,

If the case is not too large, running it in serial mode may help provide information about the crash.
Otherwise, a small test case would allow debugging.

Best regards,

Yvan