Error with dynamic Smagorinsky-Lilly model
Posted: Wed Sep 23, 2015 6:04 pm
by AndrewH
Hello,
I have been receiving the following error while running several simulations with the dynamic Smagorinsky-Lilly model:
SIGFPE signal (floating point exception) intercepted!
Call stack:
1: 0x2aaaab04cf93 <cs_les_filter+0x1223> (libsaturne.so.0)
2: 0x2aaaab07c04a <visdyn_+0x1116> (libsaturne.so.0)
3: 0x2aaaaae48654 <phyvar_+0xb28> (libsaturne.so.0)
4: 0x2aaaaae6cb4e <tridim_+0x1032> (libsaturne.so.0)
5: 0x2aaaaad55efc <caltri_+0x2dbc> (libsaturne.so.0)
6: 0x2aaaaad387b2 <cs_run+0x3f2> (libsaturne.so.0)
7: 0x2aaaaad38241 <main+0x111> (libsaturne.so.0)
8: 0x2aaaae9bcc36 <__libc_start_main+0xe6> (libc.so.6)
9: 0x403989 <> (cs_solver)
End of stack
The error seems to occur only when Code_Saturne saves a checkpoint. There doesn't appear to be anything wrong with my setup and I have no problems when using the WALE model. Looking at the call stack, it appears that the filter function is causing the problem, but I can't find an obvious source in the function. Is this a known problem? Attached is my listing file if it proves useful.
Thank you,
Andrew
Re: Error with dynamic Smagorinsky-Lilly model
Posted: Wed Sep 23, 2015 10:14 pm
by Yvan Fournier
Hello,
Did you change any options between the original and the restarted computation? It seems the number of required sweeps is different, which may be due either to a bug or to a non-recommended change in your settings.
Also, according to the backtrace, the error does not occur during checkpointing itself (though if it only occurs when checkpointing, there may be a memory overwrite issue; if it only occurs when reading a restart, it may be that some additional data should be read for a "smooth" restart).
Do you have any comparisons with a similar number of time steps when not restarting? For example, simply checking the evolution of field minima/maxima in the 10-20 iterations before and after the restart may help show what is wrong here.
Best regards,
Yvan
Re: Error with dynamic Smagorinsky-Lilly model
Posted: Thu Sep 24, 2015 11:25 am
by AndrewH
Hi Yvan,
I changed nothing between my original computation and my restart computation. I'm also getting the same problem with three different cases/meshes with different setups. If I run my case without any intermediate checkpoint save, I still get an error when it saves the final checkpoint at the end, after the computation has run 30474 time iterations.
I will try to make a spreadsheet of the max/min before and after the restart.
Thank you,
Andrew
Re: Error with dynamic Smagorinsky-Lilly model
Posted: Thu Sep 24, 2015 12:36 pm
by AndrewH
20150910-1519 -> no intermediate checkpoints, but crashes when the final checkpoint is saved, after 30474 time iterations
20150919-1254 -> checkpoint saved every 500 iterations, crashes during a checkpoint save
20150924-1110 -> restart of 20150924-1110
For 20150919-1254, no statistics were output for iteration 34634. The included Excel sheet gives the max/min/avg of velocity and pressure for each iteration. There doesn't appear to be any major jump in the statistics across the restart.
Thank you,
Andrew
Re: Error with dynamic Smagorinsky-Lilly model
Posted: Sun Sep 27, 2015 12:53 pm
by Yvan Fournier
Hello,
According to your logs, the crash seems to occur when computing the LES filter in the dynamic viscosity computation, due to a floating-point exception.
Which compiler and build options did you use (if Charles installed the code on the Cray, I can check with him)?
Floating-point exceptions should be caught in debug builds, not in production builds, though our tests may not be perfect. You might have a real crash, or simply an underflow, or even a branch prediction/optimizer issue (i.e. when doing something like "if abs(x) > epsilon, compute a/x, else compute something else", the compiler computes both branches to save time, and the unused branch causes the crash). The latter would explain why the code runs fine and then suddenly crashes, and in that case making sure floating-point exceptions are not trapped should suffice.
So assuming there is no "worse" bug, the compiler and build options are the info I need to provide a patch to try.
Regards,
Yvan
Re: Error with dynamic Smagorinsky-Lilly model
Posted: Sun Sep 27, 2015 4:17 pm
by AndrewH
Hello Yvan,
I built my version of Code_Saturne (v4.0.1) with the default gcc-4.9.2 compiler available on ARCHER (PrgEnv-gnu module) with cray-tpsl/1.4.1. I compiled my own libraries for Python, Libxml2, HDF5, and CGNS, which were then used to build Code_Saturne. Additionally, I used the following configure options: --disable-rpath --disable-dlloader --disable-sockets --disable-mei --disable-openmp --with-metis --with-scotch --with-mpi --enable-long-gnum --without-salome --disable-gui --with-modules=no CC=cc FC=gfortran CXX=CC.
Thank you,
Andrew
Re: Error with dynamic Smagorinsky-Lilly model
Posted: Sun Sep 27, 2015 11:38 pm
by Yvan Fournier
Hello Andrew,
Hoping, as I explained before, that the issue is simply a speculation-related issue, could you try adding the attached file to your user subroutines?
Regards,
Yvan
Re: Error with dynamic Smagorinsky-Lilly model
Posted: Fri Oct 02, 2015 5:11 pm
by AndrewH
Hi Yvan,
I ran my simulation several times with the fix and it hasn't crashed yet. I'll let you know if this changes.
Thank you,
Andrew