Error with dynamic Smagorinsky-Lilly model

AndrewH
Posts: 47
Joined: Thu Oct 02, 2014 11:03 am

Error with dynamic Smagorinsky-Lilly model

Post by AndrewH »

Hello,

I have been receiving the following error while running several simulations with the dynamic Smagorinsky-Lilly model:

SIGFPE signal (floating point exception) intercepted!

Call stack:
1: 0x2aaaab04cf93 <cs_les_filter+0x1223> (libsaturne.so.0)
2: 0x2aaaab07c04a <visdyn_+0x1116> (libsaturne.so.0)
3: 0x2aaaaae48654 <phyvar_+0xb28> (libsaturne.so.0)
4: 0x2aaaaae6cb4e <tridim_+0x1032> (libsaturne.so.0)
5: 0x2aaaaad55efc <caltri_+0x2dbc> (libsaturne.so.0)
6: 0x2aaaaad387b2 <cs_run+0x3f2> (libsaturne.so.0)
7: 0x2aaaaad38241 <main+0x111> (libsaturne.so.0)
8: 0x2aaaae9bcc36 <__libc_start_main+0xe6> (libc.so.6)
9: 0x403989 <> (cs_solver)
End of stack

The error seems to occur only when Code_Saturne saves a checkpoint. There doesn't appear to be anything wrong with my setup and I have no problems when using the WALE model. Looking at the call stack, it appears that the filter function is causing the problem, but I can't find an obvious source in the function. Is this a known problem? Attached is my listing file if it proves useful.

Thank you,
Andrew
Attachments
listing file.zip
listing file
(144.72 KiB) Downloaded 463 times
Yvan Fournier
Posts: 4208
Joined: Mon Feb 20, 2012 3:25 pm

Re: Error with dynamic Smagorinsky-Lilly model

Post by Yvan Fournier »

Hello,

Did you change any options between the original and restarted computations? The number of required sweeps seems to differ, so this may be due either to a bug or to a non-recommended change in your settings.

Also, according to the backtrace, the error does not occur during checkpointing itself. If it only occurs when checkpointing, there may be a memory overwrite issue; if it only occurs when reading a restart, some additional data may need to be read for a "smooth" restart.

Do you have any comparisons with a similar number of time steps when not restarting? For example, simply checking the evolution of the field minima/maxima over the 10-20 iterations before and after the restart may help show what is wrong here.
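
A minimal sketch of that kind of check, assuming the per-iteration minima or maxima of one field have already been extracted from the listing into an array. The function name `largest_relative_jump` and the small floor used to avoid dividing by zero are hypothetical and not part of Code_Saturne:

```c
#include <stddef.h>

/* Scan a series of per-iteration values (e.g. the maximum of a velocity
   component) and return the index i >= 1 where the value changes the most
   relative to the previous iteration. Returns 0 if there are fewer than
   two entries. The size of that jump is written to *jump_out if non-NULL. */
size_t largest_relative_jump(const double *v, size_t n, double *jump_out)
{
  size_t i_max = 0;
  double jump_max = 0.0;

  for (size_t i = 1; i < n; i++) {
    double prev = (v[i-1] < 0.0) ? -v[i-1] : v[i-1];
    double diff = v[i] - v[i-1];
    if (diff < 0.0)
      diff = -diff;

    /* Hypothetical floor so a zero previous value does not divide by zero. */
    double scale = (prev > 1e-30) ? prev : 1e-30;
    double rel = diff / scale;

    if (rel > jump_max) {
      jump_max = rel;
      i_max = i;
    }
  }

  if (jump_out != NULL)
    *jump_out = jump_max;
  return i_max;
}
```

Running this on the values from the 10-20 iterations either side of the restart would point at the iteration where a field changes most abruptly, if any does.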

Best regards,

Yvan
AndrewH
Posts: 47
Joined: Thu Oct 02, 2014 11:03 am

Re: Error with dynamic Smagorinsky-Lilly model

Post by AndrewH »

Hi Yvan,

I changed nothing between my original computation and my restart computation. I am also getting the same problem with three different cases/meshes with different setups. If I run my case without any intermediate checkpoint save, I still get an error when it saves the final checkpoint at the end, after the computation has run 30474 time iterations.

I will try to make a spreadsheet of the max/min before and after the restart.

Thank you,
Andrew
AndrewH
Posts: 47
Joined: Thu Oct 02, 2014 11:03 am

Re: Error with dynamic Smagorinsky-Lilly model

Post by AndrewH »

20150910-1519 -> no intermediary save points, but crashes when the final checkpoint is saved, 30474 time iterations

20150919-1254 -> checkpoint save every 500 iterations, crashes during the checkpoint save

20150924-1110 -> restart of 20150924-1110

For 20150919-1254, no statistics were output for iteration 34634. The included Excel sheet gives the max/min/avg of velocity and pressure for each iteration. There doesn't appear to be any major jump in the statistics across the restart.

Thank you,
Andrew
Attachments
files.zip
files
(9.74 MiB) Downloaded 412 times
Yvan Fournier
Posts: 4208
Joined: Mon Feb 20, 2012 3:25 pm

Re: Error with dynamic Smagorinsky-Lilly model

Post by Yvan Fournier »

Hello,

According to your logs, the crash seems to occur when computing the LES filter in the dynamic viscosity computation, due to a floating-point exception.

Which compiler and build options did you use? (If Charles installed the code on the Cray, I can check with him.)

Floating-point exceptions should be trapped in debug builds, not in production builds, though our tests may not be perfect. You might have a real crash, or simply an underflow, or even a branch prediction optimizer issue (i.e. when doing something like "if abs(x) > epsilon, compute a/x, else compute something else", the compiler computes both branches to save time, and the unused branch causes the crash). The latter would explain why the code runs fine and then suddenly crashes; in that case, making sure floating-point exceptions are not trapped should suffice.
So, assuming there is no "worse" bug, the compiler and build options are the information I need to provide a patch to try.

Regards,

Yvan
AndrewH
Posts: 47
Joined: Thu Oct 02, 2014 11:03 am

Re: Error with dynamic Smagorinsky-Lilly model

Post by AndrewH »

Hello Yvan,

I built my version of Code_Saturne (v4.0.1) with the default gcc-4.9.2 compiler that is available on ARCHER (PrgEnv-gnu module) with cray-tpsl/1.4.1. I compiled my own libraries for Python, Libxml2, HDF5, and CGNS, which were then used to build Code_Saturne. Additionally, I used the following options: --disable-rpath --disable-dlloader --disable-sockets --disable-mei --disable-openmp --with-metis --with-scotch --with-mpi --enable-long-gnum --without-salome --disable-gui --with-modules=no CC=cc FC=gfortran CXX=CC.

Thank you,
Andrew
Yvan Fournier
Posts: 4208
Joined: Mon Feb 20, 2012 3:25 pm

Re: Error with dynamic Smagorinsky-Lilly model

Post by Yvan Fournier »

Hello Andrew,

Hoping, as I explained before, that the issue is simply speculation-related, could you try adding the attached file to your user subroutines?

Regards,

Yvan
Attachments
cs_fp_exception.c
(5.24 KiB) Downloaded 447 times
AndrewH
Posts: 47
Joined: Thu Oct 02, 2014 11:03 am

Re: Error with dynamic Smagorinsky-Lilly model

Post by AndrewH »

Hi Yvan,

I ran my simulation several times with the fix and it hasn't crashed yet. I'll let you know if this changes.

Thank you,
Andrew