restart problems with v7

daniele
Posts: 148
Joined: Wed Feb 01, 2017 11:42 am

Post by daniele »

Hello,

Has anyone had problems with restarts in v7?
I cannot manage to restart a simulation. The previous simulation was completed with no errors.
The error looks like a memory issue; could it be due to the environment rather than to the code itself?

Code:

     READING THE MAIN RESTART FILE

 Start reading
 Reading dimensions complete
  Reading the previous time step number (restarting computation)  NTPABS =          100
  Reading the previous time step number (restarting computation)  TTPABS =   0.2000E-01
 Reading options complete
  Read variables from restart: restart/main.csc
 Reading complete

         READING THE AUXILIARY RESTART FILE             



Memory allocation summary
-------------------------

Theoretical current allocated memory:   0 kB
Maximum program memory measure:         1240184 kB
Current program memory measure:         1240184 kB



System error: Cannot allocate memory

../../../src/base/cs_io.c:2167: Fatal error.

Failure to reallocate "inp->buffer" (4611686018427387904 bytes)


Call stack:
   1: 0x2b2c566c5225 <bft_mem_realloc+0x285>          (libsaturne-7.0.so)
   2: 0x2b2c560b702a <cs_io_read_header+0x30a>        (libsaturne-7.0.so)
   3: 0x2b2c560b7821 <cs_io_initialize_with_index+0x181> (libsaturne-7.0.so)
   4: 0x2b2c55fb7aff <cs_restart_create+0x47f>        (libsaturne-7.0.so)
   5: 0x2b2c566a6e73 <__cs_c_bindings_MOD_restart_create+0x1f4> (libsaturne-7.0.so)
   6: 0x2b2c56029f80 <lecamx_+0xf5>                   (libsaturne-7.0.so)
   7: 0x2b2c56028dcd <lecamo_+0x87>                   (libsaturne-7.0.so)
   8: 0x2b2c55ed751c <caltri_+0x107a>                 (libsaturne-7.0.so)
   9: 0x2b2c55c0fe3e <main+0x6ce>                     (libcs_solver-7.0.so)
  10: 0x2b2c5aa44555 <__libc_start_main+0xf5>         (libc.so.6)
  11: 0x402709     <>                               (cs_solver)
End of stack


Thank you very much.
Kind regards,
Daniele
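
A quick arithmetic check on the numbers in the log above may help in reading the error. This is only a sketch: the two values are copied from the log, and the helper name `describe_request` is made up for illustration. It shows that the requested size is exactly 2^62 bytes, far beyond the memory the solver was measured to use, which is at least consistent with a bad size value being read from the restart file header rather than with the node genuinely running out of memory. That reading is an assumption, not a diagnosis.

```python
# Values copied from the log above.
requested_bytes = 4611686018427387904  # "Failure to reallocate" size
measured_kb = 1240184                  # "Maximum program memory measure"


def describe_request(n_bytes):
    """Return (is_power_of_two, exponent_or_None, hex_string) for a size."""
    is_pow2 = n_bytes > 0 and (n_bytes & (n_bytes - 1)) == 0
    exponent = n_bytes.bit_length() - 1 if is_pow2 else None
    return is_pow2, exponent, hex(n_bytes)


is_pow2, exponent, as_hex = describe_request(requested_bytes)

# How many times larger the request is than the measured peak usage.
ratio = requested_bytes / (measured_kb * 1024)

print(f"power of two: {is_pow2}, exponent: {exponent}, hex: {as_hex}")
print(f"request / measured peak: {ratio:.3e}")
```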
Yvan Fournier
Posts: 4070
Joined: Mon Feb 20, 2012 3:25 pm

Re: restart problems with v7

Post by Yvan Fournier »

Hello,

This is probably related to a specific option, as we have not encountered this bug so far, to my knowledge.

Could you send me a small test case for verification and possibly debugging?

Best regards,

Yvan
Luciano Garelli
Posts: 280
Joined: Fri Dec 04, 2015 1:42 pm

Re: restart problems with v7

Post by Luciano Garelli »

Hello,

I have faced the same issue as you on one of our clusters with CS 6. I think the problem occurs while the restart files are being written.

The solution that I found was to change the input/output method, from default to serial I/O.
[Attached screenshot: Captura de pantalla de 2021-12-09 09-16-31.png]
Regards,
Luciano
daniele
Posts: 148
Joined: Wed Feb 01, 2017 11:42 am

Re: restart problems with v7

Post by daniele »

Hello Luciano,

I indeed saw (before creating this new subject) your post on the forum related to your restart issues.
I misunderstood the solution you found: I tried changing the read method, and not the write method. I have now run a test changing both methods, and the restart seems to work correctly!

Thank you for your help!
Kind regards,
Daniele
Yvan Fournier
Posts: 4070
Joined: Mon Feb 20, 2012 3:25 pm

Re: restart problems with v7

Post by Yvan Fournier »

Hello,

Do you have more info on the MPI library and file system on which you encountered this issue?
Also, if you have a small test case on which I could try to reproduce this (in case it is generic and not related to a single system), that would be of interest.

Thanks,

Yvan
daniele
Posts: 148
Joined: Wed Feb 01, 2017 11:42 am

Re: restart problems with v7

Post by daniele »

Sorry Yvan, I missed your last post on this topic, which is why I did not reply sooner.
I will try to collect the information about our MPI environment and also try to build a small test case (the current one is too big to be shared).

Thank you (and happy new year!),
Best regards,
Daniele
daniele
Posts: 148
Joined: Wed Feb 01, 2017 11:42 am

Re: restart problems with v7

Post by daniele »

Hello,

I have collected some information about my system. I am not an expert on these aspects, so if you need further details, do not hesitate to ask:


…/modulefiles/tools/openmpi/2.1.1-ucx:

module load ucx/1.5.1
prereq ucx/1.5.1
module-whatis Implementation mpi
conflict openmpi
setenv MPI_ROOT /…/openmpi/2.1.1-ucx
setenv MPIHOME /…/openmpi/2.1.1-ucx
setenv OMPI_MCA_btl_base_warn_component_unused 0
prepend-path PATH /…/openmpi/2.1.1-ucx/bin
prepend-path LD_LIBRARY_PATH /…/openmpi/2.1.1-ucx/lib
prepend-path MANPATH /…/openmpi/2.1.1-ucx/share/man
-------------------------------------------------------------------

$module show ucx/1.5.1
-------------------------------------------------------------------
…/modulefiles/tools/ucx/1.5.1:

module-whatis Support ucx pour openmpi
prepend-path PATH /…/ucx/1.5.1/bin
prepend-path LD_LIBRARY_PATH /…/ucx/1.5.1/lib
-------------------------------------------------------------------


I will try to build a small test case as well.

Thanks.
Kind regards,
Daniele
Yvan Fournier
Posts: 4070
Joined: Mon Feb 20, 2012 3:25 pm

Re: restart problems with v7

Post by Yvan Fournier »

Hello,

Thanks for the info. At the beginning of your "listing"/run_solver.log file, there is also system info. Could you provide that (editing out, if you choose, the line with your login name, which I do not need)?

Best regards,

Yvan
daniele
Posts: 148
Joined: Wed Feb 01, 2017 11:42 am

Re: restart problems with v7

Post by daniele »

Hello,

Here are the lines inside run_solver.log:

Code:

Local case configuration:

  Date:                …
  System:              Linux 3.10.0-1127.19.1.el7.x86_64
  Machine:             node112
  Processor:           model name	: Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz
  Memory:              191887 MB
  User:                ...
  Directory:           …
  MPI ranks:           80 (appnum attribute: 0)
  MPI ranks per node:  40
  OpenMP threads:      1
  Processors/node:     20

  Compilers used for build:
    C compiler:        gcc (GCC) 5.5.0
    C++ compiler:      g++ (GCC) 5.5.0
    Fortran compiler:  GNU Fortran (GCC) 5.5.0

  MPI version 3.1 (Open MPI 2.1.1)

  I/O read method:     standard input and output, serial access
  I/O write method:    standard input and output, serial access
  I/O rank step:        1

  External libraries for partitioning:
    ParMETIS 4.0.3
    SCOTCH 6.1.0
Hope this answers your question.
Kind regards,
Daniele
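
To confirm which I/O methods a given run actually used, the header lines quoted above can be checked with a small script. This is a minimal sketch: the function name `io_methods` is hypothetical, and it assumes the header keeps the "I/O read method:" / "I/O write method:" wording shown in this log. Here the text is embedded directly; in practice it would be read from run_solver.log.

```python
import re

# Excerpt of the run_solver.log header quoted above.
LOG_HEADER = """\
  MPI version 3.1 (Open MPI 2.1.1)

  I/O read method:     standard input and output, serial access
  I/O write method:    standard input and output, serial access
  I/O rank step:        1
"""


def io_methods(log_text):
    """Map 'read'/'write' to the method text found on the matching log lines."""
    pattern = re.compile(
        r"^[ \t]*I/O (read|write) method:[ \t]*(.+?)[ \t]*$", re.MULTILINE
    )
    return {mode: method for mode, method in pattern.findall(log_text)}


methods = io_methods(LOG_HEADER)

# True only if both the read and the write method report serial access.
both_serial = all(
    "serial access" in methods.get(mode, "") for mode in ("read", "write")
)

print(methods)
print("both serial:", both_serial)
```

For this log, both entries report serial access, which matches the workaround discussed earlier in the thread (changing both the read and the write method).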
Yvan Fournier
Posts: 4070
Joined: Mon Feb 20, 2012 3:25 pm

Re: restart problems with v7

Post by Yvan Fournier »

Hello,

Yes, this is what I wanted to check. OpenMPI 2.1 is quite ancient, though at the time we used it, I did not encounter I/O problems (these can also depend on the filesystem configuration, so they are difficult to reproduce on another system).

I see that your system reports 20 processors per node, but you are using 40 ranks per node. The reporting/detection might be wrong, but otherwise I would expect you to get better performance using only as many ranks per node as there are physical processors. Unless perhaps hyperthreading comes into play? I'm interested in feedback here too.

Best regards,

Yvan
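
The comparison above can be written as a small check. This is only a sketch using the two numbers from the post (40 ranks per node, 20 reported processors per node); the function name `ranks_per_processor` is hypothetical, and the result says nothing about whether the reported processor count is itself correct.

```python
def ranks_per_processor(ranks_per_node, processors_per_node):
    """Return how many MPI ranks share each reported processor on a node."""
    if processors_per_node <= 0:
        raise ValueError("processors_per_node must be positive")
    return ranks_per_node / processors_per_node


# Numbers taken from the post above.
ratio = ranks_per_processor(ranks_per_node=40, processors_per_node=20)

# More than one rank per reported processor suggests oversubscription,
# unless the detection is wrong or hyperthreading is in use.
oversubscribed = ratio > 1

print(f"ranks per reported processor: {ratio}, oversubscribed: {oversubscribed}")
```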