Restart problem

Posted: Tue Aug 13, 2019 8:31 pm
by Luciano Garelli
Hello,

I'm having a problem restarting a simulation from a previous one. When I select the checkpoint directory, I don't get the information about the last time step and physical time. When the new simulation is launched, it freezes after the mesh-reading stage. The previous simulation finished correctly without errors.

The simulation was run with CS 6-beta, and I used checkpoint_time_step in the control_file to force a checkpoint at a specific time step.
restart.png
In the terminal I get the following error:
restart_1.png
Is there any way to recover the data?

Regards,

Luciano

Re: Restart problem

Posted: Wed Aug 14, 2019 12:50 am
by Yvan Fournier
Hello Luciano,

Could you run the dump tool on the main checkpoint:

Code:

code_saturne bdump checkpoint/main
and possibly on the other checkpoint files, to check whether they may have been corrupted?
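
To check several files in one pass, the command above can be wrapped in a loop. This is only a sketch: the function name, the output directory `bdump_out`, and the `DUMP_CMD` override are illustrative and not part of code_saturne, and it assumes that a failed dump returns a nonzero exit status, which may not hold for every kind of corruption.

```shell
#!/bin/sh
# Sketch: run the dump tool on every regular file in a checkpoint directory.
# DUMP_CMD is a hypothetical override for this sketch; by default the
# command quoted in the thread ("code_saturne bdump") is used.
dump_checkpoint_files() {
    dir=${1:-checkpoint}     # directory holding the checkpoint files
    out=${2:-bdump_out}      # where the text dumps are written (assumed name)
    mkdir -p "$out" || return 1
    status=0
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        name=$(basename "$f")
        # Word splitting of DUMP_CMD is intentional (command plus subcommand).
        if ${DUMP_CMD:-code_saturne bdump} "$f" > "$out/$name.txt" 2> "$out/$name.err"; then
            echo "OK      $f"
        else
            echo "FAILED  $f"
            status=1
        fi
    done
    return $status
}
```

Calling `dump_checkpoint_files checkpoint` from the results directory leaves one `.txt` dump and one `.err` file per checkpoint file. A file marked `FAILED`, or a dump that stops partway through, is a hint to look at that file more closely rather than proof of corruption.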

Are you using MPI-IO? If so, you could try deactivating it.
Another possible cause for this (besides a bug leading to memory corruption)
would be a large file combined with a build configured with --disable-long-gnum.

If some other issue is corrupting memory, running a debug build might catch an error earlier.

Best regards,

Yvan

Re: Restart problem

Posted: Wed Aug 14, 2019 1:11 pm
by Luciano Garelli
Hello Yvan,

Thanks for the info, I didn't know about the bdump command. The mesh is not that big, 30M cells. I'm using MPI-IO by default.

Code:

listing:
I/O read method:     collective MPI-IO (explicit offsets)
I/O write method:    collective MPI-IO (explicit offsets)
I/O rank step:        1
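
The same lines can be pulled out of a run's listing with grep, for example to compare the I/O method of the run that wrote the checkpoint with that of the restarted run. A minimal sketch, assuming the log file is named `listing` as in the excerpt above; the function name is illustrative only.

```shell
#!/bin/sh
# Sketch: print the I/O method lines from a solver listing.
# The default file name "listing" is taken from the excerpt in this thread.
show_io_settings() {
    listing=${1:-listing}
    grep -E 'I/O (read|write) method|I/O rank step' "$listing"
}
```

Running `show_io_settings path/to/listing` prints the read method, write method, and rank step lines if they are present, and returns a nonzero status if none of them is found.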
I attach the output of running the bdump command on the main file; I think the file is corrupted.

Thanks for your help.

UPDATE:
I ran a test changing the input/output method to standard I/O, serial, and now I can restart. So I guess the problem was with MPI-IO.

Regards,
Luciano