Restart problem

Post by Luciano Garelli » Tue Aug 13, 2019 8:31 pm

Hello,

I'm having trouble restarting a simulation from a previous one. When I select the checkpoint directory, I don't get the information about the last time step and physical time. When the new simulation is launched, it freezes after the mesh-reading stage. The previous simulation finished correctly, without errors.

The simulation was run with CS 6-beta, and I used checkpoint_time_step in the control_file to force a checkpoint at a specific time step.
[Attachment: restart.png]
In the terminal I get the following error:
[Attachment: restart_1.png]
Is there any way to recover the data?

Regards,

Luciano

Post by Yvan Fournier » Wed Aug 14, 2019 12:50 am

Hello Luciano,

Could you run the dump tool on the main checkpoint file:

    code_saturne bdump checkpoint/main

and possibly on other files, to check whether any of them has been corrupted?
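
As a rough sketch, the check above could be repeated over every file in the checkpoint directory with a small shell function. The function name `dump_checkpoint_files`, the `bdump_logs` directory, and the `DUMP_CMD` override are hypothetical names introduced here for illustration. The script also assumes that the dump tool returns a non-zero exit status when it cannot read a file, which is not confirmed in this thread, so the logs are worth inspecting even when a file is reported as OK.

```shell
#!/bin/sh
# Sketch: run the dump tool on each regular file of a checkpoint directory
# and keep one log per file.
# Usage: dump_checkpoint_files [CHECKPOINT_DIR] [LOG_DIR]
# DUMP_CMD is a hypothetical override; it defaults to "code_saturne bdump".
dump_checkpoint_files() {
    ckpt_dir=${1:-checkpoint}
    log_dir=${2:-bdump_logs}
    dump_cmd=${DUMP_CMD:-code_saturne bdump}
    failed=0

    mkdir -p "$log_dir" || return 2

    for f in "$ckpt_dir"/*; do
        # Skip sub-directories and the unexpanded pattern of an empty directory.
        [ -f "$f" ] || continue
        name=$(basename "$f")
        # $dump_cmd is left unquoted on purpose so that it splits into
        # the command and its sub-command.
        if $dump_cmd "$f" > "$log_dir/$name.log" 2>&1; then
            echo "OK      $f"
        else
            # Assumption: a non-zero exit status means the file is unreadable.
            echo "FAILED  $f (see $log_dir/$name.log)"
            failed=1
        fi
    done

    return "$failed"
}
```

After sourcing the file, calling `dump_checkpoint_files checkpoint` from the run directory writes one log per checkpoint file into `bdump_logs`.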

Are you using MPI-IO? If so, you could try deactivating it.
Another possible cause for this (besides a bug leading to memory corruption)
would be a large file combined with a build using --disable-long-gnum.

If some other issue is corrupting the memory, running with a debug build might catch an earlier error.

Best regards,

Yvan

Post by Luciano Garelli » Wed Aug 14, 2019 1:11 pm

Hello Yvan,

Thanks for the info; I didn't know about the bdump command. The mesh is not that big: 30M cells. I'm using MPI-IO by default.

From the listing:

    I/O read method:     collective MPI-IO (explicit offsets)
    I/O write method:    collective MPI-IO (explicit offsets)
    I/O rank step:        1

I attach the output of running the bdump command on the main file; I think the file is corrupted.

Thanks for your help.

UPDATE:
I did a test, changing the input/output method to standard I/O, serial, and now I can do the restart. So I guess the problem was with MPI-IO.

Regards,
Luciano
