Hello,
I'm having a problem restarting a simulation from a previous one. When I select the checkpoint directory, I don't get the information about the last time step and physical time. When the new simulation is launched, it freezes after the mesh reading stage. The previous simulation finished correctly, without errors.
The simulation was run with CS 6-beta, and I used checkpoint_time_step in the control_file to force a checkpoint at a specific time step.
In the terminal I get the following error
Is there any way to recover the data?
Regards,
Luciano
Restart problem
Re: Restart problem
Hello Luciano,
Could you run the dump tool on the main checkpoint:
Code: Select all
code_saturne bdump checkpoint/main
and possibly on other files, to check whether they have been corrupted?
Are you using MPI-IO? If so, you may try deactivating it.
Another possible cause for this (besides a bug leading to memory corruption) would be a large file combined with a build using --disable-long-gnum.
If another issue is corrupting memory, running with a debug build might catch an earlier error.
Best regards,
Yvan
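The check suggested above can be extended to every file in the checkpoint directory. The following is only a minimal sketch: the function name check_checkpoint, the DUMP_CMD variable, and the bdump_logs directory are made up for illustration, and it assumes that code_saturne bdump returns a non-zero exit status when it fails to read a file, which is not confirmed in this thread. If that assumption does not hold, the log files have to be inspected by hand.

```shell
# Hypothetical override; defaults to the dump command quoted in this thread.
DUMP_CMD="${DUMP_CMD:-code_saturne bdump}"

# check_checkpoint DIR [LOGDIR]
# Runs the dump command on each regular file in DIR, stores its output
# in LOGDIR (default: bdump_logs), and reports which files failed.
check_checkpoint() {
    dir="$1"
    logdir="${2:-bdump_logs}"
    failed=0
    mkdir -p "$logdir"
    for f in "$dir"/*; do
        [ -f "$f" ] || continue                  # skip anything that is not a regular file
        log="$logdir/$(basename "$f").log"
        # $DUMP_CMD is left unquoted on purpose: it holds a command plus a sub-command.
        if $DUMP_CMD "$f" > "$log" 2>&1; then
            echo "OK      $f"
        else
            echo "FAILED  $f (see $log)"
            failed=1
        fi
    done
    return "$failed"
}
```

Calling `check_checkpoint checkpoint` from the run directory would then list each checkpoint file together with the status reported by the dump tool.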
Re: Restart problem
Hello Yvan,
Thanks for the info, I didn't know about the bdump command. The mesh is not that big, about 30M cells. I'm using MPI-IO by default.
Code: Select all
listing:
I/O read method: collective MPI-IO (explicit offsets)
I/O write method: collective MPI-IO (explicit offsets)
I/O rank step: 1
I attach the output from running the bdump command on the main file; I think the file is corrupted.
Thanks for your help.
UPDATE:
I ran a test after changing the input/output method to Standard I/O, serial, and now the restart works. So I guess the problem was with MPI-IO.
Regards,
Luciano
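As a side note on the --disable-long-gnum hypothesis, a rough order-of-magnitude check is easy to do. The sketch below assumes that such a build stores global counts as 32-bit integers and uses the signed limit as a conservative bound. The helper name fits_in_int32 and the figure of 9 values per cell are illustrative assumptions, not taken from the thread.

```shell
# Largest value of a signed 32-bit integer (conservative bound).
INT32_MAX=2147483647

# fits_in_int32 N: prints "yes" if N does not exceed INT32_MAX, "no" otherwise.
fits_in_int32() {
    if [ "$1" -le "$INT32_MAX" ]; then
        echo "yes"
    else
        echo "no"
    fi
}

n_cells=30000000              # about 30M cells, as stated in the thread
n_vals=$((n_cells * 9))       # assumed worst case: 9 values per cell
echo "$n_vals values: fits in 32 bits? $(fits_in_int32 "$n_vals")"
```

With these numbers the count stays well below the 32-bit limit, which is consistent with the problem coming from MPI-IO rather than from the mesh size.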