Restart a simulation without checkpoint directory

Questions and remarks about code_saturne usage
Oscar
Posts: 25
Joined: Tue Aug 02, 2016 11:38 pm


Post by Oscar »

Hello,

I am running CS v4.0.5.

I want to restart my simulation from the last saved time step, as I am running on an HPC with time limits. Let's assume that the run I want to resume from is called "init_01"; I would then do something like this in DATA/cs_user_scripts.py:


    if domain.param is None:
        # Reuse the mesh and the checkpoint from the previous run "init_01"
        domain.mesh_input = "RESU/init_01/mesh_input"
        domain.partition_input = None
        domain.restart_input = "RESU/init_01/checkpoint"
However, I have no directory called "checkpoint" in the init_01 directory. Presumably I forgot to request it somewhere in the source code...
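
A guard like the following can avoid pointing restart_input at a directory that does not exist. This is only a sketch: the helper name latest_run_with_checkpoint is hypothetical (it is not part of code_saturne), and it assumes each run directory under RESU stores its checkpoint in a subdirectory named checkpoint, as in the snippet above.

```python
from pathlib import Path
from typing import Optional


def latest_run_with_checkpoint(resu_dir: str) -> Optional[Path]:
    """Return the most recently modified run directory under resu_dir
    that contains a 'checkpoint' subdirectory, or None if there is none.

    Hypothetical helper for illustration; not a code_saturne function.
    """
    root = Path(resu_dir)
    if not root.is_dir():
        return None
    # Keep only the run directories that actually hold a checkpoint
    candidates = [d for d in root.iterdir() if (d / "checkpoint").is_dir()]
    if not candidates:
        return None
    # Pick the run that was modified last
    return max(candidates, key=lambda d: d.stat().st_mtime)
```

In cs_user_scripts.py, the result could be checked before assigning domain.restart_input, leaving it unset when the function returns None.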

Is there a way to continue the simulation from the last time step that is contained in my init_01 despite this, so that I don't have to start over? What is the best practice for ensuring that you have a restart point in a simulation? I know that I could for instance save the following to a file called control_file in the result directory during runtime, but this seems a bit tedious and may be easy to forget to do...


checkpoint_wall_time_interval <wall time interval>
Edit: I forgot to mention that I also know that setting ntsuit > 0 in cs_user_parameters.f90 is the way to have checkpoint files saved periodically; however, I clearly forgot to do this...
Yvan Fournier
Posts: 4208
Joined: Mon Feb 20, 2012 3:25 pm


Post by Yvan Fournier »

Hello,

This is strange. Checkpoints are enabled by default, though they may be missing if the computation was not interrupted cleanly before creating one.

It is simpler to define checkpoint options using the GUI than with user subroutines.

Regards,

Yvan

Post by Oscar »

Even when I run with ntsuit=1 (which should save a checkpoint at each time step), the run fails to save a checkpoint directory in my RESU. Is there something else I need to do to ensure checkpointing is happening? I cannot use the GUI in my case. Please find attached my listing and source files for this case. Can you see where the problem might be from this?
Attachment: src_listing.zip (65.27 KiB)

Post by Yvan Fournier »

Hello,

ntsuit might be recomputed internally in some complex way.

Did you try with a "clean" stop (such as ntmabs = 10)?

Why can't you use the GUI? It should be installable on most machines (and libxml2, which is needed on the reader side, on all machines).

Regards,

Yvan

Post by Oscar »

Hi Yvan,

Thanks for your response. I just tried with ntmabs=10 and it is true that there is now a checkpoint file once the calculation finishes.

However, ntmabs is the total number of time steps I want for my simulation, and since it is large I will need to perform restarts. Having checkpoints in between is essential because I want to save close to the time step I am at when I get kicked off the cluster. I thought ntsuit would control this save interval?

I cannot run the GUI because it is not installed on the HPC I am using. I also prefer the terminal as I am more accustomed to it.

Do you have any suggestions for what I can do to solve the ntsuit issue?

Kind regards,

Oscar

Post by Yvan Fournier »

Hello,

Did you check the documentation for ntsuit? If I remember correctly, it might be the number of checkpoints (4 by default).

You can force a checkpoint at any time using the control_file, or use the control_file to set a restart interval in elapsed (user, not simulation) time, which aligns better with batch systems.
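
As a sketch of the second option, the control_file can be written with a few lines of Python. The helper name is hypothetical; the only keyword used is checkpoint_wall_time_interval, quoted earlier in this thread, and the interval is assumed here to be in seconds (check the documentation for your version).

```python
from pathlib import Path


def write_checkpoint_interval(run_dir: str, wall_seconds: int) -> Path:
    """Write a control_file in run_dir asking for a checkpoint every
    wall_seconds of elapsed time.

    Hypothetical helper; the unit (seconds) is an assumption.
    """
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    path = Path(run_dir) / "control_file"
    # One keyword per line, followed by its value
    path.write_text(f"checkpoint_wall_time_interval {int(wall_seconds)}\n")
    return path
```

The file has to be placed in the directory of the running computation, as described in the first post.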

But in practice, we always try to set ntmabs so as to finish within the allocated time, then restart with an increased ntmabs (one of the user script examples lets you automate this).
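
The arithmetic behind that choice can be sketched as follows. The function and its parameters (seconds per time step, a safety margin left for writing the checkpoint) are assumptions for illustration, not part of code_saturne; the seconds-per-step figure would have to be estimated from a previous listing.

```python
import math


def next_ntmabs(steps_done: int, sec_per_step: float,
                wall_limit_s: float, margin_s: float = 600.0) -> int:
    """Return an absolute time-step target (ntmabs) for the next run.

    steps_done   : time steps already completed by previous runs
    sec_per_step : estimated wall-clock seconds per time step
    wall_limit_s : wall-clock limit of the batch job, in seconds
    margin_s     : time kept in reserve for start-up and the final checkpoint
    """
    usable = wall_limit_s - margin_s
    if sec_per_step <= 0 or usable <= 0:
        raise ValueError("need sec_per_step > 0 and wall_limit_s > margin_s")
    # Number of additional steps expected to fit in the usable time
    return steps_done + math.floor(usable / sec_per_step)
```

For example, with 1000 steps done, 2 s per step, and a one-hour limit with a ten-minute margin, 1500 more steps fit, so the next ntmabs would be 2500.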

Regards,

Yvan

Post by Oscar »

Hi Yvan,

Yes, I have checked the docs for ntsuit: if it is set to > 0 then it is the checkpoint period, so I would have expected a checkpoint at every time step when I set it to 1.

I guess I will do as you suggest, run a simulation and then increase the ntmabs for the next one!

Kind regards,

Oscar