Restart under python runcase

James McNaughton

Restart under python runcase

Post by James McNaughton »

If using the python runcase for two instances of code_saturne, how do I go about setting up a calculation restart?
I'm used to specifying a path in the runcase file; could I add this within the d1() and d2() parts?
 
Thanks,
James
Yvan Fournier

Re: Restart under python runcase

Post by Yvan Fournier »

Hello,
The only method possible with the new runcase is the "other" method which was already possible before:
copy the RESU/RESTART.* directory to DATA/RESTART (or create a symbolic link to it). This is actually how the GUI works, and how most of us work (or so I believe).
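For example, here is a minimal Python sketch of that setup using a symbolic link (the case path and the results directory name RESTART.04181330 are purely hypothetical; a plain "cp -r" or "ln -s" in the shell does the same job):

    import os

    case_dir = "/path/to/CASE1"                    # hypothetical case location
    # A previous run's restart output under RESU (directory name is hypothetical):
    src = os.path.join(case_dir, "RESU", "RESTART.04181330")
    dst = os.path.join(case_dir, "DATA", "RESTART")

    if os.path.lexists(dst):
        os.remove(dst)        # drop a stale link left over from an earlier restart
    os.symlink(src, dst)      # the next run reads its restart data from DATA/RESTART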
Best regards,
  Yvan
James McNaughton

Re: Restart under python runcase

Post by James McNaughton »

Thanks Yvan, I am very grateful for your help.
I also have a few more questions about the new runcase for running two instances that I'd appreciate advice on.
There is no option to set the number of processors as there was previously. However, I am running only one simulation (of two instances) and "top" shows me that there are 8 cs_solver processes running. Is something clever going on here?
I also tried running a two-instance job on the Uni of Manchester cluster; the job submits fine but just sits there. The tmp_Saturne directory has directories 1/ and 2/, but 1/ contains only "compile.log" and "src_saturne" whilst 2/ is empty. Is there something else I should do to run on a cluster with two instances, or is this a problem with the cluster that can be sorted out?
 
Thanks again,
James
Yvan Fournier

Re: Restart under python runcase

Post by Yvan Fournier »

Hello James,

There is no option to set the number of processors as there was previously. However, I am running only one simulation (of two instances) and "top" shows me that there are 8 cs_solver processes running. Is something clever going on here?

Yes, there is a choice: the settings you are interested in are the n_procs, n_procs_min, and n_procs_max arguments to the domain creation. When running under a batch system, these work in conjunction with the number of available processors detected.
For example, if you are running on 8 procs (through a batch job) with n_procs = 1 for domain_1 and n_procs = 2 for domain_2, domain_1 will try to use 8 * n_procs_1 / (n_procs_1 + n_procs_2) processors, and domain_2 will use 8 * n_procs_2 / (n_procs_1 + n_procs_2).
The real computation is a bit more complex, as we check for rounding and readjust things if necessary, and the n_procs_min and n_procs_max domain arguments allow you to define a "hard" limit that is not adjusted by the number of available processors. The idea behind this is that you only need to adjust the global number of processors and the n_procs ratios are kept, but you can still have fine control when you need it.
When not running under a batch system, if you want to use 2 procs for each domain for example, simply assign n_procs=2 to each domain.
Hoping this explanation was not too convoluted.
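As an illustration only (the "domain" call below stands for the domain-creation call in the generated runcase, so the exact name and signature may differ from your template), the d1() and d2() parts might look like this:

    def d1():
        # Domain 1: weight of 1 in the processor split, with a hard lower bound.
        return domain(name='1',
                      n_procs=1,       # relative weight when run under a batch system
                      n_procs_min=1)   # "hard" limit, not rescaled

    def d2():
        # Domain 2: weight of 2 in the processor split, with a hard upper bound.
        return domain(name='2',
                      n_procs=2,       # relative weight
                      n_procs_max=6)   # "hard" limit, not rescaled

    # With 8 processors allocated by the batch system, domain 1 gets roughly
    # 8 * 1 / (1 + 2) = 2 or 3 procs and domain 2 roughly 8 * 2 / (1 + 2) = 5 or 6
    # after rounding; outside a batch system, n_procs is used directly.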

I also tried running a two-instance job on the Uni of Manchester cluster; the job submits fine but just sits there. The tmp_Saturne directory has directories 1/ and 2/, but 1/ contains only "compile.log" and "src_saturne" whilst 2/ is empty. Is there something else I should do to run on a cluster with two instances, or is this a problem with the cluster that can be sorted out?

This might be normal if domain 1 has user subroutines but domain 2 does not. If both have user subroutines (as should be the case if you are not using the GUI), you may want to check the case's SRC.2 subdirectory. If it is empty, you have a user error; otherwise there may be a bug in the script.
In any case, if a code_saturne instance with user subroutines from SRC.1 is trying to connect to a default instance (empty SRC.2) that has no user subroutines and does not try to connect to instance 1, the code will probably hang, which is what you seem to be seeing. So you first need to check why 2/ is empty.
Also, the tmp_Saturne directory should contain a script named "run_solver.sh". If that file is not present, there is probably an installation or MPI type detection issue.
Best regards,
  Yvan
 
James McNaughton

Re: Restart under python runcase

Post by James McNaughton »

Thanks again Yvan for your help,
I see the n_procs argument now and can run on multiple processors on my desktop.
The cluster job still will not start: my SRC.1 and SRC.2 have the correct subroutines, but no "run_solver.sh" appears in tmp_Saturne, nor does tmp_Saturne/2/ contain anything. I'll look into the possible causes that you've suggested.
All the best,
James
 
David Monfort

Re: Restart under python runcase

Post by David Monfort »

Hello James,
Could you post the results of 'code_saturne config'? It will tell us which MPI implementation has been found.
David
James McNaughton

Re: Restart under python runcase

Post by James McNaughton »

Hi David,
Here's the output. Let me know if you need anything else.
Cheers,
James
Directories:
dirs.prefix = /software/Code_Saturne/v2.0rc2/cs-2.0-rc2
dirs.exec_prefix = /software/Code_Saturne/v2.0rc2/cs-2.0-rc2
dirs.bindir = /software/Code_Saturne/v2.0rc2/cs-2.0-rc2/bin
dirs.includedir = /software/Code_Saturne/v2.0rc2/cs-2.0-rc2/include
dirs.libdir = /software/Code_Saturne/v2.0rc2/cs-2.0-rc2/lib
dirs.datarootdir = /software/Code_Saturne/v2.0rc2/cs-2.0-rc2/share
dirs.datadir = /software/Code_Saturne/v2.0rc2/cs-2.0-rc2/share
dirs.pkgdatadir = /software/Code_Saturne/v2.0rc2/cs-2.0-rc2/share/ncs
dirs.docdir = /software/Code_Saturne/v2.0rc2/cs-2.0-rc2/share/doc/ncs
dirs.pdfdir = /software/Code_Saturne/v2.0rc2/cs-2.0-rc2/share/doc/ncs

Auxiliary information:
dirs.ecs_bindir = /software/Code_Saturne/v2.0rc2/cs-2.0-rc2/bin
dirs.syrthes_prefix =

MPI library information:
mpi_lib.type = OpenMPI
mpi_lib.bindir = /usr/local/openmpi-1.3--ifort-v10--gcc-v3/bin
mpi_lib.libdir = /usr/local/openmpi-1.3--ifort-v10--gcc-v3/lib

Compilers and associated options:
cc = /usr/local/openmpi-1.3--ifort-v10--gcc-v3/bin/mpicc
fc = /usr/local/openmpi-1.3--ifort-v10--gcc-v3/bin/mpif90
cppflags = -D_POSIX_SOURCE -DDEBUG -I/software/Code_Saturne/v2.0rc2/cs-2.0-rc2/include -I/software/Code_Saturne/v2.0rc2/cs-2.0-rc2/include -I/usr/local/openmpi-1.3--ifort-v10--gcc-v3/include -I/usr/include/libxml2
cflags = -std=c99 -funsigned-char -pedantic -W -Wall -Wshadow -Wpointer-arith -Wcast-qual -Wcast-align -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -Wmissing-declarations -Wnested-externs -Wunused -Wfloat-equal -g
fcflags = -cpp -fpic -warn -D_CS_FC_HAVE_FLUSH -g -O0 -traceback -check all -fpe0 -ftrapuv
ldflags = -L/software/Code_Saturne/v2.0rc2/cs-2.0-rc2/lib -L/software/Code_Saturne/v1.4.0/opt/cgnslib_2.5/arch/Linux_x86_64/lib -L/software/Code_Saturne/v1.3.3/opt/med-fichier_2.3.5/arch/Linux_x86_64/lib -L/software/Code_Saturne/v1.3.3/opt/hdf5-1.6.9/arch/Linux_x86_64/lib -L/software/Code_Saturne/v2.0rc2/cs-2.0-rc2/lib -L/usr/local/openmpi-1.3--ifort-v10--gcc-v3/lib -Wl,-export-dynamic -g
libs = -lfvm -lm -lcgns -lmedC -lhdf5 -lfvm_coupl -lbft -lz -lxml2 -lblas -L/usr/local/openmpi-1.3--ifort-v10--gcc-v3/lib -lmpi_f90 -lmpi_f77 -lmpi -lopen-rte -lopen-pal -ldl -lnsl -lutil -L/usr -limf -lm -L/opt/intel/fce/10.1.012/lib -L/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ -L/usr/lib/gcc/x86_64-redhat-linux/3.4.6/../../../../lib64 -lifport -lifcoremt -lsvml -lipgo -lirc -lpthread -lirc_s -ldl
rpath = -Wl,-rpath -Wl,

Compilers and associated options for SYRTHES build:
cc =
fc =
cppflags =
cflags =
fcflags =
ldflags =
libs =