Problem with a parallel job.

Philippe Parnaudeau

Problem with a parallel job.

Post by Philippe Parnaudeau »

Hello,
I am trying to simulate the flow over a heated circular cylinder at a moderate Reynolds number (around 40,000), with the Sutherland law.
The mesh was generated with gmsh (around 2 x 10^6 hexahedra).
I use Code_Saturne 1.3 on an SGI UV100 computer and I would like to run on 8 CPUs, but the job stops after about 5 hours, like this:
===============================================================
   ** STOP BECAUSE OF TIME EXCEEDED
      -----------------------------
      MAX NUMBER OF TIME STEP SET TO NTCABS:         81
===============================================================
 
 
===============================================================
   ** REMAINING TIME MANAGEMENT
      -------------------------
      REMAINING TIME ALLOCATED TO THE PROCESS   :    0.22790E+04
      ESTIMATED TIME FOR ANOTHER TIME STEP      :    0.24003E+04
        MEAN TIME FOR A TIME STEP               :    0.24151E+03
        TIME FOR THE PREVIOUS TIME STEP         :    0.23660E+03
        SECURITY MARGIN                         :    0.21600E+04
===============================================================
 
 
 CPU TIME FOR THE TIME STEP               81:        0.23777E+03
 
and I don't understand why.
 
I asked for 200 iterations and allocated 8 hours to do them.
 
The job is scheduled by PBS Pro and the directives are:
#PBS -q small_para
#PBS -l ncpus=8
#PBS -l mem=20000mb
#PBS -j eo -N hot_cylinder
 
The results seem to be good, but when I try to restart the job, the results become weird, or to be clear, wrong!
 
I'm sure I did something wrong, but I don't understand where...
 
If someone could help me...
 
Thanks.
Yvan Fournier

Re: Problem with a parallel job.

Post by Yvan Fournier »

Hello,
The heuristic for determining the safety margin may be found in armtps.F, and is very empirical. According to the comments, the margin should amount to 10% of the allocated time for jobs of 1000 iterations or fewer, 100 times the mean cost of an iteration for jobs of 1000 to 10000 iterations, and 1% of the allocated time beyond that. Looking at the code itself, I am not quite convinced that this is what it does (I also observed a strange result with it recently). You may want to add armtps.F to your user subroutines and modify it so that it does not cause the job to stop prematurely.
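From the figures in your log, the remaining allocation (about 2279 s) is already smaller than the estimated cost of one more step (roughly the mean step time of 242 s plus the 2160 s security margin, i.e. about 2400 s), which is why the run stops at step 81. As a sketch only, and assuming your 1.3 installation exposes the tmarus keyword (the CPU-time margin, computed automatically when it is negative; check optcal.h before relying on it), imposing a fixed margin from usini1.F would avoid modifying armtps.F:

c     Sketch: force a fixed 10-minute CPU-time security margin instead
c     of the automatic heuristic. The tmarus keyword is an assumption
c     for version 1.3; verify it in optcal.h, otherwise modify a copy
c     of armtps.F directly as suggested above.
      tmarus = 600.d0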
Otherwise, it is strange that the restart should give you incorrect results. Did you postprocess the results or check the "listing" file to make sure that the calculation's results just before stopping seemed correct?
Best regards,
  Yvan
Philippe Parnaudeau

Re: Problem with a parallel job.

Post by Philippe Parnaudeau »

Hello,
Thanks for your answer.
OK for your suggestion (adding armtps.F to my user subroutines and modifying it so that it does not cause the job to stop prematurely); I will try to do that.
For the restart problem, I investigated a bit and ran a new simulation with more elapsed time.
The elapsed time for this long run covers the two previous runs (i.e. a first run plus a restart), and the results seem to be bad (velocities that are too large)...
Regards 
Philippe Parnaudeau

Re: Problem with a parallel job.

Post by Philippe Parnaudeau »

After a few days spent testing, I think I really do have a problem when I restart a job.
My case (described in my first post) converged in the first run:
   ** INFORMATIONS ON THE CONVERGENCE
      -------------------------------
---------------------------------------------------------------
   Variable    Rhs norm      N_iter  Norm. residual      derive
---------------------------------------------------------------
c  Pressure     0.19328E-02    3929   0.98630E-08   0.71216E+00
c  VelocitU     0.40586E+00     124   0.98757E-08   0.43668E-02
c  VelocitV     0.27390E-01     146   0.93045E-08   0.11009E-02
c  VelocitW     0.47047E-01     136   0.95211E-08   0.10405E-02
c  TurbEner     0.13364E-01     120   0.98569E-08   0.37657E-04
c  omega        0.16050E+06      32   0.89508E-08   0.34588E+05
c  Temp.K       0.12187E+03      72   0.89896E-08   0.11965E-01
---------------------------------------------------------------
 
 
 
   ** INFORMATIONS ON THE VARIABLES
      -----------------------------
---------------------------------------------------------------
   Variable      Min. value    Max. value   Min clip   Max clip
---------------------------------------------------------------
v  Pressure    -0.12868E+01   0.79520E+00         --         --
v  VelocitU    -0.57194E+00   0.16545E+01         --         --
v  VelocitV    -0.91935E+00   0.92499E+00         --         --
v  VelocitW    -0.11094E+01   0.15753E+01         --         --
v  TurbEner     0.12671E-14   0.18247E+00       1804          0
v  omega        0.38811E-01   0.14688E+06         22          0
v  Temp.K       0.28766E+03   0.30765E+03          0          0
v  Lam. vis     0.17818E-04   0.18764E-04          0          0
v  turb. vi     0.35720E-20   0.24151E-01          0          0
v  total_pressu 0.10128E+06   0.10132E+06          0          0
v  Th. cond     0.24781E-04   0.26098E-04          0          0
---------------------------------------------------------------
 
The results are physically acceptable...
But when I try to restart, after a few iterations I get this result:
 
  ** INFORMATIONS ON THE CONVERGENCE
     -------------------------------
--------------------------------------------------------------
  Variable    Rhs norm      N_iter  Norm. residual      derive
--------------------------------------------------------------
  Pressure     0.10919E+00    3920   0.99561E-08   0.99648E+00
  VelocitU     0.39989E+00     133   0.94825E-08   0.44341E+01
  VelocitV     0.47967E-01     145   0.96760E-08   0.57162E+00
  VelocitW     0.20253E+00     147   0.97753E-08   0.16142E+02
  TurbEner     0.13308E-01     122   0.88715E-08   0.19021E-03
  omega        0.16050E+06      68   0.89448E-08   0.47969E+07
  Temp.K       0.11622E+03      78   0.98349E-08   0.20159E+00
--------------------------------------------------------------
 
 
 
  ** INFORMATIONS ON THE VARIABLES
     -----------------------------
--------------------------------------------------------------
  Variable      Min. value    Max. value   Min clip   Max clip
--------------------------------------------------------------
  Pressure    -0.12788E+03   0.47269E+02         --         --
  VelocitU    -0.31855E+02   0.16582E+02         --         --
  VelocitV    -0.66965E+01   0.78254E+01         --         --
  VelocitW    -0.33113E+02   0.29405E+02         --         --
  TurbEner     0.63856E-13   0.51487E+00       1068          0
  omega        0.10912E-03   0.14709E+06       1672          0
  Temp.K       0.28718E+03   0.30745E+03          0          0
  Lam. vis     0.17802E-04   0.18767E-04          0          0
  turb. vi     0.37154E-18   0.23208E-01          0          0
  total_pressu 0.10116E+06   0.10134E+06          0          0
  Th. cond     0.24759E-04   0.26101E-04          0          0
--------------------------------------------------------------
 
 
 
  ** INFORMATIONS ON THE CLIPPINGS
     -----------------------------
--------------------------------------------------------------
  Variable    Min wo clips  Max wo clips   Min clip   Max clip
--------------------------------------------------------------
  TurbEner    -0.18892E-01   0.51487E+00       1068          0
  omega       -0.42985E+05   0.14709E+06       1672          0
  Temp.K       0.28718E+03   0.30745E+03          0          0
--------------------------------------------------------------
 
I'm sure I'm doing something wrong, and I think the problem concerns the boundary conditions.
After the first run, the results are:
   ** BOUNDARY CONDITIONS FOR SMOOTH WALLS
   ---------------------------------------
------------------------------------------------------------
 Phase      1                            Minimum     Maximum
------------------------------------------------------------
   Rel velocity at the wall uiptn : -0.54381E+00 0.74582E+00
   Friction velocity        uet   :  0.30330E-01 0.15756E+05
   Friction velocity        uk    :  0.00000E+00 0.57993E-01
   Dimensionless distance   yplus :  0.46194E-06 0.71605E+02
   ------------------------------------------------------   
   Nb of reversal of the velocity at the wall   :         96
   Nb of faces within the viscous sub-layer     :      43008
   Total number of wall faces                   :      73728
------------------------------------------------------------
 
and after restart:
 ** BOUNDARY CONDITIONS FOR SMOOTH WALLS
   ---------------------------------------
------------------------------------------------------------
 Phase      1                            Minimum     Maximum
------------------------------------------------------------
   Rel velocity at the wall uiptn :  0.00000E+00 0.19862E+01
   Friction velocity        uet   :  0.23463E-01 0.90612E+05
   Friction velocity        uk    :  0.00000E+00 0.10589E+00
   Dimensionless distance   yplus :  0.13361E-05 0.71597E+02
   ------------------------------------------------------  
   Nb of reversal of the velocity at the wall   :          0
   Nb of faces within the viscous sub-layer     :      42952
   Total number of wall faces                   :      73728
 
 
More information:
The symbolic link "SUITE" is correctly set up in the DATA directory.
I have attached the two listing files...
 
Thanks in advance.
Attachments
listing-restart.txt
listing-first.txt
Philippe Parnaudeau

Re: Problem with a parallel job.

Post by Philippe Parnaudeau »

More information:
It seems that another user has the same problem as me:
http://cfd.mace.manchester.ac.uk/twiki/bin/view/Forum/ForumIntro0052
 
but I cannot find any solution...
 
Regards.
Yvan Fournier

Re: Problem with a parallel job.

Post by Yvan Fournier »

Hello,
It is difficult to determine anything with only some elements of the log files. Note that the information on convergence in the log file is that of the linear solvers for a given time step, not a global convergence indicator for the calculation.
The range of values for uiptn at the boundary does seem strange (I would not expect a negative value in the initial calculation, but I am not an expert on the turbulence wall laws).
Otherwise, the range of values for Y+ seems similar.
How are the results "strange" after restart? Do you have a postprocessing view illustrating the problem?
Are you running a steady or an unsteady calculation? The problem reported by the other user is in a steady case, and the steady algorithm is more recent (it was not validated as extensively as the unsteady algorithm at the release of 1.3). Large calculations routinely use restarts with no problems, but I am not sure whether any of those use the steady algorithm (most calculations using the steady algorithm manage to run in one time allocation slot).
If your issue is with a steady calculation, debugging restarts in version 1.3 will certainly not be a priority for the Code_Saturne team, though if you have the same problem with version 2.0.1, we will look into it.
In any case, to debug further, we would need your data setup (user subroutines and/or xml file, and mesh, or a smaller version of your mesh).
Finally, note that although it is not labeled as such, an unsteady calculation with a spatially local time step is actually a form of steady algorithm (as the time step is not global, a solution at a given "time" cannot be interpreted as such, but is an intermediate step towards a converged solution). That variant of the algorithm has been tested more extensively over time, and has been seen to actually be more robust in some cases. So you may wish to switch to an "unsteady with time step varying in space" option instead.
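As a minimal sketch, and assuming you set the calculation options through the user subroutines rather than only through the GUI, this switch corresponds to the idtvar keyword (the GUI exposes the same choice in the time step settings; check the keyword list of your version):

c     Time step option (sketch; values to be checked against your version):
c       idtvar =  0   constant and uniform time step
c       idtvar =  1   variable in time, uniform in space
c       idtvar =  2   variable in time and in space
c       idtvar = -1   steady (relaxation) algorithm
      idtvar = 2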
Best regards,
  Yvan Fournier
 
Philippe Parnaudeau

Re: Problem with a parallel job.

Post by Philippe Parnaudeau »

hello,
"Note that the information on convergence in the log file is that of the
linear solvers for a given time step, not a global convergence indicator
for the calculation."
OK.
"The range of values for uitpn at th boundary does seem strange (I would
not expect a negative value in the initial calculation, but I am not an
expert on the turbulence wall laws."
The first run is not the initial calculation, but results taken after a short run. But anyway, you're right, I have a problem there.
I suspect the mesh cells are too large somewhere, and I am investigating in that direction for the moment, and the same for Y+.
Could you confirm this?
 
"How are the results "strange" after restart ?"
 
A huge jump in the velocity values, like this:
After the first run : VelocitU :   -0.57194E+00   0.16545E+01
Restart after 1 iteration :    VelocitU    -0.31855E+02   0.16582E+02 
I don't understand what happens; to me, it's wrong...
But I'm not an expert in RANS simulation; I only know DNS and LES... So this is my first steady simulation (RANS)...
 
I gave my mesh and my *.xml to D. Monfort last week.
I understand that I am using an old version of Code_Saturne, and I will start installing the new version ASAP.
 
Many thanks for your answers.
 
Kind regards.
Guest

Re: Problem with a parallel job.

Post by Guest »

Hello Yvan
 
Please tell me how to restart from the last time step after 100 iterations,
 
if I give 200 iterations, as the solution is not converged.
 
Yvan Fournier

Re: Problem with a parallel job.

Post by Yvan Fournier »

Hello,
If you use the GUI, check the "restart" section. If you are only using the script, search for "restart" in the runcase. The reference documentation may also provide details, but the GUI should be self-explanatory.
In any case, if you ran 100 iterations in the first run and want to add 200, you need to set the new number of iterations to 300 (and not 200), as the value given represents the total.
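As a sketch only (the same settings are exposed in the GUI, and the keyword names should be checked against the user guide of your version), the corresponding lines in usini1.F would look like:

c     Sketch: restart from the previous calculation and extend the run.
c     isuite = 1 activates the restart; ntmabs is the absolute number
c     of the last time step, i.e. the total over all runs.
      isuite = 1
      ntmabs = 300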
Best regards,
  Yvan