calculation stopped without error profile output

yany
Posts: 60
Joined: Fri Aug 04, 2017 11:02 pm


Post by yany »

Hello everyone,
I am using the turbomachinery module in Code_Saturne on ARCHER. I uploaded the setup XML and mesh_output to ARCHER. However, the calculation always stops without any error output. This problem has confused me for several months. Do you have any ideas about what might cause it? Many thanks.

The listing:
command:
./cs_solver --mpi --param setup.xml

***************************************************************

(R)
Code_Saturne

Version 5.0.4


Copyright (C) 1998-2017 EDF S.A., France

revision 5.0.4
build Thu Nov 16 18:20:53 2017
MPI version 3.1 (MPICH 3.2)


The Code_Saturne CFD tool is free software;
you can redistribute it and/or modify it under the terms
of the GNU General Public License as published by the
Free Software Foundation; either version 2 of the License,
or (at your option) any later version.

The Code_Saturne CFD tool is distributed in the hope that
it will be useful, but WITHOUT ANY WARRANTY; without even
the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE. See the GNU General Public License
for more details.

***************************************************************


Local case configuration:

Date: Tue Mar 20 10:55:18 2018
System: Linux 3.0.101-0.46.1_1.0502.8871-cray_ari_c
Machine: nid04776
Processor: model name : Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
Memory: 64523 MB
Directory: /fs3/e01/e01/yyan/rans/0018turbine/case1/RESU/20180319-2006
MPI ranks: 480 (appnum attribute: 0)
OpenMP threads: 1
Processors/node: 1
I/O read method: collective MPI-IO (explicit offsets)
I/O write method: collective MPI-IO (explicit offsets)
I/O rank step: 1

External libraries for partitioning:
ParMETIS 4.0.3

Reading metadata from file: "mesh_input"

===============================================================

CALCULATION PREPARATION
=======================


===========================================================




No error detected during the data verification
cs_user_parameters.f90 and others).


===========================================================

CALCULATION PARAMETERS SUMMARY
==============================

-----------------------------------------------------------


** DIMENSIONS
----------

--- Physics
NVAR = 11 (Nb variables )
NSCAL = 0 (Nb scalars )
NSCAUS = 0 (Nb user scalars )
NSCAPP = 0 (Nb specific physics scalars )


-----------------------------------------------------------


** VOF METHOD
----------------------------------------

IVOFMT = -1 ( -1: inactive )
( 0: active )


** HOMOGENEOUS MIXTURE MODEL FOR CAVITATION
----------------------------------------

ICAVIT = -1 (-1: single phase flow )
( 0: no vap./cond. model )
( 1: Merkle's model )


-----------------------------------------------------------


** TIME STEPPING
-------------

--- Per-variable properties

------------------------------------
Variable ISTAT CDTVAR
------------------------------------
Velocity 1 0.1000E+01
Pressure 0 0.1000E+01
r11 1 0.1000E+01
r22 1 0.1000E+01
r33 1 0.1000E+01
r12 1 0.1000E+01
r23 1 0.1000E+01
r13 1 0.1000E+01
epsilon 1 0.1000E+01
----------------------------

ISTAT = 0 ou 1 (1 for unsteady )
CDTVAR > 0 (time step multiplier )

--- Order of base time stepping scheme
ISCHTP = 1 (1: order 1; 2: order 2 )


-----------------------------------------------------------


** STOKES
------

-- Phase continue :

ISTMPF = 1 (time scheme for flow
(0: explicit (THETFL = 0 )
(1: std scheme (Saturne 1.0 )
(2: 2nd-order (THETFL = 0.5 )
THETFL = -0.99900E+03 (theta for mass flow )
IROEXT = 0 (density extrapolation
(0: explicit
(1: n+thetro with thetro=1/2
(2: n+thetro with thetro=1
THETRO = 0.00000E+00 (theta for density
((1+theta).new-theta.old
IVIEXT = 0 (total viscosity extrapolation
(0: explicit
(1: n+thetvi with thetro=1/2
(2: n+thetvi with thetro=1
THETVI = 0.00000E+00 (theta for total viscosity
((1+theta).new-theta.old
ICPEXT = 0 (specific heat extrapolation
(0: explicit
(1: n+thetcp with thetro=1/2
(2: n+thetcp with thetro=1
THETCP = 0.00000E+00 (specific heat theta-scheme
((1+theta).new-theta.old
THETSN = 0.00000E+00 (Nav-Stokes S.T. theta scheme)
((1+theta).new-theta.old
THETST = 0.00000E+00 (Turbulence S.T. theta-scheme)
((1+theta).new-theta.old
EPSUP = 0.10000E-04 (Velocity/pressure coupling
stop test )


-----------------------------------------------------------


** BASE ITERATIVE SOLVERS
----------------------

------------------------------------
Variable EPSILO IDIRCL
------------------------------------
Velocity 0.1000E-07 1
Pressure 0.1000E-07 1
r11 0.1000E-07 1
r22 0.1000E-07 1
r33 0.1000E-07 1
r12 0.1000E-07 1
r23 0.1000E-07 1
r13 0.1000E-07 1
epsilon 0.1000E-07 1
------------------------------------

EPSILO = (resolution precision)
IDIRCL = 0 ou 1 (shift diagonal if
ISTAT=0 and no Dirichlet)


-----------------------------------------------------------


** CALCULATION MANAGEMENT
----------------------

--- Restarted calculation
ISUITE = 1 (1: restarted calculation )
ILEAUX = 0 (1: read restart/auxiliary )
IECAUX = 1 (1: write checkpoint/auxiliary)


--- Calculation time
The numbering of time steps and the measure of simulated
physical time are absolute values, and not values
relative to the current calculation.

INPDT0 = 0 (1: 0 time step calcuation )
NTMABS = 500 (Final time step required )

--- CPU time margin
TMARUS = -0.10000E+01 (CPU time margin before stop )


-----------------------------------------------------------


** INPUT-OUTPUT
------------

--- Restart file
NTSUIT = 100 (Checkpoint frequency )

--- Probe history files
NTHIST = 1 (Output frequency )
FRHIST = -.10000E+01 (Output frequency (s) )
-- -- --

--- Log files
NTLIST = 500 (Output frequency )

Number Name IWARNI verbosity level
(-999: not applicable)

Velocity 0
Pressure 0
r11 0
r22 0
r33 0
r12 0
r23 0
r13 0
epsilon 0
TurbVisc -999
CourantNb -999
FourierNb -999
total_pressure -999
Yplus -999
-- -- --

--- Additional post-processing variables (ipstdv)
ipstfo = 1 (Force exerted by the
fluid on the boundary)
ipstyp = 1 (y+ at boundary)
ipsttp = 0 (T+ at boundary)
ipstft = 1 (Thermal flux at boundary)
ipstnu = 0 (Dimensionless thermal
flux at boundary)


-----------------------------------------------------------


** ALE METHOD (MOVING MESH)
-----------

IALE = 0 (1: activated )
NALINF = 0 (Fluid initialization
iterations)
IFLXMW = 0 (ALE mass flux computation
0: thanks to vertices
1: thanks to mesh velocity)


-----------------------------------------------------------


Postprocessing output writers:
------------------------------

-1: name: results
directory: postprocessing
format: EnSight Gold
options:
time dependency: transient connectivity
output: every 20 time steps and at calculation end

-5: name:
directory: monitoring
format: time_plot
options:
time dependency: fixed mesh
output: every 1 time steps

-6: name:
directory: profiles
format: plot
options:
time dependency: fixed mesh
output: at calculation end


Reading file: mesh_input
Finished reading: mesh_input
No "partition_input/domain_number_480" file available;

----------------------------------------------------------

Partitioning by space-filling curve: Morton (in bounding box).
Number of cells per domain (histogramm):
[ 21207 ; 21208 ] = 480

Partitioning finished (30.9 s)


-------------------------------------------------------
Joining number 1:

Selection criteria: "not (INTERFACE or INTERFACE2)"

Parameters for the joining operation:
Shortest incident edge fraction: 0.10000
Maximum angle between joined face planes: 25.00000

Advanced joining parameters:
Verbosity level: 1
Visualization level: 1
Deepest level reachable in tree building: 30
Max boxes by leaf: 25
Max ratio of linked boxes / init. boxes: 5.00000
Max ratio of boxes for distribution: 2.00000
Merge step tolerance multiplier: 1.00000
Pre-merge factor: 0.05000
Tolerance computation mode: 1
Intersection computation mode: 1
Max. number of equiv. breaks: 500
Max. number of subfaces by face: 200

Before joining
Number of cells: 10179720
Number of interior faces: 0
Number of boundary faces: 61078320
Number of vertices: 81437760

Global number of boundary faces selected for joining: 60668748

Element selection successfully done.
Global min/max. tolerance:

Glob. Num. | Tolerance | Coordinates

22 | 0.000002 | 1.2344007860e-01 8.4999084160e-01 -6.0000000000e-01 | ORI
9506 | 0.002833 | 6.7919450960e-03 1.2067073530e+01 4.8166666670e-01 | ORI
Determination of possible face intersections:

bounding-box tree layout: 3D
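
A quick arithmetic check on the figures in the listing above may help put the joining step in perspective. This is only a sketch: the constants are copied from the listing, while the function names are made up for illustration. The ratios come out at exactly 6 boundary faces and 8 vertices per cell, which would be consistent with hexahedral cells that share no faces or vertices before joining (an inference, not something the listing states), and about 99% of the boundary faces are selected for joining.

```python
# Figures copied from the listing above.
N_CELLS = 10_179_720
N_BOUNDARY_FACES = 61_078_320
N_VERTICES = 81_437_760
N_JOIN_FACES = 60_668_748
N_RANKS = 480


def cells_per_rank(n_cells, n_ranks):
    """Return (min, max) cells per rank for an even split."""
    base, extra = divmod(n_cells, n_ranks)
    return base, base + (1 if extra else 0)


def per_cell_ratios(n_cells, n_faces, n_vertices):
    """Return (faces per cell, vertices per cell)."""
    return n_faces / n_cells, n_vertices / n_cells


def joined_fraction(n_join_faces, n_boundary_faces):
    """Return the fraction of boundary faces selected for joining."""
    return n_join_faces / n_boundary_faces


print(cells_per_rank(N_CELLS, N_RANKS))
print(per_cell_ratios(N_CELLS, N_BOUNDARY_FACES, N_VERTICES))
print(round(joined_fraction(N_JOIN_FACES, N_BOUNDARY_FACES), 4))
```

The per-rank split matches the histogram printed by the partitioner ([ 21207 ; 21208 ]), so the cell count per rank itself is modest; the unusual part is that nearly every face in the mesh goes through the joining operation.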
Yvan Fournier
Posts: 4209
Joined: Mon Feb 20, 2012 3:25 pm

Re: calculation stopped without error profile output

Post by Yvan Fournier »

Hello,

Do you have any output in the batch log file? Are there any error_* files?

It seems the computation got "stuck" in a given stage and did not progress, so you would expect a "time up" type message in the batch log.

Attaching a debugger here would help pinpoint where the computation is "stuck". On some systems, we have issues with "MPI_Alltoallv" operations hanging when too many nodes are exchanging data. This should not happen, but is MPI library-related, so settings on the MPI side may help.

I am not sure whether you are using vanilla MPICH or Intel-MPI (the "listing" does not really tell). With Intel MPI, some environment variables may help, especially I_MPI_ADJUST_ALLTOALLV.
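
As a rough illustration of the environment-variable suggestion, the sketch below builds a modified environment with I_MPI_ADJUST_ALLTOALLV set. The helper name and the algorithm id passed to it are assumptions for illustration only: which value (if any) helps would have to be tested, and the variable is only meaningful when the solver is actually linked against Intel MPI.

```python
import os


def with_alltoallv_setting(algorithm_id, base_env=None):
    """Return a copy of the environment with I_MPI_ADJUST_ALLTOALLV set.

    algorithm_id is a placeholder: the right value is an assumption to be
    checked against the Intel MPI documentation and by experiment.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["I_MPI_ADJUST_ALLTOALLV"] = str(algorithm_id)
    return env


# Small demonstration with a made-up base environment.
base = {"PATH": "/usr/bin"}
env = with_alltoallv_setting(1, base_env=base)
print(env["I_MPI_ADJUST_ALLTOALLV"])
```

The returned dictionary could be passed as the `env` argument of `subprocess.run` when launching the solver from a script; in a batch script, an equivalent `export` line before the launch command would do the same job.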

Best regards,

Yvan
yany
Posts: 60
Joined: Fri Aug 04, 2017 11:02 pm

Re: calculation stopped without error profile output

Post by yany »

Hello,
There is an error file, but it is empty. There is also an output file ending in .o5190971, with the following content:

*** yyan Job: 5190879.sdb started: 20/03/18 10:29:56 host: mom3 ***
*** yyan Job: 5190879.sdb started: 20/03/18 10:29:56 host: mom3 ***
*** yyan Job: 5190879.sdb started: 20/03/18 10:29:56 host: mom3 ***
*** yyan Job: 5190879.sdb started: 20/03/18 10:29:56 host: mom3 ***

--------------------------------------------------------------------------------
ModuleCmd_Switch.c(179):ERROR:152: Module 'PrgEnv-cray' is currently not loaded
/home3/e01/e01/yyan
Currently Loaded Modulefiles:
1) modules/3.2.10.6
2) nodestat/2.2-1.0502.60539.1.31.ari
3) sdb/1.1-1.0502.63652.4.25.ari
4) alps/5.2.5-2.0502.9955.44.1.ari
5) lustre-cray_ari_s/2.5_3.0.101_0.46.1_1.0502.8871.22.1-1.0502.21658.55.1
6) udreg/2.3.2-1.0502.10518.2.17.ari
7) ugni/6.0-1.0502.10863.8.29.ari
8) gni-headers/4.0-1.0502.10859.7.8.ari
9) dmapp/7.0.1-1.0502.11080.8.76.ari
10) xpmem/0.1-2.0502.64982.5.3.ari
11) hss-llm/7.2.0
12) Base-opts/1.0.2-1.0502.60680.2.4.ari
13) intel/17.0.0.098
14) craype-network-aries
15) craype-ivybridge
16) craype/2.5.10
17) pbs/12.2.401.141761
18) cray-mpich/7.5.5
19) packages-archer
20) xalt/0.6.0
21) cse-compute-defaults/3.0
22) cray-libsci/16.11.1
23) pmi/5.0.13
24) atp/2.1.0
25) PrgEnv-intel/5.2.82
/work/e01/e01/yyan/rans/0018turbine/case1/RESU/20180319-2006
[NID 00046] 2018-03-20 10:39:12 Apid 30319988: initiated application termination
[NID 00045] 2018-03-20 10:39:14 Apid 30319988: OOM killer terminated this process.
[NID 00046] 2018-03-20 10:39:14 Apid 30319988: OOM killer terminated this process.
[NID 00044] 2018-03-20 10:39:14 Apid 30319988: OOM killer terminated this process.
[NID 00043] 2018-03-20 10:39:14 Apid 30319988: OOM killer terminated this process.
Application 30319988 exit signals: Killed
Application 30319988 resources: utime ~0s, stime ~8s, Rss ~7296, inblocks ~199448, outblocks ~469584
--------------------------------------------------------------------------------

Resources requested: ncpus=192,place=free,walltime=00:20:00
Resources allocated: cpupercent=0,cput=00:00:02,mem=8668kb,ncpus=192,vmem=172772kb,walltime=00:09:21

*** yyan Job: 5190879.sdb ended: 20/03/18 10:39:17 queue: S5184105 ***
*** yyan Job: 5190879.sdb ended: 20/03/18 10:39:17 queue: S5184105 ***
*** yyan Job: 5190879.sdb ended: 20/03/18 10:39:17 queue: S5184105 ***
*** yyan Job: 5190879.sdb ended: 20/03/18 10:39:17 queue: S5184105 ***
--------------------------------------------------------------------------------
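
The batch log above contains several "OOM killer terminated this process" lines. On Linux, the OOM killer is the kernel mechanism that kills processes when a node runs out of memory, so these lines are worth extracting explicitly. The following is a minimal sketch that scans such a log for them; the regular expression is written against the line format shown above, and the function name is hypothetical.

```python
import re

# Pattern for the ALPS-style lines shown in the batch log above.
OOM_PATTERN = re.compile(
    r"^\[NID (?P<nid>\d+)\] "
    r"(?P<stamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"Apid (?P<apid>\d+): OOM killer terminated this process\.$"
)


def find_oom_kills(log_text):
    """Return a list of (node id, timestamp, application id) tuples."""
    hits = []
    for line in log_text.splitlines():
        match = OOM_PATTERN.match(line.strip())
        if match:
            hits.append(
                (match.group("nid"), match.group("stamp"), match.group("apid"))
            )
    return hits


# Lines copied from the batch log above.
SAMPLE = """\
[NID 00046] 2018-03-20 10:39:12 Apid 30319988: initiated application termination
[NID 00045] 2018-03-20 10:39:14 Apid 30319988: OOM killer terminated this process.
[NID 00046] 2018-03-20 10:39:14 Apid 30319988: OOM killer terminated this process.
[NID 00044] 2018-03-20 10:39:14 Apid 30319988: OOM killer terminated this process.
[NID 00043] 2018-03-20 10:39:14 Apid 30319988: OOM killer terminated this process.
Application 30319988 exit signals: Killed
"""

kills = find_oom_kills(SAMPLE)
for nid, stamp, apid in kills:
    print(f"node {nid} killed at {stamp} (application {apid})")
```

If such lines are present, the job was most likely terminated for lack of memory on those nodes rather than for exceeding its wall time, which may be worth checking before tuning MPI settings.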

yany
Posts: 60
Joined: Fri Aug 04, 2017 11:02 pm

Re: calculation stopped without error profile output

Post by yany »

Hi Yvan,
The output summary file is attached. Could you have a look at it? Is there anything wrong in this file? Thank you.

All the best, Yan.

Attachments
summary.doc
(32.71 KiB)
Yvan Fournier
Posts: 4209
Joined: Mon Feb 20, 2012 3:25 pm

Re: calculation stopped without error profile output

Post by Yvan Fournier »

Hello,

No, it seems even the summary was not updated (which is expected when the job is stopped by the batch system/resource manager).

How about the files I asked about (the batch output log, probably in the directory the case was submitted from)?

Regards,

Yvan