Hello everyone,
I used the turbomachinery module of Code_Saturne on ARCHER, and I have uploaded the setup.xml and mesh_output files to ARCHER. However, the calculation always stops without any error output. This problem has puzzled me for several months. Do you have any ideas about it? Many thanks.
The listing:
command:
./cs_solver --mpi --param setup.xml
***************************************************************
(R)
Code_Saturne
Version 5.0.4
Copyright (C) 1998-2017 EDF S.A., France
revision 5.0.4
build Thu Nov 16 18:20:53 2017
MPI version 3.1 (MPICH 3.2)
The Code_Saturne CFD tool is free software;
you can redistribute it and/or modify it under the terms
of the GNU General Public License as published by the
Free Software Foundation; either version 2 of the License,
or (at your option) any later version.
The Code_Saturne CFD tool is distributed in the hope that
it will be useful, but WITHOUT ANY WARRANTY; without even
the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE. See the GNU General Public License
for more details.
***************************************************************
Local case configuration:
Date: Tue Mar 20 10:55:18 2018
System: Linux 3.0.101-0.46.1_1.0502.8871-cray_ari_c
Machine: nid04776
Processor: model name : Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
Memory: 64523 MB
Directory: /fs3/e01/e01/yyan/rans/0018turbine/case1/RESU/20180319-2006
MPI ranks: 480 (appnum attribute: 0)
OpenMP threads: 1
Processors/node: 1
I/O read method: collective MPI-IO (explicit offsets)
I/O write method: collective MPI-IO (explicit offsets)
I/O rank step: 1
External libraries for partitioning:
ParMETIS 4.0.3
Reading metadata from file: "mesh_input"
===============================================================
CALCULATION PREPARATION
=======================
===========================================================
No error detected during the data verification
(cs_user_parameters.f90 and others).
===========================================================
CALCULATION PARAMETERS SUMMARY
==============================
-----------------------------------------------------------
** DIMENSIONS
----------
--- Physics
NVAR = 11 (Nb variables )
NSCAL = 0 (Nb scalars )
NSCAUS = 0 (Nb user scalars )
NSCAPP = 0 (Nb specific physics scalars )
-----------------------------------------------------------
** VOF METHOD
----------------------------------------
IVOFMT = -1 ( -1: inactive )
( 0: active )
** HOMOGENEOUS MIXTURE MODEL FOR CAVITATION
----------------------------------------
ICAVIT = -1 (-1: single phase flow )
( 0: no vap./cond. model )
( 1: Merkle's model )
-----------------------------------------------------------
** TIME STEPPING
-------------
--- Per-variable properties
------------------------------------
Variable ISTAT CDTVAR
------------------------------------
Velocity 1 0.1000E+01
Pressure 0 0.1000E+01
r11 1 0.1000E+01
r22 1 0.1000E+01
r33 1 0.1000E+01
r12 1 0.1000E+01
r23 1 0.1000E+01
r13 1 0.1000E+01
epsilon 1 0.1000E+01
----------------------------
ISTAT = 0 ou 1 (1 for unsteady )
CDTVAR > 0 (time step multiplier )
--- Order of base time stepping scheme
ISCHTP = 1 (1: order 1; 2: order 2 )
-----------------------------------------------------------
** STOKES
------
-- Phase continue :
ISTMPF = 1 (time scheme for flow
(0: explicit (THETFL = 0 )
(1: std scheme (Saturne 1.0 )
(2: 2nd-order (THETFL = 0.5 )
THETFL = -0.99900E+03 (theta for mass flow )
IROEXT = 0 (density extrapolation
(0: explicit
(1: n+thetro with thetro=1/2
(2: n+thetro with thetro=1
THETRO = 0.00000E+00 (theta for density
((1+theta).new-theta.old
IVIEXT = 0 (total viscosity extrapolation
(0: explicit
(1: n+thetvi with thetro=1/2
(2: n+thetvi with thetro=1
THETVI = 0.00000E+00 (theta for total viscosity
((1+theta).new-theta.old
ICPEXT = 0 (specific heat extrapolation
(0: explicit
(1: n+thetcp with thetro=1/2
(2: n+thetcp with thetro=1
THETCP = 0.00000E+00 (specific heat theta-scheme
((1+theta).new-theta.old
THETSN = 0.00000E+00 (Nav-Stokes S.T. theta scheme)
((1+theta).new-theta.old
THETST = 0.00000E+00 (Turbulence S.T. theta-scheme)
((1+theta).new-theta.old
EPSUP = 0.10000E-04 (Velocity/pressure coupling
stop test )
-----------------------------------------------------------
** BASE ITERATIVE SOLVERS
----------------------
------------------------------------
Variable EPSILO IDIRCL
------------------------------------
Velocity 0.1000E-07 1
Pressure 0.1000E-07 1
r11 0.1000E-07 1
r22 0.1000E-07 1
r33 0.1000E-07 1
r12 0.1000E-07 1
r23 0.1000E-07 1
r13 0.1000E-07 1
epsilon 0.1000E-07 1
------------------------------------
EPSILO = (resolution precision)
IDIRCL = 0 ou 1 (shift diagonal if
ISTAT=0 and no Dirichlet)
-----------------------------------------------------------
** CALCULATION MANAGEMENT
----------------------
--- Restarted calculation
ISUITE = 1 (1: restarted calculation )
ILEAUX = 0 (1: read restart/auxiliary )
IECAUX = 1 (1: write checkpoint/auxiliary)
--- Calculation time
The numbering of time steps and the measure of simulated
physical time are absolute values, and not values
relative to the current calculation.
INPDT0 = 0 (1: 0 time step calcuation )
NTMABS = 500 (Final time step required )
--- CPU time margin
TMARUS = -0.10000E+01 (CPU time margin before stop )
-----------------------------------------------------------
** INPUT-OUTPUT
------------
--- Restart file
NTSUIT = 100 (Checkpoint frequency )
--- Probe history files
NTHIST = 1 (Output frequency )
FRHIST = -.10000E+01 (Output frequency (s) )
-- -- --
--- Log files
NTLIST = 500 (Output frequency )
Number Name IWARNI verbosity level
(-999: not applicable)
Velocity 0
Pressure 0
r11 0
r22 0
r33 0
r12 0
r23 0
r13 0
epsilon 0
TurbVisc -999
CourantNb -999
FourierNb -999
total_pressure -999
Yplus -999
-- -- --
--- Additional post-processing variables (ipstdv)
ipstfo = 1 (Force exerted by the
fluid on the boundary)
ipstyp = 1 (y+ at boundary)
ipsttp = 0 (T+ at boundary)
ipstft = 1 (Thermal flux at boundary)
ipstnu = 0 (Dimensionless thermal
flux at boundary)
-----------------------------------------------------------
** ALE METHOD (MOVING MESH)
-----------
IALE = 0 (1: activated )
NALINF = 0 (Fluid initialization
iterations)
IFLXMW = 0 (ALE mass flux computation
0: thanks to vertices
1: thanks to mesh velocity)
-----------------------------------------------------------
Postprocessing output writers:
------------------------------
-1: name: results
directory: postprocessing
format: EnSight Gold
options:
time dependency: transient connectivity
output: every 20 time steps and at calculation end
-5: name:
directory: monitoring
format: time_plot
options:
time dependency: fixed mesh
output: every 1 time steps
-6: name:
directory: profiles
format: plot
options:
time dependency: fixed mesh
output: at calculation end
Reading file: mesh_input
Finished reading: mesh_input
No "partition_input/domain_number_480" file available;
----------------------------------------------------------
Partitioning by space-filling curve: Morton (in bounding box).
Number of cells per domain (histogramm):
[ 21207 ; 21208 ] = 480
Partitioning finished (30.9 s)
-------------------------------------------------------
Joining number 1:
Selection criteria: "not (INTERFACE or INTERFACE2)"
Parameters for the joining operation:
Shortest incident edge fraction: 0.10000
Maximum angle between joined face planes: 25.00000
Advanced joining parameters:
Verbosity level: 1
Visualization level: 1
Deepest level reachable in tree building: 30
Max boxes by leaf: 25
Max ratio of linked boxes / init. boxes: 5.00000
Max ratio of boxes for distribution: 2.00000
Merge step tolerance multiplier: 1.00000
Pre-merge factor: 0.05000
Tolerance computation mode: 1
Intersection computation mode: 1
Max. number of equiv. breaks: 500
Max. number of subfaces by face: 200
Before joining
Number of cells: 10179720
Number of interior faces: 0
Number of boundary faces: 61078320
Number of vertices: 81437760
Global number of boundary faces selected for joining: 60668748
Element selection successfully done.
Global min/max. tolerance:
Glob. Num. | Tolerance | Coordinates
22 | 0.000002 | 1.2344007860e-01 8.4999084160e-01 -6.0000000000e-01 | ORI
9506 | 0.002833 | 6.7919450960e-03 1.2067073530e+01 4.8166666670e-01 | ORI
Determination of possible face intersections:
bounding-box tree layout: 3D
calculation stopped without error profile output
Re: calculation stopped without error profile output
Hello,
Do you have any output in the batch log file? Any error_* files?
It seems the computation got "stuck" at a given stage and did not progress, so you would expect a "time up" type message in the batch log.
Attaching a debugger here would help pinpoint where the computation is "stuck". On some systems, we have issues with "MPI_Alltoallv" operations hanging when too many nodes are exchanging data. This should not happen, but it is MPI-library-related, so settings on the MPI side may help.
I am not sure whether you are using vanilla MPICH or Intel MPI (the listing does not really tell). With Intel MPI, some environment variables may help, especially I_MPI_ADJUST_ALLTOALLV.
Best regards,
Yvan
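For readers hitting a similar hang, the MPI-side tuning Yvan mentions could be sketched in a batch script as below. This is a hypothetical fragment, not a tested fix: the specific algorithm value is an assumption (which value avoids a hang is system-dependent), and the launch line simply mirrors the command shown in the listing.

```shell
#!/bin/sh
# Sketch: force a specific MPI_Alltoallv algorithm under Intel MPI.
# The value selects one of the library's internal algorithms; the best
# (or hang-free) choice is system-dependent, so several may need trying.
export I_MPI_ADJUST_ALLTOALLV=1

# Optional: make Intel MPI report which collective algorithms it picks,
# useful to confirm the override took effect.
export I_MPI_DEBUG=5

# The solver would then be launched as in the listing, e.g.:
# ./cs_solver --mpi --param setup.xml
```

Under Cray MPICH (as the listing's "MPI version 3.1 (MPICH 3.2)" suggests here), these Intel MPI variables would have no effect; the Cray MPICH equivalents would be the MPICH_* environment variables instead.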
Re: calculation stopped without error profile output
Hello,
There is an error file output, but it is blank. And there is a file ending in .o5190971 with the following output:
*** yyan Job: 5190879.sdb started: 20/03/18 10:29:56 host: mom3 ***
--------------------------------------------------------------------------------
ModuleCmd_Switch.c(179):ERROR:152: Module 'PrgEnv-cray' is currently not loaded
/home3/e01/e01/yyan
Currently Loaded Modulefiles:
1) modules/3.2.10.6
2) nodestat/2.2-1.0502.60539.1.31.ari
3) sdb/1.1-1.0502.63652.4.25.ari
4) alps/5.2.5-2.0502.9955.44.1.ari
5) lustre-cray_ari_s/2.5_3.0.101_0.46.1_1.0502.8871.22.1-1.0502.21658.55.1
6) udreg/2.3.2-1.0502.10518.2.17.ari
7) ugni/6.0-1.0502.10863.8.29.ari
8) gni-headers/4.0-1.0502.10859.7.8.ari
9) dmapp/7.0.1-1.0502.11080.8.76.ari
10) xpmem/0.1-2.0502.64982.5.3.ari
11) hss-llm/7.2.0
12) Base-opts/1.0.2-1.0502.60680.2.4.ari
13) intel/17.0.0.098
14) craype-network-aries
15) craype-ivybridge
16) craype/2.5.10
17) pbs/12.2.401.141761
18) cray-mpich/7.5.5
19) packages-archer
20) xalt/0.6.0
21) cse-compute-defaults/3.0
22) cray-libsci/16.11.1
23) pmi/5.0.13
24) atp/2.1.0
25) PrgEnv-intel/5.2.82
/work/e01/e01/yyan/rans/0018turbine/case1/RESU/20180319-2006
[NID 00046] 2018-03-20 10:39:12 Apid 30319988: initiated application termination
[NID 00045] 2018-03-20 10:39:14 Apid 30319988: OOM killer terminated this process.
[NID 00046] 2018-03-20 10:39:14 Apid 30319988: OOM killer terminated this process.
[NID 00044] 2018-03-20 10:39:14 Apid 30319988: OOM killer terminated this process.
[NID 00043] 2018-03-20 10:39:14 Apid 30319988: OOM killer terminated this process.
Application 30319988 exit signals: Killed
Application 30319988 resources: utime ~0s, stime ~8s, Rss ~7296, inblocks ~199448, outblocks ~469584
--------------------------------------------------------------------------------
Resources requested: ncpus=192,place=free,walltime=00:20:00
Resources allocated: cpupercent=0,cput=00:00:02,mem=8668kb,ncpus=192,vmem=172772kb,walltime=00:09:21
*** yyan Job: 5190879.sdb ended: 20/03/18 10:39:17 queue: S5184105 ***
--------------------------------------------------------------------------------
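The "OOM killer terminated this process" lines above indicate the compute nodes ran out of memory, most likely during the face-joining step, which operates on the ~60.7M boundary faces selected in the listing. As a back-of-the-envelope check (the per-cell byte count below is an assumed order of magnitude, not a measured figure):

```python
# Rough memory estimate per MPI rank for the case in the listing.
# The ~1 kB/cell figure is an assumption (a typical order of magnitude
# for a RANS Rij-epsilon working set); the joining step can need far
# more, since all ~60.7M selected boundary faces are handled globally.
n_cells = 10_179_720        # "Number of cells" from the listing
n_ranks = 480               # "MPI ranks" from the listing
bytes_per_cell = 1024       # assumed solver working set per cell

mem_per_rank_mb = n_cells / n_ranks * bytes_per_cell / 1e6
print(f"~{mem_per_rank_mb:.0f} MB per rank before joining overhead")
# -> ~22 MB per rank before joining overhead
```

Since the estimated per-rank solver memory is small, the joining step's handling of the huge face selection is the more plausible culprit; running on more nodes (more memory per rank) or narrowing the joining selection criteria are the kinds of mitigation one might try.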
Re: calculation stopped without error profile output
Hi Yvan,
The output summary file is attached. Could you have a look at it? Is there anything wrong in this file? Thank you.
All the best, Yan.
- Attachments
- summary.doc (32.71 KiB)
Re: calculation stopped without error profile output
Hello,
No, it seems even the summary was not updated (which is expected when the job is stopped by the batch system/resource manager).
How about the files I recommended (the batch output log, probably in the directory the case was submitted from)?
Regards,
Yvan