Has anyone had problems with restarts in v7?
I cannot manage to restart a simulation. The previous simulation was completed with no errors.
The error looks like a memory issue; could it be due to the environment rather than to the code itself?
I did indeed see (before creating this new topic) your forum post about your restart issues.
I had misunderstood the solution you found: I changed only the read method, not the write method. I have just run a test changing both methods, and the restart now seems to work correctly!
Do you have more information on the MPI library and file system on which you encountered this issue?
Also, if you have a small test case on which I could try to reproduce this (in case it is generic and not related to a single system), that would be of interest.
Sorry Yvan, I missed your last post on this topic, which is why I had not replied earlier.
I will try to collect the information about our MPI environment and also try to build a small test case (the actual one is too big to be shared).
Thank you (and happy new year!),
Best regards,
Daniele
Thanks for the info. At the beginning of your "listing"/run_solver.log file, you also have system information. Could you provide that (editing out, if you choose, the line with your login name, which I do not need)?
Local case configuration:
Date: …
System: Linux 3.10.0-1127.19.1.el7.x86_64
Machine: node112
Processor: model name : Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz
Memory: 191887 MB
User: ...
Directory: …
MPI ranks: 80 (appnum attribute: 0)
MPI ranks per node: 40
OpenMP threads: 1
Processors/node: 20
Compilers used for build:
C compiler: gcc (GCC) 5.5.0
C++ compiler: g++ (GCC) 5.5.0
Fortran compiler: GNU Fortran (GCC) 5.5.0
MPI version 3.1 (Open MPI 2.1.1)
I/O read method: standard input and output, serial access
I/O write method: standard input and output, serial access
I/O rank step: 1
External libraries for partitioning:
ParMETIS 4.0.3
SCOTCH 6.1.0
Hope this answers your question.
Kind regards,
Daniele
Yes, this is what I wanted to check. Open MPI 2.1 is quite old; at the time we used it I did not encounter I/O problems, though such problems can also depend on the filesystem configuration, so they are difficult to reproduce on another system.
I see that your system reports 20 processors per node, but you are using 40 ranks per node. The reporting/detection might be wrong, but otherwise I would expect better performance using only as many ranks per node as there are physical processors, unless perhaps hyperthreading comes into play? I'm interested in feedback on this too.