Hello everyone
I am trying to run Code_Saturne on a cluster using MPI (MVAPICH2) on 4 processors for a preliminary test. I get an error concerning a SIGSEGV signal. Could someone help me with this? My log file is attached.
SIGSEGV signal
-
- Posts: 4208
- Joined: Mon Feb 20, 2012 3:25 pm
Re: SIGSEGV signal
Hello,
This may be an installation issue, as the code crashes very early on, but parts of the XML file and user subroutines are applied first, so we can't help you much if you don't post those...
Also, it is strange that you have 4 log files: this is not the default. If you changed a setting, that may explain it; otherwise it is definitely an installation issue...
Regards,
Yvan
Re: SIGSEGV signal
Thanks, Yvan.
There is no user subroutine. I made this case study just to verify my .pbs submission file, but it crashes. The case runs fine on my workstation, even in parallel, but I get this error on the cluster.
Re: SIGSEGV signal
Hello,
Could you still post the XML file?
Also, another test would be to edit run_solver.sh in the execution directory so as to replace:
--param fm
with:
--quality
and re-run run_solver.sh (possibly adding the BATCH templates from runcase if necessary).
This way, you will only compute quality criteria, and not load the XML file. This may help determine whether the crash is due to the XML file (or the libxml2 library installation) or something else.
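For anyone who prefers to script the edit described above rather than do it by hand, here is a minimal Python sketch. It assumes the option appears literally as `--param fm` in run_solver.sh (as quoted in this post) and that the script is in the current directory; the function names and the `.orig` backup suffix are purely illustrative.

```python
from pathlib import Path

OLD_OPTION = "--param fm"  # as quoted in the post; "fm" is this case's XML file
NEW_OPTION = "--quality"   # replacement option suggested in the post


def switch_to_quality(script_text: str) -> str:
    """Return the script text with '--param fm' replaced by '--quality'."""
    if OLD_OPTION not in script_text:
        # Refuse to guess: the script may spell the option differently.
        raise ValueError(f"{OLD_OPTION!r} not found; edit the script by hand")
    return script_text.replace(OLD_OPTION, NEW_OPTION)


def patch_script(path: str = "run_solver.sh") -> None:
    """Patch the script in place, keeping a backup copy next to it."""
    script = Path(path)
    original = script.read_text()
    # Save the untouched script as run_solver.sh.orig before overwriting.
    script.with_name(script.name + ".orig").write_text(original)
    script.write_text(switch_to_quality(original))
```

Calling `patch_script()` from the execution directory and then re-running run_solver.sh gives the quality-only run; copying the `.orig` file back restores the original launch line.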
Your trace seems to refer to environment modules. Code_Saturne tries to detect which are loaded at install time, and reload the same modules at run time. Depending on your module command version, this may fail, so you may want to try adding:
--with-modules=no
to the configure line and load modules separately (in your own environment).
We do not have experience with MVAPICH, so if none of the previous tests help, you may want to install a serial only version of the code on the cluster, using all the same tools, but adding --without-mpi to the configure line, just to see if the problem is due to MVAPICH (even on a single rank, it may try to run MPI_Init()). If that is the cause, you then have options in the DATA/cs_user_scripts.py (to be copied from DATA/REFERENCE) to modify parts of the MPI launch command.
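To keep the two rebuild tests suggested above apart, a small sketch like the following can print one configure line per variant. Only `--with-modules=no` and `--without-mpi` come from this thread; the baseline command (`../configure` and the `--prefix` path) is a placeholder and should be replaced with the configure line actually used on the cluster.

```python
import shlex

# Hypothetical baseline: substitute the configure line used for the real install.
BASE_CONFIGURE = ["../configure", "--prefix=/path/to/install"]


def with_flag(base, flag):
    """Return a copy of the configure command with one extra flag (no duplicates)."""
    return list(base) + ([] if flag in base else [flag])


# One variant per diagnostic test mentioned in the post.
variants = {
    "no module detection": with_flag(BASE_CONFIGURE, "--with-modules=no"),
    "serial-only build": with_flag(BASE_CONFIGURE, "--without-mpi"),
}

for label, cmd in variants.items():
    # shlex.join quotes arguments so the line can be pasted into a shell.
    print(f"# {label}")
    print(shlex.join(cmd))
```

Building each variant in its own build and install directory makes it easier to tell which change, if any, removes the crash.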
Regards,
Yvan
Re: SIGSEGV signal
Thanks, Yvan.
I will try to do these tests.
I also attached the xml file. The installed software on the cluster is without the GUI.
- Attachments
- fm.txt (7.49 KiB)
Last edited by fomeh on Fri May 03, 2013 10:41 pm, edited 1 time in total.
Re: SIGSEGV signal
Hello,
The XML file reads fine on a workstation I tested it on (it fails much later, since I tested it with another mesh, but I just wanted to test the initialization anyway).
Regards,
Yvan
Re: SIGSEGV signal
I recompiled the code with GCC and Open MPI on the cluster. It finally works.
Thanks, Yvan, for your valuable help.