SIGSEGV signal
Posted: Fri May 03, 2013 12:49 am
by fomeh
Hello everyone
I am trying to run Code_Saturne on a cluster using MPI (MVAPICH2) on 4 processors for a preliminary test. I get an error involving a SIGSEGV signal. Could someone help me with this? My log file is attached.
Re: SIGSEGV signal
Posted: Fri May 03, 2013 2:52 am
by fomeh
An update: I get this error even on 1 processor.
Re: SIGSEGV signal
Posted: Fri May 03, 2013 7:47 am
by Yvan Fournier
Hello,
This may be an installation issue, as the code crashes very early on, but parts of the XML file and user subroutines are applied first, so we can't help you much if you don't post those...
Also, it is strange that you have 4 log files: this is not the default, so if you changed a setting, it may be explained, otherwise it is definitely an installation issue...
Regards,
Yvan
Re: SIGSEGV signal
Posted: Fri May 03, 2013 1:02 pm
by fomeh
Thanks, Yvan.
There is no user subroutine. I set up this case just to verify my .pbs submission file, but it crashes. The case runs fine on my workstation, even in parallel, but I get this error on the cluster.
Re: SIGSEGV signal
Posted: Fri May 03, 2013 1:21 pm
by Yvan Fournier
Hello,
Could you still post the XML file?
Also, another test would be to edit run_solver.sh in the execution directory
so as to replace:
--param fm
With:
--quality
and re-run run_solver.sh (possibly adding the BATCH templates from runcase if necessary).
This way, you will only compute quality criteria, and not load the XML file. This may help determine whether the crash is due to the xml file (or libxml2 library installation) or something else.
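If editing by hand is inconvenient, the substitution above can be scripted. The following is only a minimal sketch in Python: the helper names switch_to_quality and patch_file are hypothetical, and it assumes the --param option and its argument sit on the same line of run_solver.sh, separated by whitespace. It keeps a copy of the original script so the change can be undone.

```python
import re
import shutil


def switch_to_quality(script_text):
    """Return script_text with each '--param <argument>' replaced by '--quality'.

    Assumption: the option and its argument are on one line,
    separated by whitespace.
    """
    return re.sub(r"--param\s+\S+", "--quality", script_text)


def patch_file(path="run_solver.sh"):
    """Rewrite the script in place, keeping a backup as '<path>.orig'."""
    shutil.copy2(path, path + ".orig")  # backup copy, preserves permissions
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(switch_to_quality(text))
```

Calling patch_file() from the execution directory, then re-running run_solver.sh, should give the quality-only run described above.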
Your trace seems to refer to environment modules. Code_Saturne tries to detect which are loaded at install time, and reload the same modules at run time. Depending on your module command version, this may fail, so you may want to try adding:
--with-modules=no
to the configure line and load modules separately (in your own environment).
We do not have experience with MVAPICH, so if none of the previous tests help, you may want to install a serial only version of the code on the cluster, using all the same tools, but adding --without-mpi to the configure line, just to see if the problem is due to MVAPICH (even on a single rank, it may try to run MPI_Init()). If that is the cause, you then have options in the DATA/cs_user_scripts.py (to be copied from DATA/REFERENCE) to modify parts of the MPI launch command.
Regards,
Yvan
Re: SIGSEGV signal
Posted: Fri May 03, 2013 3:35 pm
by fomeh
Thanks Yvan
I will try to do these tests.
I also attached the xml file. The installed software on the cluster is without the GUI.
Re: SIGSEGV signal
Posted: Fri May 03, 2013 5:41 pm
by Yvan Fournier
Hello,
The XML file reads fine on a workstation I tested it on (it fails much later, since I tested it on another mesh, but I just wanted to test the initialization anyway).
Regards,
Yvan
Re: SIGSEGV signal
Posted: Fri May 03, 2013 7:25 pm
by fomeh
I recompiled the code with gcc and Open MPI on the cluster. It finally works.
Thanks, Yvan, for your valuable help.