Some problems on BlueGene/Q

Yvan Fournier
Posts: 4250
Joined: Mon Feb 20, 2012 3:25 pm

Re: Some problems on BlueGene/Q

Post by Yvan Fournier »

Hello,

For the Fortran compiler, I believe I used bgxlf95 instead of bgf95 (if both aliases exist). You may also try bgxlf95_r.

From the information I have, the EDF and IDRIS BlueGene/Q systems have the same driver version, so I assume they have the same compiler version.

But I don't recall having the same issue in debug mode as you do (I recall having had some minor issues at some point, but don't remember how they were solved; had they been major issues, I would probably remember).

Did you try re-running "make" after the first failure to see whether the bug is reproduced (internal compiler errors are not always systematic)?

At worst, you can "cd" into the build directory's "src/base" directory, run "make" locally, let it fail, and re-run the last compile command (that of stdtcl.f90) manually, removing the "-g" compiler option. If that works, go back to the top-level build directory and run "make" to continue compilation.
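
As a rough sketch of the manual re-run described above (not taken from any actual build): strip_debug_flag below is a hypothetical shell helper that prints a compile command with the standalone "-g" option removed. The compiler name and options in the usage note are placeholders; the real command is whatever "make" echoed just before failing on stdtcl.f90.

```shell
# Hypothetical helper (an assumption, not part of code_saturne or the XL tools):
# print the given command line with every standalone "-g" argument dropped.
# Options such as "-g3" are deliberately left untouched.
strip_debug_flag() {
    out=""
    for arg in "$@"; do
        if [ "$arg" != "-g" ]; then
            # Append the argument, adding a separating space only when needed.
            out="${out:+$out }$arg"
        fi
    done
    printf '%s\n' "$out"
}
```

For example, strip_debug_flag bgxlf95_r -g -O2 -c stdtcl.f90 prints "bgxlf95_r -O2 -c stdtcl.f90". Arguments are re-joined with single spaces, so this only suits commands whose arguments contain no spaces or quotes; for anything more complex, editing the command by hand is safer.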

This still does not explain why your regular build fails. Do you have any user subroutines? Did you try running without them? Could you run a much smaller case with the same configuration (XML + user subroutines) on a workstation, or better, on a workstation using Valgrind?

Any of these might help debug your case.

Regards,

Yvan
zeph67
Posts: 53
Joined: Tue Oct 23, 2012 5:54 pm

Re: Some problems on BlueGene/Q

Post by zeph67 »

I ran exactly the same case on a workstation (with an optimized build). Everything was OK, and the results were even published. To me, that proves that my problem on the BlueGene is purely an installation issue.

Also, I tried several combinations of Fortran and C compilers; they all led to the same error I mentioned in my last message.

Now, I'm gonna try your last suggestions, and of course I'll let you know.

Thanks !
zeph67

Re: Some problems on BlueGene/Q

Post by zeph67 »

Dear Yvan,

I tried your last suggestions. None of them allowed a correct compilation for the debug compute build...

Best regards.
Yvan Fournier

Re: Some problems on BlueGene/Q

Post by Yvan Fournier »

Hello,

Could you post your "config.log" file, the output of the failing "make" command, and possibly the output of the "env" command?

Regards,

Yvan
zeph67

Re: Some problems on BlueGene/Q

Post by zeph67 »

Hello Yvan,

I suggest, for the moment, we put aside this debug-build problem.

Here you'll find the config.log (for the compute build, without --enable-debug) and "env.log" which contains the result of the env command.

Thanks !
Attachments
env.log (2.67 KiB)
config.log (426.13 KiB)
Yvan Fournier

Re: Some problems on BlueGene/Q

Post by Yvan Fournier »

Hello,

I still find it surprising that debug builds fail.

A difference between your build and ours is that you do not use the GUI.

Did you run your case under Valgrind on a workstation? The fact that it does not crash does not prove that there is no error, such as writing past the end of an array, or some undefined or compiler-defined behavior.

I already suggested removing your user subroutines. As you are not using the GUI, this is not possible, so my suggestions would be:

- unless you have a very good reason not to, please use an XML file generated by the GUI.
- test under Valgrind
- otherwise, try to use only minimal user subroutines and see if this improves things

As you have not posted your user subroutines, I can't even have an opinion on whether your programming is good or not, but statistically, a majority of crashes are due to user subroutine or input errors, and having the routines apparently work correctly on one configuration is no guarantee that there is no error.

Regards,

Yvan
zeph67

Re: Some problems on BlueGene/Q

Post by zeph67 »

Hello Yvan,

Thanks for those instructions. In the attached tgz-ed folder, I've put the RESU directories of:
- my "full" case, run on the BGQ and on a workstation using Valgrind (VD),
- a "simple" case, also run on the BGQ and on a workstation using Valgrind (VD).

The RESU directories are trimmed to keep the file transfer small. If you need any other information or files, let me know.
Note that the outputs of Valgrind are contained in files named "log.txt".

Concerning the GUI, I cannot install a GUI build on the BGQ, because Python-Qt4 and the related required packages are not installed on the frontend. Of course I could download and compile them under my own account, but I'm not sure how to link them correctly to the CS build.

However, those tests are quite useful. The common bug found by Valgrind seems to concern something with MPI (see cs_base.c, line 1112).

Thanks for your help !
Attachments
zeph67_1711.tgz (399.5 KiB)
Yvan Fournier

Re: Some problems on BlueGene/Q

Post by Yvan Fournier »

Hello,

Regarding the GUI, you do not need to install Qt on the Blue Gene frontend:
- installing libxml is enough; you can then use an XML file generated on a workstation
- since the system on the front-end is probably RHEL 6, Qt4, PyQt4, etc. exist as packages; in our case, we simply asked our admins if they could add the required Red Hat packages to the front-end (which they did), so no manual install was required

In any case, Valgrind only finds a minor issue inside OpenMPI, so that test does not reveal anything. But it is always good practice to try this first, as it saves a lot of time in many cases (not this time).

But your BG/Q listing, even on the simple case, complains that uref is not defined. I recommend defining uref in cs_user_parameters.f90 (in usipgl). It might explain the crash.

Regards,

Yvan
zeph67

Re: Some problems on BlueGene/Q

Post by zeph67 »

Hello Yvan,

I tried your advice concerning UREF. It didn't help. I also tried ITURB=0.
Anyway, all turbulent variables are initialized in cs_user_initialization, so it seems that there is no need for UREF. Otherwise, the case would not run on a workstation either.

Besides, one can see in my simple case that my routines are not involved in the crash. Indeed, I put some "coucou" debug flags at the beginning and end of each routine (except those in cs_user_parameters.f90).
All flags appear when running.
In the full case, the crash happens just before calling resssg: the "coucou" flag appears, but just below it I try to display the values (l. 589) of icepdc, icetsm and itpsmp, and there is a crash.

Let's recall that, on a workstation, everything runs fine.

Now, concerning XML on the BG/Q, I'm gonna try that right now.

Thanks, and best regards.
Yvan Fournier

Re: Some problems on BlueGene/Q

Post by Yvan Fournier »

Hello,

I'll try running your simple case on our Blue Gene/Q, but I'm out of the office and won't have access to it for another week. If you make any progress in debugging, please keep me informed.
Do you know how to use the BlueGene/Q core files or debugging mode? Your admins may be able to help you there.

Otherwise, I'll check if I reproduce this specific crash on our machine.

Regards,

Yvan