InfiniBand Error

Posted: Wed Jul 15, 2020 3:48 am
by zhao guang
Hi:
When I run a parallel computation (it runs and outputs correct results), the cluster constantly writes error messages to a file named "messages" (/var/log/messages). As the calculation progresses, the file takes up more and more space, until the cluster can no longer work.

I used InfiniBand (IB) for the calculation. There may be a conflict between the Gigabit LAN and InfiniBand.

I would like to know whether there are any special parameter settings that can stop the cluster from writing these error messages. I would appreciate any help!

PS: I uploaded a file named "config.log". This file contains my Code_Saturne-5.0.8 installation path. Please take a look at it; I am not sure my installation path is correct.

Re: InfiniBand Error

Posted: Wed Jul 15, 2020 10:27 am
by Yvan Fournier
Hello,

This should probably be in the "install" section.

Did you do a post-install (see the installation documentation) to make sure the mpiexec/mpirun command matches the installed MPI?

In your config.log, it seems that an OpenMPI module was loaded, but the detected MPI version matching your "mpicc/mpicxx/mpif90" is MPICH. You can use either MPICH or OpenMPI, but having environments mixing both is not a good idea.
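
A quick way to spot this kind of mix is to compare what the launcher and the compiler wrapper report about themselves. The following is only a rough sketch, not part of Code_Saturne: the function names (classify_mpi, tool_output) are hypothetical, and the keywords it searches for are an assumption about what typical OpenMPI and MPICH builds print, so an "unknown" result means the check is inconclusive rather than that anything is wrong.

```python
import shutil
import subprocess


def classify_mpi(text):
    """Guess the MPI flavour from tool output using keyword heuristics.

    The keywords are assumptions about typical OpenMPI / MPICH output.
    """
    lowered = text.lower()
    if any(key in lowered for key in ("open mpi", "openmpi", "open-mpi", "openrte")):
        return "openmpi"
    if "mpich" in lowered or "hydra" in lowered:
        return "mpich"
    return "unknown"


def tool_output(command):
    """Return stdout+stderr of a command, or an empty string if it cannot run."""
    if shutil.which(command[0]) is None:
        return ""
    try:
        result = subprocess.run(command, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return ""
    return result.stdout + result.stderr


def main():
    # Launcher: ask mpiexec for its version banner.
    launcher = classify_mpi(tool_output(["mpiexec", "--version"]))
    # Compiler wrapper: try both spellings of the "show the underlying
    # command" option, since wrappers differ in which one they accept.
    wrapper_text = tool_output(["mpicc", "-show"]) + tool_output(["mpicc", "--showme"])
    compiler = classify_mpi(wrapper_text)

    print("mpiexec looks like:", launcher)
    print("mpicc looks like:  ", compiler)
    if "unknown" in (launcher, compiler):
        print("Could not identify one of the tools; check by hand.")
    elif launcher != compiler:
        print("Mismatch: the launcher and the compiler wrapper come from different MPI libraries.")
    else:
        print("Launcher and compiler wrapper appear to match.")


if __name__ == "__main__":
    main()
```

Run it in the same shell environment (same loaded modules) as the one used to build and launch the code, since the result depends on which mpiexec and mpicc come first in the PATH.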

Regards,

Yvan

Re: InfiniBand Error

Posted: Wed Jul 15, 2020 1:26 pm
by zhao guang
Thank you very much for your reply. I will try it according to your suggestion.

Guang