mpi error

Post by mkendrick »

Hello, I'm experiencing an MPI error that seems to be specific to Code_Saturne.
I'm using:
Open MPI v1.8.3
or
Open MPI v1.6.5 (which shipped with Code_Saturne v3.0.5)
Code_Saturne v3.0.5

Error message with Open MPI 1.8.3:

mca_oob_tcp_recv_handler: invalid message type: 14

This message is emitted once per node. No listing is created, as the job fails to start. I've tried the run command both with and without --mpi appended, with no noticeable effect.


Error message with Open MPI 1.6.5:
This run produces a listing and an error file. I have also included the output to stderr, named error_file (it contains some junk from modules, which can be ignored).

In both cases I've also tried adding the following to the MPI command:

--mca oob_tcp_listen_mode listen_thread

(this was to tackle the problem seen when using Open MPI 1.8.3)

I've also tried to force MPI to use InfiniBand using

--mca btl openib

and

--mca btl ^tcp

Neither had any noticeable effect.
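
For reference, the BTL selections above can be assembled into a full launch command as in the rough sketch below. It only builds and prints the command so it can be inspected before use. The helper name `mpirun_command`, the process count, and the executable path `./cs_solver` are placeholders, not taken from this setup. The default list `openib,self,sm` is also an assumption: with Open MPI 1.x, listing `openib` alone excludes the `self` loopback and `sm` shared-memory components, which is a common cause of start-up failures, whereas `^tcp` merely excludes TCP and leaves the rest available.

```python
import shlex


def mpirun_command(nprocs, program, btl="openib,self,sm"):
    """Build an mpirun argument list that selects the given BTL components.

    The default BTL list is an assumption for an InfiniBand cluster running
    Open MPI 1.x; adjust it to match the components actually installed.
    """
    return ["mpirun", "--mca", "btl", btl, "-np", str(nprocs), program]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be checked first.
    # "./cs_solver" and the process count 4 are placeholders.
    for btl in ("openib,self,sm", "^tcp"):
        args = mpirun_command(4, "./cs_solver", btl=btl)
        print(" ".join(shlex.quote(a) for a in args))
```

Passing the returned list to `subprocess.run` would launch the job; printing it first makes it easier to compare with the command Code_Saturne actually generates.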

Finally, it's worth mentioning that I have checked whether the NFS is version 3 (it is), but it hasn't been mounted with the 'noac' option. I'm hoping to try this next week, as I don't have root access.
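
Reading the current mount options does not need root access, only changing them does. The sketch below is one possible way to list NFS mounts with their version and whether `noac` is set, assuming a Linux system that exposes `/proc/mounts`. The function names `nfs_mounts` and `report` are illustrative only.

```python
def nfs_mounts(mounts_text):
    """Return (mount_point, fs_type, options) for NFS entries in
    /proc/mounts-style text (device, mount point, type, options, ...)."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, fs_type, options = fields[1], fields[2], fields[3]
        if fs_type in ("nfs", "nfs4"):
            entries.append((mount_point, fs_type, options.split(",")))
    return entries


def report(mounts_text):
    """Summarise each NFS mount: protocol version and whether noac is set."""
    lines = []
    for mount_point, fs_type, options in nfs_mounts(mounts_text):
        version = next((o for o in options if o.startswith("vers=")), "vers=?")
        noac = "yes" if "noac" in options else "no"
        lines.append(f"{mount_point}: {fs_type} {version} noac={noac}")
    return lines


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as f:  # Linux-specific; assumed to exist
            text = f.read()
    except OSError:
        text = ""
    for line in report(text):
        print(line)
```

Running this on each compute node, rather than only the login node, would show whether the mounts differ between them.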

Any help is much appreciated.

Cheers,
Martyn
Attachments: errors.tar.gz (40 KiB)

Re: mpi error

Post by Yvan Fournier »

Hello,

There seems to be a problem with using MPI-IO on your system.

You can disable MPI-IO with Code_Saturne either globally (--disable-mpi-io at the configure stage) or as part of a case's performance tuning setup. If you are using a serial file system or mount such as simple NFS (not Parallel NFS), you might as well disable MPI-IO completely to avoid this kind of issue.

Regards,

Yvan