mpi error
Posted: Fri Mar 20, 2015 5:38 pm
Hello, I'm experiencing an MPI error that seems to be localised to Code_Saturne.
I'm using:
Open MPI v1.8.3
or
Open MPI v1.6.5 (which shipped with cs v3.0.5)
Code_Saturne v3.0.5
Error message with Open MPI 1.8.3:
Code:
mca_oob_tcp_recv_handler: invalid message type: 14
This message is sent once per node.
No listing is created because the job fails to start. I've tried the run command both with and without --mpi, with no noticeable effect.
Error message with Open MPI 1.6.5:
This run produces a listing and an error file. I have also included the output to stderr in a file named error_file (it includes some junk from modules, which can be ignored).
In both cases I've also tried adding the following to the MPI command (this was to tackle the problem seen when using Open MPI 1.8.3):
Code:
--mca oob_tcp_listen_mode listen_thread
I've also tried to force MPI to use InfiniBand using
Code:
--mca btl openib
Code:
--mca btl ^tcp
With no noticeable effect.
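For anyone wanting to reproduce the variants, here is a minimal Python sketch of how the MCA options listed above could be assembled into separate mpirun command lines. The helper name `build_mpirun_command`, the process count, and the executable path `./cs_solver` are assumptions made for illustration only. They are not taken from the actual run script that Code_Saturne generates.

```python
import shlex


def build_mpirun_command(n_procs, executable, mca_params):
    """Return an mpirun argument list with one '--mca key value' triple per parameter."""
    cmd = ["mpirun", "-np", str(n_procs)]
    for key, value in mca_params:
        cmd += ["--mca", key, value]
    cmd.append(executable)
    return cmd


# The three variants mentioned in the post, each tried on its own.
variants = {
    "listen_thread": [("oob_tcp_listen_mode", "listen_thread")],
    "openib_only": [("btl", "openib")],
    "exclude_tcp": [("btl", "^tcp")],
}

if __name__ == "__main__":
    for name, params in variants.items():
        # "./cs_solver" and the process count 4 are placeholders (assumptions).
        cmd = build_mpirun_command(4, "./cs_solver", params)
        # shlex.quote protects characters such as '^' when pasting into a shell.
        print(name + ": " + " ".join(shlex.quote(arg) for arg in cmd))
```

Keeping each variant as a separate list makes it easy to test them one at a time rather than combining options that may interact.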
Finally, it's worth mentioning that I have checked whether the NFS mount is version 3 (it is), but it hasn't been mounted with 'noac'. I'm hoping to try this next week, as I don't have root access.
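The NFS check described above does not need root access. As a rough sketch, assuming a Linux node where /proc/mounts is readable and lists NFS options such as `vers=3` and `noac` in the fourth field, something like the following Python could report them. The function name `nfs_mount_options` is made up for this example.

```python
import os


def nfs_mount_options(mounts_text):
    """Map each NFS mount point to the set of its mount options."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, fs_type, options = fields[1], fields[2], fields[3]
        if fs_type in ("nfs", "nfs4"):
            result[mount_point] = set(options.split(","))
    return result


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as mounts_file:
        mounts = nfs_mount_options(mounts_file.read())
    for mount_point, opts in mounts.items():
        # Report the NFS version option if present, and whether 'noac' is set.
        version = next((o for o in opts if o.startswith("vers=")), "vers=?")
        caching = "noac" if "noac" in opts else "attribute caching on"
        print(mount_point, version, caching)
```

Running this on each compute node would show whether every node sees the same mount options for the shared directory.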
Any help is much appreciated.
Cheers,
Martyn