Code_Saturne not running on ARCHER

iorishx
Posts: 20
Joined: Fri Jun 19, 2015 11:33 am

Code_Saturne not running on ARCHER

Post by iorishx »

Hello, guys.

Since the recent module changes on ARCHER, my CS 4.0.1 installation fails with the following errors:
"Rank 0 [Tue Mar 15 10:46:07 2016] [c0-0c2s10n3] Fatal error in PMPI_Alltoallv: Other MPI error, error stack:
PMPI_Alltoallv(557)............: MPI_Alltoallv(sbuf=0x1396f00, scnts=0xf420e8, sdispls=0x1397080, MPI_BYTE, rbuf=0xef1060, rcnts=0xf41de8, rdispls=0x1397680, MPI_BYTE, MPI_COMM_WORLD) failed
MPIR_Alltoallv_impl(380).......:
MPIDI_CRAY_ugni_alltoallv(1373):
MPIU_ugni_wait_rdma_events(412): GNI_CqGetEvent (GNI_RC_SUCCESS)
Rank 3 [Tue Mar 15 10:46:07 2016] [c0-0c2s10n3] Fatal error in PMPI_Alltoallv: Other MPI error, error stack:
PMPI_Alltoallv(557)............: MPI_Alltoallv(sbuf=0x1305990, scnts=0x1747dc8, sdispls=0x1305b10, MPI_BYTE, rbuf=0x8d3c40, rcnts=0x1747ac8, rdispls=0x1306110, MPI_BYTE, MPI_COMM_WORLD) failed
MPIR_Alltoallv_impl(380).......:
MPIDI_CRAY_ugni_alltoallv(1373):
MPIU_ugni_wait_rdma_events(412): GNI_CqGetEvent (GNI_RC_SUCCESS)
Rank 11 [Tue Mar 15 10:46:07 2016] [c0-0c2s10n3] Fatal error in PMPI_Alltoallv: Other MPI error, error stack:
PMPI_Alltoallv(557)............: MPI_Alltoallv(sbuf=0x1470040, scnts=0x1b00a48, sdispls=0x14701c0, MPI_BYTE, rbuf=0x795510, rcnts=0x1b00748, rdispls=0x14707c0, MPI_BYTE, MPI_COMM_WORLD) failed
MPIR_Alltoallv_impl(380).......:
MPIDI_CRAY_ugni_alltoallv(1373):
MPIU_ugni_wait_rdma_events(412): GNI_CqGetEvent (GNI_RC_SUCCESS)"

Does anybody know the reason for these errors?

Much appreciated!

Regards,
Sean
Yvan Fournier
Posts: 4070
Joined: Mon Feb 20, 2012 3:25 pm

Re: Code_Saturne not running on ARCHER

Post by Yvan Fournier »

Hello,

The code probably needs a reinstall if "background" libraries have changed.

This is also a good opportunity to upgrade to bugfix release 4.0.4, or 4.0.5 (which I'll probably release this afternoon).

Regards,

Yvan
iorishx

Re: Code_Saturne not running on ARCHER

Post by iorishx »

Thanks for getting back to me, Yvan.

The problem is that I need to restart from my checkpoint files so I can finish the previous simulation.

I tried to restart from my 4.0.1 checkpoints with the centrally built CS 4.0.3 on ARCHER, but the simulation diverged. I do not know whether this is due to the version difference, but it was running fine with 4.0.1 before.

I assume I need to reinstall at least 4.0.1.

Thank you.

Regards,
Sean
Yvan Fournier

Re: Code_Saturne not running on ARCHER

Post by Yvan Fournier »

Hello,

This is strange, because restart files built by 4.0.1 should be fully compatible with 4.0.3.

I'll check this on a simple case.

Regards,

Yvan
Yvan Fournier

Re: Code_Saturne not running on ARCHER

Post by Yvan Fournier »

Hello again,

I just checked on a simple test case, and the restart files generated with versions 4.0.1 and 4.0.4 are identical, so the divergence you obtained with 4.0.3 is probably due to another issue.
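A byte-for-byte comparison of two sets of restart files, like the check described above, can be sketched with Python's standard `filecmp` module. This is a minimal sketch, assuming both checkpoint directories are accessible locally; the function name `compare_restart_dirs` is hypothetical and this is not necessarily how the check above was done.

```python
import filecmp
from pathlib import Path


def compare_restart_dirs(dir_a, dir_b):
    """Compare the regular files of two (hypothetical) checkpoint directories.

    Returns (identical, different, missing): names whose contents match,
    names whose contents differ or could not be read, and names present
    in only one of the two directories.
    """
    a, b = Path(dir_a), Path(dir_b)
    names_a = {p.name for p in a.iterdir() if p.is_file()}
    names_b = {p.name for p in b.iterdir() if p.is_file()}
    common = sorted(names_a & names_b)
    # shallow=False forces a content comparison instead of trusting os.stat() signatures
    match, mismatch, errors = filecmp.cmpfiles(str(a), str(b), common, shallow=False)
    missing = sorted(names_a ^ names_b)
    return match, mismatch + errors, missing
```

For example, `compare_restart_dirs("run_401/checkpoint", "run_404/checkpoint")` (hypothetical paths) would list every file as identical if the two versions wrote exactly the same restart data.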

I'm moving this thread to "installation", as it fits better there.

Regards,

Yvan
iorishx

Re: Code_Saturne not running on ARCHER

Post by iorishx »

I just ran the same restart case (from 4.0.1) with CS 4.0.1 and 4.0.4.

Version 4.0.1, the same version that wrote the checkpoint, gives normal solutions, but 4.0.4 shows obvious fluctuations in yplus, exactly like the divergence I had last time with 4.0.3.

Just for your information.

Regards,
Sean
Yvan Fournier

Re: Code_Saturne not running on ARCHER

Post by Yvan Fournier »

Hello,

I just went through the changes in versions 4.0.x, and can't see anything that would cause this (there is one fix for wall BCs with low-Reynolds-number turbulence models between 4.0.3 and 4.0.4 which may be important, but nothing of the sort between 4.0.1 and 4.0.3).

Do any of the versions you tested use OpenMP? Are there any other installation differences between versions 4.0.1 and 4.0.3 (perhaps different compilers)?

Otherwise, is there a small case you could post or send which illustrates the different behaviour you obtain? What options are you using?

Regards,

Yvan
iorishx

Re: Code_Saturne not running on ARCHER

Post by iorishx »

Hi, sorry for the late reply.

I tested the case again with my own build of 4.0.4, and it looks fine: the maximum yplus at each step is slightly smaller than with my own build of 4.0.1, but the simulation runs without diverging.

The 4.0.3 I mentioned previously is the centrally built version on ARCHER, so the difference probably comes from how it was compiled.

I will stick to my own build to avoid the divergence happening again.

Thanks.

Sean