Hello, guys.
The recent module changes on ARCHER have caused my CS 4.0.1 runs to fail with the following errors:
"Rank 0 [Tue Mar 15 10:46:07 2016] [c0-0c2s10n3] Fatal error in PMPI_Alltoallv: Other MPI error, error stack:
PMPI_Alltoallv(557)............: MPI_Alltoallv(sbuf=0x1396f00, scnts=0xf420e8, sdispls=0x1397080, MPI_BYTE, rbuf=0xef1060, rcnts=0xf41de8, rdispls=0x1397680, MPI_BYTE, MPI_COMM_WORLD) failed
MPIR_Alltoallv_impl(380).......:
MPIDI_CRAY_ugni_alltoallv(1373):
MPIU_ugni_wait_rdma_events(412): GNI_CqGetEvent (GNI_RC_SUCCESS)
Rank 3 [Tue Mar 15 10:46:07 2016] [c0-0c2s10n3] Fatal error in PMPI_Alltoallv: Other MPI error, error stack:
PMPI_Alltoallv(557)............: MPI_Alltoallv(sbuf=0x1305990, scnts=0x1747dc8, sdispls=0x1305b10, MPI_BYTE, rbuf=0x8d3c40, rcnts=0x1747ac8, rdispls=0x1306110, MPI_BYTE, MPI_COMM_WORLD) failed
MPIR_Alltoallv_impl(380).......:
MPIDI_CRAY_ugni_alltoallv(1373):
MPIU_ugni_wait_rdma_events(412): GNI_CqGetEvent (GNI_RC_SUCCESS)
Rank 11 [Tue Mar 15 10:46:07 2016] [c0-0c2s10n3] Fatal error in PMPI_Alltoallv: Other MPI error, error stack:
PMPI_Alltoallv(557)............: MPI_Alltoallv(sbuf=0x1470040, scnts=0x1b00a48, sdispls=0x14701c0, MPI_BYTE, rbuf=0x795510, rcnts=0x1b00748, rdispls=0x14707c0, MPI_BYTE, MPI_COMM_WORLD) failed
MPIR_Alltoallv_impl(380).......:
MPIDI_CRAY_ugni_alltoallv(1373):
MPIU_ugni_wait_rdma_events(412): GNI_CqGetEvent (GNI_RC_SUCCESS)"
Does anybody know the reason for these errors?
Much appreciated!
Regards,
Sean
Code_Saturne not running on ARCHER
Re: Code_Saturne not running on ARCHER
Hello,
The code probably needs a reinstall if "background" libraries have changed.
This is also a good opportunity to upgrade to bugfix release 4.0.4, or 4.0.5 (which I'll probably release this afternoon).
Regards,
Yvan
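One quick way to see whether changed "background" libraries affect an existing build is to look for unresolved shared libraries. The following is only a minimal sketch: it assumes a dynamically linked executable and a system where `ldd` is available, and the helper names `missing_libraries` and `check_executable` are made up for illustration. It would not detect a library that still resolves but has changed behaviour, which may well be the case for the MPI error above.

```python
import subprocess
from typing import List


def missing_libraries(ldd_output: str) -> List[str]:
    """Return the names of shared libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Unresolved entries look like: "libfoo.so.1 => not found"
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_executable(path: str) -> List[str]:
    """Run ldd on an executable (path supplied by the caller) and list unresolved libraries."""
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=False
    )
    return missing_libraries(result.stdout)
```

An empty list only means that every library was found, not that the build is still valid, so a reinstall against the new modules remains the safer route.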
Re: Code_Saturne not running on ARCHER
Thanks for getting back to me, Yvan.
The thing is I need to run my restart files so I can finish the previous simulation.
I tried to run my 4.0.1 checkpoints with the centrally built CS 4.0.3 on ARCHER, but the simulation diverged. I do not know if this is due to the version difference, but it was running fine with 4.0.1 before.
I assume a re-install of at least 4.0.1 is necessary.
Thank you.
Regards,
Sean
Re: Code_Saturne not running on ARCHER
Hello,
This is strange, because restart files built by 4.0.1 should be fully compatible with 4.0.3.
I'll run some checks on a simple case to look into this.
Regards,
Yvan
Re: Code_Saturne not running on ARCHER
Hello again,
I just checked on a simple test case, and the restart files generated with versions 4.0.1 and 4.0.4 are identical, so the divergence you obtained with 4.0.3 is probably due to another issue.
I'm moving this thread to "installation", as it fits better there.
Regards,
Yvan
Re: Code_Saturne not running on ARCHER
I just ran the same restart case (from 4.0.1) with CS 4.0.1 and 4.0.4 respectively.
Continuing with the matching 4.0.1 gives normal solutions, but 4.0.4 gives obvious fluctuations in yplus, exactly the same as the divergence I saw last time with 4.0.3.
Just for your information.
Regards,
Sean
Re: Code_Saturne not running on ARCHER
Hello,
I just went through the changes in versions 4.0.x, and can't see what could cause this (there is one fix for wall BCs for low-Reynolds number turbulence models going from 4.0.3 to 4.0.4 which may be important, but nothing of the sort between 4.0.1 and 4.0.3).
Do any of the versions you tested use OpenMP? Are there any other installation differences between versions 4.0.1 and 4.0.3 (perhaps different compilers)?
Otherwise, is there a small case you could post or send which illustrates the different behaviour you obtain? What options are you using?
Regards,
Yvan
Re: Code_Saturne not running on ARCHER
Hi, sorry for the late reply.
I tested the case again with my own build of 4.0.4, and it looks fine: the max yplus at every step is slightly smaller than in the results from my own build of 4.0.1, but the simulation runs without diverging.
The 4.0.3 I mentioned previously is the ARCHER centrally built version, so the difference presumably comes from how it was compiled.
I will stick to my own builds in case the divergence happens again.
Thanks.
Sean
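For anyone wanting to make this kind of comparison between two builds less of an eyeball exercise, here is a minimal sketch. It assumes the per-step max yplus values from each run have already been extracted into two lists (how to extract them is not covered here), and the function name `first_divergence`, the 5% tolerance, and the numbers in the usage example are all made up for illustration.

```python
from typing import List, Optional


def first_divergence(
    ref: List[float], new: List[float], rel_tol: float = 0.05
) -> Optional[int]:
    """Return the index of the first step where `new` differs from `ref`
    by more than `rel_tol` (relative difference), or None if none does."""
    for step, (a, b) in enumerate(zip(ref, new)):
        # Guard against division by zero when both values are zero.
        scale = max(abs(a), abs(b), 1e-30)
        if abs(a - b) / scale > rel_tol:
            return step
    return None


# Illustrative values only, not taken from any real run.
reference_run = [10.0, 10.1, 10.2, 10.3]
other_run = [9.9, 10.0, 10.1, 25.0]
print(first_divergence(reference_run, other_run))
```

A slightly smaller yplus at every step, as between the two own-built versions, would stay under the tolerance, while a sudden jump like the one in the last illustrative step would be flagged.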