Errors of CS 4.0.1 when compiled on ARCHER
Posted: Mon Apr 04, 2016 1:57 pm
Hello, everyone.
I recompiled CS in my own directory after the recent library update on ARCHER.
I followed the summary of the configuration options and the full compile instructions at http://www.archer.ac.uk/documentation/s ... phase2.php exactly.
The initialization went fine: I obtained a results folder with a mesh_input in it.
But when I tried to run CS, it gave errors like:
"Rank 11 [Mon Apr 4 13:39:09 2016] [c0-0c0s1n1] Fatal error in PMPI_Alltoallv: Other MPI error, error stack:
PMPI_Alltoallv(557)............: MPI_Alltoallv(sbuf=0x14091c0, scnts=0x13d8838, sdispls=0x1409340, MPI_BYTE, rbuf=0x8c50f0, rcnts=0x13d8538, rdispls=0x1409940, MPI_BYTE, MPI_COMM_WORLD) failed
MPIR_Alltoallv_impl(380).......:
MPIDI_CRAY_ugni_alltoallv(1373):
MPIU_ugni_wait_rdma_events(412): GNI_CqGetEvent (GNI_RC_SUCCESS)
Rank 9 [Mon Apr 4 13:39:09 2016] [c0-0c0s1n1] Fatal error in PMPI_Alltoallv: Other MPI error, error stack:
PMPI_Alltoallv(557)............: MPI_Alltoallv(sbuf=0x85edd0, scnts=0x1a5a848, sdispls=0x85ef50, MPI_BYTE, rbuf=0x11dbd70, rcnts=0x1a5a548, rdispls=0x85f550, MPI_BYTE, MPI_COMM_WORLD) failed
MPIR_Alltoallv_impl(380).......:
MPIDI_CRAY_ugni_alltoallv(1373):
MPIU_ugni_wait_rdma_events(412): GNI_CqGetEvent (GNI_RC_SUCCESS)
......" (See attached for details)
I am confused by these errors. Perhaps the run loaded an incorrect version of one of the dependencies.
I also found that the error occurred when the computation was at the step "Partitioning 13476280 cells to 192 domains on 192 ranks (SCOTCH_dgraphPart)." Maybe it is a SCOTCH problem?
Does anyone know what might cause this?
Sean