user routine parallelization issues - v7

Questions and remarks about code_saturne usage
daniele
Posts: 149
Joined: Wed Feb 01, 2017 11:42 am

user routine parallelization issues - v7

Post by daniele »

Hello,

I am having trouble getting a user routine to work correctly in parallel.
The code in question is written inside the routine "cs_boundary_conditions_ale.f90" (subroutine usalcl), in v7.
The routine loops over the faces of one boundary surface and stores the positions of their nodes:

Code:

allocate(lstelt(nfabor))

call getfbr('BC1', nlelt1, lstelt)

if(.not.allocated(y_v)) then
  allocate(y_v(nnod))
endif
if(.not.allocated(y_v_paral)) then
  allocate(y_v_paral(nnod))
endif

! We store inside y_v (or y_v_paral) the y-coordinate of all vertices --> y_v_paral will have a size equal to k_par 
k=1
do ilelt = 1, nlelt1
   ifac = lstelt(ilelt)
   do ii = ipnfbr(ifac), ipnfbr(ifac+1)-1
      inod = nodfbr(ii)
      y_v(k) = (xyzno0(3,inod))
      k=k+1
   enddo
enddo

! Gather the local y_v arrays of all ranks into y_v_paral
k_par = k-1
if (irangp.ge.0) then
      call parcpt(k_par)
      call cs_parall_allgather_r(k-1,k_par,y_v,y_v_paral)
endif

The routine works correctly with up to 20 CPUs on a single node. It fails with a higher CPU count (I tested it on a 40-CPU node), and also when I run the simulation on more than one node (for example, on two nodes of 20 CPUs each). The error I get is the following:

Code:

MPI_ABORT was invoked on rank 15 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
[node141:106113] 2 more processes have sent help message help-mpi-api.txt / mpi-abort
[node141:106113] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
 solver script exited with status 1.

Error running the calculation.

Moreover, the behavior seems case-dependent: for certain cases, the same routine works without problems even when run on several nodes.
Do you have any suggestion or idea about what could cause the problem?

Thank you very much in advance for your help.
Kind regards,
Daniele
daniele
Posts: 149
Joined: Wed Feb 01, 2017 11:42 am

Re: user routine parallelization issues - v7

Post by daniele »

Hello,

I should add that the issue seems to come from "cs_parall_allgather_r()": commenting out this line avoids the problem, and printing "k_par" after "call parcpt(k_par)" shows the correct result.

Thank you.
Kind regards,
Daniele
Yvan Fournier
Posts: 4080
Joined: Mon Feb 20, 2012 3:25 pm

Re: user routine parallelization issues - v7

Post by Yvan Fournier »

Hello,

In general, I recommend avoiding "allgather" operations when possible, unless the volume of associated data remains small.

In your case, the allocation for y_v_paral may be too small, as it is based on the local number of vertices, and not on the total number of boundary vertices using that BC.

You need to sum the local sizes before using allgather (you can check the examples for this).

Also note that the Fortran version of ALE boundary conditions has been removed after v8.0, so starting from v8.1 (end of 2023) and in the next major release (9.0 in June 2025), you will need to switch to the C version.

Best regards,

Yvan
daniele
Posts: 149
Joined: Wed Feb 01, 2017 11:42 am

Re: user routine parallelization issues - v7

Post by daniele »

Hello Yvan,

Thank you very much for spotting the issue so quickly. The problem does seem to come from the allocation of y_v_paral: increasing its size solves it.
Thank you also for the suggestion and for the updates about v8.

Kind regards,
Daniele