issues with user routine parallelization

daniele
Posts: 148
Joined: Wed Feb 01, 2017 11:42 am

issues with user routine parallelization

Post by daniele »

Hello,

I hope somebody can help me understand some specific issues related to the parallelization of user routines.

I am trying to store in an array the face indexes of the faces of a given mesh boundary (called CYL_WALL).
Things are easy for a serial simulation, but become more complex for parallel simulations.
These are the relevant lines of my cs_user_extra_operations.f90 routine:

Code: Select all

! Select the boundary faces matching the criterion; getfbr returns
! the faces local to the current rank.
write(cond,"(A,1pe12.5,A,1pe12.5)") "CYL_WALL and y < ", 0.1, " and y > ", 0.05
call getfbr(cond, nlelt, lstelt)

if (irangp.ge.0) then ! parallel run

  do ilelt = 1, nlelt
    ifac_local(ilelt) = lstelt(ilelt)
    ifac = lstelt(ilelt)
    surf_j(jj) = surf_j(jj) + surfbn(ifac)
  enddo

  call parrsm(N, surf_j)   ! sum surf_j over all ranks

  nleltg = nlelt
  call parcpt(nleltg)      ! global count of selected faces
  write(nfecra,*) 'nlelt_getfbr', nleltg
  ifac_local_r = real(ifac_local)
  call cs_parall_allgather_r(nlelt, nleltg, ifac_local_r, ifac_par)
  do ilelt = 1, nleltg
    ifac = ifac_par(ilelt)
    face_dz(jj,ilelt) = ifac
  enddo

else ! serial run

  do ilelt = 1, nlelt
    ifac = lstelt(ilelt)
    ifac_test(ilelt) = lstelt(ilelt)
    face_dz(jj,ilelt) = ifac
    surf_j(jj) = surf_j(jj) + surfbn(ifac)
  enddo

endif
If I print the ifac_local and ifac_par arrays (which contain the indexes stored in a parallel simulation), I notice that the indexes inside them are different from the ones contained in the ifac_test array (which contains the indexes stored in a serial simulation): is this normal, i.e., do face/cell index values differ depending on the number of CPUs used?
If I print the value of surf_j(jj), I find exactly the same value for the serial and parallel approaches, which seems to suggest that the selected faces are the same in both cases, but their indexes are different. Is that right?

A further question is: once the matrix "face_dz" is built, can I access, in the user routine, all the elements of the matrix, which contain face index values (ifac) spread over all the different ranks, simply with:

Code: Select all

do ii = 1, N
  do jj = 1, N
    ifac = face_dz(ii,jj)
    surf_j(ii) = surf_j(ii) + surfbn(ifac)
  enddo
enddo


Or are the "ifac" of other ranks invisible to the rank on which the routine is run?

Hope my questions are clear.
Thank you very much for your help.
Best regards,
Daniele
Luciano Garelli
Posts: 280
Joined: Fri Dec 04, 2015 1:42 pm

Re: issues with user routine parallelization

Post by Luciano Garelli »

Hello,

When you mention that "the indexes inside them are different from the ones contained in the ifac_test array", are the values or the order different?

Why are you doing "ifac_local_r = real(ifac_local)"? It should be an integer list.

About the face_dz() matrix: you are filling each row of size "nleltg" with the face indexes stored in the array "ifac_par()", and "ifac_par()" contains the total list of the selected faces. So each row should be the same, if I'm not wrong.

Regards,
Luciano
daniele
Posts: 148
Joined: Wed Feb 01, 2017 11:42 am

Re: issues with user routine parallelization

Post by daniele »

Hello,

Thanks for your help. Actually, the values of the indexes are different: there are index values inside ifac_par which are not present inside ifac_test, and vice versa.

I have to do "ifac_local_r = real(ifac_local)" because the parallel function cs_parall_allgather_r requires real arrays, and there is no analogous function for integer arrays. Anyway, I don't think this is a problem: I double-checked, and this operation does not change the value of the index, which is then converted back to an integer through "ifac = ifac_par(ilelt)", where "ifac" is an integer.

Regarding "face_dz", I actually wonder whether, once it is filled with values from all the different ranks, I can access, somewhere else in the routine, for instance "face_dz(2,3000)", if "3000" is a face index stored on a rank different from the one (rank 0, I guess) where the routine is executed. I don't know if what I am saying is clear (and consistent)...

Thank you very much.
Kind regards,
Daniele
Yvan Fournier
Posts: 4070
Joined: Mon Feb 20, 2012 3:25 pm

Re: issues with user routine parallelization

Post by Yvan Fournier »

Hello,

Do you really need to store the face index ?

In any case, all standard indexes in the mesh structure are local. A processor "sees" only its local part of the mesh, and not the rest. This is the key to handling very large meshes that would not fit in the memory of a single node.

There are operators and structures in code_saturne to move data at some stages, especially when reading or writing checkpoint/restart files and visualization output in a partition-independent manner, but they are not accessible from Fortran, only from C. In the cs_glob_mesh structure (see src/mesh/cs_mesh.h), the global_cell_num, global_i_face_num, global_b_face_num, and global_vtx_num arrays provide the mappings from a local id (partition-dependent) to a global id ("stable") for cells, interior faces, boundary faces, and vertices respectively.

Local ids are signed and 0-based (0 to n-1), while global ones are 1-based (1 to n).
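
For illustration, using this mapping looks like the following sketch in C (the loop body is a placeholder; cs_glob_mesh and its global_b_face_num member come from src/mesh/cs_mesh.h):

Code: Select all

/* Translate local boundary face ids into partition-independent global
   numbers. In a serial run global_b_face_num may be NULL, in which
   case local and global numberings coincide. */

const cs_mesh_t *m = cs_glob_mesh;

for (cs_lnum_t f_id = 0; f_id < m->n_b_faces; f_id++) {
  cs_gnum_t g_num = (m->global_b_face_num != NULL) ?
                     m->global_b_face_num[f_id] : (cs_gnum_t)(f_id + 1);
  /* g_num is stable across partitionings; f_id is not */
}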

In any case, you cannot easily access a value from another processor on a given processor, unless you call the data movement operators such as those in cs_all_to_all.

Also, converting from an integer to a real and back is not safe, unless you know exactly how the range of a 32-bit integer compares to that of the "fractional" part of a 32-bit real, and are sure the latter is larger. In the IEEE floating-point standard, 23 bits of a single-precision real are available for the fraction. Without working out all the details, the general idea is that for large numbers, the conversion back to an integer may not return the original value. 2**23 is about 8 million, so this can be reached quickly...
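
To make this concrete, here is a minimal stand-alone sketch (single precision; with the implicit leading bit, exact integer representation ends at 2**24):

Code: Select all

#include <stdio.h>

int main(void)
{
  int   i = 16777217;    /* 2**24 + 1: first integer a float cannot hold */
  float r = (float)i;    /* rounds to 16777216.0f */
  printf("%d -> %d\n", i, (int)r);  /* prints "16777217 -> 16777216" */
  return 0;
}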

Finally, when summing interior face values, be careful, because a face on a parallel boundary could be counted twice. This is why in some examples you will see an "if (ifacel(1,ifac).le.ncel)" test, or its C equivalent "if (i_face_cells[face_id][0] < n_cells)". Since faces have the same orientation on each rank, if the cell "inside" a face (based on the face's normal pointing "out") is a ghost cell on one rank, it is a regular cell on the adjacent rank, and vice versa, so this simple test avoids double-counting.
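
In C, that test typically appears as in the following sketch (face_val is just a placeholder per-face array; cs_parall_sum is the C analogue of the Fortran parsom):

Code: Select all

/* Sum an interior-face quantity without double-counting faces on
   parallel boundaries: count a face only if its "inside" cell is a
   regular local cell (id < n_cells), not a ghost cell. */

const cs_mesh_t *m = cs_glob_mesh;
cs_real_t s = 0.;

for (cs_lnum_t f_id = 0; f_id < m->n_i_faces; f_id++) {
  if (m->i_face_cells[f_id][0] < m->n_cells)
    s += face_val[f_id];  /* face_val: placeholder array */
}

cs_parall_sum(1, CS_REAL_TYPE, &s);  /* global sum over all ranks */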

Best regards,

Yvan.
daniele
Posts: 148
Joined: Wed Feb 01, 2017 11:42 am

Re: issues with user routine parallelization

Post by daniele »

Hello,

Thanks for all these details.

I did not know that the C operators allow more operations than the Fortran ones... Nevertheless, I am doing all this inside “cs_user_boundary_conditions_ale.f90”, in order to impose the mesh deformation: I am not sure the analogous C routine exists?

To answer your question “Do you really need to store the face index?”: I want to calculate the force on several different portions of a given wall boundary, at the end of each time step. That’s why I build the matrix “face_dz(i,j)”: “i” is the specific portion of the boundary and “j” is the face index, which is gradually stored through a do loop. The idea is to build this matrix only once, at the first time step, and then just loop over the stored face indexes after each time step to obtain the force vector “force(i)”. So yes, I don’t see any other way to do what I want than using the face indexes. This works perfectly in serial mode.
Otherwise I would need to revise the entire “algorithm” and use purely geometrical conditions to define each boundary portion, instead of face indexes. But that would be less straightforward.

Thank you very much.
Best regards,
Daniele
Yvan Fournier
Posts: 4070
Joined: Mon Feb 20, 2012 3:25 pm

Re: issues with user routine parallelization

Post by Yvan Fournier »

Hello,

Yes, we are still missing a conversion of the ALE BCs to C. We are in the process of moving all BCs to C, using the "xdef" mechanism used for CDO, but this has been a background task so far, and we still need to finish it.

You can also call C from Fortran (using iso_c_binding if necessary), but it is not very user-friendly...

But if you need to calculate a force, you could define a field on the boundary faces (accessible by name both in Fortran and C), and do the force computations in cs_user_extra_operations.c at the end of the previous time step. That would probably be the most practical solution today.
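
A minimal sketch of that idea (the field name "zone_marker" and the placement are illustrative, not a fixed convention):

Code: Select all

/* At setup (e.g. in cs_user_parameters.c): create a scalar field on
   boundary faces; its values are allocated by the code after setup. */
cs_field_create("zone_marker",
                CS_FIELD_PROPERTY,
                CS_MESH_LOCATION_BOUNDARY_FACES,
                1,        /* dimension */
                false);   /* no previous time value */

/* Later (e.g. on the first call of cs_user_extra_operations.c):
   mark the selected faces once, then reuse the marker by name at
   each time step, in both C and Fortran. */
cs_field_t *f = cs_field_by_name("zone_marker");
cs_lnum_t  n_faces;
cs_lnum_t *face_ids;

BFT_MALLOC(face_ids, cs_glob_mesh->n_b_faces, cs_lnum_t);
cs_selector_get_b_face_list("CYL_WALL and y < 0.1 and y > 0.05",
                            &n_faces, face_ids);
for (cs_lnum_t i = 0; i < n_faces; i++)
  f->val[face_ids[i]] = 1.;
BFT_FREE(face_ids);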

Best regards,

Yvan
daniele
Posts: 148
Joined: Wed Feb 01, 2017 11:42 am

Re: issues with user routine parallelization

Post by daniele »

Hello,

Thank you. Yes, the option of defining separate "sub-domains" on the faces in the meshing process (is this what you mean by "field"?) would give direct access to the face portions, but it may not be very practical for a large number of fields.

If I write a condition like:

Code: Select all

write(cond,"(A,1pe12.5,A,1pe12.5)") "CYL_WALL and y < ",0.1," and y > ",0.05
call getfbr(cond, nlelt, lstelt)
If I then calculate the force on the selected faces, and use "parsom" to get the sum of the forces over all ranks, do you confirm that I obtain the correct total force?
Moreover, in order to select the nodes on the faces and impose their displacement, do you confirm that the same "call getfbr(cond,...,...)" above gives access to all nodes on all ranks?

If this is right, I may still manage to do what I want without having to define all the separate face domains.

Thank you very much for your help.
Best regards,
Daniele
Yvan Fournier
Posts: 4070
Joined: Mon Feb 20, 2012 3:25 pm

Re: issues with user routine parallelization

Post by Yvan Fournier »

Hello,

Yes, this should work. In C, you can also directly use zones defined through the GUI, but in Fortran, this is still the only way.

I recommend defining the string directly ("CYL_WALL and y < 0.1 and y > 0.05") unless you need to change the bound values in a loop (if you have many different selection criteria), as it is easier to check.

Since this type of criterion gives access to all local faces on a given rank, it can also be used to reach the nodes/vertices, using the boundary face -> vertices connectivity, so as to access all matching nodes on that rank.
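
For reference, the C counterpart of that approach would look like this sketch (b_force is a placeholder per-boundary-face array; the selector and parallel helpers are the standard ones):

Code: Select all

/* Select the matching boundary faces on this rank, accumulate the
   per-face contributions locally, then reduce over all ranks
   (cs_parall_sum is the C analogue of parsom). */

cs_lnum_t  n_faces;
cs_lnum_t *face_ids;

BFT_MALLOC(face_ids, cs_glob_mesh->n_b_faces, cs_lnum_t);
cs_selector_get_b_face_list("CYL_WALL and y < 0.1 and y > 0.05",
                            &n_faces, face_ids);

cs_real_t f_tot = 0.;
for (cs_lnum_t i = 0; i < n_faces; i++)
  f_tot += b_force[face_ids[i]];

cs_parall_sum(1, CS_REAL_TYPE, &f_tot);  /* total force over all ranks */
BFT_FREE(face_ids);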

Best regards,

Yvan