Parallel computation of loops

Questions and remarks about code_saturne usage
Mohammad
Posts: 79
Joined: Thu Oct 25, 2018 12:18 pm

Parallel computation of loops

Post by Mohammad » Sun Nov 10, 2019 12:55 am

Hello,

I have the following loop in cs_user_extra_operations.c file:

Code: Select all

for (cs_lnum_t i = 0; i < n_faces; i++) {
  face_id = face_list[i];
  iel = b_face_cells[face_id];
  Tau_Wall_Mean += Tau_wall[iel];
}

FILE *f1 = NULL;
f1 = fopen("MEAN_SHEAR.dat", "a");
fprintf(f1, "%i\t%f\n", ntcabs, Tau_Wall_Mean/n_faces);
fclose(f1);
This code calculates a variable (Tau_Wall_Mean) in a loop and then writes it to a file (MEAN_SHEAR.dat). I use 8 cores for processing.

When I open the exported file, it gives me 8 different values at each time step, which means that each core is computing (maybe a part of) the loop separately and giving a different value; the code does not collect the results from the cores.

How can I force the loop to collect the results from all cores and give me just one number?

Does this problem occur only for loops in which a variable is summed with its previous value?

CS version: 5.0.9

Regards,
Mohammad

Yvan Fournier
Posts: 3049
Joined: Mon Feb 20, 2012 3:25 pm

Re: Parallel computation of loops

Post by Yvan Fournier » Sun Nov 10, 2019 6:50 pm

Hello,

Are you using MPI or OpenMP parallelism?

There are user examples handling parallelism at least for MPI, so check the cs_user_extra_operations variants.

Regards,

Yvan

Mohammad
Posts: 79
Joined: Thu Oct 25, 2018 12:18 pm

Re: Parallel computation of loops

Post by Mohammad » Mon Nov 11, 2019 11:02 pm

Hello and thank you!

I use MPI.
I checked all those files; it's a bit confusing. Some of them use the following condition to write only one output per time step, so I used it and it worked:

Code: Select all

if (cs_glob_rank_id <= 0)
I don't know what cs_glob_rank_id is, and it doesn't have a definition in the Doxygen documentation.
There's just a comment above one of the examples which says:
/* Only process of rank 0 (parallel) or -1 (scalar) writes to this file. */
Some examples also use the following call at the end of a summing loop, which seems to sum the values over all processors, so I added it to my code:

Code: Select all

cs_parall_sum(1, CS_FLOAT, &Tau_Wall_Mean);
When I use those codes, it gives me different output averages for different numbers of cores! For example, with 8 cores the average of my outputs becomes 0.8, with 4 cores it becomes 0.2, and with a single core it's -0.0002!

It's really confusing!

My modified code is now:

Code: Select all

for (cs_lnum_t i = 0; i < n_faces; i++) {
  face_id = face_list[i];
  iel = b_face_cells[face_id];
  Tau_Wall_Mean += Tau_wall[iel];
}

cs_parall_sum(1, CS_FLOAT, &Tau_Wall_Mean);

if (cs_glob_rank_id <= 0) {
  FILE *f1 = NULL;
  f1 = fopen("MEAN_SHEAR.dat", "a");
  fprintf(f1, "%i\t%f\n", ntcabs, Tau_Wall_Mean/n_faces);
  fclose(f1);
}
Regards,

Mohammad

Luciano Garelli
Posts: 239
Joined: Fri Dec 04, 2015 1:42 pm

Re: Parallel computation of loops

Post by Luciano Garelli » Tue Nov 12, 2019 2:02 pm

Hello,

cs_glob_rank_id gives you the rank of the MPI process in a parallel run, taking values 0 <= cs_glob_rank_id < number of processes. In a serial run, cs_glob_rank_id = -1, so only the process of rank 0 (parallel) or -1 (serial) writes to the file.


In your code, n_faces gives you the local number of faces on each process, so if you need an average over the faces you also have to do a parallel sum of n_faces to get the total number of faces, then divide Tau_Wall_Mean by that total after the loop, before writing. If you only need the sum Tau_Wall_Mean, just write it without dividing by n_faces.

Regards,

Luciano

Mohammad
Posts: 79
Joined: Thu Oct 25, 2018 12:18 pm

Re: Parallel computation of loops

Post by Mohammad » Tue Nov 12, 2019 2:37 pm

Hello,

Thank you very much Luciano, your help solved the problem.

Regards,
Mohammad
