Problem to model solidification of a binary alloy with CDO approach

MatthieuM · Post by **MatthieuM** » Mon Nov 27, 2023 12:38 pm

Hello Yvan,

Thank you for these clarifications. I didn't expect the gap in memory usage between the two approaches to be that wide.

Regarding the convergence of my simulations, I noticed a strong impact of the following parameter:

    cs_navsto_param_set(nsp, CS_NSKEY_GD_SCALE_COEF, "5e3");

which seems to be linked to an augmentation of the linear system. I still have to read the recommended literature on this (Milani's PhD thesis seems to address the subject), but for now I have no basis for choosing a "relevant value" for this parameter. I only observed that convergence is much better with this coefficient in the range [1000; 5000] than with 0 or its default value of 1.

Best regards,
Matthieu

Jerome B. · Post by **Jerome B.** » Thu Nov 30, 2023 11:45 am

Hello Matthieu,

As Yvan said, a direct solver consumes much more memory than an iterative solver. Your log files are clear: the elapsed time for the full computation is 29791 s, of which 29679 s are spent solving the NavSto system.

Moreover, the CDO part of code_saturne differs from the FV part. The Navier-Stokes system to solve is a coupled (velocity, pressure) system. It is solved with an augmented Lagrangian Uzawa algorithm (please refer to https://hal.science/hal-04087358/document for more details), which enlarges the stencil of the discretization by adding a grad-div contribution.
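
For orientation, here is the generic textbook form of an augmented Lagrangian Uzawa iteration for a saddle-point system. The notation is an assumption and not taken from code_saturne: A is the velocity block, B the discrete divergence, W a weighting matrix on the pressure space, f and g the right-hand sides (g = 0 for an incompressible flow), and γ the augmentation parameter. The CS_NSKEY_GD_SCALE_COEF key presumably controls or scales γ, but the exact definition should be checked in the reference above.

```latex
\begin{aligned}
\left(A + \gamma\, B^{T} W^{-1} B\right) u^{k+1} &= f - B^{T} p^{k} + \gamma\, B^{T} W^{-1} g, \\
p^{k+1} &= p^{k} + \gamma\, W^{-1} \left(B\, u^{k+1} - g\right).
\end{aligned}
```

The term γ Bᵀ W⁻¹ B is the grad-div contribution: it couples the velocity components and widens the stencil. In this generic setting, a larger γ speeds up the convergence of the outer iteration but makes the velocity block harder to solve iteratively, which is one reason a direct solver such as MUMPS is used for it.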

Since the velocity unknowns (all components) are located at faces, the velocity block solved by MUMPS is of size 3*3*n_cells for a 3D Cartesian mesh. For a 3D mesh built from a one-layer extrusion, it is worse: 3*4*n_cells. So the system to solve with MUMPS is of size 3M.
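
A back-of-the-envelope version of this estimate can be sketched as follows. The helper name `velocity_block_size` is hypothetical (it is not part of code_saturne), and the faces-per-cell ratios are simply the ones quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Rough estimate of the number of velocity unknowns in a face-based scheme:
 * 3 components per face, times the average number of faces per cell.
 * faces_per_cell = 3 for a 3D Cartesian mesh and 4 for a one-layer
 * extrusion (ratios quoted in this thread). Hypothetical helper. */
size_t
velocity_block_size(size_t n_cells, size_t faces_per_cell)
{
  assert(faces_per_cell > 0);
  return 3 * faces_per_cell * n_cells;
}
```

With an illustrative count of 250000 cells, the one-layer extrusion ratio gives 3000000 unknowns, against 2250000 for a fully 3D Cartesian mesh with the same number of cells.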

To speed up the calculation (but do not expect miracles), here are some ideas:
(a) Use 1 MPI rank and 8 OpenMP threads (or 2 MPI ranks and 4 OpenMP threads), since MUMPS does not scale as well as code_saturne.
(b) Compile MUMPS with an optimized BLAS (many operations rely on BLAS3, so this can bring some improvement).
(c) Optimize the MUMPS options: choose the renumbering algorithm (this can slow down the analysis step but speed up the factorization step, which is the most time-consuming one); for your kind of grid, AMD or QAMD could be better than the default choice. Also use analysis by block (this is a vector-valued system).

You can set any available MUMPS option using the following function:

#if defined(HAVE_MUMPS)
/*----------------------------------------------------------------------------*/
/*!
 * \brief Function pointer for advanced user settings of a MUMPS solver.
 *        This function is called two times during the setup stage.
 *        1. Before the analysis step
 *        2. Before the factorization step
 *
 * One can recover the MUMPS step through the "job" member.
 * MUMPS_JOB_ANALYSIS or MUMPS_JOB_FACTORIZATION
 *
 * Note: if the context pointer is non-NULL, it must point to valid data
 * when the selection function is called so that structure should
 * not be temporary (i.e. local);
 *
 * \param[in]      slesp    pointer to the related cs_param_sles_t structure
 * \param[in, out] context  pointer to optional (untyped) value or structure
 * \param[in, out] pmumps   pointer to DMUMPS_STRUC_C or SMUMPS_STRUC_C struct.
 */
/*----------------------------------------------------------------------------*/

void
cs_user_sles_mumps_hook(const cs_param_sles_t   *slesp,
                        void                    *context,
                        void                    *pmumps)
{
  CS_UNUSED(slesp);
  CS_UNUSED(context);

  DMUMPS_STRUC_C  *mumps = pmumps;
  assert(mumps != NULL);

  /* Choose the way numbering is performed inside MUMPS.
   * This option may have a strong effect on the elapsed time
   * 0: AMD
   * 3: Scotch (need a MUMPS library compiled with Scotch)
   * 4: PORD
   * 5: METIS (need a MUMPS library compiled with METIS)
   * 6: QAMD
   * 7: automatic choice done by MUMPS
   */

  if (mumps != NULL)
    mumps->ICNTL(7) = 0;

  /* Adapt any option from the MUMPS user's guide following the previous example */
}
#endif  /* HAVE_MUMPS */
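
As a complement, here is a minimal sketch of how the QAMD ordering and the analysis by block mentioned in (c) might be requested. It uses a mock structure so that it compiles without MUMPS; `mock_mumps_t` and `set_ordering_and_block_analysis` are hypothetical names. The use of ICNTL(15) = -3 for blocks of fixed size 3 is an assumption based on the MUMPS user's guide (it requires a recent MUMPS version, 5.3 or later as far as I know) and should be checked against the guide of the installed version.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the MUMPS control array so that this sketch compiles without
 * MUMPS. In a real hook, use DMUMPS_STRUC_C from dmumps_c.h instead. */
typedef struct {
  int icntl[60];
} mock_mumps_t;

/* 1-based accessor, same convention as in the hook above */
#define ICNTL(i) icntl[(i) - 1]

void
set_ordering_and_block_analysis(mock_mumps_t *mumps)
{
  assert(mumps != NULL);

  mumps->ICNTL(7) = 6;    /* QAMD ordering */
  mumps->ICNTL(15) = -3;  /* assumed: analysis by blocks of fixed size 3 */
}
```

In the actual hook, the same two assignments would go in the body of cs_user_sles_mumps_hook, next to the existing ICNTL(7) setting.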

One more thing (only possible for a 3D mesh built from a one-layer extrusion): it is possible to "remove" the extruded boundary faces from the linear system. Here is an example with boundary faces tagged "Z0" or "Z1" (adapt this example and add it to a cs_user_mesh.c file):

/*----------------------------------------------------------------------------*/
/*!
 * \brief Apply partial modifications to the mesh after the preprocessing
 *        and initial postprocessing mesh building stage.
 *
 * \param[in,out] mesh  pointer to a cs_mesh_t structure
 * \param[in,out] mesh_quantities pointer to a cs_mesh_quantities_t structure
*/
/*----------------------------------------------------------------------------*/

void
cs_user_mesh_modify_partial(cs_mesh_t             *mesh,
                            cs_mesh_quantities_t  *mesh_quantities)
{
  cs_lnum_t   n_faces = 0;
  cs_lnum_t  *face_ids = NULL;

  BFT_MALLOC(face_ids, mesh->n_b_faces, cs_lnum_t);

  cs_selector_get_b_face_list("Z0 or Z1", &n_faces, face_ids);

  cs_preprocess_mesh_selected_b_faces_ignore(mesh,
                                             mesh_quantities,
                                             n_faces,
                                             face_ids);

  BFT_FREE(face_ids);
}

When I apply all these options and techniques, I can achieve a speed-up of up to a factor of 5 or 6.

For the second question, it seems that the non-linear algorithm (Picard algorithm) does not converge: the maximum number of iterations is reached.
Could you please send me your settings?
I think a first test is to switch to a linearized algorithm and see what happens. It may also help to reduce the time step for the first iterations. I can adapt your settings to do this.
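
To illustrate the idea of a reduced time step during the first iterations, here is a minimal sketch of a geometric ramp. The helper `ramped_time_step` is hypothetical and not a code_saturne function, and the start value, target value and growth factor are purely illustrative.

```c
#include <assert.h>

/* Hypothetical helper (not a code_saturne function): time step to use at
 * iteration `it` (0-based). It starts at dt_start and is multiplied by
 * `growth` at each iteration until it reaches dt_target. */
double
ramped_time_step(int it, double dt_start, double dt_target, double growth)
{
  assert(it >= 0 && dt_start > 0.0 && dt_target >= dt_start && growth > 1.0);

  double dt = dt_start;
  for (int i = 0; i < it && dt < dt_target; i++)
    dt *= growth;

  /* Never exceed the target time step */
  return (dt < dt_target) ? dt : dt_target;
}
```

With dt_start = 1e-3, dt_target = 1e-1 and growth = 2, the time step doubles at each iteration and is capped at the target from the seventh iteration onwards.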

Best regards,
Jerome

MatthieuM · Post by **MatthieuM** » Tue Dec 05, 2023 2:53 pm

Hello Jérôme,

Thank you for your detailed reply and all this information. I'll take the time to read the reference you suggest and to test the modifications you propose.

Concerning the diverging simulation, here are the settings used (almost the same as the last files you sent me, except for an initial solute concentration); the mesh was previously joined (Mesh_protoIngot_2d_17355elts.med). As I said in my last reply, I noticed a significant improvement in convergence with "high values" of the augmentation parameter (default value is 1):

cs_navsto_param_set(nsp, CS_NSKEY_GD_SCALE_COEF, "5e3");

But the tested values are arbitrary, since I have no basis for choosing a "relevant value" for this parameter, and I have not yet tested with thermal exchange and solidification. As you suggest, I'll also run tests with a smaller time step and with a linearized algorithm.

I will give you feedback,
Best regards,
Matthieu

code_saturne User's Forum
