Run Code_Saturne on GPU

Post by **Ruonan** » Mon Mar 07, 2022 9:48 pm

Dear developers,

Could you please help me with this question?

Currently my CPU resources are limited, but I have some GPUs available, so I'd like to know if I can take advantage of them.

I have seen some posts saying that running on GPU may not be a good idea, but they are a few years old. Do you have any recent experience with this? Is it easy to achieve much higher performance on GPU? Does it work out of the box, or would I need to do a lot of work on my side? I run LES on meshes of about 10 to 20 million nodes.

Thank you very much,
Ruonan

Post by **Yvan Fournier** » Tue Mar 08, 2022 9:31 am

Hello,

As of today, only a part of some linear solvers may be run on GPU, which is not enough to provide you an increase in performance.

We are working on GPU support, but it is not ready yet...

Best regards,

Yvan

Post by **Ruonan** » Tue Mar 08, 2022 11:17 am

Hello Yvan,

Thank you very much for letting me know. I will stick with CPU for now.

Best regards,
Ruonan

Post by **alberto.finardi** » Tue May 06, 2025 10:26 am

Hi everyone,

I noticed the "--enable-cuda-offload" and "--enable-cuda" flags in the documentation, but I'm curious about additional details regarding CUDA acceleration compared to CPU-only processing.

Specifically, I'm interested in:

- Performance improvements you've experienced
- Implementation challenges or tips
- Configuration settings that made a difference
- Any benchmarks comparing CPU vs GPU acceleration

Has anyone seen significant speed improvements when enabling CUDA? Any insights would be greatly appreciated!

Thanks in advance!

Post by **Yvan Fournier** » Thu May 08, 2025 1:39 am

Hello,

This is still very much work in progress, so performance is still in flux, and we will update some benchmarks soon.

Operations such as simple iterative linear solvers and gradient reconstructions can run faster on a GPU than on a CPU, but we still lose performance to memory transfers because some scattered operations run on the CPU only, so we are porting more and more loops to GPU kernels. Speeding up the multigrid solvers is also difficult, though on a single MPI rank at least, we seem a bit faster than HYPRE for our main benchmark case. The code also works well with MPI, but we have not run as many comparisons yet.

I would like to write an article detailing our GPU work and choices, but have been too busy so far with work on the code (both CPU and GPU aspects) to get started...

If you run code_saturne on a GPU with more than one MPI rank per GPU, using MPS (NVIDIA Multi-Process Service) is essential; otherwise, performance is strongly degraded. You also need a large enough mesh (at least a few hundred thousand cells per GPU) for the GPU performance to be worthwhile.
When testing with HYPRE for performance comparisons, building HYPRE with a memory pool is important. Without that (we used the built-in option, not Umpire), performance is degraded by a factor of 10.
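
As a rough illustration of the two rules of thumb above (MPS when several ranks share a GPU, and enough cells per GPU), here is a minimal C++ sketch. The `GpuRunCheck` struct, the `check_gpu_run` function, and the default threshold of 300000 cells are hypothetical placeholders chosen for the example; they are not part of code_saturne, and the actual break-even mesh size depends on the case and hardware.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of code_saturne: summarizes a planned run.
struct GpuRunCheck {
  std::size_t cells_per_gpu;   // average number of cells handled by each GPU
  bool        needs_mps;       // true if more than one MPI rank shares a GPU
  bool        mesh_large_enough;
};

// min_cells_per_gpu is an assumed stand-in for "a few hundred thousand cells".
GpuRunCheck
check_gpu_run(std::size_t n_cells,
              int         n_ranks,
              int         n_gpus,
              std::size_t min_cells_per_gpu = 300000)
{
  assert(n_ranks > 0 && n_gpus > 0);

  GpuRunCheck c;
  c.cells_per_gpu = n_cells / static_cast<std::size_t>(n_gpus);
  c.needs_mps = (n_ranks > n_gpus);
  c.mesh_large_enough = (c.cells_per_gpu >= min_cells_per_gpu);
  return c;
}
```

For example, `check_gpu_run(15000000, 8, 2)` reports 7.5 million cells per GPU and flags that MPS is needed, since 8 ranks share 2 GPUs.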

We have not yet ported the code to AMD GPUs, but most of the GPU code uses C++ "parallel for" templates with lambda functions (similar in approach to Kokkos, though much more specialized and limited, with OpenMP, CUDA, and SYCL back-ends) to make future porting easier.
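
To give an idea of the "parallel for" with lambda approach, here is a minimal sketch with a host back-end only. The names `parallel_for` and `axpy` are hypothetical and this is not the actual code_saturne dispatch API; a CUDA or SYCL back-end would launch the same lambda in a device kernel instead of a host loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dispatcher (not the code_saturne API): calls f(i) for every
// i in [0, n). Uses OpenMP threads when compiled with OpenMP enabled, and a
// plain serial loop otherwise.
template <typename F>
void
parallel_for(std::size_t n, F &&f)
{
#if defined(_OPENMP)
  #pragma omp parallel for
#endif
  for (std::size_t i = 0; i < n; i++)
    f(i);
}

// The loop body is written once as a lambda: y <- a*x + y.
void
axpy(double a, const std::vector<double> &x, std::vector<double> &y)
{
  assert(x.size() == y.size());

  // Capture raw pointers by value, as a device back-end would require.
  const double *xp = x.data();
  double *yp = y.data();

  parallel_for(x.size(), [=](std::size_t i) { yp[i] += a * xp[i]; });
}
```

The point of this design is that the numerical kernel (the lambda) stays the same across back-ends, and only the dispatcher changes.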

In any case, we have a few algorithms specifically adapted to running on a GPU, but for most operators, our priority is reducing memory transfers and fusing simple kernels where possible, with a common CPU/GPU codebase, as this is the only "sustainable" option for us given our team size.
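
As a simple illustration of what fusing kernels means, the sketch below compares two passes over an array with a single fused pass. It is written as plain host C++ with hypothetical function names and is not taken from code_saturne; on a GPU, each loop would correspond to one kernel launch, so the fused version saves a launch and a second read of `y` from memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Unfused: one loop updates y <- a*x + y, a second loop re-reads y to
// compute the sum of squares.
double
axpy_then_norm2(double a, const std::vector<double> &x, std::vector<double> &y)
{
  assert(x.size() == y.size());

  for (std::size_t i = 0; i < y.size(); i++)
    y[i] += a * x[i];

  double s = 0.0;
  for (std::size_t i = 0; i < y.size(); i++)
    s += y[i] * y[i];

  return s;
}

// Fused: the update and the sum of squares are done in a single pass, so
// each y[i] is read and written only once.
double
axpy_norm2_fused(double a, const std::vector<double> &x, std::vector<double> &y)
{
  assert(x.size() == y.size());

  double s = 0.0;
  for (std::size_t i = 0; i < y.size(); i++) {
    const double yi = y[i] + a * x[i];
    y[i] = yi;
    s += yi * yi;
  }

  return s;
}
```

Both functions leave `y` in the same state and return the same value; only the number of passes over memory differs.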

Best regards,

Yvan

code_saturne User's Forum
