Dear developers,
Could you please help me with this question?
My CPU resources are currently limited, but I have some GPUs available, so I would like to know whether I can take advantage of them.
I have seen some posts saying that using GPUs may not be a good idea, but they are a few years old. Do you have any recent experience with this? Is it easy to achieve much higher performance on GPU, or would it require a lot of work on my side? I run LES with meshes of about 10 to 20 million nodes.
Thank you very much,
Ruonan
Run Code_Saturne on GPU
-
- Posts: 4220
- Joined: Mon Feb 20, 2012 3:25 pm
Re: Run Code_Saturne on GPU
Hello,
As of today, only parts of some linear solvers can run on GPU, which is not enough to give you a performance increase.
We are working on GPU support, but it is not ready yet...
Best regards,
Yvan
Re: Run Code_Saturne on GPU
Hello Yvan,
Thank you very much for letting me know. I will stick with CPU for the moment.
Best regards,
Ruonan
-
- Posts: 1
- Joined: Tue May 06, 2025 10:17 am
Re: Run Code_Saturne on GPU
Hi everyone,
I noticed the "--enable-cuda-offload" and "--enable-cuda" flags in the documentation, but I'm curious about additional details regarding CUDA acceleration compared to CPU-only processing.
Specifically, I'm interested in:
- Performance improvements you've experienced
- Implementation challenges or tips
- Configuration settings that made a difference
- Any benchmarks comparing CPU vs GPU acceleration
Thanks in advance!
-
- Posts: 4220
- Joined: Mon Feb 20, 2012 3:25 pm
Re: Run Code_Saturne on GPU
Hello,
This is still very much work in progress, so performance is still in flux, and we will update some benchmarks soon.
Solvers such as simple iterative linear solvers and gradient reconstructions can run faster on a GPU than on a CPU, but we still lose performance to memory transfers, because some scattered operations run on CPU only, so we are moving more and more loops to GPU kernels. Speeding up the multigrid solvers is also difficult, though on a single MPI rank at least, we seem a bit faster than HYPRE, at least for our main benchmark case. The code also works well with MPI, but we have not run as many comparisons yet.
I would like to write an article detailing our GPU work and choices, but have been too busy so far with work on the code (both CPU and GPU aspects) to get started...
If you run code_saturne on a GPU using MPI with more than one MPI rank per GPU, using MPS (NVIDIA Multi-Process Service) is essential; otherwise, performance is strongly degraded. Apart from that, you need a large enough mesh (at least a few hundred thousand cells per GPU) for the GPU performance to be worthwhile.
When testing with HYPRE for performance comparisons, building HYPRE with a memory pool is important. Without that (we used the built-in option, not Umpire), performance is degraded by a factor of 10.
We have not yet ported the code to AMD GPUs, but most of the GPU code uses C++ "parallel for" templates with lambda functions (similar in approach to Kokkos, though much more specialized and limited, with OpenMP, CUDA, and SYCL back-ends), to make future porting easier.
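For readers unfamiliar with this style, here is a minimal sketch of what a "parallel for" template dispatching a lambda to different back-ends can look like. It is not the code_saturne implementation: the names `serial_backend`, `openmp_backend`, `parallel_for` and `axpy` are hypothetical, and only CPU back-ends are shown (a CUDA or SYCL back-end would be another overload that launches the same lambda on the device).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical back-end tags (illustrative names, not code_saturne's API).
struct serial_backend {};
struct openmp_backend {};

// Serial back-end: a plain loop calling the lambda for each index.
template <typename F>
void parallel_for(serial_backend, std::size_t n, F&& f) {
  for (std::size_t i = 0; i < n; ++i)
    f(i);
}

// OpenMP back-end: same loop body; the pragma is simply ignored
// when the code is compiled without OpenMP support.
template <typename F>
void parallel_for(openmp_backend, std::size_t n, F&& f) {
  const long long n_ll = static_cast<long long>(n);
  #pragma omp parallel for
  for (long long i = 0; i < n_ll; ++i)
    f(static_cast<std::size_t>(i));
}

// The loop body is written once as a lambda: y <- a*x + y.
// Raw pointers are captured by value, as would be needed on a device.
template <typename Backend>
void axpy(Backend be, double a,
          const std::vector<double>& x, std::vector<double>& y) {
  assert(x.size() == y.size());
  const double* xp = x.data();
  double* yp = y.data();
  parallel_for(be, y.size(), [=](std::size_t i) { yp[i] += a * xp[i]; });
}
```

Calling `axpy(serial_backend{}, 2.0, x, y)` or `axpy(openmp_backend{}, 2.0, x, y)` runs the same lambda through a different back-end, which is the property that makes porting to a new device mostly a matter of adding one more overload.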
In any case, we have a few algorithms specifically adapted to running on a GPU, but for most operators, our priority is reducing memory transfers and fusing simple kernels where possible, with a common CPU/GPU codebase, as this is the only "sustainable" option for us given our team size.
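To illustrate what fusing simple kernels means, here is a minimal CPU-only sketch, not taken from code_saturne. The function names and the operation itself (z = (a*x + y) * w) are made up for the example. On a GPU, each loop of the unfused version would be a separate kernel launch, and the temporary array would be written to and read back from device memory; the fused version makes one pass and needs no temporary.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Unfused: two passes over memory and a temporary array.
// Each loop stands in for one simple kernel (hypothetical example).
std::vector<double> scale_add_multiply_unfused(double a,
                                               const std::vector<double>& x,
                                               const std::vector<double>& y,
                                               const std::vector<double>& w) {
  assert(x.size() == y.size() && x.size() == w.size());
  const std::size_t n = x.size();
  std::vector<double> tmp(n), z(n);
  for (std::size_t i = 0; i < n; ++i)  // "kernel" 1: tmp = a*x + y
    tmp[i] = a * x[i] + y[i];
  for (std::size_t i = 0; i < n; ++i)  // "kernel" 2: z = tmp * w
    z[i] = tmp[i] * w[i];
  return z;
}

// Fused: a single pass, no temporary array, so less memory traffic.
std::vector<double> scale_add_multiply_fused(double a,
                                             const std::vector<double>& x,
                                             const std::vector<double>& y,
                                             const std::vector<double>& w) {
  assert(x.size() == y.size() && x.size() == w.size());
  const std::size_t n = x.size();
  std::vector<double> z(n);
  for (std::size_t i = 0; i < n; ++i)  // one "kernel": z = (a*x + y) * w
    z[i] = (a * x[i] + y[i]) * w[i];
  return z;
}
```

Both functions return the same values; the difference is only in how many times the arrays are traversed, which is what matters for memory-bound operations like these.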
Best regards,
Yvan