Issue when using GPU accelerators

Posted: Wed Sep 28, 2016 12:53 pm
by msgsvc
Hello,

There seems to be something wrong outside my code. I changed the default C compiler flags (CFLAGS) to "-ta=tesla", and the compiler feedback shows that part of the code is accelerated, as follows.

_mat_vec_p_l_csr:
   2239, Generating copyin(mc->val[:ms->n_rows+1],ms->row_index[:ms->n_rows+1],ms->col_id[:ms->row_index->],x[:ms->n_rows+1])
   2241, Accelerator kernel generated
         Generating Tesla code
       2242, #pragma acc loop gang, worker(32) /* blockIdx.x threadIdx.y */
       2249, #pragma acc loop vector(32) /* threadIdx.x */
             Sum reduction generated for sii
   2241, Generating copyout(y[:n_rows])
         Generating copyin(ms[:1])
   2249, Loop is parallelizable
   2270, Generating copyin(mc->val[:ms->n_rows+1],ms->row_index[:ms->n_rows+1],ms->col_id[:ms->row_index->],x[:ms->n_rows+1])
   2272, Accelerator kernel generated
         Generating Tesla code
       2273, #pragma acc loop gang, worker(32) /* blockIdx.x threadIdx.y */
       2280, #pragma acc loop vector(32) /* threadIdx.x */
             Sum reduction generated for sii
   2272, Generating copyout(y[:n_rows])
         Generating copyin(ms[:1])
   2280, Loop is parallelizable

But when I run ./SaturneGUI, I get errors about accelerators. Here are the errors:

Current file: /home/huchuanwei/Desktop/saturne_build2.3/prod/dbg/src/alge/../../../../code_saturne-4.0.5/src/alge/cs_matrix.c
function: _mat_vec_p_l_csr
line: 2241
Current region was compiled for:
NVIDIA Tesla GPU sm30 sm35
Available accelerators:
device[1]: Native X86 (CURRENT DEVICE)
The accelerator does not match the profile for which this program was compiled

Is there an option to select which of the available accelerators is used?

Best Regards,

Jackie

Re: Issue when using GPU accelerators

Posted: Thu Sep 29, 2016 1:42 am
by Yvan Fournier
Hello,

As I wrote in previous messages, we have not worked on direct integration of GPU acceleration yet (except indirectly, through PETSc), so you are basically the first to experiment with this.

I doubt automatic acceleration will provide very fast code, but we welcome your feedback (did you replace the OpenMP pragmas with OpenACC ones, or is this automatic?).

I suspect the flags you provided are not sufficient (you might need to pass similar options to LDFLAGS, CXXFLAGS, and possibly FCFLAGS, or you might need additional flags).

You might also simply be missing initialization code.

Which toolset are you using? Check its documentation for both compiler and linker flags. You may also want to post your config.log.

Regards,

Yvan