Something seems to be going wrong outside my code. After I set the default C compiler flags (CFLAGS) to "-ta=tesla", the compiler feedback shows that the following part of the code is accelerated:
_mat_vec_p_l_csr:
2239, Generating copyin(mc->val[:ms->n_rows+1],ms->row_index[:ms->n_rows+1],ms->col_id[:ms->row_index->],x[:ms->n_rows+1])
2241, Accelerator kernel generated
Generating Tesla code
2242, #pragma acc loop gang, worker(32) /* blockIdx.x threadIdx.y */
2249, #pragma acc loop vector(32) /* threadIdx.x */
Sum reduction generated for sii
2241, Generating copyout(y[:n_rows])
Generating copyin(ms[:1])
2249, Loop is parallelizable
2270, Generating copyin(mc->val[:ms->n_rows+1],ms->row_index[:ms->n_rows+1],ms->col_id[:ms->row_index->],x[:ms->n_rows+1])
2272, Accelerator kernel generated
Generating Tesla code
2273, #pragma acc loop gang, worker(32) /* blockIdx.x threadIdx.y */
2280, #pragma acc loop vector(32) /* threadIdx.x */
Sum reduction generated for sii
2272, Generating copyout(y[:n_rows])
Generating copyin(ms[:1])
2280, Loop is parallelizable
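For context, `_mat_vec_p_l_csr` computes the sparse matrix-vector product y = A·x for a matrix stored in CSR format. The loop nest below is a simplified, hypothetical sketch. It is not the actual code_saturne source, and the `csr_matrix_t` type and `csr_spmv` name are made up. It shows the kind of OpenACC-annotated code that yields feedback like the above: the row loop runs on gangs and workers, and the inner loop over a row's nonzeros runs on vector lanes with a sum reduction into `sii`.

```c
/* Hypothetical CSR matrix type; the real code_saturne structures differ. */
typedef struct {
    int n_rows;
    int n_cols;
    const int *row_index;  /* n_rows + 1 entries: start of each row */
    const int *col_id;     /* row_index[n_rows] entries: column of each nonzero */
    const double *val;     /* row_index[n_rows] entries: value of each nonzero */
} csr_matrix_t;

/* y = A * x. With an OpenACC compiler (e.g. -ta=tesla), the row loop is
   mapped to gangs/workers and the inner loop to vector lanes with a sum
   reduction. Without OpenACC, the pragmas are ignored and this is plain C. */
void csr_spmv(const csr_matrix_t *a, const double *restrict x, double *restrict y)
{
    const int n_rows = a->n_rows;
    const int n_cols = a->n_cols;
    const int *restrict row_index = a->row_index;
    const int *restrict col_id = a->col_id;
    const double *restrict val = a->val;
    const int nnz = row_index[n_rows];

#pragma acc parallel loop gang worker \
    copyin(row_index[0:n_rows+1], col_id[0:nnz], val[0:nnz], x[0:n_cols]) \
    copyout(y[0:n_rows])
    for (int i = 0; i < n_rows; i++) {
        double sii = 0.0;
#pragma acc loop vector reduction(+:sii)
        for (int j = row_index[i]; j < row_index[i + 1]; j++)
            sii += val[j] * x[col_id[j]];
        y[i] = sii;
    }
}
```

The copyin/copyout clauses correspond to the "Generating copyin/copyout" lines in the feedback; when the binary runs, they move data to the GPU, which is where the error below appears.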
However, when I run the program, I get the following error:
Current file: /home/huchuanwei/Desktop/saturne_build2.3/prod/dbg/src/alge/../../../../code_saturne-4.0.5/src/alge/cs_matrix.c
function: _mat_vec_p_l_csr
line: 2241
Current region was compiled for:
NVIDIA Tesla GPU sm30 sm35
Available accelerators:
device[1]: Native X86 (CURRENT DEVICE)
The accelerator does not match the profile for which this program was compiled
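To double-check what the runtime actually sees, a small check like the following could be called at startup. It is only a sketch. `visible_nvidia_devices` is a hypothetical helper name, and the check assumes the standard OpenACC runtime API from `openacc.h` (`acc_get_num_devices` with `acc_device_nvidia`). Given the message above, which lists only the Native X86 device, I would expect it to return 0 on this machine.

```c
#ifdef _OPENACC
#include <openacc.h>
#endif

/* Hypothetical helper: number of NVIDIA devices visible to the OpenACC
   runtime, or 0 if this file was not compiled with OpenACC enabled. */
int visible_nvidia_devices(void)
{
#ifdef _OPENACC
    return acc_get_num_devices(acc_device_nvidia);
#else
    return 0;
#endif
}
```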
Best Regards,
Jackie