I am calculating some values on the GPU which form one row of the matrix. At the moment I copy them back a row at a time to the matrix on the CPU and then copy the whole matrix back to the GPU. This is clearly wasteful.
The device pointers are defined as in testing_dgetrf_gpu_f.f in RC4:
Code:
      real, dimension(4) :: devptrA, devptrB
Code:
      call cublas_get_matrix(n, 1, size_of_elt, devptrD, n,
     $     G(1,jrow),n)
Code:
!---- devPtrA = G
      call cublas_set_matrix(n, n, size_of_elt, G, ldda, devptrA, ldda)
Code:
      call cublas_dcopy(n,devptrD,1,devptrXXX,1)
If I can crack this I can save two complete matrix transfers and the memory of the array on the CPU.
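For concreteness, the address arithmetic I am after can be sketched on the host in plain C. This is only an illustration under stated assumptions, not MAGMA or CUBLAS code: the helper names `column_ptr` and `copy_into_column` are hypothetical, `memcpy` stands in for the device-to-device `cublas_dcopy`, and the matrix is assumed to be column-major with leading dimension `ldda`, as in the Fortran calls above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: address of element (1, j) of a column-major
 * matrix with leading dimension ldda. j is 1-based, as in Fortran. */
static double *column_ptr(double *a, size_t ldda, size_t j)
{
    return a + (j - 1) * ldda;
}

/* Hypothetical helper: copy n contiguous values from d into column j
 * of a. memcpy is a host-side stand-in for a device-to-device copy;
 * on the GPU the same offset pointer would be the copy destination. */
static void copy_into_column(double *a, size_t ldda, size_t j,
                             const double *d, size_t n)
{
    memcpy(column_ptr(a, ldda, j), d, n * sizeof *d);
}
```

Calling `copy_into_column(a, ldda, jrow, d, n)` writes the `n` new values straight into the slice starting at `G(1,jrow)`. That offset pointer is what `devptrXXX` would have to be, and I do not see how to form it from a device pointer declared as `real, dimension(4)`.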
It would also help to have some explanation of the design decision to change the type of these pointers from RC3 to RC4.
Please help if you can.
Thanks
John