MAGMA routines and CUDA kernels
Posted: Wed Nov 06, 2019 4:02 pm
Hello,
I have been using MAGMA(BLAS). I have been experiencing some bottlenecks in my code because some operations are performed on the CPU: basically, I perform some operations via MAGMA, bring the matrices to the host, back to the device, and so forth. I see two options to speed up my code: either use the pthreads library, or perform the operations on the GPU (they are simple comparisons/operations, extremely well suited to the CUDA framework). My question is whether I can access the arrays created by MAGMA routines from a CUDA kernel, perform some operations on the GPU, and then either call MAGMA routines from a CUDA kernel or download the arrays to the host and launch the routine there, thus avoiding the overhead of many simple operations and/or the device-host communication.
I am using C, and MAGMA compiled with BLAS. The pseudocode is:

Set up matrices on the CPU
for:
    matrix multiplications via MAGMA
    download to host
    check which coefficients are positive and which are negative
    depending on the result, multiply each column of the matrix by a scalar (a different scalar per column)
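For concreteness, here is a plain-C host reference of the per-column step above, assuming column-major storage (as MAGMA uses) with a contiguous matrix (lda == m). The sign test on the first coefficient of each column is a hypothetical stand-in for whatever "depending on the result" actually checks:

```c
#include <stddef.h>

/* Host-side reference for the post-GEMM step: A is m-by-n,
 * column-major, contiguous (lda == m).  Each column j is scaled
 * by s_pos[j] when its first coefficient is positive and by
 * s_neg[j] otherwise -- a placeholder criterion for illustration. */
void scale_columns(double *A, int m, int n,
                   const double *s_pos, const double *s_neg)
{
    for (int j = 0; j < n; ++j) {
        double *col = A + (size_t)j * m;
        double s = (col[0] > 0.0) ? s_pos[j] : s_neg[j];
        for (int i = 0; i < m; ++i)
            col[i] *= s;
    }
}

/* Example: for the 2x2 column-major matrix {1,2, -3,4} with
 * s_pos = {10,10} and s_neg = {0.5,0.5}, column 0 (first entry
 * positive) is scaled by 10 and column 1 (first entry negative)
 * by 0.5, giving {10,20, -1.5,2}. */
```

This is exactly the kind of embarrassingly parallel, per-column work I believe belongs on the GPU.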
As you can see, I keep downloading everything to the host after the matrix multiplications, but all the other operations are simple enough and completely suitable for a GPU and CUDA. I would be happy if either another matrix were created with the new coefficients or the original matrix were modified on the GPU. I don't know whether I can access the coefficients through those pointers from my own CUDA kernels, or how they behave.
I am using double precision routines.
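If it turns out (as I believe, but have not verified) that MAGMA device arrays are ordinary cudaMalloc'd double* buffers, the fix-up step could stay on the GPU with a small kernel like this sketch (m-by-n column-major matrix, one thread per column; the sign test on the first coefficient is again just a placeholder criterion, and d_s_pos/d_s_neg are hypothetical device arrays of per-column scalars):

```cuda
// Sketch only: assumes dA is a device pointer obtained from
// magma_dmalloc (i.e., ordinary CUDA global memory) holding an
// m-by-n column-major matrix with leading dimension ldda.
__global__ void scale_columns_kernel(double *dA, int m, int ldda, int n,
                                     const double *d_s_pos,
                                     const double *d_s_neg)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per column
    if (j >= n) return;

    double *col = dA + (size_t)j * ldda;
    double s = (col[0] > 0.0) ? d_s_pos[j] : d_s_neg[j];  // placeholder test
    for (int i = 0; i < m; ++i)
        col[i] *= s;
}

// Host side, between MAGMA calls (no transfer to the host needed):
//   /* magma_dgemm(...);  result lands in dA */
//   int threads = 128;
//   int blocks  = (n + threads - 1) / threads;
//   scale_columns_kernel<<<blocks, threads>>>(dA, m, ldda, n, d_sp, d_sn);
//   /* next MAGMA call reuses dA directly */
```

If this works, the loop body never touches the host at all, which is what I am hoping for.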
Hope my explanation is not a mess. Thanks for your time!