MAGMA SVD implementation on GPUs?

edmundberry · Post by **edmundberry** » Tue Sep 29, 2015 7:17 pm

Hello MAGMA experts,

What is the status of MAGMA's SVD implementation regarding GPUs?

- Does MAGMA use GPUs to implement SVD? It looks like MAGMA's dgesvd function only accepts CPU pointers as arguments (or maybe I have misinterpreted the documentation):
http://icl.cs.utk.edu/projectsfiles/mag ... river.html

- If I have a device pointer (pointer to a matrix on a GPU device), do I have to copy it back to the host (CPU) before I can use MAGMA's SVD functionality?

- Regardless, is there example code for dgesvd? I couldn't find any example code online.

Thank you!

Best,
Edmund

mgates3 · Post by **mgates3** » Wed Sep 30, 2015 3:06 pm

Yes, it is GPU accelerated. The input matrix is given in CPU memory, but internally gets copied to the GPU for computation.

Unfortunately, this means currently if your matrix is on the GPU, you have to copy it to the CPU before calling magma_dgesvd.

There are examples in magma/testing/testing_dgesvd.cpp and testing_dgesdd.cpp. I recommend using dgesdd (divide and conquer) instead of dgesvd (QR iteration), as dgesdd is faster in both MAGMA and LAPACK when computing singular vectors.

-mark

RGomez · Post by **RGomez** » Wed Aug 10, 2016 4:18 pm

Hello,

I found it more appropriate to continue the discussion here rather than open a new topic. In my case, I am using magma_dgesdd, and it works when I allocate all the arguments with "magma_malloc_cpu", but it fails if I have them on the GPU. My question is which arguments (if any) should be passed from the GPU to get the best running time.

Originally my A is on the CPU, but I was trying to send it to the GPU, along with the workspace variable "work", before calling magma_dgesdd.

Otherwise, it seems strange that the calculations are done on the GPU if the workspace is on the CPU. Could you give a heuristic explanation of how this works?

Thanks a lot!

mgates3 · Post by **mgates3** » Thu Aug 11, 2016 8:42 am

magma_dgesdd takes all its arguments on the CPU. It is a drop-in replacement for LAPACK's dgesdd.

MAGMA is a hybrid CPU + GPU library. Some of its calculations are done on the CPU, so it needs workspace there. It also relies on some routines from LAPACK, such as dbdsdc (divide-and-conquer), which need CPU workspace. MAGMA internally allocates additional memory on the GPU.

Generally, MAGMA routines with no suffix take their input arguments in CPU memory, while routines with a _gpu suffix take (at least some of) their arguments in GPU memory. Generally, variables prefixed with "d" are on the GPU device, such as "dA" (on GPU) vs. "A" (on CPU). See the documentation:

http://icl.cs.utk.edu/projectsfiles/mag ... tines.html
http://icl.cs.utk.edu/projectsfiles/mag ... ables.html

-mark

RGomez · Post by **RGomez** » Thu Aug 11, 2016 1:43 pm

OK, I guess I was misinterpreting the documentation!

Thank you

cdeterman · Post by **cdeterman** » Wed Aug 24, 2016 3:25 pm

mgates3 wrote: Yes, it is GPU accelerated. The input matrix is given in CPU memory, but internally gets copied to the GPU for computation.

Unfortunately, this means currently if your matrix is on the GPU, you have to copy it to the CPU before calling magma_dgesvd.

-mark

If the input matrix is internally copied to the GPU for computation, wouldn't it be relatively simple to create an additional function with a _gpu suffix that takes an existing GPU matrix and omits the copy? Or are the internals more complex, such that they require the matrix to be on the CPU at different times?

mgates3 · Post by **mgates3** » Wed Aug 24, 2016 8:20 pm

The SVD is a rather complex code. It doesn't just allocate dA on the GPU, copy A to dA, then do work. It makes extensive use of other routines like geqrf, gebrd, gelqf, unmqr, ungqr, bdsdc, etc. We would need to replace all of those with _gpu variants. Some already exist; some do not. Some do not even have GPU-accelerated implementations yet (like bdsdc).

So it's a good goal to have, and we may eventually get there, but it isn't trivial.

-mark

RGomez · Post by **RGomez** » Thu Aug 25, 2016 1:08 pm

Yes, it's highly non-trivial. Right now I'm running tests for MAGMA's dgesdd, and it's surprising to find that it is slower than a regular Maple (LAPACK-based) computation. I wonder if I'm doing something wrong, or if the SVD code is not really optimized for GPUs yet.

~/magma-2.0.2/testing$ ./testing_dgesdd
% MAGMA 2.0.2  compiled for CUDA capability >= 2.0, 32-bit magma_int_t, 64-bit pointer.
% CUDA runtime 7050, driver 7050. OpenMP threads 8. MKL 11.2.3, MKL threads 4. 
% device 0: GeForce GTS 450, 1764.0 MHz clock, 1023.2 MB memory, capability 2.1
% Thu Aug 25 12:38:13 2016
% Usage: ./testing_dgesdd [options] [-h|--help]

% jobz   M     N  CPU time (sec)  GPU time (sec)   |S1-S2|   |A-USV^H|   |I-UU^H|/M   |I-VV^H|/N   S sorted
%==========================================================================================================
   N  1088  1088    ---              0.49            ---   
   N  2112  2112    ---              2.10            ---   
   N  3136  3136    ---              6.45            ---   
   N  4160  4160    ---             14.64            ---   
   N  5184  5184    ---             27.89            ---   
   N  6208  6208    ---             47.25            ---   
   N  7232  7232    ---             75.09            ---

Meanwhile, the Maple times (CPU only, not even using the GPU!) are:

Maple time: n= 1100 	 0.285000 
Maple time: n= 2100 	 2.770000 
Maple time: n= 3100 	 2.959000 
Maple time: n= 4100 	 6.757000

mgates3 · Post by **mgates3** » Thu Aug 25, 2016 5:52 pm

GeForce cards are designed for graphics and gaming, which primarily use single-precision. Their support for double-precision math is slow -- perhaps 8x slower than single-precision -- while with a high-end Tesla card, double would only be 2x slower than single (same as CPU). You may have better results with single-precision.

BTW, you can add the -l or --lapack flag to get LAPACK CPU times from testing_dgesdd or testing_sgesdd.

-mark
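
As a rough way to read the testing_dgesdd timings quoted above, the sketch below converts a matrix size and a run time into an achieved rate in Gflop/s. It rests on an assumption that is not stated anywhere in this thread: that a singular-values-only run (jobz = N) on a square n-by-n matrix is dominated by the reduction to bidiagonal form, which costs about (8/3) n^3 floating-point operations. The function names are hypothetical and are not part of MAGMA.

```cpp
#include <cassert>
#include <cstdio>

// Approximate flop count for reducing a square n-by-n matrix to
// bidiagonal form: (8/3) * n^3.
// ASSUMPTION: this term dominates a jobz = N (singular values only) run;
// lower-order work is ignored.
double bidiag_flops(double n) {
    return 8.0 / 3.0 * n * n * n;
}

// Achieved rate in Gflop/s for one measurement (hypothetical helper).
double achieved_gflops(double n, double seconds) {
    assert(seconds > 0.0);  // a zero or negative time is not a measurement
    return bidiag_flops(n) / seconds / 1e9;
}

// Print one line per (n, time) pair, e.g. a row of the tester output.
void report(double n, double seconds) {
    std::printf("n = %5.0f  time = %6.2f s  ~%.1f Gflop/s\n",
                n, seconds, achieved_gflops(n, seconds));
}
```

Under that assumption, the 1088 x 1088 run at 0.49 s works out to roughly 7 Gflop/s, and the 7232 x 7232 run at 75.09 s to roughly 13 Gflop/s. Running the tester with the --lapack flag gives the CPU times needed to compute the same figure for LAPACK on the same machine.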
