Hi, I am new to MAGMA. Can I run multiple MAGMA GPU functions concurrently?
I know some MAGMA functions are hybrid and some run only on the GPU. How can I check which is which?
I am using these functions:
magma_dgemm(MagmaNoTrans, MagmaNoTrans,n,n,n,scale1,d_a,n,d_b,n,scale2,d_c,n);
magma_dgemv(MagmaNoTrans,n,n,scale1,d_a,n,d_workvec1,inc,scale2,d_workvec2,inc);
magma_dpotrs_gpu(MagmaUpper,k,incx,d_a,k,d_workvec1,k,&info);
magma_dpotrf_gpu(MagmaUpper,blksize,d_a,blksize,&info);
magma_dnrm2(temp,d_a,incx);
Also, if they can be executed concurrently, how can I change my code to do so? Where can I add the stream? Thanks!
concurrent kernel execution in Magma
Re: concurrent kernel execution in Magma
Most MAGMA BLAS kernels are asynchronous and run only on the GPU (not hybrid). This includes gemm, gemv, and nrm2.
magma_dgemm, magma_dgemv, and magma_dnrm2 are simply wrappers around the respective cuBLAS functions. They are asynchronous. In the new 1.5 beta 3, the doxygen documentation (in docs/html/index.html) has a section for BLAS and auxiliary functions.
Most other MAGMA functions are synchronous and hybrid (they use both the CPU and the GPU). This includes posv, potrf, gesv, getrf, etc.
What do you mean by concurrently? If you mean executing two async MAGMA BLAS GPU kernels from a single CPU thread, then you just need to use two different CUDA streams.
Code:
magmablasSetKernelStream( stream1 );
magma_dgemm( ... );
magmablasSetKernelStream( stream2 );
magma_dgemm( ... );
If you mean executing two async MAGMA BLAS GPU kernels from different CPU threads, then you have to be careful to lock things. Something like this should work (from magmablasSetKernelStream docs):
Code:
thread 1                              thread 2
------------------------------        ------------------------------
1. lock()
2. magmablasSetKernelStream( s1 )
3. magma_dgemm( ... )
4. unlock()
                                      5. lock()
                                      6. magmablasSetKernelStream( s2 )
                                      7. magma_dgemm( ... )
                                      8. unlock()
where lock() and unlock() are some mutex lock functions that you provide, e.g. pthread_mutex_lock.
If you mean executing two hybrid MAGMA functions like getrf in different CPU threads, that won't currently work.
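As a rough illustration of why the lock in the two-thread scheme above has to cover both the stream selection and the BLAS call, here is a small self-contained pthreads sketch. It does not call MAGMA at all: fake_set_kernel_stream and fake_dgemm are hypothetical stand-ins that only model a single process-wide "current stream", which is the assumption behind that scheme. In real code you would call magmablasSetKernelStream and magma_dgemm in their place.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT MAGMA functions. They model one process-wide
   "current stream" that is set by one call and read by the next. */
static int current_stream = 0;
static pthread_mutex_t stream_lock = PTHREAD_MUTEX_INITIALIZER;

static void fake_set_kernel_stream(int s) { current_stream = s; }

/* Returns the stream the pretend kernel was launched on. */
static int fake_dgemm(void) { return current_stream; }

typedef struct {
    int stream;      /* stream this thread wants to use */
    int iterations;  /* number of set-stream + launch pairs */
    int mismatches;  /* launches that landed on another thread's stream */
} worker_args;

static void *worker(void *p)
{
    worker_args *a = (worker_args *)p;
    for (int i = 0; i < a->iterations; ++i) {
        /* Steps 1-4 (or 5-8) of the scheme: the lock spans both calls, so
           the other thread cannot change the stream in between. */
        pthread_mutex_lock(&stream_lock);
        fake_set_kernel_stream(a->stream);
        int used = fake_dgemm();
        pthread_mutex_unlock(&stream_lock);

        if (used != a->stream)
            a->mismatches++;
    }
    return NULL;
}

/* Runs two workers on streams 1 and 2. Returns the total number of
   mismatched launches, or -1 if a thread could not be started. */
static int run_two_workers(int iterations)
{
    assert(iterations >= 0);
    worker_args a1 = { 1, iterations, 0 };
    worker_args a2 = { 2, iterations, 0 };
    pthread_t t1, t2;

    if (pthread_create(&t1, NULL, worker, &a1) != 0)
        return -1;
    if (pthread_create(&t2, NULL, worker, &a2) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return a1.mismatches + a2.mismatches;
}
```

With the lock held across both calls, run_two_workers reports no mismatches. If the lock only covered the set-stream call, the other thread could switch the stream before the launch, which is the race the scheme is meant to avoid.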
-mark
Last edited by mgates3 on Wed Jul 23, 2014 6:56 pm, edited 1 time in total.
Reason: finish sentence about doxygen
Re: concurrent kernel execution in Magma
Note that this changed in MAGMA 2.0, using magma_v2.h. Since each gemm call now takes a queue, you no longer call magmablasSetKernelStream, and there is no need for locks.
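To show why an explicit queue argument removes the need for locks, here is a small self-contained sketch of the idea. It is a toy model only and does not use the MAGMA API: fake_queue and fake_dgemm_v2 are hypothetical names. The one thing it assumes from MAGMA 2.0 is what is stated above, that each call receives its queue as an argument instead of reading a globally set stream.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model, NOT the MAGMA API: a "queue" here is just a struct
   that counts the work submitted to it. */
typedef struct {
    int  id;        /* identifies the queue */
    long launches;  /* number of calls submitted to this queue */
} fake_queue;

/* Models a 2.0-style call: the queue is an explicit argument, so the call
   reads no process-wide state. Returns the id of the queue it used. */
static int fake_dgemm_v2(fake_queue *queue)
{
    queue->launches++;
    return queue->id;
}

typedef struct {
    fake_queue queue;  /* each thread owns its own queue */
    int iterations;
    int mismatches;    /* calls that reported a different queue id */
} queue_worker_args;

static void *queue_worker(void *p)
{
    queue_worker_args *a = (queue_worker_args *)p;
    for (int i = 0; i < a->iterations; ++i) {
        /* No lock: nothing touched here is shared with the other thread. */
        if (fake_dgemm_v2(&a->queue) != a->queue.id)
            a->mismatches++;
    }
    return NULL;
}

/* Runs two workers, each with its own queue. Returns the number of
   mismatched calls, or -1 if a thread failed to start or a call was lost. */
static int run_two_queue_workers(int iterations)
{
    assert(iterations >= 0);
    queue_worker_args a1 = { { 1, 0 }, iterations, 0 };
    queue_worker_args a2 = { { 2, 0 }, iterations, 0 };
    pthread_t t1, t2;

    if (pthread_create(&t1, NULL, queue_worker, &a1) != 0)
        return -1;
    if (pthread_create(&t2, NULL, queue_worker, &a2) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    if (a1.queue.launches != iterations || a2.queue.launches != iterations)
        return -1;
    return a1.mismatches + a2.mismatches;
}
```

Because each thread passes its own queue, there is no shared "current stream" to protect. That is the design change the note above describes.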
-mark