MAGMA 1.7.0
Matrix Algebra for GPU and Multicore Architectures
Functions

| Return type | Function | Description |
| --- | --- | --- |
| magma_int_t | magma_is_devptr (const void *A) | For debugging purposes, determines whether a pointer points to CPU or GPU memory. |
| magma_int_t | magma_get_parallel_numthreads () | Returns the number of threads to use for parallel sections of MAGMA. |
| magma_int_t | magma_get_lapack_numthreads () | Returns the number of threads currently used for LAPACK and BLAS. |
| void | magma_set_lapack_numthreads (magma_int_t threads) | Sets the number of threads to use for LAPACK and BLAS. |
| magma_int_t | magma_get_omp_numthreads () | Returns the number of threads currently used for OMP sections. |
| void | magma_set_omp_numthreads (magma_int_t threads) | Sets the number of threads to use for OMP parallel sections. |
| void | magma_xerbla (const char *srname, magma_int_t minfo) | magma_xerbla is an error handler for the MAGMA routines. |
| cublasStatus_t | magmablasSetKernelStream (magma_queue_t stream) | magmablasSetKernelStream sets the CUDA stream that MAGMA BLAS and CUBLAS (v1) routines use (unless explicitly given a stream). |
| cublasStatus_t | magmablasGetKernelStream (magma_queue_t *stream) | magmablasGetKernelStream gets the CUDA stream that MAGMA BLAS routines use. |
| magma_int_t | cusparse2magma_error (cusparseStatus_t status) | Maps a cuSPARSE error to a MAGMA error. |
magma_int_t cusparse2magma_error (cusparseStatus_t status)
Maps a cuSPARSE error to a MAGMA error.
[in] | status | cuSPARSE error |
magma_int_t magma_get_lapack_numthreads ()
Returns the number of threads currently used for LAPACK and BLAS.
Typically, the number of threads is initially set by the environment variables OMP_NUM_THREADS or MKL_NUM_THREADS.
If MAGMA is compiled with MAGMA_WITH_MKL, this queries MKL; else if MAGMA is compiled with OpenMP, this queries OpenMP; else this returns 1.
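The selection order above can be pictured as a compile-time dispatch. The following is a minimal sketch, not MAGMA's source: the function name `sketch_get_lapack_numthreads` is hypothetical, and it assumes the MKL path uses `mkl_get_max_threads` and the OpenMP path uses `omp_get_max_threads`.

```c
#include <assert.h>
#if defined(MAGMA_WITH_MKL)
#include <mkl_service.h>   /* mkl_get_max_threads */
#elif defined(_OPENMP)
#include <omp.h>           /* omp_get_max_threads */
#endif

/* Hypothetical stand-in for the selection order described above:
   MKL first, then OpenMP, otherwise a single thread. */
static int sketch_get_lapack_numthreads(void)
{
    int n;
#if defined(MAGMA_WITH_MKL)
    n = mkl_get_max_threads();
#elif defined(_OPENMP)
    n = omp_get_max_threads();
#else
    n = 1;
#endif
    assert(n >= 1);  /* every branch is expected to report at least one thread */
    return n;
}
```

Built without either macro defined, the sketch always returns 1, matching the last case in the description.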
magma_int_t magma_get_omp_numthreads ()
Returns the number of threads currently used for OMP sections.
Typically, the number of threads is initially set by the environment variable OMP_NUM_THREADS.
magma_int_t magma_get_parallel_numthreads ()
Returns the number of threads to use for parallel sections of MAGMA.
Typically, it is initially set by the environment variables OMP_NUM_THREADS or MAGMA_NUM_THREADS.
If MAGMA_NUM_THREADS is set, this returns min( num_cores, MAGMA_NUM_THREADS ); else if MAGMA is compiled with OpenMP, this queries OpenMP and returns min( num_cores, OMP_NUM_THREADS ); else this returns num_cores.
For the number of cores, if MAGMA is compiled with hwloc, this queries hwloc; else it queries sysconf (on Unix) or GetSystemInfo (on Windows).
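The precedence rules above reduce to a few comparisons. The sketch below is one possible reading of them, not MAGMA's implementation: `sketch_parallel_numthreads` and `sketch_num_cores` are hypothetical names, a value of 0 stands for "MAGMA_NUM_THREADS unset" or "built without OpenMP", and the core count uses `sysconf(_SC_NPROCESSORS_ONLN)`, which is available on Linux and most Unix systems.

```c
#include <assert.h>
#include <unistd.h>   /* sysconf */

static int min_int(int a, int b) { return a < b ? a : b; }

/* Hypothetical helper: number of online cores, falling back to 1
   if sysconf cannot report it. */
static int sketch_num_cores(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? (int) n : 1;
}

/* magma_num_threads: value of MAGMA_NUM_THREADS, or 0 if unset (assumption).
   omp_threads:       OpenMP's thread count, or 0 without OpenMP (assumption). */
static int sketch_parallel_numthreads(int num_cores,
                                      int magma_num_threads,
                                      int omp_threads)
{
    assert(num_cores >= 1);
    if (magma_num_threads > 0)
        return min_int(num_cores, magma_num_threads);  /* MAGMA_NUM_THREADS wins */
    if (omp_threads > 0)
        return min_int(num_cores, omp_threads);        /* then OpenMP */
    return num_cores;                                  /* otherwise all cores */
}
```

For example, with 8 cores, `sketch_parallel_numthreads(8, 4, 16)` yields 4 because MAGMA_NUM_THREADS takes precedence, while `sketch_parallel_numthreads(8, 0, 16)` is capped at 8 cores.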
magma_int_t magma_is_devptr (const void *A)
For debugging purposes, determines whether a pointer points to CPU or GPU memory.
On CUDA architecture 2.0 cards with unified addressing, CUDA can tell whether the pointer is a device pointer or a pinned host pointer. For malloc'd host pointers, cudaPointerGetAttributes returns an error, implying that the pointer is a (non-pinned) host pointer.
On older cards, this cannot determine if it is CPU or GPU memory.
A | pointer to test |
void magma_set_lapack_numthreads (magma_int_t threads)
Sets the number of threads to use for LAPACK and BLAS.
This is often used to set BLAS to be single-threaded during sections where MAGMA uses explicit pthread parallelism. Example use:
    nthread_save = magma_get_lapack_numthreads();
    magma_set_lapack_numthreads( 1 );
    ... launch pthreads, do work, terminate pthreads ...
    magma_set_lapack_numthreads( nthread_save );
If MAGMA is compiled with MAGMA_WITH_MKL, this sets MKL threads; else if MAGMA is compiled with OpenMP, this sets OpenMP threads; else this does nothing.
[in] | threads | INTEGER Number of threads to use. threads >= 1. If threads < 1, this silently does nothing. |
void magma_set_omp_numthreads (magma_int_t threads)
Sets the number of threads to use for OMP parallel sections.
Internal routines.
This is often used to set BLAS to be single-threaded during sections where MAGMA uses explicit pthread parallelism. Example use:
    nthread_save = magma_get_omp_numthreads();
    magma_set_omp_numthreads( 1 );
    ... launch pthreads, do work, terminate pthreads ...
    magma_set_omp_numthreads( nthread_save );
[in] | threads | INTEGER Number of threads to use. threads >= 1. If threads < 1, this silently does nothing. |
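The save and restore pattern can be tried without MAGMA by calling OpenMP directly. This is a minimal sketch under stated assumptions: `get_threads`, `set_threads`, and `run_single_threaded` are hypothetical names, the OpenMP path is assumed to map onto `omp_get_max_threads` and `omp_set_num_threads`, and a plain variable stands in when the code is built without OpenMP.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
static int  get_threads(void)  { return omp_get_max_threads(); }
/* Mirrors the documented behaviour: values below 1 are silently ignored. */
static void set_threads(int n) { if (n >= 1) omp_set_num_threads(n); }
#else
/* Fallback so the sketch compiles without OpenMP: a plain variable. */
static int fake_threads = 4;
static int  get_threads(void)  { return fake_threads; }
static void set_threads(int n) { if (n >= 1) fake_threads = n; }
#endif

/* Runs work(arg) with the thread count forced to 1, then restores the
   previous setting. Returns the thread count in effect while work ran. */
static int run_single_threaded(void (*work)(void *), void *arg)
{
    int saved = get_threads();
    set_threads(1);
    int during = get_threads();
    work(arg);
    set_threads(saved);
    assert(get_threads() == saved);  /* the caller sees the original setting */
    return during;
}

/* Trivial workload for demonstration: increments an int counter. */
static void bump(void *p) { ++*(int *) p; }
```

Wrapping the pattern in a function keeps the restore step next to the save step, so an early return inside the work routine cannot leave the thread count stuck at 1.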
void magma_xerbla (const char *srname, magma_int_t minfo)
magma_xerbla is an error handler for the MAGMA routines.
It is called by a MAGMA routine if an input parameter has an invalid value. It prints an error message.
Installers may consider modifying it to call system-specific exception-handling facilities.
[in] | srname | CHAR* The name of the subroutine that called XERBLA. In C/C++ it is convenient to use "__func__". |
[in] | minfo | INTEGER Note minfo's sign is opposite info's normal sign. |
Normally:
These conditions are also reported, but normally code should not call xerbla for these runtime errors:
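To illustrate the sign convention for minfo, here is a minimal sketch of a routine that checks its arguments in the LAPACK style and passes the negated info to a handler. Everything in it is hypothetical: `sketch_xerbla` and `sketch_scale` are illustrative names, the handler writes into a buffer rather than printing, and the message text is not claimed to match what MAGMA prints.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical handler modelled on the description above. It formats the
   message into buf instead of printing so the result can be inspected. */
static int sketch_xerbla(char *buf, size_t len, const char *srname, int minfo)
{
    assert(buf != NULL && srname != NULL);
    return snprintf(buf, len,
                    "On entry to %s, parameter %d had an illegal value",
                    srname, minfo);
}

/* Hypothetical routine with LAPACK-style checks: info = -k flags the k-th
   argument, and the handler receives minfo = -info = k. */
static int sketch_scale(int n, double *x, double alpha,
                        char *msg, size_t msglen)
{
    int info = 0;
    if (n < 0)
        info = -1;                       /* argument 1 (n) is invalid */
    else if (x == NULL && n > 0)
        info = -2;                       /* argument 2 (x) is invalid */

    if (info != 0) {
        sketch_xerbla(msg, msglen, __func__, -info);
        return info;
    }
    for (int i = 0; i < n; ++i)
        x[i] *= alpha;
    return 0;
}
```

The caller still sees the usual negative info as the return value; only the handler receives the positive argument position.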
cublasStatus_t magmablasGetKernelStream (magma_queue_t *stream)
magmablasGetKernelStream gets the CUDA stream that MAGMA BLAS routines use.
[out] | stream | magma_queue_t The CUDA stream. |
cublasStatus_t magmablasSetKernelStream (magma_queue_t stream)
magmablasSetKernelStream sets the CUDA stream that MAGMA BLAS and CUBLAS (v1) routines use (unless explicitly given a stream).
In a multi-threaded application, be careful to avoid race conditions when using this. For instance, if calls are executed in this order:
        thread 1                            thread 2
        ------------------------------      ------------------------------
    1.  magmablasSetKernelStream( s1 )
    2.                                      magmablasSetKernelStream( s2 )
    3.  magma_dgemm( ... )
    4.                                      magma_dgemm( ... )
both magma_dgemm would occur on stream s2. A lock should be used to prevent this, so the dgemm in thread 1 uses stream s1, and the dgemm in thread 2 uses s2:
        thread 1                            thread 2
        ------------------------------      ------------------------------
    1.  lock()
    2.  magmablasSetKernelStream( s1 )
    3.  magma_dgemm( ... )
    4.  unlock()
    5.                                      lock()
    6.                                      magmablasSetKernelStream( s2 )
    7.                                      magma_dgemm( ... )
    8.                                      unlock()
Most BLAS calls in MAGMA, such as magma_dgemm, are asynchronous, so the lock will only have to wait until dgemm is queued, not until it is finished.
[in] | stream | magma_queue_t The CUDA stream. |
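The locking scheme for magmablasSetKernelStream can be modelled with a pthread mutex. The sketch below does not call MAGMA or CUDA: `current_stream`, `set_kernel_stream`, and `enqueue_gemm` are hypothetical stand-ins for the library's global stream setting and an asynchronous BLAS call, and streams are represented by plain integers.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for the library's global stream state. */
static int current_stream = 0;
static pthread_mutex_t stream_lock = PTHREAD_MUTEX_INITIALIZER;

static void set_kernel_stream(int s) { current_stream = s; }
/* Models queuing an asynchronous call: reports the stream it lands on. */
static int  enqueue_gemm(void)       { return current_stream; }

struct job { int stream; int used; };

/* Each thread holds the lock across "set stream" and "enqueue", so no other
   thread can change the stream between the two steps. */
static void *worker(void *p)
{
    struct job *j = (struct job *) p;
    assert(j != NULL);
    pthread_mutex_lock(&stream_lock);
    set_kernel_stream(j->stream);
    j->used = enqueue_gemm();
    pthread_mutex_unlock(&stream_lock);
    return NULL;
}

/* Runs two workers concurrently; returns 1 if each call used its own stream. */
static int run_two_threads(void)
{
    struct job a = { 1, 0 }, b = { 2, 0 };
    pthread_t ta, tb;
    if (pthread_create(&ta, NULL, worker, &a) != 0)
        return 0;
    if (pthread_create(&tb, NULL, worker, &b) != 0) {
        pthread_join(ta, NULL);
        return 0;
    }
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return a.used == 1 && b.used == 2;
}
```

The critical section covers only the set and enqueue steps, which reflects the note that the lock is held until the call is queued rather than until it completes.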