MAGMA 1.5.0
Matrix Algebra for GPU and Multicore Architectures
Level-2, matrix–vector operations: \( O(n^2) \) operations on \( O(n^2) \) data; memory bound.

Modules
    single precision
    double precision
    single-complex precision
    double-complex precision

Functions
magma_int_t magmablas_csymv_tesla_work (magma_uplo_t uplo, magma_int_t n, magmaFloatComplex alpha, const magmaFloatComplex *A, magma_int_t lda, const magmaFloatComplex *x, magma_int_t incx, magmaFloatComplex beta, magmaFloatComplex *y, magma_int_t incy, magmaFloatComplex *dwork, magma_int_t lwork)
    magmablas_csymv_work performs the matrix-vector operation y := alpha*A*x + beta*y.

magma_int_t magmablas_dsymv_tesla_work (magma_uplo_t uplo, magma_int_t n, double alpha, const double *A, magma_int_t lda, const double *x, magma_int_t incx, double beta, double *y, magma_int_t incy, double *dwork, magma_int_t lwork)
    magmablas_dsymv_work performs the matrix-vector operation y := alpha*A*x + beta*y.

magma_int_t magmablas_ssymv_tesla_work (magma_uplo_t uplo, magma_int_t n, float alpha, const float *A, magma_int_t lda, const float *x, magma_int_t incx, float beta, float *y, magma_int_t incy, float *dwork, magma_int_t lwork)
    magmablas_ssymv_work performs the matrix-vector operation y := alpha*A*x + beta*y.
magma_int_t magmablas_csymv_tesla_work(
    magma_uplo_t             uplo,
    magma_int_t              n,
    magmaFloatComplex        alpha,
    const magmaFloatComplex *A,
    magma_int_t              lda,
    const magmaFloatComplex *x,
    magma_int_t              incx,
    magmaFloatComplex        beta,
    magmaFloatComplex       *y,
    magma_int_t              incy,
    magmaFloatComplex       *dwork,
    magma_int_t              lwork )
magmablas_csymv_work performs the matrix-vector operation

    y := alpha*A*x + beta*y,

where alpha and beta are scalars, x and y are n-element vectors, and A is an n-by-n complex symmetric matrix.
The interface of magmablas_csymv_work differs from that of magmablas_csymv in the last argument, dwork. MAGMA implements csymv in two steps: 1) each thread block performs its part of the multiplication and writes the intermediate values to a region of device memory called the working space; dwork is this working space; 2) the intermediate values are summed and the final result is stored in y.

The size of dwork is lda * ceil(n/thread_x), where thread_x = 64.
magmablas_csymv_work requires the user to explicitly provide a working space, while magmablas_csymv is a wrapper around magmablas_csymv_work that allocates the working space inside the routine and provides the same interface as cuBLAS.

If csymv is called frequently, we suggest using magmablas_csymv_work instead of magmablas_csymv, since the overhead of allocating and freeing device memory in magmablas_csymv hurts performance. Our tests show that this penalty is about 10 Gflop/s for matrix sizes around 10000.
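The allocate-once pattern suggested above might look like the following sketch. It assumes MAGMA has already been initialized and that dA, dx, and dy are device pointers set up elsewhere; error handling is omitted, and the snippet is untested pseudo-usage rather than a definitive example:

```c
#include "magma.h"

/* Reuse one working space across many csymv calls, instead of letting
   magmablas_csymv allocate and free it on every call. */
void many_csymv(magma_int_t n, magma_int_t lda,
                const magmaFloatComplex *dA,
                const magmaFloatComplex *dx,
                magmaFloatComplex *dy,
                int ncalls)
{
    magmaFloatComplex c_one  = MAGMA_C_ONE;
    magmaFloatComplex c_zero = MAGMA_C_ZERO;
    magma_int_t lwork = lda * ((n + 63) / 64);   /* lda * ceil(n/64) */
    magmaFloatComplex *dwork;

    magma_cmalloc(&dwork, lwork);                /* allocate once */
    for (int i = 0; i < ncalls; ++i) {
        /* y := 1*A*x + 0*y, reusing the same working space each time */
        magmablas_csymv_tesla_work(MagmaLower, n, c_one, dA, lda,
                                   dx, 1, c_zero, dy, 1, dwork, lwork);
    }
    magma_free(dwork);                           /* free once */
}
```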
magma_int_t magmablas_dsymv_tesla_work(
    magma_uplo_t  uplo,
    magma_int_t   n,
    double        alpha,
    const double *A,
    magma_int_t   lda,
    const double *x,
    magma_int_t   incx,
    double        beta,
    double       *y,
    magma_int_t   incy,
    double       *dwork,
    magma_int_t   lwork )
magmablas_dsymv_work performs the matrix-vector operation

    y := alpha*A*x + beta*y,

where alpha and beta are scalars, x and y are n-element vectors, and A is an n-by-n symmetric matrix.
The interface of magmablas_dsymv_work differs from that of magmablas_dsymv in the last argument, dwork. MAGMA implements dsymv in two steps: 1) each thread block performs its part of the multiplication and writes the intermediate values to a region of device memory called the working space; dwork is this working space; 2) the intermediate values are summed and the final result is stored in y.

The size of dwork is lda * ceil(n/thread_x), where thread_x = 64.
magmablas_dsymv_work requires the user to explicitly provide a working space, while magmablas_dsymv is a wrapper around magmablas_dsymv_work that allocates the working space inside the routine and provides the same interface as cuBLAS.

If dsymv is called frequently, we suggest using magmablas_dsymv_work instead of magmablas_dsymv, since the overhead of allocating and freeing device memory in magmablas_dsymv hurts performance. Our tests show that this penalty is about 10 Gflop/s for matrix sizes around 10000.
magma_int_t magmablas_ssymv_tesla_work(
    magma_uplo_t uplo,
    magma_int_t  n,
    float        alpha,
    const float *A,
    magma_int_t  lda,
    const float *x,
    magma_int_t  incx,
    float        beta,
    float       *y,
    magma_int_t  incy,
    float       *dwork,
    magma_int_t  lwork )
magmablas_ssymv_work performs the matrix-vector operation

    y := alpha*A*x + beta*y,

where alpha and beta are scalars, x and y are n-element vectors, and A is an n-by-n symmetric matrix.
The interface of magmablas_ssymv_work differs from that of magmablas_ssymv in the last argument, dwork. MAGMA implements ssymv in two steps: 1) each thread block performs its part of the multiplication and writes the intermediate values to a region of device memory called the working space; dwork is this working space; 2) the intermediate values are summed and the final result is stored in y.

The size of dwork is lda * ceil(n/thread_x), where thread_x = 64.
magmablas_ssymv_work requires the user to explicitly provide a working space, while magmablas_ssymv is a wrapper around magmablas_ssymv_work that allocates the working space inside the routine and provides the same interface as cuBLAS.

If ssymv is called frequently, we suggest using magmablas_ssymv_work instead of magmablas_ssymv, since the overhead of allocating and freeing device memory in magmablas_ssymv hurts performance. Our tests show that this penalty is about 10 Gflop/s for matrix sizes around 10000.