MAGMA  1.5.0
Matrix Algebra for GPU and Multicore Architectures
Level-2 BLAS

Level-2, matrix–vector operations: \( O(n^2) \) operations on \( O(n^2) \) data; memory bound. More...

Modules

 single precision
 
 double precision
 
 single-complex precision
 
 double-complex precision
 

Functions

magma_int_t magmablas_csymv_tesla_work (magma_uplo_t uplo, magma_int_t n, magmaFloatComplex alpha, const magmaFloatComplex *A, magma_int_t lda, const magmaFloatComplex *x, magma_int_t incx, magmaFloatComplex beta, magmaFloatComplex *y, magma_int_t incy, magmaFloatComplex *dwork, magma_int_t lwork)
 magmablas_csymv_work performs the matrix-vector operation: More...
 
magma_int_t magmablas_dsymv_tesla_work (magma_uplo_t uplo, magma_int_t n, double alpha, const double *A, magma_int_t lda, const double *x, magma_int_t incx, double beta, double *y, magma_int_t incy, double *dwork, magma_int_t lwork)
 magmablas_dsymv_work performs the matrix-vector operation: More...
 
magma_int_t magmablas_ssymv_tesla_work (magma_uplo_t uplo, magma_int_t n, float alpha, const float *A, magma_int_t lda, const float *x, magma_int_t incx, float beta, float *y, magma_int_t incy, float *dwork, magma_int_t lwork)
 magmablas_ssymv_work performs the matrix-vector operation: More...
 

Detailed Description

Level-2, matrix–vector operations: \( O(n^2) \) operations on \( O(n^2) \) data; memory bound.

Function Documentation

magma_int_t magmablas_csymv_tesla_work (magma_uplo_t uplo, magma_int_t n, magmaFloatComplex alpha, const magmaFloatComplex *A, magma_int_t lda, const magmaFloatComplex *x, magma_int_t incx, magmaFloatComplex beta, magmaFloatComplex *y, magma_int_t incy, magmaFloatComplex *dwork, magma_int_t lwork)

magmablas_csymv_work performs the matrix-vector operation:

y := alpha*A*x + beta*y,

where alpha and beta are scalars, x and y are n-element vectors, and A is an n-by-n complex symmetric matrix.

The interface of magmablas_csymv_work differs from that of magmablas_csymv in the trailing workspace arguments, dwork and lwork.

MAGMA implements csymv in two steps:

1) Each thread block performs its portion of the multiplication and writes the intermediate values to a region of device memory called the working space; dwork is this working space.
2) The intermediate values are summed and the final result is stored in y.

The size of dwork is lda * ceil(n/thread_x) elements, where thread_x = 64.

magmablas_csymv_work requires the caller to provide the working space explicitly, while magmablas_csymv is a wrapper around magmablas_csymv_work that allocates the working space internally and provides the same interface as cuBLAS.

If you need to call csymv frequently, we suggest using magmablas_csymv_work instead of magmablas_csymv, since the overhead of allocating and freeing device memory inside magmablas_csymv hurts performance. Our tests show that this penalty is about 10 Gflop/s when the matrix size is around 10000.

magma_int_t magmablas_dsymv_tesla_work (magma_uplo_t uplo, magma_int_t n, double alpha, const double *A, magma_int_t lda, const double *x, magma_int_t incx, double beta, double *y, magma_int_t incy, double *dwork, magma_int_t lwork)

magmablas_dsymv_work performs the matrix-vector operation:

y := alpha*A*x + beta*y,

where alpha and beta are scalars, x and y are n-element vectors, and A is an n-by-n symmetric matrix.

The interface of magmablas_dsymv_work differs from that of magmablas_dsymv in the trailing workspace arguments, dwork and lwork.

MAGMA implements dsymv in two steps:

1) Each thread block performs its portion of the multiplication and writes the intermediate values to a region of device memory called the working space; dwork is this working space.
2) The intermediate values are summed and the final result is stored in y.

The size of dwork is lda * ceil(n/thread_x) elements, where thread_x = 64.

magmablas_dsymv_work requires the caller to provide the working space explicitly, while magmablas_dsymv is a wrapper around magmablas_dsymv_work that allocates the working space internally and provides the same interface as cuBLAS.

If you need to call dsymv frequently, we suggest using magmablas_dsymv_work instead of magmablas_dsymv, since the overhead of allocating and freeing device memory inside magmablas_dsymv hurts performance. Our tests show that this penalty is about 10 Gflop/s when the matrix size is around 10000.

magma_int_t magmablas_ssymv_tesla_work (magma_uplo_t uplo, magma_int_t n, float alpha, const float *A, magma_int_t lda, const float *x, magma_int_t incx, float beta, float *y, magma_int_t incy, float *dwork, magma_int_t lwork)

magmablas_ssymv_work performs the matrix-vector operation:

y := alpha*A*x + beta*y,

where alpha and beta are scalars, x and y are n-element vectors, and A is an n-by-n symmetric matrix.

The interface of magmablas_ssymv_work differs from that of magmablas_ssymv in the trailing workspace arguments, dwork and lwork.

MAGMA implements ssymv in two steps:

1) Each thread block performs its portion of the multiplication and writes the intermediate values to a region of device memory called the working space; dwork is this working space.
2) The intermediate values are summed and the final result is stored in y.

The size of dwork is lda * ceil(n/thread_x) elements, where thread_x = 64.

magmablas_ssymv_work requires the caller to provide the working space explicitly, while magmablas_ssymv is a wrapper around magmablas_ssymv_work that allocates the working space internally and provides the same interface as cuBLAS.

If you need to call ssymv frequently, we suggest using magmablas_ssymv_work instead of magmablas_ssymv, since the overhead of allocating and freeing device memory inside magmablas_ssymv hurts performance. Our tests show that this penalty is about 10 Gflop/s when the matrix size is around 10000.