MAGMA
1.5.0
Matrix Algebra for GPU and Multicore Architectures
Functions

magma_int_t magmablas_chemv_work (magma_uplo_t uplo, magma_int_t n, magmaFloatComplex alpha, const magmaFloatComplex *A, magma_int_t lda, const magmaFloatComplex *x, magma_int_t incx, magmaFloatComplex beta, magmaFloatComplex *y, magma_int_t incy, magmaFloatComplex *dwork, magma_int_t lwork)
magmablas_chemv_work performs the Hermitian matrix-vector operation y := alpha*A*x + beta*y, using a caller-supplied workspace.

magma_int_t magmablas_chemv (magma_uplo_t uplo, magma_int_t n, magmaFloatComplex alpha, const magmaFloatComplex *A, magma_int_t lda, const magmaFloatComplex *x, magma_int_t incx, magmaFloatComplex beta, magmaFloatComplex *y, magma_int_t incy)
magmablas_chemv performs the Hermitian matrix-vector operation y := alpha*A*x + beta*y.

magma_int_t magmablas_chemv_mgpu_offset (magma_uplo_t uplo, magma_int_t n, magmaFloatComplex alpha, magmaFloatComplex **A, magma_int_t lda, magmaFloatComplex **x, magma_int_t incx, magmaFloatComplex beta, magmaFloatComplex **y, magma_int_t incy, magmaFloatComplex **work, magma_int_t lwork, magma_int_t num_gpus, magma_int_t nb, magma_int_t offset, magma_queue_t stream[][10])
magmablas_chemv_mgpu_offset performs the Hermitian matrix-vector operation y := alpha*A*x + beta*y on multiple GPUs.

magma_int_t magmablas_csymv_work (magma_uplo_t uplo, magma_int_t n, magmaFloatComplex alpha, const magmaFloatComplex *A, magma_int_t lda, const magmaFloatComplex *x, magma_int_t incx, magmaFloatComplex beta, magmaFloatComplex *y, magma_int_t incy, magmaFloatComplex *dwork, magma_int_t lwork)
magmablas_csymv_work performs the complex symmetric matrix-vector operation y := alpha*A*x + beta*y, using a caller-supplied workspace.

magma_int_t magmablas_csymv (magma_uplo_t uplo, magma_int_t n, magmaFloatComplex alpha, const magmaFloatComplex *A, magma_int_t lda, const magmaFloatComplex *x, magma_int_t incx, magmaFloatComplex beta, magmaFloatComplex *y, magma_int_t incy)
magmablas_csymv performs the complex symmetric matrix-vector operation y := alpha*A*x + beta*y.
magma_int_t magmablas_chemv (magma_uplo_t uplo, magma_int_t n, magmaFloatComplex alpha, const magmaFloatComplex *A, magma_int_t lda, const magmaFloatComplex *x, magma_int_t incx, magmaFloatComplex beta, magmaFloatComplex *y, magma_int_t incy)
magmablas_chemv performs the matrix-vector operation:
y := alpha*A*x + beta*y,
where alpha and beta are scalars, x and y are n element vectors and A is an n by n Hermitian matrix.
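[in] | uplo | magma_uplo_t. On entry, UPLO specifies whether the upper or lower triangular part of the array A is to be referenced, as follows: UPLO = MagmaUpper: only the upper triangular part of A is to be referenced. UPLO = MagmaLower: only the lower triangular part of A is to be referenced. |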
[in] | n | INTEGER. On entry, N specifies the order of the matrix A. N must be at least zero. |
[in] | alpha | COMPLEX. On entry, ALPHA specifies the scalar alpha. |
[in] | A | COMPLEX array of DIMENSION ( LDA, n ). Before entry with UPLO = MagmaUpper, the leading n by n upper triangular part of the array A must contain the upper triangular part of the Hermitian matrix and the strictly lower triangular part of A is not referenced. Before entry with UPLO = MagmaLower, the leading n by n lower triangular part of the array A must contain the lower triangular part of the Hermitian matrix and the strictly upper triangular part of A is not referenced. Note that the imaginary parts of the diagonal elements need not be set and are assumed to be zero. |
[in] | lda | INTEGER. On entry, LDA specifies the first dimension of A as declared in the calling (sub)program. LDA must be at least max( 1, n ). It is recommended that LDA be a multiple of 16; otherwise performance deteriorates because the memory accesses are not fully coalesced. |
[in] | x | COMPLEX array of dimension at least ( 1 + ( n - 1 )*abs( INCX ) ). Before entry, the incremented array X must contain the n element vector x. |
[in] | incx | INTEGER. On entry, INCX specifies the increment for the elements of X. INCX must not be zero. |
[in] | beta | COMPLEX. On entry, BETA specifies the scalar beta. When BETA is supplied as zero then Y need not be set on input. |
[in,out] | y | COMPLEX array of dimension at least ( 1 + ( n - 1 )*abs( INCY ) ). Before entry, the incremented array Y must contain the n element vector y. On exit, Y is overwritten by the updated vector y. |
[in] | incy | INTEGER. On entry, INCY specifies the increment for the elements of Y. INCY must not be zero. |
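As a host-side cross-check of the operation and parameter conventions above, the following minimal C99 sketch computes y := alpha*A*x + beta*y for a column-major Hermitian A. It reads only the triangle selected by uplo, uses only the real part of diagonal entries, and does not read y when beta is zero. For negative increments it assumes the reference-BLAS convention (the vector is walked from its far end). It is not MAGMA code: `ref_chemv` and the `uplo_t` enum (standing in for magma_uplo_t) are hypothetical names, and C99 `float complex` stands in for magmaFloatComplex.

```c
#include <complex.h>

/* Hypothetical stand-in for magma_uplo_t (MagmaUpper / MagmaLower). */
typedef enum { UPLO_UPPER, UPLO_LOWER } uplo_t;

/* Host reference for y := alpha*A*x + beta*y with Hermitian A (column-major).
   Only the triangle selected by uplo is read; the other triangle is
   reconstructed by conjugation. Not MAGMA code; for cross-checking only. */
static void ref_chemv(uplo_t uplo, int n, float complex alpha,
                      const float complex *A, int lda,
                      const float complex *x, int incx,
                      float complex beta, float complex *y, int incy)
{
    /* Reference-BLAS convention: a negative increment walks the vector backwards. */
    int kx = (incx > 0) ? 0 : -(n - 1) * incx;
    int ky = (incy > 0) ? 0 : -(n - 1) * incy;

    for (int i = 0; i < n; ++i) {
        float complex sum = 0.0f;
        for (int j = 0; j < n; ++j) {
            float complex aij;
            if (i == j) {
                aij = crealf(A[i + j * lda]);      /* imaginary part of diagonal ignored */
            } else if ((uplo == UPLO_UPPER) == (i < j)) {
                aij = A[i + j * lda];              /* (i,j) lies in the stored triangle */
            } else {
                aij = conjf(A[j + i * lda]);       /* A(i,j) = conj(A(j,i)) */
            }
            sum += aij * x[kx + j * incx];
        }
        float complex *yi = &y[ky + i * incy];
        /* When beta == 0, y is not read, so it need not be set on input. */
        *yi = (beta == 0.0f) ? alpha * sum : alpha * sum + beta * *yi;
    }
}
```

For example, with A = [[2, 1+i], [1-i, 3]] stored in either triangle and x = (1, i), alpha = 1 and beta = 0 give y = (1+i, 1+2i).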
magma_int_t magmablas_chemv_mgpu_offset (magma_uplo_t uplo, magma_int_t n, magmaFloatComplex alpha, magmaFloatComplex **A, magma_int_t lda, magmaFloatComplex **x, magma_int_t incx, magmaFloatComplex beta, magmaFloatComplex **y, magma_int_t incy, magmaFloatComplex **work, magma_int_t lwork, magma_int_t num_gpus, magma_int_t nb, magma_int_t offset, magma_queue_t stream[][10])
magmablas_chemv_mgpu_offset performs the matrix-vector operation:
y := alpha*A*x + beta*y,
where alpha and beta are scalars, x and y are n element vectors and A is an n by n Hermitian matrix.
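[in] | uplo | magma_uplo_t. On entry, UPLO specifies whether the upper or lower triangular part of the array A is to be referenced, as follows: UPLO = MagmaUpper: only the upper triangular part of A is to be referenced. UPLO = MagmaLower: only the lower triangular part of A is to be referenced. |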
[in] | n | INTEGER. On entry, N specifies the order of the matrix A. N must be at least zero. |
[in] | alpha | COMPLEX. On entry, ALPHA specifies the scalar alpha. |
[in] | A | COMPLEX array of DIMENSION ( LDA, n ). Before entry with UPLO = MagmaUpper, the leading n by n upper triangular part of the array A must contain the upper triangular part of the Hermitian matrix and the strictly lower triangular part of A is not referenced. Before entry with UPLO = MagmaLower, the leading n by n lower triangular part of the array A must contain the lower triangular part of the Hermitian matrix and the strictly upper triangular part of A is not referenced. Note that the imaginary parts of the diagonal elements need not be set and are assumed to be zero. |
[in] | lda | INTEGER. On entry, LDA specifies the first dimension of A as declared in the calling (sub)program. LDA must be at least max( 1, n ). It is recommended that LDA be a multiple of 16; otherwise performance deteriorates because the memory accesses are not fully coalesced. |
[in] | x | COMPLEX array of dimension at least ( 1 + ( n - 1 )*abs( INCX ) ). Before entry, the incremented array X must contain the n element vector x. |
[in] | incx | INTEGER. On entry, INCX specifies the increment for the elements of X. INCX must not be zero. |
[in] | beta | COMPLEX. On entry, BETA specifies the scalar beta. When BETA is supplied as zero then Y need not be set on input. |
[in,out] | y | COMPLEX array of dimension at least ( 1 + ( n - 1 )*abs( INCY ) ). Before entry, the incremented array Y must contain the n element vector y. On exit, Y is overwritten by the updated vector y. |
[in] | incy | INTEGER. On entry, INCY specifies the increment for the elements of Y. INCY must not be zero. |
magma_int_t magmablas_chemv_work (magma_uplo_t uplo, magma_int_t n, magmaFloatComplex alpha, const magmaFloatComplex *A, magma_int_t lda, const magmaFloatComplex *x, magma_int_t incx, magmaFloatComplex beta, magmaFloatComplex *y, magma_int_t incy, magmaFloatComplex *dwork, magma_int_t lwork)
magmablas_chemv_work performs the matrix-vector operation:
y := alpha*A*x + beta*y,
where alpha and beta are scalars, x and y are n element vectors and A is an n by n Hermitian matrix.
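[in] | uplo | magma_uplo_t. On entry, UPLO specifies whether the upper or lower triangular part of the array A is to be referenced, as follows: UPLO = MagmaUpper: only the upper triangular part of A is to be referenced. UPLO = MagmaLower: only the lower triangular part of A is to be referenced. |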
[in] | n | INTEGER. On entry, N specifies the order of the matrix A. N must be at least zero. |
[in] | alpha | COMPLEX. On entry, ALPHA specifies the scalar alpha. |
[in] | A | COMPLEX array of DIMENSION ( LDA, n ). Before entry with UPLO = MagmaUpper, the leading n by n upper triangular part of the array A must contain the upper triangular part of the Hermitian matrix and the strictly lower triangular part of A is not referenced. Before entry with UPLO = MagmaLower, the leading n by n lower triangular part of the array A must contain the lower triangular part of the Hermitian matrix and the strictly upper triangular part of A is not referenced. Note that the imaginary parts of the diagonal elements need not be set and are assumed to be zero. |
[in] | lda | INTEGER. On entry, LDA specifies the first dimension of A as declared in the calling (sub)program. LDA must be at least max( 1, n ). It is recommended that LDA be a multiple of 16; otherwise performance deteriorates because the memory accesses are not fully coalesced. |
[in] | x | COMPLEX array of dimension at least ( 1 + ( n - 1 )*abs( INCX ) ). Before entry, the incremented array X must contain the n element vector x. |
[in] | incx | INTEGER. On entry, INCX specifies the increment for the elements of X. INCX must not be zero. |
[in] | beta | COMPLEX. On entry, BETA specifies the scalar beta. When BETA is supplied as zero then Y need not be set on input. |
[in,out] | y | COMPLEX array of dimension at least ( 1 + ( n - 1 )*abs( INCY ) ). Before entry, the incremented array Y must contain the n element vector y. On exit, Y is overwritten by the updated vector y. |
[in] | incy | INTEGER. On entry, INCY specifies the increment for the elements of Y. INCY must not be zero. |
[in] | dwork | (workspace) COMPLEX array on the GPU, of dimension (MAX(1, LWORK)). |
[in] | lwork | INTEGER. The dimension of the array DWORK. LWORK >= LDA * ceil( N / NB_X ), where NB_X = 64. |
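The lda recommendation and the LWORK bound can both be computed on the host before allocating device memory. This is a minimal sketch, assuming n >= 0; `padded_lda` and `min_lwork` are hypothetical helper names, plain `int`/`long` stand in for magma_int_t, and NB_X = 64 is taken from the LWORK requirement above.

```c
/* Block size quoted in the LWORK requirement (NB_X = 64). */
#define NB_X 64

/* Hypothetical helper: round n up to a multiple of 16, the lda recommended
   for coalesced access. Returns 16 for n == 0 so that lda >= max(1, n) holds. */
static int padded_lda(int n)
{
    int lda = ((n + 15) / 16) * 16;
    return (lda > 0) ? lda : 16;
}

/* Hypothetical helper: smallest LWORK allowed by LWORK >= LDA * ceil(N / NB_X).
   Assumes n >= 0; integer ceiling avoids floating-point rounding. */
static long min_lwork(int lda, int n)
{
    long nblocks = (n + NB_X - 1) / NB_X;
    return (long)lda * nblocks;
}
```

For n = 10000, padded_lda returns 10000 (already a multiple of 16) and min_lwork returns 10000 * 157 = 1,570,000 elements, about 12.6 MB of single-precision complex workspace. Since dwork has dimension MAX(1, LWORK), allocate at least one element even when the minimum is zero.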
MAGMA implements chemv in two steps: (1) each thread block performs its part of the multiplication and writes the intermediate result to dwork; (2) the intermediate results are summed and the final result is stored in y.
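To make the two-step structure concrete, here is a host-side C99 sketch for the lower-triangular case with unit strides. For illustration only, it assumes that "block" b handles columns b*NB_X through (b+1)*NB_X - 1 and writes its partial contribution to A*x into column b of dwork with leading dimension lda. This layout fits within the LWORK >= LDA * ceil(N / NB_X) bound, but MAGMA's actual kernel partitioning and workspace layout may differ. The function name `two_step_chemv_lower` is hypothetical, and `float complex` stands in for magmaFloatComplex.

```c
#include <complex.h>
#include <stddef.h>

#define NB_X 64  /* column-block width, matching NB_X in the LWORK bound */

/* Host-side illustration of a two-step chemv (lower triangle, unit strides).
   Requires lda >= n and dwork of at least lda * ceil(n / NB_X) elements.
   The per-block partitioning and dwork layout are assumptions, not MAGMA's. */
static void two_step_chemv_lower(int n, float complex alpha,
                                 const float complex *A, int lda,
                                 const float complex *x,
                                 float complex beta, float complex *y,
                                 float complex *dwork)
{
    int nblocks = (n + NB_X - 1) / NB_X;

    /* Step 1: block b handles columns [j0, j1) and writes its partial
       contribution to A*x into column b of dwork, independently of other blocks. */
    for (int b = 0; b < nblocks; ++b) {
        float complex *part = dwork + (size_t)b * lda;
        for (int i = 0; i < n; ++i)
            part[i] = 0.0f;
        int j0 = b * NB_X;
        int j1 = (j0 + NB_X < n) ? j0 + NB_X : n;
        for (int j = j0; j < j1; ++j) {
            part[j] += crealf(A[j + (size_t)j * lda]) * x[j];  /* real diagonal */
            for (int i = j + 1; i < n; ++i) {
                float complex aij = A[i + (size_t)j * lda];     /* stored A(i,j), i > j */
                part[i] += aij * x[j];                          /* A(i,j) * x(j) */
                part[j] += conjf(aij) * x[i];                   /* A(j,i) = conj(A(i,j)) */
            }
        }
    }

    /* Step 2: sum the per-block partial vectors and apply alpha and beta. */
    for (int i = 0; i < n; ++i) {
        float complex sum = 0.0f;
        for (int b = 0; b < nblocks; ++b)
            sum += dwork[i + (size_t)b * lda];
        y[i] = (beta == 0.0f) ? alpha * sum : alpha * sum + beta * y[i];
    }
}
```

Because each block writes only its own column of dwork, step 1 needs no coordination between blocks; the price is the extra workspace and a separate reduction pass in step 2.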
magmablas_chemv_work requires the user to provide a workspace, whereas magmablas_chemv is a wrapper that allocates the workspace internally and provides the same interface as cuBLAS.
If chemv is called frequently, we suggest using magmablas_chemv_work instead of magmablas_chemv, because the overhead of allocating and freeing device memory inside magmablas_chemv hurts performance. Our tests show that this penalty is about 10 Gflop/s when the matrix size is around 10000.
magma_int_t magmablas_csymv (magma_uplo_t uplo, magma_int_t n, magmaFloatComplex alpha, const magmaFloatComplex *A, magma_int_t lda, const magmaFloatComplex *x, magma_int_t incx, magmaFloatComplex beta, magmaFloatComplex *y, magma_int_t incy)
magmablas_csymv performs the matrix-vector operation:
y := alpha*A*x + beta*y,
where alpha and beta are scalars, x and y are n element vectors and A is an n by n complex symmetric matrix.
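[in] | uplo | magma_uplo_t. On entry, UPLO specifies whether the upper or lower triangular part of the array A is to be referenced, as follows: UPLO = MagmaUpper: only the upper triangular part of A is to be referenced. UPLO = MagmaLower: only the lower triangular part of A is to be referenced. |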
[in] | n | INTEGER. On entry, N specifies the order of the matrix A. N must be at least zero. |
[in] | alpha | COMPLEX. On entry, ALPHA specifies the scalar alpha. |
[in] | A | COMPLEX array of DIMENSION ( LDA, n ). Before entry with UPLO = MagmaUpper, the leading n by n upper triangular part of the array A must contain the upper triangular part of the symmetric matrix and the strictly lower triangular part of A is not referenced. Before entry with UPLO = MagmaLower, the leading n by n lower triangular part of the array A must contain the lower triangular part of the symmetric matrix and the strictly upper triangular part of A is not referenced. Note that the imaginary parts of the diagonal elements need not be set and are assumed to be zero. |
[in] | lda | INTEGER. On entry, LDA specifies the first dimension of A as declared in the calling (sub)program. LDA must be at least max( 1, n ). It is recommended that LDA be a multiple of 16; otherwise performance deteriorates because the memory accesses are not fully coalesced. |
[in] | x | COMPLEX array of dimension at least ( 1 + ( n - 1 )*abs( INCX ) ). Before entry, the incremented array X must contain the n element vector x. |
[in] | incx | INTEGER. On entry, INCX specifies the increment for the elements of X. INCX must not be zero. |
[in] | beta | COMPLEX. On entry, BETA specifies the scalar beta. When BETA is supplied as zero then Y need not be set on input. |
[in,out] | y | COMPLEX array of dimension at least ( 1 + ( n - 1 )*abs( INCY ) ). Before entry, the incremented array Y must contain the n element vector y. On exit, Y is overwritten by the updated vector y. |
[in] | incy | INTEGER. On entry, INCY specifies the increment for the elements of Y. INCY must not be zero. |
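The key difference from the Hermitian routines is that the unstored triangle is the plain transpose of the stored one (A = A^T), so no conjugation is applied. A minimal host-side C99 sketch, again with hypothetical names (`ref_csymv`, and `uplo_t` standing in for magma_uplo_t), `float complex` in place of magmaFloatComplex, and reference-BLAS handling of negative increments assumed; it uses the diagonal entries as stored.

```c
#include <complex.h>

/* Hypothetical stand-in for magma_uplo_t (MagmaUpper / MagmaLower). */
typedef enum { UPLO_UPPER, UPLO_LOWER } uplo_t;

/* Host reference for y := alpha*A*x + beta*y with complex symmetric A
   (A = A^T), column-major, reading only the triangle selected by uplo. */
static void ref_csymv(uplo_t uplo, int n, float complex alpha,
                      const float complex *A, int lda,
                      const float complex *x, int incx,
                      float complex beta, float complex *y, int incy)
{
    /* Reference-BLAS convention: a negative increment walks the vector backwards. */
    int kx = (incx > 0) ? 0 : -(n - 1) * incx;
    int ky = (incy > 0) ? 0 : -(n - 1) * incy;

    for (int i = 0; i < n; ++i) {
        float complex sum = 0.0f;
        for (int j = 0; j < n; ++j) {
            /* Use (i,j) if it lies in the stored triangle, else its transpose
               (j,i). No conjugation, unlike the Hermitian case. */
            int in_triangle = (uplo == UPLO_UPPER) ? (i <= j) : (i >= j);
            float complex aij = in_triangle ? A[i + j * lda] : A[j + i * lda];
            sum += aij * x[kx + j * incx];
        }
        float complex *yi = &y[ky + i * incy];
        /* When beta == 0, y is not read, so it need not be set on input. */
        *yi = (beta == 0.0f) ? alpha * sum : alpha * sum + beta * *yi;
    }
}
```

For example, with A = [[2, 1+i], [1+i, 3]] and x = (1, i), alpha = 1 and beta = 0 give y = (1+i, 1+4i).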
magma_int_t magmablas_csymv_work (magma_uplo_t uplo, magma_int_t n, magmaFloatComplex alpha, const magmaFloatComplex *A, magma_int_t lda, const magmaFloatComplex *x, magma_int_t incx, magmaFloatComplex beta, magmaFloatComplex *y, magma_int_t incy, magmaFloatComplex *dwork, magma_int_t lwork)
magmablas_csymv_work performs the matrix-vector operation:
y := alpha*A*x + beta*y,
where alpha and beta are scalars, x and y are n element vectors and A is an n by n complex symmetric matrix.
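[in] | uplo | magma_uplo_t. On entry, UPLO specifies whether the upper or lower triangular part of the array A is to be referenced, as follows: UPLO = MagmaUpper: only the upper triangular part of A is to be referenced. UPLO = MagmaLower: only the lower triangular part of A is to be referenced. |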
[in] | n | INTEGER. On entry, N specifies the order of the matrix A. N must be at least zero. |
[in] | alpha | COMPLEX. On entry, ALPHA specifies the scalar alpha. |
[in] | A | COMPLEX array of DIMENSION ( LDA, n ). Before entry with UPLO = MagmaUpper, the leading n by n upper triangular part of the array A must contain the upper triangular part of the symmetric matrix and the strictly lower triangular part of A is not referenced. Before entry with UPLO = MagmaLower, the leading n by n lower triangular part of the array A must contain the lower triangular part of the symmetric matrix and the strictly upper triangular part of A is not referenced. Note that the imaginary parts of the diagonal elements need not be set and are assumed to be zero. |
[in] | lda | INTEGER. On entry, LDA specifies the first dimension of A as declared in the calling (sub)program. LDA must be at least max( 1, n ). It is recommended that LDA be a multiple of 16; otherwise performance deteriorates because the memory accesses are not fully coalesced. |
[in] | x | COMPLEX array of dimension at least ( 1 + ( n - 1 )*abs( INCX ) ). Before entry, the incremented array X must contain the n element vector x. |
[in] | incx | INTEGER. On entry, INCX specifies the increment for the elements of X. INCX must not be zero. |
[in] | beta | COMPLEX. On entry, BETA specifies the scalar beta. When BETA is supplied as zero then Y need not be set on input. |
[in,out] | y | COMPLEX array of dimension at least ( 1 + ( n - 1 )*abs( INCY ) ). Before entry, the incremented array Y must contain the n element vector y. On exit, Y is overwritten by the updated vector y. |
[in] | incy | INTEGER. On entry, INCY specifies the increment for the elements of Y. INCY must not be zero. |
[in] | dwork | (workspace) COMPLEX array on the GPU, of dimension (MAX(1, LWORK)). |
[in] | lwork | INTEGER. The dimension of the array DWORK. LWORK >= LDA * ceil( N / NB_X ), where NB_X = 64. |
MAGMA implements csymv in two steps: (1) each thread block performs its part of the multiplication and writes the intermediate result to dwork; (2) the intermediate results are summed and the final result is stored in y.
magmablas_csymv_work requires the user to provide a workspace, whereas magmablas_csymv is a wrapper that allocates the workspace internally and provides the same interface as cuBLAS.
If csymv is called frequently, we suggest using magmablas_csymv_work instead of magmablas_csymv, because the overhead of allocating and freeing device memory inside magmablas_csymv hurts performance. Our tests show that this penalty is about 10 Gflop/s when the matrix size is around 10000.