MAGMA  1.5.0
Matrix Algebra for GPU and Multicore Architectures
double-complex precision

Functions

void magmablas_zgemv_batched (magma_trans_t trans, magma_int_t m, magma_int_t n, magmaDoubleComplex alpha, magmaDoubleComplex **A_array, magma_int_t lda, magmaDoubleComplex **x_array, magma_int_t incx, magmaDoubleComplex beta, magmaDoubleComplex **y_array, magma_int_t incy, magma_int_t batchCount)
 This routine computes y = alpha*op( A )*x + beta*y on the GPU, where A = A_array[i], x = x_array[i], and y = y_array[i] for i = 0, ..., batchCount-1.
 
magma_int_t magmablas_zhemv_work (magma_uplo_t uplo, magma_int_t n, magmaDoubleComplex alpha, const magmaDoubleComplex *A, magma_int_t lda, const magmaDoubleComplex *x, magma_int_t incx, magmaDoubleComplex beta, magmaDoubleComplex *y, magma_int_t incy, magmaDoubleComplex *dwork, magma_int_t lwork)
 magmablas_zhemv_work performs the matrix-vector operation y := alpha*A*x + beta*y, where A is an n by n Hermitian matrix.
 
magma_int_t magmablas_zhemv (magma_uplo_t uplo, magma_int_t n, magmaDoubleComplex alpha, const magmaDoubleComplex *A, magma_int_t lda, const magmaDoubleComplex *x, magma_int_t incx, magmaDoubleComplex beta, magmaDoubleComplex *y, magma_int_t incy)
 magmablas_zhemv performs the matrix-vector operation y := alpha*A*x + beta*y, where A is an n by n Hermitian matrix.
 
magma_int_t magmablas_zhemv_mgpu_offset (magma_uplo_t uplo, magma_int_t n, magmaDoubleComplex alpha, magmaDoubleComplex **A, magma_int_t lda, magmaDoubleComplex **x, magma_int_t incx, magmaDoubleComplex beta, magmaDoubleComplex **y, magma_int_t incy, magmaDoubleComplex **work, magma_int_t lwork, magma_int_t num_gpus, magma_int_t nb, magma_int_t offset, magma_queue_t stream[][10])
 magmablas_zhemv_mgpu_offset performs the matrix-vector operation y := alpha*A*x + beta*y on multiple GPUs, where A is an n by n Hermitian matrix.
 
magma_int_t magmablas_zsymv_work (magma_uplo_t uplo, magma_int_t n, magmaDoubleComplex alpha, const magmaDoubleComplex *A, magma_int_t lda, const magmaDoubleComplex *x, magma_int_t incx, magmaDoubleComplex beta, magmaDoubleComplex *y, magma_int_t incy, magmaDoubleComplex *dwork, magma_int_t lwork)
 magmablas_zsymv_work performs the matrix-vector operation y := alpha*A*x + beta*y, where A is an n by n complex symmetric matrix.
 
magma_int_t magmablas_zsymv (magma_uplo_t uplo, magma_int_t n, magmaDoubleComplex alpha, const magmaDoubleComplex *A, magma_int_t lda, const magmaDoubleComplex *x, magma_int_t incx, magmaDoubleComplex beta, magmaDoubleComplex *y, magma_int_t incy)
 magmablas_zsymv performs the matrix-vector operation y := alpha*A*x + beta*y, where A is an n by n complex symmetric matrix.
 

Detailed Description

Function Documentation

void magmablas_zgemv_batched ( magma_trans_t  trans,
magma_int_t  m,
magma_int_t  n,
magmaDoubleComplex  alpha,
magmaDoubleComplex **  A_array,
magma_int_t  lda,
magmaDoubleComplex **  x_array,
magma_int_t  incx,
magmaDoubleComplex  beta,
magmaDoubleComplex **  y_array,
magma_int_t  incy,
magma_int_t  batchCount 
)

This routine computes y = alpha*op( A )*x + beta*y on the GPU, where A = A_array[i], x = x_array[i], and y = y_array[i] for i = 0, ..., batchCount-1.

This is a batched version.

Parameters
[in]  trans  magma_trans_t. On entry, TRANS specifies the form of op( A ) to be used in the matrix-vector operation as follows:
  • = MagmaNoTrans: op( A ) = A.
  • = MagmaTrans: op( A ) = A**T.
  • = MagmaConjTrans: op( A ) = A**H.
[in]  m  INTEGER. On entry, M specifies the number of rows of the matrix op( A ).
[in]  n  INTEGER. On entry, N specifies the number of columns of the matrix op( A ).
[in]  alpha  COMPLEX*16. On entry, ALPHA specifies the scalar alpha.
[in]  A_array  A = A_array[i], where each A is a COMPLEX*16 array of dimension ( LDA, n ) on the GPU.
[in]  lda  INTEGER. LDA specifies the leading dimension of each A.
[in]  x_array  x = x_array[i], where each x is a COMPLEX*16 array of dimension n on the GPU.
[in]  incx  INTEGER. On entry, INCX specifies the increment for the elements of each x. INCX must not be zero.
[in]  beta  COMPLEX*16. On entry, BETA specifies the scalar beta.
[out]  y_array  y = y_array[i], where each y is a COMPLEX*16 array of dimension m on the GPU. On exit, y = alpha*op( A )*x + beta*y.
[in]  incy  INTEGER. On entry, INCY specifies the increment for the elements of each y. INCY must not be zero.
[in]  batchCount  INTEGER. The number of pointers contained in A_array, x_array, and y_array.
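
The batched interface takes arrays of device pointers rather than the matrices themselves. The following is a minimal sketch, not taken from the MAGMA sources, of how such pointer arrays might be set up for a batch stored contiguously on the GPU. It assumes the pointer arrays themselves must reside in device memory, as is usual for MAGMA batched routines; the helper name batched_gemv_sketch and the contiguous storage layout are illustrative assumptions, and all error checking is omitted.

    #include <stdlib.h>
    #include <cuda_runtime.h>
    #include "magma.h"

    /* Illustrative sketch: y_i := alpha*A_i*x_i + beta*y_i for a batch of
       m-by-n matrices stored back to back on the GPU.  Error checks omitted. */
    void batched_gemv_sketch( magma_int_t m, magma_int_t n, magma_int_t batchCount,
                              magmaDoubleComplex *dA, magma_int_t ldda,  /* matrices, ldda*n apart        */
                              magmaDoubleComplex *dx,                    /* length-n input vectors, packed */
                              magmaDoubleComplex *dy )                   /* length-m output vectors, packed */
    {
        /* build host arrays of device pointers, one entry per problem in the batch */
        magmaDoubleComplex **hA = (magmaDoubleComplex**) malloc( batchCount * sizeof(magmaDoubleComplex*) );
        magmaDoubleComplex **hx = (magmaDoubleComplex**) malloc( batchCount * sizeof(magmaDoubleComplex*) );
        magmaDoubleComplex **hy = (magmaDoubleComplex**) malloc( batchCount * sizeof(magmaDoubleComplex*) );
        for (magma_int_t i = 0; i < batchCount; ++i) {
            hA[i] = dA + i * ldda * n;
            hx[i] = dx + i * n;
            hy[i] = dy + i * m;
        }

        /* copy the pointer arrays to the GPU */
        magmaDoubleComplex **dA_array, **dx_array, **dy_array;
        cudaMalloc( (void**)&dA_array, batchCount * sizeof(magmaDoubleComplex*) );
        cudaMalloc( (void**)&dx_array, batchCount * sizeof(magmaDoubleComplex*) );
        cudaMalloc( (void**)&dy_array, batchCount * sizeof(magmaDoubleComplex*) );
        cudaMemcpy( dA_array, hA, batchCount * sizeof(magmaDoubleComplex*), cudaMemcpyHostToDevice );
        cudaMemcpy( dx_array, hx, batchCount * sizeof(magmaDoubleComplex*), cudaMemcpyHostToDevice );
        cudaMemcpy( dy_array, hy, batchCount * sizeof(magmaDoubleComplex*), cudaMemcpyHostToDevice );

        /* y_i := 1*A_i*x_i + 0*y_i for every i in the batch */
        magmablas_zgemv_batched( MagmaNoTrans, m, n, MAGMA_Z_ONE,
                                 dA_array, ldda, dx_array, 1,
                                 MAGMA_Z_ZERO, dy_array, 1, batchCount );

        cudaFree( dA_array );  cudaFree( dx_array );  cudaFree( dy_array );
        free( hA );  free( hx );  free( hy );
    }
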
magma_int_t magmablas_zhemv ( magma_uplo_t  uplo,
magma_int_t  n,
magmaDoubleComplex  alpha,
const magmaDoubleComplex *  A,
magma_int_t  lda,
const magmaDoubleComplex *  x,
magma_int_t  incx,
magmaDoubleComplex  beta,
magmaDoubleComplex *  y,
magma_int_t  incy 
)

magmablas_zhemv performs the matrix-vector operation:

y := alpha*A*x + beta*y,

where alpha and beta are scalars, x and y are n element vectors and A is an n by n Hermitian matrix.

Parameters
[in]  uplo  magma_uplo_t. On entry, UPLO specifies whether the upper or lower triangular part of the array A is to be referenced as follows:
  • = MagmaUpper: Only the upper triangular part of A is to be referenced.
  • = MagmaLower: Only the lower triangular part of A is to be referenced.
[in]  n  INTEGER. On entry, N specifies the order of the matrix A. N must be at least zero.
[in]  alpha  COMPLEX*16. On entry, ALPHA specifies the scalar alpha.
[in]  A  COMPLEX*16 array of DIMENSION ( LDA, n ). Before entry with UPLO = MagmaUpper, the leading n by n upper triangular part of the array A must contain the upper triangular part of the Hermitian matrix, and the strictly lower triangular part of A is not referenced. Before entry with UPLO = MagmaLower, the leading n by n lower triangular part of the array A must contain the lower triangular part of the Hermitian matrix, and the strictly upper triangular part of A is not referenced. Note that the imaginary parts of the diagonal elements need not be set and are assumed to be zero.
[in]  lda  INTEGER. On entry, LDA specifies the first dimension of A as declared in the calling (sub)program. LDA must be at least max( 1, n ). It is recommended that lda be a multiple of 16; otherwise performance degrades because the memory accesses are not fully coalesced.
[in]  x  COMPLEX*16 array of dimension at least ( 1 + ( n - 1 )*abs( INCX ) ). Before entry, the incremented array X must contain the n element vector x.
[in]  incx  INTEGER. On entry, INCX specifies the increment for the elements of X. INCX must not be zero.
[in]  beta  COMPLEX*16. On entry, BETA specifies the scalar beta. When BETA is supplied as zero then Y need not be set on input.
[in,out]  y  COMPLEX*16 array of dimension at least ( 1 + ( n - 1 )*abs( INCY ) ). Before entry, the incremented array Y must contain the n element vector y. On exit, Y is overwritten by the updated vector y.
[in]  incy  INTEGER. On entry, INCY specifies the increment for the elements of Y. INCY must not be zero.
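
As a quick illustration, a minimal, hypothetical call sequence for data already resident on the GPU might look as follows. The helper name hemv_sketch is an assumption for illustration; dA, dx, and dy are assumed to have been allocated and filled by the caller, and error handling is omitted.

    #include "magma.h"

    /* Illustrative sketch: y := alpha*A*x + beta*y with A an n-by-n Hermitian
       matrix whose lower triangle is stored in device memory.               */
    void hemv_sketch( magma_int_t n, magmaDoubleComplex *dA, magma_int_t ldda,
                      magmaDoubleComplex *dx, magmaDoubleComplex *dy )
    {
        magmaDoubleComplex alpha = MAGMA_Z_MAKE( 2.0, 0.0 );
        magmaDoubleComplex beta  = MAGMA_Z_MAKE( 1.0, 0.0 );

        /* only the lower triangle of dA is referenced; ldda is ideally a
           multiple of 16 (see the lda description above).                */
        magmablas_zhemv( MagmaLower, n, alpha, dA, ldda, dx, 1, beta, dy, 1 );
    }
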
magma_int_t magmablas_zhemv_mgpu_offset ( magma_uplo_t  uplo,
magma_int_t  n,
magmaDoubleComplex  alpha,
magmaDoubleComplex **  A,
magma_int_t  lda,
magmaDoubleComplex **  x,
magma_int_t  incx,
magmaDoubleComplex  beta,
magmaDoubleComplex **  y,
magma_int_t  incy,
magmaDoubleComplex **  work,
magma_int_t  lwork,
magma_int_t  num_gpus,
magma_int_t  nb,
magma_int_t  offset,
magma_queue_t  stream[][10] 
)

magmablas_zhemv_mgpu_offset performs the matrix-vector operation:

y := alpha*A*x + beta*y,

where alpha and beta are scalars, x and y are n element vectors and A is an n by n Hermitian matrix.

Parameters
[in]  uplo  magma_uplo_t. On entry, UPLO specifies whether the upper or lower triangular part of the array A is to be referenced as follows:
  • = MagmaUpper: Only the upper triangular part of A is to be referenced.
  • = MagmaLower: Only the lower triangular part of A is to be referenced.
[in]  n  INTEGER. On entry, N specifies the order of the matrix A. N must be at least zero.
[in]  alpha  COMPLEX*16. On entry, ALPHA specifies the scalar alpha.
[in]  A  COMPLEX*16 array of DIMENSION ( LDA, n ). Before entry with UPLO = MagmaUpper, the leading n by n upper triangular part of the array A must contain the upper triangular part of the Hermitian matrix, and the strictly lower triangular part of A is not referenced. Before entry with UPLO = MagmaLower, the leading n by n lower triangular part of the array A must contain the lower triangular part of the Hermitian matrix, and the strictly upper triangular part of A is not referenced. Note that the imaginary parts of the diagonal elements need not be set and are assumed to be zero.
[in]  lda  INTEGER. On entry, LDA specifies the first dimension of A as declared in the calling (sub)program. LDA must be at least max( 1, n ). It is recommended that lda be a multiple of 16; otherwise performance degrades because the memory accesses are not fully coalesced.
[in]  x  COMPLEX*16 array of dimension at least ( 1 + ( n - 1 )*abs( INCX ) ). Before entry, the incremented array X must contain the n element vector x.
[in]  incx  INTEGER. On entry, INCX specifies the increment for the elements of X. INCX must not be zero.
[in]  beta  COMPLEX*16. On entry, BETA specifies the scalar beta. When BETA is supplied as zero then Y need not be set on input.
[in,out]  y  COMPLEX*16 array of dimension at least ( 1 + ( n - 1 )*abs( INCY ) ). Before entry, the incremented array Y must contain the n element vector y. On exit, Y is overwritten by the updated vector y.
[in]  incy  INTEGER. On entry, INCY specifies the increment for the elements of Y. INCY must not be zero.
magma_int_t magmablas_zhemv_work ( magma_uplo_t  uplo,
magma_int_t  n,
magmaDoubleComplex  alpha,
const magmaDoubleComplex *  A,
magma_int_t  lda,
const magmaDoubleComplex *  x,
magma_int_t  incx,
magmaDoubleComplex  beta,
magmaDoubleComplex *  y,
magma_int_t  incy,
magmaDoubleComplex *  dwork,
magma_int_t  lwork 
)

magmablas_zhemv_work performs the matrix-vector operation:

y := alpha*A*x + beta*y,

where alpha and beta are scalars, x and y are n element vectors and A is an n by n Hermitian matrix.

Parameters
[in]  uplo  magma_uplo_t. On entry, UPLO specifies whether the upper or lower triangular part of the array A is to be referenced as follows:
  • = MagmaUpper: Only the upper triangular part of A is to be referenced.
  • = MagmaLower: Only the lower triangular part of A is to be referenced.
[in]  n  INTEGER. On entry, N specifies the order of the matrix A. N must be at least zero.
[in]  alpha  COMPLEX*16. On entry, ALPHA specifies the scalar alpha.
[in]  A  COMPLEX*16 array of DIMENSION ( LDA, n ). Before entry with UPLO = MagmaUpper, the leading n by n upper triangular part of the array A must contain the upper triangular part of the Hermitian matrix, and the strictly lower triangular part of A is not referenced. Before entry with UPLO = MagmaLower, the leading n by n lower triangular part of the array A must contain the lower triangular part of the Hermitian matrix, and the strictly upper triangular part of A is not referenced. Note that the imaginary parts of the diagonal elements need not be set and are assumed to be zero.
[in]  lda  INTEGER. On entry, LDA specifies the first dimension of A as declared in the calling (sub)program. LDA must be at least max( 1, n ). It is recommended that lda be a multiple of 16; otherwise performance degrades because the memory accesses are not fully coalesced.
[in]  x  COMPLEX*16 array of dimension at least ( 1 + ( n - 1 )*abs( INCX ) ). Before entry, the incremented array X must contain the n element vector x.
[in]  incx  INTEGER. On entry, INCX specifies the increment for the elements of X. INCX must not be zero.
[in]  beta  COMPLEX*16. On entry, BETA specifies the scalar beta. When BETA is supplied as zero then Y need not be set on input.
[in,out]  y  COMPLEX*16 array of dimension at least ( 1 + ( n - 1 )*abs( INCY ) ). Before entry, the incremented array Y must contain the n element vector y. On exit, Y is overwritten by the updated vector y.
[in]  incy  INTEGER. On entry, INCY specifies the increment for the elements of Y. INCY must not be zero.
[in]  dwork  (workspace) COMPLEX*16 array on the GPU, dimension ( MAX( 1, LWORK ) ).
[in]  lwork  INTEGER. The dimension of the array DWORK. LWORK >= LDA * ceil( N / NB_X ), where NB_X = 64.

MAGMA implements zhemv in two steps: 1) perform the multiplication in each thread block and store the intermediate values in dwork; 2) sum the intermediate values and store the final result in y.

magmablas_zhemv_work requires the user to provide a workspace, while magmablas_zhemv is a wrapper routine that allocates the workspace internally and provides the same interface as cuBLAS.

If you need to call zhemv frequently, we suggest using magmablas_zhemv_work instead of magmablas_zhemv, as the overhead of allocating and freeing device memory inside magmablas_zhemv hurts performance. Our tests show that this penalty is about 10 Gflop/s when the matrix size is around 10000.
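
Below is a minimal sketch, assuming the data is already on the GPU, of sizing the workspace with the LWORK formula above and reusing it across repeated calls, which is the situation where magmablas_zhemv_work is preferable to magmablas_zhemv. The helper name hemv_work_sketch and the loop are illustrative; error handling is omitted.

    #include "magma.h"

    /* Illustrative sketch: allocate one workspace and reuse it across many
       zhemv calls instead of letting magmablas_zhemv allocate/free each time. */
    void hemv_work_sketch( magma_int_t n, magmaDoubleComplex *dA, magma_int_t ldda,
                           magmaDoubleComplex *dx, magmaDoubleComplex *dy, int iters )
    {
        const magma_int_t nb_x  = 64;                              /* NB_X from the text above */
        magma_int_t       lwork = ldda * ((n + nb_x - 1) / nb_x);  /* LDA * ceil( N / NB_X )   */

        magmaDoubleComplex *dwork;
        magma_zmalloc( &dwork, lwork );                            /* GPU workspace, allocated once */

        for (int it = 0; it < iters; ++it) {
            magmablas_zhemv_work( MagmaLower, n, MAGMA_Z_ONE, dA, ldda,
                                  dx, 1, MAGMA_Z_ZERO, dy, 1, dwork, lwork );
        }
        magma_free( dwork );
    }
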

magma_int_t magmablas_zsymv ( magma_uplo_t  uplo,
magma_int_t  n,
magmaDoubleComplex  alpha,
const magmaDoubleComplex *  A,
magma_int_t  lda,
const magmaDoubleComplex *  x,
magma_int_t  incx,
magmaDoubleComplex  beta,
magmaDoubleComplex *  y,
magma_int_t  incy 
)

magmablas_zsymv performs the matrix-vector operation:

y := alpha*A*x + beta*y,

where alpha and beta are scalars, x and y are n element vectors and A is an n by n complex symmetric matrix.

Parameters
[in]  uplo  magma_uplo_t. On entry, UPLO specifies whether the upper or lower triangular part of the array A is to be referenced as follows:
  • = MagmaUpper: Only the upper triangular part of A is to be referenced.
  • = MagmaLower: Only the lower triangular part of A is to be referenced.
[in]  n  INTEGER. On entry, N specifies the order of the matrix A. N must be at least zero.
[in]  alpha  COMPLEX*16. On entry, ALPHA specifies the scalar alpha.
[in]  A  COMPLEX*16 array of DIMENSION ( LDA, n ). Before entry with UPLO = MagmaUpper, the leading n by n upper triangular part of the array A must contain the upper triangular part of the symmetric matrix, and the strictly lower triangular part of A is not referenced. Before entry with UPLO = MagmaLower, the leading n by n lower triangular part of the array A must contain the lower triangular part of the symmetric matrix, and the strictly upper triangular part of A is not referenced.
[in]  lda  INTEGER. On entry, LDA specifies the first dimension of A as declared in the calling (sub)program. LDA must be at least max( 1, n ). It is recommended that lda be a multiple of 16; otherwise performance degrades because the memory accesses are not fully coalesced.
[in]  x  COMPLEX*16 array of dimension at least ( 1 + ( n - 1 )*abs( INCX ) ). Before entry, the incremented array X must contain the n element vector x.
[in]  incx  INTEGER. On entry, INCX specifies the increment for the elements of X. INCX must not be zero.
[in]  beta  COMPLEX*16. On entry, BETA specifies the scalar beta. When BETA is supplied as zero then Y need not be set on input.
[in,out]  y  COMPLEX*16 array of dimension at least ( 1 + ( n - 1 )*abs( INCY ) ). Before entry, the incremented array Y must contain the n element vector y. On exit, Y is overwritten by the updated vector y.
[in]  incy  INTEGER. On entry, INCY specifies the increment for the elements of Y. INCY must not be zero.
magma_int_t magmablas_zsymv_work ( magma_uplo_t  uplo,
magma_int_t  n,
magmaDoubleComplex  alpha,
const magmaDoubleComplex *  A,
magma_int_t  lda,
const magmaDoubleComplex *  x,
magma_int_t  incx,
magmaDoubleComplex  beta,
magmaDoubleComplex *  y,
magma_int_t  incy,
magmaDoubleComplex *  dwork,
magma_int_t  lwork 
)

magmablas_zsymv_work performs the matrix-vector operation:

y := alpha*A*x + beta*y,

where alpha and beta are scalars, x and y are n element vectors and A is an n by n complex symmetric matrix.

Parameters
[in]  uplo  magma_uplo_t. On entry, UPLO specifies whether the upper or lower triangular part of the array A is to be referenced as follows:
  • = MagmaUpper: Only the upper triangular part of A is to be referenced.
  • = MagmaLower: Only the lower triangular part of A is to be referenced.
[in]  n  INTEGER. On entry, N specifies the order of the matrix A. N must be at least zero.
[in]  alpha  COMPLEX*16. On entry, ALPHA specifies the scalar alpha.
[in]  A  COMPLEX*16 array of DIMENSION ( LDA, n ). Before entry with UPLO = MagmaUpper, the leading n by n upper triangular part of the array A must contain the upper triangular part of the symmetric matrix, and the strictly lower triangular part of A is not referenced. Before entry with UPLO = MagmaLower, the leading n by n lower triangular part of the array A must contain the lower triangular part of the symmetric matrix, and the strictly upper triangular part of A is not referenced.
[in]  lda  INTEGER. On entry, LDA specifies the first dimension of A as declared in the calling (sub)program. LDA must be at least max( 1, n ). It is recommended that lda be a multiple of 16; otherwise performance degrades because the memory accesses are not fully coalesced.
[in]  x  COMPLEX*16 array of dimension at least ( 1 + ( n - 1 )*abs( INCX ) ). Before entry, the incremented array X must contain the n element vector x.
[in]  incx  INTEGER. On entry, INCX specifies the increment for the elements of X. INCX must not be zero.
[in]  beta  COMPLEX*16. On entry, BETA specifies the scalar beta. When BETA is supplied as zero then Y need not be set on input.
[in,out]  y  COMPLEX*16 array of dimension at least ( 1 + ( n - 1 )*abs( INCY ) ). Before entry, the incremented array Y must contain the n element vector y. On exit, Y is overwritten by the updated vector y.
[in]  incy  INTEGER. On entry, INCY specifies the increment for the elements of Y. INCY must not be zero.
[in]  dwork  (workspace) COMPLEX*16 array on the GPU, dimension ( MAX( 1, LWORK ) ).
[in]  lwork  INTEGER. The dimension of the array DWORK. LWORK >= LDA * ceil( N / NB_X ), where NB_X = 64.

MAGMA implements zsymv in two steps: 1) perform the multiplication in each thread block and store the intermediate values in dwork; 2) sum the intermediate values and store the final result in y.

magmablas_zsymv_work requires the user to provide a workspace, while magmablas_zsymv is a wrapper routine that allocates the workspace internally and provides the same interface as cuBLAS.

If you need to call zsymv frequently, we suggest using magmablas_zsymv_work instead of magmablas_zsymv, as the overhead of allocating and freeing device memory inside magmablas_zsymv hurts performance. Our tests show that this penalty is about 10 Gflop/s when the matrix size is around 10000.
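
As a worked example of the workspace size (with an illustrative, not prescribed, leading dimension): for n = 10000 and lda = 10016 (n rounded up to a multiple of 16), LWORK >= 10016 * ceil( 10000 / 64 ) = 10016 * 157 = 1,572,512 complex elements, i.e. roughly 24 MiB at 16 bytes per element; a small one-time allocation compared with the per-call allocate/free overhead described above.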