![]() |
MAGMA
1.5.0
Matrix Algebra for GPU and Multicore Architectures
|
Functions | |
magma_int_t | magma_ztrsm_m (magma_int_t nrgpu, magma_side_t side, magma_uplo_t uplo, magma_trans_t transa, magma_diag_t diag, magma_int_t m, magma_int_t n, magmaDoubleComplex alpha, magmaDoubleComplex *A, magma_int_t lda, magmaDoubleComplex *B, magma_int_t ldb) |
ZTRSM solves one of the matrix equations op( A )*X = alpha*B, or X*op( A ) = alpha*B, where alpha is a scalar, X and B are m by n matrices, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of. More... | |
void | magmablas_zgemm (magma_trans_t TRANSA, magma_trans_t TRANSB, magma_int_t m, magma_int_t n, magma_int_t k, magmaDoubleComplex alpha, const magmaDoubleComplex *d_A, magma_int_t lda, const magmaDoubleComplex *d_B, magma_int_t ldb, magmaDoubleComplex beta, magmaDoubleComplex *d_C, magma_int_t ldc) |
ZGEMM performs one of the matrix-matrix operations. More... | |
void | magmablas_zgemm_reduce (magma_int_t m, magma_int_t n, magma_int_t k, magmaDoubleComplex alpha, const magmaDoubleComplex *d_A, magma_int_t lda, const magmaDoubleComplex *d_B, magma_int_t ldb, magmaDoubleComplex beta, magmaDoubleComplex *d_C, magma_int_t ldc) |
ZGEMM_REDUCE performs one of the matrix-matrix operations. More... | |
void | magmablas_zher2k_mgpu2 (magma_uplo_t uplo, magma_trans_t trans, magma_int_t n, magma_int_t k, magmaDoubleComplex alpha, magmaDoubleComplex *dA[], magma_int_t lda, magma_int_t aoffset, magmaDoubleComplex *dB[], magma_int_t ldb, magma_int_t boffset, double beta, magmaDoubleComplex *dC[], magma_int_t ldc, magma_int_t coffset, magma_int_t ngpu, magma_int_t nb, magma_queue_t streams[][20], magma_int_t nstream) |
ZHER2K performs one of the Hermitian rank 2k operations. More... | |
void | magmablas_zher2k_mgpu_spec (magma_uplo_t uplo, magma_trans_t trans, magma_int_t n, magma_int_t k, magmaDoubleComplex alpha, magmaDoubleComplex *dA[], magma_int_t lda, magma_int_t aoffset, magmaDoubleComplex *dB[], magma_int_t ldb, magma_int_t boffset, double beta, magmaDoubleComplex *dC[], magma_int_t ldc, magma_int_t coffset, magma_int_t ngpu, magma_int_t nb, magma_queue_t streams[][20], magma_int_t nstream) |
ZHER2K performs one of the Hermitian rank 2k operations. More... | |
void | magmablas_ztrsm_work (magma_side_t side, magma_uplo_t uplo, magma_trans_t transA, magma_diag_t diag, magma_int_t m, magma_int_t n, magmaDoubleComplex alpha, const magmaDoubleComplex *dA, magma_int_t ldda, magmaDoubleComplex *dB, magma_int_t lddb, magma_int_t flag, magmaDoubleComplex *d_dinvA, magmaDoubleComplex *dX) |
ztrsm_work solves one of the matrix equations on gpu More... | |
void | magmablas_ztrsm (magma_side_t side, magma_uplo_t uplo, magma_trans_t transA, magma_diag_t diag, magma_int_t m, magma_int_t n, magmaDoubleComplex alpha, const magmaDoubleComplex *dA, magma_int_t ldda, magmaDoubleComplex *dB, magma_int_t lddb) |
void | magmablas_ztrtri_diag_stream (magma_uplo_t uplo, magma_diag_t diag, magma_int_t n, const magmaDoubleComplex *dA, magma_int_t ldda, magmaDoubleComplex *d_dinvA, magma_queue_t stream) |
Inverts the NB x NB diagonal blocks. More... | |
void | magmablas_ztrtri_diag (magma_uplo_t uplo, magma_diag_t diag, magma_int_t n, const magmaDoubleComplex *dA, magma_int_t ldda, magmaDoubleComplex *d_dinvA) |
magma_int_t magma_ztrsm_m | ( | magma_int_t | nrgpu, |
magma_side_t | side, | ||
magma_uplo_t | uplo, | ||
magma_trans_t | transa, | ||
magma_diag_t | diag, | ||
magma_int_t | m, | ||
magma_int_t | n, | ||
magmaDoubleComplex | alpha, | ||
magmaDoubleComplex * | A, | ||
magma_int_t | lda, | ||
magmaDoubleComplex * | B, | ||
magma_int_t | ldb | ||
) |
ZTRSM solves one of the matrix equations op( A )*X = alpha*B, or X*op( A ) = alpha*B, where alpha is a scalar, X and B are m by n matrices, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of.
op( A ) = A or op( A ) = A**T or op( A ) = conj( A**T ).
The matrix X is overwritten on B.
[in] | nrgpu | INTEGER Number of GPUs to use. |
[in] | side | magma_side_t. On entry, SIDE specifies whether op( A ) appears on the left or right of X as follows:
|
[in] | uplo | magma_uplo_t. On entry, UPLO specifies whether the matrix A is an upper or lower triangular matrix as follows:
|
[in] | transa | magma_trans_t. On entry, TRANSA specifies the form of op( A ) to be used in the matrix multiplication as follows:
|
[in] | diag | magma_diag_t. On entry, DIAG specifies whether or not A is unit triangular as follows:
|
[in] | m | INTEGER. On entry, M specifies the number of rows of B. M must be at least zero. |
[in] | n | INTEGER. On entry, N specifies the number of columns of B. N must be at least zero. |
[in] | alpha | COMPLEX_16. On entry, ALPHA specifies the scalar alpha. When alpha is zero then A is not referenced and B need not be set before entry. |
[in] | A | COMPLEX_16 array of DIMENSION ( LDA, k ), where k is m when SIDE = MagmaLeft and is n when SIDE = MagmaRight. Before entry with UPLO = MagmaUpper, the leading k by k upper triangular part of the array A must contain the upper triangular matrix and the strictly lower triangular part of A is not referenced. Before entry with UPLO = MagmaLower, the leading k by k lower triangular part of the array A must contain the lower triangular matrix and the strictly upper triangular part of A is not referenced. Note that when DIAG = MagmaUnit, the diagonal elements of A are not referenced either, but are assumed to be unity. |
[in] | lda | INTEGER. On entry, LDA specifies the first dimension of A as declared in the calling (sub) program. When SIDE = MagmaLeft then LDA >= max( 1, m ), when SIDE = MagmaRight then LDA >= max( 1, n ). |
[in,out] | B | COMPLEX_16 array of DIMENSION ( LDB, n ). Before entry, the leading m by n part of the array B must contain the right-hand side matrix B, and on exit is overwritten by the solution matrix X. |
[in] | ldb | INTEGER. On entry, LDB specifies the first dimension of B as declared in the calling (sub) program. LDB must be at least max( 1, m ). |
void magmablas_zgemm | ( | magma_trans_t | TRANSA, |
magma_trans_t | TRANSB, | ||
magma_int_t | m, | ||
magma_int_t | n, | ||
magma_int_t | k, | ||
magmaDoubleComplex | alpha, | ||
const magmaDoubleComplex * | d_A, | ||
magma_int_t | lda, | ||
const magmaDoubleComplex * | d_B, | ||
magma_int_t | ldb, | ||
magmaDoubleComplex | beta, | ||
magmaDoubleComplex * | d_C, | ||
magma_int_t | ldc | ||
) |
ZGEMM performs one of the matrix-matrix operations.
C = alpha*op( A )*op( B ) + beta*C,
where op( X ) is one of
op( X ) = X or op( X ) = X**T or op( X ) = X**H,
alpha and beta are scalars, and A, B and C are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix and C an m by n matrix.
[in] | TRANSA | CHARACTER*1. On entry, TRANSA specifies the form of op( A ) to be used in the matrix multiplication as follows:
|
[in] | TRANSB | CHARACTER*1. On entry, TRANSB specifies the form of op( B ) to be used in the matrix multiplication as follows:
|
[in] | m | INTEGER. On entry, M specifies the number of rows of the matrix op( d_A ) and of the matrix d_C. M must be at least zero. |
[in] | n | INTEGER. On entry, N specifies the number of columns of the matrix op( d_B ) and the number of columns of the matrix d_C. N must be at least zero. |
[in] | k | INTEGER. On entry, K specifies the number of columns of the matrix op( d_A ) and the number of rows of the matrix op( d_B ). K must be at least zero. |
[in] | alpha | COMPLEX_16 On entry, ALPHA specifies the scalar alpha. |
[in] | d_A | COMPLEX_16 array of DIMENSION ( LDA, ka ), where ka is k when TRANSA = MagmaNoTrans, and is m otherwise. Before entry with TRANSA = MagmaNoTrans, the leading m by k part of the array d_A must contain the matrix d_A, otherwise the leading k by m part of the array d_A must contain the matrix d_A. |
[in] | lda | INTEGER. On entry, LDA specifies the first dimension of A as declared in the calling (sub) program. When TRANSA = MagmaNoTrans then LDA must be at least max( 1, m ), otherwise LDA must be at least max( 1, k ). |
[in] | d_B | COMPLEX_16 array of DIMENSION ( LDB, kb ), where kb is n when TRANSB = MagmaNoTrans, and is k otherwise. Before entry with TRANSB = MagmaNoTrans, the leading k by n part of the array d_B must contain the matrix d_B, otherwise the leading n by k part of the array d_B must contain the matrix d_B. |
[in] | ldb | INTEGER. On entry, LDB specifies the first dimension of d_B as declared in the calling (sub) program. When TRANSB = MagmaNoTrans then LDB must be at least max( 1, k ), otherwise LDB must be at least max( 1, n ). |
[in] | beta | COMPLEX_16. On entry, BETA specifies the scalar beta. When BETA is supplied as zero then d_C need not be set on input. |
[in,out] | d_C | COMPLEX_16 array of DIMENSION ( LDC, n ). Before entry, the leading m by n part of the array d_C must contain the matrix d_C, except when beta is zero, in which case d_C need not be set on entry. On exit, the array d_C is overwritten by the m by n matrix ( alpha*op( d_A )*op( d_B ) + beta*d_C ). |
[in] | ldc | INTEGER. On entry, LDC specifies the first dimension of d_C as declared in the calling (sub) program. LDC must be at least max( 1, m ). |
void magmablas_zgemm_reduce | ( | magma_int_t | m, |
magma_int_t | n, | ||
magma_int_t | k, | ||
magmaDoubleComplex | alpha, | ||
const magmaDoubleComplex * | d_A, | ||
magma_int_t | lda, | ||
const magmaDoubleComplex * | d_B, | ||
magma_int_t | ldb, | ||
magmaDoubleComplex | beta, | ||
magmaDoubleComplex * | d_C, | ||
magma_int_t | ldc | ||
) |
ZGEMM_REDUCE performs one of the matrix-matrix operations.
C := alpha*A^T*B + beta*C,
where alpha and beta are scalars, and A, B and C are matrices, with A a k-by-m matrix, B a k-by-n matrix, and C an m-by-n matrix.
This routine is tuned for m, n << k. Typically, m and n are expected to be less than 128.
void magmablas_zher2k_mgpu2 | ( | magma_uplo_t | uplo, |
magma_trans_t | trans, | ||
magma_int_t | n, | ||
magma_int_t | k, | ||
magmaDoubleComplex | alpha, | ||
magmaDoubleComplex * | dA[], | ||
magma_int_t | lda, | ||
magma_int_t | aoffset, | ||
magmaDoubleComplex * | dB[], | ||
magma_int_t | ldb, | ||
magma_int_t | boffset, | ||
double | beta, | ||
magmaDoubleComplex * | dC[], | ||
magma_int_t | ldc, | ||
magma_int_t | coffset, | ||
magma_int_t | ngpu, | ||
magma_int_t | nb, | ||
magma_queue_t | streams[][20], | ||
magma_int_t | nstream | ||
) |
ZHER2K performs one of the Hermitian rank 2k operations.
C := alpha*A*B**H + conjg( alpha )*B*A**H + beta*C,
or
C := alpha*A**H*B + conjg( alpha )*B**H*A + beta*C,
where alpha and beta are scalars with beta real, C is an n by n Hermitian matrix and A and B are n by k matrices in the first case and k by n matrices in the second case.
[in] | uplo | magma_uplo_t. On entry, UPLO specifies whether the upper or lower triangular part of the array C is to be referenced as follows:
|
[in] | trans | magma_trans_t. On entry, TRANS specifies the operation to be performed as follows:
|
[in] | n | INTEGER. On entry, N specifies the order of the matrix C. N must be at least zero. |
[in] | k | INTEGER. On entry with TRANS = MagmaNoTrans, K specifies the number of columns of the matrices A and B, and on entry with TRANS = MagmaConjTrans, K specifies the number of rows of the matrices A and B. K must be at least zero. |
[in] | alpha | COMPLEX*16. On entry, ALPHA specifies the scalar alpha. |
[in] | dA | COMPLEX*16 array of DIMENSION ( LDA, ka ), where ka is k when TRANS = MagmaNoTrans, and is n otherwise. Before entry with TRANS = MagmaNoTrans, the leading n by k part of the array A must contain the matrix A, otherwise the leading k by n part of the array A must contain the matrix A. |
[TODO: describe distribution: duplicated on all GPUs.]
[in] | lda | INTEGER. On entry, LDA specifies the first dimension of A as declared in the calling (sub) program. When TRANS = MagmaNoTrans then LDA must be at least max( 1, n ), otherwise LDA must be at least max( 1, k ). |
[in] | aoffset | INTEGER Row offset to start sub-matrix of dA. Uses dA(aoffset:aoffset+n, :). 0 <= aoffset < lda. |
[in] | dB | COMPLEX*16 array of DIMENSION ( LDB, kb ), where kb is k when TRANS = MagmaNoTrans, and is n otherwise. Before entry with TRANS = MagmaNoTrans, the leading n by k part of the array B must contain the matrix B, otherwise the leading k by n part of the array B must contain the matrix B. |
[TODO: describe distribution: duplicated on all GPUs.]
[in] | ldb | INTEGER. On entry, LDB specifies the first dimension of B as declared in the calling (sub) program. When TRANS = MagmaNoTrans then LDB must be at least max( 1, n ), otherwise LDB must be at least max( 1, k ). |
[in] | boffset | INTEGER Row offset to start sub-matrix of dB. Uses dB(boffset:boffset+n, :). 0 <= boffset < ldb. |
[in] | beta | DOUBLE PRECISION. On entry, BETA specifies the scalar beta. |
[in,out] | dC | COMPLEX*16 array of DIMENSION ( LDC, n ). Before entry with UPLO = MagmaUpper, the leading n by n upper triangular part of the array C must contain the upper triangular part of the Hermitian matrix and the strictly lower triangular part of C is not referenced. On exit, the upper triangular part of the array C is overwritten by the upper triangular part of the updated matrix. Before entry with UPLO = MagmaLower, the leading n by n lower triangular part of the array C must contain the lower triangular part of the Hermitian matrix and the strictly upper triangular part of C is not referenced. On exit, the lower triangular part of the array C is overwritten by the lower triangular part of the updated matrix. Note that the imaginary parts of the diagonal elements need not be set, they are assumed to be zero, and on exit they are set to zero. [TODO: verify] |
[TODO: describe distribution: 1D column block-cyclic across GPUs.]
[in] | ldc | INTEGER. On entry, LDC specifies the first dimension of C as declared in the calling (sub) program. LDC must be at least max( 1, n ). |
[in] | coffset | INTEGER. Row and column offset to start sub-matrix of dC. Uses dC(coffset:coffset+n, coffset:coffset+n). 0 <= coffset < ldc. |
[in] | ngpu | INTEGER. Number of GPUs over which matrix C is distributed. |
[in] | nb | INTEGER. Block size used for distribution of C. |
[in] | streams | array of CUDA streams, of dimension NGPU by 20. Streams to use for running multiple GEMMs in parallel. Only up to NSTREAM streams are used on each GPU. |
[in] | nstream | INTEGER. Number of streams to use on each device |
void magmablas_zher2k_mgpu_spec | ( | magma_uplo_t | uplo, |
magma_trans_t | trans, | ||
magma_int_t | n, | ||
magma_int_t | k, | ||
magmaDoubleComplex | alpha, | ||
magmaDoubleComplex * | dA[], | ||
magma_int_t | lda, | ||
magma_int_t | aoffset, | ||
magmaDoubleComplex * | dB[], | ||
magma_int_t | ldb, | ||
magma_int_t | boffset, | ||
double | beta, | ||
magmaDoubleComplex * | dC[], | ||
magma_int_t | ldc, | ||
magma_int_t | coffset, | ||
magma_int_t | ngpu, | ||
magma_int_t | nb, | ||
magma_queue_t | streams[][20], | ||
magma_int_t | nstream | ||
) |
ZHER2K performs one of the Hermitian rank 2k operations.
C := alpha*A*B**H + conjg( alpha )*B*A**H + beta*C,
or
C := alpha*A**H*B + conjg( alpha )*B**H*A + beta*C,
where alpha and beta are scalars with beta real, C is an n by n Hermitian matrix and A and B are n by k matrices in the first case and k by n matrices in the second case.
This version assumes C has been symmetrized, so both upper and lower are stored, and it maintains the symmetry, doing twice the operations.
[in] | uplo | magma_uplo_t. On entry, UPLO specifies whether the upper or lower triangular part of the array C is to be referenced as follows:
|
[in] | trans | magma_trans_t. On entry, TRANS specifies the operation to be performed as follows:
|
[in] | n | INTEGER. On entry, N specifies the order of the matrix C. N must be at least zero. |
[in] | k | INTEGER. On entry with TRANS = MagmaNoTrans, K specifies the number of columns of the matrices A and B, and on entry with TRANS = MagmaConjTrans, K specifies the number of rows of the matrices A and B. K must be at least zero. |
[in] | alpha | COMPLEX*16. On entry, ALPHA specifies the scalar alpha. |
[in] | dA | COMPLEX*16 array of DIMENSION ( LDA, ka ), where ka is k when TRANS = MagmaNoTrans, and is n otherwise. Before entry with TRANS = MagmaNoTrans, the leading n by k part of the array A must contain the matrix A, otherwise the leading k by n part of the array A must contain the matrix A. |
[TODO: describe distribution: duplicated on all GPUs.]
[in] | lda | INTEGER. On entry, LDA specifies the first dimension of A as declared in the calling (sub) program. When TRANS = MagmaNoTrans then LDA must be at least max( 1, n ), otherwise LDA must be at least max( 1, k ). |
[in] | aoffset | INTEGER Row offset to start sub-matrix of dA. Uses dA(aoffset:aoffset+n, :). 0 <= aoffset < lda. |
[in] | dB | COMPLEX*16 array of DIMENSION ( LDB, kb ), where kb is k when TRANS = MagmaNoTrans, and is n otherwise. Before entry with TRANS = MagmaNoTrans, the leading n by k part of the array B must contain the matrix B, otherwise the leading k by n part of the array B must contain the matrix B. |
[TODO: describe distribution: duplicated on all GPUs.]
[in] | ldb | INTEGER. On entry, LDB specifies the first dimension of B as declared in the calling (sub) program. When TRANS = MagmaNoTrans then LDB must be at least max( 1, n ), otherwise LDB must be at least max( 1, k ). |
[in] | boffset | INTEGER Row offset to start sub-matrix of dB. Uses dB(boffset:boffset+n, :). 0 <= boffset < ldb. |
[in] | beta | DOUBLE PRECISION. On entry, BETA specifies the scalar beta. |
[in,out] | dC | COMPLEX*16 array of DIMENSION ( LDC, n ). Before entry with UPLO = MagmaUpper, the leading n by n upper triangular part of the array C must contain the upper triangular part of the Hermitian matrix and the strictly lower triangular part of C is not referenced. On exit, the upper triangular part of the array C is overwritten by the upper triangular part of the updated matrix. Before entry with UPLO = MagmaLower, the leading n by n lower triangular part of the array C must contain the lower triangular part of the Hermitian matrix and the strictly upper triangular part of C is not referenced. On exit, the lower triangular part of the array C is overwritten by the lower triangular part of the updated matrix. Note that the imaginary parts of the diagonal elements need not be set, they are assumed to be zero, and on exit they are set to zero. [TODO: verify] |
[TODO: describe distribution: 1D column block-cyclic across GPUs.]
[in] | ldc | INTEGER. On entry, LDC specifies the first dimension of C as declared in the calling (sub) program. LDC must be at least max( 1, n ). |
[in] | coffset | INTEGER. Row and column offset to start sub-matrix of dC. Uses dC(coffset:coffset+n, coffset:coffset+n). 0 <= coffset < ldc. |
[in] | ngpu | INTEGER. Number of GPUs over which matrix C is distributed. |
[in] | nb | INTEGER. Block size used for distribution of C. |
[in] | streams | array of CUDA streams, of dimension NGPU by 20. Streams to use for running multiple GEMMs in parallel. Only up to NSTREAM streams are used on each GPU. |
[in] | nstream | INTEGER. Number of streams to use on each device |
void magmablas_ztrsm | ( | magma_side_t | side, |
magma_uplo_t | uplo, | ||
magma_trans_t | transA, | ||
magma_diag_t | diag, | ||
magma_int_t | m, | ||
magma_int_t | n, | ||
magmaDoubleComplex | alpha, | ||
const magmaDoubleComplex * | dA, | ||
magma_int_t | ldda, | ||
magmaDoubleComplex * | dB, | ||
magma_int_t | lddb | ||
) |
void magmablas_ztrsm_work | ( | magma_side_t | side, |
magma_uplo_t | uplo, | ||
magma_trans_t | transA, | ||
magma_diag_t | diag, | ||
magma_int_t | m, | ||
magma_int_t | n, | ||
magmaDoubleComplex | alpha, | ||
const magmaDoubleComplex * | dA, | ||
magma_int_t | ldda, | ||
magmaDoubleComplex * | dB, | ||
magma_int_t | lddb, | ||
magma_int_t | flag, | ||
magmaDoubleComplex * | d_dinvA, | ||
magmaDoubleComplex * | dX | ||
) |
ztrsm_work solves one of the matrix equations on gpu
op(A)*X = alpha*B, or X*op(A) = alpha*B,
where alpha is a scalar, X and B are m by n matrices, A is a unit, or non-unit, upper or lower triangular matrix and op(A) is one of
op(A) = A, or op(A) = A^T, or op(A) = A^H.
The matrix X is overwritten on B.
This is an asynchronous version of magmablas_ztrsm with flag, d_dinvA and dX workspaces as arguments.
[in] | side | magma_side_t. On entry, side specifies whether op(A) appears on the left or right of X as follows:
|
[in] | uplo | magma_uplo_t. On entry, uplo specifies whether the matrix A is an upper or lower triangular matrix as follows:
|
[in] | transA | magma_trans_t. On entry, transA specifies the form of op(A) to be used in the matrix multiplication as follows:
|
[in] | diag | magma_diag_t. On entry, diag specifies whether or not A is unit triangular as follows:
|
[in] | m | INTEGER. On entry, m specifies the number of rows of B. m must be at least zero. |
[in] | n | INTEGER. On entry, n specifies the number of columns of B. n must be at least zero. |
[in] | alpha | COMPLEX_16. On entry, alpha specifies the scalar alpha. When alpha is zero then A is not referenced and B need not be set before entry. |
[in] | dA | COMPLEX_16 array of DIMENSION ( ldda, k ), where k is m when side = MagmaLeft and is n when side = MagmaRight. Before entry with uplo = MagmaUpper, the leading k by k upper triangular part of the array A must contain the upper triangular matrix and the strictly lower triangular part of A is not referenced. Before entry with uplo = MagmaLower, the leading k by k lower triangular part of the array A must contain the lower triangular matrix and the strictly upper triangular part of A is not referenced. Note that when diag = MagmaUnit, the diagonal elements of A are not referenced either, but are assumed to be unity. |
[in] | ldda | INTEGER. On entry, ldda specifies the first dimension of A as declared in the calling (sub) program. When side = MagmaLeft then ldda must be at least max( 1, m ), when side = MagmaRight then ldda must be at least max( 1, n ). |
[in,out] | dB | COMPLEX_16 array of DIMENSION ( lddb, n ). Before entry, the leading m by n part of the array B must contain the right-hand side matrix B, and on exit is overwritten by the solution matrix X. |
[in] | lddb | INTEGER. On entry, lddb specifies the first dimension of B as declared in the calling (sub) program. lddb must be at least max( 1, m ). |
[in] | flag | BOOLEAN. If flag is true, invert diagonal blocks. If flag is false, assume diagonal blocks are already inverted. |
d_dinvA | (workspace) on device. If side == MagmaLeft, d_dinvA must be of size >= ((m+NB-1)/NB)*NB*NB, If side == MagmaRight, d_dinvA must be of size >= ((n+NB-1)/NB)*NB*NB, where NB = 128. | |
dX | (workspace) size m*n, on device. | |
[in] | stream | magma_queue_t Stream to execute in. |
void magmablas_ztrtri_diag | ( | magma_uplo_t | uplo, |
magma_diag_t | diag, | ||
magma_int_t | n, | ||
const magmaDoubleComplex * | dA, | ||
magma_int_t | ldda, | ||
magmaDoubleComplex * | d_dinvA | ||
) |
void magmablas_ztrtri_diag_stream | ( | magma_uplo_t | uplo, |
magma_diag_t | diag, | ||
magma_int_t | n, | ||
const magmaDoubleComplex * | dA, | ||
magma_int_t | ldda, | ||
magmaDoubleComplex * | d_dinvA, | ||
magma_queue_t | stream | ||
) |
Inverts the NB x NB diagonal blocks.
This routine is used in ztrsm.
Same as ztrtri_diag, but adds stream argument.
ztrtri_diag inverts the NB x NB diagonal blocks of A.
[in] | uplo | magma_uplo_t. On entry, uplo specifies whether the matrix A is an upper or lower triangular matrix as follows:
|
[in] | diag | magma_diag_t. On entry, diag specifies whether or not A is unit triangular as follows:
|
[in] | n | INTEGER. On entry, n specifies the order of the matrix A. N >= 0. |
[in] | dA | COMPLEX_16 array of dimension ( ldda, n ) The triangular matrix A. If UPLO = 'U', the leading N-by-N upper triangular part of A contains the upper triangular matrix, and the strictly lower triangular part of A is not referenced. If UPLO = 'L', the leading N-by-N lower triangular part of A contains the lower triangular matrix, and the strictly upper triangular part of A is not referenced. If DIAG = 'U', the diagonal elements of A are also not referenced and are assumed to be 1. |
[in] | ldda | INTEGER. The leading dimension of the array A. LDDA >= max(1,N). |
[out] | d_dinvA | COMPLEX_16 array of dimension (NB, ((n+NB-1)/NB)*NB), where NB = 128. On exit, contains inverses of the NB-by-NB diagonal blocks of A. |
[in] | stream | magma_queue_t Stream to execute in. |