MAGMA  1.5.0
Matrix Algebra for GPU and Multicore Architectures
single-complex precision

Functions

magma_int_t magma_clahr2 (magma_int_t n, magma_int_t k, magma_int_t nb, magmaFloatComplex *dA, magma_int_t ldda, magmaFloatComplex *dV, magma_int_t lddv, magmaFloatComplex *A, magma_int_t lda, magmaFloatComplex *tau, magmaFloatComplex *T, magma_int_t ldt, magmaFloatComplex *Y, magma_int_t ldy)
 CLAHR2 reduces the first NB columns of a complex general n-by-(n-k+1) matrix A so that elements below the k-th subdiagonal are zero.
 
magma_int_t magma_clahr2_m (magma_int_t n, magma_int_t k, magma_int_t nb, magmaFloatComplex *A, magma_int_t lda, magmaFloatComplex *tau, magmaFloatComplex *T, magma_int_t ldt, magmaFloatComplex *Y, magma_int_t ldy, struct cgehrd_data *data)
 CLAHR2 reduces the first NB columns of a complex general n-by-(n-k+1) matrix A so that elements below the k-th subdiagonal are zero.
 
magma_int_t magma_clahru (magma_int_t n, magma_int_t ihi, magma_int_t k, magma_int_t nb, magmaFloatComplex *A, magma_int_t lda, magmaFloatComplex *dA, magma_int_t ldda, magmaFloatComplex *dY, magma_int_t lddy, magmaFloatComplex *dV, magma_int_t lddv, magmaFloatComplex *dT, magmaFloatComplex *dwork)
 CLAHRU is an auxiliary MAGMA routine that is used in CGEHRD to update the trailing sub-matrices after the reductions of the corresponding panels.
 
magma_int_t magma_clahru_m (magma_int_t n, magma_int_t ihi, magma_int_t k, magma_int_t nb, magmaFloatComplex *A, magma_int_t lda, struct cgehrd_data *data)
 CLAHRU is an auxiliary MAGMA routine that is used in CGEHRD to update the trailing sub-matrices after the reductions of the corresponding panels.
 
magma_int_t magma_clatrsd (magma_uplo_t uplo, magma_trans_t trans, magma_diag_t diag, magma_bool_t normin, magma_int_t n, const magmaFloatComplex *A, magma_int_t lda, magmaFloatComplex lambda, magmaFloatComplex *x, float *scale, float *cnorm, magma_int_t *info)
 CLATRSD solves one of the triangular systems with modified diagonal (A - lambda*I) * x = s*b, (A - lambda*I)**T * x = s*b, or (A - lambda*I)**H * x = s*b, with scaling to prevent overflow.
 

Detailed Description

Function Documentation

magma_int_t magma_clahr2 ( magma_int_t  n,
magma_int_t  k,
magma_int_t  nb,
magmaFloatComplex *  dA,
magma_int_t  ldda,
magmaFloatComplex *  dV,
magma_int_t  lddv,
magmaFloatComplex *  A,
magma_int_t  lda,
magmaFloatComplex *  tau,
magmaFloatComplex *  T,
magma_int_t  ldt,
magmaFloatComplex *  Y,
magma_int_t  ldy 
)

CLAHR2 reduces the first NB columns of a complex general n-by-(n-k+1) matrix A so that elements below the k-th subdiagonal are zero.

The reduction is performed by a unitary similarity transformation Q' * A * Q, where ' denotes the conjugate transpose. The routine returns the matrices V and T which determine Q as a block reflector I - V*T*V', and also the matrix Y = A * V. (Note that this differs from LAPACK, which computes Y = A * V * T.)

This is an auxiliary routine called by CGEHRD.

Parameters
[in]  n  INTEGER. The order of the matrix A.
[in]  k  INTEGER. The offset for the reduction. Elements below the k-th subdiagonal in the first NB columns are reduced to zero. K < N.
[in]  nb  INTEGER. The number of columns to be reduced.
[in,out]  dA  COMPLEX array on the GPU, dimension (LDDA,N-K+1). On entry, the n-by-(n-k+1) general matrix A. On exit, the elements in rows K:N of the first NB columns are overwritten with the matrix Y.
[in]  ldda  INTEGER. The leading dimension of the array dA. LDDA >= max(1,N).
[out]  dV  COMPLEX array on the GPU, dimension (LDDV,NB). On exit, this n-by-nb array contains the Householder vectors of the transformation.
[in]  lddv  INTEGER. The leading dimension of the array dV. LDDV >= max(1,N).
[in,out]  A  COMPLEX array, dimension (LDA,N-K+1). On entry, the n-by-(n-k+1) general matrix A. On exit, the elements on and above the k-th subdiagonal in the first NB columns are overwritten with the corresponding elements of the reduced matrix; the elements below the k-th subdiagonal, with the array TAU, represent the matrix Q as a product of elementary reflectors. The other columns of A are unchanged. See Further Details.
[in]  lda  INTEGER. The leading dimension of the array A. LDA >= max(1,N).
[out]  tau  COMPLEX array, dimension (NB). The scalar factors of the elementary reflectors. See Further Details.
[out]  T  COMPLEX array, dimension (LDT,NB). The upper triangular matrix T.
[in]  ldt  INTEGER. The leading dimension of the array T. LDT >= NB.
[out]  Y  COMPLEX array, dimension (LDY,NB). The n-by-nb matrix Y.
[in]  ldy  INTEGER. The leading dimension of the array Y. LDY >= N.

Further Details

The matrix Q is represented as a product of nb elementary reflectors

Q = H(1) H(2) . . . H(nb).

Each H(i) has the form

H(i) = I - tau * v * v'

where tau is a complex scalar, and v is a complex vector with v(1:i+k-1) = 0, v(i+k) = 1; v(i+k+1:n) is stored on exit in A(i+k+1:n,i), and tau in TAU(i).
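To make the action of a single reflector concrete, the following is a minimal sketch in plain C99 that applies H = I - tau * v * v' to a vector in place. The function name apply_reflector is hypothetical and not part of MAGMA; v is assumed to be passed as a full n-vector, with the leading zeros and the unit entry stored explicitly rather than implied as in the packed storage described above.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Apply H = I - tau * v * v^H to x in place: x := x - tau * v * (v^H x).
 * Hypothetical helper for illustration only; v is a full n-vector. */
void apply_reflector(size_t n, float complex tau,
                     const float complex *v, float complex *x)
{
    assert(v != NULL && x != NULL);

    /* dot = v^H x (conjugate the entries of v) */
    float complex dot = 0.0f;
    for (size_t i = 0; i < n; ++i)
        dot += conjf(v[i]) * x[i];

    /* x := x - (tau * dot) * v */
    for (size_t i = 0; i < n; ++i)
        x[i] -= tau * dot * v[i];
}
```

With v = (1, 1) and tau = 1 (that is, tau = 2 / (v' * v)), H maps v to -v and leaves any vector orthogonal to v unchanged.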

The elements of the vectors v together form the (n-k+1)-by-nb matrix V which is needed, with T and Y, to apply the transformation to the unreduced part of the matrix, using an update of the form: A := (I - V*T*V') * (A - Y*T*V').

The contents of A on exit are illustrated by the following example with n = 7, k = 3 and nb = 2:

   ( a   a   a   a   a )
   ( a   a   a   a   a )
   ( a   a   a   a   a )
   ( h   h   a   a   a )
   ( v1  h   a   a   a )
   ( v1  v2  a   a   a )
   ( v1  v2  a   a   a )

where "a" denotes an element of the original matrix A, h denotes a modified element of the upper Hessenberg matrix H, and vi denotes an element of the vector defining H(i).
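The pattern in the diagram can be expressed as a small classification function. The sketch below is illustrative only: the name clahr2_exit_entry is hypothetical, indices are 1-based to match the text, and the rule is read off the diagram (rows 1..k and columns beyond nb are shown as a; rows k+1..k+i of column i as h; rows k+i+1..n of column i as the stored part of v_i).

```c
#include <assert.h>

/* Classify entry (row, col), both 1-based, of the n-by-(n-k+1) array A on
 * exit, following the diagram: returns 'a', 'h', or 'v'.
 * Hypothetical helper; it only reproduces the storage pattern. */
char clahr2_exit_entry(int row, int col, int n, int k, int nb)
{
    assert(1 <= row && row <= n);
    assert(1 <= col && col <= n - k + 1);

    if (col > nb || row <= k)
        return 'a';   /* shown as an original element of A */
    if (row <= k + col)
        return 'h';   /* element of the reduced (Hessenberg) part */
    return 'v';       /* v(col+k+1:n), stored below the implicit unit entry */
}
```

For n = 7, k = 3, nb = 2 this gives h at (4,1), v at (5,1), h at (5,2), and v at (6,2), as in the diagram.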

This implementation follows the hybrid algorithm and notation described in

S. Tomov and J. Dongarra, "Accelerating the reduction to upper Hessenberg form through hybrid GPU-based computing," University of Tennessee Computer Science Technical Report, UT-CS-09-642 (also LAPACK Working Note 219), May 24, 2009.

magma_int_t magma_clahr2_m ( magma_int_t  n,
magma_int_t  k,
magma_int_t  nb,
magmaFloatComplex *  A,
magma_int_t  lda,
magmaFloatComplex *  tau,
magmaFloatComplex *  T,
magma_int_t  ldt,
magmaFloatComplex *  Y,
magma_int_t  ldy,
struct cgehrd_data *  data 
)

CLAHR2 reduces the first NB columns of a complex general n-by-(n-k+1) matrix A so that elements below the k-th subdiagonal are zero.

The reduction is performed by a unitary similarity transformation Q' * A * Q, where ' denotes the conjugate transpose. The routine returns the matrices V and T which determine Q as a block reflector I - V*T*V', and also the matrix Y = A * V. (Note that this differs from LAPACK, which computes Y = A * V * T.)

This is an auxiliary routine called by CGEHRD.

Parameters
[in]  n  INTEGER. The order of the matrix A.
[in]  k  INTEGER. The offset for the reduction. Elements below the k-th subdiagonal in the first NB columns are reduced to zero. K < N.
[in]  nb  INTEGER. The number of columns to be reduced.
[in,out]  A  COMPLEX array, dimension (LDA,N-K+1). On entry, the n-by-(n-k+1) general matrix A. On exit, the elements on and above the k-th subdiagonal in the first NB columns are overwritten with the corresponding elements of the reduced matrix; the elements below the k-th subdiagonal, with the array TAU, represent the matrix Q as a product of elementary reflectors. The other columns of A are unchanged. See Further Details.
[in]  lda  INTEGER. The leading dimension of the array A. LDA >= max(1,N).
[out]  tau  COMPLEX array, dimension (NB). The scalar factors of the elementary reflectors. See Further Details.
[out]  T  COMPLEX array, dimension (LDT,NB). The upper triangular matrix T.
[in]  ldt  INTEGER. The leading dimension of the array T. LDT >= NB.
[out]  Y  COMPLEX array, dimension (LDY,NB). The n-by-nb matrix Y.
[in]  ldy  INTEGER. The leading dimension of the array Y. LDY >= N.
[in,out]  data  Structure with pointers to dA, dT, dV, dW, dY, which are distributed across multiple GPUs.

Further Details

The matrix Q is represented as a product of nb elementary reflectors

Q = H(1) H(2) . . . H(nb).

Each H(i) has the form

H(i) = I - tau * v * v'

where tau is a complex scalar, and v is a complex vector with v(1:i+k-1) = 0, v(i+k) = 1; v(i+k+1:n) is stored on exit in A(i+k+1:n,i), and tau in TAU(i).

The elements of the vectors v together form the (n-k+1)-by-nb matrix V which is needed, with T and Y, to apply the transformation to the unreduced part of the matrix, using an update of the form: A := (I - V*T*V') * (A - Y*T*V').

The contents of A on exit are illustrated by the following example with n = 7, k = 3 and nb = 2:

   ( a   a   a   a   a )
   ( a   a   a   a   a )
   ( a   a   a   a   a )
   ( h   h   a   a   a )
   ( v1  h   a   a   a )
   ( v1  v2  a   a   a )
   ( v1  v2  a   a   a )

where "a" denotes an element of the original matrix A, h denotes a modified element of the upper Hessenberg matrix H, and vi denotes an element of the vector defining H(i).

This implementation follows the hybrid algorithm and notation described in

S. Tomov and J. Dongarra, "Accelerating the reduction to upper Hessenberg form through hybrid GPU-based computing," University of Tennessee Computer Science Technical Report, UT-CS-09-642 (also LAPACK Working Note 219), May 24, 2009.

magma_int_t magma_clahru ( magma_int_t  n,
magma_int_t  ihi,
magma_int_t  k,
magma_int_t  nb,
magmaFloatComplex *  A,
magma_int_t  lda,
magmaFloatComplex *  dA,
magma_int_t  ldda,
magmaFloatComplex *  dY,
magma_int_t  lddy,
magmaFloatComplex *  dV,
magma_int_t  lddv,
magmaFloatComplex *  dT,
magmaFloatComplex *  dwork 
)

CLAHRU is an auxiliary MAGMA routine that is used in CGEHRD to update the trailing sub-matrices after the reductions of the corresponding panels.

See further details below.

Parameters
[in]  n  INTEGER. The order of the matrix A. N >= 0.
[in]  ihi  INTEGER. Last row to update. Same as IHI in cgehrd.
[in]  k  INTEGER. Number of rows of the matrix Am (see details below).
[in]  nb  INTEGER. Block size.
[out]  A  COMPLEX array, dimension (LDA,N-K). On entry, the N-by-(N-K) general matrix to be updated. The computation is done on the GPU. After Am is updated on the GPU, only Am(:,1:NB) is transferred to the CPU, to update the corresponding Am matrix there. See Further Details below.
[in]  lda  INTEGER. The leading dimension of the array A. LDA >= max(1,N).
[in,out]  dA  COMPLEX array on the GPU, dimension (LDDA,N-K). On entry, the N-by-(N-K) general matrix to be updated. On exit, the first K rows (matrix Am) of A are updated by applying a unitary transformation from the right, Am = Am (I - V T V'), and the sub-matrix Ag is updated by Ag = (I - V T V') Ag (I - V T V(NB+1:)'), where Q = I - V T V' represents the unitary matrix (as a product of elementary reflectors V) used to reduce the current panel of A to upper Hessenberg form. After Am is updated, Am(:,1:NB) is sent to the CPU. See Further Details below.
[in]  ldda  INTEGER. The leading dimension of the array dA. LDDA >= max(1,N).
[in,out]  dY  (workspace) COMPLEX array on the GPU, dimension (LDDY,NB). On entry, the (N-K)-by-NB matrix Y = A V. It is used internally as workspace, so its value is changed on exit.
[in]  lddy  INTEGER. The leading dimension of the array dY. LDDY >= max(1,N).
[in,out]  dV  (workspace) COMPLEX array on the GPU, dimension (LDDV,NB). On entry, the (N-K)-by-NB matrix V of elementary reflectors used to reduce the current panel of A to upper Hessenberg form. The remaining K-by-NB part is used as workspace. V is unchanged on exit.
[in]  lddv  INTEGER. The leading dimension of the array dV. LDDV >= max(1,N).
[in]  dT  COMPLEX array on the GPU, dimension (NB,NB). On entry, the NB-by-NB upper triangular matrix defining the unitary Hessenberg reduction transformation for the current panel. The elements below the diagonal are zero.
dwork  (workspace) COMPLEX array on the GPU, dimension N*NB.

Further Details

This implementation follows the algorithm and notation described in:

S. Tomov and J. Dongarra, "Accelerating the reduction to upper Hessenberg form through hybrid GPU-based computing," University of Tennessee Computer Science Technical Report, UT-CS-09-642 (also LAPACK Working Note 219), May 24, 2009.

The difference is that here Am is computed on the GPU. M is renamed Am, G is renamed Ag.
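The right-hand update Am = Am (I - V T V') described for dA can be sketched on the CPU with plain loops. This is not how MAGMA performs it (the routine works on the GPU); it is a reference-style illustration under stated assumptions: column-major storage, the hypothetical name update_right, and a caller-supplied m*nb workspace W.

```c
#include <assert.h>
#include <complex.h>

/* Am := Am * (I - V*T*V^H), column-major storage.
 * Am: m-by-n (leading dimension ldam), V: n-by-nb (ldv),
 * T: nb-by-nb upper triangular (ldt), W: workspace of m*nb elements.
 * Hypothetical CPU reference sketch, not a MAGMA routine. */
void update_right(int m, int n, int nb,
                  float complex *Am, int ldam,
                  const float complex *V, int ldv,
                  const float complex *T, int ldt,
                  float complex *W)
{
    assert(ldam >= m && ldv >= n && ldt >= nb);

    /* W := Am * V  (m-by-nb) */
    for (int j = 0; j < nb; ++j)
        for (int r = 0; r < m; ++r) {
            float complex s = 0.0f;
            for (int c = 0; c < n; ++c)
                s += Am[r + c * ldam] * V[c + j * ldv];
            W[r + j * m] = s;
        }

    /* W := W * T in place; go from the last column to the first so that
     * the columns i <= j read here still hold their original values. */
    for (int j = nb - 1; j >= 0; --j)
        for (int r = 0; r < m; ++r) {
            float complex s = 0.0f;
            for (int i = 0; i <= j; ++i)
                s += W[r + i * m] * T[i + j * ldt];
            W[r + j * m] = s;
        }

    /* Am := Am - W * V^H */
    for (int c = 0; c < n; ++c)
        for (int r = 0; r < m; ++r) {
            float complex s = 0.0f;
            for (int j = 0; j < nb; ++j)
                s += W[r + j * m] * conjf(V[c + j * ldv]);
            Am[r + c * ldam] -= s;
        }
}
```

Forming W = Am V first keeps the work at O(m n nb) and avoids building the n-by-n matrix I - V T V' explicitly.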

magma_int_t magma_clahru_m ( magma_int_t  n,
magma_int_t  ihi,
magma_int_t  k,
magma_int_t  nb,
magmaFloatComplex *  A,
magma_int_t  lda,
struct cgehrd_data *  data 
)

CLAHRU is an auxiliary MAGMA routine that is used in CGEHRD to update the trailing sub-matrices after the reductions of the corresponding panels.

See further details below.

Parameters
[in]  n  INTEGER. The order of the matrix A. N >= 0.
[in]  ihi  INTEGER. Last row to update. Same as IHI in cgehrd.
[in]  k  INTEGER. Number of rows of the matrix Am (see details below).
[in]  nb  INTEGER. Block size.
[out]  A  COMPLEX array, dimension (LDA,N-K). On entry, the N-by-(N-K) general matrix to be updated. The computation is done on the GPU. After Am is updated on the GPU, only Am(:,1:NB) is transferred to the CPU, to update the corresponding Am matrix there. See Further Details below.
[in]  lda  INTEGER. The leading dimension of the array A. LDA >= max(1,N).
[in,out]  data  Structure with pointers to dA, dT, dV, dW, dY, which are distributed across multiple GPUs.

Further Details

This implementation follows the algorithm and notation described in:

S. Tomov and J. Dongarra, "Accelerating the reduction to upper Hessenberg form through hybrid GPU-based computing," University of Tennessee Computer Science Technical Report, UT-CS-09-642 (also LAPACK Working Note 219), May 24, 2009.

The difference is that here Am is computed on the GPU. M is renamed Am, G is renamed Ag.

magma_int_t magma_clatrsd ( magma_uplo_t  uplo,
magma_trans_t  trans,
magma_diag_t  diag,
magma_bool_t  normin,
magma_int_t  n,
const magmaFloatComplex *  A,
magma_int_t  lda,
magmaFloatComplex  lambda,
magmaFloatComplex *  x,
float *  scale,
float *  cnorm,
magma_int_t *  info 
)

CLATRSD solves one of the triangular systems with modified diagonal (A - lambda*I) * x = s*b, (A - lambda*I)**T * x = s*b, or (A - lambda*I)**H * x = s*b, with scaling to prevent overflow.

Here A is an upper or lower triangular matrix, A**T denotes the transpose of A, A**H denotes the conjugate transpose of A, x and b are n-element vectors, and s is a scaling factor, usually less than or equal to 1, chosen so that the components of x will be less than the overflow threshold. If the unscaled problem will not cause overflow, the Level 2 BLAS routine CTRSV is called. If the matrix A is singular (A(j,j) = 0 for some j), then s is set to 0 and a non-trivial solution to A*x = 0 is returned.

This version subtracts lambda from the diagonal, for use in ctrevc to compute eigenvectors. It does not modify A during the computation.

Parameters
[in]  uplo  magma_uplo_t. Specifies whether the matrix A is upper or lower triangular.
  • = MagmaUpper: Upper triangular
  • = MagmaLower: Lower triangular
[in]  trans  magma_trans_t. Specifies the operation applied to A.
  • = MagmaNoTrans: Solve (A - lambda*I) * x = s*b (No transpose)
  • = MagmaTrans: Solve (A - lambda*I)**T * x = s*b (Transpose)
  • = MagmaConjTrans: Solve (A - lambda*I)**H * x = s*b (Conjugate transpose)
[in]  diag  magma_diag_t. Specifies whether or not the matrix A is unit triangular.
  • = MagmaNonUnit: Non-unit triangular
  • = MagmaUnit: Unit triangular
[in]  normin  magma_bool_t. Specifies whether CNORM has been set or not.
  • = MagmaTrue: CNORM contains the column norms on entry
  • = MagmaFalse: CNORM is not set on entry. On exit, the norms will be computed and stored in CNORM.
[in]  n  INTEGER. The order of the matrix A. N >= 0.
[in]  A  COMPLEX array, dimension (LDA,N). The triangular matrix A. If UPLO = MagmaUpper, the leading n-by-n upper triangular part of the array A contains the upper triangular matrix, and the strictly lower triangular part of A is not referenced. If UPLO = MagmaLower, the leading n-by-n lower triangular part of the array A contains the lower triangular matrix, and the strictly upper triangular part of A is not referenced. If DIAG = MagmaUnit, the diagonal elements of A are also not referenced and are assumed to be 1.
[in]  lda  INTEGER. The leading dimension of the array A. LDA >= max(1,N).
[in]  lambda  COMPLEX. Eigenvalue to subtract from the diagonal of A.
[in,out]  x  COMPLEX array, dimension (N). On entry, the right-hand side b of the triangular system. On exit, X is overwritten by the solution vector x.
[out]  scale  REAL. The scaling factor s for the triangular system A * x = s*b, A**T * x = s*b, or A**H * x = s*b. If SCALE = 0, the matrix A is singular or badly scaled, and the vector x is an exact or approximate solution to A*x = 0.
[in,out]  cnorm  (input or output) REAL array, dimension (N).
  • If NORMIN = MagmaTrue, CNORM is an input argument and CNORM(j) contains the norm of the off-diagonal part of the j-th column of A. If TRANS = MagmaNoTrans, CNORM(j) must be greater than or equal to the infinity-norm, and if TRANS = MagmaTrans or MagmaConjTrans, CNORM(j) must be greater than or equal to the 1-norm.
  • If NORMIN = MagmaFalse, CNORM is an output argument and CNORM(j) returns the 1-norm of the off-diagonal part of the j-th column of A.
[out]  info  INTEGER.
  • = 0: successful exit
  • < 0: if INFO = -k, the k-th argument had an illegal value

Further Details

A rough bound on x is computed; if that is less than overflow, CTRSV is called, otherwise, specific code is used which checks for possible overflow or divide-by-zero at every operation.

A columnwise scheme is used for solving A*x = b. The basic algorithm if A is lower triangular is

 x[1:n] := b[1:n]
 for j = 1, ..., n
      x(j) := x(j) / A(j,j)
      x[j+1:n] := x[j+1:n] - x(j) * A[j+1:n,j]
 end
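The loop above, with lambda subtracted from the diagonal on the fly as this routine does, can be sketched in C as follows. This is an unguarded illustration, not the routine's actual code: the name shifted_lower_solve is hypothetical, storage is assumed column-major, the shifted diagonal is assumed nonzero, and none of the overflow scaling is included.

```c
#include <assert.h>
#include <complex.h>

/* Solve (A - lambda*I) x = b for lower triangular A (column-major, lda >= n).
 * x holds b on entry and the solution on exit. A is not modified.
 * Hypothetical sketch: no overflow protection, nonzero shifted diagonal assumed. */
void shifted_lower_solve(int n, const float complex *A, int lda,
                         float complex lambda, float complex *x)
{
    assert(lda >= n);
    for (int j = 0; j < n; ++j) {
        /* x(j) := x(j) / (A(j,j) - lambda) */
        x[j] /= A[j + j * lda] - lambda;
        /* x[j+1:n] := x[j+1:n] - x(j) * A[j+1:n,j] */
        for (int i = j + 1; i < n; ++i)
            x[i] -= x[j] * A[i + j * lda];
    }
}
```

Subtracting lambda only when the diagonal entry is read is what lets the matrix A stay const, matching the statement that A is not modified during the computation.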

Define bounds on the components of x after j iterations of the loop:

 M(j) = bound on x[1:j]
 G(j) = bound on x[j+1:n]

Initially, let M(0) = 0 and G(0) = max{x(i), i=1,...,n}.

Then for iteration j+1 we have

 M(j+1) <= G(j) / | A(j+1,j+1) |
 G(j+1) <= G(j) + M(j+1) * | A[j+2:n,j+1] |
        <= G(j) ( 1 + CNORM(j+1) / | A(j+1,j+1) | )

where CNORM(j+1) is greater than or equal to the infinity-norm of column j+1 of A, not counting the diagonal. Hence

 G(j) <= G(0) product ( 1 + CNORM(i) / | A(i,i) | )
              1<=i<=j

and

 |x(j)| <= ( G(0) / |A(j,j)| ) product ( 1 + CNORM(i) / |A(i,i)| )
                                1<=i<j

Since |x(j)| <= M(j), we use the Level 2 BLAS routine CTRSV if the reciprocal of the largest M(j), j=1,..,n, is larger than max(underflow, 1/overflow).
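The bound and the resulting test can be sketched as below. Several points are assumptions for illustration rather than the routine's actual code: the function names are hypothetical, d[j] stands for the magnitude of the j-th diagonal entry (presumably |A(j,j) - lambda| in this shifted version), and FLT_MIN and FLT_MAX stand in for the underflow and overflow thresholds.

```c
#include <assert.h>
#include <float.h>

/* Largest bound on |x(j)| over j, using
 *   M(j) <= G(j-1) / d(j),   G(j) <= G(j-1) * (1 + cnorm(j) / d(j)),
 * with G(0) = g0. d[j] > 0 is the magnitude of the j-th diagonal entry.
 * Hypothetical sketch of the estimate described in the text. */
float solution_bound(int n, const float *d, const float *cnorm, float g0)
{
    float grow = g0;    /* running bound G(j-1) */
    float xmax = 0.0f;  /* largest M(j) seen so far */
    for (int j = 0; j < n; ++j) {
        assert(d[j] > 0.0f);
        float xj = grow / d[j];
        if (xj > xmax)
            xmax = xj;
        grow *= 1.0f + cnorm[j] / d[j];
    }
    return xmax;
}

/* Criterion from the text: the plain solve is used when
 * 1/bound > max(underflow, 1/overflow). FLT_MIN and FLT_MAX are
 * assumed stand-ins for the two thresholds. */
int plain_solve_is_safe(float bound)
{
    float inv_over = 1.0f / FLT_MAX;
    float tiny = (FLT_MIN > inv_over) ? FLT_MIN : inv_over;
    if (bound == 0.0f)
        return 1;       /* right-hand side is zero, so x is zero */
    return 1.0f / bound > tiny;
}
```

The estimate costs O(n) given the column norms, which is why it is cheap enough to run before every solve.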

The bound on x(j) is also used to determine when a step in the columnwise method can be performed without fear of overflow. If the computed bound is greater than a large constant, x is scaled to prevent overflow, but if the bound overflows, x is set to 0, x(j) to 1, and scale to 0, and a non-trivial solution to A*x = 0 is found.

Similarly, a row-wise scheme is used to solve A**T *x = b or A**H *x = b. The basic algorithm for A upper triangular is

 for j = 1, ..., n
      x(j) := ( b(j) - A[1:j-1,j]' * x[1:j-1] ) / A(j,j)
 end
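A sketch of this row-wise loop in C, again with lambda subtracted from the diagonal and without any overflow protection. The name shifted_upper_trans_solve and the conj_trans flag are hypothetical; the flag selects between the transpose and the conjugate transpose, and storage is assumed column-major.

```c
#include <assert.h>
#include <complex.h>

/* Solve (A - lambda*I)^T x = b (conj_trans == 0) or
 * (A - lambda*I)^H x = b (conj_trans != 0) for upper triangular A,
 * column-major with lda >= n. x holds b on entry, the solution on exit.
 * Hypothetical sketch: no scaling, nonzero shifted diagonal assumed. */
void shifted_upper_trans_solve(int n, const float complex *A, int lda,
                               float complex lambda, int conj_trans,
                               float complex *x)
{
    assert(lda >= n);
    for (int j = 0; j < n; ++j) {
        /* s = b(j) - sum over i < j of op(A(i,j)) * x(i) */
        float complex s = x[j];
        for (int i = 0; i < j; ++i) {
            float complex a = A[i + j * lda];
            s -= (conj_trans ? conjf(a) : a) * x[i];
        }
        /* divide by the shifted diagonal entry, conjugated if requested */
        float complex dj = A[j + j * lda] - lambda;
        x[j] = s / (conj_trans ? conjf(dj) : dj);
    }
}
```

Each step reads only column j of A above the diagonal, so the transposed systems are solved without forming A**T or A**H.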

We simultaneously compute two bounds

 G(j) = bound on ( b(i) - A[1:i-1,i]' * x[1:i-1] ), 1<=i<=j
 M(j) = bound on x(i), 1<=i<=j

The initial values are G(0) = 0, M(0) = max{b(i), i=1,..,n}, and we add the constraint G(j) >= G(j-1) and M(j) >= M(j-1) for j >= 1. Then the bound on x(j) is

 M(j) <= M(j-1) * ( 1 + CNORM(j) ) / | A(j,j) |

      <= M(0) * product ( ( 1 + CNORM(i) ) / |A(i,i)| )
                1<=i<=j

and we can safely call CTRSV if 1/M(n) and 1/G(n) are both greater than max(underflow, 1/overflow).