MAGMA 2.9.0
Matrix Algebra for GPU and Multicore Architectures
gegqr: QR factorization and generate Q

Functions

magma_int_t magma_cgegqr_expert_gpu_work (magma_int_t ikind, magma_int_t m, magma_int_t n, magmaFloatComplex_ptr dA, magma_int_t ldda, void *host_work, magma_int_t *lwork_host, void *device_work, magma_int_t *lwork_device, magma_int_t *info, magma_queue_t queue)
 CGEGQR orthogonalizes the N vectors given by a complex M-by-N matrix A: A = Q * R.
 
magma_int_t magma_cgegqr_gpu (magma_int_t ikind, magma_int_t m, magma_int_t n, magmaFloatComplex_ptr dA, magma_int_t ldda, magmaFloatComplex_ptr dwork, magmaFloatComplex *work, magma_int_t *info)
 CGEGQR orthogonalizes the N vectors given by a complex M-by-N matrix A: A = Q * R.
 
magma_int_t magma_dgegqr_expert_gpu_work (magma_int_t ikind, magma_int_t m, magma_int_t n, magmaDouble_ptr dA, magma_int_t ldda, void *host_work, magma_int_t *lwork_host, void *device_work, magma_int_t *lwork_device, magma_int_t *info, magma_queue_t queue)
 DGEGQR orthogonalizes the N vectors given by a real M-by-N matrix A: A = Q * R.
 
magma_int_t magma_dgegqr_gpu (magma_int_t ikind, magma_int_t m, magma_int_t n, magmaDouble_ptr dA, magma_int_t ldda, magmaDouble_ptr dwork, double *work, magma_int_t *info)
 DGEGQR orthogonalizes the N vectors given by a real M-by-N matrix A: A = Q * R.
 
magma_int_t magma_sgegqr_expert_gpu_work (magma_int_t ikind, magma_int_t m, magma_int_t n, magmaFloat_ptr dA, magma_int_t ldda, void *host_work, magma_int_t *lwork_host, void *device_work, magma_int_t *lwork_device, magma_int_t *info, magma_queue_t queue)
 SGEGQR orthogonalizes the N vectors given by a real M-by-N matrix A: A = Q * R.
 
magma_int_t magma_sgegqr_gpu (magma_int_t ikind, magma_int_t m, magma_int_t n, magmaFloat_ptr dA, magma_int_t ldda, magmaFloat_ptr dwork, float *work, magma_int_t *info)
 SGEGQR orthogonalizes the N vectors given by a real M-by-N matrix A: A = Q * R.
 
magma_int_t magma_zgegqr_expert_gpu_work (magma_int_t ikind, magma_int_t m, magma_int_t n, magmaDoubleComplex_ptr dA, magma_int_t ldda, void *host_work, magma_int_t *lwork_host, void *device_work, magma_int_t *lwork_device, magma_int_t *info, magma_queue_t queue)
 ZGEGQR orthogonalizes the N vectors given by a complex M-by-N matrix A: A = Q * R.
 
magma_int_t magma_zgegqr_gpu (magma_int_t ikind, magma_int_t m, magma_int_t n, magmaDoubleComplex_ptr dA, magma_int_t ldda, magmaDoubleComplex_ptr dwork, magmaDoubleComplex *work, magma_int_t *info)
 ZGEGQR orthogonalizes the N vectors given by a complex M-by-N matrix A: A = Q * R.
 

Detailed Description

Function Documentation

◆ magma_cgegqr_expert_gpu_work()

magma_int_t magma_cgegqr_expert_gpu_work ( magma_int_t ikind,
magma_int_t m,
magma_int_t n,
magmaFloatComplex_ptr dA,
magma_int_t ldda,
void * host_work,
magma_int_t * lwork_host,
void * device_work,
magma_int_t * lwork_device,
magma_int_t * info,
magma_queue_t queue )

CGEGQR orthogonalizes the N vectors given by a complex M-by-N matrix A:

A = Q * R.

On exit, if successful, the orthogonal vectors Q overwrite A and R is returned in host_work (in CPU memory). The routine is designed for tall-and-skinny matrices: M >> N, N <= 128.

The algorithm is selected by ikind. With ikind = 1, the routine uses normal equations and SVD in an iterative process that makes the computation numerically accurate.

This is an expert API that exposes more control to the user: the caller supplies the host and device workspaces and the queue.

Parameters
[in] ikind: INTEGER. Several versions are implemented, indicated by the ikind value:
  • 1: Uses normal equations and SVD in an iterative process that makes the computation numerically accurate.
  • 2: Uses a standard LAPACK-based orthogonalization through MAGMA's QR panel factorization (magma_cgeqr2x3_gpu) and magma_cungqr.
  • 3: Modified Gram-Schmidt (MGS).
  • 4: Cholesky QR. Note: this method uses the normal equations, which squares the condition number of A; therefore ||I - Q'Q|| < O(eps cond(A)^2).
[in] m: INTEGER. The number of rows of the matrix A. m >= n >= 0.
[in] n: INTEGER. The number of columns of the matrix A. 128 >= n >= 0.
[in,out] dA: COMPLEX array on the GPU, dimension (ldda,n). On entry, the m-by-n matrix A. On exit, the m-by-n matrix Q with orthogonal columns.
[in] ldda: INTEGER. The leading dimension of the array dA. LDDA >= max(1,m). To benefit from coalesced memory accesses, LDDA must be divisible by 16.
[out] host_work: CPU workspace, size determined by lwork_host. On exit, the first n^2 COMPLEX elements hold the rectangular matrix R. For higher performance, host_work should preferably be in pinned memory.
[in,out] lwork_host: INTEGER pointer. The size of the CPU workspace (host_work) in bytes.
  • lwork_host[0] < 0: a workspace query is assumed; the routine calculates the required amount of workspace and returns it in lwork_host. The workspace itself is not referenced, and no computation is performed.
  • lwork_host[0] >= 0: the routine assumes that the user has provided a workspace with the size in lwork_host.
device_work: GPU workspace, size determined by lwork_device.
[in,out] lwork_device: INTEGER pointer. The size of the GPU workspace (device_work) in bytes.
  • lwork_device[0] < 0: a workspace query is assumed; the routine calculates the required amount of workspace and returns it in lwork_device. The workspace itself is not referenced, and no computation is performed.
  • lwork_device[0] >= 0: the routine assumes that the user has provided a workspace with the size in lwork_device.
[out] info: INTEGER
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value, or another error occurred, such as a failed memory allocation.
  • > 0: for ikind = 1 and 4, the normal equations were not positive definite, so the factorization could not be completed, and the solution has not been computed. For ikind = 3, the space is not linearly independent. For all these cases the rank (< n) of the space is returned.
[in] queue: magma_queue_t
  • created/destroyed by the user outside the routine
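
The info convention above can be turned into a small decoding helper. The following is a minimal sketch, not part of MAGMA: the names gegqr_info_kind, gegqr_expert_args, and gegqr_classify_info are hypothetical, and the argument names are simply taken from the documented signature in order. It assumes that a negative value whose magnitude exceeds the argument count signals "another error".

```c
#include <stddef.h>

/* Hypothetical classification of the documented info values. */
typedef enum {
    GEGQR_INFO_OK,             /* info == 0: successful exit              */
    GEGQR_INFO_BAD_ARGUMENT,   /* info == -i: i-th argument was illegal   */
    GEGQR_INFO_OTHER_ERROR,    /* other negative value (assumption)       */
    GEGQR_INFO_RANK_DEFICIENT  /* info > 0: rank (< n) of the space       */
} gegqr_info_kind;

/* Argument names of the expert interface, in signature order (1-based). */
static const char *const gegqr_expert_args[] = {
    "ikind", "m", "n", "dA", "ldda", "host_work", "lwork_host",
    "device_work", "lwork_device", "info", "queue"
};

/* Classifies info; *arg_name is set to the offending argument or NULL. */
static gegqr_info_kind gegqr_classify_info(long info, const char **arg_name)
{
    const long nargs =
        (long)(sizeof gegqr_expert_args / sizeof gegqr_expert_args[0]);

    *arg_name = NULL;
    if (info == 0)
        return GEGQR_INFO_OK;
    if (info > 0)
        return GEGQR_INFO_RANK_DEFICIENT;
    if (-info <= nargs) {
        *arg_name = gegqr_expert_args[-info - 1];
        return GEGQR_INFO_BAD_ARGUMENT;
    }
    return GEGQR_INFO_OTHER_ERROR;
}
```

For example, info = -5 would be reported as an illegal ldda, while a positive info is the rank of the space as documented for ikind = 1, 3, and 4.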

◆ magma_cgegqr_gpu()

magma_int_t magma_cgegqr_gpu ( magma_int_t ikind,
magma_int_t m,
magma_int_t n,
magmaFloatComplex_ptr dA,
magma_int_t ldda,
magmaFloatComplex_ptr dwork,
magmaFloatComplex * work,
magma_int_t * info )

CGEGQR orthogonalizes the N vectors given by a complex M-by-N matrix A:

A = Q * R.

On exit, if successful, the orthogonal vectors Q overwrite A and R is returned in work (in CPU memory). The routine is designed for tall-and-skinny matrices: M >> N, N <= 128.

The algorithm is selected by ikind. With ikind = 1, the routine uses normal equations and SVD in an iterative process that makes the computation numerically accurate.

Parameters
[in] ikind: INTEGER. Several versions are implemented, indicated by the ikind value:
  • 1: Uses normal equations and SVD in an iterative process that makes the computation numerically accurate.
  • 2: Uses a standard LAPACK-based orthogonalization through MAGMA's QR panel factorization (magma_cgeqr2x3_gpu) and magma_cungqr.
  • 3: Modified Gram-Schmidt (MGS).
  • 4: Cholesky QR. Note: this method uses the normal equations, which squares the condition number of A; therefore ||I - Q'Q|| < O(eps cond(A)^2).
[in] m: INTEGER. The number of rows of the matrix A. m >= n >= 0.
[in] n: INTEGER. The number of columns of the matrix A. 128 >= n >= 0.
[in,out] dA: COMPLEX array on the GPU, dimension (ldda,n). On entry, the m-by-n matrix A. On exit, the m-by-n matrix Q with orthogonal columns.
[in] ldda: INTEGER. The leading dimension of the array dA. LDDA >= max(1,m). To benefit from coalesced memory accesses, LDDA must be divisible by 16.
dwork: (GPU workspace) COMPLEX array, dimension:
  • n^2 for ikind = 1
  • 3 n^2 + min(m, n) + 2 for ikind = 2
  • 0 (not used) for ikind = 3
  • n^2 for ikind = 4
[out] work: (CPU workspace) COMPLEX array. On exit, work(1:n^2) holds the rectangular matrix R. For higher performance, work should preferably be in pinned memory. The workspace size has changed for ikind = 1 since release 2.9.0:
  • 5 n^2 + 7n + 64 for ikind = 1 (not backward compatible)
  • 3 n^2 otherwise (backward compatible)
[out] info: INTEGER
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value, or another error occurred, such as a failed memory allocation.
  • > 0: for ikind = 1 and 4, the normal equations were not positive definite, so the factorization could not be completed, and the solution has not been computed. For ikind = 3, the space is not linearly independent. For all these cases the rank (< n) of the space is returned.
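
The dwork and work dimensions listed above can be computed before allocating. This is a minimal sketch that only restates the documented formulas; the helper names gegqr_dwork_elems and gegqr_work_elems are hypothetical and not part of MAGMA. Both return a count of elements (multiply by the element size, for example sizeof(magmaFloatComplex), to obtain bytes) and return -1 for an ikind outside 1..4.

```c
/* GPU workspace (dwork) size in elements, per the documented table. */
static long long gegqr_dwork_elems(int ikind, long long m, long long n)
{
    long long min_mn = (m < n) ? m : n;

    switch (ikind) {
    case 1:  return n * n;
    case 2:  return 3 * n * n + min_mn + 2;
    case 3:  return 0;              /* dwork is not used */
    case 4:  return n * n;
    default: return -1;             /* unknown ikind */
    }
}

/* CPU workspace (work) size in elements, per the documented table. */
static long long gegqr_work_elems(int ikind, long long n)
{
    if (ikind < 1 || ikind > 4)
        return -1;                  /* unknown ikind */
    if (ikind == 1)
        return 5 * n * n + 7 * n + 64;  /* size since release 2.9.0 */
    return 3 * n * n;
}
```

With n = 128, the ikind = 1 CPU workspace is 82880 elements, compared with 49152 elements for the other variants, which is the backward-incompatible growth noted above.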

◆ magma_dgegqr_expert_gpu_work()

magma_int_t magma_dgegqr_expert_gpu_work ( magma_int_t ikind,
magma_int_t m,
magma_int_t n,
magmaDouble_ptr dA,
magma_int_t ldda,
void * host_work,
magma_int_t * lwork_host,
void * device_work,
magma_int_t * lwork_device,
magma_int_t * info,
magma_queue_t queue )

DGEGQR orthogonalizes the N vectors given by a real M-by-N matrix A:

A = Q * R.

On exit, if successful, the orthogonal vectors Q overwrite A and R is returned in host_work (in CPU memory). The routine is designed for tall-and-skinny matrices: M >> N, N <= 128.

The algorithm is selected by ikind. With ikind = 1, the routine uses normal equations and SVD in an iterative process that makes the computation numerically accurate.

This is an expert API that exposes more control to the user: the caller supplies the host and device workspaces and the queue.

Parameters
[in] ikind: INTEGER. Several versions are implemented, indicated by the ikind value:
  • 1: Uses normal equations and SVD in an iterative process that makes the computation numerically accurate.
  • 2: Uses a standard LAPACK-based orthogonalization through MAGMA's QR panel factorization (magma_dgeqr2x3_gpu) and magma_dorgqr.
  • 3: Modified Gram-Schmidt (MGS).
  • 4: Cholesky QR. Note: this method uses the normal equations, which squares the condition number of A; therefore ||I - Q'Q|| < O(eps cond(A)^2).
[in] m: INTEGER. The number of rows of the matrix A. m >= n >= 0.
[in] n: INTEGER. The number of columns of the matrix A. 128 >= n >= 0.
[in,out] dA: DOUBLE PRECISION array on the GPU, dimension (ldda,n). On entry, the m-by-n matrix A. On exit, the m-by-n matrix Q with orthogonal columns.
[in] ldda: INTEGER. The leading dimension of the array dA. LDDA >= max(1,m). To benefit from coalesced memory accesses, LDDA must be divisible by 16.
[out] host_work: CPU workspace, size determined by lwork_host. On exit, the first n^2 DOUBLE PRECISION elements hold the rectangular matrix R. For higher performance, host_work should preferably be in pinned memory.
[in,out] lwork_host: INTEGER pointer. The size of the CPU workspace (host_work) in bytes.
  • lwork_host[0] < 0: a workspace query is assumed; the routine calculates the required amount of workspace and returns it in lwork_host. The workspace itself is not referenced, and no computation is performed.
  • lwork_host[0] >= 0: the routine assumes that the user has provided a workspace with the size in lwork_host.
device_work: GPU workspace, size determined by lwork_device.
[in,out] lwork_device: INTEGER pointer. The size of the GPU workspace (device_work) in bytes.
  • lwork_device[0] < 0: a workspace query is assumed; the routine calculates the required amount of workspace and returns it in lwork_device. The workspace itself is not referenced, and no computation is performed.
  • lwork_device[0] >= 0: the routine assumes that the user has provided a workspace with the size in lwork_device.
[out] info: INTEGER
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value, or another error occurred, such as a failed memory allocation.
  • > 0: for ikind = 1 and 4, the normal equations were not positive definite, so the factorization could not be completed, and the solution has not been computed. For ikind = 3, the space is not linearly independent. For all these cases the rank (< n) of the space is returned.
[in] queue: magma_queue_t
  • created/destroyed by the user outside the routine

◆ magma_dgegqr_gpu()

magma_int_t magma_dgegqr_gpu ( magma_int_t ikind,
magma_int_t m,
magma_int_t n,
magmaDouble_ptr dA,
magma_int_t ldda,
magmaDouble_ptr dwork,
double * work,
magma_int_t * info )

DGEGQR orthogonalizes the N vectors given by a real M-by-N matrix A:

A = Q * R.

On exit, if successful, the orthogonal vectors Q overwrite A and R is returned in work (in CPU memory). The routine is designed for tall-and-skinny matrices: M >> N, N <= 128.

The algorithm is selected by ikind. With ikind = 1, the routine uses normal equations and SVD in an iterative process that makes the computation numerically accurate.

Parameters
[in] ikind: INTEGER. Several versions are implemented, indicated by the ikind value:
  • 1: Uses normal equations and SVD in an iterative process that makes the computation numerically accurate.
  • 2: Uses a standard LAPACK-based orthogonalization through MAGMA's QR panel factorization (magma_dgeqr2x3_gpu) and magma_dorgqr.
  • 3: Modified Gram-Schmidt (MGS).
  • 4: Cholesky QR. Note: this method uses the normal equations, which squares the condition number of A; therefore ||I - Q'Q|| < O(eps cond(A)^2).
[in] m: INTEGER. The number of rows of the matrix A. m >= n >= 0.
[in] n: INTEGER. The number of columns of the matrix A. 128 >= n >= 0.
[in,out] dA: DOUBLE PRECISION array on the GPU, dimension (ldda,n). On entry, the m-by-n matrix A. On exit, the m-by-n matrix Q with orthogonal columns.
[in] ldda: INTEGER. The leading dimension of the array dA. LDDA >= max(1,m). To benefit from coalesced memory accesses, LDDA must be divisible by 16.
dwork: (GPU workspace) DOUBLE PRECISION array, dimension:
  • n^2 for ikind = 1
  • 3 n^2 + min(m, n) + 2 for ikind = 2
  • 0 (not used) for ikind = 3
  • n^2 for ikind = 4
[out] work: (CPU workspace) DOUBLE PRECISION array. On exit, work(1:n^2) holds the rectangular matrix R. For higher performance, work should preferably be in pinned memory. The workspace size has changed for ikind = 1 since release 2.9.0:
  • 5 n^2 + 7n + 64 for ikind = 1 (not backward compatible)
  • 3 n^2 otherwise (backward compatible)
[out] info: INTEGER
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value, or another error occurred, such as a failed memory allocation.
  • > 0: for ikind = 1 and 4, the normal equations were not positive definite, so the factorization could not be completed, and the solution has not been computed. For ikind = 3, the space is not linearly independent. For all these cases the rank (< n) of the space is returned.

◆ magma_sgegqr_expert_gpu_work()

magma_int_t magma_sgegqr_expert_gpu_work ( magma_int_t ikind,
magma_int_t m,
magma_int_t n,
magmaFloat_ptr dA,
magma_int_t ldda,
void * host_work,
magma_int_t * lwork_host,
void * device_work,
magma_int_t * lwork_device,
magma_int_t * info,
magma_queue_t queue )

SGEGQR orthogonalizes the N vectors given by a real M-by-N matrix A:

A = Q * R.

On exit, if successful, the orthogonal vectors Q overwrite A and R is returned in host_work (in CPU memory). The routine is designed for tall-and-skinny matrices: M >> N, N <= 128.

The algorithm is selected by ikind. With ikind = 1, the routine uses normal equations and SVD in an iterative process that makes the computation numerically accurate.

This is an expert API that exposes more control to the user: the caller supplies the host and device workspaces and the queue.

Parameters
[in] ikind: INTEGER. Several versions are implemented, indicated by the ikind value:
  • 1: Uses normal equations and SVD in an iterative process that makes the computation numerically accurate.
  • 2: Uses a standard LAPACK-based orthogonalization through MAGMA's QR panel factorization (magma_sgeqr2x3_gpu) and magma_sorgqr.
  • 3: Modified Gram-Schmidt (MGS).
  • 4: Cholesky QR. Note: this method uses the normal equations, which squares the condition number of A; therefore ||I - Q'Q|| < O(eps cond(A)^2).
[in] m: INTEGER. The number of rows of the matrix A. m >= n >= 0.
[in] n: INTEGER. The number of columns of the matrix A. 128 >= n >= 0.
[in,out] dA: REAL array on the GPU, dimension (ldda,n). On entry, the m-by-n matrix A. On exit, the m-by-n matrix Q with orthogonal columns.
[in] ldda: INTEGER. The leading dimension of the array dA. LDDA >= max(1,m). To benefit from coalesced memory accesses, LDDA must be divisible by 16.
[out] host_work: CPU workspace, size determined by lwork_host. On exit, the first n^2 REAL elements hold the rectangular matrix R. For higher performance, host_work should preferably be in pinned memory.
[in,out] lwork_host: INTEGER pointer. The size of the CPU workspace (host_work) in bytes.
  • lwork_host[0] < 0: a workspace query is assumed; the routine calculates the required amount of workspace and returns it in lwork_host. The workspace itself is not referenced, and no computation is performed.
  • lwork_host[0] >= 0: the routine assumes that the user has provided a workspace with the size in lwork_host.
device_work: GPU workspace, size determined by lwork_device.
[in,out] lwork_device: INTEGER pointer. The size of the GPU workspace (device_work) in bytes.
  • lwork_device[0] < 0: a workspace query is assumed; the routine calculates the required amount of workspace and returns it in lwork_device. The workspace itself is not referenced, and no computation is performed.
  • lwork_device[0] >= 0: the routine assumes that the user has provided a workspace with the size in lwork_device.
[out] info: INTEGER
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value, or another error occurred, such as a failed memory allocation.
  • > 0: for ikind = 1 and 4, the normal equations were not positive definite, so the factorization could not be completed, and the solution has not been computed. For ikind = 3, the space is not linearly independent. For all these cases the rank (< n) of the space is returned.
[in] queue: magma_queue_t
  • created/destroyed by the user outside the routine

◆ magma_sgegqr_gpu()

magma_int_t magma_sgegqr_gpu ( magma_int_t ikind,
magma_int_t m,
magma_int_t n,
magmaFloat_ptr dA,
magma_int_t ldda,
magmaFloat_ptr dwork,
float * work,
magma_int_t * info )

SGEGQR orthogonalizes the N vectors given by a real M-by-N matrix A:

A = Q * R.

On exit, if successful, the orthogonal vectors Q overwrite A and R is returned in work (in CPU memory). The routine is designed for tall-and-skinny matrices: M >> N, N <= 128.

The algorithm is selected by ikind. With ikind = 1, the routine uses normal equations and SVD in an iterative process that makes the computation numerically accurate.

Parameters
[in] ikind: INTEGER. Several versions are implemented, indicated by the ikind value:
  • 1: Uses normal equations and SVD in an iterative process that makes the computation numerically accurate.
  • 2: Uses a standard LAPACK-based orthogonalization through MAGMA's QR panel factorization (magma_sgeqr2x3_gpu) and magma_sorgqr.
  • 3: Modified Gram-Schmidt (MGS).
  • 4: Cholesky QR. Note: this method uses the normal equations, which squares the condition number of A; therefore ||I - Q'Q|| < O(eps cond(A)^2).
[in] m: INTEGER. The number of rows of the matrix A. m >= n >= 0.
[in] n: INTEGER. The number of columns of the matrix A. 128 >= n >= 0.
[in,out] dA: REAL array on the GPU, dimension (ldda,n). On entry, the m-by-n matrix A. On exit, the m-by-n matrix Q with orthogonal columns.
[in] ldda: INTEGER. The leading dimension of the array dA. LDDA >= max(1,m). To benefit from coalesced memory accesses, LDDA must be divisible by 16.
dwork: (GPU workspace) REAL array, dimension:
  • n^2 for ikind = 1
  • 3 n^2 + min(m, n) + 2 for ikind = 2
  • 0 (not used) for ikind = 3
  • n^2 for ikind = 4
[out] work: (CPU workspace) REAL array. On exit, work(1:n^2) holds the rectangular matrix R. For higher performance, work should preferably be in pinned memory. The workspace size has changed for ikind = 1 since release 2.9.0:
  • 5 n^2 + 7n + 64 for ikind = 1 (not backward compatible)
  • 3 n^2 otherwise (backward compatible)
[out] info: INTEGER
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value, or another error occurred, such as a failed memory allocation.
  • > 0: for ikind = 1 and 4, the normal equations were not positive definite, so the factorization could not be completed, and the solution has not been computed. For ikind = 3, the space is not linearly independent. For all these cases the rank (< n) of the space is returned.
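
The ldda requirement above (LDDA >= max(1,m), divisible by 16 for coalesced access) can be met by rounding m up before allocating dA. A minimal sketch follows; the helper names gegqr_padded_ldda and gegqr_dA_elems are hypothetical and not part of MAGMA.

```c
/* Smallest multiple of 16 that is >= max(1, m). */
static long long gegqr_padded_ldda(long long m)
{
    long long base = (m > 1) ? m : 1;
    return ((base + 15) / 16) * 16;
}

/* Number of elements to allocate for dA, dimension (ldda, n). */
static long long gegqr_dA_elems(long long m, long long n)
{
    return gegqr_padded_ldda(m) * n;
}
```

For m = 1000 and n = 128 this gives ldda = 1008 and 129024 elements for dA; the padding rows are never referenced as part of the m-by-n matrix.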

◆ magma_zgegqr_expert_gpu_work()

magma_int_t magma_zgegqr_expert_gpu_work ( magma_int_t ikind,
magma_int_t m,
magma_int_t n,
magmaDoubleComplex_ptr dA,
magma_int_t ldda,
void * host_work,
magma_int_t * lwork_host,
void * device_work,
magma_int_t * lwork_device,
magma_int_t * info,
magma_queue_t queue )

ZGEGQR orthogonalizes the N vectors given by a complex M-by-N matrix A:

A = Q * R.

On exit, if successful, the orthogonal vectors Q overwrite A and R is returned in host_work (in CPU memory). The routine is designed for tall-and-skinny matrices: M >> N, N <= 128.

The algorithm is selected by ikind. With ikind = 1, the routine uses normal equations and SVD in an iterative process that makes the computation numerically accurate.

This is an expert API that exposes more control to the user: the caller supplies the host and device workspaces and the queue.

Parameters
[in] ikind: INTEGER. Several versions are implemented, indicated by the ikind value:
  • 1: Uses normal equations and SVD in an iterative process that makes the computation numerically accurate.
  • 2: Uses a standard LAPACK-based orthogonalization through MAGMA's QR panel factorization (magma_zgeqr2x3_gpu) and magma_zungqr.
  • 3: Modified Gram-Schmidt (MGS).
  • 4: Cholesky QR. Note: this method uses the normal equations, which squares the condition number of A; therefore ||I - Q'Q|| < O(eps cond(A)^2).
[in] m: INTEGER. The number of rows of the matrix A. m >= n >= 0.
[in] n: INTEGER. The number of columns of the matrix A. 128 >= n >= 0.
[in,out] dA: COMPLEX_16 array on the GPU, dimension (ldda,n). On entry, the m-by-n matrix A. On exit, the m-by-n matrix Q with orthogonal columns.
[in] ldda: INTEGER. The leading dimension of the array dA. LDDA >= max(1,m). To benefit from coalesced memory accesses, LDDA must be divisible by 16.
[out] host_work: CPU workspace, size determined by lwork_host. On exit, the first n^2 COMPLEX_16 elements hold the rectangular matrix R. For higher performance, host_work should preferably be in pinned memory.
[in,out] lwork_host: INTEGER pointer. The size of the CPU workspace (host_work) in bytes.
  • lwork_host[0] < 0: a workspace query is assumed; the routine calculates the required amount of workspace and returns it in lwork_host. The workspace itself is not referenced, and no computation is performed.
  • lwork_host[0] >= 0: the routine assumes that the user has provided a workspace with the size in lwork_host.
device_work: GPU workspace, size determined by lwork_device.
[in,out] lwork_device: INTEGER pointer. The size of the GPU workspace (device_work) in bytes.
  • lwork_device[0] < 0: a workspace query is assumed; the routine calculates the required amount of workspace and returns it in lwork_device. The workspace itself is not referenced, and no computation is performed.
  • lwork_device[0] >= 0: the routine assumes that the user has provided a workspace with the size in lwork_device.
[out] info: INTEGER
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value, or another error occurred, such as a failed memory allocation.
  • > 0: for ikind = 1 and 4, the normal equations were not positive definite, so the factorization could not be completed, and the solution has not been computed. For ikind = 3, the space is not linearly independent. For all these cases the rank (< n) of the space is returned.
[in] queue: magma_queue_t
  • created/destroyed by the user outside the routine

◆ magma_zgegqr_gpu()

magma_int_t magma_zgegqr_gpu ( magma_int_t ikind,
magma_int_t m,
magma_int_t n,
magmaDoubleComplex_ptr dA,
magma_int_t ldda,
magmaDoubleComplex_ptr dwork,
magmaDoubleComplex * work,
magma_int_t * info )

ZGEGQR orthogonalizes the N vectors given by a complex M-by-N matrix A:

A = Q * R.

On exit, if successful, the orthogonal vectors Q overwrite A and R is returned in work (in CPU memory). The routine is designed for tall-and-skinny matrices: M >> N, N <= 128.

The algorithm is selected by ikind. With ikind = 1, the routine uses normal equations and SVD in an iterative process that makes the computation numerically accurate.

Parameters
[in] ikind: INTEGER. Several versions are implemented, indicated by the ikind value:
  • 1: Uses normal equations and SVD in an iterative process that makes the computation numerically accurate.
  • 2: Uses a standard LAPACK-based orthogonalization through MAGMA's QR panel factorization (magma_zgeqr2x3_gpu) and magma_zungqr.
  • 3: Modified Gram-Schmidt (MGS).
  • 4: Cholesky QR. Note: this method uses the normal equations, which squares the condition number of A; therefore ||I - Q'Q|| < O(eps cond(A)^2).
[in] m: INTEGER. The number of rows of the matrix A. m >= n >= 0.
[in] n: INTEGER. The number of columns of the matrix A. 128 >= n >= 0.
[in,out] dA: COMPLEX_16 array on the GPU, dimension (ldda,n). On entry, the m-by-n matrix A. On exit, the m-by-n matrix Q with orthogonal columns.
[in] ldda: INTEGER. The leading dimension of the array dA. LDDA >= max(1,m). To benefit from coalesced memory accesses, LDDA must be divisible by 16.
dwork: (GPU workspace) COMPLEX_16 array, dimension:
  • n^2 for ikind = 1
  • 3 n^2 + min(m, n) + 2 for ikind = 2
  • 0 (not used) for ikind = 3
  • n^2 for ikind = 4
[out] work: (CPU workspace) COMPLEX_16 array. On exit, work(1:n^2) holds the rectangular matrix R. For higher performance, work should preferably be in pinned memory. The workspace size has changed for ikind = 1 since release 2.9.0:
  • 5 n^2 + 7n + 64 for ikind = 1 (not backward compatible)
  • 3 n^2 otherwise (backward compatible)
[out] info: INTEGER
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value, or another error occurred, such as a failed memory allocation.
  • > 0: for ikind = 1 and 4, the normal equations were not positive definite, so the factorization could not be completed, and the solution has not been computed. For ikind = 3, the space is not linearly independent. For all these cases the rank (< n) of the space is returned.
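
The remark that the normal equations square the condition number of A can be seen from the thin singular value decomposition. The short derivation below is a standard linear-algebra sketch, assuming A has full column rank with singular values sigma_max >= ... >= sigma_min > 0; the symbols U, Sigma, V, and kappa_2 are introduced here for illustration and do not appear in the MAGMA documentation.

```latex
A = U \Sigma V^{H}, \quad U^{H} U = I, \quad V^{H} V = I
\;\Longrightarrow\;
A^{H} A = V \Sigma^{2} V^{H},
\qquad
\kappa_2(A^{H} A)
  = \frac{\sigma_{\max}^{2}}{\sigma_{\min}^{2}}
  = \kappa_2(A)^{2}.
```

This is why the orthogonality bound for the Cholesky QR variant is stated in terms of eps cond(A)^2 rather than eps cond(A).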