MAGMA 2.10.0
Matrix Algebra for GPU and Multicore Architectures
Functions | |
| magma_int_t | magma_cgbsv_batched_work (magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, magmaFloatComplex **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, magmaFloatComplex **dB_array, magma_int_t lddb, magma_int_t *info_array, void *device_work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue) |
| CGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices. | |
| magma_int_t | magma_cgbsv_batched (magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, magmaFloatComplex **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, magmaFloatComplex **dB_array, magma_int_t lddb, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) |
| CGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices. | |
| magma_int_t | magma_cgbtrf_batched_work (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, magmaFloatComplex **dAB_array, magma_int_t lddab, magma_int_t **dipiv_array, magma_int_t *info_array, void *device_work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue) |
| CGBTRF computes an LU factorization of a complex m-by-n band matrix AB using partial pivoting with row interchanges. | |
| magma_int_t | magma_cgbtrf_batched (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, magmaFloatComplex **dAB_array, magma_int_t lddab, magma_int_t **dipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) |
| CGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges. | |
| magma_int_t | magma_cgbtrs_batched (magma_trans_t transA, magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, magmaFloatComplex **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, magmaFloatComplex **dB_array, magma_int_t lddb, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) |
| CGBTRS solves a system of linear equations A * X = B, A**T * X = B, or A**H * X = B with a general band matrix A using the LU factorization computed by CGBTRF. | |
| magma_int_t | magma_cgetrf_batched (magma_int_t m, magma_int_t n, magmaFloatComplex **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) |
| CGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges. | |
| magma_int_t | magma_cgetrf_nopiv_vbatched_max_nocheck_work (magma_int_t *m, magma_int_t *n, magma_int_t max_m, magma_int_t max_n, magma_int_t max_minmn, magma_int_t max_mxn, magmaFloatComplex **dA_array, magma_int_t *ldda, float *dtol_array, float eps, magma_int_t *info_array, void *work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue) |
| CGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting. | |
| magma_int_t | magma_cgetrf_nopiv_expert_vbatched (magma_int_t *m, magma_int_t *n, magmaFloatComplex **dA_array, magma_int_t *ldda, float *dtol_array, float eps, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) |
| CGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting. | |
| magma_int_t | magma_cgetrf_nopiv_vbatched (magma_int_t *m, magma_int_t *n, magmaFloatComplex **dA_array, magma_int_t *ldda, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) |
| CGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting. | |
| magma_int_t | magma_cgetrf_recpanel_batched (magma_int_t m, magma_int_t n, magma_int_t min_recpnb, magmaFloatComplex **dA_array, magma_int_t ai, magma_int_t aj, magma_int_t ldda, magma_int_t **dipiv_array, magma_int_t **dpivinfo_array, magma_int_t *info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue) |
| This is an internal routine that may make many assumptions about its inputs. | |
| magma_int_t | magma_cgetrf_recpanel_native (magma_int_t m, magma_int_t n, magma_int_t recnb, magmaFloatComplex_ptr dA, magma_int_t ldda, magma_int_t *dipiv, magma_int_t *dipivinfo, magma_int_t *dinfo, magma_int_t gbstep, magma_event_t events[2], magma_queue_t queue, magma_queue_t update_queue) |
| This is an internal routine. | |
| magma_int_t | magma_cgetrf_vbatched_max_nocheck_work (magma_int_t *m, magma_int_t *n, magma_int_t max_m, magma_int_t max_n, magma_int_t max_minmn, magma_int_t max_mxn, magmaFloatComplex **dA_array, magma_int_t *ldda, magma_int_t **dipiv_array, magma_int_t *info_array, void *work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue) |
| CGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges. | |
| magma_int_t | magma_cgetrf_vbatched (magma_int_t *m, magma_int_t *n, magmaFloatComplex **dA_array, magma_int_t *ldda, magma_int_t **dipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) |
| CGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges. | |
| magma_int_t | magma_dgbsv_batched_work (magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, double **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, double **dB_array, magma_int_t lddb, magma_int_t *info_array, void *device_work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue) |
| DGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices. | |
| magma_int_t | magma_dgbsv_batched (magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, double **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, double **dB_array, magma_int_t lddb, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) |
| DGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices. | |
| magma_int_t | magma_dgbtrf_batched_work (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, double **dAB_array, magma_int_t lddab, magma_int_t **dipiv_array, magma_int_t *info_array, void *device_work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue) |
| DGBTRF computes an LU factorization of a real m-by-n band matrix AB using partial pivoting with row interchanges. | |
| magma_int_t | magma_dgbtrf_batched (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, double **dAB_array, magma_int_t lddab, magma_int_t **dipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) |
| DGBTRF computes an LU factorization of a real m-by-n band matrix A using partial pivoting with row interchanges. | |
| magma_int_t | magma_dgbtrs_batched (magma_trans_t transA, magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, double **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, double **dB_array, magma_int_t lddb, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) |
| DGBTRS solves a system of linear equations A * X = B, A**T * X = B, or A**H * X = B with a general band matrix A using the LU factorization computed by DGBTRF. | |
| magma_int_t | magma_dgetrf_batched (magma_int_t m, magma_int_t n, double **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) |
| DGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges. | |
| magma_int_t | magma_dgetrf_nopiv_vbatched_max_nocheck_work (magma_int_t *m, magma_int_t *n, magma_int_t max_m, magma_int_t max_n, magma_int_t max_minmn, magma_int_t max_mxn, double **dA_array, magma_int_t *ldda, double *dtol_array, double eps, magma_int_t *info_array, void *work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue) |
| DGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting. | |
| magma_int_t | magma_dgetrf_nopiv_expert_vbatched (magma_int_t *m, magma_int_t *n, double **dA_array, magma_int_t *ldda, double *dtol_array, double eps, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) |
| DGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting. | |
| magma_int_t | magma_dgetrf_nopiv_vbatched (magma_int_t *m, magma_int_t *n, double **dA_array, magma_int_t *ldda, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) |
| DGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting. | |
| magma_int_t | magma_dgetrf_recpanel_batched (magma_int_t m, magma_int_t n, magma_int_t min_recpnb, double **dA_array, magma_int_t ai, magma_int_t aj, magma_int_t ldda, magma_int_t **dipiv_array, magma_int_t **dpivinfo_array, magma_int_t *info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue) |
| This is an internal routine that may make many assumptions about its inputs. | |
| magma_int_t | magma_dgetrf_recpanel_native (magma_int_t m, magma_int_t n, magma_int_t recnb, magmaDouble_ptr dA, magma_int_t ldda, magma_int_t *dipiv, magma_int_t *dipivinfo, magma_int_t *dinfo, magma_int_t gbstep, magma_event_t events[2], magma_queue_t queue, magma_queue_t update_queue) |
| This is an internal routine. | |
| magma_int_t | magma_dgetrf_vbatched_max_nocheck_work (magma_int_t *m, magma_int_t *n, magma_int_t max_m, magma_int_t max_n, magma_int_t max_minmn, magma_int_t max_mxn, double **dA_array, magma_int_t *ldda, magma_int_t **dipiv_array, magma_int_t *info_array, void *work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue) |
| DGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges. | |
| magma_int_t | magma_dgetrf_vbatched (magma_int_t *m, magma_int_t *n, double **dA_array, magma_int_t *ldda, magma_int_t **dipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) |
| DGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges. | |
| magma_int_t | magma_sgbsv_batched_work (magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, float **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, float **dB_array, magma_int_t lddb, magma_int_t *info_array, void *device_work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue) |
| SGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices. | |
| magma_int_t | magma_sgbsv_batched (magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, float **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, float **dB_array, magma_int_t lddb, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) |
| SGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices. | |
| magma_int_t | magma_sgbtrf_batched_work (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, float **dAB_array, magma_int_t lddab, magma_int_t **dipiv_array, magma_int_t *info_array, void *device_work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue) |
| SGBTRF computes an LU factorization of a real m-by-n band matrix AB using partial pivoting with row interchanges. | |
| magma_int_t | magma_sgbtrf_batched (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, float **dAB_array, magma_int_t lddab, magma_int_t **dipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) |
| SGBTRF computes an LU factorization of a real m-by-n band matrix A using partial pivoting with row interchanges. | |
| magma_int_t | magma_sgbtrs_batched (magma_trans_t transA, magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, float **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, float **dB_array, magma_int_t lddb, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) |
| SGBTRS solves a system of linear equations A * X = B, A**T * X = B, or A**H * X = B with a general band matrix A using the LU factorization computed by SGBTRF. | |
| magma_int_t | magma_sgetrf_batched (magma_int_t m, magma_int_t n, float **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) |
| SGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges. | |
| magma_int_t | magma_sgetrf_nopiv_vbatched_max_nocheck_work (magma_int_t *m, magma_int_t *n, magma_int_t max_m, magma_int_t max_n, magma_int_t max_minmn, magma_int_t max_mxn, float **dA_array, magma_int_t *ldda, float *dtol_array, float eps, magma_int_t *info_array, void *work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue) |
| SGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting. | |
| magma_int_t | magma_sgetrf_nopiv_expert_vbatched (magma_int_t *m, magma_int_t *n, float **dA_array, magma_int_t *ldda, float *dtol_array, float eps, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) |
| SGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting. | |
| magma_int_t | magma_sgetrf_nopiv_vbatched (magma_int_t *m, magma_int_t *n, float **dA_array, magma_int_t *ldda, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) |
| SGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting. | |
| magma_int_t | magma_sgetrf_recpanel_batched (magma_int_t m, magma_int_t n, magma_int_t min_recpnb, float **dA_array, magma_int_t ai, magma_int_t aj, magma_int_t ldda, magma_int_t **dipiv_array, magma_int_t **dpivinfo_array, magma_int_t *info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue) |
| This is an internal routine that may make many assumptions about its inputs. | |
| magma_int_t | magma_sgetrf_recpanel_native (magma_int_t m, magma_int_t n, magma_int_t recnb, magmaFloat_ptr dA, magma_int_t ldda, magma_int_t *dipiv, magma_int_t *dipivinfo, magma_int_t *dinfo, magma_int_t gbstep, magma_event_t events[2], magma_queue_t queue, magma_queue_t update_queue) |
| This is an internal routine. | |
| magma_int_t | magma_sgetrf_vbatched_max_nocheck_work (magma_int_t *m, magma_int_t *n, magma_int_t max_m, magma_int_t max_n, magma_int_t max_minmn, magma_int_t max_mxn, float **dA_array, magma_int_t *ldda, magma_int_t **dipiv_array, magma_int_t *info_array, void *work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue) |
| SGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges. | |
| magma_int_t | magma_sgetrf_vbatched (magma_int_t *m, magma_int_t *n, float **dA_array, magma_int_t *ldda, magma_int_t **dipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) |
| SGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges. | |
| magma_int_t | magma_zgbsv_batched_work (magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, magmaDoubleComplex **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, magmaDoubleComplex **dB_array, magma_int_t lddb, magma_int_t *info_array, void *device_work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue) |
| ZGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices. | |
| magma_int_t | magma_zgbsv_batched (magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, magmaDoubleComplex **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, magmaDoubleComplex **dB_array, magma_int_t lddb, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) |
| ZGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices. | |
| magma_int_t | magma_zgbtrf_batched_work (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, magmaDoubleComplex **dAB_array, magma_int_t lddab, magma_int_t **dipiv_array, magma_int_t *info_array, void *device_work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue) |
| ZGBTRF computes an LU factorization of a complex m-by-n band matrix AB using partial pivoting with row interchanges. | |
| magma_int_t | magma_zgbtrf_batched (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, magmaDoubleComplex **dAB_array, magma_int_t lddab, magma_int_t **dipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) |
| ZGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges. | |
| magma_int_t | magma_zgbtrs_batched (magma_trans_t transA, magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, magmaDoubleComplex **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, magmaDoubleComplex **dB_array, magma_int_t lddb, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) |
| ZGBTRS solves a system of linear equations A * X = B, A**T * X = B, or A**H * X = B with a general band matrix A using the LU factorization computed by ZGBTRF. | |
| magma_int_t | magma_zgetrf_batched (magma_int_t m, magma_int_t n, magmaDoubleComplex **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) |
| ZGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges. | |
| magma_int_t | magma_zgetrf_nopiv_vbatched_max_nocheck_work (magma_int_t *m, magma_int_t *n, magma_int_t max_m, magma_int_t max_n, magma_int_t max_minmn, magma_int_t max_mxn, magmaDoubleComplex **dA_array, magma_int_t *ldda, double *dtol_array, double eps, magma_int_t *info_array, void *work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue) |
| ZGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting. | |
| magma_int_t | magma_zgetrf_nopiv_expert_vbatched (magma_int_t *m, magma_int_t *n, magmaDoubleComplex **dA_array, magma_int_t *ldda, double *dtol_array, double eps, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) |
| ZGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting. | |
| magma_int_t | magma_zgetrf_nopiv_vbatched (magma_int_t *m, magma_int_t *n, magmaDoubleComplex **dA_array, magma_int_t *ldda, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) |
| ZGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting. | |
| magma_int_t | magma_zgetrf_recpanel_batched (magma_int_t m, magma_int_t n, magma_int_t min_recpnb, magmaDoubleComplex **dA_array, magma_int_t ai, magma_int_t aj, magma_int_t ldda, magma_int_t **dipiv_array, magma_int_t **dpivinfo_array, magma_int_t *info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue) |
| This is an internal routine that may make many assumptions about its inputs. | |
| magma_int_t | magma_zgetrf_recpanel_native (magma_int_t m, magma_int_t n, magma_int_t recnb, magmaDoubleComplex_ptr dA, magma_int_t ldda, magma_int_t *dipiv, magma_int_t *dipivinfo, magma_int_t *dinfo, magma_int_t gbstep, magma_event_t events[2], magma_queue_t queue, magma_queue_t update_queue) |
| This is an internal routine. | |
| magma_int_t | magma_zgetrf_vbatched_max_nocheck_work (magma_int_t *m, magma_int_t *n, magma_int_t max_m, magma_int_t max_n, magma_int_t max_minmn, magma_int_t max_mxn, magmaDoubleComplex **dA_array, magma_int_t *ldda, magma_int_t **dipiv_array, magma_int_t *info_array, void *work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue) |
| ZGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges. | |
| magma_int_t | magma_zgetrf_vbatched (magma_int_t *m, magma_int_t *n, magmaDoubleComplex **dA_array, magma_int_t *ldda, magma_int_t **dipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) |
| ZGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges. | |
| magma_int_t | magma_cgbsv_batched_fused_sm (magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, magmaFloatComplex **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magmaFloatComplex **dB_array, magma_int_t lddb, magma_int_t *info_array, magma_int_t nthreads, magma_int_t ntcol, magma_int_t batchCount, magma_queue_t queue) |
| CGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices. | |
| magma_int_t | magma_cgbtrf_batched_fused_sm (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, magmaFloatComplex **dAB_array, magma_int_t lddab, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t nthreads, magma_int_t ntcol, magma_int_t batchCount, magma_queue_t queue) |
| CGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges. | |
| magma_int_t | magma_cgbtrf_batched_sliding_window_loopout (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, magmaFloatComplex **dAB_array, magma_int_t lddab, magma_int_t **ipiv_array, magma_int_t *info_array, void *device_work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue) |
| CGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges. | |
| magma_int_t | magma_cgbtrf_batched_sliding_window_loopin (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, magmaFloatComplex **dAB_array, magma_int_t lddab, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) |
| CGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges. | |
| magma_int_t | magma_cgetf2_nopiv_internal_batched (magma_int_t m, magma_int_t n, magmaFloatComplex **dA_array, magma_int_t ai, magma_int_t aj, magma_int_t ldda, magma_int_t *info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue) |
| cgetf2_nopiv computes the non-pivoting LU factorization of an M-by-N matrix A. | |
| magma_int_t | magma_cgetrf_batched_smallsq_noshfl (magma_int_t n, magmaFloatComplex **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) |
| cgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges. | |
| magma_int_t | magma_dgbsv_batched_fused_sm (magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, double **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, double **dB_array, magma_int_t lddb, magma_int_t *info_array, magma_int_t nthreads, magma_int_t ntcol, magma_int_t batchCount, magma_queue_t queue) |
| DGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices. | |
| magma_int_t | magma_dgbtrf_batched_fused_sm (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, double **dAB_array, magma_int_t lddab, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t nthreads, magma_int_t ntcol, magma_int_t batchCount, magma_queue_t queue) |
| DGBTRF computes an LU factorization of a real m-by-n band matrix A using partial pivoting with row interchanges. | |
| magma_int_t | magma_dgbtrf_batched_sliding_window_loopout (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, double **dAB_array, magma_int_t lddab, magma_int_t **ipiv_array, magma_int_t *info_array, void *device_work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue) |
| DGBTRF computes an LU factorization of a real m-by-n band matrix A using partial pivoting with row interchanges. | |
| magma_int_t | magma_dgbtrf_batched_sliding_window_loopin (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, double **dAB_array, magma_int_t lddab, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) |
| DGBTRF computes an LU factorization of a real m-by-n band matrix A using partial pivoting with row interchanges. | |
| magma_int_t | magma_dgetf2_nopiv_internal_batched (magma_int_t m, magma_int_t n, double **dA_array, magma_int_t ai, magma_int_t aj, magma_int_t ldda, magma_int_t *info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue) |
| dgetf2_nopiv computes the non-pivoting LU factorization of an M-by-N matrix A. | |
| magma_int_t | magma_dgetrf_batched_smallsq_noshfl (magma_int_t n, double **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) |
| dgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges. | |
| magma_int_t | magma_sgbsv_batched_fused_sm (magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, float **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, float **dB_array, magma_int_t lddb, magma_int_t *info_array, magma_int_t nthreads, magma_int_t ntcol, magma_int_t batchCount, magma_queue_t queue) |
| SGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices. | |
| magma_int_t | magma_sgbtrf_batched_fused_sm (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, float **dAB_array, magma_int_t lddab, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t nthreads, magma_int_t ntcol, magma_int_t batchCount, magma_queue_t queue) |
| SGBTRF computes an LU factorization of a real m-by-n band matrix A using partial pivoting with row interchanges. | |
| magma_int_t | magma_sgbtrf_batched_sliding_window_loopout (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, float **dAB_array, magma_int_t lddab, magma_int_t **ipiv_array, magma_int_t *info_array, void *device_work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue) |
| SGBTRF computes an LU factorization of a real m-by-n band matrix A using partial pivoting with row interchanges. | |
| magma_int_t | magma_sgbtrf_batched_sliding_window_loopin (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, float **dAB_array, magma_int_t lddab, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) |
| SGBTRF computes an LU factorization of a real m-by-n band matrix A using partial pivoting with row interchanges. | |
| magma_int_t | magma_sgetf2_nopiv_internal_batched (magma_int_t m, magma_int_t n, float **dA_array, magma_int_t ai, magma_int_t aj, magma_int_t ldda, magma_int_t *info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue) |
| sgetf2_nopiv computes the non-pivoting LU factorization of an M-by-N matrix A. | |
| magma_int_t | magma_sgetrf_batched_smallsq_noshfl (magma_int_t n, float **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) |
| sgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges. | |
| magma_int_t | magma_zgbsv_batched_fused_sm (magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, magmaDoubleComplex **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magmaDoubleComplex **dB_array, magma_int_t lddb, magma_int_t *info_array, magma_int_t nthreads, magma_int_t ntcol, magma_int_t batchCount, magma_queue_t queue) |
| ZGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices. | |
| magma_int_t | magma_zgbtrf_batched_fused_sm (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, magmaDoubleComplex **dAB_array, magma_int_t lddab, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t nthreads, magma_int_t ntcol, magma_int_t batchCount, magma_queue_t queue) |
| ZGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges. | |
| magma_int_t | magma_zgbtrf_batched_sliding_window_loopout (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, magmaDoubleComplex **dAB_array, magma_int_t lddab, magma_int_t **ipiv_array, magma_int_t *info_array, void *device_work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue) |
| ZGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges. | |
| magma_int_t | magma_zgbtrf_batched_sliding_window_loopin (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, magmaDoubleComplex **dAB_array, magma_int_t lddab, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) |
| ZGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges. | |
| magma_int_t | magma_zgetf2_nopiv_internal_batched (magma_int_t m, magma_int_t n, magmaDoubleComplex **dA_array, magma_int_t ai, magma_int_t aj, magma_int_t ldda, magma_int_t *info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue) |
| zgetf2_nopiv computes the non-pivoting LU factorization of an M-by-N matrix A. | |
| magma_int_t | magma_zgetrf_batched_smallsq_noshfl (magma_int_t n, magmaDoubleComplex **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue) |
| zgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges. | |
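The pivoting getrf routines listed above describe the per-matrix result in LAPACK xGETRF terms. Under that convention, L (unit diagonal, not stored) and U overwrite A. The pivot indices are 1-based, and a positive info value marks the first exactly-zero diagonal entry of U. The sketch below shows what each matrix in a batch receives, assuming those LAPACK conventions and column-major storage. It is a single-matrix, unblocked CPU reference in real double precision. The name `ref_dgetrf` is hypothetical, and this is not MAGMA's GPU kernel.

```c
#include <stddef.h>

/* Column-major access to A(i,j), 0-based, with leading dimension lda. */
#define A_(i, j) A[(size_t)(j) * (size_t)(lda) + (size_t)(i)]

static double absd(double x) { return x < 0.0 ? -x : x; }

/* Hypothetical reference: unblocked right-looking LU with partial pivoting
 * on one m-by-n matrix, following LAPACK xGETRF conventions (assumed):
 *   - on exit A holds L (unit diagonal, not stored) and U,
 *   - ipiv[k] is the 1-based row that was swapped with row k+1,
 *   - returns 0, or i > 0 if U(i,i) is exactly zero (factorization still completes). */
int ref_dgetrf(int m, int n, double *A, int lda, int *ipiv)
{
    int info = 0;
    int mn = m < n ? m : n;
    for (int k = 0; k < mn; ++k) {
        /* Pivot search: largest |A(i,k)| for i >= k. */
        int p = k;
        for (int i = k + 1; i < m; ++i)
            if (absd(A_(i, k)) > absd(A_(p, k)))
                p = i;
        ipiv[k] = p + 1;                     /* stored 1-based, as in LAPACK */
        if (A_(p, k) == 0.0) {
            if (info == 0) info = k + 1;     /* remember the first zero pivot */
            continue;                        /* column below is all zero: nothing to eliminate */
        }
        /* Swap rows k and p across the full row, so L stays consistent. */
        if (p != k)
            for (int j = 0; j < n; ++j) {
                double t = A_(k, j); A_(k, j) = A_(p, j); A_(p, j) = t;
            }
        /* Compute the multipliers, then apply the rank-1 update to the trailing block. */
        for (int i = k + 1; i < m; ++i)
            A_(i, k) /= A_(k, k);
        for (int j = k + 1; j < n; ++j)
            for (int i = k + 1; i < m; ++i)
                A_(i, j) -= A_(i, k) * A_(k, j);
    }
    return info;
}

#undef A_
```

For example, factoring the 2-by-2 matrix [[0, 1], [2, 3]] swaps its two rows, so ipiv[0] is 2 and the routine returns 0. The nopiv variants in the list skip the pivot search and the row swaps.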
magma_int_t magma_cgbsv_batched_work (magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, magmaFloatComplex **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, magmaFloatComplex **dB_array, magma_int_t lddb, magma_int_t *info_array, void *device_work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue)
CGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.
The LU decomposition with partial pivoting and row interchanges is used to factor A as A = L * U, where L is a product of permutation and unit lower triangular matrices with KL subdiagonals, and U is upper triangular with KL+KU superdiagonals. The factored form of A is then used to solve the system of equations A * X = B.
This is the batched version of the routine.
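For each matrix in the batch, this amounts to a band LU factorization followed by a forward and a backward triangular solve. The sketch below does that for one real n-by-n system on the CPU, in the style of LAPACK's unblocked xGBTF2 and xGBTRS. It assumes the LAPACK band layout, where the 0-based entry A(i,j) lives in row kl+ku+i-j of column j, which matches the LDDA >= 2*KL+KU+1 requirement. The name `ref_dgbsv` is hypothetical, and this is not MAGMA's implementation. It also uses double instead of magmaFloatComplex to keep the code short.

```c
#include <stddef.h>

/* Entry A(i,j), 0-based, inside factorization band storage (kv = kl + ku). */
#define AB_(i, j) AB[(size_t)(j) * (size_t)ldab + (size_t)(kv + (i) - (j))]

static double absd(double x) { return x < 0.0 ? -x : x; }

/* Hypothetical reference for ONE system: band LU with partial pivoting, then solve.
 * AB: ldab >= 2*kl+ku+1 rows, band stored from row kl (0-based); top kl rows = fill-in.
 * b: n-by-nrhs, column-major, ldb >= n; overwritten with X on success.
 * Returns 0, or i > 0 if U(i,i) is exactly zero (1-based); b is then left untouched. */
int ref_dgbsv(int n, int kl, int ku, int nrhs, double *AB, int ldab,
              int *ipiv, double *b, int ldb)
{
    const int kv = kl + ku;
    int info = 0;

    /* Rows 0..kl-1 receive fill-in created by row interchanges; start them at zero. */
    for (int j = 0; j < n; ++j)
        for (int r = 0; r < kl; ++r)
            AB[(size_t)j * ldab + r] = 0.0;

    for (int k = 0; k < n; ++k) {
        int last_row = k + kl < n - 1 ? k + kl : n - 1;  /* lowest nonzero of column k */
        int last_col = k + kv < n - 1 ? k + kv : n - 1;  /* widest reach of U's row k  */

        int p = k;
        for (int i = k + 1; i <= last_row; ++i)
            if (absd(AB_(i, k)) > absd(AB_(p, k)))
                p = i;
        ipiv[k] = p + 1;                                  /* 1-based pivot index */
        if (AB_(p, k) == 0.0) {
            if (info == 0) info = k + 1;
            continue;
        }
        /* Swap rows k and p in columns k..last_col only (earlier columns are final). */
        if (p != k)
            for (int j = k; j <= last_col; ++j) {
                double t = AB_(k, j); AB_(k, j) = AB_(p, j); AB_(p, j) = t;
            }
        /* Multipliers and trailing update, confined to the (widened) band.
         * A few explicit zeros outside the strict band may be touched; they stay zero. */
        for (int i = k + 1; i <= last_row; ++i) {
            AB_(i, k) /= AB_(k, k);
            for (int j = k + 1; j <= last_col; ++j)
                AB_(i, j) -= AB_(i, k) * AB_(k, j);
        }
    }
    if (info != 0)
        return info;

    for (int c = 0; c < nrhs; ++c) {
        double *x = b + (size_t)c * ldb;
        /* Forward solve: replay each recorded swap, then apply that column of L. */
        for (int k = 0; k < n; ++k) {
            int p = ipiv[k] - 1;
            if (p != k) { double t = x[k]; x[k] = x[p]; x[p] = t; }
            int last_row = k + kl < n - 1 ? k + kl : n - 1;
            for (int i = k + 1; i <= last_row; ++i)
                x[i] -= AB_(i, k) * x[k];
        }
        /* Backward solve with U, which has at most kl+ku superdiagonals. */
        for (int k = n - 1; k >= 0; --k) {
            x[k] /= AB_(k, k);
            int first_row = k - kv > 0 ? k - kv : 0;
            for (int i = first_row; i < k; ++i)
                x[i] -= AB_(i, k) * x[k];
        }
    }
    return 0;
}

#undef AB_
```

Row swaps are applied only to the trailing columns. For that reason, the forward solve interleaves each recorded swap with its column of multipliers, and it does not permute b once up front. This also explains why U gains up to KL extra superdiagonals of fill-in.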
| [in] | n | INTEGER The order of the matrix A. n >= 0. |
| [in] | kl | INTEGER The number of subdiagonals within the band of A. KL >= 0. |
| [in] | ku | INTEGER The number of superdiagonals within the band of A. KU >= 0. |
| [in] | nrhs | INTEGER The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0. |
| [in,out] | dA_array | Array of pointers, dimension (batchCount). Each is a COMPLEX array, dimension (LDDA,N). On entry, the band matrix A in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL need not be set. On exit, the details of the LU factorization of A, as computed by CGBTRF: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. |
| [in] | ldda | INTEGER The leading dimension of each array A. LDDA >= (2*KL+KU+1). |
| [in] | dipiv_array | Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). |
| [in,out] | dB_array | Array of pointers, dimension (batchCount). Each is a COMPLEX array, dimension (LDB,NRHS) On entry, the right hand side matrix B. On exit, the solution matrix X. |
| [in] | lddb | INTEGER The leading dimension of each array B. LDDB >= max(1, N). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices.
|
| [in,out] | device_work | Workspace, allocated on device memory. |
| [in,out] | lwork | INTEGER pointer The size of the workspace (device_work) in bytes
|
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
| magma_int_t magma_cgbsv_batched | ( | magma_int_t | n, |
| magma_int_t | kl, | ||
| magma_int_t | ku, | ||
| magma_int_t | nrhs, | ||
| magmaFloatComplex ** | dA_array, | ||
| magma_int_t | ldda, | ||
| magma_int_t ** | dipiv_array, | ||
| magmaFloatComplex ** | dB_array, | ||
| magma_int_t | lddb, | ||
| magma_int_t * | info_array, | ||
| magma_int_t | batchCount, | ||
| magma_queue_t | queue ) |
CGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.
The LU decomposition with partial pivoting and row interchanges is used to factor A as A = L * U, where L is a product of permutation and unit lower triangular matrices with KL subdiagonals, and U is upper triangular with KL+KU superdiagonals. The factored form of A is then used to solve the system of equations A * X = B.
This is the batched version of the routine.
| [in] | n | INTEGER The order of the matrix A. n >= 0. |
| [in] | kl | INTEGER The number of subdiagonals within the band of A. KL >= 0. |
| [in] | ku | INTEGER The number of superdiagonals within the band of A. KU >= 0. |
| [in] | nrhs | INTEGER The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0. |
| [in,out] | dA_array | Array of pointers, dimension (batchCount). Each is a COMPLEX array, dimension (LDDA,N). On entry, the matrix A in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. On exit, details of the LU factorization of A, as computed by CGBTRF: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. |
| [in] | ldda | INTEGER The leading dimension of each array A. LDDA >= (2*KL+KU+1). |
| [out] | dipiv_array | Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices that define the permutation matrix P; row i of the matrix was interchanged with row IPIV(i). |
| [in,out] | dB_array | Array of pointers, dimension (batchCount). Each is a COMPLEX array, dimension (LDDB,NRHS). On entry, the right hand side matrix B. On exit, the solution matrix X. |
| [in] | lddb | INTEGER The leading dimension of each array B. LDDB >= max(1, N). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
| magma_int_t magma_cgbtrf_batched_work | ( | magma_int_t | m, |
| magma_int_t | n, | ||
| magma_int_t | kl, | ||
| magma_int_t | ku, | ||
| magmaFloatComplex ** | dAB_array, | ||
| magma_int_t | lddab, | ||
| magma_int_t ** | dipiv_array, | ||
| magma_int_t * | info_array, | ||
| void * | device_work, | ||
| magma_int_t * | lwork, | ||
| magma_int_t | batchCount, | ||
| magma_queue_t | queue ) |
CGBTRF computes an LU factorization of a complex m-by-n band matrix AB using partial pivoting with row interchanges.
This is a batched version that factors batchCount M-by-N matrices in parallel. dAB, dipiv, and info become arrays with one entry per matrix.
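Every batched argument is an array of per-matrix pointers. One possible layout stores all matrices in a single buffer at a fixed stride and builds the pointer array from that buffer. The C sketch below shows this bookkeeping on the host. `fill_pointer_array` is a hypothetical helper, not a MAGMA function, and `float` stands in for `magmaFloatComplex`. In real MAGMA code, both the pointer array and the matrices it points to must reside in GPU memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of MAGMA): set ptr_array[k] to the start of
 * matrix k inside one contiguous buffer in which consecutive matrices are
 * `stride` elements apart (for example stride = lddab * n). */
static void fill_pointer_array(float **ptr_array, float *base,
                               size_t stride, int batchCount)
{
    assert(batchCount >= 0);
    for (int k = 0; k < batchCount; ++k)
        ptr_array[k] = base + (size_t)k * stride;
}
```

The pivot and info arrays follow the same pattern: one pivot vector pointer per matrix, and one INFO entry per matrix.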
The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:
On entry:                         On exit:
  *    *    *    +    +    +        *    *    *  u14  u25  u36
  *    *    +    +    +    +        *    *  u13  u24  u35  u46
  *  a12  a23  a34  a45  a56        *  u12  u23  u34  u45  u56
a11  a22  a33  a44  a55  a66      u11  u22  u33  u44  u55  u66
a21  a32  a43  a54  a65    *      m21  m32  m43  m54  m65    *
a31  a42  a53  a64    *    *      m31  m42  m53  m64    *    *
This behavior differs slightly from the standard LAPACK routine: array elements marked * are not read by the routine, but may be zeroed out after completion. Elements marked + need not be set on entry, but the routine requires them to store elements of U that arise from fill-in caused by the row interchanges.
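The band layout can be prepared on the host by applying the rule AB(kl+ku+1+i-j, j) = A(i,j), which is stated in the dAB_array description, to a dense column-major matrix. The sketch below is illustrative only and is not part of MAGMA. `pack_band` is a hypothetical helper, and `double` stands in for the COMPLEX element type. The packed array would still need to be copied to GPU memory before the call.

```c
#include <assert.h>
#include <stddef.h>

/* Pack a dense column-major m-by-n matrix A (leading dimension lda) into
 * band storage AB with ldab >= 2*kl + ku + 1, using the 1-based rule
 * AB(kl+ku+1+i-j, j) = A(i, j) for max(1, j-ku) <= i <= min(m, j+kl).
 * Every column of AB is cleared first, so the fill-in rows (1..kl) and the
 * unused corners end up as zeros. */
static void pack_band(int m, int n, int kl, int ku,
                      const double *A, int lda, double *AB, int ldab)
{
    assert(ldab >= 2 * kl + ku + 1);
    for (int j = 1; j <= n; ++j) {
        for (int r = 0; r < ldab; ++r)
            AB[(size_t)(j - 1) * ldab + r] = 0.0;
        int ilo = (j - ku > 1) ? j - ku : 1;
        int ihi = (j + kl < m) ? j + kl : m;
        for (int i = ilo; i <= ihi; ++i) {
            int r = kl + ku + 1 + i - j;          /* 1-based band row */
            AB[(size_t)(j - 1) * ldab + (r - 1)] =
                A[(size_t)(j - 1) * lda + (i - 1)];
        }
    }
}
```

With KL = KU = 1, the diagonal lands in band row 3, the superdiagonal in row 2, the subdiagonal in row 4, and row 1 is left as fill-in space.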
| [in] | m | INTEGER The number of rows of each matrix A. M >= 0. |
| [in] | n | INTEGER The number of columns of each matrix A. N >= 0. |
| [in] | kl | INTEGER The number of subdiagonals within the band of A. KL >= 0. |
| [in] | ku | INTEGER The number of superdiagonals within the band of A. KU >= 0. |
| [in,out] | dAB_array | Array of pointers, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDAB,N). On entry, the matrix A in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl). On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See above for details about the band storage. |
| [in] | lddab | INTEGER The leading dimension of each array AB. LDDAB >= (2*KL+KU+1). |
| [out] | dipiv_array | Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. Each is the INFO output for a given matrix: = 0: successful exit; < 0: if INFO = -i, the i-th argument had an illegal value; > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations. |
| [in,out] | device_work | Workspace, allocated on device memory. |
| [in,out] | lwork | INTEGER pointer The size of the workspace (device_work) in bytes. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
| magma_int_t magma_cgbtrf_batched | ( | magma_int_t | m, |
| magma_int_t | n, | ||
| magma_int_t | kl, | ||
| magma_int_t | ku, | ||
| magmaFloatComplex ** | dAB_array, | ||
| magma_int_t | lddab, | ||
| magma_int_t ** | dipiv_array, | ||
| magma_int_t * | info_array, | ||
| magma_int_t | batchCount, | ||
| magma_queue_t | queue ) |
CGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges.
This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.
| [in] | m | INTEGER The number of rows of each matrix A. M >= 0. |
| [in] | n | INTEGER The number of columns of each matrix A. N >= 0. |
| [in] | kl | INTEGER The number of subdiagonals within the band of A. KL >= 0. |
| [in] | ku | INTEGER The number of superdiagonals within the band of A. KU >= 0. |
| [in,out] | dAB_array | Array of pointers, dimension (batchCount). Each is a COMPLEX array, dimension (LDDAB,N). On entry, the matrix A in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl). On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See below for further details. |
| [in] | lddab | INTEGER The leading dimension of each array AB. LDDAB >= 2*KL+KU+1. |
| [out] | dipiv_array | Array of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). |
| [out] | info_array | INTEGER array, dimension (batchCount). Each is the INFO output for a given matrix: = 0: successful exit; < 0: if INFO = -i, the i-th argument had an illegal value; > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
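Because each matrix reports its own INFO value, the caller has to inspect info_array entry by entry after the call returns. The following is a minimal host-side sketch. It assumes that info_array lives in GPU memory and has already been copied back to the host, and that `int` can stand in for `magma_int_t`. `count_failures` is a hypothetical name.

```c
#include <assert.h>

/* Hypothetical helper: scan a host copy of info_array. By the convention
 * above, 0 means success, a negative value flags an illegal argument, and
 * a positive value i means U(i,i) is exactly zero (1-based).
 * Returns how many matrices have nonzero INFO and stores the 0-based index
 * of the first such matrix in *first_bad (or -1 if every matrix succeeded). */
static int count_failures(const int *info_array, int batchCount,
                          int *first_bad)
{
    assert(batchCount >= 0);
    int failures = 0;
    *first_bad = -1;
    for (int k = 0; k < batchCount; ++k) {
        if (info_array[k] != 0) {
            if (*first_bad < 0)
                *first_bad = k;
            ++failures;
        }
    }
    return failures;
}
```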
The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:
On entry:                         On exit:
  *    *    *    +    +    +        *    *    *  u14  u25  u36
  *    *    +    +    +    +        *    *  u13  u24  u35  u46
  *  a12  a23  a34  a45  a56        *  u12  u23  u34  u45  u56
a11  a22  a33  a44  a55  a66      u11  u22  u33  u44  u55  u66
a21  a32  a43  a54  a65    *      m21  m32  m43  m54  m65    *
a31  a42  a53  a64    *    *      m31  m42  m53  m64    *    *
Array elements marked * are not used by the routine. Elements marked + need not be set on entry, but the routine requires them to store elements of U that arise from fill-in caused by the row interchanges.
| magma_int_t magma_cgbtrs_batched | ( | magma_trans_t | transA, |
| magma_int_t | n, | ||
| magma_int_t | kl, | ||
| magma_int_t | ku, | ||
| magma_int_t | nrhs, | ||
| magmaFloatComplex ** | dA_array, | ||
| magma_int_t | ldda, | ||
| magma_int_t ** | dipiv_array, | ||
| magmaFloatComplex ** | dB_array, | ||
| magma_int_t | lddb, | ||
| magma_int_t * | info_array, | ||
| magma_int_t | batchCount, | ||
| magma_queue_t | queue ) |
CGBTRS solves a system of linear equations A * X = B, A**T * X = B, or A**H * X = B with a general band matrix A using the LU factorization computed by CGBTRF.
This is the batched version of the routine. Currently, only the non-transposed system A * X = B is supported.
| [in] | transA | magma_trans_t Specifies the form of the system of equations. Currently, only MagmaNoTrans (A * X = B) is supported. |
| [in] | n | INTEGER The order of the matrix A. n >= 0. |
| [in] | kl | INTEGER The number of subdiagonals within the band of A. KL >= 0. |
| [in] | ku | INTEGER The number of superdiagonals within the band of A. KU >= 0. |
| [in] | nrhs | INTEGER The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0. |
| [in] | dA_array | Array of pointers, dimension (batchCount). Each contains the details of the LU factorization of the band matrix A, as computed by CGBTRF. U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. |
| [in] | ldda | INTEGER The leading dimension of each array A. LDDA >= (2*KL+KU+1). |
| [in] | dipiv_array | Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i). |
| [in,out] | dB_array | Array of pointers, dimension (batchCount). Each is a COMPLEX array, dimension (LDDB,NRHS). On entry, the right hand side matrix B. On exit, the solution matrix X. |
| [in] | lddb | INTEGER The leading dimension of each array B. LDDB >= max(1, N). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
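The sketch below shows how the solve uses the band layout of the factors. It is a host reference for a single right-hand side that follows the structure of the LAPACK xGBTRS no-transpose algorithm. It first applies each row interchange and eliminates with the stored multipliers, and then back-substitutes with the banded U. This is not MAGMA's GPU implementation. `gbtrs_notrans_ref` and `ab_get` are hypothetical names, `double` stands in for COMPLEX, and pivots are assumed to be 1-based, as in the parameter descriptions.

```c
#include <assert.h>
#include <stddef.h>

/* 1-based (r, c) access to the column-major band array AB. */
static double ab_get(const double *AB, int ldab, int r, int c)
{
    return AB[(size_t)(c - 1) * ldab + (size_t)(r - 1)];
}

/* Hypothetical reference: solve A*x = b for one right-hand side, given the
 * band LU factors in AB (ldab >= 2*kl+ku+1) and 1-based pivots ipiv.
 * b has length n and is overwritten with the solution x. */
static void gbtrs_notrans_ref(int n, int kl, int ku, const double *AB,
                              int ldab, const int *ipiv, double *b)
{
    assert(ldab >= 2 * kl + ku + 1);
    int kd = kl + ku + 1;              /* band row holding the diagonal */

    /* Forward solve L*y = P*b: apply interchange j, then subtract the
     * multipliers stored in rows kd+1 .. kd+lm of column j. */
    if (kl > 0) {
        for (int j = 1; j <= n - 1; ++j) {
            int lm = (kl < n - j) ? kl : n - j;
            int l = ipiv[j - 1];
            if (l != j) {
                double t = b[l - 1];
                b[l - 1] = b[j - 1];
                b[j - 1] = t;
            }
            for (int i = 1; i <= lm; ++i)
                b[j + i - 1] -= ab_get(AB, ldab, kd + i, j) * b[j - 1];
        }
    }

    /* Back solve U*x = y: U has kl+ku superdiagonals and U(i,j) is stored
     * at AB(kd+i-j, j). */
    int k = kl + ku;
    for (int j = n; j >= 1; --j) {
        b[j - 1] /= ab_get(AB, ldab, kd, j);
        int ilo = (j - k > 1) ? j - k : 1;
        for (int i = ilo; i < j; ++i)
            b[i - 1] -= ab_get(AB, ldab, kd + i - j, j) * b[j - 1];
    }
}
```

U is allowed kl+ku superdiagonals rather than ku because row interchanges can shift entries of U up to kl positions above the original band. This is the same fill-in that the rows marked + make room for.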
| magma_int_t magma_cgetrf_batched | ( | magma_int_t | m, |
| magma_int_t | n, | ||
| magmaFloatComplex ** | dA_array, | ||
| magma_int_t | ldda, | ||
| magma_int_t ** | ipiv_array, | ||
| magma_int_t * | info_array, | ||
| magma_int_t | batchCount, | ||
| magma_queue_t | queue ) |
CGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.
| [in] | m | INTEGER The number of rows of each matrix A. M >= 0. |
| [in] | n | INTEGER The number of columns of each matrix A. N >= 0. |
| [in,out] | dA_array | Array of pointers, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. |
| [in] | ldda | INTEGER The leading dimension of each array A. LDDA >= max(1,M). |
| [out] | ipiv_array | Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
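For a single matrix, the factorization A = P * L * U described above can be illustrated with the unblocked xGETF2 algorithm. This host sketch is only meant to show the output layout: L (unit diagonal not stored) below the diagonal, U on and above it, and 1-based pivots. The routine documented here is a blocked, right-looking GPU implementation and differs internally. `getf2_ref` is a hypothetical name, and `double` stands in for COMPLEX.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference: unblocked LU with partial pivoting of one
 * column-major m-by-n matrix. On exit A holds L (unit diagonal implied)
 * and U; ipiv[j] is the 1-based row interchanged with row j+1.
 * Returns 0, or j > 0 if U(j,j) is exactly zero (1-based). */
static int getf2_ref(int m, int n, double *A, int lda, int *ipiv)
{
    assert(lda >= (m > 1 ? m : 1));
    int info = 0;
    int mn = (m < n) ? m : n;
    for (int j = 0; j < mn; ++j) {
        /* pivot: entry of largest magnitude in column j, rows j..m-1 */
        int p = j;
        double best = -1.0;
        for (int i = j; i < m; ++i) {
            double v = A[(size_t)j * lda + i];
            double av = (v < 0.0) ? -v : v;
            if (av > best) { best = av; p = i; }
        }
        ipiv[j] = p + 1;
        if (A[(size_t)j * lda + p] == 0.0) {
            if (info == 0)
                info = j + 1;        /* exactly singular: record and go on */
            continue;                /* column below is zero, nothing to do */
        }
        if (p != j)                  /* interchange full rows j and p */
            for (int c = 0; c < n; ++c) {
                double t = A[(size_t)c * lda + j];
                A[(size_t)c * lda + j] = A[(size_t)c * lda + p];
                A[(size_t)c * lda + p] = t;
            }
        /* multipliers, then rank-1 update of the trailing submatrix */
        double piv = A[(size_t)j * lda + j];
        for (int i = j + 1; i < m; ++i)
            A[(size_t)j * lda + i] /= piv;
        for (int c = j + 1; c < n; ++c)
            for (int i = j + 1; i < m; ++i)
                A[(size_t)c * lda + i] -=
                    A[(size_t)j * lda + i] * A[(size_t)c * lda + j];
    }
    return info;
}
```

As with the positive INFO values described above, a zero pivot does not stop the factorization. It is only recorded.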
| magma_int_t magma_cgetrf_nopiv_vbatched_max_nocheck_work | ( | magma_int_t * | m, |
| magma_int_t * | n, | ||
| magma_int_t | max_m, | ||
| magma_int_t | max_n, | ||
| magma_int_t | max_minmn, | ||
| magma_int_t | max_mxn, | ||
| magmaFloatComplex ** | dA_array, | ||
| magma_int_t * | ldda, | ||
| float * | dtol_array, | ||
| float | eps, | ||
| magma_int_t * | info_array, | ||
| void * | work, | ||
| magma_int_t * | lwork, | ||
| magma_int_t | batchCount, | ||
| magma_queue_t | queue ) |
CGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting.
It replaces tiny pivots smaller than a specified tolerance with that tolerance.
The factorization has the form A = L * U where L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.
| [in] | m | Array of INTEGERs on the GPU, dimension (batchCount). Each is the number of rows of each matrix A. M[i] >= 0. |
| [in] | n | Array of INTEGERs on the GPU, dimension (batchCount). Each is the number of columns of each matrix A. N[i] >= 0. |
| [in] | max_m | INTEGER The maximum number of rows across the batch. |
| [in] | max_n | INTEGER The maximum number of columns across the batch. |
| [in] | max_minmn | INTEGER The maximum value of min(M[i], N[i]) over all matrices in the batch. |
| [in] | max_mxn | INTEGER The maximum value of the product M[i] x N[i] over all matrices in the batch. |
| [in,out] | dA_array | Array of pointers on the GPU, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDA[i],N[i]). On entry, each pointer is an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = L*U; the unit diagonal elements of L are not stored. |
| [in] | ldda | Array of INTEGERs on the GPU. Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]). |
| [in] | dtol_array | Array of REALs, dimension (batchCount), for corresponding matrices. Each is the tolerance that is compared to the diagonal element before the column is scaled by its inverse. If the magnitude of the diagonal element is less than the threshold, the diagonal element is replaced by the threshold. If the array is set to NULL, the threshold is set to the eps parameter. |
| [in] | eps | REAL The tolerance to use for all matrices if dtol_array is NULL. |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | work | VOID pointer A workspace of size lwork[0] bytes. |
| [in,out] | lwork | INTEGER pointer If lwork[0] < 0, a workspace query is assumed, and lwork[0] is overwritten by the required workspace size in bytes. Otherwise, lwork[0] is the size of work in bytes. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
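The `_max_nocheck` variants rely on the caller to supply max_m, max_n, max_minmn and max_mxn. The sketch below computes these values on the host from host copies of the size arrays, assuming `int` for `magma_int_t`. `compute_batch_maxima` and `batch_maxima` are hypothetical names.

```c
#include <assert.h>

/* Hypothetical result type holding the four batch-wide maxima. */
typedef struct { int max_m, max_n, max_minmn, max_mxn; } batch_maxima;

/* Scan host copies of m[] and n[] (the routine itself expects these as GPU
 * arrays). Use a wider type than int if m[i] * n[i] can overflow. */
static batch_maxima compute_batch_maxima(const int *m, const int *n,
                                         int batchCount)
{
    batch_maxima r = {0, 0, 0, 0};
    for (int i = 0; i < batchCount; ++i) {
        assert(m[i] >= 0 && n[i] >= 0);
        int minmn = (m[i] < n[i]) ? m[i] : n[i];
        int mxn = m[i] * n[i];
        if (m[i] > r.max_m) r.max_m = m[i];
        if (n[i] > r.max_n) r.max_n = n[i];
        if (minmn > r.max_minmn) r.max_minmn = minmn;
        if (mxn > r.max_mxn) r.max_mxn = mxn;
    }
    return r;
}
```

In general, max_minmn is not equal to min(max_m, max_n). For sizes 10x2, 3x9 and 7x7 it is 7, not 9, so it cannot be derived from max_m and max_n alone.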
| magma_int_t magma_cgetrf_nopiv_expert_vbatched | ( | magma_int_t * | m, |
| magma_int_t * | n, | ||
| magmaFloatComplex ** | dA_array, | ||
| magma_int_t * | ldda, | ||
| float * | dtol_array, | ||
| float | eps, | ||
| magma_int_t * | info_array, | ||
| magma_int_t | batchCount, | ||
| magma_queue_t | queue ) |
CGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting.
It replaces tiny pivots smaller than a specified tolerance by that tolerance.
The factorization has the form A = L * U where L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.
This is the expert version, which takes extra parameters that set the tolerance for diagonal elements. Diagonal elements smaller than the specified tolerance are replaced by that tolerance with their sign preserved, and the info array reports the number of replacements. This is useful for static pivoting in sparse solvers such as SuperLU, where the tolerance might be, for example, the norm of the matrix scaled by the machine epsilon.
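The tolerance-replacement rule can be illustrated with a small host reference of LU without pivoting. This sketch is not MAGMA's implementation and rests on several assumptions: `double` stands in for COMPLEX (complex data is not modeled here), `getrf_nopiv_tol_ref` is a hypothetical name, and the return value counts replacements, following the info array described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference: LU without pivoting of one column-major m-by-n
 * matrix. Before column j is scaled, a diagonal whose magnitude is below
 * tol is replaced by tol with the original sign (a zero diagonal becomes
 * +tol). Returns the number of replaced diagonal elements. */
static int getrf_nopiv_tol_ref(int m, int n, double *A, int lda, double tol)
{
    assert(tol > 0.0 && lda >= (m > 1 ? m : 1));
    int replaced = 0;
    int mn = (m < n) ? m : n;
    for (int j = 0; j < mn; ++j) {
        double *d = &A[(size_t)j * lda + j];
        double mag = (*d < 0.0) ? -*d : *d;
        if (mag < tol) {
            *d = (*d < 0.0) ? -tol : tol;
            ++replaced;
        }
        /* scale the column below the diagonal by the (possibly replaced) pivot */
        for (int i = j + 1; i < m; ++i)
            A[(size_t)j * lda + i] /= *d;
        /* rank-1 update of the trailing submatrix */
        for (int c = j + 1; c < n; ++c)
            for (int i = j + 1; i < m; ++i)
                A[(size_t)c * lda + i] -=
                    A[(size_t)j * lda + i] * A[(size_t)c * lda + j];
    }
    return replaced;
}
```

A strictly positive tolerance guarantees that the scaling step never divides by zero, even when a pivot is exactly zero.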
| [in] | m | Array of INTEGERs on the GPU, dimension (batchCount). Each is the number of rows of each matrix A. M[i] >= 0. |
| [in] | n | Array of INTEGERs on the GPU, dimension (batchCount). Each is the number of columns of each matrix A. N[i] >= 0. |
| [in,out] | dA_array | Array of pointers on the GPU, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDA[i],N[i]). On entry, each pointer is an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = L*U; the unit diagonal elements of L are not stored. |
| [in] | ldda | Array of INTEGERs on the GPU. Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]). |
| [in] | dtol_array | Array of REALs, dimension (batchCount), for corresponding matrices. Each is the tolerance that is compared to the diagonal element before the column is scaled by its inverse. If the magnitude of the diagonal element is less than the threshold, the diagonal element is replaced by the threshold. If the array is set to NULL, the threshold is set to the eps parameter. |
| [in] | eps | REAL The tolerance to use for all matrices if dtol_array is NULL. |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
| magma_int_t magma_cgetrf_nopiv_vbatched | ( | magma_int_t * | m, |
| magma_int_t * | n, | ||
| magmaFloatComplex ** | dA_array, | ||
| magma_int_t * | ldda, | ||
| magma_int_t * | info_array, | ||
| magma_int_t | batchCount, | ||
| magma_queue_t | queue ) |
CGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting.
The factorization has the form A = L * U where L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.
| [in] | m | Array of INTEGERs on the GPU, dimension (batchCount). Each is the number of rows of each matrix A. M[i] >= 0. |
| [in] | n | Array of INTEGERs on the GPU, dimension (batchCount). Each is the number of columns of each matrix A. N[i] >= 0. |
| [in,out] | dA_array | Array of pointers on the GPU, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDA[i],N[i]). On entry, each pointer is an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = L*U; the unit diagonal elements of L are not stored. |
| [in] | ldda | Array of INTEGERs on the GPU. Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
| magma_int_t magma_cgetrf_recpanel_batched | ( | magma_int_t | m, |
| magma_int_t | n, | ||
| magma_int_t | min_recpnb, | ||
| magmaFloatComplex ** | dA_array, | ||
| magma_int_t | ai, | ||
| magma_int_t | aj, | ||
| magma_int_t | ldda, | ||
| magma_int_t ** | dipiv_array, | ||
| magma_int_t ** | dpivinfo_array, | ||
| magma_int_t * | info_array, | ||
| magma_int_t | gbstep, | ||
| magma_int_t | batchCount, | ||
| magma_queue_t | queue ) |
This is an internal routine that may rely on several undocumented assumptions.
Its documentation is not complete.
CGETRF_PANEL computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.
| [in] | m | INTEGER The number of rows of each matrix A. M >= 0. |
| [in] | n | INTEGER The number of columns of each matrix A. N >= 0. |
| [in] | min_recpnb | INTEGER. Internal use. The recursive nb |
| [in,out] | dA_array | Array of pointers, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. |
| [in] | ai | INTEGER Row offset for A. |
| [in] | aj | INTEGER Column offset for A. |
| [in] | ldda | INTEGER The leading dimension of each array A. LDDA >= max(1,M). |
| [out] | dipiv_array | Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). |
| [out] | dpivinfo_array | Array of pointers, dimension (batchCount), for internal use. |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | gbstep | INTEGER internal use. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
| magma_int_t magma_cgetrf_recpanel_native | ( | magma_int_t | m, |
| magma_int_t | n, | ||
| magma_int_t | recnb, | ||
| magmaFloatComplex_ptr | dA, | ||
| magma_int_t | ldda, | ||
| magma_int_t * | dipiv, | ||
| magma_int_t * | dipivinfo, | ||
| magma_int_t * | dinfo, | ||
| magma_int_t | gbstep, | ||
| magma_event_t | events[2], | ||
| magma_queue_t | queue, | ||
| magma_queue_t | update_queue ) |
This is an internal routine.
CGETRF_PANEL computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is a GPU-only routine. The host CPU is not used.
| [in] | m | INTEGER The number of rows of the matrix A. M >= 0. |
| [in] | n | INTEGER The number of columns of the matrix A. N >= 0. |
| [in,out] | dA | A COMPLEX array on the GPU, dimension (LDDA,N). On entry, an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. |
| [in] | ldda | INTEGER The leading dimension of A. LDDA >= max(1,M). |
| [out] | dipiv | An INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). |
| [out] | dipivinfo | An INTEGER array, for internal use. |
| [out] | dinfo | INTEGER, stored on the GPU. |
| [in] | gbstep | INTEGER internal use. |
| [in] | queues | Array of magma_queue_t, size 2 Queues to execute in. |
| magma_int_t magma_cgetrf_vbatched_max_nocheck_work | ( | magma_int_t * | m, |
| magma_int_t * | n, | ||
| magma_int_t | max_m, | ||
| magma_int_t | max_n, | ||
| magma_int_t | max_minmn, | ||
| magma_int_t | max_mxn, | ||
| magmaFloatComplex ** | dA_array, | ||
| magma_int_t * | ldda, | ||
| magma_int_t ** | dipiv_array, | ||
| magma_int_t * | info_array, | ||
| void * | work, | ||
| magma_int_t * | lwork, | ||
| magma_int_t | batchCount, | ||
| magma_queue_t | queue ) |
CGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.
| [in] | m | Array of INTEGERs on the GPU, dimension (batchCount). Each is the number of rows of each matrix A. M[i] >= 0. |
| [in] | n | Array of INTEGERs on the GPU, dimension (batchCount). Each is the number of columns of each matrix A. N[i] >= 0. |
| [in] | max_m | INTEGER The maximum number of rows across the batch. |
| [in] | max_n | INTEGER The maximum number of columns across the batch. |
| [in] | max_minmn | INTEGER The maximum value of min(M[i], N[i]) over all matrices in the batch. |
| [in] | max_mxn | INTEGER The maximum value of the product M[i] x N[i] over all matrices in the batch. |
| [in,out] | dA_array | Array of pointers on the GPU, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDA[i],N[i]). On entry, each pointer is an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. |
| [in] | ldda | Array of INTEGERs on the GPU. Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]). |
| [out] | dipiv_array | Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M[i],N[i])) The pivot indices; for 1 <= p <= min(M[i],N[i]), row p of the matrix was interchanged with row IPIV(p). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | work | VOID pointer A workspace of size lwork[0] bytes. |
| [in,out] | lwork | INTEGER pointer If lwork[0] < 0, a workspace query is assumed, and lwork[0] is overwritten by the required workspace size in bytes. Otherwise, lwork[0] is the size of work in bytes. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
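The lwork convention above, in which a negative lwork[0] requests a size query, implies a two-call pattern: query the size, allocate, then call again. The sketch below demonstrates the pattern with `example_routine_work`, a hypothetical stand-in that is not a MAGMA function and uses a made-up workspace requirement. With the real routine, the workspace would have to be allocated in GPU memory rather than with `malloc`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in (NOT a MAGMA function) that follows the documented
 * convention: if lwork[0] < 0, only report the required size in bytes. */
static int example_routine_work(int n, void *work, int *lwork)
{
    int required = n * (int)sizeof(double);    /* made-up requirement */
    if (lwork[0] < 0) {                         /* workspace query */
        lwork[0] = required;
        return 0;
    }
    if (work == NULL || lwork[0] < required)
        return -1;                              /* workspace too small */
    /* ... a real routine would perform its computation using work ... */
    return 0;
}

/* Caller pattern: query the size, allocate, then call with the workspace. */
static int call_with_workspace(int n)
{
    int lwork = -1;
    example_routine_work(n, NULL, &lwork);      /* size query */
    assert(lwork >= 0);
    void *work = malloc(lwork > 0 ? (size_t)lwork : 1);
    if (work == NULL)
        return -2;
    int info = example_routine_work(n, work, &lwork);
    free(work);
    return info;
}
```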
| magma_int_t magma_cgetrf_vbatched | ( | magma_int_t * | m, |
| magma_int_t * | n, | ||
| magmaFloatComplex ** | dA_array, | ||
| magma_int_t * | ldda, | ||
| magma_int_t ** | dipiv_array, | ||
| magma_int_t * | info_array, | ||
| magma_int_t | batchCount, | ||
| magma_queue_t | queue ) |
CGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.
| [in] | m | Array of INTEGERs on the GPU, dimension (batchCount). Each is the number of rows of each matrix A. M[i] >= 0. |
| [in] | n | Array of INTEGERs on the GPU, dimension (batchCount). Each is the number of columns of each matrix A. N[i] >= 0. |
| [in,out] | dA_array | Array of pointers on the GPU, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDA[i],N[i]). On entry, each pointer is an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. |
| [in] | ldda | Array of INTEGERs on the GPU. Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]). |
| [out] | dipiv_array | Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M[i],N[i])) The pivot indices; for 1 <= p <= min(M[i],N[i]), row p of the matrix was interchanged with row IPIV(p). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
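Matrix i produces min(M[i], N[i]) pivot indices, so the pivot vectors in a variable-size batch have different lengths. One possible layout packs them into a single buffer and records where each vector starts, as sketched below with the hypothetical helper `pivot_offsets`. Here `int` stands in for `magma_int_t`. Each dipiv_array[i] would then point to base + offset[i] in GPU memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: compute the starting offset of each matrix's pivot
 * vector inside one contiguous buffer. Matrix i needs min(m[i], n[i])
 * entries. Returns the total number of entries the buffer must hold. */
static size_t pivot_offsets(const int *m, const int *n, int batchCount,
                            size_t *offset)
{
    size_t total = 0;
    for (int i = 0; i < batchCount; ++i) {
        assert(m[i] >= 0 && n[i] >= 0);
        offset[i] = total;
        total += (size_t)((m[i] < n[i]) ? m[i] : n[i]);
    }
    return total;
}
```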
| magma_int_t magma_dgbsv_batched_work | ( | magma_int_t | n, |
| magma_int_t | kl, | ||
| magma_int_t | ku, | ||
| magma_int_t | nrhs, | ||
| double ** | dA_array, | ||
| magma_int_t | ldda, | ||
| magma_int_t ** | dipiv_array, | ||
| double ** | dB_array, | ||
| magma_int_t | lddb, | ||
| magma_int_t * | info_array, | ||
| void * | device_work, | ||
| magma_int_t * | lwork, | ||
| magma_int_t | batchCount, | ||
| magma_queue_t | queue ) |
DGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.
The LU decomposition with partial pivoting and row interchanges is used to factor A as A = L * U, where L is a product of permutation and unit lower triangular matrices with KL subdiagonals, and U is upper triangular with KL+KU superdiagonals. The factored form of A is then used to solve the system of equations A * X = B.
This is the batched version of the routine.
| [in] | n | INTEGER The order of the matrix A. n >= 0. |
| [in] | kl | INTEGER The number of subdiagonals within the band of A. KL >= 0. |
| [in] | ku | INTEGER The number of superdiagonals within the band of A. KL >= 0. |
| [in] | nrhs | INTEGER The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0. |
| [in] | dA_array | Array of pointers, dimension (batchCount). Each contains the details of the LU factorization of the band matrix A, as computed by DGBTRF. U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. |
| [in] | ldda | INTEGER The leading dimension of each array A. LDDA >= (2*KL+KU+1). |
| [in] | dipiv_array | Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). |
| [in,out] | dB_array | Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array, dimension (LDB,NRHS) On entry, the right hand side matrix B. On exit, the solution matrix X. |
| [in] | lddb | INTEGER The leading dimension of each array B. LDDB >= max(1, N). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices.
|
| [in,out] | device_work | Workspace, allocated on device memory. |
| [in,out] | lwork | INTEGER pointer. The size of the workspace (device_work) in bytes. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
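Every batched argument above (dA_array, dB_array, dipiv_array) is an array of pointers with one entry per matrix. A common way to build such an array is to carve one contiguous allocation into equal slices. The host-side sketch below assumes that layout, with a fixed stride between matrices (for example LDDA*N elements for an order-N band matrix). The helper name `set_pointer_array` is hypothetical and not part of MAGMA. In real use the pointer arrays and the buffers they point to are expected to reside in GPU memory (the variable-size routines below state this explicitly), so the plain host arrays here are only stand-ins.

```c
#include <assert.h>
#include <stddef.h>

/* Fill ptrs[k] = base + k*stride for k = 0..batchCount-1, so matrix k
   occupies elements [k*stride, (k+1)*stride) of one contiguous buffer.
   For a band matrix of order n, stride would typically be ldda * n. */
static void set_pointer_array(double **ptrs, double *base,
                              size_t stride, int batchCount)
{
    assert(batchCount >= 0);
    for (int k = 0; k < batchCount; ++k)
        ptrs[k] = base + (size_t)k * stride;
}
```

Using one allocation keeps the setup to a single allocate/free pair; separately allocated matrices work just as well, since the routines only see the pointer array.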
| magma_int_t magma_dgbsv_batched | ( | magma_int_t | n, |
| magma_int_t | kl, | ||
| magma_int_t | ku, | ||
| magma_int_t | nrhs, | ||
| double ** | dA_array, | ||
| magma_int_t | ldda, | ||
| magma_int_t ** | dipiv_array, | ||
| double ** | dB_array, | ||
| magma_int_t | lddb, | ||
| magma_int_t * | info_array, | ||
| magma_int_t | batchCount, | ||
| magma_queue_t | queue ) |
DGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.
The LU decomposition with partial pivoting and row interchanges is used to factor A as A = L * U, where L is a product of permutation and unit lower triangular matrices with KL subdiagonals, and U is upper triangular with KL+KU superdiagonals. The factored form of A is then used to solve the system of equations A * X = B.
This is the batched version of the routine.
| [in] | n | INTEGER The order of the matrix A. n >= 0. |
| [in] | kl | INTEGER The number of subdiagonals within the band of A. KL >= 0. |
| [in] | ku | INTEGER The number of superdiagonals within the band of A. KU >= 0. |
| [in] | nrhs | INTEGER The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0. |
| [in,out] | dA_array | Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array, dimension (LDDA,N). On entry, the matrix A in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. On exit, details of the LU factorization of A, as computed by DGBTRF: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. |
| [in] | ldda | INTEGER The leading dimension of each array A. LDDA >= (2*KL+KU+1). |
| [out] | dipiv_array | Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices that define the permutation matrix; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i). |
| [in,out] | dB_array | Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array, dimension (LDDB,NRHS). On entry, the right hand side matrix B. On exit, the solution matrix X. |
| [in] | lddb | INTEGER The leading dimension of each array B. LDDB >= max(1, N). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
| magma_int_t magma_dgbtrf_batched_work | ( | magma_int_t | m, |
| magma_int_t | n, | ||
| magma_int_t | kl, | ||
| magma_int_t | ku, | ||
| double ** | dAB_array, | ||
| magma_int_t | lddab, | ||
| magma_int_t ** | dipiv_array, | ||
| magma_int_t * | info_array, | ||
| void * | device_work, | ||
| magma_int_t * | lwork, | ||
| magma_int_t | batchCount, | ||
| magma_queue_t | queue ) |
DGBTRF computes an LU factorization of a real m-by-n band matrix AB using partial pivoting with row interchanges.
This is a batched version that factors batchCount M-by-N matrices in parallel. dAB, dipiv, and info become arrays with one entry per matrix.
The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:

    On entry:                          On exit:

     *    *    *    +    +    +         *    *    *   u14  u25  u36
     *    *    +    +    +    +         *    *   u13  u24  u35  u46
     *   a12  a23  a34  a45  a56        *   u12  u23  u34  u45  u56
    a11  a22  a33  a44  a55  a66       u11  u22  u33  u44  u55  u66
    a21  a32  a43  a54  a65   *        m21  m32  m43  m54  m65   *
    a31  a42  a53  a64   *    *        m31  m42  m53  m64   *    *
Note that this behavior is a little different from the standard LAPACK routine. Array elements marked * are not read by the routine, but may be zeroed out after completion. Elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.
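The index mapping AB(kl+ku+1+i-j,j) = A(i,j) can be sketched in plain C for one column-major dense matrix. This is an illustrative host-side helper, not MAGMA code. The names `pack_band` and `band_ldab_min` are hypothetical. The helper zeroes the KL fill-in rows (marked + above) and the unused corners (marked *), and it uses 1-based i and j internally so the code mirrors the formula.

```c
#include <assert.h>
#include <stddef.h>

/* Minimum leading dimension of the band array: KL extra rows are
   reserved for fill-in produced by the row interchanges. */
static int band_ldab_min(int kl, int ku) { return 2 * kl + ku + 1; }

/* Pack a dense column-major m-by-n matrix A (leading dimension lda) into
   band storage AB (leading dimension ldab) using the 1-based mapping
   AB(kl+ku+1+i-j, j) = A(i, j) for max(1, j-ku) <= i <= min(m, j+kl).
   All other entries of AB, including the fill-in rows, are set to zero. */
static void pack_band(int m, int n, int kl, int ku,
                      const double *A, int lda, double *AB, int ldab)
{
    assert(ldab >= band_ldab_min(kl, ku));
    for (int j = 1; j <= n; ++j) {
        for (int r = 0; r < ldab; ++r)
            AB[(size_t)(j - 1) * ldab + r] = 0.0;       /* clear column j */
        int ilo = (j - ku > 1) ? j - ku : 1;
        int ihi = (j + kl < m) ? j + kl : m;
        for (int i = ilo; i <= ihi; ++i) {
            int row = kl + ku + 1 + i - j;               /* 1-based band row */
            AB[(size_t)(j - 1) * ldab + (row - 1)] =
                A[(size_t)(j - 1) * lda + (i - 1)];
        }
    }
}
```

For the 6-by-6 example (KL = 2, KU = 1, so LDAB = 6), a11 lands in row 4 of column 1, a21 in row 5 and a31 in row 6, matching the "On entry" diagram.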
| [in] | m | INTEGER The number of rows of each matrix A. M >= 0. |
| [in] | n | INTEGER The number of columns of each matrix A. N >= 0. |
| [in] | kl | INTEGER The number of subdiagonals within the band of A. KL >= 0. |
| [in] | ku | INTEGER The number of superdiagonals within the band of A. KU >= 0. |
| [in,out] | dAB_array | Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDAB,N). On entry, the matrix A in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl). On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See above for details about the band storage. |
| [in] | lddab | INTEGER The leading dimension of each array AB. LDDAB >= (2*KL+KU+1). |
| [out] | dipiv_array | Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in,out] | device_work | Workspace, allocated on device memory |
| [in,out] | lwork | INTEGER pointer. The size of the workspace (device_work) in bytes. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
| magma_int_t magma_dgbtrf_batched | ( | magma_int_t | m, |
| magma_int_t | n, | ||
| magma_int_t | kl, | ||
| magma_int_t | ku, | ||
| double ** | dAB_array, | ||
| magma_int_t | lddab, | ||
| magma_int_t ** | dipiv_array, | ||
| magma_int_t * | info_array, | ||
| magma_int_t | batchCount, | ||
| magma_queue_t | queue ) |
DGBTRF computes an LU factorization of a real m-by-n band matrix A using partial pivoting with row interchanges.
This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.
| [in] | m | INTEGER The number of rows of each matrix A. M >= 0. |
| [in] | n | INTEGER The number of columns of each matrix A. N >= 0. |
| [in] | kl | INTEGER The number of subdiagonals within the band of A. KL >= 0. |
| [in] | ku | INTEGER The number of superdiagonals within the band of A. KU >= 0. |
| [in,out] | dAB_array | Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array, dimension (LDDAB,N). On entry, the matrix A in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl). On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See below for further details. |
| [in] | lddab | INTEGER The leading dimension of each array AB. LDDAB >= 2*KL+KU+1. |
| [out] | dipiv_array | Array of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount). Each is the INFO output for the corresponding matrix: = 0: successful exit; < 0: if INFO = -i, the i-th argument had an illegal value; > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:

    On entry:                          On exit:

     *    *    *    +    +    +         *    *    *   u14  u25  u36
     *    *    +    +    +    +         *    *   u13  u24  u35  u46
     *   a12  a23  a34  a45  a56        *   u12  u23  u34  u45  u56
    a11  a22  a33  a44  a55  a66       u11  u22  u33  u44  u55  u66
    a21  a32  a43  a54  a65   *        m21  m32  m43  m54  m65   *
    a31  a42  a53  a64   *    *        m31  m42  m53  m64   *    *

Array elements marked * are not used by the routine; elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.
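Each matrix in the batch reports its own INFO value, so callers usually scan info_array after the call. The host-side sketch below assumes the values have already been copied back from the GPU, and it uses `int` in place of `magma_int_t`. `summarize_info` is a hypothetical helper that applies the INFO conventions listed in the table above.

```c
#include <assert.h>

/* Classify per-matrix INFO codes. Returns the number of matrices that
   factored successfully. *n_singular counts INFO > 0 (U(i,i) exactly zero),
   *n_illegal counts INFO < 0 (illegal argument), and *first_bad receives the
   index of the first nonzero INFO, or -1 if every matrix succeeded. */
static int summarize_info(const int *info, int batchCount,
                          int *n_singular, int *n_illegal, int *first_bad)
{
    assert(batchCount >= 0);
    int ok = 0;
    *n_singular = 0;
    *n_illegal = 0;
    *first_bad = -1;
    for (int k = 0; k < batchCount; ++k) {
        if (info[k] == 0) { ++ok; continue; }
        if (*first_bad < 0) *first_bad = k;
        if (info[k] > 0) ++*n_singular;   /* U(info[k], info[k]) is zero */
        else ++*n_illegal;                /* argument -info[k] was illegal */
    }
    return ok;
}
```

A singular factor (INFO > 0) is still a completed factorization. The check mainly matters before the factors are handed to a solve, where division by zero would occur.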
| magma_int_t magma_dgbtrs_batched | ( | magma_trans_t | transA, |
| magma_int_t | n, | ||
| magma_int_t | kl, | ||
| magma_int_t | ku, | ||
| magma_int_t | nrhs, | ||
| double ** | dA_array, | ||
| magma_int_t | ldda, | ||
| magma_int_t ** | dipiv_array, | ||
| double ** | dB_array, | ||
| magma_int_t | lddb, | ||
| magma_int_t * | info_array, | ||
| magma_int_t | batchCount, | ||
| magma_queue_t | queue ) |
DGBTRS solves a system of linear equations A * X = B, A**T * X = B, or A**H * X = B with a general band matrix A using the LU factorization computed by DGBTRF.
This is the batched version of the routine. Currently, only the no-transpose case (A * X = B) is supported.
| [in] | transA | magma_trans_t Specifies the form of the system of equations. Currently, only MagmaNoTrans (A * X = B) is supported. |
| [in] | n | INTEGER The order of the matrix A. n >= 0. |
| [in] | kl | INTEGER The number of subdiagonals within the band of A. KL >= 0. |
| [in] | ku | INTEGER The number of superdiagonals within the band of A. KU >= 0. |
| [in] | nrhs | INTEGER The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0. |
| [in] | dA_array | Array of pointers, dimension (batchCount). Each contains the details of the LU factorization of the band matrix A, as computed by DGBTRF. U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. |
| [in] | ldda | INTEGER The leading dimension of each array A. LDDA >= (2*KL+KU+1). |
| [in] | dipiv_array | Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices from DGBTRF; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i). |
| [in,out] | dB_array | Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array, dimension (LDDB,NRHS). On entry, the right hand side matrix B. On exit, the solution matrix X. |
| [in] | lddb | INTEGER The leading dimension of each array B. LDDB >= max(1, N). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
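The no-transpose solve can be sketched serially for one system, following the structure of the reference LAPACK DGBTRS. For each column j, the interchange IPIV(j) is applied and the KL multipliers stored below the diagonal are eliminated. U, which has KL+KU superdiagonals, is then solved by back substitution. The helper `band_lu_solve`, its 0-based indexing and its single right-hand side are illustrative assumptions. This is not MAGMA's batched kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Solve A*x = b (no transpose, one right-hand side) from band LU factors in
   the DGBTRF layout: ldab >= 2*kl+ku+1, diagonal of U in 0-based row
   kv = kl+ku, superdiagonals of U above it, multipliers in rows
   kv+1..kv+kl, and 1-based pivot indices ipiv. b is overwritten by x. */
static void band_lu_solve(int n, int kl, int ku, const double *AB, int ldab,
                          const int *ipiv, double *b)
{
    assert(ldab >= 2 * kl + ku + 1);
    const int kv = kl + ku;                     /* U has kv superdiagonals */
    /* Forward pass: apply interchange j, then eliminate below row j. */
    if (kl > 0) {
        for (int j = 0; j < n - 1; ++j) {
            int lm = (kl < n - 1 - j) ? kl : n - 1 - j;
            int p = ipiv[j] - 1;
            if (p != j) { double t = b[p]; b[p] = b[j]; b[j] = t; }
            for (int i = 1; i <= lm; ++i)
                b[j + i] -= AB[(kv + i) + (size_t)j * ldab] * b[j];
        }
    }
    /* Back substitution with the band upper triangular U. */
    for (int j = n - 1; j >= 0; --j) {
        b[j] /= AB[kv + (size_t)j * ldab];
        int i0 = (j - kv > 0) ? j - kv : 0;
        for (int i = i0; i < j; ++i)
            b[i] -= AB[(kv + i - j) + (size_t)j * ldab] * b[j];
    }
}
```

As a small hand-worked case, factoring A = [2 1; 4 3] with KL = KU = 1 (LDAB = 4) gives AB = {0, 0, 4, 0.5, 0, 3, -0.5, 0} and ipiv = {2, 2}. Solving with b = (3, 7) then recovers x = (1, 1).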
| magma_int_t magma_dgetrf_batched | ( | magma_int_t | m, |
| magma_int_t | n, | ||
| double ** | dA_array, | ||
| magma_int_t | ldda, | ||
| magma_int_t ** | ipiv_array, | ||
| magma_int_t * | info_array, | ||
| magma_int_t | batchCount, | ||
| magma_queue_t | queue ) |
DGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.
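For reference, the factorization described above can be written as a serial, unblocked loop over a single column-major matrix. This sketch uses the hypothetical name `lu_partial_pivot` and `int` in place of `magma_int_t`. It is not the right-looking blocked GPU algorithm, but it produces the same storage layout: the multipliers of L below the diagonal with the unit diagonal implied, U on and above the diagonal, and 1-based pivot indices.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Unblocked LU with partial pivoting, A = P*L*U, overwriting A
   (column-major, leading dimension lda). ipiv[k] receives the 1-based row
   interchanged with row k+1. Returns 0, or i > 0 if U(i,i) is exactly zero
   (first occurrence); the factorization is still completed in that case. */
static int lu_partial_pivot(int m, int n, double *A, int lda, int *ipiv)
{
    assert(m >= 0 && n >= 0 && lda >= (m > 1 ? m : 1));
    int info = 0;
    int mn = m < n ? m : n;
    for (int k = 0; k < mn; ++k) {
        /* Pivot search: largest |A(i,k)| for i >= k. */
        int p = k;
        for (int i = k + 1; i < m; ++i)
            if (fabs(A[i + (size_t)k * lda]) > fabs(A[p + (size_t)k * lda]))
                p = i;
        ipiv[k] = p + 1;
        if (A[p + (size_t)k * lda] == 0.0) {    /* column is zero from row k down */
            if (info == 0) info = k + 1;
            continue;
        }
        /* Interchange entire rows k and p. */
        if (p != k)
            for (int j = 0; j < n; ++j) {
                double t = A[k + (size_t)j * lda];
                A[k + (size_t)j * lda] = A[p + (size_t)j * lda];
                A[p + (size_t)j * lda] = t;
            }
        /* Multipliers: column k of L below the diagonal. */
        double piv = A[k + (size_t)k * lda];
        for (int i = k + 1; i < m; ++i)
            A[i + (size_t)k * lda] /= piv;
        /* Rank-1 update of the trailing submatrix. */
        for (int j = k + 1; j < n; ++j)
            for (int i = k + 1; i < m; ++i)
                A[i + (size_t)j * lda] -= A[i + (size_t)k * lda] * A[k + (size_t)j * lda];
    }
    return info;
}
```

For A = [2 1; 4 3], rows 1 and 2 are interchanged (ipiv = {2, 2}), giving L = [1 0; 0.5 1] and U = [4 3; 0 -0.5].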
| [in] | m | INTEGER The number of rows of each matrix A. M >= 0. |
| [in] | n | INTEGER The number of columns of each matrix A. N >= 0. |
| [in,out] | dA_array | Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. |
| [in] | ldda | INTEGER The leading dimension of each array A. LDDA >= max(1,M). |
| [out] | ipiv_array | Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
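To solve a system with the factors, the interchanges recorded in IPIV are applied to the right-hand side in increasing order. A unit lower triangular solve and an upper triangular solve follow. The single-vector host sketch below assumes a square matrix factored in place as described above. The helper `lu_solve` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Solve A*x = b for one right-hand side from in-place factors of a square
   n-by-n matrix: unit-diagonal L strictly below the diagonal, U on and above
   it, column-major with leading dimension lda, 1-based pivots ipiv.
   b is overwritten by x. */
static void lu_solve(int n, const double *LU, int lda, const int *ipiv, double *b)
{
    assert(n >= 0 && lda >= (n > 1 ? n : 1));
    /* Apply the interchanges in order: row k was swapped with row ipiv[k]. */
    for (int k = 0; k < n; ++k) {
        int p = ipiv[k] - 1;
        if (p != k) { double t = b[k]; b[k] = b[p]; b[p] = t; }
    }
    /* Forward substitution with unit lower triangular L. */
    for (int j = 0; j < n; ++j)
        for (int i = j + 1; i < n; ++i)
            b[i] -= LU[i + (size_t)j * lda] * b[j];
    /* Back substitution with upper triangular U. */
    for (int j = n - 1; j >= 0; --j) {
        b[j] /= LU[j + (size_t)j * lda];
        for (int i = 0; i < j; ++i)
            b[i] -= LU[i + (size_t)j * lda] * b[j];
    }
}
```

With the factors of A = [2 1; 4 3] (LU = {4, 0.5, 3, -0.5}, ipiv = {2, 2}) and b = (3, 7), the solve returns x = (1, 1).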
| magma_int_t magma_dgetrf_nopiv_vbatched_max_nocheck_work | ( | magma_int_t * | m, |
| magma_int_t * | n, | ||
| magma_int_t | max_m, | ||
| magma_int_t | max_n, | ||
| magma_int_t | max_minmn, | ||
| magma_int_t | max_mxn, | ||
| double ** | dA_array, | ||
| magma_int_t * | ldda, | ||
| double * | dtol_array, | ||
| double | eps, | ||
| magma_int_t * | info_array, | ||
| void * | work, | ||
| magma_int_t * | lwork, | ||
| magma_int_t | batchCount, | ||
| magma_queue_t | queue ) |
DGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting.
It replaces tiny pivots smaller than a specified tolerance by that tolerance.
The factorization has the form A = L * U where L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.
| [in] | M | Array of INTEGERs on the GPU, dimension (batchCount) Each is the number of rows of each matrix A. M[i] >= 0. |
| [in] | N | Array of INTEGERs on the GPU, dimension (batchCount) Each is the number of columns of each matrix A. N[i] >= 0. |
| [in] | MAX_M | INTEGER The maximum number of rows across the batch |
| [in] | MAX_N | INTEGER The maximum number of columns across the batch |
| [in] | MAX_MINMN | INTEGER The maximum value of min(Mi, Ni) for i = 1, 2, ..., batchCount |
| [in] | MAX_MxN | INTEGER The maximum value of the product (Mi x Ni) for i = 1, 2, ..., batchCount |
| [in,out] | dA_array | Array of pointers on the GPU, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDA[i],N[i]). On entry, each pointer is an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = L*U; the unit diagonal elements of L are not stored. |
| [in] | ldda | Array of INTEGERs on the GPU Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]). |
| [in] | dtol_array | Array of DOUBLEs, dimension (batchCount), for corresponding matrices. Each is the tolerance that is compared to the diagonal element before the column is scaled by its inverse. If the value of the diagonal is less than the threshold, the diagonal is replaced by the threshold. If the array is set to NULL, then the threshold is set to the eps parameter |
| [in] | eps | DOUBLE The value to use for the tolerance for all matrices if the dtol_array is NULL |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | WORK | VOID pointer A workspace of size LWORK[0] |
| [in,out] | LWORK | INTEGER pointer If lwork[0] < 0, a workspace query is assumed, and lwork[0] is overwritten by the required workspace size in bytes. Otherwise, lwork[0] is the size of work |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
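M and N live on the GPU, so a caller would typically keep host copies of the sizes and reduce them into MAX_M, MAX_N, MAX_MINMN and MAX_MxN before calling this variant. The sketch below does that reduction. It uses `int` for `magma_int_t`, and `vbatched_maxima` is a hypothetical name. MAX_MxN is the largest individual product Mi*Ni, which can be smaller than MAX_M*MAX_N.

```c
#include <assert.h>

/* Reduce per-matrix sizes m[k], n[k] into the max_* scalar arguments. */
static void vbatched_maxima(const int *m, const int *n, int batchCount,
                            int *max_m, int *max_n, int *max_minmn, int *max_mxn)
{
    *max_m = *max_n = *max_minmn = *max_mxn = 0;
    for (int k = 0; k < batchCount; ++k) {
        assert(m[k] >= 0 && n[k] >= 0);
        int mn  = m[k] < n[k] ? m[k] : n[k];   /* min(Mk, Nk) */
        int mxn = m[k] * n[k];                 /* Mk x Nk */
        if (m[k] > *max_m)     *max_m = m[k];
        if (n[k] > *max_n)     *max_n = n[k];
        if (mn   > *max_minmn) *max_minmn = mn;
        if (mxn  > *max_mxn)   *max_mxn = mxn;
    }
}
```

For sizes 3x4, 5x1 and 2x6, this gives MAX_M = 5, MAX_N = 6, MAX_MINMN = 3 and MAX_MxN = 12. The product MAX_M*MAX_N would be 30.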
| magma_int_t magma_dgetrf_nopiv_expert_vbatched | ( | magma_int_t * | m, |
| magma_int_t * | n, | ||
| double ** | dA_array, | ||
| magma_int_t * | ldda, | ||
| double * | dtol_array, | ||
| double | eps, | ||
| magma_int_t * | info_array, | ||
| magma_int_t | batchCount, | ||
| magma_queue_t | queue ) |
DGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting.
It replaces tiny pivots smaller than a specified tolerance by that tolerance.
The factorization has the form A = L * U where L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.
This is the expert version taking an extra parameter for the tolerance for diagonal elements. Small diagonal elements will be replaced by the specified tolerance preserving the sign, and the info array will report the number of replacements. This is useful in the context of static pivoting used in sparse solvers such as SuperLU, where the tolerance would be, for example, the norm of the matrix scaled by the machine epsilon.
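The following is a serial sketch of the static-pivoting idea described above, for one matrix on the host. It assumes the diagonal is compared by magnitude, as the sign-preserving replacement suggests. The helper name `lu_nopiv_static` is illustrative, and so is its return value: it returns the replacement count, mirroring what the text says info_array reports. It is not the MAGMA implementation.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Unpivoted LU, A = L*U in place (column-major, leading dimension lda).
   Before column k is scaled, a diagonal entry with |A(k,k)| < tol is
   replaced by tol carrying the sign of A(k,k) (+0.0 becomes +tol).
   Returns the number of replacements made. */
static int lu_nopiv_static(int m, int n, double *A, int lda, double tol)
{
    assert(tol >= 0.0);
    int nrep = 0;
    int mn = m < n ? m : n;
    for (int k = 0; k < mn; ++k) {
        double *akk = &A[k + (size_t)k * lda];
        if (fabs(*akk) < tol) {
            *akk = copysign(tol, *akk);
            ++nrep;
        }
        /* Scale the column below the diagonal by the (possibly replaced) pivot. */
        for (int i = k + 1; i < m; ++i)
            A[i + (size_t)k * lda] /= *akk;
        /* Rank-1 update of the trailing submatrix. */
        for (int j = k + 1; j < n; ++j)
            for (int i = k + 1; i < m; ++i)
                A[i + (size_t)j * lda] -= A[i + (size_t)k * lda] * A[k + (size_t)j * lda];
    }
    return nrep;
}
```

No rows are ever exchanged, so the result is only an approximate factorization when a replacement occurs. In the static-pivoting setting, that is typically corrected afterwards, for example by iterative refinement.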
| [in] | M | Array of INTEGERs on the GPU, dimension (batchCount) Each is the number of rows of each matrix A. M[i] >= 0. |
| [in] | N | Array of INTEGERs on the GPU, dimension (batchCount) Each is the number of columns of each matrix A. N[i] >= 0. |
| [in,out] | dA_array | Array of pointers on the GPU, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDA[i],N[i]). On entry, each pointer is an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = L*U; the unit diagonal elements of L are not stored. |
| [in] | ldda | Array of INTEGERs on the GPU Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]). |
| [in] | dtol_array | Array of DOUBLEs, dimension (batchCount), for corresponding matrices. Each is the tolerance that is compared to the diagonal element before the column is scaled by its inverse. If the value of the diagonal is less than the threshold, the diagonal is replaced by the threshold. If the array is set to NULL, then the threshold is set to the eps parameter |
| [in] | eps | DOUBLE The value to use for the tolerance for all matrices if the dtol_array is NULL |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
| magma_int_t magma_dgetrf_nopiv_vbatched | ( | magma_int_t * | m, |
| magma_int_t * | n, | ||
| double ** | dA_array, | ||
| magma_int_t * | ldda, | ||
| magma_int_t * | info_array, | ||
| magma_int_t | batchCount, | ||
| magma_queue_t | queue ) |
DGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting.
The factorization has the form A = L * U where L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.
| [in] | M | Array of INTEGERs on the GPU, dimension (batchCount) Each is the number of rows of each matrix A. M[i] >= 0. |
| [in] | N | Array of INTEGERs on the GPU, dimension (batchCount) Each is the number of columns of each matrix A. N[i] >= 0. |
| [in,out] | dA_array | Array of pointers on the GPU, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDA[i],N[i]). On entry, each pointer is an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = L*U; the unit diagonal elements of L are not stored. |
| [in] | ldda | Array of INTEGERs on the GPU Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
| magma_int_t magma_dgetrf_recpanel_batched | ( | magma_int_t | m, |
| magma_int_t | n, | ||
| magma_int_t | min_recpnb, | ||
| double ** | dA_array, | ||
| magma_int_t | ai, | ||
| magma_int_t | aj, | ||
| magma_int_t | ldda, | ||
| magma_int_t ** | dipiv_array, | ||
| magma_int_t ** | dpivinfo_array, | ||
| magma_int_t * | info_array, | ||
| magma_int_t | gbstep, | ||
| magma_int_t | batchCount, | ||
| magma_queue_t | queue ) |
This is an internal routine that may rely on many assumptions.
Its documentation is not complete.
DGETRF_PANEL computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.
| [in] | m | INTEGER The number of rows of each matrix A. M >= 0. |
| [in] | n | INTEGER The number of columns of each matrix A. N >= 0. |
| [in] | min_recpnb | INTEGER. Internal use. The recursive nb |
| [in,out] | dA_array | Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. |
| [in] | ai | INTEGER Row offset for A. |
| [in] | aj | INTEGER Column offset for A. |
| [in] | ldda | INTEGER The leading dimension of each array A. LDDA >= max(1,M). |
| [out] | dipiv_array | Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). |
| [out] | dpivinfo_array | Array of pointers, dimension (batchCount), for internal use. |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | gbstep | INTEGER internal use. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
| magma_int_t magma_dgetrf_recpanel_native | ( | magma_int_t | m, |
| magma_int_t | n, | ||
| magma_int_t | recnb, | ||
| magmaDouble_ptr | dA, | ||
| magma_int_t | ldda, | ||
| magma_int_t * | dipiv, | ||
| magma_int_t * | dipivinfo, | ||
| magma_int_t * | dinfo, | ||
| magma_int_t | gbstep, | ||
| magma_event_t | events[2], | ||
| magma_queue_t | queue, | ||
| magma_queue_t | update_queue ) |
This is an internal routine.
DGETRF_PANEL computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is a GPU-only routine. The host CPU is not used.
| [in] | m | INTEGER The number of rows of the matrix A. M >= 0. |
| [in] | n | INTEGER The number of columns of the matrix A. N >= 0. |
| [in,out] | dA | A DOUBLE PRECISION array on the GPU, dimension (LDDA,N). On entry, an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. |
| [in] | ldda | INTEGER The leading dimension of A. LDDA >= max(1,M). |
| [out] | dipiv | An INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). |
| [out] | dipivinfo | An INTEGER array, for internal use. |
| [out] | dinfo | INTEGER, stored on the GPU. |
| [in] | gbstep | INTEGER internal use. |
| [in] | queue, update_queue | magma_queue_t The two queues to execute in. |
| magma_int_t magma_dgetrf_vbatched_max_nocheck_work | ( | magma_int_t * | m, |
| magma_int_t * | n, | ||
| magma_int_t | max_m, | ||
| magma_int_t | max_n, | ||
| magma_int_t | max_minmn, | ||
| magma_int_t | max_mxn, | ||
| double ** | dA_array, | ||
| magma_int_t * | ldda, | ||
| magma_int_t ** | dipiv_array, | ||
| magma_int_t * | info_array, | ||
| void * | work, | ||
| magma_int_t * | lwork, | ||
| magma_int_t | batchCount, | ||
| magma_queue_t | queue ) |
DGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.
| [in] | M | Array of INTEGERs on the GPU, dimension (batchCount) Each is the number of rows of each matrix A. M[i] >= 0. |
| [in] | N | Array of INTEGERs on the GPU, dimension (batchCount) Each is the number of columns of each matrix A. N[i] >= 0. |
| [in] | MAX_M | INTEGER The maximum number of rows across the batch |
| [in] | MAX_N | INTEGER The maximum number of columns across the batch |
| [in] | MAX_MINMN | INTEGER The maximum value of min(Mi, Ni) for i = 1, 2, ..., batchCount |
| [in] | MAX_MxN | INTEGER The maximum value of the product (Mi x Ni) for i = 1, 2, ..., batchCount |
| [in,out] | dA_array | Array of pointers on the GPU, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDA[i],N[i]). On entry, each pointer is an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. |
| [in] | ldda | Array of INTEGERs on the GPU Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]). |
| [out] | dipiv_array | Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M[i],N[i])) The pivot indices; for 1 <= p <= min(M[i],N[i]), row p of the matrix was interchanged with row IPIV(p). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | WORK | VOID pointer A workspace of size LWORK[0] |
| [in,out] | LWORK | INTEGER pointer If lwork[0] < 0, a workspace query is assumed, and lwork[0] is overwritten by the required workspace size in bytes. Otherwise, lwork[0] is the size of work |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
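The LWORK convention above, where a negative lwork[0] requests a size query, suggests a two-call pattern: query, allocate, then call again. The sketch shows that pattern against a hypothetical stand-in, `demo_routine`, rather than the real MAGMA function, which takes many more arguments. The workspace is allocated with host `malloc` purely for illustration. The actual buffer for these GPU routines needs whatever allocation MAGMA requires; the band `*_work` routines above specify device memory.

```c
#include <assert.h>
#include <stdlib.h>

/* Shape of a hypothetical routine that follows the LWORK convention. */
typedef int (*work_routine)(void *work, int *lwork, void *ctx);

/* Two-call pattern: query with lwork = -1, allocate, then run for real. */
static int run_with_workspace(work_routine fn, void *ctx)
{
    assert(fn != NULL);
    int lwork = -1;
    int info = fn(NULL, &lwork, ctx);                    /* query: bytes needed */
    if (info != 0) return info;
    void *work = malloc(lwork > 0 ? (size_t)lwork : 1);  /* host stand-in */
    if (work == NULL) return -1;
    info = fn(work, &lwork, ctx);                        /* actual computation */
    free(work);
    return info;
}

/* Hypothetical routine needing 256 bytes; ctx counts non-query calls. */
static int demo_routine(void *work, int *lwork, void *ctx)
{
    const int needed = 256;
    if (*lwork < 0) { *lwork = needed; return 0; }
    if (work == NULL || *lwork < needed) return -1;
    ++*(int *)ctx;
    return 0;
}
```

If the same sizes are factored repeatedly, the query result and the buffer can be kept and reused instead of repeating both calls each time.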
| magma_int_t magma_dgetrf_vbatched | ( | magma_int_t * | m, |
| magma_int_t * | n, | ||
| double ** | dA_array, | ||
| magma_int_t * | ldda, | ||
| magma_int_t ** | dipiv_array, | ||
| magma_int_t * | info_array, | ||
| magma_int_t | batchCount, | ||
| magma_queue_t | queue ) |
DGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.
| [in] | M | Array of INTEGERs on the GPU, dimension (batchCount) Each is the number of rows of each matrix A. M[i] >= 0. |
| [in] | N | Array of INTEGERs on the GPU, dimension (batchCount) Each is the number of columns of each matrix A. N[i] >= 0. |
| [in,out] | dA_array | Array of pointers on the GPU, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDA[i],N[i]). On entry, each pointer is an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. |
| [in] | ldda | Array of INTEGERs on the GPU Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]). |
| [out] | dipiv_array | Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M[i],N[i])) The pivot indices; for 1 <= p <= min(M[i],N[i]), row p of the matrix was interchanged with row IPIV(p). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
| magma_int_t magma_sgbsv_batched_work | ( | magma_int_t | n, |
| magma_int_t | kl, | ||
| magma_int_t | ku, | ||
| magma_int_t | nrhs, | ||
| float ** | dA_array, | ||
| magma_int_t | ldda, | ||
| magma_int_t ** | dipiv_array, | ||
| float ** | dB_array, | ||
| magma_int_t | lddb, | ||
| magma_int_t * | info_array, | ||
| void * | device_work, | ||
| magma_int_t * | lwork, | ||
| magma_int_t | batchCount, | ||
| magma_queue_t | queue ) |
SGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.
The LU decomposition with partial pivoting and row interchanges is used to factor A as A = L * U, where L is a product of permutation and unit lower triangular matrices with KL subdiagonals, and U is upper triangular with KL+KU superdiagonals. The factored form of A is then used to solve the system of equations A * X = B.
This is the batched version of the routine.
| [in] | n | INTEGER The order of the matrix A. n >= 0. |
| [in] | kl | INTEGER The number of subdiagonals within the band of A. KL >= 0. |
| [in] | ku | INTEGER The number of superdiagonals within the band of A. KU >= 0. |
| [in] | nrhs | INTEGER The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0. |
| [in,out] | dA_array | Array of pointers, dimension (batchCount). Each is a REAL array, dimension (LDDA,N). On entry, the matrix A in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. On exit, details of the LU factorization of A, as computed by SGBTRF: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. |
| [in] | ldda | INTEGER The leading dimension of each array A. LDDA >= (2*KL+KU+1). |
| [out] | dipiv_array | Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices that define the permutation matrix; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i). |
| [in,out] | dB_array | Array of pointers, dimension (batchCount). Each is a REAL array, dimension (LDDB,NRHS). On entry, the right hand side matrix B. On exit, the solution matrix X. |
| [in] | lddb | INTEGER The leading dimension of each array B. LDDB >= max(1, N). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in,out] | device_work | Workspace, allocated on device memory. |
| [in,out] | lwork | INTEGER pointer. The size of the workspace (device_work) in bytes. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
| magma_int_t magma_sgbsv_batched | ( | magma_int_t | n, |
| magma_int_t | kl, | ||
| magma_int_t | ku, | ||
| magma_int_t | nrhs, | ||
| float ** | dA_array, | ||
| magma_int_t | ldda, | ||
| magma_int_t ** | dipiv_array, | ||
| float ** | dB_array, | ||
| magma_int_t | lddb, | ||
| magma_int_t * | info_array, | ||
| magma_int_t | batchCount, | ||
| magma_queue_t | queue ) |
SGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.
The LU decomposition with partial pivoting and row interchanges is used to factor A as A = L * U, where L is a product of permutation and unit lower triangular matrices with KL subdiagonals, and U is upper triangular with KL+KU superdiagonals. The factored form of A is then used to solve the system of equations A * X = B.
This is the batched version of the routine.
| [in] | n | INTEGER The order of the matrix A. n >= 0. |
| [in] | kl | INTEGER The number of subdiagonals within the band of A. KL >= 0. |
| [in] | ku | INTEGER The number of superdiagonals within the band of A. KU >= 0. |
| [in] | nrhs | INTEGER The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0. |
| [in,out] | dA_array | Array of pointers, dimension (batchCount). Each is a REAL array, dimension (LDDA,N). On entry, the matrix A in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. On exit, details of the LU factorization of A, as computed by SGBTRF: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. |
| [in] | ldda | INTEGER The leading dimension of each array A. LDDA >= (2*KL+KU+1). |
| [out] | dipiv_array | Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i). |
| [in,out] | dB_array | Array of pointers, dimension (batchCount). Each is a REAL array, dimension (LDDB,NRHS). On entry, the right hand side matrix B. On exit, the solution matrix X. |
| [in] | lddb | INTEGER The leading dimension of each array B. LDDB >= max(1, N). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
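To check the batched solver on small test problems, it can help to have a host reference for one matrix of the batch. The sketch below is an unblocked band LU with partial pivoting followed by a one-right-hand-side solve, modeled on the LAPACK xGBTF2 and no-transpose xGBTRS algorithms and using the band layout with LDAB >= 2*KL+KU+1 described for SGBTRF. It is not MAGMA's GPU implementation; `band_lu_ref` and `band_solve_ref` are hypothetical names, and the code uses plain `int`, `float`, and 0-based indexing.

```c
#include <assert.h>
#include <math.h>

/* Band LU with partial pivoting of one m-by-n matrix, modeled on LAPACK xGBTF2.
 * ab is column-major, ldab >= 2*kl+ku+1, and A(i,j) (0-based) is stored at
 * ab[(kl+ku+i-j) + j*ldab]; rows 0..kl-1 are fill-in space. On exit ab holds U
 * and the multipliers, ipiv holds 1-based pivot rows. Returns 0, or j+1 for the
 * first j where U(j,j) is exactly zero. */
static int band_lu_ref(int m, int n, int kl, int ku, float *ab, int ldab, int *ipiv)
{
    assert(ldab >= 2 * kl + ku + 1);
    int kv = kl + ku, ju = 0, info = 0, mn = m < n ? m : n;
    /* Zero the fill-in entries of columns ku+1 .. min(kv,n)-1. */
    for (int j = ku + 1; j < (kv < n ? kv : n); j++)
        for (int i = kv - j; i < kl; i++)
            ab[i + j * ldab] = 0.0f;
    for (int j = 0; j < mn; j++) {
        if (j + kv < n)                            /* fill-in space of column j+kv */
            for (int i = 0; i < kl; i++)
                ab[i + (j + kv) * ldab] = 0.0f;
        int km = kl < m - 1 - j ? kl : m - 1 - j;  /* subdiagonal rows in column j */
        int jp = 0;                                /* pivot offset below row j */
        for (int p = 1; p <= km; p++)
            if (fabsf(ab[kv + p + j * ldab]) > fabsf(ab[kv + jp + j * ldab]))
                jp = p;
        ipiv[j] = j + jp + 1;
        if (ab[kv + jp + j * ldab] == 0.0f) {      /* exactly singular column */
            if (info == 0) info = j + 1;
            continue;
        }
        int last = j + ku + jp < n - 1 ? j + ku + jp : n - 1;
        if (last > ju) ju = last;                  /* rightmost column touched so far */
        /* Swap rows j and j+jp in columns j..ju (stride ldab-1 walks along a row). */
        for (int c = 0; jp != 0 && c <= ju - j; c++) {
            float *x = &ab[kv + jp + j * ldab + c * (ldab - 1)];
            float *y = &ab[kv + j * ldab + c * (ldab - 1)];
            float t = *x; *x = *y; *y = t;
        }
        for (int r = 1; r <= km; r++)              /* multipliers m(j+r, j) */
            ab[kv + r + j * ldab] /= ab[kv + j * ldab];
        for (int c = 1; c <= ju - j; c++)          /* rank-1 update inside the band */
            for (int r = 1; r <= km; r++)
                ab[kv + r - c + (j + c) * ldab] -=
                    ab[kv + r + j * ldab] * ab[kv - c + (j + c) * ldab];
    }
    return info;
}

/* Solve A*x = b (one right-hand side) with the output of band_lu_ref, modeled
 * on the no-transpose path of LAPACK xGBTRS. b is overwritten by x. */
static void band_solve_ref(int n, int kl, int ku, const float *ab, int ldab,
                           const int *ipiv, float *b)
{
    int kv = kl + ku;
    /* L solve: each interchange is applied just before its multiplier column. */
    for (int j = 0; j < n - 1; j++) {
        int lm = kl < n - 1 - j ? kl : n - 1 - j;
        int l = ipiv[j] - 1;
        if (l != j) { float t = b[l]; b[l] = b[j]; b[j] = t; }
        for (int r = 1; r <= lm; r++)
            b[j + r] -= ab[kv + r + j * ldab] * b[j];
    }
    /* U solve: back substitution; U has kl+ku superdiagonals. */
    for (int j = n - 1; j >= 0; j--) {
        b[j] /= ab[kv + j * ldab];
        for (int i = (j - kv > 0 ? j - kv : 0); i < j; i++)
            b[i] -= b[j] * ab[kv + i - j + j * ldab];
    }
}
```

The swaps are interleaved with the multiplier columns on purpose: in band storage, later interchanges are not applied to earlier multiplier columns, which is why L is described as a product of permutation and unit lower triangular matrices. Applying all of IPIV to B up front would therefore not be equivalent.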
magma_int_t magma_sgbtrf_batched_work(magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, float **dAB_array, magma_int_t lddab, magma_int_t **dipiv_array, magma_int_t *info_array, void *device_work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue)
SGBTRF computes an LU factorization of a real m-by-n band matrix AB using partial pivoting with row interchanges.
This is a batched version that factors batchCount M-by-N matrices in parallel. dAB, dipiv, and info become arrays with one entry per matrix.
The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:
On entry:                             On exit:
    *    *    *    +    +    +          *    *    *   u14  u25  u36
    *    *    +    +    +    +          *    *   u13  u24  u35  u46
    *   a12  a23  a34  a45  a56         *   u12  u23  u34  u45  u56
   a11  a22  a33  a44  a55  a66        u11  u22  u33  u44  u55  u66
   a21  a32  a43  a54  a65   *         m21  m32  m43  m54  m65   *
   a31  a42  a53  a64   *    *         m31  m42  m53  m64   *    *
Note that this behavior is a little different from the standard LAPACK routine. Array elements marked * are not read by the routine, but may be zeroed out after completion. Elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.
| [in] | m | INTEGER The number of rows of each matrix A. M >= 0. |
| [in] | n | INTEGER The number of columns of each matrix A. N >= 0. |
| [in] | kl | INTEGER The number of subdiagonals within the band of A. KL >= 0. |
| [in] | ku | INTEGER The number of superdiagonals within the band of A. KU >= 0. |
| [in,out] | dAB_array | Array of pointers, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDAB,N). On entry, the matrix A in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl) |
On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See above for details about the band storage.
| [in] | lddab | INTEGER The leading dimension of each array AB. LDDAB >= (2*KL+KU+1). |
| [out] | dipiv_array | Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in,out] | device_work | Workspace, allocated on device memory. |
| [in,out] | lwork | INTEGER pointer The size of the workspace (device_work) in bytes. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
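The mapping AB(kl+ku+1+i-j,j) = A(i,j) can be illustrated with a host-side packing routine for one matrix of the batch, run before the packed matrices are copied to the GPU. This is a minimal sketch: `pack_band` is a hypothetical helper that reads a dense column-major host array, uses 0-based indices (so the target row becomes kl+ku+i-j), and leaves the first KL rows (the "+" fill-in space) untouched.

```c
#include <assert.h>

/* Copy the band of a dense column-major m-by-n matrix a (leading dimension lda)
 * into ab (leading dimension ldab >= 2*kl+ku+1). With 0-based i and j, A(i,j)
 * goes to ab[(kl+ku+i-j) + j*ldab]; rows 0..kl-1 of ab are not written. */
static void pack_band(int m, int n, int kl, int ku, const float *a, int lda,
                      float *ab, int ldab)
{
    assert(ldab >= 2 * kl + ku + 1 && lda >= (m > 1 ? m : 1));
    for (int j = 0; j < n; j++) {
        int ilo = j - ku > 0 ? j - ku : 0;          /* max(1, j-ku) in 1-based terms */
        int ihi = j + kl < m - 1 ? j + kl : m - 1;  /* min(m, j+kl) in 1-based terms */
        for (int i = ilo; i <= ihi; i++)
            ab[(kl + ku + i - j) + j * ldab] = a[i + j * lda];
    }
}
```

With M = N = 6, KL = 2, KU = 1 and LDDAB = 6, this places a11 in row 4 and a12 in row 3 (1-based), matching the band storage example.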
magma_int_t magma_sgbtrf_batched(magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, float **dAB_array, magma_int_t lddab, magma_int_t **dipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
SGBTRF computes an LU factorization of a real m-by-n band matrix A using partial pivoting with row interchanges.
This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.
| [in] | m | INTEGER The number of rows of each matrix A. M >= 0. |
| [in] | n | INTEGER The number of columns of each matrix A. N >= 0. |
| [in] | kl | INTEGER The number of subdiagonals within the band of A. KL >= 0. |
| [in] | ku | INTEGER The number of superdiagonals within the band of A. KU >= 0. |
| [in,out] | dAB_array | Array of pointers, dimension (batchCount). Each is a REAL array, dimension (LDDAB,N). On entry, the matrix A in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl) |
On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See below for further details.
| [in] | lddab | INTEGER The leading dimension of each array AB. LDDAB >= 2*KL+KU+1. |
| [out] | dipiv_array | Array of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount). Each entry is the INFO output for the corresponding matrix: = 0: successful exit; < 0: if INFO = -i, the i-th argument had an illegal value; > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:
On entry:                             On exit:
    *    *    *    +    +    +          *    *    *   u14  u25  u36
    *    *    +    +    +    +          *    *   u13  u24  u35  u46
    *   a12  a23  a34  a45  a56         *   u12  u23  u34  u45  u56
   a11  a22  a33  a44  a55  a66        u11  u22  u33  u44  u55  u66
   a21  a32  a43  a54  a65   *         m21  m32  m43  m54  m65   *
   a31  a42  a53  a64   *    *         m31  m42  m53  m64   *    *
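Array elements marked * are not used by the routine; elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.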
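Because every entry of info_array reports on its own matrix, a batch can partially fail. A minimal host-side sketch for summarizing a copy of the array after it has been transferred back from the GPU is shown below. `count_failed_factorizations` is a hypothetical name, and the sketch uses `int` where the real array holds `magma_int_t`, whose width depends on how MAGMA was built.

```c
#include <assert.h>
#include <stdio.h>

/* Summarize a host copy of info_array: 0 = success, -i = the i-th argument had
 * an illegal value, +i = U(i,i) is exactly zero (1-based). Prints one line per
 * problem matrix to stderr and returns how many matrices failed. */
static int count_failed_factorizations(const int *info, int batch_count)
{
    assert(batch_count >= 0);
    int failed = 0;
    for (int k = 0; k < batch_count; k++) {
        if (info[k] < 0)
            fprintf(stderr, "matrix %d: argument %d had an illegal value\n", k, -info[k]);
        else if (info[k] > 0)
            fprintf(stderr, "matrix %d: U(%d,%d) is exactly zero\n", k, info[k], info[k]);
        if (info[k] != 0)
            failed++;
    }
    return failed;
}
```

A matrix with a positive INFO still has a completed factorization, so whether to discard it or only skip the subsequent solve is up to the caller.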
magma_int_t magma_sgbtrs_batched(magma_trans_t transA, magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, float **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, float **dB_array, magma_int_t lddb, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
SGBTRS solves a system of linear equations A * X = B, A**T * X = B, or A**H * X = B with a general band matrix A using the LU factorization computed by SGBTRF.
This is the batched version of the routine. Currently, only A * X = B is supported (no transpose).
| [in] | transA | magma_trans_t Specifies the form of the system of equations. Currently, only MagmaNoTrans (A * X = B) is supported. |
| [in] | n | INTEGER The order of the matrix A. n >= 0. |
| [in] | kl | INTEGER The number of subdiagonals within the band of A. KL >= 0. |
| [in] | ku | INTEGER The number of superdiagonals within the band of A. KU >= 0. |
| [in] | nrhs | INTEGER The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0. |
| [in] | dA_array | Array of pointers, dimension (batchCount). Each contains the details of the LU factorization of the band matrix A, as computed by SGBTRF. U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. |
| [in] | ldda | INTEGER The leading dimension of each array A. LDDA >= (2*KL+KU+1). |
| [in] | dipiv_array | Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i). |
| [in,out] | dB_array | Array of pointers, dimension (batchCount). Each is a REAL array, dimension (LDDB,NRHS). On entry, the right hand side matrix B. On exit, the solution matrix X. |
| [in] | lddb | INTEGER The leading dimension of each array B. LDDB >= max(1, N). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
magma_int_t magma_sgetrf_batched(magma_int_t m, magma_int_t n, float **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
SGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.
| [in] | m | INTEGER The number of rows of each matrix A. M >= 0. |
| [in] | n | INTEGER The number of columns of each matrix A. N >= 0. |
| [in,out] | dA_array | Array of pointers, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. |
| [in] | ldda | INTEGER The leading dimension of each array A. LDDA >= max(1,M). |
| [out] | ipiv_array | Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
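When validating the batched factorization on small inputs, an unblocked host reference for a single matrix can be handy. The sketch below is modeled on LAPACK's xGETF2, not on MAGMA's right-looking blocked GPU code, but it produces the same A = P*L*U storage convention: L below the diagonal with its unit diagonal implied, U on and above it, 1-based pivots, and a positive return value at the first exactly-zero U(i,i). `lu_ref` is a hypothetical name. Since ties between equal-magnitude pivot candidates may be broken differently, comparing residuals is safer than comparing factors entry by entry.

```c
#include <assert.h>
#include <math.h>

/* Unblocked LU with partial pivoting of one m-by-n column-major matrix.
 * On exit a holds L (unit diagonal not stored) and U; ipiv[0..min(m,n)-1]
 * holds 1-based pivot rows. Returns 0, or the 1-based index of the first
 * exactly-zero U(i,i). */
static int lu_ref(int m, int n, float *a, int lda, int *ipiv)
{
    assert(lda >= (m > 1 ? m : 1));
    int info = 0, mn = m < n ? m : n;
    for (int j = 0; j < mn; j++) {
        int p = j;                                /* row of largest |a(i,j)|, i >= j */
        for (int i = j + 1; i < m; i++)
            if (fabsf(a[i + j * lda]) > fabsf(a[p + j * lda])) p = i;
        ipiv[j] = p + 1;
        if (a[p + j * lda] == 0.0f) {             /* whole column is zero: record it */
            if (info == 0) info = j + 1;
            continue;
        }
        if (p != j)                               /* swap entire rows j and p */
            for (int c = 0; c < n; c++) {
                float t = a[j + c * lda];
                a[j + c * lda] = a[p + c * lda];
                a[p + c * lda] = t;
            }
        for (int i = j + 1; i < m; i++) {         /* multiplier, then row update */
            a[i + j * lda] /= a[j + j * lda];
            for (int c = j + 1; c < n; c++)
                a[i + c * lda] -= a[i + j * lda] * a[j + c * lda];
        }
    }
    return info;
}
```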
magma_int_t magma_sgetrf_nopiv_vbatched_max_nocheck_work(magma_int_t *m, magma_int_t *n, magma_int_t max_m, magma_int_t max_n, magma_int_t max_minmn, magma_int_t max_mxn, float **dA_array, magma_int_t *ldda, float *dtol_array, float eps, magma_int_t *info_array, void *work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue)
SGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting.
It replaces tiny pivots smaller than a specified tolerance by that tolerance.
The factorization has the form A = L * U where L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.
| [in] | M | Array of INTEGERs on the GPU, dimension (batchCount) Each is the number of rows of each matrix A. M[i] >= 0. |
| [in] | N | Array of INTEGERs on the GPU, dimension (batchCount) Each is the number of columns of each matrix A. N[i] >= 0. |
| [in] | MAX_M | INTEGER The maximum number of rows across the batch |
| [in] | MAX_N | INTEGER The maximum number of columns across the batch |
| [in] | MAX_MINMN | INTEGER The maximum value of min(Mi, Ni) for i = 1, 2, ..., batchCount |
| [in] | MAX_MxN | INTEGER The maximum value of the product (Mi x Ni) for i = 1, 2, ..., batchCount |
| [in,out] | dA_array | Array of pointers on the GPU, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDA[i],N[i]). On entry, each pointer is an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = L*U; the unit diagonal elements of L are not stored. |
| [in] | ldda | Array of INTEGERs on the GPU Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]). |
| [in] | dtol_array | Array of REALs, dimension (batchCount), for corresponding matrices. Each is the tolerance that is compared to the diagonal element before the column is scaled by its inverse. If the value of the diagonal is less than the threshold, the diagonal is replaced by the threshold. If the array is set to NULL, then the threshold is set to the eps parameter. |
| [in] | eps | REAL The value to use for the tolerance for all matrices if dtol_array is NULL. |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | WORK | VOID pointer A workspace of size LWORK[0] |
| [in,out] | LWORK | INTEGER pointer If lwork[0] < 0, a workspace query is assumed, and lwork[0] is overwritten by the required workspace size in bytes. Otherwise, lwork[0] is the size of work |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
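MAX_M, MAX_N, MAX_MINMN and MAX_MxN are passed as host scalars, while M and N are device arrays, so the caller needs to derive the scalars from host-side copies of the sizes. A minimal sketch follows; the struct and function names are hypothetical, and the real arguments are `magma_int_t`. The product is computed in `long long` only so that the example itself cannot overflow.

```c
#include <assert.h>

typedef struct { int max_m, max_n, max_minmn; long long max_mxn; } vbatch_maxima;

/* Compute max(M[i]), max(N[i]), max(min(M[i],N[i])) and max(M[i]*N[i])
 * from host copies of the per-matrix sizes. */
static vbatch_maxima compute_vbatch_maxima(const int *m, const int *n, int batch_count)
{
    assert(batch_count >= 0);
    vbatch_maxima r = {0, 0, 0, 0};
    for (int i = 0; i < batch_count; i++) {
        int minmn = m[i] < n[i] ? m[i] : n[i];
        long long mxn = (long long)m[i] * n[i];
        if (m[i] > r.max_m) r.max_m = m[i];
        if (n[i] > r.max_n) r.max_n = n[i];
        if (minmn > r.max_minmn) r.max_minmn = minmn;
        if (mxn > r.max_mxn) r.max_mxn = mxn;
    }
    return r;
}
```

For sizes M = {3, 5, 2} and N = {4, 1, 6}, this gives MAX_M = 5, MAX_N = 6, MAX_MINMN = 3 and MAX_MxN = 12; note that the largest product does not have to come from the matrix with the largest M or N.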
magma_int_t magma_sgetrf_nopiv_expert_vbatched(magma_int_t *m, magma_int_t *n, float **dA_array, magma_int_t *ldda, float *dtol_array, float eps, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
SGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting.
It replaces tiny pivots smaller than a specified tolerance by that tolerance.
The factorization has the form A = L * U where L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.
This is the expert version, which takes an extra parameter for the tolerance on diagonal elements. Small diagonal elements are replaced by the specified tolerance, preserving their sign, and the info array reports the number of replacements. This is useful for static pivoting in sparse solvers such as SuperLU, where the tolerance could be, for example, the norm of the matrix scaled by the machine epsilon.
| [in] | M | Array of INTEGERs on the GPU, dimension (batchCount) Each is the number of rows of each matrix A. M[i] >= 0. |
| [in] | N | Array of INTEGERs on the GPU, dimension (batchCount) Each is the number of columns of each matrix A. N[i] >= 0. |
| [in,out] | dA_array | Array of pointers on the GPU, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDA[i],N[i]). On entry, each pointer is an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = L*U; the unit diagonal elements of L are not stored. |
| [in] | ldda | Array of INTEGERs on the GPU Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]). |
| [in] | dtol_array | Array of REALs, dimension (batchCount), for corresponding matrices. Each is the tolerance that is compared to the diagonal element before the column is scaled by its inverse. If the value of the diagonal is less than the threshold, the diagonal is replaced by the threshold. If the array is set to NULL, then the threshold is set to the eps parameter. |
| [in] | eps | REAL The value to use for the tolerance for all matrices if dtol_array is NULL. |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
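The tolerance-replacement rule can be illustrated with a host sketch of an unpivoted LU on a single matrix. Assumptions: the test is made on the magnitude of the diagonal entry (consistent with the note that the sign is preserved), a single scalar tolerance stands in for one dtol_array entry, and the return value counts replacements the way info_array is documented to. `lu_nopiv_tol_ref` is a hypothetical name; this is not MAGMA's GPU kernel.

```c
#include <assert.h>
#include <math.h>

/* Unpivoted LU of one m-by-n column-major matrix. A diagonal entry whose
 * magnitude is below tol is replaced by tol carrying the entry's sign
 * (copysignf; a zero entry becomes +tol). Returns the number of replacements. */
static int lu_nopiv_tol_ref(int m, int n, float *a, int lda, float tol)
{
    assert(lda >= (m > 1 ? m : 1) && tol > 0.0f);
    int replaced = 0, mn = m < n ? m : n;
    for (int j = 0; j < mn; j++) {
        float *d = &a[j + j * lda];
        if (fabsf(*d) < tol) {                 /* static pivoting: bump tiny pivot */
            *d = copysignf(tol, *d);
            replaced++;
        }
        for (int i = j + 1; i < m; i++) {      /* scale column by 1/pivot, update */
            a[i + j * lda] /= *d;
            for (int c = j + 1; c < n; c++)
                a[i + c * lda] -= a[i + j * lda] * a[j + c * lda];
        }
    }
    return replaced;
}
```

Replacing a pivot perturbs A, so the factors are those of a nearby matrix; in the static-pivoting setting mentioned above, that error is usually corrected afterwards, for example by iterative refinement.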
magma_int_t magma_sgetrf_nopiv_vbatched(magma_int_t *m, magma_int_t *n, float **dA_array, magma_int_t *ldda, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
SGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting.
The factorization has the form A = L * U where L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.
| [in] | M | Array of INTEGERs on the GPU, dimension (batchCount) Each is the number of rows of each matrix A. M[i] >= 0. |
| [in] | N | Array of INTEGERs on the GPU, dimension (batchCount) Each is the number of columns of each matrix A. N[i] >= 0. |
| [in,out] | dA_array | Array of pointers on the GPU, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDA[i],N[i]). On entry, each pointer is an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = L*U; the unit diagonal elements of L are not stored. |
| [in] | ldda | Array of INTEGERs on the GPU Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
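In the variable-size routines, each dA_array[i] points to an M[i]-by-N[i] matrix with its own leading dimension. One simple way to provide this is to carve a single device allocation into consecutive column-major blocks. The sketch below only computes that layout on the host, using the smallest legal LDDA[i] = max(1, M[i]) and element offsets; the caller would then form dA + offset[i] for each matrix and copy that pointer array, along with the size arrays, to the GPU. `pack_vbatch_layout` is a hypothetical helper, and padding leading dimensions for alignment is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Lay out batch_count column-major matrices back to back. Fills ldda[i] with
 * max(1, m[i]) and offset[i] with the element offset of matrix i, and returns
 * the total number of elements the shared buffer must hold. */
static size_t pack_vbatch_layout(const int *m, const int *n, int batch_count,
                                 int *ldda, size_t *offset)
{
    assert(batch_count >= 0);
    size_t total = 0;
    for (int i = 0; i < batch_count; i++) {
        ldda[i] = m[i] > 1 ? m[i] : 1;
        offset[i] = total;
        total += (size_t)ldda[i] * (size_t)n[i];
    }
    return total;
}
```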
magma_int_t magma_sgetrf_recpanel_batched(magma_int_t m, magma_int_t n, magma_int_t min_recpnb, float **dA_array, magma_int_t ai, magma_int_t aj, magma_int_t ldda, magma_int_t **dipiv_array, magma_int_t **dpivinfo_array, magma_int_t *info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue)
This is an internal routine that makes many assumptions.
Its documentation is not yet complete.
SGETRF_PANEL computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.
| [in] | m | INTEGER The number of rows of each matrix A. M >= 0. |
| [in] | n | INTEGER The number of columns of each matrix A. N >= 0. |
| [in] | min_recpnb | INTEGER. Internal use. The recursive nb |
| [in,out] | dA_array | Array of pointers, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. |
| [in] | ai | INTEGER Row offset for A. |
| [in] | aj | INTEGER Column offset for A. |
| [in] | ldda | INTEGER The leading dimension of each array A. LDDA >= max(1,M). |
| [out] | dipiv_array | Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). |
| [out] | dpivinfo_array | Array of pointers, dimension (batchCount), for internal use. |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | gbstep | INTEGER internal use. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
magma_int_t magma_sgetrf_recpanel_native(magma_int_t m, magma_int_t n, magma_int_t recnb, magmaFloat_ptr dA, magma_int_t ldda, magma_int_t *dipiv, magma_int_t *dipivinfo, magma_int_t *dinfo, magma_int_t gbstep, magma_event_t events[2], magma_queue_t queue, magma_queue_t update_queue)
This is an internal routine.
SGETRF_PANEL computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is a GPU-only routine. The host CPU is not used.
| [in] | m | INTEGER The number of rows of the matrix A. M >= 0. |
| [in] | n | INTEGER The number of columns of the matrix A. N >= 0. |
| [in,out] | dA | A REAL array on the GPU, dimension (LDDA,N). On entry, an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. |
| [in] | ldda | INTEGER The leading dimension of A. LDDA >= max(1,M). |
| [out] | dipiv | An INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). |
| [out] | dipivinfo | An INTEGER array, for internal use. |
| [out] | dinfo | INTEGER, stored on the GPU. |
| [in] | gbstep | INTEGER internal use. |
| [in] | queue, update_queue | magma_queue_t The two queues to execute in. |
magma_int_t magma_sgetrf_vbatched_max_nocheck_work(magma_int_t *m, magma_int_t *n, magma_int_t max_m, magma_int_t max_n, magma_int_t max_minmn, magma_int_t max_mxn, float **dA_array, magma_int_t *ldda, magma_int_t **dipiv_array, magma_int_t *info_array, void *work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue)
SGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.
| [in] | M | Array of INTEGERs on the GPU, dimension (batchCount) Each is the number of rows of each matrix A. M[i] >= 0. |
| [in] | N | Array of INTEGERs on the GPU, dimension (batchCount) Each is the number of columns of each matrix A. N[i] >= 0. |
| [in] | MAX_M | INTEGER The maximum number of rows across the batch |
| [in] | MAX_N | INTEGER The maximum number of columns across the batch |
| [in] | MAX_MINMN | INTEGER The maximum value of min(Mi, Ni) for i = 1, 2, ..., batchCount |
| [in] | MAX_MxN | INTEGER The maximum value of the product (Mi x Ni) for i = 1, 2, ..., batchCount |
| [in,out] | dA_array | Array of pointers on the GPU, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDA[i],N[i]). On entry, each pointer is an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. |
| [in] | ldda | Array of INTEGERs on the GPU Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]). |
| [out] | dipiv_array | Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M[i],N[i])) The pivot indices; for 1 <= p <= min(M[i],N[i]), row p of the matrix was interchanged with row IPIV(p). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | WORK | VOID pointer A workspace of size LWORK[0] |
| [in,out] | LWORK | INTEGER pointer If lwork[0] < 0, a workspace query is assumed, and lwork[0] is overwritten by the required workspace size in bytes. Otherwise, lwork[0] is the size of work |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
magma_int_t magma_sgetrf_vbatched(magma_int_t *m, magma_int_t *n, float **dA_array, magma_int_t *ldda, magma_int_t **dipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
SGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.
| [in] | M | Array of INTEGERs on the GPU, dimension (batchCount) Each is the number of rows of each matrix A. M[i] >= 0. |
| [in] | N | Array of INTEGERs on the GPU, dimension (batchCount) Each is the number of columns of each matrix A. N[i] >= 0. |
| [in,out] | dA_array | Array of pointers on the GPU, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDA[i],N[i]). On entry, each pointer is an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. |
| [in] | ldda | Array of INTEGERs on the GPU Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]). |
| [out] | dipiv_array | Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M[i],N[i])) The pivot indices; for 1 <= p <= min(M[i],N[i]), row p of the matrix was interchanged with row IPIV(p). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
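The pivot convention above (row p was interchanged with row IPIV(p), 1-based) follows LAPACK. As a minimal sketch of using one matrix's pivots on the host, the helper below applies them to a right-hand side in forward order, which is how LAPACK's xGETRS applies them through xLASWP before the two triangular solves with L and U. `apply_pivots` is a hypothetical name, and it uses plain `int` and `float`.

```c
#include <assert.h>

/* Apply the 1-based row interchanges ipiv[0..k-1] to the vector b (length n)
 * in forward order: for p = 0, 1, ..., k-1, swap b[p] and b[ipiv[p]-1]. */
static void apply_pivots(int k, const int *ipiv, float *b, int n)
{
    for (int p = 0; p < k; p++) {
        int q = ipiv[p] - 1;
        assert(q >= p && q < n);   /* partial pivoting never picks a row above p */
        float t = b[p];
        b[p] = b[q];
        b[q] = t;
    }
}
```

The order matters: with IPIV = {3, 3, 3}, applying the swaps forward to b = {1, 2, 3} gives {3, 1, 2}, whereas applying them in reverse would give a different vector.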
magma_int_t magma_zgbsv_batched_work(magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, magmaDoubleComplex **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, magmaDoubleComplex **dB_array, magma_int_t lddb, magma_int_t *info_array, void *device_work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue)
ZGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.
The LU decomposition with partial pivoting and row interchanges is used to factor A as A = L * U, where L is a product of permutation and unit lower triangular matrices with KL subdiagonals, and U is upper triangular with KL+KU superdiagonals. The factored form of A is then used to solve the system of equations A * X = B.
This is the batched version of the routine.
| [in] | n | INTEGER The order of the matrix A. n >= 0. |
| [in] | kl | INTEGER The number of subdiagonals within the band of A. KL >= 0. |
| [in] | ku | INTEGER The number of superdiagonals within the band of A. KU >= 0. |
| [in] | nrhs | INTEGER The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0. |
| [in,out] | dA_array | Array of pointers, dimension (batchCount). Each is a COMPLEX*16 array, dimension (LDDA,N). On entry, the matrix A in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. On exit, details of the LU factorization of A, as computed by ZGBTRF: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. |
| [in] | ldda | INTEGER The leading dimension of each array A. LDDA >= (2*KL+KU+1). |
| [out] | dipiv_array | Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i). |
| [in,out] | dB_array | Array of pointers, dimension (batchCount). Each is a COMPLEX*16 array, dimension (LDDB,NRHS). On entry, the right hand side matrix B. On exit, the solution matrix X. |
| [in] | lddb | INTEGER The leading dimension of each array B. LDDB >= max(1, N). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in,out] | device_work | Workspace, allocated on device memory. |
| [in,out] | lwork | INTEGER pointer The size of the workspace (device_work) in bytes. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
magma_int_t magma_zgbsv_batched(magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, magmaDoubleComplex **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, magmaDoubleComplex **dB_array, magma_int_t lddb, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
ZGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.
The LU decomposition with partial pivoting and row interchanges is used to factor A as A = L * U, where L is a product of permutation and unit lower triangular matrices with KL subdiagonals, and U is upper triangular with KL+KU superdiagonals. The factored form of A is then used to solve the system of equations A * X = B.
This is the batched version of the routine.
| [in] | n | INTEGER The order of the matrix A. n >= 0. |
| [in] | kl | INTEGER The number of subdiagonals within the band of A. KL >= 0. |
| [in] | ku | INTEGER The number of superdiagonals within the band of A. KL >= 0. |
| [in] | nrhs | INTEGER The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0. |
| [in,out] | dA_array | Array of pointers, dimension (batchCount). Each is a COMPLEX*16 array, dimension (LDDA,N). On entry, the matrix A in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. On exit, the details of the LU factorization, as computed by ZGBTRF: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. |
| [in] | ldda | INTEGER The leading dimension of each array A. LDDA >= (2*KL+KU+1). |
| [out] | dipiv_array | Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i). |
| [in,out] | dB_array | Array of pointers, dimension (batchCount). Each is a COMPLEX*16 array, dimension (LDDB,NRHS). On entry, the right hand side matrix B. On exit, the solution matrix X. |
| [in] | lddb | INTEGER The leading dimension of each array B. LDDB >= max(1, N). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
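The statement that "L is a product of permutation and unit lower triangular matrices" can be written out explicitly. Following the usual LAPACK convention for band LU (this is a reading of the description above, not a MAGMA-specific statement), the factored form is

$$A = P_1 L_1 P_2 L_2 \cdots P_{n-1} L_{n-1} U,$$

where $P_j$ interchanges rows $j$ and IPIV($j$), and $L_j$ is unit lower triangular with at most KL nonzeros below the diagonal, all in column $j$. Because each $P_j$ is its own inverse, the solve step applies the factors in order:

$$X = U^{-1} L_{n-1}^{-1} P_{n-1} \cdots L_2^{-1} P_2 L_1^{-1} P_1 B,$$

that is, each column of B is permuted and eliminated one column of L at a time, and then back-substituted with the band factor U, which has KL+KU superdiagonals.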
magma_int_t magma_zgbtrf_batched_work (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, magmaDoubleComplex **dAB_array, magma_int_t lddab, magma_int_t **dipiv_array, magma_int_t *info_array, void *device_work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue)
ZGBTRF computes an LU factorization of a complex m-by-n band matrix AB using partial pivoting with row interchanges.
This is a batched version that factors batchCount M-by-N matrices in parallel. dAB, dipiv, and info become arrays with one entry per matrix.
The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:
    On entry:                          On exit:
        *    *    *    +    +    +       *    *    *   u14  u25  u36
        *    *    +    +    +    +       *    *   u13  u24  u35  u46
        *   a12  a23  a34  a45  a56      *   u12  u23  u34  u45  u56
       a11  a22  a33  a44  a55  a66     u11  u22  u33  u44  u55  u66
       a21  a32  a43  a54  a65   *      m21  m32  m43  m54  m65   *
       a31  a42  a53  a64   *    *      m31  m42  m53  m64   *    *
Unlike in the standard LAPACK routine, array elements marked * are not read by the routine but may be set to zero on exit. Elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.
| [in] | m | INTEGER The number of rows of each matrix A. M >= 0. |
| [in] | n | INTEGER The number of columns of each matrix A. N >= 0. |
| [in] | kl | INTEGER The number of subdiagonals within the band of A. KL >= 0. |
| [in] | ku | INTEGER The number of superdiagonals within the band of A. KU >= 0. |
| [in,out] | dAB_array | Array of pointers, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDAB,N). On entry, the matrix A in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl) |
On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See above for details about the band storage.
| [in] | lddab | INTEGER The leading dimension of each array AB. LDDAB >= (2*KL+KU+1). |
| [out] | dipiv_array | Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). |
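| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. Each is the INFO output for a given matrix: = 0: successful exit; < 0: if INFO = -i, the i-th argument had an illegal value; > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations. |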
| [in,out] | device_work | Workspace, allocated on device memory. |
| [in,out] | lwork | INTEGER pointer The size of the workspace (device_work) in bytes. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
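Packing a dense matrix into the band layout given by the dAB_array formula above is the usual first step before calling these routines. The sketch below is a plain host-side C helper that assumes real double data for brevity (the Z routines use magmaDoubleComplex) and column-major storage; the name pack_band is hypothetical and not part of MAGMA. In practice the packed arrays would then be copied to device memory.

```c
#include <assert.h>
#include <stddef.h>

/* Copy the band of a dense, column-major m-by-n matrix A (leading
   dimension lda) into band storage AB (leading dimension ldab).
   0-based form of AB(kl+ku+1+i-j, j) = A(i,j):
       ab[(kl + ku + i - j) + j*ldab] = a[i + j*lda]
   for max(0, j-ku) <= i <= min(m-1, j+kl). Every other entry of AB,
   including the KL fill-in rows at the top, is set to zero. */
void pack_band(int m, int n, int kl, int ku,
               const double *a, int lda, double *ab, int ldab)
{
    assert(ldab >= 2 * kl + ku + 1);   /* LDDAB >= 2*KL+KU+1 */
    for (int j = 0; j < n; j++) {
        for (int r = 0; r < ldab; r++)
            ab[r + (size_t)j * ldab] = 0.0;
        int i0 = j - ku > 0 ? j - ku : 0;
        int i1 = j + kl < m - 1 ? j + kl : m - 1;
        for (int i = i0; i <= i1; i++)
            ab[(kl + ku + i - j) + (size_t)j * ldab] = a[i + (size_t)j * lda];
    }
}
```

Applied to the M = N = 6, KL = 2, KU = 1 example with LDAB = 6, this places a11 in (1-based) row 4 of column 1 and a12 in row 3 of column 2, matching the storage diagram.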
magma_int_t magma_zgbtrf_batched (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, magmaDoubleComplex **dAB_array, magma_int_t lddab, magma_int_t **dipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
ZGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges.
This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.
| [in] | M | INTEGER The number of rows of the matrix A. M >= 0. |
| [in] | N | INTEGER The number of columns of the matrix A. N >= 0. |
| [in] | KL | INTEGER The number of subdiagonals within the band of A. KL >= 0. |
| [in] | KU | INTEGER The number of superdiagonals within the band of A. KU >= 0. |
| [in,out] | dAB_array | Array of pointers, dimension (batchCount). Each is a COMPLEX_16 array, dimension (LDDAB,N) On entry, the matrix AB in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl) |
On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See below for further details.
| [in] | LDDAB | INTEGER The leading dimension of the array AB. LDDAB >= 2*KL+KU+1. |
| [out] | dipiv_array | Array of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). |
| [out] | info_array | INTEGER array, dimension (batchCount). Each is the INFO output for a given matrix: = 0: successful exit; < 0: if INFO = -i, the i-th argument had an illegal value; > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:
    On entry:                          On exit:
        *    *    *    +    +    +       *    *    *   u14  u25  u36
        *    *    +    +    +    +       *    *   u13  u24  u35  u46
        *   a12  a23  a34  a45  a56      *   u12  u23  u34  u45  u56
       a11  a22  a33  a44  a55  a66     u11  u22  u33  u44  u55  u66
       a21  a32  a43  a54  a65   *      m21  m32  m43  m54  m65   *
       a31  a42  a53  a64   *    *      m31  m42  m53  m64   *    *
Array elements marked * are not used by the routine; elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.
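To make the storage scheme concrete, the following host-side sketch factors one band matrix in this layout. It follows the unblocked LAPACK xGBTF2 algorithm, which is not necessarily what MAGMA's GPU kernels do, and it uses real double data instead of COMPLEX_16 for brevity (a complex version would choose the pivot by |Re|+|Im|, as LAPACK's IZAMAX does). Pivots are returned 1-based, and the return value is an INFO code with the meaning documented above. The function name band_lu is hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Element (r, c) of the band array, both 0-based. */
#define AB(r, c) ab[(r) + (size_t)(c) * ldab]

static int imin(int x, int y) { return x < y ? x : y; }
static int imax(int x, int y) { return x > y ? x : y; }

/* Unblocked band LU with partial pivoting (LAPACK xGBTF2 scheme).
   The diagonal of A sits in 0-based row kv = kl+ku of AB. */
int band_lu(int m, int n, int kl, int ku, double *ab, int ldab, int *ipiv)
{
    assert(ldab >= 2 * kl + ku + 1);
    int kv = kl + ku, info = 0, ju = 0;   /* ju: last column touched so far */

    /* Zero the fill-in ("+") entries of the first columns. */
    for (int j = ku + 1; j < imin(kv, n); j++)
        for (int i = kv - j; i < kl; i++)
            AB(i, j) = 0.0;

    for (int j = 0; j < imin(m, n); j++) {
        /* Zero the fill-in entries of column j+kv before they are used. */
        if (j + kv < n)
            for (int i = 0; i < kl; i++)
                AB(i, j + kv) = 0.0;

        /* Pivot search over the diagonal and km subdiagonal entries. */
        int km = imin(kl, m - j - 1), p = 0;
        for (int i = 1; i <= km; i++)
            if (fabs(AB(kv + i, j)) > fabs(AB(kv + p, j)))
                p = i;
        ipiv[j] = j + p + 1;                       /* 1-based row index */

        if (AB(kv + p, j) != 0.0) {
            ju = imax(ju, imin(j + ku + p, n - 1));
            /* Swap rows j and j+p in columns j..ju (stride ldab-1 in AB). */
            if (p != 0)
                for (int k = 0; k <= ju - j; k++) {
                    double t = AB(kv + p - k, j + k);
                    AB(kv + p - k, j + k) = AB(kv - k, j + k);
                    AB(kv - k, j + k) = t;
                }
            if (km > 0) {
                double inv = 1.0 / AB(kv, j);
                for (int i = 1; i <= km; i++)      /* multipliers of L */
                    AB(kv + i, j) *= inv;
                /* Rank-1 update of the trailing block inside the band. */
                for (int k = 0; k < ju - j; k++)
                    for (int i = 0; i < km; i++)
                        AB(kv + i - k, j + 1 + k) -=
                            AB(kv + 1 + i, j) * AB(kv - 1 - k, j + 1 + k);
            }
        } else if (info == 0) {
            info = j + 1;                          /* U(j,j) exactly zero */
        }
    }
    return info;
}
```

For A = [[1, 2], [3, 4]] with KL = KU = 1 and LDAB = 4, the sketch swaps the two rows (IPIV = 2, 2) and stores u11 = 3, u12 = 4, u22 = 2/3 and the multiplier 1/3 in the positions shown in the diagram.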
magma_int_t magma_zgbtrs_batched (magma_trans_t transA, magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, magmaDoubleComplex **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, magmaDoubleComplex **dB_array, magma_int_t lddb, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
ZGBTRS solves a system of linear equations A * X = B, A**T * X = B, or A**H * X = B with a general band matrix A using the LU factorization computed by ZGBTRF.
This is the batched version of the routine. Currently, only the non-transposed system (A * X = B) is supported.
| [in] | transA | magma_trans_t Specifies the form of the system of equations. Currently, only MagmaNoTrans (A * X = B) is supported. |
| [in] | n | INTEGER The order of the matrix A. n >= 0. |
| [in] | kl | INTEGER The number of subdiagonals within the band of A. KL >= 0. |
| [in] | ku | INTEGER The number of superdiagonals within the band of A. KU >= 0. |
| [in] | nrhs | INTEGER The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0. |
| [in] | dA_array | Array of pointers, dimension (batchCount). Each contains the details of the LU factorization of the band matrix A, as computed by ZGBTRF. U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. |
| [in] | ldda | INTEGER The leading dimension of each array A. LDDA >= (2*KL+KU+1). |
| [in] | dipiv_array | Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i). |
| [in,out] | dB_array | Array of pointers, dimension (batchCount). Each is a COMPLEX*16 array, dimension (LDDB,NRHS). On entry, the right hand side matrix B. On exit, the solution matrix X. |
| [in] | lddb | INTEGER The leading dimension of each array B. LDDB >= max(1, N). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
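The no-transpose solve can be sketched on the host as follows: for each column j, the row interchange IPIV(j) and the multipliers of L are applied to B, and B is then back-substituted with the band factor U. This mirrors LAPACK's xGBTRS for the no-transpose case; it is a real-valued, single-matrix illustration with the hypothetical name band_lu_solve, not MAGMA's kernel. It assumes the factor layout produced by ZGBTRF (diagonal of U in 0-based row KL+KU, multipliers directly below it) and 1-based pivots.

```c
#include <assert.h>
#include <stddef.h>

/* Element (r, c) of the band array, both 0-based. */
#define AB(r, c) ab[(r) + (size_t)(c) * ldab]

/* Solve A*X = B (no transpose) from band LU factors; B is overwritten by X. */
void band_lu_solve(int n, int kl, int ku, int nrhs, const double *ab, int ldab,
                   const int *ipiv, double *b, int ldb)
{
    assert(ldab >= 2 * kl + ku + 1);
    int kv = kl + ku;                       /* superdiagonals of U */
    for (int c = 0; c < nrhs; c++) {
        double *x = b + (size_t)c * ldb;
        /* Forward pass: apply P_j, then L_j^{-1}, for j = 0..n-2. */
        for (int j = 0; kl > 0 && j < n - 1; j++) {
            int lm = kl < n - j - 1 ? kl : n - j - 1;
            int l = ipiv[j] - 1;            /* pivots are 1-based */
            if (l != j) {
                double t = x[l]; x[l] = x[j]; x[j] = t;
            }
            for (int i = 1; i <= lm; i++)
                x[j + i] -= AB(kv + i, j) * x[j];
        }
        /* Back substitution with the upper band factor U. */
        for (int j = n - 1; j >= 0; j--) {
            x[j] /= AB(kv, j);
            for (int i = j - 1; i >= 0 && i >= j - kv; i--)
                x[i] -= x[j] * AB(kv + i - j, j);
        }
    }
}
```

For example, with the band factors of [[1, 2], [3, 4]] (KL = KU = 1, LDAB = 4, IPIV = 2, 2) and B = (3, 7), the sketch returns X = (1, 1) up to rounding.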
magma_int_t magma_zgetrf_batched (magma_int_t m, magma_int_t n, magmaDoubleComplex **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
ZGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.
| [in] | m | INTEGER The number of rows of each matrix A. M >= 0. |
| [in] | n | INTEGER The number of columns of each matrix A. N >= 0. |
| [in,out] | dA_array | Array of pointers, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. |
| [in] | ldda | INTEGER The leading dimension of each array A. LDDA >= max(1,M). |
| [out] | ipiv_array | Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
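The batched calling convention (one matrix pointer, one pivot array and one INFO entry per matrix, all matrices sharing m, n and ldda) can be modelled on the host as below. The per-matrix kernel is an unblocked partial-pivoting LU in the style of LAPACK's xGETF2, not the right-looking blocked GPU algorithm described above; real double data, plain int in place of magma_int_t, and the names lu_partial_pivot and lu_batched_ref are assumptions for illustration.

```c
#include <assert.h>
#include <math.h>

/* Unblocked LU with partial pivoting on one column-major m-by-n matrix.
   ipiv is 1-based; the return value is INFO (j+1 if U(j,j) is exactly zero). */
static int lu_partial_pivot(int m, int n, double *a, int lda, int *ipiv)
{
    int info = 0, mn = m < n ? m : n;
    for (int j = 0; j < mn; j++) {
        int p = j;                            /* first entry of max |a(i,j)| */
        for (int i = j + 1; i < m; i++)
            if (fabs(a[i + j * lda]) > fabs(a[p + j * lda]))
                p = i;
        ipiv[j] = p + 1;
        if (a[p + j * lda] != 0.0) {
            if (p != j)                       /* swap whole rows j and p */
                for (int c = 0; c < n; c++) {
                    double t = a[j + c * lda];
                    a[j + c * lda] = a[p + c * lda];
                    a[p + c * lda] = t;
                }
            double inv = 1.0 / a[j + j * lda];
            for (int i = j + 1; i < m; i++)
                a[i + j * lda] *= inv;        /* multipliers of L */
        } else if (info == 0) {
            info = j + 1;
        }
        for (int c = j + 1; c < n; c++)       /* rank-1 trailing update */
            for (int i = j + 1; i < m; i++)
                a[i + c * lda] -= a[i + j * lda] * a[j + c * lda];
    }
    return info;
}

/* Host model of the batched interface: matrix b uses A_array[b],
   ipiv_array[b] and info_array[b]. */
void lu_batched_ref(int m, int n, double **A_array, int lda, int **ipiv_array,
                    int *info_array, int batchCount)
{
    assert(lda >= (m > 1 ? m : 1));
    for (int b = 0; b < batchCount; b++)
        info_array[b] = lu_partial_pivot(m, n, A_array[b], lda, ipiv_array[b]);
}
```

A singular matrix in the batch only sets its own info_array entry; the other matrices are factored normally.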
magma_int_t magma_zgetrf_nopiv_vbatched_max_nocheck_work (magma_int_t *m, magma_int_t *n, magma_int_t max_m, magma_int_t max_n, magma_int_t max_minmn, magma_int_t max_mxn, magmaDoubleComplex **dA_array, magma_int_t *ldda, double *dtol_array, double eps, magma_int_t *info_array, void *work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue)
ZGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting.
It replaces tiny pivots smaller than a specified tolerance by that tolerance.
The factorization has the form A = L * U where L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.
| [in] | M | Array of INTEGERs on the GPU, dimension (batchCount) Each is the number of rows of each matrix A. M[i] >= 0. |
| [in] | N | Array of INTEGERs on the GPU, dimension (batchCount) Each is the number of columns of each matrix A. N[i] >= 0. |
| [in] | MAX_M | INTEGER The maximum number of rows across the batch |
| [in] | MAX_N | INTEGER The maximum number of columns across the batch |
| [in] | MAX_MINMN | INTEGER The maximum value of min(Mi, Ni) for i = 1, 2, ..., batchCount |
| [in] | MAX_MxN | INTEGER The maximum value of the product (Mi x Ni) for i = 1, 2, ..., batchCount |
| [in,out] | dA_array | Array of pointers on the GPU, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDA[i],N[i]). On entry, each pointer is an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = L*U; the unit diagonal elements of L are not stored. |
| [in] | ldda | Array of INTEGERs on the GPU Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]). |
| [in] | dtol_array | Array of DOUBLEs, dimension (batchCount), for corresponding matrices. Each is the tolerance that is compared to the diagonal element before the column is scaled by its inverse. If the magnitude of the diagonal element is less than the threshold, the diagonal element is replaced by the threshold. If dtol_array is NULL, the eps parameter is used as the threshold for all matrices. |
| [in] | eps | DOUBLE The value to use for the tolerance for all matrices if the dtol_array is NULL |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | WORK | VOID pointer A workspace of size LWORK[0] |
| [in,out] | LWORK | INTEGER pointer If lwork[0] < 0, a workspace query is assumed, and lwork[0] is overwritten by the required workspace size in bytes. Otherwise, lwork[0] is the size of work |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
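The LWORK convention above (a negative lwork[0] requests a workspace query) implies a two-phase calling pattern: query the size, allocate, then call again. The sketch shows that pattern with a purely hypothetical stand-in, toy_work_routine, which is not a MAGMA function and does no factorization; its size formula and error code are made up. With MAGMA itself the buffer would live in device memory (for example, allocated with magma_malloc) and lwork would be a magma_int_t.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a *_work routine: it only mimics the LWORK
   convention (lwork[0] < 0 means "query"); it does no real factorization. */
int toy_work_routine(int batchCount, void *work, long long *lwork)
{
    assert(lwork != NULL);
    long long need = 64LL * batchCount;       /* made-up size in bytes */
    if (lwork[0] < 0) {                       /* workspace query only */
        lwork[0] = need;
        return 0;
    }
    if (lwork[0] < need || (need > 0 && work == NULL))
        return -1;                            /* hypothetical "too small" error */
    if (need > 0)
        memset(work, 0, (size_t)need);       /* pretend to use the workspace */
    return 0;
}

/* Caller side: query the size, allocate, then make the real call. */
int run_with_workspace(int batchCount)
{
    long long lwork = -1;
    toy_work_routine(batchCount, NULL, &lwork);
    void *work = lwork > 0 ? malloc((size_t)lwork) : NULL;
    if (lwork > 0 && work == NULL)
        return -2;                            /* allocation failed */
    int info = toy_work_routine(batchCount, work, &lwork);
    free(work);
    return info;
}
```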
magma_int_t magma_zgetrf_nopiv_expert_vbatched (magma_int_t *m, magma_int_t *n, magmaDoubleComplex **dA_array, magma_int_t *ldda, double *dtol_array, double eps, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
ZGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting.
It replaces tiny pivots smaller than a specified tolerance by that tolerance.
The factorization has the form A = L * U where L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.
This is the expert version, which takes extra parameters for the tolerance on the diagonal elements. Small diagonal elements are replaced by the specified tolerance, preserving their sign, and the info array reports the number of replacements. This is useful for static pivoting in sparse solvers such as SuperLU, where the tolerance could be, for example, the norm of the matrix scaled by the machine epsilon.
| [in] | M | Array of INTEGERs on the GPU, dimension (batchCount) Each is the number of rows of each matrix A. M[i] >= 0. |
| [in] | N | Array of INTEGERs on the GPU, dimension (batchCount) Each is the number of columns of each matrix A. N[i] >= 0. |
| [in,out] | dA_array | Array of pointers on the GPU, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDA[i],N[i]). On entry, each pointer is an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = L*U; the unit diagonal elements of L are not stored. |
| [in] | ldda | Array of INTEGERs on the GPU Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]). |
| [in] | dtol_array | Array of DOUBLEs, dimension (batchCount), for corresponding matrices. Each is the tolerance that is compared to the diagonal element before the column is scaled by its inverse. If the magnitude of the diagonal element is less than the threshold, the diagonal element is replaced by the threshold. If dtol_array is NULL, the eps parameter is used as the threshold for all matrices. |
| [in] | eps | DOUBLE The value to use for the tolerance for all matrices if the dtol_array is NULL |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
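The diagonal-replacement rule of the expert routine can be sketched for one matrix as follows. This is a real-valued, unblocked host version under stated assumptions: the comparison uses the magnitude of the diagonal entry, the replacement keeps its sign via copysign, and the return value counts replacements, which is what the description says info_array reports. pick_tolerance models the dtol_array/eps fallback. All names are hypothetical, and how MAGMA handles the "sign" of a complex pivot is not shown here.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Tolerance for matrix i: dtol_array[i] if the array is given, else eps. */
double pick_tolerance(const double *dtol_array, double eps, int i)
{
    return dtol_array != NULL ? dtol_array[i] : eps;
}

/* Right-looking LU without pivoting on a column-major m-by-n matrix.
   A diagonal entry with |a(k,k)| < tol is replaced by tol with the same
   sign before its column is scaled; returns the number of replacements.
   (With tol == 0 an exactly zero pivot is not replaced.) */
int lu_nopiv_tol(int m, int n, double *a, int lda, double tol)
{
    assert(lda >= (m > 1 ? m : 1));
    int nrep = 0, mn = m < n ? m : n;
    for (int k = 0; k < mn; k++) {
        double *akk = &a[k + (size_t)k * lda];
        if (fabs(*akk) < tol) {
            *akk = copysign(tol, *akk);
            nrep++;
        }
        double inv = 1.0 / *akk;
        for (int i = k + 1; i < m; i++)            /* multipliers of L */
            a[i + (size_t)k * lda] *= inv;
        for (int j = k + 1; j < n; j++) {          /* trailing update */
            double ukj = a[k + (size_t)j * lda];
            for (int i = k + 1; i < m; i++)
                a[i + (size_t)j * lda] -= a[i + (size_t)k * lda] * ukj;
        }
    }
    return nrep;
}
```

With tol = 1e-8, the 2-by-2 matrix [[1e-20, 1], [1, 1]] gets one replacement: its leading pivot becomes 1e-8, and the factorization then proceeds without a division by a near-zero value.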
magma_int_t magma_zgetrf_nopiv_vbatched (magma_int_t *m, magma_int_t *n, magmaDoubleComplex **dA_array, magma_int_t *ldda, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
ZGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting.
The factorization has the form A = L * U where L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.
| [in] | M | Array of INTEGERs on the GPU, dimension (batchCount) Each is the number of rows of each matrix A. M[i] >= 0. |
| [in] | N | Array of INTEGERs on the GPU, dimension (batchCount) Each is the number of columns of each matrix A. N[i] >= 0. |
| [in,out] | dA_array | Array of pointers on the GPU, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDA[i],N[i]). On entry, each pointer is an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = L*U; the unit diagonal elements of L are not stored. |
| [in] | ldda | Array of INTEGERs on the GPU Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
magma_int_t magma_zgetrf_recpanel_batched (magma_int_t m, magma_int_t n, magma_int_t min_recpnb, magmaDoubleComplex **dA_array, magma_int_t ai, magma_int_t aj, magma_int_t ldda, magma_int_t **dipiv_array, magma_int_t **dpivinfo_array, magma_int_t *info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue)
This is an internal routine that may rely on many assumptions.
Its documentation is incomplete.
ZGETRF_PANEL computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.
| [in] | m | INTEGER The number of rows of each matrix A. M >= 0. |
| [in] | n | INTEGER The number of columns of each matrix A. N >= 0. |
| [in] | min_recpnb | INTEGER. Internal use. The recursive nb |
| [in,out] | dA_array | Array of pointers, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. |
| [in] | ai | INTEGER Row offset for A. |
| [in] | aj | INTEGER Column offset for A. |
| [in] | ldda | INTEGER The leading dimension of each array A. LDDA >= max(1,M). |
| [out] | dipiv_array | Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). |
| [out] | dpivinfo_array | Array of pointers, dimension (batchCount), for internal use. |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | gbstep | INTEGER internal use. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
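The ai and aj parameters are documented only as row and column offsets. Assuming they select the submatrix that starts at element (ai, aj) of each column-major matrix, which is how such offsets are conventionally applied, the pointer arithmetic for a whole batch looks like the sketch below. offset_batch is a hypothetical helper, and real double stands in for COMPLEX_16 (the arithmetic is the same for any element type).

```c
#include <assert.h>
#include <stddef.h>

/* Column-major storage: element (ai, aj) of a matrix with leading
   dimension ldda lives at A + ai + aj*ldda. Apply that offset to every
   matrix pointer of a batch. */
void offset_batch(double **dA_array, int ai, int aj, int ldda,
                  double **dA_sub, int batchCount)
{
    assert(ai >= 0 && aj >= 0 && ai < ldda);
    for (int b = 0; b < batchCount; b++)
        dA_sub[b] = dA_array[b] + ai + (size_t)aj * ldda;
}
```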
magma_int_t magma_zgetrf_recpanel_native (magma_int_t m, magma_int_t n, magma_int_t recnb, magmaDoubleComplex_ptr dA, magma_int_t ldda, magma_int_t *dipiv, magma_int_t *dipivinfo, magma_int_t *dinfo, magma_int_t gbstep, magma_event_t events[2], magma_queue_t queue, magma_queue_t update_queue)
This is an internal routine.
ZGETRF_PANEL computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is a GPU-only routine. The host CPU is not used.
| [in] | m | INTEGER The number of rows of the matrix A. M >= 0. |
| [in] | n | INTEGER The number of columns of the matrix A. N >= 0. |
| [in,out] | dA | A COMPLEX_16 array on the GPU, dimension (LDDA,N). On entry, an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. |
| [in] | ldda | INTEGER The leading dimension of A. LDDA >= max(1,M). |
| [out] | dipiv | An INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). |
| [out] | dipivinfo | An INTEGER array, for internal use. |
| [out] | dinfo | INTEGER, stored on the GPU. |
| [in] | gbstep | INTEGER internal use. |
| [in] | queue, update_queue | magma_queue_t Queues to execute in. |
magma_int_t magma_zgetrf_vbatched_max_nocheck_work (magma_int_t *m, magma_int_t *n, magma_int_t max_m, magma_int_t max_n, magma_int_t max_minmn, magma_int_t max_mxn, magmaDoubleComplex **dA_array, magma_int_t *ldda, magma_int_t **dipiv_array, magma_int_t *info_array, void *work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue)
ZGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.
| [in] | M | Array of INTEGERs on the GPU, dimension (batchCount) Each is the number of rows of each matrix A. M[i] >= 0. |
| [in] | N | Array of INTEGERs on the GPU, dimension (batchCount) Each is the number of columns of each matrix A. N[i] >= 0. |
| [in] | MAX_M | INTEGER The maximum number of rows across the batch |
| [in] | MAX_N | INTEGER The maximum number of columns across the batch |
| [in] | MAX_MINMN | INTEGER The maximum value of min(Mi, Ni) for i = 1, 2, ..., batchCount |
| [in] | MAX_MxN | INTEGER The maximum value of the product (Mi x Ni) for i = 1, 2, ..., batchCount |
| [in,out] | dA_array | Array of pointers on the GPU, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDA[i],N[i]). On entry, each pointer is an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. |
| [in] | ldda | Array of INTEGERs on the GPU Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]). |
| [out] | dipiv_array | Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M[i],N[i])) The pivot indices; for 1 <= p <= min(M[i],N[i]), row p of the matrix was interchanged with row IPIV(p). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | WORK | VOID pointer A workspace of size LWORK[0] |
| [in,out] | LWORK | INTEGER pointer If lwork[0] < 0, a workspace query is assumed, and lwork[0] is overwritten by the required workspace size in bytes. Otherwise, lwork[0] is the size of work |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
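M, N and LDDA live on the GPU, while MAX_M, MAX_N, MAX_MINMN and MAX_MxN are host scalars, so a caller typically keeps host copies of the sizes to compute those bounds. A minimal sketch, assuming plain int for magma_int_t; the struct and function names are hypothetical.

```c
#include <assert.h>

/* Scalar bounds over a variable-size batch, computed from host copies
   of the per-matrix sizes m[] and n[]. */
typedef struct {
    int max_m, max_n, max_minmn, max_mxn;
} vbatch_maxima;

vbatch_maxima compute_vbatch_maxima(const int *m, const int *n, int batchCount)
{
    vbatch_maxima r = {0, 0, 0, 0};
    for (int i = 0; i < batchCount; i++) {
        assert(m[i] >= 0 && n[i] >= 0);
        int minmn = m[i] < n[i] ? m[i] : n[i];
        int mxn = m[i] * n[i];             /* product for this matrix only */
        if (m[i] > r.max_m) r.max_m = m[i];
        if (n[i] > r.max_n) r.max_n = n[i];
        if (minmn > r.max_minmn) r.max_minmn = minmn;
        if (mxn > r.max_mxn) r.max_mxn = mxn;
    }
    return r;
}
```

Note that MAX_MxN is the largest per-matrix product, not MAX_M * MAX_N: for sizes 3x7, 10x2 and 5x5 it is 25, while MAX_M * MAX_N would be 70.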
magma_int_t magma_zgetrf_vbatched (magma_int_t *m, magma_int_t *n, magmaDoubleComplex **dA_array, magma_int_t *ldda, magma_int_t **dipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
ZGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Level 3 BLAS version of the algorithm.
This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.
| [in] | M | Array of INTEGERs on the GPU, dimension (batchCount) Each is the number of rows of each matrix A. M[i] >= 0. |
| [in] | N | Array of INTEGERs on the GPU, dimension (batchCount) Each is the number of columns of each matrix A. N[i] >= 0. |
| [in,out] | dA_array | Array of pointers on the GPU, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDA[i],N[i]). On entry, each pointer is an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. |
| [in] | ldda | Array of INTEGERs on the GPU Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]). |
| [out] | dipiv_array | Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M[i],N[i])) The pivot indices; for 1 <= p <= min(M[i],N[i]), row p of the matrix was interchanged with row IPIV(p). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
magma_int_t magma_cgbsv_batched_fused_sm (magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, magmaFloatComplex **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magmaFloatComplex **dB_array, magma_int_t lddb, magma_int_t *info_array, magma_int_t nthreads, magma_int_t ntcol, magma_int_t batchCount, magma_queue_t queue)
CGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.
The LU decomposition with partial pivoting and row interchanges is used to factor A as A = L * U, where L is a product of permutation and unit lower triangular matrices with KL subdiagonals, and U is upper triangular with KL+KU superdiagonals. The factored form of A is then used to solve the system of equations A * X = B.
This is the batched version of the routine.
| [in] | n | INTEGER The order of the matrix A. n >= 0. |
| [in] | kl | INTEGER The number of subdiagonals within the band of A. KL >= 0. |
| [in] | ku | INTEGER The number of superdiagonals within the band of A. KU >= 0. |
| [in] | nrhs | INTEGER The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0. |
| [in,out] | dA_array | Array of pointers, dimension (batchCount). Each is a COMPLEX array, dimension (LDDA,N). On entry, the matrix A in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. On exit, the details of the LU factorization, as computed by CGBTRF: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. |
| [in] | ldda | INTEGER The leading dimension of each array A. LDDA >= (2*KL+KU+1). |
| [out] | ipiv_array | Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i). |
| [in,out] | dB_array | Array of pointers, dimension (batchCount). Each is a COMPLEX array, dimension (LDDB,NRHS). On entry, the right hand side matrix B. On exit, the solution matrix X. |
| [in] | lddb | INTEGER The leading dimension of each array B. LDDB >= max(1, N). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | nthreads | INTEGER The number of threads assigned to a single matrix. nthreads >= (KL+1) |
| [in] | ntcol | INTEGER The number of concurrent factorizations in a thread-block ntcol >= 1 |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
magma_int_t magma_cgbtrf_batched_fused_sm (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, magmaFloatComplex **dAB_array, magma_int_t lddab, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t nthreads, magma_int_t ntcol, magma_int_t batchCount, magma_queue_t queue)
CGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges.
This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.
This routine has shared memory requirements that may exceed the capacity of the GPU. In such a case, the routine exits immediately, returning a negative error code.
| [in] | M | INTEGER The number of rows of the matrix A. M >= 0. |
| [in] | N | INTEGER The number of columns of the matrix A. N >= 0. |
| [in] | KL | INTEGER The number of subdiagonals within the band of A. KL >= 0. |
| [in] | KU | INTEGER The number of superdiagonals within the band of A. KU >= 0. |
| [in,out] | dAB_array | Array of pointers, dimension (batchCount). Each is a COMPLEX array, dimension (LDDAB,N) On entry, the matrix AB in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl) |
On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See below for further details.
| [in] | LDDAB | INTEGER The leading dimension of the array AB. LDDAB >= 2*KL+KU+1. |
| [out] | ipiv_array | Array of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). |
| [out] | info_array | INTEGER array, dimension (batchCount). Each is the INFO output for a given matrix: = 0: successful exit; < 0: if INFO = -i, the i-th argument had an illegal value; > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations. |
| [in] | nthreads | INTEGER The number of threads assigned to a single matrix. nthreads >= KL+1. |
| [in] | ntcol | INTEGER The number of concurrent factorizations in a thread block. ntcol >= 1. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:
On entry:                          On exit:
  *    *    *    +    +    +          *    *    *   u14  u25  u36
  *    *    +    +    +    +          *    *   u13  u24  u35  u46
  *   a12  a23  a34  a45  a56         *   u12  u23  u34  u45  u56
 a11  a22  a33  a44  a55  a66        u11  u22  u33  u44  u55  u66
 a21  a32  a43  a54  a65   *         m21  m32  m43  m54  m65   *
 a31  a42  a53  a64   *    *         m31  m42  m53  m64   *    *
Array elements marked * are not used by the routine, but may be set to zero after completion. Elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.
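The following minimal C sketch packs a dense column-major matrix into the on-entry band layout. It assumes real double-precision data (the routine above works on COMPLEX data), 0-based indices, and plain `int` sizes in place of `magma_int_t`. `pack_band` is a hypothetical host-side helper, not part of MAGMA.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: pack a dense column-major m-by-n matrix A (leading
 * dimension lda) into band storage AB with ldab >= 2*kl+ku+1 rows.
 * 1-based rule from the text: AB(kl+ku+1+i-j, j) = A(i, j)
 * 0-based form used here:     AB[(kl+ku+i-j) + j*ldab] = A[i + j*lda]
 * Every column is cleared first, so the unused (*) and fill-in (+) slots
 * end up as zero. */
void pack_band(int m, int n, int kl, int ku,
               const double *A, int lda, double *AB, int ldab)
{
    assert(ldab >= 2 * kl + ku + 1);
    for (int j = 0; j < n; ++j) {
        for (int r = 0; r < ldab; ++r)
            AB[r + (size_t)j * ldab] = 0.0;
        /* rows of A that fall inside the band in column j */
        int ilo = j - ku > 0 ? j - ku : 0;
        int ihi = j + kl < m - 1 ? j + kl : m - 1;
        for (int i = ilo; i <= ihi; ++i)
            AB[(kl + ku + i - j) + (size_t)j * ldab] = A[i + (size_t)j * lda];
    }
}
```

For the 6-by-6 example with KL = 2 and KU = 1, call it with `ldab = 6`. The diagonal then lands in 0-based row 3, and the two fill-in rows stay zero.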
| magma_int_t magma_cgbtrf_batched_sliding_window_loopout | ( | magma_int_t | m, |
| magma_int_t | n, | ||
| magma_int_t | kl, | ||
| magma_int_t | ku, | ||
| magmaFloatComplex ** | dAB_array, | ||
| magma_int_t | lddab, | ||
| magma_int_t ** | ipiv_array, | ||
| magma_int_t * | info_array, | ||
| void * | device_work, | ||
| magma_int_t * | lwork, | ||
| magma_int_t | batchCount, | ||
| magma_queue_t | queue ) |
CGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges.
This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.
This routine has shared memory requirements that may exceed the capacity of the GPU. In such a case, the routine exits immediately, returning a negative error code.
| [in] | M | INTEGER The number of rows of the matrix A. M >= 0. |
| [in] | N | INTEGER The number of columns of the matrix A. N >= 0. |
| [in] | KL | INTEGER The number of subdiagonals within the band of A. KL >= 0. |
| [in] | KU | INTEGER The number of superdiagonals within the band of A. KU >= 0. |
| [in,out] | dAB_array | Array of pointers, dimension (batchCount). Each is a COMPLEX array, dimension (LDDAB,N) On entry, the matrix AB in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl) |
On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See below for further details.
| [in] | LDDAB | INTEGER The leading dimension of each array AB. LDDAB >= 2*KL+KU+1. |
| [out] | dIPIV_array | Array of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). |
| [out] | dINFO_array | INTEGER array, dimension (batchCount). Each entry is the INFO output for the corresponding matrix: = 0: successful exit; < 0: if INFO = -i, the i-th argument had an illegal value; > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations. |
| [in,out] | device_work | Workspace, allocated on device memory by the user |
| [in,out] | lwork | INTEGER pointer. The size of the workspace (device_work) in bytes. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:
On entry:                          On exit:
  *    *    *    +    +    +          *    *    *   u14  u25  u36
  *    *    +    +    +    +          *    *   u13  u24  u35  u46
  *   a12  a23  a34  a45  a56         *   u12  u23  u34  u45  u56
 a11  a22  a33  a44  a55  a66        u11  u22  u33  u44  u55  u66
 a21  a32  a43  a54  a65   *         m21  m32  m43  m54  m65   *
 a31  a42  a53  a64   *    *         m31  m42  m53  m64   *    *
Array elements marked * are not used by the routine; elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.
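The fill-in rows exist because a row interchange can push nonzeros of U up to KL+KU positions above the diagonal. The hypothetical CPU reference below shows this on a single real double-precision matrix. It follows the style of LAPACK's unblocked xGBTF2, not MAGMA's sliding-window GPU kernels, uses 0-based indices, and records 1-based pivots as in the text. It may help when checking small GPU results, but it is a sketch, not the library's algorithm.

```c
#include <assert.h>
#include <stddef.h>

static double absd(double x) { return x < 0.0 ? -x : x; }

/* Hypothetical CPU reference (not a MAGMA function): unblocked band LU with
 * partial pivoting in the style of LAPACK's xGBTF2, 0-based, real doubles.
 * AB holds A in band storage: A(r,c) is AB[kv + r - c + c*ldab], kv = kl+ku.
 * On return U occupies rows 0..kv and the multipliers rows kv+1..kv+kl.
 * ipiv is 1-based. Returns 0, or j if U(j,j) (1-based) is exactly zero. */
int band_lu(int m, int n, int kl, int ku, double *AB, int ldab, int *ipiv)
{
    assert(ldab >= 2 * kl + ku + 1);
    int kv = kl + ku, info = 0, ju = 0;     /* ju: last column reached by U */
    for (int j = 0; j < n; ++j)             /* rows 0..kl-1 need not be set */
        for (int r = 0; r < kl; ++r)
            AB[r + (size_t)j * ldab] = 0.0;
    int mn = m < n ? m : n;
    for (int j = 0; j < mn; ++j) {
        int km = kl < m - 1 - j ? kl : m - 1 - j;   /* subdiagonals in column j */
        double *col = AB + kv + (size_t)j * ldab;   /* col[k] is A(j+k, j) */
        const int s = ldab - 1;                     /* step along a row of A */
        int jp = 0;
        for (int k = 1; k <= km; ++k)               /* pivot search */
            if (absd(col[k]) > absd(col[jp])) jp = k;
        ipiv[j] = j + jp + 1;
        if (col[jp] == 0.0) {                       /* singular: record, go on */
            if (info == 0) info = j + 1;
            continue;
        }
        int last = j + ku + jp < n - 1 ? j + ku + jp : n - 1;
        if (last > ju) ju = last;
        if (jp != 0)                                /* swap rows j, j+jp in cols j..ju */
            for (int c = 0; c <= ju - j; ++c) {
                double t = col[c * s];
                col[c * s] = col[jp + c * s];
                col[jp + c * s] = t;
            }
        for (int k = 1; k <= km; ++k)               /* multipliers */
            col[k] /= col[0];
        for (int c = 1; c <= ju - j; ++c)           /* rank-1 update, cols j+1..ju */
            for (int k = 1; k <= km; ++k)
                col[k + c * s] -= col[k] * col[c * s];
    }
    return info;
}
```

For example, factoring A = [1 2 0; 3 4 5; 0 6 7] with KL = KU = 1 swaps the first two rows at the first step. The entry 5 then lands in the fill-in row as u13, which is exactly the role of the elements marked +.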
| magma_int_t magma_cgbtrf_batched_sliding_window_loopin | ( | magma_int_t | m, |
| magma_int_t | n, | ||
| magma_int_t | kl, | ||
| magma_int_t | ku, | ||
| magmaFloatComplex ** | dAB_array, | ||
| magma_int_t | lddab, | ||
| magma_int_t ** | ipiv_array, | ||
| magma_int_t * | info_array, | ||
| magma_int_t | batchCount, | ||
| magma_queue_t | queue ) |
CGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges.
This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.
This routine has shared memory requirements that may exceed the capacity of the GPU. In such a case, the routine exits immediately, returning a negative error code.
| [in] | M | INTEGER The number of rows of the matrix A. M >= 0. |
| [in] | N | INTEGER The number of columns of the matrix A. N >= 0. |
| [in] | KL | INTEGER The number of subdiagonals within the band of A. KL >= 0. |
| [in] | KU | INTEGER The number of superdiagonals within the band of A. KU >= 0. |
| [in,out] | dAB_array | Array of pointers, dimension (batchCount). Each is a COMPLEX array, dimension (LDDAB,N) On entry, the matrix AB in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl) |
On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See below for further details.
| [in] | LDDAB | INTEGER The leading dimension of each array AB. LDDAB >= 2*KL+KU+1. |
| [out] | dIPIV_array | Array of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). |
| [out] | dINFO_array | INTEGER array, dimension (batchCount). Each entry is the INFO output for the corresponding matrix: = 0: successful exit; < 0: if INFO = -i, the i-th argument had an illegal value; > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:
On entry:                          On exit:
  *    *    *    +    +    +          *    *    *   u14  u25  u36
  *    *    +    +    +    +          *    *   u13  u24  u35  u46
  *   a12  a23  a34  a45  a56         *   u12  u23  u34  u45  u56
 a11  a22  a33  a44  a55  a66        u11  u22  u33  u44  u55  u66
 a21  a32  a43  a54  a65   *         m21  m32  m43  m54  m65   *
 a31  a42  a53  a64   *    *         m31  m42  m53  m64   *    *
Array elements marked * are not used by the routine; elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.
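Each matrix in the batch reports its own INFO value, so callers typically inspect the whole array after copying it back to the host. A minimal sketch of that check follows. `summarize_info` is hypothetical, uses `int` where MAGMA uses `magma_int_t`, and omits the device-to-host copy.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical host-side helper: classify the INFO value of every matrix
 * in a batch, after info_array has been copied back to the host.
 * Prints one line per problem matrix to log (if non-NULL) and returns
 * how many matrices did not report a successful exit. */
int summarize_info(const int *info, int batchCount, FILE *log)
{
    assert(batchCount >= 0 && (batchCount == 0 || info != NULL));
    int bad = 0;
    for (int k = 0; k < batchCount; ++k) {
        if (info[k] == 0)
            continue;                      /* successful exit */
        ++bad;
        if (log == NULL)
            continue;
        if (info[k] < 0)                   /* INFO = -i: argument i illegal */
            fprintf(log, "matrix %d: argument %d had an illegal value\n", k, -info[k]);
        else                               /* INFO = i: U(i,i) exactly zero */
            fprintf(log, "matrix %d: U(%d,%d) is exactly zero\n", k, info[k], info[k]);
    }
    return bad;
}
```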
| magma_int_t magma_cgetf2_nopiv_internal_batched | ( | magma_int_t | m, |
| magma_int_t | n, | ||
| magmaFloatComplex ** | dA_array, | ||
| magma_int_t | ai, | ||
| magma_int_t | aj, | ||
| magma_int_t | ldda, | ||
| magma_int_t * | info_array, | ||
| magma_int_t | gbstep, | ||
| magma_int_t | batchCount, | ||
| magma_queue_t | queue ) |
cgetf2_nopiv computes the non-pivoting LU factorization of an M-by-N matrix A.
This routine supports only matrices of limited width, so it is intended for internal use.
The factorization has the form A = L * U where L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is a batched version that factors batchCount M-by-N matrices in parallel.
| [in] | m | INTEGER The number of rows of the matrix A. M >= 0. |
| [in] | n | INTEGER The number of columns of the matrix A. N >= 0. |
| [in,out] | dA_array | Array of pointers, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDA,N). On entry, each pointer points to an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = L*U; the unit diagonal elements of L are not stored. |
| [in] | ai | INTEGER Row offset for dA_array. |
| [in] | aj | INTEGER Column offset for dA_array. |
| [in] | ldda | INTEGER The leading dimension of each array A. LDDA >= max(1,M). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | gbstep | INTEGER Internal use. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
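The following hypothetical C sketch shows the arithmetic of a non-pivoting LU for one matrix. It also shows how the `ai`/`aj` offsets select a submatrix of a column-major array with leading dimension `ldda`. It uses real doubles and runs on the host. The batched GPU routine applies the same factorization to every matrix in the batch. How the real routine reacts to a zero pivot is not stated above; this sketch simply stops, which is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU sketch of a non-pivoting LU on the m-by-n submatrix
 * that starts at row ai, column aj of one column-major matrix A with
 * leading dimension lda (the role played by ai, aj and ldda above).
 * Overwrites the submatrix with L (unit diagonal not stored) and U.
 * Without pivoting there is no way around a zero pivot; this sketch
 * stops and returns j (1-based) at the first exactly zero U(j,j). */
int lu_nopiv(int m, int n, double *A, int ai, int aj, int lda)
{
    assert(m >= 0 && n >= 0 && lda >= ai + m);
    double *S = A + ai + (size_t)aj * lda;  /* S(i,j) is A(ai+i, aj+j) */
    int mn = m < n ? m : n;
    for (int j = 0; j < mn; ++j) {
        double piv = S[j + (size_t)j * lda];
        if (piv == 0.0)
            return j + 1;
        for (int i = j + 1; i < m; ++i)      /* column j of L */
            S[i + (size_t)j * lda] /= piv;
        for (int c = j + 1; c < n; ++c)      /* right-looking trailing update */
            for (int i = j + 1; i < m; ++i)
                S[i + (size_t)c * lda] -= S[i + (size_t)j * lda] * S[j + (size_t)c * lda];
    }
    return 0;
}
```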
| magma_int_t magma_cgetrf_batched_smallsq_noshfl | ( | magma_int_t | n, |
| magmaFloatComplex ** | dA_array, | ||
| magma_int_t | ldda, | ||
| magma_int_t ** | ipiv_array, | ||
| magma_int_t * | info_array, | ||
| magma_int_t | batchCount, | ||
| magma_queue_t | queue ) |
cgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges.
This routine handles only square matrices of size up to 32.
The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements, and U is upper triangular.
This is the right-looking Level 3 BLAS version of the algorithm.
This is a batched version that factors batchCount N-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.
| [in] | n | INTEGER The size of each matrix A. N >= 0. |
| [in,out] | dA_array | Array of pointers, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDA,N). On entry, each pointer points to an N-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. |
| [in] | ldda | INTEGER The leading dimension of each array A. LDDA >= max(1,N). |
| [out] | ipiv_array | Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
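For a single small matrix, the factorization A = P * L * U with partial pivoting can be sketched on the host as below. `lu_small` is a hypothetical reference in real double precision; the routine above works on COMPLEX data. It swaps whole rows, stores the multipliers below the diagonal, and records 1-based pivots in `ipiv`, matching the output conventions described above.

```c
#include <assert.h>
#include <stddef.h>

static double absd(double x) { return x < 0.0 ? -x : x; }

/* Hypothetical CPU sketch: right-looking LU with partial pivoting of one
 * n-by-n column-major matrix (n <= 32, as for the routine above).
 * On return A holds L (unit diagonal not stored) and U; ipiv is 1-based.
 * Returns 0, or the first 1-based j with U(j,j) exactly zero. */
int lu_small(int n, double *A, int lda, int *ipiv)
{
    assert(n >= 0 && n <= 32 && lda >= (n > 1 ? n : 1));
    int info = 0;
    for (int j = 0; j < n; ++j) {
        int p = j;                              /* pivot search in column j */
        for (int i = j + 1; i < n; ++i)
            if (absd(A[i + (size_t)j * lda]) > absd(A[p + (size_t)j * lda])) p = i;
        ipiv[j] = p + 1;
        if (A[p + (size_t)j * lda] == 0.0) {    /* exactly singular column */
            if (info == 0) info = j + 1;
            continue;
        }
        if (p != j)                             /* swap full rows j and p */
            for (int c = 0; c < n; ++c) {
                double t = A[j + (size_t)c * lda];
                A[j + (size_t)c * lda] = A[p + (size_t)c * lda];
                A[p + (size_t)c * lda] = t;
            }
        for (int i = j + 1; i < n; ++i)         /* multipliers (column of L) */
            A[i + (size_t)j * lda] /= A[j + (size_t)j * lda];
        for (int c = j + 1; c < n; ++c)         /* rank-1 trailing update */
            for (int i = j + 1; i < n; ++i)
                A[i + (size_t)c * lda] -= A[i + (size_t)j * lda] * A[j + (size_t)c * lda];
    }
    return info;
}
```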
| magma_int_t magma_dgbsv_batched_fused_sm | ( | magma_int_t | n, |
| magma_int_t | kl, | ||
| magma_int_t | ku, | ||
| magma_int_t | nrhs, | ||
| double ** | dA_array, | ||
| magma_int_t | ldda, | ||
| magma_int_t ** | ipiv_array, | ||
| double ** | dB_array, | ||
| magma_int_t | lddb, | ||
| magma_int_t * | info_array, | ||
| magma_int_t | nthreads, | ||
| magma_int_t | ntcol, | ||
| magma_int_t | batchCount, | ||
| magma_queue_t | queue ) |
DGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.
The LU decomposition with partial pivoting and row interchanges is used to factor A as A = L * U, where L is a product of permutation and unit lower triangular matrices with KL subdiagonals, and U is upper triangular with KL+KU superdiagonals. The factored form of A is then used to solve the system of equations A * X = B.
This is the batched version of the routine.
| [in] | n | INTEGER The order of the matrix A. n >= 0. |
| [in] | kl | INTEGER The number of subdiagonals within the band of A. KL >= 0. |
| [in] | ku | INTEGER The number of superdiagonals within the band of A. KU >= 0. |
| [in] | nrhs | INTEGER The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0. |
| [in,out] | dA_array | Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array, dimension (LDDA,N). On entry, the band matrix A in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. On exit, the details of the LU factorization of A, in the same layout as computed by DGBTRF: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. |
| [in] | ldda | INTEGER The leading dimension of each array A. LDDA >= (2*KL+KU+1). |
| [out] | dipiv_array | Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i). |
| [in,out] | dB_array | Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array, dimension (LDDB,NRHS). On entry, the right hand side matrix B. On exit, the solution matrix X. |
| [in] | lddb | INTEGER The leading dimension of each array B. LDDB >= max(1, N). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | nthreads | INTEGER The number of threads assigned to a single matrix. nthreads >= KL+1. |
| [in] | ntcol | INTEGER The number of concurrent factorizations in a thread block. ntcol >= 1. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
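The second phase described above, using the factored form of A to solve A * X = B, is sketched below for one right-hand side. The hypothetical `band_lu_solve` follows the structure of LAPACK's xGBTRS for the non-transposed case. For each column j, it applies the interchange and the multipliers, then back-substitutes with the banded U. It assumes 0-based indexing, `int` sizes, and the factored band layout described for DGBTRF.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: solve A*x = b for one right-hand side, given band LU
 * factors in AB (0-based: U in rows 0..kl+ku, multipliers in rows
 * kl+ku+1..2*kl+ku) and 1-based pivots ipiv. Overwrites b with x. */
void band_lu_solve(int n, int kl, int ku, const double *AB, int ldab,
                   const int *ipiv, double *b)
{
    int kv = kl + ku;
    assert(ldab >= 2 * kl + ku + 1);
    /* forward: apply the interchange and multipliers of each column */
    for (int j = 0; j < n - 1; ++j) {
        int lm = kl < n - 1 - j ? kl : n - 1 - j;
        int l = ipiv[j] - 1;
        if (l != j) { double t = b[l]; b[l] = b[j]; b[j] = t; }
        for (int k = 1; k <= lm; ++k)
            b[j + k] -= AB[kv + k + (size_t)j * ldab] * b[j];
    }
    /* backward: U has kv superdiagonals, its diagonal is row kv */
    for (int j = n - 1; j >= 0; --j) {
        b[j] /= AB[kv + (size_t)j * ldab];
        int i0 = j - kv > 0 ? j - kv : 0;
        for (int i = i0; i < j; ++i)
            b[i] -= AB[kv + i - j + (size_t)j * ldab] * b[j];
    }
}
```

Take A = [1 2 0; 3 4 5; 0 6 7] with KL = KU = 1. Its factors are U = [3 4 5; 0 6 7; 0 0 -22/9], multipliers 1/3 and 1/9, and IPIV = (2, 3, 3). Solving with b = (3, 12, 13) then gives x = (1, 1, 1).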
| magma_int_t magma_dgbtrf_batched_fused_sm | ( | magma_int_t | m, |
| magma_int_t | n, | ||
| magma_int_t | kl, | ||
| magma_int_t | ku, | ||
| double ** | dAB_array, | ||
| magma_int_t | lddab, | ||
| magma_int_t ** | ipiv_array, | ||
| magma_int_t * | info_array, | ||
| magma_int_t | nthreads, | ||
| magma_int_t | ntcol, | ||
| magma_int_t | batchCount, | ||
| magma_queue_t | queue ) |
DGBTRF computes an LU factorization of a real m-by-n band matrix A using partial pivoting with row interchanges.
This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.
This routine has shared memory requirements that may exceed the capacity of the GPU. In such a case, the routine exits immediately, returning a negative error code.
| [in] | M | INTEGER The number of rows of the matrix A. M >= 0. |
| [in] | N | INTEGER The number of columns of the matrix A. N >= 0. |
| [in] | KL | INTEGER The number of subdiagonals within the band of A. KL >= 0. |
| [in] | KU | INTEGER The number of superdiagonals within the band of A. KU >= 0. |
| [in,out] | dAB_array | Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array, dimension (LDDAB,N) On entry, the matrix AB in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl) |
On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See below for further details.
| [in] | LDDAB | INTEGER The leading dimension of each array AB. LDDAB >= 2*KL+KU+1. |
| [out] | dIPIV_array | Array of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). |
| [out] | dINFO_array | INTEGER array, dimension (batchCount). Each entry is the INFO output for the corresponding matrix: = 0: successful exit; < 0: if INFO = -i, the i-th argument had an illegal value; > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations. |
| [in] | nthreads | INTEGER The number of threads assigned to a single matrix. nthreads >= KL+1. |
| [in] | ntcol | INTEGER The number of concurrent factorizations in a thread block. ntcol >= 1. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:
On entry:                          On exit:
  *    *    *    +    +    +          *    *    *   u14  u25  u36
  *    *    +    +    +    +          *    *   u13  u24  u35  u46
  *   a12  a23  a34  a45  a56         *   u12  u23  u34  u45  u56
 a11  a22  a33  a44  a55  a66        u11  u22  u33  u44  u55  u66
 a21  a32  a43  a54  a65   *         m21  m32  m43  m54  m65   *
 a31  a42  a53  a64   *    *         m31  m42  m53  m64   *    *
Array elements marked * are not used by the routine, but may be set to zero after completion. Elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.
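The two tuning parameters describe a simple decomposition of work: each matrix gets `nthreads` threads, and each thread block holds `ntcol` matrices. Under that reading, the launch geometry follows from plain arithmetic, as in the hypothetical helper below. This is an assumption about the kernel's launch configuration, not documented behavior, and the negative return codes are made up for the sketch.

```c
#include <assert.h>

/* Hypothetical launch geometry implied by the parameter descriptions:
 * nthreads threads per matrix, ntcol matrices per thread block. */
typedef struct {
    int threads_per_block;
    int num_blocks;
} launch_cfg;

/* Returns 0 on success, or a negative code (hypothetical numbering) if a
 * documented constraint is violated. */
int fused_sm_config(int kl, int nthreads, int ntcol, int batchCount,
                    launch_cfg *cfg)
{
    assert(cfg != 0 && batchCount >= 0);
    if (nthreads < kl + 1) return -1;   /* documented: nthreads >= KL+1 */
    if (ntcol < 1) return -2;           /* documented: ntcol >= 1 */
    cfg->threads_per_block = nthreads * ntcol;
    cfg->num_blocks = (batchCount + ntcol - 1) / ntcol;  /* ceil(batchCount/ntcol) */
    return 0;
}
```

With KL = 2, nthreads = 3, ntcol = 4 and batchCount = 10, this gives 12 threads per block and 3 blocks.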
| magma_int_t magma_dgbtrf_batched_sliding_window_loopout | ( | magma_int_t | m, |
| magma_int_t | n, | ||
| magma_int_t | kl, | ||
| magma_int_t | ku, | ||
| double ** | dAB_array, | ||
| magma_int_t | lddab, | ||
| magma_int_t ** | ipiv_array, | ||
| magma_int_t * | info_array, | ||
| void * | device_work, | ||
| magma_int_t * | lwork, | ||
| magma_int_t | batchCount, | ||
| magma_queue_t | queue ) |
DGBTRF computes an LU factorization of a real m-by-n band matrix A using partial pivoting with row interchanges.
This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.
This routine has shared memory requirements that may exceed the capacity of the GPU. In such a case, the routine exits immediately, returning a negative error code.
| [in] | M | INTEGER The number of rows of the matrix A. M >= 0. |
| [in] | N | INTEGER The number of columns of the matrix A. N >= 0. |
| [in] | KL | INTEGER The number of subdiagonals within the band of A. KL >= 0. |
| [in] | KU | INTEGER The number of superdiagonals within the band of A. KU >= 0. |
| [in,out] | dAB_array | Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array, dimension (LDDAB,N) On entry, the matrix AB in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl) |
On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See below for further details.
| [in] | LDDAB | INTEGER The leading dimension of each array AB. LDDAB >= 2*KL+KU+1. |
| [out] | dIPIV_array | Array of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). |
| [out] | dINFO_array | INTEGER array, dimension (batchCount). Each entry is the INFO output for the corresponding matrix: = 0: successful exit; < 0: if INFO = -i, the i-th argument had an illegal value; > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations. |
| [in,out] | device_work | Workspace, allocated on device memory by the user |
| [in,out] | lwork | INTEGER pointer. The size of the workspace (device_work) in bytes. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:
On entry:                          On exit:
  *    *    *    +    +    +          *    *    *   u14  u25  u36
  *    *    +    +    +    +          *    *   u13  u24  u35  u46
  *   a12  a23  a34  a45  a56         *   u12  u23  u34  u45  u56
 a11  a22  a33  a44  a55  a66        u11  u22  u33  u44  u55  u66
 a21  a32  a43  a54  a65   *         m21  m32  m43  m54  m65   *
 a31  a42  a53  a64   *    *         m31  m42  m53  m64   *    *
Array elements marked * are not used by the routine; elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.
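Batched routines take `dAB_array` as an array of per-matrix pointers. One way to build it is to allocate a single contiguous buffer and point at consecutive LDDAB-by-N slices. The sketch below shows only the offset arithmetic on the host. `make_band_batch` is hypothetical, and in practice both allocations would live in device memory rather than come from `malloc`/`calloc`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: carve batchCount band matrices (each lddab-by-n,
 * with lddab = 2*kl+ku+1) out of one contiguous buffer and build the
 * matching pointer array. Real code would place both the buffer and the
 * pointer array in device memory; calloc/malloc stand in here purely to
 * show the offset arithmetic. The caller frees both allocations. */
double **make_band_batch(int n, int kl, int ku, int batchCount, double **buf_out)
{
    assert(n > 0 && kl >= 0 && ku >= 0 && batchCount > 0 && buf_out != NULL);
    size_t lddab = (size_t)(2 * kl + ku + 1);
    size_t stride = lddab * (size_t)n;                /* elements per matrix */
    double *buf = calloc(stride * (size_t)batchCount, sizeof *buf);
    double **ptrs = malloc((size_t)batchCount * sizeof *ptrs);
    if (buf == NULL || ptrs == NULL) {
        free(buf);
        free(ptrs);
        return NULL;
    }
    for (int k = 0; k < batchCount; ++k)
        ptrs[k] = buf + (size_t)k * stride;          /* matrix k starts here */
    *buf_out = buf;
    return ptrs;
}
```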
| magma_int_t magma_dgbtrf_batched_sliding_window_loopin | ( | magma_int_t | m, |
| magma_int_t | n, | ||
| magma_int_t | kl, | ||
| magma_int_t | ku, | ||
| double ** | dAB_array, | ||
| magma_int_t | lddab, | ||
| magma_int_t ** | ipiv_array, | ||
| magma_int_t * | info_array, | ||
| magma_int_t | batchCount, | ||
| magma_queue_t | queue ) |
DGBTRF computes an LU factorization of a real m-by-n band matrix A using partial pivoting with row interchanges.
This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.
This routine has shared memory requirements that may exceed the capacity of the GPU. In such a case, the routine exits immediately, returning a negative error code.
| [in] | M | INTEGER The number of rows of the matrix A. M >= 0. |
| [in] | N | INTEGER The number of columns of the matrix A. N >= 0. |
| [in] | KL | INTEGER The number of subdiagonals within the band of A. KL >= 0. |
| [in] | KU | INTEGER The number of superdiagonals within the band of A. KU >= 0. |
| [in,out] | dAB_array | Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array, dimension (LDDAB,N) On entry, the matrix AB in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl) |
On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See below for further details.
| [in] | LDDAB | INTEGER The leading dimension of each array AB. LDDAB >= 2*KL+KU+1. |
| [out] | dIPIV_array | Array of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). |
| [out] | dINFO_array | INTEGER array, dimension (batchCount). Each entry is the INFO output for the corresponding matrix: = 0: successful exit; < 0: if INFO = -i, the i-th argument had an illegal value; > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:
On entry:                          On exit:
  *    *    *    +    +    +          *    *    *   u14  u25  u36
  *    *    +    +    +    +          *    *   u13  u24  u35  u46
  *   a12  a23  a34  a45  a56         *   u12  u23  u34  u45  u56
 a11  a22  a33  a44  a55  a66        u11  u22  u33  u44  u55  u66
 a21  a32  a43  a54  a65   *         m21  m32  m43  m54  m65   *
 a31  a42  a53  a64   *    *         m31  m42  m53  m64   *    *
Array elements marked * are not used by the routine; elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.
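To read the factors back from the on-exit layout, use the same row formula as on entry; only the meaning of the rows changes. The hypothetical accessors below (0-based, real double precision) return U(i,j) and the multipliers. In LAPACK's band LU, later row interchanges are not applied to earlier multiplier columns. The multipliers are therefore not simply the entries of one unit lower triangular L, which is why L is described as a product of permutation and unit lower triangular matrices.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical accessors for the on-exit band layout (0-based, kv = kl+ku):
 * U(i,j), j-kv <= i <= j, is in row kv+i-j (rows 0..kv);
 * the multiplier for row i of column j, j < i <= min(m-1, j+kl), is in
 * row kv+i-j (rows kv+1..kv+kl). Both return 0.0 outside those ranges. */
double band_U(int kl, int ku, const double *AB, int ldab, int i, int j)
{
    int kv = kl + ku;
    assert(ldab >= 2 * kl + ku + 1);
    if (i > j || j - i > kv)
        return 0.0;
    return AB[kv + i - j + (size_t)j * ldab];
}

double band_mult(int m, int kl, int ku, const double *AB, int ldab, int i, int j)
{
    assert(ldab >= 2 * kl + ku + 1);
    if (i <= j || i - j > kl || i >= m)
        return 0.0;
    return AB[kl + ku + i - j + (size_t)j * ldab];
}
```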
| magma_int_t magma_dgetf2_nopiv_internal_batched | ( | magma_int_t | m, |
| magma_int_t | n, | ||
| double ** | dA_array, | ||
| magma_int_t | ai, | ||
| magma_int_t | aj, | ||
| magma_int_t | ldda, | ||
| magma_int_t * | info_array, | ||
| magma_int_t | gbstep, | ||
| magma_int_t | batchCount, | ||
| magma_queue_t | queue ) |
dgetf2_nopiv computes the non-pivoting LU factorization of an M-by-N matrix A.
This routine supports only matrices of limited width, so it is intended for internal use.
The factorization has the form A = L * U where L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is a batched version that factors batchCount M-by-N matrices in parallel.
| [in] | m | INTEGER The number of rows of the matrix A. M >= 0. |
| [in] | n | INTEGER The number of columns of the matrix A. N >= 0. |
| [in,out] | dA_array | Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDA,N). On entry, each pointer points to an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = L*U; the unit diagonal elements of L are not stored. |
| [in] | ai | INTEGER Row offset for dA_array. |
| [in] | aj | INTEGER Column offset for dA_array. |
| [in] | ldda | INTEGER The leading dimension of each array A. LDDA >= max(1,M). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | gbstep | INTEGER Internal use. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
| magma_int_t magma_dgetrf_batched_smallsq_noshfl | ( | magma_int_t | n, |
| double ** | dA_array, | ||
| magma_int_t | ldda, | ||
| magma_int_t ** | ipiv_array, | ||
| magma_int_t * | info_array, | ||
| magma_int_t | batchCount, | ||
| magma_queue_t | queue ) |
dgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges.
This routine handles only square matrices of size up to 32.
The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements, and U is upper triangular.
This is the right-looking Level 3 BLAS version of the algorithm.
This is a batched version that factors batchCount N-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.
| [in] | n | INTEGER The size of each matrix A. N >= 0. |
| [in,out] | dA_array | Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDA,N). On entry, each pointer points to an N-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. |
| [in] | ldda | INTEGER The leading dimension of each array A. LDDA >= max(1,N). |
| [out] | ipiv_array | Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
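Once a matrix is factored, the pivots and packed factors are enough to solve A * x = b. The following hypothetical host sketch uses real double precision, 0-based indices, and one right-hand side. It applies the interchanges in order, then does a unit-lower forward solve and an upper back solve. It is in the spirit of LAPACK's xGETRS, not MAGMA's batched solver.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: solve A*x = b using getrf-style output, i.e. the
 * packed L\U factors in LU (unit diagonal of L not stored) and 1-based
 * ipiv. Overwrites b with x. */
void lu_solve(int n, const double *LU, int lda, const int *ipiv, double *b)
{
    assert(lda >= (n > 1 ? n : 1));
    for (int i = 0; i < n; ++i) {          /* apply interchanges in order */
        int p = ipiv[i] - 1;
        if (p != i) { double t = b[i]; b[i] = b[p]; b[p] = t; }
    }
    for (int j = 0; j < n; ++j)            /* forward: L has unit diagonal */
        for (int i = j + 1; i < n; ++i)
            b[i] -= LU[i + (size_t)j * lda] * b[j];
    for (int j = n - 1; j >= 0; --j) {     /* backward with U */
        b[j] /= LU[j + (size_t)j * lda];
        for (int i = 0; i < j; ++i)
            b[i] -= LU[i + (size_t)j * lda] * b[j];
    }
}
```

For A = [1 2; 3 4], the factors are U = [3 4; 0 2/3], multiplier 1/3, and IPIV = (2, 2). With b = (5, 11), the result is x = (1, 2).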
| magma_int_t magma_sgbsv_batched_fused_sm | ( | magma_int_t | n, |
| magma_int_t | kl, | ||
| magma_int_t | ku, | ||
| magma_int_t | nrhs, | ||
| float ** | dA_array, | ||
| magma_int_t | ldda, | ||
| magma_int_t ** | ipiv_array, | ||
| float ** | dB_array, | ||
| magma_int_t | lddb, | ||
| magma_int_t * | info_array, | ||
| magma_int_t | nthreads, | ||
| magma_int_t | ntcol, | ||
| magma_int_t | batchCount, | ||
| magma_queue_t | queue ) |
SGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.
The LU decomposition with partial pivoting and row interchanges is used to factor A as A = L * U, where L is a product of permutation and unit lower triangular matrices with KL subdiagonals, and U is upper triangular with KL+KU superdiagonals. The factored form of A is then used to solve the system of equations A * X = B.
This is the batched version of the routine.
| [in] | n | INTEGER The order of the matrix A. n >= 0. |
| [in] | kl | INTEGER The number of subdiagonals within the band of A. KL >= 0. |
| [in] | ku | INTEGER The number of superdiagonals within the band of A. KU >= 0. |
| [in] | nrhs | INTEGER The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0. |
| [in,out] | dA_array | Array of pointers, dimension (batchCount). Each is a REAL array, dimension (LDDA,N). On entry, the band matrix A in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. On exit, the details of the LU factorization of A, in the same layout as computed by SGBTRF: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. |
| [in] | ldda | INTEGER The leading dimension of each array A. LDDA >= (2*KL+KU+1). |
| [out] | dipiv_array | Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i). |
| [in,out] | dB_array | Array of pointers, dimension (batchCount). Each is a REAL array, dimension (LDDB,NRHS). On entry, the right hand side matrix B. On exit, the solution matrix X. |
| [in] | lddb | INTEGER The leading dimension of each array B. LDDB >= max(1, N). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | nthreads | INTEGER The number of threads assigned to a single matrix. nthreads >= KL+1. |
| [in] | ntcol | INTEGER The number of concurrent factorizations in a thread block. ntcol >= 1. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
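One simple way to validate a batched solve is to recompute A*x from a copy of the original band matrix, saved before the call, and compare the result with B. The hypothetical helper below computes y = A*x directly from the on-entry band layout in single precision, with 0-based indices.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: y = A*x for an n-by-n band matrix stored in the
 * on-entry band layout (0-based row kl+ku+i-j of column j holds A(i,j)).
 * Comparing y with the original B gives the residual of a solution x. */
void band_matvec(int n, int kl, int ku, const float *AB, int ldab,
                 const float *x, float *y)
{
    assert(ldab >= 2 * kl + ku + 1);
    for (int i = 0; i < n; ++i)
        y[i] = 0.0f;
    for (int j = 0; j < n; ++j) {
        int i0 = j - ku > 0 ? j - ku : 0;          /* first row in the band */
        int i1 = j + kl < n - 1 ? j + kl : n - 1;  /* last row in the band */
        for (int i = i0; i <= i1; ++i)
            y[i] += AB[kl + ku + i - j + (size_t)j * ldab] * x[j];
    }
}
```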
| magma_int_t magma_sgbtrf_batched_fused_sm | ( | magma_int_t | m, |
| magma_int_t | n, | ||
| magma_int_t | kl, | ||
| magma_int_t | ku, | ||
| float ** | dAB_array, | ||
| magma_int_t | lddab, | ||
| magma_int_t ** | ipiv_array, | ||
| magma_int_t * | info_array, | ||
| magma_int_t | nthreads, | ||
| magma_int_t | ntcol, | ||
| magma_int_t | batchCount, | ||
| magma_queue_t | queue ) |
SGBTRF computes an LU factorization of a real m-by-n band matrix A using partial pivoting with row interchanges.
This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.
This routine has shared memory requirements that may exceed the capacity of the GPU. In such a case, the routine exits immediately, returning a negative error code.
| [in] | M | INTEGER The number of rows of the matrix A. M >= 0. |
| [in] | N | INTEGER The number of columns of the matrix A. N >= 0. |
| [in] | KL | INTEGER The number of subdiagonals within the band of A. KL >= 0. |
| [in] | KU | INTEGER The number of superdiagonals within the band of A. KU >= 0. |
| [in,out] | dAB_array | Array of pointers, dimension (batchCount). Each is a REAL array, dimension (LDDAB,N) On entry, the matrix AB in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl) |
On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See below for further details.
| [in] | LDDAB | INTEGER The leading dimension of each array AB. LDDAB >= 2*KL+KU+1. |
| [out] | dIPIV_array | Array of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). |
| [out] | dINFO_array | INTEGER array, dimension (batchCount). Each entry is the INFO output for the corresponding matrix: = 0: successful exit; < 0: if INFO = -i, the i-th argument had an illegal value; > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations. |
| [in] | nthreads | INTEGER The number of threads assigned to a single matrix. nthreads >= KL+1. |
| [in] | ntcol | INTEGER The number of concurrent factorizations in a thread block. ntcol >= 1. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:
On entry:                          On exit:
  *    *    *    +    +    +          *    *    *   u14  u25  u36
  *    *    +    +    +    +          *    *   u13  u24  u35  u46
  *   a12  a23  a34  a45  a56         *   u12  u23  u34  u45  u56
 a11  a22  a33  a44  a55  a66        u11  u22  u33  u44  u55  u66
 a21  a32  a43  a54  a65   *         m21  m32  m43  m54  m65   *
 a31  a42  a53  a64   *    *         m31  m42  m53  m64   *    *
Array elements marked * are not used by the routine, but may be set to zero after completion. Elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.
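A negative INFO value follows the LAPACK convention of reporting the position of the offending argument. The sketch below assumes, without confirmation from the text, that arguments are numbered by their position in the signature. Under that assumption, an undersized LDDAB would be reported as -6. `check_gbtrf_args` is hypothetical and uses `int` in place of `magma_int_t`.

```c
#include <assert.h>

/* Hypothetical LAPACK-style argument check for the batched band LU
 * routines: returns -i if the i-th argument (signature order m, n, kl,
 * ku, dAB_array, lddab, ...) is illegal, else 0. */
int check_gbtrf_args(int m, int n, int kl, int ku, int lddab)
{
    if (m < 0)  return -1;
    if (n < 0)  return -2;
    if (kl < 0) return -3;
    if (ku < 0) return -4;
    /* argument 5 is the pointer array itself and is not checked here */
    if (lddab < 2 * kl + ku + 1) return -6;
    return 0;
}
```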
| magma_int_t magma_sgbtrf_batched_sliding_window_loopout | ( | magma_int_t | m, |
| magma_int_t | n, | ||
| magma_int_t | kl, | ||
| magma_int_t | ku, | ||
| float ** | dAB_array, | ||
| magma_int_t | lddab, | ||
| magma_int_t ** | ipiv_array, | ||
| magma_int_t * | info_array, | ||
| void * | device_work, | ||
| magma_int_t * | lwork, | ||
| magma_int_t | batchCount, | ||
| magma_queue_t | queue ) |
SGBTRF computes an LU factorization of a real m-by-n band matrix A using partial pivoting with row interchanges.
This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.
This routine has shared memory requirements that may exceed the capacity of the GPU. In such a case, the routine exits immediately, returning a negative error code.
| [in] | M | INTEGER The number of rows of the matrix A. M >= 0. |
| [in] | N | INTEGER The number of columns of the matrix A. N >= 0. |
| [in] | KL | INTEGER The number of subdiagonals within the band of A. KL >= 0. |
| [in] | KU | INTEGER The number of superdiagonals within the band of A. KU >= 0. |
| [in,out] | dAB_array | Array of pointers, dimension (batchCount). Each is a REAL array, dimension (LDDAB,N) On entry, the matrix AB in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl) |
On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See below for further details.
| [in] | LDDAB | INTEGER The leading dimension of each array AB. LDDAB >= 2*KL+KU+1. |
| [out] | ipiv_array | Array of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). |
| [out] | info_array | INTEGER array, dimension (batchCount). Each entry is the INFO output for the corresponding matrix: = 0: successful exit; < 0: if INFO = -i, the i-th argument had an illegal value; > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations. |
| [in,out] | device_work | Workspace, allocated on device memory by the user. |
| [in,out] | lwork | INTEGER pointer. The size of the workspace (device_work) in bytes. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:
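| Row of AB | On entry | On exit |
| 1 | * * * + + + | * * * u14 u25 u36 |
| 2 | * * + + + + | * * u13 u24 u35 u46 |
| 3 | * a12 a23 a34 a45 a56 | * u12 u23 u34 u45 u56 |
| 4 | a11 a22 a33 a44 a55 a66 | u11 u22 u33 u44 u55 u66 |
| 5 | a21 a32 a43 a54 a65 * | m21 m32 m43 m54 m65 * |
| 6 | a31 a42 a53 a64 * * | m31 m42 m53 m64 * * |
Array elements marked * are not used by the routine; elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.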
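The following sketch shows where the fill-in rows come from. It is an unblocked reference factorization of a single band matrix, transcribed from the column-by-column scheme of LAPACK's DGBTF2 rather than from MAGMA's GPU kernels, which this documentation does not show. Row interchanges can push entries of U up to KL+KU positions above the diagonal, which is why rows 1 to KL (the + entries) must be reserved. `gbtf2_ref` and the 1-based `AB` macro are illustrative assumptions.

```c
#include <assert.h>
#include <math.h>

/* 1-based access to column-major band storage, as in the text above. */
#define AB(i, j) ab[((i) - 1) + ((j) - 1) * (ldab)]

static int imin(int a, int b) { return a < b ? a : b; }
static int imax(int a, int b) { return a > b ? a : b; }

/* Unblocked band LU with partial pivoting (DGBTF2-style) on one matrix.
   On entry A(i,j) is stored at AB(kl+ku+1+i-j, j); rows 1..kl are scratch.
   On exit U occupies rows 1..kl+ku+1 and the multipliers rows kl+ku+2..2*kl+ku+1.
   ipiv is 1-based. Returns 0, or the first j with U(j,j) exactly zero. */
int gbtf2_ref(int m, int n, int kl, int ku, double *ab, int ldab, int *ipiv)
{
    assert(ldab >= 2 * kl + ku + 1);
    int kv = ku + kl, info = 0, ju = 1;

    /* Clear the fill-in slots of columns ku+2..min(kv,n) that lie inside A. */
    for (int j = ku + 2; j <= imin(kv, n); j++)
        for (int i = kv - j + 2; i <= kl; i++)
            AB(i, j) = 0.0;

    for (int j = 1; j <= imin(m, n); j++) {
        /* Clear the fill-in slots of column j+kv before any swap can reach them. */
        if (j + kv <= n)
            for (int i = 1; i <= kl; i++)
                AB(i, j + kv) = 0.0;

        /* Pivot search over the diagonal entry and the km entries below it. */
        int km = imin(kl, m - j), jp = 1;
        for (int i = 2; i <= km + 1; i++)
            if (fabs(AB(kv + i, j)) > fabs(AB(kv + jp, j)))
                jp = i;
        ipiv[j - 1] = jp + j - 1;

        if (AB(kv + jp, j) == 0.0) {
            if (info == 0)
                info = j;        /* singular column: record it and move on */
            continue;
        }
        ju = imax(ju, imin(j + ku + jp - 1, n));   /* last column touched so far */

        /* Swap matrix rows j and jp+j-1 over columns j..ju; a matrix row
           runs diagonally through AB, one row up per column to the right. */
        if (jp != 1)
            for (int k = 0; k <= ju - j; k++) {
                double t = AB(kv + jp - k, j + k);
                AB(kv + jp - k, j + k) = AB(kv + 1 - k, j + k);
                AB(kv + 1 - k, j + k) = t;
            }

        /* Store multipliers below the pivot, then update the trailing band. */
        for (int i = 2; i <= km + 1; i++)
            AB(kv + i, j) /= AB(kv + 1, j);
        for (int c = 0; c < ju - j; c++)
            for (int r = 0; r < km; r++)
                AB(kv + 1 + r - c, j + 1 + c) -= AB(kv + 2 + r, j) * AB(kv - c, j + 1 + c);
    }
    return info;
}
#undef AB
```

For example, with M = N = 3, KL = KU = 1 and A = [1 2 0; 4 5 6; 0 7 8], the first interchange moves the 6 from A(2,3) into row 1 of column 3 of AB, the slot reserved for fill-in, where it becomes u13.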
magma_int_t magma_sgbtrf_batched_sliding_window_loopin (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, float **dAB_array, magma_int_t lddab, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
SGBTRF computes an LU factorization of a real m-by-n band matrix A using partial pivoting with row interchanges.
This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.
This routine has shared memory requirements that may exceed the capacity of the GPU. In such a case, the routine exits immediately, returning a negative error code.
| [in] | M | INTEGER The number of rows of the matrix A. M >= 0. |
| [in] | N | INTEGER The number of columns of the matrix A. N >= 0. |
| [in] | KL | INTEGER The number of subdiagonals within the band of A. KL >= 0. |
| [in] | KU | INTEGER The number of superdiagonals within the band of A. KU >= 0. |
| [in,out] | dAB_array | Array of pointers, dimension (batchCount). Each is a REAL array, dimension (LDDAB,N) On entry, the matrix AB in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl) |
On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See below for further details.
| [in] | LDDAB | INTEGER The leading dimension of each array AB. LDDAB >= 2*KL+KU+1. |
| [out] | ipiv_array | Array of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). |
| [out] | info_array | INTEGER array, dimension (batchCount). Each entry is the INFO output for the corresponding matrix: = 0: successful exit; < 0: if INFO = -i, the i-th argument had an illegal value; > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
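Each dAB_array argument above is an array of batchCount pointers, one per matrix. A common arrangement places all matrices in one contiguous allocation with a fixed stride. The host-side sketch below only illustrates that layout. In MAGMA the pointer array and the matrices it points to are expected to live in GPU memory, and `make_pointer_array` is a hypothetical helper, not a MAGMA function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Build an array of per-matrix pointers into one contiguous buffer in which
   matrix k occupies ldab*n consecutive elements starting at k*ldab*n.
   The caller frees the returned array (not the buffer) with free(). */
double **make_pointer_array(double *buffer, int ldab, int n, int batch_count)
{
    assert(ldab >= 1 && n >= 0 && batch_count >= 0);
    size_t count = batch_count > 0 ? (size_t)batch_count : 1;
    double **ptrs = malloc(sizeof(double *) * count);
    if (ptrs == NULL)
        return NULL;
    size_t stride = (size_t)ldab * (size_t)n;   /* elements per matrix */
    for (int k = 0; k < batch_count; k++)
        ptrs[k] = buffer + (size_t)k * stride;
    return ptrs;
}
```

With one shared stride, every matrix in the batch has the same LDDAB and N, which matches the requirement that all matrices share the same size and bandwidths.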
The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:
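| Row of AB | On entry | On exit |
| 1 | * * * + + + | * * * u14 u25 u36 |
| 2 | * * + + + + | * * u13 u24 u35 u46 |
| 3 | * a12 a23 a34 a45 a56 | * u12 u23 u34 u45 u56 |
| 4 | a11 a22 a33 a44 a55 a66 | u11 u22 u33 u44 u55 u66 |
| 5 | a21 a32 a43 a54 a65 * | m21 m32 m43 m54 m65 * |
| 6 | a31 a42 a53 a64 * * | m31 m42 m53 m64 * * |
Array elements marked * are not used by the routine; elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.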
magma_int_t magma_sgetf2_nopiv_internal_batched (magma_int_t m, magma_int_t n, float **dA_array, magma_int_t ai, magma_int_t aj, magma_int_t ldda, magma_int_t *info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue)
sgetf2_nopiv computes the non-pivoting LU factorization of an M-by-N matrix A.
This routine can handle only matrices of limited width, so it is intended for internal use.
The factorization has the form A = L * U where L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is a batched version that factors batchCount M-by-N matrices in parallel.
| [in] | m | INTEGER The number of rows of the matrix A. M >= 0. |
| [in] | n | INTEGER The number of columns of the matrix A. N >= 0. |
| [in,out] | dA_array | Array of pointers, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = L*U; the unit diagonal elements of L are not stored. |
| [in] | ai | INTEGER Row offset for dA_array. |
| [in] | aj | INTEGER Column offset for dA_array. |
| [in] | ldda | INTEGER The leading dimension of each array A. LDDA >= max(1,M). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | gbstep | INTEGER Internal use. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
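The non-pivoting factorization and the ai/aj offsets can be illustrated on a single matrix. This sketch assumes the offsets are 0-based and select the submatrix that starts at element (ai, aj) of each column-major array. That interpretation and the `getf2_nopiv_ref` name are assumptions. Without pivoting, a zero diagonal entry cannot be avoided, so the sketch records it as a positive INFO value and skips that elimination step instead of dividing by zero.

```c
#include <assert.h>
#include <stddef.h>

/* Non-pivoting LU of the m-by-n submatrix of a that starts at row ai, column aj
   (0-based, column-major, leading dimension ldda). On exit the strictly lower
   part holds L (unit diagonal not stored) and the upper part holds U.
   Returns 0, or k+1 for the first step k whose pivot is exactly zero. */
int getf2_nopiv_ref(int m, int n, double *a, int ai, int aj, int ldda)
{
    assert(m >= 0 && n >= 0 && ldda >= 1 && ldda >= ai + m);
    double *A = a + ai + (size_t)aj * ldda;    /* start of the submatrix */
    int info = 0, kmax = m < n ? m : n;
    for (int k = 0; k < kmax; k++) {
        double pivot = A[k + (size_t)k * ldda];
        if (pivot == 0.0) {
            if (info == 0)
                info = k + 1;
            continue;                           /* avoid dividing by zero */
        }
        for (int i = k + 1; i < m; i++)
            A[i + (size_t)k * ldda] /= pivot;   /* column k of L */
        for (int j = k + 1; j < n; j++)         /* rank-1 update of trailing block */
            for (int i = k + 1; i < m; i++)
                A[i + (size_t)j * ldda] -= A[i + (size_t)k * ldda] * A[k + (size_t)j * ldda];
    }
    return info;
}
```

For the 2-by-2 block [4 3; 6 3] the multiplier is 6/4 = 1.5 and the updated diagonal entry is 3 - 1.5*3 = -1.5; entries outside the block are left untouched.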
magma_int_t magma_sgetrf_batched_smallsq_noshfl (magma_int_t n, float **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
sgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges.
This routine can handle only square matrices of size up to 32.
The factorization has the form A = P * L * U, where P is a permutation matrix, L is lower triangular with unit diagonal elements, and U is upper triangular.
This is the right-looking Level 3 BLAS version of the algorithm.
This is a batched version that factors batchCount N-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.
| [in] | n | INTEGER The size of each matrix A. N >= 0. |
| [in,out] | dA_array | Array of pointers, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDA,N). On entry, each pointer points to an N-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. |
| [in] | ldda | INTEGER The leading dimension of each array A. LDDA >= max(1,N). |
| [out] | ipiv_array | Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
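For a single small matrix, the meaning of A = P*L*U and of the 1-based pivot vector can be checked with a short reference implementation. This is a plain unblocked right-looking elimination in C, not MAGMA's GPU kernel. `getrf_small_ref` and `getrs_small_ref` are hypothetical names. The solve applies the interchanges in order, i = 1, ..., N, as the ipiv description implies, then performs the two triangular solves.

```c
#include <assert.h>
#include <math.h>

/* LU with partial pivoting of one n-by-n column-major matrix (n <= 32, matching
   the documented limit). ipiv is 1-based: row i was swapped with row ipiv[i-1].
   Returns 0, or the first i with U(i,i) exactly zero. */
int getrf_small_ref(int n, double *a, int lda, int *ipiv)
{
    assert(n >= 0 && n <= 32 && lda >= (n > 1 ? n : 1));
    int info = 0;
    for (int k = 0; k < n; k++) {
        int p = k;                                   /* pivot search in column k */
        for (int i = k + 1; i < n; i++)
            if (fabs(a[i + k * lda]) > fabs(a[p + k * lda]))
                p = i;
        ipiv[k] = p + 1;
        if (p != k)                                  /* swap full rows k and p */
            for (int j = 0; j < n; j++) {
                double t = a[k + j * lda];
                a[k + j * lda] = a[p + j * lda];
                a[p + j * lda] = t;
            }
        double pivot = a[k + k * lda];
        if (pivot == 0.0) {                          /* column below is all zero */
            if (info == 0)
                info = k + 1;
            continue;
        }
        for (int i = k + 1; i < n; i++)
            a[i + k * lda] /= pivot;                 /* multipliers of L */
        for (int j = k + 1; j < n; j++)              /* rank-1 trailing update */
            for (int i = k + 1; i < n; i++)
                a[i + j * lda] -= a[i + k * lda] * a[k + j * lda];
    }
    return info;
}

/* Solve A x = b in place with the factors (assumes getrf_small_ref returned 0). */
void getrs_small_ref(int n, const double *a, int lda, const int *ipiv, double *b)
{
    for (int i = 0; i < n; i++) {                    /* apply P^T to b */
        int p = ipiv[i] - 1;
        if (p != i) { double t = b[i]; b[i] = b[p]; b[p] = t; }
    }
    for (int j = 0; j < n; j++)                      /* L y = P^T b, unit diagonal */
        for (int i = j + 1; i < n; i++)
            b[i] -= a[i + j * lda] * b[j];
    for (int j = n - 1; j >= 0; j--) {               /* U x = y */
        b[j] /= a[j + j * lda];
        for (int i = 0; i < j; i++)
            b[i] -= a[i + j * lda] * b[j];
    }
}
```

For A = [2 1 1; 4 -6 0; -2 7 2] the first pivot is 4 in row 2, so IPIV(1) = 2, and solving with b = A*[1 2 3]^T recovers [1 2 3]^T.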
magma_int_t magma_zgbsv_batched_fused_sm (magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, magmaDoubleComplex **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magmaDoubleComplex **dB_array, magma_int_t lddb, magma_int_t *info_array, magma_int_t nthreads, magma_int_t ntcol, magma_int_t batchCount, magma_queue_t queue)
ZGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.
The LU decomposition with partial pivoting and row interchanges is used to factor A as A = L * U, where L is a product of permutation and unit lower triangular matrices with KL subdiagonals, and U is upper triangular with KL+KU superdiagonals. The factored form of A is then used to solve the system of equations A * X = B.
This is the batched version of the routine.
| [in] | n | INTEGER The order of the matrix A. n >= 0. |
| [in] | kl | INTEGER The number of subdiagonals within the band of A. KL >= 0. |
| [in] | ku | INTEGER The number of superdiagonals within the band of A. KU >= 0. |
| [in] | nrhs | INTEGER The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0. |
| [in,out] | dA_array | Array of pointers, dimension (batchCount). Each is a COMPLEX*16 array, dimension (LDDA,N). On entry, the matrix A in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. On exit, details of the LU factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. |
| [in] | ldda | INTEGER The leading dimension of each array A. LDDA >= (2*KL+KU+1). |
| [out] | ipiv_array | Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i). |
| [in,out] | dB_array | Array of pointers, dimension (batchCount). Each is a COMPLEX*16 array, dimension (LDDB,NRHS). On entry, the right hand side matrix B. On exit, the solution matrix X. |
| [in] | lddb | INTEGER The leading dimension of each array B. LDDB >= max(1, N). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | nthreads | INTEGER The number of threads assigned to a single matrix. nthreads >= (KL+1) |
| [in] | ntcol | INTEGER The number of concurrent factorizations in a thread-block ntcol >= 1 |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
magma_int_t magma_zgbtrf_batched_fused_sm (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, magmaDoubleComplex **dAB_array, magma_int_t lddab, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t nthreads, magma_int_t ntcol, magma_int_t batchCount, magma_queue_t queue)
ZGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges.
This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.
This routine has shared memory requirements that may exceed the capacity of the GPU. In such a case, the routine exits immediately, returning a negative error code.
| [in] | M | INTEGER The number of rows of the matrix A. M >= 0. |
| [in] | N | INTEGER The number of columns of the matrix A. N >= 0. |
| [in] | KL | INTEGER The number of subdiagonals within the band of A. KL >= 0. |
| [in] | KU | INTEGER The number of superdiagonals within the band of A. KU >= 0. |
| [in,out] | dAB_array | Array of pointers, dimension (batchCount). Each is a COMPLEX_16 array, dimension (LDDAB,N) On entry, the matrix AB in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl) |
On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See below for further details.
| [in] | LDDAB | INTEGER The leading dimension of each array AB. LDDAB >= 2*KL+KU+1. |
| [out] | ipiv_array | Array of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). |
| [out] | info_array | INTEGER array, dimension (batchCount). Each entry is the INFO output for the corresponding matrix: = 0: successful exit; < 0: if INFO = -i, the i-th argument had an illegal value; > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations. |
| [in] | nthreads | INTEGER The number of threads assigned to a single matrix. nthreads >= (KL+1) |
| [in] | ntcol | INTEGER The number of concurrent factorizations in a thread-block ntcol >= 1 |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:
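| Row of AB | On entry | On exit |
| 1 | * * * + + + | * * * u14 u25 u36 |
| 2 | * * + + + + | * * u13 u24 u35 u46 |
| 3 | * a12 a23 a34 a45 a56 | * u12 u23 u34 u45 u56 |
| 4 | a11 a22 a33 a44 a55 a66 | u11 u22 u33 u44 u55 u66 |
| 5 | a21 a32 a43 a54 a65 * | m21 m32 m43 m54 m65 * |
| 6 | a31 a42 a53 a64 * * | m31 m42 m53 m64 * * |
Array elements marked * are not used by the routine, but may be set to zero after completion. Elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.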
magma_int_t magma_zgbtrf_batched_sliding_window_loopout (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, magmaDoubleComplex **dAB_array, magma_int_t lddab, magma_int_t **ipiv_array, magma_int_t *info_array, void *device_work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue)
ZGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges.
This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.
This routine has shared memory requirements that may exceed the capacity of the GPU. In such a case, the routine exits immediately, returning a negative error code.
| [in] | M | INTEGER The number of rows of the matrix A. M >= 0. |
| [in] | N | INTEGER The number of columns of the matrix A. N >= 0. |
| [in] | KL | INTEGER The number of subdiagonals within the band of A. KL >= 0. |
| [in] | KU | INTEGER The number of superdiagonals within the band of A. KU >= 0. |
| [in,out] | dAB_array | Array of pointers, dimension (batchCount). Each is a COMPLEX_16 array, dimension (LDDAB,N) On entry, the matrix AB in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl) |
On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See below for further details.
| [in] | LDDAB | INTEGER The leading dimension of each array AB. LDDAB >= 2*KL+KU+1. |
| [out] | ipiv_array | Array of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). |
| [out] | info_array | INTEGER array, dimension (batchCount). Each entry is the INFO output for the corresponding matrix: = 0: successful exit; < 0: if INFO = -i, the i-th argument had an illegal value; > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations. |
| [in,out] | device_work | Workspace, allocated on device memory by the user. |
| [in,out] | lwork | INTEGER pointer. The size of the workspace (device_work) in bytes. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:
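| Row of AB | On entry | On exit |
| 1 | * * * + + + | * * * u14 u25 u36 |
| 2 | * * + + + + | * * u13 u24 u35 u46 |
| 3 | * a12 a23 a34 a45 a56 | * u12 u23 u34 u45 u56 |
| 4 | a11 a22 a33 a44 a55 a66 | u11 u22 u33 u44 u55 u66 |
| 5 | a21 a32 a43 a54 a65 * | m21 m32 m43 m54 m65 * |
| 6 | a31 a42 a53 a64 * * | m31 m42 m53 m64 * * |
Array elements marked * are not used by the routine; elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.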
magma_int_t magma_zgbtrf_batched_sliding_window_loopin (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, magmaDoubleComplex **dAB_array, magma_int_t lddab, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
ZGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges.
This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.
This routine has shared memory requirements that may exceed the capacity of the GPU. In such a case, the routine exits immediately, returning a negative error code.
| [in] | M | INTEGER The number of rows of the matrix A. M >= 0. |
| [in] | N | INTEGER The number of columns of the matrix A. N >= 0. |
| [in] | KL | INTEGER The number of subdiagonals within the band of A. KL >= 0. |
| [in] | KU | INTEGER The number of superdiagonals within the band of A. KU >= 0. |
| [in,out] | dAB_array | Array of pointers, dimension (batchCount). Each is a COMPLEX_16 array, dimension (LDDAB,N) On entry, the matrix AB in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl) |
On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See below for further details.
| [in] | LDDAB | INTEGER The leading dimension of each array AB. LDDAB >= 2*KL+KU+1. |
| [out] | ipiv_array | Array of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i). |
| [out] | info_array | INTEGER array, dimension (batchCount). Each entry is the INFO output for the corresponding matrix: = 0: successful exit; < 0: if INFO = -i, the i-th argument had an illegal value; > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:
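| Row of AB | On entry | On exit |
| 1 | * * * + + + | * * * u14 u25 u36 |
| 2 | * * + + + + | * * u13 u24 u35 u46 |
| 3 | * a12 a23 a34 a45 a56 | * u12 u23 u34 u45 u56 |
| 4 | a11 a22 a33 a44 a55 a66 | u11 u22 u33 u44 u55 u66 |
| 5 | a21 a32 a43 a54 a65 * | m21 m32 m43 m54 m65 * |
| 6 | a31 a42 a53 a64 * * | m31 m42 m53 m64 * * |
Array elements marked * are not used by the routine; elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.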
magma_int_t magma_zgetf2_nopiv_internal_batched (magma_int_t m, magma_int_t n, magmaDoubleComplex **dA_array, magma_int_t ai, magma_int_t aj, magma_int_t ldda, magma_int_t *info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue)
zgetf2_nopiv computes the non-pivoting LU factorization of an M-by-N matrix A.
This routine can handle only matrices of limited width, so it is intended for internal use.
The factorization has the form A = L * U where L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is a batched version that factors batchCount M-by-N matrices in parallel.
| [in] | m | INTEGER The number of rows of the matrix A. M >= 0. |
| [in] | n | INTEGER The number of columns of the matrix A. N >= 0. |
| [in,out] | dA_array | Array of pointers, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = L*U; the unit diagonal elements of L are not stored. |
| [in] | ai | INTEGER Row offset for dA_array. |
| [in] | aj | INTEGER Column offset for dA_array. |
| [in] | ldda | INTEGER The leading dimension of each array A. LDDA >= max(1,M). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | gbstep | INTEGER Internal use. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |
magma_int_t magma_zgetrf_batched_smallsq_noshfl (magma_int_t n, magmaDoubleComplex **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
zgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges.
This routine can handle only square matrices of size up to 32.
The factorization has the form A = P * L * U, where P is a permutation matrix, L is lower triangular with unit diagonal elements, and U is upper triangular.
This is the right-looking Level 3 BLAS version of the algorithm.
This is a batched version that factors batchCount N-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.
| [in] | n | INTEGER The size of each matrix A. N >= 0. |
| [in,out] | dA_array | Array of pointers, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDA,N). On entry, each pointer points to an N-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored. |
| [in] | ldda | INTEGER The leading dimension of each array A. LDDA >= max(1,N). |
| [out] | ipiv_array | Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i). |
| [out] | info_array | Array of INTEGERs, dimension (batchCount), for corresponding matrices. |
| [in] | batchCount | INTEGER The number of matrices to operate on. |
| [in] | queue | magma_queue_t Queue to execute in. |