MAGMA 2.10.0
Matrix Algebra for GPU and Multicore Architectures

Functions

magma_int_t magma_cgbsv_batched_work (magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, magmaFloatComplex **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, magmaFloatComplex **dB_array, magma_int_t lddb, magma_int_t *info_array, void *device_work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue)
 CGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.
 
magma_int_t magma_cgbsv_batched (magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, magmaFloatComplex **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, magmaFloatComplex **dB_array, magma_int_t lddb, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 CGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.
 
magma_int_t magma_cgbtrf_batched_work (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, magmaFloatComplex **dAB_array, magma_int_t lddab, magma_int_t **dipiv_array, magma_int_t *info_array, void *device_work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue)
 CGBTRF computes an LU factorization of a complex m-by-n band matrix AB using partial pivoting with row interchanges.
 
magma_int_t magma_cgbtrf_batched (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, magmaFloatComplex **dAB_array, magma_int_t lddab, magma_int_t **dipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 CGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_cgbtrs_batched (magma_trans_t transA, magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, magmaFloatComplex **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, magmaFloatComplex **dB_array, magma_int_t lddb, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 CGBTRS solves a system of linear equations A * X = B, A**T * X = B, or A**H * X = B with a general band matrix A using the LU factorization computed by CGBTRF.
 
magma_int_t magma_cgetrf_batched (magma_int_t m, magma_int_t n, magmaFloatComplex **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 CGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_cgetrf_nopiv_vbatched_max_nocheck_work (magma_int_t *m, magma_int_t *n, magma_int_t max_m, magma_int_t max_n, magma_int_t max_minmn, magma_int_t max_mxn, magmaFloatComplex **dA_array, magma_int_t *ldda, float *dtol_array, float eps, magma_int_t *info_array, void *work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue)
 CGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting.
 
magma_int_t magma_cgetrf_nopiv_expert_vbatched (magma_int_t *m, magma_int_t *n, magmaFloatComplex **dA_array, magma_int_t *ldda, float *dtol_array, float eps, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 CGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting.
 
magma_int_t magma_cgetrf_nopiv_vbatched (magma_int_t *m, magma_int_t *n, magmaFloatComplex **dA_array, magma_int_t *ldda, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 CGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting.
 
magma_int_t magma_cgetrf_recpanel_batched (magma_int_t m, magma_int_t n, magma_int_t min_recpnb, magmaFloatComplex **dA_array, magma_int_t ai, magma_int_t aj, magma_int_t ldda, magma_int_t **dipiv_array, magma_int_t **dpivinfo_array, magma_int_t *info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue)
 This is an internal routine that might have many assumptions.
 
magma_int_t magma_cgetrf_recpanel_native (magma_int_t m, magma_int_t n, magma_int_t recnb, magmaFloatComplex_ptr dA, magma_int_t ldda, magma_int_t *dipiv, magma_int_t *dipivinfo, magma_int_t *dinfo, magma_int_t gbstep, magma_event_t events[2], magma_queue_t queue, magma_queue_t update_queue)
 This is an internal routine.
 
magma_int_t magma_cgetrf_vbatched_max_nocheck_work (magma_int_t *m, magma_int_t *n, magma_int_t max_m, magma_int_t max_n, magma_int_t max_minmn, magma_int_t max_mxn, magmaFloatComplex **dA_array, magma_int_t *ldda, magma_int_t **dipiv_array, magma_int_t *info_array, void *work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue)
 CGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_cgetrf_vbatched (magma_int_t *m, magma_int_t *n, magmaFloatComplex **dA_array, magma_int_t *ldda, magma_int_t **dipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 CGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_dgbsv_batched_work (magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, double **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, double **dB_array, magma_int_t lddb, magma_int_t *info_array, void *device_work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue)
 DGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.
 
magma_int_t magma_dgbsv_batched (magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, double **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, double **dB_array, magma_int_t lddb, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 DGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.
 
magma_int_t magma_dgbtrf_batched_work (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, double **dAB_array, magma_int_t lddab, magma_int_t **dipiv_array, magma_int_t *info_array, void *device_work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue)
 DGBTRF computes an LU factorization of a real m-by-n band matrix AB using partial pivoting with row interchanges.
 
magma_int_t magma_dgbtrf_batched (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, double **dAB_array, magma_int_t lddab, magma_int_t **dipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 DGBTRF computes an LU factorization of a real m-by-n band matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_dgbtrs_batched (magma_trans_t transA, magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, double **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, double **dB_array, magma_int_t lddb, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 DGBTRS solves a system of linear equations A * X = B, A**T * X = B, or A**H * X = B with a general band matrix A using the LU factorization computed by DGBTRF.
 
magma_int_t magma_dgetrf_batched (magma_int_t m, magma_int_t n, double **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 DGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_dgetrf_nopiv_vbatched_max_nocheck_work (magma_int_t *m, magma_int_t *n, magma_int_t max_m, magma_int_t max_n, magma_int_t max_minmn, magma_int_t max_mxn, double **dA_array, magma_int_t *ldda, double *dtol_array, double eps, magma_int_t *info_array, void *work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue)
 DGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting.
 
magma_int_t magma_dgetrf_nopiv_expert_vbatched (magma_int_t *m, magma_int_t *n, double **dA_array, magma_int_t *ldda, double *dtol_array, double eps, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 DGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting.
 
magma_int_t magma_dgetrf_nopiv_vbatched (magma_int_t *m, magma_int_t *n, double **dA_array, magma_int_t *ldda, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 DGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting.
 
magma_int_t magma_dgetrf_recpanel_batched (magma_int_t m, magma_int_t n, magma_int_t min_recpnb, double **dA_array, magma_int_t ai, magma_int_t aj, magma_int_t ldda, magma_int_t **dipiv_array, magma_int_t **dpivinfo_array, magma_int_t *info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue)
 This is an internal routine that might have many assumptions.
 
magma_int_t magma_dgetrf_recpanel_native (magma_int_t m, magma_int_t n, magma_int_t recnb, magmaDouble_ptr dA, magma_int_t ldda, magma_int_t *dipiv, magma_int_t *dipivinfo, magma_int_t *dinfo, magma_int_t gbstep, magma_event_t events[2], magma_queue_t queue, magma_queue_t update_queue)
 This is an internal routine.
 
magma_int_t magma_dgetrf_vbatched_max_nocheck_work (magma_int_t *m, magma_int_t *n, magma_int_t max_m, magma_int_t max_n, magma_int_t max_minmn, magma_int_t max_mxn, double **dA_array, magma_int_t *ldda, magma_int_t **dipiv_array, magma_int_t *info_array, void *work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue)
 DGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_dgetrf_vbatched (magma_int_t *m, magma_int_t *n, double **dA_array, magma_int_t *ldda, magma_int_t **dipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 DGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_sgbsv_batched_work (magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, float **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, float **dB_array, magma_int_t lddb, magma_int_t *info_array, void *device_work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue)
 SGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.
 
magma_int_t magma_sgbsv_batched (magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, float **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, float **dB_array, magma_int_t lddb, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 SGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.
 
magma_int_t magma_sgbtrf_batched_work (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, float **dAB_array, magma_int_t lddab, magma_int_t **dipiv_array, magma_int_t *info_array, void *device_work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue)
 SGBTRF computes an LU factorization of a real m-by-n band matrix AB using partial pivoting with row interchanges.
 
magma_int_t magma_sgbtrf_batched (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, float **dAB_array, magma_int_t lddab, magma_int_t **dipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 SGBTRF computes an LU factorization of a real m-by-n band matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_sgbtrs_batched (magma_trans_t transA, magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, float **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, float **dB_array, magma_int_t lddb, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 SGBTRS solves a system of linear equations A * X = B, A**T * X = B, or A**H * X = B with a general band matrix A using the LU factorization computed by SGBTRF.
 
magma_int_t magma_sgetrf_batched (magma_int_t m, magma_int_t n, float **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 SGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_sgetrf_nopiv_vbatched_max_nocheck_work (magma_int_t *m, magma_int_t *n, magma_int_t max_m, magma_int_t max_n, magma_int_t max_minmn, magma_int_t max_mxn, float **dA_array, magma_int_t *ldda, float *dtol_array, float eps, magma_int_t *info_array, void *work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue)
 SGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting.
 
magma_int_t magma_sgetrf_nopiv_expert_vbatched (magma_int_t *m, magma_int_t *n, float **dA_array, magma_int_t *ldda, float *dtol_array, float eps, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 SGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting.
 
magma_int_t magma_sgetrf_nopiv_vbatched (magma_int_t *m, magma_int_t *n, float **dA_array, magma_int_t *ldda, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 SGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting.
 
magma_int_t magma_sgetrf_recpanel_batched (magma_int_t m, magma_int_t n, magma_int_t min_recpnb, float **dA_array, magma_int_t ai, magma_int_t aj, magma_int_t ldda, magma_int_t **dipiv_array, magma_int_t **dpivinfo_array, magma_int_t *info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue)
 This is an internal routine that might have many assumptions.
 
magma_int_t magma_sgetrf_recpanel_native (magma_int_t m, magma_int_t n, magma_int_t recnb, magmaFloat_ptr dA, magma_int_t ldda, magma_int_t *dipiv, magma_int_t *dipivinfo, magma_int_t *dinfo, magma_int_t gbstep, magma_event_t events[2], magma_queue_t queue, magma_queue_t update_queue)
 This is an internal routine.
 
magma_int_t magma_sgetrf_vbatched_max_nocheck_work (magma_int_t *m, magma_int_t *n, magma_int_t max_m, magma_int_t max_n, magma_int_t max_minmn, magma_int_t max_mxn, float **dA_array, magma_int_t *ldda, magma_int_t **dipiv_array, magma_int_t *info_array, void *work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue)
 SGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_sgetrf_vbatched (magma_int_t *m, magma_int_t *n, float **dA_array, magma_int_t *ldda, magma_int_t **dipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 SGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_zgbsv_batched_work (magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, magmaDoubleComplex **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, magmaDoubleComplex **dB_array, magma_int_t lddb, magma_int_t *info_array, void *device_work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue)
 ZGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.
 
magma_int_t magma_zgbsv_batched (magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, magmaDoubleComplex **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, magmaDoubleComplex **dB_array, magma_int_t lddb, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 ZGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.
 
magma_int_t magma_zgbtrf_batched_work (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, magmaDoubleComplex **dAB_array, magma_int_t lddab, magma_int_t **dipiv_array, magma_int_t *info_array, void *device_work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue)
 ZGBTRF computes an LU factorization of a complex m-by-n band matrix AB using partial pivoting with row interchanges.
 
magma_int_t magma_zgbtrf_batched (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, magmaDoubleComplex **dAB_array, magma_int_t lddab, magma_int_t **dipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 ZGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_zgbtrs_batched (magma_trans_t transA, magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, magmaDoubleComplex **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, magmaDoubleComplex **dB_array, magma_int_t lddb, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 ZGBTRS solves a system of linear equations A * X = B, A**T * X = B, or A**H * X = B with a general band matrix A using the LU factorization computed by ZGBTRF.
 
magma_int_t magma_zgetrf_batched (magma_int_t m, magma_int_t n, magmaDoubleComplex **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 ZGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_zgetrf_nopiv_vbatched_max_nocheck_work (magma_int_t *m, magma_int_t *n, magma_int_t max_m, magma_int_t max_n, magma_int_t max_minmn, magma_int_t max_mxn, magmaDoubleComplex **dA_array, magma_int_t *ldda, double *dtol_array, double eps, magma_int_t *info_array, void *work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue)
 ZGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting.
 
magma_int_t magma_zgetrf_nopiv_expert_vbatched (magma_int_t *m, magma_int_t *n, magmaDoubleComplex **dA_array, magma_int_t *ldda, double *dtol_array, double eps, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 ZGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting.
 
magma_int_t magma_zgetrf_nopiv_vbatched (magma_int_t *m, magma_int_t *n, magmaDoubleComplex **dA_array, magma_int_t *ldda, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 ZGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting.
 
magma_int_t magma_zgetrf_recpanel_batched (magma_int_t m, magma_int_t n, magma_int_t min_recpnb, magmaDoubleComplex **dA_array, magma_int_t ai, magma_int_t aj, magma_int_t ldda, magma_int_t **dipiv_array, magma_int_t **dpivinfo_array, magma_int_t *info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue)
 This is an internal routine that might have many assumptions.
 
magma_int_t magma_zgetrf_recpanel_native (magma_int_t m, magma_int_t n, magma_int_t recnb, magmaDoubleComplex_ptr dA, magma_int_t ldda, magma_int_t *dipiv, magma_int_t *dipivinfo, magma_int_t *dinfo, magma_int_t gbstep, magma_event_t events[2], magma_queue_t queue, magma_queue_t update_queue)
 This is an internal routine.
 
magma_int_t magma_zgetrf_vbatched_max_nocheck_work (magma_int_t *m, magma_int_t *n, magma_int_t max_m, magma_int_t max_n, magma_int_t max_minmn, magma_int_t max_mxn, magmaDoubleComplex **dA_array, magma_int_t *ldda, magma_int_t **dipiv_array, magma_int_t *info_array, void *work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue)
 ZGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_zgetrf_vbatched (magma_int_t *m, magma_int_t *n, magmaDoubleComplex **dA_array, magma_int_t *ldda, magma_int_t **dipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 ZGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_cgbsv_batched_fused_sm (magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, magmaFloatComplex **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magmaFloatComplex **dB_array, magma_int_t lddb, magma_int_t *info_array, magma_int_t nthreads, magma_int_t ntcol, magma_int_t batchCount, magma_queue_t queue)
 CGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.
 
magma_int_t magma_cgbtrf_batched_fused_sm (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, magmaFloatComplex **dAB_array, magma_int_t lddab, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t nthreads, magma_int_t ntcol, magma_int_t batchCount, magma_queue_t queue)
 CGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_cgbtrf_batched_sliding_window_loopout (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, magmaFloatComplex **dAB_array, magma_int_t lddab, magma_int_t **ipiv_array, magma_int_t *info_array, void *device_work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue)
 CGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_cgbtrf_batched_sliding_window_loopin (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, magmaFloatComplex **dAB_array, magma_int_t lddab, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 CGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_cgetf2_nopiv_internal_batched (magma_int_t m, magma_int_t n, magmaFloatComplex **dA_array, magma_int_t ai, magma_int_t aj, magma_int_t ldda, magma_int_t *info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue)
 cgetf2_nopiv computes the non-pivoting LU factorization of an M-by-N matrix A.
 
magma_int_t magma_cgetrf_batched_smallsq_noshfl (magma_int_t n, magmaFloatComplex **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 cgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_dgbsv_batched_fused_sm (magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, double **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, double **dB_array, magma_int_t lddb, magma_int_t *info_array, magma_int_t nthreads, magma_int_t ntcol, magma_int_t batchCount, magma_queue_t queue)
 DGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.
 
magma_int_t magma_dgbtrf_batched_fused_sm (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, double **dAB_array, magma_int_t lddab, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t nthreads, magma_int_t ntcol, magma_int_t batchCount, magma_queue_t queue)
 DGBTRF computes an LU factorization of a real m-by-n band matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_dgbtrf_batched_sliding_window_loopout (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, double **dAB_array, magma_int_t lddab, magma_int_t **ipiv_array, magma_int_t *info_array, void *device_work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue)
 DGBTRF computes an LU factorization of a real m-by-n band matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_dgbtrf_batched_sliding_window_loopin (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, double **dAB_array, magma_int_t lddab, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 DGBTRF computes an LU factorization of a real m-by-n band matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_dgetf2_nopiv_internal_batched (magma_int_t m, magma_int_t n, double **dA_array, magma_int_t ai, magma_int_t aj, magma_int_t ldda, magma_int_t *info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue)
 dgetf2_nopiv computes the non-pivoting LU factorization of an M-by-N matrix A.
 
magma_int_t magma_dgetrf_batched_smallsq_noshfl (magma_int_t n, double **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 dgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_sgbsv_batched_fused_sm (magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, float **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, float **dB_array, magma_int_t lddb, magma_int_t *info_array, magma_int_t nthreads, magma_int_t ntcol, magma_int_t batchCount, magma_queue_t queue)
 SGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.
 
magma_int_t magma_sgbtrf_batched_fused_sm (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, float **dAB_array, magma_int_t lddab, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t nthreads, magma_int_t ntcol, magma_int_t batchCount, magma_queue_t queue)
 SGBTRF computes an LU factorization of a real m-by-n band matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_sgbtrf_batched_sliding_window_loopout (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, float **dAB_array, magma_int_t lddab, magma_int_t **ipiv_array, magma_int_t *info_array, void *device_work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue)
 SGBTRF computes an LU factorization of a real m-by-n band matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_sgbtrf_batched_sliding_window_loopin (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, float **dAB_array, magma_int_t lddab, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 SGBTRF computes an LU factorization of a real m-by-n band matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_sgetf2_nopiv_internal_batched (magma_int_t m, magma_int_t n, float **dA_array, magma_int_t ai, magma_int_t aj, magma_int_t ldda, magma_int_t *info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue)
 sgetf2_nopiv computes the non-pivoting LU factorization of an M-by-N matrix A.
 
magma_int_t magma_sgetrf_batched_smallsq_noshfl (magma_int_t n, float **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 sgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_zgbsv_batched_fused_sm (magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, magmaDoubleComplex **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magmaDoubleComplex **dB_array, magma_int_t lddb, magma_int_t *info_array, magma_int_t nthreads, magma_int_t ntcol, magma_int_t batchCount, magma_queue_t queue)
 ZGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.
 
magma_int_t magma_zgbtrf_batched_fused_sm (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, magmaDoubleComplex **dAB_array, magma_int_t lddab, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t nthreads, magma_int_t ntcol, magma_int_t batchCount, magma_queue_t queue)
 ZGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_zgbtrf_batched_sliding_window_loopout (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, magmaDoubleComplex **dAB_array, magma_int_t lddab, magma_int_t **ipiv_array, magma_int_t *info_array, void *device_work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue)
 ZGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_zgbtrf_batched_sliding_window_loopin (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, magmaDoubleComplex **dAB_array, magma_int_t lddab, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 ZGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_zgetf2_nopiv_internal_batched (magma_int_t m, magma_int_t n, magmaDoubleComplex **dA_array, magma_int_t ai, magma_int_t aj, magma_int_t ldda, magma_int_t *info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue)
 zgetf2_nopiv computes the non-pivoting LU factorization of an M-by-N matrix A.
 
magma_int_t magma_zgetrf_batched_smallsq_noshfl (magma_int_t n, magmaDoubleComplex **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 zgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges.
 

Detailed Description

Function Documentation

◆ magma_cgbsv_batched_work()

magma_int_t magma_cgbsv_batched_work ( magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magma_int_t nrhs,
magmaFloatComplex ** dA_array,
magma_int_t ldda,
magma_int_t ** dipiv_array,
magmaFloatComplex ** dB_array,
magma_int_t lddb,
magma_int_t * info_array,
void * device_work,
magma_int_t * lwork,
magma_int_t batchCount,
magma_queue_t queue )

CGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.

The LU decomposition with partial pivoting and row interchanges is used to factor A as A = L * U, where L is a product of permutation and unit lower triangular matrices with KL subdiagonals, and U is upper triangular with KL+KU superdiagonals. The factored form of A is then used to solve the system of equations A * X = B.
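
The band structure described above maps onto a compact band array rather than a full N-by-N matrix. The following is a minimal sketch of that indexing, assuming the usual LAPACK band-storage convention for xGBTRF/xGBSV, in which entry A(i,j) is held in row KL+KU+1+i-j of column j (1-based) and the first KL rows are left free for fill-in caused by pivoting. The helper names in_band and band_index are illustrative and are not part of MAGMA.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the 0-based entry (i, j) of an n-by-n matrix lies inside
   the band with kl subdiagonals and ku superdiagonals, 0 otherwise. */
static int in_band(int n, int kl, int ku, int i, int j)
{
    if (i < 0 || i >= n || j < 0 || j >= n) return 0;
    return (i - j <= kl) && (j - i <= ku);
}

/* Offset of A(i, j) (0-based) inside a column-major band array with
   leading dimension ldab >= 2*kl + ku + 1.
   Assumed convention (LAPACK xGBTRF): 1-based AB(kl+ku+1+i-j, j) = A(i, j),
   which is row kl + ku + i - j in 0-based terms. */
static size_t band_index(int kl, int ku, int ldab, int i, int j)
{
    assert(ldab >= 2 * kl + ku + 1);
    assert(i - j <= kl && j - i <= ku);
    return (size_t)(kl + ku + i - j) + (size_t)j * (size_t)ldab;
}
```

With kl = 1, ku = 2 and ldab = 5, the diagonal entry A(0,0) lands at offset 3 of the first column, which is why the leading dimension must be at least 2*KL+KU+1.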

This is the batched version of the routine.

Parameters
[in]nINTEGER The order of the matrix A. n >= 0.
[in]klINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]kuINTEGER The number of superdiagonals within the band of A. KL >= 0.
[in]nrhsINTEGER The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0.
[in]dA_arrayArray of pointers, dimension (batchCount). Each contains the details of the LU factorization of the band matrix A, as computed by CGBTRF. U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= (2*KL+KU+1).
[in]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[in,out]dB_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX array, dimension (LDB,NRHS) On entry, the right hand side matrix B. On exit, the solution matrix X.
[in]lddbINTEGER The leading dimension of each array B. LDDB >= max(1, N).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as memory allocation failed.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and the solution has not been computed.
[in,out]device_workWorkspace, allocated on device memory.
[in,out]lworkINTEGER pointer The size of the workspace (device_work) in bytes
  • lwork[0] < 0: a workspace query is assumed, the routine calculates the required amount of workspace and returns it in lwork. The workspace is not referenced, and no computation is performed.
  • lwork[0] >= 0: the routine assumes that the user has provided a workspace with the size in lwork.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
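
The lwork convention described above amounts to a two-call pattern: query the size with lwork[0] < 0, allocate, then call again with the real size. The following is a minimal sketch of that pattern only. `mock_batched_work` is a hypothetical stand-in, not a MAGMA function; its size formula is invented for illustration, plain `long` replaces magma_int_t, and host `malloc` replaces the device allocation a real call would need.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a *_batched_work routine; NOT a MAGMA function.
 * It only mimics the lwork convention; the size formula is invented. */
int mock_batched_work(int n, int batchCount, void *work, long *lwork)
{
    long required = (long)n * batchCount * (long)sizeof(float);
    if (lwork[0] < 0) {       /* workspace query: report the size, do nothing else */
        lwork[0] = required;
        return 0;
    }
    if (lwork[0] < required || (required > 0 && work == NULL))
        return -1;            /* caller-provided workspace is too small */
    /* ... the real computation would run here ... */
    return 0;
}

/* Two-call pattern: query the size, allocate, then call again to compute. */
int run_with_workspace(int n, int batchCount)
{
    long lwork[1] = { -1 };
    int status = mock_batched_work(n, batchCount, NULL, lwork);
    assert(status == 0 && lwork[0] >= 0);

    void *work = malloc(lwork[0] > 0 ? (size_t)lwork[0] : 1);
    if (work == NULL)
        return -2;            /* allocation failed */
    status = mock_batched_work(n, batchCount, work, lwork);
    free(work);
    return status;
}
```

Keeping the query and the compute call on the same routine means the caller never has to duplicate the library's internal size formula.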

◆ magma_cgbsv_batched()

magma_int_t magma_cgbsv_batched ( magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magma_int_t nrhs,
magmaFloatComplex ** dA_array,
magma_int_t ldda,
magma_int_t ** dipiv_array,
magmaFloatComplex ** dB_array,
magma_int_t lddb,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

CGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.

The LU decomposition with partial pivoting and row interchanges is used to factor A as A = L * U, where L is a product of permutation and unit lower triangular matrices with KL subdiagonals, and U is upper triangular with KL+KU superdiagonals. The factored form of A is then used to solve the system of equations A * X = B.

This is the batched version of the routine.

Parameters
[in]nINTEGER The order of the matrix A. n >= 0.
[in]klINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]kuINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in]nrhsINTEGER The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0.
[in]dA_arrayArray of pointers, dimension (batchCount). Each contains the details of the LU factorization of the band matrix A, as computed by CGBTRF. U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= (2*KL+KU+1).
[in]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[in,out]dB_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX array, dimension (LDDB,NRHS). On entry, the right hand side matrix B. On exit, the solution matrix X.
[in]lddbINTEGER The leading dimension of each array B. LDDB >= max(1, N).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as memory allocation failed.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

◆ magma_cgbtrf_batched_work()

magma_int_t magma_cgbtrf_batched_work ( magma_int_t m,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magmaFloatComplex ** dAB_array,
magma_int_t lddab,
magma_int_t ** dipiv_array,
magma_int_t * info_array,
void * device_work,
magma_int_t * lwork,
magma_int_t batchCount,
magma_queue_t queue )

CGBTRF computes an LU factorization of a complex m-by-n band matrix AB using partial pivoting with row interchanges.

This is a batched version that factors batchCount M-by-N matrices in parallel. dAB, dipiv, and info become arrays with one entry per matrix.

The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:

    On entry:                        On exit:

     *    *    *    +    +    +       *    *    *   u14  u25  u36
     *    *    +    +    +    +       *    *   u13  u24  u35  u46
     *   a12  a23  a34  a45  a56      *   u12  u23  u34  u45  u56
    a11  a22  a33  a44  a55  a66     u11  u22  u33  u44  u55  u66
    a21  a32  a43  a54  a65   *      m21  m32  m43  m54  m65   *
    a31  a42  a53  a64   *    *      m31  m42  m53  m64   *    *

Note that this behavior is a little different from the standard LAPACK routine. Array elements marked * are not read by the routine, but may be zeroed out after completion. Elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.
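
As an illustration of the band layout, the sketch below copies the band of a dense column-major matrix into band storage using the mapping AB(kl+ku+1+i-j, j) = A(i,j) for max(1,j-ku) <= i <= min(m,j+kl), rewritten with 0-based indices. It runs on the host and is not part of MAGMA: `pack_band` is a hypothetical helper, C99 `float complex` stands in for magmaFloatComplex, and plain `int` stands in for magma_int_t.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Copy the band of a dense column-major m-by-n matrix A (leading dimension
 * lda) into band storage AB (leading dimension ldab >= 2*kl+ku+1).
 * Entries of AB outside the band, including the first kl fill-in rows,
 * are left untouched. */
void pack_band(int m, int n, int kl, int ku,
               const float complex *A, int lda,
               float complex *AB, int ldab)
{
    assert(m >= 0 && n >= 0 && kl >= 0 && ku >= 0);
    assert(lda >= (m > 1 ? m : 1) && ldab >= 2 * kl + ku + 1);
    for (int j = 0; j < n; ++j) {
        int ilo = (j - ku > 0) ? j - ku : 0;          /* max(0, j-ku)   */
        int ihi = (j + kl < m - 1) ? j + kl : m - 1;  /* min(m-1, j+kl) */
        for (int i = ilo; i <= ihi; ++i) {
            size_t row = (size_t)(kl + ku + i - j);   /* 0-based band row */
            AB[row + (size_t)j * ldab] = A[(size_t)i + (size_t)j * lda];
        }
    }
}
```

For M = N = 6, KL = 2, KU = 1 this places a11 in band row 4 (1-based) of column 1 and a12 in band row 3 of column 2, matching the "On entry" picture.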

Parameters
[in]mINTEGER The number of rows of each matrix A. M >= 0.
[in]nINTEGER The number of columns of each matrix A. N >= 0.
[in]klINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]kuINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in,out]dAB_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDAB,N). On entry, the matrix A in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl).

On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See above for details about the band storage.

[in]lddabINTEGER The leading dimension of each array AB. LDDAB >= (2*KL+KU+1).
[out]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as memory allocation failed.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in,out]device_workWorkspace, allocated on device memory
[in,out]lworkINTEGER pointer The size of the workspace (device_work) in bytes
  • lwork[0] < 0: a workspace query is assumed, the routine calculates the required amount of workspace and returns it in lwork. The workspace is not referenced, and no factorization is performed.
  • lwork[0] >= 0: the routine assumes that the user has provided a workspace with the size in lwork.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

◆ magma_cgbtrf_batched()

magma_int_t magma_cgbtrf_batched ( magma_int_t m,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magmaFloatComplex ** dAB_array,
magma_int_t lddab,
magma_int_t ** dipiv_array,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

CGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges.

This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.

Parameters
[in]MINTEGER The number of rows of the matrix A. M >= 0.
[in]NINTEGER The number of columns of the matrix A. N >= 0.
[in]KLINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]KUINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in,out]dAB_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX array, dimension (LDDAB,N) On entry, the matrix AB in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl)

On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See below for further details.

[in]LDDABINTEGER The leading dimension of the array AB. LDDAB >= 2*KL+KU+1.
[out]dIPIV_arrayArray of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]dINFO_arrayINTEGER array, dimension (batchCount). Each entry is the INFO output for the corresponding matrix.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value
  • > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
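
Because each matrix in the batch reports its own INFO value, a caller typically scans the whole array after the call. A minimal sketch follows, assuming the info array has already been copied to host memory and using plain `int` in place of magma_int_t; `info_summary` and `summarize_info` are hypothetical names, not MAGMA API.

```c
#include <assert.h>

/* Tally of a host-side copy of the per-matrix INFO values. */
typedef struct {
    int ok;            /* entries equal to 0                              */
    int bad_argument;  /* entries < 0: illegal argument or other error    */
    int singular;      /* entries > 0: U(i,i) is exactly zero             */
    int first_failed;  /* batch index of the first nonzero entry, or -1   */
} info_summary;

info_summary summarize_info(const int *info, int batchCount)
{
    assert(batchCount >= 0);
    info_summary s = { 0, 0, 0, -1 };
    for (int b = 0; b < batchCount; ++b) {
        if (info[b] == 0)
            s.ok++;
        else if (info[b] < 0)
            s.bad_argument++;
        else
            s.singular++;
        if (info[b] != 0 && s.first_failed < 0)
            s.first_failed = b;
    }
    return s;
}
```

A positive entry identifies the zero pivot for that matrix only; the other matrices in the batch are unaffected by it.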

Further Details

The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:

    On entry:                        On exit:

     *    *    *    +    +    +       *    *    *   u14  u25  u36
     *    *    +    +    +    +       *    *   u13  u24  u35  u46
     *   a12  a23  a34  a45  a56      *   u12  u23  u34  u45  u56
    a11  a22  a33  a44  a55  a66     u11  u22  u33  u44  u55  u66
    a21  a32  a43  a54  a65   *      m21  m32  m43  m54  m65   *
    a31  a42  a53  a64   *    *      m31  m42  m53  m64   *    *

Array elements marked * are not used by the routine; elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.

◆ magma_cgbtrs_batched()

magma_int_t magma_cgbtrs_batched ( magma_trans_t transA,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magma_int_t nrhs,
magmaFloatComplex ** dA_array,
magma_int_t ldda,
magma_int_t ** dipiv_array,
magmaFloatComplex ** dB_array,
magma_int_t lddb,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

CGBTRS solves a system of linear equations A * X = B, A**T * X = B, or A**H * X = B with a general band matrix A using the LU factorization computed by CGBTRF.

This is the batched version of the routine. Currently, only the no-transpose case (A * X = B) is supported.

Parameters
[in]transAmagma_trans_t Specifies the form of the system of equations. Currently, only MagmaNoTrans is supported (A*X = B).
[in]nINTEGER The order of the matrix A. n >= 0.
[in]klINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]kuINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in]nrhsINTEGER The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0.
[in]dA_arrayArray of pointers, dimension (batchCount). Each contains the details of the LU factorization of the band matrix A, as computed by CGBTRF. U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= (2*KL+KU+1).
[in]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[in,out]dB_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX array, dimension (LDDB,NRHS). On entry, the right hand side matrix B. On exit, the solution matrix X.
[in]lddbINTEGER The leading dimension of each array B. LDDB >= max(1, N).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as memory allocation failed.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

◆ magma_cgetrf_batched()

magma_int_t magma_cgetrf_batched ( magma_int_t m,
magma_int_t n,
magmaFloatComplex ** dA_array,
magma_int_t ldda,
magma_int_t ** ipiv_array,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

CGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

Parameters
[in]mINTEGER The number of rows of each matrix A. M >= 0.
[in]nINTEGER The number of columns of each matrix A. N >= 0.
[in,out]dA_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= max(1,M).
[out]ipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as memory allocation failed.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
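
The pivot description ("row i of the matrix was interchanged with row IPIV(i)") means the interchanges must be replayed in order, not treated as a ready-made permutation vector. The sketch below replays them on a host vector. It assumes the indices are 1-based, as in LAPACK, and that they have been copied to the host; `apply_pivots` is a hypothetical helper, and real `float` data is used for brevity.

```c
#include <assert.h>

/* Replay the recorded row interchanges, in order, on a vector x of length m.
 * ipiv holds npiv = min(M,N) indices, assumed 1-based as in LAPACK. */
void apply_pivots(int npiv, const int *ipiv, float *x, int m)
{
    assert(npiv >= 0 && npiv <= m);
    for (int i = 0; i < npiv; ++i) {
        int p = ipiv[i] - 1;          /* convert to a 0-based row index */
        assert(p >= i && p < m);      /* partial pivoting picks a row at or below i */
        float t = x[i];
        x[i] = x[p];
        x[p] = t;
    }
}
```

With ipiv = {3, 3, 3} and x = {10, 20, 30}, the first step swaps rows 1 and 3, the second swaps rows 2 and 3, and the third is a no-op, which leaves x = {30, 10, 20}.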

◆ magma_cgetrf_nopiv_vbatched_max_nocheck_work()

magma_int_t magma_cgetrf_nopiv_vbatched_max_nocheck_work ( magma_int_t * m,
magma_int_t * n,
magma_int_t max_m,
magma_int_t max_n,
magma_int_t max_minmn,
magma_int_t max_mxn,
magmaFloatComplex ** dA_array,
magma_int_t * ldda,
float * dtol_array,
float eps,
magma_int_t * info_array,
void * work,
magma_int_t * lwork,
magma_int_t batchCount,
magma_queue_t queue )

CGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting.

It replaces tiny pivots smaller than a specified tolerance by that tolerance.

The factorization has the form A = L * U where L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.

Parameters
[in]MArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of rows of each matrix A. M[i] >= 0.
[in]NArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of columns of each matrix A. N[i] >= 0.
[in]MAX_MINTEGER The maximum number of rows across the batch
[in]MAX_NINTEGER The maximum number of columns across the batch
[in]MAX_MINMNINTEGER The maximum value of min(Mi, Ni) for i = 1, 2, ..., batchCount
[in]MAX_MxNINTEGER The maximum value of the product (Mi x Ni) for i = 1, 2, ..., batchCount
[in,out]dA_arrayArray of pointers on the GPU, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDA[i],N[i]). On entry, each pointer is an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = L*U; the unit diagonal elements of L are not stored.
[in]lddaArray of INTEGERs on the GPU Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]).
[in]dtol_arrayArray of REALs (single precision), dimension (batchCount), for corresponding matrices. Each is the tolerance that is compared to the diagonal element before the column is scaled by its inverse. If the value of the diagonal is less than the threshold, the diagonal is replaced by the threshold. If the array is set to NULL, then the threshold is set to the eps parameter.
[in]epsREAL The value to use for the tolerance for all matrices if the dtol_array is NULL.
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as memory allocation failed.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations. If a tolerance array is specified, the value shows the number of times a tiny pivot was replaced.
[in]WORKVOID pointer A workspace of size LWORK[0]
[in,out]LWORKINTEGER pointer If lwork[0] < 0, a workspace query is assumed, and lwork[0] is overwritten by the required workspace size in bytes. Otherwise, lwork[0] is the size of work
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
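
The four MAX_* arguments are simple reductions over the per-matrix sizes. A minimal host-side sketch of computing them is shown below. It assumes host copies of the size arrays (the routine itself expects M and N on the GPU) and uses plain `int` instead of magma_int_t; `batch_maxima` and `compute_batch_maxima` are hypothetical names.

```c
#include <assert.h>

/* The four scalar maxima, gathered in one struct. */
typedef struct {
    int max_m;          /* largest M[i]                 */
    int max_n;          /* largest N[i]                 */
    int max_minmn;      /* largest min(M[i], N[i])      */
    long long max_mxn;  /* largest product M[i] * N[i]  */
} batch_maxima;

batch_maxima compute_batch_maxima(const int *m, const int *n, int batchCount)
{
    assert(batchCount >= 0);
    batch_maxima r = { 0, 0, 0, 0 };
    for (int i = 0; i < batchCount; ++i) {
        assert(m[i] >= 0 && n[i] >= 0);
        int minmn = m[i] < n[i] ? m[i] : n[i];
        long long mxn = (long long)m[i] * n[i];
        if (m[i] > r.max_m)      r.max_m = m[i];
        if (n[i] > r.max_n)      r.max_n = n[i];
        if (minmn > r.max_minmn) r.max_minmn = minmn;
        if (mxn > r.max_mxn)     r.max_mxn = mxn;
    }
    return r;
}
```

Note that the maxima need not come from the same matrix: with sizes 4x6, 10x2, and 3x3, MAX_M is 10 and MAX_N is 6, while MAX_MINMN is 4 and MAX_MxN is 24.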

◆ magma_cgetrf_nopiv_expert_vbatched()

magma_int_t magma_cgetrf_nopiv_expert_vbatched ( magma_int_t * m,
magma_int_t * n,
magmaFloatComplex ** dA_array,
magma_int_t * ldda,
float * dtol_array,
float eps,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

CGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting.

It replaces tiny pivots smaller than a specified tolerance by that tolerance.

The factorization has the form A = L * U where L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.

This is the expert version, which takes an extra parameter for the tolerance on diagonal elements. Small diagonal elements are replaced by the specified tolerance, preserving the sign, and the info array reports the number of replacements. This is useful in the context of static pivoting used in sparse solvers such as SuperLU, where the tolerance could be, for example, the norm of the matrix scaled by the machine epsilon.

Parameters
[in]MArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of rows of each matrix A. M[i] >= 0.
[in]NArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of columns of each matrix A. N[i] >= 0.
[in,out]dA_arrayArray of pointers on the GPU, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDA[i],N[i]). On entry, each pointer is an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = L*U; the unit diagonal elements of L are not stored.
[in]lddaArray of INTEGERs on the GPU Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]).
[in]dtol_arrayArray of REALs (single precision), dimension (batchCount), for corresponding matrices. Each is the tolerance that is compared to the diagonal element before the column is scaled by its inverse. If the value of the diagonal is less than the threshold, the diagonal is replaced by the threshold. If the array is set to NULL, then the threshold is set to the eps parameter.
[in]epsREAL The value to use for the tolerance for all matrices if the dtol_array is NULL.
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as memory allocation failed.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations. If a tolerance array is specified, the value shows the number of times a tiny pivot was replaced.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
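
The tiny-pivot replacement can be illustrated with an unblocked, host-side LU without pivoting. This is a sketch under simplifying assumptions, not MAGMA's implementation: it works on a single real `float` matrix (so "sign" is unambiguous), compares the pivot's absolute value to the tolerance, treats a zero pivot as positive, and returns the replacement count that the routine would report through the info array. `lu_nopiv_tol` is a hypothetical name.

```c
#include <assert.h>

/* In-place LU without pivoting on a column-major m-by-n real matrix.
 * Before each column is scaled, a pivot with |pivot| < tol is replaced by
 * tol carrying the pivot's sign. Returns the number of replacements. */
int lu_nopiv_tol(int m, int n, float *A, int lda, float tol)
{
    assert(m >= 0 && n >= 0 && lda >= (m > 1 ? m : 1) && tol > 0.0f);
    int replaced = 0;
    int kmax = m < n ? m : n;
    for (int k = 0; k < kmax; ++k) {
        float *pivot = &A[k + k * lda];
        float mag = (*pivot < 0.0f) ? -*pivot : *pivot;
        if (mag < tol) {                       /* tiny pivot: clamp to +/- tol */
            *pivot = (*pivot < 0.0f) ? -tol : tol;
            ++replaced;
        }
        for (int i = k + 1; i < m; ++i)        /* scale column k: multipliers of L */
            A[i + k * lda] /= *pivot;
        for (int j = k + 1; j < n; ++j)        /* rank-1 update of the trailing block */
            for (int i = k + 1; i < m; ++i)
                A[i + j * lda] -= A[i + k * lda] * A[k + j * lda];
    }
    return replaced;
}
```

Clamping keeps the multipliers bounded by 1/tol times the column entries, which is the point of static pivoting: the factorization always completes, at the cost of factoring a slightly perturbed matrix.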

◆ magma_cgetrf_nopiv_vbatched()

magma_int_t magma_cgetrf_nopiv_vbatched ( magma_int_t * m,
magma_int_t * n,
magmaFloatComplex ** dA_array,
magma_int_t * ldda,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

CGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting.

The factorization has the form A = L * U where L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.

Parameters
[in]MArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of rows of each matrix A. M[i] >= 0.
[in]NArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of columns of each matrix A. N[i] >= 0.
[in,out]dA_arrayArray of pointers on the GPU, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDA[i],N[i]). On entry, each pointer is an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = L*U; the unit diagonal elements of L are not stored.
[in]lddaArray of INTEGERs on the GPU Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as memory allocation failed.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

◆ magma_cgetrf_recpanel_batched()

magma_int_t magma_cgetrf_recpanel_batched ( magma_int_t m,
magma_int_t n,
magma_int_t min_recpnb,
magmaFloatComplex ** dA_array,
magma_int_t ai,
magma_int_t aj,
magma_int_t ldda,
magma_int_t ** dipiv_array,
magma_int_t ** dpivinfo_array,
magma_int_t * info_array,
magma_int_t gbstep,
magma_int_t batchCount,
magma_queue_t queue )

This is an internal routine that may rely on many assumptions.

Its documentation is incomplete.

CGETRF_PANEL computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

Parameters
[in]mINTEGER The number of rows of each matrix A. M >= 0.
[in]nINTEGER The number of columns of each matrix A. N >= 0.
[in]min_recpnbINTEGER. Internal use. The recursive nb
[in,out]dA_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in]aiINTEGER Row offset for A.
[in]ajINTEGER Column offset for A.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= max(1,M).
[out]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]dpivinfo_arrayArray of pointers, dimension (batchCount), for internal use.
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as memory allocation failed.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]gbstepINTEGER internal use.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

◆ magma_cgetrf_recpanel_native()

magma_int_t magma_cgetrf_recpanel_native ( magma_int_t m,
magma_int_t n,
magma_int_t recnb,
magmaFloatComplex_ptr dA,
magma_int_t ldda,
magma_int_t * dipiv,
magma_int_t * dipivinfo,
magma_int_t * dinfo,
magma_int_t gbstep,
magma_event_t events[2],
magma_queue_t queue,
magma_queue_t update_queue )

This is an internal routine.

CGETRF_PANEL computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a GPU-only routine. The host CPU is not used.

Parameters
[in]mINTEGER The number of rows of the matrix A. M >= 0.
[in]nINTEGER The number of columns of the matrix A. N >= 0.
[in,out]dAA COMPLEX array on the GPU, dimension (LDDA,N). On entry, an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in]lddaINTEGER The leading dimension of A. LDDA >= max(1,M).
[out]dipivAn INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]dipivinfoAn INTEGER array, for internal use.
[out]dinfoINTEGER, stored on the GPU
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as memory allocation failed.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]gbstepINTEGER internal use.
[in]queuesArray of magma_queue_t, size 2 Queues to execute in.

◆ magma_cgetrf_vbatched_max_nocheck_work()

magma_int_t magma_cgetrf_vbatched_max_nocheck_work ( magma_int_t * m,
magma_int_t * n,
magma_int_t max_m,
magma_int_t max_n,
magma_int_t max_minmn,
magma_int_t max_mxn,
magmaFloatComplex ** dA_array,
magma_int_t * ldda,
magma_int_t ** dipiv_array,
magma_int_t * info_array,
void * work,
magma_int_t * lwork,
magma_int_t batchCount,
magma_queue_t queue )

CGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.

Parameters
[in]MArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of rows of each matrix A. M[i] >= 0.
[in]NArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of columns of each matrix A. N[i] >= 0.
[in]MAX_MINTEGER The maximum number of rows across the batch
[in]MAX_NINTEGER The maximum number of columns across the batch
[in]MAX_MINMNINTEGER The maximum value of min(Mi, Ni) for i = 1, 2, ..., batchCount
[in]MAX_MxNINTEGER The maximum value of the product (Mi x Ni) for i = 1, 2, ..., batchCount
[in,out]dA_arrayArray of pointers on the GPU, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDA[i],N[i]). On entry, each pointer is an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in]lddaArray of INTEGERs on the GPU Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]).
[out]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M[i],N[i])) The pivot indices; for 1 <= p <= min(M[i],N[i]), row p of the matrix was interchanged with row IPIV(p).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as memory allocation failed.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]WORKVOID pointer A workspace of size LWORK[0]
[in,out]LWORKINTEGER pointer If lwork[0] < 0, a workspace query is assumed, and lwork[0] is overwritten by the required workspace size in bytes. Otherwise, lwork[0] is the size of work
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

◆ magma_cgetrf_vbatched()

magma_int_t magma_cgetrf_vbatched ( magma_int_t * m,
magma_int_t * n,
magmaFloatComplex ** dA_array,
magma_int_t * ldda,
magma_int_t ** dipiv_array,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

CGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.

Parameters
[in]MArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of rows of each matrix A. M[i] >= 0.
[in]NArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of columns of each matrix A. N[i] >= 0.
[in,out]dA_arrayArray of pointers on the GPU, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDA[i],N[i]). On entry, each pointer is an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in]lddaArray of INTEGERs on the GPU Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]).
[out]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M[i],N[i])) The pivot indices; for 1 <= p <= min(M[i],N[i]), row p of the matrix was interchanged with row IPIV(p).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as memory allocation failed.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

◆ magma_dgbsv_batched_work()

magma_int_t magma_dgbsv_batched_work ( magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magma_int_t nrhs,
double ** dA_array,
magma_int_t ldda,
magma_int_t ** dipiv_array,
double ** dB_array,
magma_int_t lddb,
magma_int_t * info_array,
void * device_work,
magma_int_t * lwork,
magma_int_t batchCount,
magma_queue_t queue )

DGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.

The LU decomposition with partial pivoting and row interchanges is used to factor A as A = L * U, where L is a product of permutation and unit lower triangular matrices with KL subdiagonals, and U is upper triangular with KL+KU superdiagonals. The factored form of A is then used to solve the system of equations A * X = B.

This is the batched version of the routine.

Parameters
[in]nINTEGER The order of the matrix A. n >= 0.
[in]klINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]kuINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in]nrhsINTEGER The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0.
[in,out]dA_arrayArray of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array, dimension (LDDA,N). On entry, the band matrix A in band storage, in rows KL+1 to 2*KL+KU+1. On exit, the details of the LU factorization of A, as computed by DGBTRF: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= (2*KL+KU+1).
[out]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i).
[in,out]dB_arrayArray of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array, dimension (LDDB,NRHS). On entry, the right hand side matrix B. On exit, the solution matrix X.
[in]lddbINTEGER The leading dimension of each array B. LDDB >= max(1, N).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a memory allocation failure.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and the solution has not been computed.
[in,out]device_workWorkspace, allocated on device memory.
[in,out]lworkINTEGER pointer The size of the workspace (device_work) in bytes
  • lwork[0] < 0: a workspace query is assumed, the routine calculates the required amount of workspace and returns it in lwork. The workspace is not referenced, and no computation is performed.
  • lwork[0] >= 0: the routine assumes that the user has provided a workspace with the size in lwork.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
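
The two-call workspace protocol described for lwork (query with a negative value, allocate, then call again) can be sketched as follows. This is only an illustration of the calling pattern: `mock_work_routine` is a hypothetical stand-in, not a MAGMA function, its size formula and return codes are made up, and host `malloc` stands in for a device allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a *_batched_work routine. It only mimics the
 * lwork contract described above; the size formula is arbitrary. */
static int mock_work_routine(int n, int batchCount, void *work, int *lwork)
{
    assert(lwork != NULL);
    int required = n * batchCount * (int)sizeof(double); /* made-up size */
    if (lwork[0] < 0) {
        /* Workspace query: report the size, touch nothing else. */
        lwork[0] = required;
        return 0;
    }
    if (lwork[0] < required || (required > 0 && work == NULL))
        return -1; /* illustrative error code: workspace too small */
    /* ... a real routine would do its computation here ... */
    return 0;
}

/* Two-call pattern: query the size, allocate, then call with the buffer. */
static int solve_with_workspace(int n, int batchCount)
{
    int lwork[1] = { -1 };
    int status = mock_work_routine(n, batchCount, NULL, lwork); /* query */
    if (status != 0)
        return status;

    void *work = malloc(lwork[0] > 0 ? (size_t)lwork[0] : 1);
    if (work == NULL)
        return -2; /* illustrative error code: allocation failed */

    status = mock_work_routine(n, batchCount, work, lwork);     /* compute */
    free(work);
    return status;
}
```

Querying first keeps the caller independent of how the required size is derived, which may change between library versions.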

◆ magma_dgbsv_batched()

magma_int_t magma_dgbsv_batched ( magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magma_int_t nrhs,
double ** dA_array,
magma_int_t ldda,
magma_int_t ** dipiv_array,
double ** dB_array,
magma_int_t lddb,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

DGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.

The LU decomposition with partial pivoting and row interchanges is used to factor A as A = L * U, where L is a product of permutation and unit lower triangular matrices with KL subdiagonals, and U is upper triangular with KL+KU superdiagonals. The factored form of A is then used to solve the system of equations A * X = B.

This is the batched version of the routine.

Parameters
[in]nINTEGER The order of the matrix A. n >= 0.
[in]klINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]kuINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in]nrhsINTEGER The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0.
[in,out]dA_arrayArray of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array, dimension (LDDA,N). On entry, the band matrix A in band storage, in rows KL+1 to 2*KL+KU+1. On exit, the details of the LU factorization of A, as computed by DGBTRF: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= (2*KL+KU+1).
[out]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i).
[in,out]dB_arrayArray of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array, dimension (LDDB,NRHS). On entry, the right hand side matrix B. On exit, the solution matrix X.
[in]lddbINTEGER The leading dimension of each array B. LDDB >= max(1, N).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a memory allocation failure.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

◆ magma_dgbtrf_batched_work()

magma_int_t magma_dgbtrf_batched_work ( magma_int_t m,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
double ** dAB_array,
magma_int_t lddab,
magma_int_t ** dipiv_array,
magma_int_t * info_array,
void * device_work,
magma_int_t * lwork,
magma_int_t batchCount,
magma_queue_t queue )

DGBTRF computes an LU factorization of a real m-by-n band matrix AB using partial pivoting with row interchanges.

This is a batched version that factors batchCount M-by-N matrices in parallel. dAB, dipiv, and info become arrays with one entry per matrix.

The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:

On entry:                        On exit:

     *    *    *    +    +    +       *    *    *   u14  u25  u36
     *    *    +    +    +    +       *    *   u13  u24  u35  u46
     *   a12  a23  a34  a45  a56      *   u12  u23  u34  u45  u56
    a11  a22  a33  a44  a55  a66     u11  u22  u33  u44  u55  u66
    a21  a32  a43  a54  a65   *      m21  m32  m43  m54  m65   *
    a31  a42  a53  a64   *    *      m31  m42  m53  m64   *    *

Note that this behavior is a little different from the standard LAPACK routine. Array elements marked * are not read by the routine, but may be zeroed out after completion. Elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.
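
The mapping AB(kl+ku+1+i-j,j) = A(i,j) used by this storage scheme can be illustrated with a small host-side helper. This is a minimal sketch, not part of MAGMA: `dense_to_band` is a hypothetical name, it works on host arrays of plain `double` in column-major order, and it leaves every entry outside the band (the `*` and `+` positions) untouched.

```c
#include <assert.h>

/* Copy the band of a dense column-major M-by-N matrix A (leading dimension
 * lda) into band storage AB (leading dimension ldab >= 2*kl+ku+1) using
 *     AB(kl+ku+1+i-j, j) = A(i,j)  for max(1,j-ku) <= i <= min(m,j+kl).
 * The formula is 1-based; the C arrays are 0-based. */
static void dense_to_band(int m, int n, int kl, int ku,
                          const double *A, int lda,
                          double *AB, int ldab)
{
    assert(ldab >= 2 * kl + ku + 1);
    for (int j = 1; j <= n; ++j) {
        int ilo = (j - ku > 1) ? j - ku : 1;
        int ihi = (j + kl < m) ? j + kl : m;
        for (int i = ilo; i <= ihi; ++i) {
            int row = kl + ku + 1 + i - j;   /* 1-based row inside AB */
            AB[(row - 1) + (j - 1) * ldab] = A[(i - 1) + (j - 1) * lda];
        }
    }
}
```

For the M = N = 6, KL = 2, KU = 1 example, a11 lands in row 4 of column 1 and a12 in row 3 of column 2, matching the "On entry" layout.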

Parameters
[in]mINTEGER The number of rows of each matrix A. M >= 0.
[in]nINTEGER The number of columns of each matrix A. N >= 0.
[in]klINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]kuINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in,out]dAB_arrayArray of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDAB,N). On entry, the matrix A in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl).

On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See above for details about the band storage.

[in]lddabINTEGER The leading dimension of each array AB. LDDAB >= (2*KL+KU+1).
[out]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a memory allocation failure.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in,out]device_workWorkspace, allocated on device memory
[in,out]lworkINTEGER pointer The size of the workspace (device_work) in bytes
  • lwork[0] < 0: a workspace query is assumed, the routine calculates the required amount of workspace and returns it in lwork. The workspace is not referenced, and no factorization is performed.
  • lwork[0] >= 0: the routine assumes that the user has provided a workspace with the size in lwork.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
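
Because each matrix in the batch reports its own INFO value, a caller typically scans the whole array after the call. The sketch below assumes the values have already been copied into a host array of plain `int` (the copy itself is not shown); `batch_status` and `summarize_info` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Tally of per-matrix INFO codes, following the convention listed above. */
typedef struct {
    int ok;          /* INFO == 0: successful exit                 */
    int bad_arg;     /* INFO <  0: illegal argument or other error */
    int singular;    /* INFO >  0: U(i,i) is exactly zero          */
    int first_fail;  /* batch index of first nonzero INFO, or -1   */
} batch_status;

static batch_status summarize_info(const int *info, int batchCount)
{
    assert(batchCount == 0 || info != NULL);
    batch_status s = { 0, 0, 0, -1 };
    for (int b = 0; b < batchCount; ++b) {
        if (info[b] == 0)
            s.ok++;
        else if (info[b] < 0)
            s.bad_arg++;
        else
            s.singular++;
        if (info[b] != 0 && s.first_fail < 0)
            s.first_fail = b;
    }
    return s;
}
```

A matrix counted as singular still has a completed factorization, so only its subsequent solve needs to be skipped.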

◆ magma_dgbtrf_batched()

magma_int_t magma_dgbtrf_batched ( magma_int_t m,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
double ** dAB_array,
magma_int_t lddab,
magma_int_t ** dipiv_array,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

DGBTRF computes an LU factorization of a real m-by-n band matrix A using partial pivoting with row interchanges.

This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.

Parameters
[in]mINTEGER The number of rows of the matrix A. M >= 0.
[in]nINTEGER The number of columns of the matrix A. N >= 0.
[in]klINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]kuINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in,out]dAB_arrayArray of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array, dimension (LDDAB,N) On entry, the matrix AB in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl)

On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See below for further details.

[in]lddabINTEGER The leading dimension of the array AB. LDDAB >= 2*KL+KU+1.
[out]dipiv_arrayArray of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]info_arrayINTEGER array, dimension (batchCount). Each is the INFO output for a given matrix.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value
  • > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

Further Details

The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:

On entry:                        On exit:

     *    *    *    +    +    +       *    *    *   u14  u25  u36
     *    *    +    +    +    +       *    *   u13  u24  u35  u46
     *   a12  a23  a34  a45  a56      *   u12  u23  u34  u45  u56
    a11  a22  a33  a44  a55  a66     u11  u22  u33  u44  u55  u66
    a21  a32  a43  a54  a65   *      m21  m32  m43  m54  m65   *
    a31  a42  a53  a64   *    *      m31  m42  m53  m64   *    *

Array elements marked * are not used by the routine; elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.

◆ magma_dgbtrs_batched()

magma_int_t magma_dgbtrs_batched ( magma_trans_t transA,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magma_int_t nrhs,
double ** dA_array,
magma_int_t ldda,
magma_int_t ** dipiv_array,
double ** dB_array,
magma_int_t lddb,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

DGBTRS solves a system of linear equations A * X = B, A**T * X = B, or A**H * X = B with a general band matrix A using the LU factorization computed by DGBTRF.

This is the batched version of the routine. Currently, only the no-transpose case (A * X = B) is supported.

Parameters
[in]transAmagma_trans_t Specifies the form of the system of equations. Currently, only MagmaNoTrans (A * X = B) is supported.
[in]nINTEGER The order of the matrix A. n >= 0.
[in]klINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]kuINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in]nrhsINTEGER The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0.
[in]dA_arrayArray of pointers, dimension (batchCount). Each contains the details of the LU factorization of the band matrix A, as computed by DGBTRF. U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= (2*KL+KU+1).
[in]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i).
[in,out]dB_arrayArray of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array, dimension (LDDB,NRHS). On entry, the right hand side matrix B. On exit, the solution matrix X.
[in]lddbINTEGER The leading dimension of each array B. LDDB >= max(1, N).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a memory allocation failure.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

◆ magma_dgetrf_batched()

magma_int_t magma_dgetrf_batched ( magma_int_t m,
magma_int_t n,
double ** dA_array,
magma_int_t ldda,
magma_int_t ** ipiv_array,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

DGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

Parameters
[in]mINTEGER The number of rows of each matrix A. M >= 0.
[in]nINTEGER The number of columns of each matrix A. N >= 0.
[in,out]dA_arrayArray of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= max(1,M).
[out]ipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a memory allocation failure.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
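
The pivot indices record a sequence of row interchanges rather than a permutation vector. A minimal host-side sketch of the conversion follows, assuming the indices for one matrix have been copied into a host array of plain `int`; `ipiv_to_perm` is a hypothetical helper, not a MAGMA routine.

```c
#include <assert.h>

/* Convert LAPACK-style pivot indices into a 0-based permutation.
 * ipiv is 1-based: for i = 1..k (in order), row i was interchanged with
 * row ipiv[i-1]. On return, perm[r] is the index of the original row of A
 * that ends up in row r after all interchanges, i.e. row r of P^T * A. */
static void ipiv_to_perm(int m, int k, const int *ipiv, int *perm)
{
    for (int r = 0; r < m; ++r)
        perm[r] = r;
    for (int i = 0; i < k; ++i) {
        int p = ipiv[i] - 1;        /* 0-based row swapped with row i */
        assert(p >= i && p < m);    /* partial pivoting never looks above row i */
        int tmp = perm[i];
        perm[i] = perm[p];
        perm[p] = tmp;
    }
}
```

For example, with m = 3 and pivots {3, 3, 3}, the resulting permutation is {2, 0, 1}: the original last row becomes the first row.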

◆ magma_dgetrf_nopiv_vbatched_max_nocheck_work()

magma_int_t magma_dgetrf_nopiv_vbatched_max_nocheck_work ( magma_int_t * m,
magma_int_t * n,
magma_int_t max_m,
magma_int_t max_n,
magma_int_t max_minmn,
magma_int_t max_mxn,
double ** dA_array,
magma_int_t * ldda,
double * dtol_array,
double eps,
magma_int_t * info_array,
void * work,
magma_int_t * lwork,
magma_int_t batchCount,
magma_queue_t queue )

DGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting.

It replaces tiny pivots smaller than a specified tolerance by that tolerance.

The factorization has the form A = L * U where L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.

Parameters
[in]MArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of rows of each matrix A. M[i] >= 0.
[in]NArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of columns of each matrix A. N[i] >= 0.
[in]MAX_MINTEGER The maximum number of rows across the batch
[in]MAX_NINTEGER The maximum number of columns across the batch
[in]MAX_MINMNINTEGER The maximum value of min(Mi, Ni) for i = 1, 2, ..., batchCount
[in]MAX_MxNINTEGER The maximum value of the product (Mi x Ni) for i = 1, 2, ..., batchCount
[in,out]dA_arrayArray of pointers on the GPU, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDA[i],N[i]). On entry, each pointer is an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = L*U; the unit diagonal elements of L are not stored.
[in]lddaArray of INTEGERs on the GPU Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]).
[in]dtol_arrayArray of DOUBLEs, dimension (batchCount), for corresponding matrices. Each is the tolerance that is compared to the diagonal element before the column is scaled by its inverse. If the value of the diagonal is less than the threshold, the diagonal is replaced by the threshold. If the array is set to NULL, then the threshold is set to the eps parameter.
[in]epsDOUBLE The value to use for the tolerance for all matrices if the dtol_array is NULL
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a memory allocation failure.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations. If a tolerance array is specified, the value shows the number of times a tiny pivot was replaced.
[in]WORKVOID pointer A workspace of size LWORK[0]
[in,out]LWORKINTEGER pointer If lwork[0] < 0, a workspace query is assumed, and lwork[0] is overwritten by the required workspace size in bytes. Otherwise, lwork[0] is the size of work.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
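
The four MAX_* arguments are reductions over the per-matrix sizes. A minimal sketch of computing them is shown below, assuming host copies of the size arrays as plain `int`; `batch_maxima` and `compute_batch_maxima` are hypothetical names, not MAGMA API.

```c
#include <assert.h>

/* Batch-wide maxima matching MAX_M, MAX_N, MAX_MINMN and MAX_MxN. */
typedef struct {
    int max_m;      /* max over i of M[i]              */
    int max_n;      /* max over i of N[i]              */
    int max_minmn;  /* max over i of min(M[i], N[i])   */
    int max_mxn;    /* max over i of M[i] * N[i]       */
} batch_maxima;

static batch_maxima compute_batch_maxima(const int *m, const int *n,
                                         int batchCount)
{
    batch_maxima r = { 0, 0, 0, 0 };
    for (int b = 0; b < batchCount; ++b) {
        assert(m[b] >= 0 && n[b] >= 0);
        int minmn = (m[b] < n[b]) ? m[b] : n[b];
        int mxn   = m[b] * n[b];
        if (m[b]  > r.max_m)     r.max_m     = m[b];
        if (n[b]  > r.max_n)     r.max_n     = n[b];
        if (minmn > r.max_minmn) r.max_minmn = minmn;
        if (mxn   > r.max_mxn)   r.max_mxn   = mxn;
    }
    return r;
}
```

Note that the four maxima can come from different matrices: max_mxn is not necessarily max_m times max_n.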

◆ magma_dgetrf_nopiv_expert_vbatched()

magma_int_t magma_dgetrf_nopiv_expert_vbatched ( magma_int_t * m,
magma_int_t * n,
double ** dA_array,
magma_int_t * ldda,
double * dtol_array,
double eps,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

DGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting.

It replaces tiny pivots smaller than a specified tolerance by that tolerance.

The factorization has the form A = L * U where L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.

This is the expert version, which takes an extra parameter for the tolerance on diagonal elements. Small diagonal elements are replaced by the specified tolerance, preserving the sign, and the info array reports the number of replacements. This is useful in the context of static pivoting used in sparse solvers such as SuperLU, where the tolerance would be, for example, the norm of the matrix scaled by the machine epsilon.

Parameters
[in]MArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of rows of each matrix A. M[i] >= 0.
[in]NArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of columns of each matrix A. N[i] >= 0.
[in,out]dA_arrayArray of pointers on the GPU, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDA[i],N[i]). On entry, each pointer is an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = L*U; the unit diagonal elements of L are not stored.
[in]lddaArray of INTEGERs on the GPU Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]).
[in]dtol_arrayArray of DOUBLEs, dimension (batchCount), for corresponding matrices. Each is the tolerance that is compared to the diagonal element before the column is scaled by its inverse. If the value of the diagonal is less than the threshold, the diagonal is replaced by the threshold. If the array is set to NULL, then the threshold is set to the eps parameter.
[in]epsDOUBLE The value to use for the tolerance for all matrices if the dtol_array is NULL
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a memory allocation failure.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations. If a tolerance array is specified, the value shows the number of times a tiny pivot was replaced.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
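
The tiny-pivot replacement can be illustrated with an unblocked host-side reference for a single matrix. This is a sketch under stated assumptions, not the GPU implementation: `lu_nopiv_tol` is a hypothetical name, the comparison is assumed to be on the absolute value of the diagonal entry, and the replacement keeps the sign of the original entry as described above.

```c
#include <assert.h>

/* Unblocked LU without pivoting on a column-major m-by-n matrix A
 * (leading dimension lda), overwriting A with L (unit diagonal, not
 * stored) and U. A diagonal entry with |a_kk| < tol is replaced by tol
 * carrying the sign of a_kk. Returns the number of replacements. */
static int lu_nopiv_tol(int m, int n, double *A, int lda, double tol)
{
    assert(lda >= (m > 1 ? m : 1));
    int replaced = 0;
    int kmax = (m < n) ? m : n;
    for (int k = 0; k < kmax; ++k) {
        double piv = A[k + k * lda];
        double mag = (piv < 0.0) ? -piv : piv;
        if (mag < tol) {
            piv = (piv < 0.0) ? -tol : tol;   /* keep the sign */
            A[k + k * lda] = piv;
            ++replaced;
        }
        for (int i = k + 1; i < m; ++i)        /* scale column k */
            A[i + k * lda] /= piv;
        for (int j = k + 1; j < n; ++j)        /* trailing update */
            for (int i = k + 1; i < m; ++i)
                A[i + j * lda] -= A[i + k * lda] * A[k + j * lda];
    }
    return replaced;
}
```

With a zero leading entry and tol = 0.5, the first pivot is replaced by 0.5 and the routine returns 1, mirroring how the info array counts replacements.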

◆ magma_dgetrf_nopiv_vbatched()

magma_int_t magma_dgetrf_nopiv_vbatched ( magma_int_t * m,
magma_int_t * n,
double ** dA_array,
magma_int_t * ldda,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

DGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting.

The factorization has the form A = L * U where L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.

Parameters
[in]MArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of rows of each matrix A. M[i] >= 0.
[in]NArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of columns of each matrix A. N[i] >= 0.
[in,out]dA_arrayArray of pointers on the GPU, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDA[i],N[i]). On entry, each pointer is an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = L*U; the unit diagonal elements of L are not stored.
[in]lddaArray of INTEGERs on the GPU Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a memory allocation failure.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
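
Since every matrix has its own size and leading dimension, one common way to prepare a variable-size batch is to place all matrices in one contiguous buffer and derive each pointer from an element offset. The sketch below shows only that offset arithmetic on the host; the packed layout is an assumption chosen for illustration (the routine itself only requires valid per-matrix pointers), and `vbatch_offsets` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Fill offsets[0..batchCount] so that matrix b occupies elements
 * [offsets[b], offsets[b+1]) of a single contiguous buffer, where each
 * matrix needs ldda[b] * n[b] elements. Returns the total element count. */
static size_t vbatch_offsets(const int *m, const int *n, const int *ldda,
                             int batchCount, size_t *offsets)
{
    size_t total = 0;
    for (int b = 0; b < batchCount; ++b) {
        assert(m[b] >= 0 && n[b] >= 0);
        assert(ldda[b] >= (m[b] > 1 ? m[b] : 1));  /* LDDA[i] >= max(1,M[i]) */
        offsets[b] = total;
        total += (size_t)ldda[b] * (size_t)n[b];
    }
    offsets[batchCount] = total;
    return total;
}
```

The pointer for matrix b would then be the buffer base plus offsets[b], and the total gives the allocation size in elements.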

◆ magma_dgetrf_recpanel_batched()

magma_int_t magma_dgetrf_recpanel_batched ( magma_int_t m,
magma_int_t n,
magma_int_t min_recpnb,
double ** dA_array,
magma_int_t ai,
magma_int_t aj,
magma_int_t ldda,
magma_int_t ** dipiv_array,
magma_int_t ** dpivinfo_array,
magma_int_t * info_array,
magma_int_t gbstep,
magma_int_t batchCount,
magma_queue_t queue )

This is an internal routine that might make many assumptions.

The documentation is incomplete.

DGETRF_PANEL computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

Parameters
[in]mINTEGER The number of rows of each matrix A. M >= 0.
[in]nINTEGER The number of columns of each matrix A. N >= 0.
[in]min_recpnbINTEGER Internal use. The recursive nb.
[in,out]dA_arrayArray of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in]aiINTEGER Row offset for A.
[in]ajINTEGER Column offset for A.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= max(1,M).
[out]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]dpivinfo_arrayArray of pointers, dimension (batchCount), for internal use.
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a memory allocation failure.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]gbstepINTEGER internal use.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

◆ magma_dgetrf_recpanel_native()

magma_int_t magma_dgetrf_recpanel_native ( magma_int_t m,
magma_int_t n,
magma_int_t recnb,
magmaDouble_ptr dA,
magma_int_t ldda,
magma_int_t * dipiv,
magma_int_t * dipivinfo,
magma_int_t * dinfo,
magma_int_t gbstep,
magma_event_t events[2],
magma_queue_t queue,
magma_queue_t update_queue )

This is an internal routine.

DGETRF_PANEL computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a GPU-only routine. The host CPU is not used.

Parameters
[in]mINTEGER The number of rows of the matrix A. M >= 0.
[in]nINTEGER The number of columns of the matrix A. N >= 0.
[in,out]dAA DOUBLE PRECISION array on the GPU, dimension (LDDA,N). On entry, an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in]lddaINTEGER The leading dimension of A. LDDA >= max(1,M).
[out]dipivAn INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]dipivinfoAn INTEGER array, for internal use.
[out]dinfoINTEGER, stored on the GPU
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a memory allocation failure.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]gbstepINTEGER internal use.
[in]queuesArray of magma_queue_t, size 2 Queues to execute in.

◆ magma_dgetrf_vbatched_max_nocheck_work()

magma_int_t magma_dgetrf_vbatched_max_nocheck_work ( magma_int_t * m,
magma_int_t * n,
magma_int_t max_m,
magma_int_t max_n,
magma_int_t max_minmn,
magma_int_t max_mxn,
double ** dA_array,
magma_int_t * ldda,
magma_int_t ** dipiv_array,
magma_int_t * info_array,
void * work,
magma_int_t * lwork,
magma_int_t batchCount,
magma_queue_t queue )

DGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.

Parameters
[in]MArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of rows of each matrix A. M[i] >= 0.
[in]NArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of columns of each matrix A. N[i] >= 0.
[in]MAX_MINTEGER The maximum number of rows across the batch
[in]MAX_NINTEGER The maximum number of columns across the batch
[in]MAX_MINMNINTEGER The maximum value of min(Mi, Ni) for i = 1, 2, ..., batchCount
[in]MAX_MxNINTEGER The maximum value of the product (Mi x Ni) for i = 1, 2, ..., batchCount
[in,out]dA_arrayArray of pointers on the GPU, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDA[i],N[i]). On entry, each pointer is an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in]lddaArray of INTEGERs on the GPU Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]).
[out]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M[i],N[i])) The pivot indices; for 1 <= p <= min(M[i],N[i]), row p of the matrix was interchanged with row IPIV(p).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a memory allocation failure.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]WORKVOID pointer A workspace of size LWORK[0]
[in,out]LWORKINTEGER pointer If lwork[0] < 0, a workspace query is assumed, and lwork[0] is overwritten by the required workspace size in bytes. Otherwise, lwork[0] is the size of work.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

◆ magma_dgetrf_vbatched()

magma_int_t magma_dgetrf_vbatched ( magma_int_t * m,
magma_int_t * n,
double ** dA_array,
magma_int_t * ldda,
magma_int_t ** dipiv_array,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

DGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.

Parameters
[in]MArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of rows of each matrix A. M[i] >= 0.
[in]NArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of columns of each matrix A. N[i] >= 0.
[in,out]dA_arrayArray of pointers on the GPU, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDA[i],N[i]). On entry, each pointer is an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in]lddaArray of INTEGERs on the GPU Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]).
[out]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M[i],N[i])) The pivot indices; for 1 <= p <= min(M[i],N[i]), row p of the matrix was interchanged with row IPIV(p).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
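
The pivot convention described for dipiv_array (row p was interchanged with row IPIV(p), with 1-based indices) can be replayed on the host. The following is a minimal sketch, not MAGMA code: apply_row_pivots is a hypothetical helper, and it assumes the pivot indices have been copied to host memory and fit in a plain int.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side helper (not part of MAGMA): replay the row
 * interchanges recorded in a 1-based pivot array on a vector x.
 * For p = 1..npiv, entry p is swapped with entry ipiv[p-1].
 * Assumes npiv <= len. */
static void apply_row_pivots(double *x, size_t len, const int *ipiv, size_t npiv)
{
    for (size_t p = 0; p < npiv; ++p) {
        size_t q = (size_t)(ipiv[p] - 1);   /* convert 1-based to 0-based */
        assert(p < len && q < len);
        if (q != p) {
            double tmp = x[p];
            x[p] = x[q];
            x[q] = tmp;
        }
    }
}
```

Applying the interchanges in increasing order of p matters, because later swaps act on rows that earlier swaps may already have moved.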

◆ magma_sgbsv_batched_work()

magma_int_t magma_sgbsv_batched_work ( magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magma_int_t nrhs,
float ** dA_array,
magma_int_t ldda,
magma_int_t ** dipiv_array,
float ** dB_array,
magma_int_t lddb,
magma_int_t * info_array,
void * device_work,
magma_int_t * lwork,
magma_int_t batchCount,
magma_queue_t queue )

SGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.

The LU decomposition with partial pivoting and row interchanges is used to factor A as A = L * U, where L is a product of permutation and unit lower triangular matrices with KL subdiagonals, and U is upper triangular with KL+KU superdiagonals. The factored form of A is then used to solve the system of equations A * X = B.

This is the batched version of the routine.

Parameters
[in]nINTEGER The order of the matrix A. n >= 0.
[in]klINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]kuINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in]nrhsINTEGER The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0.
[in,out]dA_arrayArray of pointers, dimension (batchCount). Each is a REAL array, dimension (LDDA,N). On entry, the band matrix A in band storage. On exit, the details of the LU factorization of A, as computed by SGBTRF: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= (2*KL+KU+1).
[out]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i).
[in,out]dB_arrayArray of pointers, dimension (batchCount). Each is a REAL array, dimension (LDDB,NRHS). On entry, the right hand side matrix B. On exit, the solution matrix X.
[in]lddbINTEGER The leading dimension of each array B. LDDB >= max(1, N).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and the solution has not been computed.
[in,out]device_workWorkspace, allocated on device memory.
[in,out]lworkINTEGER pointer The size of the workspace (device_work) in bytes
  • lwork[0] < 0: a workspace query is assumed, the routine calculates the required amount of workspace and returns it in lwork. The workspace is not referenced, and no computation is performed.
  • lwork[0] >= 0: the routine assumes that the user has provided a workspace with the size in lwork.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
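
The lwork convention described above implies a two-call pattern: query the size with a negative lwork[0], allocate, then call again. The sketch below illustrates only that protocol. mock_batched_work is a hypothetical stand-in, not a MAGMA function; its size formula is made up, and it uses host memory where a real call would need a device allocation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a *_batched_work routine; it is NOT a MAGMA
 * function. It only mimics the documented lwork protocol:
 *   lwork[0] <  0 : query; write the required size in bytes, do no work
 *   lwork[0] >= 0 : size of the caller-provided workspace in bytes */
static int mock_batched_work(int batch_count, void *work, long *lwork)
{
    long required = (long)batch_count * 64;   /* made-up size for illustration */
    if (lwork[0] < 0) {
        lwork[0] = required;   /* workspace query: report and return */
        return 0;
    }
    if (lwork[0] < required || (required > 0 && work == NULL))
        return -1;             /* workspace too small or missing */
    return 0;                  /* the real computation would run here */
}

/* Two-call pattern: query the size, allocate, then call again. */
static int run_with_workspace(int batch_count)
{
    long lwork = -1;
    int status = mock_batched_work(batch_count, NULL, &lwork);
    if (status != 0)
        return status;

    void *work = malloc(lwork > 0 ? (size_t)lwork : 1);
    if (work == NULL)
        return -2;

    status = mock_batched_work(batch_count, work, &lwork);
    free(work);
    return status;
}
```

Keeping the query and the compute call behind the same entry point means the size calculation cannot drift from what the routine actually uses.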

◆ magma_sgbsv_batched()

magma_int_t magma_sgbsv_batched ( magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magma_int_t nrhs,
float ** dA_array,
magma_int_t ldda,
magma_int_t ** dipiv_array,
float ** dB_array,
magma_int_t lddb,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

SGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.

The LU decomposition with partial pivoting and row interchanges is used to factor A as A = L * U, where L is a product of permutation and unit lower triangular matrices with KL subdiagonals, and U is upper triangular with KL+KU superdiagonals. The factored form of A is then used to solve the system of equations A * X = B.

This is the batched version of the routine.

Parameters
[in]nINTEGER The order of the matrix A. n >= 0.
[in]klINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]kuINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in]nrhsINTEGER The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0.
[in,out]dA_arrayArray of pointers, dimension (batchCount). Each is a REAL array, dimension (LDDA,N). On entry, the band matrix A in band storage. On exit, the details of the LU factorization of A, as computed by SGBTRF: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= (2*KL+KU+1).
[out]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i).
[in,out]dB_arrayArray of pointers, dimension (batchCount). Each is a REAL array, dimension (LDDB,NRHS). On entry, the right hand side matrix B. On exit, the solution matrix X.
[in]lddbINTEGER The leading dimension of each array B. LDDB >= max(1, N).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

◆ magma_sgbtrf_batched_work()

magma_int_t magma_sgbtrf_batched_work ( magma_int_t m,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
float ** dAB_array,
magma_int_t lddab,
magma_int_t ** dipiv_array,
magma_int_t * info_array,
void * device_work,
magma_int_t * lwork,
magma_int_t batchCount,
magma_queue_t queue )

SGBTRF computes an LU factorization of a real m-by-n band matrix AB using partial pivoting with row interchanges.

This is a batched version that factors batchCount M-by-N matrices in parallel. dAB, dipiv, and info become arrays with one entry per matrix.

The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:

On entry:                           On exit:

    *    *    *    +    +    +       *    *    *   u14  u25  u36
    *    *    +    +    +    +       *    *   u13  u24  u35  u46
    *   a12  a23  a34  a45  a56      *   u12  u23  u34  u45  u56
   a11  a22  a33  a44  a55  a66     u11  u22  u33  u44  u55  u66
   a21  a32  a43  a54  a65   *      m21  m32  m43  m54  m65   *
   a31  a42  a53  a64   *    *      m31  m42  m53  m64   *    *

Note that this behavior is a little different from the standard LAPACK routine. Array elements marked * are not read by the routine, but may be zeroed out after completion. Elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.

Parameters
[in]mINTEGER The number of rows of each matrix A. M >= 0.
[in]nINTEGER The number of columns of each matrix A. N >= 0.
[in]klINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]kuINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in,out]dAB_arrayArray of pointers, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDAB,N). On entry, the matrix A in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl). On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See above for details about the band storage.
[in]lddabINTEGER The leading dimension of each array AB. LDDAB >= (2*KL+KU+1).
[out]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in,out]device_workWorkspace, allocated on device memory
[in,out]lworkINTEGER pointer The size of the workspace (device_work) in bytes
  • lwork[0] < 0: a workspace query is assumed, the routine calculates the required amount of workspace and returns it in lwork. The workspace is not referenced, and no factorization is performed.
  • lwork[0] >= 0: the routine assumes that the user has provided a workspace with the size in lwork.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
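
The band storage mapping AB(kl+ku+1+i-j,j) = A(i,j) can be made concrete with a small host-side packing routine. This is a sketch under stated assumptions, not MAGMA code: dense_to_band is a hypothetical helper, both arrays are column-major in host memory, and the caller would still need to copy AB to the GPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side helper (not part of MAGMA): copy the band of a
 * dense column-major m-by-n matrix A (leading dimension lda) into band
 * storage AB (leading dimension ldab >= 2*kl+ku+1), following
 *   AB(kl+ku+1+i-j, j) = A(i,j),  max(1,j-ku) <= i <= min(m,j+kl)
 * with 1-based i, j. Entries of AB outside the band are left untouched. */
static void dense_to_band(int m, int n, int kl, int ku,
                          const float *A, int lda, float *AB, int ldab)
{
    assert(ldab >= 2 * kl + ku + 1);
    for (int j = 0; j < n; ++j) {                 /* 0-based column */
        int i_lo = (j - ku > 0) ? j - ku : 0;
        int i_hi = (j + kl < m - 1) ? j + kl : m - 1;
        for (int i = i_lo; i <= i_hi; ++i) {
            int row = kl + ku + i - j;            /* 0-based row inside AB */
            AB[(size_t)row + (size_t)j * ldab] = A[(size_t)i + (size_t)j * lda];
        }
    }
}
```

With 0-based indices the diagonal of A lands in row kl+ku of AB, so the first kl rows stay free for the fill-in produced by row interchanges.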

◆ magma_sgbtrf_batched()

magma_int_t magma_sgbtrf_batched ( magma_int_t m,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
float ** dAB_array,
magma_int_t lddab,
magma_int_t ** dipiv_array,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

SGBTRF computes an LU factorization of a REAL m-by-n band matrix A using partial pivoting with row interchanges.

This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.

Parameters
[in]MINTEGER The number of rows of the matrix A. M >= 0.
[in]NINTEGER The number of columns of the matrix A. N >= 0.
[in]KLINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]KUINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in,out]dAB_arrayArray of pointers, dimension (batchCount). Each is a REAL array, dimension (LDDAB,N). On entry, the matrix AB in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl). On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See below for further details.
[in]LDDABINTEGER The leading dimension of the array AB. LDDAB >= 2*KL+KU+1.
[out]dipiv_arrayArray of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]info_arrayINTEGER array, dimension (batchCount). Each is the INFO output for a given matrix.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value
  • > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

Further Details

The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:

On entry:                           On exit:

    *    *    *    +    +    +       *    *    *   u14  u25  u36
    *    *    +    +    +    +       *    *   u13  u24  u35  u46
    *   a12  a23  a34  a45  a56      *   u12  u23  u34  u45  u56
   a11  a22  a33  a44  a55  a66     u11  u22  u33  u44  u55  u66
   a21  a32  a43  a54  a65   *      m21  m32  m43  m54  m65   *
   a31  a42  a53  a64   *    *      m31  m42  m53  m64   *    *

Array elements marked * are not used by the routine; elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.

◆ magma_sgbtrs_batched()

magma_int_t magma_sgbtrs_batched ( magma_trans_t transA,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magma_int_t nrhs,
float ** dA_array,
magma_int_t ldda,
magma_int_t ** dipiv_array,
float ** dB_array,
magma_int_t lddb,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

SGBTRS solves a system of linear equations A * X = B, A**T * X = B, or A**H * X = B with a general band matrix A using the LU factorization computed by SGBTRF.

This is the batched version of the routine. Currently, only the no-transpose case (A * X = B) is supported.

Parameters
[in]transAmagma_trans_t Specifies the form of the system of equations. Currently, only MagmaNoTrans is supported (A*X = B).
[in]nINTEGER The order of the matrix A. n >= 0.
[in]klINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]kuINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in]nrhsINTEGER The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0.
[in]dA_arrayArray of pointers, dimension (batchCount). Each contains the details of the LU factorization of the band matrix A, as computed by SGBTRF. U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= (2*KL+KU+1).
[in]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i).
[in,out]dB_arrayArray of pointers, dimension (batchCount). Each is a REAL array, dimension (LDDB,NRHS). On entry, the right hand side matrix B. On exit, the solution matrix X.
[in]lddbINTEGER The leading dimension of each array B. LDDB >= max(1, N).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

◆ magma_sgetrf_batched()

magma_int_t magma_sgetrf_batched ( magma_int_t m,
magma_int_t n,
float ** dA_array,
magma_int_t ldda,
magma_int_t ** ipiv_array,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

SGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

Parameters
[in]mINTEGER The number of rows of each matrix A. M >= 0.
[in]nINTEGER The number of columns of each matrix A. N >= 0.
[in,out]dA_arrayArray of pointers, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= max(1,M).
[out]ipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

◆ magma_sgetrf_nopiv_vbatched_max_nocheck_work()

magma_int_t magma_sgetrf_nopiv_vbatched_max_nocheck_work ( magma_int_t * m,
magma_int_t * n,
magma_int_t max_m,
magma_int_t max_n,
magma_int_t max_minmn,
magma_int_t max_mxn,
float ** dA_array,
magma_int_t * ldda,
float * dtol_array,
float eps,
magma_int_t * info_array,
void * work,
magma_int_t * lwork,
magma_int_t batchCount,
magma_queue_t queue )

SGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting.

It replaces tiny pivots smaller than a specified tolerance by that tolerance.

The factorization has the form A = L * U where L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.

Parameters
[in]MArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of rows of each matrix A. M[i] >= 0.
[in]NArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of columns of each matrix A. N[i] >= 0.
[in]MAX_MINTEGER The maximum number of rows across the batch
[in]MAX_NINTEGER The maximum number of columns across the batch
[in]MAX_MINMNINTEGER The maximum value of min(Mi, Ni) for i = 1, 2, ..., batchCount
[in]MAX_MxNINTEGER The maximum value of the product (Mi x Ni) for i = 1, 2, ..., batchCount
[in,out]dA_arrayArray of pointers on the GPU, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDA[i],N[i]). On entry, each pointer is an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = L*U; the unit diagonal elements of L are not stored.
[in]lddaArray of INTEGERs on the GPU Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]).
[in]dtol_arrayArray of REALs, dimension (batchCount), for corresponding matrices. Each is the tolerance that is compared to the diagonal element before the column is scaled by its inverse. If the value of the diagonal is less than the threshold, the diagonal is replaced by the threshold. If the array is set to NULL, then the threshold is set to the eps parameter.
[in]epsREAL The value to use for the tolerance for all matrices if the dtol_array is NULL.
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations. If a tolerance array is specified, the value shows the number of times a tiny pivot was replaced.
[in]WORKVOID pointer A workspace of size LWORK[0]
[in,out]LWORKINTEGER pointer If lwork[0] < 0, a workspace query is assumed, and lwork[0] is overwritten by the required workspace size in bytes. Otherwise, lwork[0] is the size of work
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

◆ magma_sgetrf_nopiv_expert_vbatched()

magma_int_t magma_sgetrf_nopiv_expert_vbatched ( magma_int_t * m,
magma_int_t * n,
float ** dA_array,
magma_int_t * ldda,
float * dtol_array,
float eps,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

SGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting.

It replaces tiny pivots smaller than a specified tolerance by that tolerance.

The factorization has the form A = L * U where L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.

This is the expert version, which takes an extra parameter for the tolerance on diagonal elements. Small diagonal elements are replaced by the specified tolerance, preserving the sign, and the info array reports the number of replacements. This is useful in the context of static pivoting used in sparse solvers such as SuperLU, where the tolerance would be, for example, the norm of the matrix scaled by the machine epsilon.

Parameters
[in]MArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of rows of each matrix A. M[i] >= 0.
[in]NArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of columns of each matrix A. N[i] >= 0.
[in,out]dA_arrayArray of pointers on the GPU, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDA[i],N[i]). On entry, each pointer is an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = L*U; the unit diagonal elements of L are not stored.
[in]lddaArray of INTEGERs on the GPU Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]).
[in]dtol_arrayArray of REALs, dimension (batchCount), for corresponding matrices. Each is the tolerance that is compared to the diagonal element before the column is scaled by its inverse. If the value of the diagonal is less than the threshold, the diagonal is replaced by the threshold. If the array is set to NULL, then the threshold is set to the eps parameter.
[in]epsREAL The value to use for the tolerance for all matrices if the dtol_array is NULL.
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations. If a tolerance array is specified, the value shows the number of times a tiny pivot was replaced.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
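
The tiny-pivot rule described for this routine (replace a small diagonal element by the tolerance, keep its sign, count the replacement) can be sketched for a single pivot as follows. This is an illustration of the documented behavior only; replace_tiny_pivot is a hypothetical helper, comparing by magnitude is an assumption, and MAGMA's kernels may differ in detail, for example in how an exactly zero pivot is signed.

```c
#include <assert.h>

/* Hypothetical helper illustrating the documented rule (not MAGMA code).
 * A pivot whose magnitude is below tol is replaced by tol carrying the
 * pivot's sign, and the replacement counter is incremented.
 * Assumption: a zero pivot is treated as positive. */
static float replace_tiny_pivot(float pivot, float tol, int *replaced)
{
    float magnitude = (pivot < 0.0f) ? -pivot : pivot;
    if (magnitude < tol) {
        *replaced += 1;
        return (pivot < 0.0f) ? -tol : tol;
    }
    return pivot;   /* pivot is large enough; leave it unchanged */
}
```

Accumulating the counter per matrix mirrors how the info array is described as reporting the number of replacements.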

◆ magma_sgetrf_nopiv_vbatched()

magma_int_t magma_sgetrf_nopiv_vbatched ( magma_int_t * m,
magma_int_t * n,
float ** dA_array,
magma_int_t * ldda,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

SGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting.

The factorization has the form A = L * U where L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.

Parameters
[in]MArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of rows of each matrix A. M[i] >= 0.
[in]NArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of columns of each matrix A. N[i] >= 0.
[in,out]dA_arrayArray of pointers on the GPU, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDA[i],N[i]). On entry, each pointer is an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = L*U; the unit diagonal elements of L are not stored.
[in]lddaArray of INTEGERs on the GPU Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

◆ magma_sgetrf_recpanel_batched()

magma_int_t magma_sgetrf_recpanel_batched ( magma_int_t m,
magma_int_t n,
magma_int_t min_recpnb,
float ** dA_array,
magma_int_t ai,
magma_int_t aj,
magma_int_t ldda,
magma_int_t ** dipiv_array,
magma_int_t ** dpivinfo_array,
magma_int_t * info_array,
magma_int_t gbstep,
magma_int_t batchCount,
magma_queue_t queue )

This is an internal routine that might make many assumptions.

Documentation is not yet complete.

SGETRF_PANEL computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

Parameters
[in]mINTEGER The number of rows of each matrix A. M >= 0.
[in]nINTEGER The number of columns of each matrix A. N >= 0.
[in]min_recpnbINTEGER. Internal use. The recursive nb
[in,out]dA_arrayArray of pointers, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in]aiINTEGER Row offset for A.
[in]ajINTEGER Column offset for A.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= max(1,M).
[out]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]dpivinfo_arrayArray of pointers, dimension (batchCount), for internal use.
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]gbstepINTEGER internal use.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

◆ magma_sgetrf_recpanel_native()

magma_int_t magma_sgetrf_recpanel_native ( magma_int_t m,
magma_int_t n,
magma_int_t recnb,
magmaFloat_ptr dA,
magma_int_t ldda,
magma_int_t * dipiv,
magma_int_t * dipivinfo,
magma_int_t * dinfo,
magma_int_t gbstep,
magma_event_t events[2],
magma_queue_t queue,
magma_queue_t update_queue )

This is an internal routine.

SGETRF_PANEL computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a GPU-only routine. The host CPU is not used.

Parameters
[in]mINTEGER The number of rows of the matrix A. M >= 0.
[in]nINTEGER The number of columns of the matrix A. N >= 0.
[in,out]dAA REAL array on the GPU, dimension (LDDA,N). On entry, an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in]lddaINTEGER The leading dimension of A. LDDA >= max(1,M).
[out]dipivAn INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]dipivinfoAn INTEGER array, for internal use.
[out]dinfoINTEGER, stored on the GPU
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]gbstepINTEGER internal use.
[in]queuesArray of magma_queue_t, size 2 Queues to execute in.

◆ magma_sgetrf_vbatched_max_nocheck_work()

magma_int_t magma_sgetrf_vbatched_max_nocheck_work ( magma_int_t * m,
magma_int_t * n,
magma_int_t max_m,
magma_int_t max_n,
magma_int_t max_minmn,
magma_int_t max_mxn,
float ** dA_array,
magma_int_t * ldda,
magma_int_t ** dipiv_array,
magma_int_t * info_array,
void * work,
magma_int_t * lwork,
magma_int_t batchCount,
magma_queue_t queue )

SGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.

Parameters
[in]MArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of rows of each matrix A. M[i] >= 0.
[in]NArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of columns of each matrix A. N[i] >= 0.
[in]MAX_MINTEGER The maximum number of rows across the batch
[in]MAX_NINTEGER The maximum number of columns across the batch
[in]MAX_MINMNINTEGER The maximum value of min(Mi, Ni) for i = 1, 2, ..., batchCount
[in]MAX_MxNINTEGER The maximum value of the product (Mi x Ni) for i = 1, 2, ..., batchCount
[in,out]dA_arrayArray of pointers on the GPU, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDA[i],N[i]). On entry, each pointer is an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in]lddaArray of INTEGERs on the GPU Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]).
[out]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M[i],N[i])) The pivot indices; for 1 <= p <= min(M[i],N[i]), row p of the matrix was interchanged with row IPIV(p).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]WORKVOID pointer A workspace of size LWORK[0]
[in,out]LWORKINTEGER pointer If lwork[0] < 0, a workspace query is assumed, and lwork[0] is overwritten by the required workspace size in bytes. Otherwise, lwork[0] is the size of work
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
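
The four MAX_* arguments are batch-wide maxima that the caller supplies. The following is a minimal sketch of how they could be computed, assuming host-side copies of the M and N arrays are available. The `batch_maxima` type and `compute_batch_maxima` helper are hypothetical names, not part of MAGMA, and plain `int` stands in for `magma_int_t`.

```c
#include <assert.h>

/* Hypothetical helper (not a MAGMA API): batch-wide maxima matching the
   MAX_M, MAX_N, MAX_MINMN and MAX_MxN parameters described above. */
typedef struct {
    int max_m;      /* max over i of m[i]            */
    int max_n;      /* max over i of n[i]            */
    int max_minmn;  /* max over i of min(m[i], n[i]) */
    int max_mxn;    /* max over i of m[i] * n[i]     */
} batch_maxima;

/* m and n are HOST copies of the per-matrix sizes (assumption). */
static batch_maxima compute_batch_maxima(const int *m, const int *n,
                                         int batchCount)
{
    batch_maxima r = {0, 0, 0, 0};
    assert(batchCount >= 0);
    for (int i = 0; i < batchCount; ++i) {
        assert(m[i] >= 0 && n[i] >= 0);
        int minmn = (m[i] < n[i]) ? m[i] : n[i];
        int mxn   = m[i] * n[i];
        if (m[i]  > r.max_m)     r.max_m     = m[i];
        if (n[i]  > r.max_n)     r.max_n     = n[i];
        if (minmn > r.max_minmn) r.max_minmn = minmn;
        if (mxn   > r.max_mxn)   r.max_mxn   = mxn;
    }
    return r;
}
```

Note that the four maxima may come from four different matrices in the batch, so they have to be tracked independently.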

◆ magma_sgetrf_vbatched()

magma_int_t magma_sgetrf_vbatched ( magma_int_t * m,
magma_int_t * n,
float ** dA_array,
magma_int_t * ldda,
magma_int_t ** dipiv_array,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

SGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.

Parameters
[in]MArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of rows of each matrix A. M[i] >= 0.
[in]NArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of columns of each matrix A. N[i] >= 0.
[in,out]dA_arrayArray of pointers on the GPU, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDA[i],N[i]). On entry, each pointer is an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in]lddaArray of INTEGERs on the GPU Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]).
[out]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M[i],N[i])) The pivot indices; for 1 <= p <= min(M[i],N[i]), row p of the matrix was interchanged with row IPIV(p).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
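
The pivot indices follow the LAPACK convention: they are 1-based and describe a sequence of row swaps applied in order. The sketch below applies one matrix's pivots to a host-side vector, which is the permutation step that precedes the triangular solves when using A = P*L*U. The helper name `apply_row_interchanges` is hypothetical, and a host copy of the pivot array is assumed.

```c
#include <assert.h>

/* Apply the row interchanges recorded in ipiv (1-based, LAPACK convention)
   to a vector b of length m, in the order p = 1, 2, ..., k,
   where k = min(m, n) for the factored matrix. */
static void apply_row_interchanges(double *b, int m, const int *ipiv, int k)
{
    for (int p = 0; p < k; ++p) {
        int q = ipiv[p] - 1;        /* convert to 0-based row index */
        assert(q >= p && q < m);    /* partial pivoting picks a row at or below p */
        if (q != p) {
            double t = b[p];
            b[p] = b[q];
            b[q] = t;
        }
    }
}
```

The swaps must be applied sequentially; treating IPIV as a direct permutation vector gives a different (wrong) result whenever two entries name the same row.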

◆ magma_zgbsv_batched_work()

magma_int_t magma_zgbsv_batched_work ( magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magma_int_t nrhs,
magmaDoubleComplex ** dA_array,
magma_int_t ldda,
magma_int_t ** dipiv_array,
magmaDoubleComplex ** dB_array,
magma_int_t lddb,
magma_int_t * info_array,
void * device_work,
magma_int_t * lwork,
magma_int_t batchCount,
magma_queue_t queue )

ZGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.

The LU decomposition with partial pivoting and row interchanges is used to factor A as A = L * U, where L is a product of permutation and unit lower triangular matrices with KL subdiagonals, and U is upper triangular with KL+KU superdiagonals. The factored form of A is then used to solve the system of equations A * X = B.

This is the batched version of the routine.

Parameters
[in]nINTEGER The order of the matrix A. n >= 0.
[in]klINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]kuINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in]nrhsINTEGER The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0.
[in]dA_arrayArray of pointers, dimension (batchCount). Each contains the details of the LU factorization of the band matrix A, as computed by ZGBTRF. U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= (2*KL+KU+1).
[in]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[in,out]dB_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX*16 array, dimension (LDDB,NRHS). On entry, the right hand side matrix B. On exit, the solution matrix X.
[in]lddbINTEGER The leading dimension of each array B. LDDB >= max(1, N).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and the solution has not been computed.
[in,out]device_workWorkspace, allocated on device memory.
[in,out]lworkINTEGER pointer The size of the workspace (device_work) in bytes
  • lwork[0] < 0: a workspace query is assumed, the routine calculates the required amount of workspace and returns it in lwork. The workspace is not referenced, and no computation is performed.
  • lwork[0] >= 0: the routine assumes that the user has provided a workspace with the size in lwork.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
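
The lwork convention implies a two-call pattern: query the size, allocate, then run. The sketch below illustrates only that protocol. `demo_batched_work` is a hypothetical stand-in, not a MAGMA routine; its size formula is invented, and host `malloc` stands in for a device allocation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a *_batched_work routine. It only mimics the
   lwork convention described above; it is NOT a MAGMA function and the
   size formula below is made up for illustration. */
static int demo_batched_work(int n, int batchCount, void *work, int *lwork)
{
    assert(n >= 0 && batchCount >= 0 && lwork != NULL);
    int required = (int)sizeof(double) * n * batchCount;  /* invented formula */
    if (lwork[0] < 0) {
        /* Workspace query: report the size in bytes, touch nothing else. */
        lwork[0] = required;
        return 0;
    }
    if (lwork[0] < required || (required > 0 && work == NULL))
        return -1;  /* caller-provided workspace is too small or missing */
    /* ... a real routine would perform its computation here ... */
    return 0;
}

/* Two-call pattern: query, allocate, run, release. */
static int run_with_workspace(int n, int batchCount)
{
    int lwork[1] = { -1 };                       /* negative => query */
    int info = demo_batched_work(n, batchCount, NULL, lwork);
    if (info != 0) return info;

    /* Host malloc is a stand-in; the documented workspace is device memory. */
    void *work = malloc(lwork[0] > 0 ? (size_t)lwork[0] : 1);
    if (work == NULL) return -2;

    info = demo_batched_work(n, batchCount, work, lwork);
    free(work);
    return info;
}
```

Keeping the query and the real call on the same arguments matters, since the required size generally depends on the problem dimensions and the batch count.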

◆ magma_zgbsv_batched()

magma_int_t magma_zgbsv_batched ( magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magma_int_t nrhs,
magmaDoubleComplex ** dA_array,
magma_int_t ldda,
magma_int_t ** dipiv_array,
magmaDoubleComplex ** dB_array,
magma_int_t lddb,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

ZGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.

The LU decomposition with partial pivoting and row interchanges is used to factor A as A = L * U, where L is a product of permutation and unit lower triangular matrices with KL subdiagonals, and U is upper triangular with KL+KU superdiagonals. The factored form of A is then used to solve the system of equations A * X = B.

This is the batched version of the routine.

Parameters
[in]nINTEGER The order of the matrix A. n >= 0.
[in]klINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]kuINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in]nrhsINTEGER The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0.
[in]dA_arrayArray of pointers, dimension (batchCount). Each contains the details of the LU factorization of the band matrix A, as computed by ZGBTRF. U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= (2*KL+KU+1).
[in]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[in,out]dB_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX*16 array, dimension (LDDB,NRHS). On entry, the right hand side matrix B. On exit, the solution matrix X.
[in]lddbINTEGER The leading dimension of each array B. LDDB >= max(1, N).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

◆ magma_zgbtrf_batched_work()

magma_int_t magma_zgbtrf_batched_work ( magma_int_t m,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magmaDoubleComplex ** dAB_array,
magma_int_t lddab,
magma_int_t ** dipiv_array,
magma_int_t * info_array,
void * device_work,
magma_int_t * lwork,
magma_int_t batchCount,
magma_queue_t queue )

ZGBTRF computes an LU factorization of a complex m-by-n band matrix AB using partial pivoting with row interchanges.

This is a batched version that factors batchCount M-by-N matrices in parallel. dAB, dipiv, and info become arrays with one entry per matrix.

The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:

    On entry:                        On exit:

     *    *    *    +    +    +       *    *    *   u14  u25  u36
     *    *    +    +    +    +       *    *   u13  u24  u35  u46
     *   a12  a23  a34  a45  a56      *   u12  u23  u34  u45  u56
    a11  a22  a33  a44  a55  a66     u11  u22  u33  u44  u55  u66
    a21  a32  a43  a54  a65   *      m21  m32  m43  m54  m65   *
    a31  a42  a53  a64   *    *      m31  m42  m53  m64   *    *

Note that this behavior is a little different from the standard LAPACK routine. Array elements marked * are not read by the routine, but may be zeroed out after completion. Elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.

Parameters
[in]mINTEGER The number of rows of each matrix A. M >= 0.
[in]nINTEGER The number of columns of each matrix A. N >= 0.
[in]klINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]kuINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in,out]dAB_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDAB,N). On entry, the matrix A in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl). On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See above for details about the band storage.
[in]lddabINTEGER The leading dimension of each array AB. LDDAB >= (2*KL+KU+1).
[out]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in,out]device_workWorkspace, allocated on device memory
[in,out]lworkINTEGER pointer The size of the workspace (device_work) in bytes
  • lwork[0] < 0: a workspace query is assumed, the routine calculates the required amount of workspace and returns it in lwork. The workspace is not referenced, and no factorization is performed.
  • lwork[0] >= 0: the routine assumes that the user has provided a workspace with the size in lwork.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
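
The band storage mapping AB(kl+ku+1+i-j, j) = A(i,j) can be made concrete with a small host-side packing routine. This is a sketch only: `dense_to_band` is a hypothetical helper, it uses real `double` entries instead of complex ones for brevity, and it assumes column-major storage with the leading dimension bound LDDAB >= 2*KL+KU+1 stated above.

```c
#include <assert.h>

/* Pack a dense column-major m-by-n matrix A (leading dimension lda) into
   LAPACK-style band storage AB (leading dimension ldab >= 2*kl+ku+1).
   Uses the 1-based mapping AB(kl+ku+1+i-j, j) = A(i,j) for
   max(1,j-ku) <= i <= min(m,j+kl). Entries of AB outside the band
   (including the top kl fill-in rows) are left untouched. */
static void dense_to_band(int m, int n, int kl, int ku,
                          const double *A, int lda,
                          double *AB, int ldab)
{
    assert(lda >= m);
    assert(ldab >= 2 * kl + ku + 1);
    for (int j = 1; j <= n; ++j) {
        int ilo = (j - ku > 1) ? j - ku : 1;
        int ihi = (j + kl < m) ? j + kl : m;
        for (int i = ilo; i <= ihi; ++i) {
            int row = kl + ku + 1 + i - j;   /* 1-based row inside AB */
            AB[(row - 1) + (j - 1) * ldab] = A[(i - 1) + (j - 1) * lda];
        }
    }
}
```

For M = N = 6, KL = 2, KU = 1 this places the diagonal in row 4 of AB, the superdiagonal in row 3, and the two subdiagonals in rows 5 and 6, leaving rows 1 and 2 free for fill-in.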

◆ magma_zgbtrf_batched()

magma_int_t magma_zgbtrf_batched ( magma_int_t m,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magmaDoubleComplex ** dAB_array,
magma_int_t lddab,
magma_int_t ** dipiv_array,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

ZGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges.

This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.

Parameters
[in]MINTEGER The number of rows of the matrix A. M >= 0.
[in]NINTEGER The number of columns of the matrix A. N >= 0.
[in]KLINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]KUINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in,out]dAB_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX_16 array, dimension (LDDAB,N). On entry, the matrix AB in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl). On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See below for further details.
[in]LDDABINTEGER The leading dimension of the array AB. LDDAB >= 2*KL+KU+1.
[out]dIPIV_arrayArray of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]dINFO_arrayINTEGER array, dimension (batchCount) Each is the INFO output for a given matrix.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value
  • > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

Further Details

The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:

    On entry:                        On exit:

     *    *    *    +    +    +       *    *    *   u14  u25  u36
     *    *    +    +    +    +       *    *   u13  u24  u35  u46
     *   a12  a23  a34  a45  a56      *   u12  u23  u34  u45  u56
    a11  a22  a33  a44  a55  a66     u11  u22  u33  u44  u55  u66
    a21  a32  a43  a54  a65   *      m21  m32  m43  m54  m65   *
    a31  a42  a53  a64   *    *      m31  m42  m53  m64   *    *

Array elements marked * are not used by the routine; elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.

◆ magma_zgbtrs_batched()

magma_int_t magma_zgbtrs_batched ( magma_trans_t transA,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magma_int_t nrhs,
magmaDoubleComplex ** dA_array,
magma_int_t ldda,
magma_int_t ** dipiv_array,
magmaDoubleComplex ** dB_array,
magma_int_t lddb,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

ZGBTRS solves a system of linear equations A * X = B, A**T * X = B, or A**H * X = B with a general band matrix A using the LU factorization computed by ZGBTRF.

This is the batched version of the routine. Currently, only the no-transpose case (A * X = B) is supported.

Parameters
[in]transAmagma_trans_t Specifies the form of the system of equations. Currently, only MagmaNoTrans is supported (A*X = B).
[in]nINTEGER The order of the matrix A. n >= 0.
[in]klINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]kuINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in]nrhsINTEGER The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0.
[in]dA_arrayArray of pointers, dimension (batchCount). Each contains the details of the LU factorization of the band matrix A, as computed by ZGBTRF. U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= (2*KL+KU+1).
[in]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[in,out]dB_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX*16 array, dimension (LDDB,NRHS). On entry, the right hand side matrix B. On exit, the solution matrix X.
[in]lddbINTEGER The leading dimension of each array B. LDDB >= max(1, N).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

◆ magma_zgetrf_batched()

magma_int_t magma_zgetrf_batched ( magma_int_t m,
magma_int_t n,
magmaDoubleComplex ** dA_array,
magma_int_t ldda,
magma_int_t ** ipiv_array,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

ZGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

Parameters
[in]mINTEGER The number of rows of each matrix A. M >= 0.
[in]nINTEGER The number of columns of each matrix A. N >= 0.
[in,out]dA_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= max(1,M).
[out]ipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
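
Fixed-size batched routines take an array of pointers, one per matrix. A common layout is to store all matrices back to back in one buffer and derive the pointers from a constant stride. The sketch below shows that arithmetic on the host with real `double` entries; the helper names are hypothetical, and in actual use both the buffer and the pointer array would have to reside in GPU memory.

```c
#include <assert.h>
#include <stddef.h>

/* Fill ptrs[i] with the start of the i-th ldda-by-n column-major matrix
   stored inside one contiguous buffer (stride = ldda * n elements). */
static void set_batch_pointers(double **ptrs, double *base,
                               int ldda, int n, int batchCount)
{
    assert(ldda >= 1 && n >= 0 && batchCount >= 0);
    size_t stride = (size_t)ldda * (size_t)n;
    for (int i = 0; i < batchCount; ++i)
        ptrs[i] = base + (size_t)i * stride;
}

/* Address of element (r, c), 0-based, of matrix i in the batch. */
static double *batch_elem(double **ptrs, int i, int r, int c, int ldda)
{
    assert(r >= 0 && r < ldda && c >= 0);
    return &ptrs[i][(size_t)r + (size_t)c * (size_t)ldda];
}
```

Using LDDA rather than M in the stride keeps each matrix's padding rows inside its own slot, consistent with the requirement LDDA >= max(1,M).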

◆ magma_zgetrf_nopiv_vbatched_max_nocheck_work()

magma_int_t magma_zgetrf_nopiv_vbatched_max_nocheck_work ( magma_int_t * m,
magma_int_t * n,
magma_int_t max_m,
magma_int_t max_n,
magma_int_t max_minmn,
magma_int_t max_mxn,
magmaDoubleComplex ** dA_array,
magma_int_t * ldda,
double * dtol_array,
double eps,
magma_int_t * info_array,
void * work,
magma_int_t * lwork,
magma_int_t batchCount,
magma_queue_t queue )

ZGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting.

It replaces tiny pivots smaller than a specified tolerance by that tolerance.

The factorization has the form A = L * U where L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.

Parameters
[in]MArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of rows of each matrix A. M[i] >= 0.
[in]NArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of columns of each matrix A. N[i] >= 0.
[in]MAX_MINTEGER The maximum number of rows across the batch
[in]MAX_NINTEGER The maximum number of columns across the batch
[in]MAX_MINMNINTEGER The maximum value of min(Mi, Ni) for i = 1, 2, ..., batchCount
[in]MAX_MxNINTEGER The maximum value of the product (Mi x Ni) for i = 1, 2, ..., batchCount
[in,out]dA_arrayArray of pointers on the GPU, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDA[i],N[i]). On entry, each pointer is an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = L*U; the unit diagonal elements of L are not stored.
[in]lddaArray of INTEGERs on the GPU Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]).
[in]dtol_arrayArray of DOUBLEs, dimension (batchCount), for corresponding matrices. Each is the tolerance that is compared to the diagonal element before the column is scaled by its inverse. If the value of the diagonal is less than the threshold, the diagonal is replaced by the threshold. If the array is set to NULL, then the threshold is set to the eps parameter
[in]epsDOUBLE The value to use for the tolerance for all matrices if the dtol_array is NULL
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations. If a tolerance array is specified, the value shows the number of times a tiny pivot was replaced.
[in]WORKVOID pointer A workspace of size LWORK[0]
[in,out]LWORKINTEGER pointer If lwork[0] < 0, a workspace query is assumed, and lwork[0] is overwritten by the required workspace size in bytes. Otherwise, lwork[0] is the size of work
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

◆ magma_zgetrf_nopiv_expert_vbatched()

magma_int_t magma_zgetrf_nopiv_expert_vbatched ( magma_int_t * m,
magma_int_t * n,
magmaDoubleComplex ** dA_array,
magma_int_t * ldda,
double * dtol_array,
double eps,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

ZGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting.

It replaces tiny pivots smaller than a specified tolerance by that tolerance.

The factorization has the form A = L * U where L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.

This is the expert version, which takes an extra parameter for the tolerance on diagonal elements. Diagonal elements smaller than the tolerance are replaced by the tolerance, preserving their sign, and the info array reports the number of replacements. This is useful for static pivoting as used in sparse solvers such as SuperLU, where the tolerance could be, for example, the norm of the matrix scaled by the machine epsilon.

Parameters
[in]MArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of rows of each matrix A. M[i] >= 0.
[in]NArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of columns of each matrix A. N[i] >= 0.
[in,out]dA_arrayArray of pointers on the GPU, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDA[i],N[i]). On entry, each pointer is an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = L*U; the unit diagonal elements of L are not stored.
[in]lddaArray of INTEGERs on the GPU Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]).
[in]dtol_arrayArray of DOUBLEs, dimension (batchCount), for corresponding matrices. Each is the tolerance that is compared to the diagonal element before the column is scaled by its inverse. If the value of the diagonal is less than the threshold, the diagonal is replaced by the threshold. If the array is set to NULL, then the threshold is set to the eps parameter
[in]epsDOUBLE The value to use for the tolerance for all matrices if the dtol_array is NULL
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations. If a tolerance array is specified, the value shows the number of times a tiny pivot was replaced.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
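
The tiny-pivot replacement can be illustrated with a small unpivoted LU on the host. This is a simplified sketch, not the library's algorithm: it is an unblocked loop rather than the right-looking Level 3 BLAS version, it handles one square real matrix instead of a complex batch, and it assumes the comparison is made on the pivot's magnitude. The function name `lu_nopiv_tiny_pivot` is hypothetical.

```c
#include <assert.h>

/* Unpivoted LU of a column-major n-by-n real matrix, in place.
   On exit the strict lower triangle holds L (unit diagonal not stored)
   and the upper triangle holds U. Whenever |pivot| < tol, the pivot is
   replaced by tol carrying the sign of the original pivot (a zero pivot
   is treated as positive). Returns the number of replacements. */
static int lu_nopiv_tiny_pivot(int n, double *A, int lda, double tol)
{
    int replaced = 0;
    assert(n >= 0 && lda >= n && tol > 0.0);
    for (int k = 0; k < n; ++k) {
        double piv = A[k + k * lda];
        double mag = (piv < 0.0) ? -piv : piv;
        if (mag < tol) {
            piv = (piv < 0.0) ? -tol : tol;   /* keep the sign */
            A[k + k * lda] = piv;
            ++replaced;
        }
        /* Scale the column below the pivot to form the multipliers. */
        for (int i = k + 1; i < n; ++i)
            A[i + k * lda] /= piv;
        /* Rank-1 update of the trailing submatrix. */
        for (int j = k + 1; j < n; ++j)
            for (int i = k + 1; i < n; ++i)
                A[i + j * lda] -= A[i + k * lda] * A[k + j * lda];
    }
    return replaced;
}
```

Because replaced pivots perturb the factorization, the result is an LU of a nearby matrix; callers that rely on this typically follow up with iterative refinement.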

◆ magma_zgetrf_nopiv_vbatched()

magma_int_t magma_zgetrf_nopiv_vbatched ( magma_int_t * m,
magma_int_t * n,
magmaDoubleComplex ** dA_array,
magma_int_t * ldda,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

ZGETRF NOPIV computes an LU factorization of a general M-by-N matrix A without pivoting.

The factorization has the form A = L * U where L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.

Parameters
[in]MArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of rows of each matrix A. M[i] >= 0.
[in]NArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of columns of each matrix A. N[i] >= 0.
[in,out]dA_arrayArray of pointers on the GPU, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDA[i],N[i]). On entry, each pointer is an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = L*U; the unit diagonal elements of L are not stored.
[in]lddaArray of INTEGERs on the GPU Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
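
Since each matrix in the batch reports its own status, the caller usually scans info_array after the call. The sketch below does this on a host copy of the array (an assumption; the copy step is not shown), and `count_failures` is a hypothetical helper name.

```c
#include <assert.h>

/* Scan a host copy of info_array. Returns the number of non-zero entries
   and stores the index of the first one in *first_bad (-1 if none).
   A negative entry signals an illegal argument or another error;
   a positive entry i signals that U(i,i) is exactly zero. */
static int count_failures(const int *info, int batchCount, int *first_bad)
{
    int bad = 0;
    assert(first_bad != 0);
    *first_bad = -1;
    for (int i = 0; i < batchCount; ++i) {
        if (info[i] != 0) {
            if (bad == 0)
                *first_bad = i;
            ++bad;
        }
    }
    return bad;
}
```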

◆ magma_zgetrf_recpanel_batched()

magma_int_t magma_zgetrf_recpanel_batched ( magma_int_t m,
magma_int_t n,
magma_int_t min_recpnb,
magmaDoubleComplex ** dA_array,
magma_int_t ai,
magma_int_t aj,
magma_int_t ldda,
magma_int_t ** dipiv_array,
magma_int_t ** dpivinfo_array,
magma_int_t * info_array,
magma_int_t gbstep,
magma_int_t batchCount,
magma_queue_t queue )

This is an internal routine that may rely on many assumptions.

Its documentation is incomplete.

ZGETRF_PANEL computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

Parameters
[in]mINTEGER The number of rows of each matrix A. M >= 0.
[in]nINTEGER The number of columns of each matrix A. N >= 0.
[in]min_recpnbINTEGER. Internal use. The recursive nb
[in,out]dA_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in]aiINTEGER Row offset for A.
[in]ajINTEGER Column offset for A.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= max(1,M).
[out]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]dpivinfo_arrayArray of pointers, dimension (batchCount), for internal use.
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]gbstepINTEGER internal use.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

◆ magma_zgetrf_recpanel_native()

magma_int_t magma_zgetrf_recpanel_native ( magma_int_t m,
magma_int_t n,
magma_int_t recnb,
magmaDoubleComplex_ptr dA,
magma_int_t ldda,
magma_int_t * dipiv,
magma_int_t * dipivinfo,
magma_int_t * dinfo,
magma_int_t gbstep,
magma_event_t events[2],
magma_queue_t queue,
magma_queue_t update_queue )

This is an internal routine.

ZGETRF_PANEL computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a GPU-only routine. The host CPU is not used.

Parameters
[in]mINTEGER The number of rows of the matrix A. M >= 0.
[in]nINTEGER The number of columns of the matrix A. N >= 0.
[in,out]dAA COMPLEX_16 array on the GPU, dimension (LDDA,N). On entry, an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in]lddaINTEGER The leading dimension of A. LDDA >= max(1,M).
[out]dipivAn INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]dipivinfoAn INTEGER array, for internal use.
[out]dinfoINTEGER, stored on the GPU
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]gbstepINTEGER internal use.
[in]queuesArray of magma_queue_t, size 2 Queues to execute in.
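
The pivot convention described for dipiv (row i is interchanged with row IPIV(i), with 1-based indices applied in the order i = 1, ..., min(M,N)) can be sketched on the host as follows. This is only an illustration under stated assumptions: apply_row_pivots is a hypothetical helper, not a MAGMA routine, it assumes the pivots have already been copied from the GPU to host memory, and it uses plain int where MAGMA uses magma_int_t.

```c
#include <assert.h>

/* Hypothetical host-side helper (not a MAGMA routine).
 * Applies the row interchanges recorded in ipiv[0..k-1] to a vector x of
 * length m, in the order i = 1..k: element i is swapped with element
 * ipiv[i-1]. Pivot indices are 1-based, following the LAPACK convention. */
static void apply_row_pivots(double *x, int m, const int *ipiv, int k)
{
    for (int i = 0; i < k; ++i) {
        int p = ipiv[i] - 1;       /* convert the 1-based pivot to 0-based */
        assert(p >= i && p < m);   /* a pivot row never lies above row i */
        double tmp = x[i];
        x[i] = x[p];
        x[p] = tmp;
    }
}
```

Applying the same interchanges to every column of a right-hand side B gives P^T * B, which is the first step when solving with the computed factors.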

◆ magma_zgetrf_vbatched_max_nocheck_work()

magma_int_t magma_zgetrf_vbatched_max_nocheck_work ( magma_int_t * m,
magma_int_t * n,
magma_int_t max_m,
magma_int_t max_n,
magma_int_t max_minmn,
magma_int_t max_mxn,
magmaDoubleComplex ** dA_array,
magma_int_t * ldda,
magma_int_t ** dipiv_array,
magma_int_t * info_array,
void * work,
magma_int_t * lwork,
magma_int_t batchCount,
magma_queue_t queue )

ZGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.

Parameters
[in]MArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of rows of each matrix A. M[i] >= 0.
[in]NArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of columns of each matrix A. N[i] >= 0.
[in]MAX_MINTEGER The maximum number of rows across the batch
[in]MAX_NINTEGER The maximum number of columns across the batch
[in]MAX_MINMNINTEGER The maximum value of min(Mi, Ni) for i = 1, 2, ..., batchCount
[in]MAX_MxNINTEGER The maximum value of the product (Mi x Ni) for i = 1, 2, ..., batchCount
[in,out]dA_arrayArray of pointers on the GPU, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDA[i],N[i]). On entry, each pointer is an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in]lddaArray of INTEGERs on the GPU Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]).
[out]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M[i],N[i])) The pivot indices; for 1 <= p <= min(M[i],N[i]), row p of the matrix was interchanged with row IPIV(p).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]WORKVOID pointer A workspace of size LWORK[0]
[in,out]LWORKINTEGER pointer If lwork[0] < 0, a workspace query is assumed, and lwork[0] is overwritten by the required workspace size in bytes. Otherwise, lwork[0] is the size of work in bytes.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
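
The four batch-wide maxima (MAX_M, MAX_N, MAX_MINMN, MAX_MxN) are independent quantities: in general MAX_MINMN differs from min(MAX_M, MAX_N), and MAX_MxN differs from MAX_M * MAX_N. A minimal sketch of how a caller might compute them is shown below. It assumes host copies of the M and N arrays are available; batch_maxima and compute_batch_maxima are hypothetical names, not part of MAGMA, and plain int stands in for magma_int_t.

```c
#include <assert.h>

/* Hypothetical container for the four batch-wide maxima. */
typedef struct {
    int max_m;      /* maximum number of rows across the batch */
    int max_n;      /* maximum number of columns across the batch */
    int max_minmn;  /* maximum of min(m[i], n[i]) */
    int max_mxn;    /* maximum of the product m[i] * n[i] */
} batch_maxima;

/* Hypothetical host-side helper (not a MAGMA routine).
 * Scans host copies of the size arrays once and returns all four maxima. */
static batch_maxima compute_batch_maxima(const int *m, const int *n,
                                         int batchCount)
{
    assert(batchCount >= 0);
    batch_maxima r = {0, 0, 0, 0};
    for (int i = 0; i < batchCount; ++i) {
        int minmn = (m[i] < n[i]) ? m[i] : n[i];
        int mxn = m[i] * n[i];
        if (m[i] > r.max_m) r.max_m = m[i];
        if (n[i] > r.max_n) r.max_n = n[i];
        if (minmn > r.max_minmn) r.max_minmn = minmn;
        if (mxn > r.max_mxn) r.max_mxn = mxn;
    }
    return r;
}
```

For sizes (4,3), (2,9), and (7,1), this yields MAX_M = 7, MAX_N = 9, MAX_MINMN = 3, and MAX_MxN = 18.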

◆ magma_zgetrf_vbatched()

magma_int_t magma_zgetrf_vbatched ( magma_int_t * m,
magma_int_t * n,
magmaDoubleComplex ** dA_array,
magma_int_t * ldda,
magma_int_t ** dipiv_array,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

ZGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.

Parameters
[in]MArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of rows of each matrix A. M[i] >= 0.
[in]NArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of columns of each matrix A. N[i] >= 0.
[in,out]dA_arrayArray of pointers on the GPU, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDA[i],N[i]). On entry, each pointer is an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in]lddaArray of INTEGERs on the GPU Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]).
[out]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M[i],N[i])) The pivot indices; for 1 <= p <= min(M[i],N[i]), row p of the matrix was interchanged with row IPIV(p).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
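
Because info_array holds one status per matrix, a caller typically inspects every entry after the batch completes. The sketch below shows one way to do that, assuming the array has been copied to host memory. The names info_kind, classify_info, and first_failed_matrix are hypothetical and not part of MAGMA, and plain int stands in for magma_int_t.

```c
#include <assert.h>

/* Hypothetical classification of a single INFO value. */
typedef enum {
    INFO_OK,        /* = 0: successful exit */
    INFO_ERROR,     /* < 0: illegal argument or another error */
    INFO_SINGULAR   /* > 0: U(i,i) is exactly zero */
} info_kind;

static info_kind classify_info(int info)
{
    if (info == 0) return INFO_OK;
    return (info < 0) ? INFO_ERROR : INFO_SINGULAR;
}

/* Returns the 0-based index of the first matrix whose INFO is nonzero,
 * or -1 if every factorization in the batch succeeded. */
static int first_failed_matrix(const int *info, int batchCount)
{
    assert(batchCount >= 0);
    for (int i = 0; i < batchCount; ++i)
        if (classify_info(info[i]) != INFO_OK)
            return i;
    return -1;
}
```

A positive INFO still leaves a completed factorization in memory, so the caller can decide per matrix whether to skip the subsequent solve.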

◆ magma_cgbsv_batched_fused_sm()

magma_int_t magma_cgbsv_batched_fused_sm ( magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magma_int_t nrhs,
magmaFloatComplex ** dA_array,
magma_int_t ldda,
magma_int_t ** ipiv_array,
magmaFloatComplex ** dB_array,
magma_int_t lddb,
magma_int_t * info_array,
magma_int_t nthreads,
magma_int_t ntcol,
magma_int_t batchCount,
magma_queue_t queue )

CGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.

The LU decomposition with partial pivoting and row interchanges is used to factor A as A = L * U, where L is a product of permutation and unit lower triangular matrices with KL subdiagonals, and U is upper triangular with KL+KU superdiagonals. The factored form of A is then used to solve the system of equations A * X = B.

This is the batched version of the routine.

Parameters
[in]nINTEGER The order of the matrix A. n >= 0.
[in]klINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]kuINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in]nrhsINTEGER The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0.
[in]dA_arrayArray of pointers, dimension (batchCount). Each contains the details of the LU factorization of the band matrix A, as computed by CGBTRF. U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= (2*KL+KU+1).
[in]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[in,out]dB_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX array, dimension (LDDB,NRHS) On entry, the right hand side matrix B. On exit, the solution matrix X.
[in]lddbINTEGER The leading dimension of each array B. LDDB >= max(1, N).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
[in]nthreadsINTEGER The number of threads assigned to a single matrix. nthreads >= (KL+1)
[in]ntcolINTEGER The number of concurrent factorizations in a thread-block ntcol >= 1
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

◆ magma_cgbtrf_batched_fused_sm()

magma_int_t magma_cgbtrf_batched_fused_sm ( magma_int_t m,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magmaFloatComplex ** dAB_array,
magma_int_t lddab,
magma_int_t ** ipiv_array,
magma_int_t * info_array,
magma_int_t nthreads,
magma_int_t ntcol,
magma_int_t batchCount,
magma_queue_t queue )

CGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges.

This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.

This routine has shared memory requirements that may exceed the capacity of the GPU. In such a case, the routine exits immediately, returning a negative error code.

Parameters
[in]MINTEGER The number of rows of the matrix A. M >= 0.
[in]NINTEGER The number of columns of the matrix A. N >= 0.
[in]KLINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]KUINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in,out]dAB_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX array, dimension (LDDAB,N) On entry, the matrix AB in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl)

On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See below for further details.

[in]LDDABINTEGER The leading dimension of the array AB. LDDAB >= 2*KL+KU+1.
[out]dIPIV_arrayArray of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]dINFO_arrayINTEGER array, dimension (batchCount) Each is the INFO output for a given matrix
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value
  • > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]nthreadsINTEGER The number of threads assigned to a single matrix. nthreads >= (KL+1)
[in]ntcolINTEGER The number of concurrent factorizations in a thread-block ntcol >= 1
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

Further Details

The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:

    On entry:                        On exit:

     *    *    *    +    +    +       *    *    *   u14  u25  u36
     *    *    +    +    +    +       *    *   u13  u24  u35  u46
     *   a12  a23  a34  a45  a56      *   u12  u23  u34  u45  u56
    a11  a22  a33  a44  a55  a66     u11  u22  u33  u44  u55  u66
    a21  a32  a43  a54  a65   *      m21  m32  m43  m54  m65   *
    a31  a42  a53  a64   *    *      m31  m42  m53  m64   *    *

Array elements marked * are not used by the routine, but may be set to zero after completion. Elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.
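
The mapping AB(kl+ku+1+i-j,j) = A(i,j) can be made concrete with a small host-side packing routine. The sketch below is an illustration only: pack_band is a hypothetical helper, not a MAGMA routine, it works on real double values rather than magmaFloatComplex to stay self-contained, and it uses plain int where MAGMA uses magma_int_t. Both arrays are column-major.

```c
#include <assert.h>

/* Hypothetical host-side helper (not a MAGMA routine).
 * Copies the band of a dense column-major m-by-n matrix A (leading
 * dimension lda) into band storage AB (leading dimension ldab), using the
 * 1-based mapping AB(kl+ku+1+i-j, j) = A(i,j) for
 * max(1,j-ku) <= i <= min(m,j+kl). All other entries of AB, including the
 * first kl rows reserved for fill-in, are left untouched. */
static void pack_band(int m, int n, int kl, int ku,
                      const double *A, int lda, double *AB, int ldab)
{
    assert(ldab >= 2 * kl + ku + 1);
    for (int j = 1; j <= n; ++j) {
        int ilo = (j - ku > 1) ? j - ku : 1;
        int ihi = (j + kl < m) ? j + kl : m;
        for (int i = ilo; i <= ihi; ++i) {
            int row = kl + ku + 1 + i - j;   /* 1-based row in AB */
            AB[(row - 1) + (j - 1) * ldab] = A[(i - 1) + (j - 1) * lda];
        }
    }
}
```

With M = N = 6, KL = 2, KU = 1, the diagonal entries land in row KL+KU+1 = 4 of AB and a12 lands in row 3, column 2, matching the layout of the example above.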

◆ magma_cgbtrf_batched_sliding_window_loopout()

magma_int_t magma_cgbtrf_batched_sliding_window_loopout ( magma_int_t m,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magmaFloatComplex ** dAB_array,
magma_int_t lddab,
magma_int_t ** ipiv_array,
magma_int_t * info_array,
void * device_work,
magma_int_t * lwork,
magma_int_t batchCount,
magma_queue_t queue )

CGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges.

This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.

This routine has shared memory requirements that may exceed the capacity of the GPU. In such a case, the routine exits immediately, returning a negative error code.

Parameters
[in]MINTEGER The number of rows of the matrix A. M >= 0.
[in]NINTEGER The number of columns of the matrix A. N >= 0.
[in]KLINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]KUINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in,out]dAB_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX array, dimension (LDDAB,N) On entry, the matrix AB in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl)

On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See below for further details.

[in]LDDABINTEGER The leading dimension of the array AB. LDDAB >= 2*KL+KU+1.
[out]dIPIV_arrayArray of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]dINFO_arrayINTEGER array, dimension (batchCount) Each is the INFO output for a given matrix
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value
  • > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in,out]device_workWorkspace, allocated on device memory by the user
[in,out]lworkINTEGER pointer The size of the workspace (device_work) in bytes
  • lwork[0] < 0: a workspace query is assumed, the routine calculates the required amount of workspace and returns it in lwork. The workspace is not referenced, and no computation is performed.
  • lwork[0] >= 0: the routine assumes that the user has provided a workspace with the size in lwork.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
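
The lwork convention described above implies a two-call pattern: query the size, allocate, then run. The sketch below illustrates only that protocol. demo_factor_work is a stand-in written for this example and is NOT a MAGMA function, its size formula is made up, and host malloc stands in for the device allocation that the real routine requires.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in routine for illustration only; it is NOT a MAGMA function.
 * It mimics the lwork convention: if lwork[0] < 0 it writes the required
 * size in bytes to lwork[0] and does no work; otherwise it checks that the
 * provided workspace is large enough. The size formula is hypothetical. */
static int demo_factor_work(int n, int batchCount, void *work, int *lwork)
{
    int required = n * batchCount * (int)sizeof(double);
    if (lwork[0] < 0) {
        lwork[0] = required;   /* workspace query: report size, do nothing */
        return 0;
    }
    if (lwork[0] < required || (required > 0 && work == NULL))
        return -1;             /* workspace missing or too small */
    /* ... the factorization would run here, using work ... */
    return 0;
}

/* Two-call pattern: query the size, allocate, then compute. */
static int run_with_workspace(int n, int batchCount)
{
    int lwork = -1;
    int status = demo_factor_work(n, batchCount, NULL, &lwork);  /* query */
    assert(status == 0 && lwork >= 0);

    void *work = malloc(lwork > 0 ? (size_t)lwork : 1);
    if (work == NULL) return -2;
    status = demo_factor_work(n, batchCount, work, &lwork);      /* compute */
    free(work);
    return status;
}
```

Keeping the query and the compute call on the same arguments ensures that the reported size matches the problem that is actually run.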

Further Details

The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:

    On entry:                        On exit:

     *    *    *    +    +    +       *    *    *   u14  u25  u36
     *    *    +    +    +    +       *    *   u13  u24  u35  u46
     *   a12  a23  a34  a45  a56      *   u12  u23  u34  u45  u56
    a11  a22  a33  a44  a55  a66     u11  u22  u33  u44  u55  u66
    a21  a32  a43  a54  a65   *      m21  m32  m43  m54  m65   *
    a31  a42  a53  a64   *    *      m31  m42  m53  m64   *    *

Array elements marked * are not used by the routine; elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.

◆ magma_cgbtrf_batched_sliding_window_loopin()

magma_int_t magma_cgbtrf_batched_sliding_window_loopin ( magma_int_t m,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magmaFloatComplex ** dAB_array,
magma_int_t lddab,
magma_int_t ** ipiv_array,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

CGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges.

This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.

This routine has shared memory requirements that may exceed the capacity of the GPU. In such a case, the routine exits immediately, returning a negative error code.

Parameters
[in]MINTEGER The number of rows of the matrix A. M >= 0.
[in]NINTEGER The number of columns of the matrix A. N >= 0.
[in]KLINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]KUINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in,out]dAB_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX array, dimension (LDDAB,N) On entry, the matrix AB in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl)

On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See below for further details.

[in]LDDABINTEGER The leading dimension of the array AB. LDDAB >= 2*KL+KU+1.
[out]dIPIV_arrayArray of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]dINFO_arrayINTEGER array, dimension (batchCount) Each is the INFO output for a given matrix
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value
  • > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

Further Details

The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:

    On entry:                        On exit:

     *    *    *    +    +    +       *    *    *   u14  u25  u36
     *    *    +    +    +    +       *    *   u13  u24  u35  u46
     *   a12  a23  a34  a45  a56      *   u12  u23  u34  u45  u56
    a11  a22  a33  a44  a55  a66     u11  u22  u33  u44  u55  u66
    a21  a32  a43  a54  a65   *      m21  m32  m43  m54  m65   *
    a31  a42  a53  a64   *    *      m31  m42  m53  m64   *    *

Array elements marked * are not used by the routine; elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.

◆ magma_cgetf2_nopiv_internal_batched()

magma_int_t magma_cgetf2_nopiv_internal_batched ( magma_int_t m,
magma_int_t n,
magmaFloatComplex ** dA_array,
magma_int_t ai,
magma_int_t aj,
magma_int_t ldda,
magma_int_t * info_array,
magma_int_t gbstep,
magma_int_t batchCount,
magma_queue_t queue )

cgetf2_nopiv computes the non-pivoting LU factorization of an M-by-N matrix A.

This routine can deal only with matrices of limited width, so it is intended for internal use.

The factorization has the form A = L * U where L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is a batched version that factors batchCount M-by-N matrices in parallel.

Parameters
[in]mINTEGER The number of rows of the matrix A. M >= 0.
[in]nINTEGER The number of columns of the matrix A. N >= 0.
[in,out]dA_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = L*U; the unit diagonal elements of L are not stored.
[in]aiINTEGER Row offset for dA_array.
[in]ajINTEGER Column offset for dA_array.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= max(1,M).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]gbstepINTEGER Internal use.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
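
For reference, the arithmetic of a non-pivoting LU factorization A = L * U can be sketched on the host as below. This is not MAGMA's GPU kernel: lu_nopiv is a hypothetical helper, it factors a single real column-major matrix rather than a batch of complex ones, and, as a simplification, it stops at the first exactly zero pivot instead of completing the factorization.

```c
#include <assert.h>

/* Hypothetical host-side reference (not MAGMA's GPU kernel).
 * In-place LU factorization without pivoting of a column-major m-by-n
 * real matrix A with leading dimension lda. On exit, U is stored on and
 * above the diagonal, and the multipliers of L below it; the unit diagonal
 * of L is not stored.
 * Returns 0 on success, or i > 0 (1-based) if U(i,i) is exactly zero, in
 * which case this sketch stops at column i. */
static int lu_nopiv(int m, int n, double *A, int lda)
{
    assert(m >= 0 && n >= 0 && lda >= (m > 1 ? m : 1));
    int k = (m < n) ? m : n;
    for (int j = 0; j < k; ++j) {
        double pivot = A[j + j * lda];
        if (pivot == 0.0)
            return j + 1;
        /* Scale the column below the diagonal to form the multipliers. */
        for (int i = j + 1; i < m; ++i)
            A[i + j * lda] /= pivot;
        /* Rank-1 update of the trailing submatrix. */
        for (int c = j + 1; c < n; ++c)
            for (int i = j + 1; i < m; ++i)
                A[i + c * lda] -= A[i + j * lda] * A[j + c * lda];
    }
    return 0;
}
```

Without row interchanges the factorization is only safe for matrices whose leading principal minors are well away from zero, for example diagonally dominant matrices.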

◆ magma_cgetrf_batched_smallsq_noshfl()

magma_int_t magma_cgetrf_batched_smallsq_noshfl ( magma_int_t n,
magmaFloatComplex ** dA_array,
magma_int_t ldda,
magma_int_t ** ipiv_array,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

cgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges.

This routine can deal only with square matrices of size up to 32.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount N-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

Parameters
[in]nINTEGER The size of each matrix A. N >= 0.
[in,out]dA_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDA,N). On entry, each pointer is an N-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= max(1,N).
[out]ipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N) The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

◆ magma_dgbsv_batched_fused_sm()

magma_int_t magma_dgbsv_batched_fused_sm ( magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magma_int_t nrhs,
double ** dA_array,
magma_int_t ldda,
magma_int_t ** ipiv_array,
double ** dB_array,
magma_int_t lddb,
magma_int_t * info_array,
magma_int_t nthreads,
magma_int_t ntcol,
magma_int_t batchCount,
magma_queue_t queue )

DGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.

The LU decomposition with partial pivoting and row interchanges is used to factor A as A = L * U, where L is a product of permutation and unit lower triangular matrices with KL subdiagonals, and U is upper triangular with KL+KU superdiagonals. The factored form of A is then used to solve the system of equations A * X = B.

This is the batched version of the routine.

Parameters
[in]nINTEGER The order of the matrix A. n >= 0.
[in]klINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]kuINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in]nrhsINTEGER The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0.
[in]dA_arrayArray of pointers, dimension (batchCount). Each contains the details of the LU factorization of the band matrix A, as computed by DGBTRF. U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= (2*KL+KU+1).
[in]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[in,out]dB_arrayArray of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array, dimension (LDDB,NRHS) On entry, the right hand side matrix B. On exit, the solution matrix X.
[in]lddbINTEGER The leading dimension of each array B. LDDB >= max(1, N).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
[in]nthreadsINTEGER The number of threads assigned to a single matrix. nthreads >= (KL+1)
[in]ntcolINTEGER The number of concurrent factorizations in a thread-block ntcol >= 1
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

◆ magma_dgbtrf_batched_fused_sm()

magma_int_t magma_dgbtrf_batched_fused_sm ( magma_int_t m,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
double ** dAB_array,
magma_int_t lddab,
magma_int_t ** ipiv_array,
magma_int_t * info_array,
magma_int_t nthreads,
magma_int_t ntcol,
magma_int_t batchCount,
magma_queue_t queue )

DGBTRF computes an LU factorization of a DOUBLE PRECISION m-by-n band matrix A using partial pivoting with row interchanges.

This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.

This routine has shared memory requirements that may exceed the capacity of the GPU. In such a case, the routine exits immediately, returning a negative error code.

Parameters
[in]MINTEGER The number of rows of the matrix A. M >= 0.
[in]NINTEGER The number of columns of the matrix A. N >= 0.
[in]KLINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]KUINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in,out]dAB_arrayArray of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array, dimension (LDDAB,N) On entry, the matrix AB in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl)

On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See below for further details.

[in]LDDABINTEGER The leading dimension of the array AB. LDDAB >= 2*KL+KU+1.
[out]dIPIV_arrayArray of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]dINFO_arrayINTEGER array, dimension (batchCount) Each is the INFO output for a given matrix
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value
  • > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]nthreadsINTEGER The number of threads assigned to a single matrix. nthreads >= (KL+1)
[in]ntcolINTEGER The number of concurrent factorizations in a thread-block ntcol >= 1
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

Further Details

The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:

    On entry:                        On exit:

     *    *    *    +    +    +       *    *    *   u14  u25  u36
     *    *    +    +    +    +       *    *   u13  u24  u35  u46
     *   a12  a23  a34  a45  a56      *   u12  u23  u34  u45  u56
    a11  a22  a33  a44  a55  a66     u11  u22  u33  u44  u55  u66
    a21  a32  a43  a54  a65   *      m21  m32  m43  m54  m65   *
    a31  a42  a53  a64   *    *      m31  m42  m53  m64   *    *

Array elements marked * are not used by the routine, but may be set to zero after completion. Elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.

◆ magma_dgbtrf_batched_sliding_window_loopout()

magma_int_t magma_dgbtrf_batched_sliding_window_loopout ( magma_int_t m,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
double ** dAB_array,
magma_int_t lddab,
magma_int_t ** ipiv_array,
magma_int_t * info_array,
void * device_work,
magma_int_t * lwork,
magma_int_t batchCount,
magma_queue_t queue )

DGBTRF computes an LU factorization of a DOUBLE PRECISION m-by-n band matrix A using partial pivoting with row interchanges.

This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.

This routine has shared memory requirements that may exceed the capacity of the GPU. In such a case, the routine exits immediately, returning a negative error code.

Parameters
[in]MINTEGER The number of rows of the matrix A. M >= 0.
[in]NINTEGER The number of columns of the matrix A. N >= 0.
[in]KLINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]KUINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in,out]dAB_arrayArray of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array, dimension (LDDAB,N) On entry, the matrix AB in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl)

On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See below for further details.

[in]LDDABINTEGER The leading dimension of the array AB. LDDAB >= 2*KL+KU+1.
[out]dIPIV_arrayArray of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]dINFO_arrayINTEGER array, dimension (batchCount) Each is the INFO output for a given matrix
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value
  • > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in,out]device_workWorkspace, allocated on device memory by the user
[in,out]lworkINTEGER pointer The size of the workspace (device_work) in bytes
  • lwork[0] < 0: a workspace query is assumed, the routine calculates the required amount of workspace and returns it in lwork. The workspace is not referenced, and no computation is performed.
  • lwork[0] >= 0: the routine assumes that the user has provided a workspace with the size in lwork.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

Further Details

The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:

    On entry:                        On exit:

     *    *    *    +    +    +       *    *    *   u14  u25  u36
     *    *    +    +    +    +       *    *   u13  u24  u35  u46
     *   a12  a23  a34  a45  a56      *   u12  u23  u34  u45  u56
    a11  a22  a33  a44  a55  a66     u11  u22  u33  u44  u55  u66
    a21  a32  a43  a54  a65   *      m21  m32  m43  m54  m65   *
    a31  a42  a53  a64   *    *      m31  m42  m53  m64   *    *

Array elements marked * are not used by the routine; elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.

◆ magma_dgbtrf_batched_sliding_window_loopin()

magma_int_t magma_dgbtrf_batched_sliding_window_loopin ( magma_int_t m,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
double ** dAB_array,
magma_int_t lddab,
magma_int_t ** ipiv_array,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

DGBTRF computes an LU factorization of a DOUBLE PRECISION m-by-n band matrix A using partial pivoting with row interchanges.

This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.

This routine has shared memory requirements that may exceed the capacity of the GPU. In such a case, the routine exits immediately, returning a negative error code.

Parameters
[in] M INTEGER. The number of rows of the matrix A. M >= 0.
[in] N INTEGER. The number of columns of the matrix A. N >= 0.
[in] KL INTEGER. The number of subdiagonals within the band of A. KL >= 0.
[in] KU INTEGER. The number of superdiagonals within the band of A. KU >= 0.
[in,out] dAB_array Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array, dimension (LDDAB,N). On entry, the matrix AB in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku) <= i <= min(m,j+kl). On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See below for further details.
[in] LDDAB INTEGER. The leading dimension of each array AB. LDDAB >= 2*KL+KU+1.
[out] dIPIV_array Array of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out] dINFO_array INTEGER array, dimension (batchCount). Each entry is the INFO output for the corresponding matrix.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value
  • > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in] batchCount INTEGER. The number of matrices to operate on.
[in] queue magma_queue_t. Queue to execute in.

Further Details

The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:

    On entry:                        On exit:

     *    *    *    +    +    +       *    *    *   u14  u25  u36
     *    *    +    +    +    +       *    *   u13  u24  u35  u46
     *   a12  a23  a34  a45  a56      *   u12  u23  u34  u45  u56
    a11  a22  a33  a44  a55  a66     u11  u22  u33  u44  u55  u66
    a21  a32  a43  a54  a65   *      m21  m32  m43  m54  m65   *
    a31  a42  a53  a64   *    *      m31  m42  m53  m64   *    *

Array elements marked * are not used by the routine; elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.
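
The mapping AB(kl+ku+1+i-j,j) = A(i,j) can be sketched on the host in plain C. This is only an illustration under stated assumptions: the helpers `band_index` and `pack_band` are hypothetical names, not MAGMA routines, and they assume column-major storage with 0-based C indices (so the 1-based row kl+ku+1+i-j becomes kl+ku+i-j).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a MAGMA routine): 0-based offset of A(i,j) inside
 * a column-major band array with leading dimension lddab. i and j are 0-based. */
static size_t band_index(int i, int j, int kl, int ku, int lddab)
{
    return (size_t)(kl + ku + i - j) + (size_t)j * (size_t)lddab;
}

/* Copy the band of a dense column-major m-by-n matrix A (leading dimension
 * lda) into AB. Entries outside the band, including the kl fill-in rows at
 * the top of AB, are left untouched. */
static void pack_band(int m, int n, int kl, int ku,
                      const double *A, int lda, double *AB, int lddab)
{
    assert(lddab >= 2 * kl + ku + 1);
    for (int j = 0; j < n; ++j) {
        int ilo = (j - ku > 0) ? j - ku : 0;          /* max(0, j-ku)   */
        int ihi = (j + kl < m - 1) ? j + kl : m - 1;  /* min(m-1, j+kl) */
        for (int i = ilo; i <= ihi; ++i)
            AB[band_index(i, j, kl, ku, lddab)] = A[(size_t)i + (size_t)j * (size_t)lda];
    }
}
```

For the M = N = 6, KL = 2, KU = 1 example above with lddab = 6, a11 lands in the fourth row of the first column, which matches the diagram.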

◆ magma_dgetf2_nopiv_internal_batched()

magma_int_t magma_dgetf2_nopiv_internal_batched ( magma_int_t m,
magma_int_t n,
double ** dA_array,
magma_int_t ai,
magma_int_t aj,
magma_int_t ldda,
magma_int_t * info_array,
magma_int_t gbstep,
magma_int_t batchCount,
magma_queue_t queue )

dgetf2_nopiv computes the non-pivoting LU factorization of an M-by-N matrix A.

This routine handles only matrices of limited width, so it is intended for internal use.

The factorization has the form A = L * U where L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is a batched version that factors batchCount M-by-N matrices in parallel.

Parameters
[in] m INTEGER. The number of rows of the matrix A. M >= 0.
[in] n INTEGER. The number of columns of the matrix A. N >= 0.
[in,out] dA_array Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = L*U; the unit diagonal elements of L are not stored.
[in] ai INTEGER. Row offset for dA_array.
[in] aj INTEGER. Column offset for dA_array.
[in] ldda INTEGER. The leading dimension of each array A. LDDA >= max(1,M).
[out] info_array Array of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a memory allocation failure.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in] gbstep INTEGER. Internal use.
[in] batchCount INTEGER. The number of matrices to operate on.
[in] queue magma_queue_t. Queue to execute in.
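
As a reference for what a non-pivoting factorization A = L*U computes for a single matrix, here is a minimal host-side sketch. The function `lu_nopiv` is a hypothetical name, not part of MAGMA, and it assumes column-major storage. How it handles a zero pivot (record the first one, skip the scaling, keep updating) is modelled on the unblocked LAPACK approach and is an assumption about, not a description of, the GPU kernel.

```c
#include <assert.h>

/* Hypothetical sketch (not a MAGMA routine): unblocked LU without pivoting
 * on a column-major m-by-n matrix A with leading dimension lda. On return the
 * strictly lower part holds L (unit diagonal not stored) and the upper part
 * holds U. Returns 0, or the 1-based index of the first exactly zero U(k,k). */
static int lu_nopiv(int m, int n, double *A, int lda)
{
    int info = 0;
    int kmax = (m < n) ? m : n;
    for (int k = 0; k < kmax; ++k) {
        double pivot = A[k + k * lda];
        if (pivot != 0.0) {
            for (int i = k + 1; i < m; ++i)
                A[i + k * lda] /= pivot;           /* column k of L */
        } else if (info == 0) {
            info = k + 1;                          /* first zero pivot */
        }
        for (int j = k + 1; j < n; ++j)            /* rank-1 trailing update */
            for (int i = k + 1; i < m; ++i)
                A[i + j * lda] -= A[i + k * lda] * A[k + j * lda];
    }
    return info;
}
```

Because no rows are exchanged, a small or zero diagonal entry is used as-is, which is why a positive INFO value matters before the factors are used in a solve.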

◆ magma_dgetrf_batched_smallsq_noshfl()

magma_int_t magma_dgetrf_batched_smallsq_noshfl ( magma_int_t n,
double ** dA_array,
magma_int_t ldda,
magma_int_t ** ipiv_array,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

dgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges.

This routine handles only square matrices of size up to 32.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements, and U is upper triangular.

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount N-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

Parameters
[in] n INTEGER. The size of each matrix A. N >= 0.
[in,out] dA_array Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDA,N). On entry, each pointer is an N-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in] ldda INTEGER. The leading dimension of each array A. LDDA >= max(1,N).
[out] ipiv_array Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i).
[out] info_array Array of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a memory allocation failure.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in] batchCount INTEGER. The number of matrices to operate on.
[in] queue magma_queue_t. Queue to execute in.
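
The pivot convention ("row i of the matrix was interchanged with row IPIV(i)") means the interchanges must be replayed in order, i = 1, 2, ..., N. The sketch below applies them to a single right-hand-side vector on the host. `apply_pivots` is a hypothetical helper, not a MAGMA routine, and it assumes the 1-based pivot indices described above have already been copied to host memory.

```c
#include <assert.h>

/* Hypothetical helper (not a MAGMA routine): apply the row interchanges
 * recorded in ipiv (1-based indices) to a vector b of length n, in the same
 * order the factorization performed them. */
static void apply_pivots(int n, const int *ipiv, double *b)
{
    for (int i = 0; i < n; ++i) {
        int p = ipiv[i] - 1;      /* convert 1-based pivot row to 0-based */
        if (p != i) {
            double tmp = b[i];
            b[i] = b[p];
            b[p] = tmp;
        }
    }
}
```

Note that ipiv is a sequence of swaps, not a permutation vector: with ipiv = {3, 3, 3} the vector {10, 20, 30} becomes {30, 10, 20}.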

◆ magma_sgbsv_batched_fused_sm()

magma_int_t magma_sgbsv_batched_fused_sm ( magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magma_int_t nrhs,
float ** dA_array,
magma_int_t ldda,
magma_int_t ** ipiv_array,
float ** dB_array,
magma_int_t lddb,
magma_int_t * info_array,
magma_int_t nthreads,
magma_int_t ntcol,
magma_int_t batchCount,
magma_queue_t queue )

SGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.

The LU decomposition with partial pivoting and row interchanges is used to factor A as A = L * U, where L is a product of permutation and unit lower triangular matrices with KL subdiagonals, and U is upper triangular with KL+KU superdiagonals. The factored form of A is then used to solve the system of equations A * X = B.

This is the batched version of the routine.

Parameters
[in] n INTEGER. The order of the matrix A. N >= 0.
[in] kl INTEGER. The number of subdiagonals within the band of A. KL >= 0.
[in] ku INTEGER. The number of superdiagonals within the band of A. KU >= 0.
[in] nrhs INTEGER. The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0.
[in] dA_array Array of pointers, dimension (batchCount). Each contains the details of the LU factorization of the band matrix A, as computed by SGBTRF. U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1.
[in] ldda INTEGER. The leading dimension of each array A. LDDA >= (2*KL+KU+1).
[in] dipiv_array Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i).
[in,out] dB_array Array of pointers, dimension (batchCount). Each is a REAL array, dimension (LDDB,NRHS). On entry, the right hand side matrix B. On exit, the solution matrix X.
[in] lddb INTEGER. The leading dimension of each array B. LDDB >= max(1,N).
[out] info_array Array of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a memory allocation failure.
[in] nthreads INTEGER. The number of threads assigned to a single matrix. nthreads >= (KL+1).
[in] ntcol INTEGER. The number of concurrent factorizations in a thread block. ntcol >= 1.
[in] batchCount INTEGER. The number of matrices to operate on.
[in] queue magma_queue_t. Queue to execute in.
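
The "factor, then solve" structure described above can be illustrated with a small dense solver in which the forward substitution is carried out while the matrix is being factored. This is a sketch only: `lu_solve_pp` is a hypothetical name, it works on one dense column-major matrix and one right-hand side on the host, and it ignores the band structure and the GPU execution model entirely.

```c
#include <assert.h>

static double abs_val(double x) { return x < 0.0 ? -x : x; }

/* Hypothetical sketch (not a MAGMA routine): solve A*x = b for a dense
 * column-major n-by-n matrix using LU with partial pivoting. A is overwritten
 * by its factors, ipiv receives 1-based pivot rows, and b is overwritten by x.
 * Returns 0, or k (1-based) if no nonzero pivot exists in column k. */
static int lu_solve_pp(int n, double *A, int lda, int *ipiv, double *b)
{
    for (int k = 0; k < n; ++k) {
        int p = k;                                   /* pivot search */
        for (int i = k + 1; i < n; ++i)
            if (abs_val(A[i + k * lda]) > abs_val(A[p + k * lda]))
                p = i;
        ipiv[k] = p + 1;
        if (A[p + k * lda] == 0.0)
            return k + 1;
        if (p != k) {                                /* swap rows k and p */
            for (int j = 0; j < n; ++j) {
                double t = A[k + j * lda];
                A[k + j * lda] = A[p + j * lda];
                A[p + j * lda] = t;
            }
            double t = b[k]; b[k] = b[p]; b[p] = t;
        }
        for (int i = k + 1; i < n; ++i) {
            double l = A[i + k * lda] / A[k + k * lda];
            A[i + k * lda] = l;                      /* multiplier */
            for (int j = k + 1; j < n; ++j)
                A[i + j * lda] -= l * A[k + j * lda];
            b[i] -= l * b[k];                        /* forward substitution */
        }
    }
    for (int i = n - 1; i >= 0; --i) {               /* back substitution */
        double s = b[i];
        for (int j = i + 1; j < n; ++j)
            s -= A[i + j * lda] * b[j];
        b[i] = s / A[i + i * lda];
    }
    return 0;
}
```

In the banded case the same steps apply, but each elimination step touches at most KL rows below the diagonal and KL+KU columns to its right.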

◆ magma_sgbtrf_batched_fused_sm()

magma_int_t magma_sgbtrf_batched_fused_sm ( magma_int_t m,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
float ** dAB_array,
magma_int_t lddab,
magma_int_t ** ipiv_array,
magma_int_t * info_array,
magma_int_t nthreads,
magma_int_t ntcol,
magma_int_t batchCount,
magma_queue_t queue )

SGBTRF computes an LU factorization of a REAL m-by-n band matrix A using partial pivoting with row interchanges.

This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.

This routine has shared memory requirements that may exceed the capacity of the GPU. In such a case, the routine exits immediately, returning a negative error code.

Parameters
[in] M INTEGER. The number of rows of the matrix A. M >= 0.
[in] N INTEGER. The number of columns of the matrix A. N >= 0.
[in] KL INTEGER. The number of subdiagonals within the band of A. KL >= 0.
[in] KU INTEGER. The number of superdiagonals within the band of A. KU >= 0.
[in,out] dAB_array Array of pointers, dimension (batchCount). Each is a REAL array, dimension (LDDAB,N). On entry, the matrix AB in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku) <= i <= min(m,j+kl). On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See below for further details.
[in] LDDAB INTEGER. The leading dimension of each array AB. LDDAB >= 2*KL+KU+1.
[out] dIPIV_array Array of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out] dINFO_array INTEGER array, dimension (batchCount). Each entry is the INFO output for the corresponding matrix.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value
  • > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in] nthreads INTEGER. The number of threads assigned to a single matrix. nthreads >= (KL+1).
[in] ntcol INTEGER. The number of concurrent factorizations in a thread block. ntcol >= 1.
[in] batchCount INTEGER. The number of matrices to operate on.
[in] queue magma_queue_t. Queue to execute in.

Further Details

The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:

    On entry:                        On exit:

     *    *    *    +    +    +       *    *    *   u14  u25  u36
     *    *    +    +    +    +       *    *   u13  u24  u35  u46
     *   a12  a23  a34  a45  a56      *   u12  u23  u34  u45  u56
    a11  a22  a33  a44  a55  a66     u11  u22  u33  u44  u55  u66
    a21  a32  a43  a54  a65   *      m21  m32  m43  m54  m65   *
    a31  a42  a53  a64   *    *      m31  m42  m53  m64   *    *

Array elements marked * are not used by the routine, but may be set to zero after completion. Elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.
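
Reading the "On exit" layout back can be sketched with two small accessors. Both are hypothetical helpers, not MAGMA routines; they assume the factored array AB has been copied to host memory, is stored column-major, and is addressed with 0-based i and j.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical accessor (not a MAGMA routine): U(i,j) from a factored band
 * array. Entries outside the stored band of U, which has KL+KU
 * superdiagonals, are reported as zero. */
static float band_get_u(const float *AB, int lddab, int kl, int ku, int i, int j)
{
    if (i > j || j - i > kl + ku)
        return 0.0f;
    return AB[(size_t)(kl + ku + i - j) + (size_t)j * (size_t)lddab];
}

/* Hypothetical accessor: the multiplier stored below the diagonal at row i,
 * column j (j < i <= j+kl). Because of the row interchanges, these values are
 * not simply the entries of one lower triangular matrix L. */
static float band_get_multiplier(const float *AB, int lddab, int kl, int ku, int i, int j)
{
    if (i <= j || i - j > kl)
        return 0.0f;
    return AB[(size_t)(kl + ku + i - j) + (size_t)j * (size_t)lddab];
}
```

In the 6-by-6 example, u14 is read from the first row of the fourth column and m21 from the fifth row of the first column, as in the diagram.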

◆ magma_sgbtrf_batched_sliding_window_loopout()

magma_int_t magma_sgbtrf_batched_sliding_window_loopout ( magma_int_t m,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
float ** dAB_array,
magma_int_t lddab,
magma_int_t ** ipiv_array,
magma_int_t * info_array,
void * device_work,
magma_int_t * lwork,
magma_int_t batchCount,
magma_queue_t queue )

SGBTRF computes an LU factorization of a REAL m-by-n band matrix A using partial pivoting with row interchanges.

This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.

This routine has shared memory requirements that may exceed the capacity of the GPU. In such a case, the routine exits immediately, returning a negative error code.

Parameters
[in] M INTEGER. The number of rows of the matrix A. M >= 0.
[in] N INTEGER. The number of columns of the matrix A. N >= 0.
[in] KL INTEGER. The number of subdiagonals within the band of A. KL >= 0.
[in] KU INTEGER. The number of superdiagonals within the band of A. KU >= 0.
[in,out] dAB_array Array of pointers, dimension (batchCount). Each is a REAL array, dimension (LDDAB,N). On entry, the matrix AB in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku) <= i <= min(m,j+kl). On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See below for further details.
[in] LDDAB INTEGER. The leading dimension of each array AB. LDDAB >= 2*KL+KU+1.
[out] dIPIV_array Array of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out] dINFO_array INTEGER array, dimension (batchCount). Each entry is the INFO output for the corresponding matrix.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value
  • > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in,out] device_work Workspace, allocated in device memory by the user.
[in,out] lwork INTEGER pointer. The size of the workspace (device_work) in bytes.
  • lwork[0] < 0: a workspace query is assumed; the routine calculates the required amount of workspace and returns it in lwork. The workspace is not referenced, and no computation is performed.
  • lwork[0] >= 0: the routine assumes that the user has provided a workspace with the size in lwork.
[in] batchCount INTEGER. The number of matrices to operate on.
[in] queue magma_queue_t. Queue to execute in.

Further Details

The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:

    On entry:                        On exit:

     *    *    *    +    +    +       *    *    *   u14  u25  u36
     *    *    +    +    +    +       *    *   u13  u24  u35  u46
     *   a12  a23  a34  a45  a56      *   u12  u23  u34  u45  u56
    a11  a22  a33  a44  a55  a66     u11  u22  u33  u44  u55  u66
    a21  a32  a43  a54  a65   *      m21  m32  m43  m54  m65   *
    a31  a42  a53  a64   *    *      m31  m42  m53  m64   *    *

Array elements marked * are not used by the routine; elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.
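
The lwork convention implies a two-call pattern: query the size with lwork[0] < 0, allocate, then call again with the real workspace. The sketch below shows that pattern against a stand-in routine. Everything in it is an assumption made for illustration: `toy_factor_work` is not a MAGMA function, the size it reports is made up, its "too small" error code is invented, and host `malloc` stands in for a device allocation.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for a *_work routine. NOT a MAGMA function; it only mimics the
 * lwork convention described above. The required size is invented. */
static int toy_factor_work(int n, int batchCount, void *work, int *lwork)
{
    int required = n * batchCount * (int)sizeof(int);
    if (lwork[0] < 0) {            /* workspace query: report size, do nothing */
        lwork[0] = required;
        return 0;
    }
    if (lwork[0] < required)       /* assumption: reject a too-small workspace */
        return -4;                 /* lwork is the 4th argument of this toy */
    int *scratch = (int *)work;    /* placeholder for the real computation */
    for (int k = 0; k < n * batchCount; ++k)
        scratch[k] = k;
    return 0;
}

/* Two-call pattern: query, allocate, compute, release. */
static int toy_factor(int n, int batchCount)
{
    int lwork[1] = { -1 };
    int info = toy_factor_work(n, batchCount, NULL, lwork);
    if (info != 0)
        return info;
    void *work = malloc(lwork[0] > 0 ? (size_t)lwork[0] : 1);
    if (work == NULL)
        return -100;               /* arbitrary code for allocation failure */
    info = toy_factor_work(n, batchCount, work, lwork);
    free(work);
    return info;
}
```

With the actual routine, the buffer passed as device_work would have to be device memory, and the same buffer can be reused across calls as long as the queried size does not grow.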

◆ magma_sgbtrf_batched_sliding_window_loopin()

magma_int_t magma_sgbtrf_batched_sliding_window_loopin ( magma_int_t m,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
float ** dAB_array,
magma_int_t lddab,
magma_int_t ** ipiv_array,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

SGBTRF computes an LU factorization of a REAL m-by-n band matrix A using partial pivoting with row interchanges.

This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.

This routine has shared memory requirements that may exceed the capacity of the GPU. In such a case, the routine exits immediately, returning a negative error code.

Parameters
[in] M INTEGER. The number of rows of the matrix A. M >= 0.
[in] N INTEGER. The number of columns of the matrix A. N >= 0.
[in] KL INTEGER. The number of subdiagonals within the band of A. KL >= 0.
[in] KU INTEGER. The number of superdiagonals within the band of A. KU >= 0.
[in,out] dAB_array Array of pointers, dimension (batchCount). Each is a REAL array, dimension (LDDAB,N). On entry, the matrix AB in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku) <= i <= min(m,j+kl). On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See below for further details.
[in] LDDAB INTEGER. The leading dimension of each array AB. LDDAB >= 2*KL+KU+1.
[out] dIPIV_array Array of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out] dINFO_array INTEGER array, dimension (batchCount). Each entry is the INFO output for the corresponding matrix.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value
  • > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in] batchCount INTEGER. The number of matrices to operate on.
[in] queue magma_queue_t. Queue to execute in.

Further Details

The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:

    On entry:                        On exit:

     *    *    *    +    +    +       *    *    *   u14  u25  u36
     *    *    +    +    +    +       *    *   u13  u24  u35  u46
     *   a12  a23  a34  a45  a56      *   u12  u23  u34  u45  u56
    a11  a22  a33  a44  a55  a66     u11  u22  u33  u44  u55  u66
    a21  a32  a43  a54  a65   *      m21  m32  m43  m54  m65   *
    a31  a42  a53  a64   *    *      m31  m42  m53  m64   *    *

Array elements marked * are not used by the routine; elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.
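
The batched interface takes an array of pointers with one entry per matrix. When all matrices of a batch live in one contiguous buffer, that array can be filled with a fixed stride, as in the sketch below. `set_batch_pointers` is a hypothetical host-side helper, not a MAGMA routine; in an actual call the matrices must be in device memory, and the pointer array itself is typically expected to be there too (an assumption here that should be checked against the MAGMA documentation).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a MAGMA routine): make ptrs[k] point at the k-th
 * band matrix inside one contiguous buffer. Each matrix occupies
 * lddab * n elements. */
static void set_batch_pointers(float **ptrs, float *base,
                               int lddab, int n, int batchCount)
{
    size_t stride = (size_t)lddab * (size_t)n;
    for (int k = 0; k < batchCount; ++k)
        ptrs[k] = base + (size_t)k * stride;
}
```

For the 6-by-6 example with lddab = 6, consecutive matrices are 36 elements apart.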

◆ magma_sgetf2_nopiv_internal_batched()

magma_int_t magma_sgetf2_nopiv_internal_batched ( magma_int_t m,
magma_int_t n,
float ** dA_array,
magma_int_t ai,
magma_int_t aj,
magma_int_t ldda,
magma_int_t * info_array,
magma_int_t gbstep,
magma_int_t batchCount,
magma_queue_t queue )

sgetf2_nopiv computes the non-pivoting LU factorization of an M-by-N matrix A.

This routine handles only matrices of limited width, so it is intended for internal use.

The factorization has the form A = L * U where L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is a batched version that factors batchCount M-by-N matrices in parallel.

Parameters
[in] m INTEGER. The number of rows of the matrix A. M >= 0.
[in] n INTEGER. The number of columns of the matrix A. N >= 0.
[in,out] dA_array Array of pointers, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = L*U; the unit diagonal elements of L are not stored.
[in] ai INTEGER. Row offset for dA_array.
[in] aj INTEGER. Column offset for dA_array.
[in] ldda INTEGER. The leading dimension of each array A. LDDA >= max(1,M).
[out] info_array Array of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a memory allocation failure.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in] gbstep INTEGER. Internal use.
[in] batchCount INTEGER. The number of matrices to operate on.
[in] queue magma_queue_t. Queue to execute in.
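
One plausible reading of the ai and aj offsets is that the routine works on the submatrix whose top-left element sits ai rows and aj columns into each matrix. Under that assumption (0-based offsets, column-major storage with leading dimension ldda), the start of the submatrix can be computed as below. `submatrix_ptr` is a hypothetical helper for illustration, not a MAGMA routine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a MAGMA routine): pointer to element (ai, aj) of a
 * column-major matrix with leading dimension ldda, using 0-based offsets. */
static float *submatrix_ptr(float *A, int ai, int aj, int ldda)
{
    return A + (size_t)ai + (size_t)aj * (size_t)ldda;
}
```

The leading dimension is unchanged for the submatrix, which is why the routine takes ldda separately from m and n.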

◆ magma_sgetrf_batched_smallsq_noshfl()

magma_int_t magma_sgetrf_batched_smallsq_noshfl ( magma_int_t n,
float ** dA_array,
magma_int_t ldda,
magma_int_t ** ipiv_array,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

sgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges.

This routine handles only square matrices of size up to 32.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements, and U is upper triangular.

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount N-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

Parameters
[in] n INTEGER. The size of each matrix A. N >= 0.
[in,out] dA_array Array of pointers, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDA,N). On entry, each pointer is an N-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in] ldda INTEGER. The leading dimension of each array A. LDDA >= max(1,N).
[out] ipiv_array Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i).
[out] info_array Array of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a memory allocation failure.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in] batchCount INTEGER. The number of matrices to operate on.
[in] queue magma_queue_t. Queue to execute in.
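
Since every matrix in the batch reports its own INFO value, callers usually need to scan info_array after the call. A minimal sketch of such a scan follows. The type `info_summary` and the function `summarize_info` are hypothetical names, not part of MAGMA, and the sketch assumes info_array has already been copied from the device to host memory.

```c
#include <assert.h>

/* Hypothetical summary of a batch of INFO values (not a MAGMA type). */
typedef struct {
    int succeeded;   /* entries equal to 0 */
    int failed;      /* entries < 0: illegal argument or other error */
    int singular;    /* entries > 0: exact zero on the diagonal of U */
    int first_bad;   /* index of the first nonzero entry, or -1 if none */
} info_summary;

static info_summary summarize_info(const int *info, int batchCount)
{
    info_summary s = { 0, 0, 0, -1 };
    for (int k = 0; k < batchCount; ++k) {
        if (info[k] == 0)
            s.succeeded++;
        else if (info[k] < 0)
            s.failed++;
        else
            s.singular++;
        if (info[k] != 0 && s.first_bad < 0)
            s.first_bad = k;
    }
    return s;
}
```

A positive entry does not invalidate the other matrices in the batch; only the factors of that one matrix should not be used for a solve.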

◆ magma_zgbsv_batched_fused_sm()

magma_int_t magma_zgbsv_batched_fused_sm ( magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magma_int_t nrhs,
magmaDoubleComplex ** dA_array,
magma_int_t ldda,
magma_int_t ** ipiv_array,
magmaDoubleComplex ** dB_array,
magma_int_t lddb,
magma_int_t * info_array,
magma_int_t nthreads,
magma_int_t ntcol,
magma_int_t batchCount,
magma_queue_t queue )

ZGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.

The LU decomposition with partial pivoting and row interchanges is used to factor A as A = L * U, where L is a product of permutation and unit lower triangular matrices with KL subdiagonals, and U is upper triangular with KL+KU superdiagonals. The factored form of A is then used to solve the system of equations A * X = B.

This is the batched version of the routine.

Parameters
[in] n INTEGER. The order of the matrix A. N >= 0.
[in] kl INTEGER. The number of subdiagonals within the band of A. KL >= 0.
[in] ku INTEGER. The number of superdiagonals within the band of A. KU >= 0.
[in] nrhs INTEGER. The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0.
[in] dA_array Array of pointers, dimension (batchCount). Each contains the details of the LU factorization of the band matrix A, as computed by ZGBTRF. U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1.
[in] ldda INTEGER. The leading dimension of each array A. LDDA >= (2*KL+KU+1).
[in] dipiv_array Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i).
[in,out] dB_array Array of pointers, dimension (batchCount). Each is a COMPLEX*16 array, dimension (LDDB,NRHS). On entry, the right hand side matrix B. On exit, the solution matrix X.
[in] lddb INTEGER. The leading dimension of each array B. LDDB >= max(1,N).
[out] info_array Array of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a memory allocation failure.
[in] nthreads INTEGER. The number of threads assigned to a single matrix. nthreads >= (KL+1).
[in] ntcol INTEGER. The number of concurrent factorizations in a thread block. ntcol >= 1.
[in] batchCount INTEGER. The number of matrices to operate on.
[in] queue magma_queue_t. Queue to execute in.
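
The documented constraints on the scalar arguments can be checked before the call. The sketch below collects them in one place and reports the first violation as -i, where i is the 1-based position of the argument in the prototype above. `check_gbsv_fused_args` is a hypothetical helper, not a MAGMA routine; the order of the checks and the exact codes the library itself reports are assumptions.

```c
#include <assert.h>

/* Hypothetical checker (not a MAGMA routine). Returns 0 if the scalar
 * arguments satisfy the documented constraints, otherwise -i for the first
 * offending argument, counted by its position in the prototype:
 * n=1, kl=2, ku=3, nrhs=4, ldda=6, lddb=9, nthreads=11, ntcol=12. */
static int check_gbsv_fused_args(int n, int kl, int ku, int nrhs,
                                 int ldda, int lddb,
                                 int nthreads, int ntcol)
{
    if (n < 0)                    return -1;
    if (kl < 0)                   return -2;
    if (ku < 0)                   return -3;
    if (nrhs < 0)                 return -4;
    if (ldda < 2 * kl + ku + 1)   return -6;
    if (lddb < (n > 1 ? n : 1))   return -9;
    if (nthreads < kl + 1)        return -11;
    if (ntcol < 1)                return -12;
    return 0;
}
```

For n = 6, kl = 2, ku = 1 the smallest admissible values are ldda = 6, lddb = 6, nthreads = 3 and ntcol = 1.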

◆ magma_zgbtrf_batched_fused_sm()

magma_int_t magma_zgbtrf_batched_fused_sm ( magma_int_t m,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magmaDoubleComplex ** dAB_array,
magma_int_t lddab,
magma_int_t ** ipiv_array,
magma_int_t * info_array,
magma_int_t nthreads,
magma_int_t ntcol,
magma_int_t batchCount,
magma_queue_t queue )

ZGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges.

This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.

This routine has shared memory requirements that may exceed the capacity of the GPU. In such a case, the routine exits immediately, returning a negative error code.

Parameters
[in] M INTEGER. The number of rows of the matrix A. M >= 0.
[in] N INTEGER. The number of columns of the matrix A. N >= 0.
[in] KL INTEGER. The number of subdiagonals within the band of A. KL >= 0.
[in] KU INTEGER. The number of superdiagonals within the band of A. KU >= 0.
[in,out] dAB_array Array of pointers, dimension (batchCount). Each is a COMPLEX_16 array, dimension (LDDAB,N). On entry, the matrix AB in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku) <= i <= min(m,j+kl). On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See below for further details.
[in] LDDAB INTEGER. The leading dimension of each array AB. LDDAB >= 2*KL+KU+1.
[out] dIPIV_array Array of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out] dINFO_array INTEGER array, dimension (batchCount). Each entry is the INFO output for the corresponding matrix.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value
  • > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in] nthreads INTEGER. The number of threads assigned to a single matrix. nthreads >= (KL+1).
[in] ntcol INTEGER. The number of concurrent factorizations in a thread block. ntcol >= 1.
[in] batchCount INTEGER. The number of matrices to operate on.
[in] queue magma_queue_t. Queue to execute in.

Further Details

The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:

    On entry:                        On exit:

     *    *    *    +    +    +       *    *    *   u14  u25  u36
     *    *    +    +    +    +       *    *   u13  u24  u35  u46
     *   a12  a23  a34  a45  a56      *   u12  u23  u34  u45  u56
    a11  a22  a33  a44  a55  a66     u11  u22  u33  u44  u55  u66
    a21  a32  a43  a54  a65   *      m21  m32  m43  m54  m65   *
    a31  a42  a53  a64   *    *      m31  m42  m53  m64   *    *

Array elements marked * are not used by the routine, but may be set to zero after completion. Elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.

◆ magma_zgbtrf_batched_sliding_window_loopout()

magma_int_t magma_zgbtrf_batched_sliding_window_loopout ( magma_int_t m,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magmaDoubleComplex ** dAB_array,
magma_int_t lddab,
magma_int_t ** ipiv_array,
magma_int_t * info_array,
void * device_work,
magma_int_t * lwork,
magma_int_t batchCount,
magma_queue_t queue )

ZGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges.

This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.

This routine has shared memory requirements that may exceed the capacity of the GPU. In such a case, the routine exits immediately, returning a negative error code.

Parameters
[in]MINTEGER The number of rows of the matrix A. M >= 0.
[in]NINTEGER The number of columns of the matrix A. N >= 0.
[in]KLINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]KUINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in,out]dAB_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX_16 array, dimension (LDDAB,N) On entry, the matrix AB in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl)

On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See below for further details.

Parameters
[in]LDDABINTEGER The leading dimension of the array AB. LDAB >= 2*KL+KU+1.
[out]dIPIV_arrayArray of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]dINFO_arrayINTEGER array, dimension (batchCount) Each is the INFO output for a given matrix = 0: successful exit < 0: if INFO = -i, the i-th argument had an illegal value > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in,out]device_workWorkspace, allocated on device memory by the user
[in,out]lworkINTEGER pointer The size of the workspace (device_work) in bytes
  • lwork[0] < 0: a workspace query is assumed, the routine calculates the required amount of workspace and returns it in lwork. The workspace is not referenced, and no computation is performed.
  • lwork[0] >= 0: the routine assumes that the user has provided a workspace with the size in lwork.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
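
The lwork protocol described above (a negative value requests a size query, a non-negative value declares the size of the supplied workspace) leads to a two-call pattern. The sketch below illustrates that pattern only. `fake_batched_work` is a hypothetical stand-in, not a MAGMA routine, and its size formula is invented. Host `malloc` stands in for a device allocation, which in real code would be made on the GPU (for example with `magma_malloc`).

```c
#include <stdlib.h>

/* HYPOTHETICAL stand-in for a *_batched_work routine. It mimics only the
   lwork protocol; the workspace size formula below is made up. */
static int fake_batched_work(int n, int batch_count, void *device_work, int *lwork)
{
    int required = n * batch_count * (int)sizeof(double);  /* invented formula */

    if (lwork[0] < 0) {
        /* Workspace query: report the size, touch nothing else. */
        lwork[0] = required;
        return 0;
    }
    if (lwork[0] < required || (required > 0 && device_work == NULL)) {
        return -1;  /* supplied workspace is too small or missing */
    }
    /* ... the actual computation would run here ... */
    return 0;
}

/* Two-call pattern: query the size, allocate, then run. */
static int run_with_workspace(int n, int batch_count)
{
    int lwork[1] = { -1 };
    int info = fake_batched_work(n, batch_count, NULL, lwork);
    if (info != 0) return info;

    /* Host malloc is an assumption for illustration; use a device
       allocation with the real routine. */
    void *work = malloc(lwork[0] > 0 ? (size_t)lwork[0] : 1);
    if (work == NULL) return -2;

    info = fake_batched_work(n, batch_count, work, lwork);
    free(work);
    return info;
}
```

Because the query call does not reference the workspace, passing a null pointer for device_work in the first call is safe under the semantics stated above.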

Further Details

The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:

    On entry:                        On exit:

     *    *    *    +    +    +       *    *    *   u14  u25  u36
     *    *    +    +    +    +       *    *   u13  u24  u35  u46
     *   a12  a23  a34  a45  a56      *   u12  u23  u34  u45  u56
    a11  a22  a33  a44  a55  a66     u11  u22  u33  u44  u55  u66
    a21  a32  a43  a54  a65   *      m21  m32  m43  m54  m65   *
    a31  a42  a53  a64   *    *      m31  m42  m53  m64   *    *

Array elements marked * are not used by the routine; elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.
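
As an illustration of the on-entry layout, the following sketch packs a dense column-major matrix into band storage using the mapping AB(kl+ku+1+i-j,j) = A(i,j), rewritten for 0-based C indices. The helper names `band_row` and `dense_to_band` are hypothetical, and real `double` values are used instead of `magmaDoubleComplex` to keep the example short. It runs on the host and makes no MAGMA calls.

```c
#include <stddef.h>

/* 0-based row inside the band array for entry A(i,j), i and j 0-based.
   This is the 0-based form of AB(kl+ku+1+i-j, j) = A(i,j). */
static int band_row(int kl, int ku, int i, int j)
{
    return kl + ku + i - j;
}

/* Copy the band of a dense column-major m-by-n matrix A (leading dimension
   lda) into band storage AB (leading dimension ldab >= 2*kl+ku+1).
   The top kl rows of AB are never written: they are reserved for fill-in. */
static void dense_to_band(int m, int n, int kl, int ku,
                          const double *A, int lda,
                          double *AB, int ldab)
{
    for (int j = 0; j < n; ++j) {
        int ilo = (j - ku > 0) ? j - ku : 0;          /* max(0, j-ku)   */
        int ihi = (j + kl < m - 1) ? j + kl : m - 1;  /* min(m-1, j+kl) */
        for (int i = ilo; i <= ihi; ++i) {
            AB[band_row(kl, ku, i, j) + (size_t)j * ldab] =
                A[i + (size_t)j * lda];
        }
    }
}
```

With M = N = 6, KL = 2, KU = 1, the diagonal entry a11 lands in row KL+KU+1 = 4 of the first column (0-based row 3), matching the diagram.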

◆ magma_zgbtrf_batched_sliding_window_loopin()

magma_int_t magma_zgbtrf_batched_sliding_window_loopin ( magma_int_t m,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magmaDoubleComplex ** dAB_array,
magma_int_t lddab,
magma_int_t ** ipiv_array,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

ZGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges.

This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.

This routine has shared memory requirements that may exceed the capacity of the GPU. In such a case, the routine exits immediately, returning a negative error code.

Parameters
[in]MINTEGER The number of rows of the matrix A. M >= 0.
[in]NINTEGER The number of columns of the matrix A. N >= 0.
[in]KLINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]KUINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in,out]dAB_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX_16 array, dimension (LDDAB,N). On entry, the matrix AB in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku) <= i <= min(m,j+kl).

On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See below for further details.

[in]LDDABINTEGER The leading dimension of the array AB. LDDAB >= 2*KL+KU+1.
[out]dIPIV_arrayArray of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]dINFO_arrayINTEGER array, dimension (batchCount). Each is the INFO output for a given matrix.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value
  • > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
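
Since each matrix in the batch reports its own INFO value, a caller typically scans the whole array after the routine returns. The sketch below classifies the entries according to the three cases listed above. It assumes the INFO values have already been copied to a host array; the type `info_summary` and the function `summarize_info` are hypothetical names used only for illustration.

```c
/* Counts of the three INFO outcomes across a batch. */
typedef struct {
    int ok;            /* INFO == 0 */
    int bad_argument;  /* INFO <  0 */
    int singular;      /* INFO >  0: U(i,i) exactly zero for i = INFO */
} info_summary;

/* Classify a host copy of the per-matrix INFO array. */
static info_summary summarize_info(const int *info, int batch_count)
{
    info_summary s = { 0, 0, 0 };
    for (int k = 0; k < batch_count; ++k) {
        if (info[k] == 0) {
            s.ok++;
        } else if (info[k] < 0) {
            s.bad_argument++;
        } else {
            s.singular++;
        }
    }
    return s;
}
```

A positive entry does not mean the factorization was skipped. It means the factor U of that matrix is exactly singular, so that matrix should be excluded from any subsequent solve.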

Further Details

The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:

    On entry:                        On exit:

     *    *    *    +    +    +       *    *    *   u14  u25  u36
     *    *    +    +    +    +       *    *   u13  u24  u35  u46
     *   a12  a23  a34  a45  a56      *   u12  u23  u34  u45  u56
    a11  a22  a33  a44  a55  a66     u11  u22  u33  u44  u55  u66
    a21  a32  a43  a54  a65   *      m21  m32  m43  m54  m65   *
    a31  a42  a53  a64   *    *      m31  m42  m53  m64   *    *

Array elements marked * are not used by the routine; elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.
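
On exit, U occupies rows 1 to KL+KU+1 of the band array, so its upper bandwidth grows from KU to KL+KU. A minimal sketch of reading U(i,j) back from a host copy of the factored band array follows. It uses 0-based indices and real `double` values for brevity, and the function name `band_get_u` is hypothetical.

```c
#include <stddef.h>

/* Return U(i,j) (0-based i, j) from factored band storage AB with leading
   dimension ldab. Entries below the diagonal or more than kl+ku columns to
   the right of it are structurally zero. */
static double band_get_u(int kl, int ku, const double *AB, int ldab,
                         int i, int j)
{
    if (i > j || j - i > kl + ku) {
        return 0.0;  /* outside the widened upper band */
    }
    return AB[(kl + ku + i - j) + (size_t)j * ldab];
}
```

For the example with KL = 2 and KU = 1, the entry u14 is read from the first row of the fourth column, which is one of the fill-in positions that did not need to be set on entry.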

◆ magma_zgetf2_nopiv_internal_batched()

magma_int_t magma_zgetf2_nopiv_internal_batched ( magma_int_t m,
magma_int_t n,
magmaDoubleComplex ** dA_array,
magma_int_t ai,
magma_int_t aj,
magma_int_t ldda,
magma_int_t * info_array,
magma_int_t gbstep,
magma_int_t batchCount,
magma_queue_t queue )

zgetf2_nopiv computes the non-pivoting LU factorization of an M-by-N matrix A.

This routine handles only matrices of limited width, so it is intended for internal use.

The factorization has the form A = L * U where L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is a batched version that factors batchCount M-by-N matrices in parallel.

Parameters
[in]mINTEGER The number of rows of the matrix A. M >= 0.
[in]nINTEGER The number of columns of the matrix A. N >= 0.
[in,out]dA_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = L*U; the unit diagonal elements of L are not stored.
[in]aiINTEGER Row offset for dA_array.
[in]ajINTEGER Column offset for dA_array.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= max(1,M).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]gbstepINTEGER Internal use.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
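
To make the form A = L * U and the meaning of a positive INFO concrete, here is a minimal host-side sketch of an unblocked LU factorization without pivoting on a single column-major matrix. It is not MAGMA's GPU implementation. The function name `lu_nopiv` is hypothetical, real `double` values replace `magmaDoubleComplex`, and the handling of a zero pivot (record it, skip the scaling, continue) follows the LAPACK convention as an assumption.

```c
#include <stddef.h>

/* Factor a column-major m-by-n matrix in place as A = L*U, no pivoting.
   L has an implicit unit diagonal and is stored below the diagonal;
   U is stored on and above it. Returns 0 on success, or the 1-based
   index i of the first U(i,i) that is exactly zero. */
static int lu_nopiv(int m, int n, double *A, int lda)
{
    int info = 0;
    int kmax = (m < n) ? m : n;

    for (int k = 0; k < kmax; ++k) {
        double pivot = A[k + (size_t)k * lda];
        if (pivot != 0.0) {
            /* Scale the column below the pivot to form the multipliers. */
            for (int i = k + 1; i < m; ++i) {
                A[i + (size_t)k * lda] /= pivot;
            }
        } else if (info == 0) {
            info = k + 1;  /* remember the first zero pivot */
        }
        /* Rank-1 update of the trailing submatrix. */
        for (int j = k + 1; j < n; ++j) {
            for (int i = k + 1; i < m; ++i) {
                A[i + (size_t)j * lda] -=
                    A[i + (size_t)k * lda] * A[k + (size_t)j * lda];
            }
        }
    }
    return info;
}
```

Without row interchanges, a zero or very small pivot is not avoided, which is why the non-pivoting variant is numerically safe only for matrices known to need no pivoting, such as diagonally dominant ones.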

◆ magma_zgetrf_batched_smallsq_noshfl()

magma_int_t magma_zgetrf_batched_smallsq_noshfl ( magma_int_t n,
magmaDoubleComplex ** dA_array,
magma_int_t ldda,
magma_int_t ** ipiv_array,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

zgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges.

This routine handles only square matrices of size up to 32.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount N-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

Parameters
[in]nINTEGER The size of each matrix A. N >= 0.
[in,out]dA_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDA,N). On entry, each pointer is an N-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= max(1,N).
[out]ipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
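
The pivot indices describe a sequence of row interchanges rather than a permutation vector: row i was swapped with row IPIV(i), for i = 1, 2, and so on in order. The sketch below applies that sequence to a right-hand-side vector on the host, which is the first step when using the factors to solve A * x = b. It assumes 1-based pivot values as in the description above and a host copy of the pivot array; the function name `apply_row_swaps` is hypothetical.

```c
/* Apply the interchanges recorded in ipiv (1-based values) to the first
   n entries of b, in forward order. With A = P*L*U, this turns b into
   the right-hand side of the permuted system L*U*x = P^T * b. */
static void apply_row_swaps(int n, const int *ipiv, double *b)
{
    for (int i = 0; i < n; ++i) {
        int p = ipiv[i] - 1;  /* convert to a 0-based row index */
        if (p != i) {
            double tmp = b[i];
            b[i] = b[p];
            b[p] = tmp;
        }
    }
}
```

The order matters: the swaps must be replayed from the first to the last, because a later interchange may move an entry that an earlier one already displaced.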