MAGMA 2.9.0
Matrix Algebra for GPU and Multicore Architectures
setmatrix_transpose: CPU => GPU

Functions

void magmablas_csetmatrix_transpose (magma_int_t m, magma_int_t n, magma_int_t nb, const magmaFloatComplex *hA, magma_int_t lda, magmaFloatComplex_ptr dAT, magma_int_t ldda, magmaFloatComplex_ptr dwork, magma_int_t lddw, magma_queue_t queues[2])
 Copy and transpose matrix hA on CPU host to dAT on GPU device.
 
void magmablas_csetmatrix_transpose_mgpu (magma_int_t ngpu, magma_int_t m, magma_int_t n, magma_int_t nb, const magmaFloatComplex *hA, magma_int_t lda, magmaFloatComplex_ptr dAT[], magma_int_t ldda, magmaFloatComplex_ptr dwork[], magma_int_t lddw, magma_queue_t queues[][2])
 Copy and transpose matrix hA on CPU host to dAT, which is distributed row block cyclic over multiple GPUs.
 
void magmablas_dsetmatrix_transpose (magma_int_t m, magma_int_t n, magma_int_t nb, const double *hA, magma_int_t lda, magmaDouble_ptr dAT, magma_int_t ldda, magmaDouble_ptr dwork, magma_int_t lddw, magma_queue_t queues[2])
 Copy and transpose matrix hA on CPU host to dAT on GPU device.
 
void magmablas_dsetmatrix_transpose_mgpu (magma_int_t ngpu, magma_int_t m, magma_int_t n, magma_int_t nb, const double *hA, magma_int_t lda, magmaDouble_ptr dAT[], magma_int_t ldda, magmaDouble_ptr dwork[], magma_int_t lddw, magma_queue_t queues[][2])
 Copy and transpose matrix hA on CPU host to dAT, which is distributed row block cyclic over multiple GPUs.
 
void magmablas_ssetmatrix_transpose (magma_int_t m, magma_int_t n, magma_int_t nb, const float *hA, magma_int_t lda, magmaFloat_ptr dAT, magma_int_t ldda, magmaFloat_ptr dwork, magma_int_t lddw, magma_queue_t queues[2])
 Copy and transpose matrix hA on CPU host to dAT on GPU device.
 
void magmablas_ssetmatrix_transpose_mgpu (magma_int_t ngpu, magma_int_t m, magma_int_t n, magma_int_t nb, const float *hA, magma_int_t lda, magmaFloat_ptr dAT[], magma_int_t ldda, magmaFloat_ptr dwork[], magma_int_t lddw, magma_queue_t queues[][2])
 Copy and transpose matrix hA on CPU host to dAT, which is distributed row block cyclic over multiple GPUs.
 
void magmablas_zsetmatrix_transpose (magma_int_t m, magma_int_t n, magma_int_t nb, const magmaDoubleComplex *hA, magma_int_t lda, magmaDoubleComplex_ptr dAT, magma_int_t ldda, magmaDoubleComplex_ptr dwork, magma_int_t lddw, magma_queue_t queues[2])
 Copy and transpose matrix hA on CPU host to dAT on GPU device.
 
void magmablas_zsetmatrix_transpose_mgpu (magma_int_t ngpu, magma_int_t m, magma_int_t n, magma_int_t nb, const magmaDoubleComplex *hA, magma_int_t lda, magmaDoubleComplex_ptr dAT[], magma_int_t ldda, magmaDoubleComplex_ptr dwork[], magma_int_t lddw, magma_queue_t queues[][2])
 Copy and transpose matrix hA on CPU host to dAT, which is distributed row block cyclic over multiple GPUs.
 

Function Documentation

◆ magmablas_csetmatrix_transpose()

void magmablas_csetmatrix_transpose ( magma_int_t m,
magma_int_t n,
magma_int_t nb,
const magmaFloatComplex * hA,
magma_int_t lda,
magmaFloatComplex_ptr dAT,
magma_int_t ldda,
magmaFloatComplex_ptr dwork,
magma_int_t lddw,
magma_queue_t queues[2] )

Copy and transpose matrix hA on CPU host to dAT on GPU device.

Parameters
[in]   m       Number of rows of input matrix hA. m >= 0.
[in]   n       Number of columns of input matrix hA. n >= 0.
[in]   nb      Block size. nb >= 0.
[in]   hA      The m-by-n matrix A on the CPU, of dimension (lda,n).
[in]   lda     Leading dimension of matrix hA. lda >= m.
[out]  dAT     The n-by-m matrix A^T on the GPU, of dimension (ldda,m).
[in]   ldda    Leading dimension of matrix dAT. ldda >= n.
[out]  dwork   Workspace on the GPU, of dimension (2*lddw*nb).
[in]   lddw    Leading dimension of dwork. lddw >= m.
[in]   queues  Array of two queues, to pipeline operation.
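
The dimensions above can be checked against a plain CPU reference. The sketch below is not part of MAGMA: `ref_transpose_c` is a hypothetical helper that assumes column-major storage and uses C99 `float complex` as a stand-in for `magmaFloatComplex`. It produces the same n-by-m result that dAT is documented to hold, and it is a transpose only, with no conjugation.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Hypothetical CPU reference, not a MAGMA routine.
   hA is m-by-n, column-major, leading dimension lda  >= m.
   AT is n-by-m, column-major, leading dimension ldda >= n. */
static void ref_transpose_c(int m, int n,
                            const float complex *hA, int lda,
                            float complex *AT, int ldda)
{
    assert(m >= 0 && n >= 0);
    assert(lda >= m && ldda >= n);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            /* Entry (i,j) of A becomes entry (j,i) of A^T; no conjugation. */
            AT[j + (size_t)i * ldda] = hA[i + (size_t)j * lda];
        }
    }
}
```

Comparing this reference with a copy of dAT brought back to the host is one way to validate a call to the GPU routine.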

◆ magmablas_csetmatrix_transpose_mgpu()

void magmablas_csetmatrix_transpose_mgpu ( magma_int_t ngpu,
magma_int_t m,
magma_int_t n,
magma_int_t nb,
const magmaFloatComplex * hA,
magma_int_t lda,
magmaFloatComplex_ptr dAT[],
magma_int_t ldda,
magmaFloatComplex_ptr dwork[],
magma_int_t lddw,
magma_queue_t queues[][2] )

Copy and transpose matrix hA on CPU host to dAT, which is distributed row block cyclic over multiple GPUs.

Parameters
[in]   ngpu    Number of GPUs over which dAT is distributed.
[in]   m       Number of rows of input matrix hA. m >= 0.
[in]   n       Number of columns of input matrix hA. n >= 0.
[in]   nb      Block size. nb >= 0.
[in]   hA      The m-by-n matrix A on the CPU, of dimension (lda,n).
[in]   lda     Leading dimension of matrix hA. lda >= m.
[out]  dAT     Array of ngpu pointers, one per GPU, that store the distributed n-by-m matrix A^T on the GPUs, each of dimension (ldda,m).
[in]   ldda    Leading dimension of each matrix dAT on each GPU. ngpu*ldda >= n.
[out]  dwork   Array of ngpu pointers, one per GPU, that store the workspaces on each GPU, each of dimension (2*lddw*nb).
[in]   lddw    Leading dimension of dwork. lddw >= m.
[in]   queues  2D array of dimension (ngpu,2), with two queues per GPU.
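
To make "row block cyclic" concrete, the sketch below maps a global row index of A^T to the GPU that would own it and to the row's position in that GPU's local array. It assumes the standard 1D block-cyclic rule (blocks of nb rows dealt to GPUs in round-robin order, starting at GPU 0); the page does not spell out the exact layout, so treat this as an illustration. `owner_t` and `block_cyclic_owner` are hypothetical names, and the sketch requires nb > 0, which is stricter than the documented nb >= 0.

```c
#include <assert.h>

/* Hypothetical helper, not a MAGMA routine. */
typedef struct {
    int gpu;        /* index of the GPU assumed to own the row */
    int local_row;  /* row index inside that GPU's local dAT   */
} owner_t;

/* Maps global row j of A^T (column j of A) under an assumed
   1D block-cyclic distribution with block size nb over ngpu GPUs. */
static owner_t block_cyclic_owner(int j, int nb, int ngpu)
{
    assert(j >= 0 && nb > 0 && ngpu > 0);
    int block = j / nb;                          /* global block index      */
    owner_t o;
    o.gpu = block % ngpu;                        /* round-robin over GPUs   */
    o.local_row = (block / ngpu) * nb + j % nb;  /* earlier local blocks + offset */
    return o;
}
```

With nb = 4 and ngpu = 2, rows 0 to 3 map to GPU 0, rows 4 to 7 to GPU 1, and rows 8 to 11 return to GPU 0 as its local rows 4 to 7.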

◆ magmablas_dsetmatrix_transpose()

void magmablas_dsetmatrix_transpose ( magma_int_t m,
magma_int_t n,
magma_int_t nb,
const double * hA,
magma_int_t lda,
magmaDouble_ptr dAT,
magma_int_t ldda,
magmaDouble_ptr dwork,
magma_int_t lddw,
magma_queue_t queues[2] )

Copy and transpose matrix hA on CPU host to dAT on GPU device.

Parameters
[in]   m       Number of rows of input matrix hA. m >= 0.
[in]   n       Number of columns of input matrix hA. n >= 0.
[in]   nb      Block size. nb >= 0.
[in]   hA      The m-by-n matrix A on the CPU, of dimension (lda,n).
[in]   lda     Leading dimension of matrix hA. lda >= m.
[out]  dAT     The n-by-m matrix A^T on the GPU, of dimension (ldda,m).
[in]   ldda    Leading dimension of matrix dAT. ldda >= n.
[out]  dwork   Workspace on the GPU, of dimension (2*lddw*nb).
[in]   lddw    Leading dimension of dwork. lddw >= m.
[in]   queues  Array of two queues, to pipeline operation.
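
The workspace size 2*lddw*nb and the pair of queues suggest a double-buffered scheme: the matrix is processed in panels of nb columns, each panel is staged in one of two m-by-nb buffers, then transposed into dAT. The CPU-only sketch below emulates that data flow sequentially. It is an assumption about the structure, not MAGMA's implementation: `panel_transpose_d` is a hypothetical name, no overlap of copy and transpose is modeled, and nb > 0 is required.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical CPU emulation, not a MAGMA routine.
   work must hold 2*lddw*nb doubles, matching the documented dwork size. */
static void panel_transpose_d(int m, int n, int nb,
                              const double *hA, int lda,
                              double *AT, int ldda,
                              double *work, int lddw)
{
    assert(m >= 0 && n >= 0 && nb > 0);
    assert(lda >= m && ldda >= n && lddw >= m);
    if (m == 0 || n == 0)
        return;

    for (int j = 0, k = 0; j < n; j += nb, ++k) {
        int ib = (n - j < nb) ? (n - j) : nb;              /* panel width  */
        double *buf = work + (size_t)(k % 2) * lddw * nb;  /* buffer 0 or 1 */

        /* Stage the m-by-ib panel hA(:, j .. j+ib-1) into the buffer. */
        for (int c = 0; c < ib; ++c)
            memcpy(buf + (size_t)c * lddw,
                   hA + (size_t)(j + c) * lda,
                   (size_t)m * sizeof(double));

        /* Transpose the staged panel into rows j .. j+ib-1 of AT. */
        for (int c = 0; c < ib; ++c)
            for (int i = 0; i < m; ++i)
                AT[(j + c) + (size_t)i * ldda] = buf[i + (size_t)c * lddw];
    }
}
```

Alternating between the two buffers is what would let a real implementation copy panel k+1 on one queue while panel k is still being transposed on the other.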

◆ magmablas_dsetmatrix_transpose_mgpu()

void magmablas_dsetmatrix_transpose_mgpu ( magma_int_t ngpu,
magma_int_t m,
magma_int_t n,
magma_int_t nb,
const double * hA,
magma_int_t lda,
magmaDouble_ptr dAT[],
magma_int_t ldda,
magmaDouble_ptr dwork[],
magma_int_t lddw,
magma_queue_t queues[][2] )

Copy and transpose matrix hA on CPU host to dAT, which is distributed row block cyclic over multiple GPUs.

Parameters
[in]   ngpu    Number of GPUs over which dAT is distributed.
[in]   m       Number of rows of input matrix hA. m >= 0.
[in]   n       Number of columns of input matrix hA. n >= 0.
[in]   nb      Block size. nb >= 0.
[in]   hA      The m-by-n matrix A on the CPU, of dimension (lda,n).
[in]   lda     Leading dimension of matrix hA. lda >= m.
[out]  dAT     Array of ngpu pointers, one per GPU, that store the distributed n-by-m matrix A^T on the GPUs, each of dimension (ldda,m).
[in]   ldda    Leading dimension of each matrix dAT on each GPU. ngpu*ldda >= n.
[out]  dwork   Array of ngpu pointers, one per GPU, that store the workspaces on each GPU, each of dimension (2*lddw*nb).
[in]   lddw    Leading dimension of dwork. lddw >= m.
[in]   queues  2D array of dimension (ngpu,2), with two queues per GPU.
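
When choosing ldda, it helps to know how many rows of A^T each GPU would hold. The sketch below counts them under the assumption of a standard 1D block-cyclic distribution starting at GPU 0, which this page does not state explicitly. `local_rows` is a hypothetical helper, not a MAGMA routine, and it requires nb > 0.

```c
#include <assert.h>

/* Hypothetical helper, not a MAGMA routine.
   Returns how many of the n rows of A^T land on GPU d when blocks of nb
   rows are dealt round-robin to ngpu GPUs (assumed layout). */
static int local_rows(int n, int nb, int ngpu, int d)
{
    assert(n >= 0 && nb > 0 && ngpu > 0);
    assert(d >= 0 && d < ngpu);
    int nblocks = n / nb;               /* number of full blocks            */
    int rows = (nblocks / ngpu) * nb;   /* full rounds reach every GPU      */
    int extra = nblocks % ngpu;         /* GPUs 0..extra-1 get one more block */
    if (d < extra)
        rows += nb;
    else if (d == extra)
        rows += n % nb;                 /* trailing partial block, if any   */
    return rows;
}
```

For n = 10, nb = 4, ngpu = 2 this gives 6 rows on GPU 0 and 4 on GPU 1. Under that assumed layout, sizing ldda to at least the largest local row count is a more conservative choice than relying on ngpu*ldda >= n alone, which ldda = 5 would already satisfy.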

◆ magmablas_ssetmatrix_transpose()

void magmablas_ssetmatrix_transpose ( magma_int_t m,
magma_int_t n,
magma_int_t nb,
const float * hA,
magma_int_t lda,
magmaFloat_ptr dAT,
magma_int_t ldda,
magmaFloat_ptr dwork,
magma_int_t lddw,
magma_queue_t queues[2] )

Copy and transpose matrix hA on CPU host to dAT on GPU device.

Parameters
[in]   m       Number of rows of input matrix hA. m >= 0.
[in]   n       Number of columns of input matrix hA. n >= 0.
[in]   nb      Block size. nb >= 0.
[in]   hA      The m-by-n matrix A on the CPU, of dimension (lda,n).
[in]   lda     Leading dimension of matrix hA. lda >= m.
[out]  dAT     The n-by-m matrix A^T on the GPU, of dimension (ldda,m).
[in]   ldda    Leading dimension of matrix dAT. ldda >= n.
[out]  dwork   Workspace on the GPU, of dimension (2*lddw*nb).
[in]   lddw    Leading dimension of dwork. lddw >= m.
[in]   queues  Array of two queues, to pipeline operation.

◆ magmablas_ssetmatrix_transpose_mgpu()

void magmablas_ssetmatrix_transpose_mgpu ( magma_int_t ngpu,
magma_int_t m,
magma_int_t n,
magma_int_t nb,
const float * hA,
magma_int_t lda,
magmaFloat_ptr dAT[],
magma_int_t ldda,
magmaFloat_ptr dwork[],
magma_int_t lddw,
magma_queue_t queues[][2] )

Copy and transpose matrix hA on CPU host to dAT, which is distributed row block cyclic over multiple GPUs.

Parameters
[in]   ngpu    Number of GPUs over which dAT is distributed.
[in]   m       Number of rows of input matrix hA. m >= 0.
[in]   n       Number of columns of input matrix hA. n >= 0.
[in]   nb      Block size. nb >= 0.
[in]   hA      The m-by-n matrix A on the CPU, of dimension (lda,n).
[in]   lda     Leading dimension of matrix hA. lda >= m.
[out]  dAT     Array of ngpu pointers, one per GPU, that store the distributed n-by-m matrix A^T on the GPUs, each of dimension (ldda,m).
[in]   ldda    Leading dimension of each matrix dAT on each GPU. ngpu*ldda >= n.
[out]  dwork   Array of ngpu pointers, one per GPU, that store the workspaces on each GPU, each of dimension (2*lddw*nb).
[in]   lddw    Leading dimension of dwork. lddw >= m.
[in]   queues  2D array of dimension (ngpu,2), with two queues per GPU.

◆ magmablas_zsetmatrix_transpose()

void magmablas_zsetmatrix_transpose ( magma_int_t m,
magma_int_t n,
magma_int_t nb,
const magmaDoubleComplex * hA,
magma_int_t lda,
magmaDoubleComplex_ptr dAT,
magma_int_t ldda,
magmaDoubleComplex_ptr dwork,
magma_int_t lddw,
magma_queue_t queues[2] )

Copy and transpose matrix hA on CPU host to dAT on GPU device.

Parameters
[in]   m       Number of rows of input matrix hA. m >= 0.
[in]   n       Number of columns of input matrix hA. n >= 0.
[in]   nb      Block size. nb >= 0.
[in]   hA      The m-by-n matrix A on the CPU, of dimension (lda,n).
[in]   lda     Leading dimension of matrix hA. lda >= m.
[out]  dAT     The n-by-m matrix A^T on the GPU, of dimension (ldda,m).
[in]   ldda    Leading dimension of matrix dAT. ldda >= n.
[out]  dwork   Workspace on the GPU, of dimension (2*lddw*nb).
[in]   lddw    Leading dimension of dwork. lddw >= m.
[in]   queues  Array of two queues, to pipeline operation.

◆ magmablas_zsetmatrix_transpose_mgpu()

void magmablas_zsetmatrix_transpose_mgpu ( magma_int_t ngpu,
magma_int_t m,
magma_int_t n,
magma_int_t nb,
const magmaDoubleComplex * hA,
magma_int_t lda,
magmaDoubleComplex_ptr dAT[],
magma_int_t ldda,
magmaDoubleComplex_ptr dwork[],
magma_int_t lddw,
magma_queue_t queues[][2] )

Copy and transpose matrix hA on CPU host to dAT, which is distributed row block cyclic over multiple GPUs.

Parameters
[in]   ngpu    Number of GPUs over which dAT is distributed.
[in]   m       Number of rows of input matrix hA. m >= 0.
[in]   n       Number of columns of input matrix hA. n >= 0.
[in]   nb      Block size. nb >= 0.
[in]   hA      The m-by-n matrix A on the CPU, of dimension (lda,n).
[in]   lda     Leading dimension of matrix hA. lda >= m.
[out]  dAT     Array of ngpu pointers, one per GPU, that store the distributed n-by-m matrix A^T on the GPUs, each of dimension (ldda,m).
[in]   ldda    Leading dimension of each matrix dAT on each GPU. ngpu*ldda >= n.
[out]  dwork   Array of ngpu pointers, one per GPU, that store the workspaces on each GPU, each of dimension (2*lddw*nb).
[in]   lddw    Leading dimension of dwork. lddw >= m.
[in]   queues  2D array of dimension (ngpu,2), with two queues per GPU.