MAGMA 2.9.0
Matrix Algebra for GPU and Multicore Architectures
Topics
Error handling
Testing routines
Thread management
Timer utilities
QR panel to q, q to panel
GPU Kernels
Macros
#define MAGMA_UNUSED(var)
Suppress "warning: unused variable" in a portable fashion.
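A common portable way to silence this kind of warning is to evaluate the variable in a cast to void. The sketch below illustrates the idea with a hypothetical macro named MY_UNUSED; it is an assumption for illustration, and MAGMA's actual definition of MAGMA_UNUSED may differ.

```c
/* Hypothetical stand-in for illustration only; not taken from the MAGMA headers. */
#define MY_UNUSED(var) ((void)(var))

/* 'verbose' stays in the signature but is not used in this build. */
int twice(int x, int verbose)
{
    MY_UNUSED(verbose);   /* marks the parameter as intentionally unused */
    return 2 * x;
}
```

The cast has no run-time effect; it only tells the compiler that the variable was deliberately left unused.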
Functions
void magma_swp2pswp (magma_trans_t trans, magma_int_t n, magma_int_t *ipiv, magma_int_t *newipiv)
Auxiliary function: ipiv(i) indicates that row i has been swapped with ipiv(i) from top to bottom.
void magma_indices_1D_bcyclic (magma_int_t nb, magma_int_t ngpu, magma_int_t dev, magma_int_t j0, magma_int_t j1, magma_int_t *dj0, magma_int_t *dj1)
Convert global indices [j0, j1) to local indices [dj0, dj1) on GPU dev, according to 1D block cyclic distribution.
void magma_swp2pswp (magma_trans_t trans, magma_int_t n, magma_int_t *ipiv, magma_int_t *newipiv)
Auxiliary function: ipiv(i) indicates that row i has been swapped with ipiv(i) from top to bottom.
This function rearranges ipiv into newipiv, where row i has to be moved to newipiv(i). The new pivoting can be applied in parallel, whereas the original assumes a specific ordering and has to be applied sequentially.
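The idea of turning sequential swaps into a single permutation can be sketched as follows. This is a standalone illustration, not MAGMA's implementation: the function name pivots_to_destinations is hypothetical, indices are assumed to be 0-based (LAPACK-style pivots are 1-based), and the trans argument is not modelled.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not part of MAGMA.
 * ipiv[i] = p means "swap rows i and p", applied for i = 0, 1, ..., n-1.
 * On return, dest[r] is the final position of original row r.
 * Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
int pivots_to_destinations(int n, const int *ipiv, int *dest)
{
    assert(n >= 0);
    if (n == 0) return 0;

    /* row_at[k] = original row currently sitting at position k */
    int *row_at = malloc((size_t)n * sizeof *row_at);
    if (row_at == NULL) return -1;

    for (int k = 0; k < n; k++) row_at[k] = k;

    /* Replay the swaps in their required top-to-bottom order. */
    for (int i = 0; i < n; i++) {
        int p = ipiv[i];
        assert(0 <= p && p < n);
        int tmp = row_at[i];
        row_at[i] = row_at[p];
        row_at[p] = tmp;
    }

    /* Invert: original row row_at[k] ends up at position k. */
    for (int k = 0; k < n; k++) dest[row_at[k]] = k;

    free(row_at);
    return 0;
}
```

Once dest is known, each row can be copied to its final position independently of the others, which is what makes a parallel application possible.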
void magma_indices_1D_bcyclic (magma_int_t nb, magma_int_t ngpu, magma_int_t dev, magma_int_t j0, magma_int_t j1, magma_int_t *dj0, magma_int_t *dj1)
Convert global indices [j0, j1) to local indices [dj0, dj1) on GPU dev, according to 1D block cyclic distribution.
Note j0 and dj0 are inclusive, while j1 and dj1 are exclusive. This is consistent with the C++ container notion of first and last.
Example with n = 75, nb = 10, ngpu = 3. Distribution of columns (ranges are inclusive):
                          local dj:   0- 9, 10-19, 20-29
------------------------------------------------------------
dev 0: 3 blocks, global j:   0- 9, 30-39, 60-69
dev 1: 3 blocks, global j:  10-19, 40-49, 70-74 (partial)
dev 2: 2 blocks, global j:  20-29, 50-59
Calls return:
input global j=13-68 inclusive => output
nb=10, ngpu=3, dev=0, j0=13, j1=69 => dj0=10, dj1=29 (i.e., global j=30-39, 60-68)
nb=10, ngpu=3, dev=1, j0=13, j1=69 => dj0= 3, dj1=20 (i.e., global j=13-19, 40-49)
nb=10, ngpu=3, dev=2, j0=13, j1=69 => dj0= 0, dj1=20 (i.e., global j=20-29, 50-59)

input global j=13-69 inclusive => output
nb=10, ngpu=3, dev=0, j0=13, j1=70 => dj0=10, dj1=30 (i.e., global j=30-39, 60-69)
nb=10, ngpu=3, dev=1, j0=13, j1=70 => dj0= 3, dj1=20 (i.e., global j=13-19, 40-49)
nb=10, ngpu=3, dev=2, j0=13, j1=70 => dj0= 0, dj1=20 (i.e., global j=20-29, 50-59)

input global j=13-70 inclusive => output
nb=10, ngpu=3, dev=0, j0=13, j1=71 => dj0=10, dj1=30 (i.e., global j=30-39, 60-69)
nb=10, ngpu=3, dev=1, j0=13, j1=71 => dj0= 3, dj1=21 (i.e., global j=13-19, 40-49, 70)
nb=10, ngpu=3, dev=2, j0=13, j1=71 => dj0= 0, dj1=20 (i.e., global j=20-29, 50-59)

input global j=13-71 inclusive => output
nb=10, ngpu=3, dev=0, j0=13, j1=72 => dj0=10, dj1=30 (i.e., global j=30-39, 60-69)
nb=10, ngpu=3, dev=1, j0=13, j1=72 => dj0= 3, dj1=22 (i.e., global j=13-19, 40-49, 70-71)
nb=10, ngpu=3, dev=2, j0=13, j1=72 => dj0= 0, dj1=20 (i.e., global j=20-29, 50-59)
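One way to read the mapping is that each local bound equals the number of global columns below the corresponding global bound that live on the given device. The sketch below implements that reading with plain int arguments. The names local_count and indices_1d_bcyclic_sketch are hypothetical and this is not MAGMA's source; it is only checked against the calls listed above.

```c
#include <assert.h>

/* Hypothetical helper: number of global columns in [0, j) owned by
 * device dev under a 1D block cyclic distribution with block size nb. */
static int local_count(int nb, int ngpu, int dev, int j)
{
    int cycle = nb * ngpu;            /* columns in one full round over all devices */
    int full  = j / cycle;            /* complete rounds before column j */
    int rem   = j % cycle - dev * nb; /* columns reaching into dev's block in the last round */
    if (rem < 0)  rem = 0;            /* j lies before dev's block in this round */
    if (rem > nb) rem = nb;           /* j lies past dev's block in this round */
    return full * nb + rem;
}

/* Sketch of the global-to-local conversion: [j0, j1) -> [*dj0, *dj1) on dev. */
void indices_1d_bcyclic_sketch(int nb, int ngpu, int dev,
                               int j0, int j1, int *dj0, int *dj1)
{
    assert(nb > 0 && ngpu > 0);
    assert(0 <= dev && dev < ngpu);
    assert(0 <= j0 && j0 <= j1);
    *dj0 = local_count(nb, ngpu, dev, j0);
    *dj1 = local_count(nb, ngpu, dev, j1);
}
```

For example, with nb=10, ngpu=3, dev=1, j0=13, j1=71 the sketch yields dj0=3 and dj1=21, matching the corresponding line above. Because both bounds come from the same counting function, the half-open convention carries over directly from global to local indices.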