MAGMA  2.7.1
Matrix Algebra for GPU and Multicore Architectures
Internal routines


Modules

 Error handling
 
 Testing routines
 
 Thread management
 
 Timer utilities
 
 QR panel to q, q to panel
 
 GPU Kernels
 

Macros

#define MAGMA_UNUSED(var)
 Suppress "warning: unused variable" in a portable fashion.
 

Functions

void magma_swp2pswp (magma_trans_t trans, magma_int_t n, magma_int_t *ipiv, magma_int_t *newipiv)
 Auxiliary function: ipiv(i) indicates that row i has been swapped with ipiv(i) from top to bottom.
 
void magma_indices_1D_bcyclic (magma_int_t nb, magma_int_t ngpu, magma_int_t dev, magma_int_t j0, magma_int_t j1, magma_int_t *dj0, magma_int_t *dj1)
 Convert global indices [j0, j1) to local indices [dj0, dj1) on GPU dev, according to 1D block cyclic distribution.
 

Function Documentation

void magma_swp2pswp (magma_trans_t trans, magma_int_t n, magma_int_t *ipiv, magma_int_t *newipiv)

Auxiliary function: ipiv(i) indicates that row i has been swapped with ipiv(i) from top to bottom.

This function rearranges ipiv into newipiv, where row i has to be moved to newipiv(i). The new pivoting allows the row moves to be applied in parallel, whereas the original pivoting assumes a specific ordering and has to be applied sequentially.
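
The idea of replacing an ordered swap list by a single permutation can be sketched as follows. This is a generic illustration, not MAGMA's implementation: the name sketch_swaps_to_permutation is hypothetical, indices are assumed to be 0-based, and the trans argument is ignored.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch, not MAGMA code.
 * Input:  ipiv[i] (0-based, an assumption) means "at step i, swap rows i and ipiv[i]",
 *         with the steps applied in order i = 0, 1, ..., n-1.
 * Output: dest[r] is the final position of the row that started at position r.
 * Returns 0 on success, -1 if the scratch allocation fails. */
int sketch_swaps_to_permutation(int n, const int *ipiv, int *dest)
{
    assert(n >= 0);
    if (n == 0) {
        return 0;
    }

    /* at[k] = original index of the row currently sitting at position k */
    int *at = malloc((size_t)n * sizeof *at);
    if (at == NULL) {
        return -1;
    }
    for (int k = 0; k < n; ++k) {
        at[k] = k;
    }

    /* Replay the swaps in order on the bookkeeping array only. */
    for (int i = 0; i < n; ++i) {
        int p = ipiv[i];
        assert(p >= 0 && p < n);
        int tmp = at[i];
        at[i] = at[p];
        at[p] = tmp;
    }

    /* Invert: the row that started at at[k] ends up at position k. */
    for (int k = 0; k < n; ++k) {
        dest[at[k]] = k;
    }

    free(at);
    return 0;
}
```

With n = 3 and ipiv = {2, 2, 2}, this sketch yields dest = {1, 2, 0}. Once dest is known, every row can be copied to its final position independently of the others, which is what makes a permutation form suitable for parallel application.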

void magma_indices_1D_bcyclic (magma_int_t nb, magma_int_t ngpu, magma_int_t dev, magma_int_t j0, magma_int_t j1, magma_int_t *dj0, magma_int_t *dj1)

Convert global indices [j0, j1) to local indices [dj0, dj1) on GPU dev, according to 1D block cyclic distribution.

Note j0 and dj0 are inclusive, while j1 and dj1 are exclusive. This is consistent with the C++ container notion of first and last.

Example with n = 75, nb = 10, ngpu = 3. Distribution of columns (ranges are inclusive):

                  local dj:  0- 9, 10-19, 20-29
-----------------------------------------------
dev 0:  3 blocks, global j:  0- 9, 30-39, 60-69
dev 1:  3 blocks, global j: 10-19, 40-49, 70-74 (partial)
dev 2:  2 blocks, global j: 20-29, 50-59

Calls return:

input global j=13-68 inclusive      =>  output
nb=10, ngpu=3, dev=0, j0=13, j1=69  =>  dj0=10, dj1=29 (i.e., global j=       30-39, 60-68)
nb=10, ngpu=3, dev=1, j0=13, j1=69  =>  dj0= 3, dj1=20 (i.e., global j=13-19, 40-49)
nb=10, ngpu=3, dev=2, j0=13, j1=69  =>  dj0= 0, dj1=20 (i.e., global j=20-29, 50-59)

input global j=13-69 inclusive      =>  output
nb=10, ngpu=3, dev=0, j0=13, j1=70  =>  dj0=10, dj1=30 (i.e., global j=       30-39, 60-69)
nb=10, ngpu=3, dev=1, j0=13, j1=70  =>  dj0= 3, dj1=20 (i.e., global j=13-19, 40-49)
nb=10, ngpu=3, dev=2, j0=13, j1=70  =>  dj0= 0, dj1=20 (i.e., global j=20-29, 50-59)

input global j=13-70 inclusive      =>  output
nb=10, ngpu=3, dev=0, j0=13, j1=71  =>  dj0=10, dj1=30 (i.e., global j=       30-39, 60-69)
nb=10, ngpu=3, dev=1, j0=13, j1=71  =>  dj0= 3, dj1=21 (i.e., global j=13-19, 40-49, 70)
nb=10, ngpu=3, dev=2, j0=13, j1=71  =>  dj0= 0, dj1=20 (i.e., global j=20-29, 50-59)

input global j=13-71 inclusive      =>  output
nb=10, ngpu=3, dev=0, j0=13, j1=72  =>  dj0=10, dj1=30 (i.e., global j=       30-39, 60-69)
nb=10, ngpu=3, dev=1, j0=13, j1=72  =>  dj0= 3, dj1=22 (i.e., global j=13-19, 40-49, 70-71)
nb=10, ngpu=3, dev=2, j0=13, j1=72  =>  dj0= 0, dj1=20 (i.e., global j=20-29, 50-59)
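
The mapping in the examples above can be reproduced with a short sketch. It is not MAGMA's implementation: the names local_count and sketch_indices_1d_bcyclic are hypothetical, plain int stands in for magma_int_t, and the approach (counting how many global columns below a given index live on a device) is one assumed way to obtain the same results.

```c
#include <assert.h>

/* Hypothetical helper, not MAGMA code.
 * Returns the number of global columns g in [0, j) that are stored on
 * GPU dev under a 1D block-cyclic distribution with block size nb. */
static int local_count(int nb, int ngpu, int dev, int j)
{
    assert(nb > 0 && ngpu > 0 && dev >= 0 && dev < ngpu && j >= 0);

    int full_blocks = j / nb;            /* complete blocks before column j     */
    int rem         = j % nb;            /* columns of the partial block used   */
    int cycles      = full_blocks / ngpu;/* complete rounds over all devices    */
    int pos         = full_blocks % ngpu;/* device that owns the partial block  */

    int count = cycles * nb;             /* one block per device per round      */
    if (dev < pos) {
        count += nb;                     /* dev already got a block this round  */
    } else if (dev == pos) {
        count += rem;                    /* dev owns the partial block          */
    }
    return count;
}

/* Hypothetical sketch: convert global [j0, j1) to local [dj0, dj1) on GPU dev.
 * Both ends use the same rule because both are "count of columns before j". */
void sketch_indices_1d_bcyclic(int nb, int ngpu, int dev,
                               int j0, int j1, int *dj0, int *dj1)
{
    *dj0 = local_count(nb, ngpu, dev, j0);
    *dj1 = local_count(nb, ngpu, dev, j1);
}
```

For nb=10, ngpu=3, j0=13, j1=69, the sketch gives dj0=10, dj1=29 on dev 0, dj0=3, dj1=20 on dev 1, and dj0=0, dj1=20 on dev 2, matching the first group of calls listed above. Treating the exclusive end j1 as "number of columns before j1" avoids the usual subtract-one, add-one adjustment for the last element.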