PLASMA  2.4.5
PLASMA - Parallel Linear Algebra for Scalable Multi-core Architectures
core_ztrdalg_v2.c File Reference
#include <lapacke.h>
#include "common.h"

Macros

#define A(m, n)   BLKADDR(A, PLASMA_Complex64_t, m, n)

Functions

void CORE_ztrdalg_v2 (PLASMA_enum uplo, PLASMA_desc *pA, PLASMA_Complex64_t *V, PLASMA_Complex64_t *TAU, int grsiz, int lcsweep, int id, int blksweep)
void QUARK_CORE_ztrdalg_v2 (Quark *quark, Quark_Task_Flags *task_flags, int uplo, PLASMA_desc *pA, PLASMA_Complex64_t *V, PLASMA_Complex64_t *TAU, int grsiz, int lcsweep, int id, int blksweep)
void CORE_ztrdalg_v2_quark (Quark *quark)

Detailed Description

PLASMA core_blas kernel. PLASMA is a software package provided by Univ. of Tennessee, Univ. of California Berkeley and Univ. of Colorado Denver.

Version:
2.4.5
Author:
Azzam Haidar
Date:
2011-05-15
Precisions:
normal z -> c d s

Definition in file core_ztrdalg_v2.c.


Macro Definition Documentation

#define A( m, n )    BLKADDR(A, PLASMA_Complex64_t, m, n)

Definition at line 126 of file core_ztrdalg_v2.c.


Function Documentation

void CORE_ztrdalg_v2 ( PLASMA_enum  uplo,
PLASMA_desc *  pA,
PLASMA_Complex64_t *  V,
PLASMA_Complex64_t *  TAU,
int  grsiz,
int  lcsweep,
int  id,
int  blksweep
)

CORE_ztrdalg_v2 is part of the tridiagonal reduction algorithm (bulge chasing). It is a local driver for the kernels that should be executed on a single core.

Parameters:
[in] uplo
  • PlasmaLower:
  • PlasmaUpper:
[in] N  The order of the matrix A. N >= 0.
[in] NB  The bandwidth of the matrix A, which corresponds to the tile size. NB >= 0.
[in] pA  A pointer to the descriptor of the matrix A.
[out] V  PLASMA_Complex64_t array, dimension (N). The scalar elementary reflectors are written to this array, which serves as workspace for V at each step of the bulge chasing algorithm.
[out] TAU  PLASMA_Complex64_t array, dimension (N). The scalar factors of the elementary reflectors are written to this array, which serves as workspace for TAU at each step of the bulge chasing algorithm.
[in] i  Integer that refers to the current sweep (outer loop).
[in] j  Integer that refers to the sweep to chase (inner loop).
[in] m  Integer that refers to a sweep step, to ensure order dependencies.
[in] grsiz  Integer that refers to the size of a group. A group is the number of kernels that should be executed sequentially on the same core. The group size is a trade-off between locality (cache reuse) and parallelism: a small group size increases parallelism, while a large group size increases cache reuse.
Return values:
PLASMA_SUCCESS  successful exit
< 0  if -i, the i-th argument had an illegal value

Definition at line 82 of file core_ztrdalg_v2.c.

References A, CORE_zhbelr(), CORE_zhblrx(), CORE_zhbrce(), plasma_desc_t::dtyp, plasma_desc_t::m, plasma_desc_t::mb, min, plasma_desc_t::nt, and plasma_element_size().

{
    PLASMA_desc A = *pA;
    size_t eltsize = plasma_element_size(A.dtyp);
    int N, NB;
    int i, blkid, st, ed, KDM1;
    int NT = pA->nt;

    N    = A.m;
    NB   = A.mb;
    KDM1 = NB-1;
    /* code for all tiles */
    for (i = 0; i < grsiz; i++) {
        blkid = id+i;
        st = min(blkid*NB+lcsweep+1, N-1);
        ed = min(st+KDM1, N-1);
        /*printf(" COUCOU voici N %5d NB %5d st %5d ed %5d lcsweep %5d id %5d blkid %5d\n",N, NB, st, ed, lcsweep, id, blkid);*/
        if (st == ed) /* quick return in case of last tile */
            return;
        st = st+1; /* because the kernels still use Fortran-style (1-based) indices */
        ed = ed+1;
        if (blkid == blksweep) {
            CORE_zhbelr(uplo, N, &A, V, TAU, st, ed, eltsize);
            if (id != (NT-1)) CORE_zhbrce(uplo, N, &A, V, TAU, st, ed, eltsize);
        } else {
            CORE_zhblrx(uplo, N, &A, V, TAU, st, ed, eltsize);
            if (id != (NT-1)) CORE_zhbrce(uplo, N, &A, V, TAU, st, ed, eltsize);
        }
    }
}


void CORE_ztrdalg_v2_quark ( Quark *  quark )

Definition at line 175 of file core_ztrdalg_v2.c.

References CORE_ztrdalg_v2(), quark_unpack_args_8, TAU, uplo, and V.

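{
    /* Pointer declarations are absent from the extracted listing; their
       types follow from the CORE_ztrdalg_v2 signature. */
    PLASMA_desc *pA;
    PLASMA_Complex64_t *V;
    PLASMA_Complex64_t *TAU;
    int uplo;
    int grsiz, lcsweep, id, blksweep;

    quark_unpack_args_8(quark, uplo, pA, V, TAU, grsiz, lcsweep, id, blksweep);
    CORE_ztrdalg_v2(uplo, pA, V, TAU, grsiz, lcsweep, id, blksweep);
}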


void QUARK_CORE_ztrdalg_v2 ( Quark *  quark,
Quark_Task_Flags *  task_flags,
int  uplo,
PLASMA_desc *  pA,
PLASMA_Complex64_t *  V,
PLASMA_Complex64_t *  TAU,
int  grsiz,
int  lcsweep,
int  id,
int  blksweep
)

Definition at line 127 of file core_ztrdalg_v2.c.

References A, CORE_ztrdalg_v2_quark(), INOUT, NODEP, plasma_desc_t::nt, QUARK_Insert_Task_Packed(), QUARK_Task_Init(), QUARK_Task_Pack_Arg(), and VALUE.

{
    Quark_Task *MYTASK;
    int ii, cur_id, NT = pA->nt;
    //printf("coucou from quark function id %d lcsweep %d blksweep %d grsiz %d NT %d\n", id, lcsweep, blksweep, grsiz, NT);
    MYTASK = QUARK_Task_Init( quark, CORE_ztrdalg_v2_quark, task_flags);
    QUARK_Task_Pack_Arg(quark, MYTASK, sizeof(int),                &uplo,     VALUE );
    QUARK_Task_Pack_Arg(quark, MYTASK, sizeof(PLASMA_desc),        pA,        NODEP );
    QUARK_Task_Pack_Arg(quark, MYTASK, sizeof(PLASMA_Complex64_t), V,         NODEP );
    QUARK_Task_Pack_Arg(quark, MYTASK, sizeof(PLASMA_Complex64_t), TAU,       NODEP );
    QUARK_Task_Pack_Arg(quark, MYTASK, sizeof(int),                &grsiz,    VALUE );
    QUARK_Task_Pack_Arg(quark, MYTASK, sizeof(int),                &lcsweep,  VALUE );
    QUARK_Task_Pack_Arg(quark, MYTASK, sizeof(int),                &id,       VALUE );
    QUARK_Task_Pack_Arg(quark, MYTASK, sizeof(int),                &blksweep, VALUE );
    QUARK_Task_Pack_Arg(quark, MYTASK, sizeof(PLASMA_Complex64_t), A(id,   id ), INOUT );
    QUARK_Task_Pack_Arg(quark, MYTASK, sizeof(PLASMA_Complex64_t), A(id+1, id ), INOUT );
    if( id<(NT-1) )
        QUARK_Task_Pack_Arg(quark, MYTASK, sizeof(PLASMA_Complex64_t), A(id+1, id+1), INOUT );
    if( id<(NT-2) )
        QUARK_Task_Pack_Arg(quark, MYTASK, sizeof(PLASMA_Complex64_t), A(id+2, id+1), INOUT );
    cur_id = id;
    for (ii = 1; ii < grsiz ; ii++) {
        cur_id = cur_id+1;
        if( id<(NT-1) )
            QUARK_Task_Pack_Arg(quark, MYTASK, sizeof(PLASMA_Complex64_t), A(cur_id+1, cur_id+1), INOUT );
        if( id<(NT-2) )
            QUARK_Task_Pack_Arg(quark, MYTASK, sizeof(PLASMA_Complex64_t), A(cur_id+2, cur_id+1), INOUT );
    }
    QUARK_Insert_Task_Packed(quark, MYTASK);
}
