PLASMA  2.4.5
PLASMA - Parallel Linear Algebra for Scalable Multi-core Architectures
 All Data Structures Namespaces Files Functions Variables Typedefs Enumerations Enumerator Macros Groups
core_stsmlq_sytra1.c File Reference
#include <lapacke.h>
#include "common.h"
Include dependency graph for core_stsmlq_sytra1.c:

Go to the source code of this file.

Macros

#define REAL

Functions

int CORE_stsmlq_sytra1 (int side, int trans, int m1, int n1, int m2, int n2, int k, int ib, float *A1, int lda1, float *A2, int lda2, float *V, int ldv, float *T, int ldt, float *WORK, int ldwork)
void QUARK_CORE_stsmlq_sytra1 (Quark *quark, Quark_Task_Flags *task_flags, int side, int trans, int m1, int n1, int m2, int n2, int k, int ib, int nb, float *A1, int lda1, float *A2, int lda2, float *V, int ldv, float *T, int ldt)
void CORE_stsmlq_sytra1_quark (Quark *quark)

Detailed Description

PLASMA core_blas kernel PLASMA is a software package provided by Univ. of Tennessee, Univ. of California Berkeley and Univ. of Colorado Denver

Version:
2.4.5
Author:
Hatem Ltaief
Mathieu Faverge
Azzam Haidar
Date:
2010-11-15 s Tue Nov 22 14:35:23 2011

Definition in file core_stsmlq_sytra1.c.


Macro Definition Documentation

#define REAL

Definition at line 20 of file core_stsmlq_sytra1.c.


Function Documentation

int CORE_stsmlq_sytra1 ( int  side,
int  trans,
int  m1,
int  n1,
int  m2,
int  n2,
int  k,
int  ib,
float *  A1,
int  lda1,
float *  A2,
int  lda2,
float *  V,
int  ldv,
float *  T,
int  ldt,
float *  WORK,
int  ldwork 
)

CORE_stsmlq_sytra1: see CORE_stsmlq

This kernel applies a Right transformation on | A1' A2 | and does not handle the transpose of A1. Needs therefore to make the explicit transpose of A1 before and after the application of the block of reflectors Can be further optimized by changing accordingly the underneath kernel ztsrfb!

Parameters:
[in]side
  • PlasmaLeft : apply Q or Q**T from the Left;
  • PlasmaRight : apply Q or Q**T from the Right.
[in]trans
  • PlasmaNoTrans : No transpose, apply Q;
  • PlasmaTrans : ConjTranspose, apply Q**T.
[in]M1The number of rows of the tile A1. M1 >= 0.
[in]N1The number of columns of the tile A1. N1 >= 0.
[in]M2The number of rows of the tile A2. M2 >= 0. M2 = M1 if side == PlasmaRight.
[in]N2The number of columns of the tile A2. N2 >= 0. N2 = N1 if side == PlasmaLeft.
[in]KThe number of elementary reflectors whose product defines the matrix Q.
[in]IBThe inner-blocking size. IB >= 0.
[in,out]A1On entry, the M1-by-N1 tile A1. On exit, A1 is overwritten by the application of Q.
[in]LDA1The leading dimension of the array A1. LDA1 >= max(1,M1).
[in,out]A2On entry, the M2-by-N2 tile A2. On exit, A2 is overwritten by the application of Q.
[in]LDA2The leading dimension of the tile A2. LDA2 >= max(1,M2).
[in]VThe i-th row must contain the vector which defines the elementary reflector H(i), for i = 1,2,...,k, as returned by CORE_STSLQT in the first k rows of its array argument V.
[in]LDVThe leading dimension of the array V. LDV >= max(1,K).
[out]TThe IB-by-N1 triangular factor T of the block reflector. T is upper triangular by block (economic storage); The rest of the array is not referenced.
[in]LDTThe leading dimension of the array T. LDT >= IB.
[out]WORKWorkspace array of size LDWORK-by-M1 if side == PlasmaLeft LDWORK-by-IB if side == PlasmaRight
[in]LDWORKThe leading dimension of the array WORK. LDWORK >= max(1,IB) if side == PlasmaLeft LDWORK >= max(1,N1) if side == PlasmaRight
Returns:
Return values:
PLASMA_SUCCESSsuccessful exit
<0if -i, the i-th argument had an illegal value

Definition at line 125 of file core_stsmlq_sytra1.c.

References CORE_stsmlq(), coreblas_error, and PLASMA_SUCCESS.

{
int i, j;
if ( (m1 != n1) ) {
coreblas_error(3, "Illegal value of M1, N1");
return -3;
}
/* in-place transposition of A1 */
for (j = 0; j < n1; j++){
A1[j + j*lda1] = (A1[j + j*lda1]);
for (i = j+1; i < m1; i++){
*WORK = *(A1 + i + j*lda1);
*(A1 + i + j*lda1) = (*(A1 + j + i*lda1));
*(A1 + j + i*lda1) = (*WORK);
}
}
CORE_stsmlq(side, trans, m1, n1, m2, n2, k, ib,
A1, lda1, A2, lda2,
V, ldv, T, ldt,
WORK, ldwork);
/* in-place transposition of A1 */
for (j = 0; j < n1; j++){
A1[j + j*lda1] = (A1[j + j*lda1]);
for (i = j+1; i < m1; i++){
*WORK = *(A1 + i + j*lda1);
*(A1 + i + j*lda1) = (*(A1 + j + i*lda1));
*(A1 + j + i*lda1) = (*WORK);
}
}
}

Here is the call graph for this function:

Here is the caller graph for this function:

void CORE_stsmlq_sytra1_quark ( Quark quark)

This kernel applies a Right transformation on | A1' A2 | and does not handle the transpose of A1. Needs therefore to make the explicit transpose of A1 before and after the application of the block of reflectors Can be further optimized by changing accordingly the underneath kernel ztsrfb!

Definition at line 218 of file core_stsmlq_sytra1.c.

References CORE_stsmlq_sytra1(), quark_unpack_args_18, side, T, trans, and V.

{
int side;
int trans;
int m1;
int n1;
int m2;
int n2;
int k;
int ib;
float *A1;
int lda1;
float *A2;
int lda2;
float *V;
int ldv;
float *T;
int ldt;
float *WORK;
int ldwork;
quark_unpack_args_18(quark, side, trans, m1, n1, m2, n2, k, ib,
A1, lda1, A2, lda2, V, ldv, T, ldt, WORK, ldwork);
CORE_stsmlq_sytra1(side, trans, m1, n1, m2, n2, k, ib,
A1, lda1, A2, lda2, V, ldv, T, ldt, WORK, ldwork);
}

Here is the call graph for this function:

Here is the caller graph for this function:

void QUARK_CORE_stsmlq_sytra1 ( Quark quark,
Quark_Task_Flags task_flags,
int  side,
int  trans,
int  m1,
int  n1,
int  m2,
int  n2,
int  k,
int  ib,
int  nb,
float *  A1,
int  lda1,
float *  A2,
int  lda2,
float *  V,
int  ldv,
float *  T,
int  ldt 
)

Definition at line 174 of file core_stsmlq_sytra1.c.

References CORE_stsmlq_sytra1_quark(), INOUT, INPUT, PlasmaLeft, QUARK_Insert_Task(), QUARK_REGION_D, QUARK_REGION_U, SCRATCH, and VALUE.

{
int ldwork = side == PlasmaLeft ? ib : nb;
sizeof(PLASMA_enum), &side, VALUE,
sizeof(PLASMA_enum), &trans, VALUE,
sizeof(int), &m1, VALUE,
sizeof(int), &n1, VALUE,
sizeof(int), &m2, VALUE,
sizeof(int), &n2, VALUE,
sizeof(int), &k, VALUE,
sizeof(int), &ib, VALUE,
sizeof(float)*nb*nb, A1, INOUT|QUARK_REGION_U|QUARK_REGION_D,
sizeof(int), &lda1, VALUE,
sizeof(float)*nb*nb, A2, INOUT,
sizeof(int), &lda2, VALUE,
sizeof(float)*nb*nb, V, INPUT,
sizeof(int), &ldv, VALUE,
sizeof(float)*ib*nb, T, INPUT,
sizeof(int), &ldt, VALUE,
sizeof(float)*ib*nb, NULL, SCRATCH,
sizeof(int), &ldwork, VALUE,
0);
}

Here is the call graph for this function:

Here is the caller graph for this function: