PLASMA  2.4.5
PLASMA - Parallel Linear Algebra for Scalable Multi-core Architectures
core_ztsmqr_hetra1.c File Reference
#include <lapacke.h>
#include "common.h"

Macros

#define COMPLEX

Functions

int CORE_ztsmqr_hetra1 (int side, int trans, int m1, int n1, int m2, int n2, int k, int ib, PLASMA_Complex64_t *A1, int lda1, PLASMA_Complex64_t *A2, int lda2, PLASMA_Complex64_t *V, int ldv, PLASMA_Complex64_t *T, int ldt, PLASMA_Complex64_t *WORK, int ldwork)
void QUARK_CORE_ztsmqr_hetra1 (Quark *quark, Quark_Task_Flags *task_flags, int side, int trans, int m1, int n1, int m2, int n2, int k, int ib, int nb, PLASMA_Complex64_t *A1, int lda1, PLASMA_Complex64_t *A2, int lda2, PLASMA_Complex64_t *V, int ldv, PLASMA_Complex64_t *T, int ldt)
void CORE_ztsmqr_hetra1_quark (Quark *quark)

Detailed Description

PLASMA core_blas kernel.
PLASMA is a software package provided by Univ. of Tennessee, Univ. of California Berkeley and Univ. of Colorado Denver.

Version:
2.4.5
Author:
Hatem Ltaief
Mathieu Faverge
Jakub Kurzak
Azzam Haidar
Date:
2010-11-15
Precisions:
normal z -> c d s

Definition in file core_ztsmqr_hetra1.c.


Macro Definition Documentation

#define COMPLEX

Definition at line 21 of file core_ztsmqr_hetra1.c.


Function Documentation

int CORE_ztsmqr_hetra1 ( int  side,
int  trans,
int  m1,
int  n1,
int  m2,
int  n2,
int  k,
int  ib,
PLASMA_Complex64_t * A1,
int  lda1,
PLASMA_Complex64_t * A2,
int  lda2,
PLASMA_Complex64_t * V,
int  ldv,
PLASMA_Complex64_t * T,
int  ldt,
PLASMA_Complex64_t * WORK,
int  ldwork 
)

CORE_ztsmqr_hetra1: see CORE_ztsmqr

This kernel applies a left transformation on

    | A1' |
    | A2  |

It therefore needs to explicitly (conjugate-)transpose A1 in place before and after applying the block of reflectors. It can be further optimized by adapting the underlying kernel ztsrfb accordingly.

Parameters:
[in] side
  • PlasmaLeft : apply Q or Q**H from the Left;
  • PlasmaRight : apply Q or Q**H from the Right.
[in] trans
  • PlasmaNoTrans : No transpose, apply Q;
  • PlasmaConjTrans : ConjTranspose, apply Q**H.
[in] m1  The number of rows of the tile A1. M1 >= 0.
[in] n1  The number of columns of the tile A1. N1 >= 0.
[in] m2  The number of rows of the tile A2. M2 >= 0. M2 = M1 if side == PlasmaRight.
[in] n2  The number of columns of the tile A2. N2 >= 0. N2 = N1 if side == PlasmaLeft.
[in] k  The number of elementary reflectors whose product defines the matrix Q.
[in] ib  The inner-blocking size. IB >= 0.
[in,out] A1  On entry, the M1-by-N1 tile A1. On exit, A1 is overwritten by the application of Q.
[in] lda1  The leading dimension of the array A1. LDA1 >= max(1,M1).
[in,out] A2  On entry, the M2-by-N2 tile A2. On exit, A2 is overwritten by the application of Q.
[in] lda2  The leading dimension of the tile A2. LDA2 >= max(1,M2).
[in] V  The i-th row must contain the vector which defines the elementary reflector H(i), for i = 1,2,...,k, as returned by CORE_ZTSQRT in the first k columns of its array argument V.
[in] ldv  The leading dimension of the array V. LDV >= max(1,K).
[in] T  The IB-by-N1 triangular factor T of the block reflector. T is upper triangular by block (economic storage); the rest of the array is not referenced.
[in] ldt  The leading dimension of the array T. LDT >= IB.
[out] WORK  Workspace array of size
  • LDWORK-by-N1 if side == PlasmaLeft;
  • LDWORK-by-IB if side == PlasmaRight.
[in] ldwork  The leading dimension of the array WORK.
  • LDWORK >= max(1,IB) if side == PlasmaLeft;
  • LDWORK >= max(1,M1) if side == PlasmaRight.
Returns:
Return values:
PLASMA_SUCCESS  successful exit
<0  if -i, the i-th argument had an illegal value
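The WORK and ldwork requirements above can be turned into a small sizing helper. The following is only a sketch: `sketch_side`, `min_ldwork` and `work_elems` are hypothetical names that are not part of PLASMA, and the enum stands in for the PlasmaLeft/PlasmaRight constants.

```c
#include <stddef.h>

/* Stand-ins for PlasmaLeft / PlasmaRight (assumption: the real constants
 * come from the PLASMA headers; only their distinctness matters here). */
enum sketch_side { SKETCH_LEFT, SKETCH_RIGHT };

static int max_int(int a, int b) { return a > b ? a : b; }

/* Smallest LDWORK the parameter list allows for the given side:
 * max(1,IB) on the left, max(1,M1) on the right. */
static int min_ldwork(enum sketch_side side, int m1, int ib)
{
    return side == SKETCH_LEFT ? max_int(1, ib) : max_int(1, m1);
}

/* Number of WORK elements: LDWORK-by-N1 (left) or LDWORK-by-IB (right). */
static size_t work_elems(enum sketch_side side, int ldwork, int n1, int ib)
{
    int cols = side == SKETCH_LEFT ? n1 : ib;
    return (size_t)ldwork * (size_t)cols;
}
```

For a square 200-by-200 tile with ib = 40, both sides come out at 8000 elements under these rules: 40-by-200 on the left and 200-by-40 on the right.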

Definition at line 127 of file core_ztsmqr_hetra1.c.

References conj(), CORE_ztsmqr(), coreblas_error, and PLASMA_SUCCESS.

{
    int i, j;

    /* A1 must be square for the in-place conjugate transposition below */
    if (m1 != n1) {
        coreblas_error(3, "Illegal value of M1, N1");
        return -3;
    }

    /* In-place conjugate transposition of A1; WORK[0] is the swap temporary */
    for (j = 0; j < n1; j++) {
        A1[j + j*lda1] = conj(A1[j + j*lda1]);
        for (i = j+1; i < m1; i++) {
            *WORK = *(A1 + i + j*lda1);
            *(A1 + i + j*lda1) = conj(*(A1 + j + i*lda1));
            *(A1 + j + i*lda1) = conj(*WORK);
        }
    }

    CORE_ztsmqr(side, trans, m1, n1, m2, n2, k, ib,
                A1, lda1, A2, lda2,
                V, ldv, T, ldt,
                WORK, ldwork);

    /* In-place conjugate transposition of A1, restoring its original layout */
    for (j = 0; j < n1; j++) {
        A1[j + j*lda1] = conj(A1[j + j*lda1]);
        for (i = j+1; i < m1; i++) {
            *WORK = *(A1 + i + j*lda1);
            *(A1 + i + j*lda1) = conj(*(A1 + j + i*lda1));
            *(A1 + j + i*lda1) = conj(*WORK);
        }
    }

    return PLASMA_SUCCESS;
}
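The transposition loop in the listing above can be tried in isolation. The sketch below uses C99 `double complex` in place of PLASMA_Complex64_t and a local temporary in place of WORK[0]; `conj_transpose_inplace` and `round_trip_ok` are hypothetical names, not PLASMA functions.

```c
#include <complex.h>

/* Conjugate-transpose the n-by-n column-major tile a in place
 * (leading dimension lda), using the same loop structure as the kernel. */
static void conj_transpose_inplace(int n, double complex *a, int lda)
{
    for (int j = 0; j < n; j++) {
        /* Diagonal entries stay in place and are only conjugated. */
        a[j + j*lda] = conj(a[j + j*lda]);
        for (int i = j + 1; i < n; i++) {
            /* Swap entries (i,j) and (j,i), conjugating both. */
            double complex tmp = a[i + j*lda];
            a[i + j*lda] = conj(a[j + i*lda]);
            a[j + i*lda] = conj(tmp);
        }
    }
}

/* Returns 1 if applying the operation twice restores a small test tile. */
static int round_trip_ok(void)
{
    /* 2-by-2 tile stored column by column: [ 1+2i  5+6i ; 3+4i  7+8i ]. */
    double complex a[4] = { 1 + 2*I, 3 + 4*I, 5 + 6*I, 7 + 8*I };

    conj_transpose_inplace(2, a, 2);
    /* Entry (1,0) must now hold the conjugate of the old entry (0,1). */
    if (a[1] != 5 - 6*I || a[2] != 3 - 4*I)
        return 0;

    conj_transpose_inplace(2, a, 2);
    return a[0] == 1 + 2*I && a[1] == 3 + 4*I
        && a[2] == 5 + 6*I && a[3] == 7 + 8*I;
}
```

Because the operation is its own inverse, running it once before and once after CORE_ztsmqr leaves A1 in its original layout, which is why the kernel repeats the same loop twice.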


void CORE_ztsmqr_hetra1_quark ( Quark * quark)

Definition at line 212 of file core_ztsmqr_hetra1.c.

References CORE_ztsmqr_hetra1(), quark_unpack_args_18, side, T, trans, and V.

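{
    int side;
    int trans;
    int m1;
    int n1;
    int m2;
    int n2;
    int k;
    int ib;
    PLASMA_Complex64_t *A1;
    int lda1;
    PLASMA_Complex64_t *A2;
    int lda2;
    PLASMA_Complex64_t *V;
    int ldv;
    PLASMA_Complex64_t *T;
    int ldt;
    PLASMA_Complex64_t *WORK;
    int ldwork;

    /* Unpack the 18 task arguments in the order they were inserted */
    quark_unpack_args_18(quark, side, trans, m1, n1, m2, n2, k, ib,
                         A1, lda1, A2, lda2, V, ldv, T, ldt, WORK, ldwork);
    CORE_ztsmqr_hetra1(side, trans, m1, n1, m2, n2, k, ib,
                       A1, lda1, A2, lda2, V, ldv, T, ldt, WORK, ldwork);
}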


void QUARK_CORE_ztsmqr_hetra1 ( Quark * quark,
Quark_Task_Flags * task_flags,
int  side,
int  trans,
int  m1,
int  n1,
int  m2,
int  n2,
int  k,
int  ib,
int  nb,
PLASMA_Complex64_t * A1,
int  lda1,
PLASMA_Complex64_t * A2,
int  lda2,
PLASMA_Complex64_t * V,
int  ldv,
PLASMA_Complex64_t * T,
int  ldt 
)

Definition at line 173 of file core_ztsmqr_hetra1.c.

References CORE_ztsmqr_hetra1_quark(), INOUT, INPUT, PlasmaLeft, QUARK_Insert_Task(), QUARK_REGION_D, QUARK_REGION_L, SCRATCH, and VALUE.

{
    int ldwork = side == PlasmaLeft ? ib : nb;

    QUARK_Insert_Task(quark, CORE_ztsmqr_hetra1_quark, task_flags,
        sizeof(PLASMA_enum),              &side,   VALUE,
        sizeof(PLASMA_enum),              &trans,  VALUE,
        sizeof(int),                      &m1,     VALUE,
        sizeof(int),                      &n1,     VALUE,
        sizeof(int),                      &m2,     VALUE,
        sizeof(int),                      &n2,     VALUE,
        sizeof(int),                      &k,      VALUE,
        sizeof(int),                      &ib,     VALUE,
        sizeof(PLASMA_Complex64_t)*nb*nb, A1,      INOUT|QUARK_REGION_D|QUARK_REGION_L,
        sizeof(int),                      &lda1,   VALUE,
        sizeof(PLASMA_Complex64_t)*nb*nb, A2,      INOUT,
        sizeof(int),                      &lda2,   VALUE,
        sizeof(PLASMA_Complex64_t)*nb*nb, V,       INPUT,
        sizeof(int),                      &ldv,    VALUE,
        sizeof(PLASMA_Complex64_t)*ib*nb, T,       INPUT,
        sizeof(int),                      &ldt,    VALUE,
        sizeof(PLASMA_Complex64_t)*ib*nb, NULL,    SCRATCH,
        sizeof(int),                      &ldwork, VALUE,
        0);
}
