While working on LDL factorization of symmetric indefinite matrices, it came to my attention that some key LAPACK routines such as *lasyf and *lahef could benefit greatly from a routine that computes only the lower/upper triangular part of the product C = L * W^T of two rectangular matrices. Such products arise naturally in LDL factorizations, where W = L * D and D is a symmetric matrix consisting only of 1-by-1 and 2-by-2 diagonal blocks. Since D is symmetric, C = L * (L * D)^T = L * D * L^T is itself symmetric, although this is not obvious from the form of C as defined.
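To spell out the symmetry argument (just restating the definitions above):

```latex
C \;=\; L\,W^{T} \;=\; L\,(L D)^{T} \;=\; L\,D^{T} L^{T} \;=\; L\,D\,L^{T},
\qquad\text{since } D^{T} = D,
```

and therefore C^T = (L D L^T)^T = L D L^T = C, so only one triangle of C needs to be computed and stored.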
Typically C is computed in LAPACK routines through blocking, with *GEMV called inside a loop to compute the diagonal part of each block. This is clearly inefficient: a level 3 BLAS routine would provide a much better FLOP-to-data-access ratio and much better performance overall.
It is also apparent that some sparse LDL solvers (including, I believe, MA57) would benefit greatly from a new BLAS routine that computes C directly in a single BLAS 3 call.
It turns out that BLAST already proposes a routine called SY_TRIDIAG_R2K, which computes the product L * D * L^T. However, it is not clear how efficient an implementation of such a function would be, nor how it would fit into the LAPACK framework: L * D is already precomputed inside the *lasyf and *lahef routines, and all that is needed is the product of L and (L * D)^T, so SY_TRIDIAG_R2K would simply waste time redoing that computation. Moreover, implementing the proposed routine C = op(A) * op(B) is trivial given an implementation of *SYRK, since it requires only very minor modifications. From my own experience, it took me about 25 minutes to write it once I had my own implementation of *SYRK. The kernels are exactly the same: only the arguments are different.
It would be nice if this were investigated as part of PLASMA development. I believe that *lasyf and *lahef routines modified this way would have performance quite close to that of the blocked Cholesky factorization.