Recent Progress on Accelerating Dense Eigenvalue Solvers
Abstract
The need for solving large-scale dense eigenvalue problems appears in a number
of application areas, in particular in computational chemistry and physics. The
development of algorithms and implementations, well adapted to heterogeneous,
massively parallel computing environments, turns out to be a highly nontrivial
task. The aim of this minisymposium is to report on significant progress made
in this direction during the last few years.
Organizers
- Daniel Kressner, EPFL, Switzerland
- Hatem Ltaief, KAUST Supercomputing Laboratory, Saudi Arabia
- Piotr Luszczek, University of Tennessee, Knoxville, USA
- Robert A. van de Geijn, University of Texas, Austin, USA
Part I
Tuesday, February 26
MS75
9:30 AM - 11:30 AM Room: Harbor Ballroom I - Conference Level
- 9:30-9:55 Designing Fast Eigenvalue Solvers on Manycore Systems
Piotr Luszczek, University of Tennessee, Knoxville, USA
- 10:00-10:25 Parallel Multishift QR and QZ Algorithms with Advanced Deflation Strategies - Recent Progress
Bo T. Kågström, Umeå University, Sweden
Abstract. Key techniques used in our novel distributed memory parallel QR
and QZ algorithms include multi-window bulge chain chasing and distributed
aggressive early deflation (AED), which enable level-3 chasing and delayed
update operations as well as improved eigenvalue convergence. Recent
progress includes a multi-level recursive approach for performing AED in a
parallel environment leading to communication avoiding algorithms via data
redistribution. Application and test benchmarks confirm the superb
performance of our parallel library software.
- 10:30-10:55 The Parallel Nonsymmetric QR Algorithm with Aggressive Early Deflation
Meiyue Shao, EPFL, Switzerland
Abstract. We present the new parallel nonsymmetric QR algorithm in
ScaLAPACK v2.0. The multiwindow bulge chain chasing approach ensures that
most computations in the bulge chasing stage are performed in level 3 BLAS.
We also develop multilevel aggressive early deflation algorithms which
decrease the total amount of communications. These techniques make the new
approach significantly outperform the pipelined QR algorithm in ScaLAPACK
v1.8.0. Both performance models and numerical experiments demonstrate the
efficiency of our new approach.
- 11:00-11:25 Towards a Fine-Grained Parallel Implementation of the Nonsymmetric QR Algorithm
Lars Karlsson and updated Bo T. Kågström, Umeå University, Sweden
Abstract. We present the first step towards a fine-grained parallel
implementation of the non-symmetric QR algorithm targeting multi-core
processors and shared memory. Our primary goal is high performance on the
full range from small to large problems. To reach this goal, we overlap the
critical path with delayed updates and parallelize both the bulge chasing
and the aggressive early deflation kernels. In addition, different
scheduling techniques and matrix storage formats are evaluated.
Part II
Tuesday, February 26
MS94
2:00 PM - 4:00 PM
Room: Harbor Ballroom I - Conference Level
- 2:00-2:25 Avoiding Communication in Parallel Bidiagonalization of Band Matrices
Grey Ballard, Nicholas Knight, and James W. Demmel, University of California, Berkeley, USA
Abstract. Successive band reduction is a technique for reducing a symmetric
band matrix to tridiagonal form for the symmetric eigenproblem. We have shown
that a careful reformulation of the technique can asymptotically reduce the
communication costs (i.e., data movement) on a sequential machine, compared to
standard algorithms. In this talk, we will present the application of this
approach to the reduction of a non-symmetric band matrix to bidiagonal form
(for the SVD) in the distributed-memory parallel setting.
- 2:30-2:55 Improved Accuracy for MR3-based Eigensolvers
Paolo Bientinesi, RWTH Aachen University, Germany
Abstract. A number of algorithms exist for the dense Hermitian eigenproblem. In
many cases, MRRR is the fastest one, although it does not deliver the same
accuracy as Divide&Conquer or the QR algorithm. We demonstrate how the use of
mixed precisions in MRRR-based eigensolvers leads to an improved orthogonality,
even surpassing the accuracy of DC and QR. Our approach comes with limited
performance penalty, and increases both robustness and scalability.
- 3:00-3:25 Restructuring the Symmetric QR Algorithm for Performance
Robert A. van de Geijn, University of Texas, Austin, USA
- 3:30-3:55
Spectral Divide-and-conquer Algorithms for Generalized Eigenvalue Problems
Yuji Nakatsukasa, University of Manchester, United Kingdom
Abstract. Spectral divide-and-conquer (SDC) algorithms solve eigenvalue
problems by computing an invariant subspace for a subset of the spectrum to
decouple the problem into two smaller subproblems. Recently Nakatsukasa and
Higham introduced a stable and efficient SDC algorithm for the symmetric
eigenvalue problem and the SVD. We propose an SDC algorithm for generalized
eigenvalue problems that minimizes communication (its main computational
kernels are QR factorization and matrix-multiplication) and has operation count
within a small constant factor of that for the standard QZ algorithm.