%0 Generic %D 2024 %T CholeskyQR with Randomization and Pivoting for Tall Matrices (CQRRPT) %A Maksim Melnichenko %A Oleg Balabanov %A Riley Murray %A James Demmel %A Michael W. Mahoney %A Piotr Luszczek %X This paper develops and analyzes a new algorithm for QR decomposition with column pivoting (QRCP) of rectangular matrices with large row counts. The algorithm combines methods from randomized numerical linear algebra in a particularly careful way in order to accelerate both pivot decisions for the input matrix and the process of decomposing the pivoted matrix into the QR form. The source of the latter acceleration is a use of randomized preconditioning and CholeskyQR. Comprehensive analysis is provided in both exact and finite-precision arithmetic to characterize the algorithm's rank-revealing properties and its numerical stability granted probabilistic assumptions of the sketching operator. An implementation of the proposed algorithm is described and made available inside the open-source RandLAPACK library, which itself relies on RandBLAS - also available in open-source format. Experiments with this implementation on an Intel Xeon Gold 6248R CPU demonstrate order-of-magnitude speedups relative to LAPACK's standard function for QRCP, and comparable performance to a specialized algorithm for unpivoted QR of tall matrices, which lacks the strong rank-revealing properties of the proposed method. %I arXiv %8 2024-02 %G eng %U https://arxiv.org/abs/2311.08316 %0 Generic %D 2022 %T Randomized Numerical Linear Algebra: A Perspective on the Field with an Eye to Software %A Riley Murray %A James Demmel %A Michael W. Mahoney %A N. Benjamin Erichson %A Maksim Melnichenko %A Osman Asif Malik %A Laura Grigori %A Piotr Luszczek %A Michał Dereziński %A Miles E. Lopes %A Tianyu Liang %A Hengrui Luo %A Jack Dongarra %K Randomized algorithms %X Randomized numerical linear algebra – RandNLA, for short – concerns the use of randomization as a resource to develop improved algorithms for large-scale linear algebra computations. The origins of contemporary RandNLA lay in theoretical computer science, where it blossomed from a simple idea: randomization provides an avenue for computing approximate solutions to linear algebra problems more efficiently than deterministic algorithms. This idea proved fruitful in and was largely driven by the development of scalable algorithms for machine learning and statistical data analysis applications. However, the true potential of RandNLA only came into focus once it began to integrate with the fields of numerical analysis and “classical” numerical linear algebra. Through the efforts of many individuals, randomized algorithms have been developed that provide full control over the accuracy of their solutions and that are every bit as reliable as algorithms that might be found in libraries such as LAPACK. The spectrum of possibilities offered by RandNLA has created a virtuous cycle of contributions by numerical analysts, statisticians, theoretical computer scientists, and the machine learning community. Recent years have even seen the incorporation of certain RandNLA methods into MATLAB, the NAG Library, and NVIDIA’s cuSOLVER. In view of these developments, we believe the time is ripe to accelerate the adoption of RandNLA in the scientific community. In particular, we believe the community stands to benefit significantly from a suitably defined “RandBLAS” and “RandLAPACK,” to serve as standard libraries for RandNLA, in much the same way that BLAS and LAPACK serve as standards for deterministic linear algebra. This monograph surveys the field of RandNLA as a step toward building mean- ingful RandBLAS and RandLAPACK libraries. Section 1 begins by setting scope and design principles for RandLAPACK and summarizing subsequent sections of the monograph. Section 2 focuses on RandBLAS, which is to be responsible for sketching. Details of functionality suitable for RandLAPACK are covered in the five sections that follow. Specifically, Sections 3 to 5 cover least squares and optimization, low- rank approximation, and other select problems that are well-understood in how they benefit from randomized algorithms. The remaining sections – on statistical leverage scores (Section 6) and tensor computations (Section 7) – read more like traditional surveys. The different flavor of these latter sections reflects how, in our assessment, the literature on these topics is still maturing. We provide a substantial amount of pseudo-code and supplementary material over the course of five appendices. Much of the pseudo-code has been tested via publicly available Matlab and Python implementations. %B University of California, Berkeley EECS Technical Report %I University of California, Berkeley %8 2022-11 %G eng %U https://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-258.html %R 10.48550/arXiv.2302.1147 %0 Generic %D 2020 %T Prospectus for the Next LAPACK and ScaLAPACK Libraries: Basic ALgebra LIbraries for Sustainable Technology with Interdisciplinary Collaboration (BALLISTIC) %A James Demmel %A Jack Dongarra %A Julie Langou %A Julien Langou %A Piotr Luszczek %A Michael Mahoney %X The convergence of several unprecedented changes, including formidable new system design constraints and revolutionary levels of heterogeneity, has made it clear that much of the essential software infrastructure of computational science and engineering is, or will soon be, obsolete. Math libraries have historically been in the vanguard of software that must be adapted first to such changes, both because these low-level workhorses are so critical to the accuracy and performance of so many different types of applications, and because they have proved to be outstanding vehicles for finding and implementing solutions to the problems that novel architectures pose. Under the Basic ALgebra LIbraries for Sustainable Technology with Interdisciplinary Collaboration (BALLISTIC) project, the principal designers of the Linear Algebra PACKage (LAPACK) and the Scalable Linear Algebra PACKage (ScaLAPACK), the combination of which is abbreviated Sca/LAPACK, aim to enhance and update these libraries for the ongoing revolution in processor architecture, system design, and application requirements by incorporating them into a layered package of software components—the BALLISTIC ecosystem—that provides users seamless access to state-of-the-art solver implementations through familiar and improved Sca/LAPACK interfaces. %B LAPACK Working Notes %I University of Tennessee %8 2020/07 %G eng %0 Generic %D 2018 %T Bidiagonal SVD Computation via an Associated Tridiagonal Eigenproblem %A Osni Marques %A James Demmel %A Paulo B. Vasconcelos %X In this paper, we present an algorithm for the singular value decomposition (SVD) of a bidiagonal matrix by means of the eigenpairs of an associated symmetric tridiagonal matrix. The algorithm is particularly suited for the computation of a subset of singular values and corresponding vectors. We focus on a sequential version of the algorithm, and discuss special cases and implementation details. We use a large set of bidiagonal matrices to assess the accuracy of the implementation in single and double precision, as well as to identify potential shortcomings. We show that the algorithm can be up to three orders of magnitude faster than existing algorithms, which are limited to the computation of a full SVD. We also show time comparisons of an implementation that uses the strategy discussed in the paper as a building block for the computation of the SVD of general matrices. %B LAPACK Working Note %I University of Tennessee %8 2018-04 %G eng %0 Journal Article %J SIAM Journal on Matrix Analysis and Application %D 2014 %T Communication-Avoiding Symmetric-Indefinite Factorization %A Grey Ballard %A Dulceneia Becker %A James Demmel %A Jack Dongarra %A Alex Druinsky %A I Peled %A Oded Schwartz %A Sivan Toledo %A Ichitaro Yamazaki %K plasma %X We describe and analyze a novel symmetric triangular factorization algorithm. The algorithm is essentially a block version of Aasen’s triangular tridiagonalization. It factors a dense symmetric matrix A as the product A = P LT L T P T where P is a permutation matrix, L is lower triangular, and T is block tridiagonal and banded. The algorithm is the first symmetric-indefinite communication-avoiding factorization: it performs an asymptotically optimal amount of communication in a two-level memory hierarchy for almost any cache-line size. Adaptations of the algorithm to parallel computers are likely to be communication efficient as well; one such adaptation has been recently published. The current paper describes the algorithm, proves that it is numerically stable, and proves that it is communication optimal. %B SIAM Journal on Matrix Analysis and Application %V 35 %P 1364-1406 %8 2014-07 %G eng %N 4 %0 Journal Article %J IPDPS 2013 (submitted) %D 2013 %T Implementing a Blocked Aasen’s Algorithm with a Dynamic Scheduler on Multicore Architectures %A Ichitaro Yamazaki %A Dulceneia Becker %A Jack Dongarra %A Alex Druinsky %A I. Peled %A Sivan Toledo %A Grey Ballard %A James Demmel %A Oded Schwartz %X Factorization of a dense symmetric indefinite matrix is a key computational kernel in many scientific and engineering simulations. However, there is no scalable factorization algorithm that takes advantage of the symmetry and guarantees numerical stability through pivoting at the same time. This is because such an algorithm exhibits many of the fundamental challenges in parallel programming like irregular data accesses and irregular task dependencies. In this paper, we address these challenges in a tiled implementation of a blocked Aasen’s algorithm using a dynamic scheduler. To fully exploit the limited parallelism in this left-looking algorithm, we study several performance enhancing techniques; e.g., parallel reduction to update a panel, tall-skinny LU factorization algorithms to factorize the panel, and a parallel implementation of symmetric pivoting. Our performance results on up to 48 AMD Opteron processors demonstrate that our implementation obtains speedups of up to 2.8 over MKL, while losing only one or two digits in the computed residual norms. %B IPDPS 2013 (submitted) %C Boston, MA %8 2013-00 %G eng %0 Book Section %B Handbook of Linear Algebra %D 2013 %T LAPACK %A Zhaojun Bai %A James Demmel %A Jack Dongarra %A Julien Langou %A Jenny Wang %X With a substantial amount of new material, the Handbook of Linear Algebra, Second Edition provides comprehensive coverage of linear algebra concepts, applications, and computational software packages in an easy-to-use format. It guides you from the very elementary aspects of the subject to the frontiers of current research. Along with revisions and updates throughout, the second edition of this bestseller includes 20 new chapters. %B Handbook of Linear Algebra %7 Second %I CRC Press %C Boca Raton, FL %@ 9781466507289 %G eng %0 Journal Article %J SciDAC Review %D 2009 %T Accelerating Time-To-Solution for Computational Science and Engineering %A James Demmel %A Jack Dongarra %A Armando Fox %A Sam Williams %A Vasily Volkov %A Katherine Yelick %B SciDAC Review %8 2009-00 %G eng %0 Generic %D 2009 %T Numerical Linear Algebra on Emerging Architectures: The PLASMA and MAGMA Projects %A Emmanuel Agullo %A James Demmel %A Jack Dongarra %A Bilel Hadri %A Jakub Kurzak %A Julien Langou %A Hatem Ltaeif %A Piotr Luszczek %A Rajib Nath %A Stanimire Tomov %A Asim YarKhan %A Vasily Volkov %I The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC09) %C Portland, OR %8 2009-11 %G eng %0 Conference Proceedings %B Journal of Physics: Conference Series %D 2009 %T Numerical Linear Algebra on Emerging Architectures: The PLASMA and MAGMA Projects %A Emmanuel Agullo %A James Demmel %A Jack Dongarra %A Bilel Hadri %A Jakub Kurzak %A Julien Langou %A Hatem Ltaeif %A Piotr Luszczek %A Stanimire Tomov %K magma %K plasma %B Journal of Physics: Conference Series %V 180 %8 2009-00 %G eng %0 Generic %D 2008 %T Enhancing the Performance of Dense Linear Algebra Solvers on GPUs (in the MAGMA Project) %A Marc Baboulin %A James Demmel %A Jack Dongarra %A Stanimire Tomov %A Vasily Volkov %I The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC08) %C Austin, TX %8 2008-11 %G eng %0 Generic %D 2008 %T HPCS Library Study Effort %A Jack Dongarra %A James Demmel %A Parry Husbands %A Piotr Luszczek %B University of Tennessee Computer Science Technical Report, UT-CS-08-617 %8 2008-01 %G eng %0 Journal Article %J PARA 2006 %D 2006 %T Prospectus for the Next LAPACK and ScaLAPACK Libraries %A James Demmel %A Jack Dongarra %A B. Parlett %A William Kahan %A Ming Gu %A David Bindel %A Yozo Hida %A Xiaoye Li %A Osni Marques %A Jason E. Riedy %A Christof Voemel %A Julien Langou %A Piotr Luszczek %A Jakub Kurzak %A Alfredo Buttari %A Julien Langou %A Stanimire Tomov %B PARA 2006 %C Umea, Sweden %8 2006-06 %G eng %0 Journal Article %D 2005 %T LAPACK 2005 Prospectus: Reliable and Scalable Software for Linear Algebra Computations on High End Computers %A James Demmel %A Jack Dongarra %I LAPACK Working Note 164 %8 2005-01 %G eng %0 Conference Proceedings %B IEEE Proceedings (to appear) %D 2004 %T Self Adapting Linear Algebra Algorithms and Software %A James Demmel %A Jack Dongarra %A Victor Eijkhout %A Erika Fuentes %A Antoine Petitet %A Rich Vuduc %A Clint Whaley %A Katherine Yelick %K salsa %K sans %B IEEE Proceedings (to appear) %8 2004-00 %G eng %0 Journal Article %J ACM Transactions on Mathematical Software %D 2002 %T An Updated Set of Basic Linear Algebra Subprograms (BLAS) %A Susan Blackford %A James Demmel %A Jack Dongarra %A Iain Duff %A Sven Hammarling %A Greg Henry %A Michael Heroux %A Linda Kaufman %A Andrew Lumsdaine %A Antoine Petitet %A Roldan Pozo %A Karin Remington %A Clint Whaley %B ACM Transactions on Mathematical Software %V 28 %P 135-151 %8 2002-12 %G eng %R 10.1145/567806.567807 %0 Journal Article %J (an update), submitted to ACM TOMS %D 2001 %T Basic Linear Algebra Subprograms (BLAS) %A Susan Blackford %A James Demmel %A Jack Dongarra %A Iain Duff %A Sven Hammarling %A Greg Henry %A Michael Heroux %A Linda Kaufman %A Andrew Lumsdaine %A Antoine Petitet %A Roldan Pozo %A Karin Remington %A Clint Whaley %B (an update), submitted to ACM TOMS %8 2001-02 %G eng %0 Journal Article %J Philadelphia: Society for Industrial and Applied Mathematics %D 1999 %T LAPACK Users' Guide, 3rd ed. %A Ed Anderson %A Zhaojun Bai %A Christian Bischof %A Susan Blackford %A James Demmel %A Jack Dongarra %A Jeremy Du Croz %A Anne Greenbaum %A Sven Hammarling %A Alan McKenney %A Danny Sorensen %B Philadelphia: Society for Industrial and Applied Mathematics %8 1999-01 %G eng