%0 Generic
%D 2022
%T Communication Avoiding LU with Tournament Pivoting in SLATE
%A Rabab Alomairy
%A Mark Gates
%A Sebastien Cayrols
%A Dalal Sukkari
%A Kadir Akbudak
%A Asim YarKhan
%A Paul Bagwell
%A Jack Dongarra
%B SLATE Working Notes
%8 2022-01
%G eng
%0 Conference Paper
%B 35th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2021)
%D 2021
%T Leveraging PaRSEC Runtime Support to Tackle Challenging 3D Data-Sparse Matrix Problems
%A Qinglei Cao
%A Yu Pei
%A Kadir Akbudak
%A George Bosilca
%A Hatem Ltaief
%A David Keyes
%A Jack Dongarra
%K asynchronous executions and load balancing
%K dynamic runtime system
%K environmental applications
%K High-performance computing
%K low-rank matrix computations
%K task-based programming model
%K user productivity
%X The task-based programming model associated with dynamic runtime systems has gained popularity for challenging problems because of workload imbalance, heterogeneous resources, or extreme concurrency. During the last decade, low-rank matrix approximations, where the main idea consists of exploiting data sparsity typically by compressing off-diagonal tiles up to an application-specific accuracy threshold, have been adopted to address the curse of dimensionality at extreme scale. In this paper, we create a bridge between the runtime and the linear algebra by communicating knowledge of the data sparsity to the runtime. We design and implement this synergistic approach with high user productivity in mind, in the context of the PaRSEC runtime system and the HiCMA numerical library. This requires extending PaRSEC with new features to integrate rank information into the dataflow so that proper decisions can be taken at runtime. We focus on the tile low-rank (TLR) Cholesky factorization for solving 3D data-sparse covariance matrix problems arising in environmental applications. In particular, we employ the 3D exponential model of the Matern matrix kernel, which exhibits challenging nonuniform high ranks in off-diagonal tiles. We first provide a dynamic data structure management driven by a performance model to reduce extra floating-point operations. Next, we optimize the memory footprint of the application by relying on a dynamic memory allocator, supported by a rank-aware data distribution to cope with the workload imbalance. Finally, we expose further parallelism using kernel recursive formulations to shorten the critical path. Our resulting high-performance implementation outperforms existing data-sparse TLR Cholesky factorization by up to 7-fold on a large-scale distributed-memory system, while reducing the memory footprint by up to 44-fold. This multidisciplinary work highlights the need to empower runtime systems beyond their original duty of task scheduling for servicing next-generation low-rank matrix algebra libraries.
%I IEEE
%C Portland, OR
%8 2021-05
%G eng
%0 Generic
%D 2021
%T SLATE Performance Improvements: QR and Eigenvalues
%A Kadir Akbudak
%A Paul Bagwell
%A Sebastien Cayrols
%A Mark Gates
%A Dalal Sukkari
%A Asim YarKhan
%A Jack Dongarra
%B SLATE Working Notes
%8 2021-04
%G eng
%0 Conference Paper
%B Platform for Advanced Scientific Computing Conference (PASC20)
%D 2020
%T Extreme-Scale Task-Based Cholesky Factorization Toward Climate and Weather Prediction Applications
%A Qinglei Cao
%A Yu Pei
%A Kadir Akbudak
%A Aleksandr Mikhalev
%A George Bosilca
%A Hatem Ltaief
%A David Keyes
%A Jack Dongarra
%X Climate and weather can be predicted statistically via geospatial Maximum Likelihood Estimates (MLE), as an alternative to running large ensembles of forward models. The MLE-based iterative optimization procedure requires solving large-scale linear systems by performing a Cholesky factorization on a symmetric positive-definite covariance matrix, a demanding dense factorization in terms of memory footprint and computation. We propose a novel solution to this problem: at the mathematical level, we reduce the computational requirement by exploiting the data sparsity structure of the matrix off-diagonal tiles by means of low-rank approximations; and, at the programming-paradigm level, we integrate PaRSEC, a dynamic, task-based runtime, to reach unparalleled levels of efficiency for solving extreme-scale linear algebra matrix operations. The resulting solution leverages fine-grained computations to facilitate asynchronous execution while providing a flexible data distribution to mitigate load imbalance. Performance results are reported using 3D synthetic datasets of up to 42M geospatial locations on 130,000 cores, which represent a cornerstone toward fast and accurate predictions of environmental applications.
%I ACM
%C Geneva, Switzerland
%8 2020-06
%G eng
%R https://doi.org/10.1145/3394277.3401846
%0 Conference Paper
%B Workshop on Programming and Performance Visualization Tools (ProTools 19) at SC19
%D 2019
%T Performance Analysis of Tile Low-Rank Cholesky Factorization Using PaRSEC Instrumentation Tools
%A Qinglei Cao
%A Yu Pei
%A Thomas Herault
%A Kadir Akbudak
%A Aleksandr Mikhalev
%A George Bosilca
%A Hatem Ltaief
%A David Keyes
%A Jack Dongarra
%I ACM
%C Denver, CO
%8 2019-11
%G eng