Portable and Efficient Dense Linear Algebra in the Beginning of the Exascale Era

Gates, Mark; YarKhan, Asim; Sukkari, Dalal; Akbudak, Kadir; Cayrols, Sebastien; Bielich, Daniel; Abdelfattah, Ahmad; Farhan, Mohammed Al; Dongarra, Jack


Title	Portable and Efficient Dense Linear Algebra in the Beginning of the Exascale Era
Publication Type	Conference Paper
Year of Publication	2022
Authors	Gates, M., A. YarKhan, D. Sukkari, K. Akbudak, S. Cayrols, D. Bielich, A. Abdelfattah, M. Al Farhan, and J. Dongarra
Conference Name	2022 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC)
Date Published	2022-11
Publisher	IEEE
Conference Location	Dallas, TX, USA
Keywords	distributed computing, GPU computing, numerical linear algebra
Abstract	The SLATE project is implementing a distributed dense linear algebra library for highly scalable distributed-memory accelerator-based computer systems. The goal is to provide a library that can be easily ported to different hardware (CPUs, GPUs, accelerators) and will provide high performance for machines into the future. Current ports include CPUs, CUDA, ROCm, and oneAPI. We achieve both performance and portability by leveraging several layers and abstractions, including OpenMP tasks to track data dependencies, MPI for distributed communication, and the BLAS++ and LAPACK++ libraries developed as a portable layer across vendor-optimized CPU and GPU BLAS and LAPACK functionality. We rely on the C++ standard library and templating to reduce code duplication for better maintainability. The few kernels not present in BLAS are implemented in CUDA, HIP, and OpenMP target offload, and are easily ported to new platforms.
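As a rough illustration of two ingredients named in the abstract, C++ templating and OpenMP tasks that track data dependencies, the sketch below computes y += A x for a matrix stored as square tiles. It is not SLATE code: the function name `tiled_gemv`, the column-major tile layout, and the choice of one dependency per output block are assumptions made only for this example. Each tile update runs as an OpenMP task whose `depend(inout: ...)` clause on the first element of its output block orders updates to the same block, while tasks writing different blocks may run concurrently.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch (not SLATE's API): y += A * x, where A is stored as an
// mt-by-nt grid of nb-by-nb tiles. Tiles are kept in column-major tile order,
// so tile (i, j) starts at offset (i + j*mt) * nb*nb, and each tile is itself
// column-major, so element (r, c) of a tile is at r + c*nb.
template <typename T>
void tiled_gemv(std::size_t mt, std::size_t nt, std::size_t nb,
                const std::vector<T>& A, const std::vector<T>& x,
                std::vector<T>& y)
{
    const T* Ap = A.data();
    const T* xp = x.data();
    T* yp = y.data();

    // One thread creates the tasks; the whole team executes them.
    #pragma omp parallel
    #pragma omp single
    for (std::size_t i = 0; i < mt; ++i) {
        for (std::size_t j = 0; j < nt; ++j) {
            // yp[i*nb] stands in for the whole output block i: tasks that
            // update the same block are serialized in creation order,
            // tasks on different blocks are independent.
            #pragma omp task depend(inout: yp[i * nb]) firstprivate(i, j)
            {
                const T* tile = Ap + (i + j * mt) * nb * nb;
                for (std::size_t c = 0; c < nb; ++c)
                    for (std::size_t r = 0; r < nb; ++r)
                        yp[i * nb + r] += tile[r + c * nb] * xp[j * nb + c];
            }
        }
    }
    // All tasks have completed at the implicit barrier ending the parallel region.
}
```

When compiled with OpenMP enabled (for example `-fopenmp` with GCC or Clang), the tasks run in parallel; without it the pragmas are ignored and the loops run serially with the same result. Because the routine is a template, one definition serves both `float` and `double` vectors, which mirrors in miniature the role the abstract assigns to templating in reducing code duplication.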
URL	https://ieeexplore.ieee.org/document/10024624
DOI	10.1109/P3HPC56579.2022.00009

Project Tags:

slate
