Innovative Computing Laboratory

SC24 BoF Session

November 19, 2024, Atlanta, GA, USA

Batch computations solve many relatively small, independent problems on HPC architectures. For over a decade, there has been strong demand for high-performance batch linear algebra (LA) software, especially for uniform batches. However, we may be just scratching the surface of what batch LA software can offer. From non-uniform batches to batch sparse algorithms and even JIT-compiled linear operators, applications are constantly pushing the boundaries of batch LA software. Interested audience members are encouraged to attend this BoF, listen to short presentations by experts from academia and industry, and share their feedback and experiences with the community.

Over the past ten years, the demand for high-performance batch linear algebra has skyrocketed. Prior to 2014, there was little or no industry support for batch linear algebra software. Today, every major vendor (Intel, NVIDIA, and AMD) supports some form of batch LA functionality in its software stack. In addition, the research community has contributed significantly to the development of batch LA software beyond vendor support, and has expanded its adoption in critical applications. Experts from the research community and industry have gathered on multiple occasions to share their work and ideas, develop a standard interface for batch BLAS, and identify potential challenges and future directions. This will be the third SC BoF about batch LA software. The first two BoFs were held in 2017 and 2018, and focused mostly on batch BLAS for dense matrices. At this BoF, we will push the boundaries beyond batch BLAS operations and discuss new aspects of batch LA software. Batch sparse LA algorithms, both direct and iterative, have attracted noticeable interest thanks to the ECP project. Non-uniform batches are also gaining ground in sparse direct solvers. We also show use cases where scientists must build linear operators at run time in order to achieve high-performance tensor contraction kernels in finite element analysis. The BoF has good representation from academia (University of Tennessee and the Technical University of Munich), US national laboratories (Sandia National Laboratories and Lawrence Livermore National Laboratory), and industry (Intel and NVIDIA).
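To make the uniform vs. non-uniform distinction concrete, here is a minimal sketch in Python/NumPy (not any vendor's batch BLAS API). A uniform batch of same-size GEMMs maps naturally onto a single 3-D array and one broadcast call, while a non-uniform batch, where each problem has its own dimensions, cannot, which is one reason it demands dedicated library support.

```python
import numpy as np

rng = np.random.default_rng(0)

# Uniform batch: 100 independent 8x8 matrix multiplications.
# np.matmul broadcasts over the leading (batch) dimension, which is
# analogous in spirit to a vendor batched GEMM routine.
A = rng.standard_normal((100, 8, 8))
B = rng.standard_normal((100, 8, 8))
C = A @ B  # one call performs all 100 small GEMMs
print(C.shape)  # (100, 8, 8)

# Non-uniform batch: matrices of different sizes cannot share one
# 3-D array, so each problem carries its own (m, k, n) dimensions
# and is dispatched individually (here, a plain Python loop).
sizes = [(4, 6, 5), (8, 8, 8), (3, 7, 2)]  # hypothetical (m, k, n) per problem
batch = [(rng.standard_normal((m, k)), rng.standard_normal((k, n)))
         for m, k, n in sizes]
results = [a @ b for a, b in batch]
print([r.shape for r in results])  # [(4, 5), (8, 8), (3, 2)]
```

The loop over the non-uniform batch is exactly the overhead that specialized non-uniform batch kernels aim to eliminate on GPUs, where launching one kernel per tiny problem wastes the hardware.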

SC24 BoF Session: Batched, Reproducible, and Reduced Precision BLAS

Presenter | Affiliation | Title
Ahmad Abdelfattah | University of Tennessee | Introduction / MAGMA
Christoph Klein | NVIDIA | Batch Linear Algebra APIs in the NVIDIA Math Libraries
Sarah Knepper | Intel | Evolution of Batched Linear Algebra in Intel oneAPI Math Kernel Library (Intel oneMKL)
Hartwig Anzt | Technical University of Munich / University of Tennessee | The Future is Sparse – also for Batched Functionality?
Siva Rajamanickam | Sandia National Laboratories | Kokkos Kernels Batched Linear Algebra
Tzanio Kolev | Lawrence Livermore National Laboratory | Batched Linear Algebra for High-Order Finite Elements