Batched BLAS

Overview

A current trend in high-performance computing is to decompose a large linear algebra problem into batches containing thousands of smaller problems, which can be solved independently, before collating the results. To standardize the interface for such batched computations, the community is developing an extension to the BLAS standard (the batched BLAS), enabling users to perform thousands of small BLAS operations in parallel whilst making efficient use of their hardware.
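To make the idea concrete, the following is a minimal reference sketch of a batched matrix multiply in plain Python. It is not the interface defined in the Batched BLAS specification: the names `gemm` and `batched_gemm`, the list-of-rows matrix layout, and the argument order are assumptions chosen only for illustration. Each problem in the batch computes alpha * A_i * B_i + beta * C_i and depends on no other problem, which is what allows an optimized implementation to run them in parallel.

```python
def gemm(alpha, a, b, beta, c):
    """Return alpha * (a @ b) + beta * c for small dense matrices.

    Matrices are lists of rows (an illustrative layout, not a BLAS one).
    """
    m, k, n = len(a), len(b), len(b[0])
    return [
        [
            alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
            for j in range(n)
        ]
        for i in range(m)
    ]


def batched_gemm(alpha, a_batch, b_batch, beta, c_batch):
    """Apply gemm to every (a, b, c) triple in the batch.

    The problems are independent, so this loop is the part that a tuned
    library could distribute across cores or GPU threads.
    """
    if not (len(a_batch) == len(b_batch) == len(c_batch)):
        raise ValueError("all batches must contain the same number of problems")
    return [
        gemm(alpha, a, b, beta, c)
        for a, b, c in zip(a_batch, b_batch, c_batch)
    ]


if __name__ == "__main__":
    # Two independent 2x2 problems solved in a single call.
    a_batch = [[[1.0, 2.0], [3.0, 4.0]], [[1.0, 0.0], [0.0, 1.0]]]
    b_batch = [[[1.0, 0.0], [0.0, 1.0]], [[5.0, 6.0], [7.0, 8.0]]]
    c_batch = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
    for result in batched_gemm(2.0, a_batch, b_batch, 1.0, c_batch):
        print(result)
```

A sequential loop like this one is useful mainly as a correctness reference; the point of a standardized batched interface is that the caller hands over the whole batch at once, leaving the library free to choose how to schedule the small problems on the hardware.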

Workshops

SC24 BoF Session

SIAM-CSE19

SC18 BoF Session

SC17 BoF Session

Workshop on Batched, Reproducible, and Reduced Precision BLAS 2017

Workshop on Batched, Reproducible, and Reduced Precision BLAS 2016

Papers

Jack Dongarra, Iain Duff, Mark Gates, Azzam Haidar, Sven Hammarling, Nicholas J. Higham, Jonathan Hogg, Pedro Valero-Lara, Piotr Luszczek, Mawussi Zounon, Samuel D. Relton, Stanimire Tomov, Timothy Costa, and Sarah Knepper, "Batched BLAS (Basic Linear Algebra Subprograms) 2018 Specification", July 2018.
Samuel D. Relton, Pedro Valero-Lara, and Mawussi Zounon, "A Comparison of Potential Interfaces for Batched BLAS Computations", NLAFET Working Note 5, August 2016.
Jack Dongarra, Iain Duff, Mark Gates, Azzam Haidar, Sven Hammarling, Nicholas J. Higham, Jonathan Hogg, Pedro Valero-Lara, Mawussi Zounon, Samuel D. Relton, and Stanimire Tomov, "A Proposed API for Batched Basic Linear Algebra Subprograms", Draft Report, May 2016.
Peter Ahrens, Hong Diep Nguyen, and James Demmel, "Efficient Reproducible Floating Point Summation and BLAS", Electrical Engineering and Computer Sciences, University of California at Berkeley, Technical Report No. UCB/EECS-2015-229, December 2015.

Papers

Batched BLAS SC25 Handout

Batched BLAS Poster

Batched BLAS Slides

Links

ReproBLAS

Compact Batched API Document
