|Title|MAGMA Batched: A Batched BLAS Approach for Small Matrix Factorizations and Applications on GPUs|
|Publication Type|Tech Report|
|Year of Publication|2016|
|Authors|Dong, T., A. Haidar, P. Luszczek, S. Tomov, A. Abdelfattah, and J. Dongarra|
|Technical Report Series Title|Innovative Computing Laboratory Technical Report|
|Institution|University of Tennessee|
A particularly challenging class of problems arising in many applications, called batched problems, involves linear algebra operations on many small matrices. We propose and design batched BLAS (Basic Linear Algebra Subprograms) routines, the Level-2 GEMV and the Level-3 GEMM, to solve them. We illustrate how batched GEMV and GEMM can be used to build batched advanced factorizations (e.g., bi-diagonalization) and other BLAS routines (e.g., triangular solve) that achieve optimal performance on GPUs. Our solutions achieve up to 2.8–3× speedups over the corresponding CUBLAS and MKL routines, where such routines are available. We demonstrate the batched methodology on a real-world hydrodynamic application by reformulating its tensor operations into batched BLAS GEMV and GEMM operations. A 2.5× speedup and a 1.4× greenup (energy-efficiency improvement) are obtained by changing about 10% of the code, and the accelerated application is scaled to 4,096 nodes of the Titan supercomputer.
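To make the batched idea concrete, the sketch below shows one way to launch many small matrix multiplications in a single GPU call. It uses CUBLAS's cublasDgemmBatched as a stand-in interface (the report's MAGMA Batched routines have their own API, not shown here); the matrix size, batch count, and data initialization are illustrative assumptions, and error checking is omitted for brevity.

```c
/* Sketch: many independent small GEMMs C_i = alpha*A_i*B_i + beta*C_i
 * submitted in one batched call, instead of one kernel launch per matrix.
 * Uses cuBLAS's batched GEMM as an analogue of the batched BLAS interface
 * described in the abstract; sizes and data are placeholder assumptions. */
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <stdlib.h>

int main(void) {
    const int n = 16;        /* each matrix is 16 x 16 (a "small" size) */
    const int batch = 1000;  /* number of independent small problems    */
    const double alpha = 1.0, beta = 0.0;

    /* One contiguous slab per operand; inputs left uninitialized in this sketch. */
    double *dA, *dB, *dC;
    cudaMalloc(&dA, sizeof(double) * n * n * batch);
    cudaMalloc(&dB, sizeof(double) * n * n * batch);
    cudaMalloc(&dC, sizeof(double) * n * n * batch);

    /* Batched interfaces take arrays of pointers, one per matrix in the batch. */
    double **hA = (double **)malloc(batch * sizeof(double *));
    double **hB = (double **)malloc(batch * sizeof(double *));
    double **hC = (double **)malloc(batch * sizeof(double *));
    for (int i = 0; i < batch; ++i) {
        hA[i] = dA + (size_t)i * n * n;
        hB[i] = dB + (size_t)i * n * n;
        hC[i] = dC + (size_t)i * n * n;
    }
    double **dAarr, **dBarr, **dCarr;
    cudaMalloc(&dAarr, batch * sizeof(double *));
    cudaMalloc(&dBarr, batch * sizeof(double *));
    cudaMalloc(&dCarr, batch * sizeof(double *));
    cudaMemcpy(dAarr, hA, batch * sizeof(double *), cudaMemcpyHostToDevice);
    cudaMemcpy(dBarr, hB, batch * sizeof(double *), cudaMemcpyHostToDevice);
    cudaMemcpy(dCarr, hC, batch * sizeof(double *), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    /* A single call performs all `batch` small matrix multiplications. */
    cublasDgemmBatched(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                       n, n, n,
                       &alpha,
                       (const double *const *)dAarr, n,
                       (const double *const *)dBarr, n,
                       &beta,
                       dCarr, n,
                       batch);

    cublasDestroy(handle);
    cudaFree(dAarr); cudaFree(dBarr); cudaFree(dCarr);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(hA); free(hB); free(hC);
    return 0;
}
```

Grouping the small problems into one batched call amortizes launch overhead and exposes enough parallelism to keep the GPU busy, which is the performance rationale behind the batched GEMV/GEMM approach the report develops.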