Title | MAGMA Batched: A Batched BLAS Approach for Small Matrix Factorizations and Applications on GPUs |
Publication Type | Tech Report |
Year of Publication | 2016 |
Authors | Dong, T., A. Haidar, P. Luszczek, S. Tomov, A. Abdelfattah, and J. Dongarra |
Technical Report Series Title | Innovative Computing Laboratory Technical Report |
Number | ICL-UT-16-02 |
Date Published | 2016-08 |
Institution | University of Tennessee |
Abstract | A particularly challenging class of problems arising in many applications, called batched problems, involves linear algebra operations on many small matrices. To solve them, we proposed and designed batched BLAS (Basic Linear Algebra Subroutines) kernels for the Level-2 GEMV and Level-3 GEMM routines. We illustrate how batched GEMV and GEMM can assist batched advanced factorizations (e.g., bi-diagonalization) and other BLAS routines (e.g., triangular solve) in achieving optimal performance on GPUs. Our solutions achieved up to 2.8-3× speedups compared to CUBLAS and MKL solutions, wherever possible. We illustrated the batched methodology on a real-world hydrodynamics application by reformulating its tensor operations into batched BLAS GEMV and GEMM operations. A 2.5× speedup and a 1.4× greenup were obtained by changing only 10% of the code. We accelerated and scaled the application on the Titan supercomputer to 4,096 nodes. |