Variable-Size Batched LU for Small Matrices and Its Integration into Block-Jacobi Preconditioning

Submitted by claxton on Mon, 11/27/2017 - 12:01

Title	Variable-Size Batched LU for Small Matrices and Its Integration into Block-Jacobi Preconditioning
Publication Type	Conference Paper
Year of Publication	2017
Authors	Anzt, H., J. Dongarra, G. Flegar, and E. S. Quintana-Orti
Conference Name	46th International Conference on Parallel Processing (ICPP)
Date Published	2017-08
Publisher	IEEE
Conference Location	Bristol, United Kingdom
Keywords	graphics processing units, Jacobian matrices, kernel, linear systems, parallel processing, sparse matrices
Abstract	We present a set of new batched CUDA kernels for the LU factorization of a large collection of independent problems of different size, and the subsequent triangular solves. All kernels heavily exploit the registers of the graphics processing unit (GPU) in order to deliver high performance for small problems. The development of these kernels is motivated by the need for tackling this embarrassingly parallel scenario in the context of block-Jacobi preconditioning that is relevant for the iterative solution of sparse linear systems.
URL	http://ieeexplore.ieee.org/abstract/document/8025283/?reload=true
DOI	10.1109/ICPP.2017.18
