Domain Overlap for Iterative Sparse Triangular Solves on GPUs

Hartwig Anzt; Edmond Chow; Daniel Szyld; Jack Dongarra

Submitted by webmaster on Wed, 12/07/2016 - 09:18

Title	Domain Overlap for Iterative Sparse Triangular Solves on GPUs
Publication Type	Conference Proceedings
Year of Publication	2016
Authors	Anzt, H., E. Chow, D. Szyld, and J. Dongarra
Editors	Bungartz, H-J., P. Neumann, and W. E. Nagel
Conference Name	Software for Exascale Computing - SPPEXA
Series Title	Lecture Notes in Computational Science and Engineering
Volume	113
Pagination	527–545
Date Published	2016-09
Publisher	Springer International Publishing
Abstract	Iterative methods for solving sparse triangular systems are an attractive alternative to exact forward and backward substitution if an approximation of the solution is acceptable. On modern hardware, iterative methods can deliver performance benefits because they allow for better parallelization. In this paper, we investigate how block-iterative triangular solves can benefit from using overlap. Because the matrices are triangular, we use “directed” overlap, depending on whether the matrix is upper or lower triangular. We enhance a GPU implementation of the block-asynchronous Jacobi method with directed overlap. For GPUs and other cases where the problem must be overdecomposed, i.e., more subdomains and threads than cores, it is preferable to process or schedule the subdomains in a specific order that follows the dependencies specified by the sparse triangular matrix. For sparse triangular factors from incomplete factorizations, we demonstrate that moderate directed overlap with subdomain scheduling can improve convergence and time-to-solution.
DOI	10.1007/978-3-319-40528-5_24
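The idea described in the abstract (block-Jacobi on a lower-triangular system, with overlap that extends each subdomain only toward the rows it depends on) can be illustrated with a small sequential simulation. This is a minimal sketch under stated assumptions, not the authors' GPU implementation. It uses a dense NumPy matrix instead of a sparse GPU format, contiguous row blocks as subdomains, and a fixed number of local Jacobi sweeps per block. It models "scheduling subdomains in dependency order" only by letting blocks run in ascending order and read the freshest values (`ordered=True`). Truly asynchronous execution is not modeled. The names `local_jacobi`, `block_jacobi_directed_overlap`, `sweeps` and `ordered` are hypothetical. "Directed overlap" is assumed here to mean extending each subdomain only toward lower row indices, since each row of a lower-triangular matrix depends only on earlier rows.

```python
import numpy as np


def local_jacobi(L, b, x, lo, hi, sweeps):
    """Jacobi sweeps on rows lo..hi-1 of L x = b; entries of x outside that range stay fixed."""
    xs = x.copy()                      # private copy: the subdomain's local iterate
    d = np.diag(L)[lo:hi]
    for _ in range(sweeps):
        # x_i <- (b_i - sum_{j != i} L_ij x_j) / L_ii, all rows of the range updated simultaneously
        off = L[lo:hi, :] @ xs - d * xs[lo:hi]
        xs[lo:hi] = (b[lo:hi] - off) / d
    return xs


def block_jacobi_directed_overlap(L, b, block_size, overlap, outer_iters,
                                  sweeps, ordered=False):
    """Block-Jacobi sketch for a lower-triangular L (dense here, for clarity only).

    Assumption: directed overlap = each block is extended by `overlap` rows toward
    lower row indices only, because lower-triangular rows depend on earlier rows.
    ordered=False: every block reads the previous outer iterate (synchronous block-Jacobi).
    ordered=True:  blocks run in ascending order and read the freshest values, a crude
                   stand-in for scheduling subdomains along the matrix dependencies.
    """
    n = len(b)
    x = np.zeros(n)
    for _ in range(outer_iters):
        src = x if ordered else x.copy()   # which iterate the blocks read from
        for s in range(0, n, block_size):
            e = min(s + block_size, n)
            ext = max(0, s - overlap)      # directed overlap: grow the subdomain upward only
            xs = local_jacobi(L, b, src, ext, e, sweeps)
            x[s:e] = xs[s:e]               # overlap rows are discarded; only owned rows are written
    return x


if __name__ == "__main__":
    n = 12
    # Hypothetical test problem: a sparse, diagonally dominant lower-bidiagonal matrix.
    L = 4.0 * np.eye(n) + np.diag(-np.ones(n - 1), -1)
    b = np.ones(n)
    x_ref = np.linalg.solve(L, b)
    for ov in (0, 4):
        x = block_jacobi_directed_overlap(L, b, block_size=4, overlap=ov,
                                          outer_iters=2, sweeps=4 + ov)
        print(f"overlap={ov}: max error after 2 outer iterations = "
              f"{np.max(np.abs(x - x_ref)):.2e}")
```

On this 12-row toy problem with three blocks, two synchronous outer iterations without overlap still leave an error in the last block. With an overlap of one block, the result matches the exact solution up to round-off, because each extended subdomain also covers the preceding block, so dependencies are resolved across more than one block per outer iteration. The overlap rows are recomputed but never written back, so every row remains owned by exactly one subdomain.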