|Domain Overlap for Iterative Sparse Triangular Solves on GPUs
|Year of Publication
|Anzt, H., E. Chow, D. Szyld, and J. Dongarra
|Bungartz, H-J., P. Neumann, and W. E. Nagel
|Software for Exascale Computing - SPPEXA
|Lecture Notes in Computational Science and Engineering
|Springer International Publishing
|Iterative methods for solving sparse triangular systems are an attractive alternative to exact forward and backward substitution when an approximate solution is acceptable. On modern hardware, they offer performance benefits because they parallelize better than substitution. In this paper, we investigate how block-iterative triangular solves can benefit from overlap. Because the matrices are triangular, we use “directed” overlap, whose direction depends on whether the matrix is upper or lower triangular. We enhance a GPU implementation of the block-asynchronous Jacobi method with directed overlap. For GPUs and other cases where the problem must be overdecomposed, i.e., where there are more subdomains and threads than cores, it is preferable to process or schedule the subdomains in a specific order that follows the dependencies specified by the sparse triangular matrix. For sparse triangular factors from incomplete factorizations, we demonstrate that moderate directed overlap with subdomain scheduling can improve convergence and time-to-solution.
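To make the idea of an iterative triangular solve concrete, here is a minimal sketch of plain (point) Jacobi sweeps on a lower triangular system. It is not the paper's block-asynchronous GPU method and it does not implement directed overlap or subdomain scheduling. The function names, the dense list-of-rows storage, and the fixed sweep count are all assumptions made for illustration.

```python
def jacobi_lower_solve(L, b, sweeps):
    """Approximate the solution of L x = b with Jacobi sweeps.

    L is a dense lower triangular matrix stored as a list of rows.
    This layout is an assumption for illustration; sparse factors
    would normally use a compressed format such as CSR.
    """
    n = len(b)
    x = [0.0] * n  # initial guess
    for _ in range(sweeps):
        # Each new entry depends only on the previous iterate x,
        # so all n updates within one sweep are independent of each other.
        x = [
            (b[i] - sum(L[i][j] * x[j] for j in range(i))) / L[i][i]
            for i in range(n)
        ]
    return x


def forward_substitution(L, b):
    """Exact sequential solve, included only as a reference."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        # Row i needs the already-final values x[0..i-1].
        x[i] = (b[i] - sum(L[i][j] * x[j] for j in range(i))) / L[i][i]
    return x
```

For a triangular matrix the Jacobi iteration matrix is strictly triangular and therefore nilpotent, so n sweeps reproduce the result of forward substitution for an n-by-n system. In practice far fewer sweeps are used, and the result is accepted as an approximation.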