Evaluating Asynchronous Schwarz Solvers on GPUs

Pratik Nayak; Terry Cojean; Hartwig Anzt

Submitted by scrawford on Wed, 02/03/2021 - 16:03

Title	Evaluating Asynchronous Schwarz Solvers on GPUs
Publication Type	Journal Article
Year of Publication	2020
Authors	Nayak, P., T. Cojean, and H. Anzt
Journal	International Journal of High Performance Computing Applications
Date Published	2020-08
Keywords	abstract Schwarz methods, asynchronous solvers, exascale, GPUs, multicore processors, parallel numerical linear algebra
Abstract	With the commencement of the exascale computing era, we realize that the majority of the leadership supercomputers are heterogeneous and massively parallel. Even a single node can contain multiple co-processors such as GPUs and multiple CPU cores. For example, ORNL’s Summit has six NVIDIA Tesla V100 GPUs and 42 IBM Power9 cores on each node. Synchronizing across compute resources of multiple nodes can be prohibitively expensive. Hence, it is necessary to develop and study asynchronous algorithms that circumvent this issue of bulk-synchronous computing. In this study, we examine the asynchronous version of the abstract Restricted Additive Schwarz method as a solver. We do not explicitly synchronize, but allow the communication between the sub-domains to be completely asynchronous, thereby removing the bulk-synchronous nature of the algorithm. We accomplish this by using the one-sided Remote Memory Access (RMA) functions of the MPI standard. We study the benefits of using such an asynchronous solver over its synchronous counterpart. We also study the effect of the communication patterns, governed by the partitioning and the overlap between the sub-domains, on the global solver. Finally, we show that this concept can render attractive performance benefits over the synchronous counterparts even for a well-balanced problem.
DOI	10.1177/1094342020946814

Project Tags:

peeks
