%0 Journal Article
%J International Journal of High Performance Computing Applications
%D 2020
%T Evaluating Asynchronous Schwarz Solvers on GPUs
%A Pratik Nayak
%A Terry Cojean
%A Hartwig Anzt
%K abstract Schwarz methods
%K asynchronous solvers
%K exascale
%K GPUs
%K multicore processors
%K parallel numerical linear algebra
%X With the commencement of the exascale computing era, we realize that the majority of the leadership supercomputers are heterogeneous and massively parallel. Even a single node can contain multiple co-processors such as GPUs and multiple CPU cores. For example, each node of ORNL’s Summit combines six NVIDIA Tesla V100 GPUs with 42 IBM Power9 cores. Synchronizing across the compute resources of multiple nodes can be prohibitively expensive. Hence, it is necessary to develop and study asynchronous algorithms that circumvent this issue of bulk-synchronous computing. In this study, we examine the asynchronous version of the abstract Restricted Additive Schwarz method as a solver. We do not explicitly synchronize, but allow the communication between the sub-domains to be completely asynchronous, thereby removing the bulk-synchronous nature of the algorithm. We accomplish this by using the one-sided Remote Memory Access (RMA) functions of the MPI standard. We study the benefits of using such an asynchronous solver over its synchronous counterpart. We also study the effect of the communication patterns, governed by the partitioning and the overlap between the sub-domains, on the global solver. Finally, we show that this concept can deliver attractive performance benefits over the synchronous counterpart, even for a well-balanced problem.
%B International Journal of High Performance Computing Applications
%8 2020-08
%G eng
%R https://doi.org/10.1177/1094342020946814
%0 Conference Paper
%B International Conference on Computational Science (ICCS 2020)
%D 2020
%T heFFTe: Highly Efficient FFT for Exascale
%A Alan Ayala
%A Stanimire Tomov
%A Azzam Haidar
%A Jack Dongarra
%K exascale
%K FFT
%K GPU
%K scalable algorithm
%X Exascale computing aspires to meet the increasing demands of large scientific applications. Software targeting exascale is typically designed for heterogeneous architectures; hence, it is important not only to develop well-designed software, but also to make it aware of the hardware architecture and able to exploit its power efficiently. Currently, many diverse applications, such as those that are part of the Exascale Computing Project (ECP) in the United States, rely on efficient computation of the Fast Fourier Transform (FFT). In this context, we present the design and implementation of the heFFTe (Highly Efficient FFT for Exascale) library, which targets the upcoming exascale supercomputers. We provide highly (linearly) scalable GPU kernels that achieve more than 40× speedup with respect to local kernels from state-of-the-art CPU libraries, and over 2× speedup for the whole FFT computation. A communication model for parallel FFTs is also provided to analyze the bottleneck for large-scale problems. We show experiments obtained on the Summit supercomputer at Oak Ridge National Laboratory, using up to 24,576 IBM Power9 cores and 6,144 NVIDIA V100 GPUs.
%B International Conference on Computational Science (ICCS 2020)
%C Amsterdam, Netherlands
%8 2020-06
%G eng
%R https://doi.org/10.1007/978-3-030-50371-0_19
%0 Conference Paper
%B 2020 IEEE/ACM 5th International Workshop on Extreme Scale Programming Models and Middleware (ESPM2)
%D 2020
%T The Template Task Graph (TTG) - An Emerging Practical Dataflow Programming Paradigm for Scientific Simulation at Extreme Scale
%A George Bosilca
%A Robert Harrison
%A Thomas Herault
%A Mohammad Mahdi Javanmard
%A Poornima Nookala
%A Edward Valeev
%K DAG
%K dataflow
%K exascale
%K graph
%K high-performance computing
%K workflow
%X We describe TESSE, an emerging general-purpose, open-source software ecosystem that attacks the twin challenges of programmer productivity and portable performance for advanced scientific applications on modern high-performance computers. TESSE builds upon and extends the PaRSEC DAG/dataflow runtime with new Domain Specific Languages (DSLs) and new integration capabilities. Motivating this work is our belief that such a dataflow model, perhaps with applications composed in domain-specific languages, can overcome many of the challenges faced by a wide variety of irregular applications that are poorly served by current programming and execution models. Two such applications, from many-body physics and applied mathematics, are briefly explored. This paper focuses on the Template Task Graph (TTG), TESSE's main C++ API, which provides a powerful work/dataflow programming model. Algorithms on spatial trees, block-sparse tensors, and wave fronts are used to illustrate the API and associated concepts, as well as to compare with related approaches.
%B 2020 IEEE/ACM 5th International Workshop on Extreme Scale Programming Models and Middleware (ESPM2)
%I IEEE
%8 2020-11
%G eng
%R https://doi.org/10.1109/ESPM251964.2020.00011
%0 Journal Article
%J Concurrency and Computation: Practice and Experience
%D 2018
%T A Survey of MPI Usage in the US Exascale Computing Project
%A David E. Bernholdt
%A Swen Boehm
%A George Bosilca
%A Manjunath Gorentla Venkata
%A Ryan E. Grant
%A Thomas Naughton
%A Howard P. Pritchard
%A Martin Schulz
%A Geoffroy R. Vallee
%K exascale
%K MPI
%X The Exascale Computing Project (ECP) is currently the primary effort in the United States focused on developing “exascale” levels of computing capabilities, including hardware, software, and applications. In order to obtain a more thorough understanding of how the software projects under the ECP are using, and planning to use, the Message Passing Interface (MPI), and to help guide the work of our own project within the ECP, we created a survey. Of the 97 ECP projects active at the time the survey was distributed, we received 77 responses, 56 of which reported that their projects were using MPI. This paper reports the results of that survey for the benefit of the broader community of MPI developers.
%B Concurrency and Computation: Practice and Experience
%8 2018-09
%G eng
%9 Special Issue
%R https://doi.org/10.1002/cpe.4851
%0 Book Section
%B Contemporary High Performance Computing: From Petascale Toward Exascale
%D 2013
%T HPC Challenge: Design, History, and Implementation Highlights
%A Jack Dongarra
%A Piotr Luszczek
%K exascale
%K HPC Challenge
%K HPCC
%B Contemporary High Performance Computing: From Petascale Toward Exascale
%I Taylor and Francis
%C Boca Raton, FL
%@ 978-1-4665-6834-1
%G eng
%& 2