Accelerating Fusion Plasma Collision Operator Solves with Portable Batched Iterative Solvers on GPUs

Title: Accelerating Fusion Plasma Collision Operator Solves with Portable Batched Iterative Solvers on GPUs
Publication Type: Conference Paper
Year of Publication: 2024
Authors: Lin, P. T., P. Nayak, A. Kashi, D. Kulkarni, A. Scheinberg, and H. Anzt
Editor: Weiland, M., S. Neuwirth, C. Kruse, and T. Weinzierl
Conference Name: ISC High Performance 2024 International Workshops
Date Published: 2024-12
Publisher: Springer, Cham
Conference Location: Hamburg, Germany
ISBN Number: 978-3-031-73715-2
Abstract

High-fidelity numerical simulations are necessary to drive design choices for future fusion devices, e.g., the ITER tokamak. XGC is a gyrokinetic Particle-in-Cell (PIC) application optimized for modeling the edge region plasma. The Coulomb collision operator is one of the more computationally expensive components of XGC. It requires the solution of a large number of small linear systems whose matrices share an identical sparsity pattern. These solves are still performed on the CPU, a major bottleneck given that exascale-class machines have over 95% of their compute performance on the GPUs. As the collision operator matrices are sparse, well-conditioned, and of medium size, batched iterative solvers utilizing sparse data structures are an attractive option.

We showcase the acceleration of XGC with an integration of the Ginkgo batched iterative solvers with realistic test cases from ITER and DIII-D. We build on our previous work, which focused on integration into a collision kernel proxy application, showing the substantial promise of Ginkgo’s solvers. We present results obtained from three platforms: NVIDIA A100 GPUs (NERSC Perlmutter), AMD MI250X GPUs (OLCF Frontier) and Intel Max 1550 GPUs (ALCF Aurora) and show the reduction in time provided by the Ginkgo solver compared with the CPU solver. We present a weak scaling study to almost full-scale on the NVIDIA platform. The results show that Ginkgo’s batched sparse iterative solvers enable efficient utilization of the GPU for this problem. The performance portability of Ginkgo in conjunction with Kokkos (used within XGC as the heterogeneous programming model) allows seamless execution on exascale-oriented heterogeneous architectures.

URL: https://link.springer.com/10.1007/978-3-031-73716-9
DOI: 10.1007/978-3-031-73716-9