Ginkgo—A math library designed for platform portability

Title: Ginkgo—A math library designed for platform portability
Publication Type: Journal Article
Year of Publication: 2022
Authors: Cojean, T., Y-H. Mike Tsai, and H. Anzt
Journal: Parallel Computing
Volume: 111
Pagination: 102902
Date Published: 2022-02
ISSN: 0167-8191
Keywords: AMD, Intel, NVIDIA, performance portability, platform portability, porting to GPU accelerators
Abstract

In an era of increasing computer system diversity, the portability of software from one system to another plays a central role. Software portability matters to software developers, since many software projects outlive any specific system, e.g., a supercomputer, and to domain scientists who implement their scientific applications in a software framework and want to be able to run them on one system or another. At a high level, there exist two approaches to realizing platform portability: (1) implementing software on top of a portability layer that generates architecture-specific kernels from a single source language or through a common interface for running on different architectures; and (2) providing backends for different hardware architectures, with the backends typically differing in how and in which programming language functionality is realized, since each uses the language of choice for its hardware (e.g., CUDA kernels for NVIDIA GPUs, SYCL (DPC++) kernels targeting Intel GPUs and other supported hardware, …). In practice, these two approaches can be combined in applications to leverage their respective strengths. In this paper, we present how we realize portability across different hardware architectures for the Ginkgo library by following the second strategy, with the goal of not only porting to new hardware architectures but also achieving good performance. We present the Ginkgo library design, which separates algorithms from the hardware-specific kernels that form the distinct hardware executors, and report our experience in adding execution backends for NVIDIA, AMD, and Intel GPUs. We also present the performance we achieve with this approach on the distinct hardware backends.
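To make the executor idea concrete, the following is a minimal, self-contained C++ sketch of the design pattern described in the abstract: the algorithm layer is written once against an abstract executor interface, and each hardware backend supplies its own kernel implementations behind that interface. The names used here (Executor, ReferenceExecutor, axpy, scaled_accumulate) are illustrative assumptions and do not reproduce Ginkgo's actual API.

// Illustrative sketch of the executor pattern; not Ginkgo's real classes.
#include <cstddef>
#include <iostream>
#include <memory>
#include <vector>

// Abstract executor: one virtual entry point per low-level kernel.
struct Executor {
    virtual ~Executor() = default;
    virtual void axpy(double alpha, const std::vector<double>& x,
                      std::vector<double>& y) const = 0;
};

// Reference/host backend: plain C++ loops.
struct ReferenceExecutor : Executor {
    void axpy(double alpha, const std::vector<double>& x,
              std::vector<double>& y) const override {
        for (std::size_t i = 0; i < y.size(); ++i) {
            y[i] += alpha * x[i];
        }
    }
};

// Algorithm layer: written once, independent of the backend.
void scaled_accumulate(const Executor& exec, double alpha,
                       const std::vector<double>& x, std::vector<double>& y) {
    exec.axpy(alpha, x, y);  // the executor decides where and how this runs
}

int main() {
    auto exec = std::make_unique<ReferenceExecutor>();
    std::vector<double> x{1.0, 2.0, 3.0};
    std::vector<double> y{0.5, 0.5, 0.5};
    scaled_accumulate(*exec, 2.0, x, y);
    for (auto v : y) {
        std::cout << v << " ";  // prints: 2.5 4.5 6.5
    }
    std::cout << "\n";
    return 0;
}

A GPU backend would implement the same interface by dispatching to CUDA, HIP, or SYCL kernels, so the algorithm layer stays unchanged when a new architecture is added.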

URL: https://www.sciencedirect.com/science/article/pii/S0167819122000096
DOI: 10.1016/j.parco.2022.102902