Optimizing GPU Kernels for Irregular Batch Workloads: A Case Study for Cholesky Factorization

Submitted by scrawford on Mon, 07/09/2018 - 10:46

Title	Optimizing GPU Kernels for Irregular Batch Workloads: A Case Study for Cholesky Factorization
Publication Type	Conference Paper
Year of Publication	2018
Authors	Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra
Conference Name	IEEE High Performance Extreme Computing Conference (HPEC’18)
Date Published	2018-09
Publisher	IEEE
Conference Location	Waltham, MA
Abstract	This paper introduces several frameworks for the design and implementation of high-performance GPU kernels that target batch workloads with irregular sizes. Such workloads are ubiquitous in many scientific applications, including sparse direct solvers, astrophysics, and quantum chemistry. The paper addresses two main categories of frameworks, taking the Cholesky factorization as a case study. The first uses host-side kernel launches, and the second uses device-side launches. Within each category, different design options are introduced, with an emphasis on the advantages and the disadvantages of each approach. Our best-performing design outperforms the state-of-the-art CPU implementation, scoring up to 4.7× speedup in double precision on a Pascal P100 GPU.
