Title | Analysis and Design Techniques towards High-Performance and Energy-Efficient Dense Linear Solvers on GPUs |
Publication Type | Journal Article |
Year of Publication | 2018 |
Authors | Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra |
Journal | IEEE Transactions on Parallel and Distributed Systems |
Volume | 29 |
Issue | 12 |
Pagination | 2700–2712 |
Date Published | 2018-12 |
Keywords | Dense linear solvers, energy efficiency, GPU computing |
Abstract | Graphics Processing Units (GPUs) are widely used in accelerating dense linear solvers. The matrix factorizations, which dominate the runtime for these solvers, are often designed using a hybrid scheme, where GPUs perform trailing matrix updates, while the CPUs perform the panel factorizations. Consequently, hybrid solutions require high-end CPUs and optimized CPU software in order to deliver high performance. Furthermore, they lack the energy efficiency inherent to GPUs due to the use of less energy-efficient CPUs, as well as CPU-GPU communications. This paper presents analysis and design techniques that overcome the shortcomings of the hybrid algorithms, and allow the design of high-performance and energy-efficient dense LU and Cholesky factorizations that use GPUs only. The full GPU solution eliminates the need for a high-end CPU and optimized CPU software, which leads to better energy efficiency. We discuss different design choices, and introduce optimized GPU kernels for panel factorizations. The developed solutions achieve 90+ percent of the performance of optimized hybrid solutions, while improving the energy efficiency by 50 percent. They outperform the vendor library by 30-50 percent in single precision, and 15-50 percent in double precision. We also show that hybrid designs trail the proposed solutions in performance when optimized CPU software is not available. |
DOI | 10.1109/TPDS.2018.2842785 |
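
The abstract distinguishes two phases of a blocked factorization: the panel factorization and the trailing matrix update. The following is a minimal sketch of that split for a right-looking blocked Cholesky factorization, written in plain Python and running on the CPU. It is not the authors' GPU implementation, and the names `factor_panel`, `update_trailing`, `blocked_cholesky`, and the block size `nb` are hypothetical, chosen only for illustration. In a hybrid scheme the first function would typically run on the CPU and the second on the GPU; the paper's GPU-only design runs both phases on the GPU.

```python
import math


def factor_panel(a, k, nb):
    """Factor the tall, narrow panel of columns k .. k+nb-1 in place.

    This is the latency-bound phase that hybrid schemes hand to the CPU.
    """
    n = len(a)
    end = min(k + nb, n)
    for j in range(k, end):
        # Earlier panels were already subtracted by the trailing updates,
        # so only columns k .. j-1 of this panel still contribute.
        d = a[j][j] - sum(a[j][p] ** 2 for p in range(k, j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        a[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = sum(a[i][p] * a[j][p] for p in range(k, j))
            a[i][j] = (a[i][j] - s) / a[j][j]


def update_trailing(a, k, nb):
    """Subtract the panel's contribution from the lower trailing submatrix.

    This is the compute-bound phase (a symmetric rank-nb update).
    """
    n = len(a)
    end = min(k + nb, n)
    for i in range(end, n):
        for j in range(end, i + 1):
            a[i][j] -= sum(a[i][p] * a[j][p] for p in range(k, end))


def blocked_cholesky(matrix, nb=2):
    """Return lower-triangular L with L * L^T == matrix (symmetric positive definite)."""
    a = [list(map(float, row)) for row in matrix]  # work on a copy
    n = len(a)
    for k in range(0, n, nb):
        factor_panel(a, k, nb)
        update_trailing(a, k, nb)
    # Only the lower triangle was factored; clear the stale upper triangle.
    for i in range(n):
        for j in range(i + 1, n):
            a[i][j] = 0.0
    return a


if __name__ == "__main__":
    A = [[4, 12, -16], [12, 37, -43], [-16, -43, 98]]
    for row in blocked_cholesky(A, nb=2):
        print(row)
```

With a larger `nb`, more of the arithmetic moves into `update_trailing`, which is why that phase maps well to a GPU, while the panel phase remains a chain of dependent column operations.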