Title | Harnessing GPU Tensor Cores for Fast FP16 Arithmetic to Speed up Mixed-Precision Iterative Refinement Solvers
Publication Type | Conference Paper
Year of Publication | 2018
Authors | Haidar, A., S. Tomov, J. Dongarra, and N. J. Higham |
Conference Name | The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC18) |
Date Published | 2018-11 |
Publisher | IEEE |
Conference Location | Dallas, TX |
Abstract | Low-precision floating-point arithmetic is a powerful tool for accelerating scientific computing applications, especially those in artificial intelligence. Here, we present an investigation showing that other high-performance computing (HPC) applications can also harness this power. Specifically, we use the general HPC problem, Ax = b, where A is a large dense matrix, and a double precision (FP64) solution is needed for accuracy. Our approach is based on mixed-precision (FP16-FP64) iterative refinement, and we generalize and extend prior advances into a framework, for which we develop architecture-specific algorithms and highly tuned implementations. These new methods show how using half-precision Tensor Cores (FP16-TC) for the arithmetic can provide up to a 4× speedup. This is due to the performance boost that the FP16-TC provide, as well as to the improved accuracy over classical FP16 arithmetic that is obtained because the GEMM accumulation occurs in FP32 arithmetic.
DOI | 10.1109/SC.2018.00050 |
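
The abstract describes the classic mixed-precision iterative-refinement pattern: factorize the matrix once in low precision, then repeatedly solve for a correction using residuals computed in FP64. The sketch below is illustrative only and is not the authors' GPU implementation; it uses FP32 (LAPACK single-precision LU via SciPy) as a stand-in for the paper's FP16-TC factorization, since standard CPU libraries expose no half-precision solver. The function name `mixed_precision_refine`, the tolerance, and the test matrix are assumptions made for the example.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def mixed_precision_refine(A, b, tol=1e-12, max_iter=50):
    """Iterative refinement: factorize A in low precision (FP32 here,
    standing in for the paper's FP16-TC), then refine the solution with
    residuals computed in FP64 until double-precision accuracy is reached."""
    A64 = np.asarray(A, dtype=np.float64)
    b64 = np.asarray(b, dtype=np.float64)

    # Low-precision LU factorization: the expensive O(n^3) step.
    lu, piv = lu_factor(A64.astype(np.float32))

    # Initial low-precision solve, promoted back to FP64.
    x = lu_solve((lu, piv), b64.astype(np.float32)).astype(np.float64)

    for _ in range(max_iter):
        r = b64 - A64 @ x                      # residual in FP64
        if np.linalg.norm(r) <= tol * np.linalg.norm(b64):
            break
        # Correction solve reuses the low-precision factors: cheap O(n^2).
        d = lu_solve((lu, piv), r.astype(np.float32)).astype(np.float64)
        x += d
    return x

# Usage: a well-conditioned (diagonally dominant) random system.
rng = np.random.default_rng(0)
n = 500
A = rng.standard_normal((n, n)) + n * np.eye(n)
b = rng.standard_normal(n)
x = mixed_precision_refine(A, b)
print(np.linalg.norm(A @ x - b) / np.linalg.norm(b))  # ~1e-16
```

The design point the paper exploits is visible in the cost split: only the cheap triangular solves and residual updates run per iteration, while the dominant factorization runs once in the fast low-precision arithmetic.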