High-performance Cholesky Factorization for GPU-only Execution

Azzam Haidar; Ahmad Abdelfattah; Stanimire Tomov; Jack Dongarra


Title	High-performance Cholesky Factorization for GPU-only Execution
Publication Type	Conference Paper
Year of Publication	2017
Authors	Haidar, A., A. Abdelfattah, S. Tomov, and J. Dongarra
Conference Name	Proceedings of the General Purpose GPUs (GPGPU-10)
Date Published	2017-02
Publisher	ACM
Conference Location	Austin, TX
Abstract	We present our performance analysis, algorithm designs, and the optimizations needed for the development of high-performance GPU-only algorithms, and in particular, for the dense Cholesky factorization. In contrast to currently promoted designs that solve parallelism challenges on multicore architectures by representing algorithms as Directed Acyclic Graphs (DAGs), where nodes are tasks of fine granularity and edges are the dependencies between the tasks, our designs explicitly target manycore architectures like GPUs and feature coarse-granularity tasks (that can be hierarchically split into fine-grained data-parallel subtasks). Furthermore, in contrast to hybrid algorithms that schedule difficult-to-parallelize tasks on CPUs, we develop highly efficient code that executes entirely on the GPU. GPU-only codes remove the expensive CPU-to-GPU communications and the tuning challenges related to a slow CPU and/or low CPU-to-GPU bandwidth. We show that on the latest GPUs, like the P100, this becomes so important that the GPU-only code even outperforms the hybrid MAGMA algorithms when the CPU tasks and communications cannot be entirely overlapped with GPU computations. We achieve up to 4,300 GFlop/s in double precision on a P100 GPU, which is about 7-8× faster than high-end multicore CPUs, e.g., two 10-core Intel Xeon E5-2650 v3 Haswell CPUs, where MKL runs at up to about 500-600 GFlop/s. The new algorithm also significantly outperforms the GPU-only implementation currently available in the NVIDIA cuSOLVER library.
DOI	10.1145/3038228.3038237
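
The coarse-granularity tasks mentioned in the abstract can be illustrated with a textbook right-looking blocked Cholesky factorization. The sketch below is a minimal pure-Python, CPU-only illustration and is not the paper's GPU implementation. The function names (potrf_unblocked, trsm_panel, syrk_update, blocked_cholesky) and the block size nb are hypothetical and only loosely follow the LAPACK/BLAS routine names POTRF, TRSM, and SYRK. Each of the three steps per block column is one coarse task whose inner loops are data-parallel.

```python
import math

def potrf_unblocked(A, k0, k1):
    """Factor the diagonal block A[k0:k1, k0:k1] in place (lower triangle)."""
    for j in range(k0, k1):
        d = A[j][j] - sum(A[j][p] ** 2 for p in range(k0, j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        A[j][j] = math.sqrt(d)
        for i in range(j + 1, k1):
            s = sum(A[i][p] * A[j][p] for p in range(k0, j))
            A[i][j] = (A[i][j] - s) / A[j][j]

def trsm_panel(A, k0, k1, n):
    """Solve X * L_kk^T = A[k1:n, k0:k1] for X, overwriting the panel."""
    for i in range(k1, n):
        for j in range(k0, k1):
            s = sum(A[i][p] * A[j][p] for p in range(k0, j))
            A[i][j] = (A[i][j] - s) / A[j][j]

def syrk_update(A, k0, k1, n):
    """Trailing update A[k1:n, k1:n] -= P * P^T, lower triangle only."""
    for i in range(k1, n):
        for j in range(k1, i + 1):
            A[i][j] -= sum(A[i][p] * A[j][p] for p in range(k0, k1))

def blocked_cholesky(A, nb=2):
    """Return lower-triangular L with L * L^T = A, using block size nb."""
    n = len(A)
    L = [row[:] for row in A]          # work on a copy of the input
    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)
        potrf_unblocked(L, k0, k1)     # coarse task 1: diagonal block
        trsm_panel(L, k0, k1, n)       # coarse task 2: panel below it
        syrk_update(L, k0, k1, n)      # coarse task 3: trailing matrix
    for i in range(n):                 # clear the unused upper triangle
        for j in range(i + 1, n):
            L[i][j] = 0.0
    return L

A = [[4.0, 12.0, -16.0],
     [12.0, 37.0, -43.0],
     [-16.0, -43.0, 98.0]]
L = blocked_cholesky(A, nb=2)
```

For large matrices most of the arithmetic falls in the trailing update, which is a plain matrix product. The diagonal-block factorization is the step that hybrid codes would typically assign to the CPU.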

Project Tags:

magma

File:

icl-utk-987-2017.pdf
