Stability and Performance of Various Singular Value QR Implementations on Multicore CPU with a GPU

Title: Stability and Performance of Various Singular Value QR Implementations on Multicore CPU with a GPU
Publication Type: Journal Article
Year of Publication: 2016
Authors: Yamazaki, I., S. Tomov, and J. Dongarra
Journal: ACM Transactions on Mathematical Software (TOMS)
Volume: 43
Issue: 2
Date Published: 2016-10
Abstract: To orthonormalize a set of dense vectors, Singular Value QR (SVQR) requires only one global reduction between the parallel processing units, and uses BLAS-3 kernels to perform most of its local computation. As a result, compared to other orthogonalization schemes, SVQR obtains superior performance on many of the current computers. In this paper, we study the stability and performance of various SVQR implementations on multicore CPUs with a GPU, focusing on the dense triangular solve, which performs half of the total floating-point operations in SVQR. As a part of this study, we examine its adaptive mixed-precision variant that decides if a lower-precision arithmetic can be used for the triangular solution at runtime without increasing the order of its orthogonality error. Since the backward error of this adaptive mixed-precision variant is significantly greater than that of the standard SVQR, we study its effects on the solution convergence of several subspace projection methods for solving a linear system of equations and for computing singular values or eigenvalues of a sparse matrix. Our experimental results indicate that in some cases, the convergence rate of the solver may not be affected by the larger backward errors, while reducing the time to solution.
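
For readers unfamiliar with SVQR, the steps the abstract alludes to can be sketched in a few lines of NumPy. This is an illustrative sketch of the commonly described SVQR procedure, not the authors' multicore/GPU implementation. The function name `svqr`, the diagonal scaling of the Gram matrix, and the `solve_dtype` parameter are assumptions introduced here for illustration. In particular, `solve_dtype` only forces the final solve into a chosen precision; the runtime criterion that the adaptive mixed-precision variant uses to make that choice is not given in the abstract and is not reproduced here.

```python
import numpy as np

def svqr(V, solve_dtype=None):
    """Orthonormalize the columns of a tall, full-column-rank matrix V.

    Returns (Q, R) with Q @ R ~= V, Q having orthonormal columns and R
    upper triangular. `solve_dtype` (hypothetical parameter) optionally
    runs the final solve in a lower precision such as np.float32.
    """
    # Gram matrix: in a distributed setting this is the single step
    # that needs a global reduction across processing units.
    B = V.T @ V

    # Scale the Gram matrix to unit diagonal (assumed preprocessing step).
    d = np.sqrt(np.diag(B))
    B_scaled = B / np.outer(d, d)

    # SVD of the symmetric positive definite matrix: B_scaled = U diag(s) U^T.
    U, s, _ = np.linalg.svd(B_scaled)

    # QR of sqrt(S) U^T yields upper-triangular R_s with R_s^T R_s = B_scaled.
    _, R_s = np.linalg.qr(np.sqrt(s)[:, None] * U.T)

    # Undo the diagonal scaling so that R^T R = B.
    R = R_s * d[None, :]

    # Solve Q R = V for Q. A production code would call a BLAS-3 triangular
    # solve (TRSM); np.linalg.solve is a general solver used here only to
    # keep the sketch dependent on NumPy alone.
    if solve_dtype is not None:
        Q = np.linalg.solve(R.T.astype(solve_dtype), V.T.astype(solve_dtype)).T
    else:
        Q = np.linalg.solve(R.T, V.T).T
    return Q, R

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    V = rng.standard_normal((200, 8))
    Q, R = svqr(V)
    # Orthogonality error ||I - Q^T Q|| in working precision.
    print(np.linalg.norm(np.eye(8) - Q.T @ Q))
    Q32, _ = svqr(V, solve_dtype=np.float32)
    # Same measure when the final solve runs in single precision.
    print(np.linalg.norm(np.eye(8) - Q32.T @ Q32))
```

The final solve is the step the paper concentrates on: according to the abstract it accounts for half of the floating-point operations in SVQR, which is why running it in lower precision is attractive when the orthogonality error allows it.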