Mixed-precision Block Gram Schmidt Orthogonalization

Title: Mixed-precision Block Gram Schmidt Orthogonalization
Publication Type: Conference Paper
Year of Publication: 2015
Authors: Yamazaki, I., S. Tomov, J. Kurzak, J. Dongarra, and J. Barlow
Conference Name: 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems
Date Published: 2015-11
Conference Location: Austin, TX
Abstract: The mixed-precision Cholesky QR (CholQR) can orthogonalize the columns of a dense matrix with the minimum communication cost. Moreover, its orthogonality error depends only linearly on the condition number of the input matrix. However, when the desired higher precision is not supported by the hardware, software-emulated arithmetic is needed, which can significantly increase the computational cost. When a large number of columns must be orthogonalized, this computational overhead can have a significant impact on the orthogonalization time, and the mixed-precision CholQR can be much slower than the standard CholQR. In this paper, we examine several block variants of the algorithm, which reduce the computational overhead associated with the software-emulated arithmetic while maintaining the same orthogonality error bound as the mixed-precision CholQR. Our numerical and performance results on multicore CPUs with a GPU, as well as on a hybrid CPU/GPU cluster, demonstrate that, compared to the mixed-precision CholQR, such a block variant can obtain speedups of up to 7:1 while maintaining about the same order of numerical error.
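
For readers unfamiliar with the baseline algorithm named in the abstract, the following is a minimal NumPy sketch of a mixed-precision CholQR, not the block variants studied in the paper. It assumes a float32 working precision with float64 standing in for the "higher precision" (here hardware-supported rather than software-emulated), and it assumes the Gram matrix and its Cholesky factor are the steps carried out in higher precision. The names `cholqr_mixed` and `orthogonality_error` are hypothetical and chosen only for illustration.

```python
import numpy as np


def cholqr_mixed(a, high=np.float64):
    """Sketch of CholQR: A = Q R, with the Gram matrix formed in `high` precision.

    Assumption: only the Gram matrix and its Cholesky factorization use the
    higher precision; the triangular solve runs in the working precision of `a`.
    """
    a_high = a.astype(high)
    gram = a_high.T @ a_high                 # B = A^T A, accumulated in high precision
    r_high = np.linalg.cholesky(gram).T      # upper-triangular R with B = R^T R
    r = r_high.astype(a.dtype)               # round R back to the working precision
    # Solve R^T X = A^T, so that X^T = A R^{-1} = Q. A general solver is used
    # here for simplicity; a triangular solver would exploit the structure of R.
    q = np.linalg.solve(r.T, a.T).T
    return q, r


def orthogonality_error(q):
    """Return ||I - Q^T Q||_2, evaluated in float64."""
    q64 = q.astype(np.float64)
    return np.linalg.norm(np.eye(q64.shape[1]) - q64.T @ q64, 2)
```

Passing `high=np.float32` for a float32 input reduces this sketch to a standard single-precision CholQR, which gives a simple way to compare the orthogonality error of the two on a moderately ill-conditioned matrix.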