Mixed-precision Block Gram Schmidt Orthogonalization

Ichitaro Yamazaki; Stanimire Tomov; Jakub Kurzak; Jack Dongarra; Jesse Barlow


Title	Mixed-precision Block Gram Schmidt Orthogonalization
Publication Type	Conference Paper
Year of Publication	2015
Authors	Yamazaki, I., S. Tomov, J. Kurzak, J. Dongarra, and J. Barlow
Conference Name	6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems
Date Published	2015-11
Publisher	ACM
Conference Location	Austin, TX
Abstract	The mixed-precision Cholesky QR (CholQR) can orthogonalize the columns of a dense matrix with the minimum communication cost. Moreover, its orthogonality error depends only linearly on the condition number of the input matrix. However, when the desired higher precision is not supported by the hardware, software-emulated arithmetic is needed, which can significantly increase the computational cost. When a large number of columns must be orthogonalized, this computational overhead can have a significant impact on the orthogonalization time, and the mixed-precision CholQR can be much slower than the standard CholQR. In this paper, we examine several block variants of the algorithm, which reduce the computational overhead associated with the software-emulated arithmetic while maintaining the same orthogonality error bound as the mixed-precision CholQR. Our numerical and performance results on multicore CPUs with a GPU, as well as on a hybrid CPU/GPU cluster, demonstrate that, compared to the mixed-precision CholQR, such a block variant can obtain speedups of up to 7:1 while maintaining about the same order of numerical error.
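
For readers unfamiliar with CholQR, the idea in the abstract can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the paper's implementation: it assumes float32 as the working precision and float64 as the "higher" precision, where the paper's setting may require software-emulated arithmetic. The names chol_qr, gram_dtype, and orthogonality_error are hypothetical, and the block variants examined in the paper are not shown.

```python
import numpy as np

def chol_qr(V, gram_dtype=None):
    """Cholesky QR sketch: factor V = Q R using the Cholesky factor of V^T V.

    gram_dtype is a hypothetical knob: if given, the Gram matrix and its
    Cholesky factor are computed in that (higher) precision, while the
    triangular solve stays in the working precision of V.
    """
    work = V if gram_dtype is None else V.astype(gram_dtype)
    G = work.T @ work                              # Gram matrix (the single reduction)
    R = np.linalg.cholesky(G).T.astype(V.dtype)    # upper triangular, G = R^T R
    # Q = V R^{-1}, obtained from R^T Q^T = V^T. A general solver is used for
    # brevity; a dedicated triangular solve would be the usual choice.
    Q = np.linalg.solve(R.T, V.T).T
    return Q, R

def orthogonality_error(Q):
    """Return ||I - Q^T Q||_2, evaluated in float64."""
    Q64 = Q.astype(np.float64)
    return np.linalg.norm(np.eye(Q64.shape[1]) - Q64.T @ Q64, 2)
```

Under these assumptions, chol_qr(V) on a float32 matrix corresponds to the standard CholQR, and chol_qr(V, gram_dtype=np.float64) corresponds to the mixed-precision form; comparing orthogonality_error on the two results for an ill-conditioned V is one way to observe the difference the abstract describes.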

Project Tags:

magma

File:

icl-utk-821-2015.pdf