Bidiagonalization and R-Bidiagonalization: Parallel Tiled Algorithms, Critical Paths and Distributed-Memory Implementation

Mathieu Faverge; Julien Langou; Yves Robert; Jack Dongarra


Title	Bidiagonalization and R-Bidiagonalization: Parallel Tiled Algorithms, Critical Paths and Distributed-Memory Implementation
Publication Type	Conference Paper
Year of Publication	2017
Authors	Faverge, M., J. Langou, Y. Robert, and J. Dongarra
Conference Name	IEEE International Parallel and Distributed Processing Symposium (IPDPS)
Date Published	2017-05
Publisher	IEEE
Conference Location	Orlando, FL
Keywords	Algorithm design and analysis, Approximation algorithms, Kernel, Multicore processing, Shape, Software algorithms, Transforms
Abstract	We study tiled algorithms for going from a “full” matrix to a condensed “band bidiagonal” form using orthogonal transformations: (i) the tiled bidiagonalization algorithm BIDIAG, which is a tiled version of the standard scalar bidiagonalization algorithm; and (ii) the R-bidiagonalization algorithm R-BIDIAG, which is a tiled version of the algorithm which consists in first performing the QR factorization of the initial matrix, then performing the band-bidiagonalization of the R-factor. For both BIDIAG and R-BIDIAG, we use four main types of reduction trees, namely FLATTS, FLATTT, GREEDY, and a newly introduced auto-adaptive tree, AUTO. We provide a study of critical path lengths for these tiled algorithms, which shows that (i) R-BIDIAG has a shorter critical path length than BIDIAG for tall and skinny matrices, and (ii) GREEDY-based schemes are much better than earlier proposed algorithms with unbounded resources. We provide experiments on a single multicore node, and on a few multicore nodes of a parallel distributed shared-memory system, to show the superiority of the new algorithms on a variety of matrix sizes, matrix shapes and core counts.
DOI	10.1109/IPDPS.2017.46

File:

icl-utk-959-2017.pdf