Distributed-Memory Lattice H-Matrix Factorization

Ichitaro Yamazaki; Akihiro Ida; Rio Yokota; Jack Dongarra


Title	Distributed-Memory Lattice H-Matrix Factorization
Publication Type	Journal Article
Year of Publication	2019
Authors	Yamazaki, I., A. Ida, R. Yokota, and J. Dongarra
Journal	The International Journal of High Performance Computing Applications
Volume	33
Issue	5
Pagination	1046–1063
Date Published	2019-08
Abstract	We parallelize the LU factorization of a hierarchical low-rank matrix (ℋ-matrix) on a distributed-memory computer. This is much more difficult than the ℋ-matrix-vector multiplication due to the dataflow of the factorization, and it is much harder than the parallelization of a dense matrix factorization due to the irregular hierarchical block structure of the matrix. Block low-rank (BLR) format gets rid of the hierarchy and simplifies the parallelization, often increasing concurrency. However, this comes at a price of losing the near-linear complexity of the ℋ-matrix factorization. In this work, we propose to factorize the matrix using a “lattice ℋ-matrix” format that generalizes the BLR format by storing each of the blocks (both diagonals and off-diagonals) in the ℋ-matrix format. These blocks stored in the ℋ-matrix format are referred to as lattices. Thus, this lattice format aims to combine the parallel scalability of BLR factorization with the near-linear complexity of ℋ-matrix factorization. We first compare factorization performances using the ℋ-matrix, BLR, and lattice ℋ-matrix formats under various conditions on a shared-memory computer. Our performance results show that the lattice format has storage and computational complexities similar to those of the ℋ-matrix format, and hence a much lower cost of factorization than BLR. We then compare the BLR and lattice ℋ-matrix factorization on distributed-memory computers. Our performance results demonstrate that compared with BLR, the lattice format with the lower cost of factorization may lead to faster factorization on the distributed-memory computer.
DOI	10.1177/1094342019861139
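
The flat tiling that the abstract attributes to the BLR format can be illustrated with a small sketch. This is not the authors' code and it performs no factorization. It only shows the storage idea under stated assumptions: a hypothetical smooth kernel matrix, a uniform tile size, dense diagonal tiles, and off-diagonal tiles compressed by truncated SVD. All function names, the kernel, and the tolerance are illustrative choices. In the lattice format described in the abstract, each tile would instead be stored as an ℋ-matrix; that step is not shown here.

```python
import numpy as np

def low_rank(block, tol):
    """Truncated SVD of one tile: returns (U, V) with block ~= U @ V."""
    u, s, vt = np.linalg.svd(block, full_matrices=False)
    # Keep singular values above a relative threshold (illustrative choice).
    rank = max(1, int(np.sum(s > tol * s[0])))
    return u[:, :rank] * s[:rank], vt[:rank, :]

def blr_compress(a, block_size, tol=1e-8):
    """Flat BLR-style tiling: dense diagonal tiles, low-rank off-diagonal tiles."""
    n = a.shape[0]
    tiles = {}
    for i in range(0, n, block_size):
        for j in range(0, n, block_size):
            blk = a[i:i + block_size, j:j + block_size]
            tiles[(i, j)] = blk.copy() if i == j else low_rank(blk, tol)
    return tiles

def blr_to_dense(tiles, n):
    """Reassemble the full matrix from the tiles, for checking accuracy only."""
    out = np.zeros((n, n))
    for (i, j), t in tiles.items():
        blk = t[0] @ t[1] if isinstance(t, tuple) else t
        out[i:i + blk.shape[0], j:j + blk.shape[1]] = blk
    return out

def storage(tiles):
    """Number of stored floating-point entries across all tiles."""
    return sum(t[0].size + t[1].size if isinstance(t, tuple) else t.size
               for t in tiles.values())

# Hypothetical test problem: a smooth kernel sampled on points in [0, 1].
n = 256
x = np.linspace(0.0, 1.0, n)
a = 1.0 / (1.0 + np.abs(x[:, None] - x[None, :]))

tiles = blr_compress(a, block_size=64)
rel_err = np.linalg.norm(blr_to_dense(tiles, n) - a) / np.linalg.norm(a)
print("stored entries:", storage(tiles), "of", a.size)
print("relative error:", rel_err)
```

Because every tile has the same size and sits on a regular grid, tiles map naturally onto a process grid, which is the parallelization advantage the abstract describes. The cost is that the ranks and tile counts of a flat tiling do not give the near-linear complexity of a hierarchical format.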

Project Tags:

magma

peeks

File:

icl-utk-1231-2019.pdf
