Submitted by scrawford on
|Title||Distributed-Memory Lattice H-Matrix Factorization|
|Publication Type||Journal Article|
|Year of Publication||2019|
|Authors||Yamazaki, I., A. Ida, R. Yokota, and J. Dongarra|
|Journal||The International Journal of High Performance Computing Applications|
We parallelize the LU factorization of a hierarchical low-rank matrix (ℋ-matrix) on a distributed-memory computer. This is much more difficult than the ℋ-matrix-vector multiplication due to the dataflow of the factorization, and it is much harder than the parallelization of a dense matrix factorization due to the irregular hierarchical block structure of the matrix. Block low-rank (BLR) format gets rid of the hierarchy and simplifies the parallelization, often increasing concurrency. However, this comes at a price of losing the near-linear complexity of the ℋ-matrix factorization. In this work, we propose to factorize the matrix using a “lattice ℋ-matrix” format that generalizes the BLR format by storing each of the blocks (both diagonals and off-diagonals) in the ℋ-matrix format. These blocks stored in the ℋ-matrix format are referred to as lattices. Thus, this lattice format aims to combine the parallel scalability of BLR factorization with the near-linear complexity of ℋ-matrix factorization. We first compare factorization performances using the ℋ-matrix, BLR, and lattice ℋ-matrix formats under various conditions on a shared-memory computer. Our performance results show that the lattice format has storage and computational complexities similar to those of the ℋ-matrix format, and hence a much lower cost of factorization than BLR. We then compare the BLR and lattice ℋ-matrix factorization on distributed-memory computers. Our performance results demonstrate that compared with BLR, the lattice format with the lower cost of factorization may lead to faster factorization on the distributed-memory computer.