Title | Distributed-Memory Multi-GPU Block-Sparse Tensor Contraction for Electronic Structure |
Publication Type | Conference Paper |
Year of Publication | 2021 |
Authors | Herault, T., Y. Robert, G. Bosilca, R. Harrison, C. Lewis, E. Valeev, and J. Dongarra |
Conference Name | 35th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2021) |
Date Published | 2021-05 |
Publisher | IEEE |
Conference Location | Portland, OR |
Keywords | block-sparse matrix multiplication, distributed-memory, electronic structure, multi-GPU node, PaRSEC, tensor contraction |
Abstract | Many domains of scientific simulation (chemistry, condensed matter physics, data science) increasingly eschew dense tensors for block-sparse tensors, sometimes with additional structure (recursive hierarchy, rank sparsity, etc.). Distributed-memory parallel computation with block-sparse tensorial data is paramount to minimize the time-to-solution (e.g., to study dynamical problems or for real-time analysis) and to accommodate problems of realistic size that are too large to fit into the host/device memory of a single node equipped with accelerators. Unfortunately, computation with such irregular data structures is a poor match to the dominant imperative, bulk-synchronous parallel programming model. In this paper, we focus on the critical element of block-sparse tensor algebra, namely binary tensor contraction, and report on an efficient and scalable implementation using the task-focused PaRSEC runtime. High performance of the block-sparse tensor contraction on the Summit supercomputer is demonstrated for synthetic data as well as for real data involved in electronic structure simulations of unprecedented size. |
URL | https://hal.inria.fr/hal-02970659/document |
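To make the core operation concrete: a binary contraction of block-sparse tensors reduces, after index fusion, to a block-sparse matrix multiplication in which only pairs of nonzero tiles that share the contraction index contribute. The sketch below is a minimal, serial Python illustration of that blockwise pattern only; it is not the paper's PaRSEC-based implementation, which schedules these tile products as tasks across distributed multi-GPU nodes. The storage layout (a dict mapping tile coordinates to dense NumPy tiles) and all names here are illustrative assumptions.

```python
# Illustrative sketch only: serial block-sparse GEMM over nonzero tiles.
# The paper's actual implementation expresses each tile product as a
# PaRSEC task dispatched to GPUs; this shows just the arithmetic pattern.
from collections import defaultdict
import numpy as np

def block_sparse_gemm(a_blocks, b_blocks):
    """Multiply block-sparse matrices stored as {(row, col): dense tile}.

    Only tile pairs sharing the contraction index k contribute, so the
    work scales with the number of matching nonzero tile pairs rather
    than with the full dense iteration space.
    """
    # Index B's tiles by their row (contraction) coordinate for fast lookup.
    b_by_k = defaultdict(list)
    for (k, j), tile in b_blocks.items():
        b_by_k[k].append((j, tile))

    c_blocks = {}
    for (i, k), a_tile in a_blocks.items():
        for j, b_tile in b_by_k.get(k, ()):
            prod = a_tile @ b_tile
            if (i, j) in c_blocks:
                c_blocks[(i, j)] += prod  # accumulate into existing tile
            else:
                c_blocks[(i, j)] = prod
    return c_blocks

# Hypothetical usage: two tiles per operand, uniform tile size.
rng = np.random.default_rng(0)
nb = 32  # tile edge length (assumed)
A = {(0, 0): rng.standard_normal((nb, nb)), (1, 2): rng.standard_normal((nb, nb))}
B = {(0, 1): rng.standard_normal((nb, nb)), (2, 1): rng.standard_normal((nb, nb))}
C = block_sparse_gemm(A, B)  # produces nonzero tiles (0, 1) and (1, 1)
```

In the distributed setting studied in the paper, the inner accumulation becomes the unit of task parallelism: each nonzero tile product is an independent task whose data dependencies the runtime tracks, which is what makes the irregular sparsity pattern tractable outside the bulk-synchronous model.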