Using long vector extensions for MPI reductions

Zhong, Dong; Cao, Qinglei; Bosilca, George; Dongarra, Jack

Title	Using long vector extensions for MPI reductions
Publication Type	Journal Article
Year of Publication	2022
Authors	Zhong, D., Q. Cao, G. Bosilca, and J. Dongarra
Journal	Parallel Computing
Volume	109
Pagination	102871
Date Published	2022-03
ISSN	0167-8191
Abstract	The design of modern CPUs, including deep memory hierarchies and SIMD/vectorization capability, has a more significant impact on algorithms’ efficiency than the modest frequency increases observed recently. The recent introduction of wide vector instruction set extensions (AVX and SVE) has made vectorization a critical software component for increasing efficiency and closing the gap to peak performance. In this paper, we investigate the impact of vectorizing MPI reduction operations. We propose an implementation of the predefined MPI reduction operations using vector intrinsics (AVX and SVE) to improve their time-to-solution. The evaluation of the resulting software stack under different scenarios demonstrates that the approach is not only efficient but also generalizable to many vector architectures. Experiments conducted on varied architectures (Intel Xeon Gold, AMD Zen 2, and Arm A64FX) show that the proposed vector-extension-optimized reduction operations significantly reduce completion time for collective communication reductions. With these optimizations, we achieve higher memory bandwidth and increased efficiency for local computations, which directly benefit the overall cost of collective reductions and the applications based on them.
URL	https://www.sciencedirect.com/science/article/pii/S0167819121001137
DOI	10.1016/j.parco.2021.102871
Short Title	Parallel Computing
