Using Advanced Vector Extensions AVX-512 for MPI Reduction

Title: Using Advanced Vector Extensions AVX-512 for MPI Reduction
Publication Type: Conference Paper
Year of Publication: 2020
Authors: Zhong, D., Q. Cao, G. Bosilca, and J. Dongarra
Conference Name: EuroMPI/USA '20: 27th European MPI Users' Group Meeting
Date Published: 2020-09
Conference Location: Austin, TX
Keywords: Instruction level parallelism, Intel AVX2/AVX-512, Long vector extension, MPI reduction operation, Single instruction multiple data, Vector operation

As the scale of high-performance computing (HPC) systems continues to grow, researchers are devoting themselves to exploring increasing levels of parallelism to achieve optimal performance. The design of modern CPUs, including hierarchical memory and SIMD/vectorization capability, governs the efficiency of algorithms. The recent introduction of wide vector instruction set extensions (AVX and SVE) has made vectorization critically important for increasing efficiency and closing the gap to peak performance. In this paper, we propose an implementation of predefined MPI reduction operations utilizing AVX, AVX2, and AVX-512 intrinsics to provide vector-based reduction operations and to improve the time-to-solution of these predefined MPI reduction operations. With these optimizations, we achieve higher efficiency for local computations, which directly benefits the overall cost of collective reductions. The evaluation of the resulting software stack under different scenarios demonstrates that the solution is both generic and efficient. Experiments conducted on an Intel Xeon Gold cluster show that our AVX-512 optimized reduction operations achieve a 10X performance benefit over the Open MPI default for MPI local reduction.
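
To make the idea of a vector-based local reduction concrete, the following is a minimal sketch in C of an element-wise float sum, the local computation step behind a sum reduction. It is not the paper's implementation: the function name `local_sum_float` and its signature are hypothetical, chosen only for illustration, and only the float/sum case is shown. The AVX-512 path is compiled only when the compiler defines `__AVX512F__` (for example with `-mavx512f`); otherwise the scalar loop handles every element.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/*
 * Hypothetical helper (not from the paper or from Open MPI):
 * computes inout[i] = in[i] + inout[i] for i in [0, count).
 */
static void local_sum_float(const float *in, float *inout, size_t count)
{
    size_t i = 0;

    /* Null buffers are only acceptable when there is nothing to reduce. */
    assert(count == 0 || (in != NULL && inout != NULL));

#if defined(__AVX512F__)
    /* Main loop: one 512-bit register holds 16 single-precision floats. */
    for (; i + 16 <= count; i += 16) {
        __m512 a = _mm512_loadu_ps(in + i);     /* unaligned load of 16 floats */
        __m512 b = _mm512_loadu_ps(inout + i);
        _mm512_storeu_ps(inout + i, _mm512_add_ps(a, b));
    }
#endif

    /* Scalar tail (or the whole buffer when AVX-512 is not enabled). */
    for (; i < count; i++) {
        inout[i] += in[i];
    }
}
```

The in/inout argument order mirrors the convention of MPI user-defined reduction functions, where the second buffer is both an operand and the destination. Unaligned loads and stores are used here so the sketch makes no assumption about buffer alignment.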
