Predicting MPI Collective Communication Performance Using Machine Learning

Sascha Hunold; Abhinav Bhatele; George Bosilca; Peter Knees

Submitted by scrawford on Fri, 03/26/2021 - 12:25

Title	Predicting MPI Collective Communication Performance Using Machine Learning
Publication Type	Conference Paper
Year of Publication	2020
Authors	Hunold, S., A. Bhatele, G. Bosilca, and P. Knees
Conference Name	2020 IEEE International Conference on Cluster Computing (CLUSTER)
Date Published	2020-09
Publisher	IEEE
Conference Location	Kobe, Japan
Keywords	Auto-tuning, GAM, KNN, Machine Learning, message passing interface, Performance Prediction, XGBoost
Abstract	The Message Passing Interface (MPI) defines the semantics of data communication operations, while the implementing libraries provide several parameterized algorithms for each operation. Each algorithm of an MPI collective operation may work best on a particular system and may be dependent on the specific communication problem. Internally, MPI libraries employ heuristics to select the best algorithm for a given communication problem when being called by an MPI application. The majority of MPI libraries allow users to override the default algorithm selection, enabling the tuning of this selection process. The problem then becomes how to select the best possible algorithm for a specific case automatically. In this paper, we address the algorithm selection problem for MPI collective communication operations. To solve this problem, we propose an auto-tuning framework for collective MPI operations based on machine-learning techniques. First, we execute a set of benchmarks of an MPI library and its entire set of collective algorithms. Second, for each algorithm, we fit a performance model by applying regression learners. Last, we use the regression models to predict the best possible (fastest) algorithm for an unseen communication problem. We evaluate our approach for different MPI libraries and several parallel machines. The experimental results show that our approach outperforms the standard algorithm selection heuristics, which are hard-coded into the MPI libraries, by a significant margin.
DOI	10.1109/CLUSTER49012.2020.00036
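
The three steps named in the abstract (benchmark every algorithm, fit one regression model per algorithm, pick the algorithm with the lowest predicted runtime) can be illustrated with a minimal sketch. This is not the authors' implementation. The algorithm names, the toy cost model that stands in for real benchmark measurements, the log-scaled features, and the hand-written k-nearest-neighbor regressor are all assumptions made for illustration. KNN appears in the keywords, but its use here is only an example of a regression learner.

```python
import math
from collections import defaultdict

# Hypothetical algorithm labels for one collective; not taken from any MPI library.
ALGORITHMS = ["binomial_tree", "ring"]


def toy_runtime(algorithm, procs, msg_size):
    """Toy cost model standing in for a real benchmark run (assumption, not measured data)."""
    if algorithm == "binomial_tree":
        return 5.0 * math.log2(procs) + 0.010 * msg_size * math.log2(procs)
    if algorithm == "ring":
        return 5.0 * (procs - 1) + 0.010 * msg_size
    raise ValueError(f"unknown algorithm: {algorithm}")


def run_benchmarks():
    """Step 1: collect (algorithm, procs, msg_size, runtime) records for every algorithm."""
    records = []
    for alg in ALGORITHMS:
        for procs in (4, 8, 16, 32, 64):
            for msg_size in (8, 64, 512, 4096, 32768, 262144):
                records.append((alg, procs, msg_size, toy_runtime(alg, procs, msg_size)))
    return records


def features(procs, msg_size):
    """Log-scale both inputs so distances are comparable across orders of magnitude."""
    return (math.log2(procs), math.log2(msg_size))


class KNNRegressor:
    """Plain k-nearest-neighbor regression: average the targets of the k closest points."""

    def __init__(self, k=3):
        self.k = k
        self.points = []

    def fit(self, X, y):
        self.points = list(zip(X, y))
        return self

    def predict(self, x):
        nearest = sorted(self.points, key=lambda p: math.dist(p[0], x))[: self.k]
        return sum(target for _, target in nearest) / len(nearest)


def fit_models(records, k=3):
    """Step 2: fit one regression model per algorithm."""
    grouped = defaultdict(lambda: ([], []))
    for alg, procs, msg_size, runtime in records:
        X, y = grouped[alg]
        X.append(features(procs, msg_size))
        y.append(runtime)
    return {alg: KNNRegressor(k).fit(X, y) for alg, (X, y) in grouped.items()}


def select_algorithm(models, procs, msg_size):
    """Step 3: predict each algorithm's runtime and return the fastest one."""
    x = features(procs, msg_size)
    predictions = {alg: model.predict(x) for alg, model in models.items()}
    return min(predictions, key=predictions.get), predictions


if __name__ == "__main__":
    models = fit_models(run_benchmarks())
    # Query two communication problems that are not in the benchmark grid.
    for procs, msg_size in ((48, 16), (6, 100000)):
        best, predictions = select_algorithm(models, procs, msg_size)
        print(procs, msg_size, best, predictions)
```

In a real setting, the records would come from benchmarking an MPI library on the target machine, and the selected algorithm would be passed to the library through whatever override mechanism it provides.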

Project Tags: evolve, open-mpi

File: icl-utk-1470-2020.pdf
