%0 Conference Paper %B IEEE Cluster %D 2019 %T Give MPI Threading a Fair Chance: A Study of Multithreaded MPI Designs %A Thananon Patinyasakdikul %A David Eberius %A George Bosilca %A Nathan Hjelm %K communication contention %K MPI %K thread %X The Message Passing Interface (MPI) has been one of the most prominent programming paradigms in highperformance computing (HPC) for the past decade. Lately, with changes in modern hardware leading to a drastic increase in the number of processor cores, developers of parallel applications are moving toward more integrated parallel programming paradigms, where MPI is used along with other, possibly node-level, programming paradigms, or MPI+X. MPI+threads emerged as one of the favorite choices in HPC community, according to a survey of the HPC community. However, threading support in MPI comes with many compromises to the overall performance delivered, and, therefore, its adoption is compromised. This paper studies in depth the MPI multi-threaded implementation design in one of the leading MPI implementations, Open MPI, and expose some of the shortcomings of the current design. We propose, implement, and evaluate a new design of the internal handling of communication progress which allows for a significant boost in multi-threading performance, increasing the viability of MPI in the MPI+X programming paradigm. %B IEEE Cluster %I IEEE %C Albuquerque, NM %8 2019-09 %G eng %0 Journal Article %J International Journal of Networking and Computing %D 2014 %T Performance and Reliability Trade-offs for the Double Checkpointing Algorithm %A Jack Dongarra %A Thomas Herault %A Yves Robert %K communication contention %K in-memory checkpoint %K performance %K resilience %K risk %X Fast checkpointing algorithms require distributed access to stable storage. This paper revisits the approach based upon double checkpointing, and compares the blocking algorithm of Zheng, Shi and Kalé [23], with the non-blocking algorithm of Ni, Meneses and Kalé [15] in terms of both performance and risk. We also extend the model proposedcan provide a better efficiency in [23, 15] to assess the impact of the overhead associated to non-blocking communications. In addition, we deal with arbitrary failure distributions (as opposed to uniform distributions in [23]). We then provide a new peer-to-peer checkpointing algorithm, called the triple checkpointing algorithm, that can work without additional memory, and achieves both higher efficiency and better risk handling than the double checkpointing algorithm. We provide performance and risk models for all the evaluated protocols, and compare them through comprehensive simulations. %B International Journal of Networking and Computing %V 4 %P 32-41 %8 2014 %G eng %& 32