%0 Journal Article
%J International Journal of Networking and Computing
%D 2022
%T Comparing Distributed Termination Detection Algorithms for Modern HPC Platforms
%A Bosilca, George
%A Bouteiller, Aurélien
%A Herault, Thomas
%A Le Fèvre, Valentin
%A Robert, Yves
%A Dongarra, Jack
%X This paper revisits distributed termination detection algorithms in the context of High-Performance Computing (HPC) applications. We introduce an efficient variant of the Credit Distribution Algorithm (CDA) and compare it to the original algorithm (HCDA) as well as to its two primary competitors: the Four Counters algorithm (4C) and the Efficient Delay-Optimal Distributed algorithm (EDOD). We analyze the behavior of each algorithm for some simplified task-based kernels and show the superiority of CDA in terms of the number of control messages. We then compare the implementation of these algorithms over a task-based runtime system, PaRSEC, and show the advantages and limitations of each approach in a real implementation.
%B International Journal of Networking and Computing
%V 12
%P 26-46
%8 2022-01
%G eng
%U https://www.jstage.jst.go.jp/article/ijnc/12/1/12_26/_article
%N 1
%! IJNC
%R 10.15803/ijnc.12.1_26
%0 Conference Proceedings
%B 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
%D 2021
%T Revisiting Credit Distribution Algorithms for Distributed Termination Detection
%A Bosilca, George
%A Bouteiller, Aurélien
%A Herault, Thomas
%A Le Fèvre, Valentin
%A Robert, Yves
%A Dongarra, Jack
%K control messages
%K credit distribution algorithms
%K task-based HPC application
%K termination detection
%X This paper revisits distributed termination detection algorithms in the context of High-Performance Computing (HPC) applications. We introduce an efficient variant of the Credit Distribution Algorithm (CDA) and compare it to the original algorithm (HCDA) as well as to its two primary competitors: the Four Counters algorithm (4C) and the Efficient Delay-Optimal Distributed algorithm (EDOD). We analyze the behavior of each algorithm for some simplified task-based kernels and show the superiority of CDA in terms of the number of control messages.
%B 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
%I IEEE
%P 611-620
%G eng
%R 10.1109/IPDPSW52791.2021.00095
%0 Journal Article
%J International Journal of High Performance Computing Applications
%D 2019
%T A Generic Approach to Scheduling and Checkpointing Workflows
%A Han, Li
%A Le Fèvre, Valentin
%A Canon, Louis-Claude
%A Robert, Yves
%A Vivien, Frédéric
%B International Journal of High Performance Computing Applications
%V 33
%P 1255-1274
%G eng