Publications

2014

Danalis, A., G. Bosilca, A. Bouteiller, T. Herault, and J. Dongarra, “PTG: An Abstraction for Unhindered Parallelism,” International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing (WOLFHPC), New Orleans, LA, IEEE Press, November 2014.

(480.05 KB)

2015

Bouteiller, A., T. Herault, G. Bosilca, P. Du, and J. Dongarra, “Algorithm-based Fault Tolerance for Dense Matrix Factorizations, Multiple Failures, and Accuracy,” ACM Transactions on Parallel Computing, vol. 1, issue 2, no. 10, pp. 10:1-10:28, January 2015.

(1.14 MB)

Bosilca, G., A. Bouteiller, T. Herault, Y. Robert, and J. Dongarra, “Composing Resilience Techniques: ABFT, Periodic, and Incremental Checkpointing,” International Journal of Networking and Computing, vol. 5, no. 1, pp. 2-15, January 2015.

(755.54 KB)

Cao, C., G. Bosilca, T. Herault, and J. Dongarra, “Design for a Soft Error Resilient Dynamic Task-based Runtime,” 29th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Hyderabad, India, IEEE, May 2015.

(2.31 MB)

Dongarra, J., T. Herault, and Y. Robert, “Fault Tolerance Techniques for High-performance Computing,” University of Tennessee Computer Science Technical Report (also LAWN 289), no. UT-EECS-15-734: University of Tennessee, May 2015.

Tang, C., A. Bouteiller, T. Herault, M G. Venkata, and G. Bosilca, “From MPI to OpenSHMEM: Porting LAMMPS,” OpenSHMEM and Related Technologies. Experiences, Implementations, and Technologies, Annapolis, MD, USA, Springer International Publishing, pp. 121–137, 2015.

Herault, T., A. Bouteiller, G. Bosilca, M. Gamell, K. Teranishi, M. Parashar, and J. Dongarra, “Practical Scalable Consensus for Pseudo-Synchronous Distributed Systems: Formal Proof,” Innovative Computing Laboratory Technical Report, no. ICL-UT-15-01, April 2015.

(570.97 KB)

Herault, T., A. Bouteiller, G. Bosilca, M. Gamell, K. Teranishi, M. Parashar, and J. Dongarra, “Practical Scalable Consensus for Pseudo-Synchronous Distributed Systems,” The International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), Austin, TX, ACM, November 2015.

(550.96 KB)

2016

Herrmann, J., G. Bosilca, T. Herault, L. Marchal, Y. Robert, and J. Dongarra, “Assessing the Cost of Redistribution followed by a Computational Kernel: Complexity and Performance Results,” Parallel Computing, vol. 52, pp. 22-41, February 2016.

(2.06 MB)

Bosilca, G., T. Herault, and J. Dongarra, “Context Identifier Allocation in Open MPI,” University of Tennessee Computer Science Technical Report, no. ICL-UT-16-01: Innovative Computing Laboratory, University of Tennessee, January 2016.

(490.89 KB)

Bosilca, G., A. Bouteiller, A. Guermouche, T. Herault, Y. Robert, P. Sens, and J. Dongarra, “Failure Detection and Propagation in HPC Systems,” Proceedings of the The International Conference for High Performance Computing, Networking, Storage and Analysis (SC'16), Salt Lake City, Utah, IEEE Press, pp. 27:1-27:11, November 2016.

2017

Seo, S., A. Amer, P. Balaji, C. Bordage, G. Bosilca, A. Brooks, P. Carns, A. Castello, D. Genet, T. Herault, et al., “Argobots: A Lightweight Low-Level Threading and Tasking Framework,” IEEE Transactions on Parallel and Distributed Systems, October 2017.

Hoque, R., T. Herault, G. Bosilca, and J. Dongarra, “Dynamic Task Discovery in PaRSEC- A data-flow task-based Runtime,” ScalA17, Denver, ACM, September 2017.

(1.15 MB)

2018

Bouteiller, A., G. Bosilca, T. Herault, and J. Dongarra, “Data Movement Interfaces to Support Dataflow Runtimes,” Innovative Computing Laboratory Technical Report, no. ICL-UT-18-03: University of Tennessee, May 2018.

(210.94 KB)

Bosilca, G., A. Bouteiller, T. Herault, V. Le Fèvre, Y. Robert, and J. Dongarra, “Distributed Termination Detection for HPC Task-Based Environments,” Innovative Computing Laboratory Technical Report, no. ICL-UT-18-14: University of Tennessee, June 2018.

Le Fèvre, V., G. Bosilca, A. Bouteiller, T. Herault, A. Hori, Y. Robert, and J. Dongarra, “Do moldable applications perform better on failure-prone HPC platforms?,” 11th Workshop on Resiliency in High Performance Computing in Clusters, Clouds, and Grids, Turin, Italy, Springer Verlag, August 2018.

(360.72 KB)

Bosilca, G., A. Bouteiller, A. Guermouche, T. Herault, Y. Robert, P. Sens, and J. Dongarra, “A Failure Detector for HPC Platforms,” The International Journal of High Performance Computing Applications, vol. 32, issue 1, pp. 139–158, January 2018.

(1.04 MB)

Herault, T., Y. Robert, A. Bouteiller, D. Arnold, K. Ferreira, G. Bosilca, and J. Dongarra, “Optimal Cooperative Checkpointing for Shared High-Performance Computing Platforms,” 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Best Paper Award, Vancouver, BC, Canada, IEEE, May 2018.

(899.3 KB)

Bosilca, G., D. Genet, R. Harrison, T. Herault, M. Mahdi Javanmard, C. Peng, and E. Valeev, “Tensor Contraction on Distributed Hybrid Architectures using a Task-Based Runtime System,” Innovative Computing Laboratory Technical Report, no. ICL-UT-18-13: University of Tennessee, December 2018.

(326.11 KB)

2019

Herault, T., Y. Robert, A. Bouteiller, D. Arnold, K. Ferreira, G. Bosilca, and J. Dongarra, “Checkpointing Strategies for Shared High-Performance Computing Platforms,” International Journal of Networking and Computing, vol. 9, no. 1, pp. 28–52, 2019.

(490.5 KB)

Le Fèvre, V., T. Herault, Y. Robert, A. Bouteiller, A. Hori, G. Bosilca, and J. Dongarra, “Comparing the Performance of Rigid, Moldable, and Grid-Shaped Applications on Failure-Prone HPC Platforms,” Parallel Computing, vol. 85, pp. 1–12, July 2019.

(865.18 KB)

Herault, T., Y. Robert, G. Bosilca, and J. Dongarra, “Generic Matrix Multiplication for Multi-GPU Accelerated Distributed-Memory Platforms over PaRSEC,” ScalA'19: 10th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, Denver, CO, IEEE, November 2019.

(260.69 KB)

Cao, Q., Y. Pei, T. Herault, K. Akbudak, A. Mikhalev, G. Bosilca, H. Ltaief, D. Keyes, and J. Dongarra, “Performance Analysis of Tile Low-Rank Cholesky Factorization Using PaRSEC Instrumentation Tools,” Workshop on Programming and Performance Visualization Tools (ProTools 19) at SC19, Denver, CO, ACM, November 2019.

(429.55 KB)

Benoit, A., T. Herault, V. Le Fèvre, and Y. Robert, “Replication is More Efficient Than You Think,” The IEEE/ACM Conference on High Performance Computing Networking, Storage and Analysis (SC19), Denver, CO, ACM Press, November 2019.

(975.69 KB)

Danalis, A., H. Jagode, T. Herault, P. Luszczek, and J. Dongarra, “Software-Defined Events through PAPI,” 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Rio de Janeiro, Brazil, IEEE, May 2019.

(446.41 KB)

Hori, A., Y. Tsujita, A. Shimada, K. Yoshinaga, N. Mitaro, G. Fukazawa, M. Sato, G. Bosilca, A. Bouteiller, and T. Herault, “System Software for Many-Core and Multi-Core Architectures,” Advanced Software Technologies for Post-Peta Scale Computing: The Japanese Post-Peta CREST Research Project, Singapore, Springer Singapore, pp. 59–75, 2019.

2020

Bosilca, G., T. Herault, and J. Dongarra, DTE: PaRSEC Enabled Libraries and Applications (Poster) , Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.

(979.27 KB)

Bosilca, G., T. Herault, and J. Dongarra, DTE: PaRSEC Systems and Interfaces (Poster) , Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.

(840.54 KB)

Hori, A., K. Yoshinaga, T. Herault, A. Bouteiller, G. Bosilca, and Y. Ishikawa, “Overhead of Using Spare Nodes,” The International Journal of High Performance Computing Applications, February 2020.

(2.15 MB)

Bosilca, G., R. Harrison, T. Herault, M. Mahdi Javanmard, P. Nookala, and E. Valeev, “The Template Task Graph (TTG) - An Emerging Practical Dataflow Programming Paradigm for Scientific Simulation at Extreme Scale,” 2020 IEEE/ACM 5th International Workshop on Extreme Scale Programming Models and Middleware (ESPM2): IEEE, November 2020.

(139.6 KB)

2021

Herault, T., Y. Robert, G. Bosilca, R. Harrison, C. Lewis, E. Valeev, and J. Dongarra, “Distributed-Memory Multi-GPU Block-Sparse Tensor Contraction for Electronic Structure,” 35th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2021), Portland, OR, IEEE, May 2021.

Bosilca, G., T. Herault, and J. Dongarra, DTE: PaRSEC Enabled Libraries and Applications : 2021 Exascale Computing Project Annual Meeting, April 2021.

(3.24 MB)

Bosilca, G., A. Bouteiller, T. Herault, V. Le Fèvre, Y. Robert, and J. Dongarra, “Revisiting Credit Distribution Algorithms for Distributed Termination Detection,” 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW): IEEE, pp. 611–620, 2021.

2022

Benoit, A., Y. Du, T. Herault, L. Marchal, G. Pallez, L. Perotin, Y. Robert, H. Sun, and F. Vivien, “Checkpointing à la Young/Daly: An Overview,” IC3-2022: Proceedings of the 2022 Fourteenth International Conference on Contemporary Computing, Noida, India, ACM Press, pp. 701-710, August 2022.

(639.77 KB)

2023

Benoit, A., T. Herault, L. Perotin, Y. Robert, and F. Vivien, “Revisiting I/O bandwidth-sharing strategies for HPC applications,” INRIA Research Report, no. RR-9502: INRIA, March 2023.

Barbut, Q., A. Benoit, T. Herault, Y. Robert, and F. Vivien, “When to checkpoint at the end of a fixed-length reservation?,” Fault Tolerance for HPC at eXtreme Scales (FTXS) Workshop, Denver, United States, August 2023.

Main menu

Pages