Thomas Herault

Position

I am a Research Assistant Professor at the Innovative Computing Laboratory at the University of Tennessee, Knoxville. I can be reached at +1 (865) 974-6321 or in person in my office (Claxton 308).



Former Positions

From Sept. 2010 to January 2018, I was a Research Scientist at the Innovative Computing Laboratory at the University of Tennessee, Knoxville.

From Sept. 2004 to August 2010, I was an Assistant Professor (Maître de Conférences) at the Université Paris-Sud XI, in the Laboratoire de Recherche en Informatique. I have been on secondment (détachement) to the University of Tennessee since then.

Diploma and Titles

1998 - BSc (Licence & Maîtrise) in Computer Science from the Université Paris-Sud XI (France)

1999 - MSc (Diplôme d'Études Approfondies) in Computer Science from the Université Paris-Sud XI (France)

2003 - PhD (Thèse) in Computer Science from the Université Paris-Sud XI (France)





Research Topics

Fault Tolerant HPC Systems

SMURFS - Toward Extreme Scale Fault-Tolerance: Exploration Methods, Comparative Studies and Decision Processes is an NSF SHF Collaborative Research project with Kurt Ferreira (Sandia National Laboratories) and Dorian Arnold (Emory University) in which we extend theoretical performance models for the large variety of fault tolerant protocols for High Performance Computing, evaluate these models for accuracy and predictability on forthcoming systems, and design, develop and evaluate simulation tools to complete and extend these performance models with validation mechanisms.

CAARES - Cross-layer Application-Aware Resilience at Extreme Scale is an NSF project whose goal is to depart from the current siloed resilience mechanisms, and propose cross-layer composition solutions that can fundamentally address these resilience challenges at extreme scales. This exploration will not be limited to software developed using a single parallel programming paradigm, but will be extended to encompass the more challenging case where multiple programming paradigms can be combined to achieve a common goal, to simulate a set of large scale scientific applications in use today. More specifically, this proposal will address the following research challenges: (1) development of a theoretical foundation for a deeper understanding of the challenges and opportunities arising from combining different resilience models and methodologies; (2) design of a flexible programming abstraction to allow different resilience models and mechanisms to be combined to cooperate and address resilience in a more holistic manner; and (3) development of basic, programming paradigm independent, constructs necessary to implement cross-layer and domain-specific approaches to support resilience and to understand related performance / quality trade-offs. The proposed approach will be validated by exposing these generic abstractions in two different programming paradigms (MPI and OpenSHMEM), by creating and developing specialized concepts for each of these paradigms. This will enable the assessment of the validity of the concepts and the corresponding overheads imposed by the different software layers, using a few software frameworks and applications.

ULFM - User Level Failure Mitigation is a set of MPI interface extensions enabling Message Passing programs to restore MPI communication capabilities affected by process failures. It supports rebuilding communicators, RMA windows and I/O Files. No particular recovery model is imposed or favored; instead, a set of versatile APIs is included that provides support for different recovery styles. The application directs the recovery, so it can pay for the cost of repairing only the necessary MPI objects. The ULFM specification is a crucial infrastructure to enable the deployment of advanced, production quality fault tolerant techniques; it is a versatile solution to improve the efficiency of novel and established fault tolerant techniques. Look at the flyer.

MPICH-V was a research effort with theoretical studies, experimental evaluations and pragmatic implementations aiming to provide an MPI implementation based on MPICH, featuring multiple fault tolerant protocols. MPICH-V provides an automatic fault tolerant MPI library (i.e. a totally unchanged application linked with the mpich-v library is a fault tolerant application).

Dataflow Execution Model for HPC

PaRSEC - Parallel Runtime Scheduling and Execution Controller - is a generic framework for architecture aware scheduling and management of micro-tasks on distributed many-core heterogeneous architectures. Applications we consider can be expressed as a Directed Acyclic Graph of tasks with labeled edges designating data dependencies. DAGs are represented in a compact problem-size independent format that can be queried on-demand to discover data dependencies in a totally distributed fashion. PaRSEC assigns computation threads to the cores, overlaps communications and computations and uses a dynamic, fully-distributed scheduler based on architectural features such as NUMA nodes and algorithmic features such as data reuse. The framework includes libraries, a runtime system, and development tools to help application developers tackle the difficult task of porting their applications to highly heterogeneous and diverse environments. PaRSEC is the underlying infrastructure for the DPLASMA distributed memory, tile algorithm based linear algebra package.
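To give an idea of what a compact, problem-size independent DAG representation can look like, here is a minimal Python sketch. It is not PaRSEC code and does not use the PaRSEC API: the wavefront dependency pattern and the names `predecessors`, `successors` and `run_wavefront` are assumptions chosen for illustration. The graph is never stored; the neighbours of a task are computed from its indices when they are needed.

```python
from collections import deque


def predecessors(i, j, n):
    """Tasks whose output task (i, j) consumes, computed from the indices alone."""
    preds = []
    if i > 0:
        preds.append((i - 1, j))
    if j > 0:
        preds.append((i, j - 1))
    return preds


def successors(i, j, n):
    """Tasks that consume the output of task (i, j) on an n x n grid."""
    succs = []
    if i + 1 < n:
        succs.append((i + 1, j))
    if j + 1 < n:
        succs.append((i, j + 1))
    return succs


def run_wavefront(n):
    """Run all n*n tasks, querying the symbolic DAG on demand.

    Only the frontier of partially satisfied tasks is kept in memory,
    never the full graph.
    """
    if n <= 0:
        return []
    pending = {}               # task -> number of inputs still missing
    ready = deque([(0, 0)])    # the only task without predecessors
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)     # stand-in for executing the task body
        for succ in successors(task[0], task[1], n):
            if succ not in pending:
                pending[succ] = len(predecessors(succ[0], succ[1], n))
            pending[succ] -= 1
            if pending[succ] == 0:
                del pending[succ]
                ready.append(succ)
    return order
```

Calling `run_wavefront(4)` returns the 16 tasks in an order that respects every dependency. In a distributed setting, each process could evaluate the same two functions locally to find which remote tasks it must send data to, which is the property the paragraph above refers to.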

TESSE - Task-based Environment for Scientific Simulation at Extreme Scale is a collaborative research project funded by the NSF. The goals of TESSE are to design and demonstrate via substantial scientific simulations within chemistry and other disciplines a prototype software framework that provides a groundbreaking response to the twin problems of portable performance and programmer productivity for advanced scientific applications on emerging massively-parallel, hybrid, many-core systems. TESSE will create a viable foundation for a new generation of science codes, one which enables even more rapid exploration of new physical models, provides greatly enhanced performance portability through directed acyclic graph (DAG) scheduling and auto-tuned kernels, and works towards full interoperability between major chemistry packages through compatible runtimes and data structures. TESSE will mature to become a standard, widely available, community-based and broadly-applicable parallel programming environment complementing and rivaling MPI/OpenMP. This is needed due to the widely appreciated shortfalls of existing mainstream programming models and the already huge successes of the existing DAG-based runtimes that are the foundation of the next generation of NSF and DOE supported (Sca)LAPACK high-performance linear algebra libraries.

Message Passing

Evolve - aims at enhancing the Open MPI software library, focusing on two aspects: (1) Extend Open MPI to support new features of the MPI specification. The two most significant areas within the context of this proposal are (a) extensions to better support hybrid programming models and (b) support for fault tolerance in MPI applications. (2) Enhance the Open MPI core to support new architectures and improve scalability. While Open MPI has demonstrated very good scalability in the past, there is significant work to be done to ensure similarly good performance on future architectures.

Formal Verification & Security

APMC - Approximate Probabilistic Model Checker implements techniques of approximate model checking, in a collaboration with Richard Lassaigne (Univ Paris 7) and Sylvain Peyronnet (X-labs). This tool is of interest for the model checking community since it was the only one to implement approximate model checking for probabilistic models. It uses a massive parallelism approach to enable the verification of very large systems, as was done for the verification of the CSMA/CD protocol.
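The general idea behind approximate probabilistic model checking can be sketched in a few lines of Python. This is an illustration and not APMC's code: the toy Markov chain, the function names, and the use of the standard Hoeffding bound to choose the number of samples are all assumptions. The probability that a bounded-time reachability property holds is estimated by simulating random paths; because the paths are independent, they can be generated in parallel.

```python
import math
import random


def sample_count(eps, delta):
    """Samples needed so that |estimate - p| < eps with probability >= 1 - delta.

    Uses the Hoeffding bound: 2 * exp(-2 * N * eps**2) <= delta.
    """
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps * eps))


def simulate_path(transitions, start, target, max_steps, rng):
    """Return True if a random path reaches `target` within `max_steps` transitions."""
    state = start
    for _ in range(max_steps):
        if state == target:
            return True
        r = rng.random()
        acc = 0.0
        # Pick the next state by inverse transform sampling; fall back to the
        # last successor if floating point rounding leaves r >= acc.
        for nxt, p in transitions[state]:
            acc += p
            if r < acc:
                break
        state = nxt
    return state == target


def estimate(transitions, start, target, max_steps, eps, delta, seed=0):
    """Monte Carlo estimate of the bounded reachability probability."""
    rng = random.Random(seed)
    n = sample_count(eps, delta)
    hits = sum(simulate_path(transitions, start, target, max_steps, rng)
               for _ in range(n))
    return hits / n


# Hypothetical two-state chain: from state 0, move to state 1 with probability 0.5.
TOY_CHAIN = {0: [(0, 0.5), (1, 0.5)], 1: [(1, 1.0)]}
```

For `TOY_CHAIN`, the exact probability of reaching state 1 within 3 transitions is 1 - 0.5^3 = 0.875, and `estimate(TOY_CHAIN, 0, 1, 3, eps=0.02, delta=0.01)` should return a value close to it. The number of samples depends only on `eps` and `delta`, not on the size of the state space, which is what makes the approach attractive for very large systems.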

SAFE-OS - I was Principal Investigator of the SAFE-OS project, representing University Paris-Sud in the French ANR challenge “Securite et Confidentialite des Systemes d’Information” (SEC&SI: security and confidentiality of information systems). This was a new kind of project for the ANR (Agence Nationale de la Recherche, the French NSF), that put multiple research teams in competition on the same project. The project proceeded in two alternating phases: the work proposed by each team during the development period was evaluated by the other teams during the evaluation period. Teams reported security breaches found in other teams’ operating systems, and the value of these security breaches was ranked by an independent jury. The goal of the project was to design an operating system with improved security features for an internet user. During this project, we used the strong expertise of the Parall team of LRI at University Paris-Sud on virtualization to propose a solution based on virtual machines. Using virtual machines, we transformed the computer into a distributed system, hence providing a better isolation of resources, and increasing the security and confidentiality of the data and of the processes.

Publications (as imported from DBLP on Friday, 08-Jun-18 14:23:34 EDT)

George Bosilca, Aurelien Bouteiller, Amina Guermouche, Thomas Hérault, Yves Robert, Pierre Sens, Jack J. Dongarra: "A failure detector for HPC platforms.", in IJHPCA:32[1]. pp 139-158, 2018

Sangmin Seo, Abdelhalim Amer, Pavan Balaji, Cyril Bordage, George Bosilca, Alex Brooks, Philip H. Carns, Adrián Castelló, Damien Genet, Thomas Hérault, Shintaro Iwasaki, Prateek Jindal, Laxmikant V. Kalé, Sriram Krishnamoorthy, Jonathan Lifflander, Huiwei Lu, Esteban Meneses, Marc Snir, Yanhua Sun, Kenjiro Taura, Peter H. Beckman: "Argobots: A Lightweight Low-Level Threading and Tasking Framework.", in IEEE Trans. Parallel Distrib. Syst.:29[3]. pp 512-526, 2018

Reazul Hoque, Thomas Hérault, George Bosilca, Jack J. Dongarra: "Dynamic task discovery in PaRSEC: a data-flow task-based runtime.", in ScalA@SC. pp 6:1-6:8, 2017

Julien Herrmann, George Bosilca, Thomas Hérault, Loris Marchal, Yves Robert, Jack J. Dongarra: "Assessing the cost of redistribution followed by a computational kernel: Complexity and performance results.", in Parallel Computing:52. pp 22-41, 2016

George Bosilca, Aurelien Bouteiller, Amina Guermouche, Thomas Hérault, Yves Robert, Pierre Sens, Jack J. Dongarra: "Failure detection and propagation in HPC systems.", in SC. pp 312-322, 2016

George Bosilca, Aurelien Bouteiller, Thomas Hérault, Yves Robert, Jack J. Dongarra: "Composing resilience techniques: ABFT, periodic and incremental checkpointing.", in IJNC:5[1]. pp 2-25, 2015

Aurelien Bouteiller, Thomas Hérault, George Bosilca, Peng Du, Jack J. Dongarra: "Algorithm-Based Fault Tolerance for Dense Matrix Factorizations, Multiple Failures and Accuracy.", in TOPC:1[2]. pp 10:1-10:28, 2015

Chongxiao Cao, Thomas Hérault, George Bosilca, Jack J. Dongarra: "Design for a Soft Error Resilient Dynamic Task-Based Runtime.", in IPDPS. pp 765-774, 2015

Chunyan Tang, Aurelien Bouteiller, Thomas Hérault, Manjunath Gorentla Venkata, George Bosilca: "From MPI to OpenSHMEM: Porting LAMMPS.", in OpenSHMEM. pp 121-137, 2015

Atsushi Hori, Kazumi Yoshinaga, Thomas Hérault, Aurelien Bouteiller, George Bosilca, Yutaka Ishikawa: "Sliding Substitution of Failed Nodes.", in EuroMPI. pp 14:1-14:10, 2015

Thomas Hérault, Aurelien Bouteiller, George Bosilca, Marc Gamell, Keita Teranishi, Manish Parashar, Jack J. Dongarra: "Practical scalable consensus for pseudo-synchronous distributed systems.", in SC. pp 31:1-31:12, 2015

George Bosilca, Aurelien Bouteiller, Elisabeth Brunet, Franck Cappello, Jack J. Dongarra, Amina Guermouche, Thomas Hérault, Yves Robert, Frédéric Vivien, Dounia Zaidouni: "Unified model for assessing checkpointing protocols at extreme-scale.", in Concurrency and Computation: Practice and Experience:26[17]. pp 2772-2791, 2014

Jack J. Dongarra, Thomas Hérault, Yves Robert: "Performance and reliability trade-offs for the double checkpointing algorithm.", in IJNC:4[1]. pp 23-41, 2014

Heike McCraw, Anthony Danalis, Thomas Hérault, George Bosilca, Jack J. Dongarra, Karol Kowalski, Theresa L. Windus: "Utilizing dataflow-based execution for coupled cluster methods.", in CLUSTER. pp 296-297, 2014

George Bosilca, Aurelien Bouteiller, Thomas Hérault, Yves Robert, Jack J. Dongarra: "Assessing the Impact of ABFT and Checkpoint Composite Strategies.", in IPDPS Workshops. pp 679-688, 2014

Thomas Hérault, Julien Herrmann, Loris Marchal, Yves Robert: "Determining the Optimal Redistribution for a Given Data Partition.", in ISPDC. pp 95-102, 2014

Aurelien Bouteiller, Thomas Hérault, George Bosilca: "A Multithreaded Communication Substrate for OpenSHMEM.", in PGAS. pp 16:1-16:2, 2014

Anthony Danalis, George Bosilca, Aurelien Bouteiller, Thomas Hérault, Jack J. Dongarra: "PTG: an abstraction for unhindered parallelism.", in WOLFHPC@SC. pp 21-30, 2014

Wesley Bland, Aurelien Bouteiller, Thomas Hérault, Joshua Hursey, George Bosilca, Jack J. Dongarra: "An evaluation of User-Level Failure Mitigation support in MPI.", in Computing:95[12]. pp 1171-1184, 2013

Aurelien Bouteiller, Thomas Hérault, George Bosilca, Jack J. Dongarra: "Correlated set coordination in fault tolerant message logging protocols for many-core clusters.", in Concurrency and Computation: Practice and Experience:25[4]. pp 572-585, 2013

Wesley Bland, Peng Du, Aurelien Bouteiller, Thomas Hérault, George Bosilca, Jack J. Dongarra: "Extending the scope of the Checkpoint-on-Failure protocol for forward recovery in standard MPI.", in Concurrency and Computation: Practice and Experience:25[17]. pp 2381-2393, 2013

George Bosilca, Aurelien Bouteiller, Anthony Danalis, Mathieu Faverge, Thomas Hérault, Jack J. Dongarra: "PaRSEC: Exploiting Heterogeneity to Enhance Scalability.", in Computing in Science and Engineering:15[6]. pp 36-45, 2013

Wesley Bland, Aurelien Bouteiller, Thomas Hérault, George Bosilca, Jack J. Dongarra: "Post-failure recovery of MPI communication capability: Design and rationale.", in IJHPCA:27[3]. pp 244-254, 2013

Jack J. Dongarra, Mathieu Faverge, Thomas Hérault, Mathias Jacquelin, Julien Langou, Yves Robert: "Hierarchical QR factorization algorithms for multi-core clusters.", in Parallel Computing:39[4-5]. pp 212-232, 2013

Aurelien Bouteiller, Franck Cappello, Jack J. Dongarra, Amina Guermouche, Thomas Hérault, Yves Robert: "Multi-criteria Checkpointing Strategies: Response-Time versus Resource Utilization.", in Euro-Par. pp 420-431, 2013

Jack J. Dongarra, Thomas Hérault, Yves Robert: "Revisiting the Double Checkpointing Algorithm.", in IPDPS Workshops. pp 706-715, 2013

Guillaume Aupy, Anne Benoit, Thomas Hérault, Yves Robert, Frédéric Vivien, Dounia Zaidouni: "On the Combination of Silent Error Detection and Checkpointing.", in PRDC. pp 11-20, 2013

Guillaume Aupy, Anne Benoit, Thomas Hérault, Yves Robert, Jack J. Dongarra: "Optimal Checkpointing Period: Time vs. Energy.", in PMBS@SC. pp 203-214, 2013

Guillaume Aupy, Anne Benoit, Thomas Hérault, Yves Robert, Jack J. Dongarra: "Optimal Checkpointing Period: Time vs. Energy.", in CoRR:abs/1310.8456, 2013

Guillaume Aupy, Anne Benoit, Thomas Hérault, Yves Robert, Frédéric Vivien, Dounia Zaidouni: "On the Combination of Silent Error Detection and Checkpointing.", in CoRR:abs/1310.8486, 2013

George Bosilca, Aurelien Bouteiller, Anthony Danalis, Thomas Hérault, Pierre Lemarinier, Jack J. Dongarra: "DAGuE: A generic distributed DAG engine for High Performance Computing.", in Parallel Computing:38[1-2]. pp 37-51, 2012

George Bosilca, Aurelien Bouteiller, Anthony Danalis, Thomas Hérault, Jack J. Dongarra: "From Serial Loops to Parallel Execution on Distributed Systems.", in Euro-Par. pp 246-257, 2012

Wesley Bland, Peng Du, Aurelien Bouteiller, Thomas Hérault, George Bosilca, Jack J. Dongarra: "A Checkpoint-on-Failure Protocol for Algorithm-Based Recovery in Standard MPI.", in Euro-Par. pp 477-488, 2012

George Bosilca, Aurelien Bouteiller, Anthony Danalis, Thomas Hérault, Jakub Kurzak, Piotr Luszczek, Stanimire Tomov, Jack J. Dongarra: "Scalable Dense Linear Algebra on Heterogeneous Hardware.", in High Performance Computing Workshop (2). pp 65-103, 2012

Jack J. Dongarra, Mathieu Faverge, Thomas Hérault, Julien Langou, Yves Robert: "Hierarchical QR Factorization Algorithms for Multi-core Cluster Systems.", in IPDPS. pp 607-618, 2012

Peng Du, Aurelien Bouteiller, George Bosilca, Thomas Hérault, Jack J. Dongarra: "Algorithm-based fault tolerance for dense matrix factorizations.", in PPOPP. pp 225-234, 2012

Wesley Bland, Aurelien Bouteiller, Thomas Hérault, Joshua Hursey, George Bosilca, Jack J. Dongarra: "An Evaluation of User-Level Failure Mitigation Support in MPI.", in EuroMPI. pp 193-203, 2012

Emmanuel Agullo, Camille Coti, Thomas Hérault, Julien Langou, Sylvain Peyronnet, Ala Rezmerita, Franck Cappello, Jack J. Dongarra: "QCG-OMPI: MPI applications on grids.", in Future Generation Comp. Syst.:27[4]. pp 357-369, 2011

George Bosilca, Thomas Hérault, Ala Rezmerita, Jack J. Dongarra: "On Scalability for MPI Runtime Systems.", in CLUSTER. pp 187-195, 2011

Teng Ma, Thomas Hérault, George Bosilca, Jack J. Dongarra: "Process Distance-Aware Adaptive MPI Collective Communications.", in CLUSTER. pp 196-204, 2011

George Bosilca, Aurelien Bouteiller, Thomas Hérault, Pierre Lemarinier, Narapat Ohm Saengpatsa, Stanimire Tomov, Jack J. Dongarra: "Performance Portability of a GPU Enabled Factorization with the DAGuE Framework.", in CLUSTER. pp 395-402, 2011

Aurelien Bouteiller, Thomas Hérault, George Bosilca, Jack J. Dongarra: "Correlated Set Coordination in Fault Tolerant Message Logging Protocols.", in Euro-Par (2). pp 51-64, 2011

George Bosilca, Aurelien Bouteiller, Anthony Danalis, Thomas Hérault, Pierre Lemarinier, Jack J. Dongarra: "DAGuE: A Generic Distributed DAG Engine for High Performance Computing.", in IPDPS Workshops. pp 1151-1158, 2011

George Bosilca, Aurelien Bouteiller, Anthony Danalis, Mathieu Faverge, Azzam Haidar, Thomas Hérault, Jakub Kurzak, Julien Langou, Pierre Lemarinier, Hatem Ltaief, Piotr Luszczek, Asim YarKhan, Jack J. Dongarra: "Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA.", in IPDPS Workshops. pp 1432-1441, 2011

George Bosilca, Thomas Hérault, Pierre Lemarinier, Ala Rezmerita, Jack J. Dongarra: "Scalable Runtime for MPI: Efficiently Building the Communication Infrastructure.", in EuroMPI. pp 342-344, 2011

Jack J. Dongarra, Mathieu Faverge, Thomas Hérault, Julien Langou, Yves Robert: "Hierarchical QR factorization algorithms for multi-core cluster systems", in CoRR:abs/1110.1553, 2011

Fatiha Bouabache, Thomas Hérault, Sylvain Peyronnet, Franck Cappello: "Planning Large Data Transfers in Institutional Grids.", in CCGRID. pp 547-552, 2010

Amine Bourki, Guillaume Chaslot, Matthieu Coulm, Vincent Danjean, Hassen Doghmen, Jean-Baptiste Hoock, Thomas Hérault, Arpad Rimmel, Fabien Teytaud, Olivier Teytaud, Paul Vayssière, Ziqin Yut: "Scalability and Parallelization of Monte-Carlo Tree Search.", in Computers and Games. pp 48-58, 2010

François Lesueur, Ala Rezmerita, Thomas Hérault, Sylvain Peyronnet, Sébastien Tixeuil: "SAFE-OS: A secure and usable desktop operating system.", in CRiSIS. pp 1-7, 2010

Emmanuel Agullo, Camille Coti, Jack J. Dongarra, Thomas Hérault, Julien Langou: "QR factorization of tall and skinny matrices in a grid computing environment.", in IPDPS. pp 1-11, 2010

Gilles Fedak, Jean-Patrick Gelas, Thomas Hérault, Victor Iniesta, Derrick Kondo, Laurent Lefèvre, Paul Malecot, Lucas Nussbaum, Ala Rezmerita, Olivier Richard: "DSL-Lab: A Low-Power Lightweight Platform to Experiment on Domestic Broadband Internet.", in ISPDC. pp 141-148, 2010

Aline Carneiro Viana, Thomas Hérault, Thomas Largillier, Sylvain Peyronnet, Fatiha Zaïdi: "Supple: a flexible probabilistic data dissemination protocol for wireless sensor networks.", in MSWiM. pp 385-392, 2010

George Bosilca, Aurelien Bouteiller, Thomas Hérault, Pierre Lemarinier, Jack J. Dongarra: "Dodging the Cost of Unavoidable Memory Copies in Message Logging Protocols.", in EuroMPI. pp 189-197, 2010

Fatiha Bouabache, Thomas Hérault, Gilles Fedak, Franck Cappello: "Hierarchical Replication Techniques to Ensure Checkpoint Storage Reliability in Grid Environment.", in Journal of Interconnection Networks:10[4]. pp 345-364, 2009

Franck Cappello, Thomas Hérault, Jack J. Dongarra: "Foreword.", in Parallel Computing:35[12]. pp 571, 2009

Thomas Hérault, Thomas Largillier, Sylvain Peyronnet, Benjamin Quétier, Franck Cappello, Mathieu Jan: "High accuracy failure injection in parallel and distributed systems using virtualization.", in Conf. Computing Frontiers. pp 193-196, 2009

Pavel Bar, Camille Coti, Derek Groen, Thomas Hérault, Valentin Kravtsov, Assaf Schuster, Martin T. Swain: "Running Parallel Applications with Topology-Aware Grid Middleware.", in eScience. pp 292-299, 2009

Camille Coti, Thomas Hérault, Franck Cappello: "MPI Applications on Grids: A Topology Aware Approach.", in Euro-Par. pp 466-477, 2009

George Bosilca, Camille Coti, Thomas Hérault, Pierre Lemarinier, Jack J. Dongarra: "Constructing Resiliant Communication Infrastructure for Runtime Environments.", in PARCO. pp 441-451, 2009

Emmanuel Agullo, Camille Coti, Jack J. Dongarra, Thomas Hérault, Julien Langou: "QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment", in CoRR:abs/0912.2572, 2009

Darius Buntinas, Camille Coti, Thomas Hérault, Pierre Lemarinier, Laurence Pilard, Ala Rezmerita, Eric Rodriguez, Franck Cappello: "Blocking vs. non-blocking coordinated checkpointing for large-scale fault tolerant MPI Protocols.", in Future Generation Comp. Syst.:24[1]. pp 73-84, 2008

Fatiha Bouabache, Thomas Hérault, Gilles Fedak, Franck Cappello: "Hierarchical replication techniques to ensure checkpoint storage reliability in grid environment.", in AICCSA. pp 939-940, 2008

Camille Coti, Thomas Hérault, Sylvain Peyronnet, Ala Rezmerita, Franck Cappello: "Grid Services for MPI.", in CCGRID. pp 417-424, 2008

Fatiha Bouabache, Thomas Hérault, Gilles Fedak, Franck Cappello: "Hierarchical Replication Techniques to Ensure Checkpoint Storage Reliability in Grid Environment.", in CCGRID. pp 475-483, 2008

Thomas Hérault, Mathieu Jan, Thomas Largillier, Sylvain Peyronnet, Benjamin Quétier, Franck Cappello: "Emulation platform for high accuracy failure injection in grids.", in High Performance Computing Workshop. pp 127-140, 2008

Julien Clément, Thomas Hérault, Stéphane Messika, Olivier Peres: "On the Complexity of a Self-Stabilizing Spanning Tree Algorithm for Large Scale Systems.", in PRDC. pp 48-55, 2008

Alexandre Borghi, Thomas Hérault, Richard Lassaigne, Sylvain Peyronnet: "Cell Assisted APMC.", in QEST. pp 75-76, 2008

Michaël Cadilhac, Thomas Hérault, Richard Lassaigne, Sylvain Peyronnet, Sébastien Tixeuil: "Evaluating Complex MAC Protocols for Sensor Networks with APMC.", in Electr. Notes Theor. Comput. Sci.:185. pp 33-46, 2007

Fatiha Bouabache, Thomas Hérault, Gilles Fedak, Franck Cappello: "A Distributed and Replicated Service for Checkpoint Storage.", in CoreGRID Workshop - Making Grids Work. pp 295-306, 2007

Thomas Hérault, Pierre Lemarinier, Olivier Peres, Laurence Pilard, Joffroy Beauquier: "A Model for Large Scale Self-Stabilization.", in IPDPS. pp 1-10, 2007

Benjamin Quétier, Thomas Hérault, Vincent Néri, Franck Cappello: "Virtual Parallel Machines Through Virtualization: Impact on MPI Executions.", in PVM/MPI. pp 381-383, 2007

Camille Coti, Ala Rezmerita, Thomas Hérault, Franck Cappello: "Grid Services for MPI.", in PVM/MPI. pp 393-394, 2007

: "Recent Advances in Parallel Virtual Machine and Message Passing Interface, 14th European PVM/MPI User's Group Meeting, Paris, France, September 30 - October 3, 2007, Proceedings", in Lecture Notes in Computer Science:4757,

Guillaume Guirado, Thomas Hérault, Richard Lassaigne, Sylvain Peyronnet: "Distribution, Approximation and Probabilistic Model Checking.", in Electr. Notes Theor. Comput. Sci.:135[2]. pp 19-30, 2006

Aurelien Bouteiller, Hinde-Lilia Bouziane, Thomas Hérault, Pierre Lemarinier, Franck Cappello: "Hybrid Preemptive Scheduling of Message Passing Interface Applications on Grids.", in IJHPCA:20[1]. pp 77-90, 2006

Aurelien Bouteiller, Thomas Hérault, Géraud Krawezik, Pierre Lemarinier, Franck Cappello: "MPICH-V Project: A Multiprotocol Automatic Fault-Tolerant MPI.", in IJHPCA:20[3]. pp 319-333, 2006

William Hoarau, Pierre Lemarinier, Thomas Hérault, Eric Rodriguez, Sébastien Tixeuil, Franck Cappello: "FAIL-MPI: How Fault-Tolerant Is Fault-Tolerant MPI?", in CLUSTER, 2006

Thomas Hérault, Richard Lassaigne, Sylvain Peyronnet: "APMC 3.0: Approximate Verification of Discrete and Continuous Time Markov Chains.", in QEST. pp 129-130, 2006

Akim Demaille, Thomas Hérault, Sylvain Peyronnet: "Probabilistic verification of sensor networks.", in RIVF. pp 45-54, 2006

Camille Coti, Thomas Hérault, Pierre Lemarinier, Laurence Pilard, Ala Rezmerita, Eric Rodriguez, Franck Cappello: "MPI tools and performance studies - Blocking vs. non-blocking coordinated checkpointing for large-scale fault tolerant MPI.", in SC. pp 127, 2006

Thomas Hérault, Pierre Lemarinier, Olivier Peres, Laurence Pilard, Joffroy Beauquier: "Brief Announcement: Self-stabilizing Spanning Tree Algorithm for Large Scale Systems.", in SSS. pp 574-575, 2006

Marie Duflot, Laurent Fribourg, Thomas Hérault, Richard Lassaigne, Frédéric Magniette, Stéphane Messika, Sylvain Peyronnet, Claudine Picaronny: "Probabilistic Model Checking of the CSMA/CD Protocol Using PRISM and APMC.", in Electr. Notes Theor. Comput. Sci.:128[6]. pp 195-214, 2005

Franck Cappello, Samir Djilali, Gilles Fedak, Thomas Hérault, Frédéric Magniette, Vincent Néri, Oleg Lodygensky: "Computing on large-scale distributed systems: XtremWeb architecture, programming models, security, tests and convergence with grid.", in Future Generation Comp. Syst.:21[3]. pp 417-437, 2005

Aurelien Bouteiller, Boris Collin, Thomas Hérault, Pierre Lemarinier, Franck Cappello: "Impact of Event Logger on Causal Message Logging Protocols for Fault Tolerant MPI.", in IPDPS, 2005

Pierre Lemarinier, Aurelien Bouteiller, Thomas Hérault, Géraud Krawezik, Franck Cappello: "Improved message logging versus improved coordinated checkpointing for fault tolerant MPI.", in CLUSTER. pp 115-124, 2004

Aurelien Bouteiller, Hinde-Lilia Bouziane, Thomas Hérault, Pierre Lemarinier, Franck Cappello: "Hybrid Preemptive Scheduling of MPI Applications on the Grids.", in GRID. pp 130-137, 2004

Samir Djilali, Thomas Hérault, Oleg Lodygensky, Tangui Morlier, Gilles Fedak, Franck Cappello: "RPC-V: Toward Fault-Tolerant RPC for Internet Connected Desktop Grids with Volatile Nodes.", in SC. pp 39, 2004

Thomas Hérault, Richard Lassaigne, Frédéric Magniette, Sylvain Peyronnet: "Approximate Probabilistic Model Checking.", in VMCAI. pp 73-84, 2004

Aurelien Bouteiller, Franck Cappello, Thomas Hérault, Géraud Krawezik, Pierre Lemarinier, Frédéric Magniette: "MPICH-V2: a Fault Tolerant MPI for Volatile Nodes based on Pessimistic Sender Based Message Logging.", in SC. pp 25, 2003

George Bosilca, Aurelien Bouteiller, Franck Cappello, Samir Djilali, Gilles Fedak, Cécile Germain, Thomas Hérault, Pierre Lemarinier, Oleg Lodygensky, Frédéric Magniette, Vincent Néri, Anton Selikhov: "MPICH-V: toward a scalable fault tolerant MPI for volatile nodes.", in SC. pp 31:1-31:18, 2002

Joffroy Beauquier, Thomas Hérault: "Fault-Local Stabilization: The Shortest Path Tree.", in SRDS. pp 62-69, 2002

Joffroy Beauquier, Thomas Hérault, Elad Schiller: "Easy Stabilization with an Agent.", in WSS. pp 35-50, 2001

Teaching & Tutorials

Since 2012, I have given a tutorial on Fault Tolerance in High Performance Computing at the IEEE/ACM Supercomputing Conference, with Yves Robert, George Bosilca, and Aurelien Bouteiller.

With George Bosilca, I gave a 24-hour class on Fault Tolerance in High Performance Computing during a research school at the ENS Lyon, in 2012.

With Yves Robert, I presented a tutorial on Fault Tolerance in High Performance Computing at PPoPP in 2015, and at the International Conference on Supercomputing in 2013.

When I was an assistant professor at the University of Paris-Sud, I taught Operating Systems, Formal Verification, Software Design, Databases, Networks, Architecture, and High Performance Computing. I was pedagogic director for a class at the University Paris-Sud engineering school IFIPS (now Polytech Paris-Sud).

Program Committees

Supercomputing 2018, member of the Workshops and Posters committees

Supercomputing 2017, member of the PC for the Algorithm track

ICPP’17, PC Vice-Chair, track “Systems”

Supercomputing 2016, member of the PC for the Algorithm track

ICPADS 2015, PC Vice-Chair, track “multicore computing”

HiPC 2014, Program Chair

HiPC 2013, PC Vice-Chair, track “Software”

IPDPS 2013, member of the PC

Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms (Heteropar) 2011: member of the PC

Facing the Multicore-Challenge II (conference for young scientists) 2011 – 2012: member of the PC

15th International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS) 2010: member of the PC

IEEE International Conference on Cluster Computing (Cluster) 2010: member of the PC

IEEE/ACM International Symposium on Cluster, Cloud and Grid (CCGRID) 2008 – 2010: member of the PC

International Symposium on Parallel and Distributed Computing (ISPDC) 2008 – 2010: member of the PC

High Performance Computing for Computational Science (VECPAR) 2008, 2010, 2012: Member of the PC

European PVM/MPI Users Group Meeting (EuroPVM/MPI), now EuroMPI, 2008 – 2018: Member of the PC

European PVM/MPI Users Group Meeting (EuroPVM/MPI) 2007: PC Co-Chair / local organizer.