Position
I am a Research Assistant Professor at the Innovative Computing Laboratory at the University of Tennessee, Knoxville. I can be reached at +1 (865) 974-6321 or in person in my office (Claxton 308).
Former Positions
From Sept. 2010 to January 2018, I worked as a Research Scientist at the Innovative Computing Laboratory at the University of Tennessee, Knoxville.
From Sept. 2004 to August 2010, I was an Assistant Professor (Maitre de Conferences) at the Université Paris-Sud XI, in the Laboratoire de Recherche en Informatique. I have been on detachment to the University of Tennessee since that date.
Diploma and Titles
1998 - BSc (Licence & Maitrise) in Computer Science from the Universite Paris-Sud XI (France)
1999 - MSc (Diplome d'Etudes Approfondies) in Computer Science from the Universite Paris-Sud XI (France)
2003 - PhD in Computer Science (These) from the Universite Paris-Sud XI (France)
Research Topics
Fault Tolerant HPC Systems
SMURFS - Toward Extreme Scale Fault-Tolerance: Exploration Methods, Comparative Studies and Decision Processes is an NSF SHF Collaborative Research project with Kurt Ferreira (Sandia National Laboratories) and Dorian Arnold (Emory University) in which we extend theoretical performance models for the large variety of fault tolerant protocols for High Performance Computing, evaluate these models for accuracy and predictability in forthcoming systems, and design, develop and evaluate simulation tools to complete and extend these performance models with validation mechanisms.
CAARES - Cross-layer Application-Aware Resilience at Extreme Scale is an NSF project whose goal is to depart from the current siloed resilience mechanisms, and propose cross-layer composition solutions that can fundamentally address these resilience challenges at extreme scales. This exploration will not be limited to software developed using a single parallel programming paradigm, but will be extended to encompass the more challenging case where multiple programming paradigms can be combined to achieve a common goal, to simulate a set of large scale scientific applications in use today. More specifically, this project will address the following research challenges: (1) development of a theoretical foundation for a deeper understanding of the challenges and opportunities arising from combining different resilience models and methodologies; (2) design of a flexible programming abstraction to allow different resilience models and mechanisms to be combined to cooperate and address resilience in a more holistic manner; and (3) development of basic, programming paradigm independent, constructs necessary to implement cross-layer and domain-specific approaches to support resilience and to understand related performance / quality trade-offs. The proposed approach will be validated by exposing these generic abstractions in two different programming paradigms (MPI and OpenSHMEM), by creating and developing specialized concepts for each of these paradigms. This will enable the assessment of the validity of the concepts and the corresponding overheads imposed by the different software layers, using a few software frameworks and applications.
ULFM - User Level Failure Mitigation is a set of MPI interface extensions enabling Message Passing programs to restore MPI communication capabilities affected by process failures. It supports rebuilding communicators, RMA windows and I/O files. No particular recovery model is imposed or favored; instead, a set of versatile APIs is included that provides support for different recovery styles. The application directs the recovery, so it can pay for the cost of repairing only the necessary MPI objects. The ULFM specification is a crucial infrastructure to enable the deployment of advanced, production quality fault tolerant techniques; it is a versatile solution to improve the efficiency of novel and established fault tolerant techniques. Look at the flyer.
MPICH-V was a research effort with theoretical studies, experimental evaluations and pragmatic implementations aiming to provide an MPI implementation based on MPICH, featuring multiple fault tolerant protocols. MPICH-V provides an automatic fault tolerant MPI library (i.e. a totally unchanged application linked with the mpich-v library is a fault tolerant application).
Dataflow Execution Model for HPC
PaRSEC - Parallel Runtime Scheduling and Execution Controller - is a generic framework for architecture aware scheduling and management of micro-tasks on distributed many-core heterogeneous architectures. Applications we consider can be expressed as a Directed Acyclic Graph of tasks with labeled edges designating data dependencies. DAGs are represented in a compact problem-size independent format that can be queried on-demand to discover data dependencies in a totally distributed fashion. PaRSEC assigns computation threads to the cores, overlaps communications and computations and uses a dynamic, fully-distributed scheduler based on architectural features such as NUMA nodes and algorithmic features such as data reuse. The framework includes libraries, a runtime system, and development tools to help application developers tackle the difficult task of porting their applications to highly heterogeneous and diverse environments. PaRSEC is the underlying infrastructure for the DPLASMA distributed memory, tile algorithm based linear algebra package.
TESSE - Task-based Environment for Scientific Simulation at Extreme Scale is a collaborative research project funded by the NSF. The goals of TESSE are to design and demonstrate via substantial scientific simulations within chemistry and other disciplines a prototype software framework that provides a groundbreaking response to the twin problems of portable performance and programmer productivity for advanced scientific applications on emerging massively-parallel, hybrid, many-core systems. TESSE will create a viable foundation for a new generation of science codes, one which enables even more rapid exploration of new physical models, provides greatly enhanced performance portability through directed acyclic graph (DAG) scheduling and auto-tuned kernels, and works towards full interoperability between major chemistry packages through compatible runtimes and data structures. TESSE will mature to become a standard, widely available, community-based and broadly-applicable parallel programming environment complementing and rivaling MPI/OpenMP. This is needed due to the widely appreciated shortfalls of existing mainstream programming models and the already huge successes of the existing DAG-based runtimes that are the foundation of the next generation of NSF and DOE supported (Sca)LAPACK high-performance linear algebra libraries.
Message Passing
Evolve - aims at enhancing the Open MPI software library, focusing on two aspects: (1) Extend Open MPI to support new features of the MPI specification. The two most significant areas within the context of this proposal are (a) extensions to better support hybrid programming models and (b) support for fault tolerance in MPI applications. (2) Enhance the Open MPI core to support new architectures and improve scalability. While Open MPI has demonstrated very good scalability in the past, there is significant work to be done to ensure similarly good performance on future architectures.
Formal Verification & Security
APMC - Approximate Probabilistic Model Checker implements techniques of approximate model checking, in a collaboration with Richard Lassaigne (Univ Paris 7) and Sylvain Peyronnet (X-labs). This tool is of interest for the model checking community since it was the only one to implement approximate model checking for probabilistic models. It uses a massively parallel approach to enable the verification of very large systems, as was done for the verification of the CSMA/CD protocol.
SAFE-OS - I was Principal Investigator of the SAFE-OS project, representing University Paris-Sud in the French ANR challenge “Securite et Confidentialite des Systemes d’Information” (SEC&SI: security and confidentiality of information systems). This was a new kind of project for the ANR (Agence Nationale de la Recherche, the French NSF), that put multiple research teams in competition on the same project. The project proceeded in two alternating phases: the work proposed by each team during the development period was evaluated by the other teams during the evaluation period. Teams reported security breaches found in other teams' operating systems, and the value of these security breaches was ranked by an independent jury. The goal of the project was to design an operating system with improved security features for an internet user. During this project, we used the strong expertise of the Parall team of LRI at University Paris-Sud on virtualization to propose a solution based on virtual machines. Using virtual machines, we transformed the computer into a distributed system, hence providing a better isolation of resources, and increasing the security and confidentiality of the data and of the processes.
Publications (as imported from DBLP on Friday, 08-Jun-18 14:23:34 EDT)
George Bosilca, Aurelien Bouteiller, Amina Guermouche, Thomas Hérault, Yves Robert, Pierre Sens, Jack J. Dongarra: "A failure detector for HPC platforms.", in IJHPCA:32[1]. pp 139-158, 2018 |
Sangmin Seo, Abdelhalim Amer, Pavan Balaji, Cyril Bordage, George Bosilca, Alex Brooks, Philip H. Carns, Adrián Castelló, Damien Genet, Thomas Hérault, Shintaro Iwasaki, Prateek Jindal, Laxmikant V. Kalé, Sriram Krishnamoorthy, Jonathan Lifflander, Huiwei Lu, Esteban Meneses, Marc Snir, Yanhua Sun, Kenjiro Taura, Peter H. Beckman: "Argobots: A Lightweight Low-Level Threading and Tasking Framework.", in IEEE Trans. Parallel Distrib. Syst.:29[3]. pp 512-526, 2018 |
Reazul Hoque, Thomas Hérault, George Bosilca, Jack J. Dongarra: "Dynamic task discovery in PaRSEC: a data-flow task-based runtime.", in ScalA@SC. pp 6:1-6:8, 2017 |
Julien Herrmann, George Bosilca, Thomas Hérault, Loris Marchal, Yves Robert, Jack J. Dongarra: "Assessing the cost of redistribution followed by a computational kernel: Complexity and performance results.", in Parallel Computing:52[]. pp 22-41, 2016 |
George Bosilca, Aurelien Bouteiller, Amina Guermouche, Thomas Hérault, Yves Robert, Pierre Sens, Jack J. Dongarra: "Failure detection and propagation in HPC systems.", in SC. pp 312-322, 2016 |
George Bosilca, Aurelien Bouteiller, Thomas Hérault, Yves Robert, Jack J. Dongarra: "Composing resilience techniques: ABFT, periodic and incremental checkpointing.", in IJNC:5[1]. pp 2-25, 2015 |
Aurelien Bouteiller, Thomas Hérault, George Bosilca, Peng Du, Jack J. Dongarra: "Algorithm-Based Fault Tolerance for Dense Matrix Factorizations, Multiple Failures and Accuracy.", in TOPC:1[2]. pp 10:1-10:28, 2015 |
Chongxiao Cao, Thomas Hérault, George Bosilca, Jack J. Dongarra: "Design for a Soft Error Resilient Dynamic Task-Based Runtime.", in IPDPS. pp 765-774, 2015 |
Chunyan Tang, Aurelien Bouteiller, Thomas Hérault, Manjunath Gorentla Venkata, George Bosilca: "From MPI to OpenSHMEM: Porting LAMMPS.", in OpenSHMEM. pp 121-137, 2015 |
Atsushi Hori, Kazumi Yoshinaga, Thomas Hérault, Aurelien Bouteiller, George Bosilca, Yutaka Ishikawa: "Sliding Substitution of Failed Nodes.", in EuroMPI. pp 14:1-14:10, 2015 |
Thomas Hérault, Aurelien Bouteiller, George Bosilca, Marc Gamell, Keita Teranishi, Manish Parashar, Jack J. Dongarra: "Practical scalable consensus for pseudo-synchronous distributed systems.", in SC. pp 31:1-31:12, 2015 |
George Bosilca, Aurelien Bouteiller, Elisabeth Brunet, Franck Cappello, Jack J. Dongarra, Amina Guermouche, Thomas Hérault, Yves Robert, Frédéric Vivien, Dounia Zaidouni: "Unified model for assessing checkpointing protocols at extreme-scale.", in Concurrency and Computation: Practice and Experience:26[17]. pp 2772-2791, 2014 |
Jack J. Dongarra, Thomas Hérault, Yves Robert: "Performance and reliability trade-offs for the double checkpointing algorithm.", in IJNC:4[1]. pp 23-41, 2014 |
Heike McCraw, Anthony Danalis, Thomas Hérault, George Bosilca, Jack J. Dongarra, Karol Kowalski, Theresa L. Windus: "Utilizing dataflow-based execution for coupled cluster methods.", in CLUSTER. pp 296-297, 2014 |
George Bosilca, Aurelien Bouteiller, Thomas Hérault, Yves Robert, Jack J. Dongarra: "Assessing the Impact of ABFT and Checkpoint Composite Strategies.", in IPDPS Workshops. pp 679-688, 2014 |
Thomas Hérault, Julien Herrmann, Loris Marchal, Yves Robert: "Determining the Optimal Redistribution for a Given Data Partition.", in ISPDC. pp 95-102, 2014 |
Aurelien Bouteiller, Thomas Hérault, George Bosilca: "A Multithreaded Communication Substrate for OpenSHMEM.", in PGAS. pp 16:1-16:2, 2014 |
Anthony Danalis, George Bosilca, Aurelien Bouteiller, Thomas Hérault, Jack J. Dongarra: "PTG: an abstraction for unhindered parallelism.", in WOLFHPC@SC. pp 21-30, 2014 |
Wesley Bland, Aurelien Bouteiller, Thomas Hérault, Joshua Hursey, George Bosilca, Jack J. Dongarra: "An evaluation of User-Level Failure Mitigation support in MPI.", in Computing:95[12]. pp 1171-1184, 2013 |
Aurelien Bouteiller, Thomas Hérault, George Bosilca, Jack J. Dongarra: "Correlated set coordination in fault tolerant message logging protocols for many-core clusters.", in Concurrency and Computation: Practice and Experience:25[4]. pp 572-585, 2013 |
Wesley Bland, Peng Du, Aurelien Bouteiller, Thomas Hérault, George Bosilca, Jack J. Dongarra: "Extending the scope of the Checkpoint-on-Failure protocol for forward recovery in standard MPI.", in Concurrency and Computation: Practice and Experience:25[17]. pp 2381-2393, 2013 |
George Bosilca, Aurelien Bouteiller, Anthony Danalis, Mathieu Faverge, Thomas Hérault, Jack J. Dongarra: "PaRSEC: Exploiting Heterogeneity to Enhance Scalability.", in Computing in Science and Engineering:15[6]. pp 36-45, 2013 |
Wesley Bland, Aurelien Bouteiller, Thomas Hérault, George Bosilca, Jack J. Dongarra: "Post-failure recovery of MPI communication capability: Design and rationale.", in IJHPCA:27[3]. pp 244-254, 2013 |
Jack J. Dongarra, Mathieu Faverge, Thomas Hérault, Mathias Jacquelin, Julien Langou, Yves Robert: "Hierarchical QR factorization algorithms for multi-core clusters.", in Parallel Computing:39[4-5]. pp 212-232, 2013 |
Aurelien Bouteiller, Franck Cappello, Jack J. Dongarra, Amina Guermouche, Thomas Hérault, Yves Robert: "Multi-criteria Checkpointing Strategies: Response-Time versus Resource Utilization.", in Euro-Par. pp 420-431, 2013 |
Jack J. Dongarra, Thomas Hérault, Yves Robert: "Revisiting the Double Checkpointing Algorithm.", in IPDPS Workshops. pp 706-715, 2013 |
Guillaume Aupy, Anne Benoit, Thomas Hérault, Yves Robert, Frédéric Vivien, Dounia Zaidouni: "On the Combination of Silent Error Detection and Checkpointing.", in PRDC. pp 11-20, 2013 |
Guillaume Aupy, Anne Benoit, Thomas Hérault, Yves Robert, Jack J. Dongarra: "Optimal Checkpointing Period: Time vs. Energy.", in PMBS@SC. pp 203-214, 2013 |
Guillaume Aupy, Anne Benoit, Thomas Hérault, Yves Robert, Jack J. Dongarra: "Optimal Checkpointing Period: Time vs. Energy.", in CoRR:abs/1310.8456[]. pp , 2013 |
Guillaume Aupy, Anne Benoit, Thomas Hérault, Yves Robert, Frédéric Vivien, Dounia Zaidouni: "On the Combination of Silent Error Detection and Checkpointing.", in CoRR:abs/1310.8486[]. pp , 2013 |
George Bosilca, Aurelien Bouteiller, Anthony Danalis, Thomas Hérault, Pierre Lemarinier, Jack J. Dongarra: "DAGuE: A generic distributed DAG engine for High Performance Computing.", in Parallel Computing:38[1-2]. pp 37-51, 2012 |
George Bosilca, Aurelien Bouteiller, Anthony Danalis, Thomas Hérault, Jack J. Dongarra: "From Serial Loops to Parallel Execution on Distributed Systems.", in Euro-Par. pp 246-257, 2012 |
Wesley Bland, Peng Du, Aurelien Bouteiller, Thomas Hérault, George Bosilca, Jack J. Dongarra: "A Checkpoint-on-Failure Protocol for Algorithm-Based Recovery in Standard MPI.", in Euro-Par. pp 477-488, 2012 |
George Bosilca, Aurelien Bouteiller, Anthony Danalis, Thomas Hérault, Jakub Kurzak, Piotr Luszczek, Stanimire Tomov, Jack J. Dongarra: "Scalable Dense Linear Algebra on Heterogeneous Hardware.", in High Performance Computing Workshop (2). pp 65-103, 2012 |
Jack J. Dongarra, Mathieu Faverge, Thomas Hérault, Julien Langou, Yves Robert: "Hierarchical QR Factorization Algorithms for Multi-core Cluster Systems.", in IPDPS. pp 607-618, 2012 |
Peng Du, Aurelien Bouteiller, George Bosilca, Thomas Hérault, Jack J. Dongarra: "Algorithm-based fault tolerance for dense matrix factorizations.", in PPOPP. pp 225-234, 2012 |
Wesley Bland, Aurelien Bouteiller, Thomas Hérault, Joshua Hursey, George Bosilca, Jack J. Dongarra: "An Evaluation of User-Level Failure Mitigation Support in MPI.", in EuroMPI. pp 193-203, 2012 |
Emmanuel Agullo, Camille Coti, Thomas Hérault, Julien Langou, Sylvain Peyronnet, Ala Rezmerita, Franck Cappello, Jack J. Dongarra: "QCG-OMPI: MPI applications on grids.", in Future Generation Comp. Syst.:27[4]. pp 357-369, 2011 |
George Bosilca, Thomas Hérault, Ala Rezmerita, Jack J. Dongarra: "On Scalability for MPI Runtime Systems.", in CLUSTER. pp 187-195, 2011 |
Teng Ma, Thomas Hérault, George Bosilca, Jack J. Dongarra: "Process Distance-Aware Adaptive MPI Collective Communications.", in CLUSTER. pp 196-204, 2011 |
George Bosilca, Aurelien Bouteiller, Thomas Hérault, Pierre Lemarinier, Narapat Ohm Saengpatsa, Stanimire Tomov, Jack J. Dongarra: "Performance Portability of a GPU Enabled Factorization with the DAGuE Framework.", in CLUSTER. pp 395-402, 2011 |
Aurelien Bouteiller, Thomas Hérault, George Bosilca, Jack J. Dongarra: "Correlated Set Coordination in Fault Tolerant Message Logging Protocols.", in Euro-Par (2). pp 51-64, 2011 |
George Bosilca, Aurelien Bouteiller, Anthony Danalis, Thomas Hérault, Pierre Lemarinier, Jack J. Dongarra: "DAGuE: A Generic Distributed DAG Engine for High Performance Computing.", in IPDPS Workshops. pp 1151-1158, 2011 |
George Bosilca, Aurelien Bouteiller, Anthony Danalis, Mathieu Faverge, Azzam Haidar, Thomas Hérault, Jakub Kurzak, Julien Langou, Pierre Lemarinier, Hatem Ltaief, Piotr Luszczek, Asim YarKhan, Jack J. Dongarra: "Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA.", in IPDPS Workshops. pp 1432-1441, 2011 |
George Bosilca, Thomas Hérault, Pierre Lemarinier, Ala Rezmerita, Jack J. Dongarra: "Scalable Runtime for MPI: Efficiently Building the Communication Infrastructure.", in EuroMPI. pp 342-344, 2011 |
Jack J. Dongarra, Mathieu Faverge, Thomas Hérault, Julien Langou, Yves Robert: "Hierarchical QR factorization algorithms for multi-core cluster systems", in CoRR:abs/1110.1553[]. pp , 2011 |
Fatiha Bouabache, Thomas Hérault, Sylvain Peyronnet, Franck Cappello: "Planning Large Data Transfers in Institutional Grids.", in CCGRID. pp 547-552, 2010 |
Amine Bourki, Guillaume Chaslot, Matthieu Coulm, Vincent Danjean, Hassen Doghmen, Jean-Baptiste Hoock, Thomas Hérault, Arpad Rimmel, Fabien Teytaud, Olivier Teytaud, Paul Vayssière, Ziqin Yut: "Scalability and Parallelization of Monte-Carlo Tree Search.", in Computers and Games. pp 48-58, 2010 |
François Lesueur, Ala Rezmerita, Thomas Hérault, Sylvain Peyronnet, Sébastien Tixeuil: "SAFE-OS: A secure and usable desktop operating system.", in CRiSIS. pp 1-7, 2010 |
Emmanuel Agullo, Camille Coti, Jack J. Dongarra, Thomas Hérault, Julien Langou: "QR factorization of tall and skinny matrices in a grid computing environment.", in IPDPS. pp 1-11, 2010 |
Gilles Fedak, Jean-Patrick Gelas, Thomas Hérault, Victor Iniesta, Derrick Kondo, Laurent Lefèvre, Paul Malecot, Lucas Nussbaum, Ala Rezmerita, Olivier Richard: "DSL-Lab: A Low-Power Lightweight Platform to Experiment on Domestic Broadband Internet.", in ISPDC. pp 141-148, 2010 |
Aline Carneiro Viana, Thomas Hérault, Thomas Largillier, Sylvain Peyronnet, Fatiha Zaïdi: "Supple: a flexible probabilistic data dissemination protocol for wireless sensor networks.", in MSWiM. pp 385-392, 2010 |
George Bosilca, Aurelien Bouteiller, Thomas Hérault, Pierre Lemarinier, Jack J. Dongarra: "Dodging the Cost of Unavoidable Memory Copies in Message Logging Protocols.", in EuroMPI. pp 189-197, 2010 |
Fatiha Bouabache, Thomas Hérault, Gilles Fedak, Franck Cappello: "Hierarchical Replication Techniques to Ensure Checkpoint Storage Reliability in Grid Environment.", in Journal of Interconnection Networks:10[4]. pp 345-364, 2009 |
Franck Cappello, Thomas Hérault, Jack J. Dongarra: "Foreword.", in Parallel Computing:35[12]. pp 571, 2009 |
Thomas Hérault, Thomas Largillier, Sylvain Peyronnet, Benjamin Quétier, Franck Cappello, Mathieu Jan: "High accuracy failure injection in parallel and distributed systems using virtualization.", in Conf. Computing Frontiers. pp 193-196, 2009 |
Pavel Bar, Camille Coti, Derek Groen, Thomas Hérault, Valentin Kravtsov, Assaf Schuster, Martin T. Swain: "Running Parallel Applications with Topology-Aware Grid Middleware.", in eScience. pp 292-299, 2009 |
Camille Coti, Thomas Hérault, Franck Cappello: "MPI Applications on Grids: A Topology Aware Approach.", in Euro-Par. pp 466-477, 2009 |
George Bosilca, Camille Coti, Thomas Hérault, Pierre Lemarinier, Jack J. Dongarra: "Constructing Resiliant Communication Infrastructure for Runtime Environments.", in PARCO. pp 441-451, 2009 |
Emmanuel Agullo, Camille Coti, Jack J. Dongarra, Thomas Hérault, Julien Langou: "QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment", in CoRR:abs/0912.2572[]. pp , 2009 |
Darius Buntinas, Camille Coti, Thomas Hérault, Pierre Lemarinier, Laurence Pilard, Ala Rezmerita, Eric Rodriguez, Franck Cappello: "Blocking vs. non-blocking coordinated checkpointing for large-scale fault tolerant MPI Protocols.", in Future Generation Comp. Syst.:24[1]. pp 73-84, 2008 |
Fatiha Bouabache, Thomas Hérault, Gilles Fedak, Franck Cappello: "Hierarchical replication techniques to ensure checkpoint storage reliability in grid environment.", in AICCSA. pp 939-940, 2008 |
Camille Coti, Thomas Hérault, Sylvain Peyronnet, Ala Rezmerita, Franck Cappello: "Grid Services for MPI.", in CCGRID. pp 417-424, 2008 |
Fatiha Bouabache, Thomas Hérault, Gilles Fedak, Franck Cappello: "Hierarchical Replication Techniques to Ensure Checkpoint Storage Reliability in Grid Environment.", in CCGRID. pp 475-483, 2008 |
Thomas Hérault, Mathieu Jan, Thomas Largillier, Sylvain Peyronnet, Benjamin Quétier, Franck Cappello: "Emulation platform for high accuracy failure injection in grids.", in High Performance Computing Workshop. pp 127-140, 2008 |
Julien Clément 0002, Thomas Hérault, Stéphane Messika, Olivier Peres: "On the Complexity of a Self-Stabilizing Spanning Tree Algorithm for Large Scale Systems.", in PRDC. pp 48-55, 2008 |
Alexandre Borghi, Thomas Hérault, Richard Lassaigne, Sylvain Peyronnet: "Cell Assisted APMC.", in QEST. pp 75-76, 2008 |
Michaël Cadilhac, Thomas Hérault, Richard Lassaigne, Sylvain Peyronnet, Sébastien Tixeuil: "Evaluating Complex MAC Protocols for Sensor Networks with APMC.", in Electr. Notes Theor. Comput. Sci.:185[]. pp 33-46, 2007 |
Fatiha Bouabache, Thomas Hérault, Gilles Fedak, Franck Cappello: "A Distributed and Replicated Service for Checkpoint Storage.", in CoreGRID Workshop - Making Grids Work. pp 295-306, 2007 |
Thomas Hérault, Pierre Lemarinier, Olivier Peres, Laurence Pilard, Joffroy Beauquier: "A Model for Large Scale Self-Stabilization.", in IPDPS. pp 1-10, 2007 |
Benjamin Quétier, Thomas Hérault, Vincent Néri, Franck Cappello: "Virtual Parallel Machines Through Virtualization: Impact on MPI Executions.", in PVM/MPI. pp 381-383, 2007 |
Camille Coti, Ala Rezmerita, Thomas Hérault, Franck Cappello: "Grid Services for MPI.", in PVM/MPI. pp 393-394, 2007 |
: "Recent Advances in Parallel Virtual Machine and Message Passing Interface, 14th European PVM/MPI User's Group Meeting, Paris, France, September 30 - October 3, 2007, Proceedings", in Lecture Notes in Computer Science:4757, |
Guillaume Guirado, Thomas Hérault, Richard Lassaigne, Sylvain Peyronnet: "Distribution, Approximation and Probabilistic Model Checking.", in Electr. Notes Theor. Comput. Sci.:135[2]. pp 19-30, 2006 |
Aurelien Bouteiller, Hinde-Lilia Bouziane, Thomas Hérault, Pierre Lemarinier, Franck Cappello: "Hybrid Preemptive Scheduling of Message Passing Interface Applications on Grids.", in IJHPCA:20[1]. pp 77-90, 2006 |
Aurelien Bouteiller, Thomas Hérault, Géraud Krawezik, Pierre Lemarinier, Franck Cappello: "MPICH-V Project: A Multiprotocol Automatic Fault-Tolerant MPI.", in IJHPCA:20[3]. pp 319-333, 2006 |
William Hoarau, Pierre Lemarinier, Thomas Hérault, Eric Rodriguez, Sébastien Tixeuil, Franck Cappello: "FAIL-MPI: How Fault-Tolerant Is Fault-Tolerant MPI?", in CLUSTER. pp , 2006 |
Thomas Hérault, Richard Lassaigne, Sylvain Peyronnet: "APMC 3.0: Approximate Verification of Discrete and Continuous Time Markov Chains.", in QEST. pp 129-130, 2006 |
Akim Demaille, Thomas Hérault, Sylvain Peyronnet: "Probabilistic verification of sensor networks.", in RIVF. pp 45-54, 2006 |
Camille Coti, Thomas Hérault, Pierre Lemarinier, Laurence Pilard, Ala Rezmerita, Eric Rodriguez, Franck Cappello: "MPI tools and performance studies - Blocking vs. non-blocking coordinated checkpointing for large-scale fault tolerant MPI.", in SC. pp 127, 2006 |
Thomas Hérault, Pierre Lemarinier, Olivier Peres, Laurence Pilard, Joffroy Beauquier: "Brief Announcement: Self-stabilizing Spanning Tree Algorithm for Large Scale Systems.", in SSS. pp 574-575, 2006 |
Marie Duflot, Laurent Fribourg, Thomas Hérault, Richard Lassaigne, Frédéric Magniette, Stéphane Messika, Sylvain Peyronnet, Claudine Picaronny: "Probabilistic Model Checking of the CSMA/CD Protocol Using PRISM and APMC.", in Electr. Notes Theor. Comput. Sci.:128[6]. pp 195-214, 2005 |
Franck Cappello, Samir Djilali, Gilles Fedak, Thomas Hérault, Frédéric Magniette, Vincent Néri, Oleg Lodygensky: "Computing on large-scale distributed systems: XtremWeb architecture, programming models, security, tests and convergence with grid.", in Future Generation Comp. Syst.:21[3]. pp 417-437, 2005 |
Aurelien Bouteiller, Boris Collin, Thomas Hérault, Pierre Lemarinier, Franck Cappello: "Impact of Event Logger on Causal Message Logging Protocols for Fault Tolerant MPI.", in IPDPS. pp , 2005 |
Pierre Lemarinier, Aurelien Bouteiller, Thomas Hérault, Géraud Krawezik, Franck Cappello: "Improved message logging versus improved coordinated checkpointing for fault tolerant MPI.", in CLUSTER. pp 115-124, 2004 |
Aurelien Bouteiller, Hinde-Lilia Bouziane, Thomas Hérault, Pierre Lemarinier, Franck Cappello: "Hybrid Preemptive Scheduling of MPI Applications on the Grids.", in GRID. pp 130-137, 2004 |
Samir Djilali, Thomas Hérault, Oleg Lodygensky, Tangui Morlier, Gilles Fedak, Franck Cappello: "RPC-V: Toward Fault-Tolerant RPC for Internet Connected Desktop Grids with Volatile Nodes.", in SC. pp 39, 2004 |
Thomas Hérault, Richard Lassaigne, Frédéric Magniette, Sylvain Peyronnet: "Approximate Probabilistic Model Checking.", in VMCAI. pp 73-84, 2004 |
Aurelien Bouteiller, Franck Cappello, Thomas Hérault, Géraud Krawezik, Pierre Lemarinier, Frédéric Magniette: "MPICH-V2: a Fault Tolerant MPI for Volatile Nodes based on Pessimistic Sender Based Message Logging.", in SC. pp 25, 2003 |
George Bosilca, Aurelien Bouteiller, Franck Cappello, Samir Djilali, Gilles Fedak, Cécile Germain, Thomas Hérault, Pierre Lemarinier, Oleg Lodygensky, Frédéric Magniette, Vincent Néri, Anton Selikhov: "MPICH-V: toward a scalable fault tolerant MPI for volatile nodes.", in SC. pp 31:1-31:18, 2002 |
Joffroy Beauquier, Thomas Hérault: "Fault-Local Stabilization: The Shortest Path Tree.", in SRDS. pp 62-69, 2002 |
Joffroy Beauquier, Thomas Hérault, Elad Schiller: "Easy Stabilization with an Agent.", in WSS. pp 35-50, 2001 |
Teaching & Tutorials
Since 2012, I have given, with Yves Robert, George Bosilca, and Aurelien Bouteiller, a tutorial at the IEEE/ACM Supercomputing Conference on Fault Tolerance in High Performance Computing.
With George Bosilca, I gave a 24-hour class on Fault Tolerance in High Performance Computing during a research school at the ENS Lyon, in 2012.
With Yves Robert, I presented a tutorial on Fault Tolerance in High Performance Computing at PPoPP in 2015, and at the International Conference on Supercomputing in 2013.
When I was an assistant professor at the University of Paris-Sud, I taught Operating Systems, Formal Verification, Software Design, Databases, Networks, Architecture, and High Performance Computing. I was pedagogic director for a class at the University Paris-Sud engineering school IFIPS (now Polytech Paris-Sud).
Program Committees
Supercomputing 2018, member of the Workshops and Posters committees
Supercomputing 2017, member of the PC for the Algorithm track
ICPP’17, PC Vice-Chair, track “Systems”
Supercomputing 2016, member of the PC for the Algorithm track
ICPADS 2015, PC Vice-Chair, track “multicore computing”
HiPC 2014, Program Chair
HiPC 2013, PC Vice-Chair, track “Software”
IPDPS 2013, member of the PC
Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms (Heteropar) 2011: member of the PC
Facing the Multicore-Challenge II (conference for young scientists) 2011 – 2012: member of the PC
15th International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS) 2010: member of the PC
IEEE International Conference on Cluster Computing (Cluster) 2010: member of the PC
IEEE/ACM International Symposium on Cluster, Cloud and Grid (CCGRID) 2008 – 2010: member of the PC
International Symposium on Parallel and Distributed Computing (ISPDC) 2008 – 2010: member of the PC
High Performance Computing for Computational Science (VECPAR) 2008, 2010, 2012: Member of the PC
European PVM/MPI Users Group Meeting (EuroPVM/MPI), now EuroMPI, 2008 – 2018: Member of the PC
European PVM/MPI Users Group Meeting (EuroPVM/MPI) 2007: PC Co-Chair / local organizer.