Publications

Export 86 results:
Filters: Author is Thomas Herault  [Clear All Filters]
2010
Bosilca, G., C. Coti, T. Herault, P. Lemariner, and J. Dongarra, Constructing Resiliant Communication Infrastructure for Runtime Environments in Advances in Parallel Computing,” Advances in Parallel Computing - Parallel Computing: From Multicores and GPU's to Petascale, vol. 19, pp. 441-451, 2010.
Bosilca, G., A. Bouteiller, A. Danalis, T. Herault, P. Lemariner, and J. Dongarra, DAGuE: A generic distributed DAG engine for high performance computing,” Innovative Computing Laboratory Technical Report, no. ICL-UT-10-01, April 2010.  (830.85 KB)
Bosilca, G., A. Bouteiller, A. Danalis, M. Faverge, A. Haidar, T. Herault, J. Kurzak, J. Langou, P. Lemariner, H. Ltaeif, et al., Distributed Dense Numerical Linear Algebra Algorithms on Massively Parallel Architectures: DPLASMA,” University of Tennessee Computer Science Technical Report, UT-CS-10-660, September 2010.  (366.26 KB)
Bosilca, G., A. Bouteiller, A. Danalis, M. Faverge, A. Haidar, T. Herault, J. Kurzak, J. Langou, P. Lemariner, H. Ltaeif, et al., Distributed-Memory Task Execution and Dependence Tracking within DAGuE and the DPLASMA Project,” Innovative Computing Laboratory Technical Report, no. ICL-UT-10-02, 00 2010.  (400.75 KB)
Bosilca, G., A. Bouteiller, T. Herault, P. Lemariner, and J. Dongarra, Dodging the Cost of Unavoidable Memory Copies in Message Logging Protocols,” Proceedings of EuroMPI 2010, Stuttgart, Germany, Springer, September 2010.  (202.87 KB)
Agullo, E., C. Coti, T. Herault, J. Langou, S. Peyronnet, A.. Rezmerita, F. Cappello, and J. Dongarra, QCG-OMPI: MPI Applications on Grids,” Future Generation Computer Systems, vol. 27, no. 4, pp. 357-369, March 2010.  (1.48 MB)
Agullo, E., C. Coti, J. Dongarra, T. Herault, and J. Langou, QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment,” 24th IEEE International Parallel and Distributed Processing Symposium (also LAWN 224), Atlanta, GA, April 2010.  (261.55 KB)
2011
Du, P., A. Bouteiller, G. Bosilca, T. Herault, and J. Dongarra, Algorithm-based Fault Tolerance for Dense Matrix Factorizations,” University of Tennessee Computer Science Technical Report, no. UT-CS-11-676, Knoxville, TN, August 2011.  (865.79 KB)
Bouteiller, A., T. Herault, G. Bosilca, and J. Dongarra, Correlated Set Coordination in Fault Tolerant Message Logging Protocols,” Proceedings of 17th International Conference, Euro-Par 2011, Part II, vol. 6853, Bordeaux, France, Springer, pp. 51-64, August 2011.  (486.68 KB)
Bosilca, G., A. Bouteiller, A. Danalis, T. Herault, P. Lemariner, and J. Dongarra, DAGuE: A Generic Distributed DAG Engine for High Performance Computing,” Proceedings of the Workshops of the 25th IEEE International Symposium on Parallel and Distributed Processing (IPDPS 2011 Workshops), Anchorage, Alaska, USA, IEEE, pp. 1151-1158, 00 2011.  (830.85 KB)
Bosilca, G., A. Bouteiller, A. Danalis, M. Faverge, A. Haidar, T. Herault, J. Kurzak, J. Langou, P. Lemariner, H. Ltaeif, et al., Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA,” Proceedings of the Workshops of the 25th IEEE International Symposium on Parallel and Distributed Processing (IPDPS 2011 Workshops), Anchorage, Alaska, USA, IEEE, pp. 1432-1441, May 2011.  (1.26 MB)
Dongarra, J., M. Faverge, T. Herault, J. Langou, and Y. Robert, Hierarchical QR Factorization Algorithms for Multi-Core Cluster Systems,” University of Tennessee Computer Science Technical Report (also Lawn 257), no. UT-CS-11-684, October 2011.  (405.71 KB)
Bosilca, G., A. Bouteiller, T. Herault, P. Lemariner, N. Ohm Saengpatsa, S. Tomov, and J. Dongarra, Performance Portability of a GPU Enabled Factorization with the DAGuE Framework,” IEEE Cluster: workshop on Parallel Programming on Accelerator Clusters (PPAC), June 2011.  (290.98 KB)
Ma, T., T. Herault, G. Bosilca, and J. Dongarra, Process Distance-aware Adaptive MPI Collective Communications,” IEEE Int'l Conference on Cluster Computing (Cluster 2011), Austin, Texas, 00 2011.
Agullo, E., C. Coti, T. Herault, J. Langou, S. Peyronnet, A.. Rezmerita, F. Cappello, and J. Dongarra, QCG-OMPI: MPI Applications on Grids.,” Future Generation Computer Systems, vol. 27, no. 4, pp. 435-369, January 2011.  (1.48 MB)
Bosilca, G., T. Herault, A.. Rezmerita, and J. Dongarra, On Scalability for MPI Runtime Systems,” International Conference on Cluster Computing (CLUSTER), Austin, TX, USA, IEEEE, pp. 187-195, September 2011.  (898.76 KB)
Bosilca, G., T. Herault, A.. Rezmerita, and J. Dongarra, On Scalability for MPI Runtime Systems,” University of Tennessee Computer Science Technical Report, no. ICL-UT-11-05, Knoxville, TN, May 2011.  (898.76 KB)
Bosilca, G., T. Herault, P. Lemariner, J. Dongarra, and A.. Rezmerita, Scalable Runtime for MPI: Efficiently Building the Communication Infrastructure,” Proceedings of Recent Advances in the Message Passing Interface - 18th European MPI Users' Group Meeting, EuroMPI 2011, vol. 6960, Santorini, Greece, Springer, pp. 342-344, September 2011.  (115.75 KB)
Bosilca, G., A. Bouteiller, T. Herault, P. Lemariner, N. Ohm Saengpatsa, S. Tomov, and J. Dongarra, A Unified HPC Environment for Hybrid Manycore/GPU Distributed Systems,” IEEE International Parallel and Distributed Processing Symposium (submitted), Anchorage, AK, May 2011.
2012
Du, P., A. Bouteiller, G. Bosilca, T. Herault, and J. Dongarra, Algorithm-Based Fault Tolerance for Dense Matrix Factorization,” Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP 2012, New Orleans, LA, USA, ACM, pp. 225-234, February 2012.  (865.79 KB)
Bland, W., P. Du, A. Bouteiller, T. Herault, G. Bosilca, and J. Dongarra, A Checkpoint-on-Failure Protocol for Algorithm-Based Recovery in Standard MPI,” 18th International European Conference on Parallel and Distributed Computing (Euro-Par 2012) (Best Paper Award), Rhodes, Greece, Springer-Verlag, August 2012.  (289.32 KB)
Bosilca, G., A. Bouteiller, A. Danalis, T. Herault, P. Lemariner, and J. Dongarra, DAGuE: A generic distributed DAG Engine for High Performance Computing.,” Parallel Computing, vol. 38, no. 1-2: Elsevier, pp. 27-51, 00 2012.  (830.85 KB)
Bland, W., A. Bouteiller, T. Herault, J. Hursey, G. Bosilca, and J. Dongarra, An Evaluation of User-Level Failure Mitigation Support in MPI,” Proceedings of Recent Advances in Message Passing Interface - 19th European MPI Users' Group Meeting, EuroMPI 2012, Vienna, Austria, Springer, September 2012.
Bland, W., P. Du, A. Bouteiller, T. Herault, G. Bosilca, and J. Dongarra, Extending the Scope of the Checkpoint-on-Failure Protocol for Forward Recovery in Standard MPI,” University of Tennessee Computer Science Technical Report, no. ut-cs-12-702, 00 2012.  (422.76 KB)
Bosilca, G., A. Bouteiller, A. Danalis, T. Herault, and J. Dongarra, From Serial Loops to Parallel Execution on Distributed Systems,” International European Conference on Parallel and Distributed Computing (Euro-Par '12), Rhodes, Greece, August 2012.  (203.08 KB)
Dongarra, J., M. Faverge, T. Herault, J. Langou, and Y. Robert, Hierarchical QR Factorization Algorithms for Multi-Core Cluster Systems,” IPDPS 2012, the 26th IEEE International Parallel and Distributed Processing Symposium, Shanghai, China, IEEE Computer Society Press, May 2012.  (405.71 KB)
Bland, W., G. Bosilca, A. Bouteiller, T. Herault, and J. Dongarra, A Proposal for User-Level Failure Mitigation in the MPI-3 Standard,” University of Tennessee Electrical Engineering and Computer Science Technical Report, no. ut-cs-12-693: University of Tennessee, February 2012.  (159.46 KB)
Bosilca, G., A. Bouteiller, E. Brunet, F. Cappello, J. Dongarra, A. Guermouche, T. Herault, Y. Robert, F. Vivien, and D. Zaidouni, Unified Model for Assessing Checkpointing Protocols at Extreme-Scale,” University of Tennessee Computer Science Technical Report (also LAWN 269), no. UT-CS-12-697, June 2012.  (2.76 MB)
2013
Bosilca, G., A. Bouteiller, T. Herault, Y. Robert, and J. Dongarra, Assessing the impact of ABFT and Checkpoint composite strategies,” University of Tennessee Computer Science Technical Report, no. ICL-UT-13-03, 2013.  (968.47 KB)
Aupy, G., A. Benoit, T. Herault, Y. Robert, F. Vivien, and D. Zaidouni, On the Combination of Silent Error Detection and Checkpointing,” UT-CS-13-710: University of Tennessee Computer Science Technical Report, June 2013.  (1.29 MB)
Bouteiller, A., T. Herault, G. Bosilca, and J. Dongarra, Correlated Set Coordination in Fault Tolerant Message Logging Protocols,” Concurrency and Computation: Practice and Experience, vol. 25, issue 4, pp. 572-585, March 2013.  (636.68 KB)
Bosilca, G., A. Bouteiller, A. Danalis, T. Herault, P. Luszczek, and J. Dongarra, Dense Linear Algebra on Distributed Heterogeneous Hardware with a Symbolic DAG Approach,” Scalable Computing and Communications: Theory and Practice: John Wiley & Sons, pp. 699-735, March 2013.  (1.01 MB)
Bland, W., A. Bouteiller, T. Herault, J. Hursey, G. Bosilca, and J. Dongarra, An evaluation of User-Level Failure Mitigation support in MPI,” Computing, vol. 95, issue 12, pp. 1171-1184, December 2013.  (311.23 KB)
Bland, W., P. Du, A. Bouteiller, T. Herault, G. Bosilca, and J. Dongarra, Extending the scope of the Checkpoint-on-Failure protocol for forward recovery in standard MPI,” Concurrency and Computation: Practice and Experience, July 2013.  (3.89 MB)
Dongarra, J., M. Faverge, T. Herault, M. Jacquelin, J. Langou, and Y. Robert, Hierarchical QR Factorization Algorithms for Multi-core Cluster Systems,” Parallel Computing, vol. 39, issue 4-5, pp. 212-232, May 2013.  (1.43 MB)
Bouteiller, A., F. Cappello, J. Dongarra, A. Guermouche, T. Herault, and Y. Robert, Multi-criteria checkpointing strategies: optimizing response-time versus resource utilization,” University of Tennessee Computer Science Technical Report, no. ICL-UT-13-01, February 2013.  (497.64 KB)
Bouteiller, A., F. Cappello, J. Dongarra, A. Guermouche, T. Herault, and Y. Robert, Multi-criteria Checkpointing Strategies: Response-Time versus Resource Utilization,” Euro-Par 2013, Aachen, Germany, Springer, August 2013.  (431.84 KB)
Aupy, G., A. Benoit, T. Herault, Y. Robert, and J. Dongarra, Optimal Checkpointing Period: Time vs. Energy,” University of Tennessee Computer Science Technical Report (also LAWN 281), no. ut-eecs-13-718: University of Tennessee, October 2013.  (440.13 KB)
Bosilca, G., A. Bouteiller, A. Danalis, M. Faverge, T. Herault, and J. Dongarra, PaRSEC: Exploiting Heterogeneity to Enhance Scalability,” IEEE Computing in Science and Engineering, vol. 15, issue 6, pp. 36-45, November 2013.  (2.16 MB)
Bland, W., A. Bouteiller, T. Herault, G. Bosilca, and J. Dongarra, Post-failure recovery of MPI communication capability: Design and rationale,” International Journal of High Performance Computing Applications, vol. 27, issue 3, pp. 244 - 254, January 2013.  (285.77 KB)
Dongarra, J., T. Herault, and Y. Robert, Revisiting the Double Checkpointing Algorithm,” 15th Workshop on Advances in Parallel and Distributed Computational Models, at the IEEE International Parallel & Distributed Processing Symposium, Boston, MA, May 2013.  (591.1 KB)
Dongarra, J., T. Herault, and Y. Robert, Revisiting the Double Checkpointing Algorithm,” University of Tennessee Computer Science Technical Report (LAWN 274), no. ut-cs-13-705, January 2013.  (682.22 KB)
Bosilca, G., A. Bouteiller, A. Danalis, T. Herault, J. Kurzak, P. Luszczek, S. Tomov, and J. Dongarra, Scalable Dense Linear Algebra on Heterogeneous Hardware,” HPC: Transition Towards Exascale Processing, in the series Advances in Parallel Computing, 2013.  (760.32 KB)
Bosilca, G., A. Bouteiller, E. Brunet, F. Cappello, J. Dongarra, A. Guermouche, T. Herault, Y. Robert, F. Vivien, and D. Zaidouni, Unified Model for Assessing Checkpointing Protocols at Extreme-Scale,” Concurrency and Computation: Practice and Experience, November 2013.  (894.61 KB)
2014
Bosilca, G., A. Bouteiller, T. Herault, Y. Robert, and J. Dongarra, Assessing the Impact of ABFT and Checkpoint Composite Strategies,” 16th Workshop on Advances in Parallel and Distributed Computational Models, IPDPS 2014, Phoenix, AZ, IEEE, May 2014.  (1.02 MB)
Cao, C., T. Herault, G. Bosilca, and J. Dongarra, Design for a Soft Error Resilient Dynamic Task-based Runtime,” ICL Technical Report, no. ICL-UT-14-04: University of Tennessee, November 2014.  (2.61 MB)
Bouteiller, A., T. Herault, and G. Bosilca, A Multithreaded Communication Substrate for OpenSHMEM,” 8th International Conference on Partitioned Global Address Space Programming Models (PGAS), Eugene, OR, October 2014.  (261.66 KB)
Dongarra, J., T. Herault, and Y. Robert, Performance and Reliability Trade-offs for the Double Checkpointing Algorithm,” International Journal of Networking and Computing, vol. 4, no. 1, pp. 32-41.  (859.04 KB)

Pages