Publications
Export 75 results:
Filters: Author is Aurelien Bouteiller [Clear All Filters]
An Evaluation of User-Level Failure Mitigation Support in MPI,”
Proceedings of Recent Advances in Message Passing Interface - 19th European MPI Users' Group Meeting, EuroMPI 2012, Vienna, Austria, Springer, September 2012.
“Evaluating Contexts in OpenSHMEM-X Reference Implementation,”
OpenSHMEM and Related Technologies. Big Compute and Big Data Convergence, Cham, Springer International Publishing, pp. 50–62, 2018.
DOI: 10.1007/978-3-319-73814-7_4
“Efficient Parallelization of Batch Pattern Training Algorithm on Many-core and Cluster Architectures,”
7th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems, Berlin, Germany, September 2013.
(102.51 KB)
“Dodging the Cost of Unavoidable Memory Copies in Message Logging Protocols,”
Proceedings of EuroMPI 2010, Stuttgart, Germany, Springer, September 2010.
(202.87 KB)
“Do moldable applications perform better on failure-prone HPC platforms?,”
11th Workshop on Resiliency in High Performance Computing in Clusters, Clouds, and Grids, Turin, Italy, Springer Verlag, August 2018.
(360.72 KB)
“Distributed-Memory Task Execution and Dependence Tracking within DAGuE and the DPLASMA Project,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-10-02, 00 2010.
(400.75 KB)
“Distributed Termination Detection for HPC Task-Based Environments,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-18-14: University of Tennessee, June 2018.
“Distributed Dense Numerical Linear Algebra Algorithms on Massively Parallel Architectures: DPLASMA,”
University of Tennessee Computer Science Technical Report, UT-CS-10-660, September 2010.
(366.26 KB)
“Dense Linear Algebra on Distributed Heterogeneous Hardware with a Symbolic DAG Approach,”
Scalable Computing and Communications: Theory and Practice: John Wiley & Sons, pp. 699-735, March 2013.
(1.01 MB)
“Data Movement Interfaces to Support Dataflow Runtimes,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-18-03: University of Tennessee, May 2018.
(210.94 KB)
“DAGuE: A Generic Distributed DAG Engine for High Performance Computing,”
Proceedings of the Workshops of the 25th IEEE International Symposium on Parallel and Distributed Processing (IPDPS 2011 Workshops), Anchorage, Alaska, USA, IEEE, pp. 1151-1158, 00 2011.
(830.85 KB)
“DAGuE: A generic distributed DAG Engine for High Performance Computing.,”
Parallel Computing, vol. 38, no. 1-2: Elsevier, pp. 27-51, 00 2012.
(830.85 KB)
“DAGuE: A generic distributed DAG engine for high performance computing,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-10-01, April 2010.
(830.85 KB)
“Correlated Set Coordination in Fault Tolerant Message Logging Protocols,”
Concurrency and Computation: Practice and Experience, vol. 25, issue 4, pp. 572-585, March 2013.
DOI: 10.1002/cpe.2859 (636.68 KB)
“Correlated Set Coordination in Fault Tolerant Message Logging Protocols,”
Proceedings of 17th International Conference, Euro-Par 2011, Part II, vol. 6853, Bordeaux, France, Springer, pp. 51-64, August 2011.
(486.68 KB)
“Composing Resilience Techniques: ABFT, Periodic, and Incremental Checkpointing,”
International Journal of Networking and Computing, vol. 5, no. 1, pp. 2-15, January 2015.
(755.54 KB)
“Comparing the Performance of Rigid, Moldable, and Grid-Shaped Applications on Failure-Prone HPC Platforms,”
Parallel Computing, vol. 85, pp. 1–12, July 2019.
DOI: 10.1016/j.parco.2019.02.002 (865.18 KB)
“A Checkpoint-on-Failure Protocol for Algorithm-Based Recovery in Standard MPI,”
18th International European Conference on Parallel and Distributed Computing (Euro-Par 2012) (Best Paper Award), Rhodes, Greece, Springer-Verlag, August 2012.
(289.32 KB)
“Checkpointing Strategies for Shared High-Performance Computing Platforms,”
International Journal of Networking and Computing, vol. 9, no. 1, pp. 28–52, 2019.
(490.5 KB)
“Asynchronous Receiver-Driven Replay for Local Rollback of MPI Applications,”
Fault Tolerance for HPC at eXtreme Scale (FTXS) Workshop at The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'19), November 2019.
(440.7 KB)
“Assessing the Impact of ABFT and Checkpoint Composite Strategies,”
16th Workshop on Advances in Parallel and Distributed Computational Models, IPDPS 2014, Phoenix, AZ, IEEE, May 2014.
(1.02 MB)
“Assessing the impact of ABFT and Checkpoint composite strategies,”
University of Tennessee Computer Science Technical Report, no. ICL-UT-13-03, 2013.
(968.47 KB)
“Algorithm-based Fault Tolerance for Dense Matrix Factorizations, Multiple Failures, and Accuracy,”
ACM Transactions on Parallel Computing, vol. 1, issue 2, no. 10, pp. 10:1-10:28, January 2015.
DOI: 10.1145/2686892 (1.14 MB)
“Algorithm-based Fault Tolerance for Dense Matrix Factorizations,”
University of Tennessee Computer Science Technical Report, no. UT-CS-11-676, Knoxville, TN, August 2011.
(865.79 KB)
“Algorithm-Based Fault Tolerance for Dense Matrix Factorization,”
Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP 2012, New Orleans, LA, USA, ACM, pp. 225-234, February 2012.
DOI: 10.1145/2145816.2145845 (865.79 KB)
“