“Dense Linear Algebra on Distributed Heterogeneous Hardware with a Symbolic DAG Approach,” in Scalable Computing and Communications: Theory and Practice: John Wiley & Sons, pp. 699-735, March 2013.
“Disaster Survival Guide in Petascale Computing: An Algorithmic Approach,” in Petascale Computing: Algorithms and Applications (to appear): Chapman & Hall/CRC Press, 2007.
“An Efficient Distributed Randomized Algorithm for Solving Large Dense Symmetric Indefinite Linear Systems,” Parallel Computing, vol. 40, issue 7, pp. 213-223, July 2014. DOI: 10.1016/j.parco.2013.12.003
“Evaluating Data Redistribution in PaRSEC,” IEEE Transactions on Parallel and Distributed Systems, vol. 33, no. 8, pp. 1856-1872, 2022. DOI: 10.1109/TPDS.2021.3131657
“An Evaluation of Open MPI's Matching Transport Layer on the Cray XT,” EuroPVM/MPI 2007, September 2007.
“An evaluation of User-Level Failure Mitigation support in MPI,” Computing, vol. 95, issue 12, pp. 1171-1184, December 2013. DOI: 10.1007/s00607-013-0331-3
“Extending the scope of the Checkpoint-on-Failure protocol for forward recovery in standard MPI,” Concurrency and Computation: Practice and Experience, July 2013. DOI: 10.1002/cpe.3100
“A Failure Detector for HPC Platforms,” The International Journal of High Performance Computing Applications, vol. 32, issue 1, pp. 139-158, January 2018. DOI: 10.1177/1094342017711505
“Fault Tolerance of MPI Applications in Exascale Systems: The ULFM Solution,” Future Generation Computer Systems, vol. 106, pp. 467-481, May 2020. DOI: 10.1016/j.future.2020.01.026
“Flexible collective communication tuning architecture applied to Open MPI,” EuroPVM/MPI 2006 (submitted), Bonn, Germany, January 2006.
“HierKNEM: An Adaptive Framework for Kernel-Assisted and Topology-Aware Collective Communications on Many-core Clusters,” IPDPS 2012 (Best Paper), Shanghai, China, May 2012.
“High Performance RDMA Protocols in HPC,” EuroPVM/MPI 2006, Bonn, Germany, September 2006.
“A High-Performance, Heterogeneous MPI,” HeteroPar 2006, Barcelona, Spain, September 2006.
“Impact of Kernel-Assisted MPI Communication over Scientific Applications: CPMD and FFTW,” 18th EuroMPI, Santorini, Greece, Springer, pp. 247-254, September 2011.
“Implementation and Usage of the PERUSE-Interface in Open MPI,” EuroPVM/MPI 2006, Bonn, Germany, September 2006.
“An international survey on MPI users,” Parallel Computing, vol. 108, December 2021. DOI: 10.1016/j.parco.2021.102853
“Kernel-assisted and topology-aware MPI collective communications on multi-core/many-core platforms,” Journal of Parallel and Distributed Computing, vol. 73, issue 7, pp. 1000-1010, July 2013. DOI: 10.1016/j.jpdc.2013.01.015
“Local Rollback for Resilient MPI Applications with Application-Level Checkpointing and Message Logging,” Future Generation Computer Systems, vol. 91, pp. 450-464, February 2019. DOI: 10.1016/j.future.2018.09.041
“Matrices Over Runtime Systems at Exascale,” Supercomputing '12 (poster), Salt Lake City, Utah, November 2012.
“MPI Collective Algorithm Selection and Quadtree Encoding,” Parallel Computing (Special Edition: EuroPVM/MPI 2006): Elsevier, 2007.
“MPI Collective Algorithm Selection and Quadtree Encoding,” Lecture Notes in Computer Science, vol. 4192, no. ICL-UT-06-13: Springer Berlin / Heidelberg, pp. 40-48, September 2006.
“OMPIO: A Modular Software Architecture for MPI I/O,” 18th EuroMPI, Santorini, Greece, Springer, pp. 81-89, September 2011.
“Overhead of Using Spare Nodes,” The International Journal of High Performance Computing Applications, February 2020. DOI: 10.1177/1094342020901885
“PaRSEC: Exploiting Heterogeneity to Enhance Scalability,” IEEE Computing in Science and Engineering, vol. 15, issue 6, pp. 36-45, November 2013. DOI: 10.1109/MCSE.2013.98
“Performance Analysis of MPI Collective Operations,” Cluster Computing, vol. 10, no. 2: Springer Netherlands, pp. 127-143, June 2007.
“Performance Analysis of MPI Collective Operations,” Cluster Computing Journal (to appear), January 2005.
“Performance Portability of a GPU Enabled Factorization with the DAGuE Framework,” IEEE Cluster: Workshop on Parallel Programming on Accelerator Clusters (PPAC), June 2011.
“Post-failure recovery of MPI communication capability: Design and rationale,” International Journal of High Performance Computing Applications, vol. 27, issue 3, pp. 244-254, January 2013. DOI: 10.1177/1094342013488238
“Process Fault-Tolerance: Semantics, Design and Applications for High Performance Computing,” International Journal for High Performance Applications and Supercomputing (to appear), April 2004.
“Recovery Patterns for Iterative Methods in a Parallel Unstable Environment,” SIAM SISC (to appear), May 2007.
“Redesigning the Message Logging Model for High Performance,” Concurrency and Computation: Practice and Experience (online version), June 2010.
“Retrospect: Deterministic Relay of MPI Applications for Interactive Distributed Debugging,” Accepted for EuroPVM/MPI 2007: Springer, September 2007.
“Scalable Fault Tolerant Protocol for Parallel Runtime Environments,” EuroPVM/MPI 2006, no. ICL-UT-06-12, Bonn, Germany, 2006.
“Self-Adapting Numerical Software (SANS) Effort,” IBM Journal of Research and Development, vol. 50, no. 2/3, pp. 223-238, January 2006.
“Self-Healing Network for Scalable Fault-Tolerant Runtime Environments,” Future Generation Computer Systems, vol. 26, no. 3, pp. 479-485, March 2010.
“A Survey of MPI Usage in the US Exascale Computing Project,” Concurrency and Computation: Practice and Experience, September 2018. DOI: 10.1002/cpe.4851
“Unified Model for Assessing Checkpointing Protocols at Extreme-Scale,” Concurrency and Computation: Practice and Experience, November 2013. DOI: 10.1002/cpe.3173
Accelerating FFT towards Exascale Computing, NVIDIA GPU Technology Conference (GTC2021), 2021.
DTE: PaRSEC Enabled Libraries and Applications, 2021 Exascale Computing Project Annual Meeting, April 2021.
DTE: PaRSEC Enabled Libraries and Applications (Poster), 2020 Exascale Computing Project Annual Meeting, Houston, TX, February 2020.
DTE: PaRSEC Systems and Interfaces (Poster), 2020 Exascale Computing Project Annual Meeting, Houston, TX, February 2020.
A Report of the MPI International Survey (Poster), EuroMPI/USA '20: 27th European MPI Users' Group Meeting, Austin, TX, September 2020.
Using Advanced Vector Extensions AVX-512 for MPI Reduction (Poster), EuroMPI/USA '20: 27th European MPI Users' Group Meeting, Austin, TX, September 2020.
Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers, 2010 Symposium on Application Accelerators in High-Performance Computing (SAAHPC'10), Tutorial, July 2010.
“Algorithm-based Fault Tolerance for Dense Matrix Factorizations,” University of Tennessee Computer Science Technical Report, no. UT-CS-11-676, Knoxville, TN, August 2011.
“Algorithmic Based Fault Tolerance Applied to High Performance Computing,” University of Tennessee Computer Science Technical Report, no. UT-CS-08-620 (also LAPACK Working Note 205), January 2008.
“Assessing the impact of ABFT and Checkpoint composite strategies,” University of Tennessee Computer Science Technical Report, no. ICL-UT-13-03, 2013.
“Constructing resilient communication infrastructure for runtime environments,” Innovative Computing Laboratory Technical Report, no. ICL-UT-09-02, July 2009.
“Context Identifier Allocation in Open MPI,” University of Tennessee Computer Science Technical Report, no. ICL-UT-16-01: Innovative Computing Laboratory, University of Tennessee, January 2016.
“DAGuE: A generic distributed DAG engine for high performance computing,” Innovative Computing Laboratory Technical Report, no. ICL-UT-10-01, April 2010.