Publications
“A New Metric for Ranking High-Performance Computing Systems,”
National Science Review, vol. 3, issue 1, pp. 30-35, January 2016.
DOI: 10.1093/nsr/nwv084
“New Robust ScaLAPACK Routine for Computing the QR Factorization with Column Pivoting,”
LAPACK Working Note, no. LAWN 296, ICL-UT-19-14: University of Tennessee, October 2019.
“Non-Determinism and Overcount on Modern Hardware Performance Counter Implementations,”
2013 IEEE International Symposium on Performance Analysis of Systems and Software, Austin, TX, IEEE, April 2013.
“A Note on Auto-tuning GEMM for GPUs,”
9th International Conference on Computational Science (ICCS 2009), no. 5544-5545, Baton Rouge, LA, pp. 884-892, May 2009.
DOI: 10.1007/978-3-642-01970-8_89
“Novel HPC Techniques to Batch Execution of Many Variable Size BLAS Computations on GPUs,”
International Conference on Supercomputing (ICS '17), Chicago, Illinois, ACM, June 2017.
DOI: 10.1145/3079079.3079103
“Numerical Algorithms for High-Performance Computational Science,”
Philosophical Transactions of the Royal Society A, vol. 378, issue 2166, 2020.
DOI: 10.1098/rsta.2019.0066
“Numerical eigen-spectrum slicing, accurate orthogonal eigen-basis, and mixed-precision eigenvalue refinement using OpenMP data-dependent tasks and accelerator offload,”
The International Journal of High Performance Computing Applications, vol. 303, issue 136, September 2024.
DOI: 10.1177/10943420241281050
“Numerical Linear Algebra for High-Performance Computers,”
Software, Environments and Tools: SIAM, 1998.
DOI: 10.1137/1.9780898719611
“Numerical Linear Algebra on Emerging Architectures: The PLASMA and MAGMA Projects,”
Journal of Physics: Conference Series, vol. 180, 2009.
“Numerical Linear Algebra on Emerging Architectures: The PLASMA and MAGMA Projects,”
The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC09), Portland, OR, November 2009.
“Numerical Linear Algebra on Hybrid Architectures: Recent Developments in the MAGMA Project,”
The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC09), Portland, OR, November 2009.
“One-Sided Dense Matrix Factorizations on a Multicore with Multiple GPU Accelerators,”
The International Conference on Computational Science (ICCS), June 2012.
“OpenCL Evaluation for Numerical Linear Algebra Library Development,”
Symposium on Application Accelerators in High-Performance Computing (SAAHPC '10), Knoxville, TN, July 2010.
“OpenDIEL: A Parallel Workflow Engine and Data Analytics Framework,”
Practice and Experience in Advanced Research Computing (PEARC ’19), Chicago, IL, ACM, July 2019.
“OpenMP application experiences: Porting to accelerated nodes,”
Parallel Computing, vol. 109, March 2022.
DOI: 10.1016/j.parco.2021.102856
“Optimal Checkpointing Strategies for Iterative Applications,”
IEEE Transactions on Parallel and Distributed Systems, vol. 33, issue 3, pp. 507-522, March 2022.
DOI: 10.1109/TPDS.2021.3099440
“Optimal Cooperative Checkpointing for Shared High-Performance Computing Platforms,”
2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Best Paper Award, Vancouver, BC, Canada, IEEE, May 2018.
DOI: 10.1109/IPDPSW.2018.00127
“Optimization and Performance Evaluation of the IDR Iterative Krylov Solver on GPUs,”
The International Journal of High Performance Computing Applications, vol. 32, no. 2, pp. 220–230, March 2018.
DOI: 10.1177/1094342016646844