Publications
“High-Performance Computing,”
The Princeton Companion to Applied Mathematics, Princeton, New Jersey, Princeton University Press, pp. 839-842, 2015.
“The Design and Performance of Batched BLAS on Modern High-Performance Computing Systems,”
International Conference on Computational Science (ICCS 2017), Zürich, Switzerland, Elsevier, June 2017.
“Harnessing GPU Tensor Cores for Fast FP16 Arithmetic to Speed up Mixed-Precision Iterative Refinement Solvers,”
The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC18), Dallas, TX, IEEE, November 2018.
“Optimized Batched Linear Algebra for Modern Architectures,”
Euro-Par 2017, Santiago de Compostela, Spain, Springer, August 2017.
“Adaptive Precision in Block-Jacobi Preconditioning for Iterative Sparse Linear System Solvers,”
Concurrency and Computation: Practice and Experience, vol. 31, no. 6, pp. e4460, March 2019.
“Mixed-Precision Iterative Refinement using Tensor Cores on GPUs to Accelerate Solution of Linear Systems,”
Proceedings of the Royal Society A, vol. 476, no. 2243, November 2020.
“Numerical Algorithms for High-Performance Computational Science,”
Philosophical Transactions of the Royal Society A, vol. 378, no. 2166, 2020.
“A Set of Batched Basic Linear Algebra Subprograms,”
ACM Transactions on Mathematical Software, October 2020.
“Batched BLAS (Basic Linear Algebra Subprograms) 2018 Specification,”
July 2018.
“Mixed-Precision Solution of Linear Systems Using Accelerator-Based Computing,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-20-05: University of Tennessee, May 2020.
“A Survey of Numerical Methods Utilizing Mixed Precision Arithmetic,”
SLATE Working Notes, no. 15, ICL-UT-20-08: University of Tennessee, July 2020.
“