Publications
“Rare Earth Elements and Critical Materials: Uses and Availability,”
Rare Earth Elements and Actinides: Progress in Computational Science Applications, vol. 1388, Washington, DC, American Chemical Society, pp. 63-74, October 2021.
DOI: 10.1021/bk-2021-1388.ch003
“Rare Earth Elements and Actinides: Progress in Computational Science Applications,”
ACS Symposium Series, vol. 1388, Washington, DC, American Chemical Society, October 2021.
DOI: 10.1021/bk-2021-1388
“Randomized Numerical Linear Algebra: A Perspective on the Field with an Eye to Software,”
University of California, Berkeley EECS Technical Report, no. UCB/EECS-2022-258: University of California, Berkeley, November 2022.
DOI: 10.48550/arXiv.2302.11474
“Race to Exascale,”
Computing in Science and Engineering, vol. 21, issue 1, pp. 4-5, March 2019.
DOI: 10.1109/MCSE.2018.2882574
“Quo Vadis MPI RMA? Towards a More Efficient Use of MPI One-Sided Communication,”
EuroMPI'21, Garching, Munich, Germany, 2021.
“QUARK Users' Guide: QUeueing And Runtime for Kernels,”
University of Tennessee Innovative Computing Laboratory Technical Report, no. ICL-UT-11-02, 2011.
“QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators,”
Proceedings of IPDPS 2011, no. ICL-UT-10-04, Anchorage, AK, October 2010.
“QR Factorization for the CELL Processor,”
University of Tennessee Computer Science Technical Report, UT-CS-08-616 (also LAPACK Working Note 201), May 2008.
“QR Factorization for the CELL Processor,”
Scientific Programming, vol. 17, no. 1-2, pp. 31-42, 2010.
“A Python Library for Matrix Algebra on GPU and Multicore Architectures,”
2022 IEEE 19th International Conference on Mobile Ad Hoc and Smart Systems (MASS), Denver, CO, IEEE, December 2022.
DOI: 10.1109/MASS56207.2022.00121
“Pushing the Boundaries of Small Tasks: Scalable Low-Overhead Data-Flow Programming in TTG,”
2022 IEEE International Conference on Cluster Computing (CLUSTER), Heidelberg, Germany, IEEE, September 2022.
DOI: 10.1109/CLUSTER51413.2022.00026
“PULSE: PAPI Unifying Layer for Software-Defined Events (Poster),”
2020 NSF Cyberinfrastructure for Sustained Scientific Innovation (CSSI) Principal Investigator Meeting, Seattle, WA, February 2020.
“PTG: An Abstraction for Unhindered Parallelism,”
International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing (WOLFHPC), New Orleans, LA, IEEE Press, November 2014.
“Providing performance portable numerics for Intel GPUs,”
Concurrency and Computation: Practice and Experience, vol. 17, October 2022.
DOI: 10.1002/cpe.7400
“Providing GPU Capability to LU and QR within the ScaLAPACK Framework,”
University of Tennessee Computer Science Technical Report (also LAWN 272), no. UT-CS-12-699, September 2012.
“Prospectus for the Next LAPACK and ScaLAPACK Libraries: Basic ALgebra LIbraries for Sustainable Technology with Interdisciplinary Collaboration (BALLISTIC),”
LAPACK Working Notes, no. 297, ICL-UT-20-07: University of Tennessee.
“Project-Based Research and Training in High Performance Data Sciences, Data Analytics, and Machine Learning,”
The Journal of Computational Science Education, vol. 11, issue 1, pp. 36-44, January 2020.
DOI: 10.22369/issn.2153-4136/11/1/7
“Progressive Optimization of Batched LU Factorization on GPUs,”
IEEE High Performance Extreme Computing Conference (HPEC’19), Waltham, MA, IEEE, September 2019.
“Programming the LU Factorization for a Multicore System with Accelerators,”
Proceedings of VECPAR’12, Kobe, Japan, April 2012.
“Production Implementations of Pipelined & Communication-Avoiding Iterative Linear Solvers,”
SIAM Conference on Parallel Processing for Scientific Computing, Tokyo, Japan, March 2018.
“Prediction of Optimal Solvers for Sparse Linear Systems Using Deep Learning,”
2022 SIAM Conference on Parallel Processing for Scientific Computing (PP), Philadelphia, PA, Society for Industrial and Applied Mathematics, pp. 14-24.
DOI: 10.1137/1.9781611977141.2
“Predicting MPI Collective Communication Performance Using Machine Learning,”
2020 IEEE International Conference on Cluster Computing (CLUSTER), Kobe, Japan, IEEE, September 2020.
DOI: 10.1109/CLUSTER49012.2020.00036
“Preconditioners for Batched Iterative Linear Solvers on GPUs,”
Smoky Mountains Computational Sciences and Engineering Conference, vol. 169075: Springer Nature Switzerland, pp. 38-53, January 2023.
DOI: 10.1007/978-3-031-23606-8_3
“Preconditioned Krylov Solvers on GPUs,”
Parallel Computing, June 2017.
DOI: 10.1016/j.parco.2017.05.006
“