Publications
HPC Forecast: Cloudy and Uncertain,”
Communications of the ACM, vol. 66, issue 2, pp. 82 - 90, January 2023.
DOI: 10.1145/3552309
“How to Build Your Own Deep Neural Network
: PEARC20, July 2020.
(18.8 MB)

hipMAGMA v2.0
: Zenodo, July 2020.
DOI: 10.5281/zenodo.3928667
hipMAGMA v1.0
: Zenodo, March 2020.
DOI: 10.5281/zenodo.3908549
High-Performance Tensor Contractions for GPUs,”
International Conference on Computational Science (ICCS'16), San Diego, CA, June 2016.
(2.36 MB)
“
High-performance Matrix-matrix Multiplications of Very Small Matrices,”
22nd International European Conference on Parallel and Distributed Computing (Euro-Par'16), Grenoble, France, Springer International Publishing, August 2016.
“High-Performance High-Resolution Semi-Lagrangian Tracer Transport on a Sphere,”
Journal of Computational Physics, vol. 230, issue 17, pp. 6778-6799, July 2011.
DOI: 10.1016/j.jcp.2011.05.008
(1.68 MB)
“
High-Performance GPU Implementation of PageRank with Reduced Precision based on Mantissa Segmentation,”
8th Workshop on Irregular Applications: Architectures and Algorithms, 2018.
“High-performance Cholesky Factorization for GPU-only Execution,”
Proceedings of the General Purpose GPUs (GPGPU-10), Austin, TX, ACM, February 2017.
DOI: 10.1145/3038228.3038237
(872.18 KB)
“
High-Order Finite Element Method using Standard and Device-Level Batch GEMM on GPUs,”
2020 IEEE/ACM 11th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA): IEEE, November 2020.
(1.3 MB)
“
Highly Scalable Self-Healing Algorithms for High Performance Scientific Computing,”
IEEE Transactions on Computers, vol. 58, issue 11, pp. 1512-1524, November 2009.
DOI: 10.1109/TC.2009.42
(1.81 MB)
“
High Performance Bidiagonal Reduction using Tile Algorithms on Homogeneous Multicore Architectures,”
University of Tennessee Computer Science Technical Report, UT-CS-11-673, (also Lawn 247), May 2011.
(424.93 KB)
“
Hierarchical QR Factorization Algorithms for Multi-Core Cluster Systems,”
University of Tennessee Computer Science Technical Report (also Lawn 257), no. UT-CS-11-684, October 2011.
(405.71 KB)
“
Hierarchical QR Factorization Algorithms for Multi-Core Cluster Systems,”
IPDPS 2012, the 26th IEEE International Parallel and Distributed Processing Symposium, Shanghai, China, IEEE Computer Society Press, May 2012.
(405.71 KB)
“
Hierarchical QR Factorization Algorithms for Multi-core Cluster Systems,”
Parallel Computing, vol. 39, issue 4-5, pp. 212-232, May 2013.
(1.43 MB)
“
Hierarchical DAG scheduling for Hybrid Distributed Systems,”
29th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Hyderabad, India, IEEE, May 2015.
(1.11 MB)
“
Heterogeneous Streaming,”
The Sixth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), IPDPS 2016, Chicago, IL, IEEE, May 2016.
(2.73 MB)
“
heFFTe: Highly Efficient FFT for Exascale (Poster)
, Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
(6.2 MB)
