Publications
CholeskyQR with Randomization and Pivoting for Tall Matrices (CQRRPT)
: arXiv, February 2024.
Performance Insights into Device-initiated RMA Using Kokkos Remote Spaces,”
2023 IEEE International Conference on Cluster Computing Workshops (CLUSTER Workshops), Santa Fe, NM, USA, IEEE, November 2023.
DOI: 10.1109/CLUSTERWorkshops61457.2023.00028
“Review of Performance Analysis Tools for MPI Parallel Programs,”
European Parallel Virtual Machine / Message Passing Interface Users’ Group Meeting, Lecture Notes in Computer Science 2131, Greece, Springer Verlag, Berlin, pp. 241-248, September 2001.
DOI: 10.1007/3-540-45417-9_34 (39.61 KB)
“Performance Profiling and Analysis of DoD Applications using PAPI and TAU,”
Proceedings of DoD HPCMP UGC 2005, Nashville, TN, IEEE, June 2005.
(322.56 KB)
“User-Defined Events for Hardware Performance Monitoring,”
Procedia Computer Science, vol. 4: Elsevier, pp. 2096-2104, May 2011.
DOI: 10.1016/j.procs.2011.04.229 (361.76 KB)
““BDEC Pathways to Convergence: Toward a Shaping Strategy for a Future Software and Data Ecosystem for Scientific Inquiry,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-17-08: University of Tennessee, November 2017.
A Comparison of Counting and Sampling Modes of Using Performance Monitoring Hardware,”
International Conference on Computational Science (ICCS 2002), Amsterdam, Netherlands, Springer, April 2002.
DOI: 10.1007/3-540-46080-2_95 (122 KB)
“Improving the Scaling of an Asynchronous Many-Task Runtime with a Lightweight Communication Engine,”
52nd International Conference on Parallel Processing (ICPP 2023), Salt Lake City, Utah, ACM, September 2023.
DOI: 10.1145/3605573.3605642
“Automating the Large-Scale Collection and Analysis of Performance,”
5th LCI International Conference on Linux Clusters: The HPC Revolution, Austin, Texas, May 2004.
(511.6 KB)
“PerfMiner: Cluster-Wide Collection, Storage and Presentation of Application Level Hardware Performance Data,”
European Conference on Parallel Processing (Euro-Par 2005), Monte de Caparica, Portugal, Springer, September 2005.
DOI: 10.1007/11549468_1 (205.45 KB)
“Randomized Numerical Linear Algebra: A Perspective on the Field with an Eye to Software,”
University of California, Berkeley EECS Technical Report, no. UCB/EECS-2022-258: University of California, Berkeley, November 2022.
DOI: 10.48550/arXiv.2302.11474 (1.05 MB) (1.54 MB)
“A Python Library for Matrix Algebra on GPU and Multicore Architectures,”
2022 IEEE 19th International Conference on Mobile Ad Hoc and Smart Systems (MASS), Denver, CO, IEEE, December 2022.
DOI: 10.1109/MASS56207.2022.00121 (414.36 KB)
“An Improved MAGMA GEMM for Fermi GPUs,”
University of Tennessee Computer Science Technical Report, no. UT-CS-10-655 (also LAPACK working note 227), July 2010.
(486.71 KB)
“Accelerating GPU Kernels for Dense Linear Algebra,”
Proc. of VECPAR'10, Berkeley, CA, June 2010.
(615.07 KB)
“An Improved MAGMA GEMM for Fermi GPUs,”
International Journal of High Performance Computing, vol. 24, no. 4, pp. 511-515, 00 2010.
“Numerical Linear Algebra on Hybrid Architectures: Recent Developments in the MAGMA Project
, Portland, Oregon, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC09), November 2009.
(1.41 MB)
Optimizing Symmetric Dense Matrix-Vector Multiplication on GPUs,”
ACM/IEEE Conference on Supercomputing (SC’11), Seattle, WA, November 2011.
(630.63 KB)
“Autotuning Dense Linear Algebra Libraries on GPUs
, Basel, Switzerland, Sixth International Workshop on Parallel Matrix Algorithms and Applications (PMAA 2010), June 2010.
(579.44 KB)
Evaluating Asynchronous Schwarz Solvers on GPUs,”
International Journal of High Performance Computing Applications, August 2020.
DOI: 10.1177/1094342020946814
“Heterogeneous Streaming,”
The Sixth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), IPDPS 2016, Chicago, IL, IEEE, May 2016.
(2.73 MB)
“MagmaDNN 0.2 High-Performance Data Analytics for Manycore GPUs and CPUs
: University of Tennessee, January 2019.
DOI: 10.13140/RG.2.2.14906.64961 (7.84 MB)
MagmaDNN – High-Performance Data Analytics for Manycore GPUs and CPUs
, Knoxville, TN, 2017 Summer Research Experiences for Undergraduate (REU), Presentation, December 2017.
(5.06 MB)
MagmaDNN: Accelerated Deep Learning Using MAGMA,”
Practice and Experience in Advanced Research Computing (PEARC ’19), Chicago, IL, ACM, July 2019.
(1.09 MB)
“MagmaDNN: Towards High-Performance Data Analytics and Machine Learning for Data-Driven Scientific Computing,”
ISC High Performance, Frankfurt, Germany, Springer International Publishing, June 2019.
DOI: 10.1007/978-3-030-34356-9_37 (1.37 MB) (8.72 MB)
“DeepFreeze: Towards Scalable Asynchronous Checkpointing of Deep Learning Models,”
20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID), Melbourne, VIC, Australia, IEEE, May 2020.
DOI: 10.1109/CCGrid49817.2020.00-76 (424.19 KB)
“Performance Analysis and Debugging Tools at Scale,”
Exascale Scientific Applications: Scalability and Performance Portability: Chapman & Hall / CRC Press, pp. 17-50, November 2017.
DOI: 10.1201/b21930
“Give MPI Threading a Fair Chance: A Study of Multithreaded MPI Designs,”
IEEE Cluster, Albuquerque, NM, IEEE, September 2019.
(220.84 KB)
“Evaluation of Programming Models to Address Load Imbalance on Distributed Multi-Core CPUs: A Case Study with Block Low-Rank Factorization,”
PAW-ATM Workshop at SC19, Denver, CO, ACM, November 2019.
(4.51 MB)
“Communication Avoiding 2D Stencil Implementations over PaRSEC Task-Based Runtime,”
2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), New Orleans, LA, IEEE, May 2020.
DOI: 10.1109/IPDPSW50202.2020.00127 (1.33 MB)
“An Introduction to High Performance Computing and Its Intersection with Advances in Modeling Rare Earth Elements and Actinides,”
Rare Earth Elements and Actinides: Progress in Computational Science Applications, vol. 1388, Washington, DC, American Chemical Society, pp. 3-53, October 2021.
DOI: 10.1021/bk-2021-1388.ch001
“Rare Earth Elements and Critical Materials: Uses and Availability,”
Rare Earth Elements and Actinides: Progress in Computational Science Applications, vol. 1388, Washington, DC, American Chemical Society, pp. 63-74, October 2021.
DOI: 10.1021/bk-2021-1388.ch003
“Rare Earth Elements and Actinides: Progress in Computational Science Applications,”
ACS Symposium Series, vol. 1388, Washington, DC, American Chemical Society, October 2021.
DOI: DOI: 10.1021/bk-2021-1388
“Evaluations of molecular modeling and machine learning for predictive capabilities in binding of lanthanum and actinium with carboxylic acids,”
Journal of Radioanalytical and Nuclear Chemistry, December 2022.
DOI: 10.1007/s10967-022-08620-7
“