Publications
“Ginkgo - A math library designed to accelerate Exascale Computing Project science applications,”
The International Journal of High Performance Computing Applications, August 2024.
DOI: 10.1177/10943420241268323
“Ginkgo—A math library designed for platform portability,”
Parallel Computing, vol. 111, pp. 102902, February 2022.
DOI: 10.1016/j.parco.2022.102902
“Using Jacobi Iterations and Blocking for Solving Sparse Triangular Systems in Incomplete Factorization Preconditioning,”
Journal of Parallel and Distributed Computing, vol. 119, pp. 219–230, November 2018.
DOI: 10.1016/j.jpdc.2018.04.017
“ScaLAPACK: A Portable Linear Algebra Library for Distributed Memory Computers - Design Issues and Performance,”
Computer Physics Communications, vol. 97, issue 1-2, pp. 1-15, August 1996.
DOI: 10.1016/0010-4655(96)00017-3
“Accelerating 2D FFT: Exploit GPU Tensor Cores through Mixed-Precision,”
The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC18), ACM Student Research Poster, Dallas, TX, November 2018.
“Highly Scalable Self-Healing Algorithms for High Performance Scientific Computing,”
IEEE Transactions on Computers, vol. 58, issue 11, pp. 1512-1524, November 2009.
DOI: 10.1109/TC.2009.42
“Porting Sparse Linear Algebra to Intel GPUs,”
Euro-Par 2021: Parallel Processing Workshops, vol. 13098, Lisbon, Portugal, Springer International Publishing, pp. 57–68, June 2022.
DOI: 10.1007/978-3-031-06156-1_5
“SLATE Developers' Guide,”
SLATE Working Notes, no. 11, ICL-UT-19-02: Innovative Computing Laboratory, University of Tennessee, December 2019.
“SLATE Mixed Precision Performance Report,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-19-03: University of Tennessee, April 2019.
“Lossy all-to-all exchange for accelerating parallel 3-D FFTs on hybrid architectures with GPUs,”
2022 IEEE International Conference on Cluster Computing (CLUSTER), pp. 152-160, September 2022.
DOI: 10.1109/CLUSTER51413.2022.00029
“Mixed precision and approximate 3D FFTs: Speed for accuracy trade-off with GPU-aware MPI and run-time data compression,”
ICL Technical Report, no. ICL-UT-22-04, May 2022.
“PMIx: Process Management for Exascale Environments,”
Parallel Computing, vol. 79, pp. 9–29, January 2018.
DOI: 10.1016/j.parco.2018.08.002
“PMIx: Process Management for Exascale Environments,”
Proceedings of the 24th European MPI Users' Group Meeting, New York, NY, USA, ACM, pp. 14:1–14:10, 2017.
DOI: 10.1145/3127024.3127027
“Computing the Expected Makespan of Task Graphs in the Presence of Silent Errors,”
Parallel Computing, vol. 75, pp. 41–60, July 2018.
DOI: 10.1016/j.parco.2018.03.004
“Budget-aware scheduling algorithms for scientific workflows with stochastic task weights on IaaS Cloud platforms,”
Concurrency and Computation: Practice and Experience, vol. 33, no. 17, pp. e6065, 2021.
DOI: 10.1002/cpe.6065
“Design for a Soft Error Resilient Dynamic Task-based Runtime,”
ICL Technical Report, no. ICL-UT-14-04: University of Tennessee, November 2014.
“Leveraging PaRSEC Runtime Support to Tackle Challenging 3D Data-Sparse Matrix Problems,”
35th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2021), Portland, OR, IEEE, May 2021.
“Reshaping Geostatistical Modeling and Prediction for Extreme-Scale Environmental Applications,”
2022 International Conference for High Performance Computing, Networking, Storage and Analysis (SC22), Dallas, TX, IEEE Press, November 2022.
“Flexible Data Redistribution in a Task-Based Runtime System,”
IEEE International Conference on Cluster Computing (Cluster 2020), Kobe, Japan, IEEE, September 2020.
DOI: 10.1109/CLUSTER49012.2020.00032
“Reducing Data Motion and Energy Consumption of Geospatial Modeling Applications Using Automated Precision Conversion,”
2023 IEEE International Conference on Cluster Computing (CLUSTER), Santa Fe, NM, USA, IEEE, November 2023.
DOI: 10.1109/CLUSTER52292.2023.00035
“Performance Analysis of Tile Low-Rank Cholesky Factorization Using PaRSEC Instrumentation Tools,”
Workshop on Programming and Performance Visualization Tools (ProTools 19) at SC19, Denver, CO, ACM, November 2019.
“A Framework to Exploit Data Sparsity in Tile Low-Rank Cholesky Factorization,”
IEEE International Parallel and Distributed Processing Symposium (IPDPS), July 2022.
DOI: 10.1109/IPDPS53621.2022.00047
“Extreme-Scale Task-Based Cholesky Factorization Toward Climate and Weather Prediction Applications,”
Platform for Advanced Scientific Computing Conference (PASC20), Geneva, Switzerland, ACM, June 2020.
DOI: 10.1145/3394277.3401846