Publications
“Accelerating Fusion Plasma Collision Operator Solves with Portable Batched Iterative Solvers on GPUs,”
ISC High Performance 2024 International Workshops, vol. 15058, Hamburg, Germany, Springer, Cham, pp. 127 - 140, December 2024.
DOI: 10.1007/978-3-031-73716-9
“Batched sparse and mixed-precision linear algebra interface for efficient use of GPU hardware accelerators in scientific applications,”
Future Generation Computer Systems, vol. 160, pp. 359 - 374, November 2024.
DOI: 10.1016/j.future.2024.06.004
“Evolution of the SLATE linear algebra library,”
The International Journal of High Performance Computing Applications, September 2024.
DOI: 10.1177/10943420241286531
“Ginkgo - A math library designed to accelerate Exascale Computing Project science applications,”
The International Journal of High Performance Computing Applications, August 2024.
DOI: 10.1177/10943420241268323
“MAGMA: Enabling exascale performance with accelerated BLAS and LAPACK for diverse GPU architectures,”
The International Journal of High Performance Computing Applications, June 2024.
DOI: 10.1177/10943420241261960
“Then and Now: Improving Software Portability, Productivity, and 100× Performance,”
Computing in Science & Engineering, pp. 1 - 10, April 2024.
DOI: 10.1109/MCSE.2024.3387302
“AI Benchmarking for Science: Efforts from the MLCommons Science Working Group,”
Lecture Notes in Computer Science, vol. 13387: Springer International Publishing, pp. 47 - 64, January 2023.
DOI: 10.1007/978-3-031-23220-6_4
“Earth Virtualization Engines - A Technical Perspective,”
September 2023.
“GPU-based LU Factorization and Solve on Batches of Matrices with Band Structure,”
SC-W 2023: Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Denver, CO, ACM, November 2023.
DOI: 10.1145/3624062.3624247
“Mixed Precision Algebraic Multigrid on GPUs,”
Parallel Processing and Applied Mathematics (PPAM 2022), vol. 13826, Cham, Springer International Publishing, April 2023.
DOI: 10.1007/978-3-031-30442-2_9
“PAQR: Pivoting Avoiding QR factorization,”
2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS), St. Petersburg, FL, USA, IEEE, 2023.
DOI: 10.1109/IPDPS54959.2023.00040
“Parallel Symbolic Cholesky Factorization,”
SC-W 2023: Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Denver, CO, ACM, November 2023.
DOI: 10.1145/3624062.3624253
“Preconditioners for Batched Iterative Linear Solvers on GPUs,”
Smoky Mountains Computational Sciences and Engineering Conference, vol. 169075: Springer Nature Switzerland, pp. 38 - 53, January 2023.
DOI: 10.1007/978-3-031-23606-8_3
“Reducing Data Motion and Energy Consumption of Geospatial Modeling Applications Using Automated Precision Conversion,”
2023 IEEE International Conference on Cluster Computing (CLUSTER), Santa Fe, NM, USA, IEEE, November 2023.
DOI: 10.1109/CLUSTER52292.2023.00035
“Sparse matrix-vector and matrix-multivector products for the truncated SVD on graphics processors,”
Concurrency and Computation: Practice and Experience, August 2023.
DOI: 10.1002/cpe.7871
“Task-Based Polar Decomposition Using SLATE on Massively Parallel Systems with Hardware Accelerators,”
SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Denver, CO, ACM, November 2023.
DOI: 10.1145/3624062.3624248
“Three-precision algebraic multigrid on GPUs,”
Future Generation Computer Systems, July 2023.
DOI: 10.1016/j.future.2023.07.024
“Using Ginkgo's memory accessor for improving the accuracy of memory‐bound low precision BLAS,”
Software: Practice and Experience, vol. 53, issue 1, pp. 81 - 98, January 2023.
DOI: 10.1002/spe.3041
“Accelerating Geostatistical Modeling and Prediction With Mixed-Precision Computations: A High-Productivity Approach With PaRSEC,”
IEEE Transactions on Parallel and Distributed Systems, vol. 33, issue 4, pp. 964 - 976, April 2022.
DOI: 10.1109/TPDS.2021.3084071
“Addressing Irregular Patterns of Matrix Computations on GPUs and Their Impact on Applications Powered by Sparse Direct Solvers,”
2022 International Conference for High Performance Computing, Networking, Storage and Analysis (SC22), Dallas, TX, IEEE Computer Society, pp. 354-367, November 2022.
“Analysis of the Communication and Computation Cost of FFT Libraries towards Exascale,”
ICL Technical Report, no. ICL-UT-22-07: Innovative Computing Laboratory, July 2022.
“Approximate Computing for Scientific Applications,”
Approximate Computing Techniques, 322: Springer International Publishing, pp. 415 - 465, January 2022.
DOI: 10.1007/978-3-030-94705-7_14
“Batch QR Factorization on GPUs: Design, Optimization, and Tuning,”
Lecture Notes in Computer Science, vol. 13350, Cham, Springer International Publishing, June 2022.
DOI: 10.1007/978-3-031-08751-6_5
“Batched sparse iterative solvers on GPU for the collision operator for fusion plasma simulations,”
2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Lyon, France, IEEE, July 2022.
DOI: 10.1109/IPDPS53621.2022.00024
“Communication Avoiding LU with Tournament Pivoting in SLATE,”
SLATE Working Notes, no. 18, ICL-UT-22-01, January 2022.
“Compressed basis GMRES on high-performance graphics processing units,”
The International Journal of High Performance Computing Applications, May 2022.
DOI: 10.1177/10943420221115140
“Compression and load balancing for efficient sparse matrix‐vector product on multicore processors and graphics processing units,”
Concurrency and Computation: Practice and Experience, vol. 34, issue 14, June 2022.
DOI: 10.1002/cpe.6515
“Deep Gaussian process with multitask and transfer learning for performance optimization,”
2022 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1-7, September 2022.
DOI: 10.1109/HPEC55821.2022.9926396
“FFT Benchmark Performance Experiments on Systems Targeting Exascale,”
ICL Technical Report, no. ICL-UT-22-02, March 2022.
“A Framework to Exploit Data Sparsity in Tile Low-Rank Cholesky Factorization,”
IEEE International Parallel and Distributed Processing Symposium (IPDPS), July 2022.
DOI: 10.1109/IPDPS53621.2022.00047
“Ginkgo: A Modern Linear Operator Algebra Framework for High Performance Computing,”
ACM Transactions on Mathematical Software, vol. 48, issue 12, pp. 1 - 33, March 2022.
DOI: 10.1145/3480935
“Ginkgo—A math library designed for platform portability,”
Parallel Computing, vol. 111, pp. 102902, February 2022.
DOI: 10.1016/j.parco.2022.102902
“Lossy all-to-all exchange for accelerating parallel 3-D FFTs on hybrid architectures with GPUs,”
2022 IEEE International Conference on Cluster Computing (CLUSTER), pp. 152-160, September 2022.
DOI: 10.1109/CLUSTER51413.2022.00029
“Mixed precision and approximate 3D FFTs: Speed for accuracy trade-off with GPU-aware MPI and run-time data compression,”
ICL Technical Report, no. ICL-UT-22-04, May 2022.
“PAQR: Pivoting Avoiding QR factorization,”
ICL Technical Report, no. ICL-UT-22-06, June 2022.
“Performance Analysis of Parallel FFT on Large Multi-GPU Systems,”
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Lyon, France, IEEE, August 2022.
DOI: 10.1109/IPDPSW55747.2022.00072
“