Publications
“Accelerating Homotopy Continuation with GPUs: Application to Trifocal Pose Estimation,”
2025 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Milano, Italy, IEEE, July 2025.
DOI: 10.1109/IPDPS64566.2025.00110
“Accelerating Supercomputing: AI-Hardware-Driven Innovation for Speed and Efficiency,”
2025 IEEE High Performance Extreme Computing Conference (HPEC), Wakefield, MA, USA, IEEE, October 2025.
DOI: 10.1109/HPEC67600.2025.11196413
Analysis of Floating-Point Matrix Multiplication Computed via Integer Arithmetic
arXiv, June 2025.
“Durable Engines of Discovery,”
Communications of the ACM, September 2025.
DOI: 10.1145/3749367
“Efficient Embedding Initialization via Dominant Eigenvector Projections,”
SC Workshops '25: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, St. Louis, MO, USA, ACM, November 2025.
DOI: 10.1145/3731599.3767541
“Evolution of the computational science community: The dynamics of topics and collaborations in 24 years of ICCS and JoCS publications,”
Journal of Computational Science, vol. 89, July 2025.
DOI: 10.1016/j.jocs.2025.102609
SpikeRL: A Scalable and Energy-efficient Framework for Deep Spiking Reinforcement Learning
arXiv, February 2025.
“Accelerating Fusion Plasma Collision Operator Solves with Portable Batched Iterative Solvers on GPUs,”
ISC High Performance 2024 International Workshops, vol. 15058, Hamburg, Germany, Springer, Cham, pp. 127 - 140, December 2024.
DOI: 10.1007/978-3-031-73716-9
“Advancements of PAPI for the exascale generation,”
The International Journal of High Performance Computing Applications, December 2024.
DOI: 10.1177/10943420241303884
“Asynchrony and Failure Masking via Pseudo-Local Process Recovery in MPI Applications,”
2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), San Francisco, CA, USA, IEEE, May 2024.
DOI: 10.1109/IPDPSW63119.2024.00193
“Automated Data Analysis for Defining Performance Metrics from Raw Hardware Events,”
2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), San Francisco, CA, USA, IEEE, May 2024.
DOI: 10.1109/IPDPSW63119.2024.00134
“Batched sparse and mixed-precision linear algebra interface for efficient use of GPU hardware accelerators in scientific applications,”
Future Generation Computer Systems, vol. 160, pp. 359 - 374, November 2024.
DOI: 10.1016/j.future.2024.06.004
“Boosting Earth System Model Outputs And Saving PetaBytes in Their Storage Using Exascale Climate Emulators,”
Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis: IEEE Press, 2024.
DOI: 10.1109/SC41406.2024.00008
CholeskyQR with Randomization and Pivoting for Tall Matrices (CQRRPT)
arXiv, February 2024.
“The co-evolution of computational physics and high-performance computing,”
Nature Reviews Physics, August 2024.
DOI: 10.1038/s42254-024-00750-z
“Computation at the Cutting Edge of Science,”
Journal of Computational Science, June 2024.
DOI: 10.1016/j.jocs.2024.102379
“Economical Quasi-Newton Unitary Optimization of Electronic Orbitals,”
Physical Chemistry Chemical Physics, December 2023, 2024.
DOI: 10.1039/D3CP05557D
“Evaluating PaRSEC Through Matrix Computations in Scientific Applications,”
Asynchronous Many-Task Systems and Applications - Second International Workshop, WAMTA 2024, Knoxville, TN, USA, February 14-16, 2024, Proceedings, vol. 14626: Springer, pp. 22–33, 2024.
DOI: 10.1007/978-3-031-61763-8_3
“Evolution of the SLATE linear algebra library,”
The International Journal of High Performance Computing Applications, September 2024.
DOI: 10.1177/10943420241286531
“Ginkgo - A math library designed to accelerate Exascale Computing Project science applications,”
The International Journal of High Performance Computing Applications, August 2024.
DOI: 10.1177/10943420241268323
Hardware Trends Impacting Floating-Point Computations In Scientific Applications
arXiv, December 2024.
Interface for Sparse Linear Algebra Operations
November 2024.
DOI: 10.48550/arXiv.2411.13259
“MAGMA: Enabling exascale performance with accelerated BLAS and LAPACK for diverse GPU architectures,”
The International Journal of High Performance Computing Applications, June 2024.
DOI: 10.1177/10943420241261960
“Multi-GPU work sharing in a task-based dataflow programming model,”
Future Generation Computer Systems, vol. 156, pp. 313 - 324, July 2024.
DOI: 10.1016/j.future.2024.03.017
“Numerical eigen-spectrum slicing, accurate orthogonal eigen-basis, and mixed-precision eigenvalue refinement using OpenMP data-dependent tasks and accelerator offload,”
The International Journal of High Performance Computing Applications, vol. 303, issue 136, September 2024.
DOI: 10.1177/10943420241281050
“PaRSEC: Scalability, flexibility, and hybrid architecture support for task-based applications in ECP,”
The International Journal of High Performance Computing Applications, October 2024.
DOI: 10.1177/10943420241290520
“A survey on checkpointing strategies: Should we always checkpoint à la Young/Daly?,”
Future Generation Computer Systems, July 2024.
DOI: 10.1016/j.future.2024.07.022
“Taking the MPI standard and the open MPI library to exascale,”
The International Journal of High Performance Computing Applications, July 2024.
DOI: 10.1177/10943420241265936
“Then and Now: Improving Software Portability, Productivity, and 100× Performance,”
Computing in Science & Engineering, pp. 1 - 10, April 2024.
DOI: 10.1109/MCSE.2024.3387302
“Toward Capturing Genetic Epistasis From Multivariate Genome-Wide Association Studies Using Mixed-Precision Kernel Ridge Regression,”
Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis: IEEE Press, 2024.
DOI: 10.1109/SC41406.2024.00012
“Towards Scalable and Efficient Spiking Reinforcement Learning for Continuous Control Tasks,”
2024 International Conference on Neuromorphic Systems (ICONS), Arlington, VA, USA, IEEE, 2024.
DOI: 10.1109/ICONS62911.2024.00057
“XaaS: Acceleration as a Service to Enable Productive High-Performance Cloud Computing,”
Computing in Science & Engineering, vol. 26, issue 3, pp. 40 - 51, July 2024.
DOI: 10.1109/MCSE.2024.3382154
“AI Benchmarking for Science: Efforts from the MLCommons Science Working Group,”
Lecture Notes in Computer Science, vol. 13387: Springer International Publishing, pp. 47 - 64, January 2023.
DOI: 10.1007/978-3-031-23220-6_4
“Cache Optimization and Performance Modeling of Batched, Small, and Rectangular Matrix Multiplication on Intel, AMD, and Fujitsu Processors,”
ACM Transactions on Mathematical Software, vol. 49, issue 3, pp. 1 - 29, September 2023.
DOI: 10.1145/3595178
“Combining multitask and transfer learning with deep Gaussian processes for autotuning-based performance engineering,”
The International Journal of High Performance Computing Applications, March 2023.
DOI: 10.1177/10943420231166365
“Direct Determination of Optimal Real-Space Orbitals for Correlated Electronic Structure of Molecules,”
Journal of Chemical Theory and Computation, vol. 19, issue 20, pp. 7230 - 7241, October 2023.
DOI: 10.1021/acs.jctc.3c00732
“O(N) distributed direct factorization of structured dense matrices using runtime systems,”
52nd International Conference on Parallel Processing (ICPP 2023), Salt Lake City, Utah, ACM, August 2023.
DOI: 10.1145/3605573.3605606
Earth Virtualization Engines - A Technical Perspective
September 2023.
“Elastic deep learning through resilient collective operations,”
SC-W 2023: Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Denver, CO, ACM, November 2023.
DOI: 10.1145/3624062.3626080
Generalizing Random Butterfly Transforms to Arbitrary Matrix Sizes
arXiv, December 2023.
“GPU-based LU Factorization and Solve on Batches of Matrices with Band Structure,”
SC-W 2023: Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Denver, CO, ACM, November 2023.
DOI: 10.1145/3624062.3624247
“HPC Forecast: Cloudy and Uncertain,”
Communications of the ACM, vol. 66, issue 2, pp. 82 - 90, January 2023.
DOI: 10.1145/3552309
“Improving the Scaling of an Asynchronous Many-Task Runtime with a Lightweight Communication Engine,”
52nd International Conference on Parallel Processing (ICPP 2023), Salt Lake City, Utah, ACM, September 2023.
DOI: 10.1145/3605573.3605642
“Memory Traffic and Complete Application Profiling with PAPI Multi-Component Measurements,”
2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), St. Petersburg, Florida, IEEE, August 2023.
DOI: 10.1109/IPDPSW59300.2023.00070
Memory Traffic and Complete Application Profiling with PAPI Multi-Component Measurements
St. Petersburg, FL, 28th HIPS Workshop, May 2023.
“Mixed Precision Algebraic Multigrid on GPUs,”
Parallel Processing and Applied Mathematics (PPAM 2022), vol. 13826, Cham, Springer International Publishing, April 2023.
DOI: 10.1007/978-3-031-30442-2_9
“MPI Continuations And How To Invoke Them,”
Sustained Simulation Performance 2021, Cham, Springer International Publishing, pp. 67 - 83, February 2023.
DOI: 10.1007/978-3-031-18046-0_5
“PAQR: Pivoting Avoiding QR factorization,”
2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS), St. Petersburg, FL, USA, IEEE, 2023.
DOI: 10.1109/IPDPS54959.2023.00040
“Parallel Symbolic Cholesky Factorization,”
SC-W 2023: Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Denver, CO, ACM, November 2023.
DOI: 10.1145/3624062.3624253
“Performance Insights into Device-initiated RMA Using Kokkos Remote Spaces,”
2023 IEEE International Conference on Cluster Computing Workshops (CLUSTER Workshops), Santa Fe, NM, USA, IEEE, November 2023.
DOI: 10.1109/CLUSTERWorkshops61457.2023.00028




