Publications
“Batched sparse and mixed-precision linear algebra interface for efficient use of GPU hardware accelerators in scientific applications,”
Future Generation Computer Systems, vol. 160, pp. 359 - 374, November 2024.
DOI: 10.1016/j.future.2024.06.004
“CholeskyQR with Randomization and Pivoting for Tall Matrices (CQRRPT),”
arXiv, February 2024.
“The co-evolution of computational physics and high-performance computing,”
Nature Reviews Physics, August 2024.
DOI: 10.1038/s42254-024-00750-z
“Computation at the Cutting Edge of Science,”
Journal of Computational Science, June 2024.
DOI: 10.1016/j.jocs.2024.102379
“Economical Quasi-Newton Unitary Optimization of Electronic Orbitals,”
Physical Chemistry Chemical Physics, December 2023, 2024.
DOI: 10.1039/D3CP05557D
“Evolution of the SLATE linear algebra library,”
The International Journal of High Performance Computing Applications, September 2024.
DOI: 10.1177/10943420241286531
“Ginkgo - A math library designed to accelerate Exascale Computing Project science applications,”
The International Journal of High Performance Computing Applications, August 2024.
DOI: 10.1177/10943420241268323
“MAGMA: Enabling exascale performance with accelerated BLAS and LAPACK for diverse GPU architectures,”
The International Journal of High Performance Computing Applications, June 2024.
DOI: 10.1177/10943420241261960
“Multi-GPU work sharing in a task-based dataflow programming model,”
Future Generation Computer Systems, vol. 156, pp. 313 - 324, July 2024.
DOI: 10.1016/j.future.2024.03.017
“Numerical eigen-spectrum slicing, accurate orthogonal eigen-basis, and mixed-precision eigenvalue refinement using OpenMP data-dependent tasks and accelerator offload,”
The International Journal of High Performance Computing Applications, vol. 303, issue 136, September 2024.
DOI: 10.1177/10943420241281050
“PaRSEC: Scalability, flexibility, and hybrid architecture support for task-based applications in ECP,”
The International Journal of High Performance Computing Applications, October 2024.
DOI: 10.1177/10943420241290520
“A survey on checkpointing strategies: Should we always checkpoint à la Young/Daly?,”
Future Generation Computer Systems, July 2024.
DOI: 10.1016/j.future.2024.07.022
“Taking the MPI standard and the open MPI library to exascale,”
The International Journal of High Performance Computing Applications, July 2024.
DOI: 10.1177/10943420241265936
“Then and Now: Improving Software Portability, Productivity, and 100× Performance,”
Computing in Science & Engineering, pp. 1 - 10, April 2024.
DOI: 10.1109/MCSE.2024.3387302
“XaaS: Acceleration as a Service to Enable Productive High-Performance Cloud Computing,”
arXiv, January 2024.
“AI Benchmarking for Science: Efforts from the MLCommons Science Working Group,”
Lecture Notes in Computer Science, vol. 13387: Springer International Publishing, pp. 47 - 64, January 2023.
DOI: 10.1007/978-3-031-23220-6_4
“Cache Optimization and Performance Modeling of Batched, Small, and Rectangular Matrix Multiplication on Intel, AMD, and Fujitsu Processors,”
ACM Transactions on Mathematical Software, vol. 49, issue 3, pp. 1 - 29, September 2023.
DOI: 10.1145/3595178
“Combining multitask and transfer learning with deep Gaussian processes for autotuning-based performance engineering,”
The International Journal of High Performance Computing Applications, March 2023.
DOI: 10.1177/10943420231166365
“Direct Determination of Optimal Real-Space Orbitals for Correlated Electronic Structure of Molecules,”
Journal of Chemical Theory and Computation, vol. 19, issue 20, pp. 7230 - 7241, October 2023.
DOI: 10.1021/acs.jctc.3c00732
“O(N) distributed direct factorization of structured dense matrices using runtime systems,”
52nd International Conference on Parallel Processing (ICPP 2023), Salt Lake City, Utah, ACM, August 2023.
DOI: 10.1145/3605573.3605606
“Earth Virtualization Engines - A Technical Perspective,”
September 2023.
“Elastic deep learning through resilient collective operations,”
SC-W 2023: Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Denver, CO, ACM, November 2023.
DOI: 10.1145/3624062.3626080
“Generalizing Random Butterfly Transforms to Arbitrary Matrix Sizes,”
arXiv, December 2023.
“GPU-based LU Factorization and Solve on Batches of Matrices with Band Structure,”
SC-W 2023: Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Denver, CO, ACM, November 2023.
DOI: 10.1145/3624062.3624247
“HPC Forecast: Cloudy and Uncertain,”
Communications of the ACM, vol. 66, issue 2, pp. 82 - 90, January 2023.
DOI: 10.1145/3552309
“Improving the Scaling of an Asynchronous Many-Task Runtime with a Lightweight Communication Engine,”
52nd International Conference on Parallel Processing (ICPP 2023), Salt Lake City, Utah, ACM, September 2023.
DOI: 10.1145/3605573.3605642
“Memory Traffic and Complete Application Profiling with PAPI Multi-Component Measurements,”
St. Petersburg, FL, 28th HIPS Workshop, May 2023.
“Memory Traffic and Complete Application Profiling with PAPI Multi-Component Measurements,”
2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), St. Petersburg, Florida, IEEE, August 2023.
DOI: 10.1109/IPDPSW59300.2023.00070
“Mixed Precision Algebraic Multigrid on GPUs,”
Parallel Processing and Applied Mathematics (PPAM 2022), vol. 13826, Cham, Springer International Publishing, April 2023.
DOI: 10.1007/978-3-031-30442-2_9
“MPI Continuations And How To Invoke Them,”
Sustained Simulation Performance 2021, Cham, Springer International Publishing, pp. 67 - 83, February 2023.
DOI: 10.1007/978-3-031-18046-0_5
“PAQR: Pivoting Avoiding QR factorization,”
2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS), St. Petersburg, FL, USA, IEEE, 2023.
DOI: 10.1109/IPDPS54959.2023.00040
“Parallel Symbolic Cholesky Factorization,”
SC-W 2023: Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Denver, CO, ACM, November 2023.
DOI: 10.1145/3624062.3624253
“Performance Insights into Device-initiated RMA Using Kokkos Remote Spaces,”
2023 IEEE International Conference on Cluster Computing Workshops (CLUSTER Workshops), Santa Fe, NM, USA, IEEE, November 2023.
DOI: 10.1109/CLUSTERWorkshops61457.2023.00028
“Preconditioners for Batched Iterative Linear Solvers on GPUs,”
Smoky Mountains Computational Sciences and Engineering Conference, vol. 169075: Springer Nature Switzerland, pp. 38 - 53, January 2023.
DOI: 10.1007/978-3-031-23606-8_3
“Reducing Data Motion and Energy Consumption of Geospatial Modeling Applications Using Automated Precision Conversion,”
2023 IEEE International Conference on Cluster Computing (CLUSTER), Santa Fe, NM, USA, IEEE, November 2023.
DOI: 10.1109/CLUSTER52292.2023.00035
“Revisiting I/O bandwidth-sharing strategies for HPC applications,”
INRIA Research Report, no. RR-9502: INRIA, March 2023.
“Sparse matrix-vector and matrix-multivector products for the truncated SVD on graphics processors,”
Concurrency and Computation: Practice and Experience, August 2023.
DOI: 10.1002/cpe.7871
“Synchronizing MPI Processes in Space and Time,”
EUROMPI '23: 30th European MPI Users' Group Meeting, Bristol, United Kingdom, ACM, September 2023.
DOI: 10.1145/3615318.3615325
“Task-Based Polar Decomposition Using SLATE on Massively Parallel Systems with Hardware Accelerators,”
SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Denver, CO, ACM, November 2023.
DOI: 10.1145/3624062.3624248
“Three-precision algebraic multigrid on GPUs,”
Future Generation Computer Systems, July 2023.
DOI: 10.1016/j.future.2023.07.024
“Using Additive Modifications in LU Factorization Instead of Pivoting,”
37th ACM International Conference on Supercomputing (ICS'23), Orlando, FL, ACM, June 2023.
DOI: 10.1145/3577193.3593731
“Using Ginkgo's memory accessor for improving the accuracy of memory‐bound low precision BLAS,”
Software: Practice and Experience, vol. 53, issue 1, pp. 81 - 98, January 2023.
DOI: 10.1002/spe.3041
“When to checkpoint at the end of a fixed-length reservation?,”
Fault Tolerance for HPC at eXtreme Scales (FTXS) Workshop, Denver, United States, August 2023.
“Accelerating Geostatistical Modeling and Prediction With Mixed-Precision Computations: A High-Productivity Approach With PaRSEC,”
IEEE Transactions on Parallel and Distributed Systems, vol. 33, issue 4, pp. 964 - 976, April 2022.
DOI: 10.1109/TPDS.2021.3084071
“Addressing Irregular Patterns of Matrix Computations on GPUs and Their Impact on Applications Powered by Sparse Direct Solvers,”
2022 International Conference for High Performance Computing, Networking, Storage and Analysis (SC22), Dallas, TX, IEEE Computer Society, pp. 354-367, November 2022.
“Analysis of the Communication and Computation Cost of FFT Libraries towards Exascale,”
ICL Technical Report, no. ICL-UT-22-07: Innovative Computing Laboratory, July 2022.
“Approximate Computing for Scientific Applications,”
Approximate Computing Techniques, 322: Springer International Publishing, pp. 415 - 465, January 2022.
DOI: 10.1007/978-3-030-94705-7_14
“Batch QR Factorization on GPUs: Design, Optimization, and Tuning,”
Lecture Notes in Computer Science, vol. 13350, Cham, Springer International Publishing, June 2022.
DOI: 10.1007/978-3-031-08751-6_5
“Batched sparse iterative solvers on GPU for the collision operator for fusion plasma simulations,”
2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Lyon, France, IEEE, July 2022.
DOI: 10.1109/IPDPS53621.2022.00024
“Checkpointing à la Young/Daly: An Overview,”
IC3-2022: Proceedings of the 2022 Fourteenth International Conference on Contemporary Computing, Noida, India, ACM Press, pp. 701-710, August 2022.
DOI: 10.1145/3549206
“