Publications
“AI Benchmarking for Science: Efforts from the MLCommons Science Working Group,”
Lecture Notes in Computer Science, vol. 13387: Springer International Publishing, pp. 47 - 64, January 2023.
“Cache Optimization and Performance Modeling of Batched, Small, and Rectangular Matrix Multiplication on Intel, AMD, and Fujitsu Processors,”
ACM Transactions on Mathematical Software, vol. 49, issue 3, pp. 1 - 29, September 2023.
“Combining multitask and transfer learning with deep Gaussian processes for autotuning-based performance engineering,”
The International Journal of High Performance Computing Applications, March 2023.
“Preconditioners for Batched Iterative Linear Solvers on GPUs,”
Accelerating Science and Engineering Discoveries Through Integrated Research Infrastructure for Experiment, Big Data, Modeling and Simulation, Communications in Computer and Information Science, vol. 1690, Cham, Springer Nature Switzerland, pp. 38 - 53, 2023.
“O(N) distributed direct factorization of structured dense matrices using runtime systems,”
52nd International Conference on Parallel Processing (ICPP 2023), Salt Lake City, Utah, ACM, August 2023.
“HPC Forecast,”
Communications of the ACM, vol. 66, issue 2, pp. 82 - 90, January 2023.
“Improving the Scaling of an Asynchronous Many-Task Runtime with a Lightweight Communication Engine,”
52nd International Conference on Parallel Processing (ICPP 2023), Salt Lake City, Utah, ACM, August 2023.
“Memory Traffic and Complete Application Profiling with PAPI Multi-Component Measurements,”
St. Petersburg, FL, 28th HIPS Workshop, May 2023.
“Memory Traffic and Complete Application Profiling with PAPI Multi-Component Measurements,”
2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), St. Petersburg, Florida, IEEE, 2023.
“Mixed Precision Algebraic Multigrid on GPUs,”
Parallel Processing and Applied Mathematics (PPAM 2022), vol. 13826, Cham, Springer International Publishing, April 2023.
“MPI Continuations And How To Invoke Them,”
Sustained Simulation Performance 2021, Cham, Springer International Publishing, pp. 67 - 83, February 2023.
“Revisiting I/O bandwidth-sharing strategies for HPC applications,”
INRIA Research Report, no. RR-9502: INRIA, March 2023.
“Sparse matrix-vector and matrix-multivector products for the truncated SVD on graphics processors,”
Concurrency and Computation: Practice and Experience, August 2023.
“Synchronizing MPI Processes in Space and Time,”
EUROMPI '23: 30th European MPI Users' Group Meeting, Bristol, United Kingdom, ACM, September 2023.
“Three-precision algebraic multigrid on GPUs,”
Future Generation Computer Systems, July 2023.
“Using Additive Modifications in LU Factorization Instead of Pivoting,”
37th ACM International Conference on Supercomputing (ICS'23), Orlando, FL, ACM, June 2023.
“Using Ginkgo's memory accessor for improving the accuracy of memory‐bound low precision BLAS,”
Software: Practice and Experience, vol. 53, issue 1, pp. 81 - 98, January 2023.
“Accelerating Geostatistical Modeling and Prediction With Mixed-Precision Computations: A High-Productivity Approach With PaRSEC,”
IEEE Transactions on Parallel and Distributed Systems, vol. 33, issue 4, pp. 964 - 976, April 2022.
“Addressing Irregular Patterns of Matrix Computations on GPUs and Their Impact on Applications Powered by Sparse Direct Solvers,”
2022 International Conference for High Performance Computing, Networking, Storage and Analysis (SC22), Dallas, TX, IEEE Computer Society, pp. 354-367, November 2022.
“Analysis of the Communication and Computation Cost of FFT Libraries towards Exascale,”
ICL Technical Report, no. ICL-UT-22-07: Innovative Computing Laboratory, July 2022.
“Approximate Computing for Scientific Applications,”
Approximate Computing Techniques, 322: Springer International Publishing, pp. 415 - 465, January 2022.
“Batch QR Factorization on GPUs: Design, Optimization, and Tuning,”
Lecture Notes in Computer Science, vol. 13350, Cham, Springer International Publishing, June 2022.
“Batched sparse iterative solvers on GPU for the collision operator for fusion plasma simulations,”
2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Lyon, France, IEEE, July 2022.
“Checkpointing à la Young/Daly: An Overview,”
IC3-2022: Proceedings of the 2022 Fourteenth International Conference on Contemporary Computing, Noida, India, ACM Press, pp. 701-710, August 2022.
“Communication Avoiding LU with Tournament Pivoting in SLATE,”
SLATE Working Notes, no. 18, ICL-UT-22-01, January 2022.
“Comparing Distributed Termination Detection Algorithms for Modern HPC Platforms,”
International Journal of Networking and Computing, vol. 12, issue 1, pp. 26 - 46, January 2022.
“Composition of Algorithmic Building Blocks in Template Task Graphs,”
2022 IEEE/ACM Parallel Applications Workshop: Alternatives To MPI+X (PAW-ATM), Dallas, TX, USA, IEEE, January 2023, 2022.
“Compressed basis GMRES on high-performance graphics processing units,”
The International Journal of High Performance Computing Applications, May 2022.
“Compression and load balancing for efficient sparse matrix‐vector product on multicore processors and graphics processing units,”
Concurrency and Computation: Practice and Experience, vol. 34, issue 14, June 2022.
“Computational science for a better future,”
Journal of Computational Science, vol. 62, pp. 101745, July 2022.
“Deep Gaussian process with multitask and transfer learning for performance optimization,”
2022 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1-7, September 2022.
“Evaluating Data Redistribution in PaRSEC,”
IEEE Transactions on Parallel and Distributed Systems, vol. 33, no. 8, pp. 1856-1872, August 2022.
“Evaluations of molecular modeling and machine learning for predictive capabilities in binding of lanthanum and actinium with carboxylic acids,”
Journal of Radioanalytical and Nuclear Chemistry, December 2022.
“The evolution of mathematical software,”
Communications of the ACM, vol. 65, issue 12, pp. 66 - 72, December 2022.
“Extending MAGMA Portability with OneAPI,”
The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC22), Ninth Workshop on Accelerator Programming Using Directives (WACCPD 2022), Dallas, TX, November 2022.
“Extending MAGMA Portability with OneAPI,”
Dallas, TX, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC22), ACM Student Research Competition, November 2022.
“FFT Benchmark Performance Experiments on Systems Targeting Exascale,”
ICL Technical Report, no. ICL-UT-22-02, March 2022.
“A Framework to Exploit Data Sparsity in Tile Low-Rank Cholesky Factorization,”
IEEE International Parallel and Distributed Processing Symposium (IPDPS), July 2022.
“Generalized Flow-Graph Programming Using Template Task-Graphs: Initial Implementation and Assessment,”
2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Lyon, France, IEEE, July 2022.
“Ginkgo: A Modern Linear Operator Algebra Framework for High Performance Computing,”
ACM Transactions on Mathematical Software, vol. 48, issue 1, pp. 1 - 33, March 2022.
“Ginkgo—A math library designed for platform portability,”
Parallel Computing, vol. 111, pp. 102902, February 2022.
“Implicit Actions and Non-blocking Failure Recovery with MPI,”
2022 IEEE/ACM 12th Workshop on Fault Tolerance for HPC at eXtreme Scale (FTXS), Dallas, TX, USA, IEEE, January 2023, 2022.
“Integrating process, control-flow, and data resiliency layers using a hybrid Fenix/Kokkos approach,”
2022 IEEE International Conference on Cluster Computing (CLUSTER 2022), Heidelberg, Germany, September 2022.
“Lossy all-to-all exchange for accelerating parallel 3-D FFTs on hybrid architectures with GPUs,”
2022 IEEE International Conference on Cluster Computing (CLUSTER), pp. 152-160, September 2022.
“Mixed precision and approximate 3D FFTs: Speed for accuracy trade-off with GPU-aware MPI and run-time data compression,”
ICL Technical Report, no. ICL-UT-22-04, May 2022.
“OpenMP application experiences: Porting to accelerated nodes,”
Parallel Computing, vol. 109, March 2022.
“Optimal Checkpointing Strategies for Iterative Applications,”
IEEE Transactions on Parallel and Distributed Systems, vol. 33, issue 3, pp. 507-522, March 2022.
“PAQR: Pivoting Avoiding QR factorization,”
ICL Technical Report, no. ICL-UT-22-06, June 2022.
“Performance Analysis of Parallel FFT on Large Multi-GPU Systems,”
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Lyon, France, IEEE, August 2022.
“Performance Application Programming Interface,”
Accelerated Computing with HIP: Sun, Baruah and Kaeli, December 2022.
“