Publications
Computing the Expected Makespan of Task Graphs in the Presence of Silent Errors,”
Parallel Computing, vol. 75, pp. 41–60, July 2018.
DOI: 10.1016/j.parco.2018.03.004 (2.56 MB)
“Computing Dense Tensor Decompositions with Optimal Dimension Trees,”
Algorithmica, vol. 81, issue 5, pp. 2092–2121, May 2019.
DOI: 10.1007/s00453-018-0525-3 (638.4 KB)
“Computational science for a better future,”
Journal of Computational Science, vol. 62, pp. 101745, July 2022.
DOI: 10.1016/j.jocs.2022.101745
“Computational Benefit of GPU Optimization for Atmospheric Chemistry Modeling,”
Journal of Advances in Modeling Earth Systems, vol. 10, issue 8, pp. 1952–1969, August 2018.
DOI: 10.1029/2018MS001276 (3.4 MB)
“Computation at the Cutting Edge of Science,”
Journal of Computational Science, June 2024.
DOI: 10.1016/j.jocs.2024.102379
“Compression and load balancing for efficient sparse matrix‐vector product on multicore processors and graphics processing units,”
Concurrency and Computation: Practice and Experience, vol. 34, issue 14, June 2022.
DOI: 10.1002/cpe.6515 (749.82 KB)
“Compressed basis GMRES on high-performance graphics processing units,”
The International Journal of High Performance Computing Applications, May 2022.
DOI: 10.1177/10943420221115140 (13.52 MB)
“Comparing the Performance of Rigid, Moldable, and Grid-Shaped Applications on Failure-Prone HPC Platforms,”
Parallel Computing, vol. 85, pp. 1–12, July 2019.
DOI: 10.1016/j.parco.2019.02.002 (865.18 KB)
“Comparing Distributed Termination Detection Algorithms for Modern HPC Platforms,”
International Journal of Networking and Computing, vol. 12, issue 1, pp. 26 - 46, January 2022.
DOI: 10.15803/ijnc.12.1_26
“Communication-Avoiding Symmetric-Indefinite Factorization,”
SIAM Journal on Matrix Analysis and Application, vol. 35, issue 4, pp. 1364-1406, July 2014.
(593.18 KB)
“Combining multitask and transfer learning with deep Gaussian processes for autotuning-based performance engineering,”
The International Journal of High Performance Computing Applications, March 2023.
DOI: 10.1177/10943420231166365
“Combining Checkpointing and Replication for Reliable Execution of Linear Workflows with Fail-Stop and Silent Errors,”
International Journal of Networking and Computing, vol. 9, no. 1, pp. 2-27.
(754.6 KB)
“Collecting Performance Data with PAPI-C,”
Tools for High Performance Computing 2009, 3rd Parallel Tools Workshop, Dresden, Germany, Springer Berlin / Heidelberg, pp. 157-173, May 2010.
DOI: 10.1007/978-3-642-11261-4_11 (4.45 MB)
“The co-evolution of computational physics and high-performance computing,”
Nature Reviews Physics, August 2024.
DOI: 10.1038/s42254-024-00750-z
“A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures,”
Parallel Computing, vol. 35, pp. 38-53, 00 2009.
(274.74 KB)
“Checkpointing Workflows for Fail-Stop Errors,”
IEEE Transactions on Computers, vol. 67, issue 8, pp. 1105–1120, August 2018.
“Checkpointing Strategies for Shared High-Performance Computing Platforms,”
International Journal of Networking and Computing, vol. 9, no. 1, pp. 28–52, 2019.
(490.5 KB)
“Callback-based completion notification using MPI Continuations,”
Parallel Computing, vol. 21238566, issue 0225, pp. 102793, May Jan.
DOI: 10.1016/j.parco.2021.102793
“Cache Optimization and Performance Modeling of Batched, Small, and Rectangular Matrix Multiplication on Intel, AMD, and Fujitsu Processors,”
ACM Transactions on Mathematical Software, vol. 49, issue 3, pp. 1 - 29, September 2023.
DOI: 10.1145/3595178
“Budget-aware scheduling algorithms for scientific workflows with stochastic task weights on IaaS Cloud platforms,”
Concurrency and Computation: Practice and Experience, vol. 33, no. 17, pp. e6065, 2021.
DOI: 10.1002/cpe.6065 (1.99 MB)
“A Block-Asynchronous Relaxation Method for Graphics Processing Units,”
Journal of Parallel and Distributed Computing, vol. 73, issue 12, pp. 1613–1626, December 2013.
DOI: http://dx.doi.org/10.1016/j.jpdc.2013.05.008 (1.08 MB)
“Block-asynchronous Multigrid Smoothers for GPU-accelerated Systems,”
ICCS 2012, Omaha, NE, June 2012.
(608.95 KB)
“Big Data and Extreme-Scale Computing: Pathways to Convergence - Toward a Shaping Strategy for a Future Software and Data Ecosystem for Scientific Inquiry,”
The International Journal of High Performance Computing Applications, vol. 32, issue 4, pp. 435–479, July 2018.
DOI: 10.1177/1094342018778123 (1.29 MB)
“Batched sparse and mixed-precision linear algebra interface for efficient use of GPU hardware accelerators in scientific applications,”
Future Generation Computer Systems, vol. 160, pp. 359 - 374, November 2024.
DOI: 10.1016/j.future.2024.06.004
“Batched One-Sided Factorizations of Tiny Matrices Using GPUs: Challenges and Countermeasures,”
Journal of Computational Science, vol. 26, pp. 226–236, May 2018.
DOI: 10.1016/j.jocs.2018.01.005 (3.73 MB)
“