Publications
“Adaptive Precision in Block-Jacobi Preconditioning for Iterative Sparse Linear System Solvers,”
Concurrency and Computation: Practice and Experience, vol. 31, no. 6, pp. e4460, March 2019.
DOI: 10.1002/cpe.4460
“Batched sparse and mixed-precision linear algebra interface for efficient use of GPU hardware accelerators in scientific applications,”
Future Generation Computer Systems, vol. 160, pp. 359 - 374, November 2024.
DOI: 10.1016/j.future.2024.06.004
“Block-asynchronous Multigrid Smoothers for GPU-accelerated Systems,”
ICCS 2012, Omaha, NE, June 2012.
“A Block-Asynchronous Relaxation Method for Graphics Processing Units,”
Journal of Parallel and Distributed Computing, vol. 73, issue 12, pp. 1613–1626, December 2013.
DOI: 10.1016/j.jpdc.2013.05.008
“Compressed basis GMRES on high-performance graphics processing units,”
The International Journal of High Performance Computing Applications, May 2022.
DOI: 10.1177/10943420221115140
“Compression and load balancing for efficient sparse matrix‐vector product on multicore processors and graphics processing units,”
Concurrency and Computation: Practice and Experience, vol. 34, issue 14, June 2022.
DOI: 10.1002/cpe.6515
“A Customized Precision Format Based on Mantissa Segmentation for Accelerating Sparse Linear Algebra,”
Concurrency and Computation: Practice and Experience, January 2019.
DOI: 10.1002/cpe.5418
“Evaluating Asynchronous Schwarz Solvers on GPUs,”
International Journal of High Performance Computing Applications, August 2020.
DOI: 10.1177/1094342020946814
“Fine-grained Bit-Flip Protection for Relaxation Methods,”
Journal of Computational Science, November 2016.
DOI: 10.1016/j.jocs.2016.11.013
“Ginkgo: A High Performance Numerical Linear Algebra Library,”
Journal of Open Source Software, vol. 5, issue 52, August 2020.
DOI: 10.21105/joss.02260
“Ginkgo - A math library designed to accelerate Exascale Computing Project science applications,”
The International Journal of High Performance Computing Applications, August 2024.
DOI: 10.1177/10943420241268323
“Ginkgo: A Modern Linear Operator Algebra Framework for High Performance Computing,”
ACM Transactions on Mathematical Software, vol. 48, issue 12, pp. 1 - 33, March 2022.
DOI: 10.1145/3480935
“Ginkgo—A math library designed for platform portability,”
Parallel Computing, vol. 111, pp. 102902, February 2022.
DOI: 10.1016/j.parco.2022.102902
“GPU-Accelerated Asynchronous Error Correction for Mixed Precision Iterative Refinement,”
EuroPar 2012 (also LAWN 260), Rhodes Island, Greece, August 2012.
“Incomplete Sparse Approximate Inverses for Parallel Preconditioning,”
Parallel Computing, vol. 71, pp. 1–22, January 2018.
DOI: 10.1016/j.parco.2017.10.003
“Linear Algebra Software for Large-Scale Accelerated Multicore Computing,”
Acta Numerica, vol. 25, pp. 1-160, May 2016.
DOI: 10.1017/S0962492916000015
“Load-Balancing Sparse Matrix Vector Product Kernels on GPUs,”
ACM Transactions on Parallel Computing, vol. 7, issue 1, March 2020.
DOI: 10.1145/3380930
“Optimization and Performance Evaluation of the IDR Iterative Krylov Solver on GPUs,”
The International Journal of High Performance Computing Applications, vol. 32, no. 2, pp. 220–230, March 2018.
DOI: 10.1177/1094342016646844
“PAPI Software-Defined Events for in-Depth Performance Analysis,”
The International Journal of High Performance Computing Applications, vol. 33, issue 6, pp. 1113-1127, November 2019.
“Parallel Selection on GPUs,”
Parallel Computing, vol. 91, March 2020.
DOI: 10.1016/j.parco.2019.102588
“ParILUT - A New Parallel Threshold ILU,”
SIAM Journal on Scientific Computing, vol. 40, issue 4, pp. C503–C519, July 2018.
DOI: 10.1137/16M1079506
“On the performance and energy efficiency of sparse linear algebra on GPUs,”
International Journal of High Performance Computing Applications, October 2016.
DOI: 10.1177/1094342016672081
“Preconditioned Krylov Solvers on GPUs,”
Parallel Computing, June 2017.
DOI: 10.1016/j.parco.2017.05.006
“Providing performance portable numerics for Intel GPUs,”
Concurrency and Computation: Practice and Experience, vol. 17, October 2022.
DOI: 10.1002/cpe.7400
“Resiliency in numerical algorithm design for extreme scale simulations,”
The International Journal of High Performance Computing Applications, vol. 36, issue 2, pp. 251–285, March 2022.
DOI: 10.1177/10943420211055188
“Sparse matrix-vector and matrix-multivector products for the truncated SVD on graphics processors,”
Concurrency and Computation: Practice and Experience, August 2023.
DOI: 10.1002/cpe.7871
“A survey of numerical linear algebra methods utilizing mixed-precision arithmetic,”
The International Journal of High Performance Computing Applications, vol. 35, no. 4, pp. 344–369, 2021.
DOI: 10.1177/10943420211003313
“Then and Now: Improving Software Portability, Productivity, and 100× Performance,”
Computing in Science & Engineering, pp. 1 - 10, April 2024.
DOI: 10.1109/MCSE.2024.3387302
“Three-precision algebraic multigrid on GPUs,”
Future Generation Computer Systems, July 2023.
DOI: 10.1016/j.future.2023.07.024
“Toward a Modular Precision Ecosystem for High-Performance Computing,”
The International Journal of High Performance Computing Applications, vol. 33, issue 6, pp. 1069-1078, November 2019.
DOI: 10.1177/1094342019846547
“Towards a New Peer Review Concept for Scientific Computing ensuring Technical Quality, Software Sustainability, and Result Reproducibility,”
Proceedings in Applied Mathematics and Mechanics, vol. 19, issue 1, November 2019.
DOI: 10.1002/pamm.201900490
“Using Ginkgo's memory accessor for improving the accuracy of memory‐bound low precision BLAS,”
Software: Practice and Experience, vol. 53, issue 1, pp. 81–98, January 2023.
DOI: 10.1002/spe.3041
“Using Jacobi Iterations and Blocking for Solving Sparse Triangular Systems in Incomplete Factorization Preconditioning,”
Journal of Parallel and Distributed Computing, vol. 119, pp. 219–230, November 2018.
DOI: 10.1016/j.jpdc.2018.04.017
“Variable-Size Batched Gauss-Jordan Elimination for Block-Jacobi Preconditioning on Graphics Processors,”
Parallel Computing, vol. 81, pp. 131-146, January 2019.
DOI: 10.1016/j.parco.2017.12.006
“With Extreme Computing, the Rules Have Changed,”
Computing in Science & Engineering, vol. 19, issue 3, pp. 52-62, May 2017.
DOI: 10.1109/MCSE.2017.48
Clover: Computational Libraries Optimized via Exascale Research, Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
Ginkgo: A Sparse Linear Algebra Library for HPC, 2021 ECP Annual Meeting, April 2021.
Ginkgo: A Node-Level Sparse Linear Algebra Library for HPC (Poster), Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
Earth Virtualization Engines - A Technical Perspective, September 2023.
Interface for Sparse Linear Algebra Operations, November 2024.
DOI: 10.48550/arXiv.2411.13259
Flexible Batched Sparse Matrix Vector Product on GPUs, Denver, Colorado, ScalA'17: 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, November 2017.
MAGMA MIC: Optimizing Linear Algebra for Intel Xeon Phi, Frankfurt, Germany, ISC High Performance (ISC15), Intel Booth Presentation, June 2015.