Publications

Anzt, H., T. Cojean, Y-C. Chen, F. Goebel, T. Gruetzmacher, P. Nayak, T. Ribizel, Y-H. Tsai, and J. Dongarra, Ginkgo: A Node-Level Sparse Linear Algebra Library for HPC (Poster), Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.

Anzt, H., S. Tomov, and J. Dongarra, “Implementing a Sparse Matrix Vector Product for the SELL-C/SELL-C-σ formats on NVIDIA GPUs,” University of Tennessee Computer Science Technical Report, no. UT-EECS-14-727: University of Tennessee, April 2014.

Anzt, H., T. Cojean, and E. Kuhn, “Towards a New Peer Review Concept for Scientific Computing ensuring Technical Quality, Software Sustainability, and Result Reproducibility,” Proceedings in Applied Mathematics and Mechanics, vol. 19, issue 1, November 2019.

Anzt, H., E. Chow, and J. Dongarra, “Iterative Sparse Triangular Solves for Preconditioning,” EuroPar 2015, Vienna, Austria, Springer Berlin, August 2015.

Anzt, H., S. Tomov, M. Gates, J. Dongarra, and V. Heuveline, Block-asynchronous Multigrid Smoothers for GPU-accelerated Systems, no. UT-CS-11-689, December 2011.

Anzt, H., G. Flegar, T. Gruetzmacher, and E. S. Quintana-Orti, “Toward a Modular Precision Ecosystem for High-Performance Computing,” The International Journal of High Performance Computing Applications, vol. 33, issue 6, pp. 1069-1078, November 2019.

Anzt, H., J. Dongarra, G. Flegar, and E. S. Quintana-Orti, “Variable-Size Batched LU for Small Matrices and Its Integration into Block-Jacobi Preconditioning,” 46th International Conference on Parallel Processing (ICPP), Bristol, United Kingdom, IEEE, August 2017.

Anzt, H., S. Tomov, J. Dongarra, and V. Heuveline, “A Block-Asynchronous Relaxation Method for Graphics Processing Units,” Journal of Parallel and Distributed Computing, vol. 73, issue 12, pp. 1613–1626, December 2013.

Anzt, H., S. Tomov, and J. Dongarra, “On the performance and energy efficiency of sparse linear algebra on GPUs,” International Journal of High Performance Computing Applications, October 2016.

Anzt, H., and G. Flegar, “Are we Doing the Right Thing? – A Critical Analysis of the Academic HPC Community,” 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Rio de Janeiro, Brazil, IEEE, May 2019.

Anzt, H., T. Gruetzmacher, E. S. Quintana-Orti, and F. Scheidegger, “High-Performance GPU Implementation of PageRank with Reduced Precision based on Mantissa Segmentation,” 8th Workshop on Irregular Applications: Architectures and Algorithms, 2018.

Anzt, H., S. Tomov, and J. Dongarra, “Energy Efficiency and Performance Frontiers for Sparse Computations on GPU Supercomputers,” Sixth International Workshop on Programming Models and Applications for Multicores and Manycores (PMAM '15), San Francisco, CA, ACM, February 2015.

Anzt, H., E. Chow, J. Saak, and J. Dongarra, “Updating Incomplete Factorization Preconditioners for Model Order Reduction,” Numerical Algorithms, vol. 73, issue 3, pp. 611–630, February 2016.

Anzt, H., W. Sawyer, S. Tomov, P. Luszczek, and J. Dongarra, “Acceleration of GPU-based Krylov solvers via Data Transfer Reduction,” International Journal of High Performance Computing Applications, 2015.

Anzt, H., S. Tomov, and J. Dongarra, “Accelerating the LOBPCG method on GPUs using a blocked Sparse Matrix Vector Product,” Spring Simulation Multi-Conference 2015 (SpringSim'15), Alexandria, VA, SCS, April 2015.

Anzt, H., S. Tomov, and J. Dongarra, “Accelerating the LOBPCG method on GPUs using a blocked Sparse Matrix Vector Product,” University of Tennessee Computer Science Technical Report, no. UT-EECS-14-731: University of Tennessee, October 2014.

Anzt, H., E. Boman, J. Dongarra, G. Flegar, M. Gates, M. Heroux, M. Hoemmen, J. Kurzak, P. Luszczek, S. Rajamanickam, et al., “MAGMA-sparse Interface Design Whitepaper,” Innovative Computing Laboratory Technical Report, no. ICL-UT-17-05, September 2017.

Anzt, H., T. Cojean, Y-C. Chen, J. Dongarra, G. Flegar, P. Nayak, S. Tomov, Y. M. Tsai, and W. Wang, “Load-Balancing Sparse Matrix Vector Product Kernels on GPUs,” ACM Transactions on Parallel Computing, vol. 7, issue 1, March 2020.

Anzt, H., T. Cojean, G. Flegar, F. Göbel, T. Grützmacher, P. Nayak, T. Ribizel, Y. Mike Tsai, and E. S. Quintana-Ortí, “Ginkgo: A Modern Linear Operator Algebra Framework for High Performance Computing,” ACM Transactions on Mathematical Software, vol. 48, issue 12, pp. 1–33, March 2022.

Anzt, H., T. Huckle, J. Bräckle, and J. Dongarra, “Incomplete Sparse Approximate Inverses for Parallel Preconditioning,” Parallel Computing, vol. 71, pp. 1–22, January 2018.

Anzt, H., J. Dongarra, and E. S. Quintana-Orti, “Fine-grained Bit-Flip Protection for Relaxation Methods,” Journal of Computational Science, November 2016.

Anzt, H., J. Dongarra, G. Flegar, and E. S. Quintana-Orti, “Variable-Size Batched Gauss-Jordan Elimination for Block-Jacobi Preconditioning on Graphics Processors,” Parallel Computing, vol. 81, pp. 131-146, January 2019.

Anzt, H., J. Dongarra, G. Flegar, and T. Gruetzmacher, “Variable-Size Batched Condition Number Calculation on GPUs,” SBAC-PAD, Lyon, France, September 2018.

Anzt, H., S. Tomov, J. Dongarra, and V. Heuveline, “Weighted Block-Asynchronous Iteration on GPU-Accelerated Systems,” Tenth International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms (Best Paper), Rhodes Island, Greece, August 2012.

Anzt, H., B. Haugen, J. Kurzak, P. Luszczek, and J. Dongarra, “Experiences in Autotuning Matrix Multiplication for Energy Minimization on GPUs,” Concurrency and Computation: Practice and Experience, vol. 27, issue 17, pp. 5096–5113, October 12, 2015.

Anzt, H., J. Dongarra, M. Kreutzer, G. Wellein, and M. Kohler, “Efficiency of General Krylov Methods on GPUs – An Experimental Study,” The Sixth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), Chicago, IL, IEEE, May 2016.

Anzt, H., P. Luszczek, J. Dongarra, and V. Heuveline, “GPU-Accelerated Asynchronous Error Correction for Mixed Precision Iterative Refinement,” EuroPar 2012 (also LAWN 260), Rhodes Island, Greece, August 2012.

Anzt, H., M. Baboulin, J. Dongarra, Y. Fournier, F. Hulsemann, A. Khabou, and Y. Wang, “Accelerating the Conjugate Gradient Algorithm with GPU in CFD Simulations,” VECPAR, 2016.

Anzt, H., E. Ponce, G. D. Peterson, and J. Dongarra, “GPU-accelerated Co-design of Induced Dimension Reduction: Algorithmic Fusion and Kernel Overlap,” 2nd International Workshop on Hardware-Software Co-Design for High Performance Computing, Austin, TX, ACM, November 2015.

Anzt, H., T. Ribizel, G. Flegar, E. Chow, and J. Dongarra, “ParILUT – A Parallel Threshold ILU for GPUs,” IEEE International Parallel and Distributed Processing Symposium (IPDPS), Rio de Janeiro, Brazil, IEEE, May 2019.

Anzt, H., J. Dongarra, M. Gates, A. Haidar, K. Kabir, P. Luszczek, S. Tomov, and I. Yamazaki, MAGMA MIC: Optimizing Linear Algebra for Intel Xeon Phi, Frankfurt, Germany, ISC High Performance (ISC15), Intel Booth Presentation, June 2015.

Anzt, H., T. Cojean, Y-C. Chen, F. Goebel, T. Gruetzmacher, P. Nayak, T. Ribizel, and Y-H. Tsai, “Ginkgo: A High Performance Numerical Linear Algebra Library,” Journal of Open Source Software, vol. 5, issue 52, August 2020.

Anzt, H., S. Tomov, J. Dongarra, and V. Heuveline, “A Block-Asynchronous Relaxation Method for Graphics Processing Units,” University of Tennessee Computer Science Technical Report, no. UT-CS-11-687 / LAWN 258, November 2011.

Anzt, H., E. Chow, and J. Dongarra, “On block-asynchronous execution on GPUs,” LAPACK Working Note, no. 291, November 2016.

Anzt, H., N. Beams, T. Cojean, F. Göbel, T. Grützmacher, A. Kashi, P. Nayak, T. Ribizel, and Y. M. Tsai, Ginkgo: A Sparse Linear Algebra Library for HPC, 2021 ECP Annual Meeting, April 2021.

Anzt, H., Y. M. Tsai, A. Abdelfattah, T. Cojean, and J. Dongarra, “Evaluating the Performance of NVIDIA’s A100 Ampere GPU for Sparse and Batched Computations,” 2020 IEEE/ACM Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS): IEEE, November 2020.

Anzt, H., M. Kreutzer, E. Ponce, G. D. Peterson, G. Wellein, and J. Dongarra, “Optimization and Performance Evaluation of the IDR Iterative Krylov Solver on GPUs,” The International Journal of High Performance Computing Applications, vol. 32, no. 2, pp. 220–230, March 2018.

Anzt, H., and J. Dongarra, “A Jaccard Weights Kernel Leveraging Independent Thread Scheduling on GPUs,” SBAC-PAD, Lyon, France, IEEE, 2018.

Anzt, H., J. Dongarra, G. Flegar, and E. S. Quintana-Orti, “Batched Gauss-Jordan Elimination for Block-Jacobi Preconditioner Generation on GPUs,” Proceedings of the 8th International Workshop on Programming Models and Applications for Multicores and Manycores, New York, NY, USA, ACM, pp. 1–10, February 2017.

Anzt, H., J. Dongarra, G. Flegar, N. J. Higham, and E. S. Quintana-Orti, “Adaptive Precision in Block-Jacobi Preconditioning for Iterative Sparse Linear System Solvers,” Concurrency and Computation: Practice and Experience, vol. 31, no. 6, pp. e4460, March 2019.

Anzt, H., G. Collins, J. Dongarra, G. Flegar, and E. S. Quintana-Orti, Flexible Batched Sparse Matrix Vector Product on GPUs, Denver, Colorado, ScalA'17: 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, November 2017.

Anzt, H., J. Dongarra, M. Kreutzer, G. Wellein, and M. Kohler, “Efficiency of General Krylov Methods on GPUs – An Experimental Study,” 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 683-691, May 2016.

Thiyagalingam, J., G. von Laszewski, J. Yin, M. Emani, J. Papay, G. Barrett, P. Luszczek, A. Tsaris, C. Kirkpatrick, F. Wang, et al., “AI Benchmarking for Science: Efforts from the MLCommons Science Working Group,” Lecture Notes in Computer Science, vol. 13387: Springer International Publishing, pp. 47–64, January 2023.

Anzt, H., J. Dongarra, and E. S. Quintana-Orti, “Adaptive Precision Solvers for Sparse Linear Systems,” 3rd International Workshop on Energy Efficient Supercomputing (E2SC '15), Austin, TX, ACM, November 2015.

Anzt, H., B. Haugen, J. Kurzak, P. Luszczek, and J. Dongarra, “Experiences in autotuning matrix multiplication for energy minimization on GPUs,” Concurrency and Computation: Practice and Experience, vol. 27, issue 17, pp. 5096-5113, December 2015.

Anzt, H., E. Chow, and J. Dongarra, “ParILUT - A New Parallel Threshold ILU,” SIAM Journal on Scientific Computing, vol. 40, issue 4: SIAM, pp. C503–C519, July 2018.

Anzt, H., M. Gates, J. Dongarra, M. Kreutzer, G. Wellein, and M. Kohler, “Preconditioned Krylov Solvers on GPUs,” Parallel Computing, June 2017.

Anzt, H., J. Dongarra, M. Gates, J. Kurzak, P. Luszczek, S. Tomov, and I. Yamazaki, “Bringing High Performance Computing to Big Data Algorithms,” Handbook of Big Data Technologies: Springer, 2017.

Arbenz, P., A. Cleary, J. Dongarra, and M. Hegland, “A Comparison of Parallel Solvers for General Narrow Banded Linear Systems,” Parallel and Distributed Computing Practices, vol. 2, pp. 385-400, October 2002.

Arbenz, P., A. Cleary, J. Dongarra, and M. Hegland, “A Comparison of Parallel Solvers for General Narrow Banded Linear Systems (LAPACK Working Note 142),” University of Tennessee Computer Science Technical Report, no. UT-CS-99-414, January 1999.
