Publications

Anzt, H., T. Cojean, Y-C. Chen, F. Goebel, T. Gruetzmacher, P. Nayak, T. Ribizel, and Y-H. Tsai, “Ginkgo: A High Performance Numerical Linear Algebra Library,” Journal of Open Source Software, vol. 5, issue 52, August 2020.

(721.84 KB)

Anzt, H., T. Gruetzmacher, E. S. Quintana-Orti, and F. Scheidegger, “High-Performance GPU Implementation of PageRank with Reduced Precision based on Mantissa Segmentation,” 8th Workshop on Irregular Applications: Architectures and Algorithms, 2018.

Anzt, H., N. Beams, T. Cojean, F. Göbel, T. Grützmacher, A. Kashi, P. Nayak, T. Ribizel, and Y. M. Tsai, Gingko: A Sparse Linear Algebrea Library for HPC : 2021 ECP Annual Meeting, April 2021.

(893.04 KB)

Anzt, H., Y. M. Tsai, A. Abdelfattah, T. Cojean, and J. Dongarra, “Evaluating the Performance of NVIDIA’s A100 Ampere GPU for Sparse and Batched Computations,” 2020 IEEE/ACM Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS): IEEE, November 2020.

(1.9 MB)

Anzt, H., S. Tomov, and J. Dongarra, “Energy Efficiency and Performance Frontiers for Sparse Computations on GPU Supercomputers,” Sixth International Workshop on Programming Models and Applications for Multicores and Manycores (PMAM '15), San Francisco, CA, ACM, February 2015.

(2.29 MB)

Anzt, H., E. Chow, J. Saak, and J. Dongarra, “Updating Incomplete Factorization Preconditioners for Model Order Reduction,” Numerical Algorithms, vol. 73, issue 3, no. 3, pp. 611–630, February 2016.

(565.34 KB)

Anzt, H., W. Sawyer, S. Tomov, P. Luszczek, and J. Dongarra, “Acceleration of GPU-based Krylov solvers via Data Transfer Reduction,” International Journal of High Performance Computing Applications, 2015.

Anzt, H., S. Tomov, and J. Dongarra, “Accelerating the LOBPCG method on GPUs using a blocked Sparse Matrix Vector Product,” Spring Simulation Multi-Conference 2015 (SpringSim'15), Alexandria, VA, SCS, April 2015.

(1.46 MB)

Anzt, H., J. Dongarra, G. Flegar, N. J. Higham, and E. S. Quintana-Orti, “Adaptive Precision in Block-Jacobi Preconditioning for Iterative Sparse Linear System Solvers,” Concurrency and Computation: Practice and Experience, vol. 31, no. 6, pp. e4460, March 2019.

(341.54 KB)

Anzt, H., E. Boman, J. Dongarra, G. Flegar, M. Gates, M. Heroux, M. Hoemmen, J. Kurzak, P. Luszczek, S. Rajamanickam, et al., “MAGMA-sparse Interface Design Whitepaper,” Innovative Computing Laboratory Technical Report, no. ICL-UT-17-05, September 2017.

(1.28 MB)

Anzt, H., S. Tomov, and J. Dongarra, “Accelerating the LOBPCG method on GPUs using a blocked Sparse Matrix Vector Product,” University of Tennessee Computer Science Technical Report, no. UT-EECS-14-731: University of Tennessee, October 2014.

(1.83 MB)

Anzt, H., T. Huckle, J. Bräckle, and J. Dongarra, “Incomplete Sparse Approximate Inverses for Parallel Preconditioning,” Parallel Computing, vol. 71, pp. 1–22, January 2018.

(1.24 MB)

Thiyagalingam, J., G. von Laszewski, J. Yin, M. Emani, J. Papay, G. Barrett, P. Luszczek, A. Tsaris, C. Kirkpatrick, F. Wang, et al., “AI Benchmarking for Science: Efforts from the MLCommons Science Working Group,” Lecture Notes in Computer Science, vol. 13387: Springer International Publishing, pp. 47 - 64, January 2023.

Anzt, H., J. Dongarra, and E. S. Quintana-Orti, “Fine-grained Bit-Flip Protection for Relaxation Methods,” Journal of Computational Science, November 2016.

(1.47 MB)

Anzt, H., J. Dongarra, G. Flegar, and E. S. Quintana-Orti, “Variable-Size Batched Gauss-Jordan Elimination for Block-Jacobi Preconditioning on Graphics Processors,” Parallel Computing, vol. 81, pp. 131-146, January 2019.

(1.9 MB)

Anzt, H., J. Dongarra, G. Flegar, and T. Gruetzmacher, “Variable-Size Batched Condition Number Calculation on GPUs,” SBAC-PAD, Lyon, France, September 2018.

(509.3 KB)

Anzt, H., P. Luszczek, J. Dongarra, and V. Heuveline, “GPU-Accelerated Asynchronous Error Correction for Mixed Precision Iterative Refinement,” University of Tennessee Computer Science Technical Report UT-CS-11-690 (also Lawn 260), December 2011.

(662.98 KB)

Anzt, H., S. Tomov, J. Dongarra, and V. Heuveline, “Weighted Block-Asynchronous Iteration on GPU-Accelerated Systems,” Tenth International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms (Best Paper), Rhodes Island, Greece, August 2012.

(764.02 KB)

Anzt, H., B. Haugen, J. Kurzak, P. Luszczek, and J. Dongarra, “Experiences in Autotuning Matrix Multiplication for Energy Minimization on GPUs,” Concurrency and Computation: Practice and Experience, vol. 27, issue 17, pp. 5096 - 5113, Oct 12, 2015.

(1.99 MB)

Anzt, H., J. Dongarra, M. Kreutzer, G. Wellein, and M. Kohler, “Efficiency of General Krylov Methods on GPUs – An Experimental Study,” The Sixth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), Chicago, IL, IEEE, May 2016.

(285.28 KB)

Anzt, H., P. Luszczek, J. Dongarra, and V. Heuveline, “GPU-Accelerated Asynchronous Error Correction for Mixed Precision Iterative Refinement,” EuroPar 2012 (also LAWN 260), Rhodes Island, Greece, August 2012.

(662.98 KB)

Anzt, H., M. Baboulin, J. Dongarra, Y. Fournier, F. Hulsemann, A. Khabou, and Y. Wang, “Accelerating the Conjugate Gradient Algorithm with GPU in CFD Simulations,” VECPAR, 2016.

Anzt, H., G. Collins, J. Dongarra, G. Flegar, and E. S. Quintana-Orti, “Flexible Batched Sparse Matrix-Vector Product on GPUs,” 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA '17), Denver, CO, ACM Press, November 2017.

(583.4 KB)

Anzt, H., A. Huebl, and X. S. Li, “Then and Now: Improving Software Portability, Productivity, and 100× Performance,” Computing in Science & Engineering, pp. 1 - 10, April 2024.

Anzt, H., E. Ponce, G. D. Peterson, and J. Dongarra, “GPU-accelerated Co-design of Induced Dimension Reduction: Algorithmic Fusion and Kernel Overlap,” 2nd International Workshop on Hardware-Software Co-Design for High Performance Computing, Austin, TX, ACM, November 2015.

(1.46 MB)

Anzt, H., Y. Chen Chen, T. Cojean, J. Dongarra, G. Flegar, P. Nayak, E. S. Quintana-Orti, Y. M. Tsai, and W. Wang, “Towards Continuous Benchmarking,” Platform for Advanced Scientific Computing Conference (PASC 2019), Zurich, Switzerland, ACM Press, June 2019.

(1.51 MB)

Anzt, H., E. Chow, and J. Dongarra, “On block-asynchronous execution on GPUs,” LAPACK Working Note, no. 291, November 2016.

(1.05 MB)

Anzt, H., M. Kreutzer, E. Ponce, G. D. Peterson, G. Wellein, and J. Dongarra, “Optimization and Performance Evaluation of the IDR Iterative Krylov Solver on GPUs,” The International Journal of High Performance Computing Applications, vol. 32, no. 2, pp. 220–230, March 2018.

(2.08 MB)

Anzt, H., and J. Dongarra, “A Jaccard Weights Kernel Leveraging Independent Thread Scheduling on GPUs,” SBAC-PAD, Lyon, France, IEEE, 2018.

(237.68 KB)

Anzt, H., S. Tomov, M. Gates, J. Dongarra, and V. Heuveline, Block-asynchronous Multigrid Smoothers for GPU-accelerated Systems , no. UT-CS-11-689, December 2011.

(608.95 KB)

Anzt, H., J. Dongarra, G. Flegar, and E. S. Quintana-Orti, “Batched Gauss-Jordan Elimination for Block-Jacobi Preconditioner Generation on GPUs,” Proceedings of the 8th International Workshop on Programming Models and Applications for Multicores and Manycores, New York, NY, USA, ACM, pp. 1–10, February 2017.

(552.62 KB)

Anzt, H., G. Collins, J. Dongarra, G. Flegar, and E. S. Quintana-Orti, Flexible Batched Sparse Matrix Vector Product on GPUs , Denver, Colorado, ScalA'17: 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, November 2017.

(16.8 MB)

Anzt, H., J. Dongarra, M. Kreutzer, G. Wellein, and M. Kohler, “Efficiency of General Krylov Methods on GPUs – An Experimental Study,” 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 683-691, May 2016.

Anzt, H., T. Cojean, Y-C. Chen, F. Goebel, T. Gruetzmacher, P. Nayak, T. Ribizel, Y-H. Tsai, and J. Dongarra, Ginkgo: A Node-Level Sparse Linear Algebra Library for HPC (Poster) , Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.

(699 KB)

Anzt, H., T. Cojean, and E. Kuhn, “Towards a New Peer Review Concept for Scientific Computing ensuring Technical Quality, Software Sustainability, and Result Reproducibility,” Proceedings in Applied Mathematics and Mechanics, vol. 19, issue 1, November 2019.

Anzt, H., J. Dongarra, and E. S. Quintana-Orti, “Adaptive Precision Solvers for Sparse Linear Systems,” 3rd International Workshop on Energy Efficient Supercomputing (E2SC '15), Austin, TX, ACM, November 2015.

Anzt, H., G. Flegar, T. Gruetzmacher, and E. S. Quintana-Orti, “Toward a Modular Precision Ecosystem for High-Performance Computing,” The International Journal of High Performance Computing Applications, vol. 33, issue 6, pp. 1069-1078, November 2019.

(1.93 MB)

Anzt, H., B. Haugen, J. Kurzak, P. Luszczek, and J. Dongarra, “Experiences in autotuning matrix multiplication for energy minimization on GPUs,” Concurrency in Computation: Practice and Experience, vol. 27, issue 17, pp. 5096-5113, December 2015.

(1.98 MB)

Anzt, H., E. Chow, and J. Dongarra, “ParILUT - A New Parallel Threshold ILU,” SIAM Journal on Scientific Computing, vol. 40, issue 4: SIAM, pp. C503–C519, July 2018.

(19.26 MB)

Antoniu, G., A. Costan, O. Marcu, M. S. Pérez, N. Stojanovic, R. M. Badia, M. Vázquez, S. Girona, M. Beck, T. Moore, et al., “A Collection of White Papers from the BDEC2 Workshop in Poznan, Poland,” Innovative Computing Laboratory Technical Report, no. ICL-UT-19-10: University of Tennessee, Knoxville, May 2019.

(5.82 MB)

Angskun, T., G. Bosilca, B. Vander Zanden, and J. Dongarra, “Optimal Routing in Binomial Graph Networks,” The International Conference on Parallel and Distributed Computing, applications and Technologies (PDCAT), Adelaide, Australia, IEEE Computer Society, December 2007.

Angskun, T., G. Bosilca, and J. Dongarra, “Self-Healing in Binomial Graph Networks,” 2nd International Workshop On Reliability in Decentralized Distributed Systems (RDDS 2007), Vilamoura, Algarve, Portugal, November 2007.

(322.39 KB)

Angskun, T., G. Fagg, G. Bosilca, J. Pjesivac–Grbovic, and J. Dongarra, “Scalable Fault Tolerant Protocol for Parallel Runtime Environments,” 2006 Euro PVM/MPI, no. ICL-UT-06-12, Bonn, Germany, 00 2006.

(149.07 KB)

Angskun, T., G. Fagg, G. Bosilca, J. Pjesivac–Grbovic, and J. Dongarra, “Self-Healing Network for Scalable Fault-Tolerant Runtime Environments,” Future Generation Computer Systems, vol. 26, no. 3, pp. 479-485, March 2010.

(1.54 MB)

Angskun, T., G. Bosilca, and J. Dongarra, “Binomial Graph: A Scalable and Fault- Tolerant Logical Network Topology,” Proceedings of The Fifth International Symposium on Parallel and Distributed Processing and Applications (ISPA07), Niagara Falls, Canada, Springer, August 2007.

(480.47 KB)

Angskun, T., G. Fagg, G. Bosilca, J. Pjesivac–Grbovic, and J. Dongarra, “Self-Healing Network for Scalable Fault Tolerant Runtime Environments,” DAPSYS 2006, 6th Austrian-Hungarian Workshop on Distributed and Parallel Systems, Innsbruck, Austria, January 2006.

(162.83 KB)

Angskun, T., G. Bosilca, G. Fagg, J. Pjesivac–Grbovic, and J. Dongarra, “Reliability Analysis of Self-Healing Network using Discrete-Event Simulation,” Proceedings of Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid '07): IEEE Computer Society, pp. 437-444, May 2007.

Andersson, U., and P. Mucci, “Analysis and Optimization of Yee_Bench using Hardware Performance Counters,” Proceedings of Parallel Computing 2005 (ParCo), Malaga, Spain, January 2005.

(72.27 KB)

Anderson, E., Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, et al., “LAPACK Users' Guide, 3rd ed.,” Philadelphia: Society for Industrial and Applied Mathematics, January 1999.

Alvaro, W., J. Kurzak, and J. Dongarra, “Fast and Small Short Vector SIMD Matrix Multiplication Kernels for the CELL Processor,” University of Tennessee Computer Science Technical Report, no. UT-CS-08-609, (also LAPACK Working Note 189), January 2008.

(500.99 KB)

Main menu

Pages