Publications
Export 190 results:
Filters: First Letter Of Last Name is A [Clear All Filters]
Scheduling Cholesky Factorization on Multicore Architectures with GPU Accelerators
, Knoxville, TN, 2010 Symposium on Application Accelerators in High-Performance Computing (SAAHPC'10), Poster, July 2010.
(3.86 MB)

Numerical Linear Algebra on Emerging Architectures: The PLASMA and MAGMA Projects
, Portland, OR, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC09), November 2009.
(3.53 MB)

LU Factorization for Accelerator-Based Systems,”
IEEE/ACS AICCSA 2011, Sharm-El-Sheikh, Egypt, December 2011.
(234.86 KB)
“
Faster, Cheaper, Better - A Hybridization Methodology to Develop Linear Algebra Software for GPUs,”
LAPACK Working Note, no. 230, 00 2010.
(334.48 KB)
“
A Hybridization Methodology for High-Performance Linear Algebra Software for GPUs,”
in GPU Computing Gems, Jade Edition, vol. 2: Elsevier, pp. 473-484, 00 2011.
“Resiliency in numerical algorithm design for extreme scale simulations,”
The International Journal of High Performance Computing Applications, vol. 36371337212766180823, issue 2, pp. 251 - 285, March 2022.
DOI: 10.1177/10943420211055188
“Numerical Linear Algebra on Emerging Architectures: The PLASMA and MAGMA Projects,”
Journal of Physics: Conference Series, vol. 180, 00 2009.
(119.37 KB)
“
QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators,”
Proceedings of IPDPS 2011, no. ICL-UT-10-04, Anchorage, AK, October 2010.
(468.17 KB)
“
A Collection of White Papers from the BDEC2 Workshop in Bloomington, IN,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-18-15: University of Tennessee, Knoxville, November 2018.
(9.26 MB)
“
SLATE Performance Improvements: QR and Eigenvalues,”
SLATE Working Notes, no. 17, ICL-UT-21-02, April 2021.
(2 MB)
“
Compressed basis GMRES on high-performance graphics processing units,”
The International Journal of High Performance Computing Applications, May 2022.
DOI: 10.1177/10943420221115140
(13.52 MB)
“
Compression and load balancing for efficient sparse matrix‐vector product on multicore processors and graphics processing units,”
Concurrency and Computation: Practice and Experience, vol. 34, issue 14, June 2022.
DOI: 10.1002/cpe.6515
(749.82 KB)
“
Sparse matrix-vector and matrix-multivector products for the truncated SVD on graphics processors,”
Concurrency and Computation: Practice and Experience, August 2023.
DOI: 10.1002/cpe.7871
“Communication Avoiding LU with Tournament Pivoting in SLATE,”
SLATE Working Notes, no. 18, ICL-UT-22-01, January 2022.
(3.74 MB)
“
A Collection of White Papers from the BDEC2 Workshop in San Diego, CA,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-19-13: University of Tennessee, October 2019.
(8.25 MB)
“
Fast and Small Short Vector SIMD Matrix Multiplication Kernels for the CELL Processor,”
University of Tennessee Computer Science Technical Report, no. UT-CS-08-609, (also LAPACK Working Note 189), January 2008.
(500.99 KB)
“
Analysis and Optimization of Yee_Bench using Hardware Performance Counters,”
Proceedings of Parallel Computing 2005 (ParCo), Malaga, Spain, January 2005.
(72.27 KB)
“
A Collection of White Papers from the BDEC2 Workshop in Poznan, Poland,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-19-10: University of Tennessee, Knoxville, May 2019.
(5.82 MB)
“
Ginkgo: A Node-Level Sparse Linear Algebra Library for HPC (Poster)
, Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
(699 KB)

Variable-Size Batched Gauss-Jordan Elimination for Block-Jacobi Preconditioning on Graphics Processors,”
Parallel Computing, vol. 81, pp. 131-146, January 2019.
DOI: 10.1016/j.parco.2017.12.006
(1.9 MB)
“
Towards Continuous Benchmarking,”
Platform for Advanced Scientific Computing Conference (PASC 2019), Zurich, Switzerland, ACM Press, June 2019.
DOI: 10.1145/3324989.3325719
(1.51 MB)
“
Are we Doing the Right Thing? – A Critical Analysis of the Academic HPC Community,”
2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Rio de Janeiro, Brazil, IEEE, May 2019.
DOI: 10.1109/IPDPSW.2019.00122
(622.32 KB)
“
Flexible Batched Sparse Matrix Vector Product on GPUs
, Denver, Colorado, ScalA'17: 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, November 2017.
(16.8 MB)

Towards a New Peer Review Concept for Scientific Computing ensuring Technical Quality, Software Sustainability, and Result Reproducibility,”
Proceedings in Applied Mathematics and Mechanics, vol. 19, issue 1, November 2019.
DOI: 10.1002/pamm.201900490
“Load-Balancing Sparse Matrix Vector Product Kernels on GPUs,”
ACM Transactions on Parallel Computing, vol. 7, issue 1, March 2020.
DOI: 10.1145/3380930
(5.67 MB)
“