Publications
“An Improved MAGMA GEMM for Fermi GPUs,”
International Journal of High Performance Computing Applications, vol. 24, no. 4, pp. 511-515, 2010.
“Incomplete Sparse Approximate Inverses for Parallel Preconditioning,”
Parallel Computing, vol. 71, pp. 1–22, January 2018.
DOI: 10.1016/j.parco.2017.10.003
“The International Exascale Software Project Roadmap,”
International Journal of High Performance Computing Applications, vol. 25, no. 1, pp. 3-60, January 2011.
DOI: 10.1177/1094342010391989
“Investigating Power Capping toward Energy-Efficient Scientific Applications,”
Concurrency and Computation: Practice and Experience, vol. 2018, issue e4485, pp. 1-14, April 2018.
DOI: 10.1002/cpe.4485
“Keeneland: Bringing Heterogeneous GPU Computing to the Computational Science Community,”
IEEE Computing in Science & Engineering, vol. 13, issue 5, pp. 90-95, August 2011.
DOI: 10.1109/MCSE.2011.83
“Linear Algebra Software for Large-Scale Accelerated Multicore Computing,”
Acta Numerica, vol. 25, pp. 1-160, May 2016.
DOI: 10.1017/S0962492916000015
“Load-Balancing Sparse Matrix Vector Product Kernels on GPUs,”
ACM Transactions on Parallel Computing, vol. 7, issue 1, March 2020.
DOI: 10.1145/3380930
“A Look Back on 30 Years of the Gordon Bell Prize,”
The International Journal of High Performance Computing Applications, vol. 31, issue 6, pp. 469–484, 2017.
“LU Factorization for Accelerator-Based Systems,”
IEEE/ACS AICCSA 2011, Sharm-El-Sheikh, Egypt, December 2011.
“LU Factorization with Partial Pivoting for a Multicore System with Accelerators,”
IEEE Transactions on Parallel and Distributed Systems, vol. 24, issue 8, pp. 1613-1621, August 2013.
DOI: 10.1109/TPDS.2012.242
“MAGMA Templates for Scalable Linear Algebra on Emerging Architectures,”
The International Journal of High Performance Computing Applications, vol. 34, issue 6, pp. 645-658, November 2020.
DOI: 10.1177/1094342020938421
“Matrix Multiplication on Batches of Small Matrices in Half and Half-Complex Precisions,”
Journal of Parallel and Distributed Computing, vol. 145, pp. 188-201, November 2020.
DOI: 10.1016/j.jpdc.2020.07.001
“Mixed-Precision Iterative Refinement using Tensor Cores on GPUs to Accelerate Solution of Linear Systems,”
Proceedings of the Royal Society A, vol. 476, issue 2243, November 2020.
DOI: 10.1098/rspa.2020.0110
“Multithreading in the PLASMA Library,”
Multi and Many-Core Processing: Architecture, Programming, Algorithms, & Applications: Taylor & Francis, 2013.
“NanoPSE: A Nanoscience Problem Solving Environment for Atomistic Electronic Structure of Semiconductor Nanostructures,”
Journal of Physics: Conference Series, issue 16, pp. 277-282, June 2005.
DOI: 10.1088/1742-6596/16/1/038
“