Publications
“The Impact of Multicore on Computational Science Software,”
CTWatch Quarterly, vol. 3, issue 1, February 2007.
“The Impact of Multicore on Math Software,”
PARA 2006, Umea, Sweden, June 2006.
“An Improved MAGMA GEMM for Fermi GPUs,”
International Journal of High Performance Computing, vol. 24, no. 4, pp. 511-515, 2010.
“Incomplete Sparse Approximate Inverses for Parallel Preconditioning,”
Parallel Computing, vol. 71, pp. 1–22, January 2018.
DOI: 10.1016/j.parco.2017.10.003
“The International Exascale Software Project Roadmap,”
International Journal of High Performance Computing, vol. 25, no. 1, pp. 3-60, January 2011.
DOI: 10.1177/1094342010391989
“An international survey on MPI users,”
Parallel Computing, vol. 108, December 2021.
DOI: 10.1016/j.parco.2021.102853
“Investigating Power Capping toward Energy-Efficient Scientific Applications,”
Concurrency and Computation: Practice and Experience, vol. 2018, issue e4485, pp. 1-14, April 2018.
DOI: 10.1002/cpe.4485
“Keeneland: Bringing Heterogeneous GPU Computing to the Computational Science Community,”
IEEE Computing in Science & Engineering, vol. 13, issue 5, pp. 90-95, August 2011.
DOI: 10.1109/MCSE.2011.83
“libCEED: Fast algebra for high-order element-based discretizations,”
Journal of Open Source Software, vol. 6, no. 63, pp. 2945, 2021.
DOI: 10.21105/joss.02945
“Linear Algebra Software for Large-Scale Accelerated Multicore Computing,”
Acta Numerica, vol. 25, pp. 1-160, May 2016.
DOI: 10.1017/S0962492916000015
“Load-Balancing Sparse Matrix Vector Product Kernels on GPUs,”
ACM Transactions on Parallel Computing, vol. 7, issue 1, March 2020.
DOI: 10.1145/3380930
“Local Rollback for Resilient MPI Applications with Application-Level Checkpointing and Message Logging,”
Future Generation Computer Systems, vol. 91, pp. 450-464, February 2019.
DOI: 10.1016/j.future.2018.09.041
“A Look Back on 30 Years of the Gordon Bell Prize,”
International Journal of High Performance Computing and Networking, vol. 31, issue 6, pp. 469–484, 2017.
“LU Factorization for Accelerator-Based Systems,”
IEEE/ACS AICCSA 2011, Sharm-El-Sheikh, Egypt, December 2011.
“LU Factorization with Partial Pivoting for a Multicore System with Accelerators,”
IEEE Transactions on Parallel and Distributed Systems, vol. 24, issue 8, pp. 1613-1621, August 2013.
DOI: 10.1109/TPDS.2012.242
“MAGMA: Enabling exascale performance with accelerated BLAS and LAPACK for diverse GPU architectures,”
The International Journal of High Performance Computing Applications, June 2024.
DOI: 10.1177/10943420241261960
“MAGMA Templates for Scalable Linear Algebra on Emerging Architectures,”
The International Journal of High Performance Computing Applications, vol. 34, issue 6, pp. 645-658, November 2020.
DOI: 10.1177/1094342020938421
“