Publications
Export 63 results:
Filters: Author is Ahmad Abdelfattah [Clear All Filters]
Cholesky Factorization on Batches of Matrices with Fixed and Variable Sizes
, San Jose, CA, GPU Technology Conference (GTC16), Poster, April 2016.
(480.51 KB)

CEED ECP Milestone Report: Public release of CEED 2.0
: Zenodo, April 2019.
DOI: 10.5281/zenodo.2641316
(4.98 MB)

CEED ECP Milestone Report: Performance Tuning of CEED Software and 1st and 2nd Wave Apps
: Zenodo, October 2019.
DOI: 10.5281/zenodo.3477618
(8.31 MB)

C++ API for BLAS and LAPACK,”
SLATE Working Notes, no. 02, ICL-UT-17-03: Innovative Computing Laboratory, University of Tennessee, June 2017.
(1.12 MB)
“
C++ API for Batch BLAS,”
SLATE Working Notes, no. 04, ICL-UT-17-12: University of Tennessee, December 2017.
(1.89 MB)
“
Batched One-Sided Factorizations of Tiny Matrices Using GPUs: Challenges and Countermeasures,”
Journal of Computational Science, vol. 26, pp. 226–236, May 2018.
DOI: 10.1016/j.jocs.2018.01.005
(3.73 MB)
“
Batched Matrix Computations on Hardware Accelerators Based on GPUs,”
2015 SIAM Conference on Applied Linear Algebra (SIAM LA), Atlanta, GA, SIAM, October 2015.
(9.36 MB)
“
Analyzing Performance of BiCGStab with Hierarchical Matrix on GPU Clusters,”
IEEE International Parallel and Distributed Processing Symposium (IPDPS), Vancouver, BC, Canada, IEEE, May 2018.
(1.37 MB)
“
Analysis and Design Techniques towards High-Performance and Energy-Efficient Dense Linear Solvers on GPUs,”
IEEE Transactions on Parallel and Distributed Systems, vol. 29, issue 12, pp. 2700–2712, December 2018.
DOI: 10.1109/TPDS.2018.2842785
(2.53 MB)
“
Algorithms and Optimization Techniques for High-Performance Matrix-Matrix Multiplications of Very Small Matrices,”
Parallel Computing, vol. 81, pp. 1–21, January 2019.
DOI: 10.1016/j.parco.2018.10.003
(3.27 MB)
“
Algorithms and Optimization Techniques for High-Performance Matrix-Matrix Multiplications of Very Small Matrices,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-18-09: Innovative Computing Laboratory, University of Tennessee, September 2018.
(3.74 MB)
“
Accelerating Tensor Contractions in High-Order FEM with MAGMA Batched
, Atlanta, GA, SIAM Conference on Computer Science and Engineering (SIAM CSE17), Presentation, March 2017.
(9.29 MB)

Accelerating Tensor Contractions for High-Order FEM on CPUs, GPUs, and KNLs
, Gatlinburg, TN, moky Mountains Computational Sciences and Engineering Conference (SMC16), Poster, September 2016.
(4.29 MB)
