Publications
Autotuning GEMMs for Fermi,”
University of Tennessee Computer Science Technical Report, UT-CS-11-671, (also Lawn 245), April 2011.
(397.45 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
C++ API for Batch BLAS,”
SLATE Working Notes, no. 04, ICL-UT-17-12: University of Tennessee, December 2017.
(1.89 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
C++ API for BLAS and LAPACK,”
SLATE Working Notes, no. 02, ICL-UT-17-03: Innovative Computing Laboratory, University of Tennessee, June 2017.
(1.12 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
The Case for Directive Programming for Accelerator Autotuner Optimization,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-17-07: University of Tennessee, October 2017.
(341.52 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
CEED ECP Milestone Report: Improve Performance and Capabilities of CEED-Enabled ECP Applications on Summit/Sierra,”
ECP Milestone Reports: Zenodo, May 2020.
DOI: 10.5281/zenodo.3860804
(28.12 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
CEED ECP Milestone Report: Performance Tuning of CEED Software and 1st and 2nd Wave Apps
: Zenodo, October 2019.
DOI: 10.5281/zenodo.3477618
(8.31 MB)
![application/pdf](/modules/file/icons/application-pdf.png)
Communication Avoiding LU with Tournament Pivoting in SLATE,”
SLATE Working Notes, no. 18, ICL-UT-22-01, January 2022.
(3.74 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Context Identifier Allocation in Open MPI,”
University of Tennessee Computer Science Technical Report, no. ICL-UT-16-01: Innovative Computing Laboratory, University of Tennessee, January 2016.
(490.89 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Data Movement Interfaces to Support Dataflow Runtimes,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-18-03: University of Tennessee, May 2018.
(210.94 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Design and Implementation for FFT-ECP on Distributed Accelerated Systems,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-19-05: University of Tennessee, April 2019.
(3.19 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Design for a Soft Error Resilient Dynamic Task-based Runtime,”
ICL Technical Report, no. ICL-UT-14-04: University of Tennessee, November 2014.
(2.61 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Design, Optimization, and Benchmarking of Dense Linear Algebra Algorithms on AMD GPUs,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-20-12: University of Tennessee, August 2020.
(476.36 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Designing SLATE: Software for Linear Algebra Targeting Exascale,”
SLATE Working Notes, no. 03, ICL-UT-17-06: Innovative Computing Laboratory, University of Tennessee, October 2017.
(2.8 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Distributed Dense Numerical Linear Algebra Algorithms on Massively Parallel Architectures: DPLASMA,”
University of Tennessee Computer Science Technical Report, UT-CS-10-660, September 2010.
(366.26 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Distributed Termination Detection for HPC Task-Based Environments,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-18-14: University of Tennessee, June 2018.
“Distributed-Memory Task Execution and Dependence Tracking within DAGuE and the DPLASMA Project,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-10-02, 00 2010.
(400.75 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
An efficient distributed randomized solver with application to large dense linear systems,”
ICL Technical Report, no. ICL-UT-12-02, July 2012.
(626.26 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Efficient Support for Matrix Computations on Heterogeneous Multi-core and Multi-GPU Architectures,”
University of Tennessee Computer Science Technical Report, UT-CS-11-668, (also Lawn 250), June 2011.
(5.93 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
An Empirical View of SLATE Algorithms on Scalable Hybrid System,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-19-08: University of Tennessee, Knoxville, September 2019.
(441.16 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Evaluation and Design of FFT for Distributed Accelerated Systems,”
ECP WBS 2.3.3.09 Milestone Report, no. FFT-ECP ST-MS-10-1216: Innovative Computing Laboratory, University of Tennessee, October 2018.
(7.53 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Fast and Small Short Vector SIMD Matrix Multiplication Kernels for the CELL Processor,”
University of Tennessee Computer Science Technical Report, no. UT-CS-08-609, (also LAPACK Working Note 189), January 2008.
(500.99 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Faster, Cheaper, Better - A Hybridization Methodology to Develop Linear Algebra Software for GPUs,”
LAPACK Working Note, no. 230, 00 2010.
(334.48 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
FFT Benchmark Performance Experiments on Systems Targeting Exascale,”
ICL Technical Report, no. ICL-UT-22-02, March 2022.
(5.87 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
FFT-ECP API and High-Performance Library Prototype for 2-D and 3-D FFTs on Large-Scale Heterogeneous Systems with GPUs,”
ECP Milestone Report, no. FFT-ECP STML13-27: Innovative Computing Laboratory, University of Tennessee, January 2020.
(9.71 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
FFT-ECP Implementation Optimizations and Features Phase,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-19-12: University of Tennessee, October 2019.
(4.14 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Formulation of Requirements for New PAPI++ Software Package: Part I: Survey Results,”
PAPI++ Working Notes, no. 1, ICL-UT-20-02: Innovative Computing Laboratory, University of Tennessee Knoxville, January 2020.
(1.49 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)