Publications
Algorithms and Optimization Techniques for High-Performance Matrix-Matrix Multiplications of Very Small Matrices,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-18-09: Innovative Computing Laboratory, University of Tennessee, September 2018.
(3.74 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Analysis of Dynamically Scheduled Tile Algorithms for Dense Linear Algebra on Multicore Architectures,”
University of Tennessee Computer Science Technical Report, UT-CS-11-666, (also Lawn 243), March 2011.
(1.65 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Analysis of the Communication and Computation Cost of FFT Libraries towards Exascale,”
ICL Technical Report, no. ICL-UT-22-07: Innovative Computing Laboratory, July 2022.
(5.91 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Analysis of Various Scalar, Vector, and Parallel Implementations of RandomAccess,”
Innovative Computing Laboratory (ICL) Technical Report, no. ICL-UT-10-03, June 2010.
(226.9 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Analytical Modeling for Affinity-Based Thread Scheduling on Multicore Platforms,”
University of Tennessee Computer Science Technical Report, UT-CS-08-626, January 2008.
(650.75 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Analyzing PAPI Performance on Virtual Machines,”
ICL Technical Report, no. ICL-UT-13-02, August 2013.
(437.37 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
ASCR@40: Four Decades of Department of Energy Leadership in Advanced Scientific Computing Research
: Advanced Scientific Computing Advisory Committee (ASCAC), US Department of Energy, August 2020.
ASCR@40: Highlights and Impacts of ASCR’s Programs
: US Department of Energy’s Office of Advanced Scientific Computing Research, June 2020.
Assessing the impact of ABFT and Checkpoint composite strategies,”
University of Tennessee Computer Science Technical Report, no. ICL-UT-13-03, 2013.
(968.47 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
An Asynchronous Algorithm on NetSolve Global Computing System,”
PRiSM - Laboratoire de recherche en informatique, Université de Versailles St-Quentin Technical Report, March 2004.
(377.33 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Asynchronous SGD for DNN Training on Shared-Memory Parallel Architectures,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-20-04: University of Tennessee, Knoxville, March 2020.
(188.51 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
ATLAS on the BlueGene/L – Preliminary Results,”
ICL Technical Report, no. ICL-UT-06-10, January 2006.
(46.19 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Automated Empirical Optimizations of Software and the ATLAS Project (LAPACK Working Note 147),”
University of Tennessee Computer Science Department Technical Report,, no. UT-CS-00-448, September 2000.
(373.69 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Automated Empirical Tuning of a Multiresolution Analysis Kernel,”
ICL Technical Report, no. ICL-UT-07-01, pp. 10, January 2007.
(120.7 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Automatic Determination of Matrix-Blocks,”
Lapack Working Note 151, University of Tennessee Computer Science Technical Report, no. UT-CS-01-458, January 2001.
(1.15 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Autotuning GEMMs for Fermi,”
University of Tennessee Computer Science Technical Report, UT-CS-11-671, (also Lawn 245), April 2011.
(397.45 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
“BDEC Pathways to Convergence: Toward a Shaping Strategy for a Future Software and Data Ecosystem for Scientific Inquiry,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-17-08: University of Tennessee, November 2017.
BDEC2 Platform White Paper,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-19-11: University of Tennessee, September 2019.
(30.16 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Bidiagonal SVD Computation via an Associated Tridiagonal Eigenproblem,”
LAPACK Working Note, no. LAWN 295, ICL-UT-18-02: University of Tennessee, April 2018.
(1.53 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
On block-asynchronous execution on GPUs,”
LAPACK Working Note, no. 291, November 2016.
(1.05 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
A Block-Asynchronous Relaxation Method for Graphics Processing Units,”
University of Tennessee Computer Science Technical Report, no. UT-CS-11-687 / LAWN 258, November 2011.
(1.08 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
C++ API for Batch BLAS,”
SLATE Working Notes, no. 04, ICL-UT-17-12: University of Tennessee, December 2017.
(1.89 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
C++ API for BLAS and LAPACK,”
SLATE Working Notes, no. 02, ICL-UT-17-03: Innovative Computing Laboratory, University of Tennessee, June 2017.
(1.12 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
The Case for Directive Programming for Accelerator Autotuner Optimization,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-17-07: University of Tennessee, October 2017.
(341.52 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
CEED ECP Milestone Report: Improve Performance and Capabilities of CEED-Enabled ECP Applications on Summit/Sierra,”
ECP Milestone Reports: Zenodo, May 2020.
(28.12 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
CEED ECP Milestone Report: Performance Tuning of CEED Software and 1st and 2nd Wave Apps
: Zenodo, October 2019.
(8.31 MB)
![application/pdf](/modules/file/icons/application-pdf.png)
CEED ECP Milestone Report: Public release of CEED 2.0
: Zenodo, April 2019.
(4.98 MB)
![application/pdf](/modules/file/icons/application-pdf.png)
A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures,”
University of Tennessee Computer Science Technical Report, no. UT-CS-07-600 (also LAPACK Working Note 191), January 2007.
(274.74 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
clMAGMA: High Performance Dense Linear Algebra with OpenCL,”
University of Tennessee Technical Report (Lawn 275), no. UT-CS-13-706: University of Tennessee, March 2013.
(526.6 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
A Collection of Presentations from the BDEC2 Workshop in Kobe, Japan,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-19-09: University of Tennessee, Knoxville, February 2019.
(58.85 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
A Collection of White Papers from the BDEC2 Workshop in Bloomington, IN,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-18-15: University of Tennessee, Knoxville, November 2018.
(9.26 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
A Collection of White Papers from the BDEC2 Workshop in Poznan, Poland,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-19-10: University of Tennessee, Knoxville, May 2019.
(5.82 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
A Collection of White Papers from the BDEC2 Workshop in San Diego, CA,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-19-13: University of Tennessee, October 2019.
(8.25 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
On the Combination of Silent Error Detection and Checkpointing,”
UT-CS-13-710: University of Tennessee Computer Science Technical Report, June 2013.
(1.29 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Communication Avoiding LU with Tournament Pivoting in SLATE,”
SLATE Working Notes, no. 18, ICL-UT-22-01, January 2022.
(3.74 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
A Comparison of Parallel Solvers for Diagonally Dominant and General Narrow Banded Linear Systems II (LAPACK Working Note 143),”
University of Tennessee Computer Science Department Technical Report, no. UT-CS-99-415, January 1999.
(174.46 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
A Comparison of Parallel Solvers for General Narrow Banded Linear Systems (LAPACK Working Note 142),”
University of Tennessee Computer Science Technical Report, no. UT-CS-99-414, January 1999.
(304.96 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Computing the Conditioning of the Components of a Linear Least Squares Solution,”
University of Tennessee Computer Science Technical Report, no. UT-CS-07-604, (also LAPACK Working Note 193), January 2007.
(374.97 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Condition Numbers of Gaussian Random Matrices,”
University of Tennessee Computer Science Department Technical Report, vol. –04-539, 00 2005.
(186.46 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Constructing resiliant communication infrastructure for runtime environments,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-09-02, July 2009.
(463.71 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Context Identifier Allocation in Open MPI,”
University of Tennessee Computer Science Technical Report, no. ICL-UT-16-01: Innovative Computing Laboratory, University of Tennessee, January 2016.
(490.89 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
CUBE User Manual,”
ICL Technical Report, no. ICL-UT-04-01, February 2004.
(429.12 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
DAGuE: A generic distributed DAG engine for high performance computing,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-10-01, April 2010.
(830.85 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Data Movement Interfaces to Support Dataflow Runtimes,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-18-03: University of Tennessee, May 2018.
(210.94 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Dataflow Programming Paradigms for Computational Chemistry Methods,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-17-01, Knoxville, TN, University of Tennessee, May 2017.
“Design and Implementation for FFT-ECP on Distributed Accelerated Systems,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-19-05: University of Tennessee, April 2019.
(3.19 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Design and Implementation of NetSolve using DCOM as the Remoting Layer,”
University of Tennessee Computer Science Department Technical Report, no. UT-CS-00-440, May 2000.
(65.45 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Design for a Soft Error Resilient Dynamic Task-based Runtime,”
ICL Technical Report, no. ICL-UT-14-04: University of Tennessee, November 2014.
(2.61 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Design, Optimization, and Benchmarking of Dense Linear Algebra Algorithms on AMD GPUs,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-20-12: University of Tennessee, August 2020.
(476.36 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Designing LU-QR hybrid solvers for performance and stability,”
University of Tennessee Computer Science Technical Report (also LAWN 282), no. ut-eecs-13-719: University of Tennessee, October 2013.
(4.11 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)