Publications
Design and Implementation for FFT-ECP on Distributed Accelerated Systems,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-19-05: University of Tennessee, April 2019.
(3.19 MB)
“Data Movement Interfaces to Support Dataflow Runtimes,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-18-03: University of Tennessee, May 2018.
(210.94 KB)
“DAGuE: A generic distributed DAG engine for high performance computing,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-10-01, April 2010.
(830.85 KB)
“Context Identifier Allocation in Open MPI,”
University of Tennessee Computer Science Technical Report, no. ICL-UT-16-01: Innovative Computing Laboratory, University of Tennessee, January 2016.
(490.89 KB)
“Constructing resiliant communication infrastructure for runtime environments,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-09-02, July 2009.
(463.71 KB)
“Condition Numbers of Gaussian Random Matrices,”
University of Tennessee Computer Science Department Technical Report, vol. –04-539, 00 2005.
(186.46 KB)
“Computing the Conditioning of the Components of a Linear Least Squares Solution,”
University of Tennessee Computer Science Technical Report, no. UT-CS-07-604, (also LAPACK Working Note 193), January 2007.
(374.97 KB)
“A Comparison of Parallel Solvers for General Narrow Banded Linear Systems (LAPACK Working Note 142),”
University of Tennessee Computer Science Technical Report, no. UT-CS-99-414, January 1999.
(304.96 KB)
“A Comparison of Parallel Solvers for Diagonally Dominant and General Narrow Banded Linear Systems II (LAPACK Working Note 143),”
University of Tennessee Computer Science Department Technical Report, no. UT-CS-99-415, January 1999.
(174.46 KB)
“Communication Avoiding LU with Tournament Pivoting in SLATE,”
SLATE Working Notes, no. 18, ICL-UT-22-01, January 2022.
(3.74 MB)
“clMAGMA: High Performance Dense Linear Algebra with OpenCL,”
University of Tennessee Technical Report (Lawn 275), no. UT-CS-13-706: University of Tennessee, March 2013.
(526.6 KB)
“A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures,”
University of Tennessee Computer Science Technical Report, no. UT-CS-07-600 (also LAPACK Working Note 191), January 2007.
(274.74 KB)
“CEED ECP Milestone Report: Performance Tuning of CEED Software and 1st and 2nd Wave Apps
: Zenodo, October 2019.
(8.31 MB)
CEED ECP Milestone Report: Improve Performance and Capabilities of CEED-Enabled ECP Applications on Summit/Sierra,”
ECP Milestone Reports: Zenodo, May 2020.
(28.12 MB)
“The Case for Directive Programming for Accelerator Autotuner Optimization,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-17-07: University of Tennessee, October 2017.
(341.52 KB)
“C++ API for BLAS and LAPACK,”
SLATE Working Notes, no. 02, ICL-UT-17-03: Innovative Computing Laboratory, University of Tennessee, June 2017.
(1.12 MB)
“C++ API for Batch BLAS,”
SLATE Working Notes, no. 04, ICL-UT-17-12: University of Tennessee, December 2017.
(1.89 MB)
“A Block-Asynchronous Relaxation Method for Graphics Processing Units,”
University of Tennessee Computer Science Technical Report, no. UT-CS-11-687 / LAWN 258, November 2011.
(1.08 MB)
“On block-asynchronous execution on GPUs,”
LAPACK Working Note, no. 291, November 2016.
(1.05 MB)
“Autotuning GEMMs for Fermi,”
University of Tennessee Computer Science Technical Report, UT-CS-11-671, (also Lawn 245), April 2011.
(397.45 KB)
“Automated Empirical Tuning of a Multiresolution Analysis Kernel,”
ICL Technical Report, no. ICL-UT-07-01, pp. 10, January 2007.
(120.7 KB)
“Automated Empirical Optimizations of Software and the ATLAS Project (LAPACK Working Note 147),”
University of Tennessee Computer Science Department Technical Report,, no. UT-CS-00-448, September 2000.
(373.69 KB)
“ATLAS on the BlueGene/L – Preliminary Results,”
ICL Technical Report, no. ICL-UT-06-10, January 2006.
(46.19 KB)
“Asynchronous SGD for DNN Training on Shared-Memory Parallel Architectures,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-20-04: University of Tennessee, Knoxville, March 2020.
(188.51 KB)
“An Asynchronous Algorithm on NetSolve Global Computing System,”
PRiSM - Laboratoire de recherche en informatique, Université de Versailles St-Quentin Technical Report, March 2004.
(377.33 KB)
“Assessing the impact of ABFT and Checkpoint composite strategies,”
University of Tennessee Computer Science Technical Report, no. ICL-UT-13-03, 2013.
(968.47 KB)
“ASCR@40: Highlights and Impacts of ASCR’s Programs
: US Department of Energy’s Office of Advanced Scientific Computing Research, June 2020.
ASCR@40: Four Decades of Department of Energy Leadership in Advanced Scientific Computing Research
: Advanced Scientific Computing Advisory Committee (ASCAC), US Department of Energy, August 2020.
Analytical Modeling for Affinity-Based Thread Scheduling on Multicore Platforms,”
University of Tennessee Computer Science Technical Report, UT-CS-08-626, January 2008.
(650.75 KB)
“Analysis of Various Scalar, Vector, and Parallel Implementations of RandomAccess,”
Innovative Computing Laboratory (ICL) Technical Report, no. ICL-UT-10-03, June 2010.
(226.9 KB)
“Analysis of the Communication and Computation Cost of FFT Libraries towards Exascale,”
ICL Technical Report, no. ICL-UT-22-07: Innovative Computing Laboratory, July 2022.
(5.91 MB)
“Analysis of Dynamically Scheduled Tile Algorithms for Dense Linear Algebra on Multicore Architectures,”
University of Tennessee Computer Science Technical Report, UT-CS-11-666, (also Lawn 243), March 2011.
(1.65 MB)
“Algorithms and Optimization Techniques for High-Performance Matrix-Matrix Multiplications of Very Small Matrices,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-18-09: Innovative Computing Laboratory, University of Tennessee, September 2018.
(3.74 MB)
“On Algorithmic Variants of Parallel Gaussian Elimination: Comparison of Implementations in Terms of Performance and Numerical Properties,”
University of Tennessee Computer Science Technical Report, no. UT-CS-13-715, July 2013, 2012.
(358.98 KB)
“Algorithmic Based Fault Tolerance Applied to High Performance Computing,”
University of Tennessee Computer Science Technical Report, UT-CS-08-620 (also LAPACK Working Note 205), January 2008.
(313.55 KB)
“Algorithm-based Fault Tolerance for Dense Matrix Factorizations,”
University of Tennessee Computer Science Technical Report, no. UT-CS-11-676, Knoxville, TN, August 2011.
(865.79 KB)
“Algorithm-Based Checkpoint-Free Fault Tolerance for Parallel Matrix Computations on Volatile Resources,”
University of Tennessee Computer Science Department Technical Report, vol. –05-561, November 2005.
(266.54 KB)
“Achieving Numerical Accuracy and High Performance using Recursive Tile LU Factorization,”
University of Tennessee Computer Science Technical Report (also as a LAWN), no. ICL-UT-11-08, September 2011.
(618.53 KB)
“Accelerating the Reduction to Upper Hessenberg Form through Hybrid GPU-Based Computing,”
University of Tennessee Computer Science Technical Report, UT-CS-09-642 (also LAPACK Working Note 219), May 2009.
(2.37 MB)
“Accelerating the LOBPCG method on GPUs using a blocked Sparse Matrix Vector Product,”
University of Tennessee Computer Science Technical Report, no. UT-EECS-14-731: University of Tennessee, October 2014.
(1.83 MB)
“2016 Dense Linear Algebra Software Packages Survey,”
University of Tennessee Computer Science Technical Report, no. UT-EECS-16-744 / LAWN 290: University of Tennessee, September 2016.
(366.43 KB)
“Reinventing High Performance Computing: Challenges and Opportunities,”
ICL Technical Report, no. ICL-UT-22-03, March 2022.
(1.36 MB)
“Batched BLAS (Basic Linear Algebra Subprograms) 2018 Specification
, July 2018.
(483.05 KB)
Is your scheduling good? How would you know?
, Bordeaux, France, 14th Scheduling for Large Scale Systems Workshop, June 2019.
(2.5 MB)
What it Takes to keep PAPI Instrumental for the HPC Community
, Collegeville, MN, The 2019 Collegeville Workshop on Sustainable Scientific Software (CW3S19), July 2019.
(3.29 MB)
Understanding Native Event Semantics
, Knoxville, TN, 9th JLESC Workshop, April 2019.
(2.33 MB)
Software-Defined Events through PAPI for In-Depth Analysis of Application Performance
, Basel, Switzerland, 5th Platform for Advanced Scientific Computing Conference (PASC18), July 2018.
SLATE Tutorial
, Houston, TX, 2020 ECP Annual Meeting, February 2020.
(12.14 MB)
SLATE: Design of a Modern Distributed and Accelerated Linear Algebra Library
, Denver, CO, International Conference for High Performance Computing, Networking, Storage and Analysis (SC19), November 2019.
(16.19 MB)
Power-Aware HPC on Intel Xeon Phi KNL Processors
, Frankfurt, Germany, ISC High Performance (ISC17), Intel Booth Presentation, June 2017.
(5.87 MB)