Publications
XaaS: Acceleration as a Service to Enable Productive High-Performance Cloud Computing
: arXiv, January 2024.
Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers
: 2010 Symposium on Application Accelerators in High-Performance Computing (SAAHPC'10), Tutorial, July 2010.
Accelerating Linear Algebra with MAGMA
, Knoxville, TN, ECP Annual Meeting 2018, Tutorial, February 2018.
Accelerating Tensor Contractions in High-Order FEM with MAGMA Batched
, Atlanta, GA, SIAM Conference on Computer Science and Engineering (SIAM CSE17), Presentation, March 2017.
Autotuning Dense Linear Algebra Libraries on GPUs
, Basel, Switzerland, Sixth International Workshop on Parallel Matrix Algorithms and Applications (PMAA 2010), June 2010.
Comparing performance of s-step and pipelined GMRES on distributed-memory multicore CPUs
, Pittsburgh, Pennsylvania, SIAM Annual Meeting, July 2017.
Dense Linear Algebra Solvers for Multicore with GPU Accelerators
, Atlanta, GA, International Parallel and Distributed Processing Symposium (IPDPS 2010), April 2010.
On the Design, Autotuning, and Optimization of GPU Kernels for Kinetic Network Simulations Using Fast Explicit Integration and GPU Batched Computation
, Oak Ridge, TN, Joint Institute for Computational Sciences Seminar Series, Presentation, September 2015.
Does your tool support PAPI SDEs yet?
, Tahoe City, CA, 13th Scalable Tools Workshop, July 2019.
Flexible Batched Sparse Matrix Vector Product on GPUs
, Denver, Colorado, ScalA'17: 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, November 2017.
The Future of Computing: Software Libraries
, Savannah, GA, DOD CREATE Developers' Review, Keynote Presentation, February 2012.
How to Build Your Own Deep Neural Network
: PEARC20, July 2020.
Integrating Deep Learning in Domain Science at Exascale (MagmaDNN)
, virtual, DOD HPCMP seminar, December 2020.
An Introduction to the MAGMA project - Acceleration of Dense Linear Algebra
: NVIDIA Webinar, June 2010.
Linear Algebra Preparation for Emergent Neural Network Architectures: MAGMA, BLAS, and Batched GPU Computing
, Virtual, LAPENNA Workshop, November 2021.
Linear Algebra Software for High-Performance Computing (Part 2: Software for Hardware Accelerators and Coprocessors)
, Frankfurt, Germany, ISC High Performance (ISC18), Tutorial Presentation, June 2015.
MAGMA: A Breakthrough in Solvers for Eigenvalue Problems
, San Jose, CA, GPU Technology Conference (GTC12), Presentation, May 2012.
MAGMA: A New Generation of Linear Algebra Library for GPU and Multicore Architectures
, Salt Lake City, UT, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC12), Presentation, November 2012.
MAGMA: Evolution and Revolution
, Knoxville, TN, ICL Lunch Talk Seminar, July 2021.
MAGMA - LAPACK for GPUs
, Atlanta, GA, Keeneland GPU Tutorial, April 2011.
MAGMA - LAPACK for HPC on Heterogeneous Architectures
, Oak Ridge, TN, Titan Summit at Oak Ridge National Laboratory, Presentation, August 2011.
MAGMA MIC: Linear Algebra Library for Intel Xeon Phi Coprocessors
, Salt Lake City, UT, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC12), November 2012.
MAGMA MIC: Optimizing Linear Algebra for Intel Xeon Phi
, Frankfurt, Germany, ISC High Performance (ISC15), Intel Booth Presentation, June 2015.
MAGMA Tensors and Batched Computing for Accelerating Applications on GPUs
, San Jose, CA, GPU Technology Conference (GTC17), Presentation in Session S7728, May 2017.
MAGMA Tutorial
, Atlanta, GA, Keeneland Workshop, February 2012.
MagmaDNN 0.2 High-Performance Data Analytics for Manycore GPUs and CPUs
: University of Tennessee, January 2019.
MagmaDNN – High-Performance Data Analytics for Manycore GPUs and CPUs
, Knoxville, TN, 2017 Summer Research Experiences for Undergraduate (REU), Presentation, December 2017.
Matrix Algebra on GPU and Multicore Architectures
, Basel, Switzerland, Workshop on GPU-enabled Numerical Libraries, Presentation, May 2011.
Memory Traffic and Complete Application Profiling with PAPI Multi-Component Measurements
, St. Petersburg, FL, 28th HIPS Workshop, May 2023.
PAPI: Counting outside the Box
, Barcelona, Spain, 8th JLESC Meeting, April 2018.
PAPI's new Software-Defined Events for in-depth Performance Analysis
, Dresden, Germany, 13th Parallel Tools Workshop, September 2019.
PAPI's New Software-Defined Events for In-Depth Performance Analysis
, Lyon, France, CCDSC 2018: Workshop on Clusters, Clouds, and Data for Scientific Computing, September 2018.
Power-Aware HPC on Intel Xeon Phi KNL Processors
, Frankfurt, Germany, ISC High Performance (ISC17), Intel Booth Presentation, June 2017.
Production Implementations of Pipelined & Communication-Avoiding Iterative Linear Solvers
, Tokyo, Japan, SIAM Conference on Parallel Processing for Scientific Computing, March 2018.
SLATE: Design of a Modern Distributed and Accelerated Linear Algebra Library
, Denver, CO, International Conference for High Performance Computing, Networking, Storage and Analysis (SC19), November 2019.
SLATE Tutorial
, Houston, TX, 2020 ECP Annual Meeting, February 2020.
Software-Defined Events through PAPI for In-Depth Analysis of Application Performance
, Basel, Switzerland, 5th Platform for Advanced Scientific Computing Conference (PASC18), July 2018.
Understanding Native Event Semantics
, Knoxville, TN, 9th JLESC Workshop, April 2019.
What it Takes to keep PAPI Instrumental for the HPC Community
, Collegeville, MN, The 2019 Collegeville Workshop on Sustainable Scientific Software (CW3S19), July 2019.
Is your scheduling good? How would you know?
, Bordeaux, France, 14th Scheduling for Large Scale Systems Workshop, June 2019.
Batched BLAS (Basic Linear Algebra Subprograms) 2018 Specification
, July 2018.
“Reinventing High Performance Computing: Challenges and Opportunities,”
ICL Technical Report, no. ICL-UT-22-03, March 2022.
“Revisiting I/O bandwidth-sharing strategies for HPC applications,”
INRIA Research Report, no. RR-9502: INRIA, March 2023.
“2016 Dense Linear Algebra Software Packages Survey,”
University of Tennessee Computer Science Technical Report, no. UT-EECS-16-744 / LAWN 290: University of Tennessee, September 2016.
“Accelerating the LOBPCG method on GPUs using a blocked Sparse Matrix Vector Product,”
University of Tennessee Computer Science Technical Report, no. UT-EECS-14-731: University of Tennessee, October 2014.
“Accelerating the Reduction to Upper Hessenberg Form through Hybrid GPU-Based Computing,”
University of Tennessee Computer Science Technical Report, UT-CS-09-642 (also LAPACK Working Note 219), May 2009.
“Achieving Numerical Accuracy and High Performance using Recursive Tile LU Factorization,”
University of Tennessee Computer Science Technical Report (also as a LAWN), no. ICL-UT-11-08, September 2011.
“Algorithm-Based Checkpoint-Free Fault Tolerance for Parallel Matrix Computations on Volatile Resources,”
University of Tennessee Computer Science Department Technical Report, no. UT-CS-05-561, November 2005.
“Algorithm-based Fault Tolerance for Dense Matrix Factorizations,”
University of Tennessee Computer Science Technical Report, no. UT-CS-11-676, Knoxville, TN, August 2011.
“Algorithmic Based Fault Tolerance Applied to High Performance Computing,”
University of Tennessee Computer Science Technical Report, UT-CS-08-620 (also LAPACK Working Note 205), January 2008.
(313.55 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)