Publications
MagmaDNN – High-Performance Data Analytics for Manycore GPUs and CPUs
, Knoxville, TN, 2017 Summer Research Experiences for Undergraduate (REU), Presentation, December 2017.
(5.06 MB)
![application/pdf](/modules/file/icons/application-pdf.png)
MagmaDNN 0.2 High-Performance Data Analytics for Manycore GPUs and CPUs
: University of Tennessee, January 2019.
(7.84 MB)
![application/pdf](/modules/file/icons/application-pdf.png)
MAGMA Tutorial
, Atlanta, GA, Keeneland Workshop, February 2012.
(2.47 MB)
![application/pdf](/modules/file/icons/application-pdf.png)
MAGMA Tensors and Batched Computing for Accelerating Applications on GPUs
, San Jose, CA, GPU Technology Conference (GTC17), Presentation in Session S7728, May 2017.
(11.12 MB)
![application/pdf](/modules/file/icons/application-pdf.png)
MAGMA MIC: Optimizing Linear Algebra for Intel Xeon Phi
, Frankfurt, Germany, ISC High Performance (ISC15), Intel Booth Presentation, June 2015.
(2.03 MB)
![application/pdf](/modules/file/icons/application-pdf.png)
MAGMA MIC: Linear Algebra Library for Intel Xeon Phi Coprocessors
, Salt Lake City, UT, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC12), November 2012.
(6.4 MB)
![application/pdf](/modules/file/icons/application-pdf.png)
MAGMA - LAPACK for HPC on Heterogeneous Architectures
, Oak Ridge, TN, Titan Summit at Oak Ridge National Laboratory, Presentation, August 2011.
(20.43 MB)
![application/pdf](/modules/file/icons/application-pdf.png)
MAGMA - LAPACK for GPUs
, Atlanta, GA, Keeneland GPU Tutorial, April 2011.
(742.14 KB)
![application/pdf](/modules/file/icons/application-pdf.png)
MAGMA: Evolution and Revolution
, Knoxville, TN, ICL Lunch Talk Seminar, July 2021.
(8.88 MB)
![application/pdf](/modules/file/icons/application-pdf.png)
MAGMA: A New Generation of Linear Algebra Library for GPU and Multicore Architectures
, Salt Lake City, UT, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC12), Presentation, November 2012.
(4.69 MB)
![application/pdf](/modules/file/icons/application-pdf.png)
MAGMA: A Breakthrough in Solvers for Eigenvalue Problems
, San Jose, CA, GPU Technology Conference (GTC12), Presentation, May 2012.
(9.23 MB)
![application/pdf](/modules/file/icons/application-pdf.png)
Linear Algebra Software for High-Performance Computing (Part 2: Software for Hardware Accelerators and Coprocessors)
, Frankfurt, Germany, ISC High Performance (ISC18), Tutorial Presentation, June 2015.
(15.41 MB)
![application/pdf](/modules/file/icons/application-pdf.png)
Linear Algebra Prepara.on for Emergent Neural Network Architectures: MAGMA, BLAS, and Batched GPU Computing
, Virtual, LAPENNA Workshop, November 2021.
(17.8 MB)
![application/pdf](/modules/file/icons/application-pdf.png)
An Introduction to the MAGMA project - Acceleration of Dense Linear Algebra
: NVIDIA Webinar, June 2010.
Integrating Deep Learning in Domain Science at Exascale (MagmaDNN)
, virtual, DOD HPCMP seminar, December 2020.
(11.12 MB)
![application/pdf](/modules/file/icons/application-pdf.png)
How to Build Your Own Deep Neural Network
: PEARC20, July 2020.
(18.8 MB)
![application/pdf](/modules/file/icons/application-pdf.png)
The Future of Computing: Software Libraries
, Savannah, GA, DOD CREATE Developers' Review, Keynote Presentation, February 2012.
(6.76 MB)
![application/pdf](/modules/file/icons/application-pdf.png)
Flexible Batched Sparse Matrix Vector Product on GPUs
, Denver, Colorado, ScalA'17: 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, November 2017.
(16.8 MB)
![application/pdf](/modules/file/icons/application-pdf.png)
Does your tool support PAPI SDEs yet?
, Tahoe City, CA, 13th Scalable Tools Workshop, July 2019.
(3.09 MB)
![application/pdf](/modules/file/icons/application-pdf.png)
On the Design, Autotuning, and Optimization of GPU Kernels for Kinetic Network Simulations Using Fast Explicit Integration and GPU Batched Computation
, Oak Ridge, TN, Joint Institute for Computational Sciences Seminar Series, Presentation, September 2015.
(17.25 MB)
![application/pdf](/modules/file/icons/application-pdf.png)
Dense Linear Algebra Solvers for Multicore with GPU Accelerators
, Atlanta, GA, International Parallel and Distributed Processing Symposium (IPDPS 2010), April 2010.
(956.68 KB)
![application/pdf](/modules/file/icons/application-pdf.png)
Comparing performance of s-step and pipelined GMRES on distributed-memory multicore CPUs
, Pittsburgh, Pennsylvania, SIAM Annual Meeting, July 2017.
(748 KB)
![application/pdf](/modules/file/icons/application-pdf.png)
Autotuning Dense Linear Algebra Libraries on GPUs
, Basel, Switzerland, Sixth International Workshop on Parallel Matrix Algorithms and Applications (PMAA 2010), June 2010.
(579.44 KB)
![application/pdf](/modules/file/icons/application-pdf.png)
Accelerating Tensor Contractions in High-Order FEM with MAGMA Batched
, Atlanta, GA, SIAM Conference on Computer Science and Engineering (SIAM CSE17), Presentation, March 2017.
(9.29 MB)
![application/pdf](/modules/file/icons/application-pdf.png)
Accelerating Linear Algebra with MAGMA
, Knoxville, TN, ECP Annual Meeting 2018, Tutorial, February 2018.
(35.27 MB)
![application/pdf](/modules/file/icons/application-pdf.png)
Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers
: 2010 Symposium on Application Accelerators in. High-Performance Computing (SAAHPC'10), Tutorial, July 2010.
(499.51 KB)
![application/pdf](/modules/file/icons/application-pdf.png)
XaaS: Acceleration as a Service to Enable Productive High-Performance Cloud Computing
: arXiv, January 2024.
The HPL Benchmark: Past, Present & Future
, ISC High Performance, Frankfurt, Germany, July 2016.
(3.41 MB)
![application/pdf](/modules/file/icons/application-pdf.png)
hipMAGMA v2.0
: Zenodo, July 2020.
hipMAGMA v1.0
: Zenodo, March 2020.
Generalizing Random Butterfly Transforms to Arbitrary Matrix Sizes
: arXiv, December 2023.
Exascale Computing and Big Data,”
Communications of the ACM, vol. 58, no. 7: ACM, pp. 56-68, July 2015.
(7.3 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Earth Virtualization Engines - A Technical Perspective
, September 2023.
CholeskyQR with Randomization and Pivoting for Tall Matrices (CQRRPT)
: arXiv, February 2024.
xSDK4ECP: Extreme-scale Scientific Software Development Kit for ECP (Poster)
, Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
(1.54 MB)
![application/pdf](/modules/file/icons/application-pdf.png)
Using Quantized Integer in LU Factorization with Partial Pivoting (Poster)
, Seattle, WA, SIAM Conference on Parallel Processing for Scientific Computing (SIAM PP20), February 2020.
(6.65 MB)
![application/pdf](/modules/file/icons/application-pdf.png)
Using GPU FP16 Tensor Cores Arithmetic to Accelerate Mixed-Precision Iterative Refinement Solvers and Reduce Energy Consumption
, Frankfurt, Germany, ISC High Performance (ISC18), Best Poster Award, June 2018.
(3.01 MB)
![application/pdf](/modules/file/icons/application-pdf.png)
Using Advanced Vector Extensions AVX-512 for MPI Reduction (Poster)
, Austin, TX, EuroMPI/USA '20: 27th European MPI Users' Group Meeting, September 2020.
(708.68 KB)
![application/pdf](/modules/file/icons/application-pdf.png)
Towards a High-Performance Tensor Algebra Package for Accelerators
, Gatlinburg, TN, moky Mountains Computational Sciences and Engineering Conference (SMC15), September 2015.
(1.76 MB)
![application/pdf](/modules/file/icons/application-pdf.png)
Tensor Contractions using Optimized Batch GEMM Routines
, San Jose, CA, GPU Technology Conference (GTC), Poster, March 2018.
(1.64 MB)
![application/pdf](/modules/file/icons/application-pdf.png)
A Standard for Batched BLAS Routines
, Paris, France, 17th SIAM Conference on Parallel Processing for Scientific Computing (SIAM PP16), April 2016.
(1.93 MB)
![application/pdf](/modules/file/icons/application-pdf.png)
SLATE: Software for Linear Algebra Targeting Exascale (POSTER)
, Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
(546.56 KB)
![application/pdf](/modules/file/icons/application-pdf.png)
Scheduling Cholesky Factorization on Multicore Architectures with GPU Accelerators
, Knoxville, TN, 2010 Symposium on Application Accelerators in High-Performance Computing (SAAHPC'10), Poster, July 2010.
(3.86 MB)
![application/pdf](/modules/file/icons/application-pdf.png)
A Report of the MPI International Survey (Poster)
, Austin, TX, EuroMPI/USA '20: 27th European MPI Users' Group Meeting, September 2020.
PULSE: PAPI Unifying Layer for Software-Defined Events (Poster)
, Seattle, WA, 2020 NSF Cyberinfrastructure for Sustained Scientific Innovation (CSSI) Principal Investigator Meeting, February 2020.
(1.86 MB)
![application/pdf](/modules/file/icons/application-pdf.png)
Power-aware Computing on GPGPUs
, Gatlinburg, TN, Fall Creek Falls Conference, Poster, September 2011.
(2.89 MB)
![application/pdf](/modules/file/icons/application-pdf.png)
The PLASMA Library on CORAL Systems and Beyond (Poster)
, Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
(550.86 KB)
![application/pdf](/modules/file/icons/application-pdf.png)
Performance Application Programming Interface for Extreme-Scale Environments (PAPI-EX) (Poster)
, Seattle, WA, 2020 NSF Cyberinfrastructure for Sustained Scientific Innovation (CSSI) Principal Investigator Meeting, 20 2020.
(2.53 MB)
![application/pdf](/modules/file/icons/application-pdf.png)
PAPI 5: Measuring Power, Energy, and the Cloud
, Austin, TX, 2013 IEEE International Symposium on Performance Analysis of Systems and Software, April 2013.
(78.39 KB)
![application/pdf](/modules/file/icons/application-pdf.png)
Optimizing Batch HGEMM on Small Sizes Using Tensor Cores
, San Jose, CA, GPU Technology Conference (GTC), March 2019.
(2.47 MB)
![application/pdf](/modules/file/icons/application-pdf.png)