Publications
Enhancing the Performance of Dense Linear Algebra Solvers on GPUs (in the MAGMA Project)
, Austin, TX, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC08), November 2008.
(5.28 MB)
Extending MAGMA Portability with OneAPI
, Dallas, TX, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC22), ACM Student Research Competition, November 2022.
(1.33 MB)
FFT-ECP Fast Fourier Transform
, Houston, TX, 2019 ECP Annual Meeting (Research Poster), January 2019.
(1.51 MB)
“GPUDirect MPI Communications and Optimizations to Accelerate FFTs on Exascale Systems,”
EuroMPI'19 Posters, Zurich, Switzerland, no. icl-ut-19-06: ICL, September 2019.
(2.25 MB)
Harnessing GPU's Tensor Cores Fast FP16 Arithmetic to Speedup Mixed-Precision Iterative Refinement Solvers and Achieve 74 Gflops/Watt on Nvidia V100
, San Jose, CA, GPU Technology Conference (GTC), Poster, March 2018.
(2.96 MB)
heFFTe: Highly Efficient FFT for Exascale (Poster)
, Seattle, WA, SIAM Conference on Parallel Processing for Scientific Computing (SIAM PP20), February 2020.
(1.54 MB)
heFFTe: Highly Efficient FFT for Exascale (Poster)
, Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
(6.2 MB)
heFFTe: Highly Efficient FFT for Exascale (Poster)
: NVIDIA GPU Technology Conference (GTC2020), October 2020.
(866.88 KB)
MATEDOR: MAtrix, TEnsor, and Deep-learning Optimized Routines
, Seattle, WA, 2020 NSF Cyberinfrastructure for Sustained Scientific Innovation (CSSI) Principal Investigator Meeting, February 2020.
(2.28 MB)
MATEDOR: MAtrix, TEnsor, and Deep-learning Optimized Routines
, Dallas, TX, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC18), Research Poster, November 2018.
(2.55 MB)
MAtrix, TEnsor, and Deep-learning Optimized Routines (MATEDOR)
, Washington, DC, NSF PI Meeting, Poster, April 2018.
DOI: 10.6084/m9.figshare.6174143.v3 (2.4 MB)
Numerical Linear Algebra on Emerging Architectures: The PLASMA and MAGMA Projects
, Portland, OR, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC09), November 2009.
(3.53 MB)
Numerical Linear Algebra on Hybrid Architectures: Recent Developments in the MAGMA Project
, Portland, OR, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC09), November 2009.
(1.41 MB)
Optimizing Batch HGEMM on Small Sizes Using Tensor Cores
, San Jose, CA, GPU Technology Conference (GTC), March 2019.
(2.47 MB)
Power-aware Computing on GPGPUs
, Gatlinburg, TN, Fall Creek Falls Conference, Poster, September 2011.
(2.89 MB)
Scheduling Cholesky Factorization on Multicore Architectures with GPU Accelerators
, Knoxville, TN, 2010 Symposium on Application Accelerators in High-Performance Computing (SAAHPC'10), Poster, July 2010.
(3.86 MB)
A Standard for Batched BLAS Routines
, Paris, France, 17th SIAM Conference on Parallel Processing for Scientific Computing (SIAM PP16), April 2016.
(1.93 MB)
Tensor Contractions using Optimized Batch GEMM Routines
, San Jose, CA, GPU Technology Conference (GTC), Poster, March 2018.
(1.64 MB)
Towards a High-Performance Tensor Algebra Package for Accelerators
, Gatlinburg, TN, Smoky Mountains Computational Sciences and Engineering Conference (SMC15), September 2015.
(1.76 MB)
Using GPU FP16 Tensor Cores Arithmetic to Accelerate Mixed-Precision Iterative Refinement Solvers and Reduce Energy Consumption
, Frankfurt, Germany, ISC High Performance (ISC18), Best Poster Award, June 2018.
(3.01 MB)
hipMAGMA v1.0
: Zenodo, March 2020.
DOI: 10.5281/zenodo.3908549
hipMAGMA v2.0
: Zenodo, July 2020.
DOI: 10.5281/zenodo.3928667
Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers
: 2010 Symposium on Application Accelerators in High-Performance Computing (SAAHPC'10), Tutorial, July 2010.
(499.51 KB)
Accelerating Linear Algebra with MAGMA
, Knoxville, TN, ECP Annual Meeting 2018, Tutorial, February 2018.
(35.27 MB)
Accelerating Tensor Contractions in High-Order FEM with MAGMA Batched
, Atlanta, GA, SIAM Conference on Computer Science and Engineering (SIAM CSE17), Presentation, March 2017.
(9.29 MB)
Autotuning Dense Linear Algebra Libraries on GPUs
, Basel, Switzerland, Sixth International Workshop on Parallel Matrix Algorithms and Applications (PMAA 2010), June 2010.
(579.44 KB)
Dense Linear Algebra Solvers for Multicore with GPU Accelerators
, Atlanta, GA, International Parallel and Distributed Processing Symposium (IPDPS 2010), April 2010.
(956.68 KB)
The Future of Computing: Software Libraries
, Savannah, GA, DOD CREATE Developers' Review, Keynote Presentation, February 2012.
(6.76 MB)
How to Build Your Own Deep Neural Network
: PEARC20, July 2020.
(18.8 MB)
Integrating Deep Learning in Domain Science at Exascale (MagmaDNN)
, Virtual, DOD HPCMP seminar, December 2020.
(11.12 MB)
An Introduction to the MAGMA project - Acceleration of Dense Linear Algebra
: NVIDIA Webinar, June 2010.
Linear Algebra Preparation for Emergent Neural Network Architectures: MAGMA, BLAS, and Batched GPU Computing
, Virtual, LAPENNA Workshop, November 2021.
(17.8 MB)
Linear Algebra Software for High-Performance Computing (Part 2: Software for Hardware Accelerators and Coprocessors)
, Frankfurt, Germany, ISC High Performance (ISC18), Tutorial Presentation, June 2015.
(15.41 MB)
MAGMA: A Breakthrough in Solvers for Eigenvalue Problems
, San Jose, CA, GPU Technology Conference (GTC12), Presentation, May 2012.
(9.23 MB)
MAGMA: A New Generation of Linear Algebra Library for GPU and Multicore Architectures
, Salt Lake City, UT, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC12), Presentation, November 2012.
(4.69 MB)
MAGMA - LAPACK for GPUs
, Atlanta, GA, Keeneland GPU Tutorial, April 2011.
(742.14 KB)
MAGMA - LAPACK for HPC on Heterogeneous Architectures
, Oak Ridge, TN, Titan Summit at Oak Ridge National Laboratory, Presentation, August 2011.
(20.43 MB)
MAGMA MIC: Linear Algebra Library for Intel Xeon Phi Coprocessors
, Salt Lake City, UT, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC12), November 2012.
(6.4 MB)
MAGMA MIC: Optimizing Linear Algebra for Intel Xeon Phi
, Frankfurt, Germany, ISC High Performance (ISC15), Intel Booth Presentation, June 2015.
(2.03 MB)
MAGMA Tensors and Batched Computing for Accelerating Applications on GPUs
, San Jose, CA, GPU Technology Conference (GTC17), Presentation in Session S7728, May 2017.
(11.12 MB)
MagmaDNN 0.2 High-Performance Data Analytics for Manycore GPUs and CPUs
: University of Tennessee, January 2019.
DOI: 10.13140/RG.2.2.14906.64961 (7.84 MB)
MagmaDNN – High-Performance Data Analytics for Manycore GPUs and CPUs
, Knoxville, TN, 2017 Summer Research Experiences for Undergraduate (REU), Presentation, December 2017.
(5.06 MB)
Matrix Algebra on GPU and Multicore Architectures
, Basel, Switzerland, Workshop on GPU-enabled Numerical Libraries, Presentation, May 2011.
(49.27 MB)
Power-Aware HPC on Intel Xeon Phi KNL Processors
, Frankfurt, Germany, ISC High Performance (ISC17), Intel Booth Presentation, June 2017.
(5.87 MB)
Batched BLAS (Basic Linear Algebra Subprograms) 2018 Specification
, July 2018.
(483.05 KB)
“Accelerating the LOBPCG method on GPUs using a blocked Sparse Matrix Vector Product,”
University of Tennessee Computer Science Technical Report, no. UT-EECS-14-731: University of Tennessee, October 2014.
(1.83 MB)
“Accelerating the Reduction to Upper Hessenberg Form through Hybrid GPU-Based Computing,”
University of Tennessee Computer Science Technical Report, UT-CS-09-642 (also LAPACK Working Note 219), May 2009.
(2.37 MB)
“Algorithms and Optimization Techniques for High-Performance Matrix-Matrix Multiplications of Very Small Matrices,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-18-09: Innovative Computing Laboratory, University of Tennessee, September 2018.
(3.74 MB)
“Analysis of the Communication and Computation Cost of FFT Libraries towards Exascale,”
ICL Technical Report, no. ICL-UT-22-07: Innovative Computing Laboratory, July 2022.
(5.91 MB)
“Asynchronous SGD for DNN Training on Shared-Memory Parallel Architectures,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-20-04: University of Tennessee, Knoxville, March 2020.
(188.51 KB)
“