Publications
“SLATE Port to AMD and Intel Platforms,”
SLATE Working Notes, no. 16, ICL-UT-21-01, April 2021.
“A survey of numerical linear algebra methods utilizing mixed-precision arithmetic,”
The International Journal of High Performance Computing Applications, vol. 35, no. 4, pp. 344–369, 2021.
“Translational process: Mathematical software perspective,”
Journal of Computational Science, vol. 52, pp. 101216, 2021.
ASCR@40: Four Decades of Department of Energy Leadership in Advanced Scientific Computing Research,
Advanced Scientific Computing Advisory Committee (ASCAC), US Department of Energy, August 2020.
ASCR@40: Highlights and Impacts of ASCR’s Programs,
US Department of Energy’s Office of Advanced Scientific Computing Research, June 2020.
“Asynchronous SGD for DNN Training on Shared-Memory Parallel Architectures,”
Workshop on Scalable Deep Learning over Parallel And Distributed Infrastructures (ScaDL 2020), May 2020.
“Asynchronous SGD for DNN Training on Shared-Memory Parallel Architectures,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-20-04: University of Tennessee, Knoxville, March 2020.
“CEED ECP Milestone Report: Improve Performance and Capabilities of CEED-Enabled ECP Applications on Summit/Sierra,”
ECP Milestone Reports: Zenodo, May 2020.
Clover: Computational Libraries Optimized via Exascale Research,
Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
“Communication Avoiding 2D Stencil Implementations over PaRSEC Task-Based Runtime,”
2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), New Orleans, LA, IEEE, May 2020.
“Computational Science – ICCS 2020: 20th International Conference, Amsterdam, The Netherlands, June 3–5, 2020, Proceedings, Part III,”
Lecture Notes in Computer Science, 1, no. 12139: Springer International Publishing, pp. 648, June 2020.
“Computational Science – ICCS 2020: 20th International Conference, Amsterdam, The Netherlands, June 3–5, 2020, Proceedings, Part I,”
Lecture Notes in Computer Science, 1, no. 12137: Springer International Publishing, pp. 707, June 2020.
“Computational Science – ICCS 2020: 20th International Conference, Amsterdam, The Netherlands, June 3–5, 2020, Proceedings, Part IV,”
Lecture Notes in Computer Science, 1, no. 12140: Springer International Publishing, pp. 668, June 2020.
“Computational Science – ICCS 2020: 20th International Conference, Amsterdam, The Netherlands, June 3–5, 2020, Proceedings, Part II,”
Lecture Notes in Computer Science, 1, no. 12138: Springer International Publishing, pp. 697, June 2020.
“Computational Science – ICCS 2020: 20th International Conference, Amsterdam, The Netherlands, June 3–5, 2020, Proceedings, Part VII,”
Lecture Notes in Computer Science, 1, no. 12143: Springer International Publishing, pp. 775, June 2020.
“Computational Science – ICCS 2020: 20th International Conference, Amsterdam, The Netherlands, June 3–5, 2020, Proceedings, Part VI,”
Lecture Notes in Computer Science, 1, no. 12142: Springer International Publishing, pp. 667, June 2020.
“Computational Science – ICCS 2020: 20th International Conference, Amsterdam, The Netherlands, June 3–5, 2020, Proceedings, Part V,”
Lecture Notes in Computer Science, 1, no. 12141: Springer International Publishing, pp. 618, June 2020.
“Design, Optimization, and Benchmarking of Dense Linear Algebra Algorithms on AMD GPUs,”
2020 IEEE High Performance Extreme Computing Virtual Conference: IEEE, September 2020.
“Design, Optimization, and Benchmarking of Dense Linear Algebra Algorithms on AMD GPUs,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-20-12: University of Tennessee, August 2020.
DTE: PaRSEC Enabled Libraries and Applications (Poster),
Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
DTE: PaRSEC Systems and Interfaces (Poster),
Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
“Evaluating the Performance of NVIDIA’s A100 Ampere GPU for Sparse and Batched Computations,”
2020 IEEE/ACM Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS): IEEE, November 2020.
Exa-PAPI: The Exascale Performance API with Modern C++,
Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
“Extreme-Scale Task-Based Cholesky Factorization Toward Climate and Weather Prediction Applications,”
Platform for Advanced Scientific Computing Conference (PASC20), Geneva, Switzerland, ACM, June 2020.
“FFT-ECP API and High-Performance Library Prototype for 2-D and 3-D FFTs on Large-Scale Heterogeneous Systems with GPUs,”
ECP Milestone Report, no. FFT-ECP STML13-27: Innovative Computing Laboratory, University of Tennessee, January 2020.
“Flexible Data Redistribution in a Task-Based Runtime System,”
IEEE International Conference on Cluster Computing (Cluster 2020), Kobe, Japan, IEEE, September 2020.
“Formulation of Requirements for New PAPI++ Software Package: Part I: Survey Results,”
PAPI++ Working Notes, no. 1, ICL-UT-20-02: Innovative Computing Laboratory, University of Tennessee Knoxville, January 2020.
Ginkgo: A Node-Level Sparse Linear Algebra Library for HPC (Poster),
Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
“HAN: A Hierarchical AutotuNed Collective Communication Framework,”
IEEE Cluster Conference, Kobe, Japan, Best Paper Award, IEEE Computer Society Press, September 2020.
“Harnessing the Computing Continuum for Programming Our World,”
Fog Computing: Theory and Practice: John Wiley & Sons, Inc., 2020.
“heFFTe: Highly Efficient FFT for Exascale,”
International Conference on Computational Science (ICCS 2020), Amsterdam, Netherlands, June 2020.
heFFTe: Highly Efficient FFT for Exascale (Poster),
NVIDIA GPU Technology Conference (GTC2020), October 2020.
heFFTe: Highly Efficient FFT for Exascale (Poster),
Seattle, WA, SIAM Conference on Parallel Processing for Scientific Computing (SIAM PP20), February 2020.
heFFTe: Highly Efficient FFT for Exascale (Poster),
Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
“High-Order Finite Element Method using Standard and Device-Level Batch GEMM on GPUs,”
2020 IEEE/ACM 11th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA): IEEE, November 2020.
hipMAGMA v1.0,
Zenodo, March 2020.
hipMAGMA v2.0,
Zenodo, July 2020.
“Improving the Performance of the GMRES Method using Mixed-Precision Techniques,”
Smoky Mountains Computational Sciences & Engineering Conference (SMC2020), August 2020.
Integrating Deep Learning in Domain Science at Exascale (MagmaDNN),
Virtual, DOD HPCMP seminar, December 2020.
“Integrating Deep Learning in Domain Sciences at Exascale,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-20-10: University of Tennessee, August 2020.
“Integrating Deep Learning in Domain Sciences at Exascale,”
2020 Smoky Mountains Computational Sciences and Engineering Conference (SMC 2020), August 2020.
“Investigating the Benefit of FP16-Enabled Mixed-Precision Solvers for Symmetric Positive Definite Matrices using GPUs,”
International Conference on Computational Science (ICCS 2020), Amsterdam, Netherlands, Springer, Cham, June 2020.
“Load-Balancing Sparse Matrix Vector Product Kernels on GPUs,”
ACM Transactions on Parallel Computing, vol. 7, issue 1, March 2020.
“MAGMA Templates for Scalable Linear Algebra on Emerging Architectures,”
The International Journal of High Performance Computing Applications, vol. 34, issue 6, pp. 645–658, November 2020.
“Matrix Multiplication on Batches of Small Matrices in Half and Half-Complex Precisions,”
Journal of Parallel and Distributed Computing, vol. 145, pp. 188–201, November 2020.
“Mixed-Precision Iterative Refinement using Tensor Cores on GPUs to Accelerate Solution of Linear Systems,”
Proceedings of the Royal Society A, vol. 476, issue 2243, November 2020.
“Mixed-Precision Solution of Linear Systems Using Accelerator-Based Computing,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-20-05: University of Tennessee, May 2020.
“Numerical Algorithms for High-Performance Computational Science,”
Philosophical Transactions of the Royal Society A, vol. 378, issue 2166, 2020.
“Parallel Processing and Applied Mathematics: 13th International Conference, PPAM 2019, Bialystok, Poland, September 8–11, 2019, Revised Selected Papers, Part I,”
Lecture Notes in Computer Science, 1, no. 12043: Springer International Publishing, pp. 581, March 2020.
“Parallel Processing and Applied Mathematics: 13th International Conference, PPAM 2019, Bialystok, Poland, September 8–11, 2019, Revised Selected Papers, Part II,”
Lecture Notes in Computer Science, no. 12044: Springer International Publishing, pp. 503, March 2020.