Publications
“SLATE: Design of a Modern Distributed and Accelerated Linear Algebra Library,”
International Conference for High Performance Computing, Networking, Storage and Analysis (SC19), Denver, CO, ACM, November 2019.
DOI: 10.1145/3295500.3356223 (2.01 MB)
“SLATE: Design of a Modern Distributed and Accelerated Linear Algebra Library,”
Denver, CO, International Conference for High Performance Computing, Networking, Storage and Analysis (SC19), November 2019.
(16.19 MB)
“SLATE Developers' Guide,”
SLATE Working Notes, no. 11, ICL-UT-19-02: Innovative Computing Laboratory, University of Tennessee, December 2019.
(1.68 MB)
“SLATE Mixed Precision Performance Report,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-19-03: University of Tennessee, April 2019.
(1.04 MB)
“SLATE Working Note 12: Implementing Matrix Inversions,”
SLATE Working Notes, no. 12, ICL-UT-19-04: Innovative Computing Laboratory, University of Tennessee, June 2019.
(1.95 MB)
“SLATE Working Note 13: Implementing Singular Value and Symmetric/Hermitian Eigenvalue Solvers,”
SLATE Working Notes, no. 13, ICL-UT-19-07: Innovative Computing Laboratory, University of Tennessee, September 2019.
(3.47 MB)
“Software-Defined Events through PAPI,”
2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Rio de Janeiro, Brazil, IEEE, May 2019.
DOI: 10.1109/IPDPSW.2019.00069 (446.41 KB)
“Solving Linear Diophantine Systems on Parallel Architectures,”
IEEE Transactions on Parallel and Distributed Systems, vol. 30, issue 5, pp. 1158-1169, May 2019.
DOI: 10.1109/TPDS.2018.2873354 (802.97 KB)
“Towards Continuous Benchmarking,”
Platform for Advanced Scientific Computing Conference (PASC 2019), Zurich, Switzerland, ACM Press, June 2019.
DOI: 10.1145/3324989.3325719 (1.51 MB)
“Towards Half-Precision Computation for Complex Matrices: A Case Study for Mixed Precision Solvers on GPUs,”
ScalA19: 10th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, Denver, CO, IEEE, November 2019.
(523.87 KB) (3.42 MB)
“Understanding Native Event Semantics,”
Knoxville, TN, 9th JLESC Workshop, April 2019.
(2.33 MB)
“Variable-Size Batched Gauss-Jordan Elimination for Block-Jacobi Preconditioning on Graphics Processors,”
Parallel Computing, vol. 81, pp. 131-146, January 2019.
DOI: 10.1016/j.parco.2017.12.006 (1.9 MB)
“What it Takes to keep PAPI Instrumental for the HPC Community,”
Collegeville, MN, The 2019 Collegeville Workshop on Sustainable Scientific Software (CW3S19), July 2019.
(3.29 MB)
“What it Takes to keep PAPI Instrumental for the HPC Community,”
1st Workshop on Sustainable Scientific Software (CW3S19), Collegeville, Minnesota, July 2019.
(50.57 KB)
“Is your scheduling good? How would you know?”
Bordeaux, France, 14th Scheduling for Large Scale Systems Workshop, June 2019.
(2.5 MB)
“ASCR@40: Four Decades of Department of Energy Leadership in Advanced Scientific Computing Research,”
Advanced Scientific Computing Advisory Committee (ASCAC), US Department of Energy, August 2020.
“ASCR@40: Highlights and Impacts of ASCR’s Programs,”
US Department of Energy’s Office of Advanced Scientific Computing Research, June 2020.
DOI: 10.2172/1631812
“Asynchronous SGD for DNN Training on Shared-Memory Parallel Architectures,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-20-04: University of Tennessee, Knoxville, March 2020.
(188.51 KB)
“Asynchronous SGD for DNN Training on Shared-Memory Parallel Architectures,”
Workshop on Scalable Deep Learning over Parallel And Distributed Infrastructures (ScaDL 2020), May 2020.
(188.51 KB)
“CEED ECP Milestone Report: Improve Performance and Capabilities of CEED-Enabled ECP Applications on Summit/Sierra,”
ECP Milestone Reports: Zenodo, May 2020.
DOI: 10.5281/zenodo.3860804 (28.12 MB)
“Clover: Computational Libraries Optimized via Exascale Research,”
Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
(872 KB)
“Communication Avoiding 2D Stencil Implementations over PaRSEC Task-Based Runtime,”
2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), New Orleans, LA, IEEE, May 2020.
DOI: 10.1109/IPDPSW50202.2020.00127 (1.33 MB)
“Computational Science – ICCS 2020: 20th International Conference, Amsterdam, The Netherlands, June 3–5, 2020, Proceedings, Part III,”
Lecture Notes in Computer Science, 1, no. 12139: Springer International Publishing, pp. 648, June 2020.
DOI: 10.1007/978-3-030-50420-5
“Computational Science – ICCS 2020: 20th International Conference, Amsterdam, The Netherlands, June 3–5, 2020, Proceedings, Part IV,”
Lecture Notes in Computer Science, 1, no. 12140: Springer International Publishing, pp. 668, June 2020.
DOI: 10.1007/978-3-030-50423-6
“Computational Science – ICCS 2020: 20th International Conference, Amsterdam, The Netherlands, June 3–5, 2020, Proceedings, Part V,”
Lecture Notes in Computer Science, 1, no. 12141: Springer International Publishing, pp. 618, June 2020.
DOI: 10.1007/978-3-030-50426-7
“Computational Science – ICCS 2020: 20th International Conference, Amsterdam, The Netherlands, June 3–5, 2020, Proceedings, Part I,”
Lecture Notes in Computer Science, 1, no. 12137: Springer International Publishing, pp. 707, June 2020.
DOI: 10.1007/978-3-030-50371-0
“Computational Science – ICCS 2020: 20th International Conference, Amsterdam, The Netherlands, June 3–5, 2020, Proceedings, Part VII,”
Lecture Notes in Computer Science, 1, no. 12143: Springer International Publishing, pp. 775, June 2020.
DOI: 10.1007/978-3-030-50436-6
“Computational Science – ICCS 2020: 20th International Conference, Amsterdam, The Netherlands, June 3–5, 2020, Proceedings, Part VI,”
Lecture Notes in Computer Science, 1, no. 12142: Springer International Publishing, pp. 667, June 2020.
DOI: 10.1007/978-3-030-50433-5
“Computational Science – ICCS 2020: 20th International Conference, Amsterdam, The Netherlands, June 3–5, 2020, Proceedings, Part II,”
Lecture Notes in Computer Science, 1, no. 12138: Springer International Publishing, pp. 697, June 2020.
DOI: 10.1007/978-3-030-50417-5
“Design, Optimization, and Benchmarking of Dense Linear Algebra Algorithms on AMD GPUs,”
2020 IEEE High Performance Extreme Computing Virtual Conference: IEEE, September 2020.
(476.36 KB)
“Design, Optimization, and Benchmarking of Dense Linear Algebra Algorithms on AMD GPUs,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-20-12: University of Tennessee, August 2020.
(476.36 KB)
“DTE: PaRSEC Enabled Libraries and Applications (Poster),”
Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
(979.27 KB)
“DTE: PaRSEC Systems and Interfaces (Poster),”
Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
(840.54 KB)
“Evaluating the Performance of NVIDIA’s A100 Ampere GPU for Sparse and Batched Computations,”
2020 IEEE/ACM Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS): IEEE, November 2020.
(1.9 MB)
“Exa-PAPI: The Exascale Performance API with Modern C++,”
Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
(556.78 KB)
“Extreme-Scale Task-Based Cholesky Factorization Toward Climate and Weather Prediction Applications,”
Platform for Advanced Scientific Computing Conference (PASC20), Geneva, Switzerland, ACM, June 2020.
DOI: 10.1145/3394277.3401846 (2.71 MB)
“FFT-ECP API and High-Performance Library Prototype for 2-D and 3-D FFTs on Large-Scale Heterogeneous Systems with GPUs,”
ECP Milestone Report, no. FFT-ECP STML13-27: Innovative Computing Laboratory, University of Tennessee, January 2020.
(9.71 MB)
“Flexible Data Redistribution in a Task-Based Runtime System,”
IEEE International Conference on Cluster Computing (Cluster 2020), Kobe, Japan, IEEE, September 2020.
DOI: 10.1109/CLUSTER49012.2020.00032 (354.8 KB)
“Formulation of Requirements for New PAPI++ Software Package: Part I: Survey Results,”
PAPI++ Working Notes, no. 1, ICL-UT-20-02: Innovative Computing Laboratory, University of Tennessee Knoxville, January 2020.
(1.49 MB)
“Ginkgo: A Node-Level Sparse Linear Algebra Library for HPC (Poster),”
Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
(699 KB)
“HAN: A Hierarchical AutotuNed Collective Communication Framework,”
IEEE Cluster Conference, Kobe, Japan, Best Paper Award, IEEE Computer Society Press, September 2020.
(764.05 KB)
“Harnessing the Computing Continuum for Programming Our World,”
Fog Computing: Theory and Practice: John Wiley & Sons, Inc., 2020.
DOI: 10.1002/9781119551713.ch7 (1.4 MB)
“heFFTe: Highly Efficient FFT for Exascale,”
International Conference on Computational Science (ICCS 2020), Amsterdam, Netherlands, June 2020.
DOI: 10.1007/978-3-030-50371-0_19 (2.62 MB)
“heFFTe: Highly Efficient FFT for Exascale (Poster),”
Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
(6.2 MB)
“heFFTe: Highly Efficient FFT for Exascale (Poster),”
Seattle, WA, SIAM Conference on Parallel Processing for Scientific Computing (SIAM PP20), February 2020.
(1.54 MB)
“heFFTe: Highly Efficient FFT for Exascale (Poster),”
NVIDIA GPU Technology Conference (GTC2020), October 2020.
(866.88 KB)
“High-Order Finite Element Method using Standard and Device-Level Batch GEMM on GPUs,”
2020 IEEE/ACM 11th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA): IEEE, November 2020.
(1.3 MB)
“hipMAGMA v1.0,”
Zenodo, March 2020.
DOI: 10.5281/zenodo.3908549
“hipMAGMA v2.0,”
Zenodo, July 2020.
DOI: 10.5281/zenodo.3928667
“Improving the Performance of the GMRES Method using Mixed-Precision Techniques,”
Smoky Mountains Computational Sciences & Engineering Conference (SMC2020), August 2020.
(600.33 KB)
“