Publications
“Distributed-Memory Lattice H-Matrix Factorization,”
The International Journal of High Performance Computing Applications, vol. 33, issue 5, pp. 1046–1063, August 2019.
DOI: 10.1177/1094342019861139
“Does your tool support PAPI SDEs yet?”
13th Scalable Tools Workshop, Tahoe City, CA, July 2019.
“An Empirical View of SLATE Algorithms on Scalable Hybrid System,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-19-08: University of Tennessee, Knoxville, September 2019.
“Evaluation of Directive-Based Performance Portable Programming Models,”
International Journal of High Performance Computing and Networking, vol. 14, issue 2, pp. 165-182.
DOI: 10.1504/IJHPCN.2017.10009064
“Evaluation of Programming Models to Address Load Imbalance on Distributed Multi-Core CPUs: A Case Study with Block Low-Rank Factorization,”
PAW-ATM Workshop at SC19, Denver, CO, ACM, November 2019.
“Fast Batched Matrix Multiplication for Small Sizes using Half Precision Arithmetic on GPUs,”
33rd IEEE International Parallel and Distributed Processing Symposium (IPDPS), Rio de Janeiro, Brazil, IEEE, May 2019.
“FFT-ECP Fast Fourier Transform,”
2019 ECP Annual Meeting (Research Poster), Houston, TX, January 2019.
“FFT-ECP Implementation Optimizations and Features Phase,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-19-12: University of Tennessee, October 2019.
“A Generic Approach to Scheduling and Checkpointing Workflows,”
International Journal of High Performance Computing Applications, vol. 33, issue 6, pp. 1255-1274, November 2019.
DOI: 10.1177/1094342019866891
“Generic Matrix Multiplication for Multi-GPU Accelerated Distributed-Memory Platforms over PaRSEC,”
ScalA'19: 10th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, Denver, CO, IEEE, November 2019.
“Give MPI Threading a Fair Chance: A Study of Multithreaded MPI Designs,”
IEEE Cluster, Albuquerque, NM, IEEE, September 2019.
“GPUDirect MPI Communications and Optimizations to Accelerate FFTs on Exascale Systems,”
EuroMPI'19 Posters, Zurich, Switzerland, no. ICL-UT-19-06: ICL, September 2019.
“Hands-on Research and Training in High-Performance Data Sciences, Data Analytics, and Machine Learning for Emerging Environments,”
ISC High Performance, Frankfurt, Germany, Springer International Publishing, June 2019.
“Impacts of Multi-GPU MPI Collective Communications on Large FFT Computation,”
Workshop on Exascale MPI (ExaMPI) at SC19, Denver, CO, November 2019.
“Increasing Accuracy of Iterative Refinement in Limited Floating-Point Arithmetic on Half-Precision Accelerators,”
IEEE High Performance Extreme Computing Conference (HPEC 2019), Best Paper Finalist, Waltham, MA, IEEE, September 2019.
“Least Squares Solvers for Distributed-Memory Machines with GPU Accelerators,”
ACM International Conference on Supercomputing (ICS '19), Phoenix, Arizona, ACM, pp. 117–126, June 2019.
DOI: 10.1145/3330345.3330356
“Linear Systems Solvers for Distributed-Memory Machines with GPU Accelerators,”
Euro-Par 2019: Parallel Processing, vol. 11725: Springer, pp. 495–506, August 2019.
DOI: 10.1007/978-3-030-29400-7_35
“Local Rollback for Resilient MPI Applications with Application-Level Checkpointing and Message Logging,”
Future Generation Computer Systems, vol. 91, pp. 450-464, February 2019.
DOI: 10.1016/j.future.2018.09.041
“MagmaDNN 0.2 High-Performance Data Analytics for Manycore GPUs and CPUs,”
University of Tennessee, January 2019.
DOI: 10.13140/RG.2.2.14906.64961
“MagmaDNN: Accelerated Deep Learning Using MAGMA,”
Practice and Experience in Advanced Research Computing (PEARC ’19), Chicago, IL, ACM, July 2019.
“MagmaDNN: Towards High-Performance Data Analytics and Machine Learning for Data-Driven Scientific Computing,”
ISC High Performance, Frankfurt, Germany, Springer International Publishing, June 2019.
DOI: 10.1007/978-3-030-34356-9_37
“Massively Parallel Automated Software Tuning,”
48th International Conference on Parallel Processing (ICPP 2019), Kyoto, Japan, ACM Press, August 2019.
DOI: 10.1145/3337821.3337908
“Matrix Powers Kernels for Thick-Restart Lanczos with Explicit External Deflation,”
International Parallel and Distributed Processing Symposium (IPDPS), Rio de Janeiro, Brazil, IEEE, May 2019.
“New Robust ScaLAPACK Routine for Computing the QR Factorization with Column Pivoting,”
LAPACK Working Note, no. LAWN 296, ICL-UT-19-14: University of Tennessee, October 2019.
“OpenDIEL: A Parallel Workflow Engine and Data Analytics Framework,”
Practice and Experience in Advanced Research Computing (PEARC ’19), Chicago, IL, ACM, July 2019.
“Optimizing Batch HGEMM on Small Sizes Using Tensor Cores,”
GPU Technology Conference (GTC), San Jose, CA, March 2019.
“PAPI Software-Defined Events for in-Depth Performance Analysis,”
The International Journal of High Performance Computing Applications, vol. 33, issue 6, pp. 1113-1127, November 2019.
“PAPI's new Software-Defined Events for in-depth Performance Analysis,”
13th Parallel Tools Workshop, Dresden, Germany, September 2019.
“Parallel Selection on GPUs,”
Parallel Computing, vol. 91, March 2020, 2019.
DOI: 10.1016/j.parco.2019.102588
“ParILUT – A Parallel Threshold ILU for GPUs,”
IEEE International Parallel and Distributed Processing Symposium (IPDPS), Rio de Janeiro, Brazil, IEEE, May 2019.
DOI: 10.1109/IPDPS.2019.00033
“Performance Analysis of Tile Low-Rank Cholesky Factorization Using PaRSEC Instrumentation Tools,”
Workshop on Programming and Performance Visualization Tools (ProTools 19) at SC19, Denver, CO, ACM, November 2019.
“Performance of Asynchronous Optimized Schwarz with One-sided Communication,”
Parallel Computing, vol. 86, pp. 66-81, August 2019.
DOI: 10.1016/j.parco.2019.05.004
“PLASMA: Parallel Linear Algebra Software for Multicore Using OpenMP,”
ACM Transactions on Mathematical Software, vol. 45, issue 2, June 2019.
DOI: 10.1145/3264491
“Progressive Optimization of Batched LU Factorization on GPUs,”
IEEE High Performance Extreme Computing Conference (HPEC’19), Waltham, MA, IEEE, September 2019.
“Race to Exascale,”
Computing in Science and Engineering, vol. 21, issue 1, pp. 4-5, March 2019.
DOI: 10.1109/MCSE.2018.2882574
“Replication is More Efficient Than You Think,”
The IEEE/ACM Conference on High Performance Computing Networking, Storage and Analysis (SC19), Denver, CO, ACM Press, November 2019.
“Reservation Strategies for Stochastic Jobs,”
33rd IEEE International Parallel and Distributed Processing Symposium (IPDPS 2019), Rio de Janeiro, Brazil, IEEE Computer Society Press, May 2019.
“Runtime Level Failure Detection and Propagation in HPC Systems,”
European MPI Users' Group Meeting (EuroMPI '19), Zürich, Switzerland, ACM, September 2019.
DOI: 10.1145/3343211.3343225
“Scheduling Independent Stochastic Tasks on Heterogeneous Cloud Platforms,”
IEEE Cluster 2019, Albuquerque, New Mexico, IEEE Computer Society Press, September 2019.
“Scheduling Independent Stochastic Tasks under Deadline and Budget Constraints,”
International Journal of High Performance Computing Applications, vol. 34, issue 2, pp. 246-264, June 2019.
DOI: 10.1177/1094342019852135
“SLATE: Design of a Modern Distributed and Accelerated Linear Algebra Library,”
International Conference for High Performance Computing, Networking, Storage and Analysis (SC19), Denver, CO, ACM, November 2019.
DOI: 10.1145/3295500.3356223
“SLATE: Design of a Modern Distributed and Accelerated Linear Algebra Library,”
International Conference for High Performance Computing, Networking, Storage and Analysis (SC19), Denver, CO, November 2019.
“SLATE Developers' Guide,”
SLATE Working Notes, no. 11, ICL-UT-19-02: Innovative Computing Laboratory, University of Tennessee, December 2019.
“SLATE Mixed Precision Performance Report,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-19-03: University of Tennessee, April 2019.
“SLATE Working Note 12: Implementing Matrix Inversions,”
SLATE Working Notes, no. 12, ICL-UT-19-04: Innovative Computing Laboratory, University of Tennessee, June 2019.
“SLATE Working Note 13: Implementing Singular Value and Symmetric/Hermitian Eigenvalue Solvers,”
SLATE Working Notes, no. 13, ICL-UT-19-07: Innovative Computing Laboratory, University of Tennessee, September 2019.
“Software-Defined Events through PAPI,”
2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Rio de Janeiro, Brazil, IEEE, May 2019.
DOI: 10.1109/IPDPSW.2019.00069
“Solving Linear Diophantine Systems on Parallel Architectures,”
IEEE Transactions on Parallel and Distributed Systems, vol. 30, issue 5, pp. 1158-1169, May 2019.
DOI: 10.1109/TPDS.2018.2873354
“System Software for Many-Core and Multi-Core Architectures,”
Advanced Software Technologies for Post-Peta Scale Computing: The Japanese Post-Peta CREST Research Project, Singapore, Springer Singapore, pp. 59–75, 2019.
DOI: 10.1007/978-981-13-1924-2_4