Publications
“PaRSEC in Practice: Optimizing a Legacy Chemistry Application through Distributed Task-Based Execution,”
2015 IEEE International Conference on Cluster Computing, Chicago, IL, IEEE, September 2015.
(1.77 MB)
“A Survey of Recent Developments in Parallel Implementations of Gaussian Elimination,”
Concurrency and Computation: Practice and Experience, vol. 27, issue 5, pp. 1292-1309, April 2015.
DOI: 10.1002/cpe.3306 (783.45 KB)
“Towards a High-Performance Tensor Algebra Package for Accelerators,”
Smoky Mountains Computational Sciences and Engineering Conference (SMC15), Gatlinburg, TN, September 2015.
(1.76 MB)
“Visualizing Execution Traces with Task Dependencies,”
2nd Workshop on Visual Performance Analysis (VPA '15), Austin, TX, ACM, November 2015.
(927.5 KB)
“2016 Dense Linear Algebra Software Packages Survey,”
University of Tennessee Computer Science Technical Report, no. UT-EECS-16-744 / LAWN 290: University of Tennessee, September 2016.
(366.43 KB)
“Accelerating Tensor Contractions for High-Order FEM on CPUs, GPUs, and KNLs,”
Smoky Mountains Computational Sciences and Engineering Conference (SMC16), Gatlinburg, TN, Poster, September 2016.
(4.29 MB)
“Cholesky Factorization on Batches of Matrices with Fixed and Variable Sizes,”
GPU Technology Conference (GTC16), San Jose, CA, Poster, April 2016.
(480.51 KB)
“Context Identifier Allocation in Open MPI,”
University of Tennessee Computer Science Technical Report, no. ICL-UT-16-01: Innovative Computing Laboratory, University of Tennessee, January 2016.
(490.89 KB)
“On the Development of Variable Size Batched Computation for Heterogeneous Parallel Architectures,”
The 17th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2016), IPDPS 2016, Chicago, IL, IEEE, May 2016.
(708.62 KB)
“Fine-grained Bit-Flip Protection for Relaxation Methods,”
Journal of Computational Science, November 2016.
DOI: 10.1016/j.jocs.2016.11.013 (1.47 MB)
“Heterogeneous Streaming,”
The Sixth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), IPDPS 2016, Chicago, IL, IEEE, May 2016.
(2.73 MB)
“High-performance Matrix-matrix Multiplications of Very Small Matrices,”
22nd International European Conference on Parallel and Distributed Computing (Euro-Par'16), Grenoble, France, Springer International Publishing, August 2016.
“High-Performance Tensor Contractions for GPUs,”
International Conference on Computational Science (ICCS'16), San Diego, CA, June 2016.
(2.36 MB)
“Linear Algebra Software for Large-Scale Accelerated Multicore Computing,”
Acta Numerica, vol. 25, pp. 1-160, May 2016.
DOI: 10.1017/S0962492916000015
“MAGMA Batched: A Batched BLAS Approach for Small Matrix Factorizations and Applications on GPUs,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-16-02: University of Tennessee, August 2016.
(929.79 KB)
“A New Metric for Ranking High-Performance Computing Systems,”
National Science Review, vol. 3, issue 1, pp. 30-35, January 2016.
DOI: 10.1093/nsr/nwv084 (393.55 KB)
“On the performance and energy efficiency of sparse linear algebra on GPUs,”
International Journal of High Performance Computing Applications, October 2016.
DOI: 10.1177/1094342016672081 (1.19 MB)
“Performance, Design, and Autotuning of Batched GEMM for GPUs,”
The International Supercomputing Conference (ISC High Performance 2016), Frankfurt, Germany, June 2016.
(1.27 MB)
“