Publications
“High-Performance Tensor Contractions for GPUs,”
International Conference on Computational Science (ICCS'16), San Diego, CA, June 2016.
“Linear Algebra Software for Large-Scale Accelerated Multicore Computing,”
Acta Numerica, vol. 25, pp. 1-160, May 2016.
DOI: 10.1017/S0962492916000015
“MAGMA Batched: A Batched BLAS Approach for Small Matrix Factorizations and Applications on GPUs,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-16-02: University of Tennessee, August 2016.
“A New Metric for Ranking High-Performance Computing Systems,”
National Science Review, vol. 3, issue 1, pp. 30-35, January 2016.
DOI: 10.1093/nsr/nwv084
“On the performance and energy efficiency of sparse linear algebra on GPUs,”
International Journal of High Performance Computing Applications, October 2016.
DOI: 10.1177/1094342016672081
“Performance, Design, and Autotuning of Batched GEMM for GPUs,”
The International Supercomputing Conference (ISC High Performance 2016), Frankfurt, Germany, June 2016.
“Performance Tuning and Optimization Techniques of Fixed and Variable Size Batched Cholesky Factorization on GPUs,”
International Conference on Computational Science (ICCS'16), San Diego, CA, June 2016.
“Power Management and Event Verification in PAPI,”
Tools for High Performance Computing 2015: Proceedings of the 9th International Workshop on Parallel Tools for High Performance Computing, September 2015, Dresden, Germany, Springer International Publishing, pp. 41-51, 2016.
DOI: 10.1007/978-3-319-39589-0_4
“A Standard for Batched BLAS Routines,”
17th SIAM Conference on Parallel Processing for Scientific Computing (SIAM PP16), Paris, France, April 2016.
“Sunway TaihuLight Supercomputer Makes Its Appearance,”
National Science Review, vol. 3, issue 3, pp. 256-266, September 2016.
DOI: 10.1093/nsr/nww044
“Accelerating NWChem Coupled Cluster through dataflow-based Execution,”
11th International Conference on Parallel Processing and Applied Mathematics (PPAM 2015), Krakow, Poland, Springer International Publishing, September 2015.
“Design for a Soft Error Resilient Dynamic Task-based Runtime,”
29th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Hyderabad, India, IEEE, May 2015.
“Hierarchical DAG scheduling for Hybrid Distributed Systems,”
29th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Hyderabad, India, IEEE, May 2015.
“