Publications
“Automating the Large-Scale Collection and Analysis of Performance,”
5th LCI International Conference on Linux Clusters: The HPC Revolution, Austin, Texas, May 2004.
“Dynamic Task Scheduling for Linear Algebra Algorithms on Distributed-Memory Multicore Systems,”
International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '09), Portland, OR, November 2009.
“Enabling and Scaling Matrix Computations on Heterogeneous Multi-Core and Multi-GPU Systems,”
26th ACM International Conference on Supercomputing (ICS 2012), San Servolo Island, Venice, Italy, ACM, June 2012.
“A Scalable Framework for Heterogeneous GPU-Based Clusters,”
The 24th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2012), Pittsburgh, PA, USA, ACM, June 2012.
“A Scalable Non-blocking Multicast Scheme for Distributed DAG Scheduling,”
The International Conference on Computational Science 2009 (ICCS 2009), vol. 5544, Baton Rouge, LA, pp. 195-204, May 2009.
“Scalable Tile Communication-Avoiding QR Factorization on Multicore Cluster Systems,”
SC'10, New Orleans, LA, ACM SIGARCH/IEEE Computer Society, November 2010.
“Efficient Support for Matrix Computations on Heterogeneous Multi-core and Multi-GPU Architectures,”
University of Tennessee Computer Science Technical Report, UT-CS-11-668 (also LAWN 250), June 2011.
“Scalable Tile Communication-Avoiding QR Factorization on Multicore Cluster Systems,”
University of Tennessee Computer Science Technical Report, UT-CS-10-653, April 2010.