Publications
Towards a Parallel Tile LDL Factorization for Multicore Architectures,”
ICL Technical Report, no. ICL-UT-11-03, Seattle, WA, April 2011.
(425.45 KB)
“Accelerating GPU Kernels for Dense Linear Algebra,”
Proc. of VECPAR'10, Berkeley, CA, June 2010.
(615.07 KB)
“Accelerating the Reduction to Upper Hessenberg, Tridiagonal, and Bidiagonal Forms through Hybrid GPU-Based Computing,”
Parallel Computing, vol. 36, no. 12, pp. 645-654, 00 2010.
(1.39 MB)
“Analysis of Dynamically Scheduled Tile Algorithms for Dense Linear Algebra on Multicore Architectures,”
Submitted to Concurrency and Computations: Practice and Experience, November 2010.
(1.65 MB)
“Autotuning Dense Linear Algebra Libraries on GPUs
, Basel, Switzerland, Sixth International Workshop on Parallel Matrix Algorithms and Applications (PMAA 2010), June 2010.
(579.44 KB)
Can Hardware Performance Counters Produce Expected, Deterministic Results?,”
3rd Workshop on Functionality of Hardware Performance Monitoring, Atlanta, GA, December 2010.
(392.71 KB)
“Collecting Performance Data with PAPI-C,”
Tools for High Performance Computing 2009, 3rd Parallel Tools Workshop, Dresden, Germany, Springer Berlin / Heidelberg, pp. 157-173, May 2010.
DOI: 10.1007/978-3-642-11261-4_11 (4.45 MB)
“Distributed Dense Numerical Linear Algebra Algorithms on Massively Parallel Architectures: DPLASMA,”
University of Tennessee Computer Science Technical Report, UT-CS-10-660, September 2010.
(366.26 KB)
“Distributed-Memory Task Execution and Dependence Tracking within DAGuE and the DPLASMA Project,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-10-02, 00 2010.
(400.75 KB)
“Divide & Conquer on Hybrid GPU-Accelerated Multicore Systems,”
SIAM Journal on Scientific Computing (submitted), August 2010.
“Faster, Cheaper, Better - A Hybridization Methodology to Develop Linear Algebra Software for GPUs,”
LAPACK Working Note, no. 230, 00 2010.
(334.48 KB)
“Hybrid Multicore Cholesky Factorization with Multiple GPU Accelerators,”
IEEE Transaction on Parallel and Distributed Systems (submitted), March 2010.
(3.75 MB)
“An Improved MAGMA GEMM for Fermi GPUs,”
International Journal of High Performance Computing, vol. 24, no. 4, pp. 511-515, 00 2010.
“An Improved MAGMA GEMM for Fermi GPUs,”
University of Tennessee Computer Science Technical Report, no. UT-CS-10-655 (also LAPACK working note 227), July 2010.
(486.71 KB)
“An Introduction to the MAGMA project - Acceleration of Dense Linear Algebra
: NVIDIA Webinar, June 2010.
Mixed-Tool Performance Analysis on Hybrid Multicore Architectures,”
First International Workshop on Parallel Software Tools and Tool Infrastructures (PSTI 2010), San Diego, CA, September 2010.
(1.24 MB)
“OpenCL Evaluation for Numerical Linear Algebra Library Development,”
Symposium on Application Accelerators in High-Performance Computing (SAAHPC '10), Knoxville, TN, July 2010.
(2.69 MB)
“QR Factorization for the CELL Processor,”
Scientific Programming, vol. 17, no. 1-2, pp. 31-42, 00 2010.
(194.95 KB)
“QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators,”
Proceedings of IPDPS 2011, no. ICL-UT-10-04, Anchorage, AK, October 2010.
(468.17 KB)
“