Publications
Multithreading for synchronization tolerance in matrix factorization,”
Journal of Physics: Conference Series, SciDAC 2007, vol. 78, no. 2007, January 2007.
(577.73 KB)
“Accelerating Scientific Computations with Mixed Precision Algorithms,”
Computer Physics Communications, vol. 180, issue 12, pp. 2526-2533, December 2009.
DOI: 10.1016/j.cpc.2008.11.005 (402.69 KB)
“A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures,”
Parallel Computing (to appear), 00 2010.
(612.23 KB)
“A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures,”
Parallel Computing, vol. 35, pp. 38-53, 00 2009.
(274.74 KB)
“Exploiting Mixed Precision Floating Point Hardware in Scientific Computations,”
In High Performance Computing and Grids in Action (to appear), Amsterdam, IOS Press, 00 2007.
(122.01 KB)
“Exploiting Mixed Precision Floating Point Hardware in Scientific Computations,”
in High Performance Computing and Grids in Action, Amsterdam, IOS Press, January 2008.
(92.95 KB)
“Exploiting the Performance of 32 bit Floating Point Arithmetic in Obtaining 64 bit Accuracy,”
University of Tennessee Computer Science Tech Report, no. UT-CS-06-574, LAPACK Working Note #175, April 2006.
(221.39 KB)
“The Impact of Multicore on Math Software,”
PARA 2006, Umea, Sweden, June 2006.
(223.53 KB)
“Mixed Precision Iterative Refinement Techniques for the Solution of Dense Linear Systems,”
International Journal of High Performance Computer Applications (to appear), August 2007.
(157.4 KB)
“Parallel Dense Linear Algebra Software in the Multicore Era,”
in Cyberinfrastructure Technologies and Applications: Nova Science Publishers, Inc., pp. 9-24, 00 2009.
“Parallel Tiled QR Factorization for Multicore Architectures,”
Concurrency and Computation: Practice and Experience, vol. 20, pp. 1573-1590, January 2008.
(277.92 KB)
“The PlayStation 3 for High Performance Scientific Computing,”
Computing in Science and Engineering, pp. 80-83, January 2008.
(2.45 MB)
“Prospectus for the Next LAPACK and ScaLAPACK Libraries,”
PARA 2006, Umea, Sweden, June 2006.
(460.11 KB)
“Solving Systems of Linear Equations on the CELL Processor Using Cholesky Factorization,”
IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 9, pp. 1-11, January 2008.
(751.57 KB)
“Using Mixed Precision for Sparse Matrix Computations to Enhance the Performance while Achieving 64-bit Accuracy,”
ACM Transactions on Mathematical Software, vol. 34, no. 4, pp. 17-22, 00 2008.
(364.48 KB)
“A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures,”
University of Tennessee Computer Science Technical Report, no. UT-CS-07-600 (also LAPACK Working Note 191), January 2007.
(274.74 KB)
“Limitations of the Playstation 3 for High Performance Cluster Computing,”
University of Tennessee Computer Science Technical Report, UT-CS-07-597 (Also LAPACK Working Note 185), 00 2007.
(171.01 KB)
“Parallel Tiled QR Factorization for Multicore Architectures,”
University of Tennessee Computer Science Dept. Technical Report, UT-CS-07-598 (also LAPACK Working Note 190), 00 2007.
(277.92 KB)
“Performance Optimization and Modeling of Blocked Sparse Kernels,”
ICL Technical Report, no. ICL-UT-04-05, 00 2004.
(229.58 KB)
“The PlayStation 3 for High Performance Scientific Computing,”
University of Tennessee Computer Science Technical Report, no. UT-CS-08-608, January 2008.
(2.45 MB)
“SCOP3: A Rough Guide to Scientific Computing On the PlayStation 3,”
University of Tennessee Computer Science Dept. Technical Report, UT-CS-07-595, 00 2007.
(1.74 MB)
“Solving Systems of Linear Equations on the CELL Processor Using Cholesky Factorization,”
UT Computer Science Technical Report (Also LAPACK Working Note 184), no. UT-CS-07-596, January 2007.
(751.57 KB)
“