Publications
“Performance Optimization and Modeling of Blocked Sparse Kernels,”
ICL Technical Report, no. ICL-UT-04-05, 2004.
“Recovery Patterns for Iterative Methods in a Parallel Unstable Environment,”
ICL Technical Report, no. ICL-UT-04-04, January 2004.
“Comparison of Nonlinear Conjugate-Gradient methods for computing the Electronic Properties of Nanostructure Architectures,”
Proceedings of 5th International Conference on Computational Science (ICCS), Atlanta, GA, USA, Springer's Lecture Notes in Computer Science, pp. 317-325, January 2005.
“Conjugate-Gradient Eigenvalue Solvers in Computing Electronic Properties of Nanostructure Architectures,”
International Journal of Computational Science and Engineering (to appear), January 2005.
“Fault Tolerant High Performance Computing by a Coding Approach,”
Proceedings of ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (to appear), Chicago, Illinois, January 2005.
“Hash Functions for Datatype Signatures in MPI,”
Proceedings of 12th European Parallel Virtual Machine and Message Passing Interface Conference - Euro PVM/MPI, vol. 3666, Sorrento (Naples), Italy, Springer-Verlag Berlin, pp. 76-83, September 2005.
“NanoPSE: A Nanoscience Problem Solving Environment for Atomistic Electronic Structure of Semiconductor Nanostructures,”
Journal of Physics: Conference Series, issue 16, pp. 277-282, June 2005.
“On the Parallel Solution of Large Industrial Wave Propagation Problems,”
Journal of Computational Acoustics (to appear), January 2005.
“Recovery Patterns for Iterative Methods in a Parallel Unstable Environment,”
University of Tennessee Computer Science Department Technical Report, UT-CS-04-538, 2005.
“Rounding Error Analysis of the Classical Gram-Schmidt Orthogonalization Process,”
Numerische Mathematik, vol. 101, no. 1, pp. 87-100, January 2005.
“Conjugate-Gradient Eigenvalue Solvers in Computing Electronic Properties of Nanostructure Architectures,”
International Journal of Computational Science and Engineering, vol. 2, no. 3/4, pp. 205-212, 2006.
“Exploiting the Performance of 32 bit Floating Point Arithmetic in Obtaining 64 bit Accuracy,”
University of Tennessee Computer Science Tech Report, no. UT-CS-06-574, LAPACK Working Note #175, April 2006.
“The Impact of Multicore on Math Software,”
PARA 2006, Umea, Sweden, June 2006.
“Performance evaluation of eigensolvers in nano-structure computations,”
IEEE/ACM Proceedings of HPCNano SC06 (to appear), January 2006.
“Predicting the electronic properties of 3D, million-atom semiconductor nanostructure architectures,”
Journal of Physics: Conference Series, vol. 46, pp. 292-298, doi:10.1088/1742-6596/46/1/040, January 2006.
“Prospectus for the Next LAPACK and ScaLAPACK Libraries,”
PARA 2006, Umea, Sweden, June 2006.
“Self Adapting Numerical Software SANS Effort,”
IBM Journal of Research and Development, vol. 50, no. 2/3, pp. 223-238, January 2006.
“Towards bulk based preconditioning for quantum dot computations,”
IEEE/ACM Proceedings of HPCNano SC06 (to appear), January 2006.
“A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures,”
University of Tennessee Computer Science Technical Report, no. UT-CS-07-600 (also LAPACK Working Note 191), January 2007.
“Computing the Conditioning of the Components of a Linear Least Squares Solution,”
University of Tennessee Computer Science Technical Report, no. UT-CS-07-604 (also LAPACK Working Note 193), January 2007.
“Disaster Survival Guide in Petascale Computing: An Algorithmic Approach,”
in Petascale Computing: Algorithms and Applications (to appear): Chapman & Hall - CRC Press, 2007.
“Exploiting Mixed Precision Floating Point Hardware in Scientific Computations,”
in High Performance Computing and Grids in Action (to appear), Amsterdam, IOS Press, 2007.
“Mixed Precision Iterative Refinement Techniques for the Solution of Dense Linear Systems,”
International Journal of High Performance Computing Applications (to appear), August 2007.
“Parallel Tiled QR Factorization for Multicore Architectures,”
University of Tennessee Computer Science Dept. Technical Report, UT-CS-07-598 (also LAPACK Working Note 190), 2007.
“Recovery Patterns for Iterative Methods in a Parallel Unstable Environment,”
SIAM SISC (to appear), May 2007.
“Algorithmic Based Fault Tolerance Applied to High Performance Computing,”
University of Tennessee Computer Science Technical Report, UT-CS-08-620 (also LAPACK Working Note 205), January 2008.
“Computing the Conditioning of the Components of a Linear Least Squares Solution,”
VECPAR '08, High Performance Computing for Computational Science, Toulouse, France, January 2008.
“Exploiting Mixed Precision Floating Point Hardware in Scientific Computations,”
in High Performance Computing and Grids in Action, Amsterdam, IOS Press, January 2008.
“Interior State Computation of Nano Structures,”
PARA 2008, 9th International Workshop on State-of-the-Art in Scientific and Parallel Computing, Trondheim, Norway, May 2008.
“Parallel Tiled QR Factorization for Multicore Architectures,”
Concurrency and Computation: Practice and Experience, vol. 20, pp. 1573-1590, January 2008.
“The Problem with the Linpack Benchmark Matrix Generator,”
University of Tennessee Computer Science Technical Report, UT-CS-08-621 (also LAPACK Working Note 206), June 2008.
“Accelerating Scientific Computations with Mixed Precision Algorithms,”
Computer Physics Communications, vol. 180, issue 12, pp. 2526-2533, December 2009.
“Algorithmic Based Fault Tolerance Applied to High Performance Computing,”
Journal of Parallel and Distributed Computing, vol. 69, pp. 410-416, 2009.
“A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures,”
Parallel Computing, vol. 35, pp. 38-53, 2009.
“Computing the Conditioning of the Components of a Linear Least-squares Solution,”
Numerical Linear Algebra with Applications, vol. 16, no. 7, pp. 517-533, 2009.
“Numerical Linear Algebra on Emerging Architectures: The PLASMA and MAGMA Projects,”
Journal of Physics: Conference Series, vol. 180, 2009.
“Numerical Linear Algebra on Emerging Architectures: The PLASMA and MAGMA Projects,”
Portland, OR, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC09), November 2009.

“Parallel Dense Linear Algebra Software in the Multicore Era,”
in Cyberinfrastructure Technologies and Applications: Nova Science Publishers, Inc., pp. 9-24, 2009.
“The Problem with the Linpack Benchmark Matrix Generator,”
International Journal of High Performance Computing Applications, vol. 23, no. 1, pp. 5-14, 2009.
“Rectangular Full Packed Format for Cholesky's Algorithm: Factorization, Solution and Inversion,”
ACM TOMS (to appear), 2009.
“A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures,”
Parallel Computing (to appear), 2010.
“Distributed Dense Numerical Linear Algebra Algorithms on Massively Parallel Architectures: DPLASMA,”
University of Tennessee Computer Science Technical Report, UT-CS-10-660, September 2010.
“Distributed-Memory Task Execution and Dependence Tracking within DAGuE and the DPLASMA Project,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-10-02, 2010.
“QCG-OMPI: MPI Applications on Grids,”
Future Generation Computer Systems, vol. 27, no. 4, pp. 357-369, March 2010.
“QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment,”
24th IEEE International Parallel and Distributed Processing Symposium (also LAWN 224), Atlanta, GA, April 2010.
“Rectangular Full Packed Format for Cholesky’s Algorithm: Factorization, Solution, and Inversion,”
ACM Transactions on Mathematical Software (TOMS), vol. 37, no. 2, Atlanta, GA, April 2010.
“Rectangular Full Packed Format for Cholesky's Algorithm: Factorization, Solution and Inversion,”
ACM Transactions on Mathematical Software (TOMS), vol. 37, no. 2, April 2010.
“