Publications
“Revisiting Matrix Product on Master-Worker Platforms,”
International Journal of Foundations of Computer Science (IJFCS), vol. 19, no. 6, pp. 1317-1336, December 2008.
(248.66 KB)
“Solving Systems of Linear Equations on the CELL Processor Using Cholesky Factorization,”
IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 9, pp. 1-11, January 2008.
(751.57 KB)
“Some Issues in Dense Linear Algebra for Multicore and Special Purpose Architectures,”
PARA 2008, 9th International Workshop on State-of-the-Art in Scientific and Parallel Computing, Trondheim Norway, May 2008.
“Some Issues in Dense Linear Algebra for Multicore and Special Purpose Architectures,”
University of Tennessee Computer Science Technical Report, UT-CS-08-615 (also LAPACK Working Note 200), January 2008.
(289.93 KB)
“State-of-the-Art Eigensolvers for Electronic Structure Calculations of Large Scale Nano-Systems,”
Journal of Computational Physics, vol. 227, no. 15, pp. 7113-7124, January 2008.
“Towards Dense Linear Algebra for Hybrid GPU Accelerated Manycore Systems,”
University of Tennessee Computer Science Technical Report, UT-CS-08-632 (also LAPACK Working Note 210), January 2008.
(606.41 KB)
“A Tribute to Gene Golub,”
Computing in Science and Engineering: IEEE, pp. 5, January 2008.
“Using Mixed Precision for Sparse Matrix Computations to Enhance the Performance while Achieving 64-bit Accuracy,”
ACM Transactions on Mathematical Software, vol. 34, no. 4, pp. 17-22, 2008.
(364.48 KB)
“Accelerating Scientific Computations with Mixed Precision Algorithms,”
Computer Physics Communications, vol. 180, issue 12, pp. 2526-2533, December 2009.
(402.69 KB)
“Accelerating the Reduction to Upper Hessenberg Form through Hybrid GPU-Based Computing,”
University of Tennessee Computer Science Technical Report, UT-CS-09-642 (also LAPACK Working Note 219), May 2009.
(2.37 MB)
“Accelerating Time-To-Solution for Computational Science and Engineering,”
SciDAC Review, 2009.
(739.11 KB)
“Algorithmic Based Fault Tolerance Applied to High Performance Computing,”
Journal of Parallel and Distributed Computing, vol. 69, pp. 410-416, 2009.
(313.55 KB)
“Analytical Modeling and Optimization for Affinity Based Thread Scheduling on Multicore Systems,”
IEEE Cluster 2009, New Orleans, August 2009.
(395.53 KB)
“A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures,”
Parallel Computing, vol. 35, pp. 38-53, 2009.
(274.74 KB)
“Comparative Study of One-Sided Factorizations with Multiple Software Packages on Multi-Core Hardware,”
2009 International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '09) (to appear), 2009.
(515.63 KB)
“Computational Science – ICCS 2009, Proceedings of the 9th International Conference,”
Lecture Notes in Computer Science: Theoretical Computer Science and General Issues, no. 5544-5545, Baton Rouge, LA, May 2009.
“Computing the Conditioning of the Components of a Linear Least-squares Solution,”
Numerical Linear Algebra with Applications, vol. 16, no. 7, pp. 517-533, 2009.
(374.97 KB)
“Constructing Resilient Communication Infrastructure for Runtime Environments,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-09-02, July 2009.
(463.71 KB)
“Constructing Resilient Communication Infrastructure for Runtime Environments,”
ParCo 2009, Lyon France, September 2009.
“Dependency-Driven Scheduling of Dense Matrix Factorizations on Shared-Memory Systems,”
PPAM 2009, Poland, September 2009.
“Dynamic Task Scheduling for Linear Algebra Algorithms on Distributed-Memory Multicore Systems,”
International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '09), Portland, OR, November 2009.
(502.49 KB)
“Enhancing Parallelism of Tile QR Factorization for Multicore Architectures,”
Submitted to IEEE Transactions on Parallel and Distributed Systems, December 2009.
(464.23 KB)
“Fully Dynamic Scheduler for Numerical Computing on Multicore Processors,”
University of Tennessee Computer Science Department Technical Report, UT-CS-09-643 (also LAPACK Working Note 220), 2009.
(488.24 KB)
“Grid Computing applied to the Boundary Element Method,”
Proceedings of the First International Conference on Parallel, Distributed and Grid Computing for Engineering, vol. 27, no. :104203/9027, Stirlingshire, UK, Civil-Comp Press, 2009.
“Highly Scalable Self-Healing Algorithms for High Performance Scientific Computing,”
IEEE Transactions on Computers, vol. 58, issue 11, pp. 1512-1524, November 2009.
(1.81 MB)
“A Holistic Approach for Performance Measurement and Analysis for Petascale Applications,”
ICCS 2009 Joint Workshop: Tools for Program Development and Analysis in Computational Science and Software Engineering for Large-Scale Computing, vol. 2009, Baton Rouge, Louisiana, Springer-Verlag Berlin Heidelberg 2009, pp. 686-695, May 2009.
(3.96 MB)
“The International Exascale Software Project: A Call to Cooperative Action by the Global High Performance Community,”
International Journal of High Performance Computing Applications (to appear), July 2009.
(203.04 KB)
“I/O Performance Analysis for the Petascale Simulation Code FLASH,”
ISC'09, Hamburg, Germany, June 2009.
(88.88 KB)
“A Note on Auto-tuning GEMM for GPUs,”
9th International Conference on Computational Science (ICCS 2009), no. 5544-5545, Baton Rouge, LA, pp. 884-892, May 2009.
(236.02 KB)
“Numerical Linear Algebra on Emerging Architectures: The PLASMA and MAGMA Projects,”
Journal of Physics: Conference Series, vol. 180, 2009.
(119.37 KB)
“Numerical Linear Algebra on Emerging Architectures: The PLASMA and MAGMA Projects,”
The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC09), Portland, OR, November 2009.
(3.53 MB)
“Numerical Linear Algebra on Hybrid Architectures: Recent Developments in the MAGMA Project,”
The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC09), Portland, Oregon, November 2009.
(1.41 MB)
“Optimizing Matrix Multiplication for a Short-Vector SIMD Architecture - CELL Processor,”
Parallel Computing, vol. 35, pp. 138-150, 2009.
(591.16 KB)
“Parallel Band Two-Sided Matrix Bidiagonalization for Multicore Architectures,”
IEEE Transactions on Parallel and Distributed Systems (to appear), May 2009.
(208.16 KB)
“Parallel Dense Linear Algebra Software in the Multicore Era,”
in Cyberinfrastructure Technologies and Applications: Nova Science Publishers, Inc., pp. 9-24, 2009.
“Paravirtualization Effect on Single- and Multi-threaded Memory-Intensive Linear Algebra Software,”
Cluster Computing Journal: Special Issue on High Performance Distributed Computing, vol. 12, no. 2: Springer Netherlands, pp. 101-122, 2009.
(451.07 KB)
“Performance Evaluation for Petascale Quantum Simulation Tools,”
Proceedings of CUG09, Atlanta, GA, May 2009.
(1.09 MB)
“The Problem with the Linpack Benchmark Matrix Generator,”
International Journal of High Performance Computing Applications, vol. 23, no. 1, pp. 5-14, 2009.
(136.41 KB)
“QR Factorization for the CELL Processor,”
Scientific Programming (to appear), 2009.
(234.02 KB)
“Reasons for a Pessimistic or Optimistic Message Logging Protocol in MPI Uncoordinated Failure Recovery,”
CLUSTER '09, New Orleans, IEEE, August 2009.
(191.36 KB)
“Recent Trends in High Performance Computing,”
in Birth of Numerical Analysis (to appear), 2009.
“Rectangular Full Packed Format for Cholesky's Algorithm: Factorization, Solution and Inversion,”
ACM Transactions on Mathematical Software (to appear), 2009.
(896.03 KB)
“Reliability and Performance Modeling and Analysis for Grid Computing,”
in Handbook of Research on Scalable Computing Technologies (to appear): IGI Global, pp. 219-245, 2009.
(200.57 KB)
“A Scalable Non-blocking Multicast Scheme for Distributed DAG Scheduling,”
The International Conference on Computational Science 2009 (ICCS 2009), vol. 5544, Baton Rouge, LA, pp. 195-204, May 2009.
(228.45 KB)
“Scheduling Linear Algebra Operations on Multicore Processors,”
University of Tennessee Computer Science Department Technical Report, UT-CS-09-636 (also LAPACK Working Note 213), 2009.
(716.18 KB)
“Scheduling Linear Algebra Operations on Multicore Processors,”
Concurrency Practice and Experience (to appear), 2009.
(716.18 KB)
“Tall and Skinny QR Matrix Factorization Using Tile Algorithms on Multicore Architectures,”
Innovative Computing Laboratory Technical Report (also LAPACK Working Note 222 and CS Tech Report UT-CS-09-645), no. ICL-UT-09-03, September 2009.
(464.23 KB)
“Tile QR Factorization with Parallel Panel Processing for Multicore Architectures,”
accepted in 24th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2010), Atlanta, GA, December 2009.