Publications
“Recent Trends in High Performance Computing,”
in Birth of Numerical Analysis (to appear), 2009.
“Rectangular Full Packed Format for Cholesky's Algorithm: Factorization, Solution and Inversion,”
ACM TOMS (to appear), 2009.
(896.03 KB)
“Reliability and Performance Modeling and Analysis for Grid Computing,”
in Handbook of Research on Scalable Computing Technologies (to appear): IGI Global, pp. 219-245, 2009.
(200.57 KB)
“A Scalable Non-blocking Multicast Scheme for Distributed DAG Scheduling,”
The International Conference on Computational Science 2009 (ICCS 2009), vol. 5544, Baton Rouge, LA, pp. 195-204, May 2009.
(228.45 KB)
“Scheduling Linear Algebra Operations on Multicore Processors,”
Concurrency Practice and Experience (to appear), 2009.
(716.18 KB)
“Scheduling Linear Algebra Operations on Multicore Processors,”
University of Tennessee Computer Science Department Technical Report, UT-CS-09-636 (also LAPACK Working Note 213), 2009.
(716.18 KB)
“Tall and Skinny QR Matrix Factorization Using Tile Algorithms on Multicore Architectures,”
Innovative Computing Laboratory Technical Report (also LAPACK Working Note 222 and CS Tech Report UT-CS-09-645), no. ICL-UT-09-03, September 2009.
(464.23 KB)
“Tile QR Factorization with Parallel Panel Processing for Multicore Architectures,”
accepted in 24th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2010), Atlanta, GA, December 2009.
“Towards Efficient MapReduce Using MPI,”
Lecture Notes in Computer Science, Recent Advances in Parallel Virtual Machine and Message Passing Interface - 16th European PVM/MPI Users' Group Meeting, vol. 5759, Espoo, Finland, Springer Berlin / Heidelberg, pp. 240-249, 2009.
“Trace-based Performance Analysis for the Petascale Simulation Code FLASH,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-09-01, April 2009.
(887.54 KB)
“Transparent Cross-Platform Access to Software Services using GridSolve and GridRPC,”
in Cloud Computing and Software Services: Theory and Techniques (to appear): CRC Press, 2009.
“,”
8th International Conference on Computational Science (ICCS), Proceedings Parts I, II, and III, Lecture Notes in Computer Science, vol. 5101, Krakow, Poland, Springer Berlin, January 2008.
“,”
7th International Parallel Processing and Applied Mathematics Conference, Lecture Notes in Computer Science, vol. 4967, Gdansk, Poland, Springer Berlin, January 2008.
“,”
15th European PVM/MPI Users' Group Meeting, Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science, vol. 5205, Dublin, Ireland, Springer Berlin, January 2008.
“Algorithm-Based Fault Tolerance for Fail-Stop Failures,”
IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 12, January 2008.
(340.49 KB)
“Algorithmic Based Fault Tolerance Applied to High Performance Computing,”
University of Tennessee Computer Science Technical Report, UT-CS-08-620 (also LAPACK Working Note 205), January 2008.
(313.55 KB)
“Analytical Modeling for Affinity-Based Thread Scheduling on Multicore Platforms,”
University of Tennessee Computer Science Technical Report, UT-CS-08-626, January 2008.
(650.75 KB)
“A Comparison of Search Heuristics for Empirical Code Optimization,”
The 3rd International Workshop on Automatic Performance Tuning, Tsukuba, Japan, October 2008.
(772.48 KB)
“Computing the Conditioning of the Components of a Linear Least Squares Solution,”
VECPAR '08, High Performance Computing for Computational Science, Toulouse, France, January 2008.
(374.97 KB)
“DARPA's HPCS Program: History, Models, Tools, Languages,”
in Advances in Computers, vol. 72: Elsevier, January 2008.
(3.61 MB)
“Enhancing the Performance of Dense Linear Algebra Solvers on GPUs (in the MAGMA Project),”
The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC08), Austin, TX, November 2008.
(5.28 MB)
“Exploiting Mixed Precision Floating Point Hardware in Scientific Computations,”
in High Performance Computing and Grids in Action, Amsterdam, IOS Press, January 2008.
(92.95 KB)
“Exploring New Architectures in Accelerating CFD for Air Force Applications,”
Proceedings of the DoD HPCMP User Group Conference, Seattle, Washington, January 2008.
(492.86 KB)
“Fast and Small Short Vector SIMD Matrix Multiplication Kernels for the CELL Processor,”
University of Tennessee Computer Science Technical Report, no. UT-CS-08-609 (also LAPACK Working Note 189), January 2008.
(500.99 KB)
“How Elegant Code Evolves With Hardware: The Case Of Gaussian Elimination,”
in Beautiful Code: Leading Programmers Explain How They Think (Chapter 14), pp. 243-282, January 2008.
(257 KB)
“HPCS Library Study Effort,”
University of Tennessee Computer Science Technical Report, UT-CS-08-617, January 2008.
(73.22 KB)
“The Impact of Paravirtualized Memory Hierarchy on Linear Algebra Computational Kernels and Software,”
ACM/IEEE International Symposium on High Performance Distributed Computing, Boston, MA, June 2008.
(403.89 KB)
“Interactive Grid-Access Using GridSolve and Giggle,”
Computing and Informatics, vol. 27, no. 2, pp. 233-248, ISSN 1335-9150, 2008.
(533.4 KB)
“Interior State Computation of Nano Structures,”
PARA 2008, 9th International Workshop on State-of-the-Art in Scientific and Parallel Computing, Trondheim, Norway, May 2008.
(137.12 KB)
“The LINPACK Benchmark: Past, Present, and Future,”
Concurrency: Practice and Experience, vol. 15, pp. 803-820, 2008.
(94.86 KB)
“Matrix Product on Heterogeneous Master Worker Platforms,”
2008 PPoPP Conference, Salt Lake City, Utah, January 2008.
“Netlib and NA-Net: Building a Scientific Computing Community,”
IEEE Annals of the History of Computing, vol. 30, no. 2, pp. 30-41, January 2008.
(352.71 KB)
“Parallel Block Hessenberg Reduction using Algorithms-By-Tiles for Multicore Architectures Revisited,”
University of Tennessee Computer Science Technical Report, UT-CS-08-624 (also LAPACK Working Note 208), August 2008.
(420.31 KB)
“Parallel Tiled QR Factorization for Multicore Architectures,”
Concurrency and Computation: Practice and Experience, vol. 20, pp. 1573-1590, January 2008.
(277.92 KB)
“Performance Instrumentation and Compiler Optimizations for MPI/OpenMP Applications,”
Lecture Notes in Computer Science, OpenMP Shared Memory Parallel Programming, vol. 4315: Springer Berlin / Heidelberg, 2008.
(350.9 KB)
“Performance of Various Computers Using Standard Linear Equations Software (Linpack Benchmark Report),”
University of Tennessee Computer Science Technical Report, CS-89-85, January 2008.
(6.42 MB)
“PERI Auto-tuning,”
Proc. SciDAC 2008, vol. 125, Seattle, Washington, Journal of Physics, January 2008.
(873.75 KB)
“The PlayStation 3 for High Performance Scientific Computing,”
Computing in Science and Engineering, pp. 80-83, January 2008.
(2.45 MB)
“The PlayStation 3 for High Performance Scientific Computing,”
University of Tennessee Computer Science Technical Report, no. UT-CS-08-608, January 2008.
(2.45 MB)
“The Problem with the Linpack Benchmark Matrix Generator,”
University of Tennessee Computer Science Technical Report, UT-CS-08-621 (also LAPACK Working Note 206), June 2008.
(136.41 KB)
“QR Factorization for the CELL Processor,”
University of Tennessee Computer Science Technical Report, UT-CS-08-616 (also LAPACK Working Note 201), May 2008.
(194.95 KB)
“Rectangular Full Packed Format for Cholesky's Algorithm: Factorization, Solution and Inversion,”
University of Tennessee Computer Science Technical Report, UT-CS-08-614 (also LAPACK Working Note 199), April 2008.
(896.03 KB)
“Redesigning the Message Logging Model for High Performance,”
International Supercomputer Conference (ISC 2008), Dresden, Germany, January 2008.
(622.1 KB)
“Request Sequencing: Enabling Workflow for Efficient Parallel Problem Solving in GridSolve,”
ICL Technical Report, no. ICL-UT-08-01, April 2008.
(1.64 MB)
“Request Sequencing: Enabling Workflow for Efficient Problem Solving in GridSolve,”
International Conference on Grid and Cooperative Computing (GCC 2008) (submitted), Shenzhen, China, October 2008.
(1.64 MB)
“Revisiting Matrix Product on Master-Worker Platforms,”
International Journal of Foundations of Computer Science (IJFCS), vol. 19, no. 6, pp. 1317-1336, December 2008.
(248.66 KB)
“Solving Systems of Linear Equations on the CELL Processor Using Cholesky Factorization,”
IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 9, pp. 1-11, January 2008.
(751.57 KB)
“Some Issues in Dense Linear Algebra for Multicore and Special Purpose Architectures,”
University of Tennessee Computer Science Technical Report, UT-CS-08-615 (also LAPACK Working Note 200), January 2008.
(289.93 KB)
“