Publications
“A Holistic Approach for Performance Measurement and Analysis for Petascale Applications,”
ICCS 2009 Joint Workshop: Tools for Program Development and Analysis in Computational Science and Software Engineering for Large-Scale Computing, vol. 2009, Baton Rouge, Louisiana, Springer-Verlag Berlin Heidelberg 2009, pp. 686-695, May 2009.
(3.96 MB)
“Impact of Quad-core Cray XT4 System and Software Stack on Scientific Computation,”
Euro-Par 2009, Lecture Notes in Computer Science, vol. 5704/2009, Delft, The Netherlands, Springer Berlin / Heidelberg, pp. 334-344, August 2009.
(312.74 KB)
“The International Exascale Software Project: A Call to Cooperative Action by the Global High Performance Community,”
International Journal of High Performance Computing Applications (to appear), July 2009.
(203.04 KB)
“Modeling the Office of Science Ten Year Facilities Plan: The PERI Architecture Tiger Team,”
SciDAC 2009, Journal of Physics: Conference Series, vol. 180(2009)012039, San Diego, California, IOP Publishing, July 2009.
(906.39 KB)
“A Note on Auto-tuning GEMM for GPUs,”
9th International Conference on Computational Science (ICCS 2009), no. 5544-5545, Baton Rouge, LA, pp. 884-892, May 2009.
DOI: 10.1007/978-3-642-01970-8_89 (236.02 KB)
“Numerical Linear Algebra on Emerging Architectures: The PLASMA and MAGMA Projects,”
Journal of Physics: Conference Series, vol. 180, 2009.
(119.37 KB)
“Numerical Linear Algebra on Emerging Architectures: The PLASMA and MAGMA Projects,”
The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC09), Portland, OR, November 2009.
(3.53 MB)
“Optimizing Matrix Multiplication for a Short-Vector SIMD Architecture - CELL Processor,”
Parallel Computing, vol. 35, pp. 138-150, 2009.
(591.16 KB)
“Tall and Skinny QR Matrix Factorization Using Tile Algorithms on Multicore Architectures,”
Innovative Computing Laboratory Technical Report (also LAPACK Working Note 222 and CS Tech Report UT-CS-09-645), no. ICL-UT-09-03, September 2009.
(464.23 KB)
“Tile QR Factorization with Parallel Panel Processing for Multicore Architectures,”
accepted in 24th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2010), Atlanta, GA, December 2009.
“Transparent Cross-Platform Access to Software Services using GridSolve and GridRPC,”
in Cloud Computing and Software Services: Theory and Techniques (to appear): CRC Press, 2009.
““,”
8th International Conference on Computational Science (ICCS), Proceedings Parts I, II, and III, Lecture Notes in Computer Science, vol. 5101, Krakow, Poland, Springer Berlin, January 2008.
“DARPA's HPCS Program: History, Models, Tools, Languages,”
in Advances in Computers, vol. 72: Elsevier, January 2008.
(3.61 MB)
“Exploring New Architectures in Accelerating CFD for Air Force Applications,”
Proceedings of the DoD HPCMP User Group Conference, Seattle, Washington, January 2008.
(492.86 KB)
“Fast and Small Short Vector SIMD Matrix Multiplication Kernels for the CELL Processor,”
University of Tennessee Computer Science Technical Report, no. UT-CS-08-609, (also LAPACK Working Note 189), January 2008.
(500.99 KB)
“High Performance GridRPC Middleware,”
Recent Developments in Grid Technology and Applications: Nova Science Publishers, 2008.
(923.06 KB)
“Usage of the Scalasca Toolset for Scalable Performance Analysis of Large-scale Parallel Applications,”
Proceedings of the 2nd International Workshop on Tools for High Performance Computing, Stuttgart, Germany, Springer, pp. 157-167, January 2008.
(229.2 KB)
“Binomial Graph: A Scalable and Fault-Tolerant Logical Network Topology,”
Proceedings of The Fifth International Symposium on Parallel and Distributed Processing and Applications (ISPA07), Niagara Falls, Canada, Springer, August 2007.
(480.47 KB)
“Decision Trees and MPI Collective Algorithm Selection Problem,”
Euro-Par 2007, Rennes, France, Springer, pp. 105-115, August 2007.
(552.94 KB)
“MPI Collective Algorithm Selection and Quadtree Encoding,”
Parallel Computing (Special Edition: EuroPVM/MPI 2006): Elsevier, 2007.
(308.39 KB)
“Optimal Routing in Binomial Graph Networks,”
The International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), Adelaide, Australia, IEEE Computer Society, December 2007.
“Performance Analysis of MPI Collective Operations,”
Cluster Computing, vol. 10, no. 2: Springer Netherlands, pp. 127-143, June 2007.
(1018.28 KB)
“Reliability Analysis of Self-Healing Network using Discrete-Event Simulation,”
Proceedings of Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid '07): IEEE Computer Society, pp. 437-444, May 2007.
“Scalability Analysis of the SPEC OpenMP Benchmarks on Large-Scale Shared Memory Multiprocessors,”
Proceedings of the 2007 International Conference on Computational Science (ICCS 2007), vol. 4487-4490, Beijing, China, Springer LNCS, pp. 815-822, 2007.
DOI: 10.1007/978-3-540-72586-2_115 (145.84 KB)
“Self-Healing in Binomial Graph Networks,”
2nd International Workshop On Reliability in Decentralized Distributed Systems (RDDS 2007), Vilamoura, Algarve, Portugal, November 2007.
(322.39 KB)
“Flexible collective communication tuning architecture applied to Open MPI,”
2006 Euro PVM/MPI (submitted), Bonn, Germany, January 2006.
(206.58 KB)
“MPI Collective Algorithm Selection and Quadtree Encoding,”
Lecture Notes in Computer Science, vol. 4192, no. ICL-UT-06-13: Springer Berlin / Heidelberg, pp. 40-48, September 2006.
(308.39 KB)
“MPI Collective Algorithm Selection and Quadtree Encoding,”
ICL Technical Report, no. ICL-UT-06-11, 2006.
(308.39 KB)
“Scalable Fault Tolerant Protocol for Parallel Runtime Environments,”
2006 Euro PVM/MPI, no. ICL-UT-06-12, Bonn, Germany, 2006.
(149.07 KB)
“Self-Healing Network for Scalable Fault Tolerant Runtime Environments,”
DAPSYS 2006, 6th Austrian-Hungarian Workshop on Distributed and Parallel Systems, Innsbruck, Austria, January 2006.
(162.83 KB)
“A Systematic Multi-step Methodology for Performance Analysis of Communication Traces of Distributed Applications based on Hierarchical Clustering,”
Proc. of the 5th International Workshop on Performance Modeling, Evaluation, and Organization of Parallel and Distributed Systems (PMEO-PDS 2006), no. ICL-UT-05-06, Rhodes Island, Greece, IEEE Computer Society, April 2006.
(1.02 MB)
“Analysis and Optimization of Yee_Bench using Hardware Performance Counters,”
Proceedings of Parallel Computing 2005 (ParCo), Malaga, Spain, January 2005.
(72.27 KB)
“Comparison of Nonlinear Conjugate-Gradient methods for computing the Electronic Properties of Nanostructure Architectures,”
Proceedings of 5th International Conference on Computational Science (ICCS), Atlanta, GA, USA, Springer's Lecture Notes in Computer Science, pp. 317-325, January 2005.
(172.86 KB)
“Fault Tolerant High Performance Computing by a Coding Approach,”
Proceedings of ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (to appear), Chicago, Illinois, January 2005.
(209.37 KB)
“NetSolve: Grid Enabling Scientific Computing Environments,”
Grid Computing and New Frontiers of High Performance Processing, no. 14: Elsevier, 2005.
(425 KB)
“PerfMiner: Cluster-Wide Collection, Storage and Presentation of Application Level Hardware Performance Data,”
European Conference on Parallel Processing (Euro-Par 2005), Monte de Caparica, Portugal, Springer, September 2005.
DOI: 10.1007/11549468_1 (205.45 KB)
“Performance Analysis of MPI Collective Operations,”
Cluster Computing Journal (to appear), January 2005.
(1018.28 KB)
“Performance Analysis of MPI Collective Operations,”
4th International Workshop on Performance Modeling, Evaluation, and Optimization of Parallel and Distributed Systems (PMEO-PDS '05), Denver, Colorado, April 2005.
(1018.28 KB)
“Performance Profiling and Analysis of DoD Applications using PAPI and TAU,”
Proceedings of DoD HPCMP UGC 2005, Nashville, TN, IEEE, June 2005.
(322.56 KB)
“Scalable Fault Tolerant MPI: Extending the Recovery Algorithm,”
Proceedings of 12th European Parallel Virtual Machine and Message Passing Interface Conference - Euro PVM/MPI, vol. 3666, Sorrento (Naples), Italy, Springer-Verlag Berlin, pp. 67, September 2005.
(144.86 KB)
“Cray X1 Evaluation Status Report,”
Oak Ridge National Laboratory Report, vol. /-2004/13, January 2004.
(817.33 KB)
“Design of an Interactive Environment for Numerically Intensive Parallel Linear Algebra Calculations,”
International Conference on Computational Science, Poland, Springer Verlag, June 2004.
DOI: 10.1007/978-3-540-25944-2_35 (88.31 KB)
“Extending the MPI Specification for Process Fault Tolerance on High Performance Computing Systems,”
Proceedings of ISC2004 (to appear), Heidelberg, Germany, June 2004.
(548.38 KB)
“Process Fault-Tolerance: Semantics, Design and Applications for High Performance Computing,”
International Journal for High Performance Applications and Supercomputing (to appear), April 2004.
(186.9 KB)
“Computational Science — ICCS 2003,”
Lecture Notes in Computer Science, vol. 2657-2660, ICCS 2003, International Conference. Melbourne, Australia, Springer-Verlag, Berlin, June 2003.
“Energy Minimization of Protein Tertiary Structure by Parallel Simulated Annealing using Genetic Crossover,”
Special Issue on Biological Applications of Genetic and Evolutionary Computation (submitted), March 2003.
(438.68 KB)
“