Publications
“Non-Determinism and Overcount on Modern Hardware Performance Counter Implementations,”
2013 IEEE International Symposium on Performance Analysis of Systems and Software, Austin, TX, IEEE, April 2013.
“PAPI 5: Measuring Power, Energy, and the Cloud,”
2013 IEEE International Symposium on Performance Analysis of Systems and Software, Austin, TX, April 2013.
“Measuring Energy and Power with PAPI,”
International Workshop on Power-Aware Systems and Architectures, Pittsburgh, PA, September 2012.
DOI: 10.1109/ICPPW.2012.39
“PAPI-V: Performance Monitoring for Virtual Machines,”
CloudTech-HPC 2012, Pittsburgh, PA, September 2012.
DOI: 10.1109/ICPPW.2012.29
“Power Aware Computing on GPUs,”
SAAHPC '12 (Best Paper Award), Argonne, IL, July 2012.
“Autotuned Parallel I/O for Highly Scalable Biosequence Analysis,”
TeraGrid'11, Salt Lake City, Utah, July 2011.
“The Design of an Auto-tuning I/O Framework on Cray XT5 System,”
Cray Users Group Conference (CUG'11) (Best Paper Finalist), Fairbanks, Alaska, May 2011.
“Energy and performance characteristics of different parallel implementations of scientific applications on multicore systems,”
International Journal of High Performance Computing Applications, vol. 25, no. 3, pp. 342-350, 2011.
“Evaluation of the HPC Challenge Benchmarks in Virtualized Environments,”
6th Workshop on Virtualization in High-Performance Cloud Computing, Bordeaux, France, August 2011.
“Power-aware Computing on GPGPUs,”
Fall Creek Falls Conference (Poster), Gatlinburg, TN, September 2011.
“Power-Aware Prediction Models of Hybrid (MPI/OpenMP) Scientific Applications,”
International Conference on Energy-Aware High Performance Computing (EnA-HPC 2011), Hamburg, Germany, September 2011.
“User-Defined Events for Hardware Performance Monitoring,”
Procedia Computer Science, vol. 4: Elsevier, pp. 2096-2104, May 2011.
DOI: 10.1016/j.procs.2011.04.229
“Empirical Performance Tuning of Dense Linear Algebra Software,”
in Performance Tuning of Scientific Applications (to appear), 2010.
“Performance Evaluation for Petascale Quantum Simulation Tools,”
Proceedings of the Cray Users' Group Meeting, Atlanta, GA, May 2010.
“Scalability Study of a Quantum Simulation Code,”
PARA 2010, Reykjavik, Iceland, June 2010.
“Analytical Modeling and Optimization for Affinity Based Thread Scheduling on Multicore Systems,”
IEEE Cluster 2009, New Orleans, August 2009.
“Capturing and Analyzing the Execution Control Flow of OpenMP Applications,”
International Journal of Parallel Programming, vol. 37, no. 3, pp. 266-276, 2009.
“I/O Performance Analysis for the Petascale Simulation Code FLASH,”
ISC'09, Hamburg, Germany, June 2009.
“Making Performance Analysis and Tuning Part of the Software Development Cycle,”
Proceedings of DoD HPCMP UGC 2009, San Diego, CA, IEEE, June 2009.
“Modeling the Office of Science Ten Year Facilities Plan: The PERI Architecture Tiger Team,”
SciDAC 2009, Journal of Physics: Conference Series, vol. 180, 012039, San Diego, California, IOP Publishing, July 2009.
“Performance evaluation for petascale quantum simulation tools,”
Proceedings of CUG09, Atlanta, GA, May 2009.
“Recording the Control Flow of Parallel Applications to Determine Iterative and Phase-Based Behavior,”
Future Generation Computer Systems, vol. 26, pp. 162-166, 2009.
“A Scalable Non-blocking Multicast Scheme for Distributed DAG Scheduling,”
The International Conference on Computational Science 2009 (ICCS 2009), vol. 5544, Baton Rouge, LA, pp. 195-204, May 2009.
“Analytical Modeling for Affinity-Based Thread Scheduling on Multicore Platforms,”
University of Tennessee Computer Science Technical Report, UT-CS-08-626, January 2008.
“Detection and Analysis of Iterative Behavior in Parallel Applications,”
Proceedings of the 2008 International Conference on Computational Science (ICCS 2008), vol. 5103, Krakow, Poland, pp. 261-267, January 2008.
“Exploring New Architectures in Accelerating CFD for Air Force Applications,”
Proceedings of the DoD HPCMP User Group Conference, Seattle, Washington, January 2008.
“OpenMP-centric Performance Analysis of Hybrid Applications,”
Proc. 2008 IEEE International Conference on Cluster Computing (CLUSTER 2008), Tsukuba, Japan, January 2008.
“Performance Instrumentation and Compiler Optimizations for MPI/OpenMP Applications,”
Lecture Notes in Computer Science, OpenMP Shared Memory Parallel Programming, vol. 4315, Springer Berlin / Heidelberg, 2008.
“PERI Auto-tuning,”
Proc. SciDAC 2008, vol. 125, Seattle, Washington, Journal of Physics: Conference Series, January 2008.
“Usage of the Scalasca Toolset for Scalable Performance Analysis of Large-scale Parallel Applications,”
Proceedings of the 2nd International Workshop on Tools for High Performance Computing, Stuttgart, Germany, Springer, pp. 157-167, January 2008.
“Visualizing the Program Execution Control Flow of OpenMP Applications,”
Proc. 4th International Workshop on OpenMP (IWOMP 2008), West Lafayette, Indiana, Lecture Notes in Computer Science 5004, pp. 181-190, January 2008.
“Automated Empirical Tuning of a Multiresolution Analysis Kernel,”
ICL Technical Report, no. ICL-UT-07-01, pp. 10, January 2007.
“Automatic Analysis of Inefficiency Patterns in Parallel Applications,”
Concurrency and Computation: Practice and Experience, vol. 19, no. 11, pp. 1481-1496, August 2007.
“Continuous Runtime Profiling of OpenMP Applications,”
Proceedings of the 2007 Conference on Parallel Computing (PARCO 2007), Juelich and Aachen, Germany, January 2007.
“Empirical Tuning of a Multiresolution Analysis Kernel using a Specialized Code Generator,”
ICL Technical Report, no. ICL-UT-07-02, January 2007.
“Feedback-Directed Thread Scheduling with Memory Considerations,”
IEEE International Symposium on High Performance Distributed Computing, Monterey Bay, CA, June 2007.
“L2 Cache Modeling for Scientific Applications on Chip Multi-Processors,”
Proceedings of the 2007 International Conference on Parallel Processing, Xi'an, China, IEEE Computer Society, January 2007.
“Memory Leak Detection in Fortran Applications using TAU,”
Proc. DoD HPCMP Users Group Conference (HPCMP-UGC'07), Pittsburgh, PA, IEEE Computer Society, January 2007.
“Results of the PERI survey of SciDAC applications,”
Journal of Physics: Conference Series, SciDAC 2007, vol. 78, January 2007.
“Experiments with Strassen's Algorithm: From Sequential to Parallel,”
18th IASTED International Conference on Parallel and Distributed Computing and Systems PDCS 2006 (submitted), Dallas, Texas, January 2006.
“Large Event Traces in Parallel Performance Analysis,”
8th Workshop 'Parallel Systems and Algorithms' (PASA), Lecture Notes in Informatics, no. ICL-UT-06-08, Frankfurt/Main, Germany, Gesellschaft für Informatik, March 2006.
“Modeling of L2 Cache Behavior for Thread-Parallel Scientific Programs on Chip Multi-Processors,”
University of Tennessee Computer Science Technical Report, no. UT-CS-06-583, January 2006.
“Performance Instrumentation and Compiler Optimizations for MPI/OpenMP Applications,”
Second International Workshop on OpenMP, Reims, France, January 2006.
“Automatic analysis of inefficiency patterns in parallel applications,”
Concurrency and Computation: Practice and Experience, Special issue "Automatic Performance Analysis" (submitted), 2005.
“Automatic Experimental Analysis of Communication Patterns in Virtual Topologies,”
In Proceedings of the International Conference on Parallel Processing, Oslo, Norway, IEEE Computer Society, June 2005.
“Improving Time to Solution with Automated Performance Analysis,”
Second Workshop on Productivity and Performance in High-End Computing (P-PHEC) at 11th International Symposium on High Performance Computer Architecture (HPCA-2005), San Francisco, February 2005.
“A Pattern-Based Approach to Automated Application Performance Analysis,”
Workshop on Patterns in High Performance Computing, University of Illinois at Urbana-Champaign, May 2005.
“Performance Analysis of GYRO: A Tool Evaluation,”
In Proceedings of the 2005 SciDAC Conference, San Francisco, CA, June 2005.
“Performance Profiling and Analysis of DoD Applications using PAPI and TAU,”
Proceedings of DoD HPCMP UGC 2005, Nashville, TN, IEEE, June 2005.
“A Scalable Approach to MPI Application Performance Analysis,”
In Proc. of the 12th European Parallel Virtual Machine and Message Passing Interface Conference: Springer LNCS, September 2005.
“