%0 Journal Article
%J The Computer Journal
%D 2013
%T BlackjackBench: Portable Hardware Characterization with Automated Results Analysis
%A Anthony Danalis
%A Piotr Luszczek
%A Gabriel Marin
%A Jeffrey Vetter
%A Jack Dongarra
%K hardware characterization
%K micro-benchmarks
%K statistical analysis
%X DARPA's AACE project aimed to develop Architecture Aware Compiler Environments. Such a compiler automatically characterizes the targeted hardware and optimizes the application codes accordingly. We present the BlackjackBench suite, a collection of portable micro-benchmarks that automate system characterization, plus statistical analysis techniques for interpreting the results. The BlackjackBench benchmarks discover the effective sizes and speeds of the hardware environment rather than the often unattainable peak values. We aim at hardware characteristics that can be observed by running executables generated by existing compilers from standard C codes. We characterize the memory hierarchy, including cache sharing and non-uniform memory access characteristics of the system, properties of the processing cores affecting the instruction execution speed, and the length of the operating system scheduler time slot. We show how these features of modern multicores can be discovered programmatically. We also show how the features could potentially interfere with each other, resulting in incorrect interpretation of the results, and how established classification and statistical analysis techniques can reduce experimental noise and aid automatic interpretation of results. We show how effective hardware metrics from our probes allow guided tuning of computational kernels that outperform an autotuning library further tuned by the hardware vendor.
%B The Computer Journal
%8 2013-03
%G eng
%R 10.1093/comjnl/bxt057

%0 Conference Paper
%B Proceedings of the 27th ACM International Conference on Supercomputing (ICS '13)
%D 2013
%T Diagnosis and Optimization of Application Prefetching Performance
%A Gabriel Marin
%A Colin McCurdy
%A Jeffrey Vetter
%E Allen D. Malony
%E Mario Nemirovsky
%E Sam Midkiff
%X Hardware prefetchers are effective at recognizing streaming memory access patterns and at moving data closer to the processing units to hide memory latency. However, hardware prefetchers can track only a limited number of data streams due to finite hardware resources. In this paper, we introduce the term streaming concurrency to characterize the number of parallel, logical data streams in an application. We present a simulation algorithm for understanding the streaming concurrency at any point in an application, and we show that this metric is a good predictor of the number of memory requests initiated by streaming prefetchers. Next, we try to understand the causes behind poor prefetching performance. We identified four prefetch-unfriendly conditions and we show how to classify an application's memory references based on these conditions. We evaluated our analysis using the SPEC CPU2006 benchmark suite. We selected two benchmarks with unfavorable access patterns and transformed them to improve their prefetching effectiveness. Results show that making applications more prefetcher-friendly can yield meaningful performance gains.
%B Proceedings of the 27th ACM International Conference on Supercomputing (ICS '13)
%I ACM Press
%C Eugene, Oregon, USA
%8 2013-06
%@ 9781450321303
%G eng
%U http://dl.acm.org/citation.cfm?doid=2464996.2465014
%R 10.1145/2464996.2465014

%0 Book Section
%B Contemporary High Performance Computing: From Petascale Toward Exascale
%D 2013
%T Keeneland: Computational Science Using Heterogeneous GPU Computing
%A Jeffrey Vetter
%A Richard Glassbrook
%A Karsten Schwan
%A Sudhakar Yalamanchili
%A Mitch Horton
%A Ada Gavrilovska
%A Magda Slawinska
%A Jack Dongarra
%A Jeremy Meredith
%A Philip Roth
%A Kyle Spafford
%A Stanimire Tomov
%A John Wynkoop
%X The Keeneland Project is a five-year Track 2D grant awarded by the National Science Foundation (NSF) under solicitation NSF 08-573 in August 2009 for the development and deployment of an innovative high performance computing system. The Keeneland project is led by the Georgia Institute of Technology (Georgia Tech) in collaboration with the University of Tennessee at Knoxville, National Institute of Computational Sciences, and Oak Ridge National Laboratory.
%B Contemporary High Performance Computing: From Petascale Toward Exascale
%S CRC Computational Science Series
%I Taylor and Francis
%C Boca Raton, FL
%G eng
%& 7

%0 Journal Article
%J On the Road to Exascale Computing: Contemporary Architectures in High Performance Computing (to appear)
%D 2012
%T HPC Challenge: Design, History, and Implementation Highlights
%A Jack Dongarra
%A Piotr Luszczek
%E Jeffrey Vetter
%B On the Road to Exascale Computing: Contemporary Architectures in High Performance Computing (to appear)
%I Chapman & Hall/CRC Press
%8 2012
%G eng

%0 Conference Proceedings
%B IEEE International Parallel and Distributed Processing Symposium (submitted)
%D 2011
%T BlackjackBench: Hardware Characterization with Portable Micro-Benchmarks and Automatic Statistical Analysis of Results
%A Anthony Danalis
%A Piotr Luszczek
%A Gabriel Marin
%A Jeffrey Vetter
%A Jack Dongarra
%B IEEE International Parallel and Distributed Processing Symposium (submitted)
%C Anchorage, AK
%8 2011-05
%G eng

%0 Journal Article
%J International Journal of High Performance Computing Applications
%D 2011
%T The International Exascale Software Project Roadmap
%A Jack Dongarra
%A Pete Beckman
%A Terry Moore
%A Patrick Aerts
%A Giovanni Aloisio
%A Jean-Claude Andre
%A David Barkai
%A Jean-Yves Berthou
%A Taisuke Boku
%A Bertrand Braunschweig
%A Franck Cappello
%A Barbara Chapman
%A Xuebin Chi
%A Alok Choudhary
%A Sudip Dosanjh
%A Thom Dunning
%A Sandro Fiore
%A Al Geist
%A Bill Gropp
%A Robert Harrison
%A Mark Hereld
%A Michael Heroux
%A Adolfy Hoisie
%A Koh Hotta
%A Zhong Jin
%A Yutaka Ishikawa
%A Fred Johnson
%A Sanjay Kale
%A Richard Kenway
%A David Keyes
%A Bill Kramer
%A Jesus Labarta
%A Alain Lichnewsky
%A Thomas Lippert
%A Bob Lucas
%A Barney Maccabe
%A Satoshi Matsuoka
%A Paul Messina
%A Peter Michielse
%A Bernd Mohr
%A Matthias S. Mueller
%A Wolfgang E. Nagel
%A Hiroshi Nakashima
%A Michael E. Papka
%A Dan Reed
%A Mitsuhisa Sato
%A Ed Seidel
%A John Shalf
%A David Skinner
%A Marc Snir
%A Thomas Sterling
%A Rick Stevens
%A Fred Streitz
%A Bob Sugar
%A Shinji Sumimoto
%A William Tang
%A John Taylor
%A Rajeev Thakur
%A Anne Trefethen
%A Mateo Valero
%A Aad van der Steen
%A Jeffrey Vetter
%A Peg Williams
%A Robert Wisniewski
%A Kathy Yelick
%X Over the last 20 years, the open-source community has provided more and more software on which the world’s high-performance computing systems depend for performance and productivity. The community has invested millions of dollars and years of effort to build key components. However, although the investments in these separate software elements have been tremendously valuable, a great deal of productivity has also been lost because of the lack of planning, coordination, and key integration of technologies necessary to make them work together smoothly and efficiently, both within individual petascale systems and between different systems. It seems clear that this completely uncoordinated development model will not provide the software needed to support the unprecedented parallelism required for peta/exascale computation on millions of cores, or the flexibility required to exploit new hardware models and features, such as transactional memory, speculative execution, and graphics processing units. This report describes the work of the community to prepare for the challenges of exascale computing, ultimately combining their efforts in a coordinated International Exascale Software Project.
%B International Journal of High Performance Computing Applications
%V 25
%P 3-60
%8 2011-01
%G eng
%R 10.1177/1094342010391989

%0 Journal Article
%J IEEE Computing in Science & Engineering
%D 2011
%T Keeneland: Bringing Heterogeneous GPU Computing to the Computational Science Community
%A Jeffrey Vetter
%A Richard Glassbrook
%A Jack Dongarra
%A Karsten Schwan
%A Bruce Loftis
%A Stephen McNally
%A Jeremy Meredith
%A James Rogers
%A Philip Roth
%A Kyle Spafford
%A Sudhakar Yalamanchili
%K Benchmark testing
%K Computational modeling
%K Computer architecture
%K Graphics processing unit
%K Hardware
%K Random access memory
%K Scientific computing
%X The Keeneland project's goal is to develop and deploy an innovative, GPU-based high-performance computing system for the NSF computational science community.
%B IEEE Computing in Science & Engineering
%V 13
%P 90-95
%8 2011-08
%G eng
%N 5
%R 10.1109/MCSE.2011.83

%0 Conference Proceedings
%B ICCS 2009 Joint Workshop: Tools for Program Development and Analysis in Computational Science and Software Engineering for Large-Scale Computing
%D 2009
%T A Holistic Approach for Performance Measurement and Analysis for Petascale Applications
%A Heike Jagode
%A Jack Dongarra
%A Sadaf Alam
%A Jeffrey Vetter
%A W. Spear
%A Allen D. Malony
%E Gabrielle Allen
%B ICCS 2009 Joint Workshop: Tools for Program Development and Analysis in Computational Science and Software Engineering for Large-Scale Computing
%I Springer-Verlag Berlin Heidelberg
%C Baton Rouge, Louisiana
%V 2009
%P 686-695
%8 2009-05
%G eng

%0 Journal Article
%J Advances in Computers
%D 2008
%T DARPA's HPCS Program: History, Models, Tools, Languages
%A Jack Dongarra
%A Robert Graybill
%A William Harrod
%A Robert Lucas
%A Ewing Lusk
%A Piotr Luszczek
%A Janice McMahon
%A Allan Snavely
%A Jeffrey Vetter
%A Katherine Yelick
%A Sadaf Alam
%A Roy Campbell
%A Laura Carrington
%A Tzu-Yi Chen
%A Omid Khalili
%A Jeremy Meredith
%A Mustafa Tikir
%E M. Zelkowitz
%B Advances in Computers
%I Elsevier
%V 72
%8 2008-01
%G eng

%0 Journal Article
%J Oak Ridge National Laboratory Report
%D 2004
%T Cray X1 Evaluation Status Report
%A Pratul Agarwal
%A R. A. Alexander
%A E. Apra
%A Satish Balay
%A Arthur S. Bland
%A James Colgan
%A Eduardo D'Azevedo
%A Jack Dongarra
%A Tom Dunigan
%A Mark Fahey
%A Al Geist
%A M. Gordon
%A Robert Harrison
%A Dinesh Kaushik
%A M. Krishnakumar
%A Piotr Luszczek
%A Tony Mezzacappa
%A Jeff Nichols
%A Jarek Nieplocha
%A Leonid Oliker
%A T. Packwood
%A M. Pindzola
%A Thomas C. Schulthess
%A Jeffrey Vetter
%A James B. White
%A T. Windus
%A Patrick H. Worley
%A Thomas Zacharia
%B Oak Ridge National Laboratory Report
%V /-2004/13
%8 2004-01
%G eng