CTWatch Quarterly, Volume 2, Number 4B, November 2006 B
High Productivity Computing Systems and the Path Towards Usable Petascale Computing

Metrics for Ranking the Performance of Supercomputers

Tzu-Yi Chen, Pomona College
Meghan Gunn, University of San Diego
Beth Simon, UC San Diego
Laura Carrington, San Diego Supercomputer Center
Allan Snavely, San Diego Supercomputer Center

Abstract

We introduce a metric for evaluating the quality of any predictive ranking and use this metric to investigate methods for answering the question: How can we best rank a set of supercomputers based on their expected performance on a set of applications? On modern supercomputers, with their deep memory hierarchies, we find that rankings based on benchmarks measuring the latency of accesses to L1 cache and the bandwidth of accesses to main memory are significantly better than rankings based on peak flops. We show how to use a combination of application characteristics and machine attributes to compute improved workload-independent rankings.

1. Introduction

Low-level performance metrics such as processor speed and peak floating-point issue rate (flops) are commonly reported, appearing even in mass-market computer advertisements. The implication is that these numbers can be used to predict how fast applications will run on different machines, so faster is better. More sophisticated users realize that manufacturer specifications such as theoretical peak floating-point issue rates are rarely achieved in practice, and so may instead use simple benchmarks to predict relative application performance on different machines. For understanding parallel performance, benchmarks range from scaled-down versions of real applications to simpler metrics (e.g., the NAS parallel benchmark suites [1], the SPEC benchmark [2], the HINT benchmark [3], the HPC Challenge benchmark [4], STREAM [5], and the ratio of flops to memory bandwidth [6]).
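
To make this single-number approach concrete, the sketch below ranks machines by one benchmark score and uses the ratio of those scores to predict relative runtimes. The machine names and values are invented for illustration and are not data from this study.

```python
# A minimal sketch of single-metric performance prediction (illustrative data only).
# Each machine is summarized by one benchmark score, e.g., peak GFLOP/s; the
# predicted runtime ratio between two machines is the inverse ratio of their scores.

peak_gflops = {          # hypothetical values, not measurements from this study
    "MachineA": 1200.0,
    "MachineB": 900.0,
    "MachineC": 1500.0,
}

# Rank machines from "fastest" to "slowest" by the single metric.
ranking = sorted(peak_gflops, key=peak_gflops.get, reverse=True)
print("Predicted ranking:", ranking)

def predicted_speedup(machine_x, machine_y):
    """Predicted runtime(machine_y) / runtime(machine_x) under the single-metric model."""
    return peak_gflops[machine_x] / peak_gflops[machine_y]

print("Predicted speedup of MachineC over MachineB:",
      predicted_speedup("MachineC", "MachineB"))
```

As the rest of this section argues, predictions of this form are only as good as the single metric's correlation with real application behavior.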

A particularly well-known parallel benchmark is Linpack [7], which has been used since 1993 to rank supercomputers for inclusion on the Top 500 list [8] (more recently, the IDC Balanced Rating [9] has also been used to rank machines). The Top 500 list is popular partly because it is easy to read, is based on a simple metric that is easy to measure (essentially peak flops), and is easy to update. Unfortunately, such benchmarks have also been found insufficient for accurately predicting runtimes of real applications [10].


Figure 1. Measured runtimes of eight applications on seven supercomputers. Each application's runtime is divided by the longest runtime for that application across the full set of 14 machines studied.

The reason is clear. Consider Figure 1, which plots the performance of eight different High Performance Computing (HPC) codes on seven different supercomputers. The codes are a subset of those in Table 4; the machines are a subset of those in Table 5 (both in Appendix). For each application, all the running times are normalized by the slowest time over all the machines. Since the machines shown are only a subset of those on which we collected runtimes, the highest bar is not at one for every application. While some machines are generally faster than others, no machine is fastest (or slowest) on all the applications. This suggests performance is not a function of any single metric.
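
The normalization behind Figure 1 can be reproduced in a few lines. The sketch below divides each application's runtime by the slowest runtime observed for that application and then checks whether any one machine is fastest on every application; the runtime table is made up for illustration, not the measurements behind the figure.

```python
# Normalize runtimes per application and look for a machine that is fastest everywhere.
# The runtime table is hypothetical; it only illustrates the normalization used in Figure 1.

runtimes = {  # runtimes[app][machine] = seconds (invented values)
    "app1": {"MachineA": 120.0, "MachineB": 150.0, "MachineC": 100.0},
    "app2": {"MachineA":  80.0, "MachineB":  60.0, "MachineC":  90.0},
    "app3": {"MachineA": 200.0, "MachineB": 210.0, "MachineC": 190.0},
}

# Divide each runtime by the slowest runtime for that application (so the slowest bar is 1.0).
normalized = {
    app: {m: t / max(times.values()) for m, t in times.items()}
    for app, times in runtimes.items()
}
print("Normalized runtimes:", normalized)

# Which machine is fastest on each application?
fastest = {app: min(times, key=times.get) for app, times in runtimes.items()}
print("Fastest machine per application:", fastest)

# A single metric could only explain the data if one machine were fastest on every application.
if len(set(fastest.values())) > 1:
    print("No single machine dominates; one number cannot capture relative performance.")
```

In the invented data above, as in Figure 1, different machines win on different applications, which is exactly the situation a single-metric ranking cannot represent.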


