Ideally, we would be able to rank all machines perfectly in an application-independent manner, using only machine characteristics. Although this may be impossible, we can still measure how close any given ranking comes to perfect. In this section, we describe a ranking metric based on thresholded inversions, identify the optimal ranking under that metric for our combination of machines and applications, and explore how close we can come to optimal using machine characteristics both individually and in combination.
We evaluate the quality of a ranking by the number of thresholded inversions that it contains. For example, consider a list of n systems m₁, m₂, …, mₙ, sorted from highest to lowest interprocessor network bandwidth, so that i < j means that the bandwidth achievable on system mᵢ is at least as high as the bandwidth achievable on system mⱼ. Given an application A and times t(1), t(2), …, t(n), where t(k) is the measured running time of A on machine mₖ, we would normally say that machines mᵢ and mⱼ are inverted if i < j but t(i) > t(j). Intuitively, all other things being equal, application A should run faster on the machine with the higher network bandwidth; if it does not, we call that pair an inversion. The number of inversions in the ranking, then, is the number of pairs of machines that are inverted [15]; in our example, this is the number of pairs for which interprocessor network bandwidth incorrectly predicts which machine runs A faster. Clearly this number is at least zero and no larger than n(n − 1)/2.
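As a concrete illustration (our sketch, not part of the original study), the following Python function counts plain, unthresholded inversions; it assumes the machines are already sorted from best to worst by the characteristic under test, so position in the list encodes the ranking.

```python
from itertools import combinations

def count_inversions(runtimes):
    """Count inverted pairs in a ranking.

    runtimes[i] is the measured runtime of the application on machine
    m_i, where machines are listed from best to worst according to the
    characteristic (e.g., network bandwidth) that defines the ranking.
    A pair (i, j) with i < j is inverted when the supposedly better
    machine is actually slower: t(i) > t(j).
    """
    return sum(1 for i, j in combinations(range(len(runtimes)), 2)
               if runtimes[i] > runtimes[j])

# Four machines ranked by bandwidth; at most n(n - 1)/2 = 6 inversions.
# The top-ranked machine (120 s) is slower than the next two machines,
# so the count is 2.
print(count_inversions([120.0, 95.0, 110.0, 130.0]))  # -> 2
```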
In practice, to allow for variance in the measured runtimes, we count only thresholded inversions. Given a threshold α satisfying 0 ≤ α ≤ 1, machines mᵢ and mⱼ are considered inverted only if mᵢ is predicted to be faster than mⱼ and t(i) > (1 + α)t(j). If α = 0, this reduces to the inversion count described in the previous paragraph. Choosing a small, nonzero α, however, lets us tolerate variations in timing measurements such as those discussed in [16].
Furthermore, to allow for variance in the benchmark timings that yield the machine metrics, we use a second threshold parameter β, where 0 ≤ β ≤ 1. Even if two machines mᵢ and mⱼ would be considered inverted by the criterion above because t(i) > (1 + α)t(j), we count the inversion only if the benchmark results (e.g., network bandwidths) also differ by more than a relative factor of β. In general, β should be chosen smaller than α because one expects less variance in benchmark times than in full application runtimes. In this paper we use α = 0.01, meaning that a difference of up to 1% in application runtimes is considered insignificant, and β = 0.001.
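Combining the two thresholds, a minimal sketch of the full metric might look like the following; `count_thresholded_inversions` is a hypothetical name of ours, and the sketch assumes higher benchmark scores are better (as with bandwidth).

```python
from itertools import combinations

def count_thresholded_inversions(scores, runtimes, alpha=0.01, beta=0.001):
    """Count thresholded inversions for a ranking defined by benchmark scores.

    scores[k]   -- machine m_k's benchmark result (higher is better,
                   e.g., interprocessor network bandwidth)
    runtimes[k] -- measured runtime of the application on m_k

    A pair counts as inverted only if (1) the benchmark predicts one
    machine to be faster by more than a relative margin beta, and
    (2) that machine is nevertheless slower by more than a relative
    margin alpha.
    """
    count = 0
    for i, j in combinations(range(len(scores)), 2):
        # Orient the pair so `hi` is the machine the benchmark favors.
        hi, lo = (i, j) if scores[i] >= scores[j] else (j, i)
        benchmark_gap = scores[hi] > (1 + beta) * scores[lo]      # differ by > beta
        runtime_flip = runtimes[hi] > (1 + alpha) * runtimes[lo]  # t(hi) > (1+alpha) t(lo)
        if benchmark_gap and runtime_flip:
            count += 1
    return count
```

With the paper's values α = 0.01 and β = 0.001, a pair whose runtimes differ by less than 1%, or whose benchmark scores differ by less than 0.1%, never counts as inverted.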
Note that a metric based on thresholded inversions is practical because it is monotonic, in the sense that adding another machine mₖ and its associated runtime t(k) cannot decrease the number of inversions in a ranking. Within our context of large parallel applications, this is useful because we have only partial runtime data: not all of the applications in Table 4 were run at all possible processor counts on all of the machines in Table 5. Some of these gaps are unavoidable because some systems do not have enough compute processors to run every application at every processor count. Moreover, in practice it is not unusual for timings collected at different times, on different machines, by different people to be incomplete in this way.
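Since missing measurements only remove pairs from consideration, partial runtime data can be accommodated by skipping any pair that lacks a timing; the count computed this way is a lower bound on what complete data would give, which is exactly the monotonicity property noted above. A sketch under that assumption (the None convention for missing timings is ours):

```python
from itertools import combinations

def count_inversions_partial(scores, runtimes, alpha=0.01, beta=0.001):
    """Thresholded inversions when some runtimes are missing (None).

    Pairs with a missing measurement contribute nothing, so the result
    can only grow as more timings are added -- the monotonicity that
    makes the metric usable on incomplete data.
    """
    count = 0
    for i, j in combinations(range(len(scores)), 2):
        if runtimes[i] is None or runtimes[j] is None:
            continue  # application never run on one of the machines
        hi, lo = (i, j) if scores[i] >= scores[j] else (j, i)
        if scores[hi] > (1 + beta) * scores[lo] and \
           runtimes[hi] > (1 + alpha) * runtimes[lo]:
            count += 1
    return count
```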