HPC Challenge

The HPC Challenge suite consists of several activity-based benchmarks designed to test various aspects of a computing platform (see “Design and Implementation of the HPC Challenge Benchmark Suite” in this issue). The four benchmarks used in this study were FFT (v0.6a), High Performance Linpack (HPL, v0.6a), RandomAccess (v0.5b), and Stream (v0.6a). These codes were run on the Lincoln Laboratory Grid (LLGrid), a cluster of dual-processor nodes connected by Gigabit Ethernet [9]. The parallel codes were each run using 64 of these dual-processor nodes, for a total of 128 CPUs. The speedup for each parallel code was determined by dividing the runtime for a baseline serial C/Fortran code by the runtime for the parallel code (the serial Matlab code was treated the same as the parallel codes for purposes of comparison).
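To make the calculation concrete, the Matlab-style sketch below shows how the speedup, relative effort, and RDTP values plotted in Figure 3 would be computed for a single implementation. The runtimes, effort figures, and variable names are hypothetical placeholders rather than measurements from this study, and the effort measure is left generic (for example, development time or code size) rather than asserting which proxy was used for these benchmarks. RDTP, as defined earlier in this article, is speedup divided by relative effort, both taken with respect to the serial C/Fortran baseline.

% A minimal, hypothetical sketch of the per-implementation calculation.
% None of these numbers come from the study; they only illustrate the arithmetic.
t_serial_baseline = 1200;  % runtime (s) of the baseline serial C/Fortran code, hypothetical
t_implementation  = 60;    % runtime (s) of the implementation under test, hypothetical
effort_baseline   = 1.0;   % effort of the serial baseline (normalized), hypothetical
effort_impl       = 0.7;   % effort of the implementation under test, hypothetical

speedup         = t_serial_baseline / t_implementation;  % serial runtime / parallel runtime
relative_effort = effort_impl / effort_baseline;          % < 1 means less effort than the baseline
rdtp            = speedup / relative_effort;              % relative development time productivity

fprintf('speedup = %.1f, relative effort = %.2f, RDTP = %.1f\n', ...
    speedup, relative_effort, rdtp);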

Figure 3 presents the results of the RDTP analysis for the HPC Challenge benchmarks. With the exception of RandomAccess (whose implementation does not scale well on distributed-memory computing clusters), the MPI implementations all fall into the upper-right quadrant of the graph, indicating that they deliver some level of parallel speedup while requiring greater effort than the serial code. As expected, the serial Matlab implementations deliver no speedup, but all require less effort than the serial code. The pMatlab implementations (again excepting RandomAccess) fall into the upper-left quadrant of the graph, delivering parallel speedup while at the same time requiring less effort than the serial code.

The combination of parallel speedup and reduced effort means that the pMatlab implementations generally have higher RDTP values. On average the serial Matlab implementations come in second, due to their low relative effort. The MPI implementations, while delivering better speedup, also require more relative effort, leading to lower RDTP values.
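To see why this ordering follows from the metric, consider a small worked example; the numbers below are hypothetical and chosen only to illustrate the arithmetic, not read off Figure 3. Because RDTP divides speedup by relative effort, an implementation with moderate speedup and a fractional relative effort can out-score one with a much larger speedup but greater-than-baseline effort.

% Hypothetical illustration of the RDTP ranking (values are not from the study).
impls           = {'serial Matlab', 'pMatlab', 'MPI'};
speedup         = [0.5, 15, 20];   % speedup relative to the serial C/Fortran baseline
relative_effort = [0.1, 0.5, 5.0]; % effort relative to the serial baseline (< 1 = less effort)

rdtp = speedup ./ relative_effort; % RDTP = speedup / relative effort, element-wise

for k = 1:numel(impls)
    fprintf('%-13s  speedup %5.1f  relative effort %4.2f  RDTP %5.1f\n', ...
        impls{k}, speedup(k), relative_effort(k), rdtp(k));
end
% With these made-up numbers: pMatlab RDTP = 30, serial Matlab = 5, MPI = 4,
% reproducing the qualitative ordering described above.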

Figure 3. Speedup vs. Relative Effort and RDTP for HPC Challenge.
