Appendix
Name | Description | Processor Counts |
avus | CFD calculations on unstructured grids | 32, 64, 96, 128, 192, 256, 384 |
cth7 | effects of strong shock waves | 16, 32, 64, 96 |
gamess | general ab-initio quantum chemistry | 32, 48, 64, 96, 128 |
hycom | primitive equation ocean general circulation model | 24, 47, 59, 80, 96, 111, 124 |
lammps | classical molecular dynamics simulation | 16, 32, 48, 64, 128 |
oocore | out-of-core solver | 16, 32, 48, 64 |
overflow2 | CFD calculations on overlapping, multi-resolution grids | 16, 32, 48, 64 |
wrf | weather research and forecast | 16, 32, 48, 64, 96, 128, 192, 256, 384 |
Table 4. The applications used in the study and the number of processors on which each was run.
HPC lab location | Processor | Interconnect | # of compute processors |
ARL | SGI-O3800-0.4GHz | NUMACC | 512 |
ARL | LNX-Xeon-3.6GHz | Myrinet | 2048 |
ARSC | IBM-690-1.3GHz | Federation | 784 |
ASC | SGI-O3900-0.7GHz | NUMACC | 2032 |
ASC | HP-SC45-1.0GHz | Quadrics | 768 |
ERDC | SGI-O3900-0.7GHz | NUMACC | 1008 |
ERDC | HP-SC40-0.833GHz | Quadrics | 488 |
ERDC | HP-SC45-1.0GHz | Quadrics | 488 |
MHPCC | IBM-690-1.3GHz | Colony | 320 |
MHPCC | IBM-P3-0.375GHz | Colony | 736 |
NAVO | IBM-655-1.7GHz | Federation | 2832 |
NAVO | IBM-690-1.3GHz | Colony | 1328 |
NAVO | IBM-P3-0.375GHz | Colony | 736 |
SDSC | IBM-IA64-1.5GHz | Myrinet | 512 |
Table 5. Systems used in this study.
Probe name | DoD TI06 benchmark suite [20] | Machine property measured |
flops | CPUBENCH | peak rate for issuing floating-point operations |
L1 bw(1) | MEMBENCH | rate for loading strided data from L1 cache |
L1 bw(r) | " | rate for loading random stride data from L1 cache |
L2 bw(1) | " | rate for loading strided data from L2 cache |
L2 bw(r) | " | rate for loading random stride data from L2 cache |
L3 bw(1) | " | rate for loading strided data from L3 cache |
L3 bw(r) | " | rate for loading random stride data from L3 cache |
MM bw(1) | " | rate for loading strided data from main memory |
MM bw(r) | " | rate for loading random stride data from main memory |
NW bw | NETBENCH | rate for sending data point-to-point |
NW latency | NETBENCH | startup latency for sending data point-to-point |
Table 6. Probes run as part of DoD benchmarking.
References
1 Bailey, D. H., Barszcz, E., Barton, J. T., Browning, D. S., Carter, R. L., Dagum, D., Fatoohi, R. A., Frederickson, P. O., Lasinski, T. A., Schreiber, R. S., Simon, H. D., Venkatakrishnan, V., Weeratunga, S. K. "The NAS Parallel Benchmarks," The International Journal of Supercomputer Applications, 5(3):63–73, Fall 1991.
2 SPEC: Standard Performance Evaluation Corporation - www.spec.org, 2005.
3 Gustafson, J. L., Todi, R. "Conventional benchmarks as a sample of the performance spectrum," The Journal of Supercomputing, 13(3):321–342, 1999.
4 Luszczek, P., Dongarra, J., Koester, D., Rabenseifner, R., Lucas, B., Kepner, J., McCalpin, J., Bailey, D., Takahashi, D. "Introduction to the HPC Challenge benchmark suite," Available at www.hpcchallenge.org/pubs/, March 2005.
5 STREAM: Sustainable memory bandwidth in high performance computers - www.cs.virginia.edu/stream
6 McCalpin, J. D. "Memory bandwidth and machine balance in current high performance computers," IEEE Technical Committee on Computer Architecture Newsletter, December 1995.
7 Dongarra, J., Luszczek, P., Petitet, A. "The LINPACK benchmark: past, present and future," Concurrency and Computation: Practice and Experience, 15:1–18, 2003.
8 Top500 supercomputer sites - www.top500.org
9 IDC balanced rating - www.hpcuserforum.com
10 Carrington, L., Laurenzano, M., Snavely, A., Campbell, R., Davis, L. "How well can simple metrics predict the performance of real applications?" In Proceedings of Supercomputing (SC|05), November 2005.
11 Armstrong, B., Eigenmann, R. "Performance forecasting: Towards a methodology for characterizing large computational applications," In ICPP '98: Proceedings of the 1998 International Conference on Parallel Processing, pages 518–526, Minneapolis, Minnesota, August 1998.
12 Kerbyson, D. J., Alme, H. J., Hoisie, A., Petrini, F., Wasserman, H. J., Gittings, M. "Predictive performance and scalability modeling of a large-scale application," In Proceedings of Supercomputing (SC 2001), Denver, Colorado, November 2001.
13 Simon, J., Wierum, J. "Accurate performance prediction for massively parallel systems and its applications," In Proceedings of the 2nd International Euro-Par Conference, Lyon, France, August 1996.
14 Spooner, D., Kerbyson, D. "Identification of performance characteristics from multi-view trace analysis," In Proceedings of the International Conference on Computational Science (ICCS 2003), Melbourne, Australia, June 2003.
15 List inversions are also used in other fields, for example to compare the results returned by different queries to a database, and are related to the statistical measure Kendall's tau.
16 Kramer, W. T. C., Ryan, C. "Performance variability of highly parallel architectures," In Proceedings of the International Conference on Computational Science (ICCS 2003), Melbourne, Australia, June 2003.
17 In comparison, the worst ranking we saw had 2008 inversions. A random sample of 100 rankings had an average of 1000 inversions with a standard deviation just over 200.
18 Carrington, L., Snavely, A., Wolter, N., Gao, X. "A performance prediction framework for scientific applications," In Proceedings of the International Conference on Computational Science (ICCS 2003), Melbourne, Australia, June 2003.
19 Marin, G., Mellor-Crummey, J. "Cross-architecture performance predictions for scientific applications using parameterized models," In Proceedings of SIGMETRICS/Performance '04, New York, NY, June 2004.
20 Department of Defense High Performance Computing Modernization Program. "Technology Insertion 06 (TI-06)" - www.hpcmo.hpc.mil/Htdocs/TI/TI06, May 2005.
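Notes 15 and 17 refer to counting list inversions between two rankings of the same set of machines. As an illustrative sketch only (the system names and rankings below are hypothetical, not data from the study), the inversion count between two orderings can be computed as:

```python
# Illustrative sketch: count list inversions between two rankings of the
# same items, i.e. the number of pairs ordered differently by the two
# rankings (related to Kendall's tau, as in note 15).
from itertools import combinations

def count_inversions(ranking_a, ranking_b):
    """Number of item pairs whose relative order differs between rankings."""
    pos_b = {item: i for i, item in enumerate(ranking_b)}
    inversions = 0
    for i, j in combinations(range(len(ranking_a)), 2):
        # The pair is inverted if ranking_b reverses ranking_a's order.
        if pos_b[ranking_a[i]] > pos_b[ranking_a[j]]:
            inversions += 1
    return inversions

# Two hypothetical rankings of four systems:
a = ["navo-655", "arl-xeon", "sdsc-ia64", "asc-sc45"]
b = ["arl-xeon", "navo-655", "asc-sc45", "sdsc-ia64"]
print(count_inversions(a, b))  # 2 inverted pairs
```

Identical rankings give 0 inversions and fully reversed rankings of n items give n(n-1)/2, the scale against which the counts in note 17 can be read.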