Appendix
Name | Description | Processor Counts |
avus | CFD calculations on unstructured grids | 32, 64, 96, 128, 192, 256, 384 |
cth7 | effects of strong shock waves | 16, 32, 64, 96 |
gamess | general ab-initio quantum chemistry | 32, 48, 64, 96, 128 |
hycom | primitive equation ocean general circulation model | 24, 47, 59, 80, 96, 111, 124 |
lammps | classical molecular dynamics simulation | 16, 32, 48, 64, 128 |
oocore | out-of-core solver | 16, 32, 48, 64 |
overflow2 | CFD calculations on overlapping, multi-resolution grids | 16, 32, 48, 64 |
wrf | weather research and forecast | 16, 32, 48, 64, 96, 128, 192, 256, 384 |
Table 4. The applications used in the study and the number of processors on which each was run.
HPC lab location | Processor | Interconnect | # of compute processors |
ARL | SGI-O3800-0.4GHz | NUMACC | 512 |
ARL | LNX-Xeon-3.6GHz | Myrinet | 2048 |
ARSC | IBM-690-1.3GHz | Federation | 784 |
ASC | SGI-O3900-0.7GHz | NUMACC | 2032 |
ASC | HP-SC45-1.0GHz | Quadrics | 768 |
ERDC | SGI-O3900-0.7GHz | NUMACC | 1008 |
ERDC | HP-SC40-0.833GHz | Quadrics | 488 |
ERDC | HP-SC45-1.0GHz | Quadrics | 488 |
MHPCC | IBM-690-1.3GHz | Colony | 320 |
MHPCC | IBM-P3-0.375GHz | Colony | 736 |
NAVO | IBM-655-1.7GHz | Federation | 2832 |
NAVO | IBM-690-1.3GHz | Colony | 1328 |
NAVO | IBM-P3-0.375GHz | Colony | 736 |
SDSC | IBM-IA64-1.5GHz | Myrinet | 512 |
Table 5. Systems used in this study.
Probe name | DoD TI06 benchmark suite [20] | Machine property measured |
flops | CPUBENCH | peak rate for issuing floating-point operations |
L1 bw(1) | MEMBENCH | rate for loading strided data from L1 cache |
L1 bw(r) | " | rate for loading random stride data from L1 cache |
L2 bw(1) | " | rate for loading strided data from L2 cache |
L2 bw(r) | " | rate for loading random stride data from L2 cache |
L3 bw(1) | " | rate for loading strided data from L3 cache |
L3 bw(r) | " | rate for loading random stride data from L3 cache |
MM bw(1) | " | rate for loading strided data from main memory |
MM bw(r) | " | rate for loading random stride data from main memory |
NW bw | NETBENCH | rate for sending data point-to-point |
NW latency | NETBENCH | startup latency for sending data point-to-point |
Table 6. Probes run as part of DoD benchmarking.
References
1 Bailey, D. H., Barszcz, E., Barton, J. T., Browning, D. S., Carter, R. L., Dagum, D., Fatoohi, R. A., Frederickson, P. O., Lasinski, T. A., Schreiber, R. S., Simon, H. D., Venkatakrishnan, V., Weeratunga, S. K. "The NAS Parallel Benchmarks," The International Journal of Supercomputer Applications, 5(3):63–73, Fall 1991.
2 SPEC: Standard Performance Evaluation Corporation - www.spec.org, 2005.
3 Gustafson, J. L., Todi, R. "Conventional benchmarks as a sample of the performance spectrum," The Journal of Supercomputing, 13(3):321–342, 1999.
4 Luszczek, P., Dongarra, J., Koester, D., Rabenseifner, R., Lucas, B., Kepner, J., McCalpin, J., Bailey, D., Takahashi, D. "Introduction to the HPC Challenge benchmark suite," Available at www.hpcchallenge.org/pubs/, March 2005.
5 STREAM: Sustainable memory bandwidth in high performance computers - www.cs.virginia.edu/stream
6 McCalpin, J. D. "Memory bandwidth and machine balance in current high performance computers," IEEE Technical Committee on Computer Architecture Newsletter, December 1995.
7 Dongarra, J., Luszczek, P., Petitet, A. "The LINPACK benchmark: past, present and future," Concurrency and Computation: Practice and Experience, 15:1–18, 2003.
8 Top500 supercomputer sites - www.top500.org
9 IDC balanced rating - www.hpcuserforum.com
10 Carrington, L., Laurenzano, M., Snavely, A., Campbell, R., Davis, L. "How well can simple metrics predict the performance of real applications?" In Proceedings of Supercomputing (SC|05), November 2005.
11 Armstrong, B., Eigenmann, R. "Performance forecasting: Towards a methodology for characterizing large computational applications," In ICPP '98: Proceedings of the 1998 International Conference on Parallel Processing, pages 518–526, Minneapolis, Minnesota, August 1998.
12 Kerbyson, D. J., Alme, H. J., Hoisie, A., Petrini, F., Wasserman, H. J., Gittings, M. "Predictive performance and scalability modeling of a large-scale application," In Proceedings of Supercomputing (SC 2001), Denver, Colorado, November 2001.
13 Simon, J., Wierum, J. "Accurate performance prediction for massively parallel systems and its applications," In Proceedings of the 2nd International Euro-Par Conference, Lyon, France, August 1996.
14 Spooner, D., Kerbyson, D. "Identification of performance characteristics from multi-view trace analysis," In Proceedings of the International Conference on Computational Science (ICCS 2003), Melbourne, Australia, June 2003.
15 List inversions are also used in other fields, for example to compare the results returned by different queries to a database, and are related to the statistical measure Kendall's tau.
16 Kramer, W. T. C., Ryan, C. "Performance variability of highly parallel architectures," In Proceedings of the International Conference on Computational Science (ICCS 2003), Melbourne, Australia, June 2003.
17 In comparison, the worst ranking we saw had 2008 inversions. A random sample of 100 rankings had an average of 1000 inversions with a standard deviation just over 200.
18 Carrington, L., Snavely, A., Wolter, N., Gao, X. "A performance prediction framework for scientific applications," In Proceedings of the International Conference on Computational Science (ICCS 2003), Melbourne, Australia, June 2003.
19 Marin, G., Mellor-Crummey, J. "Cross-architecture performance predictions for scientific applications using parameterized models," In Proceedings of SIGMETRICS/Performance '04, New York, NY, June 2004.
20 Department of Defense High Performance Computing Modernization Program. "Technology Insertion 06 (TI-06)" - www.hpcmo.hpc.mil/Htdocs/TI/TI06, May 2005.
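Notes 15 and 17 refer to counting list inversions between two rankings of the same set of machines. As an illustrative sketch only (the system names and rankings below are hypothetical, not data from the study), the inversion count between two orderings can be computed as:

```python
# Illustrative sketch: count list inversions between two rankings of the
# same items, i.e. the number of pairs ordered differently by the two
# rankings (related to Kendall's tau, as in note 15).
from itertools import combinations

def count_inversions(ranking_a, ranking_b):
    """Number of item pairs whose relative order differs between rankings."""
    pos_b = {item: i for i, item in enumerate(ranking_b)}
    inversions = 0
    for i, j in combinations(range(len(ranking_a)), 2):
        # The pair is inverted if ranking_b reverses ranking_a's order.
        if pos_b[ranking_a[i]] > pos_b[ranking_a[j]]:
            inversions += 1
    return inversions

# Two hypothetical rankings of four systems:
a = ["navo-655", "arl-xeon", "sdsc-ia64", "asc-sc45"]
b = ["arl-xeon", "navo-655", "asc-sc45", "sdsc-ia64"]
print(count_inversions(a, b))  # 2 inverted pairs
```

Identical rankings give 0 inversions and fully reversed rankings of n items give n(n-1)/2, the scale against which the counts in note 17 can be read.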