CTWatch
November 2006 B
High Productivity Computing Systems and the Path Towards Usable Petascale Computing
Pedro C. Diniz, Information Sciences Institute, University of Southern California
Jeremy Abramson, Information Sciences Institute, University of Southern California

3.4 Discussion

The architectural model developed in the current SLOPE implementation is rather simple (but not simplistic) in several respects. First, it assumes zero-overhead instruction scheduling. This is clearly not the case in practice, although pipelined execution techniques can approximate it. Second, it does not take into account register pressure during execution. Third, it does not yet take into account advanced execution techniques such as software pipelining and multi-threading. Ignoring these techniques and compiler implementation details clearly leads to quantitative results that differ, perhaps substantially, from those of current high-end machines. Last but not least, this approach still needs to be validated against a real architecture.
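The zero-overhead, fully pipelined scheduling assumption can be sketched as a small dependence-driven cost model. The following is a minimal illustration only, not SLOPE's actual implementation; the operation types, latencies, and the example kernel are assumed values chosen for clarity.

```python
# Minimal sketch of the scheduling assumption described above: operations
# issue with zero scheduling overhead, and every functional unit is fully
# pipelined, so issue is never delayed by unit occupancy -- only by data
# dependences. Latencies below are hypothetical, not SLOPE's values.

LATENCY = {"add": 1, "mul": 4, "div": 20, "load": 3}  # cycles (assumed)

def schedule_length(ops, deps):
    """ops: list of (name, op_type); deps: {op index: [predecessor indices]}.
    Returns the cycle in which the last result becomes available, assuming
    unlimited, fully pipelined units and zero-overhead issue."""
    finish = {}
    for i, (_, op_type) in enumerate(ops):
        # an operation can issue once all its predecessors have produced results
        start = max((finish[p] for p in deps.get(i, [])), default=0)
        finish[i] = start + LATENCY[op_type]
    return max(finish.values())

# one iteration of a[i] = (b[i] + c[i]) / d[i]:
ops = [("ld_b", "load"), ("ld_c", "load"), ("ld_d", "load"),
       ("sum", "add"), ("quot", "div")]
deps = {3: [0, 1], 4: [2, 3]}
print(schedule_length(ops, deps))  # critical path: 3 + 1 + 20 = 24 cycles
```

Under this model only the dependence critical path matters; real machines add issue-width limits, structural hazards, and register-pressure stalls that such a sketch deliberately omits.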

Nevertheless, this approach allows the development of quantitative architectural performance trends and hence allows architecture designers to make informed decisions about how to most efficiently allocate transistors. In the case above, a trade-off could be evaluated between the complexity and power consumption (for example) of an arithmetic unit that pipelines all operations (including divides) and that of simply replicating standard arithmetic units. This information also allows developers to gauge how the performance of a given code would scale on a proposed "future" machine.
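The pipelined-versus-replicated trade-off mentioned above can be made concrete with a back-of-the-envelope throughput comparison. The divide latency and unit counts below are assumptions for illustration, not figures from the article.

```python
# Hypothetical comparison behind the design trade-off discussed above:
# cycles to retire n independent divides on (a) one fully pipelined divider
# versus (b) k replicated non-pipelined dividers. All numbers are assumed.
import math

DIV_LATENCY = 20  # cycles per divide (assumed)

def pipelined_divider(n):
    # a new divide issues every cycle; the last of n results is ready
    # (n - 1) issue cycles after the first, plus the full latency
    return (n - 1) + DIV_LATENCY

def replicated_dividers(n, k):
    # each non-pipelined unit is busy for the full latency per divide;
    # the k units work on independent divides in parallel
    return math.ceil(n / k) * DIV_LATENCY

for n in (1, 8, 64):
    print(n, pipelined_divider(n), replicated_dividers(n, 4))
# n=1:  20 vs 20   -- equal for a single divide
# n=64: 83 vs 320  -- pipelining wins on long independent streams
```

A model like this lets a designer weigh the throughput benefit of a pipelined divider against its area and power cost before committing transistors.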

4. Conclusion

We have described SLOPE, a system for performance prediction and architecture sensitivity analysis using source-level program analysis and scheduling techniques. SLOPE provides a very fast qualitative analysis of the performance of a given kernel code. We have experimented with a real scientific code that engineers and scientists use in practice. The results yield important qualitative performance-sensitivity information that can be used to allocate computing resources to the computation judiciously for maximum resource efficiency and/or to help guide the application of compiler transformations such as loop unrolling.

References
1. Bailey, D., Barton, J., Lasinski, T., Simon, H. "The NAS Parallel Benchmarks," International Journal of Supercomputer Applications, 1991.
2. SPEC - www.spec.org/
3. Snavely, A., Carrington, L., Wolter, N., Labarta, J., Badia, R., Purkayastha, A. "A Framework for Application Performance Modeling and Prediction," In Proceedings of the 2002 ACM/IEEE SuperComputing Conference (SC'02), 2002.
4. Saavedra, R. H., Smith, A. J. "Measuring Cache and TLB Performance and Their Effect on Benchmark Run Times," IEEE Transactions on Computers, vol. 44, no. 10.
5. Kerbyson, D. J., Hoisie, A., Wasserman, H. J. "Modeling the Performance of Large-Scale Systems," Keynote paper, UK Performance Engineering Workshop (UKPEW03), July 2003.
6. The ASCI Purple Benchmark Codes - www.llnl.gov/asci/purple/benchmarks/limited/umt/umt1.2.readme.html#Cod...
7. This intermediate representation of the Open64 [10] infrastructure consists of an annotated abstract-syntax-tree representation of the input source code, similar to other high-level representations of source code.
8. Banerjee, U., Eigenmann, R., Nicolau, A., Padua, D. "Automatic Program Parallelization," In Proceedings of the IEEE, 1993.
9. Mowry, T. "Tolerating Latency in Multiprocessors through Compiler-Inserted Prefetching," ACM Transactions on Computer Systems, 16(1), pp. 55-92, Feb. 1998.
10. The Open64 Compiler and Tools - sourceforge.net/projects/open64/
11. Pipelining allows the functional unit allocated to an operation to be available again one cycle after it is allocated.


Reference this article
"SLOPE - A Compiler Approach to Performance Prediction and Performance Sensitivity Analysis for Scientific Codes," CTWatch Quarterly, Volume 2, Number 4B, November 2006 B. http://www.ctwatch.org/quarterly/articles/2006/11/slope-a-compiler-approach-to-performance-prediction-and-performance-sensitivity-analysis-for-scientific-codes/
