The architectural model developed in the current SLOPE implementation is rather simple (but not simplistic) in several respects. First, it assumes zero-overhead instruction scheduling. This is clearly not the case, although pipelined execution can approximate it. Second, it does not account for register pressure during execution. Third, it does not yet model advanced execution techniques such as software pipelining and multithreading. Ignoring these techniques and compiler implementation details leads to quantitative results that may differ, perhaps substantially, from those of current high-end machines. Last but not least, this approach still needs to be validated against a real architecture.
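To make the zero-overhead assumption concrete, the following is a minimal sketch (not SLOPE's actual implementation; operation latencies are illustrative): with unlimited issue width, no scheduling overhead, and no register pressure, the predicted cycle count of a dependence graph reduces to its critical path through operation latencies.

```python
# Hypothetical per-operation latencies (cycles), for illustration only.
LATENCY = {"add": 1, "mul": 3, "div": 20, "load": 2}

def critical_path(ops, deps):
    """ops: {name: kind}; deps: {name: [predecessor names]}.
    Returns the idealized cycle count: the latest finish time when every
    operation starts as soon as all of its predecessors have finished."""
    finish = {}
    def done(op):
        if op not in finish:
            start = max((done(p) for p in deps.get(op, [])), default=0)
            finish[op] = start + LATENCY[ops[op]]
        return finish[op]
    return max(done(op) for op in ops)

# a = x * y; b = a + z  ->  3 + 1 = 4 cycles under this idealization.
print(critical_path({"a": "mul", "b": "add"}, {"b": ["a"]}))  # -> 4
```

A real machine adds issue-width limits, structural hazards, and spill code, which is precisely why such a model yields trends rather than exact cycle counts.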
Nevertheless, this approach allows the development of quantitative architectural performance trends and hence allows architecture designers to make informed decisions about how to allocate transistors most efficiently. In the case above, designers could weigh the complexity and power consumption (for example) of an arithmetic unit that pipelines all operations (including divides) against simply replicating standard arithmetic units. This information also gives developers an idea of the performance gains a given code would see on a proposed “future” machine.
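A back-of-the-envelope comparison illustrates the divide trade-off just mentioned (the 20-cycle latency and unit count are assumptions for illustration, not measurements): for a stream of independent divides, a fully pipelined divider finishes one result per cycle after the first, while replicated non-pipelined dividers each stay busy for the full latency.

```python
import math

DIV_LATENCY = 20  # hypothetical divide latency, in cycles

def pipelined_time(n):
    # Fully pipelined divider with an initiation interval of 1 cycle:
    # the last of n divides issues at cycle n-1 and finishes DIV_LATENCY later.
    return (n - 1) + DIV_LATENCY

def replicated_time(n, units):
    # 'units' non-pipelined dividers; each batch occupies a unit for
    # the full latency before the next divide can start on it.
    return math.ceil(n / units) * DIV_LATENCY

n = 100
print(pipelined_time(n))      # -> 119 cycles
print(replicated_time(n, 4))  # -> 500 cycles
```

Under these assumed numbers, one pipelined unit outperforms four replicated ones on throughput, which is the kind of quantitative trend the text argues a designer can extract before committing transistors.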
We have described SLOPE, a system for performance prediction and architecture sensitivity analysis using source-level program analysis and scheduling techniques. SLOPE provides a very fast qualitative analysis of the performance of a given kernel code. We have experimented with a real scientific code that engineers and scientists use in practice. The results yield important qualitative performance-sensitivity information that can be used to allocate computing resources judiciously for maximum efficiency and/or to help guide the application of compiler transformations such as loop unrolling.
2. SPEC - www.spec.org/
3. Snavely, A., Carrington, L., Wolter, N., Labarta, J., Badia, R., Purkayastha, A. “A Framework for Application Performance Modeling and Prediction,” In Proceedings of the 2002 ACM/IEEE Supercomputing Conference (SC’02), 2002.
4. Saavedra, R. H., Smith, A. J. “Measuring Cache and TLB Performance and Their Effect on Benchmark Run Times,” IEEE Transactions on Computers, vol. 44, no. 10, Oct. 1995.
5. Kerbyson, D. J., Hoisie, A., Wasserman, H. J. “Modeling the Performance of Large-Scale Systems,” Keynote paper, UK Performance Engineering Workshop (UKPEW03), July 2003.
6. The ASCI Purple Benchmark Codes - www.llnl.gov/asci/purple/benchmarks/limited/umt/umt1.2.readme.html#Cod...
7. This intermediate representation of the Open64 [10] infrastructure consists of an annotated abstract-syntax-tree representation of the input source code, similar to other high-level representations of source code.
8. Banerjee, U., Eigenmann, R., Nicolau, A., Padua, D. “Automatic Program Parallelization,” Proceedings of the IEEE, 1993.
9. Mowry, T. “Tolerating Latency in Multiprocessors through Compiler-Inserted Prefetching,” ACM Transactions on Computer Systems, 16(1), pp. 55-92, Feb. 1998.
10. The Open64 Compiler and Tools - sourceforge.net/projects/open64/
11. Pipelining allows the functional unit allocated to an operation to become available again in the cycle after the operation is issued.