CTWatch Quarterly » Performance Complexity: An Execution Time Metric to Characterize the Transparency and Complexity of Performance

Performance Complexity: An Execution Time Metric to Characterize the Transparency and Complexity of Performance

Erich Strohmaier, Lawrence Berkeley National Laboratory

3. Apex-MAP

Figure 1. Example results from parameter sweeps of Apex-MAP. Note that the scale for access times changes from 2 to 4 orders of magnitude. L represents spatial locality and α temporal locality.

Apex-MAP is a synthetic benchmark that generates global address streams with parameterized degrees of spatial and temporal locality. Along with other parameters, the user specifies L and α, which control spatial locality and temporal reuse, respectively. Apex-MAP then chooses indices into a data array that are distributed according to α, using a non-uniform random number generator. The indices are most dispersed when α=1 (uniform random distribution) and become increasingly crowded toward the starting address as α approaches 0. Apex-MAP then performs L stride-1 references starting from each index. Apex-MAP has been used to map the performance of several systems with respect to L and α (see Figure 1).
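The access pattern described above can be sketched in a few lines of Python. This is an illustration only, not the Apex-MAP source: the function names are hypothetical, and the index law r ** (1/α) applied to a uniform random r is an assumption chosen because it reproduces the stated behavior (uniform at α=1, crowded toward the start of the array as α approaches 0).

```python
import random

def apex_map_indices(mem_size, L, alpha, count, rng):
    """Draw `count` start indices in [0, mem_size - L].

    Assumption: indices follow span * r ** (1/alpha) with r uniform in [0, 1).
    alpha = 1 leaves r uniform; alpha -> 0 pushes indices toward 0.
    """
    span = mem_size - L
    return [int(span * rng.random() ** (1.0 / alpha)) for _ in range(count)]

def apex_map_sweep(data, L, alpha, count, seed=0):
    """Perform L stride-1 references from each chosen start index."""
    rng = random.Random(seed)
    total = 0.0
    for start in apex_map_indices(len(data), L, alpha, count, rng):
        for k in range(L):          # blocked, unit-stride access of length L
            total += data[start + k]
    return total
```

A real measurement would time the inner loops in a compiled language and report data accesses per second; the sketch only shows how L and α shape the address stream.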

4. Performance Models and Modeling Methodology Used

In my methodology, the selection of the performance models is as important as the selection of the benchmarks. It is widely accepted that there cannot be a single measure of performance that does not relate to specific codes. Likewise, performance transparency must be related to specific codes. Embarrassingly parallel codes will typically exhibit only small performance complexity, while this is not the case for tightly coupled, irregular scientific problems. In addition, the performance complexity programmers experience on a system also depends on the programming languages and coding styles they use.

Ideally, the features of the performance models should reflect characteristics of our programming languages that the user can easily control and use to influence performance behavior. An example would be vector length, which is easily expressed in most languages as a loop count or an array dimension. Unfortunately, many programming languages do not complement system architectures well, as they lack appropriate means of controlling hardware features. This situation is exacerbated because many hardware features are deliberately designed not to be user controllable, which often makes performance optimization a painful exercise in trial and error. Cache usage is a typical example, as programming languages have few means of controlling directly which data should reside in the cache. In developing performance models, we often have to resort to using such non-controllable hardware features.

Apex-MAP is designed to measure the dependency of global data access performance on spatial and temporal locality. This is achieved by a sequence of blocked data accesses of unit stride with a pseudo-random starting address. From this we can expect that any good performance model for it should contain two features: the access length and the access probabilities for the different levels of the memory hierarchy. The former is a loop length and is easily controlled in programs; the latter depends on cache hit rates, which are usually not directly controllable by the programmer. The metric for Apex-MAP performance is [data-access/second] or any derivative thereof.
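To make the two model features concrete, the following is a minimal sketch of a two-level memory model. It is not the model used in this article: all names and the latency/gap parameters are hypothetical, and the hit probability (c/M)^α is an assumption that holds only if start indices follow a power law of the form M · r^(1/α) with uniform r.

```python
def hit_probability(cache_size, mem_size, alpha):
    """Probability that a start index falls in the fast level.

    Assumption: indices are distributed as mem_size * r ** (1/alpha),
    so P(index < cache_size) = (cache_size / mem_size) ** alpha.
    """
    return min(1.0, (cache_size / mem_size) ** alpha)

def time_per_access(L, alpha, mem_size, cache_size,
                    lat_fast, gap_fast, lat_slow, gap_slow):
    """Average time per data access for blocks of L unit-stride references.

    Each block pays one latency for its first element and one gap
    for each of the remaining L - 1 elements (hypothetical cost model).
    """
    p = hit_probability(cache_size, mem_size, alpha)
    block_fast = lat_fast + (L - 1) * gap_fast
    block_slow = lat_slow + (L - 1) * gap_slow
    return (p * block_fast + (1.0 - p) * block_slow) / L

def accesses_per_second(*args):
    """The benchmark metric: data accesses per second."""
    return 1.0 / time_per_access(*args)
```

In this sketch L is the programmer-controlled feature, while the hit probability stands in for the cache behavior that the programmer cannot set directly.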
