Table 2: Timing models and their parameters.

| Model | Description | Time per access | Parameters |
|-------|-------------|-----------------|------------|
| 0 | Flat memory | T = g = constant | g |
| 1 | Two-level memory | T = P(c/M)·g1 + (1 − P(c/M))·g2 | g1, g2 |
| 2 | Latency, bandwidth | T = (l + g·(L − 1))/L | l, g |
| 3 | Latency, bandwidth for two levels | T = P(c/M)·(l1 + g1·(L − 1))/L + (1 − P(c/M))·(l2 + g2·(L − 1))/L | l1, g1, l2, g2 |

Cache-hit probability: P(c/M) = (c/M)^α
I use four models to characterize serial benchmark execution and parallel execution of Apex-MAP. The simplest model represents an ideal system on which all data accesses take the same time (a flat global memory). My second model assumes a two-level memory hierarchy with a constant access time for each level, where M is the global memory utilized by Apex-MAP and c is the amount of memory in the faster level. The probability of finding data in c can be derived from the properties of the temporal-locality parameter α as (c/M)^α. The third model assumes that access time depends linearly on the block length L, with two parameters: latency l and gap g (inverse bandwidth). The fourth model combines the previous two by assuming a linear timing model for each level of a two-level hierarchy. All models and their timing equations are shown in Table 2.
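The four timing equations of Table 2 can be sketched directly as functions; a minimal illustration (all function names are my own, not part of Apex-MAP):

```python
# Sketch of the four timing models from Table 2.
# alpha: temporal-locality parameter; c: fast-level size; M: memory used.

def hit_probability(c, M, alpha):
    """Cache-hit probability P(c/M) = (c/M)**alpha."""
    return (c / M) ** alpha

def model0(g):
    """Model 0, flat memory: every access costs the constant gap g."""
    return g

def model1(c, M, alpha, g1, g2):
    """Model 1, two-level memory with a constant access time per level."""
    p = hit_probability(c, M, alpha)
    return p * g1 + (1 - p) * g2

def model2(L, l, g):
    """Model 2, latency/bandwidth: latency l amortized over a block of length L."""
    return (l + g * (L - 1)) / L

def model3(c, M, alpha, L, l1, g1, l2, g2):
    """Model 3, the latency/bandwidth model applied to each of two levels."""
    p = hit_probability(c, M, alpha)
    return p * model2(L, l1, g1) + (1 - p) * model2(L, l2, g2)
```

Note that model 3 collapses to model 2 when both levels share the same latency and gap, and to model 1 when L = 1.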
Performance models should be calibrated with hardware specifications or low-level micro-benchmarks. Given the simplicity of my models, predictions based on hardware specifications alone can be expected to be off by large margins. I therefore use a back-fitting procedure that minimizes the sum of squared errors (SSE) and thus the prediction error of the models. To check the validity of the models, I inspect residual-error plots and the fitted latency and gap values to rule out fictitious models or implausible parameter values.
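Because model 2 is linear in its parameters l and g, back-fitting to measured times reduces to ordinary least squares. A minimal sketch with synthetic stand-in data (the block lengths and "measured" times are illustrative, not real measurements):

```python
import numpy as np

# Hypothetical measured access times T for several block lengths L.
# Here the data is generated synthetically from l = 120, g = 2.5.
L = np.array([1, 4, 16, 64, 256], dtype=float)
T_measured = (120.0 + 2.5 * (L - 1)) / L

# Model 2 is linear in (l, g): T = l*(1/L) + g*((L-1)/L),
# so ordinary least squares recovers both parameters at once.
A = np.column_stack([1.0 / L, (L - 1) / L])
(l_fit, g_fit), *_ = np.linalg.lstsq(A, T_measured, rcond=None)

# Residuals should show no systematic structure if the model is adequate.
residuals = T_measured - A @ np.array([l_fit, g_fit])
```

With real measurements the residual plot, together with a sanity check that the fitted l and g are physically plausible, is what separates a meaningful calibration from a fictitious one.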
For complex parallel system hierarchies it is not always clear which hierarchy level is the second most important, and hence what the value of c should be. It is therefore advisable to at least use the back-fitting approach to confirm an initial choice of c, or to probe the performance signature of the system to determine which level is most influential on performance.
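One way to probe this is to sweep candidate values of c, back-fit model 1 for each, and keep the candidate with the smallest SSE. A sketch under assumed numbers (the sizes, gaps, and α values below are illustrative; the "measurements" are generated from a known c so the sweep has a recoverable answer):

```python
import numpy as np

M = float(2**24)                        # memory used by the benchmark (assumed)
alphas = np.array([0.05, 0.1, 0.25, 0.5, 1.0])  # temporal-locality settings probed

# Synthetic stand-in for measured times, generated with c = 2**18.
c_true, g1_true, g2_true = float(2**18), 2.0, 40.0
p_true = (c_true / M) ** alphas
T_meas = p_true * g1_true + (1 - p_true) * g2_true

best_sse, best_c = np.inf, None
for c in [2**15, 2**18, 2**21]:         # candidate sizes of the fast level
    p = (c / M) ** alphas
    A = np.column_stack([p, 1 - p])     # T = p*g1 + (1-p)*g2 is linear in (g1, g2)
    coef, *_ = np.linalg.lstsq(A, T_meas, rcond=None)
    sse = float(np.sum((T_meas - A @ coef) ** 2))
    if sse < best_sse:
        best_sse, best_c = sse, float(c)
```

Varying α across runs is what makes the sweep discriminating: at a single α, a wrong c can be compensated by shifting g1 and g2, but the shape of T(α) pins down the ratio c/M.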