CTWatch Quarterly » The Role of Multicore Processors in the Evolution of General-Purpose Computing

John McCalpin, Advanced Micro Devices, Inc.
Chuck Moore, Advanced Micro Devices, Inc.
Phil Hester, Advanced Micro Devices, Inc.

Looking at these three options in turn:

Produce Smaller Chips

Shipping smaller, cheaper chips with modest frequency increases offers only limited improvement in performance and performance/price for customers. In this example, halving the price of the processor reduces the cost of the overall system by 14% ($1,800 vs. $2,100), while the 17% frequency boost improves performance by 0% to 14%, with median and geometric mean improvements of 8% to 9% (Figure 1). Together, these two factors yield performance/price improvements of 17% to 33%, with median and geometric mean performance/price improvements of 27% to 28%.
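The arithmetic behind these ranges can be checked with a short sketch. It assumes that performance/price is simply relative performance divided by relative system price; the function name and that definition are illustrative assumptions, while the dollar figures and percentages are the ones quoted above.

```python
def perf_per_price_gain(perf_gain, new_price, ref_price):
    """Fractional performance/price improvement versus the reference system.

    perf_gain is a fractional speedup (0.14 means 14% faster).
    Assumption: performance/price = relative performance / relative price.
    """
    relative_perf = 1.0 + perf_gain
    relative_price = new_price / ref_price
    return relative_perf / relative_price - 1.0


# System prices quoted in the text: $1,800 versus the $2,100 reference.
NEW_PRICE, REF_PRICE = 1800.0, 2100.0

price_cut = 1.0 - NEW_PRICE / REF_PRICE
worst = perf_per_price_gain(0.00, NEW_PRICE, REF_PRICE)  # no speedup
best = perf_per_price_gain(0.14, NEW_PRICE, REF_PRICE)   # 14% speedup

print(f"system price reduction: {price_cut:.0%}")
print(f"performance/price gain: {worst:.0%} to {best:.0%}")
```

Under this assumption, a benchmark that gains nothing from the frequency boost still improves by about 17% in performance/price, purely from the lower system price.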

Add Lots of On-Chip Cache

Adding lots of cache provides more variable improvement across workloads than most other options. In this example, tripling the L2 cache from 1 MB to 3 MB provides performance improvements of 0% to 127%, with a median improvement of 0% and an improvement in the geometric mean of 11.8%.

The 17% CPU frequency improvement associated with the cache size improvement provides additional benefits: the combined performance improvement ranges from 0% to 156%, with a median improvement of 11.5% and an improvement in the geometric mean of 22.5%.

In this case, the assumed cost of the chips is the same as the reference system, so the performance/price shows the same ratios as the raw performance.

Add CPU Cores

Adding cores improves throughput across a broad range of workloads, at the cost of a modest (17%) reduction in frequency in order to meet power/cooling restrictions. Here we assume that the 50% area reduction allows us to include two CPU cores, each with the same 1 MB L2 cache as the reference chip, at the same cost. When running a single process, performance ranges from 15% slower than the reference platform to no slower at all, with median and geometric mean changes of -10% to -11%.

If we can use the second core to run a second copy of the code, the throughput of the system increases by between 0% and 54%, with median and geometric mean improvements of 29% to 32%.

In this case the assumed cost of the chips is the same as in the reference system, so the performance/price shows the same ratios as the raw performance.

Discussion

The three examples above provided a disturbingly large number of independent performance and performance/price metrics – 70 relative values. Reducing the 14 performance values per benchmark to three (minimum, geometric mean, maximum) still leaves us with nine performance values and 12 performance/price values (of which nine are identical to the performance values). Combining these into a metric that can be used to make rational design decisions is not a trivial task.
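The reduction of per-benchmark results to a few summary statistics can be sketched as follows. The `summarize` helper and the sample ratios are hypothetical and purely illustrative; they are not the measured data behind the figures in this article.

```python
import math
import statistics


def summarize(ratios):
    """Reduce per-benchmark ratios (new / reference) to summary statistics."""
    if not ratios or any(r <= 0 for r in ratios):
        raise ValueError("ratios must be a non-empty list of positive values")
    # Geometric mean computed as the exponential of the mean logarithm.
    geomean = math.exp(sum(math.log(r) for r in ratios) / len(ratios))
    return {
        "min": min(ratios),
        "median": statistics.median(ratios),
        "geomean": geomean,
        "max": max(ratios),
    }


# Hypothetical performance ratios, for illustration only (not measured data).
example = [1.00, 1.00, 1.05, 1.20, 2.00]
summary = summarize(example)

for name, value in summary.items():
    print(f"{name:8s} {value - 1.0:+.1%}")
```

A single large outlier raises the geometric mean while leaving the median almost untouched, which is how the large-cache option can show a median improvement of 0% alongside a clearly positive geometric mean.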
Each of the three options has significant benefits and significant disadvantages:

Design Option	Major Benefits	Major Drawbacks
Small Chip	Price reduction	Weakest best-case improvements
Big Cache	Huge performance boost on a few codes	Weakest median and geometric mean performance/price improvements
Dual-Core	Strongest median and geometric mean throughput improvements	Decreased single-processor performance

It is relatively straightforward to find customers for whom any one of the six Major Benefits or Major Drawbacks is a critical decision factor. It is much more complex to determine how to take that information, generalize it, and use it in support of the company's business model.

Engineering decisions must, of course, support the business model of the company making the investments. Even the seemingly simple goal of "making money" becomes quite fuzzy on detailed inspection. Business models can be designed to optimize near-term revenue, near-term profit, long-term revenue, long-term profit, market "buzz" and/or "goodwill", or they can be designed to attain specific goals of market share percentage or to maximize financial pressure on a competitor. Real business models are a complex combination of these goals, and unfortunately for the "purity" of the engineering optimization process, different business goals can change the relative importance of the various performance and performance/price metrics.

Caveat

In each of these scenarios the performance changes depend on the ratio of memory performance to CPU performance on the baseline system. As the amount of available bandwidth increases, the benefits of larger caches decrease and the benefit of having more CPU cores increases. Conversely, relatively low memory bandwidth makes large caches critical but significantly reduces the throughput improvements obtainable with additional CPU cores.

For the more cache-friendly SPECint_rate2000 benchmark, results on IBM e326 servers running at 2.2 GHz show that doubling the number of cores per chip at the same frequency improves throughput by 65% to 100% (geometric mean improvement = 95%).⁹
