CTWatch
February 2007
The Promise and Perils of the Coming Multicore Revolution and Its Impact
John McCalpin, Advanced Micro Devices, Inc.
Chuck Moore, Advanced Micro Devices, Inc.
Phil Hester, Advanced Micro Devices, Inc.

3. Current and Near-Term Issues
The "Power Problem"

Like performance, power is more multidimensional than one might initially assume. In the context of high-performance microprocessor-based systems, the "power problem" might mean any of:

  • Bulk current delivery to the chip through a large number of very tiny pins/pads. (Note that as voltages go down, currents go up and resistive heating in the pins/pads increases even at the same bulk power level; a short illustrative calculation appears below this list.)
  • Bulk heat removal to prevent die temperature from exceeding threshold values associated with significantly shortened product lifetime.
  • "Hot Spot" power density in small areas of the chip that can cause localized failures. (Note that halving the size of the processor core on the die while increasing frequency to maintain the same bulk power envelope causes a doubling of power density within the core.)
  • Bulk power delivery to the facility housing the servers – including the cost of electricity plus the cost of electrical upgrades.
  • Bulk heat removal from the facility housing the servers – including the cost of electricity plus the cost of site cooling system upgrades.
  • Bulk heat removal from the processor chips that preheats air flowing over other thermally sensitive components (memory, disk drives, etc.).

So the power problem actually refers to at least a half-dozen inter-related, but distinct, technical and economic issues.
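
The current-delivery point above can be made concrete with a small back-of-the-envelope calculation. The chip power, supply voltages, and effective pin/pad resistance below are hypothetical round numbers chosen only for illustration; the point is simply that at fixed chip power the delivered current scales as 1/V, so resistive (I^2*R) heating in the pins and pads scales as 1/V^2.

    # Illustrative only: chip power, supply voltages, and effective pin/pad
    # resistance are hypothetical round numbers, not data for any real part.
    chip_power = 100.0          # watts delivered to the die
    pin_resistance = 0.001      # ohms, effective resistance of the pin/pad network

    for supply_voltage in (1.2, 1.0, 0.8):             # volts
        current = chip_power / supply_voltage          # I = P / V
        pin_heating = current ** 2 * pin_resistance    # I^2 * R loss in the pins/pads
        print(f"V = {supply_voltage:.1f} V: I = {current:5.1f} A, "
              f"pin I^2R loss = {pin_heating:4.1f} W")

With these numbers, dropping the supply from 1.2 V to 0.8 V at the same 100 W raises the pin current by 50% and the resistive heating in the pins by 125%, even though the bulk power delivered to the die is unchanged.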

Throughput vs Power/Core vs Number of Cores

Restricting our attention to the issue of delivering increased performance at a constant power envelope, we can discuss the issues associated with using more and more cores to deliver increased throughput.

Depending on exactly what parameters are held constant, power consumption typically rises as the square or cube of the CPU frequency, while performance increases less than linearly with frequency. For workloads that can take advantage of multiple threads, multiple cores provide significantly better throughput per watt. As we saw earlier however, this increased throughput comes at the cost of reduced single-thread performance.
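
As a rough sketch of this tradeoff, assume that per-core power scales as the cube of frequency and that per-core throughput scales linearly with frequency (deliberate simplifications of the "square or cube" and "less than linear" statements above). Holding the chip-level power envelope fixed then gives:

    # Toy model, not a characterization of any real processor: per-core power is
    # assumed to scale as f^3 and per-core throughput as f (both simplifications).
    power_budget = 1.0                                # fixed chip-level power envelope

    for n_cores in (1, 2, 4, 8):
        per_core_power = power_budget / n_cores       # split the envelope evenly
        freq = per_core_power ** (1.0 / 3.0)          # invert power ~ f^3
        single_thread = freq                          # throughput ~ f
        aggregate = n_cores * single_thread           # multi-threaded throughput
        print(f"{n_cores} core(s) at f = {freq:.2f}: aggregate = {aggregate:.2f}, "
              f"single-thread = {single_thread:.2f}")

Under these assumptions, eight half-speed cores deliver four times the aggregate throughput of one full-speed core in the same power envelope, at half the single-thread performance.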

In addition to waiting for lithography to shrink so that we can fit more of our current cores on a chip, the opportunity also exists to create CPU cores that are intrinsically smaller and more energy efficient. This has not been common in the recent past (with the exception of the Sun T1 processor chips) because of the assumption that single-thread performance was too important to sacrifice. We will come back to that issue in the section on Longer-Term Extrapolations, but a direct application of the performance model above shows that as long as the power consumption of the CPU cores drops faster than their peak throughput, the optimum throughput will always be obtained by an infinite number of infinitesimally fast cores. Obviously, such a system would suffer from infinitesimal single-thread performance, which points to a shortcoming of optimizing to a single figure of merit. In response to this conundrum, one line of reasoning would define a minimum acceptable single-thread performance and then optimize the chip to include as many such cores as possible, subject to area and bulk power constraints.
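
The design rule at the end of the preceding paragraph can be sketched as a simple packing problem. The power model (per-core power proportional to f^3), the single-thread floor, and the power and area budgets below are all hypothetical; the sketch only shows that, once the floor is met, the slowest acceptable core maximizes aggregate throughput within the constraints.

    # Hypothetical budgets and power model (per-core power ~ f^3), for illustration only.
    MIN_SINGLE_THREAD = 0.6   # minimum acceptable single-thread performance (f >= 0.6)
    POWER_BUDGET = 4.0        # chip power envelope, in units where one f = 1.0 core costs 1.0
    AREA_BUDGET = 16          # maximum number of cores that fit on the die

    for freq in (0.5, 0.6, 0.8, 1.0):             # candidate core design points
        if freq < MIN_SINGLE_THREAD:
            continue                              # rejected: below the single-thread floor
        per_core_power = freq ** 3                # power model: per-core power ~ f^3
        n_cores = min(AREA_BUDGET, int(POWER_BUDGET / per_core_power))
        print(f"f = {freq:.1f}: {n_cores:2d} cores, aggregate throughput = {n_cores * freq:.1f}")

With these numbers the f = 0.6 design is limited by die area rather than by power, and it still delivers more than twice the aggregate throughput of the f = 1.0 design.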

There are other reasons to keep the number of cores modest. One factor, which is missing in the simple throughput models used here, is communication and synchronization. If one desires to use more than one core in a single parallel/threaded application, some type of communication/synchronization will be required, and it is essentially guaranteed that, for a fixed workload, the overhead of that communication/synchronization will be a monotonically increasing function of the number of CPU cores employed in the job.

This leads to a simple modification of Amdahl's Law:

T = Ts + Tp/N + N*To

Here T is the total time to solution, Ts is the time required to do serial (non-overlapped) work, Tp is the time required to do all of the parallelizable work, N is the number of processors applied to the parallel work, and To is the overhead per processor associated with the communication and synchronization required by the implementation of the application. This last term, representing the growth in overhead as more processors are added, is the addition to the traditional Amdahl's Law formulation.

In the standard model (no overhead), the total time to solution is a monotonically decreasing function of N, asymptotically approaching Ts. In the modified formulation, it is easy to see that as N increases there will come a point at which the total time to solution begins to increase due to the communication overheads. In the simple example above, a completely parallel application (Ts = 0) will have an optimal number of processors, found by setting the derivative of T with respect to N to zero:

Noptimal = sqrt(Tp/To)

So, for example, if To is 1% of Tp, then maximum performance will be obtained using 10 processors. This may or may not be an interesting design point, depending on the relative importance of other performance and performance/price metrics.
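
The modified model is easy to evaluate numerically. The short sketch below uses Ts = 0 and To equal to 1% of Tp, matching the worked example above (the specific values of Tp and To are arbitrary; only their ratio matters), and confirms that the minimum time to solution occurs at N = sqrt(Tp/To) = 10 processors.

    import math

    # Modified Amdahl's Law from the text: T(N) = Ts + Tp/N + N*To
    def time_to_solution(n, t_s, t_p, t_o):
        return t_s + t_p / n + n * t_o

    t_s = 0.0      # completely parallel application (Ts = 0)
    t_p = 100.0    # parallelizable work, in arbitrary time units
    t_o = 1.0      # per-processor overhead, To = 1% of Tp

    best_n = min(range(1, 101), key=lambda n: time_to_solution(n, t_s, t_p, t_o))
    print(f"numerical optimum: N = {best_n}")                      # prints N = 10
    print(f"analytic optimum:  N = {math.sqrt(t_p / t_o):.0f}")    # sqrt(Tp/To) = 10

Note that the time-to-solution curve is quite flat near the optimum in this example (T(9) and T(11) are within about 0.6% of T(10)), so the penalty for running somewhat off the optimal processor count is small.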

Reference this article
McCalpin, J., Moore, C., Hester, P. "The Role of Multicore Processors in the Evolution of General-Purpose Computing," CTWatch Quarterly, Volume 3, Number 1, February 2007. http://www.ctwatch.org/quarterly/articles/2007/02/the-role-of-multicore-processors-in-the-evolution-of-general-purpose-computing/
