CTWatch Quarterly » Low Power Computing for Fleas, Mice, and Mammoth

Low Power Computing for Fleas, Mice, and Mammoth — Do They Speak the Same Language?

GUEST EDITOR
Satoshi Matsuoka, Tokyo Institute of Technology

So, Where Do We Obtain the Power Savings?

With mainstream information technology, such as standard office application suites, speed requirements may have “matured.” But the majority of application areas, in particular ones mentioned in this article, are still in need of significant (even exponential) improvements in both absolute performance and relative performance/power metrics over the next ten years, as we progress towards building a “true” cyberinfrastructure for science and engineering. Such is quite obviously the case for traditional HPC applications, where even a petaflop machine may not satisfy the needs of the most demanding applications. It is also evident in application areas that are taking a leap to next generation algorithms in order to increase scale, accuracy, etc. An example is large scale text processing/data mining where the proliferation of the web and the associated explosion of data call for more sophisticated search and mining algorithms to deal with "data deluge.” Another example is the push to develop humanoid robots, where one is said to require more than five to six orders of magnitude processing power while retaining the human form factor.

The question is, can we achieve these goals? If so, will the techniques/technologies employed in respective domains, as well as their respective requirements, be different? If there are such differences, will this cause one power range to be more likely than the others? Or are there some uncharted territories of disruptive technologies with even more possibilities?

Major power saving techniques, in particular those being exploited by more traditional embedded systems, plus the recent breed of low power HPC systems could be categorized as follows:

Fundamental decrease in transistor power consumption and wire capacitance reduction — Traditionally, one would save power “for free” with lower micron sizes, where the transistors become smaller and the wires become thinner. However, it is well known that this is becoming harder to exploit because of longer circuit delays, higher static leakage current, and other physical device characteristics that come into play. As an example, Intel’s move to the .09 micron with the new version of their Pentium 4 processor (Prescott) resulted in higher power consumption than its previous generation (Northwood). Granted, there were substantial architectural changes. But the original idea seemed to have been that the move to .09 micron would more than compensate for the added power consumption due to increased logic complexity and higher transistor count. However, this proved not to be the case.
Voltage reduction (DVS: Dynamic Voltage Scaling) — Closely related to the previous strategy is the idea of reducing voltage with each reduction in processor size. However, this too is reaching its limits, as the state machines (i.e. flip-flops constituting the various state elements and memory cells in the architecture) cannot get significantly below one Volt or so due to physical device limitations. Since DVS is one of the fundamental techniques that low power systems most frequently employ, especially for HPC applications, this is not good news. But there is still hope, as we will see later in the article.
Duty cycling / Power — Another classical methodology is turning off the power when the device is not being used. Many of the ULP devices rely on this technique because they have duty cycles in seconds or even minutes and are effectively turned off most of the remaining time. Dynamic Voltage scaling as well as other techniques are employed extensively along with duty cycling to reduce the idle power as much as possible.
Architectural overhead elimination —There are numerous features in modern-day processors and other peripherals that attempt to obtain relatively small increases in performance at significant hardware and thus power cost. By simplifying the architecture, as is done for embedded processors, one may obtain substantial gains in performance/power ratio while incurring only a small penalty.
Exploiting Parallelism (Slow and Parallel) — Because increases in processor frequency will also incur voltage increases, if we can attain perfect parallel speedup, then reducing the clockspeed in exchange for parallelism (slow and parallel) will generate greater power savings. This is the principle now being employed in various recent multi-core CPU designs; the technical details are covered in the BlueGene/L article (“Lilliputians of Supercomputing Have Arrived”) in this issue of CTWatch Quarterly.
Algorithmic changes — On the software side, one may save power by fundamentally changing the algorithm to consume less computing steps and/or reducing reliance on power-hungry features of the processors and instead using more efficient portions. While the former is obvious and always exploitable, the latter may not be so obvious and not always exploitable, depending on the underlying hardware. For example, in the latter one may attempt to utilize the on-die temporary memory to reduce the off-chip bus traffic as much as possible. But its effectiveness depends on whether the processor’s external bus driver power, relative to the power consumption of the internal processing, would be significant or not.
Other new techniques — there are other technologies in development, which we will not be able to cover here due to lack of space.

Pages: 1 2 3 4 5 6 7

CTWatch is a collaborative effort				Sponsored By