CTWatch Quarterly » Low Power Computing for Fleas, Mice, and Mammoth

Low Power Computing for Fleas, Mice, and Mammoth — Do They Speak the Same Language?

GUEST EDITOR
Satoshi Matsuoka, Tokyo Institute of Technology

The Future of Low Power HPC is “Overdesign” and “Portability”

We have examined the relationships between the various areas of low power computing, focusing especially on the similarities and differences. Overall, the attention given to low power in HPC is still not well recognized by the community, despite the success of BlueGene/L. In particular, controlling power requires a sophisticated application of self-system control. This type of control is being practiced as a norm in other disciplines but is quite crude in computers, especially large HPC machines.

For example, modern fighter aircraft are deliberately made to be somewhat aerodynamically unstable in order to improve their maneuverability; in order to recover and maintain operational stability, they use massive, dynamic, computer-assisted real-time control. Modern automobiles embed massive amounts of self-control for engines and handling without which the car would easily break down or at least suffer from poorer performance. Compared to these technology domains, power/performance controls in modern-day HPC machines are meager at best. They may contain some simple feedback loops that, for example, upturn the cooling fans when the internal chassis temperature climbs higher, or that apply some crude, spontaneous automated control of voltage/frequency without concern for application characteristics. There are other promising avenues of research, as the other articles in this issue show, but further investigation is required to identify the limits of such control methodologies, as well as discover better ways to conserve power.

One promising conceptual design principle that this author envisions is to “overdesign” the system, i.e. engineer it so that, without software self-control, the the system will break down (say thermally), hit other power limits, or become very power/performance inefficient. Most of the machines we design now follow quite conservative engineering disciplines so that no matter how much we hammer them they will not break. Altenatively, we may design for maximum efficiency out of the theoretical peak achievable. Now that we are quickly approaching the one billion transistor mark in our CPUs (and quickly going onto ten billion) there are many transistors to consume power if exploited directly or used for alternative purposes. Moreover, we will have better understanding of how we may monitor and control power, depending on the system/application states (including multiple applications within the system). With multiple failovers in place, we could “overdesign” the system so that it will operate at maximum performance/power ratio (which may be somewhat below the maximum computational efficiency), but driving the efficiency above this will “break” the system. In order to achieve such a subtle balance, there will be various hardware and software sensors to monitor performance/power metrics and perform regulatory feedback into the system, enabling dynamic fine tuning of both software (such as scheduling) and hardware (such as DVS).

Such a design principle may allow substantial improvement in the various metrics that motivate the pursuit of low power in HPC in the first place. For example, one may put an extensive set of thermal sensors in a machine that is densely packed to intricately control the power/performance so as to maintain thermal consistency throughout the system. In such a machine, it would be impossible to achieve theoretical maximum performance, since doing so would break the system and, as a result, some failover mechanism would have to kick in to throttle the system. Overall, its performance per volume may be substantially greater than a conservative machine for various reasons, including that the machine would be running more units in parallel at the best performance/energy tradeoff point.

Many technical challenges would have to be conquered for such a system to become a reality, however. For example, most current motherboards, including sever-grade, high-end versions, lack the sensors required to perform such intricate monitoring of thermal and power consumptions. In many cases, the only available sensors may be a few thermistors, with no power sensors present except voltage meters on power lines. Although the state of the art in analysis of performance/power tradeoffs are advancing (as seen in Dr. Feng’s article mentioned previously), most of the results are still early, with no real broad-based community efforts, such as standardization, to enable, facilitate, or promote usage of the technology. In fact, because of the significant effect such low power systems will have on the software infrastructure, including the compilers, run-time systems, libraries, performance monitors, etc., it is currently impractical to expect any portability across different types of machines. Here, theoretical modeling of such machines, leading to eventual standardization, will be necessary for realistic deployment to occur.

References

¹ The Energy Star Home Page, www.energystar.gov/
² Computers Division "Design of Eco Products SX-6," NEC Technical Journal, Vol. 57, No.1, 2004 www.nec.co.jp/techrep/ja/journal/g04/n01/t040105.pdf (in Japanese).
³ IBM Journal of Research and Development, special double issue on Blue Gene, Vol.49, No.2/3, March/May, 2005
⁴ W. Feng, "Making a Case for Efficient Supercomputing," ACM Queue, 1(7):54-64, October 2003.
⁵ Hiroshi Nakashima, Hiroshi Nakamura, Mitsuhisa Sato, Taisuke Boku, Satoshi Matsuoka, et. al. (2 more authors) "MegaProto: 1 TFlops/10 kW Rack Is Feasible Even with Only Commodity Technology," Proc. IEEE/ACM Supercomputing 2005, the IEEE Computer Society Press, Nov. 2005 (to appear).

Pages: 1 2 3 4 5 6 7

CTWatch is a collaborative effort				Sponsored By