CTWatch
August 2005
The Coming Era of Low Power, High-Performance Computing — Trends, Promises, and Challenges
GUEST EDITOR
Satoshi Matsuoka, Tokyo Institute of Technology

In the HPC arena, by contrast, the important point to note is that low power is now being considered an essential means of achieving the traditional goal of high performance. This may at first seem oxymoronic, since lower power usually means lower performance in embedded and ULP devices, where great efforts are made to “recover” as much of the lost performance as possible. However, BlueGene/L and other HPC machines that utilize low power technologies have demonstrated that, by exploiting “slow and parallel” characteristics, we may actually achieve higher performance.

Still, the properties of HPC machines create a different opportunity space for low power than the one embedded or ULP devices confront:

  • HPC machines are typically powered by AC, so the motivation is not only lower energy utilization, which is what extends battery life in mobile devices, but also lower peak power requirements, as peak power largely determines the necessary capacity of the electrical infrastructure as well as the maximum cooling capacity.
  • HPC machines are more general purpose, and as such their application space is rather broad. Such applications usually demand continuous computing, are I/O intensive, or both. They are also not necessarily real time, but are usually optimized to minimize their execution time. This restricts the use of duty cycling, as any idle compute time is itself a target for elimination by optimization.
  • The generality of the application space makes dedicated, hardwired hardware acceleration effective for only a limited set of applications. There are some instances of successful HPC accelerators, such as the GRAPE system, but their effectiveness is restricted to a handful of (albeit important) applications.
  • Density is one of the driving factors for achieving low power, since some of the largest machines are at the limit of practical deployability with respect to their physical size. Simply reducing their volume, however, results in significant thermal complications, primarily critical “hot spots.” Thus, power control that guarantees such hot spots do not occur is an absolute must for stable operation of the entire system.
  • Many HPC applications are tightly coupled and make extensive use of networking capabilities. Network bytes/flop is therefore an important metric, and the difficulty lies in meeting low power requirements while providing high-bandwidth networking; a back-of-envelope illustration of the metric follows this list.
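
To make the bytes/flop metric concrete, here is a minimal sketch in Python; the node figures used (peak floating-point rate and injection bandwidth) are hypothetical assumptions for illustration, not measurements of any particular machine.

    # Hypothetical node: both figures below are assumptions for illustration only.
    peak_flops_per_node = 5.0e9      # 5 Gflops peak floating-point rate (assumed)
    injection_bandwidth = 1.0e9      # 1 GB/s network injection bandwidth (assumed)

    bytes_per_flop = injection_bandwidth / peak_flops_per_node
    print(f"network bytes/flop = {bytes_per_flop:.2f}")   # 0.20

An application whose communication demand per flop exceeds this ratio will be network-bound, so raising the ratio without blowing the power budget is exactly the difficulty noted above.
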
Are Low Power HPC Systems Too Divergent from Traditional, Embedded Low Power Systems?

Given the observations above, could we go so far as to say that low power HPC systems are so divergent from traditional embedded systems that there are no research results or engineering techniques they can share? As a matter of fact, there are commonalities that permit such sharing, and we are starting to see some “convergence” between the low power realization techniques in HPC and those in other power ranges. Here are some examples:

  • Although it is difficult to duty cycle HPC applications, there are still opportunities to fine tune resource usage and exploit the potentially “idle” occasions in the overall processing. One example of this approach, dubbed power aware computing, is to adjust the processor’s DVS (dynamic voltage scaling) settings in a fine-grained fashion so as to achieve minimum energy consumption for a particular phase of a computation. Another possibility is to exploit the load imbalance in irregular parallel applications, where the less loaded processors may be slowed down so that all processors reach the synchronization point at the same time; a minimal sketch of this idea appears after this list. Details of these techniques are covered in Dr. Feng’s article, “The Importance of Being Low Power in High-Performance Computing,” in this CTWatch Quarterly issue.
  • There are direct uses of low power embedded processors, augmented with HPC features such as vector processing and low power, high performance networking. Examples are BlueGene/L, Green Destiny,4 and MegaProto.5 Fundamental power savings are realized with lower voltage, smaller transistor counts, intricate DVS features, etc. In fact, BlueGene/L has demonstrated that the use of low power processors is one of the most promising methodologies. There are still issues, however, since the power/performance ratio of embedded processors applied to HPC is not overwhelmingly advantageous, especially compared with the power efficient processors arriving in 2006-2007, in which similar implementation techniques are being used. Moreover, although one Petaflop would be quite feasible with today’s technologies, reaching the next plateau of performance, i.e., ten Petaflops and beyond, will require a ten-fold increase in power/performance efficiency. In light of the limits on voltage reduction and other constraints, the question of where to harvest such efficiency is a significant research issue.
  • Dedicated vector co-processing accelerators have always been used in some MPPs; in the form of GPUs, they are already in use in PCs and will be employed even more aggressively in next generation gaming machines such as Microsoft’s Xbox 360 and Sony’s PlayStation 3. Such co-processing accelerators offer much more general purpose programming opportunities than previous generations of GPUs, helping to considerably boost the Flops/power ratio. For example, the Xenos GPU in the Xbox 360 has 48 parallel units of 4-way SIMD vector processors plus scalar processors, achieving 216 Gflops at several tens of watts, or about four to seven Gflops/Watt (a quick check of this figure follows this list). Also, some embedded devices are starting to employ reconfigurable FPGA devices to dynamically configure hardware for each application. One example is Sony’s new flash-based “Network Walkman” NW-E507, where the MP3 decode circuitry is programmed on the fly in its internal FPGA to achieve 50 hours of playback in a device weighing only 47 grams. The use of reconfigurable devices and modern-day, massively parallel vector co-processors has not yet reached the stage of widespread use within the HPC arena, due to cost and technical immaturity, but it is a promising approach for the future.
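
As a quick sanity check of the Xenos figure cited in the last bullet: the only number taken from the text is the 216 Gflops peak; the 30-50 W range is an assumption standing in for “several tens of watts.”

    # Back-of-envelope check of the quoted Gflops/Watt range.
    peak_gflops = 216.0                  # from the text
    for watts in (30.0, 50.0):           # assumed range for "several tens of watts"
        print(f"{peak_gflops / watts:.1f} Gflops/Watt at {watts:.0f} W")
    # -> 7.2 Gflops/Watt at 30 W and 4.3 Gflops/Watt at 50 W,
    #    consistent with the "four to seven Gflops/Watt" estimate above.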
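
The load-imbalance idea from the first bullet above can likewise be sketched in a few lines of Python. This is only an illustration under stated assumptions: the DVS operating points, the cubic power model (dynamic power roughly proportional to V^2 * f, with V scaled along with f), and the per-processor work figures are all made up for the example; no real scheduler or runtime API is implied.

    # Sketch: pick per-processor DVS frequencies so that lightly loaded processors
    # slow down and every processor reaches the barrier at about the same time.

    FREQS = [0.6, 0.8, 1.0, 1.2]      # assumed DVS operating points (GHz)
    F_MAX = max(FREQS)

    def pick_frequencies(work):
        """work[i] = cycles (arbitrary units) processor i must execute this phase."""
        # The phase length is set by the busiest processor running flat out.
        phase_time = max(work) / F_MAX
        # Each processor gets the lowest frequency that still meets that deadline.
        return [min(f for f in FREQS if w / f <= phase_time) for w in work]

    def relative_energy(work, freqs):
        """Dynamic energy relative to running every processor at F_MAX.
        Assumes power ~ f**3 (voltage scaled with frequency); energy = power * time."""
        e_dvs = sum(f**3 * (w / f) for w, f in zip(work, freqs))
        e_max = sum(F_MAX**3 * (w / F_MAX) for w in work)
        return e_dvs / e_max

    work = [120, 60, 90, 40]                      # imbalanced per-processor load
    freqs = pick_frequencies(work)
    print(freqs)                                  # [1.2, 0.6, 1.0, 0.6]
    print(f"energy vs. all-at-max: {relative_energy(work, freqs):.2f}")   # ~0.67

Because the slowed-down processors would otherwise have idled at the barrier, the phase takes no longer, yet in this toy example the dynamic energy drops to roughly two thirds of the all-at-maximum case.
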

Reference this article
Matsuoka, S. "Low Power Computing for Fleas, Mice, and Mammoth — Do They Speak the Same Language?" CTWatch Quarterly, Volume 1, Number 3, August 2005. http://www.ctwatch.org/quarterly/articles/2005/08/low-power-computing-for-fleas-mice-and-mammoth/
