High Performance Computing and the Implications of Multi-core Architectures
Dave Turek, IBM
CTWatch Quarterly
February 2007

Over the past several years, public sector institutions (universities and government) and commercial enterprises have deployed supercomputing systems at an unparalleled rate, courtesy of favorable acquisition economics and compelling technological and business process innovation.

Manufacturers and producers of supercomputers have leveraged the price performance improvements of commodity microprocessors, the innovation in interconnect technologies, and the rise of Linux and open source software to reach classes of consumers unheard of as recently as five years ago.

In June of 1997, the so-called "fastest computer in the world," the ASCI Red machine at Sandia National Laboratories, was the first system to exceed a teraflop of compute power.1 Today, the same amount of compute power can be acquired for around $200,000, making supercomputing affordable to small companies, single academic departments and, in some cases, even individual researchers.

While these dramatic improvements in system affordability have been taking place, the ability to extract value from the system remains principally a consequence of well-written and effective software. Here, the story for the industry is not quite as sanguine. From our customers, we hear that these systems are still difficult to exploit fully, and that the applications they rely on must either be ported, or even rewritten, to take proper advantage of all the hardware innovation in modern supercomputers.

This view is universal and strikes at the heart of the economic or scientific competitiveness of the institution: "Today's computational science ecosystem is unbalanced, with a software base that is inadequate to keep pace with and support evolving hardware and application needs … The result is greatly diminished productivity for both researchers and computing systems."2

As we contemplate each new hardware innovation for supercomputing, we must understand at the most fundamental level that software is the key to unlocking the value of the system for the benefit of the enterprise or the researcher.

Moore's Law and Multicore CPUs in Supercomputing

The dramatic increase in the deployment and utilization of supercomputing owes much to microprocessor innovation and the attendant improvement in price performance typically associated with improved transistor density. But the "simple" idea of Moore's Law, characterized as the doubling of transistor density per unit time, has its limits. Just as one cannot fold a piece of paper in half and in half again without reaching a fundamental limit within just a few folds, neither is it reasonable to think that increased transistor density can be pursued indefinitely without running into material physical limits over time.
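
As a purely illustrative back-of-the-envelope sketch (the two-year doubling period and the unit baseline are assumptions for illustration, not figures from this article), the compounding behind a fixed doubling period can be made concrete in a few lines of Python:

# Illustrative only: compound growth under an assumed fixed doubling period.
def density_after(years, doubling_period_years=2.0, baseline=1.0):
    """Relative transistor density after `years`, doubling every `doubling_period_years`."""
    return baseline * 2 ** (years / doubling_period_years)

for years in (2, 10, 20):
    print(f"{years:2d} years -> {density_after(years):,.0f}x the baseline density")
# Ten years of doubling every two years is already 32x; twenty years is 1,024x,
# which is why purely geometric scaling eventually collides with physical limits.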

One reaction to this reality has been the emergence of multi-core CPUs, each core running at a somewhat modest (but still significant) frequency, but orchestrated to work in concert on the computational problem at hand. This approach is meant to finesse the design limitations of forever-faster, single-core CPUs while still attending to the insatiable need for compute power in all market segments. And the supercomputing market segment, having more need for speed than any other, is aggressive in its pursuit and acceptance of this innovation.
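
The following toy sketch illustrates the basic idea of many modest cores working in concert on a single problem; it is a generic Python illustration and does not correspond to Blue Gene, Cell, or any real HPC programming model (production codes typically use MPI, OpenMP, or similar):

# Toy example: split one computation across several cores and combine the results.
from multiprocessing import Pool

def partial_sum(bounds):
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))

if __name__ == "__main__":
    n, cores = 10_000_000, 4                      # arbitrary problem size and core count
    step = n // cores
    chunks = [(i * step, (i + 1) * step) for i in range(cores)]
    with Pool(cores) as pool:                     # each chunk is handled by its own worker process
        total = sum(pool.map(partial_sum, chunks))
    print(total)                                  # same answer as a single-core loop, computed in parallel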

The implications of multi-core designs for supercomputing are profound in terms of overall benefit. The fastest supercomputer in the world today, the Blue Gene/L system from IBM installed at Lawrence Livermore National Laboratory,1 is an innovative multi-core system-on-a-chip design (with each core running at less than a GHz) that is roughly three times faster than the next fastest machine on the TOP500 list and roughly eight times faster than the NEC Earth Simulator (which was the last non-IBM system to lead the TOP500 list).

It is instructive to note that, measured from an electrical power consumption perspective, the Blue Gene system is rated on one measure at 112 MFlops/watt and the Earth Simulator at 3 MFlops/watt.3 Even allowing for some ambiguity in measurement technique, this is a stunning difference in the power consumption characteristics of contemporary supercomputer designs and a benefit related directly to the multi-core design attributes of the Blue Gene system. Given that the fastest systems in the world today routinely draw a megawatt or more of electrical power, this kind of efficiency translates into very significant operational cost savings.
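
A rough sketch of the arithmetic behind these claims (the $0.10 per kWh electricity price is an assumed figure for illustration, not taken from the article):

# Back-of-the-envelope efficiency and energy-cost arithmetic; price per kWh is assumed.
bluegene_mflops_per_watt = 112
earth_sim_mflops_per_watt = 3
print(f"Efficiency ratio: ~{bluegene_mflops_per_watt / earth_sim_mflops_per_watt:.0f}x")

power_watts = 1_000_000                  # a system drawing one megawatt, around the clock
hours_per_year = 24 * 365
energy_kwh = power_watts / 1000 * hours_per_year
cost_per_kwh = 0.10                      # assumed electricity price
print(f"Annual energy: {energy_kwh:,.0f} kWh -> roughly ${energy_kwh * cost_per_kwh:,.0f} per year")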

Perhaps one of the most ambitious multi-core designs currently available is the heterogeneous, multi-core Cell Broadband Engine (Cell BE), designed by a partnership of Sony, Toshiba and IBM and found in today's Sony PlayStation 3. This chip is currently one of the key technical components of the Roadrunner supercomputer system that IBM is building in collaboration with the Department of Energy's Los Alamos National Laboratory. This system, when complete, will be capable of sustained performance of one petaflop.

The architecture of the Roadrunner system is an example of what we expect to be a plethora of hybrid supercomputer designs that amalgamate a variety of technologies to achieve maximum performance for given applications.

Software and Multicore Supercomputers

With the exception of government laboratories, universities, and a handful of aggressive and capable industrial companies, most software applications and tools are supplied by independent software vendors (ISVs). However, more than a third of these companies are very small (revenue less than $5M) and have limited resources to apply to the latest hardware innovations.4

In some cases, the skills to develop new algorithms that map to new hardware are in short supply, and an ISV will be more focused on serving its current installed base of customers than on reaching out to new technologies. In other cases, the new hardware may be viewed as "unproven," so the ISV will wait until there is greater market acceptance before porting its application codes to the new platform. This is not a new phenomenon: "many applications for supercomputing only scale to 32 processors and some of the most important only scale to four."5
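
Amdahl's law, offered here as standard background rather than anything drawn from the cited study, makes the cost of limited scaling concrete; the parallel fractions below are illustrative assumptions:

# Amdahl's law: speedup on N processors when a fraction p of the work parallelizes.
def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

for p in (0.90, 0.99):
    for n in (4, 32, 1024):
        print(f"p={p:.2f}, N={n:5d}: speedup {amdahl_speedup(p, n):6.1f}x")
# Even a code that is 99% parallel tops out near 91x on 1,024 processors,
# so an application that scales only to 32 processors gains little from thousands more.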

The current Blue Gene system at Lawrence Livermore National Laboratory has 131,000 processors, while the last machine on the TOP500 list, #500, has 800 processors. Furthermore, it is not uncommon for ISVs to charge license fees in proportion to the number of processors in a system (though, for the most part, they do not charge in proportion to the number of transistors on a chip, even though more transistors and more cores serve the same objective: more speed), giving some supercomputer customers serious sticker shock when they get their software bill.

Ultimately, the marketplace should work to resolve these issues, but they will remain serious for some time to come unless we see an entrepreneurial move to disturb the status quo with innovative algorithms, software and business practices that map to the capabilities of state-of-the-art supercomputers. In the meantime, progress will continue to be made through collaborations such as the Blue Gene Consortium, the IBM-Los Alamos Partnership on Roadrunner, and the Terra Soft HPC Collaboration around the Cell BE.

The Benefits are Worth the Effort

While the evolution of software to better exploit multi-core architectures will unfold over time, customers are already reaping substantial benefits from multi-core systems today. In areas as diverse as digital media, financial services, information-based medicine, oil and gas production, nanotechnology, automotive design, life sciences, materials science, astronomy and mathematics, supercomputers have been deployed to amazing effect, with material impact on the daily lives of everyone on the planet.

The ambition to reach a petaflop of computing is nearly universal, with significant efforts under way in the US, Europe and Asia toward major supercomputer acquisitions over the next few years. By itself, this ambition should go a long way toward providing the stimulus to close the hardware-software gap we witness today.

1 Top500 – http://www.top500.org/
2 "Computational Science: Ensuring America's Competitiveness," President's Information Technology Advisory Committee, June 2005.
3 The Green500 List – http://www.green500.org/
4 Joseph, E. "The Council on Competitiveness Study of ISVs Serving the High Performance Computing Market," IDC Whitepaper, http://www.compete.org/hpc
5 Joseph, E., ibid.
