CTWatch
February 2005
Trends in High Performance Computing
Susan L. Graham, University of California at Berkeley
Marc Snir, University of Illinois at Urbana-Champaign

Supercomputing Thrives -- Supercomputing Falters
The Success of the Killer Micros

There are strong signs that supercomputing is a healthy business overall, and a healthy business in the US. Supercomputers at academic centers and government laboratories are used to do important research; supercomputers are used effectively in support of essential security missions; and good progress is being made on stockpile stewardship using supercomputing simulations. The large majority of supercomputers are US-made: according to IDC, US vendors had a 98% market share in capability systems in 2003, and 91% of the TOP500 systems, as of June 2004, were US-made.

On the other hand, companies that primarily develop supercomputing technologies, such as Cray, have a hard time staying in business. Supercomputers are a diminishing fraction of the total computer market, with a total value of less than $1 billion a year. It is an unstable market, with year-to-year swings in sales of more than 20 percent, and one that is almost entirely dependent on government acquisitions.

The current state of supercomputing is largely a consequence of the success of commodity-based supercomputing. Most of the systems on the TOP500 list are now clusters, i.e., systems assembled from commercial off-the-shelf (COTS) processors and switches; more than 95 percent of the systems use commodity microprocessor nodes. By contrast, on the first TOP500 list, of June 1993, only a quarter of the systems used commodity scalar microprocessors, and none used COTS switches.

Cluster supercomputers have ridden on the coattails of Moore’s Law, benefiting from the huge investments in commodity processors and the rapid increase in processor performance. Indeed, the top-performing commodity-based system on the June 1994 TOP500 list had 3,689 nodes; its counterpart on the June 2004 list had 4,096 nodes. While the number of nodes increased by only 11 percent, system performance, as measured by the Linpack benchmark, improved by a factor of 139 in ten years! Cluster technology offers, for many applications, supercomputing performance at the cost/performance of a PC; as a result, high-performance computing is within reach of an increasing number of users. Indeed, the verdict of the market is that clusters offer better value for money in many sectors where custom vector systems were previously used.
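A quick back-of-the-envelope check, using only the figures quoted above and assuming comparable parallel efficiency on the two systems, makes the contrast explicit: nearly all of the 139-fold gain came from faster nodes rather than from more of them.

\[
\frac{4096}{3689} \approx 1.11 \quad \text{(about 11\% more nodes)},
\qquad
\frac{139}{1.11} \approx 125 \quad \text{(implied per-node Linpack improvement over the decade)}
\]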

Victory Is Not Complete

Yet clusters cannot satisfy all supercomputing needs. For some problems, acceptable time to solution can be achieved only by scaling to a very large number of commodity nodes, and communication overheads then become a bottleneck. A hybrid supercomputer, in which commodity processors are connected via a custom network interface (attached to the memory bus rather than to an I/O bus), can support higher per-node bandwidth with lower overheads, thus enabling efficient use of a larger number of nodes. (The Cray XT3 and the SGI Altix are examples of such systems.) A custom supercomputer, built of custom processors, can provide higher per-node performance and thus reduce the need to scale to a large number of nodes, at the expense of using more intra-node parallelism. (The Cray X1 and the NEC SX-6 are the two current examples of such systems.) Custom processors are especially important for codes that exhibit low locality and thus do not take advantage of caches. In such cases, it is important that the intra-node parallelism of the processor support a large number of concurrent memory accesses, as vector or heavily multithreaded processors do.

The success of clusters has reduced the market for hybrid and custom supercomputers to the point where the viability of these systems is heavily dependent on government support. Government investment in the development and acquisition of such platforms has shrunk. Computer suppliers are reluctant to invest in custom supercomputing because of the small size of the market, the uncertainty of the financial returns, and the opportunity cost of not applying skilled personnel to products designed for the broader IT market. Furthermore, academic research on the design of supercomputers has diminished. From the mid-1990s to the early 2000s, the number of published papers on supercomputing or high-performance computing shrank by half; the number of National Science Foundation (NSF) grants on parallel architecture design shrank by half; and large projects that build prototype systems have disappeared. The reduced research investment is worrisome, as it will make it harder to benefit from advances due to Moore’s Law in the future. Some of the main obstacles are summarized next.


