CTWatch Quarterly » The NRC Report on the Future of Supercomputing

The NRC Report on the Future of Supercomputing

Susan L Graham, University of California at Berkeley
Marc Snir, University of Illinois at Urbana-Champaign

A Fragile Ecosystem

The problems listed above indicate a clear need for change. We need new architectures to cope with the breakdown in current designs caused by the diverging rates of improvement of various components (e.g., processor speed vs. memory speed vs. switch speed). We need new languages, new tools, and new operating systems to cope with increased levels of parallelism and low software productivity. We need continued improvements in algorithms to handle larger problems and new models (to improve performance or accuracy), and to exploit changing supercomputer hardware characteristics.

But it takes time to realize the benefits of research. It took more than a decade from the first vector product until vector programming was well supported by algorithms, languages, and compilers; it took more than a decade from the first massively parallel processor (MPP) products to well-supported standard message-passing programming environments. Because the research pipeline has emptied, we are in a weak position to cope with the obstacles that are likely to limit supercomputing progress in the next decade.

Change is inhibited by the large investments in application software. While new hardware is purchased every three to five years, large software packages are maintained and used over decades. Changes in architectures and programming models may require expensive recoding, a nearly impossible task for poorly maintained, large “dusty deck” codes. Ecosystems are created through the mutually reinforcing effect of hardware and software that support a certain programming model well, application software designed for that programming model, and people who are familiar with the programming model and its environment. Even though the ecosystem may be caught in a “local minimum,” and better productivity could be achieved with other architectures and programming models, change requires coordination across all aspects of technology (hardware and software) and very large investments in code rewriting and retraining to overcome the potential barrier.

Progress will also be hampered by the small size and fragility of the supercomputing ecosystem. The community of researchers who develop new supercomputing hardware, software, and applications is small. For example, according to the Taulbee surveys of the last few years, out of more than 800 CS PhDs who graduate each year in the U.S., only 36 specialize in computational sciences (and only 3 are hired by national laboratories). Since supercomputing is a very small fraction of the total IT industry, and since large-system skills are needed in many other areas (e.g., Google), people can easily move to new jobs. There is little flow of personnel among the various groups in industry working on supercomputing and little institutional memory: the same problems are solved again and again. The loss of a few tens of people with essential skills can critically hamper a company or a lab. Instability of long-term funding and uncertainty in policies compound this problem.

