CTWatch
February 2005
Trends in High Performance Computing
Fran Berman and Thom Dunning

Welcome to the inaugural issue of the Cyberinfrastructure Technology Watch Quarterly. We hope that this publication provides a window on the future of the hardware, software and human resources required to build a useful, usable and enabling cyberinfrastructure for science and engineering. Our goal is to provide a venue for describing emerging cyberinfrastructure technologies, a forum for discussing current trends and future opportunities, critical information relevant to building and using cyberinfrastructure, and a resource for the entire community.

In 2003, the NSF Blue Ribbon Panel on Cyberinfrastructure provided a compelling vision of the future:

“... a new age has dawned in scientific and engineering research, pushed by continuing progress in computing, information, and communication technology, and pulled by the expanding complexity, scope, and scale of today's challenges. The capacity of this technology has crossed thresholds that now make possible a comprehensive ‘cyberinfrastructure’ on which to build new types of scientific and engineering knowledge environments and organizations and to pursue research in new ways and with increased efficacy.”

The Blue Ribbon Panel’s vision of cyberinfrastructure involves the coordination and integration of software, hardware and human resources to enable current and future science and engineering applications. Coordinating computing, data, visualization, networking, field instruments, and other technologies presents enormous challenges to cyberinfrastructure builders and developers and pushes many of the component technologies to their limits.

Pioneers in the building and use of cyberinfrastructure have included a collection of advanced multi-disciplinary application projects, including NSF's Network for Earthquake Engineering Simulation (NEES) project (focusing on the development of enabling infrastructure for critical earthquake engineering experiments) and NIH's Biomedical Informatics Research Network (BIRN) project (focusing on distributed collaborations in brain imaging, human neurological disorders, and associated animal models); the development of community databases and data collections (the National Virtual Observatory data collection provides a comprehensive window on the heavens, while the Protein Data Bank provides a global resource for protein information); and visionary technology-oriented projects such as the OptIPuter (which is experimenting with a new generation of super optical networks). Perhaps the most visible project has been the Extensible Terascale Facility (ETF, or TeraGrid), which involves a broad spectrum of national partners in the largest-scale coordinated effort to date to build and operate a production grid.

All of these projects demonstrate that the vision of a national cyberinfrastructure articulated by the Blue Ribbon Panel is complex and compelling, with both unprecedented opportunities and unprecedented challenges. Building a useful, usable and enabling cyberinfrastructure environment requires careful design and coordinated development, deployment and support of a robust set of integrated cyberinfrastructure technologies. Strategic investments and commitments will be required to achieve the vision laid out by the Blue Ribbon Panel.

Our goal is for the Cyberinfrastructure Technology Watch (CTWatch) Quarterly to serve as a strategic resource for the community's efforts to build the emerging cyberinfrastructure. We hope that you will find this inaugural issue and subsequent issues thought-provoking, illuminating and entertaining reading, and that you will contribute to the community discussion of these critical topics. We look forward to your input.

Fran Berman
Director, San Diego Supercomputer Center
and
Thom Dunning
Director, National Center for Supercomputing Applications

CTWatch Quarterly Publishers

Erich Strohmaier, Lawrence Berkeley National Laboratory
Jack J. Dongarra, University of Tennessee/Oak Ridge National Laboratory
Hans W. Meuer, University of Mannheim
Horst D. Simon, Lawrence Berkeley National Laboratory

1. Introduction

"The Only Thing Constant Is Change" -- Looking back on the last four decades this seems certainly to be true for the market of High-Performance Computing systems (HPC). This market was always characterized by a rapid change of vendors, architectures, technologies and the usage of systems.1 Despite all these changes the evolution of performance on a large scale however seems to be a very steady and continuous process. Moore's Law is often cited in this context. If we plot the peak performance of various computers of the last six decades in Fig. 1, which could have been called the 'supercomputers' of their time,2,3 we indeed see how well this law holds for almost the complete lifespan of modern computing. On average we see an increase in performance of two magnitudes of order every decade.

Fig. 1. Performance of the fastest computer systems for the last six decades compared to Moore's Law.
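
As a quick back-of-the-envelope check on that rate, the short Python sketch below compounds a fixed doubling period over ten years; the 18-month doubling time is an assumed, illustrative value rather than a figure taken from the article.

    # Back-of-the-envelope check: what does a fixed doubling period imply
    # over one decade? The 18-month period is an assumption for illustration.
    doubling_period_years = 1.5
    decade_years = 10

    doublings_per_decade = decade_years / doubling_period_years   # about 6.7
    growth_factor = 2 ** doublings_per_decade                     # about 100x

    print(f"{doublings_per_decade:.1f} doublings -> {growth_factor:.0f}x per decade")
    # Prints: 6.7 doublings -> 102x per decade, i.e. roughly two orders of magnitude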

Susan L. Graham, University of California at Berkeley
Marc Snir, University of Illinois at Urbana-Champaign

1. Background

A variety of events led to a reevaluation of United States supercomputing programs by several studies in 2003 and 2004. These events included the emergence of the Japanese Earth Simulator in early 2002 as the leading supercomputing platform; the near disappearance of Cray, the last remaining U.S. manufacturer of custom supercomputers; some criticism of the acquisition budgets of the Department of Energy's (DOE) Advanced Simulation and Computing (ASC) program; and some doubts about the level and direction of supercomputing R&D in the U.S. We report here on a study that was conducted by a committee convened by the Computer Science and Telecommunications Board (CSTB) of the National Research Council (NRC). It was chaired by Susan L. Graham and Marc Snir; it had sixteen additional members with diverse backgrounds: William J. Dally, James W. Demmel, Jack J. Dongarra, Kenneth S. Flamm, Mary Jane Irwin, Charles Koelbel, Butler W. Lampson, Robert F. Lucas, Paul C. Messina, Jeffrey M. Perloff, William H. Press, Albert J. Semtner, Scott Stern, Shankar Subramaniam, Lawrence C. Tarbell, Jr., and Steve J. Wallach. The CSTB study director was Cynthia A. Patterson, assisted by Phil Hilliard, Margaret Marsh Huynh and Herbert S. Lin. The study was sponsored by the DOE's Office of Science and the DOE's Advanced Simulation and Computing program.

The study commenced in March 2003. Information was gathered from briefings during five committee meetings; an application workshop in which more than 20 computational scientists participated; site visits to DOE labs and NSA; a town hall meeting at the 2003 Supercomputing Conference; and a visit to Japan that included a supercomputing forum held in Tokyo. An interim report was issued in July 2003, and the final report was issued in November 2004. The report was extensively reviewed by seventeen external reviewers in a blind peer-review process, as well as by NRC staff. The prepublication version of the report (over 200 pages), entitled "Getting Up to Speed: The Future of Supercomputing," is available from the National Academies Press1 and also from DOE.2 The final published version of the report is due in early 2005.

The study focuses on supercomputing, narrowly defined as the development and use of the fastest and most powerful computing systems — i.e., capability computing. It covers technological, political and economic aspects of the supercomputing enterprise. We summarize in the following sections the main findings and recommendations of this study.

Jim Gray, Microsoft
David T. Liu, University of California at Berkeley
Maria Nieto-Santisteban, Johns Hopkins University
Alex Szalay, Johns Hopkins University
David DeWitt, University of Wisconsin
Gerd Heber, Cornell University

1. Data-intensive science — a new paradigm

Scientific instruments and computer simulations are creating vast data stores that require new scientific methods to analyze and organize the data. Data volumes are approximately doubling each year. Since these new instruments have extraordinary precision, the data quality is also rapidly improving. Analyzing this data to find the subtle effects missed by previous studies requires algorithms that can simultaneously deal with huge datasets and that can find very subtle effects — finding both needles in the haystack and finding very small haystacks that were undetected in previous measurements.

The raw instrument and simulation data is processed by pipelines that produce standard data products. In the NASA terminology,1 the raw Level 0 data is calibrated and rectified to Level 1 datasets that are combined with other data to make derived Level 2 datasets. Most analysis happens on these Level 2 datasets with drill down to Level 1 data when anomalies are investigated.
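
As a concrete, if deliberately simplified, illustration of that staging, the short Python sketch below walks a handful of made-up readings through the three levels; the variable names, calibration constant and combination step are placeholders invented for this example, not part of any actual mission pipeline.

    # Toy sketch of a Level 0 -> Level 1 -> Level 2 data-product pipeline.
    # All names, values and processing steps are illustrative placeholders.

    RAW_LEVEL0 = [10.2, 9.8, 10.5]            # pretend raw instrument readings
    DARK_OFFSET = 0.4                         # pretend calibration constant
    REFERENCE_CATALOG = {"zero_point": 25.0}  # pretend ancillary dataset

    def to_level1(raw_readings, dark_offset):
        """Calibrate and rectify raw (Level 0) readings into Level 1 data."""
        return [reading - dark_offset for reading in raw_readings]

    def to_level2(level1_data, catalog):
        """Combine Level 1 data with another dataset into derived Level 2 products."""
        return [catalog["zero_point"] - value for value in level1_data]

    # Most analysis runs against the Level 2 products; the Level 1 data is kept
    # so that anomalies can be drilled back down to the less-processed readings.
    level1 = to_level1(RAW_LEVEL0, DARK_OFFSET)
    level2 = to_level2(level1, REFERENCE_CATALOG)
    print(level1, level2)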

We believe that most new science happens when the data is examined in new ways. So our focus here is on data exploration, interactive data analysis, and integration of Level 2 datasets.

Data analysis tools have not kept pace with our ability to capture and store data. Many scientists envy the pen-and-paper days when all their data used to fit in a notebook and analysis was done with a slide-rule. Things were simpler then; one could focus on the science rather than needing to be an information technology professional with expertise in arcane computer data analysis tools.

The largest data analysis gap is in this man-machine interface. How can we put the scientist back in control of his data? How can we build analysis tools that are intuitive and that augment the scientist’s intellect rather than adding to the intellectual burden with a forest of arcane user tools? The real challenge is building this smart notebook that unlocks the data and makes it easy to capture, organize, analyze, visualize, and publish.

This article is about the data and data analysis layer within such a smart notebook. We argue that the smart notebook will access data presented by science centers that will provide the community with analysis tools and computational resources to explore huge data archives.

Dan Reed, University of North Carolina at Chapel Hill

In June 2004, the President’s Information Technology Advisory Committee (PITAC) was charged by John Marburger, the President's Science Adviser, to respond to seven questions regarding the state of computational science:

  1. How well is the Federal Government targeting the right research areas to support and enhance the value of computational science? Are agencies' current priorities appropriate?
  2. How well is current Federal funding for computational science balanced between short-term, low-risk research and longer-term, higher-risk research? Within these research arenas, which areas have the greatest promise of contributing to breakthroughs in scientific research and inquiry?
  3. How well is current Federal funding balanced between fundamental advances in the underlying techniques of computational science and the application of computational science to scientific and engineering domains? Which areas have the greatest promise of contributing to breakthroughs in scientific research and inquiry?
  4. How well are computational science training and research integrated with the scientific disciplines that are heavily dependent upon them to enhance scientific discovery? How should the integration of research and training among computer science, mathematical science, and the biological and physical sciences best be achieved to ensure the effective use of computational science methods and tools?
  5. How effectively do Federal agencies coordinate their support for computational science and its applications in order to maintain a balanced and comprehensive research and training portfolio?
  6. How well have Federal investments in computational science kept up with changes in the underlying computing environments and the ways in which research is conducted? Examples of these changes might include changes in computer architecture, the advent of distributed computing, the linking of data with simulation, and remote access to experimental facilities.
  7. What barriers hinder realizing the highest potential of computational science and how might these be eliminated or mitigated?

Since that time, I have chaired a PITAC subcommittee composed of Ruzena Bajcsy (UC-Berkeley), Manuel Fernandez (SI Ventures), José-Marie Griffiths (UNC-CH) and Randall Mott (Dell) to prepare a response to these questions. The subcommittee has also been assisted by two consultants, Chris Johnson (Utah) and Jack Dongarra (Tennessee). The subcommittee has solicited input at public meetings and held a Birds-of-a-Feather (BoF) Town Hall meeting at SC04 in November 2004.

Based on this input and extended discussions, the subcommittee has developed a working definition of computational science, which it is using to prepare a draft report. This definition, which is still in flux, attempts to recognize the interplay among algorithms and software, computer and information science, and computing infrastructure:

Computational science is a rapidly growing multidisciplinary field that uses advanced computing capabilities to understand and solve complex problems. Computational science fuses three distinct elements: (a) algorithms (numerical and non-numerical) and modeling and simulation software developed to solve science (e.g., biological, physical, and social), engineering and humanities problems; (b) computer and information science that develops and optimizes the advanced system hardware, software, networking, and data management components needed to solve computationally demanding problems; and (c) the computing infrastructure that supports both the science and engineering problem solving and the developmental computer and information science.

Computational science has several advantages over experimentation and theory. First, it often enables problems to be solved more efficiently, more rapidly and less expensively. Second, it can solve problems computationally that could not be studied safely by experiment. Finally, it can solve problems whose solution is otherwise impossible (e.g., because the experimental conditions cannot be recreated).

The subcommittee has issued two interim working summaries, which are available on the web site of the National Coordination Office (NCO).1 These summaries contain draft findings and recommendations, which are still evolving. Preliminary findings, reported at the November PITAC meeting, include the following:

  • Computing has become the third component of scientific discovery, complementing theory and experiment.
  • The explosive growth in the resolution of sensors and scientific instruments has led to unprecedented volumes of experimental data. Computational science now broadly includes modeling, simulation and scenario assessment using sensor data from diverse sources.
  • Complex multidisciplinary problems, from public policy through national security to scientific discovery and economic competitiveness, have emerged as new drivers of computational science, complementing the historical focus on single disciplines.
  • Developing leading edge computational science applications is a complex process involving teams of people that must be sustained for a decade or more to yield the full fruits of investment.
  • Short-term investment and limited strategic planning have led to excessive focus on incremental research rather than on the long-term research with lasting impact that can solve critical problems.
  • Interdisciplinary education in computational science and computing technologies is inadequate, reflecting the traditional disciplinary boundaries in higher education. Only systemic change to university organizational structures will yield the needed outcomes.
  • Computational science would benefit from a roadmap outlining decadal priorities for investment, with a clear assessment of those priorities derived from a survey of the problems and challenges. Agencies could then respond with strategic plans that recognize those priorities and their funding requirements.

The subcommittee invites comments on its responses to the charge, its preliminary findings and its draft recommendations. Comments can be sent to pitac-comments@nitrd.gov.

