CTWatch
May 2006
Designing and Supporting Science-Driven Infrastructure
Charlie Catlett, Pete Beckman, Dane Skow and Ian Foster, The Computation Institute, University of Chicago and Argonne National Laboratory

2. Software Infrastructure

Software components in a grid facility include science applications, grid middleware, infrastructure support services, and mechanisms to integrate community-developed systems that we call “Science Gateways.” If we define the fundamental components of infrastructure as those with the longest useful lifespan, then software is clearly the critical investment. While particular platforms (e.g., x86) may have long lifetimes, individual high-end computational resources have a useful lifespan of perhaps five years. In contrast, many components of our software infrastructure are already ten years old. For example, TeraGrid deployed the Globus Toolkit6 nearly five years ago (and it was not new at the time), and we expect this software to remain integral for the foreseeable future. Similarly, scientific communities have invested several years in building their own software infrastructure – tools, databases, and web portals, for example. Science application software, and the tools for developing, debugging, and managing that software, are often even older. As we consider the costs and investments of integrating grid facilities, it is essential that we leverage these prior investments.

2.1 Middleware Software and Services

The vast majority of scientific grid facilities rely heavily on a common core set of middleware systems, such as the Globus middleware (which includes numerous components, among them GridFTP for data transfer, GRAM for job submission, the Grid Security Infrastructure, and the credential management software MyProxy7) and a variety of related tools such as the Condor scheduling system8 and the verification and validation suite Inca.9 The development and wide-scale adoption of these components have been made possible by substantial investments by DOE, NSF, and other agencies in the U.S. and abroad. In particular, NSF’s investment of roughly $50M in the NSF Middleware Initiative (NMI) program10 over the past five years has played a key role in developing and “hardening” these and other software systems so that they can be used reliably in grid facilities, as evidenced by their widespread adoption in hundreds of grid projects and facilities worldwide. For example, the NMI GRIDS Center11 has supported the development, integration testing, and packaging of many components. This work has reduced the complexity of creating a basic grid system and greatly simplified updating systems that adopted earlier versions of the software. Additional investments of tens of millions of dollars have been made worldwide in grid deployment projects that have contributed to the maturation of these software systems, the development of tools for particular functions, and the pioneering of new application approaches enabled by TeraGrid-class facilities. For example, the TeraGrid project invested roughly $1M in the initial design and development of the Inca system, which is one of many such components available today through the NMI program.
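To make the division of labor among these components concrete, the sketch below shows how a user-level script might chain the standard MyProxy and Globus Toolkit command-line clients to obtain a credential, stage data, and run a remote job. The hostnames, usernames, paths, and job are hypothetical, and the exact client options available depend on the toolkit version deployed at a given site.

    # Illustrative sketch only: hypothetical hosts, paths, and job; assumes the
    # MyProxy and Globus Toolkit command-line clients are installed and configured.
    import subprocess

    def run(cmd):
        # Echo the command, then execute it, raising an error on failure.
        print("$", " ".join(cmd))
        subprocess.run(cmd, check=True)

    # 1. Obtain a short-lived GSI proxy credential from a MyProxy server.
    run(["myproxy-logon", "-s", "myproxy.example.org", "-l", "jdoe"])

    # 2. Stage an input file to the remote resource with GridFTP.
    run(["globus-url-copy",
         "file:///home/jdoe/input.dat",
         "gsiftp://gridftp.example.org/scratch/jdoe/input.dat"])

    # 3. Run the job through the GRAM service on the remote resource.
    run(["globus-job-run", "login.example.org",
         "/usr/local/bin/simulate", "/scratch/jdoe/input.dat"])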

Continued investment in middleware capabilities development, through programs like NMI, is critical if we are to deliver on the promise of cyberinfrastructure. Major grid facilities like TeraGrid, and the user-driven application and user environment projects that build on those facilities, typically involve a two-year development schedule and a five-year capability roadmap, both of which rely on the progression of capabilities from research prototypes to demonstration systems to supportable software infrastructure.

2.2 Science Gateways

In parallel with NMI over the past several years, other programs within NSF, DOE, NIH, and other agencies have provided funding to bring together software engineers and computational scientists to create software infrastructure aimed at harnessing cyberinfrastructure for specific disciplines. For example, the Linked Environments for Atmospheric Discovery12 project is creating an integrated set of software and services designed for atmospheric scientists and educators. Similar cyberinfrastructure has been created in other disciplines such as high energy and nuclear physics,13,14,15 fusion science,16 earth sciences,17,18 astronomy,19,20 nanotechnology,21 bioinformatics,22 and cancer research and clinical practice.23

In the TeraGrid project we have formed a set of partnerships around the concept of “Science Gateways,” with the objective of providing TeraGrid services (e.g., computation, information management, and visualization) to user communities through the tools and environments they already use, in contrast to traditional approaches that require the user to learn how to use the grid facilities directly. The most common presentation of these community-developed cyberinfrastructure environments is the web portal, though some communities provide desktop applications or community-specific grid systems instead of, or in addition to, a web portal.

In the TeraGrid project we have partnered not only with gateway providers but also with other grid facilities to identify and standardize a set of services and interaction methods that will enable web portals and applications to invoke computation, information management, visualization, and other services. While still in its early stages, the TeraGrid Science Gateways program has catalyzed a new paradigm for delivering cyberinfrastructure to the science and education community, with a scalable wholesale/retail relationship between grid facilities and gateway providers. Additional benefits of this model include an improved security architecture (offering targeted, restricted access to users rather than open login access) and collaboration support (community members can readily share workflows, tools, or data through and among gateway systems).
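The wholesale/retail relationship and its security benefits can be illustrated with a minimal sketch of a hypothetical gateway service: the portal authenticates its own community members and enforces its own policy, then submits work to the facility under a single community credential, so the facility never issues individual logins to portal users. All class, host, and application names below are illustrative and do not correspond to any specific TeraGrid interface.

    # Minimal sketch of the wholesale/retail gateway pattern (all names hypothetical).
    import os
    import subprocess

    class Gateway:
        # Retail-level policy: the only applications this gateway exposes.
        ALLOWED_APPS = {"weather-forecast", "ensemble-analysis"}

        def __init__(self, community_proxy, gram_host):
            self.community_proxy = community_proxy  # credential issued to the gateway, not to end users
            self.gram_host = gram_host              # the facility's GRAM endpoint

        def submit(self, portal_user, app, args):
            # The facility sees only the community credential; the gateway
            # authenticates portal users and keeps per-user accounting.
            if app not in self.ALLOWED_APPS:
                raise PermissionError(app + " is not offered by this gateway")
            print("[audit]", portal_user, "->", app, args)
            env = dict(os.environ, X509_USER_PROXY=self.community_proxy)
            subprocess.run(["globus-job-run", self.gram_host, "/apps/" + app] + list(args),
                           check=True, env=env)

    # Example use from a portal request handler:
    gw = Gateway("/etc/gateway/community-proxy.pem", "login.example.org")
    gw.submit("jdoe", "weather-forecast", ["--region", "midwest"])

In this pattern the restricted interface – a fixed set of applications invoked under an audited community credential – is what allows the facility to grant targeted access to a large community rather than open login access to each of its members.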
