CTWatch
November 2005
E-Infrastructure: Europe Meets the e-Science Challenge
Introduction
Tony Hey, Corporate Vice President for Technical Computing – Microsoft Corporation
Anne Trefethen, Director – UK e-Science Core Programme, EPSRC

This issue of CTWatch Quarterly is intended to give an overview of the activity on e-Science and Grids in Europe. The European Commission was among the first to identify Grids as a key technology for collaboration and resource sharing. The pioneering European DataGrid project, led by Fabrizio Gagliardi at CERN, worked with the world particle physics community, and with the US computer scientists Ian Foster, Carl Kesselman and Miron Livny, to develop a global Grid infrastructure capable of moving large amounts of data and providing the vast shared compute resources needed to analyse it. Many petabytes of data per year will be generated by experiments at the Large Hadron Collider, currently under construction at the CERN Laboratory in Geneva. Similarly, the UK was also early to see the potential of Grid technologies for building the scientific ‘Virtual Organizations’ needed for networked scientific collaborations. In 2001, the UK announced the beginning of a $400M ‘e-Science’ initiative; the term e-Science was introduced by John Taylor, then Director General of Research Councils in the UK’s Office of Science and Technology, as a shorthand for the set of collaborative technologies needed to support the distributed, multi-disciplinary science and engineering projects of the future.

It is now 2005 and the European Union has invested in a new generation of projects, building on the lessons of the European DataGrid and other similar projects. Besides investing in further R&D projects, the Commission has identified the need to develop and sustain an ‘e-Infrastructure’ consisting of a pan-European, high-speed research network, GEANT-2, together with a set of core Grid middleware services to support distributed scientific collaborations. The reports collected here therefore cover three new R&D projects – SIMDAT, NextGrid and OntoGrid – plus the major Research Infrastructure project EGEE – Enabling Grids for E-sciencE – which in many ways can be seen as the direct successor of the original European DataGrid project.

The SIMDAT project is developing generic Grid technology for the solution of complex application problems, and demonstrating this technology in several representative industry sectors. Special attention is being paid to security, and the objective is to accelerate the uptake of Grid technologies in industry and services. Major European companies from the aerospace and automotive sectors and from the pharmaceutical industry are partners in the project, which also involves major European meteorological centres. By contrast, the NextGrid project is looking further out at the next generation of Grid technologies and is focused on inter-enterprise computing in the business sector, with partners such as SAP, BT, Fujitsu, NEC and Microsoft.

OntoGrid represents another strand of activity and builds on pioneering work towards the development of a truly ‘Semantic Grid’ in the UK e-Science program. The project brings together knowledge services – such as ontology services, metadata stores and reasoning engines – with Grid services – such as workflow management, Virtual Organisation formation, debugging, resource brokering and data integration. This semantics-based approach to the Grid goes hand-in-hand with the exploitation of techniques from intelligent software agents for negotiation and coordination and peer-to-peer (P2P) computing for distributed discovery. These four projects are by no means all of the current EU Grid projects; details of these and other projects may be found on the CORDIS website: http://www.cordis.lu/

The UK e-Science Program has now entered its third phase, which focuses on laying the foundations for a sustainable national e-Infrastructure – or Cyberinfrastructure, in US parlance. These activities are described in a short article by the editors.

Complementing the other articles in this issue, Dan Reed's personal reflections on the recent report of his PITAC subcommittee on Computational Science show that a shared sense of current challenges and opportunities is driving the development of e-Infrastructure and e-Science on both sides of the Atlantic.

The last article in this issue is of a different character and is a personal account by Microsoft’s Chief Technology Officer, Craig Mundie, of his own roots in High Performance Computing and the reasons why Microsoft are now taking steps to become engaged with the HPC community. This is a fascinating glimpse into the future and indicates that Microsoft intends to play a major role in the development of ‘commodity HPC’ systems.

Tony Hey and Anne Trefethen

Tony Hey, Corporate Vice President for Technical Computing – Microsoft Corporation
Anne Trefethen, Director – UK e-Science Core Programme, EPSRC

1. Introduction

This issue of CTWatch Quarterly contains four articles that provide an overview of some of the major Grid projects in Europe. All of these projects aim to provide scientists with distributed, collaborative research capabilities built on the deployment of a persistent middleware infrastructure on top of high-bandwidth research networks. The combination of a set of middleware services running on top of high-speed networks is called ‘e-Infrastructure’ in Europe and ‘Cyberinfrastructure’ in the USA. In this brief article we abstract the key elements of such an e-Infrastructure from these projects and from our experience with the UK e-Science program. We look at the problems of creating and implementing a sustainable, global e-Infrastructure that will enable multidisciplinary and collaborative research across a wide range of disciplines and communities.

2. Background

The UK e-Science Initiative began in April 2001 and over the last four years, more than £250M has been invested in science applications and middleware development. In addition, the program created a pipeline from the science base to genuine industrial applications of this technology and, most importantly, has enabled the creation of a vibrant, multidisciplinary e-Science community. This community comes together in its totality at the UK’s annual e-Science All Hands Meeting, held each September, which now attracts over 650 participants who join in to share experience and technologies. These meetings have brought together an exciting mix of scientists, computer scientists, IT professionals, industrial collaborators and, more recently, social scientists and researchers in the arts and humanities.

Research scientists from all domains of science and engineering–particle physics, astronomy, chemistry, physics, all flavours of engineering, environmental science, bioinformatics, medical informatics and social science–as well as the arts and humanities are beginning to appreciate the need for e-Science technologies that will allow them to make progress with the next generation of research problems. In most cases, researchers now find themselves faced with the increasingly difficult burden of managing and storing vast amounts of data as well as analyzing, combining and mining that data to extract useful information and knowledge. Often this involves automating the annotation of data with relevant metadata, as well as constructing search engines and workflows that capture complex usage patterns of distributed data and compute resources. Most of these problems, and the tools and techniques to tackle them, are similar across many different types of application. It makes no sense for each community to develop these basic tools in isolation. We need to identify and capture a set of generic middleware services and deploy them on top of the high-bandwidth research networks to constitute a reusable e-Infrastructure. In the UK e-Science Initiative, this task–of identifying and implementing the key features of a national e-Infrastructure–was the remit of the Core Program.

The phrase e-Infrastructure–or Cyberinfrastructure in the US–is used to emphasize that these applications will be facilitated by a set of services that permit easy but controlled access to the traditional infrastructure of science–supercomputers, high performance clusters, networks, databases and experimental facilities. The e-Science challenge is to provide a set of Grid middleware services that are sufficiently robust, powerful and easy to use that application scientists are freed from re-inventing such low-level ‘plumbing’ and can concentrate on their science. A second challenge is to make this combination of middleware and hardware into a truly sustainable e-Infrastructure in much the same way as we take for granted the research networks of today.


Carole Goble and Sean Bechhofer, University of Manchester, UK

What is a Semantic Grid?

The Grid aims to support secure, flexible and coordinated resource sharing by providing a middleware platform for advanced distributed computing. Consequently, the Grid’s infrastructural machinery aims to allow collections of any kind of resources—computing, storage, data sets, digital libraries, scientific instruments, people, etc—to easily form Virtual Organisations that cross organisational boundaries in order to work together to solve a problem. A Grid depends on understanding the available resources, their capabilities, how to assemble them and how best to exploit them. Thus Grid middleware, and the Grid applications it supports, thrive on the metadata that describes resources in all their forms, the Virtual Organisations, the policies that drive them, and so on, together with the knowledge to apply that metadata intelligently.

The Semantic Grid is a recent initiative to systematically expose semantically rich information associated with Grid resources in order to build more intelligent Grid services.1 The idea is to make structured semantic descriptions real and visible first-class citizens with an associated identity and behaviour. We can then define mechanisms for their creation and management, as well as protocols for their processing, exchange and customisation. We can separate these issues from both the languages used to encode the descriptions (from natural language text right through to logic-based assertions) and the structure and content of the descriptions themselves, which may vary from application to application.

In practice, work on Semantic Grids has primarily meant introducing technologies from the Semantic Web2 to the Grid. The background knowledge and vocabulary of a domain can be captured in ontologies – machine processable models of concepts, their interrelationships and their constraints; for example a model of a Virtual Organisation.3 Metadata labels Grid resources and entities with concepts, for example describing a job submission in terms of memory requirements and quality of service or a data file in terms of its logical contents. Rules and classification-based automatic inference mechanisms generate new metadata based on logical reasoning, for example describing the rules for membership of a Virtual Organisation and reasoning that a potential member’s credentials are satisfactory.
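
To make this pattern concrete, the following minimal Python sketch (not taken from any of the projects discussed here, and assuming the rdflib library) shows metadata labelling a hypothetical job submission and a prospective Virtual Organisation member, with a simple query standing in for the kind of membership rule a reasoner would evaluate. The ex: vocabulary and all resource names are invented for illustration.

    # Hypothetical sketch: semantic metadata for Grid resources, using rdflib.
    # The ex: vocabulary and the "membership rule" are invented for illustration;
    # a real Semantic Grid would use a shared ontology and a reasoning engine.
    from rdflib import Graph, Literal, Namespace, RDF

    EX = Namespace("http://example.org/grid#")
    g = Graph()
    g.bind("ex", EX)

    # Metadata labelling a job submission with its requirements.
    g.add((EX.job42, RDF.type, EX.JobSubmission))
    g.add((EX.job42, EX.requiresMemoryMB, Literal(4096)))
    g.add((EX.job42, EX.submittedBy, EX.alice))

    # Metadata describing a prospective member of a Virtual Organisation.
    g.add((EX.alice, RDF.type, EX.Researcher))
    g.add((EX.alice, EX.hasCredential, EX.trustedX509Certificate))

    # A hand-written stand-in for a membership rule: researchers holding a
    # trusted credential qualify for the Virtual Organisation.
    rows = g.query(
        """
        SELECT ?person WHERE {
            ?person a ex:Researcher ;
                    ex:hasCredential ex:trustedX509Certificate .
        }
        """,
        initNs={"ex": EX},
    )
    for row in rows:
        print(row.person, "satisfies the Virtual Organisation membership rule")

In a full Semantic Grid setting the rule itself would live in an ontology or rule language and be evaluated by a reasoning engine, so that the same machinery could also infer and attach new metadata rather than merely answer hand-written queries.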

In recognition of the potential importance of semantics in Grids, the Global Grid Forum standards body chartered a Semantic Grid Research Group in 2003.4 XML-based description languages such as the Forum’s Job Submission Description Language and Data Format Description Language, and OASIS’s Security Assertion Markup Language, all identify the role of semantics. The Forum’s recent Database Access and Integration Services Working Group specification identifies the importance of semantics in integration, metadata management and discovery. In July 2005 the Grid and Semantic Web communities came together in a week-long Schloss Dagstuhl seminar.5

In the last few years, several projects have embraced the Semantic Grid vision and pioneered applications combining the strengths of the Grid and of semantic technologies, particularly the use of ontologies for describing Grid resources and improving interoperability.6 The UK myGrid7 project uses ontologies to describe and select web-based services used in the Life Sciences; the UK Geodise project uses ontologies to guide aeronautical engineers to select and configure Matlab scripts;8 the Collaboratory for Multi-scale Chemical Science9 and CombeChem10 projects both use Semantic Web technologies to describe provenance metadata for chemistry experiments; the US-based Biomedical Informatics Research Network uses semantic technologies to mediate between different databases in neuroscience;11 and the UK’s CoAKTinG project uses ontologies to assist in virtual meetings between scientists.12 On the Semantic Grid road we are now moving from a phase of exploratory experimentation to one of systematic investigation, architectural design and content acquisition for a semantic infrastructure that accompanies a cyberinfrastructure (Figure 1).


Fabrizio Gagliardi, EGEE Project Director – CERN
Bob Jones, EGEE Technical Director – CERN
Owen Appleton, EGEE Communications Officer – CERN

Introduction

The Enabling Grids for E-sciencE (EGEE) project is Europe’s flagship Research Infrastructures Grid project1 and the world’s largest Grid infrastructure of its kind. It involves more than 70 partners from 27 countries, arranged in 12 regional federations, and provides more than 16,000 CPUs at more than 160 sites, with 10 petabytes of available network storage. This infrastructure supports six scientific domains and more than 20 individual applications.

Started in March 2004, EGEE has rapidly grown from a European into a global endeavour, and along the way has learned a great deal about the business of building production-quality infrastructure. The consortium behind this effort represents a significant proportion of Europe’s Grid experts, including not only academic institutions but also partners from the research network community and European industry. This article outlines the project’s structure and goals, its achievements and the importance of cooperation in such large-scale international efforts.

A distributed effort – project structure and goals

Figure 1: Extent of EGEE infrastructure for EGEE-II

The aim of EGEE is to leverage the pre-existing Grid efforts in Europe (thematic, national and regional) to build a production-quality, multi-science computing Grid. As a result, the primary objective is to build the infrastructure itself, connecting computing centres across Europe (and more recently, around the globe) into a coordinated service capable of supporting 24/7 use by large scientific communities. To support this production service, the project also aims to re-engineer existing middleware components to produce a service-orientated middleware solution. Finally, the project aims to engage the maximum number of users running applications on the infrastructure through dissemination, training and user support. These tasks have been divided into different activity areas, which are tackled by different groups within the project. These groups are distributed across a number of partner institutes with relevant experience, so that the project helps to connect its partners and encourage knowledge transfer in the process of achieving its goals.


Mike Boniface and Colin Upstill, University of Southampton IT Innovation Centre

Introduction

In the context of this project, a Grid is defined to be a software system that provides uniform and location-independent access to geographically and organizationally dispersed, heterogeneous resources that are persistent and supported. Typically, these shared assets are under different ownership or control. The SIMDAT project1 is developing generic Grid technology for the solution of complex application problems and demonstrating this technology in several representative industry sectors. Special attention is being paid to security, e.g. where third-party suppliers have need-to-know access to data, and where correlation and inference may provide insight into confidential processes. The objective is to accelerate the uptake of Grid technologies in industry and services, to provide standardised solutions for currently missing capabilities, and to validate the effectiveness of a Grid in simplifying the processes used for the solution of complex, data-centric problems.

The SIMDAT consortium comprises leading software and process system developers–IBM, IDESTYLE Technologies, InforSense, Intel, Lion Bioscience, LMS International, MSC Software, NEC, Ontoprise and Oracle; Grid technology specialists–Fraunhofer Institute AIS, Fraunhofer Institute SCAI, IT Innovation, Universität Karlsruhe, Université libre de Bruxelles and the University of Southampton; and representatives from strategic industry and service sectors–Audi, BAE Systems, DWD, EADS, ESI, EUMETSAT, ECMWF, GlaxoSmithKline, Météo-France, Renault and the UK Met Office. IT Innovation is leading the basic Grid infrastructure architecture work in SIMDAT, and this article therefore focuses on this aspect of the project rather than on the applications.

Grids for complex problem solving in industry

Development of industrial and large-scale products and services poses complex problems. The processes used to develop these products and services typically involve a large number of independent organisational entities at different locations, grouped in partnerships and supply chains. The Grid provides connectivity plus interoperability; it is a major contributor to improved collaboration and an enabler of virtual organisations. It has the potential to substantially reduce the complexity of the development process, thereby improving the ability to deal with product complexity.

The heart of the issue is data. Applications and their associated computing power are central to the product development process. Grid technology is needed to connect diverse data sources, to enable flexible, secure and sophisticated levels of collaboration and to make possible the use of powerful knowledge discovery techniques.

Key to seamless data access is the federation of problem-solving environments using Grid technology. These federated problem-solving environments will be the major result of SIMDAT.

Seven key technology layers have been identified as important to achieving the SIMDAT objectives:

  • an integrated grid infrastructure, offering basic services to applications and higher-level layers
  • transparent access to data repositories on remote Grid sites (illustrated in the sketch after this list)
  • management of Virtual Organizations
  • workflow
  • ontologies
  • integration of analysis services
  • knowledge services.
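
As a rough illustration of what ‘transparent access to data repositories on remote Grid sites’ can mean at the infrastructure level, the following toy Python sketch resolves a logical data-set name against a catalogue of replicas and fetches whichever copy is reachable, so that callers never deal with physical locations. It is not SIMDAT code; the catalogue, data-set names and URLs are invented, and a real deployment would add security, catalogue services, workflow and the other layers listed above.

    # Toy illustration (not SIMDAT code): location-transparent access to a
    # replicated data set.  A logical name is resolved against a catalogue of
    # replica URLs; the caller only ever sees the logical name.
    # All data-set names and URLs below are hypothetical.
    import urllib.request

    REPLICA_CATALOGUE = {
        "nvh/crash-test-42": [
            "https://grid-site-a.example.org/data/crash-test-42.dat",
            "https://grid-site-b.example.org/data/crash-test-42.dat",
        ],
    }

    def fetch(logical_name):
        """Return the bytes of a logical data set, trying each replica in turn."""
        errors = []
        for url in REPLICA_CATALOGUE[logical_name]:
            try:
                with urllib.request.urlopen(url, timeout=10) as response:
                    return response.read()
            except OSError as exc:  # network or HTTP failure: try the next replica
                errors.append((url, exc))
        raise RuntimeError("no replica reachable for %s: %r" % (logical_name, errors))

    # data = fetch("nvh/crash-test-42")   # caller is unaware of where the data lives

The point of the sketch is the separation of logical names from physical locations; the remaining layers in the list above (Virtual Organisation management, ontologies, workflow, analysis and knowledge services) build further capability on top of this kind of basic, uniform access.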

The strategic objectives of SIMDAT are to:

  • test and enhance data grid technology for product development and production process design
  • develop federated versions of problem-solving environments by leveraging enhanced grid services
  • exploit data grids as a basis for distributed knowledge discovery
  • promote de facto standards for these enhanced grid technologies across a range of disciplines and sectors
  • raise awareness of the advantages of data grids in important industry sectors.

SIMDAT focuses on four exemplar application areas: product design in the automotive, aerospace and pharmaceutical industries, and service provision in meteorology. For each of these application areas a challenging problem has been identified that will be solved using Grid technology. One example is distributed knowledge discovery to enable a better understanding of the different Noise, Vibration and Harshness (NVH) behaviour of different car designs based on the same platform; here, Grid technology will allow all engineers in the development centres of large multinational car manufacturers seamless access to all relevant data.


Mark Parsons, NextGRID Project Chairman

Do we really need a next generation of the Grid?

To some people it seems premature to talk of the next generation of the Grid when in many cases the Grid has yet to deliver according to its original vision. Grid research has come a long way since it was originally mooted–in terms analogous to the electric power grid–as an infrastructure that was always-on and delivered chargeable access to compute, data and other resources when and wherever they were required. Pioneering projects, largely science-based, in Europe, the US and Asia have demonstrated the positive benefits afforded by large-scale, widely distributed computation and data access and such projects are now undertaking previously impracticable scientific research. This is particularly true in the health sector where some large cancer research projects are now gathering speed and will hopefully afford real benefits and breakthroughs across society.

However, although the Grid can be said to be delivering in a scientific context, the same is not true in the business domain. Visit any investment bank in the United Kingdom and they will (privately) talk proudly of the success of their Grid. In reality, they are actually talking about the success of their clustered computing approach. There are two main reasons for this. Firstly, the hijacking of the “Grid” word by over-eager vendor marketing departments following the dot-com bubble in the early part of this decade has confused many potential users about what the Grid is really for–inter-enterprise, joined-up computing. Secondly, and more importantly, the Grid used and promulgated by the science and research communities does not take into account the typical regulatory and management issues faced by many industries. Unless the Grid can be seen to offer real benefits to business, it will remain a powerful tool for science and will be largely ignored by business, except in its simplest application server and clustered computing form. In the worst case we will see a complete divergence in Grid computing between science and business.

It doesn’t have to be like this.

It is very easy to complain that the Grid to date has failed to link its developments to the real needs of its users. In the scientific domain this simply is not true. Wide-ranging requirements-gathering activities have taken place and will continue. These activities have helped to guide the development of the tools most needed by these programmes of scientific research. In most cases these are programmes of research where a specific end-point is reasonably clear, and the main motivation for using Grids is to collaborate in order to pool resources. In the business domain, the requirements that Grids have to meet are far broader and more varied. A wide variety of projects, notably in the UK and Europe, have been undertaken, and there have been many notable demonstrations of the efficacy of Grids both in the cluster and broader Grid contexts. However, these projects did not produce universal solutions spanning many business applications, because different solutions were required in each case. For example, the GRASP project used “traditional” academic Grid principles to support resource sharing within a cooperative of application service providers, providing higher performance and reliability for ASP services, but requiring mutual trust between the providers. The GRIA project implemented an inter-enterprise collaboration infrastructure allowing the users to pool resources obtained on commercial terms from independent service providers. The GEMSS project took a similar approach for medical simulation services, but resources from different service providers cannot be pooled in a single application because that would make it very difficult to meet European privacy regulations for processing patient data.


Perspectives
Craig Mundie, CTO – Microsoft Corporation

The Role of High Performance Computing

I have had a long history in the HPC community: I spent from 1982 to 1992 as the founder and architect of Alliant Computer Systems Corporation in Boston. We spent a long time trying to develop tools and an architecture whose components today would look like they were all fairly slow. But architecturally, many of the concepts that were explored back then by Seymour Cray and the many supercomputer companies–of which Alliant was just one–still to this day represent the basic architectures that are being reproduced and extended as Moore’s Law continues to allow these things to be compacted.

In my present role as CTO of Microsoft, it is probably fair to say that I have been the ‘god-father’ in moving Microsoft to begin to play a role in the area of technical computing. Up to now, the company has really never focused on this area. It is of course true that there are many people in the world who, whether they are in engineering or science, business or academia, use our products like tools on their desktop much like they think of pencil and paper. They would not want to work without them. But such tools are never really considered as an integral part of the mission itself. It is my belief that many of the things that HPC and supercomputing have tended to drive will become important as you look down the road of general computing architectures. The worldwide aggregate software market in technical computing is not all that large on a financial scale. However, Bill Gates and I, over the last couple years, have agreed that engaging with HPC is not just a question of how big the market is for software per se in technical computing. Rather it is a strategic market in the sense of ultimately making sure that there will be well-trained people who will come out of a university environment and help society solve the difficult problems it will be facing in the future. The global society has an increasing need to solve some very difficult large-scale problems in engineering, science, medicine and in many other fields. Microsoft has a huge research effort that has never been focused on such problems. I believe that it is time that we started to assess some application of our research technology outside of our traditional ways of using it within our own commercial products. We think that by doing so, there is a lot that can be learned about what will be the nature of future computing systems.

Many of the things that we thought of as de rigueur in terms of architectural issues and design problems in supercomputers in the late eighties and early nineties have now been shrunk down to a chip. Between 2010 and 2020 many of the things that the HPC community is focusing on today will go through a similar shrinking footprint. We will wake up one day and find that the kind of architectures that we assemble today with blades and clusters are now on a chip and being put into everything. In my work on strategy for Microsoft I have to look at the 10 to 20 year horizon rather than a one to three year horizon. The company’s entry into high performance computing is based on the belief that over the next 10 years or so, there will be a growing number of people who will want to use these kinds of technologies to solve more and more interesting problems. Another of my motivations is my belief that the problem set, even in that first 10-year period, will expand quite dramatically in terms of the types of problems where people will use these kinds of approaches.

There was a time certainly, when I was in the HPC business, that the people who wrote high performance programs were making them for consumption largely in an engineering environment. Only a few HPC codes were more broadly used in a small number of fields of academic research. Today, it is doubtful whether there is any substantive field of academic research in engineering or science that could really progress without the use of advanced computing technologies. And these technologies are not just the architecture and the megaflops but also the tools and programming environments necessary to address these problems.


Perspectives
Dan Reed, Renaissance Computing Institute, University of North Carolina at Chapel Hill

In June 2004, the President's Information Technology Advisory Committee (PITAC) was charged by John Marburger, the President's Science Advisor, to respond to seven questions regarding the state of computational science. Following over a year of hearings and deliberations, the committee released its report, entitled Computational Science: Ensuring America's Competitiveness, in June 2005. What follows are some of my personal perspectives on computational science, shaped by the committee experience. Any wild-eyed, crazy ideas should be attributed to me, not to the committee.

Based on community input and extensive discussions, the PITAC computational science report1 included the following principal finding and recommendation.

Principal Finding. Computational science is now indispensable to the solution of complex problems in every sector, from traditional science and engineering domains to such key areas as national security, public health, and economic innovation. Advances in computing and connectivity make it possible to develop computational models and capture and analyze unprecedented amounts of experimental and observational data to address problems previously deemed intractable or beyond imagination. Yet, despite the great opportunities and needs, universities and the Federal government have not effectively recognized the strategic significance of computational science in either their organizational structures or their research and educational planning. These inadequacies compromise U.S. scientific leadership, economic competitiveness, and national security.

Succinctly, the principal finding highlights the emergence of computational science as the third pillar of scientific discovery, as a complement to theory and experiment. It also highlights the critical importance of computational science to innovation, security and scientific discovery, together with our failure to embrace computational science as a strategic, rather than a tactical capability. In many ways, computational science has been everyone’s “second priority,” rather than the unifying capability it could be.

Principal Recommendation. Universities and the Federal government’s R&D agencies must make coordinated, fundamental, structural changes that affirm the integral role of computational science in addressing the 21st century’s most important problems, which are predominantly multidisciplinary, multi-agency, multisector, and collaborative. To initiate the required transformation, the Federal government, in partnership with academia and industry, must also create and execute a multi-decade roadmap directing coordinated advances in computational science and its applications in science and engineering disciplines.

The principal recommendation emphasizes the silos and stovepipes (choose your favorite analogy) that separate disciplinary domains within computational science. There was widespread consensus from both those who testified and those on the committee that solving many of the most important problems of the 21st century will require integration of skills from diverse groups. The group also felt deeply that current organizational structures in academia and government placed limits on interdisciplinary education and research.

Based on this recognition, the committee's principal recommendation was to create a long-term, regularly updated strategic roadmap of technologies (i.e., software, data management, architectures and systems, and programming and tools), application needs and their interplay. The long-term, strategic aspect of this recommendation cannot be overestimated. Many of our most important computational science challenges cannot be solved in 1-3 years. Nor is a series of three-year plans the same as a 10-15 year plan.

Substantial, sustained investment, driven by multi-agency collaboration, is the only approach that will allow us to escape from our current technology quandary–high-performance computing systems that are based on fragile software and an excessive emphasis on peak performance, rather than sustained performance on important applications. Simply put, today’s computational science ecosystem is unbalanced, with a software and hardware base that is inadequate to keep pace with and support evolving application needs. By starving research in enabling software and hardware, the imbalance forces researchers to build atop crumbling and inadequate foundations. The result is greatly diminished productivity for both researchers and computing systems.

Similarly, we must embrace the data explosion from large-scale instruments and ubiquitous, microscale sensors–the personal petabyte is in sight! Given the strategic significance of this scientific trove, the Federal government must provide long-term support for computational science community data repositories. HPC cannot remain synonymous with computing, but must be defined broadly to include distributed sensors and storage.

Opportunities for the Future

In the 19th and 20th centuries, proximity to transportation systems (navigable rivers, seaports, railheads, and airports) was critical to success. Cities grew and developed around such transportation systems, providing jobs and social services. In today’s information economy, high-speed networking, data archives and computing systems play a similar role, connecting intellectual talent across geographic barriers via virtual organizations (VOs)–teams drawn from multiple organizations, with diverse skills and access to wide ranging resources, that can coordinate and leverage intellectual talent. Two examples serve to illustrate both the challenges and the opportunities that could accrue from visionary application of computational science.

Disaster Response. Hurricane Katrina drove home the centrality of VOs. In computational science terms, a rapid-response VO would include integrated hurricane, storm surge, tornado spawning, environmental, transportation, communication and human dynamics models, together with the experts needed to analyze model outputs and shape public policy for evacuation, remediation and recovery. Computationally, solving such a complex problem requires real-time data fusion from wide arrays of distributed sensors, large and small; coupled, computationally intense environmental models; and social behavior models. There are thousands of such 21st century problems, each awaiting application of computational science tools and techniques.

Systems Biology. The fusion of knowledge from genomics, protein structure, enzyme function and pathway and regulatory models to create systemic models of organelles, cells and organisms and their relation to the environment is one of the great biological challenges of the 21st century. By combining information from experiments, data gleaned from mining large-scale archives (e.g., genomic, proteomic, structural and other data), and large-scale biological simulations and computational models, we can gain insights into function and behavior–understanding life in a deep way. The time is near to mount a multidisciplinary effort to create artificial life, a computational counterpart to Craig Venter’s minimal genome project. Such an effort would combine engineering, genomics, proteomics and systems biology expertise, with profound implications for medicine and deep insights into biology.

The computational science opportunities have never been greater. It is time to act with vision and sustained commitment.

References
1 The PITAC report on computational science can be downloaded from www.nitrd.gov. Paper copies of the report can be requested there as well.
