CTWatch Quarterly » Trends in Cyberinfrastructure for Bioinformatics and Computational Biology

Introduction

Trends in Cyberinfrastructure for Bioinformatics and Computational Biology

Rick Stevens, Associate Laboratory Director, Computing and Life Sciences – Argonne National Laboratory, Professor, Computer Science Department – The University of Chicago

What are some notable accomplishments in applying CI to biology research?

There are a handful of systems that have fundamentally changed how biologists work. The most important has been the system developed by the National Center for Biotechnology Information,¹ including Entrez, a Google-like search engine that supports searching across many types of biological data. Similar systems exist in Europe² and Japan.³ These and related systems, which started out as outgrowths of genome and protein sequence databases, have given the global community access to sequence data and, more recently, to publications, annotations, linkage maps, expression data, phylogeny data, metabolic pathways, regulatory and signaling data, compounds, and molecular structures. Search techniques have expanded from keywords to computed properties (sequence similarity and, more generally, "associations") that enable one to find connections between biological or chemical entities. While these systems have enormous user bases and require considerable computing capability for indexing and integration, they are essentially client/server in nature, and the computing that an end user can request is closely controlled.

Approximately a decade ago, a number of groups began to produce more flexible tools that support a less structured workflow, enabling users to construct their own mini-environments for pursuing computational approaches to problems. One of the first such systems was the Biology Workbench, developed at the University of Illinois and now hosted at the University of California, San Diego.⁴ Other systems were developed to provide access to a specific type of data (e.g., microbial genomes) through well-engineered data integration. These systems are often associated with teams of curators. Three are particularly important: The Institute for Genomic Research's Comprehensive Microbial Resource;⁵ the SEED, an annotation system developed by the Fellowship for the Interpretation of Genomes at the University of Chicago;⁶ and the DOE Joint Genome Institute's Integrated Microbial Genomes resource.⁷ These systems give the user an integrated view of hundreds of genomes and a rich environment for discovery.

Are there some good road mapping documents available?

In the past couple of years, the community has produced several worthwhile road-mapping documents. In general, these reports attempt to identify trends in the field and provide some structure for understanding its directions. The first is a report from the NSF committee for building a cyberinfrastructure for the biological sciences;⁸ the second is the National Academy of Sciences report on computing and biology.⁹ The third, more oriented toward systems biology, is a program roadmap developed by the DOE for its Genomes to Life program;¹⁰ it contains a section on the computing and infrastructure needed to build out systems biology, with a focus on microbial organisms, energy, and the environment. All three documents are worth reading to understand where the field is going.

