CTWatch Quarterly

Introduction

Trends in Cyberinfrastructure for Bioinformatics and Computational Biology

Rick Stevens, Associate Laboratory Director, Computing and Life Sciences – Argonne National Laboratory, Professor, Computer Science Department – The University of Chicago

In this issue you will find a number of articles outlining the current trends and tool development strategies in the bioinformatics and computational biology community. It is by necessity an incomplete survey of today’s thinking about the directions of our field. In addition to the four submitted articles, I’ve enclosed my thoughts on a few of the questions likely to be of interest to CTWatch readers.

What is the most important trend today in biology research?

Probably the most important trend in modern biology is the increasing availability of high-throughput (HT) data. The earliest forms of HT data were genome sequences and, to a lesser degree, protein sequences; now, however, many forms of biological data are produced by automated or semi-automated experimental systems. This data includes gene expression data, protein expression data, metabolomics, mass spectrometry data, imaging of all sorts, protein structures, and the results of mutagenesis and screening experiments conducted in parallel. The growing quantity and diversity of data are therefore major trends in their own right. To gain biological meaning from this data, it must be integrated (by finding and constructing correspondences between elements) and curated (checked for errors, linked to the literature and to previous results, and organized). The challenges of producing high-quality, integrated datasets are immense and long term.

The second trend is a general acceleration in the pace at which researchers can pursue questions that are answerable by computation and by HT experiments. Using the computer, a researcher can be 10 or 100 times more efficient than by using wet lab experiments alone, and bioinformatics can identify the critical experiments needed to address a specific question of interest. Thus, biologists who are able to leverage bioinformatics work in a fundamentally different performance regime than those who can’t.

The third trend is the emergence of simulation and modeling technologies that will eventually lead to predictive biological theory. Today, simulation and modeling applied at the whole-cell level suggest what is to come: the ability to predict an organism’s phenotype computationally from just its genome and environmental conditions. That capability is probably five years away for microbial organisms and 10 to 20 years away for complex eukaryotes (such as the mouse and human).

What is the role of cyberinfrastructure in biological research?

As I noted above, modern biology will become increasingly coupled to modern computing environments. This means that progress in some (but not all) biological investigations will become limited by the pace of cyberinfrastructure development. Once cyberinfrastructure is more fully developed and in place, it will certainly make it much easier for biologists to gain access to both data and computing resources (perhaps without their even knowing it). Today, we see early signs of how some groups will use access to large-scale computing to support their communities, by developing gateways or portals that provide access to integrated databases and computing capabilities behind a web-based user interface. But that is just the beginning. It is possible to imagine that, in the future, laboratories will be linked directly to data archives and to each other, so that experimental results flow from HT instruments directly into databases coupled to computational tools that automatically integrate the new data and perform quality-control checks in real time (not unlike how high-energy physics and astronomy work today). In field research, cyberinfrastructure will not only connect researchers to their databases and tools while they are in the field, but also enable the development of automated instruments that keep working in the field after the scientists and graduate students have returned home.

