CTWatch
August 2006
Trends and Tools in Bioinformatics and Computational Biology
Eric Jakobsson, National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign

All leading-edge research in biology now uses computation, thanks to the development of useful tools for data gathering, data management, analysis, and simulation of biological systems. While much remains to be done to improve these tools, there is also a completely new frontier to be attacked. The new initiatives will require much more interaction between application scientists and cyberinfrastructure architects than has previously been the case. The single word that provides a common thread for the initiatives needed in the next few years is Integration, specifically:

  • Integration of time and length scales of description.
  • Integration of informatics, dynamics, and physics-based approaches.
  • Integration of heterogeneous data forms.
  • Integration of basic science with engineering design.
  • Integration of algorithmic development with computing architecture design.

Integration of time and length scales of description

Biological systems display important dynamics on time scales ranging from femtoseconds and faster (e.g., interactions with electromagnetic radiation) to billions of years (evolution), and on distance scales ranging from single atoms to the entire biosphere. Events at all time and length scales are linked to each other. As perhaps the most extreme example, the emergence of the photosynthetic reaction center (a protein that couples absorption of photons with synthesis of other biological molecules) over a billion years ago produced, as a by-product, a major change in the composition of the atmosphere (an increase in oxygen) that profoundly altered the course of biological evolution from that time on. Yet the vast majority of the computational tools that we use to understand biology are specialized to a narrow range of time and length scales. We badly need computing environments that facilitate analysis and simulation across time and length scales, so that we may achieve a quantitative understanding of how these scales link to each other.

Integration of informatics, dynamics, and physics-based approaches

There are three core foundations of computational biology: a) information-based approaches, exemplified by sequence-based informatics and correlational analysis of systems biology data; b) physics-based approaches, in which biological data analysis and simulation are founded in physical and chemical theory; and c) approaches based on dynamical analysis and simulation, notably exemplified by successful dynamics models in neuroscience, ecology, and viral-immune system interactions. Typically these approaches are developed by different communities of computational biologists and pursued largely independently of each other. There is great synergy, however, when the three approaches are integrated in pursuing solutions to major biological problems. This can be seen notably in molecular and cellular neuroscience. Understanding of the entire field is largely organized around the dynamical systems model first put forth by Hodgkin and Huxley, which also had an underpinning of continuum physical chemistry and electrical engineering theory. Extending the systems and continuum understanding to the molecular level depended on using informatics to identify crystallizable versions of the membrane proteins underlying excitability. Physics-based computing has been essential to interpreting the structural data and to understanding the relationship between the structures and the function of the excitability proteins. All areas of biology need a comparable synergy between the different types of computing. As a corollary, we need to train computational biologists who can use, and participate in developing, all three types of approach.
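
To make the dynamical systems approach concrete, here is a minimal Python sketch of a Hodgkin-Huxley-type membrane model, integrated with a simple forward Euler scheme. It is an illustration rather than anything taken from this article: the parameter values are the commonly quoted textbook values for the squid giant axon in the modern sign convention (resting potential near -65 mV), and the function names (`rates`, `simulate`, `_ratio`), the stimulus protocol, and the time step are assumptions chosen for the example.

```python
import math

# Assumed textbook parameters (squid giant axon, modern sign convention).
C_M = 1.0                               # membrane capacitance, uF/cm^2
G_NA, G_K, G_L = 120.0, 36.0, 0.3       # maximal conductances, mS/cm^2
E_NA, E_K, E_L = 50.0, -77.0, -54.387   # reversal potentials, mV


def _ratio(x, y):
    """Return x / (1 - exp(-x / y)), using its limit y when x is near 0."""
    if abs(x / y) < 1e-6:
        return y
    return x / (1.0 - math.exp(-x / y))


def rates(v):
    """Opening (alpha) and closing (beta) rates in 1/ms for gates m, h, n."""
    a_m = 0.1 * _ratio(v + 40.0, 10.0)
    b_m = 4.0 * math.exp(-(v + 65.0) / 18.0)
    a_h = 0.07 * math.exp(-(v + 65.0) / 20.0)
    b_h = 1.0 / (1.0 + math.exp(-(v + 35.0) / 10.0))
    a_n = 0.01 * _ratio(v + 55.0, 10.0)
    b_n = 0.125 * math.exp(-(v + 65.0) / 80.0)
    return (a_m, b_m), (a_h, b_h), (a_n, b_n)


def simulate(i_ext, t_stop=50.0, dt=0.01, t_on=5.0, v0=-65.0):
    """Return the membrane potential trace (mV) for a current step.

    i_ext is the injected current density (uA/cm^2), switched on at t_on (ms).
    Times are in ms. The trace includes the initial value.
    """
    v = v0
    # Start each gating variable at its steady state for the initial voltage.
    m, h, n = (a / (a + b) for a, b in rates(v))
    trace = [v]
    for step in range(int(round(t_stop / dt))):
        t = step * dt
        i_stim = i_ext if t >= t_on else 0.0
        # Ionic currents (uA/cm^2): sodium, potassium, leak.
        i_na = G_NA * m ** 3 * h * (v - E_NA)
        i_k = G_K * n ** 4 * (v - E_K)
        i_l = G_L * (v - E_L)
        (a_m, b_m), (a_h, b_h), (a_n, b_n) = rates(v)
        # Forward Euler update of the four coupled state variables.
        dv = (i_stim - i_na - i_k - i_l) / C_M
        m += dt * (a_m * (1.0 - m) - b_m * m)
        h += dt * (a_h * (1.0 - h) - b_h * h)
        n += dt * (a_n * (1.0 - n) - b_n * n)
        v += dt * dv
        trace.append(v)
    return trace


if __name__ == "__main__":
    trace = simulate(i_ext=10.0)
    print("peak membrane potential (mV):", max(trace))
```

Forward Euler is used only to keep the sketch short; a production simulation would normally use an adaptive or implicit integrator. Even so, the sketch shows the character of this class of model: four coupled ordinary differential equations whose parameters (conductances, reversal potentials, rate functions) are exactly the quantities that structural and physics-based studies of the channel proteins aim to explain.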

Integration of heterogeneous data forms

The types of data relevant to any particular biological problem are quite varied, including literature reports, sequence data, microarray data, proteomics data, a wide array of spectroscopies, diffraction data, time series of dynamical systems, simulation results, and many more. There is a major need for an integrated infrastructure that enables the researcher to search, visualize, analyze, and build models from all of the data relevant to a particular biological problem. The Biology Workbench [1] is a notable example of such integration in the specific domain of sequence data. This approach needs to be extended to much more varied and complex data forms.

Reference this article
Jakobsson, E. "Specifications for the Next-Generation Computational Biology Infrastructure," CTWatch Quarterly, Volume 2, Number 3, August 2006. http://www.ctwatch.org/quarterly/articles/2006/08/specifications-for-the-next-generation-computational-biology-infrastructure/