CTWatch Quarterly » Trends in Cyberinfrastructure for Bioinformatics and Computational Biology

Introduction

Trends in Cyberinfrastructure for Bioinformatics and Computational Biology

Rick Stevens, Associate Laboratory Director, Computing and Life Sciences – Argonne National Laboratory, Professor, Computer Science Department – The University of Chicago

The following table gives examples of high-impact problems that could be addressed in the next two to three years on an open-access petascale platform and that leverage methods already ported to the IBM BG/L platform.

Biology Problem Area	@ 360 TF/s	@ 1,000 TF/s	@ 5,000 TF/s
Determining the detailed evolutionary history of each protein family ⇒ This will enable rational planning for structural biology initiatives and will provide a foundation for assessing protein function and diversity	3,000 hours to build reference database	300 hours to build reference database	60 hours to build reference database
Determining the frequency and detailed nature of horizontal gene transfers in prokaryotes ⇒ This will shed light on the molecular and genetic mechanisms of evolution by means other than direct “Darwinian” descent and will contribute to our understanding of the acquisition of virulence and drug resistance in pathogens and the means by which prokaryotes adapt to the environment	1,000 hours to study 200 gene families	1,000 hours to study 2,000 gene families	1,000 hours to study 10,000 gene families
Automated construction of core metabolic models for all the sequenced DOE genomes ⇒ This will enable dramatic acceleration of the promise of the GTL program and the use of microbial systems to address DOE mission needs in energy, environment, and science	One hour per organism, 100 hours per metagenome	10 organisms per hour, 10 hours per metagenome	50 organisms per hour, two hours per metagenome
Predict essential genes for all known sequenced micro-organisms ⇒ This will enable a broader class of genes and gene products to be targeted for potential drugs and to predict culturability conditions for environmental microbes	300 hours for 1,000 organisms, 10 hours to predict culturability per organism	30 hours for 1,000 organisms, one hour to predict culturability per organism	30 hours for 5,000 organisms
Computational screening of all known microbial drug targets against the public and private databases of chemical compounds to identify potential new inhibitors and potential drugs ⇒ The resulting database would be a major national biological research resource that would have a dramatic impact on worldwide health research and fundamental science of microbiology	2 M ligands per day per target (1 year to screen all microbial targets)	20 M ligands per day per target (~1 month to screen all microbial targets)	1 machine year to screen all known human drug targets
Model and simulate the precise cellulose degradation and ethanol and butanol biosynthesis pathways at the protein/ligand level to identify opportunities for molecular optimization ⇒ This would result in a set of model systems to be further developed for optimization of the production of biofuels	Simulate in detail the directed evolution of individual enzymes	Simulate the co-evolution and optimization of a degradation or biosynthesis pathway of up to five enzymes	Simulate the optimization of a complete cellulose-to-ethanol or cellulose-to-butanol production system of over a dozen enzymatic steps
Model and simulate the replication of DNA to understand the origin and repair mechanisms of genetic mutations ⇒ This would result in dramatic progress in the fundamental understanding of how nature manages mutations and understanding which molecular factors determine the broad range of organism susceptibility to radiation and other mutagens	30 ns simulation of DNA polymerase	10 ensembles of different DNA repair enzymes	Complete polymerase-mediated base pair addition step
Model and simulate the process of DNA transcription and protein translation and assembly ⇒ This would enable us to move forward on understanding post-transcription and post-translation modification and epigenetic regulation of protein synthesis	Validate current understanding of ribosomal function	Explore spliceosome function and the evolution of intron/exon functions	Model the complete coupled processes of DNA transcription to protein translation including regulatory processes
Model and simulate the interlinked metabolisms of microbial communities ⇒ This project is relevant to understanding the biogeochemical cycles of extreme, natural and disturbed environments and will lead to the development of strategies for the production of bio-fuels and the development of new bio-engineered processes based on exploiting communities rather than individual organisms	20 organisms in a linked metabolic network	100 organisms in a linked metabolic network	200 organisms in a linked metabolic network
In silico prediction of mutations and activity, conformational changes, active site alterations	One-enzyme optimization	Five-enzyme pathway optimization	Eight-enzyme pathway optimization
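
The projections in the table do not all scale in proportion to peak performance. As a rough illustration, the sketch below takes the three figures from the first row (hours to build the reference database) and compares the implied speedup with the ratio of peak TF/s, using the 360 TF/s column as the baseline. The function name and output layout are illustrative assumptions, not part of the original article.

```python
# Figures copied from the first table row: hours to build the
# protein-family reference database at each peak performance level.
PEAK_TFLOPS = [360, 1000, 5000]
HOURS = [3000, 300, 60]


def scaling_summary(peak_tflops, hours):
    """Return (peak, hours, compute_ratio, speedup) per column.

    Both ratios are measured against the first column (the baseline).
    The function name and tuple layout are hypothetical.
    """
    base_peak, base_hours = peak_tflops[0], hours[0]
    rows = []
    for peak, h in zip(peak_tflops, hours):
        compute_ratio = peak / base_peak  # how much more peak compute
        speedup = base_hours / h          # how much less wall-clock time
        rows.append((peak, h, compute_ratio, speedup))
    return rows


for peak, h, ratio, speedup in scaling_summary(PEAK_TFLOPS, HOURS):
    print(f"{peak:>5} TF/s: {h:>5} h  compute x{ratio:.1f}  speedup x{speedup:.1f}")
```

For this row the projected time drops tenfold between the first two columns while peak performance rises by less than a factor of three, so the estimates appear to assume more than hardware scaling alone. The table itself does not state what accounts for the difference.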

