The following table gives examples of high-impact problems that could be addressed in the next two to three years on an open-access petascale platform and that leverage methods that have already been ported to the IBM BG/L platform.
| Biology Problem Area | @ 360 TF/s | @ 1000 TF/s | @ 5000 TF/s |
| --- | --- | --- | --- |
| Determining the detailed evolutionary history of each protein family ⇒ This will enable rational planning for structural biology initiatives and will provide a foundation for assessing protein function and diversity | 3,000 hours to build reference database | 300 hours to build reference database | 60 hours to build reference database |
| Determining the frequency and detailed nature of horizontal gene transfers in prokaryotes ⇒ This will shed light on the molecular and genetic mechanisms of evolution by means other than direct “Darwinian” descent and will contribute to our understanding of the acquisition of virulence and drug resistance in pathogens and the means by which prokaryotes adapt to the environment | 1,000 hours to study 200 gene families | 1,000 hours to study 2,000 gene families | 1,000 hours to study 10,000 gene families |
| Automated construction of core metabolic models for all the sequenced DOE genomes ⇒ This will enable dramatic acceleration of the promise of the GTL program and the use of microbial systems to address DOE mission needs in energy, environment, and science | One hour per organism, 100 hours per metagenome | 10 organisms per hour, 10 hours per metagenome | 50 organisms per hour, two hours per metagenome |
| Predict essential genes for all known sequenced micro-organisms ⇒ This will enable a broader class of genes and gene products to be targeted for potential drugs and to predict culturability conditions for environmental microbes | 300 hours for 1,000 organisms, 10 hours to predict culturability per organism | 30 hours for 1,000 organisms, one hour to predict culturability per organism | 30 hours for 5,000 organisms |
| Computational screening of all known microbial drug targets against the public and private databases of chemical compounds to identify potential new inhibitors and potential drugs ⇒ The resulting database would be a major national biological research resource that would have a dramatic impact on worldwide health research and the fundamental science of microbiology | 2 M ligands per day per target (1 year to screen all microbial targets) | 20 M ligands per day per target (~1 month to screen all microbial targets) | 1 machine year to screen all known human drug targets |
| Model and simulate the precise cellulose degradation and ethanol and butanol biosynthesis pathways at the protein/ligand level to identify opportunities for molecular optimization ⇒ This would result in a set of model systems to be further developed for optimization of the production of biofuels | Simulate in detail the directed evolution of individual enzymes | Simulate the co-evolution and optimization of a degradation or biosynthesis pathway of up to five enzymes | Simulate the optimization of a complete cellulose-to-ethanol or cellulose-to-butanol production system of over a dozen enzymatic steps |
| Model and simulate the replication of DNA to understand the origin of genetic mutations and the mechanisms that repair them ⇒ This would result in dramatic progress in the fundamental understanding of how nature manages mutations and of which molecular factors determine the broad range of organism susceptibility to radiation and other mutagens | 30 ns simulation of DNA polymerase | 10 ensembles of different DNA repair enzymes | Complete polymerase-mediated base pair addition step |
| Model and simulate the process of DNA transcription and protein translation and assembly ⇒ This would enable us to move forward on understanding post-transcriptional and post-translational modification and epigenetic regulation of protein synthesis | Validate current understanding of ribosomal function | Explore spliceosome function and the evolution of intron/exon functions | Model the complete coupled processes of DNA transcription to protein translation including regulatory processes |
| Model and simulate the interlinked metabolisms of microbial communities ⇒ This project is relevant to understanding the biogeochemical cycles of extreme, natural and disturbed environments and will lead to the development of strategies for the production of bio-fuels and the development of new bio-engineered processes based on exploiting communities rather than individual organisms | 20 organisms in a linked metabolic network | 100 organisms in a linked metabolic network | 200 organisms in a linked metabolic network |
| In silico prediction of mutations and activity, conformational changes, active site alterations | One enzyme | Five-enzyme pathway | Eight-enzyme pathway optimization |
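
The table does not state the scaling model behind its estimates, so it can help to compare each row's implied speedup with the ratio of peak machine speeds. The following is a minimal sketch of that arithmetic for the reference-database row only. The figures are copied from the table; the function name `relative_gains` and the variable names are illustrative assumptions, not part of the original analysis.

```python
# Peak speeds from the table's column headers (TF/s).
PEAK_TFLOPS = [360, 1000, 5000]

# Hours to build the protein-family reference database, from the first table row.
REFERENCE_DB_HOURS = [3000, 300, 60]


def relative_gains(peaks, hours):
    """Return (peak ratio, implied speedup) per column, relative to the first column.

    The peak ratio is how much faster the machine is on paper; the implied
    speedup is how much shorter the table's time estimate is.
    """
    base_peak, base_hours = peaks[0], hours[0]
    return [(p / base_peak, base_hours / h) for p, h in zip(peaks, hours)]


if __name__ == "__main__":
    gains = relative_gains(PEAK_TFLOPS, REFERENCE_DB_HOURS)
    for peak, (peak_ratio, speedup) in zip(PEAK_TFLOPS, gains):
        print(f"{peak:>5} TF/s: peak ratio x{peak_ratio:.1f}, implied speedup x{speedup:.1f}")
```

For this row the estimates imply speedups of 10x and 50x against peak ratios of roughly 2.8x and 13.9x, so the table's figures are evidently not a simple linear extrapolation of peak speed. The source does not explain the difference, so the entries are best read as order-of-magnitude estimates.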






