CTWatch
August 2007
The Coming Revolution in Scholarly Communications & Cyberinfrastructure
Timo Hannay, Nature Publishing

Open data and mashups

Another area with huge potential – but one that I have space to deal with only briefly here – is that of open scientific data sets and the forms of interoperability that allow them to be transferred not only between scientists but also between applications, in order to create new visualizations and other useful transformations. There are numerous challenges, but there is also progress to report on each front. Too often scientists are unwilling to share data, whether for competitive or other reasons, though funders (and some publishers) increasingly require them to do so. Even when the data are available, they usually lack the consistent formats and unambiguous metadata that would allow them to be imported efficiently into a new application and interpreted correctly by a researcher who was not present when they were collected. Yet data standards such as CML [66] and SBML [67] are emerging, as are metadata standards such as MIAME [68]. As software applications also adopt these standards, we enter a virtuous circle in which there are increasing returns (at least at the global level) to openly sharing data using common standards.

For a glimpse of the benefits this can bring, witness the work of my colleague Declan Butler, a journalist at Nature. While covering avian flu, he noticed that information about global outbreaks was fragmented, incompatible, and often confidential. So he took it upon himself to gather what data he could, merge them, and provide them in the form of a KML file, the data format used by Google Earth [69]. Shortly afterwards he overlaid poultry density data [70]. This not only made the information available in one place, it also made it much more readily comprehensible to experts and non-experts alike. Imagine the benefits if this approach, largely the work of one man, were replicated across all of science.

Whither the scientific web?

Over the last 10 years or so, much of the discussion about the impact of the web on science – particularly among publishers – has been about the way in which it will change scientific journals. Sure enough, these have migrated online, with huge accompanying improvements in accessibility and utility. For all but a very small number of widely read titles, the day of the print journal seems to be almost over. Yet to see this development as the major impact of the web on science would be extremely narrow-minded – equivalent to viewing the web primarily as an efficient PDF distribution network. Though it will take longer to have its full effect, the web's major impact will be on the way that science itself is practiced.

The barriers to full-scale adoption are not only (or even mainly) technical, but also social and psychological. This makes the timings almost impossible to predict, but the long-term trends are already unmistakable: greater specialization in research, more immediate and open information-sharing, a reduction in the size of the 'minimum publishable unit,' productivity measures that look beyond journal publication records, a blurring of the boundaries between journals and databases, reinventions of the roles of publishers and editors, greater use of audio and video, more virtual meetings. And most important of all, arising from this gradual but inevitable embrace of technology, an increase in the rate at which new discoveries are made and exploited for our benefit and that of the world we inhabit.

References
1. Now called the Web 2.0 Summit – www.web2summit.com/
2. O'Reilly, T. "What Is Web 2.0?" www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.ht..., 2005.
3. O'Reilly, T. Architecture of Participation. www.oreillynet.com/pub/wlg/3017/, 2003.
4. Berners-Lee, T. "Weaving the Web," Texere (London), 2000.
5. Sifry, D. "The State of the Live Web," www.sifry.com/alerts/archives/000493.html, 2007.
6. Nielsen, J. A Billion Internet Users. www.useit.com/alertbox/internet_growth.html, 2005.
7. Nature Publishing Group estimate.
8. The science blog aggregator Postgenomic.com lists 735 blogs in its index but certainly misses some.
9. Anonymous. "The invisible hand on the keyboard," The Economist 3rd August, 2006.
10. Alexa – www.alexa.com/
11. ScienceBlogs – www.scienceblogs.com/
12. Postgenomic.com – www.postgenomic.com/
13. Postgenomic.com zeitgeist – www.postgenomic.com/stats.php
14. Anonymous. "Overview: Nature's peer review trial," www.nature.com/nature/peerreview/debate/nature05535.html, 2006.
15. Liu, S.V. "Why are people reluctant to join in open review?" Nature, Vol. 447, p. 1052, 2007.
16. Leuf, B. and Cunningham, W. "The Wiki Way," Addison-Wesley, 2001.
17. WikiSpecies – species.wikimedia.org/wiki/Main_Page
18. Proteins Wiki – proteins.wikia.com/wiki/Main_Page
19. OpenWetWare – openwetware.org/
20. UsefulChem – usefulchem.wikispaces.com/
21. Anonymous. "Los Angeles Times Suspends 'Wikitorials'," Associated Press, www.msnbc.msn.com/id/8300420/, 2005.
22. A Million Penguins – www.amillionpenguins.com/wiki/index.php/Main_Page
23. Freebase – www.freebase.com/
24. Berners-Lee, T., Hendler, J., Lassila, O. "The Semantic Web," Scientific American, 2001.
25. Slashdot – slashdot.org/
26. digg – digg.com/
27. See the following comparison on Alexa: tinyurl.com/2faroc
28. ChemRank – www.chemrank.com/
29. SciRate – scirate.com/
30. BioWizard – www.biowizard.com/
31. DissectMedicine – www.dissectmedicine.com/
32. Nature China – www.natureasia.com/ch/
33. Scintilla – scintilla.nature.com/
34. arXiv.org – arxiv.org/
35. Scribd – www.scribd.com/
36. YouTube – www.youtube.com/
37. SlideShare – www.slideshare.net/
38. Nature Precedings – precedings.nature.com/
39. Journal of Visualized Experiments – www.jove.com/
40. MySpace – www.myspace.com/
41. Facebook – www.facebook.com/
42. Gonzalez, N. "Facebook Users Up 89% Over Last Year; Demographic Shift," TechCrunch, 2007.
43. LinkedIn – www.linkedin.com/
44. Sermo – www.sermo.com/
45. Nature Network – network.nature.com/
46. OpenID – openid.net/
47. craigslist – sfbay.craigslist.org/
48. Jobster – www.jobster.com/
49. NHS Jobs – www.jobs.nhs.uk/
50. Wilson, F. "The Freemium Business Model," avc.blogs.com/a_vc/2006/03/the_freemium_bu.html, 2006.
51. EBay – www.ebay.com/
52. Elance – www.elance.com/
53. InnoCentive – www.innocentive.com/
54. Second Life – www.secondlife.com/
55. Nature Publishing Group estimate.
56. Linden, P. "Embracing the Inevitable," blog.secondlife.com/2007/01/08/embracing-the-inevitable/, 2007.
57. Hammond, T., Hannay, T., Lund, B., Scott, J. "Social Bookmarking Tools (I): A General Review," D-Lib, Vol. 11, no. 4, 2005.
58. del.icio.us – del.icio.us/
59. Vander Wal, T. Folksonomy. www.vanderwal.net/folksonomy.html, 2007.
60. Shirky, C. "Ontology Is Overrated," www.shirky.com/writings/ontology_overrated.html, 2005.
61. Weinberger, D. "Everything is Miscellaneous," Times Books, 2007.
62. Hannay, T. "Introduction," tagsonomy.com/index.php/introduction-timo-hannay/, 2005.
63. Connotea – www.connotea.org/
64. EPrints – www.eprints.org/
65. Lund, B. "Tagging and Bookmarking In Institutional Repositories," blogs.nature.com/wp/nascent/2006/03/tagging_and_bookmarking_in_ins.htm..., 2006.
66. Chemical Markup Language – www.ch.ic.ac.uk/cml/
67. Systems Biology Markup Language – sbml.org/
68. Minimum Information About a Microarray Experiment – www.mged.org/Workgroups/MIAME/miame.html
69. KML – code.google.com/apis/kml/documentation/
70. Butler, D. "The spread of avian flu with time; new maps exploiting Google Earth’s time series function," declanbutler.info/blog/?p=58, 2007.

Reference this article
Hannay, T. "Web 2.0 in Science," CTWatch Quarterly, Volume 3, Number 3, August 2007. http://www.ctwatch.org/quarterly/articles/2007/08/web-20-in-science/