CTWatch
August 2007
The Coming Revolution in Scholarly Communications & Cyberinfrastructure
Herbert Van de Sompel, Los Alamos National Laboratory
Carl Lagoze, Cornell University

5. Conclusion

Compound information objects are becoming the norm rather than the exception in the new scholarly communication environment. As a result, it is essential to augment the existing technical communication infrastructure with an interoperable approach that allows using, re-using, referencing, and discovering them across the borders of scholarly disciplines and applications. The international OAI-ORE effort works towards a solution that fully leverages the web architecture and that consists of publishing Resource Maps that describe compound objects, referencing resources in their compound object context, and mechanisms to facilitate discovery of Resource Maps.

Although OAI-ORE has made significant conceptual progress since it started in September 2006, important questions remain unanswered. How will the solution deal with versioning? How can the trustworthiness of Resource Maps be assessed? Which kinds of relationship types should OAI-ORE define to support bootstrapping adoption, and which should be left to individual communities? Which technologies should be used to represent Resource Maps, and how does a choice affect potential adoption? Some of these questions will receive at least a preliminary answer by the end of September 2007, which is the deadline that OAI-ORE has set itself for the release of a public alpha specification. Following that release, OAI-ORE will encourage experimentation by various scholarly communities and solicit feedback from potential stakeholders worldwide. The insights gained from those activities will be taken into account for a version 1 specification that is planned for September 2008.

Appendix

In May 2007, the Digital Library Research & Prototyping Team of the Los Alamos National Laboratory launched an experiment to explore Resource Map publishing as a means to expose compound object boundary information to the web. More particularly, the experiment explored whether an existing web application could take advantage of published Resource Maps without requiring any modification to the application itself. The experiment focused on archiving compound information objects as they evolve over time. The applications used were the Internet Archive’s Heritrix toolkit, which contains a web crawler, and its Wayback Machine user interface.

The experiment’s optimistic scenario assumes that Resource Map publishing has become so commonplace that the Internet Archive actively collects Resource Maps. The experiment zooms in on two publishers that make Resource Maps discoverable via dedicated Sitemaps. When a Resource Map listed in a Sitemap changes, its associated Sitemap date-time is updated. When a new Resource Map is published, it is added to the Sitemap. The Internet Archive uses these Sitemaps and the date-times they contain as a trigger to collect and archive Resource Maps as well as the resources they reference. As a result, the Wayback Machine allows a user to search for a specific Resource Map as of a specific date and to immediately see the resources referenced by that Resource Map as they existed on that same date. Because Resource Maps expose the boundaries of compound objects, the net result is in effect an archive of evolving compound objects, versioned by the date-time of the Resource Map that describes them.
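The Sitemap-driven trigger described above can be illustrated with a minimal sketch. It is not the code used in the experiment and does not reflect how Heritrix or the Wayback Machine work internally. It assumes that the Sitemaps follow the public sitemaps.org protocol (`urlset`, `url`, `loc`, `lastmod`) and that every date-time carries an explicit UTC offset. All function names and the example URLs are hypothetical.

```python
import xml.etree.ElementTree as ET
from bisect import bisect_right
from datetime import datetime

# Namespace of the public Sitemap protocol (assumed format of the Sitemaps).
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text):
    """Map each listed Resource Map URL to its Sitemap date-time."""
    entries = {}
    for url in ET.fromstring(xml_text).findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        if loc and lastmod:  # an entry without a date-time cannot act as a trigger
            entries[loc.strip()] = datetime.fromisoformat(lastmod.strip())
    return entries


def maps_to_collect(sitemap, collected):
    """Return URLs that are new or whose date-time is later than the last snapshot."""
    return sorted(
        loc for loc, modified in sitemap.items()
        if loc not in collected or modified > collected[loc][-1]
    )


def collect(sitemap, collected):
    """Record one snapshot date-time for every new or changed Resource Map."""
    for loc in maps_to_collect(sitemap, collected):
        collected.setdefault(loc, []).append(sitemap[loc])


def version_as_of(collected, loc, when):
    """Return the latest snapshot date-time at or before `when`, or None."""
    versions = collected.get(loc, [])  # kept in ascending order by collect()
    index = bisect_right(versions, when)
    return versions[index - 1] if index else None


def sitemap_xml(entries):
    """Build a tiny Sitemap document from (url, date-time) pairs (test helper)."""
    body = "".join(
        f"<url><loc>{loc}</loc><lastmod>{mod}</lastmod></url>" for loc, mod in entries
    )
    return f'<urlset xmlns="{NS["sm"]}">{body}</urlset>'


# Hypothetical Resource Map URLs for one publisher.
RM_1 = "http://publisher.example.org/rem/1"
RM_2 = "http://publisher.example.org/rem/2"
RM_3 = "http://publisher.example.org/rem/3"

collected = {}  # url -> list of snapshot date-times

# First visit: two Resource Maps are listed, both are collected.
first = parse_sitemap(sitemap_xml([
    (RM_1, "2007-05-01T00:00:00+00:00"),
    (RM_2, "2007-05-03T00:00:00+00:00"),
]))
collect(first, collected)

# Second visit: RM_1 changed, RM_2 is unchanged, RM_3 is newly published.
second = parse_sitemap(sitemap_xml([
    (RM_1, "2007-05-10T00:00:00+00:00"),
    (RM_2, "2007-05-03T00:00:00+00:00"),
    (RM_3, "2007-05-11T00:00:00+00:00"),
]))
changed = maps_to_collect(second, collected)
collect(second, collected)
```

In this sketch only the snapshot date-times are recorded. A real archive would also store the Resource Map and the resources it references at each date-time, which is what allows a compound object to be viewed as it existed on a given date.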

The screencast below shows a walk-through of the various components involved in the experiment and follows the evolution of some Resource Maps over time.

Screencast link

Acknowledgments
OAI-ORE is supported by the Andrew W. Mellon Foundation, the Coalition for Networked Information, Microsoft, and the National Science Foundation (IIS-0430906).
The authors acknowledge the contributions to the OAI-ORE effort from the ORE Technical Committee, Liaison Group and Advisory Committee. The authors also acknowledge the contributions of John Erickson (HP Labs) and Sandy Payette (Cornell Information Science).
Many thanks to Lyudmila Balakireva, Ryan Chute, Stephan Dresher, and Zhiwu Xie of the Digital Library Research & Prototyping Team of the Los Alamos National Laboratory for their work on the prototype described in the Appendix.


Reference this article
Van de Sompel, H., Lagoze, C. "Interoperability for the Discovery, Use, and Re-Use of Units of Scholarly Communication," CTWatch Quarterly, Volume 3, Number 3, August 2007. http://www.ctwatch.org/quarterly/articles/2007/08/interoperability-for-the-discovery-use-and-re-use-of-units-of-scholarly-communication/
