CTWatch
August 2007
The Coming Revolution in Scholarly Communications & Cyberinfrastructure
Introduction
Lee Dirks, Microsoft Corporation
Tony Hey, Microsoft Corporation


By now, it is a well-observed fact that scholarly communication is in the midst of tremendous upheaval. That is as exciting to many as it is terrifying to others. What is less obvious is exactly what this dramatic change will mean for the academic world – specifically what influence it will have on the research community – and the advancement of science overall. In an effort to better grasp the trends and the potential impact in these areas, we’ve assembled an impressive constellation of top names in the field – as well as some new, important voices – and asked them to address the key issues for the future of scholarly communications resulting from the intersecting concepts of cyberinfrastructure, scientific research, and Open Access. All of the hallmarks of a sea change are apparent: attitudes are changing, roles are adjusting, business models are shifting – but, perhaps most significantly, individual and collective behaviors are very slow to evolve – far slower than expected. That said, each of the authors in this CTWatch Quarterly issue puts forward a variety of visions and approaches, some practical considerations, and in several cases, specific prototypes or breakthrough projects already underway to help point the way.

Leading off is Clifford Lynch’s excellent overview (“The Shape of the Scientific Article in the Developing Cyberinfrastructure”) – an outstanding entry point to the broad range of issues raised by the reality of cyberinfrastructure and the impact it will have on scientific publishing in the near term. His paper is an effective preview of the fundamental shift in how scholarly communication will work, namely how the role of the author is changing in a Web 2.0 environment. A core element in this new world is the growing potential benefit of including data in submissions (or links to data sets). Lynch thoughtfully addresses the many implications arising in this new paradigm (e.g., papers + data) – and how policies and behaviors will need to adapt – most especially the impact this will have on the concept of peer review. He astutely raises the issue of the importance of software/middleware in this new ecosystem – namely in the areas of viewing/reading and visualization. This is a critical point for accurate dissemination to facilitate further research – and is also integral to discoverability as well as the ability to aggregate across multiple articles.

In his piece, “Next-Generation Implications of Open Access,” Paul Ginsparg provides an invaluable perspective on the current state of affairs – a “long view” – as one of the originators of the Open Access movement. Having in essence invented the Open Access central repository when he launched arXiv.org in 1991, Ginsparg’s brief retrospective and forward-looking assessment of this space is a useful look at the features and functionality that open repositories must consider to stay relevant and to add value in this changing environment. Indeed, it is a testament to arXiv.org that it has been able to remain true to its original tenets of being low-cost, selective, and complementary/supplemental to other publishers or repositories. However, Ginsparg’s treatment hints at several new directions and areas for enhancement/improvement relating to (a) storage of and access to documents/articles at scale, (b) the social-networking implications for large-scale repositories, and (c) how to handle compound objects, data, and other related supporting documentation. Also insightful are Ginsparg’s musings on the economics of Open Access, and he surfaces the important theme highlighted by several of the authors in this issue—the notion that a generational shift is required to enable the necessary behavioral change, and the recognition that our field(s) may not progress until this reality is brought about.

Timo Hannay’s survey of the Web 2.0 landscape is an especially valuable map of the territory. In this environmental scan, Hannay takes a snapshot of the current state of the art and provides not only definitions but also definitive examples/applications that demonstrate the reality, the potential, and the remaining hurdles faced by the social-networking phenomenon. Now that we’ve finally begun to realize the power and potential that had been promised us with the “web-as-platform,” we’re also understanding the many benefits and the driving force of the network effect: the more who participate, the richer the experience. (Yet, Hannay also points out the cruel truth that the scientific community has been miserably late to the game, when it should have been first – considering the Internet was initially constructed to facilitate the sharing of scientific data.) As exciting as it might be at this point in time, a core tenet of this article is to point out that – as a community – we have yet to realize the full potential of Web 2.0, as we are still so very early in the initial phase. Considering that the very medium we are using alters the methods we employ, Hannay stresses that it is “impossible to predict” the future, but the hints he provides promise us a very exciting journey.


Clifford Lynch, Coalition for Networked Information (CNI)

Introduction

For the last few centuries, the primary vehicle for communicating and documenting results in most disciplines has been the scientific journal article, which has maintained a strikingly consistent and stable form and structure over a period of more than a hundred years now; for example, despite the much-discussed shift of scientific journals to digital form, virtually any article appearing in one of these journals would be comfortably familiar (as a literary genre) to a scientist from 1900. E-science represents a significant change, or extension, to the conduct and practice of science; this article speculates about how the character of the scientific article is likely to change to support these changes in scholarly work. In addition to changes to the nature of scientific literature that facilitate the documentation and communication of e-science, it’s also important to recognize that active engagement of scientists with their literature has been, and continues to be, itself an integral and essential part of scholarly practice; in the cyberinfrastructure environment, the nature of engagement with, and use of, the scientific literature is becoming more complex and diverse, and taking on novel dimensions. This changing use of the scientific literature will also cause shifts in its evolution, and in the practices of authorship, and I will speculate about those as well here.

A few general comments should be made at the outset. First, I recognize that it is dangerous to generalize across a multiplicity of scientific disciplines, each with its own specialized disciplinary norms and practices, and I realize that there are ample counterexamples or exceptions to the broad trends discussed here; but, at the same time, I do believe that it is possible to identify broad trends, and that there is value in analyzing them. Second, as with all discussions of cyberinfrastructure and e-science, many of the developments and issues are relevant to scholarly work spanning medicine, the biological and physical sciences, engineering, the social sciences, the humanities, and even the arts, as is suggested by the increasingly common use of the more inclusive term “e-research” rather than “e-science” in appropriate contexts. I have focused here on the sciences and engineering, but much of the discussion has broader relevance.

Finally, it’s crucial to recognize that the changes to the nature of scholarly communication and the scientific article are not being driven simply or solely by technological determinism as expressed through the move to e-science. There are broad social and political forces at work as well, independent of, but often finding common cause or at least compatibility with, e-science developments; in many cases, the transfigured economics and new capabilities of global high-performance networking and other information technologies are, for the first time, making it possible for fundamental shifts in the practices and structures of scholarly communication to occur, and thus setting the stage for political demands that these new possibilities be realized. Because the same technical and economic drivers have fueled much of the commitment to e-science, these other exogenous factors that are also shaping the future of scholarly communication are often, at least in my view, overly identified with e-science itself. Notable and important examples include the movements towards open access to scientific literature; movements towards open access to underlying scientific data; demands (particularly in the face of some recent high-profile cases of scientific fraud and misconduct) for greater accountability and auditability of science through structures and practices that facilitate the verification, reproducibility and re-analysis of scientific results; and efforts to improve the collective societal return on investment in scientific research through a recognition of the lasting value of much scientific data and the way that the investment it represents can be amplified by disclosure, curation and facilitation of reuse. Note that in the final area the investments include but go beyond the financial; consider the human costs of clinical trials, for example.


Paul Ginsparg, Cornell University

Introduction

The technological transformation of scholarly communication infrastructure began in earnest by the mid-1990s. Its effects are ubiquitous in the daily activities of typical researchers, instructors and students, permitting discovery of, access to, and reuse of material with an ease and rapidity difficult to anticipate as little as a decade ago. An instructor preparing for lecture, for example, undertakes a simple web search for up-to-date information in some technical area and finds not only a wealth of freely available, peer-reviewed articles from scholarly publishers, but also background and pedagogic material provided by their authors, together with slides used by the authors to present the material, perhaps video of a seminar or colloquium on the material, related software, on-line animations illustrating relevant concepts, explanatory discussions on blog sites, and often useful notes posted by third-party instructors of similar recent or ongoing courses at other institutions. Any and all of these can be adapted for use during lecture, added to a course website for student use prior to lecture, or as reference material afterwards. Questions or confusions that arise during lecture are either resolved in real time using a network-connected laptop, or deferred until afterwards, but with the instructor's clarification propagated instantly via a course website or e-mail. Or such lingering issues are left as an exercise for students to test and hone their own information-gathering skills in the current web-based scholarly environment. Some courses formalize the above procedures with a course blog that also permits posting of student writing assignments for commentary by other students and the instructor. Other courses employ wikis so that taking lecture notes becomes a collaborative exercise for students.

Many of these developments had been foreseen a decade ago, at least in principle, though certainly not in all the particulars. When the mass media and general public became aware of the Internet and World Wide Web in the mid-1990s, this new "information superhighway" was heavily promoted for its likely impact on commerce and media, but widespread adoption of social networking sites facilitating file, photo, music, and video sharing was not regularly touted. Web access is now being built into cell phones, music players, and other mobile devices, so it will become that much more ubiquitous in the coming decade. People currently receiving their PhDs became fluent in web search engine usage in high school, and people receiving their PhDs a decade from now will have had web access since early elementary school. (My 3.5-year-old son has had access to web movie trailers on demand since the age of 1, but is at least two decades from a doctorate.)

Many aspects of teaching and scholarship will remain unchanged. Web access will not fundamentally alter the rate at which students can learn Maxwell's equations for electromagnetism, and many web resources of uncertain provenance, e.g., Wikipedia entries, will require independent expertise to evaluate. We've also already learned that having a vast array of information readily available does not necessarily lead to a better informed public, but can instead exacerbate the problem of finding reliable signal in the multitude of voices. Recent political experience suggests that people tend to gravitate to an information feed that supports their preexisting notions: the new communications technologies now virtually guarantee that such a feed will exist, and moreover make it easy to find. In what follows, I will focus on questions related to the dissemination and use of scholarly research results, as well as their likely evolution over the next decade or so. The issues of generating, sharing, discovering, and validating these results all have parallels in non-academic pursuits. In order to guide the anticipation of the future, I'll begin by looking backwards to developments of the early 1990s.


Timo Hannay, Nature Publishing

What is Web 2.0?

Perhaps the only thing on which everyone can agree about Web 2.0 is that it has become a potent buzzword. It provokes enthusiasm and cynicism in roughly equal measures, but as a label for an idea whose time has come, no one can seriously doubt its influence.

So what does it mean? Web 2.0 began as a conference,1 first hosted in October 2004 by O'Reilly Media and CMP Media. Following the boom-bust cycle that ended in the dot-com crash of 2001, the organisers wanted to refocus attention on individual web success stories and the growing influence of the web as a whole. True, during the late 1990s hype and expectations had run ahead of reality, but that did not mean that the reality was not epochal and world-changing. By the following year, Tim O'Reilly, founder of the eponymous firm and principal articulator of the Web 2.0 vision, had laid down in a seminal essay2 a set of observations about approaches that work particularly well in the online world. These included:

  • "The web as a platform"
  • The Long Tail (e.g., Amazon)
  • Trust systems and emergent data (e.g., eBay)
  • AJAX (e.g., Google Maps)
  • Tagging (e.g., del.icio.us)
  • Peer-to-peer technologies (e.g., Skype)
  • Open APIs and 'mashups' (e.g., Flickr)
  • "Data as the new 'Intel Inside'" (e.g., cartographical data from MapQuest)
  • Software as a service (e.g., Salesforce.com)
  • Architectures of participation (e.g., Wikipedia)

The sheer range and variety of these concepts led some to criticize the idea of Web 2.0 as too ill-defined to be useful. Others have pointed out (correctly) that some of these principles are not new but date back to the beginning of the web itself, even if they have only now reached the mainstream. But it is precisely in raising awareness of these concepts that the Web 2.0 meme has delivered most value. Now, those of us without the genius of Jeff Bezos or Larry Page can begin to glimpse what the web truly has to offer and, notwithstanding the overblown hype of the late 1990s, how it really is changing the world before our eyes.

Initially the first item in the list above – the web as platform – seemed to have primacy among the loose collection of ideas that constituted Web 2.0 (see, for example, Figure 1 in 2). The most important thing seemed to be that talent and enthusiasm in software development were migrating from traditional operating system platforms to the web. New applications were agnostic with respect to Unix versus Macintosh versus Windows and were instead designed to operate using web protocols (specifically, HTTP and HTML) regardless of the precise underlying software running on the server or client machines.

However, this view taken on its own overlooks one very important reason why that migration has happened: the web is more powerful than the platforms that preceded it because it is an open network and lends itself particularly well to applications that enable collaboration and communication. With his usual eye for pithy phrasing, Tim O'Reilly described this aspect using the terms "architecture of participation"3 and "harnessing collective intelligence."2 He pointed out that the most successful web applications use the network on which they are built to produce their own network effects, sometimes creating apparently unstoppable momentum. This is how a whole new economy can arise in the form of eBay, why tiny craigslist and Wikipedia can take on the might of mainstream media and reference publishing, and why Google can produce the best search results by surreptitiously recruiting every creator of a web link to its cause. In time, this participative aspect came to the fore, and these days "Web 2.0" is often seen as synonymous with websites that do not merely serve users but also involve them, thus enabling them to achieve that most desirable of business goals: a service that gets better for everyone the more people use it.
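
The remark about Google surreptitiously recruiting every creator of a web link refers to link-analysis ranking in the spirit of PageRank. The following Python snippet is a minimal, illustrative sketch only (not Google's actual algorithm, parameters, or data); it shows how a simple power iteration turns raw link structure into a ranking, so that every hyperlink an author publishes effectively becomes a vote that the service harnesses.

    # Minimal PageRank-style power iteration (illustrative sketch only,
    # not Google's actual algorithm or parameters).
    def pagerank(links, damping=0.85, iterations=50):
        """links: dict mapping each page to the list of pages it links to."""
        pages = list(links)
        n = len(pages)
        rank = {p: 1.0 / n for p in pages}
        for _ in range(iterations):
            new_rank = {p: (1.0 - damping) / n for p in pages}
            for page, outlinks in links.items():
                if not outlinks:  # dangling page: spread its rank evenly
                    for p in pages:
                        new_rank[p] += damping * rank[page] / n
                else:
                    share = damping * rank[page] / len(outlinks)
                    for target in outlinks:
                        new_rank[target] += share
            rank = new_rank
        return rank

    # Every link from A or B to C raises C's rank: each author "votes" by linking.
    toy_web = {"A": ["C"], "B": ["C"], "C": ["A"]}
    print(pagerank(toy_web))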

This brief survey will use a relatively broad definition of Web 2.0. So, while it will deal mainly with participative services and network effects, it will also cover certain other aspects of the original Web 2.0 vision that have particular relevance in science, including mashups and tagging.


J. Lynn Fink, University of California, San Diego
Philip E. Bourne, University of California, San Diego

Introduction

Cyberinfrastructure is integral to all aspects of conducting experimental research and distributing those results. However, it has yet to make a similar impact on the way we communicate that information. Peer-reviewed publications have long been the currency of scientific research as they are the fundamental unit through which scientists communicate with and evaluate each other. However, in striking contrast to the data, publications have yet to benefit from the opportunities offered by cyberinfrastructure. While the means of distributing publications have vastly improved, publishers have done little else to capitalize on the electronic medium. In particular, semantic information describing the content of these publications is sorely lacking, as is the integration of this information with data in public repositories. This is confounding considering that many basic tools for marking-up and integrating publication content in this manner already exist, such as a centralized literature database, relevant ontologies, and machine-readable document standards.
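
To make the missing semantic mark-up concrete, the sketch below shows, purely as an illustration, how a sentence in an article might be annotated with an ontology term and a link to an RCSB Protein Data Bank entry so that a machine agent can extract the association. The element and attribute names are invented for this example; they are not an actual PLoS or PDB schema.

    # Hypothetical semantic mark-up of an article sentence, linking a protein
    # mention to an ontology term and a PDB entry. The element and attribute
    # names are invented for illustration, not a real publisher schema.
    import xml.etree.ElementTree as ET

    fragment = """
    <sentence>
      The structure of
      <entity type="protein" ontology="GO:0003824" db="PDB" db-id="1TIM">
        triosephosphate isomerase
      </entity>
      suggests a catalytic mechanism.
    </sentence>
    """

    root = ET.fromstring(fragment)
    for entity in root.iter("entity"):
        # A literature database or text-mining agent could harvest these links
        # and integrate the article with the corresponding database records.
        print(entity.text.strip(), "->", entity.get("db"), entity.get("db-id"))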

We believe that the research community is ripe for a revolution in scientific communication and that the current generation of scientists will be the one to push it forward. These scientists, generally graduate students and new post-docs, have grown up with cyberinfrastructure as a part of their daily lives, not just a specialized aspect of their profession. They have a natural ability to do science in an electronic environment without the need for printed publications or static documents and, in fact, can feel quite limited by the traditional format of a publication. Perhaps most importantly, they appreciate that the sheer amount of data and the number of publications is prohibitive to the traditional methods of keeping current with the literature.

To do our part to get the revolution turning, we are working with the Public Library of Science 1 and a major biological database, the RCSB Protein Data Bank,2 to destroy the traditional concept of a publication and a separate data repository and reinvent it as an integration of the two information sources. Here, we describe new authoring tools that are being developed to consummate the integration of literature and database content, tools being developed to facilitate the consumption of this integrated information, and the anticipated impact of these tools on the research community.


Herbert Van de Sompel, Los Alamos National Laboratory
Carl Lagoze, Cornell University

1. Introduction

Improvements in computing and network technologies, digital data capture, and data mining techniques are enabling research methods that are highly collaborative, network-based, and data-intensive. These methods challenge existing scholarly communication mechanisms, which are largely based on physical (paper, ink, and voice) rather than digital technologies.

One major challenge to the existing system is the change in the nature of the unit of scholarly communication. In the established scholarly communication system, the dominant communication units are journals and their contained articles. This established system generally fails to deal with other types of research results in the sciences and humanities, including datasets, simulations, software, dynamic knowledge representations, annotations, and aggregates thereof, all of which should be considered units of scholarly communication.1

Another challenge is the increasing importance of machine agents (e.g., web crawlers, data mining applications) as consumers of scholarly materials. The established system by and large targets human consumers. However, all communication units (including the journal publications) should be available as source materials for machine-based applications that mine, interpret, and visualize these materials to generate new units of communication and new knowledge.

Yet another challenge to the existing system lies in the changing nature of the social activity that is scholarly communication. Increasingly, this social activity extends beyond traditional journals and conference proceedings, and even beyond more recent phenomena such as preprint systems, institutional repositories, and dataset repositories. It now includes less formal and more dynamic communication such as blogging. Scholarly communication is suddenly all over the web, both in traditional publication portals and in new social networking venues, and is interlinked with the broader social network of the web. Dealing adequately with this communication revolution requires fundamental changes in the scholarly communication system.

Many of the required changes in response to these challenges are of a socio-cultural nature and relate directly to the question of what constitutes the scholarly record in this new environment. This raises the fundamental issue of how the crucial functions of scholarly communication 2 – registration, certification, awareness, archiving, rewarding – should be re-implemented in the new context. The solutions to these socio-cultural questions rely in part on the development of basic technical infrastructure to support an innately digital scholarly communication system.

This paper describes the work of the Object Re-Use and Exchange (ORE) project of the Open Archives Initiative (OAI) to develop one component of this new infrastructure in order to support the revolutionized scholarly communication paradigm – standards to facilitate discovery, use and re-use of new types of compound scholarly communication units by networked services and applications. Compound units are aggregations of distinct information units that, when combined, form a logical whole. Some examples of these are a digitized book that is an aggregation of chapters, where each chapter is an aggregation of scanned pages, and a scholarly publication that is an aggregation of text and supporting materials such as datasets, software tools, and video recordings of an experiment. The ORE work aims to develop mechanisms for representing and referencing compound information units in a machine-readable manner that is independent of both the actual content of the information unit and the nature of the re-using application.
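
The ORE specifications were still being drafted at the time of writing, so the following sketch should be read only as an illustration of the general idea rather than the ORE format itself. It represents a compound object as an aggregation with its own identifier, whose parts are referenced by URI and typed, so that a harvesting service could process the whole without knowing anything about the content of the parts.

    # Sketch of a compound scholarly object as a machine-readable aggregation.
    # The field names, structure, and URIs are illustrative assumptions, not
    # the OAI-ORE resource-map format.
    import json

    compound_object = {
        "aggregation": "http://example.org/compound/experiment-42",
        "parts": [
            {"uri": "http://example.org/articles/42.pdf", "type": "application/pdf",
             "role": "narrative"},
            {"uri": "http://example.org/data/42.csv", "type": "text/csv",
             "role": "dataset"},
            {"uri": "http://example.org/video/42.mp4", "type": "video/mp4",
             "role": "experiment-recording"},
        ],
    }

    def list_parts(obj, role=None):
        """Return URIs of the aggregated parts, optionally filtered by role."""
        return [p["uri"] for p in obj["parts"] if role is None or p["role"] == role]

    print(json.dumps(compound_object, indent=2))   # machine-readable serialization
    print(list_parts(compound_object, role="dataset"))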


Incentivizing the Open Access Research Web
Publication-Archiving, Data-Archiving and Scientometrics
Tim Brody, University of Southampton, UK
Les Carr, University of Southampton, UK
Yves Gingras, Université du Québec à Montréal (UQAM)
Chawki Hajjem, Université du Québec à Montréal (UQAM)
Stevan Harnad, University of Southampton, UK; Université du Québec à Montréal (UQAM)
Alma Swan, University of Southampton, UK; Key Perspectives

Introduction

The research production cycle has three components: the conduct of the research itself (R), the data (D), and the peer-reviewed publication (P) of the findings. Open Access (OA) means free online access to the publications (P-OA), but OA can also be extended to the data (D-OA): the two hurdles for D-OA are that not all researchers want to make their data OA and that the online infrastructure for D-OA still needs additional functionality. In contrast, all researchers, without exception, do want to make their publications P-OA, and the online infrastructure for publication-archiving (a worldwide interoperable network of OAI 1-compliant Institutional Repositories [IRs]2) already has all the requisite functionality for this.

Yet because only about 15% of researchers are spontaneously self-archiving their publications today, their funders and institutions are beginning to require OA self-archiving,3 so as to maximize the usage and impact of their research output.

The adoption of these P-OA self-archiving mandates needs to be accelerated. Researchers’ careers and funding already depend on the impact (usage and citation) of their research. It has now been repeatedly demonstrated that making publications OA by self-archiving them in an OA IR dramatically enhances their research impact.4 Research metrics (e.g., download and citation counts) are increasingly being used to estimate and reward research impact, notably in the UK Research Assessment Exercise (RAE).5 But those metrics first need to be tested against human panel-based rankings in order to validate their predictive power.

Publications, their metadata, and their metrics are the database for the new science of scientometrics. The UK’s RAE, based on the research output of all disciplines from an entire nation, provides a unique opportunity for validating research metrics. In validating RAE metrics (through multiple regression analysis) 6 against panel rankings, the publication archive will be used as a data archive. Hence the RAE provides an important test case both for publication metrics and for data-archiving. It will not only provide incentives for the P-OA self-archiving of publications, but it will also help to increase both the functionality and the motivation for D-OA data-archiving.
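
The validation step described here amounts to asking how well a weighted combination of candidate metrics predicts the panel rankings. A toy multiple regression along those lines, with invented numbers standing in for real RAE data, might look like the following sketch.

    # Toy multiple regression: how well do candidate metrics (citations,
    # downloads) predict panel rankings? All numbers are invented for
    # illustration and bear no relation to actual RAE data.
    import numpy as np

    # One row per submission: [citation count, download count]
    metrics = np.array([
        [120.0,  900.0],
        [ 40.0,  300.0],
        [200.0, 1500.0],
        [ 75.0,  520.0],
        [ 10.0,  100.0],
    ])
    panel_rank = np.array([5.0, 2.0, 7.0, 3.5, 1.0])  # hypothetical panel scores

    # Add an intercept column and fit by ordinary least squares.
    X = np.column_stack([np.ones(len(metrics)), metrics])
    coeffs, *_ = np.linalg.lstsq(X, panel_rank, rcond=None)

    predicted = X @ coeffs
    ss_res = np.sum((panel_rank - predicted) ** 2)
    ss_tot = np.sum((panel_rank - panel_rank.mean()) ** 2)
    print("coefficients:", coeffs)
    print("R^2 against panel rankings:", round(1 - ss_res / ss_tot, 3))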

Now let us look at all of this in a little more detail:

Research, Data, and Publications

Research consists of three components: (1) the conduct of the Research (R) itself (whether the gathering of empirical data, or data-analyses, or both), (2) the empirical Data (D) (including the output of the data-analyses), and (3) the peer-reviewed journal article (or conference paper) Publications (P) that report the findings. The online era has made it possible to conduct more and more research online (R), to provide online access (local or distributed) to the data (D), and to provide online access to the peer-reviewed articles that report the findings (P).

The technical demands of providing the online infrastructure for all of this are the greatest for R and D – online collaborations and online data-archiving. But apart from the problem of meeting the technical demands for R and for D-archiving, the rest is a matter of choice: if the functional infrastructure is available for researchers to collaborate online and to provide online access to their data, then the rest is just a matter of whether and when researchers decide to use it to do so.7 8 Some research may not be amenable to online collaboration, or some researchers may for various reasons prefer not to collaborate, or not to make their data publicly accessible.

In contrast, when it comes to P, the peer-reviewed research publications, the technical demands of providing the online infrastructure are much less complicated and have already been met. Moreover, all researchers (except those working on trade or military secrets) want to share their findings with all potential users, by (i) publishing them in peer-reviewed journals in the first place and by (ii) sending reprints of their articles to any would-be user who does not have subscription access to the journals in which they were published. Most recently, in the online age, some researchers have also begun (iii) making their articles freely accessible online to all potential users webwide.


Brian Fitzgerald, Queensland University of Technology, Australia
Kylie Pappalardo, Queensland University of Technology, Australia


In almost everything we do, the law is present. However, we know that strict adherence to the law is not always observed for a variety of pragmatic reasons. Nevertheless, we also understand that we ignore the law at our own risk and sometimes we will suffer a consequence.

In the realm of collaborative endeavour through networked cyberinfrastructure we know the law is not too far away. But we also know that a paranoid obsession with it will cause inefficiency and stifle the true spirit of research. The key for the lawyers is to understand and implement a legal framework that can work with the power of the technology to disseminate knowledge in such a way that it does not seem a barrier. This is difficult in any universal sense but not totally impossible. In this article, we will show how the law is responding as a positive agent to facilitate the sharing of knowledge in the cyberinfrastructure world.

One general approach is to develop legal tools that can provide a generic permission or clearance of legal rights (e.g., copyright or patent) in advance (usually subject to conditions) that can be implemented before or at the point of use. This has become known as open licensing and will be discussed below in terms of copyright and patented subject matter. 1

However, open licensing will not be adopted by everyone, nor is it suitable in every situation. As a generalisation, it will be advocated in the context of publicly funded research that produces tools and knowledge upon which platform technologies are built, where considerations such as privacy are not an issue.

Where open licensing is not being used, the many parties to a collaborative endeavour will normally be required to map the scope and risk of their joint work through a contract. Contracts can take time to negotiate and, in many instances, threaten to frustrate the fast-paced and serendipitous nature of research fuelled by high-powered cyberinfrastructure. To this end, a number of projects throughout the world, for example The Lambert Project in the UK,2 the University Industry Demonstration Project (UIDP) in the USA,3 and (amongst other projects) the 7th Framework Project in the EU,4 have begun asking how we might be able to improve this situation. Suggestions include standard-form or off-the-shelf contracts covering a variety of situations, a database of key clauses and, in the case of the UIDP project, a software-based negotiation tool called the Turbo-Negotiator. Legal instruments that can match the dynamic of the technology and appear seamless and non-invasive are the goal. More work in this area is needed (and happening) and is critical to ensuring we have the law and technology of cyberinfrastructure working to complement each other.

In the remainder of this article we will focus on the open licensing model.


Perspectives
John Wilbanks, Science Commons


Infrastructure never gets adequately funded because it cuts across disciplinary boundaries, it doesn't benefit particular groups. Infrastructure is a prerequisite to great leaps forward and is thus never captured within disciplinary funding, or normal governmental operations. We need to revise radically our conception of cyberinfrastructure. It isn't just a set of tubes through which bytes flow, it is a set of structures that network different areas of knowledge...and that is software and social engineering, not fiber optic cable. The superhighways of the biological information age should not be understood as simply physical data roads, long ropes of fiber and glass. They need to be structures of knowledge. The Eisenhower Freeways of Biological Knowledge are yet to be built. But that doesn't mean the task isn't worth starting.

- James Boyle, William Neal Reynolds Professor of Law, Duke University Law School

Knowledge sharing and scholarly progress

Knowledge sharing is at the root of scholarship and science. A hypothesis is formulated, research performed, experimental materials designed or acquired, tests run, data obtained and analyzed, and finally the results are published: the scholar writes a document outlining the work for dissemination in a scholarly journal.

If it passes the litmus test of peer review, the research enters the canon of the discipline. Over time, it may become a classic with hundreds of citations. Or, more likely, it will join the vast majority of research, with fewer than two citations over its lifetime, its asserted contributions to the canon increasingly difficult to find – because, in our current world, citations are the best available basis for relevance-based search.

But no matter the fate of an individual publication, the system of publishing is a system of sharing knowledge. We publish as scholars and scientists to share our discoveries with the world (and, of course, to be credited with those discoveries through additional research funding, tenure, and more). And this system has served science extraordinarily well over the more than three hundred years since scholarly journals were birthed in France and England.

The information technology revolution: missed connections and lost opportunities

Into this old and venerable system has come the earthquake of modern information and communication technologies. The Internet and the Web have made publication cheap and sharing easy – from a technical perspective. The cost of moving, copying, forwarding, and storing the bits in a single scientific publication approaches zero.

These technologies have created both enormous efficiency gains in traditional industries (think about how Wal-Mart uses the network to optimize its supply chains) and radical reformulations of industries (Amazon.com in books, or iTunes in music). Yet the promise of enormous increases in efficiency and radical reformulation has to date failed to produce similarly shattering changes in the rate of meaningful discovery in many scientific disciplines.

For the purposes of this article, I focus on the life sciences in particular. The problems I articulate affect all the scientific disciplines to one extent or another – but the life sciences represent an ideal discussion case. The life sciences are endlessly complex, and the problems of global health and pharmaceutical productivity are such an enormous burden that the pain of a missed connection is personal. Climate change represents a problem of similar complexity and import to the world, and this article should be contemplated as bearing on research there as well, but my topic is the application of cyberinfrastructure to the life sciences, and there I’ll try to remain.

Despite new technology after new technology, the cost of discovering a drug keeps increasing, and the return on investment in the life sciences (as measured by new drugs hitting the market for new diseases) keeps dropping. While the Web and email pervade pharmaceutical companies, the elusive goal remains “knowledge management”: finding some way to bring sanity to the sprawling mass of figures, emails, data sets, databases, slide shows, spreadsheets, and sequences that underpin advanced life sciences research. Bioinformatics, combinatorial drug discovery, systems biology, and innumerable words ending in "-omics" have yet to relieve the skyrocketing costs and increase the percentage of success in clinical trials for new drug compounds.

The reasons for this are many. First and foremost, drug discovery is hard – really, really hard. And much of the low-hanging fruit has been picked. There are other reasons having to do with regulatory requirements, scientific competition, distortions in funding, and more. But there is one reason that stands out as both a significant drag on discovery and as a treatable problem, one that actually can be solved in the short term: we aren’t sharing knowledge as efficiently as we could be.


Perspectives
Peter Suber, Earlham College


This article began with a simple attempt to identify trends that were changing scholarly communication. I expected to find trends that were supporting the progress of OA and trends that were opposing it or slowing it down. The resulting welter of conflicting trends might not give comfort to friends or foes of OA, or to anyone trying to forecast the future, but at least it would describe this period of dynamic flux. It might even explain why OA wasn't moving faster or slower than it was.

But with few exceptions I only found trends that favored OA. Maybe I have a large blind spot; I'll leave that for you to decide. I'm certainly conscious of many obstacles and objections to OA, and I address them every day in my work. The question is which of them represent trends that are gaining ground.

While it's clear that OA is here to stay, it's just as clear that long-term success is a long-term project. The campaign consists of innumerable individual proposals, policies, projects, and people. If you're reading this, you're probably caught up in it, just as I am. If you're caught up in it, you're probably anxious about how individual initiatives or institutional deliberations will turn out. That's good; anxiety fuels effort. But for a moment, stop making and answering arguments and look at the trends that will help or hurt us, and would continue to help or hurt us even if everyone stopped arguing. For a moment, step back from the foreground skirmishes and look at the larger background trends that are likely to continue and likely to change the landscape of scholarly communication.

I've found so many that I've had to be brief in describing them and limit the list to those that most affect OA.

  1. First there are the many trends created by OA proponents themselves: the growing number of OA repositories, OA journals, OA policies at universities, OA policies at public and private funding agencies, and public endorsements of OA from notable researchers and university presidents and provosts. Each new OA repository, journal, policy, and endorsement contributes to a growing worldwide momentum and inspires kindred projects elsewhere. Funding agencies are now considering OA policies in part because of their intrinsic advantages (for increasing return on investment by increasing the visibility, utility, and impact of research) and in part because other funding agencies have already adopted them. The laggards are being asked why the research they fund is less worth disseminating than the research funded elsewhere. The growing mass of OA literature is becoming critical in the sense that the growth is now a cause, and not just an effect, of progress. OA literature is the best advertisement for OA literature; the more we have, the more it educates new scholars about OA, demonstrates the benefits of OA, and stimulates others to provide or demand it.
  2. Although knowledge of OA among working researchers is still dismally low, every new survey shows it increasing, and every new survey shows increasing rates of deposits in OA repositories and submissions to OA journals. The absolute numbers may still be low, but the trajectories are clearly up.


