Studies have shown a correlation between openly accessible materials and citation impact,15 though a direct causal link is more difficult to establish, and other mechanisms accounting for the effect are easily imagined. It is worth noting, however, that even if some articles currently receive more citations by virtue of being open access, it does not follow that the benefit would continue to accrue through widespread expansion of open access publication. Indeed, once the bulk of publication has moved to open access, whatever relative boost early adopters might enjoy will have disappeared, and relative citation counts will once again be determined by the usual independent mechanisms. Citation impact per se is consequently not a serious argument for encouraging more authors to adopt open access publication. A different potential impact and benefit to the general public, on the other hand, is the greater ease with which science journalists and bloggers can write about and link to open access articles.
A form of open access appears to be happening by a backdoor route regardless: using standard search engines, over a third of the high-impact journal articles in a sample of biological/medical journals published in 2003 were found at non-journal websites.16 Informal surveys17 of how many publications in other fields are freely available via a straightforward web search suggest that many communities may already be further along in the direction of open access than most realize. Most significantly, the current generation of students has grown up with a variety of forms of file and content sharing, legal and otherwise. This generation greets with dumbfounded mystification the explanation of how researchers perform research, write an article, make the figures, and then are not permitted to do as they please with the final product. Since the current generation of undergraduates, who will be the next generation of researchers, already takes it for granted that such materials should be readily accessible from anywhere, it is more than likely that the percentage of backdoor materials will only increase over time, and that the publishing community will need to adapt to the reality of some form of open access, regardless of the outcome of the government mandate debate.
There is more to open access than just free access. True open access permits any third party to aggregate and data-mine the articles, themselves treated as computable objects, linkable and interoperable with associated databases. The range of possibilities for large and comprehensive full-text aggregations is just starting to be probed. The PubMed Central database,18 operated in conjunction with GenBank and other biological databases at the U.S. National Library of Medicine, is a prime exemplar of a forward-looking approach. It is growing rapidly and (as of June 2007) contains over 333,000 recent articles in fully functional XML from over 200 journals (and additionally over 683,000 scanned articles from back issues19). A congressionally mandated open access policy for NIH-supported publications would generate an additional 70,000 articles a year for PubMed Central.20
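To make "articles treated as computable objects" concrete, here is a minimal sketch, assuming articles are available as XML that uses a few element names from the NLM/JATS article tag set (`article`, `article-title`, `body`, `p`). The two inline sample articles and the helper names `parse_article` and `term_counts` are hypothetical, and real PubMed Central XML is far richer than this. The code parses each article and aggregates word frequencies across the whole collection, about the simplest kind of cross-article data-mining that free access to full text makes possible.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Two hypothetical, heavily simplified articles using JATS-style element names.
ARTICLES = [
    """<article>
         <front><article-meta><title-group>
           <article-title>Kinase signalling in yeast</article-title>
         </title-group></article-meta></front>
         <body><p>The kinase phosphorylates its substrate.</p>
               <p>Yeast cells lacking the kinase grow slowly.</p></body>
       </article>""",
    """<article>
         <front><article-meta><title-group>
           <article-title>Substrate specificity</article-title>
         </title-group></article-meta></front>
         <body><p>Each kinase has a distinct substrate.</p></body>
       </article>""",
]


def parse_article(xml_text):
    """Return (title, body_text) for one JATS-style article string."""
    root = ET.fromstring(xml_text)
    title = root.findtext(".//article-title", default="").strip()
    # Join the text of every <p> directly under <body>, including nested inline markup.
    paragraphs = ["".join(p.itertext()) for p in root.iterfind("./body/p")]
    return title, " ".join(paragraphs)


def term_counts(articles):
    """Aggregate lower-cased word frequencies over the body text of all articles."""
    counts = Counter()
    for xml_text in articles:
        _, body = parse_article(xml_text)
        counts.update(re.findall(r"[a-z]+", body.lower()))
    return counts


if __name__ == "__main__":
    for xml_text in ARTICLES:
        print(parse_article(xml_text)[0])
    counts = term_counts(ARTICLES)
    print(counts["kinase"], counts["substrate"])
```

Only body text is counted here, so "kinase" appears three times across the two sample articles; a real aggregation would also index titles, abstracts, and the structured metadata that JATS provides.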
The full-text XML documents in this database are parsed to permit several different "related material views" for a given article, with links to genomic, nucleotide, inheritance, gene expression, protein, chemical, taxonomic, and other databases. For example, GenBank accession numbers are recognized in articles referring to sequence data and linked directly to the relevant records in the genomic databases. Protein names are recognized and their appearances in articles are linked automatically to the protein and protein interaction databases. Names of organisms are recognized and linked directly to the taxonomic databases, which are then used to compute a minimal spanning tree of all the organisms contained in a given document. In yet another "view," technical terms are recognized and linked directly to the glossary items in the relevant standard biology or biochemistry textbook in the books database. Sets of selected articles resulting from bibliographic queries can also have their aggregated full texts searched simultaneously for links to over 25 different databases, including those mentioned above. The enormously powerful sorts of data-mining and number-crunching already taken for granted in the open access genomics databases can be applied to the full text of the entire biology and life sciences literature, and will have just as great a transformative effect on the research done with it.
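The recognize-and-link steps above can be illustrated with a minimal sketch. It is not PubMed Central's actual pipeline; several assumptions are labeled here and in the code comments. The accession regex covers only the classic GenBank nucleotide forms (one letter plus five digits, or two letters plus six digits, with an optional version suffix); the NCBI Nucleotide URL pattern is used for illustration; the toy taxonomy, the sample sentence (its second accession is made up), and all helper names are hypothetical; and organism recognition is naive substring matching. "Minimal spanning tree" is read here as the smallest subtree of the taxonomy connecting every organism mentioned, which is found by walking each organism up to its lowest common ancestor.

```python
import re

# Assumption: simplified pattern for classic GenBank nucleotide accessions only,
# e.g. U49845 or AB123456.1. Real accession formats are more varied.
ACCESSION_RE = re.compile(r"\b(?:[A-Z]\d{5}|[A-Z]{2}\d{6})(?:\.\d+)?\b")

# Hypothetical toy taxonomy: child -> parent (the root maps to None).
PARENT = {
    "cellular organisms": None,
    "Eukaryota": "cellular organisms",
    "Bacteria": "cellular organisms",
    "Fungi": "Eukaryota",
    "Metazoa": "Eukaryota",
    "Saccharomyces cerevisiae": "Fungi",
    "Homo sapiens": "Metazoa",
    "Mus musculus": "Metazoa",
    "Escherichia coli": "Bacteria",
}


def link_accessions(text):
    """Map each accession found in the text to an (assumed) NCBI Nucleotide URL."""
    return {acc: f"https://www.ncbi.nlm.nih.gov/nuccore/{acc}"
            for acc in ACCESSION_RE.findall(text)}


def find_organisms(text):
    """Naive dictionary lookup: leaf taxa of the toy taxonomy that occur in the text."""
    leaves = set(PARENT) - set(PARENT.values())
    return sorted(name for name in leaves if name in text)


def path_to_root(taxon):
    """List of nodes from the taxon up to the root, inclusive."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path


def minimal_subtree(taxa):
    """Return (lowest common ancestor, sorted parent->child edges) for a non-empty list of taxa."""
    paths = [path_to_root(t) for t in taxa]
    # Shared ancestors form a common suffix of every root path; the first one
    # met when walking up from a leaf is the lowest common ancestor.
    common = set(paths[0]).intersection(*paths[1:])
    lca = next(node for node in paths[0] if node in common)
    edges = set()
    for path in paths:
        for child in path[:path.index(lca)]:
            edges.add((PARENT[child], child))
    return lca, sorted(edges)


SAMPLE = ("The yeast gene (U49845) and a putative human homologue (AB123456.1) "
          "were compared in Saccharomyces cerevisiae and Homo sapiens.")

if __name__ == "__main__":
    print(link_accessions(SAMPLE))
    organisms = find_organisms(SAMPLE)
    print(organisms)
    print(minimal_subtree(organisms))
```

For the sample sentence the two organisms meet at Eukaryota, and the subtree consists of the four edges running down through Fungi and Metazoa. A production system would use curated name dictionaries and the full NCBI taxonomy rather than substring matching against a handful of hand-entered nodes.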