CTWatch Quarterly » Data Mining, Collaboration, and Institutional Infrastructure for Transforming Research and Teaching in the Human Sciences and Beyond


Data Mining, Collaboration, and Institutional Infrastructure for Transforming Research and Teaching in the Human Sciences and Beyond

Cathy N. Davidson, Duke University

What is salient about that example is that, in the humanities, as in the sciences and social sciences, cyberinfrastructure does not simply change the quantity of information. It allows for the conceptualization of more complex, intertwined, and interconnected problems that are as vast as the databases themselves. However, the immense intellectual ambition of projects enabled by new access to massive data sets is precisely what has spurred the evolution to what I’m calling second-generation digital humanities. As with O’Reilly’s Web 2.0, in the human sciences we are seeing far more user-generated content, customization, and collaborative archiving, writing, and research, distributed among large numbers of scholars, students, and sometimes amateur intellectuals who, together, are arriving at new and often challenging concepts and not simply at ever-increasing amounts of data.

A project such as the International Dunhuang Project combines first- and second-generation digital humanities. It is both a professionally archived digitization project and one that is collaborative across multiple sites. The city of Dunhuang was a crossroads on the trading route that would later become known as the “Silk Road.” When archeologists excavated Dunhuang in the nineteenth century, they divvied up the spoils to museums in Beijing, Berlin, London, Tokyo, St. Petersburg, and elsewhere. Now, shards or fragments of text in one physical location are being put together virtually with those in another to create legible artifacts that are changing our view of what is “West” about so-called Western culture. Dunhuang flourished from 100 BC to 1200 AD.⁵ Over 20 languages have been found in materials there, underscoring that cultural fusion and exchange were happening from East to West, North to South, from Africa to Japan, and from at least the time of Julius Caesar.

A second project is even more exemplary of how second-generation digital humanities work. The Law in Slavery and Freedom Project shows how laws in one country reverberate around the world, with consequences for humans, institutions, and states.⁶ In this project, much of the content (the archive itself) is located and digitized by students who are learning collaboratively even as they are making interoperable databases for others to learn from. Classes are coordinated across universities in the US, France, Germany, Brazil, Canada, and Cuba. New archives remake history and remake causalities. Like the Dunhuang project, this one is paradigm-changing in its content but also in its collaborative teaching/learning/research/archiving methods.

We live in an exciting time for the human sciences, yet the amount of material to be digitized is so vast that, in real terms, we are only at the tip of the data iceberg. In non-textual fields (such as art, music, performance studies, media studies) we are at the tip of that tip. As is well-rehearsed by now, the data needs of the humanities are incalculable. The Sloan Digital Sky Survey, the most ambitious astronomical study ever undertaken, uses 40 terabytes of data.⁷ By contrast, the Survivors of the Shoah’s Visual History project requires 200 terabytes of compressed data.⁸ These enormous data needs (exacerbated by the under-funding of the human sciences) result in impoverished resources in many areas, especially in data-intensive areas such as media studies. For example, the Museum of Television and Radio has an archive of 120,000 English-language programs, beginning with the 1918 speech of labor leader Samuel Gompers. Only 1,500 of these have been digitized.⁹ None are searchable. And, as historian Timothy Lenoir reminds us, the situation is worst of all for New Media. He calls ours not the “Information Age” but the “Digital Dark Ages,”¹⁰ because we have preserved almost none of the archive of the virtual materials (early code, software, hardware, websites, the first on-line games) of Web 1.0. Even digitized financial records of major corporations and universities turn out to be inaccessible now because of rapidly changing hardware and software that left vast amounts of data behind.

Yet, even acknowledging that we are only beginning to digitize the record of the world’s knowledge, how are we going to make sense of all that data? No one person can. Projects such as Dunhuang or Law in Slavery and Freedom require many scholars, working from different intellectual traditions, with different assumptions and different languages, pooling not only local archives but interpretations of those archives. And we need interpretations that are not conceptually rooted in Western ideas that create the intellectual binaries that pervade code and carry over into what currently constitutes AI (Artificial Intelligence).

