Thursday, September 22, 2016

Q&A with CNI’s Clifford Lynch: Time to re-think the institutional repository?

(A print version of this interview is available here)

Seventeen years ago 25 people gathered in Santa Fe, New Mexico, to discuss ways in which the growing number of e-print servers and digital repositories could be made interoperable. 

As scholarly archives and repositories had begun to proliferate a number of issues had arisen. There was a concern, for instance, that archives would needlessly replicate each other’s content, and that users would have to learn multiple interfaces in order to use them. 
It was therefore felt there was a need to develop tools and protocols that would allow repositories to copy content from each other, and to work in concert on a distributed basis.
 

With this aim in mind those attending the New Mexico event – dubbed the Santa Fe Convention for the Open Archives Initiative (OAI) – agreed to create the (somewhat wordy) Open Archives Initiative Protocol for Metadata Harvesting, or OAI-PMH for short.

Key to the OAI-PMH approach was the notion that data providers – the individual archives – would be given easy-to-implement mechanisms for making information about what they held in their archives externally available. This external availability would then enable third-party service providers to build higher levels of functionality by using the metadata harvesting protocol.
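For readers curious what this division of labour looks like in practice, here is a minimal sketch from the service-provider side. It builds a harvesting request and pulls titles out of a response. The repository address, the sample record and the function names are all invented for illustration. The sketch assumes the repository exposes simple Dublin Core (the `oai_dc` metadata format), and it parses a canned response rather than contacting a real server.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH 2.0 responses and by simple Dublin Core.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL for a data provider (hypothetical helper)."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def extract_titles(xml_text):
    """Return the Dublin Core titles found in a ListRecords response (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    titles = []
    for record in root.iter(f"{{{OAI_NS}}}record"):
        for title in record.iter(f"{{{DC_NS}}}title"):
            titles.append(title.text)
    return titles


# An invented, heavily trimmed response, standing in for what a repository might return.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An invented example paper</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    # repository.example.org is a placeholder, not a real repository.
    print(build_list_records_url("https://repository.example.org/oai"))
    print(extract_titles(SAMPLE_RESPONSE))
```

Note that what travels over the wire here is metadata about the papers, not the papers themselves. A real harvester would also need to handle large result sets, which the protocol delivers in batches, and error responses, both of which this sketch leaves out.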

The repository model that the organisers of the Santa Fe meeting had very much in mind was the physics preprint server arXiv. This had been created in 1991 by physicist Paul Ginsparg, who was one of the attendees of the New Mexico meeting. As a result, the early focus of the initiative was on increasing the speed with which research papers were shared, and it was therefore assumed that the emphasis would be on archiving papers that had yet to be published (i.e. preprints).
 

However, amongst the Santa Fe attendees were a number of open access advocates. They saw OAI-PMH as a way of aggregating content hosted in local – rather than central – archives. And they envisaged that the archived content would be papers that had already been published, rather than preprints. These local archives later came to be known as institutional repositories, or IRs.

In other words, the OA advocates present were committed to the concept of author self-archiving (aka green open access). The objective for them was to encourage universities to create their own repositories and then instruct their researchers to deposit in them copies of all the papers they published in subscription journals. 

As these repositories would be on the open internet, outside any paywall, the papers would be freely available to all. And the expectation was that OAI-PMH would allow the content from all these local repositories to be aggregated into a single searchable virtual archive of (eventually) all published research.

Given these different perspectives there was inevitably some tension around the OAI from the beginning. And as the open access movement took off, and IRs proliferated, a number of other groups emerged, each with their own ideas about what the role and target content of institutional repositories should be. The resulting confusion continues to plague the IR landscape.

Moreover, today we can see that the interoperability promised by OAI-PMH has not really materialised, few third-party service providers have emerged, and content duplication has not been avoided. And to the exasperation of green OA advocates, author self-archiving has remained a minority sport, with researchers reluctant to take on the task of depositing their papers in their institutional repository. Given this, some believe the IR now faces an existential threat. 

In light of the challenging, volatile, but inherently interesting situation that IRs now find themselves in, I decided recently to contact a few of the Santa Fe attendees and put some questions to them. My first two approaches were unsuccessful, but it was third time lucky when Clifford Lynch, director of the Washington-based Coalition for Networked Information (CNI), agreed to answer my questions.

I am publishing the resultant Q&A today. It can be accessed in the PDF file here.

As is my custom, I have prefaced the interview with a long introduction. However, those who only wish to read the Q&A need simply click on the link at the head of the file and go directly to it.