Digital collections of primary source materials such as texts, images and moving images have become an integral part of the ecosystem of scholarly content that researchers, teachers and learners use in their academic activity. They provide an important complement to other more traditional resources such as journals and books, especially within the arts, humanities and social sciences.
There is plenty of evidence of the impact that such collections have on research as well as teaching and learning. The Impact of Digital Collections, which is the culmination of a number of impact studies that were conducted over the years,1 concluded that ‘digital collections have become fundamental to modern scholarship’, noting a shift towards humanities data science and data-driven research and at the same time potential for growth of the digital humanities, in particular in teaching.2 Recent surveys also found that ‘digital collections were not only essential to scholars’ ability to access materials, but they influenced multiple aspects of their research practices’ and contributed to the development of new skills and collaborative networks.3
How can we ensure that digital collections continue to be available and widely accessible to scholars and students against the backdrop of a tougher economic climate?
The ‘golden age’ of digitization
Typically, digital collections are the product of grant-funded digitization activity undertaken by the academic and heritage sector or are made available to the academic community by publishers, at a cost.
The first decade or so of the 21st century saw the buzz and excitement of what might be termed the ‘golden age’ of UK digitization, with high levels of public funding and support available. The scale of national programmes such as the New Opportunities Fund ICT Content programme4 would be unthinkable today. The programme distributed National Lottery funds and demonstrated the UK government’s commitment to a vision for increased access to a wide range of open resources to support lifelong learning through digitization. It included a £50m digitization strand and almost 150 projects within museums, galleries, universities and community groups. Initiatives such as Europeana, which brings together digitized collections from the European cultural heritage sector,5 and later the Digital Public Library of America, which aggregates millions of digitized items from North American libraries, museums and archives,6 were enabled by the large amount of digitized material within the cultural and heritage sector. Big commercial players such as Microsoft and Google also started their book digitization programmes in the early 2000s in the UK and abroad.7
Within the academic context, it was also during this earlier period that Jisc initiated a number of innovative programmes aimed at democratizing access to commercially available digital collections across the academic sector. Jisc Collections, the Jisc service that supports the procurement of digital content for higher education and research in the UK, negotiated national purchases of resources such as Early English Books Online and Eighteenth Century Collections Online, which would have a lasting impact on scholarly practice in the years to come.
In parallel, through large-scale digitization programmes, Jisc supported the creation of resources we now take for granted such as the British Library 19th-century newspapers and many other open access (OA) collections. As Jean Sykes noted, ‘Back in 2004 suggesting a small number of very large digitization projects was a bold and ambitious thing to do. But it has proved to be a seminal turning point in the UK where mass digitization is concerned.’8
In more recent years, however, the climate in which we operate has changed – and become a lot tougher. Digitization funding has declined, making digitization a competitive and time-consuming activity for institutions to pursue, especially in the context of OA. At the same time, the high cost of content is a concern for most academic libraries.9 This increasingly inhibits libraries’ ability to purchase other resources such as digital archival collections, and the ‘end of year’ budget that tended to be carved out for this kind of material is quickly drying up.
While there is a sense in the community that digitization has perhaps plateaued in the last few years, academic libraries have been trying to grapple with what a more strategic and co-ordinated approach to digitization activity might look like for some time. Chris Pressler’s briefing on digitization of special collections both within RLUK libraries and across Europe, for example, was created a few years ago to support discussions in the development of a new RLUK strategy.10 The RLUK strategy also highlights the need to ensure that their libraries’ rich collections are discoverable and used by researchers and students to maximize their impact.11
More broadly, consultation that Jisc has carried out with library leads has revealed the need to support this kind of content in new ways, not least because of the recognition that the sciences tend to benefit from large grants but libraries also have to support the humanities. Libraries play a critical role in supporting the delivery of humanities courses. As humanities budgets shrink, libraries find themselves at the sharp end of satisfying student demand for resources which the students expect to be available, when they are now paying significant fees for their humanities courses, and this puts considerable pressure on library resources.
An alternative way forward?
In recent years, alternative funding models based on the notion of the crowd have started to make their way into the academic and cultural sector. Crowdfunding has been seen as a likely way to fund more risky or unconventional research,12 and digitization of heritage and archival material has been the subject of some big and small crowdfunding projects, from Neil Armstrong’s space suit13 to the Peter Mackay archive14 and WW1 records.15 Nesta, the UK innovation foundation, has also recently completed an Arts and Heritage Matched Crowdfunding Pilot project to explore the potential of crowdfunding in the arts and heritage sector.16
However, it is more strategic and community-based funding models such as the Open Library of Humanities17 for journals and Knowledge Unlatched18 for books that are having a wider impact on scholarly communication. So, when US-based Reveal Digital19 approached Jisc about their ‘library crowdfunding’ model for digitization of OA collections, we were keen to explore how such a model might work in the UK.
Reveal Digital – what is it?
Reveal Digital (RD) started in 2011 with its first collection, ‘Independent Voices’20 (IV), which brings together a vast array of 20th-century North American alternative press publications. RD’s approach is based on a cost-recovery OA model. The costs associated with digitizing, copyright clearing and delivering the collection are made available publicly. Once enough library pledges are accumulated, and after a brief embargo period, the collection is published on OA.
Pledging institutions benefit from early access to the collection in advance of it going OA, MARC records and copies of digital files, if they have contributed content. But above all they benefit from contributing to the strategic decision-making process of what gets digitized and who ultimately benefits from it. OA is a big driver for RD and the contributing libraries to ensure that once digitized, content does not become ‘out of bounds’ for many libraries due to often unaffordable paywall fees. While it is the pledging libraries that enable the digitization of all material through their financial contribution, it is important to note that all libraries, as well as society at large, eventually benefit from the investment of the supporting institutions once the content is made available on OA.
RD sees itself as a hub and facilitator of library OA publishing programmes by bringing together special collections archives, rights owners and contributing institutions. After the success of IV, which raised over US$1.7m from more than 130 libraries mainly in North America, RD is moving towards an ‘investment fund’ model to better align itself with the strategic investments that libraries are making in OA initiatives and to create a more sustainable funding base for its operations. This model will involve libraries making a multi-year commitment to a common fund and providing strategic input into the selection and prioritization of which collections get digitized. The end result will be the same: the production of OA digital collections that support the humanities, with a focus on 20th-century material.
While in the UK we simply do not have the same scale or budgets as US universities, the emerging needs of academic libraries and scholars in relation to this type of resource are common. Might a similar approach work in the UK context?
Reveal Digital–Jisc collaboration
We felt that RD’s approach for a sustainable OA model aligned strongly with Jisc’s ethos. In addition, research undertaken by Jisc had highlighted academics’ demand for digitization of 20th-century material, the ‘black hole’ of digitization, so IV provided a very good fit for an experiment. Back in September 2016, when we started discussing a collaboration with RD, only the University of Sussex in the UK had pledged for IV. After consulting with staff at Sussex Library on what drove their pledge – a mixture of academic demand and support for the OA model – we were encouraged that there might be wider interest in the community.
In order to make access to IV more affordable to UK institutions, we reached an agreement with RD based on Jisc-banded pledging fees.21 These were set as a one-off payment, with no recurrent annual fees, at a 65% discount on the US pledging rates, and with a pledging period of January to July 2017 (later extended to 31 December 2017). UK pledging libraries get the same benefits as US libraries in terms of early access to the full IV collection and MARC records. As an incentive to UK libraries, we also agreed that 50% of the contributions from UK institutions would be kept by Jisc and put towards a ‘digitization fund’ to digitize UK-based complementary material, where possible sourced from contributing libraries, which would be added to the existing IV collection and also delivered in the UK by Jisc. Once digitized, the UK material will be available to everybody on OA. Participating institutions would set the priorities for content selection and digitization.
After holding an initial webinar in December 2016, followed by another one in June 2017, and some awareness raising through mailing lists and Jisc channels, it was clear that there were some enthusiastic early adopters and, at the time of writing (mid-September 2017), ten institutions had signed up: the universities of Sussex, Bristol, Sheffield, Manchester, the West of England, York, Reading and Salford, together with University College London and Birkbeck, University of London. Pledging is open until 31 December 2017 through the Jisc Collections website.22
‘Independent Voices UK’ – new ways of working
Early conversations with the pledging institutions revealed that a key driver for them is developing OA collections. Even when an institution may not have any special collections of its own, they may have an active academic interest in content held by other participating libraries. By contributing to a jointly held fund, they support the opening up of content in perpetuity for the benefit not only of researchers and collaborative research, but also teachers, students and the public at large.
With Jisc agreeing to match-fund the small amount pledged by the libraries, in mid-July 2017 we started discussions with institutions on potential UK small/alternative press magazines to digitize as a complement to IV and how to approach the project as a whole. This in itself tests a new way for Jisc to work with its members. In the past, Jisc funded digitization programmes mainly with a view to enabling institutions to digitize their own collections. While Jisc had always been involved in developing standards and innovations in the way digitization is conducted and enabling exploitation of text and image and resource discovery, we had not directly co-ordinated selection and conversion activities.
As we approach the start of the project, it is clear that over the next year or so we will have to confront a number of challenges ranging from the nature of the 20th-century content we are dealing with, to approaches to selection criteria and copyright clearance, governance, operational workflows and final delivery and discovery of the digitized content.
Need for collaborative approaches
The material published by small presses is narrow in one sense but immensely wide in another. Content may range from magazines (or zines) run off a mimeograph and cut and pasted on the kitchen table through to titles where production was undertaken along more traditional lines. Much of the material tends to be fairly rare as it is ephemeral in nature. Institutions have often not collected this kind of material in a consistent manner, and full sets are quite often to be found in private collections, so finding complete runs of a magazine is a challenge in itself.
An added complexity is that the content in question is ‘owned’ by communities associated with underground or small press publications. Many of these publications were driven by a particular personality or a group of personalities and are often related to a big cause such as ending racism, promoting collective action or challenging mainstream presses, so there are strong emotions at play for many of those who participated in making (and consuming) the publications.23
The intention of the project is for each contributing institution to have a say in what is ultimately selected, and each institution may have differing expectations. There are therefore a number of stakeholders that need to be part of the process for making this content more accessible. It is not just a case of Jisc and participating institutions deciding what we wish to digitize. We will need to work as collaboratively as possible with rights holders and those who were involved in the creation of the content to understand their motivations and expectations.
Defining demand and selection criteria
Ideally, demonstrating sufficient academic demand for a certain type of content would be a key criterion to satisfy. However, in this case, reaching consensus will be challenging, as demand within institutions might differ. With 20th-century underground press publications, we are dealing with an emergent set of interests. Those studying modern history, cultural history, sociology, feminism and feminist thought, the politics of the counter-culture and works of avant-garde art and literature are some of the key audiences for these types of publication. But one issue is that there is no corpus (bounded set). These publications often have short runs, exemplifying particular political interests or reflecting the work of a particular group of poets or artists. This does not mean that they are any less important and in fact some of them had a significant impact on the course of publishing history.
We may find ourselves in a situation where demand is hard to define and where a ‘long tail’ of different kinds of publications is wanted by many, as is typical within arts and humanities scholarship.
The copyright and moral rights challenge
OA, the primary driver for most of the participating institutions, implies use beyond a defined community, so ensuring that rights owners are on board will be paramount. Jisc will not necessarily have the resources to clear the rights so we will most likely engage a specialist to help us and resource that from within the project budget.
To enable the clearance in a reasonable time frame, we will seek out key rights owners but we will also explore possibilities afforded by the Orphan Works Directive (OWD). Identifying rights owners will be a great challenge, as demonstrated by previous experience such as the British Library’s work on clearing permissions for the digitization of Spare Rib. Rights clearance was the biggest hurdle in that project with the team having identified over 4,500 contributors to Spare Rib and therefore potential copyright holders to clear permissions from.24 In some instances the authors may be entirely anonymous. This presents an opportunity under OWD, but rights owners need to be acknowledged, even though their names may not be known. In some instances rights owners may wish to stay anonymous or in others, where they are identifiable, they may not want to be associated with something they produced in their teens or which was never meant for wider publication and distribution. Wherever possible, we will need to work with originators of publications so that they maintain a feeling of ownership. We will need to test if there are any scalable approaches or community frameworks to copyright clearance that we can implement for future activities, including the potential offered by the Extended Collective Licence.
The challenge of making content discoverable
With small magazines, we will most likely provide a series-based presentation of the content in the same way as RD have done with the US content. We see the same approach with how Jisc presented Spare Rib.25 The grouping of the content is important in the same way that it is with archival material. With issue-based content on small runs, we are less dependent on platform-specific search functions, though full text can allow people to drill down to article level. However, with these publications, articles may be harder to define, the format of the magazine often being integral to how the content needs to be understood. If we are working with small magazines, we may be dealing with individual items, so we will need to work on the specificities of search once we have made our selection. Small magazines often have novel formulations, such as being presented in a box or having strange sequencing, e.g. not volume one, two, three. A publisher might not have followed conventions, such as providing issue numbers, or may have used no sequential numbering, and this will present its own challenges in terms of presentation and the approach to discovery. We hope to ensure the content is discoverable by delivering it through an existing Jisc content service as well as enabling cross-search with the IV collection.
These are just some of the initial challenges that have surfaced as we begin work on this project in close collaboration with Jisc member institutions. Over the next few months as we progress our research on content, we will refine our approach to governance, selection criteria and clearing copyright and will devise a workflow for digitization and metadata creation. We will have to strike a balance between what is desirable and what is pragmatically doable and work in conjunction with the creators of the content, when possible.
We hope that this pilot project will provide us with a tangible output, and experience, for what can be achieved through a community-based approach to digitization and OA publishing. It will provide us with the starting point for a broader discussion with our member institutions, and other stakeholders such as the British Library and The National Archives, on what might be the elements and infrastructure of a model that can be adapted to the UK. We would be very interested in hearing the views of the readers. You can follow the course of the project and get in touch via the Jisc Content and Digitization blog.26