Introduction

The University of Glasgow has a history of providing research data management (RDM) support, with a fuller service established in 2012. We provide training and advice to help researchers draw up robust, costed RDM plans. In addition, as part of our suite of repositories that encompass publications, theses and impact, we maintain a data registry and repository to store and publish details about data sets created by researchers.

However, limited work has been undertaken to actively preserve other classes of digital records, e.g. corporate records and digital archives deposited with Glasgow University Archive Service. University staff recognized that this issue needed to be addressed to safeguard long-term access to key records, as they underpin business continuity and legal compliance.

A group of staff from Information Services, in particular from Research Information Management, were therefore tasked with investigating digital preservation approaches and solutions. The decision was taken to participate in a digital preservation service pilot project which had been set up by Jisc, a membership organization providing digital solutions and advice for UK education and research, as this represented an opportunity to learn more about implementing digital preservation while contributing to wider sector development. Project team members also investigated what good practice for digital preservation comprises, talked to people already carrying out digital preservation activities to learn from their practical experience, and explored current practice within the University for managing records.

Jisc Open Research Hub

Jisc launched a Research Data Shared Service (RDSS) pilot project in January 2016. The project’s intended outcome was to develop a hosted, subscription-based management service offering a data repository, digital preservation functions, and integrated management and reporting. The service was intended principally for preserving research data but would also be open to other types of records. At the time of writing, three years later, Jisc is testing the first version of its new service, Open Research Hub, with the pilot group before a full launch later in 2019. More information about the pilot can be found on Jisc’s RDSS project webpage.

Pilot participants were asked to test two digital preservation tools, Archivematica and Preservica, investigate integrations with existing software in their institution, and share feedback with Jisc and other participants to help define service requirements and shape the development of the final product.

As the University of Glasgow already had an established RDM service, we joined the Jisc shared service pilot project in August 2017 to investigate the preservation of digital corporate and archival records. The University’s Digital Preservation Policy and draft list of digital preservation requirements provided a framework for our activities. A new member of staff was appointed in January 2018 to help deliver the pilot project.

We tested the preservation tools with a variety of document types and shared our findings with the other pilot institutions via the project discussion board. We discovered that the two preservation tools are quite different in the way they work and the user requirements they seek to meet. Archivematica is primarily a preservation tool. It carries out preservation actions on uploaded material and creates archival-standard packages ready for storage and ongoing management in a separate system of the user’s choice. The user can view details of each step in the process and create customized workflows. Preservica provides a full digital management system, offering ongoing storage and management of preserved material, granular user access and reporting tools in addition to its preservation functions. Working with these two tools prompted us to consider what systems and tools the University already has for managing digital records, how we would transfer records securely between existing document management systems and a preservation tool, and, in particular, what level of automation or staff intervention would be desirable. Consequently, we have been able to incorporate these issues into our recommendations for a digital preservation programme for our institution.

Key activities and learning

In addition to testing and evaluating the preservation tools, project team members undertook several other activities to understand the University’s requirements, learn how to get a digital preservation programme up and running, and develop insight into good practice.

Understanding the University’s requirements

A number of key questions about aspects of the preservation process emerged for the project team as a result of the Jisc project:

  • At what point in a digital object’s life cycle do preservation actions need to be applied to ensure that it remains accessible?
  • What is the best way to manage metadata?
  • How do we integrate preservation activities into existing University systems and workflows?
  • How do we ensure the terminology is comprehensible to all users?

As we collated information about current policies and practices for managing current, semi-current and archival digital material within the University, we started asking when preservation actions would need to be undertaken on digital material. Paper records are generally straightforward to manage. Once they are no longer actively used, the records can be placed in a records management store until ready for disposal or transfer to archival storage for permanent preservation. Preservation actions are usually applied only once records have been transferred to the University Archive Service, possibly many years after the records were created. However, it is clear that this model is not an effective template for digital records, which may require active intervention long before they become the responsibility of the Archive Service. In particular, we identified record series, e.g. pension records, which will not be kept permanently but which will be retained for many years. Such records will require active care to ensure that they remain accessible throughout their lifetime.

We also learned that early intervention may be necessary to ensure that the appropriate version of records is retained by departments and that they are accompanied by sufficient metadata to interpret them. For example, in a University team which creates digital films, only compressed web versions were being retained and not the high-resolution masters. In addition, staff moved quickly from one project to the next and did not have time to document information about each film. Consequently, when Archive staff were invited to select material for permanent preservation, in some instances they discovered that preservation-quality files were missing or only minimal metadata was available. Staff were able to recall details of more recent projects but key facts about material just three or four years old were uncertain, especially if the creator had left. Even the date of creation was difficult to establish in some cases. We are conscious that this situation could be replicated in other departments. Therefore, liaising with staff to ensure that key records are identified and retained, along with adequate information to support future interpretation and access, will be fundamental to the success of our digital preservation programme.

Stack of analogue film canisters.

National Library of Scotland, used under CC BY 4.0 licence.

How metadata should be managed, stored and accessed has generated a great deal of discussion among team members. We wondered whether digital material should be preserved with a complete copy of the metadata which exists for it, or whether just basic details should be included in the archival package and users referred to the relevant catalogue, e.g. the archive catalogue, for full details (see Figure 1). As metadata may be updated over time, we feel that it is better to preserve only basic information with the archival package and maintain the separate catalogue entry as the definitive metadata source. However, as part of our next steps, we would like to talk with other institutions to see how they manage their metadata.

Figure 1 

Options for managing and accessing metadata
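
To make the second option concrete, the following is a minimal sketch of what a ‘basic details plus catalogue pointer’ record might look like if it were stored inside an archival package. It is an illustration only: the field names, the identifier and the catalogue reference format are all hypothetical and do not describe the schema used by Archivematica, Preservica or the University’s catalogue.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PackageMetadata:
    """Basic details kept inside the archival package (hypothetical fields)."""

    identifier: str     # local identifier for the preserved object
    title: str          # short human-readable title
    date_created: str   # ISO 8601 date, where known
    catalogue_ref: str  # pointer to the definitive catalogue entry


def to_package_json(record: PackageMetadata) -> str:
    """Serialize the minimal record so it can travel with the package."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


# Example values are invented for illustration.
example = PackageMetadata(
    identifier="film-0001",
    title="Example film",
    date_created="2016-05-01",
    catalogue_ref="catalogue:EXAMPLE/1",
)
print(to_package_json(example))
```

Under this arrangement, later corrections would be made in the catalogue alone, and the package would only need to change if the pointer itself changed.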

A particular appeal of the Jisc shared service project was the opportunity to collaborate with others in the sector and learn with them. Regular pilot group meetings and webinars kept us up to date with project developments and more importantly, provided a forum for questions and discussion with other participants. Software user forums were a useful place to ask questions and find out how to make optimal use of the tools. We also carried out some joint testing with colleagues working on the Jisc pilot project at the University of St Andrews. Like us, they are investigating the preservation of corporate records and theses, so it was very beneficial to talk through technical issues and organizational questions relating specifically to these record types. We tested several processes using each preservation tool and compared our experiences and outcomes. The staff from St Andrews showed us how to create and upload a tailored processing configuration to specify how theses are processed within Archivematica. This was very useful and we have experimented with it subsequently. We agreed that content may need to be encrypted when it is submitted for preservation processing and once it is stored as an archival package, and discussed how this might be achieved. We also explored options for managing user access and thought about who might interact with the preservation tool and in what ways. This collaboration was very productive and has helped us formulate how we might deliver digital preservation and make choices about access, security and workflows.

Collaborative working

Many record custodians within universities and the cultural heritage sector are currently seeking to establish what good practice for digital preservation looks like and are addressing similar technical and organizational challenges. It therefore makes sense for us to work together to understand the issues involved, share experiences and identify simple steps which people can implement. Over the past couple of years, project members have undertaken training and met with practitioners in different fields to discuss implementing digital preservation in practice.

Our project team has benefited greatly from our membership of the Digital Preservation Coalition (DPC), a membership organization facilitating knowledge exchange, technological research and engagement. Team members attended various DPC workshops, including ‘Getting started with Digital Preservation’ in May 2018, a helpful introduction to the organizational risks and priorities we should be considering in addition to technical factors. Practical exercises tackled risk management, digital asset registers and applying the National Digital Stewardship Alliance (NDSA) levels of digital preservation (see Table 1). These helped us to analyse what our organization is already doing and pinpoint some straightforward actions which we could take to improve record preservation, e.g. ensuring copies of essential records are stored in more than one geographical location.

Table 1

National Digital Stewardship Alliance levels of digital preservation, Version 1

NDSA, 201, used under CC BY 4.0 licence

Levels: Level 1 (Protect your data); Level 2 (Know your data); Level 3 (Monitor your data); Level 4 (Repair your data)

Storage and Geographic Location
  Level 1
    • Two complete copies that are not collocated
    • For data on heterogeneous media (optical discs, hard drives, etc.) get the content off the medium and into your storage system
  Level 2
    • At least three complete copies
    • At least one copy in a different geographic location
    • Document your storage system(s) and storage media and what you need to use them
  Level 3
    • At least one copy in a geographic location with a different disaster threat
    • Obsolescence monitoring process for your storage system(s) and media
  Level 4
    • At least three copies in geographic locations with different disaster threats
    • Have a comprehensive plan in place that will keep files and metadata on currently accessible media or systems

File Fixity and Data Integrity
  Level 1
    • Check file fixity on ingest if it has been provided with the content
    • Create fixity info if it was not provided with the content
  Level 2
    • Check fixity on all ingests
    • Use write-blockers when working with original media
    • Virus-check high risk content
  Level 3
    • Check fixity of content at fixed intervals
    • Maintain logs of fixity info; supply audit on demand
    • Ability to detect corrupt data
    • Virus-check all content
  Level 4
    • Check fixity of all content in response to specific events or activities
    • Ability to replace/repair corrupted data
    • Ensure no one person has write access to all copies

Information Security
  Level 1
    • Identify those people authorized to read, write, move and delete individual files
    • Restrict who has those authorizations to individual files
  Level 2
    • Document access restrictions for content
  Level 3
    • Maintain logs of who performed what actions on files, including deletions and preservation actions
  Level 4
    • Perform audit of logs

Metadata
  Level 1
    • Inventory of content and its storage location
    • Ensure backup and non-collocation of inventory
  Level 2
    • Store administrative metadata
    • Store transformative metadata and log events
  Level 3
    • Store standard technical and descriptive metadata
  Level 4
    • Store standard preservation metadata

File Formats
  Level 1
    • When you can advise on the creation of digital files, encourage use of a limited set of known open formats and codecs
  Level 2
    • Inventory of file formats in use
  Level 3
    • Monitor file format obsolescence issues
  Level 4
    • Perform format migrations, emulation and similar activities as needed

Meanwhile, the DPC’s workshop on migrating data between systems (July 2018) offered a valuable opportunity to learn from information professionals who have been managing and preserving digital records for some time. As before, contributors emphasized that the key factors are organizational and policy related, rather than technical. All agreed that cleaning and enhancing metadata takes up a considerable amount of time, so this needs to be built into project timescales. Speakers recommended planning and testing exit strategies throughout the life of your preservation system, not just when you decide to remove your data from it. Our group enjoyed an animated discussion about ways to verify that all data has been migrated correctly from one system to another and is intact: it was most useful to hear how attendees had addressed this. Learning about good practice now, before we set up any digital preservation tools and workflows, should help us to put together a robust preservation system with exit strategies built in.
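
One common way to check that migrated data is complete and intact is to compare checksums before and after the move. The sketch below is our own illustration, not a method presented at the workshop: it assumes both systems can expose their content as ordinary directories, and the function names are hypothetical. It builds a SHA-256 manifest for each side and reports files that are missing, unexpected or altered.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its checksum."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_manifests(
    source: dict[str, str], target: dict[str, str]
) -> dict[str, list[str]]:
    """Report files that are missing, unexpected or altered after migration."""
    shared = set(source) & set(target)
    return {
        "missing": sorted(set(source) - set(target)),
        "unexpected": sorted(set(target) - set(source)),
        "altered": sorted(name for name in shared if source[name] != target[name]),
    }
```

A migration would be considered verified when all three lists in the report are empty. Keeping the source manifest also provides fixity information for later checks at the ‘File Fixity and Data Integrity’ levels in Table 1.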

We also collaborated with DPC colleagues to organize our own digital preservation events, in March 2017 and, on International Digital Preservation Day (30 November 2017), ‘Aye Preserve: Digital Preservation in the West of Scotland’. This brought together digital practitioners from different backgrounds to network, share best practice, discuss challenges and update themselves on recent developments and training. It was a great opportunity for us to benefit particularly from the experiences of practitioners in the business sector, where digital preservation is already routine.

Colleagues implementing digital preservation in other universities have also been generous in sharing their learning with us. For example, the University of Westminster’s University Records and Archives team has been actively preserving digital archival material since 2016, using a cloud-based, managed digital preservation service. Staff showed us the automated preservation workflows which they have set up for specific types of material and it was very helpful to see preservation processes running in a live environment. The team’s webinar addressed questions we have about managing relationships between preserved data and metadata in catalogue systems, cataloguing digital material, and ensuring sensitive records are dealt with appropriately. It also raised the interesting question of what is ‘good enough’. If two files out of one hundred in an archival package have not migrated successfully from their original file format to a preservation format, do you investigate and then run the whole preservation process again? Or, given time pressures and the quantity of material to process, do you have to accept that this is ‘good enough’? When we start to process records at scale, we may need to accept that not all errors can be fixed. These are useful questions for us to contemplate now as we consider what level of preservation service we will be able to deliver. The team also suggested that processing digitized records first, before tackling born-digital material, was a helpful way to test processes and gain confidence, so this is something we will bear in mind.

The Research Data team at the University of Strathclyde also took time to show us how they preserve their research data. We were able to watch how they transfer the data from their data management system into the preservation tool and create a workflow for preserving digital research data. Again, it was useful to discuss the choices they have made about workflows and tools.

Overall, learning from colleagues’ practical experience and seeing both simple first steps and established preservation programmes in action has been very constructive. These experiences have helped us to put together recommendations for establishing a digital preservation programme, given us a solid understanding of first principles and reassured us that we will not ‘get it wrong’.

Tackling terminology barriers

Technical terms within any field can act as a barrier to effective communication and a shared understanding. The field of digital preservation is full of sector-specific terminology and project team members have had to familiarize themselves with many new terms in order to understand the processes and literature. However, digital preservation terminology can be confusing or unclear, and the same term can mean different things to different people. Some terms, e.g. ‘digital curation’, are so problematic that people avoid using them. There are a few glossaries online, such as the one produced by the DPC, but their contents differ and there is no fully comprehensive resource which people can refer to, so clarifying exactly what a term means is not always straightforward.

The University project team therefore decided to run a digital preservation terminology workshop at the CASRAI (Consortia Advancing Standards in Research Administration Information) Reconnect UK 2018 conference, based on the DPC glossary. CASRAI is an international, not-for-profit organization seeking to develop efficient information requirements for research organizations. Participants were asked to highlight terms that they did not understand and list words which they had expected to find but which were absent. Terms considered ambiguous or unclear included ‘authentication’ and ‘metadata’, and attendees discussed possible definitions. Meanwhile, ‘graphical user interface’, ‘bit rot’ and ‘user experience’ were among the terms identified as not currently in the glossary.

The workshop participants considered that it is easier to help develop terms when they are presented in a more layered arrangement. They also agreed that glossary terms are more useful, and can be defined and contextualized more effectively, when they are grouped into families (e.g. acronyms for organizations, preservation tools/processes, or standards) rather than arranged alphabetically. This is the way the CASRAI dictionary works: it defines the root term, then presents context-specific uses (root term with qualifying adjective) as separate entries, rather than trying to make the root definition cover all uses (see Figure 2 for an example). A report was compiled summarizing the workshop findings and was shared on the CASRAI forum for further discussion.

Figure 2 

Example of definitions of types of archives in the CASRAI dictionary
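
The layered arrangement described above can be sketched as a simple data structure in which each qualified term points back to its root term. This is a minimal illustration under our own assumptions and not the CASRAI data model: the class, the function and the example terms are hypothetical, and the definitions are placeholders rather than text from any glossary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GlossaryEntry:
    term: str
    definition: str
    root: Optional[str] = None  # None marks a root term


def family(entries: list[GlossaryEntry], root_term: str) -> list[GlossaryEntry]:
    """Return the root entry followed by its qualified entries, sorted by term."""
    root = [e for e in entries if e.term == root_term and e.root is None]
    qualified = sorted(
        (e for e in entries if e.root == root_term), key=lambda e: e.term
    )
    return root + qualified


# Placeholder definitions for illustration only; not taken from any glossary.
ENTRIES = [
    GlossaryEntry("archive", "placeholder: root definition"),
    GlossaryEntry("digital archive", "placeholder: context-specific use", root="archive"),
    GlossaryEntry("dark archive", "placeholder: context-specific use", root="archive"),
    GlossaryEntry("metadata", "placeholder: root definition"),
]

for entry in family(ENTRIES, "archive"):
    print(f"{entry.term}: {entry.definition}")
```

Grouping by root term means that a context-specific entry can be added or revised without rewriting the root definition.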

Following on from this workshop, CASRAI and the DPC are investigating whether they can work together to enhance and maintain a comprehensive collection of digital preservation terms and we hope to be able to support these efforts.

Conclusion

Addressing digital preservation can seem a daunting task, with its subject-specific vocabulary and complex technical requirements around metadata, file formats and integration of tools and systems. Managing files is not necessarily a prime concern for staff or stakeholders within the organization, and competing priorities make it difficult to promote digital preservation, secure adequate resources and implement change. However, this period of experimentation and learning for our project team has demonstrated how the collaborative approach of the preservation community is helping to define good practice, identify robust, workable solutions for any scale of organization and develop resources for everyone’s benefit.

Next steps

The University of Glasgow Digital Preservation Working Group has submitted a paper to the Senior Management Team to establish what the University’s priorities are for digital preservation in the short and medium term. Meanwhile, Jisc has just launched its new preservation and data repository service, Open Research Hub, developed out of the pilot project. Once we know how the University wants to move forward with digital preservation, we will be able to decide whether Jisc’s new service will best fulfil the University’s needs or if another approach is required.

While we are waiting for the Senior Management Team’s decision, we will use some of the assessment tools we have learned about to evaluate the University’s current processes for managing digital material. The results should help us to identify straightforward steps to improve what we do.

We also plan to start actively preserving some digital records. Glasgow University Archive Service staff have identified a collection of born-digital films which would benefit from preservation, so we have started work to identify what preservation actions are required, format the metadata and set up a preservation workflow. As well as meeting a real need, this will provide useful experience in developing preservation processes.

Finally, we will continue to organize and co-host ‘Aye Preserve’ events in the West of Scotland with our colleagues from the DPC and will work with them and CASRAI to clarify preservation terminology. Our digital preservation blog, Digital Preservation @ University of Glasgow, will remain a place to share our journey with the wider community.

Participation in the Jisc pilot has acted as a catalyst to take more decisive action and establish a clearer framework for digital preservation within our institution. Talking to people who are already undertaking digital preservation activities and learning from their expertise and practical experience has given us confidence to make a start.