Research data is central to research; sharing and enabling access to research data are now seen as essential to research integrity. Making research data accessible goes beyond validation, as it also supports new research and innovation. However, sharing research data is not yet ‘business as usual’. Digital technology is making data sharing much easier, and Jisc is harnessing this in partnership with the UK research community to develop the research data shared service (RDSS). The RDSS will enable research organizations to help researchers easily deposit data for publication, discovery, safe storage, long-term archiving and preservation. Ultimately, it will support researchers in sharing and reusing data and will enable greater reproducibility of research. The initial impetus for the development is to help institutions meet policy requirements around research data, whilst exploiting the efficiencies and best practice generated by working collectively. This article examines the development of the service so far, from initial ideas and requirements gathering to the start of technical development.
Throughout this article we use ‘research data’ as defined by the Engineering and Physical Sciences Research Council (EPSRC) research data definition: ‘Research data is defined as recorded factual material commonly retained by and accepted in the scientific community as necessary to validate research findings; although the majority of such data is created in digital format, all research data is included irrespective of the format in which it is created.’1
For a number of years, managing research data has been on the agenda of research funders and organizations, and Jisc has worked with universities and funders to seek to address related needs. Towards the end of the Jisc managing research data programme, the Data Pool project at the University of Southampton summarized the situation as follows:
‘… it is clear that the issues surrounding research data management are becoming more complex rather than less. We now understand much more about the range of data to be managed, its size and sophistication and the expectations of researchers to manage workflows and share data. We also know that at institutional level the requirements of government and funders are placing potentially significant financial costs on institutions which they are finding challenging to discharge in the present financial climate.’2
This characterizes the environment in which the research data shared service (RDSS)3 emerged as a priority development for Jisc. Through the consultation process surrounding the Research at Risk4 co-design challenge – led by Jisc in partnership with Research Libraries UK (RLUK), the Russell Universities Group IT Directors forum (RUGIT), the Society of College, National and University Libraries (SCONUL) and the Universities and Colleges Information Systems Association (UCISA) – the following high-level goals were identified that a shared service would need to help higher education institutions (HEIs) meet:
Initially, the project was driven by the need to address the inefficient approach to research data management (RDM). Consultation with the sector revealed a fragmented RDM landscape, made up of a number of existing commercial and open source services, each meeting some of the requirements needed to achieve universities’ goals; no single system met them all. Procuring, developing and piecing together all the components of a complete end-to-end RDM solution can be resource intensive for institutions. This market, combined with institutions’ varying approaches, can leave some institutions with systems that have gaps in functionality and others with no RDM systems at all. Widespread gaps in functionality were noted in interoperability between research systems, and there was a lack of preservation actions and systems beyond backing up data. A Jisc-procured, -managed and -hosted system would relieve institutions of this procurement, management and development burden and would create financial efficiencies in the sector, as well as enabling best practice in RDM and helping institutions meet funder mandates.
At the same time as our initial work in late 2014 and early 2015, a national policy discussion was under way about developing a Concordat between funders and universities on Open Research Data.7 University leaders discussed this via Universities UK,8 and a practical solution was sought to support the aspirations of the Concordat. The Jisc RDSS project was therefore seen as an important development to meet this need.
In response to the top-level priorities from institutions, Jisc defined a scope for the initial RDSS project. We would focus on a system that would allow the ingest, publication, long-term storage and preservation of finalized data objects for publication or archiving, and that would create links to existing services in the ‘data creation’ and ‘managing active data’ parts of the lifecycle (see Figure 1). It would therefore not include provision for active ‘live’ data storage for objects that researchers are still creating and working on within their own workflows.
In order to produce a system that was relevant to as much of the higher education sector as possible, a comprehensive requirements gathering process was undertaken in the second half of 2015. This process consisted of three main components:
The institutional survey around research systems was designed to look at the current use of research systems and current state of RDM readiness, current and future data storage needs and what the sector wanted Jisc to provide in the RDM space. The responses showed a range of maturity, with some universities already having plans in place, but a large proportion of respondents sought Jisc action and it was clear that research data needs were set to increase fairly dramatically over the next three years.
Desk research looked at existing requirements for RDM systems that had been put together by institutions within the previous Jisc ‘managing research data’9 programme of work and those requirements emerging from Jisc’s ‘research data spring’10 initiative. Projects that were initially drawn upon included:
These requirements were combined with the survey findings and aggregated into an overall requirements document,19 which the community consulted on and prioritized, both offline and in consultation workshops. Rather than create new requirements from scratch, we wanted to build on best practice from the UK and around the world.
The consultation workshops began with a small workshop20 to test the water with conceptual ideas, explore potential architectures (such as the one produced by the University of Sheffield)21 and get expert input from institutions that were already tackling RDM systems. A larger event,22 involving 70 representatives from UK HEIs, was then organized to gather feedback on priorities and requirements. Participants also shared knowledge and experiences of current gaps in RDM service provision and lessons learned from implementing relevant services at institutional level, and discussed their aspirations and concerns about shared services for RDM.
There was consensus that Jisc should:
A smaller workshop was also held with potential system suppliers, who provided feedback and advice on the proposals and requirements generated by the community. The end result of this stage of requirements gathering and analysis was the set of operational requirements for the RDSS, published as part of the tender for the service in the Official Journal of the European Union (OJEU).23
Since the steer from the sector was that any solution needed to cater for a range of institutional needs, Jisc sought a varied group of pilot HEIs to partner with in developing the RDSS. To create a ‘balanced portfolio’, universities that had expressed an interest in joining the pilot were evaluated against criteria that took into account the size and type of institution (e.g. research intensive, small and specialist, teaching led), the availability of varying types of data, their degree of RDM readiness (e.g. from greenfield sites to more established) and their current use of institutional systems.24 As a result of this selection process, Jisc is collaborating with the following pilot institutions to develop the service:
A range of goals from the pilots have been prioritized and a selective summary includes:
In-depth technical and user requirements gathering has taken place with each pilot institution and these have been aggregated and prioritized across the board to define a development path.
Although some bespoke development would be required, and there was added value to be created across the end-to-end system, Jisc decided to build on what already existed. It therefore used an OJEU procurement process to create a supplier framework from which to select system suppliers and developers to create the service. The framework was divided into eight lots as follows:
More information on the detail of the functions of these lots can be found in the RDSS Operational Requirements document25 and a list of successful suppliers on the lots can be found on the Jisc RDM blog.26
RDSS will allow researchers to deposit data for publication, discovery, safe storage, long-term archiving and preservation. However, this raises a whole range of questions, including:
Over the course of 2016 Jisc worked with Research Consulting27 and the pilot institutions to try to find some answers.
To carry out this work, we decided that the Data Asset Framework (DAF)28 offered a useful methodology, one that had been tried and tested by institutions around the world. The DAF was developed in 2009 to help organizations identify, locate, describe and assess how they manage their research data assets. It uses surveys and interviews to gather the necessary information, and we chose to follow a similar approach. Even better, six of the RDSS pilot institutions had already run a DAF survey within the previous couple of years, which provided us with some ready-made data. However, when we came to analyse this data, we found that every institution had tweaked the survey questions to some extent, leaving us with a set of individually valid results that could not be meaningfully aggregated. Moreover, the research data landscape has evolved rapidly since 2009. Funder policies on data sharing are much more demanding, current research information systems (CRIS) are more widely used, and new services such as ORCID29 and DataCite30 have emerged. As a result, much of the information we needed was not covered by the original Data Asset Framework31 question set.
The obvious solution was to develop a new version of the survey: ‘The 2016 DAF survey’. Using the 2009 version as a starting point, Jisc staff and the RDSS pilot institutions developed a revised question set between April and June 2016. Six RDSS pilots then ran the new survey in July and August of that year, collecting 1,185 unique responses. For the headline results, see the summary slideshow.32
The survey data allows us to draw some broader conclusions on the current state of RDM in the UK:
As well as informing the requirements for the RDSS, this work has produced resources that others can reuse. A new DAF toolkit has been published that outlines the steps involved in running a DAF survey and makes recommendations on how to approach them.
As RDSS aims to create a multidisciplinary data service, we need to make sure that our metadata and data models and associated processes can support the use cases of a wide range of researchers from extremely diverse research domains, so we needed to test our ideas with researchers. To achieve this, Jisc worked with Clax35 to hold nine focus groups with pilot institutions’ researchers. A full report from the focus groups is available36 alongside a comprehensive set of researchers’ use cases;37 some of the emerging themes are laid out below:
The requirements gathering and user needs work led to a defined ambition for the RDSS, as shown in Figure 2. This diagram shows the core RDSS platforms (repository, preservation and reporting systems) in the centre, backed by Jisc-provided data storage. Researchers provide input via a web user interface and other tools for managing large and sensitive data. Interoperability is key to the service, and a number of integrations have been identified, such as integrations with CRIS, data management planning tools and publications repositories. RDSS will not function in isolation: it will link to the scholarly communications infrastructure through persistent identifiers, such as DOIs and ORCID iDs, and through aggregation services. RDSS will also use metrics and altmetrics infrastructure and, where possible, will communicate with funder systems.
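Persistent identifiers are only useful if they are recorded correctly. To illustrate the kind of check a deposit interface could apply before linking a dataset to a researcher, the sketch below validates an ORCID iD using the ISO 7064 MOD 11-2 check digit that ORCID documents for its identifiers. This is a minimal sketch, not part of the RDSS design: the function names `orcid_check_digit` and `is_valid_orcid` are hypothetical, and a correctly formed iD may still not correspond to a registered ORCID record.

```python
import re

# An ORCID iD is 16 characters in four hyphen-separated blocks; the last
# character is a check digit (0-9 or X) computed with ISO 7064 MOD 11-2.
# Function names here are hypothetical illustrations, not an RDSS API.
ORCID_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if orcid is a well-formed ORCID iD with a correct check digit.

    Accepts the bare form or the https://orcid.org/ URL form. A True result
    does not prove that the iD is registered or belongs to the depositor.
    """
    orcid = orcid.removeprefix("https://orcid.org/")
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False
```

Here `0000-0002-1825-0097` is the example iD used in ORCID's own documentation. Changing its final digit makes the check fail. A checksum only catches transcription errors. Confirming that an iD belongs to the person depositing would require a lookup against ORCID itself.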
RDSS will also join up with other services, policy, practice and standards being developed through the Jisc Research at Risk portfolio: for example, by being interoperable with and providing metadata to the UK Research Data Discovery Service,39 integrating the metrics and usage statistics work developed in IRUS Data UK40 and integrating ORCID41 into the service. The pilots’ requirements were already informed by Jisc’s funder policy guidance,42 and the service will look to harness the innovation from successful research data spring projects.
Much of this article has been about laying the foundations for delivering RDSS. However, Jisc entered technical production for RDSS alpha in November 2016, after a technical architecture approach was agreed with the pilots, suppliers and Expert Advisory Group in October 2016. Some of the key elements of the approach were:
A conceptual diagram of the architecture is available online.43
This work also set out our expectations for how we work with suppliers, from development through testing to rolling out a production service.
For RDSS to run effectively as a whole, it requires an underlying data model. This is not just to allow researchers to fill in forms to ingest data and provide the minimum metadata for a DataCite DOI, but also to allow systems to create and push events and messages to each other, achieving the goal of an end-to-end system. The data model also allows metadata to be enriched and auto-generated through links to institutional systems and external scholarly communications systems.
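Version 4 of the DataCite Metadata Schema makes the ‘minimum metadata for a DataCite DOI’ concrete. It defines six mandatory properties: Identifier, Creator, Title, Publisher, PublicationYear and ResourceType. The sketch below shows how a deposit form might check a record against those properties before requesting a DOI. It is a minimal sketch under stated assumptions. The flat dictionary keys are hypothetical simplifications, not DataCite's XML element names or the RDSS data model. By default, it also assumes the DOI itself is minted only after the deposit is accepted.

```python
# Mandatory properties of the DataCite Metadata Schema, version 4.x.
# The keys are hypothetical, simplified field names for a deposit form;
# they are not DataCite's XML element names or the RDSS data model.
DATACITE_MANDATORY = {
    "identifier": "Identifier",
    "creators": "Creator",
    "titles": "Title",
    "publisher": "Publisher",
    "publication_year": "PublicationYear",
    "resource_type_general": "ResourceType",
}


def is_empty(value: object) -> bool:
    """Treat None and empty strings, lists or dicts as missing."""
    return value is None or (isinstance(value, (str, list, dict)) and len(value) == 0)


def missing_datacite_properties(record: dict, doi_minted_later: bool = True) -> list[str]:
    """Return the DataCite mandatory properties absent from a deposit record.

    When doi_minted_later is True the Identifier is not required, on the
    assumption that the DOI is registered only after the deposit is accepted.
    """
    missing = []
    for key, label in DATACITE_MANDATORY.items():
        if key == "identifier" and doi_minted_later:
            continue
        if is_empty(record.get(key)):
            missing.append(label)
    return missing


deposit = {
    "creators": ["Smith, Jane"],
    "titles": ["Example survey data"],
    "publisher": "Example University",
    "publication_year": 2017,
}
print(missing_datacite_properties(deposit))  # ['ResourceType']
```

A real workflow would also need to map the record into DataCite's own serialization. There, ResourceType carries a controlled `resourceTypeGeneral` value such as `Dataset`.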
A data model for RDSS alpha has been produced for consultation.44 This current draft has been designed to be interoperable with as many as possible of the products and services identified in the supplier lots or elsewhere. For example, the data model is aligned to the metadata requirements of scholarly communications services.
The result of this work is a dense and complex data model, involving many entities, relations and vocabularies. The role of the consultation is to fine-tune this data model from a practical perspective and ensure that it is interoperable with existing infrastructure. This means that properties can be removed if not used within the system being described. A reduced data model (and vocabularies) can then be identified as core, with additional structure added as required.
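One way to picture how a reduced data model could be identified as core is to record which properties each participating system actually uses and then split the full model accordingly. The sketch below does this with simple counting. Properties used by every system, or by at least a chosen share of systems, become core. Other properties that are used become optional extensions. Properties that no system uses become candidates for removal. This is a hypothetical illustration only. The system names, property names and the `threshold` parameter are invented for the example and are not taken from the RDSS consultation.

```python
from collections import Counter


def split_core_and_extensions(
    usage: dict[str, set[str]],
    model: set[str],
    threshold: float = 1.0,
) -> tuple[set[str], set[str], set[str]]:
    """Split a full data model into core, extension and unused properties.

    usage maps each (hypothetical) system name to the set of model
    properties it actually populates; threshold is the minimum share of
    systems that must use a property for it to count as core.
    """
    if not usage:
        raise ValueError("need usage data from at least one system")
    # Count how many systems use each property that is defined in the model.
    counts = Counter(p for props in usage.values() for p in props if p in model)
    n_systems = len(usage)
    core = {p for p, c in counts.items() if c / n_systems >= threshold}
    extensions = set(counts) - core
    unused = model - set(counts)
    return core, extensions, unused


# Hypothetical example: three systems and a seven-property model.
model = {"title", "creator", "publisher", "funder", "licence", "checksum", "spatialCoverage"}
usage = {
    "repository_a": {"title", "creator", "publisher", "licence"},
    "repository_b": {"title", "creator", "publisher", "funder"},
    "preservation": {"title", "creator", "checksum"},
}
core, extensions, unused = split_core_and_extensions(usage, model)
print(sorted(core))        # ['creator', 'title']
print(sorted(extensions))  # ['checksum', 'funder', 'licence', 'publisher']
print(sorted(unused))      # ['spatialCoverage']
```

Lowering `threshold` to 0.6 would also move `publisher`, used by two of the three systems, into the core. This mirrors the idea of starting from a minimal core and adding structure as required.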
Jisc is now in alpha development and is setting up the foundations for the service. This includes putting in place the technical architecture and on-boarding all of the repository and preservation system suppliers in a test environment. The test environment also includes widely used research systems, such as EPrints and DSpace, as well as test data, so that our suppliers and developers can work together to deliver interoperability to our pilot institutions early on. A User Experience Lead has been appointed to provide principles and a governance framework across the project. Discovery, requirements gathering and testing will continue with the pilot institutions, along with community input from the sector through workshops.45
Other work currently under way is around cost reporting and business modelling, so that Jisc can produce a financially attractive offer to the HE sector for the production service. Alongside that, market research is also taking place to discover the needs of institutions outside the pilot group and whether a Jisc-hosted RDM system is attractive to them.
Alpha development is due to finish in summer 2017. Beta development will look at the scalability of the system, further integrations and challenging issues. These include managing large and sensitive data sets and preserving such a diverse set of digital objects, which fall outside the remit of the traditional digital preservation community. Beta development is due to finish in April 2018; however, a business case for a production service will be presented to the Jisc board by December 2017.
Throughout the alpha and beta phases, our pilot institutions have set the RDSS project challenging requirements, including:
The authors would like to thank the Jisc RDSS Team, RDSS pilot institutions and their researchers, RDSS Expert Advisory Group, RDSS Framework suppliers, Digirati, Research Consulting, Clax, Open Preservation Foundation, Digital Preservation Coalition and all of the consultation workshop participants.
A list of the abbreviations and acronyms used in this and other Insights articles can be accessed here – click on the URL below and then select the ‘Abbreviations and Acronyms’ link at the top of the page it directs you to: http://www.uksg.org/publications#aa
The authors have declared no competing interests.
EPSRC Scope and Benefits: https://www.epsrc.ac.uk/about/standards/researchdata/scope/ (accessed 26 January 2017).
Brown, M L and White, W (2013). A partnership approach to research data management. In: Pryor, G, Jones, S and Whyte, A (eds). Delivering Research Data Management Services: Fundamentals of Good Practice. London: Facet. http://eprints.soton.ac.uk/id/eprint/356247 (accessed 26 January 2017).
Jisc research data shared service: https://www.jisc.ac.uk/rd/projects/research-data-shared-service (accessed 26 January 2017).
Jisc Research at Risk: https://www.jisc.ac.uk/rd/projects/research-at-risk (accessed 22 February 2017).
EPSRC policy framework on research data: https://www.epsrc.ac.uk/about/standards/researchdata/ (accessed 26 January 2017).
Research Excellence Framework 2014: http://www.ref.ac.uk/ (accessed 26 January 2017).
Research Councils UK. Concordat on Open Research Data launched: http://www.rcuk.ac.uk/media/news/160728/ (accessed 26 January 2017).
Universities UK: http://www.universitiesuk.ac.uk/ (accessed 26 January 2017).
Jisc Managing Research Data programme: https://www.jisc.ac.uk/rd/projects/managing-research-data (accessed 26 January 2017).
Jisc research data spring: https://www.jisc.ac.uk/rd/projects/research-data-spring (accessed 26 January 2017).
University of Leeds Research Data Management Pilot Roadmap. Project Outputs: https://library.leeds.ac.uk/roadmap-project-outputs (accessed 26 January 2017).
Parsons, T and Berry, M (2012). Research Data Management Technical Requirements: A report to ADMIRe and IS stakeholders, Nottingham, The University of Nottingham: https://admire.jiscinvolve.org/wp/files/2013/05/ADMIRe-RDM-Technical-Requirements-Report.pdf (accessed 26 January 2017).
Garrett, L, Silva, C and Gramstadt, M-T (2011). Kaptur technical analysis report. University for the Creative Arts: https://vads.ac.uk/kaptur/outputs/Kaptur_technical_analysis.pdf (accessed 26 January 2017).
Jones, R (2012). SWORD Data Deposit Scenarios. SWORD, 3 July 2012: http://swordapp.org/2012/07/data-deposit-scenarios/ (accessed 26 January 2017).
Filling the Digital Preservation Gap: https://www.york.ac.uk/borthwick/projects/archivematica/ (accessed 26 January 2017).
Miller, A (2015). Project Report: A consortial approach to integrated RDMS. In: figshare. https://dx.doi.org/10.6084/m9.figshare.1480451.v1 (accessed 26 January 2017).
Data Vault: http://libraryblogs.is.ed.ac.uk/jiscdatavault/ (accessed 26 January 2017).
Effective learning analytics: https://www.jisc.ac.uk/rd/projects/effective-learning-analytics (accessed 26 January 2017).
Kaye, J. Jisc RDM Shared Service Pilot Initial Statement of Requirements: https://researchdata.jiscinvolve.org/wp/files/2015/11/Draft-RDMSS-Requirements-Specification-V1.0.docx (accessed 26 January 2017).
Duca, D (2015). What makes up the ‘ideal’ research data management system? Jisc Shared Services Workshop, 9 July 2015: https://researchdata.jiscinvolve.org/wp/2015/07/30/makes-ideal-research-data-management-system/ (accessed 26 January 2017).
Lewis, J A (2014). Research Data Management Technical Infrastructure: A Review of Options for Development at the University of Sheffield. In: figshare. https://dx.doi.org/10.6084/m9.figshare.1202230.v9 (accessed 26 January 2017).
RDM Shared Services November Workshops: https://researchdata.jiscinvolve.org/wp/2015/11/23/rdm-shared-service-workshops/ (accessed 26 January 2017).
Kaye, J, Stokes, P and Bruce, R. Jisc Research Data Shared Service Operational Requirements. Zenodo: http://doi.org/10.5281/zenodo.48261 (accessed 26 January 2017).
Research Data Management Shared Service – Call for Formal Expressions of Interest: https://researchdata.jiscinvolve.org/wp/2015/11/06/research-data-management-shared-service-call-for-formal-expressions-of-interest/ (accessed 26 January 2017).
Kaye, J, Stokes, P and Bruce, R. Jisc Research Data Shared Service Operational Requirements. Zenodo: http://doi.org/10.5281/zenodo.48261 (accessed 26 January 2017).
Kaye, J. Research Data Shared Service – OR2016: https://researchdata.jiscinvolve.org/wp/2016/06/14/jisc-research-data-shared-service-or2016/ (accessed 26 January 2017).
Research Consulting: http://www.research-consulting.com/ (accessed 26 January 2017).
Data Asset Framework (DAF): http://www.data-audit.eu/index.html (accessed 26 January 2017).
ORCID: https://orcid.org/ (accessed 14 January 2017).
DataCite: https://www.datacite.org/ (accessed 26 January 2017).
Research Consulting: 2016 DAF Survey Results: https://researchdata.jiscinvolve.org/wp/files/2016/11/2016-DAF-survey-results-for-blog.pptx (accessed 26 January 2017).
Johnson, R, Parsons, T, Chiarelli, A and Kaye, J (2016). Jisc Research Data Assessment Support – Findings of the 2016 data assessment framework (DAF) surveys. Zenodo: https://doi.org/10.5281/zenodo.177856 (accessed 26 January 2017).
Johnson, R, Chiarelli, A and Parsons, T (2016). Data asset framework (DAF) survey results. In: figshare. http://dx.doi.org/10.6084/m9.figshare.3796305.v4 (accessed 26 January 2017).
Clax Ltd. http://www.clax.co.uk/ (accessed 26 January 2017).
Ferguson, N (2016). Report for the proposed Research Data Shared Service on focus groups held between May and October 2016 and the metadata issues and requirements identified. Zenodo: https://doi.org/10.5281/zenodo.193018 (accessed 14 January 2017).
Ferguson, N (2016). Jisc Research Data Shared Service metadata focus group use cases [data set]. Zenodo: https://doi.org/10.5281/zenodo.193011 (accessed 26 January 2017).
Kaye, J and Bruce, R. Research data shared service. Poster presented at Open Repositories 2016: https://researchdata.jiscinvolve.org/wp/files/2016/06/RDSS_POSTER_JUNE2016_FINAL.pdf (accessed 26 January 2017).
UK Research Data Discovery Service (Alpha): http://ckan.data.alpha.jisc.ac.uk/dataset (accessed 26 January 2017).
IRUS for Data [data set]: https://www.jisc.ac.uk/rd/projects/research-data-metrics-for-usage (accessed 26 January 2017).
UK ORCID consortium: https://www.jisc.ac.uk/orcid (accessed 26 January 2017).
Meeting the requirements of the EPSRC research data policy: https://www.jisc.ac.uk/guides/meeting-the-requirements-of-the-EPSRC-research-data-policy (accessed 26 January 2017).
RDSS Conceptual Technical Architecture: https://www.lucidchart.com/documents/edit/6398e8e5-51fb-46ff-8e08-7f97e8861265 (accessed 16 January 2017).
RDSS Canonical Data Model: https://github.com/JiscRDSS/rdss-canonical-data-model (accessed 26 January 2017).
Research Data Network: http://researchdata.network (accessed 26 January 2017).
Jisc RDM Blog: https://researchdata.jiscinvolve.org/wp/ (accessed 26 January 2017).