A persistent identifier (PID) is defined by Wikipedia as ‘a long-lasting reference to a document, file, web page, or other object … usually used in the context of digital objects that are accessible over the Internet. Typically, such an identifier is not only persistent but actionable: you can plug it into a web browser and be taken to the identified source.’1
PIDs have been around for a long time, especially in scholarly communications. Think of the ISBN (International Standard Book Number), first introduced in 1966; or its journal equivalent, the ISSN (International Standard Serial Number), launched five years later in 1971. But they really started to take off when scholarly communications went digital in the late 1990s, and with the launch of Crossref as a provider of digital object identifiers (DOIs) for research articles and other works. Since then, the number of persistent identifiers has increased dramatically: at the time of writing (October 2017), nearly 150 million DOIs had been assigned,2 along with close to four million ORCID iDs for researchers.3
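To make the 'actionable' property concrete, the short Python sketch below turns a DOI, in any of the forms people commonly paste, into a URL that a web browser can follow via the public doi.org resolver. The function name `doi_to_url` and the list of accepted prefixes are assumptions made for illustration, and the check on the DOI's shape is deliberately loose rather than a full validation.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"

# Prefixes commonly found in front of a DOI (an illustrative list, not exhaustive).
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi: str) -> str:
    """Return a resolvable https URL for a DOI such as '10.1000/182'."""
    doi = doi.strip()
    lowered = doi.lower()
    for prefix in KNOWN_PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Loose sanity check: DOIs start with '10.' and have a prefix/suffix split.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path, keeping '/'.
    return DOI_RESOLVER + quote(doi, safe="/")


if __name__ == "__main__":
    print(doi_to_url("doi:10.1000/182"))
```

The same identifier therefore works both as a compact citation string and as a link, which is what distinguishes an actionable PID from a purely descriptive one such as a printed ISBN.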
PIDs enable authoritative, unambiguous, digital connections between people (researchers), places (their organizations), and ‘things’ (their research contributions and outputs). The research infrastructure – from more established tools like manuscript submission and grant application systems, to innovative new services such as Altmetric, Kudos and Publons – is increasingly reliant on these connections.
But, despite the ubiquity of PIDs in scholarly communications, until recently the PID community lacked a dedicated space in which to explore ideas of networked research and scholarly communications infrastructure. To fill this gap, in November 2016 a diverse group of experts from California Digital Library, Crossref, DataCite and ORCID organized the first PIDapalooza.4
Described as ‘the first open festival for scholarly research persistent identifiers’, PIDapalooza took its cue from the music festival after which it is named, Lollapalooza.5 The intention was to bring together PID enthusiasts – those who create and/or use persistent identifiers for scholarly communications – for two days of high-level but informal and interactive discussions.
What kinds of PIDs will we need in future? How should they be used? What are the best ways to get researchers to adopt and use PIDs? What are the theoretical and practical approaches to persistence and interoperability? These and many other questions were addressed in the first PIDapalooza, which was attended by 120 PID experts from around the world.
Much of the meeting was spent in short (half an hour or less) parallel sessions, but there were also five plenaries. Together with the session on organization identifiers (which was so well attended it was virtually a plenary), these provide a good representation of the PIDapalooza experience.
First up was Jonathan Clark, Executive Director of the International DOI Foundation, whose talk, ‘PIDvasive – What’s possible when everything has a persistent identifier?’,6 looked at what we should expect from our persistent identifiers beyond persistence and uniqueness. The answer: provenance, metadata, machine readability, and policies/guarantees. In a broad-ranging talk, Clark then went on to discuss the risk of having too many PIDs, the types of services that might be built on them, and the critical need for both interoperability and a social infrastructure for PIDs.
Day one ended with the second plenary, by Simon Porter, VP of Research Engagement and Information Architecture at Digital Science, entitled ‘Research Information Citizenship’.7 He called on each scholarly communications sector – universities, publishers, funders, service providers and researchers themselves – to play their part in making the digital research infrastructure work better. Porter also raised the need for collaboration to build shared infrastructure tools and services, especially among service providers.
Clifford Tatum, Project Manager at ACUMEN and researcher at Leiden University, kicked off day two. His talk, ‘Towards Governance of PID Portability for Research Evaluation’,8 looked at the use of PIDs in the collection of research information for the purpose of evaluation, and the challenges this creates – in a world where open science and interoperability are increasingly the norm – in terms of privacy, security and commercial concerns. Tatum’s proposed solution was to focus on improving the portability of PIDs through better standards and protocols.
The fourth plenary was by Herbert Van de Sompel, team leader of the Prototyping Team at the Research Library of the Los Alamos National Laboratory. His talk, ‘Signposting for Persistent Identifiers’,9 demonstrated that many papers cite uniform resource identifiers (URIs) other than the DOI URI, reducing the potential power of PIDs. His solution to this problem was to create a signposting pattern for PIDs to enable the automatic discovery and use of the DOI URI rather than other types of URI associated with the DOI-identified object.
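One way such a signposting pattern can be expressed is through typed links in the HTTP `Link` response header, where a landing page or PDF points back to the URI that should be used for citation. The Python sketch below assumes the `cite-as` link relation, which was registered after the talk (in RFC 8574), so it illustrates the idea rather than reproducing the mechanism shown at PIDapalooza. The header value, the DOI `10.1234/example` and the publisher URL are hypothetical, and the parser handles only simple, well-formed headers.

```python
import re
from typing import Dict, List, Optional, Tuple

# One link-value: <target> followed by zero or more ;name=value parameters.
LINK_VALUE = re.compile(
    r'<([^>]*)>((?:\s*;\s*[\w*-]+\s*=\s*(?:"[^"]*"|[^\s;,]+))*)'
)
# One parameter: ;name="quoted value" or ;name=bare-value
PARAM = re.compile(r';\s*([\w*-]+)\s*=\s*(?:"([^"]*)"|([^\s;,]+))')


def parse_link_header(value: str) -> List[Tuple[str, Dict[str, str]]]:
    """Split a Link header value into (target URI, parameters) pairs."""
    links = []
    for match in LINK_VALUE.finditer(value):
        target, raw_params = match.group(1), match.group(2)
        params = {}
        for name, quoted, bare in PARAM.findall(raw_params):
            params[name.lower()] = quoted or bare
        links.append((target, params))
    return links


def find_cite_as(value: str) -> Optional[str]:
    """Return the first link target whose rel includes 'cite-as', if any."""
    for target, params in parse_link_header(value):
        # rel may hold several space-separated relation types.
        if "cite-as" in params.get("rel", "").lower().split():
            return target
    return None


if __name__ == "__main__":
    # Hypothetical header, as a publisher landing page might send it.
    header = (
        '<https://doi.org/10.1234/example>; rel="cite-as", '
        '<https://publisher.example/article.pdf>; rel="item"; '
        'type="application/pdf"'
    )
    print(find_cite_as(header))
```

A reference manager or crawler that lands on the publisher's own URL could use a lookup like this to record the DOI URI instead, which is the behaviour the signposting proposal aims to make automatic.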
The last official plenary speaker was Carly Strasser, Program Officer for the Data-Driven Discovery Initiative at the Gordon and Betty Moore Foundation. She had the (un?)enviable task of drawing together everything that went on at PIDapalooza and she did so with aplomb – and a little Lollapalooza inspiration. Strasser described her talk, ‘Reaching Nirvana: The Future of Persistent Identifiers’,10 as ‘a “Greatest Hits” of takeaways, lessons learned, points for discussion, and new directions’.
As mentioned, there was also an unofficial plenary – a very well-attended (and lively!) session on organization identifiers. Led by Patricia Cruse, Laure Haak and Ed Pentz (respectively Executive Directors of DataCite, ORCID and Crossref), it began with an update on the work that the three organizations had undertaken to review the current work on organization identifiers and define use cases. The rest of the time was spent on a wide-ranging discussion about next steps, with a range of (sometimes divergent) views expressed. However, there was general agreement that none of the current providers of organization identifiers meet all scholarly communications use cases – especially in terms of researcher affiliations – and there was support for a community working group to seek a solution to this challenge.
The response to PIDapalooza 2016 was enthusiastic, so we are now planning the next one, to be held in Girona, Spain on 23–24 January 2018. As with its predecessor, the goal of PIDapalooza 2018 is to create an open, welcoming atmosphere in which to discuss persistent identifiers, and it is open to anyone who creates or uses PIDs.
Content will fall into eight broad themes:
Are PIDs better in our minds than in reality? PID stands for Persistent IDentifier, but what does that mean and does such a thing exist?
So many factors affect persistence: mission, oversight, funding, succession, redundancy, governance. Is open infrastructure for scholarly communication the key to achieving persistence?
PIDs for emerging uses
Long-term identifiers are no longer just for digital objects. We have use cases for people, organizations, vocabulary terms, and more. What additional use cases are you working on?
There are thousands of venerable old identifier systems that people want to continue using and bring into the modern data citation ecosystem. How can we manage this effectively?
What would make heterogeneous PID systems ‘interoperate’ optimally? Would standardized metadata and APIs across PID types solve many of the problems, and, if so, how would that be achieved? What about standardized link/relation types?
It’s a challenge for those who provide PID services and tools to engage the wider community. How do you teach, learn, persuade, discuss and improve adoption? What does it mean to build a pedagogy for PIDs?
Which strategies worked? Which strategies failed? Tell us your horror stories! Share your victories!
Kinds of persistence
What are the frontiers of ‘persistence’? We hear lots about fraud prevention with identifiers for scientific reproducibility, but what about data papers promoting PIDs for long-term access to reliably improving objects (software, preprints, data sets) or live data feeds?
The programme for PIDapalooza 2018 (which of course has its own DOI!)11 is not finalized at the time of writing, since proposals are still being accepted, but I can guarantee that the content will be just as diverse and thought-provoking as the last one and that the level of audience participation and engagement will be just as high. You can find out more on the pidapalooza.org website, register at http://pidapalooza2018.eventbrite.com, and follow @pidapalooza for updates on speakers, sessions, and more. (See Figure 1 for the official logo and a reminder of the dates.)