The identifiers that underpin everything in research need stronger foundations, says Ulrich Herb
Persistent identifiers (PIDs) are vital components of modern scholarly communication and research, essential for identifying and categorising a wide array of digital and non-digital entities. Perhaps the best-known example is the Digital Object Identifier (DOI), a unique label assigned to objects or entities such as scholarly papers that ensures their distinctiveness and directs users to their locations.
PIDs are more than just labels and signposts. They also provide valuable metadata, offering insights into the content and technical attributes of the identified object. This facilitates connections with other entities, interlinking and integrating research objects, and creating a network known as the research graph. This web of information gives a comprehensive view of research activities, encompassing individuals, institutions, methodologies and outcomes, and an overview of the complex landscape of open science.
PIDs also increase efficiency: reference-manager software can use DOIs to look up metadata, saving authors time, while current research information system software can be effortlessly updated with publication and project data, reducing workloads.
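As a rough illustration of the kind of lookup a reference manager might perform, the sketch below asks the doi.org resolver for machine-readable metadata rather than the landing page, using HTTP content negotiation. It assumes the DOI's registration agency supports the CSL JSON media type (Crossref and DataCite do); the function names are invented for this example, and 10.1000/xyz123 is only a placeholder DOI.

```python
import json
import urllib.parse
import urllib.request

# Media type requested via content negotiation (assumed to be supported
# by the registration agency behind the DOI).
CSL_JSON = "application/vnd.citationstyles.csl+json"

# Common ways a DOI is written in reference lists.
_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise_doi(raw: str) -> str:
    """Return the bare DOI, with surrounding whitespace and any known prefix removed."""
    doi = raw.strip()
    for prefix in _PREFIXES:
        if doi.lower().startswith(prefix):
            return doi[len(prefix):]
    return doi


def build_request(raw_doi: str) -> urllib.request.Request:
    """Build (but do not send) a metadata request for the given DOI."""
    doi = normalise_doi(raw_doi)
    url = "https://doi.org/" + urllib.parse.quote(doi, safe="/")
    return urllib.request.Request(url, headers={"Accept": CSL_JSON})


def fetch_metadata(raw_doi: str, timeout: float = 10.0) -> dict:
    """Send the request and parse the JSON response (requires network access)."""
    with urllib.request.urlopen(build_request(raw_doi), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Placeholder DOI: shows the request that would be sent, without sending it.
    req = build_request("doi:10.1000/xyz123")
    print(req.full_url, req.get_header("Accept"))
```

Calling `fetch_metadata` with a real DOI would typically return fields such as title, authors and publication date, which is the information that spares authors from retyping references by hand.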
In 2022, a team of consultants estimated that Australia’s public research sector could save person-days worth A$24 million (€15m) per year by making better use of PIDs, potentially rising to A$84 million if the costs linked to technology transfer and innovation-driven growth were taken into account.
Finally, PIDs also contribute to scientific reproducibility by enabling computers to manage a vast range of digital items from diverse providers worldwide.
Trust and risk
Reaping the full benefits of PIDs, however, relies on seamless integration and robust interoperability that are currently lacking. In late 2021, Knowledge Exchange (KE), a coalition of six national providers of digital infrastructure and services for higher education and research, enlisted my colleagues and me to identify the best way to develop PID infrastructure for KE member states and beyond.
We focused on perceptions of trust and risk, which have a strong influence on whether PIDs are adopted. Interviews with service providers, repository operators, funding agencies and other specialists showed that PIDs have both social and technical dimensions, with trust stemming from the credibility of providers as well as technical features. Interviewees identified risks as political, economic, social and technological. Political risks include organisational changes that lead to a service’s discontinuation, while economic risks revolve around the sustainability of funding.
Social risks involve community support, system adoption and, for the operation of more niche PIDs, reliance on key individuals. Technological risks focus on the quality of metadata, interoperability of different systems and scalability.
These risks are not hypothetical. For example, the persistent uniform resource locator, Purl, a form of permanent online address, nearly disappeared when its inventor announced that it would stop supporting the technology. This was averted by a partnership with the Internet Archive. The loss could have had dire consequences, as Purls are used to address many different entities in research and are an important part of metadata systems.
Different PIDs for research objects and actors may also compete or overlap, leading to fragmentation and inefficiencies. This risk is acute for new PIDs, such as those developed for instruments and facilities.
Addressing these risks requires coordinated action from national-level bodies, funders and research organisations. Research funders can set a good example by adopting a consistent set of PIDs and issuing grant IDs for funded research. The Plan S initiative on open-access publication could aid this effort by designating and promoting their use.
Another recommendation is the creation of a federation to provide a snapshot and register of common and emerging PIDs, while aiding the design and harmonisation of digital infrastructure. Such a body is not yet in sight, but is urgently needed to head off risks such as fragmentation and ineffective PID implementations.
The effort would be more organisational than financial, as the federation should not offer services itself: concentrating activity in this way would be a huge risk if the federation were ever dissolved.
Organisations with the social capital necessary to lead on such an initiative include the European Science Foundation, the OpenAIRE coalition of open-science infrastructure providers or a consortium of stakeholders of ICT infrastructures for higher education and research. The EU-funded Faircore4Eosc project to develop the European Open Science Cloud would also be a potentially interesting candidate, since its work plan includes the development of research graphs built on PIDs as part of the Eosc.
Ulrich Herb is an associate and consultant at Scidecode Science Consulting in Berlin
This article also appeared in Research Europe