Introduction

‘The original idea of the web was that it should be a collaborative space … by writing something together [people] could iron out misunderstanding.’


The world wide web has both fulfilled and fallen short of its early promise. Its impact on how we access and share information has been revolutionary. Individuals and society have undoubtedly used it to ‘cross barriers and connect cultures’, as its inventor, Tim Berners-Lee, hoped. However, the web has also become a tool of capitalist hegemony and political disruption, where a small number of powerful corporations control information on an unprecedented scale and conspiracy theories can propagate with potentially damaging consequences for democracy, ecology and global equality. Meanwhile, ‘experts’ are derided, and climate change deniers and other contrarians can reach huge audiences via online platforms.

In March 2017 Berners-Lee himself expressed concern about the web’s future, highlighting how easily misinformation can spread, due in part to corporations harvesting and abusing personal data. A fragmented landscape of platforms with different agendas, some spreading outright disinformation, can lead users to retreat into a ‘filter bubble’ of trusted friends and family on social media, leaving them vulnerable to algorithmically targeted messages with a political or commercial agenda. In November 2019 Berners-Lee announced a new initiative from the World Wide Web Foundation proposing a set of principles for governments, companies and citizens to ‘make our online world safe and empowering for everyone’.

In principle, we should be the best-informed population in history, with more peer-reviewed research produced and published online than ever before. Yet this research often fails to reach the wider public. Much of it is still behind paywalls, or is not translated into other languages or summarized in plain language for a lay audience. As disinformation becomes more aggressive, it has never been more urgent to actively communicate the results and methods of research to the public and to better equip people with digital and information literacy skills. With these goals in mind, this article considers ways that universities can promote a web in line with Berners-Lee’s original vision.

The free encyclopedia that anyone can edit

‘Imagine a world in which every single person on the planet is given free access to the sum of all human knowledge.’


There is one domain on the web where the utopian vision of the early years still applies: Wikipedia. ‘The free encyclopedia that anyone can edit’ is currently the tenth most popular website in the world and the fourth most visited ‘Western’ domain (global rank in brackets), behind Google (1), YouTube (2) and Facebook (7) and ahead of major online service destinations including Amazon (13), Netflix (21) and Twitter (35).

Wikipedia is one of the most recognized brands on the web, yet its mission and policies are in marked contrast to nearly all other online media. The other Western sites in the top 50 are sustained either through advertising revenue (Google, Facebook, Twitter) or by providing a commercial service (Amazon, Netflix, PayPal). Wikipedia is a charitable project driven by a belief in universal free access to knowledge as a public good.

At the time of writing, English Wikipedia comprises just over 6 million articles and has more than 38 million registered users. Only a minority of users contribute regularly, however: in the region of 100,000–150,000, and fewer still, usually cited at around 3,000, are considered to be ‘very active Wikipedians’ with more than 100 edits per month.

Of course, Wikipedia is far from unique in relying on user-generated content. Its contributing user base is a fraction of that of the big social media sites such as Facebook and Twitter, which have 2.4 billion and 330 million users, respectively. These platforms facilitate the exchange of information, but in a manner that is largely ephemeral and algorithmically driven, with no systematic checking of its veracity. Wikipedia, by contrast, is a permanent and evolving source of verifiable information that retains a transparent record of edits over time and is built on a set of fundamental principles.

Facebook has recently committed to independent fact-checking of news on the site but has been criticized for not extending this to political advertising. Twitter, meanwhile, has banned paid political content, although this does not prevent unverified or ideologically motivated misinformation from spreading organically.

Universal access to a summary of all human knowledge is an aspiration, and Wikipedia is clear that it falls short in several respects. Only 18 per cent of biographies on the site are about women, and there are major discrepancies in geographical coverage, with more articles about the Netherlands than about the whole continent of Africa. These inequalities stem both from the culture of the site and from wider social issues, such as patriarchy or the uneven availability of broadband internet. Gender inequality is also a feature of professionally published sources: women account for only eight per cent of biographies in the Oxford Dictionary of National Biography, for example.

Wikimedia

Wikipedia is just one of 16 interconnected projects that are also linked to a wider ecosystem of sites and apps. Wikimedia Commons is a repository of openly licensed media files including photographs, diagrams, video and audio. Wikisource is a free library of out-of-copyright texts, while Wikiversity and Wikibooks encourage collaborative creation of open educational resources (OERs). The fastest growing Wikimedia project is Wikidata, a store of structured data that can be read and edited by humans or machines. See Figure 1 for the full family of Wikimedia projects.

Figure 1 

The icons of the 16 Wikimedia projects

Whereas commercial sites aspire to traffic and ‘clicks’, the goal of Wikimedia is to share knowledge freely in the most convenient form for its users. These platforms therefore encourage reuse of text, images and data in other sites, publications and printed material, and even on CD-ROMs or USB drives sent to schools in remote areas of the world.

The fundamental principles of Wikipedia

‘Wikipedia … is not the bottom layer of authority, nor the top, but in fact the highest layer without formal vetting.’


Like any encyclopedia, Wikipedia’s role is tertiary. It does not collect or analyse raw data to draw conclusions; that is the role of researchers and traditional peer review. Contributors are explicitly forbidden from posting their own opinions or theories, at least until these are published in the peer-reviewed literature. Rather than deciding what is true, the Wikipedia community arbitrates on what is verifiable from reputable sources. Formal review processes exist, but they do not evaluate the underlying research; they assess only whether an article fairly summarizes its sources and whether those sources are of high quality.

Wikipedia, therefore, does not compete with the scholarly literature but makes it accessible to the widest possible audience. Improving an article means that more, not fewer, people read the peer-validated literature, because readers follow links to cited sources. It is a founding principle that academic consensus reflected in peer-reviewed literature is the best available claim to knowledge. This pro-expert, pro-scholarship ethos contrasts with many other media sources and online communities, in both their editorial line and their invitation to contribute personal comment and opinion.

Unlike traditional academic publication where the reader is shielded from the editorial process, Wikipedia is entirely transparent. Each article has a ‘Talk page’ for contributors and editors to publicly discuss how it can be improved, with discussions permanently archived and available for scrutiny. (See Figure 2 for an example.)

Figure 2 

The ‘Article milestones’ section on the Talk page shows the formal reviews to which the Amphetamine article was subjected before earning its ‘featured article’ badge. (Screenshot with article milestones expanded.)

Wikipedia is particularly sensitive about conflict of interest. For example, it would be inappropriate for an editor affiliated with Nike to edit the sweatshop article to remove or soften references to anti-sweatshop protests directed against the corporation. This sensitivity can cause frustration for university staff, who might find their motives questioned when writing about their institution or their own work. Fortunately, there are ways to suggest improvements to an article while being open about any potential conflicts of interest. It is worth working with the system rather than against it.

Education and information literacy

‘For God sake [sic], you’re in college; don’t cite the encyclopedia.’


A 2011 investigation in the UK and US found that many students used Wikipedia for their homework, often with a sense of guilt because they had been advised against it by teachers. Yet Wikipedia helped them to find information that would be marked as correct. Teachers who warned against its use merely pushed the practice underground. Treating Wikipedia as an unconditionally reliable source is no more desirable and goes against the purpose of the site as an accessible summary of reliable sources. A better approach is to use Wikipedia’s variable quality and open editing as an educational opportunity. By taking part in the codification of knowledge, students experience for themselves the debate and judgement involved.

Assignments to improve Wikipedia are already used at universities around the world. North American students have added 60 million words, and similar work in many other countries, including the UK, has added greatly to its content. When you read about a psychological or management theory, or a rural English parish, you may well be reading student work that was assessed as part of a degree. These assignments are often set for final-year undergraduates in lieu of a dissertation, but they can also be introduced earlier to encourage good habits of fact-checking, citation and giving constructive feedback. Translation assignments are another educational opportunity, because it is easy to find articles in a given topic area and language that lack English equivalents, or vice versa.

Usability and documentation of the Wikimedia platforms can be frustrating and, in addition to the editing process itself, new users need to learn how the community works. Like any publisher, Wikipedia has a scope (what it will or will not publish), a house style and standard ways to resolve disagreement. It is advisable to engage an experienced trainer who can also help to identify articles to improve. Wikimedia UK, the national charity supporting the Wikimedia projects, has a roster of trainers and maintains an informal network of academics running Wikipedia assignments for mutual advice and support.

Many organizations go a step further and employ a Wikimedian in Residence (WiR) to deliver training within and outside the organization and liaise with the online community. They are not paid to directly improve Wikipedia but to share skills and content that empower others to do so. Organizations including the National Library of Wales, the Royal Society of Chemistry, the Scottish Library and Information Council, the Wellcome Library and Jisc all employ, or have employed, WiRs.

Wikimedia and universities

We have seen how Wikipedia occupies a distinctive place in the information ecosystem, linking informal discussion to scholarly publications. Universities can build this bridge by adding links to open access (OA) versions of cited articles in their repositories or linking digitized theses from biographies of notable alumni. OA papers or lay summaries which review a topic rather than presenting original research can even be used wholesale to create new articles.

A WiR can help to embed the practices described in Table 1.

Table 1

The goals of universities and how they can utilize Wikimedia

Universities want … | … and can:
impact for research projects | share openly licensed text and images to improve Wikipedia articles.
use of institutional repositories | make sure links are included in Wikipedia citations and Wikidata bibliographic records; create researcher profiles in Wikidata.
engaging assessment for students | use Wikipedia or Wikibooks as a platform for writing assignments.
use of library special collections | share images and data to help create educational materials and encourage incoming links.
use of specialist databases | share surface-level data from the database with Wikidata and create links.
public engagement | run events or campaigns where attendees improve coverage of a topic on Wikimedia platforms.

The University of Edinburgh and Coventry University both currently employ WiRs, while the University of Oxford employed one for four years ending in 2019. The University of Bristol was the first, hosting a summer placement in 2011.

Of course, university staff are likely to be engaged as volunteer contributors in their own right, for example by citing primary research on Wikipedia. Academics are obviously well placed to improve the encyclopedia; indeed, as domain experts they might reasonably be expected to do so, albeit with some care around potential conflicts of interest: exclusively and cynically citing their own or their institution’s research would be inappropriate.

Copyright and the importance of open access

‘A measure of a paper’s standing may be conveyed by the number of links it is away from an encyclopaedia.’


Given Wikipedia’s broad audience, it is especially important that cited research is available OA. Preliminary research across the White Rose Consortium (the Universities of Leeds, Sheffield and York) has found that approximately half of citations are behind a paywall. At the time of writing, there are over 600 links to records in White Rose Research Online (the institutional repository shared by the three universities), and Wikipedia is a significant referrer to the repository.
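The paywalled share mentioned above is a simple proportion. A minimal sketch of how it could be computed, assuming each cited DOI has been looked up in Unpaywall and that every returned record carries Unpaywall's boolean `is_oa` field; the sample records are invented for illustration and are not the White Rose data:

```python
from urllib.parse import quote


def unpaywall_url(doi: str, email: str) -> str:
    """Build an Unpaywall API v2 lookup URL for one DOI (Unpaywall asks for a contact email)."""
    return f"https://api.unpaywall.org/v2/{quote(doi)}?email={quote(email)}"


def paywalled_share(records: list[dict]) -> float:
    """Fraction of records whose `is_oa` flag is False, ignoring records without the flag."""
    known = [r for r in records if "is_oa" in r]
    if not known:
        raise ValueError("no records with an is_oa flag")
    return sum(1 for r in known if not r["is_oa"]) / len(known)


# Invented sample records standing in for Unpaywall responses.
sample = [
    {"doi": "10.1234/a", "is_oa": True},
    {"doi": "10.1234/b", "is_oa": False},
    {"doi": "10.1234/c", "is_oa": False},
    {"doi": "10.1234/d", "is_oa": True},
    {"doi": "10.1234/e"},  # failed lookup: excluded from the denominator
]
print(paywalled_share(sample))  # 0.5
```

Excluding failed lookups, rather than counting them as closed, keeps unknowns from inflating the paywalled figure.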

Openly licensed research outputs make it much easier to add scholarly information to Wikipedia, which itself uses a Creative Commons Attribution-ShareAlike (CC BY-SA) licence. Text or figures under this or a more liberal licence (CC BY or CC0) can easily be reused with appropriate attribution. Some academic journals have adapted their papers into Wikipedia articles in this way.
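The licence rule above can be expressed as a rough check. This is a simplified sketch under stated assumptions: it inspects only the licence code, treats CC0 and plain CC BY or CC BY-SA as reusable, treats the NonCommercial (NC) and NoDerivatives (ND) terms as incompatible with Wikipedia's CC BY-SA licence, and ignores licence versions:

```python
def reusable_on_wikipedia(licence: str) -> bool:
    """Rough check: can material under `licence` be reused on Wikipedia with attribution?"""
    code = licence.strip().upper()
    if code.startswith("CC0") or code == "PUBLIC DOMAIN":
        return True
    if not code.startswith("CC BY"):
        return False  # all-rights-reserved or unrecognized licences: assume not reusable
    # "CC BY-NC-SA 4.0" -> {"BY", "NC", "SA"}; the version number is ignored.
    terms = set(code[3:].split()[0].split("-"))
    return not terms & {"NC", "ND"}


for lic in ["CC BY 4.0", "CC BY-SA 4.0", "CC0", "CC BY-NC 4.0", "All rights reserved"]:
    print(lic, reusable_on_wikipedia(lic))
```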

Knowledge as a service

The Wikimedia Foundation strategy uses the phrase ‘knowledge as a service’ to describe how the Wikimedia projects can improve other sites, databases and apps. This reuse means that beyond its headline readership, there is an even larger audience who encounter content from Wikipedia on other platforms. Facebook and YouTube use extracts to provide contextual information about videos and news outlets, while Google extracts text and key facts for the boxes that appear alongside search results. Voice assistants like Siri and Alexa mine Wikipedia and Wikidata to answer questions. These services sometimes strip extracts of vital context or fail to make clear the provenance of the information.

The latest Wikimedia tool to make knowledge freely available to sites and apps is Wikidata, an example of a knowledge graph that represents knowledge through the connections between things. For example, a paper has an author who has a nationality, a name and date of birth; they graduated from a particular university, which in turn has a geographic location, a vice-chancellor, other notable alumni, and so on. As with Wikipedia, Wikidata is not meant as a platform for original research; all information must already have been published by a reliable source.
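The chain of connections described above can be sketched as a list of (subject, property, object) triples. In this toy example every entity is a hypothetical placeholder rather than a real Wikidata item, and the helper names are illustrative; the point is only that answering a question means following edges forwards and backwards through the graph:

```python
# A toy knowledge graph: each fact is a (subject, property, object) triple.
# All entities are hypothetical placeholders, not real Wikidata items.
TRIPLES = [
    ("Paper A", "author", "Researcher B"),
    ("Researcher B", "nationality", "Country C"),
    ("Researcher B", "educated at", "University D"),
    ("University D", "located in", "City E"),
    ("Researcher F", "educated at", "University D"),
]


def objects(subject: str, prop: str) -> list[str]:
    """Follow an edge forwards: everything `subject` points to via `prop`."""
    return [o for s, p, o in TRIPLES if s == subject and p == prop]


def subjects(prop: str, obj: str) -> list[str]:
    """Follow an edge backwards: everything that points to `obj` via `prop`."""
    return [s for s, p, o in TRIPLES if p == prop and o == obj]


def fellow_alumni(paper: str) -> set[str]:
    """Other graduates of the universities attended by the authors of `paper`."""
    found = set()
    for author in objects(paper, "author"):
        for uni in objects(author, "educated at"):
            found.update(a for a in subjects("educated at", uni) if a != author)
    return found


print(objects("University D", "located in"))  # ['City E']
print(fellow_alumni("Paper A"))  # {'Researcher F'}
```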

Whereas Wikipedia has separate language versions, Wikidata is a single site with contributors working in hundreds of languages. This makes it a hub connecting identifiers from thousands of disparate systems, which can be queried using the database query language SPARQL to answer all manner of questions and build data visualizations.
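As an illustration of the identifier hub idea, the sketch below builds a SPARQL query that looks up the item holding a given ORCID iD (property P496) and returns any VIAF ID (property P214) recorded for the same item, then wraps it in a GET request URL for the public Wikidata Query Service endpoint. The helper names are hypothetical, we assume the endpoint honours a `format=json` parameter, and the network request itself is not made here:

```python
import re
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def crosswalk_query(orcid: str) -> str:
    """SPARQL: find the item with this ORCID iD (P496) and any VIAF ID (P214) it holds."""
    if not re.fullmatch(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]", orcid):
        raise ValueError(f"not an ORCID iD: {orcid!r}")  # also guards against query injection
    return f"""SELECT ?person ?viaf WHERE {{
  ?person wdt:P496 "{orcid}" .
  OPTIONAL {{ ?person wdt:P214 ?viaf . }}
}}"""


def request_url(query: str) -> str:
    """GET URL asking the query service for JSON results."""
    return f"{WDQS_ENDPOINT}?{urlencode({'query': query, 'format': 'json'})}"


url = request_url(crosswalk_query("0000-0002-1825-0097"))
# Fetching `url` (e.g. with urllib.request) would return SPARQL JSON results; not done here.
print(url)
```

The same pattern works for any pair of external identifiers that Wikidata stores as properties.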

An unusual aspect of Wikidata is that it does not aim for consistency. Where a fact is contested in the scholarly literature – multiple possible birth years for a historical figure, for instance – it can hold each contradictory statement and link each one to its source reference. Because every statement carries its own reference, a query can return statements from just one type of source, for example papers published in peer-reviewed journals in the last decade.
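The idea of keeping contradictory statements, each tied to a reference, and filtering by source can be sketched in a few lines. This is a simplified model, not Wikidata's actual data structure: in Wikidata, references are attached through properties such as ‘stated in’ (P248), whereas here each statement carries a hypothetical source type label and publication year, and the birth years are invented:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """One claimed value plus a summary of the reference that supports it."""
    value: int          # e.g. a candidate birth year
    source_type: str    # hypothetical label, e.g. "peer-reviewed journal"
    source_year: int    # publication year of the cited source


# Invented, mutually contradictory claims about one person's birth year.
BIRTH_YEAR = [
    Statement(1420, "peer-reviewed journal", 2015),
    Statement(1422, "biographical dictionary", 1990),
    Statement(1423, "peer-reviewed journal", 2003),
]


def values_from(claims: list[Statement], source_type: str, since: int) -> list[int]:
    """Keep only claims backed by `source_type` sources published in or after `since`."""
    return [c.value for c in claims if c.source_type == source_type and c.source_year >= since]


print(values_from(BIRTH_YEAR, "peer-reviewed journal", since=2010))  # [1420]
```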

Figure 3 

Data model in Wikidata

Open scholarly profiles

One significant use of Wikidata is for citation data. A SPARQL query can generate a list of the most cited authors on climate change, or generate a timeline of papers about the 2019–20 Coronavirus outbreak. The Wikidata entry for a paper can describe its copyright status and provide multiple links including preprints, making it easier to find an OA version.
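A query of the kind described above might look like the sketch below, which ranks the authors of papers on a topic by how often those papers are cited, using the main subject (P921), author (P50) and cites work (P2860) properties. The function name is hypothetical, the topic must be supplied as a Wikidata item ID that you look up yourself (Q123 is only a placeholder), and on very large topics such a query may time out on the public service:

```python
import re


def most_cited_authors_query(topic_qid: str, limit: int = 10) -> str:
    """SPARQL ranking authors of papers on a topic by how often those papers are cited.

    Uses main subject (P921), author (P50) and cites work (P2860).
    """
    if not re.fullmatch(r"Q\d+", topic_qid):
        raise ValueError(f"not a Wikidata item ID: {topic_qid!r}")
    return f"""SELECT ?author (COUNT(?citing) AS ?citations) WHERE {{
  ?paper wdt:P921 wd:{topic_qid} ;
         wdt:P50 ?author .
  ?citing wdt:P2860 ?paper .
}}
GROUP BY ?author
ORDER BY DESC(?citations)
LIMIT {int(limit)}"""


# Q123 is a placeholder item ID, not the climate change item.
print(most_cited_authors_query("Q123", limit=5))
```

Authors recorded only as name strings, rather than linked items, are invisible to a query of this kind, which is one reason linking them matters.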

Wikidata differs from a purely bibliographic database in that researchers and their publications are described on the same platform as the things – people, places, genes, species, compounds – that the papers are about. A claim that a pharmaceutical is effective for a given disease in a given species can be linked to papers that establish that statement. As these data become more complete, literature reviews can be semi-automated, saving time.

One application that explores this data set is Scholia which, in addition to individual researchers’ profiles, can display papers relating to a given topic, published in a particular venue or originating with researchers at a given institution. It can also highlight collaborations or other links between researchers. While platforms like Elsevier’s Scopus or ResearchGate also collect this type of information, Scholia is distinctive in that the data are free and open and not monetized in any way. In 2019 the Sloan Foundation announced half a million dollars of funding to further develop the platform.
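Collaboration links of the kind Scholia displays can be derived from simple authorship data. The sketch below is not how Scholia itself is implemented (it runs SPARQL queries against Wikidata); it only illustrates the underlying idea, counting how many papers each pair of authors shares, using invented (paper, author) pairs such as a query over the author property (P50) might return:

```python
from collections import Counter
from itertools import combinations

# Hypothetical (paper, author) pairs; names are invented for illustration.
AUTHORSHIPS = [
    ("paper1", "Ada"), ("paper1", "Ben"), ("paper1", "Cai"),
    ("paper2", "Ada"), ("paper2", "Ben"),
    ("paper3", "Cai"), ("paper3", "Dee"),
]


def coauthor_counts(pairs: list[tuple[str, str]]) -> Counter:
    """Count how many papers each unordered pair of authors has written together."""
    authors_by_paper: dict[str, set[str]] = {}
    for paper, author in pairs:
        authors_by_paper.setdefault(paper, set()).add(author)
    counts: Counter = Counter()
    for authors in authors_by_paper.values():
        for pair in combinations(sorted(authors), 2):  # sorting gives each pair one key
            counts[pair] += 1
    return counts


print(coauthor_counts(AUTHORSHIPS).most_common(1))  # [(('Ada', 'Ben'), 2)]
```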

Universities can improve Scholia by adding repository links to the Wikidata records for published papers and tagging with appropriate topics. Where publishers have added data with ‘author name strings’, links can be added to authority files such as ORCID. Author profiles can be improved with the addition of profile links (ORCID, ResearchGate, Google Scholar, Twitter, etc.).
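Before an author name string is replaced with a link to an ORCID record, it is worth checking that the iD is well formed. ORCID iDs end in a check character computed with the ISO 7064 MOD 11-2 algorithm, which the sketch below implements. It validates only the iD's structure, not that the record exists or belongs to the right person; in Wikidata the iD itself would be stored with the ORCID iD property (P496). The function names are illustrative:

```python
import re

ORCID_FORMAT = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """True if `orcid` is well formed and its final character matches the checksum."""
    if not ORCID_FORMAT.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False (one digit mistyped)
```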

A 2019 report by the Association of Research Libraries recommends Wikidata as an authority hub, a platform for community outreach and for representation of diverse communities beyond the Western canon. The Library of Congress now makes Wikidata links visible in its authority file and VIAF, the Virtual International Authority File, also harvests information from Wikidata. Gradually, the site is becoming a hub connecting thousands of other databases and knowledge systems.

Community engagement

Being open and community based, Wikimedia is effective in engaging students or a wider public around a particular topic. Events or campaigns can be organized around a goal for participants to work towards, or around a new research resource that participants use to improve free knowledge.

Examples that have been run in UK universities include:

  • Editathon: improve Wikipedia articles by adding cited facts. This can include creating new articles, but requires a combination of skills that people do not normally pick up in one session.
  • Transcribe-a-thon: Wikisource, ‘the free library’, is a platform on which users create definitive electronic versions of public domain texts by correcting errors in optical character recognition (OCR). One event celebrating women in science created a text version of a paper and a booklet by the 19th-century scientist Mary Somerville.
  • Image-a-thon: participants make use of images from a freely licensed collection by identifying Wikipedia articles that the images could illustrate and adding them with an appropriate caption.

Promoting engagement in universities

‘Universities really can’t afford not to have a Wikimedian in Residence these days. It still surprises me how few do.’

Melissa Highton, Director of Learning, Teaching and Web Services,


With over 150 higher education institutions in the UK, it is notable that so few currently employ, or have ever employed, a Wikimedian in Residence. Other universities are undoubtedly exploring these tools without a formal role, though this activity is difficult to quantify. From October to December 2019 we ran an informal survey to gauge attitudes to Wikimedia in universities; summary data are presented in the appendix. The evidence is that very few institutions include Wikimedia in their strategy and, anecdotally at least, negative attitudes to student engagement with Wikipedia in particular persist. Nevertheless, university collections are increasingly linked across Wikipedia, if not strategically by universities themselves then organically, by virtue of being easily discoverable and openly licensed in research repositories.

Using the ‘insource’ search parameter on a special page that searches across all of Wikipedia, it is easy to find links to the URL of any domain, including institutional repositories. The associated data set indicates that there are currently around 6,500 links from Wikipedia to Russell Group institutional repositories, as well as 1,200 to theses from the British Library EThOS service (snapshot data from 8 December 2019; an attempt was made to crowdsource data across all UK-based repositories, but this is incomplete). The best-linked repositories are UCL and White Rose, with 844 and 829 links, respectively. Repositories of OA publications and/or theses are well linked, but there are very few links to data repositories, with the exception of Zenodo, which has nearly 8,000 links, roughly ten times as many as any other repository.
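A search of this kind can be reproduced by building a Special:Search URL with an insource query. This is a sketch under assumptions: the helper name is hypothetical, the domain shown is assumed to be the White Rose Research Online host, and `ns0=1` and `fulltext=1` are assumed to restrict results to articles and to force a results list. A quoted insource phrase matches tokenized wikitext, so hit counts may differ slightly from an exact external-link count such as the MediaWiki API's exturlusage list provides:

```python
from urllib.parse import urlencode


def insource_search_url(domain: str, wiki: str = "en.wikipedia.org") -> str:
    """Special:Search URL for articles whose wikitext contains `domain`."""
    params = {
        "search": f'insource:"{domain}"',  # match the phrase in page source
        "title": "Special:Search",
        "ns0": 1,        # article namespace only
        "fulltext": 1,   # show a results list rather than jumping to a page title
    }
    return f"https://{wiki}/w/index.php?{urlencode(params)}"


print(insource_search_url("eprints.whiterose.ac.uk"))
```

Running the same function over a list of repository domains gives a quick, repeatable snapshot of how well each repository is linked.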

Using these data as a starting point, we describe below a brief case study of grass-roots activity through several Library-based initiatives at the University of Leeds to engage staff and senior management.

Case study: Leeds University Library

There is increasing emphasis on open research practices in universities, focused on ensuring research outputs are freely available to reuse and redistribute. While this agenda is driven largely by the replicability crisis, it also contributes to the impact of research and its broader contribution to society. It enables other organizations and the public to actively engage in knowledge production through collaboration and contribution of their own expertise, for example through citizen science projects. Wikimedia taps directly into this movement through its open infrastructure that easily enables research outputs, media and other digital assets to be distributed at scale with clear provenance and copyright information that can be linked back to institutional systems via persistent identifiers (PIDs) such as DOIs.

It was this principle that underpinned a successful proposal in 2018 to encourage good practice in research data management. With the support of co-sponsors Jisc, SPARC and the University of Cambridge, the project has sought to identify suitable media from research data repositories to upload to Wikimedia Commons. It has explored how this material can be used to improve Wikipedia to promote a cycle of sharing and reuse. The project has been challenging due to a still limited culture of sharing research data and lack of strategic engagement with Wikimedia, across the university and the sector at large. It has been valuable, however, to tease out synergies across the Library and the wider University and to consider ‘research data’ from a broader conceptual standpoint. The Special Collections and Research Support teams have begun to explore how their disparate collections can be brought together, for example, or presented in novel ways using Wikidata.

An academic library is well-placed to foster collaboration across its academic community, with connections to schools, research centres and other departments across campus. It is responsible for curating institutional collections, including OA research repositories and archival material, with related expertise in copyright, metadata and use of PIDs. In a large research-intensive university, however, the library itself can be prone to operating in its own silos, with ostensibly similar material managed by different teams with discrete workflows.

The data management engagement award has been a springboard for more strategic engagement across the Library, with colleagues from research support, Special Collections, metadata and collections development liaising on several projects and events. In 2016–2017 Special Collections ran an internship that created a Wikipedia article for its Cookery Collection, one of five designated collections held at Leeds. With the support of Wikimedia UK, the Research Support team ran an editathon in October 2019 and has liaised with several research projects, including collaborating on an event with colleagues from the School of Media and Communication. Events have also drawn colleagues from organizations beyond the University of Leeds, and a community of people interested in Leeds’ cultural institutions is beginning to develop, working together on Wikimedia projects across the city.

Next steps

We hope this article will go some way towards encouraging universities to consider more strategic engagement with Wikimedia in the contexts of information literacy, public engagement and the developing open research agenda. As with any form of crowdsourcing, it is the ‘net change over time’ that gives ‘the wiki way of working’ its power, though embracing it requires a paradigm shift in which libraries accept unpredictability, imperfection and diminished control.

Over half a decade on from the Jisc report Crowdsourcing – the wiki way of working, it is clear that such a paradigm shift has yet to happen. Options going forward might include more focused liaison with Wikimedia UK and the development of a toolkit to help universities engage with the Wikimedia suite of tools.

To explain how a seminar room in a university supports learning, we would need to talk not just about its whiteboard and WiFi but about the fact that people use it in particular ways. Similarly, Wikimedia projects are not just sites or software, but communities that create and transform text, images and data. This has implications for how universities engage with the projects and become active participants within those communities. An individual academic might run an impromptu seminar that adds some educational value for a small cohort, but seminars work much more effectively when they are properly planned and part of an integrated curriculum with standardized pedagogical methods. Unlike a physical whiteboard in a seminar room, the virtual whiteboards of Wikimedia are never erased, just continually edited and improved.

Data accessibility statement

Wikimedia Links to UK Repositories (snapshot from December 8, 2019) available from Zenodo https://doi.org/10.5281/zenodo.3567963.

Raw Altmetric and Unpaywall data available from Sheffield Online Research Data (ORDA) https://doi.org/10.15131/shef.data.12097797.v1.