Background

This paper was inspired by the discovery at two UK universities, the University of Surrey and the London School of Economics and Political Science (LSE), that following a pilot project by ProQuest to digitize theses at these institutions, digital theses in their respective collections scored an extraordinarily high number of downloads (when compared, say, with e-journals or e-books). The aim of the research was to gain a greater understanding of how digital theses, clearly an important academic resource, are used and how they fit into the scholarly resources landscape. The research focused particularly on the LSE collection of theses because they had been digitized most recently and were from a smaller, more interrelated group of academic disciplines than the Surrey collection. The LSE ranks 35th in the QS Global University Rankings for 2015.1

A literature search revealed that not much prior work had been carried out in this specific area,2 although there are several published articles on some of the challenges associated with setting up digital thesis collections. Some of these have been cited in this paper.

Usage statistics – measuring full-text PDF downloads both from the LSE’s own institutional repository and the ProQuest PQDT (ProQuest Dissertations and Theses) database – provided the main quantitative basis for the study. Where they were available, citation statistics were also used. Comparisons were made with statistics supplied by Surrey. Qualitative information to complement the statistics was obtained by carrying out three focus groups at the LSE, with undergraduates, postgraduates and librarians respectively, and by means of four semi-structured telephone calls with LSE academics working in different disciplines.

Growth in content correlated to growth in usage

The LSE digitization project commenced in 2014. By May 2015, 2,000 digitized theses had been uploaded to LSE Theses Online (LSETO)3. The decision was made to digitize theses from 2010-11 ‘backwards’ to the early 1990s. Authors were contacted and told they could opt out if they wished; only 14 chose to do so. Five take-down requests were received and complied with immediately.

The trend in downloads was upwards, and rapid. (See Figure 1.) The fast expansion of the number of theses available led to an immediate impact on downloads. The inescapable conclusion was that a higher volume of available theses attracts much more traffic. The resulting increase in the size of the repository had a temporary impact on downloads per item, which dropped briefly to an average of ten per item per month, but rapidly returned to the 2014 average of 15 downloads per item per month and at the time of writing (April 2016) is continuing to rise. Users come into the site from across the globe, with Western countries and others with large economies dominating.

Figure 1 

Relationship between number of theses in LSETO and downloads each month

Source and objectives of visitors

Figure 2 illustrates the key sources of traffic to LSETO from January 2011 to February 2016 (on the left) and from January 2015 to February 2016 (on the right). This shows that the expansion of the repository did not substantially change access methods. The entry point for around 80% of users remained constant, with Google dominant in directing traffic. The decline in traffic share from the LSE’s own website suggests that the LSE could probably do more to promote the thesis collection on its website. Many LSE referrals come from the past PhD students’ page (e.g. for the Department of Statistics) or the research students’ profiles (e.g. for the Department of Sociology). Others come from clicks on collection profiles on the library pages.

Figure 2 

Largest sources of traffic to LSE Theses Online

The top searches via Google landed on the LSETO home page, which shows that many users were carrying out general searches for LSE theses, perhaps just to view an example of a thesis in their own area of research. However, many searches also led to specific theses, which demonstrated both Google’s strong indexing capabilities4 and the LSE’s ability to contribute sought-after scholarly research from its thesis collection.

The key question that the project sought to investigate was how much impact the thesis collection was having on scholarly activity. Figure 3 shows download figures and Google Scholar (GS) citations for the LSE’s top ten downloaded theses. There was no real correlation between the numbers of downloads and the citations. Even some of the older theses (more than five years old), which had had a reasonable timeframe in which to make an impact, had only achieved one or two citations.

Figure 3 

Top ten downloaded theses in LSE Theses Online

It is tantalizing to speculate why theses are being viewed if not because of their direct academic impact. It may be that they are on a topical subject, have a broader societal impact, or are useful for reading lists. However, closer inspection of the LSE’s most downloaded thesis provided little additional information. It appears on an international relations theory website and is cited in many foreign Master’s theses, which may account for its popularity.

Figure 4 looks further at the download/citation relationship by examining the LSE theses with a strong (at least ten) citation count. The numbers of downloads for these varied considerably, and some were quite low. The cross-hatched bars show works for which there is both an original thesis and a subsequent publication with a similar title, and for which the records have been merged in GS. The cross-hatched bars stand out because they account for larger numbers of citations. This demonstrated that in order to achieve academic impact, a researcher really needs to produce a journal article/book chapter/book from the original work carried out for the thesis. However, the fact that not all have high downloads indicates that often readers do not go back to the original research output (the thesis) to investigate the work in greater depth. This therefore offers only modest proof that theses can have significant academic impact.

Figure 4 

Sample of theses with at least ten citations according to Google Scholar (GS). The cross-hatched bars show subsequent publications which have a similar title to the thesis. * denotes download statistics are from EThOS as the full text is not in LSETO. NB: The Dolan thesis has a related book publication, but the records are not merged in GS as the titles are substantially different

Table 1 shows further detail on the relationship between citations and downloads. It shows the top ten downloaded theses from the ProQuest digitization project. (They had therefore been available for approximately ten months online when the research was undertaken.) The theses that had been most sought after did not necessarily have the highest citation figures. The top downloaded thesis is highly topical: it is on the Greek economy. But it was written in 1999, before Greece’s current financial problems. Perhaps it has been downloaded by researchers to get an idea of whether economists at the time were aware of nascent problems in the Greek economy. Most important for this project was that the number of times it had been downloaded made a powerful case for digitizing older theses, thus giving alumni the opportunity to reintroduce their research to contribute to relevant contemporary issues. Furthermore, if LSE theses are still important and sought-after ten, 15 or 20 years after they were submitted, this can only enhance the reputation of the institution itself.

| Author (year), thesis title | Google Scholar citations | Downloads | Published |
| --- | --- | --- | --- |
| Konsolas, Ioannis (1999), The competitive advantage of nations: The case of Greece. | 0 | 1705 | |
| Yaffe, Helen (2007), Ernesto ‘Che’ Guevara: Socialist political economy and economic management in Cuba, 1959–1965. | 1 | 927 | |
| Ahuja, Monika Sangeeta (1996), Public interest litigation in India: A socio-legal study. | 1 | 900 | |
| Holt, Andrew Derek (2005), The role of management accounting within the development of environmental management systems. | 1 | 895 | |
| Mohamed, Mohamed Sameh Ahmed (1997), The role of the International Court of Justice as the principal judicial organ of the United Nations. | 49 | 686 | Book 2003 |
| Joao da Costa Cabral Andresen Guimaraes, Fernando (1992), The origins of the Angolan civil war. International politics and domestic political conflict 1961–1976. | 106 | 514 | Book 1998 |
| Baxell, Richard (2002), The British Battalion of the International Brigades in the Spanish Civil War, 1936–1939. | 1 | 672 | |
| Michelutti, Lucia (2002), Sons of Krishna: The politics of Yadav community formation in a north Indian town. | 4 | 520 | |
| Stubb, Alexander (1999), Flexible integration and the Amsterdam Treaty: negotiating differentiation in the 1996–97 IGC. | 12 | 491 | |
| Fuller, Harcourt (2010), Building a nation: Symbolic nationalism during the Kwame Nkrumah era in the Gold Coast/Ghana. | 1 | 483 | |

Table 1

Top ten downloaded theses from the ProQuest digitization project
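
The observation that the most sought-after theses did not necessarily have the highest citation figures can be checked informally against the ten rows of Table 1. The following is a minimal sketch, not part of the original analysis: it computes a Spearman rank correlation between citations and downloads in plain Python. The choice of Spearman's coefficient and the helper names (average_ranks, spearman_rho) are assumptions made for illustration; the figures are those printed in Table 1. Because the sample contains only ten items, all selected for high downloads, the result is indicative at best.

```python
from math import sqrt

# (author, Google Scholar citations, downloads), copied from Table 1
TABLE_1 = [
    ("Konsolas", 0, 1705),
    ("Yaffe", 1, 927),
    ("Ahuja", 1, 900),
    ("Holt", 1, 895),
    ("Mohamed", 49, 686),
    ("Baxell", 1, 672),
    ("Michelutti", 4, 520),
    ("Guimaraes", 106, 514),
    ("Stubb", 12, 491),
    ("Fuller", 1, 483),
]


def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


citations = [row[1] for row in TABLE_1]
downloads = [row[2] for row in TABLE_1]
rho = spearman_rho(citations, downloads)
print(f"Spearman rho, citations vs downloads: {rho:.2f}")
```

On these ten rows the coefficient comes out negative rather than positive, which is consistent with the point that high download counts do not track citation counts. A rank-based measure is used here because five of the ten theses tie on a single citation and two have far more citations than the rest, which would distort a correlation computed on the raw values.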

This paper has already touched on methods of access to the LSE digital thesis collection. Facebook is the largest social media source that directs traffic to LSETO. This differs from the main institutional repository, LSE Research Online,5 for which Twitter is the key source. One possible explanation for this is that new PhD graduates still hesitate to engage with a wider audience to promote their research, preferring to choose a platform that enables them to share their work with a smaller, closer circle.

Figure 5 shows two access ‘spikes’ that occurred. On 19 June 2015, 473 people landed on the ‘browse by year’ page, which suggests that they were looking for a thesis but could not find it in LSETO. It was discovered that the majority of users came from Taiwan and may have been looking for the thesis of the Presidential candidate (who was the subject of a Time magazine article6 in which it was mentioned that she had completed her doctorate at LSE). The LSE Library had also received e-mail enquiries concerning the availability of the thesis. As it happened, her thesis had not been digitized; and she may or may not have been pleased if it had. The account of the focus group findings below captures the mixed response of alumni to having their theses published in this way. In the second spike, on 12 February 2016, 847 people landed on the Finance Minister of Finland’s thesis after he promoted it through Facebook and Twitter. He was clearly proud of his work and wished to bring it into the ‘Brexit’ debate, 15 years after it was originally submitted.

Figure 5 

Spikes in traffic to LSETO

Awareness and perceptions of digital theses

As noted above, three focus groups were held at the LSE, with undergraduates, postgraduates and librarians respectively. Of the seven undergraduates who took part (none of whom was British), only four knew of the digitization project. None had been recommended to consult theses as a scholarly resource, though one had been advised to look at a digital thesis for the layout. They were enthusiastic about the opportunities that digital theses offer for accessing cutting-edge research, pointing out that they become available much more quickly than monographs or journal articles. Some also said that, since each piece of research builds on the body of work that precedes it, it is useful to have a collection of theses going back some distance in time to provide a kind of audit trail. Nevertheless, most had reservations about making their own work available as a digital thesis. Some already had ambitions to publish their research with a traditional (they actually used the word ‘proper’) publisher and were worried that if it appeared as a digital thesis their chances of this would be damaged. This was a concern shared also by the postgraduates and academics.

The six postgraduate focus group participants (again, none was from the UK) were extremely motivated and quite high-powered: for example, one was working with the Bank of England, one was conducting research on far-right propaganda in France and one on food distribution and its effect on poverty in Asia. Only one knew of the digitization project and none had been told of it by their supervisors. Like the undergraduates, they believed that complying with what is now an LSE requirement to provide an electronic copy of their thesis would undermine opportunities for commercial publication. They had some other concerns, too: one thought that publishing what she described as ‘academic juvenilia’ might impair her reputation later on, ‘when I am famous’, as she put it. This group also raised concerns about copyright and permissions issues. They were unanimous in their view that the number of citations made a digital thesis of potential interest while the number of downloads did not.

The research found that the perception that publishing a thesis digitally affects future publication opportunities has almost become a bête noire in academic circles. The concept is so ubiquitous that it is one of the areas relating to digital theses in which a considerable amount of research has already taken place. Marisa L Ramirez, with several colleagues on each occasion, has carried out two fairly large-scale pieces of research entitled ‘Do Open Access electronic theses and dissertations diminish publishing opportunities in the Sciences?’7 and ‘Do Open Access electronic dissertations and theses diminish publishing opportunities in the Social Sciences and Humanities?’8. Ramirez and her colleagues found that in the sciences, ‘a slim majority of science journals (51.4%) reported that manuscripts derived from openly accessible electronic theses and dissertations (ETDs) are always welcome for submission, and an additional 19.4% of science journals would accept revised ETDs on a case-by-case basis’, and in the social sciences and humanities (HSS), 45% of respondents considered that ‘manuscripts that are revisions of openly accessible ETDs are always welcome for submission’ and 27% of respondents would consider such manuscripts on a case-by-case basis. Only 12.5% of editors in the sciences and 4.5% in HSS indicated that they would under no circumstances consider such material for further publication. There is a risk, therefore, but it is nowhere near as great as most researchers imagine.
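
The size of the risk is easier to see when the quoted percentages are combined. The short sketch below is an illustration only: it adds the 'always welcome' and 'case-by-case' shares for each survey to give the proportion of respondents open in principle to manuscripts derived from open access theses. The dictionary keys and the function name open_share are hypothetical labels; the percentages are those quoted above from Ramirez and her colleagues.

```python
# Percentages as quoted in the text from the two Ramirez et al. surveys
SURVEY = {
    "sciences": {"always_welcome": 51.4, "case_by_case": 19.4, "never": 12.5},
    "HSS": {"always_welcome": 45.0, "case_by_case": 27.0, "never": 4.5},
}


def open_share(responses):
    """Share (%) that would always or sometimes consider ETD-derived work."""
    return round(responses["always_welcome"] + responses["case_by_case"], 1)


for field, responses in SURVEY.items():
    print(f"{field}: open in principle {open_share(responses)}%, "
          f"never {responses['never']}%")
```

In both surveys roughly seven in ten respondents fall into the two receptive categories, against a small minority who would refuse outright.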

The librarian focus group participants agreed that having the digital collection of theses was good for the reputation of both the University and the Library itself – ‘Academics expect us to be there to sort out this kind of thing’ – but that better metrics were needed in order to be able to assess impact more effectively. They also pointed out that there is a lot more to building and maintaining a digital thesis collection than straightforward digitization. They, in conjunction with colleagues, have to do a lot of time-consuming training and hand-holding, especially in the form of providing support on such issues as copyright and permissions.9 This obstacle was also mentioned by the librarians at Surrey, who said that although Surrey authors have been instructed about third-party copyright for many years, the Surrey project highlighted that the rules had often been disregarded. Managing expectations was also a problem: ‘We have 3,500 digitized theses. We have that number again and more that haven’t been digitized. This means that we can’t manage expectations.’

Of the four academics who took part in the semi-structured telephone calls, two knew of the digitization project. Two were in favour of digital theses, while the other two had reservations, again connected with copyright and publication issues. Three had consulted digital theses and all could see where their value lay: helping promote cutting-edge research and putting new research in a historical context were particularly mentioned. Like the postgraduates, they were interested only in citations, not in downloads.

Figure 6 shows a comparison between the LSE’s usage statistics and Surrey’s. The pattern is very similar, showing that use of digital theses mirrors the academic cycle (quieter in the summer). The figure also shows that these two very different institutions attracted more or less the same level of traffic per item over time. (However, Surrey’s top ten downloads attracted a much higher volume of hits than the LSE’s.)

Figure 6 

Downloads per item for LSE and University of Surrey

Conclusions

Digitization of theses has brought many more users to LSETO. It is difficult to assess with any accuracy the impact this has had on scholarly research: there seems to be no direct relationship between downloads and citations. If researchers wish to make an impact on the body of scholarly resources, subsequent publication in a book or journal seems to be of more consequence (and other research suggests that making the thesis available digitally has a relatively minor effect on opportunities to do this). However, impact on the micro level should still be considered important. The LSE’s alumni deserve the chance that a digital thesis collection offers of bringing their work to the debate again when it may have languished in unjustified obscurity because of more limited opportunities to promote it when they first submitted their theses.

The project demonstrated that more discussions need to take place between librarians and academic departments to enable better promotion of theses. The LSE has used the evidence that very few authors opted out of the ProQuest digitization project and few take-down requests were received to push for a change in its policy with EThOS (the e-theses online service). It has adopted the policy of many other UK universities, including Surrey, and now no longer chases author permissions, but instead views digitization as merely a format change which requires no permission and relies on a take-down policy. This will allow for a greater inclusion of older theses and generate repository growth.

The LSE and Surrey, with ProQuest as a partner, hope to take this research forward. ProQuest understands that better data, as well as better promotion of thesis digitization projects, are important in order to find out more about how digital theses are being used as a scholarly resource, and continues to improve its PQDT statistics. The relationship between publishing a digital thesis and opportunities for converting it to a traditionally published monograph needs to be better understood, and Surrey, in conjunction with the UK Publishers Association, has begun work on this. Universities and university libraries need more help with explaining permissions, copyright and intellectual property rights, embargoes and other author and publication issues. As well as being important in its own right, to enable maximization of use of theses as a scholarly resource, it is believed that future work will contribute directly to the impact of the open access movement.10