Global electronic thesis and dissertation repositories – collection diversity and management issues

based on key parameters such as regional distribution, subject classification, language diversity, etc. and identifies the critical management issues of the ETD repositories related to collection management, software management, content management and metadata policies. The ETD repositories were identified in the Directory of Open Access Repositories (OpenDOAR). The required data were manually collected from the OpenDOAR and websites of repositories to achieve the prescribed objectives of the study. The data were later tabulated, analysed and interpreted using simple arithmetic techniques.


Introduction
'Theses and dissertations are the most useful kinds of invisible scholarship and the most invisible kinds of useful scholarship because of their high quality and low visibility.' 1 Electronic theses and dissertations (ETDs) are primary, rich, unique and valuable sources of scholarly information, the outcome of several years of focused, extensive and in-depth research involving intellectual labour by scholars and their supervisors. Historically, theses and dissertations were kept under lock and key by vigilant information managers, possibly to avoid plagiarism and theft. Access to these valuable scholarly sources was restricted to a few users within the four walls of each institution's library, and most libraries did not lend theses and dissertations through inter-library loan. This closed access system badly affected their usage, and these valuable sources mostly remained undiscovered, unutilized and uncited. The emergence of electronic sources, developments in open access (OA) and the creation of digital repositories have together made possible the best use of scholarly information sources, including theses and dissertations. These repositories have become showcases of the intellectual achievements of scholars and their institutions by making their research output available globally in various forms, including ETDs. Since digital repositories started to archive ETDs, their usage statistics have improved. ETD repositories not only increase the visibility of ETDs but also increase institutional research impact and ranking in the scholarly world. Data from these repositories suggest a dramatic increase in the use and citation of doctoral theses in current research activity.

Literature review
ETDs are a major topic of interest for researchers worldwide, and a good number of studies have been conducted on them. The current literature focuses on two aspects: the growth and development of ETDs, and their management issues.
The origin of ETDs can be traced back to the first meeting held in Michigan, USA in 1987, organized by UMI and attended by representatives from Virginia Tech, the University of Michigan, and two small software companies: Toronto-based SoftQuad and Michigan-based Arbortext. Later, ETDs started to emerge in various institutions of the developed world. As a result, the National Digital Library of Theses and Dissertations (NDLTD) was established in 1996. 2 The NDLTD is a collaborative effort by the world's universities to create, archive, distribute and access ETDs. The ETD repositories flourished internationally and membership of the NDLTD increased significantly. 3 Zhang and associates 4 found a significant increase in the usage of ETDs in Korea by both national and international patrons. Sale 5 studied the impact of mandatory policies on ETD acquisition in Australia and found that only 15% of ETDs were deposited in repositories voluntarily, whereas mandatory policies increased the deposit rate to 100%. Sugita and Murakami 6 examined the university library policies on theses and dissertations in Japan and found that the university libraries had begun to deposit and disseminate ETDs to institutional, subject-specific and content-specific repositories.
Ghosh 7 examined developments in the ETD scene in India to explore the possibilities for creating a national repository for the deposit, discovery, use and long-term care of research theses in an OA environment. In 2009, the India University Grants Commission made it mandatory for all universities to deposit a copy of each thesis in the national ETD repository, Shodhganga; however, the universities did not initially take the requirement seriously. 8 Despite these issues, repositories of ETDs are now becoming common in universities of all countries across the world.
At the end of 2013, the Directory of Open Access Repositories (OpenDOAR) listed more than 1,400 repositories, of which more than 50% archived ETDs. 9 With the advancement of ETD repositories, many issues emerged. Looi and Yeng 10 proposed an ETD framework to capture and preserve the intellectual output of Malaysian universities and discussed various issues such as the archiving, preservation, accessibility, scalability, security, searchability and copyright of ETDs. Jin 11 analysed the ETD repositories in China and identified 'grey issues' like metadata standards, submission issues, software selection, content formats, copyright protection, fair use, access and preservation, among other issues. Al Salmi 12 evaluated the status of ETDs in the university libraries of Gulf countries and concluded that university libraries in this region have the necessary infrastructure for ETD programmes, but they face technological, administrative and legal barriers. Yiotis 13 also found various grey issues relating to ETD repositories such as copyright, plagiarism, costs and preservation. Similarly, Juznic 14 argued that besides the preservation and availability of ETDs, plagiarism and other forms of cheating at all levels are also a big concern for universities.
Park and Richard 15 assessed the metadata element sets of electronic theses and dissertations used at the Canadian academic institutional repositories. The results revealed that the metadata elements had a significant level of inconsistency and variation. Perrin and her associates 16 examined the problems that arose after the transition from a physical to an electronic collection of ETDs and presented documentation solutions for preservation and curation. Steele and Sump-Crethar 17 conducted a study on university repositories in the United States and provided valuable suggestions for bibliographic description and vocabulary control of ETDs. Ndungu 18 identified and analysed the challenges faced in the bibliographic control of theses and dissertations in Kenya. The study found delays and a lack of consistency and uniformity in bibliographic records. Schopfel and Rasuli 19 discussed the applicability of the concept of grey to ETDs and concluded that, '"greyness" remains a challenge for ETDs, a problem waiting for the solution through the application of the FAIR (findability, accessibility, interoperability, and reusability) principles'.
Following these developments, many institutions worldwide established ETD repositories, and national ETD repositories such as ETHOS (E-Thesis Online Service) were subsequently created. These national-level repositories collaborated with international ETD platforms such as NDLTD and OATD (Open Access Theses and Dissertations) to make the research in these ETDs more visible and useful. However, as ETD collections grew, various collection management issues emerged that needed to be addressed.

Research design
The objectives of the study were to discover the collection diversity of the ETD repositories, based on key parameters such as regional distribution, subject classification and language diversity, and to identify the critical management issues of the ETD repositories related to collection management, software management, content management and metadata policies. OpenDOAR was selected as the source for identifying the ETD repositories. All the repositories archiving ETDs (1,938) were selected for the study. The required data were manually collected from OpenDOAR and the websites of these repositories in December 2017. The data were later tabulated, analysed and interpreted using simple quantitative techniques to reveal the findings.

Contents archived
Next to journal articles, ETDs are the most frequent document type found in OA repositories listed in OpenDOAR. Out of 3,504 repositories, 1,938 (55.30%) accept the submission of ETDs (Figure 1). The findings are consistent with the study conducted by Loan and Sheikh 20 on health and medical repositories, in which the results also reveal that the highest number of repositories store articles, followed by theses.

Growth rate
The number of ETD repositories increased steadily from 2006 to 2017. In 2006 there were only 418 ETD repositories, whereas the number had increased to 1,938 by the end of 2017. The year 2008 had the highest growth rate, with an increase of about 39% in the number of ETD repositories. The growth rate was high in the initial years, but in recent years it has declined slightly (Figure 2). The findings are in tune with some of the earlier studies 21 which revealed that the number of repositories has increased exponentially since 2006.
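The growth figures above rest on simple arithmetic, which can be sketched as follows. This is a minimal illustration, not the authors' procedure: it uses only the two endpoint counts reported in the text (418 in 2006 and 1,938 in 2017), and the function names are hypothetical. The compound annual rate it prints is derived here from those endpoints and is not a figure reported by the study; the year-by-year counts behind Figure 2 are not reproduced.

```python
def percent_growth(start: int, end: int) -> float:
    """Percentage change from `start` to `end` (the usual growth-rate formula)."""
    return (end - start) / start * 100


def compound_annual_growth(start: int, end: int, years: int) -> float:
    """Constant yearly percentage rate that would turn `start` into `end` over `years`."""
    return ((end / start) ** (1 / years) - 1) * 100


# Endpoint counts taken from the text; intermediate years are not given here.
REPOS_2006 = 418
REPOS_2017 = 1938

overall = percent_growth(REPOS_2006, REPOS_2017)
cagr = compound_annual_growth(REPOS_2006, REPOS_2017, 2017 - 2006)

print(f"Overall growth 2006-2017: {overall:.1f}%")
print(f"Average compound annual growth: {cagr:.1f}%")
```

Applying `percent_growth` to the counts of two consecutive years gives the annual growth rate of the kind quoted for 2008.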

Countries' contribution
The United States tops the list of countries by contributing about 12% (234) of the total repositories, followed by Germany with 7.38% (143) and Japan with 5.52% (107). Other countries, especially France, the UK, Spain, Turkey, Italy, Brazil and Indonesia, also make significant contributions to the ETD repositories (Table 2). The findings are, to a great extent, in tune with the study conducted by Loan and Sheikh 23, which revealed that the highest number of repositories is contributed by the USA, followed by Japan and the UK. Overall, developed countries contribute more than developing countries.

Content language
The ETD repositories archive content in 35 languages. Most of the repositories (67.75%, 1,313) accept content written in English, followed by Spanish (13.78%, 267), German (8.88%, 172) and French (7.43%, 144). It is also revealed that most of the repositories are multilingual, archiving content in more than one language (Table 3). The present study thus confirms that English is the dominant language of the content archived in repositories.

Classification of repositories
The repositories have been classified into four categories: institutional, disciplinary, aggregating and governmental. The majority of the ETD repositories are institutional (93.71%), whereas the disciplinary repositories and aggregating repositories contribute very small shares of 3.2% and 2.3% respectively (Table 4).

Subject coverage
Most of the ETD repositories are multidisciplinary (71.93%) in nature, archiving ETDs of more than one subject area, whereas only 28.07% of the repositories are subject-specific, covering particular subjects only (Figure 3).

Operational status
The operational status of the ETD repositories shows that almost 96% of them are operational, 2.53% are in trial mode, while only 1.6% are broken (not functional) (Table 5). The study conducted by Yaseen, Loan and Jan 24 also confirmed that more than 96% of all the ETD repositories are fully operational whereas a small percentage of repositories are available on a trial basis (1.9%, 25) and 2% are non-functional.

Software used
DSpace is the most used software, operational in 50% of the ETD repositories. Other software packages used by the ETD repositories are EPrints (12.85%), Digital Commons (5.83%), OPUS (3.51%) and WEKO (2.89%) (Table 6). DSpace is the first choice of administrators to manage content in digital repositories all over the world. In 2011, DSpace was used by more than 1,000 digital repositories 25 and since then the number has been increasing constantly.

Policies
Policies are very important to the operation of the repositories. Metadata policies are the set of policies related to the information describing items in the repository. Content submission policies provide information about the content that can be archived in the repositories. Preservation is a crucial element in the process of managing electronic information resources in digital repositories. The data clearly show that more than half of the repositories have no explicitly defined metadata policies (54.95%, 1,065), content submission policies (51.39%, 996) or preservation policies (55%, 1,058), which is a very serious issue in the management of ETDs (Table 7).
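The policy shares above can be checked with the same 'simple arithmetic techniques' the study describes. The sketch below is an illustration under stated assumptions rather than the authors' own tabulation: the counts and the total of 1,938 ETD repositories come from the text, while the variable and function names are hypothetical. Note that the preservation count works out to 54.59% at two decimal places, which the text reports rounded to 55%.

```python
# Total number of ETD repositories studied (from the research design section).
TOTAL_ETD_REPOSITORIES = 1938

# Repositories without an explicitly defined policy, by policy type (from the text).
undefined_policy_counts = {
    "metadata": 1065,
    "content submission": 996,
    "preservation": 1058,
}


def share(count: int, total: int) -> float:
    """Percentage of `total` represented by `count`, rounded to two decimals."""
    return round(count / total * 100, 2)


policy_shares = {
    name: share(count, TOTAL_ETD_REPOSITORIES)
    for name, count in undefined_policy_counts.items()
}

for name, pct in policy_shares.items():
    print(f"{name}: {pct}% of repositories have no explicitly defined policy")
```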

Discussion
Electronic theses and dissertations (ETDs) are the most frequent document type found in open access repositories after journal articles. They are accepted by more than 55% of the repositories in the OpenDOAR. ETDs are also perhaps the most important research products after journal articles. Journal articles are mostly 'shined and polished products' of ETDs. Therefore, most of the digital repositories enrich their collection by accepting the ETDs. The online availability of the ETDs is a very good sign for the optimum use of these resources. Researchers worldwide can take advantage of the research conducted at any institution in the world, along with other benefits. The growth of ETD repositories has also shown a constantly increasing trend since 2006.
Many countries created national repositories, like Shodhganga (India), to archive the ETDs of all disciplines at the national level and made regulations requiring scholars to deposit their ETDs in the national repository to facilitate use and avoid plagiarism and duplication. Many institutions have created repositories to archive subject-specific ETDs as well. These efforts have increased the number of ETD repositories worldwide. Further, all the continents contribute to the ETD repositories as per their capacities, and developed continents like Europe, and countries like the United States, top the list. However, the movement of archiving ETDs in repositories is not limited to any specific region or country but has crossed boundaries. Many developing countries have followed the steps of developed nations to create digital repositories and archive contents, including ETDs.
These ETD repositories archive content in 35 languages and most of the repositories accept ETDs written in the English language. However, valuable information can be in any language, not necessarily in English. Further, many countries in the world, like China and Iran, prefer to conduct research in their national languages. Therefore, ETDs written in other languages should also be archived for optimum use by present and future generations. The other positive signs are that the majority of the ETD repositories are created by reputed institutions, are archiving quality content and are fully operational. DSpace is the most used software, operational in 50% of the ETD repositories. Besides having extraordinary features, DSpace has a community that provides healthy support for creating digital repositories worldwide. This is the main reason that most of the ETD repositories have opted for DSpace for content management.
The findings show that more than half of the repositories have no explicitly defined metadata policies, content submission policies or preservation policies, and these are very serious issues in the management of ETDs.

Conclusion
ETD repositories present many strengths, weaknesses, opportunities and challenges for the library and information management profession. The strengths need to be fully utilized, the weaknesses need to be identified and overcome, the opportunities need to be elaborated and the challenges need to be addressed to improve services. The metadata policies, content submission policies and preservation policies have not been fully addressed, as more than half of the repositories have not explicitly defined them. The ETD metadata scheme 'ETD-MS: an Interoperability Metadata Standard for Electronic Theses and Dissertations' has not been adopted by all the repositories. The growing ETD landscape calls for explicit content policies to inform users about their rights and reuse policies. Authors complained about the absence of adequate policies and infrastructure to handle ETDs at the national level as early as 2007 26 and very little progress has been made since then. Other issues like archiving, preserving, cataloguing, harvesting, interoperability, copyright and plagiarism are also noteworthy for ETD repositories and need immediate attention.

Abbreviations and Acronyms
A list of the abbreviations and acronyms used in this and other Insights articles can be accessed here (click on the URL below and then select the 'full list of industry A&As' link): http://www.uksg.org/publications#aa

Competing Interests
The authors have declared no competing interests.