The debate about globalization and internationalization, and what they mean for the academic world, will be ongoing for quite some time. Internationalization might seem to lead to a publication culture that puts more emphasis on global issues, written in one language – English. This raises the question of whether global interest causes a shift away from regional concerns or not. In an attempt to partially answer this, we will look at the preferences of a worldwide audience regarding open access books.

When the main differences in publication practices between the humanities and social sciences (HSS) and science, technology and medicine (STM) are considered, there are two aspects that almost always get mentioned: the prominence of books and the role of local languages. Not surprisingly, the work of HSS researchers written in languages other than English is most often linked to more regional concerns or a more regional community. Simply put, an author using Dutch will have a different audience in mind than an author writing in English.

The OAPEN Library contains thousands of open access monographs in more than 50 languages. Launched in 2010, it is managed by the OAPEN Foundation. All books and chapters in the OAPEN Library are available for direct download without any cost or registration requirements for the user. At the end of 2021, the collection consisted of over 19,000 titles. The OAPEN Library is accessed globally, and during 2021 over 11 million downloads were registered.

While books in English are in the majority – over 60% of the collection – this still leaves a large collection of publications in other languages. If a global audience can freely choose from this collection, will there be a preference for global subjects, or will more regional concerns take precedence? This explorative research looks at the most popular books from 100 countries and tries to determine the level of regional interest.

Literature review

In the literature, non-English academic publications have been linked to regional issues. This is visible in a European context, where Kulczycki et al. investigated multilingual publication patterns in the HSS. A recent literature study by Balula and Leão concludes that a ‘balanced multilingualism’ is vital for a diverse academic publication landscape. Bibliodiversity as a way to enhance local knowledge is also the subject of Mkhize and Ndimande-Hlongwa. The authors argue that local African languages and indigenous knowledge systems are indispensable for higher education. The dominance of English versus the role of regional languages is also debated by Flowerdew and Li, who investigated the publication choices of Chinese HSS scholars. And finally, the study of Argentinian research output conducted by Chinchilla et al. links Spanish language publications to regional subjects and English language articles to more global issues.

This short literature overview of academic writing reveals the tension between English as the ‘lingua franca’ enabling a global reach versus local languages that provide a better cultural ‘fit’. Multilingualism could also be framed in terms of centre and periphery.

The literature does not discuss the readers, however. Earlier research on the usage of the OAPEN Library revealed that the readership of non-English books tends to be concentrated in certain countries. For example, the usage of German language books is skewed toward the ‘DACH countries’ – Germany, Austria and Switzerland. Furthermore, clusters of German, Dutch and Italian language books tend to be more downloaded by residents from the ‘corresponding’ countries. A recent article investigating the effects of open access at the Springer Nature e-books platform also analysed regional effects. The authors concluded that when the publication’s title contained either ‘Africa’ or ‘Latin America’, this slightly increased the readership from those regions. The publication’s language was not considered, possibly because almost all titles of the dataset were written in English.

In this article, the preference of global readers is examined in a systematic manner. Based on the ten most downloaded books from 100 countries during a 12-month period, the focus on regional topics is measured in two ways: the number of books written in non-English languages and the number of English language books that mention the country. In the next section, the methodology and data are further explained. The section following that will explore the results.

Methodology and data

Currently, no standardized procedure to measure the regional focus of open access books exists. Thus, it is important to describe the method used here in sufficient detail. Furthermore, to ensure maximum reproducibility, the method is based on verifiable data that is not subject to interpretation.

Selection of data

Several choices determined the data selection. Firstly, which countries should be examined? To measure global usage, it is important to create a sufficiently large group of countries. However, both the number of titles downloaded and the total number of downloads differ strongly between countries. Usage data from the OAPEN Library was captured for the period from May 2020 to April 2021. At one end of the spectrum, people residing in the USA were responsible for over 1.1 million downloads – of thousands of titles. By contrast, during the same period, only eight downloads – of three titles – originated from Antarctica. This issue was resolved by selecting the 100 countries with the highest number of downloads.

The next choice was to determine how many books should be considered. For each country, the ten most popular titles were selected. In the event of there being an equal number of downloads of two books that qualify to be added to the ‘top ten’, both titles were listed. Given the extensive collection of titles available at the OAPEN Library, there is a high possibility that different titles are ‘popular’ in different parts of the world, and we might expect to see a little over 1,000 different titles in our dataset. This is not the case: the set of documents consists of 710 different titles. Of those, 175 titles were part of the top ten in multiple countries. Nine books are among the most popular in seven or more countries.
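The selection steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout `(country, title, downloads)` and all function names are hypothetical and are not taken from the OAPEN or IRUS-UK tooling. The sketch picks the countries with the most downloads, builds a top ten per country that keeps ties at the tenth position, and counts in how many top ten lists each title appears.

```python
from collections import Counter


def top_countries(records, n=100):
    """Return the n countries with the highest total downloads.

    records: iterable of (country, title, downloads) tuples (assumed layout).
    """
    totals = Counter()
    for country, _title, downloads in records:
        totals[country] += downloads
    return [country for country, _ in totals.most_common(n)]


def top_titles_with_ties(title_counts, k=10):
    """Return the k most downloaded titles, keeping every title tied at rank k.

    title_counts: dict mapping title -> downloads for a single country.
    """
    ranked = sorted(title_counts.items(), key=lambda item: (-item[1], item[0]))
    if len(ranked) <= k:
        return [title for title, _ in ranked]
    cutoff = ranked[k - 1][1]  # downloads of the title at position k
    return [title for title, count in ranked if count >= cutoff]


def countries_per_title(top_lists):
    """Count in how many country top lists each title appears.

    top_lists: dict mapping country -> list of titles.
    """
    counts = Counter()
    for titles in top_lists.values():
        counts.update(set(titles))
    return counts
```

Because ties are kept, a country's list can contain more than ten titles, which is one reason the number of distinct titles cannot be read off directly from the number of countries.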

Measuring web usage, such as the number of books downloaded, is not straightforward. For instance, the large differences between Google Analytics and COUNTER Release 5 show that the choice of metric service has considerable consequences for what is reported. When both are used to measure the same events – the downloads of open access books from the OAPEN Library – the variation is striking. The COUNTER Code of Practice aims to measure only ‘genuine, user-driven usage’, and automatically created downloads are discarded. Google Analytics uses a less strict policy, resulting in a difference of almost 60% for the total number of downloads. The usage reported in this article is based on the COUNTER Release 5 standard, as provided by IRUS-UK.

At this point, there is a group of countries, and for each country, there is a set of books. The next step is to determine if those books have a regional focus. Given the discussion on multilingualism, the next question is whether the book is written in a language other than English. When we look at the distribution of titles in English versus titles in other languages, the dataset consists of 60% English language titles. This is in line with the total collection of the OAPEN Library at that period, which consists of 62% English language titles. See Table 1 for more details.

Table 1

Language and publication period of the books in the data set

| Publication period | English only | Non-English | Total |
|---|---|---|---|
| Before 2000 | 7 | 20 | 27 |
| Between 2000 and 2009 | 93 | 24 | 117 |
| Between 2010 and 2019 | 342 | 132 | 474 |
| 2020 and beyond | 65 | 27 | 92 |

As will be visible in the results section, the most popular titles of many countries are written in one of the national languages. As the current collection of the OAPEN Library is skewed towards English, this might lead to situations where there are fewer than ten books in the national language. Only selecting countries where the ten most popular books are written in the national language would lead to a result that is biased towards English. Hence the selection of 100 countries.

Using text mining techniques on English language books

Most of the titles examined are written in English. For these titles, text mining techniques are deployed to check whether the name of the country is mentioned frequently. The text mining method used in this article is applied for English only. It is based on filtering out all sets of three consecutive words – trigrams – that contain either the commonest English words or terms that are regularly used in academic book publishing. The remaining trigrams are analysed to find the subjects discussed in the book. Of course, the words to be filtered out differ for each language. Unfortunately, these word collections were not available for all the other languages in the dataset.

The full text of the English language books was used to find terms that are specific for that document. The algorithm splits the full text of the book or chapter into trigrams. Then it removes all trigrams containing words that are commonly used in everyday language and the trigrams containing terms that are commonly used in (open access) book publishing. When a trigram contains a word – or multiple words – that is commonly used, the whole trigram is discarded.

The goal is to find trigrams that contain words specific to that publication. The remaining trigrams are distinctive to the book or chapter and selecting those that occur most frequently indicates the concepts under discussion. The algorithm is depicted in Figure 1.

Figure 1 

Text mining algorithm
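The filtering steps described above can be approximated with the following minimal sketch. It is not the code used for this article: the two word lists are tiny hypothetical stand-ins for the actual collections of common English words and book publishing terms, and the tokenisation rule (lowercased runs of letters) is an assumption.

```python
import re
from collections import Counter

# Hypothetical, deliberately small word lists for illustration only.
COMMON_WORDS = {"the", "of", "and", "a", "in", "to", "is", "for", "on", "that"}
PUBLISHING_TERMS = {"chapter", "press", "isbn", "doi", "licence", "university"}


def tokenize(text):
    """Split text into lowercase words made of letters only (assumed rule)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def distinctive_trigrams(text, stop_words, top_n=30):
    """Return the top_n most frequent trigrams that contain no stop word."""
    words = tokenize(text)
    trigrams = zip(words, words[1:], words[2:])
    # A single filtered word is enough to discard the whole trigram.
    kept = (tri for tri in trigrams if not any(word in stop_words for word in tri))
    counts = Counter(kept)
    return [tri for tri, _ in counts.most_common(top_n)]
```

Calling `distinctive_trigrams(full_text, COMMON_WORDS | PUBLISHING_TERMS)` on the full text of a book would yield at most 30 trigrams, that is, at most 90 words.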

The result is a set of 90 words in the English language (30 trigrams). When an English language book has been downloaded in a certain country, the next step is to search for the name of that country. To allow for multiple variants, the country name is truncated. For example, instead of searching for ‘Belgium’, the term ‘belg’ is used. This enables us to find variants such as ‘Belgian’. If the truncated term is found, the book is marked as discussing the country. These selected trigrams are part of the dataset.
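As a hedged illustration of the country check, the sketch below searches the selected trigrams for a truncated country name. Only the stem ‘belg’ comes from the text; the function name is hypothetical, and matching the stem anywhere inside a word (rather than only at its start) is an assumption.

```python
def mentions_country(trigrams, stem):
    """Return True if the truncated country name occurs in any trigram word.

    trigrams: iterable of 3-tuples of words, e.g. the 30 selected trigrams.
    stem: truncated country name such as 'belg' for Belgium.
    """
    stem = stem.lower()
    return any(stem in word.lower() for trigram in trigrams for word in trigram)
```

With this rule, a trigram such as `("belgian", "water", "policy")` marks the book as discussing Belgium, while a book whose 30 trigrams never contain the stem is not counted.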

When the content of a book has been reduced to 90 words, obviously the chance that the country’s name is among them is not high. Furthermore, the region is strictly defined as a country, and no larger or smaller areas are considered. Regional concerns might also be defined by other keywords, such as local traditions or beliefs, but this requires the interpretation of what constitutes a regional concern. By sticking to a strict algorithm – reducing the text to 30 trigrams and then searching for the country name – the results are unambiguous and verifiable.

Scoring the books

The language check and the text mining techniques allow us to count how many of the ten most used books per country have a regional focus, either by looking at the book’s language or – for English language books – whether the country in question has been mentioned regularly. As a result, we can easily assess for each country how strong the focus on regional issues is and express it as a score from 0 to 10.

Each book that is written in a language other than English and each English language book that mentions the country is counted. The total number of books per country that fall into either of these categories is the score used in this article. Looking at the total scores, it is interesting to see that no usage of regionally focused books could be found for only seven of the 100 countries. In the next section, the results for the 100 countries are described.
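The scoring rule can be written down compactly. The sketch below is illustrative only: the `Book` record and its field names are hypothetical, and the language label ‘English’ is assumed to mark English-only titles.

```python
from dataclasses import dataclass


@dataclass
class Book:
    title: str
    language: str            # assumed label, e.g. "English" or "Dutch"
    mentions_country: bool   # outcome of the trigram search; used for English titles only


def regional_score(top_books):
    """Return (non-English books, English books mentioning the country, total score)."""
    non_english = sum(1 for book in top_books if book.language != "English")
    english_mentioning = sum(
        1 for book in top_books if book.language == "English" and book.mentions_country
    )
    return non_english, english_mentioning, non_english + english_mentioning
```

For a top ten with one non-English title and three English titles that mention the country, the function returns `(1, 3, 4)`, which corresponds to the three rightmost columns of the tables in the results section.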

The dataset is freely available using the link in the data accessibility statement at the end of the article.

Results: global interest in regional issues

Here we will discuss the results for the different countries listed by continent.

North America

There are scores for four countries in North America with a clear difference between the Dominican Republic and Mexico versus Canada and the United States. For the first two countries, the majority of the top ten books were written in Spanish. In contrast, all books listed for the United States and nine out of the ten books most popular in Canada are English. The one non-English book is partly English, partly French. Three of the English language books mention Canada; in the USA, four popular books mention the country. Figure 2 and Table 2 contain more details.

Figure 2 

North America – book scores

Table 2

North America – languages, downloads and book scores

| Country | National languages | Total downloads of top ten titles | Non-English books | English books mentioning country | Total score |
|---|---|---|---|---|---|
| Dominican Republic | Spanish | 1,139 | 8 | 1 | 9 |
| Canada | English, French | 7,139 | 1 | 3 | 4 |
| United States | English | 2,6147 | 0 | 4 | 4 |

South America

In six countries in South America, the score is eight or higher. The most pronounced outlier is Brazil, with a score of only two. This is clearly visible in Figure 3. As was to be expected, the role of English language books is marginal. Only three countries – Peru, Ecuador and Costa Rica – have one English title as part of their scoring list. There is a definite preference for other languages, in this case, Spanish. More details can be found in Table 3.

Figure 3 

South America – book scores

Table 3

South America – languages, downloads and book scores

| Country | National languages | Total downloads of top ten titles | Non-English books | English books mentioning country | Total score |
|---|---|---|---|---|---|
| Ecuador | Spanish, Quechua | 2,646 | 7 | 1 | 8 |
| Costa Rica | Spanish | 485 | 5 | 1 | 6 |


Europe

The dataset contains 37 European countries. We will discuss this relatively large group in three subsections: countries with a score from eight to ten, countries with a score from five to seven and countries with a score of four and lower. Within the first subsection, the role of English is marginal: other languages dominate the top ten almost completely. The most popular books downloaded in France were written in French and German, with only two in English. The preference for books in the national language is clearly visible in the Netherlands (Dutch), Norway (Norwegian), Germany and Austria (German). This is also the case for Switzerland, with seven German titles and one Italian language title. In a similar vein, the most popular titles in Italy are written in Italian. The majority of the most downloaded books in Finland are also in the national language. The most popular books in Belarus, Ukraine and the Russian Federation are written in Russian.

The subsection of countries with a score between five and seven shows a slightly different pattern: here the role of non-English language is still important, but this group also contains more English language books that mention the country. For Luxembourg, Portugal and Belgium, half or more of the popular books have been published in their national languages. In this group, we also see combinations of either German or English and – in most cases – the national languages. See for instance Slovenia, Croatia and Estonia. The role of Russian is visible in the top ten of Latvia and Georgia.

The last subsection consists of countries with a score of four or lower. Within Sweden, four of the most popular titles are in the native language; the same holds true for Spain. For Denmark, the number of books in Danish is one. Another distinctive aspect of this group is the role of German. For five countries, the top ten consists of more books in German and – possibly – one or more English language books mentioning the country. This applies to Serbia, Hungary, Romania, Lithuania and Slovakia. In Cyprus and Great Britain, two English language books mention the country. The other European countries display a more diverse choice of languages. Here we see combinations of German, Russian, Chinese, Spanish, Albanian and French plus one or more English language books. The geographical differences are depicted in Figure 4 and in Table 4.

Figure 4 

Europe – book scores

Table 4

Europe – languages, downloads and book scores

| Country | National languages | Total downloads of top ten titles | Non-English books | English books mentioning country | Total score |
|---|---|---|---|---|---|
| Belarus | Belarusian, Russian | 1,182 | 10 | 0 | 10 |
| Russian Federation | Russian | 11,477 | 10 | 0 | 10 |
| Finland | Finnish, Swedish | 6,482 | 8 | 0 | 8 |
| Switzerland | French, German, Italian, Romansh | 2,808 | 8 | 0 | 8 |
| Luxembourg | French, German, Luxembourgish | 341 | 7 | 0 | 7 |
| Estonia | Estonian, Russian | 1,016 | 1 | 6 | 7 |
| Belgium | Dutch, French, German | 6,542 | 5 | 1 | 6 |
| Czech Republic | Czech, Slovak | 1,181 | 5 | 0 | 5 |
| Ireland | Irish, English | 4,851 | 1 | 2 | 3 |
| Cyprus | Greek, Turkish, English | 421 | 0 | 2 | 2 |
| United Kingdom | English | 23,643 | 0 | 2 | 2 |


Africa

In Africa, French plays an important role: by far, most of the non-English books that are part of our dataset are written in this language. For instance, all top ten books downloaded from Cote d’Ivoire are in French. The same holds true for the nine books connected to Senegal and the eight books linked to Tunisia, Cameroon and Algeria. However, there are two additional English books mentioning Cameroon, and one Arabic title is linked to Algeria. In Egypt, we see two English titles, one German and one Arabic language title. More details can be found in Table 5.

Table 5

Africa – languages, downloads and book scores

| Country | National languages | Total downloads of top ten titles | Non-English books | English books mentioning country | Total score |
|---|---|---|---|---|---|
| Cote d’Ivoire | French | 1,230 | 10 | 0 | 10 |
| Cameroon | English, French | 2,686 | 8 | 2 | 10 |
| Algeria | Arabic, Tamazight, French | 2,215 | 9 | 0 | 9 |
| Morocco | Arabic, Tamazight, French | 1,603 | 7 | 2 | 9 |
| South Africa | Afrikaans, English | 18,975 | 0 | 9 | 9 |
| Ethiopia | Afar, Amharic, Oromo, Somali, Tigrinya | 4,187 | 0 | 6 | 6 |
| Malawi | English, Chichewa | 726 | 0 | 3 | 3 |
| Namibia | English, Afrikaans | 2,688 | 0 | 2 | 2 |
| Uganda | English, Swahili | 1,399 | 0 | 2 | 2 |

In contrast, all nine books that mention South Africa are written in English. This is not the only country where English is the main language as far as the most popular books are concerned. We see the same pattern for Ethiopia, Nigeria, Kenya, Malawi and Zimbabwe. Lastly, Namibia and Uganda each list two titles mentioning their country.

In Africa, we encounter the first countries from the top 100 where there is no regional connection to any of their most downloaded books: Ghana, Tanzania and Zambia. Figure 5 depicts the book scores and countries that are not part of the data set.

Figure 5 

Africa – book scores


Asia

Unlike on the other continents, the highest score in Asia is seven. The only country reaching this score is Kazakhstan. Both Kazakhstan and Uzbekistan – the latter with a score of six – list only titles in Russian. See Table 6 for more details.

Table 6

Asia – languages, downloads and book scores

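| Country | National languages | Total downloads of top ten titles | Non-English books | English books mentioning country | Total score |
|---|---|---|---|---|---|
| Kazakhstan | Kazakh, Russian | 1,989 | 7 | 0 | 7 |
| Lebanon | Arabic, English, French | 1,139 | 2 | 3 | 5 |
| India | Hindi, English | 20,037 | 0 | 5 | 5 |
| Sri Lanka | Sinhala, Tamil | 2,620 | 0 | 4 | 4 |
| Hong Kong | Cantonese, English | 2,546 | 1 | 2 | 3 |
| Iraq | Arabic, Kurdish | 2,354 | 1 | 1 | 2 |
| Philippines | Filipino, English | 20,192 | 1 | 1 | 2 |
| Korea, Republic of | Korean | 1,520 | 0 | 2 | 2 |
| Saudi Arabia | Arabic | 975 | 1 | 0 | 1 |
| Taiwan | Mandarin Chinese | 1,799 | 1 | 0 | 1 |
| Iran, Islamic Republic of | Persian | 888 | 0 | 1 | 1 |
| Singapore | English, Malay, Mandarin Chinese, Tamil | 1,470 | 0 | 1 | 1 |
| China | Mandarin Chinese | 2,813 | 0 | 0 | 0 |
| Pakistan | Urdu, English | 6,497 | 0 | 0 | 0 |
| United Arab Emirates | Arabic | 751 | 0 | 0 | 0 |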

Moving slightly further east, we see that another non-English language – Arabic – is amongst the most popular titles in Jordan, Saudi Arabia, Qatar and Iraq. The top ten lists of both Qatar and Iraq feature one English language title with a national focus, while the score for the United Arab Emirates and Oman is zero. In Lebanon, two French and two country-focused English titles can be counted. Israel and Iran do not feature non-English books in their top ten.

For the next group of countries, the most popular titles are all written in English. The number of books with national focus ranges from five for India to four for Sri Lanka and Nepal, three for Myanmar and Vietnam and two for Bangladesh, Thailand and Cambodia. There are no titles in Pakistan’s top ten list that mention the country.

In Japan, the top ten contains one book in Japanese and five English language titles. Most other countries in the region have a lower score. For some countries, only English books are part of the most downloaded titles: Indonesia – with a relatively high score of five titles – together with South Korea, Singapore and Malaysia. A combination of English with another language can be found in Taiwan, Hong Kong and the Philippines. The score for China is zero. See Figure 6 for an overview.

Figure 6 

Asia – book scores


Oceania

In Oceania, all books in the top ten are written in English. Fiji’s top ten contains nine titles that mention the country. The score for Australia is seven, and Papua New Guinea lists six books with a national focus. Compared to the other countries, New Zealand’s score of four is moderately low. More details can be found in Table 7 and Figure 7.

Table 7

Oceania – languages, downloads and book scores

| Country | National languages | Total downloads of top ten titles | Non-English books | English books mentioning country | Total score |
|---|---|---|---|---|---|
| Fiji | English, Fijian, Fiji Hindi | 2,377 | 0 | 9 | 9 |
| Papua New Guinea | English | 4,085 | 0 | 6 | 6 |
| New Zealand | English, Māori | 2,654 | 0 | 4 | 4 |

Figure 7

Oceania – book scores

Books, popular in multiple countries

In the Methodology and data section, we mentioned books that are among the most popular in several countries. Of these, the book Professional Learning in Education is among the most popular in twelve countries. This English language title does not mention any of the countries. How the World Changed Social Media, also written in English, is listed in ten countries but does not touch upon them. The following books are both listed in eight countries: the Spanish language Tratamiento biológico de aguas residuales and Marine Anthropogenic Litter. Neither title focuses on any of the eight countries. Perhaps not surprisingly, the subjects of these books are important to many people: education, social media, wastewater treatment and marine pollution.

Figure 8 gives an overview of the ten most highly connected books – titles that are popular in seven countries or more. The figure shows publications that are listed in the top ten of several countries at once, for instance Tratamiento biológico de aguas residuales and Manifiesto del Nuevo Realismo. Both titles are among the top ten of Argentina, Costa Rica and Mexico.

Figure 8 

Highly connected books


It might sound contradictory, but the results show a global interest in books with a regional focus. Given a choice from a freely accessible collection, many readers will download a book that either departs from the clichéd English language monograph or – when the book is written in English – discusses issues that are relevant for the country of the reader. A systematic focus on the most popular books for many countries helps to identify patterns that may have otherwise remained hidden. Using the method described in this article, there is no interference from the large number of downloads from countries in the global north.

The outcomes of this study do not fit in a story of English language publications as the only or the main source of scholarly communication. This is clearly visible in Austria, Belarus, Germany, Italy, the Netherlands, Norway and the Russian Federation, where the top ten most downloaded books are written in the official language of the country. A comparable outcome can be found for Finland: eight of the most downloaded books are written in Finnish, while the total number of Finnish language books is less than 1% of the OAPEN Library collection. Looking at Cote d’Ivoire, Cameroon, Algeria, Senegal, Morocco and Tunisia, we see a clear preference for French. In these countries, French is either one of the official languages or it is widely used. Spanish is the national language throughout South America – with the exception of Brazil. Again, the use of English is marginal in these countries.

Even when English language titles are part of the top ten, many mention regional concerns. The scores for English are most likely an underestimate, as they are based on country name only. The method to determine the regional focus does not consider transnational regions, such as the Maghreb. On top of that, books focusing on smaller areas might be excluded. For instance, a monograph whose subject is the city of Jakarta instead of Indonesia would not be marked as ‘country focused’. It is interesting to note that in most countries where English is an official or national language, the scores are relatively low. It does not seem likely that in English-speaking countries there will be less interest in topics of national concern. Instead, the methodology based on the text mining results is probably too ‘strict’.

The results of this study depended on the titles available in the OAPEN collection. The lack of certain subjects or languages especially could have played a role: how much demand has not been met? An even more diverse collection might have led to a different outcome. Another angle could be the distinction between books on HSS subjects and books on STM subjects: although the number of titles might differ, is there also a significant difference in usage? Conducting similar research using another open access book platform with a different collection would be welcome. For instance, one could analyse the global usage of the SciELO Books platform, where 96% of the titles are written in Portuguese, or the OpenEdition Books platform, where 85% of the collection is French.

In conclusion, this article clearly indicates a demand for regionally focused titles. Furthermore, it counters the narrative of the dominance of English as the language of scholarly communication. Instead, it supports the value of bibliodiversity.