Introduction

The benefits of free, unmediated access to research data are widely acknowledged, especially in the life sciences. Despite mandates from both funding agencies and publishers, however, open data initiatives have been only partially successful. Previous research suggests that this can be attributed to a lack of incentives for data creators, who are often expected to expend considerable effort without receiving meaningful rewards for their work. Data creators who have documented their procedures in detail, made their data user-friendly, and met data archives’ strict submission requirements will often receive nothing more than an acknowledgment, which counts for little within the framework of research funding, promotion and tenure.

The first section of this paper, ‘Incentivizing data access’, shows how data journals – those that publish data reports rather than conventional articles – provide strong incentives for data creators to thoroughly verify, document, review and disseminate their work (i.e. to document and publish their data in accordance with open data principles). Unlike conventional journals with data-sharing mandates, data journals reward authors who share their data. Unlike data archives, data journals bring access and documentation into the mainstream of scholarly communication through conventional practices such as authorship, publication and citation.

The second section of the paper, ‘Characteristics of data journals’, updates an earlier study by Leonardo Candela and associates, providing current information about the data journal landscape: the characteristics and policies of data journals in biology, environmental science, chemistry, medicine, and health sciences. The results may be useful to librarians, to authors in STEM disciplines and to researchers in areas such as scholarly communication, science studies and information science.

The final section, ‘Data journals: potential and continuing challenges’, summarizes the advantages of data journals – advantages that may grow or diminish with changes in the scholarly communication system. It also describes three continuing difficulties: the need for sustainable data management practices, the fact that the incentives provided by data journals may not always offset the advantages of keeping data private, and the exclusionary effect of open access (OA) article processing charges (APCs).

Incentivizing data access

The authors of journal articles do not usually engage in unmediated data sharing (i.e. sharing through methods that do not require the requester to contact the author or data provider) unless mandates or other mechanisms require or encourage it. Only 37% of natural and social scientists have ever shared their data through a public repository or archive, and just 25% have shared their data through a journal’s website. Moreover, mediated requests for data, such as e-mail messages sent to authors, are unsuccessful at least one-third of the time. The success rate for mediated requests may be higher in particular fields, however.

Open data mandates

At least 24 major funding agencies in the life sciences have established policies that require or promote data sharing. Nearly a quarter of health and life science journals have data-sharing policies of one kind or another, and many prominent journals have adopted policies that require authors to provide unmediated access to the data used in their analyses. For example, the Public Library of Science (PLOS) requires authors to make their data publicly available when the paper is submitted.

There are two problems with these mandates, however. First, many researchers fail to comply with data-sharing mandates, even when the submission of data is a nominal requirement. In the BioMed Central (BMC) journals, which require data sharing, full data are immediately available for just 31% of the papers. Sixty per cent include a notation that data are available on request, while 9% appear not to comply with the requirement in any way. Although strict standards for the protection of human subjects may partially account for non-compliance with data-sharing requirements, it is not clear why an author would submit to a journal with a data-sharing mandate while knowing that his or her data could not be shared. Higher rates of unmediated data sharing, from 78% to 86%, have been reported for life science journals other than the BMC journals, but more than a third of the data files are incomplete; they do not provide enough details for replication. In the field of metabolomics, there is only a weak relationship between journals’ data-sharing policies and the extent to which data are actually made available.

Second, and perhaps more significantly, data-sharing mandates provide no real incentives for compliance, since the reward for publishing in a journal that requires data sharing is no greater than the reward for publishing in a journal that does not. Given the extensive time and effort that data sharing requires, mandates may even encourage adaptive strategies that are not in the best interests of the scholarly community. For instance, data-sharing requirements at individual journals may encourage authors to send their work to other journals of similar quality and scope that do not require data sharing, since sharing involves a cost without a commensurate benefit. Likewise, authors may be less likely to generate or compile original data if the relative benefits of that work (to them) are reduced through mandatory data disclosure. That is, effective data-sharing mandates at conventional journals might increase data availability in the short term, but decrease it in the long term once authors adjust their behavior in accordance with the more widespread availability of data – once they fully understand that it is more cost-effective to use available data than to generate their own. Authors may also postpone their article submissions in order to more fully exploit their data files before handing them over to others, thereby delaying the reporting of potentially important findings.

It is important to realize that even the widespread enforcement of data-sharing mandates is unlikely to change the underlying system of incentives. We can expect data sharing to become more common system-wide only if scholars, promotion committees and funding agencies assign greater credit to the production, documentation and dissemination of data.

The importance of incentives

Surveys and discussions with researchers in the life sciences suggest several reasons why authors may be reluctant to make their data available on the open web. (See Table 1.) When asked about obstacles to the free, online dissemination of data, survey respondents mentioned the need to keep data private during ongoing projects, inadequate credit for those who create and share data, legal concerns, the possible misinterpretation or misuse of publicly available data and the potential loss of control over valuable intellectual property. Researchers recognize the benefits of open data initiatives, but they also want full credit for their contributions.

Table 1

Reasons authors may be reluctant to share their data

Poor research practices

    Absence of a culture of data sharing in the academic field
    Potential for discovery of errors in the data creator’s published analyses
    Inadequate documentation of data-related procedures
    Failure to save and safeguard data, metadata or statistical code
    Loss of data or interpretive expertise due to the retirement or migration of personnel
Limited data storage and dissemination mechanisms

    Relatively few journals or other outlets devoted to data publication
    Lack of technical expertise in data publishing
    Hardware or software problems
    Obsolete devices and file formats
Limited awareness of open data principles

    Concern that public disclosure of data will violate legal or ethical norms
    Difficulty dealing with open access licensing terms (e.g. Creative Commons licenses)
Ongoing research

    Desire to keep data private until the research project is completed
Expenditure of effort

    Considerable effort required to produce documentation that is unlikely to be needed by the data creator in his or her own research
    Awareness that the expenditure of effort needed to comply with a data request, even if minimal, could be otherwise devoted to activities that bring greater rewards
Inadequate credit for data-sharing activities

    Reluctance to share valuable data due to a general sense of ownership
    Absence of universal mechanisms, such as authorship and citation, by which data creators can be recognized and rewarded
    Concern that the costs of data dissemination are considerably greater than the individual rewards – that the sharing of data without compensation, and the use of data without credit, are inherently unfair
    Concern that commercial firms will use the data inappropriately or without compensation
Potential for misuse of data

    Fear that data dissemination will facilitate plagiarism
    Concern that users without an understanding of the data will draw unwarranted or misleading conclusions

Incentives are especially important to scholars working on long-term research projects. James A. Mills et al. surveyed 73 ecologists serving as principal investigators on projects of five to 68 years’ duration, reporting that two thirds were unhappy with mandates for public data access and that only 8% were in favor of making their data freely accessible online. Nearly 55% stated that they would avoid publishing in journals that required them to share their data on the open web. Several noted that the data collected for a long-term project may provide the foundation for an entire career of scientific activity. Because no single grant provides funding for the entire duration of a project, the principal investigators must constantly identify new sources of short-term funding for interim projects that make use of the unique data they have collected. Widespread data dissemination ‘could lead to a loss of funding opportunities if data for their next project are routinely mined by other researchers’. Consequently, data sharing may reduce the number of long-term projects by decreasing the incentives for undertaking them.

Individual motives and incentives are not mentioned in most open data policies, which focus almost exclusively on the broader societal benefits of data sharing. For instance, the FAIR Data Principles do not address authorship, citation, or credit for the creation or maintenance of data files. After all, the main beneficiaries of data-sharing policies are data users rather than producers. ‘Researchers’ incentives to release their own data may or may not align with their motivations to gain access to the data of others’, especially when data submission and data archiving require a considerable investment of time and effort.

Data reports and data journals

Data journals are those that publish data reports on a regular basis, either exclusively or as a primary article type. Each data report describes the data that underlie an empirical paper or a broader research project. Data reports often include greater methodological detail than would normally be found in a research paper – information on the procedures used to generate or compile the data, the population of interest, the sampling methods, the variable names and response codes, difficulties encountered, decisions made, user notes and suggestions for further use of the data. Nearly all data reports are peer-reviewed. Although some present simple descriptive statistics or frequency tables, they do not normally include cross-tabulations, multivariate analyses, or other attempts to describe the relationships among the variables. For any empirical study that draws on the data, three elements – the study itself, the data report and the data file – should provide all the information needed to replicate the analysis.

Although the Journal of Chemical and Engineering Data first appeared in 1956, most data journals have been founded within the past five or ten years. Most conform to OA principles and are therefore open data journals.

As noted earlier, data-sharing mandates and data repositories seldom provide substantial incentives for authors to make their data openly available. In contrast, data journals do so by adopting a universally accepted mechanism of quality control (peer review), providing authorship credit for data reports and facilitating the indexing and citation of those reports. Perhaps most importantly, data reports are peer-reviewed articles that can be readily cited and recognized for their scholarly impact in terms that are widely understood by promotion and tenure committees. Data journals thereby bring data publishing into the mainstream of scholarly communication: data reports are authored, published, indexed, cited and used in much the same way as conventional journal articles. Initiatives that merely encourage or require data dissemination are unlikely to be as effective as the systems and incentives that underlie data publishing. Through publication, data reports emerge as first-class research products that are fully integrated into the scholarly communication system through processes such as authorship, validation (e.g. peer review), dissemination, preservation and citation. They can also provide valuable information that is often not fully presented within either data sets or conventional research papers.

Although the number of data journals has been increasing over the past few years, many researchers are still unaware of them. In a recent survey, only 16% of researchers in the natural and social sciences were able to name one or more data journals. Nonetheless, many respondents were intrigued by the prospect of authoring data reports, and one wrote ‘I’ve never heard of this, but it sounds fantastic’. A similar 2009 survey found that only 9% of meteorologists had heard of Earth System Science Data, but 69% said they would use a data journal to find data relevant to their work. Likewise, 67% reported that the prospect of getting authorship credit for their contributions would make them more likely to publish their data.

Characteristics of data journals

This section examines the characteristics of data journals in the fields of biology, environmental science, chemistry, medicine and health sciences, presenting detailed information on each of the 13 data journals that regularly publish in those areas.

Methods

I used Google to identify an initial set of journals, following much the same procedure as used by Candela and associates in 2015. Specifically, I searched for the phrases ‘data journals’, ‘open data journals’ and ‘open access data journals’. I also searched for the names of well-known data journals, since the websites that mention one data journal often mention others as well. Finally, I added all the data journals identified by Candela et al., which resulted in a list of 169 journals.

Further review of the journals’ websites revealed that only 19 of the 169 journals (11%) currently publish data reports as a primary article type. Specifically, the 169 journals include:

  • 19 ‘pure’ data journals, for which data reports comprise at least half the papers in the journal (Group 1)
  • 109 journals that publish data reports but are devoted mainly to other types of contributions (Group 2). On average, data reports account for just 1.6% of the papers in these journals, and there are just three journals for which data reports comprise more than 8% of the published items
  • 21 journals that do not actually include data reports as a publication type (Group 3). Some may have been inadvertently described as data journals due to their coverage of data science topics or their strict requirements for dissemination of the data used in empirical papers
  • 20 journals that are no longer published, or that no longer publish data reports (Group 4). Very few of these journals were devoted mainly to data reports.

The Appendix lists the journals in each group. The fact that only 19 of the journals are pure data journals (Group 1) is consistent with the findings of Candela et al., who reported that only seven of their 116 journals (6%) were devoted solely to data reports.

No online list of data journals is comprehensive. Moreover, most of the online lists include journals that are not data journals in any real sense. Some journals claim to accept data reports but have never published any. Others publish articles about data science rather than data reports. (See the Appendix.)

The variables (journal characteristics) examined in this study are more extensive than those presented by Candela et al. (See Table 2.) However, information is provided only for the 13 Group 1 data journals that publish in the fields of medicine, health, biology, or chemistry. Information was compiled mainly from the websites of publishers and journals, although other sources were consulted.

Table 2

Variables for which information was compiled

General information

    URL
    Open access?
    Year founded
    Items published, July 2018 through June 2019
    Percentage of published items that are data reports
    Subject scope
    Publisher
    Publisher information
    General note
Characteristics of data reports

    Term for data reports
    Typical length of data reports
    Required or recommended sections of data reports
    Original or secondary data?
    Data files hosted on journal’s platform or elsewhere?
    % of data files included in text of report
    % of data files on journal’s website as supplementary files
    % of data files in external data repository
    % of data files not found
    Data hosting note
Editors and peer review

    Editor(s) in chief
    Editorial board
    Review process
    Time from submission to first decision
    Time from acceptance to publication
    Acceptance rate
Licenses and article processing charges

    Creative Commons license(s) for OA data reports
    Article processing charge (APC) for OA data reports
    Waivers or reductions of the APC?
Indexing and citation impact

    Indexed in BIOSIS?
    Indexed in PubMed?
    Indexed in Science Citation Index (SCI)?
    Indexed in Scopus?
    SCI Impact Factor percentile
    Scopus CiteScore percentile

The concentration of data journals in the life sciences is nothing new. The current percentage – 68%, or 13 of the 19 Group 1 journals – is consistent with the 77% value reported by Candela et al. The number of multidisciplinary data journals appears to have increased over time, however. Candela and associates identified just one multidisciplinary data journal, but this investigation includes three: Data in Brief, Scientific Data and Data.

Results and discussion

The results are presented in five subsections that correspond to the headings shown in Table 2. For the full results and associated notes, see Supplementary Table 1. (Details may be found in the data accessibility statement at the end of this article.)

General information

Table 3 shows general information on the 13 data journals included in the investigation. Three of the 13 journals, including the two oldest, are not actually open data journals, since they require a subscription for access. The other ten journals, all OA, are relatively new: with the exception of Earth System Science Data (founded in 2009), all were founded in 2013 or later.

Table 3

Data journals included in the investigation

Data journal | OA? | Founded | Itemsᵃ | Data reportsᵇ | Subject scope | Publisher
Data in Brief | Yes | 2014 | 1,520 | 100% | All subjects | Elsevier
Scientific Data | Yes | 2014 | 274 | 90% | Natural sciences | Springer Nature
IUCrData | Yes | 2016 | 181 | 100% | Crystallography & related fields | International Union of Crystallography
Data | Yes | 2016 | 145 | 50% | Natural scis., some social scis. | Multidisciplinary Digital Publishing Institute
Earth System Science Data (ESSD) | Yes | 2009 | 130 | 55% | Earth system sciences | Copernicus Publications
Biodiversity Data Journal | Yes | 2013 | 80 | 74% | Biodiversity science | Pensoft
Geoscience Data Journal | Yes | 2013 | 18 | 87% | Geosciences | Wiley, Royal Meteorological Society
Journal of Open Psychology Data | Yes | 2013 | 4 | 97% | Psychology | Ubiquity Press
Open Data Journal for Agricultural Research | Yes | 2016 | 2 | 100% | Agriculture & food (in)security | Several universities and research foundations
Open Health Data | Yes | 2013 | 2 | 100% | Health & medicine | Ubiquity Press
Journal of Chemical & Engineering Data | No | 1956 | 569 | 97% | Materials science | American Chemical Society
Chemical Data Collections | No | 2016 | 155 | 100% | Chemistry | Elsevier
Journal of Physical & Chemical Reference Data | No | 1972 | 14 | 99% | Physical sciences | American Institute of Physics, with NIST

a. Items published, July 2018 through June 2019. Includes data reports, conventional research articles, and other items such as editorials.

b. Percentage of published items that are data reports.

The three multidisciplinary journals – Data, Data in Brief and Scientific Data – are larger than most of the others. In fact, Data in Brief publishes about as many data reports as the other 12 journals combined. In contrast, the three smallest data journals each publish fewer than five data reports per year.

Ten of the 13 journals are devoted almost exclusively to data reports; apart from occasional editorials or feature articles, data reports account for at least 87% of the items published in those journals. The other three – Biodiversity Data Journal, Data and Earth System Science Data – routinely publish items other than data reports, such as reviews or empirical articles on data science topics.

Nine of the 13 journals are published by commercial publishers, and Elsevier accounts for more than half the data reports that appear each year. Notably, non-profit organizations publish two of the three journals that are accessible only to subscribers.

Although Candela et al. list BMC as a major publisher of data journals, the Appendix shows that data reports account for no more than 3% of the items published in any BMC journal. Data reports are welcome at nearly all the BMC journals, but always as one of several article types.

Overall, the publishers of data journals are notable for their good reputations. Publishers such as Elsevier and the American Chemical Society are well known, and Pensoft earned the 2016 Innovator Award of the Scholarly Publishing and Academic Resources Coalition (SPARC).

Characteristics of data reports

The 13 journals use 11 different terms for their data reports: data paper (five instances), article (two instances), data descriptor (two instances), data article, data description paper, data in brief, interactive key, research article, single taxon treatment, species conservation profile and taxonomic paper. Biodiversity Data Journal is unique in publishing several distinct types of data reports: data papers, interactive keys, single taxon treatments, species conservation profiles and taxonomic papers. The characteristics of data reports vary from one journal to the next. (See Table 4.)

Table 4

Characteristics of data reports in each of the 13 journalsᵃ

The last four columns give the percentage of data file(s)ᶜ in each location.

Data journal | Typical length (printed pages)ᵇ | Original or secondary data? | In text of reportᵈ | On journal’s website as suppl. files | In external data repositoryᵉ | Not found
Data in Brief | 6–9 | Either | 14% | 56% | 26% | 4%
Scientific Data | 7–10 | Either | 0% | 0% | 100% | 0%
IUCrData | 6–10 | Original | 0% | 100% | 0% | 0%
Data | 8–16 | Either | 0% | 28% | 60% | 12%
Earth System Science Data (ESSD) | 13–21 | Original | 0% | 0% | 100% | 0%
Biodiversity Data Journal | 12–22 | Original | 26% | 44% | 28% | 2%
Geoscience Data Journal | 8–14 | Either | 0% | 0% | 87% | 13%
Journal of Open Psychology Data | 4–6 | Either | 0% | 0% | 100% | 0%
Open Data Journal for Agricultural Research | 6–9 | Either | 0% | 0% | 100% | 0%
Open Health Data | 4–6 | Either | 0% | 4% | 74% | 22%
Journal of Chemical & Engineering Data | 8–11 | Original | 60% | 38% | 0% | 2%
Chemical Data Collections | 8–14 | Original | 76% | 6% | 10% | 8%
Journal of Physical & Chemical Reference Data | 25–40 | Original | 98% | 0% | 0% | 2%

a. Length and data file statistics are based on the 50 most recent data reports in each journal – or on all the published data reports, for journals with fewer than 50.

b. Each page range represents the middle two-thirds of the values (i.e. the median ± 1 standard deviation, adjusted to account for the natural breaks in the distribution of page lengths).

c. If the same data were presented in multiple places, they were counted in the leftmost column: in text of report rather than on journal’s website, and on journal’s website rather than in external data repository. Data sites owned by the journal publisher but separate from the journal were counted as external repositories.

d. This category includes most chemical data as well as most image data (photographs, blots, diagnostic images, etc.).

e. Includes cases in which the data could be readily located despite an incorrect URL or identifier in the data report.
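
To make footnote b concrete, the short sketch below estimates a ‘middle two-thirds’ range from simulated page counts. It uses percentiles rather than the median ± 1 standard deviation rule described above; for roughly normal data the two approaches give similar intervals. The page counts are invented for illustration, not drawn from the journals themselves.

```python
import numpy as np

# Simulated page counts for 50 data reports -- illustrative values only.
rng = np.random.default_rng(42)
pages = np.clip(rng.normal(loc=7.5, scale=1.5, size=50).round(), 2, None)

# The central two-thirds of a distribution lies between the 16.7th and
# 83.3rd percentiles (roughly the median +/- 1 SD for normal data).
low, high = np.percentile(pages, [100 / 6, 500 / 6])
print(f"Typical length: {low:.0f}-{high:.0f} pages")
```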

Although the average length of a data report is nine pages, the typical length ranges from five pages at Open Health Data and the Journal of Open Psychology Data to more than 30 pages at the Journal of Physical and Chemical Reference Data. These differences in length often represent differences in the number of sections or elements expected by the editors of each journal (e.g. data collection methods, sampling strategies, validation, limitations, unique or innovative characteristics, variables, coding, descriptive statistics, file specifications and user notes).

Six of the 13 journals accept only reports that describe original data – data based on the author’s own experimental, observational, computational or statistical work. Seven also accept reports based on secondary data – data compiled from publicly available sources (e.g. archives, documents or websites). With secondary data, the author is expected to have added value through processes such as compilation, standardization or verification.

While all these data journals require authors to make their data freely available without mediation, only two of the 13 have policies that require authors to host their data on the journals’ own websites. Five require authors to deposit their data in an external repository and six allow authors to present their data either on the journal’s website or elsewhere. Table 4 shows, for each journal, the percentage of data sets that are (a) included in the text of the data report itself, (b) available on the journal’s website as supplementary files or (c) hosted in an external repository. As Table 4 reveals, there is no consistency in the practices adopted by the 13 journals, other than a tendency to present chemical and image data within the data report itself. In particular, the three largest open data journals – Data in Brief, Scientific Data and IUCrData – have each adopted different approaches to data access. Aside from a few minor discrepancies, the actual data access practices of the journals are consistent with their stated policies. For example, 96% of the data reports published in Open Health Data have data files hosted elsewhere, in keeping with the journal’s policy, and 4% have data files hosted on the journal’s website.

Previous research has revealed high rates of non-compliance with data mandates at conventional journals, from 14% to 69%. For the 13 data journals shown in Table 4, the rate of non-compliance is considerably lower at around 3%. Data journals’ lower rate of non-compliance is presumably due to the fact that only authors with a commitment to data archiving (publication) are likely to submit their work to a data journal. In contrast, conventional journals, including those that require data sharing, may attract authors who have no particular interest in making their data accessible.

For the non-compliant data reports – those that did not provide immediate access to the data – broken links were the main difficulty. Specifically, the 565 data reports evaluated for this purpose (i.e. those that appeared most recently in each data journal) include ten with broken links to data repositories (for which the available information was not enough to provide ready access to the data), five with instructions such as ‘contact the author for data access’, four that include only summary statistics rather than raw data or microdata, three for which a supplementary file is mentioned but not accessible, three for which data access requires registration with the data repository and one for which the data repository includes a data set that is clearly incomplete. Broken links are especially prevalent at Data and Geoscience Data Journal. They do not necessarily represent non-compliance on the part of the author, however, since they can also result from errors by journal publishers and repository managers.

Access restrictions that require data users to identify themselves (i.e. ‘contact the author’ and ‘register with the data repository’) were counted as a form of non-compliance, since they are contrary to the spirit of OA; they provide an opportunity for authors and repository administrators to deny particular data requests. We should keep in mind, however, that access restrictions may sometimes be instituted to protect the privacy or safety of human subjects. At Open Health Data, the most clinically oriented of the 13 data journals, all five instances of non-compliance can be traced to access restrictions that require data users to identify themselves.

In 2015 most data journals required authors to submit their data to third-party archives, since ‘maintaining a 24/7 operational data repository service requires investments in specialized computing, software resources, and skilled technical staff’. Although this is understandable, a system that relies on multiple agencies and technologies is inherently less stable than one in which responsibility is clearly delineated. With external (third-party) data deposit, at least three actors are involved in every transaction – every attempt to deposit, evaluate, revise, verify or maintain the data. For instance, there is no mechanism by which the modification of a data file on PhysioNet necessarily leads to a change in the data report published by Scientific Data. Moreover, limited evidence suggests that non-compliance rates are lower when authors are required to post their data on the journal’s platform, either within the report or as a supplementary file. The point-biserial correlation, rpb, between compliance rate and data policy is 0.30 when the policies are coded 1 (data are included within the report or on the journal’s website), –1 (data are hosted in an external repository), or 0 (either option is acceptable).
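
As an aside on the computation: with the policies coded numerically, the point-biserial coefficient is simply a Pearson correlation between the policy codes and the journal-level compliance rates. The sketch below uses made-up values, since the journal-by-journal codes are not reported here; it illustrates the method rather than reproducing the 0.30 figure.

```python
import numpy as np

# Hypothetical values for 13 journals -- illustrative only. Policy codes
# follow the scheme above: 1 = data within the report or on the journal's
# website, -1 = external repository required, 0 = either option.
policy = np.array([1, 1, 0, 0, 0, 0, 0, 0, -1, -1, -1, -1, -1])
compliance = np.array([1.00, 0.94, 0.96, 0.98, 0.98, 0.92, 0.88, 0.99,
                       0.98, 0.95, 0.90, 0.87, 0.92])

# With a numerically coded grouping variable, the point-biserial
# coefficient reduces to an ordinary Pearson correlation.
r_pb = np.corrcoef(policy, compliance)[0, 1]
print(f"r_pb = {r_pb:.2f}")  # 0.41 for these made-up values
```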

All 13 data journals are willing to publish data that have not (yet) been used in a conventional research paper. However, the editors of Data in Brief recommend that authors first publish the research that draws on their data, then cite that research in the subsequent data report. This practice gives authors the exclusive use of their data, at least for a time, and ensures that the data are of proven utility. In contrast, the editors of Data recommend that authors first publish a data report, then cite the data report in their research.

Editors and peer review

The editors of the 13 journals are almost all at well-known universities or research institutes, such as Harvard University, Oxford University, MIT, the University of Copenhagen, the University of Melbourne, Uppsala University and the National Institute of Standards and Technology (NIST). The editorial boards vary in size from eight to 258 members (median = 20). Apart from the Journal of Physical & Chemical Reference Data, which has a strong US focus, no journal is dominated editorially by a single institution or country. All have broad international representation.

Candela et al. reported in 2015 that nearly all data journals use conventional peer review, in which:

  • the review process is intended to both evaluate papers and improve them through revision
  • at least two anonymous reviewers are selected by the editors
  • the reviewers’ comments are the primary factor in the editors’ decision
  • the reviews are not made available to readers of the journal
  • there is no provision for post-publication review.

More recent evidence (Table 5) shows that conventional peer review is still the norm. Nonetheless, the peer review criteria used by data journals do account for the journals’ unique characteristics. For example, Open Health Data asks reviewers to consider several criteria that apply to both the data and the data report: correctness of data description, level of methodological detail, appropriateness of methods, ability to replicate the data, extent to which reuse of the data is addressed, accessibility of the data, protection of study participants’ privacy, appropriateness of data repository, accessibility, licensing, use of non-proprietary file formats, labeling, user notes and inclusion of software or other materials needed to make use of the data. Overall, the review criteria adopted by data journals correspond closely to data users’ expectations. Data users want reliable, well-documented data collection and processing methods, adequate metadata that allow for replication, technical details that inspire confidence in the quality of the data, and data files and notes that can be understood without assistance.

Table 5

Notes on review processes

Data in Brief

    Six criteria: Is the data format in alignment with existing standards? Are the protocol/references for generating data sufficiently explained? Is the data description complete and is data well-documented? Do the authors adequately explain the data’s utility? Are the data potentially reusable? Does the article adhere to the template?
Scientific Data

    Each paper is reviewed by one data standards expert and at least one subject expert based on ‘the technical quality of the procedures used to generate the data, the reuse value of the resulting datasets and their alignment with existing community standards, and the completeness of the data description. [Acceptance] is not based on the perceived impact or novelty of the findings’.
IUCrData

    Single-blind review by at least two reviewers. Papers not accepted after two rounds of revision will not be published.
Data

    Each paper is evaluated by at least two reviewers. Reviewers may choose to sign their reviews. Authors may choose to include the reviewers’ reports as supplementary materials.
Earth System Science Data (ESSD)

    Papers that meet the standards of an initial rapid review are posted to the journal’s website. Readers are invited to submit reviews or comments, and the editors’ decision accounts for both the solicited reviews and any additional remarks. If the paper is accepted, it is published with the referees’ comments (anonymous or attributed), the readers’ comments (attributed), and the authors’ replies.
Biodiversity Data Journal

    After initial editorial review, each paper is sent to two or three nominated reviewers, who are expected to submit their comments within ten days; and to several panel reviewers, who may choose whether to comment. Authors’ revisions are expected within one week, although extensions may be granted. Most revised papers are re-evaluated by the editors, although some are sent for another round of review.
Geoscience Data Journal

    The review process evaluates the data report (completeness, appropriateness of methods, uniqueness, applicability and utility of the data), the metadata (completeness and quality) and the data (accessibility and usability).
Journal of Open Psychology Data

    The review criteria include content, structure and argument, figures/tables, formatting and language.
Open Data Journal for Agricultural Research

    No information provided.
Open Health Data

    The review process evaluates the data report (description of methods, appropriateness of methods, ability to replicate methods, correctness of data description, extent to which reuse of the data is addressed, accessibility of the data) and the data (appropriateness of data repository, accessibility and licensing, file formats, labeling and user notes, study participants’ privacy, inclusion of software or other necessary supplements).
Journal of Chemical and Engineering Data

    ‘Articles should present a significant amount of experimental or computational data on properties of systems of technological or theoretical interest that are not available in the original literature, that have lower uncertainty than those published, or that help resolve conflicts in previously published values.’
Chemical Data Collections

    Six criteria: Is the data format in alignment with existing standards? Are the protocol/references for generating data sufficiently explained? Is the data description complete and data well-documented? Do the authors adequately explain the data’s utility? Are the data potentially reusable? Does the article adhere to the template?
Journal of Physical and Chemical Reference Data

    No information provided.

Just one of the 13 journals, Scientific Data, has adopted the ‘soundness-only’ standard used by some OA journals such as PLOS ONE and Scientific Reports. This standard is meant to ensure that scientifically rigorous work is not excluded due to its presumed lack of novelty, significance or expected impact, and to avoid the publication bias that results when only statistically or theoretically significant work appears in the literature. However, there is evidence that reviewers consider perceived importance even when instructed not to do so, and the distinction between conventional and ‘soundness-only’ peer review may be less meaningful for data reports than for conventional research articles.

A few of the 13 journals have adopted innovative procedures while maintaining conventional peer review standards. Scientific Data sends each paper to at least one subject expert and at least one data standards expert. IUCrData accepts or rejects each paper after no more than two rounds of revision. In a somewhat greater departure from the norm, Biodiversity Data Journal solicits reviews from both regular reviewers (who agree to review the paper) and panel reviewers (who may or may not choose to comment). Reviewers’ comments are expected within ten days, and authors are expected to complete their revisions ten days later, although extensions may be granted. Likewise, papers that meet rapid review standards at Earth System Science Data are posted to the journal’s website. Readers are asked to submit their comments and the editors’ decision accounts for both the solicited reviews and any additional remarks. If the paper is accepted, it is published with the referees’ reviews (anonymous or attributed), the readers’ comments (attributed) and the authors’ replies.

For the ten journals with available data, the median time from submission to first decision is 52 days. Two of the ten have median review times of 30 days or less (Data: 17 days; IUCrData: 24 days), five of 35 to 60 days, and three of 130 days or more (Journal of Chemical and Engineering Data: 132 days; Scientific Data: 165 days; Geoscience Data Journal: 167 days). There is far less variation in the usual time from acceptance to publication, the median being 18 days, with values of 38 days or fewer for all but Geoscience Data Journal. (See Supplementary Table 1 – details may be found in the data accessibility statement at the end of this article.) Unfortunately, acceptance rate data were available for only three of the journals: Chemical Data Collections (37%), Data in Brief (39%) and IUCrData (83%).

Licenses and APCs

As noted earlier, ten of the 13 data journals are open data journals that comply with the Berlin Declaration on Open Access. OA principles are represented fully in the Creative Commons CC BY licenses adopted by each of the data journals for which information is available. The CC BY license allows others to redistribute, modify and build upon the data report (and the accompanying data) as long as they credit the author/creator of the original work. Six of the data journals – Data in Brief, Scientific Data, the Journal of Open Psychology Data, Open Health Data, the Journal of Chemical and Engineering Data and Chemical Data Collections – also offer other licensing options that (for example) limit the creation of derivative works, restrict commercial use, or limit redistribution and use in the first 12 months after publication.

At the ten open data journals – the first ten shown in Table 6 – the APCs vary dramatically, from no charge at all (three journals) to $1,690 at Scientific Data. The average APC is $574, with no consistent difference between commercial and non-profit journals. These results are in line with those of Candela et al., who reported an average APC of $523–$566 in 2019 dollars. Although many authors have grant funds or institutional support to cover these charges, that is not always the case, and data archives/repositories may charge additional fees. Fortunately, most of the 13 journals have generous APC waiver policies for authors in developing countries, and most will also consider granting waivers for other reasons.

Table 6

Article processing charges (APCs) and waiver policies

Data journal | APCᵃ | Waivers and reductions
Data in Brief | $600 | Possible, especially for authors in developing countries
Scientific Data | $1,690 | Automatic for authors in developing countries; possible for others
IUCrData | $200 | Possible, especially for authors in developing countries
Data | $1,020 | Possible, especially for authors in developing countries and in disciplines with less funding
Earth System Science Data (ESSD) | $0 | Not applicable
Biodiversity Data Journal | $110–$510 | Automatic for retirees, independent scholars, students, and authors in developing countries
Geoscience Data Journal | $1,200–$1,500 | Automatic for authors in developing countries
Journal of Open Psychology Data | $0 | Not applicable
Open Data Journal for Agricultural Research | Not stated | Not stated
Open Health Data | $0 | Not applicable
Journal of Chemical & Engineering Data | $1,250–$5,000 | Automatic for authors in developing countries
Chemical Data Collections | $500 | Possible, especially for authors in developing countries
Journal of Physical & Chemical Reference Data | Not stated | Not stated

a. The Journal of Open Psychology Data and Open Health Data ask for voluntary contributions of $435 and $125, respectively.
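
The $574 average can be reproduced from Table 6 under two assumptions that are not spelled out above: ranged APCs are taken at their midpoints, and ‘not stated’ entries are excluded from the average.

```python
# APCs of the ten open data journals (Table 6). Ranges are stored as
# (low, high) tuples; None marks an APC that is not stated.
apcs = {
    "Data in Brief": 600, "Scientific Data": 1690, "IUCrData": 200,
    "Data": 1020, "Earth System Science Data": 0,
    "Biodiversity Data Journal": (110, 510),
    "Geoscience Data Journal": (1200, 1500),
    "Journal of Open Psychology Data": 0,
    "Open Data Journal for Agricultural Research": None,
    "Open Health Data": 0,
}

# Midpoints for ranged APCs; unstated APCs are skipped (both assumptions).
stated = [sum(v) / 2 if isinstance(v, tuple) else v
          for v in apcs.values() if v is not None]
print(f"Average APC: ${sum(stated) / len(stated):,.0f}")  # $574
```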

Indexing and citation impact

Data journals bring data dissemination efforts into closer alignment with scholarly norms through peer review, indexing and citation. Although several authors have set forth guidelines for the direct citation of data files, none of those guidelines have been widely adopted. Data files are often used but not cited and data citations, when they do appear, are often inconsistent in format. The inclusion of data reports in bibliographic databases such as BIOSIS, PubMed, Google Scholar, Science Citation Index (SCI) and Scopus provides a way around these difficulties. Data reports can be indexed and cited in the same way as conventional research articles, using the same mechanisms that have proven effective within the broader system of scholarly communication.

These advantages will be achieved, however, only if data journals are actually included in the major bibliographic databases. Of the 13 data journals, just three are indexed in BIOSIS. Ten, however, are indexed in PubMed, six in SCI and eight in Scopus. (See Table 7.) Despite the poor coverage of these data journals in BIOSIS and SCI, there are two reasons why the inclusion of data journals in bibliographic databases may provide an incentive for authors to publish there. First, the indexing of Group 1 data journals – those devoted mainly to data reports – appears to have improved substantially in recent years. In 2015, none of the seven Group 1 journals identified by Candela et al. were included in either SCI or Scopus. Since many bibliographic databases are reluctant to include recently founded journals, we might expect better coverage of data journals in the coming years as each builds a multi-year record of publication and scholarly impact. Second, the index coverage rates for all 13 journals, combined, are higher than might be suggested by the entries for the individual journals (Table 7). This is because the journals that publish more data reports are more likely to be indexed in BIOSIS (rpb = 0.53) and Scopus (rpb = 0.34). Ninety-four per cent of the data reports in these 13 journals are indexed in PubMed, 91% in Scopus, 63% in BIOSIS and 33% in SCI.
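
These report-weighted coverage rates can be reproduced from the annual output figures in Table 3 and the indexing entries in Table 7, below. In the sketch that follows, each journal’s annual number of data reports is approximated as its published items multiplied by its percentage of data reports, so the counts are estimates rather than exact values.

```python
# Report-weighted index coverage: the share of individual data reports,
# rather than journals, covered by each index. Counts are approximated
# from Table 3; the flags follow Table 7 (BIOSIS, PubMed, SCI, Scopus).
journals = {
    "Data in Brief":                     (1520, True,  True,  False, True),
    "Scientific Data":                   (247,  True,  True,  True,  True),
    "IUCrData":                          (181,  False, True,  False, False),
    "Data":                              (72,   False, True,  False, False),
    "Earth System Science Data":         (72,   False, True,  True,  True),
    "Biodiversity Data Journal":         (59,   True,  True,  True,  True),
    "Geoscience Data Journal":           (16,   False, True,  True,  True),
    "Journal of Open Psychology Data":   (4,    False, False, False, False),
    "Open Data J. for Agric. Research":  (2,    False, False, False, False),
    "Open Health Data":                  (2,    False, True,  False, False),
    "J. of Chemical & Engineering Data": (552,  False, True,  True,  True),
    "Chemical Data Collections":         (155,  False, False, False, True),
    "J. of Physical & Chem. Ref. Data":  (14,   False, True,  True,  True),
}

total = sum(entry[0] for entry in journals.values())
for i, index in enumerate(("BIOSIS", "PubMed", "SCI", "Scopus"), start=1):
    covered = sum(entry[0] for entry in journals.values() if entry[i])
    print(f"{index}: {100 * covered / total:.0f}% of data reports")
# Prints 63%, 94%, 33% and 91%, matching the percentages reported above.
```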

Table 7

Bibliographic index coverage and citation impact (Impact Factor and CiteScore percentiles)

Data journal | BIOSIS | PubMed | SCI | Scopus | IFᵃ | CiteScoreᵇ
Data in Brief | Yes | Yes | – | Yes | – | 71
Scientific Data | Yes | Yes | Yes | Yes | 87 | 99
IUCrData | – | Yes | – | – | – | –
Data | – | Yes | – | – | – | –
Earth System Science Data (ESSD) | – | Yes | Yes | Yes | 99 | 99
Biodiversity Data Journal | Yes | Yes | Yes | Yes | 26 | 43
Geoscience Data Journal | – | Yes | Yes | Yes | 66 | 88
Journal of Open Psychology Data | – | – | – | – | – | –
Open Data Journal for Agricultural Research | – | – | – | – | – | –
Open Health Data | – | Yes | – | – | – | –
Journal of Chemical & Engineering Data | – | Yes | Yes | Yes | 51 | 74
Chemical Data Collections | – | – | – | Yes | – | 41
Journal of Physical & Chemical Reference Data | – | Yes | Yes | Yes | 76 | 91

A dash indicates that the journal is not indexed in the database (or, for IF and CiteScore, that no score is assigned).

a. IF (Impact Factor) is the average number of times the articles published in the journal over a two-year period (the two years prior to the report year) were cited during the report year, based on SCI data. It is expressed here as a percentile rank among journals in the appropriate subject category.

b. CiteScore is the average number of times the articles published in the journal over a three-year period (the three years prior to the report year) were cited during the report year, based on Scopus data. It is expressed here as a percentile rank among journals in the appropriate subject category.

Among the 13 journals, inclusion in BIOSIS, PubMed, SCI and Scopus is not generally associated with variables such as founding date, non-profit status, report length or APC. There are a few exceptions, however. SCI is especially likely to index the journals that were founded earlier (rpb = 0.51), those that publish longer data reports (rpb = 0.56) and those with higher APCs (rpb = 0.49). Scopus is especially likely to index the journals that publish longer data reports (rpb = 0.47) and those with higher APCs (rpb = 0.41).

Only the journals indexed by SCI are assigned Impact Factors (IFs). Likewise, only the journals indexed by Scopus are assigned CiteScores. As Table 7 shows, both indicators reveal the same pattern: two data journals, Scientific Data and Earth System Science Data, have exceptionally high citation impact; four have higher impact than the average journal in their subject areas and seven have lower impact or are not covered by SCI and Scopus. The omission of a journal from those two databases does not necessarily indicate low impact, however. It may also be due to insufficient data (e.g. a recent founding date), failure to adhere to a regular publication schedule, a high self-citation rate or other factors. The high citation impact of Scientific Data is notable, especially since it was among the less cited data journals just a few years ago. Its CiteScore puts it in second place (99th percentile) among the 206 journals in the Scopus ‘statistics and probability’ category. It also ranks at or above the 94th percentile in five other subject categories. Likewise, Earth System Science Data is ranked first of the 182 journals in general earth and planetary sciences. Even Data in Brief, somewhat lower in the hierarchy, is ranked 26th of the 90 journals (71st percentile) in the Scopus ‘multidisciplinary’ category. Despite their recent emergence, nearly half of the 13 data journals have higher citation rates than most of the conventional journals in their subject areas.
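
A short calculation confirms that these percentile ranks follow from the raw ranks, on the assumption (my reading of the Scopus convention, not a formula stated here) that a journal’s percentile is the share of journals in its subject category that rank below it.

```python
def citescore_percentile(rank: int, n_journals: int) -> int:
    # Share of journals in the subject category ranked below the given
    # journal, truncated to a whole percentile.
    return int(100 * (n_journals - rank) / n_journals)

print(citescore_percentile(2, 206))   # Scientific Data: 99
print(citescore_percentile(1, 182))   # Earth System Science Data: 99
print(citescore_percentile(26, 90))   # Data in Brief: 71
```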

Summary

Of the 169 journals identified as data journals by Candela et al., or in various online sources, only 19 are Group 1 data journals – pure data journals devoted primarily to data reports. The 13 Group 1 journals that publish in the fields of medicine, health, biology or chemistry vary greatly in size, subject scope, publisher characteristics, length of data reports, data hosting policies, time from submission to first decision, APCs, bibliographic index coverage and citation impact. Nonetheless, nearly all are similar in their peer-review criteria, their OA license terms and the characteristics of their editorial boards.

Data journals: potential and continuing challenges

Data archives and data journals both make data freely accessible online. However, there are several advantages specific to data journals:

  • Quality control
    Conventional peer review ensures the quality and completeness of both data and documentation, thereby facilitating replication and reuse. The data report format encourages the replication and transparency that are essential to scientific research.
  • Discoverability
    The indexing of data reports increases their discoverability, thereby encouraging the use and citation of data while also promoting opportunities for collaboration between data creators and other scholars.
  • Incentives for data publishing
    The article format allows authors and institutions to receive full credit for their data-related work. It also facilitates citation linking from the data report to the studies that have used the data, and vice versa. The established system of authorship and citation credit gives researchers strong, direct incentives to publish their data, and these same incentives may encourage the production and dissemination of new data.
  • Efficiency of effort
    Data reports reduce the need to include data details in all the papers that use the data; authors may simply refer to the earlier data report.
  • Sustainability
    Hosting data on the publisher’s online platform helps ensure long-term access by reducing data users’ dependence on multiple organizations and multiple links.

The benefits of data journals are system-wide. The author who publishes a data report gets a peer-reviewed article, perhaps in a high-impact journal, for work that might otherwise go unacknowledged. Other researchers get a free data resource that has been evaluated more rigorously and described more fully than it might otherwise have been. The author’s institution gets an opportunity to raise its profile in rankings that account for publishing productivity or for the citation impact of scientific research. Finally, the publisher gets a journal – perhaps a highly cited journal – and an initial advantage in the data publishing arena.

Of course, authorship credit is a reliable incentive only if scholars acknowledge its value. Recent survey evidence suggests that they do. Natural and social scientists agree that conventional peer-reviewed articles carry more weight than peer-reviewed data reports, which in turn carry more weight than peer-reviewed, stand-alone data files. Most data files are not peer-reviewed, however, and all forms of peer-reviewed work are regarded more highly than other contributions.

Moreover, authorship credit is widely understood and accepted by scholars in a broad range of fields, unlike new forms of acknowledgment such as ‘data steward’ credit. At the moment, data archives provide no similar incentive – no true authorship credit – since the peer-reviewed article (or the book, in some fields) remains the gold standard by which research outputs are evaluated. Although changes in formal assessment programs such as the Research Excellence Framework might provide greater credit for data archive submissions (and thereby reduce the advantages associated with data reports and data journals), there is currently no sign that changes of this type are anticipated.

Despite the advantages of data journals, three problems remain. The most serious problem, which affects data archives and data journals equally, is the need for sustainable data management practices. A shift in responsibility – from individual authors to stand-alone data archives to data journals, for instance – does not alter the underlying fact that some individual or group must undertake the long-term management of hardware, software and data (e.g. maintenance of links and migration from older to newer file formats). Unfortunately, none of the 13 data journals shown in Table 3 have formal, publicly accessible policies that describe how they will ensure long-term data preservation and access. In that respect, they are similar to conventional economics journals. As the Appendix shows, not all data journals are financially or administratively viable, and many appear to have no ‘insurance’ in the event that they are no longer able to maintain the reports and files entrusted to them.

Second, previous research suggests that the advantages of keeping data private are especially great in fields such as ecology, where long-term, externally funded research projects (up to several decades’ duration) are the norm. Data journals may have limited impact in those subject areas, since the credit associated with a few data articles is unlikely to offset the benefits of maintaining exclusive access to unique research materials for an extended period.

Finally, the exclusionary effect of APCs is no less a problem for open data journals than for other OA journals. Although authors in developing countries can usually obtain APC waivers or reductions, others – independent scholars and students, for instance – may not be able to do so.

Data accessibility statement

Supplementary Table 1 is available at http://doi.org/10.5281/zenodo.3755191.