As part of our investigation into what might constitute “Reasonable Costs” for Public Access to US Federally Funded Research and Scientific Data, we are examining what is known about the costs to service providers (publishers, repositories, etc.) of providing those services, and what is known about the prices researchers are charged to provide public access to research. For research data, the pathways to publication are diverse and may include submission and deposit in various types of repositories (disciplinary, institutional, etc.), publication as supplements to research papers or as data papers, and the development of bespoke solutions. Because the 2022 OSTP memo “Ensuring Free, Immediate, and Equitable Access to Federally Funded Research” (PDF) directs federal funding agencies to recommend or require that researchers use existing and appropriate online repositories, we focus this investigation on the prices (or fees; we use these terms interchangeably) researchers are charged to deposit data in repositories.
The clearest information we have found on prices charged for data deposit is the Generalist Repository Comparison Chart (2020), which includes “costs to the researcher” information for the seven general-purpose repositories it covers. There is additional work on sustainability, business models and revenue generation for data repositories, for example the OECD’s Business models for sustainable research data repositories (2017), which describes the range of approaches currently taken to generate revenue. These approaches may or may not include deposit fees.
Our research so far
To gain a wider view of fees charged to researchers for data deposit, we turned to re3data, a global registry of research data repositories. Version 4.0 of the re3data repository metadata schema (not yet fully implemented in the registry) will allow the specification of a range of applicable repository types, including disciplinary, multidisciplinary, governmental, project-related, and other (defined as “neither institutional nor disciplinary”). Most of these types have yet to be applied to the current repository descriptions; the assigned types we encountered were institutional, disciplinary, and other. We explored fees and other upload (deposit) restrictions across these three types.
We queried the re3data registry on 6 November, 2023, selecting the repository provider type “data provider” in order to exclude metadata-only repositories from consideration. We assume that project- or program-specific repositories are included in either the category “other” or “disciplinary.” We then recorded the upload restriction conditions (fee or membership requirements) for each repository type (institutional, disciplinary, and other), as well as across all repository types.
Deposit fees appear to be uncommon across all repository types (Table 1). Just 23 of 2900 data provider repository entries (repositories hosting data), or about 0.8%, indicate a fee for deposit. Examples include the Archaeology Data Service, Bitbucket, and protocols.io. We note that some services offer free deposit for smaller datasets (size limits vary) and charge only for deposits exceeding a specified limit, while others charge researchers with grant funding or other resources but still provide a basic, free option for deposit. This creates some uncertainty as to how repositories with tiered service models are represented in the registry.
Table 1. Upload restrictions by repository type from the re3data repository registry (re3data.org, queried 6 November 2023).
* A repository may appear in more than one of the repository type and upload restriction categories. Totals represent the result count for a given set of conditions for all repository types, and not the sum of each repository type.
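The counting rule in the note above can be illustrated with a minimal Python sketch. The records below are hypothetical and are not drawn from re3data; the field names (`types`, `restrictions`) are assumptions made for illustration and do not reflect the registry's actual schema or API.

```python
# Hypothetical records: each repository may carry several types and
# several upload restrictions. None of these are real re3data entries.
repositories = [
    {"name": "repo-a", "types": {"institutional"}, "restrictions": {"membership"}},
    {"name": "repo-b", "types": {"disciplinary", "institutional"}, "restrictions": {"fee"}},
    {"name": "repo-c", "types": {"disciplinary", "other"}, "restrictions": {"fee", "membership"}},
    {"name": "repo-d", "types": {"other"}, "restrictions": set()},
]


def count_matching(records, restriction, repo_type=None):
    """Count distinct repositories with the given restriction, optionally within one type."""
    return sum(
        1
        for record in records
        if restriction in record["restrictions"]
        and (repo_type is None or repo_type in record["types"])
    )


repo_types = ["institutional", "disciplinary", "other"]

# One count per repository type, as in the per-type columns of a table.
fee_by_type = {t: count_matching(repositories, "fee", t) for t in repo_types}

# One count across all types: each repository is counted once.
fee_total = count_matching(repositories, "fee")

print(fee_by_type)
print("sum of per-type counts:", sum(fee_by_type.values()))
print("distinct repositories with a fee:", fee_total)
```

In this toy data the per-type counts sum to 4 while only 2 distinct repositories charge a fee, because repositories with more than one assigned type are counted once under each type. This is why a total reported as its own query result can be smaller than the sum of the per-type figures.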
A requirement for membership or institutional affiliation as an upload restriction condition is somewhat more common than deposit fees, particularly for institutional and disciplinary repositories. Membership can take multiple forms: affiliation with an organization (faculty, staff or student status at a university, for example) may be required for deposit privileges to an organization’s institutional repository, and fees may or may not be charged. Membership can also refer to a repository’s paid membership model for institutions, whose members are allowed to deposit to that repository, although it seems unlikely these fees would be passed on to a member institution’s researchers.
What can we conclude from this exercise? First, we note the possibility that repository entries are incomplete. A 2017 analysis of re3data metadata made this point, but also noted that entries are subject to a comprehensive review by the re3data editorial team. Even with careful and comprehensive review, fee information can be difficult to determine from inspecting a repository’s website. Second, the variation and complexity in direct fee models appear to be difficult to represent with the current metadata schema. It is possible, even likely, that more repositories charge fees for deposit than the data indicate. With these caveats, we cannot say with precision how many repositories in the registry charge fees to depositors, but we are reasonably certain that the number of repositories charging deposit fees is low relative to the number that do not.
Next steps
This investigation represents just one part of a forthcoming white paper and literature review synthesizing what is currently known about the costs and pricing of services that may be used to satisfy forthcoming requirements for providing public access to US federally funded research. The IOI research team is also gearing up for a series of focus groups and interviews with consortia, research institutions, repositories, and professional societies in 2024-2025, to develop an understanding of their concerns, priorities and processes for meeting expanded public access requirements.
Stay up-to-date with the latest from this project, including opportunities to get involved in the research, by subscribing to our newsletter.
Acknowledgements: We thank Michael Witt for his thoughts on an earlier version of this post, and re3data for additional information that we've incorporated into a minor revision of the original post on 12 December, 2023.