Invest in Open Infrastructure (IOI) is dedicated to improving funding and resourcing for open technologies and systems supporting research and scholarship. As outlined in our 2021-2024 Strategic Plan, we set as our first goal to “Increase our collective understanding of the funding and infrastructure landscape by conducting research.” To that end, we’ve been working to better understand the funding supporting open technologies and systems in research and scholarship. As we’ve stated previously in describing our initial research prototypes, the data on current investment in the sector, whether from funders and other supporters or from the recipients themselves, is disaggregated at best and often incomplete if not inaccessible.

Today, we’re starting a series of blog posts outlining our work to address this challenge by assessing the available data sources and aggregating this information for initial evaluation and preliminary analysis. There is a great deal of funding for important work researching, evaluating, training, and otherwise advancing the cause of openness, transparency, and equity in scientific research and scholarship. Our focus in this research (and as an organization), however, is on the tangible work of creating, developing, and maintaining the technologies and services that make open scholarship and research possible, particularly the ubiquitous and widely used tools without which the work of research and scholarship would be challenging if not impossible.

Our Starting Place

As we’ve previously discussed, not every tool used by researchers and scholars exists as infrastructure, and not all infrastructure is open. As noted in a previous post, we selected a small sample of 10 services to begin this analysis. With this small group, we’ve developed methods of evaluation and analysis we feel confident can be expanded to other services and their providers. We understand some in the community may be concerned about both the services and providers that were not considered and the ones that were. We look forward to engaging with the community on this question with the intention of crafting an approach that can ultimately be expanded to every open infrastructure service. Our hope is to elevate our shared understanding of infrastructure and openness as we continue this work together.

We entered into this funding data exploration with a few key assumptions about the challenges ahead, including:

  • Funding data from funders and recipients will likely be inconsistent, if not wholly missing
  • A wide variety of support will be provided, including monetary grants, institutional resources, in-kind services, and other forms of non-monetary support
  • Funding will flow between organizations by a variety of paths, including funder to service provider, funder to affiliated researcher/primary investigator (PI), and funder to intermediary organization to service provider, all without full documentation of the funding purpose or even the actual recipient

These assumptions were confirmed in our analysis, further revealing the complex nature of these funding relationships and the challenge of creating a complete and accurate accounting even for the relatively small number of organizations we looked at in detail.
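
To make the funding paths described above concrete, here is a minimal sketch of how such flows could be modeled. It is an illustration only, not a description of the tooling used in this research: the `FundingFlow` schema, the `trace_paths` helper, and all organization names and amounts are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FundingFlow:
    """One hop of support from a source to a recipient (hypothetical schema)."""
    source: str
    recipient: str
    amount_usd: Optional[float] = None  # None: undisclosed or non-monetary
    purpose: Optional[str] = None       # None: purpose not documented


def trace_paths(flows, funder, provider, _seen=()):
    """Return every chain of organizations leading from funder to provider."""
    if funder == provider:
        return [[provider]]
    visited = _seen + (funder,)
    paths = []
    for flow in flows:
        # Skip organizations already on the current chain to avoid cycles.
        if flow.source == funder and flow.recipient not in visited:
            for tail in trace_paths(flows, flow.recipient, provider, visited):
                paths.append([funder] + tail)
    return paths


# Made-up example: one direct grant and one route through an intermediary.
flows = [
    FundingFlow("Funder A", "Service X", 100_000.0, "operations"),
    FundingFlow("Funder A", "University B"),
    FundingFlow("University B", "Service X", purpose="hosting"),
]

paths = trace_paths(flows, "Funder A", "Service X")
undocumented = [f for f in flows if f.amount_usd is None or f.purpose is None]
```

In this made-up example, two of the three flows lack either an amount or a stated purpose, which is the kind of gap that makes a complete accounting difficult.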

Getting the Data

To get this data, we searched the websites of large funders, including private foundations and government agencies, as well as academic institutions and service providers for references to funding awarded or received, as well as costs and other financial information. We also accessed financial information from the US Internal Revenue Service (IRS) for those service providers incorporated in the US and qualified under Section 501(c)(3) or 501(c)(6) of the US Internal Revenue Code for tax-exempt status.

Whenever possible, we downloaded the available information; when that wasn’t possible, we recorded it in spreadsheets. We attempted to automate this process but, by necessity, much of the collection was done through manual review and collation. We stored large datasets in a Google Cloud BigQuery instance and visualized the data using several custom Google Data Studio dashboards. We look forward to further developing these tools so the data and visualizations can be made publicly available.
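
As a rough illustration of the collation step, the sketch below combines records gathered from different kinds of sources, normalizes funder names, and totals the amounts while counting records where no amount was available. The records, field names, and helper functions are hypothetical, and the name normalization is deliberately naive; this is not the pipeline used for the research.

```python
from collections import defaultdict

# Hypothetical records, as they might look when gathered from different sources.
records = [
    {"funder": "Foundation A", "recipient": "Service X",
     "amount": "250,000", "source": "funder website"},
    {"funder": "foundation a ", "recipient": "Service Y",
     "amount": "", "source": "recipient annual report"},
    {"funder": "Agency B", "recipient": "Service X",
     "amount": "1200000", "source": "grants database"},
]


def parse_amount(raw):
    """Convert an amount string to a float, or None when it is missing."""
    cleaned = raw.replace(",", "").replace("$", "").strip()
    return float(cleaned) if cleaned else None


def summarize(records):
    """Total known amounts per funder and count records with no amount."""
    totals = defaultdict(float)
    missing = defaultdict(int)
    for rec in records:
        # Naive normalization (an assumption): trim whitespace, title-case.
        funder = rec["funder"].strip().title()
        amount = parse_amount(rec["amount"])
        if amount is None:
            missing[funder] += 1
        else:
            totals[funder] += amount
    return dict(totals), dict(missing)


totals, missing = summarize(records)
```

Keeping a separate count of records without amounts, rather than treating them as zero, avoids understating support where the data is simply unavailable.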

Initial Insights

Based on this exploration, we identified 5 categories of funders in the space. We enumerate these categories below and intend to go into more detail about these in future posts:

  1. Large philanthropic organizations with large and diverse monetary investments in the field that demonstrate a sustained interest in developing open technologies and systems.
  2. Large philanthropic organizations with a few monetary investments in the field that demonstrate a more tangential interest in the space as part of other funding priorities.
  3. Government-sponsored funding bodies providing monetary support to open technologies and systems for research and scholarship as part of a broad portfolio of support for science in the public interest.
  4. Academic and research institutions that offer a variety of monetary and non-monetary support to providers, whether as hosting institutions, sponsors, or customers, in support of their research and education mandates.
  5. Other non-profit, commercial, and private organizations with an interest but not extensive involvement in the space, who offer monetary and non-monetary support (usually in-kind services) to providers.

From our research using data available from the US IRS on revenue, expenses, assets, and liabilities, we identified two broad models of primary revenue support:

  1. Contributions, grants, and gift revenue
  2. Program service revenue

In line with the scholarly literature on non-profit management evaluation that we have begun to explore, we have come to understand the importance of this distinction for assessing the financial health and resiliency of these organizations. Neither model is superior to the other: both sources of revenue, along with investment income, royalties, and other revenue sources, can be useful in meeting the obligations of service providers, and there is no one-size-fits-all model of sustainability in this space. However, in providing an enduring service as opposed to just delivering on a time-bound project, long-term stability is important to attracting both users and investments to the service. We will further explore this area of analysis in a future blog post.
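
One simple way to see which of the two models dominates for a given organization is to compute each revenue category's share of total revenue. The sketch below assumes the figures have already been extracted from a filing; the function names and the dollar amounts are hypothetical, and the label it produces is descriptive, not a judgment about which model is better.

```python
def revenue_shares(contributions, program_service, other=0.0):
    """Return each revenue category as a fraction of total revenue."""
    total = contributions + program_service + other
    if total <= 0:
        raise ValueError("total revenue must be positive")
    return {
        "contributions": contributions / total,
        "program_service": program_service / total,
        "other": other / total,
    }


def primary_model(shares):
    """Name the larger of the two primary revenue models.

    Ties resolve to the first category listed ("contributions").
    """
    return max(("contributions", "program_service"), key=lambda k: shares[k])


# Made-up figures for a single organization and year.
shares = revenue_shares(contributions=600_000, program_service=300_000,
                        other=100_000)
model = primary_model(shares)
```

Comparing these shares across several years, instead of a single filing, would give a better sense of how stable an organization's revenue mix is.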

Additionally, in the course of this research, we came across questions about organizational governance we believe are also worth exploring. In a community that values openness, transparency, and accountability, it seems important that organizations be as open, transparent, and accountable with their governance policies and practices as they are with their code and data. While some take this commitment seriously, we found some practices that we feel should be highlighted for further discussion among the community to elevate the conversation about how our shared values should be put into practice. We will also be discussing these practices in a future post to be released in the coming weeks.

What's to Come

In our future posts, we will cover the following topics:

  • Our definition of key terms in this analysis
  • A thorough discussion of the various available data sources, including both the opportunities and challenges associated with each source we accessed
  • A thorough discussion of our methodology in collecting funding and financial performance data
  • A fuller evaluation of the work we’ve done and some preliminary insights we’ve gained
  • Next steps in our analysis and further opportunities in this line of research

We look forward to sharing our findings with the larger community and gathering feedback in our continuing effort to shed light on key challenges as we work with decision makers to enact meaningful change.

Posted by Richard Dunks