As we’ve described in our overview post on our funding open infrastructure investigation, we’ve been working to better understand the funding for open technologies and other systems in research and scholarship. To do this, we’ve examined the funding and financial performance data for these organizations, with a particular emphasis on the 10 projects of our initial exploration of the open infrastructure landscape.

In doing so, we evaluated a variety of data sources, though we confined our work to English-language sources that were accessible online. These sources were primarily in the US, Canada, and Europe (though we discuss an available data source in Australia below). We are working to expand our understanding of funding sources outside this limited area. We are aware of funding sources in Latin and South America through universities, local and national government agencies, and international non-governmental organizations, but have as yet been unable to find reliable information on the funding these organizations have provided to open infrastructure services. We expect there are similar funding sources in the Middle East, Asia, Africa, and Oceania and look forward to feedback from the community on the important sources of funding in these regions we should be investigating as part of this work.

For transparency and as a guide to others interested in this analysis, we’re now discussing the data sources we’re aware of and our evaluation of the relative value each source contributes to our understanding of how services and those who provide them are funded in this space. For a better understanding of the terminology we use in this analysis, please take a moment to review our post describing these terms and how we define them for our purposes.

Funder grant databases

In our last post, we discussed that many (but not all) large philanthropic organizations disclose their grant funding on their websites. This typically includes the following information:

  • the grant recipient
  • the grant amount
  • the date of the grant
  • the duration of the grant
  • a description of the project being funded

Most apply some high-level categorization to the grant, placing it in a granter-specific funding category, initiative, program, or portfolio. These categories not only vary between granting organizations but can change from one funding cycle to the next as the granting organizations shift their priorities, interests, or personnel.

As we’ve previously discussed, funding organizations develop their own standards for disclosure. These can vary between sources, with some smaller granting organizations and other non-philanthropic organizations choosing not to disclose their grants or other support in any public form at all. Additionally, while some granting organizations make their data available for download in a spreadsheet format, most do not make this data easily downloadable from their website in any format other than raw HTML, and only one funder we’ve found makes this data available via a publicly accessible application programming interface (API).

This makes accessing this data a highly manual process of searching and transcribing the available information into spreadsheets or other records for analysis. While some web pages with grant data can be “scraped” using computer code, some pages are structured to make this activity difficult (if not impossible) and others use the robots.txt protocol to indicate their websites shouldn’t be scraped.
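As a minimal sketch of how a robots.txt check might work before any scraping is attempted, the example below uses Python's standard `urllib.robotparser` module. The robots.txt content, the `funder.example` URLs, and the `may_scrape` helper are hypothetical and for illustration only; they don't describe any particular funder's site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. A real file would be retrieved from the
# site's /robots.txt path before any pages are requested.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /grants/
"""


def may_scrape(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://funder.example/grants/2020",  # hypothetical grant listing
        "https://funder.example/about",        # hypothetical general page
    ):
        print(url, may_scrape(SAMPLE_ROBOTS_TXT, url))
```

A check like this only reports what the site operator has asked for; it doesn't help with pages whose structure makes them difficult to scrape in the first place.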

While there are initiatives to centralize this information from funders and make the data more accessible, such as Crossref’s Funder Registry, the Curtin Open Knowledge Initiative (COKI), Europe PMC’s Grant Finder, and 360Giving’s GrantNav, these services (particularly Funder Registry, COKI, and Grant Finder) are focused on the funding of published research rather than operational grants to service providers that don’t directly lead to publication. 360Giving has funding for service providers and includes additional information not available from some funder websites but is limited to funding from UK-based funding organizations.

Provider and service websites

Most providers disclose the names of their major funders on their websites but provide few, if any, additional details about when they received the money or what amounts they received. A few disclose details about their funding, including the amount, date, and a description of the intended use of the funding, but this is rare. While these disclosures may link to the funder’s main website, few link directly to the specific grant information on the funder site (likely due to the challenge of maintaining links that change as funders revise their sites) and there are discrepancies between what is reported by the granting organization and the provider.

Some of these discrepancies come from differences between the funder and the provider in how funding is accounted for. While a funder will typically indicate they awarded an amount to a recipient on a particular date for a particular duration (36 months, for example), providers may choose to display this funding in the fiscal year it was received, so a provider may display as multiple grants what the funder describes in its grants database as a single grant. In the case of a 36-month grant, this may be displayed by the provider (when they publicly display this information) as 3 or even 4 payments from the funder depending on when the amounts were disbursed over those 36 months.
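To illustrate why a single 36-month grant can appear as 3 or 4 entries, here is a rough sketch that counts the fiscal years a grant period touches. The function names, the sample dates, and the convention of naming a fiscal year for the calendar year in which it ends are all assumptions made for this example. Actual disbursement schedules vary, so this models only the span of the grant, not when money was paid.

```python
from datetime import date


def fiscal_year(day: date, fy_start_month: int = 1) -> int:
    """Label a date with its fiscal year.

    Assumption: a fiscal year is named for the calendar year in which it ends.
    """
    if fy_start_month == 1:
        return day.year
    return day.year + 1 if day.month >= fy_start_month else day.year


def fiscal_years_spanned(
    start: date, duration_months: int, fy_start_month: int = 1
) -> list[int]:
    """List each fiscal year touched by a grant, checking one month at a time."""
    years: list[int] = []
    for offset in range(duration_months):
        # Count months from year zero so the year rolls over correctly.
        months_total = start.year * 12 + (start.month - 1) + offset
        year, month_index = divmod(months_total, 12)
        fy = fiscal_year(date(year, month_index + 1, 1), fy_start_month)
        if fy not in years:
            years.append(fy)
    return years


if __name__ == "__main__":
    # A 36-month grant starting in January touches 3 calendar-year fiscal years.
    print(fiscal_years_spanned(date(2019, 1, 1), 36))
    # The same grant starting in July touches 4.
    print(fiscal_years_spanned(date(2019, 7, 1), 36))
```

The same grant record from a funder could therefore reasonably map to a different number of provider-side entries depending on nothing more than the start date and the provider's fiscal calendar.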

While the amounts reported by the funder and recipient are often the same, there are noticeable instances where the amount specified by the funder and the amount reported by the provider are different, even when taking into account the different ways the money can be tracked (lump sum or booked as disbursed). Nothing in our research shows this discrepancy is intentional; it is most likely due to simple accounting errors, but it has the effect of distorting the actual amount of funding providers receive from funders. This may indicate some organizations are in need of additional assistance to manage and accurately report their finances.

In addition to directly listing funders on their website, some organizations publish annual reports that summarize the organization’s finances for the preceding year. These reports may outline key grants and other sources of funding, but often don’t provide details on the specific grant funding provided and may not disclose this information at all. These are also often incomplete as organizations decide which, if any, sources of funding to highlight in these reports. Many smaller providers don’t publish annual reports or choose to disclose only minimal financial information in the reports they release.

Similar to annual reports, funders and providers will sometimes publicize their funding in press releases, blog posts, or other information releases. These will sometimes disclose amounts and funding purposes but rarely include sufficient detail for an accurate accounting of funding received by a provider.

As with funder websites, the funding information on provider websites described above often must be manually transcribed or scraped from the raw HTML in order to be compiled and analyzed.

IRS 990

For those providers incorporated as tax-exempt non-profit organizations in the US, there is a requirement to report their financial information annually using the Internal Revenue Service (IRS) Form 990. Smaller organizations can use a consolidated Form 990-EZ that has fewer discrete categories and allows providers to summarize expenses, income, assets, and liabilities they would otherwise have to report in more detail on the Form 990. Private foundations file this information using the Form 990-PF and are asked to disclose their grants to the IRS along with their financial information, but with far less detail than is usually provided on their website.

Those organizations receiving USD $1,000 or more in unrelated business income are also required to file a Form 990-T and, despite being tax exempt, may owe business tax on income not related to their exempt purpose. In addition to the financial disclosures, all 990 filers are asked to disclose various business and governance practices in their annual filings, as well as identify directors, senior leaders, and other key employees and their salaries.

All of this information for Form 990, 990-EZ, 990-PF, 990-T, and other non-profit tax forms filed after 2011 is available in electronic form from the IRS. While the data is digital and provided in a machine-readable XML format, the data model is challenging to parse for analytical purposes and changes each tax year with changes in the applicable tax code. In addition, there are blocks of unstructured text in the submissions that require manual processing to extract key pieces of information about expenses, governance, and other details of business operations described in these sections.

A variety of community-sponsored tools, such as the Open990 catalog, the ProPublica Nonprofit Explorer, and additional tools listed here, have been developed for parsing, searching, filtering, and downloading various extracts of the information for use by researchers, journalists, and other interested parties, including a harmonized data model for comparing tax information across years, but these tools are still in development and are at varying levels of sophistication and usability. They are particularly challenging for users not familiar with processing tax information or working with machine-readable data formats.

Required audits

For organizations expending $750,000 or more of funds provided by the US government in a fiscal year, there is an additional requirement to provide copies of an audited financial statement, which is then made available through the Federal Audit Clearinghouse. These are PDFs of the audit report and exist as unstructured text, but describe and evaluate the finances and accounting practices of the audited organization. Despite being digital, as with most PDF documents, they are a challenge to scrape and parse into a machine-readable format for analysis. Most (but not all) of the providers we examined don’t fall under this requirement, but this is a useful source of financial information, as well as an additional cost for providers that comes with greater support from government funders in the US.

Other collateral sources

In addition to the sources of information outlined above, researchers involved with the services and providers often disclose funding on their curriculum vitae (CV), naming the funder, amount, dates (often only as the year), and a brief description of the funding. This is self-reported information that can be helpful in identifying the purpose of grant funding received for evaluation purposes but can also conflict with amounts, timeframes, and purposes described by funders. It’s also often not clear whether the grant listed on a researcher’s CV went to a particular service in whole or only in part, with the remainder covering the researcher’s other projects, initiatives, or general expenses. As with the other datasets described, this must be manually transcribed for use in analysis.

Summary

While there is a great deal of funding devoted to this space and a good deal of data available on where that funding goes, the data is fractured among the funders and inconsistent between funders and providers, making it difficult to access. Gathering it requires manual effort to find, transcribe, and organize the data for analysis across funders and providers. Even when the data is available, it is, in some cases, unreliable or, at the very least, can’t be taken at face value without additional cross-referencing and validation. This is the work we’ve done over the past several months, and we will present our findings from collecting and analyzing this data in future blog posts.

Posted by Richard Dunks