Hacking article processing charges

Najko Jahn, Bielefeld University Library
9 - 11 June

Please introduce yourself!

What do you expect from the workshop?


The aim of this workshop is to gain a first understanding of how to collect, process and disseminate data on APCs paid as Open Data

Day 1 Considering available skills and tools

Day 2 Producing useful inputs for the library community

Day 3 Presentation of workshop results


Bielefeld University Library has been in charge of local funds to support publications in Open Access Journals since 2008.

The funds are supported by the DFG, so certain conditions have to be met, e.g.:

  • Fee per article may not exceed 2,000 €
  • Yearly reporting and application duties, including determining publication output and fees paid, and a work programme on how we manage APCs and complementary support
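
A condition such as the fee cap can be checked mechanically before reporting. The following is a minimal sketch, assuming each payment is a (DOI, fee in euro) pair; the function name, the record layout and the sample DOIs are hypothetical, and only the 2,000 € limit comes from the conditions above.

```python
FEE_CAP_EUR = 2000.0  # per-article limit stated in the funding conditions


def over_cap(payments, cap=FEE_CAP_EUR):
    """Return the payments whose fee exceeds the cap.

    `payments` is an iterable of (doi, fee_eur) pairs. This layout is an
    assumption for illustration, not the library's actual record format.
    """
    return [(doi, fee) for doi, fee in payments if fee > cap]


# Made-up sample data, not real payments.
sample = [("10.1234/a", 1190.0), ("10.1234/b", 2000.0), ("10.1234/c", 2150.5)]
flagged = over_cap(sample)
print(flagged)
```

A fee of exactly 2,000 € passes here, because the condition reads "may not exceed".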

Huge workload, and the resulting reports are only available to reviewers :-(

APCs spent by Bielefeld University Library (2012 - 2014)

What is the value of making this information openly available?

Discuss potential uses and applications!

Open Data Initiatives for APCs

OpenAPC - Aim

The aim of this repository is:

  • to release datasets on fees paid for Open Access journal articles by German universities under an Open Database License
  • to share a copy of the Directory of Open Access Journals (DOAJ) journal master list (downloaded January 2014)
  • to demonstrate how reporting on fee-based Open Access publishing can be made more transparent and reproducible across institutions


Information is provided on both open access journal articles and open access publication of articles in toll-access journals (“hybrid”).

In total, the participating universities paid 3,706,751 € for 3,029 articles. The average fee is 1,223.80 € and the median is 1,190 €.
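
The average can be re-derived from the two totals, whereas the median needs the individual fees. The following is a minimal sketch using Python's standard `statistics` module; `summarize_fees` and the toy fee list are hypothetical and only illustrate the calculation.

```python
from statistics import mean, median


def summarize_fees(fees_eur):
    """Return count, total, mean and median for a list of fees in euro."""
    fees = list(fees_eur)
    return {
        "n": len(fees),
        "total": sum(fees),
        "mean": mean(fees),
        "median": median(fees),
    }


# Consistency check using only the totals quoted above.
reported_mean = round(3706751 / 3029, 1)
print(reported_mean)

# Made-up fees, only to show the shape of the summary.
toy_fees = [800.0, 1190.0, 1500.0, 2000.0]
summary = summarize_fees(toy_fees)
print(summary)
```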


Source          Variable             Description
CrossRef        publisher            Title of Publisher
CrossRef        journal_full_title   Full Title of Journal
CrossRef        issn                 International Standard Serial Numbers (collapsed)
CrossRef        issn_print           ISSN print
CrossRef        issn_electronic      ISSN electronic
CrossRef        license_ref          License of the article
CrossRef        indexed_in_CrossRef  Is the article metadata registered with CrossRef? (logical)
EuropePMC       pmid                 PubMed ID
EuropePMC       pmcid                PubMed Central ID
Web of Science  ut                   Web of Science record ID
DOAJ            DOAJ                 Is the journal indexed in the DOAJ? (logical)
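
The variables above can be read as one record per article. The following is a minimal sketch of such a record as a Python dataclass. The field names follow the table. The class name, the types and the defaults are assumptions, and the sample values are made up.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class EnrichedRecord:
    """One article's enrichment metadata (types and defaults are assumed)."""
    # From CrossRef
    publisher: Optional[str] = None
    journal_full_title: Optional[str] = None
    issn: Optional[str] = None
    issn_print: Optional[str] = None
    issn_electronic: Optional[str] = None
    license_ref: Optional[str] = None
    indexed_in_CrossRef: bool = False
    # From Europe PMC
    pmid: Optional[str] = None
    pmcid: Optional[str] = None
    # From Web of Science
    ut: Optional[str] = None
    # From DOAJ
    DOAJ: bool = False


# Made-up values for illustration only.
record = EnrichedRecord(
    publisher="Example Press",
    journal_full_title="Journal of Examples",
    indexed_in_CrossRef=True,
)
field_names = [f.name for f in fields(EnrichedRecord)]
print(field_names)
```

Keeping every field optional reflects that a source may have no entry for a given article.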

Are there any other sources we can use to enrich and disambiguate metadata on fees paid?

List three complementary sources!


We were inspired by the growing landscape of Open Science tools, especially:

  1. Version control with Git and collaboration through GitHub
  2. Clients that provide access to bibliographic data (rOpenSci, LibreCat)
  3. Authoring tools such as knitr and R Markdown for automatic report generation

Ram, K. (2013). Git can facilitate greater reproducibility and increased transparency in science. Source Code Biol Med, 8(1), 7. doi:10.1186/1751-0473-8-7

What other tools are available to make use of APC data?

Introduce your favorite Open Data tool!

Preparing Day Two

Possible projects:

  • Provide a pre-filled spreadsheet for participating libraries
  • Explore interesting patterns in the data
  • Support semantic versioning of the data
  • Create dynamic reporting templates with RStudio

Please team up!

Day 3

Summarizing Day 1

What is the value of making this information openly available?

  • help negotiations with publishers
  • starting with an open model helps to keep it open
  • promoting library services
  • teaches Open Data to library staff
  • benchmarking across universities, funders and countries

Are there any other sources we can use to enrich and disambiguate metadata on fees paid?

  • Scimago
  • article-level metrics, e.g. usage
  • author IDs, e.g. ORCID; however, privacy issues may need to be resolved
  • links to full text
  • indexing in Google Scholar
  • Kudos

What other tools are available to make use of APC data?

  • Excel, Google Docs, OpenRefine
  • Email and shared file storage
  • LibreCat provides a list of ETL tools
  • Text editors, e.g. Sublime Text, Atom

Summarizing Day 2

Hands-On: adding APC information to the OpenAPC dataset.


  1. clean DOI list
  2. enrich with disambiguated metadata from CrossRef, Europe PMC and DOAJ
  3. merge with main spreadsheet
  4. automatic generation of report
  5. push to GitHub
  6. generate a Word document for the Head Librarian
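
Steps 1 and 3 above can be sketched in a few lines of Python. This is a minimal illustration, not the workflow used in the workshop. The function names, the DOI pattern, the list of prefixes and the `euro` column are assumptions. The first DOI is the Ram (2013) article cited earlier; the second DOI and the fee value are made up.

```python
import re

# Loose shape check for a DOI (an assumption, not an official validator).
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
PREFIXES = (
    "https://doi.org/", "http://doi.org/",
    "https://dx.doi.org/", "http://dx.doi.org/",
    "doi:",
)


def clean_doi(raw):
    """Normalize one DOI string; return None if it does not look like a DOI."""
    doi = raw.strip().lower()  # DOIs are case-insensitive
    for prefix in PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    return doi if DOI_PATTERN.match(doi) else None


def clean_doi_list(raw_dois):
    """Step 1: normalize, drop invalid entries, remove duplicates, keep order."""
    seen, cleaned = set(), []
    for raw in raw_dois:
        doi = clean_doi(raw)
        if doi and doi not in seen:
            seen.add(doi)
            cleaned.append(doi)
    return cleaned


def merge_rows(main_rows, new_rows, key="doi"):
    """Step 3: append rows whose DOI is not yet in the main spreadsheet."""
    known = {row[key] for row in main_rows}
    merged = list(main_rows)
    for row in new_rows:
        if row[key] not in known:
            known.add(row[key])
            merged.append(row)
    return merged


raw = [
    " doi:10.1186/1751-0473-8-7 ",
    "http://dx.doi.org/10.1186/1751-0473-8-7",
    "not a doi",
    "10.1234/ABC.1",
]
cleaned = clean_doi_list(raw)
main = [{"doi": "10.1186/1751-0473-8-7", "euro": 1190}]
merged = merge_rows(main, [{"doi": d, "euro": None} for d in cleaned])
print(cleaned)
print(len(merged))
```

Rows already present in the main spreadsheet win over new ones, so a repeated upload does not overwrite fee information that was entered earlier.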