Having read Ross Mounce’s interesting article on wrongly paywalled Open Access articles in hybrid Elsevier journals, I came up with the idea of throwing a bit more data at the issue. We have a lot of those articles in our OpenAPC data collection (2630 articles in 670 different journals, to be precise), and since the contributing institutions reported their APC costs for all of them, we can be pretty confident that they were all paid for and none should be paywalled at the moment.
So, after fiddling around with some regular expressions and the structure of the landing pages at sciencedirect.com, I came up with a small python script which does the following:
- Extract all hybrid Elsevier articles from our collection
- Resolve the DOI for every article (will lead to a landing page at sciencedirect.com)
- Scrape the page content
- Use regexes to search for a PDF download link
Resolving and obtaining the page contents is quite time consuming, so the whole process took some hours. Here is a screenshot of the script’s output while being busy:
All green at the University of Bristol - but Elsevier is not off the hook yet. :-) The script collects all negative results and prints out a concise summary after finishing. So let’s have a look at the bottom line:
Surprise - 6 articles out of 2630 (or about 0.23%) were still hidden behind a paywall where it should not be the case according to our data.
Here is a detailed overview of the articles in question:
|institution||period||APCs paid (€)||DOI||journal|
|University College London (confirmed case)||2014||2257.64||10.1378/chest.13-0179||Chest|
|University of Glasgow||2014||1882.5||10.1016/j.renene.2014.11.024||Renewable Energy|
|University of Milan||2016||400||10.1016/j.puhe.2016.10.024||Public Health|
To be fair, while we can clearly tell that something is wrong here, it’s too early to lay the blame on Elsevier yet. Five of the items in question were part of aggregated data sets (The first one is from the Wellcome Trust open data, the next 4 articles from British universities were contained in the JISC collections data), so it is quite possible that something went wrong during the aggregation process. The article from Milan University has only been published this month, so maybe the transaction has not been fully processed yet and the full text will become available in the near future.
Nonetheless, this issue seems worth to investigate further. We will try to contact the involved institutions and see if we can shed some light upon this.
Update, Mar 1, 2017:
We have now cleared up 4 of the 6 cases:
- Number 1 turned out to be a false positive: Instead of the article DOI, a DOI belonging to the article abstract had been reported. The article itself is not paywalled. (Thanks to Ross Mounce for pointing this out)
- Number 2 remains somewhat unclear, but the UCL is positive that no APC has been paid for this article and there’s also no UCL affiliated author on it. In any case the data is clearly incorrect and we have removed this entry from our collection.
- Number 5 is also a false positive: Cambridge had erroneously reported a page/colour charge as APC (Again, thanks to Ross for looking into this).
We have corrected all of the entries mentioned above in our data collection.
- Number 3, however, turned out to be a true positive. The institution had paid an APC for this article in March 2014, so it is clearly incorrectly paywalled. The UCL will contact Elsevier and ask them to correct this case.
As a side note, according to UCL’s records the original payment was not payable to Elsevier, but to the ACCP, and it seems the journal has moved to Elsevier since then. This is interesting as it identifies changes in journal ownership as another possible source of such mistakes.