Today, the Chronicle of Higher Education reported that in the coming weeks JSTOR will make a subset of its archive of academic journals available to anyone who registers for a free account. This is, generally speaking, a good thing. However, as Alexis Madrigal of the Atlantic points out, details from the Chronicle’s article suggest that this is, at best, a very small victory for open access. He writes:

JSTOR told the Chronicle that each and every year, they turn away 150 million attempts to gain access to articles. That’s right. 150 million attempts!

The way I see it, that’s 150 million chances lost to improve the quality of the Internet. JSTOR, as the keeper of so much great scholarly work, should be one of the Internet’s dominant suppliers of facts and serious research. But if something is not publicly available, key gatekeepers like journalists and Wikipedians, move to the best available source, even if they know that there probably is a better source behind JSTOR’s paywall. So, instead, JSTOR’s vast troves of valuable information remain within academia and the broader Internet’s immune system is that much weaker.

Madrigal makes an important point. Search engines like Google now regularly return links to academic articles as part of search results. For most people[1], following a journal link leads to a page informing them that they don’t have access to that article. Want to experience it for yourself? Just follow this link to an article on Open access and academic journal quality … #irony.

Counting failed access attempts is only one way to wrap our heads around the problem. I decided to look into how many articles are firewalled at JSTOR. To do this, I ran a number of queries on the JSTOR search engine[2] using variations on the string (cty:(journal) AND ty:(fla)) AND (year:[1 TO 2012]). The results are rather sobering. Here’s the top line:

Publicly available articles on JSTOR as of 1/13/2011

Category | Total articles | Publicly accessible articles | Publicly available as a % of total
All articles on JSTOR | 3,816,066 | 272,475 | 7.14%
Texts out of copyright (beginning-1922) | 533,282 | 264,384 | 49.58%
Texts published since 1922 | 3,172,269 | 8,085 | 0.25%
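For anyone who wants to check the percentage column, here is a minimal Python sketch that recomputes it from the counts in the table. The names COUNTS and public_share are my own for illustration; the numbers are copied from the rows above.

```python
# Counts copied from the table above: (total articles, publicly accessible articles).
COUNTS = {
    "All articles on JSTOR": (3_816_066, 272_475),
    "Texts out of copyright (beginning-1922)": (533_282, 264_384),
    "Texts published since 1922": (3_172_269, 8_085),
}


def public_share(total: int, public: int) -> float:
    """Return publicly accessible articles as a percentage of the total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * public / total


if __name__ == "__main__":
    for label, (total, public) in COUNTS.items():
        # Two decimal places, matching the precision used in the table.
        print(f"{label}: {public_share(total, public):.2f}%")
```

Run as a script, it prints one line per table row with the share rounded to two decimal places.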

Currently, only 7% of all of JSTOR’s content is freely available.

Worse yet, only half of articles that have entered the public domain are publicly available via JSTOR!

None of the 2,465,468 articles published between 1923 and 1996 are publicly available.

In 1997 the first open journals began to publish. However, only 8,085 — less than 1% — of the 829,330 JSTOR articles published after 1997 are publicly available.
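Each of these figures came from a variation on the search string quoted earlier, with only the year bounds changed. As a rough sketch, the variations could be generated in Python like this. The function name year_query and the RANGES dictionary are hypothetical, and the ranges are my reading of the figures in this post rather than a log of the exact queries that were run. The sketch only builds the strings; it does not submit them to JSTOR.

```python
def year_query(first_year: int, last_year: int) -> str:
    """Build a search string for journal articles in an inclusive year range.

    The template is the string quoted in the post; only the year bounds vary.
    """
    if first_year > last_year:
        raise ValueError("first_year must not exceed last_year")
    return (
        "(cty:(journal) AND ty:(fla)) "
        f"AND (year:[{first_year} TO {last_year}])"
    )


# Hypothetical ranges inferred from the figures in the post.
RANGES = {
    "all years": (1, 2012),
    "out of copyright": (1, 1922),
    "1923 to 1996": (1923, 1996),
}

if __name__ == "__main__":
    for label, (start, end) in RANGES.items():
        print(f"{label}: {year_query(start, end)}")
```

Pasting each generated string into the search box by hand is still brute force, but it keeps the year bounds consistent from one query to the next.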

These are big numbers.

In theory, the point of publishing is to disseminate research for the development of knowledge. Further, many of those 3 million articles were built on data collected through publicly funded research. I have a hard time seeing how we can say the public is getting a solid return on its research investment when it still doesn’t have open access to research it helped fund over fifty years ago.

As an academic of sorts, I appreciate the need to protect the work of research. But I cannot buy into the idea that copyright is the right way to protect that work (especially when the one who benefits in the long term is the archive as opposed to the scholar). Imagine an alternative scenario in which academic publications were handled more like patents, which enter the public domain after 20 years for the good of society. JSTOR currently holds approximately 2,567,820 articles that would, under patent laws, have entered the public domain, versus the 533,282 that have currently passed out of copyright.[3]
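The arithmetic behind the patent comparison is simple enough to sketch in Python. The constant and function names are mine, and I am assuming that the only publicly accessible articles old enough to qualify are the 264,384 out-of-copyright ones from the table, which fits with none of the 1923 to 1996 articles being public.

```python
# Figures taken from this post.
PATENT_TERM_ELIGIBLE = 2_567_820    # articles old enough under a 20-year term
OUT_OF_COPYRIGHT = 533_282          # articles currently out of copyright
PUBLIC_OUT_OF_COPYRIGHT = 264_384   # out-of-copyright articles already public


def still_firewalled(eligible: int, already_public: int) -> int:
    """Articles that would be public under the scenario but are paywalled today."""
    return eligible - already_public


def expansion_factor(eligible: int, out_of_copyright: int) -> float:
    """How many times larger the public domain would be under a 20-year term."""
    return eligible / out_of_copyright


if __name__ == "__main__":
    gap = still_firewalled(PATENT_TERM_ELIGIBLE, PUBLIC_OUT_OF_COPYRIGHT)
    factor = expansion_factor(PATENT_TERM_ELIGIBLE, OUT_OF_COPYRIGHT)
    print(f"Firewalled but eligible under a 20-year term: {gap:,}")
    print(f"Public domain would be about {factor:.1f}x its current size")
```

The first figure is the same subtraction that produces the number in footnote 3.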

All of this speaks to Madrigal’s point. This massive amount of information is only available to those of us who are lucky enough to be in institutions that are willing to pay for it.

Admittedly, as I understand things, JSTOR has no legal obligation to provide free access to any of this content. And the price of access for back articles is often set by the journals, or rather their publishers. However, moral obligations are entirely different.

To their credit, in September of 2011 JSTOR began the process of opening up access to all their content that has entered into the public domain. Approximately 50% of it is currently available, but that still leaves half of it behind firewalls.

Hopefully, JSTOR’s new program will greatly improve public access. However, given the fact that there are over three million articles that currently remain beyond the reach of the public (and many scholars), it’s going to take a lot to make a real dent.

BTW, I’ve made all of the data I collected available via Google Docs. Please feel free to use it as you’d like. If you do something cool with it, let me know.

  1. This includes academics, as there are far more journals available than even the most affluent research institutions can afford to subscribe to.
  2. I had to brute force this. I’d love it if someone could point me to an example of Python code to do the same sort of thing.
  3. Currently 2,303,436 of those articles are firewalled.