The digital knowledge gap

I recently ran across an article from 2012 titled “The Missing 20th Century: How Copyright Protection Makes Books Vanish.”  It had a graph that immediately caught my eye.

The data was compiled from a sample of 2,500 books and presented in a talk called “Do Bad Things Happen When Works Fall Into the Public Domain?” by Professor Paul Heald.  You can access the talk on YouTube.  Heald is a law professor, but my interest is not legal.

I was at a meeting at the University of Maryland recently, and I took a Lyft driven by an undergrad.  We were talking about libraries, and he said how great they were for meeting up with people to study.  So I asked him, “When was the last time you used a library to get a physical book?”  And he said, “Maybe elementary school?”

This is a college student, someone who is actively learning and working on assignments.  He doesn’t use physical books from a library — EVER.  He may not be representative of all students, but it certainly gave me pause.

I did a little more poking around, wondering how often physical books are used in university settings, and read “National Survey Documents Effects of Internet Use on Libraries.”  It isn’t specifically about book use per se, but it had a revealing passage:

“Of the time that graduate and undergraduate students devote to looking for information used in research and course work, one-third is spent in campus libraries. When searching for and using information for research and teaching, faculty members, by contrast, spend only about 10 percent of their time in the library.”

In other words, many of the people an academic library serves don’t actually use the physical structure for learning, regardless of whether they use a physical book.  Another study found that only 9% of surveyed students reported using a library book during their periods of heaviest academic work.

This is what the 6th floor of the university library looked like during the final weeks of the semester  —  a lot of books, and no people.

Increasingly, students (and others) are turning to digital resources, and they don’t necessarily want to go to a physical library to get them.

The Internet Archive is trying to create a digital library that represents all human knowledge.  I analyzed 4 million of our books to find out whether we have the same gap in our digital collections that Heald observed in the books Amazon sells.


Internet Archive books are generally free to download if they were published before 1923, and have some level of restrictions if they are more recent (they may be borrowable, available only to sight disabled people, etc.).

Regardless of access levels, this library of 4 million books is missing a lot of content from the 20th century.  Most of these books are not in print any longer, and the publishers (if they still exist) have little incentive to make digital copies available.  And according to Heald’s small study, they may not be easy to buy – at least from Amazon.

These physical books do still exist in libraries around the country, but will those libraries make digital copies and make them available to the general public — not just their own students and patrons?

If we don’t address this digital knowledge gap, we may lose access to a century of information.


  1. Alexis, thank you for the wonderful article. You are raising an issue that is important not only for America but also beyond. There is a trend of decreasing library usage even within schools in developing countries, where Internet access and access to online academic resources are not yet the best. Maybe Google is making searching for academic resources too easy to trouble with library shelves!!!

