This story appeared on Network World at


The future library: A 50-petabyte iPod?


'Net Insider 


By Scott Bradner, Network World, 05/29/06


I started playing with digitized literature almost 25 years ago. A lot has changed in the digital books biz since then.


Some of the history, current status, future possibilities and clashing business models in this area were recently explored in a cover "manifesto" in The New York Times Magazine by Wired writer Kevin Kelly. Spoiler: It will all come out fine in the end, but the length of time you will have to wait depends on when Congress stops moving the copyright goal posts.


In the summer of 1982, a classics graduate student working in the computer lab I ran in the Harvard psychology department got a copy of the Thesaurus Linguae Graecae, a large batch of classical Greek literature that had been typed into computers someplace outside the United States, with HP co-founder David Packard paying the bill. I, along with people in the Harvard Classics and English departments, convinced the university administration to pay for a huge - at the time - 300MB disk drive to store this text as well as a collection of Middle English literature.


Over the next few years the graduate student, Greg Crane, now a professor at Tufts University, put together the first version of what became the Perseus Project. This is a Weblike mixture of text and clickable links to other material, done many years before the Web and search engines showed up.


This well-indexed online text changed what sort of things would be reasonable Ph.D. dissertation topics. Before Crane's work, a student could arrive at a topic after years of index-card-based investigations of how specific words were used in classical Greek; after Crane's effort, that became a weekend task.


Kelly's Times Magazine story explores what happens in a future where you might have petabytes of digital material being attacked by cutting-edge search engines. Kelly estimates that a 50-petabyte disk farm could hold all the 32 million books, 750 million stories and essays, 25 million songs, 500 million images, 500,000 movies, TV shows and short films and 100 billion public Web pages.


Quite a bit of the material is already digitized, including new books, DVD movies and CD music. The story describes multiple projects under way to catch up on digitizing older books and discusses the legal and access issues caused by Congress' continual extension of the copyright period.


A few years ago in a column I quoted a student who told me "if it is not on the Web, then it does not exist." The same point was reinforced last week when I suggested that a graduate student see whether he could find some information on a particular topic in the library one floor down from my office. He admitted to having been in the library only once or twice, and to not having looked anything up while there.


Kelly paints a picture in which physical libraries might not be needed, other than for books published by companies whose lawyers are not ready to embrace a searchable digital world. In Kelly's future world, books are no longer individual items but parts of a vast relational database on steroids, where your biggest problems will be figuring out how to ask the question you want answered and figuring out what is left that could be a good dissertation topic. All in all, a very good read.


Disclaimer: If physical libraries fade away, Harvard is going to wind up with a lot of prime real estate that will be bitterly fought over, but I did not ask the view of the university library folk about The New York Times story, so the above is my own review.


All contents copyright 1995-2006 Network World, Inc.