The following text is copyright 1999 by Network World. Permission is hereby given for reproduction, as long as attribution is given and this notice is included.

How big is the world?

by Scott Bradner

An undergraduate student told me last year that "if it was not on the web then it did not exist." The "it" she was talking about was research materials. Aside from what a statement like that implies at a research university such as Harvard, with the breathtaking variety of resources available in its libraries and museums, she had a very important point. Most people, and most students are people, are beginning to act as if the web is the only real data source in the world. This is more than a bit troubling on many fronts.

The web is now big enough to pass, at first glance, for a surrogate for the world. OCLC (the Online Computer Library Center, www.oclc.org) recently published the results of its latest annual set of statistics on the web (http://www.oclc.org/oclc/research/projects/webstats/). They project that there are some 3.6 million web sites (+/- 3% fudge factor) with 288 million web pages (+/- 35%). Only 42 thousand of those sites (+/- 30%) are ones they classify as adult sites. Those 1.2% adult sites sure do raise a political ruckus far in excess of their numbers. OCLC has a quite good methodology, which is well explained in a document reachable from their site, so their numbers can be trusted as a first approximation.
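For readers who want to check the arithmetic, here is a minimal sketch in Python that works only from the figures quoted above. The variable names and the bounds helper are my own, and treating each "+/-" as a simple symmetric range around the estimate is an assumption on my part, not a description of how OCLC derives its margins.

```python
# Figures as quoted in this column from the OCLC projections.
sites = 3_600_000        # web sites, +/- 3%
pages = 288_000_000      # web pages, +/- 35%
adult_sites = 42_000     # sites classified as adult, +/- 30%


def bounds(estimate, pct):
    """Return the (low, high) range for an estimate with a +/- pct margin.

    Assumption: the margin is a simple symmetric percentage of the estimate.
    """
    delta = estimate * pct / 100
    return estimate - delta, estimate + delta


adult_share = adult_sites / sites * 100   # adult sites as a percent of all sites
pages_per_site = pages / sites            # average number of pages per site

adult_low, adult_high = bounds(adult_sites, 30)
pages_low, pages_high = bounds(pages, 35)

print(f"adult share of sites: {adult_share:.1f}%")
print(f"average pages per site: {pages_per_site:.0f}")
print(f"adult sites range: {adult_low:,.0f} to {adult_high:,.0f}")
print(f"pages range: {pages_low:,.0f} to {pages_high:,.0f}")
```

Run as written, this puts the adult share at about 1.2% of sites and the average at 80 pages per site. Under the symmetric-range assumption, the page count could sit anywhere from roughly 187 million to 389 million.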

There is clearly a lot of stuff out there. But what are the characteristics of what is there and what is not there?

One of the biggest problems with the 'Net is knowing the qualifications of someone creating and posting information. A particular document could have come from a future Nobel prize winner writing in his field or it could have come from a demented teenager spewing out her fantasies. Unquestioning reliance on what you read on the 'Net is just as productive as unquestioning reliance on what you read in a supermarket checkout line.

Another significant problem with using the 'Net as a primary or only source of information is that it is woefully incomplete. Very little current information is actually on-line. Some areas are far better represented than others, with the national newspapers and some areas of scientific research leading the way. But there is a real dearth of material from most areas. Largely this is because most people like to get paid for their labors. The web is currently mostly no-cost access to information. People with valuable content, such as most printed books, tend not to put it up lest they reduce sales of the books. Out-of-print books might seem a good target for 'Net-based access, but copyright laws get in the way.

There is a lot that you are missing if your world is just the web.

disclaimer: With 200K or so alumni, Harvard's world is the world, but the above warning is my own.