The following text is copyright 1997 by Network World. Permission is hereby given for reproduction, as long as attribution is given and this notice is included.

If we could only get rid of the people

The right and left columns on the front page of The New York Times on July 18, 1997 are representative of an all-too-common type of problem in technology these days. This problem is getting a lot of press, but most of the stories seem to miss an important detail.

July 17th was not a good day in the human fallibility department. One of the astronauts aboard the Mir space station pulled the wrong cable, disabling the guidance system which was keeping the solar cells pointed at the sun. A technician at Network Solutions ignored warning messages from the automated systems and distributed corrupted master data files for the Internet domain name system. Backhoe operators in at least two parts of the country cut through major fiber cables carrying large amounts of Internet traffic. But if no one had told you about them, how much would you have known about these problems? Would you have even known that they happened?

The scale of the directly impacted population was quite small in the case of the error onboard the Mir, so few people would have known of the problem if the information had not been propagated through the news media. On the other hand, the Times article on the Internet disruptions described a very large impact of that error in judgment: "countless thousands or even millions of E-mail messages had been returned as undeliverable, while untold numbers of users had been unable to make contact with various World Wide Web sites." This seems like a biggie, but in my case I only had one message returned out of the dozens I sent that morning. I am sure that the operators of large mailing lists and Internet spammers were heavily affected, but I'm not so sure that most Internet users even noticed that anything was wrong. In many cases this was because their local computers or their local sites had already saved a temporary copy of the correct information. I do suppose that in other cases the behavior fit the already assumed Internet reliability model and would not have been noticed. In any case I would agree that quite a few people were actually affected by this error, nowhere near Bob Metcalfe's "gigalapse" predictions, but a large enough number to be a real issue.

But how about the fiber cuts? It seems that fibertropic backhoes are quite the rage these days. Hardly a day goes by without some note floating by of another cable being cut. Since the extent of the telephone network infrastructure is currently so much larger than the data infrastructure, most of these cable cuts affect mostly the phone network, but many do disrupt data connectivity. If this is the case, why don't we see an almost constant inability to reach large parts of the Internet? Because all major and most other ISPs architect their networks to be redundant. A failure in a single link in an ISP backbone may cause congestion on other links but generally does not cause unreachability.
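For readers who like to see the redundancy argument worked through, here is a minimal sketch in Python. It is an illustration only: the four-city topology, the city codes, and the function names are all made up for this example and do not describe any real ISP's backbone. It checks whether every node stays reachable after any one link is cut, and it says nothing about the congestion the surviving links may suffer.

```python
from collections import deque


def reachable(links, start):
    """Return the set of nodes reachable from start over undirected links."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def survives_any_single_cut(links):
    """True if all nodes remain mutually reachable after any one link is cut."""
    nodes = {n for link in links for n in link}
    start = next(iter(nodes))
    for cut in links:
        remaining = [link for link in links if link != cut]
        if reachable(remaining, start) != nodes:
            return False
    return True


# Hypothetical topologies (assumed for illustration, not real networks).
RING = [("BOS", "NYC"), ("NYC", "CHI"), ("CHI", "SFO"), ("SFO", "BOS")]
CHAIN = RING[:-1]  # same cities, but without the link that closes the loop

if __name__ == "__main__":
    print("ring survives any single cut: ", survives_any_single_cut(RING))
    print("chain survives any single cut:", survives_any_single_cut(CHAIN))
```

The ring keeps everyone connected because traffic can always go around the other way; the chain does not, since any one cut splits it in two.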

Architecture does not always win over a high klutz factor, but it is a testament to the Internet architecture that most people would not have known of these front-page problems if they had not been on the front page.

disclaimer: Mercy be, no klutzes here at Harvard, so the above must be my meanderings.