The following text is copyright 2005 by Network World; permission is hereby given for reproduction, as long as attribution is given and this notice is included.
Refusal, ignorance, arrogance or PR?
By Scott Bradner
In mid-March the French news service Agence France Presse (AFP) sued Google in a U.S. District Court for copyright violations. AFP demanded that Google stop including AFP's material on the Google News site and asked for $17.5 million to compensate AFP for the damage that Google News had caused. You will pardon me if I express some doubts about the actual motivation for this lawsuit.
I've written in the past about Google News (http://www.nwfusion.com/columnists/2004/0308bradner.html). I consider it one of the most useful sites on the Internet. I use it to fill out the news snippets that I get from most other news sources. That said, I do get frustrated at Google News links to subscription-only sites, since I cannot access some of the stories that look interesting. I have always assumed that such sites welcome Google's pointers because the sites get free advertising for themselves and thus may get some additional customers. In that context the AFP suit makes me wonder what's up with them. Google News does not show full articles, so I find it hard to understand how the damage could mount up to over $17 million. Maybe AFP has a very high opinion of its ability to come up with inventive headlines and fears that other news organizations will rip them off if the headlines, which Google News does show, are visible. Or maybe the reason AFP does not want Google News to point to its material is that AFP fears that getting more subscribers would mean hiring more people to deal with them.
Even if I do not understand why a company in the business of selling its services does not want more people to know about those services, it does not look like it would be all that hard for AFP to ensure that its sites are skipped over by Google. Google has an easy-to-find web page that says quite clearly how to keep a site from being scanned (http://www.google.com/remove.html). Basically, all you do if you want Google to skip all or part of your site is put a file named "robots.txt" at the root of your web server listing what crawlers should leave alone. For example, your whole site will be skipped if that file contains these two lines:

    User-agent: *
    Disallow: /
Robots.txt files can get quite fancy; see http://www.searchengineworld.com/robots/robots_tutorial.htm for more information.
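To make the idea concrete, here is a minimal sketch of how a crawler that honors robots.txt would read such files, using Python's standard-library parser, urllib.robotparser. It checks two files: the two-line file that asks every crawler to skip an entire site (User-agent: * followed by Disallow: /), and a hypothetical "fancier" file that keeps only Googlebot out of a /news/ directory. The site name www.example.com, the agent name SomeOtherBot, and the /news/ path are illustrative assumptions; the sketch shows what the rules mean to a compliant crawler, not how Google's own crawlers are implemented.

```python
# Sketch: ask Python's standard-library robots.txt parser which URLs a
# compliant crawler may fetch. Site, paths and agent names are hypothetical.
from urllib.robotparser import RobotFileParser

# The two-line file that asks every crawler to skip the whole site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Hypothetical "fancier" file: Googlebot is kept out of /news/ only,
# while every other crawler may fetch everything (empty Disallow = allow all).
BLOCK_NEWS_FOR_GOOGLEBOT = """\
User-agent: Googlebot
Disallow: /news/

User-agent: *
Disallow:
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly, no network fetch
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    site = "http://www.example.com"  # hypothetical site
    for agent in ("Googlebot", "SomeOtherBot"):  # SomeOtherBot is hypothetical
        for path in ("/", "/news/story.html"):
            print(agent, path,
                  allowed(BLOCK_ALL, agent, site + path),
                  allowed(BLOCK_NEWS_FOR_GOOGLEBOT, agent, site + path))
```

Under the first file every URL comes back as not allowed for both agents; under the second, only Googlebot's request for the /news/ page is refused. Note that robots.txt is a request, not an access control: it works only because well-behaved crawlers choose to check it.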
I suppose it is possible that the Google News web crawlers do not pay attention to the robots.txt files that Google says it respects for its other web crawling, but that does not seem all that likely. It is more likely that AFP somehow did not know how easy it would be to do two minutes' worth of work on its own web site to ensure that its material would not be included. That tactic would have taken far less effort than pestering Google to stop scanning, as AFP claims to have done. It would also have taken far less effort than filing a lawsuit. Well, maybe it is not all that likely that no one at AFP knew about robots.txt files; maybe there is some other reason that AFP did not take the easy path. The two that spring to mind are arrogance ('stop' said King Canute to the tide, 'splash' said the tide to King Canute) or a desire for publicity.
disclaimer: Of course you never see either arrogance or a desire for publicity in connection with Harvard, so the above observation is mine alone.