The following text is copyright 2007 by Network World; permission is hereby given for reproduction, as long as attribution is given and this notice is included.
Google: looking good by doing less evil
By: Scott Bradner
Google made news in mid-March by saying that they were going to reduce the length of time that they keep personally identifiable information about their users from infinite to merely obscene. There are some positives in this announcement, but it mostly emphasizes how bad things are now and will continue to be.
Google announced their new plans in a blog entry on March 14th. ("Taking steps to further improve our privacy practices" http://googleblog.blogspot.com/2007/03/taking-steps-to-further-improve-our.html) Google had been under pressure for a long time over their assumption that it was fine to keep a lifelong record of every search query each of their users ever executed, along with the IP address the query was executed from and a cookie ID to link together queries from the user's computer even if the IP address changes.
Google is not alone in this belief; to one degree or another, all of the search engine companies have said they save the same basic information -- although AOL says they do not keep IP addresses. Google does not exactly say why they think they need to keep a record of all of your queries -- their log retention FAQ (http://220.127.116.11/blog_resources/google_log_retention_policy_faq.pdf) says vaguely "We use this information to improve the quality of our services and for other business purposes. For example, we use this information for fraud detection and prevention purposes, to identify system problems and to combat denial of service attacks." But it is reasonable to assume that the main reason they keep the logs is that they are trying to get in our heads to see how we think so they can feed us ads that we will respond to. Google has done quite well in convincing advertisers that they know how to do this, and the logs are the way they do it.
But even given this actual reason for the logs, it's hard to see that they need years' worth of logs in which individual searchers can be easily identified -- under their new policy they will maintain logs forever but will do some simple tweaks to the data after 18-24 months to make it a little harder to identify the individual searcher. These tweaks are not likely to be all that effective in actually hiding people's identities, as AOL found out when they released a pile of similar data. (See "Thanks for nothing AOL" http://www.networkworld.com/columnists/2006/082806bradner.html) I would think that the most reliable information Google needs to know about me in order to target ads comes from the last few months -- it's not all that often that I'll still be interested in a topic I was looking at four years ago.
Google says that the 18-24 month duration was chosen to be compatible with possible future data retention laws in various parts of the world. But the FAQ admits that the laws generally do not exist yet and that, when they do, the retention period could be as short as 6 months. Why not base the Google retention period on the laws where the hardware is located?
Maybe Google just wants an excuse for long retention because it is afraid that it has not yet thought of all the ways it can exploit the information it has about us.
Google has come very late to the realization that some people are worried about the information Google stores about them. The new policy is a good first step, but it would be far better for Google to actually anonymize their information in a few days or weeks rather than years.
disclaimer: Harvard does not forget easily, at least not its former students, since they are a revenue source, but it has not expressed an opinion on others remembering activities, so the above is my own opinion.