Interesting People mailing list archives
more on Google and Data Retention - Policies and Possibilities
From: David Farber <dave () farber net>
Date: Tue, 31 Jan 2006 14:17:09 -0500
Begin forwarded message:

From: Bradley Malin <malin () cs cmu edu>
Date: January 31, 2006 2:03:59 PM EST
To: dave () farber net
Cc: lauren () vortex com
Subject: Re: [IP] Google and Data Retention - Policies and Possibilities

[Possibly for IP]

For those that are interested, at Computers, Freedom, & Privacy in 2003 (on April Fools' Day, no less) there was a day-long session (I was a participant) held by the Usage Log Data Management Working Group on retention and access.

A long email thread regarding the session and subsequent discussions can be found here:
http://cryptome.org/no-logs.htm

Also, The Register ran an article on the session:
http://www.theregister.co.uk/2003/04/06/the_trails_left_in_web/

-brad

================================================
Bradley Malin, PhD Candidate
Carnegie Mellon University
School of Computer Science
Institute for Software Research, International

Dave Farber wrote:
-------- Original Message --------
Subject: Google and Data Retention - Policies and Possibilities
Date: Tue, 31 Jan 2006 09:08:22 -0800
From: Lauren Weinstein <lauren () vortex com>
To: dave () farber net
CC: lauren () vortex com

Dave,

That Google can track user searches is hardly an "alert the media" revelation. This status was effectively obvious since we know that Google responds affirmatively to various law enforcement-related data-retrieval orders (and quite possibly to others that we don't know about, such as national security letters), that would be largely useless without such data -- and Google has never claimed to operate anonymously in this respect.

A more interesting question in terms of data retention is *how long* various aspects of the data are retained. That is, does this fine grain of data "expire" over time, or is retrospective data mining of the detailed data possible back into the indefinite past?

This issue is rapidly moving into the spotlight, as Congress appears poised to discuss laws that would *mandate* data retention rules for search engines and perhaps other Internet services -- and we all know that when Congress gets involved in technical matters, the results are often -- shall we say -- less "optimal" than if industry had addressed these concerns themselves voluntarily.

Obviously there are certain enhanced Google services (mostly related to logged-in users in the search and Gmail spaces, including but not limited to users availing themselves of Google's search history features) that require long-term detailed data to function.

But viewed from the outside, there are steps that Google could take to minimize privacy-related risks while not significantly interfering with the value of that data for ongoing R&D and innovation. This is only a thumbnail conceptual description of course, based on external observations alone.

1) Minimize the length of time that full log records are maintained for users not using enhanced services. For instance, full records might be maintained for 30 days (an arbitrary figure for this example). These would be available to law enforcement queries and the like for ongoing investigations. However, after the expiration period, records would be anonymized (stripped of IP, cookie, and other connection-related data identifying the user). Logged search query strings (though they also can contain personal information, as we know) would not be affected at this stage and would continue to be available for R&D and other purposes, but now with a significantly lower outside abuse potential.

2) After some longer period of time (a year? -- again, an arbitrary period for the sake of this example) the remaining portion of the records for non-enhanced service users would be purged (deleted). I of course cannot address the non-trivial issues of system and related data backups in this regard, since I have no idea how Google has structured backup activities across their enterprise, but this aspect in particular might make for an interesting technical challenge.

3) Users of Google's enhanced search-history-based services, etc. represent another interesting problem, since detailed data must be maintained for these users in some form for the services to function. However, it seems likely that the outside abuse potential of this detailed data could be greatly reduced through various cryptographic techniques, while still permitting the required functionalities. It should be noted that cryptographic methods may also be applicable in various ways to alternative solutions for the issues described in sections (1) and (2) above.

Since I am not privy to Google's internal topology, the above ideas can quite reasonably be categorized as speculative.
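
The staged scheme in items (1) and (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `LogRecord` fields and the `apply_retention` function are hypothetical names invented for this sketch, the 30-day and one-year periods are the arbitrary example figures from the message, and nothing here reflects how Google actually stores or processes its logs. Backups and the cryptographic approaches mentioned in item (3) are not covered.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta
from typing import Optional

# Arbitrary example periods taken from the message, not real policy values.
ANONYMIZE_AFTER = timedelta(days=30)
PURGE_AFTER = timedelta(days=365)


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical search log record; field names are illustrative only."""
    timestamp: datetime
    query: str
    ip: Optional[str] = None
    cookie: Optional[str] = None


def apply_retention(record: LogRecord, now: datetime) -> Optional[LogRecord]:
    """Return the record as it should be kept at time `now`, or None if purged."""
    age = now - record.timestamp
    if age >= PURGE_AFTER:
        # Stage 2: the remaining portion of the record is deleted.
        return None
    if age >= ANONYMIZE_AFTER:
        # Stage 1: strip connection-related identifiers, keep the query string.
        return replace(record, ip=None, cookie=None)
    # Still inside the full-retention window: keep the record unchanged.
    return record
```

Note that the anonymized record still carries the query string, which, as the message points out, can itself contain personal information.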
However, the point is that there do exist a range of technological approaches to dealing with this data that could be harnessed to strike a reasonable balance between data usefulness and privacy-related concerns -- permitting R&D and innovation to proceed while minimizing the inherent abuse potential in sensitive data of this sort.

--Lauren--
Lauren Weinstein
lauren () vortex com or lauren () pfir org
Tel: +1 (818) 225-2800
http://www.pfir.org/lauren
Co-Founder, PFIR - People For Internet Responsibility - http://www.pfir.org
Co-Founder, IOIC - International Open Internet Coalition - http://www.ioic.net
Moderator, PRIVACY Forum - http://www.vortex.com
Member, ACM Committee on Computers and Public Policy
Lauren's Blog: http://lauren.vortex.com
DayThink: http://daythink.vortex.com

- - - - - - -

Begin forwarded message:

From: Adam Fields <ip20398470293845 () aquick org>
Date: January 30, 2006 10:05:48 PM EST
To: dave () farber net
Subject: More detailed queries of what Google stores

I asked two very specific questions in a conversation with John Battelle, and he's received unequivocal answers from Google:

1) "Given a list of search terms, can Google produce a list of people who searched for that term, identified by IP address and/or Google cookie value?"

2) "Given an IP address or Google cookie value, can Google produce a list of the terms searched by the user of that IP address or cookie value?"

The answer to both of them is "yes".

http://battellemedia.com/archives/002283.php

--
- Adam

-------------------------------------
You are subscribed as malin () cs cmu edu
To manage your subscription, go to http://v2.listbox.com/member/?listname=ip
Archives at: http://www.interesting-people.org/archives/interesting-people/