Citadel of the Blogs The Inbox of the Internet (really)

Live Search Engine Terms (and Privacy)

Posted on July 9th, 2007. About privacy, web 2.0.

In case you don’t like reading long posts like the previous one, here’s one important link from it I didn’t want you to miss:

To get an idea of the relationship between privacy and search terms, Dogpile actually lets you review live search terms of the type that the Justice Department wanted to see (and gives you a “clean” and “um, maybe not so clean” version).

Be sure to check out both!

Google & Privacy

Posted on July 9th, 2007. About privacy.

Here are some observations on Google’s privacy policies and the larger privacy trends they are situated within.

First, I can say that Google is a member of Safe Harbor. This means it adheres to the US Safe Harbor privacy principles. These are seven principles that are substantially similar to the ten privacy protection principles that form the basis of PIPEDA in Canada. And as you may know, FIPPA itself is substantially similar to PIPEDA. In short, the privacy principles of Google are very much akin to those that bind FIPPA institutions in Ontario.

The most important of these principles is the “consistent purpose rule”. This means that Google, like any institution subject to FIPPA, must ensure that individuals could reasonably have expected any use or disclosure of their information at the time the information was collected. Think of this rule as the privacy litmus test. In general, Google’s other privacy policies and notices detail more fully how Google adheres to these seven principles.

Bottomline: While Google has recently come under fire for its allegedly weak privacy practices, it undoubtedly wants to protect its good name. Naturally, to behave badly with personal data would harm it enormously. As such, I do not see such misuse occurring in the near future. And I might add, neither do financial investors evidently.

I mention this as context for the Federal Trade Commission’s (FTC) recent investigation of Google’s purchase of DoubleClick. In this case, consumer privacy advocates are worried about how Google will combine its data with DoubleClick data to map what someone searches for, as well as other online activities. The FTC launched a very similar investigation of DoubleClick in 2000, at which time DoubleClick backed off from its move to use individuals’ buying behaviour and the FTC dropped the investigation. It should be mentioned that the Center for Democracy and Technology (CDT) was responsible for filing the privacy actions with the FTC in both cases.

Yet Google has at times been seen as a proponent of privacy protection. In 2006, Google fought the US Department of Justice on privacy grounds. The DOJ had ordered the release of a week’s worth of search terms and one million Web pages from its index to aid in the Bush administration’s defense of an Internet pornography law. In the end, a federal judge sided mostly with Google and ordered the company to provide half of the URLs sought but said users’ search queries were off-limits. More recently, Google threatened to shut down its German Gmail service out of privacy concerns because, surprisingly, German law requires identity verification.

By comparison, Microsoft, Yahoo and AOL received subpoenas identical to the original DOJ request. Those companies chose to comply rather than fight the request in court. They have all emphasized that they turned over search terms and logs but not information that could be linked to individuals.

I thought it was important to provide this background. Much of the media attention tends to blur the line between what Google is doing right and what it could be doing better. In my opinion, Google is now caught in the midst of market forces that are compelling it to more clearly shape and express its stance on privacy. So I believe the net result of the media attention will be positive. But for the moment some of Google’s privacy policies are vague on some important points.

RETENTION

With regard to Gmail’s Privacy Policy, if you delete an e-mail message from your Gmail account, it “…may take up to 60 days to be deleted from our active servers and may remain in our offline backup systems.” This does not mean that the message is necessarily gone. Their offline backup servers may contain copies of your messages in perpetuity.
Bottomline: More end-user control is preferable.

With regard to search strings, Google has, apparently, been maintaining search data logs indefinitely. However, in March 2007, Google announced it would anonymize the final eight bits of the IP address and the cookie data after somewhere between 18 and 24 months, unless legally required to retain the data for longer. The information on specific searches will remain indefinitely, but it will be much harder to tie the searches to specific individuals or computers.
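To make “the final eight bits” concrete, here is a minimal sketch of one way such masking could work on an IPv4 address. Google has not published its exact method as far as this post knows, so the function name `anonymize_ipv4` and the choice to zero the bits (rather than, say, replace them with random values) are my own assumptions for illustration.

```python
import ipaddress


def anonymize_ipv4(address: str) -> str:
    """Zero the final eight bits (the last octet) of an IPv4 address.

    Hypothetical helper: zeroing is an assumed approach, not Google's
    documented method.
    """
    ip = ipaddress.IPv4Address(address)
    masked = int(ip) & 0xFFFFFF00  # keep the first 24 bits, clear the last 8
    return str(ipaddress.IPv4Address(masked))


# Documentation-range example addresses, not real users.
print(anonymize_ipv4("192.0.2.147"))
print(anonymize_ipv4("192.0.2.9"))
```

Both addresses collapse to the same value, so up to 256 addresses become indistinguishable in the log. That is why the searches get harder, though not necessarily impossible, to tie to one computer.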

One puzzle is that Google claims it chose this retention length because data retention laws in Europe require communications service providers to hold on to the information for that long. But in June 2007, Europe’s Data Protection Committee declared that Google’s search engine logs are not covered by the Data Retention Directive. So why does Google claim they are? In a final ironic twist, this month (July 2007) an EU panel of national data protection officers declared it will investigate whether Google, as well as other search engines, is storing its search information too long!

By comparison, PIPEDA (ss. 8 and s. 37) requires that organizations retain personal information only for as long as it is operationally needed. It does not specify beyond this, but it illustrates why privacy groups are alarmed by Google’s arbitrary choice of 18 to 24 months: is this really “operationally necessary”?

By comparison, Yahoo and Microsoft have declined to disclose their exact data retention policies with respect to Web searches. AOL saves personally-identifiable search data for up to 30 days in a way that’s visible to the user and uses an encryption hashing technique to obscure it thereafter.

According to AOL spokesman Andrew Weinstein, “We do not keep any IP addresses in our search database, and we de-identify any associated account information through an encryption algorithm. We have also made a business decision not to keep any unique identifiers (i.e. the hashed user ID) for longer than 13 months. … That said, it still might contain information of a personal nature, as the data released last year clearly did.”
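Here is a rough sketch of what “de-identifying” an account ID with a hash might look like, and why the result can still be personal. AOL’s actual algorithm is not described in the quote, so the keyed SHA-256 hash, the `pseudonymize` function, the key and the sample log entries below are all hypothetical stand-ins.

```python
import hashlib
import hmac


def pseudonymize(user_id: str, secret_key: str) -> str:
    """Replace an account ID with a keyed hash (a hypothetical scheme)."""
    digest = hmac.new(secret_key.encode("utf-8"),
                      user_id.encode("utf-8"),
                      hashlib.sha256)
    return digest.hexdigest()[:16]  # shortened only for readability


# Invented log entries: (account ID, search query).
search_log = [
    ("alice@example.com", "jane doe social insurance number"),
    ("alice@example.com", "clinics near elm street"),
    ("bob@example.com", "weather tomorrow"),
]

deidentified = [(pseudonymize(user, "hypothetical-key"), query)
                for user, query in search_log]

for token, query in deidentified:
    print(token, query)
```

The account name disappears, but the same person always gets the same token, so all of that person’s searches stay grouped together and the query text itself is untouched. That matches Weinstein’s caveat that the data “still might contain information of a personal nature”.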

But today I read that both Yahoo and Microsoft are soon going to announce changes in their privacy policies regarding the retention of search users’ information. “Changes” may be stretching it a bit, since neither has publicly come out to say what its retention policies are.

USE AND DISCLOSURE

The Google Privacy Policy states that Google may combine information users submit “with information from other Google services or third parties”. They say this is to provide for a better experience and “…may give you the opportunity to opt out of combining such information”.
Bottomline: This “may” is vague and the user is left to infer that the “consistent purpose rule” applies. More clarity is preferable.

BROADER TRENDS

Between Canada and the US, I can only suggest that third-party access to email maintained in the US will, evidently, require an American warrant and such access is permissible under PIPEDA.

But to add a finer layer to this, an American federal appeals court upheld the right of government agents to gather information without a search warrant on the e-mail and Internet addresses used by a criminal suspect.

Whereas email content evidently requires a warrant (and this is permissible under PIPEDA), this “surface information” is not similarly protected.

Think of this distinction between non-content and content information as that between an envelope and the contents of a letter. The envelope contains addressing information that is exposed to others; the contents of the letter are concealed. Envelope information falls outside Fourth Amendment protection, but content information is fully protected by the Fourth Amendment. The envelope/content distinction works fairly well with email: the headers (which contain the to/from lines) are the digital equivalent of envelopes; the text of the email itself is the content.
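For email, the split between envelope and content is easy to show in a few lines. This is only an illustrative sketch using Python’s standard `email` package; the message and addresses are invented, and the `envelope` and `content` names are mine.

```python
from email import message_from_string

# An invented message: header lines, a blank line, then the body.
raw_message = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "\n"
    "The clinic called with my results today.\n"
)

msg = message_from_string(raw_message)

# "Envelope": the addressing information carried in the headers.
envelope = {"from": msg["From"], "to": msg["To"]}

# "Content": the text of the message itself.
content = msg.get_payload()

print(envelope)
print(content)
```

The addressing lines and the body sit in clearly separate parts of the message, which is why the analogy holds up reasonably well here.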

When applied to IP addresses and URLs, the envelope/content distinction becomes fuzzier. An IP address is a unique number that is assigned to each computer connected to the Internet. Each website, therefore, has an IP address. On the surface, a list of IP addresses is simply a list of numbers; but it is actually much more. With a complete listing of IP addresses, the government can learn quite a lot about a person because it can trace how that person surfs the Internet. The government can learn the names of stores at which a person shops, the political organizations a person finds interesting, a person’s sexual fetishes and fantasies, her health concerns, and so on.
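A URL shows the fuzziness in miniature. In the sketch below (hypothetical URL, standard `urllib.parse` calls), the host name looks like addressing information, but the query string carries what the person actually typed. Treating the host as “envelope” and the query as “content” is my own reading for illustration, not a legal rule.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical search URL, invented for this example.
url = "https://search.example.com/results?q=flu+symptoms"

parts = urlsplit(url)

# Address-like part: which site was contacted.
host = parts.hostname

# Content-like part: what the person was looking for.
query = parse_qs(parts.query)

print(host)
print(query)
```

Unlike an email, a single URL mixes both kinds of information in one string, so a log of URLs gives away far more than a list of addresses on envelopes would.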

ARE SEARCH TERMS PRIVATE?

If search terms are unlinked from your identity, and just part of a list of anonymous searches scrolling across a screen, the privacy concerns are minimized. However, if you search for your own name and something additional (e.g. social insurance number) then it is clear that such information becomes readily identifiable.

Google even displays a list of live search terms on a screen that visitors can view in its Silicon Valley headquarters. To get an idea, Dogpile actually lets you review live search terms of the type that the Justice Department wanted to see (and gives you a “clean” and “um, maybe not so clean” version).
