Google Gmail and Data Mining?

Still in beta, Google's much-in-demand e-mail service is receiving a lot of attention over accusations of data mining and vague privacy policies.

Google's Gmail: spook heaven?
By Mark Rasch, SecurityFocus
Published Tuesday 15th June 2004 09:46 GMT

Google's plans to run targeted advertising alongside the mail that you see through its new Gmail service represent a potential break for government agencies that want to use autobots to monitor the contents of electronic communications travelling across networks. Even though the configuration of the Gmail service minimises the intrusion into privacy, it represents a disturbing conceptual paradigm: the idea that computer analysis of communications is not a search. This is a dangerous legal precedent which both law enforcement and intelligence agencies will undoubtedly seize upon and extend, to the detriment of our privacy.

The Gmail advertising concept is simple. When you log into Gmail to retrieve and view your email, the service automatically scans the contents of the email and displays a relevant ad on the screen. Although it has been said that neither Google nor the advertiser "knows" the text or essence of the email message, this is not strictly true: if you click on the link to the ad, it can reasonably be inferred that the text of the email relates in some way to the advertiser's service.

Moreover, as with any email provider, the text of your Gmail messages is stored and subject to subpoena. I can envision a situation where an advertiser, paying Google hundreds of thousands of dollars, claims that Google failed to "insert" its ads in relevant emails, or inserted a competitor's ads instead of (or in addition to, or more prominently than) its own. In the ensuing litigation, wouldn't both the ads themselves and the text of the messages into which they were inserted be relevant, and therefore discoverable? I can't imagine why not.

But perhaps the most ominous thing about the proposed Gmail service is the often-heard argument that it poses no privacy risk because only computers are scanning the email. I would argue that it makes no difference to our privacy whether the contents of communications are read by people or by computers programmed by people.

My ISP offers spam filtering, spyware blocking and other filtering of email (with my consent) based at least partially on the content of these messages. Similarly, I can consent to automated searches of my mail to translate it into another language or do text-to-speech, or to strip HTML or executables. All these technologies examine the contents of mail sent to me. This certainly seems to suggest that an automated search of the contents of email, with the recipient's consent, is readily tolerated. But is it legal?

The answer is not so simple. California Penal Code, Section 631 makes it a crime to "by means of any machine, instrument, or contrivance, or in any other manner, ... willfully and without the consent of all parties to the communication, ... learn the contents or meaning of any message, report, or communication while the same is in transit or passing over any wire, line, or cable, or is being sent from, or received at any place within this state; or [to] use, or attempt to use, in any manner, or for any purpose, or to communicate in any way, any information so obtained."

So, if I send a mail to a Gmail user (let's assume I don't know anything about Gmail, and therefore can't be said to have "consented" to the examination) and Google's computers learn the meaning of the message without my consent, this action theoretically violates the California wiretap law. Google is based in California, but it's worth noting that other states, like Maryland, Illinois, Florida, New Hampshire and Washington State, also have so-called "all party consent" provisions that may also preclude this conduct.

To avoid these draconian provisions, Google will likely argue that its computers are not "people" and therefore the company does not "learn the meaning" of the communication. That's where we need to be careful. We should nip this nonsensical argument in the bud before it's taken too far and the federal government follows suit.
Don't Be Echelon

The government has already ventured a few steps down that road. In August 1995 the Naval Command and Control Ocean Surveillance Center detected computer attacks coming through Harvard University. Because Harvard's privacy policy did not give the university the right to monitor the traffic, federal prosecutors obtained a court-ordered wiretap for all traffic going through Harvard's computer systems to look for packets that met certain criteria. Literally millions of electronic communications from innocent users of Harvard's system were analysed by computer pursuant to the court order. In a press release, the U.S. Attorney for Massachusetts explained, "We intercepted only those communications which fit the pattern. Even when communications contained the identifying pattern of the intruder, we limited our initial examination ... to further protect the privacy of innocent communications."

Thus, the government believed that the "interception" did not occur when the computer analysed the packets, read their contents, and flagged them for human viewing. Rather, the government believed that only human reading impacted a legitimate privacy interest. The U.S. Attorney went on to state, "This is a case of cyber-sleuthing, a glimpse of what computer crime fighting will look like in the coming years. We have made enormous strides in developing the investigative tools to track down individuals who misuse these vital computer networks." Then-Attorney General Reno added that the process of having computers analyse the intercepted messages was an appropriate balance because, "We are using a traditional court order and new technology to defeat a criminal, while protecting individual rights and Constitutional principles that are important to all Americans."

But imagine if the government were to put an Echelon-style content filter on routers and ISPs that examines billions of communications and "flags" only a small fraction (based upon, say, indicia of terrorist activity). Even if the filters were perfect and pointed the finger only at guilty people, this activity would still invade the privacy rights of the billions of innocent individuals whose communications pass through the filter.

Simply put, if a computer programmed by people learns the contents of a communication, and takes action based on what it learns, it invades privacy.

Google may also argue that its computers do not learn the contents of the message while in transmission but only contemporaneously with the recipient, making wiretap law inapplicable. That argument, while technically accurate, is somewhat fallacious. Taken to its logical extreme, it would mean that electronic communications are never intercepted in transmission, because packets must be stopped to be read.

Fundamentally, we should treat automated searches of contents as what they are: tools used by humans to find out more about what humans are doing, and provide that information to other humans. At least until the computers take over.

Copyright © 2004, SecurityFocus

SecurityFocus columnist Mark D. Rasch, J.D., is a former head of the Justice Department's computer crime unit, and now serves as Senior Vice President and Chief Security Counsel at Solutionary Inc.

For more on this subject, check out "G-Mail is too Creepy".