Back in February, in a San Jose courtroom, a bombshell was dropped that could have been erased from the public record.
It turns out that Google, which bases its business on collecting and analyzing huge reams of data for advertising purposes, has been scanning users’ emails even before users have a chance to open or read them, including email messages that are deleted without being opened. Google knows what’s in your email before you do.
The revelation came in a now-settled legal dispute over Google’s Gmail service. Dozens of the nation’s largest newspapers and media companies fought to make sure that the case — and its wide-ranging implications for Internet users — received a full public airing. It has been an unfolding drama ever since, affecting what analysts estimate are 500 million Gmail users worldwide.
Google tried, and failed, to redact information about its email scanning process from a transcript of a public court hearing. Last month, the judge in the case ruled that portions of the transcript from that February hearing could not be redacted retroactively, since that would be tantamount to closing a public courtroom.
The company’s attempt, akin to putting toothpaste back into a tube, reversed the position it had taken earlier in the lawsuit, when it pledged a “fully public airing of the issues raised by plaintiffs’ motion for class certification.” The NSA recently used the very same tactic when it tried to secretly delete portions of a public court transcript in a lawsuit filed against its surveillance practices.
As a result, we now know much more about how Google collects personal information from users of Gmail and Google Apps and, in this case, how it plugged a critical gap in its data mining operation to sweep up even more of your information.
Here’s what happened:
Back in 2010, Google was facing a vexing problem. It was losing out on a treasure trove of personal information from millions of Gmail users who were slipping past its chief analytical tool, known as “Content OneBox.” Any time they accessed their email through Outlook or on an iPhone, Google’s data machine wasn’t there to capture it all. So it needed a way to sidestep the problem.
Within a matter of months, the company shrewdly moved the Content OneBox from Gmail’s storage area to the “delivery pipeline” — meaning that it could now scan messages before they were received. As the plaintiffs explained:
“Google made a choice. They said, you know what, when people are accessing emails by an iPhone, we are not able to get their information. When people aren’t opening their emails or they are deleting them, we are not able to get their information. When people are using Google Apps accounts where ads are disabled, we are not able to get that information. When people are accessing Gmail through some other email provider, we are not able to get that information. So what they did is they took a device that was in existence already and operating just fine back in the storage area, and they moved it to the delivery pipeline.”
This move has sweeping consequences, as Chris Hoofnagle, director of privacy programs at Berkeley’s Center for Law & Technology, has described:
“Hiding ads while analyzing data takes advantage of a key deficit users have around internet services: users only perceive profiling if they receive ads. The content one box infrastructure would allow Google to understand the meaning of all of our communications: the identities of the people with whom we collaborate, the compounds of drugs we are testing, the next big thing we are inventing, etc. Imagine the creative product of all of Berkeley combined, scanned by a single company’s ‘free’ email system.”
Google’s stated goal is to “organize the world’s information,” but it fought to avoid disclosing how and why it has done so. Now we know.