Thursday, September 16, 2010

Privacy: a numbers game?

How do you measure privacy protections? There are many important questions that I ask, including these:

What data is collected?
Who has access to this data?
How is this data used?
Is this data transferred to third parties?
Can the data subject see and control this data?
Is this data protected by adequate security safeguards?
How long is this data retained before it is either destroyed or anonymized?

In reviewing this list, I think the last one is the least important in terms of measuring meaningful privacy protections for data. But curiously, it's precisely this one that I have heard the most in recent years as I move around Continental Europe listening to media and regulatory privacy concerns in the online debates. Why is that?

European privacy law has clear provisions that personal data should not be retained "longer than necessary". Naturally, this time period is left vague in the laws, since it would be impossible to prescribe precise time periods for myriad different contexts, especially since retention is always justified by "legitimate purposes". I think there's a temptation to try to boil privacy down into something simple and numerical, and what could be simpler and more measurable than a time period? In practice, there's a vast spectrum of legitimate retention periods, even for similar services, when the retention periods are designed to respect the very different legitimate purposes for which the data is retained. To take some Google services as examples: Search logs (9 months), Instant Search logs (2 weeks), Suggest logs (24 hours), etc. To me, it's absurd to think that the most important privacy issue in Search is whether Search logs are retained for 6 or 9 months.

To take a different example: data retention rules in Europe (for government and law enforcement access) range from 6 months to 24 months, with each country in Europe debating and picking a different time period. Germany, for example, picked 6 months (but the German Constitutional Court struck down its version of data retention on other grounds), while France picked 12.

Curiously, the time dimension of data retention is almost entirely a Continental European privacy concern. It rarely registers as a meaningful vector in other countries, even in countries with very intense privacy debates. Of course, the euro-time-period debate is also intimately tied up with the debate about the so-called "right to be forgotten", the "droit à l'oubli", a well-intentioned idea that people should somehow be able to have parts of their own past (presumably the disagreeable parts) edited out of their personal histories. And, not coincidentally, this debate is most intense in countries with historical chapters that many people consciously or unconsciously want to forget: like Spanish society's conflict between remembering and forgetting the crimes of the Franco era.

I've spent a fair amount of time engaging in the time period debate: "how many months is ok?" It gets pretty repetitive after a while. Lots of people who can't be bothered to think about the issues will just say: "oh, that's too long". I strongly believe that personal data should not be retained for "longer than necessary", as required by European privacy law, and I generally believe that it's important for data controllers to justify their retention according to "legitimate purposes". Beyond that, reducing the online privacy debate to a numbers game risks focusing all the attention on only one aspect of the broader privacy debate (and in my opinion, on the least important aspect of the debate to boot). And I am very much not in the superficial privacy school of thinking that "shorter is always better".

To clear my head, I spent some time playing tennis this summer. Now that's a numbers game. By the way, I lost.


Manuel Pardi said...

Dear Mr. Fleischer

I'd like to express my disagreement with your opinion that the time period of data retention is the least important aspect of the privacy debate, and that these concerns belong almost entirely to Continental Europe.

In the Video Privacy Protection Act, you can find an interesting precedent for how U.S. federal law provides a specific period of time for data retention (no longer than one year from the date the data is no longer necessary for the purpose for which it was collected).

Also, in my opinion, the right to oblivion was, is and will be a big issue for privacy because it is closely related to the human need to be forgiven (Judeo-Christian roots). It is therefore necessary to edit the disagreeable parts of our past to achieve a fresh start.


Manuel Pardi

Larry M. said...

This is why everyone should use TrulyMail, PGP, or GPG to encrypt their emails if they are going to keep messages on Google's (or anyone's) servers.

To leave unencrypted emails on someone's server is just asking for this kind of violation. The temptation is there. Remove the temptation...remove your data yourself.

Vincent T said...

Thank you for raising all these interesting questions.

I do agree that the log retention period might not be the most important criterion for measuring privacy protections. It’s also true that most search engines communicate about the log retention period and do not answer the other questions.

Google is also communicating a lot about this criterion (there are three posts about that topic on Google's official blog), but it’s still quite hard to know exactly what is collected when a user does a search on Google. Because most of the data that Google collects does not contain personal information, Google’s privacy policy does not answer most of the questions you raised (who has access to search logs, how are they used, how aggregated are the pieces of information that are shared with third parties, does the internal access policy apply to these data…). I’ve spent some time reading Google policies and failed to find a clear answer to these questions.

In my opinion, another important question is “how are these logs anonymized?”. In your post you mention that the retention period for Google search is 9 months, but the cookies are retained in search logs for (at least) 18 months. Finally, is it possible to have additional information about “Instant Search” logs? I did not know they were retained for 2 weeks.


Christopher Parsons said...

Reading through various academic literatures and research, the concerns about the time that data is retained often seem to focus on the conflict over what constitutes 'legitimate reasons' for retention. Perhaps governments require information to be retained for a certain period of time - though given Art. 29's recent report on the data retention directive, it's questionable how effective European governments are in maintaining checks and balances - and it's certainly true that various new services provided by corporations are dependent on the capture, analysis, and use of various data types.

The 'legitimacy' of these collections, however, is often tied to the conditions authorizing data retention in the first place. In the case of the EU, a concern is that an incredibly vast amount of traffic data is retained, with citizens lacking a democratic 'connection' with those laws (the oft-cited democratic deficit). They don't see themselves as authors and addressees. The same might be said of Americans, who are reportedly subject to federal surveillance through carrier hotels across the US.

The legitimacy of corporate protections is often challenged on the basis of information asymmetry; information is collected without the individuals having a full understanding of what is collected, or why, and as a result duration of data retention is the only 'number' that is clearly understood by non-technical users (and, as is often the case, policy makers). In this sense, 'retention periods' might be a kind of heuristic to evaluate privacy risks. It is, as you point out, not the best of heuristics. However, given that individuals are often unable to decode privacy policies and statements, do not opt-in to (many) site analytics services, to say nothing of behavioural advertising schemes, it's not surprising that a temporal heuristic - as a way to limit long-term harm - is commonly witnessed.

Stephan Alex said...

Dear Peter,

I think it would be a great help if Google would make the retention policy more transparent. So far, the retention time for Google search logs is known, but a clear statement on other services, especially user IP addresses in Google Analytics, is missing.
If the 9-month rule applies to all server logs, including Google Analytics, this should be clearly stated.

Thanks a lot.

Stephan Alex

Francesco P said...

I'd like to express my sympathy for Fleischer, because of the brand new inquiry in Italy (criminal court in Rome) regarding the "Google cars affair".
It must be very stressful for Google and its employees to keep on facing Italian courts (that's why foreign investors stay away from Italy).

Anonymous said...

Dear Peter -

I notice that you and your tennis opponent chose to remain pseudonymous on the score-board...