Thursday, September 16, 2010

Privacy: a numbers game?



How do you measure privacy protections? There are many important questions to ask, including these:

What data is collected?
Who has access to this data?
How is this data used?
Is this data transferred to third parties?
Can the data subject see and control this data?
Is this data protected by adequate security safeguards?
How long is this data retained before it is either destroyed or anonymized?

Reviewing this list, I think the last question is the least important for measuring meaningful privacy protections. But curiously, it's precisely the one I've heard most often in recent years as I move around Continental Europe listening to the privacy concerns of media and regulators in the online debates. Why is that?

European privacy law clearly provides that personal data should not be retained "longer than necessary". Naturally, the laws leave this time period vague, since it would be impossible to prescribe precise periods for the myriad different contexts in which data is retained, especially since retention must always be justified by "legitimate purposes". I think there's a temptation to boil privacy down to something simple and numerical, and what could be simpler and more measurable than a time period? In practice, there's a vast spectrum of legitimate retention periods, even for similar services, when those periods are designed to reflect the very different legitimate purposes for which the data is retained. To take some Google services as examples: Search logs (9 months), Instant Search logs (2 weeks), Suggest logs (24 hours), etc. To me, it's absurd to think that the most important privacy issue in Search is whether Search logs are retained for 6 or 9 months.
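
Mechanically, a retention policy like this is nothing exotic: a per-service time window, plus a scheduled job that destroys or anonymizes whatever has aged past it. Here's a minimal sketch in Python, purely illustrative; the service names, field names, and anonymize() helper are my own assumptions, not how any real pipeline works.

    from datetime import datetime, timedelta, timezone

    # Hypothetical per-service retention windows, loosely echoing the
    # examples above (Search: 9 months, Instant: 2 weeks, Suggest: 24 hours).
    RETENTION = {
        "search_logs": timedelta(days=270),
        "instant_logs": timedelta(weeks=2),
        "suggest_logs": timedelta(hours=24),
    }

    def anonymize(record):
        """Drop the fields that make a log record personal.

        What counts as "anonymous" is itself contested; stripping the IP
        address and cookie ID is just one common first step.
        """
        return {k: v for k, v in record.items() if k not in ("ip", "cookie_id")}

    def sweep(service, records):
        """Anonymize every record older than the service's retention window.

        Assumes each record carries a timezone-aware "timestamp" datetime.
        """
        cutoff = datetime.now(timezone.utc) - RETENTION[service]
        return [anonymize(r) if r["timestamp"] < cutoff else r for r in records]

The point of the sketch is that the number itself is the trivial part; the hard questions are the ones higher up my list: what's in the record, who sees it, and what "anonymize" really achieves.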

To take a different example: data retention rules in Europe (for government and law enforcement access) range from 6 months to 24 months, with each country in Europe picking and debating different time periods. Germany, for example, picked 6 months (though the German Constitutional Court struck down its version of data retention on other grounds), while France picked 12.

Curiously, the time dimension of data retention is almost entirely a Continental European privacy concern. It rarely registers as a meaningful issue in other countries, even in countries with very intense privacy debates. Of course, the euro-time-period debate is also intimately tied up with the debate about the so-called "right to be forgotten", the "droit à l'oubli", a well-intentioned idea that people should somehow be able to have parts of their own past (presumably the disagreeable parts) edited out of their personal histories. And, not coincidentally, this debate is most intense in countries with historical chapters that many people consciously or unconsciously want to forget: like Spanish society's conflict between remembering and forgetting the crimes of the Franco era.

I've spent a fair amount of time engaging in the time-period debate: "how many months is OK?" It gets pretty repetitive after a while. Lots of people who can't be bothered to think about the issues will just say: "oh, that's too long". I strongly believe that personal data should not be retained "longer than necessary", as European privacy law requires, and I believe it's important for data controllers to justify their retention according to "legitimate purposes". Beyond that, reducing the online privacy debate to a numbers game risks focusing all the attention on only one aspect of the broader debate (and, in my opinion, the least important aspect to boot). And I am very much not of the superficial school of privacy thinking that "shorter is always better".

To clear my head, I spent some time playing tennis this summer. Now that's a numbers game. By the way, I lost.


Tuesday, September 7, 2010

Face recognition software

How should we handle face recognition software?

Every so often a new technology comes along that has the ability to fundamentally alter the private/public balance, with profound implications for privacy. Face recognition is one of them, in my opinion.

We're already seeing highly accurate face recognition software, provided by companies like face.com, in the Facebook community. Some online photo albums also offer it as a tool: a user tags a face in one photo, and the software comes back with face matches and proposes auto-tagging those too.

But what will we do about face recognition software in the wild? Any Internet-connected smart phone with a camera could in theory do a real-time face recognition search on a person walking down the street, without their knowledge, and get web-based search results. Google declined to include face recognition in the version of Goggles that it launched a few months ago, precisely because of the unresolved privacy implications.

Over the last few months, I've spoken about face recognition with a number of privacy experts. Everyone quickly understands how it could be a useful tool, and how it could be a freaky tool, depending on how it's used. But essentially no one has a clue what to do about it. One could imagine a "solution" where users would upload their photos to a company offering this service, with either an opt-in or an opt-out; in other words, telling the company "yes, you can run searches against my photo" or "no, please do not run searches against my photo". In either case, the company has to maintain a central database of these people and their faces. Moreover, the database is essentially a biometric database, since the software runs against algorithmic "face prints". Neither of these "solutions", opt-in or opt-out, seems very palatable. In addition, it's hard to imagine how different countries might regulate such global services according to different standards, if, as one might realistically expect, one country mandates an opt-in model, another mandates opt-out, and a third prohibits such services entirely. How would that work?
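
To see why neither model is palatable, it helps to sketch the data structure either one implies. The Python below is purely illustrative (the Enrollment record, the similarity() stub, and the registry are my own hypothetical names): notice that even the opt-out variant requires the service to keep your face print on file, because otherwise it can't tell that a face it just matched is one it must suppress.

    from dataclasses import dataclass

    @dataclass
    class Enrollment:
        person_id: str
        faceprint: list      # algorithmic template derived from photos
        searchable: bool     # True = opted in, False = opted out

    # The uncomfortable core of either model: a central biometric registry.
    registry = {}

    def similarity(a, b):
        # Placeholder for a real face-matching metric.
        return sum(x * y for x, y in zip(a, b))

    def search(query_faceprint, threshold=0.9):
        """Return the people whose enrolled face prints match the query,
        honoring each person's opt-in/opt-out flag. Opted-out face prints
        are still stored; they are simply never matched against."""
        return [
            e.person_id
            for e in registry.values()
            if e.searchable and similarity(query_faceprint, e.faceprint) >= threshold
        ]

Flipping the default from opt-in to opt-out changes only who ends up with searchable set to True; the biometric database exists either way, which is exactly the problem.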

Well, as we reflect, the technology is developing rapidly and is already on the market, offered by many different companies. Once again, the technology will evolve faster than our legal, political and sociological response to it. Hang on, this one will be interesting. If you have an idea about how to handle it, I'd welcome your comments, which you're free to submit anonymously, of course.

Monday, September 6, 2010

Exhibitionism, or Self-Expression?



In privacy circles, we all try to make sure that people are sensitive about what they post online. I remember a chat I had with a journalist at SFGate.com back in 2007:

"Before posting anything online, Peter Fleischer asks himself: Is this something I want to make public forever? ...

he thinks a lot about the implications of sharing information with the world. As a result, in his private life, he takes a cautious approach...

But he's uncomfortable sharing photos online..."


I generally advise people not to post things publicly without thinking about whether they're likely to regret having posted them. I also advise people not to post anything about other people (like pictures or videos) unless those people have agreed to have it posted. But that doesn't mean I think people should stop posting stuff about themselves and their friends online. In fact, I'm wildly enthusiastic about these social platforms that empower people to publish things about themselves and their friends to the world. The interesting risk-debate is about stuff in a gray zone, where one person's self-expression is another person's exhibitionism. This sort of gets summed up in a question that helps kids understand the consequences of posting things online: "even if you think this photo/video etc. is cool, what will a future employer think about it when you start looking for a job?"

Digital natives are creating a part of their identity online. What they publish, or don't publish, is a self-created, highly edited version of their "identity" that they'd like to project. Digital natives are used to seeing lots of stuff about themselves and their friends online. The older generation isn't. So, rather than a technology clash, this strikes me more as a classic generational clash. The older generation warns the younger generation about putting too much of themselves out there because, well, they never did, never had the opportunity, and no one in their generation did either. Perhaps that's why some people are calling the younger crowd Generation Xhibitionists.

Curiously, every time I've done an image search on my own name (and hey, regular "vanity" searches on your own name are an essential part of privacy hygiene, to know what's out there about yourself), I see a highly ranked image search result of a guy in a bathing suit...who isn't me. Since I'm a believer in the principle that the best answer to bad speech (or bad content) is to confront it with better content, I figure I might as well post a picture of myself in a bathing suit too. The other guy is younger and better-looking, but hey, at least this is me. And to all those people who say I'm never willing to share anything personal online, well, call me Gen X.

Sunday, September 5, 2010

10 paths and they're all hard



We spent a couple days on mountain bikes in Switzerland recently. We got lost a lot. We didn't use GPS or geo-location-apps. We didn't really know where we were going, but we sort of had faith in our legs and our bicycles that we'd somehow get up and back down.

It was good to get out on a mountain. It clears my head. I was trying to think of the big privacy challenges this year.

And like choosing a mountain path that you don't know, these privacy challenges may turn out to be easy, or they may turn out to be the hardest ride of your life.

Here's my list of this year's cliff-hangers. And like any good cliff-hanger, I'll be back to comment on all of them in the months ahead.

1. Location: who should know where you are and where you've been and how can you control it?

2. Face recognition: how to enable useful apps without creating a mass surveillance device?

3. Data minimization: can we (or should we) restrict some data collection in the age of data ubiquity?

4. Notice and consent in machine-to-machine processing: e.g., how can a user meaningfully exercise control and consent when apps instantly share data?

5. Communicating with end users: everyone agrees privacy policies aren't human-friendly, but does anyone have a better idea?

6. Social graph: what can algorithms know or deduce from your public social graph and what can you do about it?

7. Online mapping: what's private in a public place?

8. Droit à l'Oubli: can a line be drawn between "forgetfulness" and censorship?

9. Conflicts of laws: how can sites on the global web comply with conflicting rules from country to country, and is the global web balkanizing?

10. Anonymization: in the age of data mining, what is "anonymous", or is everything somewhere on a spectrum of identifiability, and what does that mean for privacy practices?