Thursday, April 22, 2010

Transparency: now for government requests too

This is my personal blog, and I try hard to keep my Google work-life out of it. I try to resist the temptation to turn this into a running daily diary of privacy at Google, since that would be a different blog. But sometimes, Google launches something that is so important in privacy terms that I can't resist some personal comments.

The most recent launch answers a basic question: how many requests does Google get from governments for user data? Take a look at the map and the country-by-country data.

This is an important step on the road to transparency. Users should be able to see their own data. They should also be able to get as much information as possible about who else can see their data, including, perhaps most importantly, governments. I haven't seen any other company provide this level of transparency. Hopefully some others will be inspired to do this too.

Wednesday, April 21, 2010

The data deluge

One of the most provocative things a privacy geek can say is "data minimization is dying". Data minimization has been one of the foundations of traditional privacy-think. The idea is basic and appealing: privacy is better protected when less data is collected, when less data is shared, when data is kept for shorter periods of time. This explains the endless debates in privacy circles about how many months computer or phone logs or passenger-name records should be retained, as though a numbers game about retention were the key issue in privacy. It isn't, but a debate over numbers is simple and appealing, and can be relayed by the press in a simple manner.

But whether you like it or not, we're entering an age of data ubiquity. Clearly, technology trends are making this possible: computing power, storage capacity and Internet transmission have all allowed this to happen. And like all trends in technology, it will have good and bad applications: the same ease of transmission of data that enables billions of people to access information from around the globe makes it easy to transmit malicious viruses as well.

Statistics about the scale of the data deluge are indeed sobering, even if they reflect scales that human brains can't really understand. There are over a trillion web pages now, growing by billions per day. I read that there are now over 40 billion photos on Facebook alone. YouTube users upload over 24 hours of video every minute. The Economist reported that the total amount of data in the world is growing by 60% per year. No matter where you turn on the web, the scale of data growth is stunning. Even if you find concrete steps to advance data minimization, you're just taking a few drops out of the ocean of the data deluge.

There's no doubt that the Information Age is doing a lot of great stuff with this data deluge. It's also true that this data deluge is posing unprecedented challenges to privacy. I've struggled with this conundrum for many years. I don't think there's a better solution than trying to create maximum transparency and putting control over data back into people's hands, as best as possible. Trying to stop the data deluge is either Sisyphean or chimerical. But trying to decide on behalf of people also undermines the fundamental dignity and choice that each individual should be able to exercise over his/her own data. Of course, not all people can or will exercise responsible control over their own data. But putting transparency and control into users' hands is much like democracy. It fundamentally empowers the individual to make choices and trade-offs about data: making choices between data benefits and privacy. It's not perfect, of course, but it's still better than putting someone else (like governments or companies) in charge of those decisions. I think companies, governments and privacy professionals should define success foremost by whether we contribute to putting people in charge of their own data. As Churchill said, "it has been said that democracy is the worst form of government except for all those other forms that have been tried from time to time."

Thursday, April 15, 2010

To tweet or to delete?

How would you resolve the conflict between the cultural imperative to archive human knowledge and the privacy imperative to delete some of it? To put this in perspective, compare the approaches of the US Library of Congress and the French Senate.

As reported by The New York Times, "the Library of Congress, the 210-year-old guardian of knowledge and cultural history, ...will archive the collected works of Twitter, the blogging service, whose users currently send a daily flood of 55 million messages, all that contain 140 or fewer characters."

Meanwhile, the French Senate is moving in the opposite direction, as it explores a law to legislate "the right to be forgotten". The French Senate has been considering a proposed law which would amend the current data protection legislation to include, among other things, a broader right for individuals to insist on deletion of their personal information. The proposed law in France would require organisations to delete personal information after a specified length of time or when requested by the individual concerned.

Take another example, this time from Germany. A court there was recently asked to consider a legal action by two convicted murderers (now released from prison) seeking to force Wikipedia to remove their names from an article documenting their criminal past. While the case is ongoing (as far as I know), the German-language version of Wikipedia has agreed to remove the names from the article in question. The two men are now seeking to force the Wikimedia Foundation to delete their names from the English-language version as well.

Well, I think we'll be blogging and tweeting about this dilemma for some time, knowing that our tweets will be archived. I testified to French Senators recently that I could never support a privacy "right to be forgotten" that amounted to censorship. I wonder if they tweet in the French Senate, and if they know their tweets are being archived in the US Library of Congress?

Which photos reveal "sensitive" personal data?

There are hundreds of billions of photos and videos online now. As a matter of common sense and common courtesy, users should not upload pictures or videos of other people to hosting platforms without their consent. Moreover, users should take particular care when uploading photos which might reveal "sensitive" personal data.

Privacy laws provide lots of extra legal protections to "sensitive" personal data. Trying to define what is "sensitive" is no easy task. The EU Data Protection Directive uses this definition: "personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, trade-union membership, and the processing of data concerning health or sex life."

But what is "sensitive personal data" in the context of photos or videos? In one extreme logical sense, any photo of a person reveals "racial or ethnic origin". A picture of my face reveals that I am a middle-aged Caucasian male of European descent, revealing my racial or ethnic origin, as well as the fact that I usually wear glasses, indicative of the health issue of myopia. Does that mean that every photo or video of a person should be treated according to the legal standards of "sensitive personal data"? Most people would assume that is neither possible nor desirable, since it could require the explicit consent of data subjects (in writing, in some countries, and subject to prior approval by the DPA, in other countries) before their photos could be uploaded to the web. Clearly, this is not the way that the web works today, and indeed it would be completely unworkable.

I've discussed this issue with many people, in particular in the context of photos taken on public streets. Some privacy regulators have shared their (rather extreme) opinion with me that a photo or video of someone sitting in a wheelchair, or even someone walking in the vicinity of a hospital, should be treated as "sensitive", since it might reveal "health" status. Similarly, a photograph of a person appearing on a street near a mosque should be treated as "sensitive" since it might reveal "religious beliefs". But it's hard for me to imagine a crude solution like drawing a no-photograph zone around mosques and hospitals. It also seems wrong to me to apply the legal standard of "sensitive" personal data to situations which merely increase the likelihood of associations. So, many people take a more nuanced approach. A photo or video often lacks the context to make it meaningful: a photograph of me in front of a cathedral doesn't automatically mean that I'm Catholic, and isn't necessarily revealing "sensitive" personal data. A photograph of people praying there perhaps does. But does the fact that such photos are taken in a public place, and are widely considered banal, change the analysis of whether they should fit into the more restrictive categories of "sensitive" personal data?

All in all, it's very hard to know where to draw the lines. Hopefully, people who take photos and videos will be respectful of the very serious issues that the legal concept of "sensitive" personal data is meant to protect. But the lines separating "sensitive" from "normal" personal data will usually be fuzzy and contextual. Think of the simple example of a photo of two people holding hands. Is this indicative of their sexual orientation, and hence, "sensitive" personal data, or really, just two people holding hands? I suppose it depends on the context. This is not something that photo or video hosting platforms or software filters are able to know. Ultimately, this is all about protecting people's human dignity, and that, fundamentally, is a human judgment.