Peter Fleischer: Privacy...?: 2007

Wednesday, December 5, 2007

Transparency, Google and Privacy

A group called One World Trust sent a survey to Google. A lot of people ask us to fill out surveys. I’m not sure who at Google they sent it to. In fairness, until yesterday, I had never heard of One World Trust, and it’s possible that whoever received it hadn’t either. Since we didn’t respond to their request for a survey, though, Google was ranked bottom in terms of transparency, in particular with regard to privacy. And Robert Lloyd, the report’s lead author, went so far as to say Google “did not co-operate (with the report) and on some policy issues, such as transparency towards customers, they have no publicly available information at all.” All this according to the FT: http://us.ft.com/ftgateway/superpage.ft?news_id=fto120420070313216545.

But filling out surveys is not how a company proves transparency to its customers. It does so by making information public. We’ve gone to extraordinary lengths to publish information for our users about our privacy practices. In many respects, I feel we lead the industry. Here are just a few examples:

We were the first search company to start anonymising our search records (a move the rest of the industry soon followed), and we published our exchange of letters with the EU privacy regulators, explaining these issues in great depth. http://googleblog.blogspot.com/2007/06/how-long-should-google-remember.html

We engineer transparency into our technologies like Web History, which allow users to see and control their own personal search history.
https://www.google.com/accounts/ServiceLogin?hl=en&continue=http://www.google.com/psearch&nui=1&service=hist

We’ve also gone to extraordinary lengths to explain our privacy practices to users in the clearest ways we can devise.
Our privacy policies: http://www.google.com/privacy.html
Our privacy channel on YouTube, with consumer-focused videos explaining basic privacy concepts:
http://googleblog.blogspot.com/2007/08/google-search-privacy-plain-and-simple.html
http://googleblog.blogspot.com/2007/09/search-privacy-and-personalized-search.html
Our Google blogs on privacy: http://googleblog.blogspot.com/2007/05/why-does-google-remember-information.html
Our Google public policy blogs on privacy: http://googlepublicpolicy.blogspot.com/

With each of these efforts, we were the first, and often the only, search engine to embrace this level of transparency and user control. And lots of people in Google are working on even more tools, videos and content to help our users understand our privacy practices and to make informed decisions about how to use our services. Check back to these sites regularly to see more.

So, really, is it fair for this organization to claim that “on some policy issues, such as transparency towards customers, they have no publicly available information at all”? Perhaps next time, they can follow up their email with a comment on our public policy blog, or a video response on our Google Privacy YouTube channel. Or even send a question to our Privacy Help Site:
http://www.google.com/support/bin/request.py?contact_type=privacy . Well, so much for the report’s claim that Google doesn’t have a feedback link.

Tuesday, October 23, 2007

Online Advertising: privacy issues are important, but they don’t belong in merger reviews

As the European Commission and the US Federal Trade Commission review Google’s proposed acquisition of DoubleClick, a number of academics, privacy advocates and Google competitors have argued that these competition/anti-trust authorities should consider “privacy” as part of their merger review. That’s just plain wrong, as a matter of competition law. It’s also the wrong forum to address privacy issues. If online advertising presents a “harm to consumers”, let’s try to figure out what exactly the harm is, figure out which online advertising practices to change, and then apply those principles to all the participants in the industry. But we shouldn’t bootstrap privacy concerns onto a merger review. That’s like evaluating a merger of automakers by looking at the gas mileage of their cars. We don’t invoke antitrust law to prevent a merger of car companies just because we think the industry should build cars that use less gas.

Some advocates state that online advertising “harms” consumers. So they reason that the merger of Google and DoubleClick would “harm” consumers more, to the extent that it enables more targeted advertising. But these same critics rarely cite specific examples of consumer “harms”, and indeed, I’m having trouble identifying what they might be. The typical use of ad impression tracking now is to limit the number of times a user is exposed to a particular ad. That is, after you have seen an image of a blue car 6 or 7 times, the ad server will switch to an image of a red car or to some other ad. This means that a user will see different ads, rather than re-seeing the same ad over and over again. As someone who is sick of seeing the same ads over and over again on television, I think that’s good for both viewers and advertisers. There are also new forms of advertising that are enabled by the Internet that may allow for more effective matching between buyers and sellers. Again, I prefer to see relevant ads, if possible. I go to travel sites a lot, and I’m happy to see travel ads, even when I’m not on a travel site. I don’t want to see ads for children’s toys, and I dislike the primitive nature of television, when it shows me such blatantly irrelevant ads.
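To make the idea of capping ad impressions concrete, here is a minimal sketch in Python. It is only an illustration of the general technique, not a description of how any real ad server is built: the class name, the cap of 6 and the `user_id` key (standing in for a cookie identifier) are all assumptions made for the example.

```python
from collections import defaultdict


class FrequencyCappedRotator:
    """Hypothetical rotator: shows each ad at most `cap` times per user."""

    def __init__(self, ads, cap):
        self.ads = list(ads)
        self.cap = cap
        # impressions[user_id][ad] -> how often this user has seen this ad
        self.impressions = defaultdict(lambda: defaultdict(int))

    def next_ad(self, user_id):
        """Return the first ad still under its cap, or None if all are capped."""
        seen = self.impressions[user_id]
        for ad in self.ads:
            if seen[ad] < self.cap:
                seen[ad] += 1
                return ad
        return None


rotator = FrequencyCappedRotator(["blue car", "red car"], cap=6)
shown = [rotator.next_ad("cookie-123") for _ in range(8)]
print(shown)  # six "blue car" impressions, then two "red car" impressions
```

Note that the only thing this sketch needs to remember is a count per ad per identifier; it does not need to know who the person behind the identifier is.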

We all dislike unsolicited direct marketing by phone. So, we created a regulatory “do not call” solution. But without knowing which precise practices of online advertising create a “harm”, it’s impossible to discuss a potential solution. Moreover, a website that offers its services or content for free to consumers (e.g., a news site) tries to generate revenue from advertising to pay its journalists’ salaries and other costs. Shouldn’t such websites also have a say in whether they should be forced to offer their free content to consumers without the ability to match ads to viewers according to some basic criteria? It’s very clear (but worth reiterating) that free sites are almost always more respectful of privacy than paying sites, because of the simple fact that paying sites must collect their users’ real identities and real credit card numbers, while free sites can often be used anonymously.

Now, some legal observations relating to European laws on merger reviews. The overriding principle protected by those laws is consumer welfare, which refers to those aspects of a transaction that affect the supply and demand for goods/services (i.e., that affect quantity, quality, innovation, choice, etc.). The reference in Article 2(1)(b) ECMR to "the interests of the intermediate and ultimate consumers, and the development of technical and economic progress provided that it is to consumers' advantage and does not form an obstacle to competition" must therefore be read in this context – consumer interests are relevant to the merger assessment only for the purpose of assessing whether the degree of competition that will remain post-transaction will be sufficient to guarantee consumer welfare.

The fact that non-competition issues, such as privacy, fall outside the scope of ECMR is consistent with the general consensus that merger control should focus on the objective of ensuring that consumer welfare is not harmed as a result of a significant impediment to effective competition. Introducing non-competition related considerations into a merger analysis (e.g., environmental protection or privacy) would lead to a potentially arbitrary act of balancing competition against potentially diverging interests. Accordingly, policy issues, such as privacy, are not suitably addressed in a merger control procedure, but should be dealt with separately.

Indeed, privacy interests are addressed in Directive 95/46 and Directive 2002/58 (both of which are based on Article 14 EC and Article 95 EC), Article 6 TEU and Article 8 ECHR, and Google must abide by its legal obligations under these instruments. Such instruments are also far more efficient in addressing privacy issues than the ECMR, as they are industry-wide in scope. Internet privacy issues are relevant to the entire industry as they are inextricably linked to the very nature of the technology used by every participant on the Internet. Information is generated in relation to virtually every event that occurs on the Internet, although the nature of the data, the circumstances in which it is collected, the entities from whom and by whom it is collected, and the uses to which it is put, vary considerably. This situation pre-dates Google’s proposed acquisition of DoubleClick and is not in any way specific to it. More importantly, any modification of the status quo in terms of the current levels of privacy protection must involve the industry as a whole, taking account of the diversity of participants and their specific circumstances.

Google has always been, and will continue to be, willing to engage in a wider policy debate regarding Internet privacy. Issues of privacy and data security are of course of great importance to Google, as maintaining user trust is essential for its success. As a large and highly visible company, Google has strong incentives to practice strong privacy and security policies in order to safeguard user data and maintain user trust. These concerns are one of the reasons why Google has thus far chosen not to accept display ad tags from third parties. The proposed transaction will not change Google's commitment to privacy, and Google is in fact currently developing a new privacy policy to address the additional data gathered through third-party ad serving. Similarly, a number of Google's competitors have announced new and supposedly improved policies to protect consumer privacy, highlighting the robustness of recent competition on privacy issues. There is no reason to suggest that such competition will diminish if Google acquires DoubleClick; to the contrary, such competition appears to be intensifying.

Privacy is an important issue in the world of online ads. But it is not an issue for a competition law review.

Can you “identify” the person walking down the street?

I recently posted a blog on Google’s Lat Long Blog about Street View and privacy.

http://google-latlong.blogspot.com/2007/09/street-view-and-privacy.html

I’d like to add a few personal observations to that post.

Some people might have wondered why Google posted a blog about what a future launch of Street View would look like in some non-US countries, especially since, so far, it only includes images from 15 US cities. We felt the need to respond to concerns that we had heard recently, in particular concerns from Canada’s privacy regulators, that a launch of the US-style of Street View in Canada might not comply with Canadian privacy regulations. And we wanted to be very clear that we understood privacy regimes are different in some countries, such as Canada, and for that matter, much of Europe, compared to the US tradition of “public spaces.” And of course, that we would respect those differences, when/if we launched Street View in those countries.

Basically, Street View is going to try not to capture “identifiable faces or identifiable license plates” in its versions in places where the privacy laws probably wouldn’t allow them (absent consent from the data subjects, which is logistically impossible), in other words, in places like Canada and much of Europe. And for most people, that pretty much solves the issue. If you can’t identify a person’s face, then that person is not an “identifiable” human being in privacy law terms. If you can’t identify a license plate number, then that car is not something that can be linked to an identifiable human being in privacy law terms.

How would Street View try not to capture identifiable faces or license plates? It might be a combination of blurring technology and resolution. The quality of face-blurring technology has certainly improved recently, but there are still some unsolved limitations with it. As one of my engineering colleagues at Google explained it to me: “Face detection and obscuring technology has existed for some time, but it turns out not to work so well. Firstly, face recognition misses a lot of faces in practice, and secondly, a surprising number of natural features (bits of buildings, branches, signs, chance coincidence of all of the above) look like faces. It’s somewhat surprising when you run a face recognition program over a random scene and then look closely at what it recognises. These problems are also exacerbated by the fact that you have no idea of scale, because of the huge variations in distance that can occur.”

Lowering the quality of resolution of images is another approach to try not to capture identifiable faces or license plates. If the resolution is not great, it’s hard (or even impossible) to identify them. Unfortunately, any such reduction in resolution would of course also reduce the resolution of the things we do want to show, such as buildings. So, it’s a difficult trade-off.

Some privacy advocates raise the question of how to circumscribe the limits of “identifiability”. Can a person be considered to be identifiable, even if you cannot see their face? In pragmatic terms, and in privacy law terms, I think not. The fact is that a person may be identifiable to someone who already knows them, on the basis of their clothes (e.g., wearing a red coat), plus context (in front of a particular building), but they wouldn’t be “identifiable” to anyone in general. Others raise the issue of whether properties (houses, farms, ranches) should be considered to be “personal data” (so that their owners or residents could request them to be deleted from geo sites like Google Earth). Last month, various German privacy officials made these arguments in a Bundestag committee hearing. They reasoned that a simple Internet search can often combine a property’s address with the names of the property’s residents. Others see this reasoning as a distortion of privacy concepts, which were not meant to be extended to properties. And the consequence of that reasoning would be that satellite and Street View imagery of the world might be full of holes, as some people (disproportionately, celebrities and the rich, of course) would try to block their properties from being discoverable.

Google will have to be pragmatic, trying to solve privacy issues in a way that doesn’t undermine the utility of the service or the ability of people to find and view legitimate global geographic images. I personally would like to see the same standard of privacy care applied to Street View across the globe: namely, trying not to capture identifiable faces or license plates, even in the US, regardless of whether that’s required by law or not. But I recognize that there are important conflicting principles at play (i.e., concepts of “public spaces”), and “privacy” decisions are never made in a bubble.

We’re engaged in a hard debate, inside Google and outside: what does privacy mean in connection with images taken in “public spaces”, and when does a picture of someone become “identifiable”? Can we have a consistent standard around the world, or will we have to have different standards in different countries based on local laws and culture? This isn’t the first time (and I hope, not the last time) that Google has launched a new service, letting people access and search for new types of information. Those of us in the privacy world are still debating how to address it.

I think the decisions taken by the Street View team have been the right ones, even for the US launch, at least at this point in time, and given the current state of technology. But a more privacy-protective version in other countries (and someday, maybe in the US too?) would be a good thing, at least for privacy.

Tuesday, October 16, 2007

I like the anonymity of the big city

For much of history, people lived in small communities, where everyone knew them, and they knew everyone. Identity was largely inherited and imposed, and the ability of people to re-invent themselves was quite limited. You were father, farmer, drunkard, and everyone knew it.

The big city changed all that, by offering anonymity and choice. Against the background of anonymity, people can choose their identity, or choose multiple identities, often by choosing the community of other people with whom they live, work or play. In the city, you can choose to cultivate multiple identities: to mingle with bankers or toddlers by day, to play rugby or poker by night, to socialize with rabbis or lesbians, and to do all this while choosing how anonymous to remain. Maybe you’re happy to use your real name with your bank colleagues, but delight in the anonymity of a large nightclub. And you can share different parts of your identity with different communities, and none of them need to know about the other parts, if you don’t want them to. Work and home, family and friends, familiarity and exploration: the city allows you to create your identity against a background of anonymity.

Like the city, but on a much, much bigger scale, the Web allows people to create multiple digital identities, and to decide whether to use their “real” identity, or pseudonyms, or even complete anonymity. With billions of people online, and with the power of the Internet, people can find information and create virtual communities to match any interest, any identity. You may join a social networking site under your real name or a pseudonym, finding common interests with other people on any conceivable topic, or exploring new ones. You may participate in a breast cancer forum, sharing as much or as little information about yourself as you wish. You may explore what it means to be gay or diabetic, without wanting anyone else to know. Or you may revel in your passion to create new hybrids of roses with other aficionados. The Web is like the city, only more so: more people, more communities, more knowledge, more possibility. And the Web has put us all in the same “city”, in cyberspace.

Life is about possibilities: figuring out who you are, who you want to be. Cities opened more possibilities for us to create the identities we choose. The Web is opening even more.

Wednesday, September 19, 2007

Eric Schmidt on Global Privacy Standards

Eric Schmidt, Google's CEO, added his voice to the debate on global privacy standards with this OpEd, published in a number of outlets around the world this week.

As the information age becomes a reality for increasing numbers of people globally, the technologies that underpin it are getting more sophisticated and useful. The opportunities are immense. For individuals, a quantum leap forward in their ability to communicate and create, speak and be heard; for national economies, accelerated growth and innovation.

However, these technological advances do sometimes make it feel as if we are all living life in a digital goldfish bowl. CCTV cameras record where we shop and how we travel. Mobile phones track our movements. Emails leave a trail of who we “talk” to, and what we say. The latest internet trends - blogs, social networks and video sharing sites - take this a step further. At the click of a mouse it’s possible to share almost anything – photographs, videos, one’s innermost thoughts - with almost anyone.

That's why I believe it's important we develop new privacy rules to govern the increasingly transparent world which is emerging online today – and by new rules I don’t automatically mean new laws. In my experience self-regulation often works better than legislation – especially in highly competitive markets where people can switch providers simply by typing a few letters into a computer.

Search is a good example. Search engines like Google have traditionally stored their users’ queries indefinitely – the data helps us to improve services and prevent fraud. These logs record the query, the time and date it was entered, and the computer’s Internet Protocol (IP) address and cookie. For the uninitiated, an IP address is a number (sometimes permanent, sometimes one-off) assigned to a computer – it ensures the right search results appear on the right screen. And a cookie is a file which records people’s preferences - so that users don’t continually have to re-set their computers.

While none of this information actually identifies individuals (it doesn’t tell us who people are or where they live), it is to some extent personal because it records their search queries. That’s why Google decided to delete the last few digits of the IP address and cookie after 18 months – breaking the link between what was typed, and the computer from which the query originated. Our move created a virtuous dynamic, with others in the search industry following suit soon afterwards. In an industry where trust is paramount, we are now effectively competing on the best privacy practices as well as services.
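For readers who want to picture what removing the last part of an IP address looks like in practice, here is a minimal sketch in Python using the standard `ipaddress` module. It is an illustration under stated assumptions, not a description of Google's actual process: the text above says only that "the last few digits" are deleted, so the choice of zeroing the final 8 bits, the function names and the log-entry fields are all hypothetical, and the cookie is left untouched here.

```python
import ipaddress


def anonymise_ipv4(address, bits_to_drop=8):
    """Zero the low-order bits of an IPv4 address (8 bits is an assumption)."""
    ip = ipaddress.IPv4Address(address)
    mask = (0xFFFFFFFF << bits_to_drop) & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(int(ip) & mask))


def anonymise_log_entry(entry, bits_to_drop=8):
    """Return a copy of a (hypothetical) log entry with its IP anonymised."""
    cleaned = dict(entry)
    cleaned["ip"] = anonymise_ipv4(entry["ip"], bits_to_drop)
    return cleaned


entry = {"query": "flights to paris", "time": "2007-09-19T10:00:00", "ip": "192.0.2.123"}
print(anonymise_log_entry(entry)["ip"])  # 192.0.2.0
```

After this step, the stored address is shared by every computer in the same block of 256 addresses, which is what breaks the link between the query and a single machine.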

Of course, that’s not to say privacy legislation doesn’t have its place in setting minimum standards. It does. At the moment, the majority of countries have no data protection rules at all. And where legislation does exist, it’s typically a hotchpotch of different regimes. In America, for example, privacy is largely the responsibility of the different states – so there are effectively 50 different approaches to the problem. The European Union, by contrast, has developed common standards, but as the UK’s own regulator has acknowledged, these are often complex and inflexible.

In any event, privacy rules in one country, no matter how well designed, are of limited use now that personal data can zip several times around the world in a matter of seconds. Think about a routine credit card transaction – this can involve six or more separate countries once the locations of customer service and data centres are taken into account.

The lack of agreed global privacy standards has two potentially damaging consequences. First, it results in the loss of effective privacy protections for individuals. How can consumers be certain their data is safe, wherever it might be located? Second, it creates uncertainty for business, which can restrict economic activity. How does a company, especially one with global operations, know what standards of data protection to apply in all the different markets where it operates?

That’s why Google is today calling for a new, more co-ordinated approach to data protection by the international community. Developing global privacy standards will not be easy – but it’s not entirely new ground. The Organization for Economic Co-operation and Development produced its Guidelines on the Protection of Privacy and Trans-border Flows of Personal Data as far back as 1980.

More encouragingly, recent initiatives in this area by the United Nations, the Asia-Pacific Economic Cooperation forum and the International Privacy Commissioners’ Conference have all focussed on the need for common data protection principles. For individuals, such principles would increase transparency and consumer choice, helping people to make informed decisions about the services they use as well as reducing the need for additional regulation. For business, agreed standards would mean being able to work within one clear framework, rather than the dozens that exist today. This would help stimulate innovation. And for governments, a common approach would help dramatically improve the flow of data between countries, promoting trade and commerce.

The speed and scale of the digital revolution have been so great that few of us can remember how life was before we had the ability to communicate, trade or search for information 24 hours a day, seven days a week. And the benefits have been so great that most people who do recall our analogue past would never want to return to the old days. The task we now face is twofold: to build trust by preventing abuse and to enable future innovation. Global privacy standards are central to achieving these goals. For the sake of economic prosperity, good governance and individual liberty, we must step up our efforts to implement them.

Friday, September 14, 2007

The Need for Global Privacy Standards

Introduction

How should we update privacy concepts for the Information Age? The total amount of data in the world is exploding, and data flows around the globe with the click of a mouse. Every time you use a credit card, or every time you use an online service, your data is zipping around the planet. Let’s say you live in France and you use a US company’s online service. The US company may serve you from any one of its numerous data centers, from the “cloud” as we say in technology circles, in other words, from infrastructure which could be in Belgium or Ireland – and which could change based on momentary traffic flows. The company may store offline disaster recovery tapes in yet another location (without disclosing the location, for security purposes). And the company may engage customer service reps in yet another country, say India. So, your data may move across 6 or 7 countries, even for very routine transactions.

As a consumer, how do you know that your data is protected, wherever it is located? As a business, how do you know which standards of data protection to apply? As governments, how do you ensure that your consumers and your businesses can participate fully in the global digital economy, while ensuring their privacy is protected?

The story illustrates the argument I want to make today. It is that businesses, governments but most of all citizens and consumers would all benefit if we could devise and implement global privacy standards. In an age when billions of people are used to connecting with data around the world at the speed of light, we need to ensure that there are minimum privacy protections around the world. We can do better, when the majority of the world’s countries offer virtually no privacy standards to their citizens or to their businesses. And the minority of the world’s countries that have privacy regimes follow divergent models. Today, citizens lose out because they are unsure about what rights they have given the patchwork of competing regimes, and the cost of compliance for businesses risks chilling economic activity. Governments often struggle to find any clear internationally recognised standards on which to build their privacy legislation.

Of course there are good reasons for some country-specific privacy legislation. The benefits of homogeneity must be balanced by the rights of legitimate authorities to determine laws within their jurisdictions. We don’t expect the same tax rules in every country, say some critics, so why should we expect the same privacy rules? But in many areas affecting international trade, from copyright to aviation regulations to world health issues, huge benefits have been achieved by the setting of globally respected standards. In today’s inter-connected world, no one country and no one national law by itself can address the global issues of copyright or airplane safety or influenza pandemics. It is time that the most globalised and transportable commodity in the world today, data, was given similar treatment.

So today I would like to set out why I think international privacy rules are necessary, and to discuss ideas about how we create universally respected rules. I don’t claim to have all the answers to these big questions, but I hope we can contribute to the debate and the awareness of the need to make progress.

Drivers behind the original privacy standards

But first a bit of history. Modern privacy law is a response to historical and technological developments of the second half of the 20th century. The ability to collect, store and disseminate vast amounts of information about individuals through the use of computers was clearly chilling when set against the collective memories of the dreadful mass-misuse of information about people that Europe had experienced during WWII. Not surprisingly, therefore, the first data privacy initiatives arose in Europe, and they were primarily aimed at imposing obligations that would protect individuals from unjustified intrusions by the state or large corporations, as reflected in the 1950 European Convention for the Protection of Human Rights and Fundamental Freedoms.

Early international instruments

After a decade of uncoordinated legislative activity across Europe, the Organisation for Economic Co-operation and Development identified a danger: that disparities in national legislations could hamper the free flow of personal data across frontiers. In order to avoid unjustified obstacles to transborder data flows, in 1980 the OECD adopted its Guidelines on the Protection of Privacy and Transborder Flows of Personal Data. It’s worth underscoring that concerns about international data flows were already being addressed in a multinational context as early as 1980, with the awareness that a purely national approach to privacy regulation simply wasn’t keeping abreast of technological and business realities.

These OECD Guidelines became particularly influential for the development of data privacy laws in non-European jurisdictions. The Guidelines represent the first codification of the so-called ‘fair information principles’. These eight principles were meant to be taken into account by OECD member countries when passing domestic legislation and include: 1) collection limitation, 2) data quality, 3) purpose specification, 4) use limitation, 5) security safeguards, 6) openness, 7) individual participation, and 8) accountability.

A parallel development in the same area but with a slightly different primary aim was the Council of Europe Convention on the Automated Processing of Personal Data adopted in 1981. The Convention’s purpose was to secure individuals’ right to privacy with regard to the automatic processing of personal data and was directly inspired by the original European Convention on Human Rights. The Council of Europe instrument sets out a number of basic principles for data protection, which are similar to the ‘fair information principles’ of the OECD Guidelines. In addition, the Convention establishes special categories of data, provides additional safeguards for individuals and requires countries to establish sanctions and remedies.

The different origins and aims of the two instruments result in rather different approaches to data privacy regulation. For example, whilst the Convention relies heavily on the establishment of a supervisory authority with responsibility for enforcement, the OECD Guidelines rely on court-driven enforcement mechanisms. These disparities have been reflected in the laws of the countries within the sphere of influence of each model. So, for example, in Europe, privacy abuses are regulated by independent, single-purpose bureaucracies, while in the US, privacy abuses can be regulated by many different government and private bodies (e.g., the Federal Trade Commission at the Federal level, Attorneys General at the State levels, and private litigants everywhere). It’s impossible to say which model is more effective, since each reflects the unique regulatory and legal culture of its own tradition. Globally, we need to focus on advocating privacy standards to countries around the world. But we should defer to each country to decide on its own regulatory models, given its own traditions.

Current situation

Today, a quarter century later, some countries are inspired by the OECD Guidelines, others follow the European approach, and some newer ones incorporate hybrid approaches by cherry-picking elements from existing frameworks, while the significant majority still has no privacy regimes at all.

After half a decade of negotiations, in 1995, the EU adopted the Data Protection Directive 95/46/EC. The EU Directive has a two-fold aim: to protect the right to privacy of individuals, and to facilitate the free flow of personal data between EU Member States. Despite its harmonisation purpose, according to a recent EU Commission Communication, the Directive has not been properly implemented in some countries yet. This shows the inherent difficulty in trying to roll out a detailed and strict set of principles, obligations and rights across jurisdictions. However, the Commission has also made it clear that at this stage, it does not envisage submitting any legislative proposals to amend the Directive.

In terms of core European standards, the best description of what the EU privacy authorities would regard as “adequate data protection” can be found in the Article 29 Working Party’s document WP 12. This document is a useful and detailed point of reference to the essence of European data privacy rules, comprising both content principles and procedural requirements. In comparison with other international approaches, EU data privacy laws appear restrictive and cumbersome, particularly as a result of the stringent prohibition on transfers of data to most countries outside the European Union. The EU’s formalistic criteria for determining “adequacy” have been widely criticized: why should Argentina be “adequate”, but not Japan? As a European citizen, why can companies transfer your data (even without your consent) to Argentina and Bulgaria and other “adequate” countries, but not to the vast majority of the countries of the world, like the US and Japan? In short, if we want to achieve global privacy standards, the European Commission will have to learn to demonstrate more respect for other countries’ approach to privacy regimes.

But at least in Europe there is some degree of harmonisation. In contrast, the USA has so far avoided the adoption of an all-encompassing Federal privacy regime. Unlike in Europe, the USA has traditionally made a distinction between the need for privacy-related legislation in respect of the public and the private sectors. Specific laws have been passed to ensure that government and administrative bodies undertake certain obligations in this field. With regard to the use of personal information by private undertakings, the preferred practice has been to work on the basis of sector-specific laws at a Federal level whilst allowing individual states to develop their own legislative approaches. This has led to a flurry of state laws dealing with a whole range of privacy issues, from spam to pretexting. There are now something like 37 different USA State laws requiring security breach notifications to consumers, a patchwork that is hardly ideal for either American consumer confidence or American business compliance.

The complex patchwork of privacy laws in the US has led many people to call for a simplified, uniform and flexible legal framework, and in particular for comprehensive harmonised Federal privacy legislation. To kick start a serious debate on this front, a number of leading US corporations set up in 2006 the Consumer Privacy Legislative Forum, of which Google forms part. It aims to make the case for harmonised legislation. We believe that the same arguments for global privacy standards should also apply to US Federal privacy standards: improve consumer protections and confidence by applying a consistent minimum standard, and ease the burdens on businesses trying to comply with multiple (sometimes conflicting) standards.
A third and increasingly influential approach to privacy legislation has been developing in Canada, particularly since the federal Personal Information Protection and Electronic Documents Act (“PIPEDA”) was adopted in 2000. The Canadian PIPEDA aims to have the flexibility of the OECD Guidelines – on which it is based – whilst providing the rigour of the European approach. In Canada, as in the USA, the law establishes different regimes for the public and private sectors, which allows for a greater focus on each. As has also been happening in the USA in recent years with state laws, provincial laws have recently taken a leading role in developing the Canadian model. Despite the fact that PIPEDA creates a privacy framework that requires the provincial laws to be "substantially similar" to the federal statute, a Parliamentary Committee carrying out a formal review of the existing framework earlier this year, recommended reforms for PIPEDA to be modelled on provincial laws. Overall, Canada should be praised for encouraging the development of progressive legislation which serves the interests of both citizens and businesses well.

Perhaps the best example of a modern approach to the OECD privacy principles is to be found in the APEC Privacy Framework, which has emerged from the work of the 21 countries of the Asia-Pacific Economic Cooperation forum. The Framework focuses its attention on ensuring practical and consistent privacy protection across a very wide range of economic and political perspectives that include global powerhouses such as the US and China, plus some key players in the privacy world (some old, some new), such as Australia, New Zealand, Korea, Hong Kong and Japan. In addition to being a sort of modern version of the old OECD Guidelines, the Framework suggests that privacy legislation should be primarily aimed at preventing harm to individuals from the wrongful collection and misuse of their information. The proposed framework points out that under the new “preventing harm” principle, any remedial measures should be proportionate to the likelihood and severity of the harm.

Unfortunately, the co-existence of such diverse international approaches to privacy protection has three very damaging consequences: uncertainty for international organisations, unrealistic limits on data flows in conflict with global electronic communications, and ultimately loss of effective privacy protection.

New (interconnected) drivers for global privacy standards

Against this background, we are witnessing a series of new phenomena that evidence the need for global privacy standards much more compellingly than in the 70s, 80s or 90s. The development of communications and technology in the past decade has had a marked economic impact and accelerated what is commonly known as ‘globalisation’. Doing business internationally, exchanging information across borders and providing global services has become the norm in an unprecedented way. This means that many organisations and those within them operate across multiple jurisdictions. The Internet has made this phenomenon real for everyone.

A welcome concomitant of the unprecedented technological power to collect and share all this personal information on a global basis is the increasing recognition of privacy rights. The concept of privacy and data protection regimes has moved from one discussed by experts at learned conferences to an issue that is discussed and debated by ordinary people who are increasingly used to the trade-offs between privacy and utility in their daily lives. As citizens’ interest in the issue has grown, so, of course, has politicians’ interest. The adoption of new and more sophisticated data privacy laws across the world and the radical legal changes affecting more traditional areas of law show that both law makers and the courts perceive the need to strengthen the right to privacy. Events which have highlighted the risks attached to the loss or misuse of personal information have led to a continuous demand for greater data security which often translates into more local laws, such as those requiring the reporting of security breaches, and greater scrutiny.

Routes to the development of global privacy standards

The net result is that we have a fragmentation of competing local regimes, at the same time as we have a massively increased ability for data to travel globally. Data on the Internet flows around the globe at nearly the speed of light. To be effective, privacy laws need to go global. But for those laws to be observed and effective, a realistic set of standards must emerge. It is absolutely imperative that these standards are aligned to today’s commercial realities and political needs, but they must also reflect technological realities. Such standards must be strong and credible but above all, they must be clear and they must be workable.

At the moment, there are a number of initiatives that could become the guiding force. As the most recent manifestation of the original OECD privacy principles, one possible route would be to follow the lead of the APEC Privacy Framework and extend its ambit of influence beyond the Asia-Pacific region. One good reason for adopting this route is that it already balances very carefully information privacy with business needs and commercial interests. At the same time, it also accords due recognition to cultural and other diversities that exist within its member economies.

One distinctive example of an attempt to rally the UN and the world’s leaders behind the adoption of legal instruments of data protection and privacy according to basic principles is the Montreux Declaration of 2005. This Declaration probably represents the first official written attempt to encourage every government in the world to adopt such instruments, and this is an ambition that must be praised. Little further was heard about the progress of the Montreux Declaration until the International Privacy Commissioners’ Conference took place in November 2006 and the London Initiative was presented. The London Initiative acknowledged that the global challenges that threaten individuals’ privacy rights require a global solution. It focuses on the role of the Commissioners’ Conference to spearhead the necessary actions at an international level. The international privacy commissioners behind the London Initiative argue that concrete suggestions must emerge in order to accomplish international initiatives, harmonise global practices and adopt common positions.

One privacy commissioner who has expressed great interest in taking an international role aimed at developing global standards is the UK Information Commissioner. The Data Protection Strategy of the Information Commissioner’s Office published at the end of June 2007 stresses the importance of improving the image, relevance and effectiveness of data protection worldwide and, crucially, recognises the need for simplification.

Way forward

The key priority now should be to build awareness of the need for global privacy standards. Highlighting and understanding the drivers behind this need – globalisation, technological development, and emerging threats to privacy rights – will help policymakers better understand the crucial challenges we face and how best to find solutions to address them.
The ultimate goal should be to create minimum standards of privacy protection that meet the expectations and demands of consumers, businesses and governments. Such standards should be relevant today yet flexible enough to meet the needs of an ever changing world. Such standards must also respect the value of privacy as an innate dimension of the individual. To my mind, the APEC Framework is the most promising foundation on which to build, especially since competing models are flawed (the USA model is too complex and too much of a patchwork, the EU model is too bureaucratic and inflexible).

As with all goals, we must devise a plan to achieve it. Determining the appropriate international forum for such standards would be an important first step, and this is a choice that belongs in the hands of many different stakeholders. It may be the OECD or the Council of Europe. It may be the International Chamber of Commerce or the World Economic Forum. It may be the International Commissioners’ Conference or it may be UNESCO. Whatever the right forum is, we should work together to devise a set of standards that reflects the needs of a truly globalised world. That gives each citizen certainty about the rules affecting their data, and the ability to manage their privacy according to their needs. That gives businesses the ability to work within one framework rather than dozens. And that gives governments clear direction about internationally recognised standards, and how they should be applied.

Data is flowing across the Internet and across the globe. That’s the reality. The early initiatives to create global privacy standards have become more urgent and more necessary than ever. We must face the challenge together.

Thursday, August 30, 2007

Slowing down: 17 minutes for privacy

In the era of the soundbite and the tabloid headline, it's almost startling to be invited to talk on radio about privacy at Google for 17 minutes. I don't normally believe in cross-posting media stuff into this blog, but it's not every day that you get a chance to talk about things slowly, in depth. The audio link is here:

http://oe1.orf.at/highlights/107732.html

All this is in connection with the Ars Electronica Privacy Symposium in Linz, Austria.

IP Geolocation: knowing where users are – very roughly

A lot of Internet services take IP-based geolocation into account. In other words, they look at a user's IP address to try to guess the user's location, in order to provide a more relevant service. In privacy terms, it's important to understand the extent to which a person's location is captured by these services. Below are some insights into how precise these systems are (or rather, are not), how geolocation is done, and how it is used in some Google services.

The IP geolocation system Google uses (similar to the approach used by most web sites) is based primarily on third-party data, from an IP-to-geo index. These systems are reasonably accurate for classifying countries, particularly large ones and in areas far from borders, but weaker at city-level and regional-level classification. As measured by one truth set, these systems are off by about 21 miles for the typical U.S. user (median), and 20% of the time they cannot place the user to within 250 miles. The imprecision of geolocation is one of the reasons that it is a flawed model to use for legal compliance purposes. Take, for example, a YouTube video with political discourse that is deemed to be “illegal” content in one country, but completely legal in others. Any IP-based filtering for the country that considers this content illegal will always be over- or under-inclusive, given the imprecision of geolocation.

IP address-based geolocation is used at Google in a variety of applications to guess the approximate location of the user. Here are examples of the use of IP geolocation at Google:

Ads quality: Restrict local-targeted campaigns to relevant users
Google Analytics: Website owners slice usage reports by geography
Google Trends: Identifying top and rising queries within specific regions
Adspam team: Distribution of clicks by city is an offline click spam signal
Adwords Frontend: Geo reports feature in Report Center

So, an IP-to-geo index is a function from an IP address to a guessed location. The guessed location for a given IP address can be as precise as a city or as vague as just a country, or there can be no guess at all if no IP range in the index contains the address. There are many efforts underway to improve the accuracy of these systems. But for now, IP-based geolocation is significantly less precise than zip codes, to take an analogy from the physical world.
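
To make the idea of an index as "a function from an IP address to a guessed location" concrete, here is a minimal sketch in Python. It is only an illustration, not how Google's or any vendor's index is actually built. The names `GeoRange`, `build_index`, `guess_location` and `SAMPLE_INDEX` are hypothetical, the address blocks are the reserved documentation ranges, and the locations attached to them are invented. The sketch assumes IPv4 only and non-overlapping ranges.

```python
import bisect
import ipaddress
from typing import NamedTuple, Optional


class GeoRange(NamedTuple):
    first: int            # first address in the range, as an integer
    last: int             # last address in the range (inclusive)
    country: str
    city: Optional[str]   # None when the index only knows the country


def build_index(rows):
    """Turn (cidr, country, city) rows into ranges sorted by first address."""
    ranges = []
    for cidr, country, city in rows:
        net = ipaddress.ip_network(cidr)
        ranges.append(GeoRange(int(net.network_address),
                               int(net.broadcast_address), country, city))
    return sorted(ranges, key=lambda r: r.first)


def guess_location(index, ip):
    """Return (country, city), (country, None), or None if no range matches."""
    addr = int(ipaddress.ip_address(ip))
    starts = [r.first for r in index]            # recomputed per call for brevity
    pos = bisect.bisect_right(starts, addr) - 1  # last range starting at or before addr
    if pos >= 0 and index[pos].last >= addr:
        return index[pos].country, index[pos].city
    return None                                  # no guess at all


# Invented sample data on reserved documentation address blocks.
SAMPLE_INDEX = build_index([
    ("192.0.2.0/24", "US", "Mountain View"),   # city-level guess
    ("198.51.100.0/24", "DE", None),           # country-level guess only
])
```

With this sample data, `guess_location(SAMPLE_INDEX, "192.0.2.17")` yields a city-level guess, `"198.51.100.200"` yields only a country, and `"203.0.113.5"` yields `None` because no range in the index contains it. Note that even a "city-level" answer is just the label attached to a whole address range, which is one reason the guess can be many miles off.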

Tuesday, August 28, 2007

Do you read privacy policies, c'mon, really?

What’s the best way to communicate information about privacy to consumers? Virtually all companies do this in writing, via privacy policies. But many are not easy to read, because they are trying to do two (sometimes contradictory) things, namely, provide consumers with information in a comprehensible format, while meeting legal obligations for full privacy disclosure. So, should privacy policies be short (universally preferred by consumers) or long (universally preferred by lawyers worried about regulatory obligations)? Perhaps a combination of the two is the best compromise: a short summary on top of a long complete privacy policy, the so-called “layered” approach. This is the approach recommended in a thoughtful study by the Center for Information Policy Leadership:
http://www.hunton.com/files/tbl_s47Details/FileUpload265/1405/Ten_Steps_whitepaper.pdf

But then I’m reminded of what Woody Allen said: “I took a speed reading course and read ‘War and Peace’ in twenty minutes. It involves Russia.” Yes, privacy summaries can be too short to be meaningful.

Indeed, maybe written policies aren’t the best format for communicating with consumers, regardless of whether they’re long or short. Maybe consumers prefer watching videos. Intellectually, privacy professionals might want consumers to read privacy policies, but in practice, most consumers don’t. We should face that reality. So, I think we have an obligation to be creative, to explore other media for communicating with consumers about privacy. That’s why Google is exploring video formats. We’ve just gotten started, and so far, we’ve only launched one. We’re working on more. Take a look and let me know what you think. Remember, we’re trying to communicate with “average” consumers, so don’t expect a detailed tech tutorial.

http://googleblog.blogspot.com/2007/08/google-search-privacy-plain-and-simple.html

Personally, I’ve also been trying to talk about privacy through other video formats, with the media. Below is just one example. I don’t know if all these videos are the right approach, but I do think it’s right to be experimenting.

http://www.reuters.com/news/video/videoStory?videoId=57250

Did you read the book, or watch the movie?

Monday, August 27, 2007

Data Protection Officers according to German law

Some of you might be interested in German law on data protection officers. I’m going to give this to you in factual terms. [This isn’t legal advice, and it’s not commentary: so, I’m not commenting on how much or little sense I think this makes in practice.]

Since August 2006, according to the German Data Protection Act, the appointment of a Data Protection Officer (“DPO”) is compulsory for any company or organization employing more than nine employees in its automated personal data processing operations.

Anyone appointed as DPO must have the required technical and technical-legal knowledge and reliability (Fachkunde und Zuverlässigkeit). He or she need not be an employee, but can also be an outside expert (i.e., the work of the official can be outsourced). Either way, the official reports directly to the CEO (Leiter) of the company; must be allowed to carry out his or her function free of interference (weisungsfrei); may not be penalized for his or her actions; and can only be fired in exceptional circumstances, subject to special safeguards (but note that this includes being removed as DPO at the suggestion of the relevant DPA). The company is furthermore required by law to provide the official with adequate facilities in terms of office space, personnel, etc.

The main task of the DPO is to ensure compliance with the law and any other data protection-relevant legal provisions in all the personal data processing operations of his employer or principal. To this end, the company must provide the DPO with an overview of its processing operations that must include the information which (if it were not for the fact that the company has appointed a DPO) would have had to be notified to the authorities as well as a list of persons who are granted access to the various processing facilities. In practice, it is often the first task of the DPO to compile a register of this information, and suggest appropriate amendments (e.g., clearer definitions of the purpose(s) of specific operations, or stricter rules on who has access to which data). Once a DPO has been appointed, new planned automated processing operations must be reported to him or her before they are put into effect.

The DPO’s tasks also include verifying the computer programs used and training the staff working with personal data. More generally, he has to advise the company on relevant operations, and to suggest changes where necessary. This is a delicate matter, especially if the legal requirements are open to different interpretations. The Act therefore adds that the official may, “in cases of doubt”, contact the relevant DPA. However, except in the special context of “prior check” issues, the Act does not make this obligatory.

It is important to note that the DPO in Germany is not just a cosmetic function, and it is important for the company and DPO to take his role seriously. Thus, the DPO must be given sufficient training and resources to do his job properly. Failure to take the DPO function seriously can have serious legal consequences, both for the company and the DPO.

When appointing a DPO, it is important to identify potential incompatibility and conflict of interests between this position and other positions of the person within the company. Non-compliance with the law is an administrative offense which can be punished by a fine of up to € 25,000. Moreover, the DPA can order the dismissal of the DPO if he or she also holds a position which is incompatible with the role as DPO. Finally, non-compliance may give rise to liability under the Act.

Unfortunately, with regard to conflicts of interest there is no clear picture, and much depends on local requirements and views by local DPAs. In general, the following positions are considered to be incompatible with the position of a DPO:

CEO, Director, Corporate Administrators, or other managerial positions that are legally or statutorily compulsory
Head of IT / IT Administrator
Head of HR
Head of Marketing
Head of Sales
Head of Legal
Executives of corporate units processing massive or sensitive personal data

Employees in the administrative department and employees in the legal department are more likely considered to have no conflicts of interest. Finally, views differ considerably with regard to the position of an internal auditor and the head of corporate security. An IT security manager can be appointed if he is independent in the organization of the department.

Finally, German law does not provide for having a “Group DPO” that oversees a group of companies or a holding (Konzerndatenschutzbeauftragter). Such a DPO needs to be appointed by every single entity and also has to implement local data protection coordinators.

Tuesday, July 17, 2007

Safe Harbor: the verification problem

A company that signs up to comply with the provisions of the Safe Harbor Agreement for the transfer of personal data from Europe to the US must have a process to verify its compliance. There’s very little in the way of “official” guidance on this question. I’ve spent some time trying to figure out how companies can verify compliance. Here are three options, and companies should choose the model that fits best with their corporate culture and structure.

Traditional Audits

A company can conduct a traditional audit of privacy practices company-wide. The problem with company-wide audits based on traditional checklists, however, is that no two people read the checklist the same way; and all the incentives are to be brief and forgetful when filling out a form. If the checklist is used by an interviewer, the return on investment of time goes up in terms of quality of information, but only so much as the interviewer has the knowledge of the product and the law to ask the right questions. The bigger and more diverse the company, the more daunting the task and the less consistent the information collected.

The traditional auditor approach to verification usually includes massive checklists, compiled and completed by a large team of consultants, usually driven by outputs that require formal corrective action reporting and documented procedures, and cost a fortune. To an auditor, verification means proof, not process; it means formal procedures that can be tested to show no deviation from the standard, and corrective action steps for procedures that fail to consistently deliver compliance.

Alternative Model – Data Flow Analysis

An alternative model involves a simpler procedure focusing on risk. It shows that a company is least at risk when it collects information, and that the risk increases as it uses, stores and discloses personal information to third parties. The collection risk is mitigated through notice of the company’s privacy practices; IT security policies that include authorizations for access and use of information mitigate the risks associated with storage and use; and strong contractual safeguards mitigate the risk on disclosure of personal information.

A sound privacy policy is built around understanding how data flows through an organization. Simply put, you ask the following four questions:

What personal information do you collect?
What do you use it for?
Where is it stored and how is access granted to it?
To whom is it disclosed?

The results must then be compared to the existing privacy policy for accuracy and completeness. The best way to do that is on the front-end of the interview, not after the fact. In other words, preparation for each interview should include a review and analysis of the product and the accompanying policy.
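
As a rough illustration of that comparison step, the sketch below records the answers to the four questions for one product and lists anything the interviews turned up that the published policy does not mention. It is only one possible way to organise the output, under stated assumptions: the names `DataFlow` and `policy_gaps`, the field names, and the sample product are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataFlow:
    """Answers to the four data flow questions for one product."""
    product: str
    collected: set = field(default_factory=set)    # what personal information is collected
    purposes: set = field(default_factory=set)     # what it is used for
    storage: set = field(default_factory=set)      # where it is stored / who has access
    recipients: set = field(default_factory=set)   # to whom it is disclosed


def policy_gaps(actual, disclosed):
    """Return, per question, items found in practice but absent from the policy."""
    gaps = {}
    for question in ("collected", "purposes", "storage", "recipients"):
        missing = getattr(actual, question) - getattr(disclosed, question)
        if missing:
            gaps[question] = sorted(missing)
    return gaps


# Invented example: what interviews found versus what the policy says.
found = DataFlow("ExampleMail", {"email address", "IP address"},
                 {"message delivery"}, {"primary data center"}, {"support vendor"})
policy = DataFlow("ExampleMail", {"email address"},
                  {"message delivery"}, {"primary data center"}, set())
gaps = policy_gaps(found, policy)
```

Here `gaps` flags the undisclosed IP address collection and the undisclosed vendor. An empty result would suggest the policy is complete for that product, though it says nothing about whether the policy also lists practices the company has since dropped.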

A disadvantage with the above approach is that it is somewhat labor intensive and time consuming. Note however that this procedure is not a traditional audit, which can take far longer, cost much more and generally is backward looking (i.e., what did you do with data yesterday?). Instead, the data flow analysis identifies what the company does with data on an ongoing basis and armed with that knowledge, permits the company to continuously improve its privacy policies – it is a forward-looking approach that permits new internal tools or products to be developed around the output. For example, one side benefit of this approach is that every service would yield up the data elements captured and where they are stored.

Sub-Certification Method

There is yet one more alternative – the use of SoX-like sub-certifications to verify the accuracy and completeness of product or service privacy statements. Sarbanes-Oxley requires the company CFO and CEO to certify that the information provided to the public regarding the company’s financial matters is true. In order to make the certification, most companies have established a system of sub-certifications where those officers and employees with direct, personal knowledge of the underlying facts certify up that the information is correct.

The same could be done in regard to privacy. There is a two-fold advantage from this approach. First, it emphasizes the importance of the information collection by attaching to it the formality of a certification. Second, it can inform a training program as it forces periodic review of the policy and therefore attention to its existence and relevance.

How granular should the inquiry be at the product level? In a distributed model of verification, the manner and means of confirming the accuracy of the content can be left to the entrepreneurial talents of the managers. The key is to ensure that the information provided is complete and accurate, and that the product lead and/or counsel are willing to certify the results.

There is very little guidance publicly available that informs the process of an in-house review, but it is hard to criticize the very same process accepted for validation of a company’s financial statements upon which individual consumers and investors rely for financial decision-making.

Monday, July 16, 2007

Safe Harbor Privacy Principles

Some privacy advocacy groups have made the claim (and others have repeated it) that Google doesn’t comply with any "well-established government and industry standards such as the OECD Privacy Guidelines." That’s just plain incorrect. Google complies with the robust privacy requirements of the US-EU Safe Harbor Agreement, as disclosed in its Privacy Policy. http://www.google.com/intl/en/privacy.html

The Safe Harbor privacy principles are generally considered to exceed the requirements of the OECD Privacy Guidelines, since they were designed to provide an equivalent level of privacy protection to the laws of the European Union. http://www.export.gov/safeharbor/
As a reminder, here are the privacy principles of the Safe Harbor Agreement:

WHAT DO THE SAFE HARBOR PRINCIPLES REQUIRE?

Organizations must comply with the seven safe harbor principles. The principles require the following:

Notice
Organizations must notify individuals about the purposes for which they collect and use information about them. They must provide information about how individuals can contact the organization with any inquiries or complaints, the types of third parties to which it discloses the information and the choices and means the organization offers for limiting its use and disclosure.
Choice
Organizations must give individuals the opportunity to choose (opt out) whether their personal information will be disclosed to a third party or used for a purpose incompatible with the purpose for which it was originally collected or subsequently authorized by the individual. For sensitive information, affirmative or explicit (opt in) choice must be given if the information is to be disclosed to a third party or used for a purpose other than its original purpose or the purpose authorized subsequently by the individual.
Onward Transfer (Transfers to Third Parties)
To disclose information to a third party, organizations must apply the notice and choice principles. Where an organization wishes to transfer information to a third party that is acting as an agent(1), it may do so if it makes sure that the third party subscribes to the safe harbor principles or is subject to the Directive or another adequacy finding. As an alternative, the organization can enter into a written agreement with such third party requiring that the third party provide at least the same level of privacy protection as is required by the relevant principles.
Access
Individuals must have access to personal information about them that an organization holds and be able to correct, amend, or delete that information where it is inaccurate, except where the burden or expense of providing access would be disproportionate to the risks to the individual's privacy in the case in question, or where the rights of persons other than the individual would be violated.
Security
Organizations must take reasonable precautions to protect personal information from loss, misuse and unauthorized access, disclosure, alteration and destruction.
Data integrity
Personal information must be relevant for the purposes for which it is to be used. An organization should take reasonable steps to ensure that data is reliable for its intended use, accurate, complete, and current.
Enforcement
In order to ensure compliance with the safe harbor principles, there must be (a) readily available and affordable independent recourse mechanisms so that each individual's complaints and disputes can be investigated and resolved and damages awarded where the applicable law or private sector initiatives so provide; (b) procedures for verifying that the commitments companies make to adhere to the safe harbor principles have been implemented; and (c) obligations to remedy problems arising out of a failure to comply with the principles. Sanctions must be sufficiently rigorous to ensure compliance by the organization. Organizations that fail to provide annual self certification letters will no longer appear in the list of participants and safe harbor benefits will no longer be assured.

While the Safe Harbor Agreement principles were designed as a framework for companies to comply with European-inspired privacy laws, the OECD Guidelines from the year 1980 were designed as a framework for governments to create privacy legislation. http://www.oecd.org/document/18/0,2340,en_2649_34255_1815186_1_1_1_1,00.html
The US has chosen to not (yet) implement those principles into its Federal legislation. As a public policy matter, in the US, Google is working with other leading companies to encourage the development of robust Federal consumer privacy legislation. http://googleblog.blogspot.com/2006/06/calling-for-federal-consumer-privacy.html
I’ll come back to the issue of US Federal and global privacy standards again soon. The global nature of data flows on the Internet requires renewed focus on the need for global privacy standards. I hope privacy advocates will work with us on that.

Monday, July 9, 2007

I know people who spent their entire childhood hiding from the German government

Governments around the world are asking whether they should restrict anonymity on the Internet in the name of security. Take Germany as an example. Should Internet service providers be required to verify the identity of their users? Germany recently proposed, and then retreated from, requiring providers of email services to verify the identity of their account holders. However, Germany is on the path to requiring providers of VoIP services to verify the identity of their users. The debates about the proper limits of anonymity on the Internet are profound. In case you’re interested in the details, here is a history of the proposals in Germany, from the drafts of the telecommunication surveillance act. German outside counsel summarized these for me.

* 8. Nov. 2006 - First draft submitted to the Government
The German Ministry of Justice put together the first draft of a law designed to reform telecommunications monitoring and to implement the directive adopted by the European Union on the retention of traffic and location data. This draft contained the proposal that email service providers should be obliged to COLLECT and to STORE account data, name, address, date of birth, and start date of the contractual relationship (proposed changes to §111 TKG).

* 18 April 2007 - First Draft of the German Government - "Regierungsentwurf"
The draft of the German Government did not include an obligation for email service providers to COLLECT personal information. It contained, however, the obligation to STORE a personal identifier as well as name and address of the account holder IF the provider collects such data (proposed changes to §111 TKG).
Text: http://www.bmj.bund.de/files/-/2047/RegE%20TK%DC.pdf

* 29. May 2007 - Recommendation ("Empfehlung") of different working groups to the German Federal Assembly (Bundesrat)
The text did not propose additional requirements for email service providers to collect or to store personal data. However, it recommended that telecommunication service providers should be obliged to verify via the official ID card whether the telecommunication user is the person who signed up for the service (proposed changes to § 95 sec. 4 sent. 1 TKG). German legal experts expressed the opinion that this might also be applicable to email services.

* 8. June 2007 - Statement of the German Federal Assembly (Bundesrat) - "Stellungnahme des Bundesrates"
The Bundesrat did not follow the recommended wording and did not suggest any changes to the First Draft of the German Government as of 18 April 2007 with regard to email services.

So, in conclusion, anonymous use of Internet services is very much up in the air in Germany as regards certain services, such as VoIP services like Google Talk, even if the proposal to limit anonymity for email users appears to be off the table. Fundamental rights are in play. The age-old trade-off between government security and privacy is being re-debated. I know people who spent their entire childhood hiding from the German government.

Saturday, June 23, 2007

The Working Party

The Working Party is a group of representatives from every European country’s data protection authority plus the European Commission, dedicated to working on the harmonized application of data protection across Europe. I think I have the (perhaps dubious) distinction of being the private sector privacy professional who has worked the most with this group in the last decade. Most of my peers flee the Working Party like the plague, but I agree with Mae West, who said, “Too much of a good thing is wonderful.”

In my many years of privacy practice, I’ve always thought the best strategy is to work constructively with the Working Party. They are thoughtful privacy regulators, trying to improve privacy practices and to enforce often-unclear data protection laws. The companies I worked for are committed to improving their privacy practices and to complying with European laws. And the Working Party itself is committed to becoming more effective at working with the private sector, and in particular with the technology sector. So, based on my many years of experience, how could this all work better? And by the way, if you think I’ll be biased and self-serving in making these observations, feel free to stop reading here.

Here’s my golden rule: when regulators want to change practices across an entire industry, then they shouldn’t just work with one company. To make the point, here’s a little timeline summary of the recent Working Party exchanges with Google.

November 2006: the international data protection authorities issued a resolution calling on all search companies to limit the time periods during which they retain personally-identifiable data. No leading search company publicly disclosed a finite retention period at this time.

http://ec.europa.eu/justice_home/fsj/privacy/news/docs/pr_google_annex_16_05_07_en.pdf

March 2007: Google chose to lead the industry by announcing it would anonymize its search server logs after 18-24 months.

http://googleblog.blogspot.com/2007/03/taking-steps-to-further-improve-our.html

This generated considerable positive press, in my opinion quite justified, as the first such move by a leading search company.

May 2007: the Working Party sent Google a letter asking it to explain its retention decisions, and to justify whether this period was “too long” under European data protection principles. This set off a worldwide press storm, as hundreds of newspapers ran headlines like: “Google violates EU data protection laws.” And many of the EU privacy regulators added fuel to the media flames, as they issued comments expressing their concerns about “Google”, or even declaring Google’s practices to be “illegal”, without even waiting for Google to respond to their letter.

June 2007: Various privacy advocates jumped on the publicity bandwagon. One even went so far as to declare Google to be the “worst” in terms of privacy, due to the vagueness of its data collection and data retention practices. But since Google was the only one of the entire list of companies to have publicly stated a finite retention period, I would have thought Google should have been declared the “best.” Of course, that report was thoroughly de-bunked by more thoughtful industry observers, such as Danny Sullivan: “Google Bad on Privacy? Maybe it’s Privacy International’s Report that Sucks.” http://searchengineland.com/070610-100246.php

Nonetheless, the press damage was done. Even my dad called me after reading his small-town Florida newspaper to ask me why I was so bad at my job. Argh.

Then, I published a long open letter explaining the factors Google took into account while announcing a new retention period of 18 months: privacy, security, innovation, retention obligations. http://googleblog.blogspot.com/2007/06/how-long-should-google-remember.html
I wanted us to be transparent about our analysis and the factors that guided it. Of course, I couldn’t really describe all the security reasons for log retention: you can’t describe all your security practices publicly without undermining your security. And you can’t describe all your uses of data for search algorithm improvements without revealing trade secrets to your competitors. But nonetheless, I think we have been remarkably transparent throughout this process. Meanwhile, our competitors have been completely, studiously silent.

Finally, the Working Party realized how unfair all this had become for Google, and told the press that its sub-group, called the Internet Task Force, would consider these issues further in July, and include other search companies in the review.

I’m quite eager to hear from other search companies. I undertook a thorough and thoughtful analysis of Google’s need for logs for these various (sometimes conflicting) purposes. I am intellectually curious to understand whether our peer companies balance these factors in the same way as we did, or differently. Will they announce retention periods too? And will they announce periods that are longer or shorter than ours?

Privacy on the Internet concerns everyone, and all companies. The Working Party has got to learn how to engage with the industry. I continue to remain committed to working with the Working Party, but I fear that other companies in the industry will draw the opposite lesson: keep a low profile and try as hard as possible not to make it onto their radar screen. That would be bad for privacy. Well, the Working Party is a work in progress. And I hope someone tells my dad I’m not doing such a bad job… Or maybe my studiously-silent peers were right, and I was wrong…?

Thursday, June 14, 2007

Server Logs and Security

I recently posted a blog to explain why Google retains search server logs for 18 months before anonymizing them.
http://googleblog.blogspot.com/2007/06/how-long-should-google-remember.html
Security is one of the important factors that went into that decision. Google uses logs to help defend its systems from malicious access and exploitation attempts. You cannot have privacy without adequate security. I've heard from many people, all agreeing that server logs are useful tools for security, but some asking why 18 months of logs are necessary. One of my colleagues at Google, Daniel Dulitz, explained it this way:

"1. Some variations are due to cyclical patterns. Some patterns operate on hourly cycles, some daily, some monthly, and others...yearly. In order to detect a pattern, you need more data than the length of the pattern.

2. It is always difficult to detect illicit behavior when bad actors go to great lengths to avoid detection. One method of detecting _new_ illicit behaviors is to compare old data with new data. If at time t all their known characteristics are similar, then you know that there are no _new_ illicit behaviors visible in the characteristics known at time t. So you need "old" data that is old enough to not include the new illicit behaviors. The older the better, because in the distant past illicit behaviors weren't at all sophisticated.

3. Another way of detecting illicit behaviors is to look at old data along new axes of comparison, new characteristics, that you didn't know before. But the "old" data needs to run for a long interval because of (1). So its oldest sample needs to be Quite Old. The older the data, the more previously undetected illicit behaviors you can detect.

4. Some facts can be learned from new data, because they weren't true before. Other facts have been true all along, but you didn't know they were facts because you couldn't distinguish them from noise. Noise comes in various forms. Random noise can be averaged out if you have more data in the same time interval. That's nice, because our traffic grows over time; we don't need old data for that. But some noise is periodic. If there is an annual pattern, but there's a lot of noise that also has an annual period, then the only way you'll see the pattern over the noise is if you have a lot of instances of the period: i.e. a lot of years.

This probably isn't very surprising. If you're trying to learn about whether it's a good idea to buy or rent your house, you don't look only at the last 24 months of data. If you're trying to figure out what to pay for a house you're buying, you don't just look at the price it sold for in the last 24 months. If you have a dataset of house prices associated with cities over time, and someone comes along and scrubs the cities out of the data, it hasn't lost all its value, but it's less useful than it was."
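Daniel's first point, that you need more data than the length of a pattern in order to detect it, can be illustrated with a minimal sketch in Python. This is an illustration only, not Google's actual log analysis: the function names (`lagged_pairs`, `autocorrelation`, `monthly_series`) and the synthetic monthly data are assumptions made up for the example.

```python
import math


def lagged_pairs(n_samples: int, period: int) -> int:
    """Count the (x[t], x[t + period]) pairs available in a series.

    A cycle of length `period` can only be compared against itself if
    the series is longer than one period.
    """
    return max(0, n_samples - period)


def autocorrelation(series, lag):
    """Sample autocorrelation at `lag`, or None when it cannot be computed."""
    n = len(series)
    if lag >= n:
        return None  # no sample has a partner one full lag later
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        return None  # a constant series carries no pattern to correlate
    covariance = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance / variance


def monthly_series(n_months, period=12):
    """Hypothetical noise-free monthly signal with a yearly cycle."""
    return [math.sin(2 * math.pi * t / period) for t in range(n_months)]
```

With 12 months of data, `autocorrelation(monthly_series(12), 12)` returns `None`: there is nothing to compare the year against. With 18 months, `lagged_pairs(18, 12)` gives only 6 year-over-year comparisons, and with 48 months there are 36. Real logs are noisy, so the fewer comparisons you have, the harder it is to tell an annual pattern from annual noise.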

Monday, June 4, 2007

Did you mean Paris France or Paris Hilton?

Here's an OpEd I contributed to the Financial Times.
http://www.ft.com/cms/s/560c6a06-0a63-11dc-93ae-000b5df10621.html

Published: May 25 2007

There was a survey conducted in America in the 1980s that asked people a deceptively simple question: "Who was shot in Dallas?" For many who had lived through the national trauma of 1963, the deliberations of the Warren Commission, the theories about the grassy knoll and the magic bullet, there was only one answer: JFK. For others, who followed every twist of the Ewing family, the oil barons' ball and Cliff Barnes's drink problem, there was also only one answer: JR.

The point of the survey was to show how the same words can have very different meanings to different people depending on their background and their interests. It is the same idea that is driving Google's personal search service.

Our search algorithm is pretty sophisticated and most people end up with what they want. But there is inevitably an element of guesswork involved. When someone searches for "Paris" are they looking for a guide to the French capital or for celebrity gossip? When someone types in "golf" are they looking to play a round on the nearest course or to buy a Volkswagen car? An algorithm cannot provide all the answers.

But if an algorithm is built to take into account an individual's preferences it has much more chance of guessing what that person is looking for. Personalised search uses previous queries to give more weight to what each user finds relevant to them in its rankings. If you have searched for information about handicaps or clubs before, a search for "golf" is more likely to return results about the game than the car. If you have been checking out the Louvre, you are less likely to have to wade through all the details of a particular heiress's personal life.
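As a rough illustration of the idea in the paragraph above, here is a minimal sketch in Python of re-ranking results using topics drawn from a user's previous queries. It is not Google's algorithm: the function `personalised_rank`, the topic labels, the base scores and the boost formula are all assumptions invented for the example.

```python
from collections import Counter


def personalised_rank(results, history):
    """Re-rank results by boosting topics seen in the user's history.

    results: list of (title, base_score, topics) tuples, where topics is a
             set of hypothetical topic labels attached to the result.
    history: list of topic labels inferred from the user's earlier queries.
    """
    interest = Counter(history)
    total = sum(interest.values()) or 1  # avoid dividing by zero

    def score(result):
        _title, base_score, topics = result
        # Share of the user's history that matches this result's topics.
        boost = sum(interest[topic] for topic in topics) / total
        return base_score * (1.0 + boost)

    return sorted(results, key=score, reverse=True)


results = [
    ("Volkswagen Golf", 1.0, {"cars"}),
    ("Golf courses near you", 0.9, {"sport"}),
]
```

With an empty history the car comes first, because its base score is higher. For a user whose history is `["sport", "sport"]`, the golf courses move to the top.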

This makes search more relevant, more useful and much quicker. But it is not for everybody. As the Financial Times has pointed out this week, personalised search does raise privacy issues. In order for it to work, search engines must have access to your web search history. And there are some people who may not want to share that information because they believe it is too personal. For them, the improved results that personalised search brings are not matched by the "cost" of revealing their web history.

The question is how do we deal with this challenge? Stop all progress on personalised search or give people a choice? We believe that the responsible way to handle this privacy issue is to ask users if they want to opt in to the service. That is why Google requires people to open an account and turn on their personalised search functionality. They do not have to give a real name to open a Google account, but even if they cannot be identified, we think they should have to give explicit consent before their web history is used. Unless they do, they will simply have the standard Google search service.

Our policy puts the user in charge. It is not something Google seeks to control. At any time they can turn off personal search, pause it, remove specific web history items or remove the whole lot. If they want, they can take the whole lot to another search engine. In other words personalised search is only available with the consent of the user.

If you think of search as a 300-chapter book, we are probably still only on chapter three. There are enormous advances to be made. In the future users will have a much greater choice of service with better, more targeted results. For example, a search engine should be able to recommend books or news articles that are particularly relevant - or jobs that an individual user would be especially well suited to.

Developing more personalised search results is crucial given how much new data is coming online every day. The University of California Berkeley estimates that humankind created five exabytes of information in 2002 - double the amount generated in 1999. An exabyte is a one followed by 18 noughts. In a world of unlimited information and limited time, more targeted and personal results can really add to people's quality of life.

If you type "Who was shot in Dallas?" into Google today, the results are as divided as the survey's respondents a quarter of a century ago. But with personalised search you are more likely to get the "right" result for you. Giving users the chance to choose a search that is better for them as individuals is something we are proud of and will continue to build on. After all, the web is all about giving people - you and me - more choice and more information.

Thursday, May 31, 2007

Sweden and government surveillance

All democratic governments need to maintain a delicate balance between 1) respect for the private lives of their citizens, and 2) police and government surveillance to combat crime. The Swedish government has proposed legislation to shift the balance radically towards government surveillance. These measures have a huge impact on the daily life of each citizen, living inside or outside Sweden. By introducing these new measures, the Swedish government is following the examples set by governments ranging from China and Saudi Arabia to the US government’s widely criticised eavesdropping programme. Do Swedish citizens really want their country to have the most aggressive government surveillance laws in Europe?

Recently, a new bill was introduced allowing the National Defence Radio Establishment (Försvarets radioanstalt, FRA) to intercept internet traffic and telephone conversations that cross Sweden's borders at some point. The FRA claims this additional surveillance power to be essential because terrorists and fraudsters now mainly rely on the internet to communicate. Operators will be obliged to co-operate with the legal authorities by channelling the data about their users to the FRA through so-called collection nodes (samverkanspunkter). While the FRA claims it is not interested in intercepting each citizen's emails and telephone conversations, it will nevertheless have the capability to do so once the bill is adopted. Citizens will not need to be suspected of fraud or any other illegal activity for their communications to be intercepted.

Apart from these stringent surveillance measures, the Minister of Justice also wants to introduce a monitoring duty for internet access providers. Minister Beatrice Ask indicated that she wants access providers to be responsible for blocking illegal internet content. Strict legislation would be adopted if the internet service providers do not take their responsibility. The Minister's position is remarkable, as European eCommerce legislation explicitly forbids imposing this type of general monitoring on access providers. It also raises the question of which types of content should be considered illegal enough to warrant blocking, and runs the risk of crippling freedom of speech.

Technical experts are not convinced that massively storing and monitoring communication data will indeed aid in the fight against terrorism and fraud. For one thing, terrorists and fraudsters can easily use special tools (such as encryption) to circumvent any wiretapping. When telephone companies and internet access providers are required to monitor, filter and store communication data, costly investments are required. In Sweden, as in most European countries, the law provides no proper compensation for these investments by the government. Obviously, end-users will – literally – pay the price for having their conversations monitored.

Technical feasibility and high costs aside, I think the most important objection against wiretapping and storing data is that they interfere with every citizen's private life, communications and freedom of speech. When a government stores and is capable of monitoring data about every single phone call, fax, email message and website visited, the safeguards provided by the European Convention on Human Rights and the European Data Protection Directive are effectively undermined.

Sometimes, a government has to make difficult choices. It would be a sad day for Sweden, if it passes the most privacy-invasive legislation in Europe, and thereby puts itself outside of the mainstream of the global Internet economy. And don't get me wrong, I love Sweden. That's why I care.

Monday, May 7, 2007

Some rules of thumb for online privacy

Here's a short opinion piece that I contributed to this month's edition of .net magazine:
http://www.netmag.co.uk/zine/latest-issue/issue163

Privacy is one of the key legal and social issues of our time. Mobile phones pinpoint where we are to within a few hundred meters. Credit cards record what we like to eat, where we shop and the hotels we stay in. Search engines track what we are looking for, and when. This places a huge duty on business to act responsibly and treat personal data with the sensitivity it deserves.

The Internet is where privacy issues are the most challenging. Any website that collects personal data about its visitors is confronted with an array of legal compliance obligations, as well as ethical responsibilities. I deal with these every day, and here are some of my rules of thumb.

First, be very clear about whether your site needs to collect “personal data” or not. “Personal data” is information about an identifiable human being. You may wish to construct your site to avoid collecting personal data, and instead only collect anonymous and statistical information, thereby avoiding all the compliance obligations of privacy law. For example, we designed Google Analytics to provide anonymous and statistical reports to the websites that use it, giving them information about their visitors in ways that do not implicate privacy laws (e.g., the geographic distribution of their visitors). Even the UK Information Commissioner’s website uses Google Analytics, and I think the disclosure that they put on their site is a best practice in terms of transparency to end users: http://www.ico.gov.uk/Global/privacy_statement.aspx

Second, if your site collects “personal data”, then you must post a privacy policy. Most sites choose to display it as a link on the bottom of each page. A privacy policy is a legal document, in which you provide “notice” to your visitors about how your site will collect and use their personal data, as well as obtain their “consent”. Because it’s a legal document, it needs to be drafted carefully. But that doesn’t mean that it needs to sound like it was written by lawyers. I think the best privacy policies are short, simple, and easy to read. If you have a complicated site, like Google’s, then it’s a good idea to present the privacy policy in a layered architecture, with a short, one-page summary on top, with links to the fuller policy, and/or with links to privacy policies for specific products or services within your site. Take a look and see if you like our model: http://www.google.com/privacy.html

Third, if your site collects “sensitive” personal data, such as information about a person’s health, sex life, or political beliefs, then you will have to obtain their explicit opt-in consent. In fact, it’s usually a good idea to obtain a user’s opt-in consent anytime your site collects personal data in an unusual, or particularly broad way that the average Internet user might not be aware of. Remember, the privacy legal standard for using a person’s personal data is “consent”, so deciding on the right level of consent will always depend on the facts and circumstances of what your site does.

Fourth, EU data protection law places restrictions on the transfer of personal data from Europe to much of the rest of the world, to places that are deemed not to have “adequate” data protection, such as the US. So, if your site operates across borders, then you should find a legal mechanism for this transfer. Google has signed up to the terms of the US-EU Safe Harbor Agreement, which legitimizes the transfers of personal data from Europe to the US, as long as the company certifies that it will continue to apply the Safe Harbor’s standard of privacy protections to the data. You can read more about that here: http://www.export.gov/safeharbor/
But the Safe Harbor is only one of various alternative methods, including: 1) the explicit consent of the data subject, or 2) “binding corporate rules”, which obligate the company to apply consistent, EU-style privacy practices worldwide, to name just two.

Finally, privacy is about more than legal compliance; it’s fundamentally about user trust. Be transparent with your users about your privacy practices. If your users don’t trust you, you’re out of business.
