Friday, February 15, 2008

Can a website identify a user based on IP address?

There is a public debate about whether IP addresses should be considered “personally-identifiable data” (to use the US phrase) or “personal data” (to use the European phrase). The question is: when can a person be identified by an IP address? This is a question of significant import, since it’s relevant to every single web site on the planet, and indeed to every single packet of data transferred across the Internet. I’ve blogged about this before, but the debate has evolved.

Last year, the Article 29 Working Party of EU data protection authorities published an official Opinion on the concept of personal information which included a thorough analysis of what is meant by “identified or identifiable” person. The Opinion pointed out that someone is identifiable if it is possible to distinguish that person from others. The recitals that precede the EU data protection directive explain that to decide which pieces of information qualify as personal information, it is necessary to consider all the means likely reasonably to be used to identify the individual. As the Working Party put it, this means that a mere hypothetical possibility to single out an individual is not enough to consider that person as identifiable. Therefore, if taking into account all the means likely reasonably to be used, that possibility does not exist or is negligible, a person should not be considered as identifiable and the information would not be considered as personal data.

Two recent decisions from the Paris Appeals Court followed this logic. The Court concluded that 'the IP address doesn't allow the identification of the persons who used this computer since only the legitimate authority for investigation (the law enforcement authority) may obtain the user identity from the ISP' (27 April ruling). The Court recognized in the same decision that 'it should also be reminded that each computer connected to the Internet is identified by a unique number called "Internet address" or IP address (internet protocol) that allows to find it among connected computers or to find back the sender of a message'. In its 15 May ruling, the Court considered that 'this series of numbers indeed constitutes by no means an indirectly nominative data of the person in that it only relates to a machine, and not to the individual who is using the computer in order to commit counterfeit.' The Court’s conclusion was therefore that this collection of IP addresses did not constitute a processing of personal data, and consequently was not subject to prior authorization by the CNIL, as required by the French Data Protection Act. The CNIL has protested loudly that these court decisions are incorrect, but the CNIL’s own position of declaring “all” IP addresses to be personal data, regardless of context, seems to me to be incorrect.

Paris Appeal Court decision - Anthony G. vs. SCPP (27.04.2007)
Paris Appeal Court decision - Henri S. vs. SCPP (15.05.2007)
IP address is a personal data for all the European DPAs (2.08.2007)

Let’s take Google as an example. Like all websites, Google’s servers capture the IP addresses of their visitors. If a user is using non-authenticated Google Search (i.e., not using a Google Account to log in), then Google collects the user’s IP Address along with the search query and the date and time of the query. Can Google determine the identity of the person using that IP Address on the basis of that information alone? No. The IP Address may locate a single computer, or it may locate a computer network using Network Address Translation. Where the IP Address locates a single computer, can Google identify the person using that computer? The answer is still “no”. The IP Address makes it possible to send data to one specific computer, but it does not disclose which actual computer that is, let alone who owns it. To get to that level of granularity, Google would have to ask the ISP that issued the IP Address for the identity of the person who was using it. Even then, the ISP can only identify the account holder, not the person who was actually using the computer at any given time.

Also, the ISP is prohibited under US law from giving Google that information, and there are similar legal prohibitions under European laws. Surely, illegal means are not “reasonable” means in the terms of the Directive.

So the reality is that, for Google as for any other web site on the Internet that logs the IP Address of the computer used to access it, the chances of combining an IP Address with other information held by the ISP that issued that IP Address in order to identify anyone are indeed negligible.

However, let’s hypothesize for now that Google could ask the ISP for that information. Could the ISP give Google the identity of the person? Again, the answer is “hardly.” Why is it so difficult? First, an ISP can only link an IP Address to an account. That means that if there are multiple people, like a family, logging into the same account, only the account holder’s name is associated with the IP Address.

Second, ISPs are given a finite number of IP Addresses to assign to their subscribers. At this point there are not enough IP Addresses to cover the number of users who wish to access the Internet, so many ISPs have resorted to dynamic IP Addresses. This means that a user could be assigned a different IP Address as often as every time they access the Internet. In order to track which account is connected to an IP Address, the ISP may therefore require the actual date and time of use.
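To make the timing point concrete, here is a minimal Python sketch of how an ISP might resolve an address to an account. The lease records, the account names, the account_for helper and the addresses (taken from the 203.0.113.0/24 documentation range) are all hypothetical; real ISPs keep DHCP or RADIUS logs in their own formats. The sketch only shows that the same address maps to different subscribers at different times, so a lookup without a timestamp is meaningless.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Lease:
    """One hypothetical lease record: who held `ip` between `start` and `end`."""
    ip: str
    start: datetime
    end: datetime
    account: str


def account_for(ip, when, leases):
    """Return the account holding `ip` at time `when`, or None if no lease covers it."""
    for lease in leases:
        if lease.ip == ip and lease.start <= when < lease.end:
            return lease.account
    return None


# Hypothetical lease log: one address reused by two subscribers on the same day.
LEASES = [
    Lease("203.0.113.7", datetime(2008, 2, 14, 8, 0), datetime(2008, 2, 14, 20, 0), "account-A"),
    Lease("203.0.113.7", datetime(2008, 2, 14, 20, 0), datetime(2008, 2, 15, 8, 0), "account-B"),
]

print(account_for("203.0.113.7", datetime(2008, 2, 14, 9, 30), LEASES))   # account-A
print(account_for("203.0.113.7", datetime(2008, 2, 14, 22, 15), LEASES))  # account-B
```

Even with the right timestamp, the answer is an account holder, not the person at the keyboard.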

Finally, almost all big organizations have their own private network that sits behind a firewall. Hosts on that network may use static or dynamic IP addresses, but in either case these addresses are not visible outside the organization, because the organization uses Network Address Translation (NAT). NAT enables multiple hosts on a private network to access the Internet using a single IP Address. NAT is also a standard feature in routers for home and small office Internet connections.
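A toy Python model shows why a web server behind the other end of a NAT router learns so little. This is a simplified sketch of port-based NAT, not how any particular router works: the ToyNat class, the port pool starting at 40000, and the addresses (192.168.1.x private hosts, 198.51.100.20 standing in for the public address) are assumptions for illustration.

```python
import ipaddress
import itertools


class ToyNat:
    """Toy source-NAT router: many private hosts share one public address."""

    def __init__(self, public_ip):
        self.public_ip = public_ip
        self._next_port = itertools.count(40000)  # hypothetical public port pool
        self.table = {}  # (private_ip, private_port) -> public_port

    def outbound(self, private_ip, private_port):
        """Rewrite an outgoing connection's source to (public_ip, public_port)."""
        key = (private_ip, private_port)
        if key not in self.table:
            self.table[key] = next(self._next_port)
        return self.public_ip, self.table[key]


router = ToyNat("198.51.100.20")
server_log = []  # what a remote web server would record as the client IP
for host in ["192.168.1.10", "192.168.1.11", "192.168.1.12"]:
    if not ipaddress.ip_address(host).is_private:
        raise ValueError(f"{host} is not a private address")
    src_ip, _src_port = router.outbound(host, 51515)
    server_log.append(src_ip)

print(server_log)  # ['198.51.100.20', '198.51.100.20', '198.51.100.20']
```

Only the router's own translation table can tell the three hosts apart; the web server's log records one address for all of them.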

So again, on the balance of probabilities and taking into account any factors identified by the Working Party as relevant, the most obvious conclusion is that the IP Addresses obtained by Google and other websites are not sufficiently significant or revealing to qualify as personal data from the point of view of the EU data protection directive.

Some people have raised the question whether the government/law enforcement can identify an individual user from an IP address from Google’s logs. Google on its own cannot tie any IP to any specific ISP account or any specific computer. We simply know that the IP address locates a computer that is accessing our system. We don’t know who is using that computer. So, in order for someone to tie the IP to an account holder, there have to be at least two subpoenas issued: one to Google and a separate one to the ISP.

Others have suggested that IP addresses should be considered “personal data”, on the mistaken understanding that looking up an IP address in a “whois” directory allows IP addresses to be tied to identifiable human beings. But in reality, if you look up an IP address in a whois directory, you usually get the name of the organization that manages the IP address. So, normally, Google could determine that a user’s queries come from a particular IP address owned by, say, Comcast, but Google has no way of knowing the name or organization of the human being behind the IP address.
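The following Python sketch parses a whois-style response to show what kind of record comes back. The sample text is hypothetical and abbreviated, modelled loosely on the "Key: value" layout some regional registries use; the field names, the "Example Cable ISP" organization and the parse_whois helper are illustrative assumptions, and real responses vary by registry.

```python
# Hypothetical, abbreviated whois response in a "Key: value" layout.
SAMPLE_WHOIS = """\
NetRange:       203.0.113.0 - 203.0.113.255
CIDR:           203.0.113.0/24
OrgName:        Example Cable ISP
Country:        US
"""


def parse_whois(text):
    """Turn 'Key: value' lines into a dict, skipping lines without a value."""
    record = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            record[key.strip()] = value.strip()
    return record


rec = parse_whois(SAMPLE_WHOIS)
print(rec["OrgName"])  # Example Cable ISP
print(sorted(rec))     # ['CIDR', 'Country', 'NetRange', 'OrgName']
```

The record describes the block of addresses and the organization that manages it; nothing in it names the subscriber using any one address.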

A different question altogether is whether identifiability should equate to individualization. As discussed above, identifiability is about the likelihood of an individual being distinguished from others. But for this distinction to merit the protection afforded by privacy laws, it must be necessary to establish a link between the person and their right to privacy. For example, during the course of an online transaction between a retail web site and a customer, that customer’s identity will be protected by data privacy laws that impose obligations on the website operator (like seeking the customer’s consent for ancillary uses of customer information) and give rights to the individual (like allowing the customer to opt out of direct marketing). However, if someone who visits the web site for the first time (and therefore before any transaction takes place) is presented with a local language version of the web site as a result of the geographical identifier associated with the IP Address used to access the site, there will be an element of individualization that does not involve identifying the person. In other words, unless and until that user becomes a registered customer, the web site operator will not be able to identify that individual. But the language appearing on the pages accessed by anyone using that IP Address may be different from the language presented to those using an IP Address associated with a different geographic location.
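Individualization of this kind can be sketched in a few lines of Python. The prefix-to-country table, the country-to-language map and the page_language helper are hypothetical stand-ins for a geo-IP database; the addresses come from documentation ranges. Note that the function receives nothing but an address and returns nothing but a language code.

```python
import ipaddress

# Hypothetical geolocation table: network prefix -> country code.
GEO = {
    ipaddress.ip_network("203.0.113.0/24"): "FR",
    ipaddress.ip_network("198.51.100.0/24"): "DE",
}
LANGUAGE = {"FR": "fr", "DE": "de"}


def page_language(ip, default="en"):
    """Pick a page language from the visitor's address alone; no identity involved."""
    addr = ipaddress.ip_address(ip)
    for net, country in GEO.items():
        if addr in net:
            return LANGUAGE.get(country, default)
    return default


print(page_language("203.0.113.7"))   # fr
print(page_language("203.0.113.99"))  # fr (any visitor in the same prefix)
print(page_language("192.0.2.1"))     # en (unknown prefix falls back to default)
```

Every visitor from the same prefix gets the same treatment, which is individualization without identification.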

Should privacy laws apply in this situation? There is an obvious danger in trying to apply privacy laws as we understand them today, in terms of notice, choice, access rights or data transfer limitations, to these types of cases. For example, there is no way that websites can provide consumers with a so-called right of access to IP-address-based logs, since such databases provide no way of authenticating a user. Individualization of Internet users is a logical and beneficial result of the way in which Internet technology works, and sometimes it is also indispensable in order to comply with legal obligations such as presenting or blocking certain information in certain territories. Attempting to impose privacy requirements on situations that do not affect anyone’s right to privacy will not only hamper technological development, but will entirely contradict the common sense principles on which privacy laws were founded. Privacy laws should be about protecting identifiable individuals and their information, not about undermining individualization. No doubt some people think that the cause of privacy is advanced if data protection is extended to ever-broader categories of numerical locators like IP addresses. But let’s think hard about when these numbers can identify someone, and when they can’t. Black and white slogans are usually wrong. The real world is more complicated than that.


The Dean said...

I don't mean to be contrary, but I suspect that if I had enough data from enough websites (perhaps from having even just an image on each page) and the co-operation of only one website that held information about an individual, I could build a very complete picture of their online experience.

Yes, single IP addresses may 'hide' a large number of individuals, but that number is by no means infinite in practice. It's not rocket science to match, predict and compute, and Google already has a lot of information because it provides so many leads or clicks through search. If a different fee were charged for a click that led to a sale than for a dud, then Google could extrapolate from that what purchases an individual makes. Add to all this users conveniently having home pages and searching in the address bar, and it is pretty well all there in black and white; let's be realistic. Before long you'll have dynamic pages appearing in your favorite colors with the product displayed precisely in the context relevant to you, but it isn't quite that easy. Mistakes could be dangerous.

The biggest problem on the web is lack of authentication; it's the root of all evil, but the only way it can be achieved is by giving anonymity, at least from each other and the businesses we buy from, and even from Google. Then the web will be a fun experience, because you won't know my name or ever be able to put a real face to me, but you will be able to personalise my anonymous experience to the nth degree, happily.

The merchants will be happy without the fraud, kids will be chatting with kids not 42 yr old perverts, free speech will be free of fear, and my ID will remain my ID>:{] It's coming...

Greg said...

Hi Peter, I was just reading some of your comments in an article in Monday's Sci-Tech Today.

Here is Canadian case law to help support your argument:

PIPEDA case #319

" In her [asst. Privacy Commissioner of Canada's] view, an IP address can be considered personal information if it can be associated with an identifiable individual. "


Phil said...

Greg - you are referencing a case involving an ISP. ISPs are the only exception to the rule, as they store their customers' billing addresses and thus can link an IP to the user in the real world. Search engines and webmasters are not able to make the same association.

However, you are on the right track with regard to SMTP servers. Here are some more breadcrumbs...

IPs are not JUST used to surf the internet. They are also an intrinsic part of email communications, as the IP and the sending email server are stored within the header of each and every email you receive. This sending server (called an SMTP server), which in most cases will be the one your ISP specified when you set up your broadband (e.g

However, not every user uses public SMTP servers. It is possible to set up a private SMTP server on any Windows XP Pro machine (assuming your ISP does not block port 25). In that case, whenever you send an email, your IP will be visible in the email headers.
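How an address ends up readable in a message can be sketched with Python's standard email package. The sample message, its host names and the relay_ips helper are hypothetical, and real Received headers vary widely in format, so the bracketed-IPv4 pattern below is only a rough heuristic.

```python
import email
import re

# Hypothetical raw message; the Received header is folded onto a second line.
RAW = """\
Received: from laptop.example ([203.0.113.45])
\tby mail.example.net with SMTP; Fri, 15 Feb 2008 10:00:00 +0000
From: alice@example.net
To: bob@example.org
Subject: hello

Hi Bob
"""

# Rough heuristic: an IPv4 address in square brackets, as many MTAs record it.
BRACKETED_IPV4 = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def relay_ips(raw_message):
    """Collect bracketed IPv4 addresses from every Received header."""
    msg = email.message_from_string(raw_message)
    ips = []
    for header in msg.get_all("Received", []):
        ips.extend(BRACKETED_IPV4.findall(header))
    return ips


print(relay_ips(RAW))  # ['203.0.113.45']
```

Any recipient can run this kind of check on mail they receive, which is why the sending IP is exposed whenever a private SMTP server is used.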

Just as it is not possible to know whether a visitor's IP is dynamic or static, it is also not easily possible to know whether a user is using a public or private SMTP server without receiving the email first. This would mean that all companies using email would need to register under the Data Protection Act if Peter Schaar's suggestion that an IP be deemed a personal identifier were adopted. The disruption to business makes this unrealistic, and it could result in an unenforceable data protection policy.
Secondly, there is more chance of a user being identified by a guessed IP than by a stored one. For example, if you connect a brand-new PC running Windows XP Service Pack 1 with no Microsoft updates to the internet using an ADSL modem (rather than a router), it will be found in about 15 minutes, and once found, malware or trojans can be installed to display adverts or take control of the PC.

It is important to note that this can happen without the user visiting any websites: the mere fact that they are connected to the internet without a firewall makes their IP susceptible. This is analogous to email spam, where randomly generated email addresses are guessed in sequence; IPs can be guessed in a similar way.

A far greater threat to individuals' privacy lies with malicious programs and fake emails, as these can result in financial loss, inconvenience, and reduced trust in internet usage. Thus more resources should be allocated to dealing with phishing emails, malware and trojans, as this would have more of an impact on the masses than storing IPs and cookies. Publicity for programs such as Windows Defender, Ad-Aware, Spybot Search & Destroy, AVG, ZoneAlarm and Comodo Firewall, together with user training, would aid this cause.


Phil said...

Peter - have Google considered a compromise with Working Party 29, offering support in Google Webmaster Tools for P3P machine-readable privacy policies in return for Working Party 29 reducing their demands that IPs be treated as personal data?

A P3P policy supported by Google (and possibly Firefox) would most likely benefit the user community more than Working Party 29's preventative measures on storing IPs.

This is because users would be able to choose which types of websites store personal information about them, rather than the current system, where information is stored until the point a tracking cookie is blocked or an iGoogle account deleted (i.e. opt-out of cookies, rather than P3P opt-in).

Also, the P3P scheme is failing due to lack of support from browsers and major organisations (see "P3P work suspended"). Integrating a basic form of P3P into Webmaster Tools and providing a free P3P creator would allow webmasters to easily create a p3p.xml page, similar to sitemaps.xml.

It would then be down to Working Party 29 to encourage legal enforcement of P3P policies. It may be necessary for a third party to confirm that a website's P3P policy is correct, because some websites that use the existing P3P scheme lie about what they do with cookies, to improve user tracking and to prevent the cookie being blocked in IE due to users' privacy preferences.

Additionally, if a free Web Privacy Seal were brought out, with a link to the validator similar to the XHTML quality seal, it might encourage adoption by webmasters.

Note that a web privacy seal scheme already exists, but it does not support machine-readable privacy policies and costs $649 to $13K per year! (see Trust prices or seal)

This could provide Google with an opportunity to improve its privacy perception with users (which helps with user retention) and provide Working Party 29 with a means to revive the dying P3P policy which helps meet its aims of protecting individuals.


Anonymous said...

First comment:
" ISP can only link an IP Address to an account. That means that if there are multiple people, like a family, logging into the same account, only the account holder’s name is associated with the IP Address."
That is still identifying an individual by distinguishing them from others.
Second comment:
re Phil's post on a compromise
This is not a bazaar where we barter to reach agreement. If the data protection Commissioners consider IP addresses in most or certain circumstances to be personal data, that doesn't change just because you use different technology! You might find a more privacy friendly way of doing things, but the fact remains that IP addresses are personal data.

Phil said...

Mr Anon: technically, ISPs cannot link IPs to a Windows user.

This can only be done by search engines, or by web-tracking software on websites that store a cookie ID in the Windows user account.

So, Mum will have cookieID:123, Dad will have cookieID:124, etc. The same applies to corporate networks, which either share the same internet connection among thousands of employees or connect through a proxy server (a connection through another computer).

In both examples, IPs do not allow the user to be distinguished from other users of the same computer, or from users of the same corporate network.

So... IPs do not distinguish users; cookie IDs do.

However, unlike an IP, a cookie ID is just a random number. Unless it is linked with other information, it provides no means of identification.
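Phil's distinction can be illustrated with a small Python sketch over a hypothetical web-server log, where each entry carries the client IP, a cookie ID and the page requested. The log contents are invented for illustration and the "cookieID:123" labels simply follow the comment above.

```python
from collections import defaultdict

# Hypothetical log entries: (client IP, cookie ID, page requested).
LOG = [
    ("203.0.113.7", "cookieID:123", "/news"),
    ("203.0.113.7", "cookieID:124", "/sport"),
    ("203.0.113.7", "cookieID:123", "/weather"),
]

cookies_per_ip = defaultdict(set)    # which browsers were seen behind each IP
pages_per_cookie = defaultdict(list)  # the browsing trail of each browser
for ip, cookie, page in LOG:
    cookies_per_ip[ip].add(cookie)
    pages_per_cookie[cookie].append(page)

print(len(cookies_per_ip["203.0.113.7"]))  # 2 (two browsers behind one IP)
print(pages_per_cookie["cookieID:123"])    # ['/news', '/weather']
```

Grouping by IP lumps the household together; grouping by cookie separates the browsers, yet neither key on its own says who anyone is.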

IPs can be linked across two different systems. For instance, if Google wanted to combine DoubleClick's data with visitors' search data, IPs would allow linkage.

Note: IP linking is a rough guess, due to the variance caused by dynamic IPs.

2nd Comment - I was not bartering, I was suggesting!

The problem is that the technology is fast exceeding the confines of the laws that were written to govern it.

The Internet landscape was very different 10 & 5 years ago when DPA 1998 & EU Privacy and Electronic Communications Directive 2003 were created.

Secondly, WP29 & ICO are experts on data protection & the law; they are not experts on Internet Technologies and are not the ones who have to apply the legal changes to new IT systems. Also, they do not have the ability to mass-communicate with Webmasters; who would be the ones implementing p3p or similar privacy policies for websites.

Thus, it is even more imperative to seek industry consultation and be open to constructive SUGGESTIONS before drawing conclusions on the legal implications of judging IPs as personal, or imposing new laws affecting Internet technologies.

After all, there might be a better way to do something, or a technical situation you might have overlooked.

For example, the WP29-supported P3P scheme initiated in 2002 has largely failed due to a lack of corporate support from browsers and search engines.


Also, IPs are being used within two server log analysis programs designed to monitor user activity: Domino Web server and Cordis.

On examining the website, I notice that there is no P3P policy.

Anonymous said...

great post buddy.

Anonymous said...

The ICO website has just been hacked and has been down for over 3 hours (none of the sites on their server are responding).

Looks like they must have really peeved off an ISP, or company working with an ISP who has an understanding of hosting servers and knowledge of how to exploit security vulnerabilities.

Although, I don't endorse unlawful vigilante tactics like this, it does highlight the need for Lawyers to work with Tech Companies to prevent these sorts of things happening.

Personally, I believe this is an attempt to undermine the ICO's credibility as a legal enforcer: if the ICO is unable to protect the privacy and security of its own website, how can it enforce laws on other companies doing similar things? To summarise, their new powers to criminalize data leaks just suffered a setback.

Poor old - looks like those two press releases about Phorm on the ICO homepage have disappeared into a "Runtime Error" message. Jeffrey Brooks will be pleased ;-)

At least the ICO does not hold any personal data on its website; otherwise it would have to issue a notice to all those affected.

FYI: I have notified the server admin, whois contact, and a company specialising in government security - not much more I can do to help.

DN said...

Great comments, Phil!!!

To add to the cookie tracking point, it is up to each individual to manage their cookies. You can block cookies, delete them periodically, etc. And a lot of people do that! But there is no way that you can link an individual to an IP address when they come to your site.

To Anonymous: believe me, there are strict compliance and privacy departments in every large firm that can afford online tracking. They do have lawyers, and they tend to be even stricter than the self-regulated industry guidelines suggest. Also, their actions are transparent: you can read the disclosures on the sites you visit. As Phil said, bringing legislation into the online world without a technical understanding could be one-sided and harmful, and is likely to create rules that are impossible to implement. What you need is compromise and guidelines grounded in technical understanding. And even though there is a lot of hype around the behavioral targeting industry, it is still in its infancy!

DN said...

Great comments, Phil!!!

To add to the cookie discussion, the internet user has the discretion on which website's cookies he/she would like to keep. They have the option to block or delete. And a lot of people do that!

Moreover, as mentioned earlier, there is no way that an online marketer can find out what computer you are using based on your IP address. All they can get is click statistics that they can segment into look-alike clickstreams. The whole field of online behavioral targeting is a bit hyped up; it is still in its infancy.

To Anonymous: as Phil said, bringing legislation into the space will most likely only lead to regulations that cannot be implemented. The technical knowledge and understanding need to be there! Also, regarding your comment about having lawyers in the space, I would be surprised if you could find a company large enough to afford online tracking without a solid privacy and compliance department and vigilant lawyers! Typically, every legitimate site you visit will have a disclosure of its practices, and its privacy and compliance departments try to be tighter than the guidelines of the self-governed online world.

The Dean said...

I did a little experiment by 'following' an IP address through the logs of many web servers, eventually tying the surfer to an identity when they logged on to a site.
You too can do it.
It's as simple as searching for an IP address and checking the log entries with matching times. If any of the sites they visit use the identity, then the identity can be linked to the sites they visited in that internet session at least.
Sure, it may be Mum and Dad and you may not be able to tell which, but that would be the exception rather than the rule.
I am not a multi-million-dollar behavioural marketer with access to the same resources they have, yet I can do it.
Get real.

Anonymous said...

Hi The Dean,

You say you "followed" an IP address through the logs of many web servers. In what sense would you have access to the logs of many web servers? It's true that if someone can see one web server's log containing (userid, IP, timestamp) entries, and also has access to other web servers whose logs contain (IP, timestamp) entries, then he can draw conclusions about the userid (and thus possibly the identity) behind those other entries by matching IP and timestamp ranges. But in the real world, you don't have access to all these different web server logs. That's like saying you could figure out how much money someone has by adding up their accounts at 5 different banks, when in reality there's no way you can get into their account details, because the banks that hold them protect them from unauthorized access. So do the companies that hold web logs.

The Dean said...

In terms of access to web logs, without resorting to any black hat tricks, there are a surprisingly large number of web servers with visible logs, although we'd hope that the more reputable sites would have them hidden, at least from the average Joe.
What we are really talking about here is when you visit a site which has for instance google analytics on it, like a large number of sites, you are traceable by your cookie(s). I tend to be a non-average cookie hoarder but I have seen that there are a lot of cookies being placed in pages which are accessible by more than one site, and even sometimes a whole TLD.
Is there really some doubt that google would have access to enough information which could be mined to track a user?
I notice they already use some of it to anticipate what I might want to see advertised (usually incorrectly).
Anyone who installs an MS or Google toolbar is of course making the job easy.
I have clients who would happily pay to see what certain users or IP addresses were googling, if it were legal (and it is somewhere).
For instance might the Democrats like to know what the Republicans were searching for on the net? This might give them some insight into what issues were likely to emerge and might enable them to defuse issues before they were surprised by them.
Perhaps I'm too Machiavellian in my thinking, but having seen exactly that capability in the real political world (although not with the abovementioned parties) perhaps I know better.
It's probably a moot point anyway because I suspect the poor consumer might wake up to what is happening eventually, if not then I suppose I'll just have to become one of those using it to exploit them too, or perhaps provide something to protect them from it.
The issue search providers may face is that consumers en masse can very quickly change their minds, their search engine and even their browser, and today's favourite could easily become tomorrow's pariah.
Too much behavioural marketing might just tip it.
The point I'll finish with is that an awful lot of people are spending a lot of money trying to do it, and in my experience nothing is impossible (except time travel) and there are no secrets.
Perhaps the best model for the web would be anonymity once you enter, like a mask, except that a trusted party would note which mask you had on in case you got up to funny business. That way we could access all of our own information without identifiable tags, so it wouldn't matter that anyone can see what is flying around the wires and airwaves, because they would not know whose information it was.
Some of us are aware that nothing is secure or secret on any network anywhere, no matter how much the snake oil salesman would tell you otherwise.

I have personally used search bots to find everything from the names and addresses of the researchers, the locations of their facilities, their budgets, and research, production and deployment details, etc., of some of the world's most 'secret' defense projects, although I suspect it isn't as fruitful as it used to be.
I would be foolish to assume I was the only one, that's the scary part.
Don't bother to ask, I'll never tell, but it made me feel very scared and surprisingly secure at the same time. There's a lot of stuff going on out there, and it's not 1950 anymore if you know what I mean.

Go to a whois server and type in the name of an ISP, pick an IP address in your ISP's range and type it into Google. Most users have fixed IP addresses, certainly at home on broadband anyway.
If it's too hard try say -

maya said...

why is there no link on this blog to email the author of this blog????

why can't EU privacy standards also apply to U.S. inhabitants? (why should standards be lower for U.S. Google users than for EU users???)

AND: I have always wanted to know, and have asked at lots of places, but I have never received a reply: WHY does google save IP-address of users using google search anyway???? (I mean users who are NOT signed up, and thus have elected NOT to send any personal info (incl. IP address) to google...)

Anonymous said...

re Phil's reply to my comment that it is not a bazaar.
I was not suggesting that legislation and internet experts do not work together - in fact that is precisely what is needed. Art 29 are experts in dp, but cannot be experts in everything else, so need to work more closely with industry to get the facts straight and not pronounce on things without fully understanding the issue. What I objected to was your comment that Art 29 'reduce their demands that IP are personal data'. It is not a question of reducing demands. Art 29 consider IP addresses to be personal data in most circumstances. That view is unlikely to change, although obviously more privacy friendly technologies are more likely to meet with approval. Privacy laws are generally technology neutral, and have to be. So inevitably as technology advances and changes, the considerations have to be on whether the relevant law applies, and this can be answered for some technologies by looking at (among other things) the question: is it personal data. If the experts decide it is, then legal implications follow. The other thing to note is that although Art 29 don't know everything, there are techy people in most data protection authorities around the world, they do talk to each other, and most have good relations with industry, so they are not acting in a vacuum without consulting anyone. Also, their view has to represent the views of all 27 EU authorities, so some compromise is inevitable.

DN said...

Yes, you can google just about anything, like you said, Dean, but it is up to you whether to post your personal info online... and believe me, site marketers typically don't have the time in the day or the desire to cross strict compliance guidelines and regulations to see what you do on the web as an individual. All the info used is non-PII, and the same goes for Google: you're just an anonymous profile (if you don't delete your cookies from Google). Also, with the way the internet is growing, I personally don't mind someone sifting through the rubbish to get me more relevant results for what I'm searching for.

By the way, it is interesting how people see the negative side of collecting IPs, but it is helpful in fraud detection also!

tarocchi said...

I think it'll still take a long time before all this stuff is properly regulated...

Bonnie Yu 余巧 妍 said...

Nice post! I recently got interested in thinking about the question how to identify an individual "user" using data available on the web, and found your post as I am learning about this subject matter.

The individual "user" problem is so so so difficult. You noted the one-machine-to-many-users case, but then there's also the many-machines-to-one-user issue. I think it's such a fascinating topic. It is not feasible to do it with IP address alone; there are many potential signals. Even if people share accounts, you can imagine their behaviors are different.

It's amazing that 3 years ago people were already talking about behavioral targeting and search personalization.

Today with the advent of social media profiles identifying an individual user has gotten "easier."