Thursday, August 30, 2007

IP Geolocation: knowing where users are – very roughly

A lot of Internet services take IP-based geolocation into account. In other words, they look at a user's IP address to try to guess the user's location, in order to provide a more relevant service. In privacy terms, it's important to understand the extent to which a person's location is captured by these services. Below are some insights into how precise these are (or rather, are not), how it's done, and how they're used in some Google services.

The IP geolocation system Google uses (similar to the approach used by most web sites) is based primarily on third-party data, from an IP-to-geo index. These systems are reasonably accurate for classifying countries, particularly large ones and in areas far from borders, but weaker at city-level and regional-level classification. As measured by one truth set, these systems are off by about 21 miles for the typical U.S. user (median), and 20% of the time don't know where the user is located within less than 250 miles. The imprecision of geolocation is one of the reasons that it is a flawed model to use for legal compliance purposes. Take, for example, a YouTube video with political discourse that is deemed to be “illegal” content in one country, but completely legal in others. Any IP-based filtering for the country that considers this content illegal will always be over- or under-inclusive, given the imprecision of geolocation.

IP address-based geolocation is used at Google in a variety of applications to guess the approximate location of the user. Here are examples of the use of IP geolocation at Google:

Ads quality: Restrict local-targeted campaigns to relevant users
Google Analytics: Website owners slice usage reports by geography
Google Trends: Identifying top and rising queries within specific regions
Adspam team: Distribution of clicks by city is an offline click spam signal
Adwords Frontend: Geo reports feature in Report Center

So, an IP-to-geo index is a function from an IP address to a guessed location. The guessed location for a given IP address can be as precise as a city or as vague as just a country, or there can be no guess at all if no IP range in the index contains the address. There are many efforts underway to improve the accuracy of these systems. But for now, IP-based geolocation is significantly less precise than zip codes, to take an analogy from the physical world.


mccaffrey said...

Free service - recently launched is - uses Amazon's ec2 infrastructure and gives real time information about a visitor's location

bayesianlogic said...

real time information that's wrong is hardly useful.