Monday, February 19, 2007

Your Data is in the “Cloud”

Henry David Thoreau was prescient again, when he wrote: “You must not blame me if I do talk to the clouds.” We’re all doing that now, even if Thoreau had more to say than most of us.

So, if your data is in the cloud, where exactly is that? The cloud is the data that exists within the physical infrastructure of the Internet. Web 2.0 services are built on the concept that data held in the cloud enables users to access and share data from anywhere, at any time, and from any Internet-enabled device. The cloud exists on the servers of the companies offering these services, as well as in the browsers on users’ own devices. To know the “location” of your data, you’d need to understand the architecture of data centers.

Some companies, like Google, have very large data centers in multiple locations. A data center is simply a warehouse building with stacks of server computers. Companies try to pick places that are near cheap, reliable sources of electricity. They tend to prefer not to specify publicly the exact locations of these data centers, for a couple of reasons. First, competitors are watching each other’s choices of data center locations. Second, strong security practices dictate that they be kept as low-profile as possible. Nonetheless, newspapers have written extensively about Google data center construction projects in Oregon and North Carolina, to name just two.

As a user of a Web 2.0 service, you expect your service provider not to lose your data and to respond to your queries quickly. Providers therefore usually replicate users’ data across more than one data center. Google users would not be happy if they lost all their data just because the power went out in Oregon. The geographical location of data centers can also be optimized to enhance the speed of a service, e.g., serving European users from a European data center can be faster than having the data cross the Atlantic. Finally, having data centers in different locations allows companies to optimize computing power, automatically shifting work from one location to another, depending on how busy the machines are.
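For readers who like to see the idea in code, here is a minimal sketch of how a service might decide which copy of your data answers a request. It is an illustration only, not a description of how Google or any other provider actually works: the `DataCenter` class, the `pick_data_center` function, the `max_load` threshold, and the site name "europe-1" are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataCenter:
    name: str            # hypothetical site label
    region: str          # coarse region, e.g. "us" or "eu"
    load: float          # fraction of capacity in use, 0.0 to 1.0
    online: bool = True  # False if, say, the power is out


def pick_data_center(centers, user_region, max_load=0.8):
    """Choose which replica should serve a user (assumed policy).

    Preference order: a site that is not overloaded, then a site in
    the user's own region, then whichever site is least busy.
    """
    available = [c for c in centers if c.online]
    if not available:
        raise RuntimeError("no replica is reachable")

    def rank(c):
        # Tuples compare element by element, and False sorts before True.
        return (c.load > max_load, c.region != user_region, c.load)

    return min(available, key=rank)


centers = [
    DataCenter("oregon", "us", load=0.5),
    DataCenter("north-carolina", "us", load=0.3),
    DataCenter("europe-1", "eu", load=0.6),
]
chosen = pick_data_center(centers, user_region="eu")
print(chosen.name)
```

Because every site in the list is assumed to hold a copy of the same data, taking one site offline or pushing its load above the threshold simply moves the request somewhere else. That is exactly why the question of where the data "is" has no single answer.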

For all those reasons, it’s actually very hard to answer the apparently simple question: “where’s my data?” Yes, data protection law was largely written in an era when data did indeed have an easily identifiable location. But now, if you want to know how your data is being protected, the important question is not “where is my data?”, but rather “who holds my data?” and “what is the privacy policy being applied to my data?”

You can’t pinpoint the location of the clouds, but you can still talk to them.

1 comment:

Unknown said...

the cloud is a metaphor. the data is on locatable servers. if you can censor you can manage data.