Wednesday, October 1, 2025

The world’s largest surveillance system…hiding in plain sight

 

The world’s largest surveillance system is watching you. 


It’s capturing (almost) everything you do on the web on (almost) every website.  And it’s hiding in plain sight. 


And it’s “legal” because it claims that you know about it, and that you consented to it.

But do you know what it is?  

Do you know what “analytics” is? Websites use analytics services to give them insights into how their users interact with their sites. Every website wants to know that. And analytics providers can give them that information. For example, an analytics provider can give a website detailed statistical reports about its users and how they interact with its site: how many people visited the site, where they came from, what they viewed or clicked on, how they navigated the site, when they left and returned, and many, many other characteristics. This data can be collected and collated over years, over thousands or millions of users.

There are many providers of analytics services, but according to analysts, there is only one 800-pound gorilla: Google Analytics.

“Google Analytics has market share of 89.31% in analytics market. Google Analytics competes with 315 competitor tools in analytics category.

The top alternatives for Google Analytics analytics tool are Tableau Software with 1.17%, Vidyard with 0.78%, Mixpanel with 0.59% market share.”

And according to other third-party analysts: “As of 2025, Google Analytics is used by 55.49% of all websites globally. This translates to approximately 37.9 million websites using Google Analytics.”

You get the point:  one company, one service is capturing the bulk of the web traffic on the planet.  Websites get statistical reports on the user interactions on their sites.  Google gets individual-level information on the actions of most everyone on the web, on most websites, click-by-click, globally.  Wow. 

Legally, a website that uses Google Analytics is contractually obligated to obtain “consent” from its visitors to apply Google Analytics. But often the disclosure on those websites is cursory, or even incomprehensible: “we use analytics”, or “we use analytics software for statistical purposes”...which sounds harmless, but hardly explains to the average user what’s actually happening. Technically, it’s simple, but invisible: a site using Google Analytics incorporates a small piece of code on its site which auto-transfers to Google, in real time, information about every interaction its users have: every visit, every click, and information about each of those visitors, on an identifiable basis.
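To make that concrete, here is a minimal sketch of the kind of hit such code reports back to Google. It is illustrative only: it uses Google’s documented GA4 “Measurement Protocol” endpoint rather than the JavaScript tag itself, and the measurement ID, API secret, client ID and event fields are all placeholders; the real browser tag sends considerably more.

```python
import json
import urllib.request

# Illustrative sketch: one "page_view" event, sent to Google Analytics via the
# documented GA4 Measurement Protocol. The snippet websites embed (gtag.js)
# automatically fires a richer version of this kind of hit on every interaction.
MEASUREMENT_ID = "G-XXXXXXX"  # placeholder: the site's Analytics ID
API_SECRET = "placeholder"    # placeholder credential

hit = {
    "client_id": "555.1234567890",  # pseudonymous per-browser ID, kept in a cookie
    "events": [{
        "name": "page_view",
        "params": {"page_location": "https://example.com/pricing"},
    }],
}

request = urllib.request.Request(
    "https://www.google-analytics.com/mp/collect"
    f"?measurement_id={MEASUREMENT_ID}&api_secret={API_SECRET}",
    data=json.dumps(hit).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
urllib.request.urlopen(request)  # one interaction, delivered to Google in real time
```

Multiply that by every click, on tens of millions of sites, and you have a sense of the scale.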

In fairness, Google Analytics has some privacy protections. Its reports to its client websites are statistical, rather than reports on individual users. But even if the websites don’t get information about users at an individually-identifiable level, Google does… And Google does not do cross-site correlation, i.e., it does not profile users across sites, for Analytics purposes. (Note, Google does exactly this cross-site correlation in the context of its Ads businesses, but that’s a different topic than this blog.)

All this is “legal” if it’s based on consent. A phrase disclosed in a privacy policy or a cookie notice, which you’ve no doubt seen, and maybe clicked on, is deemed to constitute “consent”. But really, did you, or the average user, have a clue?

I’m of the school that believes analytics tools represent a relatively low level of privacy risk to individual users. But what do you think of one company getting real-time information about how most of humanity is engaging with websites, on a planetary level? A user goes to any random site, and their data auto-transfers to Google; did they know? Since the scale of this service vastly exceeds any other on the web, this is the largest data collection on the web. Please respond with a comment if you can think of anything of similar surveillance scale. I know you can’t, but let’s engage in the thought experiment. I’m not picking on Google (I love my former employer), but in this field, which is essential to privacy, it’s the 800-pound gorilla, surrounded by a few mice.

And the photo, if you’re interested, is Chartres Cathedral, built in the era when we believed only God was all-knowing.  

Wednesday, September 24, 2025

The Irish Backdoor

It’s not a gay bar in Dublin, the Irish Backdoor, sorry if that’s why you clicked on this blog. It’s how non-EU companies, like tech companies from the US and China, use the “one stop shop” mechanism to evade the privacy regulators of 26 countries and be regulated instead by the Irish regulator, the gentle golden retriever of privacy enforcement.

I am expanding on my blogpost below. But now I’m revealing something new: how most non-EU companies, like tech companies from the US and China, have no legal right to claim the benefit of the one stop shop. Fiction or fraud? Let me explain.


Legally, a non-EU company can only claim the benefits of the one stop shop if the decisions regarding its data processing in Europe are actually made in that country.


Let me suggest a reality test. Most companies from outside the EU claim the benefit of the one stop shop in Ireland by doing the following: 1) create a corporate entity in Ireland, 2) write a privacy policy (or ask ChatGPT to write one) that tells users that the Irish corporate entity is the “controller” of their data in Europe, and 3) maintain some minimal presence in Ireland, like appointing some employee as a “data protection officer” for the entity. All this can be done in a day, and with a tiny local Irish staff. But does this meet the legal test, namely that the data processing operations in Europe are actually being decided by this Irish entity?


Most tech companies build products in their home markets: Silicon Valley, China, etc. They then roll out these products globally. Usually these products are identical worldwide, except for language interface translations. In those cases, does anyone really believe that their Irish subsidiaries are the decision-makers for how the data of their millions of European users will be processed? Perhaps that is the case for a few large non-EU companies with large operations in Ireland. For all the others, it’s hard to believe.


Maybe it’s an innocent fiction for a company from China or the US to claim it is “established” in Ireland to evade the privacy laws of 26 EU countries with millions of users.  Or maybe it’s a fraud…?


(Final note: as a former employee of Google, I must point out that nothing in this blogpost is meant to suggest anything regarding that particular company. Google has a huge workforce in Ireland.)


Meanwhile, non-EU companies are getting an easy ride in Europe, while their EU competitors aren’t. I just don’t think that’s fair to EU companies or to EU users.


Monday, September 22, 2025

Why does every US and Chinese company want to go to Ireland?

Ireland is one of the biggest winners of the EU 27 construct. It has established itself as a tax and regulatory haven for foreign (non-EU) companies. Virtually all Chinese and American companies, in particular in tech, rush to “establish” themselves in Ireland. In exchange, they get to pay a low corporate tax rate (even if their users are, and their money is made, in the other 26 EU countries) and they get to benefit from the light-touch privacy regulation of Ireland.

You’ll recall that Europe’s tough (on paper) General Data Protection Regulation of 2018 created the concept of a one-stop shop for foreign companies. So, any Chinese or American company could pick one of the EU countries as its “establishment”. Of course, they all picked Ireland, given its universal reputation for light-touch tax and regulation. Why and how Europe made this blunder is an entirely different debate: in effect, it gave a massive advantage to foreign companies over domestic European companies. A French/Italian/Spanish company would be regulated by its domestic French/Italian/Spanish regulator, who takes privacy seriously and would sanction non-compliance. But a Chinese or American tech company can do business in all those countries, while benefiting from the Irish regulatory culture, as gentle as an Irish mist.


Occasionally, a European regulator would try to take on an American or Chinese company in the field of privacy.  https://www.cnil.fr/en/cookies-placed-without-consent-shein-fined-150-million-euros-cnil

But this action wasn’t based on the core European privacy law, the GDPR, but on a rather obscure law about cookies (the ePrivacy rules), which sits outside the GDPR’s one-stop-shop mechanism.


The Trump administration has defended American companies in Europe against what it claims are discriminatory regulatory actions.  https://www.lemonde.fr/en/international/article/2025/09/06/eu-commission-reluctantly-fines-google-nearly-3-billion-despite-trump-threat_6745092_4.html#  It was therefore not a surprise to see the French regulator announce fines at the same time against one American and one Chinese company.  But it is surprising to see the Trump administration rushing to defend one of the most Democratic-leaning companies in the US.


Indeed, Europe does discriminate, in the field of privacy, in favor of non-EU Chinese and American companies, due to the one-stop-shop Irish backdoor. One can only assume European dysfunctional politics led to this absurd result, from a European perspective. Hundreds of millions of Europeans depend on a small Irish privacy regulator to ensure that the gigantic American and Chinese tech companies respect European privacy laws. Hilarious.


All of this might seem like trivial corporate politics, but the consensus is growing that humanity is allowing the tech industry to put us (I mean, our entire homo sapiens species) on a path to doom. https://www.theguardian.com/books/2025/sep/22/if-anyone-builds-it-everyone-dies-review-how-ai-could-kill-us-all  Even if we’re doomed, can we at least put up a fight?


Thursday, August 28, 2025

Hi Privacy Pros: where's your mojo?


I’ve been committed to the field of privacy for three decades, and I’ve had the pleasure of mentoring multiple generations of smart and dedicated people in the field. But I can’t remember a time when the profession felt more disempowered and disrespected than now.

Where have all the senior privacy leaders at Big Tech gone? Virtually all the Big Tech companies have lost (or fired) their most senior privacy leaders this year or recently: the top privacy leaders at Microsoft, Google, Facebook, and Apple have all exited. These are the companies that process vast amounts of personal data, so it’s not a minor question to ask why. Undoubtedly, each person who exited will have their own story, and I won’t tell it, even if I know it. But if these companies have lost their most senior privacy leaders, who is left to ensure that these companies respect their users’ privacy?


The privacy leaders of my generation (and I knew them all) shared one characteristic: they advocated internally for good privacy in their organizations, and they worked collaboratively with regulators to find solutions when required. But perhaps the collaborative model is no longer the fashion in Silicon Valley: perhaps the truculent, cage-fighting ethos is in the ascendancy, reflecting the personalities of some of its leaders: media-hungry, kick-boxing, “I am Caesar”, and anyway, I have a survivalist bunker in case it doesn’t work out. In that world, you don’t want privacy leaders, you want privacy litigators. Privacy litigators can make an easy meal of the average privacy regulator, who has tiny technical and litigation resources.


Privacy only makes sense as a human value, since its only purpose is to protect the autonomy and dignity of an individual human being.  In an age when Big Tech fires many thousands of workers (in the name of “efficiency”), often without warning, by email at 2 am, with immediate effect (I don’t need to name names, do I?), it’s fair to ask what respect they have for individual human beings.  If you don’t respect your own employees as human beings, why would you respect your users, or their or anyone’s privacy?  


Try to read a privacy policy when you randomly click on some website. It will inevitably begin with the phrase: “We care about your privacy”. Then it will go on to list the innumerable ways that they plan to violate your privacy, to track and profile your data, and to share it with hundreds of their “partners”. You cannot possibly understand these privacy statements, and neither can I. They’re not designed to explain privacy practices: they’re designed to create a veneer (or fiction) that the companies’ data collection practices have been disclosed, and that users have somehow “consented” to them. Of course, you can’t consent to something that you can’t understand, but a click looks like consent, so that’s all these companies are seeking. The latest atrocity is sites asking you to consent to tracking of your “precise location”. Usually this phrase is buried innocuously deep inside the privacy statement. If you are dumb enough, or bored enough, to click “I accept”, these companies will track your precise location (within meters) every time they encounter you on the web, share it with their hundreds of partners, and store your precise locations forever, and heaven knows what they’ll do with that. Nothing creepy there?


Thursday, June 12, 2025

It’s all about (sharing) the data, stupid: privacy meets antitrust

I spent time with a group of privacy experts recently.  We were discussing the intersection of privacy and antitrust law.  Traditionally, these two fields were very separate, with separate laws, separate regulators, and separate practitioners.  But the rise of the data-processing monopolies like Google and Facebook is forcing these two fields to converge.  When a monopoly like Google Search or Facebook is based on processing vast amounts of personal data, and when no competitor could possibly compete with these data-gorged monopolies, well, it’s obvious that antitrust law should consider forcing these monopolies to share data with potential competitors.  Otherwise, these monopolies will carry on with their “data barrier to entry”.  Data is an essential input into any of these existing or future services.  


Existing monopolies, like Google Search, do not want to share their data with potential competitors.  Duh.  So, they are making public arguments that such sharing would create a serious risk of violating the privacy of their users.  But is that true?  

Google has resorted to public blogging to warn its (3 billion) users of the risks of court-mandated data sharing.  “DOJ’s proposal would force Google to share your most sensitive and private search queries with companies you may never have heard of, jeopardizing your privacy and security. Your private information would be exposed, without your permission, to companies that lack Google’s world-class security protections, where it could be exploited by bad actors.” https://blog.google/outreach-initiatives/public-policy/doj-search-remedies-apr-2025/


Now, let’s unpack that statement. Google is clearly stating that it collects “your most sensitive and private search queries”. Its privacy policy makes it clear that it collects, retains and analyzes that data to run and improve its own services (not just Google Search). So, Google clearly analyzes your “most sensitive and private” data itself; the privacy issues, according to Google, only arise if that data is shared with other parties.


Now think about Google’s money machine, its ads network. Doesn’t that network do exactly what Google is here claiming is a terrible thing for users’ privacy? Google’s ads network collects vast amounts of its users’ “sensitive and private” surfing history, and shares it with “companies you may never have heard of”. Indeed, that’s exactly what the ads network does today. Not coincidentally, a separate antitrust case is underway regarding the Google ads monopoly. So, let’s be clear: in the context of Google Search, Google claims sharing data with third parties would be terrible for users’ privacy, but in the context of the Google ads network, all that sharing is just fine…


Privacy professionals should take a closer look at the privacy implications of any court ordering Google to share Search data with competitors. Would that really raise any privacy issues? Some experts in the field are starting to discuss the issue:  https://www.hklaw.com/en/insights/publications/2025/04/google-search-data-sharing-as-a-risk-or-remedy


Search is built on mountains of data. But they are different mountains, and each category has different privacy implications. We need to unpack data-sharing into its different categories to assess whether each has any impact on privacy.


The Index: the biggest data mountain is the Search index. That’s the index that Google Search creates by crawling the entire public web. It’s one of the largest databases on the planet, if not the largest. But it’s not a privacy issue: it’s just a crawl of the public web. Of course, there is personal data on the public web, but it’s not a privacy issue to force Google to share such data with other parties, who could also access it on the public web.


User interaction data: with its 3 billion users, and over 20 years of operation, Google Search has the largest database of user interaction data on the planet. I’m guessing it’s 1 million times larger than that of its nearest competitor, Bing. (Google can correct my guess if it wishes to.) This user interaction data is essential to teach a search engine’s algorithm how to guess what someone intends to find when they type a query. If you have billions of examples of what people are searching for, you can train your search algorithms accordingly. If you don’t have that data, you don’t have a chance. So, would it be a privacy issue, as Google menacingly suggests in its blog post, if it were forced to share such data? It depends: yes, if it were forced to share search histories (i.e., search logs) with all of the personally-identifiable data that Google collects and stores. No, if it were forced to share anonymized data sets, such as anonymized search logs.


Fortunately, many years ago, Google introduced a policy to anonymize search query logs after a number of months, in the interests of users’ privacy, and in response to regulators’ pressure. I know something about that, since I worked on that privacy initiative with my great former colleagues.

https://publicpolicy.googleblog.com/2008/09/another-step-to-protect-user-privacy.html

There is no privacy issue, none at all, with forcing a company to share anonymized user interaction data.  
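To make the distinction concrete, here is a minimal sketch of what anonymizing a single search log record can look like. The field names are invented for illustration, and real anonymization pipelines do far more than this (queries themselves can contain personal data):

```python
from dataclasses import dataclass

@dataclass
class SearchLogRecord:
    ip: str         # e.g. "203.0.113.42"
    cookie_id: str  # pseudonymous browser identifier
    query: str
    timestamp: str

def anonymize(record: SearchLogRecord) -> dict:
    """Drop the cookie ID and truncate the IP, so the record no longer points
    to an individual while the query stays useful for training."""
    octets = record.ip.split(".")
    octets[-1] = "0"  # zero out the final octet
    return {
        "ip_prefix": ".".join(octets),
        "query": record.query,
        "timestamp": record.timestamp,
    }

record = SearchLogRecord("203.0.113.42", "abc123", "best pizza near me",
                         "2025-06-12T10:00:00Z")
print(anonymize(record))  # {'ip_prefix': '203.0.113.0', 'query': 'best pizza near me', ...}
```

Sharing records like the output, stripped of identifiers, is a very different thing from sharing the raw logs.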


I get that Google is blogging as part of its anti-antitrust litigation strategy. It really, really doesn’t want to share its data with potential competitors. Litigators will advance their clients’ interests as best they can. The rest of us 3 billion users of Google Search can assess the intellectual honesty of their arguments. As far as I am concerned, there are profound privacy issues on the web, but forcing the Google Search monopoly to share its non-personally-identifiable data with potential competitors is not one of them.

Tuesday, April 29, 2025

Debating Privacy in Venice

 
I’m looking forward to seeing lots of old friends at the upcoming Privacy Symposium in Venice: https://privacysymposium.org/


For many years, I attended and spoke at privacy conferences around the world.

 

I believe in sustaining a dialogue amongst privacy professionals, regulators, academics and advocates. 


I always learned a lot from these events, and I did my best to contribute to the debates as well. 


I also believe in building human connections to the people in this field, and I’m happy to count many of them as personal friends.  


This year, I’ll join a distinguished group of regulators and practitioners on a panel entitled:  Privacy and Antitrust.


This should be interesting!  I have, after all, spent 30 years guiding Microsoft and Google… 


Like privacy itself, Venice is precious and fragile. 


We’re lucky to be there together in May, before another tech monopolist rents the city for himself in June.  


Wednesday, April 23, 2025

A Gaggle of Monopolies

 
One of the peculiarities of monopolies in the age of Big Tech is how quickly one monopoly tends to be leveraged into a group of monopolies.


Historically, building a monopoly was a rare business event, and it usually just happened in a single industry, like oil or finance. 


But the tech industry is different:  businesses build one monopoly (legally, let’s assume).


Then they quickly manage to leverage it into multiple monopolies across an array of businesses. 


You can read recent press reports about antitrust enforcement actions against Google and Meta, to take those prominent examples.  

Traditional antitrust/competition law developed to restrain individual monopolies from leveraging their existing monopolies unfairly into new markets.  But historical law seems to struggle with how to address this new phenomenon of companies that develop a portfolio of monopolies.  Of course, these tech companies leverage their monopolies to develop and support each other, in particular by sharing user data, given that all these monopolies are based on processing vast amounts of user data.  The more you have, the better you can leverage into a new market.  That’s why this antitrust/competition conundrum is also a privacy challenge.  Monopolies that process personal data and share them across their portfolio of services are processing personal data at a scale unprecedented in human history.  Europe took a first step to try to address this problem with its Digital Markets Act.  

We don’t have a legal word for a portfolio of monopolies. Calling a company a “monopolist” doesn’t capture the nature of a portfolio of interlocking monopolies. So, for inspiration, I looked to the wildly colorful English vocabulary for groups of animals.

A bloat of hippopotamuses

A parliament of owls

A gaggle of geese

A flamboyance of flamingos

A murder of crows

A company of parrots

A charm of finches

A shiver of sharks

An aggregation of snakes

A gamble of alligators

A skulk of foxes

Antitrust/competition law will have to come up with new tools to deal with this new phenomenon of portfolios of monopolies, as will the field of privacy.  Any remedies that the authorities impose will need to take into account the nature of these interlocking monopolies.  And yes, forcing a company with a portfolio of monopolies to divest one of its monopolies might be the right way forward, or to stop it from acquiring new ones.  I doubt though that a “murder of crows” will suffer terribly if it loses one crow.  

But first, let’s find a name: a “gaggle of monopolies”? A “bloat of monopolies”? Any of the above might do, with all due respect to the animals. I’m happy to let Llama or Gemini choose.

Monday, April 7, 2025

The end of US clouds in Europe?

 

How should Europe riposte to the Trump tariff wars? I visited with some friends in Madrid recently.

I was surprised how quickly the consensus was reached:  “it’s time to liberate ourselves from the US cloud providers.”

As background, if you don’t already know: the global cloud market is dominated by three US providers, Amazon, Microsoft, and Google, in that order of dominance. It’s no secret that all three run giant tech monopolies in other markets, which they leveraged into the cloud market. The cloud market isn’t particularly high-tech. It’s more about scale: the bigger you are, the more you can build a global infrastructure network and lower costs for all. There are lots of smaller local competitors, but none can match the scale of the US giants.

But Trump has changed the global understanding of the risks of entrusting your critical national infrastructure to three US companies. Could Trump order these US companies to terminate their services, immediately, in Europe or any other country? What was once unthinkable is now a possible (hopefully, improbable) scenario. European business and political leaders are now asking what they would do if the US government ordered these US cloud services to cut them off. Bedlam hardly describes what would happen. Leaders are realizing that relying heavily on US cloud providers is reckless.

There have long been criticisms of US cloud providers.  In Europe, for example, they have been criticized for utilizing tax haven structures in Ireland and Luxembourg to earn vast revenues, while paying tiny taxes.  Want to compare the income tax rates of a Madrid bus driver to the tax rates paid by Amazon, Microsoft, Google in Europe? Guess which is higher. 

Europe already has lots of laws on the books that could be used to drive toward a US cloud-free future. Europe’s main privacy law, the GDPR, has prohibitions on transferring personal data from Europe to the US, as long as the US does not have “essentially equivalent” data protection laws. (I’ll blog on this separately, but the chances of the European Court of Justice deciding that any new data transfer scheme between Europe and the US meets that test are about as high as my chances of winning the Paris marathon next weekend.)

In purely trade terms, the US runs a huge trade surplus with Europe in digital services, with cloud services high on the list. As Europeans look at ways to riposte to Trump’s trade war, consider the risks to US cloud providers. If I were a US West Coast cloud executive, I’d be quaking in my Birkenstocks. Meanwhile, the sky over Madrid is cloud-free.

Thursday, April 3, 2025

Mirror, Mirror, on the wall, who’s the wokest of them all?

 

I spent most of my life in woke organizations, namely Harvard and Google. Both institutions have lots in common: both have very talented and intelligent communities, and both are amongst the wokest organizations on the planet. Both were meritocracies, or at least they used to be, and to a lesser extent still are. But both became radical converts to wokism, and both developed cancel-cultures to stifle any dissenting voices.

Harvard appointed a black female President, with an underwhelming academic record, who later resigned, as you’ll recall, in part because she couldn’t figure out if calling for the genocide of Jews was against Harvard policy.  I was in the Google legal department, and so I celebrated when Google also appointed a black female general counsel, as my department’s boss.  I’m sure she’s brilliant, and like me, a former Harvard student.

The one thing that the woke brigades all agree on is simple:  the insufferable inherited privilege of the white male.  I wasn’t surprised to see a former Googler sue the company for discrimination:  https://nypost.com/2025/04/02/business/google-executive-discriminated-against-male-employees-bombshell-lawsuit-alleges/

To be clear, I know none of the people mentioned in the article above, and I have no opinion on the merits.  Even if I did know them, I wouldn’t publicly comment on it, out of loyalty to my former employer.  

The truth, however, is that it’s hard to be a white male at Google.  And even harder to be an older white male at Google.  When I left Google, I was a 60-year-old white male.  I never met another one at Google, not a single one.  You might be surprised that there was even one. 

Friday, March 28, 2025

23andMe, Privacy zombie

 

Will it finally go away?  23andMe is filing for bankruptcy. https://www.npr.org/2025/03/24/nx-s1-5338622/23andme-bankruptcy-genetic-data-privacy  

23andMe's entire business model peddled pseudoscience for years.  https://www.theguardian.com/commentisfree/2025/mar/27/geneticist-mourn-23andme-useless-health-information

However slick the test results presented to 23andMe customers, and however absurd its “insights” into genetic health risks or ancestry, the service did work wonders at identifying one genetic trait: stupidity. If you spit into a test tube, and sent your saliva to this shadowy company with a long history of privacy breaches, then I can conclusively determine that you are genetically…stupid. Your genome is your most personal, sensitive, unchanging identifier. You handed over this data, for cheap fun, to a company built on data mining, and even paid them money for the privilege. They will retain this data forever, unless you take steps to delete it, and whether they actually delete it when you ask is a fair question, given their shadowy history of privacy practices. At least go try to delete it, and hope it actually is deleted; whether it is, you’ll never know.

Now, your genetic data is considered this bankrupt company’s “asset” that they plan to sell to the highest bidder.  Your genome has become their bankruptcy asset.  You might have trusted 23andMe with your genetic data, but will you trust whoever buys it in a bankruptcy sale?  

The problem with genetic data isn’t just what people can do with it today, or what they can deduce about you today, it’s what they can do with it in the years ahead, as science evolves.  France, to take one example, has sensibly outlawed such home DNA testing kits.  

The company, 23andMe, may finally die now. The CEO, the former wife of a Google founder, wants to buy it out of bankruptcy. But your most sensitive personal data, if you were stupid enough to spit into a test tube for them, will live on, no matter what happens to the company.

Monday, March 17, 2025

The AI training bots are reading my 100% human-generated blog…Great?!

 
I have been posting to this blog to share my thoughts with a small community of privacy professionals. 


So, I was a bit surprised to see Blogger give me statistics:  my posts get around 10,000 views.  I was surprised, because the privacy expert community is smaller than that.  

But how many of those views were bots, in particular AI training bots?  Blogger doesn’t give me those statistics.  

We all know that AI models are trained on data.  Big models, like large language models, are trained on vast amounts of data.  In fact, they’re being trained on essentially all available data in the world.  So, given their hunger for data, in particular for human-generated content, I’m not surprised they’ll visit my little blog too.  

There’s a raging debate about whether AI training bots should be allowed to use other people’s data to train their models.  There are many voices who claim that AI bots shouldn’t be allowed to train on other people’s data, if that data is either “personal data” or under copyright.  I think they’re wrong.  

I think the key distinction is public v private data.  If I make my data public, as I do with this blog, then I should expect (and probably want) it to be read by anyone who wants to:  humans or bots.  After all, search engine crawlers have been crawling public data for decades, and almost no one seems to object.  If AI training bots are reading my blog, say, to learn about human language, or about privacy, I’m delighted.  

On the other hand, private data is private. If I use an email service, I expect that data to be private, as it’s filled with my highly personal and sensitive information. If I use a social networking service, and I set the content I upload to “private”, I expect the platform to respect that choice, including from their own or third-party bots. Failure to respect these privacy choices is a serious privacy breach, maybe even a crime, unless the owner of the data has consented to allowing their data to be used for AI training. (It’s a different discussion whether “consent” can be deduced from some updated clause in some terms of use.)

Thousands of training bots are looking for more data, especially human-generated data.  If you make your data public, then realize the bots will come read it.  You can’t really stop it.  And I think that’s fine.  
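For what it’s worth, the one voluntary opt-out that does exist is robots.txt: well-behaved crawlers check it before fetching a page, though nothing forces the badly-behaved ones to. Here is a minimal sketch of that check using Python’s standard library (the blog URL is illustrative; “GPTBot” is OpenAI’s publicly documented crawler name):

```python
from urllib.robotparser import RobotFileParser

# How a well-behaved crawler decides whether it may fetch a page.
robots = RobotFileParser()
robots.set_url("https://example.com/robots.txt")
robots.read()  # fetch and parse the site's robots.txt

for agent in ("GPTBot", "Googlebot", "*"):
    allowed = robots.can_fetch(agent, "https://example.com/2025/03/my-post.html")
    print(f"{agent}: {'may crawl' if allowed else 'blocked'}")
```

A publisher who objects to AI training can add a Disallow rule for those user-agents; compliance, of course, remains voluntary.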

The real issue is what the AI models intend to do after training on your data. If they’re learning human language (large language models), it’s not going to have any impact on your real-world privacy. But if they’re reading your data to impersonate you, to copy your voice or image or your copyrighted content, then you have every reason to object and to use the legal remedies available. I think it’s fine when bots read public data for training. The real question, and a vastly harder one to evaluate, is what their trained models should be allowed to do with it afterwards.