X wants to block scrapers and crawlers

X updates its Terms to prohibit crawling/scraping of its data

But what if you do? Will you get caught?

A week ago, we covered an update to X’s terms of service that lets X use Posts (public data) to train its AI models. Today, a new update has entered the official policy: X doesn’t want anyone scraping or crawling its data without prior written consent, in other words unless there is mutual consent.

The Misuse of the Services clause now says:

NOTE: crawling or scraping the Services in any form, for any purpose without our prior written consent is expressly prohibited

The mutual consent part I am talking about relates to Google, which, for a brief period back in June, was unable to show Posts and Profiles because X took the drastic measure of blocking all unauthenticated access to the site. That limitation was eventually lifted, with a caveat:

  • You cannot see the entire Posts thread. You can only see the original Post.

It also appears that X has updated its robots.txt file:

# Every bot that might possibly read and respect this file
# ========================================================
User-agent: *
Disallow: /
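
To see how a well-behaved crawler would read these two rules, here is a minimal sketch using Python's standard urllib.robotparser module. The helper name is_allowed, the ExampleBot user agent, and the sample URL are hypothetical and only serve the illustration; the rules are copied from the excerpt above rather than fetched from the live site.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the robots.txt excerpt above (not fetched live).
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a crawler that honors robots.txt may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical sample URL; any path on the site gives the same answer.
    sample_url = "https://x.com/example/status/1"
    for agent in ("Googlebot", "ExampleBot"):
        print(agent, is_allowed(agent, sample_url))
```

Because the rule targets the wildcard user agent and disallows the root path, every agent name and every path gets the same answer. Note that this only describes what a compliant crawler would do: robots.txt is a request, not an access control.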

These rules tell every single bot and search engine to stay away from the entire site, although they only work on crawlers that respect robots.txt. Thankfully, you can use a service like the Brave Search API to get around this problem, as they don’t care about copyright and don’t have a User-Agent to block.

Thank god!

It’ll be interesting to see whether Elon (X) monitors this closely and actually takes someone to court if a third party is found to have used Posts (or any other X “Services” data) to train AI models that aren’t part of the X family.