X can now use all public data to train its AI

X/Twitter’s new Privacy Policy on AI training goes into effect today

X can now use all public data to train its AI models as per the new terms.

As I reported in early September, X (formerly Twitter) adjusted its Privacy Policy to include a phrase (under section 2.1) stating the following:

We may use the information we collect and publicly available information to help train our machine learning or artificial intelligence models for the purposes outlined in this policy.

That policy goes into effect today, September 29, 2023.

It is not immediately clear just how far “publicly available information” extends.

For example, does it cover only text, or does it also include images, videos, and audio – all formats frequently shared on X?

If it is the latter, then I expect a lot more people will need to think twice about what they publicly share in their Posts.

I gave X’s press team several days to answer this very question, but I have not yet received a response.

At the time of the initial story, Elon Musk responded to say that only public data would be used, not DMs or anything private.

And, of course, it goes without saying that you’re now locked into the perpetual jail of not being able to opt out unless you leave the platform entirely.

One week after the story broke, X updated its Terms of Service to say that no one else may scrape or crawl the platform, let alone use any of X’s data to train outside AI models, unless there is mutual consent.

Elon has been strong-arming X’s users with strange ideas time and time again ever since he acquired the platform. The ability to block users? He wants that gone. Showing headlines when posting a link on X? Nope, he doesn’t want that either. For what it’s worth, neither change has been implemented yet.

The Privacy Policy that takes effect today also includes things like X’s ability to collect biometric data and information on users’ employment and education histories for purposes such as security, verification, and job recommendations.

The company contends that biometric data will be harnessed for “safety, security, and identification,” although it leaves the term “biometric” intentionally vague. Could this include facial recognition, fingerprints, or perhaps even voice signatures? That’s yet to be clarified.

X is walking a fine line between added functionality and privacy invasion. The actual impact of this policy shift will largely depend on execution and public perception, two variables that are notoriously difficult to predict.

How will X use your public data in the context of AI?

X’s updated policy indicates it will use collected data to train machine learning or AI models, in line with multiple objectives outlined in the policy. There are several use cases where this data could be pivotal.

Firstly, content personalization. X could refine its algorithms to tailor your feed, showing you tweets, ads, or topics more closely aligned with your interests. The better the model understands your behavior, the more relevant content you’ll see, keeping you engaged on the platform longer.
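
To make this concrete, here’s a toy sketch of interest-based feed ranking in Python. It is not X’s algorithm: the topics, interest weights, and decay rate are all invented for illustration, and in a real system the weights would come from a trained model rather than being hard-coded.

```python
from dataclasses import dataclass

@dataclass
class Post:
    post_id: int
    topics: set[str]
    age_hours: float

def score(post: Post, interests: dict[str, float]) -> float:
    # Relevance: sum of the user's weights for the topics the post covers.
    relevance = sum(interests.get(t, 0.0) for t in post.topics)
    # Recency decay: a post's score roughly halves every 24 hours.
    decay = 0.5 ** (post.age_hours / 24)
    return relevance * decay

# Hand-written stand-in for what a trained model would output per user.
interests = {"ai": 0.9, "privacy": 0.7, "sports": 0.1}

feed = [
    Post(1, {"ai", "privacy"}, age_hours=2),
    Post(2, {"sports"}, age_hours=1),
    Post(3, {"ai"}, age_hours=30),
]

# Rank the feed: most relevant, freshest posts first.
for post in sorted(feed, key=lambda p: score(p, interests), reverse=True):
    print(post.post_id, round(score(post, interests), 3))
```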

Secondly, automated moderation. Advanced machine learning models can help X identify and remove spam, harassment, or other forms of harmful content more efficiently than manual moderation. This not only improves the user experience but also frees up resources for other tasks.
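
As a rough illustration of what such a classifier might look like, here is a minimal sketch using scikit-learn. The four-post “dataset” is fabricated purely to make the example runnable; a production moderation system would train on millions of labeled posts with a far more capable model.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny fabricated training set: 1 = violates policy, 0 = fine.
posts = [
    "win a free iphone click this link now",   # spam
    "limited offer claim your prize today",    # spam
    "great thread on model evaluation",        # ok
    "enjoyed the conference talk yesterday",   # ok
]
labels = [1, 1, 0, 0]

# TF-IDF features feeding a logistic regression classifier.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(posts, labels)

# Flag new posts whose predicted violation probability is high.
for text in ["claim your free prize now", "notes from today's standup"]:
    p = clf.predict_proba([text])[0][1]
    print(f"{p:.2f}  {'FLAG' if p > 0.5 else 'ok  '}  {text}")
```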

There was a story recently about X laying off its election integrity team, but I’m not sure it’s related, since X would have needed an enormous corpus of data to verify that its AI/ML models could accurately combat misinformation.

Then there’s advertising. This one’s a biggie. By understanding user behavior at a granular level, X can offer advertisers highly targeted options, increasing the platform’s revenue. The more effective the ad targeting, the higher the value proposition for advertisers. And X could definitely use more revenue.
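
One common way to implement this kind of targeting is to embed users and ad campaigns in a shared interest space and rank campaigns by similarity. The sketch below is hypothetical: the three-dimensional topic basis and the vectors themselves are hand-written stand-ins for what would, in practice, be learned embeddings.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: direction match between two interest vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Assumed topic basis: [tech, finance, sports].
user = [0.9, 0.4, 0.1]
campaigns = {
    "gpu_cloud":     [1.0, 0.2, 0.0],
    "index_funds":   [0.2, 1.0, 0.0],
    "running_shoes": [0.0, 0.1, 1.0],
}

# Show campaigns in order of fit for this user.
ranked = sorted(campaigns.items(), key=lambda kv: cosine(user, kv[1]), reverse=True)
for name, vec in ranked:
    print(f"{name}: {cosine(user, vec):.2f}")
```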

Finally, let’s talk about predictive analytics. X could foresee trends or user behaviors based on historical data, allowing them to preemptively address issues or roll out features that would be well-received.
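
A drastically simplified example: flag a topic as trending when today’s post volume sits well above its recent average. Real trend-prediction models are far more sophisticated; the daily counts below are fabricated, and the z-score threshold is an arbitrary choice.

```python
from statistics import mean, stdev

def is_trending(history: list[int], today: int, z_threshold: float = 2.0) -> bool:
    # Flag when today's volume is more than z_threshold standard
    # deviations above the historical mean.
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today > mu
    return (today - mu) / sigma > z_threshold

daily_mentions = [120, 135, 128, 110, 140, 132, 125]  # fabricated sample data
print(is_trending(daily_mentions, today=410))  # True: a sudden spike
print(is_trending(daily_mentions, today=140))  # False: within the normal range
```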

The bigger question is about ethics and data privacy. How much should a platform know about you? And what safeguards are there to ensure that this powerful technology isn’t misused? Just because they can, should they? These are debates that the tech community is grappling with, and no clear answers are emerging yet.