Bluesky Faces Backlash Over User Data Scraping for AI Training

Bluesky, the social media platform often seen as a rival to Twitter, is at the center of a controversy after one million of its public posts were scraped and compiled into a dataset for training artificial intelligence (AI) models. The dataset, which contained user data alongside the posts, was reportedly uploaded to the AI platform Hugging Face by AI researcher Daniel van Strien. The upload has sparked concerns about privacy and user consent.

Van Strien accessed the data through Bluesky's Firehose API, a tool that provides a real-time stream of public data from the platform. He then published the dataset on Hugging Face for use in developing AI models and analyzing social media trends, including content moderation and posting behavior. The dataset also included decentralized identifiers (DIDs), making it possible to link posts to specific users.

Bluesky has said it will not use user data to train generative AI, but the incident has nonetheless raised alarms. The platform's Firehose API provides aggregated public data, including posts, likes, follows, and other user interactions. Because of Bluesky's open, decentralized design, third-party developers can access this data even though users have not explicitly agreed to its use in AI training.

In response to the incident, Bluesky said, "Bluesky is an open and public social network, much like websites on the Internet itself. Just as robots.txt files don’t always prevent outside companies from crawling those sites, the same applies here." The company added that it is working on ways to give users more control over their data and ensure that outside organizations respect their consent.

The dataset was swiftly removed from Hugging Face following public backlash. Van Strien later apologized for the oversight, acknowledging that the approach violated principles of transparency and consent in data collection.

This incident highlights growing concerns around the use of social media data for AI training, with many Bluesky users expressing frustration over how their content was used without their permission.
