#ai-training-datasets

from Ars Technica
2 days ago

Meta denies torrenting porn to train AI, says downloads were for "personal use"

Instead, Meta argued, available evidence "is plainly indicative" that the flagged adult content was torrented for "private personal use," since the small amount linked to Meta IP addresses and employees represented only "a few dozen titles per year intermittently obtained one file at a time." "The far more plausible inference to be drawn from such meager, uncoordinated activity is that disparate individuals downloaded adult videos for personal use," Meta's filing said.
Privacy technologies
from Nieman Lab
7 hours ago

Hundreds of thousands of videos from news publishers like The New York Times and Vox were used to train AI models

Last month, The Atlantic dropped the latest investigation in its ongoing series on generative AI training data sets. Staff writer Alex Reisner found that at least 15 million YouTube videos had been used for training data by major technology companies, either for research or, in some cases, to build AI video products.
Artificial intelligence
from The Atlantic
1 month ago

At Least 15 Million YouTube Videos Have Been Snatched by AI Companies

More than 15.8 million YouTube videos from 2 million channels, including nearly 1 million how-to videos, were downloaded without creators' permission to train AI.