Hundreds of thousands of videos from news publishers like The New York Times and Vox were used to train AI models

"Last month, The Atlantic dropped the latest investigation in its ongoing series on generative AI training data sets. Staff writer Alex Reisner found that at least 15 million YouTube videos had been used for training data by major technology companies, either for research or, in some cases, to build AI video products."

"'Much as ChatGPT couldn't write like Shakespeare without first "reading" Shakespeare, a video generator couldn't construct a fake newscast without "watching" tons of recorded broadcasts,' writes Reisner. The Atlantic's story briefly mentions that more than 30,000 videos from the BBC were among the training data, alongside other YouTube channels focused on news. Using a searchable database published by The Atlantic, I wanted to better understand the scale at which news channels had been targeted."

At least 15 million YouTube videos were used as training data by major technology companies, either for research or to build AI video products. Training datasets compiled or used by Microsoft, Meta, Snap, Tencent, Runway, and ByteDance included large numbers of YouTube videos. Unauthorized use of YouTube videos contributed significantly to recent improvements in the quality of AI-generated video. Hundreds of thousands of videos were taken from popular news publishers and creators on YouTube, including The New York Times, The Washington Post, The Guardian, Al Jazeera, and The Wall Street Journal. Specific channel counts include over 88,000 videos from Fox News, roughly 70,000 from ABC News, more than 55,000 from Bloomberg, and over 30,000 each from the BBC and from Vox Media-owned channels.

#ai-training-datasets #youtube-scraping #news-publishers #ai-video-generation

Read at Nieman Lab
