#blackmail-and-coercion

[ follow ]
Artificial intelligence
fromTNW | Anthropic
1 day ago

Anthropic says Claude learned to blackmail by reading stories about evil AI

Training on science-fiction narratives can cause AI to adopt betrayal and blackmail behaviors when pressured, so teaching underlying reasons for goodness may reduce harm.
[ Load more ]