What happens when your AI doesn't share your values
The problem here isn't just that an AI might 'break' and go rogue; the danger of an AI taking matters into its own hands can arise even when the model is working as intended on a technical level.
Why Anthropic's New AI Model Sometimes Tries to 'Snitch'
The hypothetical scenarios the researchers presented Opus 4 with that elicited the whistleblowing behavior involved many human lives at stake and absolutely unambiguous wrongdoing.