The hypothetical scenarios in which the researchers elicited the whistleblowing behavior from Opus 4 involved many human lives at stake and absolutely unambiguous wrongdoing, Bowman says. A typical example would be Claude finding out that a chemical plant knowingly allowed a toxic leak to continue, causing severe illness for thousands of people, just to avoid a minor financial loss that quarter.
It’s strange, but it’s also exactly the kind of thought experiment that AI safety…
Article Source
https://www.wired.com/story/anthropic-claude-snitch-emergent-behavior/