When AI Models Are Pressured to ‘Behave’ They Scheme in Private, Just like Us: OpenAI – Decrypt

vm_admin

1 year ago

When AI Models Are Pressured to ‘Behave’ They Scheme in Private, Just like Us: OpenAI – Decrypt

When researchers try to prevent AI systems from “thinking bad thoughts,” the systems don’t actually improve their behavior.

Instead, they learn to conceal their true intentions while continuing to pursue problematic actions, according to new research from OpenAI.

The phenomenon, which researchers dub “obfuscated reward hacking,” offers valuable insight in the training process and shows why it is so important to invest in techniques that ensure advanced AI systems remain transparent and aligned…

Article Source
https://decrypt.co/309592/when-ai-models-are-pressured-to-behave-they-scheme-in-private-just-like-us-openai