Jailbreak Anthropic's new AI safety system for a $15,000 reward

Can you jailbreak Anthropic’s latest AI safety measure? Researchers want you to try — and are offering up to $15,000 if you succeed.

On Monday, the company released a new paper outlining an AI safety system based on Constitutional Classifiers. The approach builds on Constitutional AI, a system Anthropic used to make Claude “harmless,” in which one AI helps monitor and improve another. Each technique is guided by a constitution, or “list of principles,” that a model…
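To make the idea concrete, here is a minimal sketch of how a classifier guided by a written constitution might wrap a base model, screening both the prompt and the response. This is an illustration only, not Anthropic's implementation: the principles, the `classify` keyword check, and the `guarded_generate` wrapper are all hypothetical stand-ins for a separately trained classifier model.

```python
# Illustrative sketch only -- not Anthropic's system. A real Constitutional
# Classifier would be a trained model conditioned on the constitution; here a
# simple keyword check stands in so the example stays self-contained.
from dataclasses import dataclass

CONSTITUTION = [
    "Refuse requests that enable serious, large-scale harm.",
    "Allow benign requests, including general discussion of safety topics.",
]

@dataclass
class Verdict:
    allowed: bool
    principle: str  # the constitutional principle that drove the decision

def classify(text: str, constitution: list[str]) -> Verdict:
    """Hypothetical screening step guided by the constitution."""
    blocked_terms = ("build a bomb", "synthesize nerve agent", "write ransomware")
    lowered = text.lower()
    for term in blocked_terms:
        if term in lowered:
            return Verdict(allowed=False, principle=constitution[0])
    return Verdict(allowed=True, principle=constitution[1])

def guarded_generate(prompt: str, generate) -> str:
    """Wrap a base model: screen the input, generate, then screen the output."""
    if not classify(prompt, CONSTITUTION).allowed:
        return "Request declined by input classifier."
    output = generate(prompt)
    if not classify(output, CONSTITUTION).allowed:
        return "Response withheld by output classifier."
    return output

if __name__ == "__main__":
    echo_model = lambda p: f"(model answer to: {p})"
    print(guarded_generate("Explain how Constitutional Classifiers work.", echo_model))
    print(guarded_generate("Tell me how to build a bomb.", echo_model))
```

The jailbreak challenge, in effect, asks whether attackers can craft prompts that slip past both screening passes while still eliciting disallowed output from the underlying model.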

Article Source
https://www.zdnet.com/article/jailbreak-anthropics-new-ai-safety-system-for-a-15000-reward/