Anthropic claims new AI security method blocks 95% of jailbreaks, invites red teamers to try



Two years after ChatGPT hit the scene, there are numerous large language models (LLMs), and nearly all remain ripe for jailbreaks: specific prompts and other workarounds that trick them into producing harmful content.

Model developers have yet to come up with an effective defense, and, truthfully, they may never be able to deflect such attacks…


Article Source https://venturebeat.com/security/anthropic-claims-new-ai-security-method-blocks-95-of-jailbreaks-invites-red-teamers-to-try/
