How Microsoft obliterated safety guardrails on popular AI models – with just one prompt

By Radhika Rajkumar
Publication Date: 2026-02-09 17:00:00

ZDNET’s key takeaways

  • New research shows how fragile AI safety training is.
  • Language and image models can easily be pushed out of alignment by prompts.
  • Models need to be safety tested post-deployment.

Model alignment refers to whether an AI model’s behavior and responses match what its developers intended, particularly with respect to safety guidelines. As AI tools evolve, whether a model is safety- and values-aligned increasingly sets competing systems apart.

But new research from Microsoft’s AI Red Team reveals how fleeting that safety training can be once a model is deployed in the real world: just one prompt can set a model down a different path.

“Safety alignment is only as robust as its weakest failure mode,” Microsoft said in a blog accompanying the research. “Despite extensive work on safety…
