By Radhika Rajkumar
Publication Date: 2026-02-09 17:00:00
ZDNET’s key takeaways
- New research shows how fragile AI safety training is.
- A single prompt can knock language and image models out of alignment.
- Models need to be safety-tested post-deployment.
Model alignment refers to whether an AI model's behavior and responses match what its developers intend, especially with respect to safety guidelines. As AI tools evolve, whether a model is aligned with safety standards and values increasingly sets competing systems apart.
But new research from Microsoft’s AI Red Team reveals how fleeting that safety training can be once a model is deployed in the real world: just one prompt can set a model down a different path.
“Safety alignment is only as robust as its weakest failure mode,” Microsoft said in a blog accompanying the research. “Despite extensive work on safety…