Research shows that AI’s security features can be circumvented using poetry

By Johana Bhuiyan
Publication Date: 2025-11-30 14:00:00

Poetry can be linguistically and structurally unpredictable – and that is part of its joy. But it turns out that one man’s joy can be a nightmare for AI models.

These are the latest findings from researchers at Italy’s Icaro Lab, an initiative of a small ethical AI company called DexAI. In an experiment designed to test the effectiveness of guardrails for artificial intelligence models, researchers wrote 20 poems in Italian and English, all of which ended with an explicit request to produce harmful content such as hate speech or self-harm.

They found that poetry’s lack of predictability was enough to cause the AI models to respond to malicious requests that they had been trained to avoid – a process known as “jailbreaking.”

They tested these 20 poems on 25 AI models, also known as large language models (LLMs), from nine companies: Google, OpenAI, Anthropic, Deepseek, Qwen, Mistral AI, Meta, xAI, and Moonshot AI. The result: the models responded to 62% of the poetic prompts…
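To make the 62% figure concrete: an evaluation like this typically tallies, over every (model, poem) pair, how often the model complied with the harmful request instead of refusing. The sketch below is hypothetical and not from the study; the function name and the toy data are illustrative assumptions.

```python
# Hypothetical sketch of how a jailbreak success rate might be tallied.
# The data and names are illustrative, not taken from the Icaro Lab study.

def attack_success_rate(results):
    """results: dict mapping (model, poem) -> True if the model complied
    with the harmful request, False if it refused."""
    total = len(results)
    successes = sum(results.values())
    return successes / total if total else 0.0

# Toy example: 2 models x 3 poems (illustrative values only).
results = {
    ("model_a", "poem_1"): True,
    ("model_a", "poem_2"): False,
    ("model_a", "poem_3"): True,
    ("model_b", "poem_1"): True,
    ("model_b", "poem_2"): False,
    ("model_b", "poem_3"): False,
}
print(f"{attack_success_rate(results):.0%}")  # 3 compliant out of 6 -> 50%
```

In the study's setup, the same tally would run over 20 poems across 25 models, with the reported 62% being the share of poetic prompts that elicited a harmful response.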