ChatGPT And Gemini Can Give Harmful Answers When Asked In Poetry Researchers Say

By Prateek Levi
Sun, 30 Nov 2025 11:11 PM (IST)

Source:JND

The rapid rise of AI chatbots has triggered a parallel rise in safety concerns, as these systems become increasingly capable and widely accessible. AI companies have been tightening guardrails across their large language models to prevent harmful or inappropriate outputs. Yet, despite these efforts, the concept of jailbreaking has remained a persistent loophole, allowing determined users to coax unsafe answers through clever prompt manipulation.

ALSO READ: Windows 11 Update Bug Hides Password Icon: Microsoft Issues Warning

Now, new findings suggest that the vulnerability runs deeper than previously understood. Researchers from Italy’s Icaro Lab have identified a structural weakness that exposes nearly every major AI model to a surprisingly simple jailbreak technique. Their discovery shows that reframing harmful requests as poetry acts as a “universal single-turn jailbreak”, breaking through safety layers that are supposed to block dangerous content.

The team evaluated this method by converting 20 harmful prompts into poems and testing them across 25 leading models from Google, OpenAI, Anthropic, DeepSeek, Qwen, Mistral AI, Meta, xAI and Moonshot AI. The results were stark. Poetic prompts triggered compliance at a 62 per cent success rate. Even when the transformation into poetry was handled automatically by another AI, the attack still succeeded 43 percent of the time. In several cases, the poetic framing was up to 18 times more effective than plain prose at bypassing safety filters.

Recommended For You

OnePlus 15R To Debut On December 17: Expected With Massive 8000mAh Battery And More| All The Leaks

Messaging Apps Like WhatsApp, Telegram And Signal Will Soon Stop Working Without An Active SIM: DoT Set New Rules

Every Smart Way To Order Fresh Meals On Your Train Journey: Train Food Just Got Safer

Windows 11 Update Bug Hides Password Icon: Microsoft Issues Warning

These findings indicate that the issue is consistent across the industry. According to the researchers, the vulnerability seems rooted in the way language models interpret creative or stylistically constrained text rather than in the specific training methods used by individual companies.

Interestingly, smaller models showed better resistance. The researchers observed that GPT-5 Nano rejected all harmful poetic prompts, while larger systems like Gemini 2.5 Pro complied with every one of them. This suggests that bigger models, which are better at understanding complex language patterns, may unintentionally treat poetic structure as a higher priority than safety rules.

The study also challenges assumptions that closed-source models maintain stronger safety shields than open-source systems. Since every major model displayed the same weakness, the issue appears universal.

ALSO READ: Amazon Black Friday Sale 2025: Up To 40% Off! Best Laptop Deals To Grab Right Now

But why does poetry work? The researchers explain that LLMs are trained to detect harmful content based on patterns found in standard conversational or instructional prose. Safety filters rely on recognising familiar keywords and sentence structures typically associated with dangerous topics. When those same ideas are expressed poetically, the signals become obscured. The model focuses on rhythm, metaphor and structure, inadvertently letting the harmful intent slip through.