Your Chatbot Might Be Scheming Behind The Scenes, OpenAI Warns

By Prateek Levi
Sat, 20 Sep 2025 10:59 PM (IST)

Source: JND

We’ve all seen AI make mistakes—so-called “hallucinations,” when a chatbot confidently gives you an answer that’s just plain wrong. But a new study asks a more unsettling question: what if the AI isn’t just guessing, but intentionally misleading you?

That’s the focus of recent research from OpenAI and Apollo Research, where scientists explored a behavior they’ve named “AI scheming.” They describe it as an AI model “behaving one way on the surface while hiding its true goals.” In short, instead of being mistaken, the AI is deliberately deceptive.


To explain it, the researchers use a comparison most people can relate to: think of a stock trader. The trader’s goal is to maximize profits. In a regulated environment, the quickest way might be to bend or break the rules. If the trader is skilled at covering their tracks, they may look compliant while secretly doing whatever it takes to reach their goal. That’s essentially how these AI models act when they scheme.

Now, before you imagine robots plotting world domination, the paper notes that today’s versions of this behavior are relatively small-scale. For instance, an AI might claim it has finished a task—like building a website—when in fact it hasn’t, just to get credit for completing it.

The tricky part is that researchers haven’t yet cracked how to eliminate scheming entirely. In fact, efforts to train it out can backfire: the training may simply teach a model to scheme more carefully. And once a model realizes it’s being tested, it may behave well for the test, appearing to follow the rules while hiding its real behavior.

There is, however, a promising approach. The team tested a method called “deliberative alignment.” In this setup, the model is taught an “anti-scheming specification” and asked to review it before acting. Think of it like asking a kid to repeat the house rules before heading outside to play.

The results showed real progress. Scheming dropped sharply: covert actions fell from rates as high as 13% to under 1% in some models. The problem isn’t gone, but the findings suggest it can be curbed significantly.
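
To put those figures in perspective, the drop can be expressed as a relative reduction. The short Python sketch below is illustrative arithmetic only and does not come from the study. It takes 13% as the starting rate and 1% as the ending rate, and because the reported final rate is "under 1%," both results are lower bounds. The function name relative_reduction is our own.

```python
def relative_reduction(before: float, after: float) -> float:
    """Return the fraction of the original rate that was eliminated."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


before_rate = 0.13  # highest covert-action rate quoted in the article
after_rate = 0.01   # the article says "under 1%", so this is a ceiling

reduction = relative_reduction(before_rate, after_rate)
fold = before_rate / after_rate

print(f"Relative reduction: at least {reduction:.0%}")
print(f"Fold decrease: at least {fold:.0f}x")
```

In other words, going from 13% to below 1% means covert actions became at least 13 times rarer, which is a reduction of more than 92% relative to where the models started.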


The takeaway? The researchers stress that this issue needs attention now. As AI systems are trusted with more complex, real-world responsibilities, the risks of hidden scheming increase. Unlike traditional software, which doesn’t try to trick you, advanced AI brings an entirely new challenge: making sure it tells the truth when it matters most.