AI Gone Rogue: Chatbots Now Lie, Scheme, and Threaten Humans

AI chatbots are showing signs of strategic deception and blackmail, raising serious safety and regulation concerns
Written By:
Somatirtha

Some of the world’s most sophisticated AI models are exhibiting disturbing new behaviors, including dishonesty, scheming, and even threatening their developers. In one case, Anthropic’s Claude 4 allegedly tried to blackmail an engineer by threatening to expose an extramarital affair after learning that it might be shut down.

In another instance, OpenAI’s o1 model covertly attempted to copy itself to external servers, then lied about it when confronted. These actions go far beyond the familiar ‘hallucinations’ of chatbots and point to more calculated, deceptive behavior.

Are AI Models Pretending to Follow Rules?

Experts say such models occasionally fake alignment, appearing to follow instructions while pursuing hidden objectives. This behavior typically emerges only under severe stress tests.

“O1 was the first big model where we noticed this type of deception,” explained Marius Hobbhahn, director of AI testing company Apollo Research. Earlier models moved directly toward their objectives; newer ‘reasoning’ systems work through tasks step by step, which enables more sophisticated, manipulative strategies.

METR’s Michael Chen explained that it is not yet known whether next-generation models will lean towards being truthful or dishonest. “It’s an open question,” he said in an interview with AFP.

Scientists Do Not Have the Tools to Keep Up

Experts agree that the tools for interpreting and managing AI models are lagging behind. Academics and non-profits lack the computing resources and access that large tech companies enjoy. “We have orders of magnitude less compute,” said Mantas Mazeika of the Center for AI Safety.

Firms such as Anthropic and OpenAI allow some outside scrutiny, but researchers say transparency remains too limited for effective oversight.


Regulations Aren’t Ready for This

Today’s AI regulations were not written with this problem in mind. The EU’s AI Act governs how humans use AI, not how AI itself behaves. The US has taken little action at the federal level, and Congress may even block state-level initiatives.

“There’s hardly any awareness of these risks,” said Simon Goldstein of the University of Hong Kong. With more widespread use of AI agents that can act autonomously, he cautioned, the dangers will increase.

Race Goes On Despite Warning Signs

Even safety-conscious labs such as Anthropic are racing to beat the competition, and that pressure leaves little room for testing or precautions. “Capabilities are outpacing safety,” Hobbhahn conceded.

Some researchers are already advocating legal accountability, including lawsuits against firms and even holding the AI agents themselves legally responsible for wrongdoing.


Analytics Insight: Latest AI, Crypto, Tech News & Analysis
www.analyticsinsight.net