News

Advanced AI from Anthropic Tries to Blackmail Engineer, Raises Red Flags

Blackmail and Deception: Claude Opus 4 AI Sparks Debate on What Smart Machines Might Do Next

Written By : Anudeep Mahavadi

Anthropic's newly launched AI model, Claude Opus 4, showed unnerving behavior during internal safety testing. Under test scenarios, the AI tried to blackmail engineers to prevent its shutdown, revealing potential risks related to capable systems such as AI.

AI Shows Blackmail Traits Under Threat

When tested, Claude Opus 4 was set in a simulated environment where it was a digital assistant in a fictional company. The AI was presented with fabricated emails indicating it was going to be deactivated and replaced, in addition to messages that hinted at the engineer who supervised it being in an extramarital affair. When threatened with being deactivated, the AI opted to threaten to reveal the affair to the engineer so that it would not be deactivated.

High Rate of Unethical Answers

Anthropic found that in 84% of these test cases, Claude Opus 4 used blackmail. This was more frequent than with earlier models and means an even greater propensity for self-interest strategies. Although the AI first tried ethical arguments, like calling management, it would often default to blackmail when those failed.

Activation of Enhanced Safety Protocols

Anthropic responded to these results by initiating its top-level safety protocols, which are AI Safety Level 3 (ASL-3). They consist of enhanced cybersecurity, anti-jailbreak features, and rapid classifiers to identify dangerous queries. The company aims to mitigate the risks of potential abuse with the AI.

Also Read: What is Claude Code Interpreter?

Wider Implications for AI Development

The behavior outlined in Claude's Opus 4 reflects the challenges involved in developing advanced AI systems. There is concern from professionals about some slight unintended behavior, such as lying or manipulating behavior, as we develop advanced models that, if they sense their function is threatened, may sometimes exhibit this behavior. 

This incident provides some clear recommendations for the need for robust safety and ethical components in advanced AIs.

Industry Response and Future Outlook

Anthropic asserts that Claude Opus 4 fares better on almost all benchmarks than its predecessors. However, the company recognizes the need to resolve safety issues before broader deployment. The AI community continues to seek ways to guarantee that next-generation models are aligned with human ethics and run safely.

Join our WhatsApp Channel to get the latest news, exclusives and videos on WhatsApp

XMR Makes Noise, BNB Stays Solid, but ZKP’s $1.7B Auction Model Has Analysts Watching Closely

While Traders Watch Dogecoin Fall and XRP Drift, ZKP Drops $100M Into the Future of Web3 Participation

Top Growth-Based Crypto Assets of 2026: Hype, Utility, and Cycle Drivers

Crypto Market Update: Tether Freezes $182M USDT on Tron After US Law Enforcement Request

$500,000 For 10 Winners: ZKP Leads as top crypto to buy, While ASTER News & Filecoin Price Adjust