OpenAI has conceded that prompt injection attacks are one of the most significant security risks against AI browsers. Even as the tech giant increases its defenses in the new ‘Atlas AI browser’ to prevent such dangers, it hinted at a long-term problem for AI agents deployed on the public web.
In prompt injection attacks, malicious instructions can be embedded inside web pages, emails, or even documents. When this content is read by an AI-powered tool, it may be tricked into performing malicious tasks.
OpenAI admitted that the ‘agent mode’ in Atlas, which enables the robot to surf through the web, read emails, and take over user activities, makes it more vulnerable to these security issues.
Since Atlas’ launch back in October, several researchers have proven that simple texts appearing in Google Docs can also affect browsers, allowing interactions with robot agents.
Similar weaknesses have reportedly been discovered in other AI-assisted browsing softwares available in the market, including those by Perplexity.
This is not a new concern; UK’s National Cyber Security Centre has already indicated the head start attacks on the generative AI system might never get fully eliminated.
The cybersecurity body recommended focusing on minimizing risks to reduce impacts under such attacks.
OpenAI appears to have adopted a similar approach; instead of addressing security issues through a patch job, the firm is claiming to work towards aggressive response cycles for security testing.
Also Read: OpenAI’s ChatGPT Atlas Browser Hit by Major Jailbreak Flaw
An important element of OpenAI’s approach has been the ‘automated attacker’ development within its own group. This system, designed and implemented using reinforcement learning, can simulate the role of a hacker. The computer repeatedly attacks Atlas until it learns how to modify its attack approaches.
According to OpenAI, this method has revealed attack vectors that have been overlooked in traditional, human-conducted penetration tests. However, it appears that the company has not revealed whether these steps have resulted in a notable decrease in successful prompt injection attacks.
OpenAI emphasized its teams are collaborating with third-party companies to face this problem. While these setups were already in action before Atlas was released, the organization affirmed to keep fighting against prompt injection attacks.