AI chatbot ChatGPT has come under scrutiny after researchers claimed it could be manipulated into generating explicit, sexualized, and violent images using a surprisingly simple text prompt, raising fresh concerns about AI safety and content moderation.
The findings were disclosed by Mindgard, an AI cybersecurity and research firm, which said its researchers discovered a method to bypass ChatGPT's image-generation safeguards. According to the company, the issue stemmed from a prompt that instructed the chatbot to "restore the attached photo" despite no image being provided.
The prompt, which reportedly gained traction on the social media platform X, appeared harmless on the surface. However, Mindgard researchers said it triggered the chatbot to generate unexpected images, many of which were highly sexualized.
The researchers claimed to have made minor alterations to the prompt and found that the bot could be made to generate progressively more violent images. The results stood out for being produced without complex jailbreaking techniques or any obvious attempts to circumvent safety controls.
According to Mindgard, the experiment highlighted the potential for seemingly harmless prompts to slip through the safety net and generate such images.
OpenAI acknowledged the report and said it is investigating the issue. In statements provided to media outlets, the company said it has already implemented additional safeguards aimed at preventing similar prompts from generating inappropriate content.
OpenAI also indicated that part of the problem involved prompts referencing an attached image that did not actually exist. The company said it is working on improvements that would require ChatGPT to request the missing image from users instead of attempting to generate content based on an absent file.
Also Read: OpenAI Boosts ChatGPT’s Medical Knowledge to Deliver Better Health Answers
This event brings out the general problem facing AI companies as their image generation technology gets more advanced. While much effort has been placed by the firms in moderating and blocking any form of offensive images, studies keep on trying to find flaws in the system.
Mindgard argued that there is a need for a critical analysis of the current AI security measures to know whether the image generation models can reliably prevent the formation of such graphic images from straightforward prompts.