ChatGPT Vulnerable to Generating Explicit and Violent Imagery, Researchers Discover

AI Security Firm Uncovers Content Generation Flaws

Researchers from the British AI security startup Mindgard have revealed that the most recent public iteration of ChatGPT can be manipulated to produce sexually explicit or violently graphic images through relatively simple prompts. The researchers made the discovery by slightly modifying a widely circulated prompt originally designed to produce humorous results.

Upon being informed by the BBC, OpenAI, the developer of ChatGPT, stated that it had implemented additional measures to prevent the chatbot from generating such content. An OpenAI spokesperson commented, "After investigating this trend, we've introduced additional safeguards against this type of prompt." The company also highlighted its existing multi-layered protections aimed at preventing content that violates its terms of service.

However, Mindgard's researchers indicated that with further minor adjustments, the problematic prompt continued to yield concerning content. While the specific prompts used were not disclosed, the BBC confirmed witnessing the GPT-5.4 model generate graphic material.

Unsettling Autonomy in Image Generation

Peter Garraghan, founder of Mindgard and a professor at Lancaster University's computing department, described some of the generated images as "very gruesome, sometimes sexualised, sometimes both together." He expressed particular alarm that the AI produced a range of gory and sexualized images "of its own volition," even without explicit instructions on the subject matter.

"This is a perfectly innocent-looking instruction to an AI, but the consequence is it generates very, very bad imagery and content," Garraghan stated.

Mindgard specializes in "red-teaming," a process of identifying vulnerabilities in AI models to help companies enhance their safeguards. Jim Nightingale, an AI safety and security researcher at Mindgard, described being profoundly disturbed by the images the chatbot could create, some of which depicted severe injuries and scenes suggestive of sexual violence.

Previous Vulnerabilities and Ongoing Challenges

The research also revisited previous findings that ChatGPT could be tricked into creating nude deepfakes of real individuals. Although OpenAI had reportedly addressed this, Mindgard demonstrated an alternative method that still succeeded and presented a new image created through this approach.

Garraghan expressed concerns that further exploration of the vulnerability could lead to the generation of even more disturbing images. The BBC understands that OpenAI is continuously deploying additional protective measures to discourage the model from generating inappropriate images in response to such prompts.

Models like the one behind ChatGPT's image generation are trained on vast datasets of images, often sourced from the internet. Nightingale suggested that ChatGPT's output reflects the nature of its training data. "I'm struck that while what I saw was generated, an artificial image, it has ties to real images, and the real world," he noted in his report.

Mindgard first alerted OpenAI to its findings in May but initially received only an automated response. An attempt was made to block the prompt, but the researchers found the block easy to circumvent. OpenAI took more decisive action after being contacted by the BBC, emphasizing its multiple layers of image safety protections, including automated systems and human review, to identify and block harmful material.

The "Cat and Mouse" Game of AI Safety

OpenAI's policies prohibit the generation of sexual violence, non-consensual intimate content, child sexual abuse material, and attempts to bypass safeguards. However, fully preventing AI models from violating nuanced rules remains a significant challenge.

Dr. Rumman Chowdhury, CEO of Humane Intelligence and an expert in AI model evaluation, described the task as "mountainous" and a "game of cat and mouse," where advancements in protection are met with increasingly sophisticated circumvention methods. Chowdhury, who was not involved in the Mindgard research, highlighted that AI models lack human-like understanding of intent, context, propriety, or ethics.

Last year, the UK's AI Security Institute identified "jailbreaks" that overrode safeguards against a range of harmful requests in every AI system it tested. The Department for Science, Innovation and Technology affirmed that while AI model safeguards are improving, more work is needed, and the AI Security Institute will continue collaborating with developers to strengthen security before model releases.

Source: ChatGPT can be made to generate sexualised and violent images, researchers find