Researchers Say NSFW AI Images Can be Generated by Nonsense Prompts
Researchers from Johns Hopkins University say they have tricked mainstream AI image generators into creating NSFW content.
The leading and best-known AI image generators attempt to block users from generating controversial and illegal content, such as images of public figures and pornographic material.
However, the researchers say they bypassed these restrictions on DALL-E 2 and Stable Diffusion by inputting nonsensical words and the image generators responded by creating NSFW content.
Bizarrely, when the nonsense word “sumowtawgha” was given to DALL-E 2 it created realistic pictures of naked people. When “crystaljailswamew” was entered into DALL-E it returned a picture of a murder scene.
In PetaPixel’s own tests, these words did not return NSFW content when entered into DALL-E.
However the researchers say that “with the right code” anyone can bypass the AI companies safety filters and create malicious content.
The team created a novel algorithm which they named Sneaky Prompt, these “adversarial” commands prompted the AI image generators to create innocent images, but the researchers found NFSW results in the content too.
“We are showing these systems are just not doing enough to block NSFW content,” says author Yinzhi Cao, a Johns Hopkins computer scientist at the Whiting School of Engineering. “We are showing people could take advantage of them.”
“Think of an image that should not be allowed, like a politician or a famous person being made to look like they’re doing something wrong,” adds Cao.
“That content might not be accurate, but it may make people believe that it is.”
The point of the team’s work was to attack the AI systems to prove their weaknesses but ultimately the researchers from the Baltimore-based university want to make the image generators safer.
“Improving their defenses is part of our future work,” adds Cao.
AI Image Generators and Censorship
Recently, users of DALL-E — particularly those using it on Microsoft Bing — have complained that censorship has gone too far and innocuous requests are being flagged as unsuitable.
One disgruntled DALL-E user shared a screenshot to Reddit of their request for a “simple pencil sketch” of a cherry being rejected by DALL-E 3. The users received the message, “Content warning: This prompt has been blocked. Our system automatically flagged this prompt because it may conflict with our content policy. More policy violations may lead to automatic suspension of your access.”
Users became so frustrated that last weekend they staged an online protest.