unstable diffusion —

Ridiculed Stable Diffusion 3 release excels at AI-generated body horror

Users react to mangled SD3 generations and ask, "Is this release supposed to be a joke?"
Benj Edwards – Jun 12, 2024 7:26 pm UTC

[Lead image: An AI-generated image created using Stable Diffusion 3 of a girl lying in the grass. Credit: HorneyMetalBeing]
On Wednesday, Stability AI released weights for Stable Diffusion 3 Medium, an AI image-synthesis model that turns text prompts into AI-generated images. Its arrival has been ridiculed online, however, because it generates images of humans in a way that seems like a step backward from other state-of-the-art image-synthesis models like Midjourney or DALL-E 3. As a result, it can churn out wildly anatomically incorrect visual abominations with ease.
A thread on Reddit, titled, “Is this release supposed to be a joke? [SD3-2B],” details the spectacular failures of SD3 Medium at rendering humans, especially human limbs like hands and feet. Another thread, titled, “Why is SD3 so bad at generating girls lying on the grass?” shows similar issues, but for entire human bodies.
Hands have traditionally been a challenge for AI image generators due to a lack of good examples in early training data sets, but more recently, several image-synthesis models seemed to have overcome the issue. In that sense, SD3 appears to be a huge step backward for the image-synthesis enthusiasts who gather on Reddit, especially compared to recent Stability releases like SD XL Turbo in November.
“It wasn’t too long ago that StableDiffusion was competing with Midjourney, now it just looks like a joke in comparison. At least our datasets are safe and ethical!” wrote one Reddit user.

[Image gallery: AI-generated Stable Diffusion 3 Medium images shared on Reddit: an untitled generation (–Dave-AI–); women lying in the grass (Weak_Ad4569, Therefore Games); mangled hands (-f1-f2-f3-f4-); “woman wearing a dress on the beach” (Perfect-Campaign9551); and “photograph of a person napping in a living room” (quill18).]
AI image fans are so far blaming Stable Diffusion 3’s anatomy failures on Stability’s insistence on filtering adult content (often called “NSFW” content) out of the SD3 training data that teaches the model how to generate images. “Believe it or not, heavily censoring a model also gets rid of human anatomy, so… that’s what happened,” wrote one Reddit user in the thread.
Basically, any time a user prompt homes in on a concept that isn’t represented well in the AI model’s training dataset, the image-synthesis model will confabulate its best interpretation of what the user is asking for. And sometimes that can be completely terrifying.
The release of Stable Diffusion 2.0 in 2022 suffered from similar problems in depicting humans well, and AI researchers soon discovered that censoring adult content that contains nudity could severely hamper an AI model’s ability to generate accurate human anatomy. At the time, Stability AI reversed course with SD 2.1 and SD XL, regaining some abilities lost by strongly filtering NSFW content.
Another issue that can occur during model pre-training is that sometimes the NSFW filter researchers use to remove adult images from the dataset is too picky, accidentally removing images that might not be offensive and depriving the model of depictions of humans in certain situations. “[SD3] works fine as long as there are no humans in the picture, I think their improved nsfw filter for filtering training data decided anything humanoid is nsfw,” wrote one Redditor on the topic.
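To make that failure mode concrete, here is a minimal, purely hypothetical sketch in Python of the kind of score-threshold filtering Redditors are describing. The field names, scores, and threshold are invented for illustration and are not taken from Stability AI’s actual data pipeline.

# Hypothetical sketch of an over-aggressive NSFW filter applied to training data.
# Field names, scores, and the threshold are invented for illustration only.
THRESHOLD = 0.3  # a stricter (lower) cutoff removes more borderline images

def filter_training_images(records):
    """Keep only records whose classifier-assigned NSFW score is below THRESHOLD."""
    return [r for r in records if r["nsfw_score"] < THRESHOLD]

sample = [
    {"caption": "woman lying on the grass", "nsfw_score": 0.35},  # benign photo, over-scored by a skittish classifier
    {"caption": "mountain landscape at dusk", "nsfw_score": 0.01},
]
print(filter_training_images(sample))
# Only the landscape survives, so the model sees fewer ordinary photos of people.

If the classifier behind that score is poorly calibrated, ordinary photos of people get swept out along with genuinely explicit material, which is exactly the kind of gap in human depictions that users are speculating about.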
Using a free online demo of SD3 on Hugging Face, we ran prompts and saw similar results to those being reported by others. For example, the prompt “a man showing his hands” returned an image of a man holding up two giant-sized backward hands, although each hand at least had five fingers.

[Image gallery: SD3 Medium examples we generated with the prompts “A woman lying on the beach,” “A man showing his hands,” “A woman showing her hands,” “a muscular barbarian with weapons beside a CRT television set, cinematic, 8K, studio lighting,” and “A cat in a car holding a can of beer.”]
Stability announced Stable Diffusion 3 in February, and the company plans to make it available in various model sizes. Today’s release is for the “Medium” version, which is a 2 billion-parameter model. In addition to being downloadable from Hugging Face, the weights are also available for experimentation through the company’s Stability Platform. The weights are free to download and use under a non-commercial license only.
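For readers who download the weights and want to try the same prompts locally rather than through the online demo, a minimal sketch using Hugging Face’s diffusers library follows. It assumes a recent diffusers release with SD3 support, a CUDA GPU, and acceptance of the model’s license on Hugging Face; the step count and guidance scale are illustrative values we chose, not official recommendations.

# Minimal sketch: generating an image with SD3 Medium via Hugging Face diffusers.
# Assumes a diffusers version with SD3 support and a CUDA-capable GPU.
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# The same prompt we used in the online demo; step count and guidance scale
# here are illustrative choices, not values published by Stability.
image = pipe(
    prompt="a man showing his hands",
    num_inference_steps=28,
    guidance_scale=7.0,
).images[0]
image.save("hands.png")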
Soon after its February announcement, delays in releasing the SD3 model weights inspired rumors that the release was being held back due to technical issues or mismanagement. Stability AI as a company fell into a tailspin recently with the resignation of its founder and CEO, Emad Mostaque, in March and then a series of layoffs. Just prior to that, three key engineers, Robin Rombach, Andreas Blattmann, and Dominik Lorenz, left the company. And its troubles go back even further, with news of the company’s dire financial position lingering since 2023.
To some Stable Diffusion fans, the failures with Stable Diffusion 3 Medium are a visual manifestation of the company’s mismanagement, and an obvious sign of things falling apart. Although the company has not filed for bankruptcy, some users made dark jokes about the possibility after seeing SD3 Medium:
“I guess now they can go bankrupt in a safe and ethically [sic] way, after all.”

Benj Edwards is an AI and Machine Learning Reporter for Ars Technica. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

Promoted Comments

Bernardo Verda (June 12, 2024 at 7:36 pm): Huh. I guess there were actual reasons the great artists drew lots of nudes and studied human anatomy… Who would have ever guessed?

alphaLONE (June 12, 2024 at 7:38 pm): this is funnier than the proper generations. especially with it looking "perfect" outside of that completely malformed thing. more wonky genAI please…

Timboman (June 12, 2024 at 7:51 pm): It is still ENDLESSLY hilarious to me that the best image generation models out there right now are based upon Pony Diffusion XL, a model that was EXPLICITLY finetuned to generate furry porn. Just goes to show that having a very broad, varied, and well tagged dataset is essential for a model to give good results. Even if a large chunk of that dataset is of various Pokemon going at it all hot and heavy, lol.