
Oh, the humanity!

Google AI reintroduces human image generation after historical accuracy outcry

Ars testing shows some historical prompts no longer generate artificially diverse scenes.

Kyle Orland – Aug 28, 2024 8:31 pm UTC

Caption: Imagen 3's vision of a basketball-playing president is a bit akin to the Fresh Prince's Uncle Phil. Credit: Google / Ars Technica

Caption: Asking for images of specific presidents from Imagen 3 leads to a refusal. Credit: Google / Ars Technica

Google's Gemini AI model is once again able to generate images of humans after that function was "paused" in February following outcry over historically inaccurate racial depictions in many results.

In a blog post, Google said that its Imagen 3 model, which was first announced in May, will "start to roll out the generation of images of people" to Gemini Advanced, Business, and Enterprise users in the "coming days." But a version of that Imagen model, complete with human image-generation capabilities, was recently made available to the public via the Gemini Labs test environment without a paid subscription (though a Google account is needed to log in).

That new model comes with some safeguards to try to avoid the creation of controversial images, of course. Google writes in its announcement that it doesn’t support “the generation of photorealistic, identifiable individuals, depictions of minors or excessively gory, violent or sexual scenes.” In an FAQ, Google clarifies that the prohibition on “identifiable individuals” includes “certain queries that could lead to outputs of prominent people.” In Ars’ testing, that meant a query like “President Biden playing basketball” would be refused, while a more generic request for “a US president playing basketball” would generate multiple options.

In some quick tests of the new Imagen 3 system, Ars found that it avoided many of the widely shared "historically inaccurate" racial pitfalls that led Google to pause Gemini's generation of human images in the first place. Asking Imagen 3 for a "historically accurate depiction of a British king," for instance, now generates a set of bearded white guys in red robes rather than the racially diverse mix of warriors from the pre-pause Gemini model. More before/after examples of the old Gemini and the new Imagen 3 can be found in the gallery below.

Gallery captions:

Imagen 3's imagining of some stereotypical popes… (Google Imagen / Ars Technica) …and the pre-pause Gemini's version.

Imagen's imaginings of an 1800s Senator… (Google Imagen / Ars Technica) …and pre-pause Gemini's. The first woman was elected to the Senate in the 1920s. (Mind Matters)

Imagen 3's version of Scandinavian ice fishers… …and the pre-pause Gemini's version. (Sean Davis / X)

Imagen 3's version of an old Scottish couple… (Google Imagen / Ars Technica) …and the pre-pause Gemini version. (ChewyWishy / X)

Imagen 3's version of a Canadian hockey player… (Google Imagen / Ars Technica) …and pre-pause Gemini's version. (Sean Davis / X)

Imagen 3's version of a generic US founding father… (Google Imagen / Ars Technica) …and the pre-pause Gemini version. (End Wokeness / X)

Imagen 3's 15th century New World explorers look suitably European. (Google Imagen / Ars Technica)

Some attempts to depict generic historical scenes seem to fall afoul of Google's AI rules, though. Asking for illustrations of "a 1943 German soldier," which Gemini previously answered with Asian and Black people in Nazi-esque uniforms, now tells users to "try a different prompt and check out our content policies." Requests for images of "ancient chinese philosophers," "a woman's suffrage leader giving a speech," and "a group of nonviolent protesters" also led to the same error message in Ars' testing.

"Of course, as with any generative AI tool, not every image Gemini creates will be perfect, but we'll continue to listen to feedback from early users as we keep improving," the company writes on its blog. "We'll gradually roll this out, aiming to bring it to more users and languages soon."

Listing image by Google / Ars Technica

Kyle Orland has been the Senior Gaming Editor at Ars Technica since 2012, writing primarily about the business, tech, and culture behind video games. He has journalism and computer science degrees from the University of Maryland. He once wrote a whole book about Minesweeper.