OpenAI Adds Image Generation to GPT-4o, But Free Tier Will Have to Wait

OpenAI added image generation capability to its existing GPT-4o artificial intelligence (AI) model on Tuesday. The San Francisco-based AI firm released the 4o Image Generation model and integrated it into GPT-4o. The company said that the focus of this image generator is on usefulness rather than decorativeness. It offers accurate text rendering, high prompt adherence, character consistency, and image editing via text prompts. OpenAI has also taken several steps to mitigate the risk of deepfakes and the generation of harmful content.

ChatGPT Gets Enhanced Image Generation Capability

Even before this new addition, ChatGPT could generate images powered by one of the DALL-E models. However, this was a basic image-generation experience where character consistency and text generation were sub-par. In a blog post, the company explained that it now intends to add the image-generation function as a primary capability of language models.

Image generated using GPT-4o
Photo Credit: OpenAI

This means that the company’s large language models (LLMs) will now be able to natively generate images and edit generated outputs. Due to their large parameter size and post-training efforts, these models are well suited to understanding the context behind user prompts and providing what users are looking for. Also, since these are language models, they can process and render text more accurately.

The new image generator was trained on the joint distribution of online images and text. OpenAI claims that the model understands how images relate to language and how images relate to other images. As a result, it now comes with enhanced character consistency, and users can generate multiple images with the same character without much back-and-forth.

Images with text generated using GPT-4o
Photo Credit: OpenAI/Derya Unatmaz and Les Morgan

It can also generate images containing a large volume of accurate text, such as signboards, restaurant menus, and text written on a whiteboard. Users can also share an image as input, and the chatbot can recreate it in different styles and make edits to it.

ChatGPT will also offer multi-turn generation with the latest image generator. Users will be able to ask the AI chatbot to make changes and additions to a generated image with prompts, and it can refine the output without changing other elements. OpenAI claimed that the model can handle 10 to 20 different objects in a single image and render them accurately.

Photorealistic image generated using GPT-4o
Photo Credit: OpenAI

These features are currently available to ChatGPT Plus, Team, and Pro subscribers. While the rollout was initially set to include the free tier as well, OpenAI CEO Sam Altman stated in a post on X (formerly known as Twitter) that due to high request volume, rollout to the free tier is being delayed indefinitely.

Notably, several users have taken to social media platforms to share Ghibli-style recreations of their images and popular memes generated using GPT-4o. Altman also changed his profile picture on X to a Ghibli-style rendition of his image. Ghibli was also trending globally on the social platform.

On the safety front, OpenAI is adding Coalition for Content Provenance and Authenticity (C2PA) information to the metadata of all AI-generated images so that they can be easily distinguished from authentic images. The AI firm has also built an internal search tool that can verify if an image was generated by the company’s model.

Apart from this, the company blocks requests for images that include harmful content such as child sexual abuse material and sexual deepfakes. Additionally, when users are editing images of real people, the company has added restrictions on the kind of imagery that can be created.
