April 23, 2025
Character AI Says Its AvatarFX Model Can Turn Still Images Into Videos

Character AI, the California-based artificial intelligence (AI) platform, unveiled its first video generation model on Monday. Dubbed AvatarFX, it is an image-to-video model that can generate 2D and 3D animated characters. The AI firm claims the generated videos will maintain temporal consistency in face, hand, and body movements. The videos will also feature speech, powered by the company’s native text-to-speech (TTS) models. Character AI said AvatarFX will be released in the coming months, and paid subscribers will get access to it first.

Character AI’s AvatarFX Won’t Generate Realistic Human Characters

In a blog post, the company detailed the new video generation model. Until now, Character AI has focused on text and image-based models. With AvatarFX, the AI firm is taking a stab at AI-generated videos. The company has stated that Character AI+ subscribers will get access to the model; however, it is not known whether access will later be expanded to the free tier.

AvatarFX can generate videos of 2D animated characters, 3D cartoon characters, and non-human faces. The company claims that one of the major highlights of the model is temporal consistency in face, hand, and body movements. This means the subject should remain consistent between frames, while glitches like extra arms and distorted facial expressions should be less frequent. It’s important to remember that these are claims, and the tool’s capabilities cannot be verified until it is released.

Unlike most video generation models, AvatarFX does not support text inputs for video generation. Instead, the model only accepts images as input. Character AI claims that this allows users to gain better control over the generated output. The videos will also feature speech, which will be generated using native TTS models.

On the architecture side, the AI firm said that AvatarFX is built on flow-based diffusion models. The base model was built from the ground up using the Diffusion Transformer (DiT) architecture, which uses a transformer rather than a convolutional U-Net as the denoising backbone. Character AI claims that it also used a new inference strategy, which preserves visual quality, motion consistency, and expressive diversity even in longer-duration videos.
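
For readers unfamiliar with the term, a flow-based generative model learns a velocity field and produces a sample by starting from random noise and following that field over a fixed time interval. The Python sketch below illustrates only this general idea on a toy three-number "sample". It is not Character AI's code: the company has not published its sampler, and the names `toy_velocity` and `sample` are hypothetical. A closed-form velocity that pulls every point towards a fixed target stands in for the trained network.

```python
import random


def toy_velocity(x, t, target):
    """Hypothetical stand-in for a learned velocity network.

    For a straight-line path from noise to a single target point,
    the velocity at time t is (target - x) / (1 - t).
    """
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def sample(target, steps=50, seed=0):
    """Integrate dx/dt = v(x, t) from t=0 (noise) towards t=1 using Euler steps."""
    rng = random.Random(seed)
    # Start from Gaussian noise with the same length as the target.
    x = [rng.gauss(0.0, 1.0) for _ in target]
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t never reaches 1.0, so the division above is safe
        v = toy_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    print(sample([0.5, -1.0, 2.0]))
```

In a real video model the velocity would come from a large neural network and the sample would be a sequence of frames (or their compressed representation) rather than three numbers, but the sampling loop has the same shape.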

AI-powered video generation models always raise concerns about deepfakes and potentially harmful content. Character AI stated that it has taken several measures to minimise such risks. AvatarFX has built-in safety filters that check the dialogue users write for the videos.

The AI model also rejects video generation requests that use images of minors, high-profile politicians, and notable figures. Other human photos are made unrecognisable in the generation process, so the person cannot be identified from the video, the company claimed. Additionally, each video is watermarked to let viewers know it is AI-generated. However, the company did not specify whether the watermark is only a visible overlay on the video or is also embedded in its metadata.

Character AI said it has also added a set of new terms for the feature, which prohibit AvatarFX’s usage for impersonation, bullying, deepfakes, and the use of protected IP without permission. Violations would result in a “strict” one-strike ban.