Google unveiled its latest artificial intelligence (AI) model, Lumiere, last week. The multimodal video generation model can produce 5-second-long videos. It supports both text-to-video and image-to-video generation and joins existing AI models such as Runway Gen-2 and Pika 1.0. According to Google, Lumiere uses a Space-Time U-Net (STUNet) architecture that changes how motion is generated in an AI video, making it appear more realistic. The model is not yet open to the public.
In an accompanying preprint paper, the research team behind Lumiere explained that the major innovation in motion comes from generating the entire video in a single pass instead of piecing together still frames. As a result, both the spatial (the objects in the video) and temporal (how things move around in the video) aspects of the video are generated simultaneously. For the viewer, this makes motion look closer to how it occurs in nature. To achieve this, Lumiere generates 80 frames per clip, compared with Stable Video Diffusion’s 25 frames.
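As a quick sanity check on how the frame count relates to clip length, here is a minimal sketch. The 80-frame figure comes from the article; the 16 frames-per-second rate is an assumption taken from the Lumiere preprint rather than from this article, and the function name is purely illustrative.

```python
def clip_duration_seconds(num_frames, fps):
    """Return the playback length of a clip in seconds."""
    return num_frames / fps


LUMIERE_FRAMES = 80  # frame count reported in the article
ASSUMED_FPS = 16     # assumption: frame rate stated in the preprint, not in this article

duration = clip_duration_seconds(LUMIERE_FRAMES, ASSUMED_FPS)
print(duration)  # 5.0
```

Under that assumed frame rate, 80 frames works out to the 5-second clip length mentioned above.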
“By deploying both spatial and (importantly) temporal down- and up-sampling and leveraging a pre-trained text-to-image diffusion model, our model learns to directly generate a full-frame-rate, low-resolution video by processing it in multiple space-time scales,” the paper added.
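To give a rough feel for what down- and up-sampling in both space and time means, here is a toy sketch in plain Python. It is not Lumiere's architecture: the actual model uses learned layers inside a diffusion network, whereas this example uses simple average pooling and value repetition on a tiny made-up clip. All function names and the factor of 2 are assumptions chosen for illustration.

```python
def downsample_space_time(video, factor=2):
    """Average-pool video[t][y][x] by `factor` along time, height, and width."""
    t_len, height, width = len(video), len(video[0]), len(video[0][0])
    out = []
    for t in range(0, t_len - factor + 1, factor):
        frame = []
        for y in range(0, height - factor + 1, factor):
            row = []
            for x in range(0, width - factor + 1, factor):
                # Collect one factor x factor x factor block spanning time and space.
                block = [
                    video[t + dt][y + dy][x + dx]
                    for dt in range(factor)
                    for dy in range(factor)
                    for dx in range(factor)
                ]
                row.append(sum(block) / len(block))
            frame.append(row)
        out.append(frame)
    return out


def upsample_space_time(video, factor=2):
    """Nearest-neighbour upsampling: repeat each value `factor` times per axis."""
    out = []
    for frame in video:
        up_frame = []
        for row in frame:
            up_row = [value for value in row for _ in range(factor)]
            up_frame.extend(list(up_row) for _ in range(factor))
        # Repeat the enlarged frame along the time axis.
        out.extend([list(r) for r in up_frame] for _ in range(factor))
    return out


def shape(video):
    """Return (frames, height, width) of a nested-list video."""
    return (len(video), len(video[0]), len(video[0][0]))


# Toy clip: 8 frames of 4x4 pixels, where every pixel equals its frame index.
clip = [[[float(t) for _ in range(4)] for _ in range(4)] for t in range(8)]
coarse = downsample_space_time(clip)
restored = upsample_space_time(coarse)
print(shape(clip), shape(coarse), shape(restored))
```

The coarse clip has half as many frames and half the resolution of the original, which is the sense in which the video is processed at a smaller space-time scale before being brought back to its full frame count.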
While Google Lumiere cannot be tested at the moment, its website is live, and enthusiasts can view various videos created using the AI model along with the text prompts and input images used to create them. The model can also generate videos in various styles, create cinemagraphs that animate only a selected part of an image, and perform inpainting, where it fills in a masked-out region of a video or image based on the prompt.
Google’s latest AI video generation tool competes with existing AI models such as Runway Gen-2, which was launched in March 2023, and Pika Labs’ Pika 1.0, both of which are accessible to the public. While Pika can create 3-second-long videos (which can be extended by 4 more seconds), Runway can generate videos up to 4 seconds long. Both models are multimodal and allow video editing as well.