September 7, 2024

beyond the videodrome

We made a cat drink a beer with Runway’s AI video generator, and it sprouted hands

Gen-3 Alpha produces wild and whimsical results. Here’s what it cooked up for us.

Benj Edwards – Jul 24, 2024 10:12 pm UTC

In June, Runway debuted a new text-to-video synthesis model called Gen-3 Alpha. It converts written descriptions called “prompts” into HD video clips without sound. We’ve since had a chance to use it and wanted to share our results. Our tests show that careful prompting isn’t as important as matching concepts likely found in the training data, and that achieving amusing results likely requires many generations and selective cherry-picking.

An enduring theme of all generative AI models we’ve seen since 2022 is that they can be excellent at mixing concepts found in training data but are typically very poor at generalizing (applying learned “knowledge” to new situations the model has not explicitly been trained on). That means they can excel at stylistic and thematic novelty but struggle at fundamental structural novelty that goes beyond the training data.

What does all that mean? In the case of Runway Gen-3, lack of generalization means you might ask for a sailing ship in a swirling cup of coffee, and provided that Gen-3’s training data includes video examples of sailing ships and swirling coffee, that’s an “easy” novel combination for the model to make fairly convincingly. But if you ask for a cat drinking a can of beer (in a beer commercial), it will generally fail because there aren’t likely many videos of photorealistic cats drinking human beverages in the training data. Instead, the model will pull from what it has learned about videos of cats and videos of beer commercials and combine them. The result is a cat with human hands pounding back a brewsky.

A few basic prompts

During the Gen-3 Alpha testing phase, we signed up for Runway’s Standard plan, which provides 625 credits for $15 a month, plus some bonus free trial credits. Each generation costs 10 credits per second of video, and we created 10-second videos for 100 credits apiece. So the number of generations we could make was limited.
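To make that budget concrete, here is a rough sketch of the arithmetic using only the figures quoted above (625 plan credits, 10 credits per second, 10-second clips). The function name is ours, chosen for illustration, and the bonus trial credits are left out because the amount isn't stated.

```python
# Figures quoted in the article for Runway's Standard plan.
PLAN_CREDITS = 625        # monthly credits; bonus trial credits excluded (amount not stated)
CREDITS_PER_SECOND = 10   # cost of one second of generated video
CLIP_SECONDS = 10         # length of each clip we generated


def clips_affordable(credits: int,
                     clip_seconds: int = CLIP_SECONDS,
                     credits_per_second: int = CREDITS_PER_SECOND) -> int:
    """Return how many whole clips a credit balance covers (hypothetical helper)."""
    cost_per_clip = clip_seconds * credits_per_second
    return credits // cost_per_clip


cost_per_clip = CLIP_SECONDS * CREDITS_PER_SECOND      # 100 credits
full_clips = clips_affordable(PLAN_CREDITS)            # 6 clips
leftover = PLAN_CREDITS - full_clips * cost_per_clip   # 25 credits

print(f"{full_clips} clips at {cost_per_clip} credits each, {leftover} credits left over")
```

On the plan credits alone, that works out to six 10-second clips a month, which is why rerunning prompts was not an option.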

We first tried a few standards from our image synthesis tests in the past, like cats drinking beer, barbarians with CRT TV sets, and queens of the universe. We also dipped into Ars Technica lore with the “moonshark,” our mascot. You’ll see all those results and more below.

We had so few credits that we couldn’t afford to rerun them and cherry-pick, so what you see for each prompt is exactly the single generation we received from Runway.

“A highly-intelligent person reading “Ars Technica” on their computer when the screen explodes”

“commercial for a new flaming cheeseburger from McDonald’s”

“The moonshark jumping out of a computer screen and attacking a person”

“A cat in a car drinking a can of beer, beer commercial”

“Will Smith eating spaghetti” triggered a filter, so we tried “a black man eating spaghetti.” (Watch until the end.)

“Robotic humanoid animals with vaudeville costumes roam the streets collecting protection money in tokens”

“A basketball player in a haunted passenger train car with a basketball court, and he is playing against a team of ghosts”

“A herd of one million cats running on a hillside, aerial view”

“video game footage of a dynamic 1990s third-person 3D platform game starring an anthropomorphic shark boy”

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a widely-cited tech historian. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

Promoted Comments

YetAnotherBoris
The cat picture at the top is an instant classic. If it doesn’t go viral this very second, I’ll be very disappointed.

So that’s at least one thing this video generator is good for, right there: manufacturing goofy memes.
July 24, 2024 at 10:38 pm

henryhbk
Is it me or is that Victorian woman’s head not attached in a normal way to her neck as she spins around? Also I had an early apple ][ and I certainly don’t recall a curved 30+ lcd screen and an external keyboard
July 24, 2024 at 10:55 pm

CatBus
Ooh, and it’s a Hemingway cat as well. Nice touch!
July 24, 2024 at 11:12 pm