September 20, 2024

If it talks like a human…

Disarmingly lifelike: ChatGPT-4o will laugh at your jokes and your dumb hat

It’s amazing what a few well-placed chuckles and vocal tone shifts can do.

Kyle Orland – May 13, 2024 10:25 pm UTC

Oh you silly, silly human. Why are you so silly, you silly human? (Image: Aurich Lawson | Getty Images)

Further Reading: Major ChatGPT-4o update allows audio-video talks with an emotional AI chatbot

At this point, anyone with even a passing interest in AI is very familiar with the process of typing out messages to a chatbot and getting back long streams of text in response. Today’s announcement of ChatGPT-4o, which lets users converse with a chatbot using real-time audio and video, might seem like a mere lateral evolution of that basic interaction model.

After looking through over a dozen video demos OpenAI posted alongside today’s announcement, though, I think we’re on the verge of something more like a sea change in how we think of and work with large language models. While we don’t yet have access to ChatGPT-4o’s audio-visual features ourselves, the important non-verbal cues on display here, both from GPT-4o and from the users, make the chatbot instantly feel much more human. And I’m not sure the average user is fully ready for how they might feel about that.

It thinks it’s people

Take this video, where a newly expectant father looks to ChatGPT-4o for an opinion on a dad joke (“What do you call a giant pile of kittens? A meow-ntain!”). The old ChatGPT4 could easily type out the same responses of “Congrats on the upcoming addition to your family!” and “That’s perfectly hilarious. Definitely a top-tier dad joke.” But there’s definitely much more impact to hearing GPT-4o give that same information in the video, complete with the gentle laughter and rising and falling vocal intonations of a lifelong friend.

Or look at this video, where GPT-4o finds itself reacting to images of an adorable white dog. The AI assistant immediately dips into that high-pitched, baby-talk-ish vocal register that will be instantly familiar to anyone who has encountered a cute pet for the first time. It’s a convincing demonstration of what xkcd’s Randall Munroe famously identified as the “You’re a kitty!” effect, and it goes a long way to convincing you that GPT-4o, too, is just like people.

Not quite the world’s saddest birthday party, but probably close… (Image: OpenAI)

Then there’s a demo of a staged birthday party, where GPT-4o sings the “Happy Birthday” song with some deadpan dramatic pauses, self-conscious laughter, and even lightly altered lyrics before descending into some sort of silly raspberry-mouth-noise gibberish. Even if the prospect of asking an AI assistant to sing “Happy Birthday” to you is a little depressing, the specific presentation of that song here is imbued with an endearing gentleness that doesn’t feel very mechanical.

As I watched through OpenAI’s GPT-4o demos this afternoon, I found myself unconsciously breaking into a grin over and over as I encountered new, surprising examples of its vocal capabilities. Whether it’s a stereotypical sportscaster voice or a sarcastic Aubrey Plaza impression, it’s all incredibly disarming, especially for those of us used to LLM interactions being akin to text conversations.

Further Reading: Microsoft “lobotomized” AI-powered Bing Chat, and its fans aren’t happy

If these demos are at all indicative of ChatGPT-4o’s vocal capabilities, we’re going to see a whole new level of parasocial relationship developing between this AI assistant and its users. For years now, text-based chatbots have been exploiting human “cognitive glitches” to get people to believe they’re sentient. Add in the emotional component of GPT-4o’s accurate vocal tone shifts and wide swathes of the user base are liable to convince themselves that there’s actually a ghost in the machine.

See me, feel me, touch me, heal me

Beyond GPT-4o’s new non-verbal emotional register, the model’s speed of response also seems set to change the way we interact with chatbots. Reducing that response time gap from ChatGPT4’s two to three seconds down to GPT-4o’s claimed 320 milliseconds might not seem like much, but it’s a difference that adds up over time. You can see that difference in the real-time translation example, where the two conversants are able to carry on much more naturally because they don’t have to wait awkwardly between a sentence finishing and its translation beginning.

Kyle Orland has been the Senior Gaming Editor at Ars Technica since 2012, writing primarily about the business, tech, and culture behind video games. He has journalism and computer science degrees from University of Maryland. He once wrote a whole book about Minesweeper.

Promoted Comments

brewejon: If the released product is as good as the demos then this is quite amazing. My main concerns are (in no particular order):
this speaking voice will cause people to trust the answers way more than mere text answers, meaning more incorrect information being spread around. Also if you’re interacting via speech you’re probably less likely to stop and quickly fact check a statement by chatgpt via a normal internet search.
the amount of people claiming this AI is sentient is going to rise dramatically, and that’s going to be very annoying.
if this thing becomes as popular as I think it might, it’s going to have quite the environmental footprint. This at a time when we need to be reducing impacts.
May 13, 2024 at 10:43 pm

DaVuVuZeLa: I think this is quite the accomplishment. Not even Data understood comedy.

May 13, 2024 at 10:57 pm