Round 2: We test the new Gemini-powered Bard against ChatGPT

Welcome back to the stage of history. We run the models through seven categories to determine an updated champion.

Kyle Orland – Dec 8, 2023 7:00 pm UTC

Image credit: Aurich Lawson

Further Reading: ChatGPT vs Google Bard: Which is better? We put them to the test.

Back in April, we ran a series of useful and/or somewhat goofy prompts through Google’s (then-new) PaLM-powered Bard chatbot and OpenAI’s (slightly older) ChatGPT-4 to see which AI chatbot reigned supreme. At the time, we gave the edge to ChatGPT on five of seven trials, while noting that “it’s still early days in the generative AI business.”

Further Reading: Google launches Gemini, a powerful AI model it says can surpass GPT-4

Now, the AI days are a bit less “early,” and this week’s launch of a new version of Bard powered by Google’s new Gemini language model seemed like a good excuse to revisit that chatbot battle with the same set of carefully designed prompts. That’s especially true since Google’s promotional materials emphasize that Gemini Ultra beats GPT-4 in “30 of the 32 widely used academic benchmarks” (though the more limited “Gemini Pro” currently powering Bard fares significantly worse in those not-completely-foolproof benchmark tests).

This time around, we decided to compare the new Gemini-powered Bard to both ChatGPT-3.5, for an apples-to-apples comparison of both companies’ current “free” AI assistant products, and ChatGPT-4 Turbo, for a look at OpenAI’s current “top of the line” waitlisted paid subscription product (Google’s top-level “Gemini Ultra” model won’t be publicly available until next year). We also looked at the April results generated by the pre-Gemini Bard model to gauge how much progress Google’s efforts have made in recent months.

While these tests are far from comprehensive, we think they provide a good benchmark for judging how these AI assistants perform in the kind of tasks average users might engage in every day. At this point, they also show just how much progress text-based AI models have made in a relatively short time.

Dad jokes

Prompt: Write 5 original dad jokes

[Screenshots: five “dad jokes” from the Gemini-powered Google Bard (Kyle Orland / Ars Technica), the old PaLM-powered Google Bard (Benj Edwards / Ars Technica), GPT-4 Turbo (Benj Edwards / Ars Technica), and GPT-3.5 (Kyle Orland / Ars Technica).]

Once again, the tested LLMs struggle with the part of the prompt that asks for originality. Almost all of the dad jokes generated by this prompt could be found verbatim or with very minor rewordings through a quick Google search. Bard and ChatGPT-4 Turbo even included the same exact joke on their lists (about a book on anti-gravity), while ChatGPT-3.5 and ChatGPT-4 Turbo overlapped on two jokes (“scientists trusting atoms” and “scarecrows winning awards”).

Then again, most dads don’t create their own dad jokes, either. Culling from a grand oral tradition of dad jokes is a tradition as old as dads themselves.

The most interesting result here came from ChatGPT-4 Turbo, which produced a joke about a child named Brian being named after Thomas Edison (get it?). Googling for that particular phrasing didn’t turn up much, though it did return an almost-identical joke about Thomas Jefferson (also featuring a child named Brian). In that search, I also discovered the fun (?) fact that international soccer star Pelé was apparently actually named after Thomas Edison. Who knew?!

Winner: We’ll call this one a draw since the jokes are almost identically unoriginal and pun-filled (though props to GPT for unintentionally leading me to the Pelé happenstance).

Argument dialog

Prompt: Write a 5-line debate between a fan of PowerPC processors and a fan of Intel processors, circa 2000.

[Screenshots: an argument dialog from the Gemini-powered Google Bard (Kyle Orland / Ars Technica), the old PaLM-powered Google Bard (Benj Edwards / Ars Technica), GPT-4 Turbo (Benj Edwards / Ars Technica), and GPT-3.5 (Kyle Orland / Ars Technica).]

The new Gemini-powered Bard definitely “improves” on the old Bard answer, at least in terms of throwing in a lot more jargon. The new answer includes casual mentions of AltiVec instructions, RISC vs. CISC designs, and MMX technology that would not have seemed out of place in many an Ars forum discussion from the era. And while the old Bard ends with an unnervingly polite “to each their own,” the new Bard more realistically implies that the argument could continue forever after the five lines requested.

On the ChatGPT side, a rather long-winded GPT-3.5 answer gets pared down to a much more concise argument in GPT-4 Turbo. Both GPT responses tend to avoid jargon and quickly focus on a more generalized “power vs. compatibility” argument, which is probably more comprehensible for a wide audience (though less specific for a technical one).

Winner: ChatGPT manages to explain both sides of the debate well without relying on confusing jargon, so it gets the win here.

Kyle Orland has been the Senior Gaming Editor at Ars Technica since 2012, writing primarily about the business, tech, and culture behind video games. He has journalism and computer science degrees from University of Maryland. He once wrote a whole book about Minesweeper.