Passing the test — If AI is making the Turing test obsolete, what might be better? The Turing test focuses on the ability to chat; can we test the ability to think?
Rupendra Brahambhatt – Dec 15, 2023 12:16 am UTC
If a machine or an AI program matches or surpasses human intelligence, does that mean it can simulate humans perfectly? If yes, then what about reasoning: our ability to apply logic and think rationally before making decisions? How could we even identify whether an AI program can reason? To try to answer this question, a team of researchers has proposed a novel framework that works like a psychological study for software.
“This test treats an ‘intelligent’ program as though it were a participant in a psychological study and has three steps: (a) test the program in a set of experiments examining its inferences, (b) test its understanding of its own way of reasoning, and (c) examine, if possible, the cognitive adequacy of the source code for the program,” the researchers note.
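As a rough illustration, the three steps could be wired together like this. This is a minimal sketch in which every class name, method, and pass criterion is my own assumption, not code from the paper:

```python
from dataclasses import dataclass

@dataclass
class Task:
    """One inference task with a human-normed reference answer."""
    prompt: str
    human_normed_answer: str

def evaluate(program, tasks):
    """Treat `program` like a participant in a psychological study."""
    # Step (a): test the program's inferences against human-normed answers.
    inference_score = sum(
        program.answer(t.prompt) == t.human_normed_answer for t in tasks
    ) / len(tasks)

    # Step (b): test its understanding of its own reasoning. As a crude
    # proxy, check that its explanation mentions the answer it actually gave.
    self_knowledge = all(
        program.answer(t.prompt) in program.explain(t.prompt) for t in tasks
    )

    # Step (c): examine the cognitive adequacy of the source code, if
    # possible. That needs human judgment; here we only record whether
    # the source is available for review at all.
    source_reviewable = getattr(program, "source_code", None) is not None

    return {
        "inference": inference_score,
        "self_knowledge": self_knowledge,
        "source_reviewable": source_reviewable,
    }
```

Step (c) is the part no automated harness can settle: judging whether a program's mechanisms are cognitively plausible remains a human reviewer's call.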
They suggest the standard methods of evaluating a machine's intelligence, such as the Turing Test, can only tell you if the machine is good at processing information and mimicking human responses. The current generations of AI programs, such as Google's LaMDA and OpenAI's ChatGPT, for example, have come close to passing the Turing Test, yet the test results don't imply these programs can think and reason like humans.
This is why the Turing Test may no longer be relevant, and there is a need for new evaluation methods that could effectively assess the intelligence of machines, according to the researchers. They claim that their framework could be an alternative to the Turing Test. "We propose to replace the Turing test with a more focused and fundamental one to answer the question: do programs reason in the way that humans reason?" the study authors argue.

What's wrong with the Turing Test?
During the Turing Test, evaluators play different games involving text-based communications with real humans and AI programs (machines or chatbots). It is a blind test, so evaluators don't know whether they are texting with a human or a chatbot. If the AI programs are successful in generating human-like responses, to the extent that evaluators struggle to distinguish between the human and the AI program, the AI is considered to have passed. However, since the Turing Test is based on subjective interpretation, these results are also subjective.
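The outcome of such a blind test is typically summarized as a deception rate: how often evaluators judged a machine counterpart to be human. A toy scoring sketch, with function names and criterion of my own choosing rather than any formal protocol from the study:

```python
def deception_rate(guesses, truths):
    """Fraction of machine trials in which the evaluator judged 'human'.

    guesses: the evaluator's blind verdicts, 'human' or 'machine'.
    truths:  True if the counterpart really was human, False if a machine.
    """
    machine_trials = [g for g, is_human in zip(guesses, truths) if not is_human]
    if not machine_trials:
        return 0.0
    return sum(g == "human" for g in machine_trials) / len(machine_trials)
```

A rate near zero means evaluators reliably spotted the machine; a high rate means the machine's imitation fooled them, which, as the article argues, still says nothing about whether it can reason.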
The researchers suggest that there are several limitations associated with the Turing Test. For instance, many of the games played during the test are imitation games designed to test whether a machine can imitate a human. The evaluators make decisions solely based on the language or tone of the messages they receive. ChatGPT is great at mimicking human language, even in responses where it gives out incorrect information. So the test clearly doesn't evaluate a machine's reasoning and logical ability.
The results of the Turing Test also can't tell you if a machine can introspect. We often think about our past actions and reflect on our lives and decisions, a critical ability that prevents us from repeating the same mistakes. The same applies to AI, according to a study from Stanford University suggesting that machines capable of self-reflection are more practical for human use.
"AI agents that can leverage prior experience and adapt well by efficiently exploring new or changing environments will lead to much more adaptive, flexible technologies, from household robotics to personalized learning tools," said Nick Haber, an assistant professor at Stanford University who was not involved in the current study.
In addition to this, the Turing Test fails to analyze an AI program's ability to think. In a recent Turing Test experiment, GPT-4 was able to convince evaluators that they were texting with humans over 40 percent of the time. However, this score fails to answer the basic question: Can the AI program think?
Alan Turing, the famous British scientist who created the Turing Test, once said, "A computer would deserve to be called intelligent if it could deceive a human into believing that it was human." His test covers only one aspect of human intelligence, though: imitation. Although it is possible to deceive someone using this one aspect, many experts believe that imitation alone can never amount to true human intelligence.
"It's unclear whether passing the Turing Test is a meaningful milestone or not. It doesn't tell us anything about what a system can do or understand, anything about whether it has established complex inner monologues or can engage in planning over abstract time horizons, which is key to human intelligence," Mustafa Suleyman, an AI expert and co-founder of DeepMind, told Bloomberg.