Is It Time to Retire the Turing Test? Five Modern Evaluation Methods for Artificial Intelligence

Over 70 years ago, when artificial intelligence was conceptualized, Alan Turing published a paper that described how to identify it. It was later known as the Turing test, and it has been used for decades to distinguish between a human and an AI.

However, with the introduction of advanced AI chatbots like ChatGPT and Google Bard, it’s becoming more difficult to tell if you’re talking to an AI. This raises the question: is the Turing test outdated? And if it is, what are the alternatives?

Is the Turing Test Outdated?

Image Credit: Jesus Sanz/Shutterstock

To determine if the Turing test is outdated, you must first understand how it works. For an AI to pass the Turing test, it must convince a human interrogator that it’s a human. But there is a catch: the AI is evaluated alongside a human, and it must respond using text.

Think of it like this: if you’re the interrogator, asking questions by text to two online participants, one of which is an AI model, could you tell them apart after five minutes? Keep in mind that the objective of the Turing test is not to identify the AI model based on whether its answers are correct but to evaluate whether the AI can think or behave like a human.

The problem with the Turing test’s approach of only identifying human-like responses is that it doesn’t consider other factors, such as the intelligence of the AI model or the knowledge of the interrogator. Besides that, the Turing test is limited to text only, and it’s becoming more difficult to identify an AI that generates a human voice or deepfake videos that imitate human behavior.

However, current AI models like GPT-4 and Google Bard haven’t yet advanced to the point where they can consistently pass the Turing test. In fact, if you’re familiar with AI, you can often spot AI-generated text.

The 5 Best Turing Test Alternatives

It’s possible that future AI models like GPT-5 could pass the Turing test. If that happens, we would need different tests combined with the Turing test to identify if we’re talking to an AI or a human. Here are the best Turing test alternatives:

1. The Marcus Test

Gary Marcus, a renowned cognitive scientist and AI researcher, proposed an alternative to the Turing test in the New Yorker to assess the cognitive ability of an AI. The test is simple: you judge an AI model based on its ability to watch and understand YouTube videos and TV shows without subtitles or text. For the AI to pass the Marcus test, it should understand sarcasm, humor, irony, and the storyline when watching the videos and explain them like a human.

At the moment, GPT-4 can describe images, but no AI model can yet comprehend videos like a human. Self-driving vehicles come close, but they’re not completely autonomous and require sensors since they can’t make sense of everything in their surrounding environment.

2. The Visual Turing Test

According to a research paper published in PNAS, the visual Turing test can be used to identify if you’re talking to a human or an AI using image questionnaires. It works like the Turing test, but instead of answering questions in text, participants are shown images and expected to answer simple questions about them while thinking like a human. However, the visual Turing test is different from CAPTCHAs since all the answers are correct; to pass the test, the AI must process the images similarly to a human.

Beyond that, if an AI and a human are shown multiple images side by side and asked to identify realistic images, the human would have the cognitive ability to pass the test. This is because AI models find it difficult to distinguish images that don’t look like they were taken in the real world. In fact, that’s the reason why you can identify AI-generated images using anomalies that don’t make sense.

3. The Lovelace 2.0 Test

The theory that a computer can’t create original ideas beyond what it was programmed to do was first conceptualized by Ada Lovelace, long before the Turing test. However, Alan Turing objected to that theory, arguing that AI can still surprise humans. It wasn’t until 2001 that the guidelines for the Lovelace test were developed to tell apart an AI from a human, and, as per the Kurzweil Library, the rules were later revised in 2014.

For an AI to pass the Lovelace test, it must demonstrate that it can generate original ideas that exceed its training. Current AI models like GPT-4 don’t have the capability to come up with new inventions beyond our existing knowledge. However, artificial general intelligence could achieve that capability and pass the Lovelace test.

4. Reverse Turing Test

How about the Turing test, but done in reverse? Instead of trying to find out if you’re talking to a human, the objective of the reverse Turing test is to trick the AI into believing you’re an AI. However, you also need another AI model to answer the same questions using text.

For instance, if GPT-4 is the interrogator, you could enroll Google Bard and a human as participants. If the AI model can correctly identify the human participant based on the answers, it has passed the test.

The downside of the reverse Turing test is that it’s unreliable, especially considering that AI sometimes cannot differentiate between AI-generated and human-written content.

5. AI Classification Framework

According to the AI classification framework developed by Chris Saad, the Turing test is just one evaluation method to know if you’re talking to an AI. More specifically, the AI classification framework is based on the theory of multiple intelligences, which holds that human intelligence satisfies at least eight different criteria, including musical-rhythmic intelligence, logical-mathematical intelligence, visual identification, emotional intelligence, self-reflective intelligence, existential thinking ability, and body movement.

Since the AI is evaluated on eight different parameters, it’s unlikely to pass for a human even if it performs better than average in certain benchmarks. For instance, ChatGPT can solve math problems, describe images, and converse in a natural language like a human, but it would fail other categories defined in the AI classification framework.

The Turing Test Is Not Conclusive

The Turing test was meant to be more of a thought experiment than a conclusive test to differentiate between humans and AI. When it was initially proposed, it was the pivotal benchmark for measuring machine intelligence.

However, with the recent development of AI models with speech, visual, and hearing interactive capabilities, the Turing test falls short since it’s limited to text conversation. The most effective solution would be to introduce Turing test alternatives that further differentiate AI models from humans.
