Shall we play another game?

It’s been over seventy-five years since Alan Turing came up with the Imitation Game in his 1950 paper, Computing Machinery and Intelligence, and I have previously written about the Turing Test.
After so many years, and so much work on Artificial Intelligence, there is a lot of criticism of the test, discussion of how to beat it, and systems which can now “beat” it, but not much on what comes next.
Does beating the Turing test mean that we have achieved AI? Is the test now obsolete? Irrelevant?
Maybe there’s a better test, like the ARC Prize I have discussed before? I’d say, probably not - while the ARC Prize is very interesting, it does not appear to focus on general intelligence.
Or maybe Dr Turing was on to something, and we should be continuing his work?
I think he was, and that we should.
In my opinion, the major weakness in the original Imitation Game was that Dr Turing underestimated the degree to which mimicry of intelligence can be interpreted as intelligence, via the human brain’s agency detection. Had he survived and continued his work, I think it very likely that he would have addressed this, most likely by extending the game and treating it as a conversation leading to a level of confidence, rather than a competition with a binary win/lose. I should also note that this was at the dawn of both computing and cognitive science, and Dr Turing appears to have been ahead of the curve in both.
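To make the idea concrete, one way to picture such a confidence-based game is as a sequence of exchanges, each of which nudges the interrogator’s estimate up or down. The sketch below is purely illustrative, not anything Dr Turing proposed: the likelihood ratios are invented numbers standing in for “how human did that answer seem?”.

```python
# Toy sketch: the Imitation Game as a running confidence estimate
# rather than a single binary verdict. All numbers are illustrative.

def update(prior: float, likelihood_ratio: float) -> float:
    """Bayesian update of P(human) given one exchange's evidence."""
    odds = prior / (1 - prior)
    odds *= likelihood_ratio
    return odds / (1 + odds)

confidence = 0.5  # start undecided
# Hypothetical per-exchange evidence: >1 favours "human", <1 favours "machine".
for lr in [1.5, 0.8, 2.0, 1.2]:
    confidence = update(confidence, lr)

print(f"P(human) after conversation: {confidence:.2f}")
```

The point of the shape, rather than the numbers, is that the game ends with a degree of belief the interrogator can report, instead of a forced win/lose call.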
With the vast amount we have learned since Dr Turing’s untimely death, we now recognize that there are likely multiple types of intelligence, and varying degrees to which each can be measured. So, in contrast to the original test, which attempted to distinguish between a human and a computer through a single test, consider a suite of tests designed to assess different types of intelligence.
Such tests might include “memory” tests, but measuring simple memorization of facts seems pointless. “Remembering” the significance of a prior fact, and how it relates to the current context, would be more relevant.
Other tests might attempt to identify behaviours which are unique to “natural” vs “synthetic” intelligences. One way to identify a “synthetic” intelligence might be to test for some version of the “strawberry problem”, where the subject is asked to count the number of occurrences of the letter “r” in the word “strawberry”. As a side-effect of the way LLMs (Large Language Models) tokenize their input, many models famously counted two, rather than three. While this specific example has been addressed in most major LLMs, the underlying issue is inherent to the training process, and unlikely to change fundamentally.
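A toy illustration of why tokenization causes this: a model sees subword tokens, not individual characters. The split below is a hypothetical example (real tokenizers vary), but it shows how reasoning over whole tokens can undercount letters.

```python
# Toy illustration of the "strawberry problem". The subword split is
# hypothetical; real tokenizers differ, but the principle is the same.

tokens = ["str", "aw", "berry"]  # one plausible subword split of "strawberry"

# A character-level count gets the right answer:
char_count = "strawberry".count("r")  # 3

# Reasoning at the token level ("which pieces contain an r?") loses
# the fact that "berry" contains two of them:
token_count = sum(1 for t in tokens if "r" in t)  # 2

print(char_count, token_count)  # 3 2
```

The model never fails at arithmetic here; it simply never sees the characters it is being asked about.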
On the other hand, it might be possible to identify a “natural” intelligence by attempting to “trigger” a cognitive bias which exists in humans, but which does not appear, or presents differently, in AI systems.
I would suggest starting with identifying “types” of intelligence which are relevant and reasonably testable, then building suites of tests by which each can be assessed. These tests can then be applied to a variety of entities, refined over time, and used to learn more about the practical nature of intelligence.
By assessing the scores of these tests, we can define guidelines and thresholds for scoring intelligence levels. I should note that I am not referring to “IQ” tests, but rather to tests intended to estimate the likelihood of a subject exhibiting a specific type of intelligence.
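As a rough sketch of what such a suite might look like in practice: each “type” of intelligence gets a set of tests, each test yields a score, and the per-type average is compared against a threshold to decide whether the subject exhibits evidence of that type. Everything here, the type names, scores, and threshold, is a hypothetical placeholder.

```python
# Minimal sketch of a test suite producing per-type confidence scores
# rather than a single pass/fail. All names and numbers are hypothetical.

from statistics import mean

# Each intelligence "type" maps to a list of test scores in [0, 1].
results = {
    "memory-in-context": [0.8, 0.7, 0.9],
    "empathy": [0.3, 0.4],
    "abstraction": [0.6, 0.9, 0.75],
}

THRESHOLD = 0.6  # illustrative cut-off for "exhibits this type"

for kind, scores in results.items():
    score = mean(scores)
    verdict = "evidence of" if score >= THRESHOLD else "no clear evidence of"
    print(f"{kind}: {score:.2f} -> {verdict} this type of intelligence")
```

Note that the output is a profile across types, not a single number, which is exactly what distinguishes this from an “IQ”-style score.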
Interestingly, such an endeavour would not only support ongoing research into AI, but would also help to clarify our understanding of intelligence more generally. As an example, if empathy is one of the “types” of intelligence measured, someone who experiences antisocial personality disorder might receive low scores in that type of intelligence, and higher scores in other types.
Similarly, some subjects might exhibit some types of intelligence, but not others. Understanding such cases would also contribute to our understanding of how these might relate to “general” intelligence, and might allow us to define an entity as exhibiting evidence of one type of intelligence, but not another.
Dr Turing’s original work in this area seems to have been well ahead of its time, as we are only now starting to understand the scope and applicability of the test he suggested so long ago. But, rather than simply noting that he was a genius and finding fault with the original test, I think we should be continuing his work, and recognizing his ongoing legacy as a pioneer, not only in computers and cryptography, but also in our understanding of intelligence, both natural and synthetic.
Cheers!