ML Pseudoscience

NOTE: An edited version of this article appeared on The Skeptic (UK).
Computer science is science, right? It’s in the name.
This is actually a topic of some debate. Is it a scientific discipline? An engineering discipline? A branch of mathematics?
Yes. All of that, and more. Probably best to describe it as a “multi-disciplinary field” and note the intersection points with mathematics, cognitive science, linguistics, physics, and others.
As to why it matters, your mindset and training can have a significant impact on the way you approach questions, and how you answer them.
In my experience (nearly thirty years in IT), most “computer science” people have an engineering mindset. Their approach is to understand and define a problem, then develop a plan and approach for solving it with available tools, focusing on the practical realities of the problem at hand. Whenever possible, the “good” ones will try to make improvements to existing tools, or create new tools, but will rarely go back to first principles.
Machine Learning (ML) is a branch of Artificial Intelligence (AI), in which a neural network is “trained” to recognize patterns, in order to identify those patterns in the future. 3Blue1Brown has an excellent overview of what neural networks are and how they work, but the basic idea involves repeatedly adjusting probability predictions according to a data set. Approaches to ML can, roughly, be broken into “supervised”, “unsupervised”, and “reinforcement” learning models.
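That idea of “repeatedly adjusting predictions according to a data set” can be sketched in a few lines. This is a toy illustration of supervised learning, a single-weight model trained on a made-up dataset; every name and number here is hypothetical, not any particular library’s API:

```python
# Toy supervised learning: adjust one weight so predictions
# better match labeled examples, one small step at a time.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (input, target): target = 2 * input

w = 0.0    # the model's single "weight", initially a guess
lr = 0.05  # learning rate: how big each adjustment is

for step in range(200):
    for x, y in data:
        pred = w * x
        error = pred - y
        w -= lr * error * x  # gradient of squared error with respect to w

print(round(w, 3))  # converges toward 2.0
```

A real neural network does the same thing with millions of weights and far messier data, but the loop is the same: predict, measure the error, nudge the weights, repeat.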
Someone with a scientific mindset might break ML into theory and application.
On the theory side, what does it mean when we say that a model learns, and how does it work? How can we evaluate the process and test what is learned? What factors affect the quality of the learning process, and how can these be investigated?
And, once we understand how it works, how can we apply it effectively? For what sort of problems is it well-suited? Are there cases where it is not an effective tool?
It appears, however, that many researchers approach ML from an engineering perspective, so they ask different questions. How can I use this? How can I make it better/faster/cheaper? What problems can I solve with this new tool?
This is where pseudoscience can rear its ugly head.
Several examples are described in a 2024 paper, “The reanimation of pseudoscience in machine learning and its ethical repercussions”, in which the authors describe the process by which pseudoscience and junk science are being “laundered” by ML.
ML has been demonstrated to be a very useful tool for facial recognition, and the accuracy of such identification has improved steadily over the years. In fact, the primary concerns about the technology are not accuracy, but rather ethics and security.
But facial recognition is a relatively “simple” problem, in that the goal is to maximize the accuracy of the process. For example, if face recognition is used for access control, such as on your phone, the goal is to minimize both the false-negative rate, where the tool rejects a valid face, and the false-positive rate, where the tool accepts an invalid face.
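As a rough sketch of those two error rates, with entirely hypothetical numbers:

```python
# Each trial records (is_authorized, was_accepted) for one unlock attempt.
# These numbers are invented purely for illustration.
trials = [
    (True, True), (True, True), (True, False),     # authorized attempts
    (False, False), (False, False), (False, True), # impostor attempts
]

false_negatives = sum(1 for auth, ok in trials if auth and not ok)
false_positives = sum(1 for auth, ok in trials if not auth and ok)
authorized = sum(1 for auth, _ in trials if auth)
impostors = sum(1 for auth, _ in trials if not auth)

fnr = false_negatives / authorized  # valid face rejected
fpr = false_positives / impostors   # invalid face accepted
print(fnr, fpr)
```

The problem is well-defined because both quantities are measurable against ground truth: we know who the phone’s owner is.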
But what if you apply ML to a badly-defined or invalid question?
One of the studies referenced (Alam et al.) gives the game away in the abstract. After describing autism in a very stigmatizing way as “a neurological illness characterized by deficits in cognition, physical activities, and social skills”, they admit that “there is no medical test to identify ASD”, but then state that “the human face can be used as a biomarker as it is one of the potential reflections of the brain and can thus be used as a simple and handy tool for early diagnosis”.
Wow. Where to begin?
Autism Spectrum Disorder (ASD, or autism) is a blanket term for a range of conditions, covering a group that ranges from non-verbal people requiring constant care, to highly articulate and successful people who have a different way of interacting with the world and processing sensory input. To describe it as an “illness” is, at best, obsolete and inappropriate.
To say that there is no “medical test” to identify autism is technically correct, if you assume they refer to a blood test, genetic test, or similar, but there are well-defined diagnostic criteria. From there, they leap into the long-debunked pseudoscience of physiognomy (which boils down to “well, you look autistic”), and try to find the “best fit” of hyperparameters (i.e., parameters associated with the machine learning process itself, rather than with what is being learned) in order to maximize the “accuracy” of their ML model.
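It is worth spelling out why maximizing “accuracy” proves nothing on its own. A toy sketch: even when the labels are pure noise, trying enough hyperparameter settings on a small dataset will turn up one that scores “well”. The threshold classifier and all numbers below are made up for illustration:

```python
import random

random.seed(0)

# Labels are pure noise: by construction, there is no real link to learn.
n = 30
features = [random.random() for _ in range(n)]
labels = [random.choice([0, 1]) for _ in range(n)]

# "Tune" a hyperparameter (here, a threshold) by trying many values and
# keeping whichever scores best on this same small dataset.
best = 0.0
for t in (i / 100 for i in range(101)):
    acc = sum((f > t) == bool(y) for f, y in zip(features, labels)) / n
    best = max(best, acc, 1 - acc)  # we may also flip the prediction

print(best)  # typically well above the 0.5 a noise baseline deserves
```

The “best” setting looks like evidence of a signal, but it is only the result of searching hard over settings on noise. That is the shape of the Alam et al. exercise: the tuning guarantees a flattering number whether or not any link exists.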
Have they demonstrated a link between facial features and autism? No.
Have they proposed a mechanism by which an autistic person might present certain facial features? No.
Have they provided details and an assessment of the dataset used for training, and explained how it was decided which faces were those of autistic people? No.
Have they provided enough detail to replicate their “study”? No.
Other studies described in the paper claim to train ML models on photos, voice recordings, or other biometric data to identify characteristics such as race, sexuality, mental illness, criminal propensity, and neuroticism. But without first demonstrating a link between some biometric trait and some individual characteristic, what do you get?
Nothing.
Worse than nothing.
What you get is a “study” which assumes the validity of the link, then searches for data points which can be claimed as evidence for future “studies”.
There’s a term for that. It’s called junk science, and it’s being used to try to establish the pseudoscience of physiognomy as “valid”.
Among many other problems, laundering pseudoscience and junk science in this way can lead to companies marketing their technology in new and worrisome ways.
Rather than selling solid technology for facial recognition, one company brags about using “advanced machine learning techniques” to provide “an array of classifiers”, which “represent a certain persona, with a unique personality type, a collection of personality traits or behaviors”, including “High IQ”, “Academic Researcher”, “Professional Poker Player”, “Bingo Player”, “Brand Promoter”, “White-Collar Offender”, “Terrorist”, and “Pedophile”.
That got very dark, very quickly, no?
How much time do we spend on-camera? Online at work? At airports? Malls? Sports events? Government offices? How about body-cameras used by law enforcement?
The ethical and privacy implications of our current level of surveillance are already of great concern to many people, but what if a person is identified as a terrorist simply because of an ML model “trained” using pseudoscience? What if a person has a job offer made or withheld due to a facial scan identifying them as “High IQ” or a “White-Collar Offender”?
What could possibly go wrong?
Cheers!