How to Fool an AI Using Adversarial Machine Learning and Other Tricks

From chatbots to that creepy Sophia robot, which seems to hint vaguely at world domination, AI is already here. Sophisticated algorithms use machine learning to tackle tasks such as driving cars and beating grandmasters at complex games.

Neural networks are trained on input data. Inputs are processed, and the parameters of the AI’s model are continually refined until it can produce the target outputs with greater accuracy.
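To make that loop concrete, here is a minimal sketch of a training run in PyTorch; the tiny model, random data, and learning rate are placeholders for illustration, not taken from any system mentioned in this article.

```python
import torch
import torch.nn as nn

# A tiny stand-in model: a single layer of artificial neurons.
model = nn.Linear(4, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

inputs = torch.randn(100, 4)   # example input data
targets = torch.randn(100, 1)  # the outputs we want the model to match

for epoch in range(200):
    predictions = model(inputs)           # process the inputs
    loss = loss_fn(predictions, targets)  # how far are we from the targets?
    optimizer.zero_grad()
    loss.backward()                       # work out how each parameter should change
    optimizer.step()                      # refine the parameters slightly
```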

Machine learning is all very impressive, but AI, in its current form at least, also has plenty of limitations. In fact, it can be fooled by coders who know what they are doing, or by humans who are willing to wear weird t-shirts.

The trick to fooling an AI lies in interfering with its input data: the further away from its training you can take it, the more useless it becomes.
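To make the idea concrete, here is a minimal sketch of the classic fast gradient sign method (FGSM), one of the simplest ways to interfere with input data: it nudges every pixel of an image in the direction that most increases the model’s loss. The model, label, and epsilon budget here are generic placeholders.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, label, epsilon=0.03):
    """Perturb each pixel slightly in the direction that
    increases the classifier's loss on the true label."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Step every pixel by +/- epsilon according to the sign of its gradient,
    # then clamp back to the valid pixel range.
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0, 1).detach()
```

Against a typical image classifier, a perturbation this small is usually invisible to a human, yet it can be enough to flip the predicted label.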

Here’s a Tesla self-driving into a $3.5 million private jet

World-class AI loses to a simple Go strategy

Games have long been a testing ground for AI and machine learning. Can AI beat the greatest humans at chess or poker? Can it come up with a flawless strategy for Go? The more complex the game, the more difficult the task.

Of course, it’s easy for both AI and humans to learn the Texas Hold ‘em starting hands, but to apply a correct winning strategy against pros is something else entirely. Most bots are not sophisticated enough to beat even average players, and would be caught and banned by online sites anyway.

But there have been a couple of notable successes. Libratus was the first AI system to beat top professionals at heads-up no-limit Hold ‘em, winning around $1.8 million worth of chips over a 20-day competition. Pluribus has also achieved a solid win rate over 10,000 hands of six-player no-limit against elite players.

The ancient Chinese game of Go is arguably even more complex, yet AI programs have been developed that can beat top professionals.

Here’s the catch, though. Although AI systems like KataGo can beat top-tier human opponents, they have recently been defeated by a very basic strategy. Researchers at UC Berkeley created an “adversarial policy” that exploited KataGo’s blind spots.

In simple terms, the adversary tricks KataGo into believing it is far enough ahead that it passes its turn. The adversary then passes as well, ending the game at a moment when it is actually ahead on points.

Self-driving cars and stop signs

The issue is that the further outside its training data you take an AI, the less accurate you can expect its outputs to be. By playing completely outside the realm of its training, KataGo’s adversary defeated it consistently and easily.

Not so bad in a game of Go; not great when the same concept is applied to real-world scenarios.

Back in 2017, researchers demonstrated that you could fool the vision systems used by self-driving cars by slightly defacing road signs with either graffiti-style markings or strips of tape. The systems would quite consistently read the defaced ‘Stop’ signs as ‘Speed Limit 45’.

The proposed solution was to coat signs in a material that resists graffiti and tape.

This again demonstrates interference with the AI’s input data. The same effect can be achieved digitally by applying a carefully crafted filter or patch to a picture.
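As a rough sketch of how such a digital patch attack works (deliberately simplified: a real attack would randomize the patch’s position, scale, and rotation so it survives printing and camera angles), one can optimize a small square of pixels to push a classifier toward a chosen wrong label:

```python
import torch
import torch.nn.functional as F

def train_adversarial_patch(model, images, target_class,
                            size=32, steps=200, lr=0.1):
    """Optimize a small square patch that pushes a classifier
    toward a chosen (wrong) class wherever it is pasted."""
    patch = torch.rand(1, 3, size, size, requires_grad=True)
    optimizer = torch.optim.Adam([patch], lr=lr)
    target = torch.full((images.size(0),), target_class, dtype=torch.long)
    for _ in range(steps):
        patched = images.clone()
        # Paste the patch into the corner of every image
        # (a real attack would vary its placement).
        patched[:, :, :size, :size] = patch.clamp(0, 1)
        loss = F.cross_entropy(model(patched), target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return patch.detach().clamp(0, 1)
```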

Granted, the technology is improving, and it certainly needs to before it can be released to the public at any scale, but it’s not quite there yet.

That t-shirt we mentioned

Of course, if we’re ever to stop pesky vandals crashing all the self-driving cars, we’ll need facial recognition, won’t we? Except that facial recognition currently isn’t very reliable, and it can be tricked by wearing a stupid t-shirt.

Researchers at IBM and MIT have come up with a colorful, kaleidoscope-patterned t-shirt that completely baffles digital surveillance systems: the printed pattern acts as adversarial noise over parts of the body, so the AI can no longer recognize the wearer as a person.

These t-shirts are known as adversarial examples, the physical counterpart of the adversarial policies used to beat the KataGo AI.