How Adversarial Attacks Work
What is an Adversarial Attack?
Machine learning algorithms accept inputs as numeric vectors. Designing an input in a specific way to get the wrong result from the model is called an adversarial attack.
How is this possible? No machine learning algorithm is perfect; all of them make mistakes, albeit rarely. However, machine learning models consist of a series of specific transformations, and most of these transformations turn out to be very sensitive to slight changes in input. Exploiting this sensitivity to modify an algorithm’s behavior is an important problem in AI security.
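To make this sensitivity concrete, here is a minimal sketch (not from the article) in which a small, deliberately chosen perturbation flips a model's decision. The model is a hypothetical logistic-regression classifier with invented, fixed weights; only NumPy is assumed.

```python
import numpy as np

def predict(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))

w = np.array([1.0, -2.0, 3.0, -4.0])  # illustrative, fixed weights
b = 0.0
x = np.array([0.5, 0.1, 0.2, 0.1])    # a "clean" input, classified as class 1

# For a linear model, the gradient of the logit w.r.t. the input is just w,
# so nudging each feature by eps against sign(w) is the fastest way to push
# the logit down while keeping the perturbation tiny (at most eps per feature).
eps = 0.2
x_adv = x - eps * np.sign(w)

print(predict(w, b, x))      # above 0.5: class 1
print(predict(w, b, x_adv))  # below 0.5: the decision has flipped
```

The perturbation changes no feature by more than 0.2, yet the predicted class flips; deeper networks are also (locally) compositions of such near-linear transformations, which is why the same trick scales to them.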
In this article we will show practical examples of the main types of attacks, explain why it is so easy to perform them, and discuss the security implications that stem from this technology.
Types of Adversarial Attacks
Here are the main types of hacks we will focus on:
- Non-targeted adversarial attack: the most general type of attack, where all you want is to make the classifier give an incorrect result.
- Targeted adversarial attack: a slightly more difficult attack, which aims to make the model output a particular class of your choosing for the input.
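The two objectives differ only in which loss gradient you follow. The sketch below (invented for illustration, not from the article) contrasts them on a hypothetical 3-class linear softmax model: the non-targeted attack ascends the loss of the true class, while the targeted attack descends the loss of an attacker-chosen class. The weights, input, and the (deliberately large) step size are all assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

W = np.array([[ 2.0,  0.0],
              [ 0.0,  2.0],
              [-1.0, -1.0]])   # illustrative weights: 3 classes x 2 features
x = np.array([1.0, 0.0])       # clean input, classified as class 0
true_class, target_class = 0, 2
eps = 0.8                      # exaggerated step size so one step suffices

def grad_loss(x, y):
    """Gradient of cross-entropy w.r.t. x for a linear model: (p - onehot(y)) @ W."""
    return (softmax(W @ x) - np.eye(3)[y]) @ W

# Non-targeted: ascend the true-class loss; any wrong answer counts as success.
x_nontarget = x + eps * np.sign(grad_loss(x, true_class))
# Targeted: descend the loss toward the attacker-chosen class.
x_target = x - eps * np.sign(grad_loss(x, target_class))

print(np.argmax(W @ x_nontarget))  # some class other than 0
print(np.argmax(W @ x_target))     # the chosen target, class 2
```

In practice both attacks iterate many small steps rather than one large one, but the sign of the gradient step is the entire difference between the two attack types.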
Attacks on sensors, and consequently on machine learning models, are a fairly recent topic (we will cover it here at some point in the future, but this article already shows its damaging potential well).
Cleverhans is a Python library that artificially injects a small amount of noise/perturbation into the network as a form of training against this type of attack.
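The training idea described above, usually called adversarial training, can be sketched without the library itself: at each step, the inputs are perturbed against the current model before the gradient update. The toy below is a minimal NumPy version for logistic regression; the data, step sizes, and perturbation budget are all invented for illustration and are not cleverhans's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two synthetic Gaussian clusters, one per class.
X = np.vstack([rng.normal(size=(200, 2)) + 1.5,
               rng.normal(size=(200, 2)) - 1.5])
y = np.array([1] * 200 + [0] * 200)

w, b = np.zeros(2), 0.0
lr, eps = 0.1, 0.3     # learning rate and perturbation budget (assumptions)

for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    # FGSM-style perturbation of each input against the current model:
    # for logistic regression the input-gradient of the loss is (p - y) * w.
    X_adv = X + eps * np.sign(np.outer(p - y, w))
    # Ordinary gradient step, but computed on the perturbed batch.
    p_adv = 1.0 / (1.0 + np.exp(-(X_adv @ w + b)))
    g = p_adv - y
    w -= lr * (X_adv.T @ g) / len(y)
    b -= lr * g.mean()

acc = np.mean((1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5) == (y == 1))
print(acc)  # clean accuracy after adversarial training
```

The effect is that the decision boundary is pushed away from the training points by at least the perturbation budget, which is what makes the resulting model harder to fool with small input changes.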
This repository contains the source code for cleverhans, a Python library to benchmark machine learning systems’ vulnerability to adversarial examples. You can learn more about such vulnerabilities on the accompanying blog.
The cleverhans library is under continual development, always welcoming contributions of the latest attacks and defenses. In particular, we always welcome help towards resolving the issues currently open.
About the name
The name cleverhans is a reference to a presentation by Bob Sturm titled “Clever Hans, Clever Algorithms: Are Your Machine Learnings Learning What You Think?” and the corresponding publication, “A Simple Method to Determine if a Music Information Retrieval System is a ‘Horse’.” Clever Hans was a horse that appeared to have learned to answer arithmetic questions, but had in fact only learned to read social cues that enabled him to give the correct answer. In controlled settings where he could not see people’s faces or receive other feedback, he was unable to answer the same questions. The story of Clever Hans is a metaphor for machine learning systems that may achieve very high accuracy on a test set drawn from the same distribution as the training data, but that do not actually understand the underlying task and perform poorly on other inputs.