Review and discussion of the paper 'Adversarial Attacks and Defenses in Deep Learning' by Kui Ren, Tianhang Zheng, Zhan Qin, Xue Liu (2020) for Machine Learning Book Club.
2. On April 23, 2013, Syrian hackers compromised the Associated Press
Twitter feed and tweeted, "Breaking: Two Explosions in the White House
and Barack Obama is injured".
In response to the tweet, US stock markets briefly lost an estimated
$136 billion in value (although the drop was reversed about 3 minutes later).
3. What are adversarial attacks and why should you care?
• Any attempt to fool a deep learning model with deceptive input
• Especially researched in image recognition, but can also be applied to audio, text or tabular data
• When building models, we mostly focus on classification effectiveness / minimizing error. Relatively little work on model security and robustness.
• Imperceptible amounts of non-random noise can fool neural networks!
• Some of these attacks are 100% effective in fooling normally trained neural networks!
7. What I'll talk about
• Threat models
• Some background terminology
• Notable adversarial models
• Notable adversarial defenses
• Trends and remaining challenges
• Code
8. Level of threat
• White-box: full knowledge of model architecture and parameters
• Gray-box: knowledge limited to features and model type
• Black-box: no/minimal knowledge of the model; the attacker can only use its output
All non-adversarially trained models are susceptible, even to black-box attacks.
Adversarially trained models are still susceptible to white-box attacks.
9. Background
Adversarial loss: J(θ, x, y), where θ = model weights
An adversarial sample x′ satisfies D(x, x′) < η (a predefined distance constraint, the perturbation budget)
• Idea: find the minimal perturbation such that f(x′) = y′, a label different from the true label y
10. Adversarial samples should be indistinguishable from benign samples
Distance metrics:
• L2 distance: the Euclidean distance (square root of the summed squared differences) between adversarial and benign image
• L∞ distance: maximum element-wise difference between adversarial and benign image
(for each pixel, take the absolute difference between x and x′, and return the
largest such difference found over all pixels)
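A minimal numpy sketch of these two metrics (variable names are illustrative, not from the paper):

```python
import numpy as np

def l2_distance(x, x_adv):
    # Euclidean distance: sqrt of the sum of squared pixel differences
    return np.sqrt(np.sum((x - x_adv) ** 2))

def linf_distance(x, x_adv):
    # largest absolute per-pixel difference
    return np.max(np.abs(x - x_adv))

# A sample counts as adversarial under budget eta if D(x, x_adv) < eta.
x = np.random.rand(28, 28)                            # stand-in benign image
x_adv = x + np.random.uniform(-0.01, 0.01, x.shape)   # tiny perturbation
print(l2_distance(x, x_adv), linf_distance(x, x_adv))
```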
11. Notable adversarial models
Limited-memory BFGS (L-BFGS)
Grid search / line search to find the optimal hyperparameter
Carlini and Wagner (C&W) attack
A set of optimization-based attacks that generate L0-, L2- and L∞-norm-measured adversarial samples, with a confidence parameter (kappa) and box constraints to make sure a valid image is produced
100% attack success on "normal" neural networks trained on MNIST, CIFAR-10 and ImageNet
Also compromised defensive models such as defensive distillation
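A hedged PyTorch sketch of the C&W L2 formulation, assuming a model that returns logits and a batch of target labels; the constants c, kappa, and the step counts are illustrative defaults, not the authors' settings:

```python
import torch

def cw_l2_attack(model, x, target, c=1.0, kappa=0.0, steps=200, lr=0.01):
    # tanh change of variables keeps x_adv inside the valid [0, 1] image box
    w = torch.atanh((x * 2 - 1).clamp(-0.999, 0.999)).detach().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        x_adv = 0.5 * (torch.tanh(w) + 1)
        logits = model(x_adv)
        # f(x'): gap between the best non-target logit and the target logit,
        # floored at -kappa so the attack can demand extra confidence
        one_hot = torch.nn.functional.one_hot(target, logits.size(-1)).bool()
        target_logit = logits[one_hot]
        other_logit = logits.masked_fill(one_hot, float('-inf')).max(dim=-1).values
        f = torch.clamp(other_logit - target_logit, min=-kappa)
        # minimize L2 distortion plus c times the misclassification term
        loss = ((x_adv - x) ** 2).sum() + c * f.sum()
        opt.zero_grad(); loss.backward(); opt.step()
    return (0.5 * (torch.tanh(w) + 1)).detach()
```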
12. Notable adversarial models
DeepFool
"Iterative linearization of the classifier to generate minimal perturbations that are sufficient to change classification labels"
Computes minimal perturbations more reliably than single-step gradient methods
Moosavi-Dezfooli et al., https://arxiv.org/pdf/1511.04599.pdf
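A minimal sketch of DeepFool's linearization idea for the simplified binary case; f and grad_f are assumed callables returning the signed decision score and its gradient (the multi-class version in the paper instead steps toward the nearest class boundary):

```python
import numpy as np

def deepfool_binary(x, f, grad_f, max_iter=50, overshoot=0.02):
    x_adv = x.copy()
    for _ in range(max_iter):
        score = f(x_adv)
        if np.sign(score) != np.sign(f(x)):   # label flipped: done
            break
        w = grad_f(x_adv)
        # minimal step to the linearized decision boundary f(x) ~ 0
        r = -score * w / (np.dot(w, w) + 1e-12)
        x_adv = x_adv + (1 + overshoot) * r   # small overshoot to cross it
    return x_adv
```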
14. Notable adversarial models
Universal adversarial attack
• Is there a universal perturbation that will work on most samples?
• L-BFGS-based
• Effective in attacking networks such as CaffeNet, GoogLeNet, VGG and ResNet
• Fooling rate > 53%
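A rough sketch of the universal-perturbation loop, assuming a hypothetical per-sample attack minimal_perturbation (e.g. a DeepFool-style step) and a predict function; the projection radius xi is illustrative:

```python
import numpy as np

def universal_perturbation(images, predict, minimal_perturbation,
                           xi=10.0, n_epochs=5):
    v = np.zeros_like(images[0])
    for _ in range(n_epochs):
        for x in images:
            # only grow v on samples it does not already fool
            if predict(x + v) == predict(x):
                dv = minimal_perturbation(x + v)   # extra nudge for this x
                v = v + dv
                # project back onto the L2 ball of radius xi
                norm = np.linalg.norm(v)
                if norm > xi:
                    v = v * (xi / norm)
    return v
```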
16. Text & Audio Models
A 1% audio perturbation can change 50 words in a text transcription!
Attacks are robust to MP3 compression, but get lost when played over speakers.
https://nicholas.carlini.com/code/audio_adversarial_examples/
Strategies for text attacks generally include deleting, inserting, and modifying characters/words
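A toy illustration of those three edit operations; real attacks choose which positions and tokens to edit so as to maximize the model's loss, this only shows the operations themselves:

```python
import random

def perturb(text, op):
    i = random.randrange(len(text))
    letters = "abcdefghijklmnopqrstuvwxyz"
    if op == "delete":
        return text[:i] + text[i + 1:]
    if op == "insert":
        return text[:i] + random.choice(letters) + text[i:]
    if op == "modify":
        return text[:i] + random.choice(letters) + text[i + 1:]
    return text

print(perturb("adversarial", "modify"))  # e.g. "advqrsarial"
```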
17. Adversarial defenses fall into 5 categories
1. Training on adversarial samples
2. Randomization
3. Adding noise
4. Removing noise
5. Mathematically provable defenses
18. Defang: Randomize input or features
• Randomly padding and resizing the input; image transformations with randomness (sketched below)
19. • Add a random noise layer before each convolutional layer at training and test time (random self-ensemble, RSE)
• Random feature pruning at each layer
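A sketch of the random resize-and-pad idea from the first bullet, assuming PyTorch image batches; the output size is illustrative:

```python
import random
import torch
import torch.nn.functional as F

def random_resize_pad(x, out_size=331):
    # x: (batch, channels, H, W); resize to a random intermediate size
    new_size = random.randint(x.shape[-1], out_size - 1)
    x = F.interpolate(x, size=(new_size, new_size), mode="nearest")
    # then pad randomly on each side up to the fixed output size
    pad_left = random.randint(0, out_size - new_size)
    pad_top = random.randint(0, out_size - new_size)
    return F.pad(x, (pad_left, out_size - new_size - pad_left,
                     pad_top, out_size - new_size - pad_top))
```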
20. Detect: Denoise the input or features
• Conventional input rectification
• "Squeeze" the image: if the model's output on the squeezed image is very different from its output on the original, the input is likely adversarial (sketched below)
• GAN-based
  • Use a GAN to learn the benign data distribution
  • Generate a benign projection for the adversarial sample
• Autoencoder-based
  • Detector & reformer
  • Use an autoencoder to compress the input and learn the manifold of benign samples
  • The detector compares each sample to the learnt manifold
  • The reformer rectifies adversarial samples
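A minimal sketch of the "squeeze and compare" detector, assuming a PyTorch classifier; bit-depth reduction stands in for the squeezing step, and the threshold is illustrative (it would be tuned on benign data):

```python
import torch

def reduce_bit_depth(x, bits=4):
    # quantize pixel values in [0, 1] to 2**bits levels
    levels = 2 ** bits - 1
    return torch.round(x * levels) / levels

def is_adversarial(model, x, threshold=1.0):
    with torch.no_grad():
        p = torch.softmax(model(x), dim=-1)
        p_squeezed = torch.softmax(model(reduce_bit_depth(x)), dim=-1)
    # a large output change under a mild squeeze suggests an adversarial input
    return (p - p_squeezed).abs().sum(dim=-1) > threshold
```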
21. Detect: Denoise the input or features
• High-level representation guided denoiser (HGD)
  • Trains a denoising U-Net with a feature-level loss function to minimize the feature difference between benign and adversarial samples (sketched below)
  • Won first place in the NIPS 2017 black-box defense competition
  • Even so, certain (white-box) attacks can reduce its effectiveness to 0%
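A sketch of the feature-level loss HGD trains with; denoiser (the U-Net) and features (a high-level layer of a fixed pretrained classifier) are hypothetical placeholders, not names from the paper:

```python
import torch

def hgd_loss(denoiser, features, x_benign, x_adv):
    denoised = denoiser(x_adv)
    # match high-level classifier features, not raw pixels
    return (features(denoised) - features(x_benign)).abs().mean()
```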
22. Provable (certified) defenses
• Defenses with theoretical guarantees of a certain accuracy against attacks
• The range of defenses includes KNN-based and Bayesian-based defenses
• Consistency-based defenses:
  • Perturbations also affect the area around them
  • > 90% detection rate
  • Very computationally intensive
23. Trends in adversarial research
• Design stronger attacks to probe for weaknesses
• Real-world attack capabilities
• Certified defenses, though currently not scalable
"A problem is that an attack can only target one category of defenses, but defenses are required to … be effective against all possible attack methods"
• Analyzing model robustness, so far mostly done on KNN and linear classifiers
24. Unresolved challenges
• Causality
  • Does a general robust decision boundary exist that could be learnt by (certain) neural networks?
• Effectiveness vs. efficiency
  • Adversarial training is effective, but requires a lot of data and compute
  • Randomization and denoising strategies are very efficient, but not as effective as claimed
25. Discussion
In what other ways are models not robust?
Is model robustness/security applicable to what you do / to our students?
Model fairness has been a hot topic lately, but robustness/security seems to lag behind. What do you think needs to change for adversarial training to be widely implemented?
What are your thoughts on the paper in general?
26. Try it yourself
Benchmark machine learning systems' vulnerability to adversarial examples:
https://github.com/cleverhans-lab/cleverhans
Blog: cleverhans.io
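A quick-start sketch running FGSM from CleverHans on a PyTorch model; the import path follows the v4 layout of the repository and may differ in other versions, and the model here is a stand-in:

```python
import numpy as np
import torch
from cleverhans.torch.attacks.fast_gradient_method import fast_gradient_method

# stand-in classifier returning logits; substitute your own trained model
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
x = torch.rand(4, 3, 32, 32)

# eps is the L-infinity perturbation budget (the eta from slide 9)
x_adv = fast_gradient_method(model, x, eps=0.03, norm=np.inf)
print(model(x).argmax(dim=-1), model(x_adv).argmax(dim=-1))
```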