Review and discussion of the paper 'Adversarial Attacks and Defenses in Deep Learning' by Kui Ren, Tianhang Zheng, Zhan Qin, Xue Liu (2020) for Machine Learning Book Club.
2. On April 23, 2013, Syrian hackers compromised the Associated Press
Twitter feed and tweeted, "Breaking: Two Explosions in the White House
and Barack Obama is injured".
In response to the tweet, US stock markets briefly lost an estimated
$136 billion in value (although the drop was reversed about 3 minutes later).
3. What are adversarial attacks and why should you care?
• Any attempt to fool a deep learning model with deceptive input
• Especially researched in image recognition, but can also be applied to audio, text or tabular data
• When building models, we mostly focus on classification effectiveness / minimizing error. Relatively little work on model security and robustness.
• Imperceptible amounts of non-random noise can fool neural networks!
• Some of these attacks are 100% effective in fooling normally trained neural networks!
7. What I'll talk about
• Threat models
• Some background terminology
• Notable adversarial models
• Notable adversarial defenses
• Trends and remaining challenges
• Code
8. Level of threat
• White-box: full knowledge of model architecture and parameters
• Gray-box: knowledge limited to features and model type
• Black-box: no/minimal knowledge of the model; the attacker can only use its output
All non-adversarially trained models are susceptible, even to black-box attacks.
Adversarially trained models are still susceptible to white-box attacks.
9. Background
Adversarial loss: J(θ, x, y), where θ = model weights
An adversarial sample x′ satisfies D(x, x′) < η (a predefined distance constraint, the perturbation budget)
• Idea: find the minimal perturbation such that f(x′) = y′, a label different from the true label y
10. Adversarial samples should be indistinguishable from benign samples
Distance metrics:
• L2 distance: the Euclidean distance (square root of the summed squared differences) between adversarial and benign image
• L∞ distance: maximum element-wise difference between adversarial and benign image
(for each pixel, take the absolute difference between x and x′, and return the
largest such difference found over all pixels)
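A minimal numpy sketch of these two metrics (variable names are illustrative, not from the paper):

```python
import numpy as np

def l2_distance(x, x_adv):
    # Euclidean distance: sqrt of the sum of squared pixel differences
    return np.sqrt(np.sum((x - x_adv) ** 2))

def linf_distance(x, x_adv):
    # largest absolute per-pixel difference
    return np.max(np.abs(x - x_adv))

# A sample counts as adversarial under budget eta if D(x, x_adv) < eta.
x = np.random.rand(28, 28)                            # stand-in benign image
x_adv = x + np.random.uniform(-0.01, 0.01, x.shape)   # tiny perturbation
print(l2_distance(x, x_adv), linf_distance(x, x_adv))
```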
11. Notable adversarial models
Limited-memory BFGS (L-BFGS)
Grid search / line search to find the optimal hyperparameter
Carlini and Wagner (C&W) attack
A set of optimization-based attacks that generate L0-, L2- and L∞-norm-measured adversarial samples, with a confidence parameter (kappa) and box constraints to make sure a valid image is produced
100% attack success on "normal" neural networks trained on MNIST, CIFAR-10 and ImageNet
Also compromised defensive models such as defensive distillation
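A hedged PyTorch sketch of the C&W L2 formulation, assuming a model that returns logits and a batch of target labels; the constants c, kappa, and the step counts are illustrative defaults, not the authors' settings:

```python
import torch

def cw_l2_attack(model, x, target, c=1.0, kappa=0.0, steps=200, lr=0.01):
    # tanh change of variables keeps x_adv inside the valid [0, 1] image box
    w = torch.atanh((x * 2 - 1).clamp(-0.999, 0.999)).detach().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        x_adv = 0.5 * (torch.tanh(w) + 1)
        logits = model(x_adv)
        # f(x'): gap between the best non-target logit and the target logit,
        # floored at -kappa so the attack can demand extra confidence
        one_hot = torch.nn.functional.one_hot(target, logits.size(-1)).bool()
        target_logit = logits[one_hot]
        other_logit = logits.masked_fill(one_hot, float('-inf')).max(dim=-1).values
        f = torch.clamp(other_logit - target_logit, min=-kappa)
        # minimize L2 distortion plus c times the misclassification term
        loss = ((x_adv - x) ** 2).sum() + c * f.sum()
        opt.zero_grad(); loss.backward(); opt.step()
    return (0.5 * (torch.tanh(w) + 1)).detach()
```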
12. Notable adversarial models
DeepFool
"Iterative linearization of the classifier to generate minimal perturbations that are sufficient to change classification labels"
Computes minimal perturbations more reliably than single-step gradient methods
Moosavi-Dezfooli et al., https://arxiv.org/pdf/1511.04599.pdf
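A minimal sketch of DeepFool's linearization idea for the simplified binary case; f and grad_f are assumed callables returning the signed decision score and its gradient (the multi-class version in the paper instead steps toward the nearest class boundary):

```python
import numpy as np

def deepfool_binary(x, f, grad_f, max_iter=50, overshoot=0.02):
    x_adv = x.copy()
    for _ in range(max_iter):
        score = f(x_adv)
        if np.sign(score) != np.sign(f(x)):   # label flipped: done
            break
        w = grad_f(x_adv)
        # minimal step to the linearized decision boundary f(x) ~ 0
        r = -score * w / (np.dot(w, w) + 1e-12)
        x_adv = x_adv + (1 + overshoot) * r   # small overshoot to cross it
    return x_adv
```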
14. Notable adversarial models
Universal adversarial attack
• Is there a universal perturbation that will work on most samples?
• L-BFGS-based
• Effective in attacking networks such as CaffeNet, GoogLeNet, VGG and ResNet
• Fooling rate > 53%
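A rough sketch of the universal-perturbation loop, assuming a hypothetical per-sample attack minimal_perturbation (e.g. a DeepFool-style step) and a predict function; the projection radius xi is illustrative:

```python
import numpy as np

def universal_perturbation(images, predict, minimal_perturbation,
                           xi=10.0, n_epochs=5):
    v = np.zeros_like(images[0])
    for _ in range(n_epochs):
        for x in images:
            # only grow v on samples it does not already fool
            if predict(x + v) == predict(x):
                dv = minimal_perturbation(x + v)   # extra nudge for this x
                v = v + dv
                # project back onto the L2 ball of radius xi
                norm = np.linalg.norm(v)
                if norm > xi:
                    v = v * (xi / norm)
    return v
```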
16. Text & Audio Models
A 1% audio perturbation can change 50 words in a text transcription!
Attacks are robust to MP3 compression, but get lost when played over speakers.
https://nicholas.carlini.com/code/audio_adversarial_examples/
Strategies for text attacks generally include deleting, inserting, and modifying characters/words
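A toy illustration of those three edit operations; real attacks choose which positions and tokens to edit so as to maximize the model's loss, this only shows the operations themselves:

```python
import random

def perturb(text, op):
    i = random.randrange(len(text))
    letters = "abcdefghijklmnopqrstuvwxyz"
    if op == "delete":
        return text[:i] + text[i + 1:]
    if op == "insert":
        return text[:i] + random.choice(letters) + text[i:]
    if op == "modify":
        return text[:i] + random.choice(letters) + text[i + 1:]
    return text

print(perturb("adversarial", "modify"))  # e.g. "advqrsarial"
```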
17. Adversarial defenses fall into 5 categories
1. Training on adversarial samples
2. Randomization
3. Adding noise
4. Removing noise
5. Mathematically provable defenses
18. Defang: Randomize input or features
• Randomly padding and resizing the input; image transformations with randomness (sketched below)
19. • Add a random noise layer before each convolutional layer at training and test time (random self-ensemble, RSE)
• Random feature pruning at each layer
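A sketch of the random resize-and-pad idea from the first bullet, assuming PyTorch image batches; the output size is illustrative:

```python
import random
import torch
import torch.nn.functional as F

def random_resize_pad(x, out_size=331):
    # x: (batch, channels, H, W); resize to a random intermediate size
    new_size = random.randint(x.shape[-1], out_size - 1)
    x = F.interpolate(x, size=(new_size, new_size), mode="nearest")
    # then pad randomly on each side up to the fixed output size
    pad_left = random.randint(0, out_size - new_size)
    pad_top = random.randint(0, out_size - new_size)
    return F.pad(x, (pad_left, out_size - new_size - pad_left,
                     pad_top, out_size - new_size - pad_top))
```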
20. Detect: Denoise the input or features
• Conventional input rectification
• "Squeeze" the image: if the model's output on the squeezed image is very different from its output on the original, the input is likely adversarial (sketched below)
• GAN-based
  • Use a GAN to learn the benign data distribution
  • Generate a benign projection for the adversarial sample
• Autoencoder-based
  • Detector & reformer
  • Use an autoencoder to compress the input and learn the manifold of benign samples
  • The detector compares each sample to the learnt manifold
  • The reformer rectifies adversarial samples
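A minimal sketch of the "squeeze and compare" detector, assuming a PyTorch classifier; bit-depth reduction stands in for the squeezing step, and the threshold is illustrative (it would be tuned on benign data):

```python
import torch

def reduce_bit_depth(x, bits=4):
    # quantize pixel values in [0, 1] to 2**bits levels
    levels = 2 ** bits - 1
    return torch.round(x * levels) / levels

def is_adversarial(model, x, threshold=1.0):
    with torch.no_grad():
        p = torch.softmax(model(x), dim=-1)
        p_squeezed = torch.softmax(model(reduce_bit_depth(x)), dim=-1)
    # a large output change under a mild squeeze suggests an adversarial input
    return (p - p_squeezed).abs().sum(dim=-1) > threshold
```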
21. Detect: Denoise the input or features
• High-level representation guided denoiser (HGD)
  • Trains a denoising U-Net with a feature-level loss function to minimize the feature difference between benign and adversarial samples (sketched below)
  • Won first place in the NIPS 2017 black-box defense competition
  • Even so, certain (white-box) attacks can reduce its effectiveness to 0%
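A sketch of the feature-level loss HGD trains with; denoiser (the U-Net) and features (a high-level layer of a fixed pretrained classifier) are hypothetical placeholders, not names from the paper:

```python
import torch

def hgd_loss(denoiser, features, x_benign, x_adv):
    denoised = denoiser(x_adv)
    # match high-level classifier features, not raw pixels
    return (features(denoised) - features(x_benign)).abs().mean()
```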
22. Provable (certified) defenses
• Defenses with theoretical guarantees of a certain accuracy against attacks
• The range of defenses includes KNN-based and Bayesian-based defenses
• Consistency-based defenses:
  • Perturbations also affect the area around them
  • > 90% detection rate
  • Very computationally intensive
23. Trends in adversarial research
• Design stronger attacks to probe for weaknesses
• Real-world attack capabilities
• Certified defenses, though currently not scalable
"A problem is that an attack can only target one category of defenses, but defenses are required to … be effective against all possible attack methods"
• Analyzing model robustness, so far mostly done on KNN and linear classifiers
24. Unresolved challenges
• Causality
  • Does a general robust decision boundary exist that could be learnt by (certain) neural networks?
• Effectiveness vs. efficiency
  • Adversarial training is effective, but requires a lot of data and compute
  • Randomization and denoising strategies are very efficient, but not as effective as claimed
25. Discussion
In what other ways are models not robust?
Is model robustness/security applicable to what you do / to our students?
Model fairness has been a hot topic lately, but robustness/security seems to lag behind. What do you think needs to change for adversarial training to be widely implemented?
What are your thoughts on the paper in general?
26. Try it yourself
Benchmark machine learning systems' vulnerability to adversarial examples:
https://github.com/cleverhans-lab/cleverhans
Blog: cleverhans.io
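A quick-start sketch running FGSM from CleverHans on a PyTorch model; the import path follows the v4 layout of the repository and may differ in other versions, and the model here is a stand-in:

```python
import numpy as np
import torch
from cleverhans.torch.attacks.fast_gradient_method import fast_gradient_method

# stand-in classifier returning logits; substitute your own trained model
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
x = torch.rand(4, 3, 32, 32)

# eps is the L-infinity perturbation budget (the eta from slide 9)
x_adv = fast_gradient_method(model, x, eps=0.03, norm=np.inf)
print(model(x).argmax(dim=-1), model(x_adv).argmax(dim=-1))
```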