Adversarial Robustness through Local Linearization

Deep Learning Paper Reading Group
Image Processing Team
Adversarial Robustness through Local Linearization
Authors : Chongli Qin*, James Martens, Sven Gowal, Dilip Krishnan, Alhussein Fawzi, Soham De, Robert Stanforth, Pushmeet Kohli, Krishnamurthy (Dj) Dvijotham
Affiliations : DeepMind, Google
NeurIPS 2019
Dec/20/2020
김병현 안종식 홍은기 허다은

What is Adversarial Attack?
▪ Intriguing properties of neural networks (Szegedy et al., 2014)
Neural networks are vulnerable to visually imperceptible adversarial perturbations
[Figure: example images labeled Speaker, Mantis, Dog. Panels: Correctly Predicted; Difference Between Left/Right; Classified as Ostrich]

Difficulties in Adversarial Training
▪ Previous Works on Adversarial Training
• Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks
• Thermometer Encoding: One Hot Way to Resist Adversarial Examples
• Stochastic Activation Pruning for Robust Adversarial Defense
• PixelDefend: Leveraging Generative Models to Understand and Defend against Adversarial Examples
• Characterizing Adversarial Subspaces Using Local Intrinsic Dimensionality
• Robustness via Curvature Regularization, and Vice Versa
▪ Growing model complexity and high input dimensionality make adversarial training prohibitively expensive
[Figure: models become more complex (VGG, ResNet); inputs become higher-dimensional (MNIST, 28×28; ImageNet, 256×256)]

Difficulties in Adversarial Training
▪ Multiple iterations are needed to compute adversarial perturbations
Pseudocode for Adversarial Training
    define model
    load training data
    for i in range(training_steps):
        for j in range(perturbation_steps):
            update adversarial perturbation    # this inner loop dominates the training cost
        train model against the perturbation
▪ A non-linear loss surface requires a large number of steps to find an adversarial perturbation
▪ Gradient Obfuscation (Broken Gradient)
If the number of perturbation steps is reduced, the trained model becomes less robust
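
The nested loop in the pseudocode can be made concrete on a toy problem. The following is only a minimal sketch, not the presenters' or the paper's code: it assumes a 2-D linear logistic classifier, an L-infinity perturbation set, and sign-gradient PGD steps. All names (`pgd_perturbation`, `adversarial_training`, `DATA`) and hyperparameter values are hypothetical.

```python
import math
import random

def sign(v):
    return (v > 0) - (v < 0)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def loss(w, x, y):
    """Logistic loss of a linear model w on input x with label y in {-1, +1}."""
    return math.log1p(math.exp(-y * dot(w, x)))

def grad_coeff(w, x, y):
    """d(loss)/d(w.x); the gradients w.r.t. x and w are this factor times w and x."""
    return -y / (1.0 + math.exp(y * dot(w, x)))

def pgd_perturbation(w, x, y, eps, step_size, num_steps):
    """Inner loop: sign-gradient ascent on the loss, projected onto the L-inf ball."""
    delta = [0.0] * len(x)
    for _ in range(num_steps):
        x_adv = [xi + di for xi, di in zip(x, delta)]
        c = grad_coeff(w, x_adv, y)
        grad_x = [c * wi for wi in w]
        delta = [max(-eps, min(eps, di + step_size * sign(gi)))
                 for di, gi in zip(delta, grad_x)]
    return delta

def adversarial_training(data, eps=0.1, lr=0.5, training_steps=200, perturbation_steps=5):
    """Outer loop: one SGD step on the perturbed example per training step."""
    rng = random.Random(0)
    w = [0.0] * len(data[0][0])
    for _ in range(training_steps):
        x, y = rng.choice(data)
        delta = pgd_perturbation(w, x, y, eps, eps / 2, perturbation_steps)
        x_adv = [xi + di for xi, di in zip(x, delta)]
        c = grad_coeff(w, x_adv, y)
        w = [wi - lr * c * xi for wi, xi in zip(w, x_adv)]
    return w

# Hypothetical, linearly separable toy data: (input, label).
DATA = [([1.0, 1.0], 1), ([2.0, 1.5], 1), ([-1.0, -1.0], -1), ([-1.5, -2.0], -1)]
weights = adversarial_training(DATA)
```

Each training step performs `perturbation_steps` extra gradient evaluations, which is the cost the slide refers to.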

How to Prevent Gradient Obfuscation
▪ Local Linearity Regularizer (LLR)
Make the loss surface linear around each training example
• Much more likely to avoid gradient obfuscation
▪ Requires only a small number of steps
[Figure: loss surface without LLR vs. with LLR]

Contribution
▪ We show that training with LLR is significantly faster than adversarial training, allowing us to train a robust ImageNet model with a 5× speed up when training on 128 TPUv3 cores.
▪ We show that LLR trained models exhibit higher robustness relative to adversarially trained models when evaluated under strong attacks. Adversarially trained models can exhibit a decrease in accuracy of 6% when increasing the attack strength at test time for CIFAR-10, whereas LLR shows only a decrease of 2%.
▪ We achieve new state of the art results for adversarial accuracy against untargeted white-box attack for ImageNet (with ε = 4/255): 47%. Furthermore, we match state of the art results for CIFAR-10 (with ε = 8/255): 52.81%.
▪ We perform a large scale evaluation of existing methods for adversarially robust training under consistent, strong, white-box attacks. For this we recreate several baseline models from the literature, training them both for CIFAR-10 and ImageNet (where possible).

Adversarial Training
▪ Classification Function (Model)
Notation
𝒙 : input, 𝜽 : weights, 𝑪 : logits for classes
i.e. Classification model with Softmax Output
▪ Adversarial Training
Makes the model more robust to adversarial attack
• The model returns the same result without attack and under attack
• Perturbation set (𝜖 : magnitude)

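▪ General Model Training
Empirical Risk Minimization (ERM)
Standard Cross-Entropy Loss
▪ Adversarial Training
Inner Maximization using Projected Gradient Descent (PGD)
Constraint : the perturbation must stay within the perturbation set
[Figure: adversarial training pseudocode, beginning with Define Model and Training Data Load, annotated "Important to achieve fast/good(…) training"]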
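
The equations on this slide did not survive extraction. Based on the slide's labels, the two objectives can plausibly be written as below. This is a reconstruction in the paper's notation, not the slide's exact rendering: \ell is the cross-entropy loss, \mathcal{D} the data distribution, and B(\epsilon) the perturbation set (the radii quoted elsewhere in the slides, such as 8/255, suggest p = \infty).

```latex
% Empirical risk minimization (standard training)
\min_{\theta} \; \mathbb{E}_{(x,y)\sim\mathcal{D}} \big[ \ell(x; y, \theta) \big]

% Adversarial training: minimize the worst-case loss inside the perturbation set
\min_{\theta} \; \mathbb{E}_{(x,y)\sim\mathcal{D}}
  \Big[ \max_{\delta \in B(\epsilon)} \ell(x + \delta; y, \theta) \Big],
\qquad B(\epsilon) = \{ \delta : \|\delta\|_{p} \le \epsilon \}
```

The inner maximization has no closed form for a neural network, which is why it is approximated with several PGD steps per training step.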

Motivating the Local Linearity Regularizer
▪ Taylor Expansion
▪ A relatively linear loss surface is well predicted by its 1st-order Taylor expansion
[Figure: 1st-order Taylor expansion around a point a. Linear loss surface: the error is trivial. Non-linear loss surface: the error is significant.]
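
A small numeric check illustrates the point. This is a sketch with two hypothetical 1-D functions standing in for a linear and a non-linear loss surface; the function names and the values of `a` and `h` are chosen purely for illustration.

```python
def taylor_error(f, df, a, h):
    """Absolute error of the 1st-order Taylor expansion of f around a, evaluated at a + h."""
    approximation = f(a) + df(a) * h
    return abs(f(a + h) - approximation)

def linear(x):
    return 3.0 * x + 1.0

def d_linear(x):
    return 3.0

def curved(x):
    return x ** 4

def d_curved(x):
    return 4.0 * x ** 3

a, h = 1.0, 0.5
err_linear = taylor_error(linear, d_linear, a, h)  # the expansion is exact for a linear function
err_curved = taylor_error(curved, d_curved, a, h)  # the expansion misses the curvature
print(err_linear, err_curved)
```

The larger the curvature between `a` and `a + h`, the less a single gradient at `a` says about the loss at `a + h`.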

Motivating the Local Linearity Regularizer
▪ Local Linearity Measure
Measures the local linearity of the loss surface
[Equation annotations: the true loss when a perturbation is applied, compared against its 1st-order Taylor expansion, is an indicator of how linear the loss surface is. The Local Linearity Measure is the maximum of this error over the perturbation set.]
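
In the paper's notation, the error term is g(δ; x) = |ℓ(x + δ) − ℓ(x) − δᵀ∇ℓ(x)| and the measure is γ(ε, x) = max over δ in B(ε) of g(δ; x). As a rough illustration, the sketch below estimates γ by brute-force grid search over an L-infinity ball. It assumes a toy 2-D logistic loss with fixed hypothetical weights `W`. Grid search only works in very low dimension, and the paper instead maximizes g with a few gradient steps.

```python
import math
from itertools import product

W = [2.0, -1.0]  # hypothetical fixed weights of a toy linear model

def loss(x):
    """Toy loss: logistic loss of the linear score W . x for the positive class."""
    z = sum(w * xi for w, xi in zip(W, x))
    return math.log1p(math.exp(-z))

def grad(x):
    """Gradient of the toy loss with respect to the input x."""
    z = sum(w * xi for w, xi in zip(W, x))
    coeff = -1.0 / (1.0 + math.exp(z))
    return [coeff * w for w in W]

def linearity_error(x, delta):
    """g(delta; x): gap between the true loss at x + delta and its 1st-order Taylor estimate."""
    shifted = [xi + di for xi, di in zip(x, delta)]
    taylor = loss(x) + sum(d * g for d, g in zip(delta, grad(x)))
    return abs(loss(shifted) - taylor)

def local_linearity_measure(x, eps, points_per_axis=21):
    """Grid-search estimate of gamma(eps, x) over the box ||delta||_inf <= eps."""
    ticks = [-eps + 2.0 * eps * i / (points_per_axis - 1) for i in range(points_per_axis)]
    return max(linearity_error(x, d) for d in product(ticks, repeat=len(x)))

x = [0.5, 0.5]
gamma_small = local_linearity_measure(x, eps=0.1)
gamma_large = local_linearity_measure(x, eps=0.5)
print(gamma_small, gamma_large)
```

A larger radius lets the perturbation reach regions where the curvature matters more, so the estimate grows with `eps` for this toy loss.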

Local Linearity Regularizer
▪ Empirical Observations on Adversarial Training

Local Linearity Upper Bounds Adversarial Loss
[Equation annotations: the difference between the loss under attack and the loss without attack is upper-bounded by the 1st-order Taylor expansion term plus the local linearity measure]
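
The bound follows from the triangle inequality. The derivation below is a reconstruction in the paper's notation, since the slide's equation was lost. It uses g(\delta; x) = |\ell(x+\delta) - \ell(x) - \delta^{\top}\nabla_x \ell(x)| and \gamma(\epsilon, x) = \max_{\delta \in B(\epsilon)} g(\delta; x).

```latex
\begin{aligned}
|\ell(x+\delta) - \ell(x)|
  &= \big|\, \delta^{\top}\nabla_x \ell(x)
      + \big( \ell(x+\delta) - \ell(x) - \delta^{\top}\nabla_x \ell(x) \big) \big| \\
  &\le |\delta^{\top}\nabla_x \ell(x)| + g(\delta; x) \\
  &\le |\delta^{\top}\nabla_x \ell(x)| + \gamma(\epsilon, x),
  \qquad \forall\, \delta \in B(\epsilon)
\end{aligned}
```

The first inequality is the triangle inequality. The second holds because \gamma is by definition the largest value of g inside B(\epsilon).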

Local Linearity Regularization (LLR)
[Equation annotations: using the upper bound equation, the loss under attack in adversarial training can be replaced by gamma and the 1st-order Taylor expansion term]
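
Written out, the resulting training objective has the following form. This is a reconstruction in the paper's notation, with g(\delta; x) = |\ell(x+\delta) - \ell(x) - \delta^{\top}\nabla_x \ell(x)| and \gamma(\epsilon, x) = \max_{\delta \in B(\epsilon)} g(\delta; x). Here \lambda and \mu are regularization weights.

```latex
L(\mathcal{D}) = \mathbb{E}_{\mathcal{D}} \Big[ \ell(x)
  + \underbrace{\lambda\, \gamma(\epsilon, x)
  + \mu\, |\delta_{LLR}^{\top} \nabla_x \ell(x)|}_{\text{LLR}} \Big],
\qquad
\delta_{LLR} = \arg\max_{\delta \in B(\epsilon)} g(\delta; x)
```

Compared with adversarial training, the inner problem searches for the perturbation that most violates linearity rather than the one that maximizes the loss itself.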

Local Linearity Regularizer - Algorithm
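
The algorithm figure on this slide did not survive extraction. As a rough sketch of what one LLR training step might look like, the code below assumes a toy linear logistic model, an L-infinity ball, a random start followed by sign-gradient ascent to find the perturbation, and a finite-difference weight gradient with the perturbation held fixed. None of these choices are taken from the slide, and all names and hyperparameter values are hypothetical.

```python
import math
import random

def sign(v):
    return (v > 0) - (v < 0)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def loss(w, x, y):
    """Logistic loss of a linear model w on input x with label y in {-1, +1}."""
    return math.log1p(math.exp(-y * dot(w, x)))

def grad_x(w, x, y):
    coeff = -y / (1.0 + math.exp(y * dot(w, x)))
    return [coeff * wi for wi in w]

def residual(w, x, y, delta):
    """loss(x + delta) - loss(x) - delta . grad_x loss(x); g is its absolute value."""
    shifted = [xi + di for xi, di in zip(x, delta)]
    return loss(w, shifted, y) - loss(w, x, y) - dot(delta, grad_x(w, x, y))

def find_delta_llr(w, x, y, eps, step_size, num_steps, rng):
    """Approximate the maximizer of g over the L-inf ball by projected sign-gradient ascent."""
    # Random start, because the gradient of g vanishes at delta = 0.
    delta = [rng.uniform(-eps, eps) for _ in x]
    base_grad = grad_x(w, x, y)
    for _ in range(num_steps):
        shifted = [xi + di for xi, di in zip(x, delta)]
        s = sign(residual(w, x, y, delta))
        grad_g = [s * (gs - gb) for gs, gb in zip(grad_x(w, shifted, y), base_grad)]
        delta = [max(-eps, min(eps, di + step_size * sign(gi)))
                 for di, gi in zip(delta, grad_g)]
    return delta

def llr_loss(w, x, y, delta, lam, mu):
    """h = loss(x) + lam * g(delta; x) + mu * |delta . grad_x loss(x)|."""
    return (loss(w, x, y) + lam * abs(residual(w, x, y, delta))
            + mu * abs(dot(delta, grad_x(w, x, y))))

def llr_training_step(w, x, y, eps, lam, mu, lr, rng, fd=1e-5):
    """One step: find delta, then descend on the LLR loss with delta held fixed."""
    delta = find_delta_llr(w, x, y, eps, eps / 2, 2, rng)
    grad_w = []
    for i in range(len(w)):
        w_hi, w_lo = w[:], w[:]
        w_hi[i] += fd
        w_lo[i] -= fd
        # Central finite difference stands in for automatic differentiation.
        grad_w.append((llr_loss(w_hi, x, y, delta, lam, mu)
                       - llr_loss(w_lo, x, y, delta, lam, mu)) / (2 * fd))
    return [wi - lr * gi for wi, gi in zip(w, grad_w)], delta

rng = random.Random(0)
w0, x0, y0 = [0.5, -0.3], [1.0, 1.0], 1
w1, delta0 = llr_training_step(w0, x0, y0, eps=0.1, lam=4.0, mu=3.0, lr=0.1, rng=rng)
```

Only two ascent steps are used to find the perturbation here, reflecting the slides' claim that LLR needs a small number of steps. A real implementation would use automatic differentiation instead of finite differences.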

Experiments and Results
▪ Loss Functions for Adversarial Attack
1. Random-Targeted
• e.g. Cat-to-Dog (target selected randomly; does not change during training)
2. Untargeted
• e.g. Cat-to-Dog or Mantis … Speaker
(targets the highest logit excluding the true label; changes during training)
3. Multi-Targeted
• e.g. Cat-to-Dog and Mantis … Speaker
(targets all classes except the true label; does not change during training)
Nominal : Trained with perturbation but tested without attack
▪ Metric
Attack Success Rate : Does the model return the targeted class?
Adversarial Accuracy : Accuracy under adversarial attack
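
The two metrics, and the untargeted choice of target described on this slide, can be sketched as follows. The function names and the tiny example lists are hypothetical and serve only to make the definitions concrete.

```python
def adversarial_accuracy(true_labels, adv_predictions):
    """Fraction of examples that are still classified correctly under attack."""
    correct = sum(t == p for t, p in zip(true_labels, adv_predictions))
    return correct / len(true_labels)

def attack_success_rate(target_labels, adv_predictions):
    """Fraction of examples for which the model returns the attacker's target class."""
    hits = sum(t == p for t, p in zip(target_labels, adv_predictions))
    return hits / len(target_labels)

def untargeted_target(logits, true_index):
    """Untargeted attack target: the class with the highest logit, excluding the true label."""
    candidates = [k for k in range(len(logits)) if k != true_index]
    return max(candidates, key=lambda k: logits[k])

# Hypothetical predictions of a model on four attacked images.
TRUE = ["cat", "cat", "dog", "mantis"]
TARGETS = ["dog", "speaker", "cat", "dog"]
ADV_PREDICTIONS = ["dog", "cat", "cat", "speaker"]

print(adversarial_accuracy(TRUE, ADV_PREDICTIONS))
print(attack_success_rate(TARGETS, ADV_PREDICTIONS))
print(untargeted_target([2.0, 0.5, 1.5], true_index=0))
```

Note that an attack can lower adversarial accuracy without counting as a success: in the last example the image is misclassified, but not as the targeted class.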

▪ CIFAR-10
Sign of Gradient : one of the optimizers used to compute perturbations

▪ ImageNet
LLR scored notably higher when the radius is 4/255
LLR scored lower when the radius is 16/255

▪ Accuracy Degradation

Images created with perturbation radius 16/255
▪ The perturbations are VISIBLE when the radius is 16/255
[Figure: normal image vs. image under attack]

▪ Resistance to Gradient Obfuscation

Ablation Studies

Discussions
▪ Gradient Obfuscation : Broken Gradient
Shattered Gradients
• nonexistent or incorrect gradients (e.g. non-differentiable operations)
Stochastic Gradients
• randomized defenses
Exploding & Vanishing Gradients
• feeding the output of one computation as the input of the next
Although the authors cite the paper that identified gradient obfuscation (Athalye et al., 2018), they do not evaluate their defense against the attack methods proposed in that paper.
Athalye, Anish, Nicholas Carlini, and David Wagner. "Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples." arXiv preprint arXiv:1802.00420 (2018).

Discussions
▪ The paper shows through experimental results how LLR contributes to robust adversarial training, but the theoretical explanation is weak.
▪ The notation in the paper does not seem to include all the information needed to re-implement the proposed method.
i.e.
