Despite achieving state-of-the-art performance across many domains, machine learning systems are highly vulnerable to subtle adversarial perturbations. Although many defense approaches have been proposed in recent years, many have been bypassed by even weak adversarial attacks. Previous studies showed that ensembles created by combining multiple weak defenses (i.e., input data transformations) are still weak. In this talk, I will show that it is indeed possible to construct effective ensembles of weak defenses to block adversarial attacks; however, doing so requires a diverse set of such weak defenses. Based on this motivation, I will present Athena, an extensible framework for building effective defenses against adversarial attacks on machine learning systems. I will discuss the effectiveness of ensemble strategies built from a diverse set of many weak defenses, each of which transforms the input (e.g., by rotation, shifting, noising, denoising, and many more) before feeding it to the target deep neural network classifier. I will also discuss the effectiveness of these ensembles against adversarial examples generated by various adversaries under different threat models. In the second half of the talk, I will explain why building defenses from many diverse weak defenses works, when it is most effective, and what its inherent limitations and overhead are.
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural Networks Against Adversarial Attacks
1. Ensembles of Many Diverse Weak Defenses can be Strong
Ying Meng, Jianhai Su, Jason O’Kane, Pooyan Jamshidi
@pooyanjamshidi
2. Artificial Intelligence and Systems Laboratory (AISys Lab)
Machine Learning + Computer Systems + Software Engineering → ML Systems
https://pooyanjamshidi.github.io/AISys/
3. Hardware-aware optimization of deep neural networks
[Figure: image classification models constructed using cells optimized with architecture search. Top-left: small model used during architecture search on CIFAR-10. Top-right: large CIFAR-10 model used for learned cell evaluation. Bottom: ImageNet model used for learned cell evaluation. For the CIFAR-10 experiments, the model consists of a 3×3 convolution with c0 channels, followed by 3 groups of learned convolutional cells, each group containing N cells. After each cell (with c input channels), a 3×3 separable convolution is inserted, with stride 2 and 2c channels if it is the last cell of the group, and stride 1 and c channels otherwise. These convolutions control the number of channels and reduce the spatial resolution. The last cell is followed by global average pooling and a linear softmax layer.]
4. So, what is this talk about?
The Security of (Deep) Machine Learning
7. How to get probabilistic decisions?
• Activation: z = w ⋅ f(x)
• If z is very positive → we want the probability to go to 1
• If z is very negative → we want the probability to go to 0
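As a concrete illustration (not from the slides, and with hypothetical weights and features), here is a minimal NumPy sketch of how the logistic (sigmoid) function turns the activation z = w ⋅ f(x) into a probability with exactly this behavior:

```python
import numpy as np

def sigmoid(z):
    """Squash an activation z into a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights and features, purely for illustration.
w = np.array([2.0, -1.0, 0.5])
f_x = np.array([1.0, 0.3, -0.7])

z = w @ f_x          # activation z = w . f(x)
print(sigmoid(z))    # very positive z -> close to 1; very negative z -> close to 0
```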
8. Multiclass Logistic Regression
• Multi-class linear classification
• A weight vector for each class: $w_y$
• Score (activation) of a class $y$: $w_y \cdot f(x)$
• Prediction: the highest score wins: $y = \arg\max_y w_y \cdot f(x)$
• How to make the scores into probabilities?

$$(z_1, z_2, z_3) \;\rightarrow\; \left( \frac{e^{z_1}}{e^{z_1}+e^{z_2}+e^{z_3}},\; \frac{e^{z_2}}{e^{z_1}+e^{z_2}+e^{z_3}},\; \frac{e^{z_3}}{e^{z_1}+e^{z_2}+e^{z_3}} \right)$$

original activations → softmax activations
(CS188 Intro to AI at UC Berkeley, ai.berkeley.edu)
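A quick illustrative sketch (mine, not from the slides) of computing softmax activations in NumPy, with the usual max subtraction for numerical stability:

```python
import numpy as np

def softmax(z):
    """Turn raw class scores into probabilities that sum to 1."""
    z = z - np.max(z)        # subtract the max for numerical stability
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

scores = np.array([2.0, 1.0, 0.1])   # original activations z1, z2, z3
print(softmax(scores))                # approximately [0.659, 0.242, 0.099]
```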
9. Best w?
• Maximum likelihood estimation:

$$\max_w \; ll(w) = \max_w \sum_i \log P(y^{(i)} \mid x^{(i)}; w)$$

• With:

$$P(y^{(i)} \mid x^{(i)}; w) = \frac{e^{w_{y^{(i)}} \cdot f(x^{(i)})}}{\sum_y e^{w_y \cdot f(x^{(i)})}}$$
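As an illustrative sketch of the same objective in NumPy (my own, hedged; the array layout is an assumption, with the rows of X as feature vectors f(x⁽ⁱ⁾) and W holding one weight vector per class):

```python
import numpy as np

def log_likelihood(W, X, y):
    """ll(w) = sum_i log P(y_i | x_i; w), with softmax class probabilities.
    W: (num_classes, num_features), X: (n, num_features), y: (n,) int labels."""
    Z = X @ W.T                                   # scores w_y . f(x) for every class
    Z = Z - Z.max(axis=1, keepdims=True)          # shift for numerical stability
    log_probs = Z - np.log(np.exp(Z).sum(axis=1, keepdims=True))
    return log_probs[np.arange(len(y)), y].sum()  # pick each label's log probability
```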
10. How do we solve the optimization problem?

$$\max_w \; ll(w) = \max_w \sum_i \log P(y^{(i)} \mid x^{(i)}; w)$$

[Figure: a one-dimensional objective g(w), illustrating gradient-based optimization]
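Continuing the hedged sketch above: one gradient-ascent step on ll(w), using the standard softmax gradient (Y − P)ᵀX, where P holds the predicted class probabilities and Y the one-hot labels:

```python
import numpy as np

def gradient_ascent_step(W, X, y, lr=0.1):
    """One gradient-ascent step on the log likelihood ll(w)."""
    Z = X @ W.T
    Z = Z - Z.max(axis=1, keepdims=True)
    P = np.exp(Z) / np.exp(Z).sum(axis=1, keepdims=True)  # P(y|x; w) for every class
    Y = np.eye(W.shape[0])[y]                             # one-hot labels, (n, num_classes)
    grad = (Y - P).T @ X                                  # d ll / d W
    return W + lr * grad                                  # ascent: move *up* the gradient
```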
18. Training the deep neural network is just like logistic regression

$$\max_w \; ll(w) = \max_w \sum_i \log P(y^{(i)} \mid x^{(i)}; w)$$

→ just run gradient ascent
+ stop when the log likelihood of the hold-out data starts to decrease
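A hedged sketch of that early-stopping rule, reusing log_likelihood and gradient_ascent_step from the earlier snippets (illustrative names and loop structure, not from the talk):

```python
def train_with_early_stopping(W, train, holdout, max_epochs=100, lr=0.1):
    """Gradient ascent; stop once hold-out log likelihood starts to decrease."""
    X_tr, y_tr = train
    X_ho, y_ho = holdout
    best_W, best_ll = W, log_likelihood(W, X_ho, y_ho)
    for _ in range(max_epochs):
        W = gradient_ascent_step(W, X_tr, y_tr, lr=lr)
        ll = log_likelihood(W, X_ho, y_ho)
        if ll < best_ll:      # hold-out likelihood got worse: overfitting begins
            break
        best_W, best_ll = W, ll
    return best_W
```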
21. Adversarial Examples
• [Engstrom, Tran, Tsipras, Schmidt, Madry 2018]: Rotation + translation can fool classifiers
• [Athalye, Engstrom, Ilyas, Kwok 2017]: 3D-printed model classified as a rifle from most viewpoints
• [Goodfellow et al. 2014]: Imperceptible noise can fool DNN classifiers
22. Adversarial Examples (Security)
• [Sharif et al. 2016]: Glasses that fool face classifiers
• [Carlini et al. 2016]: Voice commands that are imperceptible to humans
23. Adversarial Examples (Security)
• [Huang et al. 2017]: Small input changes can decrease RL performance
• [Jia & Liang 2017]: Irrelevant sentences confuse reading-comprehension systems
24. Where Do Adversarial Examples Come From?
[Diagram: a model $f_\theta$ maps inputs $x$ drawn from a distribution $D$ to outputs such as orange, chimpanzee, or palm tree; different parameters give different predictions, e.g., $f_{\theta_1}(x) = $ palm tree, $f_{\theta_2}(x) = $ orange]
Goal of ML: find $\theta^*$ such that $\mathbb{E}_{(x,y)\sim D}\,\mathcal{L}(\theta^*, x, y)$ is small.
25. Where Do Adversarial Examples Come From?
• Training: $\min_\theta \mathcal{L}(\theta, x, y)$. We can use gradient descent to find good parameters $\theta$.
• Attacking: $\max_\delta \mathcal{L}(\theta, x+\delta, y)$ subject to $\|\delta\|_p \le \epsilon$. The same gradients, taken with respect to the input instead of the parameters, find a small perturbation $\delta$ that maximizes the loss.
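To make the attacker's optimization concrete, here is a minimal, hedged PyTorch sketch of the Fast Gradient Sign Method from [Goodfellow et al. 2014], one of the attacks referenced in the talk; `model` and `loss_fn` are assumed placeholders for any differentiable classifier and loss:

```python
import torch

def fgsm(model, loss_fn, x, y, eps=0.03):
    """One-step l_inf attack: move each input coordinate eps in the
    direction that increases the loss, so ||delta||_inf <= eps holds
    by construction."""
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)      # L(theta, x, y)
    loss.backward()                  # gradient of the loss w.r.t. the *input*
    delta = eps * x.grad.sign()      # sign of the gradient, scaled to the budget
    return (x + delta).detach()      # the adversarial example x + delta
```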
28. The effectiveness of weak defenses varies, but we need many of them
[Figure: grid of original inputs, adversarial perturbations from several attacks (FGSM, BIM_l2, BIM_l∞, JSMA, PGD, DF_l2, CW_l2, OnePixel, MIM), and the same inputs after weak defenses such as compression (h&v) and denoising (nl_means)]
29. Quality and quantity of weak defenses matter
[Figure: test accuracy (0.00 to 1.00) as a function of the number of weak defenses (10 to 70)]
31. Athena: Ensemble of Many Diverse Weak Defenses
[Diagram: an ensemble of n weak defenses. An input x is transformed by transformations T1, ..., Ti, ..., Tn into xt1, ..., xti, ..., xtn; each weak-defense classifier fti predicts a label yti for its transformed input xti; an ensemble strategy combines yt1, ..., ytn into the final prediction y (e.g., votes 7, 7, 9, 7 yield 7)]
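A hedged, minimal sketch of the idea (not Athena's actual API; see the repository linked at the end for the real implementation). Each weak defense pairs an input transformation with a classifier trained on inputs transformed that way, and a majority-vote strategy tallies their predictions:

```python
import numpy as np
from collections import Counter
from scipy import ndimage

# Illustrative transformations; Athena's library is far larger
# (rotation, shifting, noising, denoising, and many more).
def rotate(x):  return ndimage.rotate(x, angle=15, reshape=False)
def shift(x):   return ndimage.shift(x, shift=(2, 2))
def denoise(x): return ndimage.median_filter(x, size=3)

def ensemble_predict(x, weak_defenses):
    """Majority-vote (MV) ensemble: each (transform, classify) pair
    labels its own transformed copy of x, then the votes are tallied.
    `classify` is any callable mapping an array to a class label."""
    votes = [classify(transform(x)) for transform, classify in weak_defenses]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical usage, with clf_rot / clf_shift / clf_den as placeholder models:
# weak_defenses = [(rotate, clf_rot), (shift, clf_shift), (denoise, clf_den)]
# y = ensemble_predict(x, weak_defenses)
```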
33. Threat model
[Table: threat models (zero-knowledge, black-box, gray-box, white-box) classified by which components the adversary knows the parameters of: the target classifier, the weak defenses, the ensemble strategy, and the existence of the defense]
35. White-box adversaries may be able to successfully attack the defense
[Figure: test accuracy and detected rate (0.00 to 1.00) versus the max normalized dissimilarity of the attack (0.2 to 1.0), for three configurations: detection + MV ensemble, MV ensemble, and detector]
36. And it comes with a high cost
[Figure: the dissimilarity of white-box adversarial examples and the time (seconds) needed to generate them]
37. The adversarial examples generated by a white-box adversary are easily detectable
[Figure: distributions of dissimilarity for gray-box versus white-box adversarial examples]
38. So, we can detect them easily
[Figure: as on slide 35, test accuracy and detected rate (0.00 to 1.00) versus max normalized dissimilarity (0.2 to 1.0) for detection + MV ensemble, MV ensemble, and detector]
39. Interested in getting involved?
• Contribute to the project code: https://github.com/softsys4ai/athena
• Check out the Athena paper: https://arxiv.org/abs/2001.00308