Despite achieving state-of-the-art performance across many domains, machine learning systems are highly vulnerable to subtle adversarial perturbations. Although many defense approaches have been proposed in recent years, many have been bypassed by even weak adversarial attacks. Previous studies showed that ensembles created by combining multiple weak defenses (i.e., input data transformations) are still weak. In this talk, I will show that it is indeed possible to construct effective ensembles of weak defenses to block adversarial attacks; however, doing so requires a diverse set of such weak defenses. Based on this motivation, I will present Athena, an extensible framework for building effective defenses against adversarial attacks on machine learning systems. I will discuss the effectiveness of ensemble strategies built from a diverse set of many weak defenses, each of which transforms the input (e.g., by rotation, shifting, noising, denoising, and many more) before feeding it to the target deep neural network classifier. I will also discuss the effectiveness of these ensembles against adversarial examples generated by various adversaries under different threat models. In the second half of the talk, I will explain why building defenses from many diverse weak defenses works, when it is most effective, and what its inherent limitations and overhead are.
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural Networks Against Adversarial Attacks
1. Ensembles of Many Diverse Weak Defenses can be Strong
Ying Meng, Jianhai Su, Jason O’Kane, Pooyan Jamshidi
@pooyanjamshidi
2. Artificial Intelligence and Systems Laboratory (AISys Lab)
Machine Learning + Computer Systems + Software Engineering → ML Systems
https://pooyanjamshidi.github.io/AISys/
3. Hardware-aware optimization of deep neural networks
[Figure: image classification models constructed using cells optimized with architecture search. Top-left: small model used during architecture search on CIFAR-10. Top-right: large CIFAR-10 model used for learned cell evaluation. Bottom: ImageNet model used for learned cell evaluation. For the CIFAR-10 experiments, the model consists of a 3×3 convolution with c0 channels, followed by 3 groups of learned convolutional cells, each group containing N cells. After each cell (with c input channels), a 3×3 separable convolution is inserted, with stride 2 and 2c channels if it is the last cell of the group, and stride 1 and c channels otherwise. These convolutions control the number of channels and reduce the spatial resolution. The last cell is followed by global average pooling and a linear softmax layer.]
4. So, what is this talk about?
The Security of (Deep) Machine Learning
7. How to get probabilistic decisions?
• Activation: z = w ⋅ f(x)
• If z is very positive → we want the probability to go to 1
• If z is very negative → we want the probability to go to 0
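As a concrete illustration (not from the slides, and with hypothetical weights and features), here is a minimal NumPy sketch of how the logistic (sigmoid) function turns the activation z = w ⋅ f(x) into a probability with exactly this behavior:

```python
import numpy as np

def sigmoid(z):
    """Squash an activation z into a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights and features, purely for illustration.
w = np.array([2.0, -1.0, 0.5])
f_x = np.array([1.0, 0.3, -0.7])

z = w @ f_x          # activation z = w . f(x)
print(sigmoid(z))    # very positive z -> close to 1; very negative z -> close to 0
```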
8. Multiclass Logistic Regression
• Multi-class linear classification
• A weight vector for each class: $w_y$
• Score (activation) of a class $y$: $w_y \cdot f(x)$
• Prediction: the highest score wins: $y = \arg\max_y w_y \cdot f(x)$
• How to make the scores into probabilities?

$$(z_1, z_2, z_3) \;\rightarrow\; \left( \frac{e^{z_1}}{e^{z_1}+e^{z_2}+e^{z_3}},\; \frac{e^{z_2}}{e^{z_1}+e^{z_2}+e^{z_3}},\; \frac{e^{z_3}}{e^{z_1}+e^{z_2}+e^{z_3}} \right)$$

original activations → softmax activations
(CS188 Intro to AI at UC Berkeley, ai.berkeley.edu)
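A quick illustrative sketch (mine, not from the slides) of computing softmax activations in NumPy, with the usual max subtraction for numerical stability:

```python
import numpy as np

def softmax(z):
    """Turn raw class scores into probabilities that sum to 1."""
    z = z - np.max(z)        # subtract the max for numerical stability
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

scores = np.array([2.0, 1.0, 0.1])   # original activations z1, z2, z3
print(softmax(scores))                # approximately [0.659, 0.242, 0.099]
```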
9. Best w?
• Maximum likelihood estimation:

$$\max_w \; ll(w) = \max_w \sum_i \log P(y^{(i)} \mid x^{(i)}; w)$$

• With:

$$P(y^{(i)} \mid x^{(i)}; w) = \frac{e^{w_{y^{(i)}} \cdot f(x^{(i)})}}{\sum_y e^{w_y \cdot f(x^{(i)})}}$$
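As an illustrative sketch of the same objective in NumPy (my own, hedged; the array layout is an assumption, with the rows of X as feature vectors f(x⁽ⁱ⁾) and W holding one weight vector per class):

```python
import numpy as np

def log_likelihood(W, X, y):
    """ll(w) = sum_i log P(y_i | x_i; w), with softmax class probabilities.
    W: (num_classes, num_features), X: (n, num_features), y: (n,) int labels."""
    Z = X @ W.T                                   # scores w_y . f(x) for every class
    Z = Z - Z.max(axis=1, keepdims=True)          # shift for numerical stability
    log_probs = Z - np.log(np.exp(Z).sum(axis=1, keepdims=True))
    return log_probs[np.arange(len(y)), y].sum()  # pick each label's log probability
```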
10. How do we solve the optimization problem?

$$\max_w \; ll(w) = \max_w \sum_i \log P(y^{(i)} \mid x^{(i)}; w)$$

[Figure: a one-dimensional objective g(w), illustrating gradient-based optimization]
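Continuing the hedged sketch above: one gradient-ascent step on ll(w), using the standard softmax gradient (Y − P)ᵀX, where P holds the predicted class probabilities and Y the one-hot labels:

```python
import numpy as np

def gradient_ascent_step(W, X, y, lr=0.1):
    """One gradient-ascent step on the log likelihood ll(w)."""
    Z = X @ W.T
    Z = Z - Z.max(axis=1, keepdims=True)
    P = np.exp(Z) / np.exp(Z).sum(axis=1, keepdims=True)  # P(y|x; w) for every class
    Y = np.eye(W.shape[0])[y]                             # one-hot labels, (n, num_classes)
    grad = (Y - P).T @ X                                  # d ll / d W
    return W + lr * grad                                  # ascent: move *up* the gradient
```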
18. Training the deep neural network is just like logistic regression

$$\max_w \; ll(w) = \max_w \sum_i \log P(y^{(i)} \mid x^{(i)}; w)$$

→ just run gradient ascent
+ stop when the log likelihood of the hold-out data starts to decrease
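A hedged sketch of that early-stopping rule, reusing log_likelihood and gradient_ascent_step from the earlier snippets (illustrative names and loop structure, not from the talk):

```python
def train_with_early_stopping(W, train, holdout, max_epochs=100, lr=0.1):
    """Gradient ascent; stop once hold-out log likelihood starts to decrease."""
    X_tr, y_tr = train
    X_ho, y_ho = holdout
    best_W, best_ll = W, log_likelihood(W, X_ho, y_ho)
    for _ in range(max_epochs):
        W = gradient_ascent_step(W, X_tr, y_tr, lr=lr)
        ll = log_likelihood(W, X_ho, y_ho)
        if ll < best_ll:      # hold-out likelihood got worse: overfitting begins
            break
        best_W, best_ll = W, ll
    return best_W
```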
21. Adversarial Examples
• [Engstrom, Tran, Tsipras, Schmidt, Madry 2018]: Rotation + translation can fool classifiers
• [Athalye, Engstrom, Ilyas, Kwok 2017]: 3D-printed model classified as a rifle from most viewpoints
• [Goodfellow et al. 2014]: Imperceptible noise can fool DNN classifiers
22. Adversarial Examples (Security)
• [Sharif et al. 2016]: Glasses that fool face classifiers
• [Carlini et al. 2016]: Voice commands that are imperceptible to humans
23. Adversarial Examples (Security)
• [Huang et al. 2017]: Small input changes can decrease RL performance
• [Jia & Liang 2017]: Irrelevant sentences confuse reading-comprehension systems
24. Where Do Adversarial Examples Come From?
[Diagram: a model $f_\theta$ maps inputs $x$ drawn from a distribution $D$ to outputs such as orange, chimpanzee, or palm tree; different parameters give different predictions, e.g., $f_{\theta_1}(x) = $ palm tree, $f_{\theta_2}(x) = $ orange]
Goal of ML: find $\theta^*$ such that $\mathbb{E}_{(x,y)\sim D}\,\mathcal{L}(\theta^*, x, y)$ is small.
25. Where Do Adversarial Examples Come From?
• Training: $\min_\theta \mathcal{L}(\theta, x, y)$. We can use gradient descent to find good parameters $\theta$.
• Attacking: $\max_\delta \mathcal{L}(\theta, x+\delta, y)$ subject to $\|\delta\|_p \le \epsilon$. The same gradients, taken with respect to the input instead of the parameters, find a small perturbation $\delta$ that maximizes the loss.
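To make the attacker's optimization concrete, here is a minimal, hedged PyTorch sketch of the Fast Gradient Sign Method from [Goodfellow et al. 2014], one of the attacks referenced in the talk; `model` and `loss_fn` are assumed placeholders for any differentiable classifier and loss:

```python
import torch

def fgsm(model, loss_fn, x, y, eps=0.03):
    """One-step l_inf attack: move each input coordinate eps in the
    direction that increases the loss, so ||delta||_inf <= eps holds
    by construction."""
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)      # L(theta, x, y)
    loss.backward()                  # gradient of the loss w.r.t. the *input*
    delta = eps * x.grad.sign()      # sign of the gradient, scaled to the budget
    return (x + delta).detach()      # the adversarial example x + delta
```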
28. The effectiveness of weak defenses varies, but we need many of them
[Figure: grid of original inputs, adversarial perturbations from several attacks (FGSM, BIM_l2, BIM_l∞, JSMA, PGD, DF_l2, CW_l2, OnePixel, MIM), and the same inputs after weak defenses such as compression (h&v) and denoising (nl_means)]
29. Quality and quantity of weak defenses matter
[Figure: test accuracy (0.00 to 1.00) as a function of the number of weak defenses (10 to 70)]
31. Athena: Ensemble of Many Diverse Weak Defenses
[Diagram: an ensemble of n weak defenses. An input x is transformed by transformations T1, ..., Ti, ..., Tn into xt1, ..., xti, ..., xtn; each weak-defense classifier fti predicts a label yti for its transformed input xti; an ensemble strategy combines yt1, ..., ytn into the final prediction y (e.g., votes 7, 7, 9, 7 yield 7)]
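A hedged, minimal sketch of the idea (not Athena's actual API; see the repository linked at the end for the real implementation). Each weak defense pairs an input transformation with a classifier trained on inputs transformed that way, and a majority-vote strategy tallies their predictions:

```python
import numpy as np
from collections import Counter
from scipy import ndimage

# Illustrative transformations; Athena's library is far larger
# (rotation, shifting, noising, denoising, and many more).
def rotate(x):  return ndimage.rotate(x, angle=15, reshape=False)
def shift(x):   return ndimage.shift(x, shift=(2, 2))
def denoise(x): return ndimage.median_filter(x, size=3)

def ensemble_predict(x, weak_defenses):
    """Majority-vote (MV) ensemble: each (transform, classify) pair
    labels its own transformed copy of x, then the votes are tallied.
    `classify` is any callable mapping an array to a class label."""
    votes = [classify(transform(x)) for transform, classify in weak_defenses]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical usage, with clf_rot / clf_shift / clf_den as placeholder models:
# weak_defenses = [(rotate, clf_rot), (shift, clf_shift), (denoise, clf_den)]
# y = ensemble_predict(x, weak_defenses)
```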
33. Threat model
[Table: threat models (zero-knowledge, black-box, gray-box, white-box) classified by which components the adversary knows the parameters of: the target classifier, the weak defenses, the ensemble strategy, and the existence of the defense]
35. White-box adversaries may be able to successfully attack the defense
[Figure: test accuracy and detected rate (0.00 to 1.00) versus the max normalized dissimilarity of the attack (0.2 to 1.0), for three configurations: detection + MV ensemble, MV ensemble, and detector]
36. And it comes with a high cost
[Figure: the dissimilarity of white-box adversarial examples and the time (seconds) needed to generate them]
37. The adversarial examples generated by a white-box adversary are easily detectable
[Figure: distributions of dissimilarity for gray-box versus white-box adversarial examples]
38. So, we can detect them easily
[Figure: as on slide 35, test accuracy and detected rate (0.00 to 1.00) versus max normalized dissimilarity (0.2 to 1.0) for detection + MV ensemble, MV ensemble, and detector]
39. Interested in getting involved?
• Contribute to the project code: https://github.com/softsys4ai/athena
• Check out the Athena paper: https://arxiv.org/abs/2001.00308