1. Introduction to Adversarial Machine Learning
Fabio Roli
https://www.pluribus-one.it/research/sec-ml/wild-patterns/
Pattern Recognition and Applications Lab, University of Cagliari, Italy
ITASEC 2019, Italian Conference on Cybersecurity, February 14, Pisa, Italy
2. We Are Living in the Best of All Possible Worlds…
AI is going to transform industry and business
as electricity did about a century ago
(Andrew Ng, Jan. 2017)
9. Deep Neural Networks for EXE Malware Detection
• MalConv: a convolutional deep network trained on raw bytes to detect EXE malware
• Gradient-based attacks can evade it by adding a few padding bytes
[Kolosnjaji, Biggio, Roli et al., Adversarial Malware Binaries, EUSIPCO 2018]
10. Take-home Message
We are living in exciting times for machine learning…
…Our work feeds many consumer technologies for personal applications…
This opens up big new possibilities, but also new security risks
11. The Classical Statistical Model
Note these two implicit assumptions of the model:
1. the source of data is given, and it does not depend on the classifier
2. the noise affecting the data is stochastic
[Figure: pipeline from the data source through acquisition/measurement and the feature vector (x1, x2, …, xd) to the learning algorithm and classifier, with stochastic noise affecting the data. Example: OCR]
12. Can This Model Be Used Under Attack?
[Figure: the same classical pipeline as above, from data source to classifier, with stochastic noise. Example: OCR]
13. An Example: Spam Filtering
• The famous SpamAssassin filter is really a linear classifier
  – http://spamassassin.apache.org
• Feature weights: buy = 1.0, viagra = 5.0
• Email "From: spam@example.it / Buy Viagra!": total score = 6.0 > 5.0 (threshold), so the email is labeled Spam
15. But spam filtering is not a stationary classification task: the data source is not neutral…
16. The Data Source Can Add “Good” Words
• Feature weights: buy = 1.0, viagra = 5.0, conference = -2.0, meeting = -3.0
• Email "From: spam@example.it / Buy Viagra! conference meeting": total score = 1.0 < 5.0 (threshold), so the email is labeled Ham
• Adding good words is a typical spammer's trick [Z. Jorgensen et al., JMLR 2008] (see the sketch below)
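A minimal sketch of this good-word attack, assuming the toy weights and threshold from the slide (hypothetical values and a bag-of-words score, not the real SpamAssassin rules):

```python
# Toy linear spam scorer: score = sum of weights of the words in the email.
weights = {"buy": 1.0, "viagra": 5.0, "conference": -2.0, "meeting": -3.0}
threshold = 5.0

def score(words):
    return sum(weights.get(w, 0.0) for w in words)

spam = ["buy", "viagra"]
print(score(spam) > threshold)                               # True  -> Spam
print(score(spam + ["conference", "meeting"]) > threshold)   # False -> Ham
```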
17. Adding Good Words: Feature Space View
[Figure: two-dimensional feature space (x1, x2) with positive and negative samples separated by the decision boundary yc(x); adding the good words "conference meeting" to the email "Buy Viagra!" moves its feature vector across the boundary into the negative (ham) region. Feature weights: buy = 1.0, viagra = 5.0, conference = -2.0, meeting = -3.0]
• Note that spammers corrupt patterns with noise that is not random.
18. Is This Model Good for Spam Filtering?
• the source of data is given, and it does not depend on the classifier
• the noise affecting the data is stochastic ("random")
[Figure: the classical pipeline again, from data source to classifier, with stochastic noise. Example: OCR]
20. Adversarial Machine Learning
1. the source of data is not neutral: it really depends on the classifier
2. the noise is not stochastic but adversarial: it is crafted to maximize the classification error
[Figure: the pipeline with a non-neutral data source and adversarial noise: the spam message "Buy Viagra" is camouflaged as "Buy Vi@gra Dublin University" before measurement and feature extraction (x1, …, xn)]
21. Adversarial Noise vs. Stochastic Noise
• This distinction is not new…
• Shannon's stochastic noise model: a probabilistic model of the channel; the probability of occurrence of too many or too few errors is usually low
• Hamming's adversarial noise model: the channel acts as an adversary that arbitrarily corrupts the code word, subject to a bound on the total number of errors
22. The Classical Model Cannot Work
• Standard classification algorithms assume that the data-generating process is independent of the classifier
  – this is not the case for adversarial tasks
• It is easy to see that classifier performance degrades quickly if the adversarial noise is not taken into account
• Adversarial tasks are mission impossible for the classical model
24. Adversary-aware Machine Learning
Machine learning systems should be aware of the arms race with the adversary:
1. the adversary analyzes the classifier
2. the adversary devises and executes an attack
3. the designer analyzes the attack
4. the designer designs defences
[Biggio, Fumera, Roli. Security evaluation of pattern classifiers under attack, IEEE TKDE, 2014]
25. How Can We Design Adversary-aware Machine Learning Systems?
27. Know your adversary
"If you know the enemy and know yourself, you need not fear the result of a hundred battles."
(Sun Tzu, The Art of War, 500 BC)
31. Adversary's Knowledge
[B. Biggio, G. Fumera, F. Roli, IEEE Trans. on KDE 2014]
• Perfect-knowledge (white-box) attacks: the adversary knows the training data, the feature representation (x1, …, xd), the learning algorithm (e.g., an SVM), its parameters (e.g., feature weights), and can get feedback on decisions
  – they provide an upper bound on the performance degradation under attack
32. Adversary's Knowledge
[B. Biggio, G. Fumera, F. Roli, IEEE Trans. on KDE 2014]
• Limited-knowledge attacks: only some of the above components (training data, feature representation, learning algorithm, parameters, feedback on decisions) are known to the adversary
  – ranging from gray-box to black-box attacks
33. Kerckhoffs' Principle
• Kerckhoffs' Principle (Kerckhoffs, 1883) states that the security of a system should not rely on unrealistic expectations of secrecy
  – it is the opposite of the principle of "security by obscurity"
• Secure systems should make minimal assumptions about what can realistically be kept secret from a potential attacker
• For machine learning systems, one could assume that the adversary is aware of the learning algorithm and can obtain some degree of information about the data used to train the learner
• But the best strategy is to assess system security under different levels of adversary's knowledge
[A. D. Joseph et al., Adversarial Machine Learning, Cambridge Univ. Press, 2017]
34. Adversary's Capability
[B. Biggio, G. Fumera, F. Roli, IEEE TKDE 2014; M. Barreno et al., ML 2010]
• Attack at training time (a.k.a. poisoning): the adversary injects malicious samples (e.g., http://www.vulnerablehotel.com/login/) into the training data of a learning-based classifier
36. A Deliberate Poisoning Attack?
Microsoft deployed Tay, an AI chatbot designed to talk to youngsters on Twitter, but after 16 hours the chatbot was shut down since it had started to post racist and offensive comments.
[http://exploringpossibilityspace.blogspot.it/2016/03/poor-software-qa-is-root-cause-of-tay.html]
38. Adversary's Capability
• Luckily, the adversary is not omnipotent; she is constrained…
  – email messages must be understandable by human readers
  – data packets must execute on a computer, usually exploit a known vulnerability, and violate a (sometimes explicit) security policy
  – spoofing attacks are not perfect replicas of the live biometric traits
[B. Biggio et al., IET Biometrics, 2012; R. Lippmann, Dagstuhl Workshop, Sept. 2012]
39. Adversary's Capability
[B. Biggio, G. Fumera, F. Roli, IEEE Trans. KDE 2014]
• Constraints on data manipulation:
  – at training time: maximum number of samples that can be added to the training data
    • the attacker usually controls only a small fraction of the training samples
  – at test time: maximum amount of modification
    • application-specific constraints in feature space, e.g., a maximum number of words modified in spam emails: d(x, x′) ≤ d_max
[Figure: feasible domain d(x, x′) ≤ d_max around a sample x, within which the attack point x′ can move across the decision boundary f(x)]
40. Conservative Design
• The design and analysis of a system should avoid unnecessary or unreasonable assumptions about, and limitations on, the adversary
• This allows one to assess "worst-case" scenarios
• Conversely, however, analysing the capabilities of an omnipotent adversary reveals little about a learning system's behaviour against realistic, constrained attackers
• Again, the best strategy is to assess system security under different levels of adversary's capability
[A. D. Joseph et al., Adversarial Machine Learning, Cambridge Univ. Press, 2017]
42. The Discovery of Adversarial Examples
... we find that deep neural networks learn input-output mappings that
are fairly discontinuous to a significant extent. We can cause the
network to misclassify an image by applying a certain hardly
perceptible perturbation, which is found by maximizing the network’s
prediction error ...
[Szegedy, Goodfellow et al., Intriguing Properties of NNs, ICLR 2014, ArXiv 2013]
43. Adversarial Examples and Deep Learning
• C. Szegedy et al. (ICLR 2014) independently developed a gradient-based attack against deep neural networks
  – minimally-perturbed adversarial examples
[Szegedy, Goodfellow et al., Intriguing Properties of NNs, ICLR 2014, ArXiv 2013]
44. Creation of Adversarial Examples
Given a classifier f and an input x, find a minimal adversarial noise r such that f(x + r) = l for a chosen target label l ≠ f(x).
Example: a School Bus (x), plus adversarial noise (r), is classified as an Ostrich (Struthio camelus).
The adversarial image x + r is visually hard to distinguish from x.
Informally speaking, the solution x + r is the closest image to x classified as l by f.
The solution is approximated using a box-constrained limited-memory BFGS (L-BFGS).
[Szegedy, Goodfellow et al., Intriguing Properties of NNs, ICLR 2014, ArXiv 2013]
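A hedged sketch of this formulation, assuming a stand-in differentiable model and the usual soft version of the objective (perturbation penalty plus classification loss for the target label); the box constraint is enforced here by clamping, a simplification of the original box-constrained L-BFGS:

```python
# Minimize c*||r||^2 + loss(f(x + r), target) with x + r kept in [0, 1].
import torch, torch.nn as nn, torch.nn.functional as F

net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # stand-in model
x = torch.rand(1, 1, 28, 28)          # input image in [0, 1]
target = torch.tensor([2])            # desired (wrong) label l
r = torch.zeros_like(x, requires_grad=True)
opt = torch.optim.LBFGS([r], lr=0.1)
c = 0.1                               # trade-off between ||r|| and the loss

def closure():
    opt.zero_grad()
    x_adv = (x + r).clamp(0, 1)       # box constraint (simplified)
    loss = c * r.pow(2).sum() + F.cross_entropy(net(x_adv), target)
    loss.backward()
    return loss

for _ in range(20):
    opt.step(closure)
x_adv = (x + r).detach().clamp(0, 1)  # the adversarial example x + r
```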
45. Many Adversarial Examples After 2014…
[Search https://arxiv.org with keywords "adversarial examples"]
Several defenses have been proposed against adversarial examples, and more powerful attacks have been developed to show that they are ineffective.
Most of these attacks are modifications of the optimization problems reported for evasion attacks / adversarial examples, using different gradient-based solution algorithms, initializations, and stopping conditions.
Most popular attack algorithms: FGSM (Goodfellow et al.), JSMA (Papernot et al.), CW (Carlini & Wagner, and follow-up versions).
47. Why Are Adversarial Perturbations against Deep Networks Imperceptible?
• Large sensitivity of g(x) to input changes
  – i.e., the input gradient ∇x g(x) has a large norm (it scales with the input dimension!)
  – thus, even small modifications along that direction cause large changes in the predictions (see the illustration below)
[Simon-Gabriel et al., Adversarial vulnerability of NNs increases with input dimension, ArXiv 2018]
[Figure: decision function g(x); a small step from x to x′ along ∇x g(x) crosses the decision boundary]
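A tiny numerical illustration of the dimensionality argument, under the assumption of a stand-in gradient with O(1) entries: an ℓ∞ perturbation of size ε can change the output by up to ε·‖∇g‖₁, which grows with the input dimension d.

```python
# Worst-case score change of a linear(ized) model under an l-inf perturbation.
import numpy as np

rng = np.random.default_rng(0)
eps = 0.01
for d in [100, 1_000, 10_000, 100_000]:
    w = rng.normal(size=d)              # stand-in input gradient
    print(d, eps * np.abs(w).sum())     # grows roughly linearly with d
```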
49. Adversarial Images in the Physical World
• Do adversarial images fool deep networks even when they operate in the physical world, e.g., when images are taken from a cell-phone camera?
  – Alexey Kurakin et al. (2016, 2017) explored the possibility of creating adversarial images for machine learning systems which operate in the physical world. They used images taken from a cell-phone camera as input to an Inception v3 image classification neural network.
  – They showed that, in such a set-up, a significant fraction of adversarial images crafted using the original network are misclassified even when fed to the classifier through the camera.
[Alexey Kurakin et al., ICLR 2017]
53. No, we shouldn't…
In this paper, we show experiments that suggest that a trained neural
network classifies most of the pictures taken from different distances and
angles of a perturbed image correctly. We believe this is because the
adversarial property of the perturbation is sensitive to the scale at which
the perturbed picture is viewed, so (for example) an autonomous car will
misclassify a stop sign only from a small range of distances.
[arXiv:1707.03501; CVPR 2017]
57. Are Adversarial Examples a Real Security Threat?
• Adversarial examples can exist in the physical world: we can fabricate concrete adversarial objects (glasses, road signs, etc.)
• But the effectiveness of attacks carried out by adversarial objects is still to be investigated with large-scale experiments in realistic security scenarios
• Gilmer et al. (2018) have recently discussed the realism of the security threat caused by adversarial examples, pointing out that it should be investigated more carefully
  – Are indistinguishable adversarial examples a real security threat?
  – For which real security scenarios are adversarial examples the best attack vector, better than attacking components outside the machine learning component?
  – …
[Justin Gilmer et al., Motivating the Rules of the Game for Adversarial Example Research, https://arxiv.org/abs/1807.06732]
58. Attacks with Content Preservation
There are well-known security applications where minimal perturbations and indistinguishability of adversarial inputs are not required at all…
59. To Conclude…
This is a recent research field…
Dagstuhl Perspectives Workshop on
“Machine Learning in Computer Security”
Schloss Dagstuhl, Germany, Sept. 9th-14th, 2012
60. Timeline of Learning Security
Adversarial ML:
• 2004-2005: pioneering work (Dalvi et al., KDD 2004; Lowd & Meek, KDD 2005)
• 2006-2010: Barreno, Nelson, Rubinstein, Joseph, Tygar, "The Security of Machine Learning"
• 2013: Srndic & Laskov (NDSS) claim that nonlinear classifiers are secure
• 2013-2014: Biggio et al. (ECML, IEEE TKDE) show the vulnerability of nonlinear classifiers with high-confidence and black-box evasion attacks
• 2014: Srndic & Laskov (IEEE S&P) show the vulnerability of nonlinear classifiers with our ECML '13 gradient-based attack
Security of DNNs:
• 2014: Szegedy et al. (ICLR): adversarial examples vs. deep learning
• 2016: Papernot et al. (IEEE S&P): evasion attacks / adversarial examples
• 2017: Papernot et al. (ASIACCS): black-box evasion attacks
• 2017: Carlini & Wagner (IEEE S&P): high-confidence evasion attacks
• 2017: Grosse et al. (ESORICS): application to Android malware
• 2017: Demontis et al. (IEEE TDSC): secure learning for Android malware
62. Timeline of Learning Security
Four strands of work: pioneering work on adversarial machine learning; work on evasion attacks (a.k.a. adversarial examples); work on security evaluation of learning algorithms; and the same attacks applied to malware detection (PDF / Android).

Adversarial ML:
• 2004-2005: pioneering work. Dalvi et al., KDD 2004; Lowd & Meek, KDD 2005. Main contributions: minimum-distance evasion of linear classifiers; notion of adversary-aware classifiers
• 2006: Globerson & Roweis, ICML; 2009: Kolcz et al., CEAS; 2010: Biggio et al., IJMLC. Main contributions: evasion attacks against linear classifiers in spam filtering
• 2006-2010: Barreno, Nelson, Rubinstein, Joseph, Tygar, "The Security of Machine Learning" (and references therein). Main contributions: first consolidated view of the adversarial ML problem; attack taxonomy; exemplary attacks against some learning algorithms
• 2013: Srndic & Laskov, NDSS. Main contributions: evasion of linear PDF malware detectors; claims nonlinear classifiers can be more secure
• 2013: Biggio et al., ECML-PKDD: demonstrated the vulnerability of nonlinear algorithms to gradient-based evasion attacks, also under limited knowledge. Main contributions: 1. gradient-based adversarial perturbations (against SVMs and neural nets); 2. projected gradient descent / iterative attack (also on discrete features from malware data), transfer attack with a surrogate/substitute model; 3. maximum-confidence evasion (rather than minimum-distance evasion)
• 2014: Biggio et al., IEEE TKDE. Main contributions: framework for security evaluation of learning algorithms; attacker's model in terms of goal, knowledge, capability
• 2014: Srndic & Laskov, IEEE S&P: used Biggio et al.'s ECML-PKDD '13 gradient-based evasion attack to demonstrate the vulnerability of nonlinear PDF malware detectors

Security of DNNs:
• 2014: Szegedy et al., ICLR: independent discovery of (gradient-based) minimum-distance adversarial examples against deep nets; earlier implementation of adversarial training
• 2015: Goodfellow et al., ICLR: maximin formulation of adversarial training, with adversarial examples generated iteratively in the inner loop
• 2016: Papernot et al., IEEE S&P: framework for security evaluation of deep nets
• 2016: Papernot et al., Euro S&P: distillation defense (gradient masking)
• 2016: Kurakin et al.: basic iterative attack with projected gradient to generate adversarial examples
• 2017: Papernot et al., ASIACCS: black-box evasion attacks with substitute models (breaks distillation with transfer attacks on a smoother surrogate classifier)
• 2017: Carlini & Wagner, IEEE S&P: breaks distillation again with maximum-confidence evasion attacks (rather than minimum-distance adversarial examples)
• 2017: Grosse et al., ESORICS: adversarial examples for malware detection
• 2017: Demontis et al., IEEE TDSC, "Yes, Machine Learning Can Be More Secure! A Case Study on Android Malware Detection". Main contributions: secure SVM against adversarial examples in malware detection
• 2018: Madry et al., ICLR: improves the basic iterative attack from Kurakin et al. by adding noise before running the attack; first successful use of adversarial training to generalize across many attack algorithms

Biggio and Roli, Wild Patterns: Ten Years After the Rise of Adversarial Machine Learning, Pattern Recognition, 2018
63. Learning Comes at a Price!
The introduction of novel learning functionalities increases the attack surface of computer systems and produces new vulnerabilities.
Safety of machine learning will be more and more important in future computer systems, as well as accountability, transparency, and the protection of fundamental human values and rights.
64. Vulnerability Assessment of Machine Learning
Battista Biggio
https://www.pluribus-one.it/research/sec-ml/wild-patterns/
Pattern Recognition and Applications Lab, University of Cagliari, Italy
ITASEC 2019, Italian Conference on Cybersecurity, February 14, Pisa, Italy
65. Be Proactive
"To know your enemy, you must become your enemy."
(Sun Tzu, The Art of War, 500 BC)
66. Be Proactive
• Given a model of the adversary characterized by her:
  – Goal
  – Knowledge
  – Capability
• Try to anticipate the adversary!
  – What is the optimal attack she can do?
  – What is the expected performance decrease of your classifier?
67. Evasion of Linear Classifiers
• Problem: how to evade a linear (trained) classifier f(x) = sign(wᵀx)?

Original email x: "Start 2007 with a bang! Make WBFS YOUR PORTFOLIO's first winner of the year …"
Camouflaged email x′: "St4rt 2007 with a b4ng! Make WBFS YOUR PORTFOLIO's first winner of the year … campus"

  feature       w     x    x′
  start        +2     1     0
  bang         +1     1     0
  portfolio    +1     1     1
  winner       +1     1     1
  year         +1     1     1
  …
  university   -3     0     0
  campus       -4     0     1

Score of x: +6 > 0, SPAM (correctly classified)
Score of x′: +3 - 4 < 0, HAM (misclassified email)
(see the sketch below)
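A minimal sketch of this attack: greedily flip the features that decrease the score the most, up to d_max changes, assuming binary bag-of-words features and the toy weights from the slide.

```python
# Evading f(x) = sign(w.x): obfuscate present positive-weight words (1 -> 0)
# and add absent negative-weight words (0 -> 1).
import numpy as np

w = np.array([+2, +1, +1, +1, +1, -3, -4])   # start, bang, ..., univ., campus
x = np.array([1, 1, 1, 1, 1, 0, 0])          # original spam email

def evade_linear(w, x, d_max):
    x = x.copy()
    gain = np.where(x == 1, w, -w)           # score reduction per feature flip
    for i in np.argsort(gain)[::-1][:d_max]:
        if gain[i] <= 0:                     # no useful flips left
            break
        x[i] = 1 - x[i]
    return x

x_adv = evade_linear(w, x, d_max=3)
print(x_adv, w @ x_adv)   # score drops below 0 -> classified as HAM
```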
68. Evasion of Nonlinear Classifiers
• What if the classifier is nonlinear?
• Decision functions can be arbitrarily complicated, with no clear relationship between features (x) and classifier parameters (w)
[Figure: a nonlinear decision function over a two-dimensional feature space]
69. Detection of Malicious PDF Files
Srndic & Laskov, Detection of malicious PDF files based on hierarchical document structure, NDSS 2013
“The most aggressive evasion strategy we could conceive was successful for only 0.025% of malicious
examples tested against a nonlinear SVM classifier with the RBF kernel [...].
Currently, we do not have a rigorous mathematical explanation for such a surprising robustness. Our
intuition suggests that [...] the space of true features is “hidden behind” a complex nonlinear
transformation which is mathematically hard to invert.
[...] the same attack staged against the linear classifier [...] had a 50% success rate; hence, the robustness
of the RBF classifier must be rooted in its nonlinear transformation”
70. Evasion Attacks against Machine Learning at Test Time
Biggio, Corona, Maiorca, Nelson, Srndic, Laskov, Giacinto, Roli, ECML-PKDD 2013
• Goal: maximum-confidence evasion
• Knowledge: perfect (white-box attack)
• Attack strategy:
    min_x′ g(x′)
    s.t. d(x, x′) ≤ d_max
  where f(x) = sign(g(x)): +1 malicious, -1 legitimate
• Non-linear, constrained optimization
  – projected gradient descent: approximate solution for smooth functions
• Gradients of g(x) can be analytically computed in many cases
  – SVMs, neural networks
71. Computing Descent Directions
• Support vector machines:
    g(x) = Σᵢ αᵢ yᵢ k(x, xᵢ) + b,   ∇g(x) = Σᵢ αᵢ yᵢ ∇k(x, xᵢ)
  RBF kernel gradient: ∇k(x, xᵢ) = -2γ exp{-γ‖x − xᵢ‖²}(x − xᵢ)
• Neural networks (inputs x₁…x_d, hidden units δ₁…δ_m with input weights v_kf and output weights w_k, sigmoid output g(x)):
    g(x) = [1 + exp(−Σ_{k=1}^m w_k δ_k(x))]⁻¹
    ∂g(x)/∂x_f = g(x)(1 − g(x)) Σ_{k=1}^m w_k δ_k(x)(1 − δ_k(x)) v_kf
[Biggio, Roli et al., ECML PKDD 2013]
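A minimal numpy sketch of this attack against an RBF SVM, assuming continuous features, an ℓ2 constraint, and sklearn's fitted SVC attributes (dual_coef_, support_vectors_, intercept_) to evaluate the gradient of g analytically:

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.1, random_state=0)
gamma = 2.0
clf = SVC(kernel="rbf", gamma=gamma, C=1.0).fit(X, y)

def grad_g(clf, x):
    sv = clf.support_vectors_                          # support vectors x_i
    a = clf.dual_coef_.ravel()                         # alpha_i * y_i
    k = np.exp(-gamma * ((sv - x) ** 2).sum(axis=1))   # RBF kernel values
    return (a * k) @ (-2 * gamma * (x - sv))           # sum_i a_i y_i grad k

def evade(clf, x0, d_max, eta=0.05, n_iter=200):
    x = x0.copy()
    for _ in range(n_iter):
        g = grad_g(clf, x)
        x = x - eta * g / (np.linalg.norm(g) + 1e-12)  # descend g(x)
        delta = x - x0
        if np.linalg.norm(delta) > d_max:              # project back onto
            x = x0 + delta * d_max / np.linalg.norm(delta)  # the l2 ball
    return x

x_adv = evade(clf, X[y == 1][0], d_max=1.0)
print(clf.decision_function([X[y == 1][0], x_adv]))    # g before / after
```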
72. An Example on Handwritten Digits
• Nonlinear SVM (RBF kernel) to discriminate between '3' and '7'
• Features: gray-level pixel values (28 x 28 image = 784 features)
• Few modifications are enough to evade detection!
• The first adversarial examples generated with gradient-based attacks date back to 2013, one year before the attacks on deep neural networks!
[Figure: a '3' before the attack; the same digit after the attack at g(x) = 0 and at the last iteration, misclassified as '7'; g(x) decreasing over the number of iterations]
[Biggio, Roli et al., ECML PKDD 2013]
73. Bounding the Adversary's Knowledge
Limited-knowledge (gray/black-box) attacks
• Only the feature representation and (possibly) the learning algorithm are known
• Surrogate data sampled from the same distribution p(X, Y) as the classifier's training data
• The classifier's feedback can be used to label the surrogate data: send queries to the target f(x), get labels, and learn a surrogate classifier f′(x)
• This is the same underlying idea behind substitute models and black-box attacks (transferability) [N. Papernot et al., IEEE Euro S&P '16; N. Papernot et al., ASIACCS '17]
[Biggio, Roli et al., ECML PKDD 2013]
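A minimal sketch of this procedure, assuming query-only access to the target through a hypothetical target_predict function:

```python
# Query the black-box target for labels, then train a substitute model whose
# gradients can be attacked; transferability does the rest.
from sklearn.linear_model import LogisticRegression

def build_surrogate(target_predict, X_surrogate):
    y_queried = target_predict(X_surrogate)      # send queries, get labels
    surrogate = LogisticRegression(max_iter=1000)
    surrogate.fit(X_surrogate, y_queried)        # learn f'(x)
    return surrogate                             # attack f'(x); rely on
                                                 # transferability to hit f(x)
```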
74. Recent Results on Android Malware Detection
• Drebin: Arp et al., NDSS 2014
  – Android malware detection directly on the mobile phone
  – linear SVM trained on binary features x extracted from static code analysis of the Android app (apk), classified as malware or benign
[Demontis, Biggio et al., IEEE TDSC 2017]
75. Recent Results on Android Malware Detection
• Dataset (Drebin): 5,600 malware and 121,000 benign apps (TR: 30K, TS: 60K)
• Detection rate at FP = 1% vs. maximum number of manipulated features (averaged over 10 runs)
  – perfect-knowledge (PK) white-box attack; limited-knowledge (LK) black-box attack
[Demontis, Biggio et al., IEEE TDSC 2017]
76. Take-home Messages
• Linear and non-linear supervised classifiers can be highly vulnerable to well-crafted evasion attacks, e.g., obtained by solving
    min_x′ g(x′)
    s.t. d(x, x′) ≤ d_max, x ≤ x′
• Performance evaluation should always be performed as a function of the adversary's knowledge and capability
  – security evaluation curves (see the sketch below)
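A minimal sketch of a security evaluation curve for a linear SVM on synthetic data, using the closed-form worst-case ℓ2 evasion step x′ = x − d_max·w/‖w‖ that holds for linear models:

```python
# Detection rate on "malicious" samples vs. attack strength d_max.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
clf = LinearSVC(C=1.0, max_iter=10000).fit(X[:400], y[:400])
w = clf.coef_.ravel()

Xm = X[400:][y[400:] == 1]                      # "malicious" test samples
for d_max in [0.0, 0.25, 0.5, 1.0, 2.0]:
    Xadv = Xm - d_max * w / np.linalg.norm(w)   # optimal bounded l2 evasion
    det_rate = (clf.predict(Xadv) == 1).mean()  # still detected as malicious
    print(f"d_max={d_max:4.2f}  detection rate={det_rate:.2f}")
```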
77. Why Is Machine Learning So Vulnerable?
• Learning algorithms tend to overemphasize some features to discriminate among classes
• Large sensitivity to changes of such input features: ∇x g(x)
• Different classifiers tend to find the same set of relevant features
  – that is why attacks can transfer across models!
[Melis et al., Explaining Black-box Android Malware Detection, EUSIPCO 2018]
[Figure: decision function g(x); a small step from x to x′ moves the sample from the negative to the positive region]
79. Is Deep Learning Safe for Robot Vision?
• Evasion attacks against the iCub humanoid robot
  – deep neural network used for visual object recognition
[Melis, Biggio et al., Is Deep Learning Safe for Robot Vision? ICCVW ViPAR 2017]
81. From Binary to Multiclass Evasion
• In multiclass problems, classification errors occur in different classes.
• Thus, the attacker may aim:
  1. to have a sample misclassified as any class different from the true class (error-generic attacks), e.g., a cup misclassified as any of sponge, dish detergent, …
  2. to have a sample misclassified as a specific class (error-specific attacks), e.g., a cup misclassified exactly as a sponge
(see the sketch after this list)
[Melis, Biggio et al., Is Deep Learning Safe for Robot Vision? ICCVW ViPAR 2017]
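A hedged sketch of the two objectives, assuming a differentiable classifier whose class scores (logits) are available:

```python
# Confidence margin of class k against its closest competing class l.
import torch

def margin(logits, k):
    competing = logits.clone()
    competing[k] = -float("inf")       # exclude class k itself
    return logits[k] - competing.max() # f_k(x) - max_{l != k} f_l(x)

# error-generic evasion: *minimize* margin(logits, true_class), so the sample
# falls into whichever competing class is closest (could be any!).
# error-specific evasion: *maximize* margin(logits, target_class), so the
# sample is assigned to the chosen target class.
```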
82. Error-generic Evasion
• Error-generic evasion
  – k is the true class (blue)
  – l is the competing (closest) class in feature space (red)
• The attack minimizes the objective to have the sample misclassified as the closest class (could be any!)
[Figure: indiscriminate evasion in a two-dimensional feature space]
[Melis, Biggio et al., Is Deep Learning Safe for Robot Vision? ICCVW ViPAR 2017]
83. Error-specific Evasion
• Error-specific evasion
  – k is the target class (green)
  – l is the competing class (initially, the blue class)
• The attack maximizes the objective to have the sample misclassified as the target class
[Figure: targeted evasion in a two-dimensional feature space]
[Melis, Biggio et al., Is Deep Learning Safe for Robot Vision? ICCVW ViPAR 2017]
84. Adversarial Examples against iCub – Gradient Computation
The given optimization problems can both be solved with gradient-based algorithms.
The gradient of the objective can be computed using the chain rule:
    ∇fᵢ(x) = (∂fᵢ(z)/∂z) (∂z/∂x)
1. the gradient of the functions fᵢ(z) can be computed if the chosen classifier is differentiable
2. … and then backpropagated through the deep network with automatic differentiation
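A minimal autograd sketch of this chain-rule computation, assuming a small stand-in network in place of the iCub deep net:

```python
# Backpropagate a class score f_i through the whole network to the input.
import torch, torch.nn as nn

net = nn.Sequential(nn.Conv2d(3, 8, 5), nn.ReLU(), nn.Flatten(),
                    nn.LazyLinear(10))           # stand-in deep net f(x)
x = torch.rand(1, 3, 32, 32, requires_grad=True)
scores = net(x)            # class scores f(x)
scores[0, 3].backward()    # chain rule: df_i/dz * dz/dx via autograd
grad = x.grad              # gradient of f_i w.r.t. the input image
```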
85. Example of Adversarial Images against iCub
An adversarial example from the class laundry-detergent, modified by the proposed algorithm to be misclassified as cup.
[Melis, Biggio et al., Is Deep Learning Safe for Robot Vision? ICCVW ViPAR 2017]
86. The "Sticker" Attack against iCub
Adversarial example generated by manipulating only a specific region, to simulate a sticker that could be applied to the real-world object. This image is classified as cup.
[Melis, Biggio et al., Is Deep Learning Safe for Robot Vision? ICCVW ViPAR 2017]
90. Reducing Input Sensitivity via Robust Optimization
• Robust optimization (a.k.a. adversarial training):
    min_w Σᵢ max_{‖δᵢ‖ ≤ ε} ℓ(yᵢ, f_w(xᵢ + δᵢ))   (bounded perturbation!)
• Robustness and regularization (Xu et al., JMLR 2009)
  – under linearity of ℓ and f_w, equivalent to robust optimization:
    min_w Σᵢ ℓ(yᵢ, f_w(xᵢ)) + ε‖∇x f‖*
  where ‖·‖* is the dual norm of the perturbation, and ‖∇x f‖* = ‖w‖*
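A minimal adversarial-training sketch in this spirit: the inner maximization is approximated with a few projected-gradient steps under an ℓ∞ bound (assumptions: net is any differentiable classifier; hyperparameters are illustrative):

```python
import torch, torch.nn.functional as F

def pgd(net, x, y, eps=0.1, alpha=0.02, steps=5):
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(net(x + delta), y)   # inner max objective
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()      # ascend the loss
            delta.clamp_(-eps, eps)                 # project onto l-inf ball
        delta.grad.zero_()
    return (x + delta).detach()

def train_step(net, opt, x, y):
    x_adv = pgd(net, x, y)                  # approximate inner max
    opt.zero_grad()
    loss = F.cross_entropy(net(x_adv), y)   # outer min on worst-case points
    loss.backward()
    opt.step()
    return loss.item()
```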
91. Experiments on Android Malware
Results on adversarial Android malware [Demontis, Biggio et al., Yes, ML Can Be More Secure!…, IEEE TDSC 2017]
• Infinity-norm regularization is the optimal regularizer against sparse evasion attacks
  – sparse evasion attacks bound the ℓ1 norm of the perturbation, promoting the manipulation of only a few features
• Why? It bounds the maximum absolute weight values:
    Sec-SVM:  min_{w,b} ‖w‖∞ + C Σᵢ max(0, 1 − yᵢ f(xᵢ)),   with ‖w‖∞ = max_{i=1,…,d} |wᵢ|
[Figure: absolute weight values |w| in descending order, flatter for Sec-SVM than for a standard SVM]
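A hedged numpy sketch of the Sec-SVM idea: hinge loss with ℓ∞ weight regularization trained by plain subgradient descent (a simplified stand-in for the optimizer used in the paper):

```python
import numpy as np

def sec_svm(X, y, C=1.0, lr=0.01, epochs=500):
    # X: (n, d) features, y: labels in {-1, +1}
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        viol = y * (X @ w + b) < 1                       # hinge-active samples
        grad_w = -C * (y[viol, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        j = np.argmax(np.abs(w))                         # subgradient of ||w||_inf
        grad_w[j] += np.sign(w[j])
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```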
92. Adversarial Training and Regularization
• Adversarial training can also be seen as a form of regularization, which penalizes the (dual) norm of the input gradients, ε‖∇x ℓ‖*
• Known as double backprop or gradient/Jacobian regularization
  – see, e.g., Simon-Gabriel et al., Adversarial vulnerability of neural networks increases with input dimension, ArXiv 2018; and Lyu et al., A unified gradient regularization family for adversarial examples, ICDM 2015
• Take-home message: the net effect of these techniques is to make the prediction function of the classifier smoother
[Figure: g(x) with and without adversarial training; the smoother function requires a larger perturbation from x to x′ to cross the boundary]
93. Ineffective Defenses: Obfuscated Gradients
• Work by Carlini & Wagner (SP '17) and Athalye et al. (ICML '18) has shown that
  – some recently-proposed defenses rely on obfuscated / masked gradients, and
  – they can be circumvented
• Obfuscated gradients do not allow the correct execution of gradient-based attacks…
• … but substitute models and/or smoothing can correctly reveal meaningful input gradients!
[Figure: a jagged, gradient-masked g(x) vs. a smoothed version of g(x) that restores useful gradients from x to x′]
95. Detecting & Rejecting Adversarial Examples
• Adversarial examples tend to occur in blind spots
  – regions far from the training data that are anyway assigned to 'legitimate' classes
[Figure: blind-spot evasion (not even required to mimic the target class) vs. rejection of adversarial examples through enclosing of the legitimate classes]
99. Poisoning Machine Learning
[Figure: the standard learning pipeline. The labeled training email "Start 2007 with a bang! Make WBFS YOUR PORTFOLIO's first winner of the year …" (SPAM) is mapped by pre-processing and feature extraction to the binary features start, bang, portfolio, winner, year = 1 and university, campus = 0; classifier learning produces the weights start = +2, bang = +1, portfolio = +1, winner = +1, year = +1, university = -3, campus = -4, and the classifier generalizes well on test data]
100. Poisoning Machine Learning
[Figure: the same pipeline with corrupted training data. Poisoning points are injected into the training set, e.g., the spam email now also contains "university campus…", so all its features (including university and campus) are set to 1; classifier learning is compromised and all weights become positive (e.g., university = +1, campus = +1)… maximizing the error on test data]
101. Poisoning Attacks against Machine Learning
• Goal: to maximize classification error
• Knowledge: perfect / white-box attack
• Capability: injecting poisoning samples into the training data
• Strategy: find an optimal attack point xc in the training data that maximizes classification error
[Figure: injecting a single poisoning point xc raises the classification error from 0.022 to 0.039; the rightmost plot shows the classification error as a function of xc]
[Biggio, Nelson, Laskov. Poisoning attacks against SVMs. ICML, 2012]
102. Poisoning is a Bilevel Optimization Problem
• Attacker's objective
  – to maximize generalization error on untainted data, w.r.t. the poisoning point xc:
    max_xc L(D_val, w*)
    s.t. w* = argmin_w ℒ(D_tr ∪ {(xc, yc)}, w)
  where the loss is estimated on validation data (no attack points!), and the algorithm is trained on surrogate data (including the attack point)
• Poisoning problem against (linear) SVMs:
    max_xc Σ_{j=1}^m max(0, 1 − y_j f*(x_j))
    s.t. w* = argmin_{w,b} ½ wᵀw + C Σ_{i=1}^n max(0, 1 − y_i f(x_i)) + C max(0, 1 − y_c f(x_c))
[Biggio, Nelson, Laskov. Poisoning attacks against SVMs. ICML, 2012]
[Xiao, Biggio, Roli et al., Is feature selection secure against training data poisoning? ICML, 2015]
[Munoz-Gonzalez, Biggio, Roli et al., Towards poisoning of deep learning…, AISec 2017]
103. Gradient-based Poisoning Attacks
• The gradient is not easy to compute
  – the training point affects the classification function
• Trick:
  – replace the inner learning problem with its equilibrium (KKT) conditions
  – this enables computing the gradient in closed form
• Example for (kernelized) SVM; a similar derivation holds for Ridge, LASSO, Logistic Regression, etc.
[Figure: gradient-based trajectory of the poisoning point from xc(0) to xc]
[Biggio, Nelson, Laskov. Poisoning attacks against SVMs. ICML, 2012]
[Xiao, Biggio, Roli et al., Is feature selection secure against training data poisoning? ICML, 2015]
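A hedged sketch of the same trick for ridge regression, where the inner problem's closed-form solution plays the role of the KKT conditions and lets autograd differentiate the validation loss w.r.t. the poisoning point xc (all data here is synthetic and illustrative):

```python
import torch

torch.manual_seed(0)
Xtr, ytr = torch.randn(50, 5), torch.randn(50)
Xval, yval = torch.randn(30, 5), torch.randn(30)
lam = 0.1

xc = torch.zeros(1, 5, requires_grad=True)   # poisoning point (features)
yc = torch.tensor([10.0])                    # attacker-chosen label

for _ in range(100):
    X = torch.cat([Xtr, xc]); y = torch.cat([ytr, yc])
    # closed-form inner solution: w* = (X^T X + lam I)^-1 X^T y
    w = torch.linalg.solve(X.T @ X + lam * torch.eye(5), X.T @ y)
    val_loss = ((Xval @ w - yval) ** 2).mean()   # outer objective
    val_loss.backward()
    with torch.no_grad():
        xc += 0.1 * xc.grad          # gradient *ascent* on validation loss
        xc.grad.zero_()
```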
104. Experiments on MNIST digits: single-point attack
• Linear SVM; 784 features; TR: 100, VAL: 500, TS: about 2,000
  – '0' is the malicious (attacking) class
  – '4' is the legitimate (attacked) one
[Figure: the poisoning digit morphs from xc(0) to xc as the attack proceeds]
[Biggio, Nelson, Laskov. Poisoning attacks against SVMs. ICML, 2012]
105. Experiments on MNIST digits: multiple-point attack
• Linear SVM; 784 features; TR: 100, VAL: 500, TS: about 2,000
  – '0' is the malicious (attacking) class
  – '4' is the legitimate (attacked) one
[Biggio, Nelson, Laskov. Poisoning attacks against SVMs. ICML, 2012]
106. How about Poisoning Deep Nets?
• The ICML 2017 Best Paper by Koh et al., "Understanding Black-box Predictions via Influence Functions", has derived adversarial training examples against a DNN
  – they were constructed by attacking only the last layer (a KKT-based attack against logistic regression) and assuming the rest of the network to be "frozen"
107. Towards Poisoning Deep Neural Networks
• Solving the poisoning problem without exploiting KKT conditions (back-gradient optimization)
  – Muñoz-González, Biggio, Roli et al., AISec 2017, https://arxiv.org/abs/1708.08689
109. Security Measures against Poisoning
• Rationale: poisoning injects outlying training samples
• Two main strategies for countering this threat:
  1. data sanitization: remove poisoning samples from the training data
     • bagging for fighting poisoning attacks
     • Reject-On-Negative-Impact (RONI) defense
  2. robust learning: learning algorithms that are robust in the presence of poisoning samples
110. Robust Regression with TRIM
• TRIM learns the model by retaining only the training points with the smallest residuals:
    argmin_{w,b,I} J(w, b, I) = (1/|I|) Σ_{i∈I} (f(xᵢ) − yᵢ)² + λΩ(w)
  with N = n + p (n untainted points plus p poisoning points), I ⊂ {1, …, N}, |I| = n
(see the sketch below)
[Jagielski, Biggio et al., IEEE Symp. Security and Privacy, 2018]
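A minimal sketch of the TRIM loop, alternating between fitting on the retained subset and re-selecting the n points with smallest residuals (assumption: ridge regression as the regularized learner):

```python
import numpy as np
from sklearn.linear_model import Ridge

def trim(X, y, n, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(y), n, replace=False)    # random initial subset I
    model = Ridge(alpha=0.1)
    for _ in range(n_iter):
        model.fit(X[idx], y[idx])                 # fit on retained points
        residuals = (model.predict(X) - y) ** 2
        idx = np.argsort(residuals)[:n]           # keep n smallest residuals
    return model, idx
```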
111. Experiments with TRIM (Loan Dataset)
• TRIM MSE is within 1% of the original model's MSE
[Figure: MSE under increasing poisoning rates for no defense, existing methods, and the TRIM defense (lower is a better defense)]
[Jagielski, Biggio et al., IEEE Symp. Security and Privacy, 2018]
113. Attacks against Machine Learning
Attacker's capability vs. attacker's goal [Biggio & Roli, Wild Patterns, 2018 https://arxiv.org/abs/1712.03141]:
• Test data / Integrity (misclassifications that do not compromise normal system operation): evasion (a.k.a. adversarial examples)
• Test data / Privacy-Confidentiality (querying strategies that reveal confidential information on the learning model or its users): model extraction / stealing; model inversion (hill-climbing); membership inference attacks
• Training data / Integrity: poisoning (to allow subsequent intrusions), e.g., backdoors or neural network trojans
• Training data / Availability (misclassifications that compromise normal system operation): poisoning (to maximize classification error)
Attacker's knowledge:
• perfect-knowledge (PK) white-box attacks
• limited-knowledge (LK) black-box attacks (transferability with surrogate/substitute learning models)
114. Model Inversion Attacks (Privacy Attacks)
• Goal: to extract users' sensitive information (e.g., face templates stored during user enrollment)
  – Fredrikson, Jha, Ristenpart. Model inversion attacks that exploit confidence information and basic countermeasures. ACM CCS, 2015
• Also known as hill-climbing attacks in the biometric community
  – Adler. Vulnerabilities in biometric encryption systems. 5th Int'l Conf. AVBPA, 2005
  – Galbally, McCool, Fierrez, Marcel, Ortega-Garcia. On the vulnerability of face verification systems to hill-climbing attacks. Patt. Rec., 2010
• How: by repeatedly querying the target system and adjusting the input sample to maximize its output score (e.g., a measure of the similarity of the input sample to the user templates)
[Figure: a training image next to the image reconstructed by the attack]
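A hedged sketch of such a hill-climbing loop, assuming query-only access through a hypothetical query_score function that returns the model's confidence for the victim class:

```python
import numpy as np

def hill_climb(query_score, shape=(32, 32), n_iter=2000, step=0.05, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.random(shape)                    # random starting image
    best = query_score(x)
    for _ in range(n_iter):
        cand = np.clip(x + step * rng.standard_normal(shape), 0, 1)
        s = query_score(cand)                # black-box query
        if s > best:                         # keep only improving changes
            x, best = cand, s
    return x                                 # approximate reconstruction
```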
115. Membership Inference Attacks (Privacy Attacks)
(Shokri et al., IEEE Symp. SP 2017)
• Goal: to identify whether an input sample is part of the training set used to learn a deep neural network, based on the observed prediction scores for each class
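A hedged sketch of a simplified membership-inference baseline: guessing that unusually confident predictions correspond to training members (Shokri et al. train shadow models; this confidence-threshold variant is a common simplification):

```python
import numpy as np

def infer_membership(predict_proba, X, threshold=0.9):
    conf = predict_proba(X).max(axis=1)   # top predicted-class confidence
    return conf > threshold               # True = guessed "training member"
```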
116. Backdoor Attacks (Poisoning Integrity Attacks)
Backdoor / poisoning integrity attacks place mislabeled training points in a region of the feature space far from the rest of the training data. The learning algorithm labels such a region as desired, allowing for subsequent intrusions / misclassifications at test time.
[Figure: training data with no poisoning vs. poisoned training data containing a backdoored stop sign labeled as speed limit]

Attacks referred to as backdoors:
• T. Gu, B. Dolan-Gavitt, and S. Garg. BadNets: Identifying vulnerabilities in the machine learning model supply chain. In NIPS Workshop on Machine Learning and Computer Security, 2017.
• X. Chen, C. Liu, B. Li, K. Lu, and D. Song. Targeted backdoor attacks on deep learning systems using data poisoning. ArXiv e-prints, 2017.

Attacks referred to as 'poisoning integrity':
• M. Barreno, B. Nelson, R. Sears, A. D. Joseph, and J. D. Tygar. Can machine learning be secure? In Proc. ACM Symp. Information, Computer and Comm. Sec., ASIACCS '06, pages 16-25, New York, NY, USA, 2006. ACM.
• M. Barreno, B. Nelson, A. Joseph, and J. Tygar. The security of machine learning. Machine Learning, 81:121-148, 2010.
• B. Biggio, B. Nelson, and P. Laskov. Poisoning attacks against support vector machines. In 29th Int'l Conf. on Machine Learning, pages 1807-1814. Omnipress, 2012.
• B. Biggio, G. Fumera, and F. Roli. Security evaluation of pattern classifiers under attack. IEEE Transactions on Knowledge and Data Engineering, 26(4):984-996, April 2014.
• H. Xiao, B. Biggio, G. Brown, G. Fumera, C. Eckert, and F. Roli. Is feature selection secure against training data poisoning? In JMLR W&CP - Proc. 32nd Int'l Conf. Mach. Learning (ICML), volume 37, pages 1689-1698, 2015.
• L. Munoz-Gonzalez, B. Biggio, A. Demontis, A. Paudice, V. Wongrassamee, E. C. Lupu, and F. Roli. Towards poisoning of deep learning algorithms with back-gradient optimization. In 10th ACM Workshop on Artificial Intelligence and Security, AISec '17, pages 27-38, 2017. ACM.
• B. Biggio and F. Roli. Wild patterns: Ten years after the rise of adversarial machine learning. ArXiv e-prints, 2018.
• M. Jagielski, A. Oprea, B. Biggio, C. Liu, C. Nita-Rotaru, and B. Li. Manipulating machine learning: Poisoning attacks and countermeasures for regression learning. In 39th IEEE Symp. on Security and Privacy, 2018.
117. Pentesting of Machine Learning
Battista Biggio
https://www.pluribus-one.it/research/sec-ml/wild-patterns/
Pattern Recognition and Applications Lab, University of Cagliari, Italy
ITASEC 2019, Italian Conference on Cybersecurity, February 14, Pisa, Italy
118. secml: An open source Python library for ML Security
• ml: ML algorithms via sklearn; DL algorithms and optimizers via PyTorch and TensorFlow
• adv: attacks (evasion, poisoning, …) with custom/faster solvers; defenses (advx rejection, adversarial training, …)
• expl: explanation methods based on influential features; explanation methods based on influential prototypes
• others: parallel computation; support for dense/sparse data; advanced plotting functions (via matplotlib); modular and easy to extend
First release scheduled for April 2019!
(Marco Melis, Ambra Demontis)
119. ML Security Evaluation
secml takes as input a machine-learning model, data, and an attack model with its parameters; the security evaluation (attack simulation) outputs a security risk level (numeric value).
• e.g., evasion attacks: modify samples of class +1 with an l1 perturbation, for eps in {0, 5, 10, 50}
• e.g., poisoning attacks: inject samples of classes +1/-1 into the training set, with the percentage of poisoning points in {0, 2, 5, 10, 20}