Multiple classifier systems under attack
Battista Biggio, Giorgio Fumera, Fabio Roli
Dept. of Electrical and Electronic Eng., Univ. of Cagliari
http://prag.diee.unica.it

9th International Workshop on Multiple Classifier Systems (MCS 2010)
Outline
●   Adversarial classification
●   MCSs in adversarial classification tasks
●   Some experimental results
Adversarial classification
Two pattern classes: legitimate, malicious

Examples:
●   biometric verification and recognition
●   intrusion detection in computer networks
●   spam filtering
●   network traffic identification
●   ...

[Figure: biometric verification — a genuine user claims "I am John Smith" and is matched against the template database (J. Smith, B. Brown); an impostor claims "I am Bob Brown".]

[Figure: spam filtering — a legitimate e-mail:
  Subject: MCS2010 Suggested tours
  Dear MCS 2010 Participant, attached please find the offers we negotiated with the travel agency ...
vs. a spam e-mail:
  Subject: Need affordable Drugs??
  Order from Canadian Pharmacy & Save You Money. We are having Specials Hot Promotion this week! ...]
Adversarial classification
Attack: fingerprint spoofing
[Figure: the impostor B. Brown claims "I am Bob Brown" and submits a spoofed fingerprint, which is matched against the template database (J. Smith, B. Brown).]

Attack: bad word obfuscation, good word insertion
Original spam:
  Subject: Need affordable Drugs??
  Order from Canadian Pharmacy & Save You Money
  We are having Specials Hot Promotion this week! ...
Attacked spam (bad words obfuscated, good words inserted):
  Subject: Need affordab1e D r u g s??
  Order from (anadian Ph@rmacy & S@ve You Money
  We are having Specials H0t Promotion this week! ...
  "Don't you guys ever read a paper? Moyer's a
  gentleman now. He knows t
  "Well I'm sure I can't help what you think,"
  she said tartly. "After a
Adversarial classification
Main issues:
●   vulnerabilities of pattern recognition systems
●   performance evaluation under attack
●   design of pattern recognition systems robust to attacks




Multiple classifier systems in adversarial environments
[Figure: a multimodal biometric system — the impostor claims "I am Bob Brown"; fingerprint and face matchers score the claim against the template database (J. Smith, B. Brown), and a fusion rule outputs Accepted/Rejected.]

Multimodal biometric systems: more accurate than unimodal ones
Multiple classifier systems in adversarial environments
[Figure: the same multimodal biometric system as above.]

Multimodal biometric systems: more accurate than unimodal ones
And also more robust to attacks (?)

Analogous claims in other applications (spam filtering, network intrusion detection, etc.)
Aim of our work
Main issues in adversarial classification:
●   vulnerabilities of pattern recognition systems
●   performance evaluation under attack
●   design of pattern recognition systems robust to attacks

Our goal: to investigate whether and how MCSs can improve the robustness of PR systems under attack
Linear classifiers under attack
The adversary exploits some knowledge of
●   the features
●   the classifier's decision function

An example: spam filtering with linear classifiers
f(x) = sign { ω1x1 + ω2x2 + ... + ωNxN + ω0 }
xi ∈ {0,1}; f(x) = +1: spam; f(x) = -1: legitimate

Original spam: "Buy viagra!"
  x  = [ 1 0 1 0 0 0 0 0 …]
Attacked spam: "Buy vi4gr4! Did you ever play that game when you were a kid where the little plastic hippo tries to gobble up all your marbles?"
  x’ = [ 1 0 0 0 1 0 0 1 …]
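To make the feature change concrete, here is a minimal Python sketch of binary bag-of-words featurization under this attack; the vocabulary and the tokenizer are toy choices of ours, not the ones used in the paper:

```python
# Toy bag-of-words featurization: obfuscating "viagra" clears its feature,
# while inserted "good" words switch on other features.
# The vocabulary below is invented for illustration.
vocabulary = ["buy", "cheap", "viagra", "pill", "game", "hippo", "kid", "marbles"]

def featurize(text, vocabulary):
    """Binary word-occurrence features: x_i = 1 iff the i-th word occurs."""
    words = set(text.lower().replace("!", " ").replace("?", " ").split())
    return [1 if w in words else 0 for w in vocabulary]

x = featurize("Buy viagra!", vocabulary)
x_attacked = featurize(
    "Buy vi4gr4! Did you ever play that game when you were a kid where "
    "the little plastic hippo tries to gobble up all your marbles?",
    vocabulary)
print(x)           # [1, 0, 1, 0, 0, 0, 0, 0]
print(x_attacked)  # [1, 0, 0, 0, 1, 1, 1, 1] -- "viagra" off, good words on
```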
Linear classifiers under attack
The adversary exploits some knowledge of
●   the features
●   the classifier's decision function

f(x) = sign { ω1x1 + ω2x2 + ... + ωNxN + ω0 }
Weights: ωbuy = 0.5, ωviagra = 2.0, ωkid = -0.5, ωgame = -2.0, ω0 = -0.9

Buy viagra!        0.5 + 2.0 - 0.9 = 1.6 > 0: spam
Buy vi4gr4!        0.5 - 0.9 = -0.4 < 0: legitimate
Buy viagra! game   0.5 + 2.0 - 2.0 - 0.9 = -0.4 < 0: legitimate
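The slide's arithmetic, reproduced as a runnable sketch (the weights are the toy values above; words missing from the dictionary count as zero):

```python
# Toy linear spam filter with the slide's weights.
weights = {"buy": 0.5, "viagra": 2.0, "kid": -0.5, "game": -2.0}
bias = -0.9  # ω0

def score(words):
    """Linear discriminant: sum of the weights of the present words plus ω0."""
    return sum(weights.get(w, 0.0) for w in words) + bias

for msg in (["buy", "viagra"],           # 0.5 + 2.0 - 0.9 =  1.6 -> spam
            ["buy", "vi4gr4"],           # 0.5 - 0.9       = -0.4 -> legitimate
            ["buy", "viagra", "game"]):  # 0.5 + 2.0 - 2.0 - 0.9 = -0.4 -> legitimate
    s = score(msg)
    print(" ".join(msg), "->", s, "spam" if s > 0 else "legitimate")
```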
Linear classifiers under attack
A possible strategy to improve the robustness of linear classifiers: keep the weights as uniform as possible (Kolcz and Teo, 6th Conf. on Email and Anti-Spam, CEAS 2009)

f(x) = sign { ω1x1 + ω2x2 + ... + ωNxN + ω0 }
More uniform weights: ωbuy = 1.0, ωviagra = 1.5, ωkid = -1.0, ωgame = -1.5, ω0 = -0.9

Buy viagra!            1.0 + 1.5 - 0.9 = 1.6 > 0: spam
Buy vi4gr4!            1.0 - 0.9 = 0.1 > 0: spam
Buy viagra! game       1.0 + 1.5 - 1.5 - 0.9 = 0.1 > 0: spam
Buy viagra! kid game   1.0 + 1.5 - 1.0 - 1.5 - 0.9 = -0.9 < 0: legitimate
Ensembles of linear classifiers under attack
Do randomisation-based MCS techniques result in more uniform weights of the linear base classifiers? (See the sketch below.)
●   bagging
●   random subspace method (RSM)
●   ...
(accuracy-robustness trade-off)
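One way to probe this question: the average of linear base classifiers is itself a linear classifier, so the ensemble's effective weight vector can be inspected directly. A minimal sketch on synthetic data, with scikit-learn's LinearSVC assumed as a stand-in for the base learners (bagging is analogous, resampling examples instead of features):

```python
import numpy as np
from sklearn.svm import LinearSVC  # assumed base learner; any linear model works

rng = np.random.default_rng(0)
X = (rng.random((500, 50)) < 0.3).astype(float)   # synthetic binary features
w_true = rng.normal(size=50)
y = (X @ w_true + rng.normal(scale=0.5, size=500) > 0).astype(int)

def rsm_weights(X, y, n_clf=10, subset=0.5):
    """Random subspace method: train each base classifier on a random feature
    subset, embed its weights at the right positions, and average them.
    The averaged ensemble is again a linear discriminant."""
    N = X.shape[1]
    k = int(subset * N)
    W = np.zeros((n_clf, N))
    for i in range(n_clf):
        idx = rng.choice(N, size=k, replace=False)
        clf = LinearSVC(max_iter=10000).fit(X[:, idx], y)
        W[i, idx] = clf.coef_.ravel()
    return W.mean(axis=0)

w_single = LinearSVC(max_iter=10000).fit(X, y).coef_.ravel()
w_rsm = rsm_weights(X, y)
# Crude evenness check: fraction of total |ω| mass carried by the top 5 weights.
for name, w in (("single", w_single), ("RSM avg", w_rsm)):
    a = np.sort(np.abs(w))[::-1]
    print(name, round(a[:5].sum() / a.sum(), 3))
```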
Experimental setting (1)
●   Spam filtering task
●   TREC 2007 data set (20,000 out of > 75,000 e-mails, 2/3 spam)
●   Features: bag of words (binary word occurrence), > 360,000 features
●   Base linear classifiers: SVM, logistic regression
●   MCS
    ●   ensemble size: 3, 5, 10
    ●   bagging: 20%, 100% training samples
    ●   RSM: 20%, 50%, 80% feature subset sizes
●   5 runs
●   Evaluation of performance under attack: worst-case BWO/GWI attack, for m obfuscated/added words (m = “attack strength”); see the sketch below
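The worst-case BWO/GWI attack of strength m can be read as a greedy feature flip: knowing the weights, the adversary obfuscates present words with the largest positive weights and inserts absent words with the most negative weights, choosing at each step the flip that lowers the score most. A sketch of this reading, not the authors' exact implementation:

```python
import numpy as np

def worst_case_attack(x, w, m):
    """Flip at most m binary features of a spam vector x so as to minimise
    w.x: clear a present feature with w_i > 0 (bad word obfuscation) or set
    an absent feature with w_i < 0 (good word insertion)."""
    x = x.copy()
    # Score decrease obtained by flipping feature i.
    gain = np.where(x == 1, w, -w)
    gain[gain <= 0] = 0.0            # flips that would not lower the score
    for i in np.argsort(gain)[::-1][:m]:
        if gain[i] <= 0:
            break                    # no useful flips left
        x[i] = 1 - x[i]
    return x

w = np.array([0.5, 2.0, -0.5, -2.0])    # toy weights from the slides
x = np.array([1, 1, 0, 0])              # "buy viagra"
for m in range(3):
    xa = worst_case_attack(x, w, m)
    print(m, xa, float(xa @ w - 0.9))   # score including ω0 = -0.9
```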
Performance measure
[Figure: Receiver Operating Characteristic (ROC) curve, TP vs. FP; AUC10% is the area under the curve for FP ∈ [0, 0.1].]

TP = Prob [f(X) = Malicious | Y = Malicious]
FP = Prob [f(X) = Malicious | Y = Legitimate]
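AUC10% can be computed as the area under the ROC curve restricted to FP ≤ 0.1. A sketch with scikit-learn's roc_curve; the normalisation by 0.1 (so that a perfect detector scores 1) is our assumption, since the slide does not state it:

```python
import numpy as np
from sklearn.metrics import auc, roc_curve

def auc_10pct(y_true, scores, fp_max=0.1):
    """Partial AUC over FP in [0, fp_max], normalised by fp_max (assumption)."""
    fp, tp, _ = roc_curve(y_true, scores)   # fp is non-decreasing
    keep = fp <= fp_max
    fp_part = np.append(fp[keep], fp_max)
    tp_part = np.append(tp[keep], np.interp(fp_max, fp, tp))
    return auc(fp_part, tp_part) / fp_max

# Toy usage: 1 = malicious, higher score = more malicious.
y = np.array([0, 0, 1, 1, 0, 1])
s = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])
print(auc_10pct(y, s))
```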
Measure of weights uniformity
(Kolcz and Teo, 6th Conf. on Email and Anti-Spam, CEAS 2009)

F(K) = (sum of the absolute values of the K largest weights) / (sum of the absolute values of all weights), K = 1, ..., N

[Figure: F(K) vs. K — F(K) rises steeply towards 1 for the least uniform weights, and follows the diagonal F(K) = K/N for the most uniform ones.]
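As a runnable counterpart, a minimal numpy sketch of F(K) over all K:

```python
import numpy as np

def weight_evenness(w):
    """F(K) = (sum of the K largest |ω_i|) / (sum of all |ω_i|), K = 1..N.
    F rises steeply for peaked weight vectors; F(K) = K/N when uniform."""
    a = np.sort(np.abs(w))[::-1]
    return np.cumsum(a) / a.sum()

print(weight_evenness(np.array([0.5, 2.0, -0.5, -2.0])))  # peaked:  [0.4 0.8 0.9 1.]
print(weight_evenness(np.ones(4)))                        # uniform: [0.25 0.5 0.75 1.]
```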
Results (1)
[Plot: performance under attack vs. number of obfuscated/added words.]
Experimental setting (2)
●   SpamAssassin
●   About N = 900 Boolean “tests” x1, x2, ..., xN, xi ∈ {0,1}
●   Decision function:
    f(x) = sign { ω1x1 + ω2x2 + ... + ωNxN + ω0 },
    f(x) = +1: spam; f(x) = -1: legitimate
●   Default weights: machine learning + manual tuning
●   Evaluation of performance under attack: evasion of the worst m tests (m = “attack strength”); see the sketch below
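One plausible reading of this evasion, sketched below with invented test weights: the spammer rewrites the message so that the m triggered tests with the largest positive weights no longer fire (the bias ω0 is omitted from the printed score for brevity):

```python
import numpy as np

def evade_worst_tests(x, w, m):
    """Turn off the m active tests with the largest positive weights,
    simulating a spammer rewriting the message to evade exactly those tests."""
    x = x.copy()
    active = np.where((x == 1) & (w > 0))[0]           # tests that hurt the spammer
    worst = active[np.argsort(w[active])[::-1]][:m]    # most damaging first
    x[worst] = 0
    return x

w = np.array([3.0, 1.2, -0.5, 2.1, 0.4])   # invented SpamAssassin-like weights
x = np.array([1, 1, 1, 1, 0])              # tests fired by a spam message
for m in range(4):
    xa = evade_worst_tests(x, w, m)
    print(m, xa, float(xa @ w))
```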
Results (2)
[Plot: performance under attack vs. number of evaded tests.]
Conclusions
●   Adversarial classification: which roles can MCSs play?

●   This work:
    ●   linear classifiers
    ●   attacks based on some knowledge of the features and the decision function (case study: spam filtering)

●   Future work: investigating MCSs on different applications, base classifiers, kinds of attacks, ...
