In recent years, detecting malware with traditional methods such as pattern matching has become more difficult as malware has evolved. Machine learning-based detection has therefore been introduced, and various studies report that it achieves a higher detection rate than traditional methods. However, it is well known that detection accuracy degrades significantly on data that differ from the training dataset. This study presents a method that improves detection accuracy by filtering the evaluation dataset based on its similarity to the training dataset.
Improving accuracy of malware detection by filtering evaluation dataset based on its similarity
1. FFRI, Inc.
Improving accuracy of malware detection by filtering evaluation dataset based on its similarity
Junichi Murakami
Director of Advanced Development
FFRI, Inc. (Fourteenforty Research Institute, Inc.)
http://www.ffri.jp
2. FFRI, Inc.
Preface
• These slides were used for a presentation at CSS2013
– http://www.iwsec.org/css/2013/english/index.html
• Please refer to the original paper for detailed data
– http://www.ffri.jp/assets/files/research/research_papers/MWS2013_paper.pdf
(Written in Japanese, but the figures are common)
• Contact information
– research-feedback@ffri.jp
– @FFRI_Research (Twitter)
4. FFRI, Inc.
Background – malware and its detection
• Increasing malware
– Malware generators
– Obfuscators
• Targeted attacks (unknown malware)
• Limitation of signature matching → other methods
– Heuristics
– Big data
– Machine learning
– Cloud reputation
5. FFRI, Inc.
Background – Related works
• Mainly focusing on a combination of the factors below
– Feature selection and modification, parameter settings
• Some good results are reported (TPR: 90%+, FPR: under 1%)

Features               Algorithms         Evaluation
Static information     SVM                TPR/FPR, etc.
Dynamic information    Naive Bayes        Accuracy, Precision
Hybrid                 Perceptron, etc.   ROC curve, etc.
6. FFRI, Inc.
Problem
• General theory of machine learning:
– Accuracy of classification declines if the trends of training and testing data differ
• Does the same hold for malware and benign files?
7. FFRI, Inc.
Scope and purpose
① Investigate differences in similarity between malware and benign files (Experiment-1)
② Investigate how such differences affect classification accuracy (Experiment-2)
③ Based on the results above, confirm the effect of removing data whose similarity to the training data is low (Experiment-3)
8. FFRI, Inc.
Experiment-1(1/3)
• Used FFRI Dataset 2013 and benign files we collected as datasets
• Calculated the similarity between each pair of malware samples and of benign files (Jubatus, MinHash)
• Feature vector: counts of 4-grams of sequential API calls
– ex: NtCreateFile_NtWriteFile_NtWriteFile_NtClose: n times
NtSetInformationFile_NtClose_NtClose_NtOpenMutex: m times
[Figure: upper-triangular pairwise similarity matrices (samples A, B, C, …) for malware and for benign files; cells hold pairwise similarities such as 0.52, 0.8, 1.0]
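The feature extraction and similarity computation above can be sketched in plain Python. This is not the authors' Jubatus pipeline, just a minimal MinHash approximation of Jaccard similarity over API-call 4-gram sets; the function names and the two sample API traces are hypothetical.

```python
import hashlib

def four_grams(api_calls):
    """Set of 4-grams of sequential API calls, joined with '_'."""
    return {"_".join(api_calls[i:i + 4]) for i in range(len(api_calls) - 3)}

def minhash_signature(items, num_hashes=128):
    """MinHash signature: for each of num_hashes seeded hash functions,
    keep the minimum hash value over the item set."""
    sig = []
    for seed in range(num_hashes):
        sig.append(min(
            int(hashlib.md5(f"{seed}:{it}".encode()).hexdigest(), 16)
            for it in items))
    return sig

def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching signature slots estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

# Hypothetical API traces: they share one 4-gram and differ in another.
trace_a = ["NtCreateFile", "NtWriteFile", "NtWriteFile", "NtClose", "NtOpenMutex"]
trace_b = ["NtCreateFile", "NtWriteFile", "NtWriteFile", "NtClose", "NtTerminateProcess"]
sim = estimated_jaccard(
    minhash_signature(four_grams(trace_a)),
    minhash_signature(four_grams(trace_b)))
print(round(sim, 2))
```

The slides count 4-gram occurrences ("n times"); a set-based MinHash as above discards those counts, which is one common simplification when the goal is pairwise similarity rather than classification.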
10. FFRI, Inc.
Experiment-1(3/3)
• It is more difficult to find similar benign files than similar malware
[Figure: stacked bars of the fraction of "unique" samples (no similar peer) vs. "not unique" samples (at least one similar peer), for malware and benign files, at similarity thresholds 0.8, 0.85, 0.9, 0.95, and 1; benign files are unique far more often]
11. FFRI, Inc.
Experiment-2(1/3)
• How much does the difference affect results?
• 50% of the malware/benign files are assigned to the training dataset, the rest to the testing dataset (Jubatus, AROW)
[Figure: Jubatus is trained on the malware/benign training data, then classifies the testing data; TPR (True Positive Rate) and FPR (False Positive Rate) are measured]
12. FFRI, Inc.
Experiment-2(2/3)
• How much does the difference affect results?
• 50% of the malware/benign files are assigned to the training dataset, the rest to the testing dataset (Jubatus, AROW)
[Figure: same setup as the previous slide, showing the classification step that yields TPR and FPR]
13. FFRI, Inc.
Experiment-2(3/3)
• The accuracy declines if the trends of training and testing data differ
[Figure: TPR 97.996% (not unique) vs. 81.297% (unique), a drop of 16.699 points; FPR 0.624% (not unique) vs. 4.490% (unique), a rise of 3.866 points]
14. FFRI, Inc.
Experiment-3(1/6) – After a training
[Figure: scatter plot of malware(train), benign(train), malware(test), and benign(test) samples with the learned dividing line]
15. FFRI, Inc.
Experiment-3(2/6) – After a classification
[Figure: the same scatter plot after the test data are classified against the dividing line]
16. FFRI, Inc.
Experiment-3(2/6) – After a classification
[Figure: the same plot highlighting misclassified test samples near the dividing line: FP (benign classified as malware) and FN (malware classified as benign)]
17. FFRI, Inc.
Experiment-3(3/6) – Low similarity data
[Figure: test samples with low similarity to the training data lie far from the training clusters; several are FN, and one is a TP only accidentally]
18. FFRI, Inc.
Experiment-3(4/6) – Effect on TPR
[Figure: the number of classified data (TP and FN counts, left axis, 0–1400) and TPR (right axis, 0.88–1.00) versus the similarity threshold (0.6–1.0)]
19. FFRI, Inc.
Experiment-3(5/6) – Effect on FPR
[Figure: the number of classified data (TN and FP counts, left axis, 0–2500) and FPR (right axis, 0.000–0.014) versus the similarity threshold (0.6–1.0)]
20. FFRI, Inc.
Experiment-3(6/6)
• Transition of the number of classified data
[Figure: the number of classified data divided by the total testing data (0%–120%), for malware and benign files, versus the similarity threshold (0.6–1.0)]
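The filtering step evaluated in Experiment-3 can be sketched as follows: a test sample is passed to the classifier only if its maximum similarity to any training sample reaches the threshold; otherwise it is left unclassified. This sketch uses exact Jaccard similarity over feature sets (the original uses MinHash via Jubatus), and the tiny train/test sets are hypothetical.

```python
def jaccard(a, b):
    """Exact Jaccard similarity between two feature sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def filter_by_similarity(test_sets, train_sets, threshold):
    """Split test samples into (classifiable, filtered_out) by their
    maximum similarity to the training data."""
    keep, dropped = [], []
    for t in test_sets:
        if max(jaccard(t, tr) for tr in train_sets) >= threshold:
            keep.append(t)
        else:
            dropped.append(t)
    return keep, dropped

train = [{"a", "b", "c", "d"}, {"e", "f", "g"}]
test = [{"a", "b", "c"},   # similar to the first training sample
        {"x", "y", "z"}]   # "unique": no similar training data
keep, dropped = filter_by_similarity(test, train, threshold=0.6)
print(len(keep), len(dropped))  # → 1 1
```

Raising the threshold shrinks the classified set, which is the trade-off slides 18–20 plot: fewer samples are classified, and unique samples (disproportionately benign, per Experiment-1) are the ones left unclassified.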
21. FFRI, Inc.
Consideration(1/3)
• In a real scenario:
– we try to classify whether an unknown file/process is benign or not
• If we apply the Experiment-3 approach:
– Files are classified only if similar data have already been trained on
– If not, files are not classified, which results in
• FN if the file is malware
• TN if the file is benign (all right as a result)
• Therefore the problem is “TPR for unique malware”
(Unique malware is likely to be undetectable)
22. FFRI, Inc.
Consideration(2/3)
• If malware continues to have many variants, as it does currently
– ML-based detection works well
• Having many variants correlates with the use of malware generators/obfuscators
• We have to investigate
– Trends in the usage of the tools above
– The possibility of anti-machine-learning evasion
23. FFRI, Inc.
Consideration(3/3)
• How to deal with unclassified (filtered-out) data:
1. Using other feature vectors
2. Enlarging the training dataset (unique → not unique)
3. Using other methods besides ML
24. FFRI, Inc.
Conclusion
• The similarity distributions of malware and benign files differ (Experiment-1)
• Accuracy declines if the trends of training and testing data differ (Experiment-2)
• TPR for unique malware declines when we remove low-similarity data (Experiment-3)
• Continual investigation of trends in malware and related tools is required
• (It might be necessary to develop technology to determine benign files)