In recent years, detecting malware with traditional methods such as pattern matching has become more difficult as malware has evolved. Machine learning-based detection has therefore been introduced, and various studies report that it achieves a higher detection rate than traditional methods. However, it is well known that detection accuracy degrades significantly on data that differ from the training dataset. This study presents a method that improves detection accuracy by filtering the evaluation dataset based on its similarity to the training dataset.
Improving accuracy of malware detection by filtering evaluation dataset based on its similarity
1. FFRI, Inc.
Improving accuracy of malware detection by filtering evaluation dataset based on its similarity
Junichi Murakami
Director of Advanced Development
FFRI, Inc. (Fourteenforty Research Institute, Inc.)
http://www.ffri.jp
2. FFRI, Inc.
Preface
• These slides were used for a presentation at CSS2013
– http://www.iwsec.org/css/2013/english/index.html
• Please refer to the original paper for detailed data
– http://www.ffri.jp/assets/files/research/research_papers/MWS2013_paper.pdf
(Written in Japanese, but the figures are common)
• Contact information
– research-feedback@ffri.jp
– @FFRI_Research (Twitter)
4. FFRI, Inc.
Background – malware and its detection
• Increasing malware
– Malware generators
– Obfuscators
• Targeted attacks (unknown malware)
• Limitation of signature matching → other methods
– Heuristics
– Big data
– Machine learning
– Cloud reputation
5. FFRI, Inc.
Background – Related works
• Mainly focusing on a combination of the factors below
– Feature selection and modification, parameter settings
• Some good results are reported (TPR: 90%+, FPR: under 1%)

Features               Algorithms         Evaluation
Static information     SVM                TPR/FPR, etc.
Dynamic information    Naive Bayes        Accuracy, Precision
Hybrid                 Perceptron, etc.   ROC curve, etc.
6. FFRI, Inc.
Problem
• General theory of machine learning:
– Accuracy of classification declines if the trends of training and testing data differ
• Does the same hold for malware and benign files?
7. FFRI, Inc.
Scope and purpose
① Investigate differences in similarity between malware and benign files (Experiment-1)
② Investigate how such differences affect classification accuracy (Experiment-2)
③ Based on the results above, confirm the effect of removing data whose similarity to the training data is low (Experiment-3)
8. FFRI, Inc.
Experiment-1(1/3)
• Used FFRI Dataset 2013 and benign files we collected as datasets
• Calculated the similarity between each pair of malware samples and of benign files (Jubatus, MinHash)
• Feature vector: counts of 4-grams of sequential API calls
– ex: NtCreateFile_NtWriteFile_NtWriteFile_NtClose: n times
NtSetInformationFile_NtClose_NtClose_NtOpenMutex: m times
[Figure: upper-triangular pairwise similarity matrices (samples A, B, C, …) for malware and for benign files; cells hold pairwise similarities such as 0.52, 0.8, 1.0]
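The feature extraction and similarity computation above can be sketched in plain Python. This is not the authors' Jubatus pipeline, just a minimal MinHash approximation of Jaccard similarity over API-call 4-gram sets; the function names and the two sample API traces are hypothetical.

```python
import hashlib

def four_grams(api_calls):
    """Set of 4-grams of sequential API calls, joined with '_'."""
    return {"_".join(api_calls[i:i + 4]) for i in range(len(api_calls) - 3)}

def minhash_signature(items, num_hashes=128):
    """MinHash signature: for each of num_hashes seeded hash functions,
    keep the minimum hash value over the item set."""
    sig = []
    for seed in range(num_hashes):
        sig.append(min(
            int(hashlib.md5(f"{seed}:{it}".encode()).hexdigest(), 16)
            for it in items))
    return sig

def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching signature slots estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

# Hypothetical API traces: they share one 4-gram and differ in another.
trace_a = ["NtCreateFile", "NtWriteFile", "NtWriteFile", "NtClose", "NtOpenMutex"]
trace_b = ["NtCreateFile", "NtWriteFile", "NtWriteFile", "NtClose", "NtTerminateProcess"]
sim = estimated_jaccard(
    minhash_signature(four_grams(trace_a)),
    minhash_signature(four_grams(trace_b)))
print(round(sim, 2))
```

The slides count 4-gram occurrences ("n times"); a set-based MinHash as above discards those counts, which is one common simplification when the goal is pairwise similarity rather than classification.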
10. FFRI, Inc.
Experiment-1(3/3)
• It is more difficult to find similar benign files than similar malware
[Figure: stacked bars of the fraction of "unique" samples (no similar peer) vs. "not unique" samples (at least one similar peer), for malware and benign files, at similarity thresholds 0.8, 0.85, 0.9, 0.95, and 1; benign files are unique far more often]
11. FFRI, Inc.
Experiment-2(1/3)
• How much does the difference affect results?
• 50% of the malware/benign files are assigned to the training dataset, the rest to the testing dataset (Jubatus, AROW)
[Figure: Jubatus is trained on the malware/benign training data, then classifies the testing data; TPR (True Positive Rate) and FPR (False Positive Rate) are measured]
12. FFRI, Inc.
Experiment-2(2/3)
• How much does the difference affect results?
• 50% of the malware/benign files are assigned to the training dataset, the rest to the testing dataset (Jubatus, AROW)
[Figure: same setup as the previous slide, showing the classification step that yields TPR and FPR]
13. FFRI, Inc.
Experiment-2(3/3)
• The accuracy declines if the trends of training and testing data differ
[Figure: TPR 97.996% (not unique) vs. 81.297% (unique), a drop of 16.699 points; FPR 0.624% (not unique) vs. 4.490% (unique), a rise of 3.866 points]
14. FFRI, Inc.
Experiment-3(1/6) – After a training
[Figure: scatter plot of malware(train), benign(train), malware(test), and benign(test) samples with the learned dividing line]
15. FFRI, Inc.
Experiment-3(2/6) – After a classification
[Figure: the same scatter plot after the test data are classified against the dividing line]
16. FFRI, Inc.
Experiment-3(2/6) – After a classification
[Figure: the same plot highlighting misclassified test samples near the dividing line: FP (benign classified as malware) and FN (malware classified as benign)]
17. FFRI, Inc.
Experiment-3(3/6) – Low similarity data
[Figure: test samples with low similarity to the training data lie far from the training clusters; several are FN, and one is a TP only accidentally]
18. FFRI, Inc.
Experiment-3(4/6) – Effect on TPR
[Figure: the number of classified data (TP and FN counts, left axis, 0–1400) and TPR (right axis, 0.88–1.00) versus the similarity threshold (0.6–1.0)]
19. FFRI, Inc.
Experiment-3(5/6) – Effect on FPR
[Figure: the number of classified data (TN and FP counts, left axis, 0–2500) and FPR (right axis, 0.000–0.014) versus the similarity threshold (0.6–1.0)]
20. FFRI, Inc.
Experiment-3(6/6)
• Transition of the number of classified data
[Figure: the number of classified data divided by the total testing data (0%–120%), for malware and benign files, versus the similarity threshold (0.6–1.0)]
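The filtering step evaluated in Experiment-3 can be sketched as follows: a test sample is passed to the classifier only if its maximum similarity to any training sample reaches the threshold; otherwise it is left unclassified. This sketch uses exact Jaccard similarity over feature sets (the original uses MinHash via Jubatus), and the tiny train/test sets are hypothetical.

```python
def jaccard(a, b):
    """Exact Jaccard similarity between two feature sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def filter_by_similarity(test_sets, train_sets, threshold):
    """Split test samples into (classifiable, filtered_out) by their
    maximum similarity to the training data."""
    keep, dropped = [], []
    for t in test_sets:
        if max(jaccard(t, tr) for tr in train_sets) >= threshold:
            keep.append(t)
        else:
            dropped.append(t)
    return keep, dropped

train = [{"a", "b", "c", "d"}, {"e", "f", "g"}]
test = [{"a", "b", "c"},   # similar to the first training sample
        {"x", "y", "z"}]   # "unique": no similar training data
keep, dropped = filter_by_similarity(test, train, threshold=0.6)
print(len(keep), len(dropped))  # → 1 1
```

Raising the threshold shrinks the classified set, which is the trade-off slides 18–20 plot: fewer samples are classified, and unique samples (disproportionately benign, per Experiment-1) are the ones left unclassified.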
21. FFRI, Inc.
Consideration(1/3)
• In a real scenario:
– we try to classify whether an unknown file/process is benign or not
• If we apply the Experiment-3 approach:
– Files are classified only if similar data have already been trained on
– If not, files are not classified, which results in
• FN if the file is malware
• TN if the file is benign (all right as a result)
• Therefore the problem is “TPR for unique malware”
(Unique malware is likely to be undetectable)
22. FFRI, Inc.
Consideration(2/3)
• If malware continues to have many variants, as it does currently
– ML-based detection works well
• Having many variants correlates with the use of malware generators/obfuscators
• We have to investigate
– Trends in the usage of the tools above
– The possibility of anti-machine-learning evasion
23. FFRI, Inc.
Consideration(3/3)
• How to deal with unclassified (filtered-out) data:
1. Using other feature vectors
2. Enlarging the training dataset (unique → not unique)
3. Using other methods besides ML
24. FFRI, Inc.
Conclusion
• The similarity distributions of malware and benign files differ (Experiment-1)
• Accuracy declines if the trends of training and testing data differ (Experiment-2)
• TPR for unique malware declines when we remove low-similarity data (Experiment-3)
• Continual investigation of trends in malware and related tools is required
• (It might be necessary to develop technology to determine benign files)