SlideShare uma empresa Scribd logo
1 de 24
Baixar para ler offline
FFRI, Inc.

Improving accuracy of malware detection by
Fourteenforty Research Institute, Inc.
filtering evaluation dataset based on its similarity
Junichi Murakami
Director of Advanced Development
FFRI, Inc.
http://www.ffri.jp
FFRI, Inc.

Preface
• This slides was used for a presentation at CSS2013
– http://www.iwsec.org/css/2013/english/index.html
• Please refer the original paper for the detail data
– http://www.ffri.jp/assets/files/research/research_papers/M
WS2013_paper.pdf
(Written in Japanese but the figures are common)
• Contact information
– research-feedback@ffri.jp
– @FFRI_Research (twitter)

2
FFRI, Inc.

Agenda
•
•
•
•
•
•
•
•

Background
Problem
Scope and purpose
Experiment 1
Experiment 2
Experiment 3
Consideration
Conclusion

3
FFRI, Inc.

Background – malware and its detection
Malware generators

Obfuscators

Increasing
malware
Targeted Attack
(Unknown malware)

Limitation of
signature matching

other methods
Heuristic

Bigdata

Machine learning
Could reputation

4
FFRI, Inc.

Background – Related works
• Mainly focusing on a combination of the factors below
– Features selection and modification, parameter settings
• Some good results are reported (TRP:90%+, FRP:1%-)

Features

Algorithms

Evaluation

Static information

SVM

TPR/FRP, etc.

Dynamic information

Naive bayes

Accuracy, Precision

Hybrid

Perceptron, etc.

ROC-curve, etc.

5
FFRI, Inc.

Problem
• General theory of machine learning:
– Accuracy of classification declines
if trends of training and testing data are different
• How about malware and benign files

?

?

6
FFRI, Inc.

Scope and purpose
① Investigating differences between similarities of
malware and benign files(Experiment-1)
② Investigating an effect for accuracy of classification
by the difference(Experiment-2)
③ Based on the result above, confirming an effect of
removing data whose similarity with a training data
is low (Experiment-3)

7
FFRI, Inc.

Experiment-1(1/3)
• Used FFRI Dataset 2013 and benign files we collected as datasets
• Calculated the similarity of each malware and benign files
(Jubatus, MinHash)
• Feature vector: A number of 4-gram of sequential API calls
– ex: NtCreateFile_NtWriteFile_NtWriteFile_NtClose: n times
NtSetInformationFile_NtClose_NtClose_NtOpenMutext: m times
malware
benign

A
A

A
B
C
...

B

C

...

B

C

...

0.52 ...

A

ー

0.8

B

ー

ー

1.0

...

C

ー

ー

ー

...

...

ー

ー

ー

ー
8
FFRI, Inc.

Experiment-1(2/3)
Grouping malware and benign files based on their similarities

Threshold of similarity (0.0 - 1.0)
benign

malware

9
FFRI, Inc.

Experiment-1(3/3)
It is more difficult to find similar benign files compared to malware
100%
80%

60%
40%
unique
仲間無
not
仲間有

20%

0.8

0.85

0.9

0.95

malware
マルウェア

benign
正常系

malware
マルウェア

benign
正常系

malware
マルウェア

benign
正常系

malware
マルウェア

benign
正常系

malware
マルウェア

benign
正常系

0%

unique

1

Threshold of similarity
10
FFRI, Inc.

Experiment-2(1/3)
• How much does the difference affect a result?
• 50% of malware/benign are assigned to a training, the others are to a
testing dataset(Jubatus, AROW)

train

classify

jubatus

jubatus

TPR: ?
FPR: ?

malware

TPR: True Positive Rate
FPR: False Positive Rate

testing

train

benign

11
FFRI, Inc.

Experiment-2(2/3)
• How much does the difference affect a result?
• 50% of malware/benign are assigned to a training, the others are to a
testing dataset(Jubatus, AROW)

train

jubatus
malware
benign

testing

train

classify

jubatus

TPR: ?
FPR: ?

12
FFRI, Inc.

Experiment-2(3/3)
The accuracy declines if trends of training and testing data
are different

■TPR

■FPR
0.624(not unique)
+3.866

97.996(not unique)
81.297(unique)

4.49(unique)

-16.699
0

50

100
%

0

1

2

3

4

5

%

13
FFRI, Inc.

Experiment-3(1/6) – After a training
dividing line

malware

benign(train)

benign

malware(train)

benign(test)

malware(test)

14
FFRI, Inc.

Experiment-3(2/6) – After a classification
dividing line

benign(train)

malware(train)

benign(test)

malware(test)

15
FFRI, Inc.

Experiment-3(2/6) – After a classification
dividing line

FP

FN
benign(train)

malware(train)

benign(test)

malware(test)

16
FFRI, Inc.

Experiment-3(3/6) – Low similarity data
dividing line

TP(accidentally)

FN
benign(train)

FN

malware(train)

benign(test)

malware(test)

17
FFRI, Inc.

Experiment-3(4/6) – Effect to TPR

The number of classified data

1400

1.00

1200

0.98

1000

0.96

800
0.94
600

0.92

400
200

0.90

0

TP
FN
TPR

0.88
0

0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1
Threshold of similarity
18
FFRI, Inc.

Experiment-3(5/6) – Effect to FPR
The number of classified data

2500

0.014
0.012

2000
0.010
1500

0.008

1000

0.006

TN
FP
FPR

0.004
500
0.002
0

0.000
0

0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1

Threshold of similarity
19
FFRI, Inc.

The number of classified data
/ The number of total testing data

Experiment-3(6/6)
Transition of the number of classified data
malware
マルウェア

benign
正常系ソフトウェア

120%
100%
80%
60%
40%
20%
0%
0

0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95

1

Threshold of similarity
20
FFRI, Inc.

Consideration(1/3)
• In real scenario:
– trying to classify an unknown file/process whether it is
benign files or not
• If we apply Experiment-3:
– Files are classified only if similar data is already trained
– If not, files are not classified which results in
• FN if the files is malware
• TF if the files is benign (All right as a result)
• Therefore it is a problem about “TPR for unique malware”
(Unique malware is likely to be undetectable)

21
FFRI, Inc.

Consideration(2/3)
• If malware have many variants as the current
– ML-based detection works well
• Having many variants ∝ malware generators/obfuscators
• We have to investigate
– Trends of usage of the tools above
– Possibility of anti-machine learning detection

22
FFRI, Inc.

Consideration(3/3)
• How to deal with unclassified (filtered) data

1. Using other feature vectors
2. Enlarging a training dataset (Unique → Not unique)
3. Using other methods besides ML

23
FFRI, Inc.

Conclusion
• Distribution of similarity for malware and benign are difference
(Experiment-1)
•

Accuracy declines if trends of training and testing data are different
(Experiment-2)

• TPR of unique malware declines when we remove low similarity data
(Experiment-3)
• Continual investigation for trends of malware and related tools are required

• (Might be necessary to develop technology to determine benign files)

24

Mais conteúdo relacionado

Semelhante a Improving accuracy of malware detection by filtering evaluation dataset based on its similarity

Mr201306 machine learning for computer security
Mr201306 machine learning for computer securityMr201306 machine learning for computer security
Mr201306 machine learning for computer securityFFRI, Inc.
 
An Example of use the Threat Modeling Tool (FFRI Monthly Research Nov 2016)
An Example of use the Threat Modeling Tool (FFRI Monthly Research Nov 2016)An Example of use the Threat Modeling Tool (FFRI Monthly Research Nov 2016)
An Example of use the Threat Modeling Tool (FFRI Monthly Research Nov 2016)FFRI, Inc.
 
MR201407 An example of antivirus detection rates and similarity of undetected...
MR201407 An example of antivirus detection rates and similarity of undetected...MR201407 An example of antivirus detection rates and similarity of undetected...
MR201407 An example of antivirus detection rates and similarity of undetected...FFRI, Inc.
 
SANS Digital Forensics and Incident Response Poster 2012
SANS Digital Forensics and Incident Response Poster 2012SANS Digital Forensics and Incident Response Poster 2012
SANS Digital Forensics and Incident Response Poster 2012Rian Yulian
 
Zero day malware detection
Zero day malware detectionZero day malware detection
Zero day malware detectionsujeeshkumarj
 
Malware Classification and Analysis
Malware Classification and AnalysisMalware Classification and Analysis
Malware Classification and AnalysisPrashant Chopra
 
Cyber Threat Hunting: Identify and Hunt Down Intruders
Cyber Threat Hunting: Identify and Hunt Down IntrudersCyber Threat Hunting: Identify and Hunt Down Intruders
Cyber Threat Hunting: Identify and Hunt Down IntrudersInfosec
 
Introduction of Threat Analysis Methods(FFRI Monthly Research 2016.9)
Introduction of Threat Analysis Methods(FFRI Monthly Research 2016.9)Introduction of Threat Analysis Methods(FFRI Monthly Research 2016.9)
Introduction of Threat Analysis Methods(FFRI Monthly Research 2016.9)FFRI, Inc.
 
Deep Learning based Threat / Intrusion detection system
Deep Learning based Threat / Intrusion detection systemDeep Learning based Threat / Intrusion detection system
Deep Learning based Threat / Intrusion detection systemAffine Analytics
 
Cyber threat-hunting---part-2-25062021-095909pm
Cyber threat-hunting---part-2-25062021-095909pmCyber threat-hunting---part-2-25062021-095909pm
Cyber threat-hunting---part-2-25062021-095909pmMuhammadJalalShah1
 
The Rising Threat of Fileless Malware
The Rising Threat of Fileless MalwareThe Rising Threat of Fileless Malware
The Rising Threat of Fileless MalwareChelsea Sisson
 
Itis pentest slides hyd
Itis pentest slides  hydItis pentest slides  hyd
Itis pentest slides hydRama krishna
 
STRIDE Variants and Security Requirements-based Threat Analysis (FFRI Monthly...
STRIDE Variants and Security Requirements-based Threat Analysis (FFRI Monthly...STRIDE Variants and Security Requirements-based Threat Analysis (FFRI Monthly...
STRIDE Variants and Security Requirements-based Threat Analysis (FFRI Monthly...FFRI, Inc.
 
Classification with R
Classification with RClassification with R
Classification with RNajima Begum
 
Machine Learning in Malware Detection
Machine Learning in Malware DetectionMachine Learning in Malware Detection
Machine Learning in Malware DetectionKaspersky
 
Understand How Machine Learning Defends Against Zero-Day Threats
Understand How Machine Learning Defends Against Zero-Day ThreatsUnderstand How Machine Learning Defends Against Zero-Day Threats
Understand How Machine Learning Defends Against Zero-Day ThreatsRahul Mohandas
 
Understand How Machine Learning Defends Against Zero-Day Threats
Understand How Machine Learning Defends Against Zero-Day ThreatsUnderstand How Machine Learning Defends Against Zero-Day Threats
Understand How Machine Learning Defends Against Zero-Day ThreatsRahul Mohandas
 
AMP_Security_ Malware Protection Presentatiion
AMP_Security_ Malware Protection PresentatiionAMP_Security_ Malware Protection Presentatiion
AMP_Security_ Malware Protection PresentatiionSohanGole1
 
Cyber Defense Forensic Analyst - Real World Hands-on Examples
Cyber Defense Forensic Analyst - Real World Hands-on ExamplesCyber Defense Forensic Analyst - Real World Hands-on Examples
Cyber Defense Forensic Analyst - Real World Hands-on ExamplesSandeep Kumar Seeram
 

Semelhante a Improving accuracy of malware detection by filtering evaluation dataset based on its similarity (20)

Mr201306 machine learning for computer security
Mr201306 machine learning for computer securityMr201306 machine learning for computer security
Mr201306 machine learning for computer security
 
An Example of use the Threat Modeling Tool (FFRI Monthly Research Nov 2016)
An Example of use the Threat Modeling Tool (FFRI Monthly Research Nov 2016)An Example of use the Threat Modeling Tool (FFRI Monthly Research Nov 2016)
An Example of use the Threat Modeling Tool (FFRI Monthly Research Nov 2016)
 
MR201407 An example of antivirus detection rates and similarity of undetected...
MR201407 An example of antivirus detection rates and similarity of undetected...MR201407 An example of antivirus detection rates and similarity of undetected...
MR201407 An example of antivirus detection rates and similarity of undetected...
 
SANS Digital Forensics and Incident Response Poster 2012
SANS Digital Forensics and Incident Response Poster 2012SANS Digital Forensics and Incident Response Poster 2012
SANS Digital Forensics and Incident Response Poster 2012
 
Zero day malware detection
Zero day malware detectionZero day malware detection
Zero day malware detection
 
Malware Classification and Analysis
Malware Classification and AnalysisMalware Classification and Analysis
Malware Classification and Analysis
 
Cyber Threat Hunting: Identify and Hunt Down Intruders
Cyber Threat Hunting: Identify and Hunt Down IntrudersCyber Threat Hunting: Identify and Hunt Down Intruders
Cyber Threat Hunting: Identify and Hunt Down Intruders
 
Introduction of Threat Analysis Methods(FFRI Monthly Research 2016.9)
Introduction of Threat Analysis Methods(FFRI Monthly Research 2016.9)Introduction of Threat Analysis Methods(FFRI Monthly Research 2016.9)
Introduction of Threat Analysis Methods(FFRI Monthly Research 2016.9)
 
Deep Learning based Threat / Intrusion detection system
Deep Learning based Threat / Intrusion detection systemDeep Learning based Threat / Intrusion detection system
Deep Learning based Threat / Intrusion detection system
 
Cyber threat-hunting---part-2-25062021-095909pm
Cyber threat-hunting---part-2-25062021-095909pmCyber threat-hunting---part-2-25062021-095909pm
Cyber threat-hunting---part-2-25062021-095909pm
 
The Rising Threat of Fileless Malware
The Rising Threat of Fileless MalwareThe Rising Threat of Fileless Malware
The Rising Threat of Fileless Malware
 
Itis pentest slides hyd
Itis pentest slides  hydItis pentest slides  hyd
Itis pentest slides hyd
 
STRIDE Variants and Security Requirements-based Threat Analysis (FFRI Monthly...
STRIDE Variants and Security Requirements-based Threat Analysis (FFRI Monthly...STRIDE Variants and Security Requirements-based Threat Analysis (FFRI Monthly...
STRIDE Variants and Security Requirements-based Threat Analysis (FFRI Monthly...
 
Classification with R
Classification with RClassification with R
Classification with R
 
Machine Learning in Malware Detection
Machine Learning in Malware DetectionMachine Learning in Malware Detection
Machine Learning in Malware Detection
 
Understand How Machine Learning Defends Against Zero-Day Threats
Understand How Machine Learning Defends Against Zero-Day ThreatsUnderstand How Machine Learning Defends Against Zero-Day Threats
Understand How Machine Learning Defends Against Zero-Day Threats
 
Understand How Machine Learning Defends Against Zero-Day Threats
Understand How Machine Learning Defends Against Zero-Day ThreatsUnderstand How Machine Learning Defends Against Zero-Day Threats
Understand How Machine Learning Defends Against Zero-Day Threats
 
A44090104
A44090104A44090104
A44090104
 
AMP_Security_ Malware Protection Presentatiion
AMP_Security_ Malware Protection PresentatiionAMP_Security_ Malware Protection Presentatiion
AMP_Security_ Malware Protection Presentatiion
 
Cyber Defense Forensic Analyst - Real World Hands-on Examples
Cyber Defense Forensic Analyst - Real World Hands-on ExamplesCyber Defense Forensic Analyst - Real World Hands-on Examples
Cyber Defense Forensic Analyst - Real World Hands-on Examples
 

Mais de FFRI, Inc.

Appearances are deceiving: Novel offensive techniques in Windows 10/11 on ARM
Appearances are deceiving: Novel offensive techniques in Windows 10/11 on ARMAppearances are deceiving: Novel offensive techniques in Windows 10/11 on ARM
Appearances are deceiving: Novel offensive techniques in Windows 10/11 on ARMFFRI, Inc.
 
Appearances are deceiving: Novel offensive techniques in Windows 10/11 on ARM
Appearances are deceiving: Novel offensive techniques in Windows 10/11 on ARMAppearances are deceiving: Novel offensive techniques in Windows 10/11 on ARM
Appearances are deceiving: Novel offensive techniques in Windows 10/11 on ARMFFRI, Inc.
 
TrustZone use case and trend (FFRI Monthly Research Mar 2017)
TrustZone use case and trend (FFRI Monthly Research Mar 2017) TrustZone use case and trend (FFRI Monthly Research Mar 2017)
TrustZone use case and trend (FFRI Monthly Research Mar 2017) FFRI, Inc.
 
Android Things Security Research in Developer Preview 2 (FFRI Monthly Researc...
Android Things Security Research in Developer Preview 2 (FFRI Monthly Researc...Android Things Security Research in Developer Preview 2 (FFRI Monthly Researc...
Android Things Security Research in Developer Preview 2 (FFRI Monthly Researc...FFRI, Inc.
 
An Overview of the Android Things Security (FFRI Monthly Research Jan 2017)
An Overview of the Android Things Security (FFRI Monthly Research Jan 2017) An Overview of the Android Things Security (FFRI Monthly Research Jan 2017)
An Overview of the Android Things Security (FFRI Monthly Research Jan 2017) FFRI, Inc.
 
Black Hat Europe 2016 Survey Report (FFRI Monthly Research Dec 2016)
Black Hat Europe 2016 Survey Report (FFRI Monthly Research Dec 2016) Black Hat Europe 2016 Survey Report (FFRI Monthly Research Dec 2016)
Black Hat Europe 2016 Survey Report (FFRI Monthly Research Dec 2016) FFRI, Inc.
 
Black Hat USA 2016 Survey Report (FFRI Monthly Research 2016.8)
Black Hat USA 2016  Survey Report (FFRI Monthly Research 2016.8)Black Hat USA 2016  Survey Report (FFRI Monthly Research 2016.8)
Black Hat USA 2016 Survey Report (FFRI Monthly Research 2016.8)FFRI, Inc.
 
About security assessment framework “CHIPSEC” (FFRI Monthly Research 2016.7)
About security assessment framework “CHIPSEC” (FFRI Monthly Research 2016.7) About security assessment framework “CHIPSEC” (FFRI Monthly Research 2016.7)
About security assessment framework “CHIPSEC” (FFRI Monthly Research 2016.7) FFRI, Inc.
 
Black Hat USA 2016 Pre-Survey (FFRI Monthly Research 2016.6)
Black Hat USA 2016 Pre-Survey (FFRI Monthly Research 2016.6)Black Hat USA 2016 Pre-Survey (FFRI Monthly Research 2016.6)
Black Hat USA 2016 Pre-Survey (FFRI Monthly Research 2016.6)FFRI, Inc.
 
Black Hat Asia 2016 Survey Report (FFRI Monthly Research 2016.4)
Black Hat Asia 2016 Survey Report (FFRI Monthly Research 2016.4)Black Hat Asia 2016 Survey Report (FFRI Monthly Research 2016.4)
Black Hat Asia 2016 Survey Report (FFRI Monthly Research 2016.4)FFRI, Inc.
 
ARMv8-M TrustZone: A New Security Feature for Embedded Systems (FFRI Monthly ...
ARMv8-M TrustZone: A New Security Feature for Embedded Systems (FFRI Monthly ...ARMv8-M TrustZone: A New Security Feature for Embedded Systems (FFRI Monthly ...
ARMv8-M TrustZone: A New Security Feature for Embedded Systems (FFRI Monthly ...FFRI, Inc.
 
CODE BLUE 2015 Report (FFRI Monthly Research 2015.11)
CODE BLUE 2015 Report (FFRI Monthly Research 2015.11)CODE BLUE 2015 Report (FFRI Monthly Research 2015.11)
CODE BLUE 2015 Report (FFRI Monthly Research 2015.11)FFRI, Inc.
 
Latest Security Reports of Automobile and Vulnerability Assessment by CVSS v3...
Latest Security Reports of Automobile and Vulnerability Assessment by CVSS v3...Latest Security Reports of Automobile and Vulnerability Assessment by CVSS v3...
Latest Security Reports of Automobile and Vulnerability Assessment by CVSS v3...FFRI, Inc.
 
Black Hat USA 2015 Survey Report (FFRI Monthly Research 201508)
Black Hat USA 2015 Survey Report (FFRI Monthly Research 201508)Black Hat USA 2015 Survey Report (FFRI Monthly Research 201508)
Black Hat USA 2015 Survey Report (FFRI Monthly Research 201508)FFRI, Inc.
 
A Survey of Threats in OS X and iOS(FFRI Monthly Research 201507)
A Survey of Threats in OS X and iOS(FFRI Monthly Research 201507)A Survey of Threats in OS X and iOS(FFRI Monthly Research 201507)
A Survey of Threats in OS X and iOS(FFRI Monthly Research 201507)FFRI, Inc.
 
Security of Windows 10 IoT Core(FFRI Monthly Research 201506)
Security of Windows 10 IoT Core(FFRI Monthly Research 201506)Security of Windows 10 IoT Core(FFRI Monthly Research 201506)
Security of Windows 10 IoT Core(FFRI Monthly Research 201506)FFRI, Inc.
 
Trend of Next-Gen In-Vehicle Network Standard and Current State of Security(F...
Trend of Next-Gen In-Vehicle Network Standard and Current State of Security(F...Trend of Next-Gen In-Vehicle Network Standard and Current State of Security(F...
Trend of Next-Gen In-Vehicle Network Standard and Current State of Security(F...FFRI, Inc.
 
Malwarem armed with PowerShell
Malwarem armed with PowerShellMalwarem armed with PowerShell
Malwarem armed with PowerShellFFRI, Inc.
 
MR201504 Web Defacing Attacks Targeting WordPress
MR201504 Web Defacing Attacks Targeting WordPressMR201504 Web Defacing Attacks Targeting WordPress
MR201504 Web Defacing Attacks Targeting WordPressFFRI, Inc.
 
MR201502 Intel Memory Protection Extensions Overview
MR201502 Intel Memory Protection Extensions OverviewMR201502 Intel Memory Protection Extensions Overview
MR201502 Intel Memory Protection Extensions OverviewFFRI, Inc.
 

Mais de FFRI, Inc. (20)

Appearances are deceiving: Novel offensive techniques in Windows 10/11 on ARM
Appearances are deceiving: Novel offensive techniques in Windows 10/11 on ARMAppearances are deceiving: Novel offensive techniques in Windows 10/11 on ARM
Appearances are deceiving: Novel offensive techniques in Windows 10/11 on ARM
 
Appearances are deceiving: Novel offensive techniques in Windows 10/11 on ARM
Appearances are deceiving: Novel offensive techniques in Windows 10/11 on ARMAppearances are deceiving: Novel offensive techniques in Windows 10/11 on ARM
Appearances are deceiving: Novel offensive techniques in Windows 10/11 on ARM
 
TrustZone use case and trend (FFRI Monthly Research Mar 2017)
TrustZone use case and trend (FFRI Monthly Research Mar 2017) TrustZone use case and trend (FFRI Monthly Research Mar 2017)
TrustZone use case and trend (FFRI Monthly Research Mar 2017)
 
Android Things Security Research in Developer Preview 2 (FFRI Monthly Researc...
Android Things Security Research in Developer Preview 2 (FFRI Monthly Researc...Android Things Security Research in Developer Preview 2 (FFRI Monthly Researc...
Android Things Security Research in Developer Preview 2 (FFRI Monthly Researc...
 
An Overview of the Android Things Security (FFRI Monthly Research Jan 2017)
An Overview of the Android Things Security (FFRI Monthly Research Jan 2017) An Overview of the Android Things Security (FFRI Monthly Research Jan 2017)
An Overview of the Android Things Security (FFRI Monthly Research Jan 2017)
 
Black Hat Europe 2016 Survey Report (FFRI Monthly Research Dec 2016)
Black Hat Europe 2016 Survey Report (FFRI Monthly Research Dec 2016) Black Hat Europe 2016 Survey Report (FFRI Monthly Research Dec 2016)
Black Hat Europe 2016 Survey Report (FFRI Monthly Research Dec 2016)
 
Black Hat USA 2016 Survey Report (FFRI Monthly Research 2016.8)
Black Hat USA 2016  Survey Report (FFRI Monthly Research 2016.8)Black Hat USA 2016  Survey Report (FFRI Monthly Research 2016.8)
Black Hat USA 2016 Survey Report (FFRI Monthly Research 2016.8)
 
About security assessment framework “CHIPSEC” (FFRI Monthly Research 2016.7)
About security assessment framework “CHIPSEC” (FFRI Monthly Research 2016.7) About security assessment framework “CHIPSEC” (FFRI Monthly Research 2016.7)
About security assessment framework “CHIPSEC” (FFRI Monthly Research 2016.7)
 
Black Hat USA 2016 Pre-Survey (FFRI Monthly Research 2016.6)
Black Hat USA 2016 Pre-Survey (FFRI Monthly Research 2016.6)Black Hat USA 2016 Pre-Survey (FFRI Monthly Research 2016.6)
Black Hat USA 2016 Pre-Survey (FFRI Monthly Research 2016.6)
 
Black Hat Asia 2016 Survey Report (FFRI Monthly Research 2016.4)
Black Hat Asia 2016 Survey Report (FFRI Monthly Research 2016.4)Black Hat Asia 2016 Survey Report (FFRI Monthly Research 2016.4)
Black Hat Asia 2016 Survey Report (FFRI Monthly Research 2016.4)
 
ARMv8-M TrustZone: A New Security Feature for Embedded Systems (FFRI Monthly ...
ARMv8-M TrustZone: A New Security Feature for Embedded Systems (FFRI Monthly ...ARMv8-M TrustZone: A New Security Feature for Embedded Systems (FFRI Monthly ...
ARMv8-M TrustZone: A New Security Feature for Embedded Systems (FFRI Monthly ...
 
CODE BLUE 2015 Report (FFRI Monthly Research 2015.11)
CODE BLUE 2015 Report (FFRI Monthly Research 2015.11)CODE BLUE 2015 Report (FFRI Monthly Research 2015.11)
CODE BLUE 2015 Report (FFRI Monthly Research 2015.11)
 
Latest Security Reports of Automobile and Vulnerability Assessment by CVSS v3...
Latest Security Reports of Automobile and Vulnerability Assessment by CVSS v3...Latest Security Reports of Automobile and Vulnerability Assessment by CVSS v3...
Latest Security Reports of Automobile and Vulnerability Assessment by CVSS v3...
 
Black Hat USA 2015 Survey Report (FFRI Monthly Research 201508)
Black Hat USA 2015 Survey Report (FFRI Monthly Research 201508)Black Hat USA 2015 Survey Report (FFRI Monthly Research 201508)
Black Hat USA 2015 Survey Report (FFRI Monthly Research 201508)
 
A Survey of Threats in OS X and iOS(FFRI Monthly Research 201507)
A Survey of Threats in OS X and iOS(FFRI Monthly Research 201507)A Survey of Threats in OS X and iOS(FFRI Monthly Research 201507)
A Survey of Threats in OS X and iOS(FFRI Monthly Research 201507)
 
Security of Windows 10 IoT Core(FFRI Monthly Research 201506)
Security of Windows 10 IoT Core(FFRI Monthly Research 201506)Security of Windows 10 IoT Core(FFRI Monthly Research 201506)
Security of Windows 10 IoT Core(FFRI Monthly Research 201506)
 
Trend of Next-Gen In-Vehicle Network Standard and Current State of Security(F...
Trend of Next-Gen In-Vehicle Network Standard and Current State of Security(F...Trend of Next-Gen In-Vehicle Network Standard and Current State of Security(F...
Trend of Next-Gen In-Vehicle Network Standard and Current State of Security(F...
 
Malwarem armed with PowerShell
Malwarem armed with PowerShellMalwarem armed with PowerShell
Malwarem armed with PowerShell
 
MR201504 Web Defacing Attacks Targeting WordPress
MR201504 Web Defacing Attacks Targeting WordPressMR201504 Web Defacing Attacks Targeting WordPress
MR201504 Web Defacing Attacks Targeting WordPress
 
MR201502 Intel Memory Protection Extensions Overview
MR201502 Intel Memory Protection Extensions OverviewMR201502 Intel Memory Protection Extensions Overview
MR201502 Intel Memory Protection Extensions Overview
 

Último

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 

Último (20)

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 

Improving accuracy of malware detection by filtering evaluation dataset based on its similarity

  • 1. FFRI, Inc. Improving accuracy of malware detection by Fourteenforty Research Institute, Inc. filtering evaluation dataset based on its similarity Junichi Murakami Director of Advanced Development FFRI, Inc. http://www.ffri.jp
  • 2. FFRI, Inc. Preface • This slides was used for a presentation at CSS2013 – http://www.iwsec.org/css/2013/english/index.html • Please refer the original paper for the detail data – http://www.ffri.jp/assets/files/research/research_papers/M WS2013_paper.pdf (Written in Japanese but the figures are common) • Contact information – research-feedback@ffri.jp – @FFRI_Research (twitter) 2
  • 3. FFRI, Inc. Agenda • • • • • • • • Background Problem Scope and purpose Experiment 1 Experiment 2 Experiment 3 Consideration Conclusion 3
  • 4. FFRI, Inc. Background – malware and its detection Malware generators Obfuscators Increasing malware Targeted Attack (Unknown malware) Limitation of signature matching other methods Heuristic Bigdata Machine learning Could reputation 4
  • 5. FFRI, Inc. Background – Related works • Mainly focusing on a combination of the factors below – Features selection and modification, parameter settings • Some good results are reported (TRP:90%+, FRP:1%-) Features Algorithms Evaluation Static information SVM TPR/FRP, etc. Dynamic information Naive bayes Accuracy, Precision Hybrid Perceptron, etc. ROC-curve, etc. 5
  • 6. FFRI, Inc. Problem • General theory of machine learning: – Accuracy of classification declines if trends of training and testing data are different • How about malware and benign files ? ? 6
  • 7. FFRI, Inc. Scope and purpose ① Investigating differences between similarities of malware and benign files(Experiment-1) ② Investigating an effect for accuracy of classification by the difference(Experiment-2) ③ Based on the result above, confirming an effect of removing data whose similarity with a training data is low (Experiment-3) 7
  • 8. FFRI, Inc. Experiment-1(1/3) • Used FFRI Dataset 2013 and benign files we collected as datasets • Calculated the similarity of each malware and benign files (Jubatus, MinHash) • Feature vector: A number of 4-gram of sequential API calls – ex: NtCreateFile_NtWriteFile_NtWriteFile_NtClose: n times NtSetInformationFile_NtClose_NtClose_NtOpenMutext: m times malware benign A A A B C ... B C ... B C ... 0.52 ... A ー 0.8 B ー ー 1.0 ... C ー ー ー ... ... ー ー ー ー 8
  • 9. FFRI, Inc. Experiment-1(2/3) Grouping malware and benign files based on their similarities Threshold of similarity (0.0 - 1.0) benign malware 9
  • 10. FFRI, Inc. Experiment-1(3/3) It is more difficult to find similar benign files compared to malware 100% 80% 60% 40% unique 仲間無 not 仲間有 20% 0.8 0.85 0.9 0.95 malware マルウェア benign 正常系 malware マルウェア benign 正常系 malware マルウェア benign 正常系 malware マルウェア benign 正常系 malware マルウェア benign 正常系 0% unique 1 Threshold of similarity 10
  • 11. FFRI, Inc. Experiment-2(1/3) • How much does the difference affect a result? • 50% of malware/benign are assigned to a training, the others are to a testing dataset(Jubatus, AROW) train classify jubatus jubatus TPR: ? FPR: ? malware TPR: True Positive Rate FPR: False Positive Rate testing train benign 11
  • 12. FFRI, Inc. Experiment-2(2/3) • How much does the difference affect a result? • 50% of malware/benign are assigned to a training, the others are to a testing dataset(Jubatus, AROW) train jubatus malware benign testing train classify jubatus TPR: ? FPR: ? 12
  • 13. FFRI, Inc. Experiment-2(3/3) The accuracy declines if trends of training and testing data are different ■TPR ■FPR 0.624(not unique) +3.866 97.996(not unique) 81.297(unique) 4.49(unique) -16.699 0 50 100 % 0 1 2 3 4 5 % 13
  • 14. FFRI, Inc. Experiment-3(1/6) – After a training dividing line malware benign(train) benign malware(train) benign(test) malware(test) 14
  • 15. FFRI, Inc. Experiment-3(2/6) – After a classification dividing line benign(train) malware(train) benign(test) malware(test) 15
  • 16. FFRI, Inc. Experiment-3(2/6) – After a classification dividing line FP FN benign(train) malware(train) benign(test) malware(test) 16
  • 17. FFRI, Inc. Experiment-3(3/6) – Low similarity data dividing line TP(accidentally) FN benign(train) FN malware(train) benign(test) malware(test) 17
  • 18. FFRI, Inc. Experiment-3(4/6) – Effect to TPR The number of classified data 1400 1.00 1200 0.98 1000 0.96 800 0.94 600 0.92 400 200 0.90 0 TP FN TPR 0.88 0 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 Threshold of similarity 18
  • 19. FFRI, Inc. Experiment-3(5/6) – Effect to FPR The number of classified data 2500 0.014 0.012 2000 0.010 1500 0.008 1000 0.006 TN FP FPR 0.004 500 0.002 0 0.000 0 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 Threshold of similarity 19
  • 20. FFRI, Inc. The number of classified data / The number of total testing data Experiment-3(6/6) Transition of the number of classified data malware マルウェア benign 正常系ソフトウェア 120% 100% 80% 60% 40% 20% 0% 0 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 Threshold of similarity 20
  • 21. FFRI, Inc. Consideration(1/3) • In real scenario: – trying to classify an unknown file/process whether it is benign files or not • If we apply Experiment-3: – Files are classified only if similar data is already trained – If not, files are not classified which results in • FN if the files is malware • TF if the files is benign (All right as a result) • Therefore it is a problem about “TPR for unique malware” (Unique malware is likely to be undetectable) 21
  • 22. FFRI, Inc. Consideration(2/3) • If malware have many variants as the current – ML-based detection works well • Having many variants ∝ malware generators/obfuscators • We have to investigate – Trends of usage of the tools above – Possibility of anti-machine learning detection 22
  • 23. FFRI, Inc. Consideration(3/3) • How to deal with unclassified (filtered) data 1. Using other feature vectors 2. Enlarging a training dataset (Unique → Not unique) 3. Using other methods besides ML 23
  • 24. FFRI, Inc. Conclusion • Distribution of similarity for malware and benign are difference (Experiment-1) • Accuracy declines if trends of training and testing data are different (Experiment-2) • TPR of unique malware declines when we remove low similarity data (Experiment-3) • Continual investigation for trends of malware and related tools are required • (Might be necessary to develop technology to determine benign files) 24