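[Figure: a birthmark is extracted from Program p and from Program q, and the two birthmarks are compared ("Similar?"). The outcome is either "MATCH!" or "Different".]

The software similarity problem.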
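[Figure: a control flow graph with nodes L_0 through L_7 and branch edges labeled "true", shown beside its structured form and its string representation.]

Structured form:

    proc(){
    L_0:
      while (v1 || v2) {
    L_1:
        if (v3) {
    L_2:
        } else {
    L_4:
        }
    L_5:
      }
    L_7:
      return;
    }

String representation: W|IEH}R

A control flow graph, its structured form, and its string representation.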
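A string representation like the one above can be turned into features by counting its q-grams (overlapping substrings of length q). The following is a minimal sketch, not the deck's implementation; the function name `qgram_counts` and the choice q = 4 are assumptions made for illustration.

```python
from collections import Counter


def qgram_counts(signature: str, q: int = 4) -> Counter:
    """Count every overlapping substring of length q in the signature.

    The value q = 4 is an illustrative assumption, not taken from the slides.
    """
    return Counter(signature[i:i + q] for i in range(len(signature) - q + 1))


# The string representation shown in the figure.
sig = "W|IEH}R"
features = qgram_counts(sig, q=4)
for gram, count in sorted(features.items()):
    print(repr(gram), count)
```

A signature shorter than q yields an empty counter, so very small procedures contribute no features under this scheme.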
$$ d_1(p, q) = \lVert p - q \rVert_1 = \sum_{i=1}^{n} \lvert p_i - q_i \rvert $$
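As a worked instance of the L1 (Manhattan) distance above, here is a short sketch over two equal-length feature vectors. The function name `d1` mirrors the formula; the example vectors are made up for illustration.

```python
def d1(p, q):
    """L1 distance: the sum of absolute component-wise differences."""
    if len(p) != len(q):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


# Hypothetical feature vectors (for example, q-gram counts).
p = [2, 0, 1, 3]
q = [1, 1, 1, 0]
print(d1(p, q))  # |2-1| + |0-1| + |1-1| + |3-0| = 5
```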
$$ R = \left\{\, r \in D \;\middle|\; 1 - \frac{d(r, q)}{|q|} \geq t \,\right\} $$
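Read as a similarity-threshold query, the formula above selects every database entry r whose similarity to the query q reaches the threshold t. The sketch below is a linear scan under two assumptions that the slides do not state: d is the L1 distance, and |q| is the L1 norm of q. The names `range_query` and `database` are hypothetical.

```python
def d1(p, q):
    """L1 distance between two equal-length vectors."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


def range_query(database, q, t):
    """Return every r in database with 1 - d1(r, q) / |q| >= t.

    Assumption: |q| is taken to be the L1 norm of the query vector.
    """
    size_q = sum(abs(x) for x in q)
    if size_q == 0:
        return []  # similarity is undefined for an empty query
    return [r for r in database if 1 - d1(r, q) / size_q >= t]


# Hypothetical data: one exact match, one near match, one unrelated vector.
database = [[2, 1, 1], [2, 1, 0], [0, 0, 4]]
print(range_query(database, [2, 1, 1], t=0.6))  # [[2, 1, 1], [2, 1, 0]]
```

A linear scan is the simplest correct form of the query; an index structure would change the running time, not the result set.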
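$$ M_1 = S(P_1), \qquad M_2 = S(P_2) $$

$$ M_1' = \{a_i \in M_1\} \cup \{b_j\} : 1 + |M_1| \leq j \leq |M_2| $$

$$ M_2' = \{a_i \in M_2\} \cup \{b_j\} : 1 + |M_2| \leq j \leq |M_1| $$

$$ C : M_1' \times M_2' \to \mathbb{R} $$

$$ C(a, b) = \begin{cases} |a|, & \text{if } a \in M_1,\ b \notin M_2 \\ |b|, & \text{if } b \in M_2,\ a \notin M_1 \\ ed(a, b), & \text{if } a \in M_1,\ b \in M_2 \end{cases} $$

Find a bijection $f : M_1' \to M_2'$ such that the distance $d$ is minimized:

$$ d = \sum_{a \in M_1'} C(a, f(a)) $$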
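The minimization above is an assignment problem. The sketch below solves it by brute force over all bijections, which is only practical for very small sets; a polynomial-time assignment solver would replace the permutation loop in practice. Several choices here are assumptions for illustration: padding elements are represented as `None`, both sets are padded to the larger size, `ed` is taken to be Levenshtein distance, and an unmatched string costs its length.

```python
from itertools import permutations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def cost(a, b) -> int:
    """C(a, b): length of the unmatched string, else the edit distance."""
    if a is None and b is None:
        return 0
    if b is None:
        return len(a)
    if a is None:
        return len(b)
    return edit_distance(a, b)


def min_assignment_distance(m1, m2) -> int:
    """Minimum total cost over all bijections between the padded sets."""
    size = max(len(m1), len(m2))
    padded1 = list(m1) + [None] * (size - len(m1))
    padded2 = list(m2) + [None] * (size - len(m2))
    return min(sum(cost(a, b) for a, b in zip(padded1, perm))
               for perm in permutations(padded2))


# Hypothetical procedure signatures for two programs.
m1 = ["WIR", "WIEHR"]
m2 = ["WIEHR", "WR", "IR"]
print(min_assignment_distance(m1, m2))  # 3
```

In the example, "WIEHR" matches its identical counterpart at cost 0, "WIR" matches a two-character string at cost 1, and the remaining two-character string is unmatched at cost 2.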
$$ \left\{\, p : p \in E \;\middle|\; 1 - \frac{d(p, q)}{|q|} \geq t,\ d(p, q) \leq |q| \,\right\} $$
[Figure: flowchart. Labels: Unknown Sample; Samples From Honeypots; Packed (Yes/No); Dynamic Analysis: Emulate, End of Unpacking? (Yes/No); Static Analysis; Classify; Malware Signature Database; Malicious; Non Malicious; From Honeypot?; New Signature.]

The Malwise malware classification system.
False Positives

Similarity   K-Subgraphs   Q-Grams
0.0              1302161   2334251
0.1               463170    413667
0.2               356345     40055
0.3               285202      7899
0.4               200326      3790
0.5               129790       327
0.6                46320        11
0.7                10784         0
0.8                 5883         0
0.9                   19         0
1.0                    0         0

Malware Detection Rates

Classification Algorithm       Klez   Netsky   Roron   Frethem
Maximum                          36       49      81       289
Exact                            20       29      17       139
Heuristic Approximate            20       27      43       144
Q-Grams                          20       31      79       226
Optimal Distance                 22       46      73       220
Q-Grams + Optimal Distance       20       43      73       217

False Positives with 10,000 Malware

Classification Algorithm       False Positives   FP Percentage
Q-Grams                                     10            0.62
Q-Grams + Optimal Distance                   7            0.43
Exact Matching

      ao     b      d      e      g      k      m      q      a
ao    -      0.44   0.28   0.27   0.28   0.55   0.44   0.44   0.47
b     0.44   -      0.27   0.27   0.27   0.51   1.00   1.00   0.58
d     0.28   0.27   -      0.48   0.56   0.27   0.27   0.27   0.27
e     0.27   0.27   0.48   -      0.59   0.27   0.27   0.27   0.27
g     0.28   0.27   0.56   0.59   -      0.27   0.27   0.27   0.27
k     0.55   0.51   0.27   0.27   0.27   -      0.51   0.51   0.75
m     0.44   1.00   0.27   0.27   0.27   0.51   -      1.00   0.58
q     0.44   1.00   0.27   0.27   0.27   0.51   1.00   -      0.58
a     0.47   0.58   0.27   0.27   0.27   0.75   0.58   0.58   -

Heuristic Approximate Matching

      ao     b      d      e      g      k      m      q      a
ao    -      0.70   0.28   0.28   0.27   0.75   0.70   0.70   0.75
b     0.74   -      0.31   0.34   0.33   0.82   1.00   1.00   0.87
d     0.28   0.29   -      0.50   0.74   0.29   0.29   0.29   0.29
e     0.31   0.34   0.50   -      0.64   0.32   0.34   0.34   0.33
g     0.27   0.33   0.74   0.64   -      0.29   0.33   0.33   0.30
k     0.75   0.82   0.29   0.30   0.29   -      0.82   0.82   0.96
m     0.74   1.00   0.31   0.34   0.33   0.82   -      1.00   0.87
q     0.74   1.00   0.31   0.34   0.33   0.82   1.00   -      0.87
a     0.75   0.87   0.30   0.31   0.30   0.96   0.87   0.87   -

Q-Grams

      ao     b      d      e      g      k      m      q      a
ao    -      0.86   0.53   0.64   0.59   0.86   0.86   0.86   0.86
b     0.88   -      0.66   0.76   0.71   0.97   1.00   1.00   0.97
d     0.65   0.72   -      0.88   0.93   0.73   0.72   0.72   0.73
e     0.72   0.80   0.87   -      0.93   0.80   0.80   0.80   0.80
g     0.69   0.77   0.93   0.93   -      0.77   0.77   0.77   0.77
k     0.88   0.97   0.67   0.77   0.72   -      0.97   0.97   0.99
m     0.88   1.00   0.66   0.76   0.71   0.97   -      1.00   0.97
q     0.88   1.00   0.66   0.76   0.71   0.97   1.00   -      0.97
a     0.87   0.97   0.67   0.77   0.72   0.99   0.97   0.97   -

Optimal Distance Using Assignment Problem

      ao     b      d      e      g      k      m      q      a
ao    -      0.86   0.49   0.54   0.50   0.87   0.86   0.86   0.86
b     0.87   -      0.57   0.63   0.62   0.96   1.00   1.00   0.96
d     0.61   0.64   -      0.85   0.91   0.64   0.64   0.64   0.64
e     0.64   0.69   0.85   -      0.90   0.68   0.69   0.69   0.68
g     0.62   0.68   0.91   0.91   -      0.68   0.68   0.68   0.68
k     0.88   0.96   0.58   0.62   0.61   -      0.96   0.96   0.99
m     0.87   1.00   0.57   0.63   0.62   0.96   -      1.00   0.96
q     0.87   1.00   0.57   0.63   0.62   0.96   1.00   -      0.96
a     0.87   0.96   0.58   0.62   0.61   0.99   0.96   0.96   -
Benign and Malicious Processing Time

% Samples   Benign Time (s)   Malware Time (s)
10                     0.02               0.16
20                     0.02               0.28
30                     0.03               0.30
40                     0.03               0.36
50                     0.06               0.84
60                     0.09               0.94
70                     0.13               0.97
80                     0.25               1.03
90                     0.56               1.31
100                    8.06             585.16
The Software Similarity Problem and Malware Classification Techniques
