Problems
                  Solution
        Model Illustration




    Ensemble Based
   Categorization and
Adaptive Learning Model
 for Malware Detection
     Muhammad Najmi bin Ahmad Zabidi
       najmi.zabidi@gmail.com

      IAS 2011, Universiti Teknikal Malaysia Melaka (UTeM)


              6th December 2011
        Muhammad Najmi        Information Assurance and Security Conf (IAS 2011)   1/25




About



  • PhD student at Universiti Teknologi Malaysia, Skudai
  • Employed by International Islamic University Malaysia,
    Gombak








Overview


  • Malware detection is considered
    "undecidable" [Cohen, 1986]
  • This means that 100 percent detection of all malware, for all
    time, is impossible
  • But there is still room to push detection accuracy as high
    as possible








Problem 1 - Features

   • Malware detection depends on features to generate
     signatures
   • Some features may be redundant, which makes computation
     more expensive
   • Some features may be weak or irrelevant
   • The strong features alone may be enough, so the weaker
     ones can be discarded
   • The feature set can be reduced with a dimension
     reduction method
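
One simple way to picture the redundancy problem above: a feature that never varies, or that duplicates another feature column exactly, can be dropped without losing information. The sketch below is illustrative only. The feature names and the 0/1 sample matrix are made up, and the slides do not say which dimension reduction method is actually used.

```python
def drop_redundant_features(names, rows):
    """Return the names of features worth keeping.

    names: list of feature names (hypothetical)
    rows:  list of samples, each a list of 0/1 values aligned with names
    """
    kept, seen_columns = [], set()
    for j, name in enumerate(names):
        column = tuple(row[j] for row in rows)
        if len(set(column)) < 2:      # constant column: carries no information
            continue
        if column in seen_columns:    # exact duplicate of an earlier column
            continue
        seen_columns.add(column)
        kept.append(name)
    return kept


# Made-up presence/absence matrix: one row per sample, one column per API call.
names = ["CreateMutex", "RegOpenKey", "RegCreate", "gethostbyname"]
rows = [
    [1, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [1, 0, 0, 0],
]
selected = drop_redundant_features(names, rows)
print(selected)
```

Here `CreateMutex` is constant across samples and `RegCreate` duplicates `RegOpenKey`, so only two of the four columns survive.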








Problem 2 - Classification of
Software


   • Classification here means separating software into
     malicious, suspicious and benign
   • The aim is to reduce false positives and false negatives
     and to increase precision
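
The terms above can be made concrete with a small worked example. The sketch below counts false positives and false negatives and derives precision from them. It assumes "malicious" is the positive class and lumps "suspicious" and "benign" together as negatives; the labels and predictions are invented for illustration.

```python
def detection_metrics(y_true, y_pred, positive="malicious"):
    """Count TP/FP/FN/TN for one positive class and derive precision and recall."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1          # flagged, but not actually malicious
        elif truth == positive:
            fn += 1          # malicious, but missed
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall}


# Invented ground truth and classifier output for six samples.
y_true = ["malicious", "malicious", "benign", "benign", "suspicious", "malicious"]
y_pred = ["malicious", "benign", "malicious", "benign", "suspicious", "malicious"]
metrics = detection_metrics(y_true, y_pred)
print(metrics)
```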








Problem 3 - Tackling new malware



   • Unknown (new) malware is the problem
   • There is no prior knowledge about it
   • We suggest unsupervised categorization
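
As a rough illustration of unsupervised categorization, the sketch below groups unlabeled samples by the overlap of their API-call sets. It assumes Jaccard similarity and a greedy single-pass grouping with an arbitrary threshold of 0.5. None of these choices come from the slides, and the sample names and call sets are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def group_unknown(samples, threshold=0.5):
    """Greedy grouping: a sample joins the first cluster whose
    representative (its first member) is similar enough."""
    clusters = []  # list of (representative call set, member names)
    for name, calls in samples.items():
        for representative, members in clusters:
            if jaccard(calls, representative) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((calls, [name]))
    return [members for _, members in clusters]


# Hypothetical unlabeled samples and the API calls seen in each.
samples = {
    "sample_a": {"CreateMutex", "RegOpenKey", "RegSet"},
    "sample_b": {"CreateMutex", "RegOpenKey", "RegCreate"},
    "sample_c": {"gethostbyname", "CreateProcess"},
    "sample_d": {"gethostbyname", "CreateProcess", "FindWindow"},
}
groups = group_unknown(samples)
print(groups)
```

No labels are needed: the registry-heavy samples end up together, and so do the network-oriented ones.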








Related work on malware detection

 Statistical based:
   • [Chouchane et al., 2007, Saudi et al., 2010,
     Merkel et al., 2010]
 Data mining and machine learning:
   • [Sun et al., 2010, Komashinskiy and Kotenko, 2009,
     Komashinskiy and Kotenko, 2010]
   • [Elovici et al., 2007, Gavrilut et al., 2009,
     Firdausi et al., 2010, Golovko et al., 2010]








Solutions



  Feature Selection
    • Use feature selection to reduce processing overhead
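
A minimal sketch of feature selection by ranking follows. The slides do not name a specific selection method, so information gain is used here only as a common stand-in. The labels and feature columns are invented.

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    result = 0.0
    for label in set(labels):
        p = labels.count(label) / total
        result -= p * math.log2(p)
    return result


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on one feature."""
    total = len(labels)
    gain = entropy(labels)
    for value in set(feature_values):
        subset = [l for v, l in zip(feature_values, labels) if v == value]
        gain -= len(subset) / total * entropy(subset)
    return gain


# Invented data: 1 means the API call is present in the sample.
labels = ["malware", "malware", "benign", "benign"]
features = {
    "FindWindow": [1, 0, 1, 0],         # unrelated to the label
    "IsDebuggerPresent": [1, 1, 0, 0],  # separates the classes perfectly
}
ranked = sorted(features,
                key=lambda name: information_gain(features[name], labels),
                reverse=True)
print(ranked)
```

Keeping only the top-ranked features means later phases process fewer columns, which is the overhead reduction the bullet refers to.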






Categorization and Ensemble

 • First, use a generic classifier to separate malware from
   non-malware
 • Second, use a specific classifier to separate malware by
   its special traits (trojan, worm, virus)
 • Supervised categorization is needed to classify known
   malware features
 • In recent literature, the term semi-supervised learning
   is used for this "assisted" unsupervised
   categorization
 • Ensemble classification helps, since weak base learners
   can be boosted
 • Unsupervised categorization (clustering) is needed to
   categorize unknown malware
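
The generic-then-specific arrangement above can be sketched as a two-stage cascade. The rules below are placeholders that stand in for trained classifiers. They are assumptions made for illustration, not the rules used in this work.

```python
# Placeholder set of calls treated as suspicious in stage 1 (an assumption).
SUSPICIOUS = {"IsDebuggerPresent", "CreateMutex", "RegSet", "gethostbyname"}


def generic_stage(calls):
    """Stage 1: malware vs. non-malware (placeholder rule)."""
    return "malware" if len(calls & SUSPICIOUS) >= 2 else "benign"


def specific_stage(calls):
    """Stage 2: assign a malware class (placeholder rules)."""
    if "gethostbyname" in calls and "CreateProcess" in calls:
        return "worm"
    if "RegSet" in calls or "RegCreate" in calls:
        return "trojan"
    return "virus"


def classify(calls):
    """Run the specific classifier only on samples flagged by the generic one."""
    if generic_stage(calls) == "benign":
        return "benign"
    return specific_stage(calls)


print(classify({"FindWindow"}))
print(classify({"gethostbyname", "CreateProcess", "CreateMutex"}))
print(classify({"IsDebuggerPresent", "RegSet"}))
```

The point of the cascade is that the more detailed second stage only runs on samples the first stage already considers malware.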





Adaptive Learning

  • Use adaptive learning so that new, previously unknown
   malware can be learned as known, and will therefore
   be discarded at an early phase




Problems     Phase 1
                     Solution   Phase 2
           Model Illustration   Phase 3
                                Phase 4



Proposed Model







Phase 1







P1 Descriptions



   • Preprocessing includes extracting API calls, and any
     other useful information, from the malware binaries
   • Feature selection is also done in this phase
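
A rough sketch of the extraction step: scan a binary for printable strings (similar to the Unix `strings` tool) and keep those that contain a known API name. This is a simplification. A real pipeline would more likely parse the import table of the executable, and the byte blob and the `known` list below are made up.

```python
import re


def extract_api_calls(data, known_apis):
    """Return the known API names that occur in printable strings of `data`."""
    # Runs of at least 4 printable ASCII bytes.
    strings = re.findall(rb"[\x20-\x7e]{4,}", data)
    found = set()
    for raw in strings:
        text = raw.decode("ascii")
        for api in known_apis:
            if api in text:   # naive substring match
                found.add(api)
    return sorted(found)


# Made-up bytes standing in for part of a binary.
blob = (b"\x00\x01MZ\x90\x00KERNEL32.CreateProcessA\x00\x00"
        b"IsDebuggerPresent\x00\xff\xfegethostbyname\x00")
known = ["CreateProcess", "IsDebuggerPresent", "gethostbyname", "CreateMutex"]
calls = extract_api_calls(blob, known)
print(calls)
```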







Features


   • The features in this case are API calls:
       • The fewer API calls that need to be used, the better
       • A dimension reduction method is used to handle this
   • As future work, we are considering adding entropy analysis of
     the packed binary body, in addition to API call profiling
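
The entropy analysis mentioned as future work can be illustrated with Shannon entropy over byte values. Packed or encrypted data tends to score close to 8 bits per byte, while plain, repetitive data scores low. The sketch below is a generic calculation, not the authors' implementation, and the two inputs are synthetic.

```python
import math
from collections import Counter


def byte_entropy(data):
    """Shannon entropy of a bytes object, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


low = byte_entropy(b"A" * 1024)              # one repeated byte value
high = byte_entropy(bytes(range(256)) * 4)   # every byte value equally often
print(low, high)
```

A threshold on this value could flag binaries that are likely packed. Where to place that threshold is an empirical question the slides do not address.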







Interesting API calls
  CreateMutex
  NtasdfCreateFile
  call shell32
  advapi32.RegOpenKey
  KERNEL32.CreateProcess
  shdocvw
  gethostbyname
  advapi32.RegCreate
  advapi32.RegSet
  http://
  OutputDebugString
  FindWindow
  IsDebuggerPresent




Phase 2







P2 Descriptions


   • Malware is first categorized according to the common
     traits of generic malware
   • Next, it is categorized by the specific symptoms of each
     malware class (worm, trojan, virus)
   • A malware sample could combine traits of all classes, but
     usually one feature is dominant







Phase 3







P3 Descriptions


   • Use ensemble-based classification built from weak learners
   • Many weak learners, combined by voting, can give more
     accurate results
   • A sample of unknown class goes into the clustering
     phase
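
The voting idea above can be sketched with a few single-rule weak learners and a majority vote. The learners and the agreement threshold are hypothetical. This is plain majority voting rather than boosting, and a sample with no clear majority is labeled "unknown" so that it can be handed to the clustering phase.

```python
from collections import Counter


def vote(predictions, min_agreement=0.5):
    """Majority vote; 'unknown' if no label exceeds the agreement threshold."""
    label, count = Counter(predictions).most_common(1)[0]
    if count / len(predictions) > min_agreement:
        return label
    return "unknown"


# Hypothetical weak learners: each looks at a single API call.
weak_learners = [
    lambda calls: "worm" if "gethostbyname" in calls else "trojan",
    lambda calls: "worm" if "CreateProcess" in calls else "virus",
    lambda calls: "trojan" if "RegSet" in calls else "worm",
]


def ensemble_classify(calls):
    """Ask every weak learner, then combine the answers by voting."""
    return vote([learner(calls) for learner in weak_learners])


print(ensemble_classify({"gethostbyname", "CreateProcess"}))
print(ensemble_classify({"RegSet"}))
print(ensemble_classify({"FindWindow"}))
```

In the last call the three learners disagree completely, so the sample is reported as unknown.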







P4 Descriptions


   • A signature is created if the malware is new
   • The new signature is added to the current
     categorization
   • This shortens the detection cycle for the next
     malware
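
The adaptive step above might look like the following sketch: once a new sample has been categorized, its signature is stored so that the same sample is recognized immediately next time. The slides do not define the signature format. Hashing the sorted API-call set is an assumption made for illustration, and `SignatureStore` is a hypothetical name.

```python
import hashlib


class SignatureStore:
    """Keeps signatures of known malware and the label assigned to each."""

    def __init__(self):
        self._known = {}  # signature -> label

    @staticmethod
    def signature(calls):
        """Hypothetical signature: SHA-256 of the sorted API-call names."""
        joined = ",".join(sorted(calls))
        return hashlib.sha256(joined.encode("utf-8")).hexdigest()

    def lookup(self, calls):
        """Return the stored label, or None if the sample is not yet known."""
        return self._known.get(self.signature(calls))

    def learn(self, calls, label):
        """Add a newly categorized sample to the known set."""
        self._known[self.signature(calls)] = label


store = SignatureStore()
sample = {"CreateMutex", "gethostbyname"}
first = store.lookup(sample)    # not known yet
store.learn(sample, "worm")     # label would come from the clustering phase
second = store.lookup(sample)   # now recognized without re-running the pipeline
print(first, second)
```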







The Dataset

 In malware research there is no standard dataset, unlike the
 intrusion detection area, which usually relies on the KDD/MIT
 Lincoln datasets.
   • We obtained malware samples from
     CyberSecurity Malaysia (CSM): 2 GB of malware
     files, amounting to around 30,000 malware binaries
   • We have to build our own dataset to extract the features
   • This is considered preprocessing work







Conclusion

   • A soft computing approach could assist in malware
     detection
   • Feature selection could help minimize feature
     processing
   • Ensemble methods could help improve malware
     categorization
   • Adaptive learning could help avoid redundant
     retraining in subsequent iterations







Bibliography
   Chouchane, M. R., Walenstein, A., and Lakhotia, A. (2007).
   Statistical signatures for fast filtering of
   instruction-substituting metamorphic malware.
   In Proceedings of the 2007 ACM Workshop on Recurring
   Malcode, WORM '07, pages 31-37, New York, NY, USA.
   ACM.

   Cohen, F. B. (1986).
   Computer viruses.
   PhD thesis, Los Angeles, CA, USA.
   AAI0559804.

   Elovici, Y., Shabtai, A., Moskovitch, R., Tahan, G., and
   Glezer, C. (2007).
   Applying machine learning techniques for detection of
   malicious code in network traffic.
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 

Recently uploaded (20)

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 

Ensemble Based Categorization and Adaptive Learning Model for Malware Detection

  • 1. Ensemble Based Categorization and Adaptive Learning Model for Malware Detection
      Muhammad Najmi bin Ahmad Zabidi, najmi.zabidi@gmail.com
      Information Assurance and Security Conf (IAS 2011), Universiti Teknikal Melaka (UTEM), 6th December 2011
      Outline: Problems, Solution, Model Illustration
  • 2. About
      • PhD student at Universiti Teknologi Malaysia, Skudai
      • Employed by International Islamic University Malaysia, Gombak
  • 3. Overview
      • Malware detection is considered “undecidable” [Cohen, 1986]
      • This means that 100 percent detection for all time is impossible
      • But there is still room to pursue the highest achievable detection accuracy
  • 4. Problem 1 - Features
      • Malware detection depends on features to generate signatures
      • Some features may be redundant, which makes computation more expensive
      • Some features may be weak or irrelevant
      • The strong features alone may be sufficient, so the weaker ones can be discarded
      • The feature set can be reduced with a dimension reduction method
  • 5. Problem 2 - Classification of Software
      • Classification here means distinguishing between malicious, suspicious and benign software
      • The aim is to reduce false positives and false negatives and to increase precision
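The three quantities named on this slide can be computed from a confusion matrix. The sketch below is illustrative only: it assumes a simplified two-class setting (malicious versus not malicious) rather than the three classes on the slide, and the function name and the counts are hypothetical.

```python
def detection_metrics(tp, fp, tn, fn):
    """Return precision, false positive rate and false negative rate.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives and false negatives for the "malicious" class.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    false_negative_rate = fn / (fn + tp) if (fn + tp) else 0.0
    return {
        "precision": precision,
        "false_positive_rate": false_positive_rate,
        "false_negative_rate": false_negative_rate,
    }


# Hypothetical counts, not results from the talk.
metrics = detection_metrics(tp=90, fp=10, tn=880, fn=20)
print(metrics)
```

A detector can score well on precision while still missing many samples, which is why the false negative rate is tracked separately.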
  • 6. Problem 3 - Tackling new malware
      • Unknown malware is the problem
      • There is no prior knowledge of it
      • We suggest unsupervised categorization
  • 7. Related works on malware detection
      • Statistical based: [Chouchane et al., 2007, Saudi et al., 2010, Merkel et al., 2010]
      • Data mining and machine learning: [Sun et al., 2010, Komashinskiy and Kotenko, 2009, Komashinskiy and Kotenko, 2010], [Elovici et al., 2007, Gavrilut et al., 2009, Firdausi et al., 2010, Golovko et al., 2010]
  • 8. Solutions: Feature Selection
      • Use feature selection to reduce processing overhead
  • 9. Categorization and Ensemble
      • First, use a generic classifier to separate malware from non-malware
      • Second, use a specific classifier to separate malware by its special traits (trojan, worm, virus)
      • Supervised categorization is needed to classify known malware features
      • In recent literature, the term semi-supervised learning is used for “assisted” unsupervised categorization
      • Ensemble classification helps, since weak base learners can be boosted
      • Unsupervised categorization (clustering) is needed to categorize unknown malware
  • 10. Adaptive Learning
      • Use adaptive learning so that new, previously unknown malware can be learned as known, and will then be discarded at an early phase
  • 11. Suggestion of Model (Phase 1, Phase 2, Phase 3, Phase 4)
  • 12. Phase 1
  • 13. P1 Descriptions
      • Preprocessing work includes extracting API calls, or any other useful information, from the malware binaries
      • Feature selection is performed in this phase
  • 14. Features
      • The features in this case are API calls
      • The fewer API calls that need to be used, the better
      • A dimension reduction method is used to achieve this
      • As future work, we are considering adding entropy analysis of the packed binary body, in addition to API call profiling
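The slides do not name the dimension reduction method. As one possible stand-in, the sketch below ranks API calls by information gain and keeps only the top `k`; the function names, the toy samples and the labels are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(samples, labels, feature):
    """Entropy reduction obtained by splitting on presence of one API call.

    samples is a list of sets of API names, one set per binary.
    """
    present = [y for x, y in zip(samples, labels) if feature in x]
    absent = [y for x, y in zip(samples, labels) if feature not in x]
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in (present, absent) if part)
    return entropy(labels) - remainder


def select_top_k(samples, labels, k):
    """Return the k API calls with the highest information gain."""
    features = sorted(set().union(*samples))
    ranked = sorted(features,
                    key=lambda f: information_gain(samples, labels, f),
                    reverse=True)
    return ranked[:k]


# Toy data, invented for this example.
samples = [
    {"CreateMutex", "IsDebuggerPresent", "FindWindow"},
    {"IsDebuggerPresent", "gethostbyname"},
    {"FindWindow", "gethostbyname"},
    {"FindWindow"},
]
labels = ["malware", "malware", "benign", "benign"]
best = select_top_k(samples, labels, k=1)
print(best)
```

In this toy set, `IsDebuggerPresent` separates the two classes perfectly, so it ranks first while `gethostbyname`, which appears equally in both classes, carries no information.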
  • 15. Interesting API calls
      CreateMutex, NtasdfCreateFile, call shell32, advapi32.RegOpenKey, KERNEL32.CreateProcess, shdocvw, gethostbyname, advapi32.RegCreate, advapi32.RegSet, http://, OutputDebugString, FindWindow, IsDebuggerPresent
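One common way to feed such calls to a classifier is a binary presence vector. The sketch below is an assumption about the encoding, not something the slides specify: the vocabulary keeps only the unambiguous names from the list above, and the observed calls are made-up input.

```python
# Vocabulary drawn from the slide; ambiguous tokens are left out.
INTERESTING_APIS = [
    "CreateMutex", "RegOpenKey", "CreateProcess", "gethostbyname",
    "RegCreate", "RegSet", "OutputDebugString", "FindWindow",
    "IsDebuggerPresent",
]


def to_feature_vector(observed_calls, vocabulary=INTERESTING_APIS):
    """Map observed API calls to a 0/1 vector over the vocabulary.

    A DLL prefix such as "advapi32." or "KERNEL32." is stripped first,
    so "KERNEL32.CreateProcess" matches "CreateProcess".
    """
    names = {call.split(".")[-1] for call in observed_calls}
    return [1 if api in names else 0 for api in vocabulary]


# Hypothetical calls extracted from one binary.
vector = to_feature_vector(
    ["KERNEL32.CreateProcess", "IsDebuggerPresent", "ws2_32.gethostbyname"]
)
print(vector)
```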
  • 16. Phase 2
  • 17. P2 Descriptions
      • Malware is first categorized according to the common traits of generic malware
      • Next, it is categorized by the specific symptoms of each malware class (worm, trojan, virus)
      • A malware sample may combine the traits of all classes, but usually one feature is dominant
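The two-step categorization described here can be sketched as a generic stage followed by a specific stage that picks the dominant class. Everything concrete below is hypothetical: the rules stand in for trained classifiers, and the trait sets are invented placeholders, not real knowledge about worms, trojans or viruses.

```python
# Hypothetical indicators for the generic malware / non-malware stage.
GENERIC_INDICATORS = {"IsDebuggerPresent", "OutputDebugString", "CreateMutex"}

# Hypothetical per-class trait sets for the specific stage.
CLASS_TRAITS = {
    "worm": {"gethostbyname", "CreateProcess"},
    "trojan": {"RegCreate", "RegSet", "RegOpenKey"},
    "virus": {"CreateFile", "FindWindow"},
}


def generic_stage(features):
    """Stage 1: flag a sample as malware if it shows two or more indicators."""
    return "malware" if len(features & GENERIC_INDICATORS) >= 2 else "benign"


def specific_stage(features):
    """Stage 2: choose the class whose traits overlap the sample the most."""
    scores = {name: len(features & traits)
              for name, traits in CLASS_TRAITS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def categorize(features):
    """Run the generic stage first, then the specific stage on malware only."""
    if generic_stage(features) == "benign":
        return "benign"
    return specific_stage(features)


mixed = {"IsDebuggerPresent", "CreateMutex", "RegSet", "RegCreate",
         "gethostbyname"}
print(categorize(mixed))
```

The `mixed` sample shows both worm-like and trojan-like traits, and the stage 2 score picks the dominant one, mirroring the last bullet above.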
  • 18. Phase 3
  • 19. P3 Descriptions
      • Use ensemble based classification built from weak learners
      • Many weak learners, combined by voting, can give more accurate results
      • If a sample's class is unknown, it goes into the clustering phase
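A minimal sketch of the voting step follows, assuming plain unweighted majority voting (the slides do not say how votes are weighted). The `min_agreement` threshold, which routes low-consensus samples to clustering as "unknown", is an assumption introduced here, as are the three toy learners.

```python
from collections import Counter


def majority_vote(learners, sample, min_agreement=0.5):
    """Combine weak learners by unweighted voting.

    Returns the winning label, or "unknown" when the share of learners
    agreeing on it does not exceed min_agreement.
    """
    votes = Counter(learner(sample) for learner in learners)
    label, count = votes.most_common(1)[0]
    if count / len(learners) <= min_agreement:
        return "unknown"
    return label


# Three toy weak learners, each looking at a single API call.
learners = [
    lambda s: "malware" if "IsDebuggerPresent" in s else "benign",
    lambda s: "malware" if "CreateMutex" in s else "benign",
    lambda s: "malware" if "gethostbyname" in s else "benign",
]

sample = {"IsDebuggerPresent", "CreateMutex"}
print(majority_vote(learners, sample))                     # two of three agree
print(majority_vote(learners, sample, min_agreement=0.7))  # stricter threshold
```

Raising `min_agreement` trades fewer confident labels for more samples sent to the clustering phase.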
  • 20. Phase 4
  • 21. P4 Descriptions
      • A signature is created if the malware is new
      • The new signature is added to the current categorization
      • This shortens the detection cycle for the next malware
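The adaptive step can be pictured as a signature store that is consulted before the full pipeline runs. The slides do not define what a signature is; hashing the sorted set of API calls is purely an assumption here, and `SignatureStore`, `detect` and `slow_pipeline` are hypothetical names.

```python
import hashlib


class SignatureStore:
    """Keeps signatures of already-categorized malware."""

    def __init__(self):
        self._known = {}  # signature -> category

    @staticmethod
    def signature(features):
        # Assumed signature: SHA-256 over the sorted API call names.
        canonical = ",".join(sorted(features))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def lookup(self, features):
        return self._known.get(self.signature(features))

    def learn(self, features, category):
        self._known[self.signature(features)] = category


def detect(store, features, full_pipeline):
    """Return (category, path); path is "early" when the store had a match."""
    known = store.lookup(features)
    if known is not None:
        return known, "early"
    category = full_pipeline(features)
    store.learn(features, category)  # adaptive step: remember the new sample
    return category, "full"


pipeline_runs = []


def slow_pipeline(features):
    """Stand-in for phases 1 to 3; records each call for the demo."""
    pipeline_runs.append(features)
    return "cluster-0"


store = SignatureStore()
sample = {"CreateMutex", "IsDebuggerPresent"}
first = detect(store, sample, slow_pipeline)
second = detect(store, sample, slow_pipeline)
print(first, second, len(pipeline_runs))
```

The second call is answered from the store, so the expensive pipeline runs only once for a repeated sample.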
  • 22. The Dataset
      In malware research there is no standard dataset, unlike the intrusion detection area, which usually relies on the KDD/MIT Lincoln datasets.
      • We obtained malware samples from CyberSecurity Malaysia (CSM): about 2 GB of files, amounting to around 30,000 malware binaries
      • We have to build our own dataset to extract the features
      • This is considered preprocessing work
  • 23. Conclusion
      • A soft computing approach could assist in malware detection
      • Feature selection could help minimize feature processing
      • Ensemble methods could help improve malware categorization
      • Adaptive learning could help avoid redundant retraining in the next iteration
  • 25. Bibliography
      • Chouchane, M. R., Walenstein, A., and Lakhotia, A. (2007). Statistical signatures for fast filtering of instruction-substituting metamorphic malware. In Proceedings of the 2007 ACM workshop on Recurring malcode, WORM ’07, pages 31-37, New York, NY, USA. ACM.
      • Cohen, F. B. (1986). Computer viruses. PhD thesis, Los Angeles, CA, USA. AAI0559804.
      • Elovici, Y., Shabtai, A., Moskovitch, R., Tahan, G., and Glezer, C. (2007). Applying machine learning techniques for detection of malicious code in network traffic.