Anomaly Detection (10.1 ~ 10.3)
Khalid Elshafie
abolkog@dblab.cbnu.ac.kr
Database / Bioinformatics Lab.
Chungbuk National University

Contents
1. Introduction
2. Statistical Approach
3. Proximity-based Approach

Introduction (1/4)
Anomaly Detection
Find objects that are different from most other objects. Anomalous objects are often known as outliers: on a scatter plot of the data, they lie far away from the other data points.
Also known as:
Deviation detection. Anomalous objects have attribute values that deviate significantly from the expected or typical attribute values.
Exception mining. Anomalies are exceptional in some sense.

Introduction (2/4)
Applications
Fraud detection. The purchasing behavior of someone who steals a credit card is probably different from that of the original owner.
Intrusion detection. Attacks on computer systems and computer networks.
Ecosystem disturbance. Hurricanes, floods, heat waves, etc.
Medicine. Unusual symptoms or test results may indicate a potential health problem.

Introduction (3/4)
What causes anomalies
Data from different sources. Someone who commits credit card fraud belongs to a different class than the people who use credit cards legitimately. Such anomalies are often of considerable interest and are the focus of anomaly detection in the field of data mining. Hawkins' definition of an outlier: an outlier is an observation that differs so much from other observations as to arouse suspicion that it was generated by a different mechanism.
Natural variation. Many data sets can be modeled by a statistical distribution in which the probability of a data object decreases rapidly as the distance of the object from the center of the distribution increases. Most objects are near a center (an average object), and the likelihood that an object differs significantly from this average is small. Anomalies that represent extreme or unlikely variations are often interesting.
Data measurement and collection errors. Errors in the data collection or measurement process are another source of anomalies. The goal is to eliminate such anomalies, since they provide no interesting information and only reduce the quality of the data and of the subsequent data analysis.

Introduction (4/4)
Approaches to Anomaly Detection
Model-based techniques. Build a model of the data. Anomalies are objects that do not fit the model very well.
Proximity-based techniques. Many of the techniques in this area are based on distances and are referred to as distance-based outlier detection techniques. Anomalous objects are those that are distant from most of the other objects.
Density-based techniques. Objects that are in regions of low density are relatively distant from their neighbors and can be considered anomalous.

Statistical Approach (1/2)
Statistical approaches are model-based approaches: a model is created for the data, and objects are evaluated with respect to how well they fit the model.
Most statistical approaches to outlier detection are based on building a probability distribution model and considering how likely objects are under that model.
Probabilistic definition of an outlier: an outlier is an object that has a low probability with respect to a probability distribution model of the data.

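The probabilistic definition above can be illustrated with a minimal sketch. It assumes a single numeric attribute and a normal (Gaussian) model fitted by the sample mean and standard deviation; the slides do not prescribe a particular distribution, and the function names `normal_density` and `rank_by_density` are hypothetical.

```python
import math
import statistics


def normal_density(x, mu, sigma):
    """Probability density of x under a normal distribution N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def rank_by_density(values):
    """Return the values ordered from least likely to most likely.

    Assumption: the data are modeled by one normal distribution whose
    parameters are estimated from the data themselves. Objects with low
    density under this model are the outlier candidates.
    """
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)  # sample standard deviation
    if sigma == 0:
        raise ValueError("all values are identical; no object stands out")
    return sorted(values, key=lambda x: normal_density(x, mu, sigma))


readings = [10.1, 9.8, 10.0, 10.2, 9.9, 10.0, 10.1, 9.9, 10.0, 25.0]
least_likely = rank_by_density(readings)[0]
```

Here `least_likely` is the reading with the lowest density under the fitted model. Note that the extreme value itself inflates the estimated mean and standard deviation, which is one reason the choice of model and test matters.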
Statistical Approach (2/2)
Strengths and weaknesses
Statistical approaches have a firm foundation and build on standard statistical techniques.
When there is sufficient knowledge of the data and of the type of test that should be applied, these tests can be very effective.
There is a wide variety of statistical outlier tests for single attributes, but fewer options are available for multivariate data.
These tests can perform poorly on high-dimensional data.

Proximity-based Approach (1/3)
The basic notion of this approach is straightforward: an object is an anomaly if it is distant from most points.
It is more general and more easily applied than statistical approaches, because it is easier to determine a meaningful proximity measure for a data set than to determine its statistical distribution.
One of the simplest ways to measure whether an object is distant from most points is to use the distance to its k-th nearest neighbor.
The outlier score of an object is given by the distance to its k-th nearest neighbor.
The lowest value of the outlier score is 0.
The highest value is the maximum possible value of the distance function (usually infinity).

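A minimal sketch of the k-th nearest neighbor outlier score described above. It assumes Euclidean distance and a brute-force search over all pairs; the function name `knn_outlier_scores` is hypothetical, and any other proximity measure could be substituted.

```python
import math


def knn_outlier_scores(points, k):
    """Outlier score of each point: distance to its k-th nearest neighbor.

    Assumptions: points are equal-length numeric tuples, distance is
    Euclidean, and a point is not counted as its own neighbor.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    scores = []
    for i, p in enumerate(points):
        distances = sorted(
            math.dist(p, q) for j, q in enumerate(points) if j != i
        )
        scores.append(distances[k - 1])
    return scores


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = knn_outlier_scores(points, k=2)
most_anomalous = scores.index(max(scores))
```

With these five points, the isolated point at index 4 receives the largest score, while the four points of the unit square each score 1.0.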
Proximity-based Approach (2/4)
Approach: compute the distance between every pair of data points.
There are various ways to define outliers:
Data points for which there are fewer than p neighboring points within a distance D.
The top n data points whose distance to the k-th nearest neighbor is greatest.
The top n data points whose average distance to the k nearest neighbors is greatest.

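The three definitions above can be sketched side by side. This is an illustration under stated assumptions, not a reference implementation: distance is Euclidean, "within a distance D" is read as less than or equal to D, all pairwise distances are computed by brute force, and every function name is hypothetical.

```python
import math


def neighbor_distances(points):
    """For each point, the sorted distances to every other point."""
    return [
        sorted(math.dist(a, b) for j, b in enumerate(points) if j != i)
        for i, a in enumerate(points)
    ]


def top_n(scores, n):
    """Indices of the n largest scores, largest first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:n]


def sparse_neighborhood_outliers(points, p, D):
    """Definition 1: fewer than p neighboring points within distance D."""
    return [
        i
        for i, dists in enumerate(neighbor_distances(points))
        if sum(1 for d in dists if d <= D) < p
    ]


def top_n_by_kth_distance(points, k, n):
    """Definition 2: top n points by distance to the k-th nearest neighbor."""
    scores = [dists[k - 1] for dists in neighbor_distances(points)]
    return top_n(scores, n)


def top_n_by_average_distance(points, k, n):
    """Definition 3: top n points by average distance to the k nearest neighbors."""
    scores = [sum(dists[:k]) / k for dists in neighbor_distances(points)]
    return top_n(scores, n)


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
by_count = sparse_neighborhood_outliers(points, p=2, D=2.0)
by_kth = top_n_by_kth_distance(points, k=2, n=1)
by_average = top_n_by_average_distance(points, k=2, n=1)
```

On this small data set all three definitions single out the point at index 4. On real data they can disagree, since the first depends on the thresholds p and D and the other two depend on k and n.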
Proximity-based Approach (3/4)
The outlier score can be highly sensitive to the value of k.
If k is too small (e.g., 1), then a small number of nearby outliers can cause a low outlier score.
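The sensitivity to k can be seen in a small constructed example. The sketch below assumes Euclidean distance and uses a made-up data set in which two outliers sit close to each other; the function name `kth_neighbor_score` is hypothetical.

```python
import math


def kth_neighbor_score(points, index, k):
    """Distance from points[index] to its k-th nearest neighbor (Euclidean)."""
    distances = sorted(
        math.dist(points[index], q)
        for j, q in enumerate(points)
        if j != index
    )
    return distances[k - 1]


# Four clustered points plus two outliers that lie close to each other.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 10.5)]

# k = 1: each outlier finds the other one nearby, so its score is low.
outlier_k1 = kth_neighbor_score(points, 4, k=1)
cluster_k1 = kth_neighbor_score(points, 0, k=1)

# k = 2: the second neighbor of the outlier lies in the distant cluster.
outlier_k2 = kth_neighbor_score(points, 4, k=2)
cluster_k2 = kth_neighbor_score(points, 0, k=2)
```

With k = 1 the outlier at index 4 scores 0.5, lower than the 1.0 of a regular cluster point, so it would be missed. With k = 2 its score jumps to the distance to the cluster, and it stands out again.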
