BASLE BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENEVA
HAMBURG COPENHAGEN LAUSANNE MUNICH STUTTGART VIENNA ZURICH
Anomaly Detection
database integrated
Dr. Olaf Nimz
Our company.
anomaly detection 08/01/2018
Trivadis is a market leader in IT consulting, system integration, solution engineering and the provision of IT services focusing on … and … technologies in Switzerland, Germany, Austria and Denmark. We offer our services in the following strategic business fields:
Operation: Trivadis Services takes over the interacting operation of your IT systems.
Scoring Engine for R
EXEC sp_execute_external_script
    @language = N'R',
    -- SQL part: the query result is passed to @script as InputDataSet
    @input_data_1 = N'SELECT 1 AS Installed',
    -- R part: receives @input_data_1 as InputDataSet and returns OutputDataSet
    @script = N'OutputDataSet <- InputDataSet'
WITH RESULT SETS (([Installed] int NOT NULL));
GO
Architecture: Microsoft R Server, Launchpad (BxlServer and SQL Satellite, Rserver.dll)
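A more useful `@script` body might score the input instead of echoing it. The sketch below flags values of a numeric column using a robust z-score (median and MAD). The column names `id` and `value` are hypothetical, and so is the cut-off of 3.5, which is a common rule of thumb. The `WITH RESULT SETS` clause would have to declare the four returned columns. The `exists()` guard only simulates `InputDataSet` so the snippet also runs in a plain R session.

```r
# Hypothetical @script body: robust z-score outlier flag for column "value".
# Inside SQL Server, InputDataSet comes from @input_data_1; here it is simulated.
if (!exists("InputDataSet")) {
  InputDataSet <- data.frame(id = 1:8,
                             value = c(10, 11, 9, 10, 12, 10, 11, 50))
}

x <- InputDataSet$value
# mad() is scaled by 1.4826 so it estimates the sd for normal data
robust_z <- (x - median(x)) / mad(x)

OutputDataSet <- data.frame(id         = InputDataSet$id,
                            value      = x,
                            robust_z   = robust_z,
                            is_outlier = as.integer(abs(robust_z) > 3.5))
OutputDataSet
```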
Agenda
1. Prioritise data quality effort
Cleansing DWH
IoT Streams
Online Learning
2. Unsupervised Measures
Mahalanobis distance
Clustering
Local Outlier Factor
Isolation Forest
Variational AutoEncoders
Novelty, Noise, Outlier, Anomaly, Fraud, Instability
1. Special observations in relation to a baseline
Contextual
2. Suspicious observations
Novelty
Outlier / Anomaly
Data quality issue
Unstable process
Random noise
Local - Global
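High-dimensional data: distances lose contrast (the nearest and the farthest neighbours are almost equally far away), so plain distance becomes nearly meaningless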
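A small simulation illustrates the distance concentration behind the "distance is meaningless" remark. It draws uniform random points and reports the ratio of the farthest to the nearest distance from a random query point. The point counts and dimensions are arbitrary choices for illustration.

```r
# Distance concentration: with growing dimension the ratio between the
# farthest and the nearest neighbour of a query point approaches 1.
set.seed(42)

contrast <- function(d, n = 500) {
  pts   <- matrix(runif(n * d), nrow = n)    # n uniform points in [0,1]^d
  query <- runif(d)                          # one random query point
  dists <- sqrt(colSums((t(pts) - query)^2)) # Euclidean distances to the query
  max(dists) / min(dists)
}

dims   <- c(2, 10, 100, 1000)
ratios <- sapply(dims, contrast)
data.frame(dimensions = dims, max_over_min = round(ratios, 2))
```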

Approach for Detection
1. Statistical distribution
Entropy
2. Deviation from normal behaviour
Sequence of events
Conditional (temporal or spatial context)
Collective, e.g. a DoS attack
3. Distance to the neighbourhood
4. Local density
5. High-dimensional adaptations
Subspace projection
Angle-based
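For the statistical-distribution approach, entropy summarises how spread out a categorical variable is. The surprisal −log2(p) of each value shows how unexpected a single observation is. The event codes and the cut-off of 5 bits (p < 1/32) below are hypothetical; this is a sketch, not a complete detector.

```r
# Shannon entropy of a categorical variable and per-observation surprisal
status <- c(rep("OK", 95), rep("WARN", 4), "E42")   # hypothetical event codes

p       <- table(status) / length(status)           # empirical probabilities
entropy <- -sum(p * log2(p))                        # Shannon entropy in bits

surprisal <- -log2(as.numeric(p[status]))           # bits of surprise per observation
rare      <- unique(status[surprisal > 5])          # assumed cut-off: p < 1/32

entropy
rare
```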
 


Univariate Extreme Values
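IQR = interquartile range (25th to 75th percentile); median = 50th percentile
About 95% of normally distributed values lie within ±2 standard deviations; only about 2% lie beyond +2 stdev on each side
Grubbs' test per point
Scaling by z-scores (robust version: median and median absolute deviation)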
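Two of the univariate checks above can be sketched in base R. The first uses Tukey fences at 1.5 × IQR. The second applies a two-sided Grubbs' test repeatedly, removing one point per round. Grubbs' test assumes the remaining data are roughly normal, and its repeated use can suffer from masking when several outliers are present. The sample vector is made up.

```r
# Tukey fences: values more than k * IQR outside the quartiles
tukey_outliers <- function(x, k = 1.5) {
  q   <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  x[x < q[1] - k * iqr | x > q[2] + k * iqr]
}

# Iterative two-sided Grubbs' test: test the most extreme point, remove it if
# significant, and repeat on the remaining data
grubbs_outliers <- function(x, alpha = 0.05) {
  found <- numeric(0)
  repeat {
    n <- length(x)
    if (n < 3) break
    dev    <- abs(x - mean(x))
    G      <- max(dev) / sd(x)                                  # Grubbs statistic
    t2     <- qt(alpha / (2 * n), n - 2, lower.tail = FALSE)^2
    G_crit <- (n - 1) / sqrt(n) * sqrt(t2 / (n - 2 + t2))       # critical value
    if (G <= G_crit) break
    i     <- which.max(dev)
    found <- c(found, x[i])
    x     <- x[-i]
  }
  found
}

x <- c(9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 14.5)
tukey_outliers(x)
grubbs_outliers(x)
```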
Multivariate data
Robust Mahalanobis distance
chisq.plot() (2 dimensions)
No outlier?
SCADA data of a wind turbine
Power Curve – Deviation from Prediction
lm(power ~ wind_speed + I(wind_speed^2) + I(wind_speed^3), data = data)
Adjusted R² = 95% (sample)
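The deviation-from-prediction idea can be sketched by fitting the cubic power curve and then flagging observations whose residuals are far from the bulk, measured with a robust z-score. The data are simulated rather than real SCADA data, and the curve shape, the two planted drops and the threshold of 4 are all hypothetical.

```r
# Simulated wind speed / power data (not real SCADA data)
set.seed(7)
n <- 300
wind_speed <- runif(n, 3, 12)                         # below rated wind speed
power      <- 0.5 * wind_speed^3 + rnorm(n, sd = 20)  # hypothetical power curve
power[c(50, 120)] <- power[c(50, 120)] - 250          # e.g. curtailment or a fault
data <- data.frame(wind_speed, power)

# Cubic power-curve model as on the slide
fit <- lm(power ~ wind_speed + I(wind_speed^2) + I(wind_speed^3), data = data)
summary(fit)$adj.r.squared

# Robust z-score of the residuals: deviation from the predicted power
res     <- residuals(fit)
z       <- (res - median(res)) / mad(res)
suspect <- which(abs(z) > 4)                          # assumed threshold
suspect
```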
Boxplot – univariate
Mean and median per variable: Power, WindSpeed, Temperature
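A per-column boxplot screen can be sketched with boxplot.stats(). Its `$out` component holds the points beyond the whiskers, which lie 1.5 × IQR from the hinges. Printing mean and median side by side helps because a large gap hints at skew or extreme values. The data frame below is simulated, with one planted extreme value per column.

```r
# Simulated SCADA-like columns, each with one planted extreme value
set.seed(3)
scada <- data.frame(
  Power       = c(rnorm(99, 500, 50), 2000),
  WindSpeed   = c(rnorm(99, 8, 1.5), 35),
  Temperature = c(rnorm(99, 15, 3), -40)
)

# Mean vs. median and the number of points beyond the boxplot whiskers
summary_tbl <- data.frame(
  mean   = sapply(scada, mean),
  median = sapply(scada, median),
  n_out  = sapply(scada, function(col) length(boxplot.stats(col)$out))
)
summary_tbl
```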
Wind distribution
Mahalanobis Distance
Multivariate: multidimensional
Scale: how many standard deviations away from the center?
Mahalanobis distance candidates
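The "how many standard deviations" reading of the Mahalanobis distance is clearest with two hand-picked candidates in strongly correlated data. Candidate A lies far out but along the correlation. Candidate B is moderate in each coordinate but contradicts the correlation. The simulated data and both candidate points are assumptions for illustration.

```r
# Correlated 2-D data (correlation about 0.9)
set.seed(11)
z <- matrix(rnorm(2000), ncol = 2)
x <- cbind(z[, 1], 0.9 * z[, 1] + sqrt(1 - 0.9^2) * z[, 2])

center <- colMeans(x)
S      <- cov(x)

candidates <- rbind(A = c(2, 2),      # far out, but along the correlation
                    B = c(1.5, -1.5)) # moderate values, against the correlation

euclid <- sqrt(rowSums(sweep(candidates, 2, center)^2))
mahal  <- sqrt(mahalanobis(candidates, center, S))  # in "standard deviations"
round(cbind(euclid, mahal), 2)
```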
Outliers in the original space
Eigenvector Space
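In eigenvector space the Mahalanobis distance becomes an ordinary Euclidean distance. The steps are to rotate the centred data onto the eigenvectors of the covariance matrix and to divide each component by the square root of its eigenvalue ("whitening"). The sketch checks this numerically on simulated 3-D data.

```r
# Simulated correlated 3-D data (mixing matrix chosen arbitrarily, full rank)
set.seed(5)
x <- matrix(rnorm(600), ncol = 3) %*% matrix(c(2, 0.5, 0, 0, 1, 0.3, 0, 0, 0.5), 3)

center <- colMeans(x)
e      <- eigen(cov(x), symmetric = TRUE)

scores   <- sweep(x, 2, center) %*% e$vectors       # coordinates in eigenvector space
whitened <- sweep(scores, 2, sqrt(e$values), "/")   # unit variance per component

d2_eigen <- rowSums(whitened^2)                     # squared Euclidean distance
d2_mahal <- mahalanobis(x, center, cov(x))          # squared Mahalanobis distance
all.equal(d2_eigen, d2_mahal)
```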
Overview of Outliers
Leland Wilkinson's probabilistic HDoutliers model, which handles a mixture of numeric and categorical variables
Panels: 1D, 2D, 3D, 4D
HDBSCAN
Only borderline and core cases; variables scaled by z-scores
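The core / border / noise distinction can be sketched with a minimal DBSCAN-style labelling in base R. This is plain DBSCAN point classification, not HDBSCAN itself; as far as I know, the dbscan package on CRAN provides hdbscan(). Noise points are the outlier candidates. The values of `eps` and `min_pts` and the simulated data are assumptions.

```r
# core:   at least min_pts points (itself included) within eps
# border: not core, but within eps of a core point
# noise:  everything else (outlier candidates)
classify_points <- function(x, eps, min_pts) {
  d      <- as.matrix(dist(scale(x)))   # Euclidean distances on z-scores
  nbrs   <- d <= eps                    # neighbourhood matrix, incl. the point itself
  core   <- rowSums(nbrs) >= min_pts
  border <- !core & rowSums(nbrs[, core, drop = FALSE]) > 0
  ifelse(core, "core", ifelse(border, "border", "noise"))
}

set.seed(9)
x <- rbind(matrix(rnorm(100, 0, 0.3), ncol = 2),   # dense cluster 1
           matrix(rnorm(100, 3, 0.3), ncol = 2),   # dense cluster 2
           c(1.5, -2))                             # isolated point (row 201)
labels <- classify_points(x, eps = 0.3, min_pts = 5)
table(labels)
which(labels == "noise")
```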
Local Outlier Factor
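Reachability distance: the distance from a point to a neighbour, but at least that neighbour's k-distance (its distance to its own k-th nearest neighbour); it measures how easily the point can be "reached" from its neighbours.
LOF: the local reachability density of a point relative to that of its k nearest neighbours; values clearly above 1 indicate outliers.
Figure: k-nearest neighbours and reachability distance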
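The LOF definition can be written out directly in base R, following Breunig et al. It covers the k-distance, the reachability distance, the local reachability density (lrd) and the ratio of neighbour densities. Ties at the k-th distance are ignored for simplicity, and the value of `k` and the test data are assumptions.

```r
# k-distance(o)      = distance from o to its k-th nearest neighbour
# reach-dist_k(p, o) = max(k-distance(o), d(p, o))
# lrd(p)             = 1 / mean(reach-dist_k(p, o) over o in kNN(p))
# LOF(p)             = mean(lrd(o) over o in kNN(p)) / lrd(p)
lof <- function(x, k = 5) {
  d <- as.matrix(dist(x))
  diag(d) <- Inf                                      # a point is not its own neighbour
  knn    <- t(apply(d, 1, order))[, 1:k, drop = FALSE] # indices of the k nearest neighbours
  k_dist <- apply(d, 1, function(row) sort(row)[k])    # k-distance of every point
  n   <- nrow(d)
  lrd <- numeric(n)
  for (p in seq_len(n)) {
    o      <- knn[p, ]
    reach  <- pmax(k_dist[o], d[p, o])                 # reachability distances
    lrd[p] <- 1 / mean(reach)
  }
  sapply(seq_len(n), function(p) mean(lrd[knn[p, ]]) / lrd[p])
}

set.seed(2)
x <- rbind(matrix(rnorm(100, sd = 0.5), ncol = 2), c(4, 4))  # row 51 is isolated
scores <- lof(x, k = 5)
which.max(scores)
```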
Hierarchical Clustering
MDS plot coloured by cutree() clusters
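Hierarchical clustering can serve as an outlier screen: after cutting the tree, points that end up in very small clusters are candidates. The sketch uses hclust(), cutree() and cmdscale() for the 2-D MDS layout. The linkage method, the number of clusters and the minimum cluster size of 3 are hypothetical choices.

```r
set.seed(4)
x <- rbind(matrix(rnorm(60, 0, 1), ncol = 3),   # cluster around 0
           matrix(rnorm(60, 6, 1), ncol = 3),   # cluster around 6
           c(3, -5, 10))                        # isolated observation (row 41)

d      <- dist(scale(x))
tree   <- hclust(d, method = "average")
groups <- cutree(tree, k = 3)                   # assumed number of clusters
sizes  <- table(groups)
small  <- as.integer(names(sizes)[sizes < 3])   # assumed: clusters with < 3 members
candidates <- which(groups %in% small)

mds <- cmdscale(d, k = 2)                       # 2-D classical MDS coordinates
# plot(mds, col = groups, pch = 19)             # MDS plot coloured by cluster
candidates
```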
Isolation Forest (ensemble)
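A compact isolation forest can be written from its definition (Liu, Ting and Zhou, 2008) instead of using a package. Each tree splits a random sub-sample on a random feature at a random value until points are isolated or a height limit is reached. Anomalies need fewer splits, so their average path length is short and their score approaches 1. The tree count, sub-sample size and test data below are illustrative assumptions.

```r
# Average path length of an unsuccessful BST search (normalisation term)
c_factor <- function(n) {
  if (n <= 1) return(0)
  if (n == 2) return(1)
  2 * (log(n - 1) + 0.5772156649) - 2 * (n - 1) / n
}

# Grow one isolation tree: random feature, random split between min and max
build_tree <- function(x, depth, max_depth) {
  n <- nrow(x)
  if (depth >= max_depth || n <= 1) return(list(size = n))
  j   <- sample.int(ncol(x), 1)
  rng <- range(x[, j])
  if (rng[1] == rng[2]) return(list(size = n))   # constant feature: stop here
  s    <- runif(1, rng[1], rng[2])
  left <- x[, j] < s
  list(feature = j, split = s,
       left  = build_tree(x[left, , drop = FALSE],  depth + 1, max_depth),
       right = build_tree(x[!left, , drop = FALSE], depth + 1, max_depth))
}

# Path length of one point; leaves add c_factor(size) for unresolved points
path_length <- function(tree, point, depth = 0) {
  if (!is.null(tree[["size"]])) return(depth + c_factor(tree[["size"]]))
  child <- if (point[tree$feature] < tree$split) tree$left else tree$right
  path_length(child, point, depth + 1)
}

iforest_scores <- function(x, n_trees = 100, sample_size = 256) {
  psi       <- min(sample_size, nrow(x))
  max_depth <- ceiling(log2(psi))
  trees <- lapply(seq_len(n_trees), function(i) {
    build_tree(x[sample.int(nrow(x), psi), , drop = FALSE], 0, max_depth)
  })
  avg_path <- apply(x, 1, function(p) mean(sapply(trees, path_length, point = p)))
  2^(-avg_path / c_factor(psi))   # near 1: anomaly; well below 0.5: normal
}

set.seed(8)
x <- rbind(matrix(rnorm(400), ncol = 2), c(5, 5))   # row 201 is the planted anomaly
scores <- iforest_scores(x, n_trees = 100, sample_size = 128)
which.max(scores)
```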
Challenges
1. Thresholds are set manually; choosing them automatically is expensive
2. Mixed data types (numeric and categorical)
3. High-dimensional spaces
4. High cardinality (granularity)
5. Multi-modal data: global vs. local scope
6. Online (streaming) detection
http://projects.rajivshah.com/shiny/outlier/
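For the online challenge, one minimal approach keeps a running mean and variance with Welford's algorithm. Each new value is scored before the statistics are updated, so no history has to be stored. The warm-up length, the threshold of 4 standard deviations and the simulated stream are assumptions. Note that this sketch also updates the statistics with flagged values.

```r
# Returns a closure that scores each incoming value against the running
# mean / standard deviation (Welford's algorithm), then updates them
make_online_detector <- function(threshold = 4, warmup = 30) {
  n <- 0; mu <- 0; m2 <- 0
  function(value) {
    flag <- FALSE
    if (n >= warmup) {
      s    <- sqrt(m2 / (n - 1))          # running sample standard deviation
      flag <- abs(value - mu) / s > threshold
    }
    n     <<- n + 1                       # Welford update
    delta <- value - mu
    mu    <<- mu + delta / n
    m2    <<- m2 + delta * (value - mu)
    flag
  }
}

set.seed(6)
stream <- c(rnorm(200, 50, 2), 80, rnorm(50, 50, 2))   # value 201 is the spike
detect <- make_online_detector()
flags  <- vapply(stream, detect, logical(1))
which(flags)
```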
