SlideShare uma empresa Scribd logo
1 de 16
Henry Pak
Solutions Architect
Machine Learning for the Elastic Stack
Anomalies in your data could indicate trouble
1
Spiked 404 errors
Web attack
IT Operational Analytics Security Analytics Business Analytics
Unusual DNS activity
Data exfiltration
Rare log messages
Failing sensor
Operational Analytics
• Is my website seeing unusual traffic volume?
• Are bots or attackers visiting my website?
• Do I worry about the database errors in my logs?
Use Case
Security Analytics
• Has my system been compromised by malware?
• Could one of my users be an insider threat?
• Is there indication of data theft in my DNS logs?
Use Case
Telemetry / Sensors
 Is the unusual latency spike from a ISP outage?
 Which trucks in my fleet show unusual driving pattern?
 Does this rare event type indicate a failing sensor?
Use Case
5
Detecting (noteworthy) anomalies is hard!
• Data is complex, high dimensional, fast moving
• Human inspection is not practical
• Easy to miss things
Visual
inspection is
not practical
Where’s the anomaly?
6
Detecting (noteworthy) anomalies is hard!
• Defining “normal” via static thresholds is hard
• Rules don’t evolve with data / infrastructure
• Rules can be bypassed
Rule-based
alerts are
insufficient
What’s the right threshold ?
X-Pack solves this with automated anomaly detection
• Uses unsupervised machine learning techniques to
 Learn what’s “normal” by modeling historic behavior
 Detect anomalies when data falls outside expected bounds
7
X-Pack solves this with automated anomaly detection
• Unsupervised techniques - no manual training / input needed
• Evolves with the data - “online” model learns continuously
• Influencer detection - accelerates root cause identification
8
Detect anomalies of different types
• Time series - single / multiple
• Outliers in population (using entity profiling)
• Rare / unusual rates in “categories” of events
9
Anomalies in temporal pattern
• Single (univariate) time series
Example: Is there unusual traffic on website ?
10
Time
Metric
Anomalies in temporal pattern
• Multiple time series
 Multiple metrics
 Single metric split by a field;
• Each series modeled
independently
Example:
Is there unusual web activity
from any country?
11
Time
Metric
USAUKFranceChina
Outliers in population (using entity profiling)
• Create a profile for a “typical” entity (server, user, IP, etc.) in a population
• Detects entities (outlier) that deviate from the typical profile
Example:
• Which IP address is not like the others?
(indication of a bot / attacker)
12
Outliers in population (using entity profiling)
• Create a profile for a “typical” entity (server, user, IP, etc.) in a population
• Detects entities (outlier) that deviate from the typical profile
Example:
• Which IP address is not like the others?
(indication of a bot / attacker)
13
Unusual or rare events (via log categorization)
14
• Classify raw messages into groups based on similarity
• Models frequencies of each message category over time
• Spot anomalous in message groups
Example:
• Do my application logs contain unusual messages
DEMO
15

Mais conteúdo relacionado

Destaque

Destaque (20)

Bde presentatie bakker_bart_20170920
Bde presentatie bakker_bart_20170920Bde presentatie bakker_bart_20170920
Bde presentatie bakker_bart_20170920
 
Notilyze SAS
Notilyze SASNotilyze SAS
Notilyze SAS
 
Incident response on a shoestring budget
Incident response on a shoestring budgetIncident response on a shoestring budget
Incident response on a shoestring budget
 
Building Blocks Big Data Expo
Building Blocks Big Data ExpoBuilding Blocks Big Data Expo
Building Blocks Big Data Expo
 
De groote de man Ingrid de Poorter
De groote de man Ingrid de PoorterDe groote de man Ingrid de Poorter
De groote de man Ingrid de Poorter
 
Bde presentation dv
Bde presentation dvBde presentation dv
Bde presentation dv
 
If-If-If-If
If-If-If-IfIf-If-If-If
If-If-If-If
 
Crossyn
CrossynCrossyn
Crossyn
 
What should I do when my website got hack?
What should I do when my website got hack?What should I do when my website got hack?
What should I do when my website got hack?
 
Dell hans timmerman v1.1
Dell hans timmerman v1.1Dell hans timmerman v1.1
Dell hans timmerman v1.1
 
Presentatie big data expo swarovski
Presentatie big data expo swarovskiPresentatie big data expo swarovski
Presentatie big data expo swarovski
 
Java start01 in 2hours
Java start01 in 2hoursJava start01 in 2hours
Java start01 in 2hours
 
Datasnap web client
Datasnap web clientDatasnap web client
Datasnap web client
 
NUS-ISS Learning Day 2016 - Big Data Analytics
NUS-ISS Learning Day 2016 - Big Data AnalyticsNUS-ISS Learning Day 2016 - Big Data Analytics
NUS-ISS Learning Day 2016 - Big Data Analytics
 
Polar Bears Mario
Polar Bears MarioPolar Bears Mario
Polar Bears Mario
 
Picnic Big Data Expo
Picnic Big Data ExpoPicnic Big Data Expo
Picnic Big Data Expo
 
Mainframe Customer Education Webcast: New Ironstream Facilities for Enhanced ...
Mainframe Customer Education Webcast: New Ironstream Facilities for Enhanced ...Mainframe Customer Education Webcast: New Ironstream Facilities for Enhanced ...
Mainframe Customer Education Webcast: New Ironstream Facilities for Enhanced ...
 
Travelbird
TravelbirdTravelbird
Travelbird
 
ProRail Laurens Koppenol & Paul van der Voort
ProRail Laurens Koppenol & Paul van der VoortProRail Laurens Koppenol & Paul van der Voort
ProRail Laurens Koppenol & Paul van der Voort
 
Elasticsearch 5.0 les nouveautés
Elasticsearch 5.0 les nouveautésElasticsearch 5.0 les nouveautés
Elasticsearch 5.0 les nouveautés
 

Semelhante a Anomaly Detection in Time-Series Data using the Elastic Stack by Henry Pak

Vest Forensics presentation owasp benelux days 2012 leuven
Vest Forensics presentation owasp benelux days 2012 leuvenVest Forensics presentation owasp benelux days 2012 leuven
Vest Forensics presentation owasp benelux days 2012 leuven
Marc Hullegie
 
Delivering Security Insights with Data Analytics and Visualization
Delivering Security Insights with Data Analytics and VisualizationDelivering Security Insights with Data Analytics and Visualization
Delivering Security Insights with Data Analytics and Visualization
Raffael Marty
 

Semelhante a Anomaly Detection in Time-Series Data using the Elastic Stack by Henry Pak (20)

IT Operation Analytic for security- MiSSconf(sp1)
IT Operation Analytic for security- MiSSconf(sp1)IT Operation Analytic for security- MiSSconf(sp1)
IT Operation Analytic for security- MiSSconf(sp1)
 
Technical track chris calvert-1 30 pm-issa conference-calvert
Technical track chris calvert-1 30 pm-issa conference-calvertTechnical track chris calvert-1 30 pm-issa conference-calvert
Technical track chris calvert-1 30 pm-issa conference-calvert
 
CNIT 152 11 Analysis Methodology
CNIT 152 11 Analysis MethodologyCNIT 152 11 Analysis Methodology
CNIT 152 11 Analysis Methodology
 
Building a Next-Generation Security Operations Center (SOC)
Building a Next-Generation Security Operations Center (SOC)Building a Next-Generation Security Operations Center (SOC)
Building a Next-Generation Security Operations Center (SOC)
 
IDS for Security Analysts: How to Get Actionable Insights from your IDS
IDS for Security Analysts: How to Get Actionable Insights from your IDSIDS for Security Analysts: How to Get Actionable Insights from your IDS
IDS for Security Analysts: How to Get Actionable Insights from your IDS
 
11 Analysis Methodology
11 Analysis Methodology11 Analysis Methodology
11 Analysis Methodology
 
intrusion detection system (IDS)
intrusion detection system (IDS)intrusion detection system (IDS)
intrusion detection system (IDS)
 
Vest Forensics presentation owasp benelux days 2012 leuven
Vest Forensics presentation owasp benelux days 2012 leuvenVest Forensics presentation owasp benelux days 2012 leuven
Vest Forensics presentation owasp benelux days 2012 leuven
 
Nicola Pagni - Anomaly Detection in Elasticsearch
Nicola Pagni - Anomaly Detection in ElasticsearchNicola Pagni - Anomaly Detection in Elasticsearch
Nicola Pagni - Anomaly Detection in Elasticsearch
 
Memory forensics and incident response
Memory forensics and incident responseMemory forensics and incident response
Memory forensics and incident response
 
All your logs are belong to you!
All your logs are belong to you!All your logs are belong to you!
All your logs are belong to you!
 
All Your Security Events Are Belong to ... You!
All Your Security Events Are Belong to ... You!All Your Security Events Are Belong to ... You!
All Your Security Events Are Belong to ... You!
 
Delivering Security Insights with Data Analytics and Visualization
Delivering Security Insights with Data Analytics and VisualizationDelivering Security Insights with Data Analytics and Visualization
Delivering Security Insights with Data Analytics and Visualization
 
Threat Hunting by Falgun Rathod - Cyber Octet Private Limited
Threat Hunting by Falgun Rathod - Cyber Octet Private LimitedThreat Hunting by Falgun Rathod - Cyber Octet Private Limited
Threat Hunting by Falgun Rathod - Cyber Octet Private Limited
 
Defcon 23 - damon small - beyond the scan
Defcon 23 - damon small - beyond the scanDefcon 23 - damon small - beyond the scan
Defcon 23 - damon small - beyond the scan
 
Splunk live! Customer Presentation – Prelert
Splunk live! Customer Presentation – PrelertSplunk live! Customer Presentation – Prelert
Splunk live! Customer Presentation – Prelert
 
Log Mining: Beyond Log Analysis
Log Mining: Beyond Log AnalysisLog Mining: Beyond Log Analysis
Log Mining: Beyond Log Analysis
 
SpiceWorks Webinar: Whose logs, what logs, why logs
SpiceWorks Webinar: Whose logs, what logs, why logs  SpiceWorks Webinar: Whose logs, what logs, why logs
SpiceWorks Webinar: Whose logs, what logs, why logs
 
CNIT 121: 11 Analysis Methodology
CNIT 121: 11 Analysis MethodologyCNIT 121: 11 Analysis Methodology
CNIT 121: 11 Analysis Methodology
 
Identify and Stop Insider Threats
Identify and Stop Insider ThreatsIdentify and Stop Insider Threats
Identify and Stop Insider Threats
 

Mais de Data Con LA

Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA
 
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA
 
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA
 

Mais de Data Con LA (20)

Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
 
Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 Keynote
 
Data Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup Showcase
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 Keynote
 
Data Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendations
 
Data Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA 2022 - AI Ethics
Data Con LA 2022 - AI Ethics
 
Data Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learning
 
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
 
Data Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentation
 
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
 
Data Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWS
 
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
 
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
 
Data Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data Science
 
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
 
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
 
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
 
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
 
Data Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with Kafka
 

Último

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Último (20)

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 

Anomaly Detection in Time-Series Data using the Elastic Stack by Henry Pak

  • 1. Henry Pak Solutions Architect Machine Learning for the Elastic Stack
  • 2. Anomalies in your data could indicate trouble 1 Spiked 404 errors Web attack IT Operational Analytics Security Analytics Business Analytics Unusual DNS activity Data exfiltration Rare log messages Failing sensor
  • 3. Operational Analytics • Is my website seeing unusual traffic volume? • Are bots or attackers visiting my website? • Do I worry about the database errors in my logs? Use Case
  • 4. Security Analytics • Has my system been compromised by malware? • Could one of my users be an insider threat? • Is there indication of data theft in my DNS logs? Use Case
  • 5. Telemetry / Sensors  Is the unusual latency spike from a ISP outage?  Which trucks in my fleet show unusual driving pattern?  Does this rare event type indicate a failing sensor? Use Case
  • 6. 5 Detecting (noteworthy) anomalies is hard! • Data is complex, high dimensional, fast moving • Human inspection is not practical • Easy to miss things Visual inspection is not practical Where’s the anomaly?
  • 7. 6 Detecting (noteworthy) anomalies is hard! • Defining “normal” via static thresholds is hard • Rules don’t evolve with data / infrastructure • Rules can be bypassed Rule-based alerts are insufficient What’s the right threshold ?
  • 8. X-Pack solves this with automated anomaly detection • Uses unsupervised machine learning techniques to  Learn what’s “normal” by modeling historic behavior  Detect anomalies when data falls outside expected bounds 7
  • 9. X-Pack solves this with automated anomaly detection • Unsupervised techniques - no manual training / input needed • Evolves with the data - “online” model learns continuously • Influencer detection - accelerates root cause identification 8
  • 10. Detect anomalies of different types • Time series - single / multiple • Outliers in population (using entity profiling) • Rare / unusual rates in “categories” of events 9
  • 11. Anomalies in temporal pattern • Single (univariate) time series Example: Is there unusual traffic on website ? 10 Time Metric
  • 12. Anomalies in temporal pattern • Multiple time series  Multiple metrics  Single metric split by a field; • Each series modeled independently Example: Is there unusual web activity from any country? 11 Time Metric USAUKFranceChina
  • 13. Outliers in population (using entity profiling) • Create a profile for a “typical” entity (server, user, IP, etc.) in a population • Detects entities (outlier) that deviate from the typical profile Example: • Which IP address is not like the others? (indication of a bot / attacker) 12
  • 14. Outliers in population (using entity profiling) • Create a profile for a “typical” entity (server, user, IP, etc.) in a population • Detects entities (outlier) that deviate from the typical profile Example: • Which IP address is not like the others? (indication of a bot / attacker) 13
  • 15. Unusual or rare events (via log categorization) 14 • Classify raw messages into groups based on similarity • Models frequencies of each message category over time • Spot anomalous in message groups Example: • Do my application logs contain unusual messages