
Detecting Hacks: Anomaly Detection on Networking Data


See https://medium.com/@jamessirota for a series of blog entries that goes with this deck...

Defense in Depth for Big Data
Network Anomaly Detection Overview
Volume Anomaly Detection
Feature Anomaly Detection
Model Architecture
Deployment on OpenSOC Platform
Questions

Published in: Software, Engineering

Detecting Hacks: Anomaly Detection on Networking Data

  1. Detecting Hacks: Anomaly Detection on Networking Data
     James Sirota (@JamesSirota), Lead Data Scientist – Managed Threat Defense
     Chester Parrott (@ParrottSquawk), Data Scientist – Managed Threat Defense
     June 2015
     © 2010 Cisco and/or its affiliates. All rights reserved.
  2. In the next few minutes…
     •  Defense in Depth for Big Data
     •  Network Anomaly Detection Overview
     •  Volume Anomaly Detection
     •  Feature Anomaly Detection
     •  Model Architecture
     •  Deployment on OpenSOC Platform
     •  Questions
     © 2015 Cisco and/or its affiliates. All rights reserved.
  3. Who are we?
     Big Data, Security, Analytics, Open Source, Managed Service
  4. The New Defense-In-Depth
     Diagram labels: Defense Strategy; Static Sandboxing; Threat Intel Feeds; Rules Engines; Volume-Based; Feature-Based; NLP-Based; Token Clustering; User Profiling; Asset Profiling; Interaction Profiling; Dynamic Sandboxing; Malware Classifiers; Script Classifiers; Perimeter Monitoring; Web Scraping; Soc. Media Analytics; Model Validators; Training Set Generation; Signature Matching; Rules-Based Matching; Network Anomaly Detection; Log Anomaly Detection; Behavioral Anomaly Detection; Malware Family; Script Family; Scraping; Honeypots; Misuse Detection; Intrusion Detection; Supervised Class.; Look-Ahead Analytics; Legacy Mindset; Generic Threats; Targeted Threats; Future Threats
  5. Network Anomaly Detection
     •  Volume-Based (Anomalous Traffic Patterns)
        Statistical Process Control: 3-sigma algorithms
        Time Series Forecasting: Exponential Smoothing, ARIMA
        Frequency Domain: Fast Fourier Transform, Wavelets
     •  Feature-Based (Interrelationships between Features)
        Information Theory: Entropy
        Principal Component Analysis: Subspace
        Sketch-Based: Heavy Hitters, Set Cardinality
        Probability Models: Markov Models, Bayes Nets
        Unsupervised ML: Clustering, Density, Proximity
  6. Volume-Based vs. Feature-Based
       Telemetry                        Volume-Based   Feature-Based
       Encrypted Traffic (Raw Packet)   YES            NO
       Raw Packet + Header Metadata     YES            YES
       Machine Exhaust Data             YES (online)   NO
       DPI Metadata                     NO             YES
       Netflow                          YES            YES
       Enrichment Metadata              YES            YES
       Application Logs                 YES            YES
       Other Alerts                     NO*            YES
  7. Anomaly Detection: 3-Phase Process
     Unstructured Data → Identify → Anomaly → Classify → Alert → Examine + Reinforce
     Also shown: Training Set, Historical Context
  8. Phase 1: Identify
     Unstructured Data → Understanding of Normal → Anomaly A, Anomaly B, Anomaly C, Anomaly (N)
  9. Phase 2: Classify
     Columns: Full Packet Telemetry (Volume Anomaly, Entropy Anomaly, Feature (x)); DPI Telemetry (Heavy Hitters Anomaly, Volume Anomaly, Cardinality Anomaly, Feature (x)); Telemetry (N) (Protocol Anomaly, Feature (x)); Outcome (Anomaly (A), Anomaly (B), Anomaly (N)); Class Label
     Rows:
       x x x x x x x   Port Scan
       x x x x x       False Positive
       x x x x         Network Scan
       x x x x         Port Scan
       x x x x         False Positive
       x x x x x x     DDoS
  10. Phase 3: Examine + Reinforce
      Columns: Full Packet Telemetry (Volume Anomaly, Entropy Anomaly, Feature (x)); DPI Telemetry (Heavy Hitters Anomaly, Volume Anomaly, Cardinality Anomaly, Feature (x)); Telemetry (N) (Protocol Anomaly, Feature (x)); Outcome (Anomaly (A), Anomaly (B), Anomaly (N)); Class Label
      Rows:
        x x x x x x x   Port Scan
        x x x x x       False Positive
        x x x x         Network Scan
        x x x x         False Positive
        x x x x x x     DDoS
        x x x x x x     False Positive
        x x x x x x     False Positive
        x x x x         False Positive
        x x x x x x     DDoS
  11. Basic Anomalies
        Anomaly               Definition
        Alpha Flows           Large volume point-to-point flows
        DoS                   Denial of service (distributed or single source)
        Flash Crowd           Large volume of traffic to a single destination from a large number of sources
        Port Scan             Probe to many destination ports on a small number of destination addresses
        Network Scan          Probe to many destination addresses on a small number of destination ports
        Outage Events         Traffic shifts because of equipment failures or maintenance
        Plateau Behavior      Behavior caused by traffic reaching environmental limits
        Point-to-Multipoint   Traffic from a single source to many destinations, e.g., content distribution
        Worms                 Scanning by worms for vulnerable hosts, which is a special case of network scan
  12. Batch Analytics: Normalcy Models
  13. Implementation
      Diagram labels: MAP, MAP, MAP; Key: assetID-metricID-Bin; RED, RED, RED; Time Series DB; assetID-metricID-Bin : 5pt; Telemetry; Anomaly?
      Table Name: Metric ID (Cumulative Volume)
        Asset        Bin   Value
        Server 1     15    5pt *
        Server 2     15    5pt *
        Server (N)   15    5pt *
      * 5-point summary (5pt):
        1.  the sample minimum (smallest observation)
        2.  the lower quartile or first quartile
        3.  the median (middle value)
        4.  the upper quartile or third quartile
        5.  the sample maximum (largest observation)
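A minimal sketch of the normalcy model on this slide: group observations by an assetID-metricID-Bin key and reduce each group to a five-point summary. The record layout, the function names, and the 1.5 × IQR fence used to answer "Anomaly?" are assumptions for illustration; the slide does not say how the summary is compared with live telemetry.

```python
from statistics import quantiles

def five_point_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numeric observations."""
    ordered = sorted(values)
    q1, median, q3 = quantiles(ordered, n=4, method="inclusive")
    return (ordered[0], q1, median, q3, ordered[-1])

def build_normalcy_model(records):
    """Group (asset_id, metric_id, time_bin, value) records by key, then summarize each group."""
    grouped = {}
    for asset_id, metric_id, time_bin, value in records:
        key = f"{asset_id}-{metric_id}-{time_bin}"   # plays the role of the "map" output key
        grouped.setdefault(key, []).append(value)
    # plays the role of the "reduce" step: one 5pt summary per key
    return {key: five_point_summary(vals) for key, vals in grouped.items()}

def is_anomalous(model, key, value, k=1.5):
    """Assumed rule: flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    _, q1, _, q3, _ = model[key]
    iqr = q3 - q1
    return value < q1 - k * iqr or value > q3 + k * iqr

# Hypothetical history for one asset, one metric, time bin 15
history = [("server1", "volume", 15, v) for v in (10, 12, 11, 13, 12, 11, 10, 14)]
model = build_normalcy_model(history)
print(model["server1-volume-15"])
print(is_anomalous(model, "server1-volume-15", 40))
```

Storing only five numbers per key keeps the reference table small enough to cache next to the stream processor, which is presumably why a summary is kept rather than the raw history.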
  14. Batch Analytics: Forecasting Models
      Forecast; Forecasting Algorithm (ARIMA/Holt-Winters, …)
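To make the forecasting idea concrete, here is a sketch using simple exponential smoothing. This is a deliberately simpler stand-in for the ARIMA and Holt-Winters algorithms the slide names (it models neither trend nor seasonality); the function name and the smoothing factor `alpha` are illustrative assumptions.

```python
def exponential_smoothing_forecast(series, alpha=0.3):
    """Simple exponential smoothing.

    Returns (forecasts, next_forecast) where forecasts[i] is the prediction
    for series[i] made from earlier points only (forecasts[0] is just the
    first observation), and next_forecast predicts the next unseen point.
    """
    level = series[0]
    forecasts = [level]
    for observed in series[1:]:
        forecasts.append(level)                            # predict before seeing the point
        level = alpha * observed + (1 - alpha) * level     # then update the smoothed level
    return forecasts, level

# Hypothetical per-bin traffic volumes
volumes = [100, 104, 98, 101, 250, 102]
forecasts, next_forecast = exponential_smoothing_forecast(volumes)
residuals = [obs - fc for obs, fc in zip(volumes, forecasts)]
print(residuals)
```

The residuals (observed minus forecast) are what a downstream check would threshold; the spike at 250 stands out as a large positive residual.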
  15. Implementation
      Diagram labels: MAP, MAP, MAP; Key: assetID-metricID-Bin; RED, RED, RED; Time Series DB; Key: assetID-metricID-Bin: [Expected | STD]; Telemetry; Anomaly?
      Table Name: Metric ID (Cumulative Volume)
        Asset        Bin   Value
        Server 1     15    EX | STD
        Server 2     15    EX | STD
        Server (N)   15    EX | STD
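The lookup on this slide can be sketched as a table from key to an (expected, standard deviation) pair, checked against incoming telemetry. In this sketch the expected value is just the historical mean, which stands in for whatever the forecasting model would produce; the function names and the 3-sigma cutoff are assumptions.

```python
from statistics import mean, pstdev

def precompute_expected(history):
    """Batch step: map each assetID-metricID-Bin key to (expected, std)."""
    return {key: (mean(values), pstdev(values)) for key, values in history.items()}

def is_volume_anomaly(expected_table, key, observed, n_sigma=3.0):
    """Online step: flag an observation more than n_sigma deviations from expected."""
    expected, std = expected_table[key]
    return abs(observed - expected) > n_sigma * std

# Hypothetical history for Server 1, cumulative volume, bin 15
history = {"server1-volume-15": [100, 110, 90, 105, 95]}
expected_table = precompute_expected(history)
print(expected_table["server1-volume-15"])
print(is_volume_anomaly(expected_table, "server1-volume-15", 130))
```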
  16. Batch Model Deployment
      Step 1: Bootstrap: Stream Data
      Step 2: Pre-Compute Expected Values (Batch)
      Step 3: Generate Alerts (Online)
      Diagram labels: Time Series DB; Unstructured Data; OpenSOC; OpenSOC JSON; Timestamp; HIVE; MR/Spark, MR/Spark, MR/Spark; Expected Values Reference Cache; Alert; ES
  17. Online Analytics: Data Preparation
      Deseasonalizer: AV, CMA, RAT, UF, RF, DV
  18. Online Analytics: Other things to check for
      •  Trend
      •  Seasonal Variability
      •  Evolution of Regularities
  19. Online Processing
      3-Sigma Algorithms, Micro Forecasting, Histogram Bins
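One way a 3-sigma check can run online, without storing history, is to keep a running mean and variance with Welford's update. The class name, the warm-up length, and the choice to fold every point (including flagged ones) into the statistics are assumptions of this sketch, not details from the deck.

```python
import math

class OnlineThreeSigma:
    """Streaming 3-sigma detector using Welford's running mean/variance."""

    def __init__(self, n_sigma=3.0, warmup=10):
        self.n_sigma = n_sigma
        self.warmup = warmup    # number of points to observe before scoring
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0           # running sum of squared deviations from the mean

    def update(self, x):
        """Score x against the statistics seen so far, then fold x in.

        Returns True if x is more than n_sigma sample deviations from the mean.
        """
        anomalous = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            anomalous = abs(x - self.mean) > self.n_sigma * std
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return anomalous

detector = OnlineThreeSigma()
baseline_flags = [detector.update(10 if i % 2 == 0 else 12) for i in range(20)]
print(detector.update(11), detector.update(50))
```

A production variant would likely skip or down-weight flagged points so that a sustained attack does not become the new normal.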
  20. Frequency Domain
      High
      •  Trendless
      •  Noise
      •  Spikes represent Anomalies
      Medium
      •  Flatter
      •  Finer-grained Trends
      Low
      •  Seasonal & ‘Peaky’
      •  Weekly/Daily Trends
  21. Frequency Domain – Wavelet Separation
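The band separation described on the Frequency Domain slides can be illustrated with a Haar wavelet decomposition. The sketch below uses the unnormalized averaging form of the Haar transform (pairwise means and half-differences); which wavelet the authors actually used is not stated, so treat the choice as an assumption. The input length is assumed to be divisible by 2 raised to the number of levels.

```python
def haar_step(signal):
    """One Haar level: pairwise averages (low band) and half-differences (high band)."""
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal) - 1, 2)]
    return approx, detail

def haar_decompose(signal, levels):
    """Return (approximation, details) where details[0] is the highest-frequency band."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details

# Flat traffic with a single spike at position 6
signal = [4, 4, 4, 4, 4, 4, 20, 4]
approx, details = haar_decompose(signal, levels=2)
print(details[0])   # high band: the spike shows up as one large coefficient
print(approx)       # low band: the slow-moving level
```

Thresholding the high band isolates spikes, while the low band carries the seasonal shape that a forecasting model would consume.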
  22. Online Model Deployment
      Step 1: Bootstrap: Stream Data
      Step 2: Generate Adjuster
      Step 3: Generate Alerts (Online)
      Diagram labels: Time Series DB; Unstructured Data; OpenSOC; OpenSOC JSON; Timestamp; HIVE; MR/Spark, MR/Spark, MR/Spark; Adjuster / Decomposer; Alert; ES
  23. Feature-Based Anomaly Detection: Continuous Numeric Features*
      •  Continuous Numeric Feature - can take on any value between its minimum value and its maximum value
      •  Normalization - adjusting values measured on different scales to a notionally common scale
      1.  Proximity-Based Techniques. Example: K-Nearest Neighbors (KNN)
      2.  Clustering. Example: K-Means
      3.  Density-Based
      Sample Anomalies Detected
        MPS Anomaly   KBps Anomaly   Possible Explanation
        TOO HIGH      TOO LOW        Port Scan, Network Scan
        TOO HIGH      TOO HIGH       DDoS
        TOO LOW       TOO HIGH       Control Traffic Anomaly
        OK            OK             No Anomaly
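A small sketch of the proximity-based approach: min-max normalize two continuous features, then score each point by its mean distance to its k nearest neighbours. The feature pair (MPS, KBps) comes from the slide's table; the sample values, `k`, and the function names are made up for illustration.

```python
import math

def min_max_normalize(rows):
    """Rescale every column to [0, 1] so features on different scales are comparable."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]

def knn_scores(rows, k=2):
    """Anomaly score per point: mean Euclidean distance to its k nearest neighbours."""
    scores = []
    for i, point in enumerate(rows):
        distances = sorted(math.dist(point, other)
                           for j, other in enumerate(rows) if j != i)
        scores.append(sum(distances[:k]) / k)
    return scores

# Hypothetical (MPS, KBps) samples; the last has very high MPS and low KBps
traffic = [(100, 500), (110, 520), (105, 510), (95, 490), (900, 480)]
scores = knn_scores(min_max_normalize(traffic))
print(scores.index(max(scores)))
```

Without the normalization step the KBps column would dominate or vanish depending on units, which is the reason the slide lists normalization before the techniques.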
  24. Feature-Based Anomaly Detection: Categorical Features *
      •  Categorical Features - can take on one of a limited, and usually fixed, number of possible values
      •  Stream Sketch - algorithm produces an approximate answer based on a summary of the data stream in memory
      Example: Protocol {UDP|FTP|HTTP|…}, GEO-MET {PHOENIX | DALLAS | LONDON| …}, …
      Count-Min (CM) Sketch: number of occurrences of the element in a stream (Heavy Hitters)
      Why not count? Protocol: 42k elements per asset. GeoMet: 246k per asset
      Diagram labels: Categorical Data; CM Sketch; Heavy Hitters; MR; Time Series DB; Unstructured Data; CM Sketch; Alert
      Table Name: Protocol
        Asset        Bin   Value
        Server 1     15    HH
        Server 2     15    HH
        Server (N)   15    HH
      Expected: {HTTP, UDP, FTP, DNS}
      ACTUAL: {DNS, ICMP, HTP, FTP}
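A compact Count-Min sketch shows how heavy hitters can be tracked in fixed memory. The width, depth, SHA-256 based row hashes, and the helper `heavy_hitters` are all illustrative choices for this sketch; a production sketch would size the table from target error bounds and use faster hashes.

```python
import hashlib

class CountMinSketch:
    """Fixed-size frequency table; estimates never undercount, but may overcount."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        # One hash per row, derived by salting the item with the row number
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row, col in self._cells(item):
            self.table[row][col] += count

    def estimate(self, item):
        return min(self.table[row][col] for row, col in self._cells(item))

def heavy_hitters(sketch, candidates, top_n):
    """Return the top_n candidates by estimated count."""
    return set(sorted(candidates, key=sketch.estimate, reverse=True)[:top_n])

sketch = CountMinSketch()
for protocol, n in (("HTTP", 50), ("DNS", 30), ("ICMP", 5)):
    sketch.add(protocol, n)

expected = {"HTTP", "UDP", "FTP", "DNS"}
actual = heavy_hitters(sketch, ["HTTP", "DNS", "ICMP"], top_n=3)
print(actual - expected)   # protocols that are heavy now but absent from the expected set
```

Comparing the current heavy-hitter set with the stored expected set, as the slide's Expected/ACTUAL example does, turns a categorical feature into a simple set-difference alert.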
  25. Feature-Based Anomaly Detection: Feature Ratios
      HyperLogLog: approximating the number of distinct elements in a multiset
      Useful Ratio: # distinct elements / total elements [0-1]
      •  Digest - structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means
      Diagram labels: Unstructured Data; Hyper LogLog Distinct; Src_port, Dst_port, Src_ip, Dst_ip; Storm Bolt; Src_port, Dst_port, Src_ip, Dst_ip; Ack Total; Ratios; Digest *; Alert
        FEATURE    DT RATIO Anomaly   Possible Reason
        SRC_IP     ~1/~0              Flash Crowd/DDoS
        SRC_PORT   ~1/~0              Failure Probing/App Hijack
        DST_IP     ~1/~0              Network Scan/DDoS
        DST_PORT   ~1/~0              Port Scan/Footprinting
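The distinct/total ratio can be sketched with a small HyperLogLog. This is a textbook-style implementation (64-bit hash taken from SHA-1, 2^p registers, small-range linear-counting correction), written for illustration only; the precision `p`, the hash choice, and the `distinct_ratio` helper are assumptions, and a real deployment would use a tuned library implementation.

```python
import hashlib
import math

class HyperLogLog:
    """Approximate distinct counter using m = 2**p small registers (p >= 7 assumed)."""

    def __init__(self, p=10):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        index = h >> (64 - self.p)                      # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)           # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1    # position of the leftmost 1-bit
        self.registers[index] = max(self.registers[index], rank)

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)           # bias constant for m >= 128
        estimate = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            estimate = self.m * math.log(self.m / zeros)  # small-range correction
        return estimate

def distinct_ratio(values, p=10):
    """Approximate (# distinct elements / total elements), clamped to [0, 1]."""
    hll = HyperLogLog(p)
    for value in values:
        hll.add(value)
    return min(1.0, hll.count() / len(values))

# Hypothetical DST_IP streams: a scan touches a new address per packet,
# while a flood hammers the same four addresses.
scan = [f"10.0.{i // 256}.{i % 256}" for i in range(2000)]
flood = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"] * 500
print(distinct_ratio(scan), distinct_ratio(flood))
```

A ratio near 1 means almost every element is new and a ratio near 0 means a handful of values dominate, matching the "~1/~0" column in the slide's table.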
  26. Feature-Based Anomaly Detection: Correlation - Information Theory
      •  Information Theory - study of fundamental limits on signal processing, compression, and storage
      •  Entropy - a measure of unpredictability of information content
      Diagram labels: Unstructured Data; Anomaly-Free Training Set; Entropy Summarizer; Entropy; Src_port, Dst_port, Src_ip, Dst_ip; Time Bin (n); MR; Alert; Time Bin (n)
                   SRC_IP   SRC_PORT   DST_IP   DST_PORT
        SRC_IP     -        .95        .85      .75
        SRC_PORT   -        -          .97      .76
        DST_IP     -        -          -        .98
        DST_PORT   -        -          -        -
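The entropy summarizer can be sketched as Shannon entropy computed per feature over the values seen in one time bin. The normalization by log2 of the distinct count and the sample port lists are assumptions; the slide does not define how the pairwise values in its matrix are derived, so this sketch covers only the per-feature entropy.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Entropy in bits of the empirical distribution of values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def normalized_entropy(values):
    """Entropy scaled to [0, 1] by the maximum possible for the distinct values seen."""
    distinct = len(set(values))
    if distinct <= 1:
        return 0.0
    return shannon_entropy(values) / math.log2(distinct)

# Hypothetical DST_PORT values in one time bin
normal_bin = [80] * 90 + [443] * 8 + [53] * 2   # traffic concentrated on one port
scan_bin = list(range(1, 101))                  # every packet hits a different port
print(normalized_entropy(normal_bin), normalized_entropy(scan_bin))
```

A port scan pushes DST_PORT entropy toward its maximum while DST_IP entropy collapses, so comparing per-feature entropies against an anomaly-free training set gives a compact signature.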
  27. Principal Component Analysis (PCA)
      •  Feature Selection Algorithm
      •  Dimensionality Reduction
      •  E.g. 4 features
         •  ServerA (A)
         •  ServerB (B)
         •  ServerC (C)
         •  Cumulative = A + B + C
  28. PCA – Component Construction
      Each component is a weighted sum of the four features:
        Feature            PC1          PC2          PC3          PC4
        ServerA Traffic    -0.5052803    0.2801275    0.6867089   -0.4411929
        ServerB Traffic    -0.4990556    0.4611079   -0.6988557   -0.2234362
        ServerC Traffic    -0.4816276   -0.8395562   -0.1441834   -0.2058916
        Cumulative         -0.5134882    0.0636666    0.138718     0.8444132
        σ                   0.0135       0.5773       0.5773       0.5773
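Using the weights printed on this slide, projecting an observation onto the components is a set of dot products. The sketch assumes the observation has already been centered and scaled the same way as the data the weights were fitted on; the slide does not describe that preprocessing, and the sample observation is invented.

```python
# Weights copied from the slide; order is (ServerA, ServerB, ServerC, Cumulative)
LOADINGS = {
    "PC1": (-0.5052803, -0.4990556, -0.4816276, -0.5134882),
    "PC2": (0.2801275, 0.4611079, -0.8395562, 0.0636666),
    "PC3": (0.6867089, -0.6988557, -0.1441834, 0.138718),
    "PC4": (-0.4411929, -0.2234362, -0.2058916, 0.8444132),
}

def project(observation):
    """Return the score of one (already standardized) observation on each component."""
    return {
        name: sum(weight * value for weight, value in zip(weights, observation))
        for name, weights in LOADINGS.items()
    }

# Hypothetical standardized observation where only ServerC deviates
scores = project((0.0, 0.0, 2.0, 0.0))
print(scores)
```

In the subspace method, scores on the components that normally carry little variance are the ones monitored: a large value there means the usual relationship between the servers has broken.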
  29. PCA – Component Separation
  30. PCA – Component Separation
  31. Putting it All Together: OpenSOC
      Diagram labels: RAW; Transform; Enrich; Alert (Rules-Based); Enriched; Filter; Aggregators; Router; Model 1; Model 2; Model n; Scorer; HIVE + Hbase Long-Term Data Store; Flume; Kafka; Storm; OpenSOC-Streaming; OpenSOC-Aggregation; OpenSOC-ML; SOC Alert Consumers; UI; Web Services; Secure Gateway Services; External Alert Consumers; Big Data Stores; Elastic Search Real-Time Index and Search; Hbase; OpenTSDB; Titan Graph; Alerts; ES/HIVE Alerts Store; Remedy Ticketing System
  32. We are hiring…
      •  Data Scientists (Security)
      •  Aspiring Data Scientists
      •  Security/Networking Experience Required
      •  Software Engineering Experience Required
      •  PhD not required
      •  Background in stats or ML not required
      •  Security Researchers
      *Please contact us via LinkedIn with your profile
  33. Book idea… Security Analytics on Hadoop
      •  Anomaly Detection
      •  Targeted Models
      •  Deployment Best Practices
      •  Alerts
      •  Visualization Techniques
      •  Etc…
      If interested in contributing please contact James Sirota on LinkedIn
  34. OpenSOC Resources (@ProjectOpenSOC)
      Github Repo
      •  https://github.com/OpenSOC/opensoc
      Slides
      •  http://www.slideshare.net/JamesSirota
      •  https://speakerdeck.com/jsirota
      Corporate Blogs
      •  http://blogs.cisco.com/author/jamessirota
      •  http://blogs.cisco.com/security/opensoc-an-open-commitment-to-security
      Contributor Blogs
      •  https://medium.com/@jamessirota
      •  parrottsquawk.com
  35. Thank you.
