Detecting Hacks: Anomaly Detection on Networking Data


  1. © 2010 Cisco and/or its affiliates. All rights reserved. Detecting Hacks: Anomaly Detection on Networking Data. James Sirota (@JamesSirota), Lead Data Scientist, Managed Threat Defense. Chester Parrott (@ParrottSquawk), Data Scientist, Managed Threat Defense. June 2015
  2. In the next few minutes…
    • Defense in Depth for Big Data
    • Network Anomaly Detection Overview
    • Volume Anomaly Detection
    • Feature Anomaly Detection
    • Model Architecture
    • Deployment on OpenSOC Platform
    • Questions
  3. Who are we? Big Data Security Analytics Open Source Managed Service
  4. The New Defense-In-Depth Defense Strategy
    Diagram labels: Static Sandboxing, Threat Intel Feeds, Rules Engines, Volume-Based, Feature-Based, NLP-Based, Token Clustering, User Profiling, Asset Profiling, Interaction Profiling, Dynamic Sandboxing, Malware Classifiers, Script Classifiers, Perimeter Monitoring, Web Scraping, Soc. Media Analytics, Model Validators, Training Set Generation, Signature Matching, Rules-Based Matching, Network Anomaly Detection, Log Anomaly Detection, Behavioral Anomaly Detection, Malware Family, Script Family, Scraping, Honeypots, Misuse Detection, Intrusion Detection, Supervised Class., Look-Ahead Analytics, Legacy Mindset, Generic Threats, Targeted Threats, Future Threats
  5. Network Anomaly Detection
    Diagram labels: Volume-Based, Feature-Based, Statistical Process Control, Frequency Domain, Time series Forecasting, Information Theory, Principal Component Analysis, Sketch-Based, 3-sigma algorithms, Exponential Smoothing, ARIMA, Fast Fourier Transform, Wavelets, Entropy, Subspace, Heavy Hitters, Set Cardinality, Probability Models, Markov Models, Bayes Nets, Unsupervised ML, Clustering, Density, Proximity, Anomalous Traffic Patterns, Interrelationships between Features
  6. Volume-Based vs. Feature-Based
    Telemetry | Volume-Based | Feature-Based
    Encrypted Traffic (Raw Packet) | YES | NO
    Raw Packet + Header Metadata | YES | YES
    Machine Exhaust Data | YES (online) | NO
    DPI Metadata | NO | YES
    Netflow | YES | YES
    Enrichment Metadata | YES | YES
    Application Logs | YES | YES
    Other Alerts | NO* | YES
  7. Anomaly Detection: 3-Phase Process
    Diagram labels: Unstructured Data, Identify, Anomaly, Classify, Alert, Examine + Reinforce, Training Set, Historical Context
  8. Phase 1: Identify
    Diagram labels: Unstructured Data, Understanding of Normal, Anomaly A, Anomaly B, Anomaly C, Anomaly (N)
  9. Phase 2: Classify
    Column groups: Full Packet Telemetry, DPI Telemetry, Telemetry (N), Outcome
    Columns: Volume Anomaly, Entropy Anomaly, Feature (x), Heavy Hitters Anomaly, Volume Anomaly, Cardinality Anomaly, Feature (x), Protocol Anomaly, Feature (x), Anomaly (A), Anomaly (B), Anomaly (N), Class Label
    x x x x x x x Port Scan
    x x x x x False Positive
    x x x x Network Scan
    x x x x Port Scan
    x x x x False Positive
    x x x x x x DDoS
  10. Phase 3: Examine + Reinforce
    Column groups: Full Packet Telemetry, DPI Telemetry, Telemetry (N), Outcome
    Columns: Volume Anomaly, Entropy Anomaly, Feature (x), Heavy Hitters Anomaly, Volume Anomaly, Cardinality Anomaly, Feature (x), Protocol Anomaly, Feature (x), Anomaly (A), Anomaly (B), Anomaly (N), Class Label
    x x x x x x x Port Scan
    x x x x x False Positive
    x x x x Network Scan
    x x x x False Positive
    x x x x x x DDoS
    x x x x x x False Positive
    x x x x x x False Positive
    x x x x False Positive
    x x x x x x DDoS
  11. Basic Anomalies
    Anomaly | Definition
    Alpha Flows | Large volume point-to-point flows
    DoS | Denial of service (distributed or single source)
    Flash Crowd | Large volume of traffic to a single destination from a large number of sources
    Port Scan | Probe to many destination ports on a small number of destination addresses
    Network Scan | Probe to many destination addresses on a small number of destination ports
    Outage Events | Traffic shifts because of equipment failures or maintenance
    Plateau Behavior | Behavior caused by traffic reaching environmental limits
    Point-to-Multipoint | Traffic from a single source to many destinations, e.g., content distribution
    Worms | Scanning by worms for vulnerable hosts, which is a special case of network scan
  12. Batch Analytics: Normalcy Models
  13. Implementation
    Diagram labels: MAP, MAP, MAP; Time Series DB; Key: assetID-metricID-Bin; RED, RED, RED; Telemetry; Anomaly?
    Table Name: Metric ID (Cumulative Volume)
    Asset | Bin | Value
    Server 1 | 15 | 5pt*
    Server 2 | 15 | 5pt*
    Server (N) | 15 | 5pt*
    assetID-metricID-Bin : 5pt
    * 5-point summary (5pt):
    1. the sample minimum (smallest observation)
    2. the lower quartile or first quartile
    3. the median (middle value)
    4. the upper quartile or third quartile
    5. the sample maximum (largest observation)
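As a rough illustration of slide 13, the sketch below computes a 5-point summary for one `assetID-metricID-Bin` key and checks new telemetry against it. The slide does not say how the summary is turned into an anomaly decision, so the Tukey-fence rule (1.5 × IQR), the key string, and the sample history are all assumptions made for this example.

```python
import statistics

def five_point_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of observations."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))

def is_anomalous(value, summary, k=1.5):
    """Assumed rule (not from the slides): flag values outside Tukey fences."""
    _, q1, _, q3, _ = summary
    iqr = q3 - q1
    return value < q1 - k * iqr or value > q3 + k * iqr

# Hypothetical history for one asset, metric, and time bin.
history = [100, 102, 98, 101, 99, 103, 97, 100, 104, 96]
key = "server1-cumulativeVolume-15"
model = {key: five_point_summary(history)}

print(is_anomalous(150, model[key]))  # value far above the upper fence
print(is_anomalous(100, model[key]))  # value at the median
```

Storing only five numbers per key keeps the reference table small, which matters when there is one row per asset, metric, and bin.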
  14. Batch Analytics: Forecasting Models
    Diagram labels: Forecast; Forecasting Algorithm (ARIMA/Holt-Winters, …)
  15. Implementation
    Diagram labels: MAP, MAP, MAP; Time Series DB; Key: assetID-metricID-Bin; RED, RED, RED; Telemetry; Anomaly?
    Key: assetID-metricID-Bin: [Expected | STD]
    Table Name: Metric ID (Cumulative Volume)
    Asset | Bin | Value
    Server 1 | 15 | EX | STD
    Server 2 | 15 | EX | STD
    Server (N) | 15 | EX | STD
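Slide 15 stores an expected value and a standard deviation per key. One plausible way to use that pair is a 3-sigma test, sketched below. The threshold, the `reference` dictionary, and the numbers in it are assumptions for illustration; the slides do not spell out the comparison rule.

```python
def three_sigma_alert(observed, expected, std, n_sigma=3.0):
    """Return True when observed is more than n_sigma * std away from expected."""
    if std <= 0:
        # Degenerate model: any deviation from the expected value counts.
        return observed != expected
    return abs(observed - expected) > n_sigma * std

# Hypothetical reference cache: assetID-metricID-bin -> (expected, std).
reference = {"server1-cumulativeVolume-15": (1000.0, 50.0)}

def check(key, observed):
    """Look up the pre-computed forecast for key and test the new observation."""
    expected, std = reference[key]
    return three_sigma_alert(observed, expected, std)

print(check("server1-cumulativeVolume-15", 1100))  # within 3 standard deviations
print(check("server1-cumulativeVolume-15", 1200))  # outside 3 standard deviations
```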
  16. Batch Model Deployment
    Step 1: Bootstrap: Stream Data. Labels: Unstructured Data; OpenSOC; OpenSOC JSON; Time Series DB
    Step 2: Pre-Compute Expected Values (Batch). Labels: Timestamp; HIVE; Time Series DB; MR/Spark, MR/Spark, MR/Spark
    Step 3: Generate Alerts (Online). Labels: Unstructured Data; OpenSOC; Expected Values Reference Cache; Time Series DB; OpenSOC JSON; Timestamp; HIVE; Alert; ES; Expected Values Reference Cache
  17. Online Analytics: Data Preparation
    Deseasonalizer: AV, CMA, RAT, UF, RF, DV
  18. Online Analytics: Other things to check for
    Trend
    Seasonal Variability
    Evolution of Regularities
  19. Online Processing: 3-Sigma Algorithms, Micro Forecasting, Histogram Bins
  20. Frequency Domain
    High
    • Trendless
    • Noise
    • Spikes represent Anomalies
    Medium
    • Flatter
    • Finer-grained Trends
    Low
    • Seasonal & ‘Peaky’
    • Weekly/Daily Trends
  21. Frequency Domain – Wavelet Separation
  22. Online Model Deployment
    Step 1: Bootstrap: Stream Data. Labels: Unstructured Data; OpenSOC; OpenSOC JSON; Time Series DB
    Step 2: Generate Adjuster. Labels: Timestamp; HIVE; Time Series DB; MR/Spark; Adjuster / Decomposer
    Step 3: Generate Alerts (Online). Labels: Unstructured Data; OpenSOC; Time Series DB; OpenSOC JSON; Timestamp; HIVE; Alert; ES; Adjuster; Decomposer; MR/Spark; MR/Spark
  23. Feature-Based Anomaly Detection: Continuous Numeric Features*
    • Continuous Numeric Feature: can take on any value between its minimum value and its maximum value
    • Normalization: adjusting values measured on different scales to a notionally common scale
    1. Proximity-Based Techniques. Example: K-Nearest Neighbors (KNN)
    2. Clustering. Example: K-Means
    3. Density-Based
    Sample Anomalies Detected
    MPS Anomaly | KBps Anomaly | Possible Explanation
    TOO HIGH | TOO LOW | Port Scan, Network Scan
    TOO HIGH | TOO HIGH | DDoS
    TOO LOW | TOO HIGH | Control Traffic Anomaly
    OK | OK | No Anomaly
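To make the normalization and proximity-based ideas on slide 23 concrete, here is a minimal sketch that min-max scales two numeric features and scores each sample by its mean distance to its k nearest neighbours. The sample values, the choice of min-max scaling, and the scoring rule are assumptions for illustration, not the presenters' implementation.

```python
import math

def min_max_normalize(rows):
    """Scale each column of rows to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [
        tuple((v - lo) / span if span else 0.0
              for v, lo, span in zip(row, lows, spans))
        for row in rows
    ]

def knn_scores(points, k=2):
    """Anomaly score per point: mean Euclidean distance to its k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

# Hypothetical (MPS, KBps) samples; the last one is "too high / too low".
samples = [(100, 50), (110, 55), (95, 48), (105, 52), (900, 20)]
normalized = min_max_normalize(samples)
scores = knn_scores(normalized)
most_anomalous = max(range(len(samples)), key=scores.__getitem__)
print(samples[most_anomalous], scores[most_anomalous])
```

Without the normalization step the feature with the larger numeric range would dominate the distance, which is why the slide lists normalization before the techniques.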
  24. Feature-Based Anomaly Detection: Categorical Features*
    • Categorical Features: can take on one of a limited, and usually fixed, number of possible values
    • Stream Sketch: algorithm produces an approximate answer based on a summary of the data stream in memory
    Example: Protocol {UDP|FTP|HTTP|…}, GEO-MET {PHOENIX | DALLAS | LONDON | …}, …
    Count-Min (CM) Sketch: number of occurrences of the element in a stream (Heavy Hitters)
    Why not count? Protocol: 42k elements per asset. GeoMet: 246k per asset
    Diagram labels: Categorical Data; CM Sketch; Heavy Hitters; MR; Time Series DB; Unstructured Data; CM Sketch; Alert
    Table Name: Protocol
    Asset | Bin | Value
    Server 1 | 15 | HH
    Server 2 | 15 | HH
    Server (N) | 15 | HH
    Expected: {HTTP, UDP, FTP, DNS}
    ACTUAL: {DNS, ICMP, HTP, FTP}
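The Count-Min sketch on slide 24 can be illustrated with a small standard-library implementation. This is a sketch under stated assumptions: the table dimensions, the SHA-256 based hashing, the `heavy_hitters` helper, and the protocol stream are all invented for the example, and a production deployment would more likely use a tuned library.

```python
import hashlib

class CountMinSketch:
    """Fixed-size counter table; estimates never undercount an item."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _columns(self, item):
        # One column per row, derived from a row-salted hash of the item.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row, col in self._columns(item):
            self.table[row][col] += count

    def estimate(self, item):
        # The minimum across rows is the least collision-inflated counter.
        return min(self.table[row][col] for row, col in self._columns(item))

def heavy_hitters(sketch, candidates, top_n):
    """Hypothetical helper: the top_n candidates by estimated count."""
    ranked = sorted(candidates, key=sketch.estimate, reverse=True)
    return set(ranked[:top_n])

# Hypothetical protocol stream for one asset and time bin.
stream = ["HTTP"] * 500 + ["DNS"] * 300 + ["UDP"] * 150 + ["FTP"] * 40 + ["ICMP"] * 5
sketch = CountMinSketch()
for protocol in stream:
    sketch.add(protocol)

expected = {"HTTP", "UDP", "FTP", "DNS"}
observed = heavy_hitters(sketch, set(stream), top_n=4)
print(observed - expected)  # heavy hitters that the baseline did not expect
```

The memory used is `width * depth` counters regardless of how many distinct values appear, which is the point of the "Why not count?" line on the slide.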
  25. Feature-Based Anomaly Detection: Feature Ratios
    HyperLogLog: approximating the number of distinct elements in a multiset
    Useful Ratio: # distinct elements / total elements [0-1]
    • Digest: structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means
    Diagram labels: Unstructured Data; Hyper LogLog; Distinct; Src_port; Dst_port; Src_ip; Dst_ip; Storm Bolt; Src_port; Dst_port; Src_ip; Dst_ip; Ack; Total; Ratios; Digest*; Alert
    FEATURE | DT RATIO Anomaly | Possible Reason
    SRC_IP | ~1/~0 | Flash Crowd/DDoS
    SRC_PORT | ~1/~0 | Failure Probing/App Hijack
    DST_IP | ~1/~0 | Network Scan/DDoS
    DST_PORT | ~1/~0 | Port Scan/Footprinting
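The distinct-to-total ratio on slide 25 can be sketched with a compact HyperLogLog. The register count (`p=12`), the SHA-256 hash, the small-range correction, and the example streams are assumptions chosen for a self-contained illustration; this is not the presenters' code.

```python
import hashlib
import math

class HyperLogLog:
    """Approximate distinct counter using 2**p small registers."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        h = int.from_bytes(hashlib.sha256(str(item).encode()).digest()[:8], "big")
        index = h >> (64 - self.p)                    # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)         # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1  # position of the leftmost 1-bit
        self.registers[index] = max(self.registers[index], rank)

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)         # bias constant, valid for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw

def distinct_ratio(values):
    """Estimated distinct elements divided by total elements, clamped to [0, 1]."""
    hll = HyperLogLog()
    for v in values:
        hll.add(v)
    return min(1.0, hll.count() / len(values))

# Hypothetical destination ports seen in one time bin.
port_scan = list(range(1, 1001))   # every packet hits a different port
normal = [443] * 1000              # every packet hits the same port
print(distinct_ratio(port_scan), distinct_ratio(normal))
```

A ratio near 1 for `DST_PORT` matches the slide's port-scan row, while a ratio near 0 means the traffic is concentrated on very few values.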
  26. Feature-Based Anomaly Detection: Correlation - Information Theory
    • Information Theory: study of fundamental limits on signal processing, compression, and storage
    • Entropy: a measure of unpredictability of information content
    Diagram labels: Unstructured Data; Anomaly-Free Training Set; Entropy Summarizer; Entropy; Src_port; Dst_port; Src_ip; Dst_ip; MR; Alert; Time Bin (n)
    Time Bin (n) | SRC_IP | SRC_PORT | DST_IP | DST_PORT
    SRC_IP | - | .95 | .85 | .75
    SRC_PORT | - | - | .97 | .76
    DST_IP | - | - | - | .98
    DST_PORT | - | - | - | -
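As a minimal illustration of the entropy measure on slide 26, the sketch below computes the Shannon entropy of a single feature within one time bin. It covers only per-feature entropy; the pairwise values in the slide's matrix are not reproduced, and the port lists are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy in bits of the empirical distribution of values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def normalized_entropy(values):
    """Entropy divided by its maximum, log2(distinct values); 0.0 if one value."""
    distinct = len(set(values))
    if distinct <= 1:
        return 0.0
    return shannon_entropy(values) / math.log2(distinct)

# Hypothetical destination ports in two time bins.
baseline = [80] * 90 + [443] * 10   # traffic concentrated on two ports
scan = list(range(1000, 1100))      # one packet per port, as in a port scan
print(shannon_entropy(baseline), shannon_entropy(scan))
```

A sudden jump in destination-port entropy relative to the anomaly-free training set is the kind of change the slide's entropy summarizer is meant to surface.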
  27. Principal Component Analysis (PCA)
    • Feature Selection Algorithm
    • Dimensionality Reduction
    • E.g. 4 features
    • ServerA (A)
    • ServerB (B)
    • ServerC (C)
    • Cumulative = A + B + C
  28. PCA – Component Construction
    PC1 (σ: 0.0135): ServerA Traffic × -0.5052803, ServerB Traffic × -0.4990556, ServerC Traffic × -0.4816276, Cumulative × -0.5134882
    PC2 (σ: 0.5773): ServerA Traffic × 0.2801275, ServerB Traffic × 0.4611079, ServerC Traffic × -0.8395562, Cumulative × 0.0636666
    PC3 (σ: 0.5773): ServerA Traffic × 0.6867089, ServerB Traffic × -0.6988557, ServerC Traffic × -0.1441834, Cumulative × 0.138718
    PC4 (σ: 0.5773): ServerA Traffic × -0.4411929, ServerB Traffic × -0.2234362, ServerC Traffic × -0.2058916, Cumulative × 0.8444132
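Each component on slide 28 is a weighted sum of the four features, so a sample's score on a component is a dot product with that component's loadings. The sketch below uses the loadings from the slide; the sample vector is hypothetical, and the code assumes the input has already been centred and scaled the same way as the training data.

```python
import math

# Loadings copied from slide 28; feature order is (ServerA, ServerB, ServerC, Cumulative).
LOADINGS = {
    "PC1": (-0.5052803, -0.4990556, -0.4816276, -0.5134882),
    "PC2": (0.2801275, 0.4611079, -0.8395562, 0.0636666),
    "PC3": (0.6867089, -0.6988557, -0.1441834, 0.138718),
    "PC4": (-0.4411929, -0.2234362, -0.2058916, 0.8444132),
}

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def norm(vector):
    """Euclidean length of a vector."""
    return math.sqrt(dot(vector, vector))

def project(sample):
    """Score one (assumed pre-standardized) sample on every principal component."""
    return {name: dot(weights, sample) for name, weights in LOADINGS.items()}

# Hypothetical standardized traffic sample.
sample = (1.0, 1.0, 1.0, 3.0)
for name, score in project(sample).items():
    print(name, round(score, 4))
```

Each loading vector has unit length, so the scores are coordinates of the sample in a rotated basis rather than rescaled values.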
  29. PCA – Component Separation
  30. PCA – Component Separation
  31. Putting it All Together: OpenSOC
    Diagram labels: RAW Transform Enrich Alert (Rules-Based) Enriched Filter Aggregators Router Model 1 Scorer HIVE + Hbase Long-Term Data Store Flume Kafka Storm Model 2 Model n OpenSOC-Streaming OpenSOC-Aggregation OpenSOC-ML SOC Alert Consumers UI UI UI UI UI Web Services Secure Gateway Services External Alert Consumers Big Data Stores Elastic Search Real-Time Index and Search Hbase OpenTSDB Titan Graph Alerts ES/HIVE Alerts Store Remedy Ticketing System
  32. We are hiring…
    • Data Scientists (Security)
    • Aspiring Data Scientists
    • Security/Networking Experience Required
    • Software Engineering Experience Required
    • PhD not required
    • Background in stats or ML not required
    • Security Researchers
    *Please contact us via LinkedIn with your profile
  33. Book idea… Security Analytics on Hadoop
    • Anomaly Detection
    • Targeted Models
    • Deployment Best Practices
    • Alerts
    • Visualization Techniques
    • Etc…
    If interested in contributing please contact James Sirota on LinkedIn
  34. Thank you.

Editor's Notes

  • Views data as a continuous signal
    Split signal into frequency bands
    High – short term spikes
    Low – long term trends
    Fourier Transforms (FFT)
    Classically based & rigid
    Has inverse function
    Original signal can be reconstructed
    Wavelet Analysis
    More recently studied
    Handles discontinuities and spikes better than FFT
    Preferred over FFT


    FFT
    * Fourier analysis is the process of decomposing a complex periodic waveform into a set of sinusoids with different amplitudes, frequencies and phases. The sum of these sinusoids can exactly match the original waveform. This lossless transform presents a new perspective of the signal under study (in the frequency domain), which has proved useful in many applications.
    * The Inverse Discrete Fourier Transform (IDFT) is used to reconstruct the signal in the time domain; DFT and IDFT can be efficiently implemented by using the FFT.
    * Filter out the low-frequency components in the link traffic time series. In general, low-frequency components capture the daily and weekly traffic patterns, while high-frequency components represent the sudden changes in traffic behavior.

    Wavelet Analysis
    * Considered superior to traditional Fourier methods where signal contains transients such as discontinuities and sharp spikes
    * Wavelet techniques are one of the most up-to-date modeling tools to exploit both non-stationary and long-range dependence.

    ftp://net9.cs.utexas.edu/pub/techreports/tr05-38.pdf (Comparison of PCA to FFT & Wavelets & ARIMA)
    http://cegroup.ece.tamu.edu/techpubs/2003/TAMU-ECE-2003-03.pdf (wavelet feature based, batch/real-time, correlation of port-numbers)
    http://www.cs.cmu.edu/~srini/15-744/readings/BKPR02.pdf (wavelet signal based; generic features, exposes anomalies even when nested in large amounts of other traffic [noise])
    http://www.cse.sc.edu/~huangct/wens06.pdf (framework paper, show various feature-based wavelets and their ability to detect anomalies)
    http://pages.cs.wisc.edu/~pb/paper_imw_02.pdf (wavelets & spline filters)
    http://users.ece.gatech.edu/~jic/anomaly-book-chap-09.pdf (2.2, background info)
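The FFT note above (filter out the low-frequency components so that sudden changes stand out) can be sketched as follows. To stay dependency-free the example uses a naive O(n²) discrete Fourier transform, which computes the same spectrum an FFT would; the signal, the spike position, and the cutoff bin are hypothetical values chosen for illustration.

```python
import cmath
import math

def dft(signal):
    """Naive discrete Fourier transform (same result as an FFT, just slower)."""
    n = len(signal)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
            for k in range(n)]

def idft(spectrum):
    """Inverse transform; returns the real part of the reconstructed signal."""
    n = len(spectrum)
    return [sum(c * cmath.exp(2j * math.pi * k * t / n) for k, c in enumerate(spectrum)).real / n
            for t in range(n)]

def high_pass(signal, cutoff):
    """Zero every frequency bin below cutoff (and its mirror image), keep the rest."""
    n = len(signal)
    spectrum = dft(signal)
    filtered = [c if cutoff <= min(k, n - k) else 0 for k, c in enumerate(spectrum)]
    return idft(filtered)

# Hypothetical traffic: a constant level, one slow cycle, and a spike at t = 40.
n = 64
traffic = [10 + 5 * math.sin(2 * math.pi * t / n) for t in range(n)]
traffic[40] += 8

residual = high_pass(traffic, cutoff=3)
spike_at = max(range(n), key=lambda t: abs(residual[t]))
print(spike_at, round(residual[spike_at], 3))
```

After the slow cycle and the constant level are removed, the spike is the largest remaining value, which is the behaviour the note describes for high-frequency components.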


  • Feature selection algorithm
    Reduces dimensionality
    Iteratively select uncorrelated features with most variance
    Applicable to traffic volume & other features
    Source/Destination IP Address
    Source/Destination Ports
    Packet Size
    Volume complements feature distribution
    Batch method of selecting representative data
    Sensitive to input parameters due to temporal correlation
    * Functional Combination of Features
    Variance (accounts for as much of the variability of the data as possible)

    http://conferences.sigcomm.org/sigcomm/2004/papers/p405-lakhina111.pdf (volume-based, separate into normal component and noisy component [which contains spikes])
    http://machinelearning.wustl.edu/mlpapers/paper_files/NIPS2006_868.pdf http://www.dtic.mil/dtic/tr/fulltext/u2/a465712.pdf (application to intra-network anomaly detection toward addressing scalability concerns, stochastic matrix perturbation theory; claims upper bound to false positives)
    http://db.cs.berkeley.edu/papers/infocom07-pca.pdf (stochastic matrix perturbation theory, reduces communication cost by 80-90%, allowing smaller time buckets than Lakhina by detecting at the sensor level, volume based)
    https://ics.forth.gr/netlab/mobile/Bibliography/LoadBalancing/LB/PCA_Anomaly_Deytection.pdf (Sensitivity of PCA to number of principal components, volume based)
    http://hal.univ-savoie.fr/file/index/docid/620090/filename/infocom2009.pdf (Shows that temporal correlation in the data breaks PCA, extending work by Ringberg; feature based, uses smoothing filter, shows application to stochastic processes; better results from removing low-to-mid frequency trends [daily, weekly])
    http://users.ece.gatech.edu/~jic/anomaly-book-chap-09.pdf (2.4, background info)
    http://www.researchgate.net/profile/Monowar_Bhuyan/publication/260521527_Network_Anomaly_Detection_Methods_Systems_and_Tools/links/00b49539bad485a81b000000.pdf (Recent Survey Paper)