Live Anomaly Detection
Identifying anomalies in live data
Dhruv Choudhary, Francois Orsini, Arun Kejariwal
SATORI #StrataData
What is live data?
Petabytes of Data, Live Reactions
Time to reaction: less than about 5 msec
Live data is new streaming data that needs to be reacted upon instantly.
Where is live data?
Live Data Properties
Most recent and real-time reactive data
Most Recent
Low Latency
Unstructured
Highly Reactive
High Throughput
Satori - A Live Data Computation Mesh
Satori powers a live data mesh that is capable of managing data flows from billions of endpoints simultaneously at millisecond latency.
Live Data and Big Data?
Live Data Platform
• Satori is a fully managed platform as a service.
• Connect, process and react to streaming live data at ultra-low latency.
• Use cases include (but are not limited to) IoT, mobile fitness, gaming and smart cities.
In-Stream Live Anomaly Detection
Data Quadrants
What are live anomalies?
Data types: Audio, Time Series, Video, Text, Binary
Examples: gunshot sound, stock market crash, profanity filters, road accident
Audio Time Series
[Diagram labels: Audio, Time Series, Prediction Error; example: engine misfiring]
Applications: Connected Cars, Surveillance, Equipment Malfunction
FFT Window Features: Timbre, Tempo, Dynamics
Other approaches: WaveNet
Text to Time Series
[Diagram labels: Text, Word2vec Averaging, Clustering (clusters C0, C1, C2), Time Series]
Applications: Word Anomalies, Paragraph Anomalies, Novelty Detection
Video to Time Series
[Diagram labels: Video, Deep Encoders, Representation, Time Series]
Shape Anomalies
Attributes of Live Anomaly Detection
Model Selection
Type of Anomaly
Incremental
Robust
False Alarm Rate
Labels
Time Granularity
Type of Anomaly
POINT ANOMALY
Individual points that break the pattern made by adjoining points
CHANGEPOINT
Change in mean, variance or structure of the series
PATTERN ANOMALY
A group of points that collectively form a pattern never seen before
TREND ANOMALY
Significant perturbation in the long-term trend of a series
Incremental
MEMORY CONSTRAINTS
Can all the data be loaded into memory?
COMPUTE CONSTRAINTS
Can you keep up with the data rate?
EVOLUTIONARY DATA
Is the structure of the data changing continuously?
Robust
[Figure labels: True Positives, True Negatives]
DATA OBSOLESCENCE
How fast should we forget past anomalies?
ANOMALY CLUSTERING
Anomalies usually occur close to each other
False Alarm Rate
CONSTANT RATE
How much human attention to allocate?
Labels
SEMI-SUPERVISED LEARNING
Use unlabelled data for training
SUPERVISED LEARNING
Separate positive and negative samples
[Diagram labels: Label, Train, Inference]
Time Granularity
Hourly Series, Daily Seasonality
Minutely Series, Hourly Seasonality
Secondly Series, Seasonal Jitter
Research Domains
Statistics
Pattern Mining
Time Series
Machine Learning
Stream Clustering
Deep Learning
Statistics
PARAMETRIC STATISTICS
Anomaly detection based on strong distribution assumptions
µ ± 3σ
Poisson (λ)
p-value based
Point Anomalies
Incremental
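A minimal sketch of the µ ± 3σ rule applied incrementally, assuming a roughly Gaussian stream. The class name RunningThreeSigma, the warm-up length and the sample values are illustrative assumptions, not taken from the talk. The running mean and variance use Welford's update, so no history is stored.

```python
import math


class RunningThreeSigma:
    """Incremental mean/variance (Welford's method) with a mu +/- k*sigma check."""

    def __init__(self, k=3.0, warmup=10):
        self.k = k            # width of the band in standard deviations
        self.warmup = warmup  # points to observe before flagging anything (assumed)
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0         # running sum of squared deviations from the mean

    def update(self, x):
        """Score x against the current estimate, then fold it into the estimate."""
        is_anomaly = False
        if self.n >= self.warmup:
            sigma = math.sqrt(self.m2 / (self.n - 1))
            is_anomaly = abs(x - self.mean) > self.k * sigma
        # Welford update: O(1) memory, one pass over the stream.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_anomaly


detector = RunningThreeSigma()
stream = [10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 10.3, 9.7, 10.1, 9.9, 25.0, 10.0]
flags = [detector.update(x) for x in stream]
```

In this sketch the flagged value is still folded into the estimate and inflates σ afterwards, which is the weakness that robust statistics are meant to address.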
ROBUST STATISTICS
Rejecting the effect of anomalies while modeling the distribution
Median-MAD, Winsorization
Grubbs' test, Generalized ESD
Student’s t-test
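A sketch of the Median-MAD idea over a sliding window: the median and the median absolute deviation stand in for the mean and standard deviation, so earlier anomalies in the window barely shift the baseline. The window size, the 3.5 cutoff, the sample data and the function names are assumptions chosen for illustration.

```python
import statistics
from collections import deque


def mad_score(window, x):
    """Robust z-score of x relative to the median and MAD of `window`."""
    med = statistics.median(window)
    mad = statistics.median(abs(v - med) for v in window)
    if mad == 0:
        return 0.0 if x == med else float("inf")
    # 1.4826 scales the MAD to match the standard deviation under normality.
    return abs(x - med) / (1.4826 * mad)


def detect(stream, window_size=20, threshold=3.5):
    """Flag each point whose robust score against the previous window is too high."""
    window = deque(maxlen=window_size)
    flags = []
    for x in stream:
        full = len(window) == window_size
        flags.append(full and mad_score(window, x) > threshold)
        window.append(x)  # anomalies enter the window but hardly move the median
    return flags


stream = [10.0, 10.2, 9.9, 10.1, 9.8] * 4 + [25.0, 10.0, 24.0, 10.1]
flags = detect(stream)
```

Here the second spike (24.0) is still caught even though the first one (25.0) is already inside the window.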
NON-PARAMETRIC STATISTICS
Histogram based techniques
t-digest
Adjusted Box plots
[Figure: normal curve, 99.73 % of the mass within µ ± 3σ and 0.27 % outside]
Time Series Analysis
AUTOREGRESSIVE MODELS
Model the autocorrelation
NON-PARAMETRIC MODELS
No distribution assumption about the structure of residuals
DIMENSIONALITY REDUCTION
Model regular perturbations using a lower-rank representation
SEASONAL STRUCTURE
Regular pattern that occurs at a known seasonal period
TREND STRUCTURE
Long-term change in the level of the series
EVOLUTIONARY STRUCTURE
Changing (unknown) structure of the time series
ARMA, SARMA, EWMA, TBATS
Model estimation based on past data
Point Anomalies
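To illustrate the EWMA entry above, a small sketch that tracks an exponentially weighted mean and variance and flags a point when its residual exceeds k standard deviations. The class name EwmaDetector, alpha, k, the warm-up count and the sample data are assumptions for illustration; ARMA, SARMA and TBATS would need a fitted model and are not shown.

```python
import math


class EwmaDetector:
    """Exponentially weighted mean/variance with a k-sigma test on the residual."""

    def __init__(self, alpha=0.2, k=3.0, warmup=5):
        self.alpha = alpha    # weight of the newest observation (assumed value)
        self.k = k
        self.warmup = warmup  # observations to see before flagging (assumed value)
        self.n = 0
        self.mean = 0.0
        self.var = 0.0

    def update(self, x):
        if self.n == 0:
            self.mean = x
            self.n = 1
            return False
        residual = x - self.mean
        is_anomaly = (
            self.n >= self.warmup
            and abs(residual) > self.k * math.sqrt(self.var)
        )
        # Incremental exponentially weighted updates of mean and variance.
        self.mean += self.alpha * residual
        self.var = (1 - self.alpha) * (self.var + self.alpha * residual ** 2)
        self.n += 1
        return is_anomaly


detector = EwmaDetector()
stream = [10.0, 10.5, 9.5, 10.2, 9.8, 10.1, 9.9, 10.3, 9.7, 10.0, 18.0, 10.1]
flags = [detector.update(x) for x in stream]
```

Older observations decay geometrically, so the estimate follows slow drift in the level of the series without a fixed window.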
STL, LOESS
Non-parametric regression to model time series
Point Anomalies
PCA, RobustPCA
Principal Component Analysis
EDM, BCP, SDAR
Breakout Detection, Sequential Discounting
Point Anomalies
Pattern Mining
Mark the rarest elements in the stream as anomalies
Inter-arrival times for patterns
HOT SAX
Rare-Rule Anomaly
Pattern Anomalies, Incremental, False Alarm Rate, Robust
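One simplified reading of "mark the rarest elements in the stream as anomalies": cut the series into fixed-width windows, turn each window into a short symbolic word, count the words, and report the ones seen least often. This is only an illustration of the counting idea, not an implementation of HOT SAX or Rare-Rule Anomaly. The breakpoints, alphabet, window width and function names are assumptions.

```python
from collections import Counter


def to_word(window, breakpoints=(-0.5, 0.5), alphabet="abc"):
    """Map a window to a symbolic word by z-normalising and binning each point."""
    mean = sum(window) / len(window)
    std = (sum((v - mean) ** 2 for v in window) / len(window)) ** 0.5 or 1.0
    word = []
    for v in window:
        z = (v - mean) / std
        idx = sum(z > b for b in breakpoints)  # number of breakpoints below z
        word.append(alphabet[idx])
    return "".join(word)


def rare_patterns(series, width=4, max_count=1):
    """Return (start index, word) for windows whose word occurs at most max_count times."""
    starts = range(0, len(series) - width + 1, width)  # non-overlapping windows
    words = [to_word(series[i:i + width]) for i in starts]
    counts = Counter(words)
    return [(i * width, w) for i, w in enumerate(words) if counts[w] <= max_count]


series = [0, 1, 0, -1] * 3 + [1, 1, -1, -1] + [0, 1, 0, -1] * 3
result = rare_patterns(series)
```

On a live stream the counter would be updated as windows arrive, with some forgetting so that old patterns do not dominate the counts.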
Clustering
DBSCAN, k-means, c-means
CluStream, DenStream, ClusTree, D-Stream, HPStream, DBStream
Micro-Clusters: online micro-clustering, then offline clustering of micro-clusters (MCs)
Clustering: DBStream
[Figure: runtimes of ClusTree, DenStream and DBStream]
Pattern Anomalies, Incremental, Robust
Deep Learning
LSTM, Encoders, LSTM Auto-Encoders
[Diagram labels: Input, Reconstructed Input, Anomaly]
Explicitly models time series structure
Non-linear dimensionality reduction without modeling time series structure
Performance degrades as the modality of the series increases
Deep Learning: LSTM
No need for a fixed-size window for model estimation
Time Series Prediction: Input, Prediction (Point Anomalies)
Classifier: Input, Labels, Anomaly (Point Anomalies)
Time Series Pattern Prediction: Input, Pattern Prediction (Pattern Anomalies)
Deep Learning: LSTM
Time Series Pattern Prediction: Input, Pattern Prediction (Pattern Anomalies)
Multiple predicted values for each future observation
Model the errors as a multivariate Gaussian to find anomalous observations
Model the Euclidean distance between predicted and true sequences as the prediction error
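A small sketch of the Euclidean-distance variant: the prediction error of a pattern is the distance between the predicted and the observed sequence, and a pattern is flagged when that error exceeds a threshold. The predictions are assumed to come from some already trained model (an LSTM in the talk, not implemented here). The function names, sample sequences and threshold are illustrative assumptions.

```python
import math


def sequence_error(predicted, actual):
    """Euclidean distance between a predicted and an observed sequence."""
    if len(predicted) != len(actual):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)))


def flag_patterns(pairs, threshold):
    """Return indices of (predicted, actual) pairs whose error exceeds threshold."""
    return [i for i, (p, a) in enumerate(pairs) if sequence_error(p, a) > threshold]


pairs = [
    ([1, 2, 3], [1.1, 2.0, 2.9]),  # small error
    ([1, 2, 3], [1, 2, 3]),        # exact prediction
    ([1, 2, 3], [4, 6, 3]),        # large error, distance 5
]
anomalous = flag_patterns(pairs, threshold=1.0)
```

In practice the threshold would be chosen from the error distribution on normal data rather than fixed by hand.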
Deep Learning: LSTM Encoders
Time Series Reconstruction (Pattern Anomalies)
[Diagram labels: Input, Encoder Network, Decoder Network, Pattern Prediction]
[Figure: runtimes of LSTM-Encoder, LSTM and AutoEncoder]
Multi-dimensionality
Correlation in Anomalies
[Figure: series s1 to s8 and a graph linking series with correlated anomalies]
Naive Algorithm
Majority vote across all dimensions
Model Correlation
Correlation in anomaly space can be captured in a graph
What about jitter in anomalies?
Model anomalies in fixed-size buckets of time
What about contextual anomalies?
Modeling correlation in the space of the whole series is very expensive for live data
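The naive algorithm and the bucketing answer to jitter can be sketched together: per-series detections are grouped into fixed-size time buckets, and a bucket counts as a multi-dimensional anomaly when more than half of the series report something in it. The function name, the bucket size and the sample detections are assumptions for illustration.

```python
from collections import defaultdict


def majority_vote(anomalies, n_series, bucket_size):
    """anomalies: iterable of (series_id, timestamp) point detections.

    Returns the sorted bucket indices in which more than half of the
    n_series dimensions reported at least one anomaly.
    """
    voters = defaultdict(set)
    for series_id, ts in anomalies:
        # Detections that differ by small jitter fall into the same bucket.
        voters[int(ts // bucket_size)].add(series_id)
    return sorted(b for b, ids in voters.items() if len(ids) > n_series / 2)


detections = [
    ("s1", 101.2), ("s2", 103.9), ("s3", 108.0),  # three of four series agree
    ("s1", 250.0), ("s4", 255.0),                 # only two of four
    ("s2", 402.0),                                # a single series
]
buckets = majority_vote(detections, n_series=4, bucket_size=10)
```

One limitation of fixed buckets is that detections straddling a bucket boundary are split across two buckets and may miss the majority.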
LSTM
No need for a fixed-size window for model estimation
Can model high dimensionality
Works with non-stationary time series with irregular structure
Does not work with evolutionary series
Model Selection
[Figure: runtimes for single-dimensional and multi-dimensional series. Methods shown: Statistics, Time Series Analysis, HOT SAX/RRA, OneSVM/Iforest, LSTM, DBStream]
Operations
Business Metrics, Performance Metrics, Health Metrics
What kind of anomalies: Point Anomalies, Change Points, Trend Anomalies
Latency Sensitivity: < 100 msec
Healthcare
Electrocardiograms, Fitness Trackers
ECG, Driver Stress, Epilepsy onset
What kind of anomalies: Pattern anomalies, Change points
Latency Sensitivity: < 1 msec
Transportation
Traffic Routing, Passenger Load, Scheduling
What kind of anomalies: Point anomalies, Change points, Trend anomalies, Spatial anomalies
Latency Sensitivity: < 5 seconds
Financial Data
Stock Trades, Share Prices
What kind of anomalies: Point anomalies, Change points
Latency Sensitivity: < 100 microseconds
Security
Network Intrusion, Video Surveillance
What kind of anomalies: Point anomalies, Change points
Latency Sensitivity: < 5 msec
Internet of Things
Smart Homes, Connected Cars, Smart Devices
What kind of anomalies: Pattern anomalies, Change points, Trend anomalies
Latency Sensitivity: < 5 msec
Check our tech @ satori.com
@choudharydhruv
@SatoriLiveData
@arun_kejariwal @FrancoisOrsini_
Resources
“Computing Extremely Accurate Quantiles using t-Digests”, https://github.com/tdunning/t-digest
https://deepmind.com/blog/wavenet-generative-model-raw-audio/
https://www.tensorflow.org/tutorials/word2vec
https://blog.keras.io/building-autoencoders-in-keras.html
“Deep Learning for Time Series Analysis”, https://arxiv.org/pdf/1701.01887.pdf
Readings
“Using Natural Language Processing Models for Understanding Network Anomalies”, HPEC’17.
“Deep Recurrent Neural Network-Based Autoencoders for Acoustic Novelty Detection”, CIN’17.
“Collective Anomaly Detection based on Long Short Term Memory Recurrent Neural Network”, FDSE’16.
“Deep Structured Energy Based Models for Anomaly Detection”, ICML’16.
“Variational Inference for On-line Anomaly Detection in High-Dimensional Time Series”, ICML’16.
“Long Short Term Memory Networks for Anomaly Detection in Time Series”, ESANN’15.
“Clustering Data Streams based on Shared Density Between Clusters”, TKDE’16.
“LSTM-based Encoder-Decoder for Multi-sensor Anomaly Detection”, ICML’16 Anomaly Detection Workshop.
“Sequence to Sequence Model for Anomaly Detection in Financial Transactions”, ICML’16.
“MS-LSTM: a Multi-Scale LSTM Model for BGP Anomaly Detection”, NetworkML’16.
“Anomaly detection: A survey”, ACM Computing Surveys, 2009.
“Time Series Analysis by State Space Methods”, by J. Durbin and S. J. Koopman, 2001.
“HOT SAX: Efficiently Finding the Most Unusual Time Series Subsequence”, ICDM, 2005
“Real-time change-point detection using sequentially discounting normalized maximum likelihood coding”, Advances in Knowledge Discovery and Data Mining, 2011.
“Unsupervised Learning of Video Representations using LSTMs”, ICML’15.
Thank you
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 

Live Anomaly Detection in Streaming Data

  • 1. Live Anomaly Detection: Identifying anomalies in live data. Dhruv Choudhary, Francois Orsini, Arun Kejariwal
  • 2. SATORI #StrataData What is live data? Petabytes of data, live reactions. Time to reaction: less than ~5 msecs.
  • 3. What is live data? Live data is new streaming data that must be reacted to instantly.
  • 5. Live Data Properties: most recent and real-time reactive data. Most recent, low latency, unstructured, highly reactive, high throughput.
  • 6. Satori - A Live Data Computation Mesh. Satori powers a live data mesh that is capable of managing data flows from billions of endpoints simultaneously at millisecond latency.
  • 8. Live Data and Big Data?
  • 10. Live Data Platform. Satori is a fully managed platform as a service. Connect, process and react to streaming live data at ultra-low latency. Use cases are within (but not limited to) IoT, mobile fitness, gaming and smart cities.
  • 11. In-Stream Live Anomaly Detection
  • 13. What are live anomalies? Data types: audio, time series, video, text, binary. Examples: gunshot sound, stock market crash, profanity filters, road accident.
  • 14. Audio Time Series. Audio series, time series, prediction error. Example: engine misfiring. Features: FFT window features; timbre, tempo, dynamics; WaveNet; other approaches. Applications: connected cars, surveillance, equipment malfunction.
  • 15. Text to Time Series. Text, Word2vec averaging, clustering (C0, C1, C2), time series. Applications: word anomalies, paragraph anomalies, novelty detection.
  • 16. Video to Time Series. Video, deep encoders, representation, time series. Shape anomalies.
  • 17. Attributes of Live Anomaly Detection. Model selection: type of anomaly, incremental, robust, false alarm rate, labels, time granularity.
  • 18. Type of Anomaly. Point anomaly: individual points that break the pattern made by adjoining points. Changepoint: change in mean, variance or structure of the series. Pattern anomaly: a group of points that collectively form a pattern never seen before. Trend anomaly: significant perturbation in the long-term trend of a series.
  • 19. Incremental. Memory constraints: can all the data be loaded into memory? Compute constraints: can you keep up with the data rate? Evolutionary data: is the structure in the data changing continuously?
  • 20. Robust. Data obsolescence: how fast should we forget past anomalies? Anomaly clustering: anomalies usually occur close to each other. (Figure labels: true positives, true negatives.)
  • 21. False Alarm Rate. Constant rate: how much human attention to allocate?
  • 22. Labels. Semi-supervised learning: use unlabelled data for training. Supervised learning: separate positive and negative samples. Workflow: label, train, inference.
  • 23. Time Granularity. Hourly series, daily seasonality. Minutely series, hourly seasonality. Secondly series, seasonal jitter.
  • 25. Statistics. Parametric statistics: anomaly detection based on strong distribution assumptions. µ ± 3σ, Poisson(λ), p-value based. Point anomalies. Incremental.
  • 26. Statistics. Robust statistics: rejecting the effect of anomalies while modeling the distribution. Median-MAD, Winsorization, Grubbs' test, Generalized ESD, Student's t-test.
  • 27. Statistics. Non-parametric statistics: histogram-based techniques, t-digest, adjusted box plots. (Figure labels: 99.73 %, 0.27 %.)
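
The parametric µ ± 3σ rule and the robust Median-MAD rule from the Statistics slides can be sketched as follows. This is a minimal illustration, not the presenters' implementation: the class and function names, the Welford running-variance update, and the 3.5 cutoff on the modified z-score are assumptions chosen for the example.

```python
import math
import statistics


class RunningThreeSigma:
    """Incremental mean/variance (Welford's update) with a mu +/- k*sigma test."""

    def __init__(self, k=3.0):
        self.k = k
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def is_anomaly(self, x):
        """Score x against the statistics seen so far, then absorb it."""
        flagged = False
        if self.n >= 2:
            std = math.sqrt(self.m2 / (self.n - 1))
            flagged = std > 0 and abs(x - self.mean) > self.k * std
        # Welford update: constant memory, one pass over the stream.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return flagged


def mad_outliers(values, threshold=3.5):
    """Return indices whose modified z-score (median/MAD based) exceeds threshold."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return []  # degenerate window: more than half the values are identical
    # 0.6745 rescales the MAD so it is comparable to a Gaussian sigma.
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]
```

The running detector is incremental but lets each anomaly leak into the mean and variance; the MAD version needs a window of values in memory but is far less affected by the outliers it is looking for.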
  • 28. Time Series Analysis. Models: autoregressive models (model the autocorrelation); non-parametric models (no distribution assumption about the structure of residuals); dimensionality reduction (model regular perturbations using a lower-rank representation). Structure: seasonal structure (regular pattern that occurs at a known seasonal period); trend structure (long-term change in the level of the series); evolutionary structure (changing, unknown structure of the time series). ARMA, SARMA, EWMA, TBATS: model estimation based on past data. Point anomalies.
  • 29. Time Series Analysis. STL, LOESS: non-parametric regression to model time series. Point anomalies.
  • 30. Time Series Analysis. PCA, RobustPCA: principal component analysis. EDM, BCP, SDAR: breakout detection, sequential discounting. Point anomalies.
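
Of the models named on the Time Series Analysis slides, EWMA is the simplest to run incrementally. The sketch below is one possible reading, not the deck's code: it forecasts the next value with an exponentially weighted mean, tracks an exponentially weighted variance of the residuals, and flags a point whose residual exceeds k standard deviations. The class name, `alpha`, `k`, `warmup`, and the choice to skip model updates on flagged points are all assumptions.

```python
import math


class EwmaDetector:
    """Point-anomaly detector based on an exponentially weighted mean/variance."""

    def __init__(self, alpha=0.3, k=3.0, warmup=5):
        self.alpha = alpha    # smoothing factor; larger values forget faster
        self.k = k            # residual threshold in standard deviations
        self.warmup = warmup  # points to observe before flagging anything
        self.level = None     # current EWMA forecast
        self.var = 0.0        # exponentially weighted residual variance
        self.n = 0

    def update(self, x):
        """Return True if x is anomalous with respect to the current forecast."""
        if self.level is None:
            self.level = float(x)
            self.n = 1
            return False
        resid = x - self.level
        std = math.sqrt(self.var)
        flagged = bool(self.n >= self.warmup and std > 0
                       and abs(resid) > self.k * std)
        if not flagged:
            # Absorb only normal points so a spike does not inflate the model.
            self.var = (1 - self.alpha) * (self.var + self.alpha * resid * resid)
            self.level += self.alpha * resid
        self.n += 1
        return flagged
```

Skipping the update on flagged points keeps a single spike from widening the band, but it also means a genuine level shift would be flagged repeatedly; a changepoint method would be needed to handle that case.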
  • 31. Pattern Mining. Mark the rarest elements in the stream as anomalies. Inter-arrival times for patterns. HOT SAX, Rare Rule Anomaly. Pattern anomalies. Attributes: incremental, false alarm rate, robust.
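
To make "rarest elements" concrete, the sketch below does a brute-force search for the top discord: the subsequence whose nearest non-overlapping neighbour is farthest away. This is the quadratic search that HOT SAX is designed to speed up, not HOT SAX itself (there is no SAX discretization or heuristic ordering here). The function names and the window length `m` are assumptions.

```python
import math


def znorm(window):
    """Z-normalize a window so that only its shape matters."""
    mean = sum(window) / len(window)
    sd = math.sqrt(sum((v - mean) ** 2 for v in window) / len(window))
    return [(v - mean) / sd if sd > 0 else 0.0 for v in window]


def top_discord(series, m):
    """Return (start, distance) of the length-m subsequence that is farthest
    from its nearest non-overlapping neighbour."""
    windows = [znorm(series[i:i + m]) for i in range(len(series) - m + 1)]
    best_start, best_dist = -1, -1.0
    for i, a in enumerate(windows):
        nearest = math.inf
        for j, b in enumerate(windows):
            if abs(i - j) < m:
                continue  # overlapping windows are trivial matches
            nearest = min(nearest, math.dist(a, b))
        if nearest != math.inf and nearest > best_dist:
            best_start, best_dist = i, nearest
    return best_start, best_dist
```

The cost is quadratic in the number of windows, so for live data this would only be practical over a bounded recent buffer.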
  • 34. Deep Learning. LSTM, encoders, LSTM auto-encoders. Explicitly models time series structure. Non-linear dimensionality reduction without modeling time series structure. Performance degrades as the modality of the series increases. (Figure labels: input, reconstructed input, anomaly.)
  • 35. Deep Learning: LSTM. Time series prediction: input, prediction, point anomalies. Classifier: input, labels, anomaly, point anomalies. Time series pattern prediction: input, pattern prediction, pattern anomalies. No need for a fixed-size window for model estimation.
  • 36. Deep Learning: LSTM. Time series pattern prediction: input, pattern prediction, prediction error, pattern anomalies. Multiple predicted values for each future observation. Model the errors as a multivariate Gaussian to find anomalous observations. Model the Euclidean distance between predicted and true sequences as the error.
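
The last point of slide 36, using the Euclidean distance between predicted and true sequences as the error, can be sketched independently of the predictor. The example assumes some model (an LSTM in the deck) has already produced the predicted sequences; the function names, the mean + k·σ threshold fitted on errors from normal data, and k = 3 are assumptions for illustration, and this is the simpler univariate variant rather than the multivariate Gaussian one.

```python
import math
import statistics


def sequence_error(predicted, actual):
    """Euclidean distance between a predicted and an observed sequence."""
    return math.dist(predicted, actual)


def fit_threshold(normal_errors, k=3.0):
    """Fit a mean + k*sigma cutoff on errors measured on anomaly-free data."""
    return statistics.fmean(normal_errors) + k * statistics.stdev(normal_errors)


def flag_patterns(pairs, threshold):
    """Flag each (predicted, actual) pair whose error exceeds the cutoff."""
    return [sequence_error(p, a) > threshold for p, a in pairs]


# Usage: calibrate on pairs from a normal period, then score new pairs.
normal_pairs = [
    ([1, 2, 3], [1.1, 2, 3]),
    ([2, 3, 4], [2, 3.2, 4]),
    ([3, 4, 5], [3, 4, 4.9]),
    ([4, 5, 6], [4.1, 5.1, 6]),
]
threshold = fit_threshold([sequence_error(p, a) for p, a in normal_pairs])
flags = flag_patterns([([5, 6, 7], [5.1, 6, 7]), ([5, 6, 7], [5, 9, 7])], threshold)
```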
  • 37. Deep Learning: LSTM Encoders. Time series reconstruction: input, encoder network, decoder network, pattern prediction, pattern anomalies. Runtimes: LSTM encoder, LSTM auto-encoder.
  • 38. Multi-dimensionality: Correlation in Anomalies. Model correlation: correlation in anomaly space can be captured in a graph (nodes s1 to s8). Naive algorithm: majority vote across all dimensions. What about jitter in anomalies? Model anomalies in fixed-size buckets of time. What about contextual anomalies? Modeling correlation in the space of the whole series is very expensive for live data.
  • 39. Multi-dimensionality. What about jitter in anomalies? Model anomalies in fixed-size buckets of time. What about contextual anomalies? Modeling correlation in the space of the whole series can be very expensive. LSTM: no need for a fixed-size window for model estimation; can model high dimensionality; works with non-stationary time series with irregular structure; does not work with evolutionary series.
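
The naive multi-dimensional algorithm (majority vote across dimensions, with anomalies grouped into fixed-size time buckets to absorb jitter) might look like the sketch below. The function name, the `(timestamp, series_id)` event format, the one-second default bucket and the strict-majority quorum are assumptions; the slides do not specify them.

```python
from collections import defaultdict


def bucketed_majority_vote(anomalies, n_series, bucket_seconds=1.0, quorum=0.5):
    """Return the bucket indices in which more than quorum * n_series
    distinct series reported an anomaly.

    anomalies: iterable of (timestamp_seconds, series_id) pairs produced by
    per-series detectors.
    """
    voters = defaultdict(set)
    for ts, series_id in anomalies:
        # Anomalies that jitter by less than one bucket usually share a bucket.
        voters[int(ts // bucket_seconds)].add(series_id)
    return sorted(bucket for bucket, ids in voters.items()
                  if len(ids) > quorum * n_series)


# Usage: four series; buckets 0 and 3 reach a majority, bucket 2 does not.
events = [(0.1, "s1"), (0.4, "s2"), (0.9, "s3"), (2.2, "s1"),
          (3.5, "s1"), (3.6, "s2"), (3.7, "s3"), (3.8, "s4")]
joint = bucketed_majority_vote(events, n_series=4)
```

Fixed buckets can still split a burst that straddles a boundary; overlapping buckets are one way around that, at the cost of counting some events twice.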
  • 40. Model Selection. Single-dimensional series and multi-dimensional series. (Runtime charts compare DBStream, Statistics, Time Series Analysis, HOTSAX/RRA, OneSVM/iForest and LSTM.)
  • 42. Operations: business metrics, performance metrics, health metrics. What kind of anomalies: point anomalies, change points, trend anomalies. Latency sensitivity: < 100 msec.
  • 43. Healthcare: electrocardiograms, fitness trackers. What kind of anomalies: pattern anomalies, change points. Latency sensitivity: < 1 msec. Examples: ECG, driver stress, epilepsy onset.
  • 44. Transportation: traffic routing, passenger load, scheduling. What kind of anomalies: point anomalies, change points, trend anomalies, spatial anomalies. Latency sensitivity: < 5 seconds.
  • 45. Financial Data: stock trades, share prices. What kind of anomalies: point anomalies, change points. Latency sensitivity: < 100 microseconds.
  • 46. Security: network intrusion, video surveillance. What kind of anomalies: point anomalies, change points. Latency sensitivity: < 5 msecs.
  • 47. Internet of Things: smart homes, connected cars, smart devices. What kind of anomalies: pattern anomalies, change points, trend anomalies. Latency sensitivity: < 5 msecs.
  • 49. Check our tech @ satori.com. @choudharydhruv @SatoriLiveData @arun_kejariwal @FrancoisOrsini_
  • 50. Resources: “Computing Extremely Accurate Quantiles using t-Digests”, https://github.com/tdunning/t-digest ; https://deepmind.com/blog/wavenet-generative-model-raw-audio/ ; https://www.tensorflow.org/tutorials/word2vec ; https://blog.keras.io/building-autoencoders-in-keras.html ; “Deep Learning for Time Series Analysis”, https://arxiv.org/pdf/1701.01887.pdf
  • 51. Readings: “Using Natural Language Processing Models for Understanding Network Anomalies”, HPEC’17. “Deep Recurrent Neural Network-Based Autoencoders for Acoustic Novelty Detection”, CIN’17. “Collective Anomaly Detection based on Long Short Term Memory Recurrent Neural Network”, FDSE’16. “Deep Structured Energy Based Models for Anomaly Detection”, ICML’16. “Variational Inference for On-line Anomaly Detection in High-Dimensional Time Series”, ICML’16.
  • 52. Readings: “Long Short Term Memory Networks for Anomaly Detection in Time Series”, ESANN’15. “Clustering Data Streams based on Shared Density Between Clusters”, TKDE’16. “LSTM-based Encoder-Decoder for Multi-sensor Anomaly Detection”, ICML’16 Anomaly Detection Workshop. “Sequence to Sequence Model for Anomaly Detection in Financial Transactions”, ICML’16. “MS-LSTM: a Multi-Scale LSTM Model for BGP Anomaly Detection”, NetworkML’16.
  • 53. Readings: “Anomaly detection: A survey”, ACM Computing Surveys, 2009. “Time Series Analysis by State Space Methods”, by J. Durbin and S. J. Koopman, 2001. “HOT SAX: Efficiently Finding the Most Unusual Time Series Subsequence”, ICDM, 2005. “Real-time change-point detection using sequentially discounting normalized maximum likelihood coding”, Advanced Knowledge Discovery Data Mining, 2011. “Unsupervised Learning of Video Representations using LSTMs”, ICML’15.