Andreu Mora, Adyen
Time series forecasting and monitoring with Apache Spark and ElasticSearch
#UnifiedDataAnalytics #SparkAISummit
Adyen
Payments Processor
Tech company
International customers (aka merchants)
Omnichannel
Back in the day…
The legacy monitor was based on a SQL query that computed an average for the hour of the week and compared it to a threshold.
It did not quite work:
• It generated loads of false positives.
• It was heavily trimmed down: it only covered the top merchants.
Reduce False Positives
Catch anomalies
Do that at scale
Harness the detection performance
Connect to a live platform
OK, but
What is an anomaly?
No luxury of a labelled dataset, and opinions diverge.
Connecting to a live platform without ML deployment hooks ready.
We were working on MLflow but were not there yet.
No standard for time series forecasting at scale.
With Spark, there are several choices.
Considerations when dealing with Big Data
Big Technology
Leverage mature tech to solve the problem (hello Spark).
Big diversity
Many different topologies for our merchants, and yet one algorithm to track them all.
Big consequences
1000 merchants * one check every 10 min * 95% accuracy = 50400 false-alarm emails/week
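The "big consequences" figure can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes the slide means one check per merchant every 10 minutes, with every wrong decision turning into an email; the function name is hypothetical.

```python
MINUTES_PER_WEEK = 7 * 24 * 60  # 10080


def false_alarms_per_week(merchants, check_interval_min, accuracy):
    """Expected number of wrong decisions per week across all merchants."""
    checks_per_merchant = MINUTES_PER_WEEK // check_interval_min
    total_checks = merchants * checks_per_merchant
    return round(total_checks * (1 - accuracy))


# 1000 merchants, one check every 10 minutes, 95% accuracy
print(false_alarms_per_week(1000, 10, 0.95))
```

Under these assumptions a 95% accurate detector still floods the inbox, which is why the false positive rate matters more than raw accuracy.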
Big Data Platform
Volumes Predictions
TimeSeries Ecosystem
Flint
Spark-ts
FB Prophet
statsmodels
Data size consideration
1 year @ 1 min @ 64-bit double = 4.2 MB
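The data size estimate follows from counting samples. A small sketch, assuming a non-leap year, one value per minute, 8 bytes per value and decimal megabytes; the function name is hypothetical.

```python
def series_size_mb(days, samples_per_day, bytes_per_sample=8):
    """Raw size of one time series in megabytes (1 MB = 1e6 bytes)."""
    return days * samples_per_day * bytes_per_sample / 1e6


# One year of per-minute volumes stored as 64-bit floats
size = series_size_mb(365, 24 * 60)
print(round(size, 1), "MB")
```

A single merchant's series is small enough to fit comfortably in memory, so the scale problem comes from the number of merchants rather than from any one series.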
Scoring in Java
While we work on a fully functional engine to deploy ML models based on MLflow. Launch fast and iterate!
Transporting the model
The model is transported for tens of thousands of accounts, so it needs to be lightweight.
Harness the maths
No black-box models: the equations need to be understood and replicated in Java.
Needs to perform fast
Score and decide whether the traffic seen from ElasticSearch is actually anomalous, on the ms scale.
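To illustrate why a linear model keeps scoring cheap and easy to replicate, here is a minimal sketch of scoring one observation against a transported payload. The slides say production scoring runs in Java; Python is used here only for illustration, and `ModelPayload`, its fields and the residual bounds are hypothetical names, not the actual payload format.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ModelPayload:
    """Hypothetical lightweight payload: a handful of floats per merchant."""
    intercept: float
    coefficients: Sequence[float]
    lower_residual: float  # assumed lower bound on (observed - expected)
    upper_residual: float  # assumed upper bound on (observed - expected)


def predict(payload, features):
    """Linear score: intercept plus dot product of coefficients and features."""
    if len(features) != len(payload.coefficients):
        raise ValueError("feature vector length does not match the model")
    return payload.intercept + sum(
        c * x for c, x in zip(payload.coefficients, features)
    )


def is_anomalous(payload, features, observed):
    """Flag the observation when it falls outside the expected band."""
    expected = predict(payload, features)
    low = expected + payload.lower_residual
    high = expected + payload.upper_residual
    return not (low <= observed <= high)


payload = ModelPayload(10.0, [2.0, -1.0], -3.0, 4.0)
print(predict(payload, [1.0, 2.0]))             # expected volume
print(is_anomalous(payload, [1.0, 2.0], 20.0))  # far above the band
```

Scoring is one dot product and two comparisons, which is straightforward to port to another language and runs well within a millisecond budget.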
Big Data Platform
Volumes
Model Coefficients
Research stage
Understand a problem and build a solution, decide what’s best.
Fourier components
Would not optimise the business cycles.
ARIMA
Not perfect for picking up seasonality.
Isolation Forests
Great for multidimensional data, not so much for time series.
Autoencoders
Good luck transporting the model for each merchant.
XGBM
Nice, but try scoring that in Java.
Ridge Regression
Makes scoring in Java nice and fairly easy.
Residuals
Confidence intervals modelled through quantile regression of observed values.
Events
Recurrent or one-off events are shown to the model.
Piece-wise linear trends
Break the signal down into pieces and learn the latest trends.
Gaussian Basis Functions
Allow us to teach the model to understand business cycles.
The model
Discover anomalous behaviour based on a probability p.
Pre-sampling
Allows us to sample and bucketize the merchants into adequate intervals.
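The pieces listed above can be combined in a short NumPy sketch: periodic Gaussian basis functions for the weekly cycle, hinge features for a piece-wise linear trend, a closed-form ridge fit, and a residual band. Everything here is an assumption made for illustration (function names, time in days, a 7-day period, the synthetic data). The band uses plain empirical quantiles of the residuals, a simpler stand-in for the quantile regression named on the slide.

```python
import numpy as np


def gaussian_basis(t, centers, width, period=7.0):
    """Periodic Gaussian bumps over the weekly cycle, one column per center."""
    phase = np.asarray(t, dtype=float)[:, None] % period
    d = np.abs(phase - np.asarray(centers, dtype=float)[None, :])
    d = np.minimum(d, period - d)  # wrap-around distance within the cycle
    return np.exp(-0.5 * (d / width) ** 2)


def hinge_features(t, hinges):
    """Piece-wise linear trend: t itself plus max(0, t - hinge) per hinge."""
    t = np.asarray(t, dtype=float)[:, None]
    h = np.asarray(hinges, dtype=float)[None, :]
    return np.hstack([t, np.maximum(0.0, t - h)])


def design_matrix(t, centers, width, hinges):
    return np.hstack([gaussian_basis(t, centers, width), hinge_features(t, hinges)])


def fit_ridge(X, y, alpha):
    """Closed-form ridge regression with an unpenalised intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return y_mean - x_mean @ coef, coef


def residual_band(y, y_hat, p=0.95):
    """Empirical residual quantiles covering a central probability mass p."""
    tail = (1.0 - p) / 2.0
    low, high = np.quantile(y - y_hat, [tail, 1.0 - tail])
    return low, high


# Synthetic hourly volumes over four weeks: weekly bump, slow trend, noise.
rng = np.random.default_rng(0)
t = np.arange(0.0, 28.0, 1.0 / 24.0)  # time in days
bump = 50.0 * np.exp(-0.5 * (((t % 7.0) - 3.0) / 0.5) ** 2)
y = 100.0 + 0.5 * t + bump + rng.normal(0.0, 2.0, t.size)

X = design_matrix(t, centers=np.arange(0.0, 7.0, 0.5), width=0.5, hinges=[14.0])
intercept, coef = fit_ridge(X, y, alpha=1.0)
y_hat = intercept + X @ coef
low, high = residual_band(y, y_hat, p=0.95)
residuals = y - y_hat
coverage = np.mean((residuals >= low) & (residuals <= high))
print(f"coefficients: {coef.size}, in-band fraction: {coverage:.3f}")
```

The fitted model is just an intercept, a coefficient vector and two residual bounds, which is the kind of small payload that can be shipped per merchant.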
Trendspotting
Estimating hinges and trends and offering them as a by-product to Account Managers for evaluating the low variations of volume.
What do the predictions look like?
[Plot: real volume vs. predicted volume with 95% confidence band; train set: 90 days, test set: 7 days; one missed event marked]
The implementation on Spark
How did we get there, on the Spark side.
Reusability
Overriding scikit-learn and pandas classes lets us ensure reusability.
Cross-validation
Find the best fit by tuning the hyperparameters.
Scalability
Using Spark’s map-reduce paradigm we fully control the computational performance.
Making it happen at scale
SeasonalEstimator(BaseEstimator,RegressorMixin)
Input daily time series -> {t:[…], v:[…]}
Collect to list -> [{t:[…], v:[…]}]
Hinges and Hyperparameters
Distribute UDF
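A custom scikit-learn estimator is what makes the model reusable with the standard tuning tools. The sketch below only shows the general pattern; the slides give the class name and base classes but not the implementation, so the constructor parameters, the feature construction and the synthetic data are all assumptions.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit


# The slide lists the bases as (BaseEstimator, RegressorMixin); scikit-learn's
# developer guide recommends placing mixins first, which is the order used here.
class SeasonalEstimator(RegressorMixin, BaseEstimator):
    """Hypothetical sketch: ridge regression on weekly Gaussian basis features."""

    def __init__(self, alpha=1.0, n_centers=14, width=0.5, period=7.0):
        self.alpha = alpha
        self.n_centers = n_centers
        self.width = width
        self.period = period

    def _features(self, X):
        t = np.asarray(X, dtype=float).reshape(-1)  # X holds time in days
        centers = np.linspace(0.0, self.period, self.n_centers, endpoint=False)
        d = np.abs((t[:, None] % self.period) - centers[None, :])
        d = np.minimum(d, self.period - d)  # wrap-around distance
        return np.exp(-0.5 * (d / self.width) ** 2)

    def fit(self, X, y):
        self.model_ = Ridge(alpha=self.alpha).fit(
            self._features(X), np.asarray(y, dtype=float)
        )
        return self

    def predict(self, X):
        return self.model_.predict(self._features(X))


# Synthetic hourly series over four weeks with a weekly bump and noise.
rng = np.random.default_rng(0)
t = np.arange(0.0, 28.0, 1.0 / 24.0).reshape(-1, 1)
bump = 50.0 * np.exp(-0.5 * (((t.ravel() % 7.0) - 3.0) / 0.5) ** 2)
y = 100.0 + bump + rng.normal(0.0, 2.0, t.shape[0])

search = GridSearchCV(
    SeasonalEstimator(),
    param_grid={"alpha": [0.1, 1.0, 10.0]},
    cv=TimeSeriesSplit(n_splits=3),  # training folds always precede test folds
)
search.fit(t, y)
print(search.best_params_)
```

Because the estimator follows the scikit-learn interface, the same class can be tuned locally on one series and then applied per merchant inside a distributed job.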
Cross-validation
F4-sampling score: favours higher sampling while considering classical precision and recall.
Custom CV fold splitter TimeSeriesWeekSplit gets the sense of the business cycle.
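A week-aligned splitter could look like the sketch below. The slides name TimeSeriesWeekSplit but do not show it, so this is only one plausible reading: folds are cut on whole-week boundaries and each training set ends where its test week starts. The function name and parameters are hypothetical.

```python
def time_series_week_split(n_samples, samples_per_week, n_test_weeks=1,
                           min_train_weeks=4):
    """Yield (train_idx, test_idx) index lists aligned to whole weeks.

    Training always precedes testing, so no future data leaks into a fold,
    and every test fold covers complete business cycles.
    """
    n_weeks = n_samples // samples_per_week
    last_start = n_weeks - n_test_weeks
    for first_test_week in range(min_train_weeks, last_start + 1, n_test_weeks):
        train_end = first_test_week * samples_per_week
        test_end = train_end + n_test_weeks * samples_per_week
        yield list(range(train_end)), list(range(train_end, test_end))


# Six weeks of daily samples: train on at least four weeks, test on one.
folds = list(time_series_week_split(n_samples=42, samples_per_week=7))
for train_idx, test_idx in folds:
    print(len(train_idx), "train samples ->", test_idx[0], "to", test_idx[-1])
```

Splitting on week boundaries keeps each test fold comparable, since every fold sees the same mix of weekdays and weekends.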
The output
Harnessing the prediction performance
Enabling canary roll-out based on scores
Overcoming unsupervised learning
Alarm rate and synthetic recall allow us to know, for each case, how many alarms would have been captured and raised, even without having a labelled dataset.
Trade-off alarm rates and recall
We provide a number of choices (95%, 97%, 99% probability) and completely profile what to expect in terms of anomalies.
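The slides do not define alarm rate or synthetic recall, so the sketch below uses assumed definitions: the alarm rate is the share of real observations outside the predicted band, and synthetic recall is the share of artificially dropped observations that fall below the lower bound. All function names and numbers are hypothetical.

```python
def alarm_rate(observed, lower, upper):
    """Share of real observations that fall outside the predicted band."""
    flags = [not (lo <= v <= hi) for v, lo, hi in zip(observed, lower, upper)]
    return sum(flags) / len(flags)


def synthetic_recall(observed, lower, drop_fraction):
    """Share of simulated volume drops that would have raised an alarm.

    Each observation is scaled down by drop_fraction to imitate an outage,
    then compared with the lower bound of the band.
    """
    injected = [v * (1.0 - drop_fraction) for v in observed]
    caught = sum(1 for v, lo in zip(injected, lower) if v < lo)
    return caught / len(injected)


observed = [100.0, 102.0, 98.0, 101.0, 60.0, 99.0]
lower = [90.0] * len(observed)
upper = [110.0] * len(observed)

print("alarm rate:", alarm_rate(observed, lower, upper))
print("recall for a 20% drop:", synthetic_recall(observed, lower, 0.20))
print("recall for a 5% drop:", synthetic_recall(observed, lower, 0.05))
```

Repeating this for bands built at different probabilities gives a profile of alarm rate against recall without needing labels.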
The model payload
Go Live
Houston? Houston? …
Grafana dashboard
So we saw this on the data
‘You don’t call us, we call you’
Post on Medium
https://medium.com/adyen

Scalable Time Series Forecasting and Monitoring using Apache Spark and ElasticSearch at Adye

  • 1. WIFI SSID:Spark+AISummit | Password: UnifiedDataAnalytics
  • 2. Andreu Mora, Adyen Time series forecasting and monitoring with Apache Spark and ElasticSearch #UnifiedDataAnalytics #SparkAISummit
  • 3. Adyen Payments Processor Tech company International customers (aka merchants) Omnichannel
  • 4. Back in the day… The legacy monitor was based on a SQL query that would compute an average for the hour of the week and compare to a threshold.
  • 5. Doesn’t quite work: • Generates loads of False Positives • It was fairly trimmed down: top merchants.
  • 8. Do that at scale
  • 10. Connect to a live platform
  • 11. OK, but What is an anomaly? No luxury of a labelled dataset, divergence 
 of opinions. Connecting to a live platform without 
 ML deployment hooks ready. We were working on MLflow but not there yet. No standard for timeseries forecasting at scale With spark, several choices.
  • 12. Considerations when dealing with Big Data Big Technology Leverage on mature Tech to solve the problem (hello Spark). Big diversity Many different topologies for our merchants and yet one algorithm to track them all. Big consequences 1000 merchants * 10 min * 95% accuracy = 50400 emails/week
  • 16. TimeSeries Ecosystem Flint Spark-ts FB Prophet Stats models Data size consideration 1 year @ 1 min @ double64 = 4.2 mb
  • 17. Scoring in Java While working on a fully functional engine to deploy ML models based on MLflow. Launch fast and iterate! Transporting the model The model transported for tens of thousands of accounts needs to be lightweight. Harness the maths No using blackboxed models, equations need to be understood and replicated in Java. Needs to perform fast Score and decide whether our seen traffic form ElasticSearch is actually anomalous on the ms scale.
  • 20. Fourier components Would not optimise the business cycles ARIMA Not perfect for picking up seasonality Isolation Forests Great for multidimensional data, not so much for time series. Autoencoders Good luck transporting the model for each merchant. XGBM Noice, but score that in Java. Research stage Understand a problem and build a solution, decide what’s best.
  • 21. Ridge Regression Makes scoring in Java nice and kinda easy. Residuals Confidence intervals modelled through quantile regression of observed values. Events Recurrent or one-off events are shown to the model. Piece-wise linear trends Breaks down the signal into pieces and learn the last trends. Gaussian Basis Functions Allow us to teach the model to understand business cycles The model Discover anomalous behaviour based on a probability p. Pre-sampling Allow us to sample and bucketize the merchants to adequate intervals.
  • 29. Trendspotting: estimating hinges and trends and offering them as a by-product to Account Managers for evaluating the low variations in volume.
  • 30. What do the predictions look like? Train set: 90 days. Test set: 7 days. Plot legend: real volume, predicted volume, 95% confidence.
  • 32. The implementation on Spark: how we got there, on the Spark side. Reusability: overloads of scikit-learn and pandas allow us to ensure reusability. Cross-validation: ensure the best fit through tuning of hyperparameters. Scalability: using Spark’s map-reduce paradigm, we fully control the computational performance.
  • 34. Making it happen at scale. Diagram labels: input daily time series -> {t:[…], v:[…]}; collect to list -> [{t:[…], v:[…]}]; hinges and hyperparameters; distribute; UDF.
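The collect-then-UDF pattern named on this slide could look roughly like the sketch below. It is a simplification and not the speaker's code: it assumes a flat DataFrame with hypothetical columns `merchant`, `t` and `v` rather than per-day arrays, it does not broadcast hinges or hyperparameters, and `fit_merchant` is a stand-in that fits a plain least-squares line instead of the full model. The PySpark import sits inside `fit_all` so the per-merchant function can be tried without a cluster.

```python
def fit_merchant(points):
    """Least-squares line through (t, v) pairs; a stand-in for the real fit."""
    pts = sorted(points)              # collect_list does not guarantee order
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_v = sum(v for _, v in pts) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in pts)
    cov = sum((t - mean_t) * (v - mean_v) for t, v in pts)
    slope = 0.0 if var_t == 0 else cov / var_t
    return [mean_v - slope * mean_t, slope]   # [intercept, slope]


def fit_all(df):
    """df: Spark DataFrame with columns merchant, t, v (hypothetical schema)."""
    from pyspark.sql import functions as F
    from pyspark.sql.types import ArrayType, DoubleType

    # Wrap the plain Python function so Spark can run it once per merchant.
    fit_udf = F.udf(
        lambda rows: fit_merchant([(r["t"], r["v"]) for r in rows]),
        ArrayType(DoubleType()),
    )
    # One row per merchant, holding that merchant's whole series as a list.
    grouped = df.groupBy("merchant").agg(
        F.collect_list(F.struct("t", "v")).alias("points")
    )
    return grouped.select("merchant", fit_udf("points").alias("weights"))


# Local check of the per-merchant function, no Spark needed.
local_weights = fit_merchant([(2.0, 5.0), (0.0, 1.0), (1.0, 3.0)])
```

Because each merchant's series is small, collecting it into a single row and fitting inside one task keeps the modelling code ordinary Python while Spark handles the fan-out.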
  • 35. Cross-validation. F4-sampling score: favours higher sampling while considering classical precision and recall. Custom CV fold split: TimeSeriesWeekSplit, to get a sense of the business cycle.
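The internals of TimeSeriesWeekSplit are not shown in the deck. As an illustration of the idea only, the hypothetical `week_splits` generator below produces folds whose test sets are whole weeks, so that every fold sees a complete business cycle. The expanding training window and the one-week test length are assumptions.

```python
def week_splits(n_days, n_folds, test_weeks=1):
    """Yield (train_idx, test_idx) pairs with week-aligned test sets.

    Folds go from oldest to newest; each trains on all days before its test
    window (expanding window), and the last fold tests on the final week(s).
    """
    test_len = 7 * test_weeks
    for k in range(n_folds, 0, -1):
        test_end = n_days - (k - 1) * test_len
        test_start = test_end - test_len
        if test_start <= 0:
            continue                  # not enough history to train on
        yield list(range(test_start)), list(range(test_start, test_end))


# Four weeks of daily data, three folds.
folds = list(week_splits(28, 3))
for train_idx, test_idx in folds:
    print(len(train_idx), test_idx[0], test_idx[-1])
```

Splitting on week boundaries avoids scoring a model on, say, a weekend it has never seen a full example of, which a plain index-based split could do.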
  • 39. Overcoming unsupervised learning: alarm rate and synthetic recall allow us to know, for each case, how many anomalies would have been captured and how many alarms raised, even without a labelled dataset.
  • 40. Trade-off between alarm rates and recall: we provide a number of choices (95%, 97%, 99% probability) and completely profile what to expect in terms of anomalies.
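One way to read "alarm rate" and "synthetic recall" is sketched below. The deck does not define either metric, so both definitions are assumptions: the alarm rate is taken as the share of observed points outside the confidence band, and synthetic recall as the share of normal-looking points that get flagged once an artificial proportional drop is injected. The numbers are made up.

```python
def alarm_rate(observed, lower, upper):
    """Share of observed points that fall outside the [lower, upper] band."""
    flags = [not (lo <= x <= hi) for x, lo, hi in zip(observed, lower, upper)]
    return sum(flags) / len(flags)


def synthetic_recall(observed, lower, upper, drop=0.5):
    """Share of in-band points that are flagged after an artificial drop.

    Each point that looked normal is scaled by (1 - drop) to imitate an
    outage; recall is the fraction of those fakes that leave the band.
    """
    caught = total = 0
    for x, lo, hi in zip(observed, lower, upper):
        if lo <= x <= hi:             # only corrupt points that looked normal
            total += 1
            fake = x * (1.0 - drop)
            caught += not (lo <= fake <= hi)
    return caught / total if total else 0.0


observed = [100, 102, 98, 130, 101]
lower, upper = [90] * 5, [110] * 5

rate = alarm_rate(observed, lower, upper)
recall_big_drop = synthetic_recall(observed, lower, upper, drop=0.5)
recall_small_drop = synthetic_recall(observed, lower, upper, drop=0.05)
print(rate, recall_big_drop, recall_small_drop)
```

Computing both numbers for each candidate band (95%, 97%, 99%) gives the trade-off profile: a wider band lowers the alarm rate but lets smaller synthetic drops go unnoticed.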
  • 44. So we saw this on the data
  • 48. ’You don’t call us, we call you’
  • 50. DON’T FORGET TO RATE AND REVIEW THE SESSIONS SEARCH SPARK + AI SUMMIT