SlideShare a Scribd company logo
1 of 39
Download to read offline
Real-Time Anomaly Detection
with Spark MLlib, Akka and
Cassandra
Natalino Busa
Data Platform Architect at Ing
ING group
http://www.ing.com/About-us/Purpose-Strategy.htm
ING group
Empowering people to stay a step ahead
in life and in business.
http://www.ing.com/About-us/Purpose-Strategy.htm
ING group
http://www.ing.com/About-us/Purpose-Strategy.htm
Clear and Easy
Anytime, Anywhere
Empower
Keep getting better
Apply advanced, predictive analytics on live data
Event-Driven and exposed via APIs
Lean Architecture, Easy to integrate
Available, Consistent, Streaming, Real-time Data
Resilient, Distributed, Scalable, Maintainable
Clear and Easy
Anytime, Anywhere
Empower
Keep getting better
Data Principles
ING group
Big Data and Fast Data
population:events,transactions,
sessions,customers,etc
Why Fast Data?
1. Relevant up-to-date information.
2. Delivers actionable events.
Why Big Data?
1. Analyze and model
2. Learn, cluster, categorize, organize facts
10
Real Time APIs
Streaming Data
Data Sources,
Files, DB extracts
Batched Data
Training, Scoring and Exposing models
11
Real Time APIs
Streaming Data
Data Sources,
Files, DB extracts
Batched Data
Training, Scoring and Exposing models
12
Real Time APIs
Streaming Data
Data Sources,
Files, DB extracts
Batched Data
Training, Scoring and Exposing models
Cassandra+Akka+Spark: Machine Learning
Fast writes
2D Data Structure
Replicated
Tunable consistency
Multi-Data centers
C*Akka Spark
Very Fast processing
Distributed, Scalable computing
Actor-based Pipelines
Actor state can be persisted
Supervision strategies
Ad-Hoc Queries
Joins, Aggregate
User Defined Functions
Machine Learning,
Advanced Stats and Analytics
Akka-Cassandra-Spark Stack
Cassandra-Spark Connector
Cassandra
Spark
Streaming SQL MLlib Graphx
Extract
Data
Create Models,
Enrich, Transform
Fetch from other
Sources: Kafka
Fetch from other
Sources: DB’s, Files
Akka
Analytics, Statistics, Data
Science, Model Training
Access
Model
Persist
Actors’ State
Cassandra-Spark Connector
Cassandra: Store all the data
Spark: Analyze all the data
DC1: replication factor 3 DC2: replication factor 3 DC3: replication factor 3 + Spark Executors
Storage! Analytics!
Data
Data Science: Anomaly Detection
An outlier is an observation that deviates so much from other
observations as to arouse suspicion that it was generated by a different
mechanism.
Hawkins, 1980
Data Science: Anomaly Detection
Distance Based Density Based
Example: Analyze gowalla check-ins
year | month | day | time | uid | lat | lon | ts | vid
------+-------+-----+------+--------+----------+-----------+--------------------------+---------
2010 | 9 | 14 | 91 | 853 | 40.73474 | -73.87434 | 2010-09-14 00:01:31+0000 | 917955
2010 | 9 | 14 | 328 | 4516 | 40.72585 | -73.99289 | 2010-09-14 00:05:28+0000 | 37160
2010 | 9 | 14 | 344 | 2964 | 40.67621 | -73.98405 | 2010-09-14 00:05:44+0000 | 956870
Check-ins dataset
Venues dataset
vid | name | lat | long ------+-------+-----+------+--------+----------+-----------
+--------------------------+---------
754108 | My Suit NY | 40.73474 | -73.87434
249755 | UA Court Street Stadium 12 | 40.72585 | -73.99289
6919688 | Sky Asian Bistro | 40.67621 | -73.98405
Data Science: clustering venues
Data Science: clustering venues
Weekly visitors patterns!
Madison Square, Apple Store, Radio City Music Hall
Thursdays, Fridays, Saturdays are busy
Statue of Liberty, Jacob K. Javits Convention Center,
Whole Foods Market (Columbus Circle)
Not popular on midweek
Intuition:
Data Science: clustering with k-means
Histograms components as dimensions
Similar histograms would occupy similar places in
the feature space
How do I compare histograms:
- EMD
- Chi-squared distance
- Space transformation (DCT)
Intuition:
K-Means: Featurize data + cluster
val weekly_visits = checkins_venues.select("vid","ts")
.map(row => (row.getLong("vid"), vectorize_time(s.getTimestamp("ts"))
.reduceByKey(_ + _)
.mapValues(_ => featurize_histogram(_._1))
val numClusters = 15
val numIterations = 100
val clusters = KMeans.train(weekly_visits, numClusters, numIterations)
PairRDDs, weekly patterns per venue
cluster similar weekly patterns
How to use it
1) Classification
Classify venues to given groups
2) Anomaly Detection
Detect shift in the clustering assignment for a given venue for a given week
Keep monitoring weekly change in patterns, when it happens trigger a signal
week 26 week 27
Action
Data Science: clustering users’ venues
Data Science: clustering users’ venues
Users tend to stick in the same places
People have habits
By clustering the places together
We can identify anomalous locations
Size of the cluster matters
More points means less anomalous
Mini-clusters and single anomalies are
treated in similar ways ...
Intuition:
Data Science: clustering with DBSCAN
DBSCAN find clusters based on neighbouring density
Does not require the number of cluster k beforehand.
Clusters are not spherical
Data Science: clustering users’ venues
val locs = checkins_venues.select("uid", "lat","lon")
.map(s => (s.getLong(0), Seq( (s.getDouble(1), s.getDouble(2)) ))
.reduceByKey(_ + _)
.mapValues( dbscan (_) )
Have a look at: scalanlp/nak
Data Science:
Two ways to find anomalies with clustering
- Cluster big amount of data with k-means and histograms
- Apply clustering independently to million of users,
to each identify the patterns with dbscan algorithm
MLlib vs PairRDDs
KMeans.train(FeaturesRDD, numClusters, numIterations)
UserFeaturesPairRDD.GroupbyKey().mapValues( dbscan(_) )
RDDs map functions
Parallelism easy to exploit
The function runs locally for each Key
Pick your fav machine learning algorithms
Limited nr of points
Running in parallel for millions of Keys
MLlib
Truly distributed algorithm
Classify venues to given groups
Millions of datapoints
Limited amount of clusters
30
Real Time APIs
Streaming Data
Data Sources,
Files, DB extracts
Batched Data
Training, Scoring and Exposing models
Training vs Scoring: Latency budget
● Akka: millisecond response
● Spark: in-memory data models
Train: Spark
Score: Spark
Train: Spark
Score: Akka
slow: minutes fast: millisecs
Model Scoring
ModelTraining
slow:minutes
Akka
Mixed Load Cassandra Cluster
Coral: Web API for dynamic data flows
Akka
Web API for dynamic data flows
● a web api to define/manage/run streaming data-flows
● open source and community managed
● event processing as a service
coral-streaming/coral
Steven Raemaekers
Jasper van Zandbeek
Ger van Rossum
Hoda Alemi
Koen Verschuren
34
Real Time APIs
Streaming Data
Data Sources,
Files, DB extracts
Batched Data
Summary:
Akka
Feedback to the community:
More Algorithms for machine learning!
- DBSCAN, OPTICS, PAM
- More metrics, non-euclidean spaces, etc
- Non distributed algorithms: more scalanlp integration?
Streaming all the way:
Unify batch (Spark) and event streaming (Akka) computing
Thanks!
- Vision and strategy on an event-driven bank
- ING CIO management team and awesome colleagues
Spark, Cassandra, Akka communities !
webinar + live demo: Dec 9th
Resources
Coral: event processing webapi
https://github.com/coral-streaming/coral
Spark + Cassandra: Clustering Events
http://www.natalinobusa.com/2015/07/clustering-check-ins-with-spark-and.html
Spark: Machine Learning, SQL frames
https://spark.apache.org/docs/latest/mllib-guide.html
https://spark.apache.org/docs/latest/sql-programming-guide.html
Datastax: Analytics and Spark connector
http://www.slideshare.net/doanduyhai/spark-cassandra-connector-api-best-practices-and-usecases
http://docs.datastax.com/en/datastax_enterprise/4.7/datastax_enterprise/anaHome/anaHome.html
Anomaly Detection
Chandola, V.; Banerjee, A.; Kumar, V. (2009). "Anomaly detection: A survey"(PDF). ACM Computing Surveys 41 (3): 1. doi:10.1145/1541880.1541882.
Resources
Datasets
https://snap.stanford.edu/data/loc-gowalla.html
E. Cho, S. A. Myers, J. Leskovec. Friendship and Mobility: Friendship and Mobility: User Movement in Location-Based Social Networks ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining (KDD), 2011
https://code.google.com/p/locrec/downloads/detail?name=gowalla-dataset.zip
The project is being developed in the context of the SInteliGIS project financed by the Portuguese Foundation for Science and Technology (FCT) through project grant
PTDC/EIA-EIA/109840/2009. .
Pictures:
"DBSCAN-density-data" by Chire - Own work. Licensed under CC BY-SA 3.0 via Commons - https://commons.wikimedia.org/wiki/File:DBSCAN-density-data.
svg#/media/File:DBSCAN-density-data.svg
"DBSCAN-Illustration" by Chire - Own work. Licensed under CC BY-SA 3.0 via Commons - https://commons.wikimedia.org/wiki/File:DBSCAN-Illustration.svg#/media/File:
DBSCAN-Illustration.svg
"Multimodal" by Visnut - Own work. Licensed under CC BY-SA 4.0 via Commons -
https://commons.wikimedia.org/wiki/File:Multimodal.png#/media/File:Multimodal.png
"Standard deviation diagram" by Mwtoews - Own work, based (in concept) on figure by Jeremy Kemp, on 2005-02-09. Licensed under CC BY 2.5 via Commons - https:
//commons.wikimedia.org/wiki/File:Standard_deviation_diagram.svg#/media/File:Standard_deviation_diagram.svg
"Michelsonmorley-boxplot" by User:Schutz - Own work. Licensed under Public Domain via Commons - https://commons.wikimedia.org/wiki/File:Michelsonmorley-boxplot.
svg#/media/File:Michelsonmorley-boxplot.svg

More Related Content

What's hot

Role with IT(IL) - V3 Roles and Responsibilities - ITSM Academy Webinar
Role with IT(IL) - V3 Roles and Responsibilities - ITSM Academy WebinarRole with IT(IL) - V3 Roles and Responsibilities - ITSM Academy Webinar
Role with IT(IL) - V3 Roles and Responsibilities - ITSM Academy WebinarITSM Academy, Inc.
 
Transforming Procurement
Transforming ProcurementTransforming Procurement
Transforming Procurementmeganejames
 
Observability For Modern Applications
Observability For Modern ApplicationsObservability For Modern Applications
Observability For Modern ApplicationsAmazon Web Services
 
ITIL and CMMI for service
ITIL and CMMI for serviceITIL and CMMI for service
ITIL and CMMI for serviceBoonNam Goh
 
Effective Threat Hunting with Tactical Threat Intelligence
Effective Threat Hunting with Tactical Threat IntelligenceEffective Threat Hunting with Tactical Threat Intelligence
Effective Threat Hunting with Tactical Threat IntelligenceDhruv Majumdar
 
Whoops, The Numbers Are Wrong! Scaling Data Quality @ Netflix
Whoops, The Numbers Are Wrong! Scaling Data Quality @ NetflixWhoops, The Numbers Are Wrong! Scaling Data Quality @ Netflix
Whoops, The Numbers Are Wrong! Scaling Data Quality @ NetflixDataWorks Summit
 
Neo4j GraphTour Santa Monica 2019 - Amundsen Presentation
Neo4j GraphTour Santa Monica 2019 - Amundsen PresentationNeo4j GraphTour Santa Monica 2019 - Amundsen Presentation
Neo4j GraphTour Santa Monica 2019 - Amundsen PresentationTamikaTannis
 
When NOT to use Apache Kafka?
When NOT to use Apache Kafka?When NOT to use Apache Kafka?
When NOT to use Apache Kafka?Kai Wähner
 
LLM 모델 기반 서비스 실전 가이드
LLM 모델 기반 서비스 실전 가이드LLM 모델 기반 서비스 실전 가이드
LLM 모델 기반 서비스 실전 가이드Tae Young Lee
 
From capabilities to services modelling for business-it alignment v.2
From capabilities to services   modelling for business-it alignment v.2From capabilities to services   modelling for business-it alignment v.2
From capabilities to services modelling for business-it alignment v.2Trond Hjorteland
 
RDBMS to Graph
RDBMS to GraphRDBMS to Graph
RDBMS to GraphNeo4j
 
I Love APIs 2015: Microservices at Amazon
I Love APIs 2015: Microservices at AmazonI Love APIs 2015: Microservices at Amazon
I Love APIs 2015: Microservices at AmazonApigee | Google Cloud
 
Snorkel: Dark Data and Machine Learning with Christopher Ré
Snorkel: Dark Data and Machine Learning with Christopher RéSnorkel: Dark Data and Machine Learning with Christopher Ré
Snorkel: Dark Data and Machine Learning with Christopher RéDatabricks
 
Apache Kafka® Use Cases for Financial Services
Apache Kafka® Use Cases for Financial ServicesApache Kafka® Use Cases for Financial Services
Apache Kafka® Use Cases for Financial Servicesconfluent
 
DevOps Approach (Point of View by Ravi Tadwalkar)
DevOps Approach (Point of View by Ravi Tadwalkar)DevOps Approach (Point of View by Ravi Tadwalkar)
DevOps Approach (Point of View by Ravi Tadwalkar)Ravi Tadwalkar
 
Amazon CloudWatch - Observability and Monitoring
Amazon CloudWatch - Observability and MonitoringAmazon CloudWatch - Observability and Monitoring
Amazon CloudWatch - Observability and MonitoringRick Hwang
 
Your Journey to Cloud-Native Begins with DevOps, Microservices, and Containers
Your Journey to Cloud-Native Begins with DevOps, Microservices, and ContainersYour Journey to Cloud-Native Begins with DevOps, Microservices, and Containers
Your Journey to Cloud-Native Begins with DevOps, Microservices, and ContainersAtlassian
 
Servicenow overview
Servicenow overviewServicenow overview
Servicenow overviewCloudSyntrix
 

What's hot (20)

Role with IT(IL) - V3 Roles and Responsibilities - ITSM Academy Webinar
Role with IT(IL) - V3 Roles and Responsibilities - ITSM Academy WebinarRole with IT(IL) - V3 Roles and Responsibilities - ITSM Academy Webinar
Role with IT(IL) - V3 Roles and Responsibilities - ITSM Academy Webinar
 
Transforming Procurement
Transforming ProcurementTransforming Procurement
Transforming Procurement
 
Observability For Modern Applications
Observability For Modern ApplicationsObservability For Modern Applications
Observability For Modern Applications
 
ITIL and CMMI for service
ITIL and CMMI for serviceITIL and CMMI for service
ITIL and CMMI for service
 
Effective Threat Hunting with Tactical Threat Intelligence
Effective Threat Hunting with Tactical Threat IntelligenceEffective Threat Hunting with Tactical Threat Intelligence
Effective Threat Hunting with Tactical Threat Intelligence
 
Whoops, The Numbers Are Wrong! Scaling Data Quality @ Netflix
Whoops, The Numbers Are Wrong! Scaling Data Quality @ NetflixWhoops, The Numbers Are Wrong! Scaling Data Quality @ Netflix
Whoops, The Numbers Are Wrong! Scaling Data Quality @ Netflix
 
Neo4j GraphTour Santa Monica 2019 - Amundsen Presentation
Neo4j GraphTour Santa Monica 2019 - Amundsen PresentationNeo4j GraphTour Santa Monica 2019 - Amundsen Presentation
Neo4j GraphTour Santa Monica 2019 - Amundsen Presentation
 
When NOT to use Apache Kafka?
When NOT to use Apache Kafka?When NOT to use Apache Kafka?
When NOT to use Apache Kafka?
 
LLM 모델 기반 서비스 실전 가이드
LLM 모델 기반 서비스 실전 가이드LLM 모델 기반 서비스 실전 가이드
LLM 모델 기반 서비스 실전 가이드
 
ITIL4 and ServiceNow
ITIL4 and ServiceNowITIL4 and ServiceNow
ITIL4 and ServiceNow
 
From capabilities to services modelling for business-it alignment v.2
From capabilities to services   modelling for business-it alignment v.2From capabilities to services   modelling for business-it alignment v.2
From capabilities to services modelling for business-it alignment v.2
 
RDBMS to Graph
RDBMS to GraphRDBMS to Graph
RDBMS to Graph
 
I Love APIs 2015: Microservices at Amazon
I Love APIs 2015: Microservices at AmazonI Love APIs 2015: Microservices at Amazon
I Love APIs 2015: Microservices at Amazon
 
Snorkel: Dark Data and Machine Learning with Christopher Ré
Snorkel: Dark Data and Machine Learning with Christopher RéSnorkel: Dark Data and Machine Learning with Christopher Ré
Snorkel: Dark Data and Machine Learning with Christopher Ré
 
Apache Kafka® Use Cases for Financial Services
Apache Kafka® Use Cases for Financial ServicesApache Kafka® Use Cases for Financial Services
Apache Kafka® Use Cases for Financial Services
 
DevOps Approach (Point of View by Ravi Tadwalkar)
DevOps Approach (Point of View by Ravi Tadwalkar)DevOps Approach (Point of View by Ravi Tadwalkar)
DevOps Approach (Point of View by Ravi Tadwalkar)
 
Amazon CloudWatch - Observability and Monitoring
Amazon CloudWatch - Observability and MonitoringAmazon CloudWatch - Observability and Monitoring
Amazon CloudWatch - Observability and Monitoring
 
Your Journey to Cloud-Native Begins with DevOps, Microservices, and Containers
Your Journey to Cloud-Native Begins with DevOps, Microservices, and ContainersYour Journey to Cloud-Native Begins with DevOps, Microservices, and Containers
Your Journey to Cloud-Native Begins with DevOps, Microservices, and Containers
 
Observability
Observability Observability
Observability
 
Servicenow overview
Servicenow overviewServicenow overview
Servicenow overview
 

Viewers also liked

Reactive app using actor model & apache spark
Reactive app using actor model & apache sparkReactive app using actor model & apache spark
Reactive app using actor model & apache sparkRahul Kumar
 
Rethinking Streaming Analytics For Scale
Rethinking Streaming Analytics For ScaleRethinking Streaming Analytics For Scale
Rethinking Streaming Analytics For ScaleHelena Edelson
 
Sa introduction to big data pipelining with cassandra & spark west mins...
Sa introduction to big data pipelining with cassandra & spark   west mins...Sa introduction to big data pipelining with cassandra & spark   west mins...
Sa introduction to big data pipelining with cassandra & spark west mins...Simon Ambridge
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache SparkMammoth Data
 
Spark Kernel Talk - Apache Spark Meetup San Francisco (July 2015)
Spark Kernel Talk - Apache Spark Meetup San Francisco (July 2015)Spark Kernel Talk - Apache Spark Meetup San Francisco (July 2015)
Spark Kernel Talk - Apache Spark Meetup San Francisco (July 2015)Robert "Chip" Senkbeil
 
Reactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkReactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkRahul Kumar
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and AkkaStreaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and AkkaHelena Edelson
 
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, ScalaLambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, ScalaHelena Edelson
 
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)Helena Edelson
 
Alpine academy apache spark series #1 introduction to cluster computing wit...
Alpine academy apache spark series #1   introduction to cluster computing wit...Alpine academy apache spark series #1   introduction to cluster computing wit...
Alpine academy apache spark series #1 introduction to cluster computing wit...Holden Karau
 
Using Spark, Kafka, Cassandra and Akka on Mesos for Real-Time Personalization
Using Spark, Kafka, Cassandra and Akka on Mesos for Real-Time PersonalizationUsing Spark, Kafka, Cassandra and Akka on Mesos for Real-Time Personalization
Using Spark, Kafka, Cassandra and Akka on Mesos for Real-Time PersonalizationPatrick Di Loreto
 
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo LeeData Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo LeeSpark Summit
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Helena Edelson
 
Four Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and DatabricksFour Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and DatabricksLegacy Typesafe (now Lightbend)
 
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Anton Kirillov
 
Akka in Production - ScalaDays 2015
Akka in Production - ScalaDays 2015Akka in Production - ScalaDays 2015
Akka in Production - ScalaDays 2015Evan Chan
 
H2O - the optimized HTTP server
H2O - the optimized HTTP serverH2O - the optimized HTTP server
H2O - the optimized HTTP serverKazuho Oku
 
Container Orchestration Wars
Container Orchestration WarsContainer Orchestration Wars
Container Orchestration WarsKarl Isenberg
 
Linux 4.x Tracing Tools: Using BPF Superpowers
Linux 4.x Tracing Tools: Using BPF SuperpowersLinux 4.x Tracing Tools: Using BPF Superpowers
Linux 4.x Tracing Tools: Using BPF SuperpowersBrendan Gregg
 

Viewers also liked (20)

Reactive app using actor model & apache spark
Reactive app using actor model & apache sparkReactive app using actor model & apache spark
Reactive app using actor model & apache spark
 
Rethinking Streaming Analytics For Scale
Rethinking Streaming Analytics For ScaleRethinking Streaming Analytics For Scale
Rethinking Streaming Analytics For Scale
 
Sa introduction to big data pipelining with cassandra & spark west mins...
Sa introduction to big data pipelining with cassandra & spark   west mins...Sa introduction to big data pipelining with cassandra & spark   west mins...
Sa introduction to big data pipelining with cassandra & spark west mins...
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
 
Spark Kernel Talk - Apache Spark Meetup San Francisco (July 2015)
Spark Kernel Talk - Apache Spark Meetup San Francisco (July 2015)Spark Kernel Talk - Apache Spark Meetup San Francisco (July 2015)
Spark Kernel Talk - Apache Spark Meetup San Francisco (July 2015)
 
How to deploy Apache Spark 
to Mesos/DCOS
How to deploy Apache Spark 
to Mesos/DCOSHow to deploy Apache Spark 
to Mesos/DCOS
How to deploy Apache Spark 
to Mesos/DCOS
 
Reactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkReactive dashboard’s using apache spark
Reactive dashboard’s using apache spark
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and AkkaStreaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and Akka
 
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, ScalaLambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
 
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
 
Alpine academy apache spark series #1 introduction to cluster computing wit...
Alpine academy apache spark series #1   introduction to cluster computing wit...Alpine academy apache spark series #1   introduction to cluster computing wit...
Alpine academy apache spark series #1 introduction to cluster computing wit...
 
Using Spark, Kafka, Cassandra and Akka on Mesos for Real-Time Personalization
Using Spark, Kafka, Cassandra and Akka on Mesos for Real-Time PersonalizationUsing Spark, Kafka, Cassandra and Akka on Mesos for Real-Time Personalization
Using Spark, Kafka, Cassandra and Akka on Mesos for Real-Time Personalization
 
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo LeeData Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
 
Four Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and DatabricksFour Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and Databricks
 
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
 
Akka in Production - ScalaDays 2015
Akka in Production - ScalaDays 2015Akka in Production - ScalaDays 2015
Akka in Production - ScalaDays 2015
 
H2O - the optimized HTTP server
H2O - the optimized HTTP serverH2O - the optimized HTTP server
H2O - the optimized HTTP server
 
Container Orchestration Wars
Container Orchestration WarsContainer Orchestration Wars
Container Orchestration Wars
 
Linux 4.x Tracing Tools: Using BPF Superpowers
Linux 4.x Tracing Tools: Using BPF SuperpowersLinux 4.x Tracing Tools: Using BPF Superpowers
Linux 4.x Tracing Tools: Using BPF Superpowers
 

Similar to Real-Time Anomaly Detection with Spark MLlib, Akka and Cassandra

Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino BusaReal-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino BusaSpark Summit
 
Strata London 16: sightseeing, venues, and friends
Strata  London 16: sightseeing, venues, and friendsStrata  London 16: sightseeing, venues, and friends
Strata London 16: sightseeing, venues, and friendsNatalino Busa
 
AWS Summit 2013 | India - Big Data Analytics, Abhishek Sinha
AWS Summit 2013 | India - Big Data Analytics, Abhishek SinhaAWS Summit 2013 | India - Big Data Analytics, Abhishek Sinha
AWS Summit 2013 | India - Big Data Analytics, Abhishek SinhaAmazon Web Services
 
OCCIware@OW2con 2016
OCCIware@OW2con 2016OCCIware@OW2con 2016
OCCIware@OW2con 2016Marc Dutoo
 
OCCIware: extensible and standard-based XaaS platform to manage everything in...
OCCIware: extensible and standard-based XaaS platform to manage everything in...OCCIware: extensible and standard-based XaaS platform to manage everything in...
OCCIware: extensible and standard-based XaaS platform to manage everything in...OW2
 
OCCIware: extensible and standard-based XaaS platform to manage everything in...
OCCIware: extensible and standard-based XaaS platform to manage everything in...OCCIware: extensible and standard-based XaaS platform to manage everything in...
OCCIware: extensible and standard-based XaaS platform to manage everything in...OCCIware
 
Spark streaming , Spark SQL
Spark streaming , Spark SQLSpark streaming , Spark SQL
Spark streaming , Spark SQLYousun Jeong
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceeRic Choo
 
The Future of Hadoop: A deeper look at Apache Spark
The Future of Hadoop: A deeper look at Apache SparkThe Future of Hadoop: A deeper look at Apache Spark
The Future of Hadoop: A deeper look at Apache SparkCloudera, Inc.
 
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkDBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkTimothy Spann
 
Real time machine learning visualization with spark -- Hadoop Summit 2016
Real time machine learning visualization with spark -- Hadoop Summit 2016Real time machine learning visualization with spark -- Hadoop Summit 2016
Real time machine learning visualization with spark -- Hadoop Summit 2016Chester Chen
 
High performance Spark distribution on PKS by SnappyData
High performance Spark distribution on PKS by SnappyDataHigh performance Spark distribution on PKS by SnappyData
High performance Spark distribution on PKS by SnappyDataVMware Tanzu
 
High performance Spark distribution on PKS by SnappyData
High performance Spark distribution on PKS by SnappyDataHigh performance Spark distribution on PKS by SnappyData
High performance Spark distribution on PKS by SnappyDataCarlos Andrés García
 
Real time streaming analytics
Real time streaming analyticsReal time streaming analytics
Real time streaming analyticsAnirudh
 
Get Value From Your Data
Get Value From Your DataGet Value From Your Data
Get Value From Your DataDanilo Poccia
 
Awesome Banking API's
Awesome Banking API'sAwesome Banking API's
Awesome Banking API'sNatalino Busa
 
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingTiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingPaco Nathan
 
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's DataFrom Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's DataDatabricks
 

Similar to Real-Time Anomaly Detection with Spark MLlib, Akka and Cassandra (20)

Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino BusaReal-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
 
Strata London 16: sightseeing, venues, and friends
Strata  London 16: sightseeing, venues, and friendsStrata  London 16: sightseeing, venues, and friends
Strata London 16: sightseeing, venues, and friends
 
AWS Summit 2013 | India - Big Data Analytics, Abhishek Sinha
AWS Summit 2013 | India - Big Data Analytics, Abhishek SinhaAWS Summit 2013 | India - Big Data Analytics, Abhishek Sinha
AWS Summit 2013 | India - Big Data Analytics, Abhishek Sinha
 
OCCIware@OW2con 2016
OCCIware@OW2con 2016OCCIware@OW2con 2016
OCCIware@OW2con 2016
 
OCCIware: extensible and standard-based XaaS platform to manage everything in...
OCCIware: extensible and standard-based XaaS platform to manage everything in...OCCIware: extensible and standard-based XaaS platform to manage everything in...
OCCIware: extensible and standard-based XaaS platform to manage everything in...
 
OCCIware: extensible and standard-based XaaS platform to manage everything in...
OCCIware: extensible and standard-based XaaS platform to manage everything in...OCCIware: extensible and standard-based XaaS platform to manage everything in...
OCCIware: extensible and standard-based XaaS platform to manage everything in...
 
Spark streaming , Spark SQL
Spark streaming , Spark SQLSpark streaming , Spark SQL
Spark streaming , Spark SQL
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data Science
 
The Future of Hadoop: A deeper look at Apache Spark
The Future of Hadoop: A deeper look at Apache SparkThe Future of Hadoop: A deeper look at Apache Spark
The Future of Hadoop: A deeper look at Apache Spark
 
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkDBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
 
Dev Ops Training
Dev Ops TrainingDev Ops Training
Dev Ops Training
 
Real time machine learning visualization with spark -- Hadoop Summit 2016
Real time machine learning visualization with spark -- Hadoop Summit 2016Real time machine learning visualization with spark -- Hadoop Summit 2016
Real time machine learning visualization with spark -- Hadoop Summit 2016
 
Real Time Machine Learning Visualization with Spark
Real Time Machine Learning Visualization with SparkReal Time Machine Learning Visualization with Spark
Real Time Machine Learning Visualization with Spark
 
High performance Spark distribution on PKS by SnappyData
High performance Spark distribution on PKS by SnappyDataHigh performance Spark distribution on PKS by SnappyData
High performance Spark distribution on PKS by SnappyData
 
High performance Spark distribution on PKS by SnappyData
High performance Spark distribution on PKS by SnappyDataHigh performance Spark distribution on PKS by SnappyData
High performance Spark distribution on PKS by SnappyData
 
Real time streaming analytics
Real time streaming analyticsReal time streaming analytics
Real time streaming analytics
 
Get Value From Your Data
Get Value From Your DataGet Value From Your Data
Get Value From Your Data
 
Awesome Banking API's
Awesome Banking API'sAwesome Banking API's
Awesome Banking API's
 
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingTiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
 
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's DataFrom Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
 

More from Natalino Busa

Data Production Pipelines: Legacy, practices, and innovation
Data Production Pipelines: Legacy, practices, and innovationData Production Pipelines: Legacy, practices, and innovation
Data Production Pipelines: Legacy, practices, and innovationNatalino Busa
 
Data science apps powered by Jupyter Notebooks
Data science apps powered by Jupyter NotebooksData science apps powered by Jupyter Notebooks
Data science apps powered by Jupyter NotebooksNatalino Busa
 
7 steps for highly effective deep neural networks
7 steps for highly effective deep neural networks7 steps for highly effective deep neural networks
7 steps for highly effective deep neural networksNatalino Busa
 
Data science apps: beyond notebooks
Data science apps: beyond notebooksData science apps: beyond notebooks
Data science apps: beyond notebooksNatalino Busa
 
[Ai in finance] AI in regulatory compliance, risk management, and auditing
[Ai in finance] AI in regulatory compliance, risk management, and auditing[Ai in finance] AI in regulatory compliance, risk management, and auditing
[Ai in finance] AI in regulatory compliance, risk management, and auditingNatalino Busa
 
The evolution of data analytics
The evolution of data analyticsThe evolution of data analytics
The evolution of data analyticsNatalino Busa
 
Towards Real-Time banking API's: Introducing Coral, a web api for realtime st...
Towards Real-Time banking API's: Introducing Coral, a web api for realtime st...Towards Real-Time banking API's: Introducing Coral, a web api for realtime st...
Towards Real-Time banking API's: Introducing Coral, a web api for realtime st...Natalino Busa
 
Streaming Api Design with Akka, Scala and Spray
Streaming Api Design with Akka, Scala and SprayStreaming Api Design with Akka, Scala and Spray
Streaming Api Design with Akka, Scala and SprayNatalino Busa
 
Hadoop + Cassandra: Fast queries on data lakes, and wikipedia search tutorial.
Hadoop + Cassandra: Fast queries on data lakes, and  wikipedia search tutorial.Hadoop + Cassandra: Fast queries on data lakes, and  wikipedia search tutorial.
Hadoop + Cassandra: Fast queries on data lakes, and wikipedia search tutorial.Natalino Busa
 
Big data solutions for advanced marketing analytics
Big data solutions for advanced marketing analyticsBig data solutions for advanced marketing analytics
Big data solutions for advanced marketing analyticsNatalino Busa
 
Yo. big data. understanding data science in the era of big data.
Yo. big data. understanding data science in the era of big data.Yo. big data. understanding data science in the era of big data.
Yo. big data. understanding data science in the era of big data.Natalino Busa
 
Big and fast a quest for relevant and real-time analytics
Big and fast a quest for relevant and real-time analyticsBig and fast a quest for relevant and real-time analytics
Big and fast a quest for relevant and real-time analyticsNatalino Busa
 
Big Data and APIs - a recon tour on how to successfully do Big Data analytics
Big Data and APIs - a recon tour on how to successfully do Big Data analyticsBig Data and APIs - a recon tour on how to successfully do Big Data analytics
Big Data and APIs - a recon tour on how to successfully do Big Data analyticsNatalino Busa
 
Strata 2014: Data science and big data trending topics
Strata 2014: Data science and big data trending topicsStrata 2014: Data science and big data trending topics
Strata 2014: Data science and big data trending topicsNatalino Busa
 
Streaming computing: architectures, and tchnologies
Streaming computing: architectures, and tchnologiesStreaming computing: architectures, and tchnologies
Streaming computing: architectures, and tchnologiesNatalino Busa
 

More from Natalino Busa (17)

Data Production Pipelines: Legacy, practices, and innovation
Data Production Pipelines: Legacy, practices, and innovationData Production Pipelines: Legacy, practices, and innovation
Data Production Pipelines: Legacy, practices, and innovation
 
Data science apps powered by Jupyter Notebooks
Data science apps powered by Jupyter NotebooksData science apps powered by Jupyter Notebooks
Data science apps powered by Jupyter Notebooks
 
7 steps for highly effective deep neural networks
7 steps for highly effective deep neural networks7 steps for highly effective deep neural networks
7 steps for highly effective deep neural networks
 
Data science apps: beyond notebooks
Data science apps: beyond notebooksData science apps: beyond notebooks
Data science apps: beyond notebooks
 
[Ai in finance] AI in regulatory compliance, risk management, and auditing
[Ai in finance] AI in regulatory compliance, risk management, and auditing[Ai in finance] AI in regulatory compliance, risk management, and auditing
[Ai in finance] AI in regulatory compliance, risk management, and auditing
 
Data in Action
Data in ActionData in Action
Data in Action
 
The evolution of data analytics
The evolution of data analyticsThe evolution of data analytics
The evolution of data analytics
 
Towards Real-Time banking API's: Introducing Coral, a web api for realtime st...
Towards Real-Time banking API's: Introducing Coral, a web api for realtime st...Towards Real-Time banking API's: Introducing Coral, a web api for realtime st...
Towards Real-Time banking API's: Introducing Coral, a web api for realtime st...
 
Streaming Api Design with Akka, Scala and Spray
Streaming Api Design with Akka, Scala and SprayStreaming Api Design with Akka, Scala and Spray
Streaming Api Design with Akka, Scala and Spray
 
Hadoop + Cassandra: Fast queries on data lakes, and wikipedia search tutorial.
Hadoop + Cassandra: Fast queries on data lakes, and  wikipedia search tutorial.Hadoop + Cassandra: Fast queries on data lakes, and  wikipedia search tutorial.
Hadoop + Cassandra: Fast queries on data lakes, and wikipedia search tutorial.
 
Big data solutions for advanced marketing analytics
Big data solutions for advanced marketing analyticsBig data solutions for advanced marketing analytics
Big data solutions for advanced marketing analytics
 
Yo. big data. understanding data science in the era of big data.
Yo. big data. understanding data science in the era of big data.Yo. big data. understanding data science in the era of big data.
Yo. big data. understanding data science in the era of big data.
 
Big and fast a quest for relevant and real-time analytics
Big and fast a quest for relevant and real-time analyticsBig and fast a quest for relevant and real-time analytics
Big and fast a quest for relevant and real-time analytics
 
Big Data and APIs - a recon tour on how to successfully do Big Data analytics
Big Data and APIs - a recon tour on how to successfully do Big Data analyticsBig Data and APIs - a recon tour on how to successfully do Big Data analytics
Big Data and APIs - a recon tour on how to successfully do Big Data analytics
 
Strata 2014: Data science and big data trending topics
Strata 2014: Data science and big data trending topicsStrata 2014: Data science and big data trending topics
Strata 2014: Data science and big data trending topics
 
Streaming computing: architectures, and tchnologies
Streaming computing: architectures, and tchnologiesStreaming computing: architectures, and tchnologies
Streaming computing: architectures, and tchnologies
 
Big data landscape
Big data landscapeBig data landscape
Big data landscape
 

Recently uploaded

➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...amitlee9823
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
hybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptxhybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptx9to5mart
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachBoston Institute of Analytics
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...amitlee9823
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...amitlee9823
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...karishmasinghjnh
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...amitlee9823
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...amitlee9823
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...amitlee9823
 

Recently uploaded (20)

➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
hybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptxhybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptx
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning Approach
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 

Real-Time Anomaly Detection with Spark MLlib, Akka and Cassandra

  • 1. Real-Time Anomaly Detection with Spark MLlib, Akka and Cassandra Natalino Busa Data Platform Architect at Ing
  • 2.
  • 4. ING group Empowering people to stay a step ahead in life and in business. http://www.ing.com/About-us/Purpose-Strategy.htm
  • 5. ING group http://www.ing.com/About-us/Purpose-Strategy.htm Clear and Easy Anytime, Anywhere Empower Keep getting better
  • 6. Apply advanced, predictive analytics on live data Event-Driven and exposed via APIs Lean Architecture, Easy to integrate Available, Consistent, Streaming, Real-time Data Resilient, Distributed, Scalable, Maintainable Clear and Easy Anytime, Anywhere Empower Keep getting better Data Principles ING group
  • 7. Big Data and Fast Data population:events,transactions, sessions,customers,etc
  • 8. Why Fast Data? 1. Relevant up-to-date information. 2. Delivers actionable events.
  • 9. Why Big Data? 1. Analyze and model 2. Learn, cluster, categorize, organize facts
  • 10. 10 Real Time APIs Streaming Data Data Sources, Files, DB extracts Batched Data Training, Scoring and Exposing models
  • 11. 11 Real Time APIs Streaming Data Data Sources, Files, DB extracts Batched Data Training, Scoring and Exposing models
  • 12. 12 Real Time APIs Streaming Data Data Sources, Files, DB extracts Batched Data Training, Scoring and Exposing models
  • 13. Cassandra+Akka+Spark: Machine Learning Fast writes 2D Data Structure Replicated Tunable consistency Multi-Data centers C*Akka Spark Very Fast processing Distributed, Scalable computing Actor-based Pipelines Actor state can be persisted Supervision strategies Ad-Hoc Queries Joins, Aggregate User Defined Functions Machine Learning, Advanced Stats and Analytics
  • 14. Akka-Cassandra-Spark Stack Cassandra-Spark Connector Cassandra Spark Streaming SQL MLlib Graphx Extract Data Create Models, Enrich, Transform Fetch from other Sources: Kafka Fetch from other Sources: DB’s, Files Akka Analytics, Statistics, Data Science, Model Training Access Model Persist Actors’ State
  • 15. Cassandra-Spark Connector Cassandra: Store all the data Spark: Analyze all the data DC1: replication factor 3 DC2: replication factor 3 DC3: replication factor 3 + Spark Executors Storage! Analytics! Data
  • 16. Data Science: Anomaly Detection An outlier is an observation that deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism. Hawkins, 1980
  • 17. Data Science: Anomaly Detection Distance Based Density Based
  • 18. Example: Analyze gowalla check-ins year | month | day | time | uid | lat | lon | ts | vid ------+-------+-----+------+--------+----------+-----------+--------------------------+--------- 2010 | 9 | 14 | 91 | 853 | 40.73474 | -73.87434 | 2010-09-14 00:01:31+0000 | 917955 2010 | 9 | 14 | 328 | 4516 | 40.72585 | -73.99289 | 2010-09-14 00:05:28+0000 | 37160 2010 | 9 | 14 | 344 | 2964 | 40.67621 | -73.98405 | 2010-09-14 00:05:44+0000 | 956870 Check-ins dataset Venues dataset vid | name | lat | long ------+-------+-----+------+--------+----------+----------- +--------------------------+--------- 754108 | My Suit NY | 40.73474 | -73.87434 249755 | UA Court Street Stadium 12 | 40.72585 | -73.99289 6919688 | Sky Asian Bistro | 40.67621 | -73.98405
  • 20. Data Science: clustering venues Weekly visitors patterns! Madison Square, Apple Store, Radio City Music Hall Thursdays, Fridays, Saturdays are busy Statue of Liberty, Jacob K. Javits Convention Center, Whole Foods Market (Columbus Circle) Not popular on midweek Intuition:
  • 21. Data Science: clustering with k-means Histograms components as dimensions Similar histograms would occupy similar places in the feature space How do I compare histograms: - EMD - Chi-squared distance - Space transformation (DCT) Intuition:
  • 22. K-Means: Featurize data + cluster val weekly_visits = checkins_venues.select("vid","ts") .map(row => (row.getLong("vid"), vectorize_time(s.getTimestamp("ts")) .reduceByKey(_ + _) .mapValues(_ => featurize_histogram(_._1)) val numClusters = 15 val numIterations = 100 val clusters = KMeans.train(weekly_visits, numClusters, numIterations) PairRDDs, weekly patterns per venue cluster similar weekly patterns
  • 23. How to use it 1) Classification Classify venues to given groups 2) Anomaly Detection Detect shift in the clustering assignment for a given venue for a given week Keep monitoring weekly change in patterns, when it happens trigger a signal week 26 week 27 Action
  • 24. Data Science: clustering users’ venues
  • 25. Data Science: clustering users’ venues Users tend to stick in the same places People have habits By clustering the places together We can identify anomalous locations Size of the cluster matters More points means less anomalous Mini-clusters and single anomalies are treated in similar ways ... Intuition:
  • 26. Data Science: clustering with DBSCAN DBSCAN find clusters based on neighbouring density Does not require the number of cluster k beforehand. Clusters are not spherical
  • 27. Data Science: clustering users’ venues val locs = checkins_venues.select("uid", "lat","lon") .map(s => (s.getLong(0), Seq( (s.getDouble(1), s.getDouble(2)) )) .reduceByKey(_ + _) .mapValues( dbscan (_) ) Have a look at: scalanlp/nak
  • 28. Data Science: Two ways to find anomalies with clustering - Cluster big amount of data with k-means and histograms - Apply clustering independently to million of users, to each identify the patterns with dbscan algorithm
  • 29. MLlib vs PairRDDs KMeans.train(FeaturesRDD, numClusters, numIterations) UserFeaturesPairRDD.GroupbyKey().mapValues( dbscan(_) ) RDDs map functions Parallelism easy to exploit The function runs locally for each Key Pick your fav machine learning algorithms Limited nr of points Running in parallel for millions of Keys MLlib Truly distributed algorithm Classify venues to given groups Millions of datapoints Limited amount of clusters
  • 30. 30 Real Time APIs Streaming Data Data Sources, Files, DB extracts Batched Data Training, Scoring and Exposing models
  • 31. Training vs Scoring: Latency budget ● Akka: millisecond response ● Spark: in-memory data models Train: Spark Score: Spark Train: Spark Score: Akka slow: minutes fast: millisecs Model Scoring ModelTraining slow:minutes
  • 32. Akka Mixed Load Cassandra Cluster Coral: Web API for dynamic data flows
  • 33. Akka Web API for dynamic data flows ● a web api to define/manage/run streaming data-flows ● open source and community managed ● event processing as a service coral-streaming/coral Steven Raemaekers Jasper van Zandbeek Ger van Rossum Hoda Alemi Koen Verschuren
  • 34. 34 Real Time APIs Streaming Data Data Sources, Files, DB extracts Batched Data Summary:
  • 35. Akka Feedback to the community: More Algorithms for machine learning! - DBSCAN, OPTICS, PAM - More metrics, non-euclidean spaces, etc - Non distributed algorithms: more scalanlp integration? Streaming all the way: Unify batch (Spark) and event streaming (Akka) computing
  • 36. Thanks! - Vision and strategy on an event-driven bank - ING CIO management team and awesome colleagues Spark, Cassandra, Akka communities !
  • 37. webinar + live demo: Dec 9th
  • 38. Resources Coral: event processing webapi https://github.com/coral-streaming/coral Spark + Cassandra: Clustering Events http://www.natalinobusa.com/2015/07/clustering-check-ins-with-spark-and.html Spark: Machine Learning, SQL frames https://spark.apache.org/docs/latest/mllib-guide.html https://spark.apache.org/docs/latest/sql-programming-guide.html Datastax: Analytics and Spark connector http://www.slideshare.net/doanduyhai/spark-cassandra-connector-api-best-practices-and-usecases http://docs.datastax.com/en/datastax_enterprise/4.7/datastax_enterprise/anaHome/anaHome.html Anomaly Detection Chandola, V.; Banerjee, A.; Kumar, V. (2009). "Anomaly detection: A survey"(PDF). ACM Computing Surveys 41 (3): 1. doi:10.1145/1541880.1541882.
  • 39. Resources Datasets https://snap.stanford.edu/data/loc-gowalla.html E. Cho, S. A. Myers, J. Leskovec. Friendship and Mobility: Friendship and Mobility: User Movement in Location-Based Social Networks ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2011 https://code.google.com/p/locrec/downloads/detail?name=gowalla-dataset.zip The project is being developed in the context of the SInteliGIS project financed by the Portuguese Foundation for Science and Technology (FCT) through project grant PTDC/EIA-EIA/109840/2009. . Pictures: "DBSCAN-density-data" by Chire - Own work. Licensed under CC BY-SA 3.0 via Commons - https://commons.wikimedia.org/wiki/File:DBSCAN-density-data. svg#/media/File:DBSCAN-density-data.svg "DBSCAN-Illustration" by Chire - Own work. Licensed under CC BY-SA 3.0 via Commons - https://commons.wikimedia.org/wiki/File:DBSCAN-Illustration.svg#/media/File: DBSCAN-Illustration.svg "Multimodal" by Visnut - Own work. Licensed under CC BY-SA 4.0 via Commons - https://commons.wikimedia.org/wiki/File:Multimodal.png#/media/File:Multimodal.png "Standard deviation diagram" by Mwtoews - Own work, based (in concept) on figure by Jeremy Kemp, on 2005-02-09. Licensed under CC BY 2.5 via Commons - https: //commons.wikimedia.org/wiki/File:Standard_deviation_diagram.svg#/media/File:Standard_deviation_diagram.svg "Michelsonmorley-boxplot" by User:Schutz - Own work. Licensed under Public Domain via Commons - https://commons.wikimedia.org/wiki/File:Michelsonmorley-boxplot. svg#/media/File:Michelsonmorley-boxplot.svg