© 2017 MapR Technologies, MapR Confidential
Applying Machine Learning to Live Patient Data
Carol McDonald (@caroljmcdonald) & Joseph Blue (@joebluems)
March 15, 2017
Data-Driven Experience
The Promise of Big Data in Healthcare
SMARTER, BIGGER, FASTER
Life moves pretty fast. If you don't
stop and look around once in a
while, you could miss it.
Ferris Bueller, Fictional High School Student
Reading an EKG
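[Figure: one heartbeat on an EKG trace with the points P, Q, R, S and T labelled]
P wave: atrial depolarization
QRS complex: ventricular depolarization
T wave: ventricular repolarization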
Windowing the EKG for Clustering
window length = 32, step size = 2
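The windowing step can be sketched in plain Scala as follows. This is a minimal illustration, assuming the signal is already available as an in-memory vector of samples; the object and method names are hypothetical and are not taken from the deck's repository. Only the window length (32) and step size (2) come from the slide.

```scala
object EkgWindowing {
  /** Cut a signal into fixed-length windows that start every `step` samples.
    * Trailing samples that do not fill a whole window are dropped. */
  def windows(signal: Vector[Double],
              length: Int = 32,
              step: Int = 2): Vector[Vector[Double]] =
    signal
      .sliding(length, step)          // overlapping slices of the signal
      .filter(_.length == length)     // discard a short final slice, if any
      .toVector
}
```

With a step of 2, consecutive windows share 30 of their 32 samples, so the clustering sees every shape at many slightly different offsets.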
Displaying Centroids
Showing 25 of K=400 centroids
Begin reconstruction
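Reconstruction starts by replacing each window with its closest centroid from the catalog. The sketch below shows that lookup in plain Scala, assuming squared Euclidean distance (the distance k-means minimizes); the names `ShapeCatalog` and `nearest` are hypothetical. It mirrors what a k-means model's predict step returns: the index of the closest cluster center.

```scala
object ShapeCatalog {
  type Window = Vector[Double]

  /** Squared Euclidean distance between two windows of equal length. */
  def squaredDistance(a: Window, b: Window): Double = {
    require(a.length == b.length, "windows must have the same length")
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum
  }

  /** Index of the centroid closest to `window`. */
  def nearest(centroids: Vector[Window], window: Window): Int = {
    require(centroids.nonEmpty, "the catalog must contain at least one centroid")
    centroids.indices.minBy(i => squaredDistance(centroids(i), window))
  }
}
```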
Reconstructing the Signal
[Figure: two overlapping windows, 1 and 2, are added together to rebuild the signal]
window length = 32, step size = 16
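Adding overlapping windows back together can be sketched as an overlap-add. The slide gives only the window length (32) and step size (16); the sine-squared taper used here is an assumption chosen because, with a step of half the window length, the weights of the two windows covering any interior sample sum to exactly 1. In this simplified version the taper is applied at reconstruction time, and all names are hypothetical.

```scala
object OverlapAdd {
  /** Sine-squared taper: taper(i) + taper(i + length / 2) == 1 for 0 <= i < length / 2. */
  def taper(length: Int): Vector[Double] =
    Vector.tabulate(length) { i =>
      val s = math.sin(math.Pi * i / length)
      s * s
    }

  /** Add tapered windows back together; window j is placed at offset j * step. */
  def reconstruct(windows: Vector[Vector[Double]],
                  length: Int = 32,
                  step: Int = 16): Vector[Double] = {
    require(windows.nonEmpty, "need at least one window")
    require(windows.forall(_.length == length), "every window must have `length` samples")
    val w = taper(length)
    val out = Array.fill((windows.length - 1) * step + length)(0.0)
    for ((win, j) <- windows.zipWithIndex; i <- 0 until length)
      out(j * step + i) += w(i) * win(i)
    out.toVector
  }
}
```

Samples in the first and last half window are covered by only one window, so they are attenuated by the taper; only the interior is reconstructed at full scale.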
Diagnosing the Anomalies
residuals
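A residual is the difference between the observed signal and its reconstruction; where the catalog of shapes cannot reproduce the input, the residual is large. A minimal sketch, with hypothetical names:

```scala
object Residuals {
  /** Point-by-point difference between the observed signal and its reconstruction. */
  def residuals(observed: Vector[Double], reconstructed: Vector[Double]): Vector[Double] = {
    require(observed.length == reconstructed.length, "signals must have the same length")
    observed.zip(reconstructed).map { case (o, r) => o - r }
  }
}
```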
Putting it all together…
[Diagram components: input, encoder, shape catalog, reconstruct, error, t-digest quantile estimator]
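The role of the quantile estimator can be sketched as turning reconstruction errors into a threshold: errors above a high quantile are flagged as anomalies. The code below is not t-digest; it substitutes an exact, sort-based quantile, which is only practical for small in-memory samples, whereas t-digest estimates quantiles from a stream in bounded memory. The quantile level `q` and all names are assumptions for illustration.

```scala
object ErrorThreshold {
  /** Exact empirical quantile (nearest-rank method); a stand-in for a streaming estimator. */
  def quantile(values: Vector[Double], q: Double): Double = {
    require(values.nonEmpty && q >= 0.0 && q <= 1.0, "need data and 0 <= q <= 1")
    val sorted = values.sorted
    val rank = math.ceil(q * sorted.length).toInt
    sorted(math.max(rank, 1) - 1)
  }

  /** Indices whose absolute error exceeds the q-quantile of all absolute errors. */
  def anomalies(errors: Vector[Double], q: Double): Vector[Int] = {
    val magnitudes = errors.map(e => math.abs(e))
    val threshold = quantile(magnitudes, q)
    magnitudes.zipWithIndex.collect { case (e, i) if e > threshold => i }
  }
}
```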
Use Case Architecture
Lots of things are producing Streaming Data
Data collection devices: smart machinery, phones and tablets, home automation, RFID systems, digital signage, security systems, medical devices.
Streams capture unbounded sequences of events
[Diagram: producers write through the Kafka API to a MapR cluster holding the topic "Admission" in three partitions (partitions 1 to 3 on servers 1 to 3); consumers read through the Kafka API. Within each partition, messages are numbered from old to new.]
Events are delivered in the order they are received, like a queue.
Stream Topics Organize Events into Categories
[Diagram: producers write through the Kafka API to stream topics stored in MapR-FS; several consumers read through the Kafka API.]
Unlike a queue, messages are not deleted when they are read, which allows the same event to be processed for different views.
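The difference from a queue can be sketched with a tiny in-memory log in which each consumer keeps its own read position. This is an illustration of the idea only, not the Kafka or MapR Streams API; the class and method names are hypothetical and the log has a single partition.

```scala
import scala.collection.mutable

/** A single-partition, append-only log: reading never removes messages. */
final class TopicLog[A] {
  private val messages = mutable.ArrayBuffer.empty[A]
  private val offsets = mutable.Map.empty[String, Int].withDefaultValue(0)

  /** Append a message and return its offset. */
  def publish(message: A): Int = {
    messages += message
    messages.length - 1
  }

  /** Return the messages this consumer has not seen yet, in arrival order,
    * and advance only this consumer's offset. */
  def poll(consumer: String): Vector[A] = {
    val unseen = messages.drop(offsets(consumer)).toVector
    offsets(consumer) = messages.length
    unseen
  }
}
```

Two consumers, for example one feeding a dashboard and one archiving, each receive every message in order because neither read affects the other's offset.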
Predictive Analytics
[Diagram: historical data is featurized and used for model building with machine learning algorithms; test model predictions feed model evaluation, which yields the predictive model. New data from a stream topic is featurized and passed to model scoring, which produces predictions.]
Stream Processing Architecture
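[Diagram with four stages: Data Sources, Collect Data, Stream Processing, Serve Data. Data sources (devices, images, HL7, social media, lab) are collected into a stream topic; stream processing performs feature extraction, derives features and processes them with machine-learning models, and writes to a stream topic for serving. Batch processing builds and updates the model.]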
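Build Model
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// put each line of readings into a dense vector
// ('\t' assumes tab-separated input; the slide text shows 't', likely a lost backslash)
val vrdd = rdd.map(line =>
  Vectors.dense(line.split('\t').map(_.toDouble)))
// window and normalize each record (elided on the slide); the result is `processed`
// call KMeans.train(data, k, maxIterations), which returns the model
val model = KMeans.train(processed, 300, 10)
model.save(sc, "/user/user01/data/anomaly-detection-master")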
Use the Model with Streaming Data
Use Case: Real Time Anomaly Detection
[Diagram: EKG data is read from a stream topic; Spark Streaming enriches it with cluster normalized data and writes the result to a second stream topic for real-time monitoring.]
Input record: 17.9200 12.8000 38.4000
Enriched message: {"c":120,"colA":[17.92, 12.88, ..],"colB":[17.91, 12.89, 0...]}
Create a DStream
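DStream: a sequence of RDDs representing a stream of data. The messages read from the stream topic in each batch interval (time 0 to 1, 1 to 2, 2 to 3) are stored in memory as one RDD.
// load the k-means model saved by the batch job
val model = KMeansModel.load(ssc.sparkContext, modelpath)
// create a direct stream of (key, value) records from the topic
// (KafkaUtils and LocationStrategies come from the Spark Kafka integration package in use)
val messagesDStream = KafkaUtils.createDirectStream[String, String](
  ssc, LocationStrategies.PreferConsistent, consumerStrategy
)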
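The idea of cutting a stream into per-interval batches can be sketched in plain Scala by grouping timestamped events by interval. This is not how Spark Streaming is implemented (Spark forms batches from data as it arrives); it only illustrates how an unbounded sequence becomes a sequence of small collections, one per interval. The timestamps, names and interval are assumptions.

```scala
object MicroBatches {
  /** Group (timestampMillis, value) events into consecutive batches of `intervalMillis`,
    * keyed by batch index (0 for the first interval, 1 for the second, and so on). */
  def batches[A](events: Seq[(Long, A)], intervalMillis: Long): Map[Long, Seq[A]] = {
    require(intervalMillis > 0, "the batch interval must be positive")
    events
      .groupBy { case (timestamp, _) => timestamp / intervalMillis }
      .map { case (batch, grouped) => batch -> grouped.map(_._2) }
  }
}
```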
Process DStream
// get the message values from the (key, value) records
val valuesDStream: DStream[String] = messagesDStream.map(_.value())
valuesDStream.foreachRDD { rdd =>
  val producer = KafkaProducerFactory.getOrCreateProducer(conf)
  // .... (elided on the slide)
  // enrich the message with the cluster assigned by the model
  val cluster = model.predict(processed)
  // .... (elided on the slide)
  val record = new ProducerRecord(topicp, "key", message)
  // send the enriched message
  producer.send(record)
}
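How the enriched message might be assembled can be sketched as below. The field names `c`, `colA` and `colB` are taken from the sample message in the deck; the slides do not say what `colA` and `colB` hold, so the sketch simply treats them as two numeric series. The string is built by hand to stay dependency-free, and the object and method names are hypothetical; a JSON library would be the usual choice in practice.

```scala
object Enrich {
  /** Render a numeric series as a JSON array. */
  private def jsonArray(values: Seq[Double]): String =
    values.mkString("[", ",", "]")

  /** Build a message carrying the cluster id `c` and two numeric series. */
  def message(cluster: Int, colA: Seq[Double], colB: Seq[Double]): String =
    s"""{"c":$cluster,"colA":${jsonArray(colA)},"colB":${jsonArray(colB)}}"""
}
```

A call such as `Enrich.message(cluster, colA, colB)` would produce the string passed to the `ProducerRecord` in place of `message`.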
Process DStream
[Diagram: each batch RDD of the dStream read from the stream topic (time 0 to 1, 1 to 2, 2 to 3) is transformed by map into the corresponding RDD of the values DStream.]
Use Case: Real Time Anomaly Detection
[Diagram: Spark Streaming reads from a stream topic, enriches the data with cluster normalized data and writes to a stream topic; a Vert.x application (HTTP, event bus, WebSocket event bus) forwards the enriched messages to the real-time monitoring view.]
Enriched message: {"c":120,"colA":[17.92, 12.88, ..],"colB":[17.91, 12.89, 0...]}
Resources
•  EKG basics - http://en.wikipedia.org/wiki/Electrocardiography
•  Source data - http://physionet.org/physiobank/database/apnea-ecg/
•  K-Means basics - http://www.coursera.org/learn/machine-learning/lecture/93VPG/k-means-algorithm
•  Code repositories
–  Streaming: http://github.com/caroljmcdonald/sparkml-streaming-ekg
–  UI: http://github.com/caroljmcdonald/mapr-streams-vertx-dashboard
•  t-digest for anomalies - http://github.com/tdunning/t-digest
A New Look at Anomaly Detection
by Ted Dunning and Ellen Friedman (published by O’Reilly)
e-book available courtesy of MapR: https://www.mapr.com/practical-machine-learning-new-look-anomaly-detection
MapR Blog mapr.com/blog
Q&A
@mapr
Engage with us!
mapr-technologies
Carol McDonald (@caroljmcdonald)
Joseph Blue (@joebluems)

Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
Clustering techniques data mining book ....
Clustering techniques data mining book ....Clustering techniques data mining book ....
Clustering techniques data mining book ....ShaimaaMohamedGalal
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AIABDERRAOUF MEHENNI
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendArshad QA
 

Último (20)

Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
Clustering techniques data mining book ....
Clustering techniques data mining book ....Clustering techniques data mining book ....
Clustering techniques data mining book ....
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and Backend
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 

Applying Machine Learning to Live Patient Data

  • 1. Applying Machine Learning to Live Patient Data. Carol McDonald (@caroljmcdonald) & Joseph Blue (@joebluems), March 15, 2017. © 2017 MapR Technologies, MapR Confidential.
  • 2. Data-Driven Experience
  • 3. The Promise of Big Data in Healthcare: Smarter, Bigger, Faster
  • 4. "Life moves pretty fast. If you don't stop and look around once in a while, you could miss it." (Ferris Bueller, fictional high school student)
  • 5. Reading an EKG: the P wave marks atrial depolarization, the QRS complex marks ventricular depolarization, and the T wave marks ventricular repolarization.
  • 6. Windowing the EKG for Clustering: window length = 32, step size = 2
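The windowing step on slide 6 can be sketched in plain Scala. This is a minimal illustration, not the authors' code: the object and method names are hypothetical, and dropping a trailing partial window is an assumption.

```scala
object Windowing {
  /** Split a signal into fixed-length windows, advancing `step` samples each time.
    * Trailing samples that do not fill a whole window are dropped (an assumption). */
  def slidingWindows(signal: Vector[Double], length: Int, step: Int): Vector[Vector[Double]] =
    signal.sliding(length, step).filter(_.length == length).toVector
}
```

With the slide's parameters (length 32, step 2), a 100-sample signal yields 35 overlapping windows, each of which would become one input row for clustering.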
  • 7. Displaying Centroids: showing 25 of K = 400 centroids. Begin reconstruction.
  • 8. Reconstructing the Signal: overlapping windows 1 and 2 are added together (window length = 32, step size = 16)
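Adding half-overlapping windows, as on slide 8, can be sketched as an overlap-add. The sine-squared taper below is an assumption (the slide does not name a window function); it is chosen because, with a step of half the window length, the overlapping weights sum to 1. In the full pipeline each window would first be replaced by its nearest centroid; this sketch only shows the summation, and all names are hypothetical.

```scala
object Reconstruction {
  /** Sine-squared taper (assumed): with a step of n / 2, overlapping weights sum to 1. */
  def taper(n: Int): Vector[Double] =
    Vector.tabulate(n) { i => val s = math.sin(math.Pi * i / n); s * s }

  /** Overlap-add: place window k at offset k * step and sum the overlapping samples.
    * All windows are assumed to have the same length. */
  def overlapAdd(windows: Seq[Vector[Double]], step: Int): Vector[Double] =
    if (windows.isEmpty) Vector.empty
    else {
      val n = windows.head.length
      val out = Array.fill((windows.length - 1) * step + n)(0.0)
      for ((w, k) <- windows.zipWithIndex; i <- w.indices) out(k * step + i) += w(i)
      out.toVector
    }
}
```

If each window of a signal is multiplied by `taper(32)` and the results are passed to `overlapAdd` with step 16, every sample covered by two windows is recovered exactly; only the first and last 16 samples, covered by a single window, are attenuated.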
  • 9. Diagnosing the Anomalies: residuals
  • 10. Putting it all together: input, encoder, shape catalog, reconstruct, error, t-digest quantile estimator
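The last stage on slide 10, flagging unusually large reconstruction errors, can be sketched as follows. To stay self-contained, this uses an exact quantile over a sorted copy as a stand-in for the streaming t-digest estimate named on the slide; the names and the example quantile are hypothetical.

```scala
object AnomalyThreshold {
  /** Exact quantile over a sorted copy; a stand-in for a streaming t-digest estimate. */
  def quantile(values: Seq[Double], q: Double): Double = {
    require(values.nonEmpty && q >= 0.0 && q <= 1.0)
    val sorted = values.sorted
    sorted(math.min(sorted.length - 1, (q * sorted.length).toInt))
  }

  /** Indices whose absolute reconstruction error exceeds the q-quantile of all errors. */
  def anomalies(input: Seq[Double], reconstructed: Seq[Double], q: Double): Seq[Int] = {
    val errors = input.zip(reconstructed).map { case (x, r) => math.abs(x - r) }
    val cutoff = quantile(errors, q)
    errors.zipWithIndex.collect { case (e, i) if e > cutoff => i }
  }
}
```

A streaming estimator such as t-digest matters on live data because the errors cannot all be held and sorted; the thresholding logic itself stays the same.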
  • 11. Use Case Architecture
  • 12. Lots of things are producing streaming data: data collection devices, smart machinery, phones and tablets, home automation, RFID systems, digital signage, security systems, medical devices
  • 13. Streams capture unbounded sequences of events. Producers write through the Kafka API to a topic (Admission) in the MapR cluster, with partitions 1 to 3 on servers 1 to 3; consumers read through the Kafka API. Within a partition, events are delivered in the order they are received (old message to new message), like a queue.
  • 14. Stream Topics Organize Events into Categories. Diagram labels: producers, consumers, Kafka API, MapR-FS. Unlike a queue, messages are not deleted when they are read, which allows the same event to be processed for different views.
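The difference from a queue described on slide 14 can be modelled in a few lines. This is a simplified in-memory sketch, not the Kafka API or MapR Streams: the class names are hypothetical, and the only point is that reads do not delete messages and each consumer keeps its own offset.

```scala
object TopicLog {
  /** An append-only partition: reading never removes messages. */
  final class Partition[A] {
    private var log = Vector.empty[A]
    /** Append a message and return its offset. */
    def append(msg: A): Long = { log = log :+ msg; log.length - 1L }
    /** Return up to `max` messages starting at `offset`, oldest first. */
    def read(offset: Long, max: Int): Vector[A] =
      log.slice(offset.toInt, offset.toInt + max)
  }

  /** Each consumer tracks its own offset, so consumers do not affect one another. */
  final class Consumer[A](partition: Partition[A]) {
    private var offset = 0L
    def poll(max: Int): Vector[A] = {
      val batch = partition.read(offset, max)
      offset += batch.length
      batch
    }
  }
}
```

Two consumers created over the same partition each see every message in order, which is what lets one event feed several different views.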
  • 15. Predictive Analytics. Model building (historical data, featurization, machine learning algorithms, test model predictions, model evaluation) produces a predictive model; model scoring applies that model to featurized new data from a stream topic to produce predictions.
  • 16. Stream Processing Architecture. Data sources: devices, images, HL7, social media, lab. Collect data: stream topic. Process: stream processing (feature extraction, derive features) and batch processing (build model, update model, machine-learning models). Serve data: stream topic.
  • 17. Build Model:
        import org.apache.spark.mllib.clustering.KMeans
        import org.apache.spark.mllib.linalg.Vectors
        // put each tab-separated line in a dense vector
        val vrdd = rdd.map(line => Vectors.dense(line.split('\t').map(_.toDouble)))
        // window and normalize each record ....
        // call KMeans (k = 300 clusters, 10 iterations), which returns the model
        val model = KMeans.train(processed, 300, 10)
        model.save(sc, "/user/user01/data/anomaly-detection-master")
  • 18. Use the Model with Streaming Data
  • 19. Use Case: Real Time Anomaly Detection. Spark Streaming reads EKG data (for example 17.9200 12.8000 38.4000) from a stream topic, enriches it with the cluster and the normalized data, and writes it to a second stream topic for real-time monitoring: {"c":120,"colA":[17.92, 12.88, ..],"colB":[17.91, 12.89, 0...]}
  • 20. Create a DStream. A DStream is a sequence of RDDs representing a stream of data; each batch interval from the stream topic (time 0 to 1, 1 to 2, 2 to 3) is stored in memory as an RDD.
        val model = KMeansModel.load(ssc.sparkContext, modelpath)
        val messagesDStream = KafkaUtils.createDirectStream[String, String](
          ssc, LocationStrategies.PreferConsistent, consumerStrategy
        )
  • 21. Process DStream:
        // get message values from the (key, value) records
        val valuesDStream: DStream[String] = messagesDStream.map(_.value())
        valuesDStream.foreachRDD { rdd =>
          val producer = KafkaProducerFactory.getOrCreateProducer(conf)
          ....
          // enrich message with the cluster predicted by the model
          val cluster = model.predict(processed)
          ....
          val record = new ProducerRecord(topicp, "key", message)
          // send enriched message
          producer.send(record)
        }
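The `model.predict` call on slide 21 returns the index of the cluster whose center is closest to the input vector. A plain-Scala sketch of that lookup, using Scala's own `Vector` rather than Spark's vector type and hypothetical names, might look like this:

```scala
object NearestCentroid {
  private def squaredDistance(a: Vector[Double], b: Vector[Double]): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  /** Index of the centroid closest to `point` (ties go to the lowest index). */
  def predict(centroids: Vector[Vector[Double]], point: Vector[Double]): Int = {
    require(centroids.nonEmpty)
    centroids.indices.minBy(i => squaredDistance(centroids(i), point))
  }
}
```

In the streaming job this index is what gets attached to each message before it is sent on; the sketch ignores the windowing and normalization that the slide elides.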
  • 22. Process DStream: map is applied to the RDD of each batch interval (time 0 to 1, 1 to 2, 2 to 3) of the DStream read from the stream topic, producing the transformed RDDs of the values DStream.
  • 23. Use Case: Real Time Anomaly Detection. Spark Streaming writes the enriched messages (cluster plus normalized data) to a stream topic; Vert.x (HTTP, WebSocket, event bus framework) reads them and delivers them for real-time monitoring: {"c":120,"colA":[17.92, 12.88, ..],"colB":[17.91, 12.89, 0...]}
  • 25. Resources
        • EKG basics - http://en.wikipedia.org/wiki/Electrocardiography
        • Source data - http://physionet.org/physiobank/database/apnea-ecg/
        • K-Means basics - http://www.coursera.org/learn/machine-learning/lecture/93VPG/k-means-algorithm
        • Code repositories
            - Streaming: http://github.com/caroljmcdonald/sparkml-streaming-ekg
            - UI: http://github.com/caroljmcdonald/mapr-streams-vertx-dashboard
        • t-digest for anomalies - http://github.com/tdunning/t-digest
  • 26. E-book available courtesy of MapR: A New Look at Anomaly Detection by Ted Dunning and Ellen Friedman (published by O'Reilly), https://www.mapr.com/practical-machine-learning-new-look-anomaly-detection
  • 27. MapR Blog: mapr.com/blog
  • 28. Q&A. Engage with us: @mapr, mapr-technologies. Carol McDonald (@caroljmcdonald), Joseph Blue (@joebluems)