SlideShare uma empresa Scribd logo
1 de 18
© 2014 MapR Technologies 1
Talk Overview
• Agile Real-time Stats
• R + Storm
github.com/allenday/R-Storm
• DEMO
• How to do it?
• Q & A @allenday
Agile
Methods
Advanced
Statistics
Continuous
Real-time
Delivery
github.com/allenday/hadoop-summit-r-storm-demo-public
© 2014 MapR Technologies 2© 2014 MapR Technologies
Architecting R into the Storm
Application Development Process
© 2014 MapR Technologies 3
Allen (me) and Sungwook @ MapR
• Allen Day, Principal Data Scientist [ @allenday ]
7yr Hadoop dev, 12yr R dev/author
PhD, Human Genetics, UCLA Medicine
• Sungwook Yoon, Data Scientist
Spark & Security Expert
PhD, Computer Engineering, Purdue
• MapR [ @mapr ]
Distributes open source components for Hadoop
Adds major technology for performance, HA, industry standard APIs
© 2014 MapR Technologies 4
What’s Storm? What’s R?
• What’s Storm?
– Processes a data stream. Akin to UNIX pipe + tee & merge commands
– Runs on a cluster. Fault-tolerant and designed to scale out
– Used for: real-time analytics & machine learning
• What’s R?
– Programming language with advanced statistics libraries
– Does not scale out. Can scale up
– Used for: prototyping, data modeling, visualization
How to combine these?
© 2014 MapR Technologies 5
R outside, Storm inside: not practical. Why?
• Model-building and QA is done
on data snapshots
• However, R => Hadoop is
realistic. Key difference:
referenced data can be static
– Use MapR snapshots for dev and
QA
– See also: RHIPE (Purdue) and
RHadoop (RevolutionAnalytics)
R
Storm
User
© 2014 MapR Technologies 6
Storm outside, R inside: a good fit
• Enables separation of concerns
– Independently manage modeling,
ops timelines, and version control
– Integrate as needed
• Enables role specialization
– R built-ins allow faster iteration
and more concise stats-type code
– Do DevOps with specific SW
engineering tech, e.g. Java
Storm
R
User
© 2014 MapR Technologies 7© 2014 MapR Technologies
Q: Who really likes statistics?
A: Baseball fans
A: Team Managers = Portfolio Managers
© 2014 MapR Technologies 8
© 2014 MapR Technologies 9
Fresh Local Data Tonight!
© 2014 MapR Technologies 10
Famous Vintage Data
Oakland Athletics
2002 Season
20 consecutive
wins – the current
record
Obligatory movie
ref… I’m from LA
LET’S GO DODGERS!
© 2014 MapR Technologies 11© 2014 MapR Technologies
Goal: Detect “Moneyball” 2002 Winning Streak
© 2014 MapR Technologies 12
Methods:
Change Point Detection
Find natural breakpoints in a
time-series set of data points
R packages implement this:
changepoint: more
sensitve, but not streaming
bcp: streaming, but less
sensitive
© 2014 MapR Technologies 13
GIFs to
MapR
Filesystem
Methods: R+Storm Demo Architecture
Storm Bolt
R online
change point
detector
Storm Bolt
(write to Jetty)
Oakland A’s
Data
(accelerated)
Jetty
Webserver
Browser
(D3.js) Us 
github.com/allenday/hadoop-summit-r-storm-demo-public
© 2014 MapR Technologies 14© 2014 MapR Technologies
50-game sliding
window/buffer to
detect change points
Cumulative history
with detected break
points
Raw data (score
difference between
A’s and opponent)
Demo
© 2014 MapR Technologies 15
Methods Details: How it’s done
• Uses R-Storm binding github.com/allenday/R-Storm
– Storm package on CRAN cran.r-project.org/web/packages/Storm
Storm (dev team)
R
(stats team)
Storm
(dev team, pure
Java)
Producer Consumer
© 2014 MapR Technologies 16
Methods Details: Easy integration
R: lambda function
storm = Storm$new();
storm$lambda = function(s) {
t = s$tuple;
t$output = vector(length=1); t$output[1] = “tada!”
s$emit(t)
}
Storm: extend ShellBolt
public static class MyRBolt extends ShellBolt implements IRichBolt
{
public RBolt() {
super("Rscript", ”my.R");
}
}
© 2014 MapR Technologies 17
Results
• Change points are identified, but none for winning streak
– Not using score difference, anyway
• Time to integrate with the modeling team!
– Send @kunpognr or @allenday a pull request on GitHub
• Applicable to many other use cases, e.g.
– Security (fraud detection, intrusion detection)
– Marketing (intent to purchase / social media streams)
– Customer Support (help desk voice calls)
Discussion
© 2014 MapR Technologies 18
Q&A
@allenday maprtech
allenday@mapr.com
Engage with us!
MapR
maprtech
linkedin.com/in/allenday

Mais conteúdo relacionado

Mais procurados

Approaching real-time-hadoop
Approaching real-time-hadoopApproaching real-time-hadoop
Approaching real-time-hadoop
Chris Huang
 
Scaling big-data-mining-infra2
Scaling big-data-mining-infra2Scaling big-data-mining-infra2
Scaling big-data-mining-infra2
Chris Huang
 
ACM DEBS 2015: Realtime Streaming Analytics Patterns
ACM DEBS 2015: Realtime Streaming Analytics PatternsACM DEBS 2015: Realtime Streaming Analytics Patterns
ACM DEBS 2015: Realtime Streaming Analytics Patterns
Srinath Perera
 
Dunning time-series-2015
Dunning time-series-2015Dunning time-series-2015
Dunning time-series-2015
Ted Dunning
 

Mais procurados (20)

Strata 2014 Anomaly Detection
Strata 2014 Anomaly DetectionStrata 2014 Anomaly Detection
Strata 2014 Anomaly Detection
 
Approaching real-time-hadoop
Approaching real-time-hadoopApproaching real-time-hadoop
Approaching real-time-hadoop
 
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
 
Scaling big-data-mining-infra2
Scaling big-data-mining-infra2Scaling big-data-mining-infra2
Scaling big-data-mining-infra2
 
Realtime Data Analysis Patterns
Realtime Data Analysis PatternsRealtime Data Analysis Patterns
Realtime Data Analysis Patterns
 
Sharing Sensitive Data Securely
Sharing Sensitive Data SecurelySharing Sensitive Data Securely
Sharing Sensitive Data Securely
 
Where is Data Going? - RMDC Keynote
Where is Data Going? - RMDC KeynoteWhere is Data Going? - RMDC Keynote
Where is Data Going? - RMDC Keynote
 
What's new in Apache Mahout
What's new in Apache MahoutWhat's new in Apache Mahout
What's new in Apache Mahout
 
Share and analyze geonomic data at scale by Andy Petrella and Xavier Tordoir
Share and analyze geonomic data at scale by Andy Petrella and Xavier TordoirShare and analyze geonomic data at scale by Andy Petrella and Xavier Tordoir
Share and analyze geonomic data at scale by Andy Petrella and Xavier Tordoir
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
High Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopHigh Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and Hadoop
 
How Big Data is Reducing Costs and Improving Outcomes in Health Care
How Big Data is Reducing Costs and Improving Outcomes in Health CareHow Big Data is Reducing Costs and Improving Outcomes in Health Care
How Big Data is Reducing Costs and Improving Outcomes in Health Care
 
Parikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATL
Parikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATLParikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATL
Parikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATL
 
Hadoop as a Platform for Genomics
Hadoop as a Platform for GenomicsHadoop as a Platform for Genomics
Hadoop as a Platform for Genomics
 
Big Data Analysis Starts with R
Big Data Analysis Starts with RBig Data Analysis Starts with R
Big Data Analysis Starts with R
 
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
 
ACM DEBS 2015: Realtime Streaming Analytics Patterns
ACM DEBS 2015: Realtime Streaming Analytics PatternsACM DEBS 2015: Realtime Streaming Analytics Patterns
ACM DEBS 2015: Realtime Streaming Analytics Patterns
 
Finding Changes in Real Data
Finding Changes in Real DataFinding Changes in Real Data
Finding Changes in Real Data
 
Dunning time-series-2015
Dunning time-series-2015Dunning time-series-2015
Dunning time-series-2015
 
My talk about recommendation and search to the Hive
My talk about recommendation and search to the HiveMy talk about recommendation and search to the Hive
My talk about recommendation and search to the Hive
 

Semelhante a R + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San Jose

Analyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache DrillAnalyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache Drill
Tomer Shiran
 
Predictive Analytics with Hadoop
Predictive Analytics with HadoopPredictive Analytics with Hadoop
Predictive Analytics with Hadoop
DataWorks Summit
 
The Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer ShiranThe Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer Shiran
MapR Technologies
 
Using Hadoop to Offload Data Warehouse Processing and More - Brad Anserson
Using Hadoop to Offload Data Warehouse Processing and More - Brad AnsersonUsing Hadoop to Offload Data Warehouse Processing and More - Brad Anserson
Using Hadoop to Offload Data Warehouse Processing and More - Brad Anserson
MapR Technologies
 
Batter Up! Advanced Sports Analytics with R and Storm
Batter Up! Advanced Sports Analytics with R and StormBatter Up! Advanced Sports Analytics with R and Storm
Batter Up! Advanced Sports Analytics with R and Storm
Revolution Analytics
 

Semelhante a R + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San Jose (20)

Big Data Everywhere Chicago: SQL on Hadoop
Big Data Everywhere Chicago: SQL on Hadoop Big Data Everywhere Chicago: SQL on Hadoop
Big Data Everywhere Chicago: SQL on Hadoop
 
2014.06.16 - BGI - Genomics BigData Workloads - Shenzhen China
2014.06.16 - BGI - Genomics BigData Workloads - Shenzhen China2014.06.16 - BGI - Genomics BigData Workloads - Shenzhen China
2014.06.16 - BGI - Genomics BigData Workloads - Shenzhen China
 
Analyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache DrillAnalyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache Drill
 
Predictive Analytics with Hadoop
Predictive Analytics with HadoopPredictive Analytics with Hadoop
Predictive Analytics with Hadoop
 
The Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer ShiranThe Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer Shiran
 
Hadoop and the Future of SQL: Using BI Tools with Big Data
Hadoop and the Future of SQL: Using BI Tools with Big DataHadoop and the Future of SQL: Using BI Tools with Big Data
Hadoop and the Future of SQL: Using BI Tools with Big Data
 
Analyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache DrillAnalyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache Drill
 
Using Hadoop to Offload Data Warehouse Processing and More - Brad Anserson
Using Hadoop to Offload Data Warehouse Processing and More - Brad AnsersonUsing Hadoop to Offload Data Warehouse Processing and More - Brad Anserson
Using Hadoop to Offload Data Warehouse Processing and More - Brad Anserson
 
Spark SQL versus Apache Drill: Different Tools with Different Rules
Spark SQL versus Apache Drill: Different Tools with Different RulesSpark SQL versus Apache Drill: Different Tools with Different Rules
Spark SQL versus Apache Drill: Different Tools with Different Rules
 
Batter Up! Advanced Sports Analytics with R and Storm
Batter Up! Advanced Sports Analytics with R and StormBatter Up! Advanced Sports Analytics with R and Storm
Batter Up! Advanced Sports Analytics with R and Storm
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and Analytics
 
Predictive Analytics San Diego
Predictive Analytics San DiegoPredictive Analytics San Diego
Predictive Analytics San Diego
 
Ted Dunning - Keynote: How Can We Take Flink Forward?
Ted Dunning -  Keynote: How Can We Take Flink Forward?Ted Dunning -  Keynote: How Can We Take Flink Forward?
Ted Dunning - Keynote: How Can We Take Flink Forward?
 
The power of hadoop in business
The power of hadoop in businessThe power of hadoop in business
The power of hadoop in business
 
Introduction to Spark
Introduction to SparkIntroduction to Spark
Introduction to Spark
 
Spark & Hadoop at Production at Scale
Spark & Hadoop at Production at ScaleSpark & Hadoop at Production at Scale
Spark & Hadoop at Production at Scale
 
Hortonworks sqrrl webinar v5.pptx
Hortonworks sqrrl webinar v5.pptxHortonworks sqrrl webinar v5.pptx
Hortonworks sqrrl webinar v5.pptx
 
Hadoop: Past, Present and Future - v2.1 - SQLSaturday #340
Hadoop: Past, Present and Future - v2.1 - SQLSaturday #340Hadoop: Past, Present and Future - v2.1 - SQLSaturday #340
Hadoop: Past, Present and Future - v2.1 - SQLSaturday #340
 
Ted Dunning-Faster and Furiouser- Flink Drift
Ted Dunning-Faster and Furiouser- Flink DriftTed Dunning-Faster and Furiouser- Flink Drift
Ted Dunning-Faster and Furiouser- Flink Drift
 
TriHUG Feb: Hive on spark
TriHUG Feb: Hive on sparkTriHUG Feb: Hive on spark
TriHUG Feb: Hive on spark
 

Mais de Allen Day, PhD

Hadoop and Genomics - What you need to know - Cambridge - Sanger Center and EBI
Hadoop and Genomics - What you need to know - Cambridge - Sanger Center and EBIHadoop and Genomics - What you need to know - Cambridge - Sanger Center and EBI
Hadoop and Genomics - What you need to know - Cambridge - Sanger Center and EBI
Allen Day, PhD
 
20140228 - Singapore - BDAS - Ensuring Hadoop Production Success
20140228 - Singapore - BDAS - Ensuring Hadoop Production Success20140228 - Singapore - BDAS - Ensuring Hadoop Production Success
20140228 - Singapore - BDAS - Ensuring Hadoop Production Success
Allen Day, PhD
 

Mais de Allen Day, PhD (20)

Deep learning in medicine: An introduction and applications to next-generatio...
Deep learning in medicine: An introduction and applications to next-generatio...Deep learning in medicine: An introduction and applications to next-generatio...
Deep learning in medicine: An introduction and applications to next-generatio...
 
20170428 - Look to Precision Agriculture to Bootstrap Precision Medicine - Cu...
20170428 - Look to Precision Agriculture to Bootstrap Precision Medicine - Cu...20170428 - Look to Precision Agriculture to Bootstrap Precision Medicine - Cu...
20170428 - Look to Precision Agriculture to Bootstrap Precision Medicine - Cu...
 
20170426 - Deep Learning Applications in Genomics - Vancouver - Simon Fraser ...
20170426 - Deep Learning Applications in Genomics - Vancouver - Simon Fraser ...20170426 - Deep Learning Applications in Genomics - Vancouver - Simon Fraser ...
20170426 - Deep Learning Applications in Genomics - Vancouver - Simon Fraser ...
 
20170424 - Big Data in Biology - Vancouver - Simon Fraser University
20170424 - Big Data in Biology - Vancouver - Simon Fraser University20170424 - Big Data in Biology - Vancouver - Simon Fraser University
20170424 - Big Data in Biology - Vancouver - Simon Fraser University
 
20170406 Genomics@Google - KeyGene - Wageningen
20170406 Genomics@Google - KeyGene - Wageningen20170406 Genomics@Google - KeyGene - Wageningen
20170406 Genomics@Google - KeyGene - Wageningen
 
20170402 Crop Innovation and Business - Amsterdam
20170402 Crop Innovation and Business - Amsterdam20170402 Crop Innovation and Business - Amsterdam
20170402 Crop Innovation and Business - Amsterdam
 
20170315 Cloud Accelerated Genomics - Tel Aviv / Phoenix
20170315 Cloud Accelerated Genomics - Tel Aviv / Phoenix20170315 Cloud Accelerated Genomics - Tel Aviv / Phoenix
20170315 Cloud Accelerated Genomics - Tel Aviv / Phoenix
 
Genome Analysis Pipelines with Spark and ADAM
Genome Analysis Pipelines with Spark and ADAMGenome Analysis Pipelines with Spark and ADAM
Genome Analysis Pipelines with Spark and ADAM
 
Hadoop and Genomics - What you need to know - 2015.04.09 - Shenzhen - BGI
Hadoop and Genomics - What you need to know - 2015.04.09 - Shenzhen - BGIHadoop and Genomics - What you need to know - 2015.04.09 - Shenzhen - BGI
Hadoop and Genomics - What you need to know - 2015.04.09 - Shenzhen - BGI
 
Hadoop and Genomics - What you need to know - Cambridge - Sanger Center and EBI
Hadoop and Genomics - What you need to know - Cambridge - Sanger Center and EBIHadoop and Genomics - What you need to know - Cambridge - Sanger Center and EBI
Hadoop and Genomics - What you need to know - Cambridge - Sanger Center and EBI
 
Hadoop and Genomics - What You Need to Know - London - Viadex RCC - 2015.03.17
Hadoop and Genomics - What You Need to Know - London - Viadex RCC - 2015.03.17Hadoop and Genomics - What You Need to Know - London - Viadex RCC - 2015.03.17
Hadoop and Genomics - What You Need to Know - London - Viadex RCC - 2015.03.17
 
Hadoop as a Platform for Genomics - Strata 2015, San Jose
Hadoop as a Platform for Genomics - Strata 2015, San JoseHadoop as a Platform for Genomics - Strata 2015, San Jose
Hadoop as a Platform for Genomics - Strata 2015, San Jose
 
Genomics isn't Special
Genomics isn't SpecialGenomics isn't Special
Genomics isn't Special
 
Renaissance in Medicine - Strata - NoSQL and Genomics
Renaissance in Medicine - Strata - NoSQL and GenomicsRenaissance in Medicine - Strata - NoSQL and Genomics
Renaissance in Medicine - Strata - NoSQL and Genomics
 
2014.06.30 - Renaissance in Medicine - Singapore Management University - Data...
2014.06.30 - Renaissance in Medicine - Singapore Management University - Data...2014.06.30 - Renaissance in Medicine - Singapore Management University - Data...
2014.06.30 - Renaissance in Medicine - Singapore Management University - Data...
 
Human Genetics & Big Data [sans Ethics]
Human Genetics & Big Data [sans Ethics]Human Genetics & Big Data [sans Ethics]
Human Genetics & Big Data [sans Ethics]
 
Building Data Science Teams, Abbreviated
Building Data Science Teams, AbbreviatedBuilding Data Science Teams, Abbreviated
Building Data Science Teams, Abbreviated
 
Genomics Crash Course for Data Engineers
Genomics Crash Course for Data EngineersGenomics Crash Course for Data Engineers
Genomics Crash Course for Data Engineers
 
20140228 - Singapore - BDAS - Ensuring Hadoop Production Success
20140228 - Singapore - BDAS - Ensuring Hadoop Production Success20140228 - Singapore - BDAS - Ensuring Hadoop Production Success
20140228 - Singapore - BDAS - Ensuring Hadoop Production Success
 
20131212 - Sydney - Garvan Institute - Human Genetics and Big Data
20131212 - Sydney - Garvan Institute - Human Genetics and Big Data20131212 - Sydney - Garvan Institute - Human Genetics and Big Data
20131212 - Sydney - Garvan Institute - Human Genetics and Big Data
 

Último

+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
Health
 
Asli Kala jadu, Black magic specialist in Pakistan Or Kala jadu expert in Egy...
Asli Kala jadu, Black magic specialist in Pakistan Or Kala jadu expert in Egy...Asli Kala jadu, Black magic specialist in Pakistan Or Kala jadu expert in Egy...
Asli Kala jadu, Black magic specialist in Pakistan Or Kala jadu expert in Egy...
baharayali
 

Último (20)

Sector 62, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 62, Noida Call girls :8448380779 Model Escorts | 100% verifiedSector 62, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 62, Noida Call girls :8448380779 Model Escorts | 100% verified
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
Technical Data | Sig Sauer Easy6 BDX 1-6x24 | Optics Trade
Technical Data | Sig Sauer Easy6 BDX 1-6x24 | Optics TradeTechnical Data | Sig Sauer Easy6 BDX 1-6x24 | Optics Trade
Technical Data | Sig Sauer Easy6 BDX 1-6x24 | Optics Trade
 
UEFA Euro 2024 Squad Check-in Who is Most Favorite.docx
UEFA Euro 2024 Squad Check-in Who is Most Favorite.docxUEFA Euro 2024 Squad Check-in Who is Most Favorite.docx
UEFA Euro 2024 Squad Check-in Who is Most Favorite.docx
 
Croatia vs Italy Euro Cup 2024 Three pitfalls for Spalletti’s Italy in Group ...
Croatia vs Italy Euro Cup 2024 Three pitfalls for Spalletti’s Italy in Group ...Croatia vs Italy Euro Cup 2024 Three pitfalls for Spalletti’s Italy in Group ...
Croatia vs Italy Euro Cup 2024 Three pitfalls for Spalletti’s Italy in Group ...
 
JORNADA 6 LIGA MURO 2024TUXTEPECOAXACA.pdf
JORNADA 6 LIGA MURO 2024TUXTEPECOAXACA.pdfJORNADA 6 LIGA MURO 2024TUXTEPECOAXACA.pdf
JORNADA 6 LIGA MURO 2024TUXTEPECOAXACA.pdf
 
Netherlands Players expected to miss UEFA Euro 2024 due to injury.docx
Netherlands Players expected to miss UEFA Euro 2024 due to injury.docxNetherlands Players expected to miss UEFA Euro 2024 due to injury.docx
Netherlands Players expected to miss UEFA Euro 2024 due to injury.docx
 
Personal Brand Exploration - By Bradley Dennis
Personal Brand Exploration - By Bradley DennisPersonal Brand Exploration - By Bradley Dennis
Personal Brand Exploration - By Bradley Dennis
 
Albania Vs Spain South American coaches lead Albania to Euro 2024 spot.docx
Albania Vs Spain South American coaches lead Albania to Euro 2024 spot.docxAlbania Vs Spain South American coaches lead Albania to Euro 2024 spot.docx
Albania Vs Spain South American coaches lead Albania to Euro 2024 spot.docx
 
Italy vs Albania Italy Euro 2024 squad Luciano Spalletti's full team ahead of...
Italy vs Albania Italy Euro 2024 squad Luciano Spalletti's full team ahead of...Italy vs Albania Italy Euro 2024 squad Luciano Spalletti's full team ahead of...
Italy vs Albania Italy Euro 2024 squad Luciano Spalletti's full team ahead of...
 
WhatsApp Chat: 📞 8617697112 Birbhum Call Girl available for hotel room package
WhatsApp Chat: 📞 8617697112 Birbhum  Call Girl available for hotel room packageWhatsApp Chat: 📞 8617697112 Birbhum  Call Girl available for hotel room package
WhatsApp Chat: 📞 8617697112 Birbhum Call Girl available for hotel room package
 
Hire 💕 8617697112 Kasauli Call Girls Service Call Girls Agency
Hire 💕 8617697112 Kasauli Call Girls Service Call Girls AgencyHire 💕 8617697112 Kasauli Call Girls Service Call Girls Agency
Hire 💕 8617697112 Kasauli Call Girls Service Call Girls Agency
 
Cricket Api Solution.pdfCricket Api Solution.pdf
Cricket Api Solution.pdfCricket Api Solution.pdfCricket Api Solution.pdfCricket Api Solution.pdf
Cricket Api Solution.pdfCricket Api Solution.pdf
 
European Football Icons that Missed Opportunities at UEFA Euro 2024.docx
European Football Icons that Missed Opportunities at UEFA Euro 2024.docxEuropean Football Icons that Missed Opportunities at UEFA Euro 2024.docx
European Football Icons that Missed Opportunities at UEFA Euro 2024.docx
 
Asli Kala jadu, Black magic specialist in Pakistan Or Kala jadu expert in Egy...
Asli Kala jadu, Black magic specialist in Pakistan Or Kala jadu expert in Egy...Asli Kala jadu, Black magic specialist in Pakistan Or Kala jadu expert in Egy...
Asli Kala jadu, Black magic specialist in Pakistan Or Kala jadu expert in Egy...
 
Who Is Emmanuel Katto Uganda? His Career, personal life etc.
Who Is Emmanuel Katto Uganda? His Career, personal life etc.Who Is Emmanuel Katto Uganda? His Career, personal life etc.
Who Is Emmanuel Katto Uganda? His Career, personal life etc.
 
Spain Vs Italy Spain to be banned from participating in Euro 2024.docx
Spain Vs Italy Spain to be banned from participating in Euro 2024.docxSpain Vs Italy Spain to be banned from participating in Euro 2024.docx
Spain Vs Italy Spain to be banned from participating in Euro 2024.docx
 
Sports Writing (Rules,Tips, Examples, etc)
Sports Writing (Rules,Tips, Examples, etc)Sports Writing (Rules,Tips, Examples, etc)
Sports Writing (Rules,Tips, Examples, etc)
 
Unveiling the Mystery of Main Bazar Chart
Unveiling the Mystery of Main Bazar ChartUnveiling the Mystery of Main Bazar Chart
Unveiling the Mystery of Main Bazar Chart
 
Muzaffarpur Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Muzaffarpur Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMuzaffarpur Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Muzaffarpur Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 

R + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San Jose

  • 1. © 2014 MapR Technologies 1 Talk Overview • Agile Real-time Stats • R + Storm github.com/allenday/R-Storm • DEMO • How to do it? • Q & A @allenday Agile Methods Advanced Statistics Continuous Real-time Delivery github.com/allenday/hadoop-summit-r-storm-demo-public
  • 2. © 2014 MapR Technologies 2© 2014 MapR Technologies Architecting R into the Storm Application Development Process
  • 3. © 2014 MapR Technologies 3 Allen (me) and Sungwook @ MapR • Allen Day, Principal Data Scientist [ @allenday ] 7yr Hadoop dev, 12yr R dev/author PhD, Human Genetics, UCLA Medicine • Sungwook Yoon, Data Scientist Spark & Security Expert PhD, Computer Engineering, Purdue • MapR [ @mapr ] Distributes open source components for Hadoop Adds major technology for performance, HA, industry standard APIs
  • 4. © 2014 MapR Technologies 4 What’s Storm? What’s R? • What’s Storm? – Processes a data stream. Akin to UNIX pipe + tee & merge commands – Runs on a cluster. Fault-tolerant and designed to scale out – Used for: real-time analytics & machine learning • What’s R? – Programming language with advanced statistics libraries – Does not scale out. Can scale up – Used for: prototyping, data modeling, visualization How to combine these?
  • 5. © 2014 MapR Technologies 5 R outside, Storm inside: not practical. Why? • Model-building and QA is done on data snapshots • However, R => Hadoop is realistic. Key difference: referenced data can be static – Use MapR snapshots for dev and QA – See also: RHIPE (Purdue) and RHadoop (RevolutionAnalytics) R Storm User
  • 6. © 2014 MapR Technologies 6 Storm outside, R inside: a good fit • Enables separation of concerns – Independently manage modeling, ops timelines, and version control – Integrate as needed • Enables role specialization – R built-ins allow faster iteration and more concise stats-type code – Do DevOps with specific SW engineering tech, e.g. Java Storm R User
  • 7. © 2014 MapR Technologies 7© 2014 MapR Technologies Q: Who really likes statistics? A: Baseball fans A: Team Managers = Portfolio Managers
  • 8. © 2014 MapR Technologies 8
  • 9. © 2014 MapR Technologies 9 Fresh Local Data Tonight!
  • 10. © 2014 MapR Technologies 10 Famous Vintage Data Oakland Athletics 2002 Season 20 consecutive wins – the current record Obligatory movie ref… I’m from LA LET’S GO DODGERS!
  • 11. © 2014 MapR Technologies 11© 2014 MapR Technologies Goal: Detect “Moneyball” 2002 Winning Streak
  • 12. © 2014 MapR Technologies 12 Methods: Change Point Detection Find natural breakpoints in a time-series set of data points R packages implement this: changepoint: more sensitve, but not streaming bcp: streaming, but less sensitive
  • 13. © 2014 MapR Technologies 13 GIFs to MapR Filesystem Methods: R+Storm Demo Architecture Storm Bolt R online change point detector Storm Bolt (write to Jetty) Oakland A’s Data (accelerated) Jetty Webserver Browser (D3.js) Us  github.com/allenday/hadoop-summit-r-storm-demo-public
  • 14. © 2014 MapR Technologies 14© 2014 MapR Technologies 50-game sliding window/buffer to detect change points Cumulative history with detected break points Raw data (score difference between A’s and opponent) Demo
  • 15. © 2014 MapR Technologies 15 Methods Details: How it’s done • Uses R-Storm binding github.com/allenday/R-Storm – Storm package on CRAN cran.r-project.org/web/packages/Storm Storm (dev team) R (stats team) Storm (dev team, pure Java) Producer Consumer
  • 16. © 2014 MapR Technologies 16 Methods Details: Easy integration R: lambda function storm = Storm$new(); storm$lambda = function(s) { t = s$tuple; t$output = vector(length=1); t$output[1] = “tada!” s$emit(t) } Storm: extend ShellBolt public static class MyRBolt extends ShellBolt implements IRichBolt { public RBolt() { super("Rscript", ”my.R"); } }
  • 17. © 2014 MapR Technologies 17 Results • Change points are identified, but none for winning streak – Not using score difference, anyway • Time to integrate with the modeling team! – Send @kunpognr or @allenday a pull request on GitHub • Applicable to many other use cases, e.g. – Security (fraud detection, intrusion detection) – Marketing (intent to purchase / social media streams) – Customer Support (help desk voice calls) Discussion
  • 18. © 2014 MapR Technologies 18 Q&A @allenday maprtech allenday@mapr.com Engage with us! MapR maprtech linkedin.com/in/allenday

Notas do Editor

  1. FILL IN RED WITH CORRECT DETAILS
  2. FILL IN RED WITH CORRECT DETAILS