Hadoop and R Go to the Movies
- 2. © 2014 MapR Technologies 2
Agenda
A sample problem
A general approach
Complications arise
Light is cast on the villains
Who flee from the scene
- 3. © 2014 MapR Technologies 3
Agenda Script
A sample problem
A general approach
Complications arise
Light is cast on the villains
Who flee from the scene
- 4. © 2014 MapR Technologies 4
Model Building in a Nutshell
Gather data → Build models → Predict future
Outcomes: World domination! / Fight fraud / Save the planet ✔
- 6. © 2014 MapR Technologies 6
Modeling Energy Use
• Modeling office and home energy use can save energy
• Guides retrofits
• Finds bad leaks
• Increases awareness and understanding of problems
• Demonstrated results of 20% or more savings
• Savings = less CO2 = less planet warming
- 7. © 2014 MapR Technologies 7
Modeling Energy Use
See ASHRAE RP-1050
http://bit.ly/1ovwGfy
- 8. © 2014 MapR Technologies 8
Modeling Energy Use (or not)
- 9. © 2014 MapR Technologies 9
Modeling Energy Use (complete hash)
- 10. © 2014 MapR Technologies 10
Some Notes on the Method
• Can’t change the method, since this is an ASHRAE standard
• Small changes in a cutoff can have a ragged effect on model fit
– Linear methods out of the question
– Gradient based methods find local minima
• All parameters interact strongly
– Can’t solve for one at a time
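To make the notes above concrete, here is a minimal sketch of a five-parameter change-point model and its fit error. The functional form (a flat baseline with a heating wing below one cut-point and a cooling wing above another) is the commonly cited 5P form and is an assumption here, since the slides do not spell it out. All function and parameter names are hypothetical.

```python
def five_param_model(temp, baseline, heat_slope, cool_slope, heat_cut, cool_cut):
    """Predicted energy use at outdoor temperature `temp` (assumed 5P form)."""
    # Heating wing: energy rises as temperature falls below heat_cut.
    heating = heat_slope * max(0.0, heat_cut - temp)
    # Cooling wing: energy rises as temperature climbs above cool_cut.
    cooling = cool_slope * max(0.0, temp - cool_cut)
    return baseline + heating + cooling


def sum_squared_error(params, temps, usage):
    """Fit error for one candidate; lower is better."""
    return sum(
        (five_param_model(t, *params) - u) ** 2 for t, u in zip(temps, usage)
    )


# Illustrative (made-up) readings, not data from the talk.
temps = [0.0, 10.0, 20.0, 30.0]
usage = [35.0, 15.0, 5.0, 20.0]
error = sum_squared_error((5.0, 2.0, 3.0, 15.0, 25.0), temps, usage)
```

Because of the `max(0, ...)` terms, moving a cut-point changes which readings fall on which wing. That is one way to see why the error surface is ragged and why the slopes and cut-points cannot be tuned independently.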
- 11. © 2014 MapR Technologies 11
Evolutionary Algorithms
• Basic algorithm:
    fill population with random solutions
    do {
        keep best x% of solutions
        mutate survivors to fill population
    } until happy with results
• Works great
• Converges very slowly
– If mutation is small, takes many, many steps to find best, gets trapped
– If mutation is too big, keeps jumping away from optimum
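The basic loop above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the speaker's implementation: fitness is an error to minimize, "until happy" is replaced by a fixed generation count, mutation is Gaussian noise of one fixed size, and every name and default value is hypothetical.

```python
import random


def evolve(fitness, dim, pop_size=100, keep_fraction=0.2,
           mutation_size=0.1, generations=50, seed=0):
    """Fixed-mutation evolutionary search; returns the best candidate found."""
    rng = random.Random(seed)
    # Fill the population with random solutions.
    population = [[rng.uniform(-10.0, 10.0) for _ in range(dim)]
                  for _ in range(pop_size)]
    n_keep = max(1, int(pop_size * keep_fraction))
    for _ in range(generations):
        # Keep the best x% of solutions (lowest error first).
        survivors = sorted(population, key=fitness)[:n_keep]
        # Mutate survivors to refill the population.
        children = []
        while len(survivors) + len(children) < pop_size:
            parent = rng.choice(survivors)
            children.append([x + rng.gauss(0.0, mutation_size) for x in parent])
        population = survivors + children
    return min(population, key=fitness)


def sphere(point):
    """Toy error surface with its minimum at the origin."""
    return sum(x * x for x in point)


best = evolve(sphere, dim=2)
```

The single `mutation_size` knob is the weak point described on the slide: a small value crawls and can get trapped, and a large value keeps overshooting the optimum.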
- 12. © 2014 MapR Technologies 12
Doesn’t work in practice
- 13. © 2014 MapR Technologies 13
Meta-Evolutionary Algorithms
• Meta mutation algorithm:
    fill population with random solutions
    do {
        keep best x% of solutions
        mutate survivors to fill population
        use mutation size to set mutation rate per candidate
    } until happy with results
• Works great
• Converges very fast
– If small jump works, we get more of that
– If big jump works, we get more of that
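One way to read the meta-mutation step is that each candidate carries its own mutation rate, and a child inherits a rate equal to the size of the jump that produced it. The sketch below follows that reading. The exact update rule is an assumption and not necessarily the one used in the talk, and all names and defaults are hypothetical.

```python
import math
import random


def meta_evolve(fitness, dim, pop_size=100, keep_fraction=0.2,
                generations=50, seed=0):
    """Self-adapting search; returns (best_params, its_mutation_rate)."""
    rng = random.Random(seed)
    # Each candidate is a (params, mutation_rate) pair.
    population = [([rng.uniform(-10.0, 10.0) for _ in range(dim)], 1.0)
                  for _ in range(pop_size)]
    n_keep = max(1, int(pop_size * keep_fraction))
    for _ in range(generations):
        survivors = sorted(population, key=lambda c: fitness(c[0]))[:n_keep]
        children = []
        while len(survivors) + len(children) < pop_size:
            params, rate = rng.choice(survivors)
            # Draw a jump scale around the parent's rate (assumed rule):
            # sometimes much smaller, sometimes several times larger.
            scale = rate * rng.expovariate(1.0)
            step = [rng.gauss(0.0, scale) for _ in range(dim)]
            # The child's rate is the RMS size of the jump it just took.
            step_size = math.sqrt(sum(s * s for s in step) / dim)
            child_rate = max(step_size, 1e-12)  # floor avoids a frozen rate
            children.append(([p + s for p, s in zip(params, step)], child_rate))
        population = survivors + children
    return min(population, key=lambda c: fitness(c[0]))


def sphere(point):
    """Toy error surface with its minimum at the origin."""
    return sum(x * x for x in point)


best_params, best_rate = meta_evolve(sphere, dim=2)
```

Children that survive took jumps of a useful size, so their descendants keep jumping at about that size. This is the "we get more of that" effect, and no global mutation knob is needed.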
- 15. © 2014 MapR Technologies 15
Meta-Evolutionary Algorithms
• Algorithm may go wrong way
• May take wrong-size steps
• But it quickly learns to correct
• Bad strategies die out along with bad solutions
- 16. © 2014 MapR Technologies 16
But There’s a Rub
• This new algorithm may be gangbusters
– But it comes with new knobs to turn
• How can we tell where to turn them?
• How do we make sense of a seething mass of 5-dimensional spiders?
- 18. © 2014 MapR Technologies 18
Demo Reel Synopsis
• Constant mutation rate failure example
• Meta-mutation succeeds
• Meta-mutation can handle highly correlated narrow valleys
• Very complex landscapes can be navigated
• Strategy shifts fluidly to find solutions
- 20. © 2014 MapR Technologies 20
Not quite that simple
• Current problem is 5-dimensional
• Problem parameters don’t make sense directly
• So we need to show the human face of the problem (that is where we started!)
• We also need dynamics to understand how the algorithm gets where it goes
- 21. © 2014 MapR Technologies 21
Main-line Model and Visualization Flow
[Diagram: two flows, labeled Conventional and Scalable. Components shown: data repo, grep, Solver, JSON model, d3 + twistd.]
- 25. © 2014 MapR Technologies 25
Diagnostic Visualizations
[Diagram: the Scalable flow. Components shown: Solver, JSON model, Logs, ScaleR, ffmpeg.]
- 26. © 2014 MapR Technologies 26
Of Note
• RevoScaleR solves most of the parallelism issues
• We still want to run arbitrary R
• Some legacy functions are particularly unfriendly to HDFS
– png(filename) – requires conventional file access
– system(command) – assumes conventional file access
– ffmpeg(1) – assumes conventional file access
- 27. © 2014 MapR Technologies 27
Simple Solution
• MapR provides both HDFS and NFS access to the cluster
• All path names are the same either way
• MapReduce programs can use legacy POSIX code
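As a rough illustration of the point above, the sketch below builds ordinary file paths and an ffmpeg command line against a directory on an NFS mount of the cluster. Python is used only for illustration (the slides name R's png() and system()). The mount point `/mapr/my.cluster.com/...`, the frame naming scheme, and the helper names are all assumptions.

```python
# Hypothetical directory on the cluster, reached through an NFS mount.
FRAME_DIR = "/mapr/my.cluster.com/user/demo/frames"


def frame_path(frame_dir, index):
    """Plain POSIX path for one rendered frame, e.g. .../frame_0007.png."""
    return f"{frame_dir}/frame_{index:04d}.png"


def ffmpeg_command(frame_dir, output_name, fps=10):
    """Argument list that stitches the numbered frames into one video."""
    return [
        "ffmpeg", "-y",                       # overwrite existing output
        "-framerate", str(fps),               # input frame rate
        "-i", f"{frame_dir}/frame_%04d.png",  # numbered image sequence
        "-pix_fmt", "yuv420p",                # widely playable pixel format
        f"{frame_dir}/{output_name}",
    ]


command = ffmpeg_command(FRAME_DIR, "diagnostic.mp4")
# On a machine with the mount and ffmpeg installed, this could be run with
# subprocess.run(command, check=True).
```

Nothing in the sketch is Hadoop-specific. Code that only knows how to open a file by name keeps working when the same path is visible through NFS.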
- 28. © 2014 MapR Technologies 28
Diagnostic Videos
• 5D x 100 can get trapped in local minimum
– ’470 example
• 5D x 500 avoids trapping issues
– ’470 quiescence and resurgence
• 3D x 500 and 3D x 100 also avoid trapping
• Need to distinguish empty house from occupied
– ’771 shows poor fit to either regime, a classic real-world issue
- 29. © 2014 MapR Technologies 29
Lessons I Learned by Watching Movies
• Lower dimensional problems are easier
– Evolve baseline level and cut-points, solve for wing slopes
– Hybrid solutions are not “cheating”
• Real-world data always has surprises and I am always surprised by this
• Can use 5P models as cluster “centroids” to handle 2-state homes
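The hybrid idea in the first lesson can be sketched as follows. Once the baseline and the two cut-points are fixed by the evolutionary search, the two wing slopes enter the model linearly, so ordinary least squares can solve for them directly. This assumes the common 5P form (baseline plus a heating wing and a cooling wing). The function name and the sample data are hypothetical.

```python
def solve_wing_slopes(temps, usage, baseline, heat_cut, cool_cut):
    """Least-squares heating and cooling slopes for fixed baseline and cuts."""
    # Regressors: distance below the heating cut and above the cooling cut.
    h = [max(0.0, heat_cut - t) for t in temps]
    c = [max(0.0, t - cool_cut) for t in temps]
    r = [u - baseline for u in usage]  # usage left after removing baseline
    # Build the 2x2 normal equations.
    hh = sum(x * x for x in h)
    cc = sum(x * x for x in c)
    hc = sum(x * y for x, y in zip(h, c))
    hr = sum(x * y for x, y in zip(h, r))
    cr = sum(x * y for x, y in zip(c, r))
    det = hh * cc - hc * hc
    if det == 0.0:
        raise ValueError("not enough readings on both wings to fit slopes")
    heat_slope = (hr * cc - cr * hc) / det
    cool_slope = (cr * hh - hr * hc) / det
    return heat_slope, cool_slope


# Noise-free illustrative readings generated with slopes 2 and 3.
temps = [0.0, 5.0, 10.0, 15.0, 20.0, 25.0, 30.0, 35.0]
usage = [35.0, 25.0, 15.0, 5.0, 5.0, 5.0, 20.0, 35.0]
slopes = solve_wing_slopes(temps, usage, baseline=5.0,
                           heat_cut=15.0, cool_cut=25.0)
```

With the slopes solved in closed form, the search only has to evolve three parameters instead of five, which is the lower-dimensional problem the slide recommends.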
- 30. © 2014 MapR Technologies 30
And there’s a PRIZE in every box!