3. BigML, Inc #MLSEV: ML a Technical Perspective
Sampling the Audience
Expert: Published papers at KDD, ICML, NIPS, etc., or
developed own ML algorithms used at large scale
Aficionado: Understands pros/cons of different
techniques and/or can tweak algorithms as needed
Practitioner: Very familiar with ML packages (Weka,
scikit-learn, BigML, etc.)
Newbie: Just taking the Coursera ML class or reading an
introductory book on ML
Absolute beginner: ML sounds like science fiction
What is Machine Learning?
Let’s start with what is NOT Machine Learning…
• Sentience
• Killer robots
• Generalized Artificial Intelligence
• Anything to do with the word “singularity”
Oh the Hype!
AlphaGo Zero masters Go at a superhuman level… are killer robots far off?
• First of all, AlphaGo Zero is impressive!
• But there is no need to fear killer robots powered by AlphaGo Zero:
• Learning is not transferable: it must be retrained for chess, etc.
• Works only for rule-based systems with a perfect simulator
• Relies on games/systems with clear objectives (win/lose)
• Cost $25 million [1]
“While AlphaGo Zero is a step towards a general-purpose AI, it can only work on
problems that can be perfectly simulated in a computer, making tasks such as
driving a car out of the question. AIs that match humans at a huge range of
tasks are still a long way off” - Demis Hassabis, CEO of DeepMind [2]
[1] https://www.inc.com/lisa-calhoun/google-artificial-intelligence-alpha-go-zero-just-pressed-reset-on-how-we-learn.html
[2] https://www.theguardian.com/science/2017/oct/18/its-able-to-create-knowledge-itself-google-unveils-ai-learns-all-on-its-own
Three Domains
• Artificial Intelligence: cool/scary things… that mostly don’t exist
• Machine Learning: AI concepts applied to very specific problems
• Deep Learning: specific techniques of Machine Learning
What is Machine Learning?
Let’s start with what is NOT Machine Learning…
• Sentience
• Killer robots
• Generalized Artificial Intelligence
• Anything to do with the word “singularity”
• Something “new”
• First International Conference on ML held in 1980
• Top-performing algorithms have been around for decades
How do these things relate?
What is Machine Learning?
A practical definition… finding patterns in data that can be used to make inferences (Predictive Models)
AIRLINE  ORIGIN  DESTINATION  DEPARTURE DELAY (min)  DISTANCE (mi)  ARRIVAL DELAY (min)
AS       ANC     SEA          -11                    1448           -22
AA       LAX     PBI          -8                     2330           -9
US       SFO     CLT          -2                     2296           5
AA       LAX     MIA          -5                     2342           -9
AS       SEA     ANC          -1                     1448           -21
DL       SFO     MSP          -5                     1589           8
NK       LAS     MSP          -6                     1299           -17
US       LAX     CLT          14                     2125           -10
AA       SFO     DFW          -11                    1464           -13
DL       LAS     ATL          3                      1747           -15
Machine Learning Terminology
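• Data: instances (rows) described by features (columns), including the label to be predicted
• Training / Learning: an ML algorithm builds a predictive model from the data
• Predicting / Scoring: the predictive model takes a new instance and returns a prediction with a confidence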
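The terminology above can be made concrete with a toy k-nearest-neighbors classifier in plain Python. This is a minimal sketch, not BigML's algorithm: it assumes the instances are the flights from the table, the features are departure delay and distance, and the label is a hypothetical "arrived late" flag (arrival delay above zero). Scaling distance by 1/100 and choosing k = 3 are illustrative assumptions.

```python
from collections import Counter

# Instances (rows) from the flight table: (departure_delay_min, distance_mi, arrival_delay_min).
FLIGHTS = [
    (-11, 1448, -22), (-8, 2330, -9), (-2, 2296, 5), (-5, 2342, -9), (-1, 1448, -21),
    (-5, 1589, 8), (-6, 1299, -17), (14, 2125, -10), (-11, 1464, -13), (3, 1747, -15),
]

def features(departure_delay, distance):
    # Hypothetical scaling so distance (hundreds of miles) does not swamp delay (minutes).
    return (departure_delay, distance / 100.0)

def train(rows):
    """Training / learning: k-NN simply stores (features, label) pairs."""
    # Label: hypothetical "arrived late" flag, True when arrival delay is positive.
    return [(features(dep, dist), arr > 0) for dep, dist, arr in rows]

def predict(model, departure_delay, distance, k=3):
    """Predicting / scoring: majority vote of the k nearest stored instances.

    Returns (prediction, confidence), where confidence is the share of
    neighbors that agree with the prediction.
    """
    x = features(departure_delay, distance)
    by_distance = sorted(
        model,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    label, count = votes.most_common(1)[0]
    return label, count / k

model = train(FLIGHTS)
print(predict(model, -5, 1600))  # new instance: leaves 5 minutes early, flies 1600 miles
```

k-NN is used here only because its "training" step is trivially visible; any classifier maps onto the same vocabulary of instances, features, label, model, prediction, and confidence.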
Why Machine Learning
[Figure: complexity of tasks (vertical axis, - to +) over time (horizontal axis, 20th century to 21st century)]
Traditional Programming
Lost Baggage Policy
• Explicit rules defined by requirements and experience
• How do we program when the rules are unknown or
very difficult to determine?
Programming with ML
AIRLINE  ORIGIN  DESTINATION  DEPARTURE DELAY (min)  DISTANCE (mi)  ARRIVAL DELAY (min)
AS       ANC     SEA          -11                    1448           -22
AA       LAX     PBI          -8                     2330           -9
US       SFO     CLT          -2                     2296           5
AA       LAX     MIA          -5                     2342           -9
AS       SEA     ANC          -1                     1448           -21
DL       SFO     MSP          -5                     1589           8
NK       LAS     MSP          -6                     1299           -17
US       LAX     CLT          14                     2125           -10
AA       SFO     DFW          -11                    1464           -13
DL       LAS     ATL          3                      1747           -15
Want: Flight Delay Prediction
Approach: learn a Flight Delay Model from this data instead of hand-writing the rules
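"Programming with data" can be illustrated by contrasting a hand-written rule with one whose parameters are learned from the table. The sketch below is a hypothetical example, not BigML's method: it fits an ordinary least-squares line that predicts arrival delay from departure delay. The hand-written rule (arrival delay equals departure delay) is an assumption made up for comparison.

```python
def rule_based_arrival_delay(departure_delay):
    # Traditional programming: a human writes the rule (hypothetical rule).
    return float(departure_delay)

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Departure and arrival delays (minutes) from the flight table.
departure = [-11, -8, -2, -5, -1, -5, -6, 14, -11, 3]
arrival = [-22, -9, 5, -9, -21, 8, -17, -10, -13, -15]

# Programming with ML: the "rule" (slope, intercept) comes from the data.
slope, intercept = fit_line(departure, arrival)

def learned_arrival_delay(departure_delay):
    return slope * departure_delay + intercept

print(rule_based_arrival_delay(10), learned_arrival_delay(10))
```

With only ten rows the learned line is not a good predictor; the point is where the rule comes from, not its accuracy.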
What else can ML do?
Machine Learning Tasks
• Supervised: Classification and Regression
• Unsupervised: Cluster Analysis, Anomaly Detection, Association Discovery, Topic Modeling
• Time Series
Predictive Maintenance
CLASSIFICATION Will this component fail?
REGRESSION How many days until this component fails?
TIME SERIES FORECASTING How many components will fail a week from now?
CLUSTER ANALYSIS Which machines behave similarly?
ANOMALY DETECTION Is this behavior normal?
ASSOCIATION DISCOVERY What alerts are triggered together before a failure?
Personalized Music
CLASSIFICATION Will this song be a hit?
REGRESSION How many users will play this song next month?
TIME SERIES FORECASTING How many downloads will this song have in 3 months?
CLUSTER ANALYSIS Which songs are similar?
ANOMALY DETECTION Is this song being played more than normal?
ASSOCIATION DISCOVERY Which songs do people like to play together?
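Association discovery is usually scored with support (how often items appear together) and confidence (how often the consequent appears when the antecedent does). The sketch below counts song pairs across listening sessions; the session data and the 0.5 minimum support are hypothetical, and a real association engine also handles larger itemsets and other measures such as lift.

```python
from collections import Counter
from itertools import combinations

def pair_rules(sessions, min_support=0.5):
    """Return {(antecedent, consequent): (support, confidence)} for song pairs."""
    n = len(sessions)
    item_counts = Counter(song for s in sessions for song in set(s))
    pair_counts = Counter(
        pair for s in sessions for pair in combinations(sorted(set(s)), 2)
    )
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair too rare to report
        rules[(a, b)] = (support, count / item_counts[a])
        rules[(b, a)] = (support, count / item_counts[b])
    return rules

# Hypothetical listening sessions.
sessions = [["A", "B"], ["A", "B", "C"], ["A", "C"], ["B"]]
for (a, b), (support, confidence) in sorted(pair_rules(sessions).items()):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```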
Airline Revenue Management
CLASSIFICATION Will this flight be 80% booked 14 days out?
REGRESSION How many passengers will book this flight 7 days out?
TIME SERIES FORECASTING How many tickets will be cancelled this week?
CLUSTER ANALYSIS Which flight booking patterns are similar?
ANOMALY DETECTION Are these flights’ booking patterns normal?
ASSOCIATION DISCOVERY What price changes help overbook sooner?
Network Security
CLASSIFICATION Is this email part of a phishing attack?
REGRESSION How many after-hours logins will there be per week?
TIME SERIES FORECASTING What will be the number of false alarms next week?
CLUSTER ANALYSIS Are these users behaving similarly?
ANOMALY DETECTION Is this user’s behavior worth inspecting?
ASSOCIATION DISCOVERY What alerts were triggered before this attack?
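As a deliberately simple stand-in for anomaly detection, the sketch below flags a value as anomalous when it lies more than three standard deviations from the historical mean. The weekly after-hours login counts and the threshold of 3 are hypothetical; production anomaly detectors typically use richer methods that consider many features at once.

```python
import statistics

def zscore_flags(history, new_values, threshold=3.0):
    """Return (value, is_anomalous) for each new value, judged against history."""
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        # No variation in history: anything different from the mean is unusual.
        return [(v, v != mean) for v in new_values]
    return [(v, abs(v - mean) / stdev > threshold) for v in new_values]

# Hypothetical after-hours logins per week for one user.
history = [4, 5, 6, 5, 4, 6, 5, 5]
print(zscore_flags(history, [5, 7, 20]))
```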
All ML Models are WRONG
[Figure: a Deepnet, an Ensemble, a Logistic Regression, and a Decision Tree each predict TRUE or FALSE for the same patient]
At least one model is wrong… which one?
Same patient… different models… different predictions!
Insight: Need a way to measure model fitness
Evaluating Models
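[Figure: patient data is split into TRAINING and TEST sets; an ENSEMBLE is trained on the training set and makes predictions (with confidence %) on the test set; the EVALUATION scores those predictions (%)]
Stay tuned: you will see this in Evaluations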
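The evaluation workflow in the figure (split, train on one part, score on the held-out part) can be sketched in a few lines. This is a hypothetical example: the learner is a one-feature threshold rule (a decision stump) rather than an ensemble, and the synthetic data and the 80/20 split are assumptions chosen for illustration.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]  # (training, test)

def train_stump(rows):
    """Pick the threshold t maximizing training accuracy of 'x >= t means True'."""
    best_t, best_acc = None, -1.0
    for t in sorted({x for x, _ in rows}):
        acc = accuracy(t, rows)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def accuracy(t, rows):
    """Fraction of rows where the rule 'x >= t' matches the label."""
    return sum((x >= t) == label for x, label in rows) / len(rows)

# Synthetic labeled data: the true rule (unknown to the learner) is x >= 50.
data = [(x, x >= 50) for x in range(100)]
training, test = train_test_split(data)
t = train_stump(training)
print(f"threshold={t}, test accuracy={accuracy(t, test):.2f}")
```

Scoring only on rows the model never saw during training is what makes the resulting percentage an estimate of performance on new instances.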
Measuring ML Mistakes
                ACTUAL: TRUE     ACTUAL: FALSE
MODEL: TRUE     TRUE POSITIVE    FALSE POSITIVE
MODEL: FALSE    FALSE NEGATIVE   TRUE NEGATIVE
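Counting the four cells of this table from paired predictions and actual labels is straightforward. The sketch below assumes boolean predictions and labels and uses made-up example values.

```python
def confusion_counts(predicted, actual):
    """Return a dict of TP, FP, FN, TN counts from paired boolean lists."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for p, a in zip(predicted, actual):
        if p and a:
            counts["TP"] += 1      # model TRUE, actually TRUE
        elif p and not a:
            counts["FP"] += 1      # model TRUE, actually FALSE
        elif not p and a:
            counts["FN"] += 1      # model FALSE, actually TRUE
        else:
            counts["TN"] += 1      # model FALSE, actually FALSE
    return counts

# Hypothetical model outputs vs. reality for five instances.
predicted = [True, True, False, False, True]
actual = [True, False, False, True, True]
print(confusion_counts(predicted, actual))
```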
We can bend the rules a bit…
Operating Point
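[Figure: an operating-point slider running between TRUE and FALSE (100%/0% at one end, 0%/100% at the other)]
• Require less confidence before predicting TRUE: more false positives
• Require more confidence before predicting TRUE: more false negatives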
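Assuming the model outputs a confidence (or probability) for TRUE, the operating point can be treated as a threshold: predict TRUE only when that confidence reaches it. The hypothetical scores below show the trade-off as the threshold moves.

```python
def errors_at_threshold(scores, actual, threshold):
    """Count (false_positives, false_negatives) when predicting TRUE iff score >= threshold."""
    fp = sum(1 for s, a in zip(scores, actual) if s >= threshold and not a)
    fn = sum(1 for s, a in zip(scores, actual) if s < threshold and a)
    return fp, fn

# Hypothetical confidences for TRUE and the actual labels.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
actual = [True, False, True, False, True, False]
for threshold in (0.2, 0.5, 0.85):
    print(threshold, errors_at_threshold(scores, actual, threshold))
```

A low threshold trades false negatives for false positives; a high one does the reverse.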
Why would you do this?
Comparing Models
[Figure: % true positives (vertical axis) vs. % false positives (horizontal axis), showing the IDEAL MODEL, GOOD and BETTER curves, the RANDOM diagonal, two TRIVIAL MODELs, and the WORST(?) MODEL]
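Each point on a curve in this chart comes from one operating point. The sketch below sweeps every distinct score as a threshold to build these (false positive rate, true positive rate) points, usually called a ROC curve, and computes the area under the curve (AUC) with the trapezoidal rule. The scores are hypothetical. An ideal model scores 1.0, the random diagonal 0.5, and the worst model 0.0.

```python
def roc_points(scores, actual):
    """(false positive rate, true positive rate) for every distinct threshold."""
    positives = sum(actual)
    negatives = len(actual) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted TRUE
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, a in zip(scores, actual) if s >= t and a)
        fp = sum(1 for s, a in zip(scores, actual) if s >= t and not a)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the curve via the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))

actual = [True, True, False, False]
print(auc(roc_points([0.9, 0.8, 0.2, 0.1], actual)))  # ranks every positive first
print(auc(roc_points([0.1, 0.2, 0.8, 0.9], actual)))  # ranks every positive last
```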
Mistakes can be Costly
[Figure: an image "equation" (+ =) with captions FUN! and DANGER!]
Cost Functions
[Figure: % true positives vs. % false positives, with curves labeled GOOD and BETTER?]
• What is the cost of predicting cancer incorrectly?
• What is the cost of labeling a fraudulent transaction as valid?
• What is the cost of incorrectly predicting an aircraft part is safe?
• Why can’t I just have a perfect model?
[Figure: one possibility, weighing FALSE NEGATIVE COST against FALSE POSITIVE COST]
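One way to weigh false negative cost against false positive cost is to pick the operating point that minimizes total cost on labeled data. The sketch below assumes hypothetical confidence scores and a hypothetical cost per mistake; in practice the costs would come from the domain (a missed cancer, an accepted fraudulent transaction, an unsafe aircraft part).

```python
def total_cost(scores, actual, threshold, fp_cost, fn_cost):
    """Cost of all mistakes when predicting TRUE iff score >= threshold."""
    fp = sum(1 for s, a in zip(scores, actual) if s >= threshold and not a)
    fn = sum(1 for s, a in zip(scores, actual) if s < threshold and a)
    return fp * fp_cost + fn * fn_cost

def best_threshold(scores, actual, fp_cost, fn_cost):
    # Candidate operating points: each distinct score, plus "never predict TRUE".
    candidates = sorted(set(scores)) + [float("inf")]
    return min(candidates, key=lambda t: total_cost(scores, actual, t, fp_cost, fn_cost))

scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
actual = [True, False, True, False, True, False]
# Missing a TRUE is assumed 10x worse than a false alarm, then the reverse.
print(best_threshold(scores, actual, fp_cost=1, fn_cost=10))
print(best_threshold(scores, actual, fp_cost=10, fn_cost=1))
```

The same model ends up at very different operating points depending on which mistake is more expensive.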
How it Goes All Wrong
• Over-fitting
• Under-fitting
Hunting Dog Image Classifier
[Figure: training images sorted into TRUE and FALSE]
Which images are pictures of dogs that are
bred to be hunters?
Over-fitting…
“Hunting dogs are short-haired spotted puppies that lie out on the grass”
Over-fitting
A perfect model! How about some new images…
[Figure: new images sorted into TRUE and FALSE]
Over-fitting
Model: true, Reality: false
Model: false, Reality: true
• This is an example of poor generalization
• The model “fit” the training data perfectly
• But it does not generalize to new instances well
Under-fitting
Only use ear shape:
“Dogs with drop or pendant ears are hunters”
Under-fitting
An imperfect model… now we are making some
mistakes on the training data.
[Figure: training images sorted into TRUE and FALSE]
Under-fitting
• This is an example of good generalization
• The model “under-fit” the training data
• But it is generalizing to new instances better
Model: true, Reality: true
Model: false, Reality: false
Under-fitting
Model: false, Reality: true
Model: false, Reality: true
Learning Problems / Complexity
Under-fitting
• Low complexity model
• Not fitting the data very well
Over-fitting
• High complexity model
• Fitting the data too well
One way to mitigate this is with different types of models…
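The complexity trade-off can be seen with three hypothetical models on made-up one-feature data: a constant model (too simple, under-fits), a single hand-set threshold (about right), and a lookup table that memorizes the training set (too complex, over-fits). The data, the hand-set threshold of 6, and the lookup table's fallback to False are illustrative assumptions.

```python
def accuracy(model, rows):
    """Fraction of (x, label) rows the model gets right."""
    return sum(model(x) == label for x, label in rows) / len(rows)

# Made-up data: the real (hidden) pattern is "x >= 5".
training = [(x, x >= 5) for x in [1, 2, 3, 6, 7, 8, 9]]
test = [(x, x >= 5) for x in [1.5, 2.5, 6.5, 7.5, 8.5]]

# Low complexity: always predict the most common training label (under-fits).
majority = sum(label for _, label in training) * 2 >= len(training)
def constant_model(x):
    return majority

# Moderate complexity: one threshold between the classes (hand-set here).
def threshold_model(x):
    return x >= 6

# High complexity: memorize every training instance; unseen inputs fall back to False.
memory = dict(training)
def lookup_model(x):
    return memory.get(x, False)

for name, model in [("constant", constant_model), ("threshold", threshold_model), ("lookup", lookup_model)]:
    print(f"{name}: train={accuracy(model, training):.2f}, test={accuracy(model, test):.2f}")
```

The lookup table is perfect on the training data and worst on the test data, which mirrors the hunting-dog over-fitting example; the constant model makes mistakes everywhere, like the under-fit ear-shape rule taken to its extreme.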
Choosing the ML Algorithm
[Figure: choosing the ML algorithm. Horizontal axis: decreasing interpretability / better representation / longer training. Vertical axis: increasing data size/complexity. Project stages: Early Stage (rapid prototyping), Mid Stage (proven application), Late Stage (critical performance). Models shown: Single Tree Model, Logistic Regression, Decision Forest, Random Decision Forest, Boosted Trees, Deepnets]
Hard?
BigML Deepnet
• The success of a Deepnet depends on getting the right
network structure for the dataset
• But there are too many parameters:
• Nodes, layers, activation function, learning rate, etc.
• And setting them takes significant expert knowledge
• Solution: Metalearning (a good initial guess)
• Solution: Network search (try a bunch)
Automating Machine Learning
http://www.clparker.org/ml_benchmark/
Automating Machine Learning
• Each resource has several parameters that impact quality
• Number of trees, missing splits, nodes, weight
• Rather than trial and error, we can use ML to find ideal
parameters
• Why not make the model type (Decision Tree, Boosted Tree,
etc.) a parameter as well?
• Similar to Deepnet network search, but finds the optimum
machine learning algorithm and parameters for your data
automatically
Key Insight: We can solve any parameter selection
problem in a similar way.
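The idea of treating the model type as just another parameter can be sketched as a search over a combined parameter space. This sketch uses exhaustive grid search, the simplest automated baseline; the approach described above goes further by using ML to decide which candidates to try. The parameter names, values, and the `toy_evaluate` function are hypothetical stand-ins for "train this configuration and score it on held-out data".

```python
from itertools import product

# Hypothetical search space: model type is a parameter like any other.
SEARCH_SPACE = {
    "model_type": ["decision_tree", "boosted_trees", "random_decision_forest"],
    "number_of_models": [10, 50, 100],
    "missing_splits": [True, False],
}

def grid_search(evaluate, space):
    """Score every combination; return (best_params, best_score)."""
    names = list(space)
    best_params, best_score = None, float("-inf")
    for values in product(*(space[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)  # e.g. train, then evaluate on a test split
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_evaluate(params):
    # Hypothetical scoring function standing in for a real train/evaluate cycle.
    bonus = 10 if params["model_type"] == "boosted_trees" else 0
    return bonus + params["number_of_models"] / 100

print(grid_search(toy_evaluate, SEARCH_SPACE))
```

A real search would skip parameters that do not apply to a given model type (a single tree has no number of models) and would evaluate candidates on held-out data to avoid choosing an over-fit configuration.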
Fusions
Key Insight: ML algorithms each have unique
strengths and weaknesses
Single Tree: output changes abruptly
with inputs near decision boundary
Tree + Deepnet: output changes smoothly
with inputs near decision boundary
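A fusion can be as simple as averaging the probabilities that several models assign to TRUE. The sketch below uses a hypothetical one-split tree and, as a stand-in for the smooth model (the slide uses a Deepnet), a logistic curve; equal weights are an assumption. Averaging roughly halves the tree's abrupt jump at the boundary while keeping the smooth model's gradual change.

```python
import math

def tree_probability(x):
    # Hypothetical single tree: one split at x = 0, so output jumps at the boundary.
    return 0.9 if x >= 0 else 0.1

def smooth_probability(x):
    # Logistic curve standing in for a smooth model such as a Deepnet.
    return 1.0 / (1.0 + math.exp(-2.0 * x))

def fuse(models, x, weights=None):
    """Weighted average of each model's probability for TRUE."""
    weights = weights or [1.0] * len(models)
    return sum(w * m(x) for w, m in zip(weights, models)) / sum(weights)

models = [tree_probability, smooth_probability]
for x in (-0.5, -0.01, 0.0, 0.01, 0.5):
    print(x, tree_probability(x), round(fuse(models, x), 3))
```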
Fusions
Model Skills: Some ML algorithms “generally” do better
on some feature types:
• RDF (Random Decision Forests) for sparse text vectors
• LR (Logistic Regression)/Deepnets for numeric features
• Trees for categorical features
[Figure: results on Full, Numeric, and Text feature sets]
Summary
• Machine Learning is a subset of “Artificial Intelligence”
• Finds patterns in data that can be used to make inferences
• Can be thought of as “programming with data”
• Has been around for a long time (only recently practical)
• Already being used to solve real-world problems
• Caveat Emptor:
• Machine Learning mistakes are expected
• Care must be taken to address the cost of mistakes
• Automating Machine Learning
• Powerful application of ML to parameterizing ML
• Models can be fused to address specific data complexities