3. BigML, Inc 3Introduction to ML and BigML Platform
Sampling the Audience
Expert: Published papers at KDD, ICML, NIPS, etc or
developed own ML algorithms used at large scale
Aficionado: Understands pros/cons of different
techniques and/or can tweak algorithms as needed
Practitioner: Very familiar with ML packages (Weka,
Scikit, BigML, etc.)
Newbie: Just taking Coursera ML class or reading an
introductory book to ML
Absolute beginner: ML sounds like science fiction
4. BigML, Inc 4Introduction to ML and BigML Platform
What is Machine Learning?
Finding patterns in data that can be used to
make inferences and build predictive models
5. BigML, Inc 5Introduction to ML and BigML Platform
A NEW PROGRAMMING PARADIGM
[Chart: COMPLEXITY OF TASKS (- to +) versus TIME (20th century to 21st century), with BigML marked]
10. BigML, Inc 10Introduction to ML and BigML Platform
What just happened?
[Diagram: Churn Data → Model (“How many support calls?”) → Prediction: Churn=yes]
11. BigML, Inc 11Introduction to ML and BigML Platform
Some Terminology…
[Diagram labels: Training Data (Churn Data), ML Platform, ML Resource (Model), Prediction: Churn=yes]
• Modeling
• Clustering
• Anomaly Detection
• Association Discovery
“Consume” the model or “put into production”
• Dashboard
• Custom Application
• Wearable / Edge device
• Batch Process
12. BigML, Inc 12Introduction to ML and BigML Platform
A Brief History of BigML
• BigML Mission: To make Machine
Learning Beautifully Simple
• BigML Founded in Corvallis,
Oregon in 2011 - long before ML
was "cool"
• You’ve never heard of it?
• Most innovative city in the United
States!
14. BigML, Inc 14Introduction to ML and BigML Platform
BigML Platform
Web-based Frontend, Visualizations
Tools - https://bigml.com/tools
REST API - https://bigml.com/api
Distributed Machine Learning Backend: SOURCE, DATASET, MODEL, PREDICTION, EVALUATION, SAMPLE, and WHIZZML servers
Smart Infrastructure (auto-deployable, auto-scalable)
[Diagram labels: SERVERS, EVENTS, GEARMAN QUEUE, MESSAGE QUEUE, RUNQUEUE SCALER, BUSY SCALER, DESIRED TOPOLOGY, AUTO TOPOLOGY, ACTUAL TOPOLOGY, AWS COSTS]
15. BigML, Inc 15Introduction to ML and BigML Platform
BigML Platform
On-Premises
16. BigML, Inc 16Introduction to ML and BigML Platform
Machine Learning Tasks
Question | ML Resource
Based on previous loan outcomes, will this customer default on a loan? | Models / Ensembles / Deepnets / LR
How many customers will apply for a loan next month? | Models, Ensembles, Deepnets
Based on trends, how much of this product will I sell next month? | Time Series
Is the consumption of this product unusual? | Anomaly Detection
Is the behavior of these customers similar? | Clusters
Are these products purchased together? | Association Discovery
What are the thematic contents of these documents? | Topic Models
17. BigML, Inc 17Introduction to ML and BigML Platform
Lending Club Loan Lifecycle
[Diagram: loan statuses grouped as “Open” or “Closed”: Current, In Grace Period, Late (16-30 Days), Late (31-120 Days), Default, Charged Off, Fully Paid]
(if (= (field "loan_status") "Fully Paid") "good" "bad")
s3://bigml-public/csv/lc_sample.csv.gz
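For readers who do not know Flatline, the labeling rule in the expression above can be sketched in plain Python. This is only an illustration of the logic; the `loan_quality` helper and the three sample rows are hypothetical and not part of the BigML platform or the Lending Club file.

```python
def loan_quality(row):
    """Label a loan "good" only when it was fully paid, otherwise "bad"."""
    return "good" if row.get("loan_status") == "Fully Paid" else "bad"


# Hypothetical rows that carry only the field the rule looks at.
loans = [
    {"loan_status": "Fully Paid"},
    {"loan_status": "Charged Off"},
    {"loan_status": "Default"},
]

labels = [loan_quality(row) for row in loans]
print(labels)
```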
19. BigML, Inc 19Introduction to ML and BigML Platform
What just happened?
• We started with loan data from Lending Club pulled from S3
• At the Source step, we fixed some datatypes
• At the Dataset step, we used Flatline to filter and create the
loan "quality" feature
• When configuring the Model, we set some advanced settings:
• removed correlated features: int_rate
• enabled objective balancing (wait… why?)
20. BigML, Inc 20Introduction to ML and BigML Platform
Weighting
Instance  Rate  Payment  Status   Predict  Confidence
1         23 %  134      Paid     Paid     20 %
2         23 %  134      Paid     Paid     25 %
3         23 %  134      Paid     Paid     30 %
...       ...   ...      ...      ...      ...
1000      23 %  134      Paid     Paid     99.5 %
1001      23 %  134      Default  Paid     99.4 %
Problem: Default is “more important”
but occurs less often than Paid
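One common way to compensate for a rare but important class is to weight each instance inversely to its class frequency. The sketch below assumes that scheme; whether BigML's objective balancing computes its weights exactly this way is an assumption, and `balancing_weights` is a hypothetical helper.

```python
from collections import Counter


def balancing_weights(labels):
    """Weight each class so that every class has the same total weight.

    Assumption: weight = (size of largest class) / (size of this class).
    """
    counts = Counter(labels)
    largest = max(counts.values())
    return {cls: largest / n for cls, n in counts.items()}


# The situation on the slide: 1000 Paid instances and a single Default.
statuses = ["Paid"] * 1000 + ["Default"]
weights = balancing_weights(statuses)
print(weights)
```

With these weights the single Default instance counts as much as all 1000 Paid instances together.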
21. BigML, Inc 21Introduction to ML and BigML Platform
What just happened?
• We started with loan data from Lending Club pulled from S3
• At the Source step, we fixed some datatypes
• At the Dataset step, we used Flatline to filter and create the
loan "quality" feature
• When configuring the Model, we set some advanced settings:
• removed correlated features: int_rate
• enabled objective balancing
• We explored the Model to see what factors predict default.
• We deployed the Model into a voice interface to make
Predictions.
Question: Can we trust this model?
22. BigML, Inc 22Introduction to ML and BigML Platform
Evaluations
[Diagram: DATASET split into TRAIN SET and TEST SET → PREDICTIONS → METRICS]
23. BigML, Inc 23Introduction to ML and BigML Platform
Evaluation Metrics
• Imagine we have a model that predicts a person’s dominant
hand: for any individual, it predicts left or right
• Define the positive class
• This selection is arbitrary
• It is the class you are interested in!
• The negative class is the “other” class (or others)
• For this example, we choose : left
24. BigML, Inc 24Introduction to ML and BigML Platform
Evaluation Metrics
• We choose the positive class: left
• True Positive (TP)
• We predicted left and the correct answer was left
• True Negative (TN)
• We predicted right and the correct answer was right
• False Positive (FP)
• We predicted left but the correct answer was right
• False Negative (FN)
• We predicted right but the correct answer was left
25. BigML, Inc 25Introduction to ML and BigML Platform
Evaluation Metrics
True Positive: Correctly predicted the positive class
True Negative: Correctly predicted the negative class
False Positive: Incorrectly predicted the positive class
False Negative: Incorrectly predicted the negative class
26. BigML, Inc 26Introduction to ML and BigML Platform
Accuracy
$\text{Accuracy} = \frac{TP + TN}{\text{Total}}$
• “Percentage correct” - like an exam
• If Accuracy = 1 then no mistakes
• If Accuracy = 0 then all mistakes
• Intuitive but not always useful
• Watch out for unbalanced classes!
• Ex: 90% of people are right-handed and 10% are left-handed
• A silly model which always predicts right-handed is
90% accurate
27. BigML, Inc 27Introduction to ML and BigML Platform
Accuracy
Positive Class = Left, Negative Class = Right
[Figure: 10 people, none Classified as Left Handed and all Classified as Right Handed]
TP = 0, FP = 0, TN = 7, FN = 3
Accuracy = (TP + TN) / Total = 7 / 10 = 70%
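The counts in this example can be reproduced with a few lines of Python. This is a minimal sketch; `confusion_counts` is a hypothetical helper, and the ten people are encoded as three left-handers and seven right-handers to match the numbers above.

```python
def confusion_counts(actual, predicted, positive):
    """Return (TP, FP, TN, FN) for the chosen positive class."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


actual = ["left"] * 3 + ["right"] * 7
predicted = ["right"] * 10  # the "silly" model: always predicts right

tp, fp, tn, fn = confusion_counts(actual, predicted, positive="left")
accuracy = (tp + tn) / len(actual)
print(tp, fp, tn, fn, accuracy)
```

The model never finds a single left-hander, yet its accuracy is still 70%.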
28. BigML, Inc 28Introduction to ML and BigML Platform
Precision
$\text{Precision} = \frac{TP}{TP + FP}$
• “accuracy” or “purity” of positive class
• How well you did separating the positive class from the
negative class
• If Precision = 1 then no FP.
• You may have missed some left handers, but of the
ones you identified, all are left handed. No mistakes.
• If Precision = 0 then no TP
• None of the left handers you identified are actually left
handed. All mistakes.
29. BigML, Inc 29Introduction to ML and BigML Platform
Precision
Positive Class = Left, Negative Class = Right
[Figure: 10 people, 4 Classified as Left Handed and 6 Classified as Right Handed]
TP = 2, FP = 2, TN = 5, FN = 1
Precision = TP / (TP + FP) = 2 / 4 = 50%
30. BigML, Inc 30Introduction to ML and BigML Platform
Recall
$\text{Recall} = \frac{TP}{TP + FN}$
• percentage of positive class correctly identified
• A measure of how well you identified all of the positive
class examples
• If Recall = 1 then no FN → All left handers identified
• There may be FP, so precision could be <1
• If Recall = 0 then no TP → No left handers identified
31. BigML, Inc 31Introduction to ML and BigML Platform
Recall
Positive Class = Left, Negative Class = Right
[Figure: 10 people, 4 Classified as Left Handed and 6 Classified as Right Handed]
TP = 2, FP = 2, TN = 5, FN = 1
Recall = TP / (TP + FN) = 2 / 3 ≈ 67%
32. BigML, Inc 32Introduction to ML and BigML Platform
f-Measure
$f = \frac{2 \cdot \text{Recall} \cdot \text{Precision}}{\text{Recall} + \text{Precision}}$
• harmonic mean of Recall & Precision
• If f-measure = 1 then Recall == Precision == 1
• If Precision OR Recall is small then the f-measure is small
33. BigML, Inc 33Introduction to ML and BigML Platform
f-Measure
Positive Class = Left, Negative Class = Right
[Figure: 10 people, 4 Classified as Left Handed and 6 Classified as Right Handed]
R ≈ 67%, P = 50%
f ≈ 57%
34. BigML, Inc 34Introduction to ML and BigML Platform
Phi Coefficient
$\phi = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$
• Returns a value between -1 and 1
• = -1: predictions are the opposite of reality
• = 0: no correlation between predictions and reality
• = 1: predictions are always correct
35. BigML, Inc 35Introduction to ML and BigML Platform
Phi Coefficient
Positive Class = Left, Negative Class = Right
[Figure: 10 people, 4 Classified as Left Handed and 6 Classified as Right Handed]
TP = 2, FP = 2, TN = 5, FN = 1
Phi = 0.356
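The four metrics can be checked against the running example (TP = 2, FP = 2, TN = 5, FN = 1) with a short script. It is a sketch only: the function names are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here, not something the slides specify.

```python
import math


def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0


def f_measure(p, r):
    # Harmonic mean of precision and recall.
    return 2 * p * r / (p + r) if p + r else 0.0


def phi_coefficient(tp, fp, tn, fn):
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


tp, fp, tn, fn = 2, 2, 5, 1
p = precision(tp, fp)
r = recall(tp, fn)
f = f_measure(p, r)
phi = phi_coefficient(tp, fp, tn, fn)
print(round(p, 3), round(r, 3), round(f, 3), round(phi, 3))
```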
37. BigML, Inc 37Introduction to ML and BigML Platform
What just happened?
• We split the Lending Club data into training and test Datasets
• We created a Model and Evaluation
• Looking at the Accuracy, we saw that the Model appeared to be
performing well, but only because of unbalanced classes
• The resulting Model did well at predicting good loans
• But bad loans are "more important"
• We tried different weights to increase the Recall of bad loans:
• objective balancing: equal consideration
• class weights: bad = 1000, good = 1
• Finally, we explored the impact of changing the probability
threshold
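The effect of the probability threshold can be sketched as follows: a loan is predicted "bad" when the model's probability for "bad" reaches the threshold, so lowering the threshold catches more bad loans. The probabilities and labels below are made-up illustration data, and `recall_of_bad` is a hypothetical helper.

```python
def classify(prob_bad, threshold):
    """Predict "bad" when the probability of "bad" reaches the threshold."""
    return "bad" if prob_bad >= threshold else "good"


def recall_of_bad(actual, probs, threshold):
    predicted = [classify(p, threshold) for p in probs]
    tp = sum(1 for a, p in zip(actual, predicted) if a == "bad" and p == "bad")
    fn = sum(1 for a, p in zip(actual, predicted) if a == "bad" and p == "good")
    return tp / (tp + fn)


# Hypothetical test set: three bad loans, four good loans.
actual = ["bad", "bad", "bad", "good", "good", "good", "good"]
probs = [0.9, 0.4, 0.2, 0.6, 0.3, 0.1, 0.05]

recall_default = recall_of_bad(actual, probs, threshold=0.5)
recall_lowered = recall_of_bad(actual, probs, threshold=0.15)
print(recall_default, recall_lowered)
```

The price of the higher recall is more good loans flagged as bad, that is, lower precision.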
38. BigML, Inc 38Introduction to ML and BigML Platform
Evaluation
• Never evaluate with the training data!
• Many models are able to “memorize” the training data
• This will result in overly optimistic evaluations!
• If you only have one Dataset, use a train/test split
• Even a train/test split may not be enough!
• Might get a “lucky” split
• Solution is to repeat several times (formally, to cross-validate)
• Don’t forget that accuracy can be misleading!
• Mostly useless with unbalanced classes (left/right?)
• Use weighting, operating points, other tricks…
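Repeating the train/test split can be sketched as k-fold cross-validation over row indices. This is a minimal illustration using only the standard library; `k_fold_indices` is a hypothetical helper, not a BigML API.

```python
import random


def k_fold_indices(n_rows, k, seed=0):
    """Yield (train, test) index lists; every row is tested exactly once."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(n_rows=10, k=5))
for train, test in splits:
    # Train a model on `train`, evaluate it on `test`, then average the metrics.
    print(sorted(test))
```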
39. BigML, Inc 39Introduction to ML and BigML Platform
What else can we try?
• Rather than build a single model…
• Combine the output of several typically
“weaker” models into a powerful ensemble…
• How do we create unique models from the
same training dataset?
Ensembles!
40. BigML, Inc 40Introduction to ML and BigML Platform
Decision Forest
[Diagram: DATASET → SAMPLE 1-4 → MODEL 1-4 → PREDICTION 1-4 → COMBINER → PREDICTION]
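The sample, model, and combiner steps of the diagram can be sketched with bootstrap samples and a majority vote. To stay short, each "model" here is a one-feature threshold rule rather than a full decision tree; the helper names and the eight training rows are hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement."""
    return [rng.choice(rows) for _ in rows]


def train_stump(rows):
    """Pick the threshold rule with the fewest training errors."""
    best = None
    for threshold, _ in rows:
        for below, above in (("good", "bad"), ("bad", "good")):
            errors = sum(
                1 for x, y in rows if (below if x <= threshold else above) != y
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, below, above)
    _, threshold, below, above = best
    return lambda x: below if x <= threshold else above


def decision_forest(rows, n_models=5, seed=0):
    rng = random.Random(seed)
    return [train_stump(bootstrap_sample(rows, rng)) for _ in range(n_models)]


def combine(models, x):
    """Combiner: majority vote over the individual predictions."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical data: (feature value, label).
rows = [(1, "good"), (2, "good"), (3, "good"), (4, "good"),
        (7, "bad"), (8, "bad"), (9, "bad"), (10, "bad")]
forest = decision_forest(rows)
prediction_low = combine(forest, 0)
prediction_high = combine(forest, 11)
print(prediction_low, prediction_high)
```

An odd number of models avoids tied votes in a two-class problem.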
41. BigML, Inc 41Introduction to ML and BigML Platform
Random Decision Forest
[Diagram: DATASET → SAMPLE 1-4 → MODEL 1-4 → PREDICTION 1-4 → COMBINER → PREDICTION]
42. BigML, Inc 42Introduction to ML and BigML Platform
Boosting
[Diagram: Iteration 1: DATASET → MODEL 1 → PREDICTION 1 → ERROR; Iteration 2: DATASET 2 → MODEL 2 → PREDICTION 2 → ERROR; Iteration 3: DATASET 3 → MODEL 3 → PREDICTION 3 → ERROR; Iteration 4: DATASET 4 → MODEL 4; etc… The final PREDICTION is the SUM over all iterations]
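The idea of each iteration learning from the previous iteration's error, with the final prediction formed as a sum, can be sketched for a regression problem. This is a gradient-boosting-style illustration under stated assumptions: one-split regressors as the weak models, a `learning_rate` of 0.5, and made-up data. It is not BigML's implementation.

```python
def fit_stump(xs, targets):
    """Least-squares one-split regressor on a single feature.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean


def boost(xs, ys, iterations=4, learning_rate=0.5):
    models = []
    predictions = [0.0] * len(xs)
    for _ in range(iterations):
        # The next model is trained on the current errors (residuals).
        errors = [y - p for y, p in zip(ys, predictions)]
        model = fit_stump(xs, errors)
        models.append(model)
        predictions = [p + learning_rate * model(x)
                       for p, x in zip(predictions, xs)]
    # Final prediction: sum of the scaled outputs of all models.
    return lambda x: sum(learning_rate * m(x) for m in models)


def mean_squared_error(model, xs, ys):
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
mse_one = mean_squared_error(boost(xs, ys, iterations=1), xs, ys)
mse_ten = mean_squared_error(boost(xs, ys, iterations=10), xs, ys)
print(mse_one, mse_ten)
```

The training error shrinks as iterations are added, which is also why boosted models need a held-out test set to check for overfitting.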
46. BigML, Inc 46Introduction to ML and BigML Platform
Logistic Regression
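• Fits logistic function to probability of
output class
• Result is a set of coefficients,
one for each feature * class
Logistic Function: $l(x) = \frac{1}{1 + e^{-x}}$
Goal: $l(x) \to 0$ as $x \to -\infty$ ($P \approx 0$), $l(x) \to 1$ as $x \to \infty$ ($P \approx 1$), and $0 < P < 1$ in between
$f(X) = \beta_0 + \beta \cdot X = \beta_0 + \beta_1 x_1 + \cdots + \beta_i x_i$
$P(X) = \frac{1}{1 + e^{-f(X)}}$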
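The logistic function and the resulting probability can be written directly from the formulas on this slide. The coefficients and feature values below are arbitrary illustration numbers, not fitted values.

```python
import math


def logistic(x):
    """l(x) = 1 / (1 + e^(-x)); tends to 0 for very negative x, 1 for very positive x."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_probability(features, intercept, coefficients):
    """P(X) = l(f(X)) with f(X) = beta_0 + beta_1 * x_1 + ... + beta_i * x_i."""
    f = intercept + sum(b * x for b, x in zip(coefficients, features))
    return logistic(f)


# Hypothetical model with two features: f(X) = 0.5 + 1.0 * 2.0 + 2.0 * (-1.0) = 0.5
probability = predict_probability([2.0, -1.0], intercept=0.5, coefficients=[1.0, 2.0])
print(logistic(0.0), probability)
```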
53. BigML, Inc 53Introduction to ML and BigML Platform
BigML Deepnets
• The success of a Deepnet is dependent on getting the right
network structure for the dataset
• But, there are too many parameters:
• Nodes, layers, activation function, learning rate, etc…
• And setting them takes significant expert knowledge
• Solution: Metalearning (a good initial guess)
• Solution: Network search (try a bunch)
55. BigML, Inc 55Introduction to ML and BigML Platform
OptiML
• Each resource has several parameters that impact quality
• Number of trees, missing splits, nodes, weight
• Rather than trial and error, we can use ML to find ideal
parameters
• Why not make the model type, Decision Tree, Boosted Tree,
etc, a parameter as well?
• Similar to Deepnet network search, but finds the optimum
machine learning algorithm and parameters for your data
automatically
56. BigML, Inc 56Introduction to ML and BigML Platform
Fusions
• Similar to an Ensemble, but we can mix different model types
• Logistic Regression, plus a Deepnet for example
• You can also create a fusion with different training sets!
• Last week, plus last month data, etc
• Or a Fusion of OptiML models
• Combines the “best of the best”
58. BigML, Inc 58Introduction to ML and BigML Platform
What just happened?
• We applied several different classification methods to the
Lending Club training data
• Decision Forest
• Random Decision Forest: Examined effect of threshold
• Boosted Trees
• Logistic Regression
• Deepnets
• OptiML: Optimized for Recall of bad
• Fusion: Created from top OptiML models
• Then we created an evaluation of each one of the methods
using the test dataset
• We compared the evaluations using a ROC curve
59. BigML, Inc 59Introduction to ML and BigML Platform
Supervised Learning
Classification (label: action)
animal    state   …  proximity  action
tiger     hungry  …  close      run
elephant  happy   …  far        take picture
…         …       …  …          …
Regression (label: min_kmh)
animal    state   …  proximity  min_kmh
tiger     hungry  …  close      70
hippo     angry   …  far        10
…         …       …  …          …
We need different Evaluation Metrics…
61. BigML, Inc 61Introduction to ML and BigML Platform
Mean Absolute Error
[Figure: errors e1 … e7 between the data points and the model’s predictions]
$MAE = \frac{|e_1| + |e_2| + \dots + |e_n|}{n}$
62. BigML, Inc 62Introduction to ML and BigML Platform
Mean Squared Error
[Figure: errors e1 … e7 between the data points and the model’s predictions]
$MSE = \frac{(e_1)^2 + (e_2)^2 + \dots + (e_n)^2}{n}$
63. BigML, Inc 63Introduction to ML and BigML Platform
MSE versus MAE
• For both MAE & MSE: Smaller is better, but values are
unbounded
• The square root of MSE (RMSE) is always larger than or equal to MAE; MSE itself can be smaller than MAE when errors are below 1
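A quick numeric check shows how the two metrics relate. The error lists are made-up illustration values. Because squaring shrinks numbers below 1 and inflates numbers above 1, MSE exceeds MAE for large errors but falls below it for small ones, while the square root of MSE is never smaller than MAE.

```python
import math


def mae(errors):
    return sum(abs(e) for e in errors) / len(errors)


def mse(errors):
    return sum(e * e for e in errors) / len(errors)


small_errors = [0.5, -0.5, 0.25]   # all below 1 in magnitude
large_errors = [3.0, -4.0, 5.0]    # all above 1 in magnitude

for errors in (small_errors, large_errors):
    print(mae(errors), mse(errors), math.sqrt(mse(errors)))
```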
66. BigML, Inc 66Introduction to ML and BigML Platform
R-Squared Error
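[Figure: errors e1 … e7 between the data points and the model’s predictions, and deviations v1 … v7 between the data points and their Mean]
$RSE = 1 - \frac{MSE_{model}}{MSE_{mean}}$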
67. BigML, Inc 67Introduction to ML and BigML Platform
R-Squared Error
• RSE: measure of how much better the model is than
always predicting the mean
• < 0 model is worse than the mean
• MSEmodel > MSEmean
• = 0 model is no better than the mean
• MSEmodel = MSEmean
• ➞ 1 model fits the data “perfectly”
• MSEmodel = 0 (or MSEmean >> MSEmodel)
$RSE = 1 - \frac{MSE_{model}}{MSE_{mean}}$
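The three cases listed above can be reproduced numerically. This sketch uses four made-up target values; `r_squared` is a hypothetical helper that follows the formula on the slide.

```python
def r_squared(actual, predicted):
    """1 - MSE_model / MSE_mean, where MSE_mean comes from always predicting the mean."""
    mean = sum(actual) / len(actual)
    mse_model = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    mse_mean = sum((a - mean) ** 2 for a in actual) / len(actual)
    return 1 - mse_model / mse_mean


actual = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(actual, actual)                  # MSE_model = 0
baseline = r_squared(actual, [2.5] * 4)              # always predict the mean
worse = r_squared(actual, [4.0, 3.0, 2.0, 1.0])      # MSE_model > MSE_mean
print(perfect, baseline, worse)
```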
69. BigML, Inc 69Introduction to ML and BigML Platform
What just happened?
• We started with the open loans in the Lending Club dataset
• Performed a train/test split
• Built a model to predict the int_rate without grade/sub-grade
• Evaluated this model
70. BigML, Inc 70Introduction to ML and BigML Platform
Time Series
Year  Pineapple Harvest (Tons)
1986  50.74
1987  22.03
1988  50.69
1989  40.38
1990  29.80
1991  9.90
1992  73.93
1993  22.95
1994  139.09
1995  115.17
1996  193.88
1997  175.31
1998  223.41
1999  295.03
2000  450.53
[Chart: Pineapple Harvest in Tons (0 to 500) by Year (1986 to 2000), with the Trend marked]
71. BigML, Inc 71Introduction to ML and BigML Platform
Exponential Smoothing
Idea: Each new value in the series depends on all previous
values with a decaying weight
For training values $x_t$
Smoothing Factor $0 < \alpha < 1$
Predicted values $s_t$
$s_t = \alpha \cdot x_t + (1 - \alpha) \cdot s_{t-1}$
[Chart: Weight (0 to 0.2) given to each of the previous values (1 to 13)]
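The smoothing recurrence can be applied to the first few pineapple harvest values from the Time Series slide. This is a minimal sketch; starting the recurrence with s_0 = x_0 is an assumption made here, since the slide does not say how the first value is initialized.

```python
def exponential_smoothing(values, alpha):
    """Apply s_t = alpha * x_t + (1 - alpha) * s_(t-1) along the series."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    smoothed = [values[0]]  # assumption: initialize with the first observation
    for x in values[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


harvest = [50.74, 22.03, 50.69, 40.38]  # 1986 to 1989, in tons
smoothed = exponential_smoothing(harvest, alpha=0.5)
print(smoothed)
```

Unrolling the recurrence shows the decaying weights: the value from k steps back contributes with weight alpha * (1 - alpha)^k.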
74. BigML, Inc 74Introduction to ML and BigML Platform
Unsupervised Learning
1. Training data provides “examples” - no specific “outcome”
2. The machine tries to find “interesting” patterns in the data
date customer account auth class zip amount
Mon Bob 3421 pin clothes 46140 135
Tue Bob 3421 sign food 46140 401
Tue Alice 2456 pin food 12222 234
Wed Sally 6788 pin gas 26339 94
Wed Bob 3421 pin tech 21350 2459
Wed Bob 3421 pin gas 46140 83
Thr Sally 6788 sign food 26339 51
75. BigML, Inc 75Introduction to ML and BigML Platform
Unsupervised Learning
date customer account auth class zip amount
Mon Bob 3421 pin clothes 46140 135
Tue Bob 3421 sign food 46140 401
Tue Alice 2456 pin food 12222 234
Wed Sally 6788 pin gas 26339 94
Wed Bob 3421 pin tech 21350 2459
Wed Bob 3421 pin gas 46140 83
Thr Sally 6788 sign food 26339 51
Anomaly Detection
unusual
76. BigML, Inc 76Introduction to ML and BigML Platform
Unsupervised Learning
date customer account auth class zip amount
Mon Bob 3421 pin clothes 46140 135
Tue Bob 3421 sign food 46140 401
Tue Alice 2456 pin food 12222 234
Wed Sally 6788 pin gas 26339 94
Wed Bob 3421 pin tech 21350 2459
Wed Bob 3421 pin gas 46140 83
Thr Sally 6788 sign food 26339 51
Clustering
similar
77. BigML, Inc 77Introduction to ML and BigML Platform
Unsupervised Learning
date customer account auth class zip amount
Mon Bob 3421 pin clothes 46140 135
Tue Bob 3421 sign food 46140 401
Tue Alice 2456 pin food 12222 234
Wed Sally 6788 pin gas 26339 94
Wed Bob 3421 pin tech 21350 2459
Wed Bob 3421 pin gas 46140 83
Thr Sally 6788 sign food 26339 51
{customer = Bob, account = 3421} → zip = 46140
{class = gas} → amount < 100
Association Discovery
78. BigML, Inc 78Introduction to ML and BigML Platform
Let’s build a recommender
Typical way to shop for a home…
79. BigML, Inc 79Introduction to ML and BigML Platform
Recommender Idea
[Diagram: Sample of homes (?) → Preference Data → Preference Model]
… then use the Preference Model to
filter all the homes on the market (All Homes For Sale)
80. BigML, Inc 80Introduction to ML and BigML Platform
Recommender Problem #1
What if there are really unusual homes in the data?
• A mansion with 20 bathrooms
• A home with no bedrooms
• A lot size that is smaller than the home?
We don’t want to show these as suggestions
because they are unusual….
How do we detect anomalies?
81. BigML, Inc 81Introduction to ML and BigML Platform
Isolation Forest
Grow a random decision tree until
each instance is in its own leaf
[Diagram: instances isolated at a shallow Depth are “easy” to isolate; instances isolated at a large Depth are “hard” to isolate]
Now repeat the process several times and
use average Depth to compute anomaly
score: 0 (similar) -> 1 (dissimilar)
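The procedure can be sketched with the standard library alone: follow one instance down random splits, record the depth at which it ends up alone, and average over many random trees. Converting the average depth into a 0 to 1 score with 2^(-depth / c(n)) follows the original Isolation Forest formulation; that BigML uses exactly this normalization is an assumption, and the homes below are made-up (beds, baths) pairs.

```python
import math
import random


def isolation_depth(point, data, rng, depth=0, max_depth=12):
    """Depth at which `point` is separated from the rest of `data` by random splits."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    feature = rng.randrange(len(point))
    low = min(row[feature] for row in data)
    high = max(row[feature] for row in data)
    if low == high:
        return depth  # simplification: stop if the chosen feature cannot split
    split = rng.uniform(low, high)
    same_side = [row for row in data
                 if (row[feature] < split) == (point[feature] < split)]
    return isolation_depth(point, same_side, rng, depth + 1, max_depth)


def anomaly_score(point, data, n_trees=200, seed=0):
    """Score near 1 for easily isolated points, lower for typical ones."""
    rng = random.Random(seed)
    average = sum(isolation_depth(point, data, rng) for _ in range(n_trees)) / n_trees
    n = len(data)
    # c(n): average path length normalizer from the Isolation Forest paper.
    c = 2 * (math.log(n - 1) + 0.5772156649) - 2 * (n - 1) / n
    return 2 ** (-average / c)


# Hypothetical homes as (beds, baths); the last one is the 20-bathroom mansion.
homes = [(3, 2), (4, 2), (3, 1), (4, 3), (2, 1), (5, 3), (2, 2), (4, 1), (10, 20)]
mansion_score = anomaly_score((10, 20), homes)
typical_score = anomaly_score((3, 2), homes)
print(mansion_score, typical_score)
```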
83. BigML, Inc 83Introduction to ML and BigML Platform
What just happened?
• We wanted to find and remove unusual houses.
• We created an Anomaly Detector and examined
the top anomalies.
• We found some unusual houses to remove and
discovered bad data (missing values) that we want
to fix.
84. BigML, Inc 84Introduction to ML and BigML Platform
A clever way to fix missing data
Let’s use Machine Learning…
[Diagram: a BEDS model and a BATHS model predict the missing values]
SQFT   PRICE     BEDS               BATHS
3,125  $530,000  5                  3
2,100  $460,000  (missing → 4)      2
1,200  $250,000  3                  (missing → 1.5)
3,950  $610,000  6                  4
86. BigML, Inc 86Introduction to ML and BigML Platform
What just happened?
• We had a Dataset with missing values.
• We wanted to apply an algorithm to fix the missing
values with Machine Learning
• Rather than write the algorithm, we found what we
needed in the WhizzML public gallery.
• Now that we have cloned the Script we can use it
again and again.
• We can write new ones too!
87. BigML, Inc 87Introduction to ML and BigML Platform
Recommender Problem #2
• How can we avoid showing essentially the
same house over and over?
[Diagram: All Homes → Sample (?, ?, ?), all of them “Modern”]
88. BigML, Inc 88Introduction to ML and BigML Platform
Recommender Problem #2
• How can we avoid showing essentially the
same house over and over?
[Diagram: All Homes grouped into “Modern” and “Lots of Land”, with a sample (?) drawn from each group]
• Great! What if we don’t know how to group
them? Or how many groups?
90. BigML, Inc 90Introduction to ML and BigML Platform
What just happened?
• Since we don’t know how many groups of homes
there should be, we used G-means Clustering to find
the optimum number of groups of homes
• Our recommender will use these groups to create a
better sampling for user preference
• We also tried to understand the home clusters using
“model clusters” but the models were difficult to
interpret
91. BigML, Inc 91Introduction to ML and BigML Platform
Understanding Clusters Better
What if we could get rules like…
If SQFT >= 3,125 THEN “Cluster 1”
SQFT   PRICE     BEDS  BATHS  CLUSTER
3,125  $530,000  5     3      Cluster 1
2,100  $460,000  4     2      Cluster 3
1,200  $250,000  3     1.5    Cluster 5
3,950  $610,000  6     4      Cluster 1
93. BigML, Inc 93Introduction to ML and BigML Platform
What just happened?
• We used a Batch Centroid to add the Cluster
assignment of each home as a feature to the Dataset
• We used Association Discovery to find “interesting”
relationships between the features including the Cluster
assignment
94. BigML, Inc 94Introduction to ML and BigML Platform
Recommender Problem #3
There is much more interesting information than just the
number of BEDS, BATHS, etc.
• Unfortunately, these "remarks" are not available in the
Redfin download
• Adding them to our dataset requires crawling the
website
• Like most ML projects, preparing the data is 80% of
the difficulty (fortunately I already did it!)
96. BigML, Inc 96Introduction to ML and BigML Platform
What just happened?
• We extended the home dataset with the syndicated
remarks text field
• We built a model to predict sale price and explored how
key words discovered in the remarks impacted price
• We used topic modeling to create a deeper thematic
understanding of the remarks
• Homes that are "in-town" or "out-of-town"
• We extended the dataset with fields that represent how
strongly each home is related to each of these topics
• This will allow our clustering to group homes by a deeper
meaning than just BEDS, BATHS, etc
97. BigML, Inc 97Introduction to ML and BigML Platform
Recommender Idea
[Diagram: samples (?) drawn from each group of homes (Modern, Lots of Land, Small) → Preference Data → Preference Model]