2. Revisit Today’s Webinar Materials
For anyone who may have been running late
or wanted to reference these materials, we are
happy to provide the presentation and a link
to the recording of the webinar.
Expect to hear from us after the presentation!
© Minitab Inc. 10/24/2017
3. Today’s Discussion (10/24)
Quick Refresher – What can Machine Learning do for you
Salford Systems – Pioneering Predictive Analytics and
Machine Learning
Manufacturing Defects Dataset: Applied Examples
CART
TreeNet
Random Forest
Today’s Presenter
Charlie Harrison
Charlie is part of Salford’s Data
Scientist Team, and has been
providing customer support and
training for several years.
His favorite thing about Data Science
is proving theoretical results.
4. What Can Machine Learning Do For You?
Explore Data
Discover the
Most Important
Features
Find the Most
Important
Relationships in
Factors &
Response
Predict Future
Observations
Solve Your
Problem
5. How Broad and Deep is the Application
Potential?
Machine learning methods can be applied in almost any context. The following is a
brief selection of industry and functional examples:
INDUSTRIES
FINANCIAL SERVICES: Loan Defaults, Fraud Prevention
MANUFACTURING: Manufacturing Defects, Preventative Maintenance
HEALTH CARE: Disease Prevention, Genetics
OTHER INDUSTRIES: Insurance Claims, Environmental Impacts
FUNCTIONAL AREAS
SALES: Customer Churn, Cross-Sell/Upsell
MARKETING: Customer Segmentation, Marketing Lift
6. CLASSIFICATION MODELS using CART, Gradient Boosting
& Random Forests
CLASSIFICATION
Predict a qualitative value
UNSUPERVISED LEARNING
Clustering
TIME SERIES
Predict future values
based on past values
SURVIVAL ANALYSIS
Predict time until occurrence
REGRESSION
Predict a quantitative value
7. What Do You Need to Get Started?
Sufficient Data
Pick the Right Problem
Solve with the Right Tool
Have you downloaded SPM 8.2? After this webinar, we’ll give you access to the dataset used so you
can try it out for yourself.
https://info.salford-systems.com/spm-8-download
9. Salford’s Legacy in Pioneering Predictive Analytics &
Machine Learning
Salford’s solutions are innovative, reliable and robust because they were created and
are implemented by inventors and pioneers of Predictive Analytics & Machine
Learning (PAML):
• Dr. Jerome Friedman (Professor of Statistics, Stanford)
• Dr. Leo Breiman (Professor of Statistics, UC Berkeley)
The algorithms covered today were either created or co-created by either Dr. Breiman
or Dr. Friedman.
10. Salford Stands Out Against Competitors
Salford solutions are distinguished in particular by their:
Accuracy of Prediction
Salford’s models stand the test of time and are used by some of the biggest
corporations in the world
Defensibility of Models
Salford’s models are defensible internally to executive stakeholders and
externally to regulators
Ease of Use
Salford’s models don’t require coding
11. Suite of Solutions – Data Science Toolkit
Time- and market-tested predictive modeling tools including
everything from market-leading decision tree and classification
engines to advanced interaction detection and automation to state-
of-the-art machine learning capabilities.
SPM Software Suite
CART: Decision trees
MARS: Nonlinear regression
Random Forests: Data ensemble bagging
TreeNet: Gradient boosting
RuleLearner: Rule ensemble
ISLE: Model compression
GPS: Regularized regression
12. Why Do Classification Models Matter?
Classification methods are a simple, effective
and accurate approach to solving an organization’s
most difficult problems and uncovering new
opportunities by narrowing down which factors
have the most impact on your outcome.
Some of the most common applications
include:
• Fraud Prevention
• Risk Reduction in Credit Scoring and Loan
Default
• Optimizing Marketing Campaigns
• Improving Operations
INDUSTRIES
FINANCIAL SERVICES: Does level of education impact credit risk?
MANUFACTURING: What machine signals are predictive of defects?
HEALTH CARE: Does body weight influence the risk of heart disease?
FUNCTIONAL AREAS
SALES: Does customer satisfaction influence loyalty?
MARKETING: What promotions are most effective?
13. Machine Learning Terminology
Response Variable = Dependent Variable = Target
Variable
This is what we are trying to predict
Examples: default vs. no default, air pressure, number of
claims, etc.
Predictor Variables = Predictors = Factors
This is what we use to predict the response.
Example: I will use two predictors, level of education and
work experience, to predict income which is the target
variable.
Algorithm = Method Used = Technique
This is the method that we will use to both predict the
target variable and discover the relationships, if any,
between the predictors and the target.
Examples: CART decision trees, gradient boosted trees,
Random Forests, LASSO, Elastic Net, MARS, Support
Vector Machines (SVMs), and Neural Networks.
Putting It All Together
Target Variable: Defect
Predictor Variables: Signal 1, Signal 2, …, Signal 590
Algorithm: Logistic Regression
Regression: $\text{Defect} = \beta_0 + \beta_1\,\text{Signal}_1 + \cdots + \beta_{590}\,\text{Signal}_{590}$
Target Variable: Defect
Predictor Variables: Signal 1, Signal 2, …, Signal 590
Algorithm: CART decision tree
CART: Defect = [decision tree built from Signal 1, Signal 2, …, Signal 590]
15. Let’s Get Started . . .
MANUFACTURING
Open SPM
DATA SET
Live Demo
Manufacturing
Defects
A manufacturing process involves myriad machines, and the information concerning the operation of the machines is recorded.
There are 590 metrics recorded from the machines from the start of the process to the end and we’ll refer to these metrics as
“signals.”
1. What signals, if any, are
predictive of
manufacturing defects?
2. If signals are predictive
of defects, then how are
these signals related to
the likelihood of
manufacturing defects?
16. CART and Random Forests
SPM ENGINE
PREDICTIVE PERFORMANCE
AUTOMATIC VARIABLE SELECTION
AUTOMATIC INTERACTION DETECTION
AUTOMATIC MISSING VALUE/OUTLIER HANDLING
AUTOMATIC MODELING OF LOCAL EFFECTS
INVARIANT TO MONOTONE TRANSFORMATIONS OF PREDICTORS
INTERPRETABILITY
A Random Forest prediction is really just an average of CART tree predictions. When you build a Random
Forest model just keep this picture in the back of your mind:
17. Solving Problems with Machine Learning:
Machine Settings and Manufacturing Defects
A manufacturing process involves myriad machines, and the information concerning the operation of the
machines is recorded. There are 590 metrics recorded from the machines from the start of the process to
the end and we’ll refer to these metrics as “signals.”
We will try to answer two primary questions:
1. What signals, if any, are predictive of manufacturing defects?
2. If signals are predictive of defects, then how are these signals related to the likelihood of manufacturing
defects?
We will use an algorithm called gradient boosting to do this. TreeNet® software will be used. TreeNet is
unique in that its code was originally written by Jerome Friedman, the creator of gradient boosting.
18. Dataset Citations
Manufacturing Defect Dataset: Michael McCann and Adrian
Johnston donated the dataset to the UCI Machine Learning
Repository in 2008:
Link: https://archive.ics.uci.edu/ml/datasets/SECOM
19. CART
Let’s apply CART to the
manufacturing defect dataset.
Applying CART
1. Build the model in SPM
2. Understand CART Relative Cost
3. Find the most interesting rules that
are predictive of manufacturing
defects using Hotspot Detection
4. Using the model: Generating
manufacturing defect predictions
and deploying CART outside of SPM
[Figure: CART tree for the manufacturing defect dataset. Splitting variables: SIGNAL_66, SIGNAL_247, SIGNAL_246, SIGNAL_60, SIGNAL_293, SIGNAL_311, SIGNAL_111, SIGNAL_549, SIGNAL_112, SIGNAL_158, SIGNAL_21, SIGNAL_359, SIGNAL_294]
20. CART Review
CART is a decision tree algorithm that divides
the data so that the dependent variable can be
predicted more accurately
CART automatically:
1. Selects variables
2. Models nonlinear relationships
3. Models local effects
4. Models interactions
5. Handles missing values
21. CART : Relative Cost
$$\text{Relative Cost} = \frac{\text{Overall Misclassification Cost Using a CART Tree}}{\text{Misclassification Cost of the No Data Optimal Rule}}$$
The No Data Optimal Rule classifies every observation as one class. More specifically, the class
chosen for the No Data Optimal Rule is the class that has the lowest cost compared to the other(s).
Good: If the relative cost is close to zero (closer is better), then CART is better than the No Data
Optimal Rule.
Bad: If the relative cost is equal to 1, then the CART error is the same as that of the No Data Optimal Rule,
which means that CART is no better than just predicting every observation as the same class.
The relative cost can be greater than 1, which is especially bad and, more generally, values around 1 should be
considered “bad.”
Relative Cost = .44
[Figure: example CART tree for a two-class problem (Circle vs. Triangle), shown next to the No Data Optimal Rule predicted class and the CART predicted class for each terminal node]
Node 1 (root): Class = Circle; Circle 16 (64.0%), Triangle 9 (36.0%); W = 25.00, N = 25; split: X2 <= -0.49
  X2 <= -0.49: Terminal Node 1: Class = Circle; Circle 6 (100.0%), Triangle 0 (0.0%); W = 6.00, N = 6
  X2 > -0.49: Node 2: Class = Circle; Circle 10 (52.6%), Triangle 9 (47.4%); W = 19.00, N = 19; split: X1 <= 0.23
    X1 <= 0.23: Terminal Node 2: Class = Triangle; Circle 1 (14.3%), Triangle 6 (85.7%); W = 7.00, N = 7
    X1 > 0.23: Terminal Node 3: Class = Circle; Circle 9 (75.0%), Triangle 3 (25.0%); W = 12.00, N = 12
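As a rough check on the Relative Cost of .44 reported for this toy tree, the calculation can be sketched from the terminal node counts on the slide. The sketch assumes unit misclassification costs (every error costs 1) and class proportions taken directly from the data; the function name `relative_cost` is hypothetical and is not part of SPM.

```python
# Terminal node results from the toy tree: (predicted class, {class: cases}).
terminal_nodes = [
    ("Circle",   {"Circle": 6, "Triangle": 0}),  # Terminal Node 1
    ("Triangle", {"Circle": 1, "Triangle": 6}),  # Terminal Node 2
    ("Circle",   {"Circle": 9, "Triangle": 3}),  # Terminal Node 3
]


def relative_cost(nodes):
    """Tree misclassifications divided by those of the No Data Optimal Rule.

    Assumption: unit costs, so "cost" is just the number of misclassified cases.
    """
    # Cases whose true class differs from the class predicted by their node
    tree_errors = sum(
        n for predicted, counts in nodes
        for cls, n in counts.items() if cls != predicted
    )
    # Overall class totals across all terminal nodes
    totals = {}
    for _, counts in nodes:
        for cls, n in counts.items():
            totals[cls] = totals.get(cls, 0) + n
    # With unit costs, the No Data Optimal Rule predicts the majority class
    no_data_errors = sum(totals.values()) - max(totals.values())
    return tree_errors / no_data_errors


print(round(relative_cost(terminal_nodes), 2))  # 0.44
```

Under these assumptions the tree misclassifies 4 cases and the No Data Optimal Rule misclassifies 9, which gives 4/9, or about .44.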
22. CART Confusion Matrix
Use the Confusion Matrix to assess CART and the
types of correct or incorrect predictions that it
makes.
CART correctly predicted “No Defect” 935 times (true negatives)
CART correctly predicted “Defect” 57 times (true positives)
CART incorrectly predicted “Defect” when
there was actually no defect 528 times (we call
this a false positive)
CART incorrectly predicted “No Defect” when
there actually was a defect 47 times (we call this
a false negative)
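The four counts above can be turned into summary rates with a few lines of arithmetic. This is a minimal sketch, with the 57 correct predictions counted as correctly predicted defects; the variable names are illustrative and do not come from SPM.

```python
# Counts from the confusion matrix
true_negative = 935    # predicted "No Defect", actually no defect
true_positive = 57     # predicted "Defect", actually a defect
false_positive = 528   # predicted "Defect", actually no defect
false_negative = 47    # predicted "No Defect", actually a defect

total = true_negative + true_positive + false_positive + false_negative

# Share of all records that were predicted correctly
accuracy = (true_positive + true_negative) / total
# Share of actual defects that were caught
sensitivity = true_positive / (true_positive + false_negative)
# Share of actual non-defects that were correctly cleared
specificity = true_negative / (true_negative + false_positive)

print(total)                  # 1567
print(round(accuracy, 3))     # 0.633
print(round(sensitivity, 3))  # 0.548
print(round(specificity, 3))  # 0.639
```

Defects are rare here (104 of 1,567 records), so sensitivity and specificity are usually more informative than overall accuracy.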
23. CART: Variable Selection & Importance
There were 590 variables
available to be selected by CART.
13 variables appear in the tree
79 variables are used in the
model (i.e. 13 variables used in
the tree and 66 used to handle
missing values via surrogate
splits)
24. CART: Hotspot Detection
Recall: a CART tree can be thought
of as a collection of rules.
Each rule defines a path to a
terminal node
For large CART trees, is there an
easy way to find the “most
interesting” rules? Yes, use
Hotspot Detection.
25. CART: Hotspot Detection
Hotspot Detection computes
summary information about
each terminal node (every rule
leads to a terminal node) and
displays the information
conveniently to the user.
Use this information to easily
and efficiently find the most
important rules in your CART
tree.
26. CART: Using Hotspot Detection
Here terminal node 5 has the
largest class count and a lift
value of around 2.5. This
means that a “Defect” is about
2.5 times more likely in this
node than in the overall
population.
What rule leads to terminal
node 5?
27. CART Hotspot Interpretation
If Signal 294 <= 368.82 and Signal 293 > .006
and Signal 60 > 1.51 and Signal 246 <= 1.42 and
Signal 247 > 2.98 then we predict “Defect”.
If the machine signals satisfy this rule then the
probability of a defect is 2.5 times larger than
the overall probability of a defect.
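Because the rule is a chain of threshold comparisons, it can be written directly as a function. The sketch below is only an illustration: the name `hotspot_rule`, the dictionary layout (signal number mapped to its value) and the example readings are assumptions, not SPM output.

```python
def hotspot_rule(signals):
    """Return True when the signal values satisfy the rule for terminal node 5.

    `signals` is assumed to map a signal number to its measured value.
    """
    return (
        signals[294] <= 368.82
        and signals[293] > 0.006
        and signals[60] > 1.51
        and signals[246] <= 1.42
        and signals[247] > 2.98
    )


# Hypothetical readings, invented purely to exercise the rule
example = {294: 350.0, 293: 0.01, 60: 2.0, 246: 1.0, 247: 3.5}
print("Defect" if hotspot_rule(example) else "rule does not apply")
```

A record that fails any one of the five conditions falls into a different terminal node, so the 2.5 lift does not apply to it.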
28. CART: Hotspot Detection
Focus Class: the class (i.e. “Defect” or “No Defect”) that you
want to generate the hotspot report for. I set the focus
class to be “Defect.”
$$\text{Lift} = \frac{\text{Probability of Focus Class in the terminal node}}{\text{Probability of Focus Class Overall}}$$
Node Class Count = number of records in the sample that
fall into the node
If Lift = 1, then the probability of a “Defect” in the terminal node is the same as
it is in the overall sample.
If Lift = 2, then the probability of a “Defect” is twice as
high in the terminal node as it is in the overall sample.
If Lift = .5, then the probability of a “Defect” is half as
high in the terminal node as it is in the overall sample.
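The lift definition above amounts to a ratio of two proportions. In this small sketch the overall counts (104 defects among 1,567 records) follow from the CART confusion matrix, while the terminal node counts (10 defects among 60 records) are hypothetical values chosen only for illustration; the function name `lift` is likewise an assumption.

```python
def lift(focus_in_node, node_size, focus_overall, sample_size):
    """Probability of the focus class in a node divided by its overall probability."""
    node_probability = focus_in_node / node_size
    overall_probability = focus_overall / sample_size
    return node_probability / overall_probability


# Overall: 57 + 47 = 104 defects among 1,567 records.
# Node: 10 defects among 60 records (hypothetical counts).
print(round(lift(10, 60, 104, 1567), 2))  # 2.51
```

A node whose defect rate equals the overall rate returns a lift of exactly 1.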
30. Deploying CART
If you want to use CART to generate predictions, you have two
primary options:
1. Generate Predictions inside of SPM
2. Translate CART into a programming language and deploy it in
your environment
31. Generating Predictions Inside of SPM
Let’s suppose that you have a
set of machine signal values
(i.e. you know the values for
Signal 1 – Signal 590) and you
want to predict if there will be
a product defect (i.e. you don’t
know the “STATUS” value)
32. Deploying CART via Code Translations
A CART model is fundamentally a
collection of rules where each rule is an
if-then statement (also else-if statements
etc.). We can then take these if-then
statements and translate them into
different programming languages. In
SPM we can translate into 4 languages: C,
PMML, Java, and SAS.
***Use the code to generate CART
predictions in other
applications/programs or to make
predictions in real-time.
34. CART: Finding Rules
CART automatically gave us a set of interpretable rules that are
predictive of manufacturing defects. Now we will need to determine
what the signals actually measure and determine if we can control
the inputs that drive the settings.
35. CART: Generating Predictions
1. Use CART to predict if there will or will not be a product defect
inside of SPM.
2. Translate CART into C (or Java, PMML, or SAS) and deploy your
CART model in your environment in order to make predictions
in real-time.
36. TreeNet Gradient Boosting
Let’s apply the gradient boosting algorithm using TreeNet® software
Applying TreeNet
1. Understanding the model: Partial Dependency Plots
2. Choosing the number of trees (set the maximum number of trees such that the
error no longer meaningfully declines; SPM will choose the optimal number for
you)
3. Choosing the number of nodes with Automate NODES
4. Discover important interactions with interaction reporting
5. Making predictions and deploying the model.
37. Gradient Boosting Review
Idea: fit a CART tree to the error
from the previous model and use
this new prediction to update
the model
38. Gradient Boosting: Why it works
How does TreeNet model this curve? It makes small improvements (i.e. the
learning rate is a small number that “shrinks” the model updates). The
small improvements, taken together, produce an accurate model.
[Figure: TreeNet fit to the curve after Tree 1, Tree 10, Tree 50, Tree 100, Tree 150, Tree 200, Tree 400, and Tree 600. Note: Noise ~ N(0,1)]
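The idea of many small, shrunken updates can be sketched in a few lines of plain Python. This is not TreeNet's implementation: it is a simplified illustration that assumes a squared-error loss, one-split trees ("stumps") in place of full CART trees, a sine curve as the true signal, and a learning rate of 0.1. Only the N(0,1) noise comes from the slide; all function names are hypothetical.

```python
import math
import random


def fit_stump(x, residuals):
    """Least-squares regression stump: one split on x, one constant per side."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    n, total = len(x), sum(residuals)
    best_score, best = float("-inf"), None
    left_sum = 0.0
    for k in range(1, n):
        left_sum += residuals[order[k - 1]]
        lo, hi = x[order[k - 1]], x[order[k]]
        if lo == hi:
            continue  # cannot split between identical values
        right_sum = total - left_sum
        # Maximizing this score is equivalent to minimizing squared error
        score = left_sum ** 2 / k + right_sum ** 2 / (n - k)
        if score > best_score:
            best_score = score
            best = ((lo + hi) / 2, left_sum / k, right_sum / (n - k))
    return best  # (threshold, left value, right value)


def boost(x, y, n_trees=200, learning_rate=0.1):
    """Fit each stump to the current residuals, then apply a shrunken update."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps, history = [], []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        threshold, left, right = fit_stump(x, residuals)
        stumps.append((threshold, left, right))
        pred = [p + learning_rate * (left if xi <= threshold else right)
                for xi, p in zip(x, pred)]
        history.append(sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y))
    return base, stumps, history


random.seed(0)
x = [random.uniform(0, 6) for _ in range(200)]
y = [math.sin(xi) + random.gauss(0, 1) for xi in x]  # assumed curve, noise ~ N(0,1)
base, stumps, history = boost(x, y)
for n in (1, 10, 50, 100, 200):
    print(f"trees={n:3d}  training MSE={history[n - 1]:.3f}")
```

The training error falls as trees are added. Because the noise is N(0,1), error on new data would be expected to level off near 1, which is why the number of trees is normally chosen on test data rather than training data.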
40. Most Important Signals
TreeNet, like CART, automatically selects the
most important variables (i.e. the signals).
Steps
1. Import the dataset
2. Select “TreeNet Gradient Boosting
Machine”
3. Set variables
4. Click “Start”
5. View variable importance measures
Of the 590 signals, TreeNet automatically
identifies 299 of them as useful (you can actually
run a series of variable “shaving” experiments to
see if you can reduce the number of variables
used even more)
42. How are Most Important Signals Related to
the Likelihood of Product Defects?
The plots on the right are generated
automatically from a TreeNet model, so you
only have to click two buttons to see the plots.
The plots are ordered in terms of the variable
importance (most important first).
43. Most Important Signal: Signal 60
This plot tells us that, after accounting
for the other variables in the
model, the likelihood of a product
defect increases once Signal 60 has
values beyond 3.25. Once Signal 60
reaches about 13.3, the likelihood of a
defect remains constant.
TreeNet automatically discovered this
relationship. Now we have a few
questions to answer:
What does Signal 60 actually measure?
What machine settings have an effect on
Signal 60? To what extent, if any, can we
control these settings?
44. Most Important Two-Way Interaction:
Signal 60 and Signal 334
The most important two-way interaction in the
model is between Signal 60 and Signal 334.
The red and orange areas in the plot on the right
mean that the likelihood of a defect is higher.
When Signal 60 is between about 15 and 150 and
Signal 334 is between 30 and 100, then the
likelihood of a defect is higher.
Follow-up questions for identifying the machine
settings that affect the signals:
What do the two signals measure?
What machine settings, if any, have an effect on
Signal 60 and Signal 334?
45. Interaction Statistics: Global Score
Use the Global Score to find the most important two-way interactions in the model. The
Global Score for a pair of variables tells you the percentage of the total variation in the
predicted response that is accounted for by the two-way interaction between two variables. A
value of 5.66 means that 5.66% of the variation in the predicted response is accounted for by
the interaction between Signal 60 and Signal 334.
$$\text{Global Score} = 100 \times \frac{\text{Variation in the Predicted Response Accounted for by the Two-Way Interaction}}{\text{Total Variation in the Predicted Response}}$$
46. Using the Interaction Statistics: Next Webinar
One way to leverage the interaction statistics is to allow only
interactions between the pairs of variables deemed to be “important”
by the TreeNet interaction statistics and disallow interactions
among the unimportant variables. If we do this and the model error
does not change meaningfully then we can be more confident that
the interaction is real (i.e. not noise!). We will talk more about this
in Webinar 5.
48. Solving the Problem:
Predicting Future Observations & Running Simulations
Engineers can predict the likelihood
of a defect based on the signal
values:
1. Take data (i.e. hypothetical signal
values or estimated signal values
given the machine settings) and
substitute the values into the
TreeNet model
2. TreeNet will generate the
probability of a defect based on the
signal values supplied.
***If we can predict signal values based on
the machine settings, then we could
predict the probability of a defect based on
chosen machine settings***
Hypothetical (or estimated) Signal Values
Predicted probability of “Defect” and the
predicted class: “Defect” or “No Defect.”
Proposed Machine Settings
49. Generating Predictions in SPM
We can generate predictions
inside of SPM just like CART
(the same is true for Random
Forests, MARS, etc.)
Click the “Score” button
50. Deploying TreeNet via Code Translations
A TreeNet model is fundamentally a
collection of rules where each rule is an
if-then statement (also else-if statements
etc.). We can then take these if-then
statements and translate them into
different programming languages. In
SPM we can translate into 4 languages: C,
PMML, Java, and SAS.
***Use the code to generate TreeNet
predictions in other
applications/programs or to make
predictions in real-time.
53. Solving the Problem:
Understanding the relationship of signals and the likelihood of defects
Use TreeNet gradient boosting to
1. View signals that are useful in
predicting defects (or, conversely,
non-defects; signals that are not
important are either rarely used in
the model or not used at all)
2. Visually understand the
relationship between the likelihood
of a defect and a signal
3. Visually understand the nature of
the interactions that are important
in the model.
54. Optimizing Models with SPM Automates
One way to choose the optimal value for a model parameter in TreeNet is to run an experiment:
build multiple TreeNet models with identical settings except that you change the value of one
parameter each time.
Model experimentation and optimization routines are pre-packaged for you in SPM, so you
never have to write even a single line of code. We want you to spend time on solving problems,
not troubleshooting while loops and function calls!
We will discuss this more in the second webinar, but we will provide one example.
55. Automate NODES
The number of terminal nodes in
each tree in the TreeNet model
controls the extent to which the
model can capture interactions.
Use Automate NODES to easily
find the optimal number of
terminal nodes in each tree. Here
the optimal number of terminal
nodes is 6 (this is actually the
default value).
59. Random Forests: Review
Idea: fit CART trees to
independent bootstrap samples
and combine the predictions
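The bootstrap-and-average idea can be sketched in plain Python. This is a simplified illustration rather than the Random Forests® software: it assumes one-split trees in place of full CART trees, a random predictor subset of size 1 per tree, and a synthetic two-predictor dataset in which only the first predictor matters. All names are hypothetical.

```python
import random


def fit_stump(rows, y, feature):
    """Best single split on one predictor for a 0/1 target (least squares)."""
    order = sorted(range(len(rows)), key=lambda i: rows[i][feature])
    n, total = len(rows), sum(y)
    best_score, best = float("-inf"), None
    left_sum = 0.0
    for k in range(1, n):
        left_sum += y[order[k - 1]]
        lo, hi = rows[order[k - 1]][feature], rows[order[k]][feature]
        if lo == hi:
            continue  # bootstrap samples contain repeated rows
        right_sum = total - left_sum
        score = left_sum ** 2 / k + right_sum ** 2 / (n - k)
        if score > best_score:
            best_score = score
            best = (feature, (lo + hi) / 2, left_sum / k, right_sum / (n - k))
    return best  # (feature, threshold, left mean, right mean)


def fit_forest(rows, y, n_trees=50, seed=0):
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        feature = rng.randrange(len(rows[0]))        # random predictor subset of size 1
        stump = fit_stump([rows[i] for i in idx], [y[i] for i in idx], feature)
        if stump is not None:
            forest.append(stump)
    return forest


def predict(forest, row):
    """Average the individual tree predictions."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return sum(votes) / len(votes)


data_rng = random.Random(1)
rows = [[data_rng.random(), data_rng.random()] for _ in range(300)]
y = [1 if r[0] > 0.5 else 0 for r in rows]  # synthetic target driven by predictor 0
forest = fit_forest(rows, y)
low, high = predict(forest, [0.1, 0.5]), predict(forest, [0.9, 0.5])
print(f"P(class 1 | x0=0.1) = {low:.2f},  P(class 1 | x0=0.9) = {high:.2f}")
```

Trees that split on the uninformative second predictor give both example rows the same prediction, so the gap between the two averages comes entirely from the trees that split on the first predictor.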
60. Random Forest Output
For smaller datasets (i.e. <10,000 records) we can compute a variety
of useful metrics including outlier statistics.
61. Optimizing Random Forests: Automate
RFNPREDS
Use Automate RFNPREDS to
conveniently find the optimal value
for the random variable subset
size.
Here the optimal size is
$2 \times \sqrt{\text{number of predictors}} = 2\sqrt{590} \approx 49$
63. Implementation of lean manufacturing in Saudi manufacturing organizations: an empirical
study
Proceedings of the 2011 International Conference on Materials and Products Manufacturing
Technology: https://eprints.qut.edu.au/46594/1/2011011893_Karim_ePrints.pdf
Manufacturing: Value Creation through
Machine Learning Application
MANUFACTURING
INDUSTRIES
Organizations Gain Efficiencies Through Smarter Lean Adoption
Identifying challenges and the benefits of LEAN implementation in
small to medium sized companies using CART.
64. Mining the customer credit using classification and regression tree and multivariate
adaptive regression splines
Computational Statistics & Data Analysis:
http://www.sciencedirect.com/science/article/pii/S016794730400355X
Financial Services: Value Creation through
Machine Learning Application
FINANCIAL SERVICES
INDUSTRIES
Improving Credit Scoring in Highly-Competitive Environment
Accurate credit scoring using CART and TreeNet is critical for
financial services and is increasingly competitive. Less risk is assumed
as future instances of loan default are predicted.
65. Panel of Serum Biomarkers for the Diagnosis of Lung Cancer
Journal of Clinical Oncology: http://ascopubs.org/doi/full/10.1200/JCO.2007.13.5392
Healthcare: Value Creation through Machine
Learning Application
HEALTHCARE
INDUSTRIES
Predicting Lung Cancer for High Risk Patients
Medical researchers were looking to improve lung cancer detection
through blood testing. CART analysis was leveraged to predict which
patients had cancer given the serum biomarkers.
66. Continue To Use Machine Learning On Your Own
We’ll provide you a
link to the dataset
used today in a follow
up email
Download a trial version of SPM
https://info.salford-
systems.com/spm-8-download
If you need help getting started, give us a shout:
support@salford-systems.com
Check out our other training materials online:
https://www.salford-systems.com/resources/training-
videos
Practice, Practice, Practice
Feeling Stuck? We Can Help!
Schedule a demo and we’ll walk you
through the example shown today
67. Ready For More? Join Our Next Webinar
Tuesday October 31, 2017 @ 10 am (PDT):
Real-world demonstration for the advanced modeler
Register: http://info.salford-
systems.com/datascience101webinarseries
In this webinar I am going to explain in detail how to leverage powerful
Machine Learning algorithms using SPM software.
69. CART® Software Applications
Predicting Return to Work with Data Mining
Society of Actuaries: https://www.soa.org/files/research/projects/data-mining.pdf
Implementation of lean manufacturing in Saudi manufacturing organizations: an empirical study
Proceedings of the 2011 International Conference on Materials and Products Manufacturing Technology:
https://eprints.qut.edu.au/46594/1/2011011893_Karim_ePrints.pdf
Assessing the prediction of employee productivity: a comparison of OLS vs. CART
International Journal of Productivity and Quality Management: http://www.inderscienceonline.com/doi/abs/10.1504/IJPQM.2011.042511
Mining the customer credit using classification and regression tree and multivariate adaptive regression splines
Computational Statistics & Data Analysis: http://www.sciencedirect.com/science/article/pii/S016794730400355X
Panel of Serum Biomarkers for the Diagnosis of Lung Cancer
Journal of Clinical Oncology: http://ascopubs.org/doi/full/10.1200/JCO.2007.13.5392
Automated urban land-use classification with remote sensing
International Journal of Remote Sensing: http://www.tandfonline.com/doi/abs/10.1080/01431161.2012.714510
70. Random Forest® Software Applications
Mapping Oil and Gas Development Potential in the US Intermountain West and Estimating Impacts to Species
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0007400
Random Forests applied as a soil spatial predictive model in arid Utah
Digital Soil Mapping: http://link.springer.com/content/pdf/10.1007/978-90-481-8863-5.pdf#page=188
Factors Associated With Increased Reading Frequency in Children Exposed to Reach Out and Read
Academic Pediatrics: http://www.sciencedirect.com/science/article/pii/S1876285915002752
This paper used Random Forests® software to pick the factors
Using Random Forests to Provide Predicted Species Distribution Maps as a Metric for Ecological Inventory & Monitoring
Programs
Applications of Computational Intelligence in Biology: https://link.springer.com/chapter/10.1007/978-3-540-78534-7_9
Random Forest for Gene Expression Based Cancer Classification: Overlooked Issues
Iberian Conference on Pattern Recognition and Image Analysis: https://link.springer.com/chapter/10.1007/978-3-540-72849-8_61