Real-world demonstration
For the beginner modeler
Revisit Today’s Webinar Materials
For anyone who may have been running late
or wanted to reference these materials, we are
happy to provide the presentation and a link
to the recording of the webinar.
Expect to hear from us after the presentation!
© Minitab Inc. 10/24/2017
Today’s Discussion (10/24)
Quick Refresher – What can Machine Learning do for you
Salford Systems – Pioneering Predictive Analytics and
Machine Learning
Manufacturing Defects Dataset: Applied Examples
CART
TreeNet
Random Forest
Today’s Presenter
Charlie Harrison
Charlie is part of Salford’s Data
Scientist Team, and has been
providing customer support and
training for several years.
His favorite thing about Data Science
is proving theoretical results.
What Can Machine Learning Do For You?
Explore Data
Discover the Most Important Features
Find the Most Important Relationships in Factors & Response
Predict Future Observations
Solve Your Problem
How Broad and Deep is the Application
Potential?
Machine learning methods can be applied in almost any context. The following is a
brief selection of industry and functional examples:
INDUSTRIES
FINANCIAL SERVICES: Loan Defaults, Fraud Prevention
MANUFACTURING: Manufacturing Defects, Preventative Maintenance
HEALTH CARE: Disease Prevention, Genetics
OTHER INDUSTRIES: Insurance Claims, Environmental Impacts
FUNCTIONAL AREAS
SALES: Customer Churn, Cross-Sell/Upsell
MARKETING: Customer Segmentation, Marketing Lift
CLASSIFICATION MODELS using CART, Gradient Boosting
& Random Forests
CLASSIFICATION: Predict a qualitative value
UNSUPERVISED LEARNING: Clustering
TIME SERIES: Predict future values based on past values
SURVIVAL ANALYSIS: Predict time until occurrence
REGRESSION: Predict a quantitative value
What Do You Need to Get Started?
Sufficient Data
Pick the Right Problem
Solve with the Right Tool
Have you downloaded SPM 8.2? After this webinar, we’ll give you access to the dataset used so you
can try it out for yourself.
https://info.salford-systems.com/spm-8-download
Salford Systems
Salford’s Legacy in Pioneering Predictive Analytics &
Machine Learning
Salford’s solutions are innovative, reliable and robust because they were created and
are implemented by inventors and pioneers of Predictive Analytics & Machine
Learning (PAML):
• Dr. Jerome Friedman (Professor of Statistics, Stanford)
• Dr. Leo Breiman (Professor of Statistics, UC Berkeley)
The algorithms covered today were created or co-created by Dr. Breiman
or Dr. Friedman.
Salford Stands Out Against Competitors
Salford solutions are distinguished in particular by their:
Accuracy of Prediction
Salford’s models stand the test of time and are used by some of the biggest
corporations in the world
Defensibility of Models
Salford’s models are defensible internally to executive stakeholders and
externally to regulators
Ease of Use
Salford’s models don’t require coding
Suite of Solutions – Data Science Toolkit
Time- and market-tested predictive modeling tools including
everything from market-leading decision tree and classification
engines to advanced interaction detection and automation to state-
of-the-art machine learning capabilities.
SPM Software Suite
CART: Decision trees
MARS: Nonlinear regression
Random Forests: Data ensemble bagging
TreeNet: Gradient boosting
RuleLearner: Rule ensemble
ISLE: Model compression
GPS: Regularized regression
Why Do Classification Models Matter?
Classification methods are a simple, effective
and accurate approach to solving an organization’s
most difficult problems and uncovering new
opportunities by narrowing down which factors
have the most impact on your outcome.
Some of the most common applications
include:
• Fraud Prevention
• Risk Reduction in Credit Scoring and Loan
Default
• Optimizing Marketing Campaigns
• Improving Operations
INDUSTRIES
FINANCIAL SERVICES: Does level of education impact credit risk?
MANUFACTURING: What machine signals are predictive of defects?
HEALTH CARE: Does body weight influence the risk of heart disease?
FUNCTIONAL AREAS
SALES: Does customer satisfaction influence loyalty?
MARKETING: What promotions are most effective?
Machine Learning Terminology
Response Variable = Dependent Variable = Target
Variable
This is what we are trying to predict
Examples: default vs. no default, air pressure, number of
claims, etc.
Predictor Variables = Predictors = Factors
This is what we use to predict the response.
Example: I will use two predictors, level of education and
work experience, to predict income which is the target
variable.
Algorithm = Method Used = Technique
This is the method that we will use to both predict the
target variable and discover the relationships, if any,
between the predictors and the target.
Examples: CART decision trees, gradient boosted trees,
Random Forests, LASSO, Elastic Net, MARS, Support
Vector Machines (SVMs), and Neural Networks.
Putting It All Together
Target Variable: Defect
Predictor Variables: Signal 1, Signal 2, …, Signal 590
Algorithm: Logistic Regression
$\text{Defect} = \beta_0 + \beta_1\,\text{Signal}_1 + \cdots + \beta_{590}\,\text{Signal}_{590}$
Target Variable: Defect
Predictor Variables: Signal 1, Signal 2, …, Signal 590
Algorithm: CART decision tree
[Figure: Defect = a CART tree built from Signal 1, Signal 2, …, Signal 590]
Hands-on Practice
Let’s Get Started . . .
MANUFACTURING
Open SPM
DATA SET
Live Demo
Manufacturing
Defects
CART and Random Forests
[Figure: SPM engine comparison of CART and Random Forests on predictive performance, automatic variable selection, automatic interaction detection, automatic missing value/outlier handling, automatic modeling of local effects, invariance to monotone transformations of predictors, and interpretability]
A Random Forest prediction is really just an average of CART tree predictions. When you build a Random
Forest model just keep this picture in the back of your mind:
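As a rough illustration of the averaging idea, the Python sketch below uses three hand-written functions as stand-ins for CART trees and averages their predicted defect probabilities. The tree functions, thresholds, probabilities, and the example record are all hypothetical; none of them come from the SPM model.

```python
# Each "tree" is a hypothetical stand-in for a fitted CART tree: it maps a
# record of signal values to a predicted probability of "Defect".
def tree_a(signals):
    return 0.30 if signals["signal_60"] > 3.25 else 0.05

def tree_b(signals):
    return 0.20 if signals["signal_334"] > 30 else 0.04

def tree_c(signals):
    return 0.25 if signals["signal_60"] > 13.3 else 0.06

def forest_probability(trees, signals):
    """Average the individual tree predictions to get the forest prediction."""
    return sum(tree(signals) for tree in trees) / len(trees)

# Hypothetical record: only the two signals used by the toy trees are given.
record = {"signal_60": 5.0, "signal_334": 10.0}
probability = forest_probability([tree_a, tree_b, tree_c], record)
print(round(probability, 4))  # average of 0.30, 0.04 and 0.06
```

A real Random Forest differs only in scale: many trees, each grown on its own bootstrap sample, with the same averaging step at the end.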
Solving Problems with Machine Learning:
Machine Settings and Manufacturing Defects
A manufacturing process involves myriad machines, and the information concerning the operation of the
machines is recorded. There are 590 metrics recorded from the machines from the start of the process to
the end and we’ll refer to these metrics as “signals.”
We will try to answer two primary questions:
1. What signals, if any, are predictive of manufacturing defects?
2. If signals are predictive of defects, then how are these signals related to the likelihood of manufacturing
defects?
We will use an algorithm called gradient boosting to do this. TreeNet® software will be used. TreeNet is
unique in that its code was originally written by Jerome Friedman, the creator of gradient boosting.
Dataset Citations
Manufacturing Defect Dataset: Michael McCann and Adrian
Johnston donated the dataset to the UCI Machine Learning
Repository in 2008:
Link: https://archive.ics.uci.edu/ml/datasets/SECOM
CART
Let’s apply CART to the
manufacturing defect dataset.
Applying CART
1. Build the model in SPM
2. Understand CART Relative Cost
3. Find the most interesting rules that
are predictive of manufacturing
defects using Hotspot Detection
4. Using the model: Generating
manufacturing defect predictions
and deploying CART outside of SPM
[Figure: CART tree for the manufacturing defect dataset, with splits on SIGNAL_66, SIGNAL_247, SIGNAL_246, SIGNAL_60, SIGNAL_293, SIGNAL_311, SIGNAL_111, SIGNAL_549, SIGNAL_112, SIGNAL_158, SIGNAL_21, SIGNAL_359, and SIGNAL_294]
CART Review
CART is a decision tree algorithm that divides
the data so that the dependent variable can be
predicted more accurately
CART automatically:
1. Selects variables
2. Models nonlinear relationships
3. Models local effects
4. Models interactions
5. Handles missing values
CART: Relative Cost
$$\text{Relative Cost} = \frac{\text{Overall Misclassification Cost Using a CART Tree}}{\text{Misclassification Cost of the No Data Optimal Rule}}$$
The No Data Optimal Rule classifies every observation as one class. More specifically, the class
chosen for the No Data Optimal Rule is the class that has the lowest cost compared to the other(s).
Good: If the relative cost is close to zero (closer is better), then CART is better than the No Data
Optimal Rule.
Bad: If the relative cost is equal to 1, then the CART error is the same as that of the No Data Optimal Rule,
which means that CART is no better than just predicting every observation as the same class.
The relative cost can be greater than 1, which is especially bad; more generally, values around 1 should be
considered “bad.”
[Figure: toy example with two classes (Circle, Triangle) and 25 cases, comparing the class predicted by the No Data Optimal Rule with the classes predicted by CART. Relative Cost = .44]
Node 1 (N = 25, W = 25.00): Circle 16 (64.0%), Triangle 9 (36.0%); Class = Circle; split on X2 <= -0.49
  X2 <= -0.49: Terminal Node 1 (N = 6, W = 6.00): Circle 6 (100.0%), Triangle 0 (0.0%); Class = Circle
  X2 > -0.49: Node 2 (N = 19, W = 19.00): Circle 10 (52.6%), Triangle 9 (47.4%); Class = Circle; split on X1 <= 0.23
    X1 <= 0.23: Terminal Node 2 (N = 7, W = 7.00): Circle 1 (14.3%), Triangle 6 (85.7%); Class = Triangle
    X1 > 0.23: Terminal Node 3 (N = 12, W = 12.00): Circle 9 (75.0%), Triangle 3 (25.0%); Class = Circle
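The relative cost of .44 for this toy tree can be checked by hand. The sketch below is a minimal Python calculation using the class counts from the toy tree, under the assumption that every misclassification costs 1 and that classes are weighted by their frequency in the data, so cost is simply the error rate. The helper name `misclassified` is made up for this illustration.

```python
def misclassified(class_counts, predicted_class):
    """Number of cases in a node that do not belong to the predicted class."""
    return sum(n for cls, n in class_counts.items() if cls != predicted_class)

# Terminal nodes of the toy tree: (class counts, class predicted by CART).
terminal_nodes = [
    ({"Circle": 6, "Triangle": 0}, "Circle"),    # Terminal Node 1
    ({"Circle": 1, "Triangle": 6}, "Triangle"),  # Terminal Node 2
    ({"Circle": 9, "Triangle": 3}, "Circle"),    # Terminal Node 3
]
root_counts = {"Circle": 16, "Triangle": 9}
total_cases = sum(root_counts.values())

# CART error: misclassified cases summed over the terminal nodes.
cart_errors = sum(misclassified(counts, cls) for counts, cls in terminal_nodes)

# No Data Optimal Rule: predict the cheapest single class (Circle) for everyone.
no_data_errors = misclassified(root_counts, "Circle")

# Assumes unit misclassification costs, so cost equals the error rate.
relative_cost = (cart_errors / total_cases) / (no_data_errors / total_cases)
print(cart_errors, no_data_errors, round(relative_cost, 2))  # 4 9 0.44
```

CART misclassifies 4 of the 25 cases and the No Data Optimal Rule misclassifies 9, so the ratio is 4/9, which rounds to .44.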
CART Confusion Matrix
Use the Confusion Matrix to assess CART and the
types of correct or incorrect predictions that it
makes.
CART correctly predicted “No Defect” 935 times
CART correctly predicted “Defect” 57 times
CART incorrectly predicted “Defect” when
there was actually no defect 528 times (we call
this a false positive)
CART incorrectly predicted “No Defect” when
there actually was a defect 47 times (we call this
a false negative)
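The four counts above are enough to compute the usual summary rates. The sketch below is a minimal Python calculation that treats “Defect” as the positive class and reads the count of 57 as correctly predicted defects, which is an assumption about the slide. The variable names are chosen for this illustration.

```python
# Counts from the confusion matrix, with "Defect" as the positive class.
true_negatives = 935   # predicted "No Defect", actually no defect
true_positives = 57    # predicted "Defect", actually a defect (assumed reading)
false_positives = 528  # predicted "Defect", actually no defect
false_negatives = 47   # predicted "No Defect", actually a defect

total = true_negatives + true_positives + false_positives + false_negatives

accuracy = (true_positives + true_negatives) / total
sensitivity = true_positives / (true_positives + false_negatives)  # share of defects caught
specificity = true_negatives / (true_negatives + false_positives)  # share of non-defects cleared
precision = true_positives / (true_positives + false_positives)    # share of "Defect" calls that are right

print(f"records={total}")
print(f"accuracy={accuracy:.3f} sensitivity={sensitivity:.3f}")
print(f"specificity={specificity:.3f} precision={precision:.3f}")
```

Looking at sensitivity and specificity separately matters here because defects are rare: 104 of the 1,567 records.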
CART: Variable Selection & Importance
There were 590 variables
available to be selected by CART.
13 variables appear in the tree
79 variables are used in the
model (i.e. 13 variables used in
the tree and 66 used to handle
missing values via surrogate
splits)
CART: Hotspot Detection
Recall: a CART tree can be thought
of as a collection of rules.
Each rule defines a path to a
terminal node
For large CART trees, is there an
easy way to find the “most
interesting” rules? Yes, use
Hotspot Detection.
CART: Hotspot Detection
Hotspot Detection computes
summary information about
each terminal node (every rule
leads to a terminal node) and
displays the information
conveniently to the user.
Use this information to easily
and efficiently find the most
important rules in your CART
tree.
CART: Using Hotspot Detection
Here terminal node 5 has the
largest class count and a lift
value of around 2.5. This
means that the probability of a
“Defect” in this node is about
2.5 times the probability in
the overall sample.
What rule leads to terminal
node 5?
CART Hotspot Interpretation
If Signal 294 <= 368.82 and Signal 293 > .006
and Signal 60 > 1.51 and Signal 246 <= 1.42 and
Signal 247 > 2.98 then we predict “Defect”.
If the machine signals satisfy this rule then the
probability of a defect is 2.5 times larger than
the overall probability of a defect.
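Because the rule is a plain conjunction of threshold tests, it can be checked against any record of signal values. The following Python sketch encodes the rule quoted above. The function name, the dictionary layout (signal number mapped to value), and the two example records are assumptions made for illustration only.

```python
def matches_terminal_node_5(signals):
    """Return True if a record satisfies the hotspot rule for terminal node 5.

    `signals` maps a signal number to its measured value (hypothetical layout).
    """
    return (
        signals[294] <= 368.82
        and signals[293] > 0.006
        and signals[60] > 1.51
        and signals[246] <= 1.42
        and signals[247] > 2.98
    )

# Two made-up records: the first satisfies every condition, the second
# fails the Signal 60 condition.
record_in_hotspot = {294: 300.0, 293: 0.010, 60: 2.00, 246: 1.00, 247: 3.50}
record_outside = {294: 300.0, 293: 0.010, 60: 1.00, 246: 1.00, 247: 3.50}

print(matches_terminal_node_5(record_in_hotspot))  # True
print(matches_terminal_node_5(record_outside))     # False
```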
CART: Hotspot Detection
Focus Class: the class (i.e. “Defect” or “No Defect”) that you
want to generate the hotspot report for. I set the focus
class to be “Defect.”
$$\text{Lift} = \frac{\text{Probability of Focus Class in the terminal node}}{\text{Probability of Focus Class Overall}}$$
Node Class Count = number of records in the sample that
fall into the node
If Lift = 1, then the probability of a “Defect” is the same as
it is in the overall sample.
If Lift = 2, then the probability of a “Defect” is twice as
high in the terminal node as it is in the overall sample.
If Lift = .5, then the probability of a “Defect” is half as
high in the terminal node as it is in the overall sample.
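The lift ratio can be worked through with a small Python sketch. The overall counts (104 defects among 1,567 records) follow from the confusion matrix counts if the 57 is read as correctly predicted defects; the node counts (10 defects among 60 records) are hypothetical and chosen only to land near the lift of 2.5 discussed for terminal node 5.

```python
def lift(node_focus, node_total, overall_focus, overall_total):
    """Probability of the focus class in a node divided by its overall probability."""
    node_rate = node_focus / node_total
    overall_rate = overall_focus / overall_total
    return node_rate / overall_rate

# Overall sample: 57 + 47 = 104 defects out of 1,567 records.
overall_defects, overall_records = 104, 1567

# Hypothetical terminal node: 10 defects among 60 records.
node_defects, node_records = 10, 60

node_lift = lift(node_defects, node_records, overall_defects, overall_records)
print(round(node_lift, 2))  # 2.51
```

A node whose defect rate equals the overall rate gives a lift of exactly 1, matching the interpretation above.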
Deploying CART
If you want to use CART to generate predictions, you have two
primary options:
1. Generate Predictions inside of SPM
2. Translate CART into a programming language and deploy it in
your environment
Generating Predictions Inside of SPM
Let’s suppose that you have a
set of machine signal values
(i.e. you know the values for
Signal 1 – Signal 590) and you
want to predict if there will be
a product defect (i.e. you don’t
know the “STATUS” value)
Deploying CART via Code Translations
A CART model is fundamentally a
collection of rules where each rule is an
if-then statement (also else-if statements
etc.). We can then take these if-then
statements and translate them into
different programming languages. In
SPM we can translate into 4 languages: C,
PMML, Java, and SAS.
***Use the code to generate CART
predictions in other
applications/programs or to make
predictions in real-time.
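To make the if-then idea concrete, here is a hand-written Python sketch of the small Circle/Triangle tree from the Relative Cost example. It is not SPM output, and Python is not one of the languages SPM translates into; the function name and the choice to return the node's class share alongside the class are assumptions made for illustration.

```python
def predict_toy_tree(x1, x2):
    """Return (predicted class, share of that class in the terminal node)."""
    if x2 <= -0.49:
        # Terminal Node 1: 6 Circles, 0 Triangles
        return "Circle", 6 / 6
    elif x1 <= 0.23:
        # Terminal Node 2: 1 Circle, 6 Triangles
        return "Triangle", 6 / 7
    else:
        # Terminal Node 3: 9 Circles, 3 Triangles
        return "Circle", 9 / 12

print(predict_toy_tree(x1=0.0, x2=-1.0))  # ('Circle', 1.0)
print(predict_toy_tree(x1=1.0, x2=0.0))   # ('Circle', 0.75)
```

Each path from the root to a terminal node becomes one branch of the if/elif/else chain, which is why a tree translates so directly into deployable code.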
CART: Finding Rules
CART automatically gave us a set of interpretable rules that are
predictive of manufacturing defects. Now we will need to determine
what the signals actually measure and determine if we can control
the inputs that drive the settings.
CART: Generating Predictions
1. Use CART to predict if there will or will not be a product defect
inside of SPM.
2. Translate CART into C (or Java, PMML, or SAS) and deploy your
CART model in your environment in order to make predictions
in real-time.
TreeNet Gradient Boosting
Let’s apply the gradient boosting algorithm using TreeNet® software
Applying TreeNet
1. Understanding the model: Partial Dependency Plots
2. Choosing the number of trees (set the maximum number of trees such that the
error no longer meaningfully declines; SPM will choose the optimal number for
you)
3. Choosing the number of nodes with Automate NODES
4. Discover important interactions with interaction reporting
5. Making predictions and deploying the model.
Gradient Boosting Review
Idea: fit a CART tree to the error
from the previous model and use
this new prediction to update
the model
Gradient Boosting: Why it works
How does TreeNet model this curve? It makes small improvements (i.e. the
learning rate is a small number that “shrinks” the model updates). The
small improvements, taken together, produce an accurate model.
[Figure: the TreeNet fit to the noisy curve after 1, 10, 50, 100, 150, 200, 400, and 600 trees. Note: Noise ~ N(0,1)]
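The “many small improvements” idea can be sketched in a few lines of plain Python. This is not TreeNet's implementation: it assumes squared-error loss on a one-dimensional regression problem, uses two-leaf trees (stumps) instead of full CART trees, and fits a made-up sine curve with Gaussian noise. All function names and settings are chosen for illustration.

```python
import math
import random

def fit_stump(xs, residuals):
    """Fit a two-leaf tree: pick the split on x that minimizes squared error."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best = None
    for k in range(1, len(order)):
        left = [residuals[i] for i in order[:k]]
        right = [residuals[i] for i in order[k:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            threshold = (xs[order[k - 1]] + xs[order[k]]) / 2
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def boost(xs, ys, n_trees, learning_rate=0.1):
    """Repeatedly fit a stump to the current errors and add a shrunken update."""
    start = sum(ys) / len(ys)
    predictions = [start] * len(xs)
    trees = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        predictions = [p + learning_rate * tree(x) for p, x in zip(predictions, xs)]
    return lambda x: start + learning_rate * sum(tree(x) for tree in trees)

def mse(model, xs, ys):
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Made-up data: a sine curve plus Gaussian noise (an assumption for this sketch).
rng = random.Random(0)
xs = [i / 10 for i in range(100)]
ys = [math.sin(x) + rng.gauss(0, 0.3) for x in xs]

model_10 = boost(xs, ys, n_trees=10)
model_200 = boost(xs, ys, n_trees=200)
print(mse(model_10, xs, ys), mse(model_200, xs, ys))
```

With a learning rate of 0.1 each tree only nudges the fit, so the training error after 10 trees is still large and keeps shrinking as more trees are added. Training error alone would keep falling, which is why the number of trees is chosen on test or validation error.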
Most Important Signals
TreeNet, like CART, automatically selects the
most important variables (i.e. the signals).
Steps
1. Import the dataset
2. Select “TreeNet Gradient Boosting
Machine”
3. Set variables
4. Click “Start”
5. View variable importance measures
Of the 590 signals, TreeNet automatically
identifies 299 of them as useful (you can actually
run a series of variable “shaving” experiments to
see if you can reduce the number of variables
used even more)
How are Most Important Signals Related to
the Likelihood of Product Defects?
The plots on the right are generated
automatically from a TreeNet model, so you
only have to click two buttons to see the plots.
The plots are ordered in terms of the variable
importance (most important first).
Most Important Signal: Signal 60
This plot tells us that, after accounting
for the other 299 variables in the
model, the likelihood of a product
defect increases once Signal 60 has
values beyond 3.25. Once Signal 60
reaches about 13.3, the likelihood of a
defect remains constant.
TreeNet automatically discovered this
relationship. Now we have a few
questions to answer:
What does Signal 60 actually measure?
What machine settings have an effect on
Signal 60? To what extent, if any, can we
control these settings?
[Figure: partial dependency plot for Signal 60, with markers at Signal_60 = 3.25 and Signal_60 = 13.3]
Most Important Two-Way Interaction:
Signal 60 and Signal 334
The most important two-way interaction in the
model is between Signal 60 and Signal 334.
The red and orange areas in the plot on the right
mean that the likelihood of a defect is higher.
When Signal 60 is between about 15 and 150 and
Signal 334 is between 30 and 100, then the
likelihood of a defect is higher.
Follow-up questions for identifying the machine
settings that affect the signals:
What do the two signals measure?
What machine settings, if any, have an effect on
Signal 60 and Signal 334?
[Figure: two-way interaction plot for Signal 60 and Signal 334; the red and orange region is annotated “Defect is more likely”]
Interaction Statistics: Global Score
Use the Global Score to find the most important two-way interactions in the model. The
Global Score for a pair of variables tells you the percentage of the total variation in the
predicted response that is accounted for by the two-way interaction between two variables. A
value of 5.66 means that 5.66% of the variation in the predicted response is accounted for by
the interaction between Signal 60 and Signal 334.
$$\text{Global Score} = 100 \times \frac{\text{Variation Accounted for by the Two-Way Interaction}}{\text{Total Variation in the Predicted Response}}$$
Using the Interaction Statistics: Next Webinar
One way to leverage the interaction statistics is to allow only
interactions between the pairs of variables deemed to be “important”
by the TreeNet interaction statistics and to disallow interactions
among the unimportant variables. If we do this and the model error
does not change meaningfully, then we can be more confident that
the interaction is real (i.e. not noise!). We will talk more about this
in Webinar 5.
Solving the Problem:
Predicting Future Observations & Running Simulations
Engineers can predict the likelihood
of a defect based on the signal
values:
1. Take data (i.e. hypothetical signal
values or estimated signal values
given the machine settings) and
substitute the values into the
TreeNet model
2. TreeNet will generate the
probability of a defect based on the
signal values supplied.
***If we can predict signal values based on
the machine settings, then we could
predict the probability of a defect based on
chosen machine settings***
[Figure: Proposed Machine Settings → Hypothetical (or estimated) Signal Values → Predicted probability of “Defect” and the predicted class: “Defect” or “No Defect.”]
Generating Predictions in SPM
We can generate predictions
inside of SPM just like CART
(the same is true for Random
Forests, MARS, etc.)
Click the “Score” button
Deploying TreeNet via Code Translations
A TreeNet model is fundamentally a
collection of rules where each rule is an
if-then statement (also else-if statements
etc.). We can then take these if-then
statements and translate them into
different programming languages. In
SPM we can translate into 4 languages: C,
PMML, Java, and SAS.
***Use the code to generate TreeNet
predictions in other
applications/programs or to make
predictions in real-time.
Solving the Problem:
Understanding the relationship of signals and the likelihood of defects
Use TreeNet gradient boosting to
1. View signals that are useful in
predicting defects (or, conversely,
non-defects; signals that are not
important are either rarely used in
the model or not used at all)
2. Visually understand the
relationship between the likelihood
of a defect and a signal
3. Visually understand the nature of
the interactions that are important
in the model.
Optimizing Models with SPM Automates
One way to choose the optimal value for a model parameter in TreeNet is to run an experiment:
build multiple TreeNet models with identical settings, except that you change the value of one
parameter each time.
Model experimentation and optimization routines are pre-packaged for you in SPM, so you
never have to write even a single line of code. We want you to spend time on solving problems,
not troubleshooting while loops and function calls!
We will discuss this more in the second webinar, but we will provide one example.
Automate NODES
The number of terminal nodes in
each tree in the TreeNet model
controls the extent to which the
model can capture interactions.
Use Automate NODES to easily
find the optimal number of
terminal nodes in each tree. Here
the optimal number of terminal
nodes is 6 (this is actually the
default value).
Random Forests: Review
Idea: fit CART trees to
independent bootstrap samples
and combine the predictions
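The bootstrap-and-combine idea can be sketched in plain Python. This is a simplified illustration, not the Random Forests® implementation: each “tree” is a single-split stump, the data are a made-up one-signal example, and the random selection of predictor subsets at each split is left out because there is only one predictor. All names are chosen for this sketch.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement."""
    return [rng.choice(rows) for _ in rows]

def fit_stump(rows):
    """Fit a one-split classifier on (x, label) rows by minimizing training errors."""
    best = None
    for threshold in sorted({x for x, _ in rows}):
        left = [label for x, label in rows if x <= threshold]
        right = [label for x, label in rows if x > threshold]
        left_class = Counter(left).most_common(1)[0][0]
        right_class = Counter(right).most_common(1)[0][0] if right else left_class
        errors = (sum(label != left_class for label in left)
                  + sum(label != right_class for label in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_class, right_class)
    _, threshold, left_class, right_class = best
    return lambda x: left_class if x <= threshold else right_class

def fit_forest(rows, n_trees, seed=0):
    """Fit each stump to its own bootstrap sample of the data."""
    rng = random.Random(seed)
    return [fit_stump(bootstrap_sample(rows, rng)) for _ in range(n_trees)]

def predict(forest, x):
    """Combine the trees by majority vote."""
    votes = Counter(tree(x) for tree in forest)
    return votes.most_common(1)[0][0]

# Made-up data: one signal, with defects occurring at high signal values.
rows = [(i / 10, "Defect" if i >= 12 else "No Defect") for i in range(20)]
forest = fit_forest(rows, n_trees=25)
print(predict(forest, 0.2), predict(forest, 1.8))
```

For class probabilities rather than class labels, the vote shares in `predict` play the role of the averaged tree predictions.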
Random Forest Output
For smaller datasets (i.e. <10,000 records) we can compute a variety
of useful metrics including outlier statistics.
Optimizing Random Forests: Automate
RFNPREDS
Use Automate RFNPREDS to
conveniently find the optimal value
for the random variable subset
size.
Here the optimal size is
$2\sqrt{\text{number of predictors}} = 2\sqrt{590} \approx 49$
Other Machine Learning Applications
Implementation of lean manufacturing in Saudi manufacturing organizations: an empirical
study
Proceedings of the 2011 International Conference on Materials and Products Manufacturing
Technology: https://eprints.qut.edu.au/46594/1/2011011893_Karim_ePrints.pdf
Manufacturing: Value Creation through
Machine Learning Application
MANUFACTURING
INDUSTRIES
Organizations Gain Efficiencies Through Smarter Lean Adoption
Identifying challenges and the benefits of LEAN implementation in
small to medium sized companies using CART.
Mining the customer credit using classification and regression tree and multivariate
adaptive regression splines
Computational Statistics & Data Analysis:
http://www.sciencedirect.com/science/article/pii/S016794730400355X
Financial Services: Value Creation through
Machine Learning Application
FINANCIAL SERVICES
INDUSTRIES
Improving Credit Scoring in Highly-Competitive Environment
Accurate credit scoring is critical in the increasingly competitive
financial services industry. Using CART and TreeNet to predict future
instances of loan default reduces the risk assumed.
Panel of Serum Biomarkers for the Diagnosis of Lung Cancer
Journal of Clinical Oncology: http://ascopubs.org/doi/full/10.1200/JCO.2007.13.5392
Healthcare: Value Creation through Machine
Learning Application
HEALTHCARE
INDUSTRIES
Predicting Lung Cancer for High Risk Patients
Medical researchers were looking to improve lung cancer detection
through blood testing. CART analysis was leveraged to predict which
patients had cancer given the serum biomarkers.
Continue To Use Machine Learning On Your Own
We’ll provide you with a link to the dataset used today in a follow-up email
Download a trial version of SPM
https://info.salford-systems.com/spm-8-download
If you need help getting started, give us a shout:
support@salford-systems.com
Check out our other training materials online:
https://www.salford-systems.com/resources/training-videos
Practice, Practice, Practice
Feeling Stuck? We Can Help!
Schedule a demo and we’ll walk you
through the example shown today
Ready For More? Join Our Next Webinar
Tuesday October 31, 2017 @ 10 am (PDT):
Real-world demonstration for the advanced modeler
Register: http://info.salford-systems.com/datascience101webinarseries
In this webinar I am going to explain in detail how to leverage powerful
Machine Learning algorithms using SPM software.
Appendix
CART® Software Applications
Predicting Return to Work with Data Mining
Society of Actuaries: https://www.soa.org/files/research/projects/data-mining.pdf
Implementation of lean manufacturing in Saudi manufacturing organizations: an empirical study
Proceedings of the 2011 International Conference on Materials and Products Manufacturing Technology:
https://eprints.qut.edu.au/46594/1/2011011893_Karim_ePrints.pdf
Assessing the prediction of employee productivity: a comparison of OLS vs. CART
International Journal of Productivity and Quality Management: http://www.inderscienceonline.com/doi/abs/10.1504/IJPQM.2011.042511
Mining the customer credit using classification and regression tree and multivariate adaptive regression splines
Computational Statistics & Data Analysis: http://www.sciencedirect.com/science/article/pii/S016794730400355X
Panel of Serum Biomarkers for the Diagnosis of Lung Cancer
Journal of Clinical Oncology: http://ascopubs.org/doi/full/10.1200/JCO.2007.13.5392
Automated urban land-use classification with remote sensing
International Journal of Remote Sensing: http://www.tandfonline.com/doi/abs/10.1080/01431161.2012.714510
Random Forest® Software Applications
Mapping Oil and Gas Development Potential in the US Intermountain West and Estimating Impacts to Species
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0007400
Random Forests applied as a soil spatial predictive model in arid Utah
Digital Soil Mapping: http://link.springer.com/content/pdf/10.1007/978-90-481-8863-5.pdf#page=188
Factors Associated With Increased Reading Frequency in Children Exposed to Reach Out and Read
Academic Pediatrics: http://www.sciencedirect.com/science/article/pii/S1876285915002752
This paper used Random Forests® software to pick the factors
Using Random Forests to Provide Predicted Species Distribution Maps as a Metric for Ecological Inventory & Monitoring Programs
Applications of Computational Intelligence in Biology: https://link.springer.com/chapter/10.1007/978-3-540-78534-7_9
Random Forest for Gene Expression Based Cancer Classification: Overlooked Issues
Iberian Conference on Pattern Recognition and Image Analysis: https://link.springer.com/chapter/10.1007/978-3-540-72849-8_61
© Minitab Inc. 7010/24/2017

Mais conteúdo relacionado

Mais procurados

13 ch ken black solution
13 ch ken black solution13 ch ken black solution
13 ch ken black solution
Krunal Shah
 
15 ch ken black solution
15 ch ken black solution15 ch ken black solution
15 ch ken black solution
Krunal Shah
 
11 ch ken black solution
Datascience101presentation4

  • 2. Revisit Today’s Webinar Materials: For anyone who may have been running late or wanted to reference these materials, we are happy to provide the presentation and a link to the recording of the webinar. Expect to hear from us after the presentation! © Minitab Inc., 10/24/2017
  • 3. Today’s Discussion (10/24): a quick refresher on what machine learning can do for you; Salford Systems, pioneering predictive analytics and machine learning; and applied examples on the Manufacturing Defects dataset using CART, TreeNet, and Random Forest. Today’s Presenter: Charlie Harrison. Charlie is part of Salford’s Data Scientist Team and has been providing customer support and training for several years. His favorite thing about data science is proving theoretical results.
  • 4. What Can Machine Learning Do For You? Explore data; discover the most important features; find the most important relationships between the factors and the response; predict future observations; solve your problem.
  • 5. How Broad and Deep is the Application Potential? Machine learning methods can be applied in almost any context. The following is a brief selection of industry and functional examples:
      - Industries: Financial Services (loan defaults, fraud prevention); Manufacturing (manufacturing defects, preventative maintenance); Health Care (disease prevention, genetics); Other Industries (insurance claims, environmental impacts).
      - Functional Areas: Sales (customer churn, cross-sell/upsell); Marketing (customer segmentation, marketing lift).
  • 6. Classification Models Using CART, Gradient Boosting & Random Forests. Model types: classification (predict a qualitative value); regression (predict a quantitative value); time series (predict future values based on past values); survival analysis (predict time until occurrence); unsupervised learning (clustering).
  • 7. What Do You Need to Get Started? Sufficient data; pick the right problem; solve with the right tool. Have you downloaded SPM 8.2? After this webinar, we’ll give you access to the dataset used so you can try it out for yourself. https://info.salford-systems.com/spm-8-download
  • 8. Salford Systems
  • 9. Salford’s Legacy in Pioneering Predictive Analytics & Machine Learning. Salford’s solutions are innovative, reliable, and robust because they were created and are implemented by inventors and pioneers of Predictive Analytics & Machine Learning (PAML): Dr. Jerome Friedman (Professor of Statistics, Stanford) and Dr. Leo Breiman (Professor of Statistics, UC Berkeley). Each of the algorithms covered today was created or co-created by Dr. Breiman or Dr. Friedman.
  • 10. Salford Stands Out Against Competitors. Salford solutions are distinguished in particular by their:
      - Accuracy of Prediction: Salford’s models stand the test of time and are used by some of the biggest corporations in the world.
      - Defensibility of Models: Salford’s models are defensible internally to executive stakeholders and externally to regulators.
      - Ease of Use: Salford’s models don’t require coding.
  • 11. Suite of Solutions: Data Science Toolkit. Time- and market-tested predictive modeling tools, including everything from market-leading decision tree and classification engines to advanced interaction detection and automation to state-of-the-art machine learning capabilities. The SPM Software Suite: CART (decision trees); MARS (nonlinear regression); Random Forests (data ensemble bagging); TreeNet (gradient boosting); RuleLearner (rule ensemble); ISLE (model compression); GPS (regularized regression).
  • 12. Why Do Classification Models Matter? Classification methods are a simple, effective, and accurate approach to solving an organization’s most difficult problems and uncovering new opportunities by narrowing down which factors have the most impact on your outcome. Some of the most common applications include fraud prevention, risk reduction in credit scoring and loan default, optimizing marketing campaigns, and improving operations.
      - Industries: Financial Services (Does level of education impact credit risk?); Manufacturing (What machine signals are predictive of defects?); Health Care (Does body weight influence the risk of heart disease?).
      - Functional Areas: Sales (Does customer satisfaction influence loyalty?); Marketing (What promotions are most effective?).
  • 13. Machine Learning Terminology
      - Response Variable = Dependent Variable = Target Variable: this is what we are trying to predict. Examples: default vs. no default, air pressure, number of claims.
      - Predictor Variables = Predictors = Factors: this is what we use to predict the response. Example: I will use two predictors, level of education and work experience, to predict income, which is the target variable.
      - Algorithm = Method Used = Technique: this is the method that we will use both to predict the target variable and to discover the relationships, if any, between the predictors and the target. Examples: CART decision trees, gradient boosted trees, Random Forests, LASSO, Elastic Net, MARS, Support Vector Machines (SVMs), and Neural Networks.
      Putting It All Together:
      - Regression. Target Variable: Defect. Predictor Variables: Signal 1, Signal 2, …, Signal 590. Algorithm: Logistic Regression. $\text{Defect} = \beta_0 + \beta_1\,\text{Signal}_1 + \cdots + \beta_{590}\,\text{Signal}_{590}$
      - CART. Target Variable: Defect. Predictor Variables: Signal 1, Signal 2, …, Signal 590. Algorithm: CART decision tree. Defect = [tree diagram built from Signal 1, Signal 2, …, Signal 590]
  • 14. Hands-on Practice
  • 15. Let’s Get Started: open SPM for the live demo on the Manufacturing Defects dataset. A manufacturing process involves myriad machines, and the information concerning the operation of the machines is recorded. There are 590 metrics recorded from the machines from the start of the process to the end, and we’ll refer to these metrics as “signals.”
      1. What signals, if any, are predictive of manufacturing defects?
      2. If signals are predictive of defects, then how are these signals related to the likelihood of manufacturing defects?
  • 16. CART and Random Forests. [Table: SPM engines rated on predictive performance, automatic variable selection, automatic interaction detection, automatic missing value/outlier handling, automatic modeling of local effects, invariance to monotone transformations of predictors, and interpretability.] A Random Forest prediction is really just an average of CART tree predictions. When you build a Random Forest model, just keep this picture in the back of your mind.
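The "average of tree predictions" idea can be sketched in a few lines. This is a toy illustration, not how SPM builds a Random Forest: each "tree" here is a one-split stump on a single hypothetical signal, the data are made up, and real Random Forests also sample predictors at each split.

```python
import random

def fit_stump(rows):
    """Fit a one-split tree on (x, y) pairs with y in {0, 1}.

    Returns (threshold, left_mean, right_mean); threshold is None when
    no split improves on predicting the overall mean.
    """
    xs = sorted({x for x, _ in rows})
    overall = sum(y for _, y in rows) / len(rows)
    best = (None, overall, overall)
    best_err = sum((y - overall) ** 2 for _, y in rows)
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in rows if x <= t]
        right = [y for x, y in rows if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if err < best_err:
            best, best_err = (t, lm, rm), err
    return best

def predict_stump(stump, x):
    t, lm, rm = stump
    if t is None:
        return lm
    return lm if x <= t else rm

def fit_forest(rows, n_trees=25, seed=0):
    """Fit each stump on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(rows) for _ in rows]) for _ in range(n_trees)]

def predict_forest(forest, x):
    """The forest prediction is the plain average of the tree predictions."""
    preds = [predict_stump(s, x) for s in forest]
    return sum(preds) / len(preds)

# Hypothetical data: (signal value, 1 = defect / 0 = no defect)
rows = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0),
        (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1)]
forest = fit_forest(rows)
low, high = predict_forest(forest, 0.05), predict_forest(forest, 0.95)
print(low, high)  # estimated defect probability at a low and a high signal value
```

Because each stump sees a different bootstrap sample, the individual predictions differ slightly, and averaging them smooths out that variation.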
  • 17. Solving Problems with Machine Learning: Machine Settings and Manufacturing Defects. A manufacturing process involves myriad machines, and the information concerning the operation of the machines is recorded. There are 590 metrics recorded from the machines from the start of the process to the end, and we’ll refer to these metrics as “signals.” We will try to answer two primary questions:
      1. What signals, if any, are predictive of manufacturing defects?
      2. If signals are predictive of defects, then how are these signals related to the likelihood of manufacturing defects?
      We will use an algorithm called gradient boosting to do this, with TreeNet® software. TreeNet is unique in that its code was originally written by Jerome Friedman, the creator of gradient boosting.
  • 18. Dataset Citations. Manufacturing Defect Dataset: Michael McCann and Adrian Johnston donated the dataset to the UCI Machine Learning Repository in 2008. Link: https://archive.ics.uci.edu/ml/datasets/SECOM
  • 19. CART. Let’s apply CART to the manufacturing defect dataset. Applying CART:
      1. Build the model in SPM
      2. Understand CART Relative Cost
      3. Find the most interesting rules that are predictive of manufacturing defects using Hotspot Detection
      4. Use the model: generate manufacturing defect predictions and deploy CART outside of SPM
      [Figure: CART tree with splits on SIGNAL_66, SIGNAL_247, SIGNAL_246, SIGNAL_60, SIGNAL_293, SIGNAL_311, SIGNAL_111, SIGNAL_549, SIGNAL_112, SIGNAL_158, SIGNAL_21, SIGNAL_359, and SIGNAL_294.]
  • 20. CART Review. CART is a decision tree algorithm that divides the data so that the dependent variable can be predicted more accurately. CART automatically:
      1. Selects variables
      2. Models nonlinear relationships
      3. Models local effects
      4. Models interactions
      5. Handles missing values
  • 21. CART: Relative Cost
      $$\text{Relative Cost} = \frac{\text{Overall misclassification cost using a CART tree}}{\text{Overall misclassification cost using the No Data Optimal Rule}}$$
      The No Data Optimal Rule classifies every observation as one class. More specifically, the class chosen for the No Data Optimal Rule is the class that has the lowest cost compared to the other(s).
      - Good: the closer the relative cost is to zero, the more CART improves on the No Data Optimal Rule.
      - Bad: if the relative cost is equal to 1, then the CART error is the same as that of the No Data Optimal Rule, which means that CART is no better than just predicting every observation as the same class.
      - The relative cost can be greater than 1, which is especially bad; more generally, values around 1 should be considered “bad.”
      [Figure: example CART tree on 25 cases (16 Circle, 9 Triangle) with Relative Cost = .44. Node 1 (Class = Circle) splits on X2 <= -0.49 into Terminal Node 1 (X2 <= -0.49; Class = Circle; 6 Circle, 0 Triangle; N = 6) and Node 2 (X2 > -0.49; Class = Circle; 10 Circle, 9 Triangle; N = 19). Node 2 splits on X1 <= 0.23 into Terminal Node 2 (X1 <= 0.23; Class = Triangle; 1 Circle, 6 Triangle; N = 7) and Terminal Node 3 (X1 > 0.23; Class = Circle; 9 Circle, 3 Triangle; N = 12).]
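The .44 in the example tree can be reproduced from the node counts on the slide. This sketch assumes every misclassification costs the same and that classes are weighted by their sample frequencies; SPM's reported relative cost may differ under other priors or cost settings.

```python
# Terminal-node class counts from the example tree: (Circle, Triangle)
terminal_nodes = {1: (6, 0), 2: (1, 6), 3: (9, 3)}

def misclassified(counts):
    """Cases that fall outside the majority class of a node."""
    return sum(counts) - max(counts)

# Errors made by the tree: each terminal node predicts its majority class.
tree_errors = sum(misclassified(c) for c in terminal_nodes.values())

# The No Data Optimal Rule predicts the majority class of the whole sample.
root_counts = tuple(sum(col) for col in zip(*terminal_nodes.values()))
no_data_errors = misclassified(root_counts)

relative_cost = tree_errors / no_data_errors
print(tree_errors, no_data_errors, round(relative_cost, 2))  # 4 9 0.44
```

The tree misclassifies 4 of the 25 cases, while always predicting Circle misclassifies 9, so the tree makes fewer than half as many errors as the baseline.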
  • 22. CART Confusion Matrix. Use the Confusion Matrix to assess CART and the types of correct or incorrect predictions that it makes.
      - CART correctly predicted “No Defect” 935 times.
      - CART correctly predicted “Defect” 57 times.
      - CART incorrectly predicted “Defect” when there was actually no defect 528 times (we call this a false positive).
      - CART incorrectly predicted “No Defect” when there actually was a defect 47 times (we call this a false negative).
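The four counts on this slide can be turned into the usual summary rates. This sketch reads 57 as the true positives (defects correctly predicted); the variable names are illustrative and not part of SPM's output.

```python
# Counts from the confusion matrix slide ("Defect" is the positive class)
tn = 935  # correctly predicted "No Defect"
tp = 57   # correctly predicted "Defect"
fp = 528  # predicted "Defect", actually no defect
fn = 47   # predicted "No Defect", actually a defect

total = tn + tp + fp + fn
actual_defects = tp + fn
actual_no_defects = tn + fp

accuracy = (tp + tn) / total          # share of all predictions that are correct
sensitivity = tp / actual_defects     # share of actual defects that were caught
specificity = tn / actual_no_defects  # share of non-defects correctly cleared
precision = tp / (tp + fp)            # share of "Defect" predictions that were right

print(total, actual_defects, actual_no_defects)
print(f"accuracy={accuracy:.3f} sensitivity={sensitivity:.3f} "
      f"specificity={specificity:.3f} precision={precision:.3f}")
```

With defects this rare (104 of 1,567 records), accuracy alone can be misleading, which is why the slide separates false positives from false negatives.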
  • 23. CART: Variable Selection & Importance. There were 590 variables available to be selected by CART. 13 variables appear in the tree, and 79 variables are used in the model (i.e., 13 variables used in the tree and 66 used to handle missing values via surrogate splits).
  • 24. CART: Hotspot Detection. Recall: a CART tree can be thought of as a collection of rules. Each rule defines a path to a terminal node. For large CART trees, is there an easy way to find the “most interesting” rules? Yes: use Hotspot Detection. [Figure: the CART tree from slide 19, with splits on SIGNAL_66, SIGNAL_247, SIGNAL_246, SIGNAL_60, SIGNAL_293, SIGNAL_311, SIGNAL_111, SIGNAL_549, SIGNAL_112, SIGNAL_158, SIGNAL_21, SIGNAL_359, and SIGNAL_294.]
  • 25. CART: Hotspot Detection. Hotspot Detection computes summary information about each terminal node (every rule leads to a terminal node) and displays the information conveniently to the user. Use this information to easily and efficiently find the most important rules in your CART tree.
  • 26. CART: Using Hotspot Detection. Here terminal node 5 has the largest class count and a lift value of around 2.5. This means that a “Defect” is about 2.5 times as likely in this node as in the overall sample. What rule leads to terminal node 5?
  • 27. CART Hotspot Interpretation. If Signal 294 <= 368.82 and Signal 293 > 0.006 and Signal 60 > 1.51 and Signal 246 <= 1.42 and Signal 247 > 2.98, then we predict “Defect.” If the machine signals satisfy this rule, then the probability of a defect is about 2.5 times larger than the overall probability of a defect.
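Written as code, the rule above is a single chain of comparisons. This is a hand-written sketch, not SPM's generated translation: the dictionary keys and the two example records are hypothetical, and missing signal values (which CART handles through surrogate splits) are not handled here.

```python
def matches_hotspot_rule(signals):
    """Return True when a record satisfies the rule leading to terminal node 5."""
    return (
        signals["SIGNAL_294"] <= 368.82
        and signals["SIGNAL_293"] > 0.006
        and signals["SIGNAL_60"] > 1.51
        and signals["SIGNAL_246"] <= 1.42
        and signals["SIGNAL_247"] > 2.98
    )

# Hypothetical records, made up for illustration only
flagged = {"SIGNAL_294": 300.0, "SIGNAL_293": 0.01, "SIGNAL_60": 2.0,
           "SIGNAL_246": 1.0, "SIGNAL_247": 3.5}
not_flagged = dict(flagged, SIGNAL_247=2.0)  # fails the last condition

print(matches_hotspot_rule(flagged), matches_hotspot_rule(not_flagged))  # True False
```

A record that satisfies every condition lands in the hotspot node; failing any single condition routes it elsewhere in the tree.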
  • 28. CART: Hotspot Detection
      - Focus Class: the class (i.e., “Defect” or “No Defect”) that you want to generate the hotspot report for. I set the focus class to be “Defect.”
      - $$\text{Lift} = \frac{\text{Probability of Focus Class in the terminal node}}{\text{Probability of Focus Class overall}}$$
      - Node Class Count = number of records in the sample that fall into the node.
      - If Lift = 1, then the probability of a “Defect” in the terminal node is the same as it is in the overall sample. If Lift = 2, then the probability of a “Defect” is twice as high in the terminal node as it is in the overall sample. If Lift = 0.5, then the probability of a “Defect” is half as high in the terminal node as it is in the overall sample.
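The lift formula is a ratio of two proportions. In this sketch the overall counts (104 defects in 1,567 records) are the totals implied by the confusion matrix slide, while the terminal-node counts are hypothetical, chosen only to show a lift near 2.5.

```python
def lift(node_focus, node_total, overall_focus, overall_total):
    """Focus-class rate inside a terminal node divided by the overall rate."""
    node_rate = node_focus / node_total
    overall_rate = overall_focus / overall_total
    return node_rate / overall_rate

overall_defects, overall_records = 104, 1567  # implied by the confusion matrix
node_defects, node_records = 20, 120          # hypothetical terminal node

node_lift = lift(node_defects, node_records, overall_defects, overall_records)
print(round(node_lift, 2))
```

A node whose defect rate equals the overall rate has a lift of exactly 1, which is a quick sanity check on any implementation.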
  • 29. What Can Machine Learning Do For You? Explore data; discover the most important features; find the most important relationships between the factors and the response; predict future observations; solve your problem.
  • 30. Deploying CART. If you want to use CART to generate predictions, you have two primary options:
      1. Generate predictions inside of SPM
      2. Translate CART into a programming language and deploy it in your environment
  • 31. Generating Predictions Inside of SPM. Let’s suppose that you have a set of machine signal values (i.e., you know the values for Signal 1 through Signal 590) and you want to predict whether there will be a product defect (i.e., you don’t know the “STATUS” value).
  • 32. Deploying CART via Code Translations. A CART model is fundamentally a collection of rules, where each rule is an if-then statement (with else-if branches as needed). We can take these if-then statements and translate them into different programming languages. In SPM we can translate into 4 languages: C, PMML, Java, and SAS. Use the code to generate CART predictions in other applications or programs, or to make predictions in real time.
  • 33. What Can Machine Learning Do For You? Explore data; discover the most important features; find the most important relationships between the factors and the response; predict future observations; solve your problem.
  • 34. CART: Finding Rules. CART automatically gave us a set of interpretable rules that are predictive of manufacturing defects. Now we will need to determine what the signals actually measure and determine if we can control the inputs that drive the settings.
  • 35. CART: Generating Predictions. 1. Use CART inside of SPM to predict whether there will be a product defect. 2. Translate CART into C (or Java, PMML, or SAS) and deploy your CART model in your environment in order to make predictions in real time.
  • 36. TreeNet Gradient Boosting. Let’s apply the gradient boosting algorithm using TreeNet® software. Applying TreeNet: 1. Understanding the model: Partial Dependency Plots. 2. Choosing the number of trees (set the maximum number of trees such that the error no longer meaningfully declines; SPM will choose the optimal number for you). 3. Choosing the number of nodes with Automate NODES. 4. Discovering important interactions with interaction reporting. 5. Making predictions and deploying the model.
  • 37. Gradient Boosting Review. Idea: fit a CART tree to the error from the previous model and use this new prediction to update the model.
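The boosting idea on this slide can be sketched in plain Python. This is an illustration under stated assumptions, not TreeNet's implementation: it uses squared-error loss on a regression target, one-split trees ("stumps") in place of full CART trees, and a synthetic sine curve with noise. All function names and data are hypothetical.

```python
import math
import random


def fit_stump(xs, residuals):
    """Fit a one-split regression tree (two terminal nodes) by least squares."""
    pairs = sorted(zip(xs, residuals))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        left = [r for _, r in pairs[:i]]
        right = [r for _, r in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_trees=100, learning_rate=0.1):
    """Each new tree is fit to the current errors; updates are shrunk."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    trees = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]   # error of current model
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(t(x) for t in trees)


def mse(model, xs, ys):
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


random.seed(0)
xs = [i / 10 for i in range(100)]
ys = [math.sin(x) + random.gauss(0, 0.1) for x in xs]   # synthetic curve
for n in (1, 10, 100):
    print(n, "trees, training MSE:", round(mse(boost(xs, ys, n_trees=n), xs, ys), 4))
```

The training error falls as trees are added because each small, shrunken update removes part of the remaining error. Training error alone says nothing about overfitting, which is why the number of trees is normally chosen on held-out data.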
  • 38. Gradient Boosting: Why it works. How does TreeNet model this curve? It makes small improvements (i.e. the learning rate is a small number that “shrinks” the model updates). The small improvements, taken together, produce an accurate model. [Figure panels: Tree 1, Tree 10, Tree 50, Tree 100, Tree 150, Tree 200, Tree 400, Tree 600. Note: Noise ~ N(0,1)]
  • 39. What Can Machine Learning Do For You? Explore Data; Discover the Most Important Features; Find the Most Important Relationships in Factors & Response; Predict Future Observations; Solve Your Problem
  • 40. Most Important Signals. TreeNet, like CART, automatically selects the most important variables (i.e. the signals). Steps: 1. Import the dataset. 2. Select “TreeNet Gradient Boosting Machine.” 3. Set variables. 4. Click “Start.” 5. View variable importance measures. Of the 590 signals, TreeNet automatically identifies 299 of them as useful (you can run a series of variable “shaving” experiments to see if you can reduce the number of variables used even further).
  • 41. What Can Machine Learning Do For You? Explore Data; Discover the Most Important Features; Find the Most Important Relationships in Factors & Response; Predict Future Observations; Solve Your Problem
  • 42. How are the Most Important Signals Related to the Likelihood of Product Defects? The plots on the right are generated automatically from a TreeNet model, so you only have to click two buttons to see them. The plots are ordered by variable importance (most important first).
  • 43. Most Important Signal: Signal 60. This plot tells us that, after accounting for the other 299 variables in the model, the likelihood of a product defect increases once Signal 60 has values beyond 3.25. Once Signal 60 reaches about 13.3, the likelihood of a defect remains constant. TreeNet automatically discovered this relationship. Now we have a few questions to answer: What does Signal 60 actually measure? What machine settings have an effect on Signal 60? To what extent, if any, can we control these settings? [Plot annotations: Signal_60 = 3.25, Signal_60 = 13.3]
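A plot of this kind is commonly computed as a partial dependence curve: fix the signal of interest at a grid value for every record, leave the other variables as observed, and average the model's predictions. The sketch below assumes that approach; whether TreeNet computes its plots exactly this way is not stated in the slides. `toy_model`, its coefficients, and the records are invented stand-ins, shaped only to mimic the rise between 3.25 and 13.3 described above.

```python
def partial_dependence(model, records, feature, grid):
    """Average prediction when `feature` is forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for record in records:
            modified = dict(record)       # keep the other variables as observed
            modified[feature] = value
            total += model(modified)
        curve.append((value, total / len(records)))
    return curve


def toy_model(record):
    """Invented stand-in for a fitted model: returns a defect probability."""
    s60 = min(max(record["signal_60"], 3.25), 13.3)
    bump = 0.05 if record["signal_334"] > 30 else 0.0
    return 0.05 + 0.02 * (s60 - 3.25) + bump


records = [
    {"signal_60": 2.0, "signal_334": 10.0},
    {"signal_60": 8.0, "signal_334": 45.0},
    {"signal_60": 20.0, "signal_334": 80.0},
]
curve = partial_dependence(toy_model, records, "signal_60",
                           [0.0, 3.25, 8.0, 13.3, 20.0])
for value, avg in curve:
    print(f"signal_60={value:>5}: average predicted probability {avg:.3f}")
```

With this stand-in model the curve is flat up to 3.25, rises until 13.3, and is flat afterwards, which is the shape the slide describes.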
  • 44. Most Important Two-Way Interaction: Signal 60 and Signal 334. The most important two-way interaction in the model is between Signal 60 and Signal 334. The red and orange areas in the plot on the right indicate a higher likelihood of a defect: when Signal 60 is between about 15 and 150 and Signal 334 is between 30 and 100, a defect is more likely. Follow-up questions for identifying the machine settings that affect the signals: What do the two signals measure? What machine settings, if any, have an effect on Signal 60 and Signal 334? [Plot annotation: Defect is more likely]
  • 45. Interaction Statistics: Global Score. Use the Global Score to find the most important two-way interactions in the model. The Global Score for a pair of variables tells you the percentage of the total variation in the predicted response that is accounted for by the two-way interaction between the two variables. A value of 5.66 means that 5.66% of the variation in the predicted response is accounted for by the interaction between Signal 60 and Signal 334. Global Score = … / Total Variation in the Predicted Response
  • 46. Using the Interaction Statistics: Next Webinar. One way to leverage the interaction statistics is to allow only interactions between the pairs of variables deemed “important” by the TreeNet interaction statistics and to disallow interactions among the unimportant variables. If we do this and the model error does not change meaningfully, then we can be more confident that the interaction is real (i.e. not noise!). We will talk more about this in Webinar 5.
  • 47. What Can Machine Learning Do For You? Explore Data; Discover the Most Important Features; Find the Most Important Relationships in Factors & Response; Predict Future Observations; Solve Your Problem
  • 48. Solving the Problem: Predicting Future Observations & Running Simulations. Engineers can predict the likelihood of a defect based on the signal values: 1. Take data (i.e. hypothetical signal values or estimated signal values given the machine settings) and substitute the values into the TreeNet model. 2. TreeNet will generate the probability of a defect based on the signal values supplied. ***If we can predict signal values based on the machine settings, then we could predict the probability of a defect based on chosen machine settings.*** [Diagram labels: Proposed Machine Settings; Hypothetical (or estimated) Signal Values; Predicted probability of “Defect” and the predicted class: “Defect” or “No Defect.”]
  • 49. Generating Predictions in SPM. We can generate predictions inside of SPM just as we did with CART (the same is true for Random Forests, MARS, etc.): click the “Score” button.
  • 50. Deploying TreeNet via Code Translations. A TreeNet model is fundamentally a collection of rules, where each rule is an if-then statement (possibly with else-if branches). We can take these if-then statements and translate them into different programming languages. In SPM we can translate into 4 languages: C, PMML, Java, and SAS. Use the code to generate TreeNet predictions in other applications/programs or to make predictions in real time.
  • 51. What Can Machine Learning Do For You? Explore Data; Discover the Most Important Features; Find the Most Important Relationships in Factors & Response; Predict Future Observations; Solve Your Problem
  • 52. Solving the Problem: Predicting Future Observations & Running Simulations. Engineers can predict the likelihood of a defect based on the signal values: 1. Take data (i.e. hypothetical signal values or estimated signal values given the machine settings) and substitute the values into the TreeNet model. 2. TreeNet will generate the probability of a defect based on the signal values supplied. ***If we can predict signal values based on the machine settings, then we could predict the probability of a defect based on chosen machine settings.*** [Diagram labels: Proposed Machine Settings; Hypothetical (or estimated) Signal Values; Predicted probability of “Defect” and the predicted class: “Defect” or “No Defect.”]
  • 53. Solving the Problem: Understanding the relationship of signals and the likelihood of defects. Use TreeNet gradient boosting to: 1. View signals that are useful in predicting defects (or, conversely, non-defects; signals that are not important are either rarely used in the model or not used at all). 2. Visually understand the relationship between the likelihood of a defect and a signal. 3. Visually understand the nature of the interactions that are important in the model.
  • 54. Optimizing Models with SPM Automates. One way to choose the optimal value for a model parameter in TreeNet is to run an experiment: build multiple TreeNet models with identical settings, except that the value of one parameter changes each time. Model experimentation and optimization routines are pre-packaged for you in SPM, so you never have to write even a single line of code. We want you to spend time on solving problems, not troubleshooting while loops and function calls! We will discuss this more in the second webinar, but we will provide one example.
  • 55. Automate NODES. The number of terminal nodes in each tree in the TreeNet model controls the extent to which the model can capture interactions. Use Automate NODES to easily find the optimal number of terminal nodes in each tree. Here the optimal number of terminal nodes is 6 (this is actually the default value).
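For readers curious what such a one-parameter experiment looks like conceptually, here is a minimal sketch: fit the same model several times, change one setting each time, and compare error on held-out data. It is not how SPM implements its Automates. The stand-in model (a piecewise-constant fit whose `n_bins` setting plays the role of model complexity), the synthetic data, and every name here are assumptions for illustration.

```python
import math
import random


def fit_binned_means(xs, ys, n_bins, lo=0.0, hi=10.0):
    """Stand-in model: piecewise-constant fit with n_bins equal-width bins."""
    width = (hi - lo) / n_bins

    def bin_of(x):
        return min(max(int((x - lo) / width), 0), n_bins - 1)

    sums, counts = [0.0] * n_bins, [0] * n_bins
    for x, y in zip(xs, ys):
        sums[bin_of(x)] += y
        counts[bin_of(x)] += 1
    overall = sum(ys) / len(ys)
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda x: means[bin_of(x)]


def holdout_mse(model, xs, ys):
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


def run_experiment(train, holdout, candidate_values):
    """Identical settings each time, except for the one parameter being varied."""
    results = {}
    for n_bins in candidate_values:
        model = fit_binned_means(train[0], train[1], n_bins)
        results[n_bins] = holdout_mse(model, holdout[0], holdout[1])
    best = min(results, key=results.get)
    return best, results


rng = random.Random(0)
xs = [rng.uniform(0.0, 10.0) for _ in range(400)]
ys = [math.sin(x) + rng.gauss(0.0, 0.3) for x in xs]      # synthetic data
train, holdout = (xs[:300], ys[:300]), (xs[300:], ys[300:])
best, results = run_experiment(train, holdout, [1, 2, 4, 8, 16, 32, 64])
for n_bins, err in results.items():
    print(n_bins, round(err, 4))
print("best setting:", best)
```

The setting with the lowest held-out error is reported as the best one; too little complexity underfits and too much starts to fit noise.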
  • 56. What Can Machine Learning Do For You? Explore Data; Discover the Most Important Features; Find the Most Important Relationships in Factors & Response; Predict Future Observations; Solve Your Problem
  • 57. CART: Finding Rules. CART automatically gave us a set of interpretable rules that are predictive of manufacturing defects. Now we will need to determine what the signals actually measure and determine if we can control the inputs that drive the settings.
  • 58. CART: Generating Predictions. 1. Use CART inside of SPM to predict whether there will be a product defect. 2. Translate CART into C (or Java, PMML, or SAS) and deploy your CART model in your environment in order to make predictions in real time.
  • 59. Random Forests: Review. Idea: fit CART trees to independent bootstrap samples and combine the predictions.
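The bootstrap-and-combine idea can be sketched in plain Python. This is an illustration only, not the Random Forests® implementation in SPM: it uses one-split classifiers in place of full CART trees, majority voting to combine predictions, and a tiny synthetic dataset. The parameter name `n_try` (the size of the random feature subset each tree may use) and all data are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels, features):
    """Best one-split classifier over the allowed features (stand-in for CART)."""
    best = None
    for f in features:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, l_lab, r_lab)
    if best is None:  # no valid split: predict the majority class
        majority = Counter(labels).most_common(1)[0][0]
        return lambda row: majority
    _, f, threshold, l_lab, r_lab = best
    return lambda row: l_lab if row[f] <= threshold else r_lab


def fit_forest(rows, labels, n_trees=25, n_try=1, seed=0):
    rng = random.Random(seed)
    n_features = len(rows[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]       # bootstrap sample
        features = rng.sample(range(n_features), n_try)      # random feature subset
        trees.append(fit_stump([rows[i] for i in idx],
                               [labels[i] for i in idx], features))

    def predict(row):
        votes = Counter(tree(row) for tree in trees)          # combine by vote
        return votes.most_common(1)[0][0]

    return predict


# Synthetic data: feature 0 drives the label, feature 1 is pure noise.
data_rng = random.Random(42)
rows = [(i / 4, data_rng.random()) for i in range(40)]
labels = ["Defect" if x0 > 5 else "No Defect" for x0, _ in rows]
forest = fit_forest(rows, labels, n_trees=25, n_try=1)
print(forest((9.0, 0.5)), "/", forest((1.0, 0.5)))
```

Setting `n_try` to the total number of features turns this into plain bagging; smaller values decorrelate the trees, which is the setting that Automate RFNPREDS is described as tuning.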
  • 60. Random Forest Output. For smaller datasets (i.e. <10,000 records) we can compute a variety of useful metrics, including outlier statistics.
  • 61. Optimizing Random Forests: Automate RFNPREDS. Use Automate RFNPREDS to conveniently find the optimal value for the random variable subset size. Here the optimal size is $\sqrt{\text{number of predictors}} \times 2 \approx 49$.
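The subset size of 49 can be reproduced with a quick calculation, under two assumptions that are not confirmed by the slide text: that the formula involves a square root (twice the square root of the number of predictors, a common rule of thumb for random forests), and that all 590 signals are used as predictors.

```python
import math

n_predictors = 590                            # assumption: all 590 signals are predictors
subset_size = 2 * math.sqrt(n_predictors)     # assumption: size = 2 * sqrt(predictors)
print(round(subset_size, 2), round(subset_size))  # 48.58 49
```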
  • 62. Other Machine Learning Applications
  • 63. Manufacturing: Value Creation through Machine Learning Application. MANUFACTURING INDUSTRIES. Organizations Gain Efficiencies Through Smarter Lean Adoption: identifying the challenges and benefits of LEAN implementation in small to medium-sized companies using CART. Source: Implementation of lean manufacturing in Saudi manufacturing organizations: an empirical study, Proceedings of the 2011 International Conference on Materials and Products Manufacturing Technology: https://eprints.qut.edu.au/46594/1/2011011893_Karim_ePrints.pdf
  • 64. Financial Services: Value Creation through Machine Learning Application. FINANCIAL SERVICES INDUSTRIES. Improving Credit Scoring in a Highly Competitive Environment: accurate credit scoring, here using CART and TreeNet, is critical in the increasingly competitive financial services industry. Predicting future instances of loan default means that less risk is assumed. Source: Mining the customer credit using classification and regression tree and multivariate adaptive regression splines, Computational Statistics & Data Analysis: http://www.sciencedirect.com/science/article/pii/S016794730400355X
  • 65. Healthcare: Value Creation through Machine Learning Application. HEALTHCARE INDUSTRIES. Predicting Lung Cancer for High Risk Patients: medical researchers were looking to improve lung cancer detection through blood testing. CART analysis was used to predict which patients had cancer given the serum biomarkers. Source: Panel of Serum Biomarkers for the Diagnosis of Lung Cancer, Journal of Clinical Oncology: http://ascopubs.org/doi/full/10.1200/JCO.2007.13.5392
  • 66. Continue To Use Machine Learning On Your Own. Practice, Practice, Practice: We’ll provide you a link to the dataset used today in a follow-up email. Download a trial version of SPM: https://info.salford-systems.com/spm-8-download Check out our other training materials online: https://www.salford-systems.com/resources/training-videos Feeling Stuck? We Can Help! If you need help getting started, give us a shout: support@salford-systems.com Schedule a demo and we’ll walk you through the example shown today.
  • 67. Ready For More? Join Our Next Webinar. Tuesday October 31, 2017 @ 10 am (PDT): Real-world demonstration for the advanced modeler. Register: http://info.salford-systems.com/datascience101webinarseries In this webinar I am going to explain in detail how to leverage powerful Machine Learning algorithms using SPM software.
  • 68. Appendix
  • 69. CART® Software Applications. Predicting Return to Work with Data Mining Society of Actuaries: https://www.soa.org/files/research/projects/data-mining.pdf Implementation of lean manufacturing in Saudi manufacturing organizations: an empirical study Proceedings of the 2011 International Conference on Materials and Products Manufacturing Technology: https://eprints.qut.edu.au/46594/1/2011011893_Karim_ePrints.pdf Assessing the prediction of employee productivity: a comparison of OLS vs. CART International Journal of Productivity and Quality Management: http://www.inderscienceonline.com/doi/abs/10.1504/IJPQM.2011.042511 Mining the customer credit using classification and regression tree and multivariate adaptive regression splines Computational Statistics & Data Analysis: http://www.sciencedirect.com/science/article/pii/S016794730400355X Panel of Serum Biomarkers for the Diagnosis of Lung Cancer Journal of Clinical Oncology: http://ascopubs.org/doi/full/10.1200/JCO.2007.13.5392 Automated urban land-use classification with remote sensing International Journal of Remote Sensing: http://www.tandfonline.com/doi/abs/10.1080/01431161.2012.714510
  • 70. Random Forest® Software Applications. Mapping Oil and Gas Development Potential in the US Intermountain West and Estimating Impacts to Species: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0007400 Random Forests applied as a soil spatial predictive model in arid Utah Digital Soil Mapping: http://link.springer.com/content/pdf/10.1007/978-90-481-8863-5.pdf#page=188 Factors Associated With Increased Reading Frequency in Children Exposed to Reach Out and Read Academic Pediatrics: http://www.sciencedirect.com/science/article/pii/S1876285915002752 (this paper used Random Forests® software to pick the factors) Using Random Forests to Provide Predicted Species Distribution Maps as a Metric for Ecological Inventory & Monitoring Programs Applications of Computational Intelligence in Biology: https://link.springer.com/chapter/10.1007/978-3-540-78534-7_9 Random Forest for Gene Expression Based Cancer Classification: Overlooked Issues Iberian Conference on Pattern Recognition and Image Analysis: https://link.springer.com/chapter/10.1007/978-3-540-72849-8_61