Machine Learning Specialization
Overfitting in decision trees
Emily Fox & Carlos Guestrin
University of Washington
©2015-2016 Emily Fox & Carlos Guestrin
Review of loan default prediction

[Figure: an intelligent loan application review system takes loan applications as input and labels each application Safe (✓) or Risky (✘).]
Decision tree review

ŷi = T(xi): to predict a label for a loan application input xi, traverse the decision tree from the start node, at each split following the branch that matches xi's feature value, until a leaf is reached.

[Figure: example tree. Start → Credit? (excellent → Safe; fair → Term?: 3 years → Risky, 5 years → Safe; poor → Income?: low → Risky, high → Term?: 3 years → Risky, 5 years → Safe).]
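To make the traversal concrete, here is a minimal Python sketch. The nested-dict node layout ("is_leaf", "feature", "children", "prediction") is an illustrative assumption, not the course's actual data structure, and the example tree abbreviates the figure's poor branch to a single leaf.

```python
# Minimal sketch of ŷ = T(x): follow matching branches down to a leaf.
def traverse(node, x):
    if node["is_leaf"]:
        return node["prediction"]            # e.g., "Safe" or "Risky"
    branch = x[node["feature"]]              # e.g., x["Credit?"] == "fair"
    return traverse(node["children"][branch], x)

# Example tree mirroring (part of) the figure above.
tree = {
    "is_leaf": False, "feature": "Credit?",
    "children": {
        "excellent": {"is_leaf": True, "prediction": "Safe"},
        "fair": {"is_leaf": False, "feature": "Term?",
                 "children": {"3 years": {"is_leaf": True, "prediction": "Risky"},
                              "5 years": {"is_leaf": True, "prediction": "Safe"}}},
        "poor": {"is_leaf": True, "prediction": "Risky"},  # abbreviated branch
    },
}
print(traverse(tree, {"Credit?": "fair", "Term?": "5 years"}))  # -> Safe
```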
Overfitting review
Overfitting in logistic regression

Overfitting occurs if there exists a model w* such that:
•  training_error(w*) > training_error(ŵ)
•  true_error(w*) < true_error(ŵ)

[Figure: classification error vs. model complexity; training error keeps decreasing as complexity grows, while true error eventually rises.]
Overfitting ⇒ overconfident predictions

[Figure: decision boundaries for logistic regression with degree-6 features vs. degree-20 features.]
Overfitting in decision trees
Decision stump (Depth 1): Split on x[1]

[Figure: decision stump on x[1]. Root: 18 / 13 points (y = − / y = +); branch x[1] < -0.07: 13 / 3; branch x[1] >= -0.07: 4 / 11.]
Training error reduces with depth

Tree depth      depth = 1   depth = 2   depth = 3   depth = 5   depth = 10
Training error  0.22        0.13        0.10        0.03        0.00

[Figure: the decision boundary at each depth.]

What happens when we increase depth?
Deeper trees ⇒ lower training error

[Figure: training error vs. tree depth, decreasing to 0.0 at depth 10.]
Training error = 0: Is this model perfect?

[Figure: the depth-10 decision boundary (training error = 0.0).]

NOT PERFECT
Why does training error decrease with depth?

[Figure: loan data (Loan status: Safe / Risky). Root: 22 Safe / 18 Risky, split on Credit: excellent (9 / 0) → Safe, good (9 / 4) → Safe, fair (4 / 14) → Risky.]

Tree             Training error
(root)           0.45
split on credit  0.20

Training error improved by 0.25 because of the split.
Feature split selection algorithm

•  Given a subset of data M (a node in the tree)
•  For each feature hi(x):
   1.  Split the data of M according to feature hi(x)
   2.  Compute the classification error of the split
•  Choose the feature h*(x) with the lowest classification error

By design, each split reduces training error.
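A minimal Python sketch of this selection step, assuming data is a list of (feature-dict, label) pairs; all names are illustrative, not the course's API.

```python
from collections import Counter

def node_error(labels):
    # Error of predicting the majority class: # mistakes / # points.
    return 1.0 - Counter(labels).most_common(1)[0][1] / len(labels)

def split_error(data, feature):
    # Classification error after splitting on `feature`, weighted by branch size.
    groups = {}
    for x, y in data:
        groups.setdefault(x[feature], []).append(y)
    return sum(len(ys) / len(data) * node_error(ys) for ys in groups.values())

def best_split(data, features):
    # Choose h*(x): the feature with the lowest classification error.
    return min(features, key=lambda f: split_error(data, f))

data = [({"credit": "excellent", "term": "3 years"}, "safe"),
        ({"credit": "poor",      "term": "5 years"}, "risky"),
        ({"credit": "poor",      "term": "3 years"}, "risky")]
print(best_split(data, ["credit", "term"]))  # -> credit
```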
Decision trees overfitting on loan data
Principle of Occam's razor: Simpler trees are better
Principle of Occam's Razor

"Among competing hypotheses, the one with fewest assumptions should be selected", William of Occam, 13th century.

Example: a patient shows symptoms S1 and S2.
•  Diagnosis 1 (2 diseases): two diseases D1 and D2, where D1 explains S1 and D2 explains S2.
•  Diagnosis 2 (1 disease): a single disease D3 explains both symptoms S1 and S2. ← SIMPLER
Occam's Razor for decision trees

When two trees have similar classification error on the validation set, pick the simpler one.

Complexity     Train error  Validation error
Simple         0.23         0.24
Moderate       0.12         0.15
Complex        0.07         0.15
Super complex  0.00         0.18   ← overfit

Moderate and Complex have the same validation error, so prefer the simpler Moderate tree.
Which tree is simpler?

[Figure: two candidate trees; the one with fewer nodes is marked SIMPLER.]
Modified tree learning problem

Find a "simple" decision tree with low classification error.

[Figure: candidate trees T1(X), T2(X), T4(X), ranging from simple trees to complex trees.]
How do we pick simpler trees?

1.  Early stopping: Stop the learning algorithm before the tree becomes too complex
2.  Pruning: Simplify the tree after the learning algorithm terminates
Early stopping for learning decision trees
Deeper trees ⇒ increasing complexity

Model complexity increases with depth.

[Figure: decision boundaries at depth = 1, depth = 2, and depth = 10.]
Early stopping condition 1: Limit the depth of a tree
Restrict tree learning to shallow trees?

[Figure: classification error vs. tree depth. Training error keeps falling as trees get more complex, while true error rises; max_depth marks the boundary between simple and complex trees.]
Early stopping condition 1: Limit depth of tree

Stop tree building when depth = max_depth.

[Figure: classification error vs. tree depth, with building cut off at max_depth.]
Picking the value for max_depth

Tune max_depth on a validation set or with cross-validation.

[Figure: classification error vs. tree depth, with the chosen max_depth marked.]
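One way to make this concrete is the sketch below, which tunes max_depth on a held-out validation set using scikit-learn; this is an assumption for illustration (the course used its own software), and the synthetic dataset is only a stand-in.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

best_depth, best_error = None, 1.0
for depth in range(1, 11):
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    error = 1.0 - model.score(X_valid, y_valid)  # validation classification error
    if error < best_error:
        best_depth, best_error = depth, error
print(best_depth, best_error)
```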
Early stopping condition 2: Use classification error to limit depth of tree
Decision tree recursion review

[Figure: loan status (Safe / Risky). Root: 22 / 18, split on Credit: excellent (16 / 0) becomes a Safe leaf; fair (1 / 2) and poor (5 / 16) are still impure.]

Build a decision stump with the subset of data where Credit = poor, and another with the subset where Credit = fair.
Split selection for credit = poor

Splits for credit = poor   Classification error
(no split)                 0.24
split on term              0.24
split on income            0.24

No split improves classification error ⇒ stop!

[Figure: the credit = poor node (5 Safe / 16 Risky) under the root from the previous slide.]
Early stopping condition 2: No split improves classification error

Splits for credit = poor   Classification error
(no split)                 0.24
split on term              0.24
split on income            0.24

Since no split improves on 0.24, the credit = poor node becomes a Risky leaf (early stopping condition 2). Next, build a decision stump with the subset of data where Credit = fair.

[Figure: root 22 / 18 (Safe / Risky); excellent (16 / 0) → Safe leaf; poor (5 / 16) → Risky leaf; fair (1 / 2) still to be processed.]
Practical notes about stopping when classification error doesn't decrease

1.  Typically, add a "magic" parameter ε
    -  Stop if the error doesn't decrease by more than ε
2.  There are some pitfalls to this rule (see the pruning section)
3.  Still, it is very useful in practice
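A tiny Python sketch of the ε rule; the function name and default tolerance are hypothetical.

```python
def sufficient_decrease(error_before, error_after, epsilon=1e-3):
    # Early stopping condition 2 with tolerance ε: accept a split only if
    # it decreases classification error by more than ε.
    return error_before - error_after > epsilon

# At the credit = poor node above, the best split leaves error at 0.24:
print(sufficient_decrease(0.24, 0.24))  # -> False, so stop
```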
Early stopping condition 3: Stop if the number of data points contained in a node is too small
Can we trust nodes with very few points?

[Figure: loan status (Safe / Risky). Root: 22 / 18, split on Credit; excellent (16 / 0) → Safe. The fair branch (1 / 2) holds only 3 data points, so stop recursing and predict Risky.]
Early stopping condition 3: Stop when the number of data points in a node <= Nmin

Example: Nmin = 10.

[Figure: with Nmin = 10, the fair node (1 Safe / 2 Risky) triggers early stopping condition 3 and becomes a Risky leaf; excellent (16 / 0) → Safe; poor (5 / 16) → Risky.]
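A short sketch of this condition; n_min and the leaf layout follow the earlier sketches and are illustrative, not the course's code.

```python
from collections import Counter

def reached_min_size(data, n_min=10):
    # Early stopping condition 3: too few points to split reliably.
    return len(data) <= n_min

def majority_leaf(data):
    # A node that stops early predicts its majority class,
    # e.g., Risky for the (1 Safe, 2 Risky) fair node above.
    labels = [y for _, y in data]
    return {"is_leaf": True, "prediction": Counter(labels).most_common(1)[0][0]}
```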
Summary of decision trees with early stopping
Early stopping: Summary
1.  Limit tree depth: Stop splitting after a
certain depth
2.  Classification error: Do not consider any
split that does not cause a sufficient
decrease in classification error
3.  Minimum node “size”: Do not split an
intermediate node which contains
too few data points
Greedy decision tree learning

•  Step 1: Start with an empty tree
•  Step 2: Select a feature to split the data
•  For each split of the tree:
   •  Step 3: If there is nothing more to split on (stopping conditions 1 & 2, or early stopping conditions 1, 2 & 3), make predictions
   •  Step 4: Otherwise, go to Step 2 & continue (recurse) on this split
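Putting the pieces together, a self-contained Python sketch of the greedy learner with all three early stopping conditions; the data layout and parameter names are illustrative assumptions, not the course's implementation.

```python
from collections import Counter

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def node_error(labels):
    return 1.0 - Counter(labels).most_common(1)[0][1] / len(labels)

def split_error(data, feature):
    groups = {}
    for x, y in data:
        groups.setdefault(x[feature], []).append(y)
    return sum(len(ys) / len(data) * node_error(ys) for ys in groups.values())

def build_tree(data, features, depth=0, max_depth=10, epsilon=0.0, n_min=1):
    labels = [y for _, y in data]
    leaf = {"is_leaf": True, "prediction": majority(labels)}
    # Stopping conditions 1 & 2: pure node, or no features left to split on.
    # Early stopping conditions 1 & 3: depth limit, or too few points.
    if (node_error(labels) == 0.0 or not features
            or depth >= max_depth or len(data) <= n_min):
        return leaf
    best = min(features, key=lambda f: split_error(data, f))
    # Early stopping condition 2: insufficient decrease in error.
    if node_error(labels) - split_error(data, best) <= epsilon:
        return leaf
    groups = {}
    for x, y in data:
        groups.setdefault(x[best], []).append((x, y))
    children = {v: build_tree(subset, [f for f in features if f != best],
                              depth + 1, max_depth, epsilon, n_min)
                for v, subset in groups.items()}
    return {"is_leaf": False, "feature": best, "children": children}
```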
Overfitting in Decision Trees: Pruning
OPTIONAL
Stopping condition summary
•  Stopping condition:
1.  All examples have the same target value
2.  No more features to split on
•  Early stopping conditions:
1.  Limit tree depth
2.  Do not consider splits that do not cause a
sufficient decrease in classification error
3.  Do not split an intermediate node which
contains too few data points
Exploring some challenges with early stopping conditions
Challenge with early stopping condition 1

[Figure: classification error vs. tree depth, training vs. true error; the right max_depth lies somewhere between simple and complex trees.]

It is hard to know exactly when to stop.
Is early stopping condition 2 a good idea?

[Figure: classification error vs. tree depth, marking the point where learning stops because of a zero decrease in classification error.]
Early stopping condition 2: Don't stop if error doesn't decrease???

Consider y = x[1] xor x[2]:

x[1]   x[2]   y
False  False  False
False  True   True
True   False  True
True   True   False

The root contains 2 True and 2 False examples (y values), so Error = 2/4 = 0.5.

Tree    Classification error
(root)  0.5
Consider a split on x[1]

Splitting on x[1] puts 1 True and 1 False example in each branch, so majority-vote predictions still make 2 mistakes out of 4: the error stays 0.5.

Tree           Classification error
(root)         0.5
Split on x[1]  0.5
Consider a split on x[2]

Splitting on x[2] likewise leaves 1 True and 1 False example in each branch, so the error again stays 0.5.

Tree           Classification error
(root)         0.5
Split on x[1]  0.5
Split on x[2]  0.5

Neither feature improves training error… Stop now???
Final tree with early stopping condition 2

The tree is just the root (2 True / 2 False), predicting True.

Tree                             Classification error
with early stopping condition 2  0.5
Without early stopping condition 2

[Figure: split the root (2 / 2) on x[1], then split each branch (1 / 1) on x[2]; all four leaves are pure (True, False, False, True).]

Tree                                Classification error
with early stopping condition 2     0.5
without early stopping condition 2  0.0
Early stopping condition 2: Pros and Cons

•  Pros:
   -  A reasonable heuristic for early stopping to avoid useless splits
•  Cons:
   -  Too short-sighted: we may miss "good" splits that occur right after "useless" splits
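A standalone Python sketch reproducing the xor numbers above, with the same data layout as the earlier sketches.

```python
from collections import Counter

# The four xor points: y = x[1] xor x[2].
data = [({"x1": False, "x2": False}, False),
        ({"x1": False, "x2": True},  True),
        ({"x1": True,  "x2": False}, True),
        ({"x1": True,  "x2": True},  False)]

def node_error(labels):
    return 1.0 - Counter(labels).most_common(1)[0][1] / len(labels)

print(node_error([y for _, y in data]))        # root error: 0.5

for feature in ("x1", "x2"):
    groups = {}
    for x, y in data:
        groups.setdefault(x[feature], []).append(y)
    err = sum(len(ys) / len(data) * node_error(ys) for ys in groups.values())
    print(feature, err)                        # both single splits: still 0.5

# Early stopping condition 2 therefore halts at the root (error 0.5), even
# though splitting on x1 and then on x2 inside each branch yields four pure
# leaves and training error 0.0.
```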
Tree pruning
Two approaches to picking simpler trees
1.  Early Stopping: Stop the learning
algorithm before the tree becomes
too complex
2.  Pruning: Simplify the tree after the
learning algorithm terminates
Complements early stopping
Pruning: Intuition
Train a complex tree, simplify later

[Figure: complex tree → simplify → simpler tree.]
Pruning motivation

[Figure: classification error vs. tree depth. Don't stop too early (a simple tree has high error); grow a complex tree, then simplify after the tree is built.]
Example 1: Which tree is simpler?

[Figure: the full loan tree (Start → Credit?, with Term? and Income? splits below; 6 leaves) vs. a depth-1 tree splitting only on Credit (3 leaves: Safe, Safe, Risky). The depth-1 tree is SIMPLER.]
Example 2: Which tree is simpler???

[Figure: a tree splitting on Credit into 5 branches (excellent, good, fair, bad, poor; leaves: three Safe, two Risky) vs. a tree splitting on Term into 2 branches (3 years → Risky, 5 years → Safe).]
Simple measure of complexity of tree

L(T) = # of leaf nodes

[Figure: the 5-branch Credit tree from the previous slide has L(T) = 5.]
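A one-function sketch of L(T) on the nested-dict trees used in the earlier sketches (that layout is an assumption, not the course's code):

```python
def num_leaves(node):
    # L(T): count the leaf nodes of tree T.
    if node["is_leaf"]:
        return 1
    return sum(num_leaves(child) for child in node["children"].values())
```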
Which tree has lower L(T)?

[Figure: the 5-leaf Credit tree (T1) vs. the 2-leaf Term tree (T2).]

L(T1) = 5, L(T2) = 2 ⇒ T2 is SIMPLER.
Balance simplicity & predictive power

[Figure: the full loan tree (splits on Credit, Term, and Income) is too complex, with a risk of overfitting; a bare root that always predicts Risky is too simple, with high classification error.]
Want to balance:
i.  How well the tree fits the data
ii.  The complexity of the tree

Desired total quality format:

Total cost = measure of fit + measure of complexity

The measure of fit is the classification error: a large value means a bad fit to the training data. A large measure of complexity means the tree is likely to overfit. We want to balance the two.
Consider a specific total cost

Total cost = classification error + number of leaf nodes
           =       Error(T)       +        L(T)
Balancing fit and complexity

Total cost C(T) = Error(T) + λ L(T), where λ ≥ 0 is a tuning parameter.

•  If λ = 0: complexity is ignored, so this reduces to standard tree learning driven by classification error alone.
•  If λ = ∞: every extra leaf is infinitely expensive, so the best "tree" is just the root (majority-class prediction).
•  If λ is in between: the total cost balances fit and tree complexity.
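A sketch of the total cost, reusing the traverse() and num_leaves() helpers from the earlier sketches (the names are illustrative):

```python
def classification_error(tree, data):
    # Error(T): fraction of (x, y) pairs the tree misclassifies.
    return sum(traverse(tree, x) != y for x, y in data) / len(data)

def total_cost(tree, data, lam):
    # C(T) = Error(T) + λ L(T); lam is the tuning parameter λ.
    return classification_error(tree, data) + lam * num_leaves(tree)
```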
Use total cost to simplify trees

[Figure: total-quality-based pruning turns a complex tree into a simpler tree.]
Tree pruning algorithm
Pruning intuition

[Figure: tree T, the full loan tree: Start → Credit? (excellent → Safe; fair → Term?: 3 years → Risky, 5 years → Safe; poor → Income?: low → Risky, high → Term?: 3 years → Risky, 5 years → Safe).]
Step 1: Consider a split

[Figure: tree T; the lower Term? split (under Income? = high) is the candidate for pruning.]
Step 2: Compute total cost C(T) of the split

C(T) = Error(T) + λ L(T), with λ = 0.03.

Tree  Error  #Leaves  Total
T     0.25   6        0.43

[Figure: tree T with the lower Term? split marked as the candidate for pruning.]
Step 3: "Undo" the splits in Tsmaller

Replace the candidate split by a leaf node.

Tree      Error  #Leaves  Total
T         0.25   6        0.43
Tsmaller  0.26   5        0.41

λ = 0.03

[Figure: tree Tsmaller, with the lower Term? split replaced by a Safe leaf.]
Step 4: Prune if total cost is lower: C(Tsmaller) ≤ C(T)

Tree      Error  #Leaves  Total
T         0.25   6        0.43
Tsmaller  0.26   5        0.41

λ = 0.03

Tsmaller has worse training error but lower overall cost, so: YES, prune!

[Figure: the pruned tree, with the candidate split replaced by a Safe leaf.]
Step 5: Repeat Steps 1-4 for every split

Decide if each split can be "pruned".

[Figure: the loan tree with every decision node considered in turn.]
Decision tree pruning algorithm
•  Start at bottom of tree T and traverse up,
apply prune_split to each decision node M
•  prune_split(T,M):
1.  Compute total cost of tree T using
C(T) = Error(T) + λ L(T)
2.  Let Tsmaller be tree after pruning subtree
below M
3.  Compute the total cost of Tsmaller:
C(Tsmaller) = Error(Tsmaller) + λ L(Tsmaller)
4.  If C(Tsmaller) < C(T), prune to Tsmaller
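A bottom-up Python sketch of this pass, reusing traverse(), num_leaves(), and total_cost() from the earlier sketches; the in-place mutate-and-undo trick is an implementation choice of this sketch, not the course's code.

```python
from collections import Counter

def prune_tree(root, data, lam):
    # Apply prune_split to every decision node M of `root`, bottom up.
    def visit(node, subset):
        if node["is_leaf"]:
            return
        groups = {}
        for x, y in subset:
            groups.setdefault(x[node["feature"]], []).append((x, y))
        for value, child in node["children"].items():
            visit(child, groups.get(value, []))      # prune below M first
        if not subset:                               # no data reaches M: skip
            return
        cost_full = total_cost(root, data, lam)      # C(T)
        saved = dict(node)
        node.clear()                                 # T_smaller: replace the
        node["is_leaf"] = True                       # split by a majority leaf
        labels = [y for _, y in subset]
        node["prediction"] = Counter(labels).most_common(1)[0][0]
        if total_cost(root, data, lam) < cost_full:  # C(T_smaller) < C(T)?
            return                                   # keep the pruned leaf
        node.clear()
        node.update(saved)                           # undo: keep the split
    visit(root, data)
    return root
```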
Summary of overfitting in decision trees
What you can do now…

•  Identify when overfitting occurs in decision trees
•  Prevent overfitting with early stopping
   -  Limit tree depth
   -  Do not consider splits that do not reduce classification error
   -  Do not split intermediate nodes with only a few points
•  Prevent overfitting by pruning complex trees
   -  Use a total cost formula that balances classification error and tree complexity
   -  Use total cost to merge potentially complex trees into simpler ones
Thank you to Dr. Krishna Sridhar
Staff Data Scientist, Dato, Inc.