Decision Tree
-- C4.5 and CART
Xueping Peng
Xueping.peng@uts.edu.au
ID3 (1/2)
 ID3 is a strong system that
 Uses hill-climbing search based on the information gain measure
to search through the space of decision trees
 Outputs a single hypothesis
 Never backtracks; it converges to locally optimal solutions
 Uses all training examples at each step, contrary to methods that
make decisions incrementally
 Uses statistical properties of all examples: the search is less
sensitive to errors in individual training examples
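To make the information gain measure concrete, here is a minimal Python sketch (not part of the original slides) that computes the entropy of a label set and the gain of splitting on one nominal attribute; the toy outlook data and function names are illustrative only.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Gain of splitting `rows` (a list of dicts) on a nominal `attribute`."""
    n = len(labels)
    gain = entropy(labels)
    for value in set(row[attribute] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attribute] == value]
        gain -= len(subset) / n * entropy(subset)
    return gain

# Illustrative toy data (not the slides' example)
rows = [{"outlook": "sunny"}, {"outlook": "sunny"},
        {"outlook": "overcast"}, {"outlook": "rainy"}]
labels = ["no", "no", "yes", "yes"]
print(information_gain(rows, labels, "outlook"))  # 1.0 bit for this toy split
```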
ID3 (2/2)
 However, ID3 has some drawbacks
 It can only deal with nominal attributes (it cannot handle continuous data)
 It may not be robust in the presence of noise (e.g. overfitting)
 It is not able to deal with incomplete data sets (i.e. missing values)
How C4.5 enhances ID3 to
 Be robust in the presence of noise and avoid overfitting
 Deal with continuous attributes
 Deal with missing data
 Convert trees to rules
What’s Overfitting?
 Overfitting = Given a hypothesis space H, a hypothesis h ∈ H is
said to overfit the training data if there exists some
alternative hypothesis h’ ∈ H, such that
 h has smaller error than h’ over the training examples, but
 h’ has a smaller error than h over the entire distribution of
instances.
Why May my System Overfit?
 In domains with noise or uncertainty
 the system may try to decrease the training error by completely
fitting all the training examples
How to Avoid Overfitting?
 Pre-prune: Stop growing a branch when information
becomes unreliable
 Post-prune: Take a fully-grown decision tree and discard
unreliable parts
Post-pruning
 First, build the full tree
 Then, prune it
 Fully-grown tree shows all attribute interactions
 Two pruning operations:
 Subtree replacement
 Subtree raising
 Possible strategies:
 error estimation
 significance testing
 MDL principle
Subtree Replacement
 Bottom-up approach
 Consider replacing a tree only after considering all its subtrees
 Ex: labor negotiations
Subtree Replacement
 Algorithm:
 Split the data into training and validation set
 Do until further pruning is harmful:
 Evaluate impact on the validation set of
pruning each possible node
 Select the node whose removal most
increases the validation set accuracy
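A minimal sketch of this reduced-error pruning loop, assuming a toy nested-dict tree in which each internal node stores its split (attr, threshold, left, right) and the majority class of the training examples that reached it; predict, accuracy, and the dict layout are illustrative rather than C4.5's actual data structures.

```python
from copy import deepcopy

def predict(node, x):
    """Follow the tree until a leaf (a plain class label) is reached."""
    while isinstance(node, dict):
        node = node["left"] if x[node["attr"]] < node["threshold"] else node["right"]
    return node

def accuracy(tree, data):
    """Fraction of (x, y) pairs the tree classifies correctly."""
    return sum(predict(tree, x) == y for x, y in data) / len(data)

def internal_nodes(node, path=()):
    """Yield (path, node) for every internal node, children before parents."""
    if isinstance(node, dict):
        yield from internal_nodes(node["left"], path + ("left",))
        yield from internal_nodes(node["right"], path + ("right",))
        yield path, node

def replaced(tree, path, leaf):
    """Copy of the tree with the subtree at `path` collapsed to `leaf`."""
    if not path:
        return leaf
    new = deepcopy(tree)
    parent = new
    for step in path[:-1]:
        parent = parent[step]
    parent[path[-1]] = leaf
    return new

def reduced_error_prune(tree, validation):
    """Repeatedly collapse the subtree whose removal most improves
    (or at least preserves) validation-set accuracy."""
    current, current_acc = tree, accuracy(tree, validation)
    while True:
        candidates = [(accuracy(replaced(current, p, n["majority"]), validation), p, n)
                      for p, n in internal_nodes(current)]
        if not candidates:
            return current
        best_acc, path, node = max(candidates, key=lambda c: c[0])
        if best_acc < current_acc:
            return current
        current, current_acc = replaced(current, path, node["majority"]), best_acc
```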
Deal with continuous attributes
 When dealing with nominal data
 We evaluated the gain for each possible value
 In continuous data, we have infinite values
 What should we do?
 Continuous-valued attributes may take infinite values, but we
have a limited number of values in our instances (at most N if we
have N instances)
 Therefore, simulate that you have N nominal values
 Evaluate information gain for every possible split point of the attribute
 Choose the best split point
 The information gain of the attribute is the information gain of the best split (a sketch of this procedure follows)
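A hedged sketch of that procedure for one continuous attribute: sort the instances by value, consider a candidate split point halfway between each pair of adjacent distinct values, and keep the threshold with the highest information gain. All names are illustrative.

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_numeric_split(values, labels):
    """Return (best_threshold, best_gain) for one continuous attribute."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best = (None, 0.0)
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue                        # no boundary between equal values
        threshold = (lo + hi) / 2           # split point halfway between values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        gain = base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        if gain > best[1]:
            best = (threshold, gain)
    return best
```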
Deal with continuous attributes
 Example: the weather data, with temperature as a numeric attribute (table not shown here)
Deal with continuous attributes
 Split on temperature attribute:
 E.g.: temperature < 71.5: yes/4, no/2
temperature ≥ 71.5: yes/5, no/3
 Info([4,2],[5,3]) = 6/14 info([4,2]) + 8/14 info([5,3]) = 0.939 bits
 Place split points halfway between values
 Can evaluate all split points in one pass!
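The 0.939 figure can be checked directly (the counts are taken from the slide above):

```python
from math import log2

def info(counts):
    """Entropy in bits of a class-count list, e.g. [4, 2]."""
    n = sum(counts)
    return -sum((c / n) * log2(c / n) for c in counts)

# temperature < 71.5: 4 yes, 2 no; temperature >= 71.5: 5 yes, 3 no
split_info = 6/14 * info([4, 2]) + 8/14 * info([5, 3])
print(round(split_info, 3))   # 0.939 bits
```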
Deal with continuous attributes
 To speed up
 Entropy only needs to be evaluated between points of different
classes
 Breakpoints between values of the same class cannot be optimal; potential optimal breakpoints lie only between values of different classes
Deal with Missing Data
 Missing values are denoted “?” in C4.X
 Simple idea: treat “missing” as a separate attribute value
 More sophisticated idea: split instances with missing values into pieces
 A piece going down a branch receives a weight proportional to the
popularity of the branch
 weights sum to 1
 Info gain works with fractional instances
 Use sums of weights instead of counts
 During classification, split the instance into pieces in the same way
 Merge probability distribution using weights
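A minimal sketch of the fractional-instance idea for a single binary split; the node layout and the "yes"/"no" attribute-value convention are assumptions for illustration, but the weighting follows the bullets above.

```python
def route_with_missing(instances, attribute):
    """Send instances down the two branches of a binary split on `attribute`.
    Instances with a missing value ('?') are split into weighted pieces in
    proportion to how many (weighted) known-value instances took each branch."""
    left, right, missing = [], [], []
    for weight, row in instances:                 # each instance is (weight, dict)
        if row[attribute] == "?":
            missing.append((weight, row))
        elif row[attribute] == "yes":
            left.append((weight, row))
        else:
            right.append((weight, row))
    total = sum(w for w, _ in left) + sum(w for w, _ in right)
    p_left = sum(w for w, _ in left) / total      # popularity of the left branch
    for weight, row in missing:
        left.append((weight * p_left, row))       # fractional pieces whose
        right.append((weight * (1 - p_left), row))  # weights sum to the original
    return left, right

# Illustrative usage: three known instances and one with a missing value
data = [(1.0, {"windy": "yes"}), (1.0, {"windy": "no"}),
        (1.0, {"windy": "no"}), (1.0, {"windy": "?"})]
left, right = route_with_missing(data, "windy")
print(sum(w for w, _ in left), sum(w for w, _ in right))  # ~1.333 and ~2.667
```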
What is CART?
 Classification And Regression Trees
 Developed by Breiman, Friedman, Olshen, Stone in early 80’s.
 Introduced tree-based modeling into the statistical mainstream
 Rigorous approach involving cross-validation to select the optimal
tree
 One of many tree-based modeling techniques.
 CART -- the classic
 CHAID
 C5.0
 Software package variants (SAS, S-Plus, R…)
 Note: the “rpart” package in “R” is freely available
The Key Idea
Recursive Partitioning
 Take all of your data.
 Consider all possible values of all variables.
 Select the variable/value (X=t1) that produces the greatest
“separation” in the target.
 (X=t1) is called a “split”.
 If X < t1, send the data point to the “left”; otherwise, send it
to the “right”.
 Now repeat same process on these two “nodes”
 You get a “tree”
 Note: CART only uses binary splits.
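For reference, scikit-learn's tree module implements this kind of CART-style recursive binary partitioning; a brief usage sketch on made-up data (the feature names and values are purely illustrative):

```python
# scikit-learn's DecisionTreeClassifier grows a CART-style tree with binary splits.
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data: two numeric features, binary target (purely illustrative)
X = [[25, 0], [35, 1], [45, 0], [20, 1], [52, 0], [23, 1], [40, 0], [60, 1]]
y = [0, 0, 1, 0, 1, 0, 1, 1]

tree = DecisionTreeClassifier(criterion="gini", max_depth=2, random_state=0)
tree.fit(X, y)
print(export_text(tree, feature_names=["age", "flag"]))  # text view of the splits
```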
Splitting Rules
 Select the variable value (X=t1) that produces the
greatest “separation” in the target variable.
 “Separation” defined in many ways.
 Regression Trees (continuous target): use sum of
squared errors.
 Classification Trees (categorical target): choice of
entropy, Gini measure, “twoing” splitting rule.
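As an illustration of one of these separation measures, a tiny sketch of the Gini impurity of a node (entropy was sketched earlier; the "twoing" rule is omitted):

```python
from collections import Counter

def gini(labels):
    """Gini impurity: probability that two labels drawn at random differ."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini(["yes"] * 5 + ["no"] * 5))   # 0.5, maximally impure for two classes
print(gini(["yes"] * 10))               # 0.0, a completely pure node
```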
Regression Trees
 Tree-based modeling for continuous target variable
 most intuitively appropriate method for loss ratio analysis
 Find split that produces greatest separation in ∑[y − E(y)]²
 i.e.: find nodes with minimal within-node variance
 and therefore greatest between-node variance
 like credibility theory
 Every record in a node is assigned the same yhat
 model is a step function
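A minimal sketch of choosing a regression split by this sum-of-squared-errors criterion for a single numeric predictor; the function names and toy data are illustrative.

```python
def sse(ys):
    """Sum of squared errors around the node mean, i.e. sum of (y - E(y))^2."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def best_regression_split(xs, ys):
    """Split point on one predictor that minimises total within-node SSE."""
    pairs = sorted(zip(xs, ys))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best

# Toy loss-ratio-style data (illustrative): predictor and continuous target
print(best_regression_split([1, 2, 3, 10, 11, 12], [0.2, 0.3, 0.25, 1.1, 1.0, 1.2]))
# -> (6.5, 0.025): the split between 3 and 10 gives the smallest within-node SSE
```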
Classification Trees
 Tree-based modeling for discrete target variable
 In contrast with regression trees, various measures of purity
are used
 Common measures of purity:
 Gini, entropy, “twoing”
 Intuition: an ideal retention model would produce nodes that
contain either defectors only or non-defectors only
 completely pure nodes
Classification Trees vs. Regression Trees
 Classification Trees:
 Splitting criteria: Gini, Entropy, Twoing
 Goodness-of-fit measure: misclassification rates
 Prior probabilities and misclassification costs available as model “tuning parameters”
 Regression Trees:
 Splitting criterion: sum of squared errors
 Goodness of fit: the same measure, sum of squared errors
 No priors or misclassification costs… just let it run
Cost-Complexity Pruning
 Definition: Cost-Complexity Criterion
Rα = MC + αL
 MC = misclassification rate
 Relative to # misclassifications in root node.
 L = # leaves (terminal nodes)
 You get a credit for lower MC.
 But you also get a penalty for more leaves.
 Let T0 be the biggest tree.
 Find the sub-tree Tα of T0 that minimizes Rα.
 Optimal trade-off of accuracy and complexity (a scikit-learn sketch follows).
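scikit-learn exposes this criterion through the ccp_alpha parameter and the cost_complexity_pruning_path helper of its tree estimators; a brief sketch on the iris data (chosen only for convenience):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Enumerate the effective alpha values along the cost-complexity pruning path
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)

for alpha in path.ccp_alphas:
    pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X, y)
    print(f"alpha={alpha:.4f}  leaves={pruned.get_n_leaves()}")  # fewer leaves as alpha grows
```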
References
 Introduction to Machine Learning, Lecture 6: C4.5, http://www.slideshare.net/aorriols/lecture6-c45
 CART from A to B, https://www.casact.org/education/specsem/f2005/handouts/cart.ppt