CSCI-B 659
Decision Trees
Milind Gokhale
February 4, 2015
Decision Tree Learning Agenda
• Decision Tree Representation
• ID3 Learning Algorithm
• Entropy, Information Gain
• An Illustrative Example
• Issues in Decision Tree Learning
CSCI-B 659 | Decision Trees
Decision Tree Learning
• It is a method of approximating discrete-valued functions
that is robust to noisy data and capable of learning
disjunctive expressions
• The learned function is represented by a decision tree.
• Disjunctive Expressions – (A ∧ B ∧ C) ∨ (D ∧ E ∧ F)
CSCI-B 659 | Decision Trees
Decision Tree Representation
• Each internal node tests an
attribute
• Each branch corresponds to an
attribute value
• Each leaf node assigns a
classification
PlayTennis: This decision tree classifies Saturday mornings
according to whether or not they are suitable for playing tennis
CSCI-B 659 | Decision Trees
Decision Tree Representation - Classification
• An example is classified by
sorting it through the tree from
the root to the leaf node
• Example – (Outlook = Sunny,
Humidity = High) =>
(PlayTennis = No)
PlayTennis: This decision tree classifies Saturday mornings
according to whether or not they are suitable for playing tennis
CSCI-B 659 | Decision Trees
Appropriate problems for decision tree
learning
• Instances describable by attribute-value pairs
• Target function is discrete valued
• Disjunctive hypothesis may be required
• Possibly noisy data
• Training data may contain errors
• Training data may contain missing attribute values
• Examples – Classification Problems
1. Equipment or medical diagnosis
2. Credit risk analysis
CSCI-B 659 | Decision Trees
Basic ID3 Learning Algorithm approach
• Top-down construction of the tree, beginning with the question "which attribute
should be tested at the root of the tree?"
• Each instance attribute is evaluated using a statistical test to determine how
well it alone classifies the training examples.
• The best attribute is selected and used as the test at the root node of the tree.
• A descendant of the root node is then created for each possible value of this
attribute.
• The training examples are sorted to the appropriate descendant node
• The entire process is then repeated at the descendant node using the training
examples associated with each descendant node
• GREEDY Approach
• No Backtracking - So we may get a suboptimal solution.
CSCI-B 659 | Decision Trees
ID3 Algorithm
CSCI-B 659 | Decision Trees
Top-Down induction of decision trees
1. Find A = the best decision attribute for next node
2. Assign A as decision attribute for node
3. For each value of A create new descendants of node
4. Sort the training examples to the new leaf nodes
5. If the training examples are classified perfectly, STOP; else
iterate over the new leaf nodes (a Python sketch of this loop follows)
CSCI-B 659 | Decision Trees
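A minimal Python sketch of this top-down loop is given below. It is an illustration rather than full ID3: examples are assumed to be dicts with a 'label' key, the attribute-scoring function (e.g. the information gain defined on the following slides) is passed in as score_fn, and empty branches, ties, and missing values are not handled.

from collections import Counter

def id3(examples, attributes, score_fn):
    """Grow a decision tree top-down. `examples` are dicts that include a
    'label' key; `score_fn(examples, attribute)` scores candidate attributes."""
    labels = [e['label'] for e in examples]
    # Stop when the examples are classified perfectly or no attributes remain.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]        # leaf: majority label
    # 1. Find A, the best decision attribute for the next node.
    best = max(attributes, key=lambda a: score_fn(examples, a))
    # 2./3. Assign A to the node and create a descendant for each value of A.
    tree = {best: {}}
    for value in set(e[best] for e in examples):
        # 4. Sort the training examples to the corresponding descendant.
        subset = [e for e in examples if e[best] == value]
        # 5. Iterate (recurse) over the new nodes.
        tree[best][value] = id3(subset, [a for a in attributes if a != best], score_fn)
    return tree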
Which attribute is the best classifier?
• Information Gain – A statistical property that measures
how well a given attribute separates the training
examples according to their target classification.
• This measure is used to select among the candidate
attributes at each step while growing the tree.
CSCI-B 659 | Decision Trees
Entropy
• S is a sample of training examples
• p+ is the proportion of positive examples in S
• p- is the proportion of negative examples in S
• Then the entropy measures the impurity of S:
Entropy(S) = - p+ log2 p+ - p- log2 p-
• If the target attribute can take c different values:
Entropy(S) = Σ (i = 1 to c) - pi log2 pi
CSCI-B 659 | Decision Trees
The entropy varies between 0 and 1:
if all the members belong to the same
class, the entropy = 0;
if there are equal numbers of positive and
negative examples, the entropy = 1
Entropy - Example
• Entropy([29+, 35-]) = - (29/64) log2(29/64) - (35/64) log2(35/64)
= 0.994
CSCI-B 659 | Decision Trees
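The two-class entropy used here translates directly into Python; a small sketch that reproduces Entropy([29+, 35-]) ≈ 0.994:

import math

def entropy(pos, neg):
    """Entropy of a sample containing `pos` positive and `neg` negative examples."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:                         # treat 0 * log2(0) as 0
            p = count / total
            result -= p * math.log2(p)
    return result

print(round(entropy(29, 35), 3))          # 0.994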
Information Gain
• Gain(S,A) = expected reduction in entropy due to sorting
on A:
Gain(S, A) = Entropy(S) - Σ (v ∈ Values(A)) (|Sv| / |S|) Entropy(Sv)
• The second term is the sum of the entropies of each
subset Sv, weighted by the fraction of examples |Sv| / |S| that
belong to Sv
CSCI-B 659 | Decision Trees
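A sketch of Gain(S, A) following this definition. It assumes the dict-of-attributes example format used in the earlier id3 sketch (so it can be passed to id3 as score_fn) and recomputes entropy from class counts so that the snippet is self-contained.

import math
from collections import Counter

def entropy_of(examples):
    """Entropy(S) = sum over classes of -p_i * log2(p_i)."""
    counts = Counter(e['label'] for e in examples)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(examples, attribute):
    """Gain(S, A): entropy of S minus the weighted entropies of the subsets S_v."""
    total = len(examples)
    remainder = 0.0
    for value in set(e[attribute] for e in examples):
        subset = [e for e in examples if e[attribute] == value]
        remainder += (len(subset) / total) * entropy_of(subset)
    return entropy_of(examples) - remainder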
Information Gain - Example
• Gain(S,A1)
= 0.994 – (26/64)*.706 – (38/64)*.742
= 0.266
• Information gained by partitioning
along attribute A1 is 0.266
CSCI-B 659 | Decision Trees
[Figure: a sample S = [29+, 35-] with entropy E = 0.994, split by A1 into subsets with entropies E = 0.706 and E = 0.742, and by A2 into subsets with entropies E = 0.937 and E = 0.619]
• Gain(S,A2)
= 0.994 – (51/64)*.937 – (13/64)*.619
= 0.121
• Information gained by partitioning
along attribute A2 is 0.121
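The arithmetic on this slide can be checked directly from the subset entropies shown in the figure:

gain_a1 = 0.994 - (26/64) * 0.706 - (38/64) * 0.742   # ≈ 0.266
gain_a2 = 0.994 - (51/64) * 0.937 - (13/64) * 0.619   # ≈ 0.121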
An Illustrative Example
CSCI-B 659 | Decision Trees
An Illustrative Example
CSCI-B 659 | Decision Trees
• Gain(S, Outlook) = 0.246
• Gain(S, Humidity) = 0.151
• Gain(S, Wind) = 0.048
• Gain(S, Temperature) = 0.029
• Since Outlook attribute provides the
best prediction of the target attribute,
PlayTennis, it is selected as the
decision attribute for the root node, and
branches are created with its possible
values (i.e., Sunny, Overcast, and
Rain).
An Illustrative Example
CSCI-B 659 | Decision Trees
• Ssunny = {D1, D2, D8, D9, D11}
• Gain(Ssunny, Humidity)
= .970 - (3/5) 0.0 - (2/5) 0.0
= .970
• Gain(Ssunny, Temperature)
= .970 - (2/5) 0.0 - (2/5) 1.0 - (1/5) 0.0
= .570
• Gain(Ssunny, Wind)
= .970 - (2/5) 1.0 - (3/5) .918
= .019
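These numbers can be reproduced from two-class entropies alone. The sketch below assumes the standard PlayTennis training set from Mitchell (1997), in which the Sunny-branch examples D1, D2, D8 are negative and D9, D11 are positive:

import math

def H(pos, neg):
    """Two-class entropy."""
    result = 0.0
    for count in (pos, neg):
        if count:
            p = count / (pos + neg)
            result -= p * math.log2(p)
    return result

s = H(2, 3)                                                                  # ≈ 0.970
gain_humidity    = s - (3/5) * H(0, 3) - (2/5) * H(2, 0)                     # ≈ 0.970
gain_temperature = s - (2/5) * H(0, 2) - (2/5) * H(1, 1) - (1/5) * H(1, 0)   # ≈ 0.570
gain_wind        = s - (2/5) * H(1, 1) - (3/5) * H(1, 2)                     # ≈ 0.019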
Inductive Bias in ID3
• Inductive bias is the set of assumptions that along with
the training data justify the classifications assigned by
the learner to future instances.
• Given H as (effectively) the power set of instances X
• ID3 has a preference for short trees with high information
gain attributes near the root
• ID3 prefers certain hypotheses over others, with no hard
restriction on the hypothesis space H
CSCI-B 659 | Decision Trees
Occam’s Razor
• Prefer the simplest hypothesis that fits the data
• Arguments in favor -
• A short hypothesis that fits the data is unlikely to be a coincidence
• A long hypothesis that fits the data might be a coincidence
• Arguments opposed –
• There are many ways to define small sets of hypotheses
• Two learners that perceive the same training examples in terms
of different internal representations can arrive at two different
hypotheses
CSCI-B 659 | Decision Trees
Issues in Decision Tree Learning
• Overfitting
• Incorporating Continuous-valued attributes
• Attributes with many values
• Handling attributes with costs
• Handling examples with missing attribute values
CSCI-B 659 | Decision Trees
Overfitting
• Consider a hypothesis h and its error over
• Training data: errortrain(h)
• Entire distribution D of data: errorD(h)
• The hypothesis h ∈ H overfits training data if there is an
alternative hypothesis h’ ∈ H such that
• errortrain(h) < errortrain(h’)
AND
• errorD(h) > errorD(h’)
CSCI-B 659 | Decision Trees
Overfitting in decision tree learning
CSCI-B 659 | Decision Trees
Avoiding Overfitting
• Causes
1. This can happen when the training data contains errors or
noise.
2. Small numbers of examples are associated with leaf nodes
• Avoiding Overfitting
1. Stop growing when data split not statistically significant
2. Grow full tree, then post-prune it.
• Selecting Best Tree
1. Measure performance over training data
2. Measure performance over separate validation data
CSCI-B 659 | Decision Trees
Reduced-Error Pruning
• Split data into training and validation sets
• Do until further pruning is harmful
1. Evaluate impact of pruning each possible node on
validation set
2. Greedily remove the one that most improves the validation
set accuracy
CSCI-B 659 | Decision Trees
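A minimal sketch of this greedy loop, assuming a small explicit Node class (attribute test, children keyed by value, and the majority training label to fall back on when pruned); accuracy is measured on a held-out validation set and ties are broken arbitrarily.

class Node:
    """Internal node: tests `attribute`; `children` maps a value to a Node or a
    class label; `majority` is the most common training label at this node."""
    def __init__(self, attribute, children, majority):
        self.attribute, self.children, self.majority = attribute, children, majority

def classify(tree, example):
    while isinstance(tree, Node) and tree.children:
        tree = tree.children.get(example[tree.attribute], tree.majority)
    return tree.majority if isinstance(tree, Node) else tree

def accuracy(tree, examples):
    return sum(classify(tree, e) == e['label'] for e in examples) / len(examples)

def prunable(tree):
    """All internal nodes that still have children, i.e. pruning candidates."""
    if isinstance(tree, Node) and tree.children:
        yield tree
        for child in tree.children.values():
            yield from prunable(child)

def reduced_error_prune(root, validation):
    while True:
        base = accuracy(root, validation)
        best, best_acc = None, base
        for node in prunable(root):
            saved, node.children = node.children, {}   # tentatively prune to a leaf
            acc = accuracy(root, validation)
            node.children = saved                      # undo the tentative pruning
            if acc >= best_acc:                        # no worse on the validation set
                best, best_acc = node, acc
        if best is None:                               # further pruning is harmful
            return root
        best.children = {}                             # keep the most helpful pruning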
Effect of Reduced-Error Pruning
CSCI-B 659 | Decision Trees
Rule Post-Pruning
• The major drawback of reduced-error pruning is that when
data is limited, withholding a validation set further reduces
the number of examples available for training.
Hence rule post-pruning:
• Convert tree to equivalent set of rules
• Prune each rule independently of others
• Sort final rules into desired sequence for use
CSCI-B 659 | Decision Trees
Converting a tree to rules
CSCI-B 659 | Decision Trees
IF (Outlook = Sunny) ∧ (Humidity = High)
THEN PlayTennis = No
IF (Outlook = Sunny) ∧ (Humidity = Normal)
THEN PlayTennis = Yes
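A sketch of the conversion step, assuming the nested-dict tree format produced by the earlier id3 sketch and the usual PlayTennis tree:

def tree_to_rules(tree, preconditions=()):
    """Return one (preconditions, classification) rule per root-to-leaf path."""
    if not isinstance(tree, dict):                     # leaf: the rule consequent
        return [(list(preconditions), tree)]
    (attribute, branches), = tree.items()
    rules = []
    for value, subtree in branches.items():
        rules.extend(tree_to_rules(subtree, preconditions + ((attribute, value),)))
    return rules

tree = {'Outlook': {'Sunny': {'Humidity': {'High': 'No', 'Normal': 'Yes'}},
                    'Overcast': 'Yes',
                    'Rain': {'Wind': {'Strong': 'No', 'Weak': 'Yes'}}}}
for conditions, label in tree_to_rules(tree):
    print('IF ' + ' ∧ '.join(f'({a} = {v})' for a, v in conditions)
          + f' THEN PlayTennis = {label}')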
Continuous Valued-Attributes
• Create a discrete-valued attribute that tests the continuous
attribute against a threshold chosen to maximize information gain,
e.g. Temperature > 54 (a threshold-selection sketch follows this slide)
• So if Temperature = 75, the test is true and
• We can infer that PlayTennis = Yes
CSCI-B 659 | Decision Trees
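A sketch of how such a threshold can be chosen: candidate thresholds lie midway between adjacent sorted values whose labels differ, and the candidate with the highest information gain is kept. The Temperature/PlayTennis values are the ones from Mitchell's example mentioned in the notes (candidates (48+60)/2 = 54 and (80+90)/2 = 85).

import math

def entropy(labels):
    total = len(labels)
    result = 0.0
    for label in set(labels):
        p = labels.count(label) / total
        result -= p * math.log2(p)
    return result

def best_threshold(values, labels):
    """Pick the boolean test `value > t` with the highest information gain."""
    pairs = sorted(zip(values, labels))
    candidates = [(pairs[i][0] + pairs[i + 1][0]) / 2
                  for i in range(len(pairs) - 1) if pairs[i][1] != pairs[i + 1][1]]
    base = entropy(labels)
    def gain(t):
        left = [l for v, l in pairs if v <= t]
        right = [l for v, l in pairs if v > t]
        return (base - len(left) / len(pairs) * entropy(left)
                     - len(right) / len(pairs) * entropy(right))
    return max(candidates, key=gain)

temperature = [40, 48, 60, 72, 80, 90]
play_tennis = ['No', 'No', 'Yes', 'Yes', 'Yes', 'No']
print(best_threshold(temperature, play_tennis))        # 54.0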
Attributes with many values
• Problem:
• If an attribute has many distinct values, Gain will favor it
even though it may generalize poorly
• Example – a Date attribute, which perfectly separates the training data
• One approach – Gain Ratio (sketched after this slide):
GainRatio(S, A) = Gain(S, A) / SplitInformation(S, A)
SplitInformation(S, A) = - Σ (i = 1 to c) (|Si| / |S|) log2 (|Si| / |S|)
where Si is the subset of S for which the attribute has value vi
CSCI-B 659 | Decision Trees
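A sketch of the Gain Ratio computation in the same example format as the earlier information-gain code; note that SplitInformation is simply the entropy of S with respect to the values of A, so it grows for attributes such as Date that split the data into many small subsets.

import math
from collections import Counter

def _entropy(items):
    total = len(items)
    return -sum((c / total) * math.log2(c / total) for c in Counter(items).values())

def gain_ratio(examples, attribute):
    """GainRatio(S, A) = Gain(S, A) / SplitInformation(S, A)."""
    total = len(examples)
    values = [e[attribute] for e in examples]
    remainder = 0.0
    for value, count in Counter(values).items():
        subset_labels = [e['label'] for e in examples if e[attribute] == value]
        remainder += (count / total) * _entropy(subset_labels)
    gain = _entropy([e['label'] for e in examples]) - remainder
    split_information = _entropy(values)     # -Σ |Si|/|S| log2 |Si|/|S|
    return gain / split_information if split_information else 0.0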
Attributes with costs
• Problem:
• Medical diagnosis, BloodTest has cost $150
• Robotics, Width_from_1ft has cost 23 sec
• One approach - replace Gain with a cost-sensitive measure, e.g.
• Tan and Schlimmer (1990): Gain^2(S, A) / Cost(A)
• Nunez (1988): (2^Gain(S, A) - 1) / (Cost(A) + 1)^w
• where w ∈ [0, 1] is a constant that determines the relative importance of cost versus information
gain.
CSCI-B 659 | Decision Trees
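A sketch of the two cost-sensitive replacements for Gain; `gain` is assumed to be an information-gain value computed as sketched earlier, and `cost` an attribute cost expressed in whatever units the task uses.

def tan_schlimmer_score(gain, cost):
    """Tan and Schlimmer (1990): Gain^2(S, A) / Cost(A)."""
    return gain ** 2 / cost

def nunez_score(gain, cost, w=0.5):
    """Nunez (1988): (2^Gain(S, A) - 1) / (Cost(A) + 1)^w,
    where w in [0, 1] weighs cost against information gain."""
    return (2 ** gain - 1) / (cost + 1) ** w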
Examples with missing attribute values
• What if some examples missing values of attribute A?
• Use the training examples anyway and sort them through the tree:
• If node n tests A, assign the example the most common value of A
among the examples at node n, or
• Assign a probability pi to each possible value vi of A and
distribute a fraction pi of the example to each descendant in the tree
CSCI-B 659 | Decision Trees
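A small sketch of both strategies at a node that tests attribute A, assuming examples are dicts in which a missing value of A is stored as None:

from collections import Counter

def fill_with_most_common(examples, attribute):
    """Strategy 1: assign the most common value of A among the examples at the node."""
    known = [e[attribute] for e in examples if e[attribute] is not None]
    most_common = Counter(known).most_common(1)[0][0]
    return [dict(e, **{attribute: e[attribute] if e[attribute] is not None else most_common})
            for e in examples]

def fractional_split(examples, attribute):
    """Strategy 2: send an example with a missing value of A down every branch v,
    weighted by p_v, the fraction of known examples at the node with A = v."""
    known = Counter(e[attribute] for e in examples if e[attribute] is not None)
    total = sum(known.values())
    branches = {value: [] for value in known}
    for e in examples:
        if e[attribute] is not None:
            branches[e[attribute]].append((e, 1.0))
        else:
            for value, count in known.items():
                branches[value].append((e, count / total))   # weight p_v
    return branches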
Some of the latest Applications
CSCI-B 659 | Decision Trees
Gesture Recognition
Motion Detection
Xbox 360 Kinect
Thank You
CSCI-B 659 | Decision Trees
References
• Mitchell, Tom M. "Decision Tree Learning." In Machine
Learning. New York: McGraw-Hill, 1997.
• Flach, Peter A. "Tree Models: Decision Trees."
In Machine Learning: The Art and Science of Algorithms
That Make Sense of Data. Cambridge: Cambridge
University Press, 2012.
CSCI-B 659 | Decision Trees

Editor's Notes

  1. We will see what decision tree learning is, then the decision tree representation, the ID3 learning algorithm, and concepts like entropy and information gain which shape the inductive bias of decision trees, followed by the overfitting problem that can occur in decision trees and ways to solve it.
  2. Decision Tree learning is a method of approximating discrete-valued functions that is robust to noisy data and capable of learning disjunctive expressions. Disjunctive expressions are basically expressions which are disjunctions of conjunctions.
  3. Here each internal node in the PlayTennis tree - Outlook, Humidity, and Wind - tests an attribute, while each branch corresponds to an attribute value: Outlook being Sunny, Overcast, or Rain, Humidity being High or Normal, and Wind being Strong or Weak.
  4. An instance is classified by starting at the root node of the tree and testing the attribute specified by this node and then moving down the tree branch corresponding to the value of the attribute in the given example. And then repeat the process for the sub-tree rooted at this node. Example – (Outlook = Sunny, Humidity = High) will be sorted down the tree to the left most leaf node and hence be classified as negative instance = (PlayTennis = No)
  5. Instances are represented by attribute-value pairs The target function has discrete values Disjunctive expressions Training data may contain errors Training data may contain missing attribute values Example as given on slide. 1. Problems such as learning to classify medical patients by their diseases or symptoms 2. equipment malfunction by their cause. 3. Classification of loan applicants by their likelihood of defaulting on payments.
  6. ID3 learns decision trees by constructing them top-down, beginning with the question "which attribute should be tested at the root of the tree?" Each instance attribute is evaluated using a statistical test to determine how well it alone classifies the training examples, i.e. which attribute is most useful for classifying examples. The best attribute is selected and used as the test at the root node of the tree. A descendant of the root node is then created for each possible value of this attribute, and the training examples are sorted to the appropriate descendant node. The entire process is then repeated using the training examples associated with each descendant node to select the best attribute to test at that point in the tree. GREEDY approach; no backtracking, so we may get a suboptimal solution.
  7. Find the attribute A which is the best decision attribute for the next node and assign A as the decision attribute for that node, making it an internal node. Each value of A becomes a sub-branch of the tree, i.e. a descendant of the internal node. Then sort the training examples across these newly created branches. If the training examples are classified completely, we may stop; otherwise iterate over the leaf nodes to again find the best attribute to classify the examples further.
  8. The central choice in the ID3 algorithm is selecting which attribute to test at each node in the tree. We would like to select the attribute that is most useful for classifying examples. What is a good quantitative measure of the worth of an attribute? We will define a statistical property, called information gain, that measures how well a given attribute separates the training examples according to their target classification. ID3 uses this information gain measure to select among the candidate attributes at each step while growing the tree.
  9. In order to define information gain precisely, we begin by defining a measure commonly used in information theory, called entropy, that characterizes the (im)purity of an arbitrary collection of examples.
  10. information gain, is simply the expected reduction in entropy caused by partitioning the examples according to this attribute.
  11. Since Outlook attribute provides the best prediction of the target attribute, PlayTennis, over the training examples. Outlook is selected as the decision attribute for the root node, and branches are created below the root for each of its possible values (i.e., Sunny, Overcast, and Rain).
  12. Since Outlook attribute provides the best prediction of the target attribute, PlayTennis, over the training examples. Outlook is selected as the decision attribute for the root node, and branches are created below the root for each of its possible values (i.e., Sunny, Overcast, and Rain).
  13. Prefer the simplest hypothesis that fits the data Argument in favor - it is less likely that one will find a short hypothesis that coincidentally fits the training data. In contrast there are often many very complex hypotheses that fit the current training data but fail to generalize correctly to subsequent data Argument Opposed – There are many ways to define small sets of hypotheses Occam's razor will produce two different hypotheses from the same training examples when it is applied by two learners that perceive these examples in terms of different internal representations
  14. Definition: Given a hypothesis space H, a hypothesis h ∈ H is said to overfit the training data if there exists some alternative hypothesis h' ∈ H such that h has smaller error than h' over the training examples, but h' has a smaller error than h over the entire distribution of instances.
  15. Predictably, the accuracy of the tree over the training examples increases monotonically as the tree is grown. However, the accuracy measured over the independent test examples first increases, then decreases.
  16. Causes This can happen when the training data contains errors or noise. Overfitting is possible even when the training data are noise-free, especially when small numbers of examples are associated with leaf nodes Avoiding Overfitting The approaches that stop growing the tree earlier, before it reaches the point where it perfectly classifies the training data. - Difficult to estimate precisely when to stop growing the tree. The approaches that allow the tree to overfit the data, then post-prune the tree. What criterion is to be used to determine the final correct tree size? Use a separate set of examples, distinct from the training examples, to evaluate the utility of post-pruning nodes from the tree. training set, which is used to form the learned hypothesis, and a separate validation set, which is used to evaluate the accuracy of this hypothesis over subsequent data and, in particular, to evaluate the impact of pruning this hypothesis.
  17. Reduced Error Pruning: to consider each of the decision nodes in the tree to be candidates for pruning. Pruning a decision node consists of removing the subtree rooted at that node, making it a leaf node, and assigning it the most common classification of the training examples affiliated with that node. Nodes are removed only if the resulting pruned tree performs no worse than the original tree over the validation set. Thus any leaf node added due to coincidental regularities in the training set is likely to be pruned because these same coincidences are unlikely to occur in the validation set. Nodes are pruned iteratively, always choosing the node whose removal most increases the decision tree accuracy over the validation set. And this continues until further pruning is harmful for the decision tree. The major drawback of this approach is when the data is limited, validation set reduces even further the number of examples for training.
  18. Infer the decision tree from the available training data, allowing it to grow far enough for overfitting to occur. Convert the learned tree into an equivalent set of rules by creating one rule for each path from the root to a leaf. Prune (generalize) each rule by removing any preconditions whose removal improves its estimated accuracy. Sort the pruned rules by their accuracy and consider them in this sequence when classifying subsequent instances.
  19. In rule post-pruning, one rule is generated for each leaf node in the tree. Each attribute test along the path from the root to the leaf becomes a rule antecedent (precondition) and the classification at the leaf node becomes the rule consequent (postcondition). For example, the leftmost path of the tree in Figure 3.1 is translated into the rule IF (Outlook = Sunny) ∧ (Humidity = High) THEN PlayTennis = No
  20. two candidate thresholds, corresponding to the values of Temperature at which the value of PlayTennis changes: (48 + 60)/2, and (80 + 90)/2. The information gain can then be computed for each of the candidate attributes, temperature>54 and temperature>85, the best can be selected (temperature>54)
  21. In some learning tasks the instance attributes may have associated costs. For example, in learning to classify medical diseases we might describe patients in terms of attributes such as Temperature, BiopsyResult, Pulse, BloodTestResults, etc. These attributes vary significantly in their costs, both in terms of monetary cost and cost to patient comfort. In such tasks, we would prefer decision trees that use low-cost attributes where possible, relying on high-cost attributes only when needed to produce reliable classifications. In such cases we can modify the ID3 algorithm to take Gain formula differently as given.
  22. The pose recognition algorithm in the Kinect motion-sensing device for the Xbox game console has decision tree classifiers at its heart (in fact, an ensemble of decision trees called a random forest, about which you will learn more in Chapter 11 of Flach's book).