DECISION TREE
3/31/2020 Shivani Saluja 1
INTRODUCTION
• Decision trees are a type of supervised machine learning model.
• Decision tree analysis is a general-purpose predictive modelling tool.
• The data is split repeatedly according to the value of a chosen parameter (attribute).
• Decision trees are constructed by an algorithm that identifies ways to split a data set based on different conditions.
RULES
• The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features.
• The decision rules generally take the form of if-then-else statements.
• The deeper the tree, the more complex the rules and the more closely the model fits the training data (a very deep tree can overfit).
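The if-then-else form of the rules can be sketched directly in code. The function below is a minimal, hand-written illustration based on the fit/unfit example from the speaker notes; the function name, the age threshold of 30 and the order of the questions are assumptions made for illustration, not rules learned from data.

```python
def predict_fitness(age: int, eats_lots_of_pizza: bool, exercises: bool) -> str:
    """Hand-written decision rules (illustrative only, not learned from data)."""
    if age < 30:                    # root decision node; threshold is an assumption
        if eats_lots_of_pizza:      # second-level decision node
            return "unfit"          # leaf
        else:
            return "fit"            # leaf
    else:
        if exercises:               # second-level decision node
            return "fit"            # leaf
        else:
            return "unfit"          # leaf


print(predict_fitness(age=25, eats_lots_of_pizza=False, exercises=False))
print(predict_fitness(age=45, eats_lots_of_pizza=True, exercises=False))
```

Each path from the first `if` to a `return` corresponds to one rule; a learning algorithm chooses these questions and thresholds automatically.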
TERMINOLOGIES
• Root Node: Represents the entire population or sample, which is then divided into two or more increasingly homogeneous sets.
• Splitting: The process of dividing a node into two or more sub-nodes.
• Decision Node: A sub-node that splits into further sub-nodes.
• Leaf / Terminal Node: A node with no children (no further split).
• Pruning: Reducing the size of a decision tree by removing nodes (the opposite of splitting).
• Branch / Sub-Tree: A subsection of a decision tree.
• Parent and Child Node: A node that is divided into sub-nodes is the parent of those sub-nodes; the sub-nodes are its children.
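The terms above map naturally onto a small data structure. The following is a minimal sketch, assuming a hypothetical `Node` class and helper names (`count_leaves`, `prune`) that are not part of the slides; the attribute names reuse the fit/unfit illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """One node of a decision tree (hypothetical structure for illustration)."""
    attribute: Optional[str] = None   # attribute tested here (decision nodes)
    label: Optional[str] = None       # outcome stored here (leaf nodes)
    children: Dict[str, "Node"] = field(default_factory=dict)  # value -> child node

    def is_leaf(self) -> bool:
        # A leaf / terminal node has no children.
        return not self.children


def count_leaves(node: Node) -> int:
    """Count the terminal nodes in the sub-tree rooted at `node`."""
    if node.is_leaf():
        return 1
    return sum(count_leaves(child) for child in node.children.values())


def prune(node: Node, label: str) -> None:
    """Turn a decision node into a leaf by discarding its sub-tree."""
    node.children = {}
    node.attribute = None
    node.label = label


# The root splits on "exercises"; its "no" branch is a further decision node.
diet = Node(attribute="eats_pizza",
            children={"yes": Node(label="unfit"), "no": Node(label="fit")})
root = Node(attribute="exercises",
            children={"yes": Node(label="fit"), "no": diet})

leaves_before = count_leaves(root)
prune(diet, "unfit")                  # the sub-tree under `diet` is removed
leaves_after = count_leaves(root)
print(leaves_before, leaves_after)
```

Here `root` is the parent of `diet`, `diet` is a decision node until it is pruned, and pruning replaces its branch with a single leaf.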
ENTITIES
• Decision nodes: The points where the data is split.
• Leaves: The decisions, or final outcomes.
TYPES OF DECISION TREES
Classification trees (Yes/No types)
• A tree whose outcome is a category, such as ‘fit’ or ‘unfit’, is a classification tree. Here the decision variable is categorical.
Regression trees (Continuous data types)
• Here the decision or outcome variable is continuous, e.g. a number like 123.
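One practical difference between the two types is what a leaf returns. The sketch below assumes the common convention that a classification leaf predicts the majority class of the training examples that reach it and a regression leaf predicts their mean; the function names and sample values are hypothetical.

```python
from collections import Counter
from statistics import mean
from typing import List


def classification_leaf(labels: List[str]) -> str:
    """Predict the most frequent class among the examples in the leaf."""
    return Counter(labels).most_common(1)[0][0]


def regression_leaf(values: List[float]) -> float:
    """Predict the average target value of the examples in the leaf."""
    return mean(values)


class_prediction = classification_leaf(["fit", "unfit", "fit"])
value_prediction = regression_leaf([120.0, 123.0, 126.0])
print(class_prediction, value_prediction)
```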
EXPRESSIVENESS OF DECISION TREES
• Decision trees can represent any Boolean function of the input attributes.
• For example, a decision tree can compute the functions AND and OR.
DECISION TREE FOR OR
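The figure for this slide did not survive extraction. As a stand-in, here is a minimal sketch of how OR (and, for comparison, AND) can be written as a tree of attribute tests on two Boolean inputs A and B; the function names are hypothetical and the layout is one of several equivalent trees.

```python
from itertools import product


def or_tree(a: bool, b: bool) -> bool:
    if a:                 # root node tests A
        return True       # leaf: A is true, so A OR B is true
    if b:                 # decision node on the A = False branch tests B
        return True       # leaf
    return False          # leaf: both inputs are false


def and_tree(a: bool, b: bool) -> bool:
    if a:                 # root node tests A
        if b:             # decision node on the A = True branch tests B
            return True   # leaf: both inputs are true
        return False      # leaf
    return False          # leaf: A is false, so A AND B is false


# Compare each tree against Python's own Boolean operators on all inputs.
for a, b in product([False, True], repeat=2):
    print(a, b, or_tree(a, b), and_tree(a, b))
```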
SELECT THE BEST ATTRIBUTE → A
• The best attribute is the one with the highest information gain.
• Information gain is a measure of how well an attribute splits the data into groups according to their classification.
• ID3 is a greedy algorithm that grows the tree top-down, at each node selecting the attribute that best classifies the local training examples. This process continues until the tree perfectly classifies the training examples or until all attributes have been used.
ENTROPY
• Entropy, also called Shannon entropy and denoted H(S) for a finite set S, measures the amount of uncertainty or randomness in the data.
• It tells us how predictable an event is.
• Lower values imply less uncertainty, while higher values imply more uncertainty.
• If the sample is completely homogeneous, the entropy is zero; if a two-class sample is equally divided, the entropy is one (with logarithms taken to base 2).
• $H(S) = -\sum_{x} P(x) \log_2 P(x)$, where P(x) is the proportion of examples in S that belong to class x.
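A short sketch of the entropy calculation, assuming the standard Shannon definition with base-2 logarithms. The function name `entropy` and the coin examples (taken from the speaker notes) are illustrative.

```python
from collections import Counter
from math import log2
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of the class labels in a sample."""
    total = len(labels)
    # Sum -p * log2(p) over the classes that actually occur in the sample.
    return sum(-(count / total) * log2(count / total)
               for count in Counter(labels).values())


fair_coin = ["heads", "tails"]          # equally divided sample
two_headed_coin = ["heads", "heads"]    # completely homogeneous sample

print(entropy(fair_coin))        # maximum uncertainty for two classes
print(entropy(two_headed_coin))  # no uncertainty
```

The fair coin gives an entropy of one bit and the two-headed coin gives zero, matching the two extremes described in the bullets.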
INFORMATION GAIN
• Information gain, denoted IG(S, A) for a set S, is the reduction in entropy obtained by splitting S on a particular attribute A. In the decision-tree setting it equals the mutual information between A and the class label, which is an expected Kullback-Leibler divergence; the two terms are related but not interchangeable.
• It measures the change in entropy with respect to the independent variables.
• $IG(S, A) = H(S) - \sum_{x} P(x)\, H(S_x)$
where IG(S, A) is the information gain from applying feature A, H(S) is the entropy of the entire set, and the second term is the entropy remaining after applying feature A: x ranges over the values of A, P(x) is the probability of the event A = x, and S_x is the subset of S in which A = x.
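The definition can be sketched in a few lines of code: compute the entropy of the whole set, subtract the weighted entropy of each subset produced by the attribute, and pick the attribute with the largest result. The row format (a list of dicts), the function names and the four-row toy data are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum(-(c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target before the split minus the weighted entropy after it."""
    before = entropy([row[target] for row in rows])
    subsets = defaultdict(list)
    for row in rows:
        subsets[row[attribute]].append(row[target])   # group labels by attribute value
    after = sum(len(labels) / len(rows) * entropy(labels)
                for labels in subsets.values())
    return before - after


# Hypothetical toy data: "windy" separates the classes perfectly, "humid" not at all.
rows = [
    {"windy": "yes", "humid": "yes", "play": "no"},
    {"windy": "yes", "humid": "no", "play": "no"},
    {"windy": "no", "humid": "yes", "play": "yes"},
    {"windy": "no", "humid": "no", "play": "yes"},
]

gains = {a: information_gain(rows, a, "play") for a in ("windy", "humid")}
best_attribute = max(gains, key=gains.get)
print(gains, best_attribute)
```

An attribute that separates the classes perfectly recovers the full entropy of the set as gain, while an attribute that leaves every subset as mixed as the original has a gain of zero.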
DECISION TREE LEARNING ALGORITHM (ID3)
• Builds decision trees using a top-down, greedy approach.
• Select the best attribute → A.
• Assign A as the decision attribute (test case) for the NODE.
• For each value of A, create a new descendant of the NODE.
• Sort the training examples into the appropriate descendant leaf nodes.
• If the examples are perfectly classified, STOP; otherwise repeat the procedure on each new leaf node.
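The steps above can be sketched as a short recursive function. This is a minimal illustration rather than a reference implementation: it assumes categorical attributes, rows stored as dicts, a tree represented as nested dicts, and a majority vote when the attributes run out. All names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    total = len(labels)
    return sum(-(c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    subsets = defaultdict(list)
    for row in rows:
        subsets[row[attribute]].append(row[target])
    after = sum(len(s) / len(rows) * entropy(s) for s in subsets.values())
    return entropy([row[target] for row in rows]) - after


def id3(rows, attributes, target):
    """Return a leaf label, or a nested dict {attribute: {value: subtree}}."""
    labels = [row[target] for row in rows]
    if len(set(labels)) == 1:            # perfectly classified: stop
        return labels[0]
    if not attributes:                   # no attributes left: majority vote
        return Counter(labels).most_common(1)[0][0]
    # Greedy step: select the attribute with the highest information gain.
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    # Sort the training examples into one descendant per value of the attribute.
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[best]].append(row)
    # Repeat the procedure on each new descendant.
    return {best: {value: id3(subset, remaining, target)
                   for value, subset in partitions.items()}}


rows = [
    {"windy": "yes", "humid": "yes", "play": "no"},
    {"windy": "yes", "humid": "no", "play": "no"},
    {"windy": "no", "humid": "yes", "play": "yes"},
    {"windy": "no", "humid": "no", "play": "yes"},
]
tree = id3(rows, ["windy", "humid"], "play")
print(tree)
```

Because the choice at each node is greedy and never revisited, the resulting tree is not guaranteed to be the smallest one consistent with the data.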
EXAMPLE
• Consider a data set collected over the course of 14 days, where the features are Outlook, Temperature, Humidity and Wind, and the outcome variable is whether golf was played on that day. Our job is to build a predictive model that takes these four features and predicts whether golf will be played on a given day. We’ll build a decision tree to do that using the ID3 algorithm.
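The data table and the worked slides that followed did not survive extraction. The sketch below therefore ASSUMES the widely used 14-day play-golf (play-tennis) toy data set with the same four features; the rows may differ from the ones on the original slides, and the column name `Play` is hypothetical. It computes the entropy of the outcome and the information gain of each feature to choose the root of the tree.

```python
from collections import Counter, defaultdict
from math import log2

FEATURES = ["Outlook", "Temperature", "Humidity", "Wind"]
TARGET = "Play"

# ASSUMED data: the classic 14-day toy data set, not recovered from the slides.
RAW = """
Sunny Hot High Weak No
Sunny Hot High Strong No
Overcast Hot High Weak Yes
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Overcast Cool Normal Strong Yes
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Overcast Mild High Strong Yes
Overcast Hot Normal Weak Yes
Rain Mild High Strong No
"""
rows = [dict(zip(FEATURES + [TARGET], line.split()))
        for line in RAW.strip().splitlines()]


def entropy(labels):
    total = len(labels)
    return sum(-(c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    subsets = defaultdict(list)
    for row in rows:
        subsets[row[attribute]].append(row[target])
    after = sum(len(s) / len(rows) * entropy(s) for s in subsets.values())
    return entropy([row[target] for row in rows]) - after


total_entropy = entropy([row[TARGET] for row in rows])
gains = {f: information_gain(rows, f, TARGET) for f in FEATURES}
root_attribute = max(gains, key=gains.get)

print(f"H(S) = {total_entropy:.3f}")
for feature, gain in sorted(gains.items(), key=lambda item: -item[1]):
    print(f"IG(S, {feature}) = {gain:.3f}")
print("Root attribute:", root_attribute)
```

Under these assumed rows, Outlook has the largest information gain (about 0.247), so ID3 would place it at the root and then repeat the same calculation within each Outlook branch.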

Editor's Notes

  1. A decision tree can be illustrated with a binary tree. Say you want to predict whether a person is fit, given information such as their age, eating habits and physical activity. The decision nodes are questions like ‘What’s the age?’, ‘Does he exercise?’ and ‘Does he eat a lot of pizzas?’, and the leaves are outcomes such as ‘fit’ or ‘unfit’. In this case it is a binary classification problem (a yes/no type of problem).
  2. For example, consider a coin toss where the probability of heads is 0.5 and the probability of tails is 0.5. Here the entropy is the highest possible, since there is no way of determining what the outcome might be. Alternatively, consider a coin that has heads on both sides: the outcome of such an event can be predicted perfectly, since we know beforehand that it will always be heads. In other words, this event has no randomness, hence its entropy is zero.