2. INTRODUCTION
• Decision Trees are a type of supervised machine learning algorithm.
• Decision Tree Analysis is a general, predictive modelling tool.
• The data is continuously split according to a certain parameter.
• Decision trees are constructed via an algorithmic approach that identifies ways to split a data set
based on different conditions.
3/31/2020 Shivani Saluja 2
3. RULES
• The goal is to create a model that predicts the value of a target variable by learning simple
decision rules inferred from the data features.
• The decision rules are generally in the form of if-then-else statements.
• The deeper the tree, the more complex the rules and the fitter the model.
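The if-then-else form of these rules can be sketched directly in code. A minimal hand-written example, based on the deck's fit/unfit illustration; the attribute names, thresholds, and function name here are hypothetical, chosen only to show the shape of the rules:

```python
def predict_fitness(age, exercises_daily, eats_pizza_often):
    """A hand-written decision tree expressed as nested if-then-else rules.
    Attributes and thresholds are hypothetical, for illustration only."""
    if age < 30:
        # Young people: diet is the deciding question in this sketch.
        if eats_pizza_often:
            return "unfit"
        return "fit"
    else:
        # Older people: exercise is the deciding question.
        if exercises_daily:
            return "fit"
        return "unfit"

print(predict_fitness(25, False, False))  # fit
print(predict_fitness(40, False, False))  # unfit
```

A learned tree is exactly this kind of rule nest, except the splits and thresholds are chosen automatically from the data rather than by hand.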
4. TERMINOLOGIES
• Root Node: It represents the entire population or sample, which further gets divided into
two or more homogeneous sets.
• Splitting: The process of dividing a node into two or more sub-nodes.
• Decision Node: When a sub-node splits into further sub-nodes, it is called a decision
node.
• Leaf / Terminal Node: Nodes with no children (no further splits) are called leaf or terminal
nodes.
• Pruning: When we reduce the size of a decision tree by removing nodes (the opposite of
splitting), the process is called pruning.
• Branch / Sub-Tree: A subsection of a decision tree is called a branch or sub-tree.
• Parent and Child Node: A node that is divided into sub-nodes is called the parent node of
those sub-nodes, and the sub-nodes are its children.
5. ENTITIES
• Decision nodes: the nodes where the data is split.
• Leaves: the decisions or final outcomes.
6. TYPES OF DECISION TREES
Classification trees (Yes/No types)
• The fit/unfit example is an example of a
classification tree, where the outcome was a
variable like ‘fit’ or ‘unfit’. Here the decision
variable is Categorical.
Regression trees (Continuous data types)
• Here the decision or the outcome variable
is Continuous, e.g. a number like 123.
7. EXPRESSIVENESS OF DECISION TREES
• Decision trees can represent any Boolean function of the input attributes.
• For example, decision trees can perform the functions AND and OR.
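To make the expressiveness claim concrete, here is a minimal sketch of AND and OR written as decision trees, i.e. as nested attribute tests (the function names are mine, for illustration):

```python
def tree_and(x, y):
    """x AND y as a depth-2 decision tree: test x first, then y."""
    if x:
        if y:
            return True   # leaf: both attributes true
        return False      # leaf: x true, y false
    return False          # leaf: x false, y never tested

def tree_or(x, y):
    """x OR y as a decision tree: a true branch can stop immediately."""
    if x:
        return True       # leaf: x alone decides
    if y:
        return True
    return False

print(tree_and(True, True), tree_or(False, False))  # True False
```

Each internal if tests one attribute and each return is a leaf, so these functions are literal decision trees for the two Boolean operations.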
9. SELECT THE BEST ATTRIBUTE → A
• The best attribute is the one with the highest information gain:
• a measure that expresses how well an attribute splits the data into groups based on
their classification.
• ID3 is a greedy algorithm that grows the tree top-down, at each node selecting the
attribute that best classifies the local training examples. This process continues until
the tree perfectly classifies the training examples or until all attributes have been used.
10. ENTROPY
• Entropy, also called Shannon entropy and denoted by H(S) for a finite set S, is a measure
of the amount of uncertainty or randomness in data.
• It tells us about the predictability of a certain event.
• lower values imply less uncertainty
• while higher values imply high uncertainty.
• If the sample is completely homogeneous the entropy is zero and if the sample is equally
divided then it has entropy of one.
• H(S) = − Σ p(x) log₂ p(x), where the sum runs over the classes x in S and p(x) is the
proportion of S belonging to class x.
11. INFORMATION GAIN
• Information gain, sometimes identified with the Kullback–Leibler divergence and denoted by IG(S, A) for a
set S, is the effective change in entropy after deciding on a particular attribute A.
• It measures the relative change in entropy with respect to the independent variables.
IG(S, A) = H(S) − Σ P(x) H(x), where IG(S, A) is the
information gain from applying feature A, H(S) is the
entropy of the entire set, and the second term
calculates the weighted entropy after applying
feature A, where P(x) is the probability of event x.
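The definition above can be sketched directly: total entropy minus the weighted entropy of the subsets produced by the split. A minimal version, assuming rows stored as tuples with the class label last (the function and variable names are mine):

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr_index, label_index=-1):
    """IG(S, A): entropy of the whole set minus the weighted
    entropy of the subsets produced by splitting on attribute A."""
    labels = [r[label_index] for r in rows]
    groups = defaultdict(list)
    for r in rows:
        groups[r[attr_index]].append(r[label_index])
    # P(x) is each subset's share of the rows; H(x) its entropy.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# A hypothetical toy set where attribute 0 splits the classes perfectly:
toy_rows = [("sunny", "no"), ("sunny", "no"), ("rain", "yes"), ("rain", "yes")]
print(information_gain(toy_rows, 0))  # 1.0
```

A perfect split removes all uncertainty, so the gain equals the full starting entropy of one bit.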
12. DECISION TREE LEARNING ALGORITHM (ID3)
• Builds decision trees using a top-down, greedy approach
• Select the best attribute → A
• Assign A as the decision attribute (test case) for the NODE.
• For each value of A, create a new descendant of the NODE.
• Assign the training examples to the appropriate descendant leaf node.
• If examples are perfectly classified, then STOP else iterate over the new leaf nodes.
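The steps above can be sketched as a short recursive function. This is a minimal ID3 sketch, not a production implementation; the representation (rows as (feature-dict, label) pairs) and all names are my own choices:

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def id3(rows, attrs):
    """rows: list of (features_dict, label); attrs: attributes still unused."""
    labels = [label for _, label in rows]
    if len(set(labels)) == 1:
        return labels[0]                      # perfectly classified -> STOP at a leaf
    if not attrs:
        return Counter(labels).most_common(1)[0][0]   # no attributes left -> majority leaf

    def gain(a):                              # select the best attribute A
        groups = defaultdict(list)
        for feats, label in rows:
            groups[feats[a]].append(label)
        remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
        return entropy(labels) - remainder

    best = max(attrs, key=gain)               # A becomes this node's test attribute
    branches = {}
    for value in {feats[best] for feats, _ in rows}:
        subset = [(f, l) for f, l in rows if f[best] == value]
        branches[value] = id3(subset, [a for a in attrs if a != best])
    return (best, branches)                   # internal node: (attribute, children)

def predict(tree, feats):
    while isinstance(tree, tuple):            # walk down until a leaf label
        attr, branches = tree
        tree = branches[feats[attr]]
    return tree

# Hypothetical toy data: the label is simply x AND y.
and_rows = [({"x": 0, "y": 0}, "no"), ({"x": 0, "y": 1}, "no"),
            ({"x": 1, "y": 0}, "no"), ({"x": 1, "y": 1}, "yes")]
tree = id3(and_rows, ["x", "y"])
print(predict(tree, {"x": 1, "y": 1}))  # yes
```

Because ID3 is greedy, each attribute choice is locally best by information gain and is never revisited, matching the top-down description on the slide.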
13. EXAMPLE
• Consider a piece of data collected over the course of 14 days where the features are
Outlook, Temperature, Humidity, Wind and the outcome variable is whether Golf was
played on the day. Now, our job is to build a predictive model which takes in the above four
parameters and predicts whether golf will be played on the day. We’ll build a decision
tree to do that using the ID3 algorithm.
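The slide does not reproduce the 14-day table itself. Assuming it follows the classic play-golf (play-tennis) data from Quinlan's ID3 work, which matches the description of 14 days with these four features, the first step of ID3 (measuring the gain of an attribute such as Outlook) can be sketched like this; the specific day values below are that assumed data set, not taken from the slide:

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

# (Outlook, Played?) for each of the 14 days -- values assumed from the
# classic play-tennis data set, not given on the slide itself.
days = [("Sunny", "No"), ("Sunny", "No"), ("Overcast", "Yes"), ("Rain", "Yes"),
        ("Rain", "Yes"), ("Rain", "No"), ("Overcast", "Yes"), ("Sunny", "No"),
        ("Sunny", "Yes"), ("Rain", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"),
        ("Overcast", "Yes"), ("Rain", "No")]

labels = [played for _, played in days]
print(round(entropy(labels), 3))              # 0.94  (9 Yes vs 5 No overall)

groups = defaultdict(list)
for outlook, played in days:
    groups[outlook].append(played)
remainder = sum(len(g) / len(days) * entropy(g) for g in groups.values())
print(round(entropy(labels) - remainder, 3))  # 0.247 = IG(S, Outlook)
```

Repeating this gain calculation for Temperature, Humidity, and Wind and picking the maximum is exactly how ID3 chooses the root of the golf tree.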
An example of a decision tree can be explained using the above binary tree. Let’s say you want to predict whether a person is fit given information like their age, eating habits, and physical activity. The decision nodes here are questions like ‘What’s the age?’, ‘Does he exercise?’, or ‘Does he eat a lot of pizza?’, and the leaves are outcomes like ‘fit’ or ‘unfit’. In this case it was a binary classification problem (a yes/no type problem).
For example, consider a coin toss whose probability of heads is 0.5 and probability of tails is 0.5. Here the entropy is the highest possible, since there’s no way of determining what the outcome might be. Alternatively, consider a coin which has heads on both sides: the outcome of such an event can be predicted perfectly, since we know beforehand that it’ll always be heads. In other words, this event has no randomness, hence its entropy is zero.