ID3 Algorithm &
 ROC Analysis
     Talha KABAKUŞ
talha.kabakus@ibu.edu.tr
Agenda
●   Where are we now?
●   Decision Trees
●   What is ID3?
●   Entropy
●   Information Gain
●   Pros and Cons of ID3
●   An Example - The Simpsons
●   What is ROC Analysis?
●   ROC Space
●   ROC Space Example over predictions
Where are we now?
Decision Trees
● One of the most widely used classification
  approaches because of its clear model and presentation
● Classification is performed using data attributes
● The aim is to estimate the value of a target (destination)
  field using the source fields
● Tree Induction
  ○ Create the tree
  ○ Apply data to the tree to classify it
● Each branch node represents a choice between a
  number of alternatives
● Each leaf node represents a classification or decision
● Leaf Count = Rule Count
Decision Trees (Cont.)
● Leaves are inserted from top to bottom

                    A

           B                     C



   D            E          F           G
Sample Decision Tree
Creating Tree Model by Training Data
Decision Tree Classification Task
Apply Model to Test Data
Apply Model to Test Data (Cont.)
Apply Model to Test Data (Cont.)
Apply Model to Test Data (Cont.)
Apply Model to Test Data (Cont.)
Apply Model to Test Data (Cont.)
Decision Tree Algorithms
● Classification and Regression
  Algorithms
  ○ Twoing
  ○ Gini
● Entropy-based Algorithms
  ○ ID3
  ○ C4.5
● Memory-based (Sample-based)
  Classification Algorithms
Decision Trees by Variable Type

●   Single-Variable Decision Trees
    ○ Classification is done by asking
       questions about a single variable
●   Hybrid Decision Trees
    ○ Classification is done by asking
       questions about both single and multiple
       variables
●   Multiple-Variable Decision Trees
    ○ Classification is done by asking
       questions about multiple variables
ID3 Algorithm
●   Iterative Dichotomizer 3
●   Developed by J. Ross Quinlan in 1979
●   Based on Entropy
●   Only works with discrete data
●   Cannot work with defective (missing) data
●   Its advantage over Hunt's algorithm is that it
    chooses the best attribute at each split
    (Hunt's algorithm chooses randomly)
Entropy
● A formula that measures the homogeneity of a
  sample; it indicates how much information gain
  each split provides
● A completely homogeneous sample has an
  entropy of 0
● An equally divided (50/50) sample has an entropy of 1
● Formula (a minimal sketch is given below):
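The formula image from the original slide is not reproduced in this text version. As a minimal illustrative sketch (not part of the deck), the base-2 entropy over class counts can be written and checked in Python as follows:

# Entropy of a sample given its class counts, E(S) = -sum(p_i * log2(p_i)).
from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

print(entropy([5, 5]))   # 1.0     -> equally divided sample
print(entropy([9, 0]))   # 0 (may print as -0.0) -> completely homogeneous sample
print(entropy([4, 5]))   # ~0.9911 -> the Simpsons example later in the deck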
Information Gain (IG)
● Information Gain calculates effective change
  in entropy after making a decision based on
  the value of an attribute.
● Which attribute creates the most
  homogeneous branches?
● First the entropy of the total dataset is
  calculated.
● The dataset is then split on the different
  attributes.
Information Gain (Cont.)
● The entropy for each branch is calculated.
  Then it is added proportionally, to get total
  entropy for the split.
● The resulting entropy is subtracted from the
  entropy before the split.
● The result is the Information Gain, or
  decrease in entropy.
● The attribute that yields the largest IG is
  chosen for the decision node.
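A minimal illustrative sketch (not part of the original deck) of the procedure just described: the gain of a split is the entropy before the split minus the weighted entropy of the branches it produces.

from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, branch_counts):
    # parent_counts: class counts before the split, e.g. [4, 5]
    # branch_counts: class counts per branch, e.g. [[4, 1], [0, 4]]
    total = sum(parent_counts)
    weighted = sum(sum(b) / total * entropy(b) for b in branch_counts)
    return entropy(parent_counts) - weighted

# The "Weight <= 160" split from the Simpsons example (4F/1M vs 0F/4M):
print(information_gain([4, 5], [[4, 1], [0, 4]]))   # ~0.59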
Information Gain (Cont.)
● A branch set with entropy of 0 is a
  leaf node.
● Otherwise, the branch needs further
  splitting to classify its dataset.
● The ID3 algorithm is run recursively
  on the non-leaf branches, until all data
  is classified.
ID3 Algorithm Steps
function ID3 (R: a set of non-categorical attributes,
          C: the categorical attribute,
          S: a training set) returns a decision tree;
   begin
    If S is empty, return a single node with value Failure;
    If S consists of records all with the same value for
       the categorical attribute,
       return a single node with that value;
    If R is empty, then return a single node with as value
       the most frequent of the values of the categorical attribute
       that are found in records of S; [note that then there
       will be errors, that is, records that will be improperly
       classified];
    Let D be the attribute with largest Gain( D,S)
       among attributes in R;
    Let {dj| j=1,2, .., m} be the values of attribute D;
    Let {Sj| j=1,2, .., m} be the subsets of S consisting
       respectively of records with value dj for attribute D;
    Return a tree with root labeled D and arcs labeled
       d1, d2, .., dm going respectively to the trees

         ID3(R-{D}, C, S1), ID3(R-{D}, C, S2), .., ID3(R-{D}, C, Sm);
   end ID3;
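The pseudocode above can also be rendered as a short runnable sketch. The Python below is an illustrative translation under assumed conventions (records as plain dicts, the categorical attribute passed by name, a nested dict as the tree); it is not the deck's own code.

from collections import Counter
from math import log2

def entropy(records, target):
    total = len(records)
    counts = Counter(r[target] for r in records)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def gain(records, attr, target):
    total = len(records)
    remainder = 0.0
    for value in {r[attr] for r in records}:
        subset = [r for r in records if r[attr] == value]
        remainder += len(subset) / total * entropy(subset, target)
    return entropy(records, target) - remainder

def id3(attributes, target, records):
    if not records:
        return "Failure"                                  # S is empty
    labels = {r[target] for r in records}
    if len(labels) == 1:                                  # all records share one class value
        return labels.pop()
    if not attributes:                                    # R is empty: return the majority class
        return Counter(r[target] for r in records).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(records, a, target))
    tree = {best: {}}
    for value in {r[best] for r in records}:              # one branch per value of the chosen attribute
        subset = [r for r in records if r[best] == value]
        tree[best][value] = id3([a for a in attributes if a != best], target, subset)
    return tree

# Toy usage with made-up records (illustrative only, not the deck's dataset):
data = [
    {"weight": "heavy", "turbo": "no",  "fast": "no"},
    {"weight": "light", "turbo": "yes", "fast": "yes"},
    {"weight": "light", "turbo": "no",  "fast": "no"},
]
print(id3(["weight", "turbo"], "fast", data))   # e.g. {'turbo': {'yes': 'yes', 'no': 'no'}}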
Pros of ID3 Algorithm
● Builds the decision tree in a minimal number of steps
  ○ The most important point in tree
    induction is collecting enough reliable
    data about the relevant properties.
  ○ Asking the right questions drives the tree
    induction.
● Each level benefits from the choices made at
  previous levels
● The whole dataset is scanned to create the tree
Cons of ID3 Algorithm
● The tree cannot be updated when new
  data is classified incorrectly; instead,
  a new tree must be generated.
● Only one attribute at a time is tested
  for making a decision.
● Cannot work with defective data
● Cannot work with numerical (continuous)
  attributes
An Example - The Simpsons
  Person   Hair Length   Weight   Age   Class
  Homer        0''        250      36     M
  Marge        10''       150      34     F
  Bart         2''         90      10     M
  Lisa         6''         78       8     F
  Maggie       4''         20       1     F
  Abe          1''        170      70     M
  Selma        8''        160      41     F
  Otto         10''       180      38     M
  Krusty       6''        200      45     M
Information Gain over Hair Length



E(4F, 5M) = -(4/9)log2(4/9) - (5/9)log2(5/9) = 0.9911 ==> entropy of the entire set

                     Hair Length <= 5
         Yes                              No

E(1F, 3M) = -(1/4)log2(1/4) - (3/4)log2(3/4) = 0.8113
E(3F, 2M) = -(3/5)log2(3/5) - (2/5)log2(2/5) = 0.9710

Gain(Hair Length <= 5) = 0.9911 – (4/9 * 0.8113 + 5/9 * 0.9710) = 0.0911
Information Gain over Weight


E(4F, 5M) = -(4/9)log2(4/9) - (5/9)log2(5/9) = 0.9911 ==> entropy of the entire set

                     Weight <= 160
         Yes                              No

E(4F, 1M) = -(4/5)log2(4/5) - (1/5)log2(1/5) = 0.7219
E(0F, 4M) = 0 (completely homogeneous subset; 0·log2(0) is taken as 0)

Gain(Weight <= 160) = 0.9911 – (5/9 * 0.7219 + 4/9 * 0) = 0.5900
Information Gain over Age


E(4F, 5M) = -(4/9)log2(4/9) - (5/9)log2(5/9) = 0.9911 ==> entropy of the entire set

                     Age <= 40
         Yes                              No

E(3F, 3M) = -(3/6)log2(3/6) - (3/6)log2(3/6) = 1
E(1F, 2M) = -(1/3)log2(1/3) - (2/3)log2(2/3) = 0.9183

Gain(Age <= 40) = 0.9911 – (6/9 * 1 + 3/9 * 0.9183) = 0.0183
Results
 Attribute             Information Gain (IG)
 Hair Length <= 5      0.0911
 Weight <= 160         0.5900
 Age <= 40             0.0183

● As the results show, Weight is the best attribute
  for classifying this group (verified numerically below).
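A quick numerical check (not part of the original slides) of the three gains above, using the class counts from the Simpsons table:

from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def gain(parent, branches):
    total = sum(parent)
    return entropy(parent) - sum(sum(b) / total * entropy(b) for b in branches)

parent = [4, 5]                               # 4 F, 5 M overall
print(gain(parent, [[1, 3], [3, 2]]))         # Hair Length <= 5 -> ~0.0911
print(gain(parent, [[4, 1], [0, 4]]))         # Weight <= 160    -> ~0.5900
print(gain(parent, [[3, 3], [1, 2]]))         # Age <= 40        -> ~0.0183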
Constructed Decision Tree


                             Weight <= 160
    Yes                                             No




               Hair Length <= 5

    Yes                                      No




      Female                                 Male
Entropy over Nominal Values

● If an attribute has nominal values:
  ○ First calculate the entropy of the subset for each
    attribute value
  ○ Then combine them to obtain the attribute's
    information gain (a minimal sketch is given below)
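A minimal illustrative sketch (not part of the original deck) of this two-step procedure for a nominal attribute, using the Engine counts from Example II on the following slides:

from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def gain_nominal(overall_counts, counts_per_value):
    # overall_counts: class counts of the whole set, e.g. [5, 10] (fast, not fast)
    # counts_per_value: class counts for each value of the attribute
    total = sum(overall_counts)
    weighted = sum(sum(c) / total * entropy(c) for c in counts_per_value.values())
    return entropy(overall_counts) - weighted

# Example II, Engine: small = 1 fast / 5 not, medium = 2 / 3, large = 2 / 2
print(gain_nominal([5, 10], {"small": [1, 5], "medium": [2, 3], "large": [2, 2]}))   # ~0.068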
Example II




IE(S) = -(5/15)log2(5/15) - (10/15)log2(10/15) = ~0.918 ==> entropy of the entire set (5 fast, 10 not fast)
Example II (Cont.)
            Information Gain over Engine
 ● Engine: 6 small, 5 medium, 4 large
 ● 3 values for attribute engine, so we need 3 entropy
    calculations
 ● small: 5 no, 1 yes
    ○ IGsmall = -(5/6)log2(5/6)-(1/6)log2(1/6) = ~0.65
 ● medium: 3 no, 2 yes
    ○ IGmedium = -(3/5)log2(3/5)-(2/5)log2(2/5) = ~0.97
 ● large: 2 no, 2 yes
    ○ IGlarge = 1 (evenly distributed subset)
=> IGEngine = IE(S) – [(6/15)*IGsmall + (5/15)*IGmedium + (4/15)*IGlarge]
   IGEngine = 0.918 – 0.85 = 0.068
Example II (Cont.)
          Information Gain over SC/Turbo
● SC/Turbo: 4 yes, 11 no
● 2 values for attribute SC/Turbo, so we need 2 entropy
  calculations
● yes: 2 yes, 2 no
  ○ IGyes = 1 (evenly distributed subset)
● no: 3 yes, 8 no
  ○ IGno = -(3/11)log2(3/11)-(8/11)log2(8/11) = ~0.84

  IGturbo = IE(S) – [(4/15)*IGyes + (11/15)*IGno]
  IGturbo = 0.918 – 0.886 = 0.032
Example II (Cont.)
              Information Gain over Weight
● Weight: 6 Average, 4 Light, 5 Heavy
● 3 values for attribute weight, so we need 3 entropy
  calculations
● average: 3 no, 3 yes
   ○ IGaverage = 1 (evenly distributed subset)
● light: 3 no, 1 yes
   ○ IGlight = -(3/4)log2(3/4)-(1/4)log2(1/4) = ~0.81
● heavy: 4 no, 1 yes
   ○ IGheavy = -(4/5)log2(4/5)-(1/5)log2(1/5) = ~0.72

   IGWeight = IE(S) – [(6/15)*IGaverage + (4/15)*IGlight + (5/15)*IGheavy]
   IGWeight = 0.918 – 0.856 = 0.062
Example II (Cont.)
             Information Gain over Fuel Eco
● Fuel Economy: 2 good, 3 average, 10 bad
● 3 values for attribute Fuel Eco, so we need 3 entropy
  calculations
● good: 0 yes, 2 no
  ○ IGgood = 0 (no variability)
● average: 0 yes, 3 no
  ○ IGaverage = 0 (no variability)
● bad: 5 yes, 5 no
  ○ IGbad = 1 (evenly distributed subset)
    We can omit the good and average terms since their entropy is 0
(those cars always end up not fast).
    IGFuelEco = IE(S) – [(10/15)*IGbad]
    IGFuelEco = 0.918 – 0.667 = 0.251
Example II (Cont.)
●   Results:
    IGEngine    0.068
    IGturbo     0.032
    IGWeight    0.062
    IGFuelEco   0.251   ■   Root of the tree (verified numerically below)
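A quick numerical check (not part of the original slides) of the four root-level gains, using the per-value fast / not-fast counts quoted on the preceding slides:

from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def gain(overall, per_value):
    total = sum(overall)
    return entropy(overall) - sum(sum(c) / total * entropy(c) for c in per_value)

overall = [5, 10]                                  # 5 fast, 10 not fast
print(gain(overall, [[1, 5], [2, 3], [2, 2]]))     # Engine   -> ~0.068
print(gain(overall, [[2, 2], [3, 8]]))             # SC/Turbo -> ~0.032
print(gain(overall, [[3, 3], [1, 3], [1, 4]]))     # Weight   -> ~0.062
print(gain(overall, [[0, 2], [0, 3], [5, 5]]))     # Fuel Eco -> ~0.251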
Example II (Cont.)
●   Since we selected the Fuel Eco attribute for our Root Node, it
    is removed from the table for future calculations.




      Entropy of the remaining set (Fuel Eco = bad) = 1 (evenly distributed: 5 fast, 5 not fast)
Example II (Cont.)
              Information Gain over Engine
● Engine: 1 small, 5 medium, 4 large
● 3 values for attribute engine, so we need 3 entropy calculations
● small: 1 yes, 0 no
   ○ IGsmall = 0 (no variability)
● medium: 2 yes, 3 no
   ○ IGmedium = -(2/5)log2(2/5)-(3/5)log2(3/5) = ~0.97
● large: 2 no, 2 yes
   ○ IGlarge = 1 (evenly distributed subset)

   IGEngine = IE(SFuelEco) – [(5/10)*IGmedium + (4/10)*IGlarge]
   IGEngine = 1 – 0.885 = 0.115
Example II (Cont.)
            Information Gain over SC/Turbo
● SC/Turbo: 3 yes, 7 no
● 2 values for attribute SC/Turbo, so we need 2 entropy calculations
● yes: 2 yes, 1 no
   ○ IGyes = -(2/3)log2(2/3)-(1/3)log2(1/3) = ~0.92
● no: 3 yes, 4 no
   ○ IGno = -(3/7)log2(3/7)-(4/7)log2(4/7) = ~0.99

   IGturbo = IE(SFuelEco) – [(3/10)*IGyes + (7/10)*IGno]
   IGturbo = 1 – 0.965 = 0.035
Example II (Cont.)
              Information Gain over Weight
● Weight: 3 average, 5 heavy, 2 light
● 3 values for attribute weight, so we need 3 entropy calculations
● average: 3 yes, 0 no
   ○ IGaverage = 0 (no variability)
● heavy: 1 yes, 4 no
   ○ IGheavy = -(1/5)log2(1/5)-(4/5)log2(4/5) = ~0.72
● light: 1 yes, 1 no
   ○ IGlight = 1 (evenly distributed subset)

   IGWeight = IE(SFuelEco) – [(5/10)*IGheavy + (2/10)*IGlight]
   IGWeight = 1 – 0.561 = 0.439
Example II (Cont.)
● Results:
IGEngine             0.115

IGturbo              0.035

IGWeight             0.439


Weight has the highest gain and is thus the
best choice (verified numerically below).
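A quick numerical check (not part of the original slides) of these second-level gains, computed on the Fuel Eco = bad subset (5 fast, 5 not fast):

from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def gain(overall, per_value):
    total = sum(overall)
    return entropy(overall) - sum(sum(c) / total * entropy(c) for c in per_value)

subset = [5, 5]                                    # entropy = 1
print(gain(subset, [[1, 0], [2, 3], [2, 2]]))      # Engine   -> ~0.115
print(gain(subset, [[2, 1], [3, 4]]))              # SC/Turbo -> ~0.035
print(gain(subset, [[3, 0], [1, 4], [1, 1]]))      # Weight   -> ~0.439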
Example II (Cont.)
Since there are only two records where Weight = Light,
and SC/Turbo separates them consistently, we can complete
the Weight = Light path with an SC/Turbo test.
Example II (Cont.)
● Updated Table: (Weight = Heavy)




● All cars with large engines in this table are not fast.
● Due to inconsistent patterns in the data, there is no way
  to proceed further, since medium-sized engines may lead
  to either fast or not fast.
ROC Analysis
● Receiver Operating Characteristic
● The limitations of diagnostic “accuracy” as a measure
  of decision performance require introduction of the
  concepts of the “sensitivity” and “specificity” of a
  diagnostic test. These measures and the related
  indices, “true positive rate” and “false positive
  rate”, are more meaningful than “accuracy”.
● The ROC curve provides a complete description of this
  decision-threshold effect, indicating all possible
  combinations of the relative frequencies of the various
  kinds of correct and incorrect decisions.
ROC Analysis (Cont.)
● Combinations of correct & incorrect decisions:

  Actual Value   Prediction Outcome   Description
  p              p                    True Positive (TP)
  p              n                    False Negative (FN)
  n              p                    False Positive (FP)
  n              n                    True Negative (TN)

● TPR (true positive rate) is equivalent to sensitivity.
● FPR (false positive rate) is equal to 1 - specificity.
● The best possible prediction would be 100% sensitivity
  and 100% specificity (which means FPR = 0%).
ROC Space
● A ROC space is defined by FPR and TPR as x
  and y axes respectively, which depicts relative
  trade-offs between true positive (benefits) and
  false positive (costs).
● Since TPR is equivalent with sensitivity and
  FPR is equal to 1 − specificity, the ROC graph
  is sometimes called the sensitivity vs (1 −
  specificity) plot.
● Each prediction result represents one point in the
  ROC space.
Calculations
● Sensitivity (True Positive Rate)
  ○ TPR = TP / P = TP / (TP + FN)
● 1 − Specificity (False Positive Rate)
  ○ FPR = FP / N = FP / (FP + TN)
● Accuracy
  ○ ACC = (TP + TN) / (P + N)
A ROC Space Example
● Let A, B, C, and D be predictions over 100
  negative and 100 positive instances (verified
  numerically below):
Prediction   TP   FP   FN   TN   TPR    FPR    ACC

    A        63   28   37   72   0.63   0.28   0.68
    B        77   77   23   23   0.77   0.77   0.50
    C        24   88   76   12   0.24   0.88   0.18
    D        76   12   24   88   0.76   0.12   0.82
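A small sketch (not part of the original slides) that recomputes the table's metrics from the formulas on the Calculations slide:

def roc_metrics(tp, fp, fn, tn):
    tpr = tp / (tp + fn)                       # sensitivity
    fpr = fp / (fp + tn)                       # 1 - specificity
    acc = (tp + tn) / (tp + fp + fn + tn)
    return tpr, fpr, acc

predictions = {"A": (63, 28, 37, 72), "B": (77, 77, 23, 23),
               "C": (24, 88, 76, 12), "D": (76, 12, 24, 88)}
for name, (tp, fp, fn, tn) in predictions.items():
    print(name, roc_metrics(tp, fp, fn, tn))
# A -> (0.63, 0.28, 0.675), B -> (0.77, 0.77, 0.5), C -> (0.24, 0.88, 0.18),
# D -> (0.76, 0.12, 0.82); these match the table (A's accuracy, 0.675, is
# shown there rounded to 0.68).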
A ROC Space Example (Cont.)
References
1. Data Mining Course Lectures, Asst. Prof. Nilüfer Yurtay
2. Quinlan, J. R., Induction of Decision Trees, Machine
   Learning, 1, pp. 81-106, 1986
3. http://www.cse.unsw.edu.au/~billw/cs9414/notes/ml/06prop/id3/id3.html
4. J. Han, M. Kamber, J. Pei, Data Mining: Concepts and
   Techniques, 3rd Edition, Elsevier, 2011
5. http://www.cise.ufl.edu/~ddd/cap6635/Fall-97/Short-papers/2.htm
6. C. E. Metz, Basic Principles of ROC Analysis, Seminars
   in Nuclear Medicine, Volume 8, Issue 4, pp. 283-298, 1978
