Introduction to Machine Learning
        Lecture 6

               Albert Orriols i Puig
              aorriols@salle.url.edu

     Artificial Intelligence – Machine Learning
         Enginyeria i Arquitectura La Salle
             Universitat Ramon Llull
Recap of Lecture 4
        ID3 is a strong system that
                Uses hill-climbing search based on the information gain
                measure to search through the space of decision trees
                Outputs a single hypothesis
                Never backtracks. It converges to locally optimal solutions
                Uses all training examples at each step, contrary to methods
                that make decisions incrementally
                Uses statistical properties of all examples: the search is less
                sensitive to errors in individual training examples
                Can handle noisy data by modifying its termination criterion to
                accept hypotheses that imperfectly fit the data




                                                                              Slide 2
Artificial Intelligence                 Machine Learning
Recap of Lecture 4

        However, ID3 has some drawbacks
                It can only deal with nominal data
                It is not able to deal with noisy data sets
                It may not be robust in the presence of noise




                                                              Slide 3
Today’s Agenda


        Going from ID3 to C4.5
        How C4.5 enhances ID3 to
                Be robust in the presence of noise and avoid overfitting
                Deal with continuous attributes
                Deal with missing data
                Convert trees to rules




                                                                  Slide 4
What’s Overfitting?
      Overfitting = Given a hypothesis space H, a hypothesis h∈H is said to
      overfit the training data if there exists some alternative hypothesis h'∈H,
      such that
           1.   h has smaller error than h' over the training examples, but
           2.   h' has a smaller error than h over the entire distribution of instances




                                                                                           Slide 5
Why May my System Overfit?
           In domains with noise or uncertainty
                   the system may try to decrease the training error by
                   completely fitting all the training examples

       [Figure: the learner overfits to correctly classify the noisy instances]

Occam's razor: Prefer the simplest hypothesis that fits
the data with high accuracy




                                                                                   Slide 6
How to Avoid Overfitting?
        Ok, my system may overfit… Can I avoid it?
                Sure! Do not include branches that fit the data too specifically
        How?
        1.      Pre-prune: Stop growing a branch when information becomes
                unreliable
        2.      Post-prune: Take a fully-grown decision tree and discard
                unreliable parts




                                                                                Slide 7
Pre-pruning
        Based on a statistical significance test
                Stop growing the tree when there is no statistically significant
                association between any attribute and the class at a particular
                node
                Use all available data for training and apply the statistical test
                to estimate whether expanding/pruning a node is likely to
                produce an improvement beyond the training set
        Most popular test: chi-squared test
        ID3 used the chi-squared test in addition to information gain
                Only statistically significant attributes were allowed to be
                selected by the information gain procedure




                                                                               Slide 8
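The chi-squared statistic for the attribute–class association mentioned above can be sketched as follows. This is an illustrative implementation, not C4.5's code; in practice the statistic is compared against a critical value for the appropriate degrees of freedom, which is omitted here:

```python
def chi_squared(contingency):
    """Chi-squared statistic for an attribute-value x class contingency
    table: contingency[i][j] = examples with attribute value i, class j."""
    row_totals = [sum(row) for row in contingency]
    col_totals = [sum(col) for col in zip(*contingency)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(contingency):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat

# A strongly associated attribute yields a large statistic...
print(chi_squared([[8, 0], [0, 8]]))  # 16.0
# ...while an attribute independent of the class yields zero.
print(chi_squared([[4, 4], [4, 4]]))  # 0.0
```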
Pre-pruning
 Early stopping: Pre-pruning may stop the growth process prematurely
 Classic example: XOR/parity problem
              No individual attribute exhibits any significant association with the class
              Structure is only visible in the fully expanded tree
              Pre-pruning won't expand the root node
 But: XOR-type problems are rare in practice
 And: pre-pruning is faster than post-pruning

                           x1   x2    Class
                 1         0    0      0
                 2         0    1      1
                 3         1    0      1
                 4         1    1      0




                                                                                    Slide 9
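Why pre-pruning fails here can be checked directly: on the XOR table above, the information gain of either attribute at the root is exactly zero. A minimal sketch (helper names are my own):

```python
import math

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def info_gain(rows, attr):
    """Information gain of attribute index `attr` for rows of (x1, x2, class)."""
    def class_counts(subset):
        positives = sum(cls for *_, cls in subset)
        return [positives, len(subset) - positives]
    gain = entropy(class_counts(rows))
    for value in {row[attr] for row in rows}:
        subset = [row for row in rows if row[attr] == value]
        gain -= len(subset) / len(rows) * entropy(class_counts(subset))
    return gain

xor = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(info_gain(xor, 0), info_gain(xor, 1))  # 0.0 0.0
```

Each branch of either split still contains one example of each class, so the weighted entropy after the split equals the entropy before it.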
Post-pruning
         First, build the full tree
         Then, prune it
                 Fully-grown tree shows all attribute interactions
         Problem: some subtrees might be due to chance effects
         Two pruning operations:
         1.      Subtree replacement
         2.      Subtree raising
         Possible strategies:
                 error estimation
                 significance testing
                 MDL principle


                                                                     Slide 10
Subtree Replacement
        Bottom-up approach
        Consider replacing a tree after considering all its subtrees
        Ex: labor negotiations




                                                                       Slide 11
Subtree Replacement
Algorithm:
1. Split the data into training and validation set
2. Do until further pruning is harmful:
   a. Evaluate impact on the validation set of pruning
      each possible node
   b. Select the node whose removal most increases
      the validation set accuracy




                                                                  Slide 12
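The reduced-error pruning loop above can be sketched with a minimal tree structure. The `Node` class and the instance layout are illustrative assumptions, not C4.5's actual data structures:

```python
class Node:
    def __init__(self, attr=None, branches=None, label=None, majority=None):
        self.attr = attr                # attribute tested at this node
        self.branches = branches or {}  # attribute value -> child Node
        self.label = label              # set only on leaves
        self.majority = majority        # majority class of examples reaching here

    def predict(self, x):
        if self.label is not None:
            return self.label
        child = self.branches.get(x[self.attr])
        return child.predict(x) if child is not None else self.majority

def accuracy(tree, data):
    return sum(tree.predict(x) == y for x, y in data) / len(data)

def internal_nodes(tree):
    if tree.label is not None:
        return []
    nodes = [tree]
    for child in tree.branches.values():
        nodes.extend(internal_nodes(child))
    return nodes

def reduced_error_prune(tree, validation):
    """Turn into a leaf, one at a time, the internal node whose removal
    most increases validation accuracy; stop when pruning is harmful."""
    while True:
        best, best_acc = None, accuracy(tree, validation)
        for node in internal_nodes(tree):
            attr, branches = node.attr, node.branches
            node.attr, node.branches, node.label = None, {}, node.majority
            acc = accuracy(tree, validation)   # accuracy if pruned here
            if acc >= best_acc:                # ties favour the smaller tree
                best, best_acc = node, acc
            node.attr, node.branches, node.label = attr, branches, None
        if best is None:
            return tree
        best.attr, best.branches, best.label = None, {}, best.majority

# Toy example: a subtree on 'b' that fits noise; pruning it to its
# majority class improves validation accuracy, so it becomes a leaf.
sub = Node(attr='b', branches={0: Node(label=1), 1: Node(label=0)}, majority=1)
root = Node(attr='a', branches={0: Node(label=0), 1: sub}, majority=0)
val = [({'a': 0, 'b': 0}, 0), ({'a': 1, 'b': 0}, 1), ({'a': 1, 'b': 1}, 1)]
reduced_error_prune(root, val)
```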
Subtree Raising
                                                  Delete node
                                                  Redistribute instances
                                                  Slower than subtree
                                                  replacement
                                                  (Worthwhile?)





                                                                           Slide 13
Estimating Error Rates
        Ok, we can prune. But when?
                Prune only if it reduces the estimated error
                Error on the training data is NOT a useful estimator
                Q: Why would it result in very little pruning?
                Use a hold-out set for pruning
                        Separate a validation set
                        Use this validation set to test the improvement

                [Figure: the training data set is split into a smaller
                training set and a validation set]

                C4.5's method
                        Derive a confidence interval from training data
                        Use a heuristic limit, derived from this, for pruning
                        Standard Bernoulli-process-based method
                        Shaky statistical assumptions (based on training data)


                                                                                     Slide 14
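C4.5's confidence-based estimate is usually described via the normal approximation to the binomial: an upper bound on a node's true error rate given the observed rate f = E/N, with z ≈ 0.69 for the default 25% confidence level. The sketch below follows the common textbook presentation of that bound; treat the constant and formula as assumptions rather than a transcription of C4.5's source:

```python
import math

def pessimistic_error(errors, n, z=0.69):
    """Upper confidence bound on a node's true error rate, given `errors`
    misclassified training examples out of `n` reaching the node.
    z = 0.69 approximates C4.5's default 25% confidence level."""
    f = errors / n
    numerator = (f + z * z / (2 * n)
                 + z * math.sqrt(f / n - f * f / n + z * z / (4 * n * n)))
    return numerator / (1 + z * z / n)

# The bound always exceeds the observed rate, and tightens with more data:
print(pessimistic_error(2, 10))    # above the observed 0.2
print(pessimistic_error(20, 100))  # same observed rate, smaller bound
```

A subtree is replaced by a leaf when the leaf's bound is no worse than the weighted bounds of the subtree's branches.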
Deal with continuous attributes
        When dealing with nominal data
                We evaluated the gain for each possible value
        In continuous data, we have infinite values.
        What should we do?
                Continuous-valued attributes may take infinite values, but we
                have a limited number of values in our instances (at most N if
                we have N instances)
                Therefore, simulate that you have N nominal values
                        Evaluate information gain for every possible split point of the
                        attribute
                        Choose the best split point
                        The information gain of the attribute is the information gain
                        of the best split

                                                                                    Slide 15
Deal with continuous attributes

        Example

                    Outlook     Temperature   Humidity           Windy   Play

                      Sunny          85           85              False   No

                     Sunny          80           90              True    No

                    Overcast        83           86              False   Yes

                      Rainy         75           80              False   Yes

                          …         …             …               …       …




                               Continuous attributes



                                                                                Slide 16
Deal with continuous attributes
        Split on temperature attribute:

        value  64   65   68   69   70   71   72   72   75   75   80   81   83   85
        class  Yes  No   Yes  Yes  Yes  No   No   Yes  Yes  Yes  No   Yes  Yes  No

                   E.g.: temperature < 71.5: yes/4, no/2
                         temperature ≥ 71.5: yes/5, no/3

                   Info([4,2],[5,3]) = 6/14 info([4,2]) + 8/14 info([5,3]) = 0.939 bits

        Place split points halfway between values
        Can evaluate all split points in one pass!


                                                                                                  Slide 17
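The 0.939-bit figure can be reproduced with a short sketch using the temperature data above (helper names are my own; lower weighted entropy means a better split, and gain = entropy before the split minus this value):

```python
import math

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def split_info(pairs, threshold):
    """Weighted class entropy after splitting (value, class) pairs
    at `threshold`."""
    left = [cls for value, cls in pairs if value < threshold]
    right = [cls for value, cls in pairs if value >= threshold]
    counts = lambda side: [side.count('yes'), side.count('no')]
    n = len(pairs)
    return (len(left) / n * entropy(counts(left))
            + len(right) / n * entropy(counts(right)))

temperature = [(64, 'yes'), (65, 'no'), (68, 'yes'), (69, 'yes'), (70, 'yes'),
               (71, 'no'), (72, 'no'), (72, 'yes'), (75, 'yes'), (75, 'yes'),
               (80, 'no'), (81, 'yes'), (83, 'yes'), (85, 'no')]
print(round(split_info(temperature, 71.5), 3))  # 0.939
```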
Deal with continuous attributes
        To speed up
                Entropy only needs to be evaluated between points of different
                classes

        value  64   65   68   69   70   71   72   72   75   75   80   81   83   85
        class  Yes  No   Yes  Yes  Yes  No   No   Yes  Yes  Yes  No   Yes  Yes  No

                   Potential optimal breakpoints

                   Breakpoints between values of the same class cannot
                   be optimal




                                                                                              Slide 18
Deal with Missing Data
        Treat missing values as a separate value
                Missing value denoted "?" in C4.X
                Simple idea: treat missing as a separate value
                Q: When is this not appropriate?
                A: When values are missing due to different reasons
                        Example 1: gene expression could be missing when it is very
                        high or very low
                        Example 2: field IsPregnant=missing for a male patient should be
                        treated differently (no) than for a female patient of age 25
                        (unknown)




                                                                                     Slide 19
Deal with Missing Data
        Split instances with missing values into pieces
                   A piece going down a branch receives a weight proportional to
                   the popularity of the branch
                   weights sum to 1
        Info gain works with fractional instances
                   Use sums of weights instead of counts
        During classification, split the instance into pieces
        in the same way
                   Merge probability distribution using weights




                                                                           Slide 20
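The fractional-instance scheme above can be sketched as follows. Representing an instance as a (features, weight) pair is an assumption for illustration:

```python
def split_with_missing(instances, attr, values):
    """Distribute weighted instances over the branches of `attr`.
    instances: list of (features_dict, weight); missing values are '?'.
    A missing-value instance is split into one piece per branch, weighted
    by the branch's share of the instances with known values."""
    known_weight = sum(w for f, w in instances if f[attr] != '?')
    share = {v: sum(w for f, w in instances if f[attr] == v) / known_weight
             for v in values}
    branches = {v: [] for v in values}
    for f, w in instances:
        if f[attr] != '?':
            branches[f[attr]].append((f, w))
        else:
            for v in values:
                branches[v].append((f, w * share[v]))
    return branches

data = [({'outlook': 'sunny'}, 1.0)] * 3 + [({'outlook': 'rainy'}, 1.0),
                                            ({'outlook': '?'}, 1.0)]
b = split_with_missing(data, 'outlook', ['sunny', 'rainy'])
# The missing instance contributes weight 0.75 to sunny and 0.25 to rainy;
# branch totals: sunny 3.75, rainy 1.25 (the weights still sum to 5).
```

Information gain then sums these weights instead of counting instances, exactly as the slide states.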
From Trees to Rules
        I finally got a tree from domains with
                Noisy instances
                Missing values
                Continuous attributes
        But I prefer rules…
                They are not context dependent
        Procedure
                Generate a rule for each path from the root to a leaf
                Get context-independent rules




                                                           Slide 21
From Trees to Rules
        A procedure a little more sophisticated: C4.5Rules
              C4.5rules: greedily prune conditions from each rule if this
              reduces its estimated error
                      Can produce duplicate rules
                      Check for this at the end
              Then
                      look at each class in turn
                      consider the rules for that class
                      find a "good" subset (guided by MDL)
              Then rank the subsets to avoid conflicts
              Finally, remove rules (greedily) if this decreases error on the
              training data


                                                                            Slide 22
Next Class

        Instance-based Classifiers




                                                 Slide 23

 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 

Lecture6 - C4.5

  • 1. Introduction to Machine Learning Lecture 6 Albert Orriols i Puig aorriols@salle.url.edu Artificial Intelligence – Machine Learning Enginyeria i Arquitectura La Salle Universitat Ramon Llull
  • 2. Recap of Lecture 4 ID3 is a strong system that: uses hill-climbing search based on the information gain measure to search through the space of decision trees; outputs a single hypothesis; never backtracks, so it converges to locally optimal solutions; uses all training examples at each step, contrary to methods that make decisions incrementally; uses statistical properties of all examples, so the search is less sensitive to errors in individual training examples; and can handle noisy data by modifying its termination criterion to accept hypotheses that imperfectly fit the data. Slide 2 Artificial Intelligence Machine Learning
  • 3. Recap of Lecture 4 However, ID3 has some drawbacks: it can only deal with nominal data, it is not able to deal with noisy data sets, and it may not be robust in the presence of noise.
  • 4. Today’s Agenda Going from ID3 to C4.5. How C4.5 enhances ID3 to: be robust in the presence of noise, avoid overfitting, deal with continuous attributes, deal with missing data, and convert trees to rules.
  • 5. What’s Overfitting? Overfitting: given a hypothesis space H, a hypothesis h∈H is said to overfit the training data if there exists some alternative hypothesis h’∈H such that (1) h has a smaller error than h’ over the training examples, but (2) h’ has a smaller error than h over the entire distribution of instances.
  • 6. Why May my System Overfit? In domains with noise or uncertainty, the system may try to decrease the training error by completely fitting all the training examples; the learner overfits in order to correctly classify the noisy instances. Occam’s razor: prefer the simplest hypothesis that fits the data with high accuracy.
  • 7. How to Avoid Overfitting? OK, my system may overfit… can I avoid it? Sure! Do not include branches that fit the data too specifically. How? 1. Pre-prune: stop growing a branch when information becomes unreliable. 2. Post-prune: take a fully-grown decision tree and discard unreliable parts.
  • 8. Pre-pruning Based on a statistical significance test: stop growing the tree when there is no statistically significant association between any attribute and the class at a particular node. Use all available data for training, and apply the statistical test to estimate whether expanding/pruning a node will produce an improvement beyond the training set. The most popular test is the chi-squared test; ID3 used it in addition to information gain, so that only statistically significant attributes could be selected by the information gain procedure.
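The significance test behind pre-pruning can be sketched as follows. This is a minimal illustration, not C4.5's or ID3's actual code: the contingency tables are made up, and the hard-coded critical value 3.841 is the standard df = 1 threshold at the 5% level.

```python
def chi_squared(table):
    """Chi-squared statistic for an attribute/class contingency table,
    given as a list of rows (one per attribute value) of class counts."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n  # expected count under independence
            stat += (obs - exp) ** 2 / exp
    return stat

CRITICAL_1DF_05 = 3.841  # chi-squared critical value for df=1, alpha=0.05

# Attribute strongly associated with the class -> keep expanding this node.
strong = [[18, 2], [3, 17]]
# Attribute with no real association to the class -> stop growing (pre-prune).
weak = [[10, 10], [9, 11]]

print(chi_squared(strong) > CRITICAL_1DF_05)  # True
print(chi_squared(weak) > CRITICAL_1DF_05)    # False
```

A node would only be expanded on attributes whose statistic exceeds the critical value, which is the "stop when information becomes unreliable" rule in practice.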
  • 9. Pre-pruning Early stopping: pre-pruning may stop the growth process prematurely. Classic example: the XOR/parity problem, where no individual attribute exhibits any significant association with the class and the structure is only visible in the fully expanded tree, so pre-pruning won’t expand the root node. But XOR-type problems are rare in practice, and pre-pruning is faster than post-pruning. The XOR table: x1 x2 Class — (0,0)→0, (0,1)→1, (1,0)→1, (1,1)→0.
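The XOR failure mode is easy to verify numerically: on the four-row table above, each attribute alone has zero information gain, so an information-gain-based pre-pruner never expands the root. A small sketch (the helper names are illustrative):

```python
import math

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in (labels.count(v) for v in set(labels)) if c)

def info_gain(rows, labels, attr):
    """Information gain of splitting `rows` on column index `attr`."""
    gain = entropy(labels)
    for v in set(r[attr] for r in rows):
        subset = [l for r, l in zip(rows, labels) if r[attr] == v]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain

# The XOR table from the slide: (x1, x2) -> class
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]
print(info_gain(rows, labels, 0))  # 0.0: x1 alone tells us nothing
print(info_gain(rows, labels, 1))  # 0.0: x2 alone tells us nothing
```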
  • 10. Post-pruning First, build the full tree; then, prune it. The fully-grown tree shows all attribute interactions, but some subtrees might be due to chance effects. Two pruning operations: 1. subtree replacement, 2. subtree raising. Possible strategies: error estimation, significance testing, and the MDL principle.
  • 11. Subtree Replacement A bottom-up approach: consider replacing a tree only after considering all its subtrees. Example: labor negotiations.
  • 12. Subtree Replacement Algorithm: 1. Split the data into a training set and a validation set. 2. Do until further pruning is harmful: a. evaluate the impact on the validation set of pruning each possible node; b. select the node whose removal most increases the validation set accuracy.
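The two-step loop above (reduced-error pruning) can be sketched with a toy node class. This is a simplification for illustration, assuming each node stores its majority class; it is not C4.5's actual data structure.

```python
class Node:
    def __init__(self, attr=None, branches=None, label=None):
        self.attr = attr                 # attribute index tested at this node
        self.branches = branches or {}   # attribute value -> child Node
        self.label = label               # majority class at this node

    def predict(self, x):
        if not self.branches:
            return self.label
        child = self.branches.get(x[self.attr])
        return child.predict(x) if child else self.label

def accuracy(tree, data):
    return sum(tree.predict(x) == y for x, y in data) / len(data)

def internal_nodes(tree):
    if not tree.branches:
        return []
    return [tree] + [n for c in tree.branches.values()
                     for n in internal_nodes(c)]

def reduced_error_prune(tree, validation):
    """Repeatedly turn into a leaf the internal node whose removal most
    increases validation accuracy; stop when no pruning helps."""
    while True:
        best, best_acc = None, accuracy(tree, validation)
        for node in internal_nodes(tree):
            saved, node.branches = node.branches, {}   # tentative prune
            acc = accuracy(tree, validation)
            node.branches = saved                      # undo
            if acc > best_acc:
                best, best_acc = node, acc
        if best is None:
            return tree
        best.branches = {}   # commit the best pruning and repeat

# A subtree that fits noise: replacing it with a leaf helps on validation.
noisy = Node(attr=1, branches={0: Node(label=1), 1: Node(label=0)}, label=1)
root = Node(attr=0, branches={0: Node(label=0), 1: noisy}, label=0)
validation = [((0, 0), 0), ((1, 0), 1), ((1, 1), 1)]
reduced_error_prune(root, validation)
print(accuracy(root, validation))  # 1.0 after pruning the noisy subtree
```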
  • 13. Subtree Raising Delete a node and redistribute its instances. Slower than subtree replacement (worthwhile?).
  • 14. Estimating Error Rates OK, we can prune. But when? Prune only if it reduces the estimated error. The error on the training data is NOT a useful estimator (Q: why would it result in very little pruning?). One option is a hold-out set: separate a validation set from the training data and use it to test the improvement. C4.5’s method instead derives a confidence interval from the training data and uses a heuristic limit, derived from this, for pruning: a standard Bernoulli-process-based method, but with shaky statistical assumptions (it is based on the training data).
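In the textbook presentation of C4.5's heuristic, the observed error rate f = E/N at a node is replaced by the upper limit of its confidence interval under the Bernoulli/normal approximation. A sketch, assuming the commonly quoted z = 0.69 for the default ~25% confidence level (the function name is illustrative):

```python
import math

def pessimistic_error(f, n, z=0.69):
    """Upper confidence limit on the true error rate, given an observed
    error rate f over n instances (z = 0.69 corresponds roughly to the
    25% confidence level commonly cited as C4.5's default)."""
    return ((f + z * z / (2 * n)
             + z * math.sqrt(f / n - f * f / n + z * z / (4 * n * n)))
            / (1 + z * z / n))

# A leaf that misclassifies 2 of its 6 training instances: the raw
# error rate of 0.333 becomes a much more pessimistic estimate.
print(round(pessimistic_error(2 / 6, 6), 3))  # 0.474
```

A subtree is replaced by a leaf when the leaf's pessimistic estimate is no worse than the weighted estimate of its children, which is why small, data-starved subtrees tend to get pruned.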
  • 15. Deal with Continuous Attributes When dealing with nominal data, we evaluated the gain for each possible value. In continuous data, the attribute may take infinitely many values, but we have a limited number of values in our instances (at most N if we have N instances). Therefore, simulate that you have N nominal values: evaluate the information gain for every possible split point of the attribute and choose the best split point; the information gain of the attribute is the information gain of its best split.
  • 16. Deal with Continuous Attributes Example (Temperature and Humidity are the continuous attributes): Outlook Temperature Humidity Windy Play — Sunny 85 85 False No; Sunny 80 90 True No; Overcast 83 86 False Yes; Rainy 75 80 False Yes; …
  • 17. Deal with Continuous Attributes Split on the temperature attribute, with sorted values 64 65 68 69 70 71 72 72 75 75 80 81 83 85 and classes Yes No Yes Yes Yes No No Yes Yes Yes No Yes Yes No. E.g., temperature < 71.5: yes/4, no/2; temperature ≥ 71.5: yes/5, no/3. Info([4,2],[5,3]) = 6/14 info([4,2]) + 8/14 info([5,3]) = 0.939 bits. Place split points halfway between values; all split points can be evaluated in one pass!
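The slide's computation can be reproduced directly. The sketch below (helper names are illustrative) scans every midpoint between distinct consecutive temperature values and recovers the 0.939-bit figure for the 71.5 split:

```python
import math

def info(counts):
    """Entropy (in bits) of a list of class counts, e.g. [4, 2]."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

def split_info(values, labels, threshold):
    """Weighted entropy after splitting at `threshold` (< vs >=)."""
    left = [l for v, l in zip(values, labels) if v < threshold]
    right = [l for v, l in zip(values, labels) if v >= threshold]
    def counts(ls):
        return [ls.count("Yes"), ls.count("No")]
    n = len(values)
    return (len(left) / n * info(counts(left))
            + len(right) / n * info(counts(right)))

temps = [64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85]
play = ["Yes", "No", "Yes", "Yes", "Yes", "No", "No", "Yes",
        "Yes", "Yes", "No", "Yes", "Yes", "No"]

print(round(split_info(temps, play, 71.5), 3))  # 0.939 bits, as on the slide

# Candidate split points halfway between distinct consecutive values;
# only midpoints between values of different classes can be optimal.
candidates = sorted({(a + b) / 2 for a, b in zip(temps, temps[1:]) if a != b})
best = min(candidates, key=lambda t: split_info(temps, play, t))
print(best)
```

The information gain of the attribute is then the class entropy minus `split_info` at `best`.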
  • 18. Deal with Continuous Attributes To speed up: entropy only needs to be evaluated between points of different classes, since a breakpoint between values of the same class cannot be optimal. value: 64 65 68 69 70 71 72 72 75 75 80 81 83 85; class: Yes No Yes Yes Yes No No Yes Yes Yes No Yes Yes No.
  • 19. Deal with Missing Data Simple idea: treat missing values, denoted “?” in C4.X, as a separate value. Q: When is this not appropriate? A: When values are missing due to different reasons. Example 1: a gene expression value could be missing when it is very high or very low. Example 2: the field IsPregnant=missing should be treated differently for a male patient (no) than for a female patient of age 25 (unknown).
  • 20. Deal with Missing Data Split instances with missing values into pieces: a piece going down a branch receives a weight proportional to the popularity of the branch, and the weights sum to 1. Info gain works with fractional instances by using sums of weights instead of counts. During classification, split the instance into pieces in the same way and merge the resulting probability distributions using the weights.
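The weight-splitting idea can be sketched as follows: an instance whose tested attribute is missing goes down every branch with a weight proportional to how many training instances took that branch. The branch counts match the weather example (5 Sunny, 4 Overcast, 5 Rainy), but the leaf class distributions below are made up for illustration:

```python
def distribute(weight, branch_counts):
    """Split an instance of the given weight across branches in
    proportion to each branch's training-set popularity."""
    total = sum(branch_counts.values())
    return {value: weight * count / total
            for value, count in branch_counts.items()}

# Node testing Outlook: 5 training instances went down Sunny,
# 4 down Overcast, 5 down Rainy.
pieces = distribute(1.0, {"Sunny": 5, "Overcast": 4, "Rainy": 5})
print(round(sum(pieces.values()), 6))  # 1.0: the weights sum to 1

# During classification, the pieces' class distributions are merged
# using the same weights (hypothetical leaf distributions):
leaf_dist = {"Sunny": {"Yes": 0.4, "No": 0.6},
             "Overcast": {"Yes": 1.0, "No": 0.0},
             "Rainy": {"Yes": 0.6, "No": 0.4}}
merged = {c: sum(pieces[b] * leaf_dist[b][c] for b in pieces)
          for c in ("Yes", "No")}
```

During training, the same fractional weights replace plain counts inside the information gain computation.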
  • 21. From Trees to Rules I finally got a tree from domains with noisy instances, missing values, and continuous attributes. But I prefer rules, which are not context dependent. Procedure: generate a rule for each leaf of the tree to get context-independent rules.
  • 22. From Trees to Rules A slightly more sophisticated procedure: C4.5rules. C4.5rules greedily prunes conditions from each rule if this reduces its estimated error; this can produce duplicate rules, so check for them at the end. Then look at each class in turn, consider the rules for that class, and find a “good” subset (guided by MDL). Then rank the subsets to avoid conflicts. Finally, remove rules (greedily) if this decreases the error on the training data.
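The basic extraction step that precedes all of this pruning simply reads off one rule per root-to-leaf path. A toy sketch, representing the tree as nested dicts (the layout and the Outlook example are illustrative):

```python
# A tree as nested dicts: {attr: {value: subtree-or-leaf-label}}
tree = {"Outlook": {
    "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
    "Overcast": "Yes",
    "Rainy": "No",
}}

def to_rules(tree, conditions=()):
    """One 'IF cond AND ... THEN class' rule per root-to-leaf path."""
    if not isinstance(tree, dict):          # reached a leaf label
        return [(conditions, tree)]
    (attr, branches), = tree.items()
    rules = []
    for value, subtree in branches.items():
        rules += to_rules(subtree, conditions + ((attr, value),))
    return rules

for conds, cls in to_rules(tree):
    print(" AND ".join(f"{a}={v}" for a, v in conds), "->", cls)
```

Each extracted rule is then a context-independent unit: C4.5rules can drop individual conditions from it without touching the rest of the tree.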
  • 23. Next Class Instance-based classifiers.
  • 24. Introduction to Machine Learning Lecture 6 Albert Orriols i Puig aorriols@salle.url.edu Artificial Intelligence – Machine Learning Enginyeria i Arquitectura La Salle Universitat Ramon Llull