Note to other teachers and users of these slides. Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew’s tutorials: http://www.cs.cmu.edu/~awm/tutorials . Comments and corrections gratefully received.




                      PAC-learning
                                    Andrew W. Moore
                                   Associate Professor
                                School of Computer Science
                                Carnegie Mellon University
                                     www.cs.cmu.edu/~awm
                                        awm@cs.cmu.edu
                                          412-268-7599




         Copyright © 2001, Andrew W. Moore                                             Nov 30th, 2001




   Probably Approximately Correct
           (PAC) Learning
• Imagine we’re doing classification with categorical
  inputs.
• All inputs and outputs are binary.
• Data is noiseless.
• There’s a machine f(x,h) which has H possible
  settings (a.k.a. hypotheses), called h1, h2, …, hH.




Copyright © 2001, Andrew W. Moore                                                             PAC-learning: Slide 2




Example of a machine
• f(x,h) consists of all logical sentences about X1, X2
  .. Xm that contain only logical ands.
• Example hypotheses:
• X1 ^ X3 ^ X19
• X3 ^ X18
• X7
• X1 ^ X2 ^ X3 ^ X4 ^ … ^ Xm
• Question: if there are 3 attributes, what is the
  complete set of hypotheses in f?



Copyright © 2001, Andrew W. Moore                                 PAC-learning: Slide 3




                   Example of a machine
• f(x,h) consists of all logical sentences about X1, X2
  .. Xm that contain only logical ands.
• Example hypotheses:
• X1 ^ X3 ^ X19
• X3 ^ X18
• X7
• X1 ^ X2 ^ X3 ^ X4 ^ … ^ Xm
• Question: if there are 3 attributes, what is the
  complete set of hypotheses in f? (H = 8)
              True                  X2        X3        X2 ^ X3

              X1                    X1 ^ X2   X1 ^ X3   X1 ^ X2 ^ X3
Copyright © 2001, Andrew W. Moore                                 PAC-learning: Slide 4
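
A quick way to check the H = 8 count (and the 2^m count asked about on the next slide) is to enumerate the hypotheses directly: each attribute is either included in the conjunction or left out. A minimal Python sketch, with helper names of my own choosing rather than anything from the slides:

    # Sketch: enumerate every hypothesis of the And-Positive-Literals machine.
    import itertools

    def enumerate_and_positive(m):
        attrs = [f"X{i}" for i in range(1, m + 1)]
        hypotheses = []
        for r in range(m + 1):
            for subset in itertools.combinations(attrs, r):
                hypotheses.append(" ^ ".join(subset) if subset else "True")
        return hypotheses

    hs = enumerate_and_positive(3)
    print(len(hs), hs)   # 8 hypotheses: 'True', 'X1', 'X2', 'X3', 'X1 ^ X2', ...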




And-Positive-Literals Machine
• f(x,h) consists of all logical sentences about X1, X2
  .. Xm that contain only logical ands.
• Example hypotheses:
• X1 ^ X3 ^ X19
• X3 ^ X18
• X7
• X1 ^ X2 ^ X3 ^ X4 ^ … ^ Xm
• Question: if there are m attributes, how many
  hypotheses in f?



Copyright © 2001, Andrew W. Moore            PAC-learning: Slide 5




       And-Positive-Literals Machine
• f(x,h) consists of all logical sentences about X1, X2
  .. Xm that contain only logical ands.
• Example hypotheses:
• X1 ^ X3 ^ X19
• X3 ^ X18
• X7
• X1 ^ X2 ^ X3 ^ X4 ^ … ^ Xm
• Question: if there are m attributes, how many
  hypotheses in f? (H = 2^m)



Copyright © 2001, Andrew W. Moore            PAC-learning: Slide 6




And-Literals Machine
• f(x,h) consists of all logical
  sentences about X1, X2 ..
  Xm or their negations that
  contain only logical ands.
• Example hypotheses:
• X1 ^ ~X3 ^ X19
• X3 ^ ~X18
• ~X7
• X1 ^ X2 ^ ~X3 ^ … ^ Xm
• Question: if there are 2
  attributes, what is the
  complete set of hypotheses
  in f?

Copyright © 2001, Andrew W. Moore              PAC-learning: Slide 7




                     And-Literals Machine
• f(x,h) consists of all logical sentences about X1, X2
  .. Xm or their negations that contain only logical ands.
• Example hypotheses:
• X1 ^ ~X3 ^ X19
• X3 ^ ~X18
• ~X7
• X1 ^ X2 ^ ~X3 ^ … ^ Xm
• Question: if there are 2 attributes, what is the
  complete set of hypotheses in f? (H = 9)

              True        True
              True        X2
              True        ~X2
              X1          True
              X1    ^     X2
              X1    ^     ~X2
              ~X1         True
              ~X1   ^     X2
              ~X1   ^     ~X2

Copyright © 2001, Andrew W. Moore              PAC-learning: Slide 8




                     And-Literals Machine
• f(x,h) consists of all logical sentences about X1, X2
  .. Xm or their negations that contain only logical ands.
• Example hypotheses:
• X1 ^ ~X3 ^ X19
• X3 ^ ~X18
• ~X7
• X1 ^ X2 ^ ~X3 ^ … ^ Xm
• Question: if there are m attributes, what is the size of
  the complete set of hypotheses in f?

              True        True
              True        X2
              True        ~X2
              X1          True
              X1    ^     X2
              X1    ^     ~X2
              ~X1         True
              ~X1   ^     X2
              ~X1   ^     ~X2

Copyright © 2001, Andrew W. Moore               PAC-learning: Slide 9




                     And-Literals Machine
• f(x,h) consists of all logical sentences about X1, X2
  .. Xm or their negations that contain only logical ands.
• Example hypotheses:
• X1 ^ ~X3 ^ X19
• X3 ^ ~X18
• ~X7
• X1 ^ X2 ^ ~X3 ^ … ^ Xm
• Question: if there are m attributes, what is the size of
  the complete set of hypotheses in f? (H = 3^m)

              True        True
              True        X2
              True        ~X2
              X1          True
              X1    ^     X2
              X1    ^     ~X2
              ~X1         True
              ~X1   ^     X2
              ~X1   ^     ~X2

Copyright © 2001, Andrew W. Moore              PAC-learning: Slide 10
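
The 3^m count arises because each attribute independently contributes one of three options to the conjunction: absent, positive (Xi), or negated (~Xi). A small illustrative sketch (the helper name is mine, not from the slides):

    # Sketch: each attribute is absent, positive or negated -> 3**m hypotheses.
    import itertools

    def enumerate_and_literals(m):
        per_attr = [[None, f"X{i}", f"~X{i}"] for i in range(1, m + 1)]
        hypotheses = []
        for combo in itertools.product(*per_attr):
            lits = [lit for lit in combo if lit is not None]
            hypotheses.append(" ^ ".join(lits) if lits else "True")
        return hypotheses

    hs = enumerate_and_literals(2)
    print(len(hs), hs)   # 9 = 3**2, matching the table above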




                     Lookup Table Machine
• f(x,h) consists of all truth tables mapping combinations
  of input attributes to true and false
• Example hypothesis:

              X1   X2   X3   X4   Y
              0    0    0    0    0
              0    0    0    1    1
              0    0    1    0    1
              0    0    1    1    0
              0    1    0    0    1
              0    1    0    1    0
              0    1    1    0    0
              0    1    1    1    1
              1    0    0    0    0
              1    0    0    1    0
              1    0    1    0    0
              1    0    1    1    1
              1    1    0    0    0
              1    1    0    1    0
              1    1    1    0    0
              1    1    1    1    0

• Question: if there are m attributes, what is the size of
  the complete set of hypotheses in f?




Copyright © 2001, Andrew W. Moore                                   PAC-learning: Slide 11




                     Lookup Table Machine
• f(x,h) consists of all truth tables mapping combinations
  of input attributes to true and false
• Example hypothesis:

              X1   X2   X3   X4   Y
              0    0    0    0    0
              0    0    0    1    1
              0    0    1    0    1
              0    0    1    1    0
              0    1    0    0    1
              0    1    0    1    0
              0    1    1    0    0
              0    1    1    1    1
              1    0    0    0    0
              1    0    0    1    0
              1    0    1    0    0
              1    0    1    1    1
              1    1    0    0    0
              1    1    0    1    0
              1    1    1    0    0
              1    1    1    1    0

• Question: if there are m attributes, what is the size of
  the complete set of hypotheses in f?

              H = 2^(2^m)




Copyright © 2001, Andrew W. Moore                                   PAC-learning: Slide 12
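
With m binary attributes there are 2^m input rows, and each row of the truth table can be labelled 0 or 1 independently, giving 2^(2^m) tables. A tiny numeric check, purely illustrative:

    # Sketch: count the truth tables over m = 4 binary attributes.
    import itertools

    m = 4
    rows = list(itertools.product([0, 1], repeat=m))   # 2**m = 16 input rows
    n_tables = 2 ** len(rows)                          # one output bit per row
    print(len(rows), n_tables)                         # 16  65536 = 2**(2**4)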




A Game
• We specify f, the machine
• Nature chooses a hidden random hypothesis h*
• Nature randomly generates R datapoints
   • How is a datapoint generated?
      1. Vector of inputs xk = (xk1, xk2, …, xkm) is
         drawn from a fixed unknown distribution D
      2. The corresponding output yk = f(xk, h*)
• We learn an approximation of h* by choosing
  some hest for which the training set error is 0




 Copyright © 2001, Andrew W. Moore                      PAC-learning: Slide 13
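
The game is easy to mimic in code. The sketch below assumes the And-Positive-Literals machine and a uniform input distribution (both are my choices for illustration, not part of the slide): Nature picks h*, draws R inputs from D, and labels them with f(x, h*).

    # Sketch of the "game": hidden h*, R datapoints drawn from a fixed D.
    import itertools, random

    m, R = 4, 8
    hypotheses = [frozenset(s) for r in range(m + 1)
                  for s in itertools.combinations(range(m), r)]

    def f(x, h):                      # conjunction of the attributes named in h
        return all(x[i] == 1 for i in h)

    h_star = random.choice(hypotheses)                           # Nature's hidden hypothesis
    D = lambda: tuple(random.randint(0, 1) for _ in range(m))    # fixed (here: uniform) distribution
    data = [(x, f(x, h_star)) for x in (D() for _ in range(R))]  # R labelled datapoints
    print(sorted(h_star), data)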




    Test Error Rate
• We specify f, the machine
• Nature chooses a hidden random hypothesis h*
• Nature randomly generates R datapoints
   • How is a datapoint generated?
      1. Vector of inputs xk = (xk1, xk2, …, xkm) is
         drawn from a fixed unknown distribution D
      2. The corresponding output yk = f(xk, h*)
• We learn an approximation of h* by choosing
  some hest for which the training set error is 0
• For each hypothesis h,
• Say h is Correctly Classified (CCd) if h has
  zero training set error
• Define TESTERR(h)
      = Fraction of test points that h will classify
        incorrectly
      = P(h misclassifies a random test point)
• Say h is BAD if TESTERR(h) > ε




 Copyright © 2001, Andrew W. Moore                      PAC-learning: Slide 14




Test Error Rate
• We specify f, the machine
• Nature chooses a hidden random hypothesis h*
• Nature randomly generates R datapoints
   • How is a datapoint generated?
      1. Vector of inputs xk = (xk1, xk2, …, xkm) is
         drawn from a fixed unknown distribution D
      2. The corresponding output yk = f(xk, h*)
• We learn an approximation of h* by choosing
  some hest for which the training set error is 0
• For each hypothesis h,
• Say h is Correctly Classified (CCd) if h has
  zero training set error
• Define TESTERR(h)
      = Fraction of test points that h will classify
        incorrectly
      = P(h misclassifies a random test point)
• Say h is BAD if TESTERR(h) > ε

  P(h is CCd | h is bad)
     = P(∀k ∈ Training Set, f(xk, h) = yk | h is bad)
     ≤ (1 − ε)^R




 Copyright © 2001, Andrew W. Moore                                                          PAC-learning: Slide 15




    Test Error Rate
• We specify f, the machine
• Nature chooses a hidden random hypothesis h*
• Nature randomly generates R datapoints
   • How is a datapoint generated?
      1. Vector of inputs xk = (xk1, xk2, …, xkm) is
         drawn from a fixed unknown distribution D
      2. The corresponding output yk = f(xk, h*)
• We learn an approximation of h* by choosing
  some hest for which the training set error is 0
• For each hypothesis h,
• Say h is Correctly Classified (CCd) if h has
  zero training set error
• Define TESTERR(h)
      = Fraction of test points that h will classify
        incorrectly
      = P(h misclassifies a random test point)
• Say h is BAD if TESTERR(h) > ε

  P(h is CCd | h is bad)
     = P(∀k ∈ Training Set, f(xk, h) = yk | h is bad)
     ≤ (1 − ε)^R

  P(we learn a bad h)
     ≤ P(the set of CCd h's contains a bad h)
     = P(∃h. h is CCd ^ h is bad)
     = P( (h1 is CCd ^ h1 is bad) ∨
          (h2 is CCd ^ h2 is bad) ∨
          …
          (hH is CCd ^ hH is bad) )
     ≤ Σ i=1..H P(hi is CCd ^ hi is bad)
     ≤ Σ i=1..H P(hi is CCd | hi is bad)
     = H × P(hi is CCd | hi is bad)
     ≤ H (1 − ε)^R
 Copyright © 2001, Andrew W. Moore                                                          PAC-learning: Slide 16
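
To make the final inequality concrete, the Monte Carlo sketch below (my own setup: And-Positive-Literals machine, m = 4, uniform input distribution, hypothetical helper names) estimates P(some zero-training-error hypothesis is bad) and prints the bound H(1 − ε)^R beside it.

    # Sketch: empirically compare P(some CCd hypothesis is bad) with H*(1-eps)**R.
    import itertools, random

    m, eps, R, trials = 4, 0.25, 12, 20000

    hypotheses = [frozenset(s) for r in range(m + 1)
                  for s in itertools.combinations(range(m), r)]
    H = len(hypotheses)                                 # 2**m = 16
    inputs = list(itertools.product([0, 1], repeat=m))  # uniform distribution D

    def predict(h, x):
        return all(x[i] == 1 for i in h)

    def test_err(h, h_star):                            # exact TESTERR under uniform D
        return sum(predict(h, x) != predict(h_star, x) for x in inputs) / len(inputs)

    bad_learned = 0
    for _ in range(trials):
        h_star = random.choice(hypotheses)
        sample = [random.choice(inputs) for _ in range(R)]
        labels = [predict(h_star, x) for x in sample]
        consistent = [h for h in hypotheses
                      if all(predict(h, x) == y for x, y in zip(sample, labels))]
        if any(test_err(h, h_star) > eps for h in consistent):
            bad_learned += 1

    print("empirical P(some CCd h is bad):", bad_learned / trials)
    print("bound H*(1-eps)**R            :", H * (1 - eps) ** R)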




PAC Learning
 • Choose R such that with probability less than δ we’ll
   select a bad hest (i.e. an hest which makes mistakes
   more than fraction ε of the time)
 • Probably Approximately Correct
 • As we just saw, this can be achieved by choosing
   R such that

                  δ = P(we learn a bad h) ≤ H (1 − ε)^R

 • i.e. R such that

                  R ≥ (0.69/ε) ( log2 H + log2(1/δ) )
 Copyright © 2001, Andrew W. Moore                                             PAC-learning: Slide 17
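
Since 0.69 ≈ ln 2, the bound comes from solving H(1 − ε)^R ≤ δ and using −ln(1 − ε) ≥ ε. A small helper, with a hypothetical name of my own, that evaluates the required R:

    # Sketch: smallest integer R satisfying (0.69/eps)*(log2 H + log2(1/delta)) <= R.
    import math

    def pac_sample_size(H, eps, delta):
        return math.ceil((math.log(2) / eps) * (math.log2(H) + math.log2(1 / delta)))

    # e.g. the And-Literals machine with m = 10 attributes (H = 3**10):
    print(pac_sample_size(3 ** 10, eps=0.1, delta=0.05))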




                                     PAC in action

Machine                 Example Hypothesis              H                    R required to PAC-learn
And-positive-literals   X3 ^ X7 ^ X8                    2^m                  (0.69/ε)(m + log2(1/δ))
And-literals            X3 ^ ~X7                        3^m                  (0.69/ε)((log2 3)m + log2(1/δ))
Lookup Table            (a full truth table over        2^(2^m)              (0.69/ε)(2^m + log2(1/δ))
                        X1..X4, as on Slide 11)
And-lits or And-lits    (X1 ^ X5) v (X2 ^ ~X7 ^ X8)     (3^m)^2 = 3^(2m)     (0.69/ε)((2 log2 3)m + log2(1/δ))

 Copyright © 2001, Andrew W. Moore                                             PAC-learning: Slide 18
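
Each row of the table differs only in its log2 H term: m, (log2 3)m, 2^m, and (2 log2 3)m respectively. A variant of the helper above that takes log2 H directly (repeated here so the snippet stands alone; m, ε, δ are arbitrary illustrative values):

    # Sketch: required R for each machine in the table, for m = 10.
    import math

    def pac_sample_size(log2_H, eps, delta):
        return math.ceil((math.log(2) / eps) * (log2_H + math.log2(1 / delta)))

    m, eps, delta = 10, 0.1, 0.05
    machines = {
        "And-positive-literals": m,
        "And-literals": math.log2(3) * m,
        "Lookup Table": 2 ** m,
        "And-lits or And-lits": 2 * math.log2(3) * m,
    }
    for name, log2_H in machines.items():
        print(f"{name:22s} R >= {pac_sample_size(log2_H, eps, delta)}")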




PAC for decision trees of depth k
• Assume m attributes
• Hk = Number of decision trees of depth k

• H0 =2
• Hk+1 = (#choices of root attribute) *
         (# possible left subtrees) *
          (# possible right subtrees)
                  = m * Hk * Hk

• Write Lk = log2 Hk
• L0 = 1
• Lk+1 = log2 m + 2Lk
• So Lk = (2^k − 1)(1 + log2 m) + 1
• So to PAC-learn, need

                  R ≥ (0.69/ε) ( (2^k − 1)(1 + log2 m) + 1 + log2(1/δ) )
Copyright © 2001, Andrew W. Moore                                             PAC-learning: Slide 19
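
Unrolling the recurrence gives the closed form for Lk; the sketch below checks it numerically and plugs it into the sample-size bound (m, k, ε, δ are arbitrary illustrative values; 0.69 is used as on the slide).

    # Sketch: verify Lk = (2**k - 1)(1 + log2 m) + 1 against the recurrence,
    # then evaluate the PAC bound for decision trees of depth k.
    import math

    def depth_k_log_hypotheses(m, k):
        L = 1.0                         # L0 = 1
        for _ in range(k):
            L = math.log2(m) + 2 * L    # L_{k+1} = log2 m + 2 Lk
        return L

    m, k, eps, delta = 10, 3, 0.1, 0.05
    L_rec = depth_k_log_hypotheses(m, k)
    L_closed = (2 ** k - 1) * (1 + math.log2(m)) + 1
    assert abs(L_rec - L_closed) < 1e-9

    R = (0.69 / eps) * (L_closed + math.log2(1 / delta))
    print(L_closed, math.ceil(R))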




                 What you should know
• Be able to understand every step in the math that
  gets you to
                  δ = P(we learn a bad h) ≤ H (1 − ε)^R

• Understand that you thus need this many records
  to PAC-learn a machine with H hypotheses
                  R ≥ (0.69/ε) ( log2 H + log2(1/δ) )
• Understand examples of deducing H for various
  machines
Copyright © 2001, Andrew W. Moore                                             PAC-learning: Slide 20



