Lecture No. 2

       Ravi Gupta
 AU-KBC Research Centre,
MIT Campus, Anna University




                              Date: 8.3.2008
Today’s Agenda

•   Recap (FIND-S Algorithm)
•   Version Space
•   Candidate-Elimination Algorithm
•   Decision Tree
•   ID3 Algorithm
•   Entropy
Concept Learning as Search

Concept learning can be viewed as the task of searching through
a large space of hypotheses implicitly defined by the hypothesis
representation.


The goal of the concept learning search is to find the hypothesis
that best fits the training examples.
General-to-Specific Learning
                                                     Every day Tom enjoys
                                                       his sport, i.e., only
                                                     positive examples.


  Most General Hypothesis: h = <?, ?, ?, ?, ?, ?>




 Most Specific Hypothesis: h = < Ø, Ø, Ø, Ø, Ø, Ø>
General-to-Specific Learning




             h2 is more general than h1

     h2 imposes fewer constraints on the instance than h1
Definition

Given hypotheses hj and hk, hj is more_general_than_or_equal_to
hk if and only if any instance that satisfies hk also satisfies hj.




We can also say that hj is more_specific_than hk when hk is
more_general_than hj.
FIND-S: Finding a Maximally
    Specific Hypothesis
Step 1: FIND-S




h0 = <Ø, Ø, Ø, Ø, Ø, Ø>
Step 2: FIND-S




                          h0 = <Ø, Ø, Ø, Ø, Ø, Ø>

                     a1      a2    a3   a4   a5     a6

              x1 = <Sunny, Warm, Normal, Strong, Warm, Same>

Iteration 1

              h1 = <Sunny, Warm, Normal, Strong, Warm, Same>
h1 = <Sunny, Warm, Normal, Strong, Warm, Same>

Iteration 2
                        x2 = <Sunny, Warm, High, Strong, Warm, Same>




              h2 = <Sunny, Warm, ?, Strong, Warm, Same>
Iteration 3   x3 is a negative example, so FIND-S ignores it:   h3 = <Sunny, Warm, ?, Strong, Warm, Same>
h3 = < Sunny, Warm, ?, Strong, Warm, Same >


Iteration 4
                        x4 = < Sunny, Warm, High, Strong, Cool, Change >


Step 3

Output        h4 = <Sunny, Warm, ?, Strong, ?, ?>
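
To make the procedure concrete, here is a minimal FIND-S sketch in Python for this six-attribute EnjoySport example (a sketch; variable names are illustrative, and the training list is the four examples used in the lecture):

# Minimal FIND-S sketch for conjunctive hypotheses over discrete attributes.
# 'None' plays the role of the null constraint Ø and '?' accepts any value.

def find_s(examples):
    n = len(examples[0][0])
    h = [None] * n                      # h0 = <Ø, Ø, ..., Ø>
    for x, label in examples:
        if label != 'Yes':              # FIND-S ignores negative examples
            continue
        for i, value in enumerate(x):
            if h[i] is None:            # first positive example: copy it
                h[i] = value
            elif h[i] != value:         # disagreement: generalize to '?'
                h[i] = '?'
    return h

# The four EnjoySport training examples used in the lecture.
training_data = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), 'Yes'),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'), 'Yes'),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), 'No'),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), 'Yes'),
]

print(find_s(training_data))   # ['Sunny', 'Warm', '?', 'Strong', '?', '?']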
Unanswered Questions by FIND-S

• Has the learner converged to the correct target
  concept?

• Why prefer the most specific hypothesis?

•   What if the training examples are inconsistent?
Version Space

The set of all hypotheses consistent with the training examples
is called the version space (VS) with respect to the hypothesis
space H and the given example set D.
Candidate-Elimination Algorithm

  The Candidate-Elimination algorithm finds all describable hypotheses
  that are consistent with the observed training examples




  Hypotheses are derived from the examples regardless of whether x is a
  positive or a negative example
Candidate-Elimination Algorithm




    Earlier
(i.e., FIND-S)
      Def.
LIST-THEN-ELIMINATE Algorithm
    to Obtain Version Space
LIST-THEN-ELIMINATE Algorithm
    to Obtain Version Space
                       Examples
Hypothesis Space


                   .
                                  Version Space
                   .
                   .
                   .
                   .
                                       VSH,D
                   .
       H



                         D
LIST-THEN-ELIMINATE Algorithm
    to Obtain Version Space

 • In principle, the LIST-THEN-ELIMINATE algorithm can be
 applied whenever the hypothesis space H is finite.

 • It is guaranteed to output all hypotheses consistent with the
 training data.

 • Unfortunately, it requires exhaustively enumerating all
 hypotheses in H, an unrealistic requirement for all but the most
 trivial hypothesis spaces.
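
A minimal sketch of LIST-THEN-ELIMINATE in Python, assuming the hypothesis space is given explicitly as a finite list of conjunctive hypotheses represented as tuples of attribute constraints ('?' meaning any value):

# LIST-THEN-ELIMINATE: start with every hypothesis in H and discard any
# hypothesis that misclassifies a training example. What remains is the
# version space VS_{H,D}.

def matches(h, x):
    """A conjunctive hypothesis h covers instance x."""
    return all(c == '?' or c == v for c, v in zip(h, x))

def list_then_eliminate(H, examples):
    version_space = list(H)
    for x, label in examples:
        version_space = [h for h in version_space
                         if matches(h, x) == (label == 'Yes')]
    return version_space

Even for the small EnjoySport space, enumerating H explicitly is costly, which is exactly the practical objection raised above.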
Candidate-Elimination Algorithm

  • The CANDIDATE-ELIMINATION algorithm works on the same
  principle as the above LIST-THEN-ELIMINATE algorithm.

  • It employs a much more compact representation of the version
  space.

  • Here the version space is represented by its most general and its most
  specific (least general) members.

  • These members form general and specific boundary sets that delimit
  the version space within the partially ordered hypothesis space.
Least General
  (Specific)




Most General
Candidate-Elimination Algorithm
Example




                 G0 ← {<?, ?, ?, ?, ?, ?>}

Initialization

                 S0 ← {<Ø, Ø, Ø, Ø, Ø, Ø >}
G0 ← {<?, ?, ?, ?, ?, ?>}

                        S0 ← {<Ø, Ø, Ø, Ø, Ø, Ø >}




              x1 = <Sunny, Warm, Normal, Strong, Warm, Same>
Iteration 1
                        G1 ← {<?, ?, ?, ?, ?, ?>}

              S1 ← {< Sunny, Warm, Normal, Strong, Warm, Same >}




                 x2 = <Sunny, Warm, High, Strong, Warm, Same>
Iteration 2
                         G2 ← {<?, ?, ?, ?, ?, ?>}

              S2 ← {< Sunny, Warm, ?, Strong, Warm, Same >}
G2 ← {<?, ?, ?, ?, ?, ?>}

              S2 ← {< Sunny, Warm, ?, Strong, Warm, Same >}



                               consistent



                x3 = <Rainy, Cold, High, Strong, Warm, Change>
Iteration 3
                S3 ← {< Sunny, Warm, ?, Strong, Warm, Same >}


       G3 ← {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}




                             G2 ← {<?, ?, ?, ?, ?, ?>}
S3 ← {< Sunny, Warm, ?, Strong, Warm, Same >}


     G3 ← {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}




                x4 = <Sunny, Warm, High, Strong, Cool, Change>
Iteration 4
                S4 ← {< Sunny, Warm, ?, Strong, ?, ? >}


                 G4 ← {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}



         G3 ← {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}
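
The boundary updates traced above can be written down compactly. Below is a Python sketch of CANDIDATE-ELIMINATION for conjunctive hypotheses over discrete attributes (a sketch, not the textbook pseudo-code verbatim: the attribute domains are taken from the values seen in the lecture's examples, pruning of redundant S members is omitted, and the names are illustrative):

# Hypotheses are tuples whose entries are a value, '?' (any value),
# or None (the null constraint Ø).

def matches(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

def more_general_or_equal(h1, h2):
    """True if every instance covered by h2 is also covered by h1."""
    if any(c is None for c in h2):          # h2 covers nothing
        return True
    return all(a == '?' or a == b for a, b in zip(h1, h2))

def strictly_more_general(h1, h2):
    return more_general_or_equal(h1, h2) and not more_general_or_equal(h2, h1)

def candidate_elimination(examples, domains):
    n = len(domains)
    S = [tuple([None] * n)]                  # most specific boundary
    G = [tuple(['?'] * n)]                   # most general boundary
    for x, label in examples:
        if label == 'Yes':                   # positive example
            G = [g for g in G if matches(g, x)]
            new_S = []
            for s in S:
                if matches(s, x):
                    new_S.append(s)
                    continue
                # unique minimal generalization of s that covers x
                h = tuple(v if c is None else (c if c == v else '?')
                          for c, v in zip(s, x))
                if any(more_general_or_equal(g, h) for g in G):
                    new_S.append(h)
            S = new_S
        else:                                # negative example
            S = [s for s in S if not matches(s, x)]
            new_G = []
            for g in G:
                if not matches(g, x):
                    new_G.append(g)
                    continue
                # minimal specializations of g that exclude x
                for i, c in enumerate(g):
                    if c != '?':
                        continue
                    for v in domains[i]:
                        if v == x[i]:
                            continue
                        h = g[:i] + (v,) + g[i + 1:]
                        if any(more_general_or_equal(h, s) for s in S):
                            new_G.append(h)
            G = [g for g in new_G
                 if not any(strictly_more_general(g2, g) for g2 in new_G)]
    return S, G

examples = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), 'Yes'),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'), 'Yes'),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), 'No'),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), 'Yes'),
]
domains = [('Sunny', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High'),
           ('Strong',), ('Warm', 'Cool'), ('Same', 'Change')]

S, G = candidate_elimination(examples, domains)
print(S)   # [('Sunny', 'Warm', '?', 'Strong', '?', '?')]
print(G)   # [('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')]

Running this on the four training examples reproduces S4 and G4 from the trace above.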
Remarks on Version Spaces and
    Candidate-Elimination

The version space learned by the CANDIDATE-ELIMINATION algorithm
will converge toward the hypothesis that correctly describes the target
concept, provided

 (1) there are no errors in the training examples, and

(2) there is some hypothesis in H that correctly
describes the target concept.
What will Happen if the Training
      Data Contains Errors?



                             No
G0 ← {<?, ?, ?, ?, ?, ?>}

                        S0 ← {<Ø, Ø, Ø, Ø, Ø, Ø >}




              x1 = <Sunny, Warm, Normal, Strong, Warm, Same>
Iteration 1
                        G1 ← {<?, ?, ?, ?, ?, ?>}

              S1 ← {< Sunny, Warm, Normal, Strong, Warm, Same >}




                 x2 = <Sunny, Warm, High, Strong, Warm, Same>
Iteration 2
                         G2 ← {<?, ?, Normal, ?, ?, ?>}

              S2 ← {< Sunny, Warm, Normal, Strong, Warm, Same >}
G2 ← {<?, ?, Normal, ?, ?, ?>}

              S2 ← {< Sunny, Warm, Normal, Strong, Warm, Same >}



                            consistent



               x3 = <Rainy, Cold, High, Strong, Warm, Change>
Iteration 3
               S3 ← {< Sunny, Warm, Normal, Strong, Warm, Same >}

                          G3 ← {<?, ?, Normal, ?, ?, ?>}
S3 ← {< Sunny, Warm, Normal, Strong, Warm, Same >}

                         G3 ← {<?, ?, Normal, ?, ?, ?>}




              x4 = <Sunny, Warm, High, Strong, Cool, Change>
Iteration 4
                              S4 ← { }
                                                        Empty

                              G4 ← { }




                       G3 ← {<?, ?, Normal, ?, ?, ?>}
What will Happen if the Target
       Hypothesis is not Present in H?
Remarks on Version Spaces and
    Candidate-Elimination


The target concept is exactly learned when
the S and G boundary sets converge to a
single, identical, hypothesis.
Remarks on Version Spaces and
    Candidate-Elimination

How Can Partially Learned Concepts Be Used?
  Suppose that no additional training examples are available beyond
  the four in our example, and the learner is now required to classify
  new instances that it has not yet observed.




   The target concept is exactly learned when
   the S and G boundary sets converge to a
   single, identical, hypothesis.
Remarks on Version Spaces and
    Candidate-Elimination
Remarks on Version Spaces and
    Candidate-Elimination



  All six hypotheses satisfied




  All six hypotheses satisfied
Remarks on Version Spaces and
    Candidate-Elimination


  Three hypotheses satisfied
  Three hypotheses not satisfied




  Two hypotheses satisfied
  Four hypotheses not satisfied
Remarks on Version Spaces and
    Candidate-Elimination


                         Yes
                         No
Decision Trees
Decision Trees


• Decision tree learning is a method for approximating
discrete-valued target functions, in which the learned function
is represented by a decision tree.

• Decision trees can also be represented as if-then-else rules.

• Decision tree learning is one of the most widely used
approaches for inductive inference.
Decision Trees




An instance is classified by starting at the root node of the tree, testing the
attribute specified by this node, then moving down the tree branch
corresponding to the value of the attribute in the given example. This process
is then repeated for the subtree rooted at the new node.
Decision Trees




<Outlook = Sunny, Temperature = Hot, Humidity = High, Wind = Strong>




                       PlayTennis = No
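
As a concrete illustration, here is a nested-dictionary encoding of the PlayTennis tree on the slide (the structure is assumed from the standard PlayTennis example, consistent with the if-then rules given later) and the top-down classification walk described above:

# The PlayTennis tree as nested structures: an inner node is
# (attribute_name, {attribute_value: subtree}); a leaf is just the label.
play_tennis_tree = ('Outlook', {
    'Sunny':    ('Humidity', {'High': 'No', 'Normal': 'Yes'}),
    'Overcast': 'Yes',
    'Rain':     ('Wind', {'Strong': 'No', 'Weak': 'Yes'}),
})

def classify(tree, instance):
    """Walk from the root, following the branch for the instance's attribute value."""
    while isinstance(tree, tuple):
        attribute, branches = tree
        tree = branches[instance[attribute]]
    return tree

x = {'Outlook': 'Sunny', 'Temperature': 'Hot', 'Humidity': 'High', 'Wind': 'Strong'}
print(classify(play_tennis_tree, x))   # 'No'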
Decision Trees
Edges: attribute values
Intermediate nodes: attributes (A1, A2, A3, ...)
Leaf nodes: output values
Decision Trees

                                                     conjunction
                            disjunction




Decision trees represent a disjunction of conjunctions of constraints
on the attribute values of instances.

Each path from the tree root to a leaf corresponds to a conjunction of
attribute tests, and the tree itself to a disjunction of these
conjunctions.
Decision Trees
Decision Trees (F = A ^ B')
                   F = A ^ B'
      If (A=True and B = False) then Yes
      else
          No

                                           If then else form
                 A
                   False → No
                   True  → B
                             False → Yes
                             True  → No
Decision Trees (F = A V (B ^ C))

 If (A=True) then Yes
 else if (B = True and C=True) then Yes              If then else form
       else No


                           A
                             True  → Yes
                             False → B
                                       False → No
                                       True  → C
                                                 False → No
                                                 True  → Yes
Decision Trees (F = A XOR B)
            F = (A ^ B') V (A' ^ B)

  If (A=True and B=False) then Yes
  else if (A=False and B=True) then Yes                  If then else form
       else No


                                      A
                                        False → B
                                                  False → No
                                                  True  → Yes
                                        True  → B
                                                  False → Yes
                                                  True  → No
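
A quick truth-table check (a small sanity-check snippet, not from the slides) that the tree above computes XOR:

# Evaluate the XOR tree for all four input combinations and compare with
# the boolean expression F = (A and not B) or (not A and B).
def tree_xor(a, b):
    if not a:                 # A = False branch
        return b              # B = True -> Yes, B = False -> No
    return not b              # A = True branch: B = False -> Yes, B = True -> No

for a in (False, True):
    for b in (False, True):
        assert tree_xor(a, b) == ((a and not b) or (not a and b))
print("tree matches F = (A ^ B') V (A' ^ B)")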
Decision Trees as If-then-else rule
                                                    conjunction
                             disjunction




   If (Outlook = Sunny AND Humidity = Normal) then PlayTennis = Yes
   If (Outlook = Overcast) then PlayTennis = Yes
   If (Outlook = Rain AND Wind = Weak) then PlayTennis = Yes
Problems Suitable for Decision Trees

 • Instances are represented by attribute-value pairs
     Instances are described by a fixed set of attributes (e.g., Temperature) and
     their values (e.g., Hot). The easiest situation for decision tree learning is when
     each attribute takes on a small number of disjoint possible values (e.g., Hot,
     Mild, Cold). However, extensions to the basic algorithm allow handling real-
     valued attributes as well (e.g., representing Temperature numerically).

 • The target function has discrete output values

 • Disjunctive descriptions may be required

 • The training data may contain errors

 • The training data may contain missing attribute values
Basic Decision Tree Learning Algorithm

 • ID3 Algorithm (Quinlan 1986) and its
 successors C4.5 and C5.0

 • Employs a top-down, greedy search through
 the space of possible decision trees. The
 algorithm never backtracks to reconsider
 earlier choices.

     An instance is classified by starting at the root
     node of the tree, testing the attribute specified
     by this node, then moving down the tree
     branch corresponding to the value of the
     attribute in the given example. This process is
     then repeated for the subtree rooted at the
     new node.

                                                          http://www.rulequest.com/Personal/
ID3 Algorithm
Example
Attributes…




Attributes are Outlook, Temperature, Humidity, Wind
Building Decision Tree
Building Decision Tree

A generic tree under construction: the root tests attribute A1 and branches on its
values; one branch ends in an output value, the others lead to further test
attributes A2 and A3, each of which branches on its values to leaf output values.
Building Decision Tree

              Outlook, Temperature, Humidity, Wind

              Which attribute to select for the root node?
Which Attribute to Select ??

  • We would like to select the attribute that is most useful for
  classifying examples.

  • What is a good quantitative measure of the worth of an
  attribute?




  ID3 uses a statistical measure called information gain to select among the
  candidate attributes at each step while growing the tree.
Information Gain

Information gain is based on an information-theory concept called entropy

“Nothing in life is certain except death, taxes and the second law of
thermodynamics. All three are processes in which useful or accessible forms of
some quantity, such as energy or money, are transformed into useless,
inaccessible forms of the same quantity. That is not to say that these three
processes don’t have fringe benefits: taxes pay for roads and schools; the
second law of thermodynamics drives cars, computers and metabolism; and
death, at the very least, opens up tenured faculty positions”

Seth Lloyd, writing in Nature 430, 971 (26 August 2004).

Rudolf Julius Emanuel Clausius (January 2, 1822 – August 24, 1888) was a
German physicist and mathematician and is considered one of the central
founders of the science of thermodynamics.

Claude Elwood Shannon (April 30, 1916 – February 24, 2001), an American
electrical engineer and mathematician, has been called "the father of
information theory".
Entropy

• In information theory, the Shannon entropy or
information entropy is a measure of the uncertainty
associated with a random variable.

• It quantifies the information contained in a
message, usually in bits or bits/symbol.

• It is the minimum message length necessary to
communicate information.
Why Shannon named his uncertainty
      function "entropy"?

                                                                                 John von Neumann




 My greatest concern was what to call it. I thought of calling it 'information,' but the
 word was overly used, so I decided to call it 'uncertainty.' When I discussed it with
 John von Neumann, he had a better idea. Von Neumann told me, 'You should call
 it entropy, for two reasons. In the first place your uncertainty function has
 been used in statistical mechanics under that name, so it already has a name.
 In the second place, and more important, no one really knows what entropy
 really is, so in a debate you will always have the advantage.'
Shannon's mouse


Shannon and his famous
electromechanical mouse
Theseus, named after the hero of
Greek mythology famed for the
Minotaur and the Labyrinth, and which he
tried to teach to come out of the
maze in one of the first
experiments in artificial
intelligence.
Entropy


The information entropy of a discrete random variable X that can take on
possible values {x1, ..., xn} is



where
   I(X) is the information content or self-information of X, which is itself a
   random variable; and
   p(xi) = Pr(X=xi) is the probability mass function of X.
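
The formula referenced here is the standard Shannon entropy (a reconstruction; the slide shows it as an image):

    H(X) = E[I(X)] = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i)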
Entropy in our Context

Given a collection S, containing positive and negative
examples of some target concept, the entropy of S relative to
this boolean classification (yes/no) is




 where p⊕ is the proportion of positive examples in S and p⊖ is the
 proportion of negative examples in S. In all calculations involving
 entropy we define 0 log 0 to be 0.
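
The formula the slide refers to here (standard form, base-2 logarithms) is:

    Entropy(S) = -p_\oplus \log_2 p_\oplus - p_\ominus \log_2 p_\ominus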
Example




There are 14 examples: 9 positive and 5 negative [9+, 5-].

The entropy of S relative to this boolean (yes/no) classification is
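
Working the numbers out (the slide shows this computation as an image; the values below follow directly from [9+, 5-]):

    Entropy(S) = -(9/14)\log_2(9/14) - (5/14)\log_2(5/14) \approx 0.940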
Information Gain Measure

 Information gain, is simply the expected reduction in entropy
 caused by partitioning the examples according to this attribute.

 More precisely, the information gain, Gain(S, A) of an attribute A,
 relative to a collection of examples S, is defined as




 where Values(A) is the set of all possible values for attribute A,
 and Sv, is the subset of S for which attribute A has value v, i.e.,
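
In symbols (standard definition; the slide shows it as an image):

    Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} \, Entropy(S_v)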
Information Gain Measure



                                                 Entropy of S after
                    Entropy of S
                                                     partition




Gain(S, A) is the expected reduction in entropy caused by knowing the value of
attribute A.

Gain(S, A) is the information provided about the target function value, given the
value of some other attribute A. The value of Gain(S, A) is the number of bits
saved when encoding the target value of an arbitrary member of S, by knowing
the value of attribute A.
Example




There are 14 examples: 9 positive and 5 negative [9+, 5-].

The entropy of S relative to this boolean (yes/no) classification is
Gain (S, Attribute = Wind)
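
The slide shows this computation as a figure. Below is a small Python sketch of the same calculation, assuming the standard 14-example split for Wind (Weak: [6+, 2-], Strong: [3+, 3-]); these counts are an assumption taken from the standard PlayTennis table, not read off the slide:

from math import log2

def entropy(pos, neg):
    """Entropy of a collection with 'pos' positive and 'neg' negative examples."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:                         # 0 log 0 is defined as 0
            p = count / total
            result -= p * log2(p)
    return result

# S = [9+, 5-]; Wind partitions S into Weak = [6+, 2-] and Strong = [3+, 3-]
gain_wind = entropy(9, 5) - (8 / 14) * entropy(6, 2) - (6 / 14) * entropy(3, 3)
print(round(gain_wind, 3))                # 0.048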
Gain (S,A)
Gain (SSunny,A)




Temperature          Humidity             Wind
(Hot)  {0+, 2-}      (High)   {0+, 3-}    (Weak)   {1+, 2-}
(Mild) {1+, 1-}      (Normal) {2+, 0-}    (Strong) {1+, 1-}
(Cool) {1+, 0-}
Gain (SSunny,A)
                    Entropy(SSunny) = - { 2/5 log(2/5) + 3/5 log(3/5) } = 0.97095


 Temperature             Entropy(Hot) = 0
 (Hot)  {0+, 2-}          Entropy(Mild) = 1
 (Mild) {1+, 1-}          Entropy(Cool) = 0
 (Cool) {1+, 0-}          Gain(SSunny, Temperature) = 0.97095 – 2/5*0 – 2/5*1 – 1/5*0 = 0.57095


     Humidity             Entropy(High) = 0
 (High)   {0+, 3-}        Entropy(Normal) = 0
 (Normal) {2+, 0-}        Gain(SSunny, Humidity) = 0.97095 – 3/5*0 – 2/5*0 = 0.97095


       Wind               Entropy(Weak) = 0.9183
 (Weak)   {1+, 2-}        Entropy(Strong) = 1.0
 (Strong) {1+, 1-}        Gain(SSunny, Wind) = 0.97095 – 3/5*0.9183 – 2/5*1 = 0.01997
Modified Decision Tree
Gain (SRain,A)




Temperature          Humidity             Wind
(Hot)  {0+, 0-}      (High)   {1+, 1-}    (Weak)   {3+, 0-}
(Mild) {2+, 1-}      (Normal) {2+, 1-}    (Strong) {0+, 2-}
(Cool) {1+, 1-}
Gain (SRain,A)
                    Entropy(SRain) = - { 3/5 log(3/5) + 2/5 log(2/5) } = 0.97095


 Temperature             Entropy(Hot) = 0 (empty subset)
 (Hot)  {0+, 0-}          Entropy(Mild) = 0.9183
 (Mild) {2+, 1-}          Entropy(Cool) = 1.0
 (Cool) {1+, 1-}          Gain(SRain, Temperature) = 0.97095 – 0 – 3/5*0.9183 – 2/5*1 = 0.01997


     Humidity             Entropy(High) = 1.0
 (High)   {1+, 1-}        Entropy(Normal) = 0.9183
 (Normal) {2+, 1-}        Gain(SRain, Humidity) = 0.97095 – 2/5*1.0 – 3/5*0.9183 = 0.01997


       Wind               Entropy(Weak) = 0.0
 (Weak)   {3+, 0-}        Entropy(Strong) = 0.0
 (Strong) {0+, 2-}        Gain(SRain, Wind) = 0.97095 – 3/5*0 – 2/5*0 = 0.97095
Final Decision Tree
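
Tying the pieces together, here is a compact ID3 sketch in Python (a sketch assuming discrete attributes, no missing values, and examples given as attribute-to-value dicts; function and variable names are illustrative):

from collections import Counter
from math import log2

def entropy(labels):
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def information_gain(rows, labels, attribute):
    """Expected reduction in entropy from partitioning on one attribute."""
    total = len(labels)
    remainder = 0.0
    for value in set(row[attribute] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attribute] == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder

def id3(rows, labels, attributes):
    if len(set(labels)) == 1:                       # pure node -> leaf
        return labels[0]
    if not attributes:                              # no attributes left -> majority label
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    branches = {}
    for value in set(row[best] for row in rows):
        keep = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = id3([rows[i] for i in keep],
                              [labels[i] for i in keep],
                              [a for a in attributes if a != best])
    return (best, branches)

Called on the standard 14-example PlayTennis table with attributes ['Outlook', 'Temperature', 'Humidity', 'Wind'], this reproduces the tree developed above: Outlook at the root, Humidity under Sunny, and Wind under Rain.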
Home work
Home work
Home work
a1
(True) {2+, 1-}
(False) {1+, 2-}

Entropy(a1=True) = -{2/3log(2/3) + 1/3log(1/3)} = 0.9183
Entropy(a1=False) = 0.9183
Gain (S, a1) = 1 – 3/6*0.9183 – 3/6*0.9183 = 0.0817        S {3+, 3-} => Entropy(S) = 1




a2                 Entropy(a2=True) = 1.0
(True) {2+, 2-}    Entropy(a2=False) = 1.0
(False) {1+, 1-}   Gain (S, a2) = 1 – 4/6*1 – 2/6*1 = 0.0
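
These two gains can be double-checked with a few lines (a sketch; the split counts are the ones listed above):

from math import log2

def entropy(pos, neg):
    total = pos + neg
    return -sum((c / total) * log2(c / total) for c in (pos, neg) if c)

# S = [3+, 3-]; a1 splits it into {2+, 1-} / {1+, 2-}, a2 into {2+, 2-} / {1+, 1-}
gain_a1 = entropy(3, 3) - (3 / 6) * entropy(2, 1) - (3 / 6) * entropy(1, 2)
gain_a2 = entropy(3, 3) - (4 / 6) * entropy(2, 2) - (2 / 6) * entropy(1, 1)
print(round(gain_a1, 4), round(gain_a2, 4))   # 0.0817 0.0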
Home work

               a1
                 True  → [D1, D2, D3]
                 False → [D4, D5, D6]
Home work

                             a1
                               True  → [D1, D2, D3], split on a2:
                                         True  → + (Yes)
                                         False → - (No)
                               False → [D4, D5, D6], split on a2:
                                         True  → - (No)
                                         False → + (Yes)
Home work
                            a1
                              True  → [D1, D2, D3], split on a2:
                                        True  → + (Yes)
                                        False → - (No)
                              False → [D4, D5, D6], split on a2:
                                        True  → - (No)
                                        False → + (Yes)


           (a1 ^ a2) V (a1' ^ a2')
Some Insights into Capabilities and
       Limitations of ID3 Algorithm
•   ID3 searches a complete hypothesis space. [Advantage]

•   ID3 maintains only a single current hypothesis as it searches through
    the space of decision trees. By committing to a single
    hypothesis, ID3 loses the capabilities that follow from explicitly
    representing all consistent hypotheses. [Disadvantage]

•   ID3 in its pure form performs no backtracking in its search. Once it
    selects an attribute to test at a particular level in the tree, it never
    backtracks to reconsider this choice. Therefore, it is susceptible to
    the usual risks of hill-climbing search without backtracking:
    converging to locally optimal solutions that are not globally optimal.
    [Disadvantage]
Some Insights into Capabilities and
       Limitations of ID3 Algorithm

•   ID3 uses all training examples at each step in the search to make
    statistically based decisions regarding how to refine its current
    hypothesis. This contrasts with methods that make decisions
    incrementally, based on individual training examples (e.g., FIND-S
    or CANDIDATE-ELIMINATION). One advantage of using statistical
    properties of all the examples (e.g., information gain) is that the
    resulting search is much less sensitive to errors in individual training
    examples. [Advantage]
