COMP-4640: Intelligent & Interactive Systems
                                 (Learning from Observations)


- In the early days of AI, machine learning took a back seat to expert systems

- Expert system development usually consists of an expert and a knowledge engineer
  (KE). The KE extracts rules from the expert and uses them to create an expert
  program.

- The bottleneck of expert system development is the expert-KE interaction.

- It would be better if we could automate the knowledge engineering process or, better
  yet, allow systems to learn to be experts by example.

-   According to Herbert Simon, learning is, “Any change in a System that allows it to
    perform better the second time on repetition of the same task or on another task drawn
    from the same population.” [G. F. Luger and W. A. Stubblefield, Artificial
    Intelligence: Structures and Strategies for Complex Problem Solving, The
    Benjamin/Cummings Publishing Company, Inc. 1989.]






- Machine learning algorithms have been used in:

    1. speech recognition

    2. driving automobiles

    3. playing world-class backgammon

    4. program generation

    5. routing in communication networks

    6. understanding handwritten text

    7. data mining

    8. etc.




- When solving a machine learning problem we must be sure to identify:

    1. What task is to be learned?

    2. How do we (will we) test the performance of our system?

    3. What knowledge do we want to learn?

    4. How do we represent this knowledge?

    5. What learning paradigm would be best to use?

    6. How do we construct a training experience for our learner?






Training Experience


- The way one decides to train a learner is important.

- An appropriately designed training experience will allow the learner to generalize well
  on new, previously unseen examples.

- One can view training as “practice” for real-world situations.






Types of Training Experience

- Direct (Supervised)

- Indirect (Reinforcement)

- Unsupervised



Control of Training Experience

- Teacher Controlled

- Learner Controlled






Distribution of Training Examples

- It is important that the set of training examples given to the learner represents the type
  of examples that will be given to the learner during the performance measurement.

- If the distribution of the training examples does not match that of the test examples,
  then the learner will generalize poorly.




Machine Learning as Search

- Machine Learning can be viewed as a search through hypothesis space.

- A hypothesis can be viewed as a mapping function from percepts to desired output.




The Four Components of Learning Systems

1. The Performance System

  - solves the problem at hand using the learned hypothesis (learned approximation of
    the target function) constructed by the learning element

  - takes unseen percepts as inputs and provides the outputs (again based on the learned
    hypothesis)


2. The Critic

  - informs the learner of how well the agent is doing

  - provides training examples to the learning element







The Four Components of Learning Systems (cont.)

3. The Learning Element (Generalizer)

  - takes training examples as input

  - outputs a hypothesis


4. The Experiment Generator

  - takes as input a hypothesis

  - outputs a new problem for the performance system to explore




Learning Decision Trees
- Decision tree induction is a simple but powerful learning paradigm. In this method a
  set of training examples is broken down into smaller and smaller subsets while, at the
  same time, an associated decision tree is incrementally developed. At the end of the
  learning process, a decision tree covering the training set is returned.

- The decision tree can be thought of as a set of sentences (in Disjunctive Normal Form)
  written in propositional logic.

- Some characteristics of problems that are well suited to Decision Tree Learning are:

      a. Attribute-value paired elements

      b. Discrete target function

      c. Disjunctive descriptions (of target function)

      d. Possibly missing or erroneous training data







(Outlook = Sunny ∧ Humidity = Normal) ∨ (Outlook = Overcast) ∨ (Outlook = Rain ∧ Wind = Weak)

             [picture from: Tom M. Mitchell, Machine Learning, McGraw-Hill, 1997]




This decision tree was created using the following training examples:

         Day    Outlook      Temperature Humidity             Wind         PlayTennis
         D1      Sunny           Hot       High               Weak             No
         D2      Sunny           Hot       High               Strong           No
         D3     Overcast         Hot       High               Weak            Yes
         D4       Rain          Mild       High               Weak            Yes
         D5       Rain          Cool      Normal              Weak            Yes
         D6       Rain          Cool      Normal              Strong           No
         D7     Overcast        Cool      Normal              Strong          Yes
         D8      Sunny          Mild       High               Weak             No
         D9      Sunny          Cool      Normal              Weak            Yes
         D10      Rain          Mild      Normal              Weak            Yes
         D11     Sunny          Mild      Normal              Strong          Yes
         D12    Overcast        Mild       High               Strong          Yes
         D13    Overcast         Hot     Normal               Weak            Yes
         D14      Rain          Mild       High               Strong           No
                [Table from: Tom M. Mitchell, Machine Learning, McGraw-Hill, 1997]
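As a sanity check, the tree's DNF form, (Outlook = Sunny ∧ Humidity = Normal) ∨ (Outlook = Overcast) ∨ (Outlook = Rain ∧ Wind = Weak), can be coded as a boolean function and run against every row of the table. A minimal Python sketch (the rows are transcribed from the table above):

```python
# The decision tree's DNF form, written as a predicate over one example.
def play_tennis(outlook, humidity, wind):
    return ((outlook == "Sunny" and humidity == "Normal")
            or outlook == "Overcast"
            or (outlook == "Rain" and wind == "Weak"))

# (Day, Outlook, Temperature, Humidity, Wind, PlayTennis) from the table above.
examples = [
    ("D1",  "Sunny",    "Hot",  "High",   "Weak",   "No"),
    ("D2",  "Sunny",    "Hot",  "High",   "Strong", "No"),
    ("D3",  "Overcast", "Hot",  "High",   "Weak",   "Yes"),
    ("D4",  "Rain",     "Mild", "High",   "Weak",   "Yes"),
    ("D5",  "Rain",     "Cool", "Normal", "Weak",   "Yes"),
    ("D6",  "Rain",     "Cool", "Normal", "Strong", "No"),
    ("D7",  "Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("D8",  "Sunny",    "Mild", "High",   "Weak",   "No"),
    ("D9",  "Sunny",    "Cool", "Normal", "Weak",   "Yes"),
    ("D10", "Rain",     "Mild", "Normal", "Weak",   "Yes"),
    ("D11", "Sunny",    "Mild", "Normal", "Strong", "Yes"),
    ("D12", "Overcast", "Mild", "High",   "Strong", "Yes"),
    ("D13", "Overcast", "Hot",  "Normal", "Weak",   "Yes"),
    ("D14", "Rain",     "Mild", "High",   "Strong", "No"),
]

# The tree covers the training set: every prediction matches the label.
agree = all(play_tennis(o, h, w) == (label == "Yes")
            for _, o, _, h, w, label in examples)
```

Running this confirms that the predicate agrees with the PlayTennis label on all 14 days, i.e., the returned tree covers the training set.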




Building a Decision Tree
In building a decision tree we:

1.   first test all attributes and select the one that would function as the best root;

2. break up the training set into subsets based on the branches of the root node;

3. test the remaining attributes to see which ones fit best underneath the branches of the
   root node;

4. continue this process for all other branches until

       a. all examples of a subset are of one type

       b. there are no examples left (return majority classification of the parent)

       c. there are no more attributes left (default value should be majority classification)
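The four steps above can be sketched as a recursive procedure. The following is an illustrative Python skeleton (names are our own, not from any particular textbook); choose_best is a stand-in for the attribute-selection heuristic, which the next slide defines via information gain:

```python
from collections import Counter

def majority(examples):
    """Majority classification of a set of (example, label) pairs (cases b and c)."""
    return Counter(label for _, label in examples).most_common(1)[0][0]

def build_tree(examples, attributes, choose_best, parent_examples=()):
    # b. no examples left: return the parent's majority classification
    if not examples:
        return majority(parent_examples)
    labels = {label for _, label in examples}
    # a. all examples of this subset are of one type
    if len(labels) == 1:
        return labels.pop()
    # c. no attributes left: default to the majority classification
    if not attributes:
        return majority(examples)
    # 1. pick the attribute that functions as the best root for this subset
    best = choose_best(examples, attributes)
    # 2./3./4. split on each branch of the chosen attribute and recurse
    tree = {}
    for v in {x[best] for x, _ in examples}:
        subset = [(x, label) for x, label in examples if x[best] == v]
        rest = [a for a in attributes if a != best]
        tree[v] = build_tree(subset, rest, choose_best, examples)
    return (best, tree)
```

Here an example is a dict mapping attribute names to values; an internal node is returned as (attribute, {value: subtree}).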




Determining which attribute is best (Entropy & Gain)

- Entropy (E) is the minimum number of bits needed, on average, to encode the
  classification (yes or no) of an arbitrary example drawn from S

- E(S) = Σ_{i=1..c} −pᵢ log₂ pᵢ     where S is a set of training examples and pᵢ is the
  proportion of examples in S belonging to class i

- For our entropy equation we define 0 log₂ 0 = 0

- The information gain G(S,A), where A is an attribute, is the expected reduction in
  entropy from partitioning S on the values of A:

- G(S,A) ≡ E(S) − Σ_{v ∈ Values(A)} (|Sᵥ| / |S|) · E(Sᵥ)
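Both formulas are easy to compute directly. A small Python sketch, checked against the PlayTennis data (9 Yes and 5 No overall; partitioning on Outlook gives Sunny 2+/3−, Overcast 4+/0−, Rain 3+/2−):

```python
from math import log2

def entropy(counts):
    """E(S) = Σ −p_i log2(p_i) over class counts, with 0·log2(0) taken as 0."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def gain(total_counts, partition):
    """G(S, A) = E(S) − Σ_v (|S_v|/|S|) · E(S_v), with S_v given as class counts."""
    n = sum(total_counts)
    return entropy(total_counts) - sum(
        sum(counts) / n * entropy(counts) for counts in partition)

e_s = entropy([9, 5])                               # ≈ 0.940
g_outlook  = gain([9, 5], [[2, 3], [4, 0], [3, 2]]) # ≈ 0.247
g_humidity = gain([9, 5], [[3, 4], [6, 1]])         # ≈ 0.152
```

Outlook has the largest gain of the four attributes, which is why it ends up at the root of the tree shown earlier.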







Let's Try an Example!




Overfitting
In decision tree learning (given a large number of training examples) it is possible to
“overfit” the training set, i.e., to find meaningless regularity in the data (p. 662).
Overfitting can occur even when the target function is not at all random.

At the onset of overfitting, the accuracy of the “growing” tree continues to increase on
the training set, but in actuality the successive trees begin to lose their ability to
generalize beyond the training set.
Overfitting can be eliminated or reduced by using:
     - Pruning
       Preventing recursive splitting on attributes that are clearly not relevant even when
       the data at that node in the tree is not uniformly classified.
     - Cross-Validation
       Try to estimate how well the current hypothesis will predict unseen data.
       This is done by setting aside some fraction of the known data and using it to test
       the prediction performance of a hypothesis induced from the remaining data.
       k-fold cross-validation means running k experiments, each time setting aside a
       different 1/k of the data to test on, and averaging the results.
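The k-fold procedure just described can be sketched generically. In this Python sketch, learn and error_rate are placeholders (illustrative names) for the induction algorithm and the evaluation metric:

```python
def k_fold_cross_validation(examples, k, learn, error_rate):
    """Run k experiments, each holding out a different 1/k of the data,
    and average the held-out error rates."""
    fold_size = len(examples) // k
    errors = []
    for i in range(k):
        # Hold out the i-th fold for testing ...
        test = examples[i * fold_size:(i + 1) * fold_size]
        # ... and induce a hypothesis from the remaining data.
        train = examples[:i * fold_size] + examples[(i + 1) * fold_size:]
        hypothesis = learn(train)
        errors.append(error_rate(hypothesis, test))
    return sum(errors) / k
```

In practice the examples should be shuffled first so that each fold follows the same distribution as the whole set, for the reason given on the “Distribution of Training Examples” slide.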






Decision Tree learning can be extended to deal with:

    - Missing Data
      In many domains not all attribute values will be known

    - Multivalued Attributes
      We could add an attribute such as Restaurant Name, with a different value for
      every example

    - Continuous-valued Attributes
      If we are trying to predict a numerical value, such as the price of a work of art,
      rather than a discrete classification, then we need a regression tree.
      Each leaf has a function of a subset of the numerical attributes, rather than a
      single value.





Concept Learning

- In concept learning, the objective is to infer a boolean-valued function that is a
  conjunction of the attribute-value assignments of the training examples.

- In concept learning, n+2 values can be assigned to an attribute with n possible values:


          1. The n values of the attribute.

          2. The value “?” which means any value assignment is accepted for the
             corresponding attribute.

          3. The value “∅” which means no value assignment is acceptable





Consider the following set of training examples:

     height     attitude     KOG      scoring ability    defensive ability    Great Player
      tall        good       good          high                good                yes
     short        bad        good          low                 poor                no
      tall        good       good          low                 good                yes
      tall        good       poor          high                poor                no
     short        good       poor          low                 good                yes


One hypothesis that works is:

                                      <?,good,?,?,good>

Are there any other hypotheses that will work? Are they more or less general?





The Find-S Algorithm

Step 1:  Initialize S to the most specific hypothesis

Step 2:  For each positive training example x
              For each value ai in the hypothesis
                   If ai is inconsistent with the corresponding value in x Then
                          Replace ai with the next more general value consistent with x
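A minimal Python sketch of Find-S over the <height, attitude, KOG, scoring ability, defensive ability> representation, run on the positive examples from the Great Player table ("0" stands in for the maximally specific, ∅-like starting value):

```python
def find_s(positives):
    """Find-S: start with the most specific hypothesis and minimally
    generalize it to cover each positive training example."""
    NULL = "0"                       # no value acceptable yet
    h = [NULL] * len(positives[0])   # most specific hypothesis
    for x in positives:
        for i, (hv, xv) in enumerate(zip(h, x)):
            if hv == NULL:
                h[i] = xv            # first positive: copy its values
            elif hv != xv:
                h[i] = "?"           # next more general value: accept anything
    return tuple(h)

# Positive examples from the Great Player table.
positives = [
    ("tall",  "good", "good", "high", "good"),
    ("tall",  "good", "good", "low",  "good"),
    ("short", "good", "poor", "low",  "good"),
]
h = find_s(positives)   # ('?', 'good', '?', '?', 'good')
```

Note that the result is exactly the hypothesis <?,good,?,?,good> given on the earlier slide.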





The Find-G Algorithm

Step 1: Initialize G to the most general hypothesis in the hypothesis-space

Step 2: For each training example x
            If x is negative Then
                a. For each g in G that matches x, replace g with its most general
                    specializations that do not match x
                b. Remove all g in G that are more specific than any other hypothesis in G
                c. Remove all g in G that fail to match a positive example in the training set
                d. Remove any hypothesis in G that matches a previously seen negative
                    example
            Else
                Remove from G all hypotheses that do not match x





Candidate Elimination Algorithm

Step 1: Initialize G to the set of most general hypotheses
         Initialize S to the set of most specific hypotheses

Step 2: For each training example, x,
           If x is positive Then
                  - Remove from G those hypotheses that do not match x
                  - Replace each s in S that does not match x with its minimal
                     generalizations that match x and that are more specific than some
                     member of G (the original s is removed)
                  - Only keep the most specific hypotheses in S
           Else
                  - Remove any hypotheses in S that match x
                  - For each g in G that matches x,
                        a. Remove g
                        b. Replace g with its most general specializations that do not
                           match x and that are more general than some member of S
                        c. Remove all g in G that are more specific than any other
                           hypothesis in G
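One way to implement this for conjunctive hypotheses over discrete attributes is sketched below (helper names are illustrative, and the S/G bookkeeping is simplified to what this hypothesis space needs). Run on the Great Player training set, S converges to the hypothesis <?,good,?,?,good> from the earlier slide:

```python
def matches(h, x):
    return all(hv == "?" or hv == xv for hv, xv in zip(h, x))

def more_general(h1, h2):
    """True if h1 is equal to or more general than h2."""
    return all(a == "?" or a == b for a, b in zip(h1, h2))

def generalize(s, x):
    """Minimal generalization of s that covers x ("0" = most specific value)."""
    return tuple(xv if sv == "0" else (sv if sv == xv else "?")
                 for sv, xv in zip(s, x))

def specializations(g, x, domains):
    """Most general specializations of g that exclude the negative example x."""
    out = []
    for i, gv in enumerate(g):
        if gv == "?":
            out += [g[:i] + (v,) + g[i + 1:] for v in domains[i] if v != x[i]]
    return out

def candidate_elimination(examples, domains):
    n = len(domains)
    S = [tuple("0" for _ in range(n))]   # most specific boundary
    G = [tuple("?" for _ in range(n))]   # most general boundary
    for x, positive in examples:
        if positive:
            G = [g for g in G if matches(g, x)]
            S = [generalize(s, x) if not matches(s, x) else s for s in S]
            S = [s for s in S if any(more_general(g, s) for g in G)]
            S = [s for s in S if not any(t != s and more_general(s, t) for t in S)]
        else:
            S = [s for s in S if not matches(s, x)]
            newG = []
            for g in G:
                if not matches(g, x):
                    newG.append(g)
                else:
                    newG += [g2 for g2 in specializations(g, x, domains)
                             if any(more_general(g2, s) for s in S)]
            G = [g for g in newG
                 if not any(h != g and more_general(h, g) for h in newG)]
    return S, G

domains = [("tall", "short"), ("good", "bad"), ("good", "poor"),
           ("high", "low"), ("good", "poor")]
examples = [
    (("tall",  "good", "good", "high", "good"), True),
    (("short", "bad",  "good", "low",  "poor"), False),
    (("tall",  "good", "good", "low",  "good"), True),
    (("tall",  "good", "poor", "high", "poor"), False),
    (("short", "good", "poor", "low",  "good"), True),
]
S, G = candidate_elimination(examples, domains)
# S converges to [('?', 'good', '?', '?', 'good')]
```

The two boundaries returned delimit the version space: every hypothesis between S and G is consistent with all five training examples.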




