MACHINE LEARNING

What is Machine Learning?

  • There is no standard definition, but there are parallels between animal and machine learning.
  • Very broadly, a machine learns whenever it changes its structure, program, or data (based
    on inputs or in response to external information) in such a way that its expected future
    performance improves.
       o Example: Performance of a speech-recognition machine improves after hearing
          several samples of a person’s speech.

Learning Agents
  •   A learning agent is a combination of (sketched in code after the figure below):
        o Performance element – the collection of knowledge and procedures for selecting
          actions.
        o Critic – observes the “world” and passes information to the learning element.
        o Learning element – formulates new rules as necessary; the performance element is
          modified by installing those rules.
        o Problem generator – identifies areas of behavior in need of improvement and
          suggests experiments.


      [Figure: Model of a learning agent. Within the agent, the critic compares sensor input
      against a performance standard and sends feedback to the learning element; the learning
      element exchanges changes and knowledge with the performance element and sends learning
      goals to the problem generator, which suggests experiments; the performance element
      drives the actuators, and sensors and actuators connect the agent to the environment.]

             Model of a learning agent
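
The division of labor in the figure can be made concrete with a minimal Python sketch. This is
only an illustration of the four components and how they interact in a loop; the class names,
method names, and the environment interface (sense/act) are hypothetical, not taken from any
particular library.

    class PerformanceElement:
        """Knowledge and procedures for selecting actions."""
        def __init__(self):
            self.rules = []                      # condition -> action pairs (the agent's knowledge)

        def select_action(self, percept):
            for condition, action in self.rules:
                if condition(percept):
                    return action
            return None                          # default/no-op action when no rule applies

        def install(self, rule):
            self.rules.append(rule)              # behavior changes by installing new rules


    class Critic:
        """Observes the world and scores behavior against a fixed performance standard."""
        def __init__(self, performance_standard):
            self.standard = performance_standard

        def evaluate(self, percept, action):
            return self.standard(percept, action)       # feedback for the learning element


    class LearningElement:
        """Uses the critic's feedback to formulate new rules for the performance element."""
        def __init__(self):
            self.goals = []                              # learning goals for the problem generator

        def improve(self, feedback, performance_element):
            new_rule = self.formulate_rule(feedback)     # placeholder for real generalization
            if new_rule is not None:
                performance_element.install(new_rule)

        def formulate_rule(self, feedback):
            return None                                  # a real learner would build a rule here


    class ProblemGenerator:
        """Identifies weak areas of behavior and suggests exploratory actions."""
        def suggest_experiment(self, learning_goals):
            return None                                  # e.g. try an action with unknown outcome


    def run_agent(environment, performance_standard, steps=100):
        perf, learner = PerformanceElement(), LearningElement()
        critic, generator = Critic(performance_standard), ProblemGenerator()
        for _ in range(steps):
            percept = environment.sense()                         # sensors
            action = perf.select_action(percept)
            experiment = generator.suggest_experiment(learner.goals)
            environment.act(experiment if experiment is not None else action)   # actuators
            feedback = critic.evaluate(percept, action)
            learner.improve(feedback, perf)                       # close the learning loop
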
DECISION TREES

Motivation

   o When a businessperson needs to make a decision based on several factors, a decision tree
      can help identify which factors to consider and how each factor has historically been
      associated with different outcomes of the decision.

   o For example, in a credit risk case study, we have data for each applicant’s debt, income,
      and marital status.

   o A decision tree creates a model as either a graphical tree or a set of text rules that can
      predict (classify) each applicant as a good or bad credit risk.

A decision tree is a model that is both predictive and descriptive. It is called a decision tree
because the resulting model is presented in the form of a tree structure.

   o Visual presentation makes the decision tree model very easy to understand and assimilate.
      Decision trees are most commonly used for classification (predicting what group a case
      belongs to), but can also be used for regression (predicting a specific value).

   o Decision trees graphically display the relationships found in data. A tree shows the
      relationship between one dependent variable (e.g. credit risk) and several independent
      variables (e.g. income, debt, and marital status).

   o A goal in a decision tree has the form G ⇔ P1 ∨ P2 ∨ … ∨ Pn, where each Pi is the
      conjunction of tests along a path from the root of the tree to a leaf with a positive
      outcome (a worked instance is given after this list).

   o Most products also translate the tree to text rules, such as: If Income = High and Years
      on job > 5 Then Credit risk = Good.

   o Decision tree algorithms are very similar to rule induction algorithms, which produce rule
      sets directly without building a tree.

   o The training process that creates the decision tree is usually called induction.
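
As a worked instance of the goal form above: in the credit risk example developed below, the
induced tree has a single path to a positive (Good) leaf, so the learned goal reduces to one
disjunct:

      Credit risk = Good  ⇔  (Income = High ∧ Married = Yes)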
Example: The credit risk classification problem


                                 Name   Debt   Income   Married?   Risk
                                 Joe    1      1        1          1
                                 Sue    0      1        1          1
                                 John   0      1        0          0
                                 Mary   1      0        1          0
                                 Fred   0      0        1          0

                  Credit risk data with column values converted to numeric values
                  (1 = High Debt, High Income, Married, or Good risk; 0 otherwise).




       Predicted     High    Low     High     Low               Not
       Risk          Debt    Debt    Income   Income   Married  Married
       Good          1       1       2        0        2        0
       Poor          1       2       1        2        2        1

          Cross-tabulation of the independent vs. dependent columns for the root node.
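
This cross-tabulation is simple to reproduce in code. The short Python sketch below is only
illustrative (the variable names and the helper function are ours); it counts Good and Poor
risks for each value of each candidate column in the numeric table above.

    # Training data in the numeric encoding of the table above
    # (1 = High Debt / High Income / Married / Good risk; 0 otherwise).
    DATA = [
        # (name,  debt, income, married, risk)
        ("Joe",   1,    1,      1,       1),
        ("Sue",   0,    1,      1,       1),
        ("John",  0,    1,      0,       0),
        ("Mary",  1,    0,      1,       0),
        ("Fred",  0,    0,      1,       0),
    ]
    COLUMNS = {"Debt": 1, "Income": 2, "Married": 3}
    RISK = 4

    def cross_tab(rows, col):
        """Good/Poor counts for each value (1 or 0) of one independent column."""
        counts = {1: {"Good": 0, "Poor": 0}, 0: {"Good": 0, "Poor": 0}}
        for row in rows:
            label = "Good" if row[RISK] == 1 else "Poor"
            counts[row[col]][label] += 1
        return counts

    for name, col in COLUMNS.items():
        print(name, cross_tab(DATA, col))
    # Debt:    value 1 -> 1 Good, 1 Poor;   value 0 -> 1 Good, 2 Poor
    # Income:  value 1 -> 2 Good, 1 Poor;   value 0 -> 0 Good, 2 Poor
    # Married: value 1 -> 2 Good, 2 Poor;   value 0 -> 0 Good, 1 Poor
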



The resulting tree is:

      [Figure: the induced decision tree. The root node (5 instances: 40% Good, 60% Poor)
      splits on Income; the High Income child (3 instances) splits on Married, giving a Good
      leaf (Married = Yes) and a Poor leaf (Married = No); the Low Income child (2 instances)
      is a pure Poor leaf.]

      Note:
       o Each box in the tree represents a node.
       o The tree grows from the root node – the data is split at each level to form new
         nodes.
       o The leaf nodes play a special role when the tree is used for prediction.
Note the following:
   o In the tree, each node contains information about the number of instances at that node, and
     about the distribution of dependent variable values (Credit Risk).
   o The instances at the root node are all of the instances in the training set – five
     instances, of which 40 percent are Good risks and 60 percent are Poor risks.
   o Below the root node (parent) is the first split that, in this case, splits the data into two new
     nodes (children) based on whether Income is High or Low.
         o The rightmost node (Low Income) resulting from this split contains two instances,
            both of which are associated with Poor credit risk.
                 Because all instances have the same value of the dependent variable (Credit
                   Risk), this node is termed pure and will not be split further.
         o The leftmost node in the first split contains three instances, 66.7 % of which are
            Good.
         o The leftmost node is then further split based on the value of Married (Yes or No),
            resulting in two more nodes which are each also pure.
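
Continuing the sketch above (reusing the DATA list, COLUMNS, and RISK index defined earlier),
the first split can be checked directly: splitting on Income gives one child that is 66.7%
Good and one child that is pure Poor.

    def split(rows, col):
        """Partition the rows on one column's value (1 vs. 0)."""
        return ([r for r in rows if r[col] == 1], [r for r in rows if r[col] == 0])

    def fraction_good(rows):
        """Fraction of Good risks at a node; the node is pure if this is 0.0 or 1.0."""
        return sum(1 for r in rows if r[RISK] == 1) / len(rows)

    high_income, low_income = split(DATA, COLUMNS["Income"])
    print(len(high_income), fraction_good(high_income))   # 3 instances, 0.667 Good
    print(len(low_income), fraction_good(low_income))     # 2 instances, 0.0 Good (pure Poor)
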

Note also

    The order of the splits, Income first and then Married, is determined by an induction
     algorithm; the method used in the tree above is to pick the split that has the largest
     number of instances on the diagonal of its cross-tabulation (a code sketch of this
     heuristic follows this list).
    Once grown, a tree can be used for predicting a new case by starting at the root (top) of the
     tree and following a path down the branches until a leaf node is encountered. The path is
     determined by imposing the split rules on the values of the independent variables in the
     new instance.
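
A minimal version of that split-selection heuristic, again reusing the DATA list and the
cross_tab helper from the earlier sketch, could look like the following. Here "the diagonal"
is read as the count of (value 1, Good) plus (value 0, Poor) instances; under that reading
the heuristic picks Income first (diagonal 2 + 2 = 4, versus 3 for Debt and for Married),
as in the tree above.

    def diagonal_count(ct):
        """Instances on the diagonal of a 2x2 cross-tabulation: (1, Good) plus (0, Poor)."""
        return ct[1]["Good"] + ct[0]["Poor"]

    def best_split(rows, columns):
        """Choose the column whose cross-tabulation has the most instances on its diagonal."""
        return max(columns, key=lambda name: diagonal_count(cross_tab(rows, columns[name])))

    print(best_split(DATA, COLUMNS))   # Income
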

Example

Consider the first row in the training set for Joe. Because Joe has High income, follow the branch
to the left. Because Joe is married, follow the tree down the branch to the right. At this point we
have arrived at a leaf node, and the predicted value is the predominant value of the leaf node, or
Good in this case.

The rules for the leaf nodes, taken left to right, are as follows:
IF Income = High AND Married = No THEN Risk = Poor
IF Income = High AND Married = Yes THEN Risk = Good
IF Income = Low THEN Risk = Poor
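
These three rules are easy to express directly as a function. The sketch below (the function
and parameter names are ours, chosen for illustration) returns the predicted risk for a new
applicant; applied to Joe it gives Good, as in the walkthrough above.

    def predict_risk(income, married):
        """Leaf rules of the induced tree, read left to right."""
        if income == "High" and married == "No":
            return "Poor"
        if income == "High" and married == "Yes":
            return "Good"
        return "Poor"                                    # Income = Low

    print(predict_risk(income="High", married="Yes"))    # Joe -> Good
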

There are often additional interesting and potentially useful observations about the data that can
be made after a tree has been induced. In the case of our sample data, the tree reveals:
   •   Debt appears to have no role in determining Risk.
   •   People with Low Income are always a Poor Risk.
   •   Income is the most significant factor in determining risk.
