Note to other teachers and users of these slides. Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew's tutorials: http://www.cs.cmu.edu/~awm/tutorials . Comments and corrections gratefully received.




Information Gain

Andrew W. Moore
Professor
School of Computer Science
Carnegie Mellon University
www.cs.cmu.edu/~awm
awm@cs.cmu.edu
412-268-7599

Copyright © 2001, 2003, Andrew W. Moore




                                             Bits
 You are watching a set of independent random samples of X
 You see that X has four possible values
P(X=A) = 1/4 P(X=B) = 1/4 P(X=C) = 1/4 P(X=D) = 1/4


 So you might see: BAACBADCDADDDA…
 You transmit data over a binary serial link. You can encode
 each reading with two bits (e.g. A = 00, B = 01, C = 10, D =
 11)

 0100001001001110110011111100…




Fewer Bits
 Someone tells you that the probabilities are not equal


P(X=A) = 1/2 P(X=B) = 1/4 P(X=C) = 1/8 P(X=D) = 1/8



 It’s possible…

       …to invent a coding for your transmission that only uses
       1.75 bits on average per symbol. How?






                                       Fewer Bits
 Someone tells you that the probabilities are not equal

P(X=A) = 1/2 P(X=B) = 1/4 P(X=C) = 1/8 P(X=D) = 1/8

 It’s possible…
       …to invent a coding for your transmission that only uses
       1.75 bits on average per symbol. How?
                                           A   0
                                           B   10
                                           C   110
                                           D   111



       (This is just one of several ways)
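As a quick numeric check of the 1.75-bit claim, here is a minimal sketch (Python, not part of the original slides; the dictionaries probs and code simply hold the values from this slide):

    # Expected code length = sum over symbols of P(symbol) * length(codeword)
    probs = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
    code  = {"A": "0", "B": "10", "C": "110", "D": "111"}
    avg_bits = sum(p * len(code[s]) for s, p in probs.items())
    print(avg_bits)   # 1.75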




Fewer Bits
Suppose there are three equally likely values…

              P(X=A) = 1/3 P(X=B) = 1/3 P(X=C) = 1/3

    Here’s a naïve coding, costing 2 bits per symbol

                                           A    00
                                           B    01
                                           C    10


    Can you think of a coding that would need only 1.6 bits
    per symbol on average?

    In theory, it can in fact be done with 1.58496 bits per
    symbol.
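Where the 1.58496 figure comes from: it is the entropy of three equally likely values, using the formula defined on the next slide. A worked check:

    H(X) = − 3 · (1/3) · log2(1/3) = log2 3 ≈ 1.58496 bits per symbol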




General Case
Suppose X can have one of m values… V1, V2, … Vm

P(X=V1) = p1      P(X=V2) = p2      ….      P(X=Vm) = pm

What's the smallest possible number of bits, on average, per symbol, needed to
transmit a stream of symbols drawn from X's distribution? It's

      H(X) = − p1 log2 p1 − p2 log2 p2 − … − pm log2 pm
           = − Σj=1..m pj log2 pj

H(X) = The entropy of X
• "High Entropy" means X is from a uniform (boring) distribution: a histogram of
  the frequency distribution of values of X would be flat, and so the values
  sampled from it would be all over the place.
• "Low Entropy" means X is from a varied (peaks and valleys) distribution: a
  histogram of the frequency distribution of values of X would have many lows
  and one or two highs, and so the values sampled from it would be more
  predictable.
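The formula translates directly into code. A minimal sketch (Python, not part of the original slides):

    import math

    def entropy(probs):
        # H(X) = -sum_j p_j * log2(p_j), with the convention 0 * log2(0) = 0
        return -sum(p * math.log2(p) for p in probs if p > 0)

    print(entropy([0.25, 0.25, 0.25, 0.25]))    # 2.0   (the uniform four-value case)
    print(entropy([0.5, 0.25, 0.125, 0.125]))   # 1.75  (the skewed case above)
    print(entropy([1/3, 1/3, 1/3]))             # about 1.58496 (= log2 3)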




Entropy in a nut-shell

Low Entropy: the values (locations of soup) sampled entirely from within the soup bowl.
High Entropy: the values (locations of soup) unpredictable… almost uniformly sampled throughout our dining room.




Specific Conditional Entropy H(Y|X=v)
Suppose I'm trying to predict output Y and I have input X
X = College Major
Y = Likes "Gladiator"

      X          Y
      Math       Yes
      History    No
      CS         Yes
      Math       No
      Math       No
      CS         Yes
      History    No
      Math       Yes

Let's assume this reflects the true probabilities.
E.g. from this data we estimate
      • P(LikeG = Yes) = 0.5
      • P(Major = Math & LikeG = No) = 0.25
      • P(Major = Math) = 0.5
      • P(LikeG = Yes | Major = History) = 0
Note:
      • H(X) = 1.5
      • H(Y) = 1
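To make the running example concrete, here is a small sketch (Python, not from the slides; records and entropy_of are hypothetical helper names) that encodes the eight records above and reproduces H(X) = 1.5 and H(Y) = 1 from their empirical frequencies:

    import math
    from collections import Counter

    records = [("Math", "Yes"), ("History", "No"), ("CS", "Yes"), ("Math", "No"),
               ("Math", "No"), ("CS", "Yes"), ("History", "No"), ("Math", "Yes")]

    def entropy_of(values):
        # Empirical entropy of a list of symbols: sum over v of -P(v) * log2(P(v))
        n = len(values)
        return sum(-(c / n) * math.log2(c / n) for c in Counter(values).values())

    print(entropy_of([x for x, y in records]))   # H(X) = 1.5
    print(entropy_of([y for x, y in records]))   # H(Y) = 1.0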




Specific Conditional Entropy H(Y|X=v)
X = College Major
Y = Likes "Gladiator"

Definition of Specific Conditional Entropy:
H(Y|X=v) = The entropy of Y among only those records in which X has value v

Example (using the table above):
• H(Y|X=Math) = 1
• H(Y|X=History) = 0
• H(Y|X=CS) = 0
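Continuing the sketch started above (again a hypothetical helper, not from the slides), the three example values follow directly:

    def specific_cond_entropy(records, v):
        # H(Y | X = v): entropy of Y restricted to the records where X == v
        return entropy_of([y for x, y in records if x == v])

    print(specific_cond_entropy(records, "Math"))      # 1.0
    print(specific_cond_entropy(records, "History"))   # 0.0
    print(specific_cond_entropy(records, "CS"))        # 0.0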




Conditional Entropy H(Y|X)
X = College Major
Y = Likes "Gladiator"

Definition of Conditional Entropy:
H(Y|X) = The average specific conditional entropy of Y
= if you choose a record at random, what will be the conditional entropy of Y, conditioned on that row's value of X
= Expected number of bits to transmit Y if both sides will know the value of X
= Σj Prob(X=vj) H(Y | X = vj)

Example:
      vj         Prob(X=vj)    H(Y | X = vj)
      Math       0.5           1
      History    0.25          0
      CS         0.25          0

H(Y|X) = 0.5 * 1 + 0.25 * 0 + 0.25 * 0 = 0.5
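The weighted average is a short addition to the same sketch:

    def cond_entropy(records):
        # H(Y|X) = sum over values v of P(X = v) * H(Y | X = v)
        n = len(records)
        xs = [x for x, y in records]
        return sum((xs.count(v) / n) * specific_cond_entropy(records, v)
                   for v in set(xs))

    print(cond_entropy(records))   # 0.5 * 1 + 0.25 * 0 + 0.25 * 0 = 0.5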




Information Gain
X = College Major
Y = Likes "Gladiator"

Definition of Information Gain:
IG(Y|X) = I must transmit Y. How many bits on average would it save me if both ends of the line knew X?
IG(Y|X) = H(Y) - H(Y | X)

Example:
• H(Y) = 1
• H(Y|X) = 0.5
• Thus IG(Y|X) = 1 – 0.5 = 0.5
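In the same sketch, information gain is just the difference of the two quantities already computed:

    def information_gain(records):
        # IG(Y|X) = H(Y) - H(Y|X)
        return entropy_of([y for x, y in records]) - cond_entropy(records)

    print(information_gain(records))   # 1.0 - 0.5 = 0.5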




Information Gain Example








                           Another example








Relative Information Gain
X = College Major
Y = Likes "Gladiator"

Definition of Relative Information Gain:
RIG(Y|X) = I must transmit Y. What fraction of the bits on average would it save me if both ends of the line knew X?
RIG(Y|X) = (H(Y) - H(Y | X)) / H(Y)

Example:
• H(Y|X) = 0.5
• H(Y) = 1
• Thus RIG(Y|X) = (1 – 0.5)/1 = 0.5
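And, continuing the sketch, the normalized version:

    def relative_information_gain(records):
        # RIG(Y|X) = (H(Y) - H(Y|X)) / H(Y)
        h_y = entropy_of([y for x, y in records])
        return (h_y - cond_entropy(records)) / h_y

    print(relative_information_gain(records))   # (1 - 0.5) / 1 = 0.5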




    What is Information Gain used for?
Suppose you are trying to predict whether someone is going to live past 80 years. From historical data you might find…
      • IG(LongLife | HairColor) = 0.01
      • IG(LongLife | Smoker) = 0.2
      • IG(LongLife | Gender) = 0.25
      • IG(LongLife | LastDigitOfSSN) = 0.00001
IG tells you how interesting a 2-d contingency table is going to be.
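In practice this means computing IG(Y|X) once per candidate attribute X and sorting. A sketch of that loop, assuming a hypothetical dataset laid out as a list of dicts (one key per attribute plus the outcome), built on the information_gain helper above:

    def rank_attributes(rows, attributes, outcome):
        # e.g. rows = [{"HairColor": "Brown", "Smoker": "No", "LongLife": "Yes"}, ...]
        gains = {a: information_gain([(r[a], r[outcome]) for r in rows])
                 for a in attributes}
        return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)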




