Information Gain,
Decision Trees and
Boosting
10-701 ML recitation
9 Feb 2006
by Jure
Entropy and
Information Gain
Entropy & Bits
 You are watching a sequence of independent random
samples of X
 X has 4 possible values:
P(X=A)=1/4, P(X=B)=1/4, P(X=C)=1/4, P(X=D)=1/4
 You get a string of symbols ACBABBCDADDC…
 To transmit the data over a binary link you can
encode each symbol with two bits (A=00, B=01,
C=10, D=11)
 You need 2 bits per symbol
Fewer Bits – example 1
 Now someone tells you the probabilities are not
equal
P(X=A)=1/2, P(X=B)=1/4, P(X=C)=1/8, P(X=D)=1/8
 Now it is possible to find a coding that uses only
1.75 bits on average. How?
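One answer (a standard construction, not spelled out on the slide): a prefix code that gives shorter codewords to likelier symbols. A quick check that it averages 1.75 bits:

```python
# Variable-length prefix code: likelier symbols get shorter codewords.
# A=0, B=10, C=110, D=111 is one valid choice (the slide leaves it open).
probs = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
code = {"A": "0", "B": "10", "C": "110", "D": "111"}

# Expected number of bits per symbol under this code:
avg_bits = sum(probs[s] * len(code[s]) for s in probs)  # 1.75
```

No codeword is a prefix of another, so the stream decodes unambiguously.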
Fewer bits – example 2
 Suppose there are three equally likely values
P(X=A)=1/3, P(X=B)=1/3, P(X=C)=1/3
 Naïve coding: A = 00, B = 01, C=10
 Uses 2 bits per symbol
 Can you find a coding that uses 1.6 bits per
symbol?
 In theory it can be done with 1.58496 bits
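That theoretical number is just the entropy of the uniform three-value distribution, log2 3; a quick sketch:

```python
import math

# Entropy of three equally likely values: H = log2(3) ~ 1.58496 bits.
p = [1 / 3, 1 / 3, 1 / 3]
H = -sum(pi * math.log2(pi) for pi in p)
```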
Entropy – General Case
 Suppose X takes n values, V1, V2,… Vn, and
P(X=V1)=p1, P(X=V2)=p2, … P(X=Vn)=pn
 What is the smallest number of bits, on average, per
symbol, needed to transmit symbols drawn from the
distribution of X? It’s
H(X) = – p1 log2 p1 – p2 log2 p2 – … – pn log2 pn = – Σi pi log2 pi
 H(X) = the entropy of X
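The general formula translates directly into a small helper; a minimal sketch in Python (names are illustrative):

```python
import math

def entropy(probs):
    """H(X) = -sum_i p_i * log2(p_i), in bits; terms with p_i = 0 contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# The two distributions from the earlier slides:
H_uniform4 = entropy([0.25, 0.25, 0.25, 0.25])   # 2 bits per symbol
H_skewed4 = entropy([0.5, 0.25, 0.125, 0.125])   # 1.75 bits per symbol
```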
High, Low Entropy
 “High Entropy”
 X is from a uniform-like distribution
 Flat histogram
 Values sampled from it are less predictable
 “Low Entropy”
 X is from a varied (peaks and valleys) distribution
 Histogram has many lows and highs
 Values sampled from it are more predictable
Specific Conditional Entropy, H(Y|X=v)
X Y
Math Yes
History No
CS Yes
Math No
Math No
CS Yes
History No
Math Yes
 I have input X and want to predict
Y
 From data we estimate
probabilities
P(LikeG = Yes) = 0.5
P(Major=Math & LikeG=No) = 0.25
P(Major=Math) = 0.5
P(Major=History & LikeG=Yes) = 0
 Note
H(X) = 1.5
H(Y) = 1
X = College Major
Y = Likes “Gladiator”
Specific Conditional Entropy, H(Y|X=v)
X Y
Math Yes
History No
CS Yes
Math No
Math No
CS Yes
History No
Math Yes
 Definition of Specific Conditional
Entropy
 H(Y|X=v) = entropy of Y among
only those records in which X
has value v
 Example:
H(Y|X=Math) = 1
H(Y|X=History) = 0
H(Y|X=CS) = 0
X = College Major
Y = Likes “Gladiator”
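The three values above can be checked directly against the table; a sketch in Python (the `records` list transcribes the slide's eight rows):

```python
import math

# The 8 (Major, LikesGladiator) records from the slide's table.
records = [("Math", "Yes"), ("History", "No"), ("CS", "Yes"), ("Math", "No"),
           ("Math", "No"), ("CS", "Yes"), ("History", "No"), ("Math", "Yes")]

def specific_cond_entropy(records, v):
    """H(Y|X=v): entropy of Y among only the records where X = v."""
    ys = [y for x, y in records if x == v]
    H = 0.0
    for label in set(ys):
        p = ys.count(label) / len(ys)
        H -= p * math.log2(p)
    return H
```

Math splits 2 Yes / 2 No (entropy 1), while History and CS are pure (entropy 0).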
Conditional Entropy, H(Y|X)
X Y
Math Yes
History No
CS Yes
Math No
Math No
CS Yes
History No
Math Yes
 Definition of Conditional Entropy
H(Y|X) = the average conditional
entropy of Y
= Σi P(X=vi) H(Y|X=vi)
 Example:
H(Y|X) = 0.5*1+0.25*0+0.25*0 = 0.5
X = College Major
Y = Likes “Gladiator”
vi P(X=vi) H(Y|X=vi)
Math 0.5 1
History 0.25 0
CS 0.25 0
Information Gain
X Y
Math Yes
History No
CS Yes
Math No
Math No
CS Yes
History No
Math Yes
 Definition of Information Gain
 IG(Y|X) = I must transmit Y.
How many bits on average
would it save me if both ends of
the line knew X?
IG(Y|X) = H(Y) – H(Y|X)
 Example:
H(Y) = 1
H(Y|X) = 0.5
Thus:
IG(Y|X) = 1 – 0.5 = 0.5
X = College Major
Y = Likes “Gladiator”
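Putting the pieces together, the whole IG(Y|X) computation for the table can be sketched as:

```python
import math

# The 8 (Major, LikesGladiator) records from the slide's table.
records = [("Math", "Yes"), ("History", "No"), ("CS", "Yes"), ("Math", "No"),
           ("Math", "No"), ("CS", "Yes"), ("History", "No"), ("Math", "Yes")]

def H(labels):
    """Entropy (bits) of a list of discrete labels."""
    n = len(labels)
    return -sum((labels.count(v) / n) * math.log2(labels.count(v) / n)
                for v in set(labels))

def info_gain(records):
    """IG(Y|X) = H(Y) - sum_v P(X=v) * H(Y|X=v)."""
    xs = [x for x, _ in records]
    ys = [y for _, y in records]
    H_cond = sum((xs.count(v) / len(xs)) * H([y for x, y in records if x == v])
                 for v in set(xs))
    return H(ys) - H_cond
```

Here H(Y) = 1 and H(Y|X) = 0.5, so the gain is 0.5 bits, matching the slide.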
Decision Trees
When do I play tennis?
Decision Tree
Is the decision tree correct?
 Let’s check whether the split on the Wind attribute is
correct.
 We need to show that the Wind attribute has the
highest information gain.
When do I play tennis?
Wind attribute – 5 records match
Note: calculate the entropy only on the examples that got
“routed” to our branch of the tree (Outlook=Rain)
Calculation
 Let
S = {D4, D5, D6, D10, D14}
 Entropy:
H(S) = – 3/5log(3/5) – 2/5log(2/5) = 0.971
 Information Gain
IG(S,Temp) = H(S) – H(S|Temp) = 0.01997
IG(S, Humidity) = H(S) – H(S|Humidity) = 0.01997
IG(S,Wind) = H(S) – H(S|Wind) = 0.971
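A sketch that reproduces these numbers. The Wind values and labels for the five examples are not printed in this text; they are taken from Mitchell's standard PlayTennis table, so treat that data as an assumption here:

```python
import math

def entropy(labels):
    """Entropy (bits) of a list of discrete labels."""
    n = len(labels)
    return -sum((labels.count(v) / n) * math.log2(labels.count(v) / n)
                for v in set(labels))

# The 5 Outlook=Rain examples as (Wind, PlayTennis), per Mitchell's
# PlayTennis table (an assumption; the slide shows them in a figure).
S = {"D4": ("Weak", "Yes"), "D5": ("Weak", "Yes"), "D6": ("Strong", "No"),
     "D10": ("Weak", "Yes"), "D14": ("Strong", "No")}

H_S = entropy([y for _, y in S.values()])   # -3/5 log2(3/5) - 2/5 log2(2/5)

# H(S|Wind): Weak -> all Yes, Strong -> all No, so both branches are pure.
H_cond = sum(
    (sum(1 for w, _ in S.values() if w == wind) / len(S))
    * entropy([y for w, y in S.values() if w == wind])
    for wind in ("Weak", "Strong"))

IG_wind = H_S - H_cond   # the full 0.971 bits: Wind wins the split
```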
More about Decision Trees
 How do I determine the classification in a leaf?
 If Outlook=Rain is a leaf, what is the classification rule?
 Classify example:
 We have N Boolean attributes, all of which are needed
for classification:
 How many IG calculations do we need?
 Strength of Decision Trees (Boolean attributes):
 they can represent all Boolean functions
 Handling continuous attributes
Boosting
Booosting
 Is a way of combining weak learners (also
called base learners) into a more accurate
classifier
 Learn in iterations
 Each iteration focuses on the hard-to-learn parts of
the attribute space, i.e. examples that were
misclassified by previous weak learners.
Note: There is nothing inherently weak about the weak
learners – we just think of them this way. In fact, any
learning algorithm can be used as a weak learner in
boosting
Boooosting, AdaBoost
misclassifications
with respect to
weights D
Influence (importance)
of weak learner
Booooosting Decision Stumps
Boooooosting
 Weights Dt are uniform
 First weak learner is a stump that splits on Outlook
(since weights are uniform)
 4 misclassifications out of 14 examples:
α1 = ½ ln((1-ε)/ε)
= ½ ln((1- 0.28)/0.28) = 0.45
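The α1 computation, keeping ε = 4/14 exact (the slide rounds ε to 0.28 first, which is why it reports 0.45 rather than ≈ 0.458):

```python
import math

# Weak-learner influence in AdaBoost: alpha = 1/2 * ln((1 - eps) / eps).
eps = 4 / 14                            # 4 misclassifications out of 14
alpha_1 = 0.5 * math.log((1 - eps) / eps)
```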
 Update Dt:
Determines
misclassifications
Booooooosting Decision Stumps
misclassifications by 1st weak learner
Boooooooosting, round 1
 1st weak learner misclassifies 4 examples (D6,
D9, D11, D14):
 Now update weights Dt :
 Weights of examples D6, D9, D11, D14 increase
 Weights of other (correctly classified) examples
decrease
 How do we calculate IGs for the 2nd round of
boosting?
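The update itself is the standard AdaBoost rule (the slide shows it only as a figure); a sketch under that assumption:

```python
import math

# Standard AdaBoost weight update:
#   D_{t+1}(i) = D_t(i) * exp(+alpha)  if example i was misclassified
#   D_{t+1}(i) = D_t(i) * exp(-alpha)  if it was classified correctly
# then renormalize so the weights form a distribution again.
n = 14
alpha = 0.458                            # from round 1 (eps = 4/14)
missed = {"D6", "D9", "D11", "D14"}      # examples the first stump got wrong

D = {f"D{i}": 1 / n for i in range(1, n + 1)}   # uniform start
D = {k: w * math.exp(alpha if k in missed else -alpha) for k, w in D.items()}
Z = sum(D.values())                      # normalizer
D = {k: w / Z for k, w in D.items()}
```

After the update, the four misclassified examples carry more than 1/14 of the weight each, and every correctly classified example carries less.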
Booooooooosting, round 2
 Now use Dt instead of counts (Dt is a distribution):
 So when calculating information gain we calculate the
“probability” by using weights Dt (not counts)
 e.g.
P(Temp=mild) = Dt(d4) + Dt(d8)+ Dt(d10)+
Dt(d11)+ Dt(d12)+ Dt(d14)
which is more than 6/14 (Temp=mild occurs 6 times)
 similarly:
P(Tennis=Yes|Temp=mild) = (Dt(d4) + Dt(d10)+
Dt(d11)+ Dt(d12)) / P(Temp=mild)
 and no magic for IG
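A sketch of the weighted "probability" computation. The list of Temp=mild examples (d4, d8, d10, d11, d12, d14) is from the slide; the round-1 weights are reconstructed here under the standard AdaBoost update, so treat the exact numbers as illustrative:

```python
import math

# Weighted "probability" of an attribute value: sum the current boosting
# weights D_t of the matching examples instead of counting them.
alpha = 0.458                            # round-1 influence (assumption)
missed = {"d6", "d9", "d11", "d14"}      # upweighted after round 1
D = {f"d{i}": math.exp(alpha if f"d{i}" in missed else -alpha)
     for i in range(1, 15)}
Z = sum(D.values())
D = {k: w / Z for k, w in D.items()}     # normalize to a distribution

# Temp=mild holds for d4, d8, d10, d11, d12, d14 (per the slide).
mild = ["d4", "d8", "d10", "d11", "d12", "d14"]
p_mild = sum(D[d] for d in mild)
```

Because d11 and d14 were misclassified and upweighted, p_mild exceeds the plain count ratio 6/14, exactly as the slide claims.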
Boooooooooosting, even more
 Boosting does not easily overfit
 Have to determine stopping criteria
 Not obvious, but not that important
 Boosting is greedy:
 always chooses the currently best weak learner
 once it chooses a weak learner and its α, they
remain fixed – no changes are possible in later
rounds of boosting
Acknowledgement
 Part of the slides on Information Gain borrowed
from Andrew Moore
