SlideShare uma empresa Scribd logo
1 de 16
Classification
APAM E4990
Computational Social Science
Jake Hofman
Columbia University
April 26, 2013
Jake Hofman (Columbia University) Classification April 26, 2013 1 / 11
Prediction a la Bayes1
• You’re testing for a rare condition:
• 1% of the student population is in this class
• You have a highly sensitive and specific test:
• 99% of students in the class visit compsocialscience.org
• 99% of students who aren’t in the class don’t visit this site
• Given that a student visits the course site, what is probability
the student is in our class?
1
Follows Wiggins, SciAm 2006
Jake Hofman (Columbia University) Classification April 26, 2013 2 / 11
Prediction a la Bayes
Students
10,000 ppl
1% In class
100 ppl
99% Visit
99 ppl
1% Don’t visit
1 per
99% Not in class
9900 ppl
1% Visit
99 ppl
99% Don’t visit
9801 ppl
Jake Hofman (Columbia University) Classification April 26, 2013 3 / 11
Prediction a la Bayes
Students
10,000 ppl
1% In class
100 ppl
99% Visit
99 ppl
1% Don’t visit
1 per
99% Not in class
9900 ppl
1% Visit
99 ppl
99% Don’t visit
9801 ppl
So given that a student visits the site (198 ppl), there is a 50%
chance the student is in our class (99 ppl)!
Jake Hofman (Columbia University) Classification April 26, 2013 3 / 11
Prediction a la Bayes
Students
10,000 ppl
1% In class
100 ppl
99% Visit
99 ppl
1% Don’t visit
1 per
99% Not in class
9900 ppl
1% Visit
99 ppl
99% Don’t visit
9801 ppl
The small error rate on the large population outside of our class
produces many false positives.
Jake Hofman (Columbia University) Classification April 26, 2013 3 / 11
Inverting conditional probabilities
Bayes’ Theorem
Equate the far right- and left-hand sides of product rule
p (y|x) p (x) = p (x, y) = p (x|y) p (y)
and divide to get the probability of y given x from the probability
of x given y:
p (y|x) =
p (x|y) p (y)
p (x)
where p (x) = y∈ΩY
p (x|y) p (y) is the normalization constant.
Jake Hofman (Columbia University) Classification April 26, 2013 4 / 11
Predictions a la Bayes
Given that a patient tests positive, what is probability the patient
is sick?
p (class|visit) =
99/100
p (visit|class)
1/100
p (class)
p (visit)
99/1002+99/1002=198/1002
=
99
198
=
1
2
where p (visit) = p (visit|class) p (class) + p visit|class p class .
Jake Hofman (Columbia University) Classification April 26, 2013 5 / 11
(Super) Naive Bayes
We can use Bayes’ rule to build a one-site student classifier:
p (class|site) =
p (site|class) p (class)
p (site)
where we estimate these probabilities with ratios of counts:
ˆp(site|class) =
# students in class who visit site
# students in class
ˆp(site|class) =
# students not in class who visit site
# students not in class
ˆp(class) =
# students in class
# students
ˆp(class) =
# students not in class
# students
Jake Hofman (Columbia University) Classification April 26, 2013 6 / 11
Naive Bayes
Represent each student by a binary vector x where xj = 1 if the
student has visited the j-th site (xj = 0 otherwise).
Modeling each site as an independent Bernoulli random variable,
the probability of visiting a set of sites x given class membership
c = 0, 1:
p (x|c) =
j
θ
xj
jc (1 − θjc)1−xj
where θjc denotes the probability that the j-th site is visited by a
student with class membership c.
Jake Hofman (Columbia University) Classification April 26, 2013 7 / 11
Naive Bayes
Using this likelihood in Bayes’ rule and taking a logarithm, we have:
log p (c|x) = log
p (x|c) p (c)
p (x)
=
j
xj log
θjc
1 − θjc
+
j
log(1 − θjc) + log
θc
p (x)
Jake Hofman (Columbia University) Classification April 26, 2013 8 / 11
Naive Bayes
We can eliminate p (x) by calculating the log-odds:
log
p (1|x)
p (0|x)
=
j
xj log
θj1(1 − θj0)
θj0(1 − θj1)
wj
+
j
log
1 − θj1
1 − θj0
+ log
θ1
θ0
w0
which gives a linear classifier of the form w · x + w0
Jake Hofman (Columbia University) Classification April 26, 2013 9 / 11
Naive Bayes
We train by counting students and sites to estimate θjc and θc:
ˆθjc =
njc
nc
ˆθc =
nc
n
and use these to calculate the weights ˆwj and bias ˆw0:
ˆwj = log
ˆθj1(1 − ˆθj0)
ˆθj0(1 − ˆθj1)
ˆw0 =
j
log
1 − ˆθj1
1 − ˆθj0
+ log
ˆθ1
ˆθ0
.
We we predict by simply adding the weights of the sites that a
student has visited to the bias term.
Jake Hofman (Columbia University) Classification April 26, 2013 10 / 11
Naive Bayes
In practice, this works better than one might expect given its
simplicity2
2
http://www.jstor.org/pss/1403452
Jake Hofman (Columbia University) Classification April 26, 2013 11 / 11
Naive Bayes
Training is computationally cheap and scalable, and the model is
easy to update given new observations2
2
http://www.springerlink.com/content/wu3g458834583125/
Jake Hofman (Columbia University) Classification April 26, 2013 11 / 11
Naive Bayes
Performance varies with document representations and
corresponding likelihood models2
2
http://ceas.cc/2006/15.pdf
Jake Hofman (Columbia University) Classification April 26, 2013 11 / 11
Naive Bayes
It’s often important to smooth parameter estimates (e.g., by
adding pseudocounts) to avoid overfitting
Jake Hofman (Columbia University) Classification April 26, 2013 11 / 11

Mais conteúdo relacionado

Destaque

Computational Social Science, Lecture 03: Counting at Scale, Part I
Computational Social Science, Lecture 03: Counting at Scale, Part IComputational Social Science, Lecture 03: Counting at Scale, Part I
Computational Social Science, Lecture 03: Counting at Scale, Part Ijakehofman
 
Computational Social Science, Lecture 02: An Introduction to Counting
Computational Social Science, Lecture 02: An Introduction to CountingComputational Social Science, Lecture 02: An Introduction to Counting
Computational Social Science, Lecture 02: An Introduction to Countingjakehofman
 
Modeling Social Data, Lecture 6: Regression, Part 1
Modeling Social Data, Lecture 6: Regression, Part 1Modeling Social Data, Lecture 6: Regression, Part 1
Modeling Social Data, Lecture 6: Regression, Part 1jakehofman
 
Modeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to CountingModeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to Countingjakehofman
 
Modeling Social Data, Lecture 1: Overview
Modeling Social Data, Lecture 1: OverviewModeling Social Data, Lecture 1: Overview
Modeling Social Data, Lecture 1: Overviewjakehofman
 
Modeling Social Data, Lecture 7: Model complexity and generalization
Modeling Social Data, Lecture 7: Model complexity and generalizationModeling Social Data, Lecture 7: Model complexity and generalization
Modeling Social Data, Lecture 7: Model complexity and generalizationjakehofman
 
LAS PLANTAS
LAS PLANTASLAS PLANTAS
LAS PLANTASrosayago
 
Estancias en Guadalajara
Estancias en GuadalajaraEstancias en Guadalajara
Estancias en GuadalajaraAlice Listing
 
Conferencia educación católica versión final - abril 24, 2009..[1]
Conferencia educación católica   versión final - abril 24, 2009..[1]Conferencia educación católica   versión final - abril 24, 2009..[1]
Conferencia educación católica versión final - abril 24, 2009..[1]julian
 
Cerveceria LOS VIKINGOS
Cerveceria LOS VIKINGOSCerveceria LOS VIKINGOS
Cerveceria LOS VIKINGOSjorchuk
 
4 de cada barroco
4 de cada barroco4 de cada barroco
4 de cada barrocoanamaria35
 
Boletín IgualSí Nº 3 | Diciembre 2015
Boletín IgualSí Nº 3 | Diciembre 2015Boletín IgualSí Nº 3 | Diciembre 2015
Boletín IgualSí Nº 3 | Diciembre 2015Mario Padilla
 

Destaque (18)

Computational Social Science, Lecture 03: Counting at Scale, Part I
Computational Social Science, Lecture 03: Counting at Scale, Part IComputational Social Science, Lecture 03: Counting at Scale, Part I
Computational Social Science, Lecture 03: Counting at Scale, Part I
 
Computational Social Science, Lecture 02: An Introduction to Counting
Computational Social Science, Lecture 02: An Introduction to CountingComputational Social Science, Lecture 02: An Introduction to Counting
Computational Social Science, Lecture 02: An Introduction to Counting
 
Modeling Social Data, Lecture 6: Regression, Part 1
Modeling Social Data, Lecture 6: Regression, Part 1Modeling Social Data, Lecture 6: Regression, Part 1
Modeling Social Data, Lecture 6: Regression, Part 1
 
Modeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to CountingModeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to Counting
 
Modeling Social Data, Lecture 1: Overview
Modeling Social Data, Lecture 1: OverviewModeling Social Data, Lecture 1: Overview
Modeling Social Data, Lecture 1: Overview
 
Modeling Social Data, Lecture 7: Model complexity and generalization
Modeling Social Data, Lecture 7: Model complexity and generalizationModeling Social Data, Lecture 7: Model complexity and generalization
Modeling Social Data, Lecture 7: Model complexity and generalization
 
практ3
практ3практ3
практ3
 
LAS PLANTAS
LAS PLANTASLAS PLANTAS
LAS PLANTAS
 
практ7
практ7практ7
практ7
 
No Esperes
No EsperesNo Esperes
No Esperes
 
Estancias en Guadalajara
Estancias en GuadalajaraEstancias en Guadalajara
Estancias en Guadalajara
 
Conferencia educación católica versión final - abril 24, 2009..[1]
Conferencia educación católica   versión final - abril 24, 2009..[1]Conferencia educación católica   versión final - abril 24, 2009..[1]
Conferencia educación católica versión final - abril 24, 2009..[1]
 
Cerveceria LOS VIKINGOS
Cerveceria LOS VIKINGOSCerveceria LOS VIKINGOS
Cerveceria LOS VIKINGOS
 
4 de cada barroco
4 de cada barroco4 de cada barroco
4 de cada barroco
 
Starbucks
StarbucksStarbucks
Starbucks
 
лаб3
лаб3лаб3
лаб3
 
Boletín IgualSí Nº 3 | Diciembre 2015
Boletín IgualSí Nº 3 | Diciembre 2015Boletín IgualSí Nº 3 | Diciembre 2015
Boletín IgualSí Nº 3 | Diciembre 2015
 
Question Four
Question FourQuestion Four
Question Four
 

Mais de jakehofman

Modeling Social Data, Lecture 12: Causality & Experiments, Part 2
Modeling Social Data, Lecture 12: Causality & Experiments, Part 2Modeling Social Data, Lecture 12: Causality & Experiments, Part 2
Modeling Social Data, Lecture 12: Causality & Experiments, Part 2jakehofman
 
Modeling Social Data, Lecture 11: Causality and Experiments, Part 1
Modeling Social Data, Lecture 11: Causality and Experiments, Part 1Modeling Social Data, Lecture 11: Causality and Experiments, Part 1
Modeling Social Data, Lecture 11: Causality and Experiments, Part 1jakehofman
 
Modeling Social Data, Lecture 10: Networks
Modeling Social Data, Lecture 10: NetworksModeling Social Data, Lecture 10: Networks
Modeling Social Data, Lecture 10: Networksjakehofman
 
Modeling Social Data, Lecture 8: Classification
Modeling Social Data, Lecture 8: ClassificationModeling Social Data, Lecture 8: Classification
Modeling Social Data, Lecture 8: Classificationjakehofman
 
Modeling Social Data, Lecture 4: Counting at Scale
Modeling Social Data, Lecture 4: Counting at ScaleModeling Social Data, Lecture 4: Counting at Scale
Modeling Social Data, Lecture 4: Counting at Scalejakehofman
 
Modeling Social Data, Lecture 3: Data manipulation in R
Modeling Social Data, Lecture 3: Data manipulation in RModeling Social Data, Lecture 3: Data manipulation in R
Modeling Social Data, Lecture 3: Data manipulation in Rjakehofman
 
Modeling Social Data, Lecture 8: Recommendation Systems
Modeling Social Data, Lecture 8: Recommendation SystemsModeling Social Data, Lecture 8: Recommendation Systems
Modeling Social Data, Lecture 8: Recommendation Systemsjakehofman
 
Modeling Social Data, Lecture 6: Classification with Naive Bayes
Modeling Social Data, Lecture 6: Classification with Naive BayesModeling Social Data, Lecture 6: Classification with Naive Bayes
Modeling Social Data, Lecture 6: Classification with Naive Bayesjakehofman
 
Modeling Social Data, Lecture 3: Counting at Scale
Modeling Social Data, Lecture 3: Counting at ScaleModeling Social Data, Lecture 3: Counting at Scale
Modeling Social Data, Lecture 3: Counting at Scalejakehofman
 
Modeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to CountingModeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to Countingjakehofman
 
Modeling Social Data, Lecture 1: Case Studies
Modeling Social Data, Lecture 1: Case StudiesModeling Social Data, Lecture 1: Case Studies
Modeling Social Data, Lecture 1: Case Studiesjakehofman
 
NYC Data Science Meetup: Computational Social Science
NYC Data Science Meetup: Computational Social ScienceNYC Data Science Meetup: Computational Social Science
NYC Data Science Meetup: Computational Social Sciencejakehofman
 
Technical Tricks of Vowpal Wabbit
Technical Tricks of Vowpal WabbitTechnical Tricks of Vowpal Wabbit
Technical Tricks of Vowpal Wabbitjakehofman
 
Data-driven modeling: Lecture 10
Data-driven modeling: Lecture 10Data-driven modeling: Lecture 10
Data-driven modeling: Lecture 10jakehofman
 
Data-driven modeling: Lecture 09
Data-driven modeling: Lecture 09Data-driven modeling: Lecture 09
Data-driven modeling: Lecture 09jakehofman
 
Using Data to Understand the Brain
Using Data to Understand the BrainUsing Data to Understand the Brain
Using Data to Understand the Brainjakehofman
 

Mais de jakehofman (16)

Modeling Social Data, Lecture 12: Causality & Experiments, Part 2
Modeling Social Data, Lecture 12: Causality & Experiments, Part 2Modeling Social Data, Lecture 12: Causality & Experiments, Part 2
Modeling Social Data, Lecture 12: Causality & Experiments, Part 2
 
Modeling Social Data, Lecture 11: Causality and Experiments, Part 1
Modeling Social Data, Lecture 11: Causality and Experiments, Part 1Modeling Social Data, Lecture 11: Causality and Experiments, Part 1
Modeling Social Data, Lecture 11: Causality and Experiments, Part 1
 
Modeling Social Data, Lecture 10: Networks
Modeling Social Data, Lecture 10: NetworksModeling Social Data, Lecture 10: Networks
Modeling Social Data, Lecture 10: Networks
 
Modeling Social Data, Lecture 8: Classification
Modeling Social Data, Lecture 8: ClassificationModeling Social Data, Lecture 8: Classification
Modeling Social Data, Lecture 8: Classification
 
Modeling Social Data, Lecture 4: Counting at Scale
Modeling Social Data, Lecture 4: Counting at ScaleModeling Social Data, Lecture 4: Counting at Scale
Modeling Social Data, Lecture 4: Counting at Scale
 
Modeling Social Data, Lecture 3: Data manipulation in R
Modeling Social Data, Lecture 3: Data manipulation in RModeling Social Data, Lecture 3: Data manipulation in R
Modeling Social Data, Lecture 3: Data manipulation in R
 
Modeling Social Data, Lecture 8: Recommendation Systems
Modeling Social Data, Lecture 8: Recommendation SystemsModeling Social Data, Lecture 8: Recommendation Systems
Modeling Social Data, Lecture 8: Recommendation Systems
 
Modeling Social Data, Lecture 6: Classification with Naive Bayes
Modeling Social Data, Lecture 6: Classification with Naive BayesModeling Social Data, Lecture 6: Classification with Naive Bayes
Modeling Social Data, Lecture 6: Classification with Naive Bayes
 
Modeling Social Data, Lecture 3: Counting at Scale
Modeling Social Data, Lecture 3: Counting at ScaleModeling Social Data, Lecture 3: Counting at Scale
Modeling Social Data, Lecture 3: Counting at Scale
 
Modeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to CountingModeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to Counting
 
Modeling Social Data, Lecture 1: Case Studies
Modeling Social Data, Lecture 1: Case StudiesModeling Social Data, Lecture 1: Case Studies
Modeling Social Data, Lecture 1: Case Studies
 
NYC Data Science Meetup: Computational Social Science
NYC Data Science Meetup: Computational Social ScienceNYC Data Science Meetup: Computational Social Science
NYC Data Science Meetup: Computational Social Science
 
Technical Tricks of Vowpal Wabbit
Technical Tricks of Vowpal WabbitTechnical Tricks of Vowpal Wabbit
Technical Tricks of Vowpal Wabbit
 
Data-driven modeling: Lecture 10
Data-driven modeling: Lecture 10Data-driven modeling: Lecture 10
Data-driven modeling: Lecture 10
 
Data-driven modeling: Lecture 09
Data-driven modeling: Lecture 09Data-driven modeling: Lecture 09
Data-driven modeling: Lecture 09
 
Using Data to Understand the Brain
Using Data to Understand the BrainUsing Data to Understand the Brain
Using Data to Understand the Brain
 

Último

Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxGrade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxkarenfajardo43
 
How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17Celine George
 
Mental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young mindsMental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young mindsPooky Knightsmith
 
Textual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSTextual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSMae Pangan
 
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
Unraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptxUnraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptx
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptxDhatriParmar
 
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...DhatriParmar
 
4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptxmary850239
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfJemuel Francisco
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptxmary850239
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfVanessa Camilleri
 
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...DhatriParmar
 
How to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseHow to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseCeline George
 
ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxVanesaIglesias10
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmStan Meyer
 
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...Nguyen Thanh Tu Collection
 
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxBIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxSayali Powar
 
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptxDIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptxMichelleTuguinay1
 
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Association for Project Management
 

Último (20)

Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxGrade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
 
How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17
 
Mental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young mindsMental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young minds
 
Textual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSTextual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHS
 
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
Unraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptxUnraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptx
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
 
Paradigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTAParadigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTA
 
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
 
4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx
 
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptxINCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdf
 
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
 
How to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseHow to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 Database
 
ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptx
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and Film
 
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
 
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxBIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
 
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptxDIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
 
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
 

Computational Social Science, Lecture 13: Classification

  • 1. Classification APAM E4990 Computational Social Science Jake Hofman Columbia University April 26, 2013 Jake Hofman (Columbia University) Classification April 26, 2013 1 / 11
  • 2. Prediction a la Bayes1 • You’re testing for a rare condition: • 1% of the student population is in this class • You have a highly sensitive and specific test: • 99% of students in the class visit compsocialscience.org • 99% of students who aren’t in the class don’t visit this site • Given that a student visits the course site, what is probability the student is in our class? 1 Follows Wiggins, SciAm 2006 Jake Hofman (Columbia University) Classification April 26, 2013 2 / 11
  • 3. Prediction a la Bayes Students 10,000 ppl 1% In class 100 ppl 99% Visit 99 ppl 1% Don’t visit 1 per 99% Not in class 9900 ppl 1% Visit 99 ppl 99% Don’t visit 9801 ppl Jake Hofman (Columbia University) Classification April 26, 2013 3 / 11
  • 4. Prediction a la Bayes Students 10,000 ppl 1% In class 100 ppl 99% Visit 99 ppl 1% Don’t visit 1 per 99% Not in class 9900 ppl 1% Visit 99 ppl 99% Don’t visit 9801 ppl So given that a student visits the site (198 ppl), there is a 50% chance the student is in our class (99 ppl)! Jake Hofman (Columbia University) Classification April 26, 2013 3 / 11
  • 5. Prediction a la Bayes Students 10,000 ppl 1% In class 100 ppl 99% Visit 99 ppl 1% Don’t visit 1 per 99% Not in class 9900 ppl 1% Visit 99 ppl 99% Don’t visit 9801 ppl The small error rate on the large population outside of our class produces many false positives. Jake Hofman (Columbia University) Classification April 26, 2013 3 / 11
  • 6. Inverting conditional probabilities Bayes’ Theorem Equate the far right- and left-hand sides of product rule p (y|x) p (x) = p (x, y) = p (x|y) p (y) and divide to get the probability of y given x from the probability of x given y: p (y|x) = p (x|y) p (y) p (x) where p (x) = y∈ΩY p (x|y) p (y) is the normalization constant. Jake Hofman (Columbia University) Classification April 26, 2013 4 / 11
  • 7. Predictions a la Bayes Given that a patient tests positive, what is probability the patient is sick? p (class|visit) = 99/100 p (visit|class) 1/100 p (class) p (visit) 99/1002+99/1002=198/1002 = 99 198 = 1 2 where p (visit) = p (visit|class) p (class) + p visit|class p class . Jake Hofman (Columbia University) Classification April 26, 2013 5 / 11
  • 8. (Super) Naive Bayes We can use Bayes’ rule to build a one-site student classifier: p (class|site) = p (site|class) p (class) p (site) where we estimate these probabilities with ratios of counts: ˆp(site|class) = # students in class who visit site # students in class ˆp(site|class) = # students not in class who visit site # students not in class ˆp(class) = # students in class # students ˆp(class) = # students not in class # students Jake Hofman (Columbia University) Classification April 26, 2013 6 / 11
  • 9. Naive Bayes Represent each student by a binary vector x where xj = 1 if the student has visited the j-th site (xj = 0 otherwise). Modeling each site as an independent Bernoulli random variable, the probability of visiting a set of sites x given class membership c = 0, 1: p (x|c) = j θ xj jc (1 − θjc)1−xj where θjc denotes the probability that the j-th site is visited by a student with class membership c. Jake Hofman (Columbia University) Classification April 26, 2013 7 / 11
  • 10. Naive Bayes Using this likelihood in Bayes’ rule and taking a logarithm, we have: log p (c|x) = log p (x|c) p (c) p (x) = j xj log θjc 1 − θjc + j log(1 − θjc) + log θc p (x) Jake Hofman (Columbia University) Classification April 26, 2013 8 / 11
  • 11. Naive Bayes We can eliminate p (x) by calculating the log-odds: log p (1|x) p (0|x) = j xj log θj1(1 − θj0) θj0(1 − θj1) wj + j log 1 − θj1 1 − θj0 + log θ1 θ0 w0 which gives a linear classifier of the form w · x + w0 Jake Hofman (Columbia University) Classification April 26, 2013 9 / 11
  • 12. Naive Bayes We train by counting students and sites to estimate θjc and θc: ˆθjc = njc nc ˆθc = nc n and use these to calculate the weights ˆwj and bias ˆw0: ˆwj = log ˆθj1(1 − ˆθj0) ˆθj0(1 − ˆθj1) ˆw0 = j log 1 − ˆθj1 1 − ˆθj0 + log ˆθ1 ˆθ0 . We we predict by simply adding the weights of the sites that a student has visited to the bias term. Jake Hofman (Columbia University) Classification April 26, 2013 10 / 11
  • 13. Naive Bayes In practice, this works better than one might expect given its simplicity2 2 http://www.jstor.org/pss/1403452 Jake Hofman (Columbia University) Classification April 26, 2013 11 / 11
  • 14. Naive Bayes Training is computationally cheap and scalable, and the model is easy to update given new observations2 2 http://www.springerlink.com/content/wu3g458834583125/ Jake Hofman (Columbia University) Classification April 26, 2013 11 / 11
  • 15. Naive Bayes Performance varies with document representations and corresponding likelihood models2 2 http://ceas.cc/2006/15.pdf Jake Hofman (Columbia University) Classification April 26, 2013 11 / 11
  • 16. Naive Bayes It’s often important to smooth parameter estimates (e.g., by adding pseudocounts) to avoid overfitting Jake Hofman (Columbia University) Classification April 26, 2013 11 / 11