SlideShare uma empresa Scribd logo
1 de 22
Baixar para ler offline
Naïve Bayes for the
Superbowl
John Liu
Nashville Machine Learning Meetup
January 27th, 2015
NashML Goals
Create a hub for like-minded people to come together,
share knowledge and collaborate on interesting domains.
Experienced
MLers
ML
Hackers
Topic
Presentations
Collaborative
Practice
Common Platform
Platform
! IPython Notebook (Project Jupyter)
!   Java, Scala, Python (others?)
! Scikit-learn
!   PyLearn2/Theano
! iTorch
!   AWS/Mahout/Spark/Mllib?
Rev. Thomas Bayes
“An Essay towards solving a Problem in the Doctrine of
Chances” published posthumously 1763
I now send you an essay which I have
found among the papers of our
deceased friend Mr. Bayes, and which,
in my opinion, has great merit, and
well deserves to be preserved.
Experimental philosophy, you will
find, is nearly interested in the subject
of it; and on this account there seems to
be particular reason for thinking that
a communication of it to the Royal
Society cannot be improper…
Bayes’ Theorem
P(A | B)P(B) = P(A∩B) = P(B | A)P(A)
P(A | B) =
P(A∩B)
P(B)
P(A | B) =
P(B | A)P(A)
P(B)
Statistical Inference
P(A | B) =
P(B | A)P(A)
P(B)
Likelihood Prior
EvidencePosterior
(What we want to find)
Intuition Behind Bayes
A Priori Initial Belief (model)
Evidence See the Data
Likelihood How likely to see Data given Belief
A Posteriori Updated Belief after seeing Data
posterior =
prior•likelihood
evidence
Example
Blackjack Insurance Bet:
What is the probability that a dealer
with an Ace showing has Blackjack?
Example
Prior P(Dealer has Blackjack) 32/663
Evidence P(Ace showing) 1/13
Likelihood P(Ace showing|has Blackjack) 1/2
Posterior P(Blackjack|Ace showing) 16/51
16/51 = 31% or less than 1/3 of the time.
Are you a Bayesian?
You read Burton Malkiel’s book “A Random Walk down
Wall Street” and believe in the Efficient Market
Hypothesis. Your broker gives you a tip to buy Tesla. You
ignore the broker and Tesla rises 100 days in a row.
As a Bayesian, do you believe:
A)  The stock is long due for a correction
B)  It is possible for Tesla to rise another 100 days in a row
C)  You were fooled by randomness
Naïve Bayes
You observe outcome Y with some n features X1, X2, ..Xn. The
joint density can be expressed using the chain rule:
P(Y, X1, X2, ..Xn) = P(X1, X2, ..Xn|Y) P(Y)
= P(Y) P(X1,Y) P(X2|Y,X1) P(X3|Y,X1,X2)…
This is messy, but simplifies if we naively assume independence,
P(X2|Y,X1) = P(X2|Y)
P(X3|Y,X2,X1) = P(X3|Y)
P(Xn|Y,Xn…X2,X1) = P(Xn|Y)
Naïve Bayes
Assumption
Naïve Bayes Classifier
Let K classes be denoted ck. The (conditional) probability of
class ck given that we observed features x1, x2,..xn is:
A Naïve Bayes classifier simply chooses the class with
highest probability (maximum a posteriori):
P(ck | x1, x2, ..xn ) = P(ck ) P(xi | ck )
i=1
n
∏
cNB = argmax
k∈K
P(ck ) P(xi | ck )
i=1
n
∏
Gaussian Naïve Bayes
When features xi are continuous valued, typically make the
assumption they are normally distributed:
Variance σki can be independent of xi and/or ck.
P(xi | ck ) =
1
2πσki
2
e
−
1
2
xi−µki
σki
"
#
$
%
&
'
2
cNB = argmax
k∈K
P(ck ) P(xi | ck )
i=1
n
∏
Multinomial Naïve Bayes
When features xi are the number of occurrences of n
possible events (words, votes, etc…)
pki = probability of i-th event occuring in class k
xi = frequency of i-th event
The multinomial Naïve Bayes classifier becomes:
cNB = argmax
k∈K
logP(ck )+ xi •log pki
i=1
n
∑
#
$
%
&
'
(
Naïve Bayes Intuition
!   Assumes all features are independent with each other
!   Independence assumption decouples the individual
distributions for each feature
!   Decoupling can overcome curse of dimensionality
!   Performance robust to irrelevant features
!   Very fast, low storage footprint
!   Good performance with multiple equally important
features
Naïve Bayes Applications
!   Document Categorization
!   NLP
!   Email Sorting
!   Collaborative Filtering
!   Sports Prediction
!   Sentiment Analysis
Example: Doc Classification
Want to classify documents into k topics. Document d
consisting of words wi is assigned to the topic cNB:
P(ck) = topic frequency =
P(wi|ck) = word wi frequency in all topic ck docs
cNB = argmax
k∈K
P(ck ) P(wi | ck )
i
∏
Ndocs(topic=ck )
Ndocs
=
Nword=wi (topic=ck )
Nword=wi (topic=ck )
i
∑
Laplace (add-1) Smoothing
What happens with P(wi|ck) = 0 for a particular i, k?
Solution is to add 1 to numerator & denominator:
P(ck | x1, x2, ..xn ) = P(ck ) P(xi | ck )
i=1
n
∏ = 0!
P(wi | ck ) =
Nword=wi (topic=ck ) +1
Nword=wi (topic=ck ) +1( )
i
∑
NBC Application Roadmap
!   Read Dataset
!   Transform Dataset
!   Create Classifier
!   Train Classifier
!   Make Prediction
Sports Prediction
Who is favored to win the Superbowl?
Given the 2014 season game statistics for two teams, how can we
make a prediction on the outcome of the next game using a Naïve
Bayes classifier?
Sentiment Analysis
Which team is most favored in each state?
What if we analyzed tweets by sentiment and location using a
Naïve Bayes Classifier?
Starter Code
Repo with Starter Code at:
https://github.com/guard0g/NaiveBayesForSuperbowl
IPython notebook: NB4Superbowl.ipynb
Datasets: SeattleStats.csv
NewEnglandStats.csv

Mais conteúdo relacionado

Semelhante a Naive Bayes for the Superbowl

Resources
ResourcesResources
Resources
butest
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
kkkc
 

Semelhante a Naive Bayes for the Superbowl (20)

Lecture10 - Naïve Bayes
Lecture10 - Naïve BayesLecture10 - Naïve Bayes
Lecture10 - Naïve Bayes
 
Presentation X-SHS - 27 oct 2015 - Topologie et perception
Presentation X-SHS - 27 oct 2015 - Topologie et perceptionPresentation X-SHS - 27 oct 2015 - Topologie et perception
Presentation X-SHS - 27 oct 2015 - Topologie et perception
 
A Short Course in Data Stream Mining
A Short Course in Data Stream MiningA Short Course in Data Stream Mining
A Short Course in Data Stream Mining
 
Inex07
Inex07Inex07
Inex07
 
SESSION-11 PPT.pptx
SESSION-11 PPT.pptxSESSION-11 PPT.pptx
SESSION-11 PPT.pptx
 
aritficial intellegence
aritficial intellegencearitficial intellegence
aritficial intellegence
 
lect14-semantics.ppt
lect14-semantics.pptlect14-semantics.ppt
lect14-semantics.ppt
 
Resources
ResourcesResources
Resources
 
Logics of Context and Modal Type Theories
Logics of Context and Modal Type TheoriesLogics of Context and Modal Type Theories
Logics of Context and Modal Type Theories
 
Bayes 6
Bayes 6Bayes 6
Bayes 6
 
lecture15-supervised.ppt
lecture15-supervised.pptlecture15-supervised.ppt
lecture15-supervised.ppt
 
Dedalo, looking for Cluster Explanations in a labyrinth of Linked Data
Dedalo, looking for Cluster Explanations in a labyrinth of Linked DataDedalo, looking for Cluster Explanations in a labyrinth of Linked Data
Dedalo, looking for Cluster Explanations in a labyrinth of Linked Data
 
20070702 Text Categorization
20070702 Text Categorization20070702 Text Categorization
20070702 Text Categorization
 
Predicates and Quantifiers
Predicates and Quantifiers Predicates and Quantifiers
Predicates and Quantifiers
 
Predicates and quantifiers
Predicates and quantifiersPredicates and quantifiers
Predicates and quantifiers
 
rediscover mathematics from 0 and 1
rediscover mathematics from 0 and 1rediscover mathematics from 0 and 1
rediscover mathematics from 0 and 1
 
Discrete Structure Lecture #5 & 6.pdf
Discrete Structure Lecture #5 & 6.pdfDiscrete Structure Lecture #5 & 6.pdf
Discrete Structure Lecture #5 & 6.pdf
 
SOFIE - A Unified Approach To Ontology-Based Information Extraction Using Rea...
SOFIE - A Unified Approach To Ontology-Based Information Extraction Using Rea...SOFIE - A Unified Approach To Ontology-Based Information Extraction Using Rea...
SOFIE - A Unified Approach To Ontology-Based Information Extraction Using Rea...
 
AI applications in education, Pascal Zoleko, Flexudy
AI applications in education, Pascal Zoleko, FlexudyAI applications in education, Pascal Zoleko, Flexudy
AI applications in education, Pascal Zoleko, Flexudy
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 

Mais de John Liu

Mais de John Liu (13)

Kubeflow and Data Science in Kubernetes
Kubeflow and Data Science in KubernetesKubeflow and Data Science in Kubernetes
Kubeflow and Data Science in Kubernetes
 
Artificial Intelligence As a Service
Artificial Intelligence As a ServiceArtificial Intelligence As a Service
Artificial Intelligence As a Service
 
Machine Learning with Small Data
Machine Learning with Small DataMachine Learning with Small Data
Machine Learning with Small Data
 
Data Analytics in Computational Law
Data Analytics in Computational LawData Analytics in Computational Law
Data Analytics in Computational Law
 
AI & Machine Learning: Business Transformation
AI & Machine Learning: Business TransformationAI & Machine Learning: Business Transformation
AI & Machine Learning: Business Transformation
 
DeepREM
DeepREMDeepREM
DeepREM
 
Social Network Analysis for Healthcare
Social Network Analysis for HealthcareSocial Network Analysis for Healthcare
Social Network Analysis for Healthcare
 
Healthy Competition: How Adversarial Reasoning is Leading the Next Wave of In...
Healthy Competition: How Adversarial Reasoning is Leading the Next Wave of In...Healthy Competition: How Adversarial Reasoning is Leading the Next Wave of In...
Healthy Competition: How Adversarial Reasoning is Leading the Next Wave of In...
 
I2P and the Dark Web
I2P and the Dark WebI2P and the Dark Web
I2P and the Dark Web
 
Beyond Machine Learning: The New Generation of Learning Algorithms Coming to ...
Beyond Machine Learning: The New Generation of Learning Algorithms Coming to ...Beyond Machine Learning: The New Generation of Learning Algorithms Coming to ...
Beyond Machine Learning: The New Generation of Learning Algorithms Coming to ...
 
Behavioral Analytics for Financial Intelligence
Behavioral Analytics for Financial IntelligenceBehavioral Analytics for Financial Intelligence
Behavioral Analytics for Financial Intelligence
 
Neural Networks in the Wild: Handwriting Recognition
Neural Networks in the Wild: Handwriting RecognitionNeural Networks in the Wild: Handwriting Recognition
Neural Networks in the Wild: Handwriting Recognition
 
Role of Data Science in ERM @ Nashville Analytics Summit Sep 2014
Role of Data Science in ERM @ Nashville Analytics Summit Sep 2014Role of Data Science in ERM @ Nashville Analytics Summit Sep 2014
Role of Data Science in ERM @ Nashville Analytics Summit Sep 2014
 

Último

➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
amitlee9823
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
amitlee9823
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
only4webmaster01
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
amitlee9823
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
MarinCaroMartnezBerg
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
amitlee9823
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
AroojKhan71
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
amitlee9823
 

Último (20)

➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 

Naive Bayes for the Superbowl

  • 1. Naïve Bayes for the Superbowl John Liu Nashville Machine Learning Meetup January 27th, 2015
  • 2. NashML Goals Create a hub for like-minded people to come together, share knowledge and collaborate on interesting domains. Experienced MLers ML Hackers Topic Presentations Collaborative Practice Common Platform
  • 3. Platform ! IPython Notebook (Project Jupyter) !   Java, Scala, Python (others?) ! Scikit-learn !   PyLearn2/Theano ! iTorch !   AWS/Mahout/Spark/Mllib?
  • 4. Rev. Thomas Bayes “An Essay towards solving a Problem in the Doctrine of Chances” published posthumously 1763 I now send you an essay which I have found among the papers of our deceased friend Mr. Bayes, and which, in my opinion, has great merit, and well deserves to be preserved. Experimental philosophy, you will find, is nearly interested in the subject of it; and on this account there seems to be particular reason for thinking that a communication of it to the Royal Society cannot be improper…
  • 5. Bayes’ Theorem P(A | B)P(B) = P(A∩B) = P(B | A)P(A) P(A | B) = P(A∩B) P(B) P(A | B) = P(B | A)P(A) P(B)
  • 6. Statistical Inference P(A | B) = P(B | A)P(A) P(B) Likelihood Prior EvidencePosterior (What we want to find)
  • 7. Intuition Behind Bayes A Priori Initial Belief (model) Evidence See the Data Likelihood How likely to see Data given Belief A Posteriori Updated Belief after seeing Data posterior = prior•likelihood evidence
  • 8. Example Blackjack Insurance Bet: What is the probability that a dealer with an Ace showing has Blackjack?
  • 9. Example Prior P(Dealer has Blackjack) 32/663 Evidence P(Ace showing) 1/13 Likelihood P(Ace showing|has Blackjack) 1/2 Posterior P(Blackjack|Ace showing) 16/51 16/51 = 31% or less than 1/3 of the time.
  • 10. Are you a Bayesian? You read Burton Malkiel’s book “A Random Walk down Wall Street” and believe in the Efficient Market Hypothesis. Your broker gives you a tip to buy Tesla. You ignore the broker and Tesla rises 100 days in a row. As a Bayesian, do you believe: A)  The stock is long due for a correction B)  It is possible for Tesla to rise another 100 days in a row C)  You were fooled by randomness
  • 11. Naïve Bayes You observe outcome Y with some n features X1, X2, ..Xn. The joint density can be expressed using the chain rule: P(Y, X1, X2, ..Xn) = P(X1, X2, ..Xn|Y) P(Y) = P(Y) P(X1,Y) P(X2|Y,X1) P(X3|Y,X1,X2)… This is messy, but simplifies if we naively assume independence, P(X2|Y,X1) = P(X2|Y) P(X3|Y,X2,X1) = P(X3|Y) P(Xn|Y,Xn…X2,X1) = P(Xn|Y) Naïve Bayes Assumption
  • 12. Naïve Bayes Classifier Let K classes be denoted ck. The (conditional) probability of class ck given that we observed features x1, x2,..xn is: A Naïve Bayes classifier simply chooses the class with highest probability (maximum a posteriori): P(ck | x1, x2, ..xn ) = P(ck ) P(xi | ck ) i=1 n ∏ cNB = argmax k∈K P(ck ) P(xi | ck ) i=1 n ∏
  • 13. Gaussian Naïve Bayes When features xi are continuous valued, typically make the assumption they are normally distributed: Variance σki can be independent of xi and/or ck. P(xi | ck ) = 1 2πσki 2 e − 1 2 xi−µki σki " # $ % & ' 2 cNB = argmax k∈K P(ck ) P(xi | ck ) i=1 n ∏
  • 14. Multinomial Naïve Bayes When features xi are the number of occurrences of n possible events (words, votes, etc…) pki = probability of i-th event occuring in class k xi = frequency of i-th event The multinomial Naïve Bayes classifier becomes: cNB = argmax k∈K logP(ck )+ xi •log pki i=1 n ∑ # $ % & ' (
  • 15. Naïve Bayes Intuition !   Assumes all features are independent with each other !   Independence assumption decouples the individual distributions for each feature !   Decoupling can overcome curse of dimensionality !   Performance robust to irrelevant features !   Very fast, low storage footprint !   Good performance with multiple equally important features
  • 16. Naïve Bayes Applications !   Document Categorization !   NLP !   Email Sorting !   Collaborative Filtering !   Sports Prediction !   Sentiment Analysis
  • 17. Example: Doc Classification Want to classify documents into k topics. Document d consisting of words wi is assigned to the topic cNB: P(ck) = topic frequency = P(wi|ck) = word wi frequency in all topic ck docs cNB = argmax k∈K P(ck ) P(wi | ck ) i ∏ Ndocs(topic=ck ) Ndocs = Nword=wi (topic=ck ) Nword=wi (topic=ck ) i ∑
  • 18. Laplace (add-1) Smoothing What happens with P(wi|ck) = 0 for a particular i, k? Solution is to add 1 to numerator & denominator: P(ck | x1, x2, ..xn ) = P(ck ) P(xi | ck ) i=1 n ∏ = 0! P(wi | ck ) = Nword=wi (topic=ck ) +1 Nword=wi (topic=ck ) +1( ) i ∑
  • 19. NBC Application Roadmap !   Read Dataset !   Transform Dataset !   Create Classifier !   Train Classifier !   Make Prediction
  • 20. Sports Prediction Who is favored to win the Superbowl? Given the 2014 season game statistics for two teams, how can we make a prediction on the outcome of the next game using a Naïve Bayes classifier?
  • 21. Sentiment Analysis Which team is most favored in each state? What if we analyzed tweets by sentiment and location using a Naïve Bayes Classifier?
  • 22. Starter Code Repo with Starter Code at: https://github.com/guard0g/NaiveBayesForSuperbowl IPython notebook: NB4Superbowl.ipynb Datasets: SeattleStats.csv NewEnglandStats.csv