SlideShare uma empresa Scribd logo
1 de 22
Text Classification and Naïve Bayes
An example of text classification
Definition of a machine learning problem
A refresher on probability
The Naive Bayes classifier
1
Google News
2
Different ways for classification
Human labor (people assign categories to every incoming
article)
Hand-crafted rules for automatic classification
 If article contains: stock, Dow, share, Nasdaq, etc.  Business
 If article contains: set, breakpoint, player, Federer, etc.  Tennis
Machine learning algorithms
3
What is Machine Learning?
4
Definition: A computer program is said to learn from
experience E when its performance P at a task T
improves with experience E.
Tom Mitchell, Machine Learning, 1997
Examples:
- Learning to recognize spoken words
- Learning to drive a vehicle
- Learning to play backgammon
Components of a ML System (1)
Experience (a set of examples that combines together
input and output for a task)
 Text categorization: document + category
 Speech recognition: spoken text + written text
Experience is referred to as Training Data. When training
data is available, we talk of Supervised Learning.
Performance metrics
 Error or accuracy in the Test Data
 Test Data are not present in the Training Data
 When there are few training data, methods like ‘leave-one-out’ or
‘ten-fold cross validation’ are used to measure error.
5
Components of a ML System (2)
Type of knowledge to be learned (known as the target
function, that will map between input and output)
Representation of the target function
 Decision trees
 Neural networks
 Linear functions
The learning algorithm
 C4.5 (learns decision trees)
 Gradient descent (learns a neural network)
 Linear programming (learns linear functions)
6
Task
Defining Text Classification
7
XdX∈d
},,,{ 21 Jccc =C
D cd,
C×∈Xcd,
C→X:γ
γ=Γ D)(
the document in the multi-dimensional space
a set of classes (categories, or labels)
the training set of labeled documents
Target function:
Learning algorithm:
=cd, “Beijing joins the World Trade Organization”, China
cd =)(γ =)(dγ China
Naïve Bayes Learning
8
∏≤≤∈∈
==
dnk
k
CcCc
MAP ctPcPdcPc
1
)|(ˆ)(ˆmaxarg)|(ˆmaxarg
cd =)(γ
Learning Algorithm: Naïve Bayes
Target Function:
)|()(maxarg)|(maxarg cdPcPdcPc
CcCc
MAP
∈∈
==
)(cP
)|( cdP
The generative process:
)|( dcP
a priori probability, of choosing a category
the cond. prob. of generating d, given the fixed c
a posteriori probability that c generated d
A Refresher on Probability
9
Visualizing probability
A is a random variable that denotes an uncertain event
 Example: A = “I’ll get an A+ in the final exam”
P(A) is “the fraction of possible worlds where A is true”
10
Worlds in
which A
is true
Slide: Andrew W. Moore
Worlds in which A is false
Event space of all possible
worlds. Its area is 1.
P(A) = Area of the blue
circle.
Axioms and Theorems of Probability
Axioms:
 0 <= P(A) <= 1
 P(True) = 1
 P(False) = 0
 P(A or B) = P(A) + P(B) – P(A and B)
Theorems:
 P(not A) = P(~A) = 1 – P(A)
 P(A) = P(A ^ B) + P(A ^ ~B)
11
Conditional Probability
P(A|B) = the probability of A being true, given that we
know that B is true
12
F
H
H = “I have a headache”
F = “Coming down with flu”
P(H) = 1/10
P(F) = 1/40
P(H/F) = 1/2
Slide: Andrew W. Moore
Headaches are rare and flu
even rarer, but if you got that flu,
there is a 50-50 chance you’ll
have a headache.
Deriving the Bayes Rule
13
)(
)(
)|(
BP
BAP
BAP
∧
=Conditional Probability:
)()|()( BPBAPBAP =∧Chain rule:
)()|()()( APABPABPBAP =∧=∧
Bayes Rule:
)(
)()|(
)|(
AP
BPBAP
ABP =
Back to the Naïve Bayes Classifier
14
Deriving the Naïve Bayes
15
)(
)()|(
)|(
AP
BPBAP
ABP = (Bayes Rule)
21,cc 'dGiven two classes and the document
)'(
)|'()(
)'|( 11
1
dP
cdPcP
dcP =
)'(
)|'()(
)'|( 22
2
dP
cdPcP
dcP =
We are looking for a that maximizes the a-posterioriic )'|( dcP i
)'(dP (the denominator) is the same in both cases
)|()(maxarg cdPcPc
Cc
MAP
∈
=Thus:
Estimating parameters for the
target function
We are looking for the estimates and
16
)(ˆ cP )|(ˆ cdP
P(c) is the fraction of possible worlds where c is true.
N
N
cP c
=)(ˆ N – number of all documents
Nc – number of documents in class c
d is a vector in the space X
)|,,,()|( 2 ctttPcdP dni =
where each dimension is a term:
)()|()( BPBAPBAP =∧By using the chain rule: we have:
(P
),,...,(),,...,|()|,,,( 2212 cttPctttPctttP ddd nnni =
...=
Naïve assumptions of independence
1. All attribute values are independent of each other given
the class. (conditional independence assumption)
2. The conditional probabilities for a term are the same
independent of position in the document.
We assume the document is a “bag-of-words”.
17
∏≤≤
==
d
d
nk
kni ctPctttPcdP
1
2 )|()|,,,()|( 
∏≤≤∈∈
==
dnk
k
CcCc
MAP ctPcPdcPc
1
)|(ˆ)(ˆmaxarg)|(ˆmaxarg
Finally, we get the target function of Slide 8:
Again about estimation
18
For each term, t, we need to estimate P(t|c)
∑ ∈
=
Vt ct
ct
T
T
ctP
' '
)|(ˆ
Because an estimate will be 0 if a term does not appear with a class
in the training data, we need smoothing:
||)(
1
)1(
1
)|(ˆ
' '' ' VT
T
T
T
ctP
Vt ct
ct
Vt ct
ct
∑∑ ∈∈
+
+
=
+
+
=Laplace
Smoothing
|V| is the number of terms in the vocabulary
Tct is the count of term t in all documents of class c
An Example of classification with
Naïve Bayes
19
Example 13.1 (Part 1)
20
Training
set
docID c = China?
1 Chinese Beijing Chinese Yes
2 Chinese Chinese Shangai Yes
3 Chinese Macao Yes
4 Tokyo Japan Chinese No
Test set 5 Chinese Chinese Chinese Tokyo Japan ?
Two classes: “China”, “not China”
N = 4 4/3)(ˆ =cP 4/1)(ˆ =cP
V = {Beijing, Chinese, Japan, Macao, Tokyo}
Example 13.1 (Part 1)
21
Training
set
docID c = China?
1 Chinese Beijing Chinese Yes
2 Chinese Chinese Shangai Yes
3 Chinese Macao Yes
4 Tokyo Japan Chinese No
Test set 5 Chinese Chinese Chinese Tokyo Japan ?
7/3)68/()15()|Chinese(ˆ =++=cP
14/1)68/()10()|Japan(ˆ)|Tokyo(ˆ =++== cPcP
9/2)63/()11()|Chinese(ˆ =++=cP
9/2)63/()11()|Japan(ˆ)|Tokyo(ˆ =++== cPcP
Estimation Classification
∏≤≤
∝
dnk
k ctPcPdcP
1
)|()()|(
0001.09/29/2)9/2(4/1)|(
0003.014/114/1)7/3(4/3)|(
3
5
3
5
≈⋅⋅⋅∝
≈⋅⋅⋅∝
dcP
dcP
Summary: Miscellanious
Naïve Bayes is linear in the time is takes to scan the data
When we have many terms, the product of probabilities
with cause a floating point underflow, therefore:
For a large training set, the vocabulary is large. It is better
to select only a subset of terms. For that is used “feature
selection” (Section 13.5).
22
∑≤≤∈
+=
dnk
k
Cc
MAP ctPcPc
1
)|(log)(ˆ[logmaxarg

Mais conteúdo relacionado

Mais procurados

NAIVE BAYES CLASSIFIER
NAIVE BAYES CLASSIFIERNAIVE BAYES CLASSIFIER
NAIVE BAYES CLASSIFIERKnoldus Inc.
 
Natural Language Processing (NLP) & Text Mining Tutorial Using NLTK | NLP Tra...
Natural Language Processing (NLP) & Text Mining Tutorial Using NLTK | NLP Tra...Natural Language Processing (NLP) & Text Mining Tutorial Using NLTK | NLP Tra...
Natural Language Processing (NLP) & Text Mining Tutorial Using NLTK | NLP Tra...Edureka!
 
Text similarity measures
Text similarity measuresText similarity measures
Text similarity measuresankit_ppt
 
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8Hakky St
 
Nlp toolkits and_preprocessing_techniques
Nlp toolkits and_preprocessing_techniquesNlp toolkits and_preprocessing_techniques
Nlp toolkits and_preprocessing_techniquesankit_ppt
 
Feature Engineering
Feature EngineeringFeature Engineering
Feature EngineeringHJ van Veen
 
SoDA v2 - Named Entity Recognition from streaming text
SoDA v2 - Named Entity Recognition from streaming textSoDA v2 - Named Entity Recognition from streaming text
SoDA v2 - Named Entity Recognition from streaming textSujit Pal
 
Text Classification, Sentiment Analysis, and Opinion Mining
Text Classification, Sentiment Analysis, and Opinion MiningText Classification, Sentiment Analysis, and Opinion Mining
Text Classification, Sentiment Analysis, and Opinion MiningFabrizio Sebastiani
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processingrohitnayak
 
Model Evaluation in the land of Deep Learning
Model Evaluation in the land of Deep LearningModel Evaluation in the land of Deep Learning
Model Evaluation in the land of Deep LearningPramit Choudhary
 
Introduction to natural language processing (NLP)
Introduction to natural language processing (NLP)Introduction to natural language processing (NLP)
Introduction to natural language processing (NLP)Alia Hamwi
 
Word Embeddings, why the hype ?
Word Embeddings, why the hype ? Word Embeddings, why the hype ?
Word Embeddings, why the hype ? Hady Elsahar
 
Bayesian classification
Bayesian classificationBayesian classification
Bayesian classificationManu Chandel
 

Mais procurados (20)

NAIVE BAYES CLASSIFIER
NAIVE BAYES CLASSIFIERNAIVE BAYES CLASSIFIER
NAIVE BAYES CLASSIFIER
 
Natural Language Processing (NLP) & Text Mining Tutorial Using NLTK | NLP Tra...
Natural Language Processing (NLP) & Text Mining Tutorial Using NLTK | NLP Tra...Natural Language Processing (NLP) & Text Mining Tutorial Using NLTK | NLP Tra...
Natural Language Processing (NLP) & Text Mining Tutorial Using NLTK | NLP Tra...
 
Text similarity measures
Text similarity measuresText similarity measures
Text similarity measures
 
Intro to nlp
Intro to nlpIntro to nlp
Intro to nlp
 
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8
 
Text categorization
Text categorizationText categorization
Text categorization
 
Nlp toolkits and_preprocessing_techniques
Nlp toolkits and_preprocessing_techniquesNlp toolkits and_preprocessing_techniques
Nlp toolkits and_preprocessing_techniques
 
Feature Engineering
Feature EngineeringFeature Engineering
Feature Engineering
 
SoDA v2 - Named Entity Recognition from streaming text
SoDA v2 - Named Entity Recognition from streaming textSoDA v2 - Named Entity Recognition from streaming text
SoDA v2 - Named Entity Recognition from streaming text
 
Text Classification, Sentiment Analysis, and Opinion Mining
Text Classification, Sentiment Analysis, and Opinion MiningText Classification, Sentiment Analysis, and Opinion Mining
Text Classification, Sentiment Analysis, and Opinion Mining
 
Text Classification
Text ClassificationText Classification
Text Classification
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processing
 
Naive Bayes
Naive BayesNaive Bayes
Naive Bayes
 
Apriori algorithm
Apriori algorithmApriori algorithm
Apriori algorithm
 
Model Evaluation in the land of Deep Learning
Model Evaluation in the land of Deep LearningModel Evaluation in the land of Deep Learning
Model Evaluation in the land of Deep Learning
 
Word embeddings
Word embeddingsWord embeddings
Word embeddings
 
Introduction to natural language processing (NLP)
Introduction to natural language processing (NLP)Introduction to natural language processing (NLP)
Introduction to natural language processing (NLP)
 
Word embedding
Word embedding Word embedding
Word embedding
 
Word Embeddings, why the hype ?
Word Embeddings, why the hype ? Word Embeddings, why the hype ?
Word Embeddings, why the hype ?
 
Bayesian classification
Bayesian classificationBayesian classification
Bayesian classification
 

Destaque

Tweets Classification using Naive Bayes and SVM
Tweets Classification using Naive Bayes and SVMTweets Classification using Naive Bayes and SVM
Tweets Classification using Naive Bayes and SVMTrilok Sharma
 
Dwdm naive bayes_ankit_gadgil_027
Dwdm naive bayes_ankit_gadgil_027Dwdm naive bayes_ankit_gadgil_027
Dwdm naive bayes_ankit_gadgil_027ankitgadgil
 
Naive Bayesian Text Classifier Event Models
Naive Bayesian Text Classifier Event ModelsNaive Bayesian Text Classifier Event Models
Naive Bayesian Text Classifier Event ModelsDKALab
 
How Sentiment Analysis works
How Sentiment Analysis worksHow Sentiment Analysis works
How Sentiment Analysis worksCJ Jenkins
 
Modeling Social Data, Lecture 6: Classification with Naive Bayes
Modeling Social Data, Lecture 6: Classification with Naive BayesModeling Social Data, Lecture 6: Classification with Naive Bayes
Modeling Social Data, Lecture 6: Classification with Naive Bayesjakehofman
 
Sentiment analysis of tweets
Sentiment analysis of tweetsSentiment analysis of tweets
Sentiment analysis of tweetsVasu Jain
 
Feature specific analysis of reviews
Feature specific analysis of reviewsFeature specific analysis of reviews
Feature specific analysis of reviewsSubhabrata Mukherjee
 
Text classification & sentiment analysis
Text classification & sentiment analysisText classification & sentiment analysis
Text classification & sentiment analysisM. Atif Qureshi
 
Sentiment Analysis in R
Sentiment Analysis in RSentiment Analysis in R
Sentiment Analysis in REdureka!
 
Sentiment Analysis via R Programming
Sentiment Analysis via R ProgrammingSentiment Analysis via R Programming
Sentiment Analysis via R ProgrammingSkillspeed
 
Supervised and unsupervised learning
Supervised and unsupervised learningSupervised and unsupervised learning
Supervised and unsupervised learningParas Kohli
 
Supervised Learning
Supervised LearningSupervised Learning
Supervised Learningbutest
 
Presentation on supervised learning
Presentation on supervised learningPresentation on supervised learning
Presentation on supervised learningTonmoy Bhagawati
 
Cs221 lecture5-fall11
Cs221 lecture5-fall11Cs221 lecture5-fall11
Cs221 lecture5-fall11darwinrlo
 
Introduction to Sentiment Analysis
Introduction to Sentiment AnalysisIntroduction to Sentiment Analysis
Introduction to Sentiment AnalysisJaganadh Gopinadhan
 
Classification with Naive Bayes
Classification with Naive BayesClassification with Naive Bayes
Classification with Naive BayesJosh Patterson
 
2013-1 Machine Learning Lecture 03 - Naïve Bayes Classifiers
2013-1 Machine Learning Lecture 03 - Naïve Bayes Classifiers2013-1 Machine Learning Lecture 03 - Naïve Bayes Classifiers
2013-1 Machine Learning Lecture 03 - Naïve Bayes ClassifiersDongseo University
 
Sentiment Analysis in Twitter
Sentiment Analysis in TwitterSentiment Analysis in Twitter
Sentiment Analysis in TwitterAyushi Dalmia
 
Tutorial of Sentiment Analysis
Tutorial of Sentiment AnalysisTutorial of Sentiment Analysis
Tutorial of Sentiment AnalysisFabio Benedetti
 

Destaque (20)

Tweets Classification using Naive Bayes and SVM
Tweets Classification using Naive Bayes and SVMTweets Classification using Naive Bayes and SVM
Tweets Classification using Naive Bayes and SVM
 
Dwdm naive bayes_ankit_gadgil_027
Dwdm naive bayes_ankit_gadgil_027Dwdm naive bayes_ankit_gadgil_027
Dwdm naive bayes_ankit_gadgil_027
 
Naive Bayesian Text Classifier Event Models
Naive Bayesian Text Classifier Event ModelsNaive Bayesian Text Classifier Event Models
Naive Bayesian Text Classifier Event Models
 
How Sentiment Analysis works
How Sentiment Analysis worksHow Sentiment Analysis works
How Sentiment Analysis works
 
Modeling Social Data, Lecture 6: Classification with Naive Bayes
Modeling Social Data, Lecture 6: Classification with Naive BayesModeling Social Data, Lecture 6: Classification with Naive Bayes
Modeling Social Data, Lecture 6: Classification with Naive Bayes
 
Sentiment analysis of tweets
Sentiment analysis of tweetsSentiment analysis of tweets
Sentiment analysis of tweets
 
Supervised algorithms
Supervised algorithmsSupervised algorithms
Supervised algorithms
 
Feature specific analysis of reviews
Feature specific analysis of reviewsFeature specific analysis of reviews
Feature specific analysis of reviews
 
Text classification & sentiment analysis
Text classification & sentiment analysisText classification & sentiment analysis
Text classification & sentiment analysis
 
Sentiment Analysis in R
Sentiment Analysis in RSentiment Analysis in R
Sentiment Analysis in R
 
Sentiment Analysis via R Programming
Sentiment Analysis via R ProgrammingSentiment Analysis via R Programming
Sentiment Analysis via R Programming
 
Supervised and unsupervised learning
Supervised and unsupervised learningSupervised and unsupervised learning
Supervised and unsupervised learning
 
Supervised Learning
Supervised LearningSupervised Learning
Supervised Learning
 
Presentation on supervised learning
Presentation on supervised learningPresentation on supervised learning
Presentation on supervised learning
 
Cs221 lecture5-fall11
Cs221 lecture5-fall11Cs221 lecture5-fall11
Cs221 lecture5-fall11
 
Introduction to Sentiment Analysis
Introduction to Sentiment AnalysisIntroduction to Sentiment Analysis
Introduction to Sentiment Analysis
 
Classification with Naive Bayes
Classification with Naive BayesClassification with Naive Bayes
Classification with Naive Bayes
 
2013-1 Machine Learning Lecture 03 - Naïve Bayes Classifiers
2013-1 Machine Learning Lecture 03 - Naïve Bayes Classifiers2013-1 Machine Learning Lecture 03 - Naïve Bayes Classifiers
2013-1 Machine Learning Lecture 03 - Naïve Bayes Classifiers
 
Sentiment Analysis in Twitter
Sentiment Analysis in TwitterSentiment Analysis in Twitter
Sentiment Analysis in Twitter
 
Tutorial of Sentiment Analysis
Tutorial of Sentiment AnalysisTutorial of Sentiment Analysis
Tutorial of Sentiment Analysis
 

Semelhante a Text classification

Joint optimization framework for learning with noisy labels
Joint optimization framework for learning with noisy labelsJoint optimization framework for learning with noisy labels
Joint optimization framework for learning with noisy labelsCheng-You Lu
 
Hands-on Tutorial of Machine Learning in Python
Hands-on Tutorial of Machine Learning in PythonHands-on Tutorial of Machine Learning in Python
Hands-on Tutorial of Machine Learning in PythonChun-Ming Chang
 
Pattern recognition binoy 05-naive bayes classifier
Pattern recognition binoy 05-naive bayes classifierPattern recognition binoy 05-naive bayes classifier
Pattern recognition binoy 05-naive bayes classifier108kaushik
 
An introduction to Bayesian Statistics using Python
An introduction to Bayesian Statistics using PythonAn introduction to Bayesian Statistics using Python
An introduction to Bayesian Statistics using Pythonfreshdatabos
 
[系列活動] Machine Learning 機器學習課程
[系列活動] Machine Learning 機器學習課程[系列活動] Machine Learning 機器學習課程
[系列活動] Machine Learning 機器學習課程台灣資料科學年會
 
Alpaydin - Chapter 2
Alpaydin - Chapter 2Alpaydin - Chapter 2
Alpaydin - Chapter 2butest
 
Scala as a Declarative Language
Scala as a Declarative LanguageScala as a Declarative Language
Scala as a Declarative Languagevsssuresh
 
Alpaydin - Chapter 2
Alpaydin - Chapter 2Alpaydin - Chapter 2
Alpaydin - Chapter 2butest
 
Python Lab manual program for BE First semester (all department
Python Lab manual program for BE First semester (all departmentPython Lab manual program for BE First semester (all department
Python Lab manual program for BE First semester (all departmentNazeer Wahab
 
The Concurrent Constraint Programming Research Programmes -- Redux (part2)
The Concurrent Constraint Programming Research Programmes -- Redux (part2)The Concurrent Constraint Programming Research Programmes -- Redux (part2)
The Concurrent Constraint Programming Research Programmes -- Redux (part2)Pierre Schaus
 
Data.Mining.C.6(II).classification and prediction
Data.Mining.C.6(II).classification and predictionData.Mining.C.6(II).classification and prediction
Data.Mining.C.6(II).classification and predictionMargaret Wang
 
ch8Bayes.ppt
ch8Bayes.pptch8Bayes.ppt
ch8Bayes.pptImXaib
 
Introduction
IntroductionIntroduction
Introductionbutest
 
Design and Analysis of Algorithm Brute Force 1.ppt
Design and Analysis of Algorithm Brute Force 1.pptDesign and Analysis of Algorithm Brute Force 1.ppt
Design and Analysis of Algorithm Brute Force 1.pptmoiza354
 
Unit-2 Bayes Decision Theory.pptx
Unit-2 Bayes Decision Theory.pptxUnit-2 Bayes Decision Theory.pptx
Unit-2 Bayes Decision Theory.pptxavinashBajpayee1
 

Semelhante a Text classification (20)

Joint optimization framework for learning with noisy labels
Joint optimization framework for learning with noisy labelsJoint optimization framework for learning with noisy labels
Joint optimization framework for learning with noisy labels
 
Hands-on Tutorial of Machine Learning in Python
Hands-on Tutorial of Machine Learning in PythonHands-on Tutorial of Machine Learning in Python
Hands-on Tutorial of Machine Learning in Python
 
Pattern recognition binoy 05-naive bayes classifier
Pattern recognition binoy 05-naive bayes classifierPattern recognition binoy 05-naive bayes classifier
Pattern recognition binoy 05-naive bayes classifier
 
ML unit-1.pptx
ML unit-1.pptxML unit-1.pptx
ML unit-1.pptx
 
An introduction to Bayesian Statistics using Python
An introduction to Bayesian Statistics using PythonAn introduction to Bayesian Statistics using Python
An introduction to Bayesian Statistics using Python
 
[系列活動] Machine Learning 機器學習課程
[系列活動] Machine Learning 機器學習課程[系列活動] Machine Learning 機器學習課程
[系列活動] Machine Learning 機器學習課程
 
Automatic bayesian cubature
Automatic bayesian cubatureAutomatic bayesian cubature
Automatic bayesian cubature
 
Alpaydin - Chapter 2
Alpaydin - Chapter 2Alpaydin - Chapter 2
Alpaydin - Chapter 2
 
Scala as a Declarative Language
Scala as a Declarative LanguageScala as a Declarative Language
Scala as a Declarative Language
 
Alpaydin - Chapter 2
Alpaydin - Chapter 2Alpaydin - Chapter 2
Alpaydin - Chapter 2
 
ch8Bayes.pptx
ch8Bayes.pptxch8Bayes.pptx
ch8Bayes.pptx
 
Python Lab manual program for BE First semester (all department
Python Lab manual program for BE First semester (all departmentPython Lab manual program for BE First semester (all department
Python Lab manual program for BE First semester (all department
 
The Concurrent Constraint Programming Research Programmes -- Redux (part2)
The Concurrent Constraint Programming Research Programmes -- Redux (part2)The Concurrent Constraint Programming Research Programmes -- Redux (part2)
The Concurrent Constraint Programming Research Programmes -- Redux (part2)
 
Data.Mining.C.6(II).classification and prediction
Data.Mining.C.6(II).classification and predictionData.Mining.C.6(II).classification and prediction
Data.Mining.C.6(II).classification and prediction
 
ch8Bayes.ppt
ch8Bayes.pptch8Bayes.ppt
ch8Bayes.ppt
 
ch8Bayes.ppt
ch8Bayes.pptch8Bayes.ppt
ch8Bayes.ppt
 
Introduction
IntroductionIntroduction
Introduction
 
Design and Analysis of Algorithm Brute Force 1.ppt
Design and Analysis of Algorithm Brute Force 1.pptDesign and Analysis of Algorithm Brute Force 1.ppt
Design and Analysis of Algorithm Brute Force 1.ppt
 
Midterm
MidtermMidterm
Midterm
 
Unit-2 Bayes Decision Theory.pptx
Unit-2 Bayes Decision Theory.pptxUnit-2 Bayes Decision Theory.pptx
Unit-2 Bayes Decision Theory.pptx
 

Mais de Harry Potter

How to build a rest api.pptx
How to build a rest api.pptxHow to build a rest api.pptx
How to build a rest api.pptxHarry Potter
 
Business analytics and data mining
Business analytics and data miningBusiness analytics and data mining
Business analytics and data miningHarry Potter
 
Big picture of data mining
Big picture of data miningBig picture of data mining
Big picture of data miningHarry Potter
 
Data mining and knowledge discovery
Data mining and knowledge discoveryData mining and knowledge discovery
Data mining and knowledge discoveryHarry Potter
 
Directory based cache coherence
Directory based cache coherenceDirectory based cache coherence
Directory based cache coherenceHarry Potter
 
How analysis services caching works
How analysis services caching worksHow analysis services caching works
How analysis services caching worksHarry Potter
 
Optimizing shared caches in chip multiprocessors
Optimizing shared caches in chip multiprocessorsOptimizing shared caches in chip multiprocessors
Optimizing shared caches in chip multiprocessorsHarry Potter
 
Hardware managed cache
Hardware managed cacheHardware managed cache
Hardware managed cacheHarry Potter
 
Data structures and algorithms
Data structures and algorithmsData structures and algorithms
Data structures and algorithmsHarry Potter
 
Abstract data types
Abstract data typesAbstract data types
Abstract data typesHarry Potter
 
Concurrency with java
Concurrency with javaConcurrency with java
Concurrency with javaHarry Potter
 
Encapsulation anonymous class
Encapsulation anonymous classEncapsulation anonymous class
Encapsulation anonymous classHarry Potter
 
Object oriented analysis
Object oriented analysisObject oriented analysis
Object oriented analysisHarry Potter
 
Rest api to integrate with your site
Rest api to integrate with your siteRest api to integrate with your site
Rest api to integrate with your siteHarry Potter
 

Mais de Harry Potter (20)

How to build a rest api.pptx
How to build a rest api.pptxHow to build a rest api.pptx
How to build a rest api.pptx
 
Business analytics and data mining
Business analytics and data miningBusiness analytics and data mining
Business analytics and data mining
 
Big picture of data mining
Big picture of data miningBig picture of data mining
Big picture of data mining
 
Data mining and knowledge discovery
Data mining and knowledge discoveryData mining and knowledge discovery
Data mining and knowledge discovery
 
Cache recap
Cache recapCache recap
Cache recap
 
Directory based cache coherence
Directory based cache coherenceDirectory based cache coherence
Directory based cache coherence
 
How analysis services caching works
How analysis services caching worksHow analysis services caching works
How analysis services caching works
 
Optimizing shared caches in chip multiprocessors
Optimizing shared caches in chip multiprocessorsOptimizing shared caches in chip multiprocessors
Optimizing shared caches in chip multiprocessors
 
Hardware managed cache
Hardware managed cacheHardware managed cache
Hardware managed cache
 
Smm & caching
Smm & cachingSmm & caching
Smm & caching
 
Data structures and algorithms
Data structures and algorithmsData structures and algorithms
Data structures and algorithms
 
Abstract data types
Abstract data typesAbstract data types
Abstract data types
 
Abstraction file
Abstraction fileAbstraction file
Abstraction file
 
Object model
Object modelObject model
Object model
 
Concurrency with java
Concurrency with javaConcurrency with java
Concurrency with java
 
Encapsulation anonymous class
Encapsulation anonymous classEncapsulation anonymous class
Encapsulation anonymous class
 
Abstract class
Abstract classAbstract class
Abstract class
 
Object oriented analysis
Object oriented analysisObject oriented analysis
Object oriented analysis
 
Api crash
Api crashApi crash
Api crash
 
Rest api to integrate with your site
Rest api to integrate with your siteRest api to integrate with your site
Rest api to integrate with your site
 

Último

Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 

Último (20)

Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 

Text classification

  • 1. Text Classification and Naïve Bayes An example of text classification Definition of a machine learning problem A refresher on probability The Naive Bayes classifier 1
  • 3. Different ways for classification Human labor (people assign categories to every incoming article) Hand-crafted rules for automatic classification  If article contains: stock, Dow, share, Nasdaq, etc.  Business  If article contains: set, breakpoint, player, Federer, etc.  Tennis Machine learning algorithms 3
  • 4. What is Machine Learning? 4 Definition: A computer program is said to learn from experience E when its performance P at a task T improves with experience E. Tom Mitchell, Machine Learning, 1997 Examples: - Learning to recognize spoken words - Learning to drive a vehicle - Learning to play backgammon
  • 5. Components of a ML System (1) Experience (a set of examples that combines together input and output for a task)  Text categorization: document + category  Speech recognition: spoken text + written text Experience is referred to as Training Data. When training data is available, we talk of Supervised Learning. Performance metrics  Error or accuracy in the Test Data  Test Data are not present in the Training Data  When there are few training data, methods like ‘leave-one-out’ or ‘ten-fold cross validation’ are used to measure error. 5
  • 6. Components of a ML System (2) Type of knowledge to be learned (known as the target function, that will map between input and output) Representation of the target function  Decision trees  Neural networks  Linear functions The learning algorithm  C4.5 (learns decision trees)  Gradient descent (learns a neural network)  Linear programming (learns linear functions) 6 Task
  • 7. Defining Text Classification 7 XdX∈d },,,{ 21 Jccc =C D cd, C×∈Xcd, C→X:γ γ=Γ D)( the document in the multi-dimensional space a set of classes (categories, or labels) the training set of labeled documents Target function: Learning algorithm: =cd, “Beijing joins the World Trade Organization”, China cd =)(γ =)(dγ China
  • 8. Naïve Bayes Learning 8 ∏≤≤∈∈ == dnk k CcCc MAP ctPcPdcPc 1 )|(ˆ)(ˆmaxarg)|(ˆmaxarg cd =)(γ Learning Algorithm: Naïve Bayes Target Function: )|()(maxarg)|(maxarg cdPcPdcPc CcCc MAP ∈∈ == )(cP )|( cdP The generative process: )|( dcP a priori probability, of choosing a category the cond. prob. of generating d, given the fixed c a posteriori probability that c generated d
  • 9. A Refresher on Probability 9
  • 10. Visualizing probability A is a random variable that denotes an uncertain event  Example: A = “I’ll get an A+ in the final exam” P(A) is “the fraction of possible worlds where A is true” 10 Worlds in which A is true Slide: Andrew W. Moore Worlds in which A is false Event space of all possible worlds. Its area is 1. P(A) = Area of the blue circle.
  • 11. Axioms and Theorems of Probability Axioms:  0 <= P(A) <= 1  P(True) = 1  P(False) = 0  P(A or B) = P(A) + P(B) – P(A and B) Theorems:  P(not A) = P(~A) = 1 – P(A)  P(A) = P(A ^ B) + P(A ^ ~B) 11
  • 12. Conditional Probability P(A|B) = the probability of A being true, given that we know that B is true 12 F H H = “I have a headache” F = “Coming down with flu” P(H) = 1/10 P(F) = 1/40 P(H/F) = 1/2 Slide: Andrew W. Moore Headaches are rare and flu even rarer, but if you got that flu, there is a 50-50 chance you’ll have a headache.
  • 13. Deriving the Bayes Rule 13 )( )( )|( BP BAP BAP ∧ =Conditional Probability: )()|()( BPBAPBAP =∧Chain rule: )()|()()( APABPABPBAP =∧=∧ Bayes Rule: )( )()|( )|( AP BPBAP ABP =
  • 14. Back to the Naïve Bayes Classifier 14
  • 15. Deriving the Naïve Bayes 15 )( )()|( )|( AP BPBAP ABP = (Bayes Rule) 21,cc 'dGiven two classes and the document )'( )|'()( )'|( 11 1 dP cdPcP dcP = )'( )|'()( )'|( 22 2 dP cdPcP dcP = We are looking for a that maximizes the a-posterioriic )'|( dcP i )'(dP (the denominator) is the same in both cases )|()(maxarg cdPcPc Cc MAP ∈ =Thus:
  • 16. Estimating parameters for the target function We are looking for the estimates and 16 )(ˆ cP )|(ˆ cdP P(c) is the fraction of possible worlds where c is true. N N cP c =)(ˆ N – number of all documents Nc – number of documents in class c d is a vector in the space X )|,,,()|( 2 ctttPcdP dni = where each dimension is a term: )()|()( BPBAPBAP =∧By using the chain rule: we have: (P ),,...,(),,...,|()|,,,( 2212 cttPctttPctttP ddd nnni = ...=
  • 17. Naïve assumptions of independence 1. All attribute values are independent of each other given the class. (conditional independence assumption) 2. The conditional probabilities for a term are the same independent of position in the document. We assume the document is a “bag-of-words”. 17 ∏≤≤ == d d nk kni ctPctttPcdP 1 2 )|()|,,,()|(  ∏≤≤∈∈ == dnk k CcCc MAP ctPcPdcPc 1 )|(ˆ)(ˆmaxarg)|(ˆmaxarg Finally, we get the target function of Slide 8:
  • 18. Again about estimation 18 For each term, t, we need to estimate P(t|c) ∑ ∈ = Vt ct ct T T ctP ' ' )|(ˆ Because an estimate will be 0 if a term does not appear with a class in the training data, we need smoothing: ||)( 1 )1( 1 )|(ˆ ' '' ' VT T T T ctP Vt ct ct Vt ct ct ∑∑ ∈∈ + + = + + =Laplace Smoothing |V| is the number of terms in the vocabulary Tct is the count of term t in all documents of class c
  • 19. An Example of classification with Naïve Bayes 19
  • 20. Example 13.1 (Part 1) 20 Training set docID c = China? 1 Chinese Beijing Chinese Yes 2 Chinese Chinese Shangai Yes 3 Chinese Macao Yes 4 Tokyo Japan Chinese No Test set 5 Chinese Chinese Chinese Tokyo Japan ? Two classes: “China”, “not China” N = 4 4/3)(ˆ =cP 4/1)(ˆ =cP V = {Beijing, Chinese, Japan, Macao, Tokyo}
  • 21. Example 13.1 (Part 1) 21 Training set docID c = China? 1 Chinese Beijing Chinese Yes 2 Chinese Chinese Shangai Yes 3 Chinese Macao Yes 4 Tokyo Japan Chinese No Test set 5 Chinese Chinese Chinese Tokyo Japan ? 7/3)68/()15()|Chinese(ˆ =++=cP 14/1)68/()10()|Japan(ˆ)|Tokyo(ˆ =++== cPcP 9/2)63/()11()|Chinese(ˆ =++=cP 9/2)63/()11()|Japan(ˆ)|Tokyo(ˆ =++== cPcP Estimation Classification ∏≤≤ ∝ dnk k ctPcPdcP 1 )|()()|( 0001.09/29/2)9/2(4/1)|( 0003.014/114/1)7/3(4/3)|( 3 5 3 5 ≈⋅⋅⋅∝ ≈⋅⋅⋅∝ dcP dcP
  • 22. Summary: Miscellanious Naïve Bayes is linear in the time is takes to scan the data When we have many terms, the product of probabilities with cause a floating point underflow, therefore: For a large training set, the vocabulary is large. It is better to select only a subset of terms. For that is used “feature selection” (Section 13.5). 22 ∑≤≤∈ += dnk k Cc MAP ctPcPc 1 )|(log)(ˆ[logmaxarg

Notas do Editor

  1. Q: What is different in this definition from other types of computer programs? A: We do not speak about experience in other occasions, just about the task and performance criteria. Q: If the task T is speech recognition, could you imagine what would be E and P? A: E would be examples of spoken text, i.e., the computer has the written text and while someone speaks the computer matches the written words to the spoken words. P (performance) will be the number of words that the computer recognizes correctly.
  2. We give the target function at the beginning, but we say that we are going to explain later on how this formula is derived (after the refresher in probability). Give the example of selecting topics for the class project, that means, selecting c. Then, given c, the choice of d, is conditional, P(d|c).
  3. It is clear that calculating all the parameters that derive from the application of the chain rule is infeasible. Therefore, we need the naïve assumptions of independence in next page.