“Naïve Bayes Classifier”
Papers We Love Bucharest
Stefan Alexandru Adam
28th of March 2016
TechHub
“Library Problem”
The search technique was the following:
• Every document was read by a person called the “Indexer”
and then labeled with one or more keywords
• All keywords, together with the library addresses of their documents,
were kept together, forming the so-called “Library
vocabulary”. The keywords were stored in digital format
• Given a query, an automatic device searched the
keywords and returned the locations of the matching documents
• The returned documents were all ranked equally, with no distinction
between them
• There was no middle ground for tags: a tag either applied to a document or it did not
• In 1960 it was observed that the number of documents was growing
exponentially
“Library Problem” – Probabilistic indexing
• Maron and Kuhns proposed a probabilistic
approach:
• Each tag should have weights associated with
the corresponding documents
• Based on Bayes' rule, a “relevance number”
was determined
It was the first ranking system, and it also laid
the foundation for the Naïve Bayes Classifier
$P(D_i \mid I_j)$ – probability of document $D_i$ given request $I_j$
What is Naïve Bayes Classifier
Machine Learning
• Supervised learning
    • Regression
    • Statistical classification
        • Probabilistic classification
            • Naïve Bayes Classifier
• Unsupervised learning
    • Clustering
    • Association Rule Discovery
Statistical classification
• The problem of identifying to which of a set of categories (sub-populations)
a new observation belongs, on the basis of a training set of data
containing observations (or instances) whose category membership is
known
• An algorithm that implements classification is called a classifier
Probabilistic basis
Notation:
• Prior probability: $P(A)$ – probability of A
• Conditional probability: $P(A \mid B)$ – probability of A given B
• Joint probability: $P(A, B)$ – probability of A and B
• $P(A, B) = P(A) \cdot P(B \mid A)$
Bayes Rule:
Thomas Bayes – English statistician
$$P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}$$
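As a quick numeric check of Bayes' rule, here is a minimal sketch. The events and all three probabilities are made-up illustrative assumptions (they do not come from the talk), and `bayes_posterior` is a hypothetical helper name.

```python
from fractions import Fraction


def bayes_posterior(p_b_given_a, p_a, p_b):
    """Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


# Hypothetical numbers: A = "message is spam", B = "message contains 'offer'".
p_a = Fraction(1, 5)               # P(A), the prior
p_b_given_a = Fraction(1, 2)       # P(B|A)
p_b_given_not_a = Fraction(1, 10)  # P(B|not A)

# The evidence P(B) follows from the law of total probability.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

posterior = bayes_posterior(p_b_given_a, p_a, p_b)
print(posterior)  # P(A|B) as an exact fraction
```

Using `Fraction` keeps the arithmetic exact, which makes it easy to compare against a hand calculation.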
Probabilistic Classification
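• Given input $X = [x_1, x_2, \ldots, x_n]$ and classes $C_1, C_2, \ldots, C_m$
1. Discriminative classifier
[Figure: the inputs x1, x2, x3, …, xn feed a single discriminative classifier, which outputs P(C1), P(C2), P(C3), …, P(Cm)]
2. Generative classifier
Ex: Naïve Bayes
[Figure: the inputs x1, x2, x3, …, xn feed m per-class models 1, 2, …, m, which output P(C1), P(C2), …, P(Cm)]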
Bayes classifier
Given:
$X = [x_1, x_2, \ldots, x_n]$, classes $C_1, C_2, \ldots, C_m$ and a training set $T$
Define an algorithm which computes $P(C_i \mid X)$ using Bayes Rule:
$$P(C_i \mid X) = \frac{P(X \mid C_i) \cdot P(C_i)}{P(X)}$$
Return $C_{map} = \arg\max_{C_i} P(C_i \mid X)$
$P(X \mid C_i) = P(x_1, x_2, \ldots, x_n \mid C_i)$ is hard to compute in general:
$$P(X \mid C_i) = P(x_1 \mid C_i)\, P(x_2 \mid C_i, x_1)\, P(x_3 \mid C_i, x_1, x_2) \cdots P(x_n \mid C_i, x_1, x_2, \ldots, x_{n-1})$$
Naïve Bayes Classifier
Compute:
$$P(X \mid C_i) = P(x_1 \mid C_i)\, P(x_2 \mid C_i, x_1) \cdots P(x_n \mid C_i, x_1, x_2, \ldots, x_{n-1})$$
which is known to be a difficult problem
Solution:
Naïve (Idiot) assumption – consider the attributes $x_1, x_2, \ldots, x_n$ to be
independent given the class, so that
$$P(X \mid C_i) = P(x_1 \mid C_i)\, P(x_2 \mid C_i) \cdots P(x_n \mid C_i)$$
Naïve Bayes – Discrete value features
• Categorical distribution $CD = (w_1, w_2, \ldots, w_n)$ where n is the number of
categories and $\sum_{i=1}^{n} w_i = 1$
• Problem – Find $\{w_j\}$, i.e. compute $P(x_j \mid C_i)$ when $x_j$ is a discrete-valued
attribute, $x_j \in \{v_1, v_2, v_3, \ldots\}$
$$P(x_j \mid C_i) = \frac{Count(T, C_i, x_j)}{Count(T, C_i)}$$
Example:
$P(Wind = weak \mid No) = \frac{2}{5}$
$P(Wind = strong \mid No) = \frac{3}{5}$
$CD = \left(\frac{2}{5}, \frac{3}{5}\right)$
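The counting estimate can be sketched in a few lines. The Wind and Play Tennis values are taken from the “Play Tennis” table used later in the deck; the function name `categorical` is a hypothetical helper introduced for illustration.

```python
from collections import Counter
from fractions import Fraction

# (Wind, Play Tennis) pairs for Day1..Day14 of the "Play Tennis" table.
wind_play = [
    ("Weak", "No"), ("Strong", "No"), ("Weak", "Yes"), ("Weak", "Yes"),
    ("Weak", "Yes"), ("Strong", "No"), ("Strong", "Yes"), ("Weak", "No"),
    ("Weak", "Yes"), ("Weak", "Yes"), ("Strong", "Yes"), ("Strong", "Yes"),
    ("Weak", "Yes"), ("Strong", "No"),
]


def categorical(pairs, cls):
    """Estimate P(value | cls) as Count(T, cls, value) / Count(T, cls)."""
    counts = Counter(value for value, label in pairs if label == cls)
    total = sum(counts.values())
    return {value: Fraction(n, total) for value, n in counts.items()}


cd_no = categorical(wind_play, "No")
cd_yes = categorical(wind_play, "Yes")
print(cd_no)   # weights of the categorical distribution for class No
print(cd_yes)  # weights of the categorical distribution for class Yes
```

A value that never occurs within a class is simply absent from the returned dictionary, which corresponds to an estimated probability of zero.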
Naïve Bayes – continuous values features
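• Usually use the Normal (Gaussian) distribution $N(\mu, \sigma)$
• $P(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
where $\mu$ is the mean and $\sigma$ is the standard deviation
$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$, $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}$
• Suppose the temperature is continuous
Yes: 25.2, 19.3, 18.5, 21.7, 20.1, 24.3, 22.8, 23.1, 19.8
No: 27.3, 30.1, 17.4, 29.5, 15.1
Compute $\mu$ and $\sigma$ for each class (the values below use the sample standard deviation, which divides by N − 1 instead of N)
$\mu_{Yes} = 21.64$, $\sigma_{Yes} = 2.35$
$\mu_{No} = 23.88$, $\sigma_{No} = 7.09$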
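The per-class parameters can be recomputed with the standard library. This is a sketch under one assumption: it uses `statistics.stdev`, the sample estimator that divides by N − 1, because that is the estimator that reproduces the figure quoted for the No class. `gaussian_pdf` is a hypothetical helper name.

```python
import math
import statistics

temps = {
    "Yes": [25.2, 19.3, 18.5, 21.7, 20.1, 24.3, 22.8, 23.1, 19.8],
    "No": [27.3, 30.1, 17.4, 29.5, 15.1],
}


def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)


# One (mean, standard deviation) pair per class.
params = {cls: (statistics.mean(v), statistics.stdev(v)) for cls, v in temps.items()}

for cls, (mu, sigma) in params.items():
    print(cls, round(mu, 2), round(sigma, 2))

# Class-conditional density of a new temperature reading, e.g. 20.0 degrees.
density = {cls: gaussian_pdf(20.0, mu, sigma) for cls, (mu, sigma) in params.items()}
print(density)
```

Swapping `statistics.stdev` for `statistics.pstdev` gives the divide-by-N estimate instead; with samples this small the two differ noticeably.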
Naïve Bayes Algorithm
• Learning phase
For each class value $C_i$
    Compute $P(C_i)$
    For each attribute $X_j$
        //Compute Distribution $D_{ij}$
        If attribute $X_j$ is discrete
            $D_{ij}$ = Categorical distribution for $C_i$ and $X_j$
        Else
            $D_{ij}$ = Normal distribution for $C_i$ and $X_j$
• Testing Phase – Given unknown $X' = (x'_1, x'_2, \ldots, x'_n)$
estimate class $= \arg\max_{C_i} P(C_i) \cdot \prod_j D_{ij}(x'_j)$
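The two phases can be sketched as follows. This is an illustrative implementation under stated assumptions, not the presenter's code: the names `learn` and `classify`, the `discrete` flag list, and the six-row toy dataset are all hypothetical, and the Gaussian parameters use the divide-by-N estimate.

```python
import math
from collections import Counter


def gaussian_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)


def learn(rows, labels, discrete):
    """Learning phase: estimate P(C_i) and one distribution D_ij per class and attribute."""
    priors, dists = {}, {}
    for cls, n_cls in Counter(labels).items():
        priors[cls] = n_cls / len(labels)
        members = [row for row, label in zip(rows, labels) if label == cls]
        for j, is_discrete in enumerate(discrete):
            column = [row[j] for row in members]
            if is_discrete:
                # Categorical distribution: relative frequency of each value.
                dists[cls, j] = {v: k / n_cls for v, k in Counter(column).items()}
            else:
                # Normal distribution: mean and (divide-by-N) standard deviation.
                mu = sum(column) / n_cls
                var = sum((x - mu) ** 2 for x in column) / n_cls
                dists[cls, j] = (mu, math.sqrt(var))
    return priors, dists


def classify(x, priors, dists, discrete):
    """Testing phase: return argmax_i P(C_i) * prod_j D_ij(x_j) and all scores."""
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for j, is_discrete in enumerate(discrete):
            if is_discrete:
                score *= dists[cls, j].get(x[j], 0.0)
            else:
                mu, sigma = dists[cls, j]
                score *= gaussian_pdf(x[j], mu, sigma)
        scores[cls] = score
    return max(scores, key=scores.get), scores


# Hypothetical toy data: (Wind, Temperature) with a Yes/No label.
rows = [("Weak", 25.0), ("Weak", 22.0), ("Strong", 21.0),
        ("Strong", 30.0), ("Strong", 16.0), ("Weak", 28.0)]
labels = ["Yes", "Yes", "Yes", "No", "No", "No"]
discrete = [True, False]

priors, dists = learn(rows, labels, discrete)
label, scores = classify(("Weak", 23.0), priors, dists, discrete)
print(label, scores)
```

The sketch does no smoothing and would divide by zero if a continuous attribute were constant within a class; a practical version would guard against both.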
Example on “Play Tennis” Data
What if X’=(Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)
Day    Outlook   Temperature  Humidity  Wind    Play Tennis
Day1   Sunny     Hot          High      Weak    No
Day2   Sunny     Hot          High      Strong  No
Day3   Overcast  Hot          High      Weak    Yes
Day4   Rain      Mild         High      Weak    Yes
Day5   Rain      Cool         Normal    Weak    Yes
Day6   Rain      Cool         Normal    Strong  No
Day7   Overcast  Cool         Normal    Strong  Yes
Day8   Sunny     Mild         High      Weak    No
Day9   Sunny     Cool         Normal    Weak    Yes
Day10  Rain      Mild         Normal    Weak    Yes
Day11  Sunny     Mild         Normal    Strong  Yes
Day12  Overcast  Mild         High      Strong  Yes
Day13  Overcast  Hot          Normal    Weak    Yes
Day14  Rain      Mild         High      Strong  No
P(Yes)=9/14
P(No)=5/14
Example on “Play Tennis” Data
What if X’=(Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)
Outlook      Play = Yes  Play = No
Sunny        2/9         3/5
Overcast     4/9         0/5
Rain         3/9         2/5

Temperature  Play = Yes  Play = No
Hot          2/9         2/5
Mild         4/9         2/5
Cool         3/9         1/5

Humidity     Play = Yes  Play = No
High         3/9         4/5
Normal       6/9         1/5

Wind         Play = Yes  Play = No
Strong       3/9         3/5
Weak         6/9         2/5
P(Yes)=9/14
P(No)=5/14
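P(Yes|X’) ∝ 2/9*3/9*3/9*3/9*9/14 = 486/91854 ≈ 0.0053
P(No|X’) ∝ 3/5*1/5*4/5*3/5*5/14 = 180/8750 ≈ 0.0206
Both values omit the common factor 1/P(X’), so they can be compared directly. Because the value for No is larger than the value for Yes, the answer is No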
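The hand calculation can be reproduced by counting directly over the 14 rows. This is a minimal sketch using exact fractions; the function name `score` and the normalization step at the end are additions for illustration.

```python
from fractions import Fraction

# The 14 rows of the "Play Tennis" table: (Outlook, Temperature, Humidity, Wind, Play).
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]


def score(x, cls):
    """Return P(cls) * prod_j P(x_j | cls), with every factor estimated by counting."""
    members = [row for row in DATA if row[-1] == cls]
    result = Fraction(len(members), len(DATA))
    for j, value in enumerate(x):
        matches = sum(1 for row in members if row[j] == value)
        result *= Fraction(matches, len(members))
    return result


x_new = ("Sunny", "Cool", "High", "Strong")
scores = {cls: score(x_new, cls) for cls in ("Yes", "No")}
prediction = max(scores, key=scores.get)

# Dividing by the sum of the scores turns them into proper posteriors.
posterior_no = scores["No"] / sum(scores.values())
print(prediction, float(scores["Yes"]), float(scores["No"]), float(posterior_no))
```

Normalizing is optional for picking the class, but it shows how confident the model is: the two unnormalized scores differ by roughly a factor of four.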
Important Issues
• Violation of the Independence Assumption
• Even though in many real cases $P(x_1, x_2, \ldots, x_n \mid C_i) \neq P(x_1 \mid C_i)\, P(x_2 \mid C_i) \cdots P(x_n \mid C_i)$,
Naïve Bayes often works very well
• Zero conditional probability handling
• If $P(x_j \mid C_i) = 0$ for any attribute, then $P(x_1, x_2, \ldots, x_n \mid C_i) = 0$
• To fix this issue, consider applying Laplace Smoothing
$$P(x_j \mid C_i) = \frac{Count(T, C_i, x_j) + 1}{Count(T, C_i) + k}, \quad \text{where } x_j \in \{v_1, v_2, \ldots, v_k\}$$
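As an illustration of the smoothing formula, the sketch below applies it to the Outlook attribute within the No class of the “Play Tennis” data, where Overcast never occurs. The helper name `smoothed` is hypothetical.

```python
from fractions import Fraction


def smoothed(count_value_in_class, count_class, k):
    """Laplace-smoothed estimate: (Count(T, C_i, x_j) + 1) / (Count(T, C_i) + k)."""
    return Fraction(count_value_in_class + 1, count_class + k)


# Counts of each Outlook value among the 5 "No" days; Outlook has k = 3 values.
outlook_no_counts = {"Sunny": 3, "Overcast": 0, "Rain": 2}
k = len(outlook_no_counts)
n_no = sum(outlook_no_counts.values())

smoothed_no = {value: smoothed(n, n_no, k) for value, n in outlook_no_counts.items()}
print(smoothed_no)
```

The smoothed weights still sum to 1, and the previously impossible value Overcast now gets a small nonzero probability instead of wiping out the whole product.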
Important notes
• Very competitive (proven success in spam filtering)
• Fast and easy to implement
• A good candidate for a base learner in ensemble learning
• Very popular
Naïve Bayes Applications
• Text classification. Widely used in:
• Spam filtering
• Classifying documents by topic (technology, politics, science, etc.)
• Sentiment analysis
• Information retrieval
• Image classification (e.g. face detection)
• Medical field
• Disease detection (Alzheimer's, based on genome-wide data)
• Treatment detection
Example On Sentiment Analysis
• Tag a text as being positive or negative
• Usual algorithms for Sentiment Analysis
• Naïve Bayes: about 70–90 percent accuracy
• Maximum Entropy
• Support Vector Machines (among the most accurate)
Sentiment Analysis Engine
[Figure: the training set (text, class pairs) goes through features extraction: split to words (tokenize), stemming (lemma), negation handling, correction, N-grams, match in vocabulary. The resulting features (words w1, w2, …, plus class) feed the machine learning algorithm (NB Classifier), which produces the model. A new text is converted to features the same way and the model outputs its class.]
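A few of the feature-extraction steps named above (tokenizing, negation handling, N-grams, matching against a vocabulary) can be sketched as follows. Everything here is an illustrative assumption rather than the engine shown in the talk: the tiny negation list, the `NOT_` prefix convention, the sample vocabulary and all function names are hypothetical, and stemming and spelling correction are left out.

```python
import re

NEGATIONS = {"not", "no", "never"}  # hypothetical, deliberately tiny list
PUNCTUATION = set(".,!?;")


def tokenize(text):
    """Split lower-cased text into word tokens and punctuation marks."""
    return re.findall(r"[a-z']+|[.,!?;]", text.lower())


def handle_negation(tokens):
    """Prefix words that follow a negation with NOT_ until the next punctuation mark."""
    out, negated = [], False
    for tok in tokens:
        if tok in PUNCTUATION:
            negated = False
        elif tok in NEGATIONS:
            negated = True
        else:
            out.append("NOT_" + tok if negated else tok)
    return out


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text, vocabulary):
    """Keep only the unigrams and bigrams that appear in the vocabulary."""
    words = handle_negation(tokenize(text))
    candidates = words + ngrams(words, 2)
    return [feature for feature in candidates if feature in vocabulary]


vocabulary = {"good", "NOT_good", "great", "was great"}
features = extract_features("The plot was not good, but the acting was great!", vocabulary)
print(features)
```

The resulting feature list is what a Naïve Bayes classifier would consume, with each feature treated as one discrete attribute.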
Demo

Mais conteúdo relacionado

Mais procurados

2013-1 Machine Learning Lecture 06 - Artur Ferreira - A Survey on Boosting…
2013-1 Machine Learning Lecture 06 - Artur Ferreira - A Survey on Boosting…2013-1 Machine Learning Lecture 06 - Artur Ferreira - A Survey on Boosting…
2013-1 Machine Learning Lecture 06 - Artur Ferreira - A Survey on Boosting…Dongseo University
 
Text classification
Text classificationText classification
Text classificationHarry Potter
 
Introductory maths analysis chapter 12 official
Introductory maths analysis   chapter 12 officialIntroductory maths analysis   chapter 12 official
Introductory maths analysis chapter 12 officialEvert Sandye Taasiringan
 
An overview of Bayesian testing
An overview of Bayesian testingAn overview of Bayesian testing
An overview of Bayesian testingChristian Robert
 
Machine Learning Chapter 11 2
Machine Learning Chapter 11 2Machine Learning Chapter 11 2
Machine Learning Chapter 11 2butest
 
Bayesian statistics intro using r
Bayesian statistics intro using rBayesian statistics intro using r
Bayesian statistics intro using rJosue Guzman
 
Designing Test Collections for Comparing Many Systems
Designing Test Collections for Comparing Many SystemsDesigning Test Collections for Comparing Many Systems
Designing Test Collections for Comparing Many SystemsTetsuya Sakai
 
Machine learning
Machine learningMachine learning
Machine learningShreyas G S
 
07 Machine Learning - Expectation Maximization
07 Machine Learning - Expectation Maximization07 Machine Learning - Expectation Maximization
07 Machine Learning - Expectation MaximizationAndres Mendez-Vazquez
 

Mais procurados (14)

Naive Bayes
Naive BayesNaive Bayes
Naive Bayes
 
2013-1 Machine Learning Lecture 06 - Artur Ferreira - A Survey on Boosting…
2013-1 Machine Learning Lecture 06 - Artur Ferreira - A Survey on Boosting…2013-1 Machine Learning Lecture 06 - Artur Ferreira - A Survey on Boosting…
2013-1 Machine Learning Lecture 06 - Artur Ferreira - A Survey on Boosting…
 
Text classification
Text classificationText classification
Text classification
 
06 Machine Learning - Naive Bayes
06 Machine Learning - Naive Bayes06 Machine Learning - Naive Bayes
06 Machine Learning - Naive Bayes
 
Sudoku
SudokuSudoku
Sudoku
 
Introductory maths analysis chapter 12 official
Introductory maths analysis   chapter 12 officialIntroductory maths analysis   chapter 12 official
Introductory maths analysis chapter 12 official
 
An overview of Bayesian testing
An overview of Bayesian testingAn overview of Bayesian testing
An overview of Bayesian testing
 
Machine Learning Chapter 11 2
Machine Learning Chapter 11 2Machine Learning Chapter 11 2
Machine Learning Chapter 11 2
 
Bayesian statistics intro using r
Bayesian statistics intro using rBayesian statistics intro using r
Bayesian statistics intro using r
 
Designing Test Collections for Comparing Many Systems
Designing Test Collections for Comparing Many SystemsDesigning Test Collections for Comparing Many Systems
Designing Test Collections for Comparing Many Systems
 
adaboost
adaboostadaboost
adaboost
 
Machine learning
Machine learningMachine learning
Machine learning
 
Lecture10 - Naïve Bayes
Lecture10 - Naïve BayesLecture10 - Naïve Bayes
Lecture10 - Naïve Bayes
 
07 Machine Learning - Expectation Maximization
07 Machine Learning - Expectation Maximization07 Machine Learning - Expectation Maximization
07 Machine Learning - Expectation Maximization
 

Destaque

02. naive bayes classifier revision
02. naive bayes classifier   revision02. naive bayes classifier   revision
02. naive bayes classifier revisionJeonghun Yoon
 
Naive Bayes Classifier
Naive Bayes ClassifierNaive Bayes Classifier
Naive Bayes ClassifierYiqun Hu
 
Sentiment analysis using naive bayes classifier
Sentiment analysis using naive bayes classifier Sentiment analysis using naive bayes classifier
Sentiment analysis using naive bayes classifier Dev Sahu
 
2013-1 Machine Learning Lecture 03 - Naïve Bayes Classifiers
2013-1 Machine Learning Lecture 03 - Naïve Bayes Classifiers2013-1 Machine Learning Lecture 03 - Naïve Bayes Classifiers
2013-1 Machine Learning Lecture 03 - Naïve Bayes ClassifiersDongseo University
 
A Semi-naive Bayes Classifier with Grouping of Cases
A Semi-naive Bayes Classifier with Grouping of CasesA Semi-naive Bayes Classifier with Grouping of Cases
A Semi-naive Bayes Classifier with Grouping of CasesNTNU
 
Wikipedia, Dead Authors, Naive Bayes and Python
Wikipedia, Dead Authors, Naive Bayes and Python Wikipedia, Dead Authors, Naive Bayes and Python
Wikipedia, Dead Authors, Naive Bayes and Python Abhaya Agarwal
 
Modified naive bayes model for improved web page classification
Modified naive bayes model for improved web page classificationModified naive bayes model for improved web page classification
Modified naive bayes model for improved web page classificationHammad Haleem
 
Bayesian Machine Learning & Python – Naïve Bayes (PyData SV 2013)
Bayesian Machine Learning & Python – Naïve Bayes (PyData SV 2013)Bayesian Machine Learning & Python – Naïve Bayes (PyData SV 2013)
Bayesian Machine Learning & Python – Naïve Bayes (PyData SV 2013)PyData
 
10 roses
10 roses10 roses
10 roseshoabido
 
Sentiment Analysis in Twitter with Lightweight Discourse Analysis
Sentiment Analysis in Twitter with Lightweight Discourse AnalysisSentiment Analysis in Twitter with Lightweight Discourse Analysis
Sentiment Analysis in Twitter with Lightweight Discourse AnalysisSubhabrata Mukherjee
 
Sentiment tool Project presentaion
Sentiment tool Project presentaionSentiment tool Project presentaion
Sentiment tool Project presentaionRavindra Chaudhary
 
Sentiment Analaysis on Twitter
Sentiment Analaysis on TwitterSentiment Analaysis on Twitter
Sentiment Analaysis on TwitterNitish J Prabhu
 
Sentiment Analyzer
Sentiment AnalyzerSentiment Analyzer
Sentiment AnalyzerAnkit Raj
 
Sentimental analysis
Sentimental analysisSentimental analysis
Sentimental analysisAnkit Khera
 
Modeling Social Data, Lecture 6: Classification with Naive Bayes
Modeling Social Data, Lecture 6: Classification with Naive BayesModeling Social Data, Lecture 6: Classification with Naive Bayes
Modeling Social Data, Lecture 6: Classification with Naive Bayesjakehofman
 
How Sentiment Analysis works
How Sentiment Analysis worksHow Sentiment Analysis works
How Sentiment Analysis worksCJ Jenkins
 
Tutorial of Sentiment Analysis
Tutorial of Sentiment AnalysisTutorial of Sentiment Analysis
Tutorial of Sentiment AnalysisFabio Benedetti
 
Sentiment analysis of tweets
Sentiment analysis of tweetsSentiment analysis of tweets
Sentiment analysis of tweetsVasu Jain
 
Twitter sentiment-analysis Jiit2013-14
Twitter sentiment-analysis Jiit2013-14Twitter sentiment-analysis Jiit2013-14
Twitter sentiment-analysis Jiit2013-14Rachit Goel
 

Destaque (20)

02. naive bayes classifier revision
02. naive bayes classifier   revision02. naive bayes classifier   revision
02. naive bayes classifier revision
 
Naive Bayes Classifier
Naive Bayes ClassifierNaive Bayes Classifier
Naive Bayes Classifier
 
Sentiment analysis using naive bayes classifier
Sentiment analysis using naive bayes classifier Sentiment analysis using naive bayes classifier
Sentiment analysis using naive bayes classifier
 
2013-1 Machine Learning Lecture 03 - Naïve Bayes Classifiers
2013-1 Machine Learning Lecture 03 - Naïve Bayes Classifiers2013-1 Machine Learning Lecture 03 - Naïve Bayes Classifiers
2013-1 Machine Learning Lecture 03 - Naïve Bayes Classifiers
 
A Semi-naive Bayes Classifier with Grouping of Cases
A Semi-naive Bayes Classifier with Grouping of CasesA Semi-naive Bayes Classifier with Grouping of Cases
A Semi-naive Bayes Classifier with Grouping of Cases
 
Wikipedia, Dead Authors, Naive Bayes and Python
Wikipedia, Dead Authors, Naive Bayes and Python Wikipedia, Dead Authors, Naive Bayes and Python
Wikipedia, Dead Authors, Naive Bayes and Python
 
Modified naive bayes model for improved web page classification
Modified naive bayes model for improved web page classificationModified naive bayes model for improved web page classification
Modified naive bayes model for improved web page classification
 
Bayesian Machine Learning & Python – Naïve Bayes (PyData SV 2013)
Bayesian Machine Learning & Python – Naïve Bayes (PyData SV 2013)Bayesian Machine Learning & Python – Naïve Bayes (PyData SV 2013)
Bayesian Machine Learning & Python – Naïve Bayes (PyData SV 2013)
 
10 roses
10 roses10 roses
10 roses
 
Sentiment Analysis in Twitter with Lightweight Discourse Analysis
Sentiment Analysis in Twitter with Lightweight Discourse AnalysisSentiment Analysis in Twitter with Lightweight Discourse Analysis
Sentiment Analysis in Twitter with Lightweight Discourse Analysis
 
Sentiment tool Project presentaion
Sentiment tool Project presentaionSentiment tool Project presentaion
Sentiment tool Project presentaion
 
Sentiment Analaysis on Twitter
Sentiment Analaysis on TwitterSentiment Analaysis on Twitter
Sentiment Analaysis on Twitter
 
Sentiment Analyzer
Sentiment AnalyzerSentiment Analyzer
Sentiment Analyzer
 
Sentimental analysis
Sentimental analysisSentimental analysis
Sentimental analysis
 
Naive bayes
Naive bayesNaive bayes
Naive bayes
 
Modeling Social Data, Lecture 6: Classification with Naive Bayes
Modeling Social Data, Lecture 6: Classification with Naive BayesModeling Social Data, Lecture 6: Classification with Naive Bayes
Modeling Social Data, Lecture 6: Classification with Naive Bayes
 
How Sentiment Analysis works
How Sentiment Analysis worksHow Sentiment Analysis works
How Sentiment Analysis works
 
Tutorial of Sentiment Analysis
Tutorial of Sentiment AnalysisTutorial of Sentiment Analysis
Tutorial of Sentiment Analysis
 
Sentiment analysis of tweets
Sentiment analysis of tweetsSentiment analysis of tweets
Sentiment analysis of tweets
 
Twitter sentiment-analysis Jiit2013-14
Twitter sentiment-analysis Jiit2013-14Twitter sentiment-analysis Jiit2013-14
Twitter sentiment-analysis Jiit2013-14
 

Semelhante a "Naive Bayes Classifier" @ Papers We Love Bucharest

baysian in machine learning in Supervised Learning .pptx
baysian in machine learning in Supervised Learning .pptxbaysian in machine learning in Supervised Learning .pptx
baysian in machine learning in Supervised Learning .pptxObsiElias
 
Pattern recognition binoy 05-naive bayes classifier
Pattern recognition binoy 05-naive bayes classifierPattern recognition binoy 05-naive bayes classifier
Pattern recognition binoy 05-naive bayes classifier108kaushik
 
Elements of Statistical Learning 読み会 第2章
Elements of Statistical Learning 読み会 第2章Elements of Statistical Learning 読み会 第2章
Elements of Statistical Learning 読み会 第2章Tsuyoshi Sakama
 
Optimum Engineering Design - Day 2b. Classical Optimization methods
Optimum Engineering Design - Day 2b. Classical Optimization methodsOptimum Engineering Design - Day 2b. Classical Optimization methods
Optimum Engineering Design - Day 2b. Classical Optimization methodsSantiagoGarridoBulln
 
Clustering:k-means, expect-maximization and gaussian mixture model
Clustering:k-means, expect-maximization and gaussian mixture modelClustering:k-means, expect-maximization and gaussian mixture model
Clustering:k-means, expect-maximization and gaussian mixture modeljins0618
 
Non parametric bayesian learning in discrete data
Non parametric bayesian learning in discrete dataNon parametric bayesian learning in discrete data
Non parametric bayesian learning in discrete dataYueshen Xu
 
Learning a nonlinear embedding by preserving class neibourhood structure 최종
Learning a nonlinear embedding by preserving class neibourhood structure   최종Learning a nonlinear embedding by preserving class neibourhood structure   최종
Learning a nonlinear embedding by preserving class neibourhood structure 최종WooSung Choi
 
Intro to statistical signal processing
Intro to statistical signal processingIntro to statistical signal processing
Intro to statistical signal processingNadav Carmel
 
Ml4 naive bayes
Ml4 naive bayesMl4 naive bayes
Ml4 naive bayesankit_ppt
 
Bounded arithmetic in free logic
Bounded arithmetic in free logicBounded arithmetic in free logic
Bounded arithmetic in free logicYamagata Yoriyuki
 
DL_lecture3_regularization_I.pdf
DL_lecture3_regularization_I.pdfDL_lecture3_regularization_I.pdf
DL_lecture3_regularization_I.pdfsagayalavanya2
 
13Kernel_Machines.pptx
13Kernel_Machines.pptx13Kernel_Machines.pptx
13Kernel_Machines.pptxKarasuLee
 
Lecture_9_LA_Review.pptx
Lecture_9_LA_Review.pptxLecture_9_LA_Review.pptx
Lecture_9_LA_Review.pptxSunny432360
 
مدخل إلى تعلم الآلة
مدخل إلى تعلم الآلةمدخل إلى تعلم الآلة
مدخل إلى تعلم الآلةFares Al-Qunaieer
 
Intro. to computational Physics ch2.pdf
Intro. to computational Physics ch2.pdfIntro. to computational Physics ch2.pdf
Intro. to computational Physics ch2.pdfJifarRaya
 
Fortran chapter 2.pdf
Fortran chapter 2.pdfFortran chapter 2.pdf
Fortran chapter 2.pdfJifarRaya
 

Semelhante a "Naive Bayes Classifier" @ Papers We Love Bucharest (20)

baysian in machine learning in Supervised Learning .pptx
baysian in machine learning in Supervised Learning .pptxbaysian in machine learning in Supervised Learning .pptx
baysian in machine learning in Supervised Learning .pptx
 
Lec05.pptx
Lec05.pptxLec05.pptx
Lec05.pptx
 
Pattern recognition binoy 05-naive bayes classifier
Pattern recognition binoy 05-naive bayes classifierPattern recognition binoy 05-naive bayes classifier
Pattern recognition binoy 05-naive bayes classifier
 
ch8Bayes.ppt
ch8Bayes.pptch8Bayes.ppt
ch8Bayes.ppt
 
Elements of Statistical Learning 読み会 第2章
Elements of Statistical Learning 読み会 第2章Elements of Statistical Learning 読み会 第2章
Elements of Statistical Learning 読み会 第2章
 
Optimum Engineering Design - Day 2b. Classical Optimization methods
Optimum Engineering Design - Day 2b. Classical Optimization methodsOptimum Engineering Design - Day 2b. Classical Optimization methods
Optimum Engineering Design - Day 2b. Classical Optimization methods
 
Clustering:k-means, expect-maximization and gaussian mixture model
Clustering:k-means, expect-maximization and gaussian mixture modelClustering:k-means, expect-maximization and gaussian mixture model
Clustering:k-means, expect-maximization and gaussian mixture model
 
Non parametric bayesian learning in discrete data
Non parametric bayesian learning in discrete dataNon parametric bayesian learning in discrete data
Non parametric bayesian learning in discrete data
 
Learning a nonlinear embedding by preserving class neibourhood structure 최종
Learning a nonlinear embedding by preserving class neibourhood structure   최종Learning a nonlinear embedding by preserving class neibourhood structure   최종
Learning a nonlinear embedding by preserving class neibourhood structure 최종
 
Intro to statistical signal processing
Intro to statistical signal processingIntro to statistical signal processing
Intro to statistical signal processing
 
Ml4 naive bayes
Ml4 naive bayesMl4 naive bayes
Ml4 naive bayes
 
Bounded arithmetic in free logic
Bounded arithmetic in free logicBounded arithmetic in free logic
Bounded arithmetic in free logic
 
DL_lecture3_regularization_I.pdf
DL_lecture3_regularization_I.pdfDL_lecture3_regularization_I.pdf
DL_lecture3_regularization_I.pdf
 
13Kernel_Machines.pptx
13Kernel_Machines.pptx13Kernel_Machines.pptx
13Kernel_Machines.pptx
 
ML.pptx
ML.pptxML.pptx
ML.pptx
 
Lecture_9_LA_Review.pptx
Lecture_9_LA_Review.pptxLecture_9_LA_Review.pptx
Lecture_9_LA_Review.pptx
 
مدخل إلى تعلم الآلة
مدخل إلى تعلم الآلةمدخل إلى تعلم الآلة
مدخل إلى تعلم الآلة
 
Stat 3203 -pps sampling
Stat 3203 -pps samplingStat 3203 -pps sampling
Stat 3203 -pps sampling
 
Intro. to computational Physics ch2.pdf
Intro. to computational Physics ch2.pdfIntro. to computational Physics ch2.pdf
Intro. to computational Physics ch2.pdf
 
Fortran chapter 2.pdf
Fortran chapter 2.pdfFortran chapter 2.pdf
Fortran chapter 2.pdf
 

Último

online pdf editor software solutions.pdf
online pdf editor software solutions.pdfonline pdf editor software solutions.pdf
online pdf editor software solutions.pdfMeon Technology
 
Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...
Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...
Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...Jaydeep Chhasatia
 
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmony
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine HarmonyLeveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmony
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmonyelliciumsolutionspun
 
Kawika Technologies pvt ltd Software Development Company in Trivandrum
Kawika Technologies pvt ltd Software Development Company in TrivandrumKawika Technologies pvt ltd Software Development Company in Trivandrum
Kawika Technologies pvt ltd Software Development Company in TrivandrumKawika Technologies
 
Enterprise Document Management System - Qualityze Inc
Enterprise Document Management System - Qualityze IncEnterprise Document Management System - Qualityze Inc
Enterprise Document Management System - Qualityze Incrobinwilliams8624
 
Your Vision, Our Expertise: TECUNIQUE's Tailored Software Teams
Your Vision, Our Expertise: TECUNIQUE's Tailored Software TeamsYour Vision, Our Expertise: TECUNIQUE's Tailored Software Teams
Your Vision, Our Expertise: TECUNIQUE's Tailored Software TeamsJaydeep Chhasatia
 
ERP For Electrical and Electronics manufecturing.pptx
ERP For Electrical and Electronics manufecturing.pptxERP For Electrical and Electronics manufecturing.pptx
ERP For Electrical and Electronics manufecturing.pptxAutus Cyber Tech
 
20240319 Car Simulator Plan.pptx . Plan for a JavaScript Car Driving Simulator.
20240319 Car Simulator Plan.pptx . Plan for a JavaScript Car Driving Simulator.20240319 Car Simulator Plan.pptx . Plan for a JavaScript Car Driving Simulator.
20240319 Car Simulator Plan.pptx . Plan for a JavaScript Car Driving Simulator.Sharon Liu
 
How Does the Epitome of Spyware Differ from Other Malicious Software?
How Does the Epitome of Spyware Differ from Other Malicious Software?How Does the Epitome of Spyware Differ from Other Malicious Software?
How Does the Epitome of Spyware Differ from Other Malicious Software?AmeliaSmith90
 
eAuditor Audits & Inspections - conduct field inspections
eAuditor Audits & Inspections - conduct field inspectionseAuditor Audits & Inspections - conduct field inspections
eAuditor Audits & Inspections - conduct field inspectionsNirav Modi
 
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLAlluxio, Inc.
 
Why Choose Brain Inventory For Ecommerce Development.pdf
Why Choose Brain Inventory For Ecommerce Development.pdfWhy Choose Brain Inventory For Ecommerce Development.pdf
Why Choose Brain Inventory For Ecommerce Development.pdfBrain Inventory
 
Generative AI for Cybersecurity - EC-Council
Generative AI for Cybersecurity - EC-CouncilGenerative AI for Cybersecurity - EC-Council
Generative AI for Cybersecurity - EC-CouncilVICTOR MAESTRE RAMIREZ
 
Sales Territory Management: A Definitive Guide to Expand Sales Coverage
Sales Territory Management: A Definitive Guide to Expand Sales CoverageSales Territory Management: A Definitive Guide to Expand Sales Coverage
Sales Territory Management: A Definitive Guide to Expand Sales CoverageDista
 
Transforming PMO Success with AI - Discover OnePlan Strategic Portfolio Work ...
Transforming PMO Success with AI - Discover OnePlan Strategic Portfolio Work ...Transforming PMO Success with AI - Discover OnePlan Strategic Portfolio Work ...
Transforming PMO Success with AI - Discover OnePlan Strategic Portfolio Work ...OnePlan Solutions
 
AI Embracing Every Shade of Human Beauty
AI Embracing Every Shade of Human BeautyAI Embracing Every Shade of Human Beauty
AI Embracing Every Shade of Human BeautyRaymond Okyere-Forson
 
Growing Oxen: channel operators and retries
Growing Oxen: channel operators and retriesGrowing Oxen: channel operators and retries
Growing Oxen: channel operators and retriesSoftwareMill
 
Watermarking in Source Code: Applications and Security Challenges
Watermarking in Source Code: Applications and Security ChallengesWatermarking in Source Code: Applications and Security Challenges
Watermarking in Source Code: Applications and Security ChallengesShyamsundar Das
 
Streamlining Your Application Builds with Cloud Native Buildpacks
Streamlining Your Application Builds  with Cloud Native BuildpacksStreamlining Your Application Builds  with Cloud Native Buildpacks
Streamlining Your Application Builds with Cloud Native BuildpacksVish Abrams
 

Último (20)

online pdf editor software solutions.pdf
online pdf editor software solutions.pdfonline pdf editor software solutions.pdf
online pdf editor software solutions.pdf
 
Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...
Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...
Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...
 
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmony
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine HarmonyLeveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmony
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmony
 
Kawika Technologies pvt ltd Software Development Company in Trivandrum
Kawika Technologies pvt ltd Software Development Company in TrivandrumKawika Technologies pvt ltd Software Development Company in Trivandrum
Kawika Technologies pvt ltd Software Development Company in Trivandrum
 
Enterprise Document Management System - Qualityze Inc
Enterprise Document Management System - Qualityze IncEnterprise Document Management System - Qualityze Inc
Enterprise Document Management System - Qualityze Inc
 
Salesforce AI Associate Certification.pptx
Salesforce AI Associate Certification.pptxSalesforce AI Associate Certification.pptx
Salesforce AI Associate Certification.pptx
 
Your Vision, Our Expertise: TECUNIQUE's Tailored Software Teams
Your Vision, Our Expertise: TECUNIQUE's Tailored Software TeamsYour Vision, Our Expertise: TECUNIQUE's Tailored Software Teams
Your Vision, Our Expertise: TECUNIQUE's Tailored Software Teams
 
ERP For Electrical and Electronics manufecturing.pptx
ERP For Electrical and Electronics manufecturing.pptxERP For Electrical and Electronics manufecturing.pptx
ERP For Electrical and Electronics manufecturing.pptx
 
20240319 Car Simulator Plan.pptx . Plan for a JavaScript Car Driving Simulator.
20240319 Car Simulator Plan.pptx . Plan for a JavaScript Car Driving Simulator.20240319 Car Simulator Plan.pptx . Plan for a JavaScript Car Driving Simulator.
20240319 Car Simulator Plan.pptx . Plan for a JavaScript Car Driving Simulator.
 

"Naive Bayes Classifier" @ Papers We Love Bucharest

  • 1. “Naïve Bayes Classifier”, Papers We Love Bucharest, Stefan Alexandru Adam, 28th of March 2016, TechHub
  • 2. “Library Problem” The search technique was the following: • All documents were read by a person called the “Indexer” and then labeled with one or more keywords • All keywords, with their related addresses in the library, were kept together, forming the so-called “Library vocabulary”. The keywords were stored in digital format • Given a query, an automatic device searched through the keywords and returned the locations of the matching documents • Returned documents were ranked equally, with no distinction between them • There was no middle ground between tags: a tag either applied to a document or it did not • In 1960 it was observed that the number of documents was growing exponentially
  • 3. “Library Problem” – Probabilistic indexing • Maron and Kuhns proposed a probabilistic approach: • Each tag should have weights associated with the corresponding documents • Based on Bayes' rule, a “relevance number” was determined: $P(D_i \mid I_j)$, the probability of document $D_i$ given request $I_j$ • It was the first ranking system and also laid the foundation for the Naïve Bayes Classifier
  • 4. What is Naïve Bayes Classifier [Diagram: taxonomy with the nodes Machine Learning, Supervised learning, Unsupervised learning, Regression, Clustering, Association Rule Discovery, Statistical classification, Probabilistic classification, Naïve Bayes Classifier]
  • 5. Statistical classification • The problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known • An algorithm which implements classification is called a classifier
  • 6. Probabilistic basis Notations: • Prior probability: $P(A)$, the probability of A • Conditional probability: $P(A \mid B)$, the probability of A given B • Joint probability: $P(A, B)$, the probability of A and B • $P(A, B) = P(A) \cdot P(B \mid A)$ Bayes' rule (Thomas Bayes, English statistician): $P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}$
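  • 7. Probabilistic Classification • Given input $X = [x_1, x_2, \dots, x_n]$ and classes $C_1, C_2, \dots, C_m$: 1. Discriminative classifier 2. Generative classifier (e.g. Naïve Bayes) [Diagrams: a discriminative classifier takes $x_1, \dots, x_n$ as input to a single model with outputs labeled $P(C_1), \dots, P(C_m)$; a generative classifier takes the same inputs into $m$ separate models, numbered 1 to $m$, with outputs labeled $P(C_1), \dots, P(C_m)$]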
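  • 8. Bayes classifier Given: $X = [x_1, x_2, \dots, x_n]$, classes $C_1, C_2, \dots, C_m$ and a training set $T$. Define an algorithm which computes $P(C_i \mid X)$ using Bayes' rule: $P(C_i \mid X) = \frac{P(X \mid C_i) \cdot P(C_i)}{P(X)}$. Return $C_{MAP} = \arg\max_{C_i} P(C_i \mid X)$. The term $P(X \mid C_i) = P(x_1, x_2, \dots, x_n \mid C_i)$ is hard to compute in general, since by the chain rule $P(X \mid C_i) = P(x_1 \mid C_i) \, P(x_2 \mid C_i, x_1) \, P(x_3 \mid C_i, x_1, x_2) \cdots P(x_n \mid C_i, x_1, x_2, \dots, x_{n-1})$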
  • 9. Naïve Bayes Classifier Compute: $P(X \mid C_i) = P(x_1 \mid C_i) \, P(x_2 \mid C_i, x_1) \cdots P(x_n \mid C_i, x_1, x_2, \dots, x_{n-1})$, which is known to be a difficult problem. Solution: the naïve (“idiot”) assumption. Treat the attributes $x_1, x_2, \dots, x_n$ as conditionally independent given the class, so that $P(X \mid C_i) = P(x_1 \mid C_i) \, P(x_2 \mid C_i) \cdots P(x_n \mid C_i)$
  • 10. Naïve Bayes – Discrete value features • Categorical distribution $CD = (w_1, w_2, \dots, w_n)$, where $n$ is the number of categories and $\sum_i w_i = 1$ • Problem: find $\{w_j\}$, that is, compute $P(x_j \mid C_i)$ when $x_j$ is a discrete-valued attribute, $x_j \in \{v_1, v_2, v_3, \dots\}$: $P(x_j \mid C_i) = \frac{Count(T, C_i, x_j)}{Count(T, C_i)}$ Example: $P(Wind = weak \mid No) = \frac{2}{5}$, $P(Wind = strong \mid No) = \frac{3}{5}$, so $CD = (\frac{2}{5}, \frac{3}{5})$
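  • 11. Naïve Bayes – continuous values features • Usually use the Normal (Gaussian) distribution $N(\mu, \sigma)$ • $P(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$, where $\mu$ is the mean and $\sigma$ is the standard deviation: $\mu = \frac{1}{N} \sum_i x_i$, $\sigma = \sqrt{\frac{1}{N} \sum_i (x_i - \mu)^2}$ • Suppose the temperature is continuous. Yes: 25.2, 19.3, 18.5, 21.7, 20.1, 24.3, 22.8, 23.1, 19.8 No: 27.3, 30.1, 17.4, 29.5, 15.1 Compute $\mu$ and $\sigma$ for each class: $\mu_{Yes} = 21.64$, $\sigma_{Yes} = 2.35$, $\mu_{No} = 23.88$, $\sigma_{No} = 7.09$ (these $\sigma$ values match the sample estimator, which divides by $N - 1$ rather than $N$)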
  • 12. Naïve Bayes Algorithm
      • Learning phase
          For each class value $C_i$
              Compute $P(C_i)$
              For each attribute $X_j$ (compute the distribution $D_{ij}$)
                  If attribute $X_j$ is discrete
                      $D_{ij}$ = Categorical distribution for $C_i$ and $X_j$
                  Else
                      $D_{ij}$ = Normal distribution for $C_i$ and $X_j$
      • Testing phase
          Given unknown $X' = (x'_1, x'_2, \dots, x'_n)$, estimate class $= \arg\max_{C_i} P(C_i) \cdot \prod_j D_{ij}(x'_j)$
  • 13. Example on “Play Tennis” Data
      What if X’ = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)?

      Day    Outlook   Temperature  Humidity  Wind    Play Tennis
      Day1   Sunny     Hot          High      Weak    No
      Day2   Sunny     Hot          High      Strong  No
      Day3   Overcast  Hot          High      Weak    Yes
      Day4   Rain      Mild         High      Weak    Yes
      Day5   Rain      Cool         Normal    Weak    Yes
      Day6   Rain      Cool         Normal    Strong  No
      Day7   Overcast  Cool         Normal    Strong  Yes
      Day8   Sunny     Mild         High      Weak    No
      Day9   Sunny     Cool         Normal    Weak    Yes
      Day10  Rain      Mild         Normal    Weak    Yes
      Day11  Sunny     Mild         Normal    Strong  Yes
      Day12  Overcast  Mild         High      Strong  Yes
      Day13  Overcast  Hot          Normal    Weak    Yes
      Day14  Rain      Mild         High      Strong  No

      P(Yes) = 9/14, P(No) = 5/14
  • 14. Example on “Play Tennis” Data
      What if X’ = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)?

      Attribute    Value     Play = Yes  Play = No
      Outlook      Sunny     2/9         3/5
      Outlook      Overcast  4/9         0/5
      Outlook      Rain      3/9         2/5
      Temperature  Hot       2/9         2/5
      Temperature  Mild      4/9         2/5
      Temperature  Cool      3/9         1/5
      Humidity     High      3/9         4/5
      Humidity     Normal    6/9         1/5
      Wind         Strong    3/9         3/5
      Wind         Weak      6/9         2/5

      P(Yes) = 9/14, P(No) = 5/14
      P(Yes|X’) ∝ 2/9 * 3/9 * 3/9 * 3/9 * 9/14 = 486/91854 ≈ 0.0053
      P(No|X’) ∝ 3/5 * 1/5 * 4/5 * 3/5 * 5/14 = 180/8750 ≈ 0.0206
      These scores omit the common factor 1/P(X’), so they do not sum to 1, but their order is unaffected. Because P(No|X’) > P(Yes|X’), the answer is No
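  • 15. Important Issues • Violation of the independence assumption • Even though in many real cases $P(x_1, x_2, \dots, x_n \mid C_i) \neq P(x_1 \mid C_i) \, P(x_2 \mid C_i) \cdots P(x_n \mid C_i)$, Naïve Bayes often still works very well • Zero conditional probability handling • If $P(x_j \mid C_i) = 0$ for any attribute, then the whole product $P(x_1 \mid C_i) \, P(x_2 \mid C_i) \cdots P(x_n \mid C_i)$ is 0 • To fix this issue, consider applying Laplace smoothing: $P(x_j \mid C_i) = \frac{Count(T, C_i, x_j) + 1}{Count(T, C_i) + k}$, where $x_j \in \{v_1, v_2, \dots, v_k\}$, so $k$ is the number of possible values of the attribute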
  • 16. Important notes • Very competitive (proven success in spam filtering) • Fast and easy to implement • A good candidate for a base learner in ensemble learning • Very popular
  • 17. Naïve Bayes Applications • Text classification. Widely used in: • Spam filtering • Classifying documents by topic (technology, politics, science, etc.) • Sentiment analysis • Information retrieval • Image classification (e.g. face detection) • Medical field • Disease detection (Alzheimer's, based on genome-wide data) • Treatment detection
  • 18. Example On Sentiment Analysis • Tag a text as being positive or negative • Usual algorithms for sentiment analysis • Naïve Bayes (about 70-90 percent accurate) • Maximum Entropy • Support Vector Machines (one of the most effective)
  • 19. Sentiment Analysis Engine [Diagram: a training set of (text, class) pairs passes through Features Extraction (split to words (tokenize), stemming (lemma), negation handling, correction, N-grams, match in vocabulary), producing word features (w1, w2, …) with class labels; these feed the Machine Learning Algorithm, which produces the NB Classifier Model; a new text goes through the same extraction and the model assigns it a class]
  • 20. Demo
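
The counting rule from slide 10, the worked “Play Tennis” example from slides 13 and 14, and the Laplace smoothing from slide 15 can be sketched in a few lines of Python. This is only an illustrative sketch, not the code from the talk's demo: the names `DATA`, `train`, `scores` and the `alpha` parameter are assumptions made for this example. With `alpha=0` it reproduces the plain counts, and with `alpha=1` it applies the add-one smoothing.

```python
from collections import Counter, defaultdict
from fractions import Fraction

# "Play Tennis" rows: (Outlook, Temperature, Humidity, Wind, PlayTennis)
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]


def train(rows):
    """Learning phase: count classes and (class, attribute, value) triples."""
    class_counts = Counter(row[-1] for row in rows)
    value_counts = Counter()      # (class, attribute index, value) -> count
    domains = defaultdict(set)    # attribute index -> set of observed values
    for *features, label in rows:
        for j, value in enumerate(features):
            value_counts[(label, j, value)] += 1
            domains[j].add(value)
    return class_counts, value_counts, domains


def scores(model, x, alpha=0):
    """Testing phase: unnormalized P(C_i) * prod_j P(x_j | C_i) per class.

    alpha=0 uses raw counts; alpha=1 is Laplace (add-one) smoothing.
    """
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    result = {}
    for label, n_c in class_counts.items():
        score = Fraction(n_c, total)              # prior P(C_i)
        for j, value in enumerate(x):
            k = len(domains[j])                   # number of values of attribute j
            score *= Fraction(value_counts[(label, j, value)] + alpha,
                              n_c + alpha * k)
        result[label] = score
    return result


model = train(DATA)
query = ("Sunny", "Cool", "High", "Strong")
s = scores(model, query)
for label, value in s.items():
    print(label, value, float(value))
print("prediction:", max(s, key=s.get))
```

For the query X’ the scores are 486/91854 (about 0.0053) for Yes and 180/8750 (about 0.0206) for No, so the prediction is No. Exact fractions are used so the result can be compared with the hand calculation; a practical implementation would more likely sum log-probabilities to avoid underflow when there are many attributes.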
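
For the continuous-feature case on slide 11, the per-class Gaussian parameters can be estimated directly from the listed temperatures. The sketch below is an assumption-laden illustration: `TEMPS`, `gaussian_pdf` and `params` are names invented here, and it uses `statistics.stdev`, which divides by N - 1 (the sample estimator), rather than `statistics.pstdev`, which divides by N.

```python
import math
from statistics import mean, stdev

# Temperatures per class, copied from slide 11
TEMPS = {
    "Yes": [25.2, 19.3, 18.5, 21.7, 20.1, 24.3, 22.8, 23.1, 19.8],
    "No": [27.3, 30.1, 17.4, 29.5, 15.1],
}


def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)


# Learning phase for one continuous attribute: (mu, sigma) per class
params = {label: (mean(values), stdev(values)) for label, values in TEMPS.items()}

for label, (mu, sigma) in params.items():
    print(f"{label}: mu={mu:.2f} sigma={sigma:.2f}")

# Likelihood of a hypothetical new temperature under each class
x_new = 22.0
for label, (mu, sigma) in params.items():
    print(label, gaussian_pdf(x_new, mu, sigma))
```

With the sample estimator this gives roughly μ = 21.64, σ = 2.35 for Yes and μ = 23.88, σ = 7.09 for No. In a full classifier these densities would take the place of the categorical probabilities in the product for each class.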

Editor's Notes

  1. Supervised learning is the machine learning task of inferring a function from labeled training data
  2. Quinlan (1986), “Induction of Decision Trees”, Machine Learning journal