
Programming Collective Intelligence - Chapter 6: Document Filtering


Seminar material


  1. Document Filtering. Programming Collective Intelligence, Ch. 6. 허윤
  2. Document Filtering
     • Filtering == classification problem. Data mining problems: Estimation, Classification, Prediction, Clustering, Description, Affinity Grouping.
     • Document? A set of features (a text document, an image, etc.). p( document ) = ?
  3. Spam Filtering
     • Binary classification problem: ‘Spam’ or ‘Ham’.
     • Techniques: Naïve Bayesian Classifier, Support Vector Machine, Decision Tree.
     • Rule vs. model: pros and cons.
  4. Spam Filtering in Practice. Source: Sahil Puri et al., “Comparison and Analysis of Spam Detection Algorithms”, 2013, IJAIEM.
  5. Source: Rene, “New insights into Gmail’s spam filtering”, 2012, emailmarketingtipps.de.
  6. Naïve Bayesian Classifier
     • Bayes' theorem.
     • Naïve? Bayes' theorem with a strong independence assumption.
     • The classifier ignores the evidence term; it only needs to know whether posterior 1 > posterior 2 or posterior 1 < posterior 2.
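  7. Example
     1. Probability that box A is chosen: P( A ) = 7 / 10
     2. Probability of drawing a white ball from box A: P( white | A ) = 2 / 10
     3. Probability of picking A from the bag and then drawing a white ball from box A
     4. Probability of a white ball
  8. Example: a white ball has turned up from somewhere. What is the probability that it came from A, P( A | white )? That it came from B, P( B | white )? P( A | white ) = ?
  9. Bayes Rule: ❶ conditional probability of A given B; ❷ conditional probability of B given A; ❸ Bayes rule.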
  10. Implementation: Preparation. Document representation: extracting words from the document.
  11. Implementation: Preparation. Representation of the classifier: {'python': {'bad': 0, 'good': 6}, 'the': {'bad': 3, 'good': 3}} # getwords
  12. Implementation: Preparation. How to access the dict.
  13. Implementation: Preparation. Training.
  14. Implementation: Preparation. Result.
  15. Recall: Bayes' theorem. p( category | doc ) = p( doc | category ) * p( category ) / p( doc )
  16. Implementation: Classifier. P( feature | category ) as prior.
  17. Implementation: Classifier. Assumed probability to resolve data sparseness.
  18. Implementation: Classifier. Results.
  19. Implementation: Classifier. P( document | category ) as likelihood.
  20. Implementation: Classifier. P( document | category ) * p( category ).
  21. Implementation: Classifier. Classifying.
  22. Implementation: Classifier. Result.
  23. Fisher’s Method
     • Recall: Naïve Bayesian Classifier.
     • Fisher’s Method: first, p( document | category ) = p( feature_1 | category ) * p( feature_2 | category ) * … * p( feature_N | category ); p( category | document ) ??
     • p( category | feature ) = (# of documents in the category having the feature) / (# of documents having the feature)
  24. Q&A. Thank You
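
The two-box example on slides 7 to 9 can be written out as a short derivation. Only P(A) = 7/10 and P(white | A) = 2/10 appear in the transcript; P(B) and P(white | B) are not given there, so they are left symbolic here rather than guessed.

```latex
\begin{align}
P(A, \text{white}) &= P(\text{white} \mid A)\,P(A)
                    = \frac{2}{10}\cdot\frac{7}{10} = \frac{14}{100} \\
P(\text{white})    &= P(\text{white} \mid A)\,P(A) + P(\text{white} \mid B)\,P(B) \\
P(A \mid \text{white}) &= \frac{P(\text{white} \mid A)\,P(A)}{P(\text{white})}
                    = \frac{14/100}{14/100 + P(\text{white} \mid B)\,P(B)}
\end{align}
```

The last line is the Bayes rule of slide 9 applied to the question on slide 8; the same pattern gives P(B | white) by swapping A and B.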
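
The preparation steps on slides 10 to 14 (extract words, keep per-category counts, train) were shown as code images that the transcript does not include. The following is a minimal sketch of what such a step could look like, assuming word features and the nested-dict layout from slide 11. The names `Classifier`, `train`, `fcount` and the length filter in `getwords` are illustrative assumptions, not the presenter's code.

```python
import re
from collections import defaultdict


def getwords(doc):
    """Return the distinct lowercase words of a document (assumed feature set)."""
    words = (w.lower() for w in re.split(r"\W+", doc))
    # Assumption: ignore very short and very long tokens.
    return {w for w in words if 2 < len(w) < 20}


class Classifier:
    """Hypothetical counter holding feature/category statistics."""

    def __init__(self, getfeatures):
        # feature -> category -> number of documents containing the feature
        self.fc = defaultdict(lambda: defaultdict(int))
        # category -> number of documents trained in that category
        self.cc = defaultdict(int)
        self.getfeatures = getfeatures

    def train(self, item, cat):
        """Count every feature of the item once for the given category."""
        for f in self.getfeatures(item):
            self.fc[f][cat] += 1
        self.cc[cat] += 1

    def fcount(self, f, cat):
        """How many documents of `cat` contained feature `f` (0 if unseen)."""
        return self.fc.get(f, {}).get(cat, 0)


cl = Classifier(getwords)
cl.train("the quick brown fox jumps", "good")
cl.train("make quick money in the online casino", "bad")
print(cl.fcount("quick", "good"), cl.fcount("quick", "bad"))
```

Storing counts rather than probabilities keeps training incremental: each new document only bumps a few integers, and probabilities are derived on demand.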
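
Slides 16 to 22 describe the classifier itself: p( feature | category ), an assumed probability to cope with sparse data, p( document | category ) as a product over features, and the final comparison of p( document | category ) * p( category ). A self-contained sketch under those descriptions follows. The class name, method names, the weight of 1.0 and the assumed probability of 0.5 are assumptions made for illustration, and the training sentences are invented sample data.

```python
import re
from collections import defaultdict


def getwords(doc):
    """Distinct lowercase words, ignoring very short or very long tokens."""
    return {w.lower() for w in re.split(r"\W+", doc) if 2 < len(w) < 20}


class NaiveBayes:
    def __init__(self, getfeatures=getwords):
        self.fc = defaultdict(lambda: defaultdict(int))  # feature -> cat -> count
        self.cc = defaultdict(int)                       # cat -> document count
        self.getfeatures = getfeatures

    def train(self, item, cat):
        for f in self.getfeatures(item):
            self.fc[f][cat] += 1
        self.cc[cat] += 1

    def fprob(self, f, cat):
        """p(feature | category): share of the category's documents containing f."""
        n = self.cc.get(cat, 0)
        return self.fc.get(f, {}).get(cat, 0) / n if n else 0.0

    def weightedprob(self, f, cat, weight=1.0, ap=0.5):
        """Blend the assumed probability `ap` with the observed one.

        The more often the feature has been seen overall, the less `ap` matters.
        """
        totals = sum(self.fc.get(f, {}).values())
        return (weight * ap + totals * self.fprob(f, cat)) / (weight + totals)

    def docprob(self, item, cat):
        """p(document | category) under the independence assumption."""
        p = 1.0
        for f in self.getfeatures(item):
            p *= self.weightedprob(f, cat)
        return p

    def prob(self, item, cat):
        """p(document | category) * p(category); the evidence term is dropped."""
        total = sum(self.cc.values())
        catprob = self.cc.get(cat, 0) / total if total else 0.0
        return self.docprob(item, cat) * catprob

    def classify(self, item, default=None):
        """Return the category with the largest unnormalised posterior."""
        if not self.cc:
            return default
        return max(self.cc, key=lambda cat: self.prob(item, cat))


nb = NaiveBayes()
nb.train("Nobody owns the water.", "good")
nb.train("the quick rabbit jumps fences", "good")
nb.train("buy pharmaceuticals now", "bad")
nb.train("make quick money at the online casino", "bad")
nb.train("the quick brown fox jumps", "good")
print(nb.classify("quick rabbit"), nb.classify("quick money"))
```

Because p( doc ) is the same for every category, comparing the unnormalised products is enough to pick a winner, which is the point made on slide 6. This sketch omits any decision threshold between categories.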
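
Slide 23 introduces Fisher's method through p( category | feature ), but the transcript stops before showing how the per-feature values are combined. The sketch below uses the count ratio from the slide for `cprob` and then applies the standard Fisher combination (minus two times the sum of log probabilities, compared against a chi-square distribution with 2N degrees of freedom). The combination step, the smoothing parameters and all function names are assumptions, not taken from the slides.

```python
import math


def cprob(fc, feature, cat):
    """p(category | feature) as a ratio of document counts (slide 23)."""
    counts = fc.get(feature, {})
    total = sum(counts.values())
    return counts.get(cat, 0) / total if total else 0.0


def weighted(fc, feature, cat, weight=1.0, ap=0.5):
    """Smooth cprob toward an assumed 0.5 so the value is never exactly zero."""
    total = sum(fc.get(feature, {}).values())
    return (weight * ap + total * cprob(fc, feature, cat)) / (weight + total)


def invchi2(chi, df):
    """Upper-tail probability of a chi-square value for an even `df`."""
    m = chi / 2.0
    term = total = math.exp(-m)
    for i in range(1, df // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def fisherprob(fc, features, cat):
    """Combine per-feature probabilities with Fisher's method."""
    features = list(features)
    score = -2.0 * sum(math.log(weighted(fc, f, cat)) for f in features)
    return invchi2(score, 2 * len(features))


# Counts in the layout shown on slide 11.
fc = {"python": {"bad": 0, "good": 6}, "the": {"bad": 3, "good": 3}}
print(fisherprob(fc, ["python", "the"], "good"))
print(fisherprob(fc, ["python", "the"], "bad"))
```

A word such as 'the', which appears equally often in both categories, contributes a value near 0.5 and so barely moves the score, while 'python' pulls strongly toward 'good'.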
