SlideShare uma empresa Scribd logo
1 de 37
Text Categorization Chapter 16 Foundations of Statistical Natural Language Processing
Outline ,[object Object],[object Object],[object Object],[object Object],[object Object]
Part I ,[object Object]
Classification ,[object Object],[object Object],Parse trees Sentence PP attachment The word’s seneses Context of a word Disambiguation topics Document Text categorization Languages Document Language identification Document authors Document Author identification The word’s (POS) tags Context of a word Tagging Categories Object Problem
Task Description ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Task Formulation ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],A data representation model g(x) = 0 x1 x2 w w = (1,1) b = -1 w x2 + b < 0 w x1 + b > 0 (0,1) (1,0)
Evaluation(1) ,[object Object],[object Object],[object Object],Contingency table d c No was assigned b a Yes was assigned No is correct Yes is correct
Evaluation(2) ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Part II ,[object Object]
E.g.  A trained decision tree for category “earnings” Doc = {cts=1, net =3} Node1  7681 articles P(c|n1) = 0.3000 split: cts  value: 2 Node2  5977 articles P(c|n2) = 0.116 split: net  value: 1 Node5  1704 articles P(c|n5) = 0.943 split: vs  value: 2 Node3 5436 articles P(c|n3) = 0.050 Node4 541 articles P(c|n4) = 0.649 Node6 301 articles P(c|n6) = 0.694 Node7 1403  articles P(c|n7) = 0.996 cts < 2 cts >= 2 net<1 Net>= 1 vs <2 vs >= 2
A Closer Look on the E.g. ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Data Presentation Model (1) ,[object Object],[object Object],[object Object],[object Object],[object Object],Ref to: Chap 5
Data Presentation Model (2) ,[object Object],[object Object],[object Object],[object Object]
Training Procedure: Growing (1) ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Entropy of parent Node Proportion of elements that passed on to the left nodes Ref. Machine Learning
Training Procedure: Growing (2) ,[object Object],[object Object],[object Object],[object Object],cts < 2 cts >= 2 Node1  7681 articles P(c|n1) = 0.3000 split: cts  value: 2 Node2  5977 articles2 p(c|n) = 0.116 Node5  1704 articles P(c|n5) = 0.943
Training Procedure: pruning (1) ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Ref to :chap3 (3.7.1) machine learning
Training Procedure: pruning (2) ,[object Object],[object Object]
Discussion ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Part III ,[object Object],[object Object],[object Object],[object Object]
Basic Idea ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Data Presentation Model ,[object Object],[object Object],[object Object]
Model Class ,[object Object],[object Object],[object Object],[object Object]
Training Process: Generalized Iterative Scaling ,[object Object],[object Object],[object Object],[object Object],[object Object]
The Principle of Maximum Entropy ,[object Object],[object Object],[object Object],[object Object],[object Object]
Application to Text Categorization ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Part VI ,[object Object]
Models ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Perceptron learning Procedure: gradient descent ,[object Object],[object Object],[object Object]
Perceptron learning Procedure: Basic Idea ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],j-th item in the weight vector   j-th item in the input vector   expected output & output
Why ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
E.g. w x x w+x s’ s Yes No
Discussion ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Part V ,[object Object]
Nearest Neighbor ,[object Object]
K Nearest Neighbor ,[object Object],[object Object]
Discussion ,[object Object],[object Object],[object Object],[object Object],[object Object]
Thanks!

Mais conteúdo relacionado

Mais procurados

Tdm probabilistic models (part 2)
Tdm probabilistic  models (part  2)Tdm probabilistic  models (part  2)
Tdm probabilistic models (part 2)
KU Leuven
 

Mais procurados (20)

Presentation on Text Classification
Presentation on Text ClassificationPresentation on Text Classification
Presentation on Text Classification
 
Tdm probabilistic models (part 2)
Tdm probabilistic  models (part  2)Tdm probabilistic  models (part  2)
Tdm probabilistic models (part 2)
 
Lec 4,5
Lec 4,5Lec 4,5
Lec 4,5
 
Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494
Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494
Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494
 
Mapping Subsets of Scholarly Information
Mapping Subsets of Scholarly InformationMapping Subsets of Scholarly Information
Mapping Subsets of Scholarly Information
 
Data Mining: Concepts and Techniques — Chapter 2 —
Data Mining:  Concepts and Techniques — Chapter 2 —Data Mining:  Concepts and Techniques — Chapter 2 —
Data Mining: Concepts and Techniques — Chapter 2 —
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 
Deep Learning for Search
Deep Learning for SearchDeep Learning for Search
Deep Learning for Search
 
Ir models
Ir modelsIr models
Ir models
 
Topic Modeling
Topic ModelingTopic Modeling
Topic Modeling
 
Chapter 11 cluster advanced : web and text mining
Chapter 11 cluster advanced : web and text miningChapter 11 cluster advanced : web and text mining
Chapter 11 cluster advanced : web and text mining
 
Latent dirichletallocation presentation
Latent dirichletallocation presentationLatent dirichletallocation presentation
Latent dirichletallocation presentation
 
Cluster Analysis
Cluster AnalysisCluster Analysis
Cluster Analysis
 
Capter10 cluster basic
Capter10 cluster basicCapter10 cluster basic
Capter10 cluster basic
 
Deep Learning for Search
Deep Learning for SearchDeep Learning for Search
Deep Learning for Search
 
3.1 clustering
3.1 clustering3.1 clustering
3.1 clustering
 
Learning to Rank - From pairwise approach to listwise
Learning to Rank - From pairwise approach to listwiseLearning to Rank - From pairwise approach to listwise
Learning to Rank - From pairwise approach to listwise
 
10 clusbasic
10 clusbasic10 clusbasic
10 clusbasic
 
Introduction to Clustering algorithm
Introduction to Clustering algorithmIntroduction to Clustering algorithm
Introduction to Clustering algorithm
 
Boolean,vector space retrieval Models
Boolean,vector space retrieval Models Boolean,vector space retrieval Models
Boolean,vector space retrieval Models
 

Destaque

Text categorization with Lucene and Solr
Text categorization with Lucene and SolrText categorization with Lucene and Solr
Text categorization with Lucene and Solr
Tommaso Teofili
 
Text classification
Text classificationText classification
Text classification
James Wong
 
Text clustering
Text clusteringText clustering
Text clustering
KU Leuven
 

Destaque (20)

Text Categorization
Text CategorizationText Categorization
Text Categorization
 
Text categorization with Lucene and Solr
Text categorization with Lucene and SolrText categorization with Lucene and Solr
Text categorization with Lucene and Solr
 
Tutorial on Text Categorization, EACL, 2003
Tutorial on Text Categorization, EACL, 2003Tutorial on Text Categorization, EACL, 2003
Tutorial on Text Categorization, EACL, 2003
 
NLP for Robotics
NLP for RoboticsNLP for Robotics
NLP for Robotics
 
Textmining Predictive Models
Textmining Predictive ModelsTextmining Predictive Models
Textmining Predictive Models
 
[ppt]
[ppt][ppt]
[ppt]
 
It
ItIt
It
 
Text categorization
Text categorizationText categorization
Text categorization
 
Text categorization using Rough Set
Text categorization using Rough SetText categorization using Rough Set
Text categorization using Rough Set
 
[DL Hacks] Learning Transferable Features with Deep Adaptation Networks
[DL Hacks] Learning Transferable Features with Deep Adaptation Networks[DL Hacks] Learning Transferable Features with Deep Adaptation Networks
[DL Hacks] Learning Transferable Features with Deep Adaptation Networks
 
Text Classification/Categorization
Text Classification/CategorizationText Classification/Categorization
Text Classification/Categorization
 
Text classification
Text classificationText classification
Text classification
 
Chainerを使ったらカノジョができたお話
Chainerを使ったらカノジョができたお話Chainerを使ったらカノジョができたお話
Chainerを使ったらカノジョができたお話
 
NLP
NLPNLP
NLP
 
Text clustering
Text clusteringText clustering
Text clustering
 
Text categorization
Text categorizationText categorization
Text categorization
 
ICML2016読み会 概要紹介
ICML2016読み会 概要紹介ICML2016読み会 概要紹介
ICML2016読み会 概要紹介
 
Icml読み会 deep speech2
Icml読み会 deep speech2Icml読み会 deep speech2
Icml読み会 deep speech2
 
Meta-Learning with Memory Augmented Neural Network
Meta-Learning with Memory Augmented Neural NetworkMeta-Learning with Memory Augmented Neural Network
Meta-Learning with Memory Augmented Neural Network
 
ベイジアンネットとレコメンデーション -第5回データマイニング+WEB勉強会@東京
ベイジアンネットとレコメンデーション -第5回データマイニング+WEB勉強会@東京ベイジアンネットとレコメンデーション -第5回データマイニング+WEB勉強会@東京
ベイジアンネットとレコメンデーション -第5回データマイニング+WEB勉強会@東京
 

Semelhante a 20070702 Text Categorization

Jörg Stelzer
Jörg StelzerJörg Stelzer
Jörg Stelzer
butest
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
butest
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
butest
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
butest
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
butest
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
butest
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
butest
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
butest
 
Machine learning and Neural Networks
Machine learning and Neural NetworksMachine learning and Neural Networks
Machine learning and Neural Networks
butest
 
Machine Learning: Decision Trees Chapter 18.1-18.3
Machine Learning: Decision Trees Chapter 18.1-18.3Machine Learning: Decision Trees Chapter 18.1-18.3
Machine Learning: Decision Trees Chapter 18.1-18.3
butest
 
Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401
butest
 
lecture_mooney.ppt
lecture_mooney.pptlecture_mooney.ppt
lecture_mooney.ppt
butest
 

Semelhante a 20070702 Text Categorization (20)

Jörg Stelzer
Jörg StelzerJörg Stelzer
Jörg Stelzer
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
 
ppt
pptppt
ppt
 
Machine learning and Neural Networks
Machine learning and Neural NetworksMachine learning and Neural Networks
Machine learning and Neural Networks
 
Lect4
Lect4Lect4
Lect4
 
Classifiers
ClassifiersClassifiers
Classifiers
 
Machine Learning: Decision Trees Chapter 18.1-18.3
Machine Learning: Decision Trees Chapter 18.1-18.3Machine Learning: Decision Trees Chapter 18.1-18.3
Machine Learning: Decision Trees Chapter 18.1-18.3
 
nnml.ppt
nnml.pptnnml.ppt
nnml.ppt
 
Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401
 
DWDM-AG-day-1-2023-SEC A plus Half B--.pdf
DWDM-AG-day-1-2023-SEC A plus Half B--.pdfDWDM-AG-day-1-2023-SEC A plus Half B--.pdf
DWDM-AG-day-1-2023-SEC A plus Half B--.pdf
 
[ppt]
[ppt][ppt]
[ppt]
 
[ppt]
[ppt][ppt]
[ppt]
 
Discovering Novel Information with sentence Level clustering From Multi-docu...
Discovering Novel Information with sentence Level clustering  From Multi-docu...Discovering Novel Information with sentence Level clustering  From Multi-docu...
Discovering Novel Information with sentence Level clustering From Multi-docu...
 
lecture_mooney.ppt
lecture_mooney.pptlecture_mooney.ppt
lecture_mooney.ppt
 

Último

Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
ciinovamais
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
Chris Hunter
 

Último (20)

This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docx
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 

20070702 Text Categorization

  • 1. Text Categorization Chapter 16 Foundations of Statistical Natural Language Processing
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10. E.g. A trained decision tree for category “earnings” Doc = {cts=1, net =3} Node1 7681 articles P(c|n1) = 0.3000 split: cts value: 2 Node2 5977 articles P(c|n2) = 0.116 split: net value: 1 Node5 1704 articles P(c|n5) = 0.943 split: vs value: 2 Node3 5436 articles P(c|n3) = 0.050 Node4 541 articles P(c|n4) = 0.649 Node6 301 articles P(c|n6) = 0.694 Node7 1403 articles P(c|n7) = 0.996 cts < 2 cts >= 2 net<1 Net>= 1 vs <2 vs >= 2
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.
  • 23.
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.
  • 30.
  • 31. E.g. w x x w+x s’ s Yes No
  • 32.
  • 33.
  • 34.
  • 35.
  • 36.