Language Independent Methods of Clustering Similar Contexts (with applications)
Ted Pedersen, University of Minnesota, Duluth
[email_address]
http://www.d.umn.edu/~tpederse/SCTutorial.html
Language Independent Methods
Clustering Similar Contexts
Applications
Tutorial Outline
SenseClusters
Many thanks…
Background and Motivations
Headed and Headless Contexts
Headed Contexts (input)
Headed Contexts (output)
Headless Contexts (input)
Headless Contexts (output)
Web Search as Application
Email Foldering as Application
Clustering News as Application
What is it to be “similar”?
General Methodology
Identifying Lexical Features: Measures of Association and Tests of Significance
What are features?
Where do features come from?
Feature Selection
Lexical Features
Bigrams
Co-occurrences
Bigrams and Co-occurrences
“occur together more often than expected by chance…”
2x2 Contingency Table (observed counts, with expected counts in parentheses)
               Intelligence     !Intelligence          Totals
Artificial     100.0 (1.2)      300.0 (398.8)          400
!Artificial    200.0 (298.8)    99,400.0 (99,301.2)    99,600
Totals         300              99,700                 100,000
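The expected counts in the contingency table follow from the marginal totals under the assumption that the two words occur independently of each other. The sketch below reproduces that arithmetic and then computes two commonly used association scores, Pearson's chi-squared and the log-likelihood ratio G². The function names are illustrative assumptions and are not taken from the tutorial or from any particular package.

```python
import math

def expected_counts(observed):
    """Expected cell counts of a 2x2 table if rows and columns were independent."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def pearson_chi_squared(observed):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(observed)
    return sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))

def log_likelihood_ratio(observed):
    """G^2 = 2 * sum of observed * ln(observed / expected); empty cells add 0."""
    expected = expected_counts(observed)
    return 2.0 * sum(o * math.log(o / e)
                     for o_row, e_row in zip(observed, expected)
                     for o, e in zip(o_row, e_row)
                     if o > 0)

# Observed counts for the bigram "Artificial Intelligence".
# Rows: Artificial, !Artificial. Columns: Intelligence, !Intelligence.
observed = [[100, 300],
            [200, 99400]]

expected = expected_counts(observed)
x2 = pearson_chi_squared(observed)
g2 = log_likelihood_ratio(observed)
print(expected, x2, g2)
```

An observed count of 100 against an expected count of about 1.2 is what makes both scores large for this pair.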
Measures of Association
Interpreting the Scores…
Interpreting the Scores…
Measures of Association
Summary
Context Representations: First and Second Order Methods
Once features selected…
First Order Representation
Contexts
Unigram Feature Set
First Order Vectors of Unigrams
        island  black  curse  magic  child
Cxt1    1       1      1      1      1
Cxt2    0       1      0      1      1
Cxt3    0       0      0      0      0
Cxt4    1       0      1      0      1
Bigram Feature Set
First Order Vectors of Bigrams
        black magic  island curse  military might  serious error  voodoo child
Cxt1    1            1             0               0              1
Cxt2    1            0             0               0              1
Cxt3    0            0             1               1              0
Cxt4    0            1             1               0              1
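A first order vector records, for each selected feature, whether that feature occurs in the context. The sketch below builds the unigram and bigram vectors for the one context sentence that survives in this deck (the Cxt1 sentence shown with the word by word matrix). It assumes simple lower-cased tokenization and treats a bigram as two adjacent tokens; the tutorial may allow intervening words, so the adjacency rule and the helper names are assumptions made for illustration.

```python
import re

def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def first_order_unigrams(context, features):
    """Binary vector: 1 if the feature word occurs anywhere in the context."""
    tokens = set(tokenize(context))
    return [1 if feature in tokens else 0 for feature in features]

def first_order_bigrams(context, features):
    """Binary vector: 1 if the two feature words occur side by side (assumed rule)."""
    tokens = tokenize(context)
    adjacent_pairs = set(zip(tokens, tokens[1:]))
    return [1 if tuple(feature.split()) in adjacent_pairs else 0
            for feature in features]

unigram_features = ["island", "black", "curse", "magic", "child"]
bigram_features = ["black magic", "island curse", "military might",
                   "serious error", "voodoo child"]
cxt1 = "There was an island curse of black magic cast by that voodoo child."

unigram_vector = first_order_unigrams(cxt1, unigram_features)
bigram_vector = first_order_bigrams(cxt1, bigram_features)
print(unigram_vector)  # [1, 1, 1, 1, 1]
print(bigram_vector)   # [1, 1, 0, 0, 1]
```

Both results agree with the Cxt1 rows of the unigram and bigram tables.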
First Order Vectors
Second Order Features
Second Order Representation
Word by Word Matrix
            magic   curse   might   error   child
black       123.5   0       0       0       43.2
island      0       189.2   0       0       73.2
military    0       0       100.3   54.9    0
serious     0       21.2    0       89.2    0
voodoo      0       0       69.4    0       120.0
Word by Word Matrix
There was an island curse of black magic cast by that voodoo child.
            magic   curse   might   error   child
black       123.5   0       0       0       43.2
island      0       189.2   0       0       73.2
voodoo      0       0       69.4    0       120.0
Second Order Co-Occurrences
Second Order Representation
There was an island curse of black magic cast by that voodoo child.
        magic   curse   might   error   child
Cxt1    41.2    63.1    24.4    0       78.8
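The Cxt1 row appears to be the average of the word by word matrix rows for the feature words found in the context (black, island and voodoo). The sketch below carries out that averaging; the function name is a hypothetical helper, and plain averaging is an assumption inferred from the numbers rather than a statement of the tutorial's exact procedure. Four of the five averaged values match the slide to one decimal place, while for "might" the mean of the three listed entries is about 23.1 where the slide shows 24.4.

```python
import re

# Rows of the word by word matrix for the feature words that occur in Cxt1.
columns = ["magic", "curse", "might", "error", "child"]
word_vectors = {
    "black":  [123.5, 0.0, 0.0, 0.0, 43.2],
    "island": [0.0, 189.2, 0.0, 0.0, 73.2],
    "voodoo": [0.0, 0.0, 69.4, 0.0, 120.0],
}

def second_order_vector(context, word_vectors):
    """Average the vectors of every context word that has a row in the matrix."""
    tokens = re.findall(r"[a-z]+", context.lower())
    rows = [word_vectors[t] for t in tokens if t in word_vectors]
    if not rows:
        return []
    return [sum(col) / len(rows) for col in zip(*rows)]

cxt1 = "There was an island curse of black magic cast by that voodoo child."
cxt1_vector = second_order_vector(cxt1, word_vectors)
for name, value in zip(columns, cxt1_vector):
    print(f"{name}: {value:.1f}")
```

Averaging keeps every context in the same feature space, so two contexts can be compared even when they share no words directly.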
Second Order Representation
Summary
Related Work
Dimensionality Reduction: Singular Value Decomposition
Effect of SVD
Effect of SVD
How can SVD be used?
Word by Word Matrix
          apple  blood  cells  ibm  data  tissue  graphics  plasma
pc        2      0      0      1    3     0       0         0
body      0      3      0      0    0     2       0         1
disk      1      0      0      2    0     0       1         0
petri     0      2      1      0    0     2       0         1
lab       0      0      3      0    2     2       0         3
sales     0      0      0      2    3     0       1         0
linux     2      0      0      1    3     0       1         0
debt      0      0      0      2    3     0       2         0
organ     0      2      0      0    1     0       0         0
memory    0      0      2      1    2     2       1         0
box       1      0      3      0    0     0       2         4
Singular Value Decomposition A=UDV’
Word by Word Matrix After SVD
          apple  blood  cells  ibm   data  tissue  graphics  plasma
pc        .73    .00    .11    1.3   2.0   .01     .86       .09
body      .00    1.2    1.3    .00   .33   1.6     .00       1.5
disk      .76    .00    .01    1.3   2.1   .00     .91       .00
germ      .00    1.1    1.2    .00   .49   1.5     .00       1.4
lab       .21    1.7    2.0    .35   1.7   2.5     .18       2.3
sales     .73    .15    .39    1.3   2.2   .35     .85       .41
linux     .96    .00    .16    1.7   2.7   .03     1.1       .13
debt      1.2    .00    .00    2.1   3.2   .00     1.5       .00
organ     .00    .84    .00    .77   1.2   .17     .00       .00
memory    .77    .85    .72    .86   1.7   .98     1.0       1.1
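The decomposition A = UDV' lets the matrix be rebuilt from only its k largest singular values, which smooths the counts and fills in some of the zero cells. The sketch below is a dependency-free illustration that finds the leading singular triplets by power iteration with deflation and rebuilds a rank-k approximation of the word by word count matrix. It does not attempt to reproduce the numbers on the slide, since the value of k and any post-processing used there are not stated; in practice a library routine such as `numpy.linalg.svd` would normally be used instead of this hand-written loop.

```python
import math

def transpose(a):
    return [list(col) for col in zip(*a)]

def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def norm(x):
    return math.sqrt(sum(v * v for v in x))

def top_singular_triplet(a, iterations=500):
    """Power iteration on A'A: approximates the largest singular value and vectors."""
    at = transpose(a)
    v = [1.0 / math.sqrt(len(at))] * len(at)  # unit-length starting guess
    for _ in range(iterations):
        w = matvec(at, matvec(a, v))
        n = norm(w)
        if n == 0.0:
            break
        v = [x / n for x in w]
    av = matvec(a, v)
    sigma = norm(av)
    u = [x / sigma for x in av] if sigma > 0.0 else av
    return sigma, u, v

def truncated_svd(a, k):
    """Rank-k approximation: peel off the top triplet k times (deflation)."""
    residual = [row[:] for row in a]
    approx = [[0.0] * len(a[0]) for _ in a]
    sigmas = []
    for _ in range(k):
        sigma, u, v = top_singular_triplet(residual)
        sigmas.append(sigma)
        for i, ui in enumerate(u):
            for j, vj in enumerate(v):
                approx[i][j] += sigma * ui * vj
                residual[i][j] -= sigma * ui * vj
    return approx, sigmas

def frobenius_error(a, b):
    return math.sqrt(sum((x - y) ** 2
                         for ra, rb in zip(a, b) for x, y in zip(ra, rb)))

# Rows: pc, body, disk, petri, lab, sales, linux, debt, organ, memory, box.
# Columns: apple, blood, cells, ibm, data, tissue, graphics, plasma.
matrix = [
    [2, 0, 0, 1, 3, 0, 0, 0], [0, 3, 0, 0, 0, 2, 0, 1],
    [1, 0, 0, 2, 0, 0, 1, 0], [0, 2, 1, 0, 0, 2, 0, 1],
    [0, 0, 3, 0, 2, 2, 0, 3], [0, 0, 0, 2, 3, 0, 1, 0],
    [2, 0, 0, 1, 3, 0, 1, 0], [0, 0, 0, 2, 3, 0, 2, 0],
    [0, 2, 0, 0, 1, 0, 0, 0], [0, 0, 2, 1, 2, 2, 1, 0],
    [1, 0, 3, 0, 0, 0, 2, 4],
]
rank2, sigmas = truncated_svd(matrix, 2)
print([round(s, 2) for s in sigmas])
print(round(frobenius_error(matrix, rank2), 2))
```

Each extra singular triplet can only reduce the reconstruction error, so k trades compactness against fidelity to the original counts.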
Second Order Representation
          apple  blood  cells  ibm   data  tissue  graphics  plasma
disk      .76    .00    .01    1.3   2.1   .00     .91       .00
linux     .96    .00    .16    1.7   2.7   .03     1.1       .13
organ                   .00                        .00
memory                  .72                        1.0
Relationship to LSA
Feature by Context Representation
                  Cxt1  Cxt2  Cxt3  Cxt4
black magic       1     1     0     1
island curse      1     0     0     1
military might    0     0     1     0
voodoo child      1     1     0     1
serious error     0     0     1     0
Clustering: Partitional Methods, Cluster Stopping, Cluster Labeling
Many many methods…
General Methodology
Partitional Methods
Partitional Methods
Partitional Criterion Functions
Intra Cluster Similarity
Contexts to be Clustered
Ball of String (I1 Internal Criterion Function)
Flower (I2 Internal Criterion Function)
Inter Cluster Similarity
The Fan (E1 External Criterion Function)
Hybrid Criterion Functions
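The definitions on these slides did not survive extraction, so the sketch below relies on the forms in which these criterion functions are usually stated for the CLUTO clustering toolkit, and it should be checked against the tutorial itself. With unit-length context vectors, I2 sums the lengths of the cluster composite vectors (higher is better), E1 sums n_r · (D_r · D) / ‖D_r‖ over clusters, where D_r is a cluster's composite vector and D the composite of the whole collection (lower is better), and the hybrid H2 is I2 / E1. The four toy vectors are made up purely for illustration.

```python
import math

def length(v):
    return math.sqrt(sum(x * x for x in v))

def normalize(v):
    n = length(v)
    return [x / n for x in v]

def composite(vectors):
    """Component-wise sum of a list of vectors."""
    return [sum(col) for col in zip(*vectors)]

def i2(clusters):
    """Internal criterion: total length of the cluster composite vectors."""
    return sum(length(composite(c)) for c in clusters)

def e1(clusters):
    """External criterion: how closely each cluster points toward the whole collection."""
    everything = composite([v for c in clusters for v in c])
    total = 0.0
    for c in clusters:
        d_r = composite(c)
        dot = sum(a * b for a, b in zip(d_r, everything))
        total += len(c) * dot / length(d_r)
    return total

def h2(clusters):
    """Hybrid criterion: reward tight clusters (I2) that are well separated (low E1)."""
    return i2(clusters) / e1(clusters)

# Hypothetical unit-length context vectors: a and b are alike, c and d are alike.
a, b = normalize([1.0, 0.0]), normalize([1.0, 0.1])
c, d = normalize([0.0, 1.0]), normalize([0.1, 1.0])

good = [[a, b], [c, d]]   # similar contexts grouped together
bad = [[a, c], [b, d]]    # dissimilar contexts grouped together
print(h2(good), h2(bad))
```

The grouping that keeps similar vectors together scores higher on I2, lower on E1, and therefore higher on H2.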
Cluster Stopping
Cluster Stopping
Criterion Functions Can Help
H2 versus k: T. Blair – V. Putin – S. Hussein
PK2
PK2 predicts 3 senses: T. Blair – V. Putin – S. Hussein
PK3
PK3 predicts 3 senses: T. Blair – V. Putin – S. Hussein
Cluster Labeling
Cluster Labeling
Results of Clustering
Label Types
Evaluation Techniques: Comparison to gold standard data
Evaluation
Evaluation
Evaluation
Baseline Algorithm
Baseline Performance
          S1   S2   S3   Totals
C1        0    0    0    0
C2        0    0    0    0
C3        80   35   55   170
Totals    80   35   55   170

          S3   S2   S1   Totals
C1        0    0    0    0
C2        0    0    0    0
C3        55   35   80   170
Totals    55   35   80   170
Evaluation
          S1   S2   S3   Totals
C1        10   30   5    45
C2        20   0    40   60
C3        50   5    10   65
Totals    80   35   55   170
Evaluation
          S2   S3   S1   Totals
C1        30   5    10   45
C2        0    40   20   60
C3        5    10   50   65
Totals    35   55   80   170
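The two evaluation tables differ only in the order of the sense columns: the second ordering places the largest possible counts on the diagonal. The sketch below searches every one-to-one assignment of clusters to senses for the one with the highest agreement, and also computes the baseline of putting every context in a single cluster and labeling it with the most frequent sense. The function names are illustrative assumptions, and brute force over permutations is only practical for a handful of clusters.

```python
from itertools import permutations

def best_accuracy(confusion):
    """Try every one-to-one cluster-to-sense mapping and keep the best one.

    Assumes there are at least as many senses (columns) as clusters (rows).
    """
    n_clusters = len(confusion)
    n_senses = len(confusion[0])
    total = sum(sum(row) for row in confusion)
    best_correct, best_mapping = -1, None
    for mapping in permutations(range(n_senses), n_clusters):
        correct = sum(confusion[c][s] for c, s in enumerate(mapping))
        if correct > best_correct:
            best_correct, best_mapping = correct, mapping
    return best_correct / total, best_mapping

def majority_baseline(confusion):
    """Accuracy of one big cluster labeled with the most frequent sense."""
    col_totals = [sum(col) for col in zip(*confusion)]
    return max(col_totals) / sum(col_totals)

# Rows: clusters C1..C3. Columns: senses S1..S3.
confusion = [[10, 30, 5],
             [20, 0, 40],
             [50, 5, 10]]

accuracy, mapping = best_accuracy(confusion)
baseline = majority_baseline(confusion)
print(mapping)             # (1, 2, 0): C1 -> S2, C2 -> S3, C3 -> S1
print(round(accuracy, 3))  # 0.706
print(round(baseline, 3))  # 0.471
```

The best mapping agrees on 30 + 40 + 50 = 120 of the 170 contexts, against 80 of 170 for the baseline.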
Alternatives?
Thank you!

Aaai 2006 Pedersen
