Automatic Term Ambiguity Detection
Tyler Baldwin, Yunyao Li, Bogdan Alexe, Ioana R. Stanoi
IBM Research - Almaden
What is the buzz about
Brave on Twitter?
Find tweets about the movie Brave:
Movie night watching brave with Cammie n Isla n loads munchies
This brave girl deserves endless retweets!
Watching brave with the kiddos!
watching Bregor playing Civ 5: Brave New World and thinking of getting it
Skyfall 007 in class with @MariaWiheelste
So I was dead set on seeing skyfall 007 for like a year
NowWatching #skyFall 007!
What movie amazed u — skyfall 007
Existing Disambiguation Methods

Word Sense Disambiguation (WSD)

Which word sense does this instance refer to?

Named Entity Disambiguation (NED)

Which entity type is this instance associated with?

Limitations:

Assume the number of senses/entities is known
− Often not the case

Inefficient on very large data sets
− Attempt to disambiguate each instance
Term Ambiguity Detection (TAD)

Perform term disambiguation at the term, not
instance level

Given a term T and its category C, do all the
mentions of the term reference a member of that
category?

Level of ambiguity of the term

Hybrid information extraction (IE) systems
− Simpler model if the term is unambiguous
− More complex model otherwise

Potentially useful for other NLP tasks
Term Ambiguity Detection (TAD)
Input:
Term             Category
Brave            Movie
Skyfall 007      Movie
A New Beginning  Video Game
EOS 5D           Camera

TAD output:
Ambiguous: Brave (Movie), A New Beginning (Video Game)
Unambiguous: Skyfall 007 (Movie), EOS 5D (Camera)
TAD Framework
Step 1: N-gram
Step 2: Ontology
Step 3: Clustering
Ambiguous
Unambiguous
Step 1: N-gram
Does the term share a name with a common word/phrase?
1. Normalize input term t (stopword removal + lowercase)
2. Calculate unigram probability
3. Ambiguous if the probability is above the empirically determined threshold
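The three N-gram steps can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the stopword list, the corpus counts, the total token count and the threshold value are all hypothetical, and treating a multi-word term as a product of independent unigram probabilities is an assumption the slides do not state.

```python
# Hypothetical stand-ins: a tiny stopword list and toy background-corpus counts.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "and"}
UNIGRAM_COUNTS = {"brave": 12_000, "new": 90_000, "beginning": 15_000,
                  "skyfall": 40, "007": 25, "eos": 30, "5d": 10}
TOTAL_TOKENS = 10_000_000
THRESHOLD = 1e-4  # assumed value; the slides only say "empirically determined"


def normalize(term):
    """Step 1: lowercase the term and drop stopwords."""
    tokens = term.lower().split()
    kept = [t for t in tokens if t not in STOPWORDS]
    return kept or tokens  # fall back if every token was a stopword


def unigram_probability(tokens):
    """Step 2: product of per-token probabilities (independence is assumed)."""
    prob = 1.0
    for token in tokens:
        # Add-one smoothing so unseen tokens get a small non-zero probability.
        count = UNIGRAM_COUNTS.get(token, 0) + 1
        prob *= count / (TOTAL_TOKENS + len(UNIGRAM_COUNTS))
    return prob


def ngram_ambiguous(term):
    """Step 3: ambiguous if the probability is above the threshold."""
    tokens = normalize(term)
    if not tokens:
        return False  # nothing to score
    return unigram_probability(tokens) > THRESHOLD


print(ngram_ambiguous("Brave"))        # True
print(ngram_ambiguous("Skyfall 007"))  # False
```

With these toy counts, "A New Beginning" stays below the threshold, which is the kind of case the later modules are meant to catch.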
Step 2: Ontology
• Wiktionary: Ambiguous if term has several senses in Wiktionary
• Wikipedia: Ambiguous if term has a Wikipedia disambiguation page
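A minimal sketch of the two ontology checks, assuming the Wiktionary sense counts and the set of Wikipedia disambiguation titles have already been collected into local lookups. The dictionaries below are hypothetical toy data, and reading "several senses" as "two or more" is an assumption.

```python
# Hypothetical local lookups; real data would come from Wiktionary and Wikipedia.
WIKTIONARY_SENSE_COUNTS = {"brave": 5, "beginning": 3}
WIKIPEDIA_DISAMBIGUATION_TITLES = {"brave", "a new beginning"}


def ontology_ambiguous(term, min_senses=2):
    """Ambiguous if either ontology suggests more than one meaning."""
    key = term.lower().strip()
    has_several_senses = WIKTIONARY_SENSE_COUNTS.get(key, 0) >= min_senses
    has_disambiguation_page = key in WIKIPEDIA_DISAMBIGUATION_TITLES
    return has_several_senses or has_disambiguation_page


print(ontology_ambiguous("A New Beginning"))  # True
print(ontology_ambiguous("EOS 5D"))           # False
```

A term missing from both lookups is simply not flagged here, so the decision falls through to whichever module runs next.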
Step 3: Clustering
Cluster the contexts in which the term appears
1. Remove stopwords and infrequent words from all documents containing the term
2. Cluster the documents using Latent Dirichlet Allocation (LDA)
3. Ambiguous if category term or WordNet synonym does not appear in the most heavily weighted terms of any cluster
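The decision rule in item 3 can be sketched as below. The LDA step itself is not shown: the function assumes some topic model has already produced the most heavily weighted words for each cluster. The sketch also assumes one particular reading of the rule, namely that the term is ambiguous as soon as at least one cluster lacks the category term and all of its synonyms. The example clusters and the synonym list are made up for illustration.

```python
def clustering_ambiguous(cluster_top_terms, category_terms):
    """Return True if some cluster's top terms contain no category term.

    cluster_top_terms: one list of top-weighted words per cluster.
    category_terms: the category name plus its synonyms (e.g. from WordNet).
    """
    wanted = {w.lower() for w in category_terms}
    for top_terms in cluster_top_terms:
        # isdisjoint is True when none of the wanted words is among the top terms.
        if wanted.isdisjoint(w.lower() for w in top_terms):
            return True
    return False


# Hypothetical top terms for two clusters per term.
brave_clusters = [["movie", "watching", "pixar"], ["girl", "strong", "retweets"]]
skyfall_clusters = [["movie", "bond", "007"], ["film", "watching", "seeing"]]
category = ["movie", "film"]

print(clustering_ambiguous(brave_clusters, category))    # True
print(clustering_ambiguous(skyfall_clusters, category))  # False
```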
Evaluation

Dataset: terms from 4 product domains:

Movies, Video Games, Cameras, Books
− 100 terms per domain
− Extracted randomly from DBpedia and Flickr

Gold standard: ambiguity determined by
examining usage in TREC Tweets2011 corpus

10 tweets labeled per term
− Unambiguous only if all tweets reference category
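The gold-standard rule above amounts to an "all tweets agree" check. A small sketch, where each judgment is a hypothetical boolean recording whether the labeled tweet refers to the category:

```python
def gold_label(tweet_judgments):
    """Label a term from its per-tweet judgments.

    tweet_judgments: list of bools, True if the mention in that tweet
    refers to a member of the category.
    """
    if not tweet_judgments:
        raise ValueError("at least one labeled tweet is required")
    return "unambiguous" if all(tweet_judgments) else "ambiguous"


print(gold_label([True] * 10))           # unambiguous
print(gold_label([True] * 9 + [False]))  # ambiguous
```

A single off-category tweet is enough to make the term ambiguous, which is why the labeling is strict.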
Questions to Answer

How effective is TAD?

How useful is TAD?
Results - Effectiveness

Each module produced above baseline performance
Configuration    Precision  Recall  F-measure
Majority Class   0.675      1.0     0.806
N-gram (NG)      0.979      0.848   0.909
Ontology (ON)    0.979      0.704   0.819
Clustering (CL)  0.946      0.848   0.895
NG + ON          0.980      0.919   0.948
NG + CL          0.942      0.963   0.952
ON + CL          0.945      0.956   0.950
All              0.943      0.978   0.960
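The F-measure column appears to be the usual harmonic mean of precision and recall (an assumption, since the slides do not define it). A short check against a few rows of the table:

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


rows = {
    "Majority Class": (0.675, 1.0),
    "N-gram (NG)": (0.979, 0.848),
    "All": (0.943, 0.978),
}
for name, (p, r) in rows.items():
    print(f"{name}: {f_measure(p, r):.3f}")
```

Recomputing from the rounded precision and recall reproduces most rows, but a couple (Clustering, NG + ON) come out 0.001 off, presumably because the reported values were computed before rounding.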
The ontology method is of limited use, as most of the terms cannot be found in the ontology.

Combined framework produced high F-measure of 0.96
Results - Usefulness

Integrated TAD pipeline into commercially
available IE system

Extracted mentions of terms from Camera and
Video game domains on Twitter data

Manually judged relevance of extracted Tweets

Using ambiguity detection hurt recall

Only 57% of the relevant documents returned
with TAD

Ambiguity detection necessary for high
precision

w/ ambiguity detection:
− Precision: 0.96

w/o ambiguity detection:
− Precision: 0.16
Conclusion

Term ambiguity detection is helpful for large-
scale information extraction

Able to detect ambiguity when number of senses is
unknown

Able to be applied to large datasets where instance-
level interpretation is impractical

3-Module TAD approach results in high performance

Detects ambiguity with F-measure of 0.96

Allows IE system to produce high precision
BACKUP
TAD Framework
N-gram: suggests non-referential instances. Yes → Ambiguous Terms; No → Ontology
Ontology: suggests across-domain instances. Yes → Ambiguous Terms; No → Clustering
Clustering: suggests either case. Yes → Ambiguous Terms; No → Unambiguous Terms
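The flow in this backup slide reads as a cascade: each module is tried in turn, and the first one that fires labels the term ambiguous. A minimal sketch of that control flow, with hypothetical set lookups standing in for the three real modules:

```python
def detect_ambiguity(term, category, modules):
    """Run the modules in order; the first one that fires decides the label."""
    for name, fires in modules:
        if fires(term, category):
            return "ambiguous", name
    return "unambiguous", None


# Toy stand-ins for the three modules (hypothetical lookups, not real data).
COMMON_WORDS = {"brave"}
DISAMBIGUATION_PAGES = {"a new beginning"}
MIXED_CONTEXT_TERMS = set()

modules = [
    ("n-gram", lambda term, category: term.lower() in COMMON_WORDS),
    ("ontology", lambda term, category: term.lower() in DISAMBIGUATION_PAGES),
    ("clustering", lambda term, category: term.lower() in MIXED_CONTEXT_TERMS),
]

print(detect_ambiguity("Brave", "Movie", modules))
print(detect_ambiguity("EOS 5D", "Camera", modules))
```

Returning the name of the module that fired makes it easy to see which kind of evidence produced each "ambiguous" label.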

 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 

Automatic Term Ambiguity Detection for Information Extraction

  • 1. Automatic Term Ambiguity Detection. Tyler Baldwin, Yunyao Li, Bogdan Alexe, Ioana R. Stanoi (IBM Research - Almaden)
  • 3. What is the buzz about Brave on Twitter?
  • 4. Find tweets about the movie Brave:
      • Movie night watching brave with Cammie n Isla n loads munchies
      • This brave girl deserves endless retweets!
      • Watching brave with the kiddos!
      • watching Bregor playing Civ 5: Brave New World and thinking of getting it
  • 5.
      • Skyfall 007 in class with @MariaWiheelste
      • So I was dead set on seeing skyfall 007 for like a year
      • NowWatching #skyFall 007!
      • What movie amazed u — skyfall 007
  • 6. Existing Disambiguation Methods
      • Word Sense Disambiguation (WSD): Which word sense does this instance refer to?
      • Named Entity Disambiguation (NED): Which entity type is this instance associated with?
  • 7. Existing Disambiguation Methods
      • Word Sense Disambiguation (WSD): Which word sense does this specific instance refer to?
      • Named Entity Disambiguation (NED): Which entity type is this individual instance associated with?
      • Limitations:
          − Assume the number of senses/entities is known, which is often not the case
          − Inefficient on very large data sets, because they attempt to disambiguate each instance
  • 8. Term Ambiguity Detection (TAD)
      • Perform term disambiguation at the term, not instance level
      • Given a term T and its category C, do all the mentions of the term reference a member of that category?
  • 9. Term Ambiguity Detection (TAD)
      • Perform term disambiguation at the term, not instance level
      • Given a term T and its category C, do all the mentions of the term reference a member of that category?
      • Level of ambiguity of the term
      • Hybrid information extraction (IE) systems
          − Simpler model if the term is unambiguous
          − More complex model otherwise
      • Potentially useful for other NLP tasks
  • 10. Term Ambiguity Detection (TAD)
      Input:
          Term              Category
          Brave             Movie
          Skyfall 007       Movie
          A New Beginning   Video Game
          EOS 5D            Camera
      TAD output, Ambiguous:
          Term              Category
          Brave             Movie
          A New Beginning   Video Game
      TAD output, Unambiguous:
          Term              Category
          Skyfall 007       Movie
          EOS 5D            Camera
  • 11. TAD Framework
      • Step 1: N-gram
      • Step 2: Ontology
      • Step 3: Clustering
      • Output: Ambiguous or Unambiguous
  • 12. TAD Framework, Step 1: N-gram
      Does the term share a name with a common word/phrase?
          1. Normalize input term t (stopword removal + lowercase)
          2. Calculate unigram probability
          3. Ambiguous if the probability is above the empirically determined threshold
  • 13. TAD Framework, Step 2: Ontology
      • Wiktionary: Ambiguous if term has several senses in Wiktionary
      • Wikipedia: Ambiguous if term has a Wikipedia disambiguation page
  • 14. TAD Framework, Step 3: Clustering
      Cluster the contexts in which the term appears
          1. Remove stopwords and infrequent words from all documents containing the term
          2. Cluster the documents using Latent Dirichlet Allocation (LDA)
          3. Ambiguous if category term or WordNet synonym does not appear in the most heavily weighted terms of any cluster
  • 15. Evaluation
      • Dataset: terms from 4 product domains: Movies, Video Games, Cameras, Books
          − 100 terms per domain
          − Extracted randomly from DBpedia and Flickr
      • Gold standard: ambiguity determined by examining usage in the TREC Tweets2011 corpus
      • 10 tweets labeled per term
          − Unambiguous only if all tweets reference the category
  • 16. Questions to Answer
      • How effective is TAD?
      • How useful is TAD?
  • 17. Results - Effectiveness
      • Each module produced above baseline performance
      Configuration      Precision   Recall   F-measure
      Majority Class     0.675       1.0      0.806
      N-gram (NG)        0.979       0.848    0.909
      Ontology (ON)      0.979       0.704    0.819
      Clustering (CL)    0.946       0.848    0.895
      NG + ON            0.980       0.919    0.948
      NG + CL            0.942       0.963    0.952
      ON + CL            0.945       0.956    0.950
      All                0.943       0.978    0.960
  • 18. Results - Effectiveness (same table as slide 17)
      • Ontology method is of limited use, as most of the terms cannot be found in the ontology.
  • 19. Results - Effectiveness (same table as slide 17)
      • Each module produced above baseline performance
      • Combined framework produced high F-measure of 0.96
  • 20. Results - Usefulness
      • Integrated TAD pipeline into a commercially available IE system
      • Extracted mentions of terms from the Camera and Video Game domains on Twitter data
      • Manually judged relevance of extracted tweets
  • 21. Results - Usefulness
      • Using ambiguity detection hurt recall
          − Only 57% of the relevant documents returned with TAD
      • Ambiguity detection necessary for high precision
          − With ambiguity detection: precision 0.96
          − Without ambiguity detection: precision 0.16
  • 22. Conclusion
      • Term ambiguity detection is helpful for large-scale information extraction
          − Able to detect ambiguity when the number of senses is unknown
          − Able to be applied to large datasets where instance-level interpretation is impractical
      • 3-module TAD approach results in high performance
          − Detects ambiguity with F-measure of 0.96
          − Allows IE system to produce high precision
  • 24. TAD Framework (decision flow)
      • N-gram (suggests non-referential instances): Yes → Ambiguous Terms; No → Ontology
      • Ontology (suggests across-domain instances): Yes → Ambiguous Terms; No → Clustering
      • Clustering (suggests either case): Yes → Ambiguous Terms; No → Unambiguous Terms
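
The three-step cascade described on slides 12 to 14 and 24 could be sketched roughly as below. This is a minimal illustration, not the authors' implementation: the stopword list, the probability table, the threshold, the sense counts, and the cluster top-word lists are all made-up stand-ins for a real n-gram model, Wiktionary/Wikipedia lookups, and LDA output. It also assumes that "several senses" means more than one, that the normalized term can be looked up directly in a probability table, and that the clustering rule flags a term as soon as one cluster's top words contain neither the category term nor a synonym.

```python
from typing import Dict, List, Set

# Illustrative stopword list (hypothetical; the slides do not name one).
STOPWORDS: Set[str] = {"a", "an", "the", "of"}


def normalize(term: str) -> str:
    """Step 1 normalization: lowercase the term and drop stopwords."""
    return " ".join(w for w in term.lower().split() if w not in STOPWORDS)


def ngram_flags(term: str, unigram_prob: Dict[str, float], threshold: float) -> bool:
    """Step 1: flag the term if its probability exceeds the threshold."""
    return unigram_prob.get(normalize(term), 0.0) > threshold


def ontology_flags(term: str, sense_counts: Dict[str, int],
                   disambiguation_pages: Set[str]) -> bool:
    """Step 2: flag on multiple dictionary senses or a disambiguation page."""
    key = term.lower()
    return sense_counts.get(key, 0) > 1 or key in disambiguation_pages


def clustering_flags(category_words: Set[str],
                     cluster_top_words: List[List[str]]) -> bool:
    """Step 3: flag if some cluster's top words lack every category word."""
    return any(category_words.isdisjoint(top) for top in cluster_top_words)


def is_ambiguous(term: str, category_words: Set[str],
                 cluster_top_words: List[List[str]], *,
                 unigram_prob: Dict[str, float], threshold: float,
                 sense_counts: Dict[str, int],
                 disambiguation_pages: Set[str]) -> bool:
    """Cascade: a 'yes' from any step labels the term ambiguous."""
    return (
        ngram_flags(term, unigram_prob, threshold)
        or ontology_flags(term, sense_counts, disambiguation_pages)
        or clustering_flags(category_words, cluster_top_words)
    )


# Made-up toy resources, for illustration only.
RESOURCES = dict(
    unigram_prob={"brave": 1e-4},
    threshold=1e-5,
    sense_counts={"brave": 3},
    disambiguation_pages={"brave"},
)
MOVIE = {"movie", "film"}  # category term plus an assumed synonym

brave = is_ambiguous("Brave", MOVIE,
                     [["movie", "pixar"], ["girl", "retweet"]], **RESOURCES)
skyfall = is_ambiguous("Skyfall 007", MOVIE,
                       [["movie", "bond"], ["film", "watch"]], **RESOURCES)
print(brave, skyfall)  # True False
```

Because Python's `or` short-circuits, the cheaper n-gram and ontology checks run first and the clustering check is consulted only when both say "no", which mirrors the Yes/No flow on slide 24.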

Editor's Notes

  1. Given a term and corresponding category of interest, discover whether all mentions of the term reference a member of that category
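
The task definition in the notes above, together with the gold-standard rule from slide 15 (a term is unambiguous only if all of its labeled tweets reference the category), can be illustrated with a short sketch. The function names and the per-tweet labels are hypothetical. The sketch also assumes the F-measure column in the results table is the balanced harmonic mean of precision and recall, which reproduces the reported 0.960 for the combined configuration.

```python
from typing import Iterable


def gold_is_unambiguous(tweet_on_category: Iterable[bool]) -> bool:
    """A term is unambiguous only if every labeled tweet references the category."""
    return all(tweet_on_category)


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (harmonic mean of precision and recall)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for 10 tweets per term (not taken from the dataset).
skyfall_labels = [True] * 10
brave_labels = [True] * 7 + [False] * 3

print(gold_is_unambiguous(skyfall_labels))  # True
print(gold_is_unambiguous(brave_labels))    # False
print(round(f_measure(0.943, 0.978), 3))    # 0.96
```

Under this rule a single off-category tweet is enough to mark a term ambiguous, so the gold standard is deliberately strict.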