Leveraging Wikipedia-based Features for Entity Relatedness and Recommendations
Nitish Aggarwal
Supervised by Dr. Paul Buitelaar
PhD Viva
Brad Pitt
Motivation
2
Motivation
Semantic Web
Technologies:
1. RDF
2. SPARQL
3. Ontology
4. Linked data
5. Turtle (syntax)
Entity Recommendation
Companies:
1. Metaweb
2. Ontoprise GmbH
3. OpenLink Software
4. Ontotext
5. Powerset (company)
Myosin
Proteins and cells:
1. Actin
2. Muscle contraction
3. Sarcomere
4. Myofibril
5. Cytoskeleton
Biologists:
1. Hugh Huxley
2. James Spudich
3. Ronald Vale
4. Manuel Morales
5. Brunó Ferenc Straub
3
Determine the degree of relatedness between two entities
Brad Pitt Tom Cruise
?
Entity Relatedness
4
Person, location, organization
Time, date, money, percent
Event, movie, disease, symptom, side effect, law, license and more
Background
Entity
• Many such types are covered in Wikipedia
• More than 2K classes in DBpedia
• More than 350k classes in Yago
• Every Wikipedia article is considered to describe an entity
5
Background
Relatedness
[Figure: Motor vehicle, Car, Motorcycle, Automobile, Auto, Car seat and Car window, connected by edges labelled s, h and m; regions labelled Synonyms, Similar, Related, Substitutability]
6
Outline
• Motivation
• Entity Relatedness
• Distributional Semantics for Entity Relatedness (DiSER)
• Evaluation
• Entity Recommendation
• Wikipedia-based Features for Entity Recommendation (WiFER)
• Evaluation
• Text Relatedness
• Non-Orthogonal Explicit Semantic Analysis (NESA)
• Evaluation
• Application and Industry Use Cases
• Conclusion
7
Wikipedia Features for
Entity Recommendation
(WiFER)
Feature
Extraction
Thesis Overview
Distributional Semantics for
Entity Relatedness (DiSER)
Distributional
Representation
Non-Orthogonal Explicit
Semantic Analysis (NESA)
Chapter V
Chapter IV
Chapter VI
8
Motivation Entity Relatedness Entity Recommendation Text Relatedness Application Conclusion
Thesis Overview
Wikipedia Features for
Entity Recommendation
(WiFER)
Feature
Extraction
Distributional Semantics for
Entity Relatedness (DiSER)
Distributional
Representation
Non-Orthogonal Explicit
Semantic Analysis (NESA)
Chapter IV
9
Entity Relatedness
10
Entity Relatedness: State of the Art
• Graph-based methods
• Path distance in Wikipedia graph (Strube and Ponzetto, 2006)
• Normalized Google Distance on Wikipedia graph (Witten and Milne, 2008)
• Personalized PageRank on Wikipedia graph (Agirre et al., 2015)
• Path-based measures on DBpedia graph (Hulpus et al., 2015)
• Corpus-based methods
• Key-phrase Overlap for Related Entities (KORE): partial overlaps between key-phrases in corresponding Wikipedia articles (Hoffart et al., 2012)
• Text relatedness measures: use co-occurrence information in text
11
Explicit Semantic Analysis (ESA)
Uses explicit (manually defined) concepts such as Wikipedia articles, where every article
is considered to describe a single concept (Gabrilovich and Markovitch, 2007)
Entity Relatedness: State of the Art
Distributional Semantics
        doc1   doc2   doc3   doc4   ...   docn
word1   W11    W12    W13    W14    ...   W1n
word2   W21    W22    W23    W24    ...   W2n
word3   W31    W32    W33    W34    ...   W3n
...
wordm   Wm1    Wm2    Wm3    Wm4    ...   Wmn
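The word-by-document matrix above can be sketched in a few lines. This is a minimal illustration, not the thesis implementation: the toy corpus, the plain TF-IDF weighting and the names `build_word_vectors` and `cosine` are assumptions, since the slides do not state the weighting scheme.

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for Wikipedia: article title -> text.
docs = {
    "Apple Inc.": "apple computer iphone jobs",
    "Steve Jobs": "jobs apple pixar computer",
    "Fruit": "apple banana orange",
}

def build_word_vectors(docs):
    """Return {word: {doc_title: tf-idf weight}}, i.e. one matrix row per word."""
    tf = {title: Counter(text.split()) for title, text in docs.items()}
    # Document frequency: in how many articles each word occurs.
    df = Counter(word for counts in tf.values() for word in counts)
    n_docs = len(docs)
    vectors = {}
    for title, counts in tf.items():
        for word, count in counts.items():
            weight = count * math.log(n_docs / df[word])
            vectors.setdefault(word, {})[title] = weight
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

word_vectors = build_word_vectors(docs)
print(cosine(word_vectors["jobs"], word_vectors["computer"]))
print(cosine(word_vectors["jobs"], word_vectors["banana"]))
```

Words that occur in the same articles get similar rows, which is the intuition behind ESA-style relatedness.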
12
Entity Relatedness: State of the Art
Distributional Semantics
        doc1   doc2   doc3   doc4   ...   docn
word1   W11    W12    W13    W14    ...   W1n
word2   W21    W22    W23    W24    ...   W2n
word3   W31    W32    W33    W34    ...   W3n
...
wordm   Wm1    Wm2    Wm3    Wm4    ...   Wmn
Implicit/Latent Semantic Analysis (LSA)
Transforms sparse document space into a dense latent topic space
Latent Dirichlet Allocation (LDA) (Blei et al., 2003)
Latent Semantic Analysis (LSA) (Deerwester et al., 1990)
Neural Embeddings (Word2Vec) (Mikolov et al., 2013)
Dimensionality Reduction (n ~ 1M to k < 1000)
        topic1   topic2   ...   topick
word1   W11      W12      ...   W1k
word2   W21      W22      ...   W2k
...
wordm   Wm1      Wm2      ...   Wmk
13
Limitation of Text Relatedness Measures
• Compositionality
• Most of the entities are multiword expressions
• Vector(Brad Pitt) = Vector(Brad) + Vector(Pitt) ?
• Ambiguity
• Vector of an entity with an ambiguous name, such as “Nice” (French city)
14
Chapter IV
Distributional Semantics for Entity
Relatedness (DiSER)
entity1 W11 W12 W13 W14 …....... W1n
entity2 W21 W22 W23 W24 …....... W2n
entity3 W31 W32 W33 W34 …........ W3n
entityn Wn1 Wn2 Wn3 Wn4 …... Wnn
...
doc1 doc2 doc3 doc4 ….... docn
Wikipedia-based Distributional Semantics for Entity Relatedness
In: AAAI-FSS-2014
[Steve Jobs] co-founded Apple in 1976 to sell
Wozniak’s [Apple I] [Personal Computer]. [Steve
Jobs | Jobs] was CEO of [Apple Inc. | Apple] and
largest shareholder of [Pixar]. Jobs is widely
recognized as a pioneer of the [Microcomputer
Revolution], along with [Steve Wozniak | Wozniak].
Annotated Wikipedia with entities
One sense per document
Wikipedia entities
[Steve Jobs] co-founded [Apple Inc. | Apple] in 1976 to sell
[Steve Wozniak | Wozniak]’s [Apple I] [Personal Computer]. [Steve
Jobs | Jobs] was CEO of [Apple Inc. | Apple] and
largest shareholder of [Pixar]. [Steve Jobs | Jobs] is
widely recognized as a pioneer of the [Microcomputer
Revolution], along with [Steve Wozniak | Wozniak].
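The entity-by-document matrix can be sketched from text annotated in the bracket notation shown on this slide. This is a hedged illustration only: the three toy articles, the TF-IDF weighting and the function names (`entity_counts`, `diser_vectors`, `cosine`) are assumptions made for the example, not details confirmed by the slides.

```python
import math
import re
from collections import Counter

# Matches "[Target]" or "[Target | anchor text]"; group 1 is the target entity.
ANNOTATION = re.compile(r"\[([^\]|]+)(?:\|[^\]]*)?\]")

def entity_counts(annotated_text):
    """Count how often each linked entity occurs in one annotated article."""
    return Counter(m.group(1).strip() for m in ANNOTATION.finditer(annotated_text))

def diser_vectors(articles):
    """Return {entity: {article_title: tf-idf weight}} over entity mentions only."""
    counts = {title: entity_counts(text) for title, text in articles.items()}
    df = Counter(entity for c in counts.values() for entity in c)
    n_docs = len(articles)
    vectors = {}
    for title, c in counts.items():
        for entity, tf in c.items():
            vectors.setdefault(entity, {})[title] = tf * math.log(n_docs / df[entity])
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical toy articles in the slide's annotation style.
articles = {
    "Steve Jobs": "[Steve Jobs] co-founded [Apple Inc. | Apple] and was the largest shareholder of [Pixar].",
    "Apple I": "The [Apple I] was built by [Steve Wozniak | Wozniak] and sold by [Apple Inc. | Apple].",
    "Pixar": "[Pixar] was funded by [Steve Jobs | Jobs].",
}

vectors = diser_vectors(articles)
print(cosine(vectors["Steve Jobs"], vectors["Apple Inc."]))
```

Because only linked entities are counted, the multiword name "Steve Jobs" is a single dimensionless unit and the surface form "Jobs" is already resolved to its entity, which addresses the compositionality and ambiguity limitations listed earlier.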
15
Brad Pitt (DiSER)
The Tree of Life (film)
Falmouth, Cornwall
World War Z (film)
What Just Happened
A Mighty Heart (film)
Plan B Entertainment
Jamaican Patois
Richard: A Novel
Sobriquet
I Want a Famous Face
Brad Pitt (ESA)
Damiani (jewelry company)
University of Pittsburgh Band
Brad Pitt
Make It Right Foundation
Pittsburgh men’s basketball
Brangelina
Pittsburgh Panthers baseball
Pitt (Comics)
Pitt River
Brad Pitt filmography
Wikipedia-based Distributional Semantics for Entity Relatedness
In: AAAI-FSS-2014
ESA vs DiSER Vector
Chapter IV
16
Entity Relatedness: Evaluation
17
• Absolute relatedness score
• Relatedness between “Apple Inc.” and “Steve Jobs”
• Very low inter-annotator agreement
• Relative relatedness score
• Is “Steve Jobs” more related to “Apple Inc.” than to “Bill Gates”?
• High inter-annotator agreement
• KORE (Hoffart et al., 2012)
• 21 seed entities
• Every seed entity has a list of 20 entities with their relatedness scores
• 420 entity pairs in total
Entity Relatedness: Dataset
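Results on this dataset are reported as Spearman rank correlation. A minimal standard-library sketch of that metric follows, with ties given their average rank; the function names and the example scores are illustrative assumptions.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0

# Hypothetical gold scores and system scores for four candidate entities.
gold = [1, 2, 3, 4]
system = [1, 3, 2, 4]
print(spearman(gold, system))
```

Only the ordering of the candidates matters here, which matches the relative relatedness judgements the dataset collects.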
18
Approaches                                  Spearman Rank Correlation
Graph-based measures
  Path-DBpedia (Hulpus et al., 2015)        0.610
  WLM (Witten and Milne, 2008)              0.659
  PPR (Agirre et al., 2015)                 0.662
Corpus-based measures
  Word2Vec (Mikolov et al., 2013)           0.181
  GloVe (Pennington et al., 2014)           0.194
  LSA (Landauer et al., 1998)               0.375
  KORE (Hoffart et al., 2012)               0.679
  ESA (Gabrilovich and Markovitch, 2007)    0.691
DiSER                                       0.781
Wikipedia-based Distributional Semantics for Entity Relatedness
In: AAAI-FSS-2014
Results: KORE Dataset
19
DiSER Vector for non-Wikipedia Entities
20
BBC: http://www.bbc.com/news/world-europe-22204377
Article about Savita
Context-DiSER
Noun phrase extraction: StanfordNLP
Entity linking: Prior probability
21
Abortion
Abortion-rights movement
The Irish Times
United States pro-life movement
Vincent Browne
Michael D. Higgins
Context-DiSER
Irish abortion law
Death of Savita
Galway University Hospital
Miscarriage
Catholic Country
…
Savita Halappanavar
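One way to read the Context-DiSER pipeline is sketched below: noun phrases from the context are linked to Wikipedia entities by prior probability, and the DiSER vectors of the linked entities are combined into a vector for the new entity. The slides do not say how the vectors are combined, so the plain sum used here is an assumption, as are the anchor statistics, the toy vectors and the function names.

```python
# Hypothetical anchor statistics: surface form -> {entity: number of links using that form}.
anchor_counts = {
    "Galway": {"Galway": 90, "County Galway": 10},
    "abortion": {"Abortion": 100},
}

# Hypothetical DiSER vectors: entity -> {document: weight}.
diser_vectors = {
    "Galway": {"d1": 1.0, "d2": 0.5},
    "Abortion": {"d2": 2.0},
}

def link_by_prior(surface_form, anchor_counts):
    """Pick the entity that the surface form most often links to, or None if unseen."""
    candidates = anchor_counts.get(surface_form)
    if not candidates:
        return None
    return max(candidates, key=candidates.get)

def context_vector(noun_phrases, anchor_counts, diser_vectors):
    """Sum the DiSER vectors of all entities linked from the context (assumed combination)."""
    combined = {}
    for phrase in noun_phrases:
        entity = link_by_prior(phrase, anchor_counts)
        for doc, weight in diser_vectors.get(entity, {}).items():
            combined[doc] = combined.get(doc, 0.0) + weight
    return combined

phrases = ["Galway", "abortion", "unknown phrase"]
print(context_vector(phrases, anchor_counts, diser_vectors))
```

Phrases that cannot be linked simply contribute nothing, so the quality of the resulting vector depends on the linking step, which is consistent with the gap between manual and automatic linking in the results.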
22
Approaches                           Spearman Rank Correlation
KORE (state of the art)              0.679
Context-ESA                          0.684
Context-DiSER (Manual linking)       0.769
Context-DiSER (Automatic linking)    0.719
Wikipedia-based Distributional Semantics for Entity Relatedness
In: AAAI-FSS-2014
Context-DiSER: Results on KORE Dataset
23
Thesis Overview
Wikipedia Features for
Entity Recommendation
(WiFER)
Feature
Extraction
Distributional Semantics for
Entity Relatedness (DiSER)
Distributional
Representation
Non-Orthogonal Explicit
Semantic Analysis (NESA)
Chapter V
Chapter IV
24
Entity Recommendation
25
• Classical Recommendation Systems
• Focus on personalized recommendation
• Require user-item preferences
• Entity Recommendation in Web Search (Blanco et al., 2013)
• Co-occurrence features: query logs, query sessions, Flickr tags, tweets
• Graph-based features: shared connections in the Yahoo knowledge graph and other domain-specific knowledge bases
• Entity and relation types in the knowledge graph
• More than 100 features
• Combines features using learning to rank
Entity Recommendation: State of the Art
26
Features:
Prior Probability of Entity1
Prior Probability of Entity2
Joint Probability
Conditional Probability
Reverse Conditional Probability
Cosine Similarity
Pointwise Mutual Information
Distributional Semantic Model
Learning to Rank
Leveraging Wikipedia Knowledge for Entity Recommendations
In: ISWC 2015
Wikipedia-based Features for Entity
Recommendation (WiFER)
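The co-occurrence features listed on this slide can be illustrated with document-level counts. The slides do not give the exact definitions, so the estimates below (probabilities as document fractions, cosine over binary occurrence vectors, natural-log PMI) are assumptions, and the toy documents and the name `cooccurrence_features` are hypothetical.

```python
import math

def cooccurrence_features(e1, e2, docs):
    """Compute pairwise features from a list of documents, each a set of entities."""
    n = len(docs)
    n1 = sum(1 for d in docs if e1 in d)
    n2 = sum(1 for d in docs if e2 in d)
    n12 = sum(1 for d in docs if e1 in d and e2 in d)
    p1, p2, p12 = n1 / n, n2 / n, n12 / n
    return {
        "prior_e1": p1,
        "prior_e2": p2,
        "joint": p12,
        "conditional": p12 / p1 if p1 else 0.0,           # P(e2 | e1)
        "reverse_conditional": p12 / p2 if p2 else 0.0,   # P(e1 | e2)
        "cosine": n12 / math.sqrt(n1 * n2) if n1 and n2 else 0.0,
        "pmi": math.log(p12 / (p1 * p2)) if p12 else float("-inf"),
    }

# Hypothetical documents, each reduced to the set of entities it mentions.
docs = [
    {"Steve Jobs", "Apple Inc."},
    {"Steve Jobs"},
    {"Apple Inc.", "Pixar"},
    {"Steve Jobs", "Apple Inc."},
]

features = cooccurrence_features("Steve Jobs", "Apple Inc.", docs)
for name, value in features.items():
    print(name, round(value, 4))
```

Running the same function over the text corpus and over the entity-annotated corpus would give the two parallel feature sets that a learning-to-rank model then combines.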
27
Wikipedia Text
Prior Probability of Entity1
Prior Probability of Entity2
Joint Probability
Conditional Probability
Cosine Similarity
Pointwise Mutual Information
Reverse Conditional Probability
Distributional Semantic Model (ESA)
Wikipedia Entities
Prior Probability of Entity1
Prior Probability of Entity2
Joint Probability
Conditional Probability
Cosine Similarity
Pointwise Mutual Information
Reverse Conditional Probability
Distributional Semantic Model (DiSER)
Wikipedia-based Features for Entity
Recommendation (WiFER)
28
• Learning to Rank
• Gradient Boosted Decision Trees (GBDT) (Li Hang, 2011)
• It builds the model in a stage-wise fashion
• Dataset: Entity recommendation in web search
• 4,797 web search queries (entities)
• Every entity query has a list of entity candidates (47,623 entity-pairs)
• All candidates are labelled on a 5-level scale: Excellent, Prefer, Good, Fair, and Bad
Combining Features
Type Total instances Percentage
Location 22,062 46.32
People 21,626 45.41
Movies 3,031 6.36
TV Shows 280 0.58
Album 563 1.18
Total 47,623 100
29
• Evaluation
• Normalized discounted cumulative gain (NDCG@10)
• 10 fold cross validation
Features                       All      Person   Location
Spark (Blanco et al., 2013)    0.9276   0.9479   0.8882
WiFER                          0.9173   0.9431   0.8795
Spark+WiFER                    0.9325   0.9505   0.8987
Insights into Entity Recommendation in Web Search
In: IESD at ISWC, 2015
Entity Recommendation: Results
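The NDCG@10 metric used for these results can be sketched as follows. The numeric gain assigned to each of the five labels and the exponential gain formula are assumptions, since the slides do not state them; `GAIN`, `dcg` and `ndcg` are illustrative names.

```python
import math

# Assumed mapping from the five editorial labels to numeric gains.
GAIN = {"Excellent": 4, "Prefer": 3, "Good": 2, "Fair": 1, "Bad": 0}

def dcg(gains, k):
    """Discounted cumulative gain over the top-k positions (exponential gain variant)."""
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg(ranked_labels, k=10):
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    gains = [GAIN[label] for label in ranked_labels]
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal else 0.0

# Hypothetical ranking of candidates for one query entity, best guess first.
ranking = ["Good", "Excellent", "Bad", "Fair"]
print(ndcg(ranking))
```

A ranking that places the best-labelled candidates first scores 1.0, and the score drops as highly rated candidates are pushed down the list.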
30
Insights into Entity Recommendation in Web Search
In: IESD at ISWC, 2015
Entity Recommendation: Feature Analysis in Spark+WiFER
Relation type
Cosine similarity over Flickr tags
Probability of target entity over Wikipedia text corpus
CF7 over Flickr tags
DSM over Wikipedia entities corpus (DiSER)
Conditional user probability over query terms
DSM over Wikipedia text corpus (ESA)
Probability of source entity over Wikipedia entities corpus
Probability of target entity over Flickr tags
Probability of target entity over Wikipedia entities corpus
31
Thesis Overview
Wikipedia Features for
Entity Recommendation
(WiFER)
Feature
Extraction
Distributional Semantics for
Entity Relatedness (DiSER)
Distributional
Representation
Non-Orthogonal Explicit
Semantic Analysis (NESA)
Chapter V
Chapter IV
Chapter VI
32
Text Relatedness:
Non-Orthogonal Explicit Semantic Analysis (NESA)
33
ESA assumes that related words share highly
weighted concepts in their distributional vectors
Chapter VI
Improving ESA with Document Similarity
In: ECIR-2013
“soccer”
History of Soccer in the United States
Soccer in the United States
United States Soccer Federation
North American Soccer League
United Soccer Leagues
“football”
FIFA
Football
History of association football
Football in England
Association football
ESA(football, soccer) = 0.0
Orthogonality in ESA
34
Chapter VI
Improving ESA with Document Similarity
In: ECIR-2013
“soccer”
History of Soccer in the United States
Soccer in the United States
United States Soccer Federation
North American Soccer League
United Soccer Leagues
“football”
FIFA
Football
History of association football
Football in England
Association football
NESA(football, soccer) = (FIFA × Soccer in the United States +
FIFA × United Soccer Leagues + …) = 0.38
Non-Orthogonal Explicit Semantic Analysis
(NESA)
35
• ESA: $v_1$ and $v_2$ are the n-dimensional vectors for words $w_1$ and $w_2$
• $\mathrm{rel}_{ESA}(w_1, w_2) = v_1^{T} \cdot v_2$
• NESA: Correlation between vector dimensions
• $\mathrm{rel}_{NESA}(w_1, w_2) = v_1^{T} \cdot C \cdot v_2$
• $C_{n,n} = E^{T} \cdot E$
• Dimension correlation methods
• DiSER scores between corresponding Wikipedia articles
Non-Orthogonal Explicit Semantic Analysis
(NESA)
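The difference between the two scores can be shown on the soccer/football example with sparse vectors. This is a sketch under stated assumptions: the concept weights and the correlation values are invented for illustration (they are not the DiSER scores from the thesis), no normalization is applied, and `esa`, `nesa` and `concept_corr` are hypothetical names.

```python
# Hypothetical ESA vectors: concept (Wikipedia article) -> weight.
soccer = {"Soccer in the United States": 1.0, "United Soccer Leagues": 0.5}
football = {"FIFA": 1.0, "Association football": 0.8}

# Hypothetical correlations between distinct concepts, standing in for matrix C.
correlations = {
    ("FIFA", "Soccer in the United States"): 0.3,
    ("FIFA", "United Soccer Leagues"): 0.2,
    ("Association football", "Soccer in the United States"): 0.4,
}

def concept_corr(a, b):
    """Entry C[a][b]: 1.0 on the diagonal, symmetric lookup elsewhere, 0.0 if unknown."""
    if a == b:
        return 1.0
    return correlations.get((a, b), correlations.get((b, a), 0.0))

def esa(v1, v2):
    """Plain dot product: only concepts shared by both vectors contribute."""
    return sum(v1[c] * v2[c] for c in v1.keys() & v2.keys())

def nesa(v1, v2):
    """v1^T . C . v2: every pair of concepts contributes, weighted by its correlation."""
    return sum(w1 * concept_corr(c1, c2) * w2
               for c1, w1 in v1.items()
               for c2, w2 in v2.items())

print(esa(football, soccer))
print(nesa(football, soccer))
```

With an identity matrix for C the NESA score reduces to the ESA dot product, so the non-orthogonal measure only changes the result when distinct concepts are themselves related.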
36
• WN353
• 353 word pairs annotated by 13-15 experts on a scale of 1-10
• RG65
• 65 word pairs annotated by 51 experts on a scale of 0-4
• MC30
• 30 word pairs annotated by 38 experts on a scale of 0-1
• MT287
• 287 word pairs annotated by 10-12 experts on a scale of 0-1
Word Relatedness Datasets
37
Non-Orthogonal Explicit Semantic Analysis
In: *SEM-2015
Chapter VI
              WN353   MC30    RG65    MT287
LSA           0.579   0.667   0.616   0.555
LSA (Wiki)    0.538   0.744   0.697   0.353
Word2Vec      0.663   0.824   0.751   0.560
ESA           0.66    0.765   0.826   0.507
NESA          0.696   0.784   0.839   0.572
Spearman rank correlation with word similarity gold standard datasets
NESA: Results
38
Non-Orthogonal Explicit Semantic Analysis
In: *SEM-2015
Chapter VI
NESA: Results
• Word similarity vs relatedness (Agirre et al., 2009)
• WN353Rel: 202 word pairs from WN353
• WN353Sim: 252 word pairs from WN353
Spearman rank correlation with word similarity vs relatedness datasets
              WN353Rel   WN353Sim
LSA           0.521      0.662
LSA (Wiki)    0.506      0.559
Word2Vec      0.601      0.741
ESA           0.643      0.663
NESA          0.663      0.719
39
40
Chapter VII
http://enrg.insight-centre.org/
41
42
EnRG SPARQL Endpoint
National University of Ireland, Galway
Industrial Use Cases
Medical entity linking for question-answering and relationship explanation in Knowledge Graph
Entity Recommendation in Web Search
Company name disambiguation for social profiling
43
• Entity Relatedness
• Distributional Semantics for Entity Relatedness (DiSER)
• Outperformed state-of-the-art entity relatedness measures
• Entity Recommendation
• Wikipedia-based Features for Entity Recommendation (WiFER)
• Effective features for entity recommendation in web search
• Text Relatedness
• Non-Orthogonal Explicit Semantic Analysis (NESA)
• Outperformed other existing word relatedness measures
• Entity Relatedness Graph (EnRG)
• Contains all Wikipedia entities and their pre-computed relatedness scores
• Contains distributional vectors for all Wikipedia entities
Conclusion
45
• Relationship explanation for recommended entities
• Best path in knowledge graph
• Best natural language description
• Knowledge discovery
• Analogy querying over knowledge graph
e.g. Google to Motorola => Microsoft to ?
• Example based querying
e.g. Google to Motorola => ? to ?
Future Research Directions
46
Related Queries?

CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night StandCall Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men 🔝Sambalpur🔝 Esc...
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men  🔝Sambalpur🔝   Esc...➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men  🔝Sambalpur🔝   Esc...
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men 🔝Sambalpur🔝 Esc...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
 
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 

Leveraging Wikipedia-based Features for Entity Relatedness and Recommendations

  • 1. Leveraging Wikipedia-based Features for Entity Relatedness and Recommendations Nitish Aggarwal Supervised by Dr. Paul Buitelaar PhD Viva
  • 3. Motivation: Entity Recommendation
      • Semantic Web
          • Technologies: 1. RDF 2. SPARQL 3. Ontology 4. Linked data 5. Turtle (syntax)
          • Companies: 1. Metaweb 2. Ontoprise GmbH 3. OpenLink Software 4. Ontotext 5. Powerset (company)
      • Myosin
          • Proteins and cells: 1. Actin 2. Muscle contraction 3. Sarcomere 4. Myofibril 5. Cytoskeleton
          • Biologists: 1. Hugh Huxley 2. James Spudich 3. Ronald Vale 4. Manuel Morales 5. Brunó Ferenc Straub
  • 4. Entity Relatedness: determine the degree of relatedness between two entities (Brad Pitt, Tom Cruise: ?)
  • 5. Background: Entity
      • Entity types: person, location, organization; time, date, money, percent; event, movie, disease, symptom, side effect, law, license and more
      • Many such types are covered in Wikipedia
      • More than 2K classes in DBpedia
      • More than 350k classes in Yago
      • Every Wikipedia article is considered to be about an entity
  • 6. Background: Relatedness
      • [Figure: graph over Motor vehicle, Car, Motorcycle, Automobile, Auto, Car seat and Car window, with edges labelled s, h and m; legend: Synonyms, Similar, Related, Substitutability]
  • 7. Outline
      • Motivation
      • Entity Relatedness
          • Distributional Semantics for Entity Relatedness (DiSER)
          • Evaluation
      • Entity Recommendation
          • Wikipedia-based Features for Entity Recommendation (WiFER)
          • Evaluation
      • Text Relatedness
          • Non-Orthogonal Explicit Semantic Analysis (NESA)
          • Evaluation
      • Application and Industry Use Cases
      • Conclusion
  • 8. Thesis Overview
      • Distributional Semantics for Entity Relatedness (DiSER): Chapter IV
      • Wikipedia Features for Entity Recommendation (WiFER): Chapter V
      • Non-Orthogonal Explicit Semantic Analysis (NESA): Chapter VI
      • Diagram labels: Feature Extraction, Distributional Representation
  • 9. Motivation Entity Relatedness Entity Recommendation Text Relatedness Application Conclusion [section navigation bar]. Thesis Overview: diagram as on slide 8, with only Chapter IV labelled.
  • 10. Entity Relatedness
  • 11. Entity Relatedness: State of the Art
      • Graph-based methods
          • Path distance in Wikipedia graph (Strube and Ponzetto, 2006)
          • Normalized Google Distance on Wikipedia graph (Witten and Milne, 2008)
          • Personalized PageRank on Wikipedia graph (Agirre et al., 2015)
          • Path-based measures on DBpedia graph (Hulpus et al., 2015)
      • Corpus-based methods
          • Key-phrase Overlap for Related Entities (KORE): partial overlaps between key-phrases in corresponding Wikipedia articles (Hoffart et al., 2012)
          • Text relatedness measures: use colocation information in text
  • 12. Entity Relatedness: State of the Art (Distributional Semantics)
      • Explicit Semantic Analysis (ESA): uses explicit (manually defined) concepts such as Wikipedia articles, where every article is considered to describe a single concept (Gabrilovich and Markovitch, 2007)
      • [Figure: word-document matrix with rows word1 ... wordm, columns doc1 ... docn, and weights W11 ... Wmn]
  • 13. Entity Relatedness: State of the Art (Distributional Semantics)
      • Implicit/Latent Semantic Analysis: transforms the sparse document space into a dense latent topic space
          • Latent Dirichlet Allocation (LDA) (Blei et al., 2003)
          • Latent Semantic Analysis (LSA) (Deerwester et al., 1990)
          • Neural Embeddings (Word2Vec) (Mikolov et al., 2013)
      • [Figure: dimensionality reduction from the word-document matrix (rows word1 ... wordm, columns doc1 ... docn, n ~ 1M) to a word-topic matrix (columns topic1 ... topick, k < 1000)]
  • 14. Limitation of Text Relatedness Measures
      • Compositionality
          • Most entities are multiword expressions
          • Vector(Brad Pitt) = Vector(Brad) + Vector(Pitt) ?
      • Ambiguity
          • Vector of an entity with an ambiguous name like “Nice” (French city)
  • 15. Chapter IV: Distributional Semantics for Entity Relatedness (DiSER). Wikipedia-based Distributional Semantics for Entity Relatedness. In: AAAI-FSS-2014
      • [Figure: entity-document matrix with rows entity1 ... entityn, columns doc1 ... docn, and weights W11 ... Wnn]
      • Example text: [Steve Jobs] co-founded Apple in 1976 to sell Wozniak’s [Apple I] [Personal Computer]. [Steve Jobs | Jobs] was CEO of [Apple Inc. | Apple] and largest shareholder of [Pixar]. Jobs is widely recognized as a pioneer of the [Microcomputer Revolution], along with [Steve Wozniak | Wozniak].
      • Annotated Wikipedia with entities (one sense per document; Wikipedia entities): [Steve Jobs] co-founded [Apple Inc. | Apple] in 1976 to sell [Steve Wozniak | Wozniak]’s [Apple I] [Personal Computer]. [Steve Jobs | Jobs] was CEO of [Apple Inc. | Apple] and largest shareholder of [Pixar]. [Steve Jobs | Jobs] is widely recognized as a pioneer of the [Microcomputer Revolution], along with [Steve Wozniak | Wozniak].
  • 16. ESA vs DiSER Vector (Chapter IV). Wikipedia-based Distributional Semantics for Entity Relatedness. In: AAAI-FSS-2014
      • Brad Pitt (DiSER): The Tree of Life (film); Falmouth, Cornwall; World War Z (film); What Just Happened; A Mighty Heart (film); Plan B Entertainment; Jamaican Patois; Richard: A Novel; Sobriquet; I Want a Famous Face
      • Brad Pitt (ESA): Damiani (jewelry company); University of Pittsburgh Band; Brad Pitt; Make It Right Foundation; Pittsburgh men’s basketball; Brangelina; Pittsburgh Panthers baseball; Pitt (Comics); Pitt River; Brad Pitt filmography
  • 17. Entity Relatedness: Evaluation
  • 18. Entity Relatedness: Dataset
      • Absolute relatedness score
          • Relatedness between “Apple Inc.” and “Steve Jobs”
          • Very low inter-annotator agreement
      • Relative relatedness score
          • Is “Steve Jobs” more related to “Apple Inc.” than “Bill Gates”?
          • High inter-annotator agreement
      • KORE (Hoffart et al., 2012)
          • 21 seed entities
          • Every entity has a list of 20 entities with their relatedness scores
          • 420 entity pairs in total
  • 19. Results: KORE Dataset. Wikipedia-based Distributional Semantics for Entity Relatedness. In: AAAI-FSS-2014
      • Approaches and their Spearman rank correlation
      • Graph-based measures
          • Path-DBpedia (Hulpus et al., 2015): 0.610
          • WLM (Witten and Milne, 2008): 0.659
          • PPR (Agirre et al., 2015): 0.662
      • Corpus-based measures
          • Word2Vec (Mikolov et al., 2013): 0.181
          • GloVe (Pennington et al., 2014): 0.194
          • LSA (Landauer et al., 1998): 0.375
          • KORE (Hoffart et al., 2012): 0.679
          • ESA (Gabrilovich and Markovitch, 2007): 0.691
          • DiSER: 0.781
  • 20. DiSER Vector for non-Wikipedia Entities
  • 21. Context-DiSER
      • Input: article about Savita (BBC: http://www.bbc.com/news/world-europe-22204377)
      • Noun phrase extraction: StanfordNLP
      • Entity linking: prior probability
  • 22. Context-DiSER: Savita Halappanavar
      • [Figure: two groups of items shown around “Savita Halappanavar”]
      • Group 1: Irish abortion law, Death of Savita, Galway University Hospital, Miscarriage, Catholic Country, …
      • Group 2: Abortion, Abortion-rights movement, The Irish Times, United States pro-life movement, Vincent Browne, Michael D. Higgins
  • 23. Context-DiSER: Results on KORE Dataset. Wikipedia-based Distributional Semantics for Entity Relatedness. In: AAAI-FSS-2014
      • Approaches and their Spearman rank correlation
          • KORE (state of the art): 0.679
          • Context-ESA: 0.684
          • Context-DiSER (Manual linking): 0.769
          • Context-DiSER (Automatic linking): 0.719
  • 24. Thesis Overview: diagram as on slide 8, with Chapter IV and Chapter V labelled.
  • 25. Entity Recommendation
  • 26. Entity Recommendation: State of the Art
      • Classical Recommendation Systems
          • Focus on personalized recommendation
          • Require user-item preferences
      • Entity Recommendation in Web Search (Blanco et al., 2013)
          • Co-occurrence features: query logs, query sessions, Flickr tags, tweets
          • Graph-based features: shared connections in the Yahoo knowledge graph and other domain-specific knowledge bases
          • Entity and relation type in the knowledge graph
          • More than 100 features
          • Combines features using learning to rank
  • 27. Wikipedia-based Features for Entity Recommendation (WiFER). Leveraging Wikipedia Knowledge for Entity Recommendations. In: ISWC 2015
      • Features
          • Prior Probability of Entity1
          • Prior Probability of Entity2
          • Joint Probability
          • Conditional Probability
          • Reverse Conditional Probability
          • Cosine Similarity
          • Pointwise Mutual Information
          • Distributional Semantic Model
      • Learning to Rank
  • 28. Wikipedia-based Features for Entity Recommendation (WiFER)
      • The same feature set is computed over two corpora:
          • Wikipedia Text: Prior Probability of Entity1, Prior Probability of Entity2, Joint Probability, Conditional Probability, Reverse Conditional Probability, Cosine Similarity, Pointwise Mutual Information, Distributional Semantic Model (ESA)
          • Wikipedia Entities: Prior Probability of Entity1, Prior Probability of Entity2, Joint Probability, Conditional Probability, Reverse Conditional Probability, Cosine Similarity, Pointwise Mutual Information, Distributional Semantic Model (DiSER)
  • 29. Combining Features
      • Learning to Rank
          • Gradient Boosted Decision Trees (GBDT) (Li Hang, 2011)
          • It builds the model in a stage-wise fashion
      • Dataset: Entity recommendation in web search
          • 4,797 web search queries (entities)
          • Every entity query has a list of entity candidates (47,623 entity pairs)
          • All candidates are labelled on a 5-level scale: Excellent, Prefer, Good, Fair, and Bad
      • Instances by type (total instances, percentage)
          • Location: 22,062 (46.32)
          • People: 21,626 (45.41)
          • Movies: 3,031 (6.36)
          • TV Shows: 280 (0.58)
          • Album: 563 (1.18)
          • Total: 47,623 (100)
  • 30. Entity Recommendation: Results. Insights into Entity Recommendation in Web Search. In: IESD at ISWC, 2015
      • Evaluation
          • Normalized discounted cumulative gain (NDCG@10)
          • 10-fold cross-validation
      • Results by feature set (All / Person / Location)
          • Spark (Blanco et al., 2013): 0.9276 / 0.9479 / 0.8882
          • WiFER: 0.9173 / 0.9431 / 0.8795
          • Spark+WiFER: 0.9325 / 0.9505 / 0.8987
  • 31. Entity Recommendation: Feature Analysis in Spark+WiFER. Insights into Entity Recommendation in Web Search. In: IESD at ISWC, 2015
      • Relation type
      • Cosine similarity over Flickr tags
      • Probability of target entity over Wikipedia text corpus
      • CF7 over Flickr tags
      • DSM over Wikipedia entities corpus (DiSER)
      • Conditional user probability over query terms
      • DSM over Wikipedia text corpus (ESA)
      • Probability of source entity over Wikipedia entities corpus
      • Probability of target entity over Flickr tags
      • Probability of target entity over Wikipedia entities corpus
  • 32. Thesis Overview: diagram as on slide 8, with Chapters IV, V and VI labelled.
  • 33. Text Relatedness: Non-Orthogonal Explicit Semantic Analysis (NESA)
  • 34. Orthogonality in ESA (Chapter VI). Improving ESA with Document Similarity. In: ECIR-2013
      • ESA assumes that related words share highly weighted concepts in their distributional vectors
      • “soccer”: History of Soccer in the United States; Soccer in the United States; United States Soccer Federation; North American Soccer League; United Soccer Leagues
      • “football”: FIFA; Football; History of association football; Football in England; Association football
      • ESA(football, soccer) = 0.0
  • 35. Non-Orthogonal Explicit Semantic Analysis (NESA) (Chapter VI). Improving ESA with Document Similarity. In: ECIR-2013
      • “soccer”: History of Soccer in the United States; Soccer in the United States; United States Soccer Federation; North American Soccer League; United Soccer Leagues
      • “football”: FIFA; Football; History of association football; Football in England; Association football
      • NESA(football, soccer) = (FIFA × Soccer in the United States + FIFA × United Soccer Leagues + …) = 0.38
  • 36. Non-Orthogonal Explicit Semantic Analysis (NESA)
      • ESA: $v_1$ and $v_2$ are the $n$-dimensional vectors for words $w_1$ and $w_2$
          • $\mathrm{rel}_{\mathrm{ESA}}(w_1, w_2) = v_1^{T} \cdot v_2$
      • NESA: correlation between vector dimensions
          • $\mathrm{rel}_{\mathrm{NESA}}(w_1, w_2) = v_1^{T} \cdot C \cdot v_2$
          • $C_{n \times n} = E^{T} \cdot E$
      • Dimension correlation methods
          • DiSER scores between corresponding Wikipedia articles
  • 37. Word Relatedness Datasets
      • WN353: 353 word pairs annotated by 13-15 experts on a scale of 1-10
      • RG65: 65 word pairs annotated by 51 experts on a scale of 0-4
      • MC30: 30 word pairs annotated by 38 experts on a scale of 0-1
      • MT287: 287 word pairs annotated by 10-12 experts on a scale of 0-1
  • 38. NESA: Results (Chapter VI). Non-Orthogonal Explicit Semantic Analysis. In: *SEM-2015
      • Spearman rank correlation with word similarity gold standard datasets (WN353 / MC30 / RG65 / MT287)
          • LSA: 0.579 / 0.667 / 0.616 / 0.555
          • LSA (Wiki): 0.538 / 0.744 / 0.697 / 0.353
          • Word2Vec: 0.663 / 0.824 / 0.751 / 0.560
          • ESA: 0.66 / 0.765 / 0.826 / 0.507
          • NESA: 0.696 / 0.784 / 0.839 / 0.572
  • 39. NESA: Results (Chapter VI). Non-Orthogonal Explicit Semantic Analysis. In: *SEM-2015
      • Word similarity vs relatedness (Agirre et al., 2009)
          • WN353Rel: 202 word pairs from WN353
          • WN353Sim: 252 word pairs from WN353
      • Spearman rank correlation with word similarity vs relatedness datasets (WN353Rel / WN353Sim)
          • LSA: 0.521 / 0.662
          • LSA (Wiki): 0.506 / 0.559
          • Word2Vec: 0.601 / 0.741
          • ESA: 0.643 / 0.663
          • NESA: 0.663 / 0.719
  • 40. Outline (same items as slide 7)
  • 41. Chapter VII: http://enrg.insight-centre.org/
  • 42. EnRG SPARQL Endpoint. National University of Ireland, Galway
  • 43. Industrial Use Cases
      • Medical entity linking for question-answering and relationship explanation in Knowledge Graph
      • Entity Recommendation in Web Search
      • Company name disambiguation for social profiling
  • 44. Conclusion
      • Entity Relatedness
          • Distributional Semantics for Entity Relatedness (DiSER)
          • Outperformed state-of-the-art entity relatedness measures
      • Entity Recommendation
          • Wikipedia-based Features for Entity Recommendation (WiFER)
          • Effective features for entity recommendation in web search
      • Text Relatedness
          • Non-Orthogonal Explicit Semantic Analysis (NESA)
          • Outperformed other existing word relatedness measures
      • Entity Relatedness Graph (EnRG)
          • Contains all Wikipedia entities and their pre-computed relatedness scores
          • Contains distributional vectors for all Wikipedia entities
  • 45. Future Research Directions
      • Relationship explanation for recommended entities
          • Best path in knowledge graph
          • Best natural language description
      • Knowledge discovery
          • Analogy querying over knowledge graph, e.g. Google to Motorola => Microsoft to ?
          • Example-based querying, e.g. Google to Motorola => ? to ?
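
The ESA and NESA relatedness formulas on slide 36 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis implementation: the function names, the toy four-concept space and the matrix E are hypothetical, and the slides only state that ESA uses the product of v1 transposed and v2, that NESA inserts a correlation matrix C between them, and that C is E transposed times E. The toy vectors mimic the soccer/football case of slide 34, where the two words share no concept dimension.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def rel_esa(v1: Vector, v2: Vector) -> float:
    """ESA relatedness: plain dot product v1^T v2 (dimensions treated as orthogonal)."""
    return sum(a * b for a, b in zip(v1, v2))


def correlation_matrix(E: Matrix) -> Matrix:
    """C = E^T E, where column i of E is a (hypothetical) vector for concept i."""
    n = len(E[0])
    return [[sum(row[i] * row[j] for row in E) for j in range(n)] for i in range(n)]


def rel_nesa(v1: Vector, C: Matrix, v2: Vector) -> float:
    """NESA relatedness: v1^T C v2, so correlated dimensions also contribute."""
    return sum(v1[i] * C[i][j] * v2[j] for i in range(len(v1)) for j in range(len(v2)))


# Toy concept space (assumed): concepts 0-1 are active for "football", 2-3 for "soccer".
football = [1.0, 1.0, 0.0, 0.0]
soccer = [0.0, 0.0, 1.0, 1.0]

# Hypothetical concept vectors as columns of E; concepts 0 and 2 are identical here.
E = [
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
C = correlation_matrix(E)

esa_score = rel_esa(football, soccer)       # no shared dimensions, so the score is zero
nesa_score = rel_nesa(football, C, soccer)  # picks up the correlation C[0][2]
```

With an identity matrix for C, the NESA score reduces to the ESA score, which is one way to read the "orthogonality" assumption that NESA relaxes. Slide 36 names DiSER scores between the corresponding Wikipedia articles as a way to obtain the dimension correlations; the matrix E above merely stands in for that.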

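The co-occurrence features listed for WiFER on slide 27 (prior, joint, conditional and reverse conditional probability, cosine similarity, pointwise mutual information) could be computed roughly as follows. The slides do not give the exact estimators, so this sketch assumes that each probability is estimated from the set of documents in which an entity occurs; the function name, the document ids and the corpus size are hypothetical, and the direction of "conditional" versus "reverse conditional" is an assumption as well.

```python
import math
from typing import Dict, Set


def cooccurrence_features(docs_e1: Set[int], docs_e2: Set[int], n_docs: int) -> Dict[str, float]:
    """Document-level co-occurrence features for an entity pair (assumed estimators)."""
    if not docs_e1 or not docs_e2 or n_docs <= 0:
        raise ValueError("both entities must occur in at least one document")
    both = len(docs_e1 & docs_e2)
    p1 = len(docs_e1) / n_docs
    p2 = len(docs_e2) / n_docs
    joint = both / n_docs
    return {
        "prior_e1": p1,
        "prior_e2": p2,
        "joint": joint,
        "conditional": joint / p1,          # assumed to mean P(e2 | e1)
        "reverse_conditional": joint / p2,  # assumed to mean P(e1 | e2)
        # Cosine between binary document-occurrence vectors.
        "cosine": both / math.sqrt(len(docs_e1) * len(docs_e2)),
        # PMI is undefined without co-occurrence; -inf is used as a convention here.
        "pmi": math.log(joint / (p1 * p2)) if both else float("-inf"),
    }


# Toy corpus of 16 documents; the two entities co-occur in documents 3 and 4.
features = cooccurrence_features({1, 2, 3, 4}, {3, 4, 5, 6}, n_docs=16)
```

Per slide 28, the same feature set is computed once over the Wikipedia text corpus and once over the Wikipedia entities corpus; in this sketch that would correspond to calling the function with two different document-set indexes and passing both feature dictionaries to the learning-to-rank model.
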
Editor's Notes

  1. Text on side does not look good
  2. Don’t call them concepts; be specific, e.g. technologies for SemWeb. Change the boxes to horizontal boxes, not vertical ones.
  3. Entity: describe the definitions used in different communities and the definition we will use in our presentation. Relatedness: describe it through the notion of similarity vs relatedness by illustrating WordNet relations (taxonomic vs others); further describe a simple Venn diagram. We do not distinguish between entity and concept, e.g. football player.
  4. Entity: describe the definitions used in different communities and the definition we will use in our presentation. Relatedness: describe it through the notion of similarity vs relatedness by illustrating WordNet relations (taxonomic vs others); further describe a simple Venn diagram. Relatedness reflects the degree of associativity, connectivity. Relatedness score: University => Student, building. Similarity score => Student, building. Relatedness score: University => Student, bio lab. Similarity score => Student, bio lab.
  5. Change box to chapter names. Entity Relatedness => DiSER
  6. Context-VSM and Context-ESA: vector similarity between corresponding Wikipedia articles; ESA score between corresponding Wikipedia articles
  7. Change to DiSER explanation
  8. Change to DiSER explanation
  9. Backup slide on vector composition. Lucene-based “Brad Pitt”
  10. Add wiki markup to show one sense per document
  11. Highlight the relevant articles in both vectors
  12. Merge next slide with one
  13. Explain entity disambiguation for Context-DiSER
  14. Wikipedia text and entity tagged. One thing to notice: we only get the articles that contain the given entity as Wikipedia links, not only as words. So it performs better than the text DSM.
  15. Describe GBDT
  16. Change table
  17. Change table
  18. Change x to pairwise sim symbol
  19. Change to equation. Consistency in subscript and superscript
  20. Word relatedness datasets:
      • MC30: contains 30 pairs of nouns with relatedness scores on a scale of 0-4. This dataset was prepared by Miller and Charles (1991). The scores were provided by 38 human experts.
      • WN353: contains 353 word pairs annotated by 13-15 human experts on a scale of 0-10, where 10 stands for highly related and 0 stands for unrelated. It contains generic words as well as named entities.
      • WN353Sim and WN353Rel: the WN353 dataset was refined by Agirre et al. (2009) into similar and related word pairs. Two words are similar if they are connected through a taxonomic relation like synonym or hyponym. Two words are related if they are connected through relations like meronym or holonym. WN353Rel and WN353Sim contain 252 and 203 word pairs respectively.
      • RG65: contains 65 non-technical word pairs. It was annotated by 51 human experts.
      • MT771: has 771 word pairs and their relatedness scores. The words are very generic and come from all kinds of domains.
      • MT287: has 287 word pairs and their relatedness scores, prepared using Amazon Mechanical Turk (MT).
  21. Backup slide on similarity vs relatedness
  22. Context-VSM and Context-ESA: vector similarity between corresponding Wikipedia articles; ESA score between corresponding Wikipedia articles
  23. Replace the screenshot with a better-quality one
  24. Change to a screenshot from a SPARQL editor with color encoding
  25. Remove the similarity and relatedness part and explain more on relationship explanation
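
The relatedness results in the slides (KORE, WN353, MC30, RG65, MT287) are reported as Spearman rank correlation between system scores and human judgements. A minimal pure-Python sketch of that measure follows, assuming tied values receive the average of their ranks; the function names and the toy scores are hypothetical and are not taken from the thesis or its datasets.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least two items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when all ranks are equal")
    return cov / (sx * sy)


# Toy example: human relatedness scores vs scores from some relatedness measure.
gold = [4.0, 3.0, 2.0, 1.0]
system = [0.9, 0.7, 0.8, 0.1]
rho = spearman(gold, system)
```

Because only the ranking matters, a measure can score well even when its absolute values are on a different scale from the human annotations, which suits datasets such as KORE where each seed entity comes with a ranked list of candidates.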