Distributional Semantics
Natural Language Processing
Emory University
Jinho D. Choi
Distributional Semantics

Quantify and categorize semantic similarities between items using distributional properties in Big Data.

Examples:
- Never-Ending Language Learning (entity extraction): http://rtw.ml.cmu.edu/rtw
- ConceptNet (relation extraction): http://conceptnet5.media.mit.edu
- Word2Vec (word embedding): http://word2vec.googlecode.com
Language Model

Notation: w_1^n = w_1, ..., w_n

Chain rule:
P(w_1^n) = P(w_1) · P(w_2|w_1) · P(w_3|w_1^2) ··· P(w_n|w_1^{n-1})

Markov assumption (bigram):
P(w_1^n) = P(w_1) · P(w_2|w_1) · P(w_3|w_2) ··· P(w_n|w_{n-1}) = ∏_{i=1}^{n} P(w_i|w_{i-1})

P(w_1|w_0): w_0 is an artificial word inserted before all sentences.
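A minimal sketch of the bigram model above, assuming a toy corpus, an artificial <s> token standing in for w_0, and unsmoothed maximum-likelihood estimates (the corpus and function names are illustrative):

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Maximum-likelihood bigram model: P(w_i | w_{i-1}) = n(w_{i-1}, w_i) / n(w_{i-1})."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        words = ["<s>"] + sent                      # artificial word w_0 before every sentence
        for prev, curr in zip(words, words[1:]):
            unigrams[prev] += 1
            bigrams[(prev, curr)] += 1
    return lambda prev, curr: bigrams[(prev, curr)] / unigrams[prev] if unigrams[prev] else 0.0

def sentence_prob(p, sent):
    """P(w_1^n) under the Markov assumption: product of bigram probabilities."""
    prob, prev = 1.0, "<s>"
    for w in sent:
        prob *= p(prev, w)
        prev = w
    return prob

corpus = [["the", "boy", "said"], ["the", "girl", "said"]]
p = train_bigram_lm(corpus)
print(sentence_prob(p, ["the", "boy", "said"]))     # 0.5 on this toy corpus
```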
Language Class Model

Assume that each word w ∈ V belongs to a class c ∈ C, where V = {w_1, ..., w_n} and C = {c_1, ..., c_k}.

Language model:
P(w_1^n) = ∏_{i=1}^{n} P(w_i|w_{i-1})

Language class model (c_i is the class that contains w_i):
P(w_1^n) = ∏_{i=1}^{n} P(w_i|c_i) · P(c_i|c_{i-1})

log P(w_1^n) = Σ_{i=1}^{n} log P(w_i|c_i) · P(c_i|c_{i-1})
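A small sketch of scoring a sentence with the class-based factorization, assuming a fixed word-to-class mapping and unsmoothed maximum-likelihood counts (the mapping and names are illustrative, and all events are assumed to have been observed):

```python
import math
from collections import Counter

def train_class_lm(sentences, word2class):
    """Collect the counts needed for P(w|c) = n(w)/n(c) and P(c|c') = n(c',c)/n(c')."""
    n_w, n_c, n_cc = Counter(), Counter(), Counter()
    for sent in sentences:
        classes = ["<s>"] + [word2class[w] for w in sent]
        n_c["<s>"] += 1
        for w in sent:
            n_w[w] += 1
            n_c[word2class[w]] += 1
        for prev, curr in zip(classes, classes[1:]):
            n_cc[(prev, curr)] += 1
    return n_w, n_c, n_cc

def log_prob(sent, word2class, n_w, n_c, n_cc):
    """log P(w_1^n) = sum_i [ log P(w_i|c_i) + log P(c_i|c_{i-1}) ] (no smoothing)."""
    lp, prev = 0.0, "<s>"
    for w in sent:
        c = word2class[w]
        lp += math.log(n_w[w] / n_c[c]) + math.log(n_cc[(prev, c)] / n_c[prev])
        prev = c
    return lp

word2class = {"boy": "NOUN", "girl": "NOUN", "said": "VERB", "the": "DET"}
counts = train_class_lm([["the", "boy", "said"], ["the", "girl", "said"]], word2class)
print(log_prob(["the", "girl", "said"], word2class, *counts))   # ~ -0.693
```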
Cluster Quality

Quality of a clustering, starting from the class-based model:
log P(w_1^n) = Σ_{i=1}^{n} log P(w_i|c_i) · P(c_i|c_{i-1})

Bigrams: (w, w'), where w' is the word following w.

Q(C) = (1/n) Σ_{i=1}^{n} log P(w_i|c_i) · P(c_i|c_{i-1})
     = Σ_{w,w'} [n(w,w')/n] · log P(w'|c') · P(c'|c)

where n(w,w')/n estimates P(w,w').
Cluster Quality

Q(C) = Σ_{w,w'} [n(w,w')/n] · log P(w'|c') · P(c'|c)
     = Σ_{w,w'} [n(w,w')/n] · log [n(w')/n(c')] · [n(c,c')/n(c)]
     = Σ_{w,w'} [n(w,w')/n] · log [n(w')/n] · [n·n(c,c') / (n(c)·n(c'))]
     = Σ_{w,w'} [n(w,w')/n] · log [n(w')/n] + Σ_{w,w'} [n(w,w')/n] · log [n·n(c,c') / (n(c)·n(c'))]
     = Σ_{w'} [n(w')/n] · log [n(w')/n] + Σ_{c,c'} [n(c,c')/n] · log [n·n(c,c') / (n(c)·n(c'))]
Cluster Quality

     = Σ_{w'} [n(w')/n] · log [n(w')/n] + Σ_{c,c'} [n(c,c')/n] · log [n·n(c,c') / (n(c)·n(c'))]
     = Σ_{w'} [n(w')/n] · log [n(w')/n] + Σ_{c,c'} [n(c,c')/n] · log { [n(c,c')/n] / ([n(c)/n] · [n(c')/n]) }
     = Σ_{w'} P(w') · log P(w') + Σ_{c,c'} P(c,c') · log [P(c,c') / (P(c) · P(c'))]

The first term is the (negative) entropy of the word distribution, which does not depend on the clustering; the second is the mutual information between adjacent classes. Maximizing Q(C) therefore amounts to maximizing this mutual information.
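A brief sketch that evaluates the final count-based form of Q(C) over a toy token sequence and a fixed word-to-class map (both illustrative), using unsmoothed counts:

```python
import math
from collections import Counter

def quality(tokens, word2class):
    """Q(C) = sum_w' n(w')/n log n(w')/n + sum_{c,c'} n(c,c')/n log [n * n(c,c') / (n(c) n(c'))]."""
    n = len(tokens)
    classes = [word2class[w] for w in tokens]
    n_w = Counter(tokens)                              # n(w)
    n_c = Counter(classes)                             # n(c)
    n_cc = Counter(zip(classes, classes[1:]))          # n(c, c'): c followed by c'
    entropy_term = sum(k / n * math.log(k / n) for k in n_w.values())
    mi_term = sum(k / n * math.log(n * k / (n_c[c] * n_c[c2]))
                  for (c, c2), k in n_cc.items())
    return entropy_term + mi_term

tokens = "the boy said the girl said".split()
print(quality(tokens, {"the": "D", "boy": "N", "girl": "N", "said": "V"}))
```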
Brown Clustering

V = {w_1, ..., w_n}   C = {c_1, ..., c_k}

Initial state: each word is assigned to its own cluster → n clusters.
Terminal state: each word is assigned to one of the clusters in C → k clusters.

Run n − k merge steps: pick the two clusters c_i and c_j whose merge maximizes the quality,
arg max_{i,j} Q(c_i ∪ c_j)

Complexity?
Brown Clustering

Assign the top k most frequent words to unique clusters → k clusters.
Run n − k merge steps (sketched below):
- Assign the next most frequent word to a new cluster → k + 1 clusters.
- Pick the two clusters c_i and c_j whose merge maximizes the quality, arg max_{i,j} Q(c_i ∪ c_j) → k clusters.
Run k − 1 more merge steps to obtain a complete hierarchy.
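A deliberately naive sketch of the merge loop above: keep the top-k words as singleton clusters, add the next most frequent word, and merge the pair that maximizes Q. It reuses the quality() sketch from the previous section, maps not-yet-clustered words to a placeholder class, and omits all the bookkeeping and speed-ups used in practice (names illustrative):

```python
from collections import Counter

def brown_like_clustering(tokens, k, quality):
    """Greedy agglomerative clustering driven by the quality function Q(C)."""
    freq = Counter(tokens)
    words = [w for w, _ in freq.most_common()]
    clusters = {w: frozenset([w]) for w in words[:k]}          # top-k words -> singleton clusters

    def q_of(assign):
        # words not yet clustered share a placeholder class so quality() stays defined
        return quality(tokens, {w: assign.get(w, "<unassigned>") for w in freq})

    for w in words[k:]:
        clusters[w] = frozenset([w])                           # now k + 1 clusters
        ids = sorted(set(clusters.values()), key=sorted)
        best, best_q = None, float("-inf")
        for i in range(len(ids)):                              # pick the merge maximizing Q
            for j in range(i + 1, len(ids)):
                merged = {wd: (ids[i] | ids[j]) if c in (ids[i], ids[j]) else c
                          for wd, c in clusters.items()}
                q = q_of(merged)
                if q > best_q:
                    best_q, best = q, merged
        clusters = best                                        # back to k clusters
    return clusters

tokens = "the boy said the girl said the boy ran the girl ran".split()
print(brown_like_clustering(tokens, 3, quality))               # quality() from the previous sketch
```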
Brown Clustering

Example binary hierarchy: each merge adds one bit to a word's path from the root, giving bit-string codes such as
apple = 111, pear = 110, boy = 101, girl = 100, said = 01, reported = 00.
Term Document Matrix

X is an m × n matrix whose entry x_{i,j} is the term frequency of t_i given a document d_j (or a weighted value such as TF-IDF).

- Row t_i^T = [x_{i,1}, ..., x_{i,n}]: comparing rows gives term similarity.
- Column d_j = [x_{1,j}, ..., x_{m,j}]: comparing columns gives document similarity.
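A short sketch of building such a matrix with TF-IDF weights and comparing rows (terms) and columns (documents) by cosine similarity; the documents and names are illustrative:

```python
import math
from collections import Counter

docs = ["the boy said hello", "the girl said hello", "apple and pear juice"]
terms = sorted({w for d in docs for w in d.split()})

# Raw term-document counts: tf[i][j] = frequency of term i in document j.
tf = [[Counter(d.split())[t] for d in docs] for t in terms]

# TF-IDF weighting: scale each row by log(N / df(t)).
N = len(docs)
df = [sum(1 for d in docs if t in d.split()) for t in terms]
X = [[tf[i][j] * math.log(N / df[i]) for j in range(N)] for i in range(len(terms))]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

col = lambda j: [X[i][j] for i in range(len(terms))]
print(cosine(X[terms.index("the")], X[terms.index("said")]))   # term similarity (rows)
print(cosine(col(0), col(1)))                                  # document similarity (columns)
```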
Latent Semantic Analysis

Low-rank approximation: remove irrelevant terms or documents from the matrix.

Singular value decomposition:
X = U · Σ · V^T
where U (orthogonal matrix) holds the left singular vectors u_1, ..., u_m, Σ (diagonal matrix) holds the singular values σ_1, ..., σ_n, and V^T (orthogonal matrix) holds the right singular vectors v_1, ..., v_n.
Latent Semantic Analysis

Choose the top-k singular values:

Full SVD:    U → M × M    Σ → M × N    V^T → N × N
Truncated:   U' → M × K   Σ' → K × K   V'^T → K × N

X' = U' · Σ' · V'^T ← LSA matrix (rank-k approximation of X)
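A minimal numpy sketch of the truncation step on a toy matrix; numpy is an assumption here, not something prescribed by the slides:

```python
import numpy as np

X = np.array([[2., 1., 0., 0.],
              [1., 2., 0., 0.],
              [0., 0., 1., 2.],
              [0., 0., 2., 1.]])            # toy term-document matrix (M x N)

U, s, Vt = np.linalg.svd(X)                 # full SVD: U is M x M, Vt is N x N

k = 2                                       # keep the top-k singular values
U_k, S_k, Vt_k = U[:, :k], np.diag(s[:k]), Vt[:k, :]
X_k = U_k @ S_k @ Vt_k                      # X' = U' . Sigma' . V'^T, the rank-k LSA matrix

print(np.round(X_k, 2))
# Terms (rows of U_k S_k) and documents (columns of S_k Vt_k) can now be
# compared in the k-dimensional latent space instead of the full matrix.
```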
Word Embeddings

Word vectors generated by neural networks. Each dimension in word vectors captures distributional semantics, illustrated here with groups of dimensions for "royal", "male", and "female":

"king"  = 0 1 1 1 0 0 0 1 1 0 0 0   (royal + male)
"man"   = 0 0 0 0 0 0 0 1 1 0 0 0   (male)
"woman" = 0 0 0 0 0 1 1 0 0 0 0 0   (female)
"queen"? → king − man + woman = 0 1 1 1 0 1 1 0 0 0 0 0   (royal + female)

https://code.google.com/p/word2vec/
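A tiny sketch of the king − man + woman ≈ queen arithmetic with the illustrative 0/1 vectors above; real word2vec vectors are dense and real-valued, and ranking candidates by cosine similarity is the usual way to resolve the analogy:

```python
import math

vec = {
    "king":  [0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0],   # royal + male
    "man":   [0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0],   # male
    "woman": [0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0],   # female
    "queen": [0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0],   # royal + female
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# king - man + woman should land closest to queen.
target = [k - m + w for k, m, w in zip(vec["king"], vec["man"], vec["woman"])]
ranked = sorted(vec, key=lambda w: cosine(target, vec[w]), reverse=True)
print(ranked[0], round(cosine(target, vec[ranked[0]]), 2))       # queen 1.0
```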
Generative vs. Discriminative

Predict w_i given {w_{i-2}, w_{i-1}, w_{i+1}, w_{i+2}}.

Generative model:
w_i = arg max_{w*} P(w* | w_{i-2}, w_{i-1}, w_{i+1}, w_{i+2})

Discriminative model:
x = bow(w_{i-2}, w_{i-1}, w_{i+1}, w_{i+2}) ∈ R^{1×n}   (n: vocabulary size)
w ∈ R^{n×d}, v ∈ R^{d×n}   (d: embedding size)
w_i = arg max_i (x · w · v)
Word2Vec

Feed-Forward Neural Network:
- Input: x ∈ R^v (vocabulary size)
- Hidden: h ∈ R^d (embedding size)
- Output: ŷ ∈ R^v (vocabulary size)
with weight matrices w ∈ R^{v×d} and v ∈ R^{d×v}.

Initialization: w_i ∈ [−0.5/v, 0.5/v], v_i ← 0.
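A small numpy sketch of the two weight matrices and the initialization shown above. The slide's range divides by the vocabulary size v, whereas the released word2vec code divides by the embedding size d, so treat the constant as illustrative:

```python
import numpy as np

v, d = 10000, 100                       # vocabulary size, embedding size

# Input->hidden weights w (v x d): each row is a word embedding.
# Initialized uniformly; the slide shows the range [-0.5/v, 0.5/v].
W = (np.random.rand(v, d) - 0.5) / v

# Hidden->output weights v (d x v): initialized to zero, as on the slide.
V = np.zeros((d, v))

x = np.zeros(v); x[42] = 1.0            # one-hot input for some word w_i
h = x @ W                               # hidden layer = that word's embedding (R^d)
y_hat = h @ V                           # output scores over the vocabulary (R^v)
print(h.shape, y_hat.shape)             # (100,) (10000,)
```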
Bag-of-Words

Predict w_i given {w_{i-2}, w_{i-1}, w_{i+1}, w_{i+2}}.

The context words are encoded as one-hot vectors x_{i-2}, x_{i-1}, x_{i+1}, x_{i+2}, and the hidden layer is the average of their embeddings:
h = (1/|B|) Σ_{j∈B} w_j · x_j

The output for the target word is sigmoid(v_i · h), computed only for the target and a handful of sampled words rather than the full output layer (→ Negative Sampling).
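A sketch of one CBOW training step as described above: average the context embeddings, then update only the target word and a few negative samples through the sigmoid score. The learning rate, the sampled ids, and the names are assumptions, not from the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cbow_step(W, V, context_ids, target_id, negative_ids, lr=0.025):
    """One CBOW update with negative sampling.
    W: v x d input embeddings, V: d x v output weights."""
    h = W[context_ids].mean(axis=0)                      # h = 1/|B| * sum_j W[x_j]
    grad_h = np.zeros_like(h)
    for wid, label in [(target_id, 1.0)] + [(n, 0.0) for n in negative_ids]:
        score = sigmoid(V[:, wid] @ h)                   # sigmoid(v_i . h)
        g = (score - label) * lr
        grad_h += g * V[:, wid]                          # accumulate gradient w.r.t. h
        V[:, wid] -= g * h                               # update the output vector
    W[context_ids] -= grad_h / len(context_ids)          # spread gradient over the context words
    return W, V

v, d = 50, 8
rng = np.random.default_rng(0)
W = (rng.random((v, d)) - 0.5) / d
V = np.zeros((d, v))
cbow_step(W, V, context_ids=[3, 7, 9, 11], target_id=5, negative_ids=[20, 31])
```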
Negative Sampling

Build a table in which words are distributed proportionally to their counts (table size: vocabulary size * embedding size * 10):
w_0 w_0 w_0 w_0 … w_1 w_1 w_2 … w_2 w_2 … w_n

Distribution ratio:
dist(w_i) = |w_i|^{3/4} / Σ_{∀j} |w_j|^{3/4}

Randomly select negative words from the distribution.
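A sketch of the count^{3/4} distribution and the sampling table; the counts and the table size used here are illustrative (the slide states vocabulary size * embedding size * 10):

```python
import random
from collections import Counter

counts = {"the": 1000, "boy": 50, "girl": 45, "said": 200, "zygote": 2}

# dist(w) = count(w)^(3/4) / sum_j count(w_j)^(3/4)
weights = {w: c ** 0.75 for w, c in counts.items()}
total = sum(weights.values())
dist = {w: x / total for w, x in weights.items()}

# Fill a table in which each word occupies a share of slots proportional to dist(w),
# then draw negative samples by picking random slots.
table_size = 100000
table = [w for w in counts for _ in range(int(dist[w] * table_size))]

negatives = [random.choice(table) for _ in range(5)]
print(dist["the"], dist["zygote"])    # the 3/4 power boosts rare words relative to raw counts
print(negatives)
```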
Sub-Sampling

Randomly discard highly frequent words from the BOW.

d = (√(|w|/s) + 1) · (s/|w|) ≈ √(s/|w|)

where |w| is the word count and s = subsample threshold * total word count.

Choose a random number r and skip the word s.t. d < r.

[Plot: d as a function of |w|/s (word count / sample size).]
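A sketch of the discard rule above: compute d from the word's count and the threshold s, draw a random r, and keep the word only when d ≥ r. The threshold value and corpus are illustrative:

```python
import math
import random
from collections import Counter

def subsample(tokens, threshold=1e-3):
    """Keep each occurrence of w with probability d = (sqrt(|w|/s) + 1) * (s/|w|)."""
    counts, n = Counter(tokens), len(tokens)
    s = threshold * n                                  # subsample threshold * total word count
    kept = []
    for w in tokens:
        d = (math.sqrt(counts[w] / s) + 1) * (s / counts[w])
        if d >= random.random():                       # skip the word when d < r
            kept.append(w)
    return kept

tokens = ["the"] * 5000 + ["boy", "said", "hello"] * 50
random.shuffle(tokens)
print(Counter(subsample(tokens))["the"], "of 5000 occurrences of 'the' kept")
```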
Skip-Grams

Predict {w_{i-2}, w_{i-1}, w_{i+1}, w_{i+2}} given w_i.

[Diagram: the one-hot input x_i selects the row of w holding the embedding of w_i; v then scores the surrounding positions x_{i-1}, x_{i+1}, ... — the reverse of the bag-of-words setup.]
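A sketch of generating skip-gram training pairs (center word, context word) with a window of 2; each pair could then be trained with the same negative-sampling update as CBOW, with the roles of input and output reversed (names illustrative):

```python
def skipgram_pairs(tokens, window=2):
    """Yield (center, context) pairs: predict each surrounding word from w_i."""
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, tokens[j]

tokens = "the boy said hello to the girl".split()
for center, context in skipgram_pairs(tokens):
    print(center, "->", context)
```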