lda2vec
(word2vec, and lda)
Christopher Moody
@ Stitch Fix
Welcome, and thanks for coming, thanks for having me, and thanks to the organizers.

NLP can be a messy affair because you have to teach a computer about the irregularities and ambiguities of the English language,

and you have to teach it the hierarchical & sparse nature of English grammar & vocab

3rd trimester, pregnant

“wears scrubs” — medicine

taking a trip — a fix for vacation clothing

power and promise of word vectors is to sweep away a lot of issues
About
@chrisemoody
Caltech Physics
PhD. in astrostats supercomputing
sklearn t-SNE contributor
Data Labs at Stitch Fix
github.com/cemoody
Gaussian Processes t-SNE
chainer
deep learning
Tensor Decomposition
1. word2vec
2. lda
3. lda2vec
1. king - man + woman = queen
2. Huge splash in NLP world
3. Learns from raw text
4. Pretty simple algorithm
5. Comes pretrained
word2vec
1. Learns what words mean — can solve analogies cleanly.

2. Not treating words as blocks, but instead modeling relationships.

3. Distributed representations form the basis of more complicated deep learning systems.
1. Set up an objective function
2. Randomly initialize vectors
3. Do gradient descent
word2vec
word2vec
word2vec: learn word vector w
from its surrounding context
w
1. not mention neural networks

2. Let’s talk about training first

3. n-grams transition probability vs tf-idf / LSI co-occurrence matrices

4. Here we will learn the embedded representation directly, with no intermediates, update it w/ every example
word2vec
“The fox jumped over the lazy dog”
Maximize the likelihood of seeing the words given the word over.
P(the|over)
P(fox|over)
P(jumped|over)
P(the|over)
P(lazy|over)
P(dog|over)
…instead of maximizing the likelihood of co-occurrence counts.
1. Context — the words surrounding the training word

2. Naively assume, BoW, not recurrent, no state

3. Still a pretty simple assumption!

Conditioning on just *over* no other secret parameters or anything
word2vec
P(fox|over)
What should this be?
word2vec
P(vfox|vover)
Should depend on the word vectors.
P(fox|over)
Trying to learn the word vectors, so let’s start with those

(we’ll randomly initialize them to begin with)
word2vec
“The fox jumped over the lazy dog”
P(w|c)
Extract pairs from context window around every input word.
word2vec
“The fox jumped over the lazy dog”
c
P(w|c)
Extract pairs from context window around every input word.
IN = training word = context
word2vec
“The fox jumped over the lazy dog”
w
P(w|c)
c
Extract pairs from context window around every input word.
word2vec
P(w|c)
w c
“The fox jumped over the lazy dog”
Extract pairs from context window around every input word.
word2vec
“The fox jumped over the lazy dog”
P(w|c)
w c
Extract pairs from context window around every input word.
word2vec
P(w|c)
c w
“The fox jumped over the lazy dog”
Extract pairs from context window around every input word.
word2vec
P(w|c)
c w
“The fox jumped over the lazy dog”
Extract pairs from context window around every input word.
word2vec
P(w|c)
c w
“The fox jumped over the lazy dog”
Extract pairs from context window around every input word.
innermost for-loop

v_in was fixed over the inner for-loop

increment v_in to point to the next word
word2vec
P(w|c)
w c
“The fox jumped over the lazy dog”
Extract pairs from context window around every input word.
word2vec
P(w|c)
c w
“The fox jumped over the lazy dog”
Extract pairs from context window around every input word.
…So that at a high level is what we want word2vec to do.
word2vec
P(w|c)
c w
“The fox jumped over the lazy dog”
Extract pairs from context window around every input word.
…So that at a high level is what we want word2vec to do.
word2vec
P(w|c)
c w
“The fox jumped over the lazy dog”
Extract pairs from context window around every input word.
…So that at a high level is what we want word2vec to do.
word2vec
P(w|c)
c w
“The fox jumped over the lazy dog”
Extract pairs from context window around every input word.
…So that at a high level is what we want word2vec to do.
word2vec
P(w|c)
c w
“The fox jumped over the lazy dog”
Extract pairs from context window around every input word.
called skip grams

…So that at a high level is what we want word2vec to do.

two for loops
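To make those two for loops concrete, here's a minimal sketch of skip-gram pair extraction in Python; the window size and tokenization are illustrative choices, not the talk's exact training code:

def skipgram_pairs(tokens, window=2):
    pairs = []
    for i, pivot in enumerate(tokens):  # outer loop: the pivot / IN word
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):  # inner loop: each nearby target word
            if j != i:
                pairs.append((pivot, tokens[j]))
    return pairs

skipgram_pairs("The fox jumped over the lazy dog".split())
# includes ('over', 'fox'), ('over', 'jumped'), ('over', 'the'), ('over', 'lazy')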
objective
Measure loss between
w and c?
How should we define P(w|c)?
Now we’ve defined the high-level update path for the algorithm.

Need to define this prob exactly in order to define our updates.

Boils down to diff between word & context — want to make as similar as possible, and then the probability will go up.
objective
w . c
How should we define P(w|c)?
Measure loss between
w and c?
Use cosine sim.

could imagine Euclidean dist, Mahalanobis dist
word2vec
w . c ~ 1
objective
w
c
vcanada . vsnow ~ 1
Dot product has these properties:

Similar vectors have similarity near 1 (if normed)
word2vec
w . c ~ 0
objective
w
c
vcanada . vdesert ~0
Orthogonal vectors have similarity near 0
word2vec
w . c ~ -1
objective
w
c
Opposite vectors have similarity near -1
word2vec
w . c ∈ [-1,1]
objective
But the inner product ranges from -1 to 1 (when normalized)
word2vec
But we’d like to measure a probability.
w . c ∈ [-1,1]
objective
…and we’d like a probability
word2vec
But we’d like to measure a probability.
objective
σ(c·w) ∈ [0,1]
Transform again using sigmoid
word2vec
But we’d like to measure a probability.
objective
σ(c·w) ∈ [0,1]
w
c
w
c
Similar
Dissimilar
Transform again using sigmoid
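A quick numeric sketch of the squashing (toy 2-d vectors, purely illustrative):

import numpy as np
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
w = np.array([1.0, 0.0])
c_similar = np.array([0.9, 0.1])
c_dissimilar = np.array([-0.9, 0.1])
sigmoid(w @ c_similar)      # ≈ 0.71, toward 1 for a similar pair
sigmoid(w @ c_dissimilar)   # ≈ 0.29, toward 0 for a dissimilar pair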
word2vec
Loss function:
objective
L=σ(c·w)
Logistic (binary) choice.
Is the (context, word) combination from our dataset?
Are these 2 similar? 

This is optimized with identical vectors… everything looks exactly the same
word2vec
The skip-gram negative-sampling model
objective
Trivial solution is that context = word for all vectors
L=σ(c·w)
w
c
no contrast! 

add negative samples
word2vec
The skip-gram negative-sampling model
L = σ(c·w) + σ(-c·wneg)
objective
Draw random words in vocabulary.
no contrast! 

add negative samples

discriminate that this (w, c) pair is from the observed corpus, while the other is a randomly drawn word

example: (fox, jumped), but not (fox, career)

no popularity guessing as in softmax
word2vec
The skip-gram negative-sampling model
objective
Discriminate positive from negative samples
Multiple Negative
L = σ(c·w) + σ(-c·wneg) +…+ σ(-c·wneg)
mult samples

(fox, jumped)

not:

(fox, career)

(fox, android)

(fox, sunlight)

remind: this L is being computed, and we’ll nudge all of the values of c & w via GD to optimize L

so that’s SGNS / w2v

That’s it! It's a bit of a disservice to draw this as a giant network
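A sketch of that loss in Python. Note the slide drops the logs; the commonly trained objective uses log σ, which is what's written here (the vector size and random negatives are toy choices):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_loss(w, c, negatives):
    # maximize log σ(c·w) + Σ log σ(-c·w_neg); return the negative for descent
    obj = np.log(sigmoid(c @ w))
    obj += sum(np.log(sigmoid(-c @ wn)) for wn in negatives)
    return -obj

rng = np.random.default_rng(0)
w, c = rng.normal(size=50), rng.normal(size=50)
negatives = [rng.normal(size=50) for _ in range(5)]   # randomly drawn words
sgns_loss(w, c, negatives)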
word2vec
The SGNS Model
PMI
ci·wj = PMI(Mij) - log k
…is extremely similar to matrix factorization!
Levy & Goldberg 2014
L = σ(c·w) + σ(-c·wneg)
explain what matrix is, row, col, entries

e.g. can solve word2vec via SVD if we want deterministically

One of the most cited NLP papers of 2014
word2vec
The SGNS Model
PMI
Levy & Goldberg 2014
‘traditional’ NLP
L = σ(c·w) + σ(-c·wneg)
ci·wj = PMI(Mij) - log k
…is extremely similar to matrix factorization!
Dropped indices for readability

most cited because connection to PMI

Info-theoretic measure association of (w and c)
word2vec
The SGNS Model
L = σ(c·w) + Σσ(-c·w)
PMI
Levy & Goldberg 2014
ci·wj = log [ (#(ci,wj)/n) / ( k · (#(wj)/n) · (#(ci)/n) ) ]
‘traditional’ NLP
instead of looping over all words, you can count and roll them up that way

probs are just counts divided by the number of observations

word2vec adds this bias term

cool that all of the for-looping we did before can be reduced to this form
word2vec
The SGNS Model
L = σ(c·w) + Σσ(-c·w)
PMI
Levy & Goldberg 2014
ci·wj = log [ (popularity of c,w) / ( k · (popularity of c) · (popularity of w) ) ]
‘traditional’ NLP
probs are just counts divided by the number of observations

word2vec adds this bias term

down-weights rare terms

More frequent words are weighted more than infrequent words. Rare words are downweighted.

Makes sense given that word distributions follow Zipf's law.

This weighting is what makes SGNS so powerful.

theoretical is nice, but practical is even better
word2vec
PMI
99% of word2vec
is counting.
And you can count
words in SQL
word2vec
PMI
Count how many times
you saw c·w
Count how many times
you saw c
Count how many times
you saw w
word2vec
PMI
…and this takes ~5 minutes to compute on a single core.
Computing SVD is a completely standard math library.
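A toy end-to-end sketch of that recipe in Python: count (context, word) pairs, fill in the shifted-PMI matrix from the slide's formula, and factorize it with a stock SVD. The corpus, window size, and k below are illustrative:

import numpy as np
from collections import Counter

corpus = ["the fox jumped over the lazy dog", "the dog was lazy"]
sents = [s.split() for s in corpus]
vocab = sorted({w for s in sents for w in s})
idx = {w: i for i, w in enumerate(vocab)}

pairs = Counter()
for s in sents:
    for i, w in enumerate(s):
        for j in range(max(0, i - 2), min(len(s), i + 3)):
            if j != i:
                pairs[(s[j], w)] += 1   # (context c, word w)

n = sum(pairs.values())
c_cnt, w_cnt = Counter(), Counter()
for (c, w), cnt in pairs.items():
    c_cnt[c] += cnt
    w_cnt[w] += cnt

k = 5   # number of negative samples; shifts the PMI by log k
M = np.zeros((len(vocab), len(vocab)))
for (c, w), cnt in pairs.items():
    pmi = np.log((cnt / n) / ((c_cnt[c] / n) * (w_cnt[w] / n)))
    M[idx[c], idx[w]] = max(pmi - np.log(k), 0)   # clip negatives for the sketch

U, S, Vt = np.linalg.svd(M)    # deterministic, standard math library
vectors = U[:, :2] * S[:2]     # keep 2 dimensions to eyeball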
word2vec
explain table
intuition about word vectors
Showing just 2 of the ~500 dimensions. Effectively we’ve PCA’d it

only 4 of 100k words
If we only had locality and not regularity, this wouldn’t necessarily be true
So we live in a vector space where operations like addition and subtraction are semantically meaningful. 

So here’s a few examples of this working.

Really get the idea of these vectors as being ‘mixes’ of other ideas & vectors
ITEM_3469 + ‘Pregnant’
SF is a personal service

Box
+ ‘Pregnant’
I love the stripes and the cut around my neckline was amazing

someone else might write ‘grey and black’

subtlety and nuance in that language

For some items, we have several times the collected works of shakespeare
= ITEM_701333
= ITEM_901004
= ITEM_800456
Stripes are safe for maternity

And also similar tones and flowy — still great for expecting mothers
what about LDA?
LDA
on Client Item
Descriptions
This shows the incredible amount of structure
LDA
on Item
Descriptions
(with Jay)
clunky jewelry

dangling delicate jewelry elsewhere
LDA
on Item
Descriptions
(with Jay)
topics on patterns, styles — this cluster is similarly described as high contrast tops with popping colors
LDA
on Item
Descriptions
(with Jay)
bright dresses for a warm summer

LDA helps us model topics over documents in an interpretable way
lda vs word2vec
Bayesian Graphical Model vs ML Neural Model
word2vec is local:
one word predicts a nearby word
“I love finding new designer brands for jeans”
as if the world were one very long text string. no end of documents, no end of sentence, etc.

and a window across words
“I love finding new designer brands for jeans”
But text is usually organized.
as if the world were one very long text string. no end of documents, no end of sentence, etc.
“I love finding new designer brands for jeans”
But text is usually organized.
as if the world were one very long text string. no end of documents, no end of sentence, etc.
“I love finding new designer brands for jeans”
In LDA, documents globally predict words.
doc 7681
these are client comments, which are short, so they only predict dozens of words

but could be legal documents, or medical documents, 10k words
typical LDA document vector
[ 0%, 9%, 78%, 11%]
All sum to 100%
typical word2vec vector
[ -0.75, -1.25, -0.55, -0.12, +2.2]
All real values
5D LDA document vector
[ 0%, 9%, 78%, 11%]
Sparse
All sum to 100%
Dimensions are absolute
5D word2vec vector
[ -0.75, -1.25, -0.55, -0.12, +2.2]
Dense
All real values
Dimensions relative
much easier to say to another human 78% than it is +2.2 of something and -1.25 of something else

w2v an address — 200 main st. — figure out from neighbors

LDA is a *mixture* model

78% of some ingredient

but 

w2v isn’t -1.25 of some ingredient

ingredient = topics
100D LDA document vector
[ 0% 0% 0% 0% 0% … 0%, 9%, 78%, 11%]
Sparse
All sum to 100%
Dimensions are absolute
100D word2vec vector
[ -0.75, -1.25, -0.55, -0.27, -0.94, 0.44, 0.05, 0.31 … -0.12, +2.2]
Dense
All real values
Dimensions relative
dense vs sparse

LDA is sparse: it has ~95 dims close to zero
100D LDA document vector
[ 0% 0% 0% 0% 0% … 0%, 9%, 78%, 11%]
Similar in fewer ways
(more interpretable)
100D word2vec vector
[ -0.75, -1.25, -0.55, -0.27, -0.94, 0.44, 0.05, 0.31 … -0.12, +2.2]
Similar in 100D ways
(very flexible)
+mixture
+sparse
can we do both? lda2vec
a series of experiments

take with a grain of salt
[word2vec diagram: skip grams from sentences; in "Lufthansa is a German airline and when…", the word vector for "German" (#hidden units, e.g. -1.9 0.85 -0.6 -0.3 -0.5) feeds the negative sampling loss to predict nearby words]
word2vec predicts locally:
one word predicts a nearby word
read example

We extract pairs of pivot and target words that occur in a moving window that scans across the corpus. For every pair, the
pivot word is used to predict the nearby target word.
[lda2vec architecture diagram: skip grams from sentences ("Lufthansa is a German airline and when…", pivot "German"); a document weight vector (length #topics, e.g. 0.34 -0.1 0.17) is softmaxed into a document proportion (41% 26% 34%) and multiplied (x) by a topic matrix (#topics × #hidden units) to give the document vector; the document vector is added (+) to the pivot's word vector (#hidden units) to form the context vector, which feeds the negative sampling loss]
German
Document vector
predicts a word from
a global context
[lda2vec architecture diagram, repeated]
German: French or Spanish

+ airlines: a document vector similar to the word vector for airline

we know we can add vectors

German + airline: like Lufthansa, Condor Flugdienst, and Aero Lloyd

A latent vector is randomly initialized for every document in the corpus. Very similar to doc2vec and paragraph vectors.
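A sketch of that composition with toy sizes (3 topics, 5 hidden units; all values are illustrative, not trained):

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
doc_weight = np.array([0.34, -0.1, 0.17])    # free parameters, length #topics
doc_proportion = softmax(doc_weight)         # ≈ [40%, 26%, 34%]
topic_matrix = rng.normal(size=(3, 5))       # #topics x #hidden units
doc_vector = doc_proportion @ topic_matrix   # a mixture of topic vectors
word_vector = rng.normal(size=5)             # e.g. the pivot "German"
context_vector = word_vector + doc_vector    # predicts the nearby target words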
[lda2vec architecture diagram, repeated]
We’re missing
mixtures & sparsity!
German
Good for training sentiment models, great scores.

interpretability
[lda2vec architecture diagram, repeated]
We’re missing
mixtures & sparsity!
German
Too many documents.

about as interpretable as a hash
[lda2vec architecture diagram, repeated]
Now it’s a mixture.
[lda2vec architecture diagram, repeated]
German
document X is +0.34 in topic 0, -0.1 in topic 1, and 0.17 in topic 2

model with 3 topics.

before we had 500 DoF; now just a few.

better choose really good topics, because I only have a few available to summarize the entire document
[lda2vec architecture diagram, repeated]
Trinitarian
baptismal
Pentecostals
Bede
schismatics
excommunication
[lda2vec architecture diagram, repeated]
[zoom: document weight vector 0.34 -0.1 0.17, length #topics]
Each topic has a distributed representation that lives in the same space as the word vectors. While each topic is not literally
a token present in the corpus, it is similar to other tokens.
[lda2vec architecture diagram, repeated]
topic 1 = “religion”
Trinitarian
baptismal
Pentecostals
Bede
schismatics
excommunication
[lda2vec architecture diagram, repeated]
[zoom: document weight vector 0.34 -0.1 0.17, length #topics]
[lda2vec architecture diagram, repeated]
Milosevic
absentee
Indonesia
Lebanese
Israelis
Karadzic
[lda2vec architecture diagram, repeated]
[zoom: document weight vector 0.34 -0.1 0.17, length #topics]
notice one column over in the topic matrix
[lda2vec architecture diagram, repeated]
topic 2 = “politics”
Milosevic
absentee
Indonesia
Lebanese
Israelis
Karadzic
[lda2vec architecture diagram, repeated]
[zoom: document weight vector 0.34 -0.1 0.17, length #topics]
topic vectors, document vectors, and word vectors all live in the same space
[lda2vec architecture diagram, repeated]
The document weights are softmax-transformed to yield the document proportions.

similar to logistic regression, but instead of a 0-1 output it's now a vector of percentages

sums to 100% and indicates the topic proportions of a single document

one document might be 41% in topic 0, 26% in topic 1, and 34% in topic 2

very close to LDA-like representations
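Checking the slide's numbers in a few lines (the 41% vs 40% difference is just rounding):

import numpy as np
w = np.array([0.34, -0.1, 0.17])
p = np.exp(w) / np.exp(w).sum()
(100 * p).round()   # array([40., 26., 34.]), and they sum to 100%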
[lda2vec architecture diagram, repeated]
1st time I did this, it was still very dense in percentages. so with 100 topics, each had ~1%

it might be a mixture, but it's still dense!

a zillion categories, 1% of this, 1% of this, 1% of this…

mathematically it works: addition of lots of little bits (distributed)
[lda2vec architecture diagram, repeated]
Sparsity!
[lda2vec architecture diagram, repeated]
time:
t=0: 34% 32% 34%
t=10: 41% 26% 34%
t=∞: 99% 1% 0%
init balanced, but dense

Dirichlet likelihood loss encourages proportion vectors to become sparser over time.

relatively simple 

start pink, go to white and sparse
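A sketch of that Dirichlet term, assuming the usual form λ(α−1)·Σ log p with concentration α < 1 (the α and λ values here are illustrative):

import numpy as np

def dirichlet_loss(p, alpha=0.7, lam=1.0):
    # negative Dirichlet log-likelihood term: lower for sparser proportions
    return -lam * (alpha - 1.0) * np.sum(np.log(p))

dense = np.array([0.34, 0.32, 0.34])
sparse = np.array([0.99, 0.009, 0.001])
dirichlet_loss(dense)    # ≈ -0.99
dirichlet_loss(sparse)   # ≈ -3.49, so gradient descent drifts toward sparsity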
[lda2vec architecture diagram, repeated]
German
end up with something that’s quite a bit more complicated

but it achieves our goals: mixes word vectors w/ sparse interpretable document representations
@chrisemoody
Example Hacker News comments
Word vectors:
https://github.com/cemoody/
lda2vec/blob/master/examples/
hacker_news/lda2vec/
word_vectors.ipynb
read examples

play around at home
@chrisemoody
Example Hacker News comments
Topics:
http://nbviewer.jupyter.org/github/cemoody/lda2vec/blob/master/examples/
hacker_news/lda2vec/lda2vec.ipynb
http://nbviewer.jupyter.org/github/cemoody/lda2vec/blob/master/examples/hacker_news/lda2vec/lda2vec.ipynb

topic 16 — sci, phys

topic 1 — housing

topic 8 — finance, bitcoin

topic 23 — programming languages

topic 6 — transportation

topic 3 — education
@chrisemoody
lda2vec.com
http://nbviewer.jupyter.org/github/cemoody/lda2vec/blob/master/examples/twenty_newsgroups/lda.ipynb
+ API docs
+ Examples
+ GPU
+ Tests
@chrisemoody
lda2vec.com
http://nbviewer.jupyter.org/github/cemoody/lda2vec/blob/master/examples/twenty_newsgroups/lda.ipynb
@chrisemoody
lda2vec.com
If you want…
human-interpretable doc topics, use LDA.
machine-useable word-level features, use word2vec.
if you like to experiment a lot, and have topics over user / doc / region / etc. features, use lda2vec.
(and you have a GPU)
http://nbviewer.jupyter.org/github/cemoody/lda2vec/blob/master/examples/twenty_newsgroups/lda.ipynb
?@chrisemoody
Multithreaded
Stitch Fix
@chrisemoody
lda2vec.com
http://nbviewer.jupyter.org/github/cemoody/lda2vec/blob/master/examples/twenty_newsgroups/lda.ipynb
Credit
Large swathes of this talk are from
previous presentations by:
• Tomas Mikolov
• David Blei
• Christopher Olah
• Radim Rehurek
• Omer Levy & Yoav Goldberg
• Richard Socher
• Xin Rong
• Tim Hopper
Richard Socher & Xin Rong both give lucid explanations of the word2vec gradient
“PS! Thank you for such an awesome idea”
@chrisemoody
doc_id=1846
Can we model topics to sentences?
lda2lstm
Data Labs @ SF is all about mixing cutting-edge algorithms, but we absolutely need interpretability.

initial vector is a Dirichlet mixture — moves us from bag of words to sentence-level LDA

give a sentence that's 80% religion, 10% politics

word2vec on the word level, LSTM on the sentence level, LDA on the document level

Dirichlet-squeeze the internal states and manipulations; that'll help us understand the science of LSTM dynamics
Can we model topics to sentences?
lda2lstm
“PS! Thank you for such an awesome idea”doc_id=1846
@chrisemoody
Can we model topics to images?
lda2ae
TJ Torres
and now for something completely crazy
4. Fun Stuff
translation
(using just a rotation matrix)
Mikolov 2013
English → Spanish
Matrix Rotation
Blow mind

Explain plot

Not a complicated NN here

Still have to learn the rotation matrix — but it generalizes very nicely.

Have analogies for every linalg op as a linguistic operator

Robust framework and tools to do science on words
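A sketch of learning that rotation from a seed dictionary via orthogonal Procrustes; X and Y below are toy stand-ins for aligned English/Spanish word vectors:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))                  # English vectors, one per word
R, _ = np.linalg.qr(rng.normal(size=(50, 50)))  # a hidden 'true' rotation
Y = X @ R                                       # Spanish vectors (toy: exact)

U, _, Vt = np.linalg.svd(X.T @ Y)               # Procrustes solution
W = U @ Vt                                      # the learned rotation matrix
np.allclose(X @ W, Y, atol=1e-6)                # True: translate by applying W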
deepwalk
Perozzi et al. 2014
learn word vectors from
sentences
“The fox jumped over the lazy dog”
vOUT vOUT vOUT vOUT vOUT vOUT
‘words’ are graph vertices
‘sentences’ are random walks on the
graph
word2vec
Playlists at
Spotify
context
sequence
learning
‘words’ are song indices
‘sentences’ are playlists
Playlists at
Spotify
context
Erik
Bernhardsson
Great performance on ‘related artists’
Fixes at
Stitch Fix
sequence
learning
Let’s try:
‘words’ are items
‘sentences’ are fixes
Fixes at
Stitch Fix
context
Learn similarity between styles
because they co-occur
Learn ‘coherent’ styles
sequence
learning
Fixes at
Stitch Fix?
context
sequence
learning
Got lots of structure!
Fixes at
Stitch Fix?
context
sequence
learning
Fixes at
Stitch Fix?
context
sequence
learning
Nearby regions are
consistent ‘closets’
What sorts of sequences do you have at Quora? What kinds of things can you learn from context?
?@chrisemoody
Multithreaded
Stitch Fix
context
dependent
Levy & Goldberg 2014
Australian scientist discovers star with telescope
context +/- 2 words
context
dependent
context
Australian scientist discovers star with telescope
Levy & Goldberg 2014
What if we
context
dependent
context
Australian scientist discovers star with telescope
context
Levy & Goldberg 2014
context
dependent
context
BoW DEPS
topically-similar vs ‘functionally’ similar
Levy & Goldberg 2014
?@chrisemoody
Multithreaded
Stitch Fix
Crazy
Approaches
Paragraph Vectors
(Just extend the context window)
Content dependency
(Change the window grammatically)
Social word2vec (deepwalk)
(Sentence is a walk on the graph)
Spotify
(Sentence is a playlist of song_ids)
Stitch Fix
(Sentence is a shipment of five items)
See previous
CBOW
“The fox jumped over the lazy dog”
Guess the word
given the context
~20x faster.
(this is the alternative.)
vOUT
vIN vIN vIN vIN vIN vIN
SkipGram
“The fox jumped over the lazy dog”
vOUT vOUT
vIN
vOUT vOUT vOUT vOUT
Guess the context
given the word
Better at syntax.
(this is the one we went over)
CBOW sums word vectors, losing the order in the sentence

Both are good at semantic relationships

	Child and kid are nearby

	Or gender in man, woman

	If you blur words over the scale of context — 5-ish words, you lose a lot of grammatical nuance

But skipgram preserves order

	Preserves the relationship in pluralizing, for example
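Side by side on one window, as a sketch (tokenization and window size are illustrative):

tokens = "The fox jumped over the lazy dog".split()
i, window = 3, 2                          # pivot = 'over'
ctx = [tokens[j] for j in range(i - window, i + window + 1) if j != i]

skipgram = [(tokens[i], c) for c in ctx]  # pivot predicts each context word
cbow = (ctx, tokens[i])                   # summed context predicts the pivot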
lda2vec
vDOC = a vtopic1 + b vtopic2 +…
Let’s make vDOC sparse
Too many documents. I really like that document X is 70% in topic 0, 30% in topic 1, …
lda2vec
This works! 😀 But vDOC isn’t as
interpretable as the topic vectors. 😔
vDOC = topic0 + topic1
Let’s say that vDOC adds
Too many documents. I really like that document X is 70% in topic 0, 30% in topic 1, …
lda2vec
softmax(vOUT * (vIN+ vDOC))
we want k *sparse* topics
Shows that the many words similar to vacation actually come in lots of flavors

— wedding words (bachelorette, rehearsals)

— holiday/event words (birthdays, brunch, christmas, thanksgiving)

— seasonal words (spring, summer,)

— trip words (getaway)

— destinations
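A sketch of that prediction with toy sizes (2 topics, 8 hidden units, a 4-word vocabulary; values are illustrative, not trained):

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
topics = rng.normal(size=(2, 8))    # k topic vectors, same space as words
a, b = 0.7, 0.3                     # sparse, interpretable mixture weights
vDOC = a * topics[0] + b * topics[1]
vIN = rng.normal(size=8)            # pivot word vector
vOUT = rng.normal(size=(4, 8))      # output vectors, one row per vocab word
softmax(vOUT @ (vIN + vDOC))        # P(target word | pivot, document)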
theory of lda2vec
lda2vec
pyLDAvis of lda2vec
lda2vec
LDA
Results
context
History
I loved every choice in this fix!! Great job!
Great Stylist Perfect
There are k tags:

Issues, "Cancel Disappointed", Delivery, "Profile, Pinterest", "Weather Vacation", "Corrections for Next", "Wardrobe Mix", "Requesting Specific", "Requesting Department", "Requesting Style", "Style, Positive", "Style, Neutral"
LDA
Results
context
History
Body Fit
My measurements are 36-28-32. If that helps.
I like wearing some clothing that is fitted.
Very hard for me to find pants that fit right.
LDA
Results
context
History
Sizing
Really enjoyed the experience and the
pieces, sizing for tops was too big.
Looking forward to my next box!
Excited for next
LDA
Results
context
History
Almost Bought
It was a great fix. Loved the two items I
kept and the three I sent back were close!
Perfect
All of the following ideas will change what
‘words’ and ‘context’ represent.
But we’ll still use the same w2v algo
paragraph
vector
What about summarizing documents?
On the day he took office, President Obama reached out to America’s enemies,
offering in his first inaugural address to extend a hand if you are willing to unclench
your fist. More than six years later, he has arrived at a moment of truth in testing that
The framework nuclear agreement he reached with Iran on Thursday did not provide
the definitive answer to whether Mr. Obama’s audacious gamble will pay off. The fist
Iran has shaken at the so-called Great Satan since 1979 has not completely relaxed.
paragraph
vector Normal skipgram extends C words before, and C words after.
IN
OUT OUT
Except we stay inside a sentence
paragraph
vector A document vector simply extends the context to the whole document.
IN
OUT OUT
OUT OUTdoc_1347
from gensim.models import Doc2Vec
fn = "item_document_vectors"
model = Doc2Vec.load(fn)
matches = model.most_similar('pregnant')
matches = list(filter(lambda x: 'SENT_' in x[0], matches))
# ['...I am currently 23 weeks pregnant...',
#  "...I'm now 10 weeks pregnant...",
#  '...not showing too much yet...',
#  '...15 weeks now. Baby bump...',
#  '...6 weeks post partum!...',
#  '...12 weeks postpartum and am nursing...',
#  '...I have my baby shower that...',
#  '...am still breastfeeding...',
#  '...I would love an outfit for a baby shower...']
sentence
search

Mais conteúdo relacionado

Destaque

Discussion on the Distributed Search Engine
Discussion on the Distributed Search EngineDiscussion on the Distributed Search Engine
Discussion on the Distributed Search EngineYusuke Fujisaka
 
Fabrikatyr lda topic modelling practical application
Fabrikatyr lda topic modelling practical applicationFabrikatyr lda topic modelling practical application
Fabrikatyr lda topic modelling practical applicationTim Carnus
 
Topic Modelling to identify behavioral trends in online communities
Topic Modelling to identify behavioral trends in online communities Topic Modelling to identify behavioral trends in online communities
Topic Modelling to identify behavioral trends in online communities Conor Duke
 
Drawing word2vec
Drawing word2vecDrawing word2vec
Drawing word2vecKai Sasaki
 
EMNLP2014読み会 "Efficient Non-parametric Estimation of Multiple Embeddings per ...
EMNLP2014読み会 "Efficient Non-parametric Estimation of Multiple Embeddings per ...EMNLP2014読み会 "Efficient Non-parametric Estimation of Multiple Embeddings per ...
EMNLP2014読み会 "Efficient Non-parametric Estimation of Multiple Embeddings per ...Yuya Unno
 
[Yang, Downey and Boyd-Graber 2015] Efficient Methods for Incorporating Knowl...
[Yang, Downey and Boyd-Graber 2015] Efficient Methods for Incorporating Knowl...[Yang, Downey and Boyd-Graber 2015] Efficient Methods for Incorporating Knowl...
[Yang, Downey and Boyd-Graber 2015] Efficient Methods for Incorporating Knowl...Shuyo Nakatani
 
Running Word2Vec with Chinese Wikipedia dump
Running Word2Vec with Chinese Wikipedia dumpRunning Word2Vec with Chinese Wikipedia dump
Running Word2Vec with Chinese Wikipedia dumpBilly Yang
 
LDA Beginner's Tutorial
LDA Beginner's TutorialLDA Beginner's Tutorial
LDA Beginner's TutorialWayne Lee
 

Destaque (10)

Discussion on the Distributed Search Engine
Discussion on the Distributed Search EngineDiscussion on the Distributed Search Engine
Discussion on the Distributed Search Engine
 
P2p search engine
P2p search engineP2p search engine
P2p search engine
 
Fabrikatyr lda topic modelling practical application
Fabrikatyr lda topic modelling practical applicationFabrikatyr lda topic modelling practical application
Fabrikatyr lda topic modelling practical application
 
Topic Modelling to identify behavioral trends in online communities
Topic Modelling to identify behavioral trends in online communities Topic Modelling to identify behavioral trends in online communities
Topic Modelling to identify behavioral trends in online communities
 
Drawing word2vec
Drawing word2vecDrawing word2vec
Drawing word2vec
 
EMNLP2014読み会 "Efficient Non-parametric Estimation of Multiple Embeddings per ...
EMNLP2014読み会 "Efficient Non-parametric Estimation of Multiple Embeddings per ...EMNLP2014読み会 "Efficient Non-parametric Estimation of Multiple Embeddings per ...
EMNLP2014読み会 "Efficient Non-parametric Estimation of Multiple Embeddings per ...
 
Emnlp読み会資料
Emnlp読み会資料Emnlp読み会資料
Emnlp読み会資料
 
[Yang, Downey and Boyd-Graber 2015] Efficient Methods for Incorporating Knowl...
[Yang, Downey and Boyd-Graber 2015] Efficient Methods for Incorporating Knowl...[Yang, Downey and Boyd-Graber 2015] Efficient Methods for Incorporating Knowl...
[Yang, Downey and Boyd-Graber 2015] Efficient Methods for Incorporating Knowl...
 
Running Word2Vec with Chinese Wikipedia dump
Running Word2Vec with Chinese Wikipedia dumpRunning Word2Vec with Chinese Wikipedia dump
Running Word2Vec with Chinese Wikipedia dump
 
LDA Beginner's Tutorial
LDA Beginner's TutorialLDA Beginner's Tutorial
LDA Beginner's Tutorial
 

Semelhante a Lda2vec text by the bay 2016 with notes

Yoav Goldberg: Word Embeddings What, How and Whither
Yoav Goldberg: Word Embeddings What, How and WhitherYoav Goldberg: Word Embeddings What, How and Whither
Yoav Goldberg: Word Embeddings What, How and WhitherMLReview
 
Skip-gram Model Broken Down
Skip-gram Model Broken DownSkip-gram Model Broken Down
Skip-gram Model Broken DownChin Huan Tan
 
Recipe2Vec: Or how does my robot know what’s tasty
Recipe2Vec: Or how does my robot know what’s tastyRecipe2Vec: Or how does my robot know what’s tasty
Recipe2Vec: Or how does my robot know what’s tastyPyData
 
Word_Embeddings.pptx
Word_Embeddings.pptxWord_Embeddings.pptx
Word_Embeddings.pptxGowrySailaja
 
Interview presentation
Interview presentationInterview presentation
Interview presentationJoseph Gubbins
 
Word embeddings
Word embeddingsWord embeddings
Word embeddingsShruti kar
 
CS571: Distributional semantics
CS571: Distributional semanticsCS571: Distributional semantics
CS571: Distributional semanticsJinho Choi
 
DF1 - Py - Kalaidin - Introduction to Word Embeddings with Python
DF1 - Py - Kalaidin - Introduction to Word Embeddings with PythonDF1 - Py - Kalaidin - Introduction to Word Embeddings with Python
DF1 - Py - Kalaidin - Introduction to Word Embeddings with PythonMoscowDataFest
 
Word Embeddings - Introduction
Word Embeddings - IntroductionWord Embeddings - Introduction
Word Embeddings - IntroductionChristian Perone
 
深層意味表現学習 (Deep Semantic Representations)
深層意味表現学習 (Deep Semantic Representations)深層意味表現学習 (Deep Semantic Representations)
深層意味表現学習 (Deep Semantic Representations)Danushka Bollegala
 
Paper dissected glove_ global vectors for word representation_ explained _ ...
Paper dissected   glove_ global vectors for word representation_ explained _ ...Paper dissected   glove_ global vectors for word representation_ explained _ ...
Paper dissected glove_ global vectors for word representation_ explained _ ...Nikhil Jaiswal
 
Deep learning Malaysia presentation 12/4/2017
Deep learning Malaysia presentation 12/4/2017Deep learning Malaysia presentation 12/4/2017
Deep learning Malaysia presentation 12/4/2017Brian Ho
 
Word Embeddings, why the hype ?
Word Embeddings, why the hype ? Word Embeddings, why the hype ?
Word Embeddings, why the hype ? Hady Elsahar
 
Computational Techniques for the Statistical Analysis of Big Data in R
Computational Techniques for the Statistical Analysis of Big Data in RComputational Techniques for the Statistical Analysis of Big Data in R
Computational Techniques for the Statistical Analysis of Big Data in Rherbps10
 

Semelhante a Lda2vec text by the bay 2016 with notes (20)

Yoav Goldberg: Word Embeddings What, How and Whither
Yoav Goldberg: Word Embeddings What, How and WhitherYoav Goldberg: Word Embeddings What, How and Whither
Yoav Goldberg: Word Embeddings What, How and Whither
 
Skip-gram Model Broken Down
Skip-gram Model Broken DownSkip-gram Model Broken Down
Skip-gram Model Broken Down
 
Word2vec and Friends
Word2vec and FriendsWord2vec and Friends
Word2vec and Friends
 
Skip gram and cbow
Skip gram and cbowSkip gram and cbow
Skip gram and cbow
 
Recipe2Vec: Or how does my robot know what’s tasty
Recipe2Vec: Or how does my robot know what’s tastyRecipe2Vec: Or how does my robot know what’s tasty
Recipe2Vec: Or how does my robot know what’s tasty
 
Word_Embeddings.pptx
Word_Embeddings.pptxWord_Embeddings.pptx
Word_Embeddings.pptx
 
Interview presentation
Interview presentationInterview presentation
Interview presentation
 
Word embeddings
Word embeddingsWord embeddings
Word embeddings
 
CS571: Distributional semantics
CS571: Distributional semanticsCS571: Distributional semantics
CS571: Distributional semantics
 
Word2Vec
Word2VecWord2Vec
Word2Vec
 
DISMATH_Part1
DISMATH_Part1DISMATH_Part1
DISMATH_Part1
 
DF1 - Py - Kalaidin - Introduction to Word Embeddings with Python
DF1 - Py - Kalaidin - Introduction to Word Embeddings with PythonDF1 - Py - Kalaidin - Introduction to Word Embeddings with Python
DF1 - Py - Kalaidin - Introduction to Word Embeddings with Python
 
Word Embeddings - Introduction
Word Embeddings - IntroductionWord Embeddings - Introduction
Word Embeddings - Introduction
 
Majorfinal
MajorfinalMajorfinal
Majorfinal
 
深層意味表現学習 (Deep Semantic Representations)
深層意味表現学習 (Deep Semantic Representations)深層意味表現学習 (Deep Semantic Representations)
深層意味表現学習 (Deep Semantic Representations)
 
Paper dissected glove_ global vectors for word representation_ explained _ ...
Paper dissected   glove_ global vectors for word representation_ explained _ ...Paper dissected   glove_ global vectors for word representation_ explained _ ...
Paper dissected glove_ global vectors for word representation_ explained _ ...
 
Deep learning Malaysia presentation 12/4/2017
Deep learning Malaysia presentation 12/4/2017Deep learning Malaysia presentation 12/4/2017
Deep learning Malaysia presentation 12/4/2017
 
Word embeddings
Word embeddingsWord embeddings
Word embeddings
 
Word Embeddings, why the hype ?
Word Embeddings, why the hype ? Word Embeddings, why the hype ?
Word Embeddings, why the hype ?
 
Computational Techniques for the Statistical Analysis of Big Data in R
Computational Techniques for the Statistical Analysis of Big Data in RComputational Techniques for the Statistical Analysis of Big Data in R
Computational Techniques for the Statistical Analysis of Big Data in R
 

Último

IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfDaniel Santiago Silva Capera
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding TeamAdam Moalla
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureEric D. Schabell
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesDavid Newbury
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemAsko Soukka
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024D Cloud Solutions
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Will Schroeder
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1DianaGray10
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsSeth Reyes
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationIES VE
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Adtran
 
Things you didn't know you can use in your Salesforce
Things you didn't know you can use in your SalesforceThings you didn't know you can use in your Salesforce
Things you didn't know you can use in your SalesforceMartin Humpolec
 
Babel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptxBabel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptxYounusS2
 
RAG Patterns and Vector Search in Generative AI
RAG Patterns and Vector Search in Generative AIRAG Patterns and Vector Search in Generative AI
RAG Patterns and Vector Search in Generative AIUdaiappa Ramachandran
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UbiTrack UK
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Commit University
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7DianaGray10
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6DianaGray10
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8DianaGray10
 

Último (20)

IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond Ontologies
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystem
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and Hazards
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™
 
Things you didn't know you can use in your Salesforce
Things you didn't know you can use in your SalesforceThings you didn't know you can use in your Salesforce
Things you didn't know you can use in your Salesforce
 
Babel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptxBabel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptx
 
RAG Patterns and Vector Search in Generative AI
RAG Patterns and Vector Search in Generative AIRAG Patterns and Vector Search in Generative AI
RAG Patterns and Vector Search in Generative AI
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8
 

Lda2vec text by the bay 2016 with notes

  • 23. word2vec P(w|c) c w “The fox jumped over the lazy dog” Extract pairs from context window around every input word. These (context, word) pairs are called skip-grams. …So that at a high level is what we want word2vec to do. In code it’s just two for-loops: one over the input words, one over the window.
  • 24. objective Measure loss between w and c? How should we define P(w|c)? Now we’ve defined the high-level update path for the algorithm; we need to define this probability exactly in order to define our updates. It boils down to the difference between the word & context vectors — we want to make them as similar as possible, so that the probability goes up.
  • 25. objective w · c How should we define P(w|c)? Measure loss between w and c? Use cosine similarity. (You could imagine Euclidean distance or Mahalanobis distance instead.)
  • 26. word2vec w · c ~ 1 objective w c vcanada · vsnow ~ 1 The dot product has these properties: similar vectors have similarity near 1 (if normed).
  • 27. word2vec w · c ~ 0 objective w c vcanada · vdesert ~ 0 Orthogonal vectors have similarity near 0.
  • 28. word2vec w · c ~ -1 objective w c Opposite vectors have similarity near -1.
  • 29. word2vec w · c ∈ [-1, 1] objective But the inner product ranges from -1 to 1 (when normalized).
  • 30. word2vec But we’d like to measure a probability. w · c ∈ [-1, 1] objective
  • 31. word2vec But we’d like to measure a probability. objective σ(c·w) ∈ [0, 1] Transform again using a sigmoid.
  • 32. word2vec But we’d like to measure a probability. objective σ(c·w) ∈ [0, 1] w c similar → near 1; w c dissimilar → near 0. Transform again using a sigmoid.
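To make the sigmoid-of-dot-product step concrete, here’s a minimal sketch with toy vectors of my own (not from the talk); the only point is that similar pairs land above 0.5 and dissimilar pairs below:

```python
import numpy as np

def sigmoid(x):
    # squashes any real number into (0, 1), so it can act like a probability
    return 1.0 / (1.0 + np.exp(-x))

def unit(v):
    return v / np.linalg.norm(v)

# hypothetical vectors, chosen only for illustration
v_canada = np.array([0.9, 0.1, 0.2])
v_snow = np.array([0.8, 0.2, 0.1])
v_desert = np.array([-0.7, 0.4, -0.3])

print(sigmoid(unit(v_canada) @ unit(v_snow)))    # ~0.73: likely pair
print(sigmoid(unit(v_canada) @ unit(v_desert)))  # ~0.31: unlikely pair
```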
  • 33. word2vec Loss function: objective L = σ(c·w) A logistic (binary) choice: is this (context, word) combination from our dataset? Are these two similar? But this is trivially optimized by making all vectors identical… then everything looks exactly the same.
  • 34. word2vec The skip-gram negative-sampling model objective L = σ(c·w) w c The trivial solution is that context = word for all vectors — no contrast! Add negative samples.
  • 35. word2vec The skip-gram negative-sampling model L = σ(c·w) + σ(-c·wneg) objective Draw random words from the vocabulary. No contrast otherwise — add negative samples: discriminate whether this (w, c) pair came from the observed corpus or uses a randomly drawn word. Example: (fox, jumped) is observed, but (fox, career) is a negative draw. No popularity guessing as in a full softmax.
  • 36. word2vec The skip-gram negative-sampling model objective Discriminate positive from negative samples. Multiple negatives: L = σ(c·w) + σ(-c·wneg) + … + σ(-c·wneg). Multiple samples: (fox, jumped); not: (fox, career), (fox, android), (fox, sunlight). Reminder: this L is being computed, and we’ll nudge all of the values of c & w via gradient descent to optimize L. So that’s SGNS / w2v. That’s it! It’s a bit of a disservice to portray this as a giant network.
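As a sketch of that objective (my own illustration, not code from the talk — in the papers it’s a sum of *log* sigmoids, which is what’s written here):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_loss(w, c, negatives):
    """Skip-gram negative-sampling loss for one observed (word, context) pair.

    w, c      -- vectors for the observed pair, e.g. (fox, jumped)
    negatives -- vectors of k randomly drawn words, e.g. career, android
    """
    positive = np.log(sigmoid(c @ w))                             # pull w and c together
    negative = sum(np.log(sigmoid(-c @ wn)) for wn in negatives)  # push negatives away
    return -(positive + negative)  # negated so gradient descent can minimize it
```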
  • 37. word2vec The SGNS model PMI ci·wj = PMI(Mij) - log k …is extremely similar to matrix factorization! Levy & Goldberg 2014. Here M is the co-occurrence matrix: rows are contexts ci, columns are words wj, entries are counts. E.g. we can solve word2vec via SVD, deterministically, if we want. One of the most cited NLP papers of 2014.
  • 38. word2vec The SGNS model PMI Levy & Goldberg 2014 ‘traditional’ NLP L = σ(c·w) + σ(-c·wneg) ci·wj = PMI(Mij) - log k …is extremely similar to matrix factorization! (Indices dropped for readability.) Most cited because of the connection to PMI — an info-theoretic measure of the association of (w and c).
  • 39. word2vec The SGNS model L = σ(c·w) + Σσ(-c·w) PMI Levy & Goldberg 2014 ‘traditional’ NLP ci·wj = log [ (#(ci,wj)/n) / ( k · (#(ci)/n) · (#(wj)/n) ) ] Instead of looping over all words, you can count and roll them up; the probabilities are just counts divided by the number of observations. word2vec adds this bias term (- log k). It’s cool that all of the for-looping we did before reduces to this form.
  • 40. word2vec The SGNS model L = σ(c·w) + Σσ(-c·w) PMI Levy & Goldberg 2014 ‘traditional’ NLP ci·wj = log [ popularity of (c,w) / ( k · (popularity of c) · (popularity of w) ) ] The probabilities are just counts divided by the number of observations; word2vec adds this bias term, which downweights rare terms. More frequent words are weighted more than infrequent words — makes sense given that the word distribution follows Zipf’s law. This weighting is what makes SGNS so powerful. The theory is nice, but the practice is even better.
  • 41. word2vec PMI 99% of word2vec is counting. And you can count words in SQL.
  • 42. word2vec PMI Count how many times you saw the pair (c, w). Count how many times you saw c. Count how many times you saw w.
  • 43. word2vec PMI …and this takes ~5 minutes to compute on a single core. Computing the SVD is a completely standard math-library call.
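A rough sketch of that counting route on a toy corpus (my own illustration — a real pipeline would do the counts in SQL and use a sparse, truncated SVD):

```python
from collections import Counter
import numpy as np

tokens = "the fox jumped over the lazy dog".split()
window, k = 2, 5  # context window size; k negatives gives the "- log k" shift

# the "99% counting" part: pair counts and word counts
pair_counts, word_counts = Counter(), Counter(tokens)
for i, w in enumerate(tokens):
    for c in tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]:
        pair_counts[(c, w)] += 1

n_pairs, n_words = sum(pair_counts.values()), sum(word_counts.values())
vocab = sorted(word_counts)
idx = {t: j for j, t in enumerate(vocab)}

# shifted PMI matrix; clipping at 0 ("positive PMI") is a common practical choice
M = np.zeros((len(vocab), len(vocab)))
for (c, w), cnt in pair_counts.items():
    pmi = np.log((cnt / n_pairs) /
                 ((word_counts[c] / n_words) * (word_counts[w] / n_words)))
    M[idx[c], idx[w]] = max(pmi - np.log(k), 0.0)

# factorize deterministically with a completely standard SVD
U, S, Vt = np.linalg.svd(M)
word_vectors = U * np.sqrt(S)  # one common way to split the singular values
```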
  • 46. Showing just 2 of the ~500 dimensions — effectively we’ve PCA’d it — and only 4 of 100k words.
  • 57. If we only had locality and not regularity, this wouldn’t necessarily be true
  • 60. So we live in a vector space where operations like addition and subtraction are semantically meaningful. So here’s a few examples of this working. Really get the idea of these vectors as being ‘mixes’ of other ideas & vectors
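A minimal sketch of that arithmetic (the toy embeddings here are random, so this only shows the mechanics; with well-trained vectors the king - man + woman query famously lands near “queen”):

```python
import numpy as np

# hypothetical toy embeddings; a real trained model would supply these
vectors = {w: np.random.randn(50) for w in ["king", "man", "woman", "queen"]}

def most_similar(query, vectors, exclude=()):
    """Word whose vector has the highest cosine similarity to the query."""
    best, best_sim = None, -np.inf
    for word, v in vectors.items():
        if word in exclude:
            continue
        sim = (query @ v) / (np.linalg.norm(query) * np.linalg.norm(v))
        if sim > best_sim:
            best, best_sim = word, sim
    return best

query = vectors["king"] - vectors["man"] + vectors["woman"]
print(most_similar(query, vectors, exclude={"king", "man", "woman"}))
```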
  • 61. ITEM_3469 + ‘Pregnant’ SF is a personal service — items arrive in a box (a ‘fix’).
  • 62. + ‘Pregnant’ “I love the stripes and the cut around my neckline was amazing” — someone else might write ‘grey and black’; there’s subtlety and nuance in that language. For some items, we have several times the collected works of Shakespeare.
  • 64. Stripes are safe for maternity — and also similar tones and flowy pieces — still great for expecting mothers.
  • 66. LDA on Client Item Descriptions This shows the incredible amount of structure
  • 67. LDA on Item Descriptions (with Jay) clunky jewelry here, dangling delicate jewelry elsewhere
  • 68. LDA on Item Descriptions (with Jay) topics on patterns, styles — this cluster is similarly described as high contrast tops with popping colors
  • 69. LDA on Item Descriptions (with Jay) bright dresses for a warm summer LDA helps us model topics over documents in an interpretable way
  • 72. word2vec is local: one word predicts a nearby word. “I love finding new designer brands for jeans” — as if the world were one very long text string: no end of documents, no end of sentence, etc., just a window sliding across words.
  • 73. “I love finding new designer brands for jeans” But text is usually organized — word2vec treats it as if the world were one very long text string: no end of documents, no end of sentence, etc.
  • 75. “I love finding new designer brands for jeans” In LDA, documents globally predict words. doc 7681 — these are client comments, which are short and only predict dozens of words, but they could be legal documents or medical documents with 10k words.
  • 76. typical LDA document vector [ 0%, 9%, 78%, 11%] — all sum to 100%. typical word2vec vector [ -0.75, -1.25, -0.55, -0.12, +2.2] — all real values.
  • 77. 5D LDA document vector [ 0%, 9%, 78%, 11%] — sparse, all sum to 100%, dimensions are absolute. 5D word2vec vector [ -0.75, -1.25, -0.55, -0.12, +2.2] — dense, all real values, dimensions relative. It’s much easier to say to another human “78%” than “+2.2 of something and -1.25 of something else”. w2v is like an address — 200 Main St. — you figure it out from the neighbors. LDA is a *mixture* model: 78% of some ingredient; w2v isn’t -1.25 of some ingredient. Ingredients = topics.
  • 78. 100D LDA document vector [ 0%0%0%0%0% … 0%, 9%, 78%, 11%] — sparse, all sum to 100%, dimensions are absolute. 100D word2vec vector [ -0.75, -1.25, -0.55, -0.27, -0.94, 0.44, 0.05, 0.31 … -0.12, +2.2] — dense, all real values, dimensions relative. LDA is sparse: ~95 of the 100 dims are close to zero.
  • 79. 100D LDA document vector [ 0%0%0%0%0% … 0%, 9%, 78%, 11%] — +mixture, +sparse: similar in fewer ways (more interpretable). 100D word2vec vector [ -0.75, -1.25, -0.55, -0.27, -0.94, 0.44, 0.05, 0.31 … -0.12, +2.2] — similar in 100D ways (very flexible).
  • 80. Can we do both? lda2vec — a series of experiments; take it with a grain of salt.
  • 81. [Diagram: skip-grams from sentences → word vector (#hidden units) → negative sampling loss, over the example sentence “Lufthansa is a German airline and when…”] word2vec predicts locally: one word predicts a nearby word. We extract pairs of pivot and target words that occur in a moving window that scans across the corpus; for every pair, the pivot word is used to predict the nearby target word.
  • 82. [Diagram: skip-grams from sentences; document weight → (softmax) → document proportion × topic matrix → document vector; context vector = word vector + document vector → negative sampling loss] A document vector predicts a word from a global context. German is close to French or Spanish; adding airlines gives a document vector similar to the word vector for airline. We know we can add vectors — German + airline is like Lufthansa, Condor Flugdienst, and Aero Lloyd. A latent vector is randomly initialized for every document in the corpus; this is very similar to doc2vec and paragraph vectors.
  • 83. We’re missing mixtures & sparsity! These dense document vectors are good for training sentiment models — great scores — but we lose interpretability.
  • 84. We’re missing mixtures & sparsity! Too many documents — each document vector is about as interpretable as a hash.
  • 85. Now it’s a mixture: document X is +0.34 in topic 0, -0.1 in topic 1, and +0.17 in topic 2, in a model with 3 topics. Before, the document vector had ~500 degrees of freedom; now it has just a few — so you’d better choose really good topics, because only a few are available to summarize the entire document.
  • 86. Trinitarian, baptismal, Pentecostals, Bede, schismatics, excommunication. Each topic has a distributed representation that lives in the same space as the word vectors. While each topic is not literally a token present in the corpus, it is similar to other tokens.
  • 87. topic 1 = “religion”: Trinitarian, baptismal, Pentecostals, Bede, schismatics, excommunication.
  • 88. Milosevic, absentee, Indonesia, Lebanese, Israelis, Karadzic — notice we’re one column over in the topic matrix.
  • 89. topic 2 = “politics”: Milosevic, absentee, Indonesia, Lebanese, Israelis, Karadzic. Topic vectors, document vectors, and word vectors all live in the same space.
  • 91. The document weights are softmax-transformed to yield the document proportions — similar to the logistic function, but instead of a single value in 0–1 we get a vector of percentages summing to 100%, indicating the topic proportions of a single document. One document might be 41% in topic 0, 26% in topic 1, and 34% in topic 2 — very close to LDA-like representations.
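Concretely, a short sketch of that transform — the 0.34 / -0.1 / 0.17 weights are the ones on the slide, and the softmax does reproduce the 41% / 26% / 34% proportions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # subtract the max for numerical stability
    return e / e.sum()

doc_weights = np.array([0.34, -0.10, 0.17])  # unconstrained document weights
print(softmax(doc_weights))                  # ~[0.41, 0.26, 0.34], sums to 1
```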
  • 92. The first time I did this, it was still very dense in percentages: with 100 topics, each might sit near 1%. Mathematically it works — an addition of lots of little bits (distributed) — but a zillion categories at 1% each is still dense!
  • 93. Sparsity! Over training time: t=0 → [34%, 32%, 34%]; t=10 → [41%, 26%, 34%]; t=∞ → [99%, 1%, 0%]. The initialization is balanced but dense; a Dirichlet likelihood loss encourages the proportion vectors to become sparser over time — relatively simple: start pink (dense), end white (sparse).
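A sketch of one such penalty, assuming a symmetric Dirichlet prior with concentration alpha < 1 (the alpha and strength values here are illustrative, not the talk’s settings):

```python
import numpy as np

def dirichlet_loss(proportions, alpha=0.7, strength=1.0):
    """Negative Dirichlet log-likelihood of a topic-proportion vector.

    With alpha < 1 the prior favors the corners of the simplex, so
    minimizing this term (alongside the word-prediction loss, which
    keeps it from collapsing) nudges proportions toward sparsity.
    """
    return -strength * (alpha - 1.0) * np.sum(np.log(proportions + 1e-12))

print(dirichlet_loss(np.array([0.34, 0.32, 0.34])))    # dense:  higher loss
print(dirichlet_loss(np.array([0.99, 0.009, 0.001])))  # sparse: lower loss
```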
  • 94. We end up with something that’s quite a bit more complicated, but it achieves our goals: it mixes word vectors with sparse, interpretable document representations.
  • 95. @chrisemoody Example Hacker News comments Word vectors: https://github.com/cemoody/lda2vec/blob/master/examples/hacker_news/lda2vec/word_vectors.ipynb Read the examples and play around at home.
  • 96. @chrisemoody Example Hacker News comments Topics: http://nbviewer.jupyter.org/github/cemoody/lda2vec/blob/master/examples/hacker_news/lda2vec/lda2vec.ipynb topic 16 — sci, phys; topic 1 — housing; topic 8 — finance, bitcoin; topic 23 — programming languages; topic 6 — transportation; topic 3 — education
  • 98. + API docs + Examples + GPU + Tests @chrisemoody lda2vec.com http://nbviewer.jupyter.org/github/cemoody/lda2vec/blob/master/examples/twenty_newsgroups/lda.ipynb
  • 99. @chrisemoody lda2vec.com If you want… human-interpretable doc topics: use LDA. Machine-useable word-level features: use word2vec. Lots of experimentation, with topics over user / doc / region / etc. features (and you have a GPU): use lda2vec. http://nbviewer.jupyter.org/github/cemoody/lda2vec/blob/master/examples/twenty_newsgroups/lda.ipynb
  • 102. Credit Large swathes of this talk are from previous presentations by: • Tomas Mikolov • David Blei • Christopher Olah • Radim Rehurek • Omer Levy & Yoav Goldberg • Richard Socher • Xin Rong • Tim Hopper Richard Socher & Xin Rong both have lucid explanations of the word2vec gradient.
  • 103. “PS! Thank you for such an awesome idea” @chrisemoody doc_id=1846 Can we model topics down to sentences? lda2lstm. Data Labs @ SF is all about mixing cutting-edge algorithms, but we absolutely need interpretability. The initial vector is a Dirichlet mixture, which moves us from bag-of-words to sentence-level LDA: generate a sentence that’s 80% religion, 10% politics. word2vec on the word level, an LSTM on the sentence level, LDA on the document level. Dirichlet-squeezing the internal states and manipulations will help us understand the science of LSTM dynamics.
  • 104. Can we model topics to sentences? lda2lstm “PS! Thank you for such an awesome idea”doc_id=1846 @chrisemoody Can we model topics to images? lda2ae TJ Torres
  • 105. and now for something completely crazy 4 Fun Stuff
  • 106. translation (using just a rotation matrix) Mikolov 2013 English → Spanish via a matrix rotation. Not a complicated NN here — you still have to learn the rotation matrix, but it generalizes very nicely. There are analogies for every linalg op as a linguistic operator: a robust framework and tools to do science on words.
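A sketch of that idea in plain numpy, assuming aligned rows of English and Spanish vectors for a small seed dictionary (shapes and data here are made up):

```python
import numpy as np

# hypothetical: 5000 dictionary pairs of 300-d vectors, rows aligned by meaning
X = np.random.randn(5000, 300)  # English word vectors
Y = np.random.randn(5000, 300)  # Spanish word vectors

# unconstrained linear map: min ||X W - Y||^2 (Mikolov-style translation matrix)
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# or constrain it to a pure rotation (orthogonal Procrustes via SVD)
U, _, Vt = np.linalg.svd(X.T @ Y)
R = U @ Vt  # generalizes: translate an unseen word as x_new @ R
```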
  • 107. deepwalk (Perozzi et al. 2014) — word2vec learns word vectors from sentences; here ‘words’ are graph vertices and ‘sentences’ are random walks on the graph.
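A minimal sketch of the deepwalk recipe (my illustration; the graph is a hypothetical toy adjacency dict):

```python
import random

# hypothetical toy graph: vertex -> list of neighbors
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

def random_walk(graph, start, length=10):
    """One 'sentence': a random walk over vertex ids."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(random.choice(graph[walk[-1]]))
    return [str(v) for v in walk]

# 'words' are vertices, 'sentences' are walks; feed these to word2vec unchanged
walks = [random_walk(graph, v) for v in graph for _ in range(5)]
```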
  • 108. Playlists at Spotify context sequence learning ‘words’ are song indices ‘sentences’ are playlists
  • 110. Fixes at Stitch Fix sequence learning Let’s try: ‘words’ are items ‘sentences’ are fixes
  • 111. Fixes at Stitch Fix context Learn similarity between styles because they co-occur Learn ‘coherent’ styles sequence learning
  • 114. Fixes at Stitch Fix? context sequence learning Nearby regions are consistent ‘closets’ What sorts of sequences do you have at Quora? What kinds of things can you learn from context?
  • 117. context-dependent context Australian scientist discovers star with telescope (Levy & Goldberg 2014) What if we…
  • 118. context-dependent context Australian scientist discovers star with telescope context (Levy & Goldberg 2014)
  • 119. context-dependent context BoW vs DEPS — topically-similar vs ‘functionally’ similar (Levy & Goldberg 2014)
  • 122. Crazy Approaches Paragraph Vectors (just extend the context window) Context dependency (change the window grammatically) Social word2vec (deepwalk) (sentence is a walk on the graph) Spotify (sentence is a playlist of song_ids) Stitch Fix (sentence is a shipment of five items)
  • 124. CBOW “The fox jumped over the lazy dog” — guess the word given the context; ~20x faster (this is the alternative). vOUT vIN vIN vIN vIN vIN vIN SkipGram “The fox jumped over the lazy dog” vOUT vOUT vIN vOUT vOUT vOUT vOUT — guess the context given the word; better at syntax (this is the one we went over). CBOW sums the context word vectors, losing word order within the sentence. Both are good at semantic relationships: child and kid are nearby, likewise gender in man, woman. If you blur words over the scale of the context — 5-ish words — you lose a lot of grammatical nuance, but skip-gram preserves order: it preserves the relationship in pluralizing, for example.
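Both flavors are a one-flag switch in common implementations — a sketch assuming gensim 4.x, where sg=1 selects skip-gram and sg=0 selects CBOW:

```python
from gensim.models import Word2Vec  # assuming gensim 4.x

sentences = [["the", "fox", "jumped", "over", "the", "lazy", "dog"]]

# skip-gram: one word predicts its context (the variant this talk walks through)
skipgram = Word2Vec(sentences, vector_size=100, window=5, sg=1,
                    negative=5, min_count=1)

# CBOW: the summed context predicts the word (~20x faster, loses word order)
cbow = Word2Vec(sentences, vector_size=100, window=5, sg=0,
                negative=5, min_count=1)
```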
  • 125. lda2vec vDOC = a·vtopic1 + b·vtopic2 + … Let’s make vDOC sparse. Otherwise there are too many documents — I’d really like to say that document X is 70% in topic 0, 30% in topic 1, …
  • 126. lda2vec This works! 😀 But vDOC isn’t as interpretable as the topic vectors. 😔 vDOC = topic0 + topic1 — let’s say that vDOC is just a sum of topic vectors. There are still too many documents — I’d really like to say that document X is 70% in topic 0, 30% in topic 1, …
  • 127. lda2vec softmax(vOUT · (vIN + vDOC)) — we want k *sparse* topics.
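A sketch of that prediction path with made-up shapes (3 topics, 300 hidden units, a 5000-word output vocabulary — all illustrative):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

doc_weights = np.array([0.34, -0.10, 0.17])  # per-document, learned
doc_proportions = softmax(doc_weights)       # ~[41%, 26%, 34%]; the Dirichlet
                                             # loss pushes these toward sparsity
topic_matrix = np.random.randn(3, 300)       # one vector per topic
v_doc = doc_proportions @ topic_matrix       # vDOC: a mixture of topic vectors
v_in = np.random.randn(300)                  # vIN: the pivot word's vector

V_out = np.random.randn(5000, 300)           # output vectors for the vocabulary
p_target = softmax(V_out @ (v_in + v_doc))   # softmax(vOUT * (vIN + vDOC))
```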
  • 128. Shows that the many words similar to vacation actually come in lots of flavors — wedding words (bachelorette, rehearsals) — holiday/event words (birthdays, brunch, christmas, thanksgiving) — seasonal words (spring, summer) — trip words (getaway) — destinations
  • 131. LDA Results context History “I loved every choice in this fix!! Great job! Great Stylist Perfect” There are k tags: Issues, “Cancel Disappointed”, Delivery, “Profile, Pinterest”, “Weather Vacation”, “Corrections for Next”, “Wardrobe Mix”, “Requesting Specific”, “Requesting Department”, “Requesting Style”, “Style, Positive”, “Style, Neutral”
  • 132. LDA Results context History Body Fit “My measurements are 36-28-32. If that helps. I like wearing some clothing that is fitted. Very hard for me to find pants that fit right.” (same k tags)
  • 133. LDA Results context History Sizing “Really enjoyed the experience and the pieces, sizing for tops was too big. Looking forward to my next box! Excited for next” (same k tags)
  • 134. LDA Results context History Almost Bought “It was a great fix. Loved the two items I kept and the three I sent back were close! Perfect” (same k tags)
  • 135. All of the following ideas will change what ‘words’ and ‘context’ represent. But we’ll still use the same w2v algo
  • 136. paragraph vector What about summarizing documents? On the day he took office, President Obama reached out to America’s enemies, offering in his first inaugural address to extend a hand if you are willing to unclench your fist. More than six years later, he has arrived at a moment of truth in testing that
  • 137. On the day he took office, President Obama reached out to America’s enemies, offering in his first inaugural address to extend a hand if you are willing to unclench your fist. More than six years later, he has arrived at a moment of truth in testing that The framework nuclear agreement he reached with Iran on Thursday did not provide the definitive answer to whether Mr. Obama’s audacious gamble will pay off. The fist Iran has shaken at the so-called Great Satan since 1979 has not completely relaxed. paragraph vector Normal skipgram extends C words before, and C words after. IN OUT OUT Except we stay inside a sentence
  • 138. On the day he took office, President Obama reached out to America’s enemies, offering in his first inaugural address to extend a hand if you are willing to unclench your fist. More than six years later, he has arrived at a moment of truth in testing that The framework nuclear agreement he reached with Iran on Thursday did not provide the definitive answer to whether Mr. Obama’s audacious gamble will pay off. The fist Iran has shaken at the so-called Great Satan since 1979 has not completely relaxed. paragraph vector A document vector simply extends the context to the whole document. IN OUT OUT OUT OUT doc_1347