Distributional approaches are based on a simple hypothesis: the meaning of a word can be inferred from its usage. Applying this idea to the vector space model makes it possible to build a WordSpace in which words are represented as mathematical points in a geometric space. Similar words are represented close to one another in this space, and the definition of “word usage” depends on the definition of the context used to build the space, which can be the whole document, the sentence in which the word occurs, a fixed window of words, or a specific syntactic context. However, in its original formulation WordSpace can take into account only one definition of context at a time. We propose an approach based on vector permutation and Random Indexing to encode several syntactic contexts in a single WordSpace. Moreover, we propose some operations in this space and report the results of an evaluation performed using the GEMS 2011 Shared Evaluation data.
Encoding syntactic dependencies by vector permutation
1. Encoding syntactic dependencies by vector permutation
Pierpaolo Basile, Annalina Caputo and Giovanni Semeraro
Department of Computer Science, University of Bari “Aldo Moro” (Italy)
GEMS 2011: GEometrical Models of Natural Language Semantics
Edinburgh, Scotland - July 31st, 2011
2. Motivation
• meaning is its use
• the meaning of a word is determined by the set of textual contexts in which it appears
• one definition of context at a time
3. Building Blocks
• Random Indexing
• Dependency Parser
• Vector Permutation
4. Random Indexing
• assign a context vector to each context element (e.g. document, passage, term, …)
• the term vector is the sum of the context vectors in which the term occurs
– sometimes the context vector can be boosted by a score (e.g. term frequency, PMI, …)
5. Context Vector
Example: (0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 1, 0, 0, -1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, -1)
• sparse
• high dimensional
• ternary {-1, 0, +1}
• small number of randomly distributed non-zero elements
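A minimal sketch of Random Indexing in Python/NumPy, putting the two slides above together; the dimensionality, number of non-zero elements, and helper names are our illustrative assumptions, not the authors' settings:

    import numpy as np
    from collections import defaultdict

    rng = np.random.default_rng(42)

    def random_context_vector(dim=1000, nonzero=10):
        # sparse, high-dimensional, ternary {-1, 0, +1} vector with a
        # small number of randomly distributed non-zero elements
        v = np.zeros(dim)
        idx = rng.choice(dim, size=nonzero, replace=False)
        v[idx] = rng.choice([-1.0, 1.0], size=nonzero)
        return v

    # one random context vector per context element (here: terms)
    context = defaultdict(random_context_vector)

    # a term vector is the sum of the context vectors in which the term
    # occurs, optionally boosted by a score (term frequency, PMI, ...)
    def term_vector(cooccurrences):
        # cooccurrences: iterable of (context_element, score) pairs
        return sum(score * context[c] for c, score in cooccurrences)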
6. Random Indexing (formal)
B_{n,k} = A_{n,m} R_{m,k}, with k << m
B preserves the distance between points (Johnson-Lindenstrauss lemma): d_r = c · d
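The projection can be checked numerically; a toy sketch (the sizes and sparsity below are arbitrary assumptions) showing that distances in B stay roughly proportional to distances in A:

    import numpy as np

    rng = np.random.default_rng(0)
    n, m, k = 100, 5000, 300                    # k << m

    A = rng.normal(size=(n, m))                 # original points
    R = rng.choice([-1.0, 0.0, 1.0], size=(m, k),
                   p=[0.05, 0.90, 0.05]) / np.sqrt(0.10 * k)
    B = A @ R                                   # B_{n,k} = A_{n,m} R_{m,k}

    d = np.linalg.norm(A[0] - A[1])             # distance in the original space
    dr = np.linalg.norm(B[0] - B[1])            # distance in the reduced space
    print(dr / d)                               # close to 1, i.e. d_r = c · d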
7. Dependency parser
John eats a red apple.
• subject(eats, John)
• object(eats, apple)
• modifier(apple, red)
8. Vector permutation
• using permutation of elements in a random vector to encode several contexts
– right shift of n elements to encode dependents (permutation)
– left shift of n elements to encode heads (inverse permutation)
• choose a different n for each kind of dependency
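A sketch of the two shift functions with np.roll; the per-dependency offsets below are illustrative values, not the ones used by the authors:

    import numpy as np

    # a different shift offset n for each kind of dependency (illustrative)
    OFFSET = {"subj": 1, "obj": 7, "mod": 3, "nn": 5}

    def permute(v, dep):
        # right shift by n: encodes v as a DEPENDENT under relation dep
        return np.roll(v, OFFSET[dep])

    def inverse_permute(v, dep):
        # left shift by n: encodes v as a HEAD under relation dep
        return np.roll(v, -OFFSET[dep])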
9. Method
• assign a context vector to each term
• assign a shift function (Πn) to each kind of dependency
• each term is represented by a vector which is:
– the sum of the permuted vectors of all the dependent terms
– the sum of the inverse permuted vectors of all the head terms
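Combined into a training loop over dependency triples, a sketch under the same assumptions as above (the offsets and dimensionality are ours):

    import numpy as np
    from collections import defaultdict

    DIM = 1000
    rng = np.random.default_rng(1)
    context = defaultdict(
        lambda: rng.choice([-1.0, 0.0, 1.0], size=DIM, p=[0.01, 0.98, 0.01]))
    OFFSET = {"subj": 1, "obj": 7, "mod": 3, "nn": 5}
    term = defaultdict(lambda: np.zeros(DIM))

    def train(triples):
        # triples: iterable of (head, relation, dependent)
        for head, rel, dep in triples:
            # the head sums the permuted vector of its dependent ...
            term[head] += np.roll(context[dep], OFFSET[rel])
            # ... and the dependent the inverse-permuted vector of its head
            term[dep] += np.roll(context[head], -OFFSET[rel])

    train([("eats", "subj", "John"), ("eats", "obj", "apple"),
           ("apple", "mod", "red")])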
10. Example
John eats a red apple
John -> (0, 0, 0, 0, 0, 0, 1, 0, -1, 0)
eat -> (1, 0, 0, 0, -1, 0, 0, 0, 0, 0)
red -> (0, 0, 0, 1, 0, 0, 0, -1, 0, 0)
apple -> (1, 0, 0, 0, 0, 0, 0, -1, 0, 0)
mod -> Π3; obj -> Π7
(apple) = Π3(red) + Π-7(eat) = …
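Worked out with np.roll, assuming (as on slide 8) that Πn is a right shift and Π-n a left shift; the concrete result depends on that direction convention:

    import numpy as np

    red = np.array([0, 0, 0, 1, 0, 0, 0, -1, 0, 0])
    eat = np.array([1, 0, 0, 0, -1, 0, 0, 0, 0, 0])

    apple = np.roll(red, 3) + np.roll(eat, -7)   # Π3(red) + Π-7(eat)
    print(apple)                                 # [-1  0  0  1  0  0  1 -1  0  0]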
12. Output
• R: vector space of random context vectors
• B: vector space of terms
13. Query 1/4
• similarity between terms
– cosine similarity between term vectors in B
– terms are similar if they occur in similar syntactic contexts
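The query is plain cosine similarity over the term vectors in B, e.g.:

    import numpy as np

    def cosine(u, v):
        # terms are similar if their vectors in B point in similar directions
        return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))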
14. Query 2/4
Words similar to “provide”:
offer 0.855
supply 0.819
deliver 0.801
give 0.787
contain 0.784
require 0.782
present 0.778
15. Query 3/4
• similarity between terms exploiting dependencies: what are the objects of the word “provide”?
1. get the term vector for “provide” in B
2. compute the similarity with all the permuted vectors in R, using the permutation assigned to the “obj” relation
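A sketch of that two-step query (hypothetical helper names; `term` holds the term vectors of B, `context` the random vectors of R, and `offset` the per-dependency shifts as above):

    import numpy as np

    def objects_of(word, term, context, offset):
        v = term[word]                       # 1. term vector for word in B
        scores = {}
        for w, r in context.items():         # 2. all permuted vectors in R ...
            p = np.roll(r, offset["obj"])    # ... under the "obj" permutation
            scores[w] = np.dot(v, p) / (np.linalg.norm(v) * np.linalg.norm(p))
        return sorted(scores.items(), key=lambda x: -x[1])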
16. Query 4/4
What are the objects of the word “provide”?
information 0.344
food 0.208
support 0.143
energy 0.143
job 0.142
17. Compositional semantics 1/2
• words are represented in isolation
• representing complex structures (phrases or sentences) is a challenging task
– IR, QA, IE, Text Entailment, …
• how to combine words:
– tensor product of words
– Clark and Pulman suggest taking into account symbolic features (syntactic dependencies)
18. Compositional semantics 2/2
man reads magazine (Clark and Pulman):
man ⊗ subj ⊗ read ⊗ obj ⊗ magazine
19. Similarity between structures
man reads magazine vs. woman browses newspaper
man ⊗ subj ⊗ read ⊗ obj ⊗ magazine
woman ⊗ subj ⊗ browse ⊗ obj ⊗ newspaper
20. …a bit of math
(w1 ⊗ w2) · (w3 ⊗ w4) = (w1 · w3) × (w2 · w4)
man · woman × read · browse × magazine · newspaper
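The identity, i.e. that the inner product of tensor products factorizes into a product of inner products, can be verified numerically:

    import numpy as np

    rng = np.random.default_rng(2)
    w1, w2, w3, w4 = (rng.normal(size=5) for _ in range(4))

    lhs = np.dot(np.outer(w1, w2).ravel(), np.outer(w3, w4).ravel())
    rhs = np.dot(w1, w3) * np.dot(w2, w4)
    assert np.isclose(lhs, rhs)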
21. System setup
• Implemented in Java
• Two corpora:
– TASA: 800K sentences and 9M dependencies
– a portion of ukWaC: 7M sentences and 127M dependencies
– 40,000 most frequent words
• Dependency parser: MINIPAR
22. Evaluation
• GEMS 2011 Shared Task for compositional semantics
– a list of pairs of word combinations
• rated by humans
• 5,833 ratings
• encoded dependencies: subj, obj, mod, nn
– GOAL: compare the system performance against human scores
• Spearman correlation
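With SciPy, for instance (toy numbers, not the GEMS data):

    from scipy.stats import spearmanr

    human = [5.2, 1.3, 4.4, 2.9]       # human ratings
    system = [0.81, 0.12, 0.66, 0.35]  # system similarity scores
    rho, p = spearmanr(human, system)
    print(rho)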
25. Conclusion and Future Work
• Conclusion
– encode syntactic dependencies using vector permutations and Random Indexing
– an early attempt at semantic composition
• Future Work
– deeper evaluation (in vivo)
– a more formal study of semantic composition
– tackle the scalability problem
– try to encode other kinds of context