Distributional approaches are based on a simple hypothesis: the meaning of a word can be inferred from its usage. Applying this idea within the vector space model makes it possible to build a WordSpace in which words are represented as mathematical points in a geometric space. Similar words lie close to each other in this space, and the definition of ``word usage'' depends on the definition of the context used to build the space: the whole document, the sentence in which the word occurs, a fixed window of words, or a specific syntactic context. However, in its original formulation WordSpace can take into account only one definition of context at a time. We propose an approach based on vector permutation and Random Indexing to encode several syntactic contexts in a single WordSpace. Moreover, we propose some operations in this space and report the results of an evaluation performed using the GEMS 2011 Shared Evaluation data.
Encoding syntactic dependencies by vector permutation
1. Encoding syntactic dependencies by vector permutation
Pierpaolo Basile, Annalina Caputo and Giovanni Semeraro
Department of Computer Science, University of Bari “Aldo Moro” (Italy)
GEMS 2011: GEometrical Models of Natural Language Semantics
Edinburgh, Scotland - July 31st, 2011
2. Motivation
• meaning is its use
• the meaning of a word is determined by the set of textual contexts in which it appears
• limitation of the standard WordSpace: only one definition of context at a time
3. Building Blocks
• Random Indexing
• Dependency Parser
• Vector Permutation
4. Random Indexing
• assign a context vector to each context element (e.g. document, passage, term, …)
• the term vector is the sum of the context vectors in which the term occurs (a sketch of this step follows below)
  – sometimes the context vector can be boosted by a score (e.g. term frequency, PMI, …)
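A minimal sketch of this accumulation step, assuming a toy in-memory index; the class and method names (RandomIndexing, observe, indexVector) are illustrative, not from the authors' system, and seeding the random generator with the context's hash is just one common way to make index vectors reproducible.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Sketch of the Random Indexing accumulation step: every context element
// gets a sparse ternary index vector, and a term vector is the weighted
// sum of the index vectors of the contexts the term occurs in.
public class RandomIndexing {
    static final int DIM = 1000;   // illustrative dimensionality
    static final int SEEDS = 10;   // non-zero entries per index vector

    final Map<String, float[]> contextVectors = new HashMap<>();
    final Map<String, float[]> termVectors = new HashMap<>();

    // One (term, context) observation; weight may be 1.0 (raw count)
    // or a boosting score such as term frequency or PMI.
    void observe(String term, String context, float weight) {
        float[] cv = contextVectors.computeIfAbsent(context, this::indexVector);
        float[] tv = termVectors.computeIfAbsent(term, t -> new float[DIM]);
        for (int i = 0; i < DIM; i++) tv[i] += weight * cv[i];
    }

    // Sparse ternary index vector {-1, 0, +1}, seeded by the context
    // string so the same context always maps to the same vector.
    float[] indexVector(String context) {
        float[] v = new float[DIM];
        Random rnd = new Random(context.hashCode());
        for (int k = 0; k < SEEDS; k++)
            v[rnd.nextInt(DIM)] = rnd.nextBoolean() ? 1f : -1f;
        return v;
    }
}
```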
5. Context Vector
Example of a context vector (figure on the slide):
(0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 1, 0, 0, -1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, -1)
• sparse
• high dimensional
• ternary {-1, 0, +1}
• small number of randomly distributed non-zero elements (a check of why this works follows below)
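The practical reason such vectors work is that independently drawn sparse ternary vectors are nearly orthogonal with high probability, so each context element gets an (almost) unique direction. A toy check of that property, with illustrative dimensions not taken from the slides:

```java
import java.util.Random;

// Two independently drawn sparse ternary vectors are nearly orthogonal
// with high probability. This toy check prints the average absolute dot
// product between random pairs.
public class NearOrthogonality {
    public static void main(String[] args) {
        int dim = 1000, seeds = 10, trials = 1000;
        Random rnd = new Random(42);
        double sum = 0;
        for (int t = 0; t < trials; t++) {
            float[] a = ternary(dim, seeds, rnd), b = ternary(dim, seeds, rnd);
            double dot = 0;
            for (int i = 0; i < dim; i++) dot += a[i] * b[i];
            sum += Math.abs(dot);
        }
        // For dim=1000 with 10 non-zeros this stays close to 0, while each
        // vector's dot product with itself is about 10.
        System.out.printf("mean |a.b| over %d pairs: %.3f%n", trials, sum / trials);
    }

    static float[] ternary(int dim, int seeds, Random rnd) {
        float[] v = new float[dim];
        for (int k = 0; k < seeds; k++)
            v[rnd.nextInt(dim)] = rnd.nextBoolean() ? 1f : -1f;
        return v;
    }
}
```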
6. Random Indexing (formal)
$B_{n,k} = A_{n,m} R_{m,k}$, with $k \ll m$
B preserves the distance between points (Johnson-Lindenstrauss lemma): $d_r = c \cdot d$
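For reference, a standard statement of the lemma, not spelled out on the slide: for any $0 < \epsilon < 1$ and any set of $n$ points, a suitable random linear map $f$ into $k = O(\epsilon^{-2} \log n)$ dimensions satisfies, for every pair of points $u, v$,

$(1-\epsilon)\,\|u-v\|^2 \le \|f(u)-f(v)\|^2 \le (1+\epsilon)\,\|u-v\|^2$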
7. Dependency parser
Parse of “John eats a red apple.” (tree on the slide):
• subject(eats, John)
• object(eats, apple)
• modifier(apple, red)
8. Vector permutation
• using permutation of elements in a random vector to encode several contexts
  – right shift of n elements to encode dependents (permutation)
  – left shift of n elements to encode heads (inverse permutation)
• choose a different n for each kind of dependency (shift functions sketched below)
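A sketch of the two shifts, assuming Πn denotes a right circular shift by n positions as the slide defines it; the example vector and offset are illustrative:

```java
import java.util.Arrays;

// Permutation as circular shift: a right shift by n encodes a dependent,
// the inverse (left shift by n) encodes a head. The offset n is chosen
// per dependency type (e.g. subj, obj, mod).
public class Shift {
    // Π_n: right circular shift by n positions.
    static float[] permute(float[] v, int n) {
        int d = v.length;
        float[] out = new float[d];
        for (int i = 0; i < d; i++) out[(i + n) % d] = v[i];
        return out;
    }

    // Π_-n: the inverse permutation (left circular shift by n).
    static float[] inverse(float[] v, int n) {
        return permute(v, v.length - (n % v.length));
    }

    public static void main(String[] args) {
        float[] v = {1, 0, 0, 0, -1, 0, 0, 0, 0, 0};
        System.out.println(Arrays.toString(permute(v, 3)));              // shifted
        System.out.println(Arrays.toString(inverse(permute(v, 3), 3)));  // back to v
    }
}
```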
9. Method
• assign a context vector to each term
• assign a shift function (Πn) to each kind of dependency
• each term is represented by a vector which is (accumulation sketched below)
  – the sum of the permuted vectors of all the dependent terms
  – the sum of the inverse permuted vectors of all the head terms
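Putting the previous sketches together, a hypothetical accumulation step for one parsed dependency; the Dep record, the per-relation offsets, and the reuse of the Shift helper from the earlier sketch are all illustrative assumptions, not the authors' code:

```java
import java.util.HashMap;
import java.util.Map;

// For each dependency (head, relation, dependent): the head term receives
// the right-shifted vector of its dependent, and the dependent term
// receives the left-shifted (inverse) vector of its head.
public class DependencySpace {
    // one shift offset per dependency type (illustrative values)
    static final Map<String, Integer> OFFSETS =
            Map.of("subj", 1, "obj", 7, "mod", 3, "nn", 5);

    record Dep(String head, String relation, String dependent) {}

    static void accumulate(Dep dep,
                           Map<String, float[]> contextVectors,
                           Map<String, float[]> termVectors,
                           int dim) {
        int n = OFFSETS.get(dep.relation());
        // head term: add the permuted vector of its dependent
        add(termVectors, dep.head(),
            Shift.permute(contextVectors.get(dep.dependent()), n), dim);
        // dependent term: add the inverse permuted vector of its head
        add(termVectors, dep.dependent(),
            Shift.inverse(contextVectors.get(dep.head()), n), dim);
    }

    static void add(Map<String, float[]> space, String term, float[] v, int dim) {
        float[] tv = space.computeIfAbsent(term, t -> new float[dim]);
        for (int i = 0; i < dim; i++) tv[i] += v[i];
    }
}
```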
10. Example
John eats a red apple
John -> (0, 0, 0, 0, 0, 0, 1, 0, -1, 0)
eat -> (1, 0, 0, 0, -1, 0, 0, 0, 0, 0)
red -> (0, 0, 0, 1, 0, 0, 0, -1, 0, 0)
apple -> (1, 0, 0, 0, 0, 0, 0, -1, 0, 0)
mod -> Π3; obj -> Π7
(apple) = Π3(red) + Π−7(eat) = …
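Reading Πn as a right circular shift by n (slide 8), the two contributions shown can be worked out explicitly; this computation is mine, not on the slide (the trailing ellipsis is left as-is). In dimension 10, Π−7 is equivalent to a right shift by 3:

Π3(red) = (-1, 0, 0, 0, 0, 0, 1, 0, 0, 0)
Π−7(eat) = (0, 0, 0, 1, 0, 0, 0, -1, 0, 0)
Π3(red) + Π−7(eat) = (-1, 0, 0, 1, 0, 0, 1, -1, 0, 0)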
12. Output
• R: vector space of random context vectors
• B: vector space of terms
13. Query 1/4
• similarity between terms (cosine sketched below)
  – cosine similarity between term vectors in B
  – terms are similar if they occur in similar syntactic contexts
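For completeness, the standard cosine similarity used here, in the same sketch style as above:

```java
// Cosine similarity between two term vectors in B: dot product divided by
// the product of the norms; returns 0 for a zero vector.
public class Cosine {
    static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }
}
```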
14. Query 2/4
Words similar to “provide”:
offer 0.855
supply 0.819
deliver 0.801
give 0.787
contain 0.784
require 0.782
present 0.778
15. Query 3/4
• similarity between terms exploiting dependencies: what are the objects of the word “provide”? (query sketched below)
1. get the term vector for “provide” in B
2. compute the similarity with all permuted vectors in R, using the permutation assigned to the “obj” relation
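A sketch of this two-step query, reusing the Shift and Cosine helpers from the earlier sketches; the top-k cutoff and parameter names are illustrative:

```java
import java.util.*;

// Rank every context vector in R against the verb's term vector in B,
// after applying the permutation assigned to the "obj" relation: a term
// filling the object slot was added to the verb's vector right-shifted,
// so the candidate must be shifted the same way before comparison.
public class ObjectQuery {
    static List<Map.Entry<String, Double>> objectsOf(
            String verb, int objOffset,
            Map<String, float[]> termVectors,      // space B
            Map<String, float[]> contextVectors) { // space R
        float[] target = termVectors.get(verb);
        List<Map.Entry<String, Double>> ranked = new ArrayList<>();
        for (Map.Entry<String, float[]> e : contextVectors.entrySet()) {
            double sim = Cosine.cosine(target,
                    Shift.permute(e.getValue(), objOffset));
            ranked.add(Map.entry(e.getKey(), sim));
        }
        ranked.sort((x, y) -> Double.compare(y.getValue(), x.getValue()));
        return ranked.subList(0, Math.min(5, ranked.size()));
    }
}
```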
16. Query 4/4
What are the objects of the word “provide”?
information 0.344
food 0.208
support 0.143
energy 0.143
job 0.142
17. Compositional semantics 1/2
• words are represented in isolation
• representing a complex structure (phrase or sentence) is a challenging task
  – IR, QA, IE, Text Entailment, …
• how to combine words
  – tensor product of words
  – Clark and Pulman suggest taking symbolic features (syntactic dependencies) into account
18. Compositional semantics 2/2
man reads magazine (Clark and Pulman):
$\text{man} \otimes \text{subj} \otimes \text{read} \otimes \text{obj} \otimes \text{magazine}$
19. Similarity between structures
man reads magazine vs. woman browses newspaper:
$\text{man} \otimes \text{subj} \otimes \text{read} \otimes \text{obj} \otimes \text{magazine}$
$\text{woman} \otimes \text{subj} \otimes \text{browse} \otimes \text{obj} \otimes \text{newspaper}$
20. …a bit of math
$(w_1 \otimes w_2) \cdot (w_3 \otimes w_4) = (w_1 \cdot w_3)(w_2 \cdot w_4)$
so the similarity between the two structures above reduces to
$(\text{man} \cdot \text{woman})(\text{read} \cdot \text{browse})(\text{magazine} \cdot \text{newspaper})$
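This identity is what makes the comparison tractable: the tensors never need to be built, because the inner product factorises into pairwise dot products. A sketch for two subject-verb-object triples, reusing the earlier Cosine helper (cosines in place of raw dot products, assuming normalised comparisons):

```java
// Similarity of two parsed triples as a product of pairwise cosines,
// following the factorisation of the tensor product inner product.
public class StructureSimilarity {
    // sim((subj1 verb1 obj1), (subj2 verb2 obj2))
    static double similarity(float[] subj1, float[] verb1, float[] obj1,
                             float[] subj2, float[] verb2, float[] obj2) {
        return Cosine.cosine(subj1, subj2)
             * Cosine.cosine(verb1, verb2)
             * Cosine.cosine(obj1, obj2);
    }
}
```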
21. System setup
• Implemented in JAVA
• Two corpora
  – TASA: 800K sentences and 9M dependencies
  – a portion of ukWaC: 7M sentences and 127M dependencies
  – 40,000 most frequent words
• Dependency parser
  – MINIPAR
22. Evaluation
• GEMS 2011 Shared Task for compositional semantics
  – a list of pairs of two-word combinations
    • rated by humans
    • 5,833 ratings
  – encoded dependencies: subj, obj, mod, nn
  – GOAL: compare the system performance against human scores
    • Spearman correlation (a sketch follows below)
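A minimal Spearman correlation sketch of the kind that could score system outputs against the human ratings; illustrative, not the authors' evaluation code, and without tie correction (tied values get arbitrary order):

```java
import java.util.Arrays;

// Spearman rank correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d
// is the per-item difference between the two rank vectors.
public class Spearman {
    static double correlation(double[] x, double[] y) {
        double[] rx = ranks(x), ry = ranks(y);
        int n = x.length;
        double d2 = 0;
        for (int i = 0; i < n; i++) d2 += (rx[i] - ry[i]) * (rx[i] - ry[i]);
        return 1 - 6 * d2 / (n * ((double) n * n - 1));
    }

    // Rank positions (1-based) of the values in v, smallest first.
    static double[] ranks(double[] v) {
        Integer[] idx = new Integer[v.length];
        for (int i = 0; i < v.length; i++) idx[i] = i;
        Arrays.sort(idx, (a, b) -> Double.compare(v[a], v[b]));
        double[] r = new double[v.length];
        for (int i = 0; i < v.length; i++) r[idx[i]] = i + 1;
        return r;
    }
}
```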
25. Conclusion and Future Work
• Conclusion
  – encode syntactic dependencies using vector permutations and Random Indexing
  – an early attempt at semantic composition
• Future Work
  – deeper evaluation (in vivo)
  – a more formal study of semantic composition
  – tackle the scalability problem
  – try to encode other kinds of context