OpenLogos Semantico-Syntactic Knowledge-Rich Bilingual Dictionaries
Anabela Barreiro1, Fernando Batista1,2, Ricardo Ribeiro1,2, Helena Moniz1,3, Isabel Trancoso1,4
1INESC-ID, 2ISCTE-IUL, 3FLUL/CLUL, 4IST
{abarreiro;fmmb;rdmr;helenam;imt}@l2f.inesc-id.pt
http://www.l2f.inesc-id.pt/
Characteristics
– Representation schema with eclectic categories
– Designed to work in concert with the lexical resources and linguistic rules (transfer (TRAN) and semantico-syntactic (SEMTAB) rules)
– Easy mapping from natural to symbolic language, representing both meaning and structure in a continuum, undissociated, represented in the same layer, based on the belief that semantics of a word often affects the surrounding syntax
– Extensible system, designed so that developers would expand and add to its capabilities
– Initially developed for English, but many of its elements are universal (mostly nouns, adjectives, and adverbs) and applicable to other languages
Representation
– SAL knowledge is embedded in the dictionary in the form of numeric codes (SAL mnemonics are used for easier understanding)
  • E.g. the noun (N) table has two SAL representations:
    – COsurf: concrete, surface
    – INdata: information, recorded data
– Nouns have 12 supersets. Superset measure (ME) has 3 sets and 11 subsets:
  • SAL codes for nouns represent semantic groupings, and are language independent, as concepts are transverse across languages
– Verbs are subdivided into 3 types: intransitive, weak transitive and strong transitive. Intransitive verbs have 3 supersets: motional (INMO), operational (INOP), and existential (INEX)
  • Existential intransitive verbs include be and be-substitutes that take predicate nouns and adjectives
– Adjectives are classified into 2 types: descriptive and participial, sub-classified according to syntactic relationships with other words
  • syntactic patterns for the descriptive pre-clausal good-type adjectives
Introduction
– OpenLogos (OL) is the open source derivative of the Logos machine translation (MT) system
– OL's strength resides in its lexical resources, the knowledge-rich bilingual dictionaries
  • contain semantico-syntactic knowledge and ontological relations for all lexical entries, represented at an abstract/higher level by the Semantico-Syntactic Abstraction Language (SAL)
  • present other idiosyncrasies that distinguish them from other publicly available dictionaries
Motivation
– OL resources were used successfully in the Logos commercial MT product for 2-3 decades
  • validated by the Logos development team and clients
– Possible applications
  • basis for new linguistic and NLP tools, especially for poor-resourced languages
  • enhancement of other MT systems
Bilingual Dictionaries: EN > GE/FR/IT
– Verbs, nouns and adjectives are clearly the most represented classes, with a combined total of about 77,000 to 83,000 entries per target language.
– Dictionaries stored in self-contained XML files
  • easily addressed by small programs
  • supported by existing efficient XML APIs
– Example for the verb entry depart, extracted from the English-French dictionary
Semantico-Syntactic Knowledge
– Part-of-speech (POS)
– Gender (GEN)
– Number (NUM)
– Morphological paradigms (PAT) for source and target words
  • make it possible to map inflected forms across languages and improve agreement in SMT
– Head word (HEAD) in multiword
  • useful to correct MT problems related to agreement within multiwords or within larger units (e.g. between nominal multiwords and verb or agreement within verbal multiwords)
– Homographs (HOMO)
  • homographs are a major source of translation errors and their identification is crucial
– Auxiliary (AUX)
  • helps improve precision in the translation when auxiliary choice is subtle
– Alternate word (ALT)
  • nominalization (process noun), predicate adjective, etc.; useful for paraphrasing purposes
– Causative verb (CAUS)
– Reflexive verb (REFL)
– Aspectual verb (ASP)
– Semantico-Syntactic Knowledge (SAL)
  • interlingua-style hierarchical taxonomy with over 1,000 elements, embracing all POS
  • 3 levels of representation: superset (SUPER), set (SET), and subset (SUB); embedded in the dictionary entries and in the translation system's rules (help with disambiguation). E.g. pipe, hose:
Conclusions and Future Work
– Three bilingual dictionaries were created
  • English-French; English-German; English-Italian
  • online and free for research purposes
    – http://metanet4u.l2f.inesc-id.pt/
– The resources contain semantico-syntactic knowledge concerning the conceptual formalization of things, ideas, relationships, dispositions, conditions, processes, etc.
  • valuable for MT and other NLP applications
  • stored in XML format for easy processing
– In the future, we will make available three complementary bilingual dictionaries
  • English-Portuguese; English-Spanish; German-English
Acknowledgments
– This work was supported by national funds through Fundação para a Ciência e a Tecnologia, under grants SFRH/BPD/91446/2012 and SFRH/BPD/95849/2013 and project PEst-OE/EEI/LA0021/2013
Instituto de Engenharia de Sistemas e Computadores
Investigação e Desenvolvimento em Lisboa
Laboratório de Sistemas de Língua Falada
OpenLogos Data
Resulting Resources
Word class                           id   EN-GE   EN-FR   EN-IT
Noun                                  1   28266   25910   23505
Verb                                  2   33855   33354   33021
Adverb (locative)                     3     465     442     450
Adjective                             4   21219   20749   20518
Pronoun                               5     121     121     121
Adverb (manner, agency, degree)       6    2207    2167    2173
Preposition (non-locative)           11     140     140     139
Auxiliary and Modal                  12      34      34      34
Preposition (locative)               13     148     148     148
Definite Article                     14     194     194     189
Indefinite Article                   15      66      66      65
Arithmate in Apposition              16     208     208     203
Negative                             17       2       2       2
Relative and Interrogative Pronoun   18      23      23      20
Conjunction                          19     160     160     160
Punctuation                          20      30      30      30
Total                                     87138   83748   80778
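The totals in the table can be cross-checked with a few lines of Python. This is only a sketch: the counts are re-entered by hand from the table, and the names `COUNTS`, `PAIRS` and `totals` are illustrative, not part of OpenLogos.

```python
# Entry counts per word class, copied from the table above.
# Keys are the word-class ids; values are (EN-GE, EN-FR, EN-IT).
PAIRS = ("EN-GE", "EN-FR", "EN-IT")
COUNTS = {
    1: (28266, 25910, 23505),   # Noun
    2: (33855, 33354, 33021),   # Verb
    3: (465, 442, 450),         # Adverb (locative)
    4: (21219, 20749, 20518),   # Adjective
    5: (121, 121, 121),         # Pronoun
    6: (2207, 2167, 2173),      # Adverb (manner, agency, degree)
    11: (140, 140, 139),        # Preposition (non-locative)
    12: (34, 34, 34),           # Auxiliary and Modal
    13: (148, 148, 148),        # Preposition (locative)
    14: (194, 194, 189),        # Definite Article
    15: (66, 66, 65),           # Indefinite Article
    16: (208, 208, 203),        # Arithmate in Apposition
    17: (2, 2, 2),              # Negative
    18: (23, 23, 20),           # Relative and Interrogative Pronoun
    19: (160, 160, 160),        # Conjunction
    20: (30, 30, 30),           # Punctuation
}


def totals(ids=None):
    """Sum entries per language pair, optionally restricted to some ids."""
    selected = COUNTS if ids is None else {i: COUNTS[i] for i in ids}
    sums = [sum(column) for column in zip(*selected.values())]
    return dict(zip(PAIRS, sums))


print(totals())               # all word classes
print(totals(ids=(1, 2, 4)))  # nouns, verbs and adjectives only
```

The first call reproduces the Total row (87138, 83748, 80778). The second gives 83340, 80013 and 77044 entries for nouns, verbs and adjectives combined.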
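[Figure: SAL hierarchy for pipe, hose. Word class: nouns; superset: concrete; set: functionals; subset: conduits. The nodes barriers and containers also appear in the figure, and further branches are elided (…).]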
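The three SAL levels can be pictured as a nested mapping. The sketch below encodes only the branch visible in the figure. Treating barriers and containers as subsets alongside conduits is an assumption read off the figure layout, and the `LEXICON` dictionary and `sal_path` function are hypothetical names used for illustration.

```python
# Fragment of the noun taxonomy: word class -> superset -> set -> subsets.
# Only the branch shown for "pipe, hose" is included. Listing "barriers"
# and "containers" next to "conduits" is an assumption.
TAXONOMY = {
    "nouns": {
        "concrete": {
            "functionals": ["conduits", "barriers", "containers"],
        },
    },
}

# Hypothetical lexicon linking words to a subset, for illustration only.
LEXICON = {"pipe": "conduits", "hose": "conduits"}


def sal_path(word):
    """Return (word class, superset, set, subset) for a word, or None."""
    subset = LEXICON.get(word)
    for wclass, supersets in TAXONOMY.items():
        for superset, sets in supersets.items():
            for set_name, subsets in sets.items():
                if subset in subsets:
                    return (wclass, superset, set_name, subset)
    return None


print(sal_path("pipe"))
print(sal_path("hose") == sal_path("pipe"))
```

Two words that share a path, as pipe and hose do here, can be handled by the same rules. This is one way to read the remark that SAL codes help with disambiguation.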
<Entry source="depart" target="quitter">
  <source head_word="1" homograph="no" word_type="01">
    <pos description="Verb" wclass="02"/>
    <morphology>
      <inflection description="like walk, walked, walking" example="walk" id="1"/>
    </morphology>
    <sal code="13,98,596" description="create, etc." mnemonic="generictransitive4" set="other98"/>
  </source>
  <target aux="1" head_word="1" word_type="01">
    <pos description="Verb" wclass="02"/>
    <morphology>
      <inflection description="regular ending in -er: parler" example="parler" id="3"/>
    </morphology>
  </target>
</Entry>
<Entry source="depart" target="partir">
  <source head_word="1" homograph="no" word_type="01">
    <pos description="Verb" wclass="02"/>
    <morphology>
      <inflection description="like walk, walked, walking" example="walk" id="1"/>
    </morphology>
    <sal code="10,24,596" description="from = away from, off of, out of" set="governsawayfrom"/>
  </source>
  <target aux="2" head_word="1" word_type="01">
    <pos description="Verb" wclass="02"/>
    <morphology>
      <inflection description="Irreg. in -ir with shortened stem ..." example="partir" id="12"/>
    </morphology>
  </target>
</Entry>
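As an illustration of how such entries might be addressed by a small program, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The `<Dictionary>` wrapper element and the function name `translations` are assumptions made so the fragment parses on its own; the root element of the distributed files is not shown in this example. Attribute names follow the depart entries, abridged here to the fields that are read.

```python
import xml.etree.ElementTree as ET

# Two entries for "depart", abridged from the example above.
# The <Dictionary> root is an assumption, added so the fragment is well-formed.
SAMPLE = """
<Dictionary>
  <Entry source="depart" target="quitter">
    <source head_word="1" homograph="no" word_type="01">
      <pos description="Verb" wclass="02"/>
      <sal code="13,98,596" description="create, etc."
           mnemonic="generictransitive4" set="other98"/>
    </source>
    <target aux="1" head_word="1" word_type="01">
      <pos description="Verb" wclass="02"/>
    </target>
  </Entry>
  <Entry source="depart" target="partir">
    <source head_word="1" homograph="no" word_type="01">
      <pos description="Verb" wclass="02"/>
      <sal code="10,24,596" description="from = away from, off of, out of"
           set="governsawayfrom"/>
    </source>
    <target aux="2" head_word="1" word_type="01">
      <pos description="Verb" wclass="02"/>
    </target>
  </Entry>
</Dictionary>
"""


def translations(xml_text, source_word):
    """Return one dict per <Entry> whose source attribute equals source_word."""
    root = ET.fromstring(xml_text)
    results = []
    for entry in root.iter("Entry"):
        if entry.get("source") != source_word:
            continue
        sal = entry.find("source/sal")
        target = entry.find("target")
        results.append({
            "target": entry.get("target"),
            "pos": entry.find("source/pos").get("description"),
            # The SAL code is a comma-separated triple of numeric codes.
            "sal_code": tuple(int(n) for n in sal.get("code").split(",")),
            "sal_set": sal.get("set"),
            "aux": target.get("aux"),
        })
    return results


for item in translations(SAMPLE, "depart"):
    print(item["target"], item["sal_code"], item["sal_set"], item["aux"])
```

The two readings of depart differ in their SAL code and in the target `aux` value (1 for quitter, 2 for partir). The meaning of the individual `aux` codes is not stated in the example, so the sketch returns them uninterpreted.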
Mnemonic          Example Verb     Example Sentence
INEXbe-type       be               She was at the seashore all summer.
INEXbecome-type   become, remain   He became a doctor at a very young age.
INEXgrow-type     sound, look      Their voices sounded cheerful.
INEXseem-type     seem, appear     He seemed happy with the results.
Mnemonics     Description                    Examples
MEabs         abstract measurable concepts   humidity, length
MEdis         discrete measurable concepts   sum, increment
MEunit        units of measure               See subsets
MEunitwt      units of weight                ounce, pound
MEunitvel     units of velocity              mph, megahertz
MEunitvol     units of volume measure        gallon, liter
MEunittemp    units of temperature           degrees celsius
MEunitener    units of energy/force          watt, horsepower
MEunitsys     measurement systems            fahrenheit, kelvin
MEunitdur     units of duration              hour, year
MEunitspec    specialized units of measure   oersted, ohm
MEunitvalue   units of money/value           dollar, euro
MEunitlin     units of linear/area measure   inch, mile
MEundif       undifferentiated measure       degree, share
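The mnemonics in this table appear to be built by suffixing: MEunit points to "See subsets", and the unit rows all extend it. The sketch below relies on that naming pattern, which is inferred from the table rather than stated as a rule, to list the subsets of a set. The names `ME_TABLE` and `subsets_of` are illustrative.

```python
# Mnemonic -> description, copied from the ME table above.
ME_TABLE = {
    "MEabs": "abstract measurable concepts",
    "MEdis": "discrete measurable concepts",
    "MEunit": "units of measure",
    "MEunitwt": "units of weight",
    "MEunitvel": "units of velocity",
    "MEunitvol": "units of volume measure",
    "MEunittemp": "units of temperature",
    "MEunitener": "units of energy/force",
    "MEunitsys": "measurement systems",
    "MEunitdur": "units of duration",
    "MEunitspec": "specialized units of measure",
    "MEunitvalue": "units of money/value",
    "MEunitlin": "units of linear/area measure",
    "MEundif": "undifferentiated measure",
}


def subsets_of(set_mnemonic, table=ME_TABLE):
    """Mnemonics that extend set_mnemonic with a further suffix.

    Assumes that a subset mnemonic starts with its set's mnemonic,
    as the rows under MEunit suggest.
    """
    return sorted(
        m for m in table
        if m.startswith(set_mnemonic) and m != set_mnemonic
    )


for mnemonic in subsets_of("MEunit"):
    print(mnemonic, "-", ME_TABLE[mnemonic])
```

Under this assumption, `subsets_of("MEunit")` returns the ten unit rows, while MEabs and MEdis have no listed subsets.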
Pattern                  Example Sentence
It is ADJ that           It is silly that...
It is ADJ for NP that    It is good for the employees that...
It is ADJ to VP          It is smart to exercise.
It is ADJ for NP to VP   It was silly for them to expect...
It is ADJ V'ing          It is smart doing the right thing.
NP is ADJ to VP          John is smart to exercise.
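To make the slot notation concrete, the patterns can be treated as templates whose slots (ADJ, NP, VP, V'ing) are filled with words. This is only an illustrative sketch: the `fill` function and its keyword names are hypothetical and are not how OpenLogos rules are actually encoded.

```python
# Syntactic patterns for good-type adjectives, copied from the table above.
PATTERNS = [
    "It is ADJ that",
    "It is ADJ for NP that",
    "It is ADJ to VP",
    "It is ADJ for NP to VP",
    "It is ADJ V'ing",
    "NP is ADJ to VP",
]

# Slot token in the pattern -> keyword argument expected by fill().
SLOT_NAMES = {"ADJ": "adj", "NP": "np", "VP": "vp", "V'ing": "ving"}


def fill(pattern, **slots):
    """Replace each slot token in the pattern with the supplied string.

    Raises KeyError if the pattern needs a slot that was not supplied.
    """
    words = []
    for token in pattern.split():
        key = SLOT_NAMES.get(token)
        words.append(slots[key] if key else token)
    return " ".join(words)


print(fill("It is ADJ to VP", adj="smart", vp="exercise"))
print(fill("NP is ADJ to VP", np="John", adj="smart", vp="exercise"))
print(fill("It is ADJ for NP that", adj="good", np="the employees"))
```

The first two calls reproduce the example sentences "It is smart to exercise" and "John is smart to exercise" without the final period.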