On Distance between Deep Syntax and Semantic Representation
Václav Novák
Institute of Formal and Applied Linguistics
Charles University
Prague, Czech Republic
Frontiers in Linguistically Annotated Corpora
July 22, 2006, 16:00 – 16:30
Sydney, Australia
novak@ufal.mff.cuni.cz Syntax – Semantic Distance 1/ 20
Presentation Outline
1 Motivation
MultiNet – Knowledge Representation
Prague Dependency Treebank
Missing pieces
2 Issues of transformation
Mapping
Topic-Focus Articulation
Additional Requirements
3 Conclusions
Conclusions
Related Work
Future Work
MultiNet
What is MultiNet
Multilayered Semantic Network
University in Hagen, Germany
Hermann Helbig, Sven Hartrumpf
Parser: WOCADI for German
(relies heavily on HaGenLex lexicon)
MWR interface (Workbench of Knowledge Engineer)
Designed w.r.t. question answering and cognitive modeling
Semantic Network
Properties of Semantic Networks
Everything represented as graph nodes
The utterances gradually build the graph
Inference rules can further connect the nodes
(or add new ones)
⇒ Representation of knowledge, usable for inferencing and QA
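The properties above can be sketched as a toy labeled graph. This is a minimal illustration, not MultiNet itself: the class `SemanticNetwork`, the second utterance, and the transitivity rule are assumptions made for the example, and the relation label CAUS is borrowed only as a plausible edge name.

```python
class SemanticNetwork:
    """Toy labeled graph: edges are (source, relation, target) triples."""

    def __init__(self):
        self.edges = set()

    def add(self, source, relation, target):
        """Add one edge; each utterance contributes edges to the same graph."""
        self.edges.add((source, relation, target))

    def apply_transitivity(self, relation):
        """Hypothetical inference rule: close `relation` under transitivity."""
        changed = True
        while changed:
            changed = False
            for a, r1, b in list(self.edges):
                for c, r2, d in list(self.edges):
                    if r1 == r2 == relation and b == c:
                        if (a, relation, d) not in self.edges:
                            self.edges.add((a, relation, d))
                            changed = True

    def ask(self, source, relation, target):
        """Answer a yes/no question by edge lookup."""
        return (source, relation, target) in self.edges


net = SemanticNetwork()
net.add("impact", "CAUS", "damage")   # "The car was damaged because of the impact."
net.add("damage", "CAUS", "repair")   # a second, made-up utterance
net.apply_transitivity("CAUS")        # the rule connects existing nodes
print(net.ask("impact", "CAUS", "repair"))
```

The point of the sketch is only that utterances and inference rules write into the same graph, which is then queried for question answering.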
MultiNet Example: “The car was damaged because of the impact.”
MultiNet – technical info
Properties of MultiNet
93 relations + 18 functions
7 layers of attributes
hierarchy of 46 sorts
1 edge-end attribute distinguishing immanent (prototypical /
categorical) vs. situational knowledge
encapsulation of concepts
default vs. categorical inference rules
Prague Dependency Treebank
Developed at the Institute of Formal
and Applied Linguistics, Charles
University, Prague
Three layers of annotation
3,168 documents ≈ 49,442 sentences
≈ 833,357 tokens annotated on all
three layers.
Prague Dependency Treebank
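Seď klidně, nehýbej se, nabádala mě přítelkyně.
(Sit still, don't move, my friend urged me.)
Kritizovali hvězdný systém, věříce v autentičnost dosud neokoukaných tváří, které se však záhy také staly hvězdami (a není to jen osud Belmondův).
(They criticized the star system, believing in the authenticity of faces not yet overexposed, which however soon became stars as well (and this is not only Belmondo's fate).)
Pacient, vzpomenuv si na všechna příkoří způsobená mu společností, vztáhl na doktora ruku.
(The patient, having recalled all the wrongs done to him by society, raised his hand against the doctor.)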
[Figure: tectogrammatical trees of the three sentences (t-lnd94103-085-p1s21B, t-ln94200-173-p2s6, t-ln94211-120-p5s4). Each node shows its t-lemma, functor, and semantic part of speech. The three predicates are nabádat.enunc, kritizovat.enunc, and vztáhnout.enunc, each with functor PRED and semantic part of speech v.]
Tectogrammatical Representation
Properties of Tectogrammatical Layer
One sentence ≈ one tree
Auxiliaries and function words removed
Missing obligatory valents inserted
Attributes of nodes
Functor
Semantic part of speech
15 grammatemes (negation, tense, politeness, . . . )
Topic-Focus distinction
Sentential modality
+ technical attributes (coordinations, parentheses, IDs)
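The node attributes listed above can be pictured as a small record type. The following is a sketch under assumptions: the class `TNode` and its field names are hypothetical, not the PDT file format, and the grammateme value `"ant"` is an illustrative guess. The lemmas, functors, and semantic parts of speech are taken from the top of the first example tree.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TNode:
    """Hypothetical record for one tectogrammatical node."""
    t_lemma: str
    functor: str
    sempos: str                                       # semantic part of speech
    grammatemes: dict = field(default_factory=dict)   # e.g. negation, tense
    tfa: Optional[str] = None                         # topic-focus value: c, t, or f
    sentmod: Optional[str] = None                     # sentential modality
    children: list = field(default_factory=list)

    def add(self, child):
        """Attach a dependent node and return it."""
        self.children.append(child)
        return child

    def walk(self, depth=0):
        """Yield (depth, node) pairs in depth-first order."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


# Top of the tree for "Seď klidně, nehýbej se, nabádala mě přítelkyně."
root = TNode("nabádat", "PRED", "v", grammatemes={"tense": "ant"}, sentmod="enunc")
root.add(TNode("přítelkyně", "ACT", "n.denot"))
root.add(TNode("#PersPron", "ADDR", "n.pron.def.pers"))

for depth, node in root.walk():
    print("  " * depth + f"{node.t_lemma} [{node.functor}, {node.sempos}]")
```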
Tectogrammatical Representation
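t-lnd94103-085-p1s21B (nodes listed level by level as t-lemma, functor, semantic part of speech)
Level 0: root
Level 1: nabádat.enunc, PRED, v
Level 2: přítelkyně, ACT, n.denot; #PersPron, ADDR, n.pron.def.pers; #Comma.enunc, CONJ, coap
Level 3: #PersPron, ACT, n.pron.def.pers; sedět, PAT, v; hýbat_se, PAT, v
Level 4: klidný, MANN, adj.denot; #Neg, RHEM, atom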
Additional Required Information
Missing Pieces
1 Named entity recognition
Numbers
Places
People
...
2 Metadata
Author
Date
Place
Document type
Intended recipient of the text
Bibliographical and other references
Mapping of Representational Means
Main Issues of Transformation
1 Mapping of edges and corresponding functors in TR to
MultiNet cognitive roles
2 Mapping of TR nodes to MultiNet concepts
3 Mapping of various natural language constructs to
attribute-value assignments
4 Mapping of verbal tenses to temporal axis
Main Issues of Transformation – closer look 1
1 Mapping of edges and corresponding functors in TR to MultiNet cognitive roles
Actor and Patient are highly ambiguous
Location functors are also used where no location is involved (ELMT, CTXT, SITU)
However, other functors correspond quite straightforwardly to MultiNet roles (a table is presented in the paper)
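The contrast in item 1 between ambiguous and straightforward functors can be sketched as a lookup table. The table below is a hypothetical illustration and is not the table from the paper: the candidate role names (AGT, EXP, MEXP, AFF, OBJ, RSLT, ORNT, BENF, INSTR) and the pairings are assumptions chosen only to show the shape of the problem.

```python
# Hypothetical functor -> candidate MultiNet roles; NOT the paper's table.
CANDIDATE_ROLES = {
    "ACT": ["AGT", "EXP", "MEXP"],   # Actor: several candidates (assumed)
    "PAT": ["AFF", "OBJ", "RSLT"],   # Patient: several candidates (assumed)
    "ADDR": ["ORNT"],                # assumed one-to-one pairing
    "BEN": ["BENF"],                 # assumed one-to-one pairing
    "MEANS": ["INSTR"],              # assumed one-to-one pairing
}


def map_functor(functor):
    """Return (status, candidate_roles) for one TR edge label."""
    roles = CANDIDATE_ROLES.get(functor, [])
    if not roles:
        status = "unmapped"
    elif len(roles) == 1:
        status = "direct"
    else:
        status = "ambiguous"   # needs further disambiguation
    return status, roles


for functor in ("ACT", "BEN", "XYZ"):
    print(functor, map_functor(functor))
```

A real converter would have to resolve the ambiguous cases with additional information, for example from the lexicon. The sketch only separates the easy edges from the hard ones.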
Main Issues of Transformation – closer look 2
2 Mapping of TR nodes to MultiNet concepts
Typically, a TR node corresponds to a MultiNet concept (i.e.,
also a node)
Quite often, a TR node corresponds to a subnetwork in
MultiNet
Sometimes, the TR node corresponds to an edge in MultiNet
(e.g., CORR, CTXT)
Main Issues of Transformation – closer look 3
3 Mapping of various natural language constructs to attribute-value assignments
The color of x is y.
x has y color.
x is y.
y is the color of x.
[Figure: via TR, all four sentences map to one MultiNet structure over the nodes x, color, and y, with edges labeled SUB, VAL, and AT (label truncated).]
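The idea in item 3, that several surface constructs collapse into one attribute-value assignment, can be sketched with a few string patterns. Everything here is an assumption made for illustration: the regular expressions stand in for real TR input, `VALUE_LEXICON` is a made-up lexicon needed for the bare "x is y" case, the intermediate node name is invented, and ATTR is assumed to be the edge label that appears cut off as AT.

```python
import re

# Hypothetical lexicon: which attribute a bare value belongs to.
VALUE_LEXICON = {"red": "color", "blue": "color"}

# Each pattern names its capture groups in order of appearance.
PATTERNS = [
    (re.compile(r"^the (\w+) of (?:the )?(\w+) is (\w+)\.$"), ("attr", "x", "y")),
    (re.compile(r"^(\w+) is the (\w+) of (?:the )?(\w+)\.$"), ("y", "attr", "x")),
    (re.compile(r"^(?:the )?(\w+) has (\w+) (\w+)\.$"), ("x", "y", "attr")),
    (re.compile(r"^(?:the )?(\w+) is (\w+)\.$"), ("x", "y")),
]


def normalize(sentence):
    """Map a sentence to ATTR/SUB/VAL triples, or None if nothing matches."""
    text = sentence.strip().lower()
    for pattern, names in PATTERNS:
        match = pattern.match(text)
        if match:
            slots = dict(zip(names, match.groups()))
            attr = slots.get("attr") or VALUE_LEXICON.get(slots["y"])
            if attr is None:
                return None
            node = f'{slots["x"]}.{attr}'   # invented name for the attribute node
            return [
                (slots["x"], "ATTR", node),
                (node, "SUB", attr),
                (node, "VAL", slots["y"]),
            ]
    return None


sentences = [
    "The color of the car is red.",
    "The car has red color.",
    "The car is red.",
    "Red is the color of the car.",
]
for s in sentences:
    print(s, "->", normalize(s))
```

All four sentences yield the same three triples, which is the behavior the slide asks of the transformation.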
Main Issues of Transformation – closer look 4
4 Mapping of verbal tenses to temporal axis
Verbal tenses encoded in grammatemes
In MultiNet, TEMP, ANTE, DUR, STRT, and FIN relations can be
used.
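One possible reading of item 4 is sketched below. It is an assumption, not the author's procedure: the tense grammateme values `sim`, `ant`, and `post`, the `now` node for the time of utterance, and the choice of ANTE and TEMP for each value are illustrative, and DUR, STRT, and FIN are left out.

```python
def tense_to_temporal(event, tense, speech_time="now"):
    """Turn a tense grammateme into one temporal edge (source, relation, target).

    Assumed convention: (a, "ANTE", b) reads "a precedes b" and
    (a, "TEMP", b) reads "a is temporally located at b".
    """
    if tense == "ant":    # anterior: the event precedes the utterance
        return (event, "ANTE", speech_time)
    if tense == "post":   # posterior: the utterance precedes the event
        return (speech_time, "ANTE", event)
    if tense == "sim":    # simultaneous with the utterance
        return (event, "TEMP", speech_time)
    return None           # no tense information available


print(tense_to_temporal("nabádat", "ant"))
print(tense_to_temporal("nabádat", "post"))
print(tense_to_temporal("nabádat", "sim"))
```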
Topic-Focus Articulation
TFA in PDT
TFA is annotated on the Tectogrammatical layer
Every word has an attribute: c, t, or f
The nodes are ordered with respect to “communicative
dynamism”
⇓
TFA in MultiNet
Content expressed by TFA is further analyzed into:
1 Encapsulation of concepts
2 Scope of quantifiers
3 Layer attributes (GENER, REFER, VARIA, . . . )
Additional Requirements
Additional Requirements
1 Spatio-Temporal Representation
For simple inferences about space and time
2 Calendar
For computations with dates
3 Ontology
For all kinds of inferences
Ontology is an inherent part of MultiNet semantic network
design
Upper conceptual ontology represented by sorts
Conclusions
Conclusions
MultiNet is a suitable formalism for inferences and QA
It’s difficult to transform texts into MultiNet
Tectogrammatical representation is not designed for
inferencing and QA
There are tools for text-to-TR conversion
TR is a good starting point for conversion to MultiNet
(structural similarity, disambiguation in TR)
We have presented issues arising in such a process
Related Work
Related Work
Helbig (1986): Automatic transformation to MultiNet
Horák (2001): Automatic transformation to Transparent Intensional Logic
Callmeier et al. (2004): DeepThought project – automatic transformation to Robust Minimal Recursion Semantics
Bos (2005): Automatic transformation to Discourse Representation Theory
Bolshakov and Gelbukh (2000): Automatic transformation in Meaning–Text Theory framework
Kruijff-Korbayová (1998): Automatic TR to DRT transformation
Future Work
Future Work
1 Stage I – Preparation
Annotation tools
Annotation guidelines
2 Stage II – Annotation
Pilot study
Automated preprocessing
Evaluation of annotators
3 Stage III – Application
Supervised “parsing”
Assessment of TR necessity