1. An Interpretation-Driven Model of Syntax Richard Caneba canebr@rpi.edu RPI Cognitive Science Department Human-Level Intelligence Laboratory 5/2/2011
4. Introduction We start with a goal: develop a system that can understand natural language. There are (roughly) three sub-goals: syntactic parsing, semantic representation, and pragmatics/discourse. Stage 1 is syntax. Why is syntax important for natural language understanding?
14. Introduction: Syntax Different syntactic interpretations yield very distinct semantic interpretations, and help identify the role each word plays in an utterance.
18. Syntax Current grammar formalisms have a number of shortcomings from the perspective of developing a system that understands natural language. We focus on generative grammar (i.e. Chomskyan, HPSG [11], [13]).
19. Syntax: Theory & Implementation Two classes of shortcomings. Theoretical: a shortcoming in the way a theory of language represents the user's mental knowledge of that language. Implementation: a shortcoming in the way a theory implies or represents language processing, in terms of computability and/or cognitive realism.
20. Syntax: Theory & Implementation We will show that these two classes are so closely related that it does not make sense to draw a strong distinction between them.
29. Nested parse of “the man in boston with the hat is here.” The binary-branching analysis attaches the PP “with the hat” inside the PP “in boston”: [S [NP [DP the] man [PP in [NP boston [PP with [NP [DP the] hat]]]]] [VP is [PP here]]]
31. Flat parse of “the man in boston with the hat is here.” Both PPs attach directly to the NP headed by “man” (4-ary branching): [S [NP [DP the] man [PP in [NP boston]] [PP with [NP [DP the] hat]]] [VP is [PP here]]]
33. Implementation Shortcomings A shortcoming in the way a theory implies or represents language processing, in terms of computability and/or cognitive realism.
53. Syntax: Statistical Approach Most currently successful parsing algorithms rely heavily on statistics. However, inferences that require a notion of semantics are difficult for them.
61. Other Notable Work Other cognitive-architecture-based parsers: [8] R. Lewis et al. developed a parser based on “immediate reasoning” in Soar. [9] R. Lewis et al. developed an activation-based parser model in ACT-R. [1][2][3] J. T. Ball et al. developed a parser based on the Double R Grammar model, for “synthetic teammate” development in ACT-R.
64. Other Notable Work However, each of these theories suffers from the shortcomings we’ve already seen. Both parsers designed by Lewis rely on a CFG formalism. Ball’s ACT-R parser is not deeply integrated with reasoning. These approaches are not well integrated with reasoning overall.
66. Motivating Principles To address these shortcomings from an interpretative perspective, four principles are motivated: a) the existence of satellite structures, b) feature structure unification, c) feature structure aggregation, d) incrementality.
67. Satellite Structures When we hear a word, we infer the existence of other structures related to that word.
68. Satellite Structures When we hear a word, we infer the existence of other structures related to that word. E.g. hearing “bit” posits two satellite NPs: [NP Subj] ← bit → [NP Obj].
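As a sketch of this idea (the lexicon format and function names below are hypothetical Python illustrations, not the talk’s actual Polyscheme encoding), a lexical entry can record the satellite structures a word projects, so that hearing the word immediately posits them:

```python
# Hypothetical lexicon: each verb projects "satellite" structures
# whose existence is inferred as soon as the word is heard.
SATELLITES = {
    "bit": [{"cat": "NP", "role": "Subj", "pos": "before"},
            {"cat": "NP", "role": "Obj",  "pos": "after"}],
}

def posit_satellites(word):
    """Return the structures inferred to exist when `word` is heard."""
    return [dict(s, head=word) for s in SATELLITES.get(word, [])]

for s in posit_satellites("bit"):
    print(s["role"], s["cat"])   # Subj NP / Obj NP
```

Hearing “bit” thus yields two as-yet-unfilled NP structures, one expected before the verb and one after.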
70. Feature Structure Unification With the existence of satellite structures, we can unify observed structures together.
71. Feature Structure Unification With the existence of satellite structures, we can unify observed structures together. For “john bit fido”: [NP john] unifies with the Subj satellite of “bit”, and [NP fido] with its Obj satellite.
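A minimal sketch of this unification step, under the assumption that plain Python dicts stand in for feature structures and None marks an underspecified value (illustrative only, not the Polyscheme implementation):

```python
def unify(a, b):
    """Merge two feature structures; None (underspecified) unifies
    with anything. Returns None on a feature clash."""
    out = dict(a)
    for k, v in b.items():
        if v is None:
            out.setdefault(k, None)       # keep slot underspecified
        elif k in out and out[k] is not None and out[k] != v:
            return None                   # clash: unification fails
        else:
            out[k] = v                    # fill in or confirm the value
    return out

# Satellites posited on hearing "bit" (head not yet specified):
subj_sat = {"cat": "NP", "role": "Subj", "head": None}
obj_sat  = {"cat": "NP", "role": "Obj",  "head": None}

# Observed NPs before and after the verb in "john bit fido":
np1 = {"cat": "NP", "head": "john"}
np2 = {"cat": "NP", "head": "fido"}

print(unify(subj_sat, np1))   # {'cat': 'NP', 'role': 'Subj', 'head': 'john'}
print(unify(obj_sat, np2))    # {'cat': 'NP', 'role': 'Obj', 'head': 'fido'}
```

The observed NPs do not create new structure; they fill in structures already inferred to exist.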
83. Architectural Implementation A parser prototype was implemented in the Polyscheme cognitive architecture. We use the model to process and interpret natural language input.
99. Comparative Results We compare against the previous incarnation of the syntactic parser in Polyscheme, which is loyal to the CFG formalism and relies very heavily on search.
100. Comparative Results The new model has significant benefits: it is orders of magnitude faster (seconds vs. tens of minutes) and covers a wider range of sentences.
101. Conclusions We introduced a syntactic theory based on the principles of satellite structure positing, feature structure unification, feature structure aggregation, and incrementality.
102. Conclusions We have a working implementation in a cognitive architecture that is structurally efficient, computationally fast, and cognitively plausible.
103. Contribution We have presented a new grammatical formalism, implemented in a cognitive architecture, integrated with reasoning capabilities, and both computationally efficient and cognitively plausible. This will lead towards a system that can understand natural language.
104. Future Directions Many linguistic phenomena were not mentioned here (not necessarily syntactic ones). Developing a new lexical representation theory to support interpretive grammar. Integration with notions of pragmatics/discourse. Integrating the theory into working applications.
106. Special thanks to (in no particular order): Dr. Nick Cassimatis, Perrin Bignoli, JR Scally, Soledad Vedovato, John Borland, Hiroyuki Uchida.
107. References
[1] Ball, J. T. (2004). A Cognitively Plausible Model of Language Comprehension. Proceedings of the 13th Conference on Behavior Representation in Modeling and Simulation.
[2] Ball, J., Rodgers, S., & Gluck, K. (2001). Integrating ACT-R and Cyc in a large-scale model of language comprehension for use in intelligent agents. Artificial Intelligence.
[3] Ball, J., Heiberg, A., & Silber, R. (2007). Toward a large-scale model of language comprehension in ACT-R 6. In R. L. Lewis, T. A. Polk, & J. E. Laird (Eds.), Proceedings of the 8th International Conference on Cognitive Modeling (pp. 163-168).
[4] Ball, J. T., Heiberg, A., & Silber, R. (2005). Toward a large-scale model of language comprehension in ACT-R 6: construction-driven language processing.
[5] Ball, J. T. (2004). A Cognitively Plausible Model of Language Comprehension. Proceedings of the 13th Conference on Behavior Representation in Modeling and Simulation.
[6] Culicover, P. W., & Jackendoff, R. (2006). The simpler syntax hypothesis. Trends in Cognitive Sciences, 10(9), 413-418. doi: 10.1016/j.tics.2006.07.007.
[7] Lewis, R. L. (1993). An architecturally-based theory of human sentence comprehension. Proceedings of the Fifteenth Annual Conference of the Cognitive Science Society (p. 108). Lawrence Erlbaum.
[8] Lewis, R. L., Newell, A., & Polk, T. A. (1989). Toward a Soar theory of taking instructions for immediate reasoning. Proceedings of the Eleventh Annual Conference of the Cognitive Science Society (pp. 514-521). Erlbaum.
[9] Lewis, R. L., & Vasishth, S. (2005). An activation-based model of sentence processing as skilled memory retrieval. Cognitive Science, 29(3), 375-419. doi: 10.1207/s15516709cog0000_25.
[10] Nivre, J. (2005). Dependency grammar and dependency parsing. MSI report 5133.
[11] Pollard, C., & Sag, I. (1994). Head-Driven Phrase Structure Grammar. Studies in Contemporary Linguistics. University of Chicago Press.
[12] Pulman, S. G. (1991). Basic Parsing Techniques: an introductory survey.
[13] Sag, I. A., Wasow, T., & Bender, E. (2003). Syntactic Theory: A Formal Introduction (second edition). CSLI Publications.
Editor's Notes
Consider this sentence “the dog bit the man”
Our interpretation of the described event is like this.
Now consider a rearrangement of the words: “The man bit the dog.”
We get a completely different interpretation.
Now it may seem as though word order is all there is to syntax. However, consider this example: “I hit the man with my car.” In particular, we can focus on the prepositional phrase at the end, “with my car.”
We have the first reading here, where the preposition modifies the VP headed by “hit”. In this instance, we get the interpretation “I hit the man WHILE DRIVING my car.” However, a second reading exists…
If “with my car” modifies the NP “the man,” we get the interpretation “I hit the man WHO HAD my car.” Thus, for these two reasons, syntactic interpretation is a lot more than simply word order. On top of that, it matters to us as human beings: the syntactic interpretation of “I hit the man with my car” has very important legal ramifications.
Moving towards a theory of grammar that is useful for interpreting natural language, we realized that current grammar formalisms have a number of shortcomings.
We will show that, in fact, theoretical and implementation considerations are so closely related that it doesn’t make sense to make a strong distinction between them. That is to say, we can deal with many important shortcomings both theoretically and implementation-wise in a fairly elegant manner, using a unified framework that addresses both.
Generative grammar is not concerned with interpretability; it is concerned with over-generation. Interpretability extends beyond simply grammatical generation.
Generative grammar formalisms do not give us an account of how we can interpret sentences. Generative grammar, in this case, only goes so far as saying that this sentence is ungrammatical, but has no concern for the fact that it is still interpretable.
We use our knowledge of grammaticality to interpret an utterance, even if it is ungrammatical. So there is a similarity between “Fido bit dog” and “Fido bit the dog”: our knowledge of grammar allows us to get this meaning. Thus, traditional grammar formalisms are not concerned with how grammatical knowledge is actually used, but only with the nature of grammatical knowledge itself.
Many grammar formalisms require binary branching in order to preserve the recursive aspect of their rule applications. However, this leads to spurious parse trees that are an artifact of the phrasal nesting; in many cases it is more accurate to deliver a flat parse tree. Consider the sentence “the man in boston with the hat is here,” and in particular the prepositional phrasal nesting that occurs here. The nested parse seems to imply that “with the hat” modifies the NP headed by “boston,” as if the city of Boston has a hat. In fact, it is “the man” who has the hat. A more correct parse is the flat one: notice the 4-ary branching at the NP. This is more realistic and closer to what is actually intended by the utterance. Hence, syntactic interpretation is more than just a question of word order; it is highly integrated with the reasoning that yields this interpretation rather than the previous one.
Many theories of grammar are not concerned with the dynamic nature of syntactic parsing, or with how syntax is generated on-line and incrementally in a left-to-right fashion (e.g. complement selection before specifier selection). You can see in this case that, before we could identify the subject of the sentence (“john”), we had to identify the object (“mary”). This misses the crucial intuition that, when we hear “john saw…”, we have already identified “john” as the subject of the sentence.
Many theories of grammar posit an unnecessary amount of structure in order to preserve the recursiveness of their rule applications. Consider “the tall strong angry man.” As the parse develops, we see the needless addition of NP constituency structures. Ideally we would obviate this unnecessary structure, as it is computationally inefficient and increases the load on working memory.
Another problem is that a number of nodes here are potentially open for modificational attachment. In particular, consider the possibility of a following prepositional phrase, like “in boston.” Almost immediately there is a high degree of attachment ambiguity, for what should be an unproblematic parse.
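One standard way to quantify this: under strictly binary branching, the number of distinct bracketings available for a head followed by n attachable modifiers grows as the Catalan numbers. A quick illustrative computation (not from the talk itself):

```python
from math import comb

def catalan(n):
    """Number of distinct binary trees over n+1 leaves."""
    return comb(2 * n, n) // (n + 1)

# Binary bracketings available as modifiers accumulate:
for n in range(1, 6):
    print(n, catalan(n))   # 1, 2, 5, 14, 42
```

Even a handful of trailing modifiers opens dozens of candidate attachment structures, which is exactly the explosion a flatter representation avoids.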
Thus, if you start with the goal of natural language “understanding,” in any deep sense of the word, statistics falters where representation beyond the words themselves is required. In this case, there has to be a notion of what we mean by “the couple” in order to draw the link between the couple and the two pronouns in the second sentence.
“joe has put those raw potatoes in the pot.”
The way a theory is represented has very real consequences for computability. Simply put, generating a parse for what could potentially be a simple procedure (which is to some extent empirically supported by the fact that we do not struggle heavily to interpret these sentences) becomes a very expensive procedure, perhaps prohibitively so.
However, each of these theories suffers from the shortcomings we’ve already seen. Both of Lewis' parsers rely on a context-free grammar formalism, and so are subject to those limitations. Ball's parser is non-monotonic and does not backtrack; ours is monotonic and uses backtracking, which corresponds more strongly to human sentence comprehension. We disagree on this particular theoretical point: we believe that human cognition utilizes monotonicity and backtracking, while they implicitly assume it does not. Finally, these approaches are not well integrated with reasoning, which can be seen as closely related to the shortcomings we identified for CFGs.
I know that there are satellite structures for the verb “bit”: it takes an NP Subject and an NP Object.
Suppose we observe the existence of two NP’s, one that precedes the verb, and one that follows the verb.
Thus, syntactic interpretation can be thought of as the UNIFICATION of feature structures that already exist. This kind of unification is driven by two ideas: that words have satellite structures (in this case, “bit” has a subject NP and an object NP), and that many words and phrases are also parts of other satellite centers (in this case, “fido” and “john” are taken by “bit”, treating it as a satellite center). With these ideas, we can unify these structures. So far, then, we have observed that we do not NEED to posit additional structures during a parse, so long as we know they exist in some form. With this knowledge, unification of feature structures allows us to avoid the excessive structure positing that occurs in many CFG formalisms.
Sequential unification of multiple feature structures yields an “aggregating” feature structure, which builds up its internal representation through aggregation across multiple posited structures.
Computationally, once again, we avoid positing excessive structure through the sequential application of unification across multiple feature structures. Thus, reasoned unification applied in sequence generates “aggregate” structures composed of the internal features of a number of different structures.
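The sequential character of aggregation can be sketched as a fold: each new word contributes a partial feature structure, and unifying them in order builds up one aggregate. This is a simplified Python illustration (the merge ignores clashes for brevity; it is not the Polyscheme mechanism):

```python
from functools import reduce

def unify(a, b):
    """Merge feature structures; later non-None values fill in gaps."""
    out = dict(a)
    for k, v in b.items():
        if v is not None:
            out[k] = v
    return out

# Each incoming word contributes a partial structure; folding unify
# over them yields the single aggregating NP.
increments = [
    {"cat": "NP", "det": "the"},
    {"mod": ["tall"]},
    {"head": "man"},
]
aggregate = reduce(unify, increments, {})
print(aggregate)   # {'cat': 'NP', 'det': 'the', 'mod': ['tall'], 'head': 'man'}
```

No intermediate NP nodes are created: the same structure simply accumulates internal features as the parse proceeds.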
It is fairly clear that humans process sentences incrementally, (mostly) one word at a time, as a sequential process.
Using those points as motivations, we’ve developed a parser prototype that has been implemented in the Polyscheme cognitive architecture.
The intention of the model is to process and interpret natural language input.
Pair-wise pattern matching: we have a native set of pair-wise matching constraints that generate a parse incrementally, from left to right. For instance, if a Determiner appears before a Noun, a rule can unify the structures and aggregate the NP. Pair-wise pattern matching is very fast, as it limits the details relevant to a decision point to a very small number of identifiable features of the current state of the parse.
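A single pair-wise constraint might look like this in sketch form (hypothetical rule format, not the actual Polyscheme constraint syntax):

```python
# One hypothetical pair-wise constraint: a Determiner immediately
# before a Noun triggers unification into an aggregated NP.
def det_noun_rule(left, right):
    if left["cat"] == "Det" and right["cat"] == "N":
        return {"cat": "NP", "det": left["word"], "head": right["word"]}
    return None   # rule does not apply to this pair

np = det_noun_rule({"cat": "Det", "word": "the"},
                   {"cat": "N",   "word": "man"})
print(np)   # {'cat': 'NP', 'det': 'the', 'head': 'man'}
```

Because each rule inspects only an adjacent pair, the decision at each step is cheap regardless of how long the sentence is.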
As you can see, the unification mechanism is actually very simple from this high-level view. The backbone of feature structure unification is the Same() atom, which simply expresses that two objects are the same, and thus inherit each other's attributes, similar to Leibniz's law.
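One plausible way to realize the Same() idea is union-find with merged attribute tables, so that objects asserted to be the same come to share a single attribute set. This is an illustrative sketch under that assumption; Polyscheme's actual mechanism may differ:

```python
# Sketch of Same(): asserting Same(x, y) makes the two objects
# share attributes (union-find with merged attribute dictionaries).
parent, attrs = {}, {}

def find(x):
    """Return the canonical representative of x's equivalence class."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]   # path compression
        x = parent[x]
    return x

def same(x, y):
    """Assert Same(x, y): merge classes and pool their attributes."""
    rx, ry = find(x), find(y)
    if rx != ry:
        attrs[rx] = {**attrs.get(rx, {}), **attrs.get(ry, {})}
        parent[ry] = rx

attrs["np1"] = {"cat": "NP", "role": "Subj"}
attrs["x2"]  = {"head": "man"}
same("np1", "x2")
print(attrs[find("np1")])   # {'cat': 'NP', 'role': 'Subj', 'head': 'man'}
```

After the Same() assertion, querying either object through its representative yields the pooled attributes, which is the inheritance behavior the slide describes.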
Here is just a small subset of the pair-wise pattern matching rules we can use. Focus on the two ways of attaching the complements; I'm still on the fence about which one I prefer, but I'm leaning towards the first.
At time step 1, we hear “the”.
At this point, we've heard two words, “the” and “tall”, so we've incrementally added the next word in the utterance. We have an underspecified node at the top of the phrasal chain of “tall”, and we can unify these structures with a rule. What has happened is that the NP has been aggregated to the right, by positing that the NP inferred to exist at time step 1 is the SAME as the underspecified node inferred to exist at time step 2. What this amounts to is that we have generated an aggregate grouping that, at this point, contains the first NP and the next XP.
Now add the next word: “man”. Once again, we apply a rule to incrementally grow the parse, incorporating “man” into the growing syntactic parse tree. We have once again aggregated the NP structure, inheriting more information that defines in greater detail the internal structure of the aggregating NP. Thus, we have shown a method of parsing that grows a parse incrementally, left to right, and that does not posit excessive structure, through the notions of unification and aggregation. We have also identified the satellite structure that satisfies the obligatory specifier argument in the internal structure of “man”.
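The whole incremental walk-through can be condensed into a sketch that consumes the words one at a time, growing a single aggregate NP rather than positing a new node per word (hypothetical lexicon and category labels, not the talk's actual rule set):

```python
# Incremental sketch: consume an NP one word at a time,
# aggregating one structure instead of nesting a node per word.
LEX = {
    "the":    {"cat": "Det"},
    "tall":   {"cat": "Adj"},
    "strong": {"cat": "Adj"},
    "angry":  {"cat": "Adj"},
    "man":    {"cat": "N"},
}

def parse_np(words):
    np = {"cat": "NP", "det": None, "mods": [], "head": None}
    for w in words:                      # strict left-to-right order
        c = LEX[w]["cat"]
        if c == "Det":
            np["det"] = w                # specifier unified in
        elif c == "Adj":
            np["mods"].append(w)         # modifier aggregated
        elif c == "N":
            np["head"] = w               # head fills the satellite slot
    return np

print(parse_np(["the", "tall", "man"]))
```

Note that “the tall strong angry man” aggregates into the same single NP, with three modifiers, rather than the nested NP chain a binary-branching grammar would build.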
In a sense, it is also more cognitively plausible, in that it reflects two key facts about human performance. Human sentence comprehension occurs in finite time: we cannot take tens of minutes to converge on an interpretation, so quicker sentence processing is closer to human-level performance. Wider coverage of sentence phenomena also moves us closer to human-level intelligence: the more sentences we can cover, the more of human performance we have captured. So high performance is, to a large degree, a move towards cognitive plausibility.