1. Introduction Structural Features Our Approach Results and Analysis Acknowledgement References
Towards Building a Text Commitment
System
Gaurav Arora1
200801229
Supervisor
Prof. Prasenjit Majumder1
1 Dhirubhai Ambani Institute of Information and Communication Technology
Outline
1 Introduction
Problem Definition
Natural Language Understanding
Literature Survey and Usage
Approach Overview
2 Structural Features
3 Our Approach
Generating Model for simple sentences
Extracting Similar POS Patterns and sentence generation
4 Results and Analysis
Problem Definition
Textual Commitment
Publicly Held Beliefs
A Textual Commitment system simplifies a complex sentence into a
set of simple sentences, which are the public beliefs conveyed by
the complex sentence.
Textual Commitment Origin
Textual Commitment was proposed by LCC (Language Computer
Corporation) as a core component module for Natural
Language Understanding.
Problem Definition
Textual Commitment Example
Example Complex Sentence
Text: "The Extra Girl" (1923) is a story of a small-town girl, Sue
Graham (played by Mabel Normand), who comes to Hollywood
to be in the pictures. This Mabel Normand vehicle, produced by
Mack Sennett, followed earlier films about the film industry and
also paved the way for later films about Hollywood, such as
King Vidor's "Show People" (1928).
Simplified Sentences
T1. "The Extra Girl" is a story of a small-town girl.
T2. "The Extra Girl" is a story of Sue Graham.
T3. Sue Graham is a small-town girl.
T4. Sue Graham [was] played by Mabel Normand.
T5. Sue Graham comes to Hollywood to be in the pictures.
T6. A Mabel Normand vehicle was produced by Mack Sennett.
Natural Language understanding
Language Understanding Components
By machine reading or text understanding we mean the
formation of a coherent set of beliefs based on a textual
corpus and a background theory.
Textual Entailment systems determine whether one
sentence is entailed by another.
Language understanding Features
Noisy
Limited scope
Corpus-wide statistics
Minimal reasoning
Bottom up
General
Very Fast!
Literature Survey and Usage
Question Answering to QA4MRE
Question Answering (QA) systems have an upper bound of
about 60% accuracy in system performance.
Current QA systems place little emphasis on understanding
and analyzing text.
To tackle the 60% upper bound, QA4MRE focuses on
understanding a single document, with emphasis on
components like Textual Commitment and Textual Entailment.
Literature Survey and Usage
Textual Entailment
The PASCAL Recognising Textual Entailment (RTE) Challenge
has been a reputed evaluation campaign for research in
Textual Entailment for the past 7 years.
Researchers use logic provers to detect entailment and
overcome the need for background knowledge, with a
performance upper bound of 71%.
LCC's proposed Textual Commitment obtained a 9%
improvement over this upper bound.
Literature Survey and Usage
Textual Entailment classes
Figure: Textual Entailment Classes
Literature Survey and Usage
Textual Commitment Approach
LCC Heuristic Approach
LCC's TC system uses a series of extraction heuristics in order
to enumerate a subset of the discourse commitments that are
inferable from either the text or the hypothesis.
Statistical approach for Textual Commitment
Since these heuristics are not publicly available, we decided to
build a Textual Commitment system using statistical features of
language.
Approach Overview
Statistical approach for Textual Commitment
Learning grammatical structural rules of simple
sentences (POS tags).
Converting complex sentences into structural elements.
Finding similar rules for generating simple sentences.
Generating simple sentences in natural language based on
the rules.
Example Part of Speech Tagging
They-PRP were-VBD easy-JJ as-IN they-PRP levelled-VBD
Feature
The key feature for statistical Textual Commitment
generation is Part-of-Speech tagging.
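The POS-tagging feature above can be illustrated with a minimal sketch. A real system would use a trained tagger (e.g. the Stanford tagger or NLTK); the tiny lexicon below is a hypothetical stand-in used only to reproduce the slide's example.

```python
# Illustrative sketch of the Part-of-Speech feature extraction step.
# TOY_LEXICON is a hypothetical stand-in for a trained POS tagger.
TOY_LEXICON = {
    "they": "PRP", "were": "VBD", "easy": "JJ",
    "as": "IN", "levelled": "VBD",
}

def pos_tag(tokens):
    """Tag each token, falling back to NN for unknown words."""
    return [(tok, TOY_LEXICON.get(tok.lower(), "NN")) for tok in tokens]

tagged = pos_tag("They were easy as they levelled".split())
print(" ".join(f"{w}-{t}" for w, t in tagged))
# They-PRP were-VBD easy-JJ as-IN they-PRP levelled-VBD
```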
Simple Sentence Distribution
Figure: A distribution of sentences in English
Comparison of POS Tags
Figure: A Distribution of POS Tags in simple sentences
Generating Model for simple sentences
Module 1 Block diagram
Generating Model for simple sentences
Basic Components
Tri-gram Language Model Generation on POS Tags.
Artificial Generation of POS Patterns.
Ranking of artificially generated POS patterns based on the
created Language Model.
Example
Ranked POSTAG Patterns
-53.7293 DT NN VBD VBN
-54.0778 PRP VBP RB VBN
-54.2327 NNP NN NNP NNP
-54.7982 PRP VBP RB JJ
-55.3234 NNP NNP NN NNP
Total Generated Rules: 9606406
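The trigram language model and ranking steps above can be sketched in pure Python. The actual system uses SRILM; the add-k smoothing, vocabulary size, and two-sentence toy corpus below are illustrative assumptions, not the project's real training setup.

```python
import math
from collections import Counter

def train_trigram_lm(tag_sequences):
    """Count POS-tag trigrams and their context bigrams over a
    corpus of simple sentences (padded with start/end markers)."""
    tri, bi = Counter(), Counter()
    for seq in tag_sequences:
        padded = ["<s>", "<s>"] + seq + ["</s>"]
        for i in range(len(padded) - 2):
            bi[tuple(padded[i:i + 2])] += 1
            tri[tuple(padded[i:i + 3])] += 1
    return tri, bi

def score(pattern, tri, bi, vocab_size, k=1.0):
    """Add-k smoothed log10 probability of a POS pattern,
    in the same log10 units as the SRILM scores on the slide."""
    padded = ["<s>", "<s>"] + pattern + ["</s>"]
    logp = 0.0
    for i in range(len(padded) - 2):
        num = tri[tuple(padded[i:i + 3])] + k
        den = bi[tuple(padded[i:i + 2])] + k * vocab_size
        logp += math.log10(num / den)
    return logp

# Toy corpus of simple-sentence POS patterns (illustrative only).
corpus = [["DT", "NN", "VBD", "VBN"], ["PRP", "VBP", "RB", "JJ"]]
tri, bi = train_trigram_lm(corpus)
print(score(["DT", "NN", "VBD", "VBN"], tri, bi, vocab_size=40))
```

A pattern seen in training scores higher than an unseen one, which is exactly how the 9.6 million artificially generated patterns are ranked and thresholded.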
Generating Model for simple sentences
Distribution of POSTAG in Simple Sentence Tokens
Figure: Distribution of POS tags in simple sentence tokens
Generating Model for simple sentences
Distribution of Simple Sentence based on LM score
Total Rules: 9606406
Number of rules categorized by LM score:
Rules > -100: 679545
Rules > -90: 170662
Rules > -80: 27280
Rules > -70: 2328
Rules > -65: 474
Rules > -60: 76
Rules > -70, word length 5 and 6: 1594
Rules > -83, word length 7 and 8: 3110
Total rules considered for matching: 4704
Extracting Similar POS Patterns and sentence generation
Module 2 and 3 Block Diagram
Extracting Similar POS Patterns and sentence generation
Extracting Similar POS Patterns - Basic Components
Extraction of POS tags and chunks from complex
sentences.
Chunks are noun phrases: words that occur together and
must also occur together in the simple sentences.
POS rules from Module 1 are treated as virtual
documents.
Searching for rules/documents similar to the chunks and
POS tags in the complex sentence.
Xapian is used for search; phrase search ensures that the
chunk tags occur together in the matched rules.
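Xapian performs the real indexing and phrase search; the pure-Python sketch below only illustrates the matching idea: each rule is a virtual document, chunk tag sequences must occur contiguously, and rules are ranked by POS-tag overlap. The overlap scoring formula is an assumption for illustration, not Xapian's actual ranking function.

```python
def contains_phrase(rule, phrase):
    """True if the tag sequence `phrase` occurs contiguously in `rule`
    (the role phrase search plays in Xapian)."""
    n = len(phrase)
    return any(rule[i:i + n] == phrase for i in range(len(rule) - n + 1))

def match_rules(rules, pos_counts, chunks):
    """Rank rules by POS-tag overlap with the complex sentence,
    keeping only rules where every chunk's tags occur together."""
    results = []
    for rule in rules:
        if not all(contains_phrase(rule, c) for c in chunks):
            continue
        overlap = sum(min(rule.count(t), n) for t, n in pos_counts.items())
        results.append((overlap / len(rule), rule))
    return sorted(results, reverse=True)

# Two candidate rules from Module 1, and tag frequencies from the
# "Harriet Lane" complex sentence on the previous slide.
rules = [
    "NNP NNP VBD VBN IN DT NN RB".split(),
    "NNP VBD RB VBN IN DT NN RB".split(),
]
hits = match_rules(
    rules,
    {"NNP": 8, "VBD": 2, "VBN": 1, "IN": 3, "DT": 2, "NN": 2, "RB": 1},
    chunks=[["DT", "NN"]],
)
for sc, rule in hits:
    print(f"{sc:.0%}", " ".join(rule))
```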
Extracting Similar POS Patterns and sentence generation
Extracting Similar POS Patterns - Module I/O
Example
Sentence:
A Revenue Cutter, the ship was named for Harriet Lane, niece
of President James Buchanan, who served as Buchanan's
White House hostess.
Example
Frequency of POS tags and chunks in the complex sentence:
POS tags: WP=1, VBN=1, IN=3, NNP=8, DT=2, VBD=2, ..
Chunks: DT NN NN=1, VBD VBN=1, NNP NNP NNP=1, ..
Example
Extracted Patterns from Xapian:
91% NNP NNP VBD VBN IN DT NN RB
86% NNP VBD RB VBN IN DT NN RB
Extracting Similar POS Patterns and sentence generation
Simple Sentence Generation - Basic Components
Replacement of all chunks in the similar POS tag rules with
their chunk values.
Additional rules with different chunk values are added if a
chunk maps to more than one value.
After chunk replacement, the remaining POS tags are filled
with values.
This module generates many noisy sentences.
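The chunk replacement and slot filling steps can be sketched as follows. The sketch simplifies the indexed chunk values (NNP NNP-0, NNP NNP-1, ...) into a list of candidates per chunk pattern; the chunk map and word candidates below are illustrative, and enumerating every combination is what produces the noisy output the slide mentions.

```python
import itertools

def generate(rule, chunk_map, word_map):
    """Substitute chunk tag sequences in `rule` with their surface
    strings, then fill remaining POS tags with every candidate word.
    Enumerating all combinations yields many noisy sentences."""
    slots = []
    i = 0
    while i < len(rule):
        for tags, values in chunk_map.items():
            pattern = tags.split()
            if rule[i:i + len(pattern)] == pattern:
                slots.append(values)          # one slot per matched chunk
                i += len(pattern)
                break
        else:
            slots.append(word_map.get(rule[i], ["?"]))
            i += 1
    return [" ".join(choice) for choice in itertools.product(*slots)]

rule = "NNP NNP VBD VBN IN DT NN".split()
chunk_map = {"NNP NNP": ["Harriet Lane", "White House"],
             "VBD VBN": ["was named"], "DT NN": ["the ship"]}
word_map = {"IN": ["for", "of"]}
for s in generate(rule, chunk_map, word_map):
    print(s)
```

With two candidates for the NNP NNP chunk and two for the IN slot, this single rule already yields four sentences, only some of them sensible.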
Extracting Similar POS Patterns and sentence generation
Simple Sentence Generation - Module I/O
Chunk value mapping
NNP NNP-1=White House, NNP NNP-0=Harriet Lane, VBD
VBN-0=was named, DT NN-0=the ship, RB IN-0=niece of
Example
A Revenue Cutter, the ship was named for Harriet Lane, niece
of President James Buchanan, who served as Buchanan's
White House hostess.
Simple Sentences:
Harriet Lane President James Buchanan niece
Harriet Lane served for hostess
Harriet Lane was for the ship
Buchanan's White House the ship hostess
Harriet Lane was the ship
Recall System
Recall of the system is important, since its output is the
input to Textual Entailment and other Natural Language
Understanding modules.
The recall of our statistical Textual Commitment system is
0.23.
System recall was calculated on 5 complex sentences.
This recall value shows positive signs for a more sophisticated
statistical Textual Commitment system.
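Recall here can be computed as the fraction of gold simple sentences that the system actually produced. A minimal sketch, with hypothetical gold and generated sets (not the project's real evaluation data):

```python
def recall(generated, gold):
    """Fraction of gold simple sentences recovered by the system."""
    hits = sum(1 for g in gold if g in generated)
    return hits / len(gold) if gold else 0.0

# Hypothetical gold commitments and noisy system output.
gold = {"Sue Graham is a small-town girl",
        "Sue Graham comes to Hollywood",
        "The Extra Girl is a story of Sue Graham",
        "Sue Graham was played by Mabel Normand"}
generated = {"Sue Graham is a small-town girl",
             "Sue Graham Hollywood pictures girl"}   # noisy sentence
print(recall(generated, gold))  # 0.25
```

Exact string matching is the simplest criterion; a fuzzier match (e.g. token overlap) would be needed for a fairer evaluation.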
Analysis
Requires an additional module to rank good sentences and
remove noisy ones.
A more sophisticated Natural Language Generation module
is needed.
Generating simple sentence patterns from the complex
sentences themselves rather than artificially generating rules.
Finding a more suitable model: a combination of bi-gram
and tri-gram models, or a bigram model.
Acknowledgement
I would like to express my sincere thanks to Prof. Prasenjit
Majumder for providing the opportunity to work under his
esteemed guidance, helping throughout the project, and
providing valuable critical suggestions on my work. Additionally,
I would like to thank the SRILM and Xapian teams for helping
me work with their open-source software.
References
A. Hickl: A discourse commitment-based framework for
recognizing textual entailment.
Anselmo Peñas et al.: Overview of QA4MRE at CLEF 2011:
Question Answering for Machine Reading Evaluation,
Working Notes of CLEF (2011).
L. Bentivogli (FBK-irst) et al.: The Sixth PASCAL
Recognizing Textual Entailment Challenge (2010).
Olly Betts: Xapian, version 1.2.9.
Asher Stern and Ido Dagan: A Confidence Model for
Syntactically-Motivated Entailment Proofs. In Proceedings
of RANLP 2011.
Katrin Kirchhoff et al.: Factored Language Models Tutorial
(2008).
The IEEE website (2002). [Online]. Available:
http://www.ieee.org/
SRILM Language Modelling Toolkit (2010).