Introduction     Structural Features    Our Approach   Results and Analysis   Acknowledgement   References




               Towards Building a Text Commitment
                             System

                                             Gaurav Arora1
                                              200801229
                                               Supervisor
                                       Prof. Prasenjit Majumder1

               1 Dhirubhai   Ambani Institute of Information and Communication Technology




Outline


       1       Introduction
                  Problem Definition
                  Natural Language Understanding
                  Literature Survey and Usage
                  Approach Overview
       2       Structural Features
       3       Our Approach
                 Generating Model for simple sentences
                 Extracting Similar POS Patterns and sentence
                 generation
       4       Results and Analysis


Problem Definition

Textual Commitment




       Publicly Held Beliefs
       A Textual Commitment system simplifies a complex sentence into
       a set of simple sentences, each a publicly held belief conveyed
       by the complex sentence.

       Textual Commitment Origin
       Textual Commitment was proposed by LCC (Language Computer
       Corporation) to be used as a core component module for Natural
       Language Understanding.


Problem Definition

Textual Commitment Example

       Example Complex Sentence
       Text: "The Extra Girl" (1923) is a story of a small-town girl, Sue
       Graham (played by Mabel Normand) who comes to Hollywood
       to be in the pictures. This Mabel Normand vehicle, produced by
       Mack Sennett, followed earlier films about the film industry and
       also paved the way for later films about Hollywood, such as
       King Vidor's "Show People" (1928).

       Simplified Sentences
       T1.     "The Extra Girl" is a story of a small-town girl.
       T2.     "The Extra Girl" is a story of Sue Graham.
       T3.     Sue Graham is a small-town girl.
       T4.     Sue Graham [was] played by Mabel Normand.
       T5.     Sue Graham comes to Hollywood to be in the pictures.
       T6.     A Mabel Normand vehicle was produced by Mack Sennett.


Natural Language understanding

Language Understanding Components

               By machine reading, or understanding text, we mean the
               formation of a coherent set of beliefs based on a textual
               corpus and a background theory.
               Textual Entailment systems determine whether one
               sentence is entailed by another.
       Language understanding Features
               Noisy
               Limited scope
               Corpus-wide statistics
               Minimal reasoning
               Bottom up
               General
               Very Fast!


Literature Survey and Usage

Question Answering to QA4MRE




               Question Answering (QA) systems have an accuracy upper
               bound of 60%.
               Current QA systems place little emphasis on understanding
               and analyzing text.
               To get past the 60% upper bound, QA4MRE focuses on
               understanding a single document, with emphasis on
               components such as Textual Commitment and Textual Entailment.


Literature Survey and Usage

Textual Entailment




               The PASCAL Recognising Textual Entailment (RTE) Challenge
               has been a reputed evaluation campaign for Textual
               Entailment research for the past 7 years.
               Researchers use logic provers to detect entailment and
               to overcome the need for background knowledge, with a
               performance upper bound of 71%.
               The Textual Commitment approach proposed by LCC obtained
               a 9% improvement over this upper bound.


Literature Survey and Usage

Textual Entailment classes




                                 Figure: Textual Entailment Classes


Literature Survey and Usage

Textual Commitment Approach




       LCC Heuristic Approach
       LCC's TC system uses a series of extraction heuristics in order
       to enumerate a subset of the discourse commitments that are
       inferable from either the text or the hypothesis.

       Statistical approach for Textual Commitment
       Because these heuristics are not available, we decided to build a
       Textual Commitment system using statistical features of language.


Approach Overview

Statistical approach for Textual Commitment


               Learning grammatical structural rules of simple
               sentences (POS tags).
               Converting complex sentences into structural elements.
               Finding similar rules for generating simple sentences.
               Generating simple sentences in natural language based on
               the rules.

       Example Part of Speech Tagging
       They-PRP were-VBD easy-JJ as-IN they-PRP levelled-VBD

       Feature
       The key feature for statistical Textual Commitment
       generation is Part of Speech tagging.
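
       A minimal sketch of how the tagged example above can be turned into a
       POS pattern, the structural element used in the later modules. The
       function name split_tagged and the word-TAG input format are
       assumptions taken from the example line, not the author's code.

```python
def split_tagged(tagged_sentence):
    """Split 'word-TAG' tokens into parallel word and tag lists."""
    words, tags = [], []
    for token in tagged_sentence.split():
        # rpartition splits on the last hyphen, so hyphens inside a word survive
        word, _, tag = token.rpartition("-")
        words.append(word)
        tags.append(tag)
    return words, tags


words, tags = split_tagged("They-PRP were-VBD easy-JJ as-IN they-PRP levelled-VBD")
pos_pattern = " ".join(tags)
print(pos_pattern)  # PRP VBD JJ IN PRP VBD
```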




Simple Sentence Distribution




                        Figure: Distribution of sentences in English




Comparison of POS Tags




               Figure: A Distribution of POS Tags in simple sentences


Generating Model for simple sentences

Module 1 Block diagram


Generating Model for simple sentences

Basic Components


               Tri-gram Language Model Generation on POS Tags.
               Artificial Generation of POS Patterns.
               Ranking of Artificially generated sentences based on
               created Language Model.

       Example
       Ranked POSTAG Patterns
       -53.7293 DT NN VBD VBN
       -54.0778 PRP VBP RB VBN
       -54.2327 NNP NN NNP NNP
       -54.7982 PRP VBP RB JJ
       -55.3234 NNP NNP NN NNP
       Total Generated Rules: 9606406
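
       The ranking step can be sketched as follows. This is an illustrative
       stand-in, not the toolkit used for the slides: it trains a tri-gram
       model over POS tags with add-one smoothing (an assumption) on a tiny
       made-up corpus, so its log10 scores will not match the numbers above.
       The names train_trigram_lm and log10_prob are hypothetical.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"


def train_trigram_lm(tag_sequences):
    """Count tri-grams and their two-tag contexts over POS tag sequences."""
    tri, ctx, vocab = Counter(), Counter(), set()
    for tags in tag_sequences:
        padded = [BOS, BOS] + list(tags) + [EOS]
        vocab.update(padded)
        for i in range(2, len(padded)):
            tri[tuple(padded[i - 2:i + 1])] += 1
            ctx[tuple(padded[i - 2:i])] += 1
    return tri, ctx, len(vocab)


def log10_prob(tags, model):
    """Log10 probability of a POS pattern under the add-one smoothed model."""
    tri, ctx, vocab_size = model
    padded = [BOS, BOS] + list(tags) + [EOS]
    total = 0.0
    for i in range(2, len(padded)):
        trigram = tuple(padded[i - 2:i + 1])
        context = tuple(padded[i - 2:i])
        total += math.log10((tri[trigram] + 1) / (ctx[context] + vocab_size))
    return total


# Toy training data: POS patterns of simple sentences (illustrative only).
simple_patterns = [
    "DT NN VBD VBN".split(),
    "DT NN VBD JJ".split(),
    "PRP VBP RB JJ".split(),
]
model = train_trigram_lm(simple_patterns)

# Artificially generated candidates, ranked by model score (best first).
candidates = ["VBN VBD NN DT".split(), "DT NN VBD VBN".split()]
ranked = sorted(candidates, key=lambda t: log10_prob(t, model), reverse=True)
for tags in ranked:
    print(round(log10_prob(tags, model), 4), " ".join(tags))
```

       A pattern seen in the training data scores higher than its reversed,
       ungrammatical counterpart, which is the property the ranking relies on.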


Generating Model for simple sentences

Distribution of POSTAG in Simple Sentence Tokens




                                 Figure: Distribution of POS tags in simple sentence tokens


Generating Model for simple sentences

Distribution of Simple Sentence based on LM score


                    Total Rules: 9606406
                    Number of rules categorized by scores
                    Rules > -100 ( 679545 )
                    Rules > -90 ( 170662 )
                    Rules > -80 ( 27280 )
                    Rules > -70 ( 2328 )
                    Rules > -65 ( 474 )
                    Rules > -60 ( 76 )
                    Rules > -70, sentence length 5 and 6 words: 1594
                    Rules > -83, sentence length 7 and 8 words: 3110
                    Total Rules considered for matching: 4704
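
       The selection above (1594 + 3110 = 4704 rules) amounts to a score
       threshold that depends on pattern length. A minimal sketch, assuming
       rules are stored as (score, tags) pairs; the function name
       select_rules and the sample scores are hypothetical, only the
       thresholds come from the slide.

```python
# Length-dependent score thresholds taken from the slide.
THRESHOLDS = {5: -70.0, 6: -70.0, 7: -83.0, 8: -83.0}


def select_rules(scored_rules, thresholds):
    """Keep rules whose LM score exceeds the threshold for their length."""
    selected = []
    for score, tags in scored_rules:
        limit = thresholds.get(len(tags))
        # Lengths without a threshold (e.g. 4 tags) are never selected.
        if limit is not None and score > limit:
            selected.append((score, tags))
    return selected


# Illustrative scores only; these are not values from the generated rule set.
scored_rules = [
    (-68.2, "DT NN VBD VBN IN".split()),
    (-75.0, "PRP VBP RB JJ NN".split()),
    (-81.5, "NNP NNP VBD VBN IN DT NN".split()),
    (-85.0, "NNP VBD RB VBN IN DT NN RB".split()),
    (-50.0, "DT NN VBD VBN".split()),
]
selected = select_rules(scored_rules, THRESHOLDS)
print(len(selected))  # 2
```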


Extracting Similar POS Patterns and sentence generation

Module 2 and 3 Block Diagram


Extracting Similar POS Patterns and sentence generation

Extracting Similar POS Patterns - Basic Components



               Extraction of POS tags and chunks from complex
               sentences.
               Chunks are noun phrases and other co-occurring words that
               must also occur together in the simple sentences.
               POS rules from Module 1 are treated as virtual
               documents.
               Searching for rules/documents similar to the chunks and
               POS tags of the complex sentence.
               Xapian is used for search; phrase search ensures that
               chunk tags occur together in the similar rules.
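
       The retrieval idea can be sketched without a search engine. The code
       below is a pure-Python stand-in, not the Xapian API and not Xapian's
       weighting scheme: each rule is a virtual document, a rule qualifies
       only if at least one chunk occurs in it as a contiguous tag sequence
       (the phrase constraint), and qualifying rules are ranked by a simple
       tag-overlap ratio. The scoring formula, the function names and the
       toy query are all assumptions.

```python
from collections import Counter


def contains_phrase(rule_tags, chunk_tags):
    """True if chunk_tags occurs as a contiguous run inside rule_tags."""
    n = len(chunk_tags)
    return any(rule_tags[i:i + n] == chunk_tags
               for i in range(len(rule_tags) - n + 1))


def rank_rules(rules, query_tag_counts, chunks):
    """Rank rules that contain a chunk phrase by tag overlap with the query."""
    results = []
    for rule in rules:
        tags = rule.split()
        if not any(contains_phrase(tags, chunk.split()) for chunk in chunks):
            continue  # phrase constraint: chunk tags must stay together
        rule_counts = Counter(tags)
        overlap = sum(min(count, query_tag_counts.get(tag, 0))
                      for tag, count in rule_counts.items())
        results.append((overlap / len(tags), rule))
    return sorted(results, reverse=True)


# Rules from Module 1 act as the indexed documents.
rules = ["DT NN VBD VBN", "PRP VBP RB VBN", "NNP NN NNP NNP"]
# Toy query: tag frequencies and chunks of a hypothetical complex sentence.
query_tag_counts = {"DT": 1, "NN": 1, "VBD": 1, "VBN": 1, "NNP": 2}
chunks = ["VBD VBN", "NNP NNP"]

results = rank_rules(rules, query_tag_counts, chunks)
for score, rule in results:
    print(f"{score:.0%} {rule}")
```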


Extracting Similar POS Patterns and sentence generation

Extracting Similar POS Patterns - Module I/O

       Example
       Sentence:
       A Revenue Cutter, the ship was named for Harriet Lane, niece
       of President James Buchanan, who served as Buchanan's
       White House hostess.

       Example
       Frequency of POS tags and chunks in the complex sentence:
       POS tags: WP=1, VBN=1, IN=3, NNP=8, DT=2, VBD=2, ...
       Chunks: DT NN NN=1, VBD VBN=1, NNP NNP NNP=1, ...

       Example
       Extracted Patterns from Xapian:
       91% NNP NNP VBD VBN IN DT NN RB
       86% NNP VBD RB VBN IN DT NN RB


Extracting Similar POS Patterns and sentence generation

Simple Sentence Generation - Basic Components




               All chunks in the similar POS tag rules are replaced with
               their chunk values.
               Additional rules with different chunk values are added if
               a chunk maps to more than one value.
               After chunk replacement, the remaining POS tags are filled
               with values.
               This module generates a lot of noisy sentences.


Extracting Similar POS Patterns and sentence generation

Simple Sentence Generation - Module I/O

       chunk value mapping
       NNP NNP-1=White House, NNP NNP-0=Harriet Lane,
       VBD VBN-0=was named, DT NN-0=the ship, RB IN-0=niece of

       Example
       A Revenue Cutter, the ship was named for Harriet Lane, niece
       of President James Buchanan, who served as Buchanan's
       White House hostess.
       Simple Sentences:
       Harriet Lane President James Buchanan niece
       Harriet Lane served for hostess
       Harriet Lane was for the ship
       Buchanan ? White House the ship hostess
       Harriet Lane was the ship
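
       The substitution step can be sketched as below, assuming a
       chunk-to-values mapping like the one shown above and a separate
       tag-to-words table for tags left over after chunk replacement. The
       function name fill_pattern, the example pattern and the IN=for entry
       are illustrative assumptions. A chunk with several values yields
       several sentences, which is one source of the noise noted earlier.

```python
from itertools import product


def fill_pattern(pattern, chunk_values, word_values):
    """Expand a POS pattern into every sentence its chunk and word values allow."""
    tags = pattern.split()
    # Try longer chunks first so they are not shadowed by shorter ones.
    chunk_keys = sorted(chunk_values, key=lambda key: -len(key.split()))
    slots, i = [], 0
    while i < len(tags):
        for key in chunk_keys:
            key_tags = key.split()
            if tags[i:i + len(key_tags)] == key_tags:
                slots.append(chunk_values[key])
                i += len(key_tags)
                break
        else:
            # No chunk starts here: fill the single leftover tag with a word.
            slots.append(word_values.get(tags[i], []))
            i += 1
    # An empty slot makes product() yield nothing, so no sentence is produced.
    return [" ".join(choice) for choice in product(*slots)]


chunk_values = {
    "NNP NNP": ["Harriet Lane", "White House"],
    "VBD VBN": ["was named"],
    "DT NN": ["the ship"],
}
word_values = {"IN": ["for"]}  # assumed value for the leftover IN tag

sentences = fill_pattern("DT NN VBD VBN IN NNP NNP", chunk_values, word_values)
for sentence in sentences:
    print(sentence)
```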




Recall System




               Recall is important because the system's output is the
               input to textual entailment and other Natural Language
               Understanding modules.
               Recall of our statistical Textual Commitment system is
               0.23.
               System recall was calculated on 5 complex sentences.
               This recall value is a positive sign for building a more
               sophisticated statistical Textual Commitment system.
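
       For reference, recall here is the fraction of reference simple
       sentences that the system manages to generate. The slides do not say
       how a generated sentence was matched against a reference, so the
       sketch below assumes exact match after light normalization; the
       generated list is made up for illustration and does not reproduce the
       reported 0.23.

```python
def normalize(sentence):
    """Lowercase, drop full stops and collapse whitespace before comparing."""
    return " ".join(sentence.lower().replace(".", "").split())


def recall(generated, gold):
    """Fraction of gold sentences that appear among the generated ones."""
    gold_set = {normalize(s) for s in gold}
    generated_set = {normalize(s) for s in generated}
    return len(gold_set & generated_set) / len(gold_set)


gold = [
    "Sue Graham is a small-town girl.",
    "Sue Graham was played by Mabel Normand.",
    "Sue Graham comes to Hollywood to be in the pictures.",
    "A Mabel Normand vehicle was produced by Mack Sennett.",
]
# Hypothetical system output: one correct sentence and two noisy ones.
generated = [
    "Sue Graham is a small-town girl",
    "Sue Graham comes Mabel Normand",
    "Hollywood was the pictures",
]
score = recall(generated, gold)
print(score)  # 0.25
```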




Analysis




               An additional module is required to rank good sentences
               and remove noisy sentences.
               A more sophisticated Natural Language Generation module
               is needed.
               Generate simple sentence patterns from complex
               sentences rather than artificially generating rules.
               Find a more suitable model: a bigram model, or a
               combination of bigram and trigram models.




Acknowledgement




       I would like to express my sincere thanks to Prof. Prasenjit
       Majumder for the opportunity to work under his esteemed
       guidance, for his help throughout the project, and for his
       valuable critical suggestions on my work. Additionally, I would
       like to thank the SRILM and Xapian teams for helping me work
       with their open source software.




References

               Hickl: A discourse commitment-based framework for
               recognizing textual entailment.
               Anselmo Peñas et al.: Overview of QA4MRE at CLEF
               2011: Question Answering for Machine Reading
               Evaluation. Working Notes of CLEF (2011).
               L. Bentivogli (FBK-irst) et al.: The Sixth PASCAL
               Recognizing Textual Entailment Challenge (2010).
               Olly Betts: Xapian, version 1.2.9.
               Asher Stern and Ido Dagan: A Confidence Model for
               Syntactically-Motivated Entailment Proofs. In Proceedings
               of RANLP 2011.
               Katrin Kirchhoff et al.: Factored Language Models Tutorial
               (2008).
               (2002) The IEEE website. [Online]. Available:
               http://www.ieee.org/
               (2010) SRILM-Language Modelling Toolkit.

Mais conteúdo relacionado

Destaque

Automatic id
Automatic idAutomatic id
Automatic id
chanchira
 
Automatic id
Automatic idAutomatic id
Automatic id
chanchira
 
Strategy social media and journalism
Strategy social media and journalismStrategy social media and journalism
Strategy social media and journalism
Davy Sims
 
Informática i
Informática iInformática i
Informática i
ricardo
 
Social Media w Chinach
Social Media w ChinachSocial Media w Chinach
Social Media w Chinach
Konceptika
 
Its not easy being green
Its not easy being greenIts not easy being green
Its not easy being green
mediaman64
 
บทที่51
 บทที่51 บทที่51
บทที่51
kik.nantanit
 
Bridge Outdoors Fall and Winter 2011 Catalog
Bridge Outdoors Fall and Winter 2011 CatalogBridge Outdoors Fall and Winter 2011 Catalog
Bridge Outdoors Fall and Winter 2011 Catalog
Bridge Outdoors
 
Automatic id
Automatic idAutomatic id
Automatic id
chanchira
 
Sat vocabulary drills set one
Sat vocabulary drills set oneSat vocabulary drills set one
Sat vocabulary drills set one
Cecily Anderson
 
Lecture4 binary-numbers-logic-operations
Lecture4  binary-numbers-logic-operationsLecture4  binary-numbers-logic-operations
Lecture4 binary-numbers-logic-operations
markme18
 

Destaque (20)

Journalism and Social Media
Journalism and Social MediaJournalism and Social Media
Journalism and Social Media
 
Automatic id
Automatic idAutomatic id
Automatic id
 
Automatic id
Automatic idAutomatic id
Automatic id
 
Strategy social media and journalism
Strategy social media and journalismStrategy social media and journalism
Strategy social media and journalism
 
Social Media Association for Business Presentation
Social Media Association for Business PresentationSocial Media Association for Business Presentation
Social Media Association for Business Presentation
 
Control del filtro por aire comprimido
Control del filtro por aire comprimidoControl del filtro por aire comprimido
Control del filtro por aire comprimido
 
Introduction to Social Media
Introduction to Social MediaIntroduction to Social Media
Introduction to Social Media
 
Tech training workshop 3 final 090810
Tech training   workshop 3 final 090810Tech training   workshop 3 final 090810
Tech training workshop 3 final 090810
 
Informática i
Informática iInformática i
Informática i
 
T karaoke song catalog
T karaoke song catalogT karaoke song catalog
T karaoke song catalog
 
Social Media w Chinach
Social Media w ChinachSocial Media w Chinach
Social Media w Chinach
 
Its not easy being green
Its not easy being greenIts not easy being green
Its not easy being green
 
บทที่51
 บทที่51 บทที่51
บทที่51
 
Bridge outdoors Spring & Summer 2012
Bridge outdoors Spring & Summer 2012Bridge outdoors Spring & Summer 2012
Bridge outdoors Spring & Summer 2012
 
Журналисты 2.0
Журналисты 2.0Журналисты 2.0
Журналисты 2.0
 
Bridge Outdoors Fall and Winter 2011 Catalog
Bridge Outdoors Fall and Winter 2011 CatalogBridge Outdoors Fall and Winter 2011 Catalog
Bridge Outdoors Fall and Winter 2011 Catalog
 
Automatic id
Automatic idAutomatic id
Automatic id
 
Sat vocabulary drills set one
Sat vocabulary drills set oneSat vocabulary drills set one
Sat vocabulary drills set one
 
Plattegronden interieurtekenen
Plattegronden interieurtekenenPlattegronden interieurtekenen
Plattegronden interieurtekenen
 
Lecture4 binary-numbers-logic-operations
Lecture4  binary-numbers-logic-operationsLecture4  binary-numbers-logic-operations
Lecture4 binary-numbers-logic-operations
 

Semelhante a 200801229 final presentation

Ekaw ontology learning for cost effective large-scale semantic annotation
Ekaw ontology learning for cost effective large-scale semantic annotationEkaw ontology learning for cost effective large-scale semantic annotation
Ekaw ontology learning for cost effective large-scale semantic annotation
Shahab Mokarizadeh
 
DOMAIN BASED CHUNKING
DOMAIN BASED CHUNKINGDOMAIN BASED CHUNKING
DOMAIN BASED CHUNKING
kevig
 
DOMAIN BASED CHUNKING
DOMAIN BASED CHUNKINGDOMAIN BASED CHUNKING
DOMAIN BASED CHUNKING
ijnlc
 
DOMAIN BASED CHUNKING
DOMAIN BASED CHUNKINGDOMAIN BASED CHUNKING
DOMAIN BASED CHUNKING
kevig
 
Mining Product Reputations On the Web
Mining Product Reputations On the WebMining Product Reputations On the Web
Mining Product Reputations On the Web
feiwin
 

Semelhante a 200801229 final presentation (20)

Class Diagram Extraction from Textual Requirements Using NLP Techniques
Class Diagram Extraction from Textual Requirements Using NLP TechniquesClass Diagram Extraction from Textual Requirements Using NLP Techniques
Class Diagram Extraction from Textual Requirements Using NLP Techniques
 
D017232729
D017232729D017232729
D017232729
 
UNDERSTAND SHORTTEXTS BY HARVESTING & ANALYZING SEMANTIKNOWLEDGE
UNDERSTAND SHORTTEXTS BY HARVESTING & ANALYZING SEMANTIKNOWLEDGEUNDERSTAND SHORTTEXTS BY HARVESTING & ANALYZING SEMANTIKNOWLEDGE
UNDERSTAND SHORTTEXTS BY HARVESTING & ANALYZING SEMANTIKNOWLEDGE
 
Text summarization
Text summarization Text summarization
Text summarization
 
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
 
Ekaw ontology learning for cost effective large-scale semantic annotation
Ekaw ontology learning for cost effective large-scale semantic annotationEkaw ontology learning for cost effective large-scale semantic annotation
Ekaw ontology learning for cost effective large-scale semantic annotation
 
Recent and Robust Query Auto-Completion - WWW 2014 Conference Presentation
Recent and Robust Query Auto-Completion - WWW 2014 Conference PresentationRecent and Robust Query Auto-Completion - WWW 2014 Conference Presentation
Recent and Robust Query Auto-Completion - WWW 2014 Conference Presentation
 
Optimized Technique for Academic Search engine Optimization
Optimized Technique for Academic Search engine OptimizationOptimized Technique for Academic Search engine Optimization
Optimized Technique for Academic Search engine Optimization
 
Natural Language Generation / Stanford cs224n 2019w lecture 15 Review
Natural Language Generation / Stanford cs224n 2019w lecture 15 ReviewNatural Language Generation / Stanford cs224n 2019w lecture 15 Review
Natural Language Generation / Stanford cs224n 2019w lecture 15 Review
 
ACL-IJCNLP 2015
ACL-IJCNLP 2015ACL-IJCNLP 2015
ACL-IJCNLP 2015
 
[KDD 2018 tutorial] End to-end goal-oriented question answering systems
[KDD 2018 tutorial] End to-end goal-oriented question answering systems[KDD 2018 tutorial] End to-end goal-oriented question answering systems
[KDD 2018 tutorial] End to-end goal-oriented question answering systems
 
DOMAIN BASED CHUNKING
DOMAIN BASED CHUNKINGDOMAIN BASED CHUNKING
DOMAIN BASED CHUNKING
 
DOMAIN BASED CHUNKING
DOMAIN BASED CHUNKINGDOMAIN BASED CHUNKING
DOMAIN BASED CHUNKING
 
DOMAIN BASED CHUNKING
DOMAIN BASED CHUNKINGDOMAIN BASED CHUNKING
DOMAIN BASED CHUNKING
 
Abstractive Review Summarization
Abstractive Review SummarizationAbstractive Review Summarization
Abstractive Review Summarization
 
Mining Product Reputations On the Web
Mining Product Reputations On the WebMining Product Reputations On the Web
Mining Product Reputations On the Web
 
Sk t academy lecture note
Sk t academy lecture noteSk t academy lecture note
Sk t academy lecture note
 
Tutorial on Opinion Mining and Sentiment Analysis
Tutorial on Opinion Mining and Sentiment AnalysisTutorial on Opinion Mining and Sentiment Analysis
Tutorial on Opinion Mining and Sentiment Analysis
 
IRJET- Short-Text Semantic Similarity using Glove Word Embedding
IRJET- Short-Text Semantic Similarity using Glove Word EmbeddingIRJET- Short-Text Semantic Similarity using Glove Word Embedding
IRJET- Short-Text Semantic Similarity using Glove Word Embedding
 
WordNet Based Online Reverse Dictionary with Improved Accuracy and Parts-of-S...
WordNet Based Online Reverse Dictionary with Improved Accuracy and Parts-of-S...WordNet Based Online Reverse Dictionary with Improved Accuracy and Parts-of-S...
WordNet Based Online Reverse Dictionary with Improved Accuracy and Parts-of-S...
 

Último

The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
heathfieldcps1
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
negromaestrong
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
ciinovamais
 

Último (20)

Role Of Transgenic Animal In Target Validation-1.pptx
Role Of Transgenic Animal In Target Validation-1.pptxRole Of Transgenic Animal In Target Validation-1.pptx
Role Of Transgenic Animal In Target Validation-1.pptx
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
 
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-IIFood Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural ResourcesEnergy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 

200801229 final presentation

  • 1. Introduction Structural Features Our Approach Results and Analysis Acknowledgement References Towards Building a Text Commitment System Gaurav Arora1 200801229 Supervisor Prof. Prasenjit Majumder1 1 Dhirubhai Ambani Institute of Information and Communication Technology
  • 2. Introduction Structural Features Our Approach Results and Analysis Acknowledgement References Outline 1 Introduction Problem Defination Natural Langauge understanding Literature Survey and Usage Approach Overview 2 Structural Features 3 Our Approach Generating Model for simple sentences Extracting Similar POS Patterns and sentence genreration 4 Results and Analysis
  • 3. Introduction Structural Features Our Approach Results and Analysis Acknowledgement References Problem Definition Textual Commitment Publicly Held Beliefs Textual Commitment system simplifies complex sentence in a set of simple sentences which are public beliefs conveyed by the complex sentence. Textual Commitment Origin Textual Entailment was proposed by LCC(Language Computer Corporation) to be used as core component module for Natural Language Understanding.
  • 4. Introduction Structural Features Our Approach Results and Analysis Acknowledgement References Problem Definition Textual Commitment Example Example Complex Sentence Text: "The Extra Girl" (1923) is a story of a small?town girl,Sue Graham (played by Mabel Normand) who comes to Hollywood to be in the pictures. This Mabel Normand vehicle, produced by Mack Sennett, followed earlier films about the film industry and also paved the way for later films about Hollywood, such as King Vidor?s "Show People" (1928). Simplified Sentences T1. "The Extra Girl" is a story of a small?town girl. T2. "The Extra Girl" is a story of Sue Graham. T3. Sue Graham is a small?town girl. T4. Sue Graham [was] played by Mabel Normand. T5. Sue Graham comes to Hollywood to be in the pictures. T6. A Mabel Normand vehicle was produced by Mack Sennett.
  • 5. Introduction Structural Features Our Approach Results and Analysis Acknowledgement References Natural Language understanding Language Understanding Components By machine reading or understanding text mean the formation of a coherent set of beliefs based on a textual corpus and a background theory. Textual Entailment systems determine whether one sentence is entailed by another. Language understanding Features Noisy Limited scope Corpus-wide statistics Minimal reasoning Bottom up General Very Fast!
  • 6. Introduction Structural Features Our Approach Results and Analysis Acknowledgement References Literature Survey and Usage Question Answering to QA4MRE Question Answering(QA) System have a upper bound of 60% of accuracy in systems performance. Current QA system have less emphasis on Understanding and analyzing text. To tackle 60% upper bound QA4MRE focuses on understanding single document and emphasis is on component like Textual Commitment,Textual Entailment.
  • 7. Introduction Structural Features Our Approach Results and Analysis Acknowledgement References Literature Survey and Usage Textual Entailment Pascal Recognising Textual Entailment(RTE) Challenge is reputed evaluation campaign for research in Textual Entailment from past 7 years. Researcher use logic prover to detect entailment to overcome need of background knowledge with an performance upper bound as 71%. LCC Proposed Textual commitment obtained 9% improvement over upper bound.
  • 8. Introduction Structural Features Our Approach Results and Analysis Acknowledgement References Literature Survey and Usage Textual Entailment classes Figure: Textual Entailment Classes
  • 9. Introduction Structural Features Our Approach Results and Analysis Acknowledgement References Literature Survey and Usage Textual Commitment Approach LCC Heuristic Approach LCC’s TC system uses a series of ex-traction heuristics in order to enumerate a subset of the discourse commitments that are inferable from either the text or hypothesis Statistical approach for Textual Commitment Due to unavailability of Heuristics, we decided to build a Textual Commitment system using statistical features of Language.
  • 10. Introduction Structural Features Our Approach Results and Analysis Acknowledgement References Approach Overview Statistical approach for Textual Commitment Learning Grammatical Structural rules of Simple Sentences(POS Tags). Converting Complex Sentences into Structural Elements. Finding Similar Rules for Generating Simple sentences. Generating simple sentences in natural language based on Rules. Example Part of Speech Tagging They-PRP were-VBD easy-JJ as-IN they-PRP levelled-VBD Feature Key feature for statistical language , Textual Commitment generation is Part of Speech tagging.
  • 11. Simple Sentence Distribution. Figure: Distribution of sentences in English
  • 12. Comparison of POS Tags. Figure: Distribution of POS tags in simple sentences
  • 13. Generating Model for simple sentences: Module 1 Block Diagram
  • 14. Generating Model for simple sentences: Basic Components. (1) Tri-gram language model generation on POS tags. (2) Artificial generation of POS patterns. (3) Ranking of the artificially generated sentence patterns by the language model. Example ranked POS tag patterns (language model score, pattern):
      -53.7293  DT NN VBD VBN
      -54.0778  PRP VBP RB VBN
      -54.2327  NNP NN NNP NNP
      -54.7982  PRP VBP RB JJ
      -55.3234  NNP NNP NN NNP
    Total generated rules: 9606406
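The three components above can be sketched in plain Python. This is an illustration under stated assumptions, not the project's implementation: the acknowledgement slide thanks the SRILM team, so the real model was presumably built with that toolkit, whereas this sketch uses simple add-one smoothing for brevity. The training sequences and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def train_trigram_model(tag_sequences):
    """Count tag trigrams and their two-tag contexts over padded sequences."""
    trigrams, contexts, vocab = Counter(), Counter(), set()
    for seq in tag_sequences:
        padded = ["<s>", "<s>"] + list(seq) + ["</s>"]
        vocab.update(padded)
        for i in range(2, len(padded)):
            trigrams[tuple(padded[i - 2:i + 1])] += 1
            contexts[tuple(padded[i - 2:i])] += 1
    return trigrams, contexts, len(vocab)


def log10_score(seq, model):
    """Log10 probability of a tag sequence with add-one smoothing (assumed)."""
    trigrams, contexts, vocab_size = model
    padded = ["<s>", "<s>"] + list(seq) + ["</s>"]
    total = 0.0
    for i in range(2, len(padded)):
        tri = tuple(padded[i - 2:i + 1])
        ctx = tuple(padded[i - 2:i])
        total += math.log10((trigrams[tri] + 1) / (contexts[ctx] + vocab_size))
    return total


# Hypothetical POS patterns of simple sentences used as training data.
training = [
    ["DT", "NN", "VBD", "VBN"],
    ["PRP", "VBP", "RB", "VBN"],
    ["DT", "NN", "VBD", "JJ"],
]
model = train_trigram_model(training)

# Artificial generation: every length-4 pattern over the observed tag set.
tagset = sorted({tag for seq in training for tag in seq})
candidates = [list(p) for p in product(tagset, repeat=4)]

# Ranking: highest language model score first.
ranked = sorted(((log10_score(c, model), c) for c in candidates), reverse=True)
for score, pattern in ranked[:5]:
    print(f"{score:.4f} {' '.join(pattern)}")
```

Exhaustive enumeration grows exponentially with pattern length, which is consistent with the slide reporting millions of generated rules and motivates the score-based filtering that follows.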
  • 15. Generating Model for simple sentences: Distribution of POS tags in Simple Sentence Tokens. Figure: Textual Entailment Classes
  • 16. Generating Model for simple sentences: Distribution of Simple Sentences based on LM score. Total rules: 9606406. Number of rules categorized by score:
      Rules > -100: 679545
      Rules > -90:  170662
      Rules > -80:  27280
      Rules > -70:  2328
      Rules > -65:  474
      Rules > -60:  76
    Rules selected for matching:
      Rules > -70, length 5 and 6 words: 1594
      Rules > -83, length 7 and 8 words: 3110
      Total rules considered for matching: 4704
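The selection step on this slide amounts to a length-dependent score cutoff. The sketch below uses the two thresholds stated on the slide; the scored rules in the example are made-up values for illustration, and `select_rules` is a hypothetical helper name.

```python
# Thresholds from the slide: 5-6 tag rules need score > -70, 7-8 tag rules need score > -83.
THRESHOLDS = {5: -70.0, 6: -70.0, 7: -83.0, 8: -83.0}


def select_rules(scored_rules):
    """Keep (score, pattern) pairs whose length is 5 to 8 and whose score passes the cutoff."""
    selected = []
    for score, pattern in scored_rules:
        limit = THRESHOLDS.get(len(pattern.split()))
        if limit is not None and score > limit:
            selected.append((score, pattern))
    return selected


# Hypothetical scored rules (scores are illustrative, not from the project).
scored_rules = [
    (-68.2, "DT NN VBD VBN IN"),               # length 5, passes -70
    (-75.0, "DT NN VBD VBN IN NN"),            # length 6, fails -70
    (-80.1, "NNP NNP VBD VBN IN DT NN"),       # length 7, passes -83
    (-85.0, "NNP VBD RB VBN IN DT NN RB"),     # length 8, fails -83
    (-53.7, "DT NN VBD VBN"),                  # length 4, outside the selected lengths
]
selected = select_rules(scored_rules)
print(len(selected))

# The slide's totals are consistent: 1594 + 3110 rules are considered for matching.
print(1594 + 3110)  # 4704
```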
  • 17. Extracting Similar POS Patterns and sentence generation: Module 2 and 3 Block Diagram
  • 18. Extracting Similar POS Patterns and sentence generation: Extracting Similar POS Patterns, Basic Components. POS tags and chunks are extracted from the complex sentence. Chunks are noun phrases and other words occurring together, which must also occur together in the simple sentences. The POS rules from Module 1 are treated as virtual documents. The system searches for rules (documents) similar to the chunks and POS tags of the complex sentence. Xapian is used for the search; phrase search ensures that the tags of a chunk occur together in the similar rules.
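To make the "rules as virtual documents" idea concrete, here is a plain-Python stand-in for the search step. It is not Xapian and does not reproduce Xapian's ranking: the score (number of chunks found as contiguous phrases plus the fraction of rule tags present in the query) is an assumption chosen for readability, and the query tag set is partly assumed because the slide elides the full list.

```python
def contains_phrase(rule_tags, chunk_tags):
    """True if chunk_tags occurs as a contiguous run inside rule_tags."""
    n = len(chunk_tags)
    return any(rule_tags[i:i + n] == chunk_tags
               for i in range(len(rule_tags) - n + 1))


def search_rules(rules, query_tags, chunks):
    """Rank rules by phrase hits plus tag overlap (illustrative scoring only)."""
    results = []
    for rule in rules:
        rule_tags = rule.split()
        phrase_hits = sum(contains_phrase(rule_tags, chunk) for chunk in chunks)
        overlap = sum(tag in query_tags for tag in rule_tags) / len(rule_tags)
        results.append((phrase_hits + overlap, rule))
    return sorted(results, reverse=True)


# Rules play the role of documents; two of them appear on the following slide.
rules = [
    "NNP NNP VBD VBN IN DT NN RB",
    "NNP VBD RB VBN IN DT NN RB",
    "PRP VBP RB JJ",
]
# Tags and chunks of the complex sentence (NN is assumed to be among the tags).
query_tags = {"WP", "VBN", "IN", "NNP", "DT", "VBD", "NN"}
chunks = [["DT", "NN", "NN"], ["VBD", "VBN"], ["NNP", "NNP", "NNP"]]

results = search_rules(rules, query_tags, chunks)
for score, rule in results:
    print(f"{score:.3f} {rule}")
```

The contiguity check in `contains_phrase` mirrors what phrase search is used for in the slides: a rule only gets credit for a chunk when the chunk's tags appear next to each other.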
  • 19. Extracting Similar POS Patterns and sentence generation: Extracting Similar POS Patterns, Module I/O. Example sentence: A Revenue Cutter, the ship was named for Harriet Lane, niece of President James Buchanan, who served as Buchanan's White House hostess. Example frequency of POS tags and chunks in the complex sentence:
      POS tags: WP=1, VBN=1, IN=3, NNP=8, DT=2, VBD=2,..
      Chunks: DT NN NN=1, VBD VBN=1, NNP NNP NNP=1,..
    Example patterns extracted from Xapian (match percentage, pattern):
      91%  NNP NNP VBD VBN IN DT NN RB
      86%  NNP VBD RB VBN IN DT NN RB
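Frequency lists in the `TAG=count` form shown on this slide can be produced with `collections.Counter`. The sketch below is illustrative only: the tagged fragment is a hypothetical tagging of part of the example sentence, and counting tag bigrams stands in for a real chunker, which the slides do not describe.

```python
from collections import Counter


def format_counts(counter):
    """Render counts in the 'KEY=count' style used on the slide."""
    return ", ".join(f"{key}={count}" for key, count in counter.items())


# Hypothetical tagging of a fragment of the example sentence.
tagged = [("the", "DT"), ("ship", "NN"), ("was", "VBD"), ("named", "VBN"),
          ("for", "IN"), ("Harriet", "NNP"), ("Lane", "NNP")]
tags = [tag for _, tag in tagged]

tag_counts = Counter(tags)
# Adjacent tag pairs as a crude stand-in for chunk tag sequences.
bigram_counts = Counter(" ".join(tags[i:i + 2]) for i in range(len(tags) - 1))

print("POS tags:", format_counts(tag_counts))
print("Tag bigrams:", format_counts(bigram_counts))
```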
  • 20. Extracting Similar POS Patterns and sentence generation: Simple Sentence Generation, Basic Components. All chunks in the similar POS tag rules are replaced with their chunk values. Additional rules with different chunk values are added if a chunk maps to more than one value. After chunk replacement, the remaining POS tags are filled with values. This module generates a lot of noisy sentences.
  • 21. Extracting Similar POS Patterns and sentence generation: Simple Sentence Generation, Module I/O. Chunk value mapping: NNP NNP-1=White House, NNP NNP-0=Harriet Lane, VBD VBN-0=was named, DT NN-0=the ship, RB IN-0=niece of. Example: A Revenue Cutter, the ship was named for Harriet Lane, niece of President James Buchanan, who served as Buchanan's White House hostess. Simple Sentences: Harriet Lane President James Buchanan niece Harriet Lane served for hostess Harriet Lane was for the ship Buchanan ? White House the ship hostess Harriet Lane was the ship
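The replacement procedure can be sketched as follows. This is a hypothetical reading of the slides, not the original code: the pattern, the greedy left-to-right chunk matching, and the single-word value for `IN` are assumptions, while the chunk values themselves come from the mapping on this slide.

```python
from itertools import product


def segment(pattern_tags, chunk_values):
    """Greedily split a tag pattern into known chunks and leftover single tags."""
    chunk_keys = sorted((tuple(key.split()) for key in chunk_values),
                        key=len, reverse=True)
    segments, i = [], 0
    while i < len(pattern_tags):
        for key in chunk_keys:
            if tuple(pattern_tags[i:i + len(key)]) == key:
                segments.append(" ".join(key))
                i += len(key)
                break
        else:
            # No chunk starts here, so keep the tag as a leftover slot.
            segments.append(pattern_tags[i])
            i += 1
    return segments


def generate(pattern, chunk_values, word_values):
    """Fill chunks and leftover tags; one sentence per combination of values."""
    options = []
    for seg in segment(pattern.split(), chunk_values):
        if seg in chunk_values:
            options.append(chunk_values[seg])
        else:
            # A leftover tag with no known word yields no sentences at all.
            options.append(word_values.get(seg, []))
    return [" ".join(choice) for choice in product(*options)]


chunk_values = {
    "NNP NNP": ["Harriet Lane", "White House"],
    "VBD VBN": ["was named"],
    "DT NN": ["the ship"],
}
word_values = {"IN": ["for"]}  # assumed filler for the leftover IN tag

sentences = generate("DT NN VBD VBN IN NNP NNP", chunk_values, word_values)
for sentence in sentences:
    print(sentence)
```

Because a chunk with several values multiplies the output, even this tiny example yields one plausible sentence and one noisy one ("the ship was named for White House"), which illustrates why the module produces many noisy sentences.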
  • 22. Recall. System recall is important because the system's output is the input to textual entailment and other natural language understanding modules. The recall of our statistical Textual Commitment system is 0.23, calculated on 5 complex sentences. This value is a positive sign for building a more sophisticated statistical Textual Commitment system.
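Recall here is the share of expected commitments that the system manages to generate. The slides do not say how a generated sentence was judged to match an expected one, so the sketch below assumes exact matching after lowercasing and whitespace normalization; the sentences in the example are invented for illustration and do not reproduce the reported 0.23.

```python
def normalize(sentence):
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(sentence.lower().split())


def recall(generated, gold):
    """Fraction of gold commitments found among the generated sentences."""
    gold_set = {normalize(s) for s in gold}
    generated_set = {normalize(s) for s in generated}
    if not gold_set:
        return 0.0
    return len(gold_set & generated_set) / len(gold_set)


# Hypothetical gold commitments and system output for one complex sentence.
gold = [
    "The ship was named for Harriet Lane",
    "Harriet Lane was the niece of President James Buchanan",
    "Harriet Lane served as White House hostess",
    "The ship was a Revenue Cutter",
]
generated = [
    "the ship was named for Harriet Lane",
    "the ship was named for White House",
    "Harriet Lane was the ship",
]
value = recall(generated, gold)
print(value)  # 0.25
```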
  • 23. Analysis. An additional module is required to rank good sentences and remove noisy ones. A more sophisticated natural language generation module is needed. Simple sentence patterns could be generated from the complex sentences rather than by artificially generating rules. A more suitable model should be found, for example a combination of bi-gram and tri-gram models, or a bi-gram model.
  • 24. Acknowledgement. I would like to express my sincere thanks to Prof. Prasenjit Majumder for the opportunity to work under his esteemed guidance, for his help throughout the project, and for his valuable critical suggestions on my work. Additionally, I would like to thank the SRILM and Xapian teams for helping me work with their open source software.
  • 25. References.
      Hickl: A discourse commitment-based framework for recognizing textual entailment.
      Anselmo Peñas et al.: Overview of QA4MRE at CLEF 2011: Question Answering for Machine Reading Evaluation. Working Notes of CLEF (2011).
      L. Bentivogli (FBK-irst) et al.: The Sixth PASCAL Recognizing Textual Entailment Challenge (2010).
      Olly Betts: Xapian, version 1.2.9.
      Asher Stern and Ido Dagan: A Confidence Model for Syntactically-Motivated Entailment Proofs. In Proceedings of RANLP 2011.
      Katrin Kirchhoff et al.: Factored Language Models Tutorial (2008).
      (2002) The IEEE website. [Online]. Available: http://www.ieee.org/
      (2010) SRILM-Language Modelling Toolkit.