A ROBUST FACTOID BASED
QUESTION GENERATION SYSTEM
PRESENTED BY
ANIMESH SHAW
ARITRA DAS
SHREEPARNA SARKAR
CONTENTS
• Motivation.
• Our Objective.
• About Factoid Questions.
• Basic Terminology.
• Working Procedure.
• Rule base Generation.
• Question Generation.
• Evaluation.
• Future Scope.
MOTIVATION
Google Speech Recognition
Chat bots talking to each other, taken from Cornell Creative Machines Lab
Google Translator, currently translating English to Bengali.
Cleverbot, a chat bot with a good sense of humor. Taken from http://www.cleverbot.com/
CONTD.
OUR OBJECTIVE
• Build an efficient Question Generation System.
• Generate factoid questions from a text document or corpus.
• Generate questions from each and every sentence; if a sentence carries no
information, it is discarded.
• For some sentences more than one type of factoid question is possible, so
attempt to generate all such possible types.
• Take the user’s opinion or feedback, and improve the results for further use.
FACTOID QUESTIONS?
Factoid Questions: questions that demand accurate information about an
entity or an event, such as person names, locations, organizations, etc., as
opposed to definition questions, opinion questions, or complex questions
such as why or how questions.
BASIC TERMINOLOGY
1. TOKENIZING: Breaking a string into words and punctuation marks.
e.g. - I went home last night. → [‘I’, ‘went’, ‘home’, ‘last’, ‘night’, ‘.’ ]
2. TAGGING: Assigning Parts-of-speech tags to words.
e.g. - cat → noun → NN, eat → verb → VB
3. LEMMATIZING: Finding word lemmata
(e.g. - was → be).
4. CHUNKING: Grouping words that convey a single thought and tagging
those groups. The tags can be Verb Phrase, Prepositional Phrase, Noun
Phrase, etc.
e.g. → Bangladesh defeated India in 2007 World Cup
CONTD.
5. CHUNKS: ‘Bangladesh’ , ‘defeated’, ‘India’ , ‘in’, ‘2007 World Cup’
6. RELATION FINDING: Finding relations between the chunks, i.e. the
sentence subject, object and predicates, as:
RELATIONS:
Bangladesh defeated India in 2007 World Cup
NP-SBJ-1 VP-1 NP-OBJ-1 PP-TMP-1 NP-TMP-1
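The tokenizing step above can be sketched in a few lines. This is a minimal
regex-based toy, not the system's actual tokenizer:

```python
import re

def tokenize(text):
    # Split a string into words and punctuation marks, as in the
    # "I went home last night." example above.
    return re.findall(r"\w+|[^\w\s]", text)

tokenize("I went home last night.")
# → ['I', 'went', 'home', 'last', 'night', '.']
```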
RELATION EXAMPLE
CONTD.
WORKING PROCEDURE
1. Took large training sets of Wh-questions.
2. Broke each sentence into chunks and parsed it.
3. Found relations.
The Sentence: “Who became the 16th president of the United States of America in
1861”
CHUNKING:
['Who', 'NP-SBJ-1']
['became', 'VP-1']
['the 16th president', 'NP-PRD-1']
['of', 'PP']
['United States', 'NP']
['of', 'PP']
['America', 'NP']
['in', 'PP']
['1861', 'NP']
Storing the tags in a List
['NP-SBJ-1', 'VP-1', 'NP-PRD-1', 'PP', 'NP', 'PP-1', 'NP-1']
“who”
Storing the tags with the corresponding Wh-Type in a list
['Who', ['VP-1', 'NP-PRD-1', 'PP', 'NP', 'PP-1', 'NP-1']]
4. Determined the wh-type by observing the head word of the question.
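Reading the wh-type off the head word can be sketched as follows (a
hypothetical helper, not the system's code):

```python
WH_WORDS = {"who", "what", "when", "where", "why", "how"}

def wh_type(question_tokens):
    # The wh-type is determined by the head (first) word of the question.
    head = question_tokens[0].lower()
    return head.capitalize() if head in WH_WORDS else None

wh_type(["Who", "became", "the", "16th", "president"])
# → 'Who'
```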
CONTD.
RULE BASE GENERATION
The Parent Tree:
This tree is fed to the system before the training is done. When the
system reads a question it determines the Wh-type and traverses to
that specific node and starts populating the tree.
POPULATING THE RULE-TREE
Travelled to the specific wh-node and stored the relations by populating
the subsequent nodes of the tree with the chunk relations.
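The population step can be sketched with nested dicts. This is a hypothetical
sketch of the idea (the tree structure and helper name are ours, not the
system's code); the count kept on the tail node is used on the next slide:

```python
def add_question(tree, wh_type, relations):
    # Each training question contributes a path of chunk-relation tags
    # under its wh-type node; a count is kept on the tail node only.
    node = tree.setdefault(wh_type, {})
    for rel in relations:
        node = node.setdefault(rel, {})
    node["count"] = node.get("count", 0) + 1
    return tree

rule_tree = {}
add_question(rule_tree, "Who", ["VP-1", "NP-PRD-1", "PP", "NP"])
add_question(rule_tree, "Who", ["VP-1", "NP-PRD-1", "PP", "NP"])
```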
NORMALIZED COUNT
This value lets the parser know whether or not to print a question while
backtracking to other child nodes. It is defined as:
(occurrences of a tail node among the training questions) /
(total number of questions with that particular wh-tag)
The count is attached to the tail node only.
Example:
‘Who doesn’t want to rule the world?’
Nodes: NP-SBJ-1 VP-1,VBZ-VB-TO-VB NP-OBJ-1, 14
Here, this question structure appears 14 times in the training set. The tail
node stores the count as an integer; when the recursive descent parser
parses the question base it normalizes the value. This also lets the system
offer the user the most probable question among many candidates.
RULE-TREE WITH NORMALIZED COUNT
While populating, the count of visiting each tail node (the node that holds the
last chunk relation) is saved in the corresponding node.
A snapshot of the rule base with count value:
ANSWER PREPROCESSING AND
QUESTION TYPE DECISION SYSTEM
• While populating the tree with manually generated questions, the NER tag of
the answer for a given question is stored with the corresponding wh-tag.
• Only some word(s) are stored.
Example :
“Who is the Father of the Nation? Ans: Mahatma Gandhi.”
‘Mahatma Gandhi’ on NER tagging:
Mahatma [PERSON]
Gandhi [PERSON]
ANSWER BASE
When the same tag is found in the answer again and again, the
count value is increased accordingly.
The Answer Base
Vocabulary = 4
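The answer base can be sketched as per-wh-tag counters. The structure is an
assumption on our part; the tags mirror the ‘Who’ example above:

```python
from collections import Counter

# NER tags of answers accumulate per wh-tag: each time the same tag is
# found in an answer again, its count value is increased.
answer_base = {"Who": Counter()}
for tag in ["PERSON", "PERSON", "PERSON"]:
    answer_base["Who"][tag] += 1

answer_base["Who"]["PERSON"]
# → 3
```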
PRIORITIZING THE QUESTIONS
It is possible that more than one question lies on a single path from
root to a leaf. The system prioritizes the questions according to their
count-depth product:
normalized count * depth of tail node
Example:
Questions Priority
Who is Mahatma Gandhi? (14/747)*3 = 0.056
Who is the father of nation? (21/747)*5 = 0.14
So the second question is more likely to be the one generally asked
when the given sentence is parsed, although this depends on the
training set’s questions.
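The count-depth product above can be checked directly. The counts (14, 21)
and the total (747) are taken from the example; the helper name is ours:

```python
def priority(tail_count, total_for_wh, depth):
    # Normalized count (tail count / total questions for the wh-tag)
    # multiplied by the depth of the tail node.
    return (tail_count / total_for_wh) * depth

round(priority(14, 747, 3), 3)  # → 0.056
round(priority(21, 747, 5), 2)  # → 0.14
```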
SELECTION OF QUESTION
The probability of each probable question type is calculated using the
following function:
F(sentence) = Max(Probability(Words/Wh-tag))
The tag with the maximum probability is taken into consideration, and that
type of question is generated.
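A naive Bayes-style reading of this function can be sketched as follows.
The exact smoothing used by the system is not stated, so the add-one
smoothing and helper name here are assumptions:

```python
def score(wh_prior, tag_counts, vocab_size, sentence_tags):
    # P(wh) times the product, over the sentence's chunk tags, of the
    # smoothed P(tag | wh) from that wh-tag's answer base.
    denom = sum(tag_counts.values()) + vocab_size
    p = wh_prior
    for tag in sentence_tags:
        p *= (tag_counts.get(tag, 0) + 1) / denom
    return p

# Illustrative call with toy 'Who' answer-base counts.
score(3 / 13, {"PERSON": 3}, 3, ["PERSON", "LOCATION", "IN"])
```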
EXAMPLE: TRIGGERING WH-TYPE QUESTIONS
When        Tags    Count
2011        DATE    DATE = 1
9:30PM      TIME    TIME = 1
In 2012     IN      IN = 2
10th OCT    DATE
In summer   IN

Who               Tags      Count
Grace Badell      PERSON    PERSON = 3
John Whiks        PERSON
General Mccllen   PERSON

Where           Tags        Count
Asia            LOCATION    LOCATION = 3
Plymouth        LOCATION    IN = 2
In the sea      IN
Pacific Ocean   LOCATION
In her eyes     IN

Sentence: “Sourav was captain of India in the 2003 world cup.”
Chunks: ‘Sourav’, ‘India’, ‘in the 2003’
Tags: ‘PERSON’, ‘LOCATION’, ‘IN’
Probability(Sourav, India, in the 2003/when)
= Prob(When) * Prob(PERSON/when) * Prob(LOCATION/when) * Prob(IN/when)
= (4/13) * (1/(4+3)) * (1/(4+3)) * (2/(4+3))
= 0.30 * 0.14 * 0.14 * 0.28
= 0.0016
Probability(Sourav, India, in the 2003/where)
= Prob(Where) * Prob(PERSON/where) * Prob(LOCATION/where) * Prob(IN/where)
= (6/13) * (1/(5+2)) * (3/(5+2)) * (2/(5+2))
= 0.46 * 0.1 * 0.3 * 0.2
= 0.0027
Probability(Sourav, India, in the 2003/who)
= Prob(Who) * Prob(PERSON/who) * Prob(LOCATION/who) * Prob(IN/who)
= (3/13) * (3/(3+3)) * (1/(3+3)) * (1/(3+3))
= 0.23 * 0.5 * 0.16 * 0.16
= 0.0029
So, the system will generate the ‘Who’ type question.
CONTD.
After the training is done, the system generates questions from sentences by
traversing the question base with the values of the nodes.
Example: “Mahatma Gandhi is the Father of Nation.”
Suppose it tries to generate a ‘Who’ question from this; the steps would be:
Sentence parsing:
Mahatma Gandhi is the Father of Nation.
Chunks: NP-SBJ-1 VP-1 NP-PRD-1 PP NP
Tags: NNP-NNP VBZ DT-NN IN NN
The chunks and the corresponding relations are put into a table where
the keys are the relations and the values are the chunk phrases.
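That chunk table can be sketched as a plain dict: relation tags as keys,
chunk phrases as values (the phrase boundaries follow the parse above):

```python
# Chunk table for "Mahatma Gandhi is the Father of Nation."
chunk_table = {
    "NP-SBJ-1": "Mahatma Gandhi",  # NNP-NNP
    "VP-1": "is",                  # VBZ
    "NP-PRD-1": "the Father",      # DT-NN
    "PP": "of",                    # IN
    "NP": "Nation",                # NN
}
```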
CONTD.
Question Generation:
These relation and tag pairs are searched for by a recursive descent parser
in the question base. If a path is found with these nodes, the corresponding
chunks are appended one after another and the question is generated.
“Who is the father of nation?”
The Chunk Table:
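The assembly step can be sketched as a hypothetical helper (not the system's
code): when a path of relation tags is found in the question base, the
matching chunks are appended in order, with the wh-word substituted for the
answer chunk:

```python
def generate(wh_word, path, chunk_table):
    # Append the chunk phrase for each relation tag along the found path,
    # led by the wh-word that replaces the answer chunk.
    words = [wh_word] + [chunk_table[rel] for rel in path]
    return " ".join(words) + "?"

table = {"VP-1": "is", "NP-PRD-1": "the father", "PP": "of", "NP": "nation"}
generate("Who", ["VP-1", "NP-PRD-1", "PP", "NP"], table)
# → 'Who is the father of nation?'
```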
CONTD.
THE FEEDBACK SYSTEM
• Takes the user feedback on the generated questions.
• Updates the count values.
• Updates the question base accordingly.
• Reduces the generation of False Positives.
• Enhances the probability of generation of quality questions.
EVALUATION
We tested the system on a given test dataset and acquired the following results:

            Manual Generation (%)   System Generation (%)
Precision   100                     58.82
Recall      100                     91

Precision: % of selected items that are correct
Recall: % of correct items that are selected
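The two measures as defined here are straightforward to compute; the helper
names below are ours:

```python
def precision(selected_correct, selected_total):
    # % of selected (generated) items that are correct
    return 100 * selected_correct / selected_total

def recall(selected_correct, correct_total):
    # % of correct items that are selected
    return 100 * selected_correct / correct_total

precision(1, 2)  # → 50.0
```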
SCOPE IN FUTURE
Question Generation is an important function of advanced learning technologies
such as:
• Intelligent tutoring systems
• Inquiry-based environments
• Game-based learning environments
• Psycholinguistics
• Discourse and Dialogue
• Natural Language Generation
• Natural Language Understanding
• Academic purposes to create Practice and Assessment materials