Active Learning on Question Answering with Dialogues

Shen Gao

Content
● Question Answering
● Data Collection
● Active Learning
● User Interaction
● System Architecture
● Results & Future Work
● “Question Answering”

Question Answering
Question Answering is a computer science discipline that focuses on building
automated systems able to answer questions posed by humans in
natural language.

Question Answering
Diagram: Passage + Question → Model → Answer

Question Answering Data Sets
Data Set | Text Source | QA Source
Quasar-T | Search Engine (Google / Bing) | Trivia
SearchQA | Search Engine (Google / Bing) | Jeopardy!
SQuAD | Wikipedia Articles | Annotation

Why Dialogue?
● Natural
● Machine User Interaction
● Availability
○ Transcripts
○ Texting
● Little previous work
Source: Statistics Brain

Question Answering in Dialogue
● TV Series Friends
● 10 Seasons
● 236 Episodes
● 3000 + Scenes
● Datasets from Character Mining
○ JSON formatted data
○ Tokenized
○ Season - Episode - Scene - Utterance
○ Plots available for 44% scenes

Classification of Question Types
● Based on type of answer:
○ Categorical (Multichoice) - Binary (Polar)
○ Continuous (Span of text)
● Based on Inference
○ Explicit
○ Implicit
● Based on answerability (newly introduced in SQuAD):
○ Unanswerable
○ Answerable

Explicit Questions
Q1: Does the job interview include cooking a salad?

Implicit Questions
Q2: Is the interviewer picky?

Explicit vs Implicit
● The contextual similarity between question and answer
● The amount of inference needed to resolve
● Q1: Explicit; Abundant Similarity; Little Inference
● Q2: Implicit; Little / No Similarity; Substantial Inference

Annotation
● Annotation Phases:
○ Experimental Phase - Small Data Chunk
○ Production Phase - All Data
● Tasks per phase:
○ Question & Answer Generation
○ Verification - Inter-Annotator Agreement
Diagram: Experimental → Revision on Template → Stable, High IAT → Production

Challenges in Annotation
● Ambiguous Pronouns:
○ Example: In a scene with Chandler and Joey: "Is he excited about the date?"
● Exact wording from the original text
● Low measured agreement
● Attempted Resolution:
○ Update instructions
○ Integrate Plots in Scene
○ Reduce the number of Questions

Evaluation Metric
● Binary Questions - Exact Match (EM)
● Continuous Questions - F1 Score
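
A minimal sketch of the two metrics, assuming lowercase whitespace tokenization and a token-overlap F1 in the style commonly used for span answers. The function names `exact_match` and `token_f1` are illustrative and are not taken from the project.

```python
from collections import Counter


def exact_match(prediction: str, truth: str) -> float:
    """Return 1.0 if the normalized answers are identical, else 0.0."""
    return float(prediction.strip().lower() == truth.strip().lower())


def token_f1(prediction: str, truth: str) -> float:
    """Token-overlap F1 between a predicted span and a reference span."""
    pred_tokens = prediction.lower().split()
    true_tokens = truth.lower().split()
    if not pred_tokens or not true_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == true_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both spans.
    overlap = sum((Counter(pred_tokens) & Counter(true_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)
```

Binary answers ("yes" / "no") would go through `exact_match`; span answers would go through `token_f1`, which gives partial credit when the predicted span only partly overlaps the reference.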

Results from Annotation
● Second Round:
○ Added Plot
○ Updated Instructions
● Third Round: Dropped # of Questions
● Random guess would give 50%!
● Cannot obtain high quality data

Change in Path
Diagram: Dialogue QA (Continuous, Binary): Annotation → Model Dev → Analysis
Diagram: Active Learning: System Dev → Online Production → Analysis

Active Learning
● Active Learning is a sub-branch of Machine Learning in which the learning
system interactively queries the user to obtain the data it needs.
● The goal of our system is to:
○ Collect the data the model needs to improve
○ Improve the model by applying this data
● What we offer:
○ Answer queries from the user
○ Learn from the user
● What the user provides:
○ Annotation on the data
Baseline Model
● BERT (Bidirectional Encoder Representations from Transformers) from Google AI
● Contextual vs Context Free
○ Bank account
○ River bank
Diagram: Pre-trained Network → Contextual Representation → Downstream Model → Output

Baseline Model
● Unprecedented results on SQuAD
● Power of Bidirectional Flow
○ Versus Left->Right; Right->Left
○ Allows learning a word from all of its context
● Masked training

User Interaction - Post Question

User Interaction - Receive Answer

User Interaction - Correct Answer

User Guidance
● Which Scene the user needs to work on
○ Ensure all scenes are evenly annotated
● Which Type of question the user needs to work on
○ Type we have least data on
○ Type the model performs worst on
● User Experience: Too Monotonous?

User Guidance
● Scene Selection
○ Randomly select from the least annotated scenes
● Type Selection
○ Use a probability function to control the random selection
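
A minimal sketch of the scene selection rule, assuming the per-scene annotation counts are available as a dictionary. The name `pick_scene` and the scene ids are hypothetical.

```python
import random


def pick_scene(annotation_counts, rng=random):
    """Pick a scene uniformly at random among the least annotated ones."""
    least = min(annotation_counts.values())
    # Sort so the candidate order does not depend on dict insertion order.
    candidates = sorted(
        scene for scene, count in annotation_counts.items() if count == least
    )
    return rng.choice(candidates)


# Illustrative counts; scene ids here are made up for the example.
counts = {"s1-e1-c1": 3, "s1-e1-c2": 0, "s1-e1-c3": 0}
scene = pick_scene(counts)
```

Choosing randomly among the tied scenes, rather than always taking the first, keeps two simultaneous users from being steered to the same scene every time.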

User Guidance
● Constant c is used to linearly scale the probabilities
● Describes the degree of discrepancy between question types
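
The probability function itself is not reproduced in this text, so the following is only one possible form, written under stated assumptions: each question type gets a weight of `1 + c * (1 - dev_score)`, and the weights are normalized into probabilities. The formula, the function names, and the scores are all hypothetical and are not the author's definition.

```python
import random


def type_probabilities(dev_scores, c):
    """Map per-type dev scores in [0, 1] to selection probabilities.

    Assumed form: weight = 1 + c * (1 - score). With c = 0 every type is
    equally likely; a larger c shifts probability toward weaker types.
    """
    weights = {t: 1.0 + c * (1.0 - score) for t, score in dev_scores.items()}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}


def pick_type(dev_scores, c, rng=random):
    """Sample one question type according to type_probabilities."""
    probs = type_probabilities(dev_scores, c)
    types = list(probs)
    return rng.choices(types, weights=[probs[t] for t in types], k=1)[0]
```

Keeping every weight strictly positive means no question type is ever excluded, which is one way to address the concern that guidance could become too monotonous.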

User Guidance
● Train - Train the model
● Dev - Obtain statistics for guidance
● Test - Evaluate performance
● Test statistics are never shown to the system

Tech Stack Overview
● Front-End: HTML, JavaScript, jQuery, CSS
● Back-End: Django backend framework (Routing, Request Parsing, ORM), Python
● Database: MySQL Database
● Machine Learning Service: TensorFlow
● Deployment: AWS EC2 instance

Model View Controller (MVC)
● View: User Interface
● Controller: Logic
● Model: Data Storage

Controller
Diagram: get-scene → scene, type; post-question → answer; post-correction
● REST API
● Unauthenticated
● GET get-scene
● POST post-question
● POST post-answer

Controller - Security
● Server needs to know which question the user is changing
● Dummy id could create loophole
● Allow malicious user to change the response from others
● Session is anonymous, unauthenticated
Diagram: post-correction requests carrying question-id: 1 and question-id: 3/26-s1-e1-c1-1

Controller - Security
● Solution - Hashing + Salt
● Password should not be stored in plain text
● Salt mitigates brute-force attack
● Hash also prevents secret disclosure:
○ Prevents users from knowing how we compute the hash
● The hash itself is returned to user
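
A minimal sketch of the idea, assuming a SHA-256 hash over a random per-response salt plus the internal question id, with a dictionary standing in for the User Response table. The names `issue_token` and `resolve_token` are hypothetical, and the actual hashing scheme used by the system is not stated in the slides.

```python
import hashlib
import secrets


def issue_token(question_id: str, store: dict) -> str:
    """Create an unguessable token for a stored question and record it."""
    salt = secrets.token_bytes(16)  # fresh random salt for every response
    token = hashlib.sha256(salt + question_id.encode("utf-8")).hexdigest()
    store[token] = question_id      # stands in for a database row
    return token                    # only the token is returned to the user


def resolve_token(token: str, store: dict):
    """Return the internal question id for a token, or None if unknown."""
    return store.get(token)


responses = {}
token = issue_token("3/26-s1-e1-c1-1", responses)
```

A client that submits a guessed id such as `1` gets `None` back from `resolve_token`, so it cannot address another user's response without knowing that response's token.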

Django Object-Relational Mapping (ORM)
● Mapping Between Database Language and Programming Language
● SQL <-> Python
● Apply structural changes to Database
● Query Database in Programming Language
● Widely used in industry; reduces errors

Optimization on DB
● Index the fields that are queried
○ hash in User Response
○ count in Scene
● Delay in Database writes:
Diagram: Receive Request → Handle Request → Return Response → Database IO
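
One way to read the diagram is that the database write happens after the response has been returned. The sketch below illustrates that reading with a background writer thread and a queue; the names are hypothetical, a list stands in for the database, and the real system may defer writes differently.

```python
import queue
import threading

write_queue = queue.Queue()


def db_writer(store):
    """Drain queued records into the store; a None record stops the loop."""
    while True:
        record = write_queue.get()
        try:
            if record is None:
                break
            store.append(record)  # stands in for a database insert
        finally:
            write_queue.task_done()


def handle_request(question):
    """Compute the answer, queue the write, and return without waiting."""
    answer = f"answer to: {question}"  # placeholder for the model call
    write_queue.put({"question": question, "answer": answer})
    return answer
```

The request handler only pays for an in-memory `put`, so database latency no longer adds to the response time seen by the user.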

Concurrency on DB
● Two users could work on the same
question type / scene
● Increment the count at the same time
● Pessimistic Row-Level Locking
○ Must acquire lock before write
○ Prevents dirty write

BERT Service
● Performance
○ Reduce Overhead
● Concurrency
○ Modularize into workers
○ Synchronize
● Update

BERT Service - Predict
● Workers
○ Dedicated Model
○ Dedicated Local Space for compute
● Worker Array - Size N
● Mutex Array - Size N
● Semaphore - N available
● Acquire Semaphore first
● Then acquire mutex
● Exception handling ensures no deadlock
Diagram: worker array (W × 5), one semaphore, mutex array (M × 5)
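
The locking order above can be sketched as follows. This is an illustration under assumptions, not the service's actual code: workers are plain callables, and `WorkerPool` and `predict` are hypothetical names.

```python
import threading


class WorkerPool:
    """N workers, N mutexes, and a semaphore counting free workers."""

    def __init__(self, workers):
        self.workers = list(workers)
        self.mutexes = [threading.Lock() for _ in self.workers]
        self.available = threading.Semaphore(len(self.workers))

    def predict(self, passage, question):
        # Step 1: the semaphore blocks until at least one worker is free.
        self.available.acquire()
        try:
            while True:
                # Step 2: claim the first worker whose mutex is free.
                for worker, mutex in zip(self.workers, self.mutexes):
                    if mutex.acquire(blocking=False):
                        try:
                            return worker(passage, question)
                        finally:
                            # Released even if the worker raises.
                            mutex.release()
                # Rescan if another thread claimed a mutex in between.
        finally:
            # Always give the permit back so later callers are not starved.
            self.available.release()
```

Because both releases sit in `finally` blocks, a worker that raises still frees its mutex and its semaphore permit, which is what keeps a failed request from blocking later ones.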

BERT Service - Train
● Query DB for new responses
● Check batch size
● Train with batch
● Populate new worker array
● Change pointers
Diagram: BERT Service pointing at the old worker array (W × 5) and the new worker array (W × 5)
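
A minimal sketch of the training steps, assuming new responses arrive as a list and that training returns a model object that workers can be built from. `BertService`, `maybe_train`, `train_fn`, and `build_worker` are hypothetical names used only for illustration.

```python
import threading


class BertService:
    """Holds the live worker array and swaps it after a training round."""

    def __init__(self, workers, batch_size):
        self.workers = workers
        self.batch_size = batch_size
        self._swap_lock = threading.Lock()

    def maybe_train(self, new_responses, train_fn, build_worker, n_workers):
        """Train and swap workers once enough new responses have arrived."""
        if len(new_responses) < self.batch_size:
            return False  # not enough data yet; keep serving the old workers
        model = train_fn(new_responses)
        # Build the replacement array fully before touching the live one.
        new_workers = [build_worker(model) for _ in range(n_workers)]
        with self._swap_lock:
            self.workers = new_workers  # single pointer change
        return True
```

Requests that already hold a reference to the old array finish on the old workers, while new requests see the updated array, so prediction never has to pause for training.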

Snapshot
● Keep track of model progress
● Cron Jobs
● Use the latest worker to test against
○ dev dataset
○ test dataset
● Record:
○ Respective performance
○ Counts
○ User-Model F1

Production
● Advertised through email to students in the department
● Collected data for 7 days
● Will continue online in future

Result - System Performance
● Measured by the average of 100 requests
● Predict interface measured by 100 randomly selected scenes with test questions
● Performance in deployment environment

Results - Data Collection
● Collected 151 responses
● Concentrated on weak types (72.18% vs 50.64%)
● No evaluation improvement yet
● 1.76% of training data

Result - User-Model F1
● Model cannot learn from its own prediction
● Denotes the reverse of the similarity between the model response and the user input

Future Work
● Funding
● Current Major Limitation: Responses
● More advertising through:
○ Community of NLP
○ Community of Friends
