Active Learning on Question Answering with Dialogues
Shen Gao
Content
● Question Answering
● Data Collection
● Active Learning
● User Interaction
● System Architecture
● Results & Future Work
● “Question Answering”
Question Answering
Question Answering is a Computer Science discipline focused on building
automated systems that can answer questions posed by humans in
natural language.
Question Answering
[Diagram: a Model takes a Passage and a Question as input and produces an Answer]
Question Answering Data Sets
Data Set   | Text Source                   | QA Source
Quasar-T   | Search Engine (Google / Bing) | Trivia
SearchQA   | Search Engine (Google / Bing) | Jeopardy!
SQuAD      | Wikipedia Articles            | Annotation
Why Dialogue?
● Natural
● Machine User Interaction
● Availability
○ Transcripts
○ Texting
● Little previous work
Source: Statistics Brain
Question Answering in Dialogue
● TV Series Friends
● 10 Seasons
● 236 Episodes
● 3000 + Scenes
● Datasets from Character Mining
○ JSON formatted data
○ Tokenized
○ Season - Episode - Scene - Utterance
○ Plots available for 44% scenes
Classification on Question Types
● Based on type of answer:
○ Categorical (Multichoice) - Binary (Polar)
○ Continuous (Span of text)
● Based on Inference
○ Explicit
○ Implicit
● Based on answerability (newly introduced in SQuAD 2.0):
○ Unanswerable
○ Answerable
Explicit Questions
Q1: Does the job interview include cooking a salad?
Implicit Questions
Q2: Is the interviewer picky?
Explicit vs Implicit
● The contextual similarity between question and answer
● The amount of inference needed to resolve
● Q1: Explicit; Abundant Similarity; Little Inference
● Q2: Implicit; Little / No Similarity; Substantial Inference
Annotation Tool
Annotation
● Annotation Phases:
○ Experimental Phase - Small Data Chunk
○ Production Phase - All Data
● Tasks per phase:
○ Question & Answer Generation
○ Verification - Inter-Annotator Agreement
[Diagram: Experimental phase, with revision on the template until it is stable and IAT is high, then Production phase]
Challenges on Annotation
● Ambiguous Pronouns:
○ Example: In a scene with both Chandler and Joey: "Is he excited about the date?"
● Exact wording from the original text
● Low inter-annotator agreement measured
● Attempted Resolution:
○ Update instructions
○ Integrate Plots in Scene
○ Reduce the number of Questions
Evaluation Metric
● Binary Questions - Exact Match (EM)
● Continuous Questions - F1 Score
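The two metrics above can be sketched as follows. The slides do not say how answers are normalized or how F1 is computed, so this is a minimal sketch that assumes case-insensitive comparison and a token-overlap F1 in the style of SQuAD evaluation; the function names `exact_match` and `token_f1` are hypothetical.

```python
from collections import Counter


def exact_match(prediction: str, gold: str) -> int:
    """Return 1 if the answers match after trimming and lowercasing, else 0."""
    return int(prediction.strip().lower() == gold.strip().lower())


def token_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 between a predicted span and a gold span (assumed definition)."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    if not pred_tokens or not gold_tokens:
        # Two empty answers agree fully; one empty answer scores zero.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most as often as it occurs in both.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under this definition, a binary question is scored with `exact_match("Yes", "yes")`, while a span answer that recovers part of the gold span receives partial credit from `token_f1`.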
Results from Annotation
● Second Round:
○ Added Plot
○ Updated Instructions
● Third Round: Reduced the number of questions
● A random guess would give 50%!
● Could not obtain high-quality data
Change in Path
[Diagram: Dialogue QA branches into Continuous and Binary; each branch shows Annotation, Model Dev, Analysis; a further path shows Active Learning System Dev, Online Production, Analysis]
Active Learning
● Active Learning is a sub-branch of Machine Learning in which the learning
system interactively queries the user to obtain the data it needs.
● The goal of our system is to:
○ Collect the data the model needs in order to improve
○ Improve the model by training on this data
● What we offer:
○ Answers to the user's queries
○ Learning from the user
● What the user provides:
○ Annotation on the data
Baseline Model
● BERT (Bidirectional Encoder Representations from Transformers) from Google AI
● Contextual vs Context Free
○ Bank account
○ River Bank
[Diagram: Pre-train Network -> Contextual Representation -> Downstream Model -> Output]
Baseline Model
● Unprecedented results on SQuAD
● Power of Bidirectional Flow
○ Versus Left->Right; Right->Left
○ Allows learning a word from all of its context
● Masked training
User Interaction - Tutorial
User Interaction - Post Question
User Interaction - Receive Answer
User Interaction - Correct Answer
User Guidance
● Which Scene the user needs to work on
○ Ensure all scenes are evenly annotated
● Which Type of question the user needs to work on
○ Type we have the least data on
○ Type the model performs worst on
● User Experience: Too Monotonous?
User Guidance
● Scene Selection
○ Randomly select from the least-annotated scenes
● Type Selection
○ Use a probability function to control the random selection
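Scene selection as described above can be sketched as below. This is an illustrative assumption rather than the system's actual code: the function name `select_scene` and the scene-id keys are hypothetical, and the annotation counts are assumed to be available as a simple mapping.

```python
import random


def select_scene(annotation_counts, rng=random):
    """Pick a scene uniformly at random among those with the fewest annotations.

    annotation_counts maps a scene id to the number of annotations collected so far.
    """
    fewest = min(annotation_counts.values())
    # Sorting makes the candidate order deterministic for a seeded rng.
    candidates = sorted(
        scene for scene, count in annotation_counts.items() if count == fewest
    )
    return rng.choice(candidates)
```

Restricting the draw to the least-annotated scenes keeps coverage even, while the random choice avoids sending every user to the same scene.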
User Guidance
● Constant c is used to linearly scale the probabilities
● Describes the degree of discrepancy between question types
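The probability function itself did not survive in the slide text, so the following is only one possible form, stated as an assumption: each question type is weighted by how weak the model is on it (1 minus a dev-set score), and the constant c linearly blends that weighting with a uniform distribution. The names `type_probabilities` and `select_type`, and the use of dev scores as the only input, are hypothetical.

```python
import random


def type_probabilities(dev_scores, c):
    """Assumed form: p(type) = (1 - c) * uniform + c * normalized weakness.

    dev_scores maps a question type to a score in [0, 1]; c lies in [0, 1].
    c = 0 gives a uniform choice, c = 1 follows the weakness distribution fully.
    """
    n = len(dev_scores)
    weakness = {t: 1.0 - score for t, score in dev_scores.items()}
    total = sum(weakness.values())
    probs = {}
    for t, w in weakness.items():
        skewed = w / total if total > 0 else 1.0 / n
        probs[t] = (1.0 - c) / n + c * skewed
    return probs


def select_type(dev_scores, c, rng=random):
    """Draw one question type according to type_probabilities."""
    probs = type_probabilities(dev_scores, c)
    types = list(probs)
    return rng.choices(types, weights=[probs[t] for t in types], k=1)[0]
```

A small c keeps the experience varied for the user, while a larger c steers more requests toward the types where the model performs worst.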
User Guidance
● Train - Train the model
● Dev - Obtain statistics for guidance
● Test - Evaluate performance
● Test statistics are never shown to the system
Tech Stack Overview
● Front-End: HTML, JavaScript, jQuery, CSS
● Back-End: Django backend framework (Routing, Request Parsing, ORM), Python
● Database: MySQL
● Machine Learning Service: TensorFlow
● Deployment: AWS EC2 instance
Model View Controller (MVC)
● View: User Interface
● Controller: Logic
● Model: Data Storage
Controller
[Diagram: get-scene returns scene and type; post-question returns answer; post-correction]
● REST API
● Unauthenticated
● GET get-scene
● POST post-question
● POST post-answer
Controller - Security
● Server needs to know which question the user is changing
● A dummy id could create a loophole
● It would allow a malicious user to change the responses of others
● Session is anonymous, unauthenticated
[Diagram: a post-correction request carrying question-id: 1 versus one carrying question-id: 3/26-s1-e1-c1-1]
Controller - Security
● Solution - Hashing + Salt
● Passwords should not be stored in plain text
● Salt mitigates brute-force attacks
● Hash also prevents secret disclosure:
○ Prevents the user from knowing how we compute the hash
● The hash itself is returned to the user
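A minimal sketch of the idea above, under stated assumptions: the slides say only "Hashing + Salt", so the choice of HMAC-SHA256 keyed with a server-side secret salt is a substitution made here for illustration, and `SERVER_SALT`, `make_token`, and `verify_token` are hypothetical names. The question id format is taken from the example on the previous slide.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret; in a real deployment it would be loaded
# from configuration so that tokens stay valid across restarts.
SERVER_SALT = secrets.token_bytes(32)


def make_token(question_id: str, salt: bytes = SERVER_SALT) -> str:
    """Derive an opaque token from the internal question id and the secret salt."""
    return hmac.new(salt, question_id.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_token(question_id: str, token: str, salt: bytes = SERVER_SALT) -> bool:
    """Check a token in constant time against the one derived for question_id."""
    return hmac.compare_digest(make_token(question_id, salt), token)
```

The client only ever sees the token, so guessing a neighbouring id such as 2 or 3 does not yield a valid handle for someone else's response, and without the salt the mapping from id to token cannot be recomputed.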
Django Object-Relational Mapping (ORM)
● Mapping Between Database Language and Programming Language
● SQL <-> Python
● Apply structural changes to Database
● Query Database in Programming Language
● Widely used in industry; reduces errors
Database Schema
Optimization on DB
● Index the fields that are queried
○ hash in User Response
○ count in Scene
● Delay in Database writes:
[Diagram: Receive Request -> Handle Request -> Return Response -> Database IO]
Concurrency on DB
● Two users could work on the same question type / scene
● Both could increment the count at the same time
● Pessimistic Row-Level Locking
○ Must acquire lock before write
○ Prevents dirty write
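The effect of acquiring a lock before each write can be illustrated with an in-memory stand-in. This sketch does not touch a database: `SceneRow` and the thread-based "users" are hypothetical, and a `threading.Lock` plays the role of the row lock. In Django, one way to request a row-level lock is `select_for_update()` inside `transaction.atomic()`; the slides do not say which mechanism the system used.

```python
import threading


class SceneRow:
    """In-memory stand-in for one database row holding an annotation count."""

    def __init__(self):
        self.count = 0
        self.lock = threading.Lock()  # plays the role of the row-level lock


def increment_count(row: SceneRow, times: int) -> None:
    """Read-modify-write the count, holding the row lock for each update."""
    for _ in range(times):
        with row.lock:  # must acquire the lock before writing
            current = row.count
            row.count = current + 1


def run_concurrent_increments(num_users: int = 4, times: int = 1000) -> int:
    """Simulate several users incrementing the same row at the same time."""
    row = SceneRow()
    threads = [
        threading.Thread(target=increment_count, args=(row, times))
        for _ in range(num_users)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return row.count
```

Because every read-modify-write happens under the lock, no increment is lost, which is the property the count fields rely on when two users annotate the same scene.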
BERT Service
● Performance
○ Reduce Overhead
● Concurrency
○ Modularize into workers
○ Synchronize
● Update
BERT Service - Predict
● Workers
○ Dedicated Model
○ Dedicated Local Space for compute
● Worker Array - Size N
● Mutex Array - Size N
● Semaphore - N available
● Acquire Semaphore first
● Then acquire mutex
● Exception handling ensures no deadlock
[Diagram: five workers (W) guarded by one semaphore and five mutexes (M)]
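The predict path described above (semaphore first, then a per-worker mutex, with cleanup on errors) can be sketched as below. This is an assumption-laden illustration, not the service's code: `WorkerPool` is a hypothetical name, and each worker is modelled as a plain callable standing in for a dedicated model.

```python
import threading


class WorkerPool:
    """N workers, N mutexes, and a semaphore counting the free workers."""

    def __init__(self, workers):
        self.workers = list(workers)
        self.mutexes = [threading.Lock() for _ in self.workers]
        self.available = threading.Semaphore(len(self.workers))

    def predict(self, scene, question):
        # Acquire the semaphore first: blocks while all N workers are busy.
        self.available.acquire()
        try:
            while True:
                # Then acquire a mutex: claim the first worker that is free.
                for worker, mutex in zip(self.workers, self.mutexes):
                    if mutex.acquire(blocking=False):
                        try:
                            return worker(scene, question)
                        finally:
                            # Released even if the worker raises.
                            mutex.release()
                # A permit implies a free worker should exist; rescan if a
                # scan happened to race with other threads releasing.
        finally:
            # The permit is always returned, so a failure cannot leak it.
            self.available.release()
```

The `try`/`finally` pairs are what the slide's exception-handling bullet refers to in this sketch: a failing prediction still releases both the mutex and the semaphore permit, so later requests are not blocked forever.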
BERT Service - Train
● Query DB for new responses
● Check batch size
● Train with batch
● Populate new worker array
● Change pointers
[Diagram: BERT Service holding two worker arrays of five workers (W) each]
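The training steps above (check batch size, train, populate a new worker array, change pointers) might look like the following sketch. Everything here is hypothetical: `ModelService`, `maybe_train`, and the `train_fn` callback are illustrative names, the database query is replaced by a list argument, and models are assumed to be copyable with `copy.deepcopy`.

```python
import copy
import threading


class ModelService:
    """Keeps an array of model workers and replaces it after each training run."""

    def __init__(self, model, num_workers, min_batch_size):
        self.min_batch_size = min_batch_size
        self._lock = threading.Lock()
        self._workers = [copy.deepcopy(model) for _ in range(num_workers)]

    def workers(self):
        """Return the worker array that is currently serving predictions."""
        with self._lock:
            return self._workers

    def maybe_train(self, new_responses, train_fn):
        """Train on new responses if there are enough; return True if swapped."""
        # Check batch size: skip training until enough responses have arrived.
        if len(new_responses) < self.min_batch_size:
            return False
        old_workers = self.workers()
        # Train on a copy so the serving workers are untouched during training.
        trained = train_fn(copy.deepcopy(old_workers[0]), new_responses)
        # Populate a new worker array from the trained model.
        new_workers = [copy.deepcopy(trained) for _ in old_workers]
        # Change pointers: later calls to workers() see the new array.
        with self._lock:
            self._workers = new_workers
        return True
```

Building the new array off to the side and swapping a single reference means in-flight predictions keep using the old workers and are never exposed to a half-trained model.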
Snapshot
● Keep track of model progress
● Cron Jobs
● Use the latest worker to test against
○ dev dataset
○ test dataset
● Record:
○ Respective performance
○ Counts
○ User-Model F1
Production
● Advertised through email to students in the department
● Collected data for 7 days
● Will remain online in the future
Result - System Performance
● Measured by the average of 100 requests
● Predict interface measured on 100 randomly selected scenes with test questions
● Performance in the deployment environment
Results - Data Collection
● Collected 151 responses
● Concentrated on weak types (72.18% vs 50.64%)
● No evaluation improvement yet
● 1.76% of training data
Result - User-Model F1
● Model cannot learn from its own predictions
● Denotes the reverse of the similarity between the model's response and the user's input
Future Work
● Funding
● Current Major Limitation: Responses
● More advertising through:
○ Community of NLP
○ Community of Friends
“Question Answering”
