End-to-end goal-oriented question
answering systems
LinkedIn
​Deepak Agarwal ​Bee-Chung Chen ​Qi He ​Jaewon Yang ​Liang Zhang
Tutorial’s
agenda
8:00 Introduction
8:15 End-to-End Workflow
8:25 Basic version: Single-Turn Question Answering
9:15 Advanced version: Multi-Turn Question Answering
9:55 Break
10:15 Overview of Our Approach at LinkedIn
10:30 Analytics Bot
11:00 Company Assistant
11:15 LinkedIn Help Center
11:30 Conclusion and Q & A
Introduction
Deepak Agarwal
Traditional Web Search
Modern Web Search with Question Answering
Modern Web Search with Question Answering
Knowledge Card
Question-Answering Systems
User: "Who founded LinkedIn?"
Answer: "Reid Hoffman, Allen Blue, ..."
Question-Answering Systems
User: "Where can I have japanese food in the downtown?"
Answer: "Tomi Sushi is a nice japanese restaurant"
Question-Answering Systems
User: "Please help me make a reservation at Tomi Sushi."
Answer: "Great. When and how many people?"
Question-Answering Systems
User: "I lost my password. Can you help me?"
Answer: "Sure. What is your account id?"
Types of Goal-Oriented QA Systems
Types of Goal-Oriented QA Systems
Types of Goal-Oriented QA Systems
Example: "Who were the founders of LinkedIn?" → DB Query: (_, Founder, LinkedIn) against the Knowledge DB → "Reid Hoffman, Allen Blue, ..."
How Question Answering Can Help LinkedIn
Recruiter Insight: "How many Java Developers joined LinkedIn last month?" → (Knowledge DB) → "200."
Job Seeker Insight: "Good jobs for me based on my profile?" → "This AI software engineer job at LinkedIn is the best for you."
Why This Tutorial? Why Now?
Outline of This Tutorial
Part I
Methodology Review
End-to-End Workflow of
Question Answering Systems
Bee-Chung Chen
Modular Approach: Overview
User: "Where can I have japanese food in the downtown?"
→ Natural Language Understanding (NLU)
→ Dialogue State Tracking (DST)
→ Action Generation (Dialogue Policy)
→ Natural Language Generation (NLG)
(all modules consult the KB, DB, Index)
System: "Tomi Sushi is a nice japanese restaurant"
Modular Approach: Natural Language Understanding
Natural Language
Understanding
(NLU)
Input: User Utterance
Where can I have
japanese food
in the downtown?
(Speech-to-Text)
Output: Interpretation
Intent: Find_Restaurant
Type = Japanese
Area = Downtown
KB, DB, Index
Restaurant types: Japanese, Chinese, …
Location:
Country: USA
State: CA
City: San Jose
Area: Downtown
Modular Approach: Dialogue State Tracking
Dialogue State
Tracking
(DST)
Input:
Current Interpretation
Intent: Find_Restaurant
Type = Japanese
Area = Downtown
Past State
Intent: Find_Flight
FromCity = New York
FromState = NY
FromCountry = US
ToCity = San Jose
ToState = CA
ToCountry = US
Output: State
Intent: Find_Restaurant
Type = Japanese
Area = Downtown
City = San Jose
State = CA
Country = US
KB, DB, Index
Restaurant types: Japanese, Chinese, …
Location:
Country: USA
State: CA
City: San Jose
Area: Downtown
Or, an embedding vector
Modular Approach: Action Generation (Dialogue Policy)
Action Generation
(Dialogue Policy)
Output: Action
Action: Suggest_Restaurant
Type = Japanese
Name = Tomi Sushi
KB, DB, Index
Restaurant Search API
Input: (Type, Location)
Output: A list of restaurants ranked by their ratings
Input: State
Intent: Find_Restaurant
Type = Japanese
Area = Downtown
City = San Jose
State = CA
Country = US
Modular Approach: Natural Language Generation
Natural Language
Generation
(NLG)
Output: System Utterance
Tomi Sushi is a nice
Japanese restaurant
(Text-to-Speech)
Or, other UI elements
KB, DB, Index
Input: Action
Action: Suggest_Restaurant
Type = Japanese
Name = Tomi Sushi
End-to-End Learning Approach
Where can I have
japanese food
in the downtown?
Sequence-to-
Sequence Model
with
Memory KB, DB, Index
Tomi Sushi is a
nice japanese
restaurant
Question Answering System
Modular approach: Practical
End-to-end learning approach: Research
Basic version:
Single-turn question answering
NLU:
Main Focus
No DST
Action Generation:
Rule-Based
NLG:
Template-Based
KB, DB, Index
Question Answering System
Modular approach: Practical
End-to-end learning approach: Research
Advanced version:
Multi-turn question answering
NLU:
Neural Net
DST:
Neural Net
Dialogue Policy:
Neural Net
NLG:
Neural Net
KB, DB, Index
Basic version: Single-Turn
Question Answering
Qi He
Single-Turn Question Answering Systems
NLU:
Main Focus
No DST
Dialogue Policy:
Rule-Based
NLG:
Template-Based
KB, DB, Index
Pipelined or Joint Learning:
Audio → ASR → Text → Domain Detection → Intent Detection → Slot Filling → Logic Form → execute against KB, DB → rule/template-based action generation
Alternative paths: Text → Sequence to Sequence → Logic Form; Text → Direct Search → KB, DB
Definition
Domain/Intent Detection
Domain/intent detection is a semantic text classification problem: classify the question into a semantic frame/template (e.g., Select Flight From Airline_Travel_Table ...), whose arguments are then filled in.
vs. Sentence/Query Classification
Classic Sentence Classification -- Input: written-language sentences. Training data: rich (news articles, reviews, tweets, TREC). State of the art: CNN [Kalchbrenner et al., ACL 2014; Kim, EMNLP 2014].
Query Classification in Search -- Input: keywords. Training data: rich (click-through). State of the art: CLSM, LSTM-DSSM [Shen et al., CIKM 2014; Palangi et al., TASLP 2016].
Domain/Intent Detection (Text Classification) in Q&A -- Input: spoken-language sentences with significant utterance variations. Training data: few (human labels). State of the art: DCN, RNN [Tur et al., ICASSP 2012; Ravuri and Stolcke, Interspeech 2015].
Domain/Intent Detection: Challenges
Temporal scope of entity: 2 questions with different intents, e.g., "How was the Mexican restaurant?" vs. "Tell me about Mexican restaurants". Q1: Can we automatically generate contextual features for entities?
Utterance variations: 2 questions with the same intent, e.g., "Show me weekend flights between JFK and SFO" vs. "I want to fly from San Francisco to New York next Sunday". Q2: Can they generate the same answer? Q3: Which question will generate the better answer?
Lack of training data: significant unknown words and unknown syntactic structures for the same semantics. Q4: Can we efficiently expand the training data?
Domain/Intent Detection: Solutions
Q1 → Deep Neural Networks (contextual features)
Q2 → Paraphrase
Q3 → Active Q&A
Q4 → Character-level Modeling; Paraphrase; Bots Simulator; Domain-independent Grammar
• Q1: Can we automatically generate contextual features for entities? → Deep Neural Networks
CNN
Contextual Features
[Kalchbrenner et al., ACL 2014, Kim, EMNLP 2014, Shen et al., CIKM 2014]
● 1-d convolution → k-d, 1 CNN layer → multiple CNN layers
● multiple filters: capture various lengths of local contexts for each word, n-gram features
● max pooling → k-Max pooling: retain salient features from a few keywords in global feature vector
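To make the multi-filter and max-pooling ideas concrete, here is a minimal sketch (PyTorch) of a CNN intent classifier in the spirit of [Kim, EMNLP 2014]; the vocabulary size, dimensions, and number of intents are illustrative assumptions, and plain max-over-time pooling stands in for k-max pooling.

```python
# Minimal CNN text classifier sketch: one 1-d convolution per n-gram width,
# max-over-time pooling keeps the most salient feature per filter.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNNIntentClassifier(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=100, num_intents=5,
                 filter_widths=(2, 3, 4), num_filters=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        # One convolution per filter width (n-gram size)
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, num_filters, w) for w in filter_widths)
        self.fc = nn.Linear(num_filters * len(filter_widths), num_intents)

    def forward(self, token_ids):                  # (batch, seq_len)
        x = self.emb(token_ids).transpose(1, 2)    # (batch, emb_dim, seq_len)
        # Max-over-time pooling: strongest n-gram feature per filter
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))   # (batch, num_intents)

# Usage: logits = CNNIntentClassifier()(torch.randint(0, 10000, (8, 20)))
```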
CNN in Text Classification
[Kalchbrenner et al., ACL 2014, Kim, EMNLP 2014, Shen et al., CIKM 2014]
● TREC question classification: CNNs are close to "SVM with careful feature engineering"
● Large window width: long-term
dependency
● k-Max pooling maintains relative
positions of most relevant
n-grams
● Web query:
○ DSSM < C-DSSM (CLSM)
○ Short text: CNN is slightly
better than unigram model
(Figure omitted: learned 3-gram features; keywords win at the 5 active neurons in max pooling.)
Contextual Features
RNN/LSTM
An RNN can use nearby context ("The clouds are in the sky"), while an LSTM can carry long-range dependencies ("I grew up in France … I speak fluent French").
(RNN and LSTM cell diagrams omitted; from github: 2015-08-Understanding-LSTMs)
Contextual Features
LSTM in Text Classification
[Palangi et al., TASLP 2016]
● Memory: becomes richer (more info) over the sequence
● Input gates: do not update at words 3, 7, 9
● Peephole and forget updates are not too helpful when the text is short and the memory is initialized as 0 (just do not update)
● Web query:
○ DSSM < C-DSSM (CLSM)
< LSTM-DSSM
Case: match "hotels in shanghai" with "shanghai hotels accommodation (3) hotel in shanghai discount (7) and reservation (9)". (Figure omitted: input gates and cell gates (memory) over the word positions.)
Contextual Features
Impact of Utterance Length (Context)
[Ravuri and Stolcke, Interspeech 2015]
When the sentence is long: basic RNN < LSTM
When the sentence is short: basic RNN > LSTM
• Q2: Can 2 paraphrases generate the same answer?
• Q4: Can we efficiently expand the training data?
→ Paraphrase
Impact of Paraphrase for Question Answering
[Duboue and Chu-Carroll, HLTC 2006]
Problem:
1. QA is sensitive to small variations in the question
2. QA returns different answers for questions that are semantically equivalent (lexical or syntactic paraphrases)
3. Lack of training data to cover all paraphrases
Solution:
1. Replace the user question by the paraphrase canonical form
2. Use MT to generate paraphrase candidates
3. Use multiple MT systems to enhance diversity
4. Feature-based paraphrase selection
Impact:
1. Oracle paraphrase selection: +35% (high reward)
2. Random paraphrase selection: -55% (high risk)
3. Feature-based selection: +0.6%
Definition
Paraphrase generation is a sequence-to-sequence modeling problem.
Paraphrase
Three characteristics
1. Monolingual parallel data is not readily available (vs. bilingual parallel data in MT)
○ Use a pivot language (pairs of MT systems, especially with different methods)
2. Not all of the words or phrases need to be replaced (vs. MT)
3. Evaluation is hard (vs. MT)
○ MT uses BLEU: translations are scored based on their similarity to the human references
○ It is more difficult to provide human references (canonical forms) in paraphrase generation
Paraphrase
External Paraphrase Tables
[Zhao et al., ACL 2009]
Goal: application-driven paraphrase generation (e.g., sentence compression)
Model:
1. Joint likelihood over paraphrase tables
2. Trigram language model
3. Application-dependent utility score (e.g., similarity to the canonical form in paraphrase generation)
Human evaluation:
1. Adequacy: {evidently not, generally, completely} preserved meaning
2. Fluency: {incomprehensible, comprehensible, flawless} paraphrase
3. Usability: {opposite to, does not achieve, achieves} the application
Analysis:
1. Prefers paraphrases which are part of the canonical form
2. Better than pure MT-based methods
3. The utility score is crucial
Paraphrase
RNN Encoder-Decoder
[Cho et al., EMNLP 2014]
Paraphrase
RNN Encoder-Decoder in Paraphrase Generation
[Cho et al., EMNLP 2014]
● Semantically similar (most are
about duration of time)
● Syntactically similar
Paraphrase
• Q3: Which question will generate the better answer? → Active Q&A
Active Question Answering
[Buck et al., ICLR 2018]
Active Q&A
• Q4: Can we efficiently expand the training data? → Character-level Modeling
Impact of Unknown Words
[Ravuri and Stolcke, Interspeech 2015]
30%-60% singletons: RNN/LSTM-triletter < RNN/LSTM-word
With preprocessed generic entities (dates, locations, phone numbers, ...), e.g., "Can you show me the address of Happy Kleaners?" → "Can you show me the address of LOC?": RNN/LSTM-triletter > RNN/LSTM-word
Character-level Modeling
• Q4: Can we efficiently expand the training data? → Other solutions
Bots Simulator: introduced in Q&A Systems @ LinkedIn (Part II)
Domain-independent grammar: introduced in sequence-to-sequence semantic parsing
Definition
Slot Filling
Slot Filling is to extract slot/concept values from the question for a set of predefined slots/concepts.
More specifically, it is often modeled as a sequence labeling task with explicit alignment.
Extract semantic concepts to fill in the arguments/slots of the semantic frame/template:
Select Flight
From Airline_Travel_Table
Where dept_city = "Boston" and arr_city = "New York" and date = today
Slot Filling: Characteristics
Semantic frame: slots/concepts are pre-defined by goal-oriented dialog systems (vs. open-domain dialog systems). Q5: Can we leverage domain knowledge?
Slot dependency: a long entity phrase has strong slot dependency. Q6: Is the model sensitive to slot position? Q7: Shall we globally assign labels together?
Explicit alignment: input and output are of the same length (vs. other sequence tasks such as machine translation and speech recognition, where the output has variable length). Q8: Can a slot filling model leverage the "explicit alignment"?
Example (BIO labels):
Sentence: watch star war episode IV a new hope
Slots:    O     B-mov I-mov I-mov I-mov I-mov I-mov I-mov
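To make the BIO labeling scheme above concrete, here is a small self-contained sketch that decodes BIO tags into slot values; the tag names follow the example above, and everything else is illustrative.

```python
# Decode BIO slot labels into (slot, phrase) pairs.
def decode_bio(tokens, tags):
    slots, current_slot, current_words = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):              # a new slot span begins
            if current_slot:
                slots.append((current_slot, " ".join(current_words)))
            current_slot, current_words = tag[2:], [token]
        elif tag.startswith("I-") and current_slot == tag[2:]:
            current_words.append(token)       # continue the current span
        else:                                 # "O" or an inconsistent I- tag
            if current_slot:
                slots.append((current_slot, " ".join(current_words)))
            current_slot, current_words = None, []
    if current_slot:
        slots.append((current_slot, " ".join(current_words)))
    return slots

tokens = "watch star war episode IV a new hope".split()
tags = ["O", "B-mov"] + ["I-mov"] * 6
print(decode_bio(tokens, tags))  # [('mov', 'star war episode IV a new hope')]
```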
Slot Filling: Solutions
Q5 (domain knowledge) → Knowledge-based Model
Q6 (slot position) → Bidirectional RNN
Q7 (global label assignment) → Slot Language Model, RNN-CRF
Q8, Q9 → Attention-based RNN
• Q5: Can we leverage domain knowledge? → Knowledge-based Model
Incorporate Additional Knowledge into RNN
Knowledge-based Model
Additional knowledge:
● Look-up table (taxonomy)
● Word categories (from Wikipedia)
● Named entity tags
● Syntactic labels
[Yao et al., Interspeech 2013]
• Q6: Is the model sensitive to slot position? → Bidirectional RNN
RNN / Bidirectional RNN for Slot Filling
[Mesnil et al., Interspeech 2013]
Forward RNN is better for:
Sentence: show flights from Boston to New York today
Slots: O O O B-dept O B-arr I-arr B-date
Backward RNN is better for:
Sentence: is today's New York arrival flight schedule available to see
Slots: O B-date B-arr I-arr O O O O O O
• Q7: Shall we globally assign labels together? → Slot Language Model, RNN-CRF
Slot Language Model and RNN-CRF
Slot Filling
[Mesnil et al., TASLP 2015]
NLU: Joint Model
Motivation: jointly learn domain detection, intent detection, and slot filling.
Benefits:
1. Only 1 model needs to be trained, fine-tuned for multiple tasks, and deployed
2. Tasks enhance each other. For example, if the intent of a sentence is to find a flight, the sentence likely contains the departure and arrival cities, and vice versa
3. Outperforms separate models for each task
Possible learning structures: append the intent to the beginning/end of the slots, treat them as 2 different tasks, ...
Q9: What is the most effective learning structure for this multi-task learning?
Q10: Should we jointly optimize the loss function or not?
• Q8: Can a slot filling model leverage the "explicit alignment"?
• Q9: What is the most effective learning structure for this multi-task learning?
→ Attention-based RNN
Encoder-Decoder for Joint Learning
Attention-based RNN
[Liu and Lane, Interspeech 2016]
Variants: attention only, alignment only, attention + alignment; the same encoder feeds 2 decoders (intent and slots).
attention: a normalized weighted sum of encoder states, conditioned on the previous decoder state; carries additional longer-term dependencies (vs. the final hidden state h, which already encodes the whole sentence)
alignment: do not learn the alignment from training data for the slot filling task -- that would waste the explicit alignment
Attention-based RNN for Joint Learning
[Liu and Lane, Interspeech 2016]
The attention-based bidirectional RNN and the attention-based encoder-decoder performed similarly; the bidirectional RNN is faster.
• Q9: What is the most effective learning structure of this multi-task learning?
• Q10: Should we jointly optimize the loss function or not?
→ Joint Model Comparison
Other Learning Structure Variations
Joint Model Comparison
[Xu and Sarikaya, ASRU 2013]: CNN features in a CRF optimization framework
[Zhang and Wang, IJCAI 2016]: similar to the attention-based RNN, but 1) no attention, 2) a CNN contextual layer on top of the input, 3) global label assignment, 4) LSTM replaced by GRU
[Hakkani-Tür et al., Interspeech 2016]: append the intent to the end of the slots; bidirectional LSTM
[Wen et al., CCF 2017]: modeling slot filling at a lower layer and intent detection at a higher layer is slightly better than other variations
[Goo et al., NAACL-HLT 2018]: an attention weight is higher if the slot attention and intent attention attend to the same part of the input sequence (indicating the 2 tasks are more correlated)
[Wang et al., NAACL-HLT 2018]: optimize the loss functions separately; alternately update the hidden layers of the 2 tasks
… and Their Comparisons
Joint Model Comparison
Results
1. Most of the models achieved similar results
2. The attention-based RNN (2nd) beats the majority
3. Optimizing the loss functions separately is slightly better, partially because the weights on a joint loss function need fine-tuning
[Wang et al., NAACL-HLT 2018]
Sequence-to-Sequence Q&A
Text → Sequence to Sequence → Logic Form → execute against KB, DB
Motivations
● Simplify the workflow: directly generate the final action
● Theoretically, can model very complex Q&A
Challenges
● Grammar mismatch between the question and the logic form
● Practically, suffers from the lack of training data for complex Q&A
Question
● Q11: How to collect domain-specific training data?
Grammar-Compliant Methods
Direct Parsing
[Xiao et al., ACL 2016]
• Uniqueness (vs. regular seq2seq): not only predict the text sequence, but also comply with the grammar of the target
• 3 types of sequence prediction: Logic Form, Canonical Phrase, Derivation Sequence (DS)
• Result: predicting the DS while taking grammatical constraints into account increased QA accuracy (57% → 73%)
• Q11: How to collect domain-specific and grammar-compliant training data? → Grammar-Compliant Methods
Lack of Training Data in a Domain
[Kwiatkowski et al., EMNLP 2013]
• Pros: 1) the KB does not need to cover all constants to support the direct x → y mapping; 2) suitable for online learning
• Alternative: shift the burden to learning; the features are domain-dependent. A good model can be used for paraphrase modeling to generate more training data offline
A New Domain: No Training Data
[Wang et al., ACL 2015]
• Focus: quickly collect training data for the semantic parser of a new domain ("overnight")
• Advantages: 1) domain → utterances: enforces completeness of functionality; 2) no supervision on answers
• Method: use a domain-general grammar to generate both logical forms and canonical utterances
Direct-Search Q&A
Text → Direct Search → KB, DB
Motivations
● Can we retrieve the answer from a KB/DB as if querying a search engine? Yes, but:
● Limited to simple Q&A (retrieving a single fact)
Challenge
● It is hard to find the answer if the answer is supported by multiple knowledge facts
Question
● Q12: Can we build a scalable framework for extracting the answer from multiple knowledge facts? → Memory Network
Direct-Search Q&A: A Single Supporting Fact
[Yih et al., ACL 2014]
Split the question into an entity mention and the remaining pattern (question - entity). One CNN scores the similarity between the entity mention and a KB entity; another CNN scores the similarity between the pattern and a KB relation. Combine the similarities into a score, rank the candidate (entity, relation, entity) facts, and return the answer.
• Q12: Can we build a scalable framework for extracting the answer from multiple knowledge facts? → Memory Network
Memory Network Architecture
[Weston, ICML 2016]
1) Input module: map the KB and questions into memory
2) Generalization module: add new KB facts to memory
3) Output module: return supporting facts based on memory lookups
4) Response module: score and return the objects of the supporting facts
Memory Network in Simple Q&A
[Bordes et al., 2015; Weston, ICML 2016]
Memory Network
Summary
Pipelined or Joint Learning:
Audio → ASR → Text → Domain Detection → Intent Detection → Slot Filling → Logic Form → execute against KB, DB → rule/template-based action generation
Alternative paths: Text → Sequence to Sequence → Logic Form; Text → Direct Search → KB, DB
Single-turn Q&A
● Solves the single goal when the goal is clear
● The technology is relatively mature
● Paves the foundation for deep NLU and complex infrastructure
Advanced Version: Multi-Turn
Question Answering
Bee-Chung Chen
Multi-Turn Question Answering
User: "Where can I have japanese food in the downtown?"
NLU → DST → Action Generation (Dialogue Policy) → NLG, consulting the KB, DB, Index
System: "San Jose downtown or London downtown?"
Challenge: How to Evaluate a Multi-Turn System?
Much more difficult than evaluating a single-turn system
● In a single-turn system, it is easier to label the correct answer to a question
○ Once we have labels, we can do offline evaluation
● In a multi-turn system, it is unclear what "the correct response" is
○ There can be many "successful paths" to achieve the goal
○ It is difficult to label sufficiently many successful paths for offline evaluation
Evaluation Methods for Multi-Turn Systems
Evaluation by hired human evaluators
● Also used for single-turn systems
● Expensive
● Difficult to cover all possible
scenarios
Experiments with end users
● Also used for single-turn systems
● Sometimes difficult to assess
whether a user is satisfied (and need
to predict user satisfaction)
● Only available after we launch the
product
Treating it as a ranking problem
● Also used for single-turn systems
● For each turn, use the model to rank
a set of predetermined possible
system utterances and compute
precision
● Far from a realistic setting
Evaluation based on a user simulator
● Limited by the capability of the
simulator
● However, can be used to provide
unlimited training and test data
Practical Approach
1. Define the problem space and collect sample conversations
2. Build a user simulator
● Use sample conversations to fit the parameters of the simulator
● Collect more sample conversations based on simulator + crowdsourcing
3. Train a model using supervised learning on sample conversations
4. Improve the model using reinforcement learning (RL) based on the simulator
5. Test the model with friends and/or hired human evaluators and apply RL
6. Test the model with end users and apply RL
Define the Problem Space
User Goals
● Request & Constraints
● Example: Make a reservation of a
Japanese restaurant in San Jose,
… and let me know the address
Intents / Dialogue Acts
● inform
● request
● confirm_question
● confirm_answer
● greeting
● closing
● multiple_choice
● thanks
● welcome
● deny
● not_sure
[Microsoft Dialogue Challenge, 2018]
Slot & Values
● Cuisine
Japanese
Chinese
…
● Rating
5
4
…
● City, State
San Jose, CA
New York, NY
...
request: address, reservation
constraints:
cuisine = “Japanese”,
city = “San Jose”, state = “CA”,
rating = “5”,
date = “today”, time = “7pm”,
number_of_people = “4”
Example: https://github.com/xiul-msr/e2e_dialog_challenge
Collect and Annotate Sample Conversations
[Microsoft Dialogue Challenge, 2018]
Role Utterance Annotation (logical form, semantic frame)
User Hello, i was wondering if you can book a
restaurant for me? Pizza would be good.
greeting(greeting=hello),
request(reservation), inform(food=pizza)
Agent Sure! How many people are in your party? request(number_of_people)
User Please book a table for 4 inform(number_of_people=4)
Agent Great! What city are you dining in? request(city)
User Portland inform(city=Portland)
Agent Ciao Pizza or Neapolitan Pizzeria? multiple_choice(restaurant_name=...)
User What is the price range of Ciao Pizza inform(restaurant_name=Ciao Pizza),
request(pricing)
Practical Approach
1. Define the problem space and collect sample conversations
2. Build a user simulator
● Use sample conversations to fit the parameters of the simulator
● Collect more sample conversations based on simulator + crowdsourcing
3. Train a model using supervised learning on sample conversations
4. Improve the model using reinforcement learning (RL) based on the simulator
5. Test the model with friends and/or hired human evaluators and apply RL
6. Test the model with end users and apply RL
Agenda-Based User Simulation
[Schatzmann & Young, 2009]
User Goal
● Request R_t: e.g., name, address, phone
● Constraint C_t: e.g., { type=bar, drinks=beer, area=central }
● R_t and C_t can change over time t
Agenda A_t
● A stack of user actions to be performed
● Generated by a set of probabilistic rules
● Pop to perform a user action
● Push to add future actions in response to the agent's actions
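A minimal sketch of the agenda-as-stack idea, assuming a toy goal and hand-written push rules (the actual model uses probabilistic rules fit to sample conversations):

```python
# Agenda-based user simulation: pop acts to speak, push acts in response.
import random

class AgendaUser:
    def __init__(self, constraints, requests):
        self.constraints = dict(constraints)       # e.g. {"type": "bar"}
        # Initial agenda: requests at the bottom, informs on top
        self.agenda = [("request", slot) for slot in requests]
        self.agenda += [("inform", (s, v)) for s, v in constraints.items()]

    def receive(self, agent_act, slot=None):
        if agent_act == "request" and slot in self.constraints:
            # Push an answer for the requested constraint onto the stack
            self.agenda.append(("inform", (slot, self.constraints[slot])))
        elif agent_act == "confirm" and random.random() < 0.1:
            self.agenda.append(("deny", slot))     # occasionally deny

    def act(self):
        return self.agenda.pop() if self.agenda else ("closing", None)

user = AgendaUser({"type": "bar", "drinks": "beer"}, ["name", "address"])
user.receive("request", "type")
print(user.act())   # ('inform', ('type', 'bar'))
```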
Build a Simulator: Agenda-Based User Simulation
[Schatzmann & Young, 2009]
Build a Simulator: Summary Space Technique
[Schatzmann & Young, 2009]
Collect More Annotated Conversations by Simulation + Crowdsourcing
[Shah et al., 2018]
Rule-Based Agent + User Simulator → Simulated Conversations (generate both utterances and annotations)
→ Crowdsourcing Task #1 (Contextual Paraphrasing): make the conversation more natural with coreferences and linguistic variations
→ Crowdsourcing Task #2 (Validation): verify that the created paraphrases have the same meaning by consensus of n workers
→ Annotated Conversations
Paraphrasing and validation tasks are much easier than annotation tasks
User Simulation based on Seq-to-Seq Models
[Crook & Marin, 2017; Kreyssig et al., 2018]
After we collect more annotated data, we can improve the user simulator with more advanced models.
Example: a sequence-to-sequence model learned from annotated data. A feed-forward neural net encodes v_t, the feature vector of the user state at each turn t; after turns 1-3, the decoder generates the user utterance of turn 4.
Practical Approach
1. Define the problem space and collect sample conversations
2. Build a user simulator
● Use sample conversations to fit the parameters of the simulator
● Collect more sample conversations based on simulator + crowdsourcing
3. Train a model using supervised learning on sample conversations
4. Improve the model using reinforcement learning (RL) based on the simulator
5. Test the model with friends and/or hired human evaluators and apply RL
6. Test the model with end users and apply RL
Multi-Turn Question Answering
User: "Where can I have japanese food in the downtown?"
NLU → DST → Action Generation (Dialogue Policy) → NLG, consulting the KB, DB, Index
System: "San Jose downtown or London downtown?"
Train a Dialogue State Tracker (DST)
State-of-the-art: A neural network
model combining NLU & DST
Input: Previous state & conversation
State: Example - belief state
Pr(intent)
Pr(inform) = 0.8
Pr(request) = 0.9
….
Pr(slot=value | intent)
Pr(price=cheap | inform) = 0.9
Pr(city=London | inform) = 0.1
….
Pr(food | request) = 0.7
Pr(phone | request) = 0.1
….
Agent: What kind of food at what price?
request(food, price)
User: I want something cheap.
What are the available food types?
Output: New state
Train a Dialogue State Tracker (DST)
With annotated data, this is a
supervised learning problem
Input: Previous state & conversation
Agent: What kind of food at what price?
request(food, price)
User: I want something cheap.
What are the available food types?
Output:
Pr(intent)
Pr(inform) = 0.8
Pr(request) = 0.9
….
Pr(slot=value | intent)
Pr(price=cheap | inform) = 0.9
Pr(city=London | inform) = 0.1
….
Pr(food | request) = 0.7
Pr(phone | request) = 0.1
….
Labels (from annotation):
inform = 1, request = 1
price=cheap | inform = 1, city=London | inform = 0
food | request = 1, phone | request = 0
Example DST: Neural Belief Tracker
[Mrksic et al., 2018]
For each (slot, value) pair, predict Pr(price=cheap | inform) from the system act (e.g., request(food, price)) and the user utterance, using CNN representations.
Global-Locally Self-Attentive Encoder
[Zhong et al., 2017]
(Figure omitted: slot-specific utterance representations built from global and local self-attention.)
Example DST: Global-Locally Self-Attentive Tracker
[Zhong et al., 2017]
For each candidate slot=value, predict Pr(price_range=cheap | inform) using global-locally self-attentive encoders.
Multi-Turn Question Answering
User: "Where can I have japanese food in the downtown?"
NLU → DST → Action Generation (Dialogue Policy) → NLG, consulting the KB, DB, Index
System: "San Jose downtown or London downtown?"
Dialogue Policy
Input: State
Pr(intent)
Pr(inform) = 0.8
Pr(request) = 0.9
….
Pr(slot=value | intent)
Pr(price=cheap | inform) = 0.9
Pr(city=London | inform) = 0.1
….
Pr(food | request) = 0.7
Pr(phone | request) = 0.1
….
Output: Dialogue act
inform(food = chinese)
Methods:
- Rules
- Supervised Learning
model(state) => intent(slot=value, ...)
- Reinforcement Learning
Supervised Learning
Policy: π(s_t) → a_t
Training data (state → correct action template):
s_0 → greeting(hello)
s_1 → request(food)
s_2 → inform(food=...)
...
Example state s_t (the feature vector): Pr(inform) = 0.8, Pr(request) = 0.9, ..., Pr(price=... | inform) = 0.9, Pr(city=... | inform) = 0.1, ...; or an embedding vector
Action templates (summary actions): greeting(hello), request(food), inform(food=...), ...
How to Deal with a Large Action Space
Instead of learning π(s_t) → { inform(city=San Jose), inform(city=London), inform(food=Chinese), inform(food=Korean), ... },
learn π(s_t) → an action template { inform(city=GET_CITY), inform(food=GET_FOOD) } and fill in the value (San Jose, London, Chinese, Korean, ...) by an API call or rules.
Action mask
- Remove invalid candidate actions based on rules (domain knowledge, common sense)
- Example: Don't recommend a restaurant if the location is unknown.
  Don't make a reservation if the user has not yet selected a restaurant.
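A minimal sketch of an action mask over action templates; the state keys and template names are illustrative assumptions:

```python
# Rule out action templates that are invalid in the current state
# before the policy scores them.
def action_mask(state, templates):
    valid = []
    for t in templates:
        if t == "inform(restaurant=GET_RESTAURANT)" and not state.get("city"):
            continue   # don't recommend a restaurant if location is unknown
        if t == "book_reservation()" and not state.get("restaurant"):
            continue   # don't make a reservation before one is selected
        valid.append(t)
    return valid

templates = ["request(city)", "inform(restaurant=GET_RESTAURANT)",
             "book_reservation()"]
print(action_mask({"city": None}, templates))   # ['request(city)']
```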
Example: Hybrid Code Networks
[Williams et al., 2017]
The state tracker is an RNN; a hand-crafted input feature vector feeds the RNN at each turn.
Multi-Turn Question Answering
User: "Where can I have japanese food in the downtown?"
NLU → DST → Action Generation (Dialogue Policy) → NLG, consulting the KB, DB, Index
System: "San Jose downtown or London downtown?"
Modular Approach: NLG
Natural Language
Generation
(NLG)
Output: System Utterance
Tomi Sushi is a nice
Japanese restaurant
(Text-to-Speech)
Input: Action
Action: Suggest_Restaurant
Type = Japanese
Name = Tomi Sushi
Basic version: Templates
[Name] is a nice [Type] restaurant
How about a [Type] restaurant like [Name]
I would recommend a [Type] restaurant like [Name]
...
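A minimal sketch of template-based NLG using the templates above; the action representation is an illustrative assumption:

```python
# Pick a template and fill its [Type] and [Name] slots from the action.
import random

TEMPLATES = [
    "{Name} is a nice {Type} restaurant",
    "How about a {Type} restaurant like {Name}",
    "I would recommend a {Type} restaurant like {Name}",
]

def generate(action):
    assert action["act"] == "Suggest_Restaurant"
    return random.choice(TEMPLATES).format(**action["slots"])

print(generate({"act": "Suggest_Restaurant",
                "slots": {"Type": "Japanese", "Name": "Tomi Sushi"}}))
```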
Modular Approach: NLG
Natural Language
Generation
(NLG)
Output: System Utterance
Tomi Sushi is a nice
Japanese restaurant
(Text-to-Speech)
Input: Action
Action: Suggest_Restaurant
Type = Japanese
Name = Tomi Sushi
Advanced version: RNN Decoder (e.g., LSTM)
● Add the action as additional input to each RNN cell
● Use multiple layers of RNN to improve performance
● Use a backward RNN to further improve performance
NLG: Semantically Conditioned LSTM
[Wen et al., 2015]
(Figure omitted: each cell reads the word input x_t and the previous hidden state h_{t-1}, plus a dialogue-act vector initialized as d_0 that is gated and gradually consumed as the utterance is generated.)
Practical Approach
1. Define the problem space and collect sample conversations
2. Build a user simulator
● Use sample conversations to fit the parameters of the simulator
● Collect more sample conversations based on simulator + crowdsourcing
3. Train a model using supervised learning on sample conversations
4. Improve the model using reinforcement learning (RL) based on the simulator
5. Test the model with friends and/or hired human evaluators and apply RL
6. Test the model with end users and apply RL
Problem Setting
At each turn t, the agent in state s_t receives user input x_t, takes action a_t, and updates its state; the dialogue ends after some turn (here t=2).
DST: (s_{t-1}, a_{t-1}, x_t) → s_t
Policy: π(s_t) → a_t
Example:
a_{t-1} = request(food, price)
s_{t-1}: Pr(inform) = 0.8, Pr(request) = 0.9, ..., Pr(price=... | inform) = 0.9, Pr(city=... | inform) = 0.1, ...
x_t = "I want something cheap. What are the available food types?"
a_t = inform(food, GET...)
Reinforcement Learning
Same setting, but at each time t the agent also receives a reward r_t for action a_t.
Supervised learning: annotate the correct actions for the policy π(s_t) → a_t.
Reinforcement learning: define the reward instead.
Example reward: each step: -1; success: 20; failure: 0.
(A successful 3-turn dialogue earns rewards -1, -1, 20.)
Reinforcement Learning Methods: Outline
Basic methods
- Q-learning: Deep Q-Network
- Policy gradient: REINFORCE (Monte-Carlo gradient ascent)
Advanced methods
- Actor-Critic policy gradient method with experience replay
- Deep Dyna-Q & BBQ-Networks
- Multi-level reinforcement learning
Action-Value Function Q(s, a)
Policy: π(s_t) → a_t
Q_π(s, a) = E[ total reward | we start from state s, take action a, and then follow π ]
Q*(s, a) = Q_π(s, a) when π is the optimal policy
Optimal policy: π*(s_t) = argmax_a Q*(s_t, a)
Bellman optimality: Q*(s_t, a_t) = E[ r_t + max_a Q*(s_{t+1}, a) | s_t, a_t ]
Q-Learning: Deep Q-Network
Goal: learn Q*(s_t, a_t) = E[ r_t + max_a Q*(s_{t+1}, a) | s_t, a_t ]
Optimal policy: π*(s_t) = argmax_a Q*(s_t, a)
Neural net: Q(s, a | w) → value, where w = model parameters (weights)
Q-learning: find w that minimizes E[ ( Q(s, a | w) - Q*(s, a) )^2 ]
SGD: use a single sample (s_t, a_t, r_t, s_{t+1}) to estimate Q*(s_t, a_t), then compute the gradient on that sample and take a gradient descent step
Deep Q-Network: Details
[Mnih et al., 2015]
Q*(s_t, a_t) = E[ r_t + max_a Q*(s_{t+1}, a) | s_t, a_t ]
Deep neural net: Q(s, a | w), w = argmin_w E[ ( Q(s, a | w) - Q*(s, a) )^2 ]
While (in current state s_t):
  Take action a_t = argmax_a Q(s_t, a | w_t) with probability (1 - ε); a random action otherwise (ε-greedy)
  Receive (r_t, s_{t+1}) and save (s_t, a_t, r_t, s_{t+1}) in replay memory D
  Sample a mini-batch B from buffer D
  Update w_t with a gradient step on the squared TD error over B
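A minimal sketch (PyTorch) of this loop's two key pieces, with illustrative sizes; the replay memory is assumed to store transitions as tensors, and there is no target network (unlike the full method of [Mnih et al., 2015]):

```python
# DQN: epsilon-greedy action selection + one SGD step on the TD error.
import random
from collections import deque
import torch
import torch.nn as nn

STATE_DIM, NUM_ACTIONS, GAMMA, EPSILON = 20, 8, 0.9, 0.1
q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                      nn.Linear(64, NUM_ACTIONS))        # Q(s, a | w)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10000)   # replay memory D of (s, a, r, s_next, done)

def select_action(state):
    """Epsilon-greedy: argmax_a Q(s_t, a | w) w.p. 1 - epsilon, else random."""
    if random.random() < EPSILON:
        return random.randrange(NUM_ACTIONS)
    with torch.no_grad():
        return q_net(state).argmax().item()

def train_step(batch_size=32):
    """One SGD step on the squared TD error over a mini-batch B from D."""
    batch = random.sample(list(replay), min(batch_size, len(replay)))
    s, a, r, s_next, done = map(torch.stack, zip(*batch))
    with torch.no_grad():  # TD target: r_t + gamma * max_a Q(s_{t+1}, a)
        target = r + GAMMA * q_net(s_next).max(dim=1).values * (1 - done)
    pred = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    loss = ((pred - target) ** 2).mean()
    optimizer.zero_grad(); loss.backward(); optimizer.step()
```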
Reinforcement Learning Methods: Outline
Basic methods
- Q-learning: Deep Q-Network
- Policy gradient: REINFORCE (Monte-Carlo gradient ascent)
Advanced methods
- Actor-Critic policy gradient method with experience replay
- Deep Dyna-Q & BBQ-Networks
- Multi-level reinforcement learning
Policy Gradient
[Sutton et al., 2000]
Policy: π(s, a | θ) = Pr(take action a | state s, model parameter θ)
Objective: maximize the expected total reward J(θ) (no reward discounting over time)
Gradient derivation (policy gradient theorem, via a recursion over turns):
∇_θ J(θ) = E_π [ Σ_t ∇_θ log π(s_t, a_t | θ) Q_π(s_t, a_t) ]
Gradient: Supervised vs. Reinforcement
Supervised (imitation) learning: ∇_θ Σ_t log π(s_t, a_t* | θ), summed over annotated correct actions a_t*
Reinforcement learning: ∇_θ Σ_t log π(s_t, a_t | θ) Q_π(s_t, a_t), summed over the agent's own actions and weighted by the value of action a_t
Monte-Carlo Policy Gradient Ascent (REINFORCE)
[Williams, 1988]
While (run policy π(⋅ | θ) to generate s_0, a_0, …, s_T, a_T):
  Compute v_t = total reward starting from step t (based on this sample run)
  Update θ with a gradient ascent step on Σ_t ∇_θ log π(s_t, a_t | θ) v_t
SGD view: do a sample run of the policy s_0, a_0, …, s_T, a_T; use this run to compute the sample Q value; compute the gradient using this sample and do gradient ascent.
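A minimal sketch (PyTorch) of one REINFORCE episode under the undiscounted return above; the environment interface (reset/step) is an assumption:

```python
# REINFORCE: run one episode, compute return-to-go v_t, ascend the gradient
# of sum_t log pi(s_t, a_t | theta) * v_t.
import torch
import torch.nn as nn

STATE_DIM, NUM_ACTIONS = 20, 8
policy = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.Tanh(),
                       nn.Linear(64, NUM_ACTIONS))  # logits of pi(a | s)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def reinforce_episode(env):
    log_probs, rewards = [], []
    state, done = env.reset(), False
    while not done:                                 # sample a run of pi
        dist = torch.distributions.Categorical(logits=policy(state))
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        state, reward, done = env.step(action.item())
        rewards.append(reward)
    returns, v = [], 0.0                            # v_t: total reward from t
    for r in reversed(rewards):
        v += r
        returns.append(v)
    returns.reverse()
    loss = -sum(lp * ret for lp, ret in zip(log_probs, returns))
    optimizer.zero_grad(); loss.backward(); optimizer.step()
```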
Reinforcement Learning Methods: Outline
Basic methods
- Q-learning: Deep Q-Network
- Policy gradient: REINFORCE (Monte-Carlo gradient ascent)
Advanced methods
- Actor-Critic policy gradient method with experience replay
- Deep Dyna-Q & BBQ-Networks
- Multi-level reinforcement learning
Policy Gradient with Experience Replay
Can we learn from past data? Yes, via importance sampling.
Collect data D = { (s_0, a_0, p_0, v_0, …, s_T, a_T, p_T, v_T) }, where p_t = Pr(take action a_t at step t), recorded during data collection.
While (sample (s_0, a_0, p_0, v_0, …, s_T, a_T, p_T, v_T) from D):
  Update θ with the policy gradient reweighted by the importance ratio
  Pr(past example | new policy) / Pr(past example | old policy),
  which is capped to prevent high variance.
Actor-Critic with Experience Replay
[Kavosh & Williams, 2016; Weisz et al., 2018]
Problem: v_t has high variance
- Predict v_t with a model Q(s_t, a_t | w)
- Q(s_t, a_t | w) also has high variance
- Replace Q(s_t, a_t | w) with the advantage A(s_t, a_t | w) = Q(s_t, a_t | w_1) - V(s_t | w_2)
While (run policy π(⋅ | θ) to generate s_0, a_0, …, s_T, a_T):
  Save (s_0, a_0, p_0, v_0, …, s_T, a_T, p_T, v_T) in replay memory D
  Train w_1 and w_2 using experience replay (with importance sampling weighting)
  Update θ with the policy gradient using A(s_t, a_t | w)
Reinforcement Learning Methods: Outline
Basic methods
- Q-learning: Deep Q-Network
- Policy gradient: REINFORCE (Monte-Carlo gradient ascent)
Advanced methods
- Actor-Critic policy gradient method with experience replay
- Deep Dyna-Q & BBQ-Networks
- Multi-level reinforcement learning
Deep Dyna-Q
[Peng et al., 2018]
Train a "world model" M(s, a | w_M) → (reward, user action, terminate or not) to predict rewards and user actions.
Use the world model to generate simulated data and apply Q-learning to the simulated data => planning
(the agent thinks about and "plans" for hypothetical scenarios).
Details. For each step:
- Serve the user based on Q(s, a | w_Q) using ε-greedy
- Save the experience in D
- Update w_Q by Q-learning on a sample from D
- Update w_M by learning from a sample from D
- Update w_Q by Q-learning on simulations (a.k.a. planning) using M(s, a | w_M) and Q(s, a | w_Q)
BBQ-Networks
[Lipton et al., 2017]
Model the uncertainty of the Deep Q-Network Q(s, a | w).
Bayes-by-Backprop [Blundell et al., 2015]:
- Assume the prior w ~ N(μ_0, diag(σ_0^2))
- D = { (s_i, a_i, v_i) }, where v_i is the observed Q(s_i, a_i)
- Approximate p(w | D) with q(w | μ, ρ) = N(μ, diag(σ^2)), where σ = log(1 + exp(ρ))
Thompson sampling:
- Draw w_t ~ q(w | μ_t, ρ_t) and take action argmax_a Q(s_t, a | w_t)
Details - SGVB [Kingma & Welling, 2014]:
- Draw ε ~ N(0, 1) L times and take a minibatch of size M from D
- Compute the gradient of the variational objective and perform one step of SGD
Reinforcement Learning Methods: Outline
Basic methods
- Q-learning: Deep Q-Network
- Policy gradient: REINFORCE (Monte-Carlo gradient ascent)
Advanced methods
- Actor-Critic policy gradient method with experience replay
- Deep Dyna-Q & BBQ-Networks
- Multi-level reinforcement learning
Further Reduce the Action Space
[Casanueva et al., 2018]
Action hierarchy: action group → individual action
Learn K+1 policies (Deep Q-Networks that share some parameters across different groups):
- Master policy: π(state) → action group g, via Q(s, g | w)
- K sub-policies: for each group g, π_g(state) → action a, via Q(s, a | w_g)
Sequential Subtasks [Peng et al., 2017]: decompose the dialogue into subtasks, e.g., first recommend a restaurant, then make a reservation.
Practical Approach
1. Define the problem space and collect sample conversations
2. Build a user simulator
● Use sample conversations to fit the parameters of the simulator
● Collect more sample conversations based on simulator + crowdsourcing
3. Train a model using supervised learning on sample conversations
4. Improve the model using reinforcement learning (RL) based on the simulator
5. Test the model with friends and/or hired human evaluators and apply RL
6. Test the model with end users and apply RL
Summary
Multi-turn question answering is an active research area
How to obtain training data is a key challenge
- Simulation + crowdsourcing is a promising direction
- Continuously improve the simulator to generate better data
Reinforcement learning is promising
- It is important to pretrain an RL model on reasonable sample data
- Otherwise, it will have a hard time achieving success and will learn to end early
- Interesting directions: reduce variance in training, model uncertainty better, leverage hierarchical structure
Part II
Question Answering Systems
@ LinkedIn
Overview of Our Approach
Liang Zhang
Our Use Cases of Goal-Oriented Question Answering
Challenges
Challenges - Data Collection
• No data to start with
• What data to collect?
• Who can help generate the
data?
Cold-Start
Challenges - ML Problem Definition
Supervised ML problems have a clear form: Feature X → Model → Label Y
QA problems do not map as directly onto this form
Challenges - Evaluation
Challenges - Adversarial Inputs from Users
Our High-level Approach
Single-turn Question Answering Systems
NLU:
Main Focus
No DST
Dialogue Policy:
Rule-Based
NLG:
Template-Based
KB, DB, Index
Pipelined or Joint Learning:
Audio → ASR → Text → Domain Detection → Intent Detection → Slot Filling → Logic Form → execute against KB, DB → rule/template-based action generation
Alternative paths: Text → Sequence to Sequence → Logic Form; Text → Direct Search → KB, DB
Our High-level Modeling Design
"Who founded LinkedIn?"
→ Entity Recognition: "LinkedIn"
→ Intent Detection: "Founder"
→ Slot Filling & Rule/Template-based Action Generation → DB Query: (_, Founder, LinkedIn)
→ Output: "Reid Hoffman, Allen Blue, ..."
Data Collection
Define the problem & scope → a set of seed questions & answers: "Who founded LinkedIn?", "What jobs do you have?", ...
Expand to similar questions that lead to the same answer (sometimes challenging!): "Who is the founder of LinkedIn?", "Founder of LinkedIn?", "LinkedIn founders", ...
Out-of-scope questions are collected from public databases with similar domains
Data Annotation
● Entity Recognition:
● Intent detection
● Slot Filling & Rule/Template-based Action generation
“Who founded LinkedIn?” Entity Annotation: “LinkedIn” => Company
“Who founded LinkedIn?” Intent Annotation: “Founder”
“Who founded LinkedIn?” Result Annotation: DB Query: (_, Founder, LinkedIn)
Entity Recognition and Tagging
Example: How many software engineers [Title] know Java [Skill] in United States [Country]?
● Semi-Markov CRF model (Sarawagi and Cohen 2005)
● Deep neural network (Lample et al. 2016)
Intent Detection
● Define a set of intents
○ Depend heavily on product design!
● Multi-class classification model
○ Problem: Question => Intent
○ Model: Logistic regression, CNN, RNN, LSTM, …
○ Features: Bag of words, word embeddings, tagged entities, ...
● Out-of-scope intent a must!
Action Generation for Answer Retrieval
● “Good jobs at Google?”
○ Call LinkedIn job recommendation engine with company ==
“google”
● “How many members are in active community?”
○ Convert to SQL query and query the database
● “Change my password”
○ Show the article section that contains “how to change the
password” step by step
● ...
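A minimal sketch of this rule/template-based routing; the intent names, handler actions, and APIs are illustrative assumptions, not LinkedIn's actual endpoints:

```python
# Route each detected intent to a retrieval action (hypothetical handlers).
def generate_action(intent, slots):
    if intent == "job_search":
        return ("CALL_JOB_RECOMMENDER", {"company": slots.get("company")})
    if intent == "metric_query":
        return ("RUN_SQL", slots.get("sql"))
    if intent == "help_article":
        return ("SHOW_ARTICLE_SECTION", slots.get("topic"))
    return ("OUT_OF_SCOPE", None)

print(generate_action("job_search", {"company": "google"}))
# ('CALL_JOB_RECOMMENDER', {'company': 'google'})
```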
Analytics Bot
Jaewon Yang
Analytics Bot: Motivation
• Data-driven decision making: we have to answer a lot of questions about metrics!
• Answering metric questions requires domain experts
• What if a Q&A bot can answer (simple) metric-related questions?
Analytics Bot (Ana Bot)
• Internal Q&A bot about metrics
• Answers metric query questions (how many X)
• Answers definition questions (what is X)
Ana: ML Problems
Intent Detection
• Intents: Definition, Query, Out-of-scope (OOS)
• Definition: What is "contributor"?
• Query: How many contributors 2 days ago?
• OOS: How is the weather?
Question2Definition
• Find the right definition for the definition question
• In: What is contributor?
• Out: Contributor is a user who initiates or continues a conversation at LinkedIn
Question2Query
• Create a database query from a given question
• In: How many contributors yesterday?
• Out: SELECT COUNT(DISTINCT mem_id) FROM contributors WHERE date = '08-18-2018';
Ana: Model Flow
UI → Intent Detection →
- Question2Definition → Knowledge Base → UI
- Question2Query → Presto → UI
Main Problem: Question2Query
• Input: natural language question, e.g., "How many contributors messaged last week?"
• Output: database query (SQL), e.g.,
SELECT COUNT(DISTINCT mem_id) FROM contributors WHERE contribution_type == 'message' AND activity_time > 2018-08-13
• Well-studied problem, yet the accuracy is not reliable (~60%) [Zhong et al. 2017b]
• Our goal: develop an ML model that has reliable accuracy and works without large training data
Question2Query: Challenges → Our Approaches
• Expensive to get large training data → leverage models trained on public data sets
• Need to onboard new metrics → leverage metadata available in the company
• SQL is hard to canonicalize → formulate slot filling for SQL
• "Almost success" does not count → show our interpretation and allow users to make minor changes
Data Collection
Initial user study:
• Survey: "What question would you ask Ana?"
• 60 seed questions from 20 domain experts
• Discovered the "Definition" intent from the seed questions
• Selected 20 target metrics (tables and columns)
• Defined the slot filling problem for SQL query generation
Training data generation:
• 60 seed questions -> 3k (question, answer) pairs
• Initial annotation by data scientists
• Annotators generate paraphrases
• Multiple reviews to get consistent annotation
Slot Filling for Query
• Question -> Query: How do we generate query?
○ Word-by-word generation does not work
○ Better to do slot filling [Zhong et al, 2017b]
• Why is slot filling better?
○ Concise: Reduce chance of error
○ Interpretable: Easy to explain to the user
○ Modular: Can divide and conquer
SELECT COUNT(DISTINCT mem_id)
FROM contributors
WHERE contribution_type
== ‘message’
AND activity_time > 2018-08-13
“How many contributors messaged last
week?”
Slot Filling for SQL
• Use 4 slots to represent queries
○ Metric: Aggregator, Column, Table
○ Time: Where clause about time
○ Filter: Additional WHERE clauses
○ Breakdown: Group by (Not shown left)
• Concise: Reduced # possible outcomes by 20X
• Interpretable: Show the outcome in English
• Modular: Use SUTime for Time prediction
SELECT COUNT(DISTINCT mem_id)
FROM contributors
WHERE contribution_type
== ‘message’
AND activity_time > 2018-08-13
Metric: unique_contributors
Filter: contribution_type == ‘message’
Time: Last 7 days
“How many contributors messaged last
week?”
ML Model for Slot Filling
• Predict each slot by feature-based regression models
• Features for (input question q, metric m)
○ Similarities between q and the description for m
■ E.g, unique_contributors -> “#members who posted, shared ...”
■ Helps m when m has little training data
○ Similarities between q and training examples with m
• Transfer learning: Use word embeddings, named entities tagger, deep
neural networks trained on other large data sets to compute similarities
User Modifying Query
• Output: (Metric, Filter, Time, Breakdown)
○ One slot miss = incorrect query
• Solution: let the user modify the query among the top 3 per slot
○ Use the slot description (no SQL knowledge needed)
• The user gets the answer if the right query is among the top 81 (top 3 choices for each of the 4 slots)
Question2Query Modeling Results
Model | Improvement in accuracy over baseline
Unsupervised (finding the nearest neighbor in training data) | -- (baseline)
Logistic regression model with all features | +21%
Prod model: logistic regression with pairwise model training | +26%
Prod model with top-3 precision | +44%
Ana: How to Add a New Metric
• Vary the # of training questions and measure accuracy per metric (plot: accuracy vs. # training questions)
• Performance converges with > 10 questions
Intent Detection
• 3-class classification: {Definition, Query,
Out-of-scope (OOS)}
• Hardest problem: preparing training examples
for OOS.
• Model: Logistic regression with unigram and
bigram features. Challenge comes from
training examples, not ML features.
• Ana allows user to override the intent
Question2Definition
• Question -> The most relevant definition
• Method: Rank by character n-gram matches
○ ‘Post’ -> {‘po’, ‘os’, ‘st’, ‘pos’, ‘ost’}
• Easy: Linguistic variation does not matter
• Very useful to the user (higher NPS score)
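A minimal sketch of the character n-gram ranking; the tiny glossary and the Jaccard-style overlap score are illustrative assumptions:

```python
# Rank glossary terms by character n-gram overlap with the question.
def char_ngrams(text, sizes=(2, 3)):
    text = text.lower()
    return {text[i:i + n] for n in sizes
            for i in range(len(text) - n + 1)}

def best_definition(question, glossary):
    q = char_ngrams(question)
    return max(glossary, key=lambda term:
               len(q & char_ngrams(term)) / len(q | char_ngrams(term)))

glossary = {"contributor": "A user who initiates or continues a conversation.",
            "poster": "A user who posts content."}
term = best_definition("What is a contributer?", glossary)  # note the typo
print(term, "->", glossary[term])                           # 'contributor'
```

Because matching happens at the character level, misspellings and morphological variants still score well, which is why linguistic variation matters less here.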
Ana: Conclusion
• A Q&A bot that does Question2Query and Question2Definition
• Formulated slot filling to improve performance, interpretability, and modularity
• Leveraged metadata / transfer learning to overcome cold-start
• Users liked:
○ Ana interacts with them, confirming the intent and the query
○ Ana allows users to correct Ana
○ Ana can look up definitions as well as handle queries
Ana Live Demo
Company Assistant
Company Assistant: Motivation
• Share rich LinkedIn insights
• People search & research questions
• Career advice questions
Company Assistant: ML Problems
Intent Classification
• Intents: 9 intents + Out-of-scope (OOS)
• About Company
• People Search
• Skills Insights
• ...
• OOS: Will you marry me?
Entity Tagging
• Identify named entities in the question
• In: Show ML jobs in Microsoft?
• Out: COMPANY<Microsoft>, SKILL<Machine Learning>
Company Assistant: Model Flow
UI → Intent Detection + Entity Tagger → API Selector → Jobs API / News API / Salary API / ... → UI
Challenges → Our Approaches
• Expensive to get large training data → leverage models trained on public data sets
• Overlapping intents → early interactive model testing to rearrange overlapping intents
• Domain-specific language → bootstrap from in-domain users (not crowdsourcing) to collect examples with domain-specific language
Data Collection
Internal data collection:
• 30 initial questions
• 40 predefined answers (images)
• Ask people to create a question
Simulator bot:
• Simulator UI
• ML model to guess the intent
• User feedback (Correct/Incorrect)
• Online accuracy estimation
Training data (watch out for BIAS):
• OOS intent from external sources
• Paraphrases (simulator)
• Total questions: ~3K
• OOS intent: ~10K
Entity Tagging
• Reuse existing gazetteers/dictionaries (jobs)
• Semi-Markov CRF model
• Train on hand-labeled data
Identifying Logical Expression
• Example: “Show my connections that do not work at Apple”
○ Intent: Connections, Company: Apple
• Dependency Parsing + Rule-based
Intent Detection
• 10-class classification: {About, Skills, Connections, Courses, Influencers, Jobs, News, Salary, Talents, Out-of-scope (OOS)}
• Hardest problem: preparing training examples for OOS
• Final model: CNN-RNN deep model
Intent Detection: Results
Relative comparison to human performance:
Model | Performance
Human | 100%
Baseline: Logistic Regression + CRF (LUIS) | 90.50%
CNN-RNN Model | 98.80%
CNN-RNN model, relative comparison to the baseline:
Intent | Precision | Recall
About | +65% | -30%
Analyze Skills | +37% | -18%
Connections | +10% | +4%
Courses | +27% | -31%
Influencers | +100% | +60%
Jobs | +7% | -1%
News | +15% | +1%
Salary | +19% | -13%
Talent Stats | -3% | +10%
OOS | -3% | +20%
Accuracy | +9%
Company Bot + Ana Bot: Lessons Learned
• Start from a user study: define the bot specs and get seed training data
• Leverage existing NLP tools as much as possible
○ Time parsing, dependency / POS parsing, word embeddings
• Build a simple model and test it asap
○ Benefit: clean up training data, fix poorly defined intents
• Show the interpretation and allow users to change it
○ This helps the user achieve the goal even if the bot is not 100% accurate
Question Answering in
LinkedIn Help Center
* Special thanks to Weiwei Guo for providing the material
Liang Zhang
Question Answering in LinkedIn Help Center
• Input: question
• Output: the article / section of the article that contains the answer
○ A few thousand candidates
○ New candidates get added regularly
• Traditional keyword search does not work well
Our Data
• A random sample of user click-through data from 01/2017 - 04/2018
• 200K <query, list of articles> samples, with click / non-click as the response for each <query, article> pair
• Only contains queries with > 4 words (where traditional keyword search generally fails!)
Our Modeling Approach
• Consider each candidate article as an intent
• Problem: natural language query => intent
• Need to handle regularly added new articles without retraining the model too frequently
Our Modeling Approach -- CNN
• Independent CNN layers for the question & the article title, keywords, and content
○ Word2Vec embeddings for initialization
• Final layers: Sim<question embedding, article embedding> => softmax over all candidate articles
• For new articles in test data: use embedding similarity instead of retraining the model
Deep + Wide Model with Better Performance
• The pure CNN model did not work well for out-of-vocabulary queries
• Traditional IR-based keyword-match features were added as a complement (the wide component)
Offline Evaluation
Model | Lift % in Precision@1 vs. Control (IR-based)
Embedding Similarity (GloVe) | -35%
Embedding Similarity (LinkedIn data) | -8%
Deep Model (CNN) | +55%
Deep + Wide Model | +57%
Embedding similarity: cosine similarity(query embedding, article embedding), where the query (or article) embedding is the element-wise average of the word embeddings in the query (or the article) using Stanford GloVe.
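A minimal sketch of this embedding-similarity baseline; `glove` stands for a loaded {word: vector} table and is an assumption here:

```python
# Average word vectors, then rank articles by cosine similarity to the query.
import numpy as np

def avg_embedding(text, glove, dim=100):
    vecs = [glove[w] for w in text.lower().split() if w in glove]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def rank_articles(query, articles, glove):
    q = avg_embedding(query, glove)
    return sorted(articles,
                  key=lambda art: cosine(q, avg_embedding(art, glove)),
                  reverse=True)
```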
Online A/B Test Performance
Metric | Lift % vs. Control (IR-based)
Happy Path Rate | +14%
Undesired Path Rate | -18%
Search Sessions w/ Clicks on Results | +7%
● Happy Path: users who clicked only one search result and then left the help center without creating a case
● Undesired Path: users who did the search and went to "contact us" directly without reading any articles from the search results
More Examples...
Conclusion and Future
Challenges
Liang Zhang
Conclusion and Future Challenges
● Literature review of question answering systems:
○ Basic Version: Single-Turn Question Answering
○ Advanced Version: Multi-Turn Question Answering
● Our practical lessons learned through 3 LinkedIn use cases
● An area with a lot of potential and challenges, e.g.,
○ How to handle cold-start scalably?
○ How to make the model work more reliably?
○ How to have seamless, human-like interactions with humans?
○ How to make the model generic enough that it works for different domains with little effort?
○ ...
Questions?
References
● [Androutsopoulos and Malakasiotis, JAIR 2010] A Survey of Paraphrasing and Textual Entailment Methods
● [Berant and Liang, ACL 2014] Semantic Parsing via Paraphrasing
● [Bordes et al., 2015] Large-scale Simple Question Answering with Memory Networks
● [Buck et al., ICLR 2018] Ask The Right Questions: Active Question Reformulation With Reinforcement
Learning
● [Casanueva et al., 2018] Casanueva, Iñigo, et al. "Feudal Reinforcement Learning for Dialogue
Management in Large Domains." arXiv preprint arXiv:1803.03232 (2018).
● [Chen et al., KDD Explorations 2017] A Survey on Dialogue Systems: Recent Advances and New Frontiers
● [Chen et al., WWW 2012 – CQA'12 Workshop] Understanding User Intent in Community Question
Answering
● [Cho et al., EMNLP 2014] Learning Phrase Representations using RNN Encoder–Decoder for Statistical
Machine Translation
● [Crook & Marin, 2017] Crook, Paul, and Alex Marin. "Sequence to sequence modeling for user simulation in
dialog systems." Proceedings of the 18th Annual Conference of the International Speech Communication
Association (INTERSPEECH 2017). 2017.
● [Dauphin et al., ICLR 2014] Zero-Shot Learning for Semantic Utterance Classification
● [Deng and Yu, Interspeech 2011] Deep Convex Net: A Scalable Architecture for Speech Pattern
Classification
● [Deng et al., SLT 2012] Use Of Kernel Deep Convex Networks And End-to-end Learning For Spoken
Language Understanding
● [Duboue and Chu-Carroll, HLTC 2006] Answering the Question You Wish They Had Asked: The Impact of
Paraphrasing for Question Answering
● [Goo et al., NAACL-HLT 2018] Slot-Gated Modeling for Joint Slot Filling and Intent Prediction
● [Hakkani-Tür Interspeech 2016] Multi-Domain Joint Semantic Frame Parsing using Bi-directional
RNN-LSTM
● [Hashemi et al., QRUMS 2016] Query Intent Detection using Convolutional Neural Networks
● [Hu et al., NIPS 2014] Convolutional Neural Network Architectures for Matching Natural Language
Sentences
● [Kalchbrenner et al., ACL 2014] A convolutional neural network for modelling sentences
● [Kavosh & Williams, 2016] Asadi, Kavosh, and Jason D. Williams. "Sample-efficient deep reinforcement
learning for dialog control." arXiv preprint arXiv:1612.06000 (2016).
● [Kim, EMNLP 2014] Convolutional Neural Networks for Sentence Classification
● [Kingma & Welling, 2014] Kingma, Diederik P., and Max Welling. "Stochastic gradient VB and the variational
auto-encoder." Second International Conference on Learning Representations, ICLR. 2014.
● [Kreyssig et al., 2018] Kreyssig, Florian, et al. "Neural User Simulation for Corpus-based Policy Optimisation
for Spoken Dialogue Systems." arXiv preprint arXiv:1805.06966 (2018).
● [Kwiatkowski et al., EMNLP 2013] Scaling Semantic Parsers with On-the-fly Ontology Matching
● [Lample et al. 2016] Lample, Guillaume, et al. "Neural architectures for named entity recognition." arXiv
preprint arXiv:1603.01360 (2016).
● [Lee and Dernoncourt, NAACL 2016] Sequential Short-Text Classification with Recurrent and Convolutional
Neural Networks
● [Lipton et al., 2017] Lipton, Zachary, et al. "BBQ-Networks: Efficient Exploration in Deep Reinforcement
Learning for Task-Oriented Dialogue Systems." arXiv preprint arXiv:1711.05715 (2017).
● [Liu and Lane, Interspeech 2016] Attention-Based Recurrent Neural Network Models for Joint Intent
Detection and Slot Filling
● [Mesnil et al., Interspeech 2013] Investigation of Recurrent-Neural-Network Architectures and Learning
Methods for Spoken Language Understanding
● [Mesnil et al., TASLP 2015] Using Recurrent Neural Networks for Slot Filling in Spoken Language
Understanding
● [Mnih et al., 2015] Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning."
Nature 518.7540 (2015): 529.
● [Mrksic et al., 2018] Mrkšić, Nikola, and Ivan Vulić. "Fully statistical neural belief tracking." arXiv preprint
arXiv:1805.11350 (2018).
● [Palangi et al., TASLP 2016] Deep Sentence Embedding Using Long Short-Term Memory Networks:
Analysis and Application to Information Retrieval
● [Peng et al., 2017] Peng, Baolin, et al. "Composite task-completion dialogue policy learning via hierarchical
deep reinforcement learning." arXiv preprint arXiv:1704.03084 (2017).
● [Peng et al., 2018] Peng, Baolin, et al. "Deep Dyna-Q: Integrating Planning for Task-Completion Dialogue
Policy Learning." Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics
(Volume 1: Long Papers). Vol. 1. 2018.
● [Rajpurkar et al. 2016] Rajpurkar, Pranav, et al. "Squad: 100,000+ questions for machine comprehension of
text." arXiv preprint arXiv:1606.05250 (2016).
● [Ravuri and Stolcke, Interspeech 2015] Recurrent Neural Network and LSTM Models for Lexical Utterance
Classification
● [Reddy et al., TACL 2014] Large-scale Semantic Parsing without Question-Answer Pairs
● [Sarawagi and Cohen 2005] Sarawagi, Sunita, and William W. Cohen. "Semi-markov conditional random
fields for information extraction." Advances in neural information processing systems. 2005.
● [Sarikaya et al., ICASSP 2011] Deep Belief Nets For Natural Language Call–routing
● [Schatzmann & Young, 2009] Schatzmann, Jost, and Steve Young. "The hidden agenda user simulation
model." IEEE transactions on audio, speech, and language processing 17.4 (2009): 733-747.
● [Serdyuk et al., 2018] Towards End-to-end Spoken Language Understanding
● [Shah et al., 2018] Shah, Pararth, et al. "Building a Conversational Agent Overnight with Dialogue
Self-Play." arXiv preprint arXiv:1801.04871 (2018).
● [Shen et al., CIKM 2014] A Latent Semantic Model with Convolutional-Pooling Structure for Information
Retrieval
● [Shen et al., WWW 2014] Learning Semantic Representations Using Convolutional Neural Networks for
Web Search
● [Sutton et al., 2000] Sutton, Richard S., et al. "Policy gradient methods for reinforcement learning with
function approximation." Advances in neural information processing systems. 2000.
● [Tur et al., ICASSP 2012] Towards deeper understanding: Deep convex networks for semantic utterance
classification
● [Wang et al., ACL 2015] Building a Semantic Parser Overnight
● [Wang et al., NAACL-HLT 2018] A Bi-model based RNN Semantic Frame Parsing Model for Intent Detection
and Slot Filling
● [Weisz et al., 2018] Weisz, Gellért, et al. "Sample efficient deep reinforcement learning for dialogue systems
with large action spaces." arXiv preprint arXiv:1802.03753 (2018).
● [Wen et al., 2015] Wen, Tsung-Hsien, et al. "Semantically conditioned lstm-based natural language
generation for spoken dialogue systems." arXiv preprint arXiv:1508.01745 (2015).
● [Wen et al., CCF 2017] Jointly Modeling Intent Identification and Slot Filling with Contextual and Hierarchical
Information
● [Weston, ICML 2016] Memory Networks for Language Understanding, ICML Tutorial 2016
● [Williams, 1988] Williams, R. J. "Toward a theory of reinforcement-learning connectionist systems."
Technical Report NU-CCS-88-3, Northeastern University, College of Computer Science. 1988.
● [Williams et al., 2017] Williams, Jason D., Kavosh Asadi, and Geoffrey Zweig. "Hybrid code networks:
practical and efficient end-to-end dialog control with supervised and reinforcement learning." arXiv preprint
arXiv:1702.03274 (2017).
● [Xiao et al., ACL 2016] Sequence-based Structured Prediction for Semantic Parsing
● [Xu and Sarikaya, ASRU 2013] Convolutional Neural Network Based Triangular CRF For Joint Intent
Detection And Slot Filling
● [Yan et al., AAAI 2017] Building Task-Oriented Dialogue Systems for Online Shopping
● [Yao et al., Interspeech 2013] Recurrent Neural Networks for Language Understanding
● [Yih et al., ACL 2014] Semantic Parsing for Single-Relation Question Answering
● [Zhang and Wang, IJCAI 2016] A Joint Model of Intent Determination and Slot Filling for Spoken Language
Understanding
● [Zhong et al., 2017] Zhong, Victor, Caiming Xiong, and Richard Socher. "Global-Locally Self-Attentive
Encoder for Dialogue State Tracking." Proceedings of the 56th Annual Meeting of the Association for
Computational Linguistics (Volume 1: Long Papers). Vol. 1. 2018.
● [Zhong et al., 2017b] Zhong, Victor, Caiming Xiong, and Richard Socher. “Seq2SQL: Generating Structured
Queries from Natural Language using Reinforcement Learning.” arXiv preprint arXiv:1709.00103 (2017).
Thank you

[KDD 2018 tutorial] End-to-end goal-oriented question answering systems

  • 24. End-to-End Learning Approach Where can I have japanese food in the downtown? Sequence-to-Sequence Model with Memory KB, DB, Index Tomi Sushi is a nice japanese restaurant
  • 25. Question Answering System Modular approach: Practical End-to-end learning approach: Research Basic version: Single-turn question answering NLU: Main Focus No DST Action Generation: Rule-Based NLG: Template-Based KB, DB, Index
  • 26. Question Answering System Modular approach: Practical End-to-end learning approach: Research Advanced version: Multi-turn question answering NLU: Neural Net DST: Neural Net Dialogue Policy: Neural Net NLG: Neural Net KB, DB, Index
  • 28. Single-Turn Question Answering Systems NLU: Main Focus No DST Dialogue Policy: Rule-Based NLG: Template-Based KB, DB, Index Pipelined or Joint Learning Logic Form KB, DB execute rule/template-based action generation Sequence to Sequence Domain Detection Intent Detection Slot Filling ASR Text Audio Text Direct Search
  • 29. Definition Domain/Intent Detection Domain/Intent detection is a semantic text classification problem. domain/intent detection Select Flight From Airline_Travel_Table ... fill in arguments semantic frame/template
  • 30. vs. Sentence/Query Classification (Domain/Intent Detection)
    Input: written-language sentences (classic sentence classification); keywords (query classification in search); spoken-language sentences with significant utterance variations (domain/intent detection in Q&A).
    Training data: rich for classic sentence classification (news articles, reviews, tweets, TREC) and for query classification (click-through); few for Q&A (human labels).
    State of the art: CNN [Kalchbrenner et al., ACL 2014] [Kim, EMNLP 2014] for classic sentence classification; CLSM, LSTM-DSSM [Shen et al., CIKM 2014] [Palangi et al., TASLP 2016] for query classification; DCN, RNN [Tur et al., ICASSP 2012] [Ravuri and Stolcke, Interspeech 2015] for domain/intent detection.
  • 31. Challenges (Domain/Intent Detection)
    Temporal scope of entity: 2 questions with different intents ("How was the Mexican restaurant" vs. "Tell me about Mexican restaurants"). Q1: Can we automatically generate contextual features for entities? (Deep Neural Networks)
    Utterance variations: 2 questions with the same intent ("Show me weekend flights between JFK and SFO" vs. "I want to fly from San Francisco to New York next Sunday"). Q2: Can they generate the same answer? (Paraphrase) Q3: Which question will generate the better answer? (Active Q&A)
    Lack of training data: significant unknown words and unknown syntactic structures for the same semantics. Q4: Can we efficiently expand the training data? (Character-level Modeling; Paraphrase; Bots Simulator; Domain-independent Grammar)
  • 32. • Q1: Can we automatically generate contextual features for entities? Deep Neural Networks
  • 33. CNN Contextual Features [Kalchbrenner et al., ACL 2014, Kim, EMNLP 2014, Shen et al., CIKM 2014] ● 1-d convolution → k-d, 1 CNN layer → multiple CNN layers ● multiple filters: capture various lengths of local contexts for each word, n-gram features ● max pooling → k-Max pooling: retain salient features from a few keywords in global feature vector
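To make the convolution-and-pooling recipe above concrete, here is a minimal PyTorch sketch of a CNN intent classifier in the spirit of [Kim, EMNLP 2014]; the vocabulary size, filter widths, and class count are illustrative assumptions, not values from the tutorial.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CNNIntentClassifier(nn.Module):
    # Several filter widths capture n-gram features of different lengths;
    # max pooling keeps the most salient feature per filter.
    def __init__(self, vocab_size=10000, emb_dim=100, num_classes=5,
                 filter_widths=(2, 3, 4), num_filters=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, num_filters, w) for w in filter_widths)
        self.out = nn.Linear(num_filters * len(filter_widths), num_classes)

    def forward(self, token_ids):                    # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)    # (batch, emb_dim, seq_len)
        feats = [F.relu(c(x)).max(dim=2).values for c in self.convs]
        return self.out(torch.cat(feats, dim=1))     # (batch, num_classes)

# Usage: logits = CNNIntentClassifier()(torch.randint(0, 10000, (8, 20)))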
  • 34. CNN in Text Classification [Kalchbrenner et al., ACL 2014, Kim, EMNLP 2014, Shen et al., CIKM 2014] ● TREC question classification: CNNs are close to “SVM with careful feature engineering” ● Large window width: long-term dependency ● k-Max pooling maintains the relative positions of the most relevant n-grams ● Web query: ○ DSSM < C-DSSM (CLSM) ○ Short text: CNN is slightly better than a unigram model (figure: learned 3-gram features, where keywords win the 5 active neurons in max pooling)
  • 35. RNN/LSTM (Contextual Features): a basic RNN handles short-range context (“The clouds are in the sky”), while an LSTM also captures long-range dependencies (“I grew up in France … I speak fluent French”). Figures from github: 2015-08-Understanding-LSTMs
  • 37. LSTM in Text Classification [Palangi et al., TASLP 2016] ● Memory: become richer (more info) ● Input gates: do not update words 3, 7, 9 ● Peephole, forget updates are not too helpful when text is short and memory is initialized as 0 (just do not update) ● Web query: ○ DSSM < C-DSSM (CLSM) < LSTM-DSSM Case: match “hotels in shanghai” with “shanghai hotels accommodation (3) hotel in shanghai discount (7) and reservation (9)” Input gates cell gates (memory) Contextual Features
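For comparison with the CNN sketch, here is a minimal LSTM sentence encoder in the spirit of LSTM-DSSM [Palangi et al., TASLP 2016]: the final hidden state serves as the sentence embedding, which can then be matched against documents by cosine similarity; all dimensions are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LSTMSentenceEncoder(nn.Module):
    # The memory cell accumulates salient keywords as the sentence is read;
    # the final hidden state is used as the sentence embedding.
    def __init__(self, vocab_size=10000, emb_dim=100, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)

    def forward(self, token_ids):                    # (batch, seq_len)
        _, (h_n, _) = self.lstm(self.embed(token_ids))
        return h_n[-1]                               # (batch, hidden)

# enc = LSTMSentenceEncoder()
# score = F.cosine_similarity(enc(query_ids), enc(doc_ids))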
  • 38. Impact of Utterance Length (Context) [Ravuri and Stolcke, Interspeech 2015] (Contextual Features): when the sentence is short, basic RNN > LSTM; when the sentence is long, basic RNN < LSTM.
  • 39. • Q2: Can 2 paraphrases generate the same answer? • Q4: Can we efficiently expand the training data? Paraphrase
  • 40. Impact of Paraphrase for Question Answering Paraphrase [Duboue and Chu-Carroll, HLTC 2006] lexical paraphrase syntactical paraphrase 1. QA is sensitive to small variations in question 2. QA returns different answers for questions that are semantically equivalent 3. Lack of training data to cover all paraphrases Problem 1. Replace user question by the paraphrase canonical form 2. Use MT to generate paraphrases candidates 3. Multiple MTs to enhance diversity 4. Feature-based paraphrase selection 1. Oracle of paraphrase selection: +35% (high reward) 2. Random paraphrase selection: -55% (high risk) 3. A feature-based selection: +0.6% Solution Impact
  • 41. Definition Paraphrase generation is a sequence-to-sequence modeling problem. Paraphrase
  • 42. Three characteristics (Paraphrase) 1. Monolingual parallel data is not readily available (vs. bilingual parallel data in MT): use a pivot language (pairs of MT systems, especially with different methods) 2. Not all of the words or phrases need to be replaced (vs. MT) 3. Hard evaluation (vs. MT) ○ MT uses BLEU: translations are scored based on their similarity to the human references ○ It is more difficult to provide human references (canonical forms) in paraphrase generation
  • 43. External Paraphrase Tables [Zhao et al., ACL 2009] (Paraphrase). Goal: sentence compression. Model: 1. joint likelihood of paraphrase tables 2. trigram language model 3. application-dependent utility score (e.g., similarity to the canonical form in “paraphrase generation”). Human evaluation: 1. Adequacy: {evidently not, generally, completely} preserved meaning 2. Fluency: {incomprehensible, comprehensible, flawless} paraphrase 3. Usability: {opposite to, does not achieve, achieves} the application. Analysis: 1. Prefer paraphrases which are a part of the canonical form 2. Better than pure MT-based methods 3. The utility score is crucial
  • 44. RNN Encoder-Decoder [Cho et al., EMNLP 2014] Paraphrase
  • 45. RNN Encoder-Decoder in Paraphrase Generation [Cho et al., EMNLP 2014] ● Semantically similar (most are about duration of time) ● Syntactically similar Paraphrase
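A minimal sketch of the RNN encoder-decoder idea for paraphrase generation, assuming teacher forcing at training time; the GRU sizes and vocabulary are illustrative, and a real system would add attention and beam-search decoding.

import torch
import torch.nn as nn

class Seq2SeqParaphraser(nn.Module):
    # The encoder compresses the source question into a context vector;
    # the decoder generates the paraphrase token by token from it.
    def __init__(self, vocab_size=10000, emb_dim=100, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hidden, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, src_ids, tgt_ids):             # teacher forcing
        _, context = self.encoder(self.embed(src_ids))   # (1, batch, hidden)
        dec_states, _ = self.decoder(self.embed(tgt_ids), context)
        return self.out(dec_states)                  # (batch, tgt_len, vocab)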
  • 46. • Q3: Which question will generate the better answer? (Active Q&A)
  • 47. Active Question Answering [Buck et al., ICLR 2018] Active Q&A
  • 48. • Q4: Can we efficiently expand the training data? Character-level Modeling
  • 49. Impact of Unknown Words [Ravuri and Stolcke, Interspeech 2015] (Character-level Modeling). On data with 30% / 60% singletons: RNN/LSTM-triletter < RNN/LSTM-word. With preprocessed generic entities (dates, locations, phone numbers, ...), e.g., “Can you show me the address of Happy Kleaners?” → “Can you show me the address of LOC?”: RNN/LSTM-triletter > RNN/LSTM-word.
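The “triletter” idea can be illustrated in a few lines of Python: words are decomposed into letter trigrams so that unseen words still share sub-word features with seen ones (a sketch; the exact featurization in the cited work may differ).

def letter_trigrams(word):
    # '#' marks word boundaries, so "Kleaners" yields
    # ['#kl', 'kle', 'lea', 'ean', 'ane', 'ner', 'ers', 'rs#'];
    # unseen words still overlap with seen ones at the sub-word level.
    padded = "#" + word.lower() + "#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]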
  • 50. • Q4: Can we efficiently expand the training data? Other solutions: Bots Simulator (introduced in Q&A systems @ LinkedIn); domain-independent grammar (introduced in sequence-to-sequence semantic parsing)
  • 51. Definition Slot Filling Slot Filling is to extract slot/concept values from the question for a set of predefined slots/concepts. More specifically, it is often modeled as a sequence labeling task with explicit alignment. extract semantic concepts Select Flight From Airline_Travel_Table Where dept_city = “Boston” and arr_city = “New York” and date = today fill in arguments/slots semantic frame/template
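As a concrete illustration of this sequence-labeling view, here is a small Python helper (hypothetical, not from the tutorial) that turns BIO tags back into the slot values of the semantic frame.

def bio_to_slots(tokens, tags):
    # Collect contiguous B-/I- spans into slot values, e.g.
    #   tokens = ["flights", "from", "Boston", "to", "New", "York"]
    #   tags   = ["O", "O", "B-dept", "O", "B-arr", "I-arr"]
    #   -> {"dept": "Boston", "arr": "New York"}
    slots, slot, words = {}, None, []
    def flush():
        if slot:
            slots[slot] = " ".join(words)
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            slot, words = tag[2:], [token]
        elif tag.startswith("I-") and slot == tag[2:]:
            words.append(token)
        else:                 # "O" tag, or an inconsistent "I-" tag
            flush()
            slot, words = None, []
    flush()
    return slots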
  • 52. Characteristics Semantic frame • Pre-defined slots/concepts by goal-oriented dialog systems - vs. open-domain dialog systems • Q5: Can we leverage domain knowledge? • Long entity phrase has strong slot dependency • Q6: Is model sensitive to slot position? • Q7: Shall we globally assign labels together? Slot dependency Slot Filling Sentence watch star war episode IV a new hope Slots O B-mov I-mov I-mov I-mov I-mov I-mov I-mov • Input and output are of the same length - vs. other sequence labeling tasks (machine translation and speech recognition): the output is of the variable length • Q8: Can slot filling model leverage the “explicit alignment”? Explicit alignment Knowledge-based Model Bidirectional RNN, Slot Language Model, RNN-CRF Attention-Based RNN
  • 53. • Q5: Can we leverage domain knowledge? Knowledge- based Model
  • 54. Incorporate Additional Knowledge into RNN Knowledge-based Model Additional knowledge: ● Look-up table (taxonomy) ● Word categories (from Wikipedia) ● Named entity tags ● Syntactic labels [Yao et al., Interspeech 2013]
  • 55. • Q6: Is model sensitive to slot position? Bidirectional RNN
  • 56. RNN for Slot Filling RNN
  • 57. Bidirectional RNN for Slot Filling Bidirectional RNN Sentence show flights from Boston to New York today Slots O O O B-dept O B-arr I-arr B-date Sentence is today’s New York arrival flight schedule available to see Slots O B-date B-arr I-arr O O O O O O Forward RNN is better: Backward RNN is better: [Mesnil et al., Interspeech 2013]
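Combining both directions is straightforward in PyTorch; here is a minimal bidirectional LSTM tagger sketch with illustrative sizes, emitting one slot-tag distribution per token.

import torch
import torch.nn as nn

class BiLSTMSlotTagger(nn.Module):
    # The forward pass sees left context and the backward pass sees right
    # context, so slots are covered wherever they appear in the sentence.
    def __init__(self, vocab_size=10000, emb_dim=100, hidden=128, num_tags=20):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.out = nn.Linear(2 * hidden, num_tags)

    def forward(self, token_ids):                    # (batch, seq_len)
        states, _ = self.lstm(self.embed(token_ids))
        return self.out(states)                      # (batch, seq_len, num_tags)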
  • 58. • Q7: Shall we globally assign labels together? Slot Language Model, RNN-CRF
  • 59. Slot Language Model and RNN-CRF Slot Filling [Mesnil et al., TASLP 2015]
  • 60. Motivation (NLU: Joint Model): jointly learn domain detection, intent detection, and slot filling. Benefits: 1. Only 1 model needs to be trained, fine-tuned for multiple tasks, and deployed 2. Tasks enhance each other; for example, if the intent of a sentence is to find a flight, the sentence likely contains the departure and arrival cities, and vice versa 3. Outperforms separate models for each task. Q9: What is the most effective learning structure of this multi-task learning (e.g., append the intent to the beginning/end of the slots vs. treat them as 2 different tasks)? Q10: Should we jointly optimize the loss function or not?
  • 61. • Q8: Can slot filling model leverage the “explicit alignment”? • Q9: What is the most effective learning structure of this multi-task learning? Attention-based RNN
  • 62. Encoder-Decoder for Joint Learning (Attention-based RNN) [Liu and Lane, Interspeech 2016]. Variants: attention, alignment, attention + alignment, all with the same encoder shared by the 2 decoders. Attention: a normalized weighted sum of encoder states, conditioned on the previous decoder state; it carries additional longer-term dependencies (vs. h, which codes the whole sentence info already). Alignment: slot filling comes with explicit alignment, so learning the alignment from training data would be a waste.
  • 63. Attention-based RNN for Joint Learning [Liu and Lane, Interspeech 2016]: the attention-based encoder-decoder and the attention-based bidirectional RNN performed similarly; the bidirectional RNN is faster.
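A minimal sketch of joint intent detection and slot filling with a shared encoder and two heads (attention omitted for brevity); all sizes and names are illustrative assumptions.

import torch
import torch.nn as nn

class JointIntentSlotModel(nn.Module):
    # One shared encoder, two heads: per-token slot tags plus a pooled
    # sentence-level intent, so the two tasks can reinforce each other.
    def __init__(self, vocab_size=10000, emb_dim=100, hidden=128,
                 num_tags=20, num_intents=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True,
                               bidirectional=True)
        self.slot_head = nn.Linear(2 * hidden, num_tags)
        self.intent_head = nn.Linear(2 * hidden, num_intents)

    def forward(self, token_ids):
        states, _ = self.encoder(self.embed(token_ids))
        slot_logits = self.slot_head(states)                   # per token
        intent_logits = self.intent_head(states.mean(dim=1))   # pooled
        return intent_logits, slot_logits

Training can sum the two cross-entropy losses, or, as [Wang et al., NAACL-HLT 2018] report below, optimize them separately in alternation.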
  • 64. • Q9: What is the most effective learning structure of this multi-task learning? • Q10: Should we jointly optimize the loss function or not? Joint Model Comparison
  • 65. Other Learning Structure Variations (Joint Model Comparison)
    [Xu and Sarikaya, ASRU 2013]: CNN features in a CRF optimization framework
    [Zhang and Wang, IJCAI 2016]: similar to the attention-based RNN, but 1) no attention, 2) a CNN contextual layer on top of the input, 3) global label assignment, 4) GRU instead of LSTM
    [Hakkani-Tür, Interspeech 2016]: append the intent to the end of the slots; bidirectional LSTM
    [Wen et al., CCF 2017]: modeling slot filling at a lower layer and intent detection at a higher layer is slightly better than other variations
    [Goo et al., NAACL-HLT 2018]: attention gets a higher weight if the slot attention and the intent attention pay more attention to the same part of the input sequence (indicating the 2 tasks have higher correlation)
    [Wang et al., NAACL-HLT 2018]: optimize the loss functions separately, alternately updating the hidden layers of the 2 tasks
  • 66. … and Their Comparisons (Joint Model Comparison). Results: 1. Most of the models achieved similar results 2. The attention-based RNN (2nd) beats the majority 3. Optimizing the loss functions separately is slightly better, partially because the weights on a joint loss function need fine-tuning [Wang et al., NAACL-HLT 2018]
  • 67. Sequence-to-Sequence Q&A (Sequence to Sequence): Text → Sequence to Sequence → Logic Form → KB, DB → execute. Motivations ● Simplify the workflow: directly generate the final action ● Theoretically, can model very complex Q&A. Challenges ● Grammar mismatch between the question and the logic form ● Practically, suffers from the lack of training data for complex Q&A. Question ● Q11: How to collect domain-specific training data? (Grammar-complied Methods)
  • 68. Direct Parsing [Xiao et al., ACL 2016] (Sequence to Sequence) • Uniqueness (vs. regular seq2seq): not only predict the text sequence, but also comply with the grammar of the target • 3 types of sequence prediction: ○ Logic Form ○ Canonical Phrase ○ Derivation Sequence (DS) • Result: predicting the DS while taking grammatical constraints into account increased QA accuracy (57% → 73%)
  • 69. • Q11: How to collect domain-specific and grammar-complied training data? (Grammar-complied Methods)
  • 70. Lack of Training Data in a Domain [Kwiatkowski et al., EMNLP 2013] (Grammar-complied Methods) • Pros: 1) the KB does not need to cover all constants to support the direct x → y, 2) suitable for online learning • Alternative: shift the burden to learning: the features are domain-dependent. A good model can be used for paraphrase modeling to generate more training data offline
  • 71. A New Domain: No Training Data • Focus: quickly collect training data for semantic parser of a new domain (overnight) • Advantages: 1) domain → utterances: enforce completeness of functionality; 2) no supervision on answers • Method: Use a domain-general grammar to generate both logical forms and canonical utterances [Wang et al., ACL 2015] Grammar-complied Methods
  • 72. Direct-Search Q&A (Direct Search): Text → Direct Search → KB, DB. Motivations ● Can we retrieve the answer from a KB/DB like querying a search engine? Yes, but: ● limited to simple Q&A (retrieving a single fact). Challenge ● It is hard to find the answer if the answer is supported by multiple knowledge facts. Question ● Q12: Can we build a scalable framework for extracting the answer from multiple knowledge facts? (Memory Network)
  • 73. Direct-Search Q&A: A Single Supporting Fact [Yih et al., ACL 2014] (Direct Search over KB, DB; figure: CNN similarity between the entity mention in the question and the fact’s entity, and CNN similarity between the remaining question pattern and the fact’s relation; candidate facts are scored and ranked to produce the answer)
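The ranking step can be sketched as embedding similarity: assuming some encoder (e.g., a CNN as above) has already produced vectors for the question and for each knowledge fact, pick the object of the best-matching fact. All names here are illustrative, not from the tutorial.

import numpy as np

def best_fact_object(question_vec, facts):
    # facts: list of (subject, relation, object, fact_vec) tuples whose
    # vectors come from the same encoder as question_vec.
    def cosine(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
    return max(facts, key=lambda f: cosine(question_vec, f[3]))[2]

# e.g. facts = [("LinkedIn", "Founder", "Reid Hoffman", vec), ...]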
  • 74. • Q12: Can we build a scalable framework for extracting the answer from multiple knowledge facts? (Memory Network)
  • 75. Memory Network Architecture [Weston, ICML 2016] Memory Network 1) Input module: input KB and questions to memory 2) Generalization module: add new KB to memory 3) Output module: return supporting facts based on memory lookups 4) Response module: score and return objects of the support facts
  • 76. Memory Network in Simple Q&A [Bordes et al., 2015; Weston, ICML 2016] Memory Network
  • 77. Summary Pipelined or Joint Learning Logic Form KB, DB executerule/template-based action generation Sequence to Sequence Domain Detection Intent Detection Slot Filling ASR Text Audio Text Direct Search Single-turn Q&A ● Solve the single goal when goal is clear ● Technology is relatively mature ● Pave the foundation for deep NLU and complex infrastructure
  • 78. Advanced Version: Multi-Turn Question Answering Bee-Chung Chen
  • 79. Multi-Turn Question Answering Where can I have japanese food in the downtown? Natural Language Understanding (NLU) Dialogue State Tracking (DST) Action Generation (Dialogue Policy) Natural Language Generation (NLG) KB, DB, Index San Jose downtown or London downtown?
  • 80. Challenge: How to Evaluate a Multi-Turn System? Much more difficult than evaluating a single-turn system ● In a single-turn system, it is easier to label the correct answer to a question ○ Once we have labels, we can do offline evaluation ● In a multi-turn system, it is unclear what “the correct response” is ○ There can be many “successful paths” to achieve the goal ○ It is difficult to label sufficiently many successful paths for offline evaluation
  • 81. Evaluation Methods for Multi-Turn Systems Evaluation by hired human evaluators ● Also used for single-turn systems ● Expensive ● Difficult to cover all possible scenarios Experiments with end users ● Also used for single-turn systems ● Sometimes difficult to assess whether a user is satisfied (and need to predict user satisfaction) ● Only available after we launch the product Treating it as a ranking problem ● Also used for single-turn systems ● For each turn, use the model to rank a set of predetermined possible system utterances and compute precision ● Far from a realistic setting Evaluation based on a user simulator ● Limited by the capability of the simulator ● However, can be used to provide unlimited training and test data
  • 82. Practical Approach 1. Define the problem space and collect sample conversations 2. Build a user simulator ● Use sample conversations to fit the parameters of the simulator ● Collect more sample conversations based on simulator + crowdsourcing 3. Train a model using supervised learning on sample conversations 4. Improve the model using reinforcement learning (RL) based on the simulator 5. Test the model with friends and/or hired human evaluators and apply RL 6. Test the model with end users and apply RL
  • 83. Define the Problem Space User Goals ● Request & Constraints ● Example: Make a reservation of a Japanese restaurant in San Jose, … and let me know the address Intents / Dialogue Acts ● inform ● request ● confirm_question ● confirm_answer ● greeting ● closing ● multiple_choice ● thanks ● welcome ● deny ● not_sure [Microsoft Dialogue Challenge, 2018] Slot & Values ● Cuisine Japanese Chinese … ● Rating 5 4 … ● City, State San Jose, CA New York, NY ... request: address, reservation constraints: cuisine = “Japanese”, city = “San Jose”, state = “CA”, rating = “5”, date = “today”, time = “7pm”, number_of_people = “4” Example: https://github.com/xiul-msr/e2e_dialog_challenge
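The problem space above maps naturally onto simple data structures; a sketch with illustrative Python dataclasses, using the slide’s own restaurant example:

from dataclasses import dataclass, field

@dataclass
class UserGoal:
    requests: list      # e.g. ["address", "reservation"]
    constraints: dict   # e.g. {"cuisine": "Japanese", "city": "San Jose"}

@dataclass
class DialogueAct:
    intent: str                              # "inform", "request", ...
    slots: dict = field(default_factory=dict)

goal = UserGoal(requests=["address", "reservation"],
                constraints={"cuisine": "Japanese", "city": "San Jose",
                             "state": "CA", "rating": "5", "date": "today",
                             "time": "7pm", "number_of_people": "4"})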
  • 84. Collect and Annotate Sample Conversations [Microsoft Dialogue Challenge, 2018] Role Utterance Annotation (logical form, semantic frame) User Hello, i was wondering if you can book a restaurant for me? Pizza would be good. greeting(greeting=hello), request(reservation), inform(food=pizza) Agent Sure! How many people are in your party? request(number_of_people) User Please book a table for 4 inform(number_of_people=4) Agent Great! What city are you dining in? request(city) User Portland inform(city=Portland) Agent Ciao Pizza or Neapolitan Pizzeria? multiple_choice(restaurant_name=...) User What is the price range of Ciao Pizza inform(restaurant_name=Ciao Pizza), request(pricing)
  • 85. Practical Approach 1. Define the problem space and collect sample conversations 2. Build a user simulator ● Use sample conversations to fit the parameters of the simulator ● Collect more sample conversations based on simulator + crowdsourcing 3. Train a model using supervised learning on sample conversations 4. Improve the model using reinforcement learning (RL) based on the simulator 5. Test the model with friends and/or hired human evaluators and apply RL 6. Test the model with end users and apply RL
  • 86. Agenda-Based User Simulation [Schatzmann & Young, 2009] User Goal ● Request Rt : e.g., name, address, phone ● Constraint Ct : e.g., { type=bar, drinks=beer, area=central } ● Rt and Ct can change over time t Agenda ● A stack of user actions to be performed ● Generated by a set of probabilistic rules ● Pop to perform a user action ● Push to add future actions in response to the agent’s actions At =
  • 87. Build a Simulator: Agenda-Based User Simulation [Schatzmann & Young, 2009]
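A deterministic toy version of the agenda idea, reusing the DialogueAct sketch above: the agenda is a stack of pending user acts, popped to speak and pushed to in response to agent acts. The actual model in [Schatzmann & Young, 2009] generates pushes and pops with probabilistic rules.

class AgendaUserSimulator:
    # Pop one user act per turn; push new acts when the agent asks for
    # something, so the simulator answers before continuing its agenda.
    def __init__(self, goal):
        self.goal = goal
        self.agenda = [DialogueAct("request", {r: None})
                       for r in goal.requests]
        self.agenda += [DialogueAct("inform", {k: v})
                        for k, v in goal.constraints.items()]

    def step(self, system_act):
        if system_act.intent == "request":   # answer the agent's question
            for slot in system_act.slots:
                value = self.goal.constraints.get(slot)
                self.agenda.append(DialogueAct("inform", {slot: value}))
        return self.agenda.pop() if self.agenda else DialogueAct("closing")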
  • 88. Build a Simulator: Summary Space Technique [Schatzmann & Young, 2009]
  • 89. Collect More Annotated Conversations by Simulation + Crowdsourcing [Shah et al., 2018] Rule-Based Agent User Simulator Simulated Conversations Contextual Paraphrasing Crowdsourcing Task #1 Make conversation more natural with coreferences and linguistic variations Validation Crowdsourcing Task #2 Verify the created paraphrases have the same meaning by consensus of n workers Generate both utterances and annotations Annotated Conversation Paraphrasing and validation tasks are much easier than annotation tasks
  • 90. User Simulation based on Seq-to-Seq Models [Crook & Marin, 2017; Kreyssig et al., 2018]. After we collect more annotated data, we can improve the user simulator with more advanced models. Example: sequence-to-sequence models learned from annotated data (figure: feature vectors vt of the user state at turns 1-3 feed a feed-forward neural net that outputs the user utterance of turn 4).
  • 91. Practical Approach 1. Define the problem space and collect sample conversations 2. Build a user simulator ● Use sample conversations to fit the parameters of the simulator ● Collect more sample conversations based on simulator + crowdsourcing 3. Train a model using supervised learning on sample conversations 4. Improve the model using reinforcement learning (RL) based on the simulator 5. Test the model with friends and/or hired human evaluators and apply RL 6. Test the model with end users and apply RL
  • 92. Multi-Turn Question Answering Where can I have japanese food in the downtown? Natural Language Understanding (NLU) Dialogue State Tracking (DST) Action Generation (Dialogue Policy) Natural Language Generation (NLG) KB, DB, Index San Jose downtown or London downtown?
  • 93. Train a Dialogue State Tracker (DST) State-of-the-art: A neural network model combining NLU & DST Input: Previous state & conversation State: Example - belief state Pr(intent) Pr(inform) = 0.8 Pr(request) = 0.9 …. Pr(slot=value | intent) Pr(price=cheap | inform) = 0.9 Pr(city=London | inform) = 0.1 …. Pr(food | request) = 0.7 Pr(phone | request) = 0.1 …. Agent: What kind of food at what price? request(food, price) User: I want something cheap. What are the available food types? Output: New state
  • 94. Train a Dialogue State Tracker (DST). With annotated data, this is a supervised learning problem. Input: previous state & conversation. Agent: “What kind of food at what price?” request(food, price). User: “I want something cheap. What are the available food types?” Output with labels: Pr(intent): Pr(inform) = 0.8 (label 1), Pr(request) = 0.9 (label 1), …; Pr(slot=value | intent): Pr(price=cheap | inform) = 0.9 (label 1), Pr(city=London | inform) = 0.1 (label 0), Pr(food | request) = 0.7 (label 1), Pr(phone | request) = 0.1 (label 0), …
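Neural trackers learn the state update end to end, but the bookkeeping can be sketched with a toy rule: for each slot, keep the hypothesis with the highest probability seen so far. The structures below are illustrative, not the tutorial’s code.

def update_belief_state(prev_state, turn_scores):
    # prev_state, e.g. {"inform": {"price": ("cheap", 0.9)}}
    # turn_scores, e.g. {("inform", "city", "London"): 0.1, ...} are the
    # model's Pr(slot=value | intent) outputs for the current turn.
    state = {intent: dict(slots) for intent, slots in prev_state.items()}
    for (intent, slot, value), prob in turn_scores.items():
        _, old_prob = state.setdefault(intent, {}).get(slot, (None, 0.0))
        if prob > old_prob:                  # keep the best hypothesis
            state[intent][slot] = (value, prob)
    return state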
  • 95. Example DST: Neural Belief Tracker [Mrksic et al., 2018]: for each (slot, value) pair, predict, e.g., Pr(price=cheap | inform) with CNN representations of the utterance and the system act, e.g., request(food, price)
  • 96. Global-Locally Self-Attentive Encoder [Zhong et al., 2017] (figure: the utterance is encoded into an utterance representation using global modules shared across slots plus slot-specific local modules)
  • 97. Example DST: Global-Locally Self-Attentive Tracker [Zhong et al., 2017] For each candidate slot=value Predict Pr(price_range=cheap | inform) using global-locally self-attentive encoders
  • 98. Multi-Turn Question Answering Where can I have japanese food in the downtown? Natural Language Understanding (NLU) Dialogue State Tracking (DST) Action Generation (Dialogue Policy) Natural Language Generation (NLG) KB, DB, Index San Jose downtown or London downtown?
  • 99. Dialogue Policy Input: State Pr(intent) Pr(inform) = 0.8 Pr(request) = 0.9 …. Pr(slot=value | intent) Pr(price=cheap | inform) = 0.9 Pr(city=London | inform) = 0.1 …. Pr(food | request) = 0.7 Pr(phone | request) = 0.1 …. Output: Dialogue act inform(food = chinese) Methods: - Rules - Supervised Learning model(state) => intent(slot=value, ...) - Reinforcement Learning
  • 100. Supervised Learning. Policy: π(st) → at. Training data: pairs of (state, correct action), e.g., s0 → greeting(hello), s1 → request(food), s2 → inform(food=...), … Example state st (a feature vector, or an embedding vector): Pr(inform) = 0.8, Pr(request) = 0.9, …, Pr(price=... | inform) = 0.9, Pr(city=... | inform) = 0.1, …
  • 101. How to Deal with a Large Action Space. Action template (summary action): instead of π(st) → { inform(city=San Jose), inform(city=London), inform(food=Chinese), inform(food=Korean), ... }, use π(st) → { inform(city=GET_CITY), inform(food=GET_FOOD) }, where the template’s value (San Jose, London, Chinese, Korean, ...) is filled in by an API call or rules. Action mask: remove invalid candidate actions based on rules (domain knowledge, common sense). Example: don’t recommend a restaurant if the location is unknown; don’t make a reservation if the user has not yet selected a restaurant.
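A sketch of template expansion plus action masking, reusing the DialogueAct sketch above; get_city and get_food stand in for the GET_CITY/GET_FOOD API calls on the slide, and all names are hypothetical.

def candidate_actions(state, get_city, get_food):
    # Expand the action templates with API calls, then mask out actions
    # that rules say are invalid in the current state.
    candidates = [
        DialogueAct("inform", {"city": get_city(state)}),
        DialogueAct("inform", {"food": get_food(state)}),
        DialogueAct("recommend_restaurant"),
        DialogueAct("make_reservation"),
    ]
    def valid(act):
        if act.intent == "recommend_restaurant" and "city" not in state:
            return False                     # no location known yet
        if act.intent == "make_reservation" and "restaurant" not in state:
            return False                     # no restaurant chosen yet
        return True
    return [act for act in candidates if valid(act)]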
  • 102. Example: Hybrid Code Networks [Williams et al., 2017] State Tracker: RNN Input feature vector to RNN
  • 103. Multi-Turn Question Answering Where can I have japanese food in the downtown? Natural Language Understanding (NLU) Dialogue State Tracking (DST) Action Generation (Dialogue Policy) Natural Language Generation (NLG) KB, DB, Index San Jose downtown or London downtown?
  • 104. Modular Approach: NLG Natural Language Generation (NLG) Output: System Utterance Tomi Sushi is a nice Japanese restaurant (Text-to-Speech) Input: Action Action: Suggest_Restaurant Type = Japanese Name = Tomi Sushi Basic version: Templates [Name] is a nice [Type] restaurant How about a [Type] restaurant like [Name] I would recommend a [Type] restaurant like [Name] ...
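The template-based NLG on this slide amounts to a dictionary lookup plus string formatting; a minimal sketch using the slide’s own templates (with Python’s {} placeholders in place of the slide’s [Name]/[Type] notation):

import random

TEMPLATES = {
    "Suggest_Restaurant": [
        "{Name} is a nice {Type} restaurant",
        "How about a {Type} restaurant like {Name}",
        "I would recommend a {Type} restaurant like {Name}",
    ],
}

def generate(action, slots):
    # Pick one surface form for the dialogue act and fill in its slots.
    return random.choice(TEMPLATES[action]).format(**slots)

# generate("Suggest_Restaurant", {"Type": "Japanese", "Name": "Tomi Sushi"})
# -> e.g. "Tomi Sushi is a nice Japanese restaurant"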
  • 105. Modular Approach: NLG Natural Language Generation (NLG) Output: System Utterance Tomi Sushi is a nice Japanese restaurant (Text-to-Speech) Input: Action Action: Suggest_Restaurant Type = Japanese Name = Tomi Sushi Advanced version: RNN Decoder (e.g., LSTM) ● Add the action as additional input to each RNN cell ● Use multiple layers of RNN to improve performance ● Use a backward RNN to further improve performance
  • 106. NLG: Semantically Conditioned LSTM [Wen et al., 2015]. (Figure: an LSTM cell that, besides the word input xt and hidden state ht-1, reads a dialogue-act vector initialized as d0 and updated at each step.)
  • 107. Practical Approach 1. Define the problem space and collect sample conversations 2. Build a user simulator ● Use sample conversations to fit the parameters of the simulator ● Collect more sample conversations based on simulator + crowdsourcing 3. Train a model using supervised learning on sample conversations 4. Improve the model using reinforcement learning (RL) based on the simulator 5. Test the model with friends and/or hired human evaluators and apply RL 6. Test the model with end users and apply RL
  • 108. Problem Setting. At turn t=0 the agent has state s0, receives user input x0, and takes action a0; at turn t=1 it updates to state s1, receives input x1, and takes action a1; at turn t=2 it updates to state s2, receives input x2, and takes action a2; ... until the dialogue ends. DST: (st-1, at-1, xt) → st. Policy: π(st) → at. Example: at-1 = request(food, price); st-1: Pr(inform) = 0.8, Pr(request) = 0.9, ..., Pr(price=... | inform) = 0.9, Pr(city=... | inform) = 0.1, ...; xt = "I want something cheap. What are the available food types?"; at = inform(food, GET...).
  • 109. Reinforcement Learning. Same setting, but at each time t the agent taking action at receives a reward rt (e.g., r0 = -1, r1 = -1, r2 = 20). Supervised learning: annotate correct actions. Reinforcement learning: define the reward. Example reward: each step: -1; success: 20; failure: 0. Policy: π(st) → at.
  • 110. Reinforcement Learning Methods: Outline Basic methods - Q-learning: Deep Q-Network - Policy gradient: REINFORCE (Monte-Carlo gradient ascent) Advanced methods - Actor-Critic policy gradient method with experience replay - Deep Dyna-Q & BBQ-Networks - Multi-level reinforcement learning
  • 111. Action-Value Function Q(s, a). Policy: π(st) → at. Qπ(s, a) = E[ total reward | we start from state s, take action a, and then follow π ]. Q*(s, a) = Qπ(s, a) when π is the optimal policy. Optimal policy: π*(st) = argmaxa Q*(st, a). Bellman equation: Q*(st, at) = E[ rt + maxa Q*(st+1, a) | st, at ].
  • 112. Q-Learning: Deep Q-Network. Goal: learn Q*(st, at) = E[ rt + maxa Q*(st+1, a) | st, at ]; optimal policy π*(st) = argmaxa Q*(st, a). Neural net: Q(s, a | w) → value, with model parameters (weights) w. Q-learning: find w that minimizes E[ ( Q(s, a | w) - Q*(s, a) )² ]. SGD: use a single sample (st, at, rt, st+1) to estimate Q*(st, at), compute the gradient from that sample, and do gradient descent.
  • 113. Deep Q-Network: Details [Mnih et al., 2015]. Q*(st, at) = E[ rt + maxa Q*(st+1, a) | st, at ]. Deep neural net: Q(s, a | w), w = argminw E[ ( Q(s, a | w) - Q*(s, a) )² ]. While(current state st): take action at = argmaxa Q(st, a | wt) with probability (1 - ε), a random action otherwise (ε-greedy); receive (rt, st+1) and save (st, at, rt, st+1) in replay memory D; sample a mini-batch B from buffer D; update wt by gradient descent on the squared error over B.
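A minimal sketch of this update loop (PyTorch). The state dimension, fake transitions, and hyperparameters are illustrative, and the separate target network of [Mnih et al., 2015] is omitted for brevity.

```python
import random
from collections import deque
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS, EPSILON, GAMMA = 8, 4, 0.1, 0.99
q_net = nn.Sequential(nn.Linear(STATE_DIM, 32), nn.ReLU(),
                      nn.Linear(32, N_ACTIONS))  # Q(s, a | w)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)  # replay memory D

def act(state):
    # Epsilon-greedy: random action with probability EPSILON, else argmax Q.
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        return int(q_net(state).argmax())

def update(batch_size=32):
    # Sample a mini-batch B from D and regress Q(s, a | w) toward the
    # bootstrap target r + GAMMA * max_a' Q(s', a' | w).
    batch = random.sample(replay, min(batch_size, len(replay)))
    s, a, r, s_next, done = map(torch.stack, zip(*batch))
    q_sa = q_net(s).gather(1, a.long().view(-1, 1)).squeeze(1)
    with torch.no_grad():
        target = r + GAMMA * q_net(s_next).max(dim=1).values * (1 - done)
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Fake (state, action, reward, next state, done) transitions so it runs:
for _ in range(100):
    replay.append((torch.randn(STATE_DIM),
                   torch.tensor(random.randrange(N_ACTIONS)),
                   torch.tensor(-1.0), torch.randn(STATE_DIM),
                   torch.tensor(0.0)))
update()
```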
  • 114. Reinforcement Learning Methods: Outline Basic methods - Q-learning: Deep Q-Network - Policy gradient: REINFORCE (Monte-Carlo gradient ascent) Advanced methods - Actor-Critic policy gradient method with experience replay - Deep Dyna-Q & BBQ-Networks - Multi-level reinforcement learning
  • 115. Policy Gradient [Sutton et al., 2000]. Policy: π(s, a | θ) = Pr(take action a | state s, model parameters θ). Objective: the expected total reward of the policy; learn θ by gradient ascent on this objective.
  • 116. Gradient Derivation (1) (No reward discounting over time)
  • 117. Gradient Derivation (2) Recursion (No reward discounting over time)
  • 118. Gradient: Supervised vs. Reinforcement. Supervised (imitation) learning: the gradient sums ∇θ log π(st, at | θ) over annotated correct actions. Reinforcement learning: the gradient sums vt ∇θ log π(st, at | θ) over the agent's own actions, each weighted by vt, the value of action at.
  • 119. Monte-Carlo Policy Gradient Ascent (REINFORCE) [Williams et al., 1988]. SGD: do a sample run of the policy s0, a0, ..., sT, aT; use this run to compute the sample Q value; compute the gradient using this sample and do gradient ascent. While( run policy π(⋅ | θ) to generate s0, a0, ..., sT, aT ): compute vt = total reward starting from step t (based on this sample run); update θ by gradient ascent on Σt vt ∇θ log π(st, at | θ).
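A minimal REINFORCE sketch (PyTorch), assuming trajectories are supplied as lists; the network shape and the fake episode at the bottom are illustrative.

```python
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 8, 4
policy = nn.Sequential(nn.Linear(STATE_DIM, 32), nn.Tanh(),
                       nn.Linear(32, N_ACTIONS))  # logits of pi(a | s, theta)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def reinforce_update(states, actions, rewards):
    # v_t = total reward from step t onward (no discounting, as above).
    returns, v = [], 0.0
    for r in reversed(rewards):
        v += r
        returns.insert(0, v)
    returns = torch.tensor(returns)
    log_probs = torch.log_softmax(policy(torch.stack(states)), dim=1)
    chosen = log_probs[torch.arange(len(actions)), torch.tensor(actions)]
    # Gradient ascent on sum_t v_t * log pi(a_t | s_t, theta),
    # implemented as gradient descent on the negated objective.
    loss = -(returns * chosen).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Fake sample run s0, a0, ..., sT, aT so the sketch executes:
states = [torch.randn(STATE_DIM) for _ in range(3)]
actions = [0, 2, 1]
rewards = [-1.0, -1.0, 20.0]
reinforce_update(states, actions, rewards)
```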
  • 120. Reinforcement Learning Methods: Outline Basic methods - Q-learning: Deep Q-Network - Policy gradient: REINFORCE (Monte-Carlo gradient ascent) Advanced methods - Actor-Critic policy gradient method with experience replay - Deep Dyna-Q & BBQ-Networks - Multi-level reinforcement learning
  • 121. Policy Gradient with Experience Replay. Can we learn from past data? Collect data D = { (s0, a0, p0, v0, ..., sT, aT, pT, vT) }, where pt = Pr(take action at at step t), recorded during data collection. While( sample (s0, a0, p0, v0, ..., sT, aT, pT, vT) from D ): update θ using importance sampling, weighting each past example by Pr(past example | new policy) / Pr(past example | old policy), capped to prevent high variance.
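A minimal sketch of the capped importance weight; the probabilities and cap value are illustrative.

```python
def importance_weight(p_new: float, p_old: float, cap: float = 5.0) -> float:
    """Ratio Pr(example | new policy) / Pr(example | old policy),
    capped to prevent high variance in the replayed gradient."""
    return min(p_new / p_old, cap)

# Weight applied to the policy-gradient term of a replayed step:
w = importance_weight(p_new=0.6, p_old=0.1)  # ratio 6.0, capped to 5.0
```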
  • 122. Actor-Critic with Experience Replay [Kavosh & Williams, 2016; Weisz et al., 2018]. Problem: vt has high variance. Predict vt by a model Q(st, at | w); Q(st, at | w) also has high variance, so replace it by the advantage A(st, at | w) = Q(st, at | w1) - V(st | w2). While( run policy π(⋅ | θ) to generate s0, a0, ..., sT, aT ): save (s0, a0, p0, v0, ..., sT, aT, pT, vT) in replay memory D; train w1 and w2 using experience replay (with importance sampling weighting); update θ based on the advantage-weighted policy gradient.
  • 123. Reinforcement Learning Methods: Outline Basic methods - Q-learning: Deep Q-Network - Policy gradient: REINFORCE (Monte-Carlo gradient ascent) Advanced methods - Actor-Critic policy gradient method with experience replay - Deep Dyna-Q & BBQ-Networks - Multi-level reinforcement learning
  • 124. Deep Dyna-Q Train a “world model” to predict rewards and user actions M(s, a | wM ) → (reward, user action, terminate or not) Use the world model to generate simulated data Apply Q-learning to simulated data => Planning (The agent thinks about and “plans” for hypothetical scenarios)
  • 125. Deep Dyna-Q: Details [Peng et al., 2018]. Q(s, a | wQ) → value; M(s, a | wM) → (reward, user action, terminate or not). For each step: serve the user based on Q(s, a | wQ) using ε-greedy; save the experience in D; update wQ by Q-learning on a sample from D; update wM by learning from a sample from D; update wQ by Q-learning on simulated experience (a.k.a. planning) generated using M(s, a | wM) and Q(s, a | wQ).
  • 126. BBQ-Networks [Lipton et al., 2017]. Model the uncertainty of the Deep Q-Network Q(s, a | w) with Bayes-by-Backprop [Blundell et al., 2015]: assume prior w ~ N(μ0, diag(σ0²)); given D = { (si, ai, vi) }, where vi is the observed Q(si, ai), approximate p(w | D) by q(w | μ, ρ) = N(μ, diag(σ²)), where σ = log(1 + exp(ρ)). Thompson sampling: draw wt ~ q(w | μt, ρt) and take action argmaxa Q(st, a | wt).
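A minimal sketch of the Thompson-sampling step, simplified to a Gaussian posterior over per-action Q-values rather than over the network weights w as in BBQ-Networks; the means and standard deviations are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, 0.5, 0.2])     # posterior means, one per action
sigma = np.array([0.3, 0.8, 0.1])  # posterior std devs (uncertainty)

def thompson_action() -> int:
    """Draw one plausible Q-value per action from the posterior,
    then act greedily on the draw."""
    q_sample = rng.normal(mu, sigma)
    return int(np.argmax(q_sample))

# Actions with larger posterior uncertainty get sampled more often,
# which is what drives exploration:
counts = np.bincount([thompson_action() for _ in range(1000)], minlength=3)
print(counts)
```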
  • 127. BBQ-Networks: Details. Optimize the variational objective with SGVB [Kingma & Welling, 2014]: draw ε ~ N(0, 1) L times; take a minibatch of size M from D; compute the gradient and perform one step of SGD.
  • 128. Reinforcement Learning Methods: Outline Basic methods - Q-learning: Deep Q-Network - Policy gradient: REINFORCE (Monte-Carlo gradient ascent) Advanced methods - Actor-Critic policy gradient method with experience replay - Deep Dyna-Q & BBQ-Networks - Multi-level reinforcement learning
  • 129. Further Reduce the Action Space [Casanueva et al., 2018]. Action hierarchy: action group → individual action. Learn K+1 policies: a master policy π(state) → action group g, and K sub-policies, one per group g: πg(state) → action a. Implemented as Deep Q-Networks Q(s, g | w) and Q(s, a | wg), sharing some parameters across different groups.
  • 130. Sequential Subtasks [Peng et al., 2017]. Decompose the dialogue into subtasks completed in sequence, e.g., first recommend a restaurant, then make a reservation.
  • 131. Practical Approach 1. Define the problem space and collect sample conversations 2. Build a user simulator ● Use sample conversations to fit the parameters of the simulator ● Collect more sample conversations based on simulator + crowdsourcing 3. Train a model using supervised learning on sample conversations 4. Improve the model using reinforcement learning (RL) based on the simulator 5. Test the model with friends and/or hired human evaluators and apply RL 6. Test the model with end users and apply RL
  • 132. Summary. Multi-turn question answering is an active research area. How to obtain training data is a key challenge: simulation + crowdsourcing is a promising direction; continuously improve the simulator to generate better data. Reinforcement learning is promising: it is important to pretrain an RL model on reasonable sample data, otherwise it will rarely succeed and will learn to end dialogues early. Interesting directions: reduce variance in training, model uncertainty better, leverage hierarchical structure.
  • 133. Part II Question Answering Systems @ LinkedIn
  • 134. Overview of Our Approach Liang Zhang
  • 135. Our Use Cases of Goal-Oriented Question Answering
  • 137. Challenges - Data Collection (Cold-Start) • No data to start with • What data to collect? • Who can help generate the data?
  • 138. Challenges - ML Problem Definition. Supervised ML problems: feature X, label Y, model. QA problems ⇔ supervised ML: how to map questions and answers onto this formulation?
  • 140. Challenges - Adversarial Inputs from Users
  • 141. Challenges - Adversarial Inputs from Users
  • 143. Single-turn Question Answering Systems. NLU: main focus; no DST; dialogue policy: rule-based; NLG: template-based. NLU over the (ASR-transcribed) text: domain detection, intent detection, and slot filling, learned pipelined or jointly; alternatives: sequence-to-sequence into a logic form executed against the KB/DB, or direct search over the index. Rule/template-based action generation over the KB, DB, and index produces the answer.
  • 144. Our High-level Modeling Design “Who founded LinkedIn?” Entity Recognition: “LinkedIn” Intent Detection: “Founder” Slot Filling & Rule/Template-based Action Generation Output: “Reid Hoffman, Allen Blue, ...” Model DB Query: (_, Founder, LinkedIn)
  • 145. Data Collection. Define problem & scope → a set of seed questions & answers: "Who founded LinkedIn?", "What jobs do you have?", ...; expansion to similar questions that lead to the same answer (sometimes challenging!): "Who is the founder of LinkedIn?", "Founder of LinkedIn?", "LinkedIn founders", ...; out-of-scope questions collected from public databases with similar domains.
  • 146. Data Annotation ● Entity Recognition: ● Intent detection ● Slot Filling & Rule/Template-based Action generation “Who founded LinkedIn?” Entity Annotation: “LinkedIn” => Company “Who founded LinkedIn?” Intent Annotation: “Founder” “Who founded LinkedIn?” Result Annotation: DB Query: (_, Founder, LinkedIn)
  • 147. Entity Recognition and Tagging. Example: "How many software engineers know Java in United States?" → Title: software engineers; Skill: Java; Country: United States. Models: Semi-CRF (Sarawagi and Cohen 2005); deep neural network (Lample et al. 2016). (Figure from Lample et al. 2016.)
  • 148. Intent Detection ● Define a set of intents ○ Depend heavily on product design! ● Multi-class classification model ○ Problem: Question => Intent ○ Model: Logistic regression, CNN, RNN, LSTM, … ○ Features: Bag of words, word embeddings, tagged entities, ... ● Out-of-scope intent a must!
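A minimal sketch of such an intent classifier (scikit-learn, logistic regression over bag-of-words features), with a toy training set including an out-of-scope class; the example questions and labels are illustrative.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

questions = [
    "Who founded LinkedIn?", "Who is the founder of LinkedIn?",
    "What jobs do you have?", "Good jobs at Google?",
    "How is the weather?", "Will you marry me?",
]
intents = ["Founder", "Founder", "Jobs", "Jobs",
           "Out-of-scope", "Out-of-scope"]  # the OOS intent is a must!

# Unigram + bigram bag-of-words features into a multi-class classifier.
model = make_pipeline(CountVectorizer(ngram_range=(1, 2)),
                      LogisticRegression(max_iter=1000))
model.fit(questions, intents)
print(model.predict(["founder of linkedin"]))  # expected: ['Founder']
```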
  • 149. Action Generation for Answer Retrieval ● “Good jobs at Google?” ○ Call LinkedIn job recommendation engine with company == “google” ● “How many members are in active community?” ○ Convert to SQL query and query the database ● “Change my password” ○ Show the article section that contains “how to change the password” step by step ● ...
  • 151. Analytics Bot: Motivation • Data-driven decision making: we have to answer a lot of questions about metrics! • Answering metric questions requires domain experts • What if a Q&A bot could answer (simple) metric-related questions? http://tanerakcok.com/data-driven-product-management-taner-akcok/
  • 152. Analytics Bot (Ana Bot) • Internal Q&A bot about metrics • Answers metric query questions (how many X) • Answers definition questions (what is X)
  • 153. Ana: ML Problems. Intent detection: {Definition, Query, Out-of-scope (OOS)}; Definition: What is "contributor"?; Query: How many contributors 2 days ago?; OOS: How is the weather? Question2Definition: find the right definition for the definition question; in: What is contributor?; out: "Contributor is a user who initiates or continues conversation at LinkedIn". Question2Query: create a database query from a given question; in: How many contributor yesterday?; out: SELECT COUNT(DISTINCT mem_id) FROM contributors WHERE date = '08-18-2018';
  • 154. Ana: Model Flow. UI → intent detection → intent-specific NLU: Question2Definition looks up the knowledge base; Question2Query generates SQL executed on Presto; results flow back to the UI.
  • 155. Main Problem: Question2Query • Input: natural language question • Output: database query (SQL) • Well-studied problem, yet accuracy is not reliable (~60%) [Zhong et al. 2017b] • Our goal: develop an ML model that has reliable accuracy and works without large training data. Example: "How many contributors messaged last week?" → SELECT COUNT(DISTINCT mem_id) FROM contributors WHERE contribution_type == 'message' AND activity_time > 2018-08-13
  • 156. Question2Query: Challenges. Challenges: expensive to get large training data; need to onboard new metrics; SQL is hard to canonicalize; "almost success" does not count. Our approaches: leverage models trained on public data sets; leverage metadata available in the company; formulate slot filling for SQL; show our interpretation and allow users to make minor changes.
  • 157. Data Collection. Initial user study: survey "What question would you ask Ana?"; 60 seed questions from 20 domain experts; discovered the "Definition" intent from the seed questions; selected 20 target metrics (tables and columns); defined the slot-filling problem for SQL query generation. Training data generation: 60 seed questions -> 3k (question, answer) pairs; initial annotation by data scientists; annotators generate paraphrases; multiple reviews to get consistent annotation.
  • 158. Slot Filling for Query • Question -> Query: how do we generate the query? ○ Word-by-word generation does not work ○ Better to do slot filling [Zhong et al., 2017b] • Why is slot filling better? ○ Concise: reduces the chance of error ○ Interpretable: easy to explain to the user ○ Modular: can divide and conquer. Example: "How many contributors messaged last week?" → SELECT COUNT(DISTINCT mem_id) FROM contributors WHERE contribution_type == 'message' AND activity_time > 2018-08-13
  • 159. Slot Filling for SQL • Use 4 slots to represent queries ○ Metric: Aggregator, Column, Table ○ Time: Where clause about time ○ Filter: Additional WHERE clauses ○ Breakdown: Group by (Not shown left) • Concise: Reduced # possible outcomes by 20X • Interpretable: Show the outcome in English • Modular: Use SUTime for Time prediction SELECT COUNT(DISTINCT mem_id) FROM contributors WHERE contribution_type == ‘message’ AND activity_time > 2018-08-13 Metric: unique_contributors Filter: contribution_type == ‘message’ Time: Last 7 days “How many contributors messaged last week?”
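A minimal sketch of composing a SQL string from the four predicted slots; the metric metadata table is a hypothetical stand-in for the company metadata mentioned earlier.

```python
# Hypothetical metric metadata: each metric maps to an aggregator,
# a table, and a time column.
METRICS = {
    "unique_contributors": {
        "aggregator": "COUNT(DISTINCT mem_id)",
        "table": "contributors",
        "time_column": "activity_time",
    },
}

def slots_to_sql(metric, time, filters, breakdown=None):
    """Compose SQL from the (Metric, Time, Filter, Breakdown) slots."""
    m = METRICS[metric]
    where = [f"{m['time_column']} > {time}"] + filters
    sql = (f"SELECT {m['aggregator']} FROM {m['table']} "
           f"WHERE {' AND '.join(where)}")
    if breakdown:
        sql += f" GROUP BY {breakdown}"  # Breakdown slot -> GROUP BY
    return sql

# "How many contributors messaged last week?"
print(slots_to_sql("unique_contributors", "'2018-08-13'",
                   ["contribution_type = 'message'"]))
```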
  • 160. ML Model for Slot Filling • Predict each slot by feature-based regression models • Features for (input question q, metric m) ○ Similarities between q and the description for m ■ E.g, unique_contributors -> “#members who posted, shared ...” ■ Helps m when m has little training data ○ Similarities between q and training examples with m • Transfer learning: Use word embeddings, named entities tagger, deep neural networks trained on other large data sets to compute similarities
  • 161. User Modifying Query • Output: (Metric, Filter, Time, Breakdown) ○ One slot miss = incorrect query • Solution: let the user modify the query among the top 3 options per slot ○ Use slot descriptions (no SQL knowledge needed) • The user gets the answer if the right query is in the top 81 (top 3 options for each of the 4 slots: 3⁴ = 81 candidates)
  • 162. Question2Query Modeling Results. Improvement in accuracy over the baseline (unsupervised: finding the nearest neighbor in training data): logistic regression model with all features: +21%; prod model (logistic regression with pairwise model training): +26%; prod model with top-3 precision: +44%.
  • 163. Ana: How to Add a New Metric. Vary the number of training questions and measure accuracy per metric (plot: accuracy vs. # training questions); performance converges with > 10 questions per metric.
  • 164. Intent Detection • 3-class classification: {Definition, Query, Out-of-scope (OOS)} • Hardest problem: preparing training examples for OOS. • Model: Logistic regression with unigram and bigram features. Challenge comes from training examples, not ML features. • Ana allows user to override the intent
  • 165. Question2Definition • Question -> The most relevant definition • Method: Rank by character n-gram matches ○ ‘Post’ -> {‘po’, ‘os’, ‘st’, ‘pos’, ‘ost’} • Easy: Linguistic variation does not matter • Very useful to the user (higher NPS score)
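A minimal sketch of character n-gram matching for definition lookup; the "contributor" definition is from the slide, and the second entry is a hypothetical placeholder.

```python
def char_ngrams(text, n_values=(2, 3)):
    """Character bigrams and trigrams, e.g. 'post' -> {'po','os','st',...}."""
    text = text.lower()
    return {text[i:i + n] for n in n_values
            for i in range(len(text) - n + 1)}

DEFINITIONS = {
    "contributor": "a user who initiates or continues conversation at LinkedIn",
    "post": "a piece of content shared by a member",  # hypothetical entry
}

def best_definition(question):
    q = char_ngrams(question)
    # Rank terms by n-gram overlap; robust to linguistic variation
    # such as plurals or misspellings.
    term = max(DEFINITIONS, key=lambda t: len(q & char_ngrams(t)))
    return f"{term}: {DEFINITIONS[term]}"

print(best_definition("What is a contributer?"))  # misspelling still matches
```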
  • 166. Ana: Conclusion • Q&A bot that does Question2Query and Question2Definition • Formulated slot filling to improve performance, interpretability, and modularity • Leveraged metadata / transfer learning to overcome cold-start • Users liked that: ○ Ana interacts with them, confirming the intent and the query ○ Ana allows users to correct it ○ Ana can look up definitions as well as handle queries
  • 169. Company Assistant: Motivation • Share Rich LinkedIn insights • People Search & Research Questions • Career Advice Questions https://ru.kisspng.com/png-big-data-analytics-data-science-apache-hadoop-data-454060/
  • 170. Company Assistant: ML Problems. Intent classification: 9 intents + out-of-scope (OOS); e.g., About Company, People Search, Skills Insights, ...; OOS: Will you marry me? Entity tagging: identify named entities in the question; in: "Show ML jobs in Microsoft?"; out: COMPANY<Microsoft>, SKILL<Machine Learning>.
  • 171. Company Assistant: Model Flow. UI → intent detection + entity tagger → API selector → Jobs API, News API, Salary API, ... → response back to the UI.
  • 172. Challenges. Challenges: expensive to get large training data; overlapping intents; domain-specific language. Our approaches: leverage models trained on public data sets; early interactive model testing to sort out overlapping intents; bootstrap from in-domain users (not crowdsourcing) to collect examples with domain-specific language.
  • 173. Data Collection. Internal data collection: 30 initial questions; 40 predefined answers (images); ask people to create a question. Simulator bot: simulator UI; ML model to guess the intent; user feedback (correct/incorrect); online accuracy estimation. Training data: OOS intent from external sources; paraphrases (simulator); total questions: ~3K; OOS intent: ~10K. Caveat (slide callout): data collected this way is BIASed.
  • 174. Entity Tagging • Reuse existing gazetteers/dictionaries (jobs) • Semi-Markov CRF model • Train on hand-labeled data https://es.kisspng.com/png-gear-color-icon-color-gears-346362/
  • 175. Identifying Logical Expression • Example: “Show my connections that do not work at Apple” ○ Intent: Connections, Company: Apple • Dependency Parsing + Rule-based
  • 176. Intent Detection • 10-class classification: {About, Skills, Connections, Courses, Influencers, Jobs, News, Salary, Talents, Out-of-scope (OOS)} • Hardest problem: preparing training examples for OOS. • Final Model: CNN-RNN Deep model
  • 178. Intent Detection: Results. Relative comparison to human performance: Human 100%; baseline: logistic regression + CRF (LUIS) 90.50%; CNN-RNN model 98.80%. Relative comparison of the CNN-RNN model to the baseline, per intent (precision, recall): About (+65%, -30%); Analyze Skills (+37%, -18%); Connections (+10%, +4%); Courses (+27%, -31%); Influencers (+100%, +60%); Jobs (+7%, -1%); News (+15%, +1%); Salary (+19%, -13%); Talent Stats (-3%, +10%); OOS (-3%, +20%); overall accuracy: +9%.
  • 179. Company Bot + Ana Bot: Lessons Learned • Start from a user study: define bot specs and get seed training data • Leverage existing NLP tools as much as possible ○ Time parsing, dependency / POS parsing, word embeddings • Build a simple model and test it ASAP ○ Benefits: clean up training data, fix poorly defined intents • Show the interpretation and allow users to change it ○ This helps the user achieve the goal even when the bot is not 100% accurate
  • 180. Question Answering in LinkedIn Help Center * Special thanks to Weiwei Guo for providing the material Liang Zhang
  • 181. Question Answering in LinkedIn Help Center • Input: question • Output: the article / section of the article that contains the answer ○ A few thousand candidate articles ○ New candidates get added regularly • Traditional keyword search does not work well
  • 182. Our Data • A random sample of user click-through data in 01/2017 - 04/2018 • 200K <query, list of articles> samples, click / non-click as responses for each <query, article> • Only contains queries with > 4 words (where traditional keyword search generally fails!)
  • 183. Our Modeling Approach • Consider each candidate article as an intent • Problem: natural language query => intent • Need to handle regularly added new articles without retraining the model too frequently
  • 184. Our Modeling Approach -- CNN • Independent CNN layers for the question and for the article title, keywords, and content ○ Word2Vec embeddings for initialization • Final layers: Sim<question embedding, article embedding> => softmax over all candidate articles • For new articles in test data: use embedding similarity instead of retraining the model
  • 185. Deep + Wide Model with Better Performance • The pure CNN model did not work well for out-of-vocabulary queries • Traditional IR-based keyword-match features were added as a complementary wide component
  • 186. Offline Evaluation. Lift % in Precision@1 vs. control (IR-based): embedding similarity (GloVe): -35%; embedding similarity (LinkedIn data): -8%; deep model (CNN): +55%; deep + wide model: +57%. Embedding similarity = cosine similarity of (query embedding, article embedding), where the query (or article) embedding is the element-wise average of its word embeddings, using Stanford GloVe or LinkedIn-trained embeddings.
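A minimal sketch of the embedding-similarity baseline from the table, with toy 3-dimensional vectors standing in for GloVe or LinkedIn-trained embeddings.

```python
import numpy as np

# Toy word embeddings; real systems would load pretrained vectors.
EMB = {"change": [0.9, 0.1, 0.0], "password": [0.8, 0.2, 0.1],
       "reset":  [0.85, 0.15, 0.05], "profile": [0.1, 0.9, 0.2],
       "photo":  [0.0, 0.8, 0.3]}

def embed(text):
    # Element-wise average of the word embeddings in the text.
    vecs = [EMB[w] for w in text.lower().split() if w in EMB]
    return np.mean(vecs, axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = "change password"
for title in ["reset password", "profile photo"]:
    print(title, round(cosine(embed(query), embed(title)), 3))
# 'reset password' scores higher, since its words lie near the query's.
```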
  • 187. Online A/B Test Performance. Lift % vs. control (IR-based): Happy Path Rate +14%; Undesired Path Rate -18%; search sessions with clicks on results +7%. ● Happy path: users who clicked only one search result and then left the help center without creating a case ● Undesired path: users who searched and went to "contact us" directly without reading any articles from the search results
  • 190. Conclusion and Future Challenges ● Literature review of question answering systems for ○ Basic version: single-turn question answering ○ Advanced version: multi-turn question answering ● Our practical lessons learned through 3 LinkedIn use cases ● An area with a lot of potential and many challenges, e.g. ○ How to handle cold-start scalably? ○ How to make the model work more reliably? ○ How to support seamless, human-like interactions with humans? ○ How to make the model generic enough to work for different domains with little effort? ○ ...
  • 193. ● [Androutsopoulos and Malakasiotis, JAIR 2010] A Survey of Paraphrasing and Textual Entailment Methods ● [Berant and Liang, ACL 2014] Semantic Parsing via Paraphrasing ● [Blundell et al., 2015] Blundell, Charles, et al. "Weight Uncertainty in Neural Networks." Proceedings of the 32nd International Conference on Machine Learning (ICML). 2015. ● [Bordes et al., 2015] Large-scale Simple Question Answering with Memory Networks ● [Buck et al., ICLR 2018] Ask The Right Questions: Active Question Reformulation With Reinforcement Learning ● [Casanueva et al., 2018] Casanueva, Iñigo, et al. "Feudal Reinforcement Learning for Dialogue Management in Large Domains." arXiv preprint arXiv:1803.03232 (2018). ● [Chen et al., KDD Explorations 2017] A Survey on Dialogue Systems: Recent Advances and New Frontiers ● [Chen et al., WWW 2012 – CQA'12 Workshop] Understanding User Intent in Community Question Answering ● [Cho et al., EMNLP 2014] Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation ● [Crook & Marin, 2017] Crook, Paul, and Alex Marin. "Sequence to sequence modeling for user simulation in dialog systems." Proceedings of the 18th Annual Conference of the International Speech Communication Association (INTERSPEECH 2017). 2017. ● [Dauphin et al., ICLR 2014] Zero-Shot Learning for Semantic Utterance Classification ● [Deng and Yu, Interspeech 2011] Deep Convex Net: A Scalable Architecture for Speech Pattern Classification
  • 194. ● [Deng et al., SLT 2012] Use Of Kernel Deep Convex Networks And End-to-end Learning For Spoken Language Understanding ● [Duboue and Chu-Carroll, HLTC 2006] Answering the Question You Wish They Had Asked: The Impact of Paraphrasing for Question Answering ● [Goo et al., NAACL-HLT 2018] Slot-Gated Modeling for Joint Slot Filling and Intent Prediction ● [Hakkani-Tür Interspeech 2016] Multi-Domain Joint Semantic Frame Parsing using Bi-directional RNN-LSTM ● [Hashemi et al., QRUMS 2016] Query Intent Detection using Convolutional Neural Networks ● [Hu et al., NIPS 2014] Convolutional Neural Network Architectures for Matching Natural Language Sentences ● [Kalchbrenner et al., ACL 2014] A convolutional neural network for modelling sentences ● [Kavosh & Williams, 2016] Asadi, Kavosh, and Jason D. Williams. "Sample-efficient deep reinforcement learning for dialog control." arXiv preprint arXiv:1612.06000 (2016). ● [Kim, EMNLP 2014] Convolutional Neural Networks for Sentence Classification ● [Kingma & Welling, 2014] Kingma, Diederik P., and Max Welling. "Stochastic gradient VB and the variational auto-encoder." Second International Conference on Learning Representations, ICLR. 2014. ● [Kreyssig et al., 2018] Kreyssig, Florian, et al. "Neural User Simulation for Corpus-based Policy Optimisation for Spoken Dialogue Systems." arXiv preprint arXiv:1805.06966 (2018).
  • 195. ● [Kwiatkowski et al., EMNLP 2013] Scaling Semantic Parsers with On-the-fly Ontology Matching ● [Lample et al. 2016] Lample, Guillaume, et al. "Neural architectures for named entity recognition." arXiv preprint arXiv:1603.01360 (2016). ● [Lee and Dernoncourt, NAACL 2016] Sequential Short-Text Classification with Recurrent and Convolutional Neural Networks ● [Lipton et al., 2017] Lipton, Zachary, et al. "BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems." arXiv preprint arXiv:1711.05715 (2017). ● [Liu and Lane, Interspeech 2016] Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling ● [Mesnil et al., Interspeech 2013] Investigation of Recurrent-Neural-Network Architectures and Learning Methods for Spoken Language Understanding ● [Mesnil et al., TASLP 2015] Using Recurrent Neural Networks for Slot Filling in Spoken Language Understanding ● [Mnih et al., 2015] Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." Nature 518.7540 (2015): 529. ● [Mrksic et al., 2018] Mrkšić, Nikola, and Ivan Vulić. "Fully statistical neural belief tracking." arXiv preprint arXiv:1805.11350 (2018). ● [Palangi et al., TASLP 2016] Deep Sentence Embedding Using Long Short-Term Memory Networks: Analysis and Application to Information Retrieval
  • 196. ● [Peng et al., 2017] Peng, Baolin, et al. "Composite task-completion dialogue policy learning via hierarchical deep reinforcement learning." arXiv preprint arXiv:1704.03084 (2017). ● [Peng et al., 2018] Peng, Baolin, et al. "Deep Dyna-Q: Integrating Planning for Task-Completion Dialogue Policy Learning." Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Vol. 1. 2018. ● [Rajpurkar et al. 2016] Rajpurkar, Pranav, et al. "Squad: 100,000+ questions for machine comprehension of text." arXiv preprint arXiv:1606.05250 (2016). ● [Ravuri and Stolcke, Interspeech 2015] Recurrent Neural Network and LSTM Models for Lexical Utterance Classification ● [Reddy et al., TACL 2014] Large-scale Semantic Parsing without Question-Answer Pairs ● [Sarawagi and Cohen 2005] Sarawagi, Sunita, and William W. Cohen. "Semi-Markov conditional random fields for information extraction." Advances in neural information processing systems. 2005. ● [Sarikaya et al., ICASSP 2011] Deep Belief Nets For Natural Language Call–routing ● [Schatzmann & Young, 2009] Schatzmann, Jost, and Steve Young. "The hidden agenda user simulation model." IEEE transactions on audio, speech, and language processing 17.4 (2009): 733-747. ● [Serdyuk et al., 2018] Towards End-to-end Spoken Language Understanding ● [Shah et al., 2018] Shah, Pararth, et al. "Building a Conversational Agent Overnight with Dialogue Self-Play." arXiv preprint arXiv:1801.04871 (2018).
  • 197. ● [Shen et al., CIKM 2014] A Latent Semantic Model with Convolutional-Pooling Structure for Information Retrieval ● [Shen et al., WWW 2014] Learning Semantic Representations Using Convolutional Neural Networks for Web Search ● [Sutton et al., 2000] Sutton, Richard S., et al. "Policy gradient methods for reinforcement learning with function approximation." Advances in neural information processing systems. 2000. ● [Tur et al., ICASSP 2012] Towards deeper understanding: Deep convex networks for semantic utterance classification ● [Wang et al., ACL 2015] Building a Semantic Parser Overnight ● [Wang et al., NAACL-HLT 2018] A Bi-model based RNN Semantic Frame Parsing Model for Intent Detection and Slot Filling ● [Weisz et al., 2018] Weisz, Gellért, et al. "Sample efficient deep reinforcement learning for dialogue systems with large action spaces." arXiv preprint arXiv:1802.03753 (2018). ● [Wen et al., 2015] Wen, Tsung-Hsien, et al. "Semantically conditioned lstm-based natural language generation for spoken dialogue systems." arXiv preprint arXiv:1508.01745 (2015). ● [Wen et al., CCF 2017] Jointly Modeling Intent Identification and Slot Filling with Contextual and Hierarchical Information ● [Weston, ICML 2016] Memory Networks for Language Understanding, ICML Tutorial 2016
  • 198. ● [Williams et al., 1988] Williams, R. J. Toward a theory of reinforcement-learning connectionist systems. Technical Report NU-CCS-88-3, Northeastern University, College of Computer Science. ● [Williams et al., 2017] Williams, Jason D., Kavosh Asadi, and Geoffrey Zweig. "Hybrid code networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning." arXiv preprint arXiv:1702.03274 (2017). ● [Xiao et al., ACL 2016] Sequence-based Structured Prediction for Semantic Parsing ● [Xu and Sarikaya, ASRU 2013] Convolutional Neural Network Based Triangular CRF For Joint Intent Detection And Slot Filling ● [Yan et al., AAAI 2017] Building Task-Oriented Dialogue Systems for Online Shopping ● [Yao et al., Interspeech 2013] Recurrent Neural Networks for Language Understanding ● [Yih et al., ACL 2014] Semantic Parsing for Single-Relation Question Answering ● [Zhang and Wang, IJCAI 2016] A Joint Model of Intent Determination and Slot Filling for Spoken Language Understanding ● [Zhong et al., 2017] Zhong, Victor, Caiming Xiong, and Richard Socher. "Global-Locally Self-Attentive Encoder for Dialogue State Tracking." Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Vol. 1. 2018. ● [Zhong et al., 2017b] Zhong, Victor, Caiming Xiong, and Richard Socher. “Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning.” arXiv preprint arXiv:1709.00103 (2017).