Task-oriented spoken dialog systems are a prominent component of today's virtual personal assistants (e.g. Alexa, Siri), enabling people to perform everyday tasks by interacting with devices via voice interfaces. Recent advances in deep learning have enabled new research directions for end-to-end dialog modeling. Such data-driven end-to-end learning systems address many limitations of conventional dialog systems. This talk will review research on deep learning and reinforcement learning for neural dialog systems. We will further discuss hybrid dialog learning frameworks that combine offline training and online interactive learning with a human in the loop. The talk will conclude with the challenges and directions in further advancing data-driven conversational AI systems.

Bing Liu is a research scientist at Facebook working on conversational AI. His research interests focus on machine learning for spoken language processing, natural language understanding, and dialog systems. He develops conversational AI systems that learn from both offline annotated samples and online interactions. Bing received his Ph.D. from Carnegie Mellon University in 2018, where he worked on deep learning and reinforcement learning for task-oriented dialog systems. Before joining Facebook, he interned at Google Research working on end-to-end learning of neural dialog systems.
2. Interactive Learning of Task-Oriented
Dialog Systems
Bing Liu
Research Scientist, Facebook
PhD, Carnegie Mellon University
3. Task-Oriented Dialog System
❖ Dialog systems
➢ Chit-chat bot, QA bot, task-oriented dialog system, ...
❖ Get stuff done: assist users in completing specific tasks
➢ Personal assistants (e.g. Siri, Alexa, Google Assistant, Hey Portal)
➢ Voice commands in vehicles and smart homes
➢ Customer service; sales and marketing
5. Task-Oriented Dialog System
❖ Conventional pipeline systems are highly handcrafted
❖ The processing components are interdependent
❖ Data driven end-to-end (E2E) systems
➢ [Wen et al., 2016]: E2E supervised training of a neural dialog model
➢ [Bordes and Weston, 2017]: E2E model with memory networks
➢ [Madotto et al., 2018]: Mem2Seq for incorporating knowledge into E2E systems
❖ Interactive learning for E2E system with less human supervision
6. Why Learn through Interactions?
❖ Task-oriented dialog as a sequential decision-making process over multiple steps
❖ State space grows exponentially with the number of dialog turns
❖ Extremely hard to
➢ Design all possible dialog paths
➢ Collect a dialog corpus large enough to cover all dialog scenarios
→ Continuously learn through interaction with users and improve over time
7. How can we learn end-to-end task-oriented dialog systems effectively through interaction with users?
8. End-to-End Task-Oriented Dialog Modeling
❖ Dialog context modeling with hierarchical RNN
B. Liu et al., "Dialogue Learning with Human Teaching and Feedback in End-To-End Trainable Task-Oriented Dialogue Systems", NAACL 2018.
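The hierarchical idea can be sketched in miniature: a low-level RNN encodes the tokens of each turn, and a high-level RNN consumes the per-turn encodings so its state summarizes the whole conversation. The sketch below is an illustrative toy with a scalar tanh cell and hashed "embeddings", not the paper's architecture (which uses LSTM cells and learned embeddings):

```python
import math

def rnn_step(h, x, w_h=0.5, w_x=0.5):
    # Minimal scalar RNN cell: h' = tanh(w_h * h + w_x * x)
    return math.tanh(w_h * h + w_x * x)

def encode_utterance(token_ids):
    """Utterance-level encoder: run an RNN over the tokens of one turn."""
    h = 0.0
    for tok in token_ids:
        h = rnn_step(h, tok / 100.0)   # toy embedding: scaled token id
    return h

def encode_dialog(turns):
    """Dialog-level encoder: an RNN over per-turn utterance encodings,
    so the state summarizes the conversation so far."""
    s = 0.0
    states = []
    for turn in turns:
        s = rnn_step(s, encode_utterance(turn))
        states.append(s)   # dialog state after each turn
    return states

dialog = [[12, 7, 45], [3, 88], [51, 9, 9, 20]]   # token ids per turn
states = encode_dialog(dialog)
```

The key property is that each turn-level state depends on all preceding turns, which is what lets the downstream state tracker and policy condition on the full dialog context.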
10. Supervised Pre-training
❖ Supervised model pre-training on a dialog corpus with MLE
➢ Objective function: linear interpolation of cross-entropy losses for
■ Dialog state tracking, i.e. user goal estimation, and
■ Dialog policy, i.e. system action prediction
➢ Optimization: stochastic gradient descent (Adam)
Objective: a weighted sum of two cross-entropy terms, a loss for user goal estimation and a loss for system action prediction
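A minimal sketch of the interpolated objective, in pure Python with made-up probabilities (the lambda weights and the per-slot/action setup are illustrative, not the paper's hyperparameters):

```python
import math

def cross_entropy(pred_probs, true_idx):
    """Cross-entropy for one prediction: -log p(true class)."""
    return -math.log(pred_probs[true_idx])

def joint_mle_loss(slot_probs, true_slots, action_probs, true_action,
                   lambda_slot=0.5, lambda_act=0.5):
    """Linear interpolation of the two cross-entropy losses."""
    # Dialog state tracking: one cross-entropy term per tracked slot
    slot_loss = sum(cross_entropy(p, t) for p, t in zip(slot_probs, true_slots))
    # Dialog policy: cross-entropy on the predicted system action
    act_loss = cross_entropy(action_probs, true_action)
    return lambda_slot * slot_loss + lambda_act * act_loss

# Toy example: two slots with 3 candidate values each, 4 candidate actions
slot_probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
action_probs = [0.6, 0.2, 0.1, 0.1]
loss = joint_mle_loss(slot_probs, [0, 1], action_probs, 0)
```

Because both terms are differentiable cross-entropies, the interpolated loss can be minimized end-to-end with SGD/Adam as stated above.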
11. Learn Interactively from User Feedback
❖ Interactive dialog learning with user feedback
[Diagram: supervised pre-training on human-human dialog corpora; the deployed agent interacts with users, who provide feedback for policy optimization]
12. Learn Interactively from User Feedback
❖ Use user feedback as the dialog reward
❖ Introduce a step penalty to encourage shorter dialogs for task completion
❖ Optimize the dialog model end-to-end with policy gradient RL (REINFORCE):
∇θ J(θ) = E[ Σ_k ∇θ log πθ(a_k | s_k) · R_k ], where R_k is the return: final user feedback minus the accumulated step penalty
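A toy REINFORCE loop with a step penalty, in pure Python. The two-action "policy", the penalty of 0.1 per turn, and the turn counts are all illustrative stand-ins for the dialog model and environment:

```python
import math, random

random.seed(0)

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

theta = [0.0, 0.0]    # logits of a toy 2-action policy
STEP_PENALTY = 0.1    # illustrative per-turn penalty
LR = 0.5

def episode_return(num_turns, user_feedback):
    # Final user feedback minus a penalty per turn: shorter dialogs earn more
    return user_feedback - STEP_PENALTY * num_turns

for _ in range(200):
    probs = softmax(theta)
    a = 0 if random.random() < probs[0] else 1
    # Pretend action 1 completes the task, and in fewer turns
    R = episode_return(2, 1.0) if a == 1 else episode_return(5, 0.0)
    # REINFORCE: grad of log pi(a) w.r.t. logits is (indicator - prob)
    for i in range(2):
        grad = (1.0 if i == a else 0.0) - probs[i]
        theta[i] += LR * R * grad
```

After training, the policy concentrates on the action with the higher penalized return, which is the behavior the step penalty is designed to induce.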
13. Learn Interactively from User Feedback
❖ Policy optimization with RL can be slow due to sparse reward
12
❖ Dialog state distribution mismatch between offline training and
interactive learning leads to compounding errors
→ Ask user for correction/demonstration
when fails at a task and learn to act
❖ Agent may learn to recover from bad state with
RL but the search process can be very inefficient
14. Learn Interactively from User Teaching
❖ Interactive dialog learning with user teaching
[Diagram: imitation learning cycle. New dialogs are driven by the agent's own policy; the user corrects mistakes and demonstrates the desired agent behavior; the demonstrated dialogs are added to the existing human-human dialog corpora used for supervised pre-training]
15. Evaluation
❖ Movie booking domain simulation (M2M)
➢ Slots: theatre name, movie, date, time, number of people
➢ SL: supervised pre-training model
➢ IL: imitation learning with user teaching
➢ RL: reinforcement learning with user feedback
[Table: human evaluation results; mean and standard deviation of crowd worker scores (1-5)]
B. Liu et al., "Dialogue Learning with Human Teaching and Feedback in End-To-End Trainable Task-Oriented Dialogue Systems", NAACL 2018.
16. What if a user did not provide any feedback? Can we still learn anything from the interaction?
17. Can we learn a dialog reward function?
❖ User feedback serves as the reward for RL optimization
❖ A task-completion-based reward requires prior knowledge of the user's goal
→ NOT usually accessible in real-world user interactions
❖ In practice, user feedback can be inconsistent and is NOT always available
18. Adversarial Dialog Learning
❖ Reward a machine agent for conducting task-oriented dialog in a way that is indistinguishable from the way human agents do it
Bing Liu and Ian Lane, "Adversarial Learning of Task-Oriented Neural Dialog Models", SIGDIAL 2018.
19. Discriminative Reward Model
❖ Input: sequence of dialog turns (user's turn, agent's turn, external entity info)
❖ Representation: BiLSTM with max-pooling
❖ Output: probability of the dialog being successfully completed by a human agent
Bing Liu and Ian Lane, "Adversarial Learning of Task-Oriented Neural Dialog Models", SIGDIAL 2018.
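The shape of the discriminator can be sketched as follows. The BiLSTM is stubbed out with a trivial character-hash encoder (clearly not the real model); what the sketch preserves is the structure: encode each turn into a vector, max-pool element-wise across turns, and squash a linear score into a probability:

```python
import math

def encode_turn(turn_text, dim=4):
    """Stub turn encoder. A real model would run a BiLSTM over tokens;
    here we just hash characters into a fixed-size feature vector."""
    vec = [0.0] * dim
    for i, ch in enumerate(turn_text):
        vec[i % dim] += (ord(ch) % 7) / 10.0
    return vec

def max_pool(turn_vecs):
    """Element-wise max over the per-turn representations."""
    return [max(col) for col in zip(*turn_vecs)]

def success_probability(dialog_turns, weights, bias=0.0):
    """Discriminator output: probability that the dialog was successfully
    completed by a human agent (logistic over the pooled features)."""
    pooled = max_pool([encode_turn(t) for t in dialog_turns])
    score = sum(w * h for w, h in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-score))

dialog = ["book two tickets for tonight", "which theatre would you like?"]
p = success_probability(dialog, weights=[0.1, 0.1, 0.1, 0.1])
```

Because the output is a probability in (0, 1), it can be used directly as the scalar reward that drives the policy-gradient update of the dialog agent.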
20. Model Training
❖ Supervised pre-training with an initial set of pos & neg samples
➢ Pre-train dialog agent G on positive dialog samples with MLE
➢ Pre-train discriminative reward function D on pos & neg samples
❖ Interactive learning cycle
➢ Collect new dialog sample(s) between agent G and users
➢ Update dialog agent G with RL using the reward produced by D
➢ Update reward function D using the newly collected sample(s)
➢ Continue for next learning cycle
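The interactive cycle above can be written as a skeleton loop. Both classes below are stand-ins (a single "quality" scalar replaces the generator G, and the reward model D is a no-op learner); only the control flow mirrors the slide:

```python
import random

random.seed(1)

class DialogAgent:
    """Stand-in generator G: 'quality' is a scalar proxy for how
    human-like its dialogs are."""
    def __init__(self):
        self.quality = 0.2
    def generate_dialog(self):
        # One sampled dialog, summarized by a single quality score
        return self.quality + random.uniform(-0.05, 0.05)
    def update_with_rl(self, reward, lr=0.1):
        # Policy-gradient-style nudge toward higher-reward dialogs
        self.quality = min(1.0, self.quality + lr * reward)

class RewardModel:
    """Stand-in discriminator D: maps a dialog to a reward in [0, 1]."""
    def reward(self, dialog_score):
        return max(0.0, min(1.0, dialog_score))
    def update(self, new_samples):
        # A real D would be retrained on pos/neg samples; no-op here
        pass

G, D = DialogAgent(), RewardModel()
for _ in range(20):                       # interactive learning cycles
    dialog = G.generate_dialog()          # collect a new sample with users
    G.update_with_rl(D.reward(dialog))    # update G with the reward from D
    D.update([dialog])                    # update D on the new sample
```

The point of alternating the two updates is that D keeps pace with G: as the agent's dialogs improve, the discriminator's notion of "human-like" tightens with them.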
21. Evaluation
❖ Comparing different reward functions
Bing Liu and Ian Lane, "Adversarial Learning of Task-Oriented Neural Dialog Models", SIGDIAL 2018.
22. Summary
❖ The multi-turn nature of task-oriented dialogs makes it especially
important for a system to learn through interaction with users
❖ Learning task-oriented dialog models end-to-end with user teaching and feedback
❖ Adversarial dialog learning addresses the challenges of missing or inconsistent user feedback, requiring less human supervision