End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding
1. Yun-Nung (Vivian) Chen
http://vivianchen.idv.tw
Hakkani-Tur, Tur, Gao, Deng
2. Outline
Introduction
Spoken Dialogue System
Spoken/Natural Language Understanding (SLU/NLU)
Contextual Spoken Language Understanding
Model Architecture
End-to-End Training
Experiments
Conclusion & Future Work
End-to-End Memory Networks for Multi-Turn Spoken Language Understanding, Yun-Nung (Vivian) Chen
4. Spoken Dialogue System (SDS)
• Spoken dialogue systems are intelligent agents that are able to help
users finish tasks more efficiently via spoken interactions.
• Spoken dialogue systems are being incorporated into various devices
(smartphones, smart TVs, in-car navigation systems, etc.).
Good intelligent assistants help users to organize and access information
conveniently
JARVIS – Iron Man’s Personal Assistant Baymax – Personal Healthcare Companion
5. Dialogue System Pipeline
Speech Signal → ASR → Hypothesis: "are there any action movies to see this weekend"
(Text input "Are there any action movies to see this weekend?" can enter the pipeline directly.)
→ Language Understanding (LU): user intent detection, slot filling
→ Semantic Frame (intents, slots): request_movie, genre=action, date=this weekend
→ Dialogue Management (DM): dialogue state tracking, policy decision
→ System Action: request_location
→ Output Generation: text response "Where are you located?" / screen display "location?"
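The pipeline stages can be sketched as a chain of toy functions (a minimal illustration only; the rules, templates, and function names are hypothetical, not the system described in the talk):

```python
def language_understanding(hypothesis):
    """Toy LU: keyword rules stand in for intent detection and slot filling."""
    slots = {}
    if "action" in hypothesis:
        slots["genre"] = "action"
    if "this weekend" in hypothesis:
        slots["date"] = "this weekend"
    intent = "request_movie" if "movies" in hypothesis else "unknown"
    return intent, slots

def dialogue_management(intent, slots):
    """Toy DM policy: request any slot the state is still missing."""
    if intent == "request_movie" and "location" not in slots:
        return "request_location"
    return "inform_results"

def output_generation(action):
    """Toy NLG: map a system action to a text response."""
    templates = {"request_location": "Where are you located?"}
    return templates.get(action, "OK.")

hyp = "are there any action movies to see this weekend"
intent, slots = language_understanding(hyp)
action = dialogue_management(intent, slots)
print(output_generation(action))  # → Where are you located?
```

A real system replaces each toy function with a statistical model, which is exactly why errors made early in the chain propagate downstream.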
8. Dialogue System Pipeline
SLU usually focuses on understanding single-turn utterances
The understanding result is usually influenced by
1) local observations 2) global knowledge.
(The pipeline diagram from slide 5 is repeated here with two annotations: errors in the ASR hypothesis propagate into the downstream components, and LU is the current bottleneck.)
9. Spoken Language Understanding
Single-turn example: "just sent email to bob about fishing this weekend"
Slot filling (IOB): just/O sent/O email/O to/O bob/B-contact_name about/O fishing/B-subject this/I-subject weekend/I-subject
Domain (D): communication; Intent (I): send_email
Semantic frame (S): send_email(contact_name="bob", subject="fishing this weekend")

Multi-turn example:
U1: "send email to bob" (bob/B-contact_name)
S1: send_email(contact_name="bob")
U2: "are we going to fish this weekend" (are/B-message, remaining words I-message)
S2: send_email(message="are we going to fish this weekend")

Three SLU tasks: Domain Identification, Intent Prediction, Slot Filling
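The IOB slot tags can be collected into the slot/value pairs of a semantic frame with a small decoder (a generic sketch of the IOB convention, not code from the talk; the helper name is mine):

```python
def iob_to_slots(words, tags):
    """Collect IOB-tagged words into (slot, value) pairs.
    B-x starts slot x; I-x continues it; O closes any open slot."""
    slots, current_slot, current_words = [], None, []
    for word, tag in zip(words, tags):
        if tag.startswith("B-"):
            if current_slot:
                slots.append((current_slot, " ".join(current_words)))
            current_slot, current_words = tag[2:], [word]
        elif tag.startswith("I-") and current_slot == tag[2:]:
            current_words.append(word)
        else:  # "O" or an I- tag that does not match the open slot
            if current_slot:
                slots.append((current_slot, " ".join(current_words)))
            current_slot, current_words = None, []
    if current_slot:
        slots.append((current_slot, " ".join(current_words)))
    return slots

words = "just sent email to bob about fishing this weekend".split()
tags = ["O", "O", "O", "O", "B-contact_name", "O",
        "B-subject", "I-subject", "I-subject"]
print(iob_to_slots(words, tags))
# → [('contact_name', 'bob'), ('subject', 'fishing this weekend')]
```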
11. MODEL ARCHITECTURE
Idea: additionally incorporate contextual knowledge during slot tagging.
1. Sentence Encoding: an RNN (RNN_mem) encodes each history utterance x_i into a memory representation m_i; another RNN (RNN_in) encodes the current utterance c into a vector u.
2. Knowledge Attention: an attention distribution p_i over the memories is computed from the inner product of u with each m_i.
3. Knowledge Encoding: the weighted sum o = Σ_i p_i m_i is combined with u (through a weight matrix W_kg) into the knowledge encoding representation h.
The representation h conditions an RNN tagger, which reads the words w_t of the current utterance and outputs the slot tagging sequence y.
Chen, et al., "End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding," in Interspeech, 2016.
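The knowledge attention and knowledge encoding steps can be sketched in a few lines of numpy (a simplified sketch assuming softmax attention over inner products, with the encoders and the W_kg combination omitted; vectors here are random stand-ins for the encoded utterances):

```python
import numpy as np

def knowledge_attention(u, memories):
    """Attend over memory vectors m_i with the current-utterance
    encoding u, then form the weighted sum o."""
    scores = memories @ u                  # inner products u . m_i
    p = np.exp(scores - scores.max())
    p = p / p.sum()                        # softmax attention distribution p_i
    o = p @ memories                       # weighted sum over memories
    return p, o

rng = np.random.default_rng(0)
u = rng.standard_normal(4)                 # stand-in for RNN_in(c)
memories = rng.standard_normal((3, 4))     # stand-ins for RNN_mem(x_i)
p, o = knowledge_attention(u, memories)
assert np.isclose(p.sum(), 1.0) and o.shape == u.shape
```

History turns whose encodings align with the current utterance receive more attention mass, so their content is carried over into the tagging step.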
12. MODEL ARCHITECTURE
(Same architecture as slide 11.) Variant: the sentence encoders RNN_in and RNN_mem can be replaced with CNN encoders.
13. END-TO-END TRAINING
• Tagging Objective: maximize the likelihood of the slot tag sequence given the contextual utterances and the current utterance.
• RNN Tagger: at each step t, the tagger reads the word w_t together with the knowledge encoding o and predicts the tag y_t.
The model is trained end-to-end and automatically figures out the attention distribution without explicit
supervision
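Only the tag sequence is supervised; gradients from the tagging loss flow back through the knowledge encoding into the attention, which is how the attention distribution is learned without its own labels. A generic sketch of such a sequence-tagging objective (my simplification, not the paper's exact formulation):

```python
import numpy as np

def sequence_tagging_loss(tag_logits, gold_tags):
    """Mean negative log-likelihood of the gold slot-tag sequence.
    tag_logits: (T, num_tags) unnormalized scores per time step."""
    shifted = tag_logits - tag_logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(gold_tags)), gold_tags].mean()

# Two time steps, three candidate tags, gold tags [0, 1].
logits = np.array([[2.0, 0.1, -1.0],
                   [0.2, 1.5, 0.3]])
loss = sequence_tagging_loss(logits, np.array([0, 1]))
assert loss > 0
```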
15. EXPERIMENTS
• Dataset: Cortana communication session data
– GRU cells for all RNNs
– Adam optimizer
– embedding dim = 150
– hidden units = 100
– dropout = 0.5
Model      | Training Set | Knowledge Encoding | Sentence Encoder | First Turn | Other | Overall
RNN Tagger | single-turn  | x                  | x                | 60.6       | 16.2  | 25.5
The model trained on single-turn data performs worse for non-first
turns due to mismatched training data
16. EXPERIMENTS
• Dataset: Cortana communication session data (setup as in slide 15)

Model      | Training Set | Knowledge Encoding | Sentence Encoder | First Turn | Other | Overall
RNN Tagger | single-turn  | x                  | x                | 60.6       | 16.2  | 25.5
RNN Tagger | multi-turn   | x                  | x                | 55.9       | 45.7  | 47.4
Treating multi-turn data as single-turn for training performs reasonably well
17. EXPERIMENTS
• Dataset: Cortana communication session data (setup as in slide 15)

Model          | Training Set | Knowledge Encoding        | Sentence Encoder | First Turn | Other | Overall
RNN Tagger     | single-turn  | x                         | x                | 60.6       | 16.2  | 25.5
RNN Tagger     | multi-turn   | x                         | x                | 55.9       | 45.7  | 47.4
Encoder-Tagger | multi-turn   | current utt (c)           | RNN              | 57.6       | 56.0  | 56.3
Encoder-Tagger | multi-turn   | history + current (x, c)  | RNN              | 69.9       | 60.8  | 62.5
Encoding current and history utterances improves the performance
but increases the training time
18. EXPERIMENTS
• Dataset: Cortana communication session data (setup as in slide 15)

Model          | Training Set | Knowledge Encoding        | Sentence Encoder | First Turn | Other | Overall
RNN Tagger     | single-turn  | x                         | x                | 60.6       | 16.2  | 25.5
RNN Tagger     | multi-turn   | x                         | x                | 55.9       | 45.7  | 47.4
Encoder-Tagger | multi-turn   | current utt (c)           | RNN              | 57.6       | 56.0  | 56.3
Encoder-Tagger | multi-turn   | history + current (x, c)  | RNN              | 69.9       | 60.8  | 62.5
Proposed       | multi-turn   | history + current (x, c)  | RNN              | 73.2       | 65.7  | 67.1
Applying memory networks significantly outperforms all approaches
with much less training time
19. EXPERIMENTS
• Dataset: Cortana communication session data (setup as in slide 15)

Model          | Training Set | Knowledge Encoding        | Sentence Encoder | First Turn | Other | Overall
RNN Tagger     | single-turn  | x                         | x                | 60.6       | 16.2  | 25.5
RNN Tagger     | multi-turn   | x                         | x                | 55.9       | 45.7  | 47.4
Encoder-Tagger | multi-turn   | current utt (c)           | RNN              | 57.6       | 56.0  | 56.3
Encoder-Tagger | multi-turn   | history + current (x, c)  | RNN              | 69.9       | 60.8  | 62.5
Proposed       | multi-turn   | history + current (x, c)  | RNN              | 73.2       | 65.7  | 67.1
Proposed       | multi-turn   | history + current (x, c)  | CNN              | 73.8       | 66.5  | 68.0
CNN produces comparable results for sentence encoding with
shorter training time
NEW! NOT IN THE PAPER!
21. Conclusion
• The proposed end-to-end memory networks store contextual
knowledge and exploit it dynamically through an attention model,
carrying knowledge over across turns for multi-turn understanding
• The end-to-end model performs the tagging task instead of
classification
• The experiments show the feasibility and robustness of
modeling knowledge carryover through memory networks
22. Future Work
• Leveraging not only local observation but also global
knowledge for better language understanding
– Syntax or semantics can serve as global knowledge to guide
the understanding model
– “Knowledge as a Teacher: Knowledge-Guided Structural
Attention Networks,” arXiv preprint arXiv:1609.03286
23. Q & A
Thanks for your attention!
The code will be available at
https://github.com/yvchen/ContextualSLU