How the Context Matters: Language & Interaction in Dialogues
YUN-NUNG (VIVIAN) CHEN
 Introduction
 Word-Level Contexts in Sentences
 Learning from Prior Knowledge –
Knowledge-Guided Structural Attention Networks (K-SAN) [Chen et al., ‘16]
 Learning from Observations –
Modularizing Unsupervised Sense Embedding (MUSE) [Lee & Chen, ‘17]
 Sentence-Level Contexts in Dialogues
 Inference –
Leveraging Behavioral Patterns for Personalized Understanding [Chen et al., ‘15]
Contexts-Aware Spoken Language Understanding [Chen et al., ‘16]
 Investigation of Understanding Impact –
Reinforcement Learning Based Neural Dialogue System [Li et al., ‘17]
Misunderstanding Impact [Li et al., ‘17]
 Conclusion
 Dialogue systems are intelligent agents that are able to help users finish tasks more efficiently
via conversational interactions.
 Dialogue systems are being incorporated into various devices (smartphones, smart TVs, in-car navigation systems, etc.).
JARVIS – Iron Man’s Personal Assistant; Baymax – Personal Healthcare Companion
LU and DM significantly benefit from contexts in sentences and in dialogues
[Figure: task-oriented dialogue system pipeline]
Speech Signal / Text Input: "Are there any action movies to see this weekend?"
→ Speech Recognition, hypothesis: "are there any action movies to see this weekend"
→ Language Understanding (LU): domain identification, user intent detection, slot filling
  semantic frame: request_movie (genre=action, date=this weekend)
→ Dialogue Management (DM): dialogue state tracking (DST) + dialogue policy, backed by backend action / knowledge providers
  system action/policy: request_location
→ Natural Language Generation (NLG), text response: "Where are you located?"
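To make the pipeline concrete, here is a minimal Python sketch of the LU → DM flow above; the function names and keyword-matching rules are illustrative stand-ins, not the components of any cited system:

```python
# Minimal sketch of the LU -> DM flow; names and rules are illustrative.
def language_understanding(hypothesis):
    """Map an ASR hypothesis to a semantic frame (intent + slots)."""
    frame = {"intent": None, "slots": {}}
    if "movie" in hypothesis:
        frame["intent"] = "request_movie"
        if "action" in hypothesis:
            frame["slots"]["genre"] = "action"
        if "this weekend" in hypothesis:
            frame["slots"]["date"] = "this weekend"
    return frame

def dialogue_policy(frame, state):
    """Pick the next system action given the frame and dialogue state."""
    if frame["intent"] == "request_movie" and "location" not in state:
        return "request_location"  # NLG would render "Where are you located?"
    return "inform_results"

frame = language_understanding("are there any action movies to see this weekend")
print(frame)                       # {'intent': 'request_movie', 'slots': {...}}
print(dialogue_policy(frame, {}))  # request_location
```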
 Word-level context
 Prior knowledge such as linguistic syntax
 Collocated words
 Sentence-level context
Smartphone companies including apple, blackberry, and sony will be invited.
show me the flights from seattle to san francisco
(browsing movie reviews…) "Good movies…"
User: "Find me a good action movie this weekend"
System: "London Has Fallen is currently the number 1 action movie in America"
→ request_movie (genre=action, date=this weekend)
How misunderstanding influences the dialogue system performance
Contexts provide informative cues for better understanding
How behavioral and sentence contexts influence the user intent
Knowledge-Guided Structural Attention Network (K-SAN)
Y.-N. Chen, D. Hakkani-Tur, G. Tur, A. Celikyilmaz, J. Gao, and L. Deng, “Knowledge as a Teacher: Knowledge-Guided Structural Attention Networks,” arXiv: 1609.00777, 2016.
Y.-N. Chen, D. Hakkani-Tur, G. Tur, A. Celikyilmaz, J. Gao, and L. Deng, “Syntax or semantics? knowledge-guided joint semantic frame parsing,” in Proceedings of SLT, 2016.
 Syntax (Dependency Tree)  Semantics (AMR Graph)
[Figure: dependency tree (rooted at "show") and AMR graph (root "show" with nodes you, flight, I, city "Seattle", city "San Francisco") for the sentence s: "show me the flights from seattle to san francisco"; substructures 1-4 are marked on both.]
Y.-N. Chen, D. Hakkani-Tur, G. Tur, A. Celikyilmaz, J. Gao, and L. Deng, “Knowledge as a Teacher: Knowledge-Guided Structural Attention Networks,” arXiv: 1609.00777, 2016.
Prior knowledge about syntax or semantics may guide understanding
 Prior knowledge as a teacher
[Figure: K-SAN architecture. A knowledge encoding module (CNNkg) encodes each knowledge-guided substructure xi into mi; the input sentence "show me the flights from seattle to san francisco" is encoded (CNNin) into u; inner products of u with each mi give the knowledge attention distribution pi; the attention-weighted sum h of the encoded knowledge representations yields the knowledge-guided representation o (NNout), which feeds an RNN tagger (weights W, U, V) that outputs the slot tagging sequence y.]
Y.-N. Chen, D. Hakkani-Tur, G. Tur, A. Celikyilmaz, J. Gao, and L. Deng, “Knowledge as a Teacher: Knowledge-Guided Structural Attention Networks,” arXiv: 1609.00777, 2016.
 Syntax (Dependency Tree)  Semantics (AMR Graph)
Sentence s: show me the flights from seattle to san francisco

[Figure: dependency tree of s, rooted at "show"]
Knowledge-Guided Substructures xi (dependency tree):
1. show me
2. show flights the
3. show flights from seattle
4. show flights to francisco san

AMR graph:
(s / show
:ARG0 (y / you)
:ARG1 (f / flight
:source (c / city
:name (d / name :op1 Seattle))
:destination (c2 / city
:name (s2 / name :op1 San :op2 Francisco)))
:ARG2 (i / I)
:mode imperative)
Knowledge-Guided Substructures xi (AMR graph):
1. show you
2. show flight seattle
3. show flight san francisco
4. show i
Y.-N. Chen, D. Hakkani-Tur, G. Tur, A. Celikyilmaz, J. Gao, and L. Deng, “Knowledge as a Teacher: Knowledge-Guided Structural Attention Networks,” arXiv: 1609.00777, 2016.
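As a rough illustration of how these substructures can be read off a parse, the sketch below emits every root-to-leaf path of a dependency tree; the tree is hard-coded here, whereas the papers obtain it from the Stanford or SyntaxNet parsers:

```python
# Sketch: knowledge-guided substructures as root-to-leaf dependency paths.
# The tree is hard-coded for illustration (heads -> dependents).
children = {
    "show": ["me", "flights"],
    "flights": ["the", "from", "to"],
    "from": ["seattle"],
    "to": ["francisco"],
    "francisco": ["san"],
}

def substructures(node, path=()):
    """Yield each root-to-leaf path as one substructure string."""
    path = path + (node,)
    kids = children.get(node, [])
    if not kids:               # leaf reached: emit the accumulated path
        yield " ".join(path)
    for kid in kids:
        yield from substructures(kid, path)

print(list(substructures("show")))
# ['show me', 'show flights the',
#  'show flights from seattle', 'show flights to francisco san']
```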
[Figure: K-SAN architecture (as above), attending over the knowledge-guided substructures {xi}.]
The model will pay more attention to more important substructures that may be crucial for slot tagging.
Y.-N. Chen, D. Hakkani-Tur, G. Tur, A. Celikyilmaz, J. Gao, and L. Deng, “Knowledge as a Teacher: Knowledge-Guided Structural Attention Networks,” arXiv: 1609.00777, 2016.
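A minimal numpy sketch of the attention read in the figure; the random vectors stand in for the CNN encodings, the dimension is arbitrary, and the final combination (summing u and h) is one common choice rather than the paper's exact parameterization:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                      # embedding dimension (arbitrary)
u = rng.normal(size=d)                     # sentence encoding of the input
m = rng.normal(size=(4, d))                # encodings m_i of 4 substructures x_i

scores = m @ u                             # inner products <m_i, u>
p = np.exp(scores) / np.exp(scores).sum()  # knowledge attention distribution p_i
h = p @ m                                  # weighted sum over substructures
o = u + h                                  # knowledge-guided representation
print(p.round(3), o.shape)
```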
ATIS Dataset (F1, slot filling) | Small (1/40) | Medium (1/10) | Large
Tagger (GRU)                    | 73.83        | 85.55         | 93.11
Encoder-Tagger (GRU)            | 72.79        | 88.26         | 94.75
Y.-N. Chen, D. Hakkani-Tur, G. Tur, A. Celikyilmaz, J. Gao, and L. Deng, “Knowledge as a Teacher: Knowledge-Guided Structural Attention Networks,” arXiv: 1609.00777, 2016.
ATIS Dataset (F1, slot filling) | Small (1/40) | Medium (1/10) | Large
Tagger (GRU)                    | 73.83        | 85.55         | 93.11
Encoder-Tagger (GRU)            | 72.79        | 88.26         | 94.75
K-SAN (Stanford dep)            | 74.60+       | 87.99         | 94.86+
K-SAN (Syntaxnet dep)           | 74.35+       | 88.40+        | 95.00+
Syntax provides richer knowledge and more general guidance when less training data is available.
Y.-N. Chen, D. Hakkani-Tur, G. Tur, A. Celikyilmaz, J. Gao, and L. Deng, “Knowledge as a Teacher: Knowledge-Guided Structural Attention Networks,” arXiv: 1609.00777, 2016.
ATIS Dataset (F1, slot filling) | Small (1/40) | Medium (1/10) | Large
Tagger (GRU)                    | 73.83        | 85.55         | 93.11
Encoder-Tagger (GRU)            | 72.79        | 88.26         | 94.75
K-SAN (Stanford dep)            | 74.60+       | 87.99         | 94.86+
K-SAN (Syntaxnet dep)           | 74.35+       | 88.40+        | 95.00+
K-SAN (AMR)                     | 74.32+       | 88.14         | 94.85+
K-SAN (JAMR)                    | 74.27+       | 88.27+        | 94.89+
Syntax provides richer knowledge and more general guidance when less training data is available.
Semantics captures the most salient information, so it achieves similar performance with far fewer substructures.
Y.-N. Chen, D. Hakkani-Tur, G. Tur, A. Celikyilmaz, J. Gao, and L. Deng, “Knowledge as a Teacher: Knowledge-Guided Structural Attention Networks,” arXiv: 1609.00777, 2016.
 Joint Intent Prediction and Slot Filling
[Figure: K-SAN for joint semantic frame parsing: the same architecture as above, with the RNN tagger emitting slot tags at each timestep and the user intent at the final EOS timestep.]
Extend K-SAN to joint semantic frame parsing by outputting the user intent at the last timestep (Hakkani-Tur et al.).
Y.-N. Chen, D. Hakkani-Tur, G. Tur, A. Celikyilmaz, J. Gao, and L. Deng, “Syntax or semantics? knowledge-guided joint semantic frame parsing,” in Proceedings of SLT, 2016.
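A hedged PyTorch sketch of the joint setup: one shared encoder produces per-token slot logits plus an intent prediction at the final (EOS) step, and the joint loss sums the two cross-entropies. Layer sizes and the plain-GRU encoder are illustrative simplifications of the full K-SAN model:

```python
import torch
import torch.nn as nn

class JointTagger(nn.Module):
    """Sketch: shared GRU; slot tags per token, intent at the EOS step."""
    def __init__(self, vocab, n_slots, n_intents, dim=100):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.slot_out = nn.Linear(dim, n_slots)
        self.intent_out = nn.Linear(dim, n_intents)

    def forward(self, tokens):
        h, _ = self.rnn(self.emb(tokens))                   # (B, T, dim)
        return self.slot_out(h), self.intent_out(h[:, -1])  # intent at EOS

model = JointTagger(vocab=5000, n_slots=20, n_intents=10)
tokens = torch.randint(0, 5000, (1, 9))                     # toy batch
slot_logits, intent_logits = model(tokens)
slot_y, intent_y = torch.randint(0, 20, (1, 9)), torch.randint(0, 10, (1,))
loss = (nn.functional.cross_entropy(slot_logits.view(-1, 20), slot_y.view(-1))
        + nn.functional.cross_entropy(intent_logits, intent_y))
```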
ATIS Dataset (train: 4478 / dev: 500 / test: 893), Slot F1 (Indep / Joint):
Model             | Small (1/40)  | Medium (1/10) | Large
Tagger            | 73.8 / 73.0   | 85.6 / 86.4   | 93.1 / 93.4
Encoder-Tagger    | 72.8 / 71.9   | 88.3 / 87.5   | 94.8 / 93.1
K-SAN (Syntax)    | 74.4+ / 74.6+ | 88.4+ / 88.2+ | 95.0+ / 95.4+
K-SAN (Semantics) | 74.3+ / 73.4+ | 88.3 / 88.1+  | 94.9+ / 95.1+

Communication (train: 10479 / dev: 1000 / test: 2300), Slot F1 (Indep / Joint):
Model             | Small (1/40)  | Medium (1/10) | Large
Tagger            | 45.5 / 50.3   | 69.0 / 69.8   | 80.4 / 79.8
Encoder-Tagger    | 45.5 / 47.7   | 69.4 / 73.1   | 85.7 / 86.0
K-SAN (Syntax)    | 45.0 / 55.1+  | 69.5+ / 75.3+ | 85.0 / 84.5
K-SAN (Semantics) | 45.1 / 55.0   | 69.1 / 74.3+  | 85.3 / 85.2
Y.-N. Chen, D. Hakkani-Tur, G. Tur, A. Celikyilmaz, J. Gao, and L. Deng, “Syntax or semantics? knowledge-guided joint semantic frame parsing,” in Proceedings of SLT, 2016.
When data is scarce, K-SAN with joint parsing significantly improves performance (slot & frame).
ATIS Dataset (train: 4478 / dev: 500 / test: 893), Slot F1 Indep / Slot F1 Joint / Frame Acc:
Model             | Small (1/40)          | Medium (1/10)         | Large
Tagger            | 73.8 / 73.0 / 33.5    | 85.6 / 86.4 / 58.5    | 93.1 / 93.4 / 79.7
Encoder-Tagger    | 72.8 / 71.9 / 35.2    | 88.3 / 87.5 / 61.9    | 94.8 / 93.1 / 82.5
K-SAN (Syntax)    | 74.4+ / 74.6+ / 37.6+ | 88.4+ / 88.2+ / 63.5+ | 95.0+ / 95.4+ / 84.3+
K-SAN (Semantics) | 74.3+ / 73.4+ / 37.1+ | 88.3 / 88.1+ / 63.6+  | 94.9+ / 95.1+ / 83.8+

Communication (train: 10479 / dev: 1000 / test: 2300), Slot F1 Indep / Slot F1 Joint / Frame Acc:
Model             | Small (1/40)          | Medium (1/10)         | Large
Tagger            | 45.5 / 50.3 / 48.9    | 69.0 / 69.8 / 68.2    | 80.4 / 79.8 / 79.5
Encoder-Tagger    | 45.5 / 47.7 / 52.7    | 69.4 / 73.1 / 71.4    | 85.7 / 86.0 / 83.9
K-SAN (Syntax)    | 45.0 / 55.1+ / 57.2+  | 69.5+ / 75.3+ / 73.5+ | 85.0 / 84.5 / 84.5
K-SAN (Semantics) | 45.1 / 55.0 / 54.1+   | 69.1 / 74.3+ / 73.8+  | 85.3 / 85.2 / 83.4
Y.-N. Chen, D. Hakkani-Tur, G. Tur, A. Celikyilmaz, J. Gao, and L. Deng, “Syntax or semantics? knowledge-guided joint semantic frame parsing,” in Proceedings of SLT, 2016.
 Darker blocks and lines correspond to higher attention weights
Even with less training data, K-SAN allows the model to pay similar attention to the salient substructures that are important for tagging.
Y.-N. Chen, D. Hakkani-Tur, G. Tur, A. Celikyilmaz, J. Gao, and L. Deng, “Syntax or semantics? knowledge-guided joint semantic frame parsing,” in Proceedings of SLT, 2016.
Modularizing Unsupervised Sense Embeddings (MUSE)
G.-H. Lee and Y.-N. Chen, “MUSE: Modularizing Unsupervised Sense Embeddings,” in Proceedings of EMNLP, 2017.
 Word embeddings are trained on a corpus in an unsupervised manner
 The same embedding is used for all senses of a word in NLP tasks, e.g., NLU, POS tagging
Finally I chose Google instead of Apple.
Can you buy me a bag of apples, oranges, and bananas?
G.-H. Lee and Y.-N. Chen, “MUSE: Modularizing Unsupervised Sense Embeddings,” in Proceedings of EMNLP, 2017.
Words with different senses should correspond to different embeddings
Smartphone companies including apple, blackberry, and sony will be invited.
 Input: unannotated text corpus
 Two key mechanisms
 Sense selection given a text context
 Sense representation to embed statistical characteristics of sense identity
G.-H. Lee and Y.-N. Chen, “MUSE: Modularizing Unsupervised Sense Embeddings,” in Proceedings of EMNLP, 2017.
[Figure: the word "apple" is mapped by sense selection to the sense identities apple-1 and apple-2, each with its own sense embedding.]
 Efficient sense selection [Neelakantan et al.,
2014; Li and Jurafsky, 2015]
 Use word embeddings as input to update the
sense posterior given words
 Introduce ambiguity
 Purely sense-level embedding [Qiu et al.,
2016]
 Inefficient sense selection → exponential time complexity
G.-H. Lee and Y.-N. Chen, “MUSE: Modularizing Unsupervised Sense Embeddings,” in Proceedings of EMNLP, 2017.
Prior approaches suffer from either ambiguity or inefficiency.
 Sense selection
 Policy-based
 Value-based
Corpus: "Smartphone companies including apple, blackberry, and sony will be invited."
[Figure: MUSE architecture. For the target word wi in context Ct, the sense selection module (matrices P and Qi) computes sense likelihoods q(zi1|Ct), q(zi2|Ct), q(zi3|Ct) and samples a sense zi1; a sense is selected the same way for a collocated word wj in context Ct′. The sense representation module (matrices U and V) learns skip-gram sense embeddings with negative sampling, estimating collocation likelihoods such as P(zj2|zi1).]
Collocated likelihood serves as a reward signal to optimize the sense selection module.
 Learning algorithm
 Sense selection strategy (see the sketch after this list)
 Stochastic policy: selects the sense based on the probability distribution
 Greedy: selects the sense with the largest Q-value (no exploration)
 ε-Greedy: selects a random sense with probability ε, and otherwise adopts the greedy strategy
 Boltzmann: samples the sense based on the Boltzmann distribution modeled by Q-value
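The value-based strategies differ only in how a sense is drawn from the Q-values; a minimal sketch with toy Q-values (the stochastic policy variant instead samples from the policy network's own distribution):

```python
import numpy as np

rng = np.random.default_rng(0)
q = np.array([2.0, 1.0, 0.5])          # toy Q-values for one word's senses

def greedy(q):
    return int(np.argmax(q))           # always exploit, no exploration

def eps_greedy(q, eps=0.1):            # explore with probability eps
    return int(rng.integers(len(q))) if rng.random() < eps else greedy(q)

def boltzmann(q, tau=1.0):             # sample from softmax(q / tau)
    p = np.exp(q / tau)
    p /= p.sum()
    return int(rng.choice(len(q), p=p))

print(greedy(q), eps_greedy(q), boltzmann(q))
```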
G.-H. Lee and Y.-N. Chen, “MUSE: Modularizing Unsupervised Sense Embeddings,” in Proceedings of EMNLP, 2017.
Approach                 | MaxSimC | AvgSimC
Huang et al., 2012       | 26.1    | 65.7
Neelakantan et al., 2014 | 60.1    | 69.3
Tian et al., 2014        | 63.6    | 65.4
Li & Jurafsky, 2015      | 66.6    | 66.8
Bartunov et al., 2016    | 53.8    | 61.2
Qiu et al., 2016         | 64.9    | 66.1
 Dataset: SCWS for multi-sense embedding evaluation
[Figure: SCWS example. For the pair "He borrowed the money from banks." / "I live near to a river.", the predicted similarity is correlated with human ratings. Given the contextual sense posterior of "bank" (e.g., bank-1: 0.6, bank-2: 0.4), MaxSimC compares only the most probable sense of each word, while AvgSimC averages the similarities of all sense pairs weighted by the posteriors (0.6 x … + 0.4 x …).]
G.-H. Lee and Y.-N. Chen, “MUSE: Modularizing Unsupervised Sense Embeddings,” in Proceedings of EMNLP, 2017.
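A sketch of the two metrics for a single word pair, assuming each word comes with per-sense vectors and a sense posterior given its context (toy numbers; the benchmark then reports correlation with human similarity ratings over the whole dataset):

```python
import numpy as np

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# toy sense vectors and contextual sense posteriors for two words
senses_w1 = np.array([[1.0, 0.0], [0.0, 1.0]]); p_w1 = np.array([0.6, 0.4])
senses_w2 = np.array([[0.2, 1.0], [1.0, 0.1]]); p_w2 = np.array([0.3, 0.7])

# MaxSimC: similarity of the single most probable sense of each word
max_sim_c = cos(senses_w1[p_w1.argmax()], senses_w2[p_w2.argmax()])

# AvgSimC: posterior-weighted average similarity over all sense pairs
avg_sim_c = sum(p_w1[i] * p_w2[j] * cos(senses_w1[i], senses_w2[j])
                for i in range(2) for j in range(2))
print(round(max_sim_c, 3), round(avg_sim_c, 3))
```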
 Dataset: SCWS for multi-sense embedding evaluation
Approach                 | MaxSimC | AvgSimC
Huang et al., 2012       | 26.1    | 65.7
Neelakantan et al., 2014 | 60.1    | 69.3
Tian et al., 2014        | 63.6    | 65.4
Li & Jurafsky, 2015      | 66.6    | 66.8
Bartunov et al., 2016    | 53.8    | 61.2
Qiu et al., 2016         | 64.9    | 66.1
MUSE-Policy              | 66.1    | 67.4
MUSE-Greedy              | 66.3    | 68.3
MUSE-ε-Greedy            | 67.4+   | 68.6
MUSE-Boltzmann           | 67.9+   | 68.7
MUSE with exploration achieves the best sense embeddings in MaxSimC.
G.-H. Lee and Y.-N. Chen, “MUSE: Modularizing Unsupervised Sense Embeddings,” in Proceedings of EMNLP, 2017.
Category                     | Approach                        | ESL-50 | RD-300 | TOEFL-80
Conventional Word Embedding  | Global Context                  | 47.73  | 45.07  | 60.87
Conventional Word Embedding  | SkipGram                        | 52.08  | 55.66  | 66.67
Word Sense Disambiguation    | IMS+SkipGram                    | 41.67  | 53.77  | 66.67
Word Sense Disambiguation    | EM                              | 27.08  | 33.96  | 40.00
Unsupervised Sense Embedding | MSSG (Neelakantan et al., 2014) | 57.14  | 58.93  | 78.26
Unsupervised Sense Embedding | CRP (Li & Jurafsky, 2015)       | 50.00  | 55.36  | 82.61
Unsupervised Sense Embedding | MUSE-Policy                     | 52.38  | 51.79  | 79.71
Unsupervised Sense Embedding | MUSE-Greedy                     | 57.14  | 58.93  | 79.71
Unsupervised Sense Embedding | MUSE-ε-Greedy                   | 61.90+ | 62.50+ | 84.06+
Unsupervised Sense Embedding | MUSE-Boltzmann                  | 64.29+ | 66.07+ | 88.41+
Supervised Sense Embedding   | Retro-GlobalContext             | 63.64  | 66.20  | 71.01
Supervised Sense Embedding   | Retro-SkipGram                  | 56.25  | 65.09  | 73.33
MUSE with exploration achieves the state-of-the-art results for synonym selection.
 KNN senses sorted by collocation likelihood
Context | KNN Senses
… braves finish the season in tie with the los angeles dodgers … | scoreless otl shootout 6-6 hingis 3-3 7-7 0-0
… his later years proudly wore tie with the chinese characters for … | pants trousers shirt juventus blazer socks anfield
… of the mulberry or the blackberry and minos sent him to … | cranberries maple vaccinium apricot apple
… of the large number of blackberry users in the us federal … | smartphones sap microsoft ipv6 smartphone
… ladies wore extravagant head ornaments combs pearl necklaces face … | venter thorax neck spear millimeters fusiform
… appoint john pope republican as head of the new army of … | multi-party appoints unicameral beria appointed
MUSE learns sense embeddings in an unsupervised way and achieves the first
purely sense-level representation learning system with linear-time sense selection
G.-H. Lee and Y.-N. Chen, “MUSE: Modularizing Unsupervised Sense Embeddings,” in Proceedings of EMNLP, 2017.
Leveraging Behavior Patterns of Mobile Apps for
Personalized Spoken Language Understanding
Y.-N. Chen, M. Sun, A. I Rudnicky, and A. Gershman, "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of
ICMI, pages 83-86, 2015. ACM.
 Task: user intent prediction
 Challenge: language ambiguity
 User preference
 Some people prefer “Message” to “Email”
 Some people prefer “Outlook” to “Gmail”
 App-level contexts
 “Message” is more likely to follow “Camera”
 “Email” is more likely to follow “Excel”
Example: "send to vivian", Email? vs. Message? (Communication domain)
Considering behavioral patterns in history to model SLU for intent prediction.
Y.-N. Chen, M. Sun, A. I Rudnicky, and A. Gershman, "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language
Understanding," in Proc. of ICMI, pages 83-86, 2015. ACM.
 Subjects’ app invocation is logged on a daily basis
 Subjects annotate their app activities with
 Task Structure: link applications that serve a common goal
 Task Description: briefly describe the goal or intention of the task
 Subjects use a wizard system to perform the annotated task by speech
Meta: TASK59; 20150203; 1; Tuesday; 10:48
Desc: play music via bluetooth speaker
App:  com.android.settings → com.lge.music

Dialogue:
W1: Ready.
U1: Connect my phone to bluetooth speaker.  [SETTINGS]
W2: Connected to bluetooth speaker.
U2: And play music.  [MUSIC]
W3: What music would you like to play?
U3: Shuffle playlist.  [MUSIC]
W4: I will play the music for you.
Y.-N. Chen, M. Sun, A. I Rudnicky, and A. Gershman, "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language
Understanding," in Proc. of ICMI, pages 83-86, 2015. ACM.
[Figure: utterance-feature matrix. Rows are utterances; train: "take this photo" → CAMERA, "tell vivian this is me in the lab" → IM, "check my grades on website" → CHROME, "send an email to professor" → EMAIL; test: "take a photo of this" → CAMERA, "send it to alice/alex" → IM. Columns are lexical features (photo, tell, check, send, email), behavioral features (NULL, CAMERA, CHROME), and intended apps (CAMERA, IM, CHROME, EMAIL). Observed facts are 1s; decimals (.85, .95, .80, .70, .55) are probabilities of hidden semantics to be inferred.]
Issue: unobserved hidden semantics may benefit understanding
Solution: use matrix factorization to complete a partially-missing matrix based on a low-rank latent semantics assumption.
Y.-N. Chen, M. Sun, A. I Rudnicky, and A. Gershman, "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language
Understanding," in Proc. of ICMI, pages 83-86, 2015. ACM.
 The decomposed matrices represent low-rank latent semantics for utterances and
words/histories/apps respectively
 The product of two matrices fills the probability of hidden semantics
[Figure: the utterance-feature matrix above factorizes into two low-rank matrices:]
M(|U| × (|W|+|H|+|A|)) ≈ M(|U| × d) · M(d × (|W|+|H|+|A|))
Y.-N. Chen, M. Sun, A. I Rudnicky, and A. Gershman, "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language
Understanding," in Proc. of ICMI, pages 83-86, 2015. ACM.
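A minimal sketch of the completion idea: fit a rank-d approximation to the observed matrix and read graded scores for the unobserved cells off the reconstruction. A truncated SVD stands in here for the learned factors; the paper instead learns them with the ranking objective on the next slide:

```python
import numpy as np

# toy utterance x (word/history/app) matrix; 0s are unobserved, not negative
M = np.array([[1.0, 1.0, 0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0, 1.0, 0.0]])

d = 2                                 # latent dimension
U, s, Vt = np.linalg.svd(M, full_matrices=False)
M_hat = U[:, :d] * s[:d] @ Vt[:d]     # rank-d reconstruction
print(M_hat.round(2))                 # unobserved cells receive graded scores
```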
 Model implicit feedback by completing the matrix
 unobserved facts are not treated as negative samples (true or false)
 observed facts are given higher scores than unobserved facts
 the model is trained with SGD updates over fact pairs
 Objective:
maximize Σu Σf+ Σf− ln σ(θ(u, f+) − θ(u, f−)), i.e., each observed fact f+ is ranked above the unobserved facts f− for the same utterance u (BPR-style ranking)
The objective is to learn a set of well-ranked apps per utterance.
Y.-N. Chen, M. Sun, A. I Rudnicky, and A. Gershman, "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language
Understanding," in Proc. of ICMI, pages 83-86, 2015. ACM.
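One SGD step on a fact pair (f+, f−) under that ranking objective, with θ taken as a dot product of latent vectors (BPR-style; the learning rate, dimension, and initialization are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
d, lr = 8, 0.05
u_vec = rng.normal(size=d)     # latent vector of utterance u
x_pos = rng.normal(size=d)     # latent vector of an observed fact f+
x_neg = rng.normal(size=d)     # latent vector of an unobserved fact f-

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# ascend ln sigma(theta(u, f+) - theta(u, f-)) with theta = dot product
g = 1.0 - sigmoid(u_vec @ x_pos - u_vec @ x_neg)
u_grad = g * (x_pos - x_neg)
x_pos += lr * g * u_vec        # push the observed fact up ...
x_neg -= lr * g * u_vec        # ... and the unobserved fact down
u_vec += lr * u_grad
```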
Reasoning with Matrix Factorization for Implicit Intents
[Figure: the completed matrix (as above) ranks intended apps for test utterances, e.g., "take a photo of this" → CAMERA; "send it to alex" → IM, now including hidden semantics.]
Y.-N. Chen, M. Sun, A. I Rudnicky, and A. Gershman, "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language
Understanding," in Proc. of ICMI, pages 83-86, 2015. ACM.
 Dataset: 533 dialogues (1,607 utterances); 455 multi-turn dialogues
 Google recognized transcripts (word error rate = 25%)
 Evaluation metric: accuracy of user intent prediction (ACC)
mean average precision of ranked intents (MAP)
 Baseline: Maximum Likelihood Estimation (MLE)
Multinomial Logistic Regression (MLR)
Approach (ACC / MAP) | Lexical | Behavioral  | All
(a) MLE  User-Indep  | -       | 13.5 / 19.6 | -
(b) MLE  User-Dep    | -       | 20.2 / 27.9 | -
The user-dependent model is better than the user-independent model.
Y.-N. Chen, M. Sun, A. I Rudnicky, and A. Gershman, "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language
Understanding," in Proc. of ICMI, pages 83-86, 2015. ACM.
Approach (ACC / MAP) | Lexical     | Behavioral  | All
(a) MLE  User-Indep  | -           | 13.5 / 19.6 | -
(b) MLE  User-Dep    | -           | 20.2 / 27.9 | -
(c) MLR  User-Indep  | 42.8 / 46.4 | 14.9 / 18.7 | -
(d) MLR  User-Dep    | 48.2 / 52.1 | 19.3 / 25.2 | -
Lexical features are useful to predict intended apps for both user-independent and user-dependent models.
Y.-N. Chen, M. Sun, A. I Rudnicky, and A. Gershman, "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language
Understanding," in Proc. of ICMI, pages 83-86, 2015. ACM.
Approach (ACC / MAP) | Lexical     | Behavioral  | All
(a) MLE  User-Indep  | -           | 13.5 / 19.6 | -
(b) MLE  User-Dep    | -           | 20.2 / 27.9 | -
(c) MLR  User-Indep  | 42.8 / 46.4 | 14.9 / 18.7 | 46.2+ / 50.1+
(d) MLR  User-Dep    | 48.2 / 52.1 | 19.3 / 25.2 | 50.1+ / 53.9+
Combining lexical and behavioral features shows improvement by modeling explicit information from observations.
Y.-N. Chen, M. Sun, A. I Rudnicky, and A. Gershman, "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language
Understanding," in Proc. of ICMI, pages 83-86, 2015. ACM.
Approach (ACC / MAP)      | Lexical     | Behavioral  | All
(a) MLE  User-Indep       | -           | 13.5 / 19.6 | -
(b) MLE  User-Dep         | -           | 20.2 / 27.9 | -
(c) MLR  User-Indep       | 42.8 / 46.4 | 14.9 / 18.7 | 46.2+ / 50.1+
(d) MLR  User-Dep         | 48.2 / 52.1 | 19.3 / 25.2 | 50.1+ / 53.9+
(e) (c) + Personalized MF | 47.6 / 51.1 | 16.4 / 20.3 | 50.3+* / 54.2+*
(f) (d) + Personalized MF | 48.3 / 52.7 | 20.6 / 26.7 | 51.9+* / 55.7+*
Personalized MF significantly improves MLR results by considering hidden semantics.
 App functionality modeling
 Learning app embeddings
App embeddings encoding functionality help user-independent understanding
Y.-N. Chen, M. Sun, A. I Rudnicky, and A. Gershman, "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language
Understanding," in Proc. of ICMI, pages 83-86, 2015. ACM.
End-to-End Memory Networks with Knowledge Carryover
for Multi-Turn Spoken Language Understanding
Y.-N. Chen, D. Hakkani-Tur, G. Tur, J. Gao, and L. Deng, “End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding," in Proc. of
Interspeech, 2016. ISCA.
D: communication  I: send_email
U:  just sent email to bob about fishing this weekend
S:  O O O O B-contact_name O B-subject I-subject I-subject
→ send_email(contact_name="bob", subject="fishing this weekend")

U1: are we going to fish this weekend
S1: B-message I-message I-message I-message I-message I-message I-message
→ send_email(message="are we going to fish this weekend")

U2: send email to bob
S2: O O O B-contact_name
→ send_email(contact_name="bob")
Domain Identification → Intent Prediction → Slot Filling
Y.-N. Chen, D. Hakkani-Tur, G. Tur, J. Gao, and L. Deng, “End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding," in Proc. of
Interspeech, 2016. ISCA.
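The mapping from a BIO tag sequence to the slot values of a semantic frame is pure bookkeeping, independent of the tagging model; a small sketch:

```python
def bio_to_slots(tokens, tags):
    """Collapse a BIO tag sequence into {slot_name: value}."""
    slots, name, buf = {}, None, []
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):                 # new slot starts
            if name:
                slots[name] = " ".join(buf)
            name, buf = tag[2:], [tok]
        elif tag.startswith("I-") and name == tag[2:]:
            buf.append(tok)                      # current slot continues
        else:                                    # outside any slot
            if name:
                slots[name] = " ".join(buf)
            name, buf = None, []
    if name:
        slots[name] = " ".join(buf)
    return slots

tokens = "just sent email to bob about fishing this weekend".split()
tags = ["O", "O", "O", "O", "B-contact_name", "O",
        "B-subject", "I-subject", "I-subject"]
print(bio_to_slots(tokens, tags))
# {'contact_name': 'bob', 'subject': 'fishing this weekend'}
```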
[Figure: end-to-end memory network for multi-turn SLU. History utterances {xi} are encoded by a contextual sentence encoder (RNNmem) into memory representations mi; the current utterance c is encoded (RNNin) into u; inner products of u with each mi give the knowledge attention distribution pi; the attention-weighted sum h is transformed (Wkg) into the knowledge encoding representation o, which feeds the RNN tagger producing the slot tagging sequence y.]
Idea: additionally incorporating contextual knowledge during slot tagging
1. Sentence Encoding 2. Knowledge Attention 3. Knowledge Encoding
Y.-N. Chen, D. Hakkani-Tur, G. Tur, J. Gao, and L. Deng, “End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding," in Proc. of
Interspeech, 2016. ISCA.
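The contextual read is essentially the same attention as in K-SAN, now over history utterances; a compact numpy sketch in which random vectors stand in for the RNN encoders and Wkg is a random matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16
encode = lambda sent: rng.normal(size=d)    # stand-in for RNN_mem / RNN_in

history = ["are we going to fish this weekend", "send email to bob"]
current = "send it to him"

m = np.stack([encode(x) for x in history])  # memory representations m_i
u = encode(current)                         # current-utterance encoding
p = np.exp(m @ u); p /= p.sum()             # attention over the history
h = p @ m                                   # weighted history summary
W_kg = rng.normal(size=(d, d))
o = W_kg @ (h + u)                          # knowledge encoding representation
print(p.round(3), o.shape)
```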
[Figure: the same memory network, with a CNN as an alternative sentence encoder.]
The model can automatically focus on salient contexts for current understanding
Y.-N. Chen, D. Hakkani-Tur, G. Tur, J. Gao, and L. Deng, “End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding," in Proc. of
Interspeech, 2016. ISCA.
 Dataset: Cortana communication session data
 GRU for all RNN
 adam optimizer
 embedding dim=150
 hidden unit=100
 dropout=0.5
Model      | Training Set | Knowledge Encoding | Sentence Encoder | First Turn | Other | Overall
RNN Tagger | single-turn  | x                  | x                | 60.6       | 16.2  | 25.5
The model trained on single-turn data performs worse for non-first turns due to mismatched training data
Model      | Training Set | Knowledge Encoding | Sentence Encoder | First Turn | Other | Overall
RNN Tagger | single-turn  | x                  | x                | 60.6       | 16.2  | 25.5
RNN Tagger | multi-turn   | x                  | x                | 55.9       | 45.7  | 47.4
Y.-N. Chen, D. Hakkani-Tur, G. Tur, J. Gao, and L. Deng, “End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding," in Proc. of
Interspeech, 2016. ISCA.
Treating multi-turn data as single-turn for training performs reasonably well.
Model          | Training Set | Knowledge Encoding        | Sentence Encoder | First Turn | Other | Overall
RNN Tagger     | single-turn  | x                         | x                | 60.6       | 16.2  | 25.5
RNN Tagger     | multi-turn   | x                         | x                | 55.9       | 45.7  | 47.4
Encoder-Tagger | multi-turn   | current utt (c)           | RNN              | 57.6       | 56.0  | 56.3
Encoder-Tagger | multi-turn   | history + current (x, c)  | RNN              | 69.9       | 60.8  | 62.5
Directly encoding contexts improves the performance but increases training time
Y.-N. Chen, D. Hakkani-Tur, G. Tur, J. Gao, and L. Deng, “End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding," in Proc. of
Interspeech, 2016. ISCA.
Model          | Training Set | Knowledge Encoding        | Sentence Encoder | First Turn | Other | Overall
RNN Tagger     | single-turn  | x                         | x                | 60.6       | 16.2  | 25.5
RNN Tagger     | multi-turn   | x                         | x                | 55.9       | 45.7  | 47.4
Encoder-Tagger | multi-turn   | current utt (c)           | RNN              | 57.6       | 56.0  | 56.3
Encoder-Tagger | multi-turn   | history + current (x, c)  | RNN              | 69.9       | 60.8  | 62.5
Proposed       | multi-turn   | history + current (x, c)  | RNN              | 73.2       | 65.7  | 67.1
The memory network significantly outperforms all approaches with much less training time.
Y.-N. Chen, D. Hakkani-Tur, G. Tur, J. Gao, and L. Deng, “End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding," in Proc. of
Interspeech, 2016. ISCA.
Model          | Training Set | Knowledge Encoding        | Sentence Encoder | First Turn | Other | Overall
RNN Tagger     | single-turn  | x                         | x                | 60.6       | 16.2  | 25.5
RNN Tagger     | multi-turn   | x                         | x                | 55.9       | 45.7  | 47.4
Encoder-Tagger | multi-turn   | current utt (c)           | RNN              | 57.6       | 56.0  | 56.3
Encoder-Tagger | multi-turn   | history + current (x, c)  | RNN              | 69.9       | 60.8  | 62.5
Proposed       | multi-turn   | history + current (x, c)  | RNN              | 73.2       | 65.7  | 67.1
Proposed       | multi-turn   | history + current (x, c)  | CNN              | 73.8       | 66.5  | 68.0
CNN produces comparable results for sentence encoding with shorter training time
Investigation of Language Understanding Impact for
Reinforcement Learning Based Dialogue Systems
X. Li, Y.-N. Chen, L. Li, and J. Gao, “End-to-End Task-Completion Neural Dialogue Systems,” arXiv: 1703.01008, 2017.
X. Li, Y.-N. Chen, L. Li, J. Gao, and A. Celikyilmaz, “Investigation of Language Understanding Impact for Reinforcement Learning Based Dialogue Systems,” arXiv: 1703.07055, 2017.
 Dialogue management is framed as a reinforcement learning task
 Agent learns to select actions to maximize the expected reward
[Figure: RL loop: the agent acts on the environment and receives observations and rewards.]
If booking the right ticket, reward = +30
If failing, reward = -30
Otherwise, reward = -1
X. Li, Y.-N. Chen, L. Li, and J. Gao, “End-to-End Task-Completion Neural Dialogue Systems,” arXiv: 1703.01008, 2017.
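The reward schedule above in code; the +30/-30/-1 values follow the slide, while the success/failure bookkeeping is an assumed simplification:

```python
def reward(dialogue_over, success):
    """Per-turn reward for the movie-ticket booking task (slide values)."""
    if dialogue_over:
        return 30 if success else -30  # right ticket booked vs. failure
    return -1                          # small penalty per extra turn
```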
[Figure: the same RL loop with natural language: the user simulator (user agenda modeling + natural language generation) acts as the environment, and the neural dialogue system (language understanding + dialogue management) acts as the agent. Text input: "Are there any action movies to see this weekend?" → dialogue policy: request_location]
X. Li, Y.-N. Chen, L. Li, and J. Gao, “End-to-End Task-Completion Neural Dialogue Systems,” arXiv: 1703.01008, 2017.
 NLU and NLG are trained in a supervised manner
 DM is trained in a reinforcement learning framework (NLU and NLG can be fine-tuned)
[Figure: end-to-end neural dialogue system. The user simulator (user goal, user agenda modeling, NLG) produces text input such as "Are there any action movies to see this weekend?"; language understanding tags it (<intent>, B-type, O, …, EOS) into the semantic frame request_movie (genre=action, date=this weekend); dialogue management tracks state across timesteps t-2, t-1, t, receives the user dialogue action (e.g., inform(location=San Francisco)), and outputs the dialogue policy (e.g., request_location).]
X. Li, Y.-N. Chen, L. Li, and J. Gao, “End-to-End Task-Completion Neural Dialogue Systems,” arXiv: 1703.01008, 2017.
 DM receives frame-level information
 No error model: perfect recognizer and LU
 Error model: simulate the possible errors
[Figure: frame-level simulation. The user model sends user dialogue acts (semantic frames) to the DM (dialogue state tracking + dialogue policy optimization); an error model simulating recognition and LU errors can corrupt the frames; the DM returns system dialogue acts.]
X. Li, Y.-N. Chen, L. Li, and J. Gao, “End-to-End Task-Completion Neural Dialogue Systems,” arXiv: 1703.01008, 2017.
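A sketch of such an error model corrupting frame-level semantics before they reach the DM; the error taxonomy mirrors the later slides (random intent errors, slot deletion, value substitution, slot substitution), while the group table, vocabularies, rates, and the hardcoded "director" swap are toy placeholders:

```python
import random

INTENT_GROUPS = {"greeting": 1, "thanks": 1, "inform": 2, "request": 3}
SLOT_VALUES = {"actor": ["Robert Downey Jr", "Robert Downey Sr"]}

def corrupt(frame, intent_err=0.1, slot_err=0.1, rng=random.Random(0)):
    """Return a noisy copy of a semantic frame (toy error model)."""
    frame = {"intent": frame["intent"], "slots": dict(frame["slots"])}
    if rng.random() < intent_err:                    # intent error (random)
        frame["intent"] = rng.choice(list(INTENT_GROUPS))
    for slot in list(frame["slots"]):
        if rng.random() < slot_err:                  # slot error
            kind = rng.choice(["delete", "value", "slot"])
            if kind == "delete":                     # slot deletion
                frame["slots"].pop(slot)
            elif kind == "value" and slot in SLOT_VALUES:  # value substitution
                frame["slots"][slot] = rng.choice(SLOT_VALUES[slot])
            else:                                    # slot substitution (toy)
                frame["slots"]["director"] = frame["slots"].pop(slot)
    return frame

print(corrupt({"intent": "request_moviename",
               "slots": {"actor": "Robert Downey Jr"}}))
```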
 User simulator sends natural language
 No recognition error
 Errors from NLG or LU
[Figure: natural-language simulation. The user model generates natural language through NLG; the system parses it with language understanding (LU) before dialogue state tracking, so errors come from NLG or LU rather than from speech recognition.]
X. Li, Y.-N. Chen, L. Li, and J. Gao, “End-to-End Task-Completion Neural Dialogue Systems,” arXiv: 1703.01008, 2017.
 Frame-level semantics /  Natural language
[Figure: learning curves of the RL and rule-based agents under both settings]
The RL agent is able to learn how to interact with users to complete tasks more
efficiently and effectively, and outperforms the rule-based agent.
X. Li, Y.-N. Chen, L. Li, and J. Gao, “End-to-End Task-Completion Neural Dialogue Systems,” arXiv: 1703.01008, 2017.
X. Li, Y.-N. Chen, L. Li, J. Gao, and A. Celikyilmaz, “Investigation of Language Understanding Impact for Reinforcement Learning Based Dialogue Systems,” arXiv:
1703.07055, 2017.
[Figure: learning curve of system performance: success rate vs. simulation epoch (0-500). Curves: Upper Bound; DQN - 0.00 (RL agent w/o LU errors); Rule - 0.00 (rule-based agent w/o LU errors).]
X. Li, Y.-N. Chen, L. Li, J. Gao, and A. Celikyilmaz, “Investigation of Language Understanding Impact for Reinforcement Learning Based Dialogue Systems,” arXiv:
1703.07055, 2017.
[Figure: learning curve of system performance: success rate vs. simulation epoch (0-500). Curves: Upper Bound; DQN - 0.00 (RL agent w/o LU errors); DQN - 0.05 (RL agent w/ 5% LU errors); Rule - 0.00 (rule-based agent w/o LU errors); Rule - 0.05 (rule-based agent w/ 5% LU errors). The 5% LU error rate causes a >5% performance drop.]
The system performance is sensitive to LU errors (sentence-level contexts), for both rule-based and RL agents.
X. Li, Y.-N. Chen, L. Li, J. Gao, and A. Celikyilmaz, “Investigation of Language Understanding Impact for Reinforcement Learning Based Dialogue Systems,” arXiv:
1703.07055, 2017.
 Intent error type
 I0: random
 I1: within group
 I2: between group
 Intent error rate
 I3: 0.00
 I4: 0.10
 I5: 0.20
Intent errors slightly influence the RL system performance
Group 1: greeting(), thanks(), etc
Group 2: inform(xx)
Group 3: request(xx)
Between-group intent errors degrade the system performance more
Example (between-group error): request_moviename(actor=Robert Downey Jr) → request_year
X. Li, Y.-N. Chen, L. Li, J. Gao, and A. Celikyilmaz, “Investigation of Language Understanding Impact for Reinforcement Learning Based Dialogue Systems,” arXiv:
1703.07055, 2017.
 Slot error type
 I0: random
 I1: slot deletion
 I2: value substitution
 I3: slot substitution
 Slot error rate
 S4: 0.00
 S5: 0.10
 S6: 0.20
Slot errors significantly degrade the RL system performance
Value substitution has the largest impact on the system performance
Example: request_moviename(actor=Robert Downey Jr) → request_moviename(director=Robert Downey Sr)
 Intent error rate /  Slot error rate
[Figure: dialogue-level performance vs. intent error rate and vs. slot error rate]
The RL agent has better robustness to intent errors in terms of dialogue-level performance
Slot filling is more important than intent detection in language understanding
X. Li, Y.-N. Chen, L. Li, J. Gao, and A. Celikyilmaz, “Investigation of Language Understanding Impact for Reinforcement Learning Based Dialogue Systems,” arXiv:
1703.07055, 2017.
 Word-level contexts in sentences help understand word meanings
 Learning from Prior Knowledge –
K-SAN achieves better LU via known knowledge [Chen et al., ‘16]
 Learning from Observations –
MUSE learns sense embeddings with efficient sense selection [Lee & Chen, ‘17]
 Sentence-level contexts have different impacts on dialogue performance
 Inference –
App contexts improve personalized understanding via inference [Chen et al., ‘15]
Contexts for knowledge carryover benefits multi-turn LU [Chen et al., ‘16]
 Investigation of Understanding Impact –
Slot errors degrade system performance more than intent errors [Li et al., ‘17]
 Contexts from different levels provide cues for better understanding in supervised and
unsupervised ways
Q & A
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 

Último (20)

Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 

  • 10.  Syntax (Dependency Tree) vs.  Semantics (AMR Graph)
    Sentence s: show me the flights from seattle to san francisco
    Knowledge-guided substructures xi from the dependency tree: 1. show me  2. show flights the  3. show flights from seattle  4. show flights to francisco san
    AMR graph: (s / show :ARG0 (y / you) :ARG1 (f / flight :source (c / city :name (d / name :op1 Seattle)) :destination (c2 / city :name (s2 / name :op1 San :op2 Francisco))) :ARG2 (i / I) :mode imperative)
    Knowledge-guided substructures xi from the AMR graph: 1. show you  2. show flight seattle  3. show flight san francisco  4. show i
    Y.-N. Chen, D. Hakkani-Tur, G. Tur, A. Celikyilmaz, J. Gao, and L. Deng, “Knowledge as a Teacher: Knowledge-Guided Structural Attention Networks,” arXiv: 1609.00777, 2016.
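To make the substructure extraction concrete, here is a minimal Python sketch that derives root-to-leaf substructures like the ones above from a dependency parse; the head indices are hand-coded assumptions standing in for a real parser's output, and the path format is illustrative rather than the paper's exact procedure:

    # Hand-coded dependency parse of the example sentence (a real system
    # would obtain the head indices from a parser such as Stanford or SyntaxNet).
    words = ["show", "me", "the", "flights", "from", "seattle", "to", "san", "francisco"]
    heads = [-1, 0, 3, 0, 3, 4, 3, 8, 6]   # each word's head index; -1 marks ROOT

    def substructures(words, heads):
        """One knowledge-guided substructure per root-to-leaf path."""
        children = {i: [] for i in range(len(words))}
        for i, h in enumerate(heads):
            if h >= 0:
                children[h].append(i)
        paths = []
        for leaf in (i for i in range(len(words)) if not children[i]):
            path, node = [], leaf
            while node != -1:               # climb from the leaf up to ROOT
                path.append(node)
                node = heads[node]
            # keep the words on the path, here in sentence order
            paths.append(" ".join(words[i] for i in sorted(path)))
        return paths

    print(substructures(words, heads))
    # ['show me', 'show the flights', 'show flights from seattle',
    #  'show flights to san francisco']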
  • 11. [K-SAN architecture: each knowledge-guided substructure xi is encoded into a vector mi; the inner product between mi and the sentence encoding u gives the knowledge attention distribution pi; the attention-weighted sum yields a knowledge-guided representation o that conditions the RNN slot tagger.] The model will pay more attention to the important substructures that may be crucial for slot tagging. Y.-N. Chen, D. Hakkani-Tur, G. Tur, A. Celikyilmaz, J. Gao, and L. Deng, “Knowledge as a Teacher: Knowledge-Guided Structural Attention Networks,” arXiv: 1609.00777, 2016.
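The attention step on this slide can be sketched in a few lines of numpy; the encoders are stubbed as random vectors (an illustrative assumption; the actual model uses CNN/RNN encoders), and the final combination follows the usual memory-network convention of summing the sentence encoding with the attended knowledge:

    import numpy as np

    rng = np.random.default_rng(0)
    d = 8                                   # embedding dimension (illustrative)
    u = rng.normal(size=d)                  # sentence encoding of the utterance
    M = rng.normal(size=(4, d))             # m_i: encoded substructures (stubbed)

    # Knowledge attention distribution p_i = softmax(u . m_i)
    scores = M @ u
    p = np.exp(scores - scores.max())
    p /= p.sum()

    h = p @ M                               # weighted sum of encoded knowledge
    W = rng.normal(size=(d, d))             # output projection (stubbed)
    o = W @ (u + h)                         # knowledge-guided representation
    print(p.round(3), o.shape)              # o then conditions the RNN tagger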
  • 12-14. ATIS Dataset (F1 slot filling)
    Model | Small (1/40) | Medium (1/10) | Large
    Tagger (GRU) | 73.83 | 85.55 | 93.11
    Encoder-Tagger (GRU) | 72.79 | 88.26 | 94.75
    K-SAN (Stanford dep) | 74.60+ | 87.99 | 94.86+
    K-SAN (Syntaxnet dep) | 74.35+ | 88.40+ | 95.00+
    K-SAN (AMR) | 74.32+ | 88.14 | 94.85+
    K-SAN (JAMR) | 74.27+ | 88.27+ | 94.89+
    Syntax provides richer knowledge and more general guidance when less training data is available. Semantics captures the most salient information, so it achieves similar performance with far fewer substructures.
    Y.-N. Chen, D. Hakkani-Tur, G. Tur, A. Celikyilmaz, J. Gao, and L. Deng, “Knowledge as a Teacher: Knowledge-Guided Structural Attention Networks,” arXiv: 1609.00777, 2016.
  • 15.  Joint Intent Prediction and Slot Filling. [Same K-SAN architecture as above, with an additional intent output attached to the final (EOS) timestamp of the RNN tagger.] Extend K-SAN for joint semantic frame parsing by outputting the user intent at the last timestamp (Hakkani-Tur et al.). Y.-N. Chen, D. Hakkani-Tur, G. Tur, A. Celikyilmaz, J. Gao, and L. Deng, “Syntax or semantics? knowledge-guided joint semantic frame parsing,” in Proceedings of SLT, 2016.
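A minimal PyTorch sketch of the joint idea: one slot tag per timestamp plus an intent prediction from the final hidden state. All layer names and sizes are illustrative assumptions, not the paper's implementation:

    import torch
    import torch.nn as nn

    class JointTagger(nn.Module):
        """Emit a slot tag per token and an intent at the last timestamp."""
        def __init__(self, vocab, n_slots, n_intents, dim=100):
            super().__init__()
            self.emb = nn.Embedding(vocab, dim)
            self.rnn = nn.GRU(dim, dim, batch_first=True)
            self.slot_out = nn.Linear(dim, n_slots)
            self.intent_out = nn.Linear(dim, n_intents)

        def forward(self, tokens):                     # tokens: (batch, time)
            h, _ = self.rnn(self.emb(tokens))          # (batch, time, dim)
            slot_logits = self.slot_out(h)             # one slot tag per step
            intent_logits = self.intent_out(h[:, -1])  # intent from final step
            return slot_logits, intent_logits

    model = JointTagger(vocab=5000, n_slots=20, n_intents=10)
    slots, intent = model(torch.randint(0, 5000, (2, 9)))
    print(slots.shape, intent.shape)   # (2, 9, 20) (2, 10)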
  • 16-17. ATIS Dataset (train: 4478 / dev: 500 / test: 893); each cell is Slot (Indep) / Slot (Joint) / Frame:
    Model | Small (1/40) | Medium (1/10) | Large
    Tagger | 73.8 / 73.0 / 33.5 | 85.6 / 86.4 / 58.5 | 93.1 / 93.4 / 79.7
    Encoder-Tagger | 72.8 / 71.9 / 35.2 | 88.3 / 87.5 / 61.9 | 94.8 / 93.1 / 82.5
    K-SAN (Syntax) | 74.4+ / 74.6+ / 37.6+ | 88.4+ / 88.2+ / 63.5+ | 95.0+ / 95.4+ / 84.3+
    K-SAN (Semantics) | 74.3+ / 73.4+ / 37.1+ | 88.3 / 88.1+ / 63.6+ | 94.9+ / 95.1+ / 83.8+
    Communication Dataset (train: 10479 / dev: 1000 / test: 2300):
    Tagger | 45.5 / 50.3 / 48.9 | 69.0 / 69.8 / 68.2 | 80.4 / 79.8 / 79.5
    Encoder-Tagger | 45.5 / 47.7 / 52.7 | 69.4 / 73.1 / 71.4 | 85.7 / 86.0 / 83.9
    K-SAN (Syntax) | 45.0 / 55.1+ / 57.2+ | 69.5+ / 75.3+ / 73.5+ | 85.0 / 84.5 / 84.5
    K-SAN (Semantics) | 45.1 / 55.0 / 54.1+ | 69.1 / 74.3+ / 73.8+ | 85.3 / 85.2 / 83.4
    When data is scarce, K-SAN with joint parsing significantly improves the performance (slot & frame).
    Y.-N. Chen, D. Hakkani-Tur, G. Tur, A. Celikyilmaz, J. Gao, and L. Deng, “Syntax or semantics? knowledge-guided joint semantic frame parsing,” in Proceedings of SLT, 2016.
  • 18.  Darker blocks and lines correspond to higher attention weights 18 Y.-N. Chen, D. Hakkani-Tur, G. Tur, A. Celikyilmaz, J. Gao, and L. Deng, “Syntax or semantics? knowledge-guided joint semantic frame parsing,” in Proceedings of SLT, 2016.
  • 19.  Darker blocks and lines correspond to higher attention weights. Using less training data with K-SAN allows the model to pay similar attention to the salient substructures that are important for tagging. Y.-N. Chen, D. Hakkani-Tur, G. Tur, A. Celikyilmaz, J. Gao, and L. Deng, “Syntax or semantics? knowledge-guided joint semantic frame parsing,” in Proceedings of SLT, 2016.
  • 20. 20 Modularizing Unsupervised Sense Embeddings (MUSE) G.-H. Lee and Y.-N. Chen, “MUSE: Modularizing Unsupervised Sense Embeddings,” in Proceedings of EMNLP, 2017.
  • 21.  Word embeddings are trained on a corpus in an unsupervised manner.  The same embedding is then used for every sense of a word in NLP tasks, e.g. NLU, POS tagging: Finally I chose Google instead of Apple. Can you buy me a bag of apples, oranges, and bananas? Words with different senses should correspond to different embeddings. G.-H. Lee and Y.-N. Chen, “MUSE: Modularizing Unsupervised Sense Embeddings,” in Proceedings of EMNLP, 2017.
  • 22.  Input: unannotated text corpus.  Two key mechanisms:  Sense selection given a text context;  Sense representation to embed statistical characteristics of sense identity. Example: Smartphone companies including apple (sense selection: apple-1 or apple-2, then sense embedding), blackberry, and sony will be invited. G.-H. Lee and Y.-N. Chen, “MUSE: Modularizing Unsupervised Sense Embeddings,” in Proceedings of EMNLP, 2017.
  • 23.  Efficient sense selection [Neelakantan et al., 2014; Li and Jurafsky, 2015]: uses word embeddings as input to update the sense posterior given words, but introduces ambiguity.  Purely sense-level embedding [Qiu et al., 2016]: sense selection is inefficient (exponential time complexity). Prior approaches thus suffer from either ambiguity or inefficiency. G.-H. Lee and Y.-N. Chen, “MUSE: Modularizing Unsupervised Sense Embeddings,” in Proceedings of EMNLP, 2017.
  • 24.  Sense selection: policy-based or value-based.  Sense representation learning: skip-gram approximation. [Architecture: a sense selection module picks a sense for the target word given its context C_t; a second sense selection module picks a sense for a sampled collocated word; the sense representation module is trained with negative sampling, and the collocated likelihood serves as a reward signal to optimize the sense selection module.] Corpus example: {Smartphone companies including apple, blackberry, and sony will be invited.} G.-H. Lee and Y.-N. Chen, “MUSE: Modularizing Unsupervised Sense Embeddings,” in Proceedings of EMNLP, 2017.
  • 25.  Learning algorithm: sense selection strategies (see the sketch below):  Stochastic policy: selects the sense based on the probability distribution.  Greedy: selects the sense with the largest Q-value (no exploration).  ε-Greedy: selects a random sense with probability ε, and adopts the greedy strategy otherwise.  Boltzmann: samples the sense from the Boltzmann distribution modeled by the Q-values. G.-H. Lee and Y.-N. Chen, “MUSE: Modularizing Unsupervised Sense Embeddings,” in Proceedings of EMNLP, 2017.
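A minimal sketch of these strategies, assuming the Q-values q(z_k | C_t) for one word's senses are already computed (illustrative, not the released MUSE code):

    import numpy as np

    rng = np.random.default_rng(0)

    def select_sense(q, strategy="boltzmann", eps=0.1, temperature=1.0):
        """Pick a sense index from Q-values q(z_k | C_t) for one word."""
        q = np.asarray(q, dtype=float)
        if strategy == "greedy":            # always exploit the best sense
            return int(q.argmax())
        if strategy == "eps-greedy":        # explore with probability eps
            if rng.random() < eps:
                return int(rng.integers(len(q)))
            return int(q.argmax())
        # stochastic policy / Boltzmann: sample a sense from a distribution
        # (for the policy the weights are the policy probabilities; here
        # they are modeled by a softmax over the Q-values)
        z = q / temperature
        p = np.exp(z - z.max())
        p /= p.sum()
        return int(rng.choice(len(q), p=p))

    q_values = [0.2, 1.1, 0.4]              # e.g. three senses of "apple"
    print(select_sense(q_values, "greedy"), select_sense(q_values, "boltzmann"))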
  • 26.  Dataset: SCWS for multi-sense embedding evaluation. [Illustration: for a pair such as “banks” in “He borrowed the money from banks.” vs. “river” in “I live near to a river.”, MaxSimC compares only the most probable sense of each word in its context, while AvgSimC averages over all sense pairs weighted by the sense posteriors (e.g. 0.6 / 0.4 for bank-1 / bank-2); see the sketch after these results.]
    Approach | MaxSimC | AvgSimC
    Huang et al., 2012 | 26.1 | 65.7
    Neelakantan et al., 2014 | 60.1 | 69.3
    Tian et al., 2014 | 63.6 | 65.4
    Li & Jurafsky, 2015 | 66.6 | 66.8
    Bartunov et al., 2016 | 53.8 | 61.2
    Qiu et al., 2016 | 64.9 | 66.1
    G.-H. Lee and Y.-N. Chen, “MUSE: Modularizing Unsupervised Sense Embeddings,” in Proceedings of EMNLP, 2017.
  • 27-30. Adding the MUSE variants to the same SCWS table:
    Approach | MaxSimC | AvgSimC
    MUSE-Policy | 66.1 | 67.4
    MUSE-Greedy | 66.3 | 68.3
    MUSE-ε-Greedy | 67.4+ | 68.6
    MUSE-Boltzmann | 67.9+ | 68.7
    MUSE with exploration achieves the best sense embeddings in MaxSimC.
    G.-H. Lee and Y.-N. Chen, “MUSE: Modularizing Unsupervised Sense Embeddings,” in Proceedings of EMNLP, 2017.
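For reference, the two metrics can be computed as follows under their standard definitions: MaxSimC uses only each word's most probable sense in its context, while AvgSimC averages cosine similarity over all sense pairs weighted by the sense posteriors. The vectors and posteriors below are illustrative stand-ins:

    import numpy as np

    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def max_sim_c(s1, p1, s2, p2):
        """Cosine between each word's most probable sense in its context."""
        return cos(s1[int(np.argmax(p1))], s2[int(np.argmax(p2))])

    def avg_sim_c(s1, p1, s2, p2):
        """Posterior-weighted average cosine over all sense pairs."""
        return sum(p1[i] * p2[j] * cos(a, b)
                   for i, a in enumerate(s1) for j, b in enumerate(s2))

    rng = np.random.default_rng(0)
    bank = rng.normal(size=(2, 8))          # two senses of "bank" (illustrative)
    river = rng.normal(size=(1, 8))         # one sense of "river"
    p_bank, p_river = [0.6, 0.4], [1.0]     # sense posteriors given the contexts
    print(max_sim_c(bank, p_bank, river, p_river),
          avg_sim_c(bank, p_bank, river, p_river))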
  • 31. Synonym selection accuracy (grouped by embedding type):
    Approach | ESL-50 | RD-300 | TOEFL-80
    Conventional word embedding: Global Context | 47.73 | 45.07 | 60.87
    Conventional word embedding: SkipGram | 52.08 | 55.66 | 66.67
    Word sense disambiguation: IMS+SkipGram | 41.67 | 53.77 | 66.67
    Unsupervised sense embedding: EM | 27.08 | 33.96 | 40.00
    Unsupervised sense embedding: MSSG (Neelakantan et al., 2014) | 57.14 | 58.93 | 78.26
    Unsupervised sense embedding: CRP (Li & Jurafsky, 2015) | 50.00 | 55.36 | 82.61
    Unsupervised sense embedding: MUSE-Policy | 52.38 | 51.79 | 79.71
    Unsupervised sense embedding: MUSE-Greedy | 57.14 | 58.93 | 79.71
    Unsupervised sense embedding: MUSE-ε-Greedy | 61.90+ | 62.50+ | 84.06+
    Unsupervised sense embedding: MUSE-Boltzmann | 64.29+ | 66.07+ | 88.41+
    Supervised sense embedding: Retro-GlobalContext | 63.64 | 66.20 | 71.01
    Supervised sense embedding: Retro-SkipGram | 56.25 | 65.09 | 73.33
    MUSE with exploration achieves the state-of-the-art results for synonym selection.
    G.-H. Lee and Y.-N. Chen, “MUSE: Modularizing Unsupervised Sense Embeddings,” in Proceedings of EMNLP, 2017.
  • 32.  KNN senses sorted by collocation likelihood:
    Context: … braves finish the season in tie with the los angeles dodgers … | KNN senses: scoreless otl shootout 6-6 hingis 3-3 7-7 0-0
    Context: … his later years proudly wore tie with the chinese characters for … | KNN senses: pants trousers shirt juventus blazer socks anfield
    Context: … of the mulberry or the blackberry and minos sent him to … | KNN senses: cranberries maple vaccinium apricot apple
    Context: … of the large number of blackberry users in the us federal … | KNN senses: smartphones sap microsoft ipv6 smartphone
    Context: … ladies wore extravagant head ornaments combs pearl necklaces face … | KNN senses: venter thorax neck spear millimeters fusiform
    Context: … appoint john pope republican as head of the new army of … | KNN senses: multi-party appoints unicameral beria appointed
    MUSE learns sense embeddings in an unsupervised way and achieves the first purely sense-level representation learning system with linear-time sense selection.
    G.-H. Lee and Y.-N. Chen, “MUSE: Modularizing Unsupervised Sense Embeddings,” in Proceedings of EMNLP, 2017.
  • 33. 33
  • 34. Leveraging Behavior Patterns of Mobile Apps for Personalized Spoken Language Understanding Y.-N. Chen, M. Sun, A. I Rudnicky, and A. Gershman, "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, pages 83-86, 2015. ACM.
  • 35.  Task: user intent prediction.  Challenge: language ambiguity.  User preference: some people prefer “Message” to “Email”; some prefer “Outlook” to “Gmail”.  App-level contexts: “Message” is more likely to follow “Camera”; “Email” is more likely to follow “Excel”. Example: “send to vivian” (communication): Email vs. Message? Considering behavioral patterns in the history to model SLU for intent prediction. Y.-N. Chen, M. Sun, A. I Rudnicky, and A. Gershman, "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, pages 83-86, 2015. ACM.
  • 36.  Subjects’ app invocations are logged on a daily basis.  Subjects annotate their app activities with: Task Structure (link applications that serve a common goal) and Task Description (briefly describe the goal or intention of the task).  Subjects use a wizard system to perform the annotated task by speech.
    Example: TASK59; 20150203; 1; Tuesday; 10:48; Desc: play music via bluetooth speaker; Apps: com.android.settings, then com.lge.music
    W1: Ready. U1: Connect my phone to bluetooth speaker. [SETTINGS] W2: Connected to bluetooth speaker. U2: And play music. [MUSIC] W3: What music would you like to play? U3: Shuffle playlist. [MUSIC] W4: I will play the music for you.
    Y.-N. Chen, M. Sun, A. I Rudnicky, and A. Gershman, "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, pages 83-86, 2015. ACM.
  • 37. [Matrix of utterances × (lexical features + behavioral history + intended apps): training utterances such as “take this photo tell vivian this is me in the lab” (CAMERA, IM) and “check my grades on website send an email to professor” (CHROME, EMAIL) fill observed cells with 1; test utterances such as “take a photo of this send it to alice” leave hidden-semantics cells (e.g. .85, .70) to be inferred.] Issue: hidden semantics are unobserved but may benefit understanding. Solution: use matrix factorization to complete a partially-missing matrix based on a low-rank latent semantics assumption. Y.-N. Chen, M. Sun, A. I Rudnicky, and A. Gershman, "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, pages 83-86, 2015. ACM.
  • 38.  The decomposed matrices represent low-rank latent semantics for utterances and for words/histories/apps respectively.  The product of the two matrices fills in the probability of hidden semantics: M(|U| × (|W|+|H|+|A|)) ≈ U(|U| × d) · V(d × (|W|+|H|+|A|)). Y.-N. Chen, M. Sun, A. I Rudnicky, and A. Gershman, "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, pages 83-86, 2015. ACM.
  • 39.  Model implicit feedback by completing the matrix:  do not treat unobserved facts as negative samples (true or false);  give observed facts higher scores than unobserved facts.  Objective: learn a set of well-ranked apps per utterance; the model can be trained by SGD updates over pairs of observed and unobserved facts (f+, f−), as sketched below. Y.-N. Chen, M. Sun, A. I Rudnicky, and A. Gershman, "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, pages 83-86, 2015. ACM.
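A minimal sketch of such a pairwise update (a BPR-style objective; the matrix sizes, learning rate, and regularization are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    n_utts, n_cols, d = 100, 50, 8         # utterances x (words + histories + apps)
    U = rng.normal(scale=0.1, size=(n_utts, d))
    V = rng.normal(scale=0.1, size=(n_cols, d))
    observed = [(0, 3), (0, 7), (1, 3)]    # toy observed facts (utterance, column)
    observed_set = set(observed)

    def sgd_step(u, pos, neg, lr=0.05, reg=0.01):
        """One pairwise update: push score(u, pos) above score(u, neg)."""
        u_vec = U[u].copy()
        g = 1.0 / (1.0 + np.exp(u_vec @ (V[pos] - V[neg])))  # sigmoid(-x)
        U[u]   += lr * (g * (V[pos] - V[neg]) - reg * U[u])
        V[pos] += lr * (g * u_vec - reg * V[pos])
        V[neg] += lr * (-g * u_vec - reg * V[neg])

    for _ in range(2000):
        u, pos = observed[rng.integers(len(observed))]
        neg = int(rng.integers(n_cols))
        if (u, neg) not in observed_set:   # f-: any fact not observed as true
            sgd_step(u, pos, neg)

    # U @ V.T now fills in scores for the unobserved (hidden-semantics) cells;
    # an observed fact should be ranked above an unobserved one per utterance.
    print(U[0] @ V[3] > U[0] @ V[10])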
  • 40. Reasoning with matrix factorization for implicit intents. [Same utterance × (lexical + behavioral + app) matrix as above; at test time, an utterance such as “take a photo of this send it to alex” retrieves its intended apps (CAMERA, IM) through the completed matrix.] Y.-N. Chen, M. Sun, A. I Rudnicky, and A. Gershman, "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, pages 83-86, 2015. ACM.
  • 41-44.  Dataset: 533 dialogues (1,607 utterances); 455 multi-turn dialogues; Google recognized transcripts (word error rate = 25%).  Evaluation metrics: accuracy of user intent prediction (ACC) / mean average precision of ranked intents (MAP).  Baselines: Maximum Likelihood Estimation (MLE); Multinomial Logistic Regression (MLR).
    (a) MLE, User-Indep: 13.5 / 19.6
    (b) MLE, User-Dep: 20.2 / 27.9
    Approach | Lexical | Behavioral | All
    (c) MLR, User-Indep | 42.8 / 46.4 | 14.9 / 18.7 | 46.2+ / 50.1+
    (d) MLR, User-Dep | 48.2 / 52.1 | 19.3 / 25.2 | 50.1+ / 53.9+
    (e) (c) + Personalized MF | 47.6 / 51.1 | 16.4 / 20.3 | 50.3+* / 54.2+*
    (f) (d) + Personalized MF | 48.3 / 52.7 | 20.6 / 26.7 | 51.9+* / 55.7+*
    The user-dependent model is better than the user-independent model. Lexical features are useful to predict intended apps for both user-independent and user-dependent models. Combining lexical and behavioral features shows improvement, which models explicit information from observations. Personalized MF significantly improves MLR results by considering hidden semantics.
    Y.-N. Chen, M. Sun, A. I Rudnicky, and A. Gershman, "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, pages 83-86, 2015. ACM.
  • 45.  App functionality modeling  Learning app embeddings 45 App embeddings encoding functionality help user-independent understanding Y.-N. Chen, M. Sun, A. I Rudnicky, and A. Gershman, "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, pages 83-86, 2015. ACM.
  • 46. End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding Y.-N. Chen, D. Hakkani-Tur, G. Tur, J. Gao, and L. Deng, “End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding," in Proc. of Interspeech, 2016. ISCA.
  • 47. Domain Identification, Intent Prediction, Slot Filling:
    U: just sent email to bob about fishing this weekend
    S: O O O O B-contact_name O B-subject I-subject I-subject
    D: communication; I: send_email; frame: send_email(contact_name=“bob”, subject=“fishing this weekend”)
    U1: are we going to fish this weekend; S1: B-message I-message I-message I-message I-message I-message I-message; frame: send_email(message=“are we going to fish this weekend”)
    U2: send email to bob; S2: O O O B-contact_name; frame: send_email(contact_name=“bob”)
    Y.-N. Chen, D. Hakkani-Tur, G. Tur, J. Gao, and L. Deng, “End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding," in Proc. of Interspeech, 2016. ISCA.
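Decoding a slot-value frame from such IOB tags is a small exercise; a sketch (the token/tag pair below follows the slide's example):

    def iob_to_frame(tokens, tags):
        """Collect B-/I- tagged spans into a slot -> value dict."""
        slots, slot, span = {}, None, []
        for tok, tag in zip(tokens, tags):
            if tag.startswith("B-"):
                if slot:                      # close the previous span
                    slots[slot] = " ".join(span)
                slot, span = tag[2:], [tok]
            elif tag.startswith("I-") and slot == tag[2:]:
                span.append(tok)              # continue the current span
            else:
                if slot:
                    slots[slot] = " ".join(span)
                slot, span = None, []
        if slot:
            slots[slot] = " ".join(span)
        return slots

    tokens = "just sent email to bob about fishing this weekend".split()
    tags = ["O", "O", "O", "O", "B-contact_name", "O",
            "B-subject", "I-subject", "I-subject"]
    print(iob_to_frame(tokens, tags))
    # {'contact_name': 'bob', 'subject': 'fishing this weekend'}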
  • 48-49. Idea: additionally incorporate contextual knowledge during slot tagging. [Architecture: (1) Sentence Encoding: history utterances {xi} and the current utterance c are encoded by RNN or CNN sentence encoders; (2) Knowledge Attention: the inner product between each memory representation mi and the current encoding u gives an attention distribution pi; (3) Knowledge Encoding: the attention-weighted sum h is projected (Wkg) into a knowledge encoding representation o that conditions the RNN slot tagger.] The model can automatically focus on salient contexts for current understanding. Y.-N. Chen, D. Hakkani-Tur, G. Tur, J. Gao, and L. Deng, “End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding," in Proc. of Interspeech, 2016. ISCA.
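The carryover mechanism mirrors the knowledge attention sketched earlier: history utterances become memory vectors, the current utterance attends over them, and the attended summary conditions the tagger. A compact numpy sketch with the sentence encoder stubbed as mean-pooled random embeddings (an illustrative assumption):

    import numpy as np

    rng = np.random.default_rng(0)
    d = 16
    vocab = ("just sent email to bob about fishing this weekend "
             "are we going fish send").split()
    emb = {w: rng.normal(size=d) for w in vocab}

    def encode(utt):                        # stub sentence encoder: mean pooling
        return np.mean([emb[w] for w in utt.split()], axis=0)

    history = ["just sent email to bob about fishing this weekend",
               "are we going to fish this weekend"]
    current = "send email to bob"

    M = np.stack([encode(x) for x in history])   # memory vectors m_i
    u = encode(current)                          # current-utterance encoding

    scores = M @ u                               # knowledge attention
    p = np.exp(scores - scores.max())
    p /= p.sum()
    h = p @ M                                    # attended history summary

    o = u + h   # fed to the tagger so "bob" can be tagged as contact_name
    print(p.round(3), o.shape)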
  • 50-54.  Dataset: Cortana communication session data.  Setup: GRU for all RNNs, adam optimizer, embedding dim = 150, hidden units = 100, dropout = 0.5.
    Model | Training Set | Knowledge Encoding | Sentence Encoder | First Turn | Other | Overall
    RNN Tagger | single-turn | x | x | 60.6 | 16.2 | 25.5
    RNN Tagger | multi-turn | x | x | 55.9 | 45.7 | 47.4
    Encoder-Tagger | multi-turn | current utt (c) | RNN | 57.6 | 56.0 | 56.3
    Encoder-Tagger | multi-turn | history + current (x, c) | RNN | 69.9 | 60.8 | 62.5
    Proposed | multi-turn | history + current (x, c) | RNN | 73.2 | 65.7 | 67.1
    Proposed | multi-turn | history + current (x, c) | CNN | 73.8 | 66.5 | 68.0
    The model trained on single-turn data performs worse on non-first turns due to mismatched training data. Treating multi-turn data as single-turn for training performs reasonably. Directly encoding contexts improves the performance but increases training time. Memory networks significantly outperform all other approaches with much less training time, and CNN sentence encoding produces comparable results with an even shorter training time.
    Y.-N. Chen, D. Hakkani-Tur, G. Tur, J. Gao, and L. Deng, “End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding," in Proc. of Interspeech, 2016. ISCA.
  • 55. 55 Investigation of Language Understanding Impact for Reinforcement Learning Based Dialogue Systems X. Li, Y.-N. Chen, L. Li, and J. Gao, “End-to-End Task-Completion Neural Dialogue Systems,” arXiv: 1703.01008, 2017. X. Li, Y.-N. Chen, L. Li, J. Gao, and A. Celikyilmaz, “Investigation of Language Understanding Impact for Reinforcement Learning Based Dialogue Systems,” arXiv: 1703.07055, 2017.
  • 56.  Dialogue management is framed as a reinforcement learning task.  The agent learns to select actions to maximize the expected reward: if the agent books the right ticket, reward = +30; if the dialogue fails, reward = -30; otherwise, reward = -1 per turn. X. Li, Y.-N. Chen, L. Li, and J. Gao, “End-to-End Task-Completion Neural Dialogue Systems,” arXiv: 1703.01008, 2017.
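The reward shaping is easy to state in code; the values below are copied from the slide, and the small per-turn penalty encourages shorter dialogues:

    def turn_reward(done: bool, success: bool) -> int:
        """Reward signal for the booking agent, as defined on the slide."""
        if done:
            return 30 if success else -30  # right ticket booked vs. failed dialogue
        return -1                          # small step penalty favors efficiency

    # cumulative reward of a 5-turn successful dialogue: 4 * (-1) + 30 = 26
    total = sum(turn_reward(t == 4, True) for t in range(5))
    print(total)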
  • 57.  Dialogue management is framed as a reinforcement learning task.  The agent learns to select actions to maximize the expected reward. [System diagram: a user simulator (user agenda modeling + natural language generation) interacts with the neural dialogue system (language understanding + dialogue management); observations flow from user to agent and actions from agent to user. Example text input: “Are there any action movies to see this weekend?”; example dialogue policy: request_location.] X. Li, Y.-N. Chen, L. Li, and J. Gao, “End-to-End Task-Completion Neural Dialogue Systems,” arXiv: 1703.01008, 2017.
  • 58.  NLU and NLG are trained in a supervised manner.  DM is trained in a reinforcement learning framework (NLU and NLG can be fine-tuned). [End-to-end neural dialogue system diagram: the user simulator (user goal, agenda modeling, NLG) produces text such as “Are there any action movies to see this weekend?”; language understanding maps it to the semantic frame request_movie(genre=action, date=this weekend); dialogue management outputs the policy request_location; the user responds with the dialogue action inform(location=San Francisco).] X. Li, Y.-N. Chen, L. Li, and J. Gao, “End-to-End Task-Completion Neural Dialogue Systems,” arXiv: 1703.01008, 2017.
  • 59.  DM receives frame-level information (user dialogue acts / semantic frames from the user simulation).  No error model: perfect recognizer and LU.  Error model: simulate the possible recognition and LU errors before dialogue state tracking (DST) and dialogue policy optimization. X. Li, Y.-N. Chen, L. Li, and J. Gao, “End-to-End Task-Completion Neural Dialogue Systems,” arXiv: 1703.01008, 2017.
  • 60.  The user simulator sends natural language (generated by NLG) that is parsed by the LU component before DST and dialogue policy optimization.  No recognition error.  Errors come from NLG or LU. X. Li, Y.-N. Chen, L. Li, and J. Gao, “End-to-End Task-Completion Neural Dialogue Systems,” arXiv: 1703.01008, 2017.
  • 61. [Learning curves under frame-level semantics input and under natural language input.] The RL agent is able to learn how to interact with users to complete tasks more efficiently and effectively, and it outperforms the rule-based agent. X. Li, Y.-N. Chen, L. Li, and J. Gao, “End-to-End Task-Completion Neural Dialogue Systems,” arXiv: 1703.01008, 2017.
  • 62. [Learning curve of system performance: success rate vs. simulation epoch, with curves for the upper bound, the RL agent (DQN) without LU errors, and the rule-based agent without LU errors.] X. Li, Y.-N. Chen, L. Li, J. Gao, and A. Celikyilmaz, “Investigation of Language Understanding Impact for Reinforcement Learning Based Dialogue Systems,” arXiv: 1703.07055, 2017.
  • 63. [Learning curve of system performance: success rate vs. simulation epoch, with curves for the upper bound and for the RL (DQN) and rule-based agents with 0% and 5% LU errors; adding 5% LU errors yields a >5% drop in success rate.] The system performance is sensitive to LU errors (sentence-level contexts), for both rule-based and RL agents. X. Li, Y.-N. Chen, L. Li, J. Gao, and A. Celikyilmaz, “Investigation of Language Understanding Impact for Reinforcement Learning Based Dialogue Systems,” arXiv: 1703.07055, 2017.
  • 64.  Intent error type: random; within group; between group (group 1: greeting(), thanks(), etc.; group 2: inform(xx); group 3: request(xx)).  Intent error rate: 0.00, 0.10, 0.20. Intent errors only slightly influence the RL system performance; between-group intent errors (e.g. request_moviename(actor=Robert Downey Jr) misread as request_year) degrade the system performance more. X. Li, Y.-N. Chen, L. Li, J. Gao, and A. Celikyilmaz, “Investigation of Language Understanding Impact for Reinforcement Learning Based Dialogue Systems,” arXiv: 1703.07055, 2017.
  • 65.  Slot error type: random; slot deletion; value substitution; slot substitution.  Slot error rate: 0.00, 0.10, 0.20. Slot errors significantly degrade the RL system performance; value substitution (e.g. request_moviename(actor=Robert Downey Jr) with the value corrupted to “Robert Downey Sr”, or the slot changed to director) has the largest impact on the system performance. X. Li, Y.-N. Chen, L. Li, J. Gao, and A. Celikyilmaz, “Investigation of Language Understanding Impact for Reinforcement Learning Based Dialogue Systems,” arXiv: 1703.07055, 2017.
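A sketch of an error model that injects these intent and slot noise types into a clean semantic frame; the frame format, intent groups, and substituted values are illustrative assumptions following the slide's examples:

    import random
    random.seed(0)

    INTENT_GROUPS = {"group1": ["greeting", "thanks"],   # general acts
                     "group2": ["inform"],               # inform(xx)
                     "group3": ["request"]}              # request(xx)

    def corrupt_intent(intent, rate=0.10, mode="between_group"):
        """Intent errors: random / within-group / between-group."""
        if random.random() >= rate:
            return intent
        group = next(g for g, xs in INTENT_GROUPS.items() if intent in xs)
        if mode == "within_group":
            pool = [x for x in INTENT_GROUPS[group] if x != intent]
        elif mode == "between_group":
            pool = [x for g, xs in INTENT_GROUPS.items() if g != group for x in xs]
        else:                                   # "random"
            pool = [x for xs in INTENT_GROUPS.values() for x in xs if x != intent]
        return random.choice(pool) if pool else intent

    def corrupt_slots(slots, rate=0.10, mode="value_substitution"):
        """Slot errors: slot deletion / value substitution / slot substitution."""
        noisy = {}
        for slot, value in slots.items():
            if random.random() >= rate:
                noisy[slot] = value               # keep the slot intact
            elif mode == "slot_deletion":
                continue                          # drop the slot entirely
            elif mode == "value_substitution":
                noisy[slot] = "Robert Downey Sr"  # wrong value (illustrative)
            else:                                 # "slot_substitution"
                noisy["director"] = value         # right value under the wrong slot
        return noisy

    frame = {"intent": "request", "slots": {"actor": "Robert Downey Jr"}}
    print(corrupt_intent(frame["intent"], rate=1.0),
          corrupt_slots(frame["slots"], rate=1.0))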
  • 66. [Results under varying intent error rates and slot error rates.] The RL agent has better robustness to intent errors in terms of dialogue-level performance. Slot filling is more important than intent detection in language understanding. X. Li, Y.-N. Chen, L. Li, J. Gao, and A. Celikyilmaz, “Investigation of Language Understanding Impact for Reinforcement Learning Based Dialogue Systems,” arXiv: 1703.07055, 2017.
  • 67.  Word-level contexts in sentences help understand word meanings:  Learning from prior knowledge – K-SAN achieves better LU via known knowledge [Chen et al., ‘16];  Learning from observations – MUSE learns sense embeddings with efficient sense selection [Lee & Chen, ‘17].  Sentence-level contexts have different impacts on dialogue performance:  Inference – app contexts improve personalized understanding via inference [Chen et al., ‘15], and contexts for knowledge carryover benefit multi-turn LU [Chen et al., ‘16];  Investigation of understanding impact – slot errors degrade system performance more than intent errors [Li et al., ‘17].  Contexts from different levels provide cues for better understanding in supervised and unsupervised ways.
  • 68. Q & A