A Composite Kernel Approach for Dialog Topic Tracking with Structured Domain Knowledge from Wikipedia.
Seokhwan Kim, Rafael E. Banchs, Haizhou Li.
The 52nd Annual Meeting of the Association for Computational Linguistics (ACL 2014), Baltimore, Jun 2014
Human Language Technology Department, Institute for Infocomm Research, Singapore
Introduction
Most previous dialog systems have focused on a single
target task within a single topic and domain.
Some researchers have tried to identify dialog topics with text
categorization or external knowledge-based approaches.
In this work, we propose a composite kernel approach for
dialog topic tracking with structured domain knowledge from
Wikipedia.
Dialog Topic Tracking
A classification problem: $f(x_t) = (y_{t-1}, y_t)$
$x_t$ contains the input features obtained at turn $t$
$y_t \in C$, where $C$ is a closed set of topic categories.
If a topic transition occurs at $t$, $y_t$ should differ from $y_{t-1}$.
Otherwise, $y_t$ and $y_{t-1}$ have the same value.
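A minimal Python sketch of this formulation follows. It assumes the pair $(y_{t-1}, y_t)$ itself serves as the class label and that the virtual topic before the first turn is NONE, the label the example dialog uses at turn 0. The names `transition_targets` and `is_transition` are hypothetical; the poster does not describe an implementation.

```python
from typing import List, Tuple

# Assumed label for "no topic yet"; the example dialog uses NONE at turn 0.
NONE = "NONE"


def transition_targets(topics: List[str]) -> List[Tuple[str, str]]:
    """Map per-turn topics y_0..y_T to classification targets (y_{t-1}, y_t).

    Assumes the (virtual) topic before turn 0 is NONE.
    """
    targets: List[Tuple[str, str]] = []
    previous = NONE
    for current in topics:
        targets.append((previous, current))
        previous = current
    return targets


def is_transition(target: Tuple[str, str]) -> bool:
    """A topic transition occurs at turn t exactly when y_t != y_{t-1}."""
    previous, current = target
    return previous != current


pairs = transition_targets(["NONE", "ATTR", "ATTR", "TRSP"])
print(pairs)  # [('NONE', 'NONE'), ('NONE', 'ATTR'), ('ATTR', 'ATTR'), ('ATTR', 'TRSP')]
print([is_transition(p) for p in pairs])  # [False, True, False, True]
```

Encoding the target as a pair lets a single classifier decide both the current topic and whether a transition happened.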
Examples of dialog topic tracking
t | Speaker | Utterance | Topic Transition
0 | Guide | How can I help you? | NONE→NONE
1 | Tourist | Can you recommend some good places to visit in Singapore? | NONE→ATTR
  | Guide | Well if you like to visit an icon of Singapore, Merlion park will be a nice place to visit. |
2 | Tourist | Merlion is a symbol for Singapore, right? | ATTR→ATTR
  | Guide | Yes, we use that to symbolise Singapore. |
3 | Tourist | Okay. | ATTR→ATTR
  | Guide | The lion head symbolised the founding of the island and the fish body just symbolised the humble fishing village. |
4 | Tourist | How can I get there from Orchard Road? | ATTR→TRSP
  | Guide | You can take the north-south line train from Orchard Road and stop at Raffles Place station. |
5 | Tourist | Is this walking distance from the station to the destination? | TRSP→TRSP
  | Guide | Yes, it’ll take only ten minutes on foot. |
6 | Tourist | Alright. | TRSP→FOOD
  | Guide | Well, you can also enjoy some seafoods at the riverside near the place. |
7 | Tourist | What food do you have any recommendations to try there? | FOOD→FOOD
  | Guide | If you like spicy foods, you must try chilli crab which is one of our favourite dishes here in Singapore. |
8 | Tourist | Great! I’ll try that. | FOOD→FOOD
f detects three topic transitions, at t = 1, 4, and 6
Resulting topic sequence: Attraction→Transportation→Food
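The transitions and topic sequence above can be read off the annotated pairs mechanically. This sketch copies the nine $(y_{t-1}, y_t)$ pairs from the example table; the helper names are hypothetical.

```python
from typing import List, Tuple

# (y_{t-1}, y_t) for turns t = 0..8, copied from the example table.
EXAMPLE = [
    ("NONE", "NONE"), ("NONE", "ATTR"), ("ATTR", "ATTR"), ("ATTR", "ATTR"),
    ("ATTR", "TRSP"), ("TRSP", "TRSP"), ("TRSP", "FOOD"), ("FOOD", "FOOD"),
    ("FOOD", "FOOD"),
]


def transition_turns(pairs: List[Tuple[str, str]]) -> List[int]:
    """Return every turn t at which y_t differs from y_{t-1}."""
    return [t for t, (prev, cur) in enumerate(pairs) if prev != cur]


def topic_sequence(pairs: List[Tuple[str, str]]) -> List[str]:
    """Collapse consecutive repeats of y_t into the visited topics, skipping NONE."""
    sequence: List[str] = []
    for _, cur in pairs:
        if cur != "NONE" and (not sequence or sequence[-1] != cur):
            sequence.append(cur)
    return sequence


print(transition_turns(EXAMPLE))  # [1, 4, 6]
print(topic_sequence(EXAMPLE))    # ['ATTR', 'TRSP', 'FOOD']
```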
Wikipedia-based Dialog Topic Tracking
Supervised machine learning
Trained on examples annotated with topic labels
Based only on information obtained from the ongoing dialog
Insufficient for identifying both user-initiative and system-initiative
topic transitions
Wikipedia-based dialog topic tracking
Leverages Wikipedia as an external knowledge source
Knowledge is obtained without significant resource-building effort
Wikipedia-based Composite Kernel
Composite kernel for dialog topic tracking
With relevant Wikipedia paragraphs selected using cosine similarity
$$\mathrm{sim}(x, p_i) = \frac{\phi(x) \cdot \phi(p_i)}{|\phi(x)|\,|\phi(p_i)|}$$
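A sketch of the similarity and paragraph ranking, assuming term vectors are stored as sparse Python dicts (term to weight) and that only the top `k` paragraphs are kept. The names `cosine`, `rank_paragraphs`, and `k` are illustrative and not taken from the poster.

```python
import math
from typing import Dict, List

Vector = Dict[str, float]  # sparse term vector: term -> weight


def cosine(u: Vector, v: Vector) -> float:
    """sim(x, p_i) = phi(x) . phi(p_i) / (|phi(x)| |phi(p_i)|); 0.0 if either vector is all zeros."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norms = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norms if norms else 0.0


def rank_paragraphs(x: Vector, paragraphs: Dict[str, Vector], k: int = 3) -> List[str]:
    """Ids of the k paragraphs most similar to x, best first (k is a hypothetical cutoff)."""
    return sorted(paragraphs, key=lambda pid: cosine(x, paragraphs[pid]), reverse=True)[:k]


paragraphs = {"p1": {"merlion": 1.0, "park": 1.0}, "p2": {"crab": 1.0}}
print(rank_paragraphs({"merlion": 1.0}, paragraphs, k=1))  # ['p1']
```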
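The term vector $\phi(x)$ is computed by accumulating the term weights over the
previous turns:
$$\phi(x) = \langle \alpha_1, \alpha_2, \cdots, \alpha_{|W|} \rangle \in \mathbb{R}^{|W|}$$
where $\alpha_i$ is the accumulated weight of the $i$-th term in the vocabulary $W$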
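The poster states only that term weights from previous turns are accumulated. This sketch assumes raw term counts, a window of `h` previous turns, and an exponential decay factor `decay`, all hypothetical choices; a tf-idf weight could be substituted for the raw count.

```python
from collections import Counter
from typing import Dict, List


def term_vector(turns: List[List[str]], t: int, h: int = 2, decay: float = 0.5) -> Dict[str, float]:
    """Accumulate term weights from turn t and up to h preceding turns into phi(x_t).

    Assumption for illustration: a term in turn t - j contributes decay**j * count.
    """
    phi: Dict[str, float] = {}
    for j in range(h + 1):
        if t - j < 0:  # no turns before the start of the dialog
            break
        for term, count in Counter(turns[t - j]).items():
            phi[term] = phi.get(term, 0.0) + (decay ** j) * count
    return phi


turns = [["merlion", "park"], ["merlion"], ["train"]]
print(term_vector(turns, t=2))  # {'train': 1.0, 'merlion': 0.75, 'park': 0.25}
```

The decay makes older turns count less, so a recent topic shift shows up in $\phi(x)$ without discarding the context entirely.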
Two kernel structures are constructed using information from the
highly ranked paragraphs
History Sequence Kernel
A sequence of the most similar paragraphs:
$S = (s_0, \cdots, s_t)$, where $s_j = \arg\max_i \mathrm{sim}(x_j, p_i)$
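Building $S$ is a per-turn argmax over paragraphs. A minimal sketch, assuming each $x_j$ is already a sparse term vector and that paragraphs are keyed by hypothetical ids:

```python
import math
from typing import Dict, List

Vector = Dict[str, float]  # sparse term vector: term -> weight


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norms = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norms if norms else 0.0


def history_sequence(turn_vectors: List[Vector], paragraphs: Dict[str, Vector]) -> List[str]:
    """S = (s_0, ..., s_t): for each turn j, the id of the paragraph maximizing sim(x_j, p_i)."""
    return [max(paragraphs, key=lambda pid: cosine(x, paragraphs[pid])) for x in turn_vectors]


paragraphs = {"Merlion": {"merlion": 1.0, "lion": 1.0}, "MRT": {"train": 1.0, "station": 1.0}}
turns = [{"merlion": 1.0}, {"train": 1.0, "merlion": 0.5}]
print(history_sequence(turns, paragraphs))  # ['Merlion', 'MRT']
```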
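Sub-sequence kernel
$$K_s(S_1, S_2) = \sum_{u \in A^n} \sum_{\mathbf{i}: u = S_1[\mathbf{i}]} \sum_{\mathbf{j}: u = S_2[\mathbf{j}]} \lambda^{l(\mathbf{i}) + l(\mathbf{j})}$$
where $A^n$ is the set of all possible subsequences of length $n$, $l(\mathbf{i})$ is the span length of the index sequence $\mathbf{i}$, and $\lambda$ is a decay factor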
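The formula can be evaluated directly by enumerating index tuples. This brute-force sketch is exponential in $n$ and only makes the definition concrete; practical systems use the dynamic-programming recursion for string subsequence kernels instead. Representing $S$ as a list of paragraph ids, and the values of `n` and `lam`, are assumptions.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def subsequence_kernel(s1: Sequence[str], s2: Sequence[str], n: int, lam: float) -> float:
    """Gap-weighted subsequence kernel K_s(S1, S2) by direct enumeration.

    For every length-n index tuple i over S1 and j over S2 that spell the same
    subsequence u, add lam ** (l(i) + l(j)), where l(i) = i[-1] - i[0] + 1.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Group the spans of all length-n subsequences of S2 by the subsequence u they spell.
    spans2: Dict[Tuple[str, ...], List[int]] = {}
    for j in combinations(range(len(s2)), n):
        u = tuple(s2[k] for k in j)
        spans2.setdefault(u, []).append(j[-1] - j[0] + 1)
    total = 0.0
    for i in combinations(range(len(s1)), n):
        u = tuple(s1[k] for k in i)
        span_i = i[-1] - i[0] + 1
        for span_j in spans2.get(u, []):
            total += lam ** (span_i + span_j)
    return total


s1 = ["p_merlion", "p_mrt", "p_crab"]
s2 = ["p_merlion", "p_crab"]
print(subsequence_kernel(s1, s2, n=2, lam=0.5))  # 0.03125
```

The gap in `s1` (the intervening `p_mrt`) widens its span to 3, so the shared subsequence is discounted by $\lambda^{3+2}$ rather than $\lambda^{2+2}$.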
Domain Context Tree Kernel
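A tree structure that incorporates various types of domain
knowledge from Wikipedia
Subset tree kernel
$$K_t(T_1, T_2) = \sum_{n_1 \in N_{T_1}} \sum_{n_2 \in N_{T_2}} \Delta(n_1, n_2)$$
where $N_T$ is the set of nodes of tree $T$ and $\Delta(n_1, n_2)$ is the (weighted) number of common subset trees rooted at $n_1$ and $n_2$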
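A sketch of the subset tree kernel using the standard Collins and Duffy recursion for $\Delta$, with an optional decay `lam`. The poster does not give the node labels of the domain context tree, so the trees below are toy examples with hypothetical labels.

```python
from typing import Iterator, Tuple

# A tree is a tuple (label, child_1, ..., child_k); a 1-tuple (label,) is a leaf.
Tree = Tuple


def production(node: Tree) -> Tuple[str, ...]:
    """The production at a node: its label followed by its children's labels."""
    return (node[0],) + tuple(child[0] for child in node[1:])


def nodes(tree: Tree) -> Iterator[Tree]:
    """Pre-order traversal yielding every node of the tree (the set N_T)."""
    yield tree
    for child in tree[1:]:
        yield from nodes(child)


def delta(n1: Tree, n2: Tree, lam: float) -> float:
    """Weighted count of common subset trees rooted at n1 and n2."""
    if len(n1) == 1 or len(n2) == 1:  # leaves do not root any fragment
        return 0.0
    if production(n1) != production(n2):
        return 0.0
    result = lam
    for c1, c2 in zip(n1[1:], n2[1:]):
        # Each child is either cut off (the 1) or expanded into a shared fragment (delta).
        result *= 1.0 + delta(c1, c2, lam)
    return result


def subset_tree_kernel(t1: Tree, t2: Tree, lam: float = 1.0) -> float:
    """K_t(T1, T2): sum of delta over all node pairs."""
    return sum(delta(a, b, lam) for a in nodes(t1) for b in nodes(t2))


t1 = ("S", ("A", ("a",)), ("B", ("b",)))
t2 = ("S", ("A", ("a",)), ("B", ("c",)))
print(subset_tree_kernel(t1, t1))  # 6.0
print(subset_tree_kernel(t1, t2))  # 3.0
```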
Kernel Composition
Linear combination of three kernels
$$K(x_1, x_2) = \alpha \cdot K_l(V_1, V_2) + \beta \cdot K_s(S_1, S_2) + \gamma \cdot K_t(T_1, T_2)$$
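Because the composite kernel is a weighted sum, non-negative weights keep it positive semi-definite whenever the component kernels are, so it can be used wherever a single kernel can. This sketch assumes each example is a dict holding its vector `V`, history sequence `S`, and tree `T`; the equality-based stand-in used for both $K_s$ and $K_t$ in the usage lines is a placeholder only.

```python
from typing import Callable, Dict, List

Kernel = Callable[..., float]


def linear_kernel(v1: Dict[str, float], v2: Dict[str, float]) -> float:
    """K_l: dot product of two sparse feature vectors."""
    return sum(w * v2.get(term, 0.0) for term, w in v1.items())


def composite_kernel(k_l: Kernel, k_s: Kernel, k_t: Kernel,
                     alpha: float, beta: float, gamma: float) -> Kernel:
    """K(x1, x2) = alpha*K_l(V1, V2) + beta*K_s(S1, S2) + gamma*K_t(T1, T2)."""
    if min(alpha, beta, gamma) < 0:
        raise ValueError("weights must be non-negative to keep K positive semi-definite")

    def k(x1: dict, x2: dict) -> float:
        return (alpha * k_l(x1["V"], x2["V"])
                + beta * k_s(x1["S"], x2["S"])
                + gamma * k_t(x1["T"], x2["T"]))
    return k


def gram_matrix(kernel: Kernel, examples: List[dict]) -> List[List[float]]:
    """Pairwise kernel values for a list of examples."""
    return [[kernel(a, b) for b in examples] for a in examples]


# Placeholder stand-in for K_s and K_t: 1.0 for identical structures, else 0.0.
same = lambda a, b: 1.0 if a == b else 0.0
k = composite_kernel(linear_kernel, same, same, alpha=1.0, beta=0.5, gamma=0.25)
x1 = {"V": {"merlion": 2.0}, "S": ("p1", "p2"), "T": "tree-a"}
x2 = {"V": {"merlion": 1.0, "train": 1.0}, "S": ("p1", "p3"), "T": "tree-a"}
print(gram_matrix(k, [x1, x2]))  # [[4.75, 2.25], [2.25, 2.75]]
```

A Gram matrix like this can be passed to an SVM that accepts precomputed kernels (for example, scikit-learn's `SVC(kernel="precomputed")`); the poster does not say which learner or weight values the authors used.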
Experimental Setup
Singapore tour guide dialogs
Human-human mixed-initiative dialogs
35 sessions, 21 hours, 19,651 utterances
Manually annotated with nine topic categories
Wikipedia collection
3,155 articles related to Singapore
Collected from the Wikipedia database dump of February 2013
Models
Baselines: $K_l$ only; $K_l$ with $P$ as features
Proposed approaches: $K_l + K_s$, $K_l + K_t$, $K_l + K_s + K_t$
Measures
5-fold cross-validation on the manual annotations
Turn level: accuracy of the predicted topic label at every turn
Transition level: precision, recall, and F-measure over topic transition events
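A sketch of both measures over one dialog. The matching criterion for transition events is an assumption: a predicted transition counts as correct only at the same turn with the same $(y_{t-1}, y_t)$ pair. Function names are hypothetical.

```python
from typing import List, Set, Tuple

Pair = Tuple[str, str]  # (y_{t-1}, y_t) at one turn


def turn_accuracy(gold: List[Pair], pred: List[Pair]) -> float:
    """Fraction of turns whose predicted topic y_t equals the annotated one."""
    correct = sum(g[1] == p[1] for g, p in zip(gold, pred))
    return correct / len(gold)


def transition_events(pairs: List[Pair]) -> Set[Tuple[int, Pair]]:
    """(turn, pair) for every turn where the topic changes."""
    return {(t, pair) for t, pair in enumerate(pairs) if pair[0] != pair[1]}


def transition_prf(gold: List[Pair], pred: List[Pair]) -> Tuple[float, float, float]:
    """Precision, recall, and F-measure over transition events (assumed exact-match criterion)."""
    gold_events, pred_events = transition_events(gold), transition_events(pred)
    hits = len(gold_events & pred_events)
    precision = hits / len(pred_events) if pred_events else 0.0
    recall = hits / len(gold_events) if gold_events else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


gold = [("NONE", "NONE"), ("NONE", "ATTR"), ("ATTR", "ATTR"), ("ATTR", "TRSP")]
pred = [("NONE", "NONE"), ("NONE", "ATTR"), ("ATTR", "TRSP"), ("TRSP", "TRSP")]
print(turn_accuracy(gold, pred))   # 0.75
print(transition_prf(gold, pred))  # (0.5, 0.5, 0.5)
```

Here the predicted transition to TRSP comes one turn early, which costs both a false positive and a false negative at the transition level while losing only one turn of accuracy.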
Experimental Results
Performance comparison among the models (all values in %)
Model | Turn-level Accuracy | Transition-level P | Transition-level R | Transition-level F
$K_l$ | 62.45 | 42.77 | 24.77 | 31.37
$K_l + P$ | 62.44 | 42.76 | 24.77 | 31.37
$K_l + K_s$ | 67.19 | 39.94 | 40.59 | 40.26
$K_l + K_t$ | 68.54 | 45.55 | 35.69 | 40.02
$K_l + K_s + K_t$ | 69.98 | 44.82 | 39.83 | 42.18
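The transition-level F column is the harmonic mean $F = 2PR/(P + R)$ of the P and R columns. This check recomputes it from the table values (no new data); every row agrees with the reported F to two decimal places.

```python
# (P, R, reported F) from the transition-level columns of the table, in %.
RESULTS = {
    "K_l": (42.77, 24.77, 31.37),
    "K_l + P": (42.76, 24.77, 31.37),
    "K_l + K_s": (39.94, 40.59, 40.26),
    "K_l + K_t": (45.55, 35.69, 40.02),
    "K_l + K_s + K_t": (44.82, 39.83, 42.18),
}


def f_measure(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


for model, (p, r, f) in RESULTS.items():
    print(f"{model}: recomputed F = {f_measure(p, r):.2f}, reported F = {f:.2f}")
```

The check also makes the trade-off visible: $K_l + K_t$ has the highest precision, but $K_l + K_s + K_t$ wins on F through its more balanced recall.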
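Figure: Error distributions of topic transitions. Number of transition errors (y-axis, 0 to 3,000) for $K_l$, $K_l + P$, $K_l + K_s$, $K_l + K_t$, and ALL (x-axis), broken down into FP(SYS), FN(SYS), FP(USR), and FN(USR) (false positives and false negatives for system- and user-initiative transitions).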
1 Fusionopolis Way, #21-01 Connexis (South Tower), Singapore 138632 Email: kims@i2r.a-star.edu.sg WWW: http://hlt.i2r.a-star.edu.sg/