A Crowd-Powered Conversational Assistant That Automates Itself Over Time

CMU LTI PhD Thesis Proposal
Ting-Hao (Kenneth) Huang
January 11th, 2017
Thesis Committee
• Jeffrey P. Bigham, CMU (Chair)
• Alexander I. Rudnicky, CMU
• Chris Callison-Burch, U Penn
• Walter S. Lasecki, U Mich
• Niki Kittur, CMU

Intro | Improving Chorus | Deployment | Automating Over Time
Intelligent Conversational Assistants

Challenges of Open Conversation
• Combining multiple dialog systems
• DialPort (Zhao, et al., 2016)
• Adapting a model to many other domains
• Walker, et al., 2007; Sun, et al., 2016
• Chit-chat system
• Hold social conversations (Banchs, et al., 2012)
• Still a very hard problem…
– Alexa Prize: $2.5 Million
• “… achieves the grand challenge of conversing coherently
and engagingly with humans on popular topics for 20
minutes.”

Chorus: A Crowd-powered
Conversation Assistant
• A group of crowd workers collectively hold a
conversation by:
1. Propose Responses
2. Vote on Responses
3. Take Notes
• Reward points for
each action
Lasecki, W. S.; Wesley, R.; Nichols, J.; Kulkarni, A.; Allen, J. F.; and Bigham, J. P. 2013.
Chorus: A crowd-powered conversational assistant. In UIST 2013, UIST ’13, 151–162.
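
The propose / vote / take-notes loop above can be sketched roughly as follows. This is a minimal illustration, not the Chorus implementation: the point values, the vote threshold, the rule that workers cannot vote for their own proposal, and all class and method names are assumptions made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical values; the slides do not state the actual points or threshold.
POINTS = {"propose": 5, "vote": 2, "note": 3}
VOTE_THRESHOLD = 2  # votes needed before a response is sent to the user


@dataclass
class Candidate:
    text: str
    author: str
    voters: set = field(default_factory=set)


class ChorusSession:
    def __init__(self):
        self.candidates = []
        self.notes = []
        self.sent = []    # responses forwarded to the end user
        self.points = {}  # worker id -> accumulated reward points

    def _reward(self, worker, action):
        self.points[worker] = self.points.get(worker, 0) + POINTS[action]

    def propose(self, worker, text):
        cand = Candidate(text, worker)
        self.candidates.append(cand)
        self._reward(worker, "propose")
        return cand

    def vote(self, worker, cand):
        # Assumption: no self-votes and no duplicate votes.
        if worker == cand.author or worker in cand.voters:
            return False
        cand.voters.add(worker)
        self._reward(worker, "vote")
        if len(cand.voters) >= VOTE_THRESHOLD and cand.text not in self.sent:
            self.sent.append(cand.text)
        return True

    def take_note(self, worker, note):
        self.notes.append(note)
        self._reward(worker, "note")


session = ChorusSession()
cand = session.propose("w1", "Sure! What types of things does your friend like?")
session.vote("w2", cand)
session.vote("w3", cand)
print(session.sent)
```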

What kind of conversations
can Chorus have?

Example: Gift Suggestion
User: Can you suggest some birthday present for one of my friend?
Chorus: Sure! What types of things does your friend like?
User: female, computer science PhD student in Texas
User: we're going to visit her this weekend from Pittsburgh
User: She's in Austin
Chorus: Does she have any favorite TV shows, movies, or video games?

Example: Travel Planning
User: How many suitcases can I take on a flight from the US to Israel?
Chorus: Let me check
Chorus: Can I ask you from where are you planning to board the flight?
Chorus: and which air services are you using?
User: Pittsburgh
Chorus: with which company are you flying?

Example: Arbitrary Question
Chorus: How can I help you?
User: Who was the prime minister of Australia when JFK was assassinated
Chorus: okay wait a sec
Chorus: Let me check it
Chorus: Robert Menzies

A Top-Down Approach
[Figure: a spectrum from Minor Automated Assistance (Chorus) through a Hybrid System to a Fully-Automated System; Cost and Latency both go from High to Low.]

Outline
1. Intro
2. Part I: Improving Chorus
3. Part II: Deployment
4. Part III: Automating Over Time
5. Conclusion

Outline
1. Intro
2. Part I: Improving Chorus
– InstructableCrowd: Creating IF-THEN Rules via Conversation with the Crowd (Huang et al., CHI EA 2016)
3. Part II: Deployment
4. Part III: Automating Over Time
5. Conclusion
Ting-Hao K. Huang, Amos Azaria, Jeffrey P Bigham. InstructableCrowd: Creating IF-THEN Rules
via Conversations with the Crowd. In CHI LBW 2016. (Best Paper Honorable Mention)

A Rule = IF(s) + THEN(s)
Image credits:
https://commons.wikimedia.org/wiki/File:IFTTT_Logo.svg
https://www.flickr.com/photos/cjmartin/9261707401
https://www.flickr.com/photos/paperon/15641138784
https://www.flickr.com/photos/chriscoyier/16673560329
http://www.publicdomainpictures.net/view-image.php?image=23182
IF(s) THEN(s)
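
One way to picture "A Rule = IF(s) + THEN(s)" as a data structure is sketched below. The class names, the attribute layout, and the example weather rule are all hypothetical; the slides only state that a rule combines one or more IFs with one or more THENs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trigger:  # one IF
    app: str
    attributes: tuple  # (name, value) pairs that must all match


@dataclass(frozen=True)
class Action:  # one THEN
    app: str
    attributes: tuple


@dataclass(frozen=True)
class Rule:
    ifs: tuple
    thens: tuple

    def fires(self, events):
        """Return True when every IF is matched by some observed event."""
        return all(
            any(
                event.get("app") == trig.app
                and all(event.get(k) == v for k, v in trig.attributes)
                for event in events
            )
            for trig in self.ifs
        )


# Hypothetical rule: IF it rains THEN send a notification.
rule = Rule(
    ifs=(Trigger("weather", (("condition", "rain"),)),),
    thens=(Action("notification", (("message", "Bring an umbrella"),)),),
)
print(rule.fires([{"app": "weather", "condition": "rain"}]))
```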

InstructableCrowd: Creating IF-THEN
Rules via Conversation with the Crowd

InstructableCrowd Overview

Worker Interface

User Study
• 10 participants, 6 scenarios, 10 workers per trial
• Evaluation
– App Selection (P/R/F1)
– Attribute Filling (Accuracy)
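
The two measures named above could be computed along these lines. This is only a sketch: it assumes app selection is scored as a set comparison against a gold rule and attribute filling as exact-match accuracy over gold attributes, which the slides do not spell out. The function names and sample values are illustrative.

```python
def precision_recall_f1(predicted, gold):
    """Set-based precision, recall and F1 for the apps selected in a rule."""
    predicted, gold = set(predicted), set(gold)
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def attribute_accuracy(predicted, gold):
    """Fraction of gold attributes whose value was filled in exactly."""
    if not gold:
        return 0.0
    correct = sum(1 for name, value in gold.items() if predicted.get(name) == value)
    return correct / len(gold)


p, r, f1 = precision_recall_f1({"weather", "sms"}, {"weather", "email"})
acc = attribute_accuracy({"to": "mom", "body": "hi"}, {"to": "mom", "body": "hello"})
print(p, r, f1, acc)
```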

Evaluation
App Selection (P/R/F1)
Attribute Filling (Accuracy)

What did we learn?
• A crowd-powered conversational interface
can be used to create IF-THEN rules.

Outline
1. Intro
2. Part I: Improving Chorus
3. Part II: Deployment
– Chorus Deployment (Huang et al., HCOMP 2016)
– Chorus Dataset (Proposed)
4. Part III: Automating Over Time
5. Conclusion
Ting-Hao K. Huang, Walter S. Lasecki, Amos Azaria, Jeffrey P. Bigham. "Is there anything else I
can help you with?": Challenges in Deploying an On-Demand Crowd-Powered Conversational
Agent. In Proceedings of Conference on Human Computation & Crowdsourcing (HCOMP 2016),
2016, Austin, TX, USA.

We deployed Chorus
• Launched on May 20th, 2016.
• 132 users used it during 1,028 conversational sessions
• TalkingToTheCrowd.org

System Overview

How to recruit workers
fast on-demand?
• Two Common Practices
– Start recruiting on-demand (Bigham, et al., 2010)
– Keep workers on-call (Retainer) (Bernstein, et al., 2011)
• Both are designed for short tasks

Retainer Model
[Timeline figure: workers wait in the retainer before a conversation starts and again after it ends.]
Workers’ waiting time costs money.

Chorus’ Recruiting Method
[Timeline figure. Labels: Conversation 1, Conversation 2, Post HIT, Fully Occupied, Conv. 1 Ends, Post HIT, Wait in Retainer. Axis: Time.]

Is this recruiting method
fast enough?
• Avg time to first crowd response = 88.351 sec
• 56.08% of first crowd responses arrived within 1 min
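
Both statistics above can be derived from per-session latencies, as in the sketch below. The sample latencies are made up for illustration and are not the deployment data; treating "within 1 min" as "at most 60 seconds" is also an assumption.

```python
from statistics import mean


def first_response_stats(latencies_sec, threshold_sec=60.0):
    """Return (average latency, fraction of sessions answered within the threshold)."""
    if not latencies_sec:
        raise ValueError("need at least one session")
    within = sum(1 for t in latencies_sec if t <= threshold_sec)
    return mean(latencies_sec), within / len(latencies_sec)


# Illustrative latencies only (seconds until the first crowd message).
avg, frac = first_response_stats([30.0, 45.0, 90.0, 155.0])
print(f"avg = {avg:.3f} sec, within 1 min = {frac:.2%}")
```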

What challenges
did we identify?

Challenges Identified
• Malicious workers & users
• Identifying the end of a conversation
• When workers’ consensus is not enough

Malicious Users
• Abusive Language
– Sexual content
– Profanity
– Hate speech
– Threats of criminal acts
• Solutions
– Word detection
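
A minimal sketch of word detection is shown below, assuming a simple blocklist lookup over lowercased tokens. The blocklist entries are placeholders, and the slides do not describe how the deployed filter was actually built.

```python
import re

# Placeholder terms only; a real deployment would load a curated list.
BLOCKLIST = {"badword", "slur"}

_TOKEN = re.compile(r"[a-z']+")


def flag_message(text, blocklist=BLOCKLIST):
    """Return the sorted blocklisted words that appear in a message."""
    tokens = _TOKEN.findall(text.lower())
    return sorted(set(tokens) & set(blocklist))


flagged = flag_message("This is a BADWORD, really")
print(flagged)
```

Exact token matching misses misspellings and obfuscated words, so a filter like this would most likely serve as a first pass rather than a complete solution.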

What did we learn?
• Deploying an on-demand real-time crowd-powered
agent is feasible.
• Basic Statistics
– Avg session duration = 10.493 min
(SD = 14.139 min)
– Avg # messages per session = 17.877
(SD = 24.158)
– Avg cost per conversation = $2.48
(SD = $0.99)

Outline
1. Intro
2. Part I: Improving Chorus
3. Part II: Deployment
– Chorus Deployment (Huang et al., HCOMP 2016)
– Chorus Dataset (Proposed)
4. Part III: Automating Over Time
5. Conclusion

Chorus Dataset (Proposed)
• Goal: Future Automation
– Automatic response generation & selection
– Dialog Learning (state tracking)
• Data
– Message, Vote (upvote / downvote), Note
• Data Pre-processing
– Anonymization
– Inappropriate Content
– Spamming Messages
– Conversation Segmentation
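
To make the data and pre-processing items above concrete, here is a rough sketch of a message record and an anonymization pass. The field names, the placeholder tokens, and the two regular expressions (emails and phone-like digit runs) are assumptions for illustration; the proposed dataset's actual schema and anonymization procedure are not specified in the slides.

```python
import re
from dataclasses import dataclass

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


@dataclass
class MessageRecord:
    session_id: str
    sender: str  # "user" or a worker id (hypothetical convention)
    text: str
    upvotes: int = 0
    downvotes: int = 0


def anonymize(text):
    """Replace email addresses and phone-like numbers with placeholder tokens."""
    text = EMAIL.sub("<EMAIL>", text)
    return PHONE.sub("<PHONE>", text)


record = MessageRecord(
    session_id="s1",
    sender="user",
    text=anonymize("mail me at kenneth@example.com or call 412-555-0100"),
)
print(record.text)
```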

Outline
1. Intro
2. Part I: Improving Chorus
3. Part II: Deployment
4. Part III: Automating Over Time
– Guardian: A Crowd-Powered Dialog System for Web APIs (Huang et al., HCOMP 2015; Huang et al., HCOMP WIP 2014)
– Automate Chorus over time (Proposed)
5. Conclusion
Ting-Hao K. Huang, Walter S. Lasecki, Jeffrey P. Bigham. Guardian: A Crowd-Powered Spoken
Dialog System for Web APIs. In HCOMP 2015.
Ting-Hao K. Huang, Walter S. Lasecki, Alan L. Ritter, Jeffrey P. Bigham. Combining Non-Expert
and Expert Crowd Work to Convert Web APIs to Dialog Systems. HCOMP WIP 2014.

Empower Chorus with
Multiple Dialog Systems

How to build a set of
dialog systems quickly?

Use Web APIs to Empower Chorus
16,583+ APIs

Guardian: A Crowd-Powered Dialog System
for Web APIs
User: Hi, I’m in San Diego. Any Chinese restaurants here?
1. Talk and Extract Parameter
term = Chinese
location = San Diego
2. Call Web API (Yelp Search API 2.0)
JSON: { ... "name": "Mandarin Wok Restaurant",... "address":["4227 Balboa Ave",...], …}
3. Interpret Result to User
Guardian: Mandarin Wok Restaurant is good! It’s on 4227 Balboa Ave.
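
The three Guardian steps on this slide can be sketched end to end as below. Everything here is a stand-in: in Guardian the extraction and interpretation are done by crowd workers, and no real Yelp call is made. The function names, the flat JSON shape (the slide elides the surrounding structure), and the reply wording are assumptions for the example.

```python
import json


def extract_parameters(utterance):
    """Step 1 stand-in: hard-coded result for the slide's example utterance."""
    return {"term": "Chinese", "location": "San Diego"}


def call_web_api(params):
    """Step 2 stand-in: returns canned JSON instead of calling a real API."""
    # Assumed flat shape; only "name" and "address" appear on the slide.
    return json.dumps(
        {"name": "Mandarin Wok Restaurant", "address": ["4227 Balboa Ave"]}
    )


def interpret_result(raw_json):
    """Step 3 stand-in: turn the JSON result into a reply for the user."""
    data = json.loads(raw_json)
    return f"{data['name']} is on {data['address'][0]}."


params = extract_parameters("Hi, I'm in San Diego. Any Chinese restaurants here?")
reply = interpret_result(call_web_api(params))
print(reply)
```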

How to convert a Web API to
a conversational agent?
User: Hi, I’m in San Diego.
API parameters: term, location
• Which parameters to use?
• How to extract parameters?

Select Parameter:
Step (1): Collect Questions
Pipeline: Yelp API → Question Collection
• Q: What do you want to eat? A: I like Chinese food.
• Q: Which city are you in? A: I’m in Pittsburgh.
• Q: Is it dinner or lunch? A: Dinner.
• ...
Select Parameter:
Step (2): Filter Parameters
Pipeline: Yelp API → Question Collection → Parameter Filtering
Parameters shown: offset, term, location, sw_latitude, sw_longitude, category_filter

Select Parameter:
Step (3): Parameter-Question Matching
Pipeline: Yelp API → Question Collection → Parameter Filtering → Question-Parameter Matching
[Figure: collected question-answer pairs are matched to the parameters offset, location, term, and category_filter; sw_latitude and sw_longitude are also listed. Figure label: "Better Parameter".]

Extract Parameters:
Dialog ESP Game
[Figure: recruited players answer under a time constraint; their answers are aggregated into a parameter value, e.g. Location = San Diego.]

Aggregate Method 1: ESP + 1st
[Flowchart with two branches: ESP Answer Matches / ESP Answers do NOT Match]

Aggregate Method 2: 1st Only
[Flowchart with two branches: ESP Answer Matches / ESP Answers do NOT Match]
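
The two aggregation methods can be read roughly as in the sketch below. It assumes an ESP match means two different players giving the same normalized answer, and that "ESP + 1st" falls back to the earliest answer when no match occurs. The flowcharts did not survive extraction, so this interpretation, the normalization step, and the sample answers are assumptions.

```python
def _normalize(value):
    """Lowercase and collapse whitespace so near-identical answers compare equal."""
    return " ".join(value.lower().split())


def esp_match(answers):
    """answers: (time_sec, player, value) tuples collected within the time limit.

    Returns the first value agreed on by two different players, else None.
    """
    seen = {}
    for _, player, value in sorted(answers, key=lambda a: a[0]):
        key = _normalize(value)
        seen.setdefault(key, set()).add(player)
        if len(seen[key]) >= 2:
            return key
    return None


def first_only(answers):
    """'1st Only': take the earliest submitted answer."""
    if not answers:
        return None
    return _normalize(min(answers, key=lambda a: a[0])[2])


def esp_plus_first(answers):
    """'ESP + 1st': use the ESP match if any, otherwise the earliest answer."""
    matched = esp_match(answers)
    return matched if matched is not None else first_only(answers)


answers = [(2.1, "p1", "California"), (3.0, "p2", "San Diego"), (4.5, "p3", "san diego ")]
no_match = [(1.0, "p1", "Austin"), (2.0, "p2", "Dallas")]
print(esp_plus_first(answers), first_only(answers), esp_plus_first(no_match))
```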

Experiment
• Data
– Airline Travel Information System (ATIS)
• Class A: Context Independent (simple)
• Class D: Context Dependent
• Class X: Unevaluable
• Settings
– Number of workers = 10
– Time constraint = 20 seconds
– 2 aggregate methods
– Using Amazon Mechanical Turk

How good? How fast?
[Bar chart: F1-score (0 to 1) on Class A, Class D, and Class X for CRF, 1st only, and ESP + 1st]
[Bar chart: Response Time (sec, 0 to 9) on Class A, Class D, and Class X for 1st only and ESP + 1st]
• F1-score = 0.8 (Class D)
• Response time < 9 sec (ESP + 1st)

End-to-end Evaluation (TCR)
Web API Task | Find Chinese restaurants in Pittsburgh. | Check current weather by using a zip code. | Find information of “Titanic”.
API Only | 9 / 10 | 9 / 10 | 6 / 10
API + Crowd Recover | 10 / 10 | 9 / 10 | 10 / 10
Domain Referenced | 0.96 | 0.94 | 0.88

What did we learn?
• Using a non-expert crowd to convert Web APIs to
dialog systems is feasible.
Define Parameters
Extract Parameters

Outline
1. Intro
2. Part I: Improving Chorus
3. Part II: Deployment
4. Part III: Automating Over Time
– Guardian: A Crowd-Powered Dialog System for Web APIs (Huang et al., HCOMP 2015; Huang et al., HCOMP WIP 2014)
– Automate Chorus over time (Proposed)
5. Conclusion

Empower Chorus with
Multiple Dialog Systems

Initial Chorus

Automatic
Responder Selection
• Label: Upvote/Downvote
• Feature:
• Conversation content
• Previously selected bot
• Generated text
• End-user’s responses
• …
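
As a rough illustration of the setup above, logged turns could be converted into (features, label) pairs in which the label comes from the crowd's votes. The log format, the feature names, and the rule "more upvotes than downvotes means a positive label" are assumptions; the proposal does not fix these details.

```python
def build_examples(log):
    """Turn logged bot responses into (features, label) training pairs."""
    examples = []
    previous_bot = None
    for turn in log:
        features = {
            "context_turns": len(turn["context"]),
            "previous_bot": previous_bot or "none",
            "bot": turn["bot"],
            "response_words": len(turn["response"].split()),
        }
        # Assumed labeling rule: net-positive votes count as an upvote label.
        label = 1 if turn["upvotes"] > turn["downvotes"] else 0
        examples.append((features, label))
        previous_bot = turn["bot"]
    return examples


# Hypothetical log entries for two turns.
log = [
    {"context": ["hi"], "bot": "weather_bot",
     "response": "It is sunny today", "upvotes": 3, "downvotes": 1},
    {"context": ["hi", "It is sunny today", "thanks"], "bot": "chitchat_bot",
     "response": "ok", "upvotes": 0, "downvotes": 2},
]
examples = build_examples(log)
print(examples[1])
```

Pairs in this form could then be passed to any standard classifier; which model to use is left open here.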

Automatic
Response Voting
• Label: Upvote/Downvote
• Feature:
• Conversation content
• Previously selected bot
• Metadata of the bot
• Other response candidates
• …

Adjusting
Worker’s Workload
• Bootstrapping
• Bot vs. workers
• Competing for votes

Preliminary Results
• 3 automatic bots
• Randomly selected each turn

Preliminary Results (Cont.)
Humans are good.
Auto bots receive
many more downvotes.

Conclusion
• What did we do?
– InstructableCrowd (CHI EA 2016)
• Enabling Chorus to create IF-THEN rules
– Chorus Deployment (HCOMP 2016)
• Chorus is deployable
– Guardian (HCOMP 2015; HCOMP WIP 2014)
• Converting a Web API to a dialog system
• What will we do?
– Automate Chorus over time
– Release Chorus dataset

Contributions
[Figure: a spectrum from Minor Automated Assistance through a Hybrid System to a Fully-Automated System, annotated with the contributions: Deploying Chorus, Automate Chorus Over Time, Data for Future Automation.]

Timeline
• January - March 2017: Automating response voting
• March - May 2017: Automating responder selection
• May - September 2017: Automating dynamic workload
assignment
• September - December 2017: Chorus Dataset
• September - December 2017: Thesis writing
• Spring 2018: Thesis Defense

Q&A
TalkingToTheCrowd.org

References
• Zhao, T., Lee, K., & Eskenazi, M. (2016). DialPort: Connecting the Spoken Dialog Research Community to Real User Data. arXiv preprint arXiv:1606.02562.
• Banchs, R. E., & Li, H. (2012, July). IRIS: a chat-oriented dialogue system based on the vector space model. In Proceedings of the ACL 2012 System Demonstrations (pp. 37-42). Association for Computational Linguistics.
• Walker, M. A., Stent, A., Mairesse, F., & Prasad, R. (2007). Individual and domain adaptation in sentence planning for dialogue. Journal of Artificial Intelligence Research, 30, 413-456.
• Sun, M. (2016). Adapting Spoken Dialog Systems Towards Domains and Users. Doctoral dissertation, YAHOO! Research.
• Bigham, J. P., Jayant, C., Ji, H., Little, G., Miller, A., Miller, R. C., ... & Yeh, T. (2010, October). VizWiz: nearly real-time answers to visual questions. In Proceedings of the 23rd annual ACM symposium on User interface software and technology (pp. 333-342). ACM.
• Bernstein, M. S., Brandt, J., Miller, R. C., & Karger, D. R. (2011, October). Crowds in two seconds: Enabling realtime crowd-powered interfaces. In Proceedings of the 24th annual ACM symposium on User interface software and technology (pp. 33-42). ACM.

Publication List
1. "Is there anything else I can help you with?": Challenges in Deploying an On-Demand
Crowd-Powered Conversational Agent
Ting-Hao K. Huang, Walter S. Lasecki, Amos Azaria, Jeffrey P. Bigham.
In Proceedings of Conference on Human Computation & Crowdsourcing (HCOMP 2016),
2016, Austin, TX, USA.
2. Guardian: A Crowd-Powered Spoken Dialog System for Web APIs
Ting-Hao K. Huang, Walter S. Lasecki, Jeffrey P. Bigham.
In Conference on Human Computation & Crowdsourcing (HCOMP 2015), pages 62–71,
November, 2015, San Diego, USA.
3. InstructableCrowd: Creating IF-THEN Rules via Conversations with the Crowd
Ting-Hao K. Huang, Amos Azaria, Jeffrey P Bigham.
In CHI '16 Late-Breaking Work on Human Factors in Computing Systems (CHI LBW
2016), May, 2016, San Jose, CA, USA. (Best Paper Honorable Mention Award)
4. Combining Non-Expert and Expert Crowd Work to Convert Web APIs to Dialog
Systems
Ting-Hao K. Huang, Walter S. Lasecki, Alan L. Ritter, Jeffrey P. Bigham.
Work-in-Progress paper in the Proceedings of Conference on Human Computation &
Crowdsourcing (HCOMP WIP 2014), pages 22-23, November 2-4, 2014, Pittsburgh, USA.

4 Conditions
1. User Only
2. Crowd Only
3. Crowd + User
4. Crowd Voting

Worker Interface

Trade-offs (Class A)
[Plot: Avg. Response Time (sec) vs. # Players for ESP + 1st (20 sec), ESP + 1st (15 sec), 1st (20 sec), and 1st (15 sec). Callout: More Workers, Faster]
[Plot: F1-score vs. # Players for the same four settings. Callout: More Workers, Better Quality]
[Plot: F1-score vs. Avg. Response Time (sec) with 5 to 10 players, for ESP + 1st (20 sec) and 1st Only (20 sec). Callout: Faster, Worse Quality]

Aggregate Method 1: ESP Only
• ESP Answer Matches
• ESP Answers do NOT Match → Empty Label

Evaluation
[Bar chart: MAP and MRR (0 to 1) for Question Matching, Not Unnatural, Ask Siri, and Ask a Friend]
Question Matching outperforms all baselines.
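
For reference, the two ranking metrics in the chart (MAP and MRR) are commonly computed as sketched below. The ranked lists and relevance sets are made up for illustration and are not the evaluation data.

```python
def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked, relevant):
    """Mean of precision@k taken at each rank k where a relevant item appears."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_over_queries(metric, runs):
    """Average a per-query metric over (ranked list, relevant set) pairs."""
    return sum(metric(ranked, relevant) for ranked, relevant in runs) / len(runs)


# Illustrative runs: each pairs a ranked list with its set of relevant items.
runs = [
    (["q1", "q2", "q3"], {"q1"}),
    (["q4", "q5", "q6"], {"q5", "q6"}),
]
mrr = mean_over_queries(reciprocal_rank, runs)
map_score = mean_over_queries(average_precision, runs)
print(mrr, map_score)
```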
