5. Advantages of Multimodality
Task performance and user preference
Migration of Human-Computer Interaction away
from the desktop
Adaptation to the environment
Error recovery and handling
Special situations where mode choice helps
6. Task Performance and User Preference
Task performance and user preference for multimodal over
speech-only interfaces [Oviatt et al., 1997]:
10% faster task completion,
23% fewer words, (Shorter and simpler linguistic constructions)
36% fewer task errors,
35% fewer spoken disfluencies,
90-100% user preference to interact this way.
• Speech-only dialog system
Speech: "Bring the drink on the table to the side of the bed"
• Multimodal dialog system
Speech: "Bring this to here"
Pen gesture: (indicates the object and the target location)
→ Easy, simplified user utterance!
7. Migration of Human-Computer
Interaction away from the desktop
Small portable computing devices
Such as smart-phones
Limited screen real estate for graphical output
Limited input: no keyboard or mouse
Complex GUIs not feasible
Augment limited GUI with natural modalities such as speech and pen
Use less space
Rapid navigation over menu hierarchy
Other devices
Kiosks, car navigation system…
No mouse or keyboard
Speech + pen gesture
8. Adaptation to the environment
Multimodal interfaces enable rapid adaptation to
changes in the environment
Allow user to switch modes
Mobile devices that are used in multiple environments
Environmental conditions can be
Physical
Noise: increases in ambient noise can degrade speech
recognition performance; switch to GUI or stylus/pen input
Brightness: Bright light in outdoor environment can limit
usefulness of graphical display
Social
Speech may be easiest for entering a password, account number, etc., but
in public places users may be uncomfortable being overheard;
switch to GUI or keypad input
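The environment-driven mode switching above can be sketched as a simple policy function. The thresholds, mode names, and function signature below are illustrative assumptions, not from any real system:

```python
# Hypothetical sketch: choosing an input mode from environmental conditions.
# Thresholds and mode names are invented for illustration.

def choose_input_mode(noise_db, lux, in_public, entering_sensitive_data):
    """Pick an input mode given ambient noise, light, and social context."""
    if entering_sensitive_data and in_public:
        return "keypad"          # avoid being overheard
    if noise_db > 70:
        return "stylus"          # speech recognition degrades in noise
    if lux > 50000:
        return "speech"          # bright sunlight washes out the display
    return "speech_plus_gui"     # default multimodal combination

print(choose_input_mode(noise_db=80, lux=300, in_public=False,
                        entering_sensitive_data=False))  # stylus
```

The point of the sketch is that the choice is contextual: the same user falls back to different modes as the physical or social environment changes.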
9. Error Recovery and Handling
Advantages for recovery and reduction of error:
Users intuitively pick the mode that is less error-prone.
Language is often simplified.
Users intuitively switch modes after an error,
so the same problem is not repeated.
Multimodal error correction
Cross-mode compensation - complementarity
Combining inputs from multiple modalities can reduce the
overall error rate
A multimodal interface is therefore potentially more robust than any single mode.
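Cross-mode compensation can be illustrated with a toy late-fusion function: scores from two modalities are combined so that one mode can correct the other. The candidate referents and scores below are invented for illustration:

```python
# Illustrative sketch of cross-mode compensation (late fusion).
# Per-candidate scores from two modalities are multiplied, so a confident
# gesture can override an ambiguous speech hypothesis, and vice versa.

def fuse(speech_scores, gesture_scores):
    """Combine per-candidate scores from two modalities; return the best."""
    candidates = speech_scores.keys() & gesture_scores.keys()
    fused = {c: speech_scores[c] * gesture_scores[c] for c in candidates}
    return max(fused, key=fused.get)

# Speech alone would pick "table"; the pen gesture points at the bed,
# and fusion recovers the intended referent.
speech = {"table": 0.6, "bed": 0.4}
gesture = {"table": 0.1, "bed": 0.9}
print(fuse(speech, gesture))  # bed
```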
10. Special Situations Where Mode Choice Helps
Users with disability
People with a strong accent or a cold
People with RSI
Young children or non-literate users
Other users who have trouble handling the standard
devices: mouse and keyboard
Multimodal interfaces let people choose their
preferred interaction style depending on the actual
task, the context, and their own preferences and
abilities.
13. Outline
1. Design the Final Assembly Task
2. Data Collection
3. Intent Classification
4. Helper Bot
14. 1. DESIGN THE FINAL ASSEMBLY TASK:
COMPONENTS
1. Body
2. Legs
3. Feet
4. Neck
5. Head
6. (Left) Upper Arm
7. (Left) Forearm
8. (Right) Upper Arm
9. (Right) Forearm
(Figure: labeled diagram of the nine components on the robot body)
15. 1. Design the Final Assembly Task:
Final Assembly
Given the 9 main body parts, assemble them into a
full Meccanoid robot
To simplify the assembly of the whole Meccanoid
robot and reduce the cost of collecting data,
we design new assembly steps based on the
original assembly steps
16. 1. Design the Final Assembly Task:
Final Assembly Steps
Step  Component 1      Component 2
1     Legs             Body
2     Feet             Legs
3     Neck             Body
4     Head             Neck
5     (Left) Forearm   (Left) Upper Arm
6     (Right) Forearm  (Right) Upper Arm
7     Arm              Body
17. 2. Data Collection
1. Pilot Study (Pre-data collection)
– observe the interactions among users and the helper agent
2. Question Collection via Crowdsourcing
– collect training data for our intent classifier
18. 2. Data Collection:
Pilot Study on the Workbench
Multi-modal Environment:
• 2 cameras
(2nd & 3rd person perspective, not for
object tracking)
• 1 microphone
• A pair of IMU sensors
(worn on the subject's forearms)
6 IOX summer interns
~50 mins/trial
20. 2. Data Collection:
Question Types
Stage-dependent
Q: Which direction do the screws have to go in? Or doesn't it
matter?
A: (Shows a detail picture) You can check this; the direction
is from top to bottom.
Stage-independent (FAQ)
Q: Is there a better way for locking screws?
A: You can try to use your fingers to lock the screws.
Scenario | Stage/Intent
1. Is there a right direction to lock the screw? | Stage 1: Legs->Body, Intent: screw direction
2. Which holes should I lock the screws into? | Stage 1: Legs->Body, Intent: connection
3. Should I put the nuts on top or bottom? | Stage 1: Legs->Body, Intent: nut position
4. Does the direction I lock the screws matter? | Stage 2: Feet->Legs, Intent: screw direction
5. Do you have any more detail? | Stage 2: Feet->Legs, Intent: connection
6. Is the screw direction from outside to inside? | Stage 3: Neck->Body, Intent: screw direction
7. Can I get a closer picture? | Stage 3: Neck->Body, Intent: connection
8. Does the neck have a forward side or backward side? | Stage 3: Neck->Body, Intent: neck direction
9. Where do I put the screws? Can you zoom? | Stage 4: Head->Neck, Intent: connection
10. Doesn't it matter which one I put the S2 in? | Stage 5&6: Forearm->Upper arm, Intent: S2
11. Where does the S2 go and where does the M2 go? | Stage 5&6: Forearm->Upper arm, Intent: connection
12. Should I embed it? | Stage 5&6: Forearm->Upper arm, Intent: embedding
13. Again only one screw? | Stage 7: Arm->Body, Intent: S2
14. Show me more detail about the joint part. | Stage 7: Arm->Body, Intent: connection
15. I want to check the body to make sure I didn't make a mistake. | Stage 7: Arm->Body, Intent: check
16. Do hex nuts have a front side or back side? | FAQ
17. Should I lock it tightly? | FAQ
18. Is there a better way for locking screws? | FAQ
19. Is it necessary to lock all the holes? | FAQ
20. Can I lay it down to install? | FAQ
21. Should I install wires first? | FAQ
21 scenarios → 13 intents
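One way to read the count: the 7 stage-dependent intent types plus the 6 FAQ questions give the 13 intents. This can be written out as plain Python data; the snake_case identifiers, and the FAQ intent names in particular, are our own labels, not the authors':

```python
# 21 scenarios collapse into 13 intents: 7 stage-dependent intent types
# (shared across stages) plus 6 stage-independent FAQ intents.
# Identifier spellings are our own convention.

STAGE_DEPENDENT_INTENTS = [
    "screw_direction", "connection", "nut_position",
    "neck_direction", "s2", "embedding", "check",
]
FAQ_INTENTS = [  # hypothetical names for the 6 FAQ scenarios
    "hex_nut_orientation", "lock_tightness", "locking_tips",
    "lock_all_holes", "lay_down_install", "wires_first",
]
INTENTS = STAGE_DEPENDENT_INTENTS + FAQ_INTENTS
print(len(INTENTS))  # 13
```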
22. 2. Data Collection:
Question Collection via Crowdsourcing
For training our dialogue model, we must collect data
Collecting dialogues from the workbench is too costly
We design a human intelligence task (HIT) on
Amazon Mechanical Turk (AMT)
21 scenarios, belonging to 7 stages
24. 2. Data Collection:
Collected Questions
Provided Question | Collected Questions
Where do I put the screws? Can you show more details? | How do I know which hole the screw goes in? / Can you change the view so I can see the place to put the screws better?
Is there a right direction to lock the screws? | Does the screw enter from outside or inside? / What direction do I put the screw in from?
Is there a better way to lock the screw? | How can I make the locking of the screws some more easy? / I need some tips to lock the screw efficiently
1670 collected questions from 80 turkers
25. 3. Intent Classification
Given a question, classify it into one of the 13 pre-defined intents
Our approach adds a question reformulator
(QR) to a standard question classifier
26. 3. Intent Classification
Baseline vs. QR-based
Baseline:
Question → Word Embedding Layer → 2-layer Perceptron Classifier (# of hidden units = 100) → Intent
Accuracy = 72%

QR-based:
Question → Word Embedding Layer → Auto-Encoder Question Reformulator (# of hidden units = 100) → 2-layer Perceptron Classifier (# of hidden units = 100) → Intent
Accuracy = 75.5% (+3.5%)
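A minimal runnable sketch of the baseline pipeline: average word embeddings feed a 2-layer perceptron with 100 hidden units. The toy vocabulary, embedding dimension, and random untrained weights are illustrative assumptions, not the authors' exact setup:

```python
# Sketch of the baseline intent classifier: mean word embedding -> 2-layer
# MLP (100 hidden units) -> one of 13 intents. Weights are random and
# untrained; vocabulary and embedding size are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
VOCAB = {"which": 0, "direction": 1, "do": 2, "the": 3, "screws": 4, "go": 5}
EMB_DIM, HIDDEN, N_INTENTS = 50, 100, 13

embeddings = rng.normal(size=(len(VOCAB), EMB_DIM))
W1 = rng.normal(size=(EMB_DIM, HIDDEN)); b1 = np.zeros(HIDDEN)
W2 = rng.normal(size=(HIDDEN, N_INTENTS)); b2 = np.zeros(N_INTENTS)

def classify(question):
    """Embed, average, and run the 2-layer MLP; return the argmax intent id."""
    ids = [VOCAB[w] for w in question.lower().split() if w in VOCAB]
    x = embeddings[ids].mean(axis=0)   # average word embedding
    h = np.maximum(0, x @ W1 + b1)     # ReLU hidden layer
    logits = h @ W2 + b2
    return int(np.argmax(logits))

intent = classify("Which direction do the screws go")
print(0 <= intent < N_INTENTS)  # True
```

The QR variant would insert a trained auto-encoder between the embedding and the classifier to map noisy crowdsourced phrasings toward canonical ones before classification.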
27. 4. Helper Bot
Provide guidance for final assembly
The novice can ask the bot by speaking or typing
The helper bot asks back if some required
information is not given or some ambiguity needs
to be resolved
28. HelpBot System Architecture
Frontend: Recorder, Text to Speech (TTS), Presenter (Text/Picture/Sentiment), Light Sensing
Backend (Server): Speech to Text (ASR), Intent Classifier, Dialogue Manager, Sentiment Analyzer
Implementation legend: Deep learning, Flask, Python, Arduino, Frontend, Backend
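The slides name Flask for the server, so the backend's question endpoint might look like the following hedged sketch. The `/ask` route, `classify_intent` stub, and answer table are our own stand-ins, not the authors' actual components:

```python
# Hypothetical sketch of the HelpBot backend: a Flask endpoint that routes a
# typed or ASR-transcribed question through intent classification and a
# dialogue manager lookup. All names here are illustrative stand-ins.
from flask import Flask, jsonify, request

app = Flask(__name__)

ANSWERS = {  # illustrative intent -> answer table
    "locking_tips": "You can try to use your fingers to lock the screws.",
}

def classify_intent(question):
    """Placeholder for the trained intent classifier."""
    return "locking_tips" if "lock" in question.lower() else "unknown"

@app.route("/ask", methods=["POST"])
def ask():
    question = request.get_json()["question"]
    intent = classify_intent(question)
    answer = ANSWERS.get(intent, "Could you rephrase that?")
    return jsonify({"intent": intent, "answer": answer})

if __name__ == "__main__":
    client = app.test_client()  # exercise the endpoint without a server
    r = client.post("/ask", json={"question": "Is there a better way for locking screws?"})
    print(r.get_json()["intent"])  # locking_tips
```

A real deployment would replace the stub with the trained QR-based classifier and add the ASR, TTS, and sentiment components from the architecture above.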
34. Personal Dialogue Agents
in an ACB Environment
(Figure: the user surrounded by User-Agent1, User-Agent2, User-Agent3 and IoT-Agent1, IoT-Agent2, IoT-Agent3)
35. Data Flow for Personal Dialogue Agent
Semantic Understanding → Interactive Dialogue-Model Learning → Deep Response Generation
(IoT information feeds each stage)
37. Application Scenario:
Personalized Learning at Programming Lab
• Mina DeLuca
– 18 years old
– a CS student taking a Java course
– her personal dialogue agent for programming is Meera
38. Members of Programming Lab Class
(Figure: the four students and their personal agents: Mina with Meera, Robin with Rock, Ema with Emily, Jane with Janet; each student also has OS and ML components, e.g. Mina-OS and Mina-ML)
39. Modalities
• Text
– user input/system output
• Speech
– user input/system output, concentration, emotion
• Video
– lip reading, facial expression, body language, body movement
• Infrared
– head posture, body movement, eye tracking
• Keyboard/Mouse
– status
41. User-Agent Interaction (I)
(Screenshot: Mina's IDE)
Compiling output:
1. [Line 6] error: ';' expected   for (int i=1 to 100)
2. [Line 6] error: ';' expected   for (int i=1 to 100)
Personal Agent [Meera] observes: keyboard & mouse didn't move for 5 mins
42. ……
……
Information Processing Flow
43
Keyboard &
mouse
didn’t move
silence
Leaning
head
Interactive
Dialogue
Modeling
Ask(have
trouble,
debug)
Deep
Response
Generation
Mina, do you
have trouble in
debugging?
Mina’s
Dialogue
History
Mina’s
Social
Data
Mina’s Profile
Semantic
Understanding
Knowledge
Base
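The flow can be sketched end to end in a few lines. The rule conditions, act encoding, and template are our guesses at how Ask(have trouble, debug) might be produced and realized, not the actual learned models:

```python
# Illustrative sketch: multimodal observations -> dialogue act -> utterance.
# Signal names, the act tuple, and the template are invented assumptions.

def understand(observations):
    """Semantic understanding: turn raw signals into a dialogue act."""
    if ("keyboard_mouse_idle" in observations
            and "silence" in observations
            and "leaning_head" in observations):
        return ("ask", "have_trouble", "debug")
    return None

def generate(act, user_name):
    """Deep response generation, here reduced to a simple template."""
    if act and act[0] == "ask" and act[1] == "have_trouble":
        return f"{user_name}, do you have trouble in {act[2]}ging?"
    return None

obs = {"keyboard_mouse_idle", "silence", "leaning_head"}
print(generate(understand(obs), "Mina"))  # Mina, do you have trouble in debugging?
```

In the slides both stages are learned models conditioned on the user's history, profile, and the knowledge base; the rule and template here only fix the interface between them.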
43. User-Agent Interaction (II)
(Screenshot: the same compiler errors remain on Mina's IDE)
PA_Meera: Mina, do you have trouble in debugging?
44. User-Agent Interaction (III)
(Screenshot: the same compiler errors remain on Mina's IDE)
PA_Meera: Mina, do you have trouble in debugging?
Mina: Yes, is there anyone who has done this?
Mina looks for someone's help
45. Interaction among Personal Agents of Classmates
(Figure: the personal agents of the four classmates, Mina/Meera, Robin/Rock, Ema/Emily, and Jane/Janet, exchanging messages)
46. User-Agent Interaction (IV)
(Screenshot: the same compiler errors remain on Mina's IDE)
PA_Meera: Mina, do you have trouble in debugging?
Mina: Yes, is there anyone who has done this?
Dialogue model generates an act: inform(Jane, can_help, you)
47. User-Agent Interaction (V)
(Screenshot: the same compiler errors remain on Mina's IDE)
PA_Meera: Mina, do you have trouble in debugging?
Mina: Yes, is there anyone who has done this?
PA_Meera: Jane is available to help you.
The INFORM act is realized as the utterance "Jane is available to help you".
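Template-based realization of the INFORM act might look like the following sketch; the template table and the act encoding are illustrative assumptions, not the system's actual generator:

```python
# Sketch of surface realization: a dialogue act such as
# inform(Jane, can_help, you) is turned into an utterance via a template.
# The act encoding and template mapping are our own illustration.

TEMPLATES = {
    "inform_can_help": "{helper} is available to help {addressee}.",
}

def realize(act):
    """Turn an act like ("inform", ("Jane", "can_help", "you")) into text."""
    name, args = act
    if name == "inform" and args[1] == "can_help":
        return TEMPLATES["inform_can_help"].format(helper=args[0],
                                                   addressee=args[2])
    raise ValueError(f"no template for act {name}")

print(realize(("inform", ("Jane", "can_help", "you"))))
# Jane is available to help you.
```

The slides' "deep response generation" would learn this mapping rather than hard-code it, but the act-to-utterance contract is the same.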
48. Platform
Personal Dialogue Agent Cloud Engine
Each user's agents run on multiple devices and share the cloud engine:
agents on classroom desktops, home desktops, in cars, and in restaurants
(Mina's agents, Jane's agents, Robin's agents)