5. Advantages of Multimodality
Task performance and user preference
Migration of Human-Computer Interaction away
from the desktop
Adaptation to the environment
Error recovery and handling
Special situations where mode choice helps
6. Task Performance and User Preference
Task performance and user preference for multimodal over
speech-only interfaces [Oviatt et al., 1997]:
10% faster task completion,
23% fewer words, (Shorter and simpler linguistic constructions)
36% fewer task errors,
35% fewer spoken disfluencies,
90-100% user preference to interact this way.
• Speech-only dialog system
Speech: "Bring the drink on the table to the side of the bed"
• Multimodal dialog system
Speech: "Bring this to here"
Pen gesture: (indicates the object and the target location)
→ Easy, simplified user utterance!
7. Migration of Human-Computer
Interaction away from the desktop
Small portable computing devices
Such as smart-phones
Limited screen real estate for graphical output
Limited input: no keyboard or mouse
Complex GUIs not feasible
Augment limited GUI with natural modalities such as speech and pen
Use less space
Rapid navigation over menu hierarchy
Other devices
Kiosks, car navigation system…
No mouse or keyboard
Speech + pen gesture
8. Adaptation to the environment
Multimodal interfaces enable rapid adaptation to
changes in the environment
Allow user to switch modes
Mobile devices that are used in multiple environments
Environmental conditions can be
Physical
Noise: increases in ambient noise can degrade speech
recognition performance; switch to GUI or stylus/pen input
Brightness: Bright light in outdoor environment can limit
usefulness of graphical display
Social
Speech may be easiest for entering a password, account number, etc., but
in public places users may be uncomfortable being overheard;
switch to GUI or keypad input
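The environment-driven mode switching above can be sketched as a simple policy function. The thresholds, mode names, and function signature below are illustrative assumptions, not from any real system:

```python
# Hypothetical sketch: choosing an input mode from environmental conditions.
# Thresholds and mode names are invented for illustration.

def choose_input_mode(noise_db, lux, in_public, entering_sensitive_data):
    """Pick an input mode given ambient noise, light, and social context."""
    if entering_sensitive_data and in_public:
        return "keypad"          # avoid being overheard
    if noise_db > 70:
        return "stylus"          # speech recognition degrades in noise
    if lux > 50000:
        return "speech"          # bright sunlight washes out the display
    return "speech_plus_gui"     # default multimodal combination

print(choose_input_mode(noise_db=80, lux=300, in_public=False,
                        entering_sensitive_data=False))  # stylus
```

The point of the sketch is that the choice is contextual: the same user falls back to different modes as the physical or social environment changes.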
9. Error Recovery and Handling
Advantages for recovery and reduction of error:
Users intuitively pick the mode that is less error-prone.
Language is often simplified.
Users intuitively switch modes after an error,
so the same problem is not repeated.
Multimodal error correction
Cross-mode compensation - complementarity
Combining inputs from multiple modalities can reduce the
overall error rate
A multimodal interface is therefore potentially more robust than any single mode.
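Cross-mode compensation can be illustrated with a toy late-fusion function: scores from two modalities are combined so that one mode can correct the other. The candidate referents and scores below are invented for illustration:

```python
# Illustrative sketch of cross-mode compensation (late fusion).
# Per-candidate scores from two modalities are multiplied, so a confident
# gesture can override an ambiguous speech hypothesis, and vice versa.

def fuse(speech_scores, gesture_scores):
    """Combine per-candidate scores from two modalities; return the best."""
    candidates = speech_scores.keys() & gesture_scores.keys()
    fused = {c: speech_scores[c] * gesture_scores[c] for c in candidates}
    return max(fused, key=fused.get)

# Speech alone would pick "table"; the pen gesture points at the bed,
# and fusion recovers the intended referent.
speech = {"table": 0.6, "bed": 0.4}
gesture = {"table": 0.1, "bed": 0.9}
print(fuse(speech, gesture))  # bed
```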
10. Special Situations Where Mode Choice Helps
Users with disability
People with a strong accent or a cold
People with RSI
Young children or non-literate users
Other users who have trouble handling the standard
devices: mouse and keyboard
Multimodal interfaces let people choose their
preferred interaction style depending on the actual
task, the context, and their own preferences and
abilities.
13. Outline
1. Design the Final Assembly Task
2. Data Collection
3. Intent Classification
4. Helper Bot
14. 1. DESIGN THE FINAL ASSEMBLY TASK:
COMPONENTS
1. Body
2. Legs
3. Feet
4. Neck
5. Head
6. (Left) Upper Arm
7. (Left) Forearm
8. (Right) Upper Arm
9. (Right) Forearm
(Figure: labeled diagram of the nine components on the robot body)
15. 1. Design the Final Assembly Task:
Final Assembly
Given the 9 main body parts, assemble them into a
full Meccanoid robot
To simplify the assembly of the whole Meccanoid
robot and reduce the cost of collecting data,
we design new assembly steps based on the
original assembly steps
16. 1. Design the Final Assembly Task:
Final Assembly Steps
Step  Component 1      Component 2
1     Legs             Body
2     Feet             Legs
3     Neck             Body
4     Head             Neck
5     (Left) Forearm   (Left) Upper Arm
6     (Right) Forearm  (Right) Upper Arm
7     Arm              Body
17. 2. Data Collection
1. Pilot Study (Pre-data collection)
– observe the interactions among users and the helper agent
2. Question Collection via Crowdsourcing
– collect training data for our intent classifier
18. 2. Data Collection:
Pilot Study on the Workbench
Multi-modal Environment:
• 2 cameras
(2nd & 3rd person perspective, not for
object tracking)
• 1 microphone
• A pair of IMU sensors
(worn on the subject's forearms)
6 IOX summer interns
~50 mins/trial
20. 2. Data Collection:
Question Types
Stage-dependent
Q: Which direction do the screws have to go in? Or doesn't it
matter?
A: (Shows a detail picture) You can check this; the direction
is from top to bottom.
Stage-independent (FAQ)
Q: Is there a better way for locking screws?
A: You can try to use your fingers to lock the screws.
Scenario | Stage/Intent
1. Is there a right direction to lock the screw? | Stage 1: Legs->Body, Intent: screw direction
2. Which holes should I lock the screws into? | Stage 1: Legs->Body, Intent: connection
3. Should I put the nuts on top or bottom? | Stage 1: Legs->Body, Intent: nut position
4. Does the direction I lock the screws matter? | Stage 2: Feet->Legs, Intent: screw direction
5. Do you have any more detail? | Stage 2: Feet->Legs, Intent: connection
6. Is the screw direction from outside to inside? | Stage 3: Neck->Body, Intent: screw direction
7. Can I get a closer picture? | Stage 3: Neck->Body, Intent: connection
8. Does the neck have a forward side or backward side? | Stage 3: Neck->Body, Intent: neck direction
9. Where do I put the screws? Can you zoom? | Stage 4: Head->Neck, Intent: connection
10. Doesn't it matter which one I put the S2 in? | Stage 5&6: Forearm->Upper arm, Intent: S2
11. Where does the S2 go and where does the M2 go? | Stage 5&6: Forearm->Upper arm, Intent: connection
12. Should I embed it? | Stage 5&6: Forearm->Upper arm, Intent: embedding
13. Again only one screw? | Stage 7: Arm->Body, Intent: S2
14. Show me more detail about the joint part. | Stage 7: Arm->Body, Intent: connection
15. I want to check the body to make sure I didn't make a mistake. | Stage 7: Arm->Body, Intent: check
16. Do hex nuts have a front side or back side? | FAQ
17. Should I lock it tightly? | FAQ
18. Is there a better way for locking screws? | FAQ
19. Is it necessary to lock all the holes? | FAQ
20. Can I lay it down to install? | FAQ
21. Should I install wires first? | FAQ
21 scenarios → 13 intents
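One way to read the count: the 7 stage-dependent intent types plus the 6 FAQ questions give the 13 intents. This can be written out as plain Python data; the snake_case identifiers, and the FAQ intent names in particular, are our own labels, not the authors':

```python
# 21 scenarios collapse into 13 intents: 7 stage-dependent intent types
# (shared across stages) plus 6 stage-independent FAQ intents.
# Identifier spellings are our own convention.

STAGE_DEPENDENT_INTENTS = [
    "screw_direction", "connection", "nut_position",
    "neck_direction", "s2", "embedding", "check",
]
FAQ_INTENTS = [  # hypothetical names for the 6 FAQ scenarios
    "hex_nut_orientation", "lock_tightness", "locking_tips",
    "lock_all_holes", "lay_down_install", "wires_first",
]
INTENTS = STAGE_DEPENDENT_INTENTS + FAQ_INTENTS
print(len(INTENTS))  # 13
```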
22. 2. Data Collection:
Question Collection via Crowdsourcing
For training our dialogue model, we must collect data
Collecting dialogues from the workbench is too costly
We design a human intelligence task (HIT) on
Amazon Mechanical Turk (AMT)
21 scenarios, belonging to 7 stages
24. 2. Data Collection:
Collected Questions
Provided Question | Collected Questions
Where do I put the screws? Can you show more details? | How do I know which hole the screw goes in? / Can you change the view so I can see the place to put the screws better?
Is there a right direction to lock the screws? | Does the screw enter from outside or inside? / What direction do I put the screw in from?
Is there a better way to lock the screw? | How can I make the locking of the screws some more easy? / I need some tips to lock the screw efficiently
1670 collected questions from 80 turkers
25. 3. Intent Classification
Given a question, classify it into one of the 13 pre-defined intents
Our approach adds a question reformulator
(QR) to a standard question classifier
26. 3. Intent Classification
Baseline vs. QR-based
Baseline:
Question → Word Embedding Layer → 2-layer Perceptron Classifier (# of hidden units = 100) → Intent
Accuracy = 72%

QR-based:
Question → Word Embedding Layer → Auto-Encoder Question Reformulator (# of hidden units = 100) → 2-layer Perceptron Classifier (# of hidden units = 100) → Intent
Accuracy = 75.5% (+3.5%)
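A minimal runnable sketch of the baseline pipeline: average word embeddings feed a 2-layer perceptron with 100 hidden units. The toy vocabulary, embedding dimension, and random untrained weights are illustrative assumptions, not the authors' exact setup:

```python
# Sketch of the baseline intent classifier: mean word embedding -> 2-layer
# MLP (100 hidden units) -> one of 13 intents. Weights are random and
# untrained; vocabulary and embedding size are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
VOCAB = {"which": 0, "direction": 1, "do": 2, "the": 3, "screws": 4, "go": 5}
EMB_DIM, HIDDEN, N_INTENTS = 50, 100, 13

embeddings = rng.normal(size=(len(VOCAB), EMB_DIM))
W1 = rng.normal(size=(EMB_DIM, HIDDEN)); b1 = np.zeros(HIDDEN)
W2 = rng.normal(size=(HIDDEN, N_INTENTS)); b2 = np.zeros(N_INTENTS)

def classify(question):
    """Embed, average, and run the 2-layer MLP; return the argmax intent id."""
    ids = [VOCAB[w] for w in question.lower().split() if w in VOCAB]
    x = embeddings[ids].mean(axis=0)   # average word embedding
    h = np.maximum(0, x @ W1 + b1)     # ReLU hidden layer
    logits = h @ W2 + b2
    return int(np.argmax(logits))

intent = classify("Which direction do the screws go")
print(0 <= intent < N_INTENTS)  # True
```

The QR variant would insert a trained auto-encoder between the embedding and the classifier to map noisy crowdsourced phrasings toward canonical ones before classification.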
27. 4. Helper Bot
Provide guidance for final assembly
The novice can ask the bot by speaking or typing
The helper bot asks back if some required
information is not given or some ambiguity needs
to be resolved
28. HelpBot System Architecture
Frontend: Recorder, Text to Speech (TTS), Presenter (Text/Picture/Sentiment), Light Sensing
Backend (Server): Speech to Text (ASR), Intent Classifier, Dialogue Manager, Sentiment Analyzer
Implementation legend: Deep learning, Flask, Python, Arduino, Frontend, Backend
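The slides name Flask for the server, so the backend's question endpoint might look like the following hedged sketch. The `/ask` route, `classify_intent` stub, and answer table are our own stand-ins, not the authors' actual components:

```python
# Hypothetical sketch of the HelpBot backend: a Flask endpoint that routes a
# typed or ASR-transcribed question through intent classification and a
# dialogue manager lookup. All names here are illustrative stand-ins.
from flask import Flask, jsonify, request

app = Flask(__name__)

ANSWERS = {  # illustrative intent -> answer table
    "locking_tips": "You can try to use your fingers to lock the screws.",
}

def classify_intent(question):
    """Placeholder for the trained intent classifier."""
    return "locking_tips" if "lock" in question.lower() else "unknown"

@app.route("/ask", methods=["POST"])
def ask():
    question = request.get_json()["question"]
    intent = classify_intent(question)
    answer = ANSWERS.get(intent, "Could you rephrase that?")
    return jsonify({"intent": intent, "answer": answer})

if __name__ == "__main__":
    client = app.test_client()  # exercise the endpoint without a server
    r = client.post("/ask", json={"question": "Is there a better way for locking screws?"})
    print(r.get_json()["intent"])  # locking_tips
```

A real deployment would replace the stub with the trained QR-based classifier and add the ASR, TTS, and sentiment components from the architecture above.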
34. Personal Dialogue Agents
in an ACB Environment
(Figure: the user surrounded by User-Agent1, User-Agent2, User-Agent3 and IoT-Agent1, IoT-Agent2, IoT-Agent3)
35. Data Flow for Personal Dialogue Agent
Semantic Understanding → Interactive Dialogue-Model Learning → Deep Response Generation
(IoT information feeds each stage)
37. Application Scenario:
Personalized Learning at Programming Lab
• Mina DeLuca
– 18 years old
– a CS student taking a Java course
– her personal dialogue agent for programming is Meera
38. Members of Programming Lab Class
(Figure: the four students and their personal agents: Mina with Meera, Robin with Rock, Ema with Emily, Jane with Janet; each student also has OS and ML components, e.g. Mina-OS and Mina-ML)
39. Modalities
• Text
– user input/system output
• Speech
– user input/system output, concentration, emotion
• Video
– lip reading, facial expression, body language, body movement
• Infrared
– head posture, body movement, eye tracking
• Keyboard/Mouse
– status
41. User-Agent Interaction (I)
(Screenshot: Mina's IDE)
Compiling output:
1. [Line 6] error: ';' expected   for (int i=1 to 100)
2. [Line 6] error: ';' expected   for (int i=1 to 100)
Personal Agent [Meera] observes: keyboard & mouse didn't move for 5 mins
42. ……
……
Information Processing Flow
43
Keyboard &
mouse
didn’t move
silence
Leaning
head
Interactive
Dialogue
Modeling
Ask(have
trouble,
debug)
Deep
Response
Generation
Mina, do you
have trouble in
debugging?
Mina’s
Dialogue
History
Mina’s
Social
Data
Mina’s Profile
Semantic
Understanding
Knowledge
Base
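The flow can be sketched end to end in a few lines. The rule conditions, act encoding, and template are our guesses at how Ask(have trouble, debug) might be produced and realized, not the actual learned models:

```python
# Illustrative sketch: multimodal observations -> dialogue act -> utterance.
# Signal names, the act tuple, and the template are invented assumptions.

def understand(observations):
    """Semantic understanding: turn raw signals into a dialogue act."""
    if ("keyboard_mouse_idle" in observations
            and "silence" in observations
            and "leaning_head" in observations):
        return ("ask", "have_trouble", "debug")
    return None

def generate(act, user_name):
    """Deep response generation, here reduced to a simple template."""
    if act and act[0] == "ask" and act[1] == "have_trouble":
        return f"{user_name}, do you have trouble in {act[2]}ging?"
    return None

obs = {"keyboard_mouse_idle", "silence", "leaning_head"}
print(generate(understand(obs), "Mina"))  # Mina, do you have trouble in debugging?
```

In the slides both stages are learned models conditioned on the user's history, profile, and the knowledge base; the rule and template here only fix the interface between them.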
43. User-Agent Interaction (II)
(Screenshot: the same compiler errors remain on Mina's IDE)
PA_Meera: Mina, do you have trouble in debugging?
44. User-Agent Interaction (III)
(Screenshot: the same compiler errors remain on Mina's IDE)
PA_Meera: Mina, do you have trouble in debugging?
Mina: Yes, is there anyone who has done this?
Mina looks for someone's help
45. Interaction among Personal Agents of Classmates
(Figure: the personal agents of the four classmates, Mina/Meera, Robin/Rock, Ema/Emily, and Jane/Janet, exchanging messages)
46. User-Agent Interaction (IV)
(Screenshot: the same compiler errors remain on Mina's IDE)
PA_Meera: Mina, do you have trouble in debugging?
Mina: Yes, is there anyone who has done this?
Dialogue model generates an act: inform(Jane, can_help, you)
47. User-Agent Interaction (V)
(Screenshot: the same compiler errors remain on Mina's IDE)
PA_Meera: Mina, do you have trouble in debugging?
Mina: Yes, is there anyone who has done this?
PA_Meera: Jane is available to help you.
The INFORM act is realized as the utterance "Jane is available to help you".
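Template-based realization of the INFORM act might look like the following sketch; the template table and the act encoding are illustrative assumptions, not the system's actual generator:

```python
# Sketch of surface realization: a dialogue act such as
# inform(Jane, can_help, you) is turned into an utterance via a template.
# The act encoding and template mapping are our own illustration.

TEMPLATES = {
    "inform_can_help": "{helper} is available to help {addressee}.",
}

def realize(act):
    """Turn an act like ("inform", ("Jane", "can_help", "you")) into text."""
    name, args = act
    if name == "inform" and args[1] == "can_help":
        return TEMPLATES["inform_can_help"].format(helper=args[0],
                                                   addressee=args[2])
    raise ValueError(f"no template for act {name}")

print(realize(("inform", ("Jane", "can_help", "you"))))
# Jane is available to help you.
```

The slides' "deep response generation" would learn this mapping rather than hard-code it, but the act-to-utterance contract is the same.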
48. Platform
Personal Dialogue Agent Cloud Engine
Each user's agents run on multiple devices and share the cloud engine:
agents on classroom desktops, home desktops, in cars, and in restaurants
(Mina's agents, Jane's agents, Robin's agents)