Enhancing Mobile Voice Assistants
with WorldGaze
Sven Mayer, Gierad Laput, Chris Harrison
CHI 2020 Paper
Contents
• Introduction
• Related Work
• System Design
• Exploratory Study
• Implementation
• Evaluation
• Example Uses
• Limitation
Introduction
• Each major smartphone has its own voice assistant
  • Apple Siri
  • Samsung Bixby
  • Google Assistant
• They help us with our daily tasks
  • Calculating
  • Setting an alarm
  • Giving us weather information
  • Giving us opening hours for a specific restaurant
(Figure: Apple Siri, Google Assistant)
• Weakness of voice assistants
  • Lack of contextual information about the surroundings
  • Siri shows a list of possible options for Starbucks even while the user is standing right in front of a Starbucks
• What we need…
  • Contextual information about the user’s surroundings
  • A richer interaction between the user and the voice assistant
“When does this place close?” → “When does Oakland Fashion Optical close?”
Related Work
• Multimodal Interaction
  ▪ Combining pen and finger input on touchscreens
    • Cami et al. "Unimanual Pen+Touch Input Using Variations of Precision Grip Postures." UIST. 2018.
    • Hinckley et al. "Pen + touch = new tools." UIST. 2010.
  ▪ Combining touch and gaze for enhanced selection
    • Pfeuffer et al. "Gaze-touch: combining gaze with multi-touch for interaction on the same surface." UIST. 2014.
• Gaze Pointing
  ▪ Mid-air pointing
    • Mayer et al. "Modeling distant pointing for compensating systematic displacements." CHI. 2015.
    • Mayer et al. "The effect of offset correction and cursor on mid-air pointing in real and virtual environments." CHI. 2018.
  ▪ Eye tracking
    • Zhai et al. "Manual and gaze input cascaded (MAGIC) pointing." CHI. 1999.
• Geospatial Mobile Interactions
  ▪ GPS and Wi-Fi localization
• Object Context + Voice Interactions
  ▪ Gaze and voice combined systems
    • Glenn III et al. "Eye-voice-controlled interface." HFES. 1986.
    • Koons et al. "Integrating simultaneous input from speech, gaze, and hand gestures." Intelligent multimedia interfaces. 1993.
System Design
• Rear camera
  ▪ Knowledge about the world
  ▪ Retrieves more information about the user’s surroundings
  ▪ Understands the user’s surroundings better
• Front camera
  ▪ Knowledge about the user’s gaze
  ▪ What the user is looking at, as displayed within the camera’s viewport
• WorldGaze
  ▪ Extracts where the user is actually looking
  ▪ Retrieves the place or object (business logos or signage)
  ▪ Feeds it into the voice assistant as a context-rich inquiry
  ▪ Extra contextual information is added to the inquiry automatically
WorldGaze makes the overall experience feel more natural to the user
Exploratory Study
• Wizard-of-Oz analysis
  ▪ Comparing: Touch, Voice, and WorldGaze
  ▪ Task: retrieving information (e.g. opening hours, ratings, phone numbers)
  ▪ Participants: 12 (9 male, 3 female), mean age 25.5 years (SD = 3.3)
• Conditions
  ▪ Touch: use Google Maps to query information
  ▪ Voice: Wizard-of-Oz voice assistant (triggered by “Hey Siri”) that always returned the correct answer
  ▪ WorldGaze: the voice assistant similarly returned the answer
• Feedback
  ▪ System Usability Scale (SUS), 10 items on a 5-point Likert scale
  ▪ Raw NASA TLX questionnaire, 6 items on a 21-point Likert scale
  ▪ Future use desirability, 7-point Likert scale
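The two questionnaires above are usually scored as follows. This is a minimal sketch using the conventional SUS formula and the unweighted (raw) TLX mean; the function names are hypothetical, and the 1 to 21 range for TLX ratings is an assumption based on the 21-point scale mentioned above, not a detail taken from the paper.

```python
def sus_score(responses):
    """Conventional SUS scoring: ten 1-5 Likert responses -> score in 0..100."""
    if len(responses) != 10 or any(not 1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs ten responses in the range 1..5")
    total = 0
    for index, response in enumerate(responses):
        if index % 2 == 0:
            # Items 1, 3, 5, 7, 9 are positively worded: contribution is response - 1.
            total += response - 1
        else:
            # Items 2, 4, 6, 8, 10 are negatively worded: contribution is 5 - response.
            total += 5 - response
    return total * 2.5


def raw_tlx(ratings):
    """Raw TLX: unweighted mean of the six subscale ratings.

    Assumes each rating is on the 21-point scale (1..21); the result stays
    on that same scale.
    """
    if len(ratings) != 6 or any(not 1 <= r <= 21 for r in ratings):
        raise ValueError("Raw TLX needs six ratings in the range 1..21")
    return sum(ratings) / 6
```

For SUS a higher score is better; for raw TLX a lower score means lower task load.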
• Quantitative feedback
(Figure: result plots; panel notes read "Lower is better", "Lower is better", "Higher is better")
  ▪ SUS: no significant difference was found
  ▪ TLX: Touch had a significantly higher task load than Voice and WorldGaze
  ▪ Future use: no significant difference among the three conditions
  ▪ WorldGaze requires fewer words to be articulated, and utterance duration is shorter
• Qualitative feedback
WorldGaze is faster – or it feels faster anyway – less frustrating
WorldGaze would be useful to have (P5)
Implicit input with WorldGaze would be striking (P9)
WorldGaze offers a lot of potential for future interaction paradigms
• Use Scenarios
  ▪ Asking questions about products in stores or menu items in restaurants
  ▪ Interacting with smart home objects, such as controlling the TV or lighting
  ▪ Navigation support in museums
  ▪ Desktop computer interaction
• New Interactions
  ▪ Integration into smart glasses was suggested by 6 participants
  ▪ Camera-equipped smart devices (Facebook Portal, Google Nest Hub)
  ▪ Comparing multiple objects or places
Implementation
• Platform Selection
  ▪ iPhone, iOS 13.0
    • The only mobile OS permitting front and back cameras to be opened simultaneously
    • Tested with iPhone XR
• Head Gaze Ray Casting
  ▪ Robust face API provided by the Apple ARKit 3 SDK
  ▪ The forward-facing head vector (GazeVector) is used to extend a ray out from the bridge of the nose
  ▪ Runs at 30 FPS with ~50 ms of latency on an iPhone XR
• Object Recognition & Segmentation
  ▪ Apple’s Vision Framework
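The head gaze ray casting step can be illustrated with plain geometry. This is a rough sketch, not the authors' implementation and not ARKit code: it assumes a single device coordinate frame (x right, y up, z pointing out of the back of the phone), a rear camera at the origin with a pinhole model, and hypothetical intrinsics `fx`, `fy`, `cx`, `cy`. All function names are made up for illustration.

```python
import math


def normalize(vector):
    """Return the unit-length version of a 3D vector."""
    length = math.sqrt(sum(c * c for c in vector))
    if length == 0:
        raise ValueError("zero-length vector")
    return tuple(c / length for c in vector)


def point_on_ray(origin, direction, distance):
    """Point reached after travelling `distance` metres along the ray."""
    unit = normalize(direction)
    return tuple(o + distance * c for o, c in zip(origin, unit))


def project_to_rear_image(point, fx, fy, cx, cy):
    """Pinhole projection into the rear camera image (pixel y grows downward).

    Returns None when the point lies behind the rear camera.
    """
    x, y, z = point
    if z <= 0:
        return None
    return (cx + fx * x / z, cy - fy * y / z)


def gaze_pixel(nose_bridge, gaze_vector, distance, fx, fy, cx, cy):
    """Pixel in the rear image hit by the head ray at an assumed distance."""
    target = point_on_ray(nose_bridge, gaze_vector, distance)
    return project_to_rear_image(target, fx, fy, cx, cy)


# Head 40 cm in front of the screen (negative z), looking straight past the phone.
pixel = gaze_pixel((0.0, 0.0, -0.4), (0.0, 0.0, 1.0), 5.0,
                   fx=1500.0, fy=1500.0, cx=960.0, cy=540.0)
```

The resulting pixel could then be compared against the bounding boxes of recognized objects to decide which one the user is addressing.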
• Voice Assistant Integration
  ▪ Continuous listening feature on iOS combined with speech-to-text
  ▪ Listens for “Hey Siri”
  ▪ Searches the transcribed string for ambiguous nouns (e.g. “this”, “that place”) and replaces them with the name of the gazed-at target
• Battery Life Implications
  ▪ Integrated as a background service that wakes upon a voice assistant trigger
  ▪ Estimated energy consumption of ~0.1 mWh per inquiry, measured using bench equipment
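The replacement of ambiguous nouns in the voice assistant integration step can be sketched with a regular expression. This is a simplified illustration, not the authors' code: the phrase list `AMBIGUOUS_PHRASES` and the function `resolve_query` are hypothetical, and only the examples “this” and “that place” come from the slides.

```python
import re

# Hypothetical phrase list; longer phrases come first so that
# "this place" is matched before the bare "this".
AMBIGUOUS_PHRASES = ["this place", "that place", "this", "that"]

_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(p) for p in AMBIGUOUS_PHRASES) + r")\b",
    re.IGNORECASE,
)


def resolve_query(transcript, gazed_label):
    """Replace ambiguous phrases in the transcript with the gazed-at label.

    Returns the transcript unchanged when no label is available.
    """
    if not gazed_label:
        return transcript
    # A function replacement avoids interpreting backslashes in the label.
    return _PATTERN.sub(lambda match: gazed_label, transcript)


rewritten = resolve_query("When does this place close?", "Oakland Fashion Optical")
```

Such a simple pattern also rewrites “that” when it is used as a conjunction, and it gives every match the same label, so a query comparing two objects would need per-word gaze timestamps.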
Video
Evaluation
• Evaluation of tracking and targeting performance
  ▪ Participants: 12 (9 male, 3 female), mean age 28.9 years (SD = 5.8)
  ▪ Distance had a statistically significant influence on error
  ▪ Impact of horizontal and vertical accuracy on error
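One geometric reason distance can matter is that a fixed angular error in the head vector turns into a larger positional offset the farther away the target is. The sketch below only illustrates that relationship; it is an assumption for illustration and does not reproduce the paper's analysis or its measured values. The function name is hypothetical.

```python
import math


def offset_at_distance(angular_error_deg, distance_m):
    """Positional offset (metres) caused by an angular error at a given distance."""
    return distance_m * math.tan(math.radians(angular_error_deg))


# The same angular error produces twice the offset at twice the distance.
near = offset_at_distance(5.0, 2.0)
far = offset_at_distance(5.0, 4.0)
```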
Example Uses
• Streetscapes
  • “When does this open?”
  • “What is the rating for this place?”
  • “Make me a reservation for 2 at 7 pm”
• Retail
  • “Does this come in any other colors?”
  • “Add this to my wishlist”
  • “What is the price difference between this... and this?”
• Smart Homes and Offices
  • Say “on” to lights or a TV
  • Say “down” to a TV or thermostat
Limitation
• Wider-angle lenses would make more of the world gaze-addressable
• Accuracy of the gaze vector
  ▪ Numerous state-of-the-art algorithms were tested but found severely lacking for these use cases
Criticism
• The user MUST see the screen while using the voice assistant
  ▪ Qiaohui Zhang, Atsumi Imamiya, Kentaro Go, and Xiaoyang Mao. 2004. Resolving ambiguities of a gaze and speech interface. In Proceedings of the Symposium on Eye Tracking Research & Applications (ETRA ’04).
• Usage patterns of voice assistants
• How does the system detect a restaurant that does not have a logo?
• Accessibility
  ▪ People with low vision
• Social acceptance and privacy
  ▪ People may think the user is recording them
Conclusion
WorldGaze to enhance voice assistants
Exploration of the possibilities of WorldGaze
Implementation of WorldGaze
Use cases to showcase enhanced assistants
Thank you
Any questions?

Mais conteúdo relacionado

Semelhante a [Seminar] 200904 Seunghyeong Choe

COMP 4010 Lecture12 - Research Directions in AR and VR
COMP 4010 Lecture12 - Research Directions in AR and VRCOMP 4010 Lecture12 - Research Directions in AR and VR
COMP 4010 Lecture12 - Research Directions in AR and VRMark Billinghurst
 
COMP 4010 Lecture10 AR/VR Research Directions
COMP 4010 Lecture10 AR/VR Research DirectionsCOMP 4010 Lecture10 AR/VR Research Directions
COMP 4010 Lecture10 AR/VR Research DirectionsMark Billinghurst
 
VSMM 2016 Keynote: Using AR and VR to create Empathic Experiences
VSMM 2016 Keynote: Using AR and VR to create Empathic ExperiencesVSMM 2016 Keynote: Using AR and VR to create Empathic Experiences
VSMM 2016 Keynote: Using AR and VR to create Empathic ExperiencesMark Billinghurst
 
Novel Interfaces for AR Systems
Novel Interfaces for AR SystemsNovel Interfaces for AR Systems
Novel Interfaces for AR SystemsMark Billinghurst
 
Multimodal Multi-sensory Interaction for Mixed Reality
Multimodal Multi-sensory Interaction for Mixed RealityMultimodal Multi-sensory Interaction for Mixed Reality
Multimodal Multi-sensory Interaction for Mixed RealityMark Billinghurst
 
COMP 4010 Lecture12 Research Directions in AR
COMP 4010 Lecture12 Research Directions in ARCOMP 4010 Lecture12 Research Directions in AR
COMP 4010 Lecture12 Research Directions in ARMark Billinghurst
 
Beyond Reality (2027): The Future of Virtual and Augmented Reality
Beyond Reality (2027): The Future of Virtual and Augmented RealityBeyond Reality (2027): The Future of Virtual and Augmented Reality
Beyond Reality (2027): The Future of Virtual and Augmented RealityMark Billinghurst
 
UCD from across the pond - A case study in remote UX
UCD from across the pond - A case study in remote UXUCD from across the pond - A case study in remote UX
UCD from across the pond - A case study in remote UXNeil Turner
 
Future Directions for Augmented Reality
Future Directions for Augmented RealityFuture Directions for Augmented Reality
Future Directions for Augmented RealityMark Billinghurst
 
Empathic Computing: New Approaches to Gaming
Empathic Computing: New Approaches to GamingEmpathic Computing: New Approaches to Gaming
Empathic Computing: New Approaches to GamingMark Billinghurst
 
Mobile AR lecture 9 - Mobile AR Interface Design
Mobile AR lecture 9 - Mobile AR Interface DesignMobile AR lecture 9 - Mobile AR Interface Design
Mobile AR lecture 9 - Mobile AR Interface DesignMark Billinghurst
 
Human Computer Interaction: Academia and Industry
Human Computer Interaction: Academia and IndustryHuman Computer Interaction: Academia and Industry
Human Computer Interaction: Academia and Industrystudiotelon
 
Collaborative Immersive Analytics
Collaborative Immersive AnalyticsCollaborative Immersive Analytics
Collaborative Immersive AnalyticsMark Billinghurst
 
Context-Awareness & Occupancy/Traffic Monitoring
Context-Awareness & Occupancy/Traffic MonitoringContext-Awareness & Occupancy/Traffic Monitoring
Context-Awareness & Occupancy/Traffic Monitoringsamserpoosh
 
Was it worth the hassle? mobile hci 2014
Was it worth the hassle? mobile hci 2014Was it worth the hassle? mobile hci 2014
Was it worth the hassle? mobile hci 2014Jesper Kjeldskov
 
Pre-Conference Course: Wearables Workshop: UX Essentials - Phillip Likens
Pre-Conference Course: Wearables Workshop: UX Essentials - Phillip LikensPre-Conference Course: Wearables Workshop: UX Essentials - Phillip Likens
Pre-Conference Course: Wearables Workshop: UX Essentials - Phillip LikensUXPA International
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Spatial Audio for Augmented Reality
Spatial Audio for Augmented RealitySpatial Audio for Augmented Reality
Spatial Audio for Augmented RealityMark Billinghurst
 
Evaluation Methods for Social XR Experiences
Evaluation Methods for Social XR ExperiencesEvaluation Methods for Social XR Experiences
Evaluation Methods for Social XR ExperiencesMark Billinghurst
 

Semelhante a [Seminar] 200904 Seunghyeong Choe (20)

COMP 4010 Lecture12 - Research Directions in AR and VR
COMP 4010 Lecture12 - Research Directions in AR and VRCOMP 4010 Lecture12 - Research Directions in AR and VR
COMP 4010 Lecture12 - Research Directions in AR and VR
 
COMP 4010 Lecture10 AR/VR Research Directions
COMP 4010 Lecture10 AR/VR Research DirectionsCOMP 4010 Lecture10 AR/VR Research Directions
COMP 4010 Lecture10 AR/VR Research Directions
 
VSMM 2016 Keynote: Using AR and VR to create Empathic Experiences
VSMM 2016 Keynote: Using AR and VR to create Empathic ExperiencesVSMM 2016 Keynote: Using AR and VR to create Empathic Experiences
VSMM 2016 Keynote: Using AR and VR to create Empathic Experiences
 
Novel Interfaces for AR Systems
Novel Interfaces for AR SystemsNovel Interfaces for AR Systems
Novel Interfaces for AR Systems
 
Multimodal Multi-sensory Interaction for Mixed Reality
Multimodal Multi-sensory Interaction for Mixed RealityMultimodal Multi-sensory Interaction for Mixed Reality
Multimodal Multi-sensory Interaction for Mixed Reality
 
COMP 4010 Lecture12 Research Directions in AR
COMP 4010 Lecture12 Research Directions in ARCOMP 4010 Lecture12 Research Directions in AR
COMP 4010 Lecture12 Research Directions in AR
 
ISS2022 Keynote
ISS2022 KeynoteISS2022 Keynote
ISS2022 Keynote
 
Beyond Reality (2027): The Future of Virtual and Augmented Reality
Beyond Reality (2027): The Future of Virtual and Augmented RealityBeyond Reality (2027): The Future of Virtual and Augmented Reality
Beyond Reality (2027): The Future of Virtual and Augmented Reality
 
UCD from across the pond - A case study in remote UX
UCD from across the pond - A case study in remote UXUCD from across the pond - A case study in remote UX
UCD from across the pond - A case study in remote UX
 
Future Directions for Augmented Reality
Future Directions for Augmented RealityFuture Directions for Augmented Reality
Future Directions for Augmented Reality
 
Empathic Computing: New Approaches to Gaming
Empathic Computing: New Approaches to GamingEmpathic Computing: New Approaches to Gaming
Empathic Computing: New Approaches to Gaming
 
Mobile AR lecture 9 - Mobile AR Interface Design
Mobile AR lecture 9 - Mobile AR Interface DesignMobile AR lecture 9 - Mobile AR Interface Design
Mobile AR lecture 9 - Mobile AR Interface Design
 
Human Computer Interaction: Academia and Industry
Human Computer Interaction: Academia and IndustryHuman Computer Interaction: Academia and Industry
Human Computer Interaction: Academia and Industry
 
Collaborative Immersive Analytics
Collaborative Immersive AnalyticsCollaborative Immersive Analytics
Collaborative Immersive Analytics
 
Context-Awareness & Occupancy/Traffic Monitoring
Context-Awareness & Occupancy/Traffic MonitoringContext-Awareness & Occupancy/Traffic Monitoring
Context-Awareness & Occupancy/Traffic Monitoring
 
Was it worth the hassle? mobile hci 2014
Was it worth the hassle? mobile hci 2014Was it worth the hassle? mobile hci 2014
Was it worth the hassle? mobile hci 2014
 
Pre-Conference Course: Wearables Workshop: UX Essentials - Phillip Likens
Pre-Conference Course: Wearables Workshop: UX Essentials - Phillip LikensPre-Conference Course: Wearables Workshop: UX Essentials - Phillip Likens
Pre-Conference Course: Wearables Workshop: UX Essentials - Phillip Likens
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Spatial Audio for Augmented Reality
Spatial Audio for Augmented RealitySpatial Audio for Augmented Reality
Spatial Audio for Augmented Reality
 
Evaluation Methods for Social XR Experiences
Evaluation Methods for Social XR ExperiencesEvaluation Methods for Social XR Experiences
Evaluation Methods for Social XR Experiences
 

Mais de ivaderivader

DDGK: Learning Graph Representations for Deep Divergence Graph Kernels
DDGK: Learning Graph Representations for Deep Divergence Graph KernelsDDGK: Learning Graph Representations for Deep Divergence Graph Kernels
DDGK: Learning Graph Representations for Deep Divergence Graph Kernelsivaderivader
 
So Predictable! Continuous 3D Hand Trajectory Prediction in Virtual Reality
So Predictable! Continuous 3D Hand Trajectory Prediction in Virtual Reality So Predictable! Continuous 3D Hand Trajectory Prediction in Virtual Reality
So Predictable! Continuous 3D Hand Trajectory Prediction in Virtual Reality ivaderivader
 
Reinforcement Learning-based Placement of Charging Stations in Urban Road Net...
Reinforcement Learning-based Placement of Charging Stations in Urban Road Net...Reinforcement Learning-based Placement of Charging Stations in Urban Road Net...
Reinforcement Learning-based Placement of Charging Stations in Urban Road Net...ivaderivader
 
Prediction for Retrospection: Integrating Algorithmic Stress Prediction into ...
Prediction for Retrospection: Integrating Algorithmic Stress Prediction into ...Prediction for Retrospection: Integrating Algorithmic Stress Prediction into ...
Prediction for Retrospection: Integrating Algorithmic Stress Prediction into ...ivaderivader
 
Mem2Seq: Effectively Incorporating Knowledge Bases into End-to-End Task-Orien...
Mem2Seq: Effectively Incorporating Knowledge Bases into End-to-End Task-Orien...Mem2Seq: Effectively Incorporating Knowledge Bases into End-to-End Task-Orien...
Mem2Seq: Effectively Incorporating Knowledge Bases into End-to-End Task-Orien...ivaderivader
 
A Style-Based Generator Architecture for Generative Adversarial Networks
A Style-Based Generator Architecture for Generative Adversarial NetworksA Style-Based Generator Architecture for Generative Adversarial Networks
A Style-Based Generator Architecture for Generative Adversarial Networksivaderivader
 
CatchLIve: Real-time Summarization of Live Streams with Stream Content and In...
CatchLIve: Real-time Summarization of Live Streams with Stream Content and In...CatchLIve: Real-time Summarization of Live Streams with Stream Content and In...
CatchLIve: Real-time Summarization of Live Streams with Stream Content and In...ivaderivader
 
Perception! Immersion! Empowerment! Superpowers as Inspiration for Visualization
Perception! Immersion! Empowerment! Superpowers as Inspiration for VisualizationPerception! Immersion! Empowerment! Superpowers as Inspiration for Visualization
Perception! Immersion! Empowerment! Superpowers as Inspiration for Visualizationivaderivader
 
Learning to Remember Patterns: Pattern Matching Memory Networks for Traffic F...
Learning to Remember Patterns: Pattern Matching Memory Networks for Traffic F...Learning to Remember Patterns: Pattern Matching Memory Networks for Traffic F...
Learning to Remember Patterns: Pattern Matching Memory Networks for Traffic F...ivaderivader
 
Neural Approximate Dynamic Programming for On-Demand Ride-Pooling
Neural Approximate Dynamic Programming for On-Demand Ride-PoolingNeural Approximate Dynamic Programming for On-Demand Ride-Pooling
Neural Approximate Dynamic Programming for On-Demand Ride-Poolingivaderivader
 
StoryMap: Using Social Modeling and Self-Modeling to Support Physical Activit...
StoryMap: Using Social Modeling and Self-Modeling to Support Physical Activit...StoryMap: Using Social Modeling and Self-Modeling to Support Physical Activit...
StoryMap: Using Social Modeling and Self-Modeling to Support Physical Activit...ivaderivader
 
Bad Breakdowns, Useful Seams, and Face Slapping: Analysis of VR Fails on YouTube
Bad Breakdowns, Useful Seams, and Face Slapping: Analysis of VR Fails on YouTubeBad Breakdowns, Useful Seams, and Face Slapping: Analysis of VR Fails on YouTube
Bad Breakdowns, Useful Seams, and Face Slapping: Analysis of VR Fails on YouTubeivaderivader
 
Invertible Denoising Network: A Light Solution for Real Noise Removal
Invertible Denoising Network: A Light Solution for Real Noise RemovalInvertible Denoising Network: A Light Solution for Real Noise Removal
Invertible Denoising Network: A Light Solution for Real Noise Removalivaderivader
 
Traffic Demand Prediction Based Dynamic Transition Convolutional Neural Network
Traffic Demand Prediction Based Dynamic Transition Convolutional Neural NetworkTraffic Demand Prediction Based Dynamic Transition Convolutional Neural Network
Traffic Demand Prediction Based Dynamic Transition Convolutional Neural Networkivaderivader
 
MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training
MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training  MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training
MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training ivaderivader
 
Screen2Vec: Semantic Embedding of GUI Screens and GUI Components
Screen2Vec: Semantic Embedding of GUI Screens and GUI ComponentsScreen2Vec: Semantic Embedding of GUI Screens and GUI Components
Screen2Vec: Semantic Embedding of GUI Screens and GUI Componentsivaderivader
 
Augmenting Decisions of Taxi Drivers through Reinforcement Learning for Impro...
Augmenting Decisions of Taxi Drivers through Reinforcement Learning for Impro...Augmenting Decisions of Taxi Drivers through Reinforcement Learning for Impro...
Augmenting Decisions of Taxi Drivers through Reinforcement Learning for Impro...ivaderivader
 
Natural Language to Visualization by Neural Machine Translation
Natural Language to Visualization by Neural Machine TranslationNatural Language to Visualization by Neural Machine Translation
Natural Language to Visualization by Neural Machine Translationivaderivader
 

Mais de ivaderivader (20)

Argument Mining
Argument MiningArgument Mining
Argument Mining
 
Papers at CHI23
Papers at CHI23Papers at CHI23
Papers at CHI23
 
DDGK: Learning Graph Representations for Deep Divergence Graph Kernels
DDGK: Learning Graph Representations for Deep Divergence Graph KernelsDDGK: Learning Graph Representations for Deep Divergence Graph Kernels
DDGK: Learning Graph Representations for Deep Divergence Graph Kernels
 
So Predictable! Continuous 3D Hand Trajectory Prediction in Virtual Reality
So Predictable! Continuous 3D Hand Trajectory Prediction in Virtual Reality So Predictable! Continuous 3D Hand Trajectory Prediction in Virtual Reality
So Predictable! Continuous 3D Hand Trajectory Prediction in Virtual Reality
 
Reinforcement Learning-based Placement of Charging Stations in Urban Road Net...
Reinforcement Learning-based Placement of Charging Stations in Urban Road Net...Reinforcement Learning-based Placement of Charging Stations in Urban Road Net...
Reinforcement Learning-based Placement of Charging Stations in Urban Road Net...
 
Prediction for Retrospection: Integrating Algorithmic Stress Prediction into ...
Prediction for Retrospection: Integrating Algorithmic Stress Prediction into ...Prediction for Retrospection: Integrating Algorithmic Stress Prediction into ...
Prediction for Retrospection: Integrating Algorithmic Stress Prediction into ...
 
Mem2Seq: Effectively Incorporating Knowledge Bases into End-to-End Task-Orien...
Mem2Seq: Effectively Incorporating Knowledge Bases into End-to-End Task-Orien...Mem2Seq: Effectively Incorporating Knowledge Bases into End-to-End Task-Orien...
Mem2Seq: Effectively Incorporating Knowledge Bases into End-to-End Task-Orien...
 
A Style-Based Generator Architecture for Generative Adversarial Networks
A Style-Based Generator Architecture for Generative Adversarial NetworksA Style-Based Generator Architecture for Generative Adversarial Networks
A Style-Based Generator Architecture for Generative Adversarial Networks
 
CatchLIve: Real-time Summarization of Live Streams with Stream Content and In...
CatchLIve: Real-time Summarization of Live Streams with Stream Content and In...CatchLIve: Real-time Summarization of Live Streams with Stream Content and In...
CatchLIve: Real-time Summarization of Live Streams with Stream Content and In...
 
Perception! Immersion! Empowerment! Superpowers as Inspiration for Visualization
Perception! Immersion! Empowerment! Superpowers as Inspiration for VisualizationPerception! Immersion! Empowerment! Superpowers as Inspiration for Visualization
Perception! Immersion! Empowerment! Superpowers as Inspiration for Visualization
 
Learning to Remember Patterns: Pattern Matching Memory Networks for Traffic F...
Learning to Remember Patterns: Pattern Matching Memory Networks for Traffic F...Learning to Remember Patterns: Pattern Matching Memory Networks for Traffic F...
Learning to Remember Patterns: Pattern Matching Memory Networks for Traffic F...
 
Neural Approximate Dynamic Programming for On-Demand Ride-Pooling
Neural Approximate Dynamic Programming for On-Demand Ride-PoolingNeural Approximate Dynamic Programming for On-Demand Ride-Pooling
Neural Approximate Dynamic Programming for On-Demand Ride-Pooling
 
StoryMap: Using Social Modeling and Self-Modeling to Support Physical Activit...
StoryMap: Using Social Modeling and Self-Modeling to Support Physical Activit...StoryMap: Using Social Modeling and Self-Modeling to Support Physical Activit...
StoryMap: Using Social Modeling and Self-Modeling to Support Physical Activit...
 
Bad Breakdowns, Useful Seams, and Face Slapping: Analysis of VR Fails on YouTube
Bad Breakdowns, Useful Seams, and Face Slapping: Analysis of VR Fails on YouTubeBad Breakdowns, Useful Seams, and Face Slapping: Analysis of VR Fails on YouTube
Bad Breakdowns, Useful Seams, and Face Slapping: Analysis of VR Fails on YouTube
 
Invertible Denoising Network: A Light Solution for Real Noise Removal
Invertible Denoising Network: A Light Solution for Real Noise RemovalInvertible Denoising Network: A Light Solution for Real Noise Removal
Invertible Denoising Network: A Light Solution for Real Noise Removal
 
Traffic Demand Prediction Based Dynamic Transition Convolutional Neural Network
Traffic Demand Prediction Based Dynamic Transition Convolutional Neural NetworkTraffic Demand Prediction Based Dynamic Transition Convolutional Neural Network
Traffic Demand Prediction Based Dynamic Transition Convolutional Neural Network
 
MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training
MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training  MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training
MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training
 
Screen2Vec: Semantic Embedding of GUI Screens and GUI Components
Screen2Vec: Semantic Embedding of GUI Screens and GUI ComponentsScreen2Vec: Semantic Embedding of GUI Screens and GUI Components
Screen2Vec: Semantic Embedding of GUI Screens and GUI Components
 
Augmenting Decisions of Taxi Drivers through Reinforcement Learning for Impro...
Augmenting Decisions of Taxi Drivers through Reinforcement Learning for Impro...Augmenting Decisions of Taxi Drivers through Reinforcement Learning for Impro...
Augmenting Decisions of Taxi Drivers through Reinforcement Learning for Impro...
 
Natural Language to Visualization by Neural Machine Translation
Natural Language to Visualization by Neural Machine TranslationNatural Language to Visualization by Neural Machine Translation
Natural Language to Visualization by Neural Machine Translation
 

Último

HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024The Digital Insurer
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 


[Seminar] 200904 Seunghyeong Choe

  • 1. Enhancing Mobile Voice Assistants with WorldGaze Sven Mayer, Gierad Laput, Chris Harrison CHI 2020 Paper
  • 2. Contents • Introduction • Related Work • System Design • Exploratory Study • Implementation • Evaluation • Example Uses • Limitation
  • 3. Introduction • Each major smartphone has its own voice assistant • Apple Siri • Samsung Bixby • Google Assistant • They help us with our daily tasks • Calculating • Setting an alarm • Giving us weather information • Giving us opening hours for a specific restaurant • (Figures: Apple Siri, Google Assistant)
  • 4. Introduction • Weakness of voice assistants • They lack contextual information about the surroundings • Siri shows a list of possible Starbucks locations even while the user is standing right in front of a Starbucks • What we need • Contextual information about the user’s surroundings • A richer interaction between the user and the voice assistant
  • 5. Introduction • “When does this place close?” becomes “When does Oakland Fashion Optical close?”
  • 6. Related Work • Multimodal Interaction ◦ Combining pen and finger input on touchscreens • Drini et al. "Unimanual Pen+Touch Input Using Variations of Precision Grip Postures." UIST. 2018. • Hinckley et al. "Pen + touch = new tools." UIST. 2010. ◦ Combining touch and gaze for enhanced selection • Pfeuffer et al. "Gaze-touch: combining gaze with multi-touch for interaction on the same surface." UIST. 2014. • Gaze Pointing ◦ Mid-air pointing • Mayer et al. "Modeling distant pointing for compensating systematic displacements." CHI. 2015. • Mayer et al. "The effect of offset correction and cursor on mid-air pointing in real and virtual environments." CHI. 2018. ◦ Eye tracking • Zhai et al. "Manual and gaze input cascaded (MAGIC) pointing." CHI. 1999. • Geospatial Mobile Interactions ◦ GPS and Wi-Fi localization • Object Context + Voice Interactions ◦ Gaze and voice combined systems • Glenn III et al. "Eye-voice-controlled interface." HFES. 1986. • Koons et al. "Integrating simultaneous input from speech, gaze, and hand gestures." Intelligent Multimedia Interfaces. 1993.
  • 8. System Design • Rear camera ◦ Knowledge about the world ◦ Retrieves more information about the user’s surroundings ◦ Helps the system understand the user’s surroundings better • Front camera ◦ Knowledge about the user’s gaze ◦ Where the user is looking, relative to the viewport of the camera • WorldGaze ◦ Determines where the user is actually looking ◦ Retrieves the place or object (business logos or signage) ◦ Feeds it into a voice assistant as a context-rich inquiry ◦ Extra contextual information is added to the inquiry automatically • WorldGaze makes the overall experience feel more natural to the user
  • 9. Exploratory Study • Wizard-of-Oz study ◦ Comparing: Touch, Voice, and WorldGaze ◦ Task: retrieving information (e.g. opening hours, ratings, phone numbers) ◦ Participants: 12 (9 male, 3 female), mean age 25.5 years (SD = 3.3) • Conditions ◦ Touch: use Google Maps to query information ◦ Voice: Wizard-of-Oz voice assistant (triggered by “Hey Siri”) that always returned the correct answer ◦ WorldGaze: the voice assistant similarly returned the answer • Feedback ◦ System Usability Scale (SUS), 10 items on a 5-point Likert scale ◦ Raw NASA TLX questionnaire, 6 items on a 21-point scale ◦ Future use desirability, 7-point Likert scale
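The two questionnaires on this slide have standard scoring rules, which can be sketched as follows. This is a minimal illustration, not the authors' analysis code: `sus_score` and `raw_tlx` are hypothetical names, SUS uses the usual rule (odd items contribute the response minus 1, even items contribute 5 minus the response, and the sum is multiplied by 2.5), and the TLX function assumes the 21-point scale is coded as ticks 0 to 20 and mapped to 0 to 100 in steps of 5.

```python
from statistics import mean
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Return the 0-100 SUS score for ten responses on a 1-5 scale."""
    if len(responses) != 10 or any(r not in range(1, 6) for r in responses):
        raise ValueError("SUS needs ten responses, each between 1 and 5")
    total = 0
    for item, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5


def raw_tlx(ticks: Sequence[int]) -> float:
    """Return the unweighted mean of six subscales on a 0-100 scale.

    Assumption: each rating is a tick index from 0 to 20 on the 21-point scale.
    """
    if len(ticks) != 6 or any(t not in range(0, 21) for t in ticks):
        raise ValueError("Raw TLX needs six ratings, each between 0 and 20")
    return float(mean(t * 5 for t in ticks))
```

A participant who answers 3 on every SUS item scores 50, and one who marks the middle tick on every TLX subscale also scores 50.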
  • 10. Exploratory Study • Quantitative feedback ◦ SUS: did not reveal any significant difference ◦ TLX: Touch had a significantly higher task load than Voice and WorldGaze ◦ Future use: no significant difference among the three conditions ◦ WorldGaze requires fewer words to be articulated, and utterance duration is shorter
  • 11. Exploratory Study • Qualitative feedback • WorldGaze is faster, or it feels faster anyway, and less frustrating • WorldGaze would be useful to have (P5) • Implicit input with WorldGaze would be striking (P9) • WorldGaze offers a lot of potential for future interaction paradigms
  • 12. Exploratory Study • Use Scenarios ◦ Asking questions about products in stores or menu items in restaurants ◦ Interacting with smart home objects, such as controlling the TV or lighting ◦ Navigation support in museums ◦ Desktop computer interaction • New Interactions ◦ Integration into smart glasses, suggested by 6 participants ◦ Camera-equipped smart devices (Facebook Portal, Google Nest Hub) ◦ Comparing multiple objects or places
  • 13. Implementation • Platform Selection ◦ iPhone, iOS 13.0 • The only mobile OS permitting front and back cameras to be opened simultaneously • Tested with iPhone XR • Head Gaze Ray Casting ◦ Robust face API provided by the Apple ARKit 3 SDK ◦ The forward-facing head vector (GazeVector) is used to extend a ray out from the bridge of the nose ◦ Runs at 30 FPS with ~50 ms of latency on an iPhone XR • Object Recognition & Segmentation ◦ Apple’s Vision Framework
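The head gaze ray casting step can be illustrated with a small geometric sketch. It is not the authors' implementation, which runs on ARKit and the Vision framework on iOS. It assumes the gaze direction has already been transformed into the rear camera's coordinate frame (x right, y down, z forward), that the offset between the head and the phone is negligible for distant objects, and that the rear camera follows a simple pinhole model. `Box`, `project_gaze`, `pick_target`, and the `max_distance` threshold are hypothetical names chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Box:
    """A detected object in rear-camera pixel coordinates (hypothetical type)."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def project_gaze(direction: Tuple[float, float, float],
                 fx: float, fy: float, cx: float, cy: float
                 ) -> Optional[Tuple[float, float]]:
    """Project a gaze direction onto the rear-camera image with a pinhole model."""
    x, y, z = direction
    if z <= 0:
        return None  # the ray points away from the rear camera's view
    return (cx + fx * x / z, cy + fy * y / z)


def distance_to_box(point: Tuple[float, float], box: Box) -> float:
    """Pixel distance from a point to a box; zero when the point is inside."""
    px, py = point
    dx = max(box.x_min - px, 0.0, px - box.x_max)
    dy = max(box.y_min - py, 0.0, py - box.y_max)
    return math.hypot(dx, dy)


def pick_target(point: Optional[Tuple[float, float]],
                boxes: Sequence[Box],
                max_distance: float = 150.0) -> Optional[Box]:
    """Return the box hit by the gaze point, or the nearest one within range."""
    if point is None or not boxes:
        return None
    best = min(boxes, key=lambda b: distance_to_box(point, b))
    return best if distance_to_box(point, best) <= max_distance else None
```

Falling back to the nearest box within a pixel threshold is one way to tolerate gaze error; the threshold value here is arbitrary and would need tuning against measured accuracy.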
  • 14. Implementation • Voice Assistant Integration ◦ Continuous listening feature on iOS combined with speech-to-text ◦ Listens for “Hey Siri” ◦ Searches the string for ambiguous nouns (e.g. “this”, “that place”) and replaces each instance • Battery Life Implications ◦ Integrated as a background service that wakes upon a voice assistant trigger ◦ Estimated power consumption of ~0.1 mWh per inquiry, measured using bench equipment
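The search-and-replace step on the transcribed query can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: the pattern `DEICTIC` and the function `resolve_query` are hypothetical, the list of nouns after "this" or "that" is invented for the example, and gazed targets are assumed to arrive in the order the ambiguous phrases were spoken.

```python
import re
from typing import Sequence

# Hypothetical pattern: "this" or "that", optionally followed by a generic noun.
DEICTIC = re.compile(
    r"\b(?:this|that)(?:\s+(?:place|one|store|restaurant))?\b",
    re.IGNORECASE,
)


def resolve_query(transcript: str, targets: Sequence[str]) -> str:
    """Replace each ambiguous phrase with the next gazed target, in order."""
    remaining = iter(targets)

    def substitute(match: re.Match) -> str:
        # Keep the original wording once the gazed targets run out.
        return next(remaining, match.group(0))

    return DEICTIC.sub(substitute, transcript)


print(resolve_query("When does this place close?", ["Oakland Fashion Optical"]))
```

The call reproduces the rewrite shown on the Introduction slide. A pattern this simple would also rewrite "that" when it is used as a conjunction, so a real system would need part-of-speech information or a narrower pattern.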
  • 16. Evaluation • Evaluates the tracking and targeting performance ◦ Participants: 12 (9 male, 3 female), mean age 28.9 years (SD = 5.8) ◦ Distance had a statistically significant influence on error ◦ Impact of horizontal and vertical accuracy on error
  • 17. Example Uses • Streetscapes • “When does this open?” • “What is the rating for this place?” • “Make me a reservation for 2 at 7 pm”
  • 18. Example Uses • Retail • “Does this come in any other colors?” • “Add this to my wishlist” • “What is the price difference between this... and this.” • Smart Homes and Offices • Say “On” to lights or a TV • Say “Down” to a TV or thermostat
  • 19. Limitation • Wider-angle lenses could make more of the world gaze-addressable • Accuracy of gaze vector ◦ Numerous state-of-the-art algorithms were tested but fell well short of what the use cases require
  • 20. Criticism • User MUST see the screen while using the voice assistant ◦ Qiaohui Zhang, Atsumi Imamiya, Kentaro Go, and Xiaoyang Mao. 2004. Resolving ambiguities of a gaze and speech interface. In Proceedings of the Symposium on Eye Tracking Research & Applications (ETRA ’04). • Usage pattern of voice assistants • How does the system detect a restaurant that has no logo? • Accessibility ◦ People with low vision • Social acceptance and privacy ◦ People may think the user is recording them
  • 21. Conclusion • WorldGaze to enhance voice assistants • Exploration of the possibilities of WorldGaze • Implementation of WorldGaze • Use cases to showcase enhanced assistants