Will mouse and monitor soon be a thing of the past? Will we communicate with devices in augmented reality instead? Will devices still talk with us, or rather about us? Or will they simply know what to do and act autonomously? Sascha does not just discuss these questions. He also shows how modern interaction and multimodal user interfaces can be integrated into your own connected applications using technologies such as Amazon Echo, Google Home and Microsoft Cortana.
3. Our Oldest Interface
• The deeply instinctive nature of speech presents specific constraints and new challenges. Our brains are fundamentally wired to interpret the source of speech as human. […] Thus, a device that speaks to us is tapping into a deep river of psychological adaptations, and subject to a set of assumptions a pixel-based UI will never encounter.
(Cheryl Platz, 2017, https://medium.com/microsoft-design/voice-user-interface-design-new-solutions-to-old-problems-baa36a64b3e4#.zc46diybh)
• There is no consensus on the ultimate origin
or age of human language (human language
could be 40,000 years old or much older).
Source: https://en.wikipedia.org/wiki/File:Real-time_MRI_-_Speaking_(English).ogv
4. Conversational User Experience
Amazon Echo: Alexa…
8 million sold by the end of 2016
https://www.digitalcommerce360.com/2017/01/23/amazons-us-echo-sales-top-8-million/
Google Home: Okay Google…
LingLong DingDong: DingDong DingDong…
Harman Kardon Invoke / Microsoft Home Hub: Hey Cortana
5. Amazon Echo: Alexa…
8 Million sold by the end of 2016
https://www.digitalcommerce360.com/2017/01/23/amazons-us-echo-sales-top-8-million/
7. I know so much
…about you
Video: My Friend Cayla 2014
8. Source: MGM Child's Play (The Lakeshore Strangler), Vivid My friend Cayla
Big Brother Award
Internet of Uncanny Things
Germany's Federal Network Agency states that any toy capable of transmitting signals and recording images or sound without detection is banned. (https://t.co/R7UCmI9aj9)
10. Conversation Experience and Voice
What Researchers say and why Investors bet on bots!
Conversational Experience
Voice User Interface (VUI)
63% like to use voice to control their home.
65% of smartphone users have used voice assistants.
Bots are the new apps. (Satya Nadella, Microsoft CEO)
Every fourth German wants to use chatbots.
Active users: 1 billion WhatsApp, 800 million Facebook Messenger…
63% don’t like to talk to machines.
50% doubt the reliability.
https://www.quora.com/Why-are-people-saying-Bots-are-the-new-apps
https://www.bitkom.org/Presse/Presseinformation/Jeder-Vierte-will-Chatbots-nutzen.html
http://www.fittkaumaass.de/news/chatbots-von-jedem-zweiten-online-kaeufer-abgelehnt
11. Google Voice Search, Google Now,
and Google Assistant
• Voice Search (2002)
• Voice Search is merged with Now (2012)
• Google Android and iOS
• Mobile usage scenarios
• Searches the Internet but doesn’t know you
• Natural-sounding voice commands
(https://www.cnet.com/how-to/complete-list-of-ok-google-commands/)
Google Assistant
• Google's Allo chat app*
• Google’s Pixel phone*
• Google Home
• Probably “next gen. Google Now”
• Conversational
• Deeper artificial intelligence
*Multimodal human-computer interaction involving several of the five human senses (e.g. vision and voice).
Google Voice Search
Google Now
12. Alexa’s built-in voice capabilities for
your connected products:
• Works the same way it would with an Amazon Echo
• Access to third-party skills developed using the Alexa Skills Kit (ASK)
• Developer Kits
• A wake word engine is still needed (e.g. Sensory Alexa wake word suite)
Commands/Conversation and Devices
Source: https://developer.amazon.com/alexa-voice-service
Skills Devices
13. Conversational User Experience
Source: https://developer.amazon.com/alexa-voice-service
Alexa in the Car: Ford, Amazon to Provide
Access to Shop, Search and Control Smart
Home Features on the Road.
The world's first Amazon Alexa-enabled
smartwatch: iMCO CoWatch.
LG puts Amazon Alexa on a fridge.
14. Headless Devices
…operate without a graphical user interface and are typically controlled via a network connection.
Source: Discovery Channel 2013
15. Source: Room E demo by Jared Ficklin, http://www.youtube.com/watch?v=BGaAyBBur3I
Interaction isn't one-dimensional
Multimodal interaction provides multiple modes of input and output.
speechRecognizer = new SpeechRecognitionEngine();
var grammar = new Grammar(new FileStream("commands.grxml...
speechRecognizer.LoadGrammar(grammar);
speechRecognizer.SpeechRecognized += new EventHandler...
speechRecognizer.SpeechHypothesized += new EventHandler...
speechRecognizer.SpeechRecognitionRejected += new EventHandler...
speechRecognizer.SetInputToAudioStream(stream,
new SpeechAudioFormatInfo(EncodingFormat...
speechRecognizer.RecognizeAsync(RecognizeMode...
17. Gulf between Human and Machine
User and Goals
Physical System (World)
Source: Norman, D. (1986). "User Centered System Design: New Perspectives on Human-computer Interaction". CRC. ISBN 978-0-89859-872-8
18. Voice Input Changes Lives
Inclusion
• The biggest and most impactful benefit voice user experiences provide is vastly improved accessibility. Looking for inspiration? Go read the reviews of the Amazon Echo. […] Voice UIs allow us to remain fully human in our interactions.
(Cheryl Platz, 2017, https://medium.com/microsoft-design/voice-user-interface-design-new-solutions-to-old-problems-baa36a64b3e4#.zc46diybh)
19. Voice User Interface (VUI)
• Grice’s Maxims (1975)
(https://plato.stanford.edu/entries/grice/)
• Quality: Only say things that are true
• Quantity: Don’t be more or less
informative than needed
• Relevance: Only say things relevant to the
topic
• Manner: Be brief, get to the point, and
avoid ambiguity and obscurity
• Cooperative Principle
• Turn-taking
• Context
• Threading
Herbert Paul Grice (March 13, 1913 – August 28, 1988)
20. Speech Recognition/ Speech to Text
and Speech Synthesis / Text to Speech
Speech Processing
Source: Echo/Google Home infinite loop, https://youtu.be/ZfCfTYZJWtI
22. Source: Hatsune Miku - World is mine– 2011, https://youtu.be/YSyWtESoeOc
Vocaloid (2003)
Hatsune Miku: “First sound from the Future.”
Speech Synthesis
23. Speech Synthesis
• Text-to-phoneme
• Known/unknown words
• Text normalization
• Henry VIII vs Chapter VIII
• Prosodics and emotional content
• How to sound “natural”?
• …
• Usually based on samples
(versus physiological modelling)
• Discrete symbols to continuous waveforms
• Stochastic process
(Hidden Markov model)
• Machine Learning / Deep Learning
(https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/41539.pdf)
Source: https://en.wikipedia.org/wiki/Speech_synthesis
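The "Henry VIII vs Chapter VIII" case can be illustrated with a toy text normalizer. This is only a minimal sketch under stated assumptions: the function names, the word lists and the capitalized-word rule are hypothetical and invented for illustration; real synthesizers use far richer context models.

```javascript
// Toy text normalization for Roman numerals (illustrative only).
// Assumption: a numeral after a "section word" is read as a cardinal,
// a numeral after any other capitalized word (a name) as an ordinal.

// Converts a Roman numeral such as "VIII" to a number.
function romanToInt(roman) {
  var values = { I: 1, V: 5, X: 10, L: 50, C: 100, D: 500, M: 1000 };
  var total = 0;
  for (var i = 0; i < roman.length; i++) {
    var current = values[roman[i]];
    var next = values[roman[i + 1]] || 0;
    // A smaller value before a larger one is subtracted (IV = 4, IX = 9).
    total += current < next ? -current : current;
  }
  return total;
}

var CARDINALS = ['zero', 'one', 'two', 'three', 'four', 'five', 'six',
  'seven', 'eight', 'nine', 'ten', 'eleven', 'twelve'];
var ORDINALS = ['zeroth', 'first', 'second', 'third', 'fourth', 'fifth',
  'sixth', 'seventh', 'eighth', 'ninth', 'tenth', 'eleventh', 'twelfth'];
// Hypothetical list of words that introduce a numbered section.
var SECTION_WORDS = ['chapter', 'section', 'part', 'volume'];

function normalizeRomanNumerals(text) {
  // Only numerals of two or more letters are handled, to avoid the pronoun "I".
  return text.replace(/\b([A-Z][a-z]+) ([IVX]{2,})\b/g, function (match, word, roman) {
    var n = romanToInt(roman);
    if (n > 12) { return match; } // outside the small word tables: leave as-is
    if (SECTION_WORDS.indexOf(word.toLowerCase()) !== -1) {
      return word + ' ' + CARDINALS[n];
    }
    return word + ' the ' + ORDINALS[n];
  });
}

console.log(normalizeRomanNumerals('Chapter VIII covers Henry VIII'));
```

The same written token is expanded differently depending on its neighbour, which is the core difficulty the slide points at.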
24. Speech Synthesis Markup Language (SSML)
• http://www.w3.org/TR/speech-synthesis/
• XML-based
• Some elements and attributes:
• break
• phoneme
• prosody
• say-as
• currency
• digits
• number
• date
• time
• …
• audio
• ...
<speak>
Welcome! Today is
<say-as interpret-as="date">
20121213
</say-as>
</speak>
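SSML like the snippet above is often assembled in code. The following is a minimal sketch, not an official API: the helper names (escapeXml, sayAs, pause, speak) are hypothetical, and only the speak, say-as and break elements named on this slide are used.

```javascript
// Escapes characters that would otherwise break the XML markup.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Wraps text in a say-as element, e.g. interpret-as="date".
function sayAs(interpretAs, text) {
  return '<say-as interpret-as="' + escapeXml(interpretAs) + '">' +
    escapeXml(text) + '</say-as>';
}

// Inserts a pause, e.g. pause('500ms').
function pause(time) {
  return '<break time="' + escapeXml(time) + '"/>';
}

// Joins already-escaped parts into one speak document.
function speak(parts) {
  return '<speak>' + parts.join(' ') + '</speak>';
}

var ssml = speak([
  escapeXml('Welcome! Today is'),
  sayAs('date', '20121213')
]);
console.log(ssml);
```

Escaping user-supplied text before it is embedded keeps the document well-formed even if the text contains characters such as & or <.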
25. Speech Recognition
History
• 1950’s: Bell Laboratories designed "Audrey", which could understand digits
• 1960’s: IBM demonstrated “Shoebox” which
could understand 16 words
• 1970’s: Carnegie Mellon's "Harpy" speech-
understanding system could understand 1011
words (approximately the vocabulary of an
average three-year-old)
• …
Video: Massive Attack Tour 2008 http://www.uva.co.uk/archives/84
26. Speech Recognition
Moving from word templates and sound patterns to probability.
• 1980’s: Worlds of Wonder's Julie doll (1987),
which children could train to respond to their
voice.
• 1990’s: In the early 90’s Dragon Dictate (9000
USD) and in the late 90’s Dragon
NaturallySpeaking arrived to recognize
continuous speech
• 2000’s: Recognition is still guesswork, with around 80 percent accuracy.
• 2010’s: Google's English Voice Search system
now incorporates 230 billion words from
actual user queries.
• …
Video: https://youtu.be/UkU9SbIictc
29. Codified and strict vs.
Conversational
• CLI: Command Line Interface
• Input of Commands via Keyboard
• Eliza by Joseph Weizenbaum, 1966 (simulates a psychotherapist)
• Already 1220 chatbots according to the chatbots
directory
(https://www.chatbots.org/)
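How little machinery a "conversational" interface in the style of Eliza needs can be sketched with a few pattern rules. This is an illustration only, not Weizenbaum's original script; the rules and the respond function are invented for this example.

```javascript
// A handful of hypothetical pattern/reply rules in the spirit of Eliza.
var RULES = [
  { pattern: /\bI need (.+)/i, reply: 'Why do you need $1?' },
  { pattern: /\bI am (.+)/i, reply: 'How long have you been $1?' },
  { pattern: /\b(mother|father)\b/i, reply: 'Tell me more about your $1.' }
];

// Returns the reply of the first matching rule, or a neutral fallback.
function respond(input) {
  for (var i = 0; i < RULES.length; i++) {
    var match = input.match(RULES[i].pattern);
    if (match) {
      var captured = match[1];
      // A replacer function avoids special handling of "$" in the captured text.
      return RULES[i].reply.replace('$1', function () { return captured; });
    }
  }
  return 'Please go on.';
}

console.log(respond('I need a holiday'));
```

The program reflects the user's own words back without any model of their meaning, which is exactly the contrast to a strict command line.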
31. Microsoft Xiaoice, Rinna, and Tay
e.g. Xiaoice has 20 million registered users
Sources: https://en.wikipedia.org/wiki/Xiaoice, https://en.wikipedia.org/wiki/Tay_(bot)
32. Persona for an Avatar with Personality
Source: http://genieblog.ch/cortana-vs-siri-1-emotionen/
35. Conversational UX turns real
Source: https://youtu.be/jSVRrJJ2nl4, SNL Julie the Operator 2006
36. Human?
• Chinese room: Does a machine literally
"understand" Chinese? Or is it merely
simulating the ability to understand Chinese?
Searle calls the first position "strong AI" and
the latter "weak AI".
(https://en.wikipedia.org/wiki/Chinese_room)
• Turing Test: A player C is given the task of
trying to determine which player – A or B – is
a computer and which is a human. C is limited
to using the responses to written questions to
make the determination.
(https://en.wikipedia.org/wiki/Turing_test)
• The Alexa Prize: A social bot that can
converse coherently and engagingly with
humans on popular topics for 20 minutes
(similar to Loebner Prize with 25 minutes).
(https://developer.amazon.com/alexaprize)
45. Action Types
Direct integration
(for home automation, media etc.)
• Direct Actions* (Google)
• Smart Home Skill (Amazon)
• Flash Briefing Skill (Amazon)
• Cortana Skill* (Microsoft)
*not yet available
Indirect integration
(invocation trigger/name)
• Conversation Actions (Google)
• Custom Skill (Amazon)
• Cortana Skill* (Microsoft)
*not yet available
Some restrictions for publishing! Needs invocation trigger!
46. Invocation Types
Invocation Type: Conversation
Full Intent:
User: Alexa, ask Astrology Zone for the horoscope for Leo.
Astrology Zone: Today’s outlook for Leo: An opportunity presents itself at work.
Partial Intent:
User: Alexa, ask Astrology Daily for my horoscope.
Astrology Daily: Horoscope for what sign?
No Intent:
User: Alexa, talk to Astrology Daily.
Astrology Daily: You can ask for your horoscope. Which is your sign?
Ask <invocation name> <connecting word> <some action>
<some action> <connecting word> <invocation name>
Tell <invocation name> <connecting word> <some action>
Search <invocation name> for <some action>
Open <invocation name> for <some action>
Talk to <invocation name> and <some action>
Launch <invocation name> and <some action>
Start <invocation name> and <some action>
Resume <invocation name> and <some action>
Run <invocation name> and <some action>
Load <invocation name> and <some action>
Begin <invocation name> and <some action>
Use <invocation name> <connecting word> <some action>
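The three invocation types above can be told apart from the incoming request. The following is a minimal sketch, assuming the standard Alexa request JSON (request.type, request.intent.slots); the slot name "sign" and the function name classifyInvocation are hypothetical.

```javascript
// Classifies a request as full, partial or no intent.
// Assumption: the horoscope intent carries one slot named "sign".
function classifyInvocation(request) {
  if (request.type === 'LaunchRequest') {
    // "Alexa, talk to Astrology Daily." opens the skill without an intent.
    return 'no intent';
  }
  if (request.type === 'IntentRequest') {
    var slots = (request.intent && request.intent.slots) || {};
    var sign = slots.sign && slots.sign.value;
    // With a slot value the skill can answer at once, otherwise it must ask.
    return sign ? 'full intent' : 'partial intent';
  }
  return 'unknown';
}

var full = {
  type: 'IntentRequest',
  intent: { name: 'GetHoroscope', slots: { sign: { name: 'sign', value: 'Leo' } } }
};
console.log(classifyInvocation(full));
```

A skill would typically answer a full intent with a statement and the other two cases with a question that asks for the missing sign.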
47. Prompt Types
Prompt Type: Conversation
Question (interaction remains open, waiting for a response):
Astrology Daily: Horoscope for which sign?
Statement (interaction will terminate):
Astrology Daily: Today’s outlook for Pisces: You could be questioning your current path…
Wizard of Oz experiment, Image: http://www.kristamcgeebooks.com/
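In the Alexa response format, the difference between the two prompt types comes down to whether the session is kept open. The following is a minimal sketch of the raw response JSON; the helper names (buildResponse, question, statement) are hypothetical, and a production skill would usually add a reprompt to questions.

```javascript
// Builds a plain-text Alexa response body.
function buildResponse(text, keepSessionOpen) {
  return {
    version: '1.0',
    response: {
      outputSpeech: { type: 'PlainText', text: text },
      // A question keeps the session open, a statement ends it.
      shouldEndSession: !keepSessionOpen
    }
  };
}

function question(text) {
  return buildResponse(text, true);
}

function statement(text) {
  return buildResponse(text, false);
}

console.log(JSON.stringify(question('Horoscope for which sign?')));
```

With the alexa-sdk used in this deck, the same distinction is expressed as this.emit(':ask', …) for a question and this.emit(':tell', …) for a statement.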
52. Tip: Development and Debugging
• Prepare node.js
• https://nodejs.org/
• https://expressjs.co
var express = require('express');
var bodyParser = require('body-parser');
var app = express();
// Parse JSON request bodies so POST handlers can read req.body
app.use(bodyParser.json());
app.get('/', function (req, res) {
  res.send('Hello World!');
});
app.listen(3000, function () {
  console.log('Example app listening on port 3000!');
});
• Use an editor or IDE (e.g. Visual Studio Code)
• https://code.visualstudio.com/Docs/runtimes/nodejs
• Connect to your local server via tunnel
• https://ngrok.com/
• Generates URL like https://b22ec890.ngrok.io/
ngrok http 3000
• Eclipse SmartHome/QIVICON
• REST API http://127.0.0.1:8080/doc/index.html
• Paper UI http://127.0.0.1:8080/ui/index.html
53. How-to Alexa Skills
var Alexa = require('alexa-sdk');
// Assumes the Express app (with JSON body parsing) from the previous slide.
app.post('/', function (req, res) {
  // Stand-in for the Lambda context object that alexa-sdk expects
  var context = {
    succeed: function (result) {
      console.log(result);
      res.json(result);
    },
    fail: function (error) {
      console.log(error);
      res.status(500).end(); // always answer the HTTP request
    }
  };
  var alexa = Alexa.handler(req.body, context);
  alexa.registerHandlers(handlers);
  alexa.execute();
});
// One handler per intent; doRequest is not shown on this slide.
var handlers = {
  'SwitchOnIntent': function () {
    var item = this.event.request.intent.slots.item.value;
    doRequest("ON");
    this.emit(':tell', 'Switch ' + item);
  },
  'SwitchOffIntent': function () {
    var item = this.event.request.intent.slots.item.value;
    doRequest("OFF");
    this.emit(':tell', 'Switch ' + item);
  }
};
https://developer.amazon.com/edw/home.html#/skills/list
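The handlers above call a doRequest function that the slides do not show. One possible shape is sketched below, assuming it forwards the command to the local Eclipse SmartHome REST API mentioned earlier. The path /rest/items/<item name>, the plain-text body and the item name Light_Livingroom are assumptions for illustration, not taken from the deck.

```javascript
var http = require('http');

// Hypothetical item name; a real skill would map the spoken slot value to an item.
var ITEM_NAME = 'Light_Livingroom';

// Builds the options for http.request (assumed endpoint on the local server).
function buildCommandOptions(itemName, command) {
  return {
    hostname: '127.0.0.1',
    port: 8080,
    path: '/rest/items/' + encodeURIComponent(itemName),
    method: 'POST',
    headers: {
      'Content-Type': 'text/plain',
      'Content-Length': Buffer.byteLength(command)
    }
  };
}

// Sends a command such as "ON" or "OFF" to the item.
function doRequest(command) {
  var req = http.request(buildCommandOptions(ITEM_NAME, command), function (res) {
    console.log('Status: ' + res.statusCode);
    res.resume(); // discard the response body
  });
  req.on('error', function (err) {
    console.log('Request failed: ' + err.message);
  });
  req.end(command); // write the command as the request body and finish
}
```

Separating the option builder from the network call keeps the URL logic checkable without a running server.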
66. Next: Microsoft Cortana Skills
…and some more like Samsung Bixby etc.
• Harman Kardon Speaker announced for 2017
• Cortana
• Named after the fictional AI character in the Halo video games.
• Based in part on Tellme technology (Microsoft acquired Tellme Networks in 2007).
• Intelligent personal assistant and knowledge navigator.
• Competes with Siri and Google Now since 2014 (Windows Phone 8.1).
• Available on Windows 10, Windows IoT, on Android and iOS, on Xbox, etc.
• Cortana Skills Kit announced for early 2017
• Cortana Devices SDK announced for 2017
• Allows OEMs (original equipment manufacturers) and ODMs (original design manufacturers) to create smart and personal devices.
• Microsoft Bot Framework
• Build and connect intelligent bots to interact with your users naturally.
• Microsoft LUIS - Language Understanding Intelligent Service (part of Cognitive Services)
• Understand language contextually
67. Source: Dark Star, 1974, Bryanston Pictures
I think, therefore I am.
[…] But how do you know that anything else exists?
My sensory apparatus
reveals it to me.
68. Chatty Devices
Will mouse and monitor soon be a thing of the past?
Sascha Wolter | @saschawolter | wolter.biz
May 2017
Source: Dark Star, 1974, Bryanston Pictures