Hello world!
• Started in 2017
• Over 20 years' experience in web
content and social media
• Social media content, strategy,
advertising, training and
campaign management
• Video editing and motion
graphics
• Chatbots and voice technology
What we’ll cover
• What is voice technology?
• How does voice technology
work?
• Why should we use it?
• Who’s using it well?
• Best practice in voice design
• Practical demonstrations
What is voice technology?
• Amazon Alexa
• Google Home
• Microsoft Cortana
• Apple Siri
• Samsung Bixby
A little bit of history
• Alan Turing was a pioneer of
modern computing
• He devised the Turing Test in
1950
MIT AI Laboratory
• Professor Marvin Minsky set
up the research group in the
early 1960s to explore
artificial intelligence, machine
learning and natural
language processing
ELIZA
• One major project that
emerged from the MIT AI
Laboratory was ELIZA in 1964
• Essentially this was an early
chatbot where individuals had
a conversation with a
computer
• They were not told they were
talking to a machine
ELIZA meet DOCTOR
• ELIZA simulated
conversations using pattern
matching and substitution
methodology, but did not
understand the context of
words
• One of the most popular
scripts ELIZA ran was
DOCTOR, which simulated a
psychotherapist
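
To make "pattern matching and substitution" concrete, here is a minimal sketch in the spirit of DOCTOR. The rules, pronoun swaps and replies are invented for illustration and are not taken from the original ELIZA script.

```python
import re

# Hypothetical DOCTOR-style rules: (pattern, reply template).
RULES = [
    (re.compile(r"\bi need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
]

# Pronoun swaps so the reply turns the speaker's own words back on them.
REFLECTIONS = {"i": "you", "my": "your", "me": "you", "am": "are"}


def reflect(fragment: str) -> str:
    """Swap first-person words for second-person ones, word by word."""
    return " ".join(REFLECTIONS.get(word, word) for word in fragment.lower().split())


def respond(text: str) -> str:
    """Return the reply for the first matching rule, or a stock prompt."""
    cleaned = text.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(cleaned)
        if match:
            return template.format(reflect(match.group(1)))
    return "Please go on."  # no rule matched, so there is nothing to substitute


print(respond("I am worried about my job."))
```

The program only reshuffles the user's words. Nothing in it models what "job" or "worried" means, which is the limitation the slide describes.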
Back to the future
• Fast forward to today: AI and
natural language processing
can now take context into
account
• The Turing Test has still not
been passed, but we are
getting closer
Google Duplex
• Google recently
demonstrated its Duplex
technology, which links voice
technology to cloud services
such as Google Calendar
• A recurrent neural network
built with TensorFlow Extended
can process complex
interactions and understand
context
Making progress
• Duplex is said to be effective
in 80% of situations, so it
doesn’t yet pass the Turing
Test
• Deep learning expert
Andrew Ng predicts that
once speech recognition is
99% accurate, voice will be
the primary way we interact
with computers
The final 4%
• Estimates suggest speech
recognition is currently
around 95% accurate
• The final 4% is very
challenging!
Adding functionality
• Amazon Alexa and Google
Home devices can add new
functionality via Skills and
Actions
• These give the devices new
capabilities, and anyone can
build them
Powerfully simple
• It is fairly quick and simple to
create content for these
devices
• There are now over 40,000
Alexa Skills available with an
active developer community
How does voice technology work?
• Voice technology uses
Natural Language Processing
to understand and interpret
voice commands
• This is underpinned by
machine learning techniques
Voice technology in action
Device listens for invocation
User gives wake word
Device returns welcome message
User gives intent
Device returns response
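
The flow above can be sketched as a two-state machine. This is a simplified illustration only: the wake word, welcome message and intent table are hypothetical, and real devices handle wake-word detection in hardware and in the cloud rather than by string comparison.

```python
from enum import Enum, auto
from typing import Optional, Tuple


class State(Enum):
    LISTENING = auto()        # device is waiting for the wake word
    AWAITING_INTENT = auto()  # welcome given, device is waiting for a request


WAKE_WORD = "computer"  # hypothetical wake word
RESPONSES = {"recommend a coffee": "Try a flat white."}  # hypothetical intent table


def step(state: State, heard: str) -> Tuple[State, Optional[str]]:
    """Advance the conversation by one utterance; return (new state, device speech)."""
    phrase = heard.strip().lower()
    if state is State.LISTENING:
        if phrase == WAKE_WORD:
            return State.AWAITING_INTENT, "Welcome. How can I help?"
        return State.LISTENING, None  # stay silent until the wake word is heard
    # Awaiting an intent: answer it (or apologise), then go back to listening.
    reply = RESPONSES.get(phrase, "Sorry, I didn't catch that.")
    return State.LISTENING, reply


state = State.LISTENING
for utterance in ["hello", "Computer", "Recommend a coffee"]:
    state, speech = step(state, utterance)
    print(utterance, "->", speech)
```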
Intents
• An intent is used to trigger a
response
• For example, a Skill or Action
could ask where you want to
go on holiday: New York,
Paris or Tokyo?
• Each of these choices would
be a separate intent and
produce different responses
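
As a rough sketch of the holiday example, each intent can be mapped to its own handler. The intent names and replies are hypothetical, and this assumes the platform has already worked out which intent the user meant.

```python
from typing import Callable, Dict


def new_york() -> str:
    return "New York it is. Shall I look for flights to JFK?"


def paris() -> str:
    return "Paris it is. Shall I look for trains or flights?"


def tokyo() -> str:
    return "Tokyo it is. Shall I look for flights to Haneda?"


# One entry per intent: each choice triggers a different response.
HANDLERS: Dict[str, Callable[[], str]] = {
    "NewYorkIntent": new_york,
    "ParisIntent": paris,
    "TokyoIntent": tokyo,
}


def handle(intent_name: str) -> str:
    """Dispatch to the handler for this intent, with a fallback for unknown ones."""
    handler = HANDLERS.get(intent_name)
    if handler is None:
        return "Sorry, I can only help with New York, Paris or Tokyo."
    return handler()


print(handle("ParisIntent"))
```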
Synonyms
• Intents are really powerful
and can include synonyms,
so if users have a different
name for something this can
be handled gracefully
• E.g. pavement / sidewalk
• AI is used with NLP so
phrases don’t have to be
exact
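
A minimal sketch of synonym handling: every accepted word resolves to one canonical value, with fuzzy string matching standing in for the NLP step. The synonym table and the 0.8 cutoff are assumptions for illustration. The voice platforms use trained language models, not `difflib`.

```python
import difflib
from typing import Optional

# Hypothetical synonym table: canonical value -> accepted alternatives.
SYNONYMS = {
    "pavement": ["sidewalk", "footpath"],
    "lift": ["elevator"],
}

# Invert the table so every known word points at its canonical value.
LOOKUP = {alt: canon for canon, alts in SYNONYMS.items() for alt in alts}
LOOKUP.update({canon: canon for canon in SYNONYMS})


def canonical(word: str) -> Optional[str]:
    """Resolve a word to its canonical value, tolerating small misspellings."""
    key = word.strip().lower()
    if key in LOOKUP:
        return LOOKUP[key]
    # Fuzzy fallback: accept the closest known word if it is similar enough.
    close = difflib.get_close_matches(key, list(LOOKUP), n=1, cutoff=0.8)
    return LOOKUP[close[0]] if close else None


print(canonical("Sidewalk"))
```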
Slots
• You can also add slots to
intents, which prompt for
specific pieces of data in a
set order
• This is particularly useful for
retail / ecommerce
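
For a retail-style order, slot filling in a set order might look like the sketch below. The slot names and prompts are hypothetical. On the real platforms this dialogue is usually configured in the developer console, not hand-coded.

```python
from typing import Dict, Optional

# Hypothetical slots for a coffee order, in the order they should be collected.
REQUIRED_SLOTS = ["size", "drink", "milk"]

PROMPTS = {
    "size": "What size would you like?",
    "drink": "Which drink would you like?",
    "milk": "Which milk would you like?",
}


def next_prompt(filled: Dict[str, str]) -> Optional[str]:
    """Return the prompt for the first missing slot, or None when all are filled."""
    for slot in REQUIRED_SLOTS:
        if slot not in filled:
            return PROMPTS[slot]
    return None


order: Dict[str, str] = {"drink": "latte"}  # the user volunteered the drink up front
print(next_prompt(order))  # still asks for the size first, as it comes earlier in the order
```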
Explicit and implicit invocations
• Explicit invocation
Alexa open Coffee Wizard
• Implicit invocation
Alexa recommend a coffee
for a sunny day
Discoverability
• It’s not always appropriate to
use explicit invocations, as
they can feel mechanistic
rather than conversational
• Alexa uses HypRank, a
neural network, to rank Skills
against natural language
requests
HypRank
• Alexa uses HypRank, a
neural network that uses
contextual signals to rank
Skills using natural language
A few stats
• Voice technology will be a $601
million industry by 2019
Source: Technavio
• Over 21 million smart speakers
in the US by 2020
Source: Activate
• Google Assistant is now available
on over 95% of Android devices
and the majority of iOS devices
Source: Alpine AI
Creating Skills and Actions
• Amazon and Google provide
developer-friendly tools for
building content
• AWS with Lambda
• Dialogflow with Firebase
• These work with a variety of
languages (Node.js, Java,
Python, Go, etc.)
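
As a rough idea of what an Alexa Skill backend on AWS Lambda can look like, here is a minimal Python handler that builds the response JSON by hand. The Skill name, intent name and replies are hypothetical, and a production Skill would more likely use Amazon's SDK than raw dictionaries.

```python
from typing import Any, Dict


def build_response(text: str, end_session: bool = True) -> Dict[str, Any]:
    """Wrap plain text in the response envelope Alexa expects."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Entry point that Lambda calls with the request sent by Alexa."""
    request = event.get("request", {})
    if request.get("type") == "LaunchRequest":
        # The user opened the Skill without asking for anything yet.
        return build_response("Welcome to Coffee Wizard. What can I get you?", end_session=False)
    if request.get("type") == "IntentRequest":
        intent_name = request.get("intent", {}).get("name")
        if intent_name == "RecommendCoffeeIntent":  # hypothetical intent name
            return build_response("On a sunny day, try an iced latte.")
    return build_response("Sorry, I didn't understand that.")
```

The handler can be tried locally by calling `lambda_handler({"request": {"type": "LaunchRequest"}}, None)` before deploying it.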
Using SSML
• SSML (Speech Synthesis
Markup Language) can be
used to control the
pronunciation, speed and
pitch of phrases
• For example, you can make
Alexa pause, whisper or
place emphasis on specific
words
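
The sketch below builds SSML strings for a pause, a whisper, emphasis and a change of speed and pitch. The helper names are made up for illustration. The whisper uses Alexa's own `amazon:effect` tag, so other platforms may need a different tag.

```python
from xml.sax.saxutils import escape


def say(text: str) -> str:
    """Plain speech, with &, < and > escaped so the markup stays valid."""
    return escape(text)


def pause(milliseconds: int) -> str:
    return f'<break time="{milliseconds}ms"/>'


def whisper(text: str) -> str:
    # Alexa-specific tag (assumption: the target device is an Alexa device).
    return f'<amazon:effect name="whispered">{escape(text)}</amazon:effect>'


def emphasise(text: str, level: str = "strong") -> str:
    return f'<emphasis level="{level}">{escape(text)}</emphasis>'


def prosody(text: str, rate: str = "medium", pitch: str = "medium") -> str:
    """Control speed (rate) and pitch, e.g. rate="slow", pitch="low"."""
    return f'<prosody rate="{rate}" pitch="{pitch}">{escape(text)}</prosody>'


def speak(*parts: str) -> str:
    """Wrap the parts in the root <speak> element."""
    return "<speak>" + "".join(parts) + "</speak>"


print(speak(say("Your coffee is ready."), pause(500), whisper("It is the secret blend.")))
```

In an Alexa response, a string like this is sent with an output speech type of `SSML` in place of `PlainText`.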