FXD 2019 Keynote: Marti Gold, SiriusXM

the era of multimodal design
October 25, 2019 Financial Experience Design Conference, Boston, MA #FXD2019

marti gold
Director of UX and Design
SiriusXM
The views and opinions expressed in this presentation are my
own and do not necessarily reflect the views of my employer.

what is a multimodal interface?

“Multimodal interfaces allow users to
seamlessly integrate two or more of their senses
when interacting with a system, so they can
engage with that system in much the same
way they engage with the physical world.”

#multimodal #FXD2019
• voice
• touch
• natural language processors
• AI
smart displays give designers the ability 
to use voice, sight, and touch to increase user engagement

the rapid growth of smart displays
Source: Voicebot.ai, 2019
Percentage of smart speakers in homes that have displays: < 3% in 2017 (year 1), 12.3% in 2018 (year 2), a 558% increase in year two

as of september 2019, there are  
9.6 million installed smart displays in the U.S.
Market share: Amazon 59%, Google & Partners 39%, Facebook 2%
Source: Voicebot.ai, 2019
Amazon
• Introduced smart displays in 2017
• 6 million installed base
• 59% of the market
• Down from 67% in Dec 2018
Google (& Partners)
• Entered market late summer 2018
• 4.25 million installed base
• 39% of the market
• Up from 33% in Dec 2018
• Facebook has 2% of the market; uses Alexa

…and that growth isn’t expected to slow down
Source: Strategy Analytics, Jan 8, 2019
Number of homes that will have smart displays: 13m in 2019, 100m in 2023
769% increase over the next four years

why such rapid adoption?  
 
particularly since so few experiences 
are optimized for smart displays?

three reasons…
• We type at ~40 words per minute.
• We speak at ~130 words per minute.
• We read at ~250 words per minute.
Therefore, the optimal experience for
interacting with any device may be:
VOICE IN | READ OUT
which is precisely what MMI devices offer
our users.
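The arithmetic behind "voice in, read out" can be sketched as follows. This is a rough illustration, not a measurement: the three rates are the approximate figures from the slide, the 20-word request and 50-word answer are hypothetical, and the sketch assumes a device speaks at roughly the human speaking rate.

```python
# Approximate rates quoted on the slide, in words per minute.
TYPE_WPM = 40
SPEAK_WPM = 130
READ_WPM = 250


def seconds_for(words: int, wpm: int) -> float:
    """Seconds needed to get `words` words across at `wpm` words per minute."""
    return words / wpm * 60


def round_trip_seconds(words_in: int, words_out: int, in_wpm: int, out_wpm: int) -> float:
    """Time for the user's request plus the time to take in the answer."""
    return seconds_for(words_in, in_wpm) + seconds_for(words_out, out_wpm)


# Hypothetical exchange: a 20-word request and a 50-word answer.
REQUEST_WORDS, ANSWER_WORDS = 20, 50

# GUI: type the request, read the answer.
gui_seconds = round_trip_seconds(REQUEST_WORDS, ANSWER_WORDS, TYPE_WPM, READ_WPM)
# Voice only: speak the request, listen to the answer
# (assumes the device talks at about the human speaking rate).
vui_seconds = round_trip_seconds(REQUEST_WORDS, ANSWER_WORDS, SPEAK_WPM, SPEAK_WPM)
# Multimodal: speak the request, read the answer.
mmi_seconds = round_trip_seconds(REQUEST_WORDS, ANSWER_WORDS, SPEAK_WPM, READ_WPM)

for label, secs in [("GUI", gui_seconds), ("VUI", vui_seconds), ("MMI", mmi_seconds)]:
    print(f"{label}: {secs:.1f} s")
```

Under these assumptions the voice-in, read-out path is the fastest of the three, at about 21 seconds versus about 32 for voice only and 42 for typing and reading.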

let’s look at a task 
many of us do every day…

how is this done using 
a graphical user interface (GUI)?

Using a GUI, 
you can see…
• categories of beverages
• options within each of the
categories
• photos to help you identify
beverages quickly
• prices
• which beverages you’ve
already ordered
• the total beverages in your cart

On one hand, the benefits
are self-evident…
• Discoverable
• Explore all the options available to you
• Learnable
• For example, if you click the left nav, you will
see the options on the right change
• Visual Cues
• You can differentiate items by photo
• All the time in the world….
• You can study and peruse and compare to
your heart’s content

But on the other hand…
• It’s SLOW….
• To place an order and check out, you may
have to visit 5+ screens and fill out multiple
form fields
• Requires personal info to checkout
• Email, phone, address, age, your high school
mascot, your first dog’s name…
• Did I mention S……L……O……W…?
• Remember typing = ~40wpm

“I know! Let’s make this 
a voice app or chatbot!
It will be so much faster!”

“Hey Google,  
What are today’s featured coffees?”
“Okay Marti. We have…
Pumpkin Spiced Latte…
Salted Caramel Mocha Frappuccino…
Nitro Cold Brew…
Vanilla Latte…
Blonde Vanilla Latte…
Blonde Skinny Vanilla Latte…”

Stop! Go Back! That one! Wait!

On one hand,
VUIs are ideal IF…
• You know precisely what you want
• You know precisely how to say it
• You are in an environment where your
eyes and hands are occupied doing
something else
• You are in an environment that doesn’t
have a lot of background noise

But on the other hand…
• It’s TERRIBLE at presenting choices
• Any more than 3 options?  
Users WILL forget.
• Gives the user NO time to consider
options
• Voice interfaces expect immediate responses
• Highly limited navigation options
• Voice interfaces have linear flows. You can’t
easily skip steps or jump between areas.

“Voice-only interfaces will certainly have
a key role to play for simple interactions
such as turning on a lightbulb or listening
to music. But voice on its own is not
necessarily the best input and output
mechanism when it comes to more
complex tasks.”
David Mercer
Principal Analyst, Strategy Analytics

we are just at the beginning 
of the multimodal era

AI-powered
Chatbots
Bank of America reported
6.3 million users of its
virtual assistant, Erica, in
the first quarter of 2019,
up from 4.8 million the
previous quarter.
That’s an additional  
1.5 million users  
per QUARTER

but we are letting down 
smart display users
by not optimizing our content
for these devices.

Users want to love these
devices, but…
• The adoption of non-display smart speakers is still rising
slightly faster than smart displays
• Could be price. Smart speakers cost far less than smart displays
• Yet the device makers showcase the SAME THREE USE
CASES: Cooking, Smart Home control, and Video Chat.
• There is a potentially huge  
competitive edge for services that can  
take advantage of this gap.

So how can you do that?
Let’s look at common VUI problems
and see how MMIs solve them.

VUI problem #1: cognitive load
• The immediate and transient nature of voice interfaces requires the user to be fully alert
when the system responds
• They cannot control the speed of the information flow
• They cannot re-read to gain a better understanding
• They cannot scan multiple choices
• They cannot click away
• They cannot ignore the voice prompt without risking the cancellation of the entire interaction
• Therefore, they pay close attention
• This cognitive load requires that all VUI responses be kept short and that few follow in succession
• “Peaks of Attention”

peaks of attention: VUIs vs GUIs
Source: Daniel Westerlund, “How to Go from Screens to Voice without Overwhelming the User”
Multimodal interfaces provide all the advantages of a VUI
but can follow the GUI attention curve.

conversation guidelines: grice’s maxims
• The maxim of QUANTITY  
Give as much information as needed, but NO MORE.
• The maxim of QUALITY 
Be truthful. Information must be supported by evidence.
• The maxim of RELATION 
Be relevant, saying only things that are pertinent to the discussion.
• The maxim of MANNER 
Be clear, brief, and orderly to avoid obscurity and ambiguity.

grice’s maxim example: “what time is it?”
• “It’s 10:56 AM”
• “It’s morning”
• “It’s 10:56 and 46 seconds AM, Eastern Daylight Savings Time, on October 25, 2019”

VUI problem #2: request and response interruptions
• Voice interfaces respond to requests, but interruptions cause problems
• You cannot count on user input being an isolated utterance. Requests often come embedded within
other conversations
• Real conversations are NOT scripted exchanges based on decision trees and flow charts
• In April 2018, Alexa was still experiencing a 50% failure rate
• Wake word not heard, “I didn’t understand the question”, etc.
• Note: Amazon says that rate has now been cut by 25%, but much is outside their control.
• You must always be asking, “How can my app work through interruptions?”
• An MM interface gives you the opportunity to provide quick visual cues to help the user recover and
achieve their task.

VUI problem #3: learnability
• Sadly, most users don’t know how to use voice commands and have unrealistic expectations about them.
• Ironically, the better AI/NLP systems become, the more users assume the device always has context.
• Without context, the device will return poor results – so the users keep playing music and setting timers.
• Although AI and personalization algorithms are making rapid improvements, users must still be reminded
how to express their intents fully, which does not come naturally.
• They will say: “Read my horoscope” vs “Ask Astrology Daily for horoscope for Leo”
• Reminder is in the response: “I have horoscopes for today from Astrology Daily. What is your zodiac sign?”
• Consequence of poor learnability: Only 3% of Alexa skills still have users after 2 weeks. (February 2018)
• An MM interface can dramatically increase learnability of your skill or action by providing context and
visual cues for initial inquiries and follow-up responses.
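The "reminder is in the response" pattern can be sketched as a simple slot check. This is a minimal illustration, not any platform's API: `Response`, `find_sign`, and `handle_horoscope_request` are hypothetical names, and the `display_hint` field stands in for whatever on-screen text a smart display would show.

```python
from dataclasses import dataclass
from typing import Optional

ZODIAC_SIGNS = (
    "aries", "taurus", "gemini", "cancer", "leo", "virgo",
    "libra", "scorpio", "sagittarius", "capricorn", "aquarius", "pisces",
)


@dataclass
class Response:
    speech: str                         # what the assistant says
    display_hint: Optional[str] = None  # hypothetical on-screen cue for smart displays
    done: bool = False                  # True once the request has been fulfilled


def find_sign(utterance: str) -> Optional[str]:
    """Return the first zodiac sign mentioned in the utterance, if any."""
    words = {w.strip(".,!?\"'").lower() for w in utterance.split()}
    for sign in ZODIAC_SIGNS:
        if sign in words:
            return sign
    return None


def handle_horoscope_request(utterance: str) -> Response:
    sign = find_sign(utterance)
    if sign is None:
        # The user left out the sign, so the response itself teaches the full request.
        return Response(
            speech=("I have horoscopes for today from Astrology Daily. "
                    "What is your zodiac sign?"),
            display_hint='Try: "Ask Astrology Daily for horoscope for Leo"',
        )
    return Response(speech=f"Here is today's horoscope for {sign.title()}.", done=True)


print(handle_horoscope_request("Read my horoscope").speech)
print(handle_horoscope_request("Ask Astrology Daily for horoscope for Leo").speech)
```

The incomplete request gets a spoken follow-up question plus a visual example of the full phrasing, which is where the screen adds to what a voice-only skill can do.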

A real challenge you are going to face will be
reconciliation of these important principles 
with your marketing department’s “brand voice”

and finally,
six tips I’ve gathered along the way…

“voice first” design
doesn’t apply to
these devices.
• Multimodal interfaces aren’t simply
voice interfaces with images.
• Do not have your assistant “read” the
screen. In fact, doing so will aggravate
your users.
• Presenting multiple choices on a
single screen means you will need to
build far more complex navigation
paths - not simple linear flows.
1

user education and support tasks
are particularly well-suited to MMIs
• High level product overviews and comparisons
• Anything that benefits from charts or other data
visualizations
• Questionnaires, surveys, risk assessments –
tasks that require multiple-choice responses
• Step-by-step tutorials
• If you have forms or need to provide critical,
detailed information, stick with a GUI.
2

never stop training your NLP application 
(Natural Language Processing)
• Just like VUIs, even the most intelligent AI-powered multimodal interfaces have to be
“trained for intent” to reduce error rates. Consider these utterances:
• “What is the balance of my checking account?”
• “What’s my checking account balance?”
• “How much money is in checking?”
• Be prepared to monitor your error logs to test and add phrases to your intents — you can
even include negative phrases (“How do I balance my checking account?”).
• Amusing note:  
Remember that linguistics degree your Mom thought was useless? It’s now a 6-figure job.
3
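The idea of training phrases and negative phrases can be sketched with a toy matcher. This is only an illustration under loud assumptions: real NLP platforms use trained language models, while this sketch substitutes a simple word-overlap (Jaccard) score, and the intent name `check_balance`, the function names, and the 0.5 threshold are all hypothetical.

```python
import re
from typing import Optional

# Example utterances from the slide, grouped under a hypothetical intent name.
TRAINING_PHRASES = {
    "check_balance": [
        "What is the balance of my checking account?",
        "What's my checking account balance?",
        "How much money is in checking?",
    ],
}

# Phrases that look similar but must NOT trigger the intent.
NEGATIVE_PHRASES = {
    "check_balance": ["How do I balance my checking account?"],
}


def tokens(text: str) -> set:
    """Lowercase word set; a stand-in for real language understanding."""
    return set(re.findall(r"[a-z']+", text.lower()))


def similarity(a: str, b: str) -> float:
    """Jaccard overlap between the word sets of two phrases."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def match_intent(utterance: str, threshold: float = 0.5) -> Optional[str]:
    """Return the best-matching intent, or None if nothing matches well enough."""
    best_intent, best_score = None, 0.0
    for intent, phrases in TRAINING_PHRASES.items():
        positive = max(similarity(utterance, p) for p in phrases)
        negative = max(
            (similarity(utterance, p) for p in NEGATIVE_PHRASES.get(intent, [])),
            default=0.0,
        )
        # Accept only if the utterance is closer to a training phrase than to a negative one.
        if positive >= threshold and positive > negative and positive > best_score:
            best_intent, best_score = intent, positive
    return best_intent


print(match_intent("What's my checking balance?"))
print(match_intent("How do I balance my checking account?"))
```

Utterances that fall through to `None` are the ones worth reviewing in the error logs and adding to one list or the other.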

when scoping, put in
3x your normal time
for testing
“80% of the effort that goes into building these
skills is probably going into testing and
refining the user experience, and the things
that users can say, and how they can say
them, and the different ways they can say
them.”
Tingiris,  
founder and managing director of Dabblelab
4

contextual and field
testing is important
Multimodal interactions are difficult for
researchers to observe.
• They are heavily dependent on context and
current activities
• They often happen in private spaces with
many interruptions and distractions
• They may only last a few seconds
• Try to find some way to include
contextual testing as you build your
MM interfaces.
5

and finally,
prepare your org for
frequent updates
• Designing multimodal interfaces is the  
Wild Wild West
• There are lots of suggestions but not a lot
of information on proven best practices
• No matter how carefully you test, keep in
mind that these interfaces are new to
users as well. What they like this month
may completely change next month.
Design and build for easy editing.
• Many orgs do not release non-bug-related
updates quickly, so prepare them mentally
for this shift.
6

Thank you!
@martigold
marti.gold@siriusxm.com
