This document discusses designing voice AI for contact centers. It summarizes that while digital services can scale well, human customer service is very expensive. Voice AI aims to automate some service conversations to reduce costs while maintaining good customer experiences. However, voice interfaces differ significantly from chat and require approaches like turn-taking, conversational repair, and recipient design where users adapt how they speak to machines. The document cautions that voice AI is not truly natural language and users may not pay close attention. It emphasizes the complexity of human speech and conversation, and that voice systems will likely fail in unexpected ways initially before improving. Designers should be wary of technical limitations but avoid making users adapt more than necessary.
3. Welcome to IVR Hell
§Press “1” if you are a glorious new customer.
§Press “2” if you want to make changes.
§Press “3” if you have a stupid technical question.
§Press “7” if you are so obnoxious as to file a complaint.
Let me talk to a human already!
Source: https://unsplash.com/photos/tEMU4lzAL0w
4. In service-intensive industries, the contact center is a huge cost driver*.
§Humans are expensive
§Well-trained, motivated humans are even more expensive
Empathy for the devil management
*= means “Pain in the ass.”
While digital touch points scale well,
excellent human service is excruciatingly expensive.
Source: https://unsplash.com/photos/JQ2D4I-2eyw (Hannah Nicollet)
5. Thus – a tough business:
Contact deflection
AHT* Reduction
Offshoring
Call Prioritization
Skill-based Routing
*= Average Handling Time
Automation
Outsourcing
6. To the rescue: Conversational AI!
Chat(bots):
§ Service conversations can be (partly) automated
§ Agents can handle several chats simultaneously
Voice Assistants:
§ “Understand” natural spoken language
§ Can be retro-fitted to existing call infrastructure
8. Myth #1: From chat to voice is easy
Voice is very different from chat:
§Real-time
§Ephemeral
§Unreliable recognition
§ A lot more conversational “repair” necessary
§Need to detect end of speech and signal the end of a prompt
§No interactive elements supporting the dialog
Chat offering a list of options
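A minimal sketch of the "detect end of speech" point, assuming the audio is already split into fixed-length frames with one energy value per frame. The function name, the energy threshold, and the 700 ms silence window are illustrative assumptions, not values from the talk; production systems usually rely on a trained voice activity detector instead.

```python
from typing import Optional, Sequence


def detect_end_of_speech(
    frame_energies: Sequence[float],
    frame_ms: int = 20,             # assumed frame length
    energy_threshold: float = 0.1,  # assumed speech/silence boundary
    min_silence_ms: int = 700,      # assumed pause that ends a turn
) -> Optional[int]:
    """Return the time in ms at which the user's turn is judged over, else None."""
    heard_speech = False
    silence_ms = 0
    for index, energy in enumerate(frame_energies):
        if energy >= energy_threshold:
            # Speech frame: any pause counted so far was only a hesitation.
            heard_speech = True
            silence_ms = 0
        elif heard_speech:
            # Silence after speech: count it towards the end-of-turn window.
            silence_ms += frame_ms
            if silence_ms >= min_silence_ms:
                return (index + 1) * frame_ms
    return None
```

The trade-off sits in `min_silence_ms`: a short window cuts off users who pause mid-sentence, a long one makes the bot feel slow.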
9. How to deal with ephemeral
Two Rules of Thumb
Rule of three¹
Offer at most three different options in a prompt.
Negotiate if you have more than three.
What was the first option?
Meet Otto, a one-option guy.
(From the movie “A Fish Called Wanda”)
https://youtu.be/2j3adcbEwSM
1 = https://design.google/library/rule-of-three/
2 = Google “Jeff Blankenburg – Things Every Alexa Skill Should Do: Pass the One-Breath Test”
One-breath test²
If you can say the response out loud without taking a breath, it is probably the right length.
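Both rules of thumb can be checked mechanically while prompts are being written. The sketch below is one possible reading: "negotiate" is interpreted as offering two options plus "something else", and the one-breath test is approximated by a word limit. The function names, the prompt wording, and the 25-word limit are assumptions for illustration.

```python
from typing import List, Tuple

MAX_OPTIONS = 3            # rule of three
MAX_WORDS_PER_BREATH = 25  # assumption: rough stand-in for "one breath"


def split_for_prompt(
    options: List[str], max_options: int = MAX_OPTIONS
) -> Tuple[List[str], List[str]]:
    """Return (options to offer now, options held back for a later prompt)."""
    if len(options) <= max_options:
        return list(options), []
    # Too many options: offer the first ones and negotiate the rest.
    head = list(options[: max_options - 1])
    return head + ["something else"], list(options[max_options - 1:])


def format_prompt(offered: List[str]) -> str:
    """Turn the offered options into a single spoken question."""
    if len(offered) == 1:
        return f"Is this about {offered[0]}?"
    return f"Is this about {', '.join(offered[:-1])} or {offered[-1]}?"


def passes_one_breath_test(
    prompt: str, max_words: int = MAX_WORDS_PER_BREATH
) -> bool:
    """Approximate the one-breath test with a simple word count."""
    return len(prompt.split()) <= max_words
```

With five topics, the first prompt offers two of them plus "something else"; only if the caller picks that does the bot move on to the remaining three.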
10. Turn Taking: Still a bummer
§Human talk is highly cooperative.
§Turn taking is something we do naturally and intuitively.
§Overlapping and latching are regular elements of human dialog.
200 ms (on average): time to respond. If you take longer, something is “fishy”.
“Err”: Still thinking.
“Mhh”: Still listening.
Source: https://commons.wikimedia.org/wiki/File:Top-Hat-Rogers-Astaire.jpg
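One way a bot can signal "still thinking" is to speak a filler whenever its backend does not answer within the expected gap. The sketch below uses only Python's standard `asyncio`; the function name, the filler text, and reusing the 200 ms human average as the bot's threshold are assumptions.

```python
import asyncio
from typing import Any, Coroutine, List


async def respond_with_filler(
    backend_call: Coroutine[Any, Any, str],
    filler: str = "Hmm, one moment.",
    max_gap_s: float = 0.2,  # assumption: reuse the 200 ms human average
) -> List[str]:
    """Return the utterances to speak, adding a filler if the backend is slow."""
    task = asyncio.create_task(backend_call)
    spoken: List[str] = []
    try:
        # shield() keeps the backend call running if the timeout fires.
        answer = await asyncio.wait_for(asyncio.shield(task), timeout=max_gap_s)
    except asyncio.TimeoutError:
        spoken.append(filler)  # signal "still thinking"
        answer = await task
    spoken.append(answer)
    return spoken
```

A fast backend yields only the answer; a slow one yields the filler followed by the answer.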
11. Myth #2: Voice AI is ‘natural’
“Conversational interfaces are game-like in that they are interactive but consist of a limited set of
rules and legal ‘moves’ compared to the real phenomenon they attempt to evoke. Just as users
must learn how video-game interfaces work, or any other user interfaces for that matter, they too
will need to learn how to ‘play’ conversation games and how to navigate conversation spaces.
Conversational interfaces constitute a distinctive form
of interaction, which borrows interaction patterns
from natural human conversation but also exhibits its
own mechanics.” Robert J. Moore (& Rafael Arar)
Conversational UX Design
(Great Book!)
Why any Human-Machine Interface works:
Humans adapt.
From personal experience.
Source: https://unsplash.com/photos/2EJCSULRwC8 (Alex Knight)
12. Recipient Design
Users talk differently to machines
Users with no voice AI experience:
§Talk very slowly and deliberately
§Provide information in small chunks
What actually works better:
§Talk fluently with normal intonation
§A good bot will handle “over-answering”*
*= User provides more information than was actually asked for.
Source: https://unsplash.com/photos/CEEhmAGpYzE (Colin Maynard)
Source: https://unsplash.com/photos/pi9W2dWDdak (@zanardi)
13. Myth #3: Users pay attention
§“[Users] … don’t read [web] pages.” (Stephen Krug)
§“And they don’t listen, either!” (sad voicebot)
“Now please tell me your license plate. Start with the region or town.” (German license plates)
What users actually say:
“Bonn”
“BN”
“BN (Pause) CL 9678”
“BN (Pause) Bonn”
“Bonn Cäsar Ludwig 9678”
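The five answers above differ only in how much of the plate they contain and how it is spelled. A minimal sketch of a tolerant parser, assuming the recognizer delivers a plain text transcript: the lookup tables are tiny illustrative excerpts, and all names are hypothetical.

```python
import re
from typing import Dict, List, Optional

# Illustrative excerpts only; a real system needs the complete tables.
TOWN_TO_CODE = {"bonn": "BN"}
KNOWN_CODES = {"BN", "BM"}
SPELLING_ALPHABET = {"anton": "A", "berta": "B", "cäsar": "C", "ludwig": "L"}


def parse_plate_utterance(text: str) -> Dict[str, Optional[str]]:
    """Extract whatever parts of a license plate the user offered."""
    region: Optional[str] = None
    letters = ""
    digits = ""
    for token in re.findall(r"\w+", text):
        lower = token.lower()
        if token.isdigit():
            digits += token
        elif lower in TOWN_TO_CODE:
            # Town name; ignore it if the region is already known ("BN ... Bonn").
            if region is None:
                region = TOWN_TO_CODE[lower]
        elif lower in SPELLING_ALPHABET:
            letters += SPELLING_ALPHABET[lower]
        elif token.isalpha() and len(token) <= 3:
            upper = token.upper()
            if region is None and upper in KNOWN_CODES:
                region = upper
            else:
                letters += upper
    return {"region": region, "letters": letters or None, "digits": digits or None}


def missing_slots(plate: Dict[str, Optional[str]]) -> List[str]:
    """List the parts the bot still has to ask for."""
    return [name for name, value in plate.items() if value is None]
```

The bot then asks only for what `missing_slots` returns, so a user who says the whole plate at once is not walked through it again piece by piece.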
14. Exercise conversational repair
Every 84 seconds in conversation, someone will
say “Huh?,” “Who?,” or something similar to
check on what someone just said.
From: How We Talk, N. J. Enfield
User: “BN”
Bot: “Pardon, was that BN for Bonn?”
User: “No, it was BM for Rhein-Erft-Kreis.”
Support one-step corrections!
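A one-step correction means the reply to a confirmation question may carry both the rejection and the new value. The sketch below is a keyword-based simplification, assuming a text transcript and a known set of region codes; the word lists and the function name are assumptions.

```python
import re
from typing import Optional, Set, Tuple

NEGATIONS = ("no", "nope", "wrong")
AFFIRMATIONS = ("yes", "yeah", "correct", "right")


def interpret_reply(
    reply: str, proposed: str, known_codes: Set[str]
) -> Tuple[str, Optional[str]]:
    """Classify the user's answer to "was that <proposed>?"."""
    tokens = re.findall(r"\w+", reply)
    lowered = [token.lower() for token in tokens]
    codes = [
        token.upper()
        for token in tokens
        if token.lower() not in NEGATIONS + AFFIRMATIONS
        and token.upper() in known_codes
    ]
    new_codes = [code for code in codes if code != proposed]
    if new_codes:
        # Rejection and replacement arrive in the same turn.
        return ("corrected", new_codes[0])
    if any(word in NEGATIONS for word in lowered):
        return ("rejected", None)  # the bot has to ask again
    if codes or any(word in AFFIRMATIONS for word in lowered):
        return ("confirmed", proposed)
    return ("unclear", None)
```

For the dialog above, `interpret_reply("No, it was BM for Rhein-Erft-Kreis.", "BN", {"BN", "BM"})` yields a correction to BM without a separate "What is it then?" turn.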
15. Do not train people to “talk right”!
(Unless you absolutely have to.)
“Now please tell me your license plate.
Start with the region or town.”
16. Expect to get beaten up (at first)
No amount of research will
keep you from initially failing
in spectacular ways.
You probably cannot
anticipate every sort of
behaviour in free-speech UIs.
Depending on the complexity
and your experience with the
specific conversational setting,
it might take months.
Source: https://unsplash.com/photos/a8QuaMV70FE (Quino Al)
17. When we talk,
many intricate subconscious mechanisms are at work.
Speech is instinctual and deeply social
Nothing gets people riled up more quickly
than a broken conversation:
§Repeatedly getting the same thing wrong
§Not responding to attempts to correct an error
Source: https://unsplash.com/photos/8dvyPDYa35Q (Usman Yousaf)
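One guard against repeating the same mistake is to count failed attempts per piece of information and change strategy instead of replaying the same prompt. The sketch below is an assumption about how this could be done: the class name, the action labels, and the threshold of two attempts are illustrative.

```python
from collections import Counter


class RepairTracker:
    """Track failed attempts per slot and pick the next repair strategy."""

    def __init__(self, max_attempts: int = 2) -> None:
        self.max_attempts = max_attempts  # assumption: tune per use case
        self.failures: Counter = Counter()

    def record_failure(self, slot: str) -> str:
        """Register a failed attempt and return what the bot should do next."""
        self.failures[slot] += 1
        count = self.failures[slot]
        if count < self.max_attempts:
            return "reprompt"  # ask again
        if count == self.max_attempts:
            return "rephrase"  # change the question, e.g. ask for spelling
        return "handover"      # stop repeating the error, get a human

    def record_success(self, slot: str) -> None:
        """Reset the counter once the slot has been understood."""
        self.failures.pop(slot, None)
```

With the default threshold, the third failure on the same slot triggers a handover rather than a third identical question.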
18. What’s actually going on when we talk?
Watch this TED talk by Elizabeth Stokoe:
https://youtu.be/MtOG5PK8xDA
19. Be wary of your technical limitations.
But do not make humans adapt more than you have to.
Source: https://unsplash.com/photos/mG-HdjYiPtE (Bewakoof.com)