5. DIY Deep Learning
for Custom Models
AI Enabled
Managed API
Services
Amazon AI: New Deep Learning Services
Polly Lex Rekognition
Deep Learning
Frameworks
MXNet, TensorFlow, Theano, Caffe, Torch
CONTROL
USABILITY &
SIMPLICITY
10. Amazon AI: New Deep Learning Services
Life-like Speech
Polly Lex
Conversational
Engine
Rekognition
Image Analysis
11. Amazon Lex
Conversational interfaces for your applications, powered
by the same Natural Language Understanding (NLU) &
Automatic Speech Recognition (ASR) models as Alexa
12. Lex: Build Natural, Conversational Interactions
Trigger AWS
Lambda functions
Continually improving
ASR & NLU models
Enterprise
connectors
Salesforce
Microsoft Dynamics
Marketo
Zendesk
Fully
Managed
Voice & Text
“Chatbots”
Text interaction
with Slack & Messenger
Improving human interactions…
• Contact, service, and support center interfaces (text + voice)
• Employee productivity and collaboration (minutes into seconds)
13. Intents
A particular goal that the
user wants to achieve
Utterances
Spoken or typed phrases
that invoke your intent
Slots
Data the user must provide to fulfill the
intent
Prompts
Questions that ask the user to input
data
Fulfillment
The business logic required to fulfill the
user’s intent
BookHotel
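The five concepts above can be sketched as a small data model. This is only an illustration of how the pieces relate, not Lex's internal representation; the slot names, sample utterances, and prompts for BookHotel are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Slot:
    name: str    # data the user must provide
    prompt: str  # question asked while the slot is still empty


@dataclass
class Intent:
    name: str              # the goal, e.g. BookHotel
    utterances: List[str]  # phrases that invoke the intent
    slots: List[Slot]      # data needed before fulfillment
    values: Dict[str, str] = field(default_factory=dict)

    def next_prompt(self) -> Optional[str]:
        """Return the prompt for the first unfilled slot, or None when all are filled."""
        for slot in self.slots:
            if slot.name not in self.values:
                return slot.prompt
        return None

    def fulfill(self) -> str:
        """Stand-in for business logic; a real bot would call a backend here."""
        if self.next_prompt() is not None:
            raise ValueError("cannot fulfill: slots are missing")
        details = ", ".join(f"{k}={v}" for k, v in self.values.items())
        return f"{self.name} fulfilled with {details}"


# Hypothetical definition of the BookHotel intent named on the slide.
book_hotel = Intent(
    name="BookHotel",
    utterances=["Book a hotel", "I need a room"],
    slots=[
        Slot("City", "Which city?"),
        Slot("CheckInDate", "What day do you check in?"),
    ],
)
```

Calling `book_hotel.next_prompt()` repeatedly while filling `book_hotel.values` walks through the prompts in order; `fulfill()` only succeeds once every slot has a value.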
14. Origin
Destination
Departure Date
Flight Booking
“Book a flight
to London from Seattle”
Automatic
Speech Recognition
Natural Language
Understanding
Book Flight
London
Utterances
Flight booking
London Heathrow
Intent /
Slot model
London Heathrow
Seattle
Seattle
Seattle
15. Origin
Destination
Departure Date
Flight Booking
“Book a flight
to London from Seattle”
Automatic
Speech Recognition
Natural Language
Understanding
Book Flight
London
Utterances
Flight booking
Intent /
Slot model
London Heathrow
Seattle
Prompt
“When would you like to fly?”
“When would you
like to fly?”
Polly
Seattle
London Heathrow
Seattle
17. Origin
Destination
Departure Date
Flight Booking
“Next Friday”
Automatic
Speech Recognition
Next Friday
Utterances
Natural Language
Understanding
Flight booking
02 / 24 / 2017
Intent /
Slot model
London Heathrow
Seattle
02/24/2017
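The step from "Next Friday" to 02/24/2017 is a relative-date resolution. A minimal sketch of that idea follows; it is not how Lex resolves dates. The reference date (Monday, 20 February 2017) is an assumption, since the slide does not say when the request was made, and `resolve_next_weekday` is a hypothetical helper.

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def resolve_next_weekday(phrase: str, today: date) -> date:
    """Resolve 'next <weekday>' to the first such day strictly after today."""
    words = phrase.lower().split()
    if len(words) != 2 or words[0] != "next" or words[1] not in WEEKDAYS:
        raise ValueError(f"unsupported phrase: {phrase!r}")
    target = WEEKDAYS.index(words[1])
    # Always between 1 and 7 days ahead, so "today" is never returned.
    days_ahead = (target - today.weekday() - 1) % 7 + 1
    return today + timedelta(days=days_ahead)


# Assumed reference date: Monday, 20 February 2017.
resolved = resolve_next_weekday("Next Friday", date(2017, 2, 20))
print(resolved.strftime("%m/%d/%Y"))
```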
18. Origin
Destination
Departure Date
Flight Booking
“Next Friday”
Automatic
Speech Recognition
Next Friday
Utterances
Natural Language
Understanding
Flight booking
02 / 24 / 2017
Intent /
Slot model
London Heathrow
Seattle
02/24/2017
Confirmation
“Your flight is booked for next Friday”
“Your flight is booked
for next Friday”
Polly
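The prompt and confirmation steps can be sketched as an AWS Lambda handler. This sketch assumes the original (V1) Lex Lambda event shape, with `currentIntent.slots` in the request and a `dialogAction` in the response. The slot names `Origin`, `Destination`, and `DepartureDate` and the first two prompts are hypothetical; only "When would you like to fly?" comes from the slides.

```python
# Prompts per slot, asked in this order while a slot is empty.
PROMPTS = {
    "Origin": "Where are you flying from?",        # hypothetical wording
    "Destination": "Where are you flying to?",     # hypothetical wording
    "DepartureDate": "When would you like to fly?",
}


def plain_text(content: str) -> dict:
    return {"contentType": "PlainText", "content": content}


def elicit_slot(intent_name: str, slots: dict, slot_name: str) -> dict:
    """Ask Lex to prompt the user for one missing slot."""
    return {
        "dialogAction": {
            "type": "ElicitSlot",
            "intentName": intent_name,
            "slots": slots,
            "slotToElicit": slot_name,
            "message": plain_text(PROMPTS[slot_name]),
        }
    }


def close(content: str) -> dict:
    """Tell Lex the intent is fulfilled and what to say to the user."""
    return {
        "dialogAction": {
            "type": "Close",
            "fulfillmentState": "Fulfilled",
            "message": plain_text(content),
        }
    }


def lambda_handler(event: dict, context) -> dict:
    intent = event["currentIntent"]
    slots = intent["slots"]
    for slot_name in PROMPTS:
        if not slots.get(slot_name):
            return elicit_slot(intent["name"], slots, slot_name)
    # A real handler would call the booking backend here.
    return close(
        f"Your flight from {slots['Origin']} to {slots['Destination']} "
        f"is booked for {slots['DepartureDate']}"
    )
```

Lex turns the returned message into text or, for voice clients, speech; the handler itself only decides which dialog action comes next.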
19. Origin
Destination
Departure Date
Flight Booking
“Next Friday”
Automatic
Speech Recognition
Next Friday
Utterances
Natural Language
Understanding
Flight booking
02 / 24 / 2017
Intent /
Slot model
London Heathrow
Seattle
02/24/2017
Hotel Booking
20. Amazon Polly
Turn text into lifelike speech using deep learning
technologies to synthesize speech that sounds like a
human voice
21. Amazon Polly
“The temperature
in WA is 75°F”
“The temperature
in Washington is 75 degrees
Fahrenheit”
Amazon Polly: Text In, Life-like Speech Out
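The expansion of "WA" and "75°F" shown above is a text normalization step. The toy sketch below only illustrates the idea with two hand-written lookup tables; it says nothing about how Polly actually performs normalization, and both tables are illustrative assumptions.

```python
import re

# Illustrative lookup tables, far smaller than any real system would need.
STATE_NAMES = {"WA": "Washington"}
UNITS = {"°F": "degrees Fahrenheit", "°C": "degrees Celsius"}


def normalize(text: str) -> str:
    """Expand known abbreviations and unit symbols into speakable words."""
    for abbr, name in STATE_NAMES.items():
        # Whole-word match so 'WA' inside another word is left alone.
        text = re.sub(rf"\b{abbr}\b", name, text)
    for symbol, words in UNITS.items():
        # Only expand a unit symbol that directly follows a digit.
        text = re.sub(rf"(\d)\s*{re.escape(symbol)}", rf"\1 {words}", text)
    return text


print(normalize("The temperature in WA is 75°F"))
```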
22. Converts text
to life-like speech
47 voices
24 languages
Low latency, real time
Fully managed
Polly: Life-like Speech Service
What is supported?
• Supports all programming languages included in the AWS SDKs
(Java, Python, Node.js, etc.) as well as an HTTP API
• Audio stream formats: MP3, Vorbis, raw PCM
• Choose your sampling rate to optimize bandwidth & quality
• Customized Pronunciation
Articles and Blogs
Training Material
Chatbots (Lex)
Public Announcements
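From the Python SDK, the supported options above might be used as follows. This is a sketch that assumes `boto3` is installed and AWS credentials are configured; `build_speech_request` and `synthesize_to_file` are hypothetical helpers, and the voice `Justin` is taken from the CLI example later in this deck.

```python
# OutputFormat values for audio: MP3, Ogg Vorbis, raw PCM.
AUDIO_FORMATS = {"mp3", "ogg_vorbis", "pcm"}


def build_speech_request(text: str, voice_id: str = "Justin",
                         output_format: str = "mp3",
                         sample_rate: str = "22050") -> dict:
    """Build keyword arguments for Polly's SynthesizeSpeech operation."""
    if output_format not in AUDIO_FORMATS:
        raise ValueError(f"unsupported output format: {output_format}")
    return {
        "Text": text,
        "VoiceId": voice_id,
        "OutputFormat": output_format,
        "SampleRate": sample_rate,  # passed as a string; lower rates save bandwidth
    }


def synthesize_to_file(path: str, text: str, **options) -> None:
    """Call Polly and write the returned audio stream to a local file."""
    import boto3  # imported lazily so the request builder works without the SDK

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**build_speech_request(text, **options))
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())
```

For example, `synthesize_to_file("hello.mp3", "Hello, my name is Bob")` would write an MP3 file, provided the account has access to Polly.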
23. Polly: SSML and Lexicons
• Use SSML version 1.1 tags to adjust the speech rate, pitch, or volume, e.g.:
• <break time="1s"/> pause 1 second between the initial two sentences
• <sub alias="World Wide Web Consortium">W3C</sub> substitute "World Wide Web Consortium" for the
acronym "W3C"
• <amazon:effect name="whispered">Score</amazon:effect> say the second "Score" in a whispered voice
<speak>He was caught up in the game.<break time="1s"/> In the middle of the 10/3/2014 <sub
alias="World Wide Web Consortium">W3C</sub> meeting he shouted, "Score!" quite loudly. When
his boss stared at him, he repeated <amazon:effect name="whispered">"Score"</amazon:effect> in
a whisper.</speak>
• Pronunciation lexicons enable you to customize the pronunciation of words
<lexeme>
<grapheme>Bob</grapheme>
<alias>Robert</alias>
</lexeme>
aws polly synthesize-speech \
--lexicon-names LexA LexB \
--output-format mp3 \
--text 'Hello, my name is Bob' \
--voice-id Justin \
bobAB.mp3
“Hello, my name is Robert”
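The `<lexeme>` fragment above has to live inside a complete lexicon document before it can be uploaded. The sketch below builds one in the W3C Pronunciation Lexicon Specification (PLS) format and, optionally, stores it with `put_lexicon`. It assumes `boto3` and AWS credentials for the upload step; `build_lexicon`, `read_aliases`, and `upload_lexicon` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"


def build_lexicon(aliases: dict, lang: str = "en-US") -> str:
    """Return a PLS document mapping each grapheme to its spoken alias."""
    lexemes = "\n".join(
        "  <lexeme>\n"
        f"    <grapheme>{escape(grapheme)}</grapheme>\n"
        f"    <alias>{escape(alias)}</alias>\n"
        "  </lexeme>"
        for grapheme, alias in aliases.items()
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<lexicon version="1.0" xmlns="{PLS_NS}" '
        f'alphabet="ipa" xml:lang="{lang}">\n'
        f"{lexemes}\n"
        "</lexicon>"
    )


def read_aliases(document: str) -> dict:
    """Parse a PLS document back into a grapheme -> alias mapping."""
    root = ET.fromstring(document.encode("utf-8"))
    ns = {"pls": PLS_NS}
    return {
        lexeme.find("pls:grapheme", ns).text: lexeme.find("pls:alias", ns).text
        for lexeme in root.findall("pls:lexeme", ns)
    }


def upload_lexicon(name: str, document: str) -> None:
    """Store the lexicon under the given name in the caller's AWS region."""
    import boto3  # imported lazily; only needed for the upload

    boto3.client("polly").put_lexicon(Name=name, Content=document)
```

A lexicon stored as `LexA` this way could then be referenced by name, as in the `--lexicon-names` option of the CLI call above.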
24. "Our Mapbox Navigation SDK offers a complete
turn-by-turn navigation solution that you can easily
add to your iOS or Android application, and having
clear, well-understood voice guidance is critical to
the user experience. Therefore, we’re excited to
offer natural-sounding pronunciation with highly
intelligible and pleasant voices in our users’ most
widely used languages with Amazon Polly’s Text-to-
Speech service."
– Paul Veugen, VP of Mobile, Mapbox.
26. Amazon Rekognition
Deep learning-based image recognition service
Search, verify, and organize millions of images
Object and Scene
Detection
Facial
Analysis
Face
Comparison
Facial
Recognition
Integrated with S3, Lambda, Polly, Lex
27. Object and Scene Detection
• Search, filter, and
curate image
libraries
• Smart searches for
user generated
content
• Photo, travel, real
estate, vacation
rental applications
Maple
Plant
Villa
Garden
Water
Swimming Pool
Tree
Potted Plant
Backyard
28. Object and Scene Detection – DetectLabels API
Request
{
"Image": {
"Bytes": blob,
"S3Object": {
"Bucket": "string",
"Name": "string",
"Version": "string"
}
},
"MaxLabels": number,
"MinConfidence": number
}
Response
{
"Labels": [
{
"Confidence": 95.78783416748047,
"Name": "Villa"
},
{
"Confidence": 68.914794921875,
"Name": "Swimming Pool"
},
{
"Confidence": 59.24593734741211,
"Name": "Backyard"
},
{
"Confidence": 59.24593734741211,
"Name": "Yard"
}
],
"OrientationCorrection": "ROTATE_0"
}
Generate labels for thousands of objects, scenes, and concepts, each with a
confidence score
S3 bucket
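The request and response above map directly onto the Python SDK. The sketch below assumes `boto3` and AWS credentials for the actual call; the helper names are hypothetical, and `SAMPLE_RESPONSE` simply repeats the response shown on the slide so the filtering step can be tried offline.

```python
def build_detect_labels_request(bucket: str, key: str,
                                max_labels: int = 10,
                                min_confidence: float = 50.0) -> dict:
    """Build keyword arguments for Rekognition's DetectLabels operation."""
    return {
        "Image": {"S3Object": {"Bucket": bucket, "Name": key}},
        "MaxLabels": max_labels,
        "MinConfidence": min_confidence,
    }


def label_names(response: dict, min_confidence: float = 0.0) -> list:
    """Return label names whose confidence meets the threshold, in response order."""
    return [
        label["Name"]
        for label in response["Labels"]
        if label["Confidence"] >= min_confidence
    ]


def detect_labels(bucket: str, key: str) -> dict:
    """Call Rekognition on an image stored in S3."""
    import boto3  # imported lazily; only needed for the live call

    client = boto3.client("rekognition")
    return client.detect_labels(**build_detect_labels_request(bucket, key))


# The response from the slide, reproduced for offline experimentation.
SAMPLE_RESPONSE = {
    "Labels": [
        {"Confidence": 95.78783416748047, "Name": "Villa"},
        {"Confidence": 68.914794921875, "Name": "Swimming Pool"},
        {"Confidence": 59.24593734741211, "Name": "Backyard"},
        {"Confidence": 59.24593734741211, "Name": "Yard"},
    ],
    "OrientationCorrection": "ROTATE_0",
}

print(label_names(SAMPLE_RESPONSE, min_confidence=60.0))
```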
29. Facial Analysis
Demographic Data
Facial Landmarks
Sentiment Expressed
• Smart searches for
user generated
content
• Photo, travel, real
estate, vacation
rental applications
• Targeted marketing
• Dynamic,
personalized ads
• Improve online dating
match
recommendations
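Facial analysis is exposed through the DetectFaces operation, which returns one `FaceDetails` entry per face when called with `Attributes=["ALL"]`. The sketch below condenses one such entry into the three categories on the slide. `summarize_face` is a hypothetical helper, and the sample entry is made up for illustration rather than taken from a real response.

```python
def summarize_face(face_detail: dict) -> dict:
    """Condense one FaceDetails entry into demographics, emotion, and landmarks."""
    emotions = face_detail.get("Emotions", [])
    # Pick the emotion with the highest confidence, if any were returned.
    top = max(emotions, key=lambda e: e["Confidence"], default=None)
    age = face_detail.get("AgeRange", {})
    return {
        "age_range": (age.get("Low"), age.get("High")),
        "gender": face_detail.get("Gender", {}).get("Value"),
        "top_emotion": top["Type"] if top else None,
        "landmark_count": len(face_detail.get("Landmarks", [])),
    }


def analyze_faces(bucket: str, key: str) -> list:
    """Call Rekognition DetectFaces on an S3 image and summarize every face."""
    import boto3  # imported lazily; only needed for the live call

    response = boto3.client("rekognition").detect_faces(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        Attributes=["ALL"],
    )
    return [summarize_face(detail) for detail in response["FaceDetails"]]


# Made-up entry in the documented response shape, for offline use only.
sample_detail = {
    "AgeRange": {"Low": 26, "High": 38},
    "Gender": {"Value": "Female", "Confidence": 99.1},
    "Emotions": [
        {"Type": "HAPPY", "Confidence": 92.3},
        {"Type": "CALM", "Confidence": 5.4},
    ],
    "Landmarks": [
        {"Type": "eyeLeft", "X": 0.41, "Y": 0.35},
        {"Type": "eyeRight", "X": 0.58, "Y": 0.36},
    ],
}
print(summarize_face(sample_detail))
```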
31. Face Comparison
Measure the likelihood that faces in two images are of the same
person
• Add face verification to
applications and devices
• Extend physical security
controls
• Provide guest access to
VIP-only facilities
• Verify users for online
exams and polls
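Face verification along these lines can be sketched with the CompareFaces operation, which reports a `Similarity` score for each face in the target image that matches the source face. The helper names and the 90.0 decision threshold are assumptions for illustration; a suitable threshold depends on the application.

```python
def build_compare_faces_request(bucket: str, source_key: str, target_key: str,
                                similarity_threshold: float = 80.0) -> dict:
    """Build keyword arguments for Rekognition's CompareFaces operation."""
    return {
        "SourceImage": {"S3Object": {"Bucket": bucket, "Name": source_key}},
        "TargetImage": {"S3Object": {"Bucket": bucket, "Name": target_key}},
        "SimilarityThreshold": similarity_threshold,
    }


def best_similarity(response: dict) -> float:
    """Highest similarity among matched faces, or 0.0 when nothing matched."""
    matches = response.get("FaceMatches", [])
    return max((match["Similarity"] for match in matches), default=0.0)


def is_same_person(response: dict, threshold: float = 90.0) -> bool:
    """Application-level decision; the threshold here is an assumed value."""
    return best_similarity(response) >= threshold


def compare(bucket: str, source_key: str, target_key: str) -> bool:
    """Run the comparison against two images stored in S3."""
    import boto3  # imported lazily; only needed for the live call

    client = boto3.client("rekognition")
    response = client.compare_faces(
        **build_compare_faces_request(bucket, source_key, target_key)
    )
    return is_same_person(response)
```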
34. Facial Recognition
Identify people in images by finding the closest match for an input face
image against a collection of stored face vectors
• Add friend tagging to
social and messaging apps
• Help public safety officers
find missing persons
• Identify employees as they
access sensitive locations
• Identify celebrities in
historical media archives
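Finding the closest match among stored face vectors is, conceptually, a nearest-neighbor search. The toy sketch below uses cosine similarity over short hand-written vectors. The slides do not say what vector representation or distance measure Rekognition uses, so both are assumptions here, as are the helper names and the 0.8 threshold.

```python
import math
from typing import Dict, List, Optional


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def closest_face(query: List[float], collection: Dict[str, List[float]],
                 threshold: float = 0.8) -> Optional[str]:
    """Return the ID of the most similar stored face, or None below the threshold."""
    best_id, best_score = None, threshold
    for face_id, vector in collection.items():
        score = cosine_similarity(query, vector)
        if score >= best_score:
            best_id, best_score = face_id, score
    return best_id


# Hypothetical three-dimensional "face vectors"; real ones are much larger.
collection = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
print(closest_face([0.9, 0.1, 0.0], collection))
```

With the managed service, the equivalent step would be indexing faces into a collection and searching it with an input image rather than comparing vectors by hand.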
35. Media Case Study
Identify who is on camera at what time for
each of 8 networks so that recorded video
streams can be indexed and searched
Video frame-sampling facial recognition
solution using Amazon Rekognition:
• Indexed 97,000 people into a face collection in
1 day
• Sample frames every 6 secs and test for image
variance
• Upload images to S3 and call Rekognition to
find best facial match
• Store time stamp and faceID metadata
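The sampling and variance steps in the list above might look like the sketch below. The slide does not describe how image variance was tested, so the mean absolute pixel difference and the threshold of 10.0 are assumptions, and frames are represented as flat lists of grayscale intensities purely for illustration.

```python
from typing import List, Tuple


def sample_times(duration_s: float, interval_s: float = 6.0) -> List[float]:
    """Timestamps at which to grab a frame, one every interval_s seconds."""
    times = []
    t = 0.0
    while t < duration_s:
        times.append(t)
        t += interval_s
    return times


def mean_abs_diff(a: List[int], b: List[int]) -> float:
    """Average per-pixel absolute difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_frames(frames: List[Tuple[float, List[int]]],
                  min_diff: float = 10.0) -> List[float]:
    """Keep timestamps of frames that differ enough from the last kept frame.

    Near-identical frames are skipped so they are not uploaded and
    matched again.
    """
    kept = []
    last_pixels = None
    for timestamp, pixels in frames:
        if last_pixels is None or mean_abs_diff(pixels, last_pixels) >= min_diff:
            kept.append(timestamp)
            last_pixels = pixels
    return kept


# Four tiny made-up frames sampled six seconds apart.
frames = [
    (0.0, [0, 0, 0, 0]),
    (6.0, [1, 1, 1, 1]),       # almost unchanged, skipped
    (12.0, [50, 50, 50, 50]),  # scene change, kept
    (18.0, [52, 50, 49, 50]),  # almost unchanged, skipped
]
print(select_frames(frames))
```

Each kept timestamp would then correspond to one image uploaded to S3 and one face search, with the timestamp and returned face ID stored as metadata.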
37. Amazon AI Services
• Leveraging Amazon's internal experience with AI / ML
• Managed API services with embedded AI for maximum
accessibility and simplicity
• Full stack of platforms and engines for specialized deep
learning applications