How to be successful in Business
E*
BI
RT
ML
EC2
ECS
Elastic Beanstalk
Redshift
EMR
Athena
Kinesis*
ElasticSearch
ElastiCache
Amazon ML
Amazon AI
Spark ML
Deep Learning AMI
The circle of ML
Front-End team
Data Engineering team
Analysts / DS team
DevOps team
Business Problem
Data
ML Model
ML Application
S3
EMR
Kinesis *
Redshift
Athena
Glue
ECS
Lambda
API-GW
Code *
DMS
EB
P2
Spot
Batch
IoT
GreenGrass
FPGA
The circle of ML
Front-End team
Data Engineering team
Analysts / DS team
DevOps team
Business Problem
Data
ML Data
ML Application
Heavy Lifting by AWS
Dive Deep as much as you need
Hardware - Distributed computing, GPU, FPGA, Green Grass
DL - MXNet, TensorFlow, Caffe, Torch, Theano
Platform – Data Science Environment (Spark on EMR, R …)
Simple API Services
Usage&Simplicity
Control
What is Amazon Polly
• A service that converts text into lifelike speech
• Offers 47 lifelike voices across 24 languages
• Low latency responses enable developers to build real-time systems
• Developers can store, replay and distribute generated speech
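A minimal sketch of the text-to-speech call described above, assuming the boto3 SDK and its Polly `synthesize_speech` operation. The helper names `synthesize_to_bytes` and `save_speech`, the voice "Joanna", and the output path are illustrative choices, not taken from the slides. The client is passed in so the helpers can be exercised without AWS credentials.

```python
from typing import Any


def synthesize_to_bytes(polly_client: Any, text: str,
                        voice_id: str = "Joanna",
                        output_format: str = "mp3") -> bytes:
    """Convert text to speech and return the raw audio bytes."""
    response = polly_client.synthesize_speech(
        Text=text,
        VoiceId=voice_id,            # assumed voice; any available voice works
        OutputFormat=output_format,
    )
    # AudioStream is a streaming body; read() returns the audio as bytes.
    return response["AudioStream"].read()


def save_speech(polly_client: Any, text: str, path: str) -> int:
    """Store generated speech in a file so it can be replayed later."""
    audio = synthesize_to_bytes(polly_client, text)
    with open(path, "wb") as f:
        f.write(audio)
    return len(audio)


# Usage with real credentials (not executed here):
#   import boto3
#   polly = boto3.client("polly")
#   save_speech(polly, "Hello from Amazon Polly.", "hello.mp3")
```

Because the audio is written to a file, it can be replayed or distributed without calling the service again.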
Amazon Polly: Quality
Natural sounding speech
A subjective measure of how close TTS output is to human speech.
Accurate text processing
Ability of the system to interpret common text formats such as abbreviations, numerical sequences, homographs etc.
Today in Las Vegas, NV it's 54°F.
"We live for the music", live from the Madison Square Garden.
Highly intelligible
A measure of how comprehensible speech is.
“Peter Piper picked a peck of pickled peppers.”
Amazon Polly: Language Portfolio
Americas:
• Brazilian Portuguese
• Canadian French
• English (US)
• Spanish (US)
A-PAC:
• Australian English
• Indian English
• Japanese
EMEA:
• British English
• Danish
• Dutch
• French
• German
• Icelandic
• Italian
• Norwegian
• Polish
• Portuguese
• Romanian
• Russian
• Spanish
• Swedish
• Turkish
• Welsh
• Welsh English
Recording Data for TTS
Tons of text → Automatic selection of texts → Recording script → Few weeks of recordings
Recording script:
• Covers all combinations of diphones and significant features in a language
an error occurred while searching for your route
because snaps weren't all so obedient anymore,
now we say apple again. and we say apple,
general electric soars today. information on general electric
quick breads, zucchini, holiday, crock pot, cake,
so are you still keeping tabs on your old team,
that weighs more than four tons, disrupts the herring's swim
…
An apple a day, keeps …
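The slide does not say how the automatic selection of texts works. One common way to build a compact script that covers every required unit is greedy set cover, sketched below. This is an assumption, not the method used for Polly. Character bigrams stand in for diphones here, since real diphones need a phonetic transcription; `units` and `select_script` are hypothetical names.

```python
from typing import Iterable, List, Set


def units(sentence: str) -> Set[str]:
    """Character bigrams within words, a crude stand-in for diphones."""
    found: Set[str] = set()
    for word in sentence.lower().split():
        letters = "".join(ch for ch in word if ch.isalpha())
        found.update(letters[i:i + 2] for i in range(len(letters) - 1))
    return found


def select_script(corpus: Iterable[str]) -> List[str]:
    """Greedily pick sentences until every unit in the corpus is covered."""
    candidates = {s: units(s) for s in corpus}
    uncovered: Set[str] = set().union(*candidates.values())
    script: List[str] = []
    while uncovered:
        # Choose the sentence that covers the most still-missing units.
        best = max(candidates, key=lambda s: len(candidates[s] & uncovered))
        script.append(best)
        uncovered -= candidates[best]
    return script
```

The greedy choice does not guarantee the shortest possible script, but it keeps the number of sentences to record small while still covering every unit found in the corpus.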
Amazon Polly is cost-effective
• Pay-as-You-go
• $4 for 1M characters
• Free Tier of 5M characters/month - first year
• You can store and reuse generated speech
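As a worked example of the pricing above, the sketch below estimates a monthly bill from the two figures on the slide. It assumes billing is strictly linear in characters with no rounding, which the slide does not confirm; `polly_monthly_cost` is a hypothetical helper name.

```python
PRICE_PER_MILLION_CHARS = 4.00          # USD, from the slide
FREE_TIER_CHARS_PER_MONTH = 5_000_000   # first year only, from the slide


def polly_monthly_cost(characters: int, in_free_tier: bool = False) -> float:
    """Estimated monthly charge in USD for synthesizing `characters`."""
    if characters < 0:
        raise ValueError("characters must be non-negative")
    billable = characters
    if in_free_tier:
        # Only characters beyond the free allowance are charged.
        billable = max(0, characters - FREE_TIER_CHARS_PER_MONTH)
    return billable * PRICE_PER_MILLION_CHARS / 1_000_000
```

Under these assumptions, 6M characters in a free-tier month leave 1M billable characters, which comes to $4.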
Why voice matters
• Spoken language is crucial for language learning
• Accurate pronunciation matters
• Faster iteration thanks to TTS
• As good as natural human speech
GoAnimate is a cloud-based, animated video creation platform.
“Amazon Polly gives GoAnimate users the ability to immediately give voice to the characters they animate using our platform.”
Alvin Hung
CEO, GoAnimate
• Multi-language communication
  • Training or HR professionals who have to create content in many languages
• Video preproduction
  • Video makers who need to iterate and fine-tune before the text-to-speech is eventually replaced by a professional voiceover
• K–12 education
  • Students who make videos and don’t have access to professional voices or time for or knowledge of voiceover
With Polly, GoAnimate gives voice to the characters in their animations
Royal National Institute of Blind People creates and
distributes accessible information in the form of
synthesized content
“Amazon Polly delivers incredibly lifelike voices which captivate and engage our readers.”
John Worsfold
Solutions Implementation Manager, RNIB
• RNIB delivers the largest library of audiobooks in the UK for nearly 2 million people with sight loss
• Naturalness of generated speech is critical to captivate and engage readers
• No restrictions on speech redistribution enable RNIB to create and distribute accessible information in the form of synthesized content
RNIB provides the largest library in the UK for people with sight loss
Amazon Lex - Features
Text and Speech language understanding: Powered by the same technology as Alexa
Enterprise SaaS Connectors: Connect to enterprise systems
Deployment to chat services
Designed for Builders: Efficient and intuitive tools to build conversations; scales automatically
Versioning and alias support
AWS Mobile Hub Integration
Authenticate users
Analyze user behavior
Store and share media
Synchronize data
More ….
Track retention
Conversational Bots
Lex
AWS Mobile SDKs
AWS Mobile Hub
Enterprise Connectors with Mobile Hub
[Diagram components: User Input, Mobile App, Amazon Lex, Mobile Hub SaaS Connector, Mobile Hub Custom Connector, Amazon API Gateway, AWS Lambda, Firewall, Business Application]
1: Understand user intent
2: Invoke a SaaS application or an existing business application
3: Translate REST response into natural language
Amazon Lex – Use Cases
Informational Bots
Chatbots for everyday consumer requests
• News updates
• Weather information
• Game scores ….
Application Bots
Build powerful interfaces to mobile applications
• Book tickets
• Order food
• Manage bank accounts ….
Enterprise Productivity Bots
Streamline enterprise work activities and improve efficiencies
• Check sales numbers
• Marketing performance
• Inventory status ….
Internet of Things (IoT) Bots
Enable conversational interfaces for device interactions
• Wearables
• Appliances
• Auto ….
Lex Bot Structure
BookHotel
Utterances: Spoken or typed phrases that invoke your intent
Intents: An intent performs an action in response to natural language user input
Slots: Slots are input data required to fulfill the intent
Fulfillment: Fulfillment mechanism for your intent
Fulfillment
AWS Lambda Integration: Intents and slots passed to AWS Lambda function for business logic implementation.
Return to Client: User input parsed to derive intents and slot values. Output returned to client for further processing.
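The AWS Lambda fulfillment path can be sketched as a handler that receives the intent and slots and returns a closing message. This is a minimal sketch assuming the original (V1) Amazon Lex event and response format (`currentIntent`, `dialogAction`). The slot names `Location` and `Nights` and the reply text are hypothetical; a real bot would use whatever slots it defines.

```python
from typing import Any, Dict, Optional


def close(session_attributes: Dict[str, Any], content: str,
          fulfillment_state: str = "Fulfilled") -> Dict[str, Any]:
    """Build a 'Close' response that ends the conversation with a message."""
    return {
        "sessionAttributes": session_attributes,
        "dialogAction": {
            "type": "Close",
            "fulfillmentState": fulfillment_state,
            "message": {"contentType": "PlainText", "content": content},
        },
    }


def lambda_handler(event: Dict[str, Any],
                   context: Optional[Any] = None) -> Dict[str, Any]:
    """Fulfill the BookHotel intent from the slots Lex has collected."""
    intent = event["currentIntent"]
    slots = intent.get("slots") or {}
    session = event.get("sessionAttributes") or {}

    if intent["name"] != "BookHotel":
        return close(session, "Sorry, I cannot handle that request.", "Failed")

    # Hypothetical slot names; use whatever slots the bot actually defines.
    location = slots.get("Location")
    nights = slots.get("Nights")

    # Business logic (for example, calling a reservation system) would go here.
    return close(session, f"Booked {nights} night(s) in {location}.")
```

With "Return to Client" fulfillment, no such function runs; the parsed intent and slot values go back to the calling application instead.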
Customer Testimonials: Capital One
“A highly scalable solution, it also offers potential to speed time to market for a new generation of voice
and text interactions such as our recently launched Capital One skill for Alexa.”
“As a heavy user of AWS, Amazon Lex’s seamless integration with
other AWS services like AWS Lambda and AWS DynamoDB is really
appealing.”
Customer Testimonials: HubSpot
“Through Amazon's Lex, we're adding sophisticated natural language processing capabilities that helps
GrowthBot provide a more intuitive UI for our users. Amazon Lex lets us take advantage of advanced A.I.
and machine learning without having to code the algorithms ourselves.”
“HubSpot's GrowthBot is an all-in-one chatbot which helps marketers and sales people
be more productive by providing access to relevant data and services using a
conversational interface. With GrowthBot, marketers can get help creating content,
researching competitors, and monitoring their analytics.”
Amazon Lex Pricing
                                   Text      Speech
Price per 1,000 requests           $0.75     $4.00
Free Tier* (requests per month)    10,000    5,000
*Available for the first year upon sign-up to new Amazon Lex customers
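The pricing above can be turned into a quick estimate. The sketch below assumes charges scale linearly per request with no rounding, which the slide does not state; `lex_monthly_cost` is a hypothetical helper name.

```python
PRICE_PER_1000 = {"text": 0.75, "speech": 4.00}     # USD, from the slide
FREE_REQUESTS = {"text": 10_000, "speech": 5_000}   # per month, first year


def lex_monthly_cost(text_requests: int, speech_requests: int,
                     in_free_tier: bool = False) -> float:
    """Estimated monthly charge in USD for the given request counts."""
    total = 0.0
    for kind, count in (("text", text_requests), ("speech", speech_requests)):
        if count < 0:
            raise ValueError(f"{kind} request count must be non-negative")
        billable = count
        if in_free_tier:
            # Only requests beyond the free allowance are charged.
            billable = max(0, count - FREE_REQUESTS[kind])
        total += billable * PRICE_PER_1000[kind] / 1000
    return total
```

For example, 20,000 text requests and 10,000 speech requests outside the free tier come to $15 plus $40 under these assumptions.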
Amazon Lex - Technology
Amazon Lex
Automatic Speech
Recognition (ASR)
Natural Language
Understanding (NLU)
Same technology that powers Alexa
Cognito CloudTrail CloudWatch
AWS Services
Action
AWS Lambda
Authentication
& Visibility
Speech
API
Language
API
Fulfillment
End-Users
Developers
Console
SDK
Intents,
Slots,
Prompts,
Utterances
Input:
Speech
or Text
Multi-Platform Clients:
Mobile, IoT, Web,
Chat
API
Response:
Speech (via Polly TTS)
or Text
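From a client, the text path through the API can be sketched with the boto3 `lex-runtime` client and its `post_text` operation, which belongs to the original (V1) Lex runtime API. The helper name `ask_bot` and the bot name, alias, and user ID in the usage comment are placeholders, not values from the slides.

```python
from typing import Any, Dict


def ask_bot(lex_client: Any, bot_name: str, bot_alias: str,
            user_id: str, text: str) -> Dict[str, Any]:
    """Send one line of user text to a bot and summarize the reply."""
    response = lex_client.post_text(
        botName=bot_name,
        botAlias=bot_alias,
        userId=user_id,      # identifies the conversation for this user
        inputText=text,
    )
    return {
        "state": response.get("dialogState"),
        "intent": response.get("intentName"),
        "slots": response.get("slots"),
        "message": response.get("message"),
    }


# Usage with real credentials (not executed here; names are placeholders):
#   import boto3
#   lex = boto3.client("lex-runtime")
#   print(ask_bot(lex, "MyBot", "prod", "user-1", "I want to book a hotel"))
```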
Alexa or Lex
• Alexa comes with an ecosystem (devices, install base, first-party and third-party skills)
• Lex is for building a single-tenant chatbot that integrates with your own users
Significantly improve many applications on multiple domains
“deep learning” trend in the past 10 years
image understanding, speech recognition, natural language processing
…
Deep Learning
autonomy
TX1 on Flying Drone
TX1 with customized board
Drone
Realtime detection and tracking on TX1
~10 frame/sec with 640x480 resolution
Deploy Everywhere
Beyond
BlindTool by Joseph Paul Cohen, demo on Nexus 4
✦ Fit the core library with all dependencies into a single C++ source file
✦ Easy to compile on …
Amalgamation
Runs in browser with JavaScript
The first image for search “dog” at images.google.com
Outputs “beagle” with prob = 73% within 1 sec
New P2 Instance | Up to 16 GPUs
This new instance type incorporates up to 8 NVIDIA Tesla K80 Accelerators,
each running a pair of NVIDIA GK210 GPUs.
Each GPU provides 12 GiB of memory (accessible via 240 GB/second of
memory bandwidth), and 2,496 parallel processing cores.
Available in PDX, IAD and DUB Regions
Instance Name   GPU Count   vCPU Count   Memory    Parallel Processing Cores   GPU Memory   Network Performance
p2.xlarge       1           4            61 GiB    2,496                       12 GiB       High
p2.8xlarge      8           32           488 GiB   19,968                      96 GiB       10 Gigabit
p2.16xlarge     16          64           732 GiB   39,936                      192 GiB      20 Gigabit
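The core and GPU memory columns follow directly from the per-GPU figures quoted above (2,496 cores and 12 GiB per GPU). The short sketch below reproduces that arithmetic; `gpu_totals` is a hypothetical helper name.

```python
CORES_PER_GPU = 2_496        # parallel processing cores per GPU, from the slide
MEMORY_PER_GPU_GIB = 12      # GPU memory per GPU, from the slide

P2_GPU_COUNTS = {"p2.xlarge": 1, "p2.8xlarge": 8, "p2.16xlarge": 16}


def gpu_totals(gpu_count: int) -> dict:
    """Total cores and GPU memory for an instance with `gpu_count` GPUs."""
    return {
        "cores": gpu_count * CORES_PER_GPU,
        "gpu_memory_gib": gpu_count * MEMORY_PER_GPU_GIB,
    }


for name, count in P2_GPU_COUNTS.items():
    totals = gpu_totals(count)
    print(f"{name}: {totals['cores']:,} cores, {totals['gpu_memory_gib']} GiB")
```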
Distributed Experiments
Google Inception v3
Increasing machines from 1 to 47
2x faster than TensorFlow if using more than 10 machines
[from Carlos Guestrin DSS’16 keynote]