Harnessing Artificial Intelligence_Alastair Cousins

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Alastair Cousins
Solutions Architect, Amazon Web Services
Harnessing Artificial Intelligence
in Your Applications

Intelligent Multimodal Interfaces

What is Amazon Polly?
• A service that converts text into lifelike speech
• Offers 48 lifelike voices across 24 languages
• Low latency responses enable developers to build
real-time systems
• Developers can store, replay, and distribute
generated speech
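
As a rough illustration of the bullets above, the following minimal sketch requests speech from Polly and stores it for replay. It assumes a boto3 Polly client is passed in; the helper name `synthesize_to_file`, the output path, and the default voice are illustrative choices, not something the slides specify.

```python
def synthesize_to_file(polly_client, text, path, voice_id="Joanna"):
    """Convert text to MP3 speech with Polly and store it at `path`.

    `polly_client` is assumed to be a boto3 Polly client,
    e.g. boto3.client("polly").
    """
    response = polly_client.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        VoiceId=voice_id,
    )
    # AudioStream is a streaming body; read() returns the encoded audio bytes.
    audio = response["AudioStream"].read()
    with open(path, "wb") as f:
        f.write(audio)
    return len(audio)
```

Typical use would be `synthesize_to_file(boto3.client("polly"), "Hello from Sydney", "hello.mp3")`, after which the file can be replayed or distributed without calling the service again.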

Amazon Polly: Quality
Natural-sounding speech
A subjective measure of how close TTS output is to human speech.
Accurate text processing
Ability of the system to interpret common text formats such
as abbreviations, numerical sequences, and homographs.
Today in Sydney Australia, it's 26°C
It’s nice to know, we’re going to Nice
Highly intelligible
A measure of how comprehensible speech is.
Peter Piper picked a peck of pickled peppers

Amazon Polly: SSML
Speech Synthesis Markup Language
A W3C recommendation: an XML-based markup language for speech
synthesis applications
<speak>
My name is Alastair. It is spelled
<prosody rate='x-slow'>
<say-as interpret-as="characters">Alastair</say-as>
</prosody>
</speak>
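
The markup above can also be generated programmatically. This is a minimal sketch, assuming the name comes from user input and therefore needs XML escaping; the helper `spell_out_ssml` is hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr


def spell_out_ssml(name, rate="x-slow"):
    """Build an SSML document that says a name and then spells it slowly."""
    safe = escape(name)  # escape &, < and > so the SSML stays well-formed
    return (
        "<speak>"
        f"My name is {safe}. It is spelled "
        f"<prosody rate={quoteattr(rate)}>"
        f'<say-as interpret-as="characters">{safe}</say-as>'
        "</prosody>"
        "</speak>"
    )
```

When sending the result to Polly through boto3, the `synthesize_speech` call takes `TextType="ssml"` so that the tags are interpreted rather than read aloud.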

Amazon Polly: SSML
Speech Synthesis Markup Language
A W3C recommendation: an XML-based markup language for speech
synthesis applications
<speak>
This is normal voice,
<amazon:effect name="whispered">
and this is me whispering!
</amazon:effect>
</speak>

Polly Voice Synthesis Architecture
Amazon Polly
Amazon API
Gateway
Lambda
function
Amazon
S3
Mobile App
IoT Device
Calling through API Gateway
allows us to implement caching
and use throttling and API
Keys via Usage Plans
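
The slide does not say how the Lambda function uses S3. One plausible reading, sketched here purely as an assumption, is that generated audio is stored under a key derived from the text and voice so that repeated requests skip synthesis. The helper names, the key layout, and the bucket are hypothetical; the clients are assumed to be boto3 S3 and Polly clients.

```python
import hashlib


def cache_key(text, voice_id):
    """Derive a stable S3 key from the voice and text (hypothetical layout)."""
    digest = hashlib.sha256(f"{voice_id}:{text}".encode("utf-8")).hexdigest()
    return f"speech/{voice_id}/{digest}.mp3"


def get_or_create_speech(s3_client, polly_client, bucket, text, voice_id="Joanna"):
    """Return (key, cache_hit); synthesize and store the audio only on a miss."""
    key = cache_key(text, voice_id)
    try:
        s3_client.head_object(Bucket=bucket, Key=key)
        return key, True  # audio already stored
    except s3_client.exceptions.ClientError as err:
        # Anything other than "not found" is a real failure and is re-raised.
        if err.response["Error"]["Code"] not in ("404", "NoSuchKey", "NotFound"):
            raise
    response = polly_client.synthesize_speech(
        Text=text, OutputFormat="mp3", VoiceId=voice_id
    )
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=response["AudioStream"].read(),
        ContentType="audio/mpeg",
    )
    return key, False
```

This is separate from the API Gateway response cache mentioned on the slide; the two could be combined, with API Gateway absorbing repeated identical requests before they reach Lambda.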

Images – Another Untapped Interface

Amazon Rekognition
Deep learning-based image recognition service
Search, verify, and organise millions of images
Object and Scene
Detection
Facial
Analysis
Face
Comparison
Facial
Recognition

Amazon Rekognition - Example Use Case

Detecting Faces in a Crowd
IoT
Camera
Amazon
Rekognition
Lambda
function
Amazon API
Gateway
DetectFaces()
Image
with
Faces
"Emotions": [
{"Confidence": 99.1335220336914,
"Type": "HAPPY" },
{"Confidence": 3.3275485038757324,
"Type": "CALM"},
{"Confidence": 0.31517744064331055,
"Type": "SAD"}
],
"Eyeglasses": {"Confidence": 99.8050537109375,
"Value": false},
"EyesOpen": {"Confidence": 99.99979400634766,
"Value": true},

Understanding Bounding Boxes
Turn Ratios into X/Y
co-ordinates:
multiply by the image
width/height
"BoundingBox": {
"Height": 0.3449999988079071,
"Left": 0.09666666388511658,
"Top": 0.27166667580604553,
"Width": 0.23000000417232513
},
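
The ratio-to-pixel conversion described above can be sketched as follows. The function names are illustrative; the dictionary shapes follow the `BoundingBox` and `Emotions` fragments shown on the slides. Clamping to the image bounds is an added precaution, since a box for a face at the edge of the frame may extend past it.

```python
def to_pixel_box(bounding_box, image_width, image_height):
    """Convert a ratio-based BoundingBox to (left, top, right, bottom) pixels."""
    def clamp(value, upper):
        return max(0, min(upper, value))

    left = round(bounding_box["Left"] * image_width)
    top = round(bounding_box["Top"] * image_height)
    right = round((bounding_box["Left"] + bounding_box["Width"]) * image_width)
    bottom = round((bounding_box["Top"] + bounding_box["Height"]) * image_height)
    return (
        clamp(left, image_width),
        clamp(top, image_height),
        clamp(right, image_width),
        clamp(bottom, image_height),
    )


def dominant_emotion(face_detail, min_confidence=50.0):
    """Return the highest-confidence emotion type, or None if none is confident."""
    emotions = face_detail.get("Emotions", [])
    if not emotions:
        return None
    best = max(emotions, key=lambda e: e["Confidence"])
    return best["Type"] if best["Confidence"] >= min_confidence else None
```

The tuple order matches the crop box convention of common imaging libraries, which makes it straightforward to cut each face out of the source image before storing it.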

Scaling to Many Faces
Amazon
Rekognition
Lambda
function
Amazon
Elasticsearch Service
Amazon
SNS
Lambda
function
Amazon
S3
User’s Face
Image
Fan Out of Lambda Functions via SNS.
1 Notification per Face detected
Metadata from DetectFaces() +
S3 Object Ref to Face Image
Metadata +
Location +
Timestamp
User’s Face
Image
searchFacesByImage()
indexFaces()
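
The fan-out step in this diagram, one SNS notification per detected face, might look like the sketch below. The message field names and the helper functions are assumptions made for illustration; only the idea of pairing DetectFaces() metadata with an S3 object reference, location, and timestamp comes from the slide. `sns_client` is assumed to be a boto3 SNS client.

```python
import json


def build_face_messages(face_details, bucket, key, location, timestamp):
    """Create one message per face returned by DetectFaces()."""
    return [
        {
            "face": detail,                           # metadata from DetectFaces()
            "image": {"bucket": bucket, "key": key},  # S3 object reference
            "location": location,
            "timestamp": timestamp,
        }
        for detail in face_details
    ]


def publish_faces(sns_client, topic_arn, messages):
    """Publish each message separately so each face triggers its own Lambda."""
    for message in messages:
        sns_client.publish(TopicArn=topic_arn, Message=json.dumps(message))
    return len(messages)
```

Each subscribed Lambda invocation then handles a single face, which keeps per-invocation work small regardless of how many faces the crowd image contains.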

Amazon Lex
AWS
Lambda
Amazon Polly
Amazon
CloudWatch
Monitoring
Text
Speech
Text
Amazon
DynamoDB
AWS IoT
Amazon API
Gateway
Conversational Interfaces
Applications

Smart Assistant - Key Features
Face detection with OpenCV http://opencv.org
Hot word detection to get device’s attention via Snowboy
https://snowboy.kitt.ai/
Silence detection during live speech capture for start/stop using SoX
http://sox.sourceforge.net/
Streaming of captured audio in real time via AWS IoT to reduce latency
NLU provided by Amazon Lex

Streaming audio via AWS IoT
AWS IoT
Audio streamed in
segments in real time using
SoX and a stdout pipe
Amazon
DynamoDB
Segments keyed and written
to DynamoDB as base64
chunks of audio
Amazon API
Gateway
On silence detected
ProcessLexDialog
Amazon
DynamoDB
Amazon
Lex
Lex Intent result payload
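
The segment handling on this slide can be sketched as two pure functions: one that splits captured audio into keyed base64 chunks (the shape that would be written to DynamoDB), and one that reassembles them once silence is detected. The field names `session_id`, `seq`, and `data`, and the chunk size, are hypothetical; the slides do not give the actual key schema.

```python
import base64


def make_segments(session_id, audio, chunk_size=4096):
    """Split raw audio bytes into ordered, base64-encoded segments."""
    return [
        {
            "session_id": session_id,  # groups segments of one utterance
            "seq": index,              # preserves ordering for reassembly
            "data": base64.b64encode(audio[offset:offset + chunk_size]).decode("ascii"),
        }
        for index, offset in enumerate(range(0, len(audio), chunk_size))
    ]


def reassemble(segments):
    """Rebuild the original audio from segments, whatever order they arrived in."""
    ordered = sorted(segments, key=lambda s: s["seq"])
    return b"".join(base64.b64decode(s["data"]) for s in ordered)
```

Sorting by sequence number matters because messages published over MQTT may be processed out of order by the time they are read back for submission to Lex.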

Wait for Hot Word
(Snowboy)
Wait for Face to
appear in camera view
Listen for audio
command
START
Smart Assistant

Wait for Face to
appear in camera view
Capture image from
webcam
(fswebcam)
Recognise Face
(Amazon Rekognition)
Resize to improve
process efficiency
(Imagemagick)
Detect face on device
(OpenCV)
Known User State
Replay Audio
Is the face
in the
collection?
YES
NO
Run User Speech
Dialogue Interaction
and NLU
Smart Assistant
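
The "Is the face in the collection?" decision in this flow could be implemented roughly as below. It assumes a boto3 Rekognition client, a collection populated earlier through IndexFaces with an `ExternalImageId` per user, and an image that has already passed on-device face detection. The helper name and the similarity threshold are illustrative.

```python
def identify_user(rekognition_client, collection_id, image_bytes, threshold=90.0):
    """Return the ExternalImageId of the best match, or None for an unknown face."""
    response = rekognition_client.search_faces_by_image(
        CollectionId=collection_id,
        Image={"Bytes": image_bytes},
        FaceMatchThreshold=threshold,
        MaxFaces=1,
    )
    matches = response.get("FaceMatches", [])
    if not matches:
        return None  # NO branch: face is not in the collection
    # YES branch: the identifier that was supplied when the face was indexed
    return matches[0]["Face"].get("ExternalImageId")
```

Resizing the captured frame before this call, as the flow suggests, reduces upload time without changing the logic here.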

Smart Assistant
Process intent
(API Gateway/Lambda)
Listen for speech input
with silence detection
(SoX)
Play audio response &
loop back to listen for
speech input
On Audio Segment
Recorded: MQTT
(AWS IoT)
On Silence: submit to
API Gateway based on key
YES
Run User Speech
Dialogue Interaction
and NLU
Is the
interaction
ready for
fulfillment?
NO
Listen for speech input
with silence detection
(SoX)
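
The listen, process, and reply loop in this flow can be summarised in a small driver function. This is a sketch under stated assumptions: `capture_audio`, `post_to_lex`, and `play_audio` are hypothetical callables standing in for SoX capture, the API Gateway/Lambda path to Lex, and local playback. The `dialogState` values are those returned by the Lex runtime.

```python
def run_dialogue(capture_audio, post_to_lex, play_audio, max_turns=10):
    """Loop until Lex reports the intent is ready for fulfillment.

    Returns the final Lex result, or None if the dialogue failed
    or did not complete within `max_turns`.
    """
    for _ in range(max_turns):
        audio = capture_audio()       # blocks until silence is detected
        result = post_to_lex(audio)   # Lex intent result payload
        state = result["dialogState"]
        if state in ("ReadyForFulfillment", "Fulfilled"):
            return result             # YES branch: hand off for fulfillment
        if state == "Failed":
            return None
        # NO branch: play the prompt and loop back to listen again
        play_audio(result.get("audioStream"))
    return None
```

Bounding the loop with `max_turns` is a design choice added here so that a confused conversation cannot keep the device listening indefinitely.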

Wrap Up
• Amazon AI Services are simple
• Developers can add AI to real world applications quickly
• AI opens up new mediums and interfaces
• Deploy at scale and at low cost
• For more information: https://aws.amazon.com/amazon-ai/
