Media Processing Workflows at High Velocity and Scale using AI and ML - AWS Online Tech Talks
1. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Andy Katz, AWS Product Manager, AWS Step Functions
Tom Nightingale, AWS Solutions Architect
March 28, 2018
Media Processing Workflows at
High Velocity & Scale using
Orchestration & Machine
Learning
2. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Changes in viewers' habits are accelerating
Customers expect new content
options available anytime,
anywhere, on any device, and in the
highest possible quality
• OTT growing at almost 10x the
pace of pay TV. Within 5 years
could make up 1/3 of the market
(from 15% today)
• Increase in choice: 400% more
content choices per person from
2007 to 2017
Source:
ABI Research, Over the Top (OTT) and Multiscreen Video Services 2017
Smart Screen News, Avid CEO: ‘Massive Explosion’ of Content Has Created New Challenges for Media Companies, Jan 2018
3. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Leading to fundamental changes in the industry…
Immediate access
VIEWER EXPECTATIONS: Content on demand, more options, and only pay for what you watch
BUSINESS CHANGE: Cord cutting and skinny bundles. Distribution moves to streaming S/A/T/VoD
OPERATIONAL IMPLICATIONS: Global delivery and immediate localization of larger volumes of content. Low latency streaming
AWS SOLUTIONS: Enable OTT, broadcast playout, and video workflows with AWS Media Services
Personalization
VIEWER EXPECTATIONS: The most relevant content and ads at the top of the page
BUSINESS CHANGE: Personalized digital ads. Adaptive content recommendations
OPERATIONAL IMPLICATIONS: Machine Learning for personalized content recommendations and ad serving
AWS SOLUTIONS: Data lakes. ML platforms and frameworks. MediaTailor for ad serving
More content
VIEWER EXPECTATIONS: Deep libraries of content, particularly exclusives, for domestic, niche, and global audiences
BUSINESS CHANGE: Produce more content appealing to local and global audiences. Multi-device support
OPERATIONAL IMPLICATIONS: More production, supply chain automation. Smarter programming decisions based on audience data
AWS SOLUTIONS: Grow DAM/storage (S3/Glacier), render at scale (ThinkBox and Spot). Smarter supply chain and data insights
4. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS core services provide a foundation on which you can build native, partner, & customized solutions
AWS Media & Entertainment Solutions: Native, Partner, Custom
AWS Core Services: Storage, Compute, Database, Networking, CDN, ML, Orchestration
5. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Native
Spin up and combine multiple AWS native media services directly from the AWS console
Amazon Rekognition
Kinesis Video Streams
Amazon Polly
MediaConvert
Thinkbox Deadline
MediaLive
MediaPackage
MediaStore
MediaTailor
6. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Partner
Leverage AWS’s rich ecosystem of leading hardware, software and service partners providing a range of cloud-enabled media workflow solutions
7. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Custom
AWS Core Services provide true primitives on which to build modular, custom solutions.
Storage, Compute, Database, Networking, CDN, Orchestration, Machine Learning, Media
8. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS stack for media-streaming workflows
Store once, deliver anywhere
Ingest/Create Store Process Deliver
AWS Direct Connect
AWS Import/Export
AWS Storage Gateway
AWS Snowball
S3 Transfer Acceleration
AWS Elemental MediaLive
Amazon EBS
Amazon S3
Amazon CloudFront
Route 53
AWS WAF
AWS Elemental MediaTailor
Amazon VPC
Lambda
Amazon EC2
Amazon Rekognition
Amazon Lex
Amazon Polly
Amazon Machine Learning
Amazon RDS
Amazon DynamoDB
Amazon Elastic Transcoder
Amazon CloudSearch
Amazon SQS
AWS Step Functions
Amazon SNS
Amazon Transcribe
Amazon Comprehend
Amazon Translate
AWS Elemental MediaStore
AWS Elemental MediaPackage
AWS Elemental MediaConvert
AWS Elemental MediaLive
Amazon Glacier
Amazon EFS
9. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Is this you?
“I want to sequence tasks”
“I want to run tasks in parallel”
“I want to select tasks based on data”
“I want to retry failed tasks”
“I want try/catch/finally”
[Diagrams: task boxes A, B, and C illustrating each pattern]
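Each of these requests corresponds to a state type or field in the Amazon States Language used by AWS Step Functions. The following is a minimal sketch, written as a Python dictionary so it can be checked and serialized to JSON. The state names, the `$.kind` input field, and the placeholder ARN are illustrative assumptions, not taken from the slides.

```python
import json

# Placeholder ARN (assumption): substitute a real Lambda function ARN before deploying.
PLACEHOLDER_ARN = "arn:aws:lambda:REGION:ACCOUNT_ID:function:NAME"


def task(**extra):
    """Build a Task state that points at the placeholder Lambda function."""
    return {"Type": "Task", "Resource": PLACEHOLDER_ARN, **extra}


definition = {
    "Comment": "Sequence, choice, parallel, retry and catch in one workflow",
    "StartAt": "A",
    "States": {
        # Sequencing ("Next"), retrying ("Retry") and try/catch ("Catch")
        "A": task(
            Retry=[{"ErrorEquals": ["States.ALL"], "MaxAttempts": 2}],
            Catch=[{"ErrorEquals": ["States.ALL"], "Next": "Failed"}],
            Next="Decide",
        ),
        # Selecting the next task based on data in the input
        "Decide": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.kind", "StringEquals": "video", "Next": "BandC"}
            ],
            "Default": "Failed",
        },
        # Running tasks in parallel
        "BandC": {
            "Type": "Parallel",
            "Branches": [
                {"StartAt": "B", "States": {"B": task(End=True)}},
                {"StartAt": "C", "States": {"C": task(End=True)}},
            ],
            "End": True,
        },
        "Failed": {"Type": "Fail", "Error": "WorkflowFailed"},
    },
}


def dangling_targets(machine):
    """Return transition targets that are not defined as top-level states."""
    states = machine["States"]
    targets = {machine["StartAt"]}
    for state in states.values():
        if "Next" in state:
            targets.add(state["Next"])
        if "Default" in state:
            targets.add(state["Default"])
        for rule in state.get("Choices", []) + state.get("Catch", []):
            targets.add(rule["Next"])
    return targets - set(states)


print(json.dumps(definition, indent=2))
```

The `dangling_targets` helper is only a local sanity check; it does not replace the validation Step Functions performs when a state machine is created.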
10. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Task
Choice
Fail
Parallel
Mountains
People
Snow
AWS Step Functions
Makes it easy to coordinate components of distributed applications
using visual workflows
11. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Configure workflows in JSON
{
  "Comment": "Image Processing workflow",
  "StartAt": "ExtractImageMetadata",
  "States": {
    "ExtractImageMetadata": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-west-2:0...",
      "InputPath": "$",
      "ResultPath": "$.extractedMetadata",
      "Next": "ImageTypeCheck",
      "Catch": [ {
        "ErrorEquals": [ "ImageIdentifyError"],
        "Next": "NotSupportedImageType"
      } ],
      "Retry": [ {
        "ErrorEquals": [ "States.ALL"],
        "IntervalSeconds": 1,
        "MaxAttempts": 2,
        "BackoffRate": 1.5 }, ...
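In a Retry block, IntervalSeconds is the wait before the first retry, BackoffRate multiplies that wait on each later retry, and MaxAttempts caps the number of retries. A small sketch of the resulting schedule for the values shown above (the function name is ours, for illustration only):

```python
def retry_delays(interval_seconds, max_attempts, backoff_rate):
    """Seconds waited before each retry attempt, in order."""
    return [interval_seconds * backoff_rate ** attempt
            for attempt in range(max_attempts)]


# Values from the Retry block in the workflow fragment
print(retry_delays(1, 2, 1.5))  # [1.0, 1.5]
```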
12. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Media workflow management using AWS Step Functions
Streamline, orchestrate & optimize
• Coordinate end-to-end workflows
• Reduce the time to design, test, deliver & iterate
• Optimize use of resources and talent
• Make decisions in real time
• Gain real-time visibility of workflow progress
• Easily adjust workflows as needs change
• Create new workflows & distribution channels faster, for less
• Handle variations in volume of incoming content
• Increase production capacity
• Proactively address bottlenecks
13. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Digital Asset Management & Supply Chain
Automate broadcast supply chains so you can manage your content faster and better.
• Ingestion and processing
• Packaging and origination
• DAM, storage, and archiving
• Metadata tagging
14. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
A basic problem: file-based transcoding
Re-encoding and converting one file format and bitrate/resolution to another
(or many others)
1080p/3Mbps
720p/2Mbps
480p/1Mbps
360p/900Kbps
240p/400Kbps
H.265/HEVC
H.264/AVC
H.263 (v2), VP9
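To get a feel for what a rendition ladder like this costs in storage, the bitrates can be converted to file size per minute of content. This is a rough sketch that treats the listed figures as total bitrates and ignores audio and container overhead (both assumptions):

```python
# Rendition ladder from the slide: (name, bitrate in kilobits per second)
LADDER = [
    ("1080p", 3000),
    ("720p", 2000),
    ("480p", 1000),
    ("360p", 900),
    ("240p", 400),
]


def megabytes_per_minute(kbps):
    """Convert a bitrate in kbit/s to megabytes per 60 seconds (1 MB = 8000 kbit)."""
    return kbps * 60 / 8000


for name, kbps in LADDER:
    print(f"{name}: {megabytes_per_minute(kbps):.2f} MB per minute")

total = sum(megabytes_per_minute(kbps) for _, kbps in LADDER)
print(f"All renditions: {total:.2f} MB per minute of source")
```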
15. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS
Marketplace
Amazon
Elastic Transcoder
AWS
Elemental
AMI Model
Licensed S/W
Minimal Disruption
Proxies
Fast Integration
UGC & Prosumer
On-Prem & Cloud
Live, VOD, JIT
Professional
Media processing options
DIY BYO
PaaS / SaaS
BYOL
Self Contained
Custom Solutions
16. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Frame.io
Custom solution for real-time transcoding optimization
A leading workflow management platform to streamline media review and collaboration
Problem Statement
Frame.io needed a flexible way to coordinate a combination of Lambda functions and ECS tasks that manipulate large media files, for example to transcode them to various formats and create thumbnails.
Use of AWS
• Step Functions decides whether to use Lambda or ECS to run transcodes, depending on duration and file size
Business Benefits
• Improved performance and lower costs
• Code is easily managed and debugged
• Increased code releases 20x
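The Lambda-or-ECS decision can be pictured as a Choice state. Below is a sketch in Python of both the decision logic and an equivalent Choice state. The thresholds, the input field names (`durationSeconds`, `sizeBytes`), and the state names are hypothetical; the slides do not give Frame.io's actual values.

```python
# Hypothetical thresholds: the slides do not state the real cut-offs.
MAX_LAMBDA_DURATION_SECONDS = 60
MAX_LAMBDA_SIZE_BYTES = 256 * 1024 ** 2


def choose_runner(duration_seconds, size_bytes):
    """Send short, small files to Lambda and everything else to ECS."""
    if (duration_seconds < MAX_LAMBDA_DURATION_SECONDS
            and size_bytes < MAX_LAMBDA_SIZE_BYTES):
        return "lambda"
    return "ecs"


# The same rule expressed as an Amazon States Language Choice state.
choice_state = {
    "Type": "Choice",
    "Choices": [
        {
            "And": [
                {"Variable": "$.durationSeconds",
                 "NumericLessThan": MAX_LAMBDA_DURATION_SECONDS},
                {"Variable": "$.sizeBytes",
                 "NumericLessThan": MAX_LAMBDA_SIZE_BYTES},
            ],
            "Next": "TranscodeOnLambda",
        }
    ],
    "Default": "TranscodeOnECS",
}

print(choose_runner(30, 10 * 1024 ** 2))
print(choose_runner(3600, 10 * 1024 ** 2))
```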
17. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Thomson Reuters (Reuters Media)
Serverless split video transcoding
A multinational mass media and information firm and parent company of international news agency Reuters News
Problem Statement
Reuters Media needed to transcode ~350 news video clips per day into 14 formats each, as quickly as possible. But using FFmpeg meant processing time was roughly 100% of the video length.
Use of AWS
• Serverless split video processing using Step Functions, Lambda and S3
Business Benefits
• Process video segments in parallel
• Reduced processing time from ~20 minutes to ~2 minutes
• The bigger the source video, the more segments, the bigger the savings
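The split-and-parallelize idea can be sketched locally as follows. This version cuts the video into fixed-length segments and uses a thread pool as a stand-in for parallel Lambda invocations. The real workflow would split at keyframes and call a transcoder, so `transcode_segment` is a placeholder and the segment length is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def plan_segments(duration, segment_length):
    """Return (start, end) pairs in seconds covering the whole duration."""
    segments = []
    start = 0.0
    while start < duration:
        end = min(start + segment_length, duration)
        segments.append((start, end))
        start = end
    return segments


def transcode_segment(segment):
    """Placeholder for the per-segment transcode; returns an output name."""
    start, end = segment
    return f"segment_{start:.0f}_{end:.0f}.mp4"


def transcode_in_parallel(duration, segment_length, workers=8):
    """Process all segments concurrently, keeping results in source order."""
    segments = plan_segments(duration, segment_length)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, ready for concatenation
        return list(pool.map(transcode_segment, segments))


print(transcode_in_parallel(125, 60))
```

Because each segment takes about as long to process as it lasts, wall-clock time approaches the length of one segment plus the split and concatenate overhead, which is why longer sources benefit more.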
19. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Another way: AWS Elemental MediaConvert
A file-based video processing service that allows anyone, with any
size content library, to easily and reliably transcode
on-demand content for broadcast and multiscreen delivery
• Access to professional grade video features and quality
• No software or hardware infrastructure to manage
• Automatically scales in response to variations in incoming video volume
• Ability to manage capacity and control order in which jobs are processed
20. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS MediaConvert basics
• Job
• Primary unit of work, specifies input and output
• Output Preset
• Settings to create a single output
• Job Template
• Collection of commonly used job settings
• Useful when processing a collection of inputs to produce a fixed set of outputs
• Queue
• All jobs are submitted to a queue
• Allows user to separate or group jobs for processing
• Jobs within a queue are processed in parallel, and queues are processed in
parallel
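A sketch of how these concepts map onto a job submission with the AWS SDK for Python. The request builder is plain Python. `submit_job` shows the boto3 `create_job` call but is not executed here. The role ARN, template name, queue ARN, and endpoint URL are placeholders you would supply, and the exact settings your template needs may differ.

```python
def build_job_request(input_uri, role_arn, job_template, queue_arn):
    """Assemble create_job arguments: one input, outputs taken from a job template."""
    return {
        "Role": role_arn,              # IAM role MediaConvert assumes to read and write S3
        "JobTemplate": job_template,   # reusable collection of job settings
        "Queue": queue_arn,            # queue the job is submitted to
        "Settings": {"Inputs": [{"FileInput": input_uri}]},
    }


def submit_job(request, endpoint_url):
    """Submit the job via the account-specific MediaConvert endpoint (not run here)."""
    import boto3  # imported lazily so the builder works without the SDK installed

    client = boto3.client("mediaconvert", endpoint_url=endpoint_url)
    return client.create_job(**request)


request = build_job_request(
    "s3://example-bucket/source.mov",                    # placeholder input
    "arn:aws:iam::ACCOUNT_ID:role/ExampleRole",          # placeholder role
    "ExampleTemplate",                                   # placeholder template
    "arn:aws:mediaconvert:REGION:ACCOUNT_ID:queues/Default",
)
print(request)
```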
21. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS MediaConvert is good for:
Anyone with video content who needs to convert it for delivery
to consumer devices
• Companies with content distribution workflows, for premium video or
short-form web / UGC content
• Customers processing video content in the cloud now, or planning to move
workflows to the cloud
• Companies with high volume or varying volumes of source video content
• Any video provider or enterprise wanting to streamline transcoding operations
22. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS Answers
https://aws.amazon.com/answers/
AWS Answers:
Best practices
Prescriptive design patterns
Ready-made solutions
Strategic guidance
Solution Resources:
Implementation Guide
CloudFormation Template
Source Code
23. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
New! Video-on-Demand on AWS Solution
Features:
• Serverless architecture
• 1080p through 270p HLS and DASH
outputs
• 4K, HD and SD H.265 MP4 outputs
• SNS notifications on ingest encoding and
complete
• Workflow details and asset metadata
stored in DynamoDB
• Error handling
• Options for source Archiving to Glacier
24. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Ingest Publish
S3
Process
MediaConvert
Regional/Account
Custom API Endpoint
Workflow driven by AWS Step Functions
26. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
MediaConvert encoding
MediaConvert
Regional/Account
API Endpoint
Amazon
CloudWatch
Event
Process Step
Functions
Event Rule Pattern
AWS Lambda Publish Step
Functions
JavaScript AWS-SDK
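The event rule pattern is what connects MediaConvert job completion to the publish step. The sketch below shows such a pattern as a Python dictionary, with a simplified matcher to illustrate which events it selects. The solution's Lambda functions use the JavaScript SDK; Python is used here only for illustration, and the matcher handles exact-value matching only, not the full CloudWatch Events pattern syntax.

```python
# Event pattern selecting finished or failed MediaConvert jobs
EVENT_PATTERN = {
    "source": ["aws.mediaconvert"],
    "detail-type": ["MediaConvert Job State Change"],
    "detail": {"status": ["COMPLETE", "ERROR"]},
}


def matches(pattern, event):
    """Simplified matcher: every pattern key must be present with an allowed value."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            # Nested pattern: recurse into the corresponding event object
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


sample_event = {
    "source": "aws.mediaconvert",
    "detail-type": "MediaConvert Job State Change",
    "detail": {"status": "COMPLETE", "jobId": "example-job-id"},
}
print(matches(EVENT_PATTERN, sample_event))
```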
29. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Error handling
Amazon SNS
Publish
Amazon
CloudWatch
Amazon
DynamoDB
AWS Lambda
Workflow
functions
AWS Lambda
Error
Handler
AWS Step
Functions
AWS Elemental
MediaConvert
30. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Video-on-Demand on AWS Solution
AWS Answers
https://aws.amazon.com/answers/media-entertainment/video-on-demand-on-aws/
AWS Labs
https://github.com/awslabs/video-on-demand-on-aws
31. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Machine Learning and Analytics
Analyze customer content and contextual data, enabling you to gain actionable insights to identify customer interests, manage infrastructure and monetize content. Automate content processes to improve operational efficiency and unlock archive value.
32. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Ingest Store Analyze Deliver
PETABYTES OF IMAGES
AND MIXED MEDIA ASSETS
CENTRALIZED STORAGE
& GLOBAL REGISTRY
METADATA ENRICHMENT
THROUGH DEEP LEARNING
ENHANCED VALUE
AND SEARCH EXPERIENCE
Going further: media intelligence
33. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Machine learning can be applied across the media value chain
Content Creation & Post Production
• Pre-processing and optimization
• Dailies/editorial review
• Application & filesystem texture and asset search
• B-roll and false take tagging
Digital Asset Management & Supply Chain
• Tag on ingest
• Live and VOD feature extraction
• Celebrity detection
• Auto-categorization
• Metadata augmentation
• Closed captioning
• Automated captioning and translation
• Automated IP protection and security warnings
Publishing & Distribution
• Ad personalization
• Content recommendation
• Filtering and quality control
• Translate services
• Audience engagement
• Demographics and sentiment analysis
• Anti-piracy
34. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
The Amazon machine learning stack
Platform Services
Application Services
Frameworks & Interfaces
Caffe2 CNTK
Apache
MXNet
PyTorch TensorFlow Torch Keras Gluon
AWS Deep Learning AMIs
Amazon SageMaker AWS DeepLens
Rekognition Transcribe Translate Polly Comprehend Lex
Rekognition
35. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon Rekognition
Deep learning-based image and video analysis
Object, Scene &
Activity Recognition
Facial
Recognition
Facial Analysis Person Tracking
Unsafe Content
Detection
Celebrity
Recognition
Text in Images
36. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Ingest Store Analyze Deliver
PETABYTES OF IMAGES
AND MIXED MEDIA ASSETS
CENTRALIZED STORAGE
& GLOBAL REGISTRY
METADATA ENRICHMENT
THROUGH DEEP LEARNING
ENHANCED VALUE
AND SEARCH EXPERIENCE
Media intelligence pipeline
37. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Global asset ingest & registration
• New media may be shot for ingest anywhere
on the planet (and beyond)
• Globally unique asset-ID registry which
creates an ID for media assets
• Service can handle parent-child relationships
for asset versioning
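A registry of this kind can be sketched in a few lines. The example below is an in-memory illustration only, using random UUIDs as asset IDs. The class and method names are hypothetical, and a real registry would persist its records (the solution stores asset metadata in DynamoDB).

```python
import uuid


class AssetRegistry:
    """In-memory sketch of an asset-ID registry with parent-child links."""

    def __init__(self):
        self._parents = {}  # asset ID -> parent asset ID (None for originals)

    def register(self, parent_id=None):
        """Create a new asset ID, optionally as a version of an existing asset."""
        if parent_id is not None and parent_id not in self._parents:
            raise KeyError(f"unknown parent asset: {parent_id}")
        asset_id = str(uuid.uuid4())
        self._parents[asset_id] = parent_id
        return asset_id

    def lineage(self, asset_id):
        """Return the asset ID followed by its ancestors, oldest last."""
        chain = []
        while asset_id is not None:
            chain.append(asset_id)
            asset_id = self._parents[asset_id]
        return chain


registry = AssetRegistry()
original = registry.register()
resized = registry.register(parent_id=original)
print(registry.lineage(resized))
```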
38. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Key AWS components
S3 Step Functions DynamoDB
API Gateway Lambda Rekognition Elasticsearch
CloudFront
{
"FaceMatches": [
{"Face": {"BoundingB
"Height":
0.2683333456516266,
"Left":
0.5099999904632568,
"Top":
0.1783333271741867,
"Width":
0.17888888716697693},
"
CompareFaces
DetectFaces
DetectLabels
DetectModerationLabels
GetCelebrityInfo
RecognizeCelebrities
Lambda-Centric AWS Service Stack Rekognition API Endpoints
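Rekognition reports bounding boxes as ratios of the overall image width and height, so they have to be scaled before drawing or cropping. A short sketch using the values from the response fragment above; the 900 x 600 image size is an assumption chosen for illustration.

```python
def to_pixels(box, image_width, image_height):
    """Scale a ratio-based bounding box to whole-pixel coordinates."""
    return {
        "left": round(box["Left"] * image_width),
        "top": round(box["Top"] * image_height),
        "width": round(box["Width"] * image_width),
        "height": round(box["Height"] * image_height),
    }


# Bounding box values from the sample response on the slide
box = {
    "Height": 0.2683333456516266,
    "Left": 0.5099999904632568,
    "Top": 0.1783333271741867,
    "Width": 0.17888888716697693,
}

# Assumed image dimensions (not stated on the slide)
print(to_pixels(box, 900, 600))
```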
39. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Real-time
Search
Label
Detection
UUID
Generator
Solution architecture
UUID
API Gateway
Lambda(s)
Rekognition
CloudFront
Browser / API
Client
Image
Processing
Step Functions
Elasticsearch
Client Lookup
Delivery
Ingest
Processing
Service
Frontend
Asset
Metadata
DynamoDB
Metadata
Service
API Gateway
Content
Archive
S3 Image
Storage
40. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Step Functions design
• Lambda is a natural fit for image processing
with Rekognition
• Caveat: inherently stateless
• Media processing pipelines are multi-stage:
UUID gen, media resizing & content
optimization
• State machine-based Step Functions are an
absolute must to ensure processing at high
velocity and scale
Start CheckUUID FailureState
CheckReko GetUUID
WasReko
CheckSize
NeedResize
Rekognize
SaveES SaveDynDB
Success
End
Resize
HasUUID
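One way to read the state names in the diagram is as a sequence of checks, each followed by a decision. The sketch below traces the states an asset would visit under that reading. The transition order, the asset fields, and the size limit are all assumptions inferred from the state names, not a transcription of the actual state machine.

```python
# Hypothetical size limit above which an image is resized before analysis
MAX_IMAGE_BYTES = 5 * 1024 ** 2


def plan_states(asset):
    """Return the ordered list of states visited for an asset (assumed flow)."""
    path = ["CheckUUID", "HasUUID"]
    if not asset.get("uuid"):
        path.append("GetUUID")          # assign an ID only when none exists
    path += ["CheckReko", "WasReko"]
    if asset.get("labels"):
        return path + ["Success"]       # already analyzed, nothing more to do
    path += ["CheckSize", "NeedResize"]
    if asset["bytes"] > MAX_IMAGE_BYTES:
        path.append("Resize")
    return path + ["Rekognize", "SaveES", "SaveDynDB", "Success"]


print(plan_states({"bytes": 8 * 1024 ** 2}))
print(plan_states({"uuid": "example-id", "labels": ["Gorilla"], "bytes": 1024}))
```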
41. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Rekognition sample response
Mountain Gorilla
"Labels": [ {
"Confidence": 98.82418823242188,
"Name": "Animal"},{
"Confidence": 98.82418823242188,
"Name": "Gorilla"},{
"Confidence": 98.82418823242188,
"Name": "Mammal"},{
"Confidence": 98.82418823242188,
"Name": "Monkey"},{
...
Hawksbill Sea Turtle
"Labels": [ {
"Confidence": 95.04956817626953,
"Name": "Reptile" },{
"Confidence": 95.04956817626953,
"Name": "Sea Life" },{
"Confidence": 95.04956817626953,
"Name": "Sea Turtle" },{
"Confidence": 95.04956817626953,
"Name": "Tortoise" },{
...
42. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
User experience
43. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Next steps for the solution
Video capability
Metadata transformer for varying output req’s
Rekognition result differential tracking
Integration with existing Web & Mobile apps
44. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Media Analytics use cases
• Immediate response for public safety and security
• Providing a searchable library of videos and images
• Sentiment analysis for advertisers, retailers, or social
media analysts
• Customer analytics
• Localizing web content for international users
45. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
The Amazon machine learning stack
Platform Services
Application Services
Frameworks & Interfaces
Caffe2 CNTK
Apache
MXNet
PyTorch TensorFlow Torch Keras Gluon
AWS Deep Learning AMIs
Amazon SageMaker AWS DeepLens
Rekognition Transcribe Translate Polly Comprehend Lex
Rekognition Transcribe Comprehend
46. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon Transcribe
Automatic conversion of speech into accurate, grammatically correct text
Support for
telephony audio
Timestamp
generation
Intelligent punctuation &
formatting
Recognize multiple
speakers
Custom
vocabulary
Multiple
languages
47. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon Comprehend
Discover insights and relationships in text
Entities
Key Phrases
Language
Sentiment
Amazon
Comprehend
48. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS Media Analytics solution
49. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS Media Analytics solution workflow
50. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS Media Analytics Step Functions state machine
51. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Playout &
Distribution
Filtering & Quality
Control
Visual Effects & Editing
Application & Filesystem
Texture & Asset Search
Analytics
Sentiment Analysis
Other Amazon AI
Services
(Lex, Polly)
DAM & Archive
Auto-categorization
Metadata Augmentation
Digital Supply Chain
Tag on Ingest
Live and VOD Feature
Extraction
Celebrity Detection
Publishing
Value Add
API-based services
OTT
Filtering & Quality
Control
Acquisition
Pre-processing &
optimization
Applications of ML across M&E segments
52. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS can help at each step of the video value chain
Faster, Smarter, More Efficiently
• Faster: Accelerate innovation and increase agility by reducing time-to-market
• Smarter: Personalize experiences, streamline processes, and unlock content through machine learning and automation
• More Efficiently: On-demand, pay-as-you-go compute, storage, and video services scaling to demand
Build Your Way: Native, Partner, Custom
End to End: Content Creation & Post Production, Digital Asset Management & Supply Chain, Distribution, Machine Learning and Analytics
53. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Thank you!
AWS Media & Entertainment: https://aws.amazon.com/digital-media/
AWS Elemental Video Solutions: https://aws.amazon.com/digital-media/aws-managed-video-services/
AWS Answers Video on Demand Solution: https://aws.amazon.com/answers/media-entertainment/video-on-demand-on-aws/
AWS Step Functions: https://aws.amazon.com/step-functions/
54. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Backup
55. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Sequential steps
Start
Upload RAW file
Delete RAW file
End
56. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Branching steps
Start
Select image converter
RAW to JPEG, RAW to TIFF, RAW to PNG
Unsupported image type
Load in database
End
57. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Parallel steps
Start
Process photo
Resize image, Extract metadata, Facial recognition
Load in database
End
58. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
On-prem transcoding
• Complex to setup & manage
• Scaling requires effort
• Upfront & unpredictable costs
• Lacks flexibility as different
resolutions (e.g., 8K) and form
factors (e.g., AR/VR) emerge
Transcoding challenges
Cloud-based transcoding
• Not suited for broadcast grade video and
quality
• Limited scalability and support
• Complicated pricing and manual on-
boarding issues
• Not easy to integrate with other AWS
Services
59. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Concatenate
segments
No
All segments
processed?
Process segment 1
Locate
Keyframes
Source
video store
Split
video
Source
segment bucket
Result
video bucket
Processed
segment bucket
Process segment 2
Source video Video segments Resultant video
State Machine
60. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Playout &
Distribution
Visual Effects & Editing
Analytics
DAM & Archive
Digital Supply Chain
Publishing
OTT
Acquisition
Media segments
61. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Label data storage
• JSON blobs well suited to
unstructured ES search & NoSQL
• Multiple labels can be used to
effectively widen ES search results
• Rekognition’s MinConfidence
threshold removes false positives;
MaxLabels limits returned results
• Client-side filtering can be used to
rank results by confidence score
[
{
"UUID": "<UUID>",
"Bucket": "<bucket>",
"Key": "<key>",
"Labels": [
{
"Labels": [
{ "Name": "turtle", "Confidence": 98.4629 },
{ "Name": "water", "Confidence": 79.2097 },
{ "Name": "sea", "Confidence": 75.0611 },
{ "Name": "clouds", "Confidence": 50.5281 }
]
...
]
DynamoDB Elasticsearch
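The server-side and client-side filtering described in these bullets can be sketched as follows. `detect_labels` shows the boto3 Rekognition call with `MinConfidence` and `MaxLabels` but is not executed here. `rank_labels` is a hypothetical client-side helper, run against the label values from the record above.

```python
def detect_labels(bucket, key, min_confidence=75.0, max_labels=10):
    """Ask Rekognition for labels on an S3 image (not run here; needs AWS credentials)."""
    import boto3  # imported lazily so rank_labels works without the SDK installed

    client = boto3.client("rekognition")
    response = client.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=max_labels,            # limits the number of returned labels
        MinConfidence=min_confidence,    # drops low-confidence labels server-side
    )
    return response["Labels"]


def rank_labels(labels, min_confidence=75.0, max_labels=10):
    """Client-side filter: keep confident labels, highest confidence first."""
    kept = [label for label in labels if label["Confidence"] >= min_confidence]
    kept.sort(key=lambda label: label["Confidence"], reverse=True)
    return kept[:max_labels]


# Label values from the sample record on the slide
labels = [
    {"Name": "water", "Confidence": 79.2097},
    {"Name": "clouds", "Confidence": 50.5281},
    {"Name": "turtle", "Confidence": 98.4629},
    {"Name": "sea", "Confidence": 75.0611},
]
print([label["Name"] for label in rank_labels(labels)])
```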
62. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Using Amazon AI/ML services for media
Content Indexing / Metadata Generation
Use services such as Amazon Rekognition & Amazon
Transcribe to generate metadata about your content
Store that metadata and make it searchable
Content Retrieval / Action Metadata
Database tells you scene exists in a given file at a
given time
Retrieve it for timely use
Live and File
SOURCES
AWS Elemental
Media Services
MEDIA
PROCESSING
Amazon ML/AI
Services
ML / AI
Amazon
DynamoDB
DATABASE
Live and File
CONTENT
AWS Elemental
Media Services
MEDIA
PROCESSING
63. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Content Indexing / Metadata Generation:
AWS Elemental MediaConvert & Amazon Transcribe
The Challenge
• An online training provider has 1000s of hours of video that need captions
• Video is in a variety of formats
The Solution
• Use AWS Elemental MediaConvert to create an audio-only version of the content
• Use Amazon Transcribe to generate a timestamped transcription
• Convert Amazon Transcribe output to a captions file
The Benefit
• All formats of video content get captions added to make them more accessible
• Option to run Amazon Transcribe output through Amazon Translate to get multi-language captions
64. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon Transcribe
AWS Elemental MediaConvert
job transcodes source file,
creating audio-only rendition
for Amazon Transcribe
AWS Elemental
MediaConvert also
creates normal
audio/video output
AWS Lambda function
triggered by S3 object-
created event creates a
new Transcribe job
Amazon Transcribe
outputs JSON file of
detected words and
timing
AWS Lambda function converts Amazon
Transcribe JSON into subtitle format
(such as WebVTT, SRT, or TTML) and
delivers to S3 bucket with content
AWS Elemental
MediaConvert
FILE-BASED
PROCESSING
Amazon S3
STORAGE
File
SOURCE
AWS Lambda
SERVERLESS
Amazon
Transcribe
ML / AI
Amazon S3
STORAGE
65. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS Elemental MediaConvert
Add audio-only WAV output to the job – start by adding an additional file
output group
66. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS Elemental MediaConvert
Configure audio-only Uncompressed WAV output
67. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon Transcribe
AWS Lambda function to create Amazon Transcribe job from WAV file created
by AWS Elemental MediaConvert
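import boto3

def lambda_handler(event, context):
    # Key of the WAV object whose S3 object-created event triggered this function
    s3_object_key = event['Records'][0]['s3']['object']['key']
    transcribe = boto3.client('transcribe')
    # Placeholder: replace S3_bucket_endpoint with your bucket's endpoint
    job_uri = "http://S3_bucket_endpoint/" + s3_object_key
    transcribe.start_transcription_job(
        TranscriptionJobName='Job123',  # job names must be unique; vary this in real use
        Media={'MediaFileUri': job_uri},
        MediaFormat='wav',
        LanguageCode='en-US')
    return "Done"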
68. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon Transcribe
Transcribe creates JSON file with
complete transcription, and word
by word timing
69. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon Transcribe
Must convert Amazon Transcribe JSON into
usable closed caption / subtitle format such
as SRT
Not a trivial problem, need to determine
sentence boundaries and which words to
combine into the same captions
Example:
70. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon Transcribe
Some ideas for ways to tackle this problem:
• Calculate the cadence of the wording and look for larger than average gaps between
words. Use these points as our breaks
• Use a fixed caption duration of 1-2 seconds and “aggregate” all words that fall within
that duration
Neither of these methods is perfect: analyzing audio alone won’t necessarily
account for scene changes, gaps in dialog, non-dialog sound elements, etc.
• But they can get us close…
Example implementation of Amazon Transcribe to SRT conversion:
• https://code.amazon.com/packages/ElementalTechMarketingTranscribeTools/trees/mainline
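The two ideas above can be combined in a short sketch: start a new caption when the gap between words exceeds a threshold, or when the caption would run past a maximum duration. This is an illustrative sketch, not the implementation linked above. It assumes the word list comes from the `results.items` array of the Transcribe JSON, and the 0.5 second gap and 2 second duration limits are arbitrary starting values.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def group_words(items, max_gap=0.5, max_duration=2.0):
    """Group Transcribe items into captions using gap and duration limits."""
    captions = []
    current = None
    for item in items:
        text = item["alternatives"][0]["content"]
        if item["type"] == "punctuation":
            # Punctuation items carry no timing; attach to the open caption
            if current:
                current["text"] += text
            continue
        start, end = float(item["start_time"]), float(item["end_time"])
        if current and (start - current["end"] > max_gap
                        or end - current["start"] > max_duration):
            captions.append(current)
            current = None
        if current is None:
            current = {"start": start, "end": end, "text": text}
        else:
            current["end"] = end
            current["text"] += " " + text
    if current:
        captions.append(current)
    return captions


def to_srt(captions):
    """Render captions as SRT text: index, time range, caption text."""
    blocks = []
    for index, caption in enumerate(captions, start=1):
        time_range = f"{srt_timestamp(caption['start'])} --> {srt_timestamp(caption['end'])}"
        blocks.append(f"{index}\n{time_range}\n{caption['text']}\n")
    return "\n".join(blocks)


# Hand-written sample in the shape of Transcribe's results.items
sample_items = [
    {"type": "pronunciation", "start_time": "0.0", "end_time": "0.4",
     "alternatives": [{"content": "Hello"}]},
    {"type": "pronunciation", "start_time": "0.5", "end_time": "0.9",
     "alternatives": [{"content": "world"}]},
    {"type": "punctuation", "alternatives": [{"content": "."}]},
    {"type": "pronunciation", "start_time": "3.0", "end_time": "3.5",
     "alternatives": [{"content": "Next"}]},
]
print(to_srt(group_words(sample_items)))
```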