Scalable Media Processing
Phil Cluff, British Broadcasting Corporation
David Sayed, Amazon Web Services
November 13, 2013

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
Agenda
• Media workflows
• Where AWS fits
• Cloud media processing approaches
• BBC iPlayer in the cloud
Media Workflows
[Diagram: source materials – archive, featurettes, networks, interviews, 2D movie, 3D movie, archive materials, stills – feed into media workflows, which produce outputs for theatrical, DVD/BD, online, MSOs, and mobile apps.]
Where AWS Fits Into Media Processing
[Diagram: Amazon Web Services underpins the media processing chain – ingest, index, process, QC, package, protect, authenticate, track, playback – alongside media asset management, with analytics and monetization layered on top.]
Media Processing Approaches

3 Phases
Cloud Media Processing Approaches
Phase 1: Lift processing from the premises and shift to the cloud
Lift and Shift
[Diagram: an on-premises media processing operation (application + OS + storage) is moved unchanged onto EC2 with attached storage; additional EC2 instances can be spun up for scale or redundancy.]
The Problem with Lift and Shift
[Diagram: the monolithic media processing operation on EC2 bundles discrete steps – ingest, the media processing operation itself, post-processing, export, workflow, and parameters – into a single black box.]
Cloud Media Processing Approaches: Phase 2
Phase 1: Lift processing from the premises and shift to the cloud
Phase 2: Refactor and optimize to leverage cloud resources
Refactor and Optimization Opportunities
“Deconstruct monolithic media processing operations”
– Ingest
– Atomic media processing operation
– Post-processing
– Export
– Workflow
– Parameters
Refactoring and Optimization Example
[Diagram: content arrives in a source S3 bucket; Amazon SWF coordinates the workflow through API calls, dispatching jobs to EC2 instances backed by EBS volumes; results land in an output S3 bucket.]
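As a hedged illustration of this decoupled pattern, a worker might poll SWF for activity tasks, run only the atomic media operation, and report back, assuming the AWS SDK for Java v1; the domain, task list, and key names are illustrative, not from the talk.

```java
import com.amazonaws.services.simpleworkflow.AmazonSimpleWorkflow;
import com.amazonaws.services.simpleworkflow.model.*;

public class MediaActivityWorker {

    // Domain, task list, and output location are hypothetical examples.
    public void pollOnce(AmazonSimpleWorkflow swf) {
        // Long-poll SWF for the next media processing task.
        ActivityTask task = swf.pollForActivityTask(
                new PollForActivityTaskRequest()
                        .withDomain("media-processing")
                        .withTaskList(new TaskList().withName("transcode-tasks")));

        if (task.getTaskToken() == null) {
            return; // poll timed out with no work available
        }

        // The atomic media processing operation would run here, reading
        // its input location from task.getInput() (e.g., an S3 key).
        String resultLocation = "s3://output-bucket/" + task.getActivityId();

        // Report completion; SWF advances the workflow to the next step.
        swf.respondActivityTaskCompleted(
                new RespondActivityTaskCompletedRequest()
                        .withTaskToken(task.getTaskToken())
                        .withResult(resultLocation));
    }
}
```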
Cloud Media Processing Approaches
Phase 1: Lift processing from the premises and shift to the cloud
Phase 2: Refactor and optimize to leverage cloud resources
Phase 3: Decomposed, modular cloud-native architecture
Decomposition and Modularization Ideas for Media Processing
• Decouple *everything* that is not part of the atomic media processing operation
• Use managed services where possible for workflow, queues, databases, etc.
• Manage
  – Capacity
  – Redundancy
  – Latency
  – Security
BBC iPlayer in the Cloud
AKA “Video Factory”

Phil Cluff
Principal Software Engineer & Team Lead
BBC Media Services
Sources:
BBC iPlayer Performance Pack August 2013
http://www.bbc.co.uk/blogs/internet/posts/Video-Factory

• The UK’s biggest video & audio on-demand service
– And it’s free!

• Over 7 million requests every day
– ~2% of overall consumption of BBC output

• Over 500 unique hours of content every week
– Available immediately after broadcast, for at least 7 days

• Available on over 1000 devices including
– PC, iOS, Android, Windows Phone, Smart TVs, Cable Boxes…
• Both streaming and download (iOS, Android, PC)

• 20 million app downloads to date
[Video: “Where Next?”]
What Is Video Factory?
• Complete in-house rebuild of
ingest, transcode, and delivery workflows for
BBC iPlayer
• Scalable, message-driven cloud-based
architecture
• The result of 1 year of development by ~18
engineers
And here they are! [Photo: the Video Factory team]
Why Did We Build Video Factory?
• Old system
  – Monolithic
  – Slow
  – Couldn’t cope with spikes
  – Mixed ownership with third party
• Video Factory
  – Highly scalable, reliable
  – Completely elastic transcode resource
  – Complete ownership
Why Use the Cloud?
• Background of 6 channels, spikes up to 24 channels, 6 days a week
• A perfect pattern for an elastic architecture

[Chart: off-air transcode requests for one week]
Video Factory – Architecture
• Entirely message driven
  – Amazon Simple Queue Service (SQS)
    • Some Amazon Simple Notification Service (SNS)
  – We use lots of classic message patterns
• ~20 small components
  – Singular responsibility – “Do one thing, and do it well”
    • Share libraries if components do things that are alike
    • Control bloat
  – Components have contracts of behavior
    • Easy to test
Video Factory – Workflow
[Diagram: 24 SDI broadcast video feeds with SMPTE timecode pass through broadcast encoders and an RTP chunker (mezzanine video capture) into an Amazon S3 mezzanine bucket – a time-addressable media store. Live ingest logic, fed by a playout data feed, drives a transcode abstraction layer that routes mezzanine video to Amazon Elastic Transcoder and Elemental Cloud; transcoded video and metadata continue on to DRM, QC, editorial clipping, and MAM, with distribution renditions stored in Amazon S3.]
Detail
• Mezzanine video capture
• Transcode abstraction
• Eventing demonstration
Mezzanine Video Capture
Mezzanine Capture
[Diagram: 24 SDI broadcast video feeds (3 GB HD / 1 GB SD) with SMPTE timecode enter broadcast-grade encoders, which emit MPEG2 transport streams (H.264) on RTP multicast (30 MB HD / 10 MB SD). An RTP chunker splits each stream into transport stream chunks; a chunk uploader writes them to an S3 mezzanine chunk bucket, and a chunk concatenator, driven by control messages, assembles the chunks into the final mezzanine object in Amazon S3.]
Concatenating Chunks
• Build the file using Amazon S3 multipart requests
  – 10 GB mezzanine file constructed in under 10 seconds
• Amazon S3 multipart APIs are very helpful
  – Component only makes REST API calls
  – Small instances still give very high performance
• Be careful – Amazon S3 isn’t immediately consistent when dealing with multipart-built files
  – Mitigated with rollback logic in message-based applications
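A hedged sketch of this chunk concatenation, using the AWS SDK for Java v1 multipart copy APIs; bucket and key names are illustrative. Each chunk becomes one part via a server-side copy, so the component itself never moves video bytes.

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.*;

import java.util.ArrayList;
import java.util.List;

public class ChunkConcatenator {

    // Bucket and key names are illustrative, not the BBC's actual ones.
    public String concatenate(AmazonS3 s3, String bucket,
                              List<String> chunkKeys, String mezzanineKey) {
        // 1. Start a multipart upload for the final mezzanine file.
        InitiateMultipartUploadResult init = s3.initiateMultipartUpload(
                new InitiateMultipartUploadRequest(bucket, mezzanineKey));

        // 2. Register each chunk as a part via server-side copy -
        //    only REST calls, no video bytes pass through this component.
        List<PartETag> partETags = new ArrayList<PartETag>();
        int partNumber = 1;
        for (String chunkKey : chunkKeys) {
            CopyPartResult part = s3.copyPart(new CopyPartRequest()
                    .withSourceBucketName(bucket)
                    .withSourceKey(chunkKey)
                    .withDestinationBucketName(bucket)
                    .withDestinationKey(mezzanineKey)
                    .withUploadId(init.getUploadId())
                    .withPartNumber(partNumber++));
            partETags.add(part.getPartETag());
        }

        // 3. Complete the upload; S3 stitches the parts into one object.
        //    The assembled object may not be immediately consistent, so
        //    callers should be prepared to retry or roll back.
        s3.completeMultipartUpload(new CompleteMultipartUploadRequest(
                bucket, mezzanineKey, init.getUploadId(), partETags));
        return mezzanineKey;
    }
}
```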
By Numbers – Mezzanine Capture
• 24 channels
– 6 HD, 18 SD
– 16 TB of Mezzanine data every day per capture

• 200,000 chunks every day
– And Amazon S3 has never lost one
– That’s ~2 (UK) billion RTP packets every day… per capture

• Broadcast grade resiliency
– Several data centers / 2 copies each
Transcode Abstraction
Transcode Abstraction
• Abstract away from a single supplier
  – Avoid vendor lock-in
  – Choose suppliers based on performance, quality, and broadcaster-friendly feature sets
  – BBC: Elemental Cloud (GPU), Amazon Elastic Transcoder, in-house for subtitles
• Smart routing & smart bundling
  – Save money on non-time-critical transcode
  – Save time & money by bundling together “like” outputs
• Hybrid cloud friendly
  – Route a baseline of transcode to local encoders, and spike to cloud
• Who has the next game changer?
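A hedged sketch of the “smart routing” above, written as a content-based router in the Camel Java DSL the team describes later in the talk; the “backend” header and queue names are hypothetical, not Video Factory’s real contract.

```java
import org.apache.camel.builder.RouteBuilder;

// A minimal content-based router sketch: inspect each transcode request
// and forward it to a backend-specific SQS queue.
public class TranscodeRouterRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("aws-sqs://transcode-request-queue")
            .choice()
                .when(header("backend").isEqualTo("elemental"))
                    .to("aws-sqs://elemental-backend-queue")
                .when(header("backend").isEqualTo("elastic-transcoder"))
                    .to("aws-sqs://elastic-transcoder-backend-queue")
                .otherwise()
                    .to("aws-sqs://subtitle-extraction-backend-queue");
    }
}
```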
Transcode Abstraction
[Diagram: transcode request messages arrive over SQS at a transcode router, which dispatches over SQS to pluggable backends – a subtitle extraction backend, an Amazon Elastic Transcoder backend (REST calls to Amazon Elastic Transcoder), and an Elemental backend (Elemental Cloud). Backends read mezzanine video from Amazon S3 and write distribution renditions back to Amazon S3.]
Transcode Abstraction – Future
[Diagram: the same router and backends, plus a slot for an unknown future backend X – a new supplier can be added behind the abstraction without touching the rest of the workflow.]
Example – A Simple Elastic Transcoder Backend
[Diagram: inside a single SQS message transaction, the backend gets a message from the queue, unmarshals and validates the XML transcode request, initializes the transcode with a POST to Amazon Elastic Transcoder, and waits for the SNS callback over HTTP carrying the XML transcode status message.]
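A sketch of the “initialize transcode” step, assuming the AWS SDK for Java v1 Elastic Transcoder client; the pipeline and preset IDs are placeholders, and in the real system the keys would come from the validated request message.

```java
import com.amazonaws.services.elastictranscoder.AmazonElasticTranscoder;
import com.amazonaws.services.elastictranscoder.model.*;

public class ElasticTranscoderBackend {

    // Pipeline and preset IDs are placeholders, not the BBC's values.
    public String initializeTranscode(AmazonElasticTranscoder ets,
                                      String inputKey, String outputKey) {
        CreateJobRequest request = new CreateJobRequest()
                .withPipelineId("1111111111111-abcde1")        // hypothetical
                .withInput(new JobInput().withKey(inputKey))   // mezzanine in S3
                .withOutput(new CreateJobOutput()
                        .withKey(outputKey)                    // rendition key
                        .withPresetId("1351620000001-000010")); // example preset
        CreateJobResult result = ets.createJob(request);
        // The job id lets us correlate the eventual SNS status callback
        // with the in-flight SQS message transaction.
        return result.getJob().getId();
    }
}
```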
Example – Add Error Handling
[Diagram: the same flow with error queues attached – messages that repeatedly fail to be read go to a dead letter queue, messages that fail unmarshaling or validation go to a bad message queue, and anticipated failures around the transcode go to a fail queue.]
Example – Add Monitoring Eventing
[Diagram: the same flow again, now emitting monitoring events at every step – getting the message, unmarshal/validate, transcode initialization, and the SNS status callback – alongside the dead letter, bad message, and fail queues.]
BBC eventing framework
• Key-value pairs pushed into Splunk
  – Business-level events, e.g.:
    • Message consumed
    • Transcode started
  – System-level events, e.g.:
    • HTTP call returned status 404
    • Application’s heap size
    • Unhandled exception
• Fixed model for “context” data
  – Identifiable workflows, grouping of events; transactions
  – Saves us a LOT of time diagnosing failures
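A minimal sketch of pushing such key-value events, assuming SLF4J logging with output forwarded to Splunk; the field names (event, workflowId, transactionId) are illustrative, not the BBC’s actual event model.

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Emits business and system events as key=value lines that a Splunk
// forwarder can index and group by the fixed "context" fields.
public class EventLogger {
    private static final Logger LOG = LoggerFactory.getLogger(EventLogger.class);

    public void event(String name, String workflowId, String transactionId) {
        // Fixed context fields let Splunk group events into workflows
        // and transactions when diagnosing failures.
        LOG.info("event={} workflowId={} transactionId={}",
                 name, workflowId, transactionId);
    }
}

// Usage: new EventLogger().event("TranscodeStarted", "wf-123", "txn-456");
```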
Component Development – General Development & Architecture
• Java applications
  – Run inside Apache Tomcat on m1.small EC2 instances
  – Run at least 3 of everything
  – Autoscale on queue depth
• Built on top of the Apache Camel framework
  – A platform for building message-driven applications
  – Reliable, well-tested SQS backend
  – Camel route builders with the Java DSL
    • Full of messaging patterns
• Developed with Behavior-Driven Development (BDD) & Test-Driven Development (TDD)
  – Cucumber
• Deployed continuously
  – Many times a day, 5 days a week
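As a flavor of what such a component looks like, here is a minimal, hypothetical Camel route in the Java DSL: consume from an SQS queue, hand off to a bean, emit a status message. Queue names and the handler bean are illustrative.

```java
import org.apache.camel.builder.RouteBuilder;

// A minimal Video Factory style component sketch.
public class TranscodeRequestRoute extends RouteBuilder {

    // Hypothetical handler; the real component would unmarshal the XML
    // request here and call out to a transcode backend.
    public static class TranscodeRequestHandler {
        public String handle(String body) {
            return "status for: " + body;
        }
    }

    @Override
    public void configure() {
        // Consume transcode requests, process in a bean, publish status.
        from("aws-sqs://transcode-request-queue")
            .bean(TranscodeRequestHandler.class, "handle")
            .to("aws-sqs://transcode-status-queue");
    }
}
```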
Error Handling Messaging Patterns
• We use several message patterns
– Bad message queue
– Dead letter queue
– Fail queue

• Key concept
– Never lose a message
– Message is either in-flight, done, or in an error queue somewhere

• All require human intervention for the workflow to
continue
– Not necessarily a bad thing
Message Patterns – Bad Message Queue
The message doesn’t unmarshal to the object it should
OR
We could unmarshal the object, but it doesn’t meet our validation rules

• Wrapped in a message wrapper which contains context
• Never retried
• Very rare in production systems
• Implemented as an exception handler on the route builder
Message Patterns – Dead Letter Queue
We tried processing the message a number of times, and something we weren’t expecting went wrong each time

• Message is an exact copy of the input message
• Retried several times before being put on the DLQ
• Can be common, even in production systems
• Implemented as a bean in the route builder for SQS
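A hedged sketch of that bean: check the approximate delivery count before doing any processing, and flag the message for redirection to the DLQ past a threshold. The exact header key for SQS attributes varies by camel-aws version, so treat the name as an assumption.

```java
import org.apache.camel.Exchange;
import org.apache.camel.Processor;

// Sketch of the DLQ redirect: inspect how many times SQS has already
// delivered this message and park it on the DLQ past a threshold.
public class DeadLetterCheck implements Processor {
    private static final int MAX_DELIVERIES = 5; // 3-5 is common

    @Override
    public void process(Exchange exchange) {
        // Header name is an assumption about how camel-aws exposes the
        // SQS ApproximateReceiveCount attribute.
        String raw = exchange.getIn()
                .getHeader("ApproximateReceiveCount", String.class);
        int deliveries = raw == null ? 1 : Integer.parseInt(raw);
        if (deliveries > MAX_DELIVERIES) {
            // A route choice() on this property would then send the
            // exchange, untouched, to aws-sqs://dead-letter-queue.
            exchange.setProperty("deadLetter", Boolean.TRUE);
        }
    }
}
```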
Message Patterns – Fail Queue
Something I knew could go wrong went wrong

• Wrapped in a message wrapper that contains context
• Requires some level of knowledge of the system to be retried
• Often evolve from understanding the causes of DLQ’d messages
• Implemented as an exception handler on the route builder
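Since both the bad message and fail queues are described as exception handlers on the route builder, a combined sketch might look like this; the exception types and queue names are hypothetical.

```java
import org.apache.camel.builder.RouteBuilder;

// Sketch of error queues implemented as exception handlers.
public class ErrorHandlingRoute extends RouteBuilder {

    // Hypothetical application exceptions.
    public static class ValidationException extends RuntimeException {}
    public static class TranscodeFailedException extends RuntimeException {}

    @Override
    public void configure() {
        // Bad message queue: unmarshal/validation failures, never retried.
        onException(ValidationException.class)
                .handled(true)
                .to("aws-sqs://bad-message-queue");

        // Fail queue: anticipated failures, replayed by an operator who
        // understands the system.
        onException(TranscodeFailedException.class)
                .handled(true)
                .to("aws-sqs://fail-queue");

        from("aws-sqs://transcode-request-queue")
                .process(exchange -> {
                    // Unmarshal, validate, and run the transcode here;
                    // throw the exceptions above on failure.
                });
    }
}
```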
Demonstration – Eventing Framework
Questions?

philip.cluff@bbc.co.uk
dsayed@amazon.com

@GeneticGenesis
@dsayed
Please give us your feedback on this
presentation

MED302
As a thank you, we will select prize
winners daily for completed surveys!


Editor’s Notes

  1. Media here refers to video and audio content. Maybe you’re a media and entertainment company or build apps and websites that work with user generated content.
  2. Want to get a feel for the audience. Raise your hand if you do media processing in the cloud today. Raise your hand if you’re a developer. OK, for those of you who are developers, have a nap and Phil will wake you up with a video in a few minutes.
  3. Start by talking about media workflows. The main point is there are many workflows. Use media workflows to go from what’s on the left to what’s on the right. The steps themselves are generally pretty straightforward. Industry trends that are making workflows more complex:
  • More content: at the pro end, look at all the content on the left. On the consumer end, everyone is carrying around a 1080p camcorder. And the more content there is, the greater the opportunity to monetize it.
  • Bigger content: the industry is moving to some combination of more pixels, faster pixels, and better pixels. More pixels: 4K and beyond (4x the pixels compared to 1080p). Faster pixels: higher frame rates; 48 fps is 2x the current cinema frame rate. Better pixels: higher dynamic range and brighter pixels, increased bit depth.
  • More processing: the amount of processing is going up, not down. At the high end, whether it is a commercial, a TV show, or a movie, most shows contain visual effects. Even in corporate video, color correction is becoming a standard part of the workflow. And at the consumer level, all those Instagram-like filters require processing.
  • More output formats: not just renditions based on devices but also versions. One senior industry figure recently told me that a piece of finished content will have been converted 1000 times!
  So all of these trends have an impact on workflows, especially when you factor in constrained budgets and timeframes.
  4. To give you context for what follows in Phil’s session, I thought I’d cover where AWS fits and then some approaches we’ve seen for doing media processing at scale in the cloud. As you know, AWS provides infrastructure services: compute, networking, database, storage and delivery, and so on. We also provide application services and deployment and management services. Using these services as your “software-defined datacenter”, you can build media processing workflows. Typical operations in a media workflow would run on top of the AWS services. These operations could be provided by software that you’ve developed, or they might be from another vendor, like Aspera for ingest or Tektronix for video QC. On top of all that you’d have media applications – perhaps an online video platform, a production management application, a digital dailies system, or visual effects. So that’s where AWS fits. Now let’s look at some approaches for doing media processing on AWS.
  5. A useful way to think about any kind of processing in the cloud is that there are 3 phases or approaches.
  6. The first phase is simply taking what you do today and deploying it on AWS. This is the way a lot of people get started.
  7. You take your on-premises deployment on the left and run it on EC2. Your media processing operation runs on an operating system and storage, both of which are provided by EC2. You can spin up multiple instances of these, and that’s a way to give you scale and/or redundancy. But let’s look closer at this “lift and shift” approach.
  8. Let’s break open that media processing operation black box and see what’s inside. What we find are discrete operations, only one of which is the actual media processing operation – for example transcoding, scaling, or feature extraction. So is there perhaps an opportunity to break apart the black box and derive some benefit?
  9. That brings us to phase 2, which is about refactoring – or breaking things apart and putting them back together again in a different way – and optimizing your media processing operation. By doing this you might find ways to better use some of the features of AWS because we give you a lot of fantastic services for doing things like automatically scaling or distributing jobs or storing objects.
  10. The cornerstone of phase 2 is to break apart monolithic operations. In our black box, we had these operations. Do they all need to happen inside one logical unit? Probably not. Are there benefits to breaking them apart? Absolutely. Why have each EC2 instance do its own ingest? Why have workflow that is an island?
  11. So here’s a refactored example. What hasn’t changed is that we have our media processing operation – but only the operation itself – taking place on EC2 instances. But now we’re using S3 to store the input content and the output content. Maybe we’ve used Aspera or some other ingest technology to get the content there. Then we’re using Simple Workflow to manage the workflow operations across the various EC2 instances, and we’re using APIs to have each element talk to the other. This lets us use the scale of S3 and SWF so that you don’t need to worry about it. Also, instead of having a handful of EC2 instances running the monolithic application, we can have a fleet of instances running the essential media processing operation – decoupled from the rest of the workflow – and the external workflow engine will send the media processing job to the appropriate instance. So if an instance has a problem, the job won’t go there, giving you better resiliency. If an instance dies, another one can spin up automatically, giving you redundancy.
  12. The third phase builds on the second phase and decomposes your architecture still further. You’re now at the point where you are primarily writing or wrapping very atomic pieces of code that perform specific operations and leverage the AWS infrastructure for everything else.
  13. Some ways to do this are to decouple everything: you want to understand which parts of the architecture need to know about the implementation details of another part. Chances are that they do not. You also want to make sure that if an operation fails somewhere, the job itself does not get lost, and this is where workflow management and queues come in. You also want to design your components so that when you instantiate them, they figure out what they are supposed to do. For example, you might have a media processing worker that starts up and queries what kind of instance type it is running on, so that it knows how much work it can do or if there are additional capabilities that it can advertise to the rest of the system. This is a good time to think about how you are managing the attributes that you really care about in your system. For capacity, where are the bottlenecks, and what can you do when you need to overcome them? For redundancy, how do you make sure that each of your components is redundant? Is latency a concern? For many media processing operations, it probably is. So how can you manage that, reduce it, and make it predictable? Are you architecting security into every component and layer of your system? So that concludes my brief overview of approaches to running media processing workloads on AWS. Now I’d like to welcome Phil Cluff, the team lead for taking the BBC iPlayer video service into the cloud. He’s going to show you how they moved their broadcast playout-to-VOD system into AWS to give them scalability, reliability, and elasticity.
  14. Introduction: Phil Cluff, Principal Software Engineer & Team Lead @ BBC Media Services. Been with the BBC for 3½ years, focused on transcode architectures, message-orientated middleware, and reliable, distributed systems in the cloud! I’m going to talk to you about BBC iPlayer and our journey into the cloud.
  15. Hopefully you’ve all heard of the BBC, but you may not have all heard of iPlayer. So what is BBC iPlayer? The UK online population is about 40m, which is the size of the state of California.
  16. Now we’ll watch a short video produced by the BBC Director General, Tony Hall, which shows you where iPlayer has come from, and where we see it going in the future.
  17. As I said, I’m here to talk to you about Video Factory. So what is Video Factory? Read slide, plus: “We actually started building Video Factory 1 year ago this week – I was putting together the final designs for our transcode architecture before I flew out to re:Invent this time, last year.”
  19. Old: Designed with a very ambitious throughput in mind, 5 years ago, but the industry has moved on – new devices, delivery methods, throughput increases. New: Full control to deploy & manage our applications, and change quickly in a changing marketplace.
  20. Regional OPTs: 18 channels, all on at once, 6 days a week. Want to transcode them all at the same time, but not to have those encoders hanging around idle at other times. Previously it has taken 9-12 hours for the queue to move through our system. It’s news content – people want it while it’s still relevant. The new system is designed to cope with these (and greater) throughput spikes.
  21. Be really clear on the mezzanine definition, since the next 4 slides depend on it. Mention that mezzanine video capture is classic broadcast technology. Make note of the “time-addressable media store”.
  22. We’re going to look at two areas in detail – Mez capture & transcode abstraction.
  23. On-premises encoders produce MPEG2 transport streams from SDI onto RTP multicast. Capture the RTP and split it into chunks. Upload chunks to an S3 bucket. Reconstruct chunks only when required for transcode.
  24. Vendor lock-in is particularly important in SaaS models. I suggest you always have several options.
  25. So let’s take a look inside our transcode abstraction layer
  26. So let’s take a look inside an example transcode backend and think about how we might build one.
  27. Mention that the transaction runs as long as the transcode – Camel renews
  28. Give a one-sentence summary of Camel. Give an overview of BDD, TDD & Cucumber. Why is continuous deployment important? What happens if a deployment goes pear-shaped?
  29. We use several; the key concept in all of these is that you never lose a message.
  30. The message doesn’t unmarshal to the JAXB object it should. E.g.: not XML; a different type of message. Or we could unmarshal the object, but it doesn’t meet our validation rules. E.g.: source must not be null. Wrapped in a message wrapper which contains the original message (escaped) and the exception message. Never retried. Always requires developer-level intervention; suggests a component version mismatch. Very rare in production systems; sometimes caused by humans manually crafting messages. Implemented as an exception handler on the route builder.
  31. We tried processing the message a number of times, and something went wrong each time that we weren’t expecting. E.g.: a dependent system is down; network connectivity issues; (frequently) a “completely unexpected code path”. The message is an exact copy of the input message, so it can be replayed directly onto the input queue. More detail about what caused it can be found in the eventing framework (Splunk). Retried several times before being put on the DLQ; 3-5 is common. 24/7 operations-level intervention, usually to fix the dependent system and then replay messages. Can be common, even in production systems, but suggests you may need to improve dependent systems or increase your retry count. Implemented as a bean in the route builder for SQS: check the “approximate delivery count” before attempting to do any processing on a message, and redirect the message to the DLQ if necessary. Or broker-side (e.g., ActiveMQ).
  32. Something I was expecting to go wrong went wrong. E.g.: the state of a dependent system wasn’t what is required; a command-line tool I use returned non-zero, but I think the tool is likely dependable (i.e., a retry won’t help). Wrapped in a message wrapper which contains the original message (escaped) and the exception message. Requires some level of knowledge of the system to be retried: 24/7 operations-level intervention with a runbook, or second-line support. We have a console which unwraps the message and replays it. Often evolves from understanding the causes of DLQ’d messages. Implemented as an exception handler on the route builder.