The cloud empowers you to process media at scale in ways that were previously not possible, enabling you to make business decisions that are no longer constrained by infrastructure availability. Hear about best practices to architect scalable, highly available, high-performance workflows for digital media processing. In addition, this session covers AWS and partner solutions for transcoding, content encryption (watermarking and DRM), QC, and other processing topics.
4. Where AWS Fits Into Media Processing
[Diagram: a media pipeline of Ingest, Index, Process, QC, Package, Protect, Auth., Track, and Playback stages running on Amazon Web Services, alongside Media Asset Management and Analytics and Monetization]
6. Cloud Media Processing Approaches
Phase 1: Lift processing from the premises and shift to the cloud
7. Lift and Shift
[Diagram: on-premises media processing operations (each an OS plus storage) moved unchanged onto Amazon EC2 instances with attached storage]
8. The Problem with Lift and Shift
[Diagram: a monolithic media processing operation (OS, EC2, storage) with ingest, the processing operation itself, post-processing, export, workflow, and parameters all bundled inside one box]
9. Cloud Media Processing Approaches: Phase 2
Phase 1: Lift processing from the premises and shift to the cloud
Phase 2: Refactor and optimize to leverage cloud resources
10. Refactor and Optimization Opportunities
“Deconstruct monolithic media processing operations”
– Ingest
– Atomic media processing operation
– Post-processing
– Export
– Workflow
– Parameters
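The deconstruction above can be sketched in code. This is an illustrative sketch only (the names and interfaces are invented, not from the talk): each piece of the former monolith becomes an independent stage with a narrow contract, and a workflow simply chains them over a shared asset reference.

```java
import java.util.List;
import java.util.function.UnaryOperator;

public class DecomposedWorkflow {
    // A stage takes an asset reference (e.g., an S3 key) and returns
    // the reference to its output. In a real system each hop would be
    // a queue between independently deployed components.
    interface Stage extends UnaryOperator<String> {}

    static String run(String asset, List<Stage> stages) {
        for (Stage stage : stages) {
            asset = stage.apply(asset);
        }
        return asset;
    }

    public static void main(String[] args) {
        Stage ingest  = in -> in + ".mezzanine";
        Stage process = in -> in + ".transcoded";
        Stage export_ = in -> in + ".published";
        // Each stage can now be scaled, replaced, or re-ordered on its own.
        System.out.println(run("show-123", List.of(ingest, process, export_)));
        // → show-123.mezzanine.transcoded.published
    }
}
```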
12. Cloud Media Processing Approaches
Phase 1: Lift processing from the premises and shift to the cloud
Phase 2: Refactor and optimize to leverage cloud resources
Phase 3: Decomposed, modular cloud-native architecture
13. Decomposition and Modularization Ideas for Media Processing
• Decouple *everything* that is not part of the atomic media processing operation
• Use managed services where possible for workflow, queues, databases, etc.
• Manage
– Capacity
– Redundancy
– Latency
– Security
14. In the Cloud, AKA “Video Factory”
Phil Cluff
Principal Software Engineer & Team Lead, BBC Media Services
15. Sources:
BBC iPlayer Performance Pack August 2013
http://www.bbc.co.uk/blogs/internet/posts/Video-Factory
• The UK’s biggest video & audio on-demand service
– And it’s free!
• Over 7 million requests every day
– ~2% of overall consumption of BBC output
• Over 500 unique hours of content every week
– Available immediately after broadcast, for at least 7 days
• Available on over 1000 devices including
– PC, iOS, Android, Windows Phone, Smart TVs, Cable Boxes…
• Both streaming and download (iOS, Android, PC)
• 20 million app downloads to date
18. What Is Video Factory?
• Complete in-house rebuild of ingest, transcode,
and delivery workflows for BBC iPlayer
• Scalable, message-driven cloud-based
architecture
• The result of 1 year of development by ~18
engineers
20. Why Did We Build Video Factory?
• Old system
– Monolithic
– Slow
– Couldn’t cope with spikes
– Mixed ownership with third party
• Video Factory
– Highly scalable, reliable
– Completely elastic transcode resource
– Complete ownership
21. Why Use the Cloud?
• Background of 6 channels, spikes up to 24 channels, 6 days a week
• A perfect pattern for an elastic architecture
[Chart: off-air transcode requests for 1 week]
22. Video Factory – Architecture
• Entirely message driven
– Amazon Simple Queuing Service (SQS)
• Some Amazon Simple Notification Service (SNS)
– We use lots of classic message patterns
• ~20 small components
– Singular responsibility – “Do one thing, and do it well”
• Share libraries if components do things that are alike
• Control bloat
– Components have contracts of behavior
• Easy to test
23. Video Factory – Workflow
[Workflow diagram: SDI broadcast video feed (x24) with SMPTE timecode → broadcast encoder → RTP chunker → Amazon S3 mezzanine, the Time Addressable Media Store; live ingest logic, driven by a playout data feed, sends mezzanine video through a transcode abstraction layer (Elemental Cloud, Amazon Elastic Transcoder) and on to DRM, QC, editorial clipping, and MAM, with transcoded video and metadata stored as distribution renditions in Amazon S3]
26. Mezzanine Capture
[Diagram: SDI broadcast video feed (x24, 3 GB HD / 1 GB SD) with SMPTE timecode → broadcast-grade encoder → MPEG2 transport stream (H.264) on RTP multicast (30 MB HD / 10 MB SD) → RTP chunker → MPEG2 transport stream (H.264) chunks → chunk concatenator and chunk uploader, coordinated by control messages → Amazon S3 mezzanine chunks → Amazon S3 mezzanine]
27. Concatenating Chunks
• Build file using Amazon S3 multipart requests
– 10 GB Mezzanine file constructed in under 10 seconds
• Amazon S3 multipart APIs are very helpful
– Component only makes REST API calls
• Small instances still give very high performance
• Be careful – Amazon S3 isn’t immediately consistent when dealing with multipart-built files
– Mitigated with rollback logic in message-based applications
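A minimal sketch of the planning step behind this technique, assuming the standard S3 multipart-copy constraints: each uploaded chunk object becomes one copied part, part numbers are 1-based, and every part except the last must be at least 5 MiB. The class and method names are invented for illustration; a real component would follow this plan with CreateMultipartUpload, one UploadPartCopy per part, and CompleteMultipartUpload.

```java
import java.util.ArrayList;
import java.util.List;

public class MultipartConcatPlan {
    static final long MIN_PART = 5L * 1024 * 1024; // S3 minimum part size

    record Part(int partNumber, String sourceKey) {}

    // Given the chunk objects (in broadcast order) and their sizes,
    // produce the ordered copy plan, rejecting chunks that cannot
    // legally form a non-final part.
    static List<Part> plan(List<String> keys, List<Long> sizes) {
        List<Part> parts = new ArrayList<>();
        for (int i = 0; i < keys.size(); i++) {
            boolean last = i == keys.size() - 1;
            if (!last && sizes.get(i) < MIN_PART)
                throw new IllegalArgumentException("chunk too small for a part: " + keys.get(i));
            parts.add(new Part(i + 1, keys.get(i))); // part numbers are 1-based
        }
        return parts;
    }
}
```

Because the component only issues copy requests, the 10 GB mezzanine file is assembled server-side, which is why small instances suffice.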
28. By Numbers – Mezzanine Capture
• 24 channels
– 6 HD, 18 SD
– 16 TB of Mezzanine data every day per capture
• 200,000 chunks every day
– And Amazon S3 has never lost one
– That’s ~2 (UK) billion RTP packets every day… per capture
• Broadcast grade resiliency
– Several data centers / 2 copies each
30. Transcode Abstraction
• Abstract away from a single supplier
– Avoid vendor lock-in
– Choose suppliers based on performance, quality, and broadcaster-friendly feature sets
– BBC: Elemental Cloud (GPU), Amazon Elastic Transcoder, in-house for subtitles
• Smart routing & smart bundling
– Save money on non–time-critical transcode
– Save time & money by bundling together “like” outputs
• Hybrid cloud friendly
– Route a baseline of transcode to local encoders, and spike to cloud
• Who has the next game changer?
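The abstraction idea above can be sketched as follows. This is not the BBC's code: the interface, the two-backend setup, and the routing rule (time-critical jobs to the fast supplier, everything else to the cheap one) are invented to illustrate how supplier specifics hide behind one contract.

```java
public class TranscodeRouter {
    // One contract for every supplier; swapping suppliers means
    // swapping implementations, not callers.
    interface Backend { String submit(String assetKey); }

    private final Backend premium;  // e.g., a GPU cloud encoder
    private final Backend batch;    // e.g., a cheaper bulk encoder

    TranscodeRouter(Backend premium, Backend batch) {
        this.premium = premium;
        this.batch = batch;
    }

    String route(String assetKey, boolean timeCritical) {
        // Illustrative rule: pay for speed only when the deadline demands it.
        return (timeCritical ? premium : batch).submit(assetKey);
    }
}
```

A hybrid deployment fits the same shape: one backend wraps local encoders for the baseline load, another wraps a cloud supplier for spikes.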
33. Example – A Simple Elastic Transcoder Backend
[Diagram: within an SQS message transaction – get message from queue → unmarshal and validate message → initialize transcode by POSTing an XML transcode request to Amazon Elastic Transcoder → wait for the SNS callback over HTTP, which delivers the XML transcode status message (POSTed via SNS)]
34. Example – Add Error Handling
[Diagram: the same flow with error queues attached – a dead letter queue on “get message from queue”, a bad message queue on “unmarshal and validate message”, and a fail queue on “wait for SNS callback over HTTP”]
35. Example – Add Monitoring Eventing
[Diagram: the same flow again, now emitting monitoring events at each step – message consumption, unmarshal and validation, transcode initialization, and the SNS callback wait – alongside the dead letter, bad message, and fail queues]
36. BBC eventing framework
• Key-value pairs pushed into Splunk
– Business-level events, e.g.:
• Message consumed
• Transcode started
– System-level events, e.g.:
• HTTP call returned status 404
• Application’s heap size
• Unhandled exception
• Fixed model for “context” data
– Identifiable workflows, grouping of events; transactions
– Saves us a LOT of time diagnosing failures
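A sketch of the key-value event idea, assuming a fixed context block (the field names here are assumptions, not the BBC's schema): every event carries the same context keys so Splunk can group all events from one workflow or transaction.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MonitoringEvent {
    // Emit one Splunk-friendly "key=value" line per event, with the
    // fixed context fields always first.
    static String format(String workflowId, String transactionId,
                         String eventName, Map<String, String> fields) {
        Map<String, String> kv = new LinkedHashMap<>();
        kv.put("workflowId", workflowId);       // fixed context: which workflow
        kv.put("transactionId", transactionId); // fixed context: which message
        kv.put("event", eventName);
        kv.putAll(fields);                      // event-specific pairs
        StringBuilder sb = new StringBuilder();
        kv.forEach((k, v) -> sb.append(k).append('=').append(v).append(' '));
        return sb.toString().trim();
    }
}
```

Searching on `transactionId` then yields every business- and system-level event a single message produced, which is what makes failure diagnosis fast.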
37. Component Development – General Development & Architecture
• Java applications
– Run inside Apache Tomcat on m1.small EC2 instances
– Run at least 3 of everything
– Autoscale on queue depth
• Built on top of the Apache Camel framework
– A platform for building message-driven applications
– Reliable, well-tested SQS backend
– Camel route builders use a Java DSL, full of messaging patterns
• Developed with Behavior-Driven Development (BDD) & Test-Driven Development (TDD)
– Cucumber
• Deployed continuously
– Many times a day, 5 days a week
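The "autoscale on queue depth" policy can be sketched as a pure function. The thresholds and names here are invented; only the floor of three instances comes from the "run at least 3 of everything" rule above.

```java
public class QueueDepthScaler {
    static final int MIN_INSTANCES = 3; // "run at least 3 of everything"

    // Desired fleet size: enough instances to drain the backlog at an
    // assumed per-instance throughput, clamped to the floor and a cap.
    static int desiredInstances(int queueDepth, int messagesPerInstance, int maxInstances) {
        int wanted = (queueDepth + messagesPerInstance - 1) / messagesPerInstance; // ceiling division
        return Math.max(MIN_INSTANCES, Math.min(wanted, maxInstances));
    }
}
```

An idle queue keeps the baseline of three; an off-air spike scales the component out until the cap, then back down as the queue drains.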
38. Error Handling Messaging Patterns
• We use several message patterns
– Bad message queue
– Dead letter queue
– Fail queue
• Key concept
– Never lose a message
– Message is either in-flight, done, or in an error queue somewhere
• All require human intervention for the workflow to
continue
– Not necessarily a bad thing
39. Message Patterns – Bad Message Queue
The message doesn’t unmarshal to the object it should
OR
We could unmarshal the object, but it doesn’t meet our validation rules
• Wrapped in a message wrapper which contains context
• Never retried
• Very rare in production systems
• Implemented as an exception handler on the route builder
40. Message Patterns – Dead Letter Queue
We tried processing the message a number of times, and something we weren’t expecting went wrong each time
• Message is an exact copy of the input message
• Retried several times before being put on the DLQ
• Can be common, even in production systems
• Implemented as a bean in the route builder for SQS
41. Message Patterns – Fail Queue
Something we knew could go wrong went wrong
• Wrapped in a message wrapper that contains context
• Requires some level of knowledge of the system to be retried
• Often evolve from understanding the causes of DLQ’d messages
• Implemented as an exception handler on the route builder
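The three patterns can be summarized in one sketch. This is a simulation, not Camel code: queue names, the retry limit, and the exception-to-queue mapping are assumptions chosen to match the descriptions above (bad messages parked immediately and wrapped, expected failures to the fail queue, unexpected failures retried and then dead-lettered as an exact copy). Nothing is ever lost: every outcome is "done", "retry", or an error queue.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

public class ErrorRoutingConsumer {
    static final int MAX_DELIVERIES = 3;          // assumed redelivery limit
    final Map<String, Integer> deliveries = new HashMap<>();

    String consume(String message, Consumer<String> handler) {
        try {
            validate(message);
        } catch (IllegalArgumentException e) {
            // Bad message queue: wrapped with context, never retried.
            return "bad-message-queue: wrapped(" + message + ")";
        }
        try {
            handler.accept(message);
            return "done";
        } catch (IllegalStateException expected) {
            // Fail queue: a failure mode we knew about, wrapped with context.
            return "fail-queue: wrapped(" + message + ")";
        } catch (RuntimeException unexpected) {
            // Dead letter queue: retried first, then parked as an exact copy.
            int n = deliveries.merge(message, 1, Integer::sum);
            return n >= MAX_DELIVERIES
                ? "dead-letter-queue: " + message
                : "retry";
        }
    }

    // Stand-in for unmarshalling and validation.
    static void validate(String message) {
        if (!message.startsWith("<")) throw new IllegalArgumentException("not XML");
    }
}
```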