Scalable Media Processing
Phil Cluff, British Broadcasting Corporation
David Sayed, Amazon Web Services
November 13, 2013

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
Agenda
• Media workflows
• Where AWS fits
• Cloud media processing approaches
• BBC iPlayer in the cloud
Media Workflows
[Diagram: source materials – archive, featurettes, networks, interviews, 2D movie, 3D movie, archive materials, stills – feed into media workflows, which produce outputs for theatrical, DVD/BD, online, MSOs, and mobile apps.]
Where AWS Fits Into Media Processing
[Diagram: Amazon Web Services underpins the media processing chain – ingest, index, process, QC, package, protect, authenticate, track, playback – alongside media asset management, with analytics and monetization layered on top.]
Media Processing Approaches

3 Phases
Cloud Media Processing Approaches
Phase 1: Lift processing from the premises and shift to the cloud
Lift and Shift
[Diagram: an on-premises media processing operation (application + OS + storage) is moved unchanged onto EC2 with attached storage; additional EC2 instances can be spun up for scale or redundancy.]
The Problem with Lift and Shift
[Diagram: the monolithic media processing operation on EC2 bundles discrete steps – ingest, the media processing operation itself, post-processing, export, workflow, and parameters – into a single black box.]
Cloud Media Processing Approaches: Phase 2
Phase 1: Lift processing from the premises and shift to the cloud
Phase 2: Refactor and optimize to leverage cloud resources
Refactor and Optimization Opportunities
“Deconstruct monolithic media processing operations”
– Ingest
– Atomic media processing operation
– Post-processing
– Export
– Workflow
– Parameters
Refactoring and Optimization Example
[Diagram: content arrives in a source S3 bucket; Amazon SWF coordinates the workflow through API calls, dispatching jobs to EC2 instances backed by EBS volumes; results land in an output S3 bucket.]
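As a hedged illustration of this decoupled pattern, a worker might poll SWF for activity tasks, run only the atomic media operation, and report back, assuming the AWS SDK for Java v1; the domain, task list, and key names are illustrative, not from the talk.

```java
import com.amazonaws.services.simpleworkflow.AmazonSimpleWorkflow;
import com.amazonaws.services.simpleworkflow.model.*;

public class MediaActivityWorker {

    // Domain, task list, and output location are hypothetical examples.
    public void pollOnce(AmazonSimpleWorkflow swf) {
        // Long-poll SWF for the next media processing task.
        ActivityTask task = swf.pollForActivityTask(
                new PollForActivityTaskRequest()
                        .withDomain("media-processing")
                        .withTaskList(new TaskList().withName("transcode-tasks")));

        if (task.getTaskToken() == null) {
            return; // poll timed out with no work available
        }

        // The atomic media processing operation would run here, reading
        // its input location from task.getInput() (e.g., an S3 key).
        String resultLocation = "s3://output-bucket/" + task.getActivityId();

        // Report completion; SWF advances the workflow to the next step.
        swf.respondActivityTaskCompleted(
                new RespondActivityTaskCompletedRequest()
                        .withTaskToken(task.getTaskToken())
                        .withResult(resultLocation));
    }
}
```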
Cloud Media Processing Approaches
Phase 1: Lift processing from the premises and shift to the cloud
Phase 2: Refactor and optimize to leverage cloud resources
Phase 3: Decomposed, modular cloud-native architecture
Decomposition and Modularization Ideas for Media Processing
• Decouple *everything* that is not part of the atomic media processing operation
• Use managed services where possible for workflow, queues, databases, etc.
• Manage
  – Capacity
  – Redundancy
  – Latency
  – Security
BBC iPlayer in the Cloud
AKA “Video Factory”

Phil Cluff
Principal Software Engineer & Team Lead
BBC Media Services
Sources:
BBC iPlayer Performance Pack August 2013
http://www.bbc.co.uk/blogs/internet/posts/Video-Factory

• The UK’s biggest video & audio on-demand service
– And it’s free!

• Over 7 million requests every day
– ~2% of overall consumption of BBC output

• Over 500 unique hours of content every week
– Available immediately after broadcast, for at least 7 days

• Available on over 1000 devices including
– PC, iOS, Android, Windows Phone, Smart TVs, Cable Boxes…
• Both streaming and download (iOS, Android, PC)

• 20 million app downloads to date
[Video: “Where Next?”]
What Is Video Factory?
• Complete in-house rebuild of
ingest, transcode, and delivery workflows for
BBC iPlayer
• Scalable, message-driven cloud-based
architecture
• The result of 1 year of development by ~18
engineers
And here they are! [Photo: the Video Factory team]
Why Did We Build Video Factory?
• Old system
  – Monolithic
  – Slow
  – Couldn’t cope with spikes
  – Mixed ownership with third party
• Video Factory
  – Highly scalable, reliable
  – Completely elastic transcode resource
  – Complete ownership
Why Use the Cloud?
• Background of 6 channels, spikes up to 24 channels, 6 days a week
• A perfect pattern for an elastic architecture

[Chart: off-air transcode requests for one week]
Video Factory – Architecture
• Entirely message driven
  – Amazon Simple Queue Service (SQS)
    • Some Amazon Simple Notification Service (SNS)
  – We use lots of classic message patterns
• ~20 small components
  – Singular responsibility – “Do one thing, and do it well”
    • Share libraries if components do things that are alike
    • Control bloat
  – Components have contracts of behavior
    • Easy to test
Video Factory – Workflow
[Diagram: 24 SDI broadcast video feeds with SMPTE timecode pass through broadcast encoders and an RTP chunker (mezzanine video capture) into an Amazon S3 mezzanine bucket – a time-addressable media store. Live ingest logic, fed by a playout data feed, drives a transcode abstraction layer that routes mezzanine video to Amazon Elastic Transcoder and Elemental Cloud; transcoded video and metadata continue on to DRM, QC, editorial clipping, and MAM, with distribution renditions stored in Amazon S3.]
Detail
• Mezzanine video capture
• Transcode abstraction
• Eventing demonstration
Mezzanine Video Capture
Mezzanine Capture
[Diagram: 24 SDI broadcast video feeds (3 GB HD / 1 GB SD) with SMPTE timecode enter broadcast-grade encoders, which emit MPEG2 transport streams (H.264) on RTP multicast (30 MB HD / 10 MB SD). An RTP chunker splits each stream into transport stream chunks; a chunk uploader writes them to an S3 mezzanine chunk bucket, and a chunk concatenator, driven by control messages, assembles the chunks into the final mezzanine object in Amazon S3.]
Concatenating Chunks
• Build the file using Amazon S3 multipart requests
  – 10 GB mezzanine file constructed in under 10 seconds
• Amazon S3 multipart APIs are very helpful
  – Component only makes REST API calls
  – Small instances still give very high performance
• Be careful – Amazon S3 isn’t immediately consistent when dealing with multipart-built files
  – Mitigated with rollback logic in message-based applications
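A hedged sketch of this chunk concatenation, using the AWS SDK for Java v1 multipart copy APIs; bucket and key names are illustrative. Each chunk becomes one part via a server-side copy, so the component itself never moves video bytes.

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.*;

import java.util.ArrayList;
import java.util.List;

public class ChunkConcatenator {

    // Bucket and key names are illustrative, not the BBC's actual ones.
    public String concatenate(AmazonS3 s3, String bucket,
                              List<String> chunkKeys, String mezzanineKey) {
        // 1. Start a multipart upload for the final mezzanine file.
        InitiateMultipartUploadResult init = s3.initiateMultipartUpload(
                new InitiateMultipartUploadRequest(bucket, mezzanineKey));

        // 2. Register each chunk as a part via server-side copy -
        //    only REST calls, no video bytes pass through this component.
        List<PartETag> partETags = new ArrayList<PartETag>();
        int partNumber = 1;
        for (String chunkKey : chunkKeys) {
            CopyPartResult part = s3.copyPart(new CopyPartRequest()
                    .withSourceBucketName(bucket)
                    .withSourceKey(chunkKey)
                    .withDestinationBucketName(bucket)
                    .withDestinationKey(mezzanineKey)
                    .withUploadId(init.getUploadId())
                    .withPartNumber(partNumber++));
            partETags.add(part.getPartETag());
        }

        // 3. Complete the upload; S3 stitches the parts into one object.
        //    The assembled object may not be immediately consistent, so
        //    callers should be prepared to retry or roll back.
        s3.completeMultipartUpload(new CompleteMultipartUploadRequest(
                bucket, mezzanineKey, init.getUploadId(), partETags));
        return mezzanineKey;
    }
}
```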
By Numbers – Mezzanine Capture
• 24 channels
– 6 HD, 18 SD
– 16 TB of Mezzanine data every day per capture

• 200,000 chunks every day
– And Amazon S3 has never lost one
– That’s ~2 (UK) billion RTP packets every day… per capture

• Broadcast grade resiliency
– Several data centers / 2 copies each
Transcode Abstraction
Transcode Abstraction
• Abstract away from a single supplier
  – Avoid vendor lock-in
  – Choose suppliers based on performance, quality, and broadcaster-friendly feature sets
  – BBC: Elemental Cloud (GPU), Amazon Elastic Transcoder, in-house for subtitles
• Smart routing & smart bundling
  – Save money on non-time-critical transcode
  – Save time & money by bundling together “like” outputs
• Hybrid cloud friendly
  – Route a baseline of transcode to local encoders, and spike to cloud
• Who has the next game changer?
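A hedged sketch of the “smart routing” above, written as a content-based router in the Camel Java DSL the team describes later in the talk; the “backend” header and queue names are hypothetical, not Video Factory’s real contract.

```java
import org.apache.camel.builder.RouteBuilder;

// A minimal content-based router sketch: inspect each transcode request
// and forward it to a backend-specific SQS queue.
public class TranscodeRouterRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("aws-sqs://transcode-request-queue")
            .choice()
                .when(header("backend").isEqualTo("elemental"))
                    .to("aws-sqs://elemental-backend-queue")
                .when(header("backend").isEqualTo("elastic-transcoder"))
                    .to("aws-sqs://elastic-transcoder-backend-queue")
                .otherwise()
                    .to("aws-sqs://subtitle-extraction-backend-queue");
    }
}
```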
Transcode Abstraction
[Diagram: transcode request messages arrive over SQS at a transcode router, which dispatches over SQS to pluggable backends – a subtitle extraction backend, an Amazon Elastic Transcoder backend (REST calls to Amazon Elastic Transcoder), and an Elemental backend (Elemental Cloud). Backends read mezzanine video from Amazon S3 and write distribution renditions back to Amazon S3.]
Transcode Abstraction – Future
[Diagram: the same router and backends, plus a slot for an unknown future backend X – a new supplier can be added behind the abstraction without touching the rest of the workflow.]
Example – A Simple Elastic Transcoder Backend
[Diagram: inside a single SQS message transaction, the backend gets a message from the queue, unmarshals and validates the XML transcode request, initializes the transcode with a POST to Amazon Elastic Transcoder, and waits for the SNS callback over HTTP carrying the XML transcode status message.]
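A sketch of the “initialize transcode” step, assuming the AWS SDK for Java v1 Elastic Transcoder client; the pipeline and preset IDs are placeholders, and in the real system the keys would come from the validated request message.

```java
import com.amazonaws.services.elastictranscoder.AmazonElasticTranscoder;
import com.amazonaws.services.elastictranscoder.model.*;

public class ElasticTranscoderBackend {

    // Pipeline and preset IDs are placeholders, not the BBC's values.
    public String initializeTranscode(AmazonElasticTranscoder ets,
                                      String inputKey, String outputKey) {
        CreateJobRequest request = new CreateJobRequest()
                .withPipelineId("1111111111111-abcde1")        // hypothetical
                .withInput(new JobInput().withKey(inputKey))   // mezzanine in S3
                .withOutput(new CreateJobOutput()
                        .withKey(outputKey)                    // rendition key
                        .withPresetId("1351620000001-000010")); // example preset
        CreateJobResult result = ets.createJob(request);
        // The job id lets us correlate the eventual SNS status callback
        // with the in-flight SQS message transaction.
        return result.getJob().getId();
    }
}
```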
Example – Add Error Handling
[Diagram: the same flow with error queues attached – messages that repeatedly fail to be read go to a dead letter queue, messages that fail unmarshaling or validation go to a bad message queue, and anticipated failures around the transcode go to a fail queue.]
Example – Add Monitoring Eventing
[Diagram: the same flow again, now emitting monitoring events at every step – getting the message, unmarshal/validate, transcode initialization, and the SNS status callback – alongside the dead letter, bad message, and fail queues.]
BBC eventing framework
• Key-value pairs pushed into Splunk
  – Business-level events, e.g.:
    • Message consumed
    • Transcode started
  – System-level events, e.g.:
    • HTTP call returned status 404
    • Application’s heap size
    • Unhandled exception
• Fixed model for “context” data
  – Identifiable workflows, grouping of events; transactions
  – Saves us a LOT of time diagnosing failures
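A minimal sketch of pushing such key-value events, assuming SLF4J logging with output forwarded to Splunk; the field names (event, workflowId, transactionId) are illustrative, not the BBC’s actual event model.

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Emits business and system events as key=value lines that a Splunk
// forwarder can index and group by the fixed "context" fields.
public class EventLogger {
    private static final Logger LOG = LoggerFactory.getLogger(EventLogger.class);

    public void event(String name, String workflowId, String transactionId) {
        // Fixed context fields let Splunk group events into workflows
        // and transactions when diagnosing failures.
        LOG.info("event={} workflowId={} transactionId={}",
                 name, workflowId, transactionId);
    }
}

// Usage: new EventLogger().event("TranscodeStarted", "wf-123", "txn-456");
```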
Component Development – General Development & Architecture
• Java applications
  – Run inside Apache Tomcat on m1.small EC2 instances
  – Run at least 3 of everything
  – Autoscale on queue depth
• Built on top of the Apache Camel framework
  – A platform for building message-driven applications
  – Reliable, well-tested SQS backend
  – Camel route builders with the Java DSL
    • Full of messaging patterns
• Developed with Behavior-Driven Development (BDD) & Test-Driven Development (TDD)
  – Cucumber
• Deployed continuously
  – Many times a day, 5 days a week
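As a flavor of what such a component looks like, here is a minimal, hypothetical Camel route in the Java DSL: consume from an SQS queue, hand off to a bean, emit a status message. Queue names and the handler bean are illustrative.

```java
import org.apache.camel.builder.RouteBuilder;

// A minimal Video Factory style component sketch.
public class TranscodeRequestRoute extends RouteBuilder {

    // Hypothetical handler; the real component would unmarshal the XML
    // request here and call out to a transcode backend.
    public static class TranscodeRequestHandler {
        public String handle(String body) {
            return "status for: " + body;
        }
    }

    @Override
    public void configure() {
        // Consume transcode requests, process in a bean, publish status.
        from("aws-sqs://transcode-request-queue")
            .bean(TranscodeRequestHandler.class, "handle")
            .to("aws-sqs://transcode-status-queue");
    }
}
```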
Error Handling Messaging Patterns
• We use several message patterns
– Bad message queue
– Dead letter queue
– Fail queue

• Key concept
– Never lose a message
– Message is either in-flight, done, or in an error queue somewhere

• All require human intervention for the workflow to
continue
– Not necessarily a bad thing
Message Patterns – Bad Message Queue
The message doesn’t unmarshal to the object it should
OR
We could unmarshal the object, but it doesn’t meet our validation rules

• Wrapped in a message wrapper which contains context
• Never retried
• Very rare in production systems
• Implemented as an exception handler on the route builder
Message Patterns – Dead Letter Queue
We tried processing the message a number of times, and something we weren’t expecting went wrong each time

• Message is an exact copy of the input message
• Retried several times before being put on the DLQ
• Can be common, even in production systems
• Implemented as a bean in the route builder for SQS
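A hedged sketch of that bean: check the approximate delivery count before doing any processing, and flag the message for redirection to the DLQ past a threshold. The exact header key for SQS attributes varies by camel-aws version, so treat the name as an assumption.

```java
import org.apache.camel.Exchange;
import org.apache.camel.Processor;

// Sketch of the DLQ redirect: inspect how many times SQS has already
// delivered this message and park it on the DLQ past a threshold.
public class DeadLetterCheck implements Processor {
    private static final int MAX_DELIVERIES = 5; // 3-5 is common

    @Override
    public void process(Exchange exchange) {
        // Header name is an assumption about how camel-aws exposes the
        // SQS ApproximateReceiveCount attribute.
        String raw = exchange.getIn()
                .getHeader("ApproximateReceiveCount", String.class);
        int deliveries = raw == null ? 1 : Integer.parseInt(raw);
        if (deliveries > MAX_DELIVERIES) {
            // A route choice() on this property would then send the
            // exchange, untouched, to aws-sqs://dead-letter-queue.
            exchange.setProperty("deadLetter", Boolean.TRUE);
        }
    }
}
```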
Message Patterns – Fail Queue
Something I knew could go wrong went wrong

• Wrapped in a message wrapper that contains context
• Requires some level of knowledge of the system to be retried
• Often evolve from understanding the causes of DLQ’d messages
• Implemented as an exception handler on the route builder
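Since both the bad message and fail queues are described as exception handlers on the route builder, a combined sketch might look like this; the exception types and queue names are hypothetical.

```java
import org.apache.camel.builder.RouteBuilder;

// Sketch of error queues implemented as exception handlers.
public class ErrorHandlingRoute extends RouteBuilder {

    // Hypothetical application exceptions.
    public static class ValidationException extends RuntimeException {}
    public static class TranscodeFailedException extends RuntimeException {}

    @Override
    public void configure() {
        // Bad message queue: unmarshal/validation failures, never retried.
        onException(ValidationException.class)
                .handled(true)
                .to("aws-sqs://bad-message-queue");

        // Fail queue: anticipated failures, replayed by an operator who
        // understands the system.
        onException(TranscodeFailedException.class)
                .handled(true)
                .to("aws-sqs://fail-queue");

        from("aws-sqs://transcode-request-queue")
                .process(exchange -> {
                    // Unmarshal, validate, and run the transcode here;
                    // throw the exceptions above on failure.
                });
    }
}
```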
Demonstration – Eventing Framework
Questions?

philip.cluff@bbc.co.uk
dsayed@amazon.com

@GeneticGenesis
@dsayed
Please give us your feedback on this
presentation

MED302
As a thank you, we will select prize
winners daily for completed surveys!


Editor’s Notes

  1. Media here refers to video and audio content. Maybe you’re a media and entertainment company or build apps and websites that work with user generated content.
  2. Want to get a feel for the audience. Raise your hand if you do media processing in the cloud today. Raise your hand if you’re a developer. OK, for those of you who are developers, have a nap and Phil will wake you up with a video in a few minutes.
  3. Start by talking about media workflows. The main point is there are many workflows. Use media workflows to go from what’s on the left to what’s on the right. The steps themselves are generally pretty straightforward. Industry trends that are making workflows more complex:
  • More content: at the pro end, look at all the content on the left. On the consumer end, everyone is carrying around a 1080p camcorder. And the more content there is, the greater the opportunity to monetize it.
  • Bigger content: the industry is moving to some combination of more pixels, faster pixels, and better pixels. More pixels: 4K and beyond (4x the pixels compared to 1080p). Faster pixels: higher frame rates; 48 fps is 2x the current cinema frame rate. Better pixels: higher dynamic range and brighter pixels, increased bit depth.
  • More processing: the amount of processing is going up, not down. At the high end, whether it is a commercial, a TV show, or a movie, most shows contain visual effects. Even in corporate video, color correction is becoming a standard part of the workflow. And at the consumer level, all those Instagram-like filters require processing.
  • More output formats: not just renditions based on devices but also versions. One senior industry figure recently told me that a piece of finished content will have been converted 1000 times!
  So all of these trends have an impact on workflows, especially when you factor in constrained budgets and timeframes.
  4. To give you context for what follows in Phil’s session, I thought I’d cover where AWS fits and then some approaches we’ve seen for doing media processing at scale in the cloud. As you know, AWS provides infrastructure services: compute, networking, database, storage and delivery, and so on. We also provide application services and deployment and management services. Using these services as your “software-defined datacenter”, you can build media processing workflows. Typical operations in a media workflow would run on top of the AWS services. These operations could be provided by software that you’ve developed, or they might be from another vendor, like Aspera for ingest or Tektronix for video QC. On top of all that you’d have media applications – perhaps an online video platform, a production management application, a digital dailies system, or visual effects. So that’s where AWS fits. Now let’s look at some approaches for doing media processing on AWS.
  5. A useful way to think about any kind of processing in the cloud is that there are 3 phases or approaches.
  6. The first phase is simply taking what you do today and deploying it on AWS. This is the way a lot of people get started.
  7. You take your on-premises deployment on the left and run it on EC2. Your media processing operation runs on an operating system and storage, both of which are provided by EC2. You can spin up multiple instances of these, and that’s a way to give you scale and/or redundancy. But let’s look closer at this “lift and shift” approach.
  8. Let’s break open that media processing operation black box and see what’s inside. What we find are discrete operations, only one of which is the actual media processing operation – for example transcoding, scaling, or feature extraction. So is there perhaps an opportunity to break apart the black box and derive some benefit?
  9. That brings us to phase 2, which is about refactoring – or breaking things apart and putting them back together again in a different way – and optimizing your media processing operation. By doing this you might find ways to better use some of the features of AWS because we give you a lot of fantastic services for doing things like automatically scaling or distributing jobs or storing objects.
  10. The cornerstone of phase 2 is to break apart monolithic operations. In our black box, we had these operations. Do they all need to happen inside one logical unit? Probably not. Are there benefits to breaking them apart? Absolutely. Why have each EC2 instance do its own ingest? Why have workflow that is an island?
  11. So here’s a refactored example. What hasn’t changed is that we have our media processing operation – but only the operation itself – taking place on EC2 instances. But now we’re using S3 to store the input content and the output content. Maybe we’ve used Aspera or some other ingest technology to get the content there. Then we’re using Simple Workflow to manage the workflow operations across the various EC2 instances, and we’re using APIs to have each element talk to the other. This lets us use the scale of S3 and SWF so that you don’t need to worry about it. Also, instead of having a handful of EC2 instances running the monolithic application, we can have a fleet of instances running the essential media processing operation – decoupled from the rest of the workflow – and the external workflow engine will send the media processing job to the appropriate instance. So if an instance has a problem, the job won’t go there, giving you better resiliency. If an instance dies, another one can spin up automatically, giving you redundancy.
  12. The third phase builds on the second phase and decomposes your architecture still further. You’re now at the point where you are primarily writing or wrapping very atomic pieces of code that perform specific operations and leverage the AWS infrastructure for everything else.
  13. Some ways to do this are to decouple everything: you want to understand which parts of the architecture need to know about the implementation details of another part. Chances are that they do not. You also want to make sure that if an operation fails somewhere, the job itself does not get lost, and this is where workflow management and queues come in. You also want to design your components so that when you instantiate them, they figure out what they are supposed to do. For example, you might have a media processing worker that starts up and queries what kind of instance type it is running on, so that it knows how much work it can do or if there are additional capabilities that it can advertise to the rest of the system. This is a good time to think about how you are managing the attributes that you really care about in your system. For capacity, where are the bottlenecks, and what can you do when you need to overcome them? For redundancy, how do you make sure that each of your components is redundant? Is latency a concern? For many media processing operations, it probably is. So how can you manage that, reduce it, and make it predictable? Are you architecting security into every component and layer of your system? So that concludes my brief overview of approaches to running media processing workloads on AWS. Now I’d like to welcome Phil Cluff, the team lead for taking the BBC iPlayer video service into the cloud. He’s going to show you how they moved their broadcast playout-to-VOD system into AWS to give them scalability, reliability, and elasticity.
  14. Introduction: Phil Cluff, Principal Software Engineer & Team Lead @ BBC Media Services. Been with the BBC for 3½ years, focused on transcode architectures, message-orientated middleware, and reliable, distributed systems in the cloud! I’m going to talk to you about BBC iPlayer and our journey into the cloud.
  15. Hopefully you’ve all heard of the BBC, but you may not have all heard of iPlayer. So what is BBC iPlayer? The UK online population is about 40m, which is the size of the state of California.
  16. Now we’ll watch a short video produced by the BBC Director General, Tony Hall, which shows you where iPlayer has come from, and where we see it going in the future.
  17. As I said, I’m here to talk to you about Video Factory. So what is Video Factory? Read slide, plus: “We actually started building Video Factory 1 year ago this week – I was putting together the final designs for our transcode architecture before I flew out to re:Invent this time, last year.”
  19. Old: Designed with a very ambitious throughput in mind, 5 years ago, but the industry has moved on – new devices, delivery methods, throughput increases. New: Full control to deploy & manage our applications, and change quickly in a changing marketplace.
  20. Regional OPTs: 18 channels, all on at once, 6 days a week. Want to transcode them all at the same time, but not to have those encoders hanging around idle at other times. Previously it has taken 9-12 hours for the queue to move through our system. It’s news content – people want it while it’s still relevant. The new system is designed to cope with these (and greater) throughput spikes.
  21. Be really clear on the mezzanine definition, since the next 4 slides depend on it. Mention that mezzanine video capture is classic broadcast technology. Make note of the “time-addressable media store”.
  22. We’re going to look at two areas in detail – Mez capture & transcode abstraction.
  23. On-premises encoders produce MPEG2 transport streams from SDI onto RTP multicast. Capture the RTP and split it into chunks. Upload chunks to an S3 bucket. Reconstruct chunks only when required for transcode.
  24. Vendor lock-in is particularly important in SaaS models. I suggest you always have several options.
  25. So let’s take a look inside our transcode abstraction layer
  26. So let’s take a look inside an example transcode backend and think about how we might build one.
  27. Mention that the transaction runs as long as the transcode – Camel renews
  28. Give a one-sentence summary of Camel. Give an overview of BDD, TDD & Cucumber. Why is continuous deployment important? What happens if a deployment goes pear-shaped?
  29. We use several; the key concept in all of these is that you never lose a message.
  30. The message doesn’t unmarshal to the JAXB object it should. E.g.: not XML; a different type of message. Or we could unmarshal the object, but it doesn’t meet our validation rules. E.g.: source must not be null. Wrapped in a message wrapper which contains the original message (escaped) and the exception message. Never retried. Always requires developer-level intervention; suggests a component version mismatch. Very rare in production systems; sometimes caused by humans manually crafting messages. Implemented as an exception handler on the route builder.
  31. We tried processing the message a number of times, and something went wrong each time that we weren’t expecting. E.g.: a dependent system is down; network connectivity issues; (frequently) a “completely unexpected code path”. The message is an exact copy of the input message, so it can be replayed directly onto the input queue. More detail about what caused it can be found in the eventing framework (Splunk). Retried several times before being put on the DLQ; 3-5 is common. 24/7 operations-level intervention, usually to fix the dependent system and then replay messages. Can be common, even in production systems, but suggests you may need to improve dependent systems or increase your retry count. Implemented as a bean in the route builder for SQS: check the “approximate delivery count” before attempting to do any processing on a message, and redirect the message to the DLQ if necessary. Or broker-side (e.g., ActiveMQ).
  32. Something I was expecting to go wrong went wrong. E.g.: the state of a dependent system wasn’t what is required; a command-line tool I use returned non-zero, but I think the tool is likely dependable (i.e., a retry won’t help). Wrapped in a message wrapper which contains the original message (escaped) and the exception message. Requires some level of knowledge of the system to be retried: 24/7 operations-level intervention with a runbook, or second-line support. We have a console which unwraps the message and replays it. Often evolves from understanding the causes of DLQ’d messages. Implemented as an exception handler on the route builder.