SlideShare a Scribd company logo
1 of 25
Download to read offline
Super-sizing YouTube
    with Python

       Mike Solomon
     mike@youtube.com
Welcome
this is about scaling a web application

there are a lot of things left out - mostly
mistakes and implementation details

this may generate more questions than it
answers

my goal is to give you ideas for solving your own
problems
Architecture
this is the core of scalability

systems change over time, so will your
architecture

impossible to predict the optimal approach

   start simple

   aim for local maxima

python enables flexibility
YouTube's Early Days
web boxes do everything

servlets, images, thumbnails, search

shoehorn everything into Apache, MySQL

very simple

  this survives longer than you'd think
hw load balancer




                                      httpd
                                 mod_python
                    db objects        search   thumbnails
                          biz logic
                          servlets
                         templates



                           Early Web Stack
db master



                           circa January ‘06

      db replicas
Early Key Factors in
     Engineering
really small team

  we     python

  logical separation in code

  discipline and honor - not linguistically
  enforced (don’t waste time writing code to
  restrict people)*

grown by systematically removing bottlenecks

  easy to know when something is a `win`
Running Without Tripping

user demand can grow 50% in a day

removing one bottleneck can immediately reveal
another (usually more heinous)

replace and migrate components as they become
problems

  good (python) components make this easy

  obviously, pick your battles
Good Components
     (Hypothetical)
minimize dependencies*

accept some latency

localize failures - don’t let them spread

  you are only down if it looks like you are

applies to both systems and software
Balance Machine
        Resources
more efficient resource utilization via specialized
deployment

balance based on CPU, RAM, network and disk
usage patterns

overlay orthogonal loads

  disjoint tasks running on the same physical
  hardware
Migratory Patterns of the
     Norwegian Blue
  move from mod_python to mod_fastcgi

  move thumbnails to their own machines

  make search to a remote service running on
  separate machines

  run transcoder processes on video servers

  do more with the same hardware
Serenity Now



Can you spot where we turned on
transcoding processes?
SQL Shenanigans
if you have a relational database, it will be
abused

   difficult to track the true source

series of object proxies for DB-API enable
logging

encode a portion of call stack as a query
comment* (more about this later)
Object Caching
take pressure off of relational db

can save additional resources if your objects
require significant computation to set up

memcached makes a good home for this

need good client to make this into a truly useful
service ‡

  pools and better failure handling
Software Optimization
fast vs fast enough

strive for machine efficiency - don't obsess

be scientific - collect data and understand it

can yield some surprising results

don't assume code optimization techniques from
another language are relevant

just like carpentry, measure twice cut once
Python Optimization
pure python HMAC was 40% of web cpu

  write a few lines of C

threaded comments fiasco

  overly complex algorithm to compute the
  display object tree

  simplify query, simplify algorithm
Python Optimization
psyco - specializing compiler for Python

  'hot' functions are psyco-ized

  there is a 'context switch' penalty so you
  need to experiment to see if it helps

previous threaded comments algorithm

  -closure +psyco = 400% boost
Reasonable Efficiency
pruned all the obvious leaf services

dynamic web requests are one `service`

web service is easy to scale, so it stresses out
other resources - probably a DB

DB’s are hard(er) to scale

  tricks of escalating cleverness‡

  eventually, no cards left to play
Scaling MySQL
pretty much have to go horizontal

choose your partition plan carefully

understand your data access patterns

  what queries do you run most often?

  do you have joins?

  do you need transactional consistency? why?

does an 'entity' emerge?
Partition By Entity
entities are 'transactional'

allow joins across properties of an entity

entities are migratory

cross entity is more complicated

   weaken guarantees to make it easier

   minimize activity by design
EMD, a TLA not an ORM!
 connection and transaction management

 lookup service

 query factory

 minimalist table abstraction

   ORM can be (is?) evil

 make common behaviors simple, while leaving
 some transparency to the actual database
Seismic Retrofit
apply this fundamental change to a large and
growing site

make it relatively painless with python

  multiple inheritance

  decorators

  AST plugins for validation and testing
Resulting API
all the scale-aware code nicely opaque to
application developers

base use cases are painless
User.select_by_username(db_context, username)

Video.select_by_id(db_context, video_id)

Video.select_by_user_id(db_context, user_id)
Bulk Entity Migration
hijack mysql replication to partition on the fly
while the live site is running

all DML gets tagged with an entity id

read master binlog and selectively replay it into
a set of new mini-masters

update lookup service to point to new resources
Recurring Themes
the elegance of simplicity

take reliable open software and customize it

`pythonic veneer`

DIY - filing a ticket for a bugfix doesn’t give me
a warm feeling - take matters into your own
hands*
Questions?

More Related Content

What's hot

Kafka Intro With Simple Java Producer Consumers
Kafka Intro With Simple Java Producer ConsumersKafka Intro With Simple Java Producer Consumers
Kafka Intro With Simple Java Producer ConsumersJean-Paul Azar
 
Deep Dive : Spark Data Frames, SQL and Catalyst Optimizer
Deep Dive : Spark Data Frames, SQL and Catalyst OptimizerDeep Dive : Spark Data Frames, SQL and Catalyst Optimizer
Deep Dive : Spark Data Frames, SQL and Catalyst OptimizerSachin Aggarwal
 
Prometheus Overview
Prometheus OverviewPrometheus Overview
Prometheus OverviewBrian Brazil
 
Spark with Delta Lake
Spark with Delta LakeSpark with Delta Lake
Spark with Delta LakeKnoldus Inc.
 
Grokking TechTalk #31: Asynchronous Communications
Grokking TechTalk #31: Asynchronous CommunicationsGrokking TechTalk #31: Asynchronous Communications
Grokking TechTalk #31: Asynchronous CommunicationsGrokking VN
 
Why you should care about data layout in the file system with Cheng Lian and ...
Why you should care about data layout in the file system with Cheng Lian and ...Why you should care about data layout in the file system with Cheng Lian and ...
Why you should care about data layout in the file system with Cheng Lian and ...Databricks
 
Distributed tracing 101
Distributed tracing 101Distributed tracing 101
Distributed tracing 101Itiel Shwartz
 
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry confluent
 
Stability Patterns for Microservices
Stability Patterns for MicroservicesStability Patterns for Microservices
Stability Patterns for Microservicespflueras
 
엘라스틱서치 클러스터로 수십억 건의 데이터 운영하기
엘라스틱서치 클러스터로 수십억 건의 데이터 운영하기엘라스틱서치 클러스터로 수십억 건의 데이터 운영하기
엘라스틱서치 클러스터로 수십억 건의 데이터 운영하기흥래 김
 
Master the RETE algorithm
Master the RETE algorithmMaster the RETE algorithm
Master the RETE algorithmMasahiko Umeno
 
Migration from Oracle to PostgreSQL: NEED vs REALITY
Migration from Oracle to PostgreSQL: NEED vs REALITYMigration from Oracle to PostgreSQL: NEED vs REALITY
Migration from Oracle to PostgreSQL: NEED vs REALITYAshnikbiz
 
Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...
Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...
Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...ForgeRock
 
Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the EnterpriseUsing Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the EnterpriseDataWorks Summit
 
Hive 3 - a new horizon
Hive 3 - a new horizonHive 3 - a new horizon
Hive 3 - a new horizonThejas Nair
 
Powering Interactive BI Analytics with Presto and Delta Lake
Powering Interactive BI Analytics with Presto and Delta LakePowering Interactive BI Analytics with Presto and Delta Lake
Powering Interactive BI Analytics with Presto and Delta LakeDatabricks
 
Fluentd loves MongoDB, at MongoDB SV User Group, July 17, 2012
Fluentd loves MongoDB, at MongoDB SV User Group, July 17, 2012Fluentd loves MongoDB, at MongoDB SV User Group, July 17, 2012
Fluentd loves MongoDB, at MongoDB SV User Group, July 17, 2012Treasure Data, Inc.
 
Singer, Pinterest's Logging Infrastructure
Singer, Pinterest's Logging InfrastructureSinger, Pinterest's Logging Infrastructure
Singer, Pinterest's Logging InfrastructureDiscover Pinterest
 

What's hot (20)

Kafka Intro With Simple Java Producer Consumers
Kafka Intro With Simple Java Producer ConsumersKafka Intro With Simple Java Producer Consumers
Kafka Intro With Simple Java Producer Consumers
 
Deep Dive : Spark Data Frames, SQL and Catalyst Optimizer
Deep Dive : Spark Data Frames, SQL and Catalyst OptimizerDeep Dive : Spark Data Frames, SQL and Catalyst Optimizer
Deep Dive : Spark Data Frames, SQL and Catalyst Optimizer
 
Prometheus Overview
Prometheus OverviewPrometheus Overview
Prometheus Overview
 
Spark with Delta Lake
Spark with Delta LakeSpark with Delta Lake
Spark with Delta Lake
 
Alfresco tuning part1
Alfresco tuning part1Alfresco tuning part1
Alfresco tuning part1
 
Grokking TechTalk #31: Asynchronous Communications
Grokking TechTalk #31: Asynchronous CommunicationsGrokking TechTalk #31: Asynchronous Communications
Grokking TechTalk #31: Asynchronous Communications
 
Why you should care about data layout in the file system with Cheng Lian and ...
Why you should care about data layout in the file system with Cheng Lian and ...Why you should care about data layout in the file system with Cheng Lian and ...
Why you should care about data layout in the file system with Cheng Lian and ...
 
Distributed tracing 101
Distributed tracing 101Distributed tracing 101
Distributed tracing 101
 
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
 
Stability Patterns for Microservices
Stability Patterns for MicroservicesStability Patterns for Microservices
Stability Patterns for Microservices
 
엘라스틱서치 클러스터로 수십억 건의 데이터 운영하기
엘라스틱서치 클러스터로 수십억 건의 데이터 운영하기엘라스틱서치 클러스터로 수십억 건의 데이터 운영하기
엘라스틱서치 클러스터로 수십억 건의 데이터 운영하기
 
Master the RETE algorithm
Master the RETE algorithmMaster the RETE algorithm
Master the RETE algorithm
 
Migration from Oracle to PostgreSQL: NEED vs REALITY
Migration from Oracle to PostgreSQL: NEED vs REALITYMigration from Oracle to PostgreSQL: NEED vs REALITY
Migration from Oracle to PostgreSQL: NEED vs REALITY
 
Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...
Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...
Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...
 
Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the EnterpriseUsing Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
 
dbt 101
dbt 101dbt 101
dbt 101
 
Hive 3 - a new horizon
Hive 3 - a new horizonHive 3 - a new horizon
Hive 3 - a new horizon
 
Powering Interactive BI Analytics with Presto and Delta Lake
Powering Interactive BI Analytics with Presto and Delta LakePowering Interactive BI Analytics with Presto and Delta Lake
Powering Interactive BI Analytics with Presto and Delta Lake
 
Fluentd loves MongoDB, at MongoDB SV User Group, July 17, 2012
Fluentd loves MongoDB, at MongoDB SV User Group, July 17, 2012Fluentd loves MongoDB, at MongoDB SV User Group, July 17, 2012
Fluentd loves MongoDB, at MongoDB SV User Group, July 17, 2012
 
Singer, Pinterest's Logging Infrastructure
Singer, Pinterest's Logging InfrastructureSinger, Pinterest's Logging Infrastructure
Singer, Pinterest's Logging Infrastructure
 

Similar to Scaling YouTube with Python

Data Applications and Infrastructure at LinkedIn__HadoopSummit2010
Data Applications and Infrastructure at LinkedIn__HadoopSummit2010Data Applications and Infrastructure at LinkedIn__HadoopSummit2010
Data Applications and Infrastructure at LinkedIn__HadoopSummit2010Yahoo Developer Network
 
The Evolution of a Scrappy Startup to a Successful Web Service
The Evolution of a Scrappy Startup to a Successful Web ServiceThe Evolution of a Scrappy Startup to a Successful Web Service
The Evolution of a Scrappy Startup to a Successful Web ServicePoornima Vijayashanker
 
PHP – Faster And Cheaper. Scale Vertically with IBM i
PHP – Faster And Cheaper. Scale Vertically with IBM iPHP – Faster And Cheaper. Scale Vertically with IBM i
PHP – Faster And Cheaper. Scale Vertically with IBM iSam Hennessy
 
UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015Christopher Curtin
 
Natural Laws of Software Performance
Natural Laws of Software PerformanceNatural Laws of Software Performance
Natural Laws of Software PerformanceGibraltar Software
 
Architecting a Large Software Project - Lessons Learned
Architecting a Large Software Project - Lessons LearnedArchitecting a Large Software Project - Lessons Learned
Architecting a Large Software Project - Lessons LearnedJoão Pedro Martins
 
Intro to Cloud Architecture
Intro to Cloud ArchitectureIntro to Cloud Architecture
Intro to Cloud Architecturewlscaudill
 
Handling Data in Mega Scale Systems
Handling Data in Mega Scale SystemsHandling Data in Mega Scale Systems
Handling Data in Mega Scale SystemsDirecti Group
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Precisely
 
Modern Application Stacks
Modern Application StacksModern Application Stacks
Modern Application Stackschartjes
 
AWS case study: real estate portal
AWS case study: real estate portalAWS case study: real estate portal
AWS case study: real estate portalAndreas Chatzakis
 
Class 7: Introduction to web technology entrepreneurship
Class 7: Introduction to web technology entrepreneurshipClass 7: Introduction to web technology entrepreneurship
Class 7: Introduction to web technology entrepreneurshipallanchao
 
Web Speed And Scalability
Web Speed And ScalabilityWeb Speed And Scalability
Web Speed And ScalabilityJason Ragsdale
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitterRoger Xia
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...smallerror
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...xlight
 
Gavin M
Gavin MGavin M
Gavin MOntico
 
Software Architecture and Architectors: useless VS valuable
Software Architecture and Architectors: useless VS valuableSoftware Architecture and Architectors: useless VS valuable
Software Architecture and Architectors: useless VS valuableComsysto Reply GmbH
 

Similar to Scaling YouTube with Python (20)

Data Applications and Infrastructure at LinkedIn__HadoopSummit2010
Data Applications and Infrastructure at LinkedIn__HadoopSummit2010Data Applications and Infrastructure at LinkedIn__HadoopSummit2010
Data Applications and Infrastructure at LinkedIn__HadoopSummit2010
 
The Evolution of a Scrappy Startup to a Successful Web Service
The Evolution of a Scrappy Startup to a Successful Web ServiceThe Evolution of a Scrappy Startup to a Successful Web Service
The Evolution of a Scrappy Startup to a Successful Web Service
 
PHP – Faster And Cheaper. Scale Vertically with IBM i
PHP – Faster And Cheaper. Scale Vertically with IBM iPHP – Faster And Cheaper. Scale Vertically with IBM i
PHP – Faster And Cheaper. Scale Vertically with IBM i
 
UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015
 
Natural Laws of Software Performance
Natural Laws of Software PerformanceNatural Laws of Software Performance
Natural Laws of Software Performance
 
Architecting a Large Software Project - Lessons Learned
Architecting a Large Software Project - Lessons LearnedArchitecting a Large Software Project - Lessons Learned
Architecting a Large Software Project - Lessons Learned
 
Intro to Cloud Architecture
Intro to Cloud ArchitectureIntro to Cloud Architecture
Intro to Cloud Architecture
 
Handling Data in Mega Scale Systems
Handling Data in Mega Scale SystemsHandling Data in Mega Scale Systems
Handling Data in Mega Scale Systems
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
 
Modern Application Stacks
Modern Application StacksModern Application Stacks
Modern Application Stacks
 
AWS case study: real estate portal
AWS case study: real estate portalAWS case study: real estate portal
AWS case study: real estate portal
 
Class 7: Introduction to web technology entrepreneurship
Class 7: Introduction to web technology entrepreneurshipClass 7: Introduction to web technology entrepreneurship
Class 7: Introduction to web technology entrepreneurship
 
Web Speed And Scalability
Web Speed And ScalabilityWeb Speed And Scalability
Web Speed And Scalability
 
Db trends final
Db trends   finalDb trends   final
Db trends final
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitter
 
Fixing_Twitter
Fixing_TwitterFixing_Twitter
Fixing_Twitter
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
 
Gavin M
Gavin MGavin M
Gavin M
 
Software Architecture and Architectors: useless VS valuable
Software Architecture and Architectors: useless VS valuableSoftware Architecture and Architectors: useless VS valuable
Software Architecture and Architectors: useless VS valuable
 

More from didip

Oregon Measure 66
Oregon Measure 66Oregon Measure 66
Oregon Measure 66didip
 
High Performance Erlang
High Performance ErlangHigh Performance Erlang
High Performance Erlangdidip
 
Why I Love Python
Why I Love PythonWhy I Love Python
Why I Love Pythondidip
 
Turbogears Presentation
Turbogears PresentationTurbogears Presentation
Turbogears Presentationdidip
 
Ecma 262
Ecma 262Ecma 262
Ecma 262didip
 
Socket Programming In Python
Socket Programming In PythonSocket Programming In Python
Socket Programming In Pythondidip
 
Google Research Paper
Google Research PaperGoogle Research Paper
Google Research Paperdidip
 
Feed Burner Scalability
Feed Burner ScalabilityFeed Burner Scalability
Feed Burner Scalabilitydidip
 
Theory Psyco
Theory PsycoTheory Psyco
Theory Psycodidip
 

More from didip (9)

Oregon Measure 66
Oregon Measure 66Oregon Measure 66
Oregon Measure 66
 
High Performance Erlang
High Performance ErlangHigh Performance Erlang
High Performance Erlang
 
Why I Love Python
Why I Love PythonWhy I Love Python
Why I Love Python
 
Turbogears Presentation
Turbogears PresentationTurbogears Presentation
Turbogears Presentation
 
Ecma 262
Ecma 262Ecma 262
Ecma 262
 
Socket Programming In Python
Socket Programming In PythonSocket Programming In Python
Socket Programming In Python
 
Google Research Paper
Google Research PaperGoogle Research Paper
Google Research Paper
 
Feed Burner Scalability
Feed Burner ScalabilityFeed Burner Scalability
Feed Burner Scalability
 
Theory Psyco
Theory PsycoTheory Psyco
Theory Psyco
 

Recently uploaded

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 

Recently uploaded (20)

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 

Scaling YouTube with Python

  • 1. Super-sizing YouTube with Python Mike Solomon mike@youtube.com
  • 2. Welcome this is about scaling a web application there are a lot of things left out - mostly mistakes and implementation details this may generate more questions than it answers my goal is to give you ideas for solving your own problems
  • 3. Architecture this is the core of scalability systems change over time, so will your architecture impossible to predict the optimal approach start simple aim for local maxima python enables flexibility
  • 4. YouTube's Early Days web boxes do everything servlets, images, thumbnails, search shoehorn everything into Apache, MySQL very simple this survives longer than you'd think
  • 5. hw load balancer httpd mod_python db objects search thumbnails biz logic servlets templates Early Web Stack db master circa January ‘06 db replicas
  • 6. Early Key Factors in Engineering really small team we python logical separation in code discipline and honor - not linguistically enforced (don’t waste time writing code to restrict people)* grown by systematically removing bottlenecks easy to know when something is a `win`
  • 7. Running Without Tripping user demand can grow 50% in a day removing one bottleneck can immediately reveal another (usually more heinous) replace and migrate components as they become problems good (python) components make this easy obviously, pick your battles
  • 8. Good Components (Hypothetical) minimize dependencies* accept some latency localize failures - don’t let them spread you are only down if it looks like you are applies to both systems and software
  • 9. Balance Machine Resources more efficient resource utilization via specialized deployment balance based on CPU, RAM, network and disk usage patterns overlay orthogonal loads disjoint tasks running on the same physical hardware
  • 10. Migratory Patterns of the Norwegian Blue move from mod_python to mod_fastcgi move thumbnails to their own machines make search to a remote service running on separate machines run transcoder processes on video servers do more with the same hardware
  • 11. Serenity Now Can you spot where we turned on transcoding processes?
  • 12. SQL Shenanigans if you have a relational database, it will be abused difficult to track the true source series of object proxies for DB-API enable logging encode a portion of call stack as a query comment* (more about this later)
  • 13. Object Caching take pressure off of relational db can save additional resources if your objects require significant computation to set up memcached makes a good home for this need good client to make this into a truly useful service ‡ pools and better failure handling
  • 14. Software Optimization fast vs fast enough strive for machine efficiency - don't obsess be scientific - collect data and understand it can yield some surprising results don't assume code optimization techniques from another language are relevant just like carpentry, measure twice cut once
  • 15. Python Optimization pure python HMAC was 40% of web cpu write a few lines of C threaded comments fiasco overly complex algorithm to compute the display object tree simplify query, simplify algorithm
  • 16. Python Optimization psyco - specializing compiler for Python 'hot' functions are psyco-ized there is a 'context switch' penalty so you need to experiment to see if it helps previous threaded comments algorithm -closure +psyco = 400% boost
  • 17. Reasonable Efficiency pruned all the obvious leaf services dynamic web requests are one `service` web service is easy to scale, so it stresses out other resources - probably a DB DB’s are hard(er) to scale tricks of escalating cleverness‡ eventually, no cards left to play
  • 18. Scaling MySQL pretty much have to go horizontal choose your partition plan carefully understand your data access patterns what queries do you run most often? do you have joins? do you need transactional consistency? why? does an 'entity' emerge?
  • 19. Partition By Entity entities are 'transactional' allow joins across properties of an entity entities are migratory cross entity is more complicated weaken guarantees to make it easier minimize activity by design
  • 20. EMD, a TLA not an ORM! connection and transaction management lookup service query factory minimalist table abstraction ORM can be (is?) evil make common behaviors simple, while leaving some transparency to the actual database
  • 21. Seismic Retrofit apply this fundamental change to a large and growing site make it relatively painless with python multiple inheritance decorators AST plugins for validation and testing
  • 22. Resulting API all the scale-aware code nicely opaque to application developers base use cases are painless User.select_by_username(db_context, username) Video.select_by_id(db_context, video_id) Video.select_by_user_id(db_context, user_id)
  • 23. Bulk Entity Migration hijack mysql replication to partition on the fly while the live site is running all DML gets tagged with an entity id read master binlog and selectively replay it into a set of new mini-masters update lookup service to point to new resources
  • 24. Recurring Themes the elegance of simplicity take reliable open software and customize it `pythonic veneer` DIY - filing a ticket for a bugfix doesn’t give me a warm feeling - take matters into your own hands*