Just-In-Time Scalability: Agile Methods to Support Massive Growth
What is IMVU?

                 
Behind the scenes...

IMVU is LAMP, plus...
• Perlbal
• Memcached
• Solr
• MogileFS
• plus...
   • Audiere
   • Boost
   • Cal3D
   • CFL
   • NSIS
   • Pixomatic
   • Python
   • pywin32
   • SCons
   • wxPython
   • BuildBot
   • eAccelerator
   • Linux (Debian)
   • memcached
   • Nagios
   • Perl
   • Roundup
   • rrd
   • Subversion
   • ADODB
   • b2evolution
   • Coppermine
   • feed2js
   • FreeTag
   • Incutio XML-RPC
   • jrcache
   • JSON-PHP
   • Magpie
   • osCommerce
   • phpBB
   • Phorum
   • SimpleTest
   • Selenium
Before and After Architecture

Before: We started with a small site, a mess of open source, and a small team that didn't know much about scaling.

After: We ended with a large site, a medium sized team, and an architecture that has scaled.




We never stopped. We used a roadmap and a compass, made
weekly changes in direction, regularly shipped code on
Wednesday to handle the next weekend's capacity constraints,
and shipped new features the whole time.  
Before and After Architecture (1/4): November
Before and After Architecture (2/4): December
Before and After Architecture (3/4): February
Before and After Architecture (4/4): May
Advanced planning vs. fast response
“Rocket ship”
• Figure out in advance what is going to go wrong
• Build a plan that prevents those things from happening
• Execute your plan
• Get feedback when done

“Driving”
• Continuously figure out what is going to go wrong soon
• Quickly fix it, without breaking something else
• Get feedback along the way
Questions to ask
“Rocket ship”
• Are you sure you know what is going to happen?
• Are you sure you can execute?
• Can you afford it?
• Do you need feedback?

“Driving”
• How do you know you will be able to fix the problem in time?
• How can you be sure you won't cause collateral damage?
• How can you be sure you won't code yourself into a corner?
Continuous Ship
• Deploy new software quickly
   •   At IMVU, time from check-in to production = 20 minutes

• Tell a good change from a bad change (quickly)

• Revert a bad change quickly

• Work in small batches
   •   At IMVU, a large batch = 3 days' worth of work

• Break large projects down into small batches

• Don't have the same problem twice – fix the root cause of each
  class of problems

 IMVU pushes code to production 20-30 times every day
Cluster Immune System
What it looks like to ship one piece of code to production:
 • Run tests locally (SimpleTest, Selenium)
     o Everyone has a complete sandbox

 • Continuous Integration Server (BuildBot)
     o All tests must pass or “shut down the line”
     o Automatic feedback if the team is going too fast

 • Incremental deploy
     o Monitor cluster and business metrics in real-time
     o Reject changes that move metrics out-of-bounds

 • Alerting & Predictive monitoring (Nagios)
     o Monitor all metrics that stakeholders care about
     o If any metric goes out-of-bounds, wake somebody up
     o Use historical trends to predict acceptable bounds

 When customers see a failure:
     o Fix the problem for customers
     o Improve your defenses at each level
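
The incremental deploy and predictive bounds steps can be sketched roughly as follows. This is a minimal illustration in Python (not the PHP or tooling used at IMVU), and every name in it is hypothetical: `acceptable_bounds`, `out_of_bounds`, `incremental_deploy`, the batch size, and the choice of "mean plus or minus three standard deviations" as the way to turn history into bounds are all assumptions, since the slides do not say how bounds were computed.

```python
from statistics import mean, stdev


def acceptable_bounds(history, k=3.0):
    """Derive (low, high) bounds from past samples.

    Assumption: mean +/- k sample standard deviations stands in for
    whatever trend model a real monitoring setup would use.
    """
    m = mean(history)
    s = stdev(history)
    return (m - k * s, m + k * s)


def out_of_bounds(metrics, bounds):
    """Return the sorted names of metrics outside their bounds."""
    return sorted(
        name
        for name, value in metrics.items()
        if not (bounds[name][0] <= value <= bounds[name][1])
    )


def incremental_deploy(hosts, deploy, revert, read_metrics, bounds, batch_size=2):
    """Deploy in small batches and reject the change if metrics move out of bounds.

    deploy(host) and revert(host) are caller-supplied callables;
    read_metrics() returns a dict of current cluster and business metrics.
    """
    done = []
    for start in range(0, len(hosts), batch_size):
        for host in hosts[start:start + batch_size]:
            deploy(host)
            done.append(host)
        bad = out_of_bounds(read_metrics(), bounds)
        if bad:
            # Revert every host touched so far, newest first.
            for host in reversed(done):
                revert(host)
            return {"status": "rejected", "metrics": bad, "reverted": len(done)}
    return {"status": "deployed", "hosts": len(done)}
```

In this sketch a rejected change is rolled back on every host that received it, which mirrors "revert a bad change quickly"; a real system would also alert a person when a metric goes out of bounds.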
Case Study: Sharding

Problem: Spread write queries across multiple databases

Solution:
• Intercept and redirect queries based on SQL comments
• Move one table or sub-system at a time
   • In our experience, one engineer horizontally partitions one table or
     small sub-system in one week

• New engineers figure this out in about 5 minutes
db_query("INSERT INTO inventory (customers_id, products_id)
          VALUES ($customer_id, $product_id)");

db_query("/*shard customer://$customer_id */
          INSERT INTO inventory (customers_id, products_id)
          VALUES ($customer_id, $product_id)");

• Learning: cross-shard joins & transactions aren’t required
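
One way the interception step might work is sketched below in Python (the slides use PHP, and IMVU's actual `db_query` internals are not shown). The regular expression, the `shard_for` helper, the `shards_by_kind` mapping, and the modulo placement rule are all assumptions made for illustration; the slides only establish that a `/*shard customer://<id> */` comment selects the target database.

```python
import re

# Matches comments such as: /*shard customer://42 */
SHARD_COMMENT = re.compile(r"/\*\s*shard\s+(\w+)://(\d+)\s*\*/")


def shard_for(sql, shards_by_kind):
    """Pick a database name from the shard comment, or the default database.

    Assumption: records are placed by id modulo the number of shards.
    A real deployment might use a lookup table instead.
    """
    match = SHARD_COMMENT.search(sql)
    if match is None:
        return shards_by_kind["default"][0]
    kind, entity_id = match.group(1), int(match.group(2))
    shards = shards_by_kind[kind]
    return shards[entity_id % len(shards)]


def db_query(sql, shards_by_kind, execute):
    """Route the query; execute(database_name, sql) is caller-supplied."""
    return execute(shard_for(sql, shards_by_kind), sql)
```

Because the routing hint lives in the SQL text, a query without the comment keeps working against the default database, which is what allows one table or sub-system to be moved at a time.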
Case Study: Caching
Problem: Cache frequently read data to memcached

Solution:
• Intercept and cache queries based on SQL comments
db_query_cache(BUDDY_CACHE_TIME,
              "/*shard customer://$customer_id */
               /*cache-class customer://$customer_id/buddies */
               SELECT friend_id, buddy_order FROM customers_friends
               WHERE customers_id=$customer_id");

-----------------

db_query("/*shard customer://$customer_id */
          DELETE FROM customers_friends
          WHERE customers_id = $customer_id
          AND friend_id = $friend_id");
db_flush_cacheclass("customer://$customer_id/buddies");


• Learning: Flushing the cache is critical to users and performance
   – When a customer spends $24.95, they want the benefits immediately

• Learning: Test the cache behavior for critical systems
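
The cache-class idea can be illustrated with a small Python sketch (again, not the PHP from the slides). Here an in-memory dict stands in for memcached, and the `QueryCache` class, its method names, and the approach of tracking cached keys per class are assumptions; the slides show only the `db_query_cache` and `db_flush_cacheclass` calls, not how they are implemented.

```python
import re
import time

# Matches comments such as: /*cache-class customer://42/buddies */
CACHE_CLASS = re.compile(r"/\*\s*cache-class\s+(\S+)\s*\*/")


class QueryCache:
    """Caches query results and flushes them by cache class."""

    def __init__(self, run_query, clock=time.monotonic):
        self._run_query = run_query   # callable: sql -> rows
        self._clock = clock
        self._entries = {}            # sql -> (expires_at, rows)
        self._by_class = {}           # cache class -> set of sql keys

    def query_cache(self, ttl_seconds, sql):
        now = self._clock()
        hit = self._entries.get(sql)
        if hit is not None and hit[0] > now:
            return hit[1]
        rows = self._run_query(sql)
        self._entries[sql] = (now + ttl_seconds, rows)
        match = CACHE_CLASS.search(sql)
        if match:
            self._by_class.setdefault(match.group(1), set()).add(sql)
        return rows

    def flush_cacheclass(self, cache_class):
        """Drop every cached result tagged with this class."""
        for sql in self._by_class.pop(cache_class, set()):
            self._entries.pop(sql, None)
```

Injecting the clock makes expiry testable without sleeping, which is in the spirit of the last learning above: cache behavior for critical systems can be exercised in ordinary unit tests.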
Case Study: Steering Data Design

Problem: Improve database schemas and data design to meet
scalability requirements without downtime

Solution:
• Measure to find the real problems (harder than it sounds)
• Migrate to new design that takes advantage of sharding and/or caching
Case Study: Steering Data Design
Problem: You can’t bulk move large, frequently accessed data
Solution:
• Copy on read
   – Use when you are read bound
   – Reads check the cache, then the new location; if the data is missing there, they copy it from the old location to the new one
   – Writes go to the new location if the data has been migrated, otherwise to the old location

• Copy on write
   – Use when you are write bound
   – Reads check the cache, the new location, then the old location
   – Writes go to the new location, first copying the data there from the old location if it is missing

• Copy all
   – Use when the file system fills up
   – Reads & writes go to the new location, falling back to the old location if missing
   – Cron copies data a few records at a time
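
The three strategies above can be sketched in Python with plain dicts standing in for the cache, the old location, and the new location. All function names are hypothetical, records are modeled as dicts of fields so that a write is a partial update, and the cache flush on write is an assumption carried over from the caching case study rather than something this slide states.

```python
def copy_on_read_get(key, cache, new, old):
    """Copy on read: check cache, then new; migrate from old on a miss."""
    if key in cache:
        return cache[key]
    if key not in new and key in old:
        new[key] = dict(old[key])      # the read performs the migration
    record = new.get(key)
    if record is not None:
        cache[key] = dict(record)
    return record


def copy_on_read_put(key, fields, cache, new, old):
    """Write to new only if the record has already been migrated."""
    target = new if key in new else old
    target.setdefault(key, {}).update(fields)
    cache.pop(key, None)               # assumption: flush on write


def copy_on_write_get(key, cache, new, old):
    """Copy on write: reads check cache, new, then old, without copying."""
    if key in cache:
        return cache[key]
    record = new.get(key)
    if record is None:
        record = old.get(key)
    if record is not None:
        cache[key] = dict(record)
    return record


def copy_on_write_put(key, fields, cache, new, old):
    """The write performs the migration, then applies the update."""
    if key not in new:
        new[key] = dict(old.get(key, {}))
    new[key].update(fields)
    cache.pop(key, None)


def copy_all_batch(new, old, limit):
    """Background job step: copy up to `limit` unmigrated records."""
    copied = 0
    for key in list(old):
        if copied == limit:
            break
        if key not in new:
            new[key] = dict(old[key])
            copied += 1
    return copied
```

In each case the migration work is attached to the operation that is not the bottleneck, so no bulk move or downtime is needed; `copy_all_batch` would be called repeatedly from a scheduled job until it returns 0.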
“Thank You for Listening!”
