SlideShare uma empresa Scribd logo
1 de 131
Operational
Efficiency
Hacks
                          John Allspaw
          Operations Engineering, Flickr
who am I?
Manage the Flickr Operations group
Wrote a geeky book:
“Efficiencies”
“Efficiencies”
   Doing more with the robots you’ve got
“Efficiencies”
   Doing more with the robots you’ve got
   Doing more with the humans you’ve got
Some optimization
    “rules”
Some optimization
         “rules”

-   Don’t rely on being able to tweak anything.
Some optimization
         “rules”

-   Don’t rely on being able to tweak anything.
-   Don’t waste too much time tuning when
    you have no evidence it’ll matter.
Optimization “rules”
performance tuning gains




                                 time spent tuning
Optimization “rules”




“We should forget about small efficiencies,
say about 97% of the time: premature
optimization is the root of all evil.”
                         Knuth, (or Hoare)
however...
Optimization “rules”
Optimization “rules”

That doesn’t give us an excuse to be lazy and
inefficient.
Optimization “rules”

That doesn’t give us an excuse to be lazy and
inefficient.
Optimization “rules”

That doesn’t give us an excuse to be lazy and
inefficient.


We lean on the experience of people in the
community for evidence that tuning(s) might
be a worthwhile thing to do.
Optimization “rules”

“Yet we should not pass up our
opportunities in that critical 3 percent.”
                       Knuth, (or Hoare)
So...
          stop somewhere
               in here




                           OMG
obvious
                           I'm wasting
tuning
                           !@#$ time
wins
                           for no reason
Our Context
Our Context

-   24 TB of MySQL data
Our Context

-   24 TB of MySQL data
-   32k/sec of MySQL writes
Our Context

-   24 TB of MySQL data
-   32k/sec of MySQL writes
-   120k/sec of MySQL reads
Our Context

-   24 TB of MySQL data
-   32k/sec of MySQL writes
-   120k/sec of MySQL reads
-   6 PB of photos
Our Context

-   24 TB of MySQL data
-   32k/sec of MySQL writes
-   120k/sec of MySQL reads
-   6 PB of photos
-   10TB storage eaten per day
Our Context

-   24 TB of MySQL data
-   32k/sec of MySQL writes
-   120k/sec of MySQL reads
-   6 PB of photos
-   10TB storage eaten per day
-   15,362 service monitors (alerts)
Infrastructure Hacks
Infrastructure Hacks

-   Examples of what changing software can do
          (plain old-fashioned performance tuning)
Infrastructure Hacks

-   Examples of what changing software can do
          (plain old-fashioned performance tuning)
-   Examples of what changing hardware can do
                              (yay for Mr. Moore!)
Leaning on compilers
(synthetic PHP benchmarks, not real-world)




(http://sebastian-bergmann.de/archives/634-PHP-GCC-
                  ICC-Benchmark.html)
PHP (real-world)




php 4.4.8 to php 5.2.8 migration
Can now handle more with less




same
taste,
 less
filling
Image Processing
Image Processing


-   2004, Flickr was using ImageMagick for
    image processing (version 6.1.9)
Image Processing


-   2004, Flickr was using ImageMagick for
    image processing (version 6.1.9)
-   Changed to GraphicsMagick, about 15%
    faster at the time (version 1.1.5)
Image Processing


-   2004, Flickr was using ImageMagick for
    image processing (version 6.1.9)
-   Changed to GraphicsMagick, about 15%
    faster at the time (version 1.1.5)
-   Only need a subset of ImageMagick features
    anyway for our purposes
Image Processing

-   OpenMP support
    (http://en.wikipedia.org/wiki/Openmp)
    - Allows parallelization of processing jobs,
       using multiple cores working on the same
       image
    - Some algorithms have more parallelization
       than others
Image Processing
-   Test script
     -   7 large-ish DSLR photos
     -   Cascade resizing each to 6 smaller sizes,
         semi-typical for Flickr’s workload
     -   Each resize processed serially
Image Processing
   compiler differences




(GM version 1.1.14, non-OpenMP)
Image Processing
          OpenMP differences


                    OpenMP
                    advantage




(gcc 4.1.2, on quad core Xeon L5335 @ 2.00GHz)
Image Processing
   CPU differences
Diagonal Scaling
Diagonal Scaling
-   Vertically scaling your already horizontally-
    scaled nodes
Diagonal Scaling
-   Vertically scaling your already horizontally-
    scaled nodes
-   a.k.a. “tech refresh”
Diagonal Scaling
-   Vertically scaling your already horizontally-
    scaled nodes
-   a.k.a. “tech refresh”
-   a.k.a. “Moore’s Law Surfing”
Diagonal Scaling
Diagonal Scaling
              67 “old” webservers with 18 “new” :
We replaced
Diagonal Scaling
              67 “old” webservers with 18 “new” :
We replaced


                CPUs         RAM         drives       total power (W)
servers                                                 @60% peak
               per server   per server   per server

   67              2                                    8763.6
                              4GB        1x80GB

   18              8                                    2332.8
                             4GB         1x146GB
Diagonal Scaling
              67 “old” webservers with 18 “new” :
We replaced


                CPUs         RAM         drives       total power (W)
servers                                                 @60% peak

              ~70% LESS power
               per server   per server   per server

   67           2                                       8763.6
                     4GB    1x80GB
              49U LESS rack space
   18              8                                    2332.8
                             4GB         1x146GB
Diagonal Scaling
Diagonal Scaling
              23 “old” image processing boxes with 8 “new”
We replaced
Diagonal Scaling
              23 “old” image processing boxes with 8 “new”
We replaced


   server        photos/min        rack        total power (W)
                                                 @60% peak



      23             1035           23          3008.4
      8              1120            8          1036.8
Diagonal Scaling
              23 “old” image processing boxes with 8 “new”
We replaced


   server        photos/min        rack        total power (W)
                                                 @60% peak

                  ~75% FASTER
      23            1035         23             3008.4
                  15U LESS rack space
                  65% LESS power 8
       8            1120                        1036.8
Diagonal Scaling
              23 “old” image processing boxes with 8 “new”
We replaced


   server        photos/min        rack        total power (W)

                  ~75% FASTER                    @60% peak



                  15U LESS rack space
      23            1035         23             3008.4
                  65% LESS power 8
       8            1120                        1036.8
Diagonal Scaling
              23 “old” image processing boxes with 8 “new”
We replaced


   server        photos/min        rack        total power (W)

                  ~75% FASTER                    @60% peak



                  15U LESS rack space
      23            1035         23             3008.4
                  65% LESS power 8
       8            1120                        1036.8
Diagonal Scaling
              23 “old” image processing boxes with 8 “new”
We replaced


   server        photos/min        rack        total power (W)

                  ~75% FASTER                    @60% peak



                  15U LESS rack space
      23            1035         23             3008.4
                  65% LESS power 8
       8            1120                        1036.8

        from this
                           to this
What do you do with
old/slow machines?
What do you do with
    old/slow machines?

-   Liquidate
What do you do with
    old/slow machines?

-   Liquidate
-   Re-purpose as dev/staging/etc
What do you do with
    old/slow machines?

-   Liquidate
-   Re-purpose as dev/staging/etc
-   “offline” tasks
Offline Tasks
Offline Tasks
-   Out-of-band/asynchronous queuing and execution
    system, for non-realtime tasks
Offline Tasks
-   Out-of-band/asynchronous queuing and execution
    system, for non-realtime tasks
-   See here:
Offline Tasks
-   Out-of-band/asynchronous queuing and execution
    system, for non-realtime tasks
-   See here:
    http://code.flickr.com/blog/2008/09/26/flickr-engineers-do-it-offline/
Offline Tasks
-   Out-of-band/asynchronous queuing and execution
    system, for non-realtime tasks
-   See here:
    http://code.flickr.com/blog/2008/09/26/flickr-engineers-do-it-offline/

-   See Myles Grant talk about it more here:
Offline Tasks
-   Out-of-band/asynchronous queuing and execution
    system, for non-realtime tasks
-   See here:
    http://code.flickr.com/blog/2008/09/26/flickr-engineers-do-it-offline/

-   See Myles Grant talk about it more here:
    http://en.oreilly.com/velocity2009/public/schedule/detail/7552
Runbook Hacks




“WTF HAPPENED LAST NIGHT?!”
Why?
Why?
As infrastructure grows, try to keep the
Humans:Machines ratio from getting out of
hand
Why?
As infrastructure grows, try to keep the
Humans:Machines ratio from getting out of
hand
Some of the How:
Why?
As infrastructure grows, try to keep the
Humans:Machines ratio from getting out of
hand
Some of the How:
  - teach machines to build themselves
Why?
As infrastructure grows, try to keep the
Humans:Machines ratio from getting out of
hand
Some of the How:
  - teach machines to build themselves
  - teach machines to watch themselves
Why?
As infrastructure grows, try to keep the
Humans:Machines ratio from getting out of
hand
Some of the How:
  - teach machines to build themselves
  - teach machines to watch themselves
  - teach machines to fix themselves
Why?
As infrastructure grows, try to keep the
Humans:Machines ratio from getting out of
hand
Some of the How:
  -   teach machines to build themselves
  -   teach machines to watch themselves
  -   teach machines to fix themselves
  -   reduce MTTR by streamlining
Automated
Infrastructure
Automated
          Infrastructure
-   If there is only one thing you do, automatic
    configuration and deployment management
    should be it.
Automated
              Infrastructure
-   If there is only one thing you do, automatic
    configuration and deployment management
    should be it.
-   See:
     -     Opscode/Chef (http://opscode.com/)

     -     Puppet (http://reductivelabs.com/products/puppet/)

     -     System Imager/Configurator
           (http://wiki.systemimager.org)
Conguration Management
     Codeswarm
Time
Machine time is cheaper than human time.


If a failure results in some commands being
run to ‘fix’ it, make the machines do it.


(i.e., don’t wake people up for stupid things!)
Aggregate Monitoring
Aggregate Monitoring




Don’t care about single nodes, only care about delta
change of metrics/faults
   -   Warn (email) on X % change
   -   Page (wake up) on Y % change
Aggregate Monitoring




Don’t care about single nodes, only care about delta
change of metrics/faults
   -   Warn (email) on X % change
   -   Page (wake up) on Y % change
High and low water marks for some metrics
Self-Healing
Self-Healing
Make service monitoring fix common failure
scenarios, notify us later about it.
Self-Healing
Make service monitoring fix common failure
scenarios, notify us later about it.
Self-Healing
Make service monitoring fix common failure
scenarios, notify us later about it.


Daemons/processes run on machines, will take
corrective action under certain conditions, and
report back with what they did.
Self-Healing
Make service monitoring fix common failure
scenarios, notify us later about it.


Daemons/processes run on machines, will take
corrective action under certain conditions, and
report back with what they did.
Self-Healing
Make service monitoring fix common failure
scenarios, notify us later about it.


Daemons/processes run on machines, will take
corrective action under certain conditions, and
report back with what they did.


Can greatly reduce your mean time to recovery
(MTTR)
Self-Healing
Make service monitoring fix common failure
scenarios, notify us later about it.


Daemons/processes run on machines, will take
corrective action under certain conditions, and
report back with what they did.


Can greatly reduce your mean time to recovery
(MTTR)
Basic Apache Example
Basic Apache Example

1. Webserver not running?
Basic Apache Example

1. Webserver not running?
2. Under certain conditions, try to start it, and
   email that this happened. (I’ll read it
   tomorrow)
Basic Apache Example

1. Webserver not running?
2. Under certain conditions, try to start it, and
   email that this happened. (I’ll read it
   tomorrow)
3. Won’t start? Assume something’s really
   wrong, so don’t keep trying (email that, too)
MySQL Self-Healing
MySQL Self-Healing
Some MySQL Issues “fixed” by the machines
MySQL Self-Healing
Some MySQL Issues “fixed” by the machines
MySQL Self-Healing
   Some MySQL Issues “fixed” by the machines


- Kill long-running SELECT queries (marked safe to kill)
MySQL Self-Healing
   Some MySQL Issues “fixed” by the machines


- Kill long-running SELECT queries (marked safe to kill)
- Queries not safe to kill are marked by the application
   as “NO KILL” in comments
MySQL Self-Healing
   Some MySQL Issues “fixed” by the machines


- Kill long-running SELECT queries (marked safe to kill)
- Queries not safe to kill are marked by the application
   as “NO KILL” in comments
- Run EXPLAIN on killed queries, and report the results
MySQL Self-Healing
   Some MySQL Issues “fixed” by the machines


- Kill long-running SELECT queries (marked safe to kill)
- Queries not safe to kill are marked by the application
   as “NO KILL” in comments
- Run EXPLAIN on killed queries, and report the results
- Keep track of the query types and databases that need
   the most killing, produce a “DBs that Suck” report
MySQL Self-Healing
MySQL Self-Healing
Some MySQL Replication issues “fixed” by the
machines, by error
MySQL Self-Healing
  Some MySQL Replication issues “fixed” by the
  machines, by error
- Skip errors that can safely be skipped and
  restart slave threads
MySQL Self-Healing
  Some MySQL Replication issues “fixed” by the
  machines, by error
- Skip errors that can safely be skipped and
  restart slave threads
- Force refetch of replication binlogs on:
       - 1064 (ER_PARSE_ERROR)
MySQL Self-Healing
  Some MySQL Replication issues “fixed” by the
  machines, by error
- Skip errors that can safely be skipped and
  restart slave threads
- Force refetch of replication binlogs on:
       - 1064 (ER_PARSE_ERROR)
- Re-run queries on:
       - 1205 (ER_LOCK_WAIT_TIMEOUT)
       - 1213 (ER_LOCK_DEADLOCK)
Troubleshooting
Code and Config
 Deploy Logs
Code and Config
 Deploy Logs

1. ESSENTIAL
Code and Config
 Deploy Logs

1. ESSENTIAL
2. MANDATORY
Communications
Communications
•   Internal IRC

     -   For ongoing discussions
     -   Logged, so “infinite” scrollback
Communications
•   Internal IRC

      -   For ongoing discussions
      -   Logged, so “infinite” scrollback
•   IM Bot (built on libyahoo2.sf.net)

      -   For production changes
      -   Broadcasts all to all contacts
      -   Logged, and injected into IRC
      -   IM Status = who is in primary/secondary on-call
Communications
•   Internal IRC

      -   For ongoing discussions
      -   Logged, so “infinite” scrollback
•   IM Bot (built on libyahoo2.sf.net)

      -   For production changes
      -   Broadcasts all to all contacts
      -   Logged, and injected into IRC
      -   IM Status = who is in primary/secondary on-call

•   All of IRC and IM Bot slurped into a search index
Operational Efficiency Hacks Web20 Expo2009
when
when   what
when   what

              detailed
               what*
when                 what

                                                        detailed
                                                         what*




*also points to what commands should be used to back out the changes
when                 what

                                                        detailed
                                                         what*



                                          who




*also points to what commands should be used to back out the changes
when                 what

                                                        detailed
                                                         what*



                                          who




*also points to what commands should be used to back out the changes
when                 what

                                                        detailed
                                                         what*



                                          who
     time of last deploy at top of ganglia




*also points to what commands should be used to back out the changes
Operational Efficiency Hacks Web20 Expo2009
Operational Efficiency Hacks Web20 Expo2009
IM Bot (timestamps help correlation)
IM Bot (timestamps help correlation)
IM Bot (timestamps help correlation)




all IRC, IM
      bot
     into
searchable
  history
Morals of Our Stories
Morals of Our Stories

-   Optimizations can be a Very Good Thing™
Morals of Our Stories

-   Optimizations can be a Very Good Thing™
-   Weigh time spent optimizing against
    expected gains
Morals of Our Stories

-   Optimizations can be a Very Good Thing™
-   Weigh time spent optimizing against
    expected gains
-   Lean on others for how much “expected
    gains” mean for different scenarios
Morals of Our Stories

-   Optimizations can be a Very Good Thing™
-   Weigh time spent optimizing against
    expected gains
-   Lean on others for how much “expected
    gains” mean for different scenarios
-   Plain old-fashioned intuition
Some Wisdom Nuggets


     Jon Prall’s 85 WebOps Rules:
  http://jprall.vox.com/library/post/85-
    operations-rules-to-live-by.html
Questions?

http://www.flickr.com/photos/ebarney/3348965637/
http://www.flickr.com/photos/dgmiller/1606071911/
http://www.flickr.com/photos/dannyboyster/60371673/
http://www.flickr.com/photos/bright/189338394/
http://www.flickr.com/photos/nickwheeleroz/2475011402/
http://www.flickr.com/photos/dramaqueennorma/191063346/
http://www.flickr.com/photos/telstar/2861103147/
http://www.flickr.com/photos/norby/2309046043/
http://www.flickr.com/photos/allysonk/201008992/

Mais conteúdo relacionado

Mais procurados

Troubleshooting redis
Troubleshooting redisTroubleshooting redis
Troubleshooting redisDaeMyung Kang
 
Deep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance PerformanceDeep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance PerformanceAmazon Web Services
 
Introduction of Java GC Tuning and Java Java Mission Control
Introduction of Java GC Tuning and Java Java Mission ControlIntroduction of Java GC Tuning and Java Java Mission Control
Introduction of Java GC Tuning and Java Java Mission ControlLeon Chen
 
Oracle Client Failover - Under The Hood
Oracle Client Failover - Under The HoodOracle Client Failover - Under The Hood
Oracle Client Failover - Under The HoodLudovico Caldara
 
Oracle Enterprise Manager
Oracle Enterprise ManagerOracle Enterprise Manager
Oracle Enterprise ManagerBob Rhubart
 
Stop the Chaos! Get Real Oracle Performance by Query Tuning Part 1
Stop the Chaos! Get Real Oracle Performance by Query Tuning Part 1Stop the Chaos! Get Real Oracle Performance by Query Tuning Part 1
Stop the Chaos! Get Real Oracle Performance by Query Tuning Part 1SolarWinds
 
MySQL Cluster 8.0 tutorial
MySQL Cluster 8.0 tutorialMySQL Cluster 8.0 tutorial
MySQL Cluster 8.0 tutorialFrazer Clement
 
MySQL Performance Schema in 20 Minutes
 MySQL Performance Schema in 20 Minutes MySQL Performance Schema in 20 Minutes
MySQL Performance Schema in 20 MinutesSveta Smirnova
 
Oracle RAC 19c: Best Practices and Secret Internals
Oracle RAC 19c: Best Practices and Secret InternalsOracle RAC 19c: Best Practices and Secret Internals
Oracle RAC 19c: Best Practices and Secret InternalsAnil Nair
 
Best Practices to avoid ORA-01555
Best Practices to avoid ORA-01555Best Practices to avoid ORA-01555
Best Practices to avoid ORA-01555Deiby Gómez
 
How to Actually Tune Your Spark Jobs So They Work
How to Actually Tune Your Spark Jobs So They WorkHow to Actually Tune Your Spark Jobs So They Work
How to Actually Tune Your Spark Jobs So They WorkIlya Ganelin
 
High Scale Relational Storage at Salesforce Built with Apache HBase and Apach...
High Scale Relational Storage at Salesforce Built with Apache HBase and Apach...High Scale Relational Storage at Salesforce Built with Apache HBase and Apach...
High Scale Relational Storage at Salesforce Built with Apache HBase and Apach...Salesforce Engineering
 
The consequences of sync_binlog != 1
The consequences of sync_binlog != 1The consequences of sync_binlog != 1
The consequences of sync_binlog != 1Jean-François Gagné
 
Optimize DR and Cloning with Logical Hostnames in Oracle E-Business Suite (OA...
Optimize DR and Cloning with Logical Hostnames in Oracle E-Business Suite (OA...Optimize DR and Cloning with Logical Hostnames in Oracle E-Business Suite (OA...
Optimize DR and Cloning with Logical Hostnames in Oracle E-Business Suite (OA...Andrejs Prokopjevs
 
SRE Demystified - 16 - NALSD - Non-Abstract Large System Design
SRE Demystified - 16 - NALSD - Non-Abstract Large System DesignSRE Demystified - 16 - NALSD - Non-Abstract Large System Design
SRE Demystified - 16 - NALSD - Non-Abstract Large System DesignDr Ganesh Iyer
 
DB2 for z/O S Data Sharing
DB2 for z/O S  Data  SharingDB2 for z/O S  Data  Sharing
DB2 for z/O S Data SharingSurekha Parekh
 
How Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for PerformanceHow Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for PerformanceBrendan Gregg
 
Near Real-Time Analytics with Apache Spark: Ingestion, ETL, and Interactive Q...
Near Real-Time Analytics with Apache Spark: Ingestion, ETL, and Interactive Q...Near Real-Time Analytics with Apache Spark: Ingestion, ETL, and Interactive Q...
Near Real-Time Analytics with Apache Spark: Ingestion, ETL, and Interactive Q...Databricks
 
Time-Series Apache HBase
Time-Series Apache HBaseTime-Series Apache HBase
Time-Series Apache HBaseHBaseCon
 

Mais procurados (20)

Troubleshooting redis
Troubleshooting redisTroubleshooting redis
Troubleshooting redis
 
Deep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance PerformanceDeep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance Performance
 
Introduction of Java GC Tuning and Java Java Mission Control
Introduction of Java GC Tuning and Java Java Mission ControlIntroduction of Java GC Tuning and Java Java Mission Control
Introduction of Java GC Tuning and Java Java Mission Control
 
Less10 undo
Less10 undoLess10 undo
Less10 undo
 
Oracle Client Failover - Under The Hood
Oracle Client Failover - Under The HoodOracle Client Failover - Under The Hood
Oracle Client Failover - Under The Hood
 
Oracle Enterprise Manager
Oracle Enterprise ManagerOracle Enterprise Manager
Oracle Enterprise Manager
 
Stop the Chaos! Get Real Oracle Performance by Query Tuning Part 1
Stop the Chaos! Get Real Oracle Performance by Query Tuning Part 1Stop the Chaos! Get Real Oracle Performance by Query Tuning Part 1
Stop the Chaos! Get Real Oracle Performance by Query Tuning Part 1
 
MySQL Cluster 8.0 tutorial
MySQL Cluster 8.0 tutorialMySQL Cluster 8.0 tutorial
MySQL Cluster 8.0 tutorial
 
MySQL Performance Schema in 20 Minutes
 MySQL Performance Schema in 20 Minutes MySQL Performance Schema in 20 Minutes
MySQL Performance Schema in 20 Minutes
 
Oracle RAC 19c: Best Practices and Secret Internals
Oracle RAC 19c: Best Practices and Secret InternalsOracle RAC 19c: Best Practices and Secret Internals
Oracle RAC 19c: Best Practices and Secret Internals
 
Best Practices to avoid ORA-01555
Best Practices to avoid ORA-01555Best Practices to avoid ORA-01555
Best Practices to avoid ORA-01555
 
How to Actually Tune Your Spark Jobs So They Work
How to Actually Tune Your Spark Jobs So They WorkHow to Actually Tune Your Spark Jobs So They Work
How to Actually Tune Your Spark Jobs So They Work
 
High Scale Relational Storage at Salesforce Built with Apache HBase and Apach...
High Scale Relational Storage at Salesforce Built with Apache HBase and Apach...High Scale Relational Storage at Salesforce Built with Apache HBase and Apach...
High Scale Relational Storage at Salesforce Built with Apache HBase and Apach...
 
The consequences of sync_binlog != 1
The consequences of sync_binlog != 1The consequences of sync_binlog != 1
The consequences of sync_binlog != 1
 
Optimize DR and Cloning with Logical Hostnames in Oracle E-Business Suite (OA...
Optimize DR and Cloning with Logical Hostnames in Oracle E-Business Suite (OA...Optimize DR and Cloning with Logical Hostnames in Oracle E-Business Suite (OA...
Optimize DR and Cloning with Logical Hostnames in Oracle E-Business Suite (OA...
 
SRE Demystified - 16 - NALSD - Non-Abstract Large System Design
SRE Demystified - 16 - NALSD - Non-Abstract Large System DesignSRE Demystified - 16 - NALSD - Non-Abstract Large System Design
SRE Demystified - 16 - NALSD - Non-Abstract Large System Design
 
DB2 for z/O S Data Sharing
DB2 for z/O S  Data  SharingDB2 for z/O S  Data  Sharing
DB2 for z/O S Data Sharing
 
How Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for PerformanceHow Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for Performance
 
Near Real-Time Analytics with Apache Spark: Ingestion, ETL, and Interactive Q...
Near Real-Time Analytics with Apache Spark: Ingestion, ETL, and Interactive Q...Near Real-Time Analytics with Apache Spark: Ingestion, ETL, and Interactive Q...
Near Real-Time Analytics with Apache Spark: Ingestion, ETL, and Interactive Q...
 
Time-Series Apache HBase
Time-Series Apache HBaseTime-Series Apache HBase
Time-Series Apache HBase
 

Destaque

Awareness, Feedback, Self-regulation
Awareness, Feedback, Self-regulationAwareness, Feedback, Self-regulation
Awareness, Feedback, Self-regulationChristian Glahn
 
Go or No-Go: Operability and Contingency Planning at Etsy.com
Go or No-Go: Operability and Contingency Planning at Etsy.comGo or No-Go: Operability and Contingency Planning at Etsy.com
Go or No-Go: Operability and Contingency Planning at Etsy.comJohn Allspaw
 
Programação semana de extensão 2012 folia
Programação semana de extensão 2012 foliaProgramação semana de extensão 2012 folia
Programação semana de extensão 2012 foliaAdriana Rocha
 
Reconociendo habilidades
Reconociendo habilidadesReconociendo habilidades
Reconociendo habilidadescarorubi1
 
Saludo protocolar al cuerpo diplomático acreditado en panamá
Saludo protocolar al cuerpo diplomático acreditado en panamáSaludo protocolar al cuerpo diplomático acreditado en panamá
Saludo protocolar al cuerpo diplomático acreditado en panamámirepanama
 
Pelleting: The link between practice and engineering
Pelleting: The link between practice and engineeringPelleting: The link between practice and engineering
Pelleting: The link between practice and engineeringMilling and Grain magazine
 
Annual Subsea Meeting Registration Form
Annual Subsea Meeting Registration FormAnnual Subsea Meeting Registration Form
Annual Subsea Meeting Registration FormPaulo de Tarso Molina
 
Strategies for Vacant Properties - John Mills, Camelot Property Management
Strategies for Vacant Properties - John Mills, Camelot Property ManagementStrategies for Vacant Properties - John Mills, Camelot Property Management
Strategies for Vacant Properties - John Mills, Camelot Property Managementfhanley
 
FLOW - Far and Large Offshore Wind (Summary)
FLOW - Far and Large Offshore Wind (Summary)FLOW - Far and Large Offshore Wind (Summary)
FLOW - Far and Large Offshore Wind (Summary)NLandUSA
 
B2b newsletter_april
B2b newsletter_aprilB2b newsletter_april
B2b newsletter_aprilRene Jack
 
La gestión de la elección gd e e mail
La gestión de la elección gd e e mailLa gestión de la elección gd e e mail
La gestión de la elección gd e e mailmsanchezm
 
Aguahidratacion cuso4b
Aguahidratacion cuso4bAguahidratacion cuso4b
Aguahidratacion cuso4bJ M
 

Destaque (20)

Awareness, Feedback, Self-regulation
Awareness, Feedback, Self-regulationAwareness, Feedback, Self-regulation
Awareness, Feedback, Self-regulation
 
Go or No-Go: Operability and Contingency Planning at Etsy.com
Go or No-Go: Operability and Contingency Planning at Etsy.comGo or No-Go: Operability and Contingency Planning at Etsy.com
Go or No-Go: Operability and Contingency Planning at Etsy.com
 
Programação semana de extensão 2012 folia
Programação semana de extensão 2012 foliaProgramação semana de extensão 2012 folia
Programação semana de extensão 2012 folia
 
Yoga ocular
Yoga ocularYoga ocular
Yoga ocular
 
Reconociendo habilidades
Reconociendo habilidadesReconociendo habilidades
Reconociendo habilidades
 
Saludo protocolar al cuerpo diplomático acreditado en panamá
Saludo protocolar al cuerpo diplomático acreditado en panamáSaludo protocolar al cuerpo diplomático acreditado en panamá
Saludo protocolar al cuerpo diplomático acreditado en panamá
 
Pelleting: The link between practice and engineering
Pelleting: The link between practice and engineeringPelleting: The link between practice and engineering
Pelleting: The link between practice and engineering
 
Annual Subsea Meeting Registration Form
Annual Subsea Meeting Registration FormAnnual Subsea Meeting Registration Form
Annual Subsea Meeting Registration Form
 
Strategies for Vacant Properties - John Mills, Camelot Property Management
Strategies for Vacant Properties - John Mills, Camelot Property ManagementStrategies for Vacant Properties - John Mills, Camelot Property Management
Strategies for Vacant Properties - John Mills, Camelot Property Management
 
EBT Health & Safety
EBT Health & SafetyEBT Health & Safety
EBT Health & Safety
 
Pulpo congelado en españa
Pulpo congelado en españaPulpo congelado en españa
Pulpo congelado en españa
 
FLOW - Far and Large Offshore Wind (Summary)
FLOW - Far and Large Offshore Wind (Summary)FLOW - Far and Large Offshore Wind (Summary)
FLOW - Far and Large Offshore Wind (Summary)
 
B2b newsletter_april
B2b newsletter_aprilB2b newsletter_april
B2b newsletter_april
 
Cardiodiagnostico
CardiodiagnosticoCardiodiagnostico
Cardiodiagnostico
 
Cyclus-US Portfolio
Cyclus-US PortfolioCyclus-US Portfolio
Cyclus-US Portfolio
 
La gestión de la elección gd e e mail
La gestión de la elección gd e e mailLa gestión de la elección gd e e mail
La gestión de la elección gd e e mail
 
Aguahidratacion cuso4b
Aguahidratacion cuso4bAguahidratacion cuso4b
Aguahidratacion cuso4b
 
Final.gest.hum.1
Final.gest.hum.1Final.gest.hum.1
Final.gest.hum.1
 
Ba38
Ba38Ba38
Ba38
 
Numero
NumeroNumero
Numero
 

Semelhante a Operational Efficiency Hacks Web20 Expo2009

Capacity Management for Web Operations
Capacity Management for Web OperationsCapacity Management for Web Operations
Capacity Management for Web OperationsJohn Allspaw
 
CPU Optimizations in the CERN Cloud - February 2016
CPU Optimizations in the CERN Cloud - February 2016CPU Optimizations in the CERN Cloud - February 2016
CPU Optimizations in the CERN Cloud - February 2016Belmiro Moreira
 
How to build a state-of-the-art rails cluster
How to build a state-of-the-art rails clusterHow to build a state-of-the-art rails cluster
How to build a state-of-the-art rails clusterTim Lossen
 
Enterprise Search Summit - Speeding Up Search
Enterprise Search Summit - Speeding Up SearchEnterprise Search Summit - Speeding Up Search
Enterprise Search Summit - Speeding Up SearchAzul Systems Inc.
 
MySQL Group Replication - Ready For Production? (2018-04)
MySQL Group Replication - Ready For Production? (2018-04)MySQL Group Replication - Ready For Production? (2018-04)
MySQL Group Replication - Ready For Production? (2018-04)Kenny Gryp
 
Capacity Management from Flickr
Capacity Management from FlickrCapacity Management from Flickr
Capacity Management from Flickrxlight
 
Capacity Planning For Web Operations Presentation
Capacity Planning For Web Operations PresentationCapacity Planning For Web Operations Presentation
Capacity Planning For Web Operations Presentationjward5519
 
Capacity Planning For Web Operations Presentation
Capacity Planning For Web Operations PresentationCapacity Planning For Web Operations Presentation
Capacity Planning For Web Operations Presentationjward5519
 
Apache Gearpump - Lightweight Real-time Streaming Engine
Apache Gearpump - Lightweight Real-time Streaming EngineApache Gearpump - Lightweight Real-time Streaming Engine
Apache Gearpump - Lightweight Real-time Streaming EngineTianlun Zhang
 
Scylla Summit 2018: OLAP or OLTP? Why Not Both?
Scylla Summit 2018: OLAP or OLTP? Why Not Both?Scylla Summit 2018: OLAP or OLTP? Why Not Both?
Scylla Summit 2018: OLAP or OLTP? Why Not Both?ScyllaDB
 
2016-JAN-28 -- High Performance Production Databases on Ceph
2016-JAN-28 -- High Performance Production Databases on Ceph2016-JAN-28 -- High Performance Production Databases on Ceph
2016-JAN-28 -- High Performance Production Databases on CephCeph Community
 
Cloudcon East Presentation
Cloudcon East PresentationCloudcon East Presentation
Cloudcon East Presentationbr7tt
 
Cloudcon East Presentation
Cloudcon East PresentationCloudcon East Presentation
Cloudcon East Presentationbr7tt
 
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...Amazon Web Services
 
Couchbase live 2016
Couchbase live 2016Couchbase live 2016
Couchbase live 2016Pierre Mavro
 
Weakly Supervised Whole Slide Image Analysis Using Cloud Computing
Weakly Supervised Whole Slide Image Analysis Using Cloud ComputingWeakly Supervised Whole Slide Image Analysis Using Cloud Computing
Weakly Supervised Whole Slide Image Analysis Using Cloud ComputingSean Yu
 
Smashing Big Data with AHA Hardware GZIP
Smashing Big Data with AHA Hardware GZIPSmashing Big Data with AHA Hardware GZIP
Smashing Big Data with AHA Hardware GZIPJuan D. Deaton, Ph.D.
 
Low latency in java 8 by Peter Lawrey
Low latency in java 8 by Peter Lawrey Low latency in java 8 by Peter Lawrey
Low latency in java 8 by Peter Lawrey J On The Beach
 

Semelhante a Operational Efficiency Hacks Web20 Expo2009 (20)

Capacity Management for Web Operations
Capacity Management for Web OperationsCapacity Management for Web Operations
Capacity Management for Web Operations
 
CPU Optimizations in the CERN Cloud - February 2016
CPU Optimizations in the CERN Cloud - February 2016CPU Optimizations in the CERN Cloud - February 2016
CPU Optimizations in the CERN Cloud - February 2016
 
How to build a state-of-the-art rails cluster
How to build a state-of-the-art rails clusterHow to build a state-of-the-art rails cluster
How to build a state-of-the-art rails cluster
 
Enterprise Search Summit - Speeding Up Search
Enterprise Search Summit - Speeding Up SearchEnterprise Search Summit - Speeding Up Search
Enterprise Search Summit - Speeding Up Search
 
MySQL Group Replication - Ready For Production? (2018-04)
MySQL Group Replication - Ready For Production? (2018-04)MySQL Group Replication - Ready For Production? (2018-04)
MySQL Group Replication - Ready For Production? (2018-04)
 
Capacity Management from Flickr
Capacity Management from FlickrCapacity Management from Flickr
Capacity Management from Flickr
 
Capacity Planning For Web Operations Presentation
Capacity Planning For Web Operations PresentationCapacity Planning For Web Operations Presentation
Capacity Planning For Web Operations Presentation
 
Capacity Planning For Web Operations Presentation
Capacity Planning For Web Operations PresentationCapacity Planning For Web Operations Presentation
Capacity Planning For Web Operations Presentation
 
Apache Gearpump - Lightweight Real-time Streaming Engine
Apache Gearpump - Lightweight Real-time Streaming EngineApache Gearpump - Lightweight Real-time Streaming Engine
Apache Gearpump - Lightweight Real-time Streaming Engine
 
Scylla Summit 2018: OLAP or OLTP? Why Not Both?
Scylla Summit 2018: OLAP or OLTP? Why Not Both?Scylla Summit 2018: OLAP or OLTP? Why Not Both?
Scylla Summit 2018: OLAP or OLTP? Why Not Both?
 
2016-JAN-28 -- High Performance Production Databases on Ceph
2016-JAN-28 -- High Performance Production Databases on Ceph2016-JAN-28 -- High Performance Production Databases on Ceph
2016-JAN-28 -- High Performance Production Databases on Ceph
 
Cloudcon East Presentation
Cloudcon East PresentationCloudcon East Presentation
Cloudcon East Presentation
 
Cloudcon East Presentation
Cloudcon East PresentationCloudcon East Presentation
Cloudcon East Presentation
 
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
 
Couchbase live 2016
Couchbase live 2016Couchbase live 2016
Couchbase live 2016
 
Weakly Supervised Whole Slide Image Analysis Using Cloud Computing
Weakly Supervised Whole Slide Image Analysis Using Cloud ComputingWeakly Supervised Whole Slide Image Analysis Using Cloud Computing
Weakly Supervised Whole Slide Image Analysis Using Cloud Computing
 
Smashing Big Data with AHA Hardware GZIP
Smashing Big Data with AHA Hardware GZIPSmashing Big Data with AHA Hardware GZIP
Smashing Big Data with AHA Hardware GZIP
 
Low latency in java 8 by Peter Lawrey
Low latency in java 8 by Peter Lawrey Low latency in java 8 by Peter Lawrey
Low latency in java 8 by Peter Lawrey
 
Deep Dive on Amazon EC2
Deep Dive on Amazon EC2Deep Dive on Amazon EC2
Deep Dive on Amazon EC2
 
Deep Dive on Amazon EC2
Deep Dive on Amazon EC2Deep Dive on Amazon EC2
Deep Dive on Amazon EC2
 

Mais de John Allspaw

Resilience Engineering: A field of study, a community, and some perspective s...
Resilience Engineering: A field of study, a community, and some perspective s...Resilience Engineering: A field of study, a community, and some perspective s...
Resilience Engineering: A field of study, a community, and some perspective s...John Allspaw
 
Considerations for Alert Design
Considerations for Alert DesignConsiderations for Alert Design
Considerations for Alert DesignJohn Allspaw
 
Velocity EU 2012 Escalating Scenarios: Outage Handling Pitfalls
Velocity EU 2012 Escalating Scenarios: Outage Handling PitfallsVelocity EU 2012 Escalating Scenarios: Outage Handling Pitfalls
Velocity EU 2012 Escalating Scenarios: Outage Handling PitfallsJohn Allspaw
 
Responding to Outages Maturely
Responding to Outages MaturelyResponding to Outages Maturely
Responding to Outages MaturelyJohn Allspaw
 
Resilient Response In Complex Systems
Resilient Response In Complex SystemsResilient Response In Complex Systems
Resilient Response In Complex SystemsJohn Allspaw
 
Outages, PostMortems, and Human Error
Outages, PostMortems, and Human ErrorOutages, PostMortems, and Human Error
Outages, PostMortems, and Human ErrorJohn Allspaw
 
Anticipation: What Could Possibly Go Wrong?
Anticipation: What Could Possibly Go Wrong?Anticipation: What Could Possibly Go Wrong?
Anticipation: What Could Possibly Go Wrong?John Allspaw
 
Advanced PostMortem Fu and Human Error 101 (Velocity 2011)
Advanced PostMortem Fu and Human Error 101 (Velocity 2011)Advanced PostMortem Fu and Human Error 101 (Velocity 2011)
Advanced PostMortem Fu and Human Error 101 (Velocity 2011)John Allspaw
 
Dev and Ops Collaboration and Awareness at Etsy and Flickr
Dev and Ops Collaboration and Awareness at Etsy and FlickrDev and Ops Collaboration and Awareness at Etsy and Flickr
Dev and Ops Collaboration and Awareness at Etsy and FlickrJohn Allspaw
 
Ops Meta-Metrics: The Currency You Pay For Change
Ops Meta-Metrics: The Currency You Pay For ChangeOps Meta-Metrics: The Currency You Pay For Change
Ops Meta-Metrics: The Currency You Pay For ChangeJohn Allspaw
 
Ops Meta-Metrics: The Currency You Pay For Change
Ops Meta-Metrics: The Currency You Pay For ChangeOps Meta-Metrics: The Currency You Pay For Change
Ops Meta-Metrics: The Currency You Pay For ChangeJohn Allspaw
 
Capacity Planning For LAMP
Capacity Planning For LAMPCapacity Planning For LAMP
Capacity Planning For LAMPJohn Allspaw
 
10+ Deploys Per Day: Dev and Ops Cooperation at Flickr
10+ Deploys Per Day: Dev and Ops Cooperation at Flickr10+ Deploys Per Day: Dev and Ops Cooperation at Flickr
10+ Deploys Per Day: Dev and Ops Cooperation at FlickrJohn Allspaw
 
Capacity Planning for Web Operations - Web20 Expo 2008
Capacity Planning for Web Operations - Web20 Expo 2008Capacity Planning for Web Operations - Web20 Expo 2008
Capacity Planning for Web Operations - Web20 Expo 2008John Allspaw
 

Mais de John Allspaw (14)

Resilience Engineering: A field of study, a community, and some perspective s...
Resilience Engineering: A field of study, a community, and some perspective s...Resilience Engineering: A field of study, a community, and some perspective s...
Resilience Engineering: A field of study, a community, and some perspective s...
 
Considerations for Alert Design
Considerations for Alert DesignConsiderations for Alert Design
Considerations for Alert Design
 
Velocity EU 2012 Escalating Scenarios: Outage Handling Pitfalls
Velocity EU 2012 Escalating Scenarios: Outage Handling PitfallsVelocity EU 2012 Escalating Scenarios: Outage Handling Pitfalls
Velocity EU 2012 Escalating Scenarios: Outage Handling Pitfalls
 
Responding to Outages Maturely
Responding to Outages MaturelyResponding to Outages Maturely
Responding to Outages Maturely
 
Resilient Response In Complex Systems
Resilient Response In Complex SystemsResilient Response In Complex Systems
Resilient Response In Complex Systems
 
Outages, PostMortems, and Human Error
Outages, PostMortems, and Human ErrorOutages, PostMortems, and Human Error
Outages, PostMortems, and Human Error
 
Anticipation: What Could Possibly Go Wrong?
Anticipation: What Could Possibly Go Wrong?Anticipation: What Could Possibly Go Wrong?
Anticipation: What Could Possibly Go Wrong?
 
Advanced PostMortem Fu and Human Error 101 (Velocity 2011)
Advanced PostMortem Fu and Human Error 101 (Velocity 2011)Advanced PostMortem Fu and Human Error 101 (Velocity 2011)
Advanced PostMortem Fu and Human Error 101 (Velocity 2011)
 
Dev and Ops Collaboration and Awareness at Etsy and Flickr
Dev and Ops Collaboration and Awareness at Etsy and FlickrDev and Ops Collaboration and Awareness at Etsy and Flickr
Dev and Ops Collaboration and Awareness at Etsy and Flickr
 
Ops Meta-Metrics: The Currency You Pay For Change
Ops Meta-Metrics: The Currency You Pay For ChangeOps Meta-Metrics: The Currency You Pay For Change
Ops Meta-Metrics: The Currency You Pay For Change
 
Ops Meta-Metrics: The Currency You Pay For Change
Ops Meta-Metrics: The Currency You Pay For ChangeOps Meta-Metrics: The Currency You Pay For Change
Ops Meta-Metrics: The Currency You Pay For Change
 
Capacity Planning For LAMP
Capacity Planning For LAMPCapacity Planning For LAMP
Capacity Planning For LAMP
 
10+ Deploys Per Day: Dev and Ops Cooperation at Flickr
10+ Deploys Per Day: Dev and Ops Cooperation at Flickr10+ Deploys Per Day: Dev and Ops Cooperation at Flickr
10+ Deploys Per Day: Dev and Ops Cooperation at Flickr
 
Capacity Planning for Web Operations - Web20 Expo 2008
Capacity Planning for Web Operations - Web20 Expo 2008Capacity Planning for Web Operations - Web20 Expo 2008
Capacity Planning for Web Operations - Web20 Expo 2008
 

Último

Hailey's On It! Road Trippin' Storyboard Sample 2
Hailey's On It! Road Trippin' Storyboard Sample 2Hailey's On It! Road Trippin' Storyboard Sample 2
Hailey's On It! Road Trippin' Storyboard Sample 2Casey Keith
 
Easter Eggs tfdyyguhiljhhxfghjgkhggcfxh(5).pptx
Easter Eggs tfdyyguhiljhhxfghjgkhggcfxh(5).pptxEaster Eggs tfdyyguhiljhhxfghjgkhggcfxh(5).pptx
Easter Eggs tfdyyguhiljhhxfghjgkhggcfxh(5).pptx17vsar056
 
Introduction to Email Marketing 2024.pptx
Introduction to Email Marketing 2024.pptxIntroduction to Email Marketing 2024.pptx
Introduction to Email Marketing 2024.pptxlouise569794
 
Patrick Star Show: King Neptune's Ball (Revisionist) #1
Patrick Star Show: King Neptune's Ball (Revisionist) #1Patrick Star Show: King Neptune's Ball (Revisionist) #1
Patrick Star Show: King Neptune's Ball (Revisionist) #1JasmineR4
 
Script My Project class clown recent draft.pdf
Script My Project class clown recent draft.pdfScript My Project class clown recent draft.pdf
Script My Project class clown recent draft.pdfangzenafrost
 
POWER MIGRACIÓN PARA LA ASIGNATURA DE .pptx
POWER MIGRACIÓN PARA LA ASIGNATURA DE .pptxPOWER MIGRACIÓN PARA LA ASIGNATURA DE .pptx
POWER MIGRACIÓN PARA LA ASIGNATURA DE .pptxmluisa9715
 
Hallo-Wieners Full-Color Storyboard Project
Hallo-Wieners Full-Color Storyboard ProjectHallo-Wieners Full-Color Storyboard Project
Hallo-Wieners Full-Color Storyboard Projectsarahr51
 
SoftServe winning NATO Hackathon 2024.pdf
SoftServe winning NATO Hackathon 2024.pdfSoftServe winning NATO Hackathon 2024.pdf
SoftServe winning NATO Hackathon 2024.pdfSoftServe HRM
 
Irma Kukhianidze (Georgian, 1970) part.4
Irma Kukhianidze (Georgian, 1970) part.4Irma Kukhianidze (Georgian, 1970) part.4
Irma Kukhianidze (Georgian, 1970) part.4sandamichaela *
 
Sample Board from Disney TVA - Hailey's On It!
Sample Board from Disney TVA - Hailey's On It!Sample Board from Disney TVA - Hailey's On It!
Sample Board from Disney TVA - Hailey's On It!Casey Keith
 
jose rizal el filibusterismo el filibusterimo
jose rizal el filibusterismo el filibusterimojose rizal el filibusterismo el filibusterimo
jose rizal el filibusterismo el filibusterimobadangayonmgb
 
AD349_pr06_Kai_Mendoza_w24 Brand Identity Presentation
AD349_pr06_Kai_Mendoza_w24 Brand Identity PresentationAD349_pr06_Kai_Mendoza_w24 Brand Identity Presentation
AD349_pr06_Kai_Mendoza_w24 Brand Identity Presentationmakaiodm
 
Hailey's On It! Beta's Gonna Hate Storyboard Sample
Hailey's On It! Beta's Gonna Hate Storyboard SampleHailey's On It! Beta's Gonna Hate Storyboard Sample
Hailey's On It! Beta's Gonna Hate Storyboard SampleCasey Keith
 
Vietnamese artists capture war and the refugee experience
Vietnamese artists capture war and the refugee experienceVietnamese artists capture war and the refugee experience
Vietnamese artists capture war and the refugee experienceolivianb419
 
Culture advertisement from marketing ads
Culture advertisement from marketing adsCulture advertisement from marketing ads
Culture advertisement from marketing adsMuhammad Sohaib Afzaal
 
"witch fraud" board sequence _ 1x1 .pdf
"witch fraud" board sequence  _ 1x1 .pdf"witch fraud" board sequence  _ 1x1 .pdf
"witch fraud" board sequence _ 1x1 .pdfLeahAbrams4
 
False9 Programme - 16th March Chelmsford Theatre.pdf
False9 Programme - 16th March Chelmsford Theatre.pdfFalse9 Programme - 16th March Chelmsford Theatre.pdf
False9 Programme - 16th March Chelmsford Theatre.pdfoliverloom5
 
Action storyboard: "there's only one man who can catch me!"
Action storyboard: "there's only one man who can catch me!"Action storyboard: "there's only one man who can catch me!"
Action storyboard: "there's only one man who can catch me!"LyneSun
 
Media - Broadsheet new - Copy.ppt x
Media - Broadsheet  new - Copy.ppt          xMedia - Broadsheet  new - Copy.ppt          x
Media - Broadsheet new - Copy.ppt xWillowDryden
 

Último (20)

Hailey's On It! Road Trippin' Storyboard Sample 2
Hailey's On It! Road Trippin' Storyboard Sample 2Hailey's On It! Road Trippin' Storyboard Sample 2
Hailey's On It! Road Trippin' Storyboard Sample 2
 
Easter Eggs tfdyyguhiljhhxfghjgkhggcfxh(5).pptx
Easter Eggs tfdyyguhiljhhxfghjgkhggcfxh(5).pptxEaster Eggs tfdyyguhiljhhxfghjgkhggcfxh(5).pptx
Easter Eggs tfdyyguhiljhhxfghjgkhggcfxh(5).pptx
 
Introduction to Email Marketing 2024.pptx
Introduction to Email Marketing 2024.pptxIntroduction to Email Marketing 2024.pptx
Introduction to Email Marketing 2024.pptx
 
Patrick Star Show: King Neptune's Ball (Revisionist) #1
Patrick Star Show: King Neptune's Ball (Revisionist) #1Patrick Star Show: King Neptune's Ball (Revisionist) #1
Patrick Star Show: King Neptune's Ball (Revisionist) #1
 
Script My Project class clown recent draft.pdf
Script My Project class clown recent draft.pdfScript My Project class clown recent draft.pdf
Script My Project class clown recent draft.pdf
 
POWER MIGRACIÓN PARA LA ASIGNATURA DE .pptx
POWER MIGRACIÓN PARA LA ASIGNATURA DE .pptxPOWER MIGRACIÓN PARA LA ASIGNATURA DE .pptx
POWER MIGRACIÓN PARA LA ASIGNATURA DE .pptx
 
Hallo-Wieners Full-Color Storyboard Project
Hallo-Wieners Full-Color Storyboard ProjectHallo-Wieners Full-Color Storyboard Project
Hallo-Wieners Full-Color Storyboard Project
 
SoftServe winning NATO Hackathon 2024.pdf
SoftServe winning NATO Hackathon 2024.pdfSoftServe winning NATO Hackathon 2024.pdf
SoftServe winning NATO Hackathon 2024.pdf
 
Irma Kukhianidze (Georgian, 1970) part.4
Irma Kukhianidze (Georgian, 1970) part.4Irma Kukhianidze (Georgian, 1970) part.4
Irma Kukhianidze (Georgian, 1970) part.4
 
Sample Board from Disney TVA - Hailey's On It!
Sample Board from Disney TVA - Hailey's On It!Sample Board from Disney TVA - Hailey's On It!
Sample Board from Disney TVA - Hailey's On It!
 
jose rizal el filibusterismo el filibusterimo
jose rizal el filibusterismo el filibusterimojose rizal el filibusterismo el filibusterimo
jose rizal el filibusterismo el filibusterimo
 
AD349_pr06_Kai_Mendoza_w24 Brand Identity Presentation
AD349_pr06_Kai_Mendoza_w24 Brand Identity PresentationAD349_pr06_Kai_Mendoza_w24 Brand Identity Presentation
AD349_pr06_Kai_Mendoza_w24 Brand Identity Presentation
 
Hailey's On It! Beta's Gonna Hate Storyboard Sample
Hailey's On It! Beta's Gonna Hate Storyboard SampleHailey's On It! Beta's Gonna Hate Storyboard Sample
Hailey's On It! Beta's Gonna Hate Storyboard Sample
 
Vietnamese artists capture war and the refugee experience
Vietnamese artists capture war and the refugee experienceVietnamese artists capture war and the refugee experience
Vietnamese artists capture war and the refugee experience
 
Culture advertisement from marketing ads
Culture advertisement from marketing adsCulture advertisement from marketing ads
Culture advertisement from marketing ads
 
"witch fraud" board sequence _ 1x1 .pdf
"witch fraud" board sequence  _ 1x1 .pdf"witch fraud" board sequence  _ 1x1 .pdf
"witch fraud" board sequence _ 1x1 .pdf
 
False9 Programme - 16th March Chelmsford Theatre.pdf
False9 Programme - 16th March Chelmsford Theatre.pdfFalse9 Programme - 16th March Chelmsford Theatre.pdf
False9 Programme - 16th March Chelmsford Theatre.pdf
 
A picture from the mountains - my hometown Vanadzor
A picture from the mountains - my hometown VanadzorA picture from the mountains - my hometown Vanadzor
A picture from the mountains - my hometown Vanadzor
 
Action storyboard: "there's only one man who can catch me!"
Action storyboard: "there's only one man who can catch me!"Action storyboard: "there's only one man who can catch me!"
Action storyboard: "there's only one man who can catch me!"
 
Media - Broadsheet new - Copy.ppt x
Media - Broadsheet  new - Copy.ppt          xMedia - Broadsheet  new - Copy.ppt          x
Media - Broadsheet new - Copy.ppt x
 

Operational Efficiency Hacks Web20 Expo2009

  • 1. Operational Efficiency Hacks John Allspaw Operations Engineering, Flickr
  • 2. who am I? Manage the Flickr Operations group Wrote a geeky book:
  • 4. “Efficiencies” Doing more with the robots you’ve got
  • 5. “Efficiencies” Doing more with the robots you’ve got Doing more with the humans you’ve got
  • 6. Some optimization “rules”
  • 7. Some optimization “rules” - Don’t rely on being able to tweak anything.
  • 8. Some optimization “rules” - Don’t rely on being able to tweak anything. - Don’t waste too much time tuning when you have no evidence it’ll matter.
  • 10. Optimization “rules” “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.” Knuth, (or Hoare)
  • 13. Optimization “rules” That doesn’t give us an excuse to be lazy and inefficient.
  • 14. Optimization “rules” That doesn’t give us an excuse to be lazy and inefficient.
  • 15. Optimization “rules” That doesn’t give us an excuse to be lazy and inefficient. We lean on the experience of people in the community for evidence that tuning(s) might be a worthwhile thing to do.
  • 16. Optimization “rules” “Yet we should not pass up our opportunities in that critical 3 percent.” Knuth, (or Hoare)
  • 17. So... stop somewhere in here OMG obvious I'm wasting tuning !@#$ time wins for no reason
  • 19. Our Context - 24 TB of MySQL data
  • 20. Our Context - 24 TB of MySQL data - 32k/sec of MySQL writes
  • 21. Our Context - 24 TB of MySQL data - 32k/sec of MySQL writes - 120k/sec of MySQL reads
  • 22. Our Context - 24 TB of MySQL data - 32k/sec of MySQL writes - 120k/sec of MySQL reads - 6 PB of photos
  • 23. Our Context - 24 TB of MySQL data - 32k/sec of MySQL writes - 120k/sec of MySQL reads - 6 PB of photos - 10TB storage eaten per day
  • 24. Our Context - 24 TB of MySQL data - 32k/sec of MySQL writes - 120k/sec of MySQL reads - 6 PB of photos - 10TB storage eaten per day - 15,362 service monitors (alerts)
  • 26. Infrastructure Hacks - Examples of what changing software can do (plain old-fashioned performance tuning)
  • 27. Infrastructure Hacks - Examples of what changing software can do (plain old-fashioned performance tuning) - Examples of what changing hardware can do (yay for Mr. Moore!)
  • 28. Leaning on compilers (synthetic PHP benchmarks, not real-world) (http://sebastian-bergmann.de/archives/634-PHP-GCC- ICC-Benchmark.html)
  • 29. PHP (real-world) php 4.4.8 to php 5.2.8 migration
  • 30. Can now handle more with less same taste, less filling
  • 32. Image Processing - 2004, Flickr was using ImageMagick for image processing (version 6.1.9)
  • 33. Image Processing - 2004, Flickr was using ImageMagick for image processing (version 6.1.9) - Changed to GraphicsMagick, about 15% faster at the time (version 1.1.5)
  • 34. Image Processing - 2004, Flickr was using ImageMagick for image processing (version 6.1.9) - Changed to GraphicsMagick, about 15% faster at the time (version 1.1.5) - Only need a subset of ImageMagick features anyway for our purposes
  • 35. Image Processing - OpenMP support (http://en.wikipedia.org/wiki/Openmp) - Allows parallelization of processing jobs, using multiple cores working on the same image - Some algorithms have more parallelization than others
  • 36. Image Processing - Test script - 7 large-ish DSLR photos - Cascade resizing each to 6 smaller sizes, semi-typical for Flickr’s workload - Each resize processed serially
  • 37. Image Processing compiler differences (GM version 1.1.14, non-OpenMP)
  • 38. Image Processing OpenMP differences OpenMP advantage (gcc 4.1.2, on quad core Xeon L5335 @ 2.00GHz)
  • 39. Image Processing CPU differences
  • 41. Diagonal Scaling - Vertically scaling your already horizontally- scaled nodes
  • 42. Diagonal Scaling - Vertically scaling your already horizontally- scaled nodes - a.k.a. “tech refresh”
  • 43. Diagonal Scaling - Vertically scaling your already horizontally- scaled nodes - a.k.a. “tech refresh” - a.k.a. “Moore’s Law Surfing”
  • 45. Diagonal Scaling 67 “old” webservers with 18 “new” : We replaced
  • 46. Diagonal Scaling 67 “old” webservers with 18 “new” : We replaced CPUs RAM drives total power (W) servers @60% peak per server per server per server 67 2 8763.6 4GB 1x80GB 18 8 2332.8 4GB 1x146GB
  • 47. Diagonal Scaling 67 “old” webservers with 18 “new” : We replaced CPUs RAM drives total power (W) servers @60% peak ~70% LESS power per server per server per server 67 2 8763.6 4GB 1x80GB 49U LESS rack space 18 8 2332.8 4GB 1x146GB
  • 49. Diagonal Scaling 23 “old” image processing boxes with 8 “new” We replaced
  • 50. Diagonal Scaling 23 “old” image processing boxes with 8 “new” We replaced server photos/min rack total power (W) @60% peak 23 1035 23 3008.4 8 1120 8 1036.8
  • 51. Diagonal Scaling 23 “old” image processing boxes with 8 “new” We replaced server photos/min rack total power (W) @60% peak ~75% FASTER 23 1035 23 3008.4 15U LESS rack space 65% LESS power 8 8 1120 1036.8
  • 52. Diagonal Scaling 23 “old” image processing boxes with 8 “new” We replaced server photos/min rack total power (W) ~75% FASTER @60% peak 15U LESS rack space 23 1035 23 3008.4 65% LESS power 8 8 1120 1036.8
  • 53. Diagonal Scaling 23 “old” image processing boxes with 8 “new” We replaced server photos/min rack total power (W) ~75% FASTER @60% peak 15U LESS rack space 23 1035 23 3008.4 65% LESS power 8 8 1120 1036.8
  • 54. Diagonal Scaling 23 “old” image processing boxes with 8 “new” We replaced server photos/min rack total power (W) ~75% FASTER @60% peak 15U LESS rack space 23 1035 23 3008.4 65% LESS power 8 8 1120 1036.8 from this to this
  • 55. What do you do with old/slow machines?
  • 56. What do you do with old/slow machines? - Liquidate
  • 57. What do you do with old/slow machines? - Liquidate - Re-purpose as dev/staging/etc
  • 58. What do you do with old/slow machines? - Liquidate - Re-purpose as dev/staging/etc - “offline” tasks
  • 60. Offline Tasks - Out-of-band/asynchronous queuing and execution system, for non-realtime tasks
  • 61. Offline Tasks - Out-of-band/asynchronous queuing and execution system, for non-realtime tasks - See here:
  • 62. Offline Tasks - Out-of-band/asynchronous queuing and execution system, for non-realtime tasks - See here: http://code.flickr.com/blog/2008/09/26/flickr-engineers-do-it-offline/
  • 63. Offline Tasks - Out-of-band/asynchronous queuing and execution system, for non-realtime tasks - See here: http://code.flickr.com/blog/2008/09/26/flickr-engineers-do-it-offline/ - See Myles Grant talk about it more here:
  • 64. Offline Tasks - Out-of-band/asynchronous queuing and execution system, for non-realtime tasks - See here: http://code.flickr.com/blog/2008/09/26/flickr-engineers-do-it-offline/ - See Myles Grant talk about it more here: http://en.oreilly.com/velocity2009/public/schedule/detail/7552
  • 65. Runbook Hacks “WTF HAPPENED LAST NIGHT?!”
  • 66. Why?
  • 67. Why? As infrastructure grows, try to keep the Humans:Machines ratio from getting out of hand
  • 68. Why? As infrastructure grows, try to keep the Humans:Machines ratio from getting out of hand Some of the How:
  • 69. Why? As infrastructure grows, try to keep the Humans:Machines ratio from getting out of hand Some of the How: - teach machines to build themselves
  • 70. Why? As infrastructure grows, try to keep the Humans:Machines ratio from getting out of hand Some of the How: - teach machines to build themselves - teach machines to watch themselves
  • 71. Why? As infrastructure grows, try to keep the Humans:Machines ratio from getting out of hand Some of the How: - teach machines to build themselves - teach machines to watch themselves - teach machines to fix themselves
  • 72. Why? As infrastructure grows, try to keep the Humans:Machines ratio from getting out of hand Some of the How: - teach machines to build themselves - teach machines to watch themselves - teach machines to fix themselves - reduce MTTR by streamlining
  • 74. Automated Infrastructure - If there is only one thing you do, automatic configuration and deployment management should be it.
  • 75. Automated Infrastructure - If there is only one thing you do, automatic configuration and deployment management should be it. - See: - Opscode/Chef (http://opscode.com/) - Puppet (http://reductivelabs.com/products/puppet/) - System Imager/Configurator (http://wiki.systemimager.org)
  • 77. Time Machine time is cheaper than human time. If a failure results in some commands being run to ‘fix’ it, make the machines do it. (i.e., don’t wake people up for stupid things!)
  • 79. Aggregate Monitoring Don’t care about single nodes, only care about delta change of metrics/faults - Warn (email) on X % change - Page (wake up) on Y % change
  • 80. Aggregate Monitoring Don’t care about single nodes, only care about delta change of metrics/faults - Warn (email) on X % change - Page (wake up) on Y % change High and low water marks for some metrics
  • 82. Self-Healing Make service monitoring fix common failure scenarios, notify us later about it.
  • 83. Self-Healing Make service monitoring fix common failure scenarios, notify us later about it.
  • 84. Self-Healing Make service monitoring fix common failure scenarios, notify us later about it. Daemons/processes run on machines, will take corrective action under certain conditions, and report back with what they did.
  • 85. Self-Healing Make service monitoring fix common failure scenarios, notify us later about it. Daemons/processes run on machines, will take corrective action under certain conditions, and report back with what they did.
  • 86. Self-Healing Make service monitoring fix common failure scenarios, notify us later about it. Daemons/processes run on machines, will take corrective action under certain conditions, and report back with what they did. Can greatly reduce your mean time to recovery (MTTR)
  • 87. Self-Healing Make service monitoring fix common failure scenarios, notify us later about it. Daemons/processes run on machines, will take corrective action under certain conditions, and report back with what they did. Can greatly reduce your mean time to recovery (MTTR)
  • 89. Basic Apache Example 1. Webserver not running?
  • 90. Basic Apache Example 1. Webserver not running? 2. Under certain conditions, try to start it, and email that this happened. (I’ll read it tomorrow)
  • 91. Basic Apache Example 1. Webserver not running? 2. Under certain conditions, try to start it, and email that this happened. (I’ll read it tomorrow) 3. Won’t start? Assume something’s really wrong, so don’t keep trying (email that, too)
  • 93. MySQL Self-Healing Some MySQL Issues “fixed” by the machines
  • 94. MySQL Self-Healing Some MySQL Issues “fixed” by the machines
  • 95. MySQL Self-Healing Some MySQL Issues “fixed” by the machines - Kill long-running SELECT queries (marked safe to kill)
  • 96. MySQL Self-Healing Some MySQL Issues “fixed” by the machines - Kill long-running SELECT queries (marked safe to kill) - Queries not safe to kill are marked by the application as “NO KILL” in comments
  • 97. MySQL Self-Healing Some MySQL Issues “fixed” by the machines - Kill long-running SELECT queries (marked safe to kill) - Queries not safe to kill are marked by the application as “NO KILL” in comments - Run EXPLAIN on killed queries, and report the results
  • 98. MySQL Self-Healing Some MySQL Issues “fixed” by the machines - Kill long-running SELECT queries (marked safe to kill) - Queries not safe to kill are marked by the application as “NO KILL” in comments - Run EXPLAIN on killed queries, and report the results - Keep track of the query types and databases that need the most killing, produce a “DBs that Suck” report
  • 100. MySQL Self-Healing Some MySQL Replication issues “fixed” by the machines, by error
  • 101. MySQL Self-Healing Some MySQL Replication issues “fixed” by the machines, by error - Skip errors that can safely be skipped and restart slave threads
  • 102. MySQL Self-Healing Some MySQL Replication issues “fixed” by the machines, by error - Skip errors that can safely be skipped and restart slave threads - Force refetch of replication binlogs on: - 1064 (ER_PARSE_ERROR)
  • 103. MySQL Self-Healing Some MySQL Replication issues “fixed” by the machines, by error - Skip errors that can safely be skipped and restart slave threads - Force refetch of replication binlogs on: - 1064 (ER_PARSE_ERROR) - Re-run queries on: - 1205 (ER_LOCK_WAIT_TIMEOUT) - 1213 (ER_LOCK_DEADLOCK)
  • 105. Code and Config Deploy Logs
  • 106. Code and Config Deploy Logs 1. ESSENTIAL
  • 107. Code and Config Deploy Logs 1. ESSENTIAL 2. MANDATORY
  • 109. Communications • Internal IRC - For ongoing discussions - Logged, so “infinite” scrollback
  • 110. Communications • Internal IRC - For ongoing discussions - Logged, so “infinite” scrollback • IM Bot (built on libyahoo2.sf.net) - For production changes - Broadcasts all to all contacts - Logged, and injected into IRC - IM Status = who is in primary/secondary on-call
  • 111. Communications • Internal IRC - For ongoing discussions - Logged, so “infinite” scrollback • IM Bot (built on libyahoo2.sf.net) - For production changes - Broadcasts all to all contacts - Logged, and injected into IRC - IM Status = who is in primary/secondary on-call • All of IRC and IM Bot slurped into a search index
  • 113. when
  • 114. when what
  • 115. when what detailed what*
  • 116. when what detailed what* *also points to what commands should be used to back out the changes
  • 117. when what detailed what* who *also points to what commands should be used to back out the changes
  • 118. when what detailed what* who *also points to what commands should be used to back out the changes
  • 119. when what detailed what* who time of last deploy at top of ganglia *also points to what commands should be used to back out the changes
  • 122. IM Bot (timestamps help correlation)
  • 123. IM Bot (timestamps help correlation)
  • 124. IM Bot (timestamps help correlation) all IRC, IM bot into searchable history
  • 125. Morals of Our Stories
  • 126. Morals of Our Stories - Optimizations can be a Very Good Thing™
  • 127. Morals of Our Stories - Optimizations can be a Very Good Thing™ - Weigh time spent optimizing against expected gains
  • 128. Morals of Our Stories - Optimizations can be a Very Good Thing™ - Weigh time spent optimizing against expected gains - Lean on others for how much “expected gains” mean for different scenarios
  • 129. Morals of Our Stories - Optimizations can be a Very Good Thing™ - Weigh time spent optimizing against expected gains - Lean on others for how much “expected gains” mean for different scenarios - Plain old-fashioned intuition
  • 130. Some Wisdom Nuggets Jon Prall’s 85 WebOps Rules: http://jprall.vox.com/library/post/85- operations-rules-to-live-by.html

Notas do Editor

  1. Finding huge gains from tweaking gets harder as you turn the knobs and pull the levers.
  2. He had some evidence that suggested compilers and versions would make a difference for PHP.
  3. So, we did it. (not just for performance reasons, but still...) Will this happen to you, too? Maybe. Maybe not.
  4. Performance gains like this don’t come very often, for “free”.
  5. We don’t need 80% of what ImageMagick has.
  6. We don’t need 80% of what ImageMagick has.
  7. We don’t need 80% of what ImageMagick has.
  8. Cascading resizing: Original -> Large Large -> Medium Large -> Small Medium -> Thumb Medium -> Square
  9. This was done before OpenMP support was in. Compilers and optimization flags can make a difference!
  10. Enter OpenMP. Example of how
  11. These have been the revisions of our image processing hardware over time. 4x faster
  12. Examples are contact notifications, large photoset deletions, etc.
  13. Examples are contact notifications, large photoset deletions, etc.
  14. Examples are contact notifications, large photoset deletions, etc.
  15. Examples are contact notifications, large photoset deletions, etc.
  16. Examples are contact notifications, large photoset deletions, etc.
  17. Examples are contact notifications, large photoset deletions, etc.
  18. Runbook hacks: tuning the process of operations failure handling and mitigation.
  19. Low water marks as indirect trouble indicators.
  20. Low water marks as indirect trouble indicators.
  21. Can be as simple as cron jobs, or nagios plugins. NRPE or NSCA.
  22. Can be as simple as cron jobs, or nagios plugins. NRPE or NSCA.
  23. Can be as simple as cron jobs, or nagios plugins. NRPE or NSCA.
  24. Can be as simple as cron jobs, or nagios plugins. NRPE or NSCA.
  25. Can be as simple as cron jobs, or nagios plugins. NRPE or NSCA.
  26. Can be as simple as cron jobs, or nagios plugins. NRPE or NSCA.
  27. Skippable errors: “can’t drop”, “already exists”, “duplicate key name”
  28. Skippable errors: “can’t drop”, “already exists”, “duplicate key name”
  29. Skippable errors: “can’t drop”, “already exists”, “duplicate key name”
  30. Skippable errors: “can’t drop”, “already exists”, “duplicate key name”
  31. Simple tips and tricks that can help in fixing things when they break.
  32. This shouldn’t be considered optional.
  33. This shouldn’t be considered optional.
  34. Our IRC and IM logs get injected into a search engine, almost exactly like Lucene.
  35. Our IRC and IM logs get injected into a search engine, almost exactly like Lucene.
  36. Our IRC and IM logs get injected into a search engine, almost exactly like Lucene.
  37. Our IRC and IM logs get injected into a search engine, almost exactly like Lucene.