Metrics-Driven Engineering

Mike Brittain        @ mikebrittain
Director of engineering, Infrastructure

                                          October 13, 2011
Tools and Process at Etsy
How do people use Etsy?
  How many new visits?
  How many listings created?
  How many registrations?
  How many convos sent?
  How many purchases?
  How many new shops?
What is the application doing?
  Search indexing?
  How fast are pages generating?
  Async tasks currently in queue?
  Developer API auth and rate limiting?
  Images resized and stored?
  Error and warning rates?
Are the servers in good shape?
  Replication slave lag?
  Memcache hits/misses?
  Available connections?
  Database queries per second?
  Total outgoing bandwidth?
  CPU, Memory, I/O?
Business Metrics
Application Metrics
System Metrics
Visibility EVERYWHERE
Constant Change
$314 Million GMS 2010
  $180 Million GMS 2009
  $87 Million GMS 2008

  $26 Million GMS 2007




credit: pentarux (flickr)
25 Million Unique Visitors
  1 Billion page views per month




credit: pentarux (flickr)
Engineering team grew 500%
                        over 18 months


credit: martin_heigan (flickr)
Less talk, more do.
Always Be Shipping
                             (even if it’s your first day)




credit: ibailemon (flickr)
90+ Engineers
                     40+ Deploys / day

credit: misswired (flickr)
credit: digidave (flickr)
Code Reviews
Automated Tests
$cfg = array(
   'checkout' => array('enabled' => 'on'),
   'homepage' => array('enabled' => 'on'),
   'profiles' => array('enabled' => 'on'),
   'new_search' => array('enabled' => 'off'),
);


                          Config Flags
Enable and disable features quickly
Plus “admin-only,” percentage ramp-up, A/B testing,
whitelists, blacklists, etc...
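
The flag variants listed above could be evaluated roughly like this. This is a minimal Python sketch, not Etsy's implementation: the `FLAGS` layout, the mode names (`rampup`, `admin`), the `is_enabled` function, and the hashing scheme used for percentage ramp-up are all assumptions made for illustration.

```python
import hashlib

# Hypothetical flag table, loosely modeled on the $cfg array above.
FLAGS = {
    "checkout": {"enabled": "on"},
    "new_search": {"enabled": "off"},
    "new_profiles": {"enabled": "rampup", "percent": 25},
    "beta_tools": {"enabled": "admin"},
    "gift_guide": {"enabled": "rampup", "percent": 0, "whitelist": {"alice"}},
}


def bucket(feature: str, user_id: str) -> int:
    """Map (feature, user) to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_enabled(feature: str, user_id: str, is_admin: bool = False) -> bool:
    """Decide whether `feature` is on for this user."""
    flag = FLAGS.get(feature, {"enabled": "off"})
    # Explicit lists win over every other rule.
    if user_id in flag.get("blacklist", ()):
        return False
    if user_id in flag.get("whitelist", ()):
        return True
    mode = flag["enabled"]
    if mode == "on":
        return True
    if mode == "admin":
        return is_admin
    if mode == "rampup":
        # The same user always lands in the same bucket, so raising
        # "percent" only ever adds users, it never flips them back off.
        return bucket(feature, user_id) < flag["percent"]
    return False
```

Hashing the feature name together with the user ID keeps ramp-ups independent of each other, so the same users are not always the first to see every new feature.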
Failure is not an option
inevitable!
a learning opportunity!
DETECTABLE!
Access
Detect problems quickly
CONFIDENCE
A:    Well, the Ops team manages the network, racks
     the servers, installed the monitoring tools, wears
                the pagers, blah, blah, blah...
Engineers build the application
OPS + ENG
Logging, Graphing, Trending, Alerting
“Engineers are too busy writing
  features to build metrics.”
Metrics are part of every feature
        ...and so are config flags
Dead Simple
Simple, open source tools
Cacti (network, SNMP)
Ganglia (machines)
Graphite (application)
Splunk (log analysis, nightly reports)
Nagios (alerting)
                             Logging
                             Logster
                               StatsD
Ganglia
Cluster-oriented
Huge set of community-contributed recipes
Custom metrics (gmetric)
Graphite
                            Single-instance
              Create new metrics on-the-fly
   Customize via URLs and display functions
Logging
It’s 2:48 PM.
Do you know where your
       logs are?
Logger::log_error("User login failed. Reason: $msg for $username", "login");
web0054 [Fri Mar 04 16:27:48 2011]
[error] [login] [mk04gw1p71] User login
 failed. Reason: wrong password for ...
LogFormat "%h %l %u %t \"%r\" %>s %b" common
LogFormat "%{True-Client-IP}i %l %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %{etsy_shop_id}n %{etsy_uaid}n %V %{etsy_ab_selections}n %{etsy_request_uuid}n %{etsy_api_consumer_key}n %{etsy_api_method_name}n %{php_memory_usage_bytes}n %{php_time_microsec}n %D" combined
apache_note()
grep "/listing/" access.log | 
awk '{sum=sum+$(NF-2)} END {print sum/NR}'
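
The one-liner above averages `$(NF-2)`, the third field from the end of each matching line; if the log follows the custom LogFormat shown earlier, that field is `php_memory_usage_bytes`. The same calculation can be sketched in Python. The function name `average_field` and its parameters are assumptions for illustration, and the handling of non-numeric fields only roughly mirrors awk.

```python
def average_field(lines, needle="/listing/", field_from_end=3):
    """Average one numeric field over the lines that contain `needle`.

    field_from_end=3 corresponds to awk's $(NF-2).
    """
    total = 0.0
    count = 0
    for line in lines:
        if needle not in line:
            continue  # same filtering role as the grep
        fields = line.split()
        count += 1  # awk's NR counts every line it receives
        try:
            total += float(fields[-field_from_end])
        except (IndexError, ValueError):
            pass  # short or non-numeric lines contribute 0 here
    # Return None rather than dividing by zero when nothing matched.
    return total / count if count else None
```

A file object can be passed directly, for example `average_field(open("access.log"))`, because the function only iterates over lines.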
web0001   [04:28:54   2011]   [error] [client 10.101.x.x] Help me, Rhonda.
web0001   [04:28:54   2011]   [error] [client 10.101.x.x] Oh noooooo!
web0001   [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!
web0001   [04:28:54   2011]   [error] [client 10.101.x.x] Heeeeeeellllllllllllllppppp!
web0001   [04:28:54   2011]   [error] [client 10.101.x.x] Oh noooooo!
web0001   [04:28:54   2011]   [fatal] [client 10.101.x.x] Gaaaaahhh!
web0201   [04:28:54   2011]   [warning] [client 10.101.x.x] Gaaaaahhh!
web0034   [04:28:54   2011]   [warning] [client 10.101.x.x] Oh nooooooooooo
web0001   [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
web1101   [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
web0201   [04:28:54   2011]   [error] [client 10.101.x.x] You've been eaten by a grue.
web0055   [04:28:54   2011]   [fatal] [client 10.101.x.x] Gaaaaahhh!!!
web0002   [04:28:54   2011]   [warning] [client 10.101.x.x] Sky is falling.
web0089   [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
web0020   [04:28:54   2011]   [error] [client 10.101.x.x] Sky is falling.
web1101   [04:28:54   2011]   [fatal] [client 10.101.x.x] Gaaaaahhh!
web0055   [04:28:54   2011]   [warning] [client 10.101.x.x] Gaaaaahhh!
web0001   [04:28:54   2011]   [warning] [client 10.101.x.x] Oh nooooooooooo
web0001   [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
web0034   [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
web0087   [04:28:54   2011]   [fatal] [client 10.101.x.x] Sky is falling.
web0002   [04:28:54   2011]   [error] [client 10.101.x.x] Oh noooooo!
web0201   [04:28:54   2011]   [fatal] [client 10.101.x.x] Gaaaaahhh!
web0077   [04:28:54   2011]   [warning] [client 10.101.x.x] Gaaaaahhh!
web0355   [04:28:54   2011]   [warning] [client 10.101.x.x] Oh nooooooooooo
web0052   [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
web0001   [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
web0003   [04:28:54   2011]   [error] [client 10.101.x.x] You've been eaten by a grue.
web0066   [04:28:54   2011]   [fatal] [client 10.101.x.x] Gaaaaahhh!!!
Logster
Fatals       Errors   Warnings
Logster
Run by cron
Keeps a cursor on your log file
Aggregate lines any way you want
Output to Ganglia or Graphite
Simple parsers
                                  github.com/etsy
web0054 [Fri Mar 04 16:27:48 2011]
[error] [login] [mk04gw1p71] User login
 failed. Reason: wrong password for ...
^\S+ \[[^\]]+\] \[(?P<log_level>[^\]]+)\]
if (fields['log_level'] == "fatal"):
   self.fatals += 1

elif (fields['log_level'] == "error"):
   self.errors += 1

elif (fields['log_level'] == "warning"):
   self.warnings += 1

...
MetricObject("fatals",
  (self.fatals / self.duration), "per sec")

MetricObject("errors",
  (self.errors / self.duration), "per sec")

MetricObject("warning",
  (self.warnings / self.duration), "per sec")
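
Put together, the fragments above amount to a small parser class. The following is a self-contained sketch rather than the actual Logster parser: the `parse_line` and `get_state` method names are modeled on Logster's sample parsers, while the `ErrorLogParser` class name and the stand-in `MetricObject` tuple are assumptions so the example runs without Logster installed.

```python
import re
from collections import namedtuple

# Stand-in for Logster's MetricObject so this sketch runs on its own.
MetricObject = namedtuple("MetricObject", ["name", "value", "units"])

# host [timestamp] [log_level] ...
LINE_RE = re.compile(r"^\S+ \[[^\]]+\] \[(?P<log_level>[^\]]+)\]")


class ErrorLogParser:
    """Counts fatal, error, and warning lines between two runs."""

    def __init__(self):
        self.fatals = 0
        self.errors = 0
        self.warnings = 0

    def parse_line(self, line):
        match = LINE_RE.match(line)
        if not match:
            return  # ignore lines in some other format
        level = match.group("log_level")
        if level == "fatal":
            self.fatals += 1
        elif level == "error":
            self.errors += 1
        elif level == "warning":
            self.warnings += 1

    def get_state(self, duration):
        """Return per-second rates over `duration` seconds."""
        return [
            MetricObject("fatals", self.fatals / duration, "per sec"),
            MetricObject("errors", self.errors / duration, "per sec"),
            MetricObject("warning", self.warnings / duration, "per sec"),
        ]
```

Dividing by the duration since the previous run turns raw counts into rates, so the graph stays comparable even if the cron interval changes.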
Fatals   Errors   Warnings
StatsD
                           Network daemon (node.js)
                               Accepts data over UDP
                      Flushes to Graphite every 10 sec
                                     One line of code
github.com/etsy
StatsD::increment("logins.success");
StatsD::timing("gearman.time", $msec);
[Graph: 90th pct, average, lower]
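
Behind those one-liners, a StatsD client only has to format a short string and send it as a UDP datagram. Here is a minimal Python sketch of such a client. The class and function names are assumptions for illustration; the `name:value|c` and `name:value|ms` payloads are the StatsD counter and timer formats, and 8125 is StatsD's usual default port.

```python
import socket


def statsd_packet(name, value, kind):
    """Format one StatsD datagram, e.g. b'logins.success:1|c'."""
    return f"{name}:{value}|{kind}".encode("ascii")


class StatsD:
    """Fire-and-forget UDP client for counters and timers."""

    def __init__(self, host="127.0.0.1", port=8125):
        self.addr = (host, port)
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def _send(self, payload):
        try:
            self.sock.sendto(payload, self.addr)
        except OSError:
            pass  # a metrics failure must never break the application

    def increment(self, name, count=1):
        self._send(statsd_packet(name, count, "c"))

    def timing(self, name, msec):
        self._send(statsd_packet(name, msec, "ms"))
```

Because UDP needs no connection and no reply, the calling code does not wait on the metrics server, which is what makes it cheap to instrument every feature.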
Ad hoc
name value timestamp
echo "events.deploy.site 1 `date +%s`" \
     | nc graphite.etsycorp.com 2003
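
The same ad hoc write can be made from application or deploy tooling without shelling out to `nc`. This is a small Python sketch under the assumption that Graphite's plaintext listener is reachable over TCP on port 2003, as in the command above; the function names are hypothetical.

```python
import socket
import time


def graphite_line(name, value, timestamp=None):
    """Build one 'name value timestamp' line for Graphite."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{name} {value} {int(timestamp)}\n"


def send_to_graphite(line, host, port=2003, timeout=2.0):
    """Open a TCP connection, write the line, and close."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(line.encode("ascii"))
```

A deploy script could then call `send_to_graphite(graphite_line("events.deploy.site", 1), "graphite.etsycorp.com")` once per deploy.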
Vertical Line Technology!
target=drawAsInfinite(events.deploy.site)
We could stare at graphs all day...
http://graphite/render?
   from=-1hours&width=600&height=200
&target=webs.errorLog.warning&rawData=1

webs.errorLog.warning,1318444930,1318448530,60|
5.0,1.0,3.0,1.0,0.0,9.0,0.0,1.0,3.0,2.0,1.0,6.0,2.0,6.0,3.0,6.0,4.0,4.0,2.0,
1.0,1.0,8.0,2.0,3.0,6.0,3.0,5.0,3.0,0.0,4.0,6.0,2.0,0.0,2.0,0.0,4.0,0.0,3.0,
1.0,3.0,4.0,2.0,10.0,3.0,0.0,6.0,0.0,4.0,2.0,5.0,18.0,1.0,1.0,2.0,1.0,8.0,
5.0,1.0,1.0,None
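
The raw response is a header (`target,start,end,step`) and a comma-separated list of values joined by a `|`, which makes it easy for a script to read instead of a person. A minimal Python sketch of a parser follows; the `parse_raw` name and the returned dictionary layout are assumptions for illustration.

```python
def parse_raw(line):
    """Parse 'target,start,end,step|v1,v2,...' into a dict."""
    header, _, body = line.strip().partition("|")
    # Split from the right: the target itself may contain commas,
    # for example when it is a function call with arguments.
    target, start, end, step = header.rsplit(",", 3)
    values = [None if v == "None" else float(v) for v in body.split(",")]
    return {
        "target": target,
        "start": int(start),
        "end": int(end),
        "step": int(step),
        "values": values,
    }
```

In the sample above the window is 3600 seconds with a 60-second step, so the response carries 60 values, and the trailing `None` is a minute with no data yet.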
Holt-Winters Confidence Bands
[Graph: upper and lower bands]
Holt-Winters Aberration
Business metrics
 + Confidence bands
_____________
    Alertable metrics
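
One way to turn confidence bands into an alert is to ask Graphite for the aberration series as raw data and check whether recent points are non-zero. The sketch below assumes Graphite's `holtWintersAberration()` function, which reports how far a point falls outside the bands and 0 when it is inside them. The helper names, the `recent=5` window, and the choice of metric are assumptions for illustration.

```python
from urllib.parse import urlencode


def aberration_url(base, metric, window="-1hours"):
    """Render URL asking Graphite for Holt-Winters aberrations as raw data."""
    query = urlencode({
        "from": window,
        "target": f"holtWintersAberration({metric})",
        "rawData": 1,
    })
    return f"{base}/render?{query}"


def is_aberrant(values, recent=5):
    """True if any of the last `recent` known points left the bands."""
    known = [v for v in values if v is not None]
    return any(v != 0 for v in known[-recent:])
```

A check script for an alerting system could fetch that URL, parse the values, and exit non-zero when `is_aberrant` returns True.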
40,000+ metrics at Etsy
  Systems, Applications, Business
Dashboards
Kind of Hard :-/
<a href="http://graphite.etsycorp.com/render?from=-1hours&width=800&height=600&title=File+or
+Script+Not+Found&yMin=0&target=webs.errorLog.notExist&target=drawAsInfinite
%28deploys.config.production%29&target=drawAsInfinite%28deploys.web.production
%29&target=drawAsInfinite%28deploys.search.production%29&target=drawAsInfinite
%28deploys.imagestorage.other%29&colorList=%2300cc00,%230000ff,
%23ff0000,%23006633,%23cc6600">
     <img src="http://graphite.etsycorp.com/render?
from=-1hours&width=280&height=220&title=File+or+Script+Not
+Found&hideLegend=1&yMin=0&target=webs.errorLog.notExist&target=drawAsInfinite
%28deploys.config.production%29&target=drawAsInfinite%28deploys.web.production
%29&target=drawAsInfinite%28deploys.search.production%29&target=drawAsInfinite
%28deploys.imagestorage.other%29&colorList=%2300cc00,%230000ff,
%23ff0000,%23006633,%23cc6600">
</a>
Super Easy!
$g = new Graphite($time);
$g->setTitle('File Not Found');
$g->addMetric('webs.errorLog.notExist', '#00cc00');
echo $g->getDashboardHTML(280, 220);
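
A helper like this mostly hides URL building and HTML escaping. The following Python sketch shows one way such a wrapper might work. It is not the PHP class from the slide: the method names, defaults, and the 800x600 size of the linked graph are assumptions taken from the hand-written HTML above.

```python
from html import escape
from urllib.parse import urlencode


class Graphite:
    """Builds a thumbnail graph that links to a larger version."""

    def __init__(self, base="http://graphite.etsycorp.com", window="-1hours"):
        self.base = base
        self.window = window
        self.title = ""
        self.targets = []
        self.colors = []

    def set_title(self, title):
        self.title = title

    def add_metric(self, target, color):
        self.targets.append(target)
        self.colors.append(color)

    def url(self, width, height, **extra):
        params = [("from", self.window), ("width", width), ("height", height),
                  ("title", self.title), ("yMin", 0)]
        params += [("target", t) for t in self.targets]
        params.append(("colorList", ",".join(self.colors)))
        params += list(extra.items())
        # urlencode takes care of '#', '(' and spaces in titles.
        return f"{self.base}/render?{urlencode(params)}"

    def dashboard_html(self, width, height):
        """Small thumbnail linking to an 800x600 version of the graph."""
        full = escape(self.url(800, 600))
        thumb = escape(self.url(width, height, hideLegend=1))
        return f'<a href="{full}"><img src="{thumb}"></a>'
```

Deploy markers could be added the same way as any other metric, for example `add_metric("drawAsInfinite(deploys.web.production)", "#0000ff")`.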
Metrics!
Metrics + Events
Metrics + Alerts
Metrics + Metrics
High-level, real-time visibility
Detect problems quickly
CONFIDENCE
Make them required features
Make them dead simple
Make them accessible
Make them!
Homework
codeascraft.etsy.com
github.com/etsy

Get in touch
mike @ etsy . com
@ mikebrittain

We’re always looking for people who are interested in this kind of stuff...

Thank You
etsy.com/careers
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdfChristopherTHyatt
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 

Último (20)

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 

Metrics-Driven Engineering

  • 1. Metrics-Driven Engineering Mike Brittain @ mikebrittain Director of engineering, Infrastructure October 13, 2011
  • 3. How many new visits? How many listings created? How many registrations? How do people use Etsy? How many convos sent? How many purchases? How many new shops?
  • 4. Search indexing? How fast are pages generating? Async tasks currently in queue? What is the application doing? Developer API auth and rate limiting? Images resized and stored? Error and warning rates?
  • 5. Replication slave lag? Memcache hits/misses? Available connections? Are the servers in good shape ? Database queries per second? Total outgoing bandwidth? CPU, Memory, I/O?
  • 12. $314 Million GMS 2010 $180 Million GMS 2009 $87 Million GMS 2008 $26 Million GMS 2007 credit: pentarux (flickr)
  • 13. 25 Million Unique Visitors 1 Billion page views per month credit: pentarux (flickr)
  • 14. Engineering team grew 500% over 18 months credit: martin_heigan (flickr)
  • 16. Always Be Shipping credit: ibailemon (flickr)
  • 17. Always Be Shipping (even if it’s your first day) credit: ibailemon (flickr)
  • 19. 90+ Engineers 40+ Deploys / day credit: misswired (flickr)
  • 23. $cfg = array( 'checkout' => array('enabled' => 'on'), 'homepage' => array('enabled' => 'on'), 'profiles' => array('enabled' => 'on'), 'new_search' => array('enabled' => 'off'), ); Config Flags Enable and disable features quickly
  • 24. $cfg = array( 'checkout' => array('enabled' => 'on'), 'homepage' => array('enabled' => 'on'), 'profiles' => array('enabled' => 'on'), 'new_search' => array('enabled' => 'off'), ); Config Flags Enable and disable features quickly Plus “admin-only,” percentage ramp-up, A/B testing, whitelists, blacklists, etc...
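The percentage ramp-up mentioned on the slide can be sketched roughly as follows. This is a minimal illustration in Python rather than the PHP shown on the slides; the `rampup` state, the `percent` key, and the function names are assumptions made for this sketch, not Etsy's actual config schema.

```python
import zlib

# Hypothetical flag table (assumed layout): 'enabled' is 'on', 'off',
# or 'rampup' with an accompanying rollout percentage.
FLAGS = {
    "checkout": {"enabled": "on"},
    "profiles": {"enabled": "off"},
    "new_search": {"enabled": "rampup", "percent": 10},
}


def bucket(feature, user_id):
    """Map (feature, user) to a stable bucket in the range 0..99."""
    key = f"{feature}:{user_id}".encode("utf-8")
    return zlib.crc32(key) % 100


def is_enabled(feature, user_id, flags=FLAGS):
    """Return True if the feature should be shown to this user."""
    cfg = flags.get(feature, {"enabled": "off"})
    state = cfg["enabled"]
    if state == "on":
        return True
    if state == "rampup":
        # The same user always lands in the same bucket, so raising
        # 'percent' only ever adds users to the enabled group.
        return bucket(feature, user_id) < cfg.get("percent", 0)
    return False
```

Hashing the feature name together with the user id keeps each user's experience stable across requests while giving different features independent rollout groups.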
  • 25. Failure is not an option
  • 27. inevitable! Failure is not an option a learning opportunity!
  • 28. inevitable! Failure is not an option a learning opportunity! DETECTABLE!
  • 36. A: Well, the Ops team manages the network, racks the servers, installed the monitoring tools, wears the pagers, blah, blah, blah...
  • 37. Engineers build the application
  • 38. Logging Graphing OPS ENG Trending Alerting
  • 39. “Engineers are too busy writing features to build metrics.”
  • 40. Metrics are part of every feature ...and so are config flags
  • 43. Cacti (network, SNMP) Ganglia (machines) Graphite (application) Splunk (log analysis, nightly reports) Nagios (alerting) Logging Logster StatsD
  • 45. Ganglia Cluster-oriented Huge community contributed recipes Custom metrics (gmetad)
  • 47. Graphite Single-instance Create new metrics on-the-fly Customize via URLs and display functions
  • 49. It’s 2:48 PM. Do you know where your logs are?
  • 50. Logger::log_error("User login failed. Reason: $msg for $username", "login");
  • 52. web0054 [Fri Mar 04 16:27:48 2011] [error] [login] [mk04gw1p71] User login failed. Reason: wrong password for ...
  • 58. LogFormat "%h %l %u %t \"%r\" %>s %b" common
  • 59. LogFormat "%{True-Client-IP}i %l %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %{etsy_shop_id}n %{etsy_uaid}n %V %{etsy_ab_selections}n %{etsy_request_uuid}n %{etsy_api_consumer_key}n %{etsy_api_method_name}n %{php_memory_usage_bytes}n %{php_time_microsec}n %D" combined
  • 64. grep "/listing/" access.log | awk '{sum=sum+$(NF-2)} END {print sum/NR}'
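The grep/awk one-liner above averages the third field from the end of each matching access-log line. A rough Python equivalent is sketched below; the function name and its parameters are illustrative assumptions. Unlike awk, this version skips lines whose target field is not numeric instead of counting them as zero.

```python
def average_field(lines, needle="/listing/", offset=2):
    """Average the field `offset` places before the last field,
    over lines that contain `needle` (mirrors awk's $(NF-2))."""
    total = 0.0
    count = 0
    for line in lines:
        if needle not in line:
            continue
        fields = line.split()
        if len(fields) <= offset:
            continue  # too short to have the requested field
        try:
            total += float(fields[-1 - offset])
        except ValueError:
            continue  # non-numeric field, e.g. "-" for a missing note
        count += 1
    return total / count if count else None
```

To run it over a real file, pass an open file handle, for example `average_field(open("access.log"))`.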
  • 65. web0001 [04:28:54 2011] [error] [client 10.101.x.x] Help me, Rhonda. web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo! web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh! web0001 [04:28:54 2011] [error] [client 10.101.x.x] Heeeeeeellllllllllllllppppp! web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo! web0001 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh! web0201 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh! web0034 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!! web1101 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!! web0201 [04:28:54 2011] [error] [client 10.101.x.x] You've been eaten by a grue. web0055 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!!! web0002 [04:28:54 2011] [warning] [client 10.101.x.x] Sky is falling. web0089 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!! web0020 [04:28:54 2011] [error] [client 10.101.x.x] Sky is falling. web1101 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh! web0055 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh! web0001 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!! web0034 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!! web0087 [04:28:54 2011] [fatal] [client 10.101.x.x] Sky is falling. web0002 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo! web0201 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh! web0077 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh! web0355 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo web0052 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!! web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!! web0003 [04:28:54 2011] [error] [client 10.101.x.x] You've been eaten by a grue. web0066 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!!!
  • 66. Logster Fatals Errors Warnings
  • 67. Logster Run by cron Keeps a cursor on your log file Aggregate lines any way you want Output to Ganglia or Graphite Simple parsers github.com/etsy
  • 68. web0054 [Fri Mar 04 16:27:48 2011] [error] [login] [mk04gw1p71] User login failed. Reason: wrong password for ...
  • 70. if (fields['log_level'] == "fatal"): self.fatals += 1 elif (fields['log_level'] == "error"): self.errors += 1 elif (fields['log_level'] == "warning"): self.warnings += 1 ...
  • 71. MetricObject("fatals", (self.fatals / self.duration), "per sec") MetricObject("errors", (self.errors / self.duration), "per sec") MetricObject("warning", (self.warnings / self.duration), "per sec")
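Slides 70 and 71 show only fragments of a parser. The sketch below fills in the surrounding shape in plain Python so it runs on its own, without importing Logster. The regular expression, the group names `namespace` and `request_id`, and the class name are assumptions based on the sample log line from the earlier slides, and the returned dictionary stands in for the `MetricObject` list.

```python
import re

# Assumed layout, taken from the sample line in the deck:
# host [timestamp] [level] [namespace] [request id] message
LINE_RE = re.compile(
    r"^(?P<host>\S+) \[(?P<timestamp>[^\]]+)\] \[(?P<log_level>\w+)\] "
    r"\[(?P<namespace>[^\]]+)\] \[(?P<request_id>[^\]]+)\] (?P<message>.*)$"
)


class ErrorLogCounter:
    """Counts log lines per severity and reports them as per-second rates."""

    def __init__(self):
        self.counts = {"fatal": 0, "error": 0, "warning": 0}

    def parse_line(self, line):
        """Called once per new log line; unparseable lines are ignored."""
        match = LINE_RE.match(line)
        if match is None:
            return
        level = match.group("log_level")
        if level in self.counts:
            self.counts[level] += 1

    def get_state(self, duration):
        """Return each count divided by the seconds since the last run."""
        return {level: n / duration for level, n in self.counts.items()}
```

A cron-driven runner would feed only the lines added since its last cursor position to `parse_line`, then call `get_state` with the elapsed seconds and forward the rates to Ganglia or Graphite.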
  • 72. Fatals Errors Warnings
  • 74. StatsD Network daemon (node.js) Accepts data over UDP Flushes to Graphite every 10 sec One line of code github.com/etsy
  • 78. StatsD::timing("gearman.time", $msec); 90th pct average lower
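What a client call such as `StatsD::timing` does underneath can be sketched in a few lines of Python. StatsD's wire format is `name:value|type`, with `ms` for timers and `c` for counters, sent as a single UDP datagram. The helper names, the default host, and the default port 8125 are assumptions for this sketch.

```python
import socket


def statsd_payload(name, value, metric_type):
    """Build one StatsD datagram, e.g. b'gearman.time:320|ms'."""
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_metric(name, value, metric_type, host="127.0.0.1", port=8125):
    """Fire-and-forget: UDP means the app never waits on the metrics daemon."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(statsd_payload(name, value, metric_type), (host, port))
    finally:
        sock.close()
```

For example, `send_metric("gearman.time", 320, "ms")` records one timing sample, and `send_metric("logins.failed", 1, "c")` increments a counter (the second metric name is made up for illustration).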
  • 79. Ad hoc name value timestamp
  • 80. echo "events.deploy.site 1 `date +%s`" | nc graphite.etsycorp.com 2003
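The `echo | nc` command above writes one `name value timestamp` line to Graphite's plaintext listener on port 2003. A Python sketch of the same thing follows; the function names are assumptions, and the host has to be supplied by the caller.

```python
import socket
import time


def graphite_line(name, value, timestamp=None):
    """Format one plaintext-protocol line: 'name value timestamp\\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{name} {value} {int(timestamp)}\n"


def send_to_graphite(line, host, port=2003):
    """Open a TCP connection, write the line, and close the connection."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))
```

A deploy script could call `send_to_graphite(graphite_line("events.deploy.site", 1), host)` to mark the moment a deploy went out.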
  • 83. We could stare at graphs all day...
  • 84. http://graphite/render?from=-1hours&width=600&height=200&target=webs.errorLog.warning&rawData=1
  • 85. http://graphite/render?from=-1hours&width=600&height=200&target=webs.errorLog.warning&rawData=1 webs.errorLog.warning,1318444930,1318448530,60|5.0,1.0,3.0,1.0,0.0,9.0,0.0,1.0,3.0,2.0,1.0,6.0,2.0,6.0,3.0,6.0,4.0,4.0,2.0,1.0,1.0,8.0,2.0,3.0,6.0,3.0,5.0,3.0,0.0,4.0,6.0,2.0,0.0,2.0,0.0,4.0,0.0,3.0,1.0,3.0,4.0,2.0,10.0,3.0,0.0,6.0,0.0,4.0,2.0,5.0,18.0,1.0,1.0,2.0,1.0,8.0,5.0,1.0,1.0,None
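The `rawData=1` response shown above has the shape `target,start,end,step|v1,v2,...`, one series per line, with `None` marking a missing point. A small parser for that text might look like this sketch; the function name and the returned dictionary layout are assumptions.

```python
def parse_raw_data(text):
    """Parse Graphite rawData output into {target: series_info}."""
    series = {}
    for line in text.strip().splitlines():
        header, _, values = line.partition("|")
        # Split from the right: a target such as sumSeries(a,b) may
        # itself contain commas, but the last three fields never do.
        target, start, end, step = header.rsplit(",", 3)
        points = [
            None if v.strip() == "None" else float(v)
            for v in values.split(",")
        ]
        series[target] = {
            "start": int(start),
            "end": int(end),
            "step": int(step),
            "values": points,
        }
    return series
```

With the numbers in hand, a script can alert on a threshold or feed another tool instead of someone watching the graph.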
  • 88. Business metrics + Confidence bands _____________ Alertable metrics
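The slide does not say how the confidence bands are computed. As a deliberately simplified stand-in, the sketch below builds a band from the mean and standard deviation of recent values and flags a new value that falls outside it. The function names and the default width of three standard deviations are assumptions, and a production system would likely use a seasonality-aware forecast instead.

```python
import statistics


def confidence_band(history, width=3.0):
    """Return (lower, upper) as mean -/+ width * standard deviation."""
    mean = statistics.fmean(history)
    spread = statistics.pstdev(history)
    return mean - width * spread, mean + width * spread


def is_alertable(history, latest, width=3.0):
    """True when the latest value falls outside the band."""
    lower, upper = confidence_band(history, width)
    return latest < lower or latest > upper
```

Applied to a business metric such as purchases per minute, a value that drops below the lower bound shortly after a deploy is a signal worth paging on.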
  • 89. 40,000+ metrics at Etsy Systems, Applications, Business
  • 92. Kind of Hard :-/ <a href="http://graphite.etsycorp.com/render?from=-1hours&width=800&height=600&title=File+or+Script+Not+Found&yMin=0&target=webs.errorLog.notExist&target=drawAsInfinite%28deploys.config.production%29&target=drawAsInfinite%28deploys.web.production%29&target=drawAsInfinite%28deploys.search.production%29&target=drawAsInfinite%28deploys.imagestorage.other%29&colorList=%2300cc00,%230000ff,%23ff0000,%23006633,%23cc6600"> <img src="http://graphite.etsycorp.com/render?from=-1hours&width=280&height=220&title=File+or+Script+Not+Found&hideLegend=1&yMin=0&target=webs.errorLog.notExist&target=drawAsInfinite%28deploys.config.production%29&target=drawAsInfinite%28deploys.web.production%29&target=drawAsInfinite%28deploys.search.production%29&target=drawAsInfinite%28deploys.imagestorage.other%29&colorList=%2300cc00,%230000ff,%23ff0000,%23006633,%23cc6600"> </a>
  • 93. Super Easy! $g = new Graphite($time); $g->setTitle('File Not Found'); $g->addMetric('webs.errorLog.notExist', '#00cc00'); echo $g->getDashboardHTML(280, 220);
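The idea behind the helper class on slide 93 is to generate the long render URLs instead of writing them by hand. A rough Python sketch of such a helper follows; it is not the PHP class from the slide, and the function names, parameters, and default sizes are assumptions chosen to match the URLs on slide 92.

```python
import html
from urllib.parse import urlencode


def render_url(base, targets, title, width, height,
               since="-1hours", hide_legend=False):
    """Build a Graphite render URL with correctly encoded parameters."""
    params = [("from", since), ("width", width),
              ("height", height), ("title", title)]
    if hide_legend:
        params.append(("hideLegend", 1))
    # Repeating the 'target' key draws several series on one graph.
    params.extend(("target", t) for t in targets)
    return f"{base}/render?{urlencode(params)}"


def dashboard_html(base, targets, title, width=280, height=220):
    """Return a thumbnail graph that links to a larger version."""
    thumb = render_url(base, targets, title, width, height, hide_legend=True)
    full = render_url(base, targets, title, 800, 600)
    return (f'<a href="{html.escape(full)}">'
            f'<img src="{html.escape(thumb)}"></a>')
```

Passing `"drawAsInfinite(deploys.web.production)"` as an extra target overlays deploy events on the metric, and `urlencode` takes care of escaping the parentheses.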
  • 97. Metrics! Metrics + Events Metrics + Alerts Metrics + Metrics
  • 101. Make them required features
  • 102. Make them dead simple
  • 105. Homework: codeascraft.etsy.com, github.com/etsy. Get in touch: mike @ etsy . com, @ mikebrittain. We’re always looking for people who are interested in this kind of stuff... etsy.com/careers. Thank You