SlideShare uma empresa Scribd logo
1 de 91
Baixar para ler offline
METRICS-DRIVEN
                 ENGINEERING at
                      Kellan Elliott-McCrea, VP of Eng.
                           kellan@etsy.com @kellan




Tuesday, June 5, 12
Tuesday, June 5, 12
Tuesday, June 5, 12
What is Etsy?



Tuesday, June 5, 12
8.5+ million items
                      in the marketplace




Tuesday, June 5, 12
400,000+ active




Tuesday, June 5, 12
$300+ million in
                        sales in 2010

                      ~$41 million/month


Tuesday, June 5, 12
> $1000 / minute



Tuesday, June 5, 12
> 1 billion page
                      views / month


Tuesday, June 5, 12
business in over
                       150 countries


Tuesday, June 5, 12
deploy the site,
                      every ~20 minutes


Tuesday, June 5, 12
engineering team
                            grew
                        ~4x in 2010


Tuesday, June 5, 12
Metrics?



Tuesday, June 5, 12
Logs, Graphs,
                          Trends,
                      and Correlations


Tuesday, June 5, 12
Metrics Driven?



Tuesday, June 5, 12
Making Decisions



Tuesday, June 5, 12
How many visitors
                              are
                       using this thing?


Tuesday, June 5, 12
Can we deploy that
                       to
              100% of our visitors?


Tuesday, June 5, 12
Did we make it
                          faster?


Tuesday, June 5, 12
Did I just break
                        something?


Tuesday, June 5, 12
Q.  WHO MAKES THESE
                             GRAPHS?
           A. Well,racksOps team manages thethe
            network,
                     the
                         the servers, installed
                      monitoring tools, wears the pagers,
                              blah, blah, blah...




Tuesday, June 5, 12
but... Engineers
                            build
                      the application.


Tuesday, June 5, 12
Dev + Ops


Tuesday, June 5, 12
ACCESS


Tuesday, June 5, 12
Yes!   No.




Tuesday, June 5, 12
“Engineers are
                        too busy!”


Tuesday, June 5, 12
Here’s the BIG
                        SECRET...


Tuesday, June 5, 12
... MAKE IT EASY!



Tuesday, June 5, 12
Simple, open
                      source tools


Tuesday, June 5, 12
Cacti (network, SNMP)
                      Ganglia (machines)
                      Graphite (application)
                      Splunk (log analysis, nightly
                      reports)
                      Nagios (alerting)



Tuesday, June 5, 12
Gan
                ★cluster oriented
                ★huge community contributed
                recipes
                ★2.0 released today (including
                several Flickr and Etsy patches!)
                ★gmetad makes it easy to track
                custom metrics


Tuesday, June 5, 12
Tuesday, June 5, 12
Graphite
                ★super flexible collection and
                display
                ★per metrics buckets
                ★single instance
                ★super easy to write and use
                custom display functions



Tuesday, June 5, 12
Logging


Tuesday, June 5, 12
Logger::log_error("User login
                        failed. Reason: $msg for
                          $username", “login”);




Tuesday, June 5, 12
web0054 [Fri Mar 04 16:27:48
                      2011] [error] [login] [14531658]
                      User login failed. Reason: wrong
                              password for ...




Tuesday, June 5, 12
web0054 [Fri Mar 04 16:27:48
                      2011] [error] [login] [14531658]
                      User login failed. Reason: wrong
                              password for ...




Tuesday, June 5, 12
web0054 [Fri Mar 04 16:27:48
                      2011] [error] [login] [14531658]
                      User login failed. Reason: wrong
                              password for ...




Tuesday, June 5, 12
web0054 [Fri Mar 04 16:27:48
                      2011] [info] [login] [14531658]
                      User login failed. Reason: wrong
                              password for ...




Tuesday, June 5, 12
web0054 [Fri Mar 04 16:27:48
                      2011] [info] [login] [14531658]
                      User login failed. Reason: wrong
                              password for ...




Tuesday, June 5, 12
web0054 [Fri Mar 04 16:27:48
                      2011] [info] [login] [14531658]
                      User login failed. Reason: wrong
                              password for ...




Tuesday, June 5, 12
Counting
                      and Timing
                      http://code.flickr.com/blog/
                      2008/10/27/counting-timing/




Tuesday, June 5, 12
Logster


Tuesday, June 5, 12
Logster
                      https://github.com/etsy/logster




Tuesday, June 5, 12
Forked from ganglia-logtailer :

                            - Daemon mode
                (only cron mode)
                            + Support for
                Graphite
                            + Simplified parsing
                scripts




Tuesday, June 5, 12
web0001        [04:28:54   2011]   [warning] [client 10.101.x.x] Gaaaaahhh!
       web0001        [04:28:54   2011]   [error] [client 10.101.x.x] Help me, Rhonda.
       web0001        [04:28:54   2011]   [error] [client 10.101.x.x] Oh noooooo!
       web0001        [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!
       web0001        [04:28:54   2011]   [error] [client 10.101.x.x] Heeeeeeellllllllllllllppppp!
       web0001        [04:28:54   2011]   [error] [client 10.101.x.x] Oh noooooo!
       web0001        [04:28:54   2011]   [fatal] [client 10.101.x.x] Gaaaaahhh!
       web0201        [04:28:54   2011]   [warning] [client 10.101.x.x] Gaaaaahhh!
       web0034        [04:28:54   2011]   [warning] [client 10.101.x.x] Oh nooooooooooo
       web0001        [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
       web1101        [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
       web0201        [04:28:54   2011]   [error] [client 10.101.x.x] You've been eaten by a grue.
       web0055        [04:28:54   2011]   [fatal] [client 10.101.x.x] Gaaaaahhh!!!
       web0002        [04:28:54   2011]   [warning] [client 10.101.x.x] Sky is falling.
       web0089        [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
       web0020        [04:28:54   2011]   [error] [client 10.101.x.x] Sky is falling.
       web1101        [04:28:54   2011]   [fatal] [client 10.101.x.x] Gaaaaahhh!
       web0055        [04:28:54   2011]   [warning] [client 10.101.x.x] Gaaaaahhh!
       web0001        [04:28:54   2011]   [warning] [client 10.101.x.x] Oh nooooooooooo
       web0001        [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
       web0034        [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
       web0087        [04:28:54   2011]   [fatal] [client 10.101.x.x] Sky is falling.
       web0002        [04:28:54   2011]   [error] [client 10.101.x.x] Oh noooooo!
       web0201        [04:28:54   2011]   [fatal] [client 10.101.x.x] Gaaaaahhh!
       web0077        [04:28:54   2011]   [warning] [client 10.101.x.x] Gaaaaahhh!
       web0355        [04:28:54   2011]   [warning] [client 10.101.x.x] Oh nooooooooooo
       web0052        [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
       web0001        [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
       web0003        [04:28:54   2011]   [error] [client 10.101.x.x] You've been eaten by a grue.
       web0066        [04:28:54   2011]   [fatal] [client 10.101.x.x] Gaaaaahhh!!!
       web0001        [04:28:54   2011]   [warning] [client 10.101.x.x] Sky is falling
Tuesday, June 5, 12
Fatals   Errors   Warnings




Tuesday, June 5, 12
★runs out of cron
                ★maintains a cursor into log files
                ★supports ganglia and graphite
                ★custom parsers much easier to
                write then gmetad




Tuesday, June 5, 12
Apache access logs


Tuesday, June 5, 12
LogFormat "%h %l %u %t "%r"
                  %>s %b" common




Tuesday, June 5, 12
LogFormat "%{X-Forwarded-For}i %
             {True-Client-IP}i %l %u %t "%r" %>s %b
                "%{Referer}i" "%{User-Agent}i" %
                {etsy_shop_id}n %{etsy_uaid}n %V %
                       {etsy_ab_selections}n %
                       {etsy_request_uuid}n %
                    {etsy_api_consumer_key}n %
                    {etsy_api_method_name}n %
                  {php_memory_usage_bytes}n %
               {php_time_microsec}n %D" combined

Tuesday, June 5, 12
%{etsy_ab_selections}n




Tuesday, June 5, 12
%{etsy_uaid}n




Tuesday, June 5, 12
Graphs


Tuesday, June 5, 12
“If Engineering at Etsy has
        a religion, it’s the Church
        of Graphs. If it moves, we
          track it.” - Erik Kastner

   http://codeascraft.etsy.com/2011/02/15/measure-
   anything-measure-everything/




Tuesday, June 5, 12
Tuesday, June 5, 12
StatsD


Tuesday, June 5, 12
StatsD
                        https://github.com/
                        etsy/statsd/




Tuesday, June 5, 12
StatsD::increment("logins.success");
       StatsD::timing("gearman.time", $msec);




Tuesday, June 5, 12
90th pct

                                    average
                                    lower


       StatsD::timing("gearman.time", $msec);




Tuesday, June 5, 12
Ad hoc
                      name value timestamp




Tuesday, June 5, 12
echo "events.deploy.site 1 `date +%s`" 
              | nc graphite.etsycorp.com 2003




Tuesday, June 5, 12
Correlations



Tuesday, June 5, 12
echo "events.deploy.site 1 `date +%s`" 
              | nc graphite.etsycorp.com 2003




Tuesday, June 5, 12
Trends + Events
         target=drawAsInfinite(events.deploy.site)




Tuesday, June 5, 12
What Happened?


Tuesday, June 5, 12
Holt-Winters


Tuesday, June 5, 12
"Forecasting Sales by
                      Exponentially Weighted
                      Moving Averages". Peter



Tuesday, June 5, 12
"Aberrant Behavior
                      Detection in Time Series
                      for Network Monitoring".



Tuesday, June 5, 12
"Holt-Winters Forecasting
                      Applied to Poisson
                   Processes in Real-Time".



Tuesday, June 5, 12
holtWintersConfidence(Upper|Lower)




Tuesday, June 5, 12
holtWintersAberration




Tuesday, June 5, 12
business metrics with
             confidence bands
                    ==
        alertable business metrics


Tuesday, June 5, 12
16,000 metrics in
                           GRAPHITE
                      (plus 32,000 metrics in GANGLIA)




Tuesday, June 5, 12
16,000 metrics in
                           GRAPHITE
                      (plus 32,000 metrics in GANGLIA)




Tuesday, June 5, 12
Dashboards


Tuesday, June 5, 12
Dashboards



Tuesday, June 5, 12
Dashboards



Tuesday, June 5, 12
Hard
       <a href="http://graphite.etsycorp.com/render?
       from=-1hours&width=800&height=600&title=File+or+Script+Not
       +Found&yMin=0&target=webs.errorLog.notExist&target=drawAsInfinite
       %28deploys.config.production%29&target=drawAsInfinite%28deploys.web.production
       %29&target=drawAsInfinite%28deploys.search.production%29&target=drawAsInfinite
       %28deploys.imagestorage.other%29&colorList=%2300cc00,%230000ff,
       %23ff0000,%23006633,%23cc6600">
       
   <img src="http://graphite.etsycorp.com/render?
       from=-1hours&width=280&height=220&title=File+or+Script+Not
       +Found&hideLegend=1&yMin=0&target=webs.errorLog.notExist&target=drawAsInfinite
       %28deploys.config.production%29&target=drawAsInfinite%28deploys.web.production
       %29&target=drawAsInfinite%28deploys.search.production%29&target=drawAsInfinite
       %28deploys.imagestorage.other%29&colorList=%2300cc00,%230000ff,
       %23ff0000,%23006633,%23cc6600">
       </a>




Tuesday, June 5, 12
Easy!
     $g = new Graphite($time);
     $g->setTitle('File Not Found');
     $g->addMetric('webs.errorLog.notExist', '#00cc00');
     $g->showDeploys(true);
     echo $g->getDashboardHTML(280, 220);




Tuesday, June 5, 12
48 dashboards by
                        32 engineers


Tuesday, June 5, 12
Application
                        health


Tuesday, June 5, 12
High-level
                       visibility


Tuesday, June 5, 12
Low MTTD


Tuesday, June 5, 12
Confidence


Tuesday, June 5, 12
Make metrics


Tuesday, June 5, 12
Make metrics


Tuesday, June 5, 12
Make metrics


Tuesday, June 5, 12
Not that much


Tuesday, June 5, 12
codeascraft.etsy.com
                      github.com/etsy/statsd
                      github.com/etsy/logster

                      bitbucket.org/maplebed/ganglia-
                      logtailer




Tuesday, June 5, 12
Questions?




Tuesday, June 5, 12

Mais conteúdo relacionado

Mais de Kellan

Future of handmade
Future of handmadeFuture of handmade
Future of handmadeKellan
 
Architecting for Change: QCONNYC 2012
Architecting for Change: QCONNYC 2012Architecting for Change: QCONNYC 2012
Architecting for Change: QCONNYC 2012Kellan
 
Engineering Change
Engineering ChangeEngineering Change
Engineering ChangeKellan
 
Solving the "Brooklyn Problem"
Solving the "Brooklyn Problem" Solving the "Brooklyn Problem"
Solving the "Brooklyn Problem" Kellan
 
Social Software For Robots
Social Software For RobotsSocial Software For Robots
Social Software For RobotsKellan
 
Beyond REST? Building data services with XMPP
Beyond REST? Building data services with XMPPBeyond REST? Building data services with XMPP
Beyond REST? Building data services with XMPPKellan
 
Advanced OAuth Wrangling
Advanced OAuth WranglingAdvanced OAuth Wrangling
Advanced OAuth WranglingKellan
 
Casual Privacy (Ignite Web2.0 Expo)
Casual Privacy (Ignite Web2.0 Expo)Casual Privacy (Ignite Web2.0 Expo)
Casual Privacy (Ignite Web2.0 Expo)Kellan
 

Mais de Kellan (8)

Future of handmade
Future of handmadeFuture of handmade
Future of handmade
 
Architecting for Change: QCONNYC 2012
Architecting for Change: QCONNYC 2012Architecting for Change: QCONNYC 2012
Architecting for Change: QCONNYC 2012
 
Engineering Change
Engineering ChangeEngineering Change
Engineering Change
 
Solving the "Brooklyn Problem"
Solving the "Brooklyn Problem" Solving the "Brooklyn Problem"
Solving the "Brooklyn Problem"
 
Social Software For Robots
Social Software For RobotsSocial Software For Robots
Social Software For Robots
 
Beyond REST? Building data services with XMPP
Beyond REST? Building data services with XMPPBeyond REST? Building data services with XMPP
Beyond REST? Building data services with XMPP
 
Advanced OAuth Wrangling
Advanced OAuth WranglingAdvanced OAuth Wrangling
Advanced OAuth Wrangling
 
Casual Privacy (Ignite Web2.0 Expo)
Casual Privacy (Ignite Web2.0 Expo)Casual Privacy (Ignite Web2.0 Expo)
Casual Privacy (Ignite Web2.0 Expo)
 

Último

[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 

Último (20)

[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 

Metrics driven engineering (velocity 2011)

  • 1. METRICS-DRIVEN ENGINEERING at Kellan Elliott-McCrea, VP of Eng. kellan@etsy.com @kellan Tuesday, June 5, 12
  • 5. 8.5+ million items in the marketplace Tuesday, June 5, 12
  • 7. $300+ million in sales in 2010 ~$41 million/month Tuesday, June 5, 12
  • 8. > $1000 / minute Tuesday, June 5, 12
  • 9. > 1 billion page views / month Tuesday, June 5, 12
  • 10. business in over 150 countries Tuesday, June 5, 12
  • 11. deploy the site, every ~20 minutes Tuesday, June 5, 12
  • 12. engineering team grew ~4x in 2010 Tuesday, June 5, 12
  • 14. Logs, Graphs, Trends, and Correlations Tuesday, June 5, 12
  • 17. How many visitors are using this thing? Tuesday, June 5, 12
  • 18. Can we deploy that to 100% of our visitors? Tuesday, June 5, 12
  • 19. Did we make it faster? Tuesday, June 5, 12
  • 20. Did I just break something? Tuesday, June 5, 12
  • 21. Q. WHO MAKES THESE GRAPHS? A. Well,racksOps team manages thethe network, the the servers, installed monitoring tools, wears the pagers, blah, blah, blah... Tuesday, June 5, 12
  • 22. but... Engineers build the application. Tuesday, June 5, 12
  • 23. Dev + Ops Tuesday, June 5, 12
  • 25. Yes! No. Tuesday, June 5, 12
  • 26. “Engineers are too busy!” Tuesday, June 5, 12
  • 27. Here’s the BIG SECRET... Tuesday, June 5, 12
  • 28. ... MAKE IT EASY! Tuesday, June 5, 12
  • 29. Simple, open source tools Tuesday, June 5, 12
  • 30. Cacti (network, SNMP) Ganglia (machines) Graphite (application) Splunk (log analysis, nightly reports) Nagios (alerting) Tuesday, June 5, 12
  • 31. Gan ★cluster oriented ★huge community contributed recipes ★2.0 released today (including several Flickr and Etsy patches!) ★gmetad makes it easy to track custom metrics Tuesday, June 5, 12
  • 33. Graphite ★super flexible collection and display ★per metrics buckets ★single instance ★super easy to write and use custom display functions Tuesday, June 5, 12
  • 35. Logger::log_error("User login failed. Reason: $msg for $username", “login”); Tuesday, June 5, 12
  • 36. web0054 [Fri Mar 04 16:27:48 2011] [error] [login] [14531658] User login failed. Reason: wrong password for ... Tuesday, June 5, 12
  • 37. web0054 [Fri Mar 04 16:27:48 2011] [error] [login] [14531658] User login failed. Reason: wrong password for ... Tuesday, June 5, 12
  • 38. web0054 [Fri Mar 04 16:27:48 2011] [error] [login] [14531658] User login failed. Reason: wrong password for ... Tuesday, June 5, 12
  • 39. web0054 [Fri Mar 04 16:27:48 2011] [info] [login] [14531658] User login failed. Reason: wrong password for ... Tuesday, June 5, 12
  • 40. web0054 [Fri Mar 04 16:27:48 2011] [info] [login] [14531658] User login failed. Reason: wrong password for ... Tuesday, June 5, 12
  • 41. web0054 [Fri Mar 04 16:27:48 2011] [info] [login] [14531658] User login failed. Reason: wrong password for ... Tuesday, June 5, 12
  • 42. Counting and Timing http://code.flickr.com/blog/ 2008/10/27/counting-timing/ Tuesday, June 5, 12
  • 44. Logster https://github.com/etsy/logster Tuesday, June 5, 12
  • 45. Forked from ganglia-logtailer : - Daemon mode (only cron mode) + Support for Graphite + Simplified parsing scripts Tuesday, June 5, 12
  • 46. web0001 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh! web0001 [04:28:54 2011] [error] [client 10.101.x.x] Help me, Rhonda. web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo! web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh! web0001 [04:28:54 2011] [error] [client 10.101.x.x] Heeeeeeellllllllllllllppppp! web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo! web0001 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh! web0201 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh! web0034 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!! web1101 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!! web0201 [04:28:54 2011] [error] [client 10.101.x.x] You've been eaten by a grue. web0055 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!!! web0002 [04:28:54 2011] [warning] [client 10.101.x.x] Sky is falling. web0089 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!! web0020 [04:28:54 2011] [error] [client 10.101.x.x] Sky is falling. web1101 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh! web0055 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh! web0001 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!! web0034 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!! web0087 [04:28:54 2011] [fatal] [client 10.101.x.x] Sky is falling. web0002 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo! web0201 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh! web0077 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh! web0355 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo web0052 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!! web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!! web0003 [04:28:54 2011] [error] [client 10.101.x.x] You've been eaten by a grue. web0066 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!!! web0001 [04:28:54 2011] [warning] [client 10.101.x.x] Sky is falling Tuesday, June 5, 12
  • 47. Fatals Errors Warnings Tuesday, June 5, 12
  • 48. ★runs out of cron ★maintains a cursor into log files ★supports ganglia and graphite ★custom parsers much easier to write then gmetad Tuesday, June 5, 12
  • 50. LogFormat "%h %l %u %t "%r" %>s %b" common Tuesday, June 5, 12
  • 51. LogFormat "%{X-Forwarded-For}i % {True-Client-IP}i %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" % {etsy_shop_id}n %{etsy_uaid}n %V % {etsy_ab_selections}n % {etsy_request_uuid}n % {etsy_api_consumer_key}n % {etsy_api_method_name}n % {php_memory_usage_bytes}n % {php_time_microsec}n %D" combined Tuesday, June 5, 12
  • 55. “If Engineering at Etsy has a religion, it’s the Church of Graphs. If it moves, we track it.” - Erik Kastner http://codeascraft.etsy.com/2011/02/15/measure- anything-measure-everything/ Tuesday, June 5, 12
  • 58. StatsD https://github.com/ etsy/statsd/ Tuesday, June 5, 12
  • 59. StatsD::increment("logins.success"); StatsD::timing("gearman.time", $msec); Tuesday, June 5, 12
  • 60. 90th pct average lower StatsD::timing("gearman.time", $msec); Tuesday, June 5, 12
  • 61. Ad hoc name value timestamp Tuesday, June 5, 12
  • 62. echo "events.deploy.site 1 `date +%s`" | nc graphite.etsycorp.com 2003 Tuesday, June 5, 12
  • 64. echo "events.deploy.site 1 `date +%s`" | nc graphite.etsycorp.com 2003 Tuesday, June 5, 12
  • 65. Trends + Events target=drawAsInfinite(events.deploy.site) Tuesday, June 5, 12
  • 68. "Forecasting Sales by Exponentially Weighted Moving Averages". Peter Tuesday, June 5, 12
  • 69. "Aberrant Behavior Detection in Time Series for Network Monitoring". Tuesday, June 5, 12
  • 70. "Holt-Winters Forecasting Applied to Poisson Processes in Real-Time". Tuesday, June 5, 12
  • 73. business metrics with confidence bands == alertable business metrics Tuesday, June 5, 12
  • 74. 16,000 metrics in GRAPHITE (plus 32,000 metrics in GANGLIA) Tuesday, June 5, 12
  • 75. 16,000 metrics in GRAPHITE (plus 32,000 metrics in GANGLIA) Tuesday, June 5, 12
  • 79. Hard <a href="http://graphite.etsycorp.com/render? from=-1hours&width=800&height=600&title=File+or+Script+Not +Found&yMin=0&target=webs.errorLog.notExist&target=drawAsInfinite %28deploys.config.production%29&target=drawAsInfinite%28deploys.web.production %29&target=drawAsInfinite%28deploys.search.production%29&target=drawAsInfinite %28deploys.imagestorage.other%29&colorList=%2300cc00,%230000ff, %23ff0000,%23006633,%23cc6600"> <img src="http://graphite.etsycorp.com/render? from=-1hours&width=280&height=220&title=File+or+Script+Not +Found&hideLegend=1&yMin=0&target=webs.errorLog.notExist&target=drawAsInfinite %28deploys.config.production%29&target=drawAsInfinite%28deploys.web.production %29&target=drawAsInfinite%28deploys.search.production%29&target=drawAsInfinite %28deploys.imagestorage.other%29&colorList=%2300cc00,%230000ff, %23ff0000,%23006633,%23cc6600"> </a> Tuesday, June 5, 12
  • 80. Easy! $g = new Graphite($time); $g->setTitle('File Not Found'); $g->addMetric('webs.errorLog.notExist', '#00cc00'); $g->showDeploys(true); echo $g->getDashboardHTML(280, 220); Tuesday, June 5, 12
  • 81. 48 dashboards by 32 engineers Tuesday, June 5, 12
  • 82. Application health Tuesday, June 5, 12
  • 83. High-level visibility Tuesday, June 5, 12
  • 90. codeascraft.etsy.com github.com/etsy/statsd github.com/etsy/logster bitbucket.org/maplebed/ganglia- logtailer Tuesday, June 5, 12