SlideShare uma empresa Scribd logo
1 de 117
Baixar para ler offline
CONTINUOUS DELIVERY:
  THE DIRTY DETAILS
      Mike Brittain
         Etsy.com


       @mikebrittain
       mike@etsy.com
a.k.a. “Continuous Deployment”
www.   .com
AUGUST 2012
          1.4 Billion page views
          USD $76 Million in transactions
          3.8 Million items sold




http://www.etsy.com/blog/news/2012/etsy-statistics-august-2012-weather-report/
~170 Committers, everyone deploys




credit: martin_heigan (flickr)
40

30

20

10




     Very end of 2009
                        Today
Continuous delivery is a pattern language in
growing use in software development to
improve the process of software delivery.
Techniques such as automated testing,
continuous integration, and continuous
deployment allow software to be developed to
a high standard and easily packaged and
deployed to test environments, resulting in the
ability to rapidly, reliably and repeatedly push
out enhancements and bug fixes to customers
at low risk and with minimal manual
overhead. The technique was one of the
assumptions of extreme programming but at
an enterprise level has developed into a
discipline of its own, with job descriptions for
roles such as "buildmaster" calling for CD
skills as mandatory.                               ~wikipedia
+ DevOps
+ Working on mainline, trunk, master
+ Feature flags
+ Branching in code
An Apology
An Apology

We build primarily in PHP.
Please don’t run away!
The Dirty Details of...

“Continuous Deployment
   in Practice at Etsy”
Then             Now
     2009          2010-today

 Just before we
started using CD
Then               Now
    6-14 hours          15 mins
“Deployment Army”       1 person
Highly orchestrated   Rapid release
  and infrequent          cycle
Then                Now
Special event and   Commonplace and
highly disruptive    happens so often
                    we cannot keep up
Then                Now
   Blocked for        Blocked for
   6-14 hours,        15 minutes,
plus minimum of     next deploy will
   6 hours to          only take
    redeploy          15 minutes

                  Config flags <5 mins
Then               Now
  Release branch,        Mainline,
 database schemas,     minimal linking
  data transforms,      and building,
     packaging,            rsync,
   rolling restarts,       site up
   cache purging,
scheduled downtime
Then       Now
  Slow       Fast
Complex     Simple
 Special   Common
Deploying code is the very first thing
   engineers learn to do at Etsy.
1st day
Add your photo to Etsy.com.
1st day
       Add your photo to Etsy.com.

                  2nd day
Complete tax, insurance, and benefits forms.
WARNING
Continuous Deployment
      Small, frequent changes.
Constantly integrating into production.
         30 deploys per day.
“Wow... 30 deploys a day.
How do you build features so quickly?”
Software Deploy ≠ Product Launch
Deploys frequently gated by config flags
             (“dark” releases)
$cfg[‘new_search’]   =   array('enabled'   =>   'off');
$cfg[‘sign_in’]      =   array('enabled'   =>   'on');
$cfg[‘checkout’]     =   array('enabled'   =>   'on');
$cfg[‘homepage’]     =   array('enabled'   =>   'on');
$cfg[‘new_search’] = array('enabled' => 'off');
$cfg[‘new_search’] = array('enabled' => 'off');

// Meanwhile...




# old and boring search
$results = do_grep();
$cfg[‘new_search’] = array('enabled' => 'off');

// Meanwhile...

if ($cfg[‘new_search’] == ‘on’) {
  # New and fancy search
  $results = do_solr();
} else {
  # old and boring search
  $results = do_grep();
}
$cfg[‘new_search’] = array('enabled' => 'on');

// or...

$cfg[‘new_search’] = array('enabled' => 'staff');

// or...

$cfg[‘new_search’] = array('enabled' => '1%');

// or...

$cfg[‘new_search’] = array('enabled' => 'users',
                           'user_list' => 'mike,john,kellan');
Validate in production, hidden from public.
What’s in a deploy?
Small incremental changes to the application
New classes, methods, controllers
Graphics, stylesheets, templates
Copy/content changes

Turning flags on/off, or ramping up
Quickly Responding to issues
Security, bugs, traffic, load shedding,
adding/removing infrastructure.

Tweaking config flags or releasing patches.
http://www.flickr.com/photos/flyforfun/2694158656/
Config flags
                        Operator                       Metrics




http://www.flickr.com/photos/flyforfun/2694158656/
“How do you continuously deploy
  database schema changes?”
Code deploys: ~ every 15-20 minutes
    Schema changes: Thursday
Our web application is largely monolithic.
Etsy.com, support tools, developer API,
        back-office, analytics
External “services” are not deployed
     with the main application.
Databases, Search, Photo storage
For every config flag, there are two states
we can support — forward and backward.
Expose multiple versions in each service.
Expect multiple versions in the application.
Example: Changing a Database Schema
Prefer ADDs over ALTERs (“non-breaking expansions”)
Altering in-place requires coupling
    code and schema changes.
Merging “users” and “users_prefs”
1. Write to both versions
2. Backfill historical data
3. Read from new version
4. Cut-off writes to old version
0. Add new version to schema
1. Write to both versions
2. Backfill historical data
3. Read from new version
4. Cut-off writes to old version
0. Add new version to schema
Schema change to add prefs columns to “users” table.

“write_prefs_to_user_prefs_table” => “on”
“write_prefs_to_users_table” => “off”
“read_prefs_from_users_table” => “off”
1. Write to both versions
Write code for writing prefs to the “users” table.

“write_prefs_to_user_prefs_table” => “on”
“write_prefs_to_users_table” => “on”
“read_prefs_from_users_table” => “off”
2. Backfill historical data
Offline process to sync existing data from “user_prefs”
to new columns in “users”
3. Read from new version
Data validation tests. Ensure consistency both internally
and in production.

“write_prefs_to_user_prefs_table” => “on”
“write_prefs_to_users_table” => “on”
“read_prefs_from_users_table” => “staff”
3. Read from new version
Data validation tests. Ensure consistency both internally
and in production.

“write_prefs_to_user_prefs_table” => “on”
“write_prefs_to_users_table” => “on”
“read_prefs_from_users_table” => “1%”
3. Read from new version
Data validation tests. Ensure consistency both internally
and in production.

“write_prefs_to_user_prefs_table” => “on”
“write_prefs_to_users_table” => “on”
“read_prefs_from_users_table” => “5%”
3. Read from new version
Data validation tests. Ensure consistency both internally
and in production.

“write_prefs_to_user_prefs_table” => “on”
“write_prefs_to_users_table” => “on”
“read_prefs_from_users_table” => “on”


(“on” == “100%”)
4. Cut-off writes to old version
After running on the new table for a significant amount
of time, we can cut off writes to the old table.

“write_prefs_to_user_prefs_table” => “off”
“write_prefs_to_users_table” => “on”
“read_prefs_from_users_table” => “on”
“Branch by Astraction”

                             Controller                    Controller




                                          Users Model              (Abstraction)




      “users” (old)              “user_prefs”                            “users”

                      old schema                                      new schema


                         http://paulhammant.com/blog/branch_by_abstraction.html
http://continuousdelivery.com/2011/05/make-large-scale-changes-incrementally-with-branch-by-abstraction/
“The Migration 4-Step”

1. Write to both versions
2. Backfill historical data
3. Read from new version
4. Cut-off writes to old version
“The Migration 4-Step”

1. Write to both versions
2. Backfill historical data
3. Read from new version
4. Cut-off writes to old version
5. Clean up flags, code, columns (when?)
Architecture and Process
Deploying is cheap.
Some philosophies on product development...
Gathering data should be cheap, too.
        staff, opt-in prototypes, 1%
Treat first iterations as experiments.
Get into code as quickly as possible.
Architecture largely doesn’t matter.
Kill things that don’t work.
“Terminate with extreme predjudice.”
Is the dumb solution enough to build a product?
      How long will the dumb solution last?
Your assumptions will be wrong
   once you’ve scaled 10x.
“We don’t optimize for being right. We optimize for
     quickly detecting when we’re wrong.”
                               ~Kellan Elliott-McCrea, CTO
Become really good at changing
     your architecture.
Invest time in architecture by the
       2nd or 3rd iteration.
Integration and Operations
Continuous Deployment
      Small, frequent changes.
Constantly integrating into production.
         30 deploys per day.
Code review before commit
Automated tests before deploy
Why Integrate with Production?
Dev ≠ Prod
Verify frequently and in small batches.
Integrating with production is a test in itself.
    We do this frequently and in small batches.
"Production is truly the only place you
      can validate your code."
"Production is truly the only place you
      can validate your code."
                    ~ Michael Nygard, about 40 min ago
More database servers in prod.
Bigger database hardware in prod.
More web servers.
Various replication schemes.
Different versions of server and OS software.
Schema changes applied at different times.
Physical hardware in prod.
More data in prod.
Legacy data (7 years of odd user states).
More traffic in prod.
Wait, I mean MUCH more traffic in prod.
Fewer elves.
Faster disks (SSDs) in prod.
Using a MySQL database to test an application that
will eventually be deployed on Oracle:
Using a MySQL database to test an application that
will eventually be deployed on Oracle: Priceless.
Verify frequently and in small batches.
Dev ≠ Prod
Dev ⇾ QA ⇾ Staging ⇾ Prod
Dev ⇾ QA ⇾ Staging ⇾ Prod
Dev ⇾ Pre-Prod ⇾ Prod
Test and integrate where you’ll see value.
Config flags (again)
off, on, staff, opt-in prototypes, user list, 0-100%
Config flags (again)
off, on, staff, opt-in prototypes, user list, 0-100%

                     “canary pools”
Automated tests after deploy
Real-time metrics and dashboards
Network & Servers, Application, Business
Release Managers: 0
Is it Broken?
Or , is it just better?
Metrics + Configs ⇾ OODA Loop
“Theoretical” vs. “Practical”
Homepage (95th perc.)

                        Surprise!!!
                        Turning off multi-
                        language support
                        improves our page
                        generation times by
                        up to 25%.
Nope. It’s really broken.
Config flags
                        Operator                       Metrics




http://www.flickr.com/photos/flyforfun/2694158656/
Thursday, Nov 22 - Thanksgiving
 Friday, Nov 23 - “Black Friday”
Monday, Nov 26 - “Cyber Monday”

  ~30 days out from Christmas
40

30

20

10
Thank you.
  Mike Brittain
 mike@etsy.com
 @mikebrittain

Mais conteúdo relacionado

Mais procurados

Mais procurados (20)

JiraとConfluenceのTips集
JiraとConfluenceのTips集JiraとConfluenceのTips集
JiraとConfluenceのTips集
 
Spring Boot の Web アプリケーションを Docker に載せて AWS ECS で動かしている話
Spring Boot の Web アプリケーションを Docker に載せて AWS ECS で動かしている話Spring Boot の Web アプリケーションを Docker に載せて AWS ECS で動かしている話
Spring Boot の Web アプリケーションを Docker に載せて AWS ECS で動かしている話
 
20190814 AWS Black Belt Online Seminar AWS Serverless Application Model
20190814 AWS Black Belt Online Seminar AWS Serverless Application Model  20190814 AWS Black Belt Online Seminar AWS Serverless Application Model
20190814 AWS Black Belt Online Seminar AWS Serverless Application Model
 
Maven基礎
Maven基礎Maven基礎
Maven基礎
 
AWS Black Belt Online Seminar 2018 AWS Certificate Manager
AWS Black Belt Online Seminar 2018 AWS Certificate ManagerAWS Black Belt Online Seminar 2018 AWS Certificate Manager
AWS Black Belt Online Seminar 2018 AWS Certificate Manager
 
ソフトウェアテストの変遷と最近の品質管理の方向性
ソフトウェアテストの変遷と最近の品質管理の方向性ソフトウェアテストの変遷と最近の品質管理の方向性
ソフトウェアテストの変遷と最近の品質管理の方向性
 
SpringBootTest入門
SpringBootTest入門SpringBootTest入門
SpringBootTest入門
 
脆弱性スキャナVuls(入門編)
脆弱性スキャナVuls(入門編)脆弱性スキャナVuls(入門編)
脆弱性スキャナVuls(入門編)
 
20191002 AWS Black Belt Online Seminar Amazon EC2 Auto Scaling and AWS Auto S...
20191002 AWS Black Belt Online Seminar Amazon EC2 Auto Scaling and AWS Auto S...20191002 AWS Black Belt Online Seminar Amazon EC2 Auto Scaling and AWS Auto S...
20191002 AWS Black Belt Online Seminar Amazon EC2 Auto Scaling and AWS Auto S...
 
Amazon SageMaker 推論エンドポイントを利用したアプリケーション開発
Amazon SageMaker 推論エンドポイントを利用したアプリケーション開発Amazon SageMaker 推論エンドポイントを利用したアプリケーション開発
Amazon SageMaker 推論エンドポイントを利用したアプリケーション開発
 
20190130 AWS Well-Architectedの活用方法とレビューの進め方をお伝えしていきたい
20190130 AWS Well-Architectedの活用方法とレビューの進め方をお伝えしていきたい20190130 AWS Well-Architectedの活用方法とレビューの進め方をお伝えしていきたい
20190130 AWS Well-Architectedの活用方法とレビューの進め方をお伝えしていきたい
 
Java でつくる 低レイテンシ実装の技巧
Java でつくる低レイテンシ実装の技巧Java でつくる低レイテンシ実装の技巧
Java でつくる 低レイテンシ実装の技巧
 
週末趣味のAWS Transit Gatewayでの経路制御
週末趣味のAWS Transit Gatewayでの経路制御週末趣味のAWS Transit Gatewayでの経路制御
週末趣味のAWS Transit Gatewayでの経路制御
 
20190514 AWS Black Belt Online Seminar Amazon API Gateway
20190514 AWS Black Belt Online Seminar Amazon API Gateway 20190514 AWS Black Belt Online Seminar Amazon API Gateway
20190514 AWS Black Belt Online Seminar Amazon API Gateway
 
CloudFrontのリアルタイムログをKibanaで可視化しよう
CloudFrontのリアルタイムログをKibanaで可視化しようCloudFrontのリアルタイムログをKibanaで可視化しよう
CloudFrontのリアルタイムログをKibanaで可視化しよう
 
REST API のコツ
REST API のコツREST API のコツ
REST API のコツ
 
Aws auto scalingによるwebapサーバbatchサーバの構成例
Aws auto scalingによるwebapサーバbatchサーバの構成例Aws auto scalingによるwebapサーバbatchサーバの構成例
Aws auto scalingによるwebapサーバbatchサーバの構成例
 
20190806 AWS Black Belt Online Seminar AWS Glue
20190806 AWS Black Belt Online Seminar AWS Glue20190806 AWS Black Belt Online Seminar AWS Glue
20190806 AWS Black Belt Online Seminar AWS Glue
 
Spring Boot × Vue.jsでSPAを作る
Spring Boot × Vue.jsでSPAを作るSpring Boot × Vue.jsでSPAを作る
Spring Boot × Vue.jsでSPAを作る
 
[最新バージョンの情報がDescription欄にございます]AWS Black Belt Online Seminar 2018 Amazon Connect
[最新バージョンの情報がDescription欄にございます]AWS Black Belt Online Seminar 2018 Amazon Connect[最新バージョンの情報がDescription欄にございます]AWS Black Belt Online Seminar 2018 Amazon Connect
[最新バージョンの情報がDescription欄にございます]AWS Black Belt Online Seminar 2018 Amazon Connect
 

Destaque

Continuous Deployment at Etsy: A Tale of Two Approaches
Continuous Deployment at Etsy: A Tale of Two ApproachesContinuous Deployment at Etsy: A Tale of Two Approaches
Continuous Deployment at Etsy: A Tale of Two Approaches
Ross Snyder
 
Metrics-Driven Engineering at Etsy
Metrics-Driven Engineering at EtsyMetrics-Driven Engineering at Etsy
Metrics-Driven Engineering at Etsy
Mike Brittain
 
Application Security at DevOps Speed and Portfolio Scale
Application Security at DevOps Speed and Portfolio ScaleApplication Security at DevOps Speed and Portfolio Scale
Application Security at DevOps Speed and Portfolio Scale
Jeff Williams
 
Simple Log Analysis and Trending
Simple Log Analysis and TrendingSimple Log Analysis and Trending
Simple Log Analysis and Trending
Mike Brittain
 
On Failure and Resilience
On Failure and ResilienceOn Failure and Resilience
On Failure and Resilience
Mike Brittain
 

Destaque (20)

Continuous Deployment at Etsy: A Tale of Two Approaches
Continuous Deployment at Etsy: A Tale of Two ApproachesContinuous Deployment at Etsy: A Tale of Two Approaches
Continuous Deployment at Etsy: A Tale of Two Approaches
 
Principles and Practices in Continuous Deployment at Etsy
Principles and Practices in Continuous Deployment at EtsyPrinciples and Practices in Continuous Deployment at Etsy
Principles and Practices in Continuous Deployment at Etsy
 
Take My Logs. Please!
Take My Logs. Please!Take My Logs. Please!
Take My Logs. Please!
 
How to Get to Second Base with Your CDN
How to Get to Second Base with Your CDNHow to Get to Second Base with Your CDN
How to Get to Second Base with Your CDN
 
Advanced Topics in Continuous Deployment
Advanced Topics in Continuous DeploymentAdvanced Topics in Continuous Deployment
Advanced Topics in Continuous Deployment
 
Metrics-Driven Engineering at Etsy
Metrics-Driven Engineering at EtsyMetrics-Driven Engineering at Etsy
Metrics-Driven Engineering at Etsy
 
Metrics-Driven Engineering
Metrics-Driven EngineeringMetrics-Driven Engineering
Metrics-Driven Engineering
 
sandhiya
sandhiyasandhiya
sandhiya
 
Web Performance Culture and Tools at Etsy
Web Performance Culture and Tools at EtsyWeb Performance Culture and Tools at Etsy
Web Performance Culture and Tools at Etsy
 
Continuous Deployment at Etsy — TimesOpen NYC
Continuous Deployment at Etsy — TimesOpen NYCContinuous Deployment at Etsy — TimesOpen NYC
Continuous Deployment at Etsy — TimesOpen NYC
 
10 Deploys a Day - A Case Study of Continuous Delivery at Envato
10 Deploys a Day - A Case Study of Continuous Delivery at Envato10 Deploys a Day - A Case Study of Continuous Delivery at Envato
10 Deploys a Day - A Case Study of Continuous Delivery at Envato
 
Continous Delivery Toronto Presentation
Continous Delivery Toronto PresentationContinous Delivery Toronto Presentation
Continous Delivery Toronto Presentation
 
Application Security at DevOps Speed and Portfolio Scale
Application Security at DevOps Speed and Portfolio ScaleApplication Security at DevOps Speed and Portfolio Scale
Application Security at DevOps Speed and Portfolio Scale
 
Apresentação docker
Apresentação dockerApresentação docker
Apresentação docker
 
Case Study: GM Financial Builds a Sustainable, Holistic, Continuous Delivery ...
Case Study: GM Financial Builds a Sustainable, Holistic, Continuous Delivery ...Case Study: GM Financial Builds a Sustainable, Holistic, Continuous Delivery ...
Case Study: GM Financial Builds a Sustainable, Holistic, Continuous Delivery ...
 
Continuous Deployment: The Dirty Details
Continuous Deployment: The Dirty DetailsContinuous Deployment: The Dirty Details
Continuous Deployment: The Dirty Details
 
Web Performance Culture and Tools at Etsy
Web Performance Culture and Tools at EtsyWeb Performance Culture and Tools at Etsy
Web Performance Culture and Tools at Etsy
 
Case Study: Rabobank's Journey From Waterfall To Continuous Delivery
Case Study: Rabobank's Journey From Waterfall To Continuous DeliveryCase Study: Rabobank's Journey From Waterfall To Continuous Delivery
Case Study: Rabobank's Journey From Waterfall To Continuous Delivery
 
Simple Log Analysis and Trending
Simple Log Analysis and TrendingSimple Log Analysis and Trending
Simple Log Analysis and Trending
 
On Failure and Resilience
On Failure and ResilienceOn Failure and Resilience
On Failure and Resilience
 

Semelhante a Continuous Delivery: The Dirty Details

Just In Time Scalability Agile Methods To Support Massive Growth Presentation
Just In Time Scalability  Agile Methods To Support Massive Growth PresentationJust In Time Scalability  Agile Methods To Support Massive Growth Presentation
Just In Time Scalability Agile Methods To Support Massive Growth Presentation
Eric Ries
 
Continues Deployment - Tech Talk week
Continues Deployment - Tech Talk weekContinues Deployment - Tech Talk week
Continues Deployment - Tech Talk week
rantav
 
Amplify your stack - Jsfoo pune 2012
Amplify your stack - Jsfoo pune 2012Amplify your stack - Jsfoo pune 2012
Amplify your stack - Jsfoo pune 2012
threepointone
 
Dr Elephant: LinkedIn's Self-Service System for Detecting and Treating Hadoop...
Dr Elephant: LinkedIn's Self-Service System for Detecting and Treating Hadoop...Dr Elephant: LinkedIn's Self-Service System for Detecting and Treating Hadoop...
Dr Elephant: LinkedIn's Self-Service System for Detecting and Treating Hadoop...
DataWorks Summit
 
Sai devops - the art of being specializing generalist
Sai   devops - the art of being specializing generalistSai   devops - the art of being specializing generalist
Sai devops - the art of being specializing generalist
Odd-e
 

Semelhante a Continuous Delivery: The Dirty Details (20)

Just In Time Scalability Agile Methods To Support Massive Growth Presentation
Just In Time Scalability  Agile Methods To Support Massive Growth PresentationJust In Time Scalability  Agile Methods To Support Massive Growth Presentation
Just In Time Scalability Agile Methods To Support Massive Growth Presentation
 
Just In Time Scalability Agile Methods To Support Massive Growth Presentation
Just In Time Scalability  Agile Methods To Support Massive Growth PresentationJust In Time Scalability  Agile Methods To Support Massive Growth Presentation
Just In Time Scalability Agile Methods To Support Massive Growth Presentation
 
Pitchero - Increasing agility through DevOps - Leeds DevOps November 2016
Pitchero - Increasing agility through DevOps - Leeds DevOps November 2016Pitchero - Increasing agility through DevOps - Leeds DevOps November 2016
Pitchero - Increasing agility through DevOps - Leeds DevOps November 2016
 
Puppet for Sys Admins
Puppet for Sys AdminsPuppet for Sys Admins
Puppet for Sys Admins
 
Product! - The road to production deployment
Product! - The road to production deploymentProduct! - The road to production deployment
Product! - The road to production deployment
 
Continuous Deployment
Continuous DeploymentContinuous Deployment
Continuous Deployment
 
DATABASE AUTOMATION with Thousands of database, monitoring and backup
DATABASE AUTOMATION with Thousands of database, monitoring and backupDATABASE AUTOMATION with Thousands of database, monitoring and backup
DATABASE AUTOMATION with Thousands of database, monitoring and backup
 
Continues Deployment - Tech Talk week
Continues Deployment - Tech Talk weekContinues Deployment - Tech Talk week
Continues Deployment - Tech Talk week
 
DevOps Fest 2020. immutable infrastructure as code. True story.
DevOps Fest 2020. immutable infrastructure as code. True story.DevOps Fest 2020. immutable infrastructure as code. True story.
DevOps Fest 2020. immutable infrastructure as code. True story.
 
ServerTemplate Deep Dive
ServerTemplate Deep DiveServerTemplate Deep Dive
ServerTemplate Deep Dive
 
Data herding
Data herdingData herding
Data herding
 
Data herding
Data herdingData herding
Data herding
 
Introduction to DevOps
Introduction to DevOpsIntroduction to DevOps
Introduction to DevOps
 
Dev Ops without the Ops
Dev Ops without the OpsDev Ops without the Ops
Dev Ops without the Ops
 
JAX London 2021: Jumpstart Your Cloud Native Development: An Overview of Prac...
JAX London 2021: Jumpstart Your Cloud Native Development: An Overview of Prac...JAX London 2021: Jumpstart Your Cloud Native Development: An Overview of Prac...
JAX London 2021: Jumpstart Your Cloud Native Development: An Overview of Prac...
 
Distributed Release Management
Distributed Release ManagementDistributed Release Management
Distributed Release Management
 
Amplify your stack - Jsfoo pune 2012
Amplify your stack - Jsfoo pune 2012Amplify your stack - Jsfoo pune 2012
Amplify your stack - Jsfoo pune 2012
 
Dr Elephant: LinkedIn's Self-Service System for Detecting and Treating Hadoop...
Dr Elephant: LinkedIn's Self-Service System for Detecting and Treating Hadoop...Dr Elephant: LinkedIn's Self-Service System for Detecting and Treating Hadoop...
Dr Elephant: LinkedIn's Self-Service System for Detecting and Treating Hadoop...
 
Sai devops - the art of being specializing generalist
Sai   devops - the art of being specializing generalistSai   devops - the art of being specializing generalist
Sai devops - the art of being specializing generalist
 
Pipeline as code for your infrastructure as Code
Pipeline as code for your infrastructure as CodePipeline as code for your infrastructure as Code
Pipeline as code for your infrastructure as Code
 

Último

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 

Último (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 

Continuous Delivery: The Dirty Details

  • 1. CONTINUOUS DELIVERY: THE DIRTY DETAILS Mike Brittain Etsy.com @mikebrittain mike@etsy.com
  • 3.
  • 4. www. .com
  • 5.
  • 6. AUGUST 2012 1.4 Billion page views USD $76 Million in transactions 3.8 Million items sold http://www.etsy.com/blog/news/2012/etsy-statistics-august-2012-weather-report/
  • 7. ~170 Committers, everyone deploys credit: martin_heigan (flickr)
  • 8. 40 30 20 10 Very end of 2009 Today
  • 9. Continuous delivery is a pattern language in growing use in software development to improve the process of software delivery. Techniques such as automated testing, continuous integration, and continuous deployment allow software to be developed to a high standard and easily packaged and deployed to test environments, resulting in the ability to rapidly, reliably and repeatedly push out enhancements and bug fixes to customers at low risk and with minimal manual overhead. The technique was one of the assumptions of extreme programming but at an enterprise level has developed into a discipline of its own, with job descriptions for roles such as "buildmaster" calling for CD skills as mandatory. ~wikipedia
  • 10. + DevOps + Working on mainline, trunk, master + Feature flags + Branching in code
  • 12. An Apology We build primarily in PHP. Please don’t run away!
  • 13. The Dirty Details of... “Continuous Deployment in Practice at Etsy”
  • 14.
  • 15. Then Now 2009 2010-today Just before we started using CD
  • 16. Then Now 6-14 hours 15 mins “Deployment Army” 1 person Highly orchestrated Rapid release and infrequent cycle
  • 17. Then Now Special event and Commonplace and highly disruptive happens so often we cannot keep up
  • 18. Then Now Blocked for Blocked for 6-14 hours, 15 minutes, plus minimum of next deploy will 6 hours to only take redeploy 15 minutes Config flags <5 mins
  • 19. Then Now Release branch, Mainline, database schemas, minimal linking data transforms, and building, packaging, rsync, rolling restarts, site up cache purging, scheduled downtime
  • 20. Then Now Slow Fast Complex Simple Special Common
  • 21. Deploying code is the very first thing engineers learn to do at Etsy.
  • 22. 1st day Add your photo to Etsy.com.
  • 23. 1st day Add your photo to Etsy.com. 2nd day Complete tax, insurance, and benefits forms.
  • 24.
  • 26.
  • 27. Continuous Deployment Small, frequent changes. Constantly integrating into production. 30 deploys per day.
  • 28. “Wow... 30 deploys a day. How do you build features so quickly?”
  • 29. Software Deploy ≠ Product Launch
  • 30. Deploys frequently gated by config flags (“dark” releases)
  • 31. $cfg[‘new_search’] = array('enabled' => 'off'); $cfg[‘sign_in’] = array('enabled' => 'on'); $cfg[‘checkout’] = array('enabled' => 'on'); $cfg[‘homepage’] = array('enabled' => 'on');
  • 33. $cfg[‘new_search’] = array('enabled' => 'off'); // Meanwhile... # old and boring search $results = do_grep();
  • 34. $cfg[‘new_search’] = array('enabled' => 'off'); // Meanwhile... if ($cfg[‘new_search’] == ‘on’) { # New and fancy search $results = do_solr(); } else { # old and boring search $results = do_grep(); }
  • 35. $cfg[‘new_search’] = array('enabled' => 'on'); // or... $cfg[‘new_search’] = array('enabled' => 'staff'); // or... $cfg[‘new_search’] = array('enabled' => '1%'); // or... $cfg[‘new_search’] = array('enabled' => 'users', 'user_list' => 'mike,john,kellan');
  • 36. Validate in production, hidden from public.
  • 37. What’s in a deploy? Small incremental changes to the application New classes, methods, controllers Graphics, stylesheets, templates Copy/content changes Turning flags on/off, or ramping up
  • 38. Quickly Responding to issues Security, bugs, traffic, load shedding, adding/removing infrastructure. Tweaking config flags or releasing patches.
  • 40. Config flags Operator Metrics http://www.flickr.com/photos/flyforfun/2694158656/
  • 41.
  • 42.
  • 43. “How do you continuously deploy database schema changes?”
  • 44. Code deploys: ~ every 15-20 minutes Schema changes: Thursday
  • 45. Our web application is largely monolithic.
  • 46. Etsy.com, support tools, developer API, back-office, analytics
  • 47. External “services” are not deployed with the main application.
  • 49. For every config flag, there are two states we can support — forward and backward.
  • 50. Expose multiple versions in each service. Expect multiple versions in the application.
  • 51. Example: Changing a Database Schema
  • 52. Prefer ADDs over ALTERs (“non-breaking expansions”)
  • 53. Altering in-place requires coupling code and schema changes.
  • 54. Merging “users” and “users_prefs”
  • 55. 1. Write to both versions 2. Backfill historical data 3. Read from new version 4. Cut-off writes to old version
  • 56. 0. Add new version to schema 1. Write to both versions 2. Backfill historical data 3. Read from new version 4. Cut-off writes to old version
  • 57. 0. Add new version to schema Schema change to add prefs columns to “users” table. “write_prefs_to_user_prefs_table” => “on” “write_prefs_to_users_table” => “off” “read_prefs_from_users_table” => “off”
  • 58. 1. Write to both versions Write code for writing prefs to the “users” table. “write_prefs_to_user_prefs_table” => “on” “write_prefs_to_users_table” => “on” “read_prefs_from_users_table” => “off”
  • 59. 2. Backfill historical data Offline process to sync existing data from “user_prefs” to new columns in “users”
  • 60. 3. Read from new version Data validation tests. Ensure consistency both internally and in production. “write_prefs_to_user_prefs_table” => “on” “write_prefs_to_users_table” => “on” “read_prefs_from_users_table” => “staff”
  • 61. 3. Read from new version Data validation tests. Ensure consistency both internally and in production. “write_prefs_to_user_prefs_table” => “on” “write_prefs_to_users_table” => “on” “read_prefs_from_users_table” => “1%”
  • 62. 3. Read from new version Data validation tests. Ensure consistency both internally and in production. “write_prefs_to_user_prefs_table” => “on” “write_prefs_to_users_table” => “on” “read_prefs_from_users_table” => “5%”
  • 63. 3. Read from new version Data validation tests. Ensure consistency both internally and in production. “write_prefs_to_user_prefs_table” => “on” “write_prefs_to_users_table” => “on” “read_prefs_from_users_table” => “on” (“on” == “100%”)
  • 64. 4. Cut-off writes to old version After running on the new table for a significant amount of time, we can cut off writes to the old table. “write_prefs_to_user_prefs_table” => “off” “write_prefs_to_users_table” => “on” “read_prefs_from_users_table” => “on”
  • 65. “Branch by Astraction” Controller Controller Users Model (Abstraction) “users” (old) “user_prefs” “users” old schema new schema http://paulhammant.com/blog/branch_by_abstraction.html http://continuousdelivery.com/2011/05/make-large-scale-changes-incrementally-with-branch-by-abstraction/
  • 66. “The Migration 4-Step” 1. Write to both versions 2. Backfill historical data 3. Read from new version 4. Cut-off writes to old version
  • 67. “The Migration 4-Step” 1. Write to both versions 2. Backfill historical data 3. Read from new version 4. Cut-off writes to old version 5. Clean up flags, code, columns (when?)
  • 70. Some philosophies on product development...
  • 71. Gathering data should be cheap, too. staff, opt-in prototypes, 1%
  • 72. Treat first iterations as experiments.
  • 73. Get into code as quickly as possible.
  • 75. Kill things that don’t work.
  • 76. “Terminate with extreme predjudice.”
  • 77. Is the dumb solution enough to build a product? How long will the dumb solution last?
  • 78. Your assumptions will be wrong once you’ve scaled 10x.
  • 79. “We don’t optimize for being right. We optimize for quickly detecting when we’re wrong.” ~Kellan Elliott-McCrea, CTO
  • 80. Become really good at changing your architecture.
  • 81. Invest time in architecture by the 2nd or 3rd iteration.
  • 83. Continuous Deployment Small, frequent changes. Constantly integrating into production. 30 deploys per day.
  • 86. Why Integrate with Production?
  • 88. Verify frequently and in small batches.
  • 89. Integrating with production is a test in itself. We do this frequently and in small batches.
  • 90. "Production is truly the only place you can validate your code."
  • 91. "Production is truly the only place you can validate your code." ~ Michael Nygard, about 40 min ago
  • 92. More database servers in prod. Bigger database hardware in prod. More web servers. Various replication schemes. Different versions of server and OS software. Schema changes applied at different times. Physical hardware in prod. More data in prod. Legacy data (7 years of odd user states). More traffic in prod. Wait, I mean MUCH more traffic in prod. Fewer elves. Faster disks (SSDs) in prod.
  • 93. Using a MySQL database to test an application that will eventually be deployed on Oracle:
  • 94. Using a MySQL database to test an application that will eventually be deployed on Oracle: Priceless.
  • 95. Verify frequently and in small batches.
  • 97. Dev ⇾ QA ⇾ Staging ⇾ Prod
  • 98. Dev ⇾ QA ⇾ Staging ⇾ Prod
  • 99. Dev ⇾ Pre-Prod ⇾ Prod
  • 100. Test and integrate where you’ll see value.
  • 101. Config flags (again) off, on, staff, opt-in prototypes, user list, 0-100%
  • 102. Config flags (again) off, on, staff, opt-in prototypes, user list, 0-100% “canary pools”
  • 104. Real-time metrics and dashboards Network & Servers, Application, Business
  • 105.
  • 107.
  • 108. Is it Broken? Or , is it just better?
  • 109.
  • 110. Metrics + Configs ⇾ OODA Loop
  • 112. Homepage (95th perc.) Surprise!!! Turning off multi- language support improves our page generation times by up to 25%.
  • 114. Config flags Operator Metrics http://www.flickr.com/photos/flyforfun/2694158656/
  • 115. Thursday, Nov 22 - Thanksgiving Friday, Nov 23 - “Black Friday” Monday, Nov 26 - “Cyber Monday” ~30 days out from Christmas
  • 117. Thank you. Mike Brittain mike@etsy.com @mikebrittain