Al Wagner from IBM presents how to avoid deployment failures, reviewing such topics as: Deployment models like canary, blue/green and rolling that can help prevent major production outages; How to pinpoint deployment failures in your process and correct them; Pulling together a basic failure response plan; and How you can roll forward while improving your deployment process.
Learn more about IBM UrbanCode: http://www.ibm.biz/learnurbancode
2. Avoiding deployment failures, especially those that could cause a production outage, is top of mind for many IT professionals. However, failures will sometimes occur in production, which means that planning for recovery is essential. Preventative measures like canary, blue/green or rolling deployments can help, but the ability to roll forward (instead of rolling back), also known as shifting right, means you can push through a failure while learning from deployment process mistakes and shortening mean time to recovery (MTTR).
• Deployment models like canary, blue/green and rolling that can help prevent major production outages
• How to pinpoint deployment failures in your process and correct them
• Pulling together a basic failure response plan
• How you can roll forward while improving your deployment process
Shift Happens, September 16, 2015
4. DevOps is all about executing with speed!
Line-of-business
Customer
Getting ideas into production fast – Getting people to use it – Analyzing their feedback
Continuous Delivery
Continuous Feedback
Continuous Innovation
5. Managing the Iron Triangle by…
• Reducing Scope
✓ Small batches of incremental changes
• Empowering Resources
✓ Co-located, autonomous teams
• Accelerating Schedules
✓ Automate, automate, automate
• Increasing Quality
✓ Everyone contributes
✓ Small batches of incremental changes
✓ Co-located, autonomous teams, collaboration
✓ Continuous release & deployment
[Diagram: the Iron Triangle, with Quality at the center of Schedule, Scope, and Resources]
10. And when disaster strikes! You need to know…
• What failed?
• Where did it fail?
• Why did it fail?
• What apps were impacted?
• Should I move traffic to another server?
• Do we go forward or roll back?
If you fail to plan, you plan to fail!
11. During the post mortem, you need to uncover…
Did anything trigger the deployment failure?
What was the root cause of the failure?
What could we have done differently to avoid this situation?
How can we improve so it doesn’t happen again?
12. Accelerate delivery of incremental software change
• Failures due to inconsistent dev and production environments
• Bottlenecks trying to deliver more frequent releases to meet market demands
• Complex, manual processes for release lack repeatability and speed
• Poor visibility into dependencies across releases, resources, and teams
13. Accelerate delivery of incremental software change
The Four Pillars of Gold-Standard Deployment
• Use the same process
✓ Reduces deployment errors
• Automate, automate, automate
✓ Delivers repeatability, reliability, and traceability
• Deliver incremental changes
✓ Reduces risk to business
• Release what you test
✓ Increases confidence
14. Automate provisioning and deployments
[Diagram labels]
Build pipeline: SCM, Build Automation (pull changes, publish build)
IBM UrbanCode Deploy: automate deployment to hybrid environments
Provision environment with open patterns: IBM Cloud Orchestrator, IBM PureApplication System, IBM Cloud Manager with OpenStack, IBM Bluemix, VMware vCenter
Environments: Public (shared off-premises cloud), Dedicated (off-premises cloud), Local (dedicated on-premises cloud), Traditional IT
✓ Traceable ✓ Repeatable ✓ Reliable
15. IBM Cloud UrbanCode Deploy as a Service
Features of the new SaaS offering
• Full automated application delivery capabilities
• Hosted on IBM infrastructure, managed by IBM
• Monthly subscription, license managed by IBM
• Full product support
[Diagram labels: Develop, Build, Deploy; IBM Cloud UrbanCode Deploy (NEW!); apps on Mobile Device, Mainframe, Traditional, and SoftLayer, AWS, Azure]
17. IBM UrbanCode Release & Deploy iOS mobile app
✓ Monitor progress: Understand the overall progress of your releases and remaining work. Get real-time calculations of the projected completion time.
✓ Alert for critical issues: See critical data on late and idling tasks so you can counter problems and mitigate business risks.
✓ Understand team status: Learn from teams what they are blocked by so you can take the right corrective actions.
https://itunes.apple.com/ca/app/ibm-urbancode-release-deploy/id1084753666?mt=8
18. Shift right and continuously move forward
Accelerate releases by making a conscious decision to carry an acceptable level of … into PRODUCTION!
19. Dark Launches & Toggles
• Feature toggle: hides code that is still in development from end users until it is ready for release
if (work_in_progress) {
    // develop new functionality here
} else {
    // already deployed as production code
}
• Business toggle: controls which users or groups of users can access new functionality
if (beta_usergroup) {
    // provide access to new experiment
} else {
    // route user to existing production code
}
✓ Pros: New experiments can quickly be made available to groups of trusted users
✗ Cons: Increase in technical debt as “toggle” code needs to be managed
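The two toggle styles above can be sketched in runnable form. This is only an illustration: the function names, the toggle dictionary, and the returned strings are hypothetical and do not come from the slides or from any IBM product.

```python
# Minimal sketch of the two toggle styles. All names here are hypothetical.

def render_search(toggles: dict) -> str:
    """Feature toggle: unfinished code ships dark until the flag is turned on."""
    if toggles.get("work_in_progress", False):
        return "search v2 (in development)"
    return "search v1 (production)"


def render_home(user: str, beta_usergroup: set) -> str:
    """Business toggle: only a trusted group of users sees the experiment."""
    if user in beta_usergroup:
        return "home with experiment"
    return "home (production)"
```

Defaulting a missing flag to `False` keeps the production path as the fallback. Removing a flag once its feature is fully released is what keeps the toggle-related technical debt noted above in check.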
20. Zero downtime deployment strategies
• Canary Release: a technique to reduce the risk of introducing a new software version in production by slowly rolling out the change to a small subset of users before rolling it out to the entire infrastructure and making it available to everybody.
• Blue/Green Deployments: a release technique that reduces downtime and risk by running two identical production environments called Blue and Green. At any time, only one of the environments is live, with the live environment serving all production traffic.
• Rolling Deployments: a software release strategy that staggers deployment across multiple phases, which usually include one or more servers performing one or more functions within a server cluster to reduce application downtime.
21. Canary Releases (example flow)
[Diagram: users reach two environments through a load balancer. Each environment has a web server, an app server, and a database server. Both run the old version, and each receives 50% of users.]
22. Canary Releases (example flow)
[Diagram: the load balancer sends all users to the environment running the old version, while Deployment Automation and Inventory put the new version on the other environment.]
23. Canary Releases (example flow)
[Diagram: the load balancer sends most users (95%) to the old version and some users (5%) to the new version.]
As confidence in the new release increases, the percentage of users who have access is increased.
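One way to picture the percentage-based split is a stable per-user bucket, sketched below. This is an assumption about how a canary split could be implemented, not a description of any particular load balancer; `bucket`, `route`, and the user IDs are hypothetical.

```python
import hashlib


def bucket(user_id: str) -> int:
    """Map a user to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def route(user_id: str, canary_percent: int) -> str:
    """Send roughly canary_percent of users to the new version."""
    return "new" if bucket(user_id) < canary_percent else "old"
```

Because the bucket depends only on the user ID, a user who reaches the new version at 5% stays on it when the share is raised to 50% or 100%, which avoids bouncing users between versions as confidence grows.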
24. Canary Releases (example flow)
[Diagram: the load balancer sends all users to the environment running the new version, while Deployment Automation and Inventory upgrade the environment still running the old version.]
Eventually the new version is deployed to the second environment.
25. Canary Releases (example flow)
[Diagram: both environments run the new version, and each receives 50% of users through the load balancer.]
And the user load is split across the two environments.
26. Blue / Green Deployments (example flow)
[Diagram: a router in front of Environment #1 and Environment #2, each with a web server, an app server, and a database server. All users are routed to the live environment; the other holds the previous release as a hot stand-by.]
Two environments, each of sufficient resources to serve the application in production.
27. Blue / Green Deployments (example flow)
[Diagram: the router still sends all users to the live environment, while Deployment Automation and Inventory work on the idle environment.]
Two environments, each of sufficient resources to serve the application in production.
The new release is deployed to the idle environment.
28. Blue / Green Deployments (example flow)
[Diagram: the router now sends all users to the environment with the new release; the environment with the previous release remains as a hot stand-by.]
Two environments, each of sufficient resources to serve the application in production.
When the new deployment is working as expected, users are routed to the new version.
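The blue/green flow above can be sketched as a small router model. The `Router` class, the environment names, and the version strings are hypothetical and only illustrate the sequence: deploy to the idle environment, verify, then switch traffic.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    """Toy model of a router in front of two production environments."""
    # Hypothetical starting state: env1 is live, env2 holds the previous release.
    environments: dict = field(default_factory=lambda: {"env1": "1.0", "env2": "0.9"})
    live: str = "env1"

    @property
    def idle(self) -> str:
        """Name of the environment that is not serving users."""
        return next(name for name in self.environments if name != self.live)

    def deploy_to_idle(self, version: str) -> None:
        """Install the new release without touching live traffic."""
        self.environments[self.idle] = version

    def cut_over(self, healthy: bool) -> bool:
        """Switch all users to the idle environment only if it checked out."""
        if not healthy:
            return False
        self.live = self.idle
        return True
```

After a successful `cut_over`, the formerly live environment still holds the previous release, so routing back to it is a single switch. That is the hot stand-by the slides describe.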
29. Rolling Deployments (example flow)
[Diagram: users reach Server Clusters #1 through #4 through a load balancer. Each cluster has a web server, an app server, and a database server.]
30. Rolling Deployments (example flow)
[Diagram: the same four server clusters behind the load balancer, with Deployment Automation and Inventory working on Cluster #1.]
1. Cluster #1 taken off-line
2. Application change deployed
3. Deployment tested
31. Rolling Deployments (example flow)
[Diagram: the same four server clusters behind the load balancer, with Deployment Automation and Inventory working on Cluster #2.]
1. Cluster #1 brought back on-line
2. Cluster #2 is taken off-line
3. Application change deployed
4. Deployment tested
32. Rolling Deployments (example flow)
[Diagram: the same four server clusters behind the load balancer, with Deployment Automation and Inventory working on Clusters #3 and #4.]
1. Cluster #2 brought back on-line
2. Clusters #3 & #4 are taken off-line
3. Application change deployed
4. Deployment tested
33. Rolling Deployments (example flow)
[Diagram: all four server clusters are back behind the load balancer and serving users.]
All environments are presenting the latest version of the application.
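The rolling flow above amounts to a loop over batches of clusters. The sketch below is a simplified model under stated assumptions: `rolling_deploy`, the cluster names, and the `passes_test` callback are hypothetical, and a real rollout would also drain sessions before taking a cluster off-line.

```python
from typing import Callable, Dict, List, Set


def rolling_deploy(
    versions: Dict[str, str],
    batches: List[List[str]],
    new_version: str,
    passes_test: Callable[[str], bool],
) -> Set[str]:
    """Upgrade clusters batch by batch and return the clusters left on-line."""
    online = set(versions)
    for batch in batches:
        online -= set(batch)                  # take the batch off-line
        for name in batch:
            versions[name] = new_version      # deploy the application change
        if not all(passes_test(name) for name in batch):
            return online                     # halt; failed batch stays off-line
        online |= set(batch)                  # bring the batch back on-line
    return online
```

Halting at the first failed test limits the impact to one batch: the remaining clusters keep serving the old version while the failure is investigated.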
34. Pros and Cons…
Canary Release
Pros
• No downtime of production environment
• Quick access to a backup environment
• A/B testing of new features and functionality
• Capture performance metrics of new release during early adoption
Cons
• Management and maintenance of multiple versions of the software
• Maintain persistent sessions during deployment
• Database must support two versions of the application (until cut-over is complete)
Blue/Green Deployments
Pros
• No downtime of production environment
• Quick access to a backup environment (hot standby)
• Ability to test application in a production environment
Cons
• Requires two similar environments
• Maintain persistent sessions during deployment
• Database must support two versions of the application (until cut-over is complete)
Rolling Deployments
Pros
• No downtime of production environment
• Incrementally validate deployments and reduce risk
• Reduce visibility of performance degradation
• Seamless user experience
Cons
• Maintain persistent sessions during deployment
• Database must support two versions of the application (until deployment is complete)
35. Your mission, if you choose to accept it…
Measure your DevOps progress
• Deployment / Change Frequency
– Measures delivery team responsiveness, cohesiveness, capabilities, efficiency, & tooling effectiveness
• Change Lead Time
– Measures efficiency of the end-to-end development process, from first code change to deployment
– Measures cycle time of the individual activities
• Change Failure Rate
– Number of failed deployments / total number of deployments
• Mean Time To Recover (MTTR)
– How long it takes to recover from a failure
– Understand the contributors to failure: code complexity, number of app changes, number of operating environment changes
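Two of these measures reduce to simple arithmetic over a deployment log. The sketch below assumes a hypothetical `Deployment` record with a finish time, a failure flag, and a recovery time; it also assumes a non-empty log with at least one recovered failure.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Deployment:
    """Hypothetical log entry for one production deployment."""
    finished: datetime
    failed: bool = False
    recovered: Optional[datetime] = None  # set once service is restored


def change_failure_rate(deployments: List[Deployment]) -> float:
    """Failed deployments divided by total deployments."""
    return sum(d.failed for d in deployments) / len(deployments)


def mean_time_to_recover(deployments: List[Deployment]) -> timedelta:
    """Average time from a failed deployment to recovery."""
    outages = [
        d.recovered - d.finished
        for d in deployments
        if d.failed and d.recovered is not None
    ]
    return sum(outages, timedelta()) / len(outages)
```

For example, four deployments with two failures that took 30 and 90 minutes to recover give a change failure rate of 0.5 and an MTTR of 60 minutes.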