2. Outline
• What is DevOps?
• What are the implications of DevOps practices
on system structure?
– Team practices.
– Deployment practices.
NICTA Copyright 2012
From imagination to impact
3. What is DevOps?
• “DevOps is a software development method that stresses
communication, collaboration, and integration between software
developers and IT professionals” – Wikipedia
• “DevOps is a new term describing what has also been called ‘agile
system administration’ or ‘agile operations’ joined together with the
values of agile collaboration between development and operations
staff.
Effectively, you can define DevOps as system administrators
participating in an agile development process alongside developers
and using many of the same agile techniques for their systems
work.” – http://theagileadmin.com/what-is-devops/
4. What is DevOps - 2
• DevOps is accompanied by a certain amount of
mysticism.
– “Be Self-Aware
– Be aware of a project’s maturity
– Be aware of others”
http://architects.dzone.com/articles/zen-and-art-collaborative
• Similar to the early days of agile.
5. What problem is DevOps trying to solve?
• Poor communication between developers and
operations personnel
• Slow release schedule
• Limited capacity of operations staff
• Limited organizational insight into operations
6. My Take on DevOps
• DevOps is a set of practices intended to
– Reduce management overhead
– Speed up deployment
– Move some (formerly) IT responsibilities to developers
– Increase communication between developers and operations
– Reduce operations costs
• These practices have implications on
– Team size, communication, and responsibilities.
– Deployment
• In turn, there are implications for
– System structure
7. Outline
• What is DevOps?
• What are the implications of DevOps practices
on system structure?
– Team practices.
– Deployment practices.
9. Team Size
• Teams are small. Amazon has a “two pizza”
rule.
• It is easy for small teams to have good internal
coordination.
• Small teams mean
– A lot of teams
– Small scope for each team
– Short delivery times
– Coordination among teams becomes an issue
10. Coordination among teams
• Asynchronous rather than synchronous
– Allows team members to respond when it is
convenient for them
– Avoids time zone coordination
• Persistent and visible
– E-mail is not generally visible to all of the team
– Chat boards, Wikis, issue trackers, comments in code
are all persistent and visible
– Connect a message to something – issue, feature,
person.
11. Team Responsibilities with respect to Services
• Requirements are sliced thinly both horizontally (breadth
of requirement) and vertically (decomposition of service
into utilities)
• Each service has an owner (a developer)
• Service owner decides when to deploy service to
production. Deployment done with tooling.
• Deployment may involve use of canaries (discussed with
deployment)
• When a service is deployed, service owner examines
monitoring data and decides when/if to roll back.
• Service owner is called if there is a problem.
12. Structural Implications of Team Practices
• Conway’s Law (1968)
– The structure of a system reflects the structure of the
organization that constructed the system.
• DevOps advocates
– Small teams
– Mostly independent teams
• Conway’s Law & many small, mostly
independent teams => Service Oriented
Architecture with
– Many services with small scope of each service
– Loose coupling between services
13. Outline
• What is DevOps?
• What are the implications of DevOps practices
on system structure?
– Team practices.
– Deployment practices.
14. Deployment Practices
• Deployment can be either an initial deployment
or an upgrade of an existing system.
• We will discuss
– Upgrade
– Continuous deployment
– Roll back
15. Deployment tools
• Have “recipes” for standard configurations
• Moving outside standard configurations may
introduce errors
• Recipes managed by DevOps group
• Configuration specification is version controlled
• Leads to “scripts are code too” mentality
– Development
– Staging
– Deployment
• Goal is to support developer’s ability to
automatically deploy
16. Upgrade
• How many at once?
– One at a time (rolling upgrade)
– Groups at a time (staged upgrade, e.g. canaries)
– All at once (big flip)
• Big flip requires double the number of resources.
Infeasible in an environment that uses large
numbers of resources.
• Standard practice is rolling upgrade, possibly
with canaries.
17. Rolling Upgrade Process
• Suppose there are 100s or 1000s of
instances of an application running in the
cloud.
• Then it is too expensive to make a copy
of a new version while leaving the old
version running with all of its instances.
• The solution is to install the new version
on one server at a time – called a rolling
upgrade
• The figure shows an example of a
process for a rolling upgrade.
• This process is implemented by a
deployment tool.
[Figure: rolling upgrade process, in order: Update Auto Scaling Group; Sort Instances; Confirm Upgrade Spec; Remove & Deregister Old Instance from ELB; Terminate Old Instance; Wait for ASG to Start New Instance; Register New Instance with ELB]
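The rolling upgrade steps can be sketched in a few lines of Python. This is only an illustrative simulation under stated assumptions: `LoadBalancer` and `rolling_upgrade` are hypothetical names, the fleet is an in-memory dictionary, and starting a replacement instance is modelled as a plain assignment rather than a call to any real cloud API.

```python
from dataclasses import dataclass, field


@dataclass
class LoadBalancer:
    """Hypothetical stand-in for an ELB: tracks registered instance ids."""
    registered: set = field(default_factory=set)


def rolling_upgrade(instances, balancer, new_version):
    """Replace instances one at a time so capacity drops by at most one.

    `instances` maps instance id -> version string.
    """
    next_id = max(instances) + 1
    # Sort instances so the order of replacement is deterministic.
    for old_id in sorted(instances):
        if instances[old_id] == new_version:
            continue  # already upgraded
        # Remove and deregister the old instance, then terminate it.
        balancer.registered.discard(old_id)
        del instances[old_id]
        # Stand-in for waiting until the scaling group starts a replacement.
        instances[next_id] = new_version
        # Register the new instance with the load balancer.
        balancer.registered.add(next_id)
        next_id += 1
    return instances
```

At every point in the loop at most one instance is out of service, which is the property that makes a rolling upgrade cheaper than a big flip.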
18. Upgrading a service within the service hierarchy
• Suppose we are doing a rolling upgrade at Service level N+1
• Version B assumes new features from Service level N+2
[Figure: service hierarchy during the upgrade, showing Service level N, Service level N+1 instances at version A and version B, and Service level N+2 instances]
19. Staging Upgrades
• Service level N+2 must be activated before
activating service level N+1.
• Distinction between upgrading and activating.
Upgrades can occur at any time as long as they
are not activated.
• Structural implication
– Upgrades can be activated through software
switches. Could use Zookeeper for coordinating
active versions.
– Activates all of the instances at (essentially) the same
time.
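The distinction between upgrading and activating can be sketched as follows. This is a minimal illustration, not a real Zookeeper client: `SwitchStore` is a hypothetical in-memory stand-in for the coordination service, and the service key `"level_n_plus_1"` and version labels are made up for the example.

```python
class SwitchStore:
    """In-memory stand-in for a coordination service such as Zookeeper."""

    def __init__(self):
        self._active = {}

    def activate(self, service, version):
        self._active[service] = version

    def active_version(self, service, default):
        return self._active.get(service, default)


def handle_request(store, installed_versions):
    """An instance serves with the active version, not the newest installed."""
    active = store.active_version("level_n_plus_1", default="A")
    if active not in installed_versions:
        raise RuntimeError(f"version {active} is active but not installed")
    return f"served by version {active}"
```

Because every instance reads the same switch, version B can be installed everywhere ahead of time and then activated for all instances with a single write.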
20. Upgrades can fail
• Functionally incorrect
• Incompatibility with other portions of the
application or infrastructure
• Resource limitations
• Configuration inconsistency
21. After failure is detected
• Turn off new features in level N+1 and its clients
(level N).
• May need to propagate to top of hierarchy.
• Structural implications
– Features are software switchable
– Require all versions to be backward compatible with previous
versions. If new version on level N+1 is switched off, do not need
to worry about level N+2.
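Switching off a failed feature and propagating that decision to its clients can be sketched as a small graph walk. The function name `deactivate`, the feature labels, and the `dependents` mapping are assumptions made for illustration; a real system would keep this state in its coordination service.

```python
def deactivate(feature, active, dependents):
    """Switch off `feature` and every active feature that depends on it.

    `active` is a set of active feature names; `dependents` maps a feature
    to the client features (one level up) that rely on it.
    """
    pending = [feature]
    switched_off = []
    while pending:
        current = pending.pop()
        if current in active:
            active.remove(current)
            switched_off.append(current)
            pending.extend(dependents.get(current, []))
    return switched_off
```

Note that the walk only goes upward to clients. The level below the failed feature stays active, which is safe as long as it remains backward compatible.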
22. Canaries
• Canaries are a small number of instances of a new version,
deployed to perform live testing in a production environment.
• A/B testing is similar to canaries in that a subset of users is
exposed to a new version in production. It differs in that A/B testing
compares two proposed new interfaces, whereas canaries test new
functionality.
23. Canary Issues
• Canaries are a form of live testing. Put a new
version into limited production to test its
correctness.
• Issues
– How long are new versions tested
to determine correctness?
  · Period based – for some period of time
  · Load based – under some utilization assumptions
  · Result based – until some criterion is met
– How are clients of new version chosen and how is
this choice enforced?
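Both canary issues can be sketched briefly. The following is one possible approach, not a prescribed one: clients are assigned to the canary by hashing a client identifier, and a result-based criterion decides whether the canary has passed. The function names, the percentage bucket scheme, and the thresholds are all assumptions for illustration.

```python
import hashlib


def routes_to_canary(client_id, canary_percent):
    """Deterministically assign a fixed share of clients to the canary."""
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < canary_percent


def canary_passes(requests, errors, min_requests, max_error_rate):
    """Result-based criterion: enough traffic observed and few enough errors."""
    if requests < min_requests:
        return False  # not enough evidence yet
    return errors / requests <= max_error_rate
```

Hashing keeps the choice stable, so a given client sees the same version on every request, and the choice is enforced wherever the routing function is applied.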
24. Continuous Deployment
• Puts deployment decisions into developers’ hands.
• May mean simultaneous deployment from independent
teams. Some organizations report dozens of
deployments a day.
• Deployment tool must set configuration information, e.g.
in Zookeeper, so that services know what features are
currently active.
25. Structural Implications of Continuous Deployment
• Packaging
• Maintaining Backward Compatibility
26. Packaging
• Two dimensions
– Flat vs deep service hierarchy
– One service per virtual machine vs many services per
virtual machine
27. Flat vs Deep Service Hierarchy
• Trading off independence of teams and
possibilities for reuse.
• Flat Service Hierarchy
– Limited dependence among services & limited
coordination needed among teams
– Difficult to reuse services
• Deep Service Hierarchy
– Provides possibility for reusing services
– Requires coordination among teams to discover
reuse possibilities.
28. Services per VM Image
[Figure: two packaging options. One service per VM: a service is developed and embedded in its own VM image. Multiple services per VM: Service 1 and Service 2 are each developed and embedded in the same VM image.]
29. One Possible Race Condition with Multiple Services per VM
• Initial state: VM image with Version N of Service 1 and Version N of Service 2
• Events in time order (VN+1|VN denotes Version N+1 of Service 1 and Version N of Service 2)
– Developer 1: build new image with VN+1|VN
– Developer 1: begin provisioning process with new image
– Developer 2: build new image with VN|VN+1
– Developer 2: begin provisioning process with new image, without new version of Service 1
• Results in Version N+1 of Service 1 not being deployed until the next build of the VM image
• Could be prevented by VM image build tool
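One way a build tool could prevent this race is to own the record of the latest version of every service, so that no image is ever built from a developer's stale view. The sketch below assumes that design; `ImageBuilder` and its methods are hypothetical, and an image is modelled as a plain dictionary of service versions.

```python
import threading


class ImageBuilder:
    """Hypothetical build tool that owns the latest version of each service."""

    def __init__(self, versions):
        self._latest = dict(versions)
        self._lock = threading.Lock()

    def build(self, service, new_version):
        """Record the new version, then build from the latest of everything."""
        with self._lock:
            self._latest[service] = new_version
            return dict(self._latest)  # contents of the new image
```

With this arrangement the second developer's image contains the first developer's new version as well, because both builds read from the same serialized record.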
30. Another Possible Race Condition with Multiple Services per VM
• Initial state: VM image with Version N of Service 1 and Version N of Service 2
• Events in time order (VN+1|VN denotes Version N+1 of Service 1 and Version N of Service 2)
– Developer 1: build new image with VN+1|VN
– Developer 2: build new image with VN+1|VN+1
– Developer 1: begin provisioning process with new image, which overwrites the image created by Developer 2
– Developer 2: begin provisioning process with new image
• Results in Version N+1 of Service 2 not being deployed until the next build of the VM image
• Could be prevented by provisioning tool
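A provisioning tool could prevent this race by refusing to replace the stored image with an older build. The sketch below assumes that each image carries a build number that increases with build time; `ImageStore` and `publish` are hypothetical names, not part of any real provisioning tool.

```python
class ImageStore:
    """Hypothetical store holding the image that new VMs are provisioned from."""

    def __init__(self):
        self.current_build = 0
        self.current_image = None

    def publish(self, build_number, image):
        """Accept an image only if it is newer than the one already stored."""
        if build_number <= self.current_build:
            return False  # stale image: do not overwrite a newer build
        self.current_build = build_number
        self.current_image = image
        return True
```

In the scenario above, Developer 1's image would have the lower build number, so its late publication would be rejected and Developer 2's image would survive.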
31. Trade-offs
• One service per VM
– Message from one service to another must go
through inter-VM communication mechanism – adds
latency
– No possibility of race condition
• Multiple services per VM
– Inter-VM communication requirements reduced –
reduces latency
– Adds possibility of race condition caused by
simultaneous deployment
32. Motivating Backward Compatibility
• New version of a service may be introduced at
any time
• Existing clients of that service should not have to
be changed
• Require APIs and DB schemas to be backward
compatible.
33. Achieving Backwards Compatibility
• APIs and DB schemas can be extended but
must always be backward compatible.
• Leads to a translation layer
[Figure: layered structure, top to bottom]
– Clients
– External APIs (unchanging but with ability to extend or add new ones)
– Translation to internal APIs
– Internal APIs (changes require changes to translation layer but do not propagate further)
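The translation layer can be illustrated with a small sketch. Everything here is hypothetical: the function names, the record fields, and the sample data are invented to show the idea that an internal change is absorbed by the translation code while the external API keeps its shape.

```python
def internal_get_user_v2(user_id):
    """Internal API after a refactoring: the name is now split in two fields."""
    return {"id": user_id, "given_name": "Ada", "family_name": "Lovelace"}


def external_get_user(user_id):
    """External API: clients keep receiving the `name` field they rely on."""
    record = internal_get_user_v2(user_id)
    return {
        "id": record["id"],
        "name": f"{record['given_name']} {record['family_name']}",
        # Extension: a new optional field that old clients can ignore.
        "family_name": record["family_name"],
    }
```

Existing clients that read only `id` and `name` are unaffected by the internal change, and the added field shows how the external API can be extended without breaking them.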
34. Summary
• DevOps is a collection of practices that have
implications on system structure.
– These practices can be categorized into
• Team practices
• Deployment practices
• Some structural implications are
– Loosely coupled systems with deep hierarchy of
services
– Version aware
– Backward compatible
– Packaging services per VM