2. NICTA Copyright 2012
From imagination to impact
2
About NICTA
National ICT Australia
•Federal and state funded research company established in 2002
•Largest ICT research resource in Australia
•National impact is an important success metric
•~700 staff/students working in 5 labs across major capital cities
•22 university partners
•Providing R&D services, knowledge transfer to Australian (and global) ICT industry
NICTA technology is in over 1 billion mobile phones
3. NICTA Copyright 2012
From imagination to impact
“This project is vital to our company. How long will it take?”
3
Day 1
4. NICTA Copyright 2012
From imagination to impact
“Its taking too long!!! ”
4
Day 30
6. NICTA Copyright 2012
From imagination to impact
Where Does the Time Go?
•As Software Architects our view is that there are the following activities in software development
–Concept
–Requirements
–Design
–Implementation
–Test
•Code Complete
•Different methodologies will organize these activities in different ways.
•Agile focuses on getting to Code Complete faster than with other methods.
6
7. NICTA Copyright 2012
From imagination to impact
What is wrong?
•Code Complete Code in Production
•Between the completion of the code and the placing of the code into production is a step called: Deployment
•Deploying completed code can be very time consuming
7
8. NICTA Copyright 2012
From imagination to impact
Why is Deployment so Time Consuming?
•Errors in deployed code are a major source of outages.
•So much so that organizations have formal release plans.
•There is a position called a “Release Engineer” that has responsibility for managing releases.
8
9. NICTA Copyright 2012
From imagination to impact
Release plan
1.Define and agree release and deployment plans with customers/stakeholders.
2.Ensure that each release package consists of a set of related assets and service components that are compatible with each other.
3.Ensure that integrity of a release package and its constituent components is maintained throughout the transition activities and recorded accurately in the configuration management system.
4.„Ensure that all release and deployment packages can be tracked, installed, tested, verified, and/or uninstalled or backed out, if appropriate.
5.„Ensure that change is managed during the release and deployment activities.
6.„Record and manage deviations, risks, issues related to the new or changed service, and take necessary corrective action.
7.„Ensure that there is knowledge transfer to enable the customers and users to optimise their use of the service to support their business activities.
8.„Ensure that skills and knowledge are transferred to operations and support staff to enable them to effectively and efficiently deliver, support and maintain the service, according to required warranties and service levels
*http://en.wikipedia.org/wiki/Deployment_Plan
9
10. NICTA Copyright 2012
From imagination to impact
Look at one requirement
2.Ensure that each release package consists of a set of related assets and service components that are compatible with each other.
–Every development team contributing to the release must have completed their code
–Every development team must have used the same version of every supporting library
–The development teams must have agreed on a common set of supporting technologies
•Every item requires coordination among developers
–Meetings
–Documents
•I.E. Time
10
11. NICTA Copyright 2012
From imagination to impact
How to Speed Up Deployment
11
•Set up a process and an architecture so that development teams do not need to coordinate with each other
•Support “partial release” deployments.
•This is called: continuous deployment
•Code Complete Code in Production
12. NICTA Copyright 2012
From imagination to impact
Continuous Deployment Pipeline
12
Developer pushes a button and, as long as all of the automated tests are passed, the code is placed into production automatically through a tool chain.
•No coordination with other teams during the execution of the tool chain
•No dependence on other teams activities The ability to have a continuous deployment pipeline depends on the architecture of the system being deployed.
13. NICTA Copyright 2012
From imagination to impact
~2002 Amazon instituted the following design rules - 1
•All teams will henceforth expose their data and functionality through service interfaces.
•Teams must communicate with each other through these interfaces.
•There will be no other form of inter-process communication allowed: no direct linking, no direct reads of another team’s data store, no shared-memory model, no back-doors whatsoever. The only communication allowed is via service interface calls over the network.
13
14. NICTA Copyright 2012
From imagination to impact
Amazon design rules - 2
•It doesn’t matter what technology they[services] use.
•All service interfaces, without exception, must be designed from the ground up to be externalizable.
•Amazon is providing the specifications for what has come to be called “Microservice Architecture”.
•(Its really an architectural style).
14
15. NICTA Copyright 2012
From imagination to impact
In Addition
•Amazon has a “two pizza” rule.
•No team should be larger than can be fed with two pizzas (~7 members).
•Each (micro) service is the responsibility
of one team
•This means that microservices are
small and intra team bandwidth
is high
•Large systems are made up of many microservices.
•There may be as many as 140 in a typical Amazon page.
15
16. NICTA Copyright 2012
From imagination to impact
Microservice architecture supports continuous deployment
•Two topics:
–What is microservice architecture?
–What are the deployment issues and how do I deal with them?
16
17. NICTA Copyright 2012 From imagination to impact
Micro service architecture
17
Service
• Each user request is
satisfied by some sequence
of services.
• Most services are not
externally available.
• Each service communicates
with other services through
service interfaces.
• Service depth may be 70,
e.g. LinkedIn
18. NICTA Copyright 2012
From imagination to impact
Relation of teams and services
•Each service is the responsibility of a single development team
•Individual developers can deploy new version without coordination with other developers.
•It is possible that a single development team is responsible for multiple services
18
19. NICTA Copyright 2012
From imagination to impact
Coordination model of microservice architecture
•Elements of service interaction
–Services communicate asynchronously through message passing
–Each service could (in principle) be deployed anywhere on the net.
•Latency requirements will probably force particular deployment location choices.
•Services must discover location of dependent services.
–State must be managed
19
20. NICTA Copyright 2012
From imagination to impact
Service discovery
20
•When an instance of a service is launched, it registers with a registry/load balancer
•When a client wishes to utilize a service, it gets the location of an instance from the registry/load balancer.
•Eureka is an open source registry/load balancer
Instance of a service
Client
Register
Invoke
Registry/
load balancer
Query registry
21. NICTA Copyright 2012
From imagination to impact
Subtleties of registry/load balancer
•When multiple instances of the same service have registered, the load balancer can rotate through them to equalize number of requests to each instance.
•Each instance must renew its registration periodically (~90 seconds) so that load balancer does not schedule message to failed instance.
•Registry can keep other information as well as address of instance. For example, version number of service instance.
21
22. NICTA Copyright 2012
From imagination to impact
State management
•Services can be stateless or stateful
–Stateless services
•Allow arbitrary creation of new instances for performance and availability
•Allow messages to be routed to any instance
•State must be provided to stateless services
–Stateful services
•Require clients to communicate with same instance
•Reduces overhead necessary to acquire state
22
23. NICTA Copyright 2012
From imagination to impact
Where to keep the state?
•Persistent state is kept in a database
–Modern database management systems (relational) provide replication functionality
–Some NoSQL systems may be replicated. Others will require manual replication.
•Transient small amounts of state can be kept consistent across instances by using tools such as Memcached or Zookeeper.
•Instances may cache state for performance reasons. It may be necessary to purge the cache before bringing down an instance.
23
24. NICTA Copyright 2012
From imagination to impact
Provisioning new instances
•When the desired workload of a service is greater than can be provided by the existing number of instances of that service, new instances can be instantiated (at runtime).
•Four possibilities for initiating new instance of a service:
1.Client. Client determines whether service is adequately provisioned for its needs based on service SLA and services current workload.
2.Service. Service determines whether it is adequately provisioned based on number of requests it expects from clients.
3.Registry/load balancer determines appropriate number of instances of a service based on SLA and client instance requests.
4.External entity can initiate creation of new instances
24
25. NICTA Copyright 2012
From imagination to impact
Questions about Micro SOA
•/Q/ Isn’t it possible that different teams will implement the same functionality, likely differently?
•/A/ Yes, but so what? Major duplications are avoided through assignment of responsibilities to services. Minor duplications are the price to be paid to avoid necessity for synchronous coordination.
•/Q/ what about transactions?
•/A/ Micro SOA privileges flexibility above reliability and performance. Transactions are recoverable through logging of service interactions. This may introduce some delays if failures occur.
25
26. NICTA Copyright 2012
From imagination to impact
Microservice architecture supports continuous deployment
•Two topics:
–What is microservice architecture?
–What are the deployment issues and how do I deal with them?
26
27. NICTA Copyright 2012 From imagination to impact
Deploying a new version of an application
27
Multiple instances
of a application are
executing
• Red is service being
replaced with new version
• Blue are clients
• Green are dependent
services
VVA B VV B B
UAT / staging /
performance
tests
28. NICTA Copyright 2012
From imagination to impact
Deployment goal and constraints
•Goal of a deployment is to move from current state (N instances of version A of a app) to a new state (N instances of version B of a app)
•Constraints:
–Any development team can deploy their app at any time. I.e. New version of a app can be deployed either before or after a new version of a client. (no synchronization among development teams)
–It takes time to replace one instance of version A with an instance of version B (order of minutes)
–Service to clients must be maintained while the new version is being deployed.
28
29. NICTA Copyright 2012
From imagination to impact
Deployment strategies
•Two basic all of nothing strategies
–Red/Black – leave N instances with version A as they are, allocate and provision N instances with version B and then switch to version B and release instances with version A.
–Rolling Upgrade – allocate one instance, provision it with version B, release one version A instance. Repeat N times.
•Partial strategies are canary testing and A/B testing.
29
30. NICTA Copyright 2012
From imagination to impact
Trade offs – Red/Black and Rolling Upgrade
•Red/Black
–Only one version available to the client at any particular time.
–Requires 2N instances (additional costs)
•Rolling Upgrade
–Multiple versions are available for service at the same time
–Requires N+1 instances.
•Rolling upgrade is widely used.
30
Update Auto Scaling Group
Sort Instances
Remove & Deregister Old Instance from ELB
Confirm Upgrade Spec
Terminate Old Instance
Wait for ASG to Start New Instance
Register New Instance with ELB
Rolling
Upgrade in EC2
31. NICTA Copyright 2012
From imagination to impact
Types of failures during rolling upgrade
Rolling Upgrade Failure
Provisioning
See references at end
Logical failure
Inconsistencies to be discussed
Instance failure
Handled by Auto Scaling Group in EC2
31
32. NICTA Copyright 2012
From imagination to impact
What are the problems with Rolling Upgrade?
•Any development team can deploy their app at any time.
•Three concerns
–Maintaining consistency between different versions of the same app when performing a rolling upgrade
–Maintaining consistency among different apps
–Maintaining consistency between an app and persistent data
32
33. NICTA Copyright 2012
From imagination to impact
Maintaining consistency between different versions of the same app
•Key idea – differentiate between installing a new version and activating a new version
•Involves “feature toggles” (described momentarily)
•Sequence
–Develop version B with new code under control of feature toggle
–Install each instance of version B with the new code toggled off.
–When all of the instances of version A have been replaced with instances of version B, activate new code through toggling the feature.
33
34. NICTA Copyright 2012
From imagination to impact
Issues
•What is a feature toggle?
•How do I manage features that extend across multiple apps?
•How do I activate all relevant instances at once?
34
35. NICTA Copyright 2012
From imagination to impact
Feature toggle
•Place feature dependent new code inside of an “if” statement where the code is executed if an external variable is true. Removed code would be the “else” portion.
•Used to allow developers to check in uncompleted code. Uncompleted code is toggled off.
•During deployment, until new code is activated, it will not be executed.
•Removing feature toggles when a new feature has been committed is important.
35
36. NICTA Copyright 2012
From imagination to impact
Multi app features
•Most features will involve multiple apps.
•Each app has some code under control of a feature toggle.
•Activate feature when all instances of all apps involved in a feature have been installed.
–Maintain a catalog with feature vs service version number.
–A feature toggle manager determines when all old instances of each version have been replaced. This could be done using registry/load balancer.
–The feature manager activates the feature.
–Archaius is an open source feature toggle manager.
36
37. NICTA Copyright 2012
From imagination to impact
Activating feature
•The feature toggle manager changes the value of the feature toggle. Two possible techniques to get new value to instances.
–Push. Broadcasting the new value will instruct each instance to use new code. If a lag of several seconds between the first service to be toggled and the last can be tolerated, there is no problem. Otherwise synchronizing value across network must be done.
–Pull. Querying the manager by each instance to get latest value may cause performance problems.
•A coordination mechanism such as Zookeeper will overcome both problems.
37
38. NICTA Copyright 2012
From imagination to impact
Maintaining consistency across versions (summary)
•Install all instances before activating any new code
•Use feature toggles to activate new code
•Use feature toggle manager to determine when to activate new code
•Use Zookeeper to coordinate activation with low overhead
38
39. NICTA Copyright 2012
From imagination to impact
Maintaining consistency among different services
•Use case:
–Wish to deploy new version of app A without coordinating with development team for clients of app A.
•I.e. new version of app A should be backward compatible in terms of its interfaces.
•May also require forward compatibility in certain circumstances, e.g. rollback
39
40. NICTA Copyright 2012
From imagination to impact
Maintaining consistency between an app and persistent data
•Assume new version is correct. Rollback discussed in a minute.
•Inconsistency in persistent data can come about because data schema or semantics change.
•Effect can be minimized by the following practices (if possible).
–Only extend schema – do not change semantics of existing fields. This preserves backwards compatibility.
–Treat schema modifications as features to be toggled. This maintains consistency among various apps that access data.
40
41. NICTA Copyright 2012
From imagination to impact
Summary of consistency discussion so far.
•Feature toggles are used to maintain consistency within instances of an app
•Disallowing modification of schema will maintain consistency between apps and persistent data.
41
42. NICTA Copyright 2012
From imagination to impact
Canary testing
•Canaries are a small number of instances of a new version placed in production in order to perform live testing in a production environment.
•Canaries are observed closely to determine whether the new version introduces any logical or performance problems. If not, roll out new version globally. If so, roll back canaries.
•Named after canaries
in coal mines.
42
43. NICTA Copyright 2012
From imagination to impact
Implementation of canaries
•Designate a collection of instances as canaries. They do not need to be aware of their designation.
•Designate a collection of customers as testing the canaries. Can be, for example
–Organizationally based
–Geographically based
•Then
–Activate feature or version to be tested for canaries. Can be done through feature activation synchronization mechanism
–Route messages from canary customers to canaries. Can be done through making registry/load balancer canary aware.
43
44. NICTA Copyright 2012
From imagination to impact
A/B testing
•Suppose you wish to test user response to a system variant. E.g. UI difference or marketing effort. A is one variant and B is the other.
•You simultaneously make available both variants to different audiences and compare the responses.
•Implementation is the same as canary testing.
44
45. NICTA Copyright 2012
From imagination to impact
Rollback
•New versions of an app may be unacceptable either for logical or performance reasons.
•Two options in this case
•Roll back (undo deployment)
•Roll forward (discontinue current deployment and create a new release without the problem).
•Decision to rollback or roll forward is almost never automated because there are multiple factors to consider.
•Forward or backward recovery
•Consequences and severity of problem
•Importance of upgrade
45
46. NICTA Copyright 2012
From imagination to impact
Summary
•Speeding up deployment time will reduce time to market
•Continuous deployment is a technique to speed up deployment time
•Microservice architecture is designed for minimizing coordination needs and allowing independent deployment
•Multiple simultaneous versions managed with feature toggles.
•Feature toggles support rollback, canary testing, and A/B testing.
46
47. NICTA Copyright 2012
From imagination to impact
NICTA team
•Liming Zhu
•Ingo Weber
•Min Fu
•Sherry Xu
•Daniel Sun
•Ah Binh Tran
•Chao Li
47
48. NICTA Copyright 2012
From imagination to impact
More Information
Contact len.bass@nicta.com.au
Book is due out May, 2015
Research papers:
ssrg.nicta.com.au’projects/cloud
48