MIGRATING TO CONTINUOUS DELIVERY IN THE WORLD OF FINANCIAL TRADING
IG Group is a global leader in online retail trading. It has over 120,000 active users around the world, and serves over 4.5 million trades per month. IG has had an online presence since the early 2000s and has grown sizeable IT real estate to support its many users and activities. Like many large companies, IG has depended on monthly big bang release cycles with long regression periods to guarantee the quality and reliability of its software.
In 2013 IG started to move towards a continuous delivery model. This allowed us to break away from an ever-increasing monthly software release, and to enable higher quality and stability by releasing small change sets into production more frequently.
In the world of financial trading, uptime and reliability are carefully monitored by financial regulators around the globe. In this talk, we look at the approaches and techniques IG have used to move to a continuous delivery model. Their API layer serves upwards of 40k requests per minute and is connected to numerous web, mobile, and public API clients. We will explore the technical and organisational challenges faced along the way, as well as some of the unexpected benefits.
28. Master: v1.0.0-SNAPSHOT
1. New feature branch
2. Developers commit to feature branch
3. Pull request
4. Merge to master
5. Bamboo checks out a new branch: release-1.0.0
6. If the build passes, a tag is created: 1.0.0
29. Master: v1.0.0-SNAPSHOT
1. New feature branch
2. Developers commit to feature branch
3. Pull request
4. Merge to master
5. Bamboo checks out a new branch: release-1.0.0
6. If the build passes, a tag is created: 1.0.0
7. The new branch and tag are pushed to origin
32. Developer merges code
Bamboo tags the code and builds it
Runs automated tests
Deploys to dark test environment
Runs automated tests
MANUAL STOP
Flip dark to light in test
MANUAL STOP
Deploy to UAT dark
Thank you for coming to listen
CD is becoming an increasingly popular and commonplace term
Part of the DevOps movement
Talking about:
Why CD is essential
The challenges of implementing it in a FinTech company
Some lessons from our real world implementation
I’m David Genn
Tech lead of our API team
We provide web and mobile based platforms to allow retail customers to trade the financial markets
We have ~120k customers across the globe
Process in excess of 4 million trades a month
To really understand why Continuous Delivery is something we need to take seriously, we need to consider the principles behind how we build software
The Lean movement has helped shape the way the most effective development teams work
I’d like to recommend this book as a great starting point for understanding it
“This is Lean” by Niklas Modig and Pär Åhlström
Book starts with an anecdote about seeing your doctor
It tells the tale of a woman who wakes up and discovers a lump in her breast and is worried that she may have cancer
She goes to her doctor
Her doctor refers her to a specialist
2 weeks later she goes to see the specialist who tells her that she needs to have a biopsy
10 days after that she goes to see the surgeon who can take the biopsy
The results are sent back to her original doctor who finally gives her a diagnosis 5 days later
Total time to get a diagnosis: 29 days
A second woman also wakes up one morning and discovers a lump
She goes to a specialist walk-in clinic where everyone she might need to see is on site
She sees a nurse who gives her a preliminary check
She immediately sees a specialist who sends her down the corridor to have the biopsy taken
She waits two hours whilst the biopsy results come back
She is given her diagnosis
Total time to get a diagnosis: 4 hours
The first process focussed on efficiently using resources
The doctors and nurses were kept completely busy because they had full waiting rooms of people to see
To ensure all the medical staff were used effectively, you need a separate appointment to see each specialist
This is called Resource Efficiency
The second approach focuses on the time it takes to get a diagnosis.
This means having everyone who may be needed for a diagnosis available on one site
You have no guarantee that every medical specialist will be fully utilised but you do guarantee a quick diagnosis
This is called Flow Efficiency
In software development we are still focussed on efficiently using our resources.
We want our development teams and our QA teams to be fully occupied
This means we want everyone working on delivering as many features as we can
Inevitably this means we build up large batches of features that need to be released
This approach does efficiently use our development and QA teams’ resources
However, it means we inevitably have to go slower as more co-ordination is needed and there are more hand-offs and queues between teams
This is a ‘smart motorway’ in the UK, which adjusts the speed limit depending on the volume of traffic
The more traffic, the slower they have to go to prevent traffic jams
It’s the same with software development – the more code being deployed, the more co-ordination you need and the slower you have to go
Large releases of code need large amounts of infrastructure to support them
We need to co-ordinate release into each environment
We need to co-ordinate regression testing, code freezes, bug fixes etc
We end up creating roles like Release Manager and Delivery Manager to keep things on track
And of course, large releases of code are much riskier as so much is changing
If one aspect of the release goes wrong, it can have a catastrophic effect on the other features being deployed
Rather than being like those large container ships, we need to be more like Amazon
They’ve started a 1 hour delivery service for certain goods in certain cities
They’re focussing on how quickly they can get a product to their customer, rather than how many things are being delivered at any one time
Amazon clearly don’t use large vehicles to do their Prime Now deliveries – they use cycle couriers
These guys can’t carry very much, but they can deliver very quickly
We need to deliver software more like a bike courier and less like a container ship
Too often we measure a development team’s effectiveness based on their velocity - this merely measures our resource efficiency
To measure our flow efficiency we need to measure our cycle time – how long does it take to go from having an idea or seeing a problem to getting the solution into prod
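A minimal sketch of measuring cycle time, assuming each work item records when the idea was raised and when the solution reached prod. The dates and the function name are made up for illustration; they are not figures from IG.

```python
from datetime import datetime
from statistics import median


def cycle_time_days(raised: datetime, in_production: datetime) -> float:
    """Elapsed days from an idea being raised to its solution being live."""
    if in_production < raised:
        raise ValueError("production date precedes the date the idea was raised")
    return (in_production - raised).total_seconds() / 86400


# Hypothetical work items: (idea raised, live in production)
items = [
    (datetime(2015, 3, 2), datetime(2015, 3, 4)),
    (datetime(2015, 3, 3), datetime(2015, 3, 10)),
    (datetime(2015, 3, 9), datetime(2015, 3, 12)),
]

times = [cycle_time_days(raised, live) for raised, live in items]
print(f"median cycle time: {median(times)} days")
```

The median is used rather than the mean so that one unusually slow item does not hide how long a typical change takes.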
The time it takes to deliver an idea into production is what gives you a competitive edge.
It doesn’t matter how many projects you deliver per year if your competitor can deliver ideas to customers more quickly.
It is your cycle time that will determine your success.
Facebook deploy features every week
Amazon deliver code into prod every 12 seconds
Google do 2 billion deploys every week
It’s no wonder that these companies dominate so successfully
Continuous Delivery is an approach that has been around for almost 10 years
It is a set of principles for helping teams deliver valuable software to production as soon as it’s ready and move away from the large batch release process
This infographic is a summary of Jez Humble and Dave Farley’s book on Continuous Delivery
Continuous Delivery in summary
As a Fintech company we face a number of challenges that we’re overcoming to allow us to get the benefits of Continuous Delivery
Uptime SLAs – our uptime is carefully monitored, which is essential given we’re handling clients’ money. Any changes to our release process need to guarantee we don’t impact our uptime.
Regulation / Auditors – IG’s operations are regulated by a number of different financial regulators around the globe. The key thing they are concerned about is that we have the correct processes in place to understand and control the change to each environment. We need to work with this process to ensure we continue to meet these requirements.
Legacy code – legacy code and monolithic applications can present significant technical challenges to achieving continuous delivery
Complex cross-team projects – these involve large amounts of co-ordination between teams, and big-batch releases have often been used as a mechanism for co-ordinating the release of code. This approach no longer works if each team is deploying as soon as they’re ready
Physical servers - Many companies that do CD successfully use the flexibility that the cloud offers them – the ability to provision new servers programmatically
We are tied to using our own data centres which means we have to build much of the infrastructure that you would get for free in the cloud
Developer attitude – CD puts all the responsibility for releasing code on to the development team. If they’ve been used to finishing their development and letting others worry about testing and deploying it then moving to CD will require a shift in attitude
What about the code freeze – the code freeze has always been used as a way of giving QA teams a stable, consistent environment to regression test the platform. This is no longer available if teams are deploying continually.
Those are the challenges many companies face, ourselves included, when looking to adopt CD.
Many companies use different technical approaches to implement CD but however you do it, there are some core principles
Every commit is a release candidate – every time you merge to master, you need to consider that code ready to go to prod. Gone are the days of coding for 2-4 weeks and then taking a ‘cut’ which you merge to a Release branch. Every time you merge to master you are saying that you’re confident your code is ready to go.
Trust your tests – if you’re deploying to prod frequently you can’t rely on long manual regression cycles. You need suites of automated tests that you trust and that give you confidence that your app performs as expected.
Automate everything – deploying frequently is only possible if it’s easy to do without making mistakes. This means you need to automate as much of the process as possible.
Separate deployment from release – there are two risky parts to putting new software live – deploying the new binary and then making the new functionality available. In a traditional release process we do both stages together. In CD we want to separate them. We want to be confident that the new binary is behaving itself before we allow it to serve live traffic.
We want to be able to test the new functionality before we switch it on
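One way to picture the split between deployment and release is a dark/light pair, as named on the slides. This is a sketch only, assuming "dark" means deployed but receiving no live traffic and "light" means serving it. The Environment class and its methods are hypothetical, not IG's tooling.

```python
class Environment:
    """Hypothetical environment holding one light and one dark version."""

    def __init__(self, light_version: str) -> None:
        self.light = light_version  # serves live traffic
        self.dark = None            # deployed, receives no live traffic

    def deploy_dark(self, version: str) -> None:
        """Deployment: install the new binary without exposing it."""
        self.dark = version

    def flip(self) -> None:
        """Release: swap dark and light so the new version serves traffic."""
        if self.dark is None:
            raise RuntimeError("nothing deployed to dark")
        self.light, self.dark = self.dark, self.light

    def serving(self) -> str:
        """Version that live traffic currently reaches."""
        return self.light


test_env = Environment(light_version="1.0.0")
test_env.deploy_dark("1.0.1")
print(test_env.serving())  # deployment alone does not change what users see
test_env.flip()
print(test_env.serving())  # the flip is the release
```

In this sketch the previous version stays on the dark side after a flip, so flipping again would put it back in front of live traffic.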
Those are the principles we’ve focussed on.
Here is an overview of how we’ve implemented it.
Firstly, our Git branching model.
As we’ve already mentioned, in CD, every commit to master is an RC
Developers work on a feature branch
1 or more devs will commit to that branch
When the feature is complete they raise a PR
When that PR is approved it is merged directly to master
This triggers a Bamboo build which creates a branch release-1.0.0
If the build passes it creates a tag 1.0.0 and publishes the binary to Nexus
The branch and tag are pushed to origin and the binary is deployed
If a bug is found, you would typically fix it by following the same process – create a new feature branch and merge it to master, creating a new tag to be released.
If this isn’t possible, you can create a hotfix branch off the deployed tag.
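The branching steps above can be sketched as a small planning function. It assumes the release version is the master version with the -SNAPSHOT suffix removed, which matches the 1.0.0 example on the slides. The function names are hypothetical, and the commands are only listed, not executed. The actual Bamboo configuration, the Nexus publish and the deployment are not covered.

```python
SNAPSHOT_SUFFIX = "-SNAPSHOT"


def release_version(snapshot_version: str) -> str:
    """Turn 'v1.0.0-SNAPSHOT' or '1.0.0-SNAPSHOT' into '1.0.0'."""
    version = snapshot_version[1:] if snapshot_version.startswith("v") else snapshot_version
    if not version.endswith(SNAPSHOT_SUFFIX):
        raise ValueError(f"not a snapshot version: {snapshot_version}")
    return version[: -len(SNAPSHOT_SUFFIX)]


def release_commands(snapshot_version: str, build_passed: bool) -> list:
    """List the git commands for a merge to master, in order."""
    version = release_version(snapshot_version)
    branch = f"release-{version}"
    commands = [["git", "checkout", "-b", branch]]
    if build_passed:
        # The tag is only created, and branch and tag only pushed, on a green build.
        commands += [
            ["git", "tag", version],
            ["git", "push", "origin", branch],
            ["git", "push", "origin", "tag", version],
        ]
    return commands


for command in release_commands("1.0.0-SNAPSHOT", build_passed=True):
    print(" ".join(command))
```

Returning the commands as data rather than running them keeps the sketch safe to try outside a repository.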
Another principle of CD is the ability to separate the deployment of a new application from the release of the new functionality
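The pipeline on the slides (build, test, deploy to dark, test again, then manual stops before the flip and before UAT) can be sketched as an ordered list of stages. This is an illustration under assumptions: the Stage type, run_pipeline and the placeholder step are hypothetical, and real stages would call the build server and deployment tooling.

```python
from typing import Callable, List, NamedTuple


class Stage(NamedTuple):
    name: str
    action: Callable[[], bool]  # returns True on success
    manual_stop: bool = False   # wait for a person before running


def run_pipeline(stages: List[Stage], approve: Callable[[str], bool]) -> List[str]:
    """Run stages in order and return the names of those that completed."""
    completed = []
    for stage in stages:
        if stage.manual_stop and not approve(stage.name):
            break  # nobody has signed off, so the pipeline waits here
        if not stage.action():
            break  # a build, test or deploy step failed
        completed.append(stage.name)
    return completed


def ok() -> bool:
    return True  # placeholder for a real build, test or deploy step


PIPELINE = [
    Stage("tag and build", ok),
    Stage("automated tests", ok),
    Stage("deploy to dark test", ok),
    Stage("automated tests after dark deploy", ok),
    Stage("flip dark to light in test", ok, manual_stop=True),
    Stage("deploy to UAT dark", ok, manual_stop=True),
]

# Approve the flip in test but not yet the UAT deploy.
print(run_pipeline(PIPELINE, approve=lambda name: name != "deploy to UAT dark"))
```

Everything up to the first manual stop runs without a person involved, which is what lets every merge to master be treated as a release candidate.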
1-3 deploys per week during core hours
QA and devs running deploys
Done means in prod