This is the story of a couple of teams that started the migration to the public cloud so that the platform becomes available for ~300 teams. War stories, their journey, bloopers and their choices, all shared.
5. On-prem environment
[Diagram] A Linux box running IBM WebSphere Application Server, hosting the Portal Framework with its plugins, the shared libraries (APIs), and the applications deployed on top.
6. 2008 - 2019
2 million logins / day
3000 req. / second
150+ teams
400+ applications
Very stable and controlled environment.
Why do we need to move to a new one?
8. IBM WebSphere Application Server
[Diagram] The same Linux box, now showing what lives inside the Portal Framework: an Edge Service in front, the applications, and the core services (User Profile, Page Render, User Preference, State Store, VCM Gateway), alongside the plugins and shared libraries (APIs).
13. PCF & Rabobank
Public cloud
Preferred solution for application workloads
No containerization by teams
Promise simplicity
Teams build, platform runs it
Independent teams
Large scale (~300 teams)
Maintainable (3-6 operators)
18. With freedom comes responsibility.
Freedom makes a huge requirement of every
human being.
For the person who is unwilling to grow up,
the person who does not want to carry his
own weight, this is a frightening prospect.
Eleanor Roosevelt
19. With freedom comes responsibility.
Freedom makes a huge requirement of every
DevOps team.
For the team who is unwilling to grow up, the
team who does not want to carry its own
weight, this is a frightening prospect.
22. “We need certificates to connect to these on-
premises services, can you please send your
certificates to us?“
23. “In my (stateless) microservice I want to store
a file on the running instance, how can I
access it and serve it to a customer in the next
request?”
24. “So, every team can deploy their own service
to production. Why do we want that, it sounds
very insecure!
Why do we want this microservice /
independent team thing again?”
25. “I'm told that in a Microservice environment I
should spin up an instance per
thread/request. How can I configure the
autoscaler to do that”
38. First implementation
2017 - 2018
Zuul 1
Spring Boot 1.5.x (MVC)
Spring Cloud Zuul
Hystrix
Config Server
Micrometer
39. Second implementation
2019 - now
Spring Boot 2.x (WebFlux)
Spring Cloud Gateway
Hystrix (and others)
Config Server
Micrometer
40. Why refactor within a year
Scale in requests
Number of routes
Memory use of Hystrix circuits
Thread pools (thread per request)
Performance
Zuul 1 in maintenance mode
41. Why Spring Cloud Gateway
+ Zuul 2 not yet open sourced
+ Spring Core Team: SCG the next thing
+ Good integration with PCF
- Not (yet) proven technology
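For a feel of what moving routes to Spring Cloud Gateway looks like, here is a minimal route declaration sketch in application.yml. The route id, backend URI and path are invented for illustration; they are not Rabobank's actual routes:

```yaml
spring:
  cloud:
    gateway:
      routes:
        - id: payments-route                  # hypothetical route id
          uri: http://payments.apps.internal  # hypothetical backend service
          predicates:
            - Path=/api/payments/**           # match inbound path
          filters:
            - StripPrefix=1                   # drop the /api prefix before forwarding
```

Each route pairs a predicate (when to match) with filters (what to do before forwarding), which maps naturally onto the edge-service responsibilities described later in the talk.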
44. Lessons learned
Be patient with new solutions
Reactive is more resilient
Reactive is as easy as.... Reactive
Test, test and test
Monitor
Canary release
Share your story
Accept doing things twice
70. Lessons learned
Microservice is hard, easy and fast
Independent
Own your service
Release early, deploy often
Automate everything
Learning on the job
Fun
Get help
Culture change
Be resilient
GOOD MORNING EVERYONE
How many of you have worked in a Bank?
Production and bank regulation
- Public cloud
- Old on-prem env. takes 4 hours with a lot of automation
- Here we are, multiple times a day, 10 minutes
- Demonstrated at an internal conference: 1 hour to production
- Looking back: what, bank/tech
- Thanks, lots of quality talks to choose from
- Solution Architect for the teams responsible for the core backend services in the online platform
From, what, to
- Really focusing on moving to the cloud
On prem, nothing special
Own portal impl
Modular monolith
Our teams are responsible for the orange blocks
We monitor and serve it
- 11 years, it served us well
- Almost no disruptions
Aging Technology:
Topics during conferences.
Could not choose a fitting technology
Scaling:
Lots of effort to scale
PCF: Per microservice and on platform level
DevOps:
Not only say we are doing devops
Deploy independently to production. When we deploy, no other team can.
Need to be fast
Deliver!
From this, to this
Explain EdgeService later
Orange is ROCS
On the ROCS, shaken not stirred (Dr. No, 1962)
Let the platform run it
Focus on building functionality
Fintan Ryan: Senior Research Director and analyst
New platform: ~30 teams.
!!!(10 min)
Some secondary subjects
10 min
Let me first describe PCF a bit.
Putting it in the public cloud (adopted as a Rabobank strategy)
Keep it simple
Focus on delivering functionality
Independent teams
Brief overview of PCF runtime
Router for traffic
Cloud Controller for communicating with CF
Diego cells, each on a separate cloud VM
When an app is deployed, a route is created and the Router knows where to find the instances
15 min
Not only a tech change, also a cultural one.
What can go wrong
We use this quote a lot
Autobiography "You Learn by Living" (1960)
Devops teams
What is the cost of freedom, moving away from a controlled environment?
And a lot more...
Don't take it personally if your quote is in here
Not going to say which ones are my own...
It's like passport border control
Stateless
Instances, rolling upgrades.
Security intake.
Teams asking for help
(15 minutes)
20 min
One of the things we did from the start is implement an EdgeService strategy
Common Edge Service strategy
It is a separation between outside and inside.
Directing inbound traffic to the correct components
- inspect and judge traffic
- Do things once, and pass information in headers to downstream services
- Implement security and authorization
From outside not able to connect to health endpoints
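The "do things once" idea above can be sketched in plain Java: the edge validates the session a single time and forwards a trusted header downstream, so the services behind it never see the raw session. This is an illustrative stand-in, not the real implementation; names like SESSION_STORE and X-User-Id are invented.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Sketch of an edge-service inbound filter: authenticate once at the edge,
// then pass the verified identity downstream in a header.
class EdgeFilter {
    // Hypothetical session store: sessionId -> userId
    static final Map<String, String> SESSION_STORE = new HashMap<>();

    /** Returns the headers to forward downstream, or empty if the session is invalid. */
    static Optional<Map<String, String>> inbound(Map<String, String> requestHeaders) {
        String userId = SESSION_STORE.get(requestHeaders.get("Session-Id"));
        if (userId == null) {
            return Optional.empty();            // reject at the edge, nothing reaches the inside
        }
        Map<String, String> downstream = new HashMap<>(requestHeaders);
        downstream.remove("Session-Id");        // never leak the raw session inward
        downstream.put("X-User-Id", userId);    // verified once, trusted by every service behind the edge
        return Optional.of(downstream);
    }
}
```

This is the "passport border control" analogy from the notes: checked once at the border, then you travel freely inside.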
Some intake on URLs, automated.
Takes less than 5 minutes to update
User Authorization: Check if the (logged in) user is allowed to access a service with the requested HTTP methods
Governance: multiple departments, multiple business domains, but the same web domain. Where we want governance on the exposed API's
Red button: An emergency button to disable inbound traffic to a certain service and minimize damage.
Circuit Breaking per route. Breathing space and fail fast.
Control over the endpoints to expose
Setting default headers when not set, for example caching.
Cross-Site Request Forgery
Rate limit
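The rate-limit responsibility in the list above can be illustrated with a minimal token-bucket sketch in plain Java. This is a hypothetical stand-in for illustration, not the edge service's actual limiter (which would come from the gateway's built-in filters): each route gets a budget of permits that refills at a fixed rate, and requests beyond the budget are rejected at the edge.

```java
// Minimal token-bucket rate limiter sketch (illustrative, not production code).
class TokenBucket {
    private final long capacity;        // maximum burst size
    private final double refillPerNano; // tokens added per nanosecond
    private double tokens;
    private long lastRefill;

    TokenBucket(long capacity, long refillPerSecond) {
        this.capacity = capacity;
        this.refillPerNano = refillPerSecond / 1_000_000_000.0;
        this.tokens = capacity;
        this.lastRefill = System.nanoTime();
    }

    /** Returns true if the request is within budget, false if it should be rate-limited. */
    synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerNano);
        lastRefill = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

Doing this once at the edge means no downstream team has to reinvent it per service.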
You are talking about independent teams etc. Why are you introducing a single point of failure?
- Not 1 instance but multiple (currently 4)
- Not 1 application deployed, but isolated environments for flows
- And those are also scaled
- Multiple regions and/or availability zones
- Because developers are human and might forget something
I know, everybody in this room never makes mistakes, but there is also a world outside this room
- Everybody thinks about URL naming right?
- Nice, full rest urls, with resource names and HTTP Methods etc.
- Separation between URL design and implementation
- Good rules across domains.
- Governance on the URLs we expose as the domain of the bank
- Following the URL API guidelines from the Rabobank.
- Automated as much as possible
- Circuit breaker to allow broken services to recover
- Rate limit
Stop traffic to a service when it malfunctions (red button)
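The circuit-breaker and red-button notes above can be sketched as a tiny state machine in plain Java. This is illustrative only; the talk's actual implementations used Hystrix and Spring Cloud Gateway filters. After a threshold of consecutive failures the circuit opens and calls fail fast, giving the broken service breathing space; the red button forces the route closed regardless of errors.

```java
// Minimal per-route circuit breaker with a manual "red button" (sketch only).
class RouteBreaker {
    enum State { CLOSED, OPEN }

    private final int threshold;        // consecutive failures before opening
    private int consecutiveFailures = 0;
    private State state = State.CLOSED;
    private boolean redButton = false;  // manual emergency stop for this route

    RouteBreaker(int threshold) { this.threshold = threshold; }

    /** Is traffic currently allowed through to the service? */
    synchronized boolean allowRequest() {
        return !redButton && state == State.CLOSED;
    }

    synchronized void recordSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;           // service recovered, close the circuit
    }

    synchronized void recordFailure() {
        if (++consecutiveFailures >= threshold) state = State.OPEN; // fail fast from now on
    }

    /** Emergency: disable all inbound traffic to this route and minimize damage. */
    synchronized void pressRedButton() { redButton = true; }
    synchronized void releaseRedButton() { redButton = false; }
}
```

A real breaker would also have a half-open state that probes the service periodically; that is omitted here for brevity.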
This reduces time to market along with automation
30 minutes
Only left the configuration and test cases intact.
We did a lot of tests, also future expectations
Microservices are like toilet paper
Vice President Strategy Pivotal
Really simple service
Lessons learned going from Zuul to SCG
35 - 40 minutes
We can only go to production within one hour if we abuse automation
Panzer team is only 3 people to manage all the PCF environments!
Less than 15 minutes later we have access
40 minutes
Duplication initially and we knew we had to remove it later
Then we created the ROCS pipeline library! Rabobank Online Cloud Services, or Core as some might call it…
Community driven library
Each change in production needs to be registered in our ticketing system (SM9) and that is mandatory. We took a lot of effort to automate this part.
Without downtime, which will be explained in the next slide
Verification that the route is available in a production environment
All interactions are made through Jenkins pipeline using a functional account
With this we can deploy fast to production, but what about the quality of what we deliver?
From nothing to production in one hour is possible due to PCF in the public cloud - 15m
Edge Service to be able to expose our features to the customers through an automated process - 15m DONE ONCE
After the initial set-up you only need the CI/CD, which takes us 10 minutes
Presented this a year ago at a big internal seminar @ Rabobank
45 minutes
Testing will allow us to increase quality
Why do we have two PRs? I will explain in the next slides
- Stub all dependencies
- Test in ISOLATION
- No dependency on infrastructure or network
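The isolation idea above can be sketched in plain Java: the service depends on an interface, and the test swaps in a stub, so nothing touches infrastructure or the network. AccountClient and BalanceService are invented names for illustration.

```java
import java.util.Locale;

// The dependency on an on-prem backend, expressed as an interface so it can be stubbed.
interface AccountClient {
    double balance(String accountId);
}

// The unit under test: depends only on the interface, never on the real backend.
class BalanceService {
    private final AccountClient client;

    BalanceService(AccountClient client) { this.client = client; }

    /** Formats the account balance for display. */
    String formatted(String accountId) {
        return String.format(Locale.US, "EUR %.2f", client.balance(accountId));
    }
}
```

In the test, a lambda stands in for the whole backend: `AccountClient stub = id -> 12.5;` — fast, deterministic, and fully in isolation.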
Draw two PCF foundations and show how we stop one region and things should still continue working
We do this quite often and everything runs fine, but sometimes applications fail 50% of the time or never work because they were deployed manually
50 minutes
60 minutes
From nothing to production in one hour is possible due to PCF in the public cloud - 15m
Edge Service to be able to expose our features to the customers through an automated process - 15m DONE ONCE
After the initial set-up you only need the CI/CD, which takes us 10 minutes