Your monolithic system is difficult to work with and maintain. Moving to a distributed system will solve all your problems and you will be in developer heaven. Right? You will be working with cool technologies and amazing concepts. Plus, Microservices! So what could possibly go wrong?
In my talk I will relate my real-life experience of migrating a single ASP.NET application with a monolithic database to a distributed system of hundreds of services handling £100,000 in transactions every hour. I will cover the challenges faced and the lessons learned in order to offer some final takeaways.
This "from the trenches" story will show you the pitfalls to avoid when dealing with Microservices.
● One monolithic database
● ASP.NET web application
● Doesn't scale
● Can't experiment with different technologies
● Versioning
● No domain ownership
● Long and hard deployments
Why Distributed?
What we will be talking about
Look at all these awesome things I'm going to talk about.
I'm going to use visually confusing diagrams.
IT'S A WORD CLOUD!
We're going to talk about DDD, commands and events, the cloud, microservices, and myself, Sam Elamin (I am interesting to talk about).
There's so much awesomeness here :D
DDD
Events
Messages
Commands
Cloud
Samelamin
CQRS
EnterpriseBus
Monolithic
Distributed
Microservices
Docker
This is a very interesting debate: if you are a startup, do you want to start off building microservices?
There's a LOT to do when building microservices.
It's much easier to just add to the monolith, and the cost to the user is lower.
We had multiple moving parts; make a change and something breaks.
It was fragile and hard to work with.
https://www.youtube.com/watch?v=qUmBgM6dmQA
http://highscalability.com/blog/2014/4/8/microservices-not-a-free-lunch.html
It basically felt like this.
We went to the business and said we need to break things down into smaller pieces so it's easier to add more features. No.
OK, then we say we can deploy things more frequently. No.
Just no!
We didn't really involve the business; it's only refactoring, right?
Fixing underlying architectural issues felt very underhanded,
like we were cheating our way to it.
But what we didn't fully comprehend or appreciate is that under pressure, code is often changed in ways that increase complexity.
Michael Feathers said this brilliantly:
“ When we break up big things into small pieces we invariably push the complexity to their interaction.”
It's just refactoring, right?
Read the blue book (Eric Evans' Domain-Driven Design)! Very useful.
Publish events
Interested applications subscribe to these events;
these systems can then publish more events or interactions of their own.
One system would say, "Oh, an order was made."
Another would say, "OK, let me try taking payment."
Another might say, "Let me see if I have enough stock to fulfill it."
It sounded like a nice structure to put in place.
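A minimal sketch of that flow in C# (the bus abstraction and type names here are hypothetical illustrations, not our actual production code):

```csharp
using System;

// Hypothetical publish/subscribe abstraction; a real system would use a
// service bus such as NServiceBus or MassTransit.
public interface IEventBus
{
    void Publish<TEvent>(TEvent @event);
    void Subscribe<TEvent>(Action<TEvent> handler);
}

// The event the ordering system publishes.
public class OrderPlaced
{
    public Guid OrderId { get; set; }
    public decimal Amount { get; set; }
}

public class PaymentTaken
{
    public Guid OrderId { get; set; }
}

// An interested system reacts to the event and publishes one of its own.
public class PaymentService
{
    public PaymentService(IEventBus bus)
    {
        // "Oh, an order was made - OK, let me try taking payment."
        bus.Subscribe<OrderPlaced>(order =>
        {
            if (TryTakePayment(order.OrderId, order.Amount))
                bus.Publish(new PaymentTaken { OrderId = order.OrderId });
        });
    }

    private bool TryTakePayment(Guid orderId, decimal amount) => true; // stub
}
```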
Define a bounded context.
At first it will be difficult, because there will be two places where the data is updated,
but slowly you will see that only the bounded context touches its own data.
We can get the read-only models via HTTP requests or messages; we used messages.
Once the code is separate, try to only allow the data to be updated inside the context.
Unfortunately the monoliths still needed access to the data, because state was managed via the database.
This enables us to get the hardest thing out, which is the shared database.
We can now move the data (tables) anywhere outside the shared database; nothing in the monoliths reads or writes those tables directly.
We define an interface, and the only way to access the data is through that interface; this helps us break away from the shared database.
It means I can change whatever I want internally, and as long as the contract remains valid we will be OK.
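For illustration, a sketch of the kind of contract we mean (the names are hypothetical): callers only ever see the interface, so the tables behind it can move without breaking anyone.

```csharp
using System;

// The only way into the ordering context's data. As long as this contract
// remains valid, the tables behind it can move out of the shared database
// (or change shape entirely) without breaking any caller.
public interface IOrderStore
{
    OrderSummary GetSummary(Guid orderId);  // read-only model for outsiders
    void Save(OrderSummary order);          // writes happen only inside the context
}

public class OrderSummary
{
    public Guid OrderId { get; set; }
    public string Status { get; set; }
    public decimal Total { get; set; }
}

// Internal implementation detail: today it might read the old shared tables,
// tomorrow the context's own database - callers never know.
internal class SqlOrderStore : IOrderStore
{
    public OrderSummary GetSummary(Guid orderId)
    {
        // ... query the context's own data store ...
        return new OrderSummary { OrderId = orderId, Status = "Placed" };
    }

    public void Save(OrderSummary order)
    {
        // ... persist inside the bounded context ...
    }
}
```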
We realised very quickly that messages are asynchronous and can arrive out of order.
We kept getting weird bugs that only happened in production, because messages appeared out of order,
and every time we fixed a bug we had to explain to our stakeholders why a race hazard is hard to debug and almost impossible to predict.
We started seeing problems we never really had to think about before.
Yep, the second you go distributed, you need to leverage infrastructure that addresses network latency, fault tolerance, message serialisation, unreliable networks, asynchronicity, versioning, varying loads within the application tiers, etc.
Solve race conditions by turning them into sequential ones (see the sketch below).
Prefer decoupling, even at the cost of code or data duplication.
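A sketch of one way to do that sequencing (assuming, hypothetically, that each message carries a per-order sequence number): out-of-order messages are parked until their predecessors have been processed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class OrderMessage
{
    public Guid OrderId { get; set; }
    public int Sequence { get; set; }
}

// Turns a race condition into a sequential one: messages for an order are
// processed strictly in sequence; early arrivals are parked until their turn.
public class SequentialHandler
{
    private readonly Dictionary<Guid, int> _nextExpected = new Dictionary<Guid, int>();
    private readonly List<OrderMessage> _parked = new List<OrderMessage>();

    public void Handle(OrderMessage msg)
    {
        int expected;
        if (!_nextExpected.TryGetValue(msg.OrderId, out expected)) expected = 1;

        if (msg.Sequence > expected) { _parked.Add(msg); return; } // too early: park it
        if (msg.Sequence < expected) return;                       // duplicate: ignore

        Process(msg);
        _nextExpected[msg.OrderId] = expected + 1;

        // A parked message may now be next in line for this order.
        var next = _parked.FirstOrDefault(
            p => p.OrderId == msg.OrderId && p.Sequence == expected + 1);
        if (next != null) { _parked.Remove(next); Handle(next); }
    }

    private void Process(OrderMessage msg) { /* business logic */ }
}
```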
What are the takeaways?
When you have so many services, how can you run them on your local machine? You don't want to compile that many services,
and keeping all of them up to date and making sure everything works is a pain.
Once you have over 20 services, how do you run end-to-end tests?
Monitoring?
It's hard to do functional tests between services.
It's frustrating to deploy other teams' services.
What we did was push to an integration environment, and you only test the service you care about locally.
Deploying so many services was a pain.
Each service (or group of services) was deployed differently, so consistency between teams did not exist.
So we used TeamCity to deploy to one staging environment, and that worked.
But now we could use Docker to deploy each service in a container:
instead of having multiple VMs or EC2 instances for each service (which can get expensive),
we can deploy multiple Docker containers.
Briefly, Docker packages a service and its dependencies into a container image; containers share the host's kernel, so they are far lighter than VMs and many can run on a single machine.
We needed to define the ownership of each service, meaning both the code and the data related to it. Basically we always tried to ask: what part of the business maps to this code?
The domain language in the code was different from the business domain language, which led to more confusion.
We basically spoke one language in the team, the domain experts spoke a different language, and the code was yet another language, so the developers
had to work as translators, which caused even more overhead.
At first it was individuals that owned the services rather than teams.
Developing and deploying features that spanned multiple services required careful coordination,
especially when breaking a contract. We learnt very quickly that you deploy the client or subscriber first, then the publisher.
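In code terms, that means evolving contracts tolerantly (a hypothetical example, not our actual contract): deploy a subscriber that copes with both shapes of the message before any publisher starts sending the new one.

```csharp
using System;

// v1 of the contract, already live in production.
public class OrderPlaced
{
    public Guid OrderId { get; set; }
    public decimal Amount { get; set; }

    // v2 adds a field. The subscriber ships first, treating it as optional
    // with a sensible default, so messages from old publishers still work.
    // Only once every subscriber tolerates both shapes does the publisher
    // start sending the new field.
    public string Currency { get; set; } = "GBP";
}
```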
We had very strict reviews, and in hindsight it felt like we were being very nit-picky. During one retro someone raised this as a frustration, so we tried a new idea called an experiments board,
where you try a new idea for a week. The very first experiment was "always accept PRs": if it passes CI, it should be merged. This quickly made the junior members of the team feel empowered to make changes.
It also made us rely more on tests; PRs should not be what stops outages from happening.
While small microservices are certainly simpler to reason about, I worry that this pushes complexity into the interconnections between services, where it's less explicit and thus harder to figure out when it goes wrong. Refactoring becomes much harder when you have to do it across remote boundaries
Is data duplication a bad thing?
Data duplication will happen, but is that really an issue?
Data integrity is really important to the business.
What you're worried about is staleness and inconsistency.
We didn't need unit tests, because if you need a unit test to describe what is happening in 100 lines of code and to help you design it, then you are in the wrong profession.
Most of the time our services ran fine; when they did break, they broke for very obscure reasons,
like they couldn't hit the database, or couldn't find the endpoints they were subscribing to.
They would raise an alert and say "excuse me, I can't run". Alerts would pop messages into our HipChat rooms or raise alarms, and we would quickly go and investigate.
That basically became our concept of unit tests, because an alert is essentially a test that is continually running.
A unit test would not have helped us find that problem.
Services would break because they received rubbish from somewhere else; they would raise an alert and say "excuse me, I can't run".
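Conceptually, one of those continually running "tests" looked something like this (a sketch; the dependency check and the chat-room hook are stand-ins, not our real code):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// A test that never stops running: the service checks its own dependencies
// and shouts into the team chat room the moment it can't do its job.
public class HealthCheckLoop
{
    private readonly Func<bool> _canHitDatabase;  // stand-in dependency check
    private readonly Action<string> _raiseAlert;  // e.g. post to the HipChat room

    public HealthCheckLoop(Func<bool> canHitDatabase, Action<string> raiseAlert)
    {
        _canHitDatabase = canHitDatabase;
        _raiseAlert = raiseAlert;
    }

    public async Task RunAsync(CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            if (!_canHitDatabase())
                _raiseAlert("OrderService: excuse me, I can't run - database unreachable");

            await Task.Delay(TimeSpan.FromSeconds(30), token);
        }
    }
}
```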
A side effect of all this is that we got rid of our acceptance tests.
You would say: you have a complex system, you need acceptance tests.
I would say: it's going to be very complex, give up on that.
We found that business metrics, like how many orders we are making per minute, how much money we are making, how many things we sold today,
make a brilliant acceptance test. First of all, it isn't running once when you deploy; it's running all the time.
Couple that with the alerts and you have a true black-box test; that is the real acceptance test.
When metrics start going down we get suspicious: was it a recent deploy? Or maybe people don't want to eat takeaways anymore.
Business metrics are the true acceptance tests. You don't have to understand anything about the internals to measure the business, and if you're measuring the business, that is a very robust test.
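As a sketch (the threshold and metric sources are illustrative assumptions): compare the live rate of orders against a recent baseline and alert when it drops sharply.

```csharp
using System;

// The acceptance test that runs all the time in production: if orders per
// minute drop well below what we'd expect, something is wrong somewhere,
// whether or not we understand the internals.
public class OrdersPerMinuteCheck
{
    private readonly Func<double> _currentRate;   // e.g. from the metrics store
    private readonly Func<double> _baselineRate;  // e.g. same time last week
    private readonly Action<string> _alert;

    public OrdersPerMinuteCheck(Func<double> current, Func<double> baseline,
                                Action<string> alert)
    {
        _currentRate = current;
        _baselineRate = baseline;
        _alert = alert;
    }

    public void Run()
    {
        var current = _currentRate();
        var expected = _baselineRate();

        // Suspicious if we're doing less than half the expected business.
        if (expected > 0 && current < expected * 0.5)
            _alert($"Orders/min is {current:F0}, expected ~{expected:F0}. Recent deploy?");
    }
}
```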
Another side effect is that we don't care what language these services are written in; as long as they can communicate, you can write them in whatever language you want.
It's 100 lines of code; you can write it in C#, Ruby, Java, Node, even Erlang. It doesn't matter.
If you come along later and find a service you can't understand, you can just rewrite it in a day or two in any language you prefer.
This got our developers excited, because they could experiment and learn, which increased motivation.
Check Greg Young's talk: https://vimeo.com/108441214
It's a very liberating feeling when you concentrate on deleting code.
It's a two-day problem, not a 12-month rewrite.
Optimise for decoupling.
Optimise for deletability.
Check out my blog post on building an ASP.NET vNext app on an AWS EC2 instance at dotnetkicks.com.
Be careful not to implement very tiny services, because that becomes the nanoservice anti-pattern.
You don't want to raise the overhead of running the actual service; running an EC2 instance for 10 lines of code is one hell of an overhead.
Another consideration is network hops: the more of them, the more overhead.
If service A calls service B, which calls service C, there are performance issues: three sequential hops of, say, 50 ms each add 150 ms of latency before any work is done.
“From what I have read on software systems, one lesson is to create services that do one thing, and one thing well, and communicate with each other”
Under pressure, code is often changed in ways that increase complexity.