Integrating microservices and taming distributed systems is hard. In this talk I will present three challenges we observed in real-life projects and discuss how to avoid them using open source orchestration. First, communication is complex: with everything being distributed, failures are normal, so you need sophisticated failure handling strategies (e.g. stateful retry). Second, asynchronicity requires you to handle timeouts. This is not only about milliseconds; systems get much more resilient when you can wait for minutes, hours or even longer. Third, distributed transactions cannot simply be delegated to protocols like XA, so you need to solve the requirement to retain consistency in case of failures yourself. I will not only use slides but also demonstrate concrete source code examples available on GitHub.
4. Do A + Do B: all or nothing
Once upon a time:
try {
    tx.begin();
    doA();
    doB();
    tx.commit();
} catch (Exception e) {
    tx.rollback();
}
Or simply:
@Transactional
public void createCustomer(Customer cust) {
    // ...
}
13. Starbucks does not use two-phase commit
Gregor Hohpe https://www.enterpriseintegrationpatterns.com/ramblings/18_starbucks.html
Photo by John Ingle
14. That means
Do A (local ACID), then Do B (local ACID)
Local ACID scope: 1 (micro-)service, 1 aggregate, 1 program, 1 resource
Over time (t): consistent, temporarily inconsistent, eventually consistent again
Violates „I“ of ACID
35. We are having some technical difficulties and cannot present you your boarding pass right away. But we do actively retry ourselves, so lean back, relax and we will send it on time.
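The "we do actively retry ourselves" message can be illustrated with a minimal in-process sketch. The class `Retry` and its method `withBackoff` are hypothetical names, not taken from the talk's GitHub examples. Note the limitation: this keeps its state in memory only, so it covers retries over seconds. Retrying over minutes or hours across restarts needs persisted state, which is the stateful retry the talk assigns to a workflow engine.

```java
import java.util.function.Supplier;

// Hypothetical helper: retries a call with exponential backoff, in memory only.
final class Retry {
    static <T> T withBackoff(Supplier<T> call, int maxAttempts, long initialDelayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();           // success: stop retrying
            } catch (RuntimeException e) {
                last = e;                    // remember the failure and try again
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delay);     // wait before the next attempt
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting to retry", ie);
                }
                delay *= 2;                  // double the wait each time
            }
        }
        throw last;                          // all attempts failed
    }
}
```

A caller would wrap the remote call, for example `Retry.withBackoff(() -> barcodeClient.generate(), 5, 200)`, where `barcodeClient` is again an assumed name.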
53. We are processing your payment. Do not leave this page. And for God's sake, do not reload!
It is a business problem anyway!
We are currently processing your request. Don't worry, it will happen safely, even if you lose connection. Feel free to reload this page any time!
60. „The customer wants a synchronous response“
Diagram: Bernd, Web-UI, Check-in, Barcode Generator, Output Mgmt
„Eh, no!“
61. generateBoardingPass (Check-in): HTTP 200 OK or HTTP 202 ACCEPTED
A synchronous response is possible in the happy case, otherwise it is switched to asynchronous processing.
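The switch between HTTP 200 and HTTP 202 can be sketched as follows. This is a minimal illustration under assumptions: the `Response` type, the `waitMillis` parameter and the use of `CompletableFuture` are my own choices, not the talk's implementation. The caller waits briefly for the result and falls back to "accepted, still processing" if it is not ready in time.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class CheckIn {
    // Hypothetical response type: HTTP status code plus optional body.
    static final class Response {
        final int status;
        final String body;
        Response(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    // Waits up to waitMillis for the boarding pass.
    // Happy case: 200 with the pass. Otherwise: 202, processing continues asynchronously.
    static Response generateBoardingPass(CompletableFuture<String> pending, long waitMillis) {
        try {
            String pass = pending.get(waitMillis, TimeUnit.MILLISECONDS);
            return new Response(200, pass);
        } catch (TimeoutException | ExecutionException e) {
            // Not ready yet, or this attempt failed: assume retries continue in the background.
            return new Response(202, null);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return new Response(202, null);
        }
    }
}
```

The design choice here is that the client must be able to handle both status codes, so the asynchronous path is part of the API contract rather than an error case.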
86. Compensation – the classical example
Saga: book trip
Steps 1. to 3.: book hotel, book car, book flight
In case of failure trigger compensations (steps 5. and 6.): cancel hotel, cancel car
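The compensation idea can be sketched in a few lines. This is an in-memory illustration only: `TripSaga` and `Step` are hypothetical names, and running compensations in reverse order of completion is an assumption of this sketch. A real saga also has to persist its progress and retry failing compensations, which is where the talk brings in a workflow engine.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

final class TripSaga {
    // One saga step: an action plus the compensation that undoes it.
    static final class Step {
        final Runnable action;
        final Runnable compensation;
        Step(Runnable action, Runnable compensation) {
            this.action = action;
            this.compensation = compensation;
        }
    }

    // Runs the steps in order. If one fails, compensates all completed steps
    // in reverse order and reports failure.
    static boolean run(List<Step> steps) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.action.run();
                completed.push(step);        // most recently completed step on top
            } catch (RuntimeException e) {
                while (!completed.isEmpty()) {
                    completed.pop().compensation.run();
                }
                return false;
            }
        }
        return true;
    }
}
```

With steps for hotel, car and flight, a failing flight booking triggers the car and hotel cancellations. Because the order lives in one place, changing which compensation runs first is a local change.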
91. Implementing changes in the process
Services: Hotel, Flight, Car, Trip
Events: Request trip, Trip requested, Hotel booked, Car booked, Flight failed, Car canceled, Hotel canceled, Trip failed
We have a new basic agreement with the car rental agency and can cancel for free within 1 hour. Do that first!
You have to adjust all services and redeploy at the same time!
92. What we wanted
Photo by Lijian Zhang, available under Creative Commons SA 2.0 License
100. Client: has to implement Timeout, Retry, Compensation
Service Provider: has to offer Compensation, has to implement Idempotency
Don't forget about state
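Idempotency on the provider side is what makes client retries safe. A minimal sketch, assuming the client sends a unique request ID with each logical request: `PaymentService`, `charge` and the receipt format are hypothetical, and the in-memory map stands in for state that a real service would persist.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical provider: the same requestId is charged at most once.
final class PaymentService {
    private final Map<String, String> receiptsByRequestId = new ConcurrentHashMap<>();
    private final AtomicInteger charges = new AtomicInteger();

    // Returns the stored receipt if this requestId was already processed,
    // otherwise performs the charge once and remembers the result.
    String charge(String requestId, long amountCents) {
        return receiptsByRequestId.computeIfAbsent(requestId, id -> {
            int n = charges.incrementAndGet();   // stands in for the real side effect
            return "receipt-" + n + "-" + amountCents;
        });
    }

    int chargeCount() {
        return charges.get();
    }
}
```

A client that times out and retries with the same request ID gets the original receipt back instead of paying twice.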
101. # Be aware of the complexity of distributed systems
# Know strategies and tools to handle it
e.g. Circuit breaker (Hystrix)
e.g. Workflow engine for stateful retry, waiting, timeout and compensation (Camunda)