Jamie Allen, Björn Antonsson and Patrik Nordwall discuss patterns of building Akka systems, including the Sentinel for handling failure above a supervisor, flow control and distributed workers. Patrik also built a template for Typesafe Activator v0.2.1 allowing you to try out this distributed workers pattern on your own machine.
3. Guaranteed Delivery
• From Enterprise Integration Patterns
• Messaging system uses built-in store
to persist
• ACK everywhere
– Producer to sender
– Sender to receiver
– Receiver to consumer
@jamie_allen
4. Akka Guarantees
• Not much, intentionally
• At-most-once delivery, with no reordering
between a given sender and receiver
• Pick your poison:
– At most once
– At least once
– Exactly once
• You have to add it on top
5. How Do I Do It?
• Handle “at least once” semantics on the
receiver to deal with duplicates
– Idempotent behavior in receiver
– Check message ID
• Work around “at most once” semantics on
the sender via retries
– ACK every time a message is handled
– Cancel the repeated send once the ACK arrives
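The two halves of this slide can be sketched together. This is a minimal illustration in plain Python rather than Akka: `Receiver`, `LossyChannel`, `Sender` and the scripted `drop_pattern` are all hypothetical names invented for the example. It assumes a synchronous channel where a lost message or a lost ACK simply shows up as `None`.

```python
class Receiver:
    """Idempotent receiver: a redelivered message ID is handled only once."""

    def __init__(self):
        self.seen_ids = set()
        self.handled = []

    def receive(self, msg_id, payload):
        if msg_id not in self.seen_ids:   # check message ID
            self.seen_ids.add(msg_id)
            self.handled.append(payload)
        return msg_id                     # ACK every time, duplicates included


class LossyChannel:
    """Hypothetical transport that loses messages or ACKs according to a script."""

    def __init__(self, receiver, drop_pattern):
        self.receiver = receiver
        self.drops = iter(drop_pattern)

    def send(self, msg_id, payload):
        outcome = next(self.drops, "ok")  # once the script runs out, deliver normally
        if outcome == "drop_message":
            return None                   # message never reached the receiver
        ack = self.receiver.receive(msg_id, payload)
        if outcome == "drop_ack":
            return None                   # message was handled, but the ACK was lost
        return ack


class Sender:
    """Retries until an ACK for the message ID comes back."""

    def __init__(self, channel, max_attempts=5):
        self.channel = channel
        self.max_attempts = max_attempts

    def send(self, msg_id, payload):
        for attempt in range(1, self.max_attempts + 1):
            if self.channel.send(msg_id, payload) == msg_id:
                return attempt            # ACK received: cancel the repeated send
        raise RuntimeError(f"no ACK for message {msg_id}")
```

With a script that drops the first message and then the first ACK, the sender needs three attempts, while the receiver still handles the payload exactly once because the retry carries the same message ID.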
7. Durable Mailboxes
• Doesn’t work with future-based message
sending (ask, ?)
• There is no guarantee that the message
even got to the mailbox in distributed
systems
• Asking for guarantees in an uncertain
world
8. Event Sourcing?
• Wonderful pattern for compiling a list of
time-series events
• Separation of concerns from actor
mailboxes
• Still lots of things that can go wrong
– Disk writing
– Replication consistency
– Network partitions
– Application latency
9. External Durable Message Queue
• You still have to ACK
• No certainty the message you needed
even got this far
• Additional dependencies in your
architecture
10. Guaranteed Delivery Doesn’t Exist
• We don’t know what we don’t know
• Increased effort
• Increased complexity
• Increased latency
• No guarantees of consistency
• Doesn’t guarantee ordering
11. So What Do We Do?
• This falls outside of actor supervision;
nothing the actors know about has
gone wrong
• Listen to Roland Kuhn:
“Recovery ... should ideally be
automatic in order to restore normal
service as quickly as possible.”
13. Sentinels
• Supervisors handle failure BELOW them.
Sentinels handle failure ABOVE.
• Responsible for querying a “source of truth” and
getting latest state
• Sends data to supervisor, who resolves
differences in which instances of actors should
exist versus those that do
• Supervisor forwards data to instances that
should exist for them to resolve their internal
state
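The resolution step described above can be sketched as a simple reconciliation. This is plain Python, not Akka, and every name in it (`Supervisor`, `Sentinel`, `resolve`, `tick`, the dict-based source of truth) is an assumption made for illustration. Child actors are modelled as dictionary entries holding their latest state.

```python
class Supervisor:
    """Holds child 'actors' as a mapping of actor ID to internal state."""

    def __init__(self):
        self.children = {}

    def resolve(self, truth):
        # Stop children that exist but should not.
        for actor_id in set(self.children) - set(truth):
            del self.children[actor_id]
        # Create missing children and forward the latest data to every
        # child that should exist so it can resolve its internal state.
        for actor_id, state in truth.items():
            self.children[actor_id] = state


class Sentinel:
    """Queries a source of truth and hands the result to one supervisor."""

    def __init__(self, query_source_of_truth, supervisor):
        self.query = query_source_of_truth   # hypothetical callable, e.g. a DB read
        self.supervisor = supervisor

    def tick(self):
        # In a real system this would run on a timer at a tunable frequency.
        self.supervisor.resolve(self.query())
```

Calling `tick()` after messages were lost brings the supervisor's children back in line with the source of truth without any per-message delivery guarantee.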
20. Sentinels
• Localize them for each kind of data
that must be synchronized in your
supervisor hierarchy
• Do not create one big one and try to
resolve the entire tree at once
21. Drawbacks
• Doesn’t work well with localized event
sourcing - time series can be lost
• Introduces additional complexity and
tunable latency compared with applications
that offer no guarantees
• Pattern only works when there is a
queryable source of truth
22. Inconsistent Views?
• Using Sentinels at multiple levels of a
supervisory hierarchy can lead to
temporarily inconsistent views when
child actors are resolved before
parents on delete (no atomicity)
• But is this necessarily bad?
29. A Huge Win
• Your system is resilient to external
failures
• You can tune sentinel update
frequency to meet changing
requirements
• Your system is considerably less
complex than attempting to guarantee
no message loss
31. Pure Push Applications
• Often the first Actor application you
write
– Once you start telling and stop asking
• Easy to implement and reason about
• Fits nicely with short lived jobs that
come at a fixed rate
@bantonsson
33. Why do you need anything else?
• Produce jobs faster than you can finish
them
• Jobs are expensive in terms of
compute or memory
• External resources impose limits
• Unpredictable job patterns
34. What can you do instead?
• Push with rate limiting
– A fixed number of jobs per time unit are
pushed
• Push with acknowledgment
– A fixed number of jobs can be in progress.
– New jobs are pushed after old jobs finish
• Pull
– Jobs are pulled from the master at the rate
that they are completed
35. Push with rate limiting
• A timer sends the master ticks at fixed
intervals
• When a tick arrives, the master fills up
its token count
• If a job arrives and there are no tokens,
it gets queued
• When the master has tokens, it pulls
jobs off the queue and pushes them
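The tick-and-token scheme above might look roughly like this. It is a sketch in plain Python, not Akka: `RateLimitedMaster`, `on_tick`, `on_job` and the `push` callback are hypothetical names, and the timer is assumed to call `on_tick` from outside at fixed intervals.

```python
from collections import deque


class RateLimitedMaster:
    """Pushes at most `tokens_per_tick` jobs between two timer ticks."""

    def __init__(self, tokens_per_tick, push):
        self.tokens_per_tick = tokens_per_tick
        self.tokens = 0            # no tokens until the first tick arrives
        self.queue = deque()
        self.push = push           # callback that delivers a job to a worker

    def on_tick(self):
        self.tokens = self.tokens_per_tick   # fill up the token count
        self._drain()

    def on_job(self, job):
        self.queue.append(job)     # queued if there are no tokens
        self._drain()

    def _drain(self):
        # While tokens remain, pull jobs off the queue and push them.
        while self.tokens > 0 and self.queue:
            self.tokens -= 1
            self.push(self.queue.popleft())
```

Unused tokens are not carried over in this sketch, since each tick resets the count rather than adding to it. That keeps the rate bounded per interval, at the cost of not allowing bursts.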
37. Push with acknowledgement
• The master pushes a fixed number of jobs
before waiting for an acknowledgement
• If a job arrives and the master can't
push, it gets queued
• To keep workers busy, push more than
one job per worker
– You can use a high water mark to stop and
a low water mark to start pushing
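A possible shape for the water-mark variant, again as a plain Python sketch rather than Akka code. `AckMaster`, `on_ack` and the `push` callback are assumed names, and each acknowledgement is taken to mean that exactly one job finished.

```python
from collections import deque


class AckMaster:
    """Stops pushing at the high water mark, resumes at the low water mark."""

    def __init__(self, high_water, low_water, push):
        self.high_water = high_water
        self.low_water = low_water
        self.in_flight = 0         # jobs pushed but not yet acknowledged
        self.pushing = True
        self.queue = deque()
        self.push = push           # callback that delivers a job to a worker

    def on_job(self, job):
        self.queue.append(job)     # queued if the master can't push right now
        self._drain()

    def on_ack(self):
        self.in_flight -= 1
        if self.in_flight <= self.low_water:
            self.pushing = True    # low water mark reached: start pushing again
        self._drain()

    def _drain(self):
        while self.pushing and self.queue:
            self.push(self.queue.popleft())
            self.in_flight += 1
            if self.in_flight >= self.high_water:
                self.pushing = False   # high water mark reached: stop pushing
```

Setting the high water mark above the number of workers keeps each worker supplied with more than one job, and the gap between the two marks lets the master refill in small bursts instead of after every single ACK.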
39. Pull
• The master actor queues incoming jobs
• Worker actors ask the master for a job
and receive jobs when available
• The workers don't need to do active
polling
• Can lead to lag if jobs are small
compared to the time it takes to get a
new one
– Use batching to counteract lag
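The pull pattern with batching can be sketched as follows. This is plain Python, not the Activator template or Akka: `PullMaster`, `on_request` and `batch_size` are hypothetical names, and a worker is represented by a callback that accepts a list of jobs.

```python
from collections import deque


class PullMaster:
    """Queues jobs and hands them out only to workers that asked for work."""

    def __init__(self):
        self.jobs = deque()
        self.waiting = deque()     # (worker, batch_size) requests not yet served

    def on_job(self, job):
        self.jobs.append(job)
        self._dispatch()

    def on_request(self, worker, batch_size=1):
        # The worker asks once and is remembered, so it never has to poll.
        self.waiting.append((worker, batch_size))
        self._dispatch()

    def _dispatch(self):
        while self.jobs and self.waiting:
            worker, batch_size = self.waiting.popleft()
            count = min(batch_size, len(self.jobs))
            batch = [self.jobs.popleft() for _ in range(count)]
            worker(batch)          # send up to batch_size jobs in one message
```

A waiting worker is served as soon as any job arrives, even if the batch is not full. A larger `batch_size` reduces the number of round trips to the master when jobs are small.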