How do I create an application that is always available, even in the face of catastrophic hardware failures? How do I make it cope with large fluctuations in the number of active users or their behavior? The answers to both questions have one thing in common: the need to spread the application out, to distribute it. In this presentation we take a deep dive into the architecture of reactive applications (see http://reactivemanifesto.org/), starting from cheap and efficient best-effort messaging and building delivery guarantees and persistence on top—with examples using Akka. Understanding these mechanisms and how they relate to each other will allow you to build systems that strike the right balance between partition tolerance, consistency, availability, implementation complexity and operational cost.
2. The Future is Bright
source: http://nature.desktopnexus.com/wallpaper/58502/
3. The Future is Bright
• Computing Resources: Effectively Unbounded
• 18-core CPU packages, 1024 around the corner
• on-demand provisioning in the cloud
• Demand for Web-Scale Apps: Ravenous
• everyone has the tools
• successful ideas are bought for Giga-$
4. How do I make a Large-Scale App?
• figure out a service to provide to everyone
• this will require resources accordingly:
• storage
• computation
• communication
• scalability implies no shared contention points
• need to be elastic ➟ use the cloud
5. How do I make an Always-Available App?
• must not have single point of failure
• all functions and resources are replicated
• protect against data loss
• protect against service downtime
• regional content delivery
• need to distribute ➟ use the cloud
9. How do I make a Persistent App?
• runtime replication usually not sufficient
• system of record needed
• one central database is not tenable
• persist “locally” and replicate changes
• temporally distributed change records
• spatially distributed change log
• granularity can be one Aggregate Root (Event Sourcing)
• extract business intelligence from history
• scale & distribute persistence ➟ in the cloud
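A minimal sketch of the change-log idea, assuming a hypothetical bank account as the Aggregate Root. The names `Deposited`, `Withdrawn`, `Account`, `replay` and `totalDeposited` are illustrative and do not come from the slides; no Akka is involved.

```scala
object EventSourcingSketch {
  // Hypothetical change records for one Aggregate Root (a bank account).
  sealed trait Event
  final case class Deposited(amount: Long) extends Event
  final case class Withdrawn(amount: Long) extends Event

  final case class Account(balance: Long) {
    // Applying a change record is a pure state transition.
    def updated(event: Event): Account = event match {
      case Deposited(a) => copy(balance = balance + a)
      case Withdrawn(a) => copy(balance = balance - a)
    }
  }

  // The log is the system of record; current state is derived by replay.
  def replay(log: Seq[Event]): Account =
    log.foldLeft(Account(0L))((account, event) => account.updated(event))

  // The history stays queryable, e.g. the total amount ever deposited.
  def totalDeposited(log: Seq[Event]): Long =
    log.collect { case Deposited(a) => a }.sum
}
```

Because the log is append-only, replicating it to other locations only requires shipping new records, and questions that were not anticipated up front can still be answered from the history.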
10. As Noted Earlier: The Future is Bright
source: http://nature.desktopnexus.com/wallpaper/58502/
11. … or is it?
Photograph: Patrick Cromerford, 2011
12. Distribution: a Double-Edged Sword
• did that message make it over the network?
• is that other machine running?
• are our data centers in sync?
• did we lose messages in that hardware crash?
13. The Essence of the Problem
A distributed system is one whose parts can fail
independently of each other.
16. Transactional Storage
• Serializability simplifies reasoning about behavior
• … but is at odds with distribution (CAP theorem)
• Eventual Consistency is often sufficient
(Peter Bailis: Probabilistically Bounded Staleness)
• Scalability can be regained by partitioning
(Pat Helland: Life beyond Distributed Transactions)
• consistency has a cost
• you need to choose how much is necessary
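Regaining scalability by partitioning can be sketched as routing every update to the shard that owns its entity, so that each update stays local to one shard. This is only an illustration under assumed names (`partitionFor`, `route`); hash-based routing is one possible scheme, not something the slides prescribe.

```scala
object PartitioningSketch {
  // Map an entity key to one of `partitions` shards. floorMod keeps the
  // result non-negative even when hashCode is negative.
  def partitionFor(entityKey: String, partitions: Int): Int =
    java.lang.Math.floorMod(entityKey.hashCode, partitions)

  // Group updates by owning shard so each shard applies its batch locally,
  // without coordinating with the others.
  def route[A](updates: Seq[(String, A)], partitions: Int): Map[Int, Seq[(String, A)]] =
    updates.groupBy { case (key, _) => partitionFor(key, partitions) }
}
```

All updates for one key land on the same shard, which is what allows that shard to order them without a distributed transaction.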
17. PAXOS and Friends
• problem: distributed consensus
• hard solutions favor Consistency
• 2PC, Paxos, Raft
• if too many nodes are unreachable then consensus cannot be reached (2PC needs every participant, Paxos and Raft need a majority)
• soft solutions favor Availability
• quorum, allow nodes to drop out
• partitions can lead to more than one active set
• consistency always has a cost
• you need to choose which one is appropriate
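The difference between the two families can be sketched with simple arithmetic on partition sizes. The functions `hasMajority`, `staysActive` and `activeSides` are hypothetical helpers for illustration; they model only the membership decision, not an actual consensus protocol.

```scala
object QuorumSketch {
  // Strict majority: at most one side of a partition can satisfy this.
  def hasMajority(reachable: Int, clusterSize: Int): Boolean =
    reachable > clusterSize / 2

  // "Soft" rule: a side keeps going as long as it still sees at least
  // `minMembers` nodes, dropping the ones it cannot reach.
  def staysActive(reachable: Int, minMembers: Int): Boolean =
    reachable >= minMembers

  // How many sides of a partition remain active under a given rule.
  def activeSides(sides: Seq[Int], isActive: Int => Boolean): Int =
    sides.count(isActive)
}
```

For a five-node cluster split 3/2, the majority rule leaves exactly one active side, while the soft rule with `minMembers = 1` leaves two. A four-node cluster split 2/2 has no active side at all under the majority rule, which is the availability price of consistency.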
18. Fighting Message Loss
• resending is the only cure
• sender stores the message and retries
• recipient must send acknowledgement eventually
• fighting unreliable medium is sender’s obligation
• sender’s buffer must be persistent
• resending must begin anew after a crash
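The sender's side of this protocol can be sketched in plain Scala, without Akka. `Msg`, `Channel` and `Sender` are assumed names; the in-memory buffer merely stands in for the persistent one the slide calls for, and the channel is modelled as a function that returns true only when the acknowledgement made it back.

```scala
object ResendSketch {
  final case class Msg(deliveryId: Long, payload: String)

  // Hypothetical lossy channel: true means the message arrived AND its
  // acknowledgement reached the sender.
  type Channel = Msg => Boolean

  final class Sender(channel: Channel) {
    // Stand-in for a persistent buffer; a real one must survive crashes.
    private var unconfirmed = Vector.empty[Msg]
    private var nextId = 1L

    // Store the message before any attempt to transmit it.
    def send(payload: String): Unit = {
      unconfirmed :+= Msg(nextId, payload)
      nextId += 1
    }

    // One retry round: resend everything that is not yet acknowledged and
    // keep only the messages that are still unconfirmed afterwards.
    def resendAll(): Unit =
      unconfirmed = unconfirmed.filterNot(channel)

    def pending: Int = unconfirmed.size
  }
}
```

Calling `resendAll()` periodically, and once more after a restart, is what turns best-effort messaging into at-least-once delivery.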
19. At-Least-Once Delivery with Akka
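import akka.actor.ActorPath
import akka.persistence.{ AtLeastOnceDelivery, PersistentActor }

// Command, Done, ImportantResult, FollowUpCommand and ReceiptConfirmed are
// application-defined messages that are not shown on the slide.
class ALOD(other: ActorPath) extends PersistentActor with AtLeastOnceDelivery {
  val IDs = Iterator from 1

  def receiveCommand = {
    case Command(x) =>
      // persist the result first, then reply and start the tracked delivery
      persist(ImportantResult(x)) { ir =>
        sender() ! Done(x)
        deliver(other, FollowUpCommand(x, IDs.next(), _))
      }
    case rc: ReceiptConfirmed =>
      // persist the confirmation so that resending stops for good
      persist(rc) { rc => confirmDelivery(rc.deliveryId) }
  }

  // replaying the journal rebuilds the set of unconfirmed deliveries
  def receiveRecover = {
    case ImportantResult(x)      => deliver(other, FollowUpCommand(x, IDs.next(), _))
    case ReceiptConfirmed(_, id) => confirmDelivery(id)
  }
}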
24. Mysterious Double-Bookings
• acknowledgements can be lost
• sender will faithfully duplicate the message
• therefore named at-least-once
• recipient is responsible for dealing with this
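What a duplicated message does to a recipient, and one way to deal with it, can be sketched as follows. `Booking`, `NaiveRecipient` and `DedupRecipient` are hypothetical names; tracking processed delivery IDs in a set is an assumption for illustration, not the only possible approach.

```scala
object DedupSketch {
  final case class Booking(deliveryId: Long, seat: String)

  // Naive recipient: every delivery becomes a booking, so a resend after a
  // lost acknowledgement books the same seat twice.
  final class NaiveRecipient {
    var bookings = Vector.empty[String]
    def receive(b: Booking): Unit = bookings :+= b.seat
  }

  // Deduplicating recipient: remembers which delivery IDs were processed
  // and ignores repeats (it would still re-send the acknowledgement).
  final class DedupRecipient {
    private var seen = Set.empty[Long]
    var bookings = Vector.empty[String]
    def receive(b: Booking): Unit =
      if (!seen(b.deliveryId)) {
        seen += b.deliveryId
        bookings :+= b.seat
      }
  }
}
```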
25. Exactly-Once Delivery
• processing occurs only once for each message
• all middleware does it*
• EIP: Guaranteed Delivery just means persistence
• JMS: configure session for AUTO_ACKNOWLEDGE
• Google’s MillWheel
26. Exactly-Once Delivery with Akka
import akka.persistence.PersistentActor

class PersistThenAct extends PersistentActor {
  var nextId = 1

  def receiveCommand = {
    case FollowUpCommand(x, id, deliveryId) =>
      val rc = ReceiptConfirmed(id, deliveryId)
      if (id == nextId) {
        // persist the receipt first, act only once it has been stored
        persist(rc) { rc =>
          sender() ! rc
          doTheAction()
          nextId += 1
        }
      } else if (id < nextId) sender() ! rc // duplicate: just confirm again
      // otherwise an older command was lost, so wait for the resend
  }

  def receiveRecover = {
    case ReceiptConfirmed(id, _) => nextId = id + 1
  }

  private def doTheAction(): Unit = () // actually do the action
}
31. Exactly-Once Delivery with Akka
import akka.persistence.PersistentActor

class PersistThenAct extends PersistentActor {
  var nextId = 1

  def receiveCommand = {
    case FollowUpCommand(x, id, deliveryId) =>
      val rc = ReceiptConfirmed(id, deliveryId)
      if (id == nextId) {
        // variant: act first, then persist the receipt
        doTheAction()
        persist(rc) { rc =>
          sender() ! rc
          nextId += 1
        }
      } else if (id < nextId) sender() ! rc // duplicate: just confirm again
      // otherwise an older command was lost, so wait for the resend
  }

  def receiveRecover = {
    case ReceiptConfirmed(id, _) => nextId = id + 1
  }

  private def doTheAction(): Unit = () // actually do the action
}
32. The Hard Truth
CPU can stop between any two instructions, therefore
exactly-once semantics cannot be Guaranteed*.
*with a capital G, i.e. with 100% absolute coverage
33. Mitigating that Hard Truth
• probability of loss or duplication can be small
• many systems are fine with almost-exactly-once
• still need to decide on which side to err
• persist-then-act ➟ almost-exactly-but-at-most-once
• act-then-persist ➟ almost-exactly-but-at-least-once
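The two ways to err can be made concrete with a small simulation, assuming a hypothetical `World` that holds a durable journal and a counter for the external effect. The flag `crashAfterFirstStep` is an illustrative device that models the CPU stopping between the two steps; none of these names appear in the slides.

```scala
object CrashOrderSketch {
  // Hypothetical durable state that outlives a crash of the processing step.
  final class World {
    var journal = Set.empty[Long] // ids recorded as processed
    var effects = 0               // how often the action actually ran
  }

  // Record first, act second: a crash in between loses the action.
  def persistThenAct(w: World, id: Long, crashAfterFirstStep: Boolean): Unit =
    if (!w.journal(id)) {
      w.journal += id
      if (!crashAfterFirstStep) w.effects += 1
    }

  // Act first, record second: a crash in between repeats the action
  // when the message is delivered again.
  def actThenPersist(w: World, id: Long, crashAfterFirstStep: Boolean): Unit =
    if (!w.journal(id)) {
      w.effects += 1
      if (!crashAfterFirstStep) w.journal += id
    }
}
```

Processing the same message once with a crash and once more on redelivery leaves `effects` at 0 for persist-then-act and at 2 for act-then-persist; without a crash both yield exactly 1.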
34. We Can Do Better!
• an operation is idempotent if applying it multiple times
produces the same result as applying it once
• (side-)effects must also be idempotent
• unique transaction identifiers etc.
• this achieves effectively-exactly-once
• can save persistence round-trip latency by
optimistic execution
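A small sketch of making an operation idempotent with unique transaction identifiers. `increment` and `Ledger` are hypothetical examples chosen for illustration; keying deposits by `txId` is one possible design, not the only one.

```scala
object IdempotenceSketch {
  // Not idempotent: each application changes the result again.
  def increment(balance: Long, amount: Long): Long = balance + amount

  // Idempotent by construction: deposits are keyed by a unique transaction
  // id, so applying the same deposit a second time is a no-op.
  final case class Ledger(deposits: Map[String, Long] = Map.empty) {
    def deposit(txId: String, amount: Long): Ledger =
      if (deposits.contains(txId)) this
      else copy(deposits = deposits + (txId -> amount))

    def balance: Long = deposits.values.sum
  }
}
```

With such a receiver, a duplicated message is harmless, so the sender can keep resending until confirmed and the combination behaves as effectively-exactly-once.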
35. Effectively-Exactly-Once with Akka
import akka.actor.ActorRef
import akka.persistence.PersistentActor

class Idempotent(other: ActorRef) extends PersistentActor {
  var nextId = 1

  def receiveCommand = {
    case FollowUpCommand(x, id, deliveryId) =>
      val rc = ReceiptConfirmed(id, deliveryId)
      if (id == nextId) {
        // optimistically execute
        if (isValid(x)) {
          updateState(x)
          other ! Command(x) // assuming idempotent receiver
        } else sender() ! Invalid(x)
        nextId += 1
        // then take care of reliable delivery handling
        persistAsync(rc) { rc =>
          sender() ! rc
        }
      } else if (id < nextId) sender() ! rc // duplicate: just confirm again
  }
  ...
}
36. The Future is Bright Again!
source: http://freewallpapersbackgrounds.com/
37. To Recapitulate:
Problem: Choices
• Transactional Storage: monolithic or distributed/partitioned, serializable or eventually consistent
• Cluster Management: strong consensus or convergent gossip, consistency or availability
• Message Delivery: at-most-once, at-least-once, almost-exactly-once, effectively-exactly-once
• …: …
38. Conclusion: Consistency à la Carte
• not all parts of a system have the same needs
• expend the effort only where necessary
• idempotency does not come for free
• confirmation handling cannot always be abstracted away
• incur the runtime overhead only where needed
• persistence round-trips take time
• consensus protocols take more time
• include required semantics in system design
39. Remember that the trade-off is always between
consistency and availability.
And latency.