Data stream processing platforms and microservices infrastructure and strategies are converging. As we edge towards larger, more complex, decoupled systems, and as the global information graph keeps growing, our frontier of unsolved challenges grows just as fast. Central challenges for distributed systems include persistence strategies across DCs, zones, or regions; network partitions; data optimization; and system stability in all phases.
How does leveraging CRDTs and Event Sourcing address several core distributed systems challenges? What are useful strategies and patterns for designing, deploying, and running stateful and stateless applications in the cloud, for example with Kubernetes? Combined with code samples, we will see how Akka Cluster, Multi-DC Persistence, Split Brain, Sharding and Distributed Data can help solve these problems.
2. @helenaedelson
Helena Edelson
● Principal Engineer @ Lightbend
● Member of the Akka team
● Former: Apple, Crowdstrike, VMware,
SpringSource, Tuplejump
● github.com/helena
● twitter.com/helenaedelson
● speakerdeck.com/helenaedelson
Data, Analytics & ML Platform Infrastructure and Cloud Engineer
Former biologist
4. @helenaedelson
When systems reach a critical level of dynamism, we have to change how we model
and design them
• Stateful in a stateless world
• Automation of everything - Ops, *aaS platforms
• Persistence strategies across DCs, zones and regions
• Data and query optimization
• System availability and stability in all states of deployment and rolling restarts
• Leveraging AI / ML
Rethinking Strategies
5. @helenaedelson
Computational model embracing non-determinism
- Actor Model of Computation, Carl Hewitt
• Mathematical theory treating "Actors" as primitives of concurrent computation
• Framework for a theoretical understanding of concurrency
• Asynchronous communication
• Stateful isolated processes
• Non-observable state within
• Decoupling in space and time
The Network and Autonomous Processes
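To make this concrete, here is a minimal Akka Typed sketch (all names hypothetical): state lives only inside the actor's behavior and is observable solely through asynchronous messages.

```scala
import akka.actor.typed.{ActorRef, ActorSystem, Behavior}
import akka.actor.typed.scaladsl.Behaviors

// A hypothetical counter: state is isolated in the behavior and never
// shared; callers interact only via asynchronous messages.
object Counter {
  sealed trait Command
  case object Increment extends Command
  final case class GetValue(replyTo: ActorRef[Int]) extends Command

  def apply(value: Int = 0): Behavior[Command] =
    Behaviors.receiveMessage {
      case Increment         => apply(value + 1)           // state change = new behavior
      case GetValue(replyTo) => replyTo ! value; Behaviors.same
    }
}

// val system = ActorSystem(Counter(), "counter")
// system ! Counter.Increment   // fire-and-forget: decoupled in space and time
```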
6. @helenaedelson
Principles that Akka stands on can be traced back to the ’70s and ’80s
• Carl Hewitt invented the Actor Model, early ’70s
• Jim Gray and Pat Helland on the Tandem System, ’80s
• Joe Armstrong, Robert Virding and Mike Williams on Erlang, 1986
Look Back Before Looking Forward
7. @helenaedelson
• From the ’40s and still being heavily developed today across many fields of
research and application in industry.
• 1940s: Cellular automata (CA), originally discovered by Stanislaw Ulam and John
von Neumann, Los Alamos National Laboratory
• 1970s: Conway's Game of Life
• Asynchronous Cellular Automaton
Complex Adaptive Systems, Systems Theory,
early AI
8. @helenaedelson
Can solve problems that are difficult or impossible for an individual agent or a monolithic
system to solve
• The foundations for artificial neural networks and NLP
• Composed of multiple autonomous agents interacting to achieve common goals
• Decentralized: no central point of decision making
• More fault tolerant: no single point of failure
• Reach higher degrees of dependability
Multi-Agent Systems (MAS)
9. @helenaedelson
Complex Adaptive Systems (CAS)
[Word cloud: self-organization theory, emergence, synchronization, amplification, distributed networks, cellular automata, feedback loops, systems evolution, swarming, local, asynchronous, unpredictable, non-linear, adaptive, versatile]
13. @helenaedelson
• Stateful - in-memory yet durable and resilient state
• Long-lived - lifecycle is not bound to a specific session, context available until
explicitly destroyed
• Virtual - location transparent and not bound to a physical location
• Addressable - referenced through a stable address
Akka Actors Also Happen To Be
20. @helenaedelson
• Complex Event Processing (CEP) - developed 1989-1995 to analyze event-driven simulations of
distributed systems, abstracting causal event histories, patterns, filtering and aggregation in large,
distributed, time-sensitive systems
• Stream Processing - mid-1990s research in real-time event data analysis, internet companies
processing large numbers of events
• Event Sourcing (ES) - from domain-driven design and enterprise development, processing very
complex data models with often smaller datasets than internet companies
• Command Query Responsibility Segregation (CQRS) - isn't about events, but often combined with ES
• Also: Change Data Capture (CDC)
Structuring data as a stream of events
21. @helenaedelson
• How data from system behavior is structured
• Capture all changes as a sequence of events in time
• Store events as an immutable event log / append-only storage
• Preserves the happened-before causality of events
• Replay the event log to reconstruct state within a given time window, or in full
Event Sourcing
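A minimal sketch of the idea, with a hypothetical bank-account model: events are appended, never mutated, and state is a left fold over the log.

```scala
// Hypothetical bank-account events; the log is append-only and immutable.
sealed trait Event
final case class Deposited(amount: BigDecimal) extends Event
final case class Withdrawn(amount: BigDecimal) extends Event

final case class Account(balance: BigDecimal = 0) {
  def applyEvent(e: Event): Account = e match {
    case Deposited(a) => copy(balance = balance + a)
    case Withdrawn(a) => copy(balance = balance - a)
  }
}

// Replaying the history (in full, or any prefix up to a point in time)
// deterministically reconstructs state while preserving causality.
val log     = List(Deposited(100), Withdrawn(30), Deposited(5))
val current = log.foldLeft(Account())(_ applyEvent _)   // Account(75)
```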
22. @helenaedelson
Requirements - forensics
• Auditable - what is the current state and how it arrived there
• Causality - observe and analyze a system's causal structure
Applications For ES In Distributed
Asynchronous Systems
For example
• Cybersecurity and Vulnerability Detection
• Banking - what is the account balance and how did it arrive at that value
• Click stream
• Accounting & Ledgers
• Shopping Cart
• Anything with a sequence of events leading to X that must be preserved
23. @helenaedelson
A pattern decoupling the write path (commands) from the read path (queries)
• Different access patterns and differing ratios of reads to writes are typical
• Different schemas / data structures
• Typically different teams across the org own the write side and own/use the read side
• No reason to share structure, and doing so is bad practice (no monolith, loose coupling, etc.)
• Command - Writers / Publishers publish without having awareness who needs to
receive it or how to reach them (location, protocol...)
• Query - Readers / Subscribers should be able to subscribe and asynchronously receive
from topics of interest
Command Query Responsibility
Segregation (CQRS)
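A minimal sketch of the separation, with hypothetical shopping-cart types: the write side validates commands and emits events; the read side is an independently owned projection with its own schema, built asynchronously from the event stream.

```scala
// Write side: handles commands, emits events, knows nothing about readers.
sealed trait Command
final case class AddItem(cartId: String, sku: String) extends Command

sealed trait Event
final case class ItemAdded(cartId: String, sku: String) extends Event

def handle(cmd: Command): List[Event] = cmd match {
  case AddItem(id, sku) => List(ItemAdded(id, sku))   // validation elided
}

// Read side: a separate model subscribed to the event stream, shaped for queries.
final case class CartView(items: Map[String, Vector[String]] = Map.empty) {
  def project(e: Event): CartView = e match {
    case ItemAdded(id, sku) =>
      copy(items.updated(id, items.getOrElse(id, Vector.empty) :+ sku))
  }
}
```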
24. @helenaedelson
My old diagram from Kafka Summit three years ago:
Real Time Bidding (RTB)
The write path and model are naturally separate and differ from the read path:
25. @helenaedelson
• Ingest large amounts of data from multiple
sources, sometimes bursty, without
overloading the system
• Write the raw data to a store so that
• when algorithms change I can re-run the
data stream over it for new meaning
• when nodes or applications fail I can replay
data from a checkpoint to recover
• Route the event streams to my ML/Analytics
streams
It Doesn't Matter What We Call It
or Whether It's Microservices Or A
Streaming Data Pipeline
• Process and aggregate inbound data and store
aggregates for querying historical data against
the stream
• Not lose data
• Be secure, probably encrypt/decrypt everything
• Not pay massive cloud and data storage fees
• Be sure my team can handle the infrastructure
TCO
28. @helenaedelson
Akka Persistence Stateful Actors
• Enables stateful actors to persist their state for recovery and replay from failure
and error
• Events persisted to storage, nothing is mutated (no read-modify-write)
• Allows higher transaction rates and efficient replication
• Only events received by the actor are persisted
• Snapshotting for checkpoint replay
• At-least-once message delivery semantics
Event Stream As Replication Fabric
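As a sketch of what this looks like with Akka Persistence Typed (hypothetical counter; command and event handlers per the EventSourcedBehavior API):

```scala
import akka.persistence.typed.PersistenceId
import akka.persistence.typed.scaladsl.{Effect, EventSourcedBehavior}

object PersistentCounter {
  sealed trait Command
  case object Increment extends Command

  sealed trait Event
  case object Incremented extends Event

  final case class State(value: Int = 0)

  def apply(id: String): EventSourcedBehavior[Command, Event, State] =
    EventSourcedBehavior[Command, Event, State](
      persistenceId  = PersistenceId.ofUniqueId(id),
      emptyState     = State(),
      // Commands persist events; nothing is mutated in place.
      commandHandler = (_, _) => Effect.persist(Incremented),
      // On persist and on recovery, events are applied to rebuild state.
      eventHandler   = (state, _) => state.copy(value = state.value + 1))
}
```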
29. @helenaedelson
Connect different event logs with Event-sourced processors for event processing
pipelines or graphs
• Cassandra, Redis, DynamoDB, Couchbase, MongoDB, Hazelcast, JDBC and
more
• Built-in: an in-memory heap-based journal, a local filesystem-based snapshot store,
and a LevelDB-based journal
Storage Plugins
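Plugin selection is configuration; a minimal application.conf sketch wiring up the built-in LevelDB journal and local snapshot store (swap the plugin IDs for Cassandra, JDBC, etc. per that plugin's docs):

```hocon
# Built-in journal and snapshot store; replace with a plugin ID
# such as Cassandra's to target another backend.
akka.persistence.journal.plugin = "akka.persistence.journal.leveldb"
akka.persistence.snapshot-store.plugin = "akka.persistence.snapshot-store.local"
```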
30. @helenaedelson
• Your algorithms have changed and you need to replay historic data against the new
logic
• Rolling upgrade, restart, cluster migration
• Error, e.g. after a JVM crash
• Failure, e.g. cluster nodes or a DC went down, a network outage or partition
• Cloud compute layer planned maintenance restarts
• Application throws an exception and the persistent actor is configured by its
supervisor to restart
Replay Reasons
31. @helenaedelson
Akka out of the box gives us tooling for each of these steps:
• Failure awareness and lifecycle
• Save state of failed node before failure
• Load state that was in flight at time of failure (define time slice)
• Replay from a checkpoint in a snapshot or run the full history
• Resume operations
Failure And Recovery
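Building on the earlier PersistentCounter sketch, checkpointing can be enabled declaratively; a hedged example using retention criteria:

```scala
import akka.persistence.typed.scaladsl.RetentionCriteria

// Snapshot every 100 events and keep the last 2 snapshots, so recovery
// replays from the latest checkpoint rather than the full history.
def withCheckpoints(id: String) =
  PersistentCounter(id)
    .withRetention(RetentionCriteria.snapshotEvery(numberOfEvents = 100, keepNSnapshots = 2))
```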
33. @helenaedelson
● Decentralized peer-to-peer
● Cluster Formation and membership service
● Communication and Consensus
● Leader and Roles
● Cluster Lifecycle and Events
● Failure Detector
● Self-Healing
● CoordinatedShutdown
Akka Cluster: Quick Premise
34. @helenaedelson
Cluster User API
• What roles am I in, what is my address
• Join, Leave, Down
• Programmatic membership control
• Register listeners to cluster events
• Start up once a configured cluster size is
reached
• Highly tunable behavior
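A sketch of the classic Cluster extension API (system name hypothetical):

```scala
import akka.actor.ActorSystem
import akka.cluster.Cluster

val system  = ActorSystem("demo")
val cluster = Cluster(system)

cluster.selfMember.roles                  // what roles am I in
cluster.selfAddress                       // what is my address
cluster.join(cluster.selfAddress)         // programmatic join (joining self forms a cluster)
cluster.registerOnMemberUp(println("Up")) // run once this member reaches Up
// cluster.leave(cluster.selfAddress)     // graceful leave; cluster.down(...) to force-remove
```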
40. @helenaedelson
• ClusterDomainEvent: base type
• MemberUp: member status changed to Up
• UnreachableMember: member considered unreachable by failure detector
• MemberRemoved: member completely removed from the cluster
• MemberEvent: base type for member status changes (Up, Removed, ...)
• Leader events
• Reachability events
Cluster Events
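A typical subscriber, close to the pattern in the Akka docs: register for member and reachability events, then react as the cluster changes.

```scala
import akka.actor.Actor
import akka.cluster.Cluster
import akka.cluster.ClusterEvent._

class ClusterListener extends Actor {
  private val cluster = Cluster(context.system)

  override def preStart(): Unit =
    cluster.subscribe(self, initialStateMode = InitialStateAsEvents,
      classOf[MemberEvent], classOf[UnreachableMember])

  override def postStop(): Unit = cluster.unsubscribe(self)

  def receive: Receive = {
    case MemberUp(member)          => // member status changed to Up
    case UnreachableMember(member) => // flagged by the failure detector
    case MemberRemoved(member, _)  => // completely removed from the cluster
    case _: MemberEvent            => // other membership changes
  }
}
```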
41. @helenaedelson
• CurrentClusterState: a snapshot of the current cluster state, sent to new
subscribers unless InitialStateAsEvents is specified
• InitialStateAsEvents: instead receive the current state replayed as the events
that would have led to it
Cluster State
44. @helenaedelson
• Masterless
• No leader election
• Role of the leader: the only member that
can change member status
• joining to up
• exiting to removed
• Leader decisions are local to the
DC
Cluster Leader
47. @helenaedelson
Cluster Membership State
A CRDT which can be deterministically merged
[State diagram: Join (user action) puts a node in Joining; the leader moves Joining to Up; Leave (user action) moves Up to Leaving; the leader moves Leaving to Exiting and Exiting to Removed; Down (user action) moves a node to Down, which the leader then moves to Removed]
54. @helenaedelson
Cluster Singleton
Single point of cluster-wide decisions or coordination
[Diagram: a ClusterSingletonManager runs on every node; the SingletonActor is started only on the oldest node]
58. @helenaedelson
Cluster Singleton: On Failure
[Diagram: when the oldest node is downed or partitioned away, the singleton fails over to the next-oldest node's ClusterSingletonManager; a ClusterSingletonProxy keeps routing messages to the current singleton]
59. @helenaedelson
[Trade-off axis: Strong Consistency vs Always Available]
Guarantees one instance of a particular
actor type per cluster
Cluster Singleton
doc.akka.io/docs/akka/current/scala/cluster-singleton
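Usage is small; a sketch with Akka Typed, reusing the hypothetical Counter behavior from earlier:

```scala
import akka.actor.typed.ActorSystem
import akka.cluster.typed.{ClusterSingleton, SingletonActor}

val system: ActorSystem[Nothing] = ???  // your clustered actor system

// Starts (or locates) the single Counter instance in the cluster; the
// returned proxy can be messaged from any node and routes to the oldest.
val counter = ClusterSingleton(system).init(SingletonActor(Counter(), "GlobalCounter"))
counter ! Counter.Increment
```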
61. @helenaedelson
An approach to eventual distributed consistency
• Replicate data across the network
• Concurrent updates from different nodes without coordination
• Mathematical properties guarantee eventual consistency
• Updates execute immediately, unaffected by network faults
• Consistency without consensus
• Highly scalable and fault tolerant
Conflict-Free Replicated Data Types (CRDT)
A comprehensive study of Convergent and Commutative Replicated Data Types
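The canonical example is a grow-only counter; a toy sketch (not Akka's implementation) showing why element-wise max as the merge makes replicas converge regardless of delivery order:

```scala
// One slot per node; increments are local, merge is element-wise max.
final case class GCounter(entries: Map[String, Long] = Map.empty) {
  def increment(node: String): GCounter =
    copy(entries.updated(node, entries.getOrElse(node, 0L) + 1))
  def value: Long = entries.values.sum
  def merge(other: GCounter): GCounter =
    GCounter((entries.keySet ++ other.entries.keySet).map { k =>
      k -> math.max(entries.getOrElse(k, 0L), other.entries.getOrElse(k, 0L))
    }.toMap)
}

// Two replicas update concurrently with no coordination, then gossip state;
// merge is commutative, associative, and idempotent, so both converge to 3.
val a = GCounter().increment("nodeA").increment("nodeA")
val b = GCounter().increment("nodeB")
assert(a.merge(b).value == 3 && b.merge(a).value == 3)
```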
62. @helenaedelson
A replicated counter, which converges because the increment / decrement operations
commute
• Service Discovery
• Shopping Cart
• Priority on low latency and full availability
• Computation in delay-tolerant networks
• Data aggregation
• Partition-tolerant cloud computing
• Collaborative text editing
Application Of CRDTs
A few implementations:
• Riak Data Types
• SoundCloud Roshi
• Akka Distributed Data
63. @helenaedelson
1976: The maintenance of duplicate databases, Paul Johnson, Robert Thomas
1984: Efficient solutions to the replicated log and dictionary problems, Gene Wuu, Arthur Bernstein
1988: Scale and performance in a distributed file system, J. Howard, M. Kazar, S. Menees, D. Nichols, M.
Satyanarayanan, R. Sidebotham, M. West
1988: Commutativity-based concurrency control for abstract data types, W. Weihl
1989: Concurrency control in groupware systems, C. Ellis, S. Gibbs
1994: Resolving file conflicts in the Ficus file system, P. Reiher, J. Heidemann, D. Ratner, G. Skinner, and G. Popek
1994: Detecting causal relationships in distributed computations: In search of the holy grail, R. Schwarz, F. Mattern
1997: Specification of convergent abstract data types for autonomous mobile computing, C. Baquero, F. Moura
1999: Using structural characteristics for autonomous operation, Carlos Baquero, Francisco Moura
2009: A commutative replicated data type for cooperative editing, N. Preguiça, J. Marquès, M. Shapiro, M. Leţia
2011: A comprehensive study of Convergent and Commutative Replicated Data Types, M. Shapiro, N. Preguiça, C.
Baquero, M. Zawirski
Not New
64. @helenaedelson
• Low latency and high availability
• Data availability despite network partitions
• Nodes concurrently update as multi-master
• Async state replication across the cluster
• Granular control of consistency level for reads and writes
• Key-value store like API
Akka Distributed Data
doc.akka.io/docs/akka/current/scala/distributed-data
Replicated in-memory data store using CvRDT to share data between cluster nodes
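A sketch against the classic Distributed Data API (key name hypothetical); updates apply locally and gossip outward, and read/write consistency is chosen per operation:

```scala
import akka.actor.ActorSystem
import akka.cluster.ddata.{DistributedData, GCounter, GCounterKey, SelfUniqueAddress}
import akka.cluster.ddata.Replicator._

val system = ActorSystem("demo")
implicit val node: SelfUniqueAddress = DistributedData(system).selfUniqueAddress
val replicator = DistributedData(system).replicator

val HitsKey = GCounterKey("hits")

// Update locally (WriteLocal) and let replication spread it; choose
// e.g. ReadMajority / WriteMajority when stronger guarantees are needed.
replicator ! Update(HitsKey, GCounter.empty, WriteLocal)(_ :+ 1)
replicator ! Get(HitsKey, ReadLocal)
// Inside an actor, replies such as UpdateSuccess / GetSuccess arrive at sender()
```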
65. @helenaedelson
Concurrent updates from different nodes resolve via the monotonic merge function.
Counters: GCounter (grow-only), PNCounter (two GCounters, increment/decrement)
Registers: Flag (toggle boolean), LWWRegister (last-write-wins register)
Sets: GSet (grow-only, merge by union), ORSet (observed-remove, version vector)
Maps: ORMap, ORMultiMap, LWWMap, PNCounterMap
Graphs: DAG
Composable For More Advanced Types
A comprehensive study of Convergent and Commutative Replicated Data Types
66. @helenaedelson
Delta State CRDTs (δ-CRDTs)
• A way to reduce the need for sending the full state for updates
• Sending only what changed
• Merging done on the receiving side
• Eventually consistent by default, and supports opt-in causal
consistency
Delta State Replicated Data Types
Supported types: GCounter, GSet, PNCounter, PNCounterMap, LWWMap, ORMap, ORMultiMap, ORSet, LWWRegister
75. @helenaedelson
• By default the data is only kept in memory and replicated to other nodes
• If all nodes are stopped, the data is lost
• You can configure it to store on the local disk on each node (LMDB)
• Or implement your own storage backend to another store
• Durable data is loaded the next time the replicator is started
Configurable Durable Storage
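A minimal configuration sketch (directory path hypothetical) making all entries durable via the LMDB-backed store:

```hocon
# Persist ddata entries to local LMDB so they survive a full cluster stop;
# narrow the key patterns to make only selected entries durable.
akka.cluster.distributed-data.durable.keys = ["*"]
akka.cluster.distributed-data.durable.lmdb.dir = "target/ddata"
```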
76. @helenaedelson
[Trade-off axis: Strong Consistency vs Always Available]
doc.akka.io/docs/akka/current/distributed-data
Distributed Data
Eventually consistent - always accepts writes
77. @helenaedelson
• Use cases needing strong consistency over availability and low latency
• Big Data - not currently intended for billions of entries
• When a new node is added to the cluster all entries are propagated to it,
hence top-level entries should not exceed 100,000
• Data is held in memory
• If not using a delta-CRDT, when a data entry is changed the full state of that
entry may be replicated to other nodes.
Not Designed For
78. @helenaedelson
Cluster Sharding
Scale, Resilience & Consistency
• Automatically distribute entities of the same type over several nodes
• Balance resources (memory, disk space, network traffic) across
multiple nodes for scalability
• Location transparency: Interact by logical ID
• Increased fault tolerance - relocation on failure
Life beyond Distributed Transactions
[Diagram: Node 1 runs ShardRegion SR1 supervising shards S1, S2, S3]
79. @helenaedelson
Each Entity Is A Consistency Boundary
[Diagram: a sender on Node 1 sends Message(gid) to its local ShardRegion (SR1), which routes it to one of its shards (S1, S2, S3); shards are groups of entities, and your code runs in entities supervised by shards]
80. @helenaedelson
• Creates entity actors on demand
• Supervises a group of entities, as defined by the shard ID extraction
N-Shards Per Cluster Node
[Diagram: ShardCoordinator (SC) and ShardRegions SR1-SR3; Shard A holds Entity A-1 and Entity A-2, Shard B holds Entity B-1, Shard C holds Entity C-1]
81. @helenaedelson
• Creates and supervises its shards
• Knows how to route messages by routing key
ShardRegion Per Cluster Node
[Diagram: across Nodes 1-3, an Envelope(“c-1”) message is routed through the local ShardRegion to Entity C-1 in Shard C; the ShardCoordinator tracks which region owns each shard]
82. @helenaedelson
• Stores Shard to Region mappings with Akka Persistence
• Monitors all cluster node status
• If the SC goes down it starts up on another node and
replays the state
Shard Coordination
[Diagram: the ShardCoordinator runs as a Cluster Singleton alongside ShardRegions 1-3 and their shards]
83. @helenaedelson
Start Cluster Sharding On Node
[Code slide with callouts: sending data; your entity ID extraction function; your shard ID extraction function; your custom shard allocation strategy; your envelope type, or use the built-in hash-based extractor. See the sketch below.]
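A sketch of those pieces with the classic API (entity actor, envelope, and extraction functions all hypothetical; ShardRegion.HashCodeMessageExtractor is the built-in alternative):

```scala
import akka.actor.{Actor, ActorSystem, Props}
import akka.cluster.sharding.{ClusterSharding, ClusterShardingSettings, ShardRegion}

// Hypothetical entity actor and envelope type.
class CounterEntity extends Actor {
  private var n = 0
  def receive: Receive = { case "inc" => n += 1 }
}
final case class Envelope(entityId: String, payload: Any)

// Your entity ID extraction function.
val extractEntityId: ShardRegion.ExtractEntityId = {
  case Envelope(id, payload) => (id, payload)
}
// Your shard ID extraction function: a stable hash into a fixed shard space.
val numberOfShards = 100
val extractShardId: ShardRegion.ExtractShardId = {
  case Envelope(id, _) => (math.abs(id.hashCode) % numberOfShards).toString
}

val system = ActorSystem("demo")
val region = ClusterSharding(system).start(
  typeName        = "CounterEntity",
  entityProps     = Props[CounterEntity](),
  settings        = ClusterShardingSettings(system),
  extractEntityId = extractEntityId,
  extractShardId  = extractShardId)

// Sending data: address by logical ID; the region routes it wherever
// the entity currently lives.
region ! Envelope("c-1", "inc")
```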
84. @helenaedelson
Cluster Sharding: Failover
[Diagram: when the node hosting Shard C is downed, Entity C-1 fails over to another node; thanks to location transparency, Envelope(“c-1”) sent via ShardRegion 1 or 2 is transparently re-routed]
85. @helenaedelson
[Trade-off axis: Strong Consistency vs Always Available]
Each entity is a boundary of consistency
Guarantees one instance per entity type at a time per cluster
doc.akka.io/docs/akka/current/scala/cluster-sharding
Cluster Sharding
86. @helenaedelson
"Serverless is a new generation of platform-as-a-service offerings where
the infrastructure provider takes responsibility for receiving client
requests and responding to them, capacity planning, task scheduling,
and operational monitoring. Developers need to worry only about the
logic for processing client requests."
- Adzic et al
Serverless computing: economic and architectural impact
Serverless
87. @helenaedelson
• Automated infrastructure running in a container pool
• A classic data-shipping architecture - we move data to the code, not the other
way round
• Pay per execution time
• Autoscales with load
• Event driven
• Stateless
• Ephemeral (5-15 minutes)
FaaS
89. @helenaedelson
• Load and event spikes needing massive parallelism
• Scaling from 0 to 10,000s of requests and back down to zero
• Simplifies delivery of scale and availability
• As integration layer between various (ephemeral and durable) data sources
• Processing stateless intensive workloads
• As data backbone moving data from A to B and transforming it
• Can work well for event-driven use cases
What Is FaaS Good At Currently?
91. @helenaedelson
• Functions handle only one event source
• Functions are stateless, ephemeral, and short-lived
• Computational context easily lost
• Limited options for managing and coordinating distributed state
• Limited options for the right consistency guarantees
• Limited options for durable state that is scalable and available
• Expensive to load and store state from storage repeatedly
Limitations With Serverless
Distributed state is not well supported for complex distributed data workflows
92. @helenaedelson
• No direct communication, which means applications must pub/sub all data over a
storage medium
• Too high latency for general purpose distributed computing problems
For a discussion on this, and other limitations with FaaS read the paper,
“Serverless Computing: One Step Forward, Two Steps Back”
by Joe Hellerstein, et al.
FaaS Does Not Have Addressability
97. @helenaedelson
KNative Serving of Stateful Functions
[Diagram: Knative stateful serving and Knative Events front several Kubernetes Pods, each running a user function (JavaScript, Go, Java, ...); the functions talk over gRPC to a distributed datastore (Cassandra, DynamoDB, Spanner, ...)]
98. @helenaedelson
Powered by Akka Cluster Sidecars
[Diagram: the same topology with an Akka Sidecar in each Kubernetes Pod next to the user function (JavaScript, Go, Java, ...); the sidecars form an Akka Cluster in front of the distributed datastore (Cassandra, DynamoDB, Spanner, ...)]