@helenaedelson
Toward Predictability
and Stability
At the edge of chaos
@helenaedelson
Helena Edelson
● Principal Engineer @ Lightbend
● Member of the Akka team
● Former: Apple, Crowdstrike, VMware,
SpringSource, Tuplejump
● github.com/helena
● twitter.com/helenaedelson
● speakerdeck.com/helenaedelson
Data, Analytics & ML Platform Infrastructure and Cloud Engineer
Former biologist
@helenaedelson
Word Salad
Behind the buzzwords
©
@helenaedelson
When systems reach a critical level of dynamism we have to change our way of
modeling and designing them
• Stateful in a stateless world
• Automation of everything - Ops, *aaS platforms
• Persistence strategies across DCs, zones and regions
• Data and query optimization
• System availability and stability in all states of deployment and rolling restarts
• Leveraging AI / ML to
Rethinking Strategies
@helenaedelson
Computational model embracing non-determinism
- Actor Model of Computation, Carl Hewitt
• Mathematical theory treating "Actors" as primitives of concurrent computation
• Framework for a theoretical understanding of concurrency
• Asynchronous communication
• Stateful isolated processes
• Non-observable state within
• Decoupling in space and time
The Network and Autonomous Processes
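The properties listed above (asynchronous communication, isolated state, state not observable from outside) can be illustrated with a minimal sketch. This is not Akka's API: `CounterActor`, `tell` and the message strings are hypothetical names, and a thread plus a queue stands in for an actor and its mailbox.

```python
import queue
import threading


class CounterActor:
    """A toy actor: private state, a mailbox, one message processed at a time."""

    def __init__(self):
        self._mailbox = queue.Queue()   # ordered inbox for asynchronous messages
        self._count = 0                 # state is never shared directly
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def tell(self, message, reply_to=None):
        """Fire-and-forget send; the caller does not wait for processing."""
        self._mailbox.put((message, reply_to))

    def _run(self):
        while True:
            message, reply_to = self._mailbox.get()
            if message == "increment":
                self._count += 1
            elif message == "get" and reply_to is not None:
                reply_to.put(self._count)   # state leaves only as a message
            elif message == "stop":
                break

    def join(self, timeout=None):
        self._thread.join(timeout)


def demo():
    """Send three increments, then ask for the count via a reply queue."""
    actor = CounterActor()
    for _ in range(3):
        actor.tell("increment")
    replies = queue.Queue()
    actor.tell("get", reply_to=replies)
    result = replies.get(timeout=2)
    actor.tell("stop")
    actor.join(timeout=2)
    return result
```

The sender and the actor are decoupled in time: `tell` returns immediately, and the only way to learn the count is to receive a reply message.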
@helenaedelson
Principles that Akka stands on can be traced back to the ’70s and ’80s
• Carl Hewitt invented the Actor Model, early 70s
• Jim Gray and Pat Helland on the Tandem System, 80s
• Joe Armstrong, Robert Virding and Mike Williams on Erlang, 1986
Look Back Before Looking Forward
@helenaedelson
• From the ’40s and still being heavily developed today across many fields of
research and application in industry.
• 1940s: Cellular automata (CA), originally discovered by Stanislaw Ulam and John
von Neumann, Los Alamos National Laboratory
• 1970s: Conway's Game of Life
• Asynchronous Cellular Automaton
Complex Adaptive Systems, Systems Theory,
early AI
@helenaedelson
Can solve problems difficult or impossible for an individual agent or a monolithic
system to solve
• The foundations for artificial neural networks and NLP
• Composed of multiple autonomous agents, interacting to achieve common goals
• Decentralized, no central point of decision making
• More fault tolerant, no single point of failure
• Reach higher degrees of dependability
Multi-Agent Systems (MAS)
@helenaedelson
Complex Adaptive Systems (CAS)
Self-Organization
Theory
Emergence
Synchronization
Amplification
Distributed
Networks
cellular
automata
Feedback
Loops
Systems
Evolution
Swarming
local
Asynchronous
Unpredictable
Non-Linear
Adaptive
Versatile
@helenaedelson
Akka
ActorSystem
Message
Message
Actor
Actor
Actor
@helenaedelson
Actor Task Delegation & Supervision
ActorSystem Hierarchy
@helenaedelson
Akka Cluster: Distributed & Multi-DC
JVM
JVM
ActorSystem
ActorSystem
Message
Message
Actor
Actor
Actor
Message
@helenaedelson
• Stateful - in-memory yet durable and resilient state
• Long-lived - lifecycle is not bound to a specific session, context available until
explicitly destroyed
• Virtual - location transparent and not bound to a physical location
• Addressable - referenced through a stable address
Akka Actors Also Happen To Be
@helenaedelson
Consistency vs Availability
Strong Consistency Always Available
Operational Complexity
Total Cost of Ownership (TCO)
@helenaedelson
Consistency vs Availability
Strong Consistency Always Available
Node 1 Node 2
Partition Tolerance
Conflicting goals to weigh against each other
@helenaedelson
Finding Balance
CAP, Operational Complexity and the Network
@helenaedelson
Everything We Do
Is About Data
@helenaedelson
Everything We Do
Delivering Meaning
Is Data
@helenaedelson
Stream Processing
Event Sourcing
CQRS
A few patterns and approaches to event processing
@helenaedelson
• Complex Event Processing (CEP) - developed 1989-1995 to analyze event-driven simulations of
distributed systems, abstracting causal event histories, patterns, filtering and aggregation in large,
distributed, time-sensitive systems
• Stream Processing - mid-1990s research in real-time event data analysis, internet companies
processing large numbers of events
• Event Sourcing (ES) - from domain-driven design and enterprise development, processing very
complex data models with often smaller datasets than internet companies
• Command Query Responsibility Segregation (CQRS) - isn't about events, but often combined with ES
• Also - CDC
Structuring data as a stream of events
@helenaedelson
• How data from system behavior is structured
• Capture all changes as a sequence of events in time
• Store events as an immutable event log / append-only storage
• Preserves the happened-before causality of events
• Replay the event log to reconstruct state within a given time window or over the full history
Event Sourcing
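A minimal sketch of the bullets above: changes are captured as ordered events in an append-only log, and state is reconstructed by replaying all of them or only a window. The names (`Event`, `EventLog`, `apply_event`, `replay`) and the bank-account event kinds are hypothetical, chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    seq: int       # position in the log, gives the order of events
    kind: str      # "deposited" or "withdrawn" (hypothetical event kinds)
    amount: int


class EventLog:
    """Append-only log: events are added, never updated or deleted."""

    def __init__(self):
        self._events = []

    def append(self, kind, amount):
        event = Event(seq=len(self._events) + 1, kind=kind, amount=amount)
        self._events.append(event)
        return event

    def read(self, from_seq=1, to_seq=None):
        """Events with from_seq <= seq <= to_seq (to the end if to_seq is None)."""
        return [e for e in self._events
                if e.seq >= from_seq and (to_seq is None or e.seq <= to_seq)]


def apply_event(balance, event):
    """Pure state transition: current state + event -> next state."""
    if event.kind == "deposited":
        return balance + event.amount
    if event.kind == "withdrawn":
        return balance - event.amount
    raise ValueError(f"unknown event kind: {event.kind}")


def replay(events, initial=0):
    """Fold the events, in order, into a state."""
    balance = initial
    for event in events:
        balance = apply_event(balance, event)
    return balance


log = EventLog()
log.append("deposited", 100)
log.append("withdrawn", 30)
log.append("deposited", 5)

current_balance = replay(log.read())             # full history
balance_after_two = replay(log.read(to_seq=2))   # state as of event 2
```

Because the log is never mutated, the same replay answers both "what is the balance" and "how did it get there".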
@helenaedelson
Requirements - forensics
• Auditable - what is the current state and how it arrived there
• Causality - observe and analyze a system's causal structure
Applications For ES In Distributed
Asynchronous Systems
For example
• Cybersecurity and Vulnerability Detection
• Banking - what is the account balance and how did it arrive at that
• Click stream
• Accounting & Ledgers
• Shopping Cart
• Anything with a sequence of events that lead to X which must be preserved
@helenaedelson
A pattern decoupling the write path (commands) from the read path (queries)
• Different access patterns and differing ratios of reads to writes are typical
• Different schemas / data structures
• Typically different teams across orgs own the write side and use/own the read side
• No reason to share structure, and doing so is bad practice (no monolith, loose coupling, etc.)
• Command - Writers / Publishers publish without awareness of who needs to
receive it or how to reach them (location, protocol...)
• Query - Readers / Subscribers should be able to subscribe and asynchronously receive
from topics of interest
Command Query Responsibility
Segregation (CQRS)
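The separation described above can be sketched as a write side that publishes events without knowing its readers, and a read side that subscribes and keeps its own query-shaped structure. All names here (`Topic`, `CartCommandHandler`, `ItemsPerCartView`, the `ItemAdded` event) are hypothetical, and the in-process topic delivers synchronously, which a real deployment would not.

```python
from collections import defaultdict


class Topic:
    """In-process stand-in for a pub-sub topic (delivery here is synchronous)."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def publish(self, event):
        for handler in self._subscribers:
            handler(event)


class CartCommandHandler:
    """Write path: validates commands and publishes events; knows no readers."""

    def __init__(self, topic):
        self._topic = topic

    def add_item(self, cart_id, sku, quantity):
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        self._topic.publish({"type": "ItemAdded", "cart_id": cart_id,
                             "sku": sku, "quantity": quantity})


class ItemsPerCartView:
    """Read path: its own data structure, shaped for a single query."""

    def __init__(self):
        self._totals = defaultdict(int)

    def on_event(self, event):
        if event["type"] == "ItemAdded":
            self._totals[event["cart_id"]] += event["quantity"]

    def total_items(self, cart_id):
        return self._totals[cart_id]


topic = Topic()
view = ItemsPerCartView()
topic.subscribe(view.on_event)

commands = CartCommandHandler(topic)
commands.add_item("cart-1", "sku-a", 2)
commands.add_item("cart-1", "sku-b", 1)
```

The command handler and the view share only the event shape, so each side can change its schema or scale independently.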
@helenaedelson
My old diagram from 3 years ago: Kafka Summit:
Real Time Bidding (RTB)
The write path and model is naturally separate and differs from the read:
@helenaedelson
• Ingest large amounts of data, from multiple
sources, sometimes bursty so it can't overload
the system
• Write the raw data to a store so that
• when algorithms change I can run the data
stream over for new meaning
• when nodes or applications fail I can replay
data from a checkpoint to recover
• Route the event streams to my ML/Analytics
streams
It Doesn't Matter What We Call It
or Whether It's Microservices Or A
Streaming Data Pipeline
• Process and aggregate inbound data and store
aggregates for querying historical data against the
stream
• Not lose data
• Be secure, probably encrypt/decrypt everything
• Not pay massive cloud and data storage fees
• Be sure my team can handle infrastructure
TCO
@helenaedelson
Buzzwords Are For
Analysts
@helenaedelson
Boundaries between
Microservices and Stream
Processing are gone
@helenaedelson
Akka Persistence Stateful Actors
• Enables stateful actors to persist their state for recovery and replay from failure
and error
• Events persisted to storage, nothing is mutated (no read-modify-write)
• Allows higher transaction rates and efficient replication
• Only events received by the actor are persisted
• Snapshotting for checkpoint replay
• At least once message delivery semantics
Event Stream As Replication Fabric
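Snapshotting for checkpoint replay, as listed above, can be sketched as follows: recovery loads the latest snapshot and replays only the events recorded after it. This is a toy model, not the Akka Persistence API; `PersistentCounter`, the list-based journal and the `snapshot_every` parameter are assumptions made for illustration.

```python
class PersistentCounter:
    """Toy event-sourced entity with periodic snapshots."""

    def __init__(self, journal, snapshots, snapshot_every=3):
        self.journal = journal            # append-only list of (seq_nr, delta)
        self.snapshots = snapshots        # list of (seq_nr, state)
        self.snapshot_every = snapshot_every
        self.state = 0
        self.seq_nr = 0
        self.replayed = 0                 # number of events replayed on recovery
        self._recover()

    def _recover(self):
        # Start from the latest snapshot, if there is one.
        if self.snapshots:
            self.seq_nr, self.state = self.snapshots[-1]
        checkpoint = self.seq_nr
        # Replay only the events persisted after that checkpoint.
        for seq_nr, delta in self.journal:
            if seq_nr > checkpoint:
                self.state += delta
                self.seq_nr = seq_nr
                self.replayed += 1

    def add(self, delta):
        # Persist the event first, then apply it to in-memory state.
        self.seq_nr += 1
        self.journal.append((self.seq_nr, delta))
        self.state += delta
        if self.seq_nr % self.snapshot_every == 0:
            self.snapshots.append((self.seq_nr, self.state))


journal, snapshots = [], []
counter = PersistentCounter(journal, snapshots)
for delta in [1, 2, 3, 4, 5]:
    counter.add(delta)

# Simulate a crash and restart: a new instance recovers from storage.
recovered = PersistentCounter(journal, snapshots)
```

With a snapshot taken at sequence number 3, the restarted instance replays two events instead of five and arrives at the same state.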
@helenaedelson
Connect different event logs with Event-sourced processors for event processing
pipelines or graphs
• Cassandra, Redis, DynamoDB, Couchbase, MongoDB, Hazelcast, JDBC and
more
• Built-in: in-memory heap based journal, local file-system based snapshot-store
and LevelDB based journal
Storage Plugins
@helenaedelson
• Your algorithms have changed, you need to replay historic data against the new
logic
• Rolling upgrade, restart, cluster migration
• Error, e.g. after a JVM crash
• Failure, e.g. cluster nodes or a DC went down, a network outage or partition
• Cloud compute layer planned maintenance restarts
• Application throws exception, if a persistent Actor is configured to restart by a
supervisor
Replay Reasons
@helenaedelson
Akka out of the box gives us tooling for each of these steps:
• Failure awareness and lifecycle
• Save state of failed node before failure
• Load state that was in flight at time of failure (define time slice)
• Replay from a checkpoint in a snapshot or run the full history
• Resume operations
Failure And Recovery
@helenaedelson
Stateful Clusters
• Cluster Singleton
• Distributed Data
• Cluster Sharding
• Split Brain Resolver
• Distributed Lock & Kubernetes
• Multi-DC
• Cluster Bootstrapping & Service Discovery
• Cluster Management APIs
@helenaedelson
● Decentralized peer-to-peer
● Cluster Formation and membership service
● Communication and Consensus
● Leader and Roles
● Cluster Lifecycle and Events
● Failure Detector
● Self-Healing
● CoordinatedShutdown
Akka Cluster: Quick Premise
@helenaedelson
Cluster User API
• What roles am I in, what is my address
• Join, Leave, Down
• Programmatic membership control
• Register listeners to cluster events
• Startup when configurable cluster size
reached
• Highly tunable behavior
@helenaedelson
Cluster Communication
S
S
S
S
S
(leader)
@helenaedelson
Heartbeats & Failure Detection
A is unreachable!
S
S
S
S
S
🤢
A
(leader)
@helenaedelson
Failure Detector
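The idea of heartbeats feeding a failure detector can be sketched with a fixed timeout. This is a deliberate simplification: Akka's detector is a phi accrual failure detector that adapts to observed heartbeat intervals, whereas `HeartbeatMonitor` below is a hypothetical class with a hard threshold, and time is passed in explicitly to keep the example deterministic.

```python
class HeartbeatMonitor:
    """Marks a node unreachable when no heartbeat arrived within the timeout."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self.last_seen = {}               # node -> time of last heartbeat

    def heartbeat(self, node, now):
        self.last_seen[node] = now

    def unreachable(self, now):
        return {node for node, seen in self.last_seen.items()
                if now - seen > self.timeout}


monitor = HeartbeatMonitor(timeout_seconds=5)
monitor.heartbeat("A", now=0)
monitor.heartbeat("B", now=0)
monitor.heartbeat("B", now=4)            # B keeps sending heartbeats, A goes quiet

suspects = monitor.unreachable(now=6)    # A has been silent for 6 seconds

monitor.heartbeat("A", now=7)            # A is heard from again
suspects_after = monitor.unreachable(now=8)
```

Unreachability is an observation, not a verdict: a node that resumes heartbeating becomes reachable again, as in the "A is reachable again" case.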
@helenaedelson
S
S
S
S
A
A is unreachable
😵
(leader)
Failure Detector
@helenaedelson
A is reachable again
S
S
S
S
S
🤕
A
(leader)
Failure Detector
@helenaedelson
• ClusterDomainEvent: base type
• MemberUp: member status changed to Up
• UnreachableMember: member considered unreachable by failure detector
• MemberRemoved: member completely removed from the cluster
• MemberEvent: member status change Up, Removed
• Leader events
• Reachability events
Cluster Events
@helenaedelson
• CurrentClusterState: current snapshot state of the cluster, sent to new
subscribers, unless InitialStateAsEvents specified
• InitialStateAsEvents to receive messages which replay events to restore the
current snapshot of the cluster state
Cluster State
@helenaedelson
Gossip Protocol
@helenaedelson
Gossip Convergence
The cluster state is a CRDT which can be deterministically merged
@helenaedelson
(leader)
• Masterless
• No Leader Election
• Role of the leader: only one
who can change status
• joining to up
• exiting to removed
Leader decisions are local to
DC
Cluster Leader
@helenaedelson
Cluster Leader
@helenaedelson
[api]
[api]
[worker, backend]
[worker]
[worker]
Cluster Roles
@helenaedelson
Cluster Membership State
A CRDT which can be deterministically merged
Joining
Up
Leaving
Exiting
Removed
Down
User Action
Join
Leader
Action
User Action
Leave Leader
Action
Leader
Action
User Action
Down
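The membership lifecycle in the diagram above can be written as a small transition table. This is a sketch transcribed from the diagram's labels, not Akka's implementation: the state and action names are lowercase stand-ins, and allowing `down` from every state before removal is an assumption.

```python
# (state, action) -> next state; "leader" marks transitions only the leader performs.
TRANSITIONS = {
    ("joining", "leader"): "up",
    ("up", "leave"): "leaving",          # user action
    ("leaving", "leader"): "exiting",
    ("exiting", "leader"): "removed",
    ("down", "leader"): "removed",
}

# Assumption: a member can be downed from any state before removal.
DOWNABLE = {"joining", "up", "leaving", "exiting"}


def next_state(state, action):
    """Return the state reached from `state` on `action`, or raise ValueError."""
    if action == "down" and state in DOWNABLE:
        return "down"
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"no transition from {state!r} on {action!r}") from None


def run(actions, state="joining"):
    """Apply a sequence of actions starting from `state`."""
    for action in actions:
        state = next_state(state, action)
    return state


graceful = run(["leader", "leave", "leader", "leader"])   # clean leave
failed = run(["leader", "down", "leader"])                # downed while up
```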
@helenaedelson
Cluster Member Node Lifecycle
Node Lifecycle: failure
Node Lifecycle: clean startup and graceful, coordinated shutdown
@helenaedelson
Network Partitions
Split Brain
A, E & D
Unreachable
A
E
B
S
S
S
B & C
Unreachable
B
C
D
@helenaedelson
Network Partition: Split Brain
Cluster State Cluster State
@helenaedelson
developer.lightbend.com/docs/akka-commercial-addons/current/split-brain-resolver
Split Brain Resolver (SBR) Strategies
@helenaedelson
developer.lightbend.com/docs/akka-commercial-addons/current/split-brain-resolver
SBR Strategy: Keep Majority
Keep Majority:
keep = 3
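The Keep Majority decision can be sketched as a function each side of a partition evaluates on its own. This is a simplified illustration, not the Split Brain Resolver's code: `keep_majority` is a hypothetical name, and the tie-break (the side holding the lowest address survives) is stated here as an assumption about how equal-sized halves are resolved.

```python
def keep_majority(reachable, unreachable):
    """Decide whether this side of a partition should stay up.

    reachable / unreachable: sets of node addresses as seen from this side
    (reachable includes the deciding node itself).
    """
    if len(reachable) > len(unreachable):
        return True
    if len(reachable) < len(unreachable):
        return False
    # Tie: both sides must reach the same verdict without talking to each
    # other, so the side holding the lowest address is kept.
    return min(reachable | unreachable) in reachable


# Five nodes split into {A, D, E} and {B, C}, as in the partition example.
side_one_survives = keep_majority({"A", "D", "E"}, {"B", "C"})
side_two_survives = keep_majority({"B", "C"}, {"A", "D", "E"})
```

Each side decides from local information only, and the two decisions are complementary, which is what prevents two clusters from forming.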
@helenaedelson
Cluster Singleton
Single point of cluster-wide decisions or coordination
ClusterSingletonManager
ClusterSingletonManager
(oldest)
SingletonActor
ClusterSingletonManager
@helenaedelson
Cluster Singleton
ClusterSingletonProxy
Message
ClusterSingletonManager
ClusterSingletonManager
(oldest)
ClusterSingletonManager
SingletonActor
@helenaedelson
Cluster Singleton
@helenaedelson
Cluster Singleton: On Failure
(oldest)
Failover
Message
ClusterSingletonManager
SingletonActor
Downed or Network Partition
ClusterSingletonProxy
ClusterSingletonManager
@helenaedelson
Strong Consistency Always Available
Guarantees one instance of a particular
actor type per cluster
Cluster Singleton
doc.akka.io/docs/akka/current/scala/cluster-singleton
@helenaedelson
Distributed Data, CRDTs
& Eventual Consistency
Partition and delay tolerant data availability with multi-master replication
@helenaedelson
An approach to eventual distributed consistency
• Replicate data across the network
• Concurrent updates from different nodes without coordination
• Mathematical properties guarantee eventual consistency
• Updates execute immediately, unaffected by network faults
• Consistency without consensus
• Highly scalable and fault tolerant
Conflict-Free Replicated Data Types (CRDT)
A comprehensive study of Convergent and Commutative Replicated Data Types
@helenaedelson
A replicated counter, which converges because the increment / decrement operations
commute
• Service Discovery
• Shopping Cart
• Priority on low latency and full availability
• Computation in delay-tolerant networks
• Data aggregation
• Partition-tolerant cloud computing
• Collaborative text editing
Application Of CRDTs
A few implementations:
• Riak Data Types
• SoundCloud Roshi
• Akka Distributed Data
@helenaedelson
1976: The maintenance of duplicate databases, Paul Johnson, Robert Thomas
1984: Efficient solutions to the replicated log and dictionary problems, Gene Wuu, Arthur Bernstein
1988: Scale and performance in a distributed file system, J. Howard, M. Kazar, S. Menees, D. Nichols, M.
Satyanarayanan, R. Sidebotham, M. West
1988: Commutativity-based concurrency control for abstract data types, W. Weihl
1989: Concurrency control in groupware systems, C. Ellis, S. Gibbs
1994: Resolving file conflicts in the Ficus file system, P. Reiher, J. Heidemann, D. Ratner, G. Skinner, and G. Popek
1994: Detecting causal relationships in distributed computations: In search of the holy grail, R. Schwarz, F. Mattern
1997: Specification of convergent abstract data types for autonomous mobile computing, C. Baquero, F. Moura
1999: Using structural characteristics for autonomous operation, Carlos Baquero, Francisco Moura
2009: A commutative replicated data type for cooperative editing, N. Preguiça, J. Marquès, M. Shapiro, M. Leţia
2011: A comprehensive study of Convergent and Commutative Replicated Data Types, M. Shapiro, N. Preguiça, C.
Baquero, M. Zawirski
Not New
@helenaedelson
• Low latency and high availability
• Data availability despite network partitions
• Nodes concurrently update as multi-master
• Async state replication across the cluster
• Granular control of consistency level for reads and writes
• Key-value store like API
Akka Distributed Data
doc.akka.io/docs/akka/current/scala/distributed-data
Replicated in-memory data store using CvRDT to share data between cluster nodes
@helenaedelson
Concurrent updates from different nodes resolve via the monotonic merge function.
Counters: GCounter (grow-only), PNCounter (two GCounters; increment and decrement)
Registers: Flag (toggle boolean), LWWRegister (Last Write Wins register)
Sets: GSet (grow-only, merge by union), ORSet (observed-remove, version vector)
Maps: ORMap, ORMultiMap, LWWMap, PNCounterMap
Graphs: DAG
Composable For More Advanced Types
A comprehensive study of Convergent and Commutative Replicated Data Types
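A state-based counter makes the merge function concrete. The sketch below builds a PNCounter from two GCounters, as described above; it is a plain-Python illustration rather than the Akka Distributed Data classes of the same names, and the node identifiers are arbitrary strings.

```python
class GCounter:
    """Grow-only counter: one slot per node, merged by per-node maximum."""

    def __init__(self, counts=None):
        self.counts = dict(counts or {})   # node id -> increments made by that node

    def increment(self, node, n=1):
        if n < 0:
            raise ValueError("a GCounter can only grow")
        updated = dict(self.counts)
        updated[node] = updated.get(node, 0) + n
        return GCounter(updated)

    @property
    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        nodes = self.counts.keys() | other.counts.keys()
        return GCounter({k: max(self.counts.get(k, 0), other.counts.get(k, 0))
                         for k in nodes})


class PNCounter:
    """Increment/decrement counter built from two GCounters."""

    def __init__(self, p=None, n=None):
        self.p = p if p is not None else GCounter()   # increments
        self.n = n if n is not None else GCounter()   # decrements

    def increment(self, node, n=1):
        return PNCounter(self.p.increment(node, n), self.n)

    def decrement(self, node, n=1):
        return PNCounter(self.p, self.n.increment(node, n))

    @property
    def value(self):
        return self.p.value - self.n.value

    def merge(self, other):
        return PNCounter(self.p.merge(other.p), self.n.merge(other.n))


# Two replicas update concurrently, without coordination.
replica_a = PNCounter().increment("node-a", 5)
replica_b = PNCounter().decrement("node-b", 2)

merged_ab = replica_a.merge(replica_b)
merged_ba = replica_b.merge(replica_a)
```

The merge takes a per-node maximum, so it is commutative, associative and idempotent: replicas converge regardless of the order or number of times states are exchanged.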
@helenaedelson
Delta State CRDTs (δ-CRDTs)
• A way to reduce the need for sending the full state for updates
• Sending only what changed
• Merging done on the receiving side
• Eventually consistent by default, and supports opt-in causal
consistency
Delta State Replicated Data Types
GCounter
GSet
PNCounter
PNCounterMap
LWWMap
ORMap
ORMultiMap
ORSet
LWWRegister
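The delta idea above, sending only what changed and merging on the receiving side, can be sketched for a grow-only counter. The function names are hypothetical and the state is a plain dict from node id to count; the point is that the delta has the same shape as the state, so the receiver reuses the ordinary merge.

```python
def merge_counts(a, b):
    """Join of two grow-only counter states (dict: node id -> count)."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}


def increment_with_delta(state, node, n=1):
    """Return (new_state, delta); the delta holds only the entry that changed."""
    delta = {node: state.get(node, 0) + n}
    return merge_counts(state, delta), delta


def counter_value(state):
    return sum(state.values())


replica_a = {"a": 10, "b": 7, "c": 4}
replica_b = dict(replica_a)

# Replica A increments and ships the one-entry delta, not the full state.
replica_a, delta = increment_with_delta(replica_a, "a")

# Replica B merges the delta exactly as it would merge a full state.
replica_b = merge_counts(replica_b, delta)
```

For a counter with many nodes, or a set with many elements, the delta stays small while the full state grows.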
@helenaedelson
Replicator
Replicator
Replicator
Replicator
Replicator
Replicator
in memory key-value store
@helenaedelson
Replicator
Replicator
Replicator
Replicator
Replicator
Update(key, ddata)
Get(key)
Subscribe(key, actor)
Update(key, delta)
Replicator Protocol
Delete(key)
@helenaedelson
Simple Replicated Counter
Monotonic sequence: increment / decrement
@helenaedelson
Custom CvRDTs
@helenaedelson
Granular Consistency Levels
• strong consistency
• highest latency
• lowest availability
Majority is N/2 + 1
(nodes_written + nodes_read) > N
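The two formulas above can be checked with a few lines of arithmetic. The function names are hypothetical; `n` is the number of nodes holding a replica.

```python
def majority(n):
    """Smallest number of nodes that is more than half of n."""
    return n // 2 + 1


def read_sees_latest_write(nodes_written, nodes_read, n):
    """True when every read set must overlap every write set."""
    return nodes_written + nodes_read > n


n = 5
quorum = majority(n)

majority_both = read_sees_latest_write(quorum, quorum, n)   # majority write + majority read
local_both = read_sees_latest_write(1, 1, n)                # local write + local read
write_all_read_one = read_sees_latest_write(n, 1, n)        # write to all, read from one
```

With five nodes a majority is three, and 3 + 3 > 5 forces at least one node to be in both the write set and the read set. Writing and reading locally gives 1 + 1, which does not exceed 5, so a read may miss the latest write until replication catches up.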
@helenaedelson
Granular Consistency Levels
• eventual consistency
• low latency
• high availability
(nodes_written + nodes_read) > N
@helenaedelson
Capacity Tracker
} put in common trait
@helenaedelson
CDC Capacity Listener
} put in common trait
@helenaedelson
• By default the data is only kept in memory and replicated to other nodes
• If all nodes are stopped the data is lost
• You can configure it to store on the local disk on each node (LMDB)
• Or implement your own to another store via the trait
• It will be loaded the next time the replicator is started
Configurable Durable Storage
@helenaedelson
Strong Consistency Always Available
doc.akka.io/docs/akka/current/distributed-data
Distributed Data
Eventually consistent - always accepts writes
@helenaedelson
• Needing high consistency over availability and low latency
• Big Data - not currently intended for billions of entries
• When a new node is added to the cluster all entries are propagated to it,
hence top level entries should not exceed 100000
• Data is held in memory
• If not using a delta-CRDT, when a data entry is changed the full state of that
entry may be replicated to other nodes.
Not Designed For
@helenaedelson
Cluster Sharding
Scale, Resilience & Consistency
• Automatically distribute entities of the same type over several nodes
• Balance resources (memory, disk space, network traffic) across
multiple nodes for scalability
• Location transparency: Interact by logical ID
• Increased fault tolerance - relocation on failure
Life beyond Distributed Transactions
Node 1
SR1
S1 S2 S3
@helenaedelson
Each Entity Is A Consistency Boundary
Sender on Node 1
Local ShardRegion
Shards: groups of entities
Node 1
SR 1
S1 S2 S3
Your Code, Supervised By Shards
Message(gid)
@helenaedelson
• Creates entity actors on demand
• Supervises group of entities - defined by the shard ID extraction
N-Shards Per Cluster Node
Entity B-1
SR2
SC
SR1
Shard A
Shard B
Entity A-1
Entity A-2
Entity C-1
Shard C
SR3
ShardCoordinator
ShardRegion 1
ShardRegion 2
ShardRegion 3
@helenaedelson
• Creates and supervises its shards
• Knows how to route messages by routing key
ShardRegion Per Cluster Node
Envelope(“c-1”)
Entity B-1
Shard A
Shard B
Entity A-1
Entity A-2
Entity C-1
Shard C
ShardCoordinator
ShardRegion 1
ShardRegion 2
ShardRegion 3
Node 1
Node 2 Node 3
@helenaedelson
• Stores Shard to Region mappings with Akka Persistence
• Monitors all cluster node status
• If the SC goes down it starts up on another node and
replays the state
Shard Coordination
Entity B-1
Shard A
Shard B
Entity A-1
Entity A-2
Entity C-1
Shard C
ShardCoordinator
(Cluster Singleton)
ShardRegion 1
ShardRegion 2
ShardRegion 3
@helenaedelson
Start Cluster Sharding On Node
Sending data
Your Entity ID
Extraction function
Your Shard ID
Extraction function
Your custom shard
allocation strategy
Your Envelope type
Or use built-in
HashExtractor
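The two extraction functions named above can be sketched as follows. This is not the Akka sharding API: the envelope is a plain dict, `number_of_shards=100` is an arbitrary assumption, and CRC32 is used in place of a language-level hash code because Python's built-in `hash()` of a string differs between processes.

```python
import zlib
from collections import defaultdict


def extract_entity_id(envelope):
    """Entity ID extraction: which entity is this message for?"""
    return envelope["entity_id"]


def extract_shard_id(entity_id, number_of_shards=100):
    """Shard ID extraction: a stable hash of the entity ID, modulo the shard count."""
    return str(zlib.crc32(entity_id.encode("utf-8")) % number_of_shards)


def group_by_shard(envelopes, number_of_shards=100):
    """Group messages by the shard that owns their entity."""
    shards = defaultdict(list)
    for envelope in envelopes:
        shard = extract_shard_id(extract_entity_id(envelope), number_of_shards)
        shards[shard].append(envelope)
    return dict(shards)


messages = [
    {"entity_id": "c-1", "payload": "first"},
    {"entity_id": "a-2", "payload": "second"},
    {"entity_id": "c-1", "payload": "third"},
]
routed = group_by_shard(messages)
shard_of_c1 = extract_shard_id("c-1")
```

The function must be deterministic and identical on every node, so that any sender can compute where an entity lives from its logical ID alone.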
@helenaedelson
Cluster Sharding: Failover
Entity B-1
Shard A
Shard B
Entity A-1
Entity A-2
ShardCoordinator
Downed
Location Transparency
Failover
Entity C-1
Shard C
ShardRegion 1
ShardRegion 2
Envelope(“c-1”)
@helenaedelson
Strong Consistency Always Available
Each entity is a boundary of consistency
Guarantees one instance per entity (by entity ID) at a time per cluster
doc.akka.io/docs/akka/current/scala/cluster-sharding
Cluster Sharding
@helenaedelson
"Serverless is a new generation of platform-as-a-service offerings where
the infrastructure provider takes responsibility for receiving client
requests and responding to them, capacity planning, task scheduling,
and operational monitoring. Developers need to worry only about the
logic for processing client requests."
- Adzic et al
Serverless computing: economic and architectural impact
Serverless
@helenaedelson
• Automated infrastructure running in a container pool
• A classic data-shipping architecture - we move data to the code, not the other
way round
• Pay by execution time
• Autoscales with load
• Event driven
• Stateless
• Ephemeral (5-15 minutes)
FaaS
@helenaedelson
Message In
A FaaS Serverless Deployment
User Function
Deployment
Message Out
@helenaedelson
• Load and event spikes needing massive parallelism
• Scaling from 0 to 10000s of requests and down to zero
• Simplifies delivery of scale and availability
• As integration layer between various (ephemeral and durable) data sources
• Processing stateless intensive workloads
• As data backbone moving data from A to B and transforming it
• Can work well for event-driven use cases
What Is FaaS Good At Currently?
@helenaedelson
Message In User Function
Deployment
Database
Message Out
Not Serverless
In An Ideal World
FaaS With CRUD
@helenaedelson
• Functions handle only one event source
• Functions are stateless, ephemeral, and short-lived
• Computational context easily lost
• Limited options for managing and coordinating distributed state
• Limited options for the right consistency guarantees
• Limited options for durable state, that is scalable and available
• Expensive to load and store state from storage repeatedly
Limitations With Serverless
Distributed state is not well supported for complex distributed data workflows
@helenaedelson
• No direct communication which means applications must pub-sub all data over a
storage medium
• Too high latency for general purpose distributed computing problems
For a discussion on this, and other limitations with FaaS read the paper,
“Serverless Computing: One Step Forward, Two Steps Back”
by Joe Hellerstein, et al.
FaaS Does Not Have Addressability
@helenaedelson
Stateful Serverless
Knative, Akka Cluster, gRPC, CRDT
@helenaedelson
Stateful Serverless
Message In
User Function
Deployment
Message Out
State In State Out
We Need Better Models
For Distributed State
@helenaedelson
Serverless Event Sourcing
Command In
User Function
Deployment
Reply Out
Event Log In Events Out
@helenaedelson
Message In
User Function
Deployment
Message Out
States/Deltas In States/Deltas Out
Serverless CRDTs
@helenaedelson
Kubernetes Pod
Kubernetes Pod
Kubernetes Pod
Knative stateful serving
Knative Events
User Function
(JavaScript, Go, Java,…)
Knative Serving of Stateful Functions
User Function
(JavaScript, Go, Java,…)
User Function
(JavaScript, Go, Java,…)
Distributed Datastore
(Cassandra, DynamoDB, Spanner,…)
gRPC
@helenaedelson
Kubernetes Pod
Kubernetes Pod
Kubernetes Pod
Kubernetes Pod
Kubernetes Pod
Kubernetes Pod
Knative stateful serving
User Function
(JavaScript, Go, Java,…)
Powered by Akka Cluster Sidecars
User Function
(JavaScript, Go, Java,…)
User Function
(JavaScript, Go, Java,…)
Akka Sidecar
Akka Sidecar
Akka Sidecar
Akka Cluster
Distributed Datastore
(Cassandra, DynamoDB, Spanner,…)
@helenaedelson
Get Involved
github.com/lightbend/stateful-serverless
bit.ly/stateful-serverless-intro
@helenaedelson
Find Out More
• akka.io/docs
• developer.lightbend.com - sample
distributed workers project
• github.com/akka/akka-samples - many
sample projects
• discuss.akka.io - forums
• academy.lightbend.com
• developer.lightbend.com/docs/akka-commercial-addons
• lightbend.com/videos-and-webinars
• lightbend.com/learn
@helenaedelson
Thank you
speakerdeck.com/helenaedelson
@helenaedelson
github.com/helena
Slides

SignalFx Elasticsearch Metrics Monitoring and Alerting
 
Product Information - Fuse Management Central 1.0.0
Product Information - Fuse Management Central 1.0.0Product Information - Fuse Management Central 1.0.0
Product Information - Fuse Management Central 1.0.0
 
Performing Oracle Health Checks Using APEX
Performing Oracle Health Checks Using APEXPerforming Oracle Health Checks Using APEX
Performing Oracle Health Checks Using APEX
 
Iot cloud service v2.0
Iot cloud service v2.0Iot cloud service v2.0
Iot cloud service v2.0
 

Mais de Helena Edelson

Mais de Helena Edelson (11)

Patterns In The Chaos
Patterns In The ChaosPatterns In The Chaos
Patterns In The Chaos
 
Disorder And Tolerance In Distributed Systems At Scale
Disorder And Tolerance In Distributed Systems At ScaleDisorder And Tolerance In Distributed Systems At Scale
Disorder And Tolerance In Distributed Systems At Scale
 
Leveraging Kafka for Big Data in Real Time Bidding, Analytics, ML & Campaign ...
Leveraging Kafka for Big Data in Real Time Bidding, Analytics, ML & Campaign ...Leveraging Kafka for Big Data in Real Time Bidding, Analytics, ML & Campaign ...
Leveraging Kafka for Big Data in Real Time Bidding, Analytics, ML & Campaign ...
 
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch AnalysisNoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
 
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
 
Streaming Big Data & Analytics For Scale
Streaming Big Data & Analytics For ScaleStreaming Big Data & Analytics For Scale
Streaming Big Data & Analytics For Scale
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and AkkaStreaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and Akka
 
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, ScalaLambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
 
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
 
Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...
Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...
Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...
 

Último

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Último (20)

Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 

Toward Predictability and Stability

  • 2. @helenaedelson Helena Edelson ● Principal Engineer @ Lightbend ● Member of the Akka team ● Former: Apple, Crowdstrike, VMware, SpringSource, Tuplejump ● github.com/helena ● twitter.com/helenaedelson ● speakerdeck.com/helenaedelson Data, Analytics & ML Platform Infrastructure and Cloud Engineer Former biologist
  • 4. @helenaedelson When systems reach a critical level of dynamism we have to change our way of modeling and designing them • Stateful in a stateless world • Automation of everything - Ops, *aaS platforms • Persistence strategies across DCs, zones and regions • Data and query optimization • System availability and stability in all states of deployment and rolling restarts • Leveraging AI / ML to Rethinking Strategies
  • 5. @helenaedelson Computational model embracing non-determinism - Actor Model of Computation, Carl Hewitt • Mathematical theory treating "Actors" as primitives of concurrent computation • Framework for a theoretical understanding of concurrency • Asynchronous communication • Stateful isolated processes • Non-observable state within • Decoupling in space and time The Network and Autonomous Processes
  • 6. @helenaedelson Principles that Akka stands on can be traced back to the ’70s and ’80s • Carl Hewitt invented the Actor Model, early 70s • Jim Gray and Pat Helland on the Tandem System, 80s • Joe Armstrong, Robert Virding and Mike Williams on Erlang, 1986 Look Back Before Looking Forward
  • 7. @helenaedelson • From the ’40s and still being heavily developed today across many fields of research and application in industry. • 1940s: Cellular automata (CA), originally discovered by Stanislaw Ulam and John von Neumann, Los Alamos National Laboratory • 1970s: Conway's Game of Life • Asynchronous Cellular Automaton Complex Adaptive Systems, Systems Theory, early AI
  • 8. @helenaedelson Can solve problems difficult or impossible for an individual agent or a monolithic system to solve • The foundations for artificial neural networks and NLP • Composed of multiple autonomous agents, interacting to achieve common goals • Decentralized, no central point of decision making • More fault tolerant, no single point of failure • Reach higher degrees of dependability Multi-Agent Systems (MAS)
  • 9. @helenaedelson Complex Adaptive Systems (CAS) Self-Organization Theory Emergence Synchronization Amplification Distributed Networks cellular automata Feedback Loops Systems Evolution Swarming local Asynchronous Unpredictable Non-Linear Adaptive Versatile
  • 11. @helenaedelson Actor Task Delegation & Supervision ActorSystem Hierarchy
  • 12. @helenaedelson Akka Cluster: Distributed & Multi-DC JVM JVM ActorSystem ActorSystem Message Message Actor Actor Actor Message
  • 13. @helenaedelson • Stateful - in-memory yet durable and resilient state • Long-lived - lifecycle is not bound to a specific session, context available until explicitly destroyed • Virtual - location transparent and not bound to a physical location • Addressable - referenced through a stable address Akka Actors Also Happen To Be
  • 14. @helenaedelson Consistency vs Availability Strong Consistency Always Available Operational Complexity Total Cost of Ownership (TCO)
  • 15. @helenaedelson Consistency vs Availability Strong Consistency Always Available Node 1 Node 2 Partition Tolerance Conflicting goals to weigh against each other
  • 19. @helenaedelson Stream Processing Event Sourcing CQRS A few patterns and approaches to event processing
  • 20. @helenaedelson • Complex Event Processing (CEP) - developed 1989-1995 to analyze event-driven simulations of distributed systems, abstracting causal event histories, patterns, filtering and aggregation in large, distributed, time-sensitive systems • Stream Processing - mid-1990s research in real-time event data analysis, internet companies processing large numbers of events • Event Sourcing (ES) - from domain-driven design and enterprise development, processing very complex data models with often smaller datasets than internet companies • Command Query Responsibility Segregation (CQRS) - isn't about events, but often combined with ES • Also - CDC Structuring data as a stream of events
  • 21. @helenaedelson • How data from system behavior is structured • Capture all changes as a sequence of events in time • Store events as an immutable event log / append-only storage • Preserves the happened-before causality of events • Replay event log to reconstruct state within a given time window or all Event Sourcing
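A minimal sketch of the idea on this slide, assuming a hypothetical bank-account domain: state is never stored directly, it is derived by folding over an append-only log. The `Event` and `EventLog` names and the event kinds are illustrative only and do not come from Akka or from the talk.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    seq: int      # position in the log; defines the happened-before order
    kind: str     # "deposited" or "withdrawn" (hypothetical event kinds)
    amount: int


class EventLog:
    """Append-only log: events are added, never updated or deleted."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, kind: str, amount: int) -> Event:
        event = Event(seq=len(self._events) + 1, kind=kind, amount=amount)
        self._events.append(event)
        return event

    def replay(self, up_to_seq: Optional[int] = None) -> int:
        """Fold events in order to rebuild the balance, optionally stopping early."""
        balance = 0
        for event in self._events:
            if up_to_seq is not None and event.seq > up_to_seq:
                break
            if event.kind == "deposited":
                balance += event.amount
            elif event.kind == "withdrawn":
                balance -= event.amount
        return balance


log = EventLog()
log.append("deposited", 100)
log.append("withdrawn", 30)
log.append("deposited", 5)

current = log.replay()                  # state after the full history
as_of_second = log.replay(up_to_seq=2)  # state as it was after event 2
```

Because the log keeps every change, the same replay answers both "what is the balance" and "how did it get there", which is the audit property the following slide relies on.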
  • 22. @helenaedelson Requirements - forensics • Auditable - what is the current state and how it arrived there • Causality - observe and analyze a system's causal structure Applications For ES In Distributed Asynchronous Systems For example • Cybersecurity and Vulnerability Detection • Banking - what is the account balance and how did it arrive at that • Click stream • Accounting & Ledgers • Shopping Cart • Anything with a sequence of events that lead to X which must be preserved
  • 23. @helenaedelson A pattern decoupling the write path (commands) from the read path (queries) • Different access patterns and differing ratios of reads to writes are typical • Different schemas / data structures • Typically different teams around orgs owning the write and using/owning the read • No reason to share structure and bad practice (no monolith, loose coupling, etc.) • Command - Writers / Publishers publish without having awareness who needs to receive it or how to reach them (location, protocol...) • Query - Readers / Subscribers should be able to subscribe and asynchronously receive from topics of interest Command Query Responsibility Segregation (CQRS)
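The separation described on this slide can be sketched in a few lines. This is an in-process, synchronous stand-in for what would normally be an asynchronous topic or broker; `Bus`, `CartWriter`, and `ItemsPerCart` are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Event = Dict[str, object]


class Bus:
    """Tiny topic-based publish/subscribe; a stand-in for a real broker."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[Event], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Event], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Event) -> None:
        for handler in self._subscribers[topic]:
            handler(event)


class CartWriter:
    """Write path: records the event and publishes it without knowing who reads it."""

    def __init__(self, bus: Bus) -> None:
        self.bus = bus
        self.log: List[Event] = []

    def add_item(self, cart_id: str, sku: str, qty: int) -> None:
        event = {"cart": cart_id, "sku": sku, "qty": qty}
        self.log.append(event)
        self.bus.publish("cart-events", event)


class ItemsPerCart:
    """Read path: keeps its own structure, shaped for the query it serves."""

    def __init__(self) -> None:
        self.totals: Dict[str, int] = {}

    def on_event(self, event: Event) -> None:
        cart = str(event["cart"])
        self.totals[cart] = self.totals.get(cart, 0) + int(event["qty"])

    def query(self, cart_id: str) -> int:
        return self.totals.get(cart_id, 0)


bus = Bus()
view = ItemsPerCart()
bus.subscribe("cart-events", view.on_event)

writer = CartWriter(bus)
writer.add_item("cart-1", "sku-9", 2)
writer.add_item("cart-1", "sku-3", 1)
```

The writer and the read model share only the topic name and the event shape, so each side can change its storage schema independently.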
  • 24. @helenaedelson My old diagram from 3 years ago: Kafka Summit: Real Time Bidding (RTB) The write path and model is naturally separate and differs from the read:
  • 25. @helenaedelson • Ingest large amounts of data, from multiple sources, sometimes bursty so it can't overload the system • Write the raw data to a store so that • when algorithms change I can run the data stream over for new meaning • when nodes or applications fail I can replay data from a checkpoint to recover • Route the event streams to my ML/Analytics streams It Doesn't Matter What We Call It or Whether It's Microservices Or A Streaming Data Pipeline • Process and aggregate inbound data and store aggregates for querying historical data against the stream • Not lose data • Be secure, probably encrypt/decrypt everything • Not pay massive cloud and data storage fees • Be sure my team can handle infrastructure TCO
  • 28. @helenaedelson Akka Persistence Stateful Actors • Enables stateful actors to persist their state for recovery and replay from failure and error • Events persisted to storage, nothing is mutated (no read-modify-write) • Allows higher transaction rates and efficient replication • Only events received by the actor are persisted • Snapshotting for checkpoint replay • At least once message delivery semantics Event Stream As Replication Fabric
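The snapshot-plus-replay recovery mentioned on this slide can be illustrated without any Akka API. In this sketch a snapshot is assumed to be a pair of (sequence number, state), and each journaled event is a signed delta to a counter; both choices are simplifications made for the example.

```python
from typing import List, Tuple


def apply_event(state: int, event: int) -> int:
    """Hypothetical event handler: each event is a signed delta to a counter."""
    return state + event


def recover(snapshot: Tuple[int, int], journal: List[int]) -> int:
    """Rebuild state from a snapshot (seq_nr, state) plus the events journaled after it."""
    seq_nr, state = snapshot
    for event in journal[seq_nr:]:
        state = apply_event(state, event)
    return state


journal = [5, -2, 10, 1, -4]       # full event history, oldest first

full = recover((0, 0), journal)    # replay everything from the empty state
snap = (3, 13)                     # snapshot after the first 3 events: 5 - 2 + 10
fast = recover(snap, journal)      # replay only the 2 events after the snapshot
```

Both paths reach the same state; the snapshot only shortens how much of the journal has to be replayed after a restart.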
  • 29. @helenaedelson Connect different event logs with Event-sourced processors for event processing pipelines or graphs • Cassandra, Redis, DynamoDB, Couchbase, MongoDB, Hazelcast, JDBC and more • Built-in: in-memory heap based journal, local file-system based snapshot-store and LevelDB based journal Storage Plugins
  • 30. @helenaedelson • Your algorithms have changed, you need to replay historic data against the new logic • Rolling upgrade, restart, cluster migration • Error, e.g. after a JVM crash • Failure, e.g. cluster nodes or a DC went down, a network outage or partition • Cloud compute layer planned maintenance restarts • Application throws exception, if a persistent Actor is configured to restart by a supervisor Replay Reasons
  • 31. @helenaedelson Akka out of the box gives us tooling for each of these steps: • Failure awareness and lifecycle • Save state of failed node before failure • Load state that was in flight at time of failure (define time slice) • Replay from a checkpoint in a snapshot or run the full history • Resume operations Failure And Recovery
  • 32. @helenaedelson Stateful Clusters • Cluster Singleton • Distributed Data • Cluster Sharding • Split Brain Resolver • Distributed Lock & Kubernetes • Multi-DC • Cluster Bootstrapping & Service Discovery • Cluster Management APIs
  • 33. @helenaedelson ● Decentralized peer-to-peer ● Cluster Formation and membership service ● Communication and Consensus ● Leader and Roles ● Cluster Lifecycle and Events ● Failure Detector ● Self-Healing ● CoordinatedShutdown Akka Cluster: Quick Premise
  • 34. @helenaedelson Cluster User API • What roles am I in, what is my address • Join, Leave, Down • Programmatic membership control • Register listeners to cluster events • Startup when configurable cluster size reached • Highly tunable behavior
  • 36. @helenaedelson Heartbeats & Failure Detection A is unreachable! S S S S S 🤢 A (leader)
  • 39. @helenaedelson A is reachable again S S S S S 🤕 A (leader) Failure Detector
  • 40. @helenaedelson • ClusterDomainEvent: base type • MemberUp: member status changed to Up • UnreachableMember: member considered unreachable by failure detector • MemberRemoved: member completely removed from the cluster • MemberEvent: member status change Up, Removed • Leader events • Reachability events Cluster Events
  • 41. @helenaedelson • CurrentClusterState: current snapshot state of the cluster, sent to new subscribers, unless InitialStateAsEvents specified • InitialStateAsEvents to receive messages which replay events to restore the current snapshot of the cluster state Cluster State
  • 43. @helenaedelson Gossip Convergence The cluster state is a CRDT which can be deterministically merged
  • 44. @helenaedelson (leader) • Masterless • No Leader Election • Role of the leader: only one who can change status • joining to up • exiting to removed Leader decisions are local to DC Cluster Leader
  • 47. @helenaedelson Cluster Membership State A CRDT which can be deterministically merged Joining Up Leaving Exiting Removed Down User Action Join Leader Action User Action Leave Leader Action Leader Action User Action Down
  • 48. @helenaedelson Cluster Member Node Lifecycle Node Lifecycle: failure Node Lifecycle: clean startup and graceful, coordinated shutdown
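The membership diagram on this slide reads as a small state machine. The sketch below is one simplified interpretation of it: user actions are `join`, `leave`, and `down`, and everything else is a leader action. The exact transition table is an assumption reconstructed from the slide labels, and real Akka Cluster has additional states and rules that are not modelled here.

```python
from typing import Dict, Iterable, Tuple

# (current status, action) -> next status
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("Joining", "leader"): "Up",        # leader moves a joining member to Up
    ("Up", "leave"): "Leaving",         # user action: Leave
    ("Leaving", "leader"): "Exiting",
    ("Exiting", "leader"): "Removed",
    ("Down", "leader"): "Removed",      # leader removes a downed member
}


def step(status: str, action: str) -> str:
    """Apply one user or leader action to a member's status."""
    if action == "down" and status != "Removed":
        return "Down"                   # user action: Down
    try:
        return TRANSITIONS[(status, action)]
    except KeyError:
        raise ValueError(f"illegal transition: {action!r} from {status}") from None


def run(actions: Iterable[str]) -> str:
    """Start from Joining (the result of the user action Join) and apply actions in order."""
    status = "Joining"
    for action in actions:
        status = step(status, action)
    return status


graceful = run(["leader", "leave", "leader", "leader"])  # Joining -> Up -> Leaving -> Exiting -> Removed
downed = run(["leader", "down", "leader"])               # Joining -> Up -> Down -> Removed
```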
  • 49. @helenaedelson Network Partitions Split Brain A, E & D Unreachable A E B S S S B & C Unreachable B C D
  • 50. @helenaedelson Network Partition: Split Brain Cluster State Cluster State
  • 51. @helenaedelson Network Partition: Split Brain Cluster State Cluster State
  • 54. @helenaedelson Cluster Singleton Single point of cluster-wide decisions or coordination ClusterSingletonManager ClusterSingletonManager (oldest) SingletonActor ClusterSingletonManager
  • 58. @helenaedelson Cluster Singleton: On Failure (oldest) Failover Message ClusterSingletonManager SingletonActor Downed or Network Partition ClusterSingletonProxy ClusterSingletonManager
  • 59. @helenaedelson Strong Consistency Always Available Guarantees one instance of a particular actor type per cluster Cluster Singleton doc.akka.io/docs/akka/current/scala/cluster-singleton
  • 60. @helenaedelson Distributed Data, CRDTs & Eventual Consistency Partition and delay tolerant data availability with multi-master replication
  • 61. @helenaedelson An approach to eventual distributed consistency • Replicate data across the network • Concurrent updates from different nodes without coordination • Mathematical properties guarantee eventual consistency • Updates execute immediately, unaffected by network faults • Consistency without consensus • Highly scalable and fault tolerant Conflict-Free Replicated Data Types (CRDT) A comprehensive study of Convergent and Commutative Replicated Data Types
  • 62. @helenaedelson A replicated counter, which converges because the increment / decrement operations commute • Service Discovery • Shopping Cart • Priority on low latency and full availability • Computation in delay-tolerant networks • Data aggregation • Partition-tolerant cloud computing • Collaborative text editing Application Of CRDTs A few implementations: • Riak Data Types • SoundCloud Roshi • Akka Distributed Data
  • 63. @helenaedelson 1976: The maintenance of duplicate databases, Paul Johnson, Robert Thomas 1984: Efficient solutions to the replicated log and dictionary problems, Gene Wuu, Arthur Bernstein 1988: Scale and performance in a distributed file system, J. Howard, M. Kazar, S. Menees, D. Nichols, M. Satyanarayanan, R. Sidebotham, M. West 1988: Commutativity-based concurrency control for abstract data types, W. Weihl 1989: Concurrency control in groupware systems, C. Ellis, S. Gibbs 1994: Resolving file conflicts in the Ficus file system, P. Reiher, J. Heidemann, D. Ratner, G. Skinner, and G. Popek 1994: Detecting causal relationships in distributed computations: In search of the holy grail, R. Schwarz, F. Mattern 1997: Specification of convergent abstract data types for autonomous mobile computing, C. Baquero, F. Moura 1999: Using structural characteristics for autonomous operation, Carlos Baquero, Francisco Moura 2009: A commutative replicated data type for cooperative editing, N. Preguiça, J. Marquès, M. Shapiro, M. Leţia 2011: A comprehensive study of Convergent and Commutative Replicated Data Types, M. Shapiro, N. Preguiça, C. Baquero, M. Zawirski Not New
  • 64. @helenaedelson • Low latency and high availability • Data availability despite network partitions • Nodes concurrently update as multi-master • Async state replication across the cluster • Granular control of consistency level for reads and writes • Key-value store like API Akka Distributed Data doc.akka.io/docs/akka/current/scala/distributed-data Replicated in-memory data store using CvRDT to share data between cluster nodes
  • 65. @helenaedelson Concurrent updates from different nodes resolve via the monotonic merge function. Counters GCounter grow-only, PNCounter (2 GCounters) increment / decrement Registers Flag toggle boolean, LWWRegister - Last Write Wins register Sets GSet grow-only merge by union, ORSet observed-remove version vector Maps ORMap, ORMultiMap, LWWMap, PNCounterMap Graphs DAG Composable For More Advanced Types A comprehensive study of Convergent and Commutative Replicated Data Types
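To make the counter types on this slide concrete, here is a minimal state-based sketch of a grow-only counter and a PNCounter built from two of them. The class names match the slide, but this is a standalone illustration and not the Akka Distributed Data implementation.

```python
from typing import Dict, Optional


class GCounter:
    """Grow-only counter: one slot per node, merged by per-node maximum."""

    def __init__(self, counts: Optional[Dict[str, int]] = None) -> None:
        self.counts: Dict[str, int] = dict(counts or {})

    def increment(self, node: str, n: int = 1) -> "GCounter":
        updated = dict(self.counts)
        updated[node] = updated.get(node, 0) + n
        return GCounter(updated)

    def merge(self, other: "GCounter") -> "GCounter":
        nodes = self.counts.keys() | other.counts.keys()
        return GCounter({k: max(self.counts.get(k, 0), other.counts.get(k, 0)) for k in nodes})

    @property
    def value(self) -> int:
        return sum(self.counts.values())


class PNCounter:
    """Increment/decrement counter made of two GCounters: one for plus, one for minus."""

    def __init__(self, p: Optional[GCounter] = None, n: Optional[GCounter] = None) -> None:
        self.p = p if p is not None else GCounter()
        self.n = n if n is not None else GCounter()

    def increment(self, node: str, amount: int = 1) -> "PNCounter":
        return PNCounter(self.p.increment(node, amount), self.n)

    def decrement(self, node: str, amount: int = 1) -> "PNCounter":
        return PNCounter(self.p, self.n.increment(node, amount))

    def merge(self, other: "PNCounter") -> "PNCounter":
        return PNCounter(self.p.merge(other.p), self.n.merge(other.n))

    @property
    def value(self) -> int:
        return self.p.value - self.n.value


# Two nodes update concurrently, with no coordination.
a = PNCounter().increment("node-a", 3)
b = PNCounter().decrement("node-b", 1)

ab = a.merge(b)   # merge order does not matter...
ba = b.merge(a)
again = ab.merge(ab)  # ...and merging the same state twice changes nothing
```

The per-node maximum is what makes the merge commutative, associative, and idempotent, so replicas converge regardless of delivery order or duplication.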
  • 66. @helenaedelson Delta State CRDTs (δ-CRDTs) • A way to reduce the need for sending the full state for updates • Sending only what changed • Merging done on the receiving side • Eventually consistent by default, and supports opt-in causal consistency Delta State Replicated Data Types GCounter GSet PNCounter PNCounterMap LWWMap ORMap ORMultiMap ORSet LWWRegister
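The "send only what changed" idea can be shown with the simplest type on the slide, a grow-only set. In this sketch `add` returns both the new full state and the delta; the function names are illustrative, and causal delivery of deltas is not modelled.

```python
from typing import FrozenSet, Tuple


def add(state: FrozenSet[str], element: str) -> Tuple[FrozenSet[str], FrozenSet[str]]:
    """Return (new full state, delta). The delta holds only what changed."""
    delta = frozenset({element}) - state
    return state | delta, delta


def merge(state: FrozenSet[str], incoming: FrozenSet[str]) -> FrozenSet[str]:
    """Union merge on the receiving side; works for full states and for deltas."""
    return state | incoming


replica_a = frozenset({"x", "y"})
replica_b = frozenset({"x", "y"})

replica_a, delta = add(replica_a, "z")   # local update on replica A
replica_b = merge(replica_b, delta)      # ship 1 element instead of the full 3-element state
```

Since a delta is merged with the same function as a full state, a replica that misses deltas can still be repaired later by a full-state merge.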
  • 69. @helenaedelson Simple Replicated Counter Monotonic sequence: increment / decrement
  • 71. @helenaedelson Granular Consistency Levels • strong consistency • highest latency • lowest availability Majority is N/2 + 1 (nodes_written + nodes_read) > N
  • 72. @helenaedelson Granular Consistency Levels • eventual consistency • low latency • high availability (nodes_written + nodes_read) > N
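The arithmetic behind the two consistency slides above can be checked directly. This sketch assumes N replicas, integer division for the majority, and treats "reads see the latest write" as equivalent to the read set and write set overlapping in at least one node; the function names are hypothetical.

```python
def majority(n: int) -> int:
    """Majority is N/2 + 1, using integer division."""
    return n // 2 + 1


def read_sees_latest_write(nodes_written: int, nodes_read: int, n: int) -> bool:
    """True when the written and read node sets must share at least one node."""
    return nodes_written + nodes_read > n


N = 5
quorum = majority(N)                                    # 3 of 5 nodes

strong = read_sees_latest_write(quorum, quorum, N)      # 3 + 3 > 5
eventual = read_sees_latest_write(1, 1, N)              # 1 + 1 > 5 is false: local write, local read
write_all_read_one = read_sees_latest_write(N, 1, N)    # 5 + 1 > 5
```

Writing and reading at majority buys overlap at the cost of latency and availability; writing and reading locally gives the lowest latency but only eventual consistency.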
  • 75. @helenaedelson • By default the data is only kept in memory and replicated to other nodes • If all nodes are stopped the data is lost • You can configure it to store on the local disk on each node (LMDB) • Or implement your own to another store via the trait • It will be loaded the next time the replicator is started Configurable Durable Storage
  • 76. @helenaedelson Strong Consistency Always Available doc.akka.io/docs/akka/current/distributed-data Distributed Data Eventually consistent - always accepts writes
  • 77. @helenaedelson • Needing high consistency over availability and low latency • Big Data - not currently intended for billions of entries • When a new node is added to the cluster all entries are propagated to it, hence top level entries should not exceed 100000 • Data is held in memory • If not using a delta-CRDT, when a data entry is changed the full state of that entry may be replicated to other nodes. Not Designed For
  • 78. @helenaedelson Cluster Sharding Scale, Resilience & Consistency • Automatically distribute entities of the same type over several nodes • Balance resources (memory, disk space, network traffic) across multiple nodes for scalability • Location transparency: Interact by logical ID • Increased fault tolerance - relocation on failure Life beyond Distributed Transactions Node 1 SR1 S1 S2 S3
  • 79. @helenaedelson Each Entity Is A Consistency Boundary Sender on Node 1 Local ShardRegion Shards: groups of entities Node 1 SR 1 S1 S2 S3 Your Code, Supervised By Shards Message(gid)
  • 80. @helenaedelson • Creates entity actors on demand • Supervises group of entities - defined by the shard ID extraction N-Shards Per Cluster Node Entity B-1 SR2 SC SR1 Shard A Shard B Entity A-1 Entity A-2 Entity C-1 Shard C SR3 ShardCoordinator ShardRegion 1 ShardRegion 2 ShardRegion 3
  • 81. @helenaedelson • Creates and supervises its shards • Knows how to route messages by routing key ShardRegion Per Cluster Node Envelope(“c-1”) Entity B-1 Shard A Shard B Entity A-1 Entity A-2 Entity C-1 Shard C ShardCoordinator ShardRegion 1 ShardRegion 2 ShardRegion 3 Node 1 Node 2 Node 3
  • 82. @helenaedelson • Stores Shard to Region mappings with Akka Persistence • Monitors all cluster node status • If the SC goes down it starts up on another node and replays the state Shard Coordination Entity B-1 Shard A Shard B Entity A-1 Entity A-2 Entity C-1 Shard C ShardCoordinator (Cluster Singleton) ShardRegion 1 ShardRegion 2 ShardRegion 3
  • 83. @helenaedelson Start Cluster Sharding On Node Sending data Your Entity ID Extraction function Your Shard ID Extraction function Your custom shard allocation strategy Your Envelope type Or use built-in HashExtractor
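The two extraction functions named on this slide can be sketched independently of Akka. The `Envelope` shape, the shard count, and the use of CRC32 as the hash are assumptions for this example; the only requirement illustrated is that the same entity ID always maps to the same shard.

```python
import zlib
from dataclasses import dataclass

NUMBER_OF_SHARDS = 30  # assumed value; a fixed number chosen up front


@dataclass(frozen=True)
class Envelope:
    """Hypothetical envelope type carrying the routing key alongside the payload."""
    entity_id: str
    payload: str


def extract_entity_id(message: Envelope) -> str:
    """Entity ID extraction: which entity should receive this message."""
    return message.entity_id


def extract_shard_id(message: Envelope) -> str:
    """Shard ID extraction: a stable hash of the entity ID, modulo the shard count."""
    digest = zlib.crc32(message.entity_id.encode("utf-8"))
    return str(digest % NUMBER_OF_SHARDS)


first = Envelope("c-1", "hello")
second = Envelope("c-1", "world")

shard_for_first = extract_shard_id(first)
shard_for_second = extract_shard_id(second)
```

A stable hash such as CRC32 is used here instead of Python's built-in `hash`, which is randomized per process for strings and would route the same entity to different shards on different nodes.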
  • 84. @helenaedelson Cluster Sharding: Failover Entity B-1 Shard A Shard B Entity A-1 Entity A-2 ShardCoordinator Downed Location Transparency Failover Entity C-1 Shard C ShardRegion 1 ShardRegion 2 Envelope(“c-1”)
  • 85. @helenaedelson Strong Consistency Always Available Each entity is a boundary of consistency Guarantees one instance per entity type at a time per cluster doc.akka.io/docs/akka/current/scala/cluster-sharding Cluster Sharding
  • 86. @helenaedelson "Serverless is a new generation of platform-as-a-service offerings where the infrastructure provider takes responsibility for receiving client requests and responding to them, capacity planning, task scheduling, and operational monitoring. Developers need to worry only about the logic for processing client requests." - Adzic et al Serverless computing: economic and architectural impact Serverless
  • 87. @helenaedelson • Automated infrastructure running in a container pool • A classic data-shipping architecture - we move data to the code, not the other way round • Pay by execution time • Autoscales with load • Event driven • Stateless • Ephemeral (5-15 minutes) FaaS
  • 88. @helenaedelson Message In A FaaS Serverless Deployment User Function Deployment Message Out
  • 89. @helenaedelson • Load and event spikes needing massive parallelism • Scaling from 0 to 10000s requests and down to zero • Simplifies delivery of scale and availability • As integration layer between various (ephemeral and durable) data sources • Processing stateless intensive workloads • As data backbone moving data from A to B and transforming it • Can work well for event-driven use cases What Is FaaS Good At Currently?
  • 90. @helenaedelson Message In User Function Deployment Database Message Out Not Serverless In An Ideal World FaaS With CRUD
  • 91. @helenaedelson • Functions handle only one event source • Functions are stateless, ephemeral, and short-lived • Computational context easily lost • Limited options for managing and coordinating distributed state • Limited options for the right consistency guarantees • Limited options for durable state, that is scalable and available • Expensive to load and store state from storage repeatedly Limitations With Serverless Distributed state is not well supported for complex distributed data workflows
  • 92. @helenaedelson • No direct communication which means applications must pub-sub all data over a storage medium • Too high latency for general purpose distributed computing problems For a discussion on this, and other limitations with FaaS read the paper, “Serverless Computing: One Step Forward, Two Steps Back” by Joe Hellerstein, et al. FaaS Does Not Have Addressability
  • 94. @helenaedelson Stateful Serverless Message In User Function Deployment Message Out State In State Out We Need Better Models For Distributed State
  • 95. @helenaedelson Serverless Event Sourcing Command In User Function Deployment Reply Out Event Log In Events Out
  • 96. @helenaedelson Message In User Function Deployment Message Out States/Deltas In States/Deltas Out Serverless CRDTs
  • 97. @helenaedelson Kubernetes Pod Kubernetes Pod Kubernetes Pod Knative stateful serving Knative Events User Function (JavaScript, Go, Java,…) Knative Serving of Stateful Functions User Function (JavaScript, Go, Java,…) User Function (JavaScript, Go, Java,…) Distributed Datastore (Cassandra, DynamoDB, Spanner,…) gRPC
  • 98. @helenaedelson Kubernetes Pod Kubernetes Pod Kubernetes Pod Kubernetes Pod Kubernetes Pod Kubernetes Pod Knative stateful serving User Function (JavaScript, Go, Java,…) Powered by Akka Cluster Sidecars User Function (JavaScript, Go, Java,…) User Function (JavaScript, Go, Java,…) Akka Sidecar Akka Sidecar Akka Sidecar Akka Cluster Distributed Datastore (Cassandra, DynamoDB, Spanner,…)
  • 100. @helenaedelson Find Out More • akka.io/docs • developer.lightbend.com - sample distributed workers project • github.com/akka/akka-samples - many sample projects • discuss.akka.io - forums • academy.lightbend.com • developer.lightbend.com/docs/akka-commercial-addons • lightbend.com/videos-and-webinars • lightbend.com/learn