Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluent) Kafka Summit London 2019

Event Sourcing, Stream Processing & Serverless
Ben Stopford
Office of the CTO, Confluent

What we’re going to talk about
• Event Sourcing
  • What is it, and how does it relate to Event Streaming?
• Stream Processing as a kind of “Database”
  • What does this mean?
• Serverless Functions
  • How does this relate?

Can you do event sourcing with Kafka?

Popular example: Shopping Cart
The database table matches what the user sees.

Event Sourcing stores events, then derives the ‘current state view’
[Diagram: a time series of user-activity events (12.42, 12.44, 12.49, 12.50, 12.59) is turned into the current state by a chronological reduce.]
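A chronological reduce is a left fold over time-ordered events. The Python sketch below assumes a hypothetical cart event shape (`timestamp`, `type`, `item`, `qty`) and reuses the timestamps from the diagram; it illustrates the idea and is not code from the talk.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class CartEvent:
    # Hypothetical event shape, for illustration only.
    timestamp: str   # e.g. "12.42"; all timestamps share one format, so string sort works
    type: str        # "add" or "remove"
    item: str
    qty: int = 1


def apply(cart: dict, event: CartEvent) -> dict:
    """Apply one event to the current state, returning a new state."""
    updated = dict(cart)
    if event.type == "add":
        updated[event.item] = updated.get(event.item, 0) + event.qty
    elif event.type == "remove":
        remaining = updated.get(event.item, 0) - event.qty
        if remaining > 0:
            updated[event.item] = remaining
        else:
            updated.pop(event.item, None)
    return updated


def current_state(events: list[CartEvent]) -> dict:
    """Derive the current-state view: order events by time, then fold them."""
    ordered = sorted(events, key=lambda e: e.timestamp)
    return reduce(apply, ordered, {})


events = [
    CartEvent("12.42", "add", "book"),
    CartEvent("12.44", "add", "pen", 2),
    CartEvent("12.49", "remove", "pen"),
    CartEvent("12.50", "add", "mug"),
    CartEvent("12.59", "remove", "book"),
]
print(current_state(events))  # {'pen': 1, 'mug': 1}
```

Because the events are never modified, the same list can be re-reduced at any time, for example with a fixed `apply` after a bug.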

Traditional Event Sourcing
(Store immutable events in a database in time order)
[Diagram: apps persist events into a table of events.]

Traditional Event Sourcing (Read)
[Diagram: the app queries the table of events by customer Id (+session?) and performs a chronological reduce on read, inside the app.]
- No schema migration
- Similar to ‘schema on read’

Evidentiary
Accountants don’t use erasers
(e.g. audit, ledger, git)

Replayability
Recover corrupted data after a
programmatic bug

Analytics
Keep the data needed to
extract trends and behaviors
i.e. non-lossy
(e.g. insight, metrics, ML)

• Use a database (any one will do)
• Create a table and insert events as they occur
• Query all the events associated with your problem*
• Reduce them chronologically to get the current state
*Aggregate ID in DDD parlance
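The four steps can be sketched with Python's built-in `sqlite3` module standing in for "any database". The table name `events`, its columns and the event payloads are hypothetical; here the aggregate ID is a customer id.

```python
import json
import sqlite3

# In-memory database standing in for "any database" (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (
        seq          INTEGER PRIMARY KEY AUTOINCREMENT,  -- insertion order
        aggregate_id TEXT NOT NULL,                      -- e.g. a customer id
        payload      TEXT NOT NULL                       -- event body as JSON
    )
""")


def append(aggregate_id: str, event: dict) -> None:
    """Insert an immutable event as it occurs (never UPDATE or DELETE)."""
    with conn:  # commits the transaction
        conn.execute(
            "INSERT INTO events (aggregate_id, payload) VALUES (?, ?)",
            (aggregate_id, json.dumps(event)),
        )


def load(aggregate_id: str) -> list[dict]:
    """Query all events for one aggregate, in the order they were written."""
    rows = conn.execute(
        "SELECT payload FROM events WHERE aggregate_id = ? ORDER BY seq",
        (aggregate_id,),
    )
    return [json.loads(payload) for (payload,) in rows]


def cart_for(aggregate_id: str) -> dict:
    """Reduce the aggregate's events chronologically into a cart."""
    cart: dict = {}
    for event in load(aggregate_id):
        delta = event["qty"] if event["type"] == "add" else -event["qty"]
        cart[event["item"]] = cart.get(event["item"], 0) + delta
    return {item: qty for item, qty in cart.items() if qty > 0}


append("customer-1", {"type": "add", "item": "book", "qty": 1})
append("customer-2", {"type": "add", "item": "mug", "qty": 1})
append("customer-1", {"type": "add", "item": "pen", "qty": 2})
append("customer-1", {"type": "remove", "item": "book", "qty": 1})
print(cart_for("customer-1"))  # {'pen': 2}
```

The `WHERE aggregate_id = ?` lookup is exactly the query a Kafka topic does not offer.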

Traditional Event Sourcing with Kafka
• Use Kafka instead of a database
• Create a topic instead of a table, and insert events as they occur
• Query all the events associated with your problem*
• Reduce them chronologically to get the current state

Confusion: you can’t query Kafka by, say, Customer Id*

Events are a good write model,
but make a tricky read model

CQRS is a tonic: Cache the projection in a ‘View’
[Diagram: events/commands accumulate in the log; a stream processor projects them into a view (cache, DB, KTable, etc.), which apps query by customer Id. The log also feeds other systems such as search, NoSQL stores, a DWH and Hadoop.]
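A minimal sketch of the CQRS split, assuming a Python list as a stand-in for the Kafka log and a dict as the view. `write`, `OrderCountView` and the `order_placed` event are hypothetical names. The view only reflects events once `catch_up` has run, which makes it eventually consistent.

```python
from collections import defaultdict

# The log: an append-only list standing in for a Kafka topic (hypothetical).
log: list[tuple[str, dict]] = []  # (customer_id, event)


def write(customer_id: str, event: dict) -> int:
    """Command side: append an event and return its offset."""
    log.append((customer_id, event))
    return len(log) - 1


class OrderCountView:
    """Query side: a projection cached in a dict (stand-in for a cache/DB/KTable)."""

    def __init__(self) -> None:
        self.offset = 0  # next log position to consume
        self.by_customer: dict[str, int] = defaultdict(int)

    def catch_up(self) -> None:
        """Stream-processor step: consume new events and update the view."""
        while self.offset < len(log):
            customer_id, event = log[self.offset]
            if event["type"] == "order_placed":
                self.by_customer[customer_id] += 1
            self.offset += 1

    def query(self, customer_id: str) -> int:
        """Point lookup by customer id; no scan of the log is needed."""
        return self.by_customer.get(customer_id, 0)


view = OrderCountView()
write("c1", {"type": "order_placed"})
write("c2", {"type": "order_placed"})
write("c1", {"type": "order_placed"})
print(view.query("c1"))  # 0: the view has not consumed the log yet
view.catch_up()
print(view.query("c1"))  # 2
```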

Even with CQRS, Event Sourcing is Hard
CQRS helps, but it’s still quite hard if you’re a CRUD app

What’s the problem?
Harder:
• Eventually Consistent
• Multi-model (Complexity ∝ #Schemas in the log)
• More moving parts
[Diagram: a CRUD system compared with a CQRS system built on a streaming platform.]

Eventual Consistency is often good for serving layers
Example: The New York Times (https://www.confluent.io/blog/publishing-apache-kafka-new-york-times/)
• Source of Truth: every article since 1851
• Normalized assets (images, articles, bylines, tags all separate messages)
• Denormalized into a “Content View”
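A sketch of the denormalization step, assuming hypothetical message shapes for bylines, images and articles (not the New York Times' actual schema). Every message is keyed by (kind, id) so a later version replaces an earlier one; each article is then joined with the assets it references.

```python
# Normalized assets, each published as a separate message (hypothetical shapes).
assets = [
    {"kind": "byline", "id": "b1", "name": "A. Reporter"},
    {"kind": "image", "id": "i1", "url": "https://example.com/photo.jpg"},
    {"kind": "article", "id": "a1", "headline": "Headline",
     "byline": "b1", "images": ["i1"]},
]


def build_content_view(messages: list[dict]) -> dict[str, dict]:
    """Replay normalized messages and denormalize each article with its assets."""
    # Latest version of every asset; arrival order between kinds does not matter
    # because the join happens only after all messages are loaded.
    store: dict[tuple[str, str], dict] = {}
    for msg in messages:
        store[(msg["kind"], msg["id"])] = msg

    view = {}
    for (kind, asset_id), article in store.items():
        if kind != "article":
            continue
        view[asset_id] = {
            "headline": article["headline"],
            "byline": store.get(("byline", article["byline"]), {}).get("name"),
            "images": [store[("image", i)]["url"]
                       for i in article["images"] if ("image", i) in store],
        }
    return view


content_view = build_content_view(assets)
print(content_view["a1"]["byline"])  # A. Reporter
```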

If your system is both simple and transactional:
stick with CRUD and an audit/history table (populated by a Trigger or by CDC)
• Evidentiary: Yes
• Replayable: N/A to web app
• Analytics: Yes
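A sketch of the CRUD-plus-history approach using SQLite triggers through Python's `sqlite3` module. The `accounts` and `accounts_history` tables are hypothetical; a log-based CDC tool would typically capture the same changes from the database's transaction log instead of from a trigger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL);

    -- Append-only history: one row per change, never updated.
    CREATE TABLE accounts_history (
        account_id  INTEGER NOT NULL,
        old_balance INTEGER,
        new_balance INTEGER,
        changed_at  TEXT NOT NULL DEFAULT (datetime('now'))
    );

    CREATE TRIGGER accounts_insert AFTER INSERT ON accounts
    BEGIN
        INSERT INTO accounts_history (account_id, old_balance, new_balance)
        VALUES (NEW.id, NULL, NEW.balance);
    END;

    CREATE TRIGGER accounts_update AFTER UPDATE ON accounts
    BEGIN
        INSERT INTO accounts_history (account_id, old_balance, new_balance)
        VALUES (NEW.id, OLD.balance, NEW.balance);
    END;
""")

# Ordinary CRUD statements; the triggers record the history as a side effect.
with conn:
    conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 100)")
    conn.execute("UPDATE accounts SET balance = 70 WHERE id = 1")

history = conn.execute(
    "SELECT account_id, old_balance, new_balance FROM accounts_history ORDER BY rowid"
).fetchall()
print(history)  # [(1, None, 100), (1, 100, 70)]
```

The application keeps its simple CRUD model while the history table still gives an evidentiary record for audit and analytics.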

More advanced: Use a Bi-Temporal Database

Use Traditional Event
Sourcing judiciously,
where it makes sense

CQRS comes into its own
when the events move data

Online Transaction Processing: e.g. a Flight Booking System
- Flight prices are served 10,000 times for every booking
- Consistency is required only at booking time

CQRS with event movement
[Diagram: “Book Flight” requests are written to a central log, where events accumulate (Central Write). The events are moved into several views, each serving “Get Flights” queries (Global Read).]
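A sketch of central write / global read, assuming in-process Python objects in place of the log and the regional views. `CentralBookingService`, `RegionalView` and the flight key are hypothetical. Views may serve stale seat counts, but the single writer checks availability at booking time, so overbooking is still prevented.

```python
from dataclasses import dataclass, field


@dataclass
class CentralBookingService:
    """Single writer: the only place where seat availability is checked strictly."""
    seats: dict[str, int]
    log: list[dict] = field(default_factory=list)  # events accumulate here

    def __post_init__(self) -> None:
        # Publish initial availability so views can be built from the log alone.
        for flight, count in self.seats.items():
            self.log.append({"flight": flight, "seats_left": count})

    def book(self, flight: str) -> bool:
        # Consistency is enforced here, at booking time only.
        if self.seats.get(flight, 0) <= 0:
            return False
        self.seats[flight] -= 1
        self.log.append({"flight": flight, "seats_left": self.seats[flight]})
        return True


@dataclass
class RegionalView:
    """Read replica serving 'Get Flights'; updated asynchronously from the log."""
    seats_left: dict[str, int] = field(default_factory=dict)
    offset: int = 0

    def replicate(self, log: list[dict]) -> None:
        for event in log[self.offset:]:
            self.seats_left[event["flight"]] = event["seats_left"]
        self.offset = len(log)

    def get_flights(self) -> dict[str, int]:
        return dict(self.seats_left)


central = CentralBookingService(seats={"LHR-JFK": 1})
london, new_york = RegionalView(), RegionalView()
for view in (london, new_york):
    view.replicate(central.log)

print(central.book("LHR-JFK"))  # True
print(new_york.get_flights())   # {'LHR-JFK': 1}  (stale until it replicates)
print(central.book("LHR-JFK"))  # False: the central check prevents overbooking
new_york.replicate(central.log)
print(new_york.get_flights())   # {'LHR-JFK': 0}
```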

The exact same logic applies
to microservices

Microservices
[Diagram: Orders, Fraud, Billing and Email services connected by an Orders event stream.]

The Fraud service doesn’t have to be consistent with the Orders
service, because it just creates new data (new events)
[Diagram: the Orders, Fraud, Billing and Email services around the Orders stream, asking “Consistent?”]
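A sketch of the Fraud service as a pure event producer, assuming hypothetical order and fraud-check event shapes and an arbitrary `FRAUD_LIMIT`. It reads orders and writes new events; it never modifies the Orders service's data, so it can lag behind without corrupting anything.

```python
from collections.abc import Iterable, Iterator

FRAUD_LIMIT = 1_000  # hypothetical threshold, for illustration only


def fraud_service(orders: Iterable[dict]) -> Iterator[dict]:
    """Consume order events and emit new fraud-check events.

    Orders are never updated in place; the service only publishes new facts.
    """
    for order in orders:
        suspicious = order["amount"] > FRAUD_LIMIT
        yield {
            "type": "fraud_check",
            "order_id": order["order_id"],
            "result": "REJECTED" if suspicious else "PASSED",
        }


orders_topic = [
    {"order_id": 1, "amount": 250},
    {"order_id": 2, "amount": 5_000},
]
fraud_topic = list(fraud_service(orders_topic))
print([e["result"] for e in fraud_topic])  # ['PASSED', 'REJECTED']
```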

Microservices
Start to build things “Event Driven”
[Diagram: Orders, Fraud, Billing and Email services around the Orders stream.]

Event Streaming is a more general form of Event Sourcing/CQRS
Event Streaming:
• Events as shared data model
• Many microservices
• Polyglot persistence
• Data-in-flight
Event Sourcing/CQRS:
• Events as a storage model
• Single microservice
• Single DB
• Data-at-rest

Benefits of Event Streaming
stand out where there are
multiple data sources.

Join, Filter, Transform and Summarize Events from Different Sources
[Diagram: events from the Orders, Payment and Customer services flow through the event log; a projection created with the Kafka Streams API feeds the Fraud Service.]
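The talk builds this projection with the Kafka Streams API; the Python sketch below swaps in plain in-memory lists to show the same join, filter, transform and summarize steps. Event shapes are hypothetical, and unlike a real stream processor it works on a finite batch rather than incrementally, so it sidesteps windowing and late-arriving events.

```python
from collections import defaultdict

# Hypothetical events from three services.
customers = [{"customer_id": "c1", "country": "UK"},
             {"customer_id": "c2", "country": "FR"}]
orders = [
    {"order_id": 1, "customer_id": "c1", "amount": 40},
    {"order_id": 2, "customer_id": "c2", "amount": 900},
    {"order_id": 3, "customer_id": "c1", "amount": 60},
]
payments = [{"order_id": 1, "status": "OK"}, {"order_id": 2, "status": "OK"}]


def project(customers, orders, payments):
    # Tables: keep the latest value per key (similar in spirit to a KTable).
    customer_table = {c["customer_id"]: c for c in customers}
    payment_table = {p["order_id"]: p for p in payments}

    enriched = []
    for order in orders:
        payment = payment_table.get(order["order_id"])
        if payment is None:  # filter: skip orders that have no payment
            continue
        customer = customer_table.get(order["customer_id"], {})
        enriched.append({  # join + transform
            "order_id": order["order_id"],
            "amount": order["amount"],
            "country": customer.get("country"),
        })

    totals = defaultdict(int)  # summarize: paid amount per country
    for e in enriched:
        totals[e["country"]] += e["amount"]
    return enriched, dict(totals)


enriched, totals = project(customers, orders, payments)
print(totals)  # {'UK': 40, 'FR': 900}
```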

KStreams & KSQL have different positioning
• KStreams is a library for Dataflow programming:
  • App logic lives in the stream processor and can use state stores
  • Statefulness limited by operational constraints
• KSQL is a ‘database’ for event preparation:
  • App logic is a separate process (can’t use state stores)
  • Statefulness unlimited, like a DB
  • App uses a consumer in any language

This difference makes most
sense if we look to the
future.

Thesis
• Serverless provides real-time, event-driven infrastructure and
compute.
• A stream processor provides the corollary: a database-equivalent
for real-time, event-driven data.

Using FaaS
• Write a function
• Upload
• Configure a trigger (HTTP, Event, Object Store, Database, Timer etc.)

FaaS in a Nutshell
• Fully managed (Runs in a container pool)
• Cold starts can be (very) slow: 100ms – 45s (AWS 250ms-7s)
• Pay for execution time (not resources used)
• Auto-scales with load
• 0-1000+ concurrent functions
• Event driven
• Stateless
• Short lived (limit 5-15 mins)
• Weak ordering guarantees

Where is FaaS useful?
• Spiky workloads
• Use cases that don’t typically warrant massive parallelism, e.g. CI systems
• General purpose programming paradigm?

Serverless Developer Ecosystem
• Runtime diagnostics
• Monitoring
• Deploy loop
• Testing
• IDE integration
Currently quite poor

Serverless programming will likely become prevalent
[Chart: Amazon, Google and Microsoft placed on a scale from “harder than current approaches” to “easier than current approaches”.]

In the future it seems
unlikely we’ll manage our
own infrastructure.

Event-Streaming approaches this
from a different angle

FaaS is event-driven
But it isn’t streaming

Serverless functions handle only one event source
[Diagram: Customers, Orders and Payments event sources, each triggering its own FaaS/μS.]
Complex, Timing issues, Scaling limits

KSQL simplifies these issues by pre-preparing events
from different sources into one event stream
[Diagram: across a process boundary, the app sends SQL to KSQL, which joins the Orders and Payments streams with the Customers table and delivers single events containing Order, Payment and Customer to the app logic:
CREATE STREAM order-payments AS
SELECT * FROM orders, payments, customers
LEFT JOIN…]

KSQL prepares data so,
when a function is called,
a single event has all the
data that function needs.
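A sketch of such a function, assuming a hypothetical enriched event with nested `order`, `payment` and `customer` objects (real KSQL output columns would be named and shaped differently). The `(event, context)` signature follows the common AWS Lambda Python handler convention; the trigger and deployment are not shown.

```python
import json


def handler(event: dict, context: object = None) -> dict:
    """Stateless function: everything it needs arrives in one enriched event.

    No database lookups and no local state, so any number of copies can run
    in parallel and scale with load.
    """
    order, payment, customer = event["order"], event["payment"], event["customer"]
    return {
        "to": customer["email"],
        "subject": f"Order {order['id']} confirmed",
        "body": f"We received your payment of {payment['amount']:.2f}.",
    }


# A hypothetical enriched event, as an upstream join might emit it.
enriched = json.loads("""
{"order": {"id": 42},
 "payment": {"amount": 19.5},
 "customer": {"email": "jane@example.com"}}
""")
print(handler(enriched)["subject"])  # Order 42 confirmed
```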

KSQL also separates
stateful operations
from event-driven
application logic

KSQL as a “Data Layer” for Serverless Functions
• STATEFUL (KSQL): filter, transform, join, summarizations over Orders, Payments and the Customers table
• STATELESS (FaaS): fully elastic; autoscale with load

Familiar
[Diagram: the familiar split between a stateful tier and stateless application tiers.]

Traditional Application vs Event-Driven Application
• Traditional Application: Application (stateless) on top of a Database (stateful)
• Event-Driven Application: FaaS functions (stateless compute layer) on top of KSQL (stateful streaming data layer)
Massive linear scalability with elasticity

Use stream processors to
make the consumption of
events both simple and
scalable
Think Event-Driven

Summary
• Events underpin the storage models of truthful/factful architectures.
• Event sourcing is most useful when it embraces events as data-in-flight
• A stream processor provides a database-like equivalent for real-time,
event-driven data
• Serverless provides the corollary: real-time, event-driven infrastructure
and compute

Things I didn’t tell you 1/2
• Tools like KSQL provide data provisioning, not state mutation.
  • Good for offline services & data pipelines
  • Not good for CRUD (but it’s OK to mix and match)
• Kafka’s serverless integration is in its early stages.
  • Existing connector for Kafka (limited functionality)
  • Confluent connector coming
• Can KSQL handle large state?
  • Unintended rebalances can stall processing
  • Static membership (KIP-345): name the list of stream processors
  • Increase the timeout for rebalance after node removal (group.max.session.timeout.ms)
  • Worst-case reload: RocksDB state restores at ~GbE speed

Things I didn’t tell you 2/2
• Can Kafka be used for long-term storage?
  • Log files are immutable once they roll (unless compacted)
  • Jun spent a decade working on DB2
• Careful:
  • Historical reads can stall real-time requests (cached)
  • ZFS has several page cache optimizations
  • Tiered storage will help

Find out More
• Peeking Behind the Curtains of Serverless Platforms, Wang et al.
• Cloud Programming Simplified: A Berkeley View on Serverless Computing, Jonas et al.
• Journey to Event Driven, Part 3: The Affinity Between Events, Streams and Serverless, Neil Avery
• Designing Event Driven Systems, Ben Stopford

Thank you
@benstopford
Book:
https://www.confluent.io/designing-event-driven-systems
Github:
http://bit.ly/kafka-microservice-examples
Example ecosystem built with streams.
Includes KSQL, Control Center, Elastic etc.
