SlideShare uma empresa Scribd logo
1 de 101
Baixar para ler offline
Patterns and Pitfalls When
Marrying a Microservices
Architecture with an Event
Driven Architecture
Sept 2nd, 2019
Preferred Pronouns:
He/Him
On the Pride@Tinder ERG
Extensive Work in Large
Scale Real Time ETLs
ML Infrastructure /
Modeling / Personalization
in Applications
Ad-Hoc Developer
Advocate / Evangelist for
All Things Data, Both Big
and Fast, @Tinder
Who Am I: Kyle
Bendickson*
Data Software
Engineer, Machine
Learning @ Tinder
(all views are my own…)
2
Also, A Dog Dad
3
Who I think you wonderful people are:
- Respectful* of others
- Have worked in a microservices architecture
- Aware of the basics of Apache Kafka
- Can read basic and speak SQL
- Want to use tools from the Apache Ecosystem
to scale out and decouple your application
components and infrastructure via an
Event-Driven Application Architecture.
Particularly with Kafka as a Shared Source of
Truth. 4
So what will we cover?
5
Covering:
- Quick review of Tinder + some of our scale
- Discuss Quickfire, Tinder’s current unified
data event platform for client and server
side data, as well as its continued evolution
- Discuss some patterns which have helped
us move to a more event-driven architecture
via Kafka, as well as pitfalls along the way
- Why you should embrace SQL
6
A Quick Review of the
Classic Tinder Swiping
Experience
8
Strictly One
Card at a Time
Swipe Left to
Nope
Swipe Right to
Like
If both swipers
like, It’s a
Match!
The Original Swiping Experience
A Quick Primer on The Quickfire
Pipeline - Our Primary Unified
Pipeline For Client and
Server-Side Events
10
Global Scale
190+
countries
40+
languages
2.2B+
swipes daily
30B
matches
11
More Tinder Scale
~86B
Events/Day
~1M
Events/Sec
Cost Savings
~90%
Using Kafka over SQS / Kinesis
/ SNS / Lambda / Kinesis
Firehoses, etc saves us
approximately 90% on costs.*
>40TB
Data/Day
Kafka delivers the
performance and throughput
needed to sustain this scale of
data processing
Data
Look Back / Data Backbone
Schemas
Amazon
Kinesis Streams
Amazon
Kinesis Firehose
Amazon
S3
Amazon
Redshift
Ingestion ETL Importer
12
Data
Look Back / Batch Compute
Amazon EMRApache Airflow
Amazon
Kinesis Streams
Amazon
Kinesis Firehose
Amazon
Redshift
Amazon
S3
Ingestion ETL Importer
13
Data
Look Back at the Quickfire Pipeline - Ingestion
to Data Lake and Warehouse.
Data Lake
Amazon
Kinesis Streams
Amazon
Kinesis Firehose
Amazon
S3
Amazon
Redshift
Ingestion Streaming Compute Storage Batch Compute Storage Data Warehouse
Ingestion ETL Importer
14
Periscope
Event-Driven Apps - Engagement &
Notification
Data
Amazon
Kinesis Streams
Amazon
Kinesis Firehose
Amazon
S3
Amazon
Redshift
Apache Kafka
Kubernetes Kubernetes
App.Open
Tinder Notification
DynamoDB
N days after
last App.Open
Streaming
Trigger Event
Streaming Trigger
Feature
Specific
Notifications
Ingestion ETL Importer
15
Data
Look Now / Streaming + Batch Compute
Amazon
Kinesis Streams
Amazon
Kinesis Firehose
Amazon
S3
Amazon
Redshift
Kubernetes Amazon EMR
Apache Kafka
Apache Flink
Ingestion ETL Importer
16
Apache Kafka
Streaming Compute Storage
and So Much More
Motivation to Move to
an Event-Driven Data
Processing Model
Why Event-Driven?
Treat the Events as First Class Citizens in all
systems throughout Tinder.
- These events are the culmination of all user
experiences + happenings in the app.
18
Why Event-Driven?
Allow backend developers access to rich
amounts of intelligence data that has
traditionally been used for offline analysis
- Getting this data online would otherwise
add additional RPC calls and further
microservices fanout for each HTTP
client-side request
19
Why Event-Driven?
Turning the Database Inside Out!
Allow state that is typically managed by
individual services to be shared, immutably, in
the transaction log to allow decoupled access
by other services… including FUTURE services
that have yet to be conceived.
20
Why Event-Driven?
The Ability to Capture and Join Data from
Many Very Different Sources, in Real-Time.
- Query for user info originally from one
location (e.g. Postgres), hydrate with
information from another table (e.g.
DynamoDB), and then continue the event
processing pipeline, all from within Kafka
via Kafka Streams and KSQL. 21
Why Event-Driven?
Replace “Fire and Forget” calls from a
microservices architecture into a more robust,
fault-tolerant stream of data for interested
applications to consume the relevant data as
needed.
22
Why Event-Driven: The Unbounded
Nature of Data Streams
23
Image Credit: Tyler Akidau,
https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101
Solution to Very Unbounded Dataset -
Split It Into Chunks!
24
Window the data somehow to make it smaller!
Windowing is simply the notion of taking a data source (either
unbounded or bounded), and chopping it up along temporal
boundaries (or by some other aspect of the data) into finite
chunks for processing.
Windows can be time driven (example: every 30 seconds) or data driven
(example: every 100 elements). One typically distinguishes different types of
windows, such as tumbling windows (no overlap), sliding windows (with overlap),
and session windows (punctuated by a gap of inactivity). For further info on joins
and types and windowing, see the concurrent talk “Zen and the Art of Streaming
Joins” by Nick Dearden
Various Types of Windows
25
Image Credit: Tyler Akidau,
https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101
Traditional Batch Processing For
Counting By Hour
26
Image Credit: Tyler Akidau,
https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101
What’s One Problem in the Batch
Based Counting Problem?
We have to wait at least an hour for the
data to arrive!
- And it won’t all be there by then!
This can lead to results that take hours
to days to calculate! 🙀
27
What’s One Problem in the Batch
Based Counting Problem?
This could be achieved in an order of
magnitude less time in an online
setting. Data in motion should stay in
motion as much as possible.
😻
28
Traditional (Hourly) Batch Processing
Over Unbounded Data
29
Modeling This Process Streaming First
30
Why Event-Driven?
- Requirements for lower latency and/or
speculative results (e.g. partial results
before all events for a given window have
arrived).
- Know about and react to system
changes as they occur, not well after
the fact.
31
Exploring The Overall Problem Space
Further
32
Problem: Extracting Useful Information From An
Unbounded Stream of Data, which is almost always
out of order, late, and can often be pretty messy.
Problem: Storing and Processing Large Historical
Amounts of Data Without Paying Infinite Amounts of
$$$ for Excessive and Often Times Unnecessary
Intermediate Caches and Other Data Stores etc.
Traditional Batch Processing For
Generating User Sessions - Shows How
Hard and Messy Window Edges are for
Batch Systems.
33
Reordering and Building Online User Sessions
34
What Does Any of This Have
to do with Event Driven
Applications?
35
Traditional Applications vs Event Driven
Applications
36
What Happens in the Microservices Application when
one HTTP request results into several HTTP requests to
other various microservices, in the worst case?
- Some services may succeed and others won’t.
- Even timeouts can cause false successes.
- Although the client might see the expected server
side error on their end, the results may get partially
updated to various state stores.
- Data inconsistency can result
- Adding more and more HTTP requests, even “fire
and forget” still adds latency to every request.
37
Why Event-Driven?
Ensure All Stakeholders Know What Is Covered
By The System
- You may or may not be very surprised at
how often you don’t know what you don’t
know.
38
Streaming Event-Driven Approach:
Simplified View for Microservice
Consumption
39
Distributed Architecture - Parallel View
40
Source: https://flink.apache.org/introduction.html
Data Locality For Fast Local Processing of
Distributed Datasets
41
Efficient per Key State wrt Stream Time in Flink
42
Can use KStream#Process or Punctuate APIs as well as
KTable#Supress methods in Kafka Streams DSL to achieve some
similar semantics for most use cases.
Ideal Streaming First Architecture - Full Data
Flow
43
Let the
message bus,
Kafka, be the
shared source
of truth!
Localized
views can
serve the
needs of
different
microservices
Now for a higher abstraction
44
An event stream can be
considered as the changelog of
a database table.
Streams and Tables Are Duals
Deriving each user’s most recent location from a
geolocation event stream
45Image Credit: Michael Noll, Confluent:
https://www.michael-noll.com/blog/2018/04/05/of-stream-and-tables-in-kafka-and-stream-processing-part1/
Streams and Tables Are Duals
Deriving per user check in counts from
the same stream.
46
Diagram Credit: Michael Noll, Confluent
https://www.michael-noll.com/blog/2018/04/05/of-stream-and-t
ables-in-kafka-and-stream-processing-part1/
Side Note - This is all just map only SQL! (KSQL here
to be exact)
47
CREATE TABLE user_location_counts_table AS
SELECT user, count(*)
FROM user_location_stream
WINDOW TUMBLING (Size 1 hour)
GROUP BY user
Section II: Event
Driven Design
Patterns and
Pitfalls… Finally
👍 Design Patterns 👍
🚨 Pitfalls - Bad Idea 🚨
49
👍 Most Important Event Driven
Pattern for Organizational
Decoupling (In My Opinion) 👍
Event Storming
Wait… Event Storming?
What Did He Say?!
Event Storming or Domain Driven
Modeling
TLDR: Flesh out all events and
interactions in the domain of the
project and the application, as a
way to model the entire overall
architecture.
Event Storming - Why?
- Treat the Events as First Class Citizens in
the System
- Ensure All Stakeholders Know What Is
Covered By The System
- You’d be very surprised at how often you
don’t know what you don’t know.
53
54
Strictly One
Card at a Time
Swipe Left to
Nope
Swipe Right to
Like
If both swipers
like, It’s a
Match!
Event Storming Example - Swiping
Let’s Model A Few of The Basic Event(s)
Involved in the Classic Swiping Experience!
case class Swipe(userId: String /* Swiper */,
otherId: String /* Swipee */ ,
like: Boolean,
ts: Long /* Event Time */
st: Long, /* IngestionTime*/
...
) 55
What Other Events / Facts Took Place Here?
case class Match(user1: String, user2: String)
56
What Other Events / Facts Took Place Here?
/* What is this missing from the overall match
event? */
case class Match(u1: String, u2: String)
57
What could we have captured when we had
the chance?
/* Swiper / Swipee Info ! */
case class Match(u1: String, u2: String, ...)
/* Provides more information as */
case class Match(swiper: String ,
swipee: String)
58
Recall the three basic event types from the
CQRS pattern:
- Plain Old Events - Simple statements of
facts that took point in time, such as a swipe
event or a new match event. These are the
foundation of an event-driven platform.
- Commands - Calls to action
- Queries - Asking a question, e.g. of an
aggregate or over a specific view.
59
Event Driven Pattern 👍
Leveraging Change Data Capture
whenever Available.
Especially Useful for Migrating to
Kafka as a Shared Source of Truth
Design Pattern: CDC in Action with Kafka
Connect
61
Design Pattern: CDC in Action via Existing
Sources of Truth, namely DynamoDB streams
62
Users.* topic,
routed
dynamically to
topics based
on changed
keys.
Users
service
Benefits of DynamoDB Streams or other CDC
Systems:
- Can view new and old versions of the record
- Easy means for notifications of delete
events, to propagate deletions throughout
the system for data hygiene and compliance
- Can parse updates on keys only
- With true change data capture, backfills of
all time data isn’t as pertinent needed.
63
Event-Driven Pitfall 🚨🚨🚨
Not considering how long older
data is retained locally before its
expired
Migrational Pitfall: Not Considering Data
Retention 🚨
NOTE: It’s totally possible to store data forever
in Kafka.
65
Migrational Pitfall: Not Considering Data
Retention 🚨
With partitioned data by key, the application
doesn’t always know when that key and its
corresponding state will be expired.
- Keyed session identifiers
- Any aggregation (technically the window is
part of the grouping)
66
Migrational Pitfall: Not Considering Data
Retention
How to handle this:
- Manually in the code or via state retention.
- Configured in the state store builder for
Kafka backed state stores in the Kafka
Streams DSL.
- In Flink:
StreamingQueryConfig#withIdleStateRet
entionTime
67
Event-Driven Migrational Pitfall
🚨🚨🚨
Not considering the truthiness of
data that’s onboarded to Kafka in
your early phases
Migrational Pitfall: Truthiness of your CDC
source
69
Assume this is a Derived View and not a
Source of Truth…. Can + WIll Likely diverge
from whichever siloed data is considered
the “Ground Truth”
Migrational Pitfall: Truthiness of your CDC
source
Proper advice in the last scenario:
- Don’t onboard this data into kafka if
you’re using it as a shared source of truth
unless it REALLY is your source of
truth 🚨
70
Migrational Pitfall: Truthiness of your CDC
source
👎 Practical advice in the last scenario:
- Onboard this if you need to, depending
on your own application tolerance for
possibly incorrect data. 👎
- Label sketchy data as such:
- E.g. topic prefix namespacing:
user-unreliable-* topics vs user-* topics
71
Pitfall: Not considering the source of truth of
your events!
- Client side events are inherently less
reliable than server side events.
- Bugs introduced to app versions are
harder to fix than server side bugs.
- Client side events are less reliable in
general, as they may come much later than
server side events.
72
Pitfall: Not considering the source of truth of
your events!
- What can we do about this?
- How much truthiness does your application
realistically require?
- Can we add a corresponding server side event that
would be more reliable and easier to evolve over
time (e.g. new releases for services which emit
events happen every day)
- Extracting events from some datasource that is
closer to the source of truth than other possible
options.
73
Event Driven Pattern 👍
Using Broadcasted State to share
data sets across all consuming
members of streaming event-driven
services.
Event Driven Pattern 👍
Performing previously fanned out
microservice aggregational Joins
on data once it’s in kafka.
Design Pattern: Broadcasted State
- Submits a single event stream to all
consuming members in a topology.
- Aka a GlobalKTable in the Kafka Streams
DSL.
- Single writer and many consumers of a
changelog topic to build a globally
replicated view.
76
Design Pattern: Broadcasted State
- Usefulness:
- Small-to-Medium Sized Tables Only, as
the resulting broadcasted table from the
changelog stream is replicated to all tasks.
- Dynamic filter configurations, to allow
program managers etc to update topology
rules.
77
Design Pattern: Broadcasted State
- Drawbacks:
- Is not really distributed (that’s the point)
- In Kafka Streams DSL, no easy way to
ensure that only a single process writes
to the changelog for the broadcasted
table.
78
Event Driven Pattern 👍
Shared Logging Framework
Shared Logging Framework. 👍
Adding large amounts of event logging is
sometimes initially viewed incorrectly as an
extra burden to microservice developers.
However, no microservice developer is
realistically opposed to having to call a
`log.info()` function.
80
Shared Logging Framework. 👍
Allow developers to focus on the business
logic of their existing microservices without
thinking of the event stream they’re generating
and any associated challenges of “Pipelines”
and “Data flows” being added elsewhere.
- Can be as simple as `.log` to kafka
function for a JSON or Avro Payload.
81
Event Driven Pattern 👍
Side Outputting Data
- Late data
- Data deemed “invalid”
Side Outputting Data 👍
Even If You Think You Don’t Need It!
- Late data may need to be replayed if there’s a
long enough outage or large unforeseen
watermark skew.
83
Side Outputting Data 👍
Late data may need to be replayed if there’s a
long enough outage or large unforeseen
watermark skew.
84
Side Outputting Data 👍
How To
- Can be as simple as DLQ topic,
- Using a side-output in Apache Flink,
- Expressing the side output as a portion of
your topology (e.g. for data deemed “invalid”)
so that it benefits from the same semantics as
the entire topology.
85
Detour - Allowing
Non-Programmers to
Write and Deploy
Event-Driven
Streaming Analytics
for Intelligence
Gathering
Generic Streaming Aggregation
Abstraction
87
● Generic streaming services/abstractions for real-time
analytics
● Expressed by relational query
● “Correct by default” when interacting with
sources/sinks/external system
● Monitoring + cost billing attribution per team / pod in
the organization
● Aim to support custom logic while fitting into the
incremental model
● Can be chained together to create an ad-hoc data flow
Motivation for the Generic Streaming
Aggregation Abstraction
88
● Two Types of Stream Processing (from Gartner)
○ Stream Data Integration
■ Primarily cover streaming ETL
■ Integration of data source and data sinks
■ Filter and transform data (Enrich data)
■ Route data
○ Stream Analytics
■ Calculating aggregates
■ Detecting patterns to generate higher-level, more relevant
summary information
Motivating Case study: Phoenix AB Test Experiment
Assignment
89
Want to know, in near real-time, how
many users have been assigned to
various A/B testing experiments,
including the relevant subgroups etc.
Motivating Case study: Phoenix AB Test Experiment
Assignment
90
Spoiler Alert: Get Ready for A Variant
of Word Count!
Case study: AB Test Experiment Assignment - Possible Events
91
case class ExperimentAssignment(
uid: String,
ts: Long,
experimentName: String,
experimentGroup: String,
experimentBucket: String)
case class ExperimentCount(
experimentGroup: String,
experimentName: String,
experimentBucket: String,
countUsers: Long = 0L)
Case study: Phoenix AB Test Experiment Assignment -
Basic (Flink/Apache Calcite) SQL to Solve this Problem!
92
val sql: String =
"""
| SELECT experimentGroup,
| , experimentName
| , experimentBucket
| , COUNT(uid) as countUsers
| FROM
| experiment_assignment
| GROUP BY
| 1, 2, 3, TUMBLE(ts, interval '1' HOUR)
"""
Case study: Phoenix AB Test Experiment Assignment
93
Kafka source:
Case study: Phoenix AB Test Experiment Assignment
Partial Configuration File Generated From the UI or Typed
Out by Lazier Developers (like me)
94
Kafka source:
Streaming Feature Generation with Apache Spark
Structured Streaming
95
● Purely declarative API
● Unbounded table + Incremental
query
● Spark SQL execution engine
● High throughput, low latency
and fault-tolerant
● Unified Batch/Streaming API
● Streaming ML
Streaming Job Manager
96
StreamingJobManager
StreamingWorker
EMR cluster
staging prod
…
Streaming Job Manager
● Trigger a new job
● Terminate a job
● Get job status
Coming soon…
● Testing environment for job queries
● Sample job output
● Exception message if job fails
Streaming Job Manager
98
● Query job with
teamName + jobName
● Get job info with
teamName + jobName + version
● Terminate job with
teamName + jobName + version
● Job status
○ Active
○ Terminated
○ FallingBehind
TableName: streaming-jobs-config
AttributeDefinitions:
- AttributeName: jobId
AttributeType: S
- AttributeName: teamName
AttributeType: S
- AttributeName: jobName
AttributeType: S
- AttributeName: s3Key
AttributeType: S
- AttributeName: clusterId
AttributeType: S
- AttributeName: status
AttributeType: S
- AttributeName: exceptions
AttributeType: S
KeySchema:
- AttributeName: partitionKey (teamName_jobName)
KeyType: HASH
- AttributeName: version (timestamp when job is created)
KeyType: RANGE
Motivating Case study: Streaming - Extra Neat
Features
99
- Support for custom User Defined
Aggregate Functions
- Many of these are derived from a UI
from a set of drop down aggregations,
etc.
- Devs find novels ways to chain these
back into the application, typically via
Kafka and then DynamoDB or ES
Many Thanks!
100
- Tao Tao, Kate Yortsos, Shan Jing, Eric
Lane, Winston Lee, Kristin Collins
Jackson, Mary Dittmar, Cooper
Erlendsson, Yating Zhu, Chris Dower,
Matthew Orlove, Aijia Liu, Anne
Ying-An Lai, Wei Xu, Jinqiang Han,
Michael Allman, Jessica Hickey,
Jingwei Wu, Yibing Gu, and so many
others!
Further References
101
- Streaming Systems Book - Tyler Akidau,
http://streamingsystems.net/ O’Reilly and other DRM-free
platforms
- The Data Dichotomy: Rethinking the Way We Treat Data and
Services - Ben Stopford,
https://www.confluent.io/blog/data-dichotomy-rethinking-the-wa
y-we-treat-data-and-services/
- High Performance Spark - Holden Karau + Rachel Warren via
O’Reilly and other DRM-free platforms
- Not necessarily streaming focused, but a great read
for thinking about data processing with spark at scale
- https://www.oreilly.com/library/view/high-performance-
spark/9781491943199/

Mais conteúdo relacionado

Mais procurados

Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021StreamNative
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Flink Forward
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023confluent
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureDatabricks
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeMaking Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeDatabricks
 
Design patterns for microservice architecture
Design patterns for microservice architectureDesign patterns for microservice architecture
Design patterns for microservice architectureThe Software House
 
Architecture Patterns for Event Streaming (Nick Dearden, Confluent) London 20...
Architecture Patterns for Event Streaming (Nick Dearden, Confluent) London 20...Architecture Patterns for Event Streaming (Nick Dearden, Confluent) London 20...
Architecture Patterns for Event Streaming (Nick Dearden, Confluent) London 20...confluent
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Databricks
 
What’s New with Databricks Machine Learning
What’s New with Databricks Machine LearningWhat’s New with Databricks Machine Learning
What’s New with Databricks Machine LearningDatabricks
 
Considerations for Data Access in the Lakehouse
Considerations for Data Access in the LakehouseConsiderations for Data Access in the Lakehouse
Considerations for Data Access in the LakehouseDatabricks
 
Real-Life Use Cases & Architectures for Event Streaming with Apache Kafka
Real-Life Use Cases & Architectures for Event Streaming with Apache KafkaReal-Life Use Cases & Architectures for Event Streaming with Apache Kafka
Real-Life Use Cases & Architectures for Event Streaming with Apache KafkaKai Wähner
 
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Databricks
 
Apache Flink, AWS Kinesis, Analytics
Apache Flink, AWS Kinesis, Analytics Apache Flink, AWS Kinesis, Analytics
Apache Flink, AWS Kinesis, Analytics Araf Karsh Hamid
 
Data pipelines from zero to solid
Data pipelines from zero to solidData pipelines from zero to solid
Data pipelines from zero to solidLars Albertsson
 
Apache Kafka® and API Management
Apache Kafka® and API ManagementApache Kafka® and API Management
Apache Kafka® and API Managementconfluent
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)James Serra
 
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...Flink Forward
 
Data Pipline Observability meetup
Data Pipline Observability meetup Data Pipline Observability meetup
Data Pipline Observability meetup Omid Vahdaty
 

Mais procurados (20)

Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse Architecture
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeMaking Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta Lake
 
Design patterns for microservice architecture
Design patterns for microservice architectureDesign patterns for microservice architecture
Design patterns for microservice architecture
 
Architecture Patterns for Event Streaming (Nick Dearden, Confluent) London 20...
Architecture Patterns for Event Streaming (Nick Dearden, Confluent) London 20...Architecture Patterns for Event Streaming (Nick Dearden, Confluent) London 20...
Architecture Patterns for Event Streaming (Nick Dearden, Confluent) London 20...
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
 
What’s New with Databricks Machine Learning
What’s New with Databricks Machine LearningWhat’s New with Databricks Machine Learning
What’s New with Databricks Machine Learning
 
Considerations for Data Access in the Lakehouse
Considerations for Data Access in the LakehouseConsiderations for Data Access in the Lakehouse
Considerations for Data Access in the Lakehouse
 
Real-Life Use Cases & Architectures for Event Streaming with Apache Kafka
Real-Life Use Cases & Architectures for Event Streaming with Apache KafkaReal-Life Use Cases & Architectures for Event Streaming with Apache Kafka
Real-Life Use Cases & Architectures for Event Streaming with Apache Kafka
 
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
 
Apache Flink, AWS Kinesis, Analytics
Apache Flink, AWS Kinesis, Analytics Apache Flink, AWS Kinesis, Analytics
Apache Flink, AWS Kinesis, Analytics
 
Data pipelines from zero to solid
Data pipelines from zero to solidData pipelines from zero to solid
Data pipelines from zero to solid
 
Apache Kafka® and API Management
Apache Kafka® and API ManagementApache Kafka® and API Management
Apache Kafka® and API Management
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
 
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
 
Data Pipline Observability meetup
Data Pipline Observability meetup Data Pipline Observability meetup
Data Pipline Observability meetup
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 

Semelhante a Event Driven Architecture with a RESTful Microservices Architecture (Kyle Bendickson, Tinder) Kafka Summit NYC 2019

Apache Kafka and the Data Mesh | Ben Stopford and Michael Noll, Confluent
Apache Kafka and the Data Mesh | Ben Stopford and Michael Noll, ConfluentApache Kafka and the Data Mesh | Ben Stopford and Michael Noll, Confluent
Apache Kafka and the Data Mesh | Ben Stopford and Michael Noll, ConfluentHostedbyConfluent
 
Streaming analytics state of the art
Streaming analytics state of the artStreaming analytics state of the art
Streaming analytics state of the artStavros Kontopoulos
 
Trivento summercamp masterclass 9/9/2016
Trivento summercamp masterclass 9/9/2016Trivento summercamp masterclass 9/9/2016
Trivento summercamp masterclass 9/9/2016Stavros Kontopoulos
 
Apache Kafka® and the Data Mesh
Apache Kafka® and the Data MeshApache Kafka® and the Data Mesh
Apache Kafka® and the Data MeshConfluentInc1
 
Evolution from EDA to Data Mesh: Data in Motion
Evolution from EDA to Data Mesh: Data in MotionEvolution from EDA to Data Mesh: Data in Motion
Evolution from EDA to Data Mesh: Data in Motionconfluent
 
data-mesh-101.pptx
data-mesh-101.pptxdata-mesh-101.pptx
data-mesh-101.pptxTarekHamdi8
 
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big Data
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big DataVoxxed days thessaloniki 21/10/2016 - Streaming Engines for Big Data
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big DataStavros Kontopoulos
 
Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data
Voxxed Days Thesaloniki 2016 - Streaming Engines for Big DataVoxxed Days Thesaloniki 2016 - Streaming Engines for Big Data
Voxxed Days Thesaloniki 2016 - Streaming Engines for Big DataVoxxed Days Thessaloniki
 
Leveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern AnalyticsLeveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern Analyticsconfluent
 
Big Data Analytics for Real Time Systems
Big Data Analytics for Real Time SystemsBig Data Analytics for Real Time Systems
Big Data Analytics for Real Time SystemsKamalika Dutta
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptxElsonPaul2
 
Data Analytics at Altocloud
Data Analytics at Altocloud Data Analytics at Altocloud
Data Analytics at Altocloud Altocloud
 
apidays New York 2022 - Leveraging Event Streaming to Super-Charge your Busin...
apidays New York 2022 - Leveraging Event Streaming to Super-Charge your Busin...apidays New York 2022 - Leveraging Event Streaming to Super-Charge your Busin...
apidays New York 2022 - Leveraging Event Streaming to Super-Charge your Busin...apidays
 
Introduction Big Data
Introduction Big DataIntroduction Big Data
Introduction Big DataFrank Kienle
 
Trivento summercamp fast data 9/9/2016
Trivento summercamp fast data 9/9/2016Trivento summercamp fast data 9/9/2016
Trivento summercamp fast data 9/9/2016Stavros Kontopoulos
 
Introducing Events and Stream Processing into Nationwide Building Society (Ro...
Introducing Events and Stream Processing into Nationwide Building Society (Ro...Introducing Events and Stream Processing into Nationwide Building Society (Ro...
Introducing Events and Stream Processing into Nationwide Building Society (Ro...confluent
 
Time Series Analysis… using an Event Streaming Platform
Time Series Analysis… using an Event Streaming PlatformTime Series Analysis… using an Event Streaming Platform
Time Series Analysis… using an Event Streaming Platformconfluent
 
Time Series Analysis Using an Event Streaming Platform
 Time Series Analysis Using an Event Streaming Platform Time Series Analysis Using an Event Streaming Platform
Time Series Analysis Using an Event Streaming PlatformDr. Mirko Kämpf
 

Semelhante a Event Driven Architecture with a RESTful Microservices Architecture (Kyle Bendickson, Tinder) Kafka Summit NYC 2019 (20)

Apache Kafka and the Data Mesh | Ben Stopford and Michael Noll, Confluent
Apache Kafka and the Data Mesh | Ben Stopford and Michael Noll, ConfluentApache Kafka and the Data Mesh | Ben Stopford and Michael Noll, Confluent
Apache Kafka and the Data Mesh | Ben Stopford and Michael Noll, Confluent
 
Streaming analytics state of the art
Streaming analytics state of the artStreaming analytics state of the art
Streaming analytics state of the art
 
Trivento summercamp masterclass 9/9/2016
Trivento summercamp masterclass 9/9/2016Trivento summercamp masterclass 9/9/2016
Trivento summercamp masterclass 9/9/2016
 
Apache Kafka® and the Data Mesh
Apache Kafka® and the Data MeshApache Kafka® and the Data Mesh
Apache Kafka® and the Data Mesh
 
Evolution from EDA to Data Mesh: Data in Motion
Evolution from EDA to Data Mesh: Data in MotionEvolution from EDA to Data Mesh: Data in Motion
Evolution from EDA to Data Mesh: Data in Motion
 
data-mesh-101.pptx
data-mesh-101.pptxdata-mesh-101.pptx
data-mesh-101.pptx
 
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big Data
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big DataVoxxed days thessaloniki 21/10/2016 - Streaming Engines for Big Data
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big Data
 
Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data
Voxxed Days Thesaloniki 2016 - Streaming Engines for Big DataVoxxed Days Thesaloniki 2016 - Streaming Engines for Big Data
Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data
 
Leveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern AnalyticsLeveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern Analytics
 
Big Data Analytics for Real Time Systems
Big Data Analytics for Real Time SystemsBig Data Analytics for Real Time Systems
Big Data Analytics for Real Time Systems
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptx
 
Data Analytics at Altocloud
Data Analytics at Altocloud Data Analytics at Altocloud
Data Analytics at Altocloud
 
apidays New York 2022 - Leveraging Event Streaming to Super-Charge your Busin...
apidays New York 2022 - Leveraging Event Streaming to Super-Charge your Busin...apidays New York 2022 - Leveraging Event Streaming to Super-Charge your Busin...
apidays New York 2022 - Leveraging Event Streaming to Super-Charge your Busin...
 
Introduction Big Data
Introduction Big DataIntroduction Big Data
Introduction Big Data
 
Trivento summercamp fast data 9/9/2016
Trivento summercamp fast data 9/9/2016Trivento summercamp fast data 9/9/2016
Trivento summercamp fast data 9/9/2016
 
Analysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data AnalyticsAnalysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data Analytics
 
Analysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data AnalyticsAnalysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data Analytics
 
Introducing Events and Stream Processing into Nationwide Building Society (Ro...
Introducing Events and Stream Processing into Nationwide Building Society (Ro...Introducing Events and Stream Processing into Nationwide Building Society (Ro...
Introducing Events and Stream Processing into Nationwide Building Society (Ro...
 
Time Series Analysis… using an Event Streaming Platform
Time Series Analysis… using an Event Streaming PlatformTime Series Analysis… using an Event Streaming Platform
Time Series Analysis… using an Event Streaming Platform
 
Time Series Analysis Using an Event Streaming Platform
 Time Series Analysis Using an Event Streaming Platform Time Series Analysis Using an Event Streaming Platform
Time Series Analysis Using an Event Streaming Platform
 

Mais de confluent

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flinkconfluent
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsconfluent
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flinkconfluent
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...confluent
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluentconfluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkconfluent
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloudconfluent
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Diveconfluent
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluentconfluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Meshconfluent
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservicesconfluent
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3confluent
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernizationconfluent
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataconfluent
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2confluent
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesisconfluent
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023confluent
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streamsconfluent
 
The Journey to Data Mesh with Confluent
The Journey to Data Mesh with ConfluentThe Journey to Data Mesh with Confluent
The Journey to Data Mesh with Confluentconfluent
 

Mais de confluent (20)

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streams
 
The Journey to Data Mesh with Confluent
The Journey to Data Mesh with ConfluentThe Journey to Data Mesh with Confluent
The Journey to Data Mesh with Confluent
 

Último

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 

Último (20)

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 

Event Driven Architecture with a RESTful Microservices Architecture (Kyle Bendickson, Tinder) Kafka Summit NYC 2019

  • 1. Patterns and Pitfalls When Marrying a Microservices Architecture with an Event Driven Architecture Sept 2nd, 2019
  • 2. Preferred Pronouns: He/Him On the Pride@Tinder ERG Extensive Work in Large Scale Real Time ETLs ML Infrastructure / Modeling / Personalization in Applications Ad-Hoc Developer Advocate / Evangelist for All Things Data, Both Big and Fast, @Tinder Who Am I: Kyle Bendickson* Data Software Engineer, Machine Learning @ Tinder (all views are my own…) 2
  • 3. Also, A Dog Dad 3
  • 4. Who I think you wonderful people are: - Respectful* of others - Have worked in a microservices architecture - Aware of the basics of Apache Kafka - Can read basic and speak SQL - Want to use tools from the Apache Ecosystem to scale out and decouple your application components and infrastructure via an Event-Driven Application Architecture. Particularly with Kafka as a Shared Source of Truth. 4
  • 5. So what will we cover? 5
  • 6. Covering: - Quick review of Tinder + some of our scale - Discuss Quickfire, Tinder’s current unified data event platform for client and server side data, as well as its continued evolution - Discuss some patterns which have helped us move to a more event-driven architecture via Kafka, as well as pitfalls along the way - Why you should embrace SQL 6
  • 7. A Quick Review of the Classic Tinder Swiping Experience
  • 8. 8 Strictly One Card at a Time Swipe Left to Nope Swipe Right to Like If both swipers like, It’s a Match! The Original Swiping Experience
  • 9. A Quick Primer on The Quickfire Pipeline - Our Primary Unified Pipeline For Client and Server-Side Events
  • 11. 11 More Tinder Scale ~86B Events/Day ~1M Events/Sec Cost Savings ~90% Using Kafka over SQS / Kinesis / SNS / Lambda / Kinesis Firehoses, etc saves us approximately 90% on costs.* >40TB Data/Day Kafka delivers the performance and throughput needed to sustain this scale of data processing
  • 12. Data Look Back / Data Backbone Schemas Amazon Kinesis Streams Amazon Kinesis Firehose Amazon S3 Amazon Redshift Ingestion ETL Importer 12
  • 13. Data Look Back / Batch Compute Amazon EMRApache Airflow Amazon Kinesis Streams Amazon Kinesis Firehose Amazon Redshift Amazon S3 Ingestion ETL Importer 13
  • 14. Data Look Back at the Quickfire Pipeline - Ingestion to Data Lake and Warehouse. Data Lake Amazon Kinesis Streams Amazon Kinesis Firehose Amazon S3 Amazon Redshift Ingestion Streaming Compute Storage Batch Compute Storage Data Warehouse Ingestion ETL Importer 14 Periscope
  • 15. Event-Driven Apps - Engagement & Notification Data Amazon Kinesis Streams Amazon Kinesis Firehose Amazon S3 Amazon Redshift Apache Kafka Kubernetes Kubernetes App.Open Tinder Notification DynamoDB N days after last App.Open Streaming Trigger Event Streaming Trigger Feature Specific Notifications Ingestion ETL Importer 15
  • 16. Data Look Now / Streaming + Batch Compute Amazon Kinesis Streams Amazon Kinesis Firehose Amazon S3 Amazon Redshift Kubernetes Amazon EMR Apache Kafka Apache Flink Ingestion ETL Importer 16 Apache Kafka Streaming Compute Storage and So Much More
  • 17. Motivation to Move to an Event-Driven Data Processing Model
  • 18. Why Event-Driven? Treat the Events as First Class Citizens in all systems throughout Tinder. - These events are the culmination of all user experiences + happenings in the app. 18
  • 19. Why Event-Driven? Allow backend developers access to rich amounts of intelligence data that has traditionally been used for offline analysis - Getting this data online would otherwise add additional RPC calls and further microservices fanout for each HTTP client-side request 19
  • 20. Why Event-Driven? Turning the Database Inside Out! Allow state that is typically managed by individual services to be shared, immutably, in the transaction log to allow decoupled access by other services… including FUTURE services that have yet to be conceived. 20
  • 21. Why Event-Driven? The Ability to Capture and Join Data from Many Very Different Sources, in Real-Time. - Query for user info originally from one location (e.g. Postgres), hydrate with information from another table (e.g. DynamoDB), and then continue the event processing pipeline, all from within Kafka via Kafka Streams and KSQL. 21
  • 22. Why Event-Driven? Replace “Fire and Forget” calls from a microservices architecture into a more robust, fault-tolerant stream of data for interested applications to consume the relevant data as needed. 22
  • 23. Why Event-Driven: The Unbounded Nature of Data Streams 23 Image Credit: Tyler Akidau, https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101
  • 24. Solution to Very Unbounded Dataset - Split It Into Chunks! 24 Window the data somehow to make it smaller! Windowing is simply the notion of taking a data source (either unbounded or bounded), and chopping it up along temporal boundaries (or by some other aspect of the data) into finite chunks for processing. Windows can be time driven (example: every 30 seconds) or data driven (example: every 100 elements). One typically distinguishes different types of windows, such as tumbling windows (no overlap), sliding windows (with overlap), and session windows (punctuated by a gap of inactivity). For further info on joins and types and windowing, see the concurrent talk “Zen and the Art of Streaming Joins” by Nick Dearden
  • 25. Various Types of Windows 25 Image Credit: Tyler Akidau, https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101
  • 26. Traditional Batch Processing For Counting By Hour 26 Image Credit: Tyler Akidau, https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101
  • 27. What’s One Problem in the Batch Based Counting Problem? We have to wait at least an hour for the data to arrive! - And it won’t all be there by then! This can lead to results that take hours to days to calculate! 🙀 27
  • 28. What’s One Problem in the Batch Based Counting Problem? This could be achieved in an order of magnitude less time in an online setting. Data in motion should stay in motion as much as possible. 😻 28
  • 29. Traditional (Hourly) Batch Processing Over Unbounded Data 29
  • 30. Modeling This Process Streaming First 30
  • 31. Why Event-Driven? - Requirements for lower latency and/or speculative results (e.g. partial results before all events for a given window have arrived). - Know about and react to system changes as they occur, not well after the fact. 31
  • 32. Exploring The Overall Problem Space Further 32 Problem: Extracting Useful Information From An Unbounded Stream of Data, which is almost always out of order, late, and can often be pretty messy. Problem: Storing and Processing Large Historical Amounts of Data Without Paying Infinite Amounts of $$$ for Excessive and Often Times Unnecessary Intermediate Caches and Other Data Stores etc.
  • 33. Traditional Batch Processing For Generating User Sessions - Shows How Hard and Messy Window Edges are for Batch Systems. 33
  • 34. Reordering and Building Online User Sessions 34
  • 35. What Does Any of This Have to do with Event Driven Applications? 35
  • 36. Traditional Applications vs Event Driven Applications 36
  • 37. What Happens in the Microservices Application when one HTTP request results into several HTTP requests to other various microservices, in the worst case? - Some services may succeed and others won’t. - Even timeouts can cause false successes. - Although the client might see the expected server side error on their end, the results may get partially updated to various state stores. - Data inconsistency can result - Adding more and more HTTP requests, even “fire and forget” still adds latency to every request. 37
  • 38. Why Event-Driven? Ensure All Stakeholders Know What Is Covered By The System - You may or may not be very surprised at how often you don’t know what you don’t know. 38
  • 39. Streaming Event-Driven Approach: Simplified View for Microservice Consumption 39
  • 40. Distributed Architecture - Parallel View 40 Source: https://flink.apache.org/introduction.html
  • 41. Data Locality For Fast Local Processing of Distributed Datasets 41
  • 42. Efficient per Key State wrt Stream Time in Flink 42 Can use KStream#Process or Punctuate APIs as well as KTable#Supress methods in Kafka Streams DSL to achieve some similar semantics for most use cases.
  • 43. Ideal Streaming First Architecture - Full Data Flow 43 Let the message bus, Kafka, be the shared source of truth! Localized views can serve the needs of different microservices
  • 44. Now for a higher abstraction 44 An event stream can be considered as the changelog of a database table.
  • 45. Streams and Tables Are Duals Deriving each user’s most recent location from a geolocation event stream 45Image Credit: Michael Noll, Confluent: https://www.michael-noll.com/blog/2018/04/05/of-stream-and-tables-in-kafka-and-stream-processing-part1/
  • 46. Streams and Tables Are Duals Deriving per user check in counts from the same stream. 46 Diagram Credit: Michael Noll, Confluent https://www.michael-noll.com/blog/2018/04/05/of-stream-and-t ables-in-kafka-and-stream-processing-part1/
  • 47. Side Note - This is all just map only SQL! (KSQL here to be exact) 47 CREATE TABLE user_location_counts_table AS SELECT user, count(*) FROM user_location_stream WINDOW TUMBLING (Size 1 hour) GROUP BY user
  • 48. Section II: Event Driven Design Patterns and Pitfalls… Finally
  • 49. 👍 Design Patterns 👍 🚨 Pitfalls - Bad Idea 🚨 49
  • 50. 👍 Most Important Event Driven Pattern for Organizational Decoupling (In My Opinion) 👍 Event Storming
  • 52. Event Storming or Domain Driven Modeling TLDR: Flesh out all events and interactions in the domain of the project and the application, as a way to model the entire overall architecture.
  • 53. Event Storming - Why? - Treat the Events as First Class Citizens in the System - Ensure All Stakeholders Know What Is Covered By The System - You’d be very surprised at how often you don’t know what you don’t know. 53
  • 54. 54 Strictly One Card at a Time Swipe Left to Nope Swipe Right to Like If both swipers like, It’s a Match! Event Storming Example - Swiping
  • 55. Let’s Model A Few of The Basic Event(s) Involved in the Classic Swiping Experience! case class Swipe(userId: String /* Swiper */, otherId: String /* Swipee */ , like: Boolean, ts: Long /* Event Time */ st: Long, /* IngestionTime*/ ... ) 55
  • 56. What Other Events / Facts Took Place Here? case class Match(user1: String, user2: String) 56
  • 57. What Other Events / Facts Took Place Here? /* What is this missing from the overall match event? */ case class Match(u1: String, u2: String) 57
  • 58. What could we have captured when we had the chance? /* Swiper / Swipee Info ! */ case class Match(u1: String, u2: String, ...) /* Provides more information as */ case class Match(swiper: String , swipee: String) 58
  • 59. Recall the three basic event types from the CQRS pattern: - Plain Old Events - Simple statements of facts that took point in time, such as a swipe event or a new match event. These are the foundation of an event-driven platform. - Commands - Calls to action - Queries - Asking a question, e.g. of an aggregate or over a specific view. 59
  • 60. Event Driven Pattern 👍 Leveraging Change Data Capture whenever Available. Especially Useful for Migrating to Kafka as a Shared Source of Truth
  • 61. Design Pattern: CDC in Action with Kafka Connect 61
  • 62. Design Pattern: CDC in Action via Existing Sources of Truth, namely DynamoDB streams 62 Users.* topic, routed dynamically to topics based on changed keys. Users service
  • 63. Benefits of DynamoDB Streams or other CDC Systems: - Can view new and old versions of the record - Easy means for notifications of delete events, to propagate deletions throughout the system for data hygiene and compliance - Can parse updates on keys only - With true change data capture, backfills of all time data isn’t as pertinent needed. 63
  • 64. Event-Driven Pitfall 🚨🚨🚨 Not considering how long older data is retained locally before its expired
  • 65. Migrational Pitfall: Not Considering Data Retention 🚨 NOTE: It’s totally possible to store data forever in Kafka. 65
  • 66. Migrational Pitfall: Not Considering Data Retention 🚨 With partitioned data by key, the application doesn’t always know when that key and its corresponding state will be expired. - Keyed session identifiers - Any aggregation (technically the window is part of the grouping) 66
  • 67. Migrational Pitfall: Not Considering Data Retention How to handle this: - Manually in the code or via state retention. - Configured in the state store builder for Kafka backed state stores in the Kafka Streams DSL. - In Flink: StreamingQueryConfig#withIdleStateRet entionTime 67
  • 68. Event-Driven Migrational Pitfall 🚨🚨🚨 Not considering the truthiness of data that’s onboarded to Kafka in your early phases
  • 69. Migrational Pitfall: Truthiness of your CDC source 69 Assume this is a Derived View and not a Source of Truth…. Can + WIll Likely diverge from whichever siloed data is considered the “Ground Truth”
  • 70. Migrational Pitfall: Truthiness of your CDC source Proper advice in the last scenario: - Don’t onboard this data into kafka if you’re using it as a shared source of truth unless it REALLY is your source of truth 🚨 70
  • 71. Migrational Pitfall: Truthiness of your CDC source 👎 Practical advice in the last scenario: - Onboard this if you need to, depending on your own application tolerance for possibly incorrect data. 👎 - Label sketchy data as such: - E.g. topic prefix namespacing: user-unreliable-* topics vs user-* topics 71
  • 72. Pitfall: Not considering the source of truth of your events! - Client side events are inherently less reliable than server side events. - Bugs introduced to app versions are harder to fix than server side bugs. - Client side events are less reliable in general, as they may come much later than server side events. 72
  • 73. Pitfall: Not considering the source of truth of your events! - What can we do about this? - How much truthiness does your application realistically require? - Can we add a corresponding server side event that would be more reliable and easier to evolve over time (e.g. new releases for services which emit events happen every day) - Extracting events from some datasource that is closer to the source of truth than other possible options. 73
  • 74. Event Driven Pattern 👍 Using Broadcasted State to share data sets across all consuming members of streaming event-driven services.
  • 75. Event Driven Pattern 👍 Performing previously fanned out microservice aggregational Joins on data once it’s in kafka.
  • 76. Design Pattern: Broadcasted State - Submits a single event stream to all consuming members in a topology. - Aka a GlobalKTable in the Kafka Streams DSL. - Single writer and many consumers of a changelog topic to build a globally replicated view. 76
  • 77. Design Pattern: Broadcasted State - Usefulness: - Small-to-Medium Sized Tables Only, as the resulting broadcasted table from the changelog stream is replicated to all tasks. - Dynamic filter configurations, to allow program managers etc to update topology rules. 77
  • 78. Design Pattern: Broadcasted State - Drawbacks: - Is not really distributed (that’s the point) - In Kafka Streams DSL, no easy way to ensure that only a single process writes to the changelog for the broadcasted table. 78
  • 79. Event Driven Pattern 👍 Shared Logging Framework
  • 80. Shared Logging Framework. 👍 Adding large amounts of event logging is sometimes initially viewed incorrectly as an extra burden to microservice developers. However, no microservice developer is realistically opposed to having to call a `log.info()` function. 80
  • 81. Shared Logging Framework. 👍 Allow developers to focus on the business logic of their existing microservices without thinking of the event stream they’re generating and any associated challenges of “Pipelines” and “Data flows” being added elsewhere. - Can be as simple as `.log` to kafka function for a JSON or Avro Payload. 81
  • 82. Event Driven Pattern 👍 Side Outputting Data - Late data - Data deemed “invalid”
  • 83. Side Outputting Data 👍 Even If You Think You Don’t Need It! - Late data may need to be replayed if there’s a long enough outage or large unforeseen watermark skew. 83
  • 84. Side Outputting Data 👍 Late data may need to be replayed if there’s a long enough outage or large unforeseen watermark skew. 84
  • 85. Side Outputting Data 👍 How To - Can be as simple as DLQ topic, - Using a side-output in Apache Flink, - Expressing the side output as a portion of your topology (e.g. for data deemed “invalid”) so that it benefits from the same semantics as the entire topology. 85
  • 86. Detour - Allowing Non-Programmers to Write and Deploy Event-Driven Streaming Analytics for Intelligence Gathering
  • 87. Generic Streaming Aggregation Abstraction 87 ● Generic streaming services/abstractions for real-time analytics ● Expressed by relational query ● “Correct by default” when interacting with sources/sinks/external system ● Monitoring + cost billing attribution per team / pod in the organization ● Aim to support custom logic while fitting into the incremental model ● Can be chained together to create an ad-hoc data flow
  • 88. Motivation for the Generic Streaming Aggregation Abstraction 88 ● Two Types of Stream Processing (from Gartner) ○ Stream Data Integration ■ Primarily cover streaming ETL ■ Integration of data source and data sinks ■ Filter and transform data (Enrich data) ■ Route data ○ Stream Analytics ■ Calculating aggregates ■ Detecting patterns to generate higher-level, more relevant summary information
  • 89. Motivating Case study: Phoenix AB Test Experiment Assignment 89 Want to know, in near real-time, how many users have been assigned to various A/B testing experiments, including the relevant subgroups etc.
  • 90. Motivating Case study: Phoenix AB Test Experiment Assignment 90 Spoiler Alert: Get Ready for A Variant of Word Count!
  • 91. Case study: AB Test Experiment Assignment - Possible Events 91 case class ExperimentAssignment( uid: String, ts: Long, experimentName: String, experimentGroup: String, experimentBucket: String) case class ExperimentCount( experimentGroup: String, experimentName: String, experimentBucket: String, countUsers: Long = 0L)
  • 92. Case study: Phoenix AB Test Experiment Assignment - Basic (Flink/Apache Calcite) SQL to Solve this Problem! 92 val sql: String = """ | SELECT experimentGroup, | , experimentName | , experimentBucket | , COUNT(uid) as countUsers | FROM | experiment_assignment | GROUP BY | 1, 2, 3, TUMBLE(ts, interval '1' HOUR) """
  • 93. Case study: Phoenix AB Test Experiment Assignment 93 Kafka source:
  • 94. Case study: Phoenix AB Test Experiment Assignment Partial Configuration File Generated From the UI or Typed Out by Lazier Developers (like me) 94 Kafka source:
  • 95. Streaming Feature Generation with Apache Spark Structured Streaming 95 ● Purely declarative API ● Unbounded table + Incremental query ● Spark SQL execution engine ● High throughput, low latency and fault-tolerant ● Unified Batch/Streaming API ● Streaming ML
  • 97. Streaming Job Manager ● Trigger a new job ● Terminate a job ● Get job status Coming soon… ● Testing environment for job queries ● Sample job output ● Exception message if job fails
  • 98. Streaming Job Manager 98 ● Query job with teamName + jobName ● Get job info with teamName + jobName + version ● Terminate job with teamName + jobName + version ● Job status ○ Active ○ Terminated ○ FallingBehind TableName: streaming-jobs-config AttributeDefinitions: - AttributeName: jobId AttributeType: S - AttributeName: teamName AttributeType: S - AttributeName: jobName AttributeType: S - AttributeName: s3Key AttributeType: S - AttributeName: clusterId AttributeType: S - AttributeName: status AttributeType: S - AttributeName: exceptions AttributeType: S KeySchema: - AttributeName: partitionKey (teamName_jobName) KeyType: HASH - AttributeName: version (timestamp when job is created) KeyType: RANGE
  • 99. Motivating Case study: Streaming - Extra Neat Features 99 - Support for custom User Defined Aggregate Functions - Many of these are derived from a UI from a set of drop down aggregations, etc. - Devs find novels ways to chain these back into the application, typically via Kafka and then DynamoDB or ES
  • 100. Many Thanks! 100 - Tao Tao, Kate Yortsos, Shan Jing, Eric Lane, Winston Lee, Kristin Collins Jackson, Mary Dittmar, Cooper Erlendsson, Yating Zhu, Chris Dower, Matthew Orlove, Aijia Liu, Anne Ying-An Lai, Wei Xu, Jinqiang Han, Michael Allman, Jessica Hickey, Jingwei Wu, Yibing Gu, and so many others!
  • 101. Further References 101 - Streaming Systems Book - Tyler Akidau, http://streamingsystems.net/ O’Reilly and other DRM-free platforms - The Data Dichotomy: Rethinking the Way We Treat Data and Services - Ben Stopford, https://www.confluent.io/blog/data-dichotomy-rethinking-the-wa y-we-treat-data-and-services/ - High Performance Spark - Holden Karau + Rachel Warren via O’Reilly and other DRM-free platforms - Not necessarily streaming focused, but a great read for thinking about data processing with spark at scale - https://www.oreilly.com/library/view/high-performance- spark/9781491943199/