Powering consistent, high-throughput,
real-time distributed calculation engines
using Kafka Streams
Kamlesh Shah
Intro
Kamlesh Shah – Technical Architect
Morgan Stanley
Kafka Summit 2023, London
KAFKA SUMMIT 2023 3
Agenda
Topics Covered
§ Problem statement
§ Distributed processing
• Kafka for data ingestion & distribution
• Stream processing
§ Example distributed application
• Kafka stream & State store usage
• Stateful & Cascading Calculators
• Idempotency
• Joining static & not-so-static data
• Eventual Consistency with Saga
§ State-Enriched Events & Decoupling of Services
§ Time windows and Replay
§ Architecture for Real-time Ticking Stateful Views
§ Summary
Problem statement
[Diagram: Transactions, Market Data, Legacy Systems and Business Events feed a set of calculators (Calc..) covering Risk, Regulation, Fraud, Screen and Limits]
§ Data of different types and domains
§ Internal & External
§ Distributed processing
§ Coupling of services
§ Hard to change
§ Infrastructure complexity
§ Scaling
§ Fault identification
§ Calculations
§ Timely
§ Accurate & consistent
§ Realtime User views
§ Critical decision making
Kafka Data Ingestion & Distribution
Distributed Processing with Kafka – a stream processing use case
[Diagram: Input Streams (Transactions, Market Data, Legacy Systems, Business Events) connect over Kafka streams to Stream Processing, Analytics & Reporting, Data Store, Cache and API]
Example distributed application
Business case overview
[Diagram: Clients 1 to N, Accounts 1 to N, a Limit, and a series of Transactions]
Account      Now    Time 2   …   Time N
Account 1
Account 2
…
Account N
Client       Now    Time 2   …   Time N
Client 1     nnn    xxx
Client 2
…
Client N
Grid labels: Current Balance, Future Balance
Example distributed application
Transactions to Account Balances
Transaction: Transaction Id = 1, Account Id = aa1, Debit or Credit = Debit, Amount = 100, Date = T+1
Transaction: Transaction Id = 2, Account Id = aa1, Debit or Credit = Debit, Amount = 200, Date = T+1
Transaction: Transaction Id = 3, Account Id = aa1, Debit or Credit = Debit, Amount = 100, Date = T+2
..
Account Balance Service
Account Balance: Account Id = aa1, Projected Balance = 200, Date = T+1
Account Balance: Account Id = aa1, Projected Balance = 100, Date = T+2
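A minimal plain-Java sketch of this roll-up, with no Kafka dependency. It assumes an opening balance of 500 for account aa1 (the slide does not state one) and that a debit reduces the balance; under those two assumptions the projected balances come out as 200 for T+1 and 100 for T+2, as on the slide. The `Txn` record and the `project` method are illustrative names, not part of the talk.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ProjectedBalance {
    // Hypothetical transaction shape; day is the settlement offset (1 means T+1).
    record Txn(int id, String accountId, boolean debit, long amount, int day) {}

    // Returns day -> projected balance: the opening balance plus every movement up to that day.
    static Map<Integer, Long> project(long opening, List<Txn> txns) {
        TreeMap<Integer, Long> netByDay = new TreeMap<>();
        for (Txn t : txns) {
            long signed = t.debit() ? -t.amount() : t.amount();
            netByDay.merge(t.day(), signed, Long::sum);
        }
        Map<Integer, Long> projected = new LinkedHashMap<>();
        long running = opening;
        for (Map.Entry<Integer, Long> e : netByDay.entrySet()) {
            running += e.getValue();
            projected.put(e.getKey(), running);
        }
        return projected;
    }

    public static void main(String[] args) {
        List<Txn> txns = List.of(
            new Txn(1, "aa1", true, 100, 1),
            new Txn(2, "aa1", true, 200, 1),
            new Txn(3, "aa1", true, 100, 2));
        // Opening balance of 500 is an assumption made for this example.
        System.out.println(project(500, txns)); // {1=200, 2=100}
    }
}
```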
Stateful Account Balance Calculator
[Diagram: streaming sources -> Cash Transactions Topic -> Account Balance Service (Consumer 1: Stateful Processor with State Store, reading a Kafka stream) -> Account Balances Topic]
Transaction: Transaction Id, Account Id, .., Debit or Credit, Date, Amount, ..
Account Balance: Account ID, .., .., Date, Projected Balance, ..
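Account Balance Calculator – Deep Dive
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KGroupedTable;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;

final StreamsBuilder builder = new StreamsBuilder();

// Read the transactions topic as a table keyed by transaction ID.
final KTable<TransactionId, Transaction> transactionTable =
    builder.table(transactionsTopic, Materialized…);

// Re-key each transaction by its account ID.
final KGroupedTable<AccountId, Transaction> transactionByAccount =
    transactionTable.groupBy(
        (k, v) -> KeyValue.pair(v.accountId, v),
        Grouped.with(…));

// Aggregate per account: the adder applies a new or updated transaction,
// the subtractor removes the previous version of that transaction.
final KTable<AccountId, AccountBalance> accountBalanceKTable =
    transactionByAccount.aggregate(
        () -> createNewAccountBalance(),
        (key, value, aggregate) -> addAccountBalance(value, aggregate),
        (key, value, aggregate) -> removeAccountBalance(value, aggregate),
        Materialized.with(Serdes.String(), new JSONSerde<>()));

accountBalanceKTable.toStream().to(accountBalancesTopic);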
Stateful Calculations with Kafka Stream – Scaling Consumers
Scale Number of Consumers <-> Number of Partitions
[Diagram: streaming sources publish to the Cash Transactions Topic (Partition 1 … Partition N, by Partition Key); Account Balance Service Consumer 1 … Consumer N, each a Stateful Processor with its own State Store, read Kafka streams and write to the Account Balances Topic (Partition 1 … Partition N, by Partition Key)]
Transaction: Transaction Id, Account Id, .., Debit or Credit, Amount, ..
Account Balance: Account ID, .., .., Projected Balance, ..
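The scaling model rests on every record with the same partition key landing in the same partition, so that one consumer's state store holds all the state for that key. The sketch below illustrates that property in plain Java. It uses `String.hashCode()` with `Math.floorMod` as a stand-in; Kafka's default partitioner hashes the serialized key with murmur2 instead, so the actual partition numbers will differ. `PartitionSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class PartitionSketch {
    // Stand-in for the producer's partitioner: the same key always maps to the same partition.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Groups the keys of a sequence of records by the partition they would land in.
    static Map<Integer, List<String>> assign(List<String> keys, int numPartitions) {
        Map<Integer, List<String>> byPartition = new TreeMap<>();
        for (String key : keys) {
            byPartition
                .computeIfAbsent(partitionFor(key, numPartitions), p -> new ArrayList<>())
                .add(key);
        }
        return byPartition;
    }

    public static void main(String[] args) {
        // Account IDs used as partition keys; both "aa1" records end up together.
        System.out.println(assign(List.of("aa1", "aa2", "aa1", "aa3", "aa2"), 3));
    }
}
```

Because a partition is read by at most one consumer in a group, adding consumers beyond the partition count adds no parallelism, which is the point of the "Consumers <-> Partitions" label.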
Account Balances to Client Balances
Account Balance: Account Id = aa1, Projected Balance = 200, Date = T+1
Account Balance: Account Id = aa1, Projected Balance = 100, Date = T+2
Client To Account Map: Client Id = 123, Account List [aa1, aa2…]
Client Balance Service
Client Balance: Client Id = 123, Projected Balance = 200, Date = T+1
Client Balance: Client Id = 123, Projected Balance = 100, Date = T+2
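A plain-Java sketch of the client roll-up, assuming a client's projected balance for a date is the sum of the projected balances of its accounts for that date. With only aa1 carrying balances, client 123 gets 200 for T+1 and 100 for T+2, as on the slide. `ClientRollup` and `clientBalance` are illustrative names.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ClientRollup {
    // accountBalances maps accountId -> (day -> projected balance); day 1 means T+1.
    static Map<Integer, Long> clientBalance(List<String> accounts,
                                            Map<String, Map<Integer, Long>> accountBalances) {
        Map<Integer, Long> total = new TreeMap<>();
        for (String account : accounts) {
            // Accounts with no balance yet contribute nothing.
            accountBalances.getOrDefault(account, Map.of())
                .forEach((day, balance) -> total.merge(day, balance, Long::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, List<String>> clientToAccounts = Map.of("123", List.of("aa1", "aa2"));
        Map<String, Map<Integer, Long>> accountBalances = Map.of("aa1", Map.of(1, 200L, 2, 100L));
        System.out.println(clientBalance(clientToAccounts.get("123"), accountBalances)); // {1=200, 2=100}
    }
}
```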
Cascading Calculators
Client Balances calculation with reference data
[Diagram: Cash Transactions Topic -> Account Balance Service -> Account Balances Topic -> Client Balance Service -> Client Balances Topic; the Client Balance Service is also connected to an API, the Client to Account Reference and a Client to Account Cache]
Account Balance: Account ID, .., .., Projected Balance, ..
Client Balance: Client ID, .., .., Projected Balance, ..
Cascading Calculators
Client Limits using Kafka Join
[Diagram: Cash Transactions Topic -> Account Balance Service -> Account Balances Topic -> Client Balance Service (with API, Client to Account Reference, Client to Account Cache) -> Client Balances Topic; streaming sources -> Client Limits Topic; Client Balances Topic and Client Limits Topic -> Client Limit Monitor Service -> Limit Breach Topic]
Client Balance: Client ID, .., .., Projected Balance, ..
Client Limit: Client ID, .., .., Limit, ..
Client Limit Breach: Client ID, Projected Balance, Limit, Breach Y/N..
Cascading Calculators
Client Limits – Breach
[Diagram: Client Balances Topic and Client Limits Topic -> Kafka Join in the Client Limit Monitor Service -> Limit Breach Topic, each hop a Kafka stream]
Client Balance: Client ID, .., .., Projected Balance, ..
Client Limit: Client ID, .., .., Limit, ..
Client Limit Breach: Client ID, Projected Balance, Limit, Breach Y/N..
Client Balances Topic:
Time   Client   Projected Balance
T0     C1       190
T1     C1       190
…      …
Tn     C2       150
Client Limits Topic:
Time   Client   Limit
T0     C1       200
T1     C1       180
…      …
Tn     C2       200
Limit Breach Topic:
Time   Client   Projected Balance   Limit   Breach Y/N
T0     C1       190                 200     N
T1     C1       190                 180     Y
…      …
Tn     C2       150                 200     N
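The join can be modelled in plain Java as two "latest value per key" tables that re-emit a joined row whenever either side changes. This is only a sketch of table-to-table join behaviour, not the Kafka Streams API. It assumes inner-join semantics, assumes a breach means the projected balance exceeds the limit (consistent with the T0 and T1 rows), and assumes the balance arrives before the limit within T0. `LimitJoinSketch` and `Breach` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LimitJoinSketch {
    record Breach(String clientId, long balance, long limit, boolean breached) {}

    private final Map<String, Long> balances = new HashMap<>(); // latest balance per client
    private final Map<String, Long> limits = new HashMap<>();   // latest limit per client
    final List<Breach> output = new ArrayList<>();              // stands in for the Limit Breach Topic

    void onBalance(String clientId, long balance) {
        balances.put(clientId, balance);
        emit(clientId);
    }

    void onLimit(String clientId, long limit) {
        limits.put(clientId, limit);
        emit(clientId);
    }

    // Inner-join behaviour: emit only once both sides hold a value for the key.
    private void emit(String clientId) {
        Long balance = balances.get(clientId);
        Long limit = limits.get(clientId);
        if (balance != null && limit != null) {
            output.add(new Breach(clientId, balance, limit, balance > limit));
        }
    }

    public static void main(String[] args) {
        LimitJoinSketch join = new LimitJoinSketch();
        join.onBalance("C1", 190); // no output yet: C1 has no limit
        join.onLimit("C1", 200);   // 190 against 200: no breach
        join.onLimit("C1", 180);   // limit lowered, 190 against 180: breach
        join.onBalance("C2", 150);
        join.onLimit("C2", 200);   // 150 against 200: no breach
        join.output.forEach(System.out::println);
    }
}
```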
Cascading Calculations – Kafka Join
Kafka Join Choice
Matching Key => Client ID
KTable Client Balance (Client Balances Topic)
KTable Client Limit (Client Limit Topic)
Time | Client ID | Client Balance | Limit | Balance, Breach
1 1 10 15 10,Y
2 2 -10 -10,
3 2 -20 15 -20,Y
4 2 30 15 30,N
5 2 40 30,Y
6 3 100
7 3 40 40,Y
..
n
Idempotency
Handling duplicate events
[Diagram: Cash Transactions Topic -> Account Balance Service (Kafka stream) -> Account Balances Topic]
Transaction: Transaction Id = 1, Account Id = A, .., Debit or Credit = C, Amount = 10, ..
Transaction: Transaction Id = 2, Account Id = A, .., Debit or Credit = D, Amount = 25, ..
Transaction: Transaction Id = 1, Account Id = A, .., Debit or Credit = C, Amount = 10, ..
Account Balance: Account ID = A, .., .., Projected Balance = 10
Account Balance: Account ID = A, .., .., Projected Balance = 35
Deep Dive
Handling duplicate events
[Flowchart: two Transactions with Transaction Id = 1 arrive. Decision: does the TransactionID exist in the State Store (SS)?
  No  -> Add Balance (Amount = 100): Account Bal = 100
  Yes -> Remove Balance (Amount = 100): Account Bal = 0, then Add Balance: Account Bal = 100]
final KTable<AccountId, AccountBalance> accountBalanceKTable = transactionByAccount.aggregate(()->
createNewAccountBalance(),
(key, value, aggregate) -> addAccountBalance(value, aggregate),
(key, value, aggregate) -> removeAccountBalance(value, aggregate),
Materialized.with(Serdes.String(), new JSONSerde<>())
);
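The aggregate is idempotent because of how a grouped-table aggregation treats an update: when a key that already exists in the table arrives again, the subtractor runs with the old value and the adder then runs with the new one. The plain-Java model below mimics that sequence with a map standing in for the state store; it is a sketch of the behaviour, not Kafka Streams code, and `IdempotentBalance` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

final class IdempotentBalance {
    // Stand-in for the state store: transactionId -> amount last applied.
    private final Map<Integer, Long> store = new HashMap<>();
    private long balance;

    // Applies a transaction and returns the resulting account balance.
    long apply(int transactionId, long amount) {
        Long previous = store.put(transactionId, amount);
        if (previous != null) {
            balance -= previous; // Remove Balance: undo the earlier copy
        }
        balance += amount;       // Add Balance: apply the current copy
        return balance;
    }

    public static void main(String[] args) {
        IdempotentBalance account = new IdempotentBalance();
        System.out.println(account.apply(1, 100)); // 100: first arrival, add only
        System.out.println(account.apply(1, 100)); // 100: duplicate, remove then add
    }
}
```

The same path also handles a corrected amount for an existing transaction ID: the old amount is removed before the new one is added.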
Eventual Consistency
Saga – Rollback transactions
[Diagram: Cash Transactions -> Account Balance Service -> Account Balances -> Client Balance Service -> Client Balances]
Transaction: Transaction ID, Amount = 10
Update Account Balance: Account ID, Transaction ID, Balance += 10
Update Client Balance: Client ID, Account ID, Transaction ID (Error)
Cancel Transaction: Transaction ID, Cancel = Y
Rollback Account Balance: Account ID, Transaction ID, Balance -= 10
Update Client Balance: Client ID, Account ID, Transaction ID (Success)
Eventually Consistent
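A plain-Java sketch of the compensation step in this saga. For brevity both services live in one process and the failure is simulated with a flag; in the architecture described here the cancel would travel as a Transaction event with Cancel = Y rather than as a direct method call. `SagaSketch` and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

final class SagaSketch {
    final Map<String, Long> accountBalances = new HashMap<>();
    final Map<String, Long> clientBalances = new HashMap<>();

    // Step 1 (Account Balance Service): apply the amount to the account.
    void updateAccount(String accountId, long amount) {
        accountBalances.merge(accountId, amount, Long::sum);
    }

    // Step 2 (Client Balance Service): may fail; the failure is simulated by the flag.
    void updateClient(String clientId, long amount, boolean fail) {
        if (fail) {
            throw new IllegalStateException("client balance update failed");
        }
        clientBalances.merge(clientId, amount, Long::sum);
    }

    // Returns true if both steps succeeded, false if step 1 was rolled back.
    boolean process(String clientId, String accountId, long amount, boolean failClientStep) {
        updateAccount(accountId, amount);
        try {
            updateClient(clientId, amount, failClientStep);
            return true;
        } catch (IllegalStateException e) {
            updateAccount(accountId, -amount); // compensating action: Balance -= amount
            return false;
        }
    }

    public static void main(String[] args) {
        SagaSketch saga = new SagaSketch();
        System.out.println(saga.process("123", "aa1", 10, true));  // false: rolled back
        System.out.println(saga.accountBalances);                  // {aa1=0}
        System.out.println(saga.process("123", "aa1", 10, false)); // true
        System.out.println(saga.accountBalances + " " + saga.clientBalances); // {aa1=10} {123=10}
    }
}
```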
Handling Cancellations/Rollbacks
[Diagram: Cash Transactions Topic -> Account Balance Service (Kafka stream) -> Account Balances Topic]
Transaction: Instruction Id = 1, Account Id = A, .., Debit or Credit = C, Amount = 10, ..
Transaction: Instruction Id = 2, Account Id = A, .., Debit or Credit = D, Amount = 25, ..
Transaction: Instruction Id = 1, Account Id = A, .., Debit or Credit = C, Amount = 10, Cancel = Yes, ..
Account Balance: Account ID = A, .., .., Projected Balance = 10
Account Balance: Account ID = A, .., .., Projected Balance = 35
Account Balance: Account ID = A, .., .., Projected Balance = 25
Deep Dive
Handling Cancels
// Per-account aggregate; the adder and the subtractor are separate Aggregator classes.
KTable<AccountId, AccountBalance> accBalAggregate =
    groupedTable.aggregate(
        AccountBalance::getNewInstance,
        new AddBalanceAggregator(),
        new RemoveBalanceAggregator(),
        Materialized…);
[Flowchart: a Transaction with Transaction Id = 1 arrives, followed by the same Transaction Id with Cancel = Yes. Decision: does the TransactionID exist in the State Store (SS)?
  No  -> Add Balance (Amount = 100): Account Bal = 100
  Yes -> Remove Balance (Amount = 100): Account Bal = 0, then Add Balance (if Cancel = Yes, then do nothing): Account Bal = 0]
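A plain-Java model of the cancel path, with a map standing in for the state store. The adder skips any transaction flagged as cancelled, so a cancel nets out to "remove the old amount, add nothing". Skipping the removal when the stored copy was itself a cancel is an assumption added here to keep the two steps symmetric; the slide does not show that case. `CancelAwareBalance` and `Txn` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

final class CancelAwareBalance {
    record Txn(int id, long amount, boolean cancel) {}

    private final Map<Integer, Txn> store = new HashMap<>(); // transactionId -> last version seen
    private long balance;

    // Applies one version of a transaction and returns the resulting account balance.
    long apply(Txn incoming) {
        Txn previous = store.put(incoming.id(), incoming);
        if (previous != null && !previous.cancel()) {
            balance -= previous.amount(); // Remove Balance: undo the earlier version
        }
        if (!incoming.cancel()) {
            balance += incoming.amount(); // Add Balance: skipped when Cancel = Yes
        }
        return balance;
    }

    public static void main(String[] args) {
        CancelAwareBalance account = new CancelAwareBalance();
        System.out.println(account.apply(new Txn(1, 100, false))); // 100
        System.out.println(account.apply(new Txn(1, 100, true)));  // 0
    }
}
```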
Decouple Services
Event Carried State Transfer
§ Enrich the events with State
§ Services receive all the details for calc from events
§ Reduced callbacks between services
§ Scalable/reduced bottlenecks
§ Improved resilience – Services function independently, even when other services are briefly down
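To make the contrast concrete, the sketch below compares a thin event, which forces the consumer to call another service for the limit, with a state-enriched event that carries the limit itself. All names (`ClientBalanceChanged`, `EnrichedClientBalance`, `LimitService`) are hypothetical, and the breach rule (balance above limit) is an assumption for illustration.

```java
final class EnrichedEventSketch {
    // Thin event: the consumer has to fetch the limit from elsewhere.
    record ClientBalanceChanged(String clientId, long projectedBalance) {}

    // State-enriched event: carries everything the calculation needs.
    record EnrichedClientBalance(String clientId, long projectedBalance, long limit) {}

    // Hypothetical remote dependency that the thin event requires.
    interface LimitService {
        long limitFor(String clientId);
    }

    static boolean breachWithCallback(ClientBalanceChanged event, LimitService limits) {
        return event.projectedBalance() > limits.limitFor(event.clientId());
    }

    static boolean breachFromEvent(EnrichedClientBalance event) {
        return event.projectedBalance() > event.limit();
    }

    public static void main(String[] args) {
        LimitService down = clientId -> {
            throw new IllegalStateException("limit service unavailable");
        };
        try {
            breachWithCallback(new ClientBalanceChanged("C1", 190), down);
        } catch (IllegalStateException e) {
            System.out.println("callback path failed: " + e.getMessage());
        }
        // The enriched event is evaluated without any call to another service.
        System.out.println(breachFromEvent(new EnrichedClientBalance("C1", 190, 180))); // true
    }
}
```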
Where is the state?
[Diagram: two views of Service 1 to Service 6]
Message Retention
✓ Retention Period = 3 days
[Diagram: time axis by arrival day. Messages (d1, d2, … dn) that arrive on Day 1 remain on the topic through Day 2 and Day 3 and expire on Day 4]
Message Retention
Event Regeneration
[Diagram: messages (d1 … dn) arrive on Day 1, Day 2 and Day 3; on Day 4 an Event Regeneration Service reads from the Data Store and emits events (d4 … dn) again]
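A plain-Java sketch of retention and regeneration at day granularity, which is a simplification of how a broker actually ages out data. It assumes a 3-day retention period as on the earlier slide, and it assumes the regeneration service re-publishes every event that has expired from the topic; the slides do not say how events are selected for regeneration. `RegenerationSketch` and `Event` are hypothetical names.

```java
import java.util.List;

final class RegenerationSketch {
    record Event(String id, int arrivalDay) {}

    static final int RETENTION_DAYS = 3;

    // An event that arrived on day d is readable on days d .. d + RETENTION_DAYS - 1.
    static boolean retained(Event event, int today) {
        return today < event.arrivalDay() + RETENTION_DAYS;
    }

    // Events still available on the topic on the given day.
    static List<Event> onTopic(List<Event> published, int today) {
        return published.stream().filter(e -> retained(e, today)).toList();
    }

    // Re-publishes expired events from the data store, stamped with today's arrival day.
    static List<Event> regenerate(List<Event> dataStore, int today) {
        return dataStore.stream()
            .filter(e -> !retained(e, today))
            .map(e -> new Event(e.id(), today))
            .toList();
    }

    public static void main(String[] args) {
        List<Event> all = List.of(new Event("d1", 1), new Event("d2", 2), new Event("d3", 3));
        System.out.println(onTopic(all, 3));    // all three events are still retained
        System.out.println(onTopic(all, 4));    // d1 has expired on Day 4
        System.out.println(regenerate(all, 4)); // d1 is emitted again with arrival day 4
    }
}
```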
User Views: Real-time & historical views
[Diagram: Service 1, Service 2 … Service N and Account Balance Service instances connect to a Kafka Cluster (Topic 1, Topic 2 … Topic N). A Read Model <Cache> ingests data as <K, V(t)> pairs (V(t1), V(t2), V(t3) …) and holds the last point-in-time value; User Views connect over Web Socket. Long term Storage is reached through an API.]
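The read model's "last point in time value" behaviour can be sketched as a cache that keeps only the newest value per key and reports whether an incoming record changed the view, which is the moment a push to subscribed user views would happen. This is a plain-Java illustration under the assumption that each record carries a timestamp; `LastValueCache` and its methods are hypothetical names, and the Web Socket push itself is left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class LastValueCache<K, V> {
    record Versioned<T>(T value, long timestamp) {}

    private final Map<K, Versioned<V>> latest = new HashMap<>();

    // Stores the value unless a newer one is already held; returns true if the view changed.
    synchronized boolean ingest(K key, V value, long timestamp) {
        Versioned<V> current = latest.get(key);
        if (current != null && current.timestamp() > timestamp) {
            return false; // stale record, for example one delivered out of order
        }
        latest.put(key, new Versioned<>(value, timestamp));
        return true;
    }

    // Returns the last point-in-time value for the key, if any.
    synchronized Optional<V> read(K key) {
        return Optional.ofNullable(latest.get(key)).map(Versioned::value);
    }

    public static void main(String[] args) {
        LastValueCache<String, Long> cache = new LastValueCache<>();
        cache.ingest("aa1", 100L, 1);
        cache.ingest("aa1", 120L, 3);
        cache.ingest("aa1", 110L, 2); // ignored: older than the value already held
        System.out.println(cache.read("aa1").orElseThrow()); // 120
    }
}
```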
Summary
Takeaways
§ Managing consistency and state across distributed systems is critical
§ Kafka streamlines the distribution of large volumes of data between services in near real-time
§ Kafka Streams helps build scalable microservices and implement scalable distributed event-based architectures
§ Blending near-term and long-term data infrastructure with Kafka provides the ability to build interactive stateful user views
Questions?

Powering Consistent, High-throughput, Real-time Distributed Calculation Engines Using Kafka Streams with Kamlesh Shah

  • 1. Powering consistent, high-throughput, real-time distributed calculation engines using Kafka Streams Kamlesh Shah
  • 2. Intro Kamlesh Shah – Technical Architect Morgan Stanley Kafka Summit 2023, London
  • 3. KAFKA SUMMIT 2023 3 Agenda Topics Covered § Problem statement § Distributed processing • Kafka for data ingestion & distribution • Stream processing § Example distributed application • Kafka stream & State store usage • Stateful & Cascading Calculators • Idempotency • Joining static & not-so-static data • Eventual Consistency with Saga § State-Enriched Events & Decoupling of Services § Time windows and Replay § Architecture for Real-time Ticking Stateful Views § Summary
  • 4. KAFKA SUMMIT 2023 4 Problem statement Transactions Market Data Legacy Systems Business Events Calc.. Risk Calc.. Calc.. Calc.. Regulation Fraud Screen Limits § Data of different types and domains § Internal & External § Distributed processing § Coupling of services § Hard to change § Infrastructure complexity § Scaling § Fault identification § Calculations § Timely § Accurate & consistent § Realtime User views § Critical decision making
  • 5. KAFKA SUMMIT 2023 5 Kafka Data Ingestion & Distribution Transactions Market Data Legacy Systems Business Events Input Streams Stream Processing Kafka stream Analytics & Reporting Data Store Cache API Kafka stream Kafka stream Kafka stream Distributed Processing with Kafka – a stream processing use case
  • 6. KAFKA SUMMIT 2023 6 Example distributed application Client 1 Client N Client 2 Account 1 Account 2 Account N Limit Transaction Transaction Transaction Transaction Transaction Account Now Time 2 … Time N Account 1 Account 2 … Account N Client Now Time 2 … Time N Client 1 nnn xxx Client 2 … Client N Future Balance Business case overview Current Balance
  • 7. KAFKA SUMMIT 2023 7 Example distributed application Transaction Transaction Id = 1 Account Id = aa1 Debit or Credit = Debit Amount = 100 Date = T+1 Transaction Transaction Id = 2 Account Id = aa1 Debit or Credit = Debit Amount = 200 Date = T+1 Transaction Transaction Id = 3 Account Id = aa1 Debit or Credit = Debit Amount = 100 Date = T+2 .. Account Balance Service Account Balance Account Id = aa1 Projected Balance = 200 Date = T+1 Account Balance Account Id = aa1 Projected Balance = 100 Date = T+2 Transactions to Account Balances
  • 8. KAFKA SUMMIT 2023 8 Account Balance Service Cash Transactions Topic Account Balances Topic .. .. .. .. Account Balance Account ID .. .. Date Projected Balance .. Transaction Transaction Id Account Id .. Debit or Credit Date Amount .. .. .. .. .. Consumer 1 Stateful Processor State Store streaming sources Kafka stream Stateful Account Balance Calculator
  • 9. KAFKA SUMMIT 2023 9 Account Balance Calculator – Deep Dive final StreamsBuilder builder = new StreamsBuilder(); final KTable<TransactionId, Transaction> transactionTable = builder.table(transactionsTopic, Materialized…) ); final KGroupedTable<AccountId, Transaction> transactionByAccount = transactionTable.groupBy((k,v) -> KeyValue.pair(v.accountId, v), Grouped.with(…)); final KTable<AccountId, AccountBalance> accountBalanceKTable = transactionByAccount.aggregate(()-> createNewAccountBalance(), (key, value, aggregate) -> addAccountBalance(value, aggregate), (key, value, aggregate) -> removeAccountBalance(value, aggregate), Materialized.with(Serdes.String(), new JSONSerde<>()) ); accountBalanceKTable.toStream().to(accountBalancesTopic);
  • 10. KAFKA SUMMIT 2023 10 Cash Transactions Topic Account Balances Topic Account Balance Account ID .. .. Projected Balance .. Transaction Transaction Id Account Id .. Debit or Credit Amount .. streaming sources Account Balance Service Consumer N Stateful Processor State Store Account Balance Service Consumer 1 Stateful Processor State Store .. .. .. .. .. .. .. .. Partition 1 Partition N .. .. .. .. .. .. .. .. Partition 1 Partition N Kafka stream Kafka stream Scale Number of Consumers <-> Number of Partitions Stateful Calculations with Kafka Stream – Scaling Consumers Partition Key Partition Key
  • 11. KAFKA SUMMIT 2023 11 Client Balance Service Account Balance Account Id = aa1 Projected Balance = 200 Date = T+1 Account Balance Account Id = aa1 Projected Balance = 100 Date = T+2 Client To Account Map Client Id = 123 Account List [aa1, aa2…] Client Balance Client Id = 123 Projected Balance = 200 Date = T+1 Client Balance Client Id = 123 Projected Balance = 100 Date = T+2 Account Balances to Client Balances
  • 12. KAFKA SUMMIT 2023 12 Cash Transactions Topic Account Balance Service Account Balances Topic Client Balance Service Client Balances Topic API Client to Account Reference Client to Account Cache .. .. .. .. .. .. .. .. Account Balance Account ID .. .. Projected Balance .. Client Balance Client ID .. .. Projected Balance .. Client Balances calculation with reference data Cascading Calculators
  • 13. Cascading Calculators – Client Limits using Kafka Join. Pipeline: Cash Transactions Topic -> Account Balance Service -> Account Balances Topic -> Client Balance Service (with API, Client to Account Reference, Client to Account Cache) -> Client Balances Topic (Client Balance: Client ID, .., Projected Balance, ..). Streaming sources feed the Client Limits Topic (Client Limit: Client ID, .., Limit, ..). The Client Limit Monitor Service consumes the Client Balances Topic and the Client Limits Topic and publishes to the Limit Breach Topic (Client Limit Breach: Client ID, Projected Balance, Limit, Breach Y/N, ..).
  • 14. Cascading Calculators – Client Limits – Breach. The Client Limit Monitor Service applies a Kafka Join to the Kafka streams of the Client Balances Topic (Client Balance: Client ID, .., Projected Balance, ..) and the Client Limits Topic (Client Limit: Client ID, .., Limit, ..), and publishes to the Limit Breach Topic (Client Limit Breach: Client ID, Projected Balance, Limit, Breach Y/N, ..).
      Client Balances (Time, Client, Projected Balance): T0 C1 190; T1 C1 190; …; Tn C2 150
      Client Limits (Time, Client, Limit): T0 C1 200; T1 C1 180; …; Tn C2 200
      Limit Breach (Time, Client, Projected Balance, Limit, Breach Y/N): T0 C1 190 200 N; T1 C1 190 180 Y; …; Tn C2 150 200 N
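The join on this slide can be modelled in plain Java as two "latest value per client" tables that are re-evaluated whenever either side changes. This is a sketch under two assumptions: a result is emitted only once both a balance and a limit are known for the client (inner-join behaviour), and a breach means the projected balance is above the limit, which is the rule that reproduces the three rows shown on the slide. All names are hypothetical and no Kafka API is used.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class LimitBreachJoinSketch {
    private final Map<String, Long> latestBalance = new HashMap<>();
    private final Map<String, Long> latestLimit = new HashMap<>();

    // An update on the balance side re-evaluates the join for that client.
    Optional<String> onBalance(String clientId, long projectedBalance) {
        latestBalance.put(clientId, projectedBalance);
        return evaluate(clientId);
    }

    // An update on the limit side re-evaluates the join for that client.
    Optional<String> onLimit(String clientId, long limit) {
        latestLimit.put(clientId, limit);
        return evaluate(clientId);
    }

    private Optional<String> evaluate(String clientId) {
        Long balance = latestBalance.get(clientId);
        Long limit = latestLimit.get(clientId);
        if (balance == null || limit == null) {
            return Optional.empty(); // one side still missing: nothing to emit
        }
        boolean breach = balance > limit; // assumed breach rule
        return Optional.of(clientId + " " + balance + " " + limit + " " + (breach ? "Y" : "N"));
    }

    public static void main(String[] args) {
        LimitBreachJoinSketch join = new LimitBreachJoinSketch();
        join.onBalance("C1", 190);
        System.out.println(join.onLimit("C1", 200).get()); // C1 190 200 N
        System.out.println(join.onLimit("C1", 180).get()); // C1 190 180 Y
    }
}
```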
  • 15. Cascading Calculations – Kafka Join. Kafka Join Choice: Matching Key => Client ID. Client Balances Topic -> KTable Client Balance; Client Limit Topic -> KTable Client Limit.
      Columns: Time, Client ID, Client Balance, Limit, (Balance, Breach); some cells are blank
      1 1 10 15 10,Y
      2 2 -10 -10,
      3 2 -20 15 -20,Y
      4 2 30 15 30,N
      5 2 40 30,Y
      6 3 100
      7 3 40 40,Y
      .. n
  • 16. Idempotency – Handling duplicate events. The Account Balance Service (Kafka stream) reads the Cash Transactions Topic and writes to the Account Balances Topic. Input events: Transaction (Transaction Id = 1, Account Id = A, .., Debit or Credit = C, Amount = 10, ..); Transaction (Transaction Id = 2, Account Id = A, .., Debit or Credit = D, Amount = 25, ..); Transaction (Transaction Id = 1, Account Id = A, .., Debit or Credit = C, Amount = 10, ..). Output events: Account Balance (Account ID = A, .., Projected Balance = 10); Account Balance (Account ID = A, .., Projected Balance = 35).
  • 17. Deep Dive – State Store – Handling duplicate events. Two Transaction events with Transaction Id = 1 arrive. Decision: does the TransactionID exist in the state store (SS)? No: Add Balance (Amount = 100, Account Bal = 100). Yes: Remove Balance (Amount = 100, Account Bal = 0), then Add Balance (Account Bal = 100).
      final KTable<AccountId, AccountBalance> accountBalanceKTable =
          transactionByAccount.aggregate(
              () -> createNewAccountBalance(),
              (key, value, aggregate) -> addAccountBalance(value, aggregate),
              (key, value, aggregate) -> removeAccountBalance(value, aggregate),
              Materialized.with(Serdes.String(), new JSONSerde<>()));
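The remove-then-add flow on this slide can be traced with a plain-Java model of a single account. It is a sketch, not the Kafka Streams implementation: a `HashMap` stands in for the state store, the amount is a signed number, and `IdempotentBalanceSketch` and `onTransaction` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

final class IdempotentBalanceSketch {
    // Stand-in for the state store: last seen amount per transaction ID.
    private final Map<String, Long> amountByTransactionId = new HashMap<>();
    private long accountBalance = 0;

    // Applies a transaction and returns the resulting account balance.
    long onTransaction(String transactionId, long amount) {
        Long previous = amountByTransactionId.put(transactionId, amount);
        if (previous != null) {
            accountBalance -= previous; // "Remove Balance": undo the earlier version
        }
        accountBalance += amount;       // "Add Balance": apply the current version
        return accountBalance;
    }

    public static void main(String[] args) {
        IdempotentBalanceSketch account = new IdempotentBalanceSketch();
        System.out.println(account.onTransaction("1", 100)); // 100
        System.out.println(account.onTransaction("1", 100)); // 100, duplicate has no net effect
    }
}
```

Replaying the same transaction ID any number of times leaves the balance unchanged, which is the property the adder and subtractor pair gives the aggregate.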
  • 18. Eventual Consistency – Saga – Rollback transactions. Flow: Cash Transactions -> Account Balance Service -> Account Balances -> Client Balance Service -> Client Balances. Steps: Transaction (Transaction ID, Amount = 10) -> Update Account Balance (Account Balance: Account ID, Transaction ID, Balance += 10) -> Update Client Balance (Client Balance: Client ID, Account ID, Transaction ID). On Success: Eventually Consistent. On Error: Cancel Transaction (Transaction: Transaction ID, Cancel = Y) -> Rollback Account Balance (Account Balance: Account ID, Transaction ID, Balance -= 10) -> Update Client Balance (Client Balance: Client ID, Account ID, Transaction ID).
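The compensation idea behind the saga can be shown with a small in-process model. This is only a sketch: in the deck the steps are separate services that communicate through topics and the rollback is triggered by a cancel event, whereas here both steps run synchronously in one class, and the failure is simulated with a flag. All names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class SagaRollbackSketch {
    final Map<String, Long> accountBalances = new HashMap<>();
    final Map<String, Long> clientBalances = new HashMap<>();

    // Returns true when both steps succeed; on error the first step is compensated.
    boolean process(String clientId, String accountId, long amount, boolean clientUpdateFails) {
        accountBalances.merge(accountId, amount, Long::sum);          // step 1: Balance += amount
        try {
            updateClientBalance(clientId, amount, clientUpdateFails); // step 2
            return true;
        } catch (IllegalStateException error) {
            accountBalances.merge(accountId, -amount, Long::sum);     // rollback: Balance -= amount
            return false;
        }
    }

    private void updateClientBalance(String clientId, long amount, boolean fail) {
        if (fail) {
            throw new IllegalStateException("client balance update failed (simulated)");
        }
        clientBalances.merge(clientId, amount, Long::sum);
    }

    public static void main(String[] args) {
        SagaRollbackSketch saga = new SagaRollbackSketch();
        System.out.println(saga.process("123", "aa1", 10, false)); // true
        System.out.println(saga.process("123", "aa1", 10, true));  // false
        System.out.println(saga.accountBalances.get("aa1"));       // 10
    }
}
```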
  • 19. Handling Cancellations/Rollbacks. The Account Balance Service (Kafka stream) reads the Cash Transactions Topic and writes to the Account Balances Topic. Input events: Transaction (Instruction Id = 1, Account Id = A, .., Debit or Credit = C, Amount = 10, ..); Transaction (Instruction Id = 2, Account Id = A, .., Debit or Credit = D, Amount = 25, ..); Transaction (Instruction Id = 1, Account Id = A, .., Debit or Credit = C, Amount = 10, Cancel = Yes, ..). Output events: Account Balance (Account ID = A, .., Projected Balance = 10); Account Balance (Account ID = A, .., Projected Balance = 35); Account Balance (Account ID = A, .., Projected Balance = 25).
  • 20. Deep Dive – State Store – Handling Cancels. Events: Transaction (Transaction Id = 1, ..) and Transaction (Transaction Id = 1, Cancel = Yes, ..). Decision: does the TransactionID exist in the state store (SS)? No: Add Balance (Amount = 100, Account Bal = 100). Yes: Remove Balance (Amount = 100, Account Bal = 0), then Add Balance (if Cancel = Yes, then do nothing; Account Bal = 0).
      KTable<AccountId, AccountBalance> accBalAggregate =
          groupedTable.aggregate(
              AccountBalance::getNewInstance,
              new AddBalanceAggregator(),
              new RemoveBalanceAggregator(),
              Materialized…)
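The cancel rule ("if Cancel = Yes, then do nothing" in the add step) can be traced with a plain-Java model of one account. It is a sketch rather than the aggregators named on the slide: a `HashMap` stands in for the state store, amounts are treated as additive as in the balances shown on the cancellation slide (10, 35, 25), and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class CancelAwareBalanceSketch {
    // Stand-in for the state store: amount currently applied per transaction ID.
    private final Map<String, Long> appliedAmountByTransactionId = new HashMap<>();
    private long accountBalance = 0;

    // Applies a transaction or its cancellation and returns the resulting balance.
    long onTransaction(String transactionId, long amount, boolean cancel) {
        long effective = cancel ? 0L : amount;  // a cancelled transaction adds nothing
        Long previous = appliedAmountByTransactionId.put(transactionId, effective);
        if (previous != null) {
            accountBalance -= previous;         // remove the previously applied amount
        }
        accountBalance += effective;            // add step (no-op for a cancel)
        return accountBalance;
    }

    public static void main(String[] args) {
        CancelAwareBalanceSketch account = new CancelAwareBalanceSketch();
        System.out.println(account.onTransaction("1", 10, false)); // 10
        System.out.println(account.onTransaction("2", 25, false)); // 35
        System.out.println(account.onTransaction("1", 10, true));  // 25
    }
}
```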
  • 21. Decouple Services – Event Carried State Transfer
      § Enrich the events with State
      § Services receive all the details for calc from events
      § Reduced callbacks between services
      § Scalable/reduced bottlenecks
      § Improved resilience – Services function independently, even when other services are briefly down
      Diagram: "Where is the state?", shown as two layouts of Service 1 through Service 6.
  • 22. Message Retention. Retention Period = 3 days. The timeline runs from Day 1 to Day 4 and groups messages (d1 .. dn) by Arrival Day (Day 1, Day 2, Day 3); messages that arrived on Day 1 expire on Day 4.
  • 23. Message Retention – Event Regeneration. The same timeline of messages (d1 .. dn) by Arrival Day (Day 1, Day 2, Day 3), extended with an Event Regeneration Service and a Data Store that supply messages (d4 .. dn) on Day 4.
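Retention and regeneration can be modelled together in plain Java. This sketch makes several assumptions that are not in the slides: days are integers, a message expires when its arrival day plus the retention period is reached (so Day 1 messages expire on Day 4), every published message is also copied to a data store, and regeneration republishes the stored messages of a past day under the current day. Real Kafka retention is configured in time or size and is enforced per log segment, so actual deletion can lag. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class RetentionReplaySketch {
    static final int RETENTION_DAYS = 3;
    // Arrival day -> messages; stand-ins for the topic and the long-term data store.
    private final Map<Integer, List<String>> topic = new TreeMap<>();
    private final Map<Integer, List<String>> dataStore = new TreeMap<>();

    void publish(int arrivalDay, String message) {
        topic.computeIfAbsent(arrivalDay, d -> new ArrayList<>()).add(message);
        dataStore.computeIfAbsent(arrivalDay, d -> new ArrayList<>()).add(message);
    }

    // Drop every arrival day whose retention period has elapsed by 'today'.
    void expire(int today) {
        topic.keySet().removeIf(arrivalDay -> arrivalDay + RETENTION_DAYS <= today);
    }

    // Republish the stored messages of 'originalDay' as arrivals of 'today'.
    void regenerate(int originalDay, int today) {
        for (String message : dataStore.getOrDefault(originalDay, Collections.emptyList())) {
            topic.computeIfAbsent(today, d -> new ArrayList<>()).add(message);
        }
    }

    // Messages currently readable from the topic, ordered by arrival day.
    List<String> readTopic() {
        List<String> all = new ArrayList<>();
        for (List<String> messages : topic.values()) {
            all.addAll(messages);
        }
        return all;
    }

    public static void main(String[] args) {
        RetentionReplaySketch log = new RetentionReplaySketch();
        log.publish(1, "d1");
        log.publish(2, "d2");
        log.publish(3, "d3");
        log.expire(4);
        System.out.println(log.readTopic()); // [d2, d3]
        log.regenerate(1, 4);
        System.out.println(log.readTopic()); // [d2, d3, d1]
    }
}
```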
  • 24. User Views – Real-time & historical views. Service 1, Service 2 .. Service N (each shown as Account Balance Service instances) exchange data through a Kafka Cluster (Topic 1, Topic 2 .. Topic N). Ingest data: updates <K, V(t1)>, <K, V(t2)>, <K, V(t3)> feed a Read Model <Cache> that holds the Last Point in time Value <K, V(t)>, and Long term Storage. User Views are served through Web Socket and API.
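The "Last Point in time Value" read model can be sketched as a cache that keeps only the newest value per key. This is a plain-Java illustration with hypothetical names; ignoring updates that carry an older timestamp than the cached one is an assumption about how late arrivals are handled, not something the slide states.

```java
import java.util.HashMap;
import java.util.Map;

final class LastValueCacheSketch {
    private final Map<String, Long> timestampByKey = new HashMap<>();
    private final Map<String, String> valueByKey = new HashMap<>();

    // Returns true if the update was applied, false if it was older than the cached value.
    boolean ingest(String key, long timestamp, String value) {
        Long current = timestampByKey.get(key);
        if (current != null && current > timestamp) {
            return false; // stale update: keep the newer cached value
        }
        timestampByKey.put(key, timestamp);
        valueByKey.put(key, value);
        return true;
    }

    // Latest known value for the key, or null if none has been ingested.
    String latest(String key) {
        return valueByKey.get(key);
    }

    public static void main(String[] args) {
        LastValueCacheSketch cache = new LastValueCacheSketch();
        cache.ingest("K", 1, "V(t1)");
        cache.ingest("K", 3, "V(t3)");
        cache.ingest("K", 2, "V(t2)");
        System.out.println(cache.latest("K")); // V(t3)
    }
}
```

A real-time view would read from such a cache and receive pushes over a web socket, while historical views would query long-term storage.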
  • 25. Summary – Takeaways
      § Managing consistency and state across distributed systems is critical
      § Kafka streamlines the distribution of large volumes of data between services in near real-time
      § Kafka Streams helps build scalable microservices and implement scalable distributed event-based architectures
      § Blending near-term and long-term data infrastructure with Kafka provides the ability to build interactive stateful user views
  • 26. Questions?