"Building financial-grade applications involve performing complex calculations over a wide range of data from across different domains, with challenges including stringent accuracy requirements, latency constraints, along with the need to share states across distributed services.
During this session, I will cover how, at Morgan Stanley, we built a real-time, microservices-based Liquidity Management platform using event streaming with the Kafka Streams API to handle high volumes of data and to perform calculations on cross-domain events spanning wide time windows over the past and the future.
I will demonstrate how we used Kafka Streams and state stores, along with patterns like Saga, to achieve eventual data consistency, and how we used state-enriched events to decouple services when transferring events through multiple business domains. I will cover mechanisms to ensure accuracy and transparency with idempotency at heart, along with error detection and replay strategies.
Finally, I will look at how we used a high-performance in-memory cache to stage the results of cascaded KStream-based calculation engines, which powered our high-speed, ticking, stateful data visualisations."
2. Intro
Kamlesh Shah – Technical Architect
Morgan Stanley
Kafka Summit 2023, London
3. Agenda
Topics Covered
§ Problem statement
§ Distributed processing
• Kafka for data ingestion & distribution
• Stream processing
§ Example distributed application
• Kafka stream & State store usage
• Stateful & Cascading Calculators
• Idempotency
• Joining static & not-so-static data
• Eventual Consistency with Saga
§ State-Enriched Events & Decoupling of Services
§ Time windows and Replay
§ Architecture for Real-time Ticking Stateful Views
§ Summary
4. Problem statement
[Diagram: Transactions, Market Data, Legacy Systems and Business Events feed a set of calculators (Calc..) that serve Risk, Regulation, Fraud, Screen and Limits.]
§ Data of different types and domains
§ Internal & External
§ Distributed processing
§ Coupling of services
§ Hard to change
§ Infrastructure complexity
§ Scaling
§ Fault identification
§ Calculations
§ Timely
§ Accurate & consistent
§ Realtime User views
§ Critical decision making
5. Kafka Data Ingestion & Distribution
Distributed Processing with Kafka – a stream processing use case
[Diagram: Input Streams (Transactions, Market Data, Legacy Systems, Business Events) flow through Kafka streams into Stream Processing, and on through further Kafka streams to an Analytics & Reporting Data Store, a Cache and an API.]
6. Example distributed application
Business case overview
[Diagram: Clients 1..N, Accounts 1..N, a Limit, and a series of Transactions. Two grids show the Current Balance (Now) and the Future Balance (Time 2 … Time N): one grid has a row per account (Account 1, Account 2, …, Account N), the other a row per client (Client 1, Client 2, …, Client N).]
7. Example distributed application
Transactions to Account Balances
Transactions received by the Account Balance Service:
§ Transaction Id = 1, Account Id = aa1, Debit or Credit = Debit, Amount = 100, Date = T+1
§ Transaction Id = 2, Account Id = aa1, Debit or Credit = Debit, Amount = 200, Date = T+1
§ Transaction Id = 3, Account Id = aa1, Debit or Credit = Debit, Amount = 100, Date = T+2
§ ..
Account Balances produced by the Account Balance Service:
§ Account Id = aa1, Projected Balance = 200, Date = T+1
§ Account Id = aa1, Projected Balance = 100, Date = T+2
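The roll-forward from dated transactions to projected balances can be sketched in plain Java. This is a minimal illustration, not the service's implementation: the class and method names are hypothetical, and the opening balance of 500 is an assumption chosen only because it makes the three debits above produce the slide's figures (200 for T+1, 100 for T+2).

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: names and the opening balance are assumptions, not from the talk.
class ProjectedBalanceSketch {

    // Each row of txns is {dayOffset, signedAmount}: debits negative, credits positive.
    // Returns the projected balance per day offset (1 means T+1, 2 means T+2, ...).
    static Map<Integer, Long> project(long openingBalance, long[][] txns) {
        // Net movement per day.
        TreeMap<Integer, Long> netByDay = new TreeMap<>();
        for (long[] t : txns) {
            netByDay.merge((int) t[0], t[1], Long::sum);
        }
        // Carry the running balance forward through the days in order.
        TreeMap<Integer, Long> projected = new TreeMap<>();
        long running = openingBalance;
        for (Map.Entry<Integer, Long> e : netByDay.entrySet()) {
            running += e.getValue();
            projected.put(e.getKey(), running);
        }
        return projected;
    }

    public static void main(String[] args) {
        long[][] txns = { {1, -100}, {1, -200}, {2, -100} };
        System.out.println(project(500, txns)); // prints {1=200, 2=100}
    }
}
```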
8. Stateful Account Balance Calculator
[Diagram: streaming sources publish Transaction events (Transaction Id, Account Id, .., Debit or Credit, Date, Amount, ..) to the Cash Transactions Topic. The Account Balance Service (Consumer 1, a Stateful Processor with a State Store) consumes them through a Kafka stream and publishes Account Balance events (Account ID, .., Date, Projected Balance, ..) to the Account Balances Topic.]
9. Account Balance Calculator – Deep Dive
final StreamsBuilder builder = new StreamsBuilder();

// Read the transactions topic as a table keyed by transaction id,
// so a later event with the same id replaces the earlier one.
final KTable<TransactionId, Transaction> transactionTable =
    builder.table(transactionsTopic,
        Materialized…)
    );

// Re-key each transaction by its account id.
final KGroupedTable<AccountId, Transaction> transactionByAccount =
    transactionTable.groupBy((k, v) ->
        KeyValue.pair(v.accountId, v),
        Grouped.with(…));

// Aggregate per account: the adder applies a new or updated transaction,
// the subtractor removes the previous version of an updated transaction.
final KTable<AccountId, AccountBalance> accountBalanceKTable =
    transactionByAccount.aggregate(
        () -> createNewAccountBalance(),
        (key, value, aggregate) -> addAccountBalance(value, aggregate),
        (key, value, aggregate) -> removeAccountBalance(value, aggregate),
        Materialized.with(Serdes.String(), new JSONSerde<>())
    );

// Publish every balance update to the output topic.
accountBalanceKTable.toStream().to(accountBalancesTopic);
10. Stateful Calculations with Kafka Stream – Scaling Consumers
[Diagram: streaming sources publish Transaction events (Transaction Id, Account Id, .., Debit or Credit, Amount, ..) to Partitions 1..N of the Cash Transactions Topic, distributed by Partition Key. Account Balance Service Consumers 1..N, each a Stateful Processor with its own State Store, read the partitions through Kafka streams and publish Account Balance events (Account ID, .., Projected Balance, ..) to Partitions 1..N of the Account Balances Topic, again distributed by Partition Key.]
Scale Number of Consumers <-> Number of Partitions
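Why the partition key matters for stateful processing can be shown with a small sketch. It assumes a simple hash-modulo mapping: Kafka's default partitioner hashes the serialized key bytes with murmur2, so `String.hashCode()` here is only a stand-in, and the class and method names are hypothetical. The point is that one account id always lands on one partition, so a single consumer's state store sees every event for that account.

```java
// Hypothetical sketch of key-based partitioning; not Kafka's actual partitioner.
class PartitionSketch {

    // The same key always maps to the same partition.
    // String.hashCode() stands in for the murmur2 hash Kafka applies to key bytes.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Within one consumer group a partition is read by one consumer at a time,
    // so consumers beyond the partition count receive no data.
    static int activeConsumers(int numPartitions, int numConsumers) {
        return Math.min(numPartitions, numConsumers);
    }

    public static void main(String[] args) {
        System.out.println(partitionFor("aa1", 4) == partitionFor("aa1", 4)); // prints true
        System.out.println(activeConsumers(4, 6)); // prints 4
    }
}
```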
11. Account Balances to Client Balances
Inputs to the Client Balance Service:
§ Account Balance: Account Id = aa1, Projected Balance = 200, Date = T+1
§ Account Balance: Account Id = aa1, Projected Balance = 100, Date = T+2
§ Client To Account Map: Client Id = 123, Account List [aa1, aa2…]
Client Balances produced by the Client Balance Service:
§ Client Balance: Client Id = 123, Projected Balance = 200, Date = T+1
§ Client Balance: Client Id = 123, Projected Balance = 100, Date = T+2
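A minimal plain-Java sketch of this roll-up, assuming the client balance for a date is the sum of the projected balances of the client's accounts for that date. The class and method names are hypothetical; in the example only account aa1 has balances, which matches the figures on the slide.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: sums account balances per date for one client.
class ClientBalanceSketch {

    // clientAccounts: the account list from the client-to-account reference data.
    // accountBalances: accountId -> (day offset -> projected balance).
    static Map<Integer, Long> clientBalance(List<String> clientAccounts,
                                            Map<String, Map<Integer, Long>> accountBalances) {
        Map<Integer, Long> total = new TreeMap<>();
        for (String accountId : clientAccounts) {
            // Accounts with no balance yet contribute nothing.
            Map<Integer, Long> byDay =
                accountBalances.getOrDefault(accountId, Collections.emptyMap());
            for (Map.Entry<Integer, Long> e : byDay.entrySet()) {
                total.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<Integer, Long> aa1 = new TreeMap<>();
        aa1.put(1, 200L); // T+1
        aa1.put(2, 100L); // T+2
        Map<String, Map<Integer, Long>> balances = new HashMap<>();
        balances.put("aa1", aa1);
        System.out.println(clientBalance(Arrays.asList("aa1", "aa2"), balances)); // prints {1=200, 2=100}
    }
}
```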
12. Cascading Calculators
Client Balances calculation with reference data
[Diagram: Cash Transactions Topic → Account Balance Service → Account Balances Topic (Account Balance: Account ID, .., Projected Balance, ..) → Client Balance Service → Client Balances Topic (Client Balance: Client ID, .., Projected Balance, ..). The Client Balance Service obtains the Client to Account Reference through an API, with a Client to Account Cache.]
13. Cascading Calculators
Client Limits using Kafka Join
[Diagram: Cash Transactions Topic → Account Balance Service → Account Balances Topic → Client Balance Service (with the Client to Account Reference API and Client to Account Cache) → Client Balances Topic (Client Balance: Client ID, .., Projected Balance, ..). Streaming sources publish Client Limit events (Client ID, .., Limit, ..) to the Client Limits Topic. The Client Limit Monitor Service joins the Client Balances Topic with the Client Limits Topic and publishes Client Limit Breach events (Client ID, Projected Balance, Limit, Breach Y/N..) to the Limit Breach Topic.]
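The join between client balances and client limits can be approximated in plain Java as two "latest value per client" maps that are re-evaluated whenever either side changes, which is roughly how a table-to-table inner join behaves. This is only a sketch under assumptions: the class name is hypothetical, the limit values are made up, and the breach rule (projected balance below the limit) is a guess, since the slide does not state the direction of the comparison.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a balance/limit join; the breach rule is an assumption.
class LimitMonitorSketch {
    private final Map<String, Long> latestBalance = new HashMap<>();
    private final Map<String, Long> latestLimit = new HashMap<>();

    // Called for each Client Balance event.
    String onBalance(String clientId, long projectedBalance) {
        latestBalance.put(clientId, projectedBalance);
        return evaluate(clientId);
    }

    // Called for each Client Limit event.
    String onLimit(String clientId, long limit) {
        latestLimit.put(clientId, limit);
        return evaluate(clientId);
    }

    // Inner-join behaviour: no output until both sides are known for the client.
    // Returns "Y" for a breach, "N" otherwise, or null when one side is missing.
    private String evaluate(String clientId) {
        Long balance = latestBalance.get(clientId);
        Long limit = latestLimit.get(clientId);
        if (balance == null || limit == null) {
            return null;
        }
        return balance < limit ? "Y" : "N"; // assumed rule: breach when balance < limit
    }

    public static void main(String[] args) {
        LimitMonitorSketch monitor = new LimitMonitorSketch();
        System.out.println(monitor.onBalance("123", 200)); // prints null (no limit yet)
        System.out.println(monitor.onLimit("123", 150));   // prints N
        System.out.println(monitor.onBalance("123", 100)); // prints Y
    }
}
```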
16. Idempotency
Handling duplicate events
[Diagram: the Account Balance Service reads the Cash Transactions Topic through a Kafka stream and writes to the Account Balances Topic.]
Transactions on the Cash Transactions Topic, in order:
§ Transaction Id = 1, Account Id = A, Debit or Credit = C, Amount = 10
§ Transaction Id = 2, Account Id = A, Debit or Credit = D, Amount = 25
§ Transaction Id = 1, Account Id = A, Debit or Credit = C, Amount = 10 (duplicate)
Account Balances on the Account Balances Topic:
§ Account ID = A, Projected Balance = 10
§ Account ID = A, Projected Balance = 35
17. Deep Dive
Handling duplicate events
[Diagram: each incoming Transaction is checked against the State Store: does the TransactionID exist in the State Store (SS)? No: Add Balance. Yes: Remove Balance, then Add Balance.]
§ First event (Transaction Id = 1, Amount = 100): Add Balance, Account Bal = 100
§ Duplicate event (Transaction Id = 1, Amount = 100): Remove Balance, Account Bal = 0; then Add Balance, Account Bal = 100

// The adder applies the incoming transaction; the subtractor removes the
// previously stored version of the same transaction, so a duplicate nets to zero.
final KTable<AccountId, AccountBalance> accountBalanceKTable =
    transactionByAccount.aggregate(
        () -> createNewAccountBalance(),
        (key, value, aggregate) -> addAccountBalance(value, aggregate),
        (key, value, aggregate) -> removeAccountBalance(value, aggregate),
        Materialized.with(Serdes.String(), new JSONSerde<>())
    );
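The remove-then-add flow can be reproduced without Kafka to see why a duplicate leaves the balance unchanged. In this sketch a `HashMap` stands in for the state store and the class name is hypothetical; it mirrors the slide's numbers (amount 100, balance 100, then 0, then 100 again).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a HashMap stands in for the Kafka Streams state store.
class IdempotentBalanceSketch {
    // Last amount applied per transaction id.
    private final Map<String, Long> seen = new HashMap<>();
    private long balance = 0;

    // Applies a transaction and returns the resulting account balance.
    long apply(String transactionId, long amount) {
        Long previous = seen.put(transactionId, amount);
        if (previous != null) {
            balance -= previous; // "Remove Balance": undo the earlier copy
        }
        balance += amount;       // "Add Balance": apply the incoming copy
        return balance;
    }

    public static void main(String[] args) {
        IdempotentBalanceSketch account = new IdempotentBalanceSketch();
        System.out.println(account.apply("1", 100)); // prints 100
        System.out.println(account.apply("1", 100)); // prints 100 (duplicate has no net effect)
    }
}
```

The same mechanism also handles a corrected amount for an existing transaction id: the old amount is removed before the new one is added.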
18. Eventual Consistency
Saga – Rollback transactions
[Diagram: Cash Transactions → Account Balance Service → Account Balances → Client Balance Service → Client Balances.]
Steps shown:
§ Transaction (Transaction ID, Amount = 10)
§ Update Account Balance (Account ID, Transaction ID, Balance += 10)
§ Update Client Balance (Client ID, Account ID, Transaction ID): Success or Error
On Error:
§ Cancel Transaction (Transaction ID, Cancel = Y)
§ Rollback Account Balance (Account ID, Transaction ID, Balance -= 10)
§ Eventually Consistent
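The compensation idea behind the Saga can be sketched in a few lines. In the talk the steps run in separate services that communicate over Kafka topics; this sketch collapses them into one hypothetical class and uses a boolean flag to simulate a failure in the client balance step, so it illustrates the pattern only, not the platform's design.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-process sketch of a saga with a compensating step.
class SagaSketch {
    long accountBalance = 0;
    long clientBalance = 0;
    final List<String> log = new ArrayList<>();

    // Returns true when both steps succeed, false when the saga was rolled back.
    // failClientStep simulates an error in the client balance update.
    boolean process(String transactionId, long amount, boolean failClientStep) {
        // Step 1: update the account balance.
        accountBalance += amount;
        log.add("account += " + amount);

        if (failClientStep) {
            // Step 2 failed: issue a cancel for the transaction, which
            // compensates step 1 by reversing the account balance update.
            log.add("cancel " + transactionId);
            accountBalance -= amount;
            log.add("account -= " + amount);
            return false;
        }

        // Step 2: update the client balance.
        clientBalance += amount;
        log.add("client += " + amount);
        return true;
    }

    public static void main(String[] args) {
        SagaSketch saga = new SagaSketch();
        System.out.println(saga.process("t1", 10, true)); // prints false
        System.out.println(saga.accountBalance);          // prints 0
        System.out.println(saga.log);                     // prints [account += 10, cancel t1, account -= 10]
    }
}
```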
19. Handling Cancellations/Rollbacks
[Diagram: the Account Balance Service reads the Cash Transactions Topic through a Kafka stream and writes to the Account Balances Topic.]
Transactions on the Cash Transactions Topic, in order:
§ Instruction Id = 1, Account Id = A, Debit or Credit = C, Amount = 10
§ Instruction Id = 2, Account Id = A, Debit or Credit = D, Amount = 25
§ Instruction Id = 1, Account Id = A, Debit or Credit = C, Amount = 10, Cancel = Yes
Account Balances on the Account Balances Topic, in order:
§ Account ID = A, Projected Balance = 10
§ Account ID = A, Projected Balance = 35
§ Account ID = A, Projected Balance = 25
20. Deep Dive
Handling Cancels
// Same add/remove aggregation as before, with the aggregators as classes.
KTable<AccountId, AccountBalance> accBalAggregate =
    groupedTable.aggregate(AccountBalance::getNewInstance,
        new AddBalanceAggregator(),
        new RemoveBalanceAggregator(),
        Materialized…)

[Diagram: each incoming Transaction is checked against the State Store: does the TransactionID exist in the State Store (SS)? No: Add Balance. Yes: Remove Balance, then Add Balance. In Add Balance, if Cancel = Yes, then do nothing.]
§ First event (Transaction Id = 1, Amount = 100): Add Balance, Account Bal = 100
§ Cancel event (Transaction Id = 1, Cancel = Yes, Amount = 100): Remove Balance, Account Bal = 0; Add Balance does nothing, Account Bal = 0
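A plain-Java sketch of the cancel flow, again with a `HashMap` standing in for the state store and a hypothetical class name. One simplification is an assumption of this sketch rather than something the slide shows: a cancelled transaction is dropped from the map, so a repeated cancel has nothing left to remove.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: cancel handling on top of remove-then-add aggregation.
class CancelAwareBalanceSketch {
    // Amount currently applied per transaction id (stand-in for the state store).
    private final Map<String, Long> applied = new HashMap<>();
    private long balance = 0;

    // Applies a transaction or its cancellation and returns the account balance.
    long apply(String transactionId, long amount, boolean cancel) {
        // "Remove Balance": undo whatever was applied earlier for this id.
        Long previous = applied.remove(transactionId);
        if (previous != null) {
            balance -= previous;
        }
        // "Add Balance": skipped entirely when Cancel = Yes.
        if (!cancel) {
            applied.put(transactionId, amount);
            balance += amount;
        }
        return balance;
    }

    public static void main(String[] args) {
        CancelAwareBalanceSketch account = new CancelAwareBalanceSketch();
        System.out.println(account.apply("1", 100, false)); // prints 100
        System.out.println(account.apply("1", 100, true));  // prints 0
    }
}
```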
21. Decouple Services
Event Carried State Transfer
§ Enrich the events with State
§ Services receive all the details for calc from events
§ Reduced callbacks between services
§ Scalable/reduced bottlenecks
§ Improved resilience – Services function independently, even when other services are briefly down
[Diagram: "Where is the state?" Services 1 to 6 are shown in two arrangements.]
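To make event-carried state transfer concrete, here is a small sketch in which each balance event already carries the client id it belongs to, so the downstream consumer can total balances per client without calling another service. The event shape, field names and class names are assumptions made for illustration; the talk does not show the actual event schema.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a state-enriched event and a consumer that needs no callbacks.
class EnrichedEventSketch {

    // The clientId is the "carried state": without it the consumer would have
    // to look up the owning client from a reference service.
    static final class EnrichedBalanceEvent {
        final String accountId;
        final String clientId;
        final long projectedBalance;

        EnrichedBalanceEvent(String accountId, String clientId, long projectedBalance) {
            this.accountId = accountId;
            this.clientId = clientId;
            this.projectedBalance = projectedBalance;
        }
    }

    // Keeps the latest event per account, then totals per client,
    // using only fields present on the events themselves.
    static Map<String, Long> totalsByClient(List<EnrichedBalanceEvent> events) {
        Map<String, EnrichedBalanceEvent> latestByAccount = new LinkedHashMap<>();
        for (EnrichedBalanceEvent e : events) {
            latestByAccount.put(e.accountId, e);
        }
        Map<String, Long> totals = new TreeMap<>();
        for (EnrichedBalanceEvent e : latestByAccount.values()) {
            totals.merge(e.clientId, e.projectedBalance, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<EnrichedBalanceEvent> events = Arrays.asList(
            new EnrichedBalanceEvent("aa1", "123", 200),
            new EnrichedBalanceEvent("aa2", "123", 50),
            new EnrichedBalanceEvent("aa1", "123", 100)); // newer balance for aa1
        System.out.println(totalsByClient(events)); // prints {123=150}
    }
}
```

The trade-off is larger events in exchange for fewer synchronous dependencies between services.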
22. Message Retention
✓ Retention Period = 3 days
[Diagram: a timeline of messages (d1 .. dn) by Arrival Day: Day 1, Day 2, Day 3. Messages that arrived on Day 1 expire on Day 4.]
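The retention arithmetic is simple enough to check in code. The sketch below computes the value one would give the topic-level `retention.ms` setting for a 3-day retention period, and the first day on which a message becomes eligible for deletion. The class and method names are hypothetical, and because Kafka deletes whole log segments, actual removal can happen somewhat later than the computed day.

```java
import java.time.Duration;

// Hypothetical helper for reasoning about time-based retention.
class RetentionSketch {

    // Value for the topic-level retention.ms setting, in milliseconds.
    static long retentionMs(int days) {
        return Duration.ofDays(days).toMillis();
    }

    // First day on which a message becomes eligible for deletion.
    static int expiryDay(int arrivalDay, int retentionDays) {
        return arrivalDay + retentionDays;
    }

    public static void main(String[] args) {
        System.out.println(retentionMs(3));  // prints 259200000
        System.out.println(expiryDay(1, 3)); // prints 4
    }
}
```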
23. Message Retention
Event Regeneration
[Diagram: messages (d1 .. dn) arrive on Day 1, Day 2 and Day 3. On Day 4 an Event Regeneration Service reads from a Data Store and republishes messages (d4 .. dn).]
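One possible reading of the regeneration step is sketched below. It assumes that d1 .. dn are the dates the events apply to, and that the service republishes events that have aged out of the topic but still apply to today or a later date. Both the interpretation and all names are assumptions; the slide shows only that d4 .. dn are regenerated from the data store on Day 4.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of selecting expired-but-still-relevant events for replay.
class RegenerationSketch {

    static final class StoredEvent {
        final String id;
        final int arrivalDay; // day the event was first published
        final int valueDay;   // day the event applies to (assumed meaning of d1 .. dn)

        StoredEvent(String id, int arrivalDay, int valueDay) {
            this.id = id;
            this.arrivalDay = arrivalDay;
            this.valueDay = valueDay;
        }
    }

    // Returns ids of events that are past retention on the topic
    // and whose value day is today or in the future.
    static List<String> toRegenerate(List<StoredEvent> dataStore, int today, int retentionDays) {
        List<String> ids = new ArrayList<>();
        for (StoredEvent e : dataStore) {
            boolean expired = e.arrivalDay + retentionDays <= today;
            boolean stillRelevant = e.valueDay >= today;
            if (expired && stillRelevant) {
                ids.add(e.id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<StoredEvent> store = Arrays.asList(
            new StoredEvent("a", 1, 1),  // expired, value day already passed
            new StoredEvent("b", 1, 4),  // expired, still relevant
            new StoredEvent("c", 1, 7),  // expired, still relevant
            new StoredEvent("d", 2, 5)); // still retained on the topic
        System.out.println(toRegenerate(store, 4, 3)); // prints [b, c]
    }
}
```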
24. Real-time & historical views
[Diagram: Services 1..N (labelled Account Balance Service) publish to Topics 1..N on a Kafka Cluster. Data is ingested into a Read Model <Cache> that keeps the Last Point in time Value per key, shown as a sequence of <K, V(t)> updates such as <K, V(t1)>, <K, V(t2)>, <K, V(t3)>, and into Long term Storage. User Views are served through a Web Socket and an API.]
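The "last point in time value" behaviour of the read model can be sketched with a concurrent map that keeps, per key, the value with the greatest timestamp. The class name, the use of a numeric timestamp and the string payload are assumptions for illustration; the talk does not name the cache product or its API.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a read-model cache holding the latest value per key.
class ReadModelCacheSketch {

    static final class Versioned {
        final long timestamp;
        final String value;

        Versioned(long timestamp, String value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    private final ConcurrentHashMap<String, Versioned> cache = new ConcurrentHashMap<>();

    // Stores the update unless the cache already holds a newer one,
    // so late or out-of-order updates cannot overwrite fresher data.
    void ingest(String key, long timestamp, String value) {
        cache.merge(key, new Versioned(timestamp, value),
            (current, incoming) -> incoming.timestamp >= current.timestamp ? incoming : current);
    }

    // Returns the latest known value for the key, or null if none was ingested.
    String latest(String key) {
        Versioned v = cache.get(key);
        return v == null ? null : v.value;
    }

    public static void main(String[] args) {
        ReadModelCacheSketch readModel = new ReadModelCacheSketch();
        readModel.ingest("K", 2, "V(t2)");
        readModel.ingest("K", 3, "V(t3)");
        readModel.ingest("K", 1, "V(t1)"); // arrives late, ignored
        System.out.println(readModel.latest("K")); // prints V(t3)
    }
}
```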
25. Summary
Takeaways
§ Managing consistency and state across distributed systems is critical
§ Kafka streamlines the distribution of large volumes of data between services in near real-time
§ Kafka Streams helps in building scalable microservices and implementing scalable distributed event-based architectures
§ Blending near-term and long-term data infrastructure with Kafka provides the ability to build interactive stateful user views