© 2020 KPMG LLP, a Delaware limited liability partnership and the U.S. member firm of the KPMG network of independent member firms affiliated with KPMG International
Cooperative (“KPMG International”), a Swiss entity. All rights reserved.
August 13, 2022 at DataConLA
Real Time Data Streaming with Kafka
Speaker:
Jie Chen
Manager Advisory
Engineering Architect
LinkedIn
Agenda
Kafka at a Glance (5 min)
Kafka Use Cases (20 min)
Distributed Data with CQRS
Intelligent Forecast System
Kafka in Banking
Key Takeaways (5 min)
Q&A (10 min)
Kafka at a Glance
Kafka in the Market
CORE CAPABILITIES
Scalable
Scale production clusters up to a
thousand brokers, trillions of
messages per day, petabytes of
data, hundreds of thousands of
partitions. Elastically expand and
contract storage and processing.
High Throughput
Deliver messages at network-limited
throughput using a cluster of machines
with latencies as low as 2 ms
Permanent Storage
Store streams of data safely in a
distributed, durable, fault-tolerant cluster
High Availability
Stretch clusters efficiently over availability
zones or connect separate clusters across
geographic regions
Source: kafka.apache.org
Kafka Platform Overview
Event Streaming Platform
Distributed streaming platform
that enables real-time, event-
driven applications using a topic-
based pub-sub model
Performance at Scale
Kafka operates as a highly
available and fault-tolerant
cluster that spans servers and
even data centers with a
partitioning system that supports
data volumes of practically any
size
https://docs.confluent.io/
What Is Event-Driven Streaming with Kafka
[Diagram: Producers publish to a Kafka Cluster; Consumers subscribe]
Producers (Publishing): ETL, Raw Message Queue, Change Data Capture, Mainframe, Custom
Kafka Cluster: Brokers (Servers), Topic, Partitions
Consumers (Subscribing): Web, Mobile, Data Warehouse, Monitor Tool, Partners
Other label: Data Draining
An event is data that describes an entity's observable state changes over time (definition by IBM).
Examples: a first-time user registration, a payment, a social media post.
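The topic, partition, and offset vocabulary from the diagram can be sketched with an in-memory stand-in. This is not a Kafka client: the `Topic` class, the `user-events` name, and the use of `crc32` for key hashing are assumptions made for illustration (Kafka's default partitioner uses a different hash).

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Topic:
    """In-memory stand-in for a Kafka topic: a named set of append-only partitions."""
    name: str
    num_partitions: int = 3
    partitions: list = field(default_factory=list)

    def __post_init__(self):
        self.partitions = [[] for _ in range(self.num_partitions)]

    def publish(self, key: str, value: dict) -> tuple:
        # Same key -> same partition, so events for one entity stay ordered.
        # crc32 is a stand-in hash chosen for this sketch only.
        partition = zlib.crc32(key.encode("utf-8")) % self.num_partitions
        log = self.partitions[partition]
        log.append((key, value))
        return partition, len(log) - 1  # (partition, offset)

    def read(self, partition: int, offset: int) -> list:
        # Consumers pull from an offset; reading does not remove anything.
        return self.partitions[partition][offset:]


topic = Topic("user-events")
p1, o1 = topic.publish("user-42", {"type": "registration"})
p2, o2 = topic.publish("user-42", {"type": "payment"})
```

Both events for `user-42` land in the same partition at consecutive offsets, which is what lets a consumer see them in the order they happened.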
Distributed Data with
CQRS and Kafka
Distributed Data with CQRS and Kafka - CQRS at a Glance
Overview
Command Query
Responsibility Segregation
Read and write workloads are
separated, decoupled, and
scaled independently.
Event Sourcing
CQRS is often linked with event
sourcing – Effectively viewing
data state as a series of discrete
events.
Event Sourcing is an approach to handling operations on data that's driven by a sequence of events, each of which is recorded in an append-only store
(defined by Microsoft). For example, placing an online order and later returning it under the same user account are recorded as two separate events.
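A minimal sketch of the event sourcing idea, using the order and return example: state is never stored directly but rebuilt by replaying the append-only log. The `Event` fields and the event names `order_placed` and `order_returned` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    account: str
    kind: str      # "order_placed" or "order_returned" (hypothetical names)
    order_id: str


def replay(events):
    """Rebuild the set of open orders per account by folding over the log."""
    state = {}
    for e in events:
        orders = state.setdefault(e.account, set())
        if e.kind == "order_placed":
            orders.add(e.order_id)
        elif e.kind == "order_returned":
            orders.discard(e.order_id)
    return state


log = []  # append-only: events are added, never updated or deleted
log.append(Event("alice", "order_placed", "A-1"))
log.append(Event("alice", "order_placed", "A-2"))
log.append(Event("alice", "order_returned", "A-1"))

current = replay(log)       # state after all three events
earlier = replay(log[:2])   # state as of any earlier point is also recoverable
```

Because the return is a new event rather than an update, the log still shows that order A-1 once existed, which is the property that makes auditing and replay possible.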
Distributed Data with CQRS and Kafka - Traditional Design
Difficult to Scale
SOR must be able to support the load
of all clients and systems. Read
replicas can improve scalability.
Single Point of Failure
If the SOR or API layer is unavailable, all
consumers may be affected.
Rigid
All access to SOR data flows through
centralized APIs. Consumers receive
data in the schemas set up by access
layer.
Difficult to Manipulate Data
Direct data access to the SOR is
restricted. Transforms, joins, and
analytical operations may be difficult
and rely on lagging ETL operations.
Client: external-facing UI, third-party APIs
System: internal-facing ETL, mainframe
SOR: System of Record (the authoritative data source)
Distributed Data with CQRS and Kafka - CQRS Design
Data Changes as Events
The current state of the SOR is
captured in an event format
Consumers Subscribe to
Changes
Consumers listen to data event
changes and consume the information
according to their own use case
Other Systems Act on Data
Systems act on data updates as
defined by use case. Systems may
replicate the data, enrich the data, or
simply process events in real time.
Read / Write Separation
Reads are segregated from
writes. Read-only consumers introduce
no additional load on the SOR.
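The read/write separation described on this slide can be sketched as follows. The class names, the `deposited` event type, and the plain list standing in for a Kafka topic are all assumptions for illustration, not part of the original design.

```python
class SystemOfRecord:
    """Write side: the only component allowed to change authoritative state."""

    def __init__(self, event_log):
        self.balances = {}
        self.event_log = event_log

    def handle_deposit(self, account, amount):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self.balances[account] = self.balances.get(account, 0) + amount
        # Publish the change as an event (in practice, to a Kafka topic).
        self.event_log.append(
            {"type": "deposited", "account": account, "amount": amount}
        )


class BalanceView:
    """Read side: a replica built only from events, queried without the SOR."""

    def __init__(self):
        self.balances = {}
        self.offset = 0

    def catch_up(self, event_log):
        for event in event_log[self.offset:]:
            if event["type"] == "deposited":
                acct = event["account"]
                self.balances[acct] = self.balances.get(acct, 0) + event["amount"]
        self.offset = len(event_log)

    def get_balance(self, account):
        return self.balances.get(account, 0)


events = []
sor = SystemOfRecord(events)
view = BalanceView()

sor.handle_deposit("acct-1", 100)
sor.handle_deposit("acct-1", 50)
stale = view.get_balance("acct-1")   # view has not consumed the events yet
view.catch_up(events)
fresh = view.get_balance("acct-1")
```

Queries go to `BalanceView` only, so adding more read-side consumers adds no load to the write side. The `stale` value shows the cost: the view lags until it has processed the events.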
Distributed Data with CQRS and Kafka - Advantages and Challenges
Advantages
Independent Scaling
Read and write workloads may
be scaled independently based
on load and access patterns.
Separation of Concerns
Segregated models allow for
tightly controlled write logic while
permitting flexibility in read
models and stream processing.
System Isolation
Access to the SOR database is
restricted to a controlled write
API. Consumers may safely read
from a replica.
Flexible Consumption
Kafka’s scalable architecture
allows for consumers to process
events differently across systems
at different velocities.
Challenges
Eventual Consistency
Reads will be eventually
consistent and may have some
delay until writes have
propagated through the system.
Complexity
Implementation of the pattern
increases complexity of the
overall solution.
Different Data Velocity
Consumers may process events
at different velocities, resulting in
inconsistencies across systems.
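The "different data velocity" challenge can be made concrete with a small sketch of consumer lag. The `Consumer` class, the `batch_size` knob, and the consumer names are hypothetical. Lag is computed as the end of the log minus the consumer's own offset.

```python
class Consumer:
    """Tracks its own offset into a shared, append-only log."""

    def __init__(self, name, batch_size):
        self.name = name
        self.batch_size = batch_size  # events processed per poll (hypothetical)
        self.offset = 0
        self.seen = []

    def poll(self, log):
        batch = log[self.offset:self.offset + self.batch_size]
        self.seen.extend(batch)
        self.offset += len(batch)

    def lag(self, log):
        # Lag = log end offset minus this consumer's offset.
        return len(log) - self.offset


log = [f"event-{i}" for i in range(10)]
fast = Consumer("dashboard", batch_size=10)
slow = Consumer("warehouse-loader", batch_size=3)

fast.poll(log)
slow.poll(log)
lag_after_one_poll = (fast.lag(log), slow.lag(log))

# The slow consumer eventually converges on the same view of the data.
while slow.lag(log) > 0:
    slow.poll(log)
```

After one poll the two systems disagree about how much data exists; once the slow consumer catches up, they agree again. This is the eventual consistency trade-off listed under Challenges.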
Distributed Data with CQRS and Kafka - Common Scenarios
Complex Data Operations Across
Systems
Different systems need to
process and transform data in
complex and evolving use cases.
Real-time Data Processing
Across Systems
Traditional ETL and batch
operations are too slow and rigid
to meet evolving business
requirements. The organization
seeks to process data in real
time as it becomes available
across different systems.
Resource Bottlenecks with
Growing Demand
Traditional data system
resources are strained and
unable to support growing
demands of business.
Scenarios to Consider
Data Security Concerns Across
Systems
Data must be shared securely
across systems without
introducing new security risks.
Increased Demand for Data
Sharing Across Enterprise
The enterprise seeks to break down
data silos and share data
effectively across the
organization to increase synergy
between systems.
Intelligent Forecast
with Kafka
Intelligent Forecast with Native Kafka Solution
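[Diagram: Producers publish to a Kafka Cluster; the ELK Stack subscribes through the Kafka Connector API]
Producers (Publishing): ETL, Raw Message Queue, Change Data Capture, Mainframe, Custom
Kafka Cluster
Consumers (Subscribing): ELK Stack via the Kafka Connector API; Elasticsearch (Storage, Indexing)
Other label: Data Draining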
Kafka's role in this solution is to publish data from the different channels as categorized topics. Through Kafka connector APIs
(connector replicators), the ELK Stack subscribes to the specified topics. This pub/sub model is also called event streaming. The
customized data can then be rendered in a Kibana dashboard.
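The flow described above (channels published as categorized topics, a sink subscribed to specified topics, data indexed for a dashboard) can be sketched in memory. Nothing here uses the real Kafka Connect or Elasticsearch APIs: the `channel.<name>` topic scheme, the `IndexSink` class, and the record fields are assumptions for illustration.

```python
from collections import defaultdict

topics = defaultdict(list)   # topic name -> append-only list of records


def publish(channel, record):
    # One topic per source channel; the "channel.<name>" scheme is hypothetical.
    topics[f"channel.{channel}"].append(record)


class IndexSink:
    """Stand-in for a connector that drains subscribed topics into an index."""

    def __init__(self, subscribed):
        self.subscribed = list(subscribed)
        self.offsets = {t: 0 for t in self.subscribed}
        self.index = {}   # document id -> document (plays the role of Elasticsearch)

    def drain(self):
        for topic in self.subscribed:
            records = topics[topic][self.offsets[topic]:]
            for record in records:
                self.index[record["id"]] = {**record, "topic": topic}
            self.offsets[topic] += len(records)


publish("atm", {"id": "t1", "amount": 200})
publish("mobile", {"id": "t2", "amount": 35})
publish("mainframe", {"id": "t3", "amount": 990})

sink = IndexSink(["channel.atm", "channel.mobile"])
sink.drain()
```

The sink only sees the topics it subscribed to, so the mainframe record stays in Kafka without reaching the index. A dashboard would then query `sink.index` rather than the producers.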
Challenges
Kafka Connector
Open-source alternatives such as MirrorMaker, uReplicator (Uber), and Mirus (Salesforce) can
address the scalability bottleneck while reducing licensing costs.
PII encryption
When weighing a Kafka security library against an in-house solution, it is important to establish PII governance early
among producers, the Kafka cluster, and consumers. In other words, decide who is responsible for masking sensitive data
throughout the real-time data streaming pipeline.
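One possible answer to "who masks the data" is to do it at the producer, before a record ever reaches a topic. The sketch below assumes that choice. The field names in `PII_FIELDS` and the hard-coded key are placeholders, and the keyed hash shown is one-way masking, not reversible encryption.

```python
import hashlib
import hmac

PII_FIELDS = {"ssn", "card_number"}      # hypothetical field names
SECRET_KEY = b"demo-key-do-not-use"      # placeholder; load from a secret store


def mask_pii(record: dict) -> dict:
    """Return a copy of the record with PII fields replaced by keyed hashes."""
    masked = {}
    for name, value in record.items():
        if name in PII_FIELDS:
            digest = hmac.new(SECRET_KEY, str(value).encode("utf-8"), hashlib.sha256)
            masked[name] = digest.hexdigest()
        else:
            masked[name] = value
    return masked


raw = {"txn_id": "t-1001", "card_number": "4111111111111111", "amount": 42.5}
safe = mask_pii(raw)   # this is what the producer would publish
```

The same input always yields the same masked value, so downstream consumers can still join or count by card without seeing the number. If consumers need the original value back, a reversible scheme would be required instead.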
Intelligent Forecast with Native Kafka Solution
Key Design
Pub/Sub, decoupled
and asynchronous
messaging service
for scalability
Equivalent Solutions
Azure Event Hubs
Google Pub/Sub
AWS Kinesis
Proactive Analytics
Detect and forecast abnormal trends that fall
outside a threshold, for example in ATM
transaction fraud and restaurant mobile
orders.
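A minimal sketch of threshold-based detection over a stream, assuming a rolling window and a cutoff of `k` standard deviations. Both parameters and the sample data are hypothetical. This covers detection only; forecasting would need a model on top.

```python
from collections import deque
from statistics import mean, pstdev


def detect_anomalies(values, window=5, k=3.0):
    """Yield (index, value) for points more than k standard deviations
    from the mean of the preceding `window` values."""
    recent = deque(maxlen=window)
    for i, value in enumerate(values):
        if len(recent) == window:
            mu = mean(recent)
            sigma = pstdev(recent)
            if sigma > 0 and abs(value - mu) > k * sigma:
                yield i, value
        recent.append(value)


# Hypothetical per-minute ATM withdrawal counts with one spike.
withdrawals = [20, 22, 19, 21, 20, 23, 21, 95, 22, 20]
anomalies = list(detect_anomalies(withdrawals))
```

Because the function is a generator that looks only at past values, it can run directly on events as a consumer receives them rather than on a completed batch.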
Kafka in Banking
What a Banking Institution Needs to Modernize Its Legacy System
A banking institution has been looking to migrate its legacy system to modern technologies that keep pace with fast-growing big data demand
by building a modern data streaming platform as part of its Business Operation Brain (BOB).
Reuse the existing data centers, storage, infrastructure, and security procedures
Scalable and reliable (million transactions/events per second) with the existing infrastructure
Data must be logged for transaction tracing and auditing (for example, Change Data Capture)
What Options We Have: Kafka and Its Comparables in the Marketplace
Not an exhaustive list of options
[Product logos not recovered; labels as extracted]
Open Source, On Prem
AWS Kinesis: Managed Cloud Computing, Proprietary
Open Source, On Prem or Managed Cloud Computing
Identify KPIs When Evaluating the Options
Messaging
Apache Kafka: Immutable, ordered, replayable; user-defined retention policy
RabbitMQ: Queue/message index attached with TTL; messages are removed once consumed
Storage
Apache Kafka: Persistent storage offers durability and reliability; append-only log
RabbitMQ: Messages are removed once consumed; in-memory is preferred
Scalability
Apache Kafka: Horizontal scale (scale out), adding more machines to increase disk I/O
RabbitMQ: Vertical scale (scale up), adding more CPU and RAM to the existing machine/hardware
Security
Customized, manual configuration (appears twice)
Native cloud solution
Operation Cost
(no cell text recovered)
Cells whose row or column was lost in extraction: Up to 365 days; Autoscaling; Pay as you go, elastic and durable
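The messaging and storage contrast in this comparison (a retained, replayable log versus messages removed once consumed) can be sketched with two small in-memory classes. Both are deliberate simplifications written for this example and do not model the real brokers' acknowledgement or retention mechanics.

```python
from collections import deque


class RetainedLog:
    """Kafka-style: records stay until retention removes them; readers use offsets."""

    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)

    def read_from(self, offset):
        return self.records[offset:]


class ConsumeOnceQueue:
    """Queue-style: a record is gone once a consumer has taken it."""

    def __init__(self):
        self.records = deque()

    def append(self, record):
        self.records.append(record)

    def take_all(self):
        taken = list(self.records)
        self.records.clear()
        return taken


log, queue = RetainedLog(), ConsumeOnceQueue()
for txn in ["txn-1", "txn-2", "txn-3"]:
    log.append(txn)
    queue.append(txn)

first_log_read = log.read_from(0)
replayed = log.read_from(0)          # an auditor can re-read from the start
first_queue_read = queue.take_all()
second_queue_read = queue.take_all()  # nothing left to trace
```

For the banking requirement that data be logged for transaction tracing and auditing, the replay behavior is the relevant difference: a second reader of the log sees the full history, while a second reader of the queue sees nothing.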
Key Takeaways
CQRS Pattern with Kafka
Use the scale, speed, and reliability of
Kafka as the backbone for an
eventually consistent distributed data
solution that allows flexible
consumption models and independent
scaling.
Kafka in Banking
Objectively select the metrics for the
business use case. Design the data
streaming solution that is ready to
scale.
Intelligent Forecast with Kafka
To reap the scalability benefit, design
the Kafka connector solution for future
business growth. PII must be encrypted
throughout the Kafka pipeline, and the
encryption must be automated.
Q&A

 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 

Último (20)

Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 

Data Con LA 2022 - Data Streaming with Kafka

  • 1. © 2020 KPMG LLP, a Delaware limited liability partnership and the U.S. member firm of the KPMG network of independent member firms affiliated with KPMG International Cooperative (“KPMG International”), a Swiss entity. All rights reserved. Real Time Data Streaming with Kafka. August 13, 2022 at DataConLA. Speaker: Jie Chen, Manager Advisory, Engineering Architect.
  • 2. Agenda. Kafka at a Glance (5 min). Kafka Use Cases (20 min): Distributed Data with CQRS, Intelligent Forecast System, Kafka in Banking. Key Takeaways (5 min). Q&A (10 min).
  • 3. Kafka at a Glance
  • 4. Kafka in the Market. Core capabilities. Scalable: scale production clusters up to a thousand brokers, trillions of messages per day, petabytes of data, and hundreds of thousands of partitions; elastically expand and contract storage and processing. High Throughput: deliver messages at network-limited throughput using a cluster of machines, with latencies as low as 2 ms. Permanent Storage: store streams of data safely in a distributed, durable, fault-tolerant cluster. High Availability: stretch clusters efficiently over availability zones or connect separate clusters across geographic regions. Source: kafka.apache.org
  • 5. Kafka Platform Overview. Event Streaming Platform: a distributed streaming platform that enables real-time, event-driven applications using a topic-based pub-sub model. Performance at Scale: Kafka operates as a highly available and fault-tolerant cluster that spans servers and even data centers, with a partitioning system that supports data volumes of practically any size. Source: https://docs.confluent.io/
  • 6. What is Event-Driven Streaming with Kafka. Diagram: producers (ETL, raw message queue, change data capture, mainframe, custom sources) publish to topics, each split into partitions, hosted on the brokers (servers) of the Kafka cluster; consumers (web, mobile, data warehouse, monitoring tool, partners) subscribe to those topics. Diagram labels: Publishing, Subscribing, Data Draining. An event is a type of data that describes an entity's observable state updates over time (definition by IBM). Examples: a first-time user registration, a payment, a social media post.
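The publish/subscribe flow on this slide can be illustrated with a small in-memory sketch. It is a toy model and not the Kafka client API: the `Topic` and `Consumer` classes are hypothetical, and `zlib.crc32` stands in for whatever hash a real partitioner uses. It shows two ideas only: records with the same key land in the same partition, and each consumer tracks its own read position, so several consumers can read the same data independently.

```python
import zlib
from collections import defaultdict


class Topic:
    """Toy in-memory topic: a fixed number of append-only partitions."""

    def __init__(self, name, num_partitions=3):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def publish(self, key, value):
        # The same key always maps to the same partition,
        # which preserves ordering per key.
        index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[index].append((key, value))
        return index


class Consumer:
    """Tracks its own read position (offset) for every partition."""

    def __init__(self, topic):
        self.topic = topic
        self.offsets = defaultdict(int)

    def poll(self):
        # Return every record appended since the previous poll.
        records = []
        for index, log in enumerate(self.topic.partitions):
            start = self.offsets[index]
            records.extend(log[start:])
            self.offsets[index] = len(log)
        return records
```

Because reading does not remove anything from the partitions, a web consumer and a data warehouse consumer polling the same `Topic` each receive every record.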
  • 7. Distributed Data with CQRS and Kafka
  • 8. Distributed Data with CQRS and Kafka - CQRS at a Glance. Overview: in Command Query Responsibility Segregation (CQRS), read and write workloads are separated, decoupled, and scaled independently. Event Sourcing: CQRS is often linked with event sourcing, which effectively views data state as a series of discrete events. Event sourcing is an approach to handling operations on data that's driven by a sequence of events, each of which is recorded in an append-only store (as defined by Microsoft). For example, placing an online order and later returning that order under the same user account are two events.
  • 9. Distributed Data with CQRS and Kafka - Traditional Design. Difficult to Scale: the SOR must be able to support the load of all clients and systems; read replicas can improve scalability. Single Point of Failure: if the SOR or the API layer is unavailable, all consumers may be affected. Rigid: all access to SOR data flows through centralized APIs, and consumers receive data in the schemas set up by the access layer. Difficult to Manipulate Data: direct data access to the SOR is restricted; transforms, joins, and analytical operations may be difficult and rely on lagging ETL operations. Legend: Client = external-facing UI, third-party APIs; System = internal-facing ETL, mainframe; SOR = System of Record (the authoritative data source).
  • 10. Distributed Data with CQRS and Kafka - CQRS Design. Data Changes as Events: the current state of the SOR is captured in an event format. Consumers Subscribe to Changes: consumers listen to data change events and consume the information according to their own use case. Other Systems Act on Data: systems act on data updates as defined by their use case; they may replicate the data, enrich the data, or simply process events in real time. Read / Write Separation: data reads are segregated from data writes, so read-only consumers introduce no additional load on the SOR.
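The read/write separation described on slides 8 to 10 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the speaker's implementation: `WriteModel`, `OrderStatusView`, and the event names `OrderPlaced` and `OrderReturned` are hypothetical, and a plain Python list stands in for the Kafka topic that would carry the events.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    order_id: str
    kind: str  # "OrderPlaced" or "OrderReturned" (hypothetical event names)


class WriteModel:
    """Write side: every state change is appended to an event log."""

    def __init__(self):
        self.log = []  # append-only; stands in for a Kafka topic

    def handle(self, order_id, kind):
        event = Event(order_id, kind)
        self.log.append(event)
        return event


class OrderStatusView:
    """Read side: a projection built by replaying events at its own pace."""

    def __init__(self):
        self.status = {}
        self.position = 0  # how far into the log this view has read

    def catch_up(self, log):
        for event in log[self.position:]:
            placed = event.kind == "OrderPlaced"
            self.status[event.order_id] = "placed" if placed else "returned"
        self.position = len(log)

    def get(self, order_id):
        # Served from the projection, so reads add no load on the write side.
        return self.status.get(order_id)
```

Until `catch_up` runs, `get` returns the older state. That lag is the eventual consistency listed among the challenges of the pattern.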
  • 11. Distributed Data with CQRS and Kafka - Advantages and Challenges. Advantages. Independent Scaling: read and write workloads may be scaled independently based on load and access patterns. Separation of Concerns: segregated models allow for tightly controlled write logic while permitting flexibility in read models and stream processing. System Isolation: access to the SOR database is restricted to a controlled write API, and consumers may safely read from a replica. Flexible Consumption: Kafka's scalable architecture allows consumers to process events differently across systems and at different velocities. Challenges. Eventual Consistency: reads will be eventually consistent and may be delayed until writes have propagated through the system. Complexity: implementing the pattern increases the complexity of the overall solution. Different Data Velocity: consumers may process events at different velocities, resulting in inconsistencies across systems.
  • 12. Distributed Data with CQRS and Kafka - Common Scenarios. Scenarios to consider. Complex Data Operations Across Systems: different systems need to process and transform data in complex and evolving use cases. Real-time Data Processing Across Systems: traditional ETL and batch operations are too slow and rigid to meet evolving business requirements, and the organization seeks to process data in real time as it becomes available across different systems. Resource Bottlenecks with Growing Demand: traditional data system resources are strained and unable to support the growing demands of the business. Data Security Concerns Across Systems: data must be shared securely across systems without introducing new security risks. Increased Demand for Data Sharing Across Enterprise: the enterprise seeks to break down data silos and share data effectively across the organization to increase synergy between systems.
  • 13. Intelligent Forecast with Kafka
  • 14. Intelligent Forecast with Native Kafka Solution. Diagram: producers (ETL, raw message queue, change data capture, mainframe, custom sources) publish to the Kafka cluster; on the consumer side, the Kafka connector API feeds the ELK Stack, where Elasticsearch handles indexing and storage. Diagram labels: Publishing, Subscribing, Data Draining. Kafka's role in this solution is to publish the data from the different channels as categorized topics. Through the Kafka connector APIs (connector replicators), the ELK Stack subscribes to the specified topics. This pub/sub model is also called event streaming. The customized data can then be rendered through a Kibana dashboard.
  • 15. Intelligent Forecast with Native Kafka Solution. Key Design: pub/sub, a decoupled and asynchronous messaging service for scalability. Equivalent Solutions: Azure Event Hubs, Google Pub/Sub, AWS Kinesis. Proactive Analytics: use cases such as detecting and forecasting abnormal trends that fall outside a threshold, for example transaction fraud at ATMs and in restaurant mobile orders. Challenges. Kafka Connector: similar open source solutions such as MirrorMaker, uReplicator by Uber, and Mirus by Salesforce can be alternatives that tackle the scalability bottleneck while reducing licensing cost. PII encryption: whether using a Kafka security library or an in-house solution, it is important to establish PII governance early among producers, the Kafka cluster, and consumers. In other words, decide who is responsible for masking the sensitive data throughout the real-time data streaming pipeline.
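One way to picture the PII point is a producer-side masking step applied before a record is published. The sketch below is an assumption-laden illustration, not a Kafka security library: the field names in `PII_FIELDS` are hypothetical, and keyed hashing with HMAC-SHA256 is pseudonymization, used here as a simple stand-in for whatever encryption scheme the governance policy actually chooses.

```python
import hashlib
import hmac

# Hypothetical field names; a real pipeline would take these from its
# PII governance policy.
PII_FIELDS = {"name", "card_number"}


def mask_pii(record, secret):
    """Return a copy of record with PII fields replaced by keyed hashes.

    The masking is one-way (pseudonymization), not reversible encryption.
    """
    masked = {}
    for field, value in record.items():
        if field in PII_FIELDS:
            digest = hmac.new(secret, str(value).encode("utf-8"), hashlib.sha256)
            masked[field] = digest.hexdigest()
        else:
            masked[field] = value
    return masked
```

Masking at the producer means neither the cluster nor any consumer ever sees the raw values. The same secret yields the same token for the same input, so downstream systems can still group records by the masked field.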
  • 16. Kafka in Banking
  • 17. What a Banking Institute Needs to Modernize Its Legacy System. A banking institute has been looking to migrate its legacy system to modern technologies that can meet fast-growing big data demand, by building a modern data streaming platform as part of its Business Operation Brain (BOB). Requirements: reuse the existing data centers, storage, infrastructure, and security procedures; be scalable and reliable (millions of transactions/events per second) on the existing infrastructure; log data for transaction tracing and auditing (for example, Change Data Capture).
  • 18. What options we have: Kafka and its comparables in the marketplace (not an exhaustive list of options)
  • 19. Identify KPIs When Evaluating the Options. KPIs compared: Operation Cost, Messaging, Storage, Scalability, Autoscaling, Security. Apache Kafka: open source, on prem or managed cloud computing; messaging is immutable, ordered, and replayable, with a user-defined retention policy; persistent storage (append log) offers durability and reliability; horizontal scale (scale out), adding more machines to increase disk I/O; customized, manual configuration. Rabbit MQ: open source, on prem; queue/message index attached with TTL; messages are removed once consumed, and in-memory storage is preferred; vertical scale (scale up), adding more CPU and RAM to the existing machine/hardware; customized, manual configuration. AWS Kinesis: proprietary, managed cloud computing; pay as you go, elastic and durable; retention of up to 365 days; native cloud solution.
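The messaging and storage rows of this comparison come down to two behaviors: a retained log that can be replayed until a retention policy expires it, and a queue whose messages are gone once consumed. The sketch below models both with plain Python structures. `RetainedLog` and `WorkQueue` are hypothetical names for illustration and do not reflect the Kafka or RabbitMQ APIs.

```python
from collections import deque


class RetainedLog:
    """Log-style storage: reads never delete; only retention does."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self.records = []  # list of (timestamp, value)

    def append(self, value, now):
        self.records.append((now, value))

    def read_from(self, offset):
        # Offsets here are list positions, so they shift after pruning;
        # a real log keeps offsets stable.
        return [value for _, value in self.records[offset:]]

    def enforce_retention(self, now):
        cutoff = now - self.retention_seconds
        self.records = [(ts, v) for ts, v in self.records if ts >= cutoff]


class WorkQueue:
    """Queue-style storage: a message is removed once it is consumed."""

    def __init__(self):
        self.messages = deque()

    def send(self, value):
        self.messages.append(value)

    def receive(self):
        return self.messages.popleft() if self.messages else None
```

For the auditing and transaction-tracing requirement on slide 17, the replayable log is the relevant property: the same records can be read again by a new consumer for as long as the retention policy keeps them.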
  • 20. Key Takeaways
  • 21. Key Takeaways. CQRS Pattern with Kafka: use the scale, speed, and reliability of Kafka as the backbone for an eventually consistent distributed data solution that allows flexible consumption models and independent scaling. Intelligent Forecast with Kafka: to reap the scalability benefit, design the Kafka connector solution for future business growth; PII must be encrypted throughout the Kafka pipeline, and the encryption should be automated. Kafka in Banking: objectively select the metrics for the business use case, and design a data streaming solution that is ready to scale.
  • 22. Q&A