This document discusses strategies for migrating from SQL to NoSQL databases using Apache Kafka. It outlines the challenges of modernizing legacy databases, how Confluent can help with the migration process, and proposes a three-phase plan. The plan involves initially migrating data sources using connectors, then optimizing the data with stream processing in ksqlDB, and finally modernizing by sending the data to cloud databases. The document provides an overview of Confluent's technologies and services that can help accelerate and simplify the database migration.
2. Geetha Anne
■ Silicon Valley
■ 2 daughters
■ Cloudera, ServiceNow, Hawaiian Airlines prior to joining Confluent
■ 10 years in the space
■ Software Development, Automation
Engineering/Presales are key areas of expertise
■ Cooking, Singing, Hiking
3. Agenda
■ The Problem - Migrating to a modern NoSQL database is a complex process
■ Why Confluent - Database and data modernization with Confluent
■ The Solution - Proposed architecture and action plan
■ Takeaways - Food for thought and next steps
6. Modern, cloud-native databases power business-critical applications with lower operational overhead
Self-Managed Databases
● Rigid architecture that makes it hard to integrate with other systems
● Expensive in both upfront and ongoing maintenance costs
● Slower to scale to meet evolving demands
Cloud Databases
● Lower TCO by decoupling storage from compute and leveraging consumption-based pricing
● Increased overall flexibility and business agility
● Worry-free operations with built-in auto-scaling and maintenance cycles
7. Integrating multiple legacy systems with the cloud can be a complex, multi-year process
Time and resource intensive
Replacing or refactoring legacy data systems across environments is not easy, and data visibility can be limited while the migration is underway.
Insight blind spots
Getting actionable data from disparate data sources is cumbersome. Most data insight comes from nightly loads, merges, and batch updates to create a complete view.
Data silos across environments
Integrating multiple data silos and data formats across environments is difficult.
[Diagram: an on-prem legacy database, CRM, and SaaS app feed cloud databases and applications through an ETL app, batch jobs, nightly reporting, and database syncs]
8. Easily modernize your database by integrating legacy with the cloud using Confluent
1. Simplify and accelerate migration
Link on-prem and cloud for easy data movement across environments, and process data in flight with ksqlDB stream processing
2. Stay synchronized in real-time
Move from batch to real-time streaming and access change data capture technology using Confluent and our CDC connectors
3. Reduce total cost of ownership
Leverage fully managed services and avoid prohibitive licensing costs from existing solutions offered by legacy vendors
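As a sketch of point 2, a CDC source connector can be declared directly from ksqlDB. The snippet below assumes Confluent Cloud's fully managed PostgreSQL CDC source connector; the hostname, credentials, and table names are placeholders, not values from this deck:

```sql
-- Hedged sketch: connector class and option names follow Confluent's
-- fully managed PostgresCdcSource connector; host, credentials, and
-- table names are hypothetical placeholders.
CREATE SOURCE CONNECTOR legacy_orders_cdc WITH (
  'connector.class'    = 'PostgresCdcSource',
  'database.hostname'  = 'onprem-db.example.internal',
  'database.port'      = '5432',
  'database.user'      = 'replicator',
  'database.password'  = '********',
  'database.dbname'    = 'orders',
  'table.include.list' = 'public.orders',
  'output.data.format' = 'JSON'
);
```

Once created, the connector continuously emits row-level changes from the legacy table into a Kafka topic, replacing nightly batch extracts.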
10. A new paradigm is required for Data in Motion
Continuously process streams of real-time and historical data - a sale, a shipment, a trade, a customer interaction - as they occur.
“We need to shift our thinking from everything at rest, to everything in motion.”
Real-Time Stream Processing powers rich, front-end customer experiences and real-time, software-driven business operations.
11. Operationalizing Kafka on your own is difficult
Kafka is hard even in experimentation. It gets harder (and riskier) as you add mission-critical data and use cases.
● Architecture planning
● Cluster sizing
● Cluster provisioning
● Broker settings
● ZooKeeper management
● Partition placement & data durability
● Source/sink connector development & maintenance
● Monitoring & reporting tools setup
● Software patches and upgrades
● Security controls and integrations
● Failover design & planning
● Mirroring & geo-replication
● Streaming data governance
● Load rebalancing & monitoring
● Expansion planning & execution
● Utilization optimization & visibility
● Cluster migrations
● Infrastructure & performance upgrades / enhancements
[Chart: value grows across five adoption stages - 1. Experimentation / Early Interest, 2. Identify a Project, 3. Mission-critical, disparate LOBs, 4. Mission-critical, connected LOBs, 5. Central Nervous System]
Key challenges:
● Operational burden & resources - manage and scale the platform to support ever-growing demand
● Security & governance - ensure streaming data is as safe & secure as data-at-rest as Kafka usage scales
● Real-time connectivity & processing - leverage valuable legacy data to power modern, cloud-based apps & experiences
● Global availability - maintain high availability across environments with minimal downtime
12. Cloud-native
Infinite - Store unlimited data on Confluent to enhance your real-time apps and use cases with a broader set of data.
Global - Create a consistent data fabric throughout your organization by linking clusters across your different environments.
Elastic - Scale up instantly to meet any demand and scale back down to avoid over-provisioning infrastructure.
13. Everywhere
Confluent provides deployment flexibility to span all of your environments.
SELF-MANAGED SOFTWARE: Confluent Platform, the enterprise distribution of Apache Kafka. Deploy on-premises or in your private cloud.
FULLY MANAGED SERVICE: Confluent Cloud, a cloud-native service for Apache Kafka. Available on the leading public clouds.
16. Three Phase Plan
Modernize your Databases with Confluent
1. Migrate
● Choose the workloads that you’d like to migrate to the cloud
● Seamlessly integrate your data sources via managed Confluent source connectors
2. Optimize
● Perform real-time data transformations using ksqlDB
● Find the most useful queries for your cloud data
● Work with our ecosystem of partners to find the best use of your data
3. Modernize
● Use our managed sink connectors to send data into your cloud database of choice
● Continue migrating workloads into the cloud as opportunities arise
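The "real-time data transformations" in the Optimize phase can be as simple as a persistent ksqlDB query that cleanses and reshapes the raw connector feed before it reaches the cloud database. Stream and column names below are hypothetical, not from this deck:

```sql
-- Hypothetical sketch: filter out bad rows and normalize fields from
-- a raw CDC stream before sinking it to the cloud database.
CREATE STREAM orders_clean AS
  SELECT orderId,
         UCASE(region) AS region,
         CAST(amount AS DECIMAL(10, 2)) AS amount
  FROM orders_raw
  WHERE amount > 0
  EMIT CHANGES;
```

The resulting stream is backed by its own Kafka topic, which a managed sink connector can then deliver downstream.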
21. 3 Modalities of Stream Processing with Confluent
Kafka clients / Kafka Streams / ksqlDB
// Plain Kafka clients: maximum flexibility, maximum boilerplate.
ConsumerRecords<String, Integer> records = consumer.poll(Duration.ofMillis(100));
Map<String, Integer> counts = new HashMap<>();
for (ConsumerRecord<String, Integer> record : records) {
    String key = record.key();
    int c = counts.getOrDefault(key, 0);
    c += record.value();
    counts.put(key, c);
}
for (Map.Entry<String, Integer> entry : counts.entrySet()) {
    int attempts = 0;
    while (attempts++ < MAX_RETRIES) {
        try {
            int stateCount = stateStore.getValue(entry.getKey());
            stateStore.setValue(entry.getKey(), entry.getValue() + stateCount);
            break;
        } catch (StateStoreException e) {
            RetryUtils.backoff(attempts);
        }
    }
}
builder
    .stream("input-stream", Consumed.with(Serdes.String(), Serdes.String()))
    .groupBy((key, value) -> value)
    .count()
    .toStream()
    .to("counts", Produced.with(Serdes.String(), Serdes.Long()));
SELECT x, count(*) FROM stream GROUP BY x EMIT CHANGES;
Flexibility ←→ Simplicity
22. ksqlDB at a Glance
What is it?
ksqlDB is an event streaming database for working with streams and tables of data. It provides all the key features of a modern streaming solution:
● Aggregations
● Joins
● Windowing
● Event-time processing
● Dual query support (push & pull)
● Exactly-once semantics
● Out-of-order handling
● User-defined functions
CREATE TABLE activePromotions AS
  SELECT rideId,
         qualifyPromotion(distanceToDst) AS promotion
  FROM locations
  GROUP BY rideId
  EMIT CHANGES;
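Features such as windowing and event-time processing combine naturally with aggregations. For example, a tumbling-window count might look like the following (the `region` column and `ridesPerRegion` table name are hypothetical illustrations, not from this deck):

```sql
-- Hypothetical sketch: count rides per region in 1-hour tumbling
-- windows; assumes the locations stream carries a region column.
CREATE TABLE ridesPerRegion AS
  SELECT region, COUNT(*) AS rideCount
  FROM locations
  WINDOW TUMBLING (SIZE 1 HOUR)
  GROUP BY region
  EMIT CHANGES;
```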
How does it work?
It separates compute from storage and scales elastically in a fault-tolerant manner. It remains highly available during disruption, even in the face of failure of a quorum of its servers.
23. Built on the Best Technology, Available as a Fully-Managed Service
Kafka is the backbone of ksqlDB
ksqlDB is built on top of Kafka’s battle-tested streaming foundation. Its design re-uses Kafka to achieve elasticity, fault tolerance, and scalability for stream processing & analytics.
Use a fully-managed service
With Confluent Cloud ksqlDB, you need not worry about any of the details of running it. You can forget about:
● Clusters
● Brokers
● Scaling
● Upgrading
● Monitoring
Pay only for what you use.
[Diagram: a ksqlDB server runs the Kafka Streams engine with transient local state and serves push & pull queries (compute), backed by Kafka topics and changelog topics (storage)]
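The push and pull queries shown in the diagram differ in lifetime: a push query subscribes to a stream or table and emits every change as it happens, while a pull query fetches the current value from a materialized table and returns immediately. Using the `activePromotions` table from the earlier slide (the `ride-42` key is a hypothetical example value):

```sql
-- Push query: subscribe and receive every change as it occurs.
SELECT rideId, promotion FROM activePromotions EMIT CHANGES;

-- Pull query: point-in-time lookup against the materialized table.
SELECT promotion FROM activePromotions WHERE rideId = 'ride-42';
```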
24. Accelerate your migration from legacy on-prem systems to modern, cloud-based technologies
[Diagram: legacy data systems (Oracle Database, mainframes, applications), currently tied together by expensive, custom-built integrations, flow through Confluent source connectors, ksqlDB, and sink connectors into modern, cloud-based data systems (cloud-native / SaaS apps, Azure Synapse Analytics)]
26. Confluent Cloud
Fully Managed Connectors
● A subset of the larger connector catalogue
● Elastic scaling with no infrastructure to manage
● Connector networking configuration depends on your cluster’s networking
● Limited configuration options
● Stable source IPs are available for certain connectors
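A fully managed sink connector can likewise be declared from ksqlDB. This sketch assumes Confluent Cloud's managed MongoDB Atlas sink connector; the connection details, topic, and database names are placeholders:

```sql
-- Hedged sketch: class and option names follow Confluent's fully
-- managed MongoDbAtlasSink connector; host, credentials, and names
-- are hypothetical placeholders.
CREATE SINK CONNECTOR orders_to_mongo WITH (
  'connector.class'     = 'MongoDbAtlasSink',
  'connection.host'     = 'cluster0.example.mongodb.net',
  'connection.user'     = 'app_user',
  'connection.password' = '********',
  'database'            = 'orders',
  'topics'              = 'orders',
  'input.data.format'   = 'JSON'
);
```

Because both source and sink run as managed services, the migration pipeline itself carries no infrastructure to provision or patch.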
28. Three Phase Plan
Modernize your Database with Confluent
1. Migrate
● Choose the workloads that you’d like to migrate to the cloud
● Seamlessly integrate your data sources via managed Confluent source connectors
2. Optimize
● Perform real-time data transformations using ksqlDB
● Find the most useful queries for your cloud data
● Work with our ecosystem of partners to find the best use of your data
3. Modernize
● Use our managed sink connectors to send data into your cloud database of choice
● Continue migrating workloads into the cloud as opportunities arise
29. Cloud-native, Complete, Everywhere - with Kafka at its core
● Infinite Storage
● Security & Data Governance
● ksqlDB & Stream Processing, Analytics
● Connectors
● APIs, UIs, CLIs
● Fully Managed ‘NoOps’ on AWS, Azure, GCP