You have learned about Kafka event sourcing with streams and using Kafka as a database, but you may be having a tough time wrapping your head around what that means and what challenges you will face. Kafka’s exactly-once semantics, data retention rules, and Streams DSL make it a great database for real-time transaction processing. This talk will focus on how to use Kafka events as a database. We will talk about using KTables vs GlobalKTables, and how to apply them to patterns we use with traditional databases. We will go over a real-world example of joining events against existing data and some issues to be aware of. We will finish by covering some important things to remember about state stores, partitions, and streams to help you avoid problems when your data sets become large.
Using Kafka as a Database For Real-Time Transaction Processing | Chad Preisler, Northwestern Mutual
1. The Northwestern Mutual Life Insurance Company – Milwaukee, WI
Using Kafka as a Database
Chad Preisler
2. What does that mean?
Event Sourcing
Transactional database
All data is stored in Kafka Topics.
No traditional Relational Database.
Using Streams DSL, KTable, GlobalKTable, and Stores to process and search for data.
3. Why Did We Do It?
• Decoupled services
• Easily manage record retention
• Real-time processing
• Immutable log
• Topics can be safely shared using ACLs
• Fault Tolerance: never miss a record
• Confluent Cloud: SLA, Support. Broker just works.
4. What makes it work?
• Exactly Once Semantics
• Data Retention
• Stream DSL
– KTable
– Stream to KTable joins
– Stream to Stream joins
5. What is a Stream?
Read-process-write operation on a Kafka topic
Java Stream DSL
• Read from multiple topics and write to one output topic
• Read from and output to one topic
• Read from multiple topics and write to multiple topics
6. Exactly Once
Guarantees that all of the following happen together or are all rolled back:
• Source topic commit.
• Sink topic commit.
• State Store commit.
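Exactly-once is enabled with a single Streams configuration property. A minimal sketch, assuming hypothetical application id and broker address:

```java
import java.util.Properties;

public class ExactlyOnceConfig {
    // Build a minimal Kafka Streams configuration with exactly-once enabled.
    static Properties build() {
        Properties props = new Properties();
        props.put("application.id", "txn-processor");     // hypothetical app id
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical broker
        // Ties the source commit, sink commit, and state store commit into a
        // single transaction ("exactly_once_v2" in newer Streams releases).
        props.put("processing.guarantee", "exactly_once");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build().getProperty("processing.guarantee"));
    }
}
```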
7. Topic Retention
• Every topic has a retention period
• Retention periods can be any length of time including indefinitely
• Can easily manage retention times to meet business requirements.
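Retention is a per-topic config; for example, an event-sourcing topic can be kept indefinitely by setting retention.ms to -1. A sketch of that config change using the Kafka Admin client (topic name and broker address are hypothetical):

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class RetentionExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical broker

        try (Admin admin = Admin.create(props)) {
            // retention.ms = -1 keeps records forever; any millisecond
            // value can be used to match a business requirement.
            ConfigResource topic =
                new ConfigResource(ConfigResource.Type.TOPIC, "transactions"); // hypothetical topic
            AlterConfigOp keepForever =
                new AlterConfigOp(new ConfigEntry("retention.ms", "-1"), AlterConfigOp.OpType.SET);
            admin.incrementalAlterConfigs(Map.of(topic, List.of(keepForever))).all().get();
        }
    }
}
```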
8. Stream Joins
• The Java DSL allows you to join streams together, similar to joins in a relational database.
• Join Stream to Stream
• Join Stream to KTable
• Join Stream to GlobalKTable
• Join KTable to KTable
• All support inner and left joins.
• Some support outer joins.
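A stream-to-KTable join in the Java DSL reads much like a SQL join on the record key. A minimal sketch, with hypothetical topic names and value formats:

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

public class JoinTopology {
    static void build(StreamsBuilder builder) {
        // Incoming transaction events, keyed by account id (hypothetical topic).
        KStream<String, String> transactions = builder.stream("transactions");
        // Latest account record per key, materialized as a table (hypothetical topic).
        KTable<String, String> accounts = builder.table("accounts");

        // Inner join: emits only when the account key exists in the table.
        // Use leftJoin(...) to also emit transactions with no matching account.
        KStream<String, String> enriched =
            transactions.join(accounts, (txn, account) -> txn + "|" + account);

        enriched.to("enriched-transactions");
    }
}
```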
9. KTable
• KTable is an abstraction over a Stream.
• Each data record represents an update.
• You can treat it like a read only table.
• Backed by a RocksDB instance on the application’s machine.
• Each instance of the stream app will get a portion of the topic data.
• Partitions are split across all instances of the stream application.
– Not all running instances get all the data.
– If three instances of a stream application are running and the topic has 12 partitions, each
instance gets 4 partitions’ worth of data.
10. GlobalKTable
• Like a KTable
• Main difference: Each instance of the application gets all the records.
– Data is not split across instances by partition.
• Are completely loaded before the stream starts processing.
11. KTable Pros/Cons
Pros:
• Loads fast if you run more than one pod, since each instance loads only its share of partitions.
• Fast lookup
• Starts processing based on record timestamps.
– It will always process in the same order.
Cons:
• The KTable topic and the stream topic must share the same key and be co-partitioned.
– The Streams 2.4 API allows KTable to KTable joins on foreign keys.
• If your keys are not evenly distributed over partitions loading becomes an issue.
• Will start processing before all records are loaded.
12. GlobalKTable Pros/Cons
Pros:
• All records load before the stream starts.
• Very fast once loaded.
• Allows joins on non-key values.
Cons:
• Can take a very long time to load.
– Can reuse RocksDB if machine has attached storage
– Just builds the delta if DB already exists
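Because every instance holds the full table, a GlobalKTable join can derive the join key from the stream record instead of requiring co-partitioned keys. A sketch, where the topic names and the product-id extraction are hypothetical:

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.GlobalKTable;
import org.apache.kafka.streams.kstream.KStream;

public class GlobalJoinTopology {
    static void build(StreamsBuilder builder) {
        KStream<String, String> orders = builder.stream("orders");              // hypothetical topic
        GlobalKTable<String, String> products = builder.globalTable("products"); // hypothetical topic

        // The KeyValueMapper picks the lookup key out of the order VALUE,
        // so the two topics do not have to be co-partitioned.
        orders.join(
                products,
                (orderKey, orderValue) -> orderValue.split(",")[0], // hypothetical: product id is the first field
                (order, product) -> order + "|" + product)
              .to("orders-with-products");
    }
}
```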
16. Kafka Read/Update: Transformer
Use the Apache Kafka Streams Processor API.
• A little bit of work to implement the classes.
• Still get the benefits of a KTable:
– Auto updates via the Kafka stream.
– State restored on start-up.
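A read-then-update flow with the Processor API can be sketched with transformValues and an attached state store. All topic names, the store name, and the merge logic below are hypothetical:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.ValueTransformerWithKey;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.Stores;

public class ReadUpdateTopology {
    static void build(StreamsBuilder builder) {
        // Local RocksDB-backed store; restored from its changelog on start-up.
        builder.addStateStore(Stores.keyValueStoreBuilder(
            Stores.persistentKeyValueStore("account-state"), // hypothetical store name
            Serdes.String(), Serdes.String()));

        KStream<String, String> events = builder.stream("account-events"); // hypothetical topic

        events.transformValues(() -> new ValueTransformerWithKey<String, String, String>() {
            private KeyValueStore<String, String> store;

            @Override
            @SuppressWarnings("unchecked")
            public void init(ProcessorContext context) {
                store = (KeyValueStore<String, String>) context.getStateStore("account-state");
            }

            @Override
            public String transform(String key, String value) {
                // Read the previous state, apply the update, write it back.
                String previous = store.get(key);
                String updated = previous == null ? value : previous + ";" + value;
                store.put(key, updated);
                return updated;
            }

            @Override
            public void close() {}
        }, "account-state").to("account-current"); // hypothetical output topic
    }
}
```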
17. Things to remember
• Small values work better than larger values.
• KTables and GlobalKTables load quickly when the keys are evenly distributed across partitions.
• The first time a stream application reads from a topic, it starts from the
beginning.
– Regular Kafka consumers start from the end by default.
– If an existing stream application changes its input topic(s) and there are no committed offsets
for that topic, it will start from the beginning.
– Can change default behavior for new input topics with auto.offset.reset
– Only applies when stream application has not committed offsets
– After offsets are committed will continue where it left off
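Kafka Streams defaults auto.offset.reset to "earliest", which is why a fresh stream application reads its input topics from the beginning; the default can be overridden. A sketch, with hypothetical application id and broker:

```java
import java.util.Properties;

public class OffsetResetConfig {
    // Kafka Streams defaults auto.offset.reset to "earliest" (regular
    // consumers default to "latest"). Overriding it makes NEW input topics
    // with no committed offsets start from the end; once offsets have been
    // committed, this setting is ignored.
    static Properties build() {
        Properties props = new Properties();
        props.put("application.id", "txn-processor");     // hypothetical app id
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical broker
        props.put("auto.offset.reset", "latest");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build().getProperty("auto.offset.reset"));
    }
}
```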
18. Things to remember
• Don’t write to topics “out of band” in your stream application.
– Use the stream DSL to convert records and write them to topics.
– Don’t create producers while processing in your stream.
• Do be careful with exactly once and external systems.
– After a crash, exactly once redelivers the last record that was processed but not committed.
– If your application crashes after the external system call, that call will be repeated each time
the application restarts.
• Do make sure to set an uncaught exception handler and runtime shutdown hook to
log exceptions and handle shutting down the JVM.
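The handler-and-hook setup can be sketched as follows (using the Streams 2.8+ handler; older versions take a plain Thread.UncaughtExceptionHandler instead). The topology and configuration are assumed to be filled in:

```java
import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.errors.StreamsUncaughtExceptionHandler;

public class StreamsLifecycle {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        // ... build the topology here ...
        Properties props = new Properties(); // application.id, bootstrap.servers, etc.

        KafkaStreams streams = new KafkaStreams(builder.build(), props);

        // Log every uncaught exception and shut the client down cleanly
        // instead of leaving a dead stream thread behind.
        streams.setUncaughtExceptionHandler(exception -> {
            System.err.println("Stream thread died: " + exception);
            return StreamsUncaughtExceptionHandler.StreamThreadExceptionResponse.SHUTDOWN_CLIENT;
        });

        // Close the streams instance (flushing state) when the JVM exits.
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));

        streams.start();
    }
}
```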