Using Kafka as a Database For Real-Time Transaction Processing | Chad Preisler, Northwestern Mutual

You have learned about Kafka event sourcing with streams and using Kafka as a database, but you may be having a tough time wrapping your head around what that means and what challenges you will face. Kafka’s exactly once semantics, data retention rules, and stream DSL make it a great database for real-time transaction processing. This talk will focus on how to use Kafka events as a database. We will talk about using KTables vs GlobalKTables, and how to apply them to patterns we use with traditional databases. We will go over a real-world example of joining events against existing data and some issues to be aware of. We will finish covering some important things to remember about state stores, partitions, and streams to help you avoid problems when your data sets become large.

1. Using Kafka as a Database
   Chad Preisler
   The Northwestern Mutual Life Insurance Company – Milwaukee, WI
2. What does that mean?
   • Event sourcing
   • Transactional database
   • All data is stored in Kafka topics. No traditional relational database.
   • Using the Streams DSL, KTable, GlobalKTable, and Stores to process and search for data.
3. Why Did We Do It?
   • Decoupled services
   • Easily manage record retention
   • Real-time processing
   • Immutable log
   • Topics can be safely shared using ACLs
   • Fault tolerance: never miss a record
   • Confluent Cloud: SLA, support. The broker just works.
4. What makes it work?
   • Exactly-once semantics
   • Data retention
   • Streams DSL
     – KTable
     – Stream-to-KTable joins
     – Stream-to-stream joins
5. What is a Stream?
   A read-process-write operation on Kafka topics, built with the Java Streams DSL:
   • Read from multiple topics and write to one output topic
   • Read from and output to one topic
   • Read from multiple topics and write to multiple topics
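
To make the read-process-write idea concrete, here is a minimal sketch of a single-topic-in, single-topic-out topology in the Java Streams DSL; the topic names, serdes, and application id are illustrative assumptions, not from the talk.

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class ReadProcessWrite {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "orders-processor");      // hypothetical app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> orders = builder.stream("orders");               // read
        orders.mapValues(value -> value.toUpperCase())                            // process
              .to("orders-processed");                                            // write

        new KafkaStreams(builder.build(), props).start();
    }
}
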
6. Exactly Once
   Guarantees all the following things happen or are all rolled back:
   • Source topic commit
   • Sink topic commit
   • State store commit
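
In Kafka Streams, exactly-once processing is switched on with a single configuration property. A sketch, added to the stream application's Properties from the sketch under slide 5:

// Kafka Streams 3.0+ uses "exactly_once_v2"; earlier versions use "exactly_once".
props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, "exactly_once_v2");
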
7. Topic Retention
   • Every topic has a retention period.
   • Retention periods can be any length of time, including indefinite retention.
   • Retention times can easily be managed to meet business requirements.
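
Retention is a per-topic broker setting. As an illustration (the topic name and value are assumptions), it can be changed from Java with the AdminClient:

import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class SetRetention {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "orders");
            // retention.ms = -1 keeps records indefinitely; any positive value is a window in milliseconds.
            AlterConfigOp keepForever =
                new AlterConfigOp(new ConfigEntry("retention.ms", "-1"), AlterConfigOp.OpType.SET);
            admin.incrementalAlterConfigs(Map.of(topic, List.of(keepForever))).all().get();
        }
    }
}
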
8. Stream Joins
   • The Java DSL allows you to join streams together, similar to relational databases.
   • Join Stream to Stream
   • Join Stream to KTable
   • Join Stream to GlobalKTable
   • Join KTable to KTable
   • All support inner and left joins.
   • Some support outer joins.
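
A sketch of the most common case, a stream-to-KTable inner join, reusing the builder and serdes from the sketch under slide 5; the topics and join logic are assumptions:

import org.apache.kafka.streams.kstream.KTable;

// Each order is enriched with the current state of its customer record.
// The order key and customer key must match (the topics must be co-partitioned).
KStream<String, String> orders = builder.stream("orders");
KTable<String, String> customers = builder.table("customers");
KStream<String, String> enriched =
    orders.join(customers, (order, customer) -> order + " | customer=" + customer);
enriched.to("orders-enriched");
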
9. KTable
   • A KTable is an abstraction over a stream.
   • Each data record represents an update.
   • You can treat it like a read-only table.
   • Backed by RocksDB on the application’s machine.
   • Each instance of the stream app gets a portion of the topic data.
   • Partitions are split across all instances of the stream application.
     – Not all running instances get all the data.
     – If three instances of a stream are running and the topic has 12 partitions, each stream instance will get 4 partitions’ worth of data.
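
The customers table from the previous sketch, this time materialized into an explicitly named RocksDB-backed store; the store name is an assumption:

import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

// Each application instance materializes only the partitions it owns into its local store.
KTable<String, String> customers = builder.table(
    "customers",
    Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("customers-store"));
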
10. GlobalKTable
   • Like a KTable.
   • Main difference: each instance of the application gets all the records.
     – Data is not split across instances by partition.
   • Completely loaded before the stream starts processing.
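
A sketch of the GlobalKTable equivalent, reusing the orders stream from the sketch under slide 8; the topic name and the key-extraction logic are assumptions:

import org.apache.kafka.streams.kstream.GlobalKTable;

// Unlike builder.table(), every application instance gets the full contents of the topic.
GlobalKTable<String, String> products = builder.globalTable("products");

// GlobalKTable joins take a key selector, so the lookup key need not be the stream's record key.
KStream<String, String> withProduct = orders.join(
    products,
    (orderKey, orderValue) -> orderValue.split(",")[0],            // assumed: product id is the first field
    (orderValue, productValue) -> orderValue + " | product=" + productValue);
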
11. KTable Pros/Cons
   Pros:
   • Loads fast if you run more than one pod.
   • Fast lookups.
   • Starts processing based on timestamps.
     – It will always process in the same order.
   Cons:
   • The topic key for the KTable topic and the stream topic need to be the same: the topics must be co-partitioned.
     – The Streams 2.4 API allows KTable-to-KTable joins on foreign keys.
   • If your keys are not evenly distributed over partitions, loading becomes an issue.
   • Will start processing before all records are loaded.
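
A sketch of the foreign-key table-to-table join available since Streams 2.4, which relaxes the co-partitioning requirement; the topic names and the key extractor are assumptions:

KTable<String, String> ordersTable = builder.table("orders-table");
KTable<String, String> customersTable = builder.table("customers");
// The second argument extracts the foreign key (the customer id) from each order value.
KTable<String, String> ordersWithCustomer = ordersTable.join(
    customersTable,
    orderValue -> orderValue.split(",")[1],                        // assumed: customer id is the second field
    (orderValue, customerValue) -> orderValue + " | customer=" + customerValue);
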
12. GlobalKTable Pros/Cons
   Pros:
   • All records load before the stream starts.
   • Very fast once loaded.
   • Allows joins on non-key values.
   Cons:
   • Can take a very long time to load.
     – Can reuse RocksDB if the machine has attached storage.
     – Only builds the delta if the DB already exists.
13. KTable Joins
   On start, the stream will load the KTable based on timestamps.
14. Traditional Read/Update Pattern
15. Kafka Read/Update: Join
   It turns out Kafka doesn’t handle cyclic relationships very well.
16. Kafka Read/Update: Transformer
   Use the Apache Kafka Streams Processor API.
   • A little bit of work to implement the classes.
   • Still get the benefits of a KTable: automatic updates via the Kafka stream, and state restored on start-up.
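
A minimal sketch of that read/update pattern using transformValues with a state store, reusing the builder from the sketch under slide 5; the store name, topics, serdes, and the running-balance logic are assumptions, not code from the talk.

import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.ValueTransformerWithKey;
import org.apache.kafka.streams.kstream.ValueTransformerWithKeySupplier;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.Stores;

// A persistent (RocksDB-backed) store that the transformer reads and updates within the same transaction.
builder.addStateStore(Stores.keyValueStoreBuilder(
    Stores.persistentKeyValueStore("balances-store"),
    Serdes.String(), Serdes.Long()));

ValueTransformerWithKeySupplier<String, Long, Long> balanceUpdater = () ->
    new ValueTransformerWithKey<String, Long, Long>() {
        private KeyValueStore<String, Long> store;

        @Override
        public void init(ProcessorContext context) {
            // The store is restored from its changelog topic on start-up.
            store = (KeyValueStore<String, Long>) context.getStateStore("balances-store");
        }

        @Override
        public Long transform(String accountId, Long amount) {
            Long current = store.get(accountId);                  // read existing state
            long updated = (current == null ? 0L : current) + amount;
            store.put(accountId, updated);                        // update state
            return updated;                                       // emit the new balance downstream
        }

        @Override
        public void close() { }
    };

KStream<String, Long> payments =
    builder.stream("payments", Consumed.with(Serdes.String(), Serdes.Long()));
payments.transformValues(balanceUpdater, "balances-store")
        .to("balances", Produced.with(Serdes.String(), Serdes.Long()));
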
17. Things to remember
   • Small values work better than large values.
   • KTables and GlobalKTables load quickly when the keys are evenly distributed.
   • The first time a stream application starts reading from a topic, it starts from the beginning.
     – Regular Kafka consumers start from the end by default.
     – If an existing stream changes its input topic(s) and there are no committed offsets for that topic, it will start from the beginning.
     – You can change the default behavior for new input topics with auto.offset.reset (see the sketch below).
     – This only applies when the stream application has not committed offsets.
     – After offsets are committed, it will continue where it left off.
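
A sketch of the two places auto.offset.reset can be set; the topic name is an assumption, and either setting only takes effect when no offsets have been committed yet:

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.streams.Topology;

// Globally, via the stream application's Properties (passed through to the underlying consumers):
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");

// Or per input topic in the DSL:
KStream<String, String> events = builder.stream("events",
    Consumed.with(Serdes.String(), Serdes.String())
            .withOffsetResetPolicy(Topology.AutoOffsetReset.LATEST));
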
18. Things to remember
   • Don’t write to topics “out of band” in your stream application.
     – Use the Streams DSL to convert records and write them to topics.
     – Don’t create producers while processing in your stream.
   • Do be careful with exactly once and external systems.
     – Exactly once will re-send the last processed record after a crash.
     – If your application crashes after an external system call, it will make that call again each time the application respawns.
   • Do make sure to set an uncaught exception handler and a runtime shutdown hook to log exceptions and handle shutting down the JVM (see the sketch below).
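
A sketch of the exception handler and shutdown hook; the logging and the close timeout are assumptions:

import java.time.Duration;

KafkaStreams streams = new KafkaStreams(builder.build(), props);

// Log any exception that kills a stream thread instead of letting it die silently.
streams.setUncaughtExceptionHandler((thread, throwable) ->
    System.err.println("Stream thread " + thread.getName() + " died: " + throwable));

// Close the streams client cleanly when the JVM shuts down.
Runtime.getRuntime().addShutdownHook(
    new Thread(() -> streams.close(Duration.ofSeconds(30))));

streams.start();
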
19. Links
   https://www.confluent.io/blog/enabling-exactly-once-kafka-streams/
   https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/
   https://kafka.apache.org/20/documentation/streams/developer-guide/dsl-api.html#joining
   https://docs.confluent.io/platform/current/streams/concepts.html#ktable
   https://docs.confluent.io/platform/current/streams/concepts.html#globalktable
   https://medium.com/xebia-france/kafka-streams-co-partitioning-requirements-illustrated-2033f686b19c
   https://confluence.nml.com/display/SRE/Kafka+Streams%3A+auto.offset.reset
20. Links
   https://kafka.apache.org/10/documentation/streams/developer-guide/config-streams.html#default-deserialization-exception-handler
   https://kafka.apache.org/10/documentation/streams/developer-guide/write-streams
   https://docs.confluent.io/platform/current/streams/developer-guide/write-streams.html#using-kstreams-within-your-application-code
   https://medium.com/@daniyaryeralin/utilizing-kafka-streams-processor-api-and-implementing-custom-aggregator-6cb23d00eaa7
   https://kafka.apache.org/27/documentation/streams/developer-guide/processor-api
21. Thank You
