Performance Tuning RocksDB
for Kafka Streams’ State Store
Dhruba Borthakur (Rockset), Bruno Cadonna (Confluent)
About the Presenters
Dhruba Borthakur
CTO & Co-founder Rockset
rockset.com
Bruno Cadonna
Contributor to Apache Kafka &
Software Engineer at Confluent
confluent.io
Agenda
• Kafka Streams and State Stores
• Introduction to RocksDB
• Compaction Styles in RocksDB
• Possible Operational Issues
• Tuning RocksDB
• RocksDB Command Line Utilities
• Takeaways
Kafka Streams and State Stores
Kafka Streams
● Stateless and stateful processors
● Stateful processors use state stores
● Create one topology per input partition, i.e., task
State Stores in Kafka Streams
[Figure: state store layers. Labels: Metrics & De-/Serialization, Caching, Changelogging, Restoration]
• A stateful processor may use one or more state stores
• Each partition has its own state store
• State stores are layered:
  • collects metrics and de-/serializes records
  • caches records
  • writes records to changelog
  • writes records to local state store
• State stores are restored from changelog topics
• Restoration is byte-based and bypasses the wrapping layers
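The layering described on this slide can be sketched with plain decorators. This is a simplified illustration, not Kafka Streams code: all class names are hypothetical, records are strings rather than bytes, and the "changelog" is an in-memory list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Minimal key-value contract; hypothetical, not a Kafka Streams interface. */
interface KvStore {
    void put(String key, String value);
    String get(String key);
}

/** Innermost layer: stands in for the local state store. */
final class LocalStore implements KvStore {
    final Map<String, String> data = new HashMap<>();
    public void put(String key, String value) { data.put(key, value); }
    public String get(String key) { return data.get(key); }
}

/** Appends every write to an in-memory "changelog" before delegating. */
final class ChangelogLayer implements KvStore {
    final List<String[]> changelog = new ArrayList<>();
    private final KvStore inner;
    ChangelogLayer(KvStore inner) { this.inner = inner; }
    public void put(String key, String value) {
        changelog.add(new String[] {key, value});
        inner.put(key, value);
    }
    public String get(String key) { return inner.get(key); }
}

/** Buffers writes and forwards them to the inner layer only on flush. */
final class CachingLayer implements KvStore {
    private final Map<String, String> cache = new HashMap<>();
    private final KvStore inner;
    CachingLayer(KvStore inner) { this.inner = inner; }
    public void put(String key, String value) { cache.put(key, value); }
    public String get(String key) {
        String cached = cache.get(key);
        return cached != null ? cached : inner.get(key);
    }
    void flush() {
        cache.forEach(inner::put);
        cache.clear();
    }
}

/** Outermost layer: counts writes (serialization is omitted in this sketch). */
final class MeteredLayer implements KvStore {
    long puts = 0;
    private final KvStore inner;
    MeteredLayer(KvStore inner) { this.inner = inner; }
    public void put(String key, String value) { puts++; inner.put(key, value); }
    public String get(String key) { return inner.get(key); }
}

final class LayeredStoreDemo {
    /** Restoration replays changelog records straight into a local store, skipping the wrappers. */
    static LocalStore restore(List<String[]> changelog) {
        LocalStore restored = new LocalStore();
        for (String[] record : changelog) {
            restored.put(record[0], record[1]);
        }
        return restored;
    }
}
```

In this sketch two writes to the same key reach the changelog only once, because the caching layer holds them until `flush()`.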
RocksDB is the Default State Store
• Kafka Streams needed a write-optimized state store
• Kafka Streams 2.6 uses RocksDB 5.18.4
• Kafka Streams provides metrics to monitor RocksDB state stores
• RocksDB can be configured by passing a class that implements the interface RocksDBConfigSetter to the configuration rocksdb.config.setter
Example: Configuring RocksDB in Kafka Streams
import java.util.Map;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.CompactionStyle;
import org.rocksdb.Options;

public class MyRocksDBConfig implements RocksDBConfigSetter {
    @Override
    public void setConfig(final String storeName,
                          final Options options,
                          final Map<String, Object> configs) {
        // e.g. set compaction style
        options.setCompactionStyle(CompactionStyle.LEVEL);
    }

    @Override
    public void close(final String storeName, final Options options) {}
}
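A config setter only takes effect once it is registered. The sketch below builds the properties using the key rocksdb.config.setter named earlier in the deck. The application id, bootstrap server, and the package name com.example are assumptions for illustration; the resulting Properties would then be handed to the KafkaStreams constructor.

```java
import java.util.Properties;

final class StreamsRocksDbProps {
    /** Builds Kafka Streams properties that point at a custom RocksDBConfigSetter. */
    static Properties withConfigSetter(String setterClassName) {
        Properties props = new Properties();
        // Hypothetical application id and bootstrap server, for illustration only.
        props.put("application.id", "my-app");
        props.put("bootstrap.servers", "localhost:9092");
        // Config key from the slide; the value is the fully qualified class name.
        props.put("rocksdb.config.setter", setterClassName);
        return props;
    }

    public static void main(String[] args) {
        Properties props = withConfigSetter("com.example.MyRocksDBConfig");
        System.out.println(props.getProperty("rocksdb.config.setter"));
    }
}
```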
Introduction to RocksDB
What is RocksDB?
• Key-value persistent store
• Embedded C++ & Java library
• Server workloads
• Focus on performance
What is it not?
• Not distributed
• No failover
• Not highly available: if the machine dies, you lose your data
Kafka Streams makes it fault-tolerant
RocksDB API
• Keys and values are byte arrays
• Data are stored sorted by key
• Update Operations: Put/Delete/Merge
• Queries: Get/Iterator
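To make the shape of this API concrete, here is a minimal in-memory stand-in written in plain Java. It is not the RocksDB Java binding: the class and method names are hypothetical, Merge is left out, and the unsigned bytewise key order is an assumption about the comparator in use.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

final class SortedByteStore {
    // Keys compare as unsigned bytes, lexicographically.
    private static final Comparator<byte[]> BYTEWISE = Arrays::compareUnsigned;
    private final TreeMap<byte[], byte[]> data = new TreeMap<>(BYTEWISE);

    void put(byte[] key, byte[] value) { data.put(key.clone(), value.clone()); }
    byte[] get(byte[] key) { return data.get(key); }
    void delete(byte[] key) { data.remove(key); }

    /** Iterates in key order and decodes the keys as UTF-8 for readability. */
    List<String> keysInOrder() {
        List<String> keys = new ArrayList<>();
        for (byte[] k : data.keySet()) {
            keys.add(new String(k, StandardCharsets.UTF_8));
        }
        return keys;
    }

    static byte[] bytes(String s) { return s.getBytes(StandardCharsets.UTF_8); }
}
```

Because the store only sees bytes, the iteration order depends entirely on how the application serializes its keys.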
Log Structured Merge Architecture
[Figure: LSM architecture. Labels: write request from application, transaction log, read-write data in RAM, periodic compaction, read-only data in SSD or disk, scan request from application]
RocksDB Write Path
[Figure: RocksDB write path. Labels: write request, active MemTable, log, switch, read-only MemTable, flush, sst files, compaction]
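The steps named on this slide (log, active MemTable, switch, flush) can be sketched as a toy model. Everything here is a simplification under stated assumptions: the class is hypothetical, memtables are limited by entry count rather than bytes, files are in-memory maps, and compaction is not modeled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.TreeMap;

final class MiniLsm {
    private final int memtableLimit;                   // entries per memtable (hypothetical unit)
    final List<String> log = new ArrayList<>();        // stands in for the log
    TreeMap<String, String> active = new TreeMap<>();  // active memtable
    final Deque<TreeMap<String, String>> readOnly = new ArrayDeque<>(); // newest first
    final Deque<TreeMap<String, String>> sstFiles = new ArrayDeque<>(); // newest first

    MiniLsm(int memtableLimit) { this.memtableLimit = memtableLimit; }

    void put(String key, String value) {
        log.add(key + "=" + value);    // 1. append to the log
        active.put(key, value);        // 2. insert into the active memtable
        if (active.size() >= memtableLimit) {
            readOnly.addFirst(active); // 3. switch: the full memtable becomes read-only
            active = new TreeMap<>();
        }
    }

    /** Flush: turns the oldest read-only memtable into a new sorted "sst file". */
    void flushOne() {
        TreeMap<String, String> oldest = readOnly.pollLast();
        if (oldest != null) {
            sstFiles.addFirst(oldest);
        }
    }

    /** Read: newest data wins, so check active, then read-only memtables, then sst files. */
    String get(String key) {
        if (active.containsKey(key)) return active.get(key);
        for (TreeMap<String, String> m : readOnly) {
            if (m.containsKey(key)) return m.get(key);
        }
        for (TreeMap<String, String> f : sstFiles) {
            if (f.containsKey(key)) return f.get(key);
        }
        return null;
    }
}
```

Writes never modify an existing file in this model; an overwritten key simply shadows the older value until the files are merged.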
RocksDB Reads
• Data can be in memory or on disk
• Consult multiple files to find the latest instance of the key
• Use Bloom filters to reduce I/O
  • Every sst file has a Bloom filter
  • Bloom filters are cached in memory
  • Default config: eliminates 99% of reads
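The 99% figure is consistent with the textbook false-positive estimate for a Bloom filter. The sketch below evaluates that formula for 10 bits per key, which is an assumed setting rather than one stated on the slides; RocksDB's actual filter implementation may differ from this idealized model.

```java
final class BloomMath {
    /** Approximate false-positive rate for the given bits per key and number of hash functions. */
    static double falsePositiveRate(double bitsPerKey, int numHashes) {
        return Math.pow(1.0 - Math.exp(-numHashes / bitsPerKey), numHashes);
    }

    /** Hash count that minimizes the false-positive rate: bitsPerKey * ln 2, rounded. */
    static int optimalHashes(double bitsPerKey) {
        return Math.max(1, (int) Math.round(bitsPerKey * Math.log(2)));
    }

    public static void main(String[] args) {
        double bitsPerKey = 10.0; // assumption for illustration
        int k = optimalHashes(bitsPerKey);
        System.out.printf("k=%d, fpr=%.4f%n", k, falsePositiveRate(bitsPerKey, k));
    }
}
```

With these inputs the estimate comes out a little under 1%, so roughly 99% of lookups for keys that are absent from a file can skip reading that file.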
RocksDB Read Path
[Figure: RocksDB read path. Labels: read request Get(k); memory: active MemTable, read-only MemTable, Blooms; persistent storage: logs, sst files; flush; compaction]
RocksDB Architecture
[Figure: RocksDB architecture. Labels: write request, read request; memory: active MemTable, read-only MemTable, read-only BlockCache, switch; persistent storage: logs, sst files; flush; compaction]
RocksDB Open & Pluggable
[Figure: pluggable components. Labels: pluggable compaction; pluggable sst data format on storage; pluggable Memtable format in RAM; transaction log; Blooms; customizable WAL; get or scan request from application; write request from application]
Compaction Styles in RocksDB
What is Compaction
• Multi-threaded
• Parallel compactions on different parts of the database
• Deletes overwritten keys
• Two types of compaction
  • level compaction
  • universal compaction
Level compaction
• RocksDB's default compaction style is level compaction (for read-heavy workloads)
• Stores data in multiple levels
  • More recent data is stored in L0
  • Older data is stored in Lmax
• Files in L0
  • overlapping keys, sorted by flush time
• Files in L1 to Lmax
  • non-overlapping keys, sorted by key
• Max space amplification = 10%
https://github.com/facebook/rocksdb/wiki/Leveled-Compaction
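A rough way to see where a figure of about 10% comes from: if each level is a fixed multiple larger than the one above it and every level is full, the upper levels together hold only a small fraction of what the last level holds. The sketch assumes a multiplier of 10 and ignores L0; both are assumptions made here, not statements from the slides.

```java
final class LevelSizing {
    /** Target size in bytes of level n (n >= 1), given the L1 target and the per-level multiplier. */
    static double targetBytes(double level1Bytes, double multiplier, int level) {
        return level1Bytes * Math.pow(multiplier, level - 1);
    }

    /** Data held above the last level divided by data in the last level, with all levels full. */
    static double overheadRatio(double multiplier, int numLevels) {
        double upper = 0.0;
        for (int level = 1; level < numLevels; level++) {
            upper += Math.pow(multiplier, level - 1);
        }
        return upper / Math.pow(multiplier, numLevels - 1);
    }

    public static void main(String[] args) {
        // Four levels with a multiplier of 10: relative sizes 1, 10, 100, 1000.
        System.out.println(overheadRatio(10.0, 4));
    }
}
```

For four full levels the ratio is 111/1000, i.e. about 11% extra space on top of the last level.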
Universal Compaction
• For write-heavy workloads
  • needed if level-style compaction is bottlenecked by disk throughput
• Stores all files in L0
• All files are arranged in time order
• Decreases write amplification but increases space amplification
• Picks files that are chronologically adjacent to one another
  • merges them
  • replaces them with a new file in L0
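The pick, merge, and replace step can be illustrated with sorted maps standing in for sst files. This is a sketch of the idea only: the class name is hypothetical, and how RocksDB actually selects which adjacent files to merge is not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class UniversalMerge {
    /**
     * Merges the files at positions [from, to) of a time-ordered list (index 0 = oldest)
     * and replaces them with a single file. Newer values overwrite older ones.
     */
    static List<TreeMap<String, String>> compact(List<TreeMap<String, String>> files, int from, int to) {
        TreeMap<String, String> merged = new TreeMap<>();
        for (int i = from; i < to; i++) {
            merged.putAll(files.get(i)); // later (newer) files win on duplicate keys
        }
        List<TreeMap<String, String>> result = new ArrayList<>(files.subList(0, from));
        result.add(merged);
        result.addAll(files.subList(to, files.size()));
        return result;
    }
}
```

Merging only chronologically adjacent files keeps the list in time order, so the "newest value wins" rule still holds after the merge.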
Possible Operational Issues
Operational Issues
• High memory usage
  • Application gets slower or even crashes
  • Operating system shows high memory usage
  • Kafka Streams metrics for monitoring memory usage of RocksDB (KIP-607, planned for 2.7) show high values
• High disk usage
  • Application crashes with I/O errors
  • Operating system shows high disk usage
Operational Issues
• High disk I/O
  • Operating system shows high disk I/O
  • Kafka Streams metrics with high values
    • memtable-bytes-flushed-[rate | total]
    • bytes-[read | written]-compaction-rate
  • Kafka Streams metrics with low values
    • memtable-hit-ratio
    • block-cache-[data | index | filter]-hit-ratio
• Write stalls
  • Processing latency of the application increases
  • Kafka Streams client gets kicked out of the group
  • Kafka Streams metric write-stall-duration-[avg | total] shows high values
Operational Issues
• Too many open files
  • Application crashes with I/O errors
  • Kafka Streams metric number-open-files shows high values
Operational Issues
• Kafka Streams client gets kicked out of the consumer group during restoration
  • Before 2.6, Kafka Streams used RocksDB's bulk loading feature (Options#prepareForBulkLoad()) to restore the state store faster
  • Bulk loading basically consists of:
    • disabling automatic compaction,
    • writing all data to level 0, and
    • triggering a manual compaction
  • Manual compaction is a blocking call that may take longer than max.poll.interval.ms
  • Bulk loading was removed in 2.6
  • Alternatives are being evaluated to speed up state store restoration with other RocksDB features, e.g., ingesting SST files directly
Tuning RocksDB
Debug Kafka Streams OOM
• Memory consumption
  • memtable (for writes)
    • memtable size, number of memtables
  • block cache (for reads)
    • configure it to be shared among all partitions of the Kafka Streams state store
    • Kafka Streams keeps index blocks in the block cache
  • rocksdb-java bugs (https://github.com/facebook/rocksdb/issues/6247)
• High disk usage
  • Use level compaction instead of universal compaction
  • Provision more disk space
https://docs.confluent.io/current/streams/developer-guide/memory-mgmt.html
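When the block cache is not shared, memtables and block cache add up per store instance, so a back-of-the-envelope estimate helps when debugging OOM. The sketch below uses assumed example values (16 MiB memtables, 3 memtables, a 50 MiB block cache, 40 store instances on one application instance); it ignores memory used outside memtables and block cache.

```java
final class RocksDbMemoryEstimate {
    static final long MIB = 1024L * 1024L;

    /** Memtables plus block cache for one store instance, in bytes. */
    static long perStoreBytes(long writeBufferBytes, int maxWriteBuffers, long blockCacheBytes) {
        return writeBufferBytes * maxWriteBuffers + blockCacheBytes;
    }

    /** Rough total for an application instance hosting the given number of store instances. */
    static long totalBytes(int storeInstances, long writeBufferBytes, int maxWriteBuffers, long blockCacheBytes) {
        return storeInstances * perStoreBytes(writeBufferBytes, maxWriteBuffers, blockCacheBytes);
    }

    public static void main(String[] args) {
        // Assumed example values, not measurements.
        long total = totalBytes(40, 16 * MIB, 3, 50 * MIB);
        System.out.println(total / MIB + " MiB"); // prints "3920 MiB"
    }
}
```

The estimate grows linearly with the number of store instances, which is why sharing one block cache across stores bounds the read-side memory.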
Debug write stalls
• Debug write stalls in RocksDB
  • Is disk I/O utilization at 100%?
    • add more storage spindles
    • use universal compaction
  • Check the number of background compaction threads
    • Kafka Streams uses max(2, number of available processors) by default
  • Check the memtable configuration
    • AdvancedColumnFamilyOptions.max_write_buffer_number
    • ColumnFamilyOptions.write_buffer_size
Debugging file descriptor issues
• Too many open files
  • DBOptions.max_open_files = -1 (default)
    • opens all sst files at DB open time
    • good for performance but can run out of file descriptors
  • Increase the operating system limit on open file descriptors
  • Set DBOptions.max_open_files = 10000
    • will open at most 10000 files concurrently
  • Decrease the number of files by making each file larger
    • AdvancedColumnFamilyOptions.target_file_size_base = 128 MB (default is 64 MB)
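The effect of a larger target file size on the number of open files can be estimated with simple arithmetic. The sketch assumes all sst files are roughly the target size and uses a hypothetical 100 GiB store; real file sizes vary, so treat the result as an order-of-magnitude guide.

```java
final class SstFileCount {
    static final long MIB = 1024L * 1024L;
    static final long GIB = 1024L * MIB;

    /** Approximate number of sst files for a store of the given size, rounding up. */
    static long estimateFiles(long storeBytes, long targetFileBytes) {
        return (storeBytes + targetFileBytes - 1) / targetFileBytes;
    }

    public static void main(String[] args) {
        long storeBytes = 100 * GIB; // hypothetical store size
        System.out.println(estimateFiles(storeBytes, 64 * MIB));  // 1600
        System.out.println(estimateFiles(storeBytes, 128 * MIB)); // 800
    }
}
```

Doubling the target file size roughly halves the file count, and the per-store count still has to be multiplied by the number of store instances on the machine.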
RocksDB Command Line Utilities
Build rocksdb command line utilities
git clone git@github.com:facebook/rocksdb.git
cd rocksdb
make ldb sst_dump
cp ldb /usr/local/bin
cp sst_dump /usr/local/bin
Useful RocksDB command line tools: https://github.com/facebook/rocksdb/wiki/Administration-and-Data-Access-Tool
# change these values accordingly
APP_ID=my-app
STATE_STORE=my-counts
STATE_STORE_DIR=/tmp/kafka-streams
TASKS=$(ls $STATE_STORE_DIR/$APP_ID)
Useful commands
# View all keys
for i in $TASKS; do
  ldb --db="$STATE_STORE_DIR/$APP_ID/$i/rocksdb/$STATE_STORE" scan 2>/dev/null
done
# Show table properties
for i in $TASKS; do
  TABLE_PROPERTIES=$(sst_dump --file="$STATE_STORE_DIR/$APP_ID/$i/rocksdb/$STATE_STORE" --show_properties)
  echo -e "Table properties for task: $i\n$TABLE_PROPERTIES\n\n"
done
Useful commands- Example output
# example output
Table properties for task: 1_9
from [] to []
Process /tmp/kafka-streams/my-app/1_9/rocksdb/my-counts/000006.sst
Sst file format: block-based
Table Properties:
------------------------------
# data blocks: 1
# entries: 2
raw key size: 18
raw average key size: 9.000000
raw value size: 88
raw average value size: 44.000000
data block size: 125
index block size: 35
filter block size: 0
(estimated) table size: 160
Takeaways
• RocksDB is the default state store in Kafka Streams
• Kafka Streams provides functionality to configure and monitor RocksDB
• RocksDB uses a log-structured merge (LSM) architecture with different compaction styles
• You might run into operational issues, but you can solve them by debugging and tuning RocksDB
• RocksDB offers command line utilities for analysing state stores
Thank you!
dhruba@rockset.com
bruno@confluent.io
cnfl.io/meetups cnfl.io/slack cnfl.io/blog
Learn how Rockset uses RocksDB
https://rockset.com/blog/how-we-use-rocksdb-at-rockset/

The Future of Application Development - API Days - Melbourne 2023confluent
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streamsconfluent
 

Mais de confluent (20)

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streams
 

Último

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 

Último (20)

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 

Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur, Rockset, Bruno Cadonna, Confluent) Kafka Summit 2020

  • 1. Performance Tuning RocksDB for Kafka Streams’ State Store Dhruba Borthakur (Rockset), Bruno Cadonna (Confluent)
  • 2. About the Presenters
      • Dhruba Borthakur, CTO & Co-founder, Rockset (rockset.com)
      • Bruno Cadonna, Contributor to Apache Kafka & Software Engineer at Confluent (confluent.io)
  • 3. Agenda
      • Kafka Streams and State Stores
      • Introduction to RocksDB
      • Compaction Styles in RocksDB
      • Possible Operational Issues
      • Tuning RocksDB
      • RocksDB Command Line Utilities
      • Takeaways
  • 4. Kafka Streams and State Stores
  • 11. Kafka Streams
      • Stateless and stateful processors
      • Stateful processors use state stores
      • One topology is created per input partition, i.e., per task
  • 20. State Stores in Kafka Streams
      • A stateful processor may use one or more state stores
      • Each partition has its own state store
      • State stores are layered:
          • Metrics & De-/Serialization: collects metrics and de-/serializes records
          • Caching: caches records
          • Changelogging: writes records to the changelog
          • the innermost layer writes records to the local state store
      • State stores are restored from changelog topics
      • Restoration is byte-based and bypasses the wrapping layers
  • 21. RocksDB is the Default State Store
      • Kafka Streams needed a write-optimized state store
      • Kafka Streams 2.6 uses RocksDB 5.18.4
      • Kafka Streams provides metrics to monitor RocksDB state stores
      • RocksDB can be configured by passing a class that implements the interface RocksDBConfigSetter to the configuration rocksdb.config.setter
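  • 22. Example: Configuring RocksDB in Kafka Streams

          import java.util.Map;
          import org.apache.kafka.streams.state.RocksDBConfigSetter;
          import org.rocksdb.CompactionStyle;
          import org.rocksdb.Options;

          // Declared as a static nested class inside the application class.
          public static class MyRocksDBConfig implements RocksDBConfigSetter {
              @Override
              public void setConfig(final String storeName,
                                    final Options options,
                                    final Map<String, Object> configs) {
                  // e.g. set compaction style
                  options.setCompactionStyle(CompactionStyle.LEVEL);
              }

              @Override
              public void close(final String storeName, final Options options) {}
          }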
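As a minimal sketch of how such a class might be registered, the snippet below puts the `rocksdb.config.setter` key from the slide into the application's properties. The class name `com.example.MyRocksDBConfig`, the application id, and the broker address are hypothetical placeholders, and only `java.util.Properties` is used so the sketch runs without Kafka on the classpath.

```java
import java.util.Properties;

public class StreamsProps {
    // Hypothetical fully qualified name of a class implementing RocksDBConfigSetter.
    static final String CONFIG_SETTER_CLASS = "com.example.MyRocksDBConfig";

    static Properties build() {
        Properties props = new Properties();
        // Placeholder values; replace with the real application id and brokers.
        props.put("application.id", "my-app");
        props.put("bootstrap.servers", "localhost:9092");
        // Configuration key named on the slide.
        props.put("rocksdb.config.setter", CONFIG_SETTER_CLASS);
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build().getProperty("rocksdb.config.setter"));
    }
}
```

The resulting `Properties` object would then be passed to the Kafka Streams application in the usual way.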
  • 24. What is RocksDB?
      • Key-value persistent store
      • Embedded C++ & Java library
      • Server workloads
  • 25. What is it not?
      • Not distributed
      • No failover
      • Not highly available. If the machine dies, you lose your data
      • Focus on performance
      Kafka Streams makes it fault-tolerant
  • 26. RocksDB API
      • Keys and values are byte arrays
      • Data are stored sorted by key
      • Update operations: Put/Delete/Merge
      • Queries: Get/Iterator
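The API shape on the slide above (byte-array keys and values, data sorted by key, Put/Delete/Get/Iterator) can be illustrated with a toy in-memory store. This is only a sketch built on `java.util.TreeMap`, not RocksDB itself; the class `ToySortedStore` and its helper `b` are hypothetical names, the Merge operation is not modeled, and unsigned bytewise key ordering is assumed.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

public class ToySortedStore {
    // Assumption: keys are ordered bytewise (unsigned, lexicographic).
    private static final Comparator<byte[]> BYTEWISE = Arrays::compareUnsigned;
    private final TreeMap<byte[], byte[]> data = new TreeMap<>(BYTEWISE);

    void put(byte[] key, byte[] value) { data.put(key, value); }
    byte[] get(byte[] key) { return data.get(key); }
    void delete(byte[] key) { data.remove(key); }

    // Iterate over all keys in sorted order, decoded as UTF-8 for display.
    List<String> keys() {
        List<String> out = new ArrayList<>();
        for (byte[] k : data.keySet()) {
            out.add(new String(k, StandardCharsets.UTF_8));
        }
        return out;
    }

    static byte[] b(String s) { return s.getBytes(StandardCharsets.UTF_8); }

    public static void main(String[] args) {
        ToySortedStore store = new ToySortedStore();
        store.put(b("banana"), b("2"));
        store.put(b("apple"), b("1"));
        store.put(b("cherry"), b("3"));
        store.delete(b("banana"));
        System.out.println(store.keys()); // prints [apple, cherry]
    }
}
```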
  • 27. Log Structured Merge Architecture
      [Diagram labels: Write request from application; Read write data in RAM; Transaction log; Periodic compaction; Read only data in SSD or disk; Scan request from application]
  • 28. RocksDB Write Path
      [Diagram labels: Write request; Active MemTable; Log; Switch; Read only MemTable; Flush; sst files; LS Compaction]
  • 29. RocksDB Reads
      • Data can be in memory or on disk
      • Consult multiple files to find the latest instance of the key
      • Use bloom filters to reduce IO
          • Every sst file has a bloom filter
          • bloom filters are cached in memory
          • default config: eliminates 99% of reads
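To show why a bloom filter lets a read skip files, here is a toy filter in plain Java. It is a sketch of the general technique, not RocksDB's implementation; the class name, the sizing, and the double-hashing scheme based on `String.hashCode` are assumptions chosen for brevity. A `false` answer means the key is definitely not in the file, so the file need not be read; a `true` answer may be a false positive.

```java
import java.util.BitSet;

public class ToyBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    ToyBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Double hashing: position_i = (h1 + i * h2) mod numBits.
    private int position(String key, int i) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | 1; // forced odd so the step is never zero
        return Math.floorMod(h1 + i * h2, numBits);
    }

    // Called for every key written to the (hypothetical) file.
    void add(String key) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(position(key, i));
        }
    }

    // false: key is definitely absent, skip the file. true: key may be present.
    boolean mightContain(String key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(key, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ToyBloomFilter filter = new ToyBloomFilter(1024, 3);
        filter.add("user-1");
        filter.add("user-2");
        System.out.println(filter.mightContain("user-1")); // prints true
    }
}
```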
  • 30. RocksDB Read Path
      [Diagram labels: Read request Get(k); Memory: Active MemTable, Read only MemTable, Blooms; Persistent Storage: Log, sst files; Flush; LS Compaction]
  • 31. RocksDB Architecture
      [Diagram labels: Write request; Read request; Memory: Active MemTable, Read only MemTable, Read only BlockCache; Persistent Storage: Log, sst files; Switch; Flush; LS Compaction]
  • 32. RocksDB Open & Pluggable
      [Diagram labels: Pluggable compaction; Pluggable sst data format on storage; Pluggable Memtable format in RAM; Transaction log; Blooms; Customizable WAL; Get or scan request from application; Write request from application]
  • 34. What is Compaction
      • Multi-threaded
      • Parallel compactions on different parts of the database
      • Deletes overwritten keys
      • Two types of compactions
          • level compaction
          • universal compaction
  • 35. Level compaction
      • RocksDB default compaction is Level Compaction (for read heavy workloads)
      • Stores data in multiple levels
          • More recent data stored in L0
          • Older data stored in Lmax
      • Files in L0
          • overlapping keys, sorted by flush time
      • Files in L1 to Lmax
          • non overlapping keys, sorted by key
      • Max space amplification = 10%
      https://github.com/facebook/rocksdb/wiki/Leveled-Compaction
  • 36. Universal Compaction
      • For write heavy workloads
          • needed if Level style compaction is bottlenecked by disk throughput
      • Stores all files in L0
      • All files are arranged in time order
      • Decreases write amplification but increases space amplification
      • Pick up files that are chronologically adjacent to one another
          • merge them
          • replace them with a new file in L0
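The merge step described above (take chronologically adjacent files, merge them, and replace them with one new file in which overwritten keys are gone) can be sketched with two sorted maps standing in for sst files. This is a toy model under that assumption; real compaction also handles delete markers, snapshots, and file I/O, none of which is modeled, and `ToyCompaction` is a hypothetical name.

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class ToyCompaction {
    // Merge two sorted runs into one; for duplicate keys the newer value wins,
    // which drops the overwritten value from the older run.
    static SortedMap<String, String> merge(SortedMap<String, String> older,
                                           SortedMap<String, String> newer) {
        SortedMap<String, String> merged = new TreeMap<>(older);
        merged.putAll(newer);
        return merged;
    }

    public static void main(String[] args) {
        SortedMap<String, String> older = new TreeMap<>();
        older.put("a", "1");
        older.put("b", "2");

        SortedMap<String, String> newer = new TreeMap<>();
        newer.put("b", "3");
        newer.put("c", "4");

        System.out.println(merge(older, newer)); // prints {a=1, b=3, c=4}
    }
}
```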
  • 39. Operational Issues
      • High memory usage
          • Application gets slower or even crashes
          • Operating system shows high memory usage
          • Kafka Streams metrics for monitoring memory usage of RocksDB (KIP-607, planned for 2.7) show high values
      • High disk usage
          • Application crashes with I/O errors
          • Operating system shows high disk usage
  • 41. Operational Issues
      • High disk I/O
          • Operating system shows high disk I/O
          • Kafka Streams metrics with high values
              • memtable-bytes-flushed-[rate | total]
              • bytes-[read | written]-compaction-rate
          • Kafka Streams metrics with low values
              • memtable-hit-ratio
              • block-cache-[data | index | filter]-hit-ratio
      • Write stalls
          • Processing latency of the application increases
          • Kafka Streams client gets kicked out of the group
          • Kafka Streams metric write-stall-duration-[avg | total] shows high values
  • 42. Operational Issues
      • Too many open files
          • Application crashes with I/O errors
          • Kafka Streams metric number-open-files shows high values
  • 45. Operational Issues
      • Kafka Streams client gets kicked out of the consumer group during restoration
          • Before 2.6, Kafka Streams used RocksDB's bulk loading feature (Options#prepareForBulkLoad()) to restore the state store faster.
          • Bulk loading basically consists of:
              • disabling automatic compaction,
              • writing all data to level 0, and
              • triggering a manual compaction
          • Manual compaction is a blocking call that may take longer than max.poll.interval.ms
          • Bulk loading was removed in 2.6
          • Currently evaluating alternatives to increase the performance of state store restoration by using other features of RocksDB, e.g., ingesting SST files directly.
  • 47. Debug Kafka Streams OOM
      • Memory consumption
          • memtable (for writes)
              • memtable size, number of memtables
          • block cache (for reads)
              • configure one block cache to be shared among all partitions of the Kafka Streams state store
              • Kafka Streams keeps index blocks in the block cache
          • rocksdb-java bugs (https://github.com/facebook/rocksdb/issues/6247)
      • High disk usage
          • Use level compaction instead of universal compaction
          • Provision more disk space
      https://docs.confluent.io/current/streams/developer-guide/memory-mgmt.html
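When debugging memory, a back-of-the-envelope bound can help: if every state store instance has its own block cache and its own set of memtables, the memory they can claim grows with the number of stores. The sketch below computes that rough bound; the class name and all numbers in `main` are illustrative assumptions rather than measured or recommended values, and the estimate ignores any memory RocksDB uses outside the block cache and memtables.

```java
public class RocksDBMemoryEstimate {
    // Rough upper bound in bytes, assuming each store instance has its own
    // block cache and can fill all of its memtables.
    static long upperBoundBytes(int numStores,
                                long blockCacheBytes,
                                long memtableBytes,
                                int maxMemtables) {
        return numStores * (blockCacheBytes + memtableBytes * maxMemtables);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Assumed for illustration: 40 store instances, 50 MB block cache each,
        // 16 MB memtables, at most 3 memtables per store.
        long bytes = upperBoundBytes(40, 50 * mb, 16 * mb, 3);
        System.out.println(bytes / mb + " MB"); // prints 3920 MB
    }
}
```

Sharing one block cache across stores, as the slide suggests, removes the per-store cache term from this sum.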
  • 48. Debug write stalls
      • Debug write stalls in RocksDB
      • Is disk IO utilization at 100%?
          • add more storage spindles
          • use universal compaction
      • Check number of background compaction threads
          • Kafka Streams uses Max(2, number of available processors) by default
      • Check memtable configuration
          • AdvancedColumnFamilyOptions.max_write_buffer_number
          • ColumnFamilyOptions.write_buffer_size
  • 49. Debugging file descriptor issues
      • Too many open files
      • DBOptions.max_open_files = -1 (default)
          • opens all sst files at db open time
          • good for performance but can run out of file descriptors
      • Increase operating system number of open file descriptors
      • Set DBOptions.max_open_files = 10000
          • will open a max of 10000 files concurrently
      • Decrease number of files by making each file larger
          • AdvancedColumnFamilyOptions.target_file_size_base = 128 MB (default is 64 MB)
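The memtable and file-descriptor options named on the two slides above correspond to setters on `org.rocksdb.Options`, so they could be applied through a `RocksDBConfigSetter` along the following lines. This is a sketch that needs kafka-streams and rocksdbjni on the classpath; the class name `TunedRocksDBConfig` is hypothetical, and the memtable values are illustrative assumptions (only 10000 and 128 MB come from the slide), not recommendations.

```java
import java.util.Map;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.Options;

public class TunedRocksDBConfig implements RocksDBConfigSetter {
    @Override
    public void setConfig(final String storeName,
                          final Options options,
                          final Map<String, Object> configs) {
        // Memtable configuration (write stalls); assumed example values.
        options.setWriteBufferSize(32 * 1024 * 1024L);      // write_buffer_size
        options.setMaxWriteBufferNumber(4);                  // max_write_buffer_number

        // File descriptor configuration; values taken from the slide.
        options.setMaxOpenFiles(10000);                      // max_open_files
        options.setTargetFileSizeBase(128 * 1024 * 1024L);   // target_file_size_base
    }

    @Override
    public void close(final String storeName, final Options options) {
        // Nothing to release: setConfig creates no RocksDB objects.
    }
}
```

Larger or more numerous memtables trade memory for fewer stalls, so any change here is best checked against the memory and write-stall metrics listed earlier.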
  • 50. RocksDB Command Line Utilities
  • 51. Build rocksdb command line utilities
      Useful RocksDB command line tools: https://github.com/facebook/rocksdb/wiki/Administration-and-Data-Access-Tool

      Build:

          git clone git@github.com:facebook/rocksdb.git
          cd rocksdb
          make ldb sst_dump
          cp ldb /usr/local/bin
          cp sst_dump /usr/local/bin

      Change these values:

          # change these values accordingly
          APP_ID=my-app
          STATE_STORE=my-counts
          STATE_STORE_DIR=/tmp/kafka-streams
          TASKS=$(ls $STATE_STORE_DIR/$APP_ID)

  • 52. Useful commands

          # View all keys
          for i in $TASKS; do
            ldb --db=$STATE_STORE_DIR/$APP_ID/$i/rocksdb/$STATE_STORE scan 2>/dev/null
          done

          # Show table properties
          for i in $TASKS; do
            TABLE_PROPERTIES=$(sst_dump --file=$STATE_STORE_DIR/$APP_ID/$i/rocksdb/$STATE_STORE --show_properties)
            echo -e "Table properties for task: $i\n$TABLE_PROPERTIES\n\n"
          done

  • 53. Useful commands - Example output

          # example output
          Table properties for task: 1_9
          from [] to []
          Process /tmp/kafka-streams/my-app/1_9/rocksdb/my-counts/000006.sst
          Sst file format: block-based
          Table Properties:
          ------------------------------
            # data blocks: 1
            # entries: 2
            raw key size: 18
            raw average key size: 9.000000
            raw value size: 88
            raw average value size: 44.000000
            data block size: 125
            index block size: 35
            filter block size: 0
            (estimated) table size: 160

  • 55. Takeaways
      • RocksDB is the default state store in Kafka Streams
      • Kafka Streams provides functionality to configure and monitor RocksDB
      • RocksDB uses a log structured merge (LSM) architecture with different compaction styles
      • You might run into operational issues, but you can solve them by debugging and tuning RocksDB
      • RocksDB offers command line utilities for analysing state stores
  • 56. Thank you!
      • dhruba@rockset.com
      • bruno@confluent.io
      • cnfl.io/meetups, cnfl.io/slack, cnfl.io/blog
      • Learn how Rockset uses RocksDB: https://rockset.com/blog/how-we-use-rocksdb-at-rockset/