Large partition in Cassandra

Shogo Hoshii

Summary of the problems caused by large partitions and the solutions to them

Large partition in Cassandra
Shogo Hoshii
Yahoo! Japan Corp.

About me
• Cassandra operator at Yahoo! Japan Corp.
• https://issues.apache.org/jira/browse/CASSANDRA-5977

Remark
• This is a summary of the following tickets:
– https://issues.apache.org/jira/browse/CASSANDRA-11206
– https://issues.apache.org/jira/browse/CASSANDRA-9738

Agenda
• Recap the read path
• What’s the problem?
• Solutions

High level: read path
[Diagram: Row Cache, Key Cache, SSTables, MemTable]
1. Check the row cache before going to the key cache
2. Check the key cache to get the offsets to the data
3. Find the offsets to the data and retrieve the data
4. Merge data from the SSTables and the memtable
5. Populate the row cache with the newly returned row
http://docs.datastax.com/en/cassandra/3.x/cassandra/dml/dmlAboutReads.html
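
The five steps above can be sketched as a toy in-memory model. This is only an illustration, not Cassandra's implementation: `ToyNode`, its fields, and the use of the key itself as the "offset" are all hypothetical simplifications.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Toy model of one node's read path; every name here is hypothetical."""
    memtable: dict = field(default_factory=dict)   # key -> (timestamp, value)
    sstables: list = field(default_factory=list)   # each: dict key -> (timestamp, value)
    key_cache: dict = field(default_factory=dict)  # (sstable number, key) -> offset
    row_cache: dict = field(default_factory=dict)  # key -> value

    def read(self, key):
        # 1. Check the row cache before going to the key cache.
        if key in self.row_cache:
            return self.row_cache[key]
        versions = []
        for number, sstable in enumerate(self.sstables):
            # 2. Check the key cache to get the offset to the data.
            offset = self.key_cache.get((number, key))
            if offset is None and key in sstable:
                # 3. Find the offset (in this toy the key doubles as the offset).
                offset = key
                self.key_cache[(number, key)] = offset
            if offset is not None:
                versions.append(sstable[offset])
        # 4. Merge data from the SSTables and the memtable: newest timestamp wins.
        if key in self.memtable:
            versions.append(self.memtable[key])
        if not versions:
            return None
        value = max(versions, key=lambda version: version[0])[1]
        # 5. Populate the row cache with the row just returned.
        self.row_cache[key] = value
        return value
```

A second `read` of the same key returns from `row_cache` without touching the SSTables, which is Pattern 1 in the slides that follow.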

Pattern 1. The row is in row cache
[Diagram: read-path components across heap, off heap, and disk: MemTable, Row Cache, Key Cache, Bloom Filter, Partition Summary, Compression Offsets, Partition Index, Data]
1. Read request
2. Return the row if it is in the row cache

Pattern 2. The key is in key cache
[Diagram: read-path components across heap, off heap, and disk: MemTable, Row Cache, Key Cache, Bloom Filter, Partition Summary, Compression Offsets, Partition Index, Data]
1. Read request
2. Check bloom filters
3. Check whether the partition key is in the key cache
4. Find the offset to the result set
5. Access the result set
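
A minimal sketch of this key cache hit path, assuming a toy Bloom filter and a data file modelled as a bytes object. `ToyBloomFilter` and `read_with_key_cache` are hypothetical names; Cassandra's own filter and cache entry layout differ.

```python
import hashlib


class ToyBloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, bits=256, hashes=3):
        self.bits, self.hashes, self.field = bits, hashes, 0

    def _positions(self, key):
        # Derive several bit positions per key from salted SHA-256 digests.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, key):
        for pos in self._positions(key):
            self.field |= 1 << pos

    def might_contain(self, key):
        return all((self.field >> pos) & 1 for pos in self._positions(key))


def read_with_key_cache(key, bloom, key_cache, data_file):
    """Return the stored bytes for key, or None (hypothetical helper)."""
    # 2. Check the bloom filter: a negative answer lets us skip this SSTable.
    if not bloom.might_contain(key):
        return None
    # 3. Check whether the partition key is in the key cache.
    entry = key_cache.get(key)
    if entry is None:
        return None  # key cache miss: the Pattern 3 path would take over here
    # 4. and 5. Use the cached offset to access the result set directly.
    offset, length = entry
    return data_file[offset:offset + length]
```

The point of the key cache is visible in the last two lines: on a hit, no index lookup is needed before reading the data.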

Pattern 3. The key is not cached
[Diagram: read-path components across heap, off heap, and disk: MemTable, Row Cache, Key Cache, Bloom Filter, Partition Summary, Compression Offsets, Partition Index, Data]
1. Read request
2. Row cache miss -> check bloom filters
3. Check whether the partition key is in the key cache
4. Key cache miss -> binary search the partition summary for the nearby location in the index
5. Disk scan to find the offsets
6. Find the offset into the result set
7. Access the result set
8. Update key cache
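
Steps 3 to 5 and 8 above can be sketched as follows. This is a simplified illustration: `locate_partition` is a hypothetical helper, the on-disk index is modelled as a sorted list, and entries are ordered by key rather than by token as in Cassandra.

```python
import bisect


def locate_partition(key, summary, index, key_cache):
    """Find a partition's data offset on a key cache miss (hypothetical helper).

    summary: sorted list of (key, position in index) sampled from the index
    index:   sorted list of (key, data offset), standing in for the on-disk index
    """
    # 3. Check whether the partition key is in the key cache.
    if key in key_cache:
        return key_cache[key]
    # 4. Miss: binary search the summary for the closest preceding sample.
    sampled_keys = [sampled for sampled, _ in summary]
    slot = bisect.bisect_right(sampled_keys, key) - 1
    if slot < 0:
        return None  # key sorts before the first sampled key
    start = summary[slot][1]
    # 5. Scan the index forward from that position to find the offset.
    for candidate, offset in index[start:]:
        if candidate == key:
            # 8. Update the key cache so the next read skips steps 4 and 5.
            key_cache[key] = offset
            return offset
        if candidate > key:
            break
    return None
```

In this toy the cached value is a single integer. The slides that follow explain why the real cached entry can be far larger for a big partition.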

What’s the problem?
• GC pressure from the key cache when a large partition is read

Partition Index Recap
• http://distributeddatastore.blogspot.jp/2013/08/cassandra-sstable-storage-format.html

RowIndexEntry
• Partition size < 64 KB
  – RowIndexEntry
    • Position
    • Serialized size of data
• Partition size > 64 KB
  – IndexedEntry
    • Position
    • Serialized size of data
    • IndexInfo[]
      – Serialize method
      – Offset
      – Width
      – Etc.
Approximation with 16-byte values (partition size : index size / object count)
1 MB : 3 KB / > 200 objects
4 MB : 11 KB / > 800 objects
64 MB : 180 KB / > 13k objects
512 MB : 1.4 MB / > 106k objects
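
As a rough cross-check of the approximation above, the figures can be divided by the number of index blocks. The sketch assumes one IndexInfo per 64 KB of partition data, inferred from the 64 KB threshold on this slide, and reads the figures as index size and object count. Both readings are assumptions, not statements from the tickets.

```python
# Slide figures: partition size in MB -> (index size in KB, object count).
SLIDE_FIGURES = {
    1: (3, 200),
    4: (11, 800),
    64: (180, 13_000),
    512: (1.4 * 1024, 106_000),
}
INDEX_BLOCK_KB = 64  # assumption: one IndexInfo per 64 KB of partition data


def per_index_info(partition_mb):
    """Return (IndexInfo count, bytes per IndexInfo, objects per IndexInfo)."""
    index_kb, objects = SLIDE_FIGURES[partition_mb]
    count = partition_mb * 1024 // INDEX_BLOCK_KB
    return count, index_kb * 1024 / count, objects / count


for size_mb in SLIDE_FIGURES:
    count, size_bytes, objects = per_index_info(size_mb)
    print(f"{size_mb:>4} MB: {count:>5} IndexInfo, "
          f"~{size_bytes:.0f} bytes and ~{objects:.1f} objects each")
```

Under these assumptions every row works out to roughly 175 to 195 bytes and about 13 objects per IndexInfo, so the index grows linearly with partition size, and so does the number of objects the garbage collector has to track.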

Pattern 3. The key is not cached
[Diagram: read-path components across heap, off heap, and disk: MemTable, Row Cache, Key Cache, Bloom Filter, Partition Summary, Compression Offsets, Partition Index, Data]
1. Read request
2. Row cache miss -> check bloom filters
3. Check whether the partition key is in the key cache
4. Key cache miss -> binary search the partition summary for the nearby location in the index
5. Disk scan to find the offsets
6. Find the offsets into the result set
7. Access the result set
8. Update key cache
9. GC, GC, GC…

Current solution
• If the serialized index size < column_index_cache_size_in_kb (configurable)
– IndexedEntry is kept on heap
• Otherwise
– Always read from disk when needed
• https://issues.apache.org/jira/browse/CASSANDRA-11206
• https://www.youtube.com/watch?v=qa84vABqftM
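
The decision above can be sketched like this. It is a simplified model, assuming the threshold is compared against the serialized size of the index entry; `CachedIndexEntry`, `build_cache_entry`, and the 2 KB value are illustrative assumptions rather than Cassandra's actual classes.

```python
from dataclasses import dataclass
from typing import Optional

COLUMN_INDEX_CACHE_SIZE_KB = 2  # assumption: illustrative value for the configurable threshold


@dataclass
class CachedIndexEntry:
    """What a key cache entry holds in this sketch (hypothetical type)."""
    position: int                 # offset of the partition in the data file
    index_infos: Optional[list]   # kept on heap only when the index is small


def build_cache_entry(position, index_infos, serialized_index_bytes,
                      threshold_kb=COLUMN_INDEX_CACHE_SIZE_KB):
    """Keep IndexInfo on heap only when the serialized index is under the threshold."""
    if serialized_index_bytes < threshold_kb * 1024:
        return CachedIndexEntry(position, list(index_infos))
    # Too large: keep only the position; IndexInfo is re-read from disk when needed.
    return CachedIndexEntry(position, None)
```

Small partitions keep their fast path, while a large partition contributes a single small object to the key cache instead of a long IndexInfo array.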

Other possible solutions
• Never keep IndexInfo on heap
– Read from disk when needed
– Degrades performance when a small partition is read

Other possible solutions
• Migrate key cache to be fully off heap
– https://issues.apache.org/jira/browse/CASSANDRA-9738
– Serialization and deserialization cost a lot when a large partition is read
• Will Birch help us solve this problem?
– https://issues.apache.org/jira/browse/CASSANDRA-9754
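
To illustrate why a fully off-heap key cache pays a serialization cost, here is a sketch in which each cached index is stored as a byte blob and decoded in full on every hit. The 16-byte (offset, width) layout and all function names are assumptions; a real IndexInfo also carries clustering names and is larger.

```python
import struct

ENTRY = struct.Struct(">qq")  # assumption: each entry is just (offset, width), 16 bytes


def serialize(entries):
    """Pack (offset, width) pairs into one blob, as an off-heap cache would store them."""
    return b"".join(ENTRY.pack(offset, width) for offset, width in entries)


def deserialize(blob):
    """Decode every entry again; the work grows linearly with the entry count."""
    return [ENTRY.unpack_from(blob, i) for i in range(0, len(blob), ENTRY.size)]


def entries_for(partition_bytes, block_bytes=64 * 1024):
    """One entry per index block of the partition (hypothetical layout)."""
    return [(start, min(block_bytes, partition_bytes - start))
            for start in range(0, partition_bytes, block_bytes)]
```

With these assumptions a 1 MB partition decodes 16 entries per cache hit and a 512 MB partition decodes 8192, even when the read needs only one of them.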

Thank you

Editor's notes

A row cache entry is invalidated when the corresponding row is written to -> as a rule of thumb, use the row cache when read : write is about 95 : 5
2.0 read path: http://www.slideshare.net/planetcassandra/c-summit-eu-2013-keynote-by-jonathan-ellis
2.1 read path: https://grockdoc.com/cassandra/2.1/articles/intra-node-read-path-flow_12a63adf-f148-45ff-87d8-0fcb84e8aca0/ http://www.slideshare.net/doanduyhai/cassandra-introduction-apache-con-2014-budapest
Meeting notes (6/21/16 19:17): the offset indicates where in the data file the data is located
