SlideShare uma empresa Scribd logo
1 de 16
Large partition in Cassandra
Shogo Hoshii
Yahoo! Japan Corp.
About me
• Cassandra operator atYahoo! Japan Corp.
• https://issues.apache.org/jira/browse/CASSA
NDRA-5977
remark
• This is a summary of following tickets:
– https://issues.apache.org/jira/browse/CASSANDR
A-11206
– https://issues.apache.org/jira/browse/CASSANDR
A-9738
Agenda
• Recap the read path
• What’s the problem?
• Solutions
High level: read path
Row Cache
Key Cache
SSTables MemTable
1. Check row cache before going to key cache
2. Check the key cache to get the
offsets to data
3. Find the offsets to data and retrieve data
4. Merge data from sstables and memtable
5. Populate row cache with new row returned
http://docs.datastax.com/en/cassandra/3.x/cassandra/dml/dmlAboutReads.html
Pattern 1.The row is in row cache
Partition
Summary
Disk
MemTable
Compression
Offsets
Bloom Filter
Row Cache
Heap Off Heap
Key Cache
Partition
Index
Data
1. read request
2. return row when that is in row cache
Pattern 2.The key is in key cache
Partition
Summary
Disk
MemTable
Compression
Offsets
Bloom Filter
Row Cache
Heap Off Heap
Key Cache
Partition
Index
Data
1. read request
2. Check bloom filters 3. Check the partition key is in key cache
4. Find the offset to the result set
5. Access the result set
Pattern 3.The key is not cached
Partition
Summary
Disk
MemTable
Compression
Offsets
Bloom Filter
Row Cache
Heap Off Heap
Key Cache
Partition
Index
Data
1. read request
2. Miss -> Check bloom filters
3. Check the partition key is in key cache
4. Miss -> Bsearch the close location of index
5. Disk scan to find the offsets 6. Find the offset into the result set
7. Access the result set
8. Update key cache
What’s the problem?
• GC pressure by key cache when a large
partition is read
Partition Index Recap
• http://distributeddatastore.blogspot.jp/2013/08/cassandra-sstable-storage-format.html
RowIndexEntry
• Partition size < 64 kb
– RowIndexEntry
• Position
• Seriarized size of data
• Partition size > 64 kb
– IndexedEntry
• Position
• Seriarized size of data
• IndexInfo[]
– Seriarize method
– Offset
– width
– Etc.
Approximation on 16 byte value
1mb : 3kb / > 200 objects
4mb : 11kb / > 800 objects
64mb : 180kb / > 13k objects
512mb : 1.4mb / > 106k objects
3.The key is not cached
Partition
Summary
Disk
MemTable
Compression
Offsets
Bloom Filter
Row Cache
Heap Off Heap
Key Cache
Partition
Index
Data
1. read request
2. Miss -> Check bloom filters
3. Check the partition key is in key cache
4. Miss -> Bsearch the close location of index
5. Disk scan to find the offsets 6. Find the offsets into the result set
7. Access the result set
8. Update key cache
9. GC, GC, GC…
Current solution
• If partition size <
column_index_cache_size_in_kb(configurable)
– IndexedEntry is kept on heap
• Otherwise
– Always read from disk when needed
• https://issues.apache.org/jira/browse/CASSANDRA-11206
• https://www.youtube.com/watch?v=qa84vABqftM
Other possible solutions
• IndexInfo never be kept on heap
– Read from disk when needed
– degrades performance when small partition is
read
Other possible solutions
• Migrate key cache to be fully off heap
– https://issues.apache.org/jira/browse/CASSANDR
A-9738
– Serialization & deserialization cost so much when
large partition is read
• Will Birch help us to solve this problem?
– https://issues.apache.org/jira/browse/CASSANDRA-9754
Thank you

Mais conteúdo relacionado

Mais procurados

Time series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long versionTime series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long versionPatrick McFadin
 
[C++ Korea] C++ 메모리 모델과 atomic 타입 연산들
[C++ Korea] C++ 메모리 모델과 atomic 타입 연산들[C++ Korea] C++ 메모리 모델과 atomic 타입 연산들
[C++ Korea] C++ 메모리 모델과 atomic 타입 연산들DongMin Choi
 
Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013Julien Le Dem
 
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...DataWorks Summit/Hadoop Summit
 
Using Time Window Compaction Strategy For Time Series Workloads
Using Time Window Compaction Strategy For Time Series WorkloadsUsing Time Window Compaction Strategy For Time Series Workloads
Using Time Window Compaction Strategy For Time Series WorkloadsJeff Jirsa
 
Apache Hudi: The Path Forward
Apache Hudi: The Path ForwardApache Hudi: The Path Forward
Apache Hudi: The Path ForwardAlluxio, Inc.
 
Facebook Presto presentation
Facebook Presto presentationFacebook Presto presentation
Facebook Presto presentationCyanny LIANG
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing DataWorks Summit
 
Thinking Big - Big data: principes et architecture
Thinking Big - Big data: principes et architecture Thinking Big - Big data: principes et architecture
Thinking Big - Big data: principes et architecture Lilia Sfaxi
 
Oracle Exadata Cloud Services guide from practical experience - OOW19
Oracle Exadata Cloud Services guide from practical experience - OOW19Oracle Exadata Cloud Services guide from practical experience - OOW19
Oracle Exadata Cloud Services guide from practical experience - OOW19Nelson Calero
 
MySQL Buffer Management
MySQL Buffer ManagementMySQL Buffer Management
MySQL Buffer ManagementMIJIN AN
 
Admission Control in Impala
Admission Control in ImpalaAdmission Control in Impala
Admission Control in ImpalaCloudera, Inc.
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guideRyan Blue
 
I Have the Power(View)
I Have the Power(View)I Have the Power(View)
I Have the Power(View)Will Schroeder
 
Powering Interactive BI Analytics with Presto and Delta Lake
Powering Interactive BI Analytics with Presto and Delta LakePowering Interactive BI Analytics with Presto and Delta Lake
Powering Interactive BI Analytics with Presto and Delta LakeDatabricks
 

Mais procurados (20)

Time series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long versionTime series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long version
 
[C++ Korea] C++ 메모리 모델과 atomic 타입 연산들
[C++ Korea] C++ 메모리 모델과 atomic 타입 연산들[C++ Korea] C++ 메모리 모델과 atomic 타입 연산들
[C++ Korea] C++ 메모리 모델과 atomic 타입 연산들
 
Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013
 
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
 
Using Time Window Compaction Strategy For Time Series Workloads
Using Time Window Compaction Strategy For Time Series WorkloadsUsing Time Window Compaction Strategy For Time Series Workloads
Using Time Window Compaction Strategy For Time Series Workloads
 
Apache Hudi: The Path Forward
Apache Hudi: The Path ForwardApache Hudi: The Path Forward
Apache Hudi: The Path Forward
 
Internals of Presto Service
Internals of Presto ServiceInternals of Presto Service
Internals of Presto Service
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
Facebook Presto presentation
Facebook Presto presentationFacebook Presto presentation
Facebook Presto presentation
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
 
Thinking Big - Big data: principes et architecture
Thinking Big - Big data: principes et architecture Thinking Big - Big data: principes et architecture
Thinking Big - Big data: principes et architecture
 
Oracle Exadata Cloud Services guide from practical experience - OOW19
Oracle Exadata Cloud Services guide from practical experience - OOW19Oracle Exadata Cloud Services guide from practical experience - OOW19
Oracle Exadata Cloud Services guide from practical experience - OOW19
 
NUMA and Java Databases
NUMA and Java DatabasesNUMA and Java Databases
NUMA and Java Databases
 
MySQL Buffer Management
MySQL Buffer ManagementMySQL Buffer Management
MySQL Buffer Management
 
Admission Control in Impala
Admission Control in ImpalaAdmission Control in Impala
Admission Control in Impala
 
Apache Phoenix + Apache HBase
Apache Phoenix + Apache HBaseApache Phoenix + Apache HBase
Apache Phoenix + Apache HBase
 
Intro to HBase
Intro to HBaseIntro to HBase
Intro to HBase
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 
I Have the Power(View)
I Have the Power(View)I Have the Power(View)
I Have the Power(View)
 
Powering Interactive BI Analytics with Presto and Delta Lake
Powering Interactive BI Analytics with Presto and Delta LakePowering Interactive BI Analytics with Presto and Delta Lake
Powering Interactive BI Analytics with Presto and Delta Lake
 

Semelhante a Large partition in Cassandra

JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]Speedment, Inc.
 
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]Malin Weiss
 
MongoDB for Time Series Data: Sharding
MongoDB for Time Series Data: ShardingMongoDB for Time Series Data: Sharding
MongoDB for Time Series Data: ShardingMongoDB
 
Managing Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using ElasticsearchManaging Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using ElasticsearchJoe Alex
 
cache memory introduction, level, function
cache memory introduction, level, functioncache memory introduction, level, function
cache memory introduction, level, functionTeddyIswahyudi1
 
Sharding Methods for MongoDB
Sharding Methods for MongoDBSharding Methods for MongoDB
Sharding Methods for MongoDBMongoDB
 
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...Lucidworks
 
In-memory Data Management Trends & Techniques
In-memory Data Management Trends & TechniquesIn-memory Data Management Trends & Techniques
In-memory Data Management Trends & TechniquesHazelcast
 
Introduction to Sharding
Introduction to ShardingIntroduction to Sharding
Introduction to ShardingMongoDB
 
ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)
ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)
ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)srisatish ambati
 
Cassandra 2.1 boot camp, Read/Write path
Cassandra 2.1 boot camp, Read/Write pathCassandra 2.1 boot camp, Read/Write path
Cassandra 2.1 boot camp, Read/Write pathJoshua McKenzie
 
Postgres db performance improvements
Postgres db performance improvementsPostgres db performance improvements
Postgres db performance improvementsMahesh Chopker
 
EncExec: Secure In-Cache Execution
EncExec: Secure In-Cache ExecutionEncExec: Secure In-Cache Execution
EncExec: Secure In-Cache ExecutionYue Chen
 
Colvin exadata mistakes_ioug_2014
Colvin exadata mistakes_ioug_2014Colvin exadata mistakes_ioug_2014
Colvin exadata mistakes_ioug_2014marvin herrera
 
Elasticsearch Data Analyses
Elasticsearch Data AnalysesElasticsearch Data Analyses
Elasticsearch Data AnalysesAlaa Elhadba
 
NYJavaSIG - Big Data Microservices w/ Speedment
NYJavaSIG - Big Data Microservices w/ SpeedmentNYJavaSIG - Big Data Microservices w/ Speedment
NYJavaSIG - Big Data Microservices w/ SpeedmentSpeedment, Inc.
 
London + Dublin Cassandra 2.0
London + Dublin Cassandra 2.0London + Dublin Cassandra 2.0
London + Dublin Cassandra 2.0jbellis
 
Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Uwe Printz
 
SQL Server 2014 Memory Optimised Tables - Advanced
SQL Server 2014 Memory Optimised Tables - AdvancedSQL Server 2014 Memory Optimised Tables - Advanced
SQL Server 2014 Memory Optimised Tables - AdvancedTony Rogerson
 

Semelhante a Large partition in Cassandra (20)

JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
 
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
 
MongoDB for Time Series Data: Sharding
MongoDB for Time Series Data: ShardingMongoDB for Time Series Data: Sharding
MongoDB for Time Series Data: Sharding
 
Managing Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using ElasticsearchManaging Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using Elasticsearch
 
cache memory introduction, level, function
cache memory introduction, level, functioncache memory introduction, level, function
cache memory introduction, level, function
 
04 cache memory
04 cache memory04 cache memory
04 cache memory
 
Sharding Methods for MongoDB
Sharding Methods for MongoDBSharding Methods for MongoDB
Sharding Methods for MongoDB
 
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
 
In-memory Data Management Trends & Techniques
In-memory Data Management Trends & TechniquesIn-memory Data Management Trends & Techniques
In-memory Data Management Trends & Techniques
 
Introduction to Sharding
Introduction to ShardingIntroduction to Sharding
Introduction to Sharding
 
ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)
ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)
ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)
 
Cassandra 2.1 boot camp, Read/Write path
Cassandra 2.1 boot camp, Read/Write pathCassandra 2.1 boot camp, Read/Write path
Cassandra 2.1 boot camp, Read/Write path
 
Postgres db performance improvements
Postgres db performance improvementsPostgres db performance improvements
Postgres db performance improvements
 
EncExec: Secure In-Cache Execution
EncExec: Secure In-Cache ExecutionEncExec: Secure In-Cache Execution
EncExec: Secure In-Cache Execution
 
Colvin exadata mistakes_ioug_2014
Colvin exadata mistakes_ioug_2014Colvin exadata mistakes_ioug_2014
Colvin exadata mistakes_ioug_2014
 
Elasticsearch Data Analyses
Elasticsearch Data AnalysesElasticsearch Data Analyses
Elasticsearch Data Analyses
 
NYJavaSIG - Big Data Microservices w/ Speedment
NYJavaSIG - Big Data Microservices w/ SpeedmentNYJavaSIG - Big Data Microservices w/ Speedment
NYJavaSIG - Big Data Microservices w/ Speedment
 
London + Dublin Cassandra 2.0
London + Dublin Cassandra 2.0London + Dublin Cassandra 2.0
London + Dublin Cassandra 2.0
 
Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?
 
SQL Server 2014 Memory Optimised Tables - Advanced
SQL Server 2014 Memory Optimised Tables - AdvancedSQL Server 2014 Memory Optimised Tables - Advanced
SQL Server 2014 Memory Optimised Tables - Advanced
 

Último

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 

Último (20)

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 

Large partition in Cassandra

  • 1. Large partition in Cassandra Shogo Hoshii Yahoo! Japan Corp.
  • 2. About me • Cassandra operator atYahoo! Japan Corp. • https://issues.apache.org/jira/browse/CASSA NDRA-5977
  • 3. remark • This is a summary of following tickets: – https://issues.apache.org/jira/browse/CASSANDR A-11206 – https://issues.apache.org/jira/browse/CASSANDR A-9738
  • 4. Agenda • Recap the read path • What’s the problem? • Solutions
  • 5. High level: read path Row Cache Key Cache SSTables MemTable 1. Check row cache before going to key cache 2. Check the key cache to get the offsets to data 3. Find the offsets to data and retrieve data 4. Merge data from sstables and memtable 5. Populate row cache with new row returned http://docs.datastax.com/en/cassandra/3.x/cassandra/dml/dmlAboutReads.html
  • 6. Pattern 1.The row is in row cache Partition Summary Disk MemTable Compression Offsets Bloom Filter Row Cache Heap Off Heap Key Cache Partition Index Data 1. read request 2. return row when that is in row cache
  • 7. Pattern 2.The key is in key cache Partition Summary Disk MemTable Compression Offsets Bloom Filter Row Cache Heap Off Heap Key Cache Partition Index Data 1. read request 2. Check bloom filters 3. Check the partition key is in key cache 4. Find the offset to the result set 5. Access the result set
  • 8. Pattern 3.The key is not cached Partition Summary Disk MemTable Compression Offsets Bloom Filter Row Cache Heap Off Heap Key Cache Partition Index Data 1. read request 2. Miss -> Check bloom filters 3. Check the partition key is in key cache 4. Miss -> Bsearch the close location of index 5. Disk scan to find the offsets 6. Find the offset into the result set 7. Access the result set 8. Update key cache
  • 9. What’s the problem? • GC pressure by key cache when a large partition is read
  • 10. Partition Index Recap • http://distributeddatastore.blogspot.jp/2013/08/cassandra-sstable-storage-format.html
  • 11. RowIndexEntry • Partition size < 64 kb – RowIndexEntry • Position • Seriarized size of data • Partition size > 64 kb – IndexedEntry • Position • Seriarized size of data • IndexInfo[] – Seriarize method – Offset – width – Etc. Approximation on 16 byte value 1mb : 3kb / > 200 objects 4mb : 11kb / > 800 objects 64mb : 180kb / > 13k objects 512mb : 1.4mb / > 106k objects
  • 12. 3.The key is not cached Partition Summary Disk MemTable Compression Offsets Bloom Filter Row Cache Heap Off Heap Key Cache Partition Index Data 1. read request 2. Miss -> Check bloom filters 3. Check the partition key is in key cache 4. Miss -> Bsearch the close location of index 5. Disk scan to find the offsets 6. Find the offsets into the result set 7. Access the result set 8. Update key cache 9. GC, GC, GC…
  • 13. Current solution • If partition size < column_index_cache_size_in_kb(configurable) – IndexedEntry is kept on heap • Otherwise – Always read from disk when needed • https://issues.apache.org/jira/browse/CASSANDRA-11206 • https://www.youtube.com/watch?v=qa84vABqftM
  • 14. Other possible solutions • IndexInfo never be kept on heap – Read from disk when needed – degrades performance when small partition is read
  • 15. Other possible solutions • Migrate key cache to be fully off heap – https://issues.apache.org/jira/browse/CASSANDR A-9738 – Serialization & deserialization cost so much when large partition is read • Will Birch help us to solve this problem? – https://issues.apache.org/jira/browse/CASSANDRA-9754

Notas do Editor

  1. Row Cache は、該当のrowに書き込みが入ると無効化される -> Row Cacheを使うのは read : write = 95 : 5 を目安とするとよい
  2. 2.0 の read path http://www.slideshare.net/planetcassandra/c-summit-eu-2013-keynote-by-jonathan-ellis 2.1 の read path https://grockdoc.com/cassandra/2.1/articles/intra-node-read-path-flow_12a63adf-f148-45ff-87d8-0fcb84e8aca0/ http://www.slideshare.net/doanduyhai/cassandra-introduction-apache-con-2014-budapest ----- 会議メモ (6/21/16 19:17) ----- offset は、data file上のどこにdataがあるかを示すもの
  3. 2.0 の read path http://www.slideshare.net/planetcassandra/c-summit-eu-2013-keynote-by-jonathan-ellis 2.1 の read path https://grockdoc.com/cassandra/2.1/articles/intra-node-read-path-flow_12a63adf-f148-45ff-87d8-0fcb84e8aca0/ http://www.slideshare.net/doanduyhai/cassandra-introduction-apache-con-2014-budapest