Distributed Counters in Cassandra




Friday, August 13, 2010
I: Goal
             II: Design
            III: Implementation




I: Goal




Goal




       Low Latency,
       Highly Available
       Counters




II: Design




I: Traditional Counter Design
             II: Abstract Strategy
            III: Distributed Counter Design




Design



                 I: Traditional Counter Design




Traditional Counter Design
       Atomic Counters


       1. single machine
       2. one order of execution
       3. strongly consistent



Traditional Counter Design
       Problems


       1. SPOF / single master
       2. high latency
       3. manually sharded



Traditional Counter Design
       Question




                          What constraints can we relax?




Design



               II: Abstract Strategy




Abstract Strategy
       Constraints to Relax



       1. one order of execution
       2. strong consistency




Abstract Strategy
       Relax: One Order of Execution



       commutative operation:
         - operations must be re-orderable



Abstract Strategy
       Relax: Strong Consistency

       partitioned work:
         - each op must occur once
         - unique partition identifier
       idempotent repair:
         - recognize ops from other partitions

Design



            III: Distributed Counter Design




Distributed Counter Design
       Requirements


       1. commutative operation
       2. partitioned work
       3. idempotent repair



Distributed Counter Design
       Commutative Operation


       addition:
         - commutative operation
         - sum ops performed by all replicas
         - a + b = b + a
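
As a small illustration of why addition satisfies the re-ordering requirement, the sketch below applies the same deltas in every possible order and collects the totals. The helper name `apply_ops` and the sample deltas are made up for this example.

```python
import itertools

def apply_ops(start, deltas):
    """Apply a sequence of increment/decrement deltas to a counter."""
    total = start
    for delta in deltas:
        total += delta
    return total

# Hypothetical deltas; any integers would do.
deltas = [5, -2, 10]

# Collect the final total for every possible ordering of the same ops.
results = {apply_ops(0, order) for order in itertools.permutations(deltas)}
print(results)  # a single value: the order of the ops does not matter
```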

Distributed Counter Design
       Partitioned Work



       each op assigned to a replica:
         - every replica sums all of its ops
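
A minimal sketch of partitioned work, assuming each op arrives already tagged with the replica it was assigned to. The function name `partition_ops` and the replica ids are illustrative only.

```python
from collections import defaultdict

def partition_ops(ops):
    """ops: iterable of (replica_id, delta). Returns one sum per replica."""
    sums = defaultdict(int)
    for replica_id, delta in ops:
        sums[replica_id] += delta  # each replica sums only its own ops
    return dict(sums)

# Hypothetical stream of ops, each assigned to exactly one replica.
ops = [("A", 3), ("B", 1), ("A", 2), ("C", 4)]
per_replica = partition_ops(ops)

# The counter's value is the sum of the per-replica sums.
total = sum(per_replica.values())
```

Because every op is counted by exactly one replica, the per-replica sums can be added together without double counting.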



Distributed Counter Design
       Idempotent Repair


       save counts from remote replicas:
         - keep highest count seen
       prevent multiple execution:
         - do not transfer the target replica’s count
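
The two rules above can be sketched as a merge function, taking "keep highest count seen" literally. The name `apply_repair` and the dict representation are assumptions for illustration, not the actual Cassandra code.

```python
def apply_repair(local_id, known, incoming):
    """Merge counts received from another replica into `known`.

    known / incoming: dict mapping replica id -> count.
    """
    merged = dict(known)
    for replica_id, count in incoming.items():
        if replica_id == local_id:
            continue  # our own count must never be applied back to us
        if replica_id not in merged or count > merged[replica_id]:
            merged[replica_id] = count  # keep the highest count seen
    return merged

known = {"A": 5, "B": 2}
incoming = {"A": 4, "B": 6, "C": 1}
once = apply_repair("A", known, incoming)
twice = apply_repair("A", once, incoming)  # replaying the repair changes nothing
```

Applying the same repair a second time leaves the state unchanged, which is what makes the repair safe to retry.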


III: Implementation




I: Data Structure
             II: Single Node
            III: Eventual Consistency




I: Data Structure




Data Structure
       Requirements


       local counts:
         - incrementally update
       remote counts:
         - independently track partitions

Data Structure
       Context Format



       list of (replica id, count) tuples:
                 [(replica A, count), (replica B, count), ...]
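
One way to picture the context in code is a plain list of tuples. The type alias `Context`, the helper `counts_by_replica`, and the sample counts are hypothetical; the slides do not specify the on-disk encoding.

```python
from typing import Dict, List, Tuple

# Hypothetical alias: one (replica id, count) tuple per replica.
Context = List[Tuple[str, int]]

def counts_by_replica(context: Context) -> Dict[str, int]:
    """Index a context by replica id for easy lookup."""
    return dict(context)

context: Context = [("replica A", 12), ("replica B", 7), ("replica C", 3)]
by_replica = counts_by_replica(context)
print(by_replica["replica B"])
```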




Data Structure
       Context Mutations


       local write:
         sum local count and write delta
         note: memtable
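
A rough sketch of the local-write mutation on a context held as a list of tuples. The function name `apply_local_write` is an assumption, and the handling of a missing local entry (append it) is a guess at reasonable behaviour rather than something the slides state.

```python
def apply_local_write(context, local_id, delta):
    """Return a new context with `delta` added to the local replica's count."""
    updated = []
    found = False
    for replica_id, count in context:
        if replica_id == local_id:
            updated.append((replica_id, count + delta))  # sum local count and delta
            found = True
        else:
            updated.append((replica_id, count))  # remote counts are untouched
    if not found:
        updated.append((local_id, delta))  # first local write for this counter
    return updated

after_write = apply_local_write([("A", 5), ("B", 2)], "A", 3)
first_write = apply_local_write([("B", 2)], "A", 3)
```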



Data Structure
       Context Mutations


       remote repair:
         for each replica,
         keep highest count seen
         (local or from repair)


II: Single Node




Single Node
       Write Path

       client
          1. construct column
             - value: delta (big-endian long)
             - clock: empty
          2. thrift: insert / batch_mutate
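
The column value is described as a big-endian long. The sketch below shows only that encoding, using Python's `struct` module with an 8-byte signed integer; it does not call the Thrift API, and the helper names are made up for this example.

```python
import struct

def encode_delta(delta: int) -> bytes:
    """Pack a delta as an 8-byte signed big-endian integer."""
    return struct.pack(">q", delta)

def decode_delta(raw: bytes) -> int:
    """Inverse of encode_delta."""
    return struct.unpack(">q", raw)[0]

value = encode_delta(1)
print(value.hex())  # 0000000000000001
```

Negative deltas (decrements) round-trip the same way, since `q` is a signed format.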

Single Node
       Write Path

       coordinator
          1. choose partition
             - choose target replica
             - requirement: ConsistencyLevel.ONE
          2. construct clock
             - context format: [(target replica id, count delta)]
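
The coordinator's two steps might look like the sketch below. How the target replica is actually chosen is not stated on the slide, so the random choice here is purely an assumption, as is the name `build_counter_write`.

```python
import random

def build_counter_write(replicas, delta, rng=random):
    """Pick one target replica and build the single-entry clock for it."""
    target = rng.choice(replicas)  # assumption: any one replica may own the op
    clock = [(target, delta)]      # context format: [(target replica id, count delta)]
    return target, clock

# A seeded generator keeps the example repeatable.
target, clock = build_counter_write(["A", "B", "C"], 7, random.Random(0))
```

Sending the op to exactly one replica matches the ConsistencyLevel.ONE requirement: the delta is counted once, by its target.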


Single Node
       Write Path


       target replica
       insert:
                 1. memtable does not contain column
                 2. insert column into memtable



Single Node
       Write Path
       target replica
       update:
                 1. memtable contains column
                 2. retrieve existing column
                 3. create new column
                    - context: sum local count w/ delta from write
                 4. replace column in ConcurrentSkipListMap
                 5. if failed to replace column, go to step 2.
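
The retry loop in steps 2 to 5 is a compare-and-replace pattern. The sketch below imitates it with a small lock-protected map standing in for ConcurrentSkipListMap; the class `AtomicMap`, the function `add_delta`, and the use of a bare integer as the column value are all simplifications for illustration.

```python
import threading

class AtomicMap:
    """Tiny stand-in for a concurrent map offering compare-and-replace."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def put_if_absent(self, key, value):
        """Insert and return None, or return the value already present."""
        with self._lock:
            if key in self._data:
                return self._data[key]
            self._data[key] = value
            return None

    def replace(self, key, expected, new):
        """Swap in `new` only if the current value still equals `expected`."""
        with self._lock:
            if key in self._data and self._data[key] == expected:
                self._data[key] = new
                return True
            return False

def add_delta(memtable, name, delta):
    while True:
        existing = memtable.get(name)            # step 2: retrieve existing column
        if existing is None:
            if memtable.put_if_absent(name, delta) is None:
                return                           # insert path: column was absent
            continue
        updated = existing + delta               # step 3: sum local count with delta
        if memtable.replace(name, existing, updated):
            return                               # step 4 succeeded
        # step 5: another writer got in first, so retry from step 2

memtable = AtomicMap()
workers = [
    threading.Thread(target=lambda: [add_delta(memtable, "hits", 1) for _ in range(500)])
    for _ in range(4)
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
```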


Single Node
       Write Path
       Interesting Note:
       memtables (MTs) are serialized to SSTables (SSTs) as-is
                 - each SST encapsulates the updates made
                   while it was an MT
                 - local count total must be aggregated
                   across the MT and all SSTs

Single Node
       Read Path
       target replica
       read:
                 1. construct collating iterator over:
                    - frozen snapshot of MT
                    - all relevant SSTs
                 2. resolve column
                    - local counts: sum
                    - remote counts: keep max
                 3. construct value
                    - sum local and remote counts (big-endian long)
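
Steps 2 and 3 of the read can be sketched as follows, assuming each fragment (the MT snapshot and each SST) contributes one context for the column. The names `resolve` and `counter_value` and the sample fragments are illustrative only.

```python
def resolve(fragments, local_id):
    """Combine contexts for one column: sum local counts, keep max of remote counts."""
    resolved = {}
    for context in fragments:
        for replica_id, count in context:
            if replica_id not in resolved:
                resolved[replica_id] = count
            elif replica_id == local_id:
                resolved[replica_id] += count  # local fragments are partial sums
            else:
                resolved[replica_id] = max(resolved[replica_id], count)
    return resolved

def counter_value(resolved):
    """The value returned to the client: local plus remote counts."""
    return sum(resolved.values())

# Hypothetical fragments as seen on replica A.
memtable = [("A", 2)]
sst_1 = [("A", 5), ("B", 3)]
sst_2 = [("B", 7), ("C", 1)]

resolved = resolve([memtable, sst_1, sst_2], local_id="A")
value = counter_value(resolved)
```

Here the local count is 2 + 5 = 7, the remote count for B is max(3, 7) = 7, and the resulting value is 7 + 7 + 1 = 15.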

Single Node
       Compaction

       replica
       compaction:
                 1. construct collating iterator over all SSTs
                 2. resolve every column in the CF
                    - local counts: sum
                    - remote counts: keep max
                 3. write out resolved CF
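
As a loose analogy for the collating iterator, the sketch below merges several sorted in-memory "SSTs" with `heapq.merge` and resolves each column as it goes. The list-of-tuples layout and the names `merge_contexts` and `compact` are assumptions; real SSTables are on-disk files.

```python
import heapq
from itertools import groupby
from operator import itemgetter

def merge_contexts(contexts, local_id):
    """Sum the local replica's counts; keep the max for every other replica."""
    merged = {}
    for context in contexts:
        for replica_id, count in context:
            if replica_id not in merged:
                merged[replica_id] = count
            elif replica_id == local_id:
                merged[replica_id] += count
            else:
                merged[replica_id] = max(merged[replica_id], count)
    return sorted(merged.items())

def compact(sstables, local_id):
    """sstables: lists of (column name, context), each sorted by column name."""
    collated = heapq.merge(*sstables, key=itemgetter(0))  # one sorted stream
    return [
        (name, merge_contexts([ctx for _, ctx in group], local_id))
        for name, group in groupby(collated, key=itemgetter(0))
    ]

sst_1 = [("clicks", [("A", 2)]), ("views", [("A", 1), ("B", 4)])]
sst_2 = [("clicks", [("A", 3), ("B", 1)]), ("views", [("B", 6)])]
compacted = compact([sst_1, sst_2], local_id="A")
```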



III: Eventual Consistency




Eventual Consistency
       Read Repair


       coordinator / replica
       read repair:
                 1. calculate resolved (superset) CF
                    - resolve every column (local: sum, remote: max)
                 2. return resolved CF to client




Eventual Consistency
       Read Repair

       coordinator / replica
       read repair:
                 1. calculate repair CF for each replica
                    - calculate diff CF between resolved and received
                    - modify columns to remove target replica’s counts
                 2. send repair CF to each replica
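
Computing the repair for one replica might look like this for a single column, with contexts held as dicts. The function name `repair_for` and the sample counts are hypothetical.

```python
def repair_for(target_id, resolved, received):
    """Counts the target replica is missing, excluding the target's own count.

    resolved: dict replica id -> count after resolving all responses.
    received: dict replica id -> count as reported by the target replica.
    """
    repair = {}
    for replica_id, count in resolved.items():
        if replica_id == target_id:
            continue  # the target owns this count; never send it back
        if received.get(replica_id) != count:
            repair[replica_id] = count  # diff between resolved and received
    return repair

resolved = {"A": 7, "B": 7, "C": 1}
received_from_b = {"A": 5, "B": 7}
repair = repair_for("B", resolved, received_from_b)
```

Replica B learns the newer count for A and the previously unseen count for C, while its own count is left out so it cannot be applied twice.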



Eventual Consistency
       Anti-Entropy Service


       sending replica
       AES:
                 1. follow normal AES code path
                    - calculate repair SST based on shared ranges
                    - send repair SST



Eventual Consistency
       Anti-Entropy Service

       receiving replica
       AES:
                 1. post-process streamed SST
                    - re-build streamed SST
                    - note: strip out local replica’s counts
                 2. remove temporary descriptor
                 3. add to SSTableTracker



Questions?




More Information
       Issues:
       #580: Vector Clocks
       #1072: Distributed Counters

       Related Work:
       Helland and Campbell, Building on Quicksand, CIDR (2009),
       Sections 5 & 6.


       My email address:
       kakugawa@gmail.com


Distributed Counters in Cassandra (Cassandra Summit 2010)

  • 1. Distributed Counters in Cassandra Friday, August 13, 2010
  • 2. I: Goal; II: Design; III: Implementation
  • 3. I: Goal
  • 4. Goal: Low Latency, Highly Available Counters
  • 5. II: Design
  • 6. I: Traditional Counter Design; II: Abstract Strategy; III: Distributed Counter Design
  • 7. Design: I: Traditional Counter Design
  • 8. Traditional Counter Design (Atomic Counters): 1. single machine; 2. one order of execution; 3. strongly consistent
  • 9. Traditional Counter Design (Problems): 1. SPOF / single master; 2. high latency; 3. manually sharded
  • 10. Traditional Counter Design (Question): What constraints can we relax?
  • 11. Design: II: Abstract Strategy
  • 12. Abstract Strategy (Constraints to Relax): 1. one order of execution; 2. strong consistency
  • 13. Abstract Strategy (Relax: One Order of Execution): commutative operation: operations must be re-orderable
  • 14. Abstract Strategy (Relax: Strong Consistency): partitioned work: each op must occur once; unique partition identifier. idempotent repair: recognize ops from other partitions
  • 15. Design: III: Distributed Counter Design
  • 16. Distributed Counter Design (Requirements): 1. commutative operation; 2. partitioned work; 3. idempotent repair
  • 17. Distributed Counter Design (Commutative Operation): addition: commutative operation; sum ops performed by all replicas; a + b = b + a
  • 18. Distributed Counter Design (Partitioned Work): each op assigned to a replica: every replica sums all of its ops
  • 19. Distributed Counter Design (Idempotent Repair): save counts from remote replicas: keep highest count seen. prevent multiple execution: do not transfer the target replica’s count
  • 20. III: Implementation
  • 21. I: Data Structure; II: Single Node; III: Eventual Consistency
  • 22. I: Data Structure
  • 23. Data Structure (Requirements): local counts: incrementally update. remote counts: independently track partitions
  • 24. Data Structure (Context Format): list of (replica id, count) tuples: [(replica A, count), (replica B, count), ...]
  • 25. Data Structure (Context Mutations): local write: sum local count and write delta (note: memtable)
  • 26. Data Structure (Context Mutations): remote repair: for each replica, keep highest count seen (local or from repair)
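
The two context mutations on the Data Structure slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Cassandra code: the function names `apply_local_write` and `apply_remote_repair` are hypothetical, the context is kept as a sorted list of (replica id, count) tuples, and skipping the local replica's own entry during repair is an extra guard added here.

```python
from typing import Dict, List, Tuple

# Context format from the slides: [(replica id, count), ...]
Context = List[Tuple[str, int]]


def apply_local_write(context: Context, local_id: str, delta: int) -> Context:
    """Local write: sum the local replica's count with the write delta."""
    counts: Dict[str, int] = dict(context)
    counts[local_id] = counts.get(local_id, 0) + delta
    return sorted(counts.items())


def apply_remote_repair(context: Context, local_id: str, repair: Context) -> Context:
    """Remote repair: for each remote replica, keep the highest count seen."""
    counts: Dict[str, int] = dict(context)
    for replica_id, count in repair:
        if replica_id == local_id:
            continue  # assumption: never accept our own count back from a peer
        if replica_id not in counts or count > counts[replica_id]:
            counts[replica_id] = count
    return sorted(counts.items())


ctx: Context = []
ctx = apply_local_write(ctx, "A", 3)
ctx = apply_local_write(ctx, "A", 2)
ctx = apply_remote_repair(ctx, "A", [("B", 7)])
ctx = apply_remote_repair(ctx, "A", [("B", 7)])  # replayed repair changes nothing
ctx = apply_remote_repair(ctx, "A", [("B", 4)])  # stale repair is ignored
total = sum(count for _, count in ctx)
```

Because the repair keeps a maximum rather than adding, applying the same repair twice, or applying an older one late, leaves the context unchanged.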
  • 27. II: Single Node
  • 28. Single Node (Write Path): client: 1. construct column (value: delta as a big-endian long; clock: empty); 2. thrift: insert / batch_mutate
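
As a small illustration of the column value on the client slide, a delta can be packed as a big-endian long with Python's standard `struct` module. The helper names `encode_delta` and `decode_count` are hypothetical, and the sketch assumes a signed 64-bit value (a Java `long`); the Thrift call itself is not shown.

```python
import struct


def encode_delta(delta: int) -> bytes:
    """Pack a counter delta as a signed 64-bit big-endian integer."""
    return struct.pack(">q", delta)


def decode_count(value: bytes) -> int:
    """Unpack a signed 64-bit big-endian integer."""
    (count,) = struct.unpack(">q", value)
    return count


encoded = encode_delta(5)
decoded = decode_count(encoded)
```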
  • 29. Single Node (Write Path): coordinator: 1. choose partition (choose target replica; requirement: ConsistencyLevel.ONE); 2. construct clock (context format: [(target replica id, count delta)])
  • 30. Single Node (Write Path): target replica insert: 1. memtable does not contain column; 2. insert column into memtable
  • 31. Single Node (Write Path): target replica update: 1. memtable contains column; 2. retrieve existing column; 3. create new column (context: sum local count w/ delta from write); 4. replace column in ConcurrentSkipListMap; 5. if failed to replace column, go to step 2
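
The insert and update steps on the target replica amount to a compare-and-swap retry loop. The sketch below imitates that loop in Python. `CasMap` is a hypothetical stand-in for a concurrent map with atomic put-if-absent and replace operations (the slides name Java's ConcurrentSkipListMap), and the column is reduced to a single local count for brevity.

```python
import threading
from typing import Dict, Optional


class CasMap:
    """Hypothetical stand-in for a concurrent map with atomic CAS operations."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._data: Dict[str, int] = {}

    def get(self, key: str) -> Optional[int]:
        with self._lock:
            return self._data.get(key)

    def put_if_absent(self, key: str, value: int) -> Optional[int]:
        """Insert only if the key is missing; return the previous value, if any."""
        with self._lock:
            existing = self._data.get(key)
            if existing is None:
                self._data[key] = value
            return existing

    def replace(self, key: str, expected: int, new: int) -> bool:
        """Swap in `new` only if the key currently maps to `expected`."""
        with self._lock:
            if key in self._data and self._data[key] == expected:
                self._data[key] = new
                return True
            return False


def apply_delta(memtable: CasMap, column: str, delta: int) -> None:
    # Insert path: the memtable does not contain the column yet.
    if memtable.put_if_absent(column, delta) is None:
        return
    # Update path: retrieve, build the new value, try to replace, retry on failure.
    while True:
        existing = memtable.get(column)
        updated = existing + delta
        if memtable.replace(column, existing, updated):
            return


def worker(memtable: CasMap) -> None:
    for _ in range(1000):
        apply_delta(memtable, "hits", 1)


memtable = CasMap()
threads = [threading.Thread(target=worker, args=(memtable,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
final_count = memtable.get("hits")
```

A writer that loses the race simply re-reads the column and tries again, so no increment is dropped even without a lock held across the read and the write.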
  • 32. Single Node (Write Path), Interesting Note: MTs are serialized to SSTs as-is; each SST encapsulates the updates from when it was an MT; local count total must be aggregated across the MT and all SSTs
  • 33. Single Node (Read Path): target replica read: 1. construct collating iterator over a frozen snapshot of the MT and all relevant SSTs; 2. resolve column (local counts: sum; remote counts: keep max); 3. construct value: sum local and remote counts (big-endian long)
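
The resolution rule on the read path (sum the local counts, keep the max of each remote count) can be illustrated as follows. This is a sketch under assumptions: each memtable snapshot or SSTable is modeled as a plain dict from replica id to count, and `resolve` and `counter_value` are hypothetical names. The collating iterator and the big-endian encoding of the result are left out.

```python
from typing import Dict, Iterable


def resolve(local_id: str, fragments: Iterable[Dict[str, int]]) -> Dict[str, int]:
    """Merge per-fragment contexts: sum local counts, keep max of remote counts."""
    resolved: Dict[str, int] = {}
    for fragment in fragments:
        for replica_id, count in fragment.items():
            if replica_id == local_id:
                # Each fragment holds only the local updates made while it was a memtable.
                resolved[replica_id] = resolved.get(replica_id, 0) + count
            elif replica_id not in resolved or count > resolved[replica_id]:
                # Remote counts arrive as totals, so the highest one wins.
                resolved[replica_id] = count
    return resolved


def counter_value(resolved: Dict[str, int]) -> int:
    """The value returned to the reader: local and remote counts added together."""
    return sum(resolved.values())


memtable_snapshot = {"A": 2}
sstables = [{"A": 5, "B": 3}, {"A": 1, "B": 7}]
resolved = resolve("A", [memtable_snapshot, *sstables])
value = counter_value(resolved)
```

Here replica A contributed 2 + 5 + 1 locally, while the two stored copies of B's count (3 and 7) collapse to the higher one.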
  • 34. Single Node (Compaction): replica compaction: 1. construct collating iterator over all SSTs; 2. resolve every column in the CF (local counts: sum; remote counts: keep max); 3. write out resolved CF
  • 35. III: Eventual Consistency
  • 36. Eventual Consistency (Read Repair): coordinator / replica read repair: 1. calculate resolved (superset) CF: resolve every column (local: sum, remote: max); 2. return resolved CF to client
  • 37. Eventual Consistency (Read Repair): coordinator / replica read repair: 1. calculate repair CF for each replica: calculate diff CF between resolved and received, then modify columns to remove target replica’s counts; 2. send repair CF to each replica
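
The repair calculation on the read repair slides can be sketched as a diff followed by a strip of the target's own entry. The sketch makes a simplifying assumption that is not stated on the slides: each replica's response already carries fully aggregated counts, so the coordinator can build the superset with a per-replica max. The names `resolve_superset` and `repair_for` are hypothetical.

```python
from typing import Dict

Context = Dict[str, int]  # replica id -> count


def resolve_superset(responses: Dict[str, Context]) -> Context:
    """Build the resolved context: highest count seen for every replica id."""
    resolved: Context = {}
    for context in responses.values():
        for replica_id, count in context.items():
            if replica_id not in resolved or count > resolved[replica_id]:
                resolved[replica_id] = count
    return resolved


def repair_for(target_id: str, resolved: Context, received: Context) -> Context:
    """Diff resolved against what the target sent, minus the target's own count."""
    return {
        replica_id: count
        for replica_id, count in resolved.items()
        if replica_id != target_id and received.get(replica_id) != count
    }


# Responses keyed by the replica that sent them.
responses = {
    "A": {"A": 8, "B": 3},
    "B": {"A": 5, "B": 7},
}
resolved = resolve_superset(responses)
repairs = {rid: repair_for(rid, resolved, ctx) for rid, ctx in responses.items()}
```

Replica A is sent only B's newer count and replica B only A's newer count. Neither receives its own count back, which is what prevents an update from being applied twice.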
  • 38. Eventual Consistency (Anti-Entropy Service): sending replica AES: 1. follow normal AES code path: calculate repair SST based on shared ranges; send repair SST
  • 39. Eventual Consistency (Anti-Entropy Service): receiving replica AES: 1. post-process streamed SST: re-build streamed SST (note: strip out local replica’s counts); 2. remove temporary descriptor; 3. add to SSTableTracker
  • 40. Questions?
  • 41. More Information: Issues: #580: Vector Clocks; #1072: Distributed Counters. Related Work: Helland and Campbell, Building on Quicksand, CIDR (2009), Sections 5 & 6. My email address: kakugawa@gmail.com