Cassandra – A Decentralized Structured Storage System
Gemini Mobile Technologies, Inc.
NOSQL Tokyo Reading Group (http://nosqlsummer.org/city/tokyo)
August 25, 2010
Tags: #cassandra #nosql
2010/8/23 Gemini Mobile Technologies, Inc.
Cassandra: A Decentralized Structured Storage System
Authors: Avinash Lakshman, Prashant Malik.
Abstract: Cassandra is a distributed storage system for managing very large amounts of structured data spread out across many commodity servers, while providing highly available service with no single point of failure. Cassandra aims to run on top of an infrastructure of hundreds of nodes (possibly spread across different data centers). At this scale, small and large components fail continuously. The way Cassandra manages the persistent state in the face of these failures drives the reliability and scalability of the software systems relying on this service. While in many ways Cassandra resembles a database and shares many design and implementation strategies therewith, Cassandra does not support a full relational data model; instead, it provides clients with a simple data model that supports dynamic control over data layout and format. …
Appeared in: 3rd ACM SIGOPS International Workshop on Large Scale Distributed Systems and Middleware, 2009.
http://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf
2010/8/23 Gemini Mobile Technologies, Inc. All rights reserved.
1. Introduction and 2. Related Work
Facebook inbox search:
  • Enables users to search through their inbox. Launched 6/2008.
  • Highly scalable: 250M users.
  • Tolerant of server/network failures.
  • Very high write throughput: “billions of writes per day”.
  • Replicates data across data centers.
Related Work:
  • Distributed file systems: Ficus, Coda, Farsite, GFS, Bayou.
  • Storage systems: Dynamo, Bigtable.
“The Apache Cassandra Project develops a highly scalable second-generation distributed database, bringing together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model.”
3. Data Model
Multi-level index:
  1. Table: Set of rows.
  2. Key: Identifies the row.
     • Key is an arbitrary byte[].
     • Each row can contain a variable number of columns/CFs. Rows need not contain the same columns/CFs.
     • Each row can contain millions of columns/CFs.
     • Operations are atomic per key per replica.
  3. ColumnName: Identifies the column value(s).
     • Can be “Column”, “ColumnFamily”, “ColumnFamily:Column”, “ColumnFamily:ColumnFamily”, etc.
     • A ColumnFamily (CF) is a group of Columns.
     • CFs and Columns are sorted, either time-based or name-based.
     • Columns can be added/deleted efficiently at run time.
Data Model Example: Inbox Search
Query: Find all messages of user3 containing “hello”.
  Get(UserMessages, “user3”, “term:hello”)
Table: UserMessages
  • Key: <userid>
  • CF: “term”
  • CF: <word>
  • Name: <timestamp>, Val: <messageID>
(Figure: row “user3”, CF “term”, with words “hello”, “how”, “you”; column names time4, time12, time4, time4, time12, time1; values msg10, msg81, msg10, msg10, msg81, msg03.)
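The nested lookup in this example can be sketched with plain Python dictionaries. This is only an illustration of the multi-level index, not Cassandra's storage layout: the `get` helper is hypothetical, and the way the six timestamp/message pairs are divided among the three words is a guess from the flattened figure.

```python
# Hypothetical in-memory model: row key -> CF -> word -> {timestamp: message ID}.
# ASSUMPTION: the split of timestamps among "hello", "how", "you" is inferred,
# not stated on the slide.
user_messages = {
    "user3": {
        "term": {
            "hello": {"time4": "msg10", "time12": "msg81"},
            "how": {"time4": "msg10"},
            "you": {"time4": "msg10", "time12": "msg81", "time1": "msg03"},
        }
    }
}


def get(table, key, column_name):
    """Walk a 'ColumnFamily:Column' style path below one row key."""
    node = table[key]
    for part in column_name.split(":"):
        node = node[part]
    return node


hits = get(user_messages, "user3", "term:hello")
message_ids = sorted(set(hits.values()))
print(message_ids)
```

Each path component narrows the lookup by one level, which is why a single `Get` with “term:hello” is enough to answer the inbox search query.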
4. API
Simple get/put operations:
  • Insert(table, key, rowMutation)
    Single column, multiple columns, or a batch of multiple keys.
  • Get(table, key, columnName)
    Key: single key or key range. columnName: “slice” range or name.
  • Delete(table, key, columnName)
Each call also specifies a Consistency Level.
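A minimal single-process sketch of the three calls may help make their shapes concrete. The class name `ToyStore`, the dict-based `row_mutation`, and the tuple form of a slice are all assumptions made for illustration; consistency levels and key ranges are left out.

```python
class ToyStore:
    """Single-process stand-in for the Insert/Get/Delete calls on the slide."""

    def __init__(self):
        # table name -> row key -> {column name: value}
        self.tables = {}

    def insert(self, table, key, row_mutation):
        # ASSUMPTION: a row mutation is a dict of column name -> value,
        # so it can carry one column or many.
        row = self.tables.setdefault(table, {}).setdefault(key, {})
        row.update(row_mutation)

    def get(self, table, key, column_name):
        row = self.tables.get(table, {}).get(key, {})
        if isinstance(column_name, tuple):
            # (start, end) slice over sorted column names, both ends inclusive.
            start, end = column_name
            return {c: row[c] for c in sorted(row) if start <= c <= end}
        return row.get(column_name)

    def delete(self, table, key, column_name):
        self.tables.get(table, {}).get(key, {}).pop(column_name, None)


store = ToyStore()
store.insert("UserMessages", "user3", {"a": 1, "b": 2, "c": 3})
one = store.get("UserMessages", "user3", "b")
sliced = store.get("UserMessages", "user3", ("a", "b"))
store.delete("UserMessages", "user3", "b")
after_delete = store.get("UserMessages", "user3", "b")
```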
5. System Architecture
  • Data is partitioned to a subset of nodes: Consistent Hashing.
  • Data is replicated to multiple nodes for redundancy and performance: quorum using a “preference list” of nodes.
  • Node management:
      • Membership algorithm to know which nodes are up/down: “Accrual failure detection + Gossip”.
      • Bootstrapping to add a node: manual operation + “seed” nodes.
(Figure: NodeA, NodeB, NodeC, NodeD on a consistent-hash ring, exchanging gossip.)
5.1 Partitioning Algorithm: Consistent Hashing
  • Each node is assigned a random position on the ring.
  • Key k is hashed to a fixed circular space.
  • Nodes are assigned by walking clockwise from the hash location.
  • Example: Nodes A, B, C, D, E, F, G are assigned to the ring. Hash(k) falls between A and B. With 3 replicas, choose the next 3 nodes on the ring (i.e., B, C, D).
(Figure: ring with nodes A through G; Hash(k) lies between A and B.)
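The clockwise walk can be sketched in a few lines of Python. This is an illustrative sketch, not Cassandra's partitioner: node positions are evenly spaced here for readability (the slide says they are random), and `ring_hash`, `replicas_for`, and the ring size of 700 are hypothetical choices.

```python
import bisect
import hashlib

RING_SIZE = 700  # ASSUMPTION: tiny ring so the positions are easy to read


def ring_hash(value: str) -> int:
    """Map a key to a position on the ring (hash choice is illustrative)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % RING_SIZE


def replicas_for(position, ring, n):
    """ring is a sorted list of (position, node); walk clockwise from position."""
    positions = [p for p, _ in ring]
    start = bisect.bisect_left(positions, position)
    count = min(n, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


# Nodes A..G at positions 0, 100, ..., 600.
ring = [(i * 100, name) for i, name in enumerate("ABCDEFG")]

example = replicas_for(50, ring, 3)    # hash falls between A (0) and B (100)
wrapped = replicas_for(650, ring, 3)   # past G, so the walk wraps around
print(example)  # ['B', 'C', 'D']
print(wrapped)  # ['A', 'B', 'C']
```

A real key would be placed with `replicas_for(ring_hash("user3"), ring, 3)`; the modulo in the index arithmetic is what makes the space circular.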
5.1 Consistent Hashing
  • Key advantage: Adding, deleting, or re-allocating nodes is cheap. It affects only the keys of the immediate neighbor node.
  • Hash function: locality, load distribution.
  • Load balancing by moving lightly loaded nodes on the ring to relieve heavily loaded nodes.
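The claim that a membership change touches only one neighbor can be checked on a toy ring. The positions, node names, and the `owner` helper below are hypothetical; the sketch only tracks the primary owner of each key position and ignores replicas.

```python
import bisect


def owner(position, ring):
    """First node at or clockwise after position; ring is sorted (position, node)."""
    positions = [p for p, _ in ring]
    index = bisect.bisect_left(positions, position) % len(ring)
    return ring[index][1]


before = [(0, "A"), (100, "B"), (200, "C"), (300, "D")]
after = sorted(before + [(150, "X")])  # new node X joins between B and C

key_positions = range(0, 400, 10)
moved = {
    k: (owner(k, before), owner(k, after))
    for k in key_positions
    if owner(k, before) != owner(k, after)
}
print(sorted(moved))        # [110, 120, 130, 140, 150]
print(set(moved.values()))  # {('C', 'X')}
```

Only keys in the arc between B and X change hands, and all of them come from C, the new node's clockwise neighbor. Every other node keeps its keys.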
5.2 Replication
  • Each data item is replicated at multiple nodes (N).
  • Each key is assigned to a “coordinator” node by the consistent hash function.
  • The “coordinator” node replicates the key to an additional N-1 nodes.
  • “Consistency Level” is set by the client per read/write request: ZERO, ONE, ALL, ANY, QUORUM.
  • ZooKeeper is used to elect a leader node and distribute the “preference list”.
  • The leader node owns the “preference list” that maps keys to node lists.
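As a rough sketch of what the consistency levels mean for a coordinator, the function below returns how many replica responses a request waits for. The slide lists only the level names; the numeric meanings used here (ZERO waits for nothing, ANY and ONE wait for one response, QUORUM for a majority of N, ALL for N) are an assumption based on how these levels were commonly documented for Cassandra at the time, and both function names are hypothetical.

```python
def required_responses(level: str, n: int) -> int:
    """Number of replica responses to wait for, given replication factor n."""
    # ASSUMPTION: meanings of the level names are not given on the slide.
    levels = {
        "ZERO": 0,             # fire and forget
        "ANY": 1,              # any one node accepts the write
        "ONE": 1,              # one replica responds
        "QUORUM": n // 2 + 1,  # majority of the replicas
        "ALL": n,              # every replica
    }
    return levels[level]


def read_overlaps_write(r: int, w: int, n: int) -> bool:
    """True when every read set must share at least one replica with every write set."""
    return r + w > n


n = 3
quorum = required_responses("QUORUM", n)
print(quorum)                                  # 2
print(read_overlaps_write(quorum, quorum, n))  # True
print(read_overlaps_write(1, 1, n))            # False
```

With N = 3, QUORUM reads and QUORUM writes always intersect in at least one replica, while ONE/ONE trades that guarantee for lower latency.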
5.3 Membership
  • Each node locally determines whether any other node in the system is up or down.
Φ (phi) Accrual Failure Detector:
  • Instead of a boolean value (up or down), compute a numeric value Φ representing the suspicion level for each monitored node.
  • Φ is computed from the inter-arrival times of gossip messages from other nodes in the cluster.
  • If Φ exceeds a particular threshold, the node is considered “down”.
  • In an experiment with 100 nodes and a threshold of 5, the average time to detect a failure was 15 seconds.
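A small sketch of how Φ could be computed from gossip inter-arrival times follows. It assumes the inter-arrival times are modeled by an exponential distribution, so that Φ = -log10(P(next message arrives later than the time already elapsed)). The class name, window size, and timestamps are hypothetical, and this is not Cassandra's implementation.

```python
import math
from collections import deque


class PhiAccrualDetector:
    """Toy accrual detector for one monitored node."""

    def __init__(self, window=1000):
        self.intervals = deque(maxlen=window)  # recent inter-arrival times
        self.last_arrival = None

    def heartbeat(self, now):
        """Record the arrival time of a gossip message."""
        if self.last_arrival is not None:
            self.intervals.append(now - self.last_arrival)
        self.last_arrival = now

    def phi(self, now):
        """Suspicion level at time now; grows the longer the node stays silent."""
        if not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        elapsed = now - self.last_arrival
        # ASSUMPTION: exponential model, P(later than elapsed) = exp(-elapsed / mean),
        # hence phi = -log10(P) = elapsed / (mean * ln 10).
        return elapsed / (mean * math.log(10))


detector = PhiAccrualDetector()
for t in range(11):          # messages at t = 0, 1, ..., 10 seconds
    detector.heartbeat(float(t))

THRESHOLD = 5
soon = detector.phi(11.0)    # one second of silence
late = detector.phi(21.6)    # 11.6 seconds of silence
print(soon < THRESHOLD, late > THRESHOLD)  # True True
```

Under this model, with a mean interval of one second, Φ crosses a threshold of 5 after roughly 11.5 seconds of silence. The threshold lets an operator trade detection speed against false positives without hard-coding a timeout.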
5.4 Bootstrapping, 5.5 Scaling the Cluster
  • New nodes check their configuration for “seed” nodes to get initial gossip data such as “preference” lists.
  • Adding and removing nodes is not automatic. It requires a manual command-line operation.
  • A new node needs to have data moved to it from other nodes. Operationally, this runs at 40 MB/s. Work is under way to improve this by copying data from multiple replicas, in the style of BitTorrent.
5.6 Local Persistence
READ
  • Bloom filter to reduce SSTable access
  • Check SSTables in time order
WRITE
(Figure: Commit Log, In-Memory Table, SSTables.)
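The read and write paths named on this slide can be sketched as a toy single-node store. Everything below is an assumption made for illustration: the class names, the flush trigger of two entries, the Bloom filter parameters, and the reading of “time order” as newest SSTable first. Real SSTables live on disk; here they are immutable dictionaries.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter; a Python int serves as the bit array."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means definitely absent; True means possibly present.
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class ToyNode:
    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only list standing in for the on-disk log
        self.memtable = {}     # in-memory table
        self.sstables = []     # oldest first; each entry is (bloom filter, dict)
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # log first, for durability
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        bloom = BloomFilter()
        for key in self.memtable:
            bloom.add(key)
        self.sstables.append((bloom, dict(self.memtable)))
        self.memtable = {}

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for bloom, data in reversed(self.sstables):  # newest SSTable first
            if not bloom.might_contain(key):
                continue  # Bloom filter rules this SSTable out without a lookup
            if key in data:
                return data[key]
        return None


node = ToyNode()
node.write("k1", "v1")
node.write("k2", "v2")    # memtable full: flushed to the first SSTable
node.write("k1", "v1b")
node.write("k3", "v3")    # flushed to the second SSTable
node.write("k4", "v4")    # stays in the memtable
```

Reading `k1` returns `"v1b"` because the newer SSTable is consulted first, and a read for a key that was never written returns `None`, usually without touching any SSTable.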

Summary of "Cassandra" for 3rd nosql summer reading in Tokyo

Epilogue
  • Active Apache project with good documentation:
      http://cassandra.apache.org/
      http://wiki.apache.org/cassandra/ArticlesAndPresentations
  • In use at companies like Digg, Facebook, Twitter, Reddit, Rackspace.
  • Largest production cluster has over 100 TB of data on over 150 machines.