SlideShare uma empresa Scribd logo
1 de 28
Apache Cassandra
Overview
Road Map
• The Big Data Challenge
• The Cassandra Solution
• The CAPTheorem
• The Architecture of Cassandra
• The Data Partition and Replication
The Big Data Challenge
Data Proliferation
• Data Produce Growing Exponentially
• Sources of Data Growing
• Mobile Devices
• Cloud Services
• Historical Data
ThreeVs –Trends In Big Data
• Volume:
• Terabytess -> Zettabyts
• Velocity:
• Batch -> Streaming
• Variety :
• Structured -> Unstructured
RDBMS Scaling
• Traditional RDBMS : No Scale out
• Cassandra : Linear Scale out
The Cassandra Solution
Objective
• Schema Free
• Easy Replication
• SimpleAPI
• Consistence
• Can Handle huge data
NoSQLDatabase
• simplicity of design,
• horizontal scaling
• finer control over availability
Relational Database vs. NoSQL
Relational Database
• Supports powerful query language
• It has a fixed schema
• Follows ACID (Atomicity,Consistency,
Isolation, and Durability)
• Supports transactions
NoSQL Database
• Supports very simple query language
• No Fixed Schema
• It is only “eventually consistent”
• Does not support transactions
Other NoSQL Database
• Apache HBase - HBase is an open source, non-relational,
distributed database modeled after Google’s BigTable and is
written in Java. It is developed as a part of Apache Hadoop
project and runs on top of HDFS, providing BigTable-like
capabilities for Hadoop.
• MongoDB - MongoDB is a cross-platform document-
oriented database system that avoids using the traditional
table-based relational database structure in favor of JSON-
like documents with dynamic schemas making the
integration of data in certain types of applications easier and
faster.
What is Apache Cassandra?
• Apache Cassandra™ is a free
• Distributed
• High performance
• Extremely scalable
• Fault tolerant (i.e. no single point of failure)
• post-relational database solution. Cassandra can serve as both real-time
data store (the “system of record”) for online/transactional applications, and
as a read intensive database for business intelligence systems.
Features of Cassandra
• Elastic scalability
• Always on architecture
• Fast linear-scale performance
• Flexible data storage
• Easy data distribution
• Transaction support
• Fast writes
History of Cassandra
• Cassandra was developed at Facebook for inbox search.
• It was open-sourced by Facebook in July 2008.
• Cassandra was accepted into Apache Incubator in March 2009.
• It was made an Apache top-level project since February 2010.
The CapTheorem
CAPTheorem
• Distributed System Can only provide two of
• Availability
• Consistency
• PartitionTolerance
• AKA BrewersTheorem
Cassandra AP
• Cassandra Prioritizes Availability and PartitionTolerance
• Consistency is not guaranteed
• Tradeoffs between latency and Consistency
Other Approaches -CP
• Eg. Hbase
• Implements Row locking for consistency
• HBase has master/slave & Single point of Failure
• No A
The Architecture of Cassandra
Architecture Overview
• Cassandra was designed with the understanding that
system/hardware failures can and do occur
• Peer-to-peer, distributed system
• All nodes the same
• Data partitioned among all nodes in the cluster
• Custom data replication to ensure fault tolerance
• Read/Write-anywhere design
Architecture Overview
• Each node communicates with each other through the
• Gossip protocol, which exchanges information across
the
• cluster every second
• A commit log is used on each node to capture write
• activity. Data durability is assured
• Data also written to an in-memory structure
(memtable)
• and then to disk once the memory structure is full (an
• SStable)
Architecture Overview
• The schema used in Cassandra is mirrored after
Google
• Bigtable. It is a row-oriented, column structure
• A keyspace is akin to a database in the RDBMS world
• A column family is similar to an RDBMS table but is
more
• flexible/dynamic
• A row in a column family is indexed by its key. Other
• columns may be indexed as well
Components of Cassandra
• Node − It is the place where data is stored.
• Data center − It is a collection of related nodes.
• Cluster − A cluster is a component that contains one or more data centers.
• Commit log −The commit log is a crash-recovery mechanism in Cassandra. Every write
operation is written to the commit log.
• Mem-table − A mem-table is a memory-resident data structure. After commit log, the data
will be written to the mem-table. Sometimes, for a single-column family, there will be
multiple mem-tables.
• SSTable − It is a disk file to which the data is flushed from the mem-table when its contents
reach a threshold value.
• Bloom filter −These are nothing but quick, nondeterministic, algorithms for testing
whether an element is a member of a set. It is a special kind of cache. Bloom filters are
accessed after every query.
The Data Partition and Replication
Partition Process
• Data is transparently portioned across the nodes
• Data sent to a node is hashed and sent to partition based on hash
• The data partitioning strategy is controlled via the partitioner option inside cassandra.yaml
file
• Once a cluster in initialized with a partitioner option, it can not be changed without
reloading all of the data in the cluster
Partitioning Strategies
• Random Partitioning
• This is the default and recommended strategy.
• Partition data as evenly as possible across all nodes
• using an MD5 hash of every column family row key
• Ordered Partitioning
• Store column family row keys in sorted order across all nodes in the cluster.
• Sequential writes can cause hot spots
• More administrative overhead to load balance the cluster
• Uneven load balancing for multiple column families
Replication
• To ensure fault tolerance and no single point of failure, you can replicate one or more
copies of every row across nodes in the cluster
• Replication is controlled by the parameters replication factor and replication strategy of a
keyspace
• Replication factor controls how many copies of a row should be store in the cluster
• Replication strategy controls how the data being replicated.
Replication Strategies
• Simple Strategy
• Place the original row on a node determined by the partitioner.Additional replica rows
are placed on the new nodes clockwise in the ring.
• NetworkTopology Strategy
• Allow replication between different racks in a data center and or between multiple
data centers
• The original row is placed according the partitioner.Additional replica rows in the same
data center are then placed by walking the ring clockwise until a node in a different
rack from previous replica is found. If there is no such node, additional replicas will be
placed in the same rack.

Mais conteúdo relacionado

Mais procurados

Introduction to Cassandra Architecture
Introduction to Cassandra ArchitectureIntroduction to Cassandra Architecture
Introduction to Cassandra Architecturenickmbailey
 
Understanding Data Partitioning and Replication in Apache Cassandra
Understanding Data Partitioning and Replication in Apache CassandraUnderstanding Data Partitioning and Replication in Apache Cassandra
Understanding Data Partitioning and Replication in Apache CassandraDataStax
 
Cassandra background-and-architecture
Cassandra background-and-architectureCassandra background-and-architecture
Cassandra background-and-architectureMarkus Klems
 
NOSQL Database: Apache Cassandra
NOSQL Database: Apache CassandraNOSQL Database: Apache Cassandra
NOSQL Database: Apache CassandraFolio3 Software
 
Introduction to Apache Cassandra
Introduction to Apache Cassandra Introduction to Apache Cassandra
Introduction to Apache Cassandra Knoldus Inc.
 
Cassandra overview
Cassandra overviewCassandra overview
Cassandra overviewSean Murphy
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & FeaturesDataStax Academy
 
Introduction to cassandra
Introduction to cassandraIntroduction to cassandra
Introduction to cassandraNguyen Quang
 
Introduction to NoSQL Databases
Introduction to NoSQL DatabasesIntroduction to NoSQL Databases
Introduction to NoSQL DatabasesDerek Stainer
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to RedisDvir Volk
 
Introduction to memcached
Introduction to memcachedIntroduction to memcached
Introduction to memcachedJurriaan Persyn
 
Cassandra vs. ScyllaDB: Evolutionary Differences
Cassandra vs. ScyllaDB: Evolutionary DifferencesCassandra vs. ScyllaDB: Evolutionary Differences
Cassandra vs. ScyllaDB: Evolutionary DifferencesScyllaDB
 
How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)DataStax Academy
 
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...DataStax
 
Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...
Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...
Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...Simplilearn
 
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016DataStax
 

Mais procurados (20)

Introduction to Cassandra Architecture
Introduction to Cassandra ArchitectureIntroduction to Cassandra Architecture
Introduction to Cassandra Architecture
 
Understanding Data Partitioning and Replication in Apache Cassandra
Understanding Data Partitioning and Replication in Apache CassandraUnderstanding Data Partitioning and Replication in Apache Cassandra
Understanding Data Partitioning and Replication in Apache Cassandra
 
Cassandra background-and-architecture
Cassandra background-and-architectureCassandra background-and-architecture
Cassandra background-and-architecture
 
NOSQL Database: Apache Cassandra
NOSQL Database: Apache CassandraNOSQL Database: Apache Cassandra
NOSQL Database: Apache Cassandra
 
Introduction to Apache Cassandra
Introduction to Apache Cassandra Introduction to Apache Cassandra
Introduction to Apache Cassandra
 
Cassandra overview
Cassandra overviewCassandra overview
Cassandra overview
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & Features
 
Introduction to cassandra
Introduction to cassandraIntroduction to cassandra
Introduction to cassandra
 
Introduction to NoSQL Databases
Introduction to NoSQL DatabasesIntroduction to NoSQL Databases
Introduction to NoSQL Databases
 
Cassandra 101
Cassandra 101Cassandra 101
Cassandra 101
 
Intro to HBase
Intro to HBaseIntro to HBase
Intro to HBase
 
Google Cloud Spanner Preview
Google Cloud Spanner PreviewGoogle Cloud Spanner Preview
Google Cloud Spanner Preview
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
Introduction to memcached
Introduction to memcachedIntroduction to memcached
Introduction to memcached
 
Cassandra vs. ScyllaDB: Evolutionary Differences
Cassandra vs. ScyllaDB: Evolutionary DifferencesCassandra vs. ScyllaDB: Evolutionary Differences
Cassandra vs. ScyllaDB: Evolutionary Differences
 
How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)
 
Cassandra
CassandraCassandra
Cassandra
 
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
 
Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...
Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...
Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...
 
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
 

Semelhante a Cassandra an overview

The No SQL Principles and Basic Application Of Casandra Model
The No SQL Principles and Basic Application Of Casandra ModelThe No SQL Principles and Basic Application Of Casandra Model
The No SQL Principles and Basic Application Of Casandra ModelRishikese MR
 
UNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptxUNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptxRahul Borate
 
UNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptxUNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptxRahul Borate
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningVitsRangannavar
 
04-Introduction-to-CassandraDB-.pdf
04-Introduction-to-CassandraDB-.pdf04-Introduction-to-CassandraDB-.pdf
04-Introduction-to-CassandraDB-.pdfhothyfa
 
Cassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction GuideCassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction GuideMohammed Fazuluddin
 
Column db dol
Column db dolColumn db dol
Column db dolpoojabi
 
Big Data Storage Concepts from the "Big Data concepts Technology and Architec...
Big Data Storage Concepts from the "Big Data concepts Technology and Architec...Big Data Storage Concepts from the "Big Data concepts Technology and Architec...
Big Data Storage Concepts from the "Big Data concepts Technology and Architec...raghdooosh
 
Cassandra tech talk
Cassandra tech talkCassandra tech talk
Cassandra tech talkSatish Mehta
 
HBase in Practice
HBase in PracticeHBase in Practice
HBase in Practicelarsgeorge
 
Cassandra - A Distributed Database System
Cassandra - A Distributed Database System Cassandra - A Distributed Database System
Cassandra - A Distributed Database System Md. Shohel Rana
 
Cassandra
CassandraCassandra
Cassandraexsuns
 

Semelhante a Cassandra an overview (20)

The No SQL Principles and Basic Application Of Casandra Model
The No SQL Principles and Basic Application Of Casandra ModelThe No SQL Principles and Basic Application Of Casandra Model
The No SQL Principles and Basic Application Of Casandra Model
 
UNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptxUNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptx
 
UNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptxUNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptx
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learning
 
Cassandra tutorial
Cassandra tutorialCassandra tutorial
Cassandra tutorial
 
04-Introduction-to-CassandraDB-.pdf
04-Introduction-to-CassandraDB-.pdf04-Introduction-to-CassandraDB-.pdf
04-Introduction-to-CassandraDB-.pdf
 
NoSql
NoSqlNoSql
NoSql
 
Cassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction GuideCassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction Guide
 
Column db dol
Column db dolColumn db dol
Column db dol
 
Big Data Storage Concepts from the "Big Data concepts Technology and Architec...
Big Data Storage Concepts from the "Big Data concepts Technology and Architec...Big Data Storage Concepts from the "Big Data concepts Technology and Architec...
Big Data Storage Concepts from the "Big Data concepts Technology and Architec...
 
BigData, NoSQL & ElasticSearch
BigData, NoSQL & ElasticSearchBigData, NoSQL & ElasticSearch
BigData, NoSQL & ElasticSearch
 
cassandra.pptx
cassandra.pptxcassandra.pptx
cassandra.pptx
 
No sql databases
No sql databasesNo sql databases
No sql databases
 
Cassandra tech talk
Cassandra tech talkCassandra tech talk
Cassandra tech talk
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
HBase in Practice
HBase in PracticeHBase in Practice
HBase in Practice
 
NoSQL.pptx
NoSQL.pptxNoSQL.pptx
NoSQL.pptx
 
Revision
RevisionRevision
Revision
 
Cassandra - A Distributed Database System
Cassandra - A Distributed Database System Cassandra - A Distributed Database System
Cassandra - A Distributed Database System
 
Cassandra
CassandraCassandra
Cassandra
 

Último

Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNKTimothy Spann
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样wsppdmt
 
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
SR-101-01012024-EN.docx  Federal Constitution  of the Swiss ConfederationSR-101-01012024-EN.docx  Federal Constitution  of the Swiss Confederation
SR-101-01012024-EN.docx Federal Constitution of the Swiss ConfederationEfruzAsilolu
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制vexqp
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1ranjankumarbehera14
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubaikojalkojal131
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格q6pzkpark
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRajesh Mondal
 
PLE-statistics document for primary schs
PLE-statistics document for primary schsPLE-statistics document for primary schs
PLE-statistics document for primary schscnajjemba
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowgargpaaro
 
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制vexqp
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.pptibrahimabdi22
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样wsppdmt
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxParas Gupta
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangeThinkInnovation
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareGraham Ware
 

Último (20)

Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
SR-101-01012024-EN.docx  Federal Constitution  of the Swiss ConfederationSR-101-01012024-EN.docx  Federal Constitution  of the Swiss Confederation
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
PLE-statistics document for primary schs
PLE-statistics document for primary schsPLE-statistics document for primary schs
PLE-statistics document for primary schs
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptx
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 

Cassandra an overview

  • 2. Road Map • The Big Data Challenge • The Cassandra Solution • The CAPTheorem • The Architecture of Cassandra • The Data Partition and Replication
  • 3. The Big Data Challenge
  • 4. Data Proliferation • Data Produce Growing Exponentially • Sources of Data Growing • Mobile Devices • Cloud Services • Historical Data
  • 5. ThreeVs –Trends In Big Data • Volume: • Terabytess -> Zettabyts • Velocity: • Batch -> Streaming • Variety : • Structured -> Unstructured
  • 6. RDBMS Scaling • Traditional RDBMS : No Scale out • Cassandra : Linear Scale out
  • 8. Objective • Schema Free • Easy Replication • SimpleAPI • Consistence • Can Handle huge data NoSQLDatabase • simplicity of design, • horizontal scaling • finer control over availability
  • 9. Relational Database vs. NoSQL Relational Database • Supports powerful query language • It has a fixed schema • Follows ACID (Atomicity,Consistency, Isolation, and Durability) • Supports transactions NoSQL Database • Supports very simple query language • No Fixed Schema • It is only “eventually consistent” • Does not support transactions
  • 10. Other NoSQL Database • Apache HBase - HBase is an open source, non-relational, distributed database modeled after Google’s BigTable and is written in Java. It is developed as a part of Apache Hadoop project and runs on top of HDFS, providing BigTable-like capabilities for Hadoop. • MongoDB - MongoDB is a cross-platform document- oriented database system that avoids using the traditional table-based relational database structure in favor of JSON- like documents with dynamic schemas making the integration of data in certain types of applications easier and faster.
  • 11. What is Apache Cassandra? • Apache Cassandra™ is a free • Distributed • High performance • Extremely scalable • Fault tolerant (i.e. no single point of failure) • post-relational database solution. Cassandra can serve as both real-time data store (the “system of record”) for online/transactional applications, and as a read intensive database for business intelligence systems.
  • 12. Features of Cassandra • Elastic scalability • Always on architecture • Fast linear-scale performance • Flexible data storage • Easy data distribution • Transaction support • Fast writes
  • 13. History of Cassandra • Cassandra was developed at Facebook for inbox search. • It was open-sourced by Facebook in July 2008. • Cassandra was accepted into Apache Incubator in March 2009. • It was made an Apache top-level project since February 2010.
  • 15. CAPTheorem • Distributed System Can only provide two of • Availability • Consistency • PartitionTolerance • AKA BrewersTheorem
  • 16. Cassandra AP • Cassandra Prioritizes Availability and PartitionTolerance • Consistency is not guaranteed • Tradeoffs between latency and Consistency
  • 17. Other Approaches -CP • Eg. Hbase • Implements Row locking for consistency • HBase has master/slave & Single point of Failure • No A
  • 18. The Architecture of Cassandra
  • 19. Architecture Overview • Cassandra was designed with the understanding that system/hardware failures can and do occur • Peer-to-peer, distributed system • All nodes the same • Data partitioned among all nodes in the cluster • Custom data replication to ensure fault tolerance • Read/Write-anywhere design
  • 20. Architecture Overview • Each node communicates with each other through the • Gossip protocol, which exchanges information across the • cluster every second • A commit log is used on each node to capture write • activity. Data durability is assured • Data also written to an in-memory structure (memtable) • and then to disk once the memory structure is full (an • SStable)
  • 21. Architecture Overview • The schema used in Cassandra is mirrored after Google • Bigtable. It is a row-oriented, column structure • A keyspace is akin to a database in the RDBMS world • A column family is similar to an RDBMS table but is more • flexible/dynamic • A row in a column family is indexed by its key. Other • columns may be indexed as well
  • 22. Components of Cassandra • Node − It is the place where data is stored. • Data center − It is a collection of related nodes. • Cluster − A cluster is a component that contains one or more data centers. • Commit log −The commit log is a crash-recovery mechanism in Cassandra. Every write operation is written to the commit log. • Mem-table − A mem-table is a memory-resident data structure. After commit log, the data will be written to the mem-table. Sometimes, for a single-column family, there will be multiple mem-tables. • SSTable − It is a disk file to which the data is flushed from the mem-table when its contents reach a threshold value. • Bloom filter −These are nothing but quick, nondeterministic, algorithms for testing whether an element is a member of a set. It is a special kind of cache. Bloom filters are accessed after every query.
  • 23.
  • 24. The Data Partition and Replication
  • 25. Partition Process • Data is transparently portioned across the nodes • Data sent to a node is hashed and sent to partition based on hash • The data partitioning strategy is controlled via the partitioner option inside cassandra.yaml file • Once a cluster in initialized with a partitioner option, it can not be changed without reloading all of the data in the cluster
  • 26. Partitioning Strategies • Random Partitioning • This is the default and recommended strategy. • Partition data as evenly as possible across all nodes • using an MD5 hash of every column family row key • Ordered Partitioning • Store column family row keys in sorted order across all nodes in the cluster. • Sequential writes can cause hot spots • More administrative overhead to load balance the cluster • Uneven load balancing for multiple column families
  • 27. Replication • To ensure fault tolerance and no single point of failure, you can replicate one or more copies of every row across nodes in the cluster • Replication is controlled by the parameters replication factor and replication strategy of a keyspace • Replication factor controls how many copies of a row should be store in the cluster • Replication strategy controls how the data being replicated.
  • 28. Replication Strategies • Simple Strategy • Place the original row on a node determined by the partitioner.Additional replica rows are placed on the new nodes clockwise in the ring. • NetworkTopology Strategy • Allow replication between different racks in a data center and or between multiple data centers • The original row is placed according the partitioner.Additional replica rows in the same data center are then placed by walking the ring clockwise until a node in a different rack from previous replica is found. If there is no such node, additional replicas will be placed in the same rack.

Notas do Editor

  1. Apache Cassandra is a highly scalable, high-performance distributed database designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. It is a type of NoSQL database. Let us first understand what a NoSQL database does. NoSQLDatabase A NoSQL database (sometimes called as Not Only SQL) is a database that provides a mechanism to store and retrieve data other than the tabular relations used in relational databases. These databases are schema-free, support easy replication, have simple API, eventually consistent, and can handle huge amounts of data. The primary objective of a NoSQL database is to have simplicity of design, horizontal scaling, and finer control over availability. NoSql databases use different data structures compared to relational databases. It makes some operations faster in NoSQL. The suitability of a given NoSQL database depends on the problem it must solve.