Talk About Apache Cassandra
           Boris Yen
           byan@ruckuswireless.com
Outline
•   Overview
•   Architecture Overview
•   Partitioning and Replication
•   Data Consistency
Overview
• Distributed
  – Data partitioned among all nodes
• Extremely Scalable
  – Add new node = Add more capacity
  – Easy to add new node
• Fault tolerant
  – All nodes are the same
  – Read/Write anywhere
  – Automatic Data replication
Overview
• High Performance
  – Benchmark figure source: http://blog.cubrid.org/dev-platform/nosql-benchmarking/
• Schema-less (Not completely true)
  – Need to provide basic settings for each column family.
Architecture Overview
• Keyspace
  – Where the replication strategy and replication factor
    is defined
  – RDBMS synonym: Database
• Column family
  – Standard (recommended) or Super
  – Lots of settings can be defined
  – RDBMS synonym: Table
• Row/Record
  – Indexed by key. Columns might be indexed as well
  – Column names are sorted based on the comparator
  – Each column has its own timestamp
Architecture Overview
Standard CF (recommended; super columns can largely be replaced by composite columns)
{
  Key1: {
    column1: value,
    column2: value
  },
  Key2: {
    column1: value,
    column2: value
  }
}

Super CF
{
  Key1: {
    super_column1: {
      subColumn1: value,
      subColumn2: value
    },
    super_column2: {
      subColumn1: value,
      subColumn2: value
    }
  },
  Key2: {
    super_column1: {
      subColumn1: value,
      subColumn2: value
    }
  }
}
Architecture Overview
• Commit log
  – Used to capture write activities. Data durability is
    assured.
• Memtable
  – Used to store most recent write activities.
• SSTable
  – When a memtable is flushed to disk, it becomes
    an SSTable.
Architecture Overview
• Data write path
  – Data -> Commit log -> Memtable -> (flushed) -> SSTable
Architecture Overview
• Data read path
  – Search the row cache. If it holds the row, return the
    result; no further action is needed.
  – If the row cache misses, read the data from the
    memtable(s) and SSTable(s), collate the results,
    and return them.
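The write and read paths above can be sketched as a toy, single-node model. This is only an illustration under stated assumptions, not Cassandra's actual storage engine: the class name `ToyNode`, the `memtable_limit` flush trigger, and the in-memory lists standing in for on-disk files are all hypothetical.

```python
class ToyNode:
    """Toy model of one node's write and read path (not Cassandra's real code)."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # row key -> {column: (value, timestamp)}
        self.sstables = []     # flushed memtables, treated as immutable
        self.row_cache = {}    # row key -> merged row
        self.memtable_limit = memtable_limit  # hypothetical flush threshold (rows)

    def write(self, key, column, value, timestamp):
        # 1. Record the write in the commit log first, for durability.
        self.commit_log.append((key, column, value, timestamp))
        # 2. Apply it to the memtable, keeping the newest timestamp per column.
        row = self.memtable.setdefault(key, {})
        if column not in row or timestamp >= row[column][1]:
            row[column] = (value, timestamp)
        self.row_cache.pop(key, None)  # any cached copy of the row is now stale
        # 3. Flush once the memtable holds enough rows.
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # A flushed memtable becomes an SSTable; a fresh memtable takes over.
        self.sstables.append(self.memtable)
        self.memtable = {}

    def read(self, key):
        # Row cache hit: return immediately.
        if key in self.row_cache:
            return self.row_cache[key]
        # Otherwise collate SSTables and the memtable; newest timestamp wins.
        merged = {}
        for table in self.sstables + [self.memtable]:
            for column, (value, ts) in table.get(key, {}).items():
                if column not in merged or ts > merged[column][1]:
                    merged[column] = (value, ts)
        row = {column: value for column, (value, _) in merged.items()}
        if row:
            self.row_cache[key] = row
        return row


node = ToyNode(memtable_limit=2)
node.write("boris", "first name", "boris", timestamp=1)
node.write("alice", "first name", "alice", timestamp=2)  # second row triggers a flush
node.write("boris", "last name", "Yen", timestamp=3)     # lands in the new memtable
print(node.read("boris"))  # collated from one SSTable and the memtable
```

The read for "boris" has to merge one column from the flushed SSTable with one from the current memtable, which is the collation step the slide describes.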
Partitioning and Replication
• In Cassandra, the total data managed by the
  cluster is represented as a circular space or ring.
• The ring is divided up into ranges equal to the
  number of nodes, with each node being
  responsible for one or more ranges of the overall
  data.
• Before a node can join the ring, it must be
  assigned a token. The token determines the
  node’s position on the ring and the range of data
  it is responsible for.
Partitioning


Data is inserted and assigned a row key in a column family.

{
    boris: {
      first name: boris,
      last name: Yen
    }
}

Data is placed on a node based on its row key (in the ring diagram, “boris” is
inserted on one node).
Partitioning Strategies
• Random Partitioning
  – This is the default and recommended strategy.
    Partition data as evenly as possible across all nodes
    using an MD5 hash of every column family row key
• Ordered Partitioning
  – Store column family row keys in sorted order across all
    nodes in the cluster.
  – Sequential writes can cause hot spots
  – More administrative overhead to load balance the
    cluster
  – Uneven load balancing for multiple column families
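As a rough sketch of random partitioning, the snippet below hashes a row key with MD5 and finds the node whose token range contains the result. The ring size of 2**127, the modulo reduction, and the helper names `token_for`, `owner_for_token` and `owner` are assumptions made for illustration; the real partitioner differs in detail.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # token space assumed for this sketch


def token_for(row_key: str) -> int:
    """Map a row key to a position on the ring using an MD5 hash."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def owner_for_token(token: int, ring):
    """ring: list of (token, node_name) sorted by token.

    A node owns the range that ends at its own token, so the owner is the
    first node whose token is >= the key's token, wrapping past the end.
    """
    tokens = [t for t, _ in ring]
    index = bisect.bisect_left(tokens, token)
    return ring[index % len(ring)][1]


def owner(row_key: str, ring):
    return owner_for_token(token_for(row_key), ring)


# Four hypothetical nodes with evenly spaced tokens.
ring = [(i * RING_SIZE // 4, f"node{i}") for i in range(4)]
print(owner("boris", ring))
```

Because MD5 output is close to uniform, keys spread roughly evenly over evenly spaced tokens, which is why this strategy avoids the hot spots of ordered partitioning.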
Setting up data Partitioning
• The data partitioning strategy is controlled via
  the partitioner option inside cassandra.yaml
  file
• Once a cluster is initialized with a partitioner
  option, it cannot be changed without
  reloading all of the data in the cluster.
Replication
• To ensure fault tolerance and no single point of
  failure, you can replicate one or more copies of
  every row across nodes in the cluster
• Replication is controlled by the parameters
  replication factor and replication strategy of a
  keyspace
• Replication factor controls how many copies of a
  row should be stored in the cluster
• Replication strategy controls how the data is
  replicated.
Replication
RF=3

Data is inserted and assigned a row key in a column family.

{
    boris: {
      first name: boris,
      last name: Yen
    }
}

Copies of the row are replicated across various nodes based on the assigned
replication factor (in the ring diagram, “boris” is inserted on three nodes).
Replication Strategies
• Simple Strategy
  – Place the original row on a node determined by the
    partitioner. Additional replica rows are placed on the
    next nodes clockwise in the ring.
• Network Topology Strategy
  – Allows replication between different racks in a data
    center and/or between multiple data centers
  – The original row is placed according to the partitioner.
    Additional replica rows in the same data center are
    then placed by walking the ring clockwise until a node
    in a different rack from the previous replica is found. If
    there is no such node, additional replicas will be
    placed in the same rack.
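The two placement rules above can be sketched for a single data center as follows. This is a simplified illustration, not Cassandra's implementation: the function names and the node and rack labels are hypothetical, and the rack-aware variant prefers any rack not used yet rather than comparing only against the previous replica.

```python
def simple_strategy(ring, primary_index, rf):
    """ring: node names in ring order. Take rf nodes walking clockwise."""
    n = len(ring)
    return [ring[(primary_index + i) % n] for i in range(min(rf, n))]


def rack_aware_strategy(ring, racks, primary_index, rf):
    """Single data center sketch: prefer nodes in racks that hold no replica yet.

    racks maps node name -> rack name. If no unused rack is left, fall back
    to the skipped nodes in ring order (replicas then share a rack).
    """
    n = len(ring)
    clockwise = [ring[(primary_index + i) % n] for i in range(n)]
    replicas = [clockwise[0]]              # original row: chosen by the partitioner
    used_racks = {racks[clockwise[0]]}
    skipped = []
    for node in clockwise[1:]:
        if len(replicas) == rf:
            break
        if racks[node] not in used_racks:
            replicas.append(node)
            used_racks.add(racks[node])
        else:
            skipped.append(node)
    for node in skipped:                   # fallback: reuse racks if needed
        if len(replicas) == rf:
            break
        replicas.append(node)
    return replicas


ring = ["n1", "n2", "n3", "n4"]
racks = {"n1": "rack1", "n2": "rack1", "n3": "rack2", "n4": "rack2"}
print(simple_strategy(ring, 0, 3))             # ['n1', 'n2', 'n3']
print(rack_aware_strategy(ring, racks, 0, 2))  # ['n1', 'n3']
```

With RF=2, the simple strategy would put both copies in rack1, while the rack-aware walk skips n2 and places the second copy in rack2.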
Replication - Network Topology Strategy

     RF={DC1:2, DC2:2}

    Figure source: http://www.datastax.com/docs/1.0/cluster_architecture/replication
Replication Mechanics
• Cassandra uses a snitch to define how nodes
  are grouped together within the overall
  network topology, such as rack and data
  center groupings.
• The snitch is defined in the cassandra.yaml file
Replication Mechanics - Snitches
• Simple Snitch
   – The default; used with the simple replication strategy
• Rack Inferring Snitch
   – Infers the topology of the network by analyzing the
     node IP addresses. This snitch assumes that the
     second octet identifies the data center where a node
     is located, and third octet identifies the rack
• Property File Snitch
   – Determines the location of nodes by referring to a
     user-defined file, cassandra-topology.properties
• EC2 Snitch
   – Is for deployments on Amazon EC2 only
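The rack inferring rule can be illustrated with a few lines of Python. This is only a sketch of the octet convention described above; the function name `infer_location` and the returned dictionary shape are hypothetical.

```python
def infer_location(ip_address: str) -> dict:
    """Infer data center and rack from an IPv4 address.

    Follows the convention described above: the second octet identifies the
    data center and the third octet identifies the rack.
    """
    octets = ip_address.split(".")
    if len(octets) != 4:
        raise ValueError(f"not a dotted IPv4 address: {ip_address!r}")
    return {"data_center": int(octets[1]), "rack": int(octets[2])}


print(infer_location("10.20.30.40"))  # {'data_center': 20, 'rack': 30}
```

Two nodes at 10.20.30.40 and 10.20.31.7 would be treated as the same data center but different racks, so this snitch only works if the IP plan really follows that layout.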
Data Consistency
• Cassandra supports tunable data consistency
• Choose from strong and eventual consistency
  depending on the need
• Can be done on a per-operation basis, and for
  both reads and writes.
• Handles multi-data center operations
Consistency Level for Writes
• Any
   – A write must succeed on any available node (hint included)
• One
   – A write must succeed on any node responsible for that row
     (either primary or replica)
• Quorum
   – A write must succeed on a quorum of replica nodes (RF/2 + 1, with RF/2 rounded down)
• Local_Quorum
   – A write must succeed on a quorum of replica nodes in the same
     data center as the coordinator node.
• Each_Quorum
   – A write must succeed on a quorum of replica nodes in all data
     centers
• All
   – A write must succeed on all replica nodes for a row key
Consistency Level for Reads
• One
   – Reads from the closest node holding the data
• Quorum
   – Returns a result from a quorum of servers with the most recent
     timestamp for the data
• Local_Quorum
   – Returns a result from a quorum of servers with the most recent
     timestamp for the data in the same data center as the
     coordinator node
• Each_Quorum
   – Returns a result from a quorum of servers with the most recent
     timestamp in all data centers
• All
   – Returns a result from all replica nodes for a row key
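The quorum size used by the Quorum levels follows the RF/2 + 1 formula, with integer division. The second helper below applies the commonly cited rule that reads see the latest write when the number of replicas written plus the number read exceeds the replication factor; that rule and both function names are additions for illustration, not taken from the slides.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that form a quorum: RF/2 + 1 with integer division."""
    return replication_factor // 2 + 1


def read_sees_latest_write(write_replicas: int, read_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    return write_replicas + read_replicas > replication_factor


rf = 3
print(quorum(rf))                                          # 2
print(read_sees_latest_write(quorum(rf), quorum(rf), rf))  # True: Quorum + Quorum
print(read_sees_latest_write(1, 1, rf))                    # False: One + One
```

Writing at All and reading at One also satisfies the overlap rule, at the cost of writes failing whenever any replica is down.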
Built-in Consistency Repair Features
• Read Repair
  – When a read is done, the coordinator node
    compares, in the background, the data from all the
    remaining replicas that own the row. If they
    are inconsistent, it issues writes to the out-of-date
    replicas to update the row.
• Anti-Entropy Node Repair
• Hinted Handoff
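Read repair can be sketched as a comparison of per-column timestamps across replicas. This is a minimal illustration assuming each replica returns its row as a mapping of column to (value, timestamp); the function name `read_repair` and the data shapes are hypothetical.

```python
def read_repair(replica_rows):
    """replica_rows: {replica_name: {column: (value, timestamp)}}.

    Returns (newest, repairs): newest holds the winning cell per column,
    repairs maps each out-of-date replica to the cells it needs rewritten.
    """
    newest = {}
    for row in replica_rows.values():
        for column, (value, ts) in row.items():
            if column not in newest or ts > newest[column][1]:
                newest[column] = (value, ts)

    repairs = {}
    for replica, row in replica_rows.items():
        stale = {c: cell for c, cell in newest.items() if row.get(c) != cell}
        if stale:
            repairs[replica] = stale
    return newest, repairs


rows = {
    "replica_a": {"last name": ("Yen", 2)},
    "replica_b": {"last name": ("Yan", 1)},  # older write
    "replica_c": {},                         # missed the write entirely
}
newest, repairs = read_repair(rows)
print(newest)           # {'last name': ('Yen', 2)}
print(sorted(repairs))  # ['replica_b', 'replica_c']
```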
What is New in 1.0
• Column Family Compression
  – 2x-4x reduction in data size
  – 25-35% performance improvement on reads
  – 5-10% performance improvement on writes
• Improved Memory and Disk Space Management
  – Off-heap row cache
  – Storage engine self-tuning
  – Faster disk space reclamation
• Tunable Compaction Strategy
  – Support LevelDB style compaction algorithm that can
    be enabled on a per-column family basis.
What is New in 1.0
• Cassandra Windows Service
• Improved Write Consistency and Performance
  – Hint data is stored more efficiently
  – Coordinator nodes no longer need to wait for the
    failure detector to mark a node as down before
    saving hints for unresponsive nodes.
     • Running a full node repair to reconcile missed writes is
       not necessary. Full node repair is only necessary when
       multiple nodes fail simultaneously or a node is lost entirely
     • Default read repair probability has been reduced from
       100% to 10%
Anti-Patterns
• Non-Sun JVM
• CommitLog+Data on the same Disk
  – Does not apply to SSDs or EC2
• Oversized JVM heaps
  – 6-8 GB is good
  – 10-12 is possible and in some circumstances
    “correct”
  – 16GB == max JVM heap size
  – > 16GB => badness
       http://www.slideshare.net/mattdennis/cassandra-antipatterns
Anti-Patterns
• Large batch mutations
  – Timeout => entire mutation must be retried =>
    wasted work
  – Keep the batch mutations to 10-100 (this really
    depends on the HW)
• Ordered partitioner
  – Creates hot spots
  – Requires extra care from operators
• Cassandra auto selection of tokens
  – Always specify your initial token.
       http://www.slideshare.net/mattdennis/cassandra-antipatterns
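For the advice to always specify the initial token, one common approach with the random partitioner is to space tokens evenly over the token range. The sketch below assumes a range of 0 to 2**127; the function name `initial_tokens` is hypothetical.

```python
def initial_tokens(node_count: int, ring_size: int = 2 ** 127):
    """Evenly spaced initial tokens, one per node, assuming a 0..2**127 ring."""
    return [i * ring_size // node_count for i in range(node_count)]


# Tokens for a hypothetical four-node cluster, one per node.
for index, token in enumerate(initial_tokens(4)):
    print(f"node {index}: initial_token = {token}")
```

Each value would go into that node's initial token setting, so every node ends up responsible for an equal share of the ring.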
Anti-Patterns
• Super Column
  – 10-15 percent performance penalty on reads and
    writes
  – Easier/better to use composite columns
• Read Before write
• Winblows



       http://www.slideshare.net/mattdennis/cassandra-antipatterns
Want to Learn More
• http://www.datastax.com/resources/tutorials
• http://www.datastax.com/docs/1.0/index




P.S. Most of the content in this presentation comes from the websites above.
Q&A
We are hiring people
• If you are interested in what we are doing,
  please contact us.

 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 

Talk About Apache Cassandra

  • 1. Talk About Apache Cassandra Boris Yen byan@ruckuswireless.com
  • 2. Outline • Overview • Architecture Overview • Partitioning and Replication • Data Consistency
  • 3. Overview • Distributed – Data partitioned among all nodes • Extremely Scalable – Add new node = Add more capacity – Easy to add new node • Fault tolerant – All nodes are the same – Read/Write anywhere – Automatic Data replication
  • 4. Overview • High Performance http://blog.cubrid.org/dev-platform/nosql-benchmarking/ • Schema-less (Not completely true) – Need to provide basic settings for each column family.
  • 5. Architecture Overview • Keyspace – Where the replication strategy and replication factor are defined – RDBMS synonym: Database • Column family – Standard (recommended) or Super – Lots of settings can be defined – RDBMS synonym: Table • Row/Record – Indexed by key. Columns might be indexed as well – Column names are sorted based on the comparator – Each column has its own timestamp
  • 6. Architecture Overview • Standard CF (Recommended. Super columns can largely be replaced by composite columns.): { Key1: { column1: value, column2: value }, Key2: { column1: value, column2: value } } • Super CF: { Key1: { super_column1: { subColumn1: value, subColumn2: value }, super_column2: { subColumn1: value, subColumn2: value } }, Key2: { super_column1: { subColumn1: value, subColumn2: value } } }
  • 7. Architecture Overview • Commit log – Used to capture write activities. Data durability is assured. • Memtable – Used to store the most recent write activities. • SSTable – When a memtable is flushed to disk, it becomes an SSTable.
  • 8. Architecture Overview • Data write path: Data → Commit log → Memtable → (flushed) → SSTable
  • 9. Architecture Overview • Data read path – Search the row cache. If the result is not empty, return it; no further action is needed. – If there is no hit in the row cache, try to get the data from the Memtable(s) and SSTable(s), collate the results, and return.
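
The write and read paths on slides 8 and 9 can be sketched as a toy, in-memory model. This is only an illustration under simplifying assumptions, not Cassandra's actual storage engine: the class name `ToyNode`, the `flush_threshold` parameter, and the cache invalidation on write are all hypothetical choices made for the example.

```python
class ToyNode:
    """Toy model of one node's storage path (hypothetical, not Cassandra code)."""

    def __init__(self, flush_threshold=2):
        self.commit_log = []   # append-only record of every write (durability)
        self.memtable = {}     # row key -> {column: (value, timestamp)}
        self.sstables = []     # flushed memtables, treated as immutable
        self.row_cache = {}    # row key -> collated row
        self.flush_threshold = flush_threshold  # assumed: flush after N rows

    def write(self, key, column, value, timestamp):
        # Write path: commit log first, then the memtable.
        self.commit_log.append((key, column, value, timestamp))
        row = self.memtable.setdefault(key, {})
        current = row.get(column)
        if current is None or timestamp >= current[1]:
            row[column] = (value, timestamp)
        # Toy simplification: drop any cached copy of the row.
        self.row_cache.pop(key, None)
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # A flushed memtable becomes an SSTable; a fresh memtable takes over.
        self.sstables.append(self.memtable)
        self.memtable = {}

    def read(self, key):
        # Read path: row cache first, then memtable and SSTables.
        if key in self.row_cache:
            return self.row_cache[key]
        merged = {}
        for table in self.sstables + [self.memtable]:
            for column, (value, ts) in table.get(key, {}).items():
                # Collate: the column with the newest timestamp wins.
                if column not in merged or ts > merged[column][1]:
                    merged[column] = (value, ts)
        row = {column: value for column, (value, _) in merged.items()}
        if row:
            self.row_cache[key] = row
        return row


node = ToyNode(flush_threshold=2)
node.write("boris", "first name", "boris", timestamp=1)
node.write("other", "first name", "someone", timestamp=2)  # triggers a flush
node.write("boris", "last name", "Yen", timestamp=3)
boris_row = node.read("boris")  # collated from one SSTable and the memtable
```

Here the row for `boris` is assembled from a flushed SSTable and the live memtable, which is the collation step the read path describes.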
  • 10. Partitioning and Replication • In Cassandra, the total data managed by the cluster is represented as a circular space or ring. • The ring is divided up into ranges equal to the number of nodes, with each node being responsible for one or more ranges of the overall data. • Before a node can join the ring, it must be assigned a token. The token determines the node’s position on the ring and the range of data it is responsible for.
  • 11. Partitioning • Data is inserted and assigned a row key in a column family: { boris: { first name: boris, last name: Yen } } • Data is placed on a node based on its row key (diagram: “boris” is inserted at one position on the ring)
  • 12. Partitioning Strategies • Random Partitioning – This is the default and recommended strategy. It partitions data as evenly as possible across all nodes using an MD5 hash of every column family row key. • Ordered Partitioning – Stores column family row keys in sorted order across all nodes in the cluster. – Sequential writes can cause hot spots – More administrative overhead to load balance the cluster – Uneven load balancing for multiple column families
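
A minimal sketch of random partitioning, assuming a token ring of size 2^127 and evenly spaced node tokens. The function names are hypothetical, and reducing the MD5 digest modulo the ring size is a simplification for illustration rather than the exact computation Cassandra performs.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # assumed token range for this sketch


def key_token(row_key):
    """Map a row key to a ring position using its MD5 hash."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def initial_tokens(node_count):
    """Evenly spaced tokens, one per node, so ranges are the same size."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


def owner_index(tokens, row_key):
    """Index of the first node whose token is >= the key's token.

    Keys that hash past the last token wrap around to the first node.
    `tokens` must be sorted in ascending order.
    """
    position = bisect.bisect_left(tokens, key_token(row_key))
    return position % len(tokens)


tokens = initial_tokens(4)
boris_owner = owner_index(tokens, "boris")  # some index in range(4)
```

Because MD5 output is effectively uniform, keys spread roughly evenly over the nodes, whereas an ordered partitioner would send sequential keys to the same node.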
  • 13. Setting up data Partitioning • The data partitioning strategy is controlled via the partitioner option in the cassandra.yaml file • Once a cluster is initialized with a partitioner option, it cannot be changed without reloading all of the data in the cluster.
  • 14. Replication • To ensure fault tolerance and no single point of failure, you can replicate one or more copies of every row across nodes in the cluster • Replication is controlled by two keyspace parameters: replication factor and replication strategy • Replication factor controls how many copies of a row are stored in the cluster • Replication strategy controls how the data is replicated.
  • 15. Replication (RF=3) • Data is inserted and assigned a row key in a column family: { boris: { first name: boris, last name: Yen } } • Copies of the row are replicated across various nodes based on the assigned replication factor (diagram: “boris” is inserted at three positions on the ring)
  • 16. Replication Strategies • Simple Strategy – Place the original row on a node determined by the partitioner. Additional replica rows are placed on the next nodes clockwise in the ring. • Network Topology Strategy – Allows replication between different racks in a data center and/or between multiple data centers – The original row is placed according to the partitioner. Additional replica rows in the same data center are then placed by walking the ring clockwise until a node in a different rack from the previous replica is found. If there is no such node, additional replicas will be placed in the same rack.
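
The two placement rules above can be sketched as follows. This is a simplified illustration for a single data center, not Cassandra's implementation: the ring is modeled as a list of node names in token order, and the function names and the `rack_of` mapping are hypothetical. The rack-aware variant prefers racks that hold no replica yet, which is one reasonable reading of the rule on the slide.

```python
def simple_strategy(ring, primary, rf):
    """Primary node plus the next rf - 1 nodes clockwise on the ring."""
    n = len(ring)
    return [ring[(primary + i) % n] for i in range(min(rf, n))]


def rack_aware_strategy(ring, rack_of, primary, rf):
    """Walk clockwise, preferring nodes on racks without a replica yet.

    Assumes rf >= 1. If there are not enough distinct racks, the
    remaining replicas fall back to nodes on already-used racks.
    """
    n = len(ring)
    clockwise = [ring[(primary + i) % n] for i in range(n)]
    replicas = [clockwise[0]]
    used_racks = {rack_of[clockwise[0]]}

    # First pass: only accept nodes on a rack that holds no replica.
    for node in clockwise[1:]:
        if len(replicas) == rf:
            break
        if rack_of[node] not in used_racks:
            replicas.append(node)
            used_racks.add(rack_of[node])

    # Second pass: no unused rack left, so reuse racks in ring order.
    for node in clockwise[1:]:
        if len(replicas) == rf:
            break
        if node not in replicas:
            replicas.append(node)
    return replicas


ring = ["n1", "n2", "n3", "n4"]  # hypothetical nodes in token order
rack_of = {"n1": "r1", "n2": "r1", "n3": "r2", "n4": "r2"}

simple = simple_strategy(ring, primary=3, rf=3)
two_racks = rack_aware_strategy(ring, rack_of, primary=0, rf=2)
fallback = rack_aware_strategy(ring, rack_of, primary=0, rf=3)
```

With RF=2 the rack-aware walk skips `n2` because it shares a rack with `n1`; with RF=3 only two racks exist, so the third replica falls back to `n2`.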
  • 17. Replication - Network Topology Strategy RF={DC1:2, DC2:2} http://www.datastax.com/docs/1.0/cluster_architecture/replication
  • 18. Replication Mechanics • Cassandra uses a snitch to define how nodes are grouped together within the overall network topology, such as rack and data center groupings. • The snitch is defined in the cassandra.yaml file
  • 19. Replication Mechanics - Snitches • Simple Snitch – The default; used with the simple replication strategy • Rack Inferring Snitch – Infers the topology of the network by analyzing the node IP addresses. This snitch assumes that the second octet identifies the data center where a node is located, and the third octet identifies the rack • Property File Snitch – Determines the location of nodes by referring to a user-defined file, cassandra-topology.properties • EC2 Snitch – For deployments on Amazon EC2 only
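
The octet convention used by the Rack Inferring Snitch can be illustrated in a few lines. The function name and the returned dictionary shape are hypothetical; only the rule (second octet is the data center, third octet is the rack) comes from the slide.

```python
import ipaddress


def infer_location(ip):
    """Infer data center and rack from an IPv4 address by octet position."""
    # IPv4Address validates the input and raises ValueError if it is malformed.
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return {"data_center": octets[1], "rack": octets[2]}


location = infer_location("10.20.30.40")
same_rack = infer_location("10.20.30.41") == location
```

Two nodes whose addresses differ only in the last octet are treated as sharing a rack, so this snitch only works when the IP plan actually follows that layout.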
  • 20. Data Consistency • Cassandra supports tunable data consistency • Choose from strong and eventual consistency depending on the need • Can be done on a per-operation basis, and for both reads and writes. • Handles multi-data center operations
  • 21. Consistency Level for Writes • Any – A write must succeed on any available node (hint included) • One – A write must succeed on any node responsible for that row (either primary or replica) • Quorum – A write must succeed on a quorum of replica nodes (RF/2 + 1) • Local_Quorum – A write must succeed on a quorum of replica nodes in the same data center as the coordinator node. • Each_Quorum – A write must succeed on a quorum of replica nodes in all data centers • All – A write must succeed on all replica nodes for a row key
  • 22. Consistency Level for Reads • One – Reads from the closest node holding the data • Quorum – Returns a result from a quorum of servers with the most recent timestamp for the data • Local_Quorum – Returns a result from a quorum of servers with the most recent timestamp for the data in the same data center as the coordinator node • Each_Quorum – Returns a result from a quorum of servers with the most recent timestamp in all data centers • All – Returns a result from all replica nodes for a row key
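
The quorum size of RF/2 + 1 (integer division) and the way read and write levels combine can be worked through in a short sketch. The helper names are hypothetical, and only the single-data-center levels One, Quorum, and All are modeled. The rule that a read is guaranteed to see the latest write when the read and write replica counts together exceed RF is the usual way of stating strong consistency, not something spelled out on the slides.

```python
def quorum(rf):
    """Quorum size for a replication factor: RF/2 + 1 with integer division."""
    return rf // 2 + 1


def replicas_required(level, rf):
    """Number of replicas that must respond for a consistency level."""
    required = {"ONE": 1, "QUORUM": quorum(rf), "ALL": rf}
    return required[level]


def is_strongly_consistent(write_level, read_level, rf):
    """True when every read set must overlap every write set (R + W > RF)."""
    w = replicas_required(write_level, rf)
    r = replicas_required(read_level, rf)
    return r + w > rf


quorum_rf3 = quorum(3)  # 3 // 2 + 1
quorum_both = is_strongly_consistent("QUORUM", "QUORUM", rf=3)
one_both = is_strongly_consistent("ONE", "ONE", rf=3)
```

With RF=3, quorum writes and quorum reads touch two replicas each, so at least one replica is in both sets; writing and reading at One gives only eventual consistency.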
  • 23. Built-in Consistency Repair Features • Read Repair – When a read is done, the coordinator node compares, in the background, the data from all the remaining replicas that own the row and, if they are inconsistent, issues writes to the out-of-date replicas to update the row. • Anti-Entropy Node Repair • Hinted Handoff
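
Read repair can be sketched as a timestamp comparison across replicas. This is a toy model under stated assumptions: each replica's copy of a single column is represented as a `(value, timestamp)` tuple in a dictionary, and `read_repair` is a hypothetical helper, not a Cassandra API.

```python
def read_repair(replica_rows):
    """Return the newest value and update stale replicas in place.

    replica_rows maps a replica name to its (value, timestamp) copy.
    Returns (newest_value, names_of_repaired_replicas).
    """
    # The copy with the most recent timestamp wins.
    newest = max(replica_rows.values(), key=lambda copy: copy[1])
    stale = [name for name, copy in replica_rows.items() if copy[1] < newest[1]]
    # "Issue writes" to the out-of-date replicas.
    for name in stale:
        replica_rows[name] = newest
    return newest[0], stale


rows = {"n1": ("old", 1), "n2": ("new", 5), "n3": ("old", 1)}
value, repaired = read_repair(rows)
```

After the call, every replica in `rows` holds the newest copy, which mirrors how a read gradually brings out-of-date replicas back in line.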
  • 24. What is New in 1.0 • Column Family Compression – 2x-4x reduction in data size – 25-35% performance improvement on reads – 5-10% performance improvement on writes • Improved Memory and Disk Space Management – Off-heap row cache – Storage engine self-tuning – Faster disk space reclamation • Tunable Compaction Strategy – Support LevelDB style compaction algorithm that can be enabled on a per-column family basis.
  • 25. What is New in 1.0 • Cassandra Windows Service • Improved Write Consistency and Performance – Hint data is stored more efficiently – Coordinator nodes no longer need to wait for the failure detector to mark a node as down before saving hints for unresponsive nodes. • Running a full node repair to reconcile missed writes is no longer necessary. Full node repair is only necessary after simultaneous multi-node failures or after losing a node entirely • Default read repair probability has been reduced from 100% to 10%
  • 26. Anti-Patterns • Non-Sun JVM • CommitLog+Data on the same Disk – Does not apply to SSDs or EC2 • Oversized JVM heaps – 6-8 GB is good – 10-12 is possible and in some circumstances “correct” – 16GB == max JVM heap size – > 16GB => badness http://www.slideshare.net/mattdennis/cassandra-antipatterns
  • 27. Anti-Patterns • Large batch mutations – Timeout => entire mutation must be retried => wasted work – Keep the batch mutations to 10-100 (this really depends on the HW) • Ordered partitioner – Creates hot spots – Requires extra care from operators • Cassandra auto selection of tokens – Always specify your initial token. http://www.slideshare.net/mattdennis/cassandra-antipatterns
  • 28. Anti-Patterns • Super Column – 10-15 percent performance penalty on reads and writes – Easier/better to use composite columns • Read before write • Winblows http://www.slideshare.net/mattdennis/cassandra-antipatterns
  • 29. Want to Learn More • http://www.datastax.com/resources/tutorials • http://www.datastax.com/docs/1.0/index P.S. Most of the content in this presentation comes from the websites above
  • 30. Q&A
  • 31. We are hiring people • If you are interested in what we are doing, please contact us.