Cassandra
    Overview

         
What Is It?
    ●   It is a persistent database, but not an
        RDBMS – more on API later
    ●   It can run as a single instance or as a part
        of a cluster.
    ●   All nodes are equal, no master, no slaves
    ●   The cluster can be distributed within a
        single DC or across multiple DCs.
    ●   Multiple DCs can be Active-Active for
        performance or Active-Passive for DR
                              
Simple API
    ●   Get, Put, Delete – all by key
    ●   Batch put and delete – save wire time
    ●   Range queries (iterate over sequence of
        keys)
    ●   Target individual columns within a row –
        Get and Put
    ●   Native integration available for Hadoop
        MapReduce
    ●   CQL – an SQL-like query language
                              
Consistent Hash Ring
    ●   Conceptually all nodes in a cluster are on
        a ring of hash values, “tokens”
    ●   Each node is assigned a token range on
        the ring
    ●   A key's hash (token) places it on the ring,
        within a specific node's token range
    ●   The hash is consistent, meaning the
        location of data is consistent and
        predictable
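
A minimal sketch of the ring lookup described above. It assumes an MD5-based hash reduced into a 0 to 2^127 token space and evenly spaced node tokens. The helper names (token_for, build_ring, primary_node) and the range-ownership convention are illustrative assumptions, not Cassandra's actual implementation.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # size of the token space assumed for this sketch


def token_for(key: str) -> int:
    """Hash a key to a token on the ring (MD5 reduced into the token space)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def build_ring(node_names):
    """Give each node an evenly spaced token; return a sorted (token, node) list."""
    step = RING_SIZE // len(node_names)
    return [(i * step, name) for i, name in enumerate(node_names)]


def primary_node(ring, token: int) -> str:
    """Return the node whose token range contains the given token.

    Assumed convention: a node owns the tokens in (previous token, own token],
    and the ring wraps around after the last node.
    """
    tokens = [t for t, _ in ring]
    index = bisect.bisect_left(tokens, token) % len(ring)
    return ring[index][1]


ring = build_ring(["N1", "N2", "N3", "N4", "N5", "N6", "N7", "N8"])
owner = primary_node(ring, token_for("K1"))  # same answer on every call
```

Because the lookup is a pure function of the key and the ring layout, any client or node can compute the owner locally without asking a directory service.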
                              
0 => 2^127 (RandomPartitioner)
    [Ring diagram: eight nodes, clockwise from token 0: N1, N2, N3, N4,
    N5, N6, N7, N8, each paired with a token range R1 through R8]
    K1 => H1 (token)
    H1 => R4 (primary = N4)
    N = 3
    RS (replica set) = N4, N5, N6


                                               
Replication
    ●   Replication Factor (N) determines how
        many replicas exist for each key
    ●   Location of replicas is determined by
        consistent hash ring and the “partitioner”
    ●   Generally, N=3 means data will be placed
        on the primary node and the next two
        nodes on the ring (This can vary based on
        placement strategy, but is predictable)
    ●   Powerful because no query required to
        find the node(s) containing a key
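
A small sketch of the simplest placement rule mentioned above: start at the primary node and walk the ring until N replicas are collected. The function name replica_set and the list-based ring are assumptions made for illustration. Real placement strategies may skip nodes, for example to spread replicas across racks or DCs.

```python
def replica_set(ring_nodes, primary_index, replication_factor):
    """Collect replication_factor nodes, walking the ring from the primary."""
    # Never return the same node twice if the cluster is smaller than N.
    count = min(replication_factor, len(ring_nodes))
    return [ring_nodes[(primary_index + i) % len(ring_nodes)] for i in range(count)]


nodes = ["N1", "N2", "N3", "N4", "N5", "N6", "N7", "N8"]

# Matches the ring slide: primary N4 with N = 3 gives N4, N5, N6.
replicas = replica_set(nodes, nodes.index("N4"), 3)

# Placement wraps around the end of the ring.
wrapped = replica_set(nodes, nodes.index("N8"), 3)
```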
                              
Consistency
    ●   Consistency is “eventual” in Cassandra –
        it will always work to create N (Replication
        Factor) replicas
    ●   Write Consistency (W) defines how many
        replicas must acknowledge a write before
        a “put” request succeeds
    ●   Read Consistency (R) defines how many
        replicas are consulted before responding
    ●   W and R are tunable per request,
        therefore consistency is tunable as well
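
One common way to reason about these settings, following the Dynamo-style model in the references: when R + W > N, the replicas consulted on a read overlap the replicas that acknowledged the write, so the read sees the latest acknowledged value. The sketch below only does that arithmetic; the function names are made up for illustration.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1


def read_sees_latest_write(n: int, w: int, r: int) -> bool:
    """True when read and write replica sets must overlap (R + W > N)."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("W and R must be between 1 and N")
    return r + w > n


n = 3
strong = read_sees_latest_write(n, quorum(n), quorum(n))  # W=2, R=2
fast_but_eventual = read_sees_latest_write(n, 1, 1)       # W=1, R=1
```

Lower W and R reduce latency per request at the cost of possibly reading a stale replica; choosing them per request is what makes consistency tunable.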
                              
Data Modeling
      Example



           
Schema Overview
    ●   Keyspace (“database”) contains one or
        more ColumnFamilies
    ●   ColumnFamily (“table”) contains zero or
        more rows
    ●   A Row must contain one or more columns
    ●   ColumnFamilies are indexed by key
        (“rows”, but more like hash map)
    ●   Rows within the same CF may have
        different number of columns, and different
        column names!!
Example
    UserData (Keyspace)
       UserAttributes (ColumnFamily, sort = UTF8)
         Key      Columns (name = value)
         Ellie    Age = 4, Sex = Female, Weight = 32
         Sammy    Age = 2, Sex = Male
         Henry    Age = 2, EyeColor = Blue, Height = 30, Sex = Male

       UserAccessLog (ColumnFamily, sort = Long)
         Key      Column names (no values shown)
         Sammy    7/20/2010, 7/22/2010
         Henry    7/22/2010, 7/23/2010, 7/24/2010
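
The example keyspace can be pictured as nested maps: keyspace, then ColumnFamily, then row key, then column name to value. The sketch below is only a mental model in plain Python dictionaries, not how Cassandra stores data. The empty strings in UserAccessLog are placeholders, since the slide shows no values for those columns.

```python
from datetime import date

user_data = {  # Keyspace
    "UserAttributes": {  # ColumnFamily, column names compared as text
        "Ellie": {"Age": "4", "Sex": "Female", "Weight": "32"},
        "Sammy": {"Age": "2", "Sex": "Male"},
        "Henry": {"Age": "2", "EyeColor": "Blue", "Height": "30", "Sex": "Male"},
    },
    "UserAccessLog": {  # ColumnFamily, column names are access dates
        "Sammy": {date(2010, 7, 20): "", date(2010, 7, 22): ""},
        "Henry": {date(2010, 7, 22): "", date(2010, 7, 23): "", date(2010, 7, 24): ""},
    },
}


def column_names(column_family, key):
    """Return a row's column names in sorted order."""
    return sorted(column_family[key])


henry_attrs = column_names(user_data["UserAttributes"], "Henry")
sammy_visits = column_names(user_data["UserAccessLog"], "Sammy")
```

Note how rows in the same ColumnFamily carry different columns, and how UserAccessLog uses the column name itself (the date) as the data.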

                                      
Columns
    ●   Column names (not values) are sorted,
        per key
    ●   32 bit limit to number of columns per key –
        entire column must fit in RAM, on one
        machine
    ●   Can retrieve/update/delete all columns,
        columns by name, or range of columns
    ●   A key (or row) must contain at least one
        Column, otherwise considered deleted
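
Because column names are kept sorted within a row, a range of columns can be read as one contiguous slice. A minimal sketch of that idea, assuming a row held as a plain dictionary and an inclusive start and finish; slice_columns is a hypothetical helper, not a Cassandra call.

```python
import bisect


def slice_columns(row, start, finish):
    """Return (name, value) pairs with start <= name <= finish, in name order."""
    names = sorted(row)
    lo = bisect.bisect_left(names, start)
    hi = bisect.bisect_right(names, finish)
    return [(name, row[name]) for name in names[lo:hi]]


henry = {"Sex": "Male", "Age": "2", "Height": "30", "EyeColor": "Blue"}
middle = slice_columns(henry, "B", "I")
```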
                             
Thrift Read Methods
    ●   get – return a single column for a single
        key
    ●   get_slice – return multiple columns for a
        single key
    ●   multiget_slice – return multiple columns
        for a list of keys
    ●   get_range_slices – return multiple
        columns for a “range” of keys
    ●   Most users go through a “high level” client
        (Hector, Pycassa, etc.)
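
To make the four read shapes concrete, here is a toy in-memory stand-in. The method names follow the slide, but the signatures are simplified assumptions: the real Thrift calls take additional arguments such as column paths, slice predicates, and a consistency level. The key range here is in plain key order, which is also a simplification.

```python
class ToyColumnFamily:
    """In-memory illustration only: {row key: {column name: value}}."""

    def __init__(self, rows):
        self.rows = rows

    def get(self, key, column):
        """One column for one key."""
        return self.rows[key][column]

    def get_slice(self, key, columns):
        """Several columns for one key; missing columns are skipped."""
        row = self.rows.get(key, {})
        return {name: row[name] for name in columns if name in row}

    def multiget_slice(self, keys, columns):
        """Several columns for an explicit list of keys."""
        return {key: self.get_slice(key, columns) for key in keys}

    def get_range_slices(self, start_key, end_key, columns):
        """Several columns for every key in an inclusive key range."""
        keys = [k for k in sorted(self.rows) if start_key <= k <= end_key]
        return self.multiget_slice(keys, columns)


attributes = ToyColumnFamily({
    "Ellie": {"Age": "4", "Sex": "Female", "Weight": "32"},
    "Sammy": {"Age": "2", "Sex": "Male"},
    "Henry": {"Age": "2", "EyeColor": "Blue", "Height": "30", "Sex": "Male"},
})
```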
Thrift Write Methods
    ●   insert – insert/update a single column for a
        single key (most call this method, “put”)
    ●   batch_mutate – insert/update/remove
        multiple columns for multiple keys in
        multiple ColumnFamilies
    ●   remove – remove a single column (or
        entire row) for a single key
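
The write methods can be sketched the same way. This toy keyspace is an assumption-laden illustration: the argument shapes are simplified (the real Thrift calls also take timestamps and a consistency level), and batch_mutate here uses None to mean "remove this column". It also shows the rule from the Columns slide that a row with no columns left counts as deleted.

```python
class ToyKeyspace:
    """In-memory illustration only: {cf name: {row key: {column: value}}}."""

    def __init__(self):
        self.column_families = {}

    def insert(self, cf_name, key, column, value):
        """Insert or overwrite one column for one key (a "put")."""
        rows = self.column_families.setdefault(cf_name, {})
        rows.setdefault(key, {})[column] = value

    def remove(self, cf_name, key, column=None):
        """Remove one column, or the entire row when no column is given."""
        rows = self.column_families.get(cf_name, {})
        if column is None:
            rows.pop(key, None)
            return
        row = rows.get(key, {})
        row.pop(column, None)
        if not row:  # a row with no columns is treated as deleted
            rows.pop(key, None)

    def batch_mutate(self, mutations):
        """Apply {key: {cf name: {column: value or None}}} in one call."""
        for key, by_cf in mutations.items():
            for cf_name, columns in by_cf.items():
                for column, value in columns.items():
                    if value is None:
                        self.remove(cf_name, key, column)
                    else:
                        self.insert(cf_name, key, column, value)


ks = ToyKeyspace()
ks.insert("UserAttributes", "Sammy", "Age", "2")
ks.batch_mutate({
    "Sammy": {"UserAttributes": {"Sex": "Male", "Age": None}},
    "Henry": {"UserAccessLog": {"7/22/2010": ""}},
})
```

Batching several mutations into one call is what the Simple API slide means by saving wire time: one round trip instead of one per column.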



                             
Useful References
    ●   http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
    ●   http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
    ●   http://wiki.apache.org/cassandra/
        -   "A description of the cassandra data model"
        -   "Architecture Overview"
        -   "Operations"
        -   "Articles and Presentations"


Editor's Notes

  1. The CQL spec is at version 3, but I believe it is still a bit raw and untested. Thrift is not going away anytime soon.