SlideShare uma empresa Scribd logo
1 de 14
Cassandra
    Overview

         
What Is It?
    ●   It is a persistent database, but not an
        RDBMS – more on API later
    ●   It can run as a single instance or as a part
        of a cluster.
    ●   All nodes are equal, no master, no slaves
    ●   The cluster can be distributed within a
        single DC or across multiple DCs.
    ●   Multiple DCs can be Active-Active for
        performance or Active-Passive for DR
                              
Simple API
    ●   Get, Put, Delete – all by key
    ●   Batch put and delete – save wire time
    ●   Range queries (iterate over sequence of
        keys)
    ●   Target individual columns within a row –
        Get and Put
    ●   Native integration available for Hadoop
        MapReduce
    ●   CQL – SQL like language
                              
Consistent Hash Ring
    ●   Conceptually all nodes in a cluster are on
        a ring of hash values, “tokens”
    ●   Each node is assigned a token range on
        the ring
    ●   A key's hash (token) places it on the ring,
        within a specific node's token range
    ●   The hash is consistent, meaning the
        location of data is consistent and
        predictable
                              
0 => 2127 (Random 
    Partitoner)
    K1 => H1 (token)                 2127      0
    H1 => R4 (primary = N4)
    N = 3
                                                  N1
    RS = N4, N5, N6         N8       R1
                                                       R2
                           R8
                      N7
                                                                 N2


                      R7
                                                                 R3


                      N6                                         N3
                           R6                          R4

                                N5     R5         N4
                                                            H1


                                               
Replication
    ●   Replication Factor (N) determines how
        many replicas exist for each key
    ●   Location of replicas is determined by
        consistent hash ring and the “partitioner”
    ●   Generally, N=3 means data will be placed
        on node N, N+1, N+2 on the ring (This can
        vary based on placement strategy, but is
        predictable)
    ●   Powerful because no query required to
        find the node(s) containing a key
                              
Consistency
    ●   Consistency is “eventual” in Cassandra –
        it will always work to create N (Replication
        Factor) replicas
    ●   Write Consistency (W) defines how many
        replicas are guaranteed per “put” request
    ●   Read Consistency (R) defines how many
        replicas are consulted before responding
    ●   W and R are tunable per request,
        therefore consistency is tunable as well
                              
Data Modeling
      Example



           
Schema Overview
    ●   Keyspace (“database”) contains one or
        more ColumnFamilies
    ●   ColumnFamily (“table”) contains zero or
        more rows
    ●   A Row must contain one or more columns
    ●   ColumnFamilies are indexed by key
        (“rows”, but more like hash map)
    ●   Rows within the same CF may have
        different number of columns, and different
 
        column names!!        
Example
    UserData (Keyspace)
       UserAttributes (ColumnFamily, sort = UTF8)
                             Age         Sex        Weight
         Ellie
                             4           Female 32
                             Age         Sex
         Sammy
                             2           Male
                             Age         EyeColor    Height Sex
         Henry
                             2           Blue        30        Male

       UserAccessLog (ColumnFamily, sort = Long)
                             7/20/2010      7/22/2010
         Sammy

                             7/22/2010      7/23/2010        7/24/2010
         Henry

                                      
Columns
    ●   Column names (not values) are sorted,
        per key
    ●   32 bit limit to number of columns per key –
        entire column must fit in RAM, on one
        machine
    ●   Can retrieve/update/delete all columns,
        columns by name, or range of columns
    ●   A key (or row) must contain at least one
        Column, otherwise considered deleted
                             
Thrift Read Methods
    ●   get – return a single column for a single
        key
    ●   get_slice – return multiple columns for a
        single key
    ●   multiget_slice – return multiple columns
        for a list of keys
    ●   get_range_slices – return multiple
        columns for a “range” of keys
    ●   Most use “high level” client (Hector,
 
        Pycassa, etc)        
Thrift Write Methods
    ●   insert – insert/update a single column for a
        single key (most call this method, “put”)
    ●   batch_mutate – insert/update/remove
        multiple columns for multiple keys in
        multiple ColumnFamilies
    ●   remove – remove a single column (or
        entire row) for a single key



                             
Useful References
    ●   http://www.allthingsdistributed.com/2007/1
        0/amazons_dynamo.html
    ●   http://www.allthingsdistributed.com/2008/1
        2/eventually_consistent.html
    ●   http://wiki.apache.org/cassandra/
    ●   - "A description of the cassandra data
        model"
    ●   - "Architecture Overview"
    ●   - “Operations”
                             
    ●   - "Articles and Presentations"

Mais conteúdo relacionado

Semelhante a Cassandra Overview

Online Analytics with Hadoop and Cassandra
Online Analytics with Hadoop and CassandraOnline Analytics with Hadoop and Cassandra
Online Analytics with Hadoop and CassandraRobbie Strickland
 
Cassandra overview
Cassandra overviewCassandra overview
Cassandra overviewSean Murphy
 
On Rails with Apache Cassandra
On Rails with Apache CassandraOn Rails with Apache Cassandra
On Rails with Apache CassandraStu Hood
 
Talk about apache cassandra, TWJUG 2011
Talk about apache cassandra, TWJUG 2011Talk about apache cassandra, TWJUG 2011
Talk about apache cassandra, TWJUG 2011Boris Yen
 
Talk About Apache Cassandra
Talk About Apache CassandraTalk About Apache Cassandra
Talk About Apache CassandraJacky Chu
 
Cassandra Explained
Cassandra ExplainedCassandra Explained
Cassandra ExplainedEric Evans
 
Design Patterns for Distributed Non-Relational Databases
Design Patterns for Distributed Non-Relational DatabasesDesign Patterns for Distributed Non-Relational Databases
Design Patterns for Distributed Non-Relational Databasesguestdfd1ec
 
Cassandra for Sysadmins
Cassandra for SysadminsCassandra for Sysadmins
Cassandra for SysadminsNathan Milford
 
Apache Cassandra @Geneva JUG 2013.02.26
Apache Cassandra @Geneva JUG 2013.02.26Apache Cassandra @Geneva JUG 2013.02.26
Apache Cassandra @Geneva JUG 2013.02.26Benoit Perroud
 
Cassandra Explained
Cassandra ExplainedCassandra Explained
Cassandra ExplainedEric Evans
 
Hash Functions FTW
Hash Functions FTWHash Functions FTW
Hash Functions FTWsunnygleason
 
Cassandra Talk: Austin JUG
Cassandra Talk: Austin JUGCassandra Talk: Austin JUG
Cassandra Talk: Austin JUGStu Hood
 
A Guide to the Post Relational Revolution
A Guide to the Post Relational RevolutionA Guide to the Post Relational Revolution
A Guide to the Post Relational RevolutionTheo Hultberg
 
Introduction to Cassandra: Replication and Consistency
Introduction to Cassandra: Replication and ConsistencyIntroduction to Cassandra: Replication and Consistency
Introduction to Cassandra: Replication and ConsistencyBenjamin Black
 
Design Patterns For Distributed NO-reational databases
Design Patterns For Distributed NO-reational databasesDesign Patterns For Distributed NO-reational databases
Design Patterns For Distributed NO-reational databaseslovingprince58
 

Semelhante a Cassandra Overview (20)

Online Analytics with Hadoop and Cassandra
Online Analytics with Hadoop and CassandraOnline Analytics with Hadoop and Cassandra
Online Analytics with Hadoop and Cassandra
 
Cassandra
CassandraCassandra
Cassandra
 
Cassandra overview
Cassandra overviewCassandra overview
Cassandra overview
 
On Rails with Apache Cassandra
On Rails with Apache CassandraOn Rails with Apache Cassandra
On Rails with Apache Cassandra
 
Talk about apache cassandra, TWJUG 2011
Talk about apache cassandra, TWJUG 2011Talk about apache cassandra, TWJUG 2011
Talk about apache cassandra, TWJUG 2011
 
Talk About Apache Cassandra
Talk About Apache CassandraTalk About Apache Cassandra
Talk About Apache Cassandra
 
Cassandra Explained
Cassandra ExplainedCassandra Explained
Cassandra Explained
 
Design Patterns for Distributed Non-Relational Databases
Design Patterns for Distributed Non-Relational DatabasesDesign Patterns for Distributed Non-Relational Databases
Design Patterns for Distributed Non-Relational Databases
 
Neo4j: Graph-like power
Neo4j: Graph-like powerNeo4j: Graph-like power
Neo4j: Graph-like power
 
Cassandra
CassandraCassandra
Cassandra
 
Cassandra for Sysadmins
Cassandra for SysadminsCassandra for Sysadmins
Cassandra for Sysadmins
 
Apache Cassandra @Geneva JUG 2013.02.26
Apache Cassandra @Geneva JUG 2013.02.26Apache Cassandra @Geneva JUG 2013.02.26
Apache Cassandra @Geneva JUG 2013.02.26
 
Cassandra Explained
Cassandra ExplainedCassandra Explained
Cassandra Explained
 
Hash Functions FTW
Hash Functions FTWHash Functions FTW
Hash Functions FTW
 
Cassandra Talk: Austin JUG
Cassandra Talk: Austin JUGCassandra Talk: Austin JUG
Cassandra Talk: Austin JUG
 
A Guide to the Post Relational Revolution
A Guide to the Post Relational RevolutionA Guide to the Post Relational Revolution
A Guide to the Post Relational Revolution
 
Elastic Search
Elastic SearchElastic Search
Elastic Search
 
Bioinformatica p2-p3-introduction
Bioinformatica p2-p3-introductionBioinformatica p2-p3-introduction
Bioinformatica p2-p3-introduction
 
Introduction to Cassandra: Replication and Consistency
Introduction to Cassandra: Replication and ConsistencyIntroduction to Cassandra: Replication and Consistency
Introduction to Cassandra: Replication and Consistency
 
Design Patterns For Distributed NO-reational databases
Design Patterns For Distributed NO-reational databasesDesign Patterns For Distributed NO-reational databases
Design Patterns For Distributed NO-reational databases
 

Último

Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 

Último (20)

Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 

Cassandra Overview

  • 1. Cassandra Overview    
  • 2. What Is It? ● It is a persistent database, but not an RDBMS – more on API later ● It can run as a single instance or as a part of a cluster. ● All nodes are equal, no master, no slaves ● The cluster can be distributed within a single DC or across multiple DCs. ● Multiple DCs can be Active-Active for performance or Active-Passive for DR    
  • 3. Simple API ● Get, Put, Delete – all by key ● Batch put and delete – save wire time ● Range queries (iterate over sequence of keys) ● Target individual columns within a row – Get and Put ● Native integration available for Hadoop MapReduce ● CQL – SQL like language    
  • 4. Consistent Hash Ring ● Conceptually all nodes in a cluster are on a ring of hash values, “tokens” ● Each node is assigned a token range on the ring ● A key's hash (token) places it on the ring, within a specific node's token range ● The hash is consistent, meaning the location of data is consistent and predictable    
  • 5. 0 => 2127 (Random  Partitoner) K1 => H1 (token) 2127      0 H1 => R4 (primary = N4) N = 3 N1 RS = N4, N5, N6 N8 R1 R2 R8 N7 N2 R7 R3 N6 N3 R6 R4 N5 R5 N4 H1    
  • 6. Replication ● Replication Factor (N) determines how many replicas exist for each key ● Location of replicas is determined by consistent hash ring and the “partitioner” ● Generally, N=3 means data will be placed on node N, N+1, N+2 on the ring (This can vary based on placement strategy, but is predictable) ● Powerful because no query required to find the node(s) containing a key    
  • 7. Consistency ● Consistency is “eventual” in Cassandra – it will always work to create N (Replication Factor) replicas ● Write Consistency (W) defines how many replicas are guaranteed per “put” request ● Read Consistency (R) defines how many replicas are consulted before responding ● W and R are tunable per request, therefore consistency is tunable as well    
  • 8. Data Modeling Example    
  • 9. Schema Overview ● Keyspace (“database”) contains one or more ColumnFamilies ● ColumnFamily (“table”) contains zero or more rows ● A Row must contain one or more columns ● ColumnFamilies are indexed by key (“rows”, but more like hash map) ● Rows within the same CF may have different number of columns, and different   column names!!  
  • 10. Example UserData (Keyspace) UserAttributes (ColumnFamily, sort = UTF8) Age Sex Weight Ellie 4 Female 32 Age Sex Sammy 2 Male Age EyeColor Height Sex Henry 2 Blue 30 Male UserAccessLog (ColumnFamily, sort = Long) 7/20/2010 7/22/2010 Sammy 7/22/2010 7/23/2010 7/24/2010 Henry    
  • 11. Columns ● Column names (not values) are sorted, per key ● 32 bit limit to number of columns per key – entire column must fit in RAM, on one machine ● Can retrieve/update/delete all columns, columns by name, or range of columns ● A key (or row) must contain at least one Column, otherwise considered deleted    
  • 12. Thrift Read Methods ● get – return a single column for a single key ● get_slice – return multiple columns for a single key ● multiget_slice – return multiple columns for a list of keys ● get_range_slices – return multiple columns for a “range” of keys ● Most use “high level” client (Hector,   Pycassa, etc)  
  • 13. Thrift Write Methods ● insert – insert/update a single column for a single key (most call this method, “put”) ● batch_mutate – insert/update/remove multiple columns for multiple keys in multiple ColumnFamilies ● remove – remove a single column (or entire row) for a single key    
  • 14. Useful References ● http://www.allthingsdistributed.com/2007/1 0/amazons_dynamo.html ● http://www.allthingsdistributed.com/2008/1 2/eventually_consistent.html ● http://wiki.apache.org/cassandra/ ● - "A description of the cassandra data model" ● - "Architecture Overview" ● - “Operations”     ● - "Articles and Presentations"

Notas do Editor

  1. CQL spec is at version 3 – but I believe is still a bit raw and untested. Not getting rid of thrift anytime soon