SlideShare a Scribd company logo
1 of 48
Cassandra Basics
              Indexing

     Benjamin Black, b@b3k.us
Relational stores are
SCHEMA ORIENTED
Start from your SCHEMA &
WORK FORWARDS
Column stores are
QUERY ORIENTED
Start from your QUERIES &
WORK BACKWARDS
AT SCALE
AT SCALE
           Denormalization is
              THE NORM
AT SCALE
AT SCALE
           Everything depends on
               THE INDICES
Cassandra is an
INDEX CONSTRUCTION KIT
Column Family
Two-level Map

key: {
  column: value,
  column: value,
  ...
 }
Super Column Family
Three-level Map
key: {
   supercolumn: {
       column:value,
      column: value
   },
   supercolumn: {
     ...
   }
 }
column sorting defined by
         CompareWith/
CompareSubcolumnsWith
TimeUUIDType
  UTF8Type
                ASCIIType
LongType

     LexicalUUIDType
row placement determined by
             Partitioner
RandomPartitioner
Place based on MD5 of key




        OrderPreservingPartitioner
               Place based on actual key
Rows are sorted by key on each node
Regardless of partitioner
One example in
TWO ACTS
Prelude
A USER DATABASE
<ColumnFamily Name=”Users”
       CompareWith=”UTF8Type” />
“b”:    {“name”:”Ben”, “street”:”1234 Oak St.”,
        “city”:”Seattle”, “state”:”WA”}
“jason”: {”name”:”Jason”, “street”:”456 First Ave.”,
        “city”:”Bellingham”, “state”:”WA”}
“zack”:     {”name”: “Zack”, “street”: “4321 Pine St.”,
          “city”: “Seattle”, “state”: “WA”}
“jen1982”: {”name”:”Jennifer”, “street”:”1120 Foo Lane”,
         “city”:”San Francisco”, “state”:”CA”}
“albert”: {”name”:”Albert”, “street”:”2364 South St.”,
         “city”:”Boston”, “state”:”MA”}
SELECT name FROM Users
WHERE state=”WA”
SELECT name FROM Users
               WHERE state=”WA”

How is WHERE clause
formed?
Act One
Supercolumn Indexing
<ColumnFamily Name=”LocationUserIndexSCF”
       CompareWith=”UTF8Type”
       CompareSubcolumnsWith=”UTF8Type”
       ColumnType=”Super” />
[state]: {
  [city1]: {[name1]:[user1], [name2]:[user2], ... },
  [city2]: {[name3]:[user3], [name4]:[user4], ... },
  ...
  [cityX]: {[name5]:[user5], [name6]:[user6], ... }
}
“CA”: {

 “San Francisco”: {”Jennifer”: “jen1982”}
}
“MA”: {

 “Boston”: {”Albert”: “albert”}
}
“WA”: {

 “Bellingham”: {”Jason”: “jason”},

 “Seattle”: {”Ben”: “b”, ”Zack”: “zack”}
}
Row Key


“CA”: {

 “San Francisco”: {”Jennifer”: “jen1982”}
}
“MA”: {

 “Boston”: {”Albert”: “albert”}
}
“WA”: {

 “Bellingham”: {”Jason”: “jason”},

 “Seattle”: {”Ben”: “b”, ”Zack”: “zack”}
}
Row Key
                 Super Column

“CA”: {

 “San Francisco”: {”Jennifer”: “jen1982”}
}
“MA”: {

 “Boston”: {”Albert”: “albert”}
}
“WA”: {

 “Bellingham”: {”Jason”: “jason”},

 “Seattle”: {”Ben”: “b”, ”Zack”: “zack”}
}
Row Key
                                     Colum
                 Super Column
                                     n
“CA”: {

 “San Francisco”: {”Jennifer”: “jen1982”}
}
“MA”: {

 “Boston”: {”Albert”: “albert”}
}
“WA”: {

 “Bellingham”: {”Jason”: “jason”},

 “Seattle”: {”Ben”: “b”, ”Zack”: “zack”}
}
Row Key
                                     Colum
                 Super Column                Value
                                     n
“CA”: {

 “San Francisco”: {”Jennifer”: “jen1982”}
}
“MA”: {

 “Boston”: {”Albert”: “albert”}
}
“WA”: {

 “Bellingham”: {”Jason”: “jason”},

 “Seattle”: {”Ben”: “b”, ”Zack”: “zack”}
}
Show me
EVERYONE IN WASHINGTON
get(:LocationUserIndexSCF, ‘WA’)
{

   “Bellingham”: {”Jason”: “jason”},

   “Seattle”: {”Ben”: “b”, ”Zack”: “zack”}
}
Act Two
Composite Key Indexing
Order Preserving Partitioner
                          +
        Range Queries
<ColumnFamily Name=”LocationUserIndexCF”
       CompareWith=”UTF8Type” />
[state1]/[city1]:   {[name1]:[user1], [name2]:[user2], ... }
[state1]/[city2]:   {[name3]:[user3], [name4]:[user4], ... }
[state2]/[city1]:   {[name5]:[user5], [name6]:[user6], ... }
...
[stateX]/[cityY]:   {[name7]:[user7], [name8]:[user8], ... }
“CA/San Francisco”: {”Jennifer”: “jen1982”}
“MA/Boston”: {”Albert”: “albert”}
“WA/Bellingham”: {”Jason”: “jason”}
“WA/Seattle”: {”Ben”: “b”, “Zack”: “zack”}
Show me
EVERYONE IN WASHINGTON
get_range(:LocationUserIndexCF, {:start: 'WA',
                          :finish:'WB'})
{
    ”WA/Bellingham”: {”Jason”: “jason”},
    “WA/Seattle”: {”Ben”: “b”, “Zack”: “zack”}
}
Finale
BUILD SOMETHING AWESOME
(This part is up to you)
Appendix
EXAMPLE KEYSPACE
<Keyspace Name="UserDb">
  <ColumnFamily Name="Users"
          CompareWith="UTF8Type" />

  <ColumnFamily Name="LocationUserIndexSCF"

   
         CompareWith="UTF8Type"
  
       
     CompareSubcolumnsWith="UTF8Type"

   
         ColumnType="Super" />

   
  <ColumnFamily Name="LocationUserIndexCF"

   
         CompareWith="UTF8Type" />

   
  <ReplicaPlacementStrategy>
      org.apache.cassandra.locator.RackUnawareStrategy
  </ReplicaPlacementStrategy>
  <ReplicationFactor>1</ReplicationFactor>
  <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch>
</Keyspace>

More Related Content

Viewers also liked

Developers summit cassandraで見るNoSQL
Developers summit cassandraで見るNoSQLDevelopers summit cassandraで見るNoSQL
Developers summit cassandraで見るNoSQL
Ryu Kobayashi
 

Viewers also liked (20)

Cassandra
CassandraCassandra
Cassandra
 
Graphite cluster setup blueprint
Graphite cluster setup blueprintGraphite cluster setup blueprint
Graphite cluster setup blueprint
 
Understanding BYOE and How Today's User Experience Drives Value for UC
Understanding BYOE and How Today's User Experience Drives Value for UCUnderstanding BYOE and How Today's User Experience Drives Value for UC
Understanding BYOE and How Today's User Experience Drives Value for UC
 
The Big 3 - 3 Keys to the Customer Kingdom - Business process, Big data, and ...
The Big 3 - 3 Keys to the Customer Kingdom - Business process, Big data, and ...The Big 3 - 3 Keys to the Customer Kingdom - Business process, Big data, and ...
The Big 3 - 3 Keys to the Customer Kingdom - Business process, Big data, and ...
 
What is a DMP
What is a DMPWhat is a DMP
What is a DMP
 
Highly Available Graphite
Highly Available GraphiteHighly Available Graphite
Highly Available Graphite
 
Cassandra Basics, Counters and Time Series Modeling
Cassandra Basics, Counters and Time Series ModelingCassandra Basics, Counters and Time Series Modeling
Cassandra Basics, Counters and Time Series Modeling
 
Cassandra and Spark
Cassandra and Spark Cassandra and Spark
Cassandra and Spark
 
data science toolkit 101: set up Python, Spark, & Jupyter
data science toolkit 101: set up Python, Spark, & Jupyterdata science toolkit 101: set up Python, Spark, & Jupyter
data science toolkit 101: set up Python, Spark, & Jupyter
 
Introduction to Apache Spark
Introduction to Apache Spark Introduction to Apache Spark
Introduction to Apache Spark
 
Presentation of Apache Cassandra
Presentation of Apache Cassandra Presentation of Apache Cassandra
Presentation of Apache Cassandra
 
Introduction to Cassandra - Denver
Introduction to Cassandra - DenverIntroduction to Cassandra - Denver
Introduction to Cassandra - Denver
 
Developers summit cassandraで見るNoSQL
Developers summit cassandraで見るNoSQLDevelopers summit cassandraで見るNoSQL
Developers summit cassandraで見るNoSQL
 
Intro to py spark (and cassandra)
Intro to py spark (and cassandra)Intro to py spark (and cassandra)
Intro to py spark (and cassandra)
 
The Nitty Gritty of Advanced Analytics Using Apache Spark in Python
The Nitty Gritty of Advanced Analytics Using Apache Spark in PythonThe Nitty Gritty of Advanced Analytics Using Apache Spark in Python
The Nitty Gritty of Advanced Analytics Using Apache Spark in Python
 
Python & Cassandra - Best Friends
Python & Cassandra - Best FriendsPython & Cassandra - Best Friends
Python & Cassandra - Best Friends
 
Diagnosing Problems in Production: Cassandra Summit 2014
Diagnosing Problems in Production: Cassandra Summit 2014Diagnosing Problems in Production: Cassandra Summit 2014
Diagnosing Problems in Production: Cassandra Summit 2014
 
Intro to Cassandra
Intro to CassandraIntro to Cassandra
Intro to Cassandra
 
The Cassandra Distributed Database
The Cassandra Distributed DatabaseThe Cassandra Distributed Database
The Cassandra Distributed Database
 
PySpark Cassandra - Amsterdam Spark Meetup
PySpark Cassandra - Amsterdam Spark MeetupPySpark Cassandra - Amsterdam Spark Meetup
PySpark Cassandra - Amsterdam Spark Meetup
 

Similar to Cassandra Basics: Indexing

Building Your First Java Application with MongoDB
Building Your First Java Application with MongoDBBuilding Your First Java Application with MongoDB
Building Your First Java Application with MongoDB
MongoDB
 
The Aggregation Framework
The Aggregation FrameworkThe Aggregation Framework
The Aggregation Framework
MongoDB
 

Similar to Cassandra Basics: Indexing (10)

Building Your First Java Application with MongoDB
Building Your First Java Application with MongoDBBuilding Your First Java Application with MongoDB
Building Your First Java Application with MongoDB
 
Elasticsearch for SQL Users
Elasticsearch for SQL UsersElasticsearch for SQL Users
Elasticsearch for SQL Users
 
MongoDB - Features and Operations
MongoDB - Features and OperationsMongoDB - Features and Operations
MongoDB - Features and Operations
 
Json at work overview and ecosystem-v2.0
Json at work   overview and ecosystem-v2.0Json at work   overview and ecosystem-v2.0
Json at work overview and ecosystem-v2.0
 
Elasticsearch for SQL Users
Elasticsearch for SQL UsersElasticsearch for SQL Users
Elasticsearch for SQL Users
 
Embedding a language into string interpolator
Embedding a language into string interpolatorEmbedding a language into string interpolator
Embedding a language into string interpolator
 
Native json in the Cache' ObjectScript 2016.*
Native json in the Cache' ObjectScript 2016.*Native json in the Cache' ObjectScript 2016.*
Native json in the Cache' ObjectScript 2016.*
 
The Aggregation Framework
The Aggregation FrameworkThe Aggregation Framework
The Aggregation Framework
 
MongoDB .local Bengaluru 2019: Aggregation Pipeline Power++: How MongoDB 4.2 ...
MongoDB .local Bengaluru 2019: Aggregation Pipeline Power++: How MongoDB 4.2 ...MongoDB .local Bengaluru 2019: Aggregation Pipeline Power++: How MongoDB 4.2 ...
MongoDB .local Bengaluru 2019: Aggregation Pipeline Power++: How MongoDB 4.2 ...
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Recently uploaded (20)

DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 

Cassandra Basics: Indexing

  • 1. Cassandra Basics Indexing Benjamin Black, b@b3k.us
  • 3. Start from your SCHEMA & WORK FORWARDS
  • 5. Start from your QUERIES & WORK BACKWARDS
  • 7. AT SCALE Denormalization is THE NORM
  • 9. AT SCALE Everything depends on THE INDICES
  • 10. Cassandra is an INDEX CONSTRUCTION KIT
  • 12. Two-level Map key: { column: value, column: value, ... }
  • 14. Three-level Map key: { supercolumn: { column:value, column: value }, supercolumn: { ... } }
  • 15. column sorting defined by CompareWith/ CompareSubcolumnsWith
  • 16. TimeUUIDType UTF8Type ASCIIType LongType LexicalUUIDType
  • 17. row placement determined by Partitioner
  • 18. RandomPartitioner Place based on MD5 of key OrderPreservingPartitioner Place based on actual key
  • 19. Rows are sorted by key on each node Regardless of partitioner
  • 22. <ColumnFamily Name=”Users” CompareWith=”UTF8Type” />
  • 23. “b”: {“name”:”Ben”, “street”:”1234 Oak St.”, “city”:”Seattle”, “state”:”WA”} “jason”: {”name”:”Jason”, “street”:”456 First Ave.”, “city”:”Bellingham”, “state”:”WA”} “zack”: {”name”: “Zack”, “street”: “4321 Pine St.”, “city”: “Seattle”, “state”: “WA”} “jen1982”: {”name”:”Jennifer”, “street”:”1120 Foo Lane”, “city”:”San Francisco”, “state”:”CA”} “albert”: {”name”:”Albert”, “street”:”2364 South St.”, “city”:”Boston”, “state”:”MA”}
  • 24. SELECT name FROM Users WHERE state=”WA”
  • 25. SELECT name FROM Users WHERE state=”WA” How is WHERE clause formed?
  • 27. <ColumnFamily Name=”LocationUserIndexSCF” CompareWith=”UTF8Type” CompareSubcolumnsWith=”UTF8Type” ColumnType=”Super” />
  • 28. [state]: { [city1]: {[name1]:[user1], [name2]:[user2], ... }, [city2]: {[name3]:[user3], [name4]:[user4], ... }, ... [cityX]: {[name5]:[user5], [name6]:[user6], ... } }
  • 29. “CA”: { “San Francisco”: {”Jennifer”: “jen1982”} } “MA”: { “Boston”: {”Albert”: “albert”} } “WA”: { “Bellingham”: {”Jason”: “jason”}, “Seattle”: {”Ben”: “b”, ”Zack”: “zack”} }
  • 30. Row Key “CA”: { “San Francisco”: {”Jennifer”: “jen1982”} } “MA”: { “Boston”: {”Albert”: “albert”} } “WA”: { “Bellingham”: {”Jason”: “jason”}, “Seattle”: {”Ben”: “b”, ”Zack”: “zack”} }
  • 31. Row Key Super Column “CA”: { “San Francisco”: {”Jennifer”: “jen1982”} } “MA”: { “Boston”: {”Albert”: “albert”} } “WA”: { “Bellingham”: {”Jason”: “jason”}, “Seattle”: {”Ben”: “b”, ”Zack”: “zack”} }
  • 32. Row Key Colum Super Column n “CA”: { “San Francisco”: {”Jennifer”: “jen1982”} } “MA”: { “Boston”: {”Albert”: “albert”} } “WA”: { “Bellingham”: {”Jason”: “jason”}, “Seattle”: {”Ben”: “b”, ”Zack”: “zack”} }
  • 33. Row Key Colum Super Column Value n “CA”: { “San Francisco”: {”Jennifer”: “jen1982”} } “MA”: { “Boston”: {”Albert”: “albert”} } “WA”: { “Bellingham”: {”Jason”: “jason”}, “Seattle”: {”Ben”: “b”, ”Zack”: “zack”} }
  • 34. Show me EVERYONE IN WASHINGTON
  • 36. { “Bellingham”: {”Jason”: “jason”}, “Seattle”: {”Ben”: “b”, ”Zack”: “zack”} }
  • 38. Order Preserving Partitioner + Range Queries
  • 39. <ColumnFamily Name=”LocationUserIndexCF” CompareWith=”UTF8Type” />
  • 40. [state1]/[city1]: {[name1]:[user1], [name2]:[user2], ... } [state1]/[city2]: {[name3]:[user3], [name4]:[user4], ... } [state2]/[city1]: {[name5]:[user5], [name6]:[user6], ... } ... [stateX]/[cityY]: {[name7]:[user7], [name8]:[user8], ... }
  • 41. “CA/San Francisco”: {”Jennifer”: “jen1982”} “MA/Boston”: {”Albert”: “albert”} “WA/Bellingham”: {”Jason”: “jason”} “WA/Seattle”: {”Ben”: “b”, “Zack”: “zack”}
  • 42. Show me EVERYONE IN WASHINGTON
  • 44. { ”WA/Bellingham”: {”Jason”: “jason”}, “WA/Seattle”: {”Ben”: “b”, “Zack”: “zack”} }
  • 46. (This part is up to you)
  • 48. <Keyspace Name="UserDb"> <ColumnFamily Name="Users" CompareWith="UTF8Type" /> <ColumnFamily Name="LocationUserIndexSCF" CompareWith="UTF8Type" CompareSubcolumnsWith="UTF8Type" ColumnType="Super" /> <ColumnFamily Name="LocationUserIndexCF" CompareWith="UTF8Type" /> <ReplicaPlacementStrategy> org.apache.cassandra.locator.RackUnawareStrategy </ReplicaPlacementStrategy> <ReplicationFactor>1</ReplicationFactor> <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch> </Keyspace>

Editor's Notes