On Rails with Apache Cassandra


                 Austin on Rails
                  April 27th 2010
  Stu Hood (@stuhood) – Technical Lead, Rackspace
My, what a large/volatile dataset you have!
●   Large
    ●   Larger than 1 node can handle
●   Volatile
    ●   More than 25% (ish) writes
    ●   (but still larger than available memory)
●   Expensive
    ●   More than you can afford with a commercial
        solution
My, what a large/volatile dataset you have!
●   For example:
    ●   Event/log data
    ●   Output of batch processing or log analytics jobs
    ●   Social network relationships/updates
●   In general:
    ●   Large volume of high fanout data
Conversely...
●   If your data and access pattern fit comfortably on one RDBMS machine:
    ●   Don't use Cassandra
    ●   Possibly consider MongoDB, CouchDB, Neo4j,
        Redis, etc.
        –   For schema freedom and flexibility
Case Study: Digg
1. Vertical partitioning and master/slave trees
2. Developed a sharding solution
  ●   IDDB
  ●   Awkward replication, fragile scaling
3. Began populating Cassandra in parallel
  ●   Initial dataset for 'green badges'
      –   3 TB
      –   76 billion kv pairs
  ●   Most applications being ported to Cassandra
Cassandra's Elders
Standing on the shoulders of: Amazon Dynamo
●   No node in the cluster is special
    ●   No special roles
    ●   No scaling bottlenecks
    ●   No single point of failure
●   Techniques
    ●   Gossip
    ●   Eventual consistency
Standing on the shoulders of: Google Bigtable
●   “Column family” data model
●   Range queries for rows:
    ●   Scan rows in order
●   Memtable/SSTable structure
    ●   Always writes sequentially to disk
    ●   Bloom filters to minimize random reads
    ●   Trounces B-Trees for big data
        –   Linear insert performance
        –   Log growth for reads
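To make the Memtable/SSTable bullets concrete, here is a minimal in-memory sketch: writes land in a memtable, a full memtable is flushed as an immutable sorted SSTable, and each SSTable keeps a Bloom filter so a read can skip tables that definitely lack the key. The class names, the flush threshold of two rows and the MD5-salted hash scheme are illustrative assumptions, not Cassandra's implementation (which also appends every write to a commit log and keeps SSTables on disk).

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive k bit positions by salting an MD5 digest with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.md5(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class SSTable:
    """Immutable sorted run, written once (sequentially) when the memtable flushes."""

    def __init__(self, rows):
        self.rows = dict(sorted(rows.items()))
        self.bloom = BloomFilter()
        for key in self.rows:
            self.bloom.add(key)


class Store:
    def __init__(self, memtable_limit=2):
        self.memtable = {}        # recent writes, held in memory
        self.sstables = []        # oldest first
        self.memtable_limit = memtable_limit
        self.sstable_lookups = 0  # SSTables actually searched (the "random reads")

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.sstables.append(SSTable(self.memtable))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest first
            if not table.bloom.might_contain(key):
                continue  # filter says "definitely absent": skip this table
            self.sstable_lookups += 1
            if key in table.rows:
                return table.rows[key]
        return None


store = Store()
store.put("a", 1)
store.put("b", 2)  # memtable full: flushed as the first SSTable
store.put("a", 3)
store.put("c", 4)  # flushed as a second SSTable holding the newer "a"
```

Writes never rewrite existing files, and a read for "a" stops at the newer SSTable, which is why the newest value wins without any in-place update.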
Enter Cassandra
●   Hybrid of ancestors
    ●   Adopts listed features
●   And adds:
    ●   Pluggable partitioning
    ●   Multi-datacenter support
        –   Pluggable locality awareness
    ●   Datamodel improvements
Enter Cassandra
●   Project status
    ●   Open-sourced by Facebook in 2008 (Facebook is no longer actively involved)
    ●   Apache License, Version 2.0
    ●   Graduated to an Apache top-level project (TLP) in February 2010
    ●   Major releases: 0.3 through 0.6.1 (0.7 expected this summer)
●   cassandra.apache.org
●   Known deployments at:
    ●   Cloudkick, Digg, Mahalo, SimpleGeo, Twitter,
        Rackspace, Reddit
The Datamodel
Cluster
●   Nodes have tokens:
    ●   OrderPreservingPartitioner: the actual keys
    ●   RandomPartitioner: MD5s of the keys
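The token ring can be modeled in a few lines. This sketch assumes each node owns the range between the previous node's token (exclusive) and its own token (inclusive), and it simplifies RandomPartitioner by reading the MD5 digest as an unsigned 128-bit integer; the helper names and node names are hypothetical.

```python
import bisect
import hashlib


def random_partitioner_token(key: str) -> int:
    # Simplification: read the 128-bit MD5 digest as an unsigned integer.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


def order_preserving_token(key: str) -> str:
    # OrderPreservingPartitioner: the key itself is the token.
    return key


def owner(ring, token):
    """ring: list of (token, node) pairs sorted by token.

    Each node owns (previous token, its token], so a key belongs to the
    first node whose token is >= the key's token, wrapping past the end."""
    tokens = [t for t, _ in ring]
    i = bisect.bisect_left(tokens, token)
    return ring[i % len(ring)][1]


opp_ring = [("g", "node1"), ("n", "node2"), ("t", "node3")]
owner(opp_ring, order_preserving_token("apple"))  # "node1"
owner(opp_ring, order_preserving_token("zebra"))  # wraps around to "node1"
```

Under the order-preserving variant, neighboring keys such as "apple" and "apricot" land on the same node, which is what makes ordered row range scans possible; hashing with MD5 scatters them, trading ordered scans for an even spread of load.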
The Datamodel
Cluster > Keyspace
●   Like an RDBMS schema: one keyspace per application
The Datamodel
Cluster > Keyspace > Column Family
●   Sorted hash: row key (bytes) → row
●   Like an RDBMS table: separates classes of objects
The Datamodel
Cluster > Keyspace > Column Family > Row
●   Sorted hash: name → value ...
The Datamodel
Cluster > Keyspace > Column Family > Row > “Column”
●   Not like an RDBMS column: an attribute of the row; each row can contain millions of different columns
●   Name → value (bytes → bytes), plus a version timestamp
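A minimal sketch of this hierarchy as nested Python dictionaries, assuming the version timestamp is used the way Cassandra uses it: when two writes hit the same column, the one with the higher timestamp wins regardless of arrival order. The function names are hypothetical, and this sketch's tie-breaking rule is arbitrary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: bytes
    value: bytes
    timestamp: int  # client-supplied version, e.g. microseconds since the epoch


def reconcile(a: Column, b: Column) -> Column:
    """Two versions of one column: the higher timestamp wins (ties keep `a` here)."""
    return a if a.timestamp >= b.timestamp else b


# Cluster > Keyspace > Column Family > Row > Column, modeled as nested dicts.
cluster = {}


def insert(keyspace, column_family, row_key, column):
    row = (cluster.setdefault(keyspace, {})
                  .setdefault(column_family, {})
                  .setdefault(row_key, {}))
    existing = row.get(column.name)
    row[column.name] = column if existing is None else reconcile(existing, column)


def get_row(keyspace, column_family, row_key):
    row = cluster.get(keyspace, {}).get(column_family, {}).get(row_key, {})
    return [row[name] for name in sorted(row)]  # columns come back sorted by name


insert("statusapp", "Users", b"rails_user", Column(b"fullname", b"Damon Clinkscales", 2))
# A write carrying an older timestamp loses, whatever order it arrives in.
insert("statusapp", "Users", b"rails_user", Column(b"fullname", b"stale value", 1))
insert("statusapp", "Users", b"rails_user", Column(b"joindate", b"back_in_the_day", 2))
```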
StatusApp: another Twitter clone.
StatusApp Example
<ColumnFamily Name="Users">
●   Unique id as key: name->value pairs contain
    user attributes
{key: “rails_user”, row: {“fullname”: “Damon Clinkscales”, “joindate”: “back_in_the_day” … }}
StatusApp Example
<ColumnFamily Name="Timelines">
●   User id and timeline name as key: row contains
    list of updates from that timeline
{key: “user19:personal”, row: {<timeuuid1>: “status19”, <timeuuid2>: “status21”, … }}
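Column names in "Timelines" are TimeUUIDs, and the comparator matters: comparing version-1 UUIDs byte by byte does not follow time, because the low 32 bits of the timestamp come first. The sketch below builds two hand-made version-1 UUIDs (clock sequence and node zeroed, an assumption for determinism) and shows that ordering by the embedded timestamp, roughly what Cassandra's TimeUUIDType comparator does, differs from raw byte order. `timeline_key` is a hypothetical helper for the "user19:personal" key format.

```python
import uuid


def timeline_key(user_id: int, timeline: str) -> str:
    # Hypothetical helper producing row keys like "user19:personal".
    return f"user{user_id}:{timeline}"


def time_uuid(timestamp: int) -> uuid.UUID:
    """Version-1 UUID carrying a 60-bit timestamp (100 ns units since 1582-10-15).

    Clock sequence and node are zeroed purely for illustration."""
    return uuid.UUID(fields=(
        timestamp & 0xFFFFFFFF,                 # time_low
        (timestamp >> 32) & 0xFFFF,             # time_mid
        ((timestamp >> 48) & 0x0FFF) | 0x1000,  # time_hi with version 1
        0x80,                                   # clock_seq_hi with RFC 4122 variant
        0x00,                                   # clock_seq_low
        0,                                      # node
    ))


earlier = time_uuid(0xFFFFFFFF)
later = time_uuid(0x100000000)

# Plain byte order compares time_low first, so it does not follow time...
by_bytes = sorted([later, earlier], key=lambda u: u.bytes)
# ...while ordering by the embedded timestamp does.
by_time = sorted([later, earlier], key=lambda u: u.time)
```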
Raw Client API
●   Thrift RPC framework
    ●   Generates client bindings for (almost) any language


1. Get the most recent status in a timeline:
●   get_slice(keyspace, key, [column_family, super_column],
    predicate, consistency_level)
●   get_slice("statusapp", "user19:personal",
    ["Timelines"], {start: "", reversed: true, count: 1}, QUORUM)
> newest column, e.g. <timeuuidN>: “statusN”
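In the Thrift API the predicate wraps a SliceRange with `start`, `finish`, `reversed` and `count` fields. Because a row's columns are stored in ascending comparator order, `{start: "", count: 1}` on a time-ordered row returns the oldest column; fetching the newest means walking the row backwards with `reversed` set. The sketch below is a hypothetical in-memory `get_slice` over a plain dict (None standing in for an empty start or finish), not the Thrift client call itself.

```python
def get_slice(row, start=None, finish=None, reversed_=False, count=100):
    """Simplified SliceRange over a row held as {column name: value}.

    None plays the role of an empty start/finish ("from the beginning" /
    "to the end"). With reversed_=True the walk runs from the end of the
    row, so start bounds the slice from above and finish from below."""
    names = sorted(row, reverse=reversed_)
    if start is not None:
        names = [n for n in names if (n <= start if reversed_ else n >= start)]
    if finish is not None:
        names = [n for n in names if (n >= finish if reversed_ else n <= finish)]
    return [(n, row[n]) for n in names[:count]]


# Integer column names stand in for TimeUUIDs: larger means later.
timeline = {1: "status19", 2: "status21"}

oldest = get_slice(timeline, count=1)                  # [(1, "status19")]
newest = get_slice(timeline, reversed_=True, count=1)  # [(2, "status21")]
```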
But...
●   Don't use the Raw Thrift API!
    ●   You won't enjoy it
●   Use high level Client APIs
    ●   Many options for each language
Consistency Levels?
●   Eventual consistency
    ●   e.g. replicate synchronously to Washington, asynchronously to Hong Kong
●   Client API Tunables
    ●   Synchronously write to W replicas
    ●   Confirm R replicas match at read time
    ●   of N total replicas
●   Allows for almost-strong consistency
    ●   When W + R > N
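The W + R > N condition is a pigeonhole argument: a write set of W replicas and a read set of R replicas drawn from the same N cannot be disjoint once W + R exceeds N, so at least one replica consulted by the read has seen the write. This brute-force check, a sketch rather than Cassandra code, confirms the rule for small clusters.

```python
from itertools import combinations


def always_overlap(n: int, w: int, r: int) -> bool:
    """True if every choice of w write replicas shares at least one replica
    with every choice of r read replicas, out of n replicas in total."""
    replicas = range(n)
    return all(set(ws) & set(rs)
               for ws in combinations(replicas, w)
               for rs in combinations(replicas, r))


# Exhaustively confirm the W + R > N rule for clusters of up to 5 replicas.
for n in range(1, 6):
    for w in range(1, n + 1):
        for r in range(1, n + 1):
            assert always_overlap(n, w, r) == (w + r > n)
```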
Write Example
●   Replication factor == N == 3: 3 copies of each row
●   Client connects to an arbitrary node
●   cl.ONE: W == 1
    ●   Block for success on 1 replica
●   cl.QUORUM: W == N/2 + 1 (integer division, so W == 2 when N == 3)
    ●   Block for success on a majority
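The write path above reduces to a small calculation. This sketch assumes the coordinator sends each write to all N replicas and that the consistency level only changes how many acknowledgements it waits for; `block_for` returns that number. ALL is included for comparison even though the slides only show ONE and QUORUM, and the function names are hypothetical.

```python
def block_for(consistency_level: str, n: int) -> int:
    """How many replica acknowledgements the coordinator waits for."""
    if consistency_level == "ONE":
        return 1
    if consistency_level == "QUORUM":
        return n // 2 + 1  # a strict majority of the N replicas
    if consistency_level == "ALL":
        return n
    raise ValueError(f"unsupported consistency level: {consistency_level}")


def write_succeeds(consistency_level: str, n: int, live_replicas: int) -> bool:
    """The write is reported successful once enough live replicas acknowledge it."""
    return live_replicas >= block_for(consistency_level, n)
```

With N == 3, QUORUM tolerates one dead replica, and QUORUM writes plus QUORUM reads give 2 + 2 > 3, satisfying the overlap condition from the consistency slide.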
Caveat consumptor
●   No secondary indexes:
    ●   Typically implemented in client libraries
●   No transactions
    ●   But atomic increment/decrement RSN (real soon now)
●   Absolutely no joins
    ●   You don't really want 'em anyway
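Client libraries typically fill the secondary-index gap by writing an extra index row next to each data row. The sketch below is a hypothetical in-memory version of that pattern (class and method names are illustrative, not any library's API). Because there are no transactions, a real client writes the data row and the index row separately, so a crash between the two can leave them out of step.

```python
from collections import defaultdict


class IndexedUsers:
    """Hypothetical client-side index: alongside each Users row, maintain an
    index mapping attribute value -> set of user row keys."""

    def __init__(self, indexed_attribute):
        self.indexed_attribute = indexed_attribute
        self.users = {}                # row key -> {name: value}
        self.index = defaultdict(set)  # attribute value -> row keys

    def save(self, row_key, attributes):
        old = self.users.get(row_key, {}).get(self.indexed_attribute)
        if old is not None:
            self.index[old].discard(row_key)  # drop the stale index entry
        self.users[row_key] = dict(attributes)
        new = attributes.get(self.indexed_attribute)
        if new is not None:
            self.index[new].add(row_key)

    def find_by(self, value):
        return sorted(self.index.get(value, set()))
```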
“That doesn't sound worth the trouble!”
Cassandra Ruby Support: Cassandra Object
●   Mostly duck-type compatible with ActiveRecord
    objects
    ●   Transparently builds (non-)unique secondary
        indexes
    ●   Excludes:
        –   :order
        –   :conditions
        –   :joins
        –   :group
Cassandra Ruby Support: RDF.rb
●   Repository implementation for RDF.rb
    ●   Stores triple of (subject, predicate, object) as
        (rowkey, name, subname)
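Read as a super column family, the slide's mapping puts the subject in the row key, the predicate in the super column name and the object in the subcolumn name, so every statement about a subject comes from one row read. This sketch assumes that layout and models it with nested dictionaries; it is not the rdf-cassandra code, and the example predicates are hypothetical.

```python
from collections import defaultdict

# row key (subject) -> super column name (predicate) -> subcolumn names (objects)
triples = defaultdict(lambda: defaultdict(set))


def insert_triple(subject, predicate, obj):
    # The whole triple lives in the key and column names; no value is needed.
    triples[subject][predicate].add(obj)


def objects_of(subject, predicate):
    return sorted(triples.get(subject, {}).get(predicate, ()))


def statements_about(subject):
    """Everything about one subject comes from a single row."""
    row = triples.get(subject, {})
    return [(subject, p, o) for p in sorted(row) for o in sorted(row[p])]


insert_triple("http://example.org/stu", "foaf:name", "Stu Hood")
insert_triple("http://example.org/stu", "foaf:nick", "stuhood")
```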
Silver linings: Ops
●   Dead drive?
    ●   Swap the drive, restart, run 'repair'
    ●   Streams missing data from other replicas
●   Dead node?
    ●   Start a new node with the same IP and token, run
        'repair'
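Repair compares replicas and streams only what differs. Cassandra's anti-entropy repair compares Merkle trees built over token ranges; this simplified sketch flattens that to one MD5 digest per range and copies mismatched ranges one way from a healthy replica, whereas a real repair reconciles columns by timestamp. Integer keys stand in for tokens, and all names are hypothetical.

```python
import hashlib


def range_digest(replica, low, high):
    """Digest of the sorted (key, value) pairs whose token falls in (low, high]."""
    h = hashlib.md5()
    for key in sorted(k for k in replica if low < k <= high):
        h.update(f"{key}={replica[key]};".encode("utf-8"))
    return h.hexdigest()


def repair(target, source, ranges):
    """Copy every range whose digest differs from `source` into `target`.

    Returns the ranges that had to be streamed."""
    mismatched = [(low, high) for low, high in ranges
                  if range_digest(target, low, high) != range_digest(source, low, high)]
    for low, high in mismatched:
        for key, value in source.items():
            if low < key <= high:
                target[key] = value
    return mismatched


ranges = [(0, 10), (10, 20), (20, 30)]
healthy = {5: "a", 15: "b", 25: "c"}
rebuilt = {5: "a", 25: "c"}  # key 15 was lost with the dead drive
streamed = repair(rebuilt, healthy, ranges)
```

Only the (10, 20) range is streamed; matching ranges cost one digest comparison each.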
Silver linings: Ops
●   Need more nodes?
    ●   Start more nodes with the same config file
    ●   New nodes request load information from the
        cluster and join with a token that balances the
        cluster
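One way a joining node can "balance the cluster" is to take half of the most heavily loaded node's range. This sketch assumes integer tokens on a ring of size 2**127 (the RandomPartitioner's range, treated here as a modulus) and splits the busiest node's range at its midpoint; Cassandra's bootstrap is described as taking about half of that node's keys, which need not fall at the midpoint. The function name is hypothetical.

```python
def bootstrap_token(ring, load, ring_size=2**127):
    """Pick a token for a joining node that halves the busiest node's range.

    ring: sorted list of existing tokens; load: token -> data stored by that node.
    A node with token t owns (previous token, t], wrapping around the ring."""
    busiest = max(ring, key=lambda t: load[t])
    previous = ring[ring.index(busiest) - 1]  # index 0 wraps to the last token
    width = (busiest - previous) % ring_size or ring_size  # a lone node owns it all
    return (previous + width // 2) % ring_size
```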
Silver linings: Ops
●   Adding a datacenter?
    ●   Configure “dc/rack/ip” describing node location
    ●   Add new nodes as before
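Location-aware placement can be pictured as a walk around the ring that prefers a replica in another datacenter. The policy below is a hypothetical simplification, loosely in the spirit of Cassandra 0.6's rack-aware strategy rather than a copy of it: the owning node first, then the first node found in a different datacenter, then the next unused nodes in ring order.

```python
def replicas_for(owner_index, ring, n):
    """ring: list of (token, node, datacenter) sorted by token.

    Hypothetical policy: the owning node first, then the first node along the
    ring in a different datacenter, then further nodes in ring order."""
    walk = [ring[(owner_index + i) % len(ring)] for i in range(len(ring))]
    chosen = [walk[0]]
    remote = next((e for e in walk[1:] if e[2] != walk[0][2]), None)
    if remote is not None and len(chosen) < n:
        chosen.append(remote)
    for entry in walk[1:]:
        if len(chosen) >= n:
            break
        if entry not in chosen:
            chosen.append(entry)
    return [node for _, node, _ in chosen]


ring = [(0, "a", "DC1"), (10, "b", "DC1"), (20, "c", "DC2"), (30, "d", "DC1")]
replicas_for(0, ring, 3)  # ["a", "c", "b"]: "c" is pulled forward as the DC2 copy
```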
Silver linings: Performance
Getting started
●   `gem install cassandra`
●   `git clone git://github.com/tritonrc/cassandra_object.git`
●   http://cassandra.apache.org/
    ●   Read "Getting Started"... Roughly:
        –   Start one node
        –   Test/develop app, editing node config as necessary
        –   Launch cluster by starting more nodes with chosen config
Questions?
Resources
●   http://cassandra.apache.org/
●   http://wiki.apache.org/cassandra/
●   Mailing Lists
●   #cassandra on freenode.net
References
●   Digg Technology Blog
    ●   http://about.digg.com/blog/looking-future-cassandra
    ●   http://about.digg.com/blog/introducing-digg’s-iddb-infrastructure
●   Github Projects
    ●   http://github.com/tritonrc/cassandra_object
    ●   http://github.com/bendiken/rdf-cassandra
●   Cassandra Wiki
    ●   http://wiki.apache.org/cassandra/
●   Brandon Williams's perf tests
    ●   http://racklabs.com/~bwilliam/cassandra/04vs05vs06.png
