On Rails with Apache Cassandra


                 Austin on Rails
                  April 27th 2010
  Stu Hood (@stuhood) – Technical Lead, Rackspace
My, what a large/volatile dataset you have!
●   Large
    ●   Larger than 1 node can handle
●   Volatile
    ●   More than 25% (ish) writes
    ●   (but still larger than available memory)
●   Expensive
    ●   More than you can afford with a commercial
        solution
My, what a large/volatile dataset you have!
●   For example:
    ●   Event/log data
    ●   Output of batch processing or log analytics jobs
    ●   Social network relationships/updates
●   In general:
    ●   Large volume of high fanout data
Conversely...
●   If your pattern easily fits one RDBMS machine:
    ●   Don't Use Cassandra
    ●   Possibly consider MongoDB, CouchDB, Neo4j,
        Redis, etc
        –   For schema freedom and flexibility
Case Study: Digg
1. Vertical partitioning and master/slave trees
2. Developed sharding solution
  ●   IDDB
  ●   Awkward replication, fragile scaling
3. Began populating Cassandra in parallel
  ●   Initial dataset for 'green badges'
      –   3 TB
      –   76 billion kv pairs
  ●   Most applications being ported to Cassandra
Cassandra's Elders
Standing on the shoulders of:
             Amazon Dynamo
●   No node in the cluster is special
    ●   No special roles
    ●   No scaling bottlenecks
    ●   No single point of failure
●   Techniques
    ●   Gossip
    ●   Eventual consistency
Standing on the shoulders of:
              Google Bigtable
●   “Column family” data model
●   Range queries for rows:
    ●   Scan rows in order
●   Memtable/SSTable structure
    ●   Always writes sequentially to disk
    ●   Bloom filters to minimize random reads
    ●   Trounces B-Trees for big data
        –   Linear insert performance
        –   Logarithmic growth for reads
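
The Bloom filter bullet can be illustrated with a small sketch. This is not Cassandra's implementation: the class name `TinyBloomFilter`, the bit-array size, the number of hashes, and the MD5-based position scheme are all assumptions chosen for readability. The point is only that a negative answer lets a read skip an SSTable without touching disk.

```ruby
require 'digest/md5'

# Hypothetical, minimal Bloom filter: it answers "definitely absent"
# or "possibly present", never "definitely present".
class TinyBloomFilter
  def initialize(bits: 1024, hashes: 3)
    @bits = bits
    @hashes = hashes
    @array = Array.new(bits, false)
  end

  def add(key)
    positions(key).each { |i| @array[i] = true }
  end

  # false => the key was never added; true => the key may have been added.
  def maybe_include?(key)
    positions(key).all? { |i| @array[i] }
  end

  private

  # Derive several bit positions from the two halves of one MD5 digest.
  def positions(key)
    digest = Digest::MD5.hexdigest(key)
    h1 = digest[0, 16].to_i(16)
    h2 = digest[16, 16].to_i(16)
    (0...@hashes).map { |i| (h1 + i * h2) % @bits }
  end
end

filter = TinyBloomFilter.new
filter.add('user19:personal')
puts filter.maybe_include?('user19:personal')  # prints true: added keys are never missed
```

A `false` result is always correct, so only keys that come back as "possibly present" cost a random read.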
Enter Cassandra
●   Hybrid of ancestors
    ●   Adopts listed features
●   And adds:
    ●   Pluggable partitioning
    ●   Multi datacenter
        support
        –   Pluggable locality
            awareness
    ●   Datamodel
        improvements
Enter Cassandra
●   Project status
    ●   Open sourced by Facebook in 2008 (Facebook no longer active in the project)
    ●   Apache License, Version 2.0
    ●   Graduated to Apache TLP February 2010
    ●   Major releases: 0.3 through 0.6.1 (0.7 this summer)
●   cassandra.apache.org
●   Known deployments at:
    ●   Cloudkick, Digg, Mahalo, SimpleGeo, Twitter,
        Rackspace, Reddit
The Datamodel
Cluster
●   Nodes have Tokens:
    ●   OrderPreservingPartitioner: actual keys
    ●   RandomPartitioner: MD5s of keys
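
A rough sketch of how tokens map keys to nodes under the two partitioners. The ring contents, node names, `TOKEN_SPACE`, and the helper names are hypothetical; the only behavior taken from the slide is that one partitioner uses the actual key as the token and the other uses an MD5 of the key.

```ruby
require 'digest/md5'

TOKEN_SPACE = 2**127 # assumed size of the token space for this sketch

# RandomPartitioner-style token: a number derived from the MD5 of the key.
def random_token(key)
  Digest::MD5.hexdigest(key).to_i(16) % TOKEN_SPACE
end

# OrderPreservingPartitioner-style token: the key itself, so key order is ring order.
def order_preserving_token(key)
  key
end

# A key belongs to the first node whose token is >= the key's token,
# wrapping around to the lowest token at the end of the ring.
def owner(ring, token)
  sorted = ring.sort_by { |_node, node_token| node_token }
  match = sorted.find { |_node, node_token| node_token >= token }
  (match || sorted.first).first
end

random_ring = { 'node_a' => 0, 'node_b' => TOKEN_SPACE / 3, 'node_c' => 2 * TOKEN_SPACE / 3 }
ordered_ring = { 'node_a' => 'h', 'node_b' => 'p', 'node_c' => 'z' }

puts owner(random_ring, random_token('user19:personal'))
puts owner(ordered_ring, order_preserving_token('rails_user'))  # prints node_c
```

With order-preserving tokens, adjacent keys land on the same node, which is what makes ordered row scans possible; MD5 tokens trade that away for even distribution.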
The Datamodel
Cluster > Keyspace
●   Like an RDBMS schema:
    ●   Keyspace per application
The Datamodel
Cluster > Keyspace > Column Family
●   Like an RDBMS table:
    ●   Separates classes of Objects
●   Sorted hash: Row Key → Row
    ●   Bytes → Row
The Datamodel
Cluster > Keyspace > Column Family > Row
●   Sorted hash: Name → Value, ...
The Datamodel
Cluster > Keyspace > Column Family > Row > “Column”
●   Not like an RDBMS column:
    ●   Attribute of the row: each row can contain millions of different columns
●   Name → Value
    ●   bytes → bytes
    ●   + version timestamp
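
The nesting described on these slides can be pictured as hashes within hashes. The sketch below is an in-memory model only, with hypothetical names (`Column`, `insert`); it shows the column family → row → column layering and uses the timestamp to let the newest write win.

```ruby
# Hypothetical in-memory model:
# column family => row key => column name => Column(value, timestamp)
Column = Struct.new(:value, :timestamp)

keyspace = Hash.new { |ks, cf| ks[cf] = Hash.new { |rows, key| rows[key] = {} } }

# Keep whichever version of a column carries the newest timestamp.
def insert(keyspace, column_family, row_key, name, value, timestamp)
  row = keyspace[column_family][row_key]
  current = row[name]
  row[name] = Column.new(value, timestamp) if current.nil? || timestamp > current.timestamp
end

insert(keyspace, 'Users', 'rails_user', 'fullname', 'Damon Clinkscales', 2)
insert(keyspace, 'Users', 'rails_user', 'fullname', 'stale value', 1) # older timestamp loses
insert(keyspace, 'Users', 'rails_user', 'joindate', 'back_in_the_day', 2)

# Columns within a row are read back sorted by name.
keyspace['Users']['rails_user'].sort_by { |name, _column| name }.each do |name, column|
  puts "#{name} => #{column.value} (ts #{column.timestamp})"
end
```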
StatusApp: another Twitter clone.
StatusApp Example
<ColumnFamily Name="Users">
●   Unique id as key: name->value pairs contain
    user attributes
{key: "rails_user", row: {"fullname": "Damon Clinkscales", "joindate": "back_in_the_day" … }}
StatusApp Example
<ColumnFamily Name="Timelines">
●   User id and timeline name as key: row contains
    list of updates from that timeline
{key: "user19:personal", row: {<timeuuid1>: "status19", <timeuuid2>: "status21", … }}
Raw Client API
●   Thrift RPC framework
    ●   Generates client bindings for (almost) any language


1. Get the most recent status in a timeline:
●   get_slice(keyspace, key, [column_family,
    column_name], predicate, consistency_level)
●   get_slice("statusapp", "user19:personal",
    ["Timelines"], {start: "", count: 1}, QUORUM)
> <timeuuid1>: “status19”
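
To make the slice idea concrete, here is a pure-Ruby stand-in for reading a range of columns from one row. It is not the Thrift-generated API: the `slice` method, its keyword arguments, and the integer column names (standing in for time-ordered `<timeuuid>` values) are assumptions for illustration.

```ruby
# Hypothetical stand-in for a slice over one row.
# `row` maps column names to values; columns are read in sorted name order.
def slice(row, start: nil, count: 100, reversed: false)
  names = row.keys.sort
  names.reverse! if reversed
  unless start.nil?
    # Skip columns that come before the starting name in the read direction.
    names = names.drop_while { |n| reversed ? n > start : n < start }
  end
  names.first(count).map { |n| [n, row[n]] }
end

# Integers stand in for the time-ordered column names of a timeline row.
timeline = { 1001 => 'status19', 1002 => 'status21' }

p slice(timeline, count: 1)                  # prints [[1001, "status19"]]
p slice(timeline, count: 1, reversed: true)  # prints [[1002, "status21"]]
```

In this model, a count of 1 returns whichever column sorts first in the read direction, so which status counts as "most recent" depends on the column ordering and on whether the slice is reversed.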
But...
●   Don't use the Raw Thrift API!
    ●   You won't enjoy it
●   Use high level Client APIs
    ●   Many options for each language
Consistency Levels?
●   Eventual consistency
    ●   Synch to Washington, asynch to Hong Kong
●   Client API Tunables
    ●   Synchronously write to W replicas
    ●   Confirm R replicas match at read time
    ●   of N total replicas
●   Allows for almost-strong consistency
    ●   When W + R > N
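
Why W + R > N matters can be checked by brute force: if it holds, every possible set of R replicas read must share at least one replica with every possible set of W replicas written. The helper names below are hypothetical and the replicas are just numbers.

```ruby
# The rule from the slide.
def overlapping_quorums?(n:, w:, r:)
  w + r > n
end

# Brute-force check: does every W-sized write set intersect every R-sized read set?
def always_overlap?(n:, w:, r:)
  replicas = (1..n).to_a
  replicas.combination(w).all? do |written|
    replicas.combination(r).all? { |read| !(written & read).empty? }
  end
end

n = 3
[[1, 1], [2, 2], [3, 1], [1, 3]].each do |w, r|
  verdict = overlapping_quorums?(n: n, w: w, r: r) ? 'read and write sets always overlap' : 'a read may miss the latest write'
  puts "N=#{n} W=#{w} R=#{r}: #{verdict}"
end
```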
Write Example
●   Replication Factor == N == 3: 3 copies
Write Example
●   Client connects to arbitrary node
Write Example
●   cl.ONE: W == 1
    ●   Block for success on 1 replica
Write Example
●   cl.QUORUM: W == N/2+1
    ●   Block for success on a majority
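
The two write levels above reduce to a count of acknowledgements. This sketch assumes a hypothetical coordinator that already knows which replicas responded; the symbols `:one` and `:quorum` and both method names are illustrative, and N/2+1 uses integer division.

```ruby
# Number of replica acknowledgements a write blocks for, given N replicas.
def required_acks(consistency_level, n)
  case consistency_level
  when :one then 1
  when :quorum then n / 2 + 1 # integer division: a strict majority
  else raise ArgumentError, "unknown consistency level: #{consistency_level}"
  end
end

# Hypothetical coordinator check: true once enough replicas have acknowledged.
# `replica_up` holds one boolean per replica.
def write_succeeds?(consistency_level, replica_up)
  acks = replica_up.count(true)
  acks >= required_acks(consistency_level, replica_up.size)
end

replicas = [true, true, false] # N == 3, one replica down
puts write_succeeds?(:one, replicas)                 # prints true
puts write_succeeds?(:quorum, replicas)              # prints true: 2 of 3 is a majority
puts write_succeeds?(:quorum, [true, false, false])  # prints false
```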
Caveat consumptor
●   No secondary indexes:
    ●   Typically implemented in client libraries
●   No transactions
    ●   But atomic increment/decrement coming RSN (real soon now)
●   Absolutely no joins
    ●   You don't really want 'em anyway
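
A client-maintained secondary index usually amounts to a second column family that the client keeps in step with the first. The sketch below models both as nested hashes; the `city` attribute, the `users_by_city` index, and the method names are hypothetical, and nothing here makes the two writes atomic.

```ruby
# Hypothetical column families modelled as: row key => { column name => value }.
users = Hash.new { |h, k| h[k] = {} }
users_by_city = Hash.new { |h, k| h[k] = {} } # index row: city => { user key => '' }

# The client writes the row and the index entry itself.
def save_user(users, index, key, attributes)
  if attributes.key?('city')
    old_city = users[key]['city']
    index[old_city].delete(key) if old_city # drop the stale index entry
    index[attributes['city']][key] = ''
  end
  users[key].merge!(attributes)
end

# Look up user rows through the index row for a city.
def find_by_city(users, index, city)
  index[city].keys.sort.map { |key| users[key] }
end

save_user(users, users_by_city, 'rails_user', { 'city' => 'Austin' })
save_user(users, users_by_city, 'user19', { 'city' => 'Austin' })
save_user(users, users_by_city, 'user19', { 'city' => 'Dallas' })

p find_by_city(users, users_by_city, 'Austin') # prints [{"city"=>"Austin"}]
```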
“That doesn't sound worth the trouble!”
Cassandra Ruby Support: Cassandra Object
●   Mostly duck-type compatible with ActiveRecord
    objects
    ●   Transparently builds (non-)unique secondary
        indexes
    ●   Excludes:
        –   :order
        –   :conditions
        –   :joins
        –   :group
Cassandra Ruby Support: RDF.rb
●   Repository implementation for RDF.rb
    ●   Stores triple of (subject, predicate, object) as
        (rowkey, name, subname)
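
The (rowkey, name, subname) layout can be pictured as three levels of hashes. This is an illustration of the mapping only, not code from rdf-cassandra; the helper names and the sample triples are made up.

```ruby
# row key (subject) => column name (predicate) => subcolumn name (object) => empty value
store = Hash.new { |rows, s| rows[s] = Hash.new { |cols, p| cols[p] = {} } }

def add_triple(store, subject, predicate, object)
  store[subject][predicate][object] = ''
end

# All objects for a given subject and predicate.
def objects(store, subject, predicate)
  store[subject][predicate].keys
end

# Illustrative triples only.
add_triple(store, 'user19', 'follows', 'rails_user')
add_triple(store, 'user19', 'follows', 'user21')

p objects(store, 'user19', 'follows') # prints ["rails_user", "user21"]
```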
Silver linings: Ops
●   Dead drive?
    ●   Swap the drive, restart, run 'repair'
    ●   Streams missing data from other replicas
●   Dead node?
    ●   Start a new node with the same IP and token, run
        'repair'
Silver linings: Ops
●   Need N new nodes?
    ●   Start more nodes with the same config file
    ●   New nodes request load information from the
        cluster and join with a token that balances the
        cluster
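
One simple way to picture "a token that balances the cluster" is to split the widest token range in half. This sketch assumes range width is a fair proxy for load, which the slide does not claim; the method name and the toy token space of 100 are hypothetical.

```ruby
# Pick a token that bisects the widest gap between existing tokens.
# Assumption: range width stands in for the load information nodes exchange.
def balancing_token(tokens, token_space)
  sorted = tokens.sort
  # Pair each token with the distance to its successor, wrapping around the ring.
  gaps = sorted.each_with_index.map do |token, i|
    successor = sorted[(i + 1) % sorted.size]
    width = (successor - token) % token_space
    width = token_space if width.zero? # a single node owns the whole ring
    [width, token]
  end
  width, start = gaps.max_by { |gap_width, _token| gap_width }
  (start + width / 2) % token_space
end

puts balancing_token([0, 25, 50], 100) # prints 75: the range from 50 back around to 0 is widest
```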
Silver linings: Ops
●   Adding a datacenter?
    ●   Configure “dc/rack/ip” describing node location
    ●   Add new nodes as before
Silver linings: Performance
Getting started
●   `gem install cassandra`
●   `git clone git://github.com/tritonrc/cassandra_object.git`
●   http://cassandra.apache.org/
    ●   Read "Getting Started"... Roughly:
        –   Start one node
        –   Test/develop app, editing node config as necessary
        –   Launch cluster by starting more nodes with chosen config
Questions?
Resources
●   http://cassandra.apache.org/
●   http://wiki.apache.org/cassandra/
●   Mailing Lists
●   #cassandra on freenode.net
References
●   Digg Technology Blog
    ●   http://about.digg.com/blog/looking-future-cassandra
    ●   http://about.digg.com/blog/introducing-digg’s-iddb-infrastructure
●   Github Projects
    ●   http://github.com/tritonrc/cassandra_object
    ●   http://github.com/bendiken/rdf-cassandra
●   Cassandra Wiki
    ●   http://wiki.apache.org/cassandra/
●   Brandon Williams's perf tests
    ●   http://racklabs.com/~bwilliam/cassandra/04vs05vs06.png
