2. Where Did Cassandra Come From
• Cassandra originated at Facebook in 2007 to
solve that company’s inbox search problem
– large volumes of data
– many random reads
– many simultaneous random writes
• It was released as an open source project on
Google Code in July 2008
• In March 2009 it moved to the Apache Incubator
• On February 17, 2010 it was voted into a top-level
Apache project
3. Cassandra in 50 Words or Less
• Apache Cassandra is an
– open source
– distributed
– decentralized
– elastically scalable
– highly available
– fault-tolerant
– tuneably consistent
– column-oriented
• database that bases its distribution design on Amazon’s Dynamo
and its data model on Google’s Bigtable
• Created at Facebook, it is now used at some of the most popular sites on the Web
4. Who Is Using Cassandra
• Twitter is using Cassandra for analytics.
• Mahalo uses it for its primary near-time data store.
• Facebook still uses it for inbox search, though they are using a
proprietary fork.
• Digg uses it for its primary near-time data store.
• Rackspace uses it for its cloud service, monitoring, and logging.
• Reddit uses it as a persistent cache.
• Cloudkick uses it for monitoring statistics and analytics.
• Ooyala uses it to store and serve near real-time video analytics
data.
• SimpleGeo uses it as the main data store for its real-time location
infrastructure.
• Onespot uses it for a subset of its main data store.
5. Decentralized
• Decentralized: all nodes are the same; the failure of a
node won’t disrupt service
• Master/slave: if the master node fails, the whole
database is in jeopardy
8. ACID
• Atomic
– All or nothing
• Consistent
• Isolated
– Two transactions that modify the same data do not interfere with each other
• Durable
9. Brewer’s CAP Theorem
• You can strongly support only two of the three:
– Consistency
• All database clients will read the same value for the same query,
even given concurrent updates
– Availability
• All database clients will always be able to read and write
data
– Partition Tolerance
• The database can be split across multiple machines
• It can continue functioning in the face of network segmentation
breaks
17. Clusters (Ring)
• If the first node goes down, a replica can
respond to queries. The peer-to-peer protocol
allows the data to replicate across nodes in a
manner transparent to the user
• Replication factor
19. Gossip protocols
• Supports intra-ring communication so that each node
can have state information about other nodes
• The gossiper runs every second
• Gossip messages:
– Initiator sends: GossipDigestSynMessage
– Peer acknowledges: GossipDigestAckMessage
– Initiator sends: GossipDigestAck2Message
• Failure detection algorithm:
– Phi Accrual Failure Detection
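A minimal sketch of the idea behind phi accrual failure detection, not Cassandra's actual implementation. It assumes heartbeat gaps are exponentially distributed, so the suspicion level phi grows linearly with the time since the last heartbeat. The class name, the window size, and the threshold of 8 are illustrative assumptions.

```python
import math
from collections import deque


class PhiAccrualDetector:
    """Toy phi accrual failure detector for a single peer (hypothetical names)."""

    def __init__(self, window=1000):
        self.intervals = deque(maxlen=window)  # recent heartbeat gaps, in seconds
        self.last_heartbeat = None

    def heartbeat(self, now):
        # Record the gap since the previous heartbeat from this peer.
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now):
        # Assumption: gaps are exponentially distributed, so
        # P(gap > t) = exp(-t / mean) and phi = -log10 of that probability.
        if not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        elapsed = now - self.last_heartbeat
        return elapsed / (mean * math.log(10))


detector = PhiAccrualDetector()
for t in range(10):                 # heartbeats at t = 0..9, one per second
    detector.heartbeat(float(t))

phi_soon = detector.phi(9.5)        # half a second of silence
phi_late = detector.phi(30.0)       # 21 seconds of silence
suspected = phi_late > 8            # 8 is an illustrative threshold
```

Rather than a yes/no answer, the detector yields a continuous suspicion value, and the caller decides how much suspicion is enough to treat the peer as down.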
20. Anti-entropy
• Anti-entropy is the replica synchronization
mechanism in Cassandra for ensuring that
data on different nodes is updated to the
newest version
• Replicas are compared using Merkle trees
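A small sketch of how a Merkle tree lets two replicas detect divergence by exchanging a single hash instead of all their rows. The row layout, the use of SHA-256, and the function names are assumptions made for illustration; Cassandra's own tree construction differs in detail.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(rows):
    """Return the Merkle root over a dict of key -> value, ordered by key."""
    level = [_h(f"{k}={v}".encode()) for k, v in sorted(rows.items())]
    if not level:
        return _h(b"")
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last hash on odd-sized levels
        # Each parent is the hash of its two children.
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


replica_a = {"k1": "v1", "k2": "v2", "k3": "v3"}
replica_b = {"k1": "v1", "k2": "stale", "k3": "v3"}

in_sync = merkle_root(replica_a) == merkle_root(dict(replica_a))
needs_repair = merkle_root(replica_a) != merkle_root(replica_b)
```

A full implementation would keep the intermediate levels so that two replicas can descend the tree and exchange only the ranges whose hashes differ.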
21. Memtable, SSTable, and Commit Log
• Memtable
– Value is written to a memory-resident data structure
• SSTable
– Includes: Data, Index, and Filter
– Concept borrowed from Google’s Bigtable
– When a memtable reaches a threshold, it is flushed to disk as an SSTable
• Commit log
– Flush status bit: 0 / 1
• 1: flush needed
• 0: flush completed successfully
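The interplay of the three structures above can be sketched as a toy write path: append to the commit log, update the memtable, and flush to a sorted, immutable SSTable once a threshold is reached. The class, the in-memory stand-ins for on-disk files, and the threshold value are assumptions for illustration only.

```python
class ToyStore:
    """Toy write path: commit log -> memtable -> SSTable (all held in memory)."""

    def __init__(self, flush_threshold=3):
        self.commit_log = []   # stands in for the append-only on-disk log
        self.memtable = {}     # memory-resident structure for recent writes
        self.sstables = []     # each SSTable is an immutable, sorted tuple
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # logged first, for durability
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the memtable out in key order, then start a fresh one.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed writes no longer need replaying

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest SSTable first
            for k, v in table:
                if k == key:
                    return v
        return None


store = ToyStore(flush_threshold=2)
store.write("b", "1")
store.write("a", "2")   # second write reaches the threshold and triggers a flush
store.write("c", "3")   # lands in a fresh memtable
```

After these three writes the store holds one SSTable with keys `a` and `b` in sorted order, while `c` is still only in the memtable and the commit log.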
22. Hinted Handoff & Compaction
• Hinted handoff
– Used when a write is made and the target replica node is not available
– Cassandra writes a hint to a live node so the write can be replayed to the unavailable node later
• Compaction:
– Merges SSTables
– Merged data is sorted
– A new index is created over the sorted data
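The compaction steps above can be sketched as follows: merge several SSTables, keep only the newest version of each key, sort the result, and build an index over it. The `(key, timestamp, value)` tuple layout and the function name are assumptions for illustration.

```python
def compact(sstables):
    """Merge SSTables given as lists of (key, timestamp, value) tuples.

    Returns the merged, key-sorted table and an index of key -> position.
    """
    newest = {}
    for table in sstables:
        for key, ts, value in table:
            # Keep only the version with the highest timestamp per key.
            if key not in newest or ts > newest[key][0]:
                newest[key] = (ts, value)
    merged = [(key, ts, value) for key, (ts, value) in sorted(newest.items())]
    index = {key: pos for pos, (key, _, _) in enumerate(merged)}
    return merged, index


older = [("a", 1, "x"), ("c", 1, "z")]
newer = [("a", 2, "x2"), ("b", 2, "y")]
merged, index = compact([older, newer])
```

Here the stale version of `a` is discarded, and `index` maps each surviving key to its position in the merged table.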
23. major compaction
• Bloom filters are stored in memory
• used to improve performance by reducing disk
access on key lookups
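The two bullets above match the role Bloom filters play for SSTables (the "Filter" component): an in-memory structure that can say a key is definitely absent, so the disk read can be skipped. A minimal sketch, with the bit-array size, hash count, and SHA-256 based hashing chosen purely for illustration:

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes positions by salting the key with the hash number.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely not present"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


sstable_filter = BloomFilter()
sstable_filter.add("row1")
must_check_disk = sstable_filter.might_contain("row1")
```

Only a `True` answer requires going to disk; a `False` answer is certain, which is where the saving on key lookups comes from.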
24. Tombstones
• Known as a “soft delete”
• Data is not immediately deleted when a
delete operation is executed
• Garbage Collection Grace Seconds:
– GCGraceSeconds
• Default: 10 days (864000 sec)
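A short sketch of how the grace period could be applied: a tombstone is kept until it is older than GCGraceSeconds, and only then becomes eligible for removal. The column layout and function names are hypothetical; only the 864000-second default comes from the slide.

```python
GC_GRACE_SECONDS = 864000  # 10 days, the default quoted above


def is_purgeable(deleted_at, now, grace=GC_GRACE_SECONDS):
    """A tombstone may be dropped once the grace period has fully elapsed."""
    return deleted_at is not None and now - deleted_at >= grace


def purge(columns, now):
    # Live columns and recent tombstones are kept; expired tombstones are dropped.
    return [c for c in columns if not is_purgeable(c["deleted_at"], now)]


now = 2_000_000  # arbitrary "current time" in seconds
columns = [
    {"name": "email", "deleted_at": None},                        # live column
    {"name": "phone", "deleted_at": now - 3600},                  # deleted an hour ago
    {"name": "fax", "deleted_at": now - GC_GRACE_SECONDS - 1},    # past the grace period
]
remaining = [c["name"] for c in purge(columns, now)]
```

The recently deleted `phone` column survives as a tombstone so that replicas that missed the delete can still learn about it.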
25. Staged Event-Driven Architecture (SEDA)
• Originally proposed in a 2001 paper called “SEDA: An
Architecture for Well-Conditioned, Scalable Internet
Services”
• A stage consists of an incoming event queue, an event
handler, and an associated thread pool
• Operations represented as stages in Cassandra:
– Read
– Mutation
– Gossip
– Response
– Anti-Entropy
– Load Balance
– Migration
– Streaming
– …
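A minimal sketch of a SEDA-style stage: an incoming event queue drained by a small pool of worker threads that run a handler. The `Stage` class and the two-stage pipeline are illustrative assumptions and do not mirror Cassandra's internal classes.

```python
import queue
import threading


class Stage:
    """A stage = incoming event queue + handler + worker threads."""

    def __init__(self, name, handler, workers=2):
        self.name = name
        self.handler = handler
        self.events = queue.Queue()
        for _ in range(workers):
            threading.Thread(target=self._run, daemon=True).start()

    def submit(self, event):
        self.events.put(event)

    def _run(self):
        while True:
            event = self.events.get()
            try:
                self.handler(event)
            finally:
                self.events.task_done()

    def drain(self):
        # Block until every submitted event has been handled.
        self.events.join()


store, acks = {}, []
response = Stage("RESPONSE", acks.append)


def apply_mutation(event):
    key, value = event
    store[key] = value
    response.submit(("ok", key))  # hand the result on to the next stage


mutation = Stage("MUTATION", apply_mutation)
for i in range(5):
    mutation.submit((f"k{i}", i))
mutation.drain()
response.drain()
```

Because each stage owns its queue and threads, a slow stage backs up only its own queue instead of tying up a thread for the whole request.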
27. Configuring Cassandra
• system_add_keyspace
– Creates a keyspace.
• system_rename_keyspace
– Changes the name of a keyspace after taking a snapshot of it. Note that this
method blocks until its work is done.
• system_drop_keyspace
– Deletes an entire keyspace after taking a snapshot of it.
• system_add_column_family
– Creates a column family.
• system_drop_column_family
– Deletes a column family after taking a snapshot of it.
• system_rename_column_family
– Changes the name of a column family after taking a snapshot of it. Note that
this method blocks until its work is done.
28. Creating a Column Family
• column_type
– Either Super or Standard.
• clock_type
– The only valid value is Timestamp.
• comparator
– Valid options include AsciiType, BytesType, LexicalUUIDType, LongType, TimeUUIDType, and UTF8Type.
• subcomparator
– Name of comparator used for subcolumns when the column_type is Super. Valid options are the same as comparator.
• reconciler
– Name of the class that will reconcile conflicting column versions. The only valid value at this time is Timestamp.
• comment
– Any human-readable comment in the form of a string.
• rows_cached
– The number of rows to cache.
• preload_row_cache
– Set this to true to automatically load the row cache.
• key_cache_size
– The number of keys to pull into the cache.
• read_repair_chance
– Valid values are a number between 0.0 and 1.0.
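The constraints listed above can be captured in a small validation sketch. `ColumnFamilyDef` is a hypothetical container, not a Cassandra or client-library class; its default values are placeholders, and the rule that a Super column family needs an explicit subcomparator is an assumption made for the example.

```python
from dataclasses import dataclass

# Comparator names as listed on the slide.
COMPARATORS = {"AsciiType", "BytesType", "LexicalUUIDType",
               "LongType", "TimeUUIDType", "UTF8Type"}


@dataclass
class ColumnFamilyDef:
    # Hypothetical container; defaults are placeholders, not Cassandra's.
    name: str
    column_type: str = "Standard"
    comparator: str = "BytesType"
    subcomparator: str = ""
    comment: str = ""
    rows_cached: int = 0
    read_repair_chance: float = 1.0

    def validate(self):
        """Return a list of human-readable problems (empty if none)."""
        errors = []
        if self.column_type not in ("Standard", "Super"):
            errors.append("column_type must be Standard or Super")
        if self.comparator not in COMPARATORS:
            errors.append(f"unknown comparator {self.comparator}")
        # Assumed rule: subcolumns of a Super family need their own comparator.
        if self.column_type == "Super" and self.subcomparator not in COMPARATORS:
            errors.append("Super column families need a valid subcomparator")
        if not 0.0 <= self.read_repair_chance <= 1.0:
            errors.append("read_repair_chance must be between 0.0 and 1.0")
        return errors


ok = ColumnFamilyDef("Users", comparator="UTF8Type")
bad = ColumnFamilyDef("Events", column_type="Super", read_repair_chance=1.5)
```

Checking a definition before sending it to the cluster surfaces mistakes such as an out-of-range `read_repair_chance` early.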
30. Replication Factor
• specifies how many copies of each piece of
data will be stored and distributed throughout
the Cassandra cluster
• Factor = 1 : your data will exist only in a single
node in the cluster. Losing that node means
that data becomes unavailable
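A sketch of what the replication factor means on a token ring: each key hashes to a position on the ring, and the next N distinct nodes clockwise hold its copies. The node names, the MD5-based token function, and this simple clockwise walk are assumptions for illustration; actual placement depends on the configured partitioner and replica placement strategy.

```python
import bisect
import hashlib


def token(value):
    """Map a string to a position on the ring (MD5 used here for illustration)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest(), "big")


def replicas_for(key, nodes, replication_factor):
    """Return the nodes holding copies of key, walking the ring clockwise."""
    ring = sorted((token(n), n) for n in nodes)
    tokens = [t for t, _ in ring]
    # First node at or after the key's token, wrapping around past the end.
    start = bisect.bisect_left(tokens, token(key)) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


nodes = ["node1", "node2", "node3", "node4"]   # hypothetical node names
single = replicas_for("user:42", nodes, 1)
triple = replicas_for("user:42", nodes, 3)

# With a factor of 1, losing the only replica leaves no live copy;
# with a factor of 3, two copies remain.
survivors_rf1 = [n for n in single if n != single[0]]
survivors_rf3 = [n for n in triple if n != single[0]]
```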
31. Increasing the Replication Factor
• As nodes are added to the cluster, the replication factor should be increased
• How to do it:
– ensure that all the data is flushed to the SSTables
• flush -h 192.168.1.1 -p 9160
– stop that node
– copy the datafiles from your keyspaces
– Paste those datafiles to the new node
33. Adding Nodes to a Cluster
• If you want to add a new seed node, then you should
autobootstrap it first, and then change it to a seed
afterward
• Node1:
– listen_address: 192.168.1.1
– rpc_address: 0.0.0.0
• Node2:
– auto_bootstrap: true
– listen_address: 192.168.2.34
– rpc_address: 0.0.0.0