The Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data. Cassandra's support for replicating across multiple datacenters is best-in-class, providing lower latency for your users and the peace of mind of knowing that you can survive regional outages.
http://tyfs.rocks
4. Cassandra Architecture – CAP Theorem
Cassandra was designed to fall in the "AP" intersection of the CAP theorem, which states that any distributed system can guarantee only two of the following properties at the same time: Consistency, Availability, and Partition Tolerance. Cassandra is therefore a good fit when you need a distributed database that keeps a system highly available and remains tolerant of partitions in its data when some node in the cluster is offline, which is common in distributed systems.
5. Cassandra Architecture – Data Model
Cassandra is classified as a column-based database: its basic storage structure is a set of columns, each comprising a column key and a column value. Every row is identified by a unique key, a string with no size limit, called the partition key. Each set of columns is called a column family, similar to a table in a relational database.
6. Cassandra Architecture – Data Model
SortedMap<RowKey,SortedMap<ColumnKey, ColumnValue>>
A map gives efficient key lookup, and its sorted nature gives efficient scans. In Cassandra, we can use row keys and column keys to do efficient lookups and range scans.
The number of column keys is unbounded, which means you can have wide rows.
A key can itself hold a value; in other words, you can have a valueless column.
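To make the nested-map analogy concrete, here is a minimal, self-contained Java sketch using plain JDK TreeMaps (not the Cassandra API; the class, key, and value names are made up) that mimics a wide row: one partition key mapped to a sorted set of column key/value pairs, with efficient lookups and column-range slices.

import java.util.NavigableMap;
import java.util.SortedMap;
import java.util.TreeMap;

// Conceptual model only: a partition (row) key maps to a sorted map of columns.
public class WideRowSketch {
    // SortedMap<RowKey, SortedMap<ColumnKey, ColumnValue>>
    private final NavigableMap<String, NavigableMap<String, String>> table = new TreeMap<>();

    // Insert a column into a row, creating the row if needed.
    public void put(String rowKey, String columnKey, String columnValue) {
        table.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(columnKey, columnValue);
    }

    // Efficient key lookup: fetch one column of one row.
    public String get(String rowKey, String columnKey) {
        NavigableMap<String, String> row = table.get(rowKey);
        return row == null ? null : row.get(columnKey);
    }

    // Efficient range scan over column keys within a single row (a "slice").
    public SortedMap<String, String> slice(String rowKey, String fromColumn, String toColumn) {
        NavigableMap<String, String> row = table.get(rowKey);
        return row == null ? new TreeMap<String, String>() : row.subMap(fromColumn, true, toColumn, true);
    }

    public static void main(String[] args) {
        WideRowSketch events = new WideRowSketch();
        events.put("user-42", "2017-07-26T10:00", "login");
        events.put("user-42", "2017-07-26T10:05", "click");
        events.put("user-42", "2017-07-26T10:09", "logout");
        // Range scan: all columns between two column keys of the same (wide) row.
        System.out.println(events.slice("user-42", "2017-07-26T10:00", "2017-07-26T10:06"));
    }
}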
7. Cassandra Architecture – Write Path
Cassandra Write Path
Every node first writes the mutation to the commit log and then writes the mutation to the memtable. Writing to the commit log ensures durability of the write, since the memtable is an in-memory structure that only reaches disk when it is flushed. A memtable is flushed to disk when:
• It reaches its maximum allocated size in memory
• The maximum number of minutes a memtable may stay in memory elapses
• A user flushes it manually
A memtable is flushed to an immutable structure called an SSTable (Sorted String Table). The commit log is used for playback in case data in the memtable is lost due to node failure.
Every SSTable creates three files on disk: a bloom filter, a key index, and a data file.
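The flow above can be summarised in a deliberately simplified, hypothetical Java sketch (not Cassandra source code; the thresholds and class names are invented): append to the commit log first, update the in-memory memtable, and flush to an immutable SSTable when a size or age threshold is crossed.

import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Simplified write-path sketch: commit log first, then memtable, then flush.
class WritePathSketch {
    private final StringBuilder commitLog = new StringBuilder();          // stand-in for the on-disk commit log
    private Map<String, String> memtable = new ConcurrentSkipListMap<>(); // in-memory, sorted by key
    private final long maxMemtableEntries;
    private final long maxMemtableAgeMillis;
    private long memtableCreatedAt = System.currentTimeMillis();

    WritePathSketch(long maxEntries, long maxAgeMillis) {
        this.maxMemtableEntries = maxEntries;
        this.maxMemtableAgeMillis = maxAgeMillis;
    }

    void write(String key, String value) {
        commitLog.append(key).append('=').append(value).append('\n'); // 1. durability: commit log
        memtable.put(key, value);                                      // 2. in-memory memtable
        if (shouldFlush()) {
            flush();                                                   // 3. flush to an immutable SSTable
        }
    }

    private boolean shouldFlush() {
        boolean tooBig = memtable.size() >= maxMemtableEntries;
        boolean tooOld = System.currentTimeMillis() - memtableCreatedAt >= maxMemtableAgeMillis;
        return tooBig || tooOld;
    }

    void flush() {
        // In Cassandra the memtable is written out as an immutable SSTable
        // (data file, key index, bloom filter); here we just swap in a fresh memtable.
        Map<String, String> sstable = memtable;
        memtable = new ConcurrentSkipListMap<>();
        memtableCreatedAt = System.currentTimeMillis();
        System.out.println("flushed SSTable with " + sstable.size() + " entries");
    }

    public static void main(String[] args) {
        WritePathSketch node = new WritePathSketch(2, 60_000);
        node.write("user-42", "login");
        node.write("user-43", "click"); // reaches the (tiny, illustrative) size threshold and triggers a flush
    }
}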
8. Cassandra Architecture – Read Path
Cassandra Read Path
Every column family stores data in a number of SSTables, so data for a particular row can be located in several SSTables as well as in the memtable. For every read request, Cassandra therefore needs to read data from all applicable SSTables (all SSTables for that column family) and scan the memtable for applicable data fragments. This data is then merged and returned to the coordinator.
If the contacted replicas have different versions of the data, the coordinator returns the latest version to the client and issues a read repair command to the node or nodes with the older version. The read repair operation pushes the newer version of the data to the nodes with the older version.
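As a rough illustration of the merge-and-repair step (a sketch under assumed names, not Cassandra's actual implementation), the coordinator can be thought of as picking the fragment with the newest write timestamp and pushing it back to any replica that holds an older one:

import java.util.List;

// Simplified read-path sketch: merge fragments by timestamp, repair stale replicas.
class ReadMergeSketch {

    record Fragment(String value, long writeTimestamp, String sourceReplica) {}

    // Merge data fragments from the memtable and all applicable SSTables/replicas:
    // the fragment with the newest write timestamp wins.
    static Fragment merge(List<Fragment> fragments) {
        Fragment newest = null;
        for (Fragment f : fragments) {
            if (newest == null || f.writeTimestamp() > newest.writeTimestamp()) {
                newest = f;
            }
        }
        return newest;
    }

    // Read repair: any replica whose version is older than the merged result
    // gets the newer value pushed back to it.
    static void readRepair(List<Fragment> fragments, Fragment newest) {
        for (Fragment f : fragments) {
            if (f.writeTimestamp() < newest.writeTimestamp()) {
                System.out.println("read repair: push " + newest.value()
                        + " to replica " + f.sourceReplica());
            }
        }
    }

    public static void main(String[] args) {
        List<Fragment> fragments = List.of(
                new Fragment("v1", 100L, "replica-1"),
                new Fragment("v2", 200L, "replica-2"),
                new Fragment("v1", 100L, "replica-3"));
        Fragment newest = merge(fragments);
        System.out.println("merged result: " + newest.value());
        readRepair(fragments, newest);
    }
}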
9. Cassandra Architecture – Cluster Topology
Cluster Concepts
• A node is a Cassandra instance (in production: one node per machine).
• A partition is one ordered and replicable unit of data on a node.
• A rack is a logical set of nodes.
• A data center is a logical set of racks.
• A cluster is the full set of nodes, which maps to a single complete token ring.
• Nodes communicate peer-to-peer via the gossip protocol.
(A toy token-ring lookup is sketched below.)
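The token-ring idea can be illustrated with a toy Java sketch. Assumptions: String.hashCode stands in for Cassandra's real Murmur3 partitioner, and the node names and token values are invented. Each key belongs to the first node whose token is greater than or equal to the key's token, wrapping around the ring.

import java.util.Map;
import java.util.TreeMap;

// Toy token ring: each node owns the range ending at its token;
// a key is placed on the first node whose token is >= hash(key), wrapping around.
class TokenRingSketch {
    private final TreeMap<Long, String> ring = new TreeMap<>(); // token -> node

    void addNode(String nodeName, long token) {
        ring.put(token, nodeName);
    }

    String nodeFor(String partitionKey) {
        long token = partitionKey.hashCode(); // stand-in for Cassandra's Murmur3 partitioner
        Map.Entry<Long, String> owner = ring.ceilingEntry(token);
        return owner != null ? owner.getValue() : ring.firstEntry().getValue(); // wrap around
    }

    public static void main(String[] args) {
        TokenRingSketch ring = new TokenRingSketch();
        ring.addNode("node-a", -3_000_000_000L);
        ring.addNode("node-b", 0L);
        ring.addNode("node-c", 3_000_000_000L);
        System.out.println(ring.nodeFor("user-42")); // primary replica for this key
    }
}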
10. Cassandra Architecture – Data Consistency
Tunable Data Consistency
• How many nodes must acknowledge a read/write request
• Choose anywhere between STRONG and EVENTUAL consistency
• Possible consistency levels include ANY, ONE, QUORUM (RF/2 + 1), and ALL
• Tunable per request
• Multi-datacenter support
(The QUORUM arithmetic is worked through in the sketch below.)
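The QUORUM formula from the slide works out as follows in a small Java illustration (the replication factor and acknowledgement counts are example values):

// Consistency-level arithmetic from the slide: QUORUM = RF/2 + 1 (integer division).
class ConsistencySketch {
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // A read or write at a given consistency level succeeds once enough replicas acknowledge.
    static boolean satisfied(int acknowledgedReplicas, int requiredReplicas) {
        return acknowledgedReplicas >= requiredReplicas;
    }

    public static void main(String[] args) {
        int rf = 3;
        System.out.println("QUORUM for RF=3: " + quorum(rf));                 // 2
        System.out.println("2 acks, QUORUM ok? " + satisfied(2, quorum(rf))); // true
        System.out.println("2 acks, ALL ok?    " + satisfied(2, rf));         // false
    }
}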
11. Cassandra Architecture – CQL Language
Cassandra Query Language
• Very similar to RDBMS SQL syntax
• Create objects via DDL
• Core DML commands INSERT, UPDATE, and DELETE are supported
• Query data with SELECT
(A Java-driver example of these commands follows below.)
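As a hedged example of DDL and DML through CQL, the sketch below assumes the open-source DataStax Java driver (com.datastax.oss), a single Cassandra node reachable at 127.0.0.1:9042 in a datacenter named datacenter1, and made-up keyspace and table names; it is illustrative rather than a recommended setup.

import java.net.InetSocketAddress;
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.Row;

// Illustrative only: connection details and names (keyspace, table, datacenter) are assumptions.
public class CqlExample {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder()
                .addContactPoint(new InetSocketAddress("127.0.0.1", 9042))
                .withLocalDatacenter("datacenter1")
                .build()) {

            // DDL: create objects
            session.execute("CREATE KEYSPACE IF NOT EXISTS demo "
                    + "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}");
            session.execute("CREATE TABLE IF NOT EXISTS demo.users "
                    + "(user_id text PRIMARY KEY, name text, email text)");

            // Core DML: insert, update, delete
            session.execute("INSERT INTO demo.users (user_id, name, email) "
                    + "VALUES ('u1', 'Ada', 'ada@example.com')");
            session.execute("UPDATE demo.users SET email = 'ada@lovelace.dev' WHERE user_id = 'u1'");
            session.execute("DELETE FROM demo.users WHERE user_id = 'u2'");

            // Query data with SELECT
            Row row = session.execute("SELECT name, email FROM demo.users WHERE user_id = 'u1'").one();
            System.out.println(row.getString("name") + " " + row.getString("email"));
        }
    }
}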
12. Cassandra Architecture – Security
Cassandra Security Features
• Authentication based on internally controlled role names and passwords
• Authorization based on object permission management
• Authentication and authorization for JMX based on usernames and passwords
• SSL encryption
13. Why Cassandra?
• Scales linearly with massive write throughput
Cassandra handles very large volumes of data, and adding nodes grows capacity and write throughput roughly linearly, which is why companies with heavy write workloads, such as mobile and messaging providers, favor it.
• Highly Fault Tolerant
Masterless cluster with no single point of failure. In simple terms, your users will never know if a server, an entire rack of servers, or even an entire data center fails. There is also the potential for zero-downtime rolling upgrades.
• Easy Replication / Data Distribution
• Homogeneous Environment
No master-slave or sharding setup; all nodes in the ring are equal.
• Ease of Administration
Masterless, fault-tolerant, supports temporary loss of nodes with minimal impact on production performance.
• Wide Community
Large, active open-source community with extensive documentation, client drivers, and tooling.
14. Use Cases of Cassandra
• Messaging & Event Sourcing
Cassandra handles very large volumes of data, which makes it a popular choice for mobile and messaging providers, whose services generate huge amounts of message and event data.
• IoT & High-Speed Applications
Cassandra sustains very high write rates, making it a good fit for applications where data arrives at high speed from many devices or sensors.
• Product Catalogs and Retail Apps
Cassandra is used by many retailers for durable shopping cart protection and fast product catalog input and output.
• Social Media Analytics & Recommendations
Many online companies and social media providers use Cassandra for analytics and for serving recommendations to their customers.
15. Cassandra for Akka Persistence
• Linear scalability
Expected Massive Load
• No SPOF
Fault-tolerant, Resilient
• Always-On Multi-Data Center
Data Distribution & Replication
Cluster over Multi-Data Centers
• Akka Persistence
CQRS with Event Sourcing
Up-to-date plugin supported by Akka (Lightbend)
• Akka Streams
Batch Processing over Streaming
16. Cassandra Benchmarks
University of Toronto, NoSQL Database Performance Benchmarks, 2012
Benchmark charts: write latency (read/write workload), read latency (read/write workload), throughput (read/write workload), and throughput (read/scan/write workload).
20. Resources
• Apache Cassandra Web Site
• Planet Cassandra Community
• DataStax Web Site
• The Distributed Architecture Behind Apache Cassandra, Bruno Tinoco
• Introduction to Apache Cassandra's Architecture, Akhil Mehra
• An Overview of Apache Cassandra, DataStax
• NoSQL Performance Benchmarks, DataStax
• Top 10 Reasons to Use Cassandra, Michael Colby
• Security in Cassandra, IBM Developer Works
Editor's Notes
Each node processes the request individually. Every node first writes the mutation to the commit log and then writes the mutation to the memtable. Writing to the commit log ensures durability of the write as the memtable is an in-memory structure and is only written to disk when the memtable is flushed to disk. A memtable is flushed to disk when:
• It reaches its maximum allocated size in memory
• The maximum number of minutes a memtable may stay in memory elapses
• A user flushes it manually
A memtable is flushed to an immutable structure called an SSTable (Sorted String Table). The commit log is used for playback purposes in case data from the memtable is lost due to node failure, for example if the machine has a power outage before the memtable could get flushed. Every SSTable creates three files on disk: a bloom filter, a key index, and a data file. Over a period of time a number of SSTables are created, which means multiple SSTables may need to be read to satisfy a read request. Compaction is the process of combining SSTables so that related data can be found in a single SSTable; this makes reads much faster.
At the cluster level a read operation is similar to a write operation. As with the write path, the client can connect to any node in the cluster. The chosen node is called the coordinator and is responsible for returning the requested data. A row key must be supplied for every read operation. The coordinator uses the row key to determine the first replica, and the replication strategy in conjunction with the replication factor is used to determine all other applicable replicas. As with the write path, the consistency level determines the number of replicas that must respond before data is successfully returned. Let's assume that the request has a consistency level of QUORUM and a replication factor of three, thus requiring the coordinator to wait for successful replies from at least two nodes. If the contacted replicas have different versions of the data, the coordinator returns the latest version to the client and issues a read repair command to the node or nodes with the older version of the data. The read repair operation pushes the newer version of the data to nodes with the older version.
On a per-SSTable basis the operation becomes a bit more complicated. Every SSTable has an associated bloom filter which enables Cassandra to quickly ascertain whether data for the requested row key exists in the corresponding SSTable; this reduces IO when performing a row key lookup. A bloom filter is always held in memory, since its whole purpose is to save disk IO, and Cassandra also keeps a copy of the bloom filter on disk so that it can be recreated in memory quickly. Cassandra does not store the bloom filter on the Java heap; instead it makes a separate off-heap allocation for it. If the bloom filter returns a negative response, no data is read from that particular SSTable; this is the common case, as the compaction operation tries to group all data for a row key into as few SSTables as possible. If the bloom filter gives a positive response, the partition key cache is scanned to ascertain the compression offset for the requested row key, and the compressed data is then fetched from disk and the result set returned. If the partition key cache does not contain a corresponding entry, the partition summary is scanned; the partition summary is a subset of the partition index and helps determine the approximate location of the index entry in the partition index. The partition index is then scanned to locate the compression offset, which is used to find the appropriate data on disk.
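To illustrate the bloom-filter short-circuit described above (using Guava's BloomFilter purely as a stand-in for Cassandra's internal, off-heap implementation; the sizes and keys are made up):

import java.nio.charset.StandardCharsets;
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

// Illustrative only: a negative answer proves the key is absent,
// so the SSTable's data file is never touched.
public class BloomFilterSketch {
    public static void main(String[] args) {
        BloomFilter<String> sstableFilter = BloomFilter.create(
                Funnels.stringFunnel(StandardCharsets.UTF_8),
                1_000_000,   // expected row keys in this SSTable (assumed)
                0.01);       // acceptable false-positive rate (assumed)

        sstableFilter.put("user-42"); // populated when the SSTable is written

        if (sstableFilter.mightContain("user-99")) {
            System.out.println("maybe present: check partition key cache / index, then read the data file");
        } else {
            System.out.println("definitely absent: skip this SSTable, no disk IO");
        }
    }
}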
Authentication based on internally controlled role names/passwords
Cassandra authentication is role-based and stored internally in Cassandra system tables. Administrators can create, alter, drop, or list roles using CQL commands, with an associated password. Roles can be created with superuser, non-superuser, and login privileges. Internal authentication is used to access Cassandra keyspaces and tables, by cqlsh and DevCenter to authenticate connections to Cassandra clusters, and by sstableloader to load SSTables.
Authorization based on object permission management
Authorization grants access privileges to Cassandra cluster operations based on role authentication. Authorization can grant permission to access the entire database or restrict a role to individual table access. Roles can grant authorization to authorize other roles, and roles can be granted to roles. The CQL commands GRANT and REVOKE are used to manage authorization.
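A minimal sketch of these CQL role and permission commands, assuming internal authentication/authorization is enabled in cassandra.yaml, the default superuser credentials below are still valid, and the role, keyspace, and table names are invented:

import java.net.InetSocketAddress;
import com.datastax.oss.driver.api.core.CqlSession;

// Illustrative sketch only: assumes PasswordAuthenticator/CassandraAuthorizer are enabled
// and a superuser connection is available. Names and credentials are assumptions.
public class RoleManagementSketch {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder()
                .addContactPoint(new InetSocketAddress("127.0.0.1", 9042))
                .withLocalDatacenter("datacenter1")
                .withAuthCredentials("cassandra", "cassandra")
                .build()) {

            // Internal authentication: roles with passwords and a login flag
            session.execute("CREATE ROLE IF NOT EXISTS app_user "
                    + "WITH PASSWORD = 's3cret' AND LOGIN = true");

            // Object permission management: GRANT / REVOKE per keyspace or table
            session.execute("GRANT SELECT ON KEYSPACE demo TO app_user");
            session.execute("GRANT MODIFY ON TABLE demo.users TO app_user");
            session.execute("REVOKE MODIFY ON TABLE demo.users FROM app_user");
        }
    }
}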
Authentication and authorization based on JMX usernames/passwords
JMX (Java Management Extensions) technology provides a simple and standard way of managing and monitoring resources related to an instance of a Java Virtual Machine (JVM). This is achieved by instrumenting resources with Java objects known as Managed Beans (MBeans) that are registered with an MBean server. JMX authentication stores usernames and associated passwords in two files, one for passwords and one for access. JMX authentication is used by nodetool and external monitoring tools such as jconsole. In Cassandra 3.6 and later, JMX authentication and authorization can be accomplished using Cassandra's internal authentication and authorization capabilities.
SSL encryption
Cassandra provides secure communication between a client and a database cluster, and between nodes in a cluster. Enabling SSL encryption ensures that data in flight is not compromised and is transferred securely. Client-to-node and node-to-node encryption are configured independently. Cassandra tools (cqlsh, nodetool, DevCenter) can be configured to use SSL encryption, and the DataStax drivers can be configured to secure traffic between the driver and Cassandra.
General security measures
Typically, production Cassandra clusters have all non-essential firewall ports closed; only the ports the nodes need in order to communicate within the cluster remain open.
Goals for the Tests
• Select workloads that are typical of today's modern applications
• Use data volumes representative of "big data" datasets that exceed the RAM capacity of each node
• Ensure that all data was written durably (i.e., with no data loss), as most production environments require
Tested Workloads
The following workloads were included in the benchmark:
• Read-mostly, based on YCSB's workload B: 95% read to 5% update ratio
• Read/write combination, based on YCSB's workload A: 50% read to 50% update ratio
• Read-modify-write, based on YCSB workload F: 50% read to 50% read-modify-write
• Mixed operational and analytical: 60% read, 25% update, 10% insert, and 5% scan
• Insert-mostly combined with read: 90% insert to 10% read ratio