In the last few years, technology has seen a major shift away from the dominance of traditional RDBMS databases across different domains. The expeditious adoption of NoSQL databases, especially Cassandra, in the industry opens up many discussions on the major challenges faced during a Cassandra implementation and how to mitigate them. Many a time we conclude that a migration or POC (proof of concept) was not successful; however, the real flaw might be in the data modeling, or in identifying the right hardware configuration, database parameters, consistency level and so on. There is no one model or configuration that fits all use cases and all applications. Performance tuning an application is truly an art and requires perseverance. This paper delves into the performance tuning considerations and anti-patterns that need to be addressed during a Cassandra migration or implementation, to make sure we reap the benefits that make Cassandra a 'Visionary' in Gartner's 2014 Magic Quadrant for Operational Database Management Systems.
There are multiple relational databases available in the market, and most of them are well equipped to store huge amounts of structured data and provide easy access to read and manipulate that data.
All relational databases have good transaction control mechanisms and provide concurrency control.
Almost all relational databases have very good security management and different levels of access control.
Relational databases have evolved over a long period of time and implemented a lot of functionality based on user needs.
SQL, being common across relational databases, makes data access standard across them, though there can be minimal differences in syntax.
Key concepts of relational databases such as tables, joins, indexing and clustering remain almost the same across products.
Though relational databases have a lot of advantages, they are not perfect:
Support for clusters – With the increase in data volume, many organizations moved to clusters of small machines instead of big servers to support the data and traffic. Relational databases are not designed to run on clusters.
Cost – Most NoSQL databases are open source, so the cost incurred for a NoSQL implementation is low compared to the leading relational databases in the market.
Impedance mismatch – This is the difference between the structure of data in the code / in memory vs. the structure in the database. If a high-level language uses a complex data structure, e.g. a map or a list, we cannot port it directly to a relational database; we have to translate it into a format a relational database can understand, whereas most NoSQL databases provide complex data types natively.
Adaptability to newer workloads / data – The popularity of mobile and social media started pumping in huge amounts of data that do not gel well with relational databases. The increased need for real-time analytics is another major workload where NoSQL is preferred.
Gartner's Magic Quadrant clearly indicates that Cassandra is one of the prominent choices for organizations migrating from RDBMS to NoSQL. The report categorizes Cassandra as a 'Visionary' and calls out the major differentiating factors that helped its customers.
At the same time, one of the drawbacks mentioned in the report is that 53% of the respondents who evaluated DataStax did not select it due to poor performance during POC testing. It is very surprising that 'Strengths' lists 'High performance' while 'Cautions' lists 'Poor performance'.
There are two major pitfalls.
Most POCs are done quick and dirty, with not enough time spent on capacity planning and performance tuning. These play a crucial role in determining whether a POC is successful or not.
In some cases, organizations move to production after functionality testing and very minimal performance testing, assuming that things will behave the same way in production. Because of the scale and volume in production, the solution may not work. Enough effort needs to be spent on tuning the application with comparable or equal volume before it is implemented in production.
Before concluding that a POC is unsuccessful, we need to make sure that all possible tuning techniques have been tried. Tuning can be done in different areas such as data modeling, integration and database parameters.
Tuning of an application takes place in various stages. It starts with data modeling and sizing activities and continues through the coding and integration steps. Many tuning techniques also need to be tried out during performance testing by adjusting several database parameters.
During data modeling, we unfortunately have to unlearn some of the basics from the RDBMS world. Unlike RDBMS modeling, Cassandra data modeling is heavily dependent on the access query patterns. Data redundancy is very much acceptable to cater to different read use cases. De-normalize the data wherever needed, since there are no joins; client-side joins and round trips are very costly and should be avoided. Adding a secondary index to improve read performance is very common in the RDBMS world, but it does not work the same way in Cassandra because of its distributed nature. Adding too many secondary indexes and using them in queries can harm query performance considerably.
Selecting the right partitioning key is crucial. We need to know the data well to make sure it is uniformly distributed across partitions, without which we cannot expect the best performance. Unlike RDBMS systems, once the primary key (partitioning and clustering columns) is decided, we have little control over the range of data each partition stores, since it is completely dependent on the hash generated from the partition key.
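The effect of key cardinality on distribution can be sketched as follows. This is a simplified simulation, not the real partitioner: MD5 stands in for Cassandra's Murmur3 hash, and the key names are hypothetical.

```python
import hashlib
from collections import Counter

def node_for(key: str, num_nodes: int) -> int:
    """Emulate hash partitioning: a hash of the partition key (MD5 here,
    standing in for Murmur3) determines which node owns the row."""
    token = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return token % num_nodes

# A high-cardinality key (e.g. a user id) spreads rows evenly...
even = Counter(node_for(f"user-{i}", 5) for i in range(10_000))

# ...while a low-cardinality, skewed key (e.g. country) piles rows
# onto at most as many nodes as there are distinct values.
skewed = Counter(node_for(c, 5) for c in ["US"] * 9_000 + ["DE"] * 1_000)

print(sorted(even.values()))    # five counts, each close to 2000
print(sorted(skewed.values()))  # one or two nodes hold everything
```

The second counter shows why a uniform hash alone cannot save a skewed key choice: every row with the same partition key lands on the same replicas.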
Use the 'WITH CLUSTERING ORDER BY' clause while creating tables to order the data based on the read pattern. For time-series data this is especially beneficial and provides very fast retrievals, since it makes the data physically stored according to the retrieval pattern.
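A minimal sketch of the idea, using a hypothetical `activity` table: with a descending clustering order, "the most recent N events" of a partition becomes a contiguous prefix read rather than a scan plus sort.

```python
# Hypothetical time-series table. Within a partition, Cassandra keeps
# rows physically sorted by the clustering column, so the most recent
# events sit at the front when the order is DESC.
ddl = """
CREATE TABLE activity (
    account_id text,
    event_time int,
    detail     text,
    PRIMARY KEY ((account_id), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);
"""

# Simulate the on-disk layout of one partition:
events = [(ts, f"event-{ts}") for ts in (3, 1, 4, 2, 5)]
partition = sorted(events, key=lambda row: row[0], reverse=True)

latest_two = partition[:2]   # a sequential prefix read, no re-sorting
print(latest_two)            # [(5, 'event-5'), (4, 'event-4')]
```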
Anyone from an RDBMS background would assume that batching queries will improve performance. In Cassandra, the contrary is true: batching can really harm performance, since it overloads the coordinator node with all the requests.
In the given example, when a BATCH insert is done, all three queries in the batch hit the same coordinator node. Once the coordinator identifies the nodes corresponding to each partition key, the queries are redirected accordingly. When the inserts are done individually, based on the load balancing policy specified, the requests are sent to multiple nodes without overloading a single node.
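The coordinator-load difference can be illustrated with a toy simulation (three hypothetical nodes, nine multi-partition inserts; a real driver would pick coordinators via its load balancing policy):

```python
from collections import Counter
from itertools import cycle

nodes = ["node1", "node2", "node3"]
inserts = [f"INSERT ... partition {i}" for i in range(9)]

# A BATCH: every statement lands on one coordinator, which must then
# fan all the writes out to the owning replicas itself.
batch_load = Counter({"node1": len(inserts)})  # node1 picked as coordinator

# Individual executes with a round-robin policy: each statement picks
# the next coordinator, spreading the coordination work.
rr = cycle(nodes)
individual_load = Counter(next(rr) for _ in inserts)

print(batch_load)       # Counter({'node1': 9})
print(individual_load)  # Counter({'node1': 3, 'node2': 3, 'node3': 3})
```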
One of the main advantages of Cassandra is tunable consistency. Depending on the type of workload and the need for consistency, we can adjust the consistency levels to get optimal results. The rule of thumb is to reduce the work done by the coordinator node for the most probable use case.
In the case of a read-heavy application, we obviously want all reads to happen very fast and with the least overhead on the coordinator node. Going with a read consistency of ONE quickly sends back the response, and the overhead for the coordinator is very low. But to have immediately consistent data, we need to make sure that the read and write replica sets overlap in at least one node. So in this use case we have to go with a write consistency of ALL, to make sure that data is updated in all replicas before the write is acknowledged. During writes, the coordinator node then has the overhead of ensuring that data is updated in all replicas.
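The overlap requirement is the classic R + W > RF rule, which a small helper makes concrete (a sketch; only ONE, QUORUM and ALL are modeled here):

```python
# Replicas contacted for each consistency level, given replication factor rf.
CL = {"ONE": lambda rf: 1,
      "QUORUM": lambda rf: rf // 2 + 1,
      "ALL": lambda rf: rf}

def immediately_consistent(read_cl: str, write_cl: str, rf: int) -> bool:
    """A read is guaranteed to see the latest write when the read and
    write replica sets must overlap in at least one node: R + W > RF."""
    return CL[read_cl](rf) + CL[write_cl](rf) > rf

# The read-heavy pattern from the text: cheap reads, expensive writes.
print(immediately_consistent("ONE", "ALL", rf=3))       # True  (1 + 3 > 3)
# A common symmetric middle ground:
print(immediately_consistent("QUORUM", "QUORUM", rf=3)) # True  (2 + 2 > 3)
print(immediately_consistent("ONE", "ONE", rf=3))       # False (1 + 1 <= 3)
```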
A load balancing policy determines which node runs a query. Since a client can read from or write to any node, this can sometimes be inefficient: if a node receives a read or write owned by another node, it has to coordinate that request for the client. We can use a load balancing policy to control this. The TokenAwarePolicy ensures that a request goes to the node or replica responsible for the data indicated by the partition key. It is wrapped around DCAwareRoundRobinPolicy to make sure the requests stay in the local datacenter. This is a good choice for use cases with multiple datacenters and requests coming from different geographical locations. When the user requests are not coming from distributed locations, analyze in detail the impact of using DCAware.
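The combined policy can be sketched in a few lines. This is a conceptual stand-in, not the driver's implementation: the node names and datacenters are hypothetical, and MD5 again substitutes for the real partitioner.

```python
import hashlib

class Node:
    def __init__(self, name: str, dc: str):
        self.name, self.dc = name, dc

nodes = [Node("a1", "dc1"), Node("a2", "dc1"), Node("b1", "dc2")]

def pick_coordinator(partition_key: str, local_dc: str) -> Node:
    """Sketch of TokenAwarePolicy wrapping DCAwareRoundRobinPolicy:
    first restrict candidates to the local datacenter (DC-aware), then
    pick the node owning the key's token (token-aware), so the
    coordinator is also a replica and no extra hop is needed."""
    local = [n for n in nodes if n.dc == local_dc]
    token = int(hashlib.md5(partition_key.encode()).hexdigest(), 16)
    return local[token % len(local)]

owner = pick_coordinator("customer-42", "dc1")
print(owner.name, owner.dc)   # always a dc1 node, chosen by token
```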
Querying data that has tombstoned columns can bring down performance.
Be extra cautious when dealing with applications that have a high volume of delete use cases.
Do not insert NULL into Cassandra; this sets a tombstone. Instead, use dynamically built queries that omit the null columns.
Consider partitioning data with a heavy churn rate into separate rows and deleting the entire rows when you no longer need them. Alternatively, partition it into separate tables and truncate them when they aren't needed anymore.
Group the columns that require deletion with a similar expiry date (TTL) and set the same gc_grace_seconds.
It is possible to improve on this hypothetical queue scenario. Specifically, when a consumer knows what the last entry was, it can specify the start column and thus somewhat mitigate the effect of tombstones, by neither having to start scanning at the beginning of the row nor having to collect and keep all the irrelevant tombstones in memory.
TTL and gc_grace_seconds go hand in hand.
Even after data is deleted (a tombstone is set), it still occupies space until gc_grace_seconds has passed.
This has a direct impact on storage and performance.
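The timing can be expressed as a one-line predicate (a simplification: in practice the space is only freed by a compaction that runs after the grace period, not at the instant it elapses):

```python
def space_reclaimable(deleted_at: int, gc_grace_seconds: int, now: int) -> bool:
    """A tombstone, and the data it shadows, may only be purged by a
    compaction running after gc_grace_seconds have elapsed; until then
    the 'deleted' data still occupies disk and is scanned by reads."""
    return now >= deleted_at + gc_grace_seconds

GC_GRACE = 864_000  # the 10-day default, in seconds
print(space_reclaimable(0, GC_GRACE, now=500_000))    # False: still on disk
print(space_reclaimable(0, GC_GRACE, now=1_000_000))  # True: eligible to purge
```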
Know your data well before you decide on compaction
Cassandra 2.1 improves read performance after compaction by performing an incremental replacement of compacted SSTables. Instead of waiting for the entire compaction to finish and then throwing away the old SSTable (and cache), Cassandra can read data directly from the new SSTable even before it finishes writing.
As data is written to the new SSTable and reads are directed to it, the corresponding data in the old SSTables is no longer accessed and is evicted from the page cache. Thus begins an incremental process of caching the new SSTable, while directing reads away from the old one. The dramatic cache miss is gone. Cassandra provides predictable high performance even under heavy load.
Increase the memtable size, or prevent premature flushing:
Less frequent memtable flushes result in fewer SSTable files and less compaction.
Fewer compactions reduce SSTable I/O contention and therefore improve read operations.
Bigger memtables absorb more overwrites for updates to the same keys, and therefore accommodate more read/write operations between flushes.
Size-tiered compaction – A good fit for WORM use cases; a row may be spread across many SSTables (with no guarantee on the number), which impacts reads, and it can need up to 100% free disk space during compaction. Cassandra 2.0.2 laid the foundation to allow Cassandra to allocate its resources more intelligently by tracking SSTable read rates.
Compact the Hottest SSTables First
The first optimization is to prioritize compaction of the hottest SSTables. That is, if there are multiple sets of SSTables that can be compacted next, the set with the highest collective reads/sec per key will be compacted first. Ideally, this will help to more quickly merge partition fragments that are read frequently.
Avoid Compacting Cold SSTables
The second optimization tries to avoid compacting cold SSTables at all. A new compaction strategy option, cold_reads_to_omit, was added to STCS and may be set per table. The value should be a float between 0.0 and 1.0 representing the maximum percentage of reads/sec that the ignored sstables may account for. In other words, as many cold sstables as possible will be ignored during compaction while retaining at least 1 - cold_reads_to_omit of the total reads/sec for the table.
An example may clarify this. Suppose cold_reads_to_omit is set to 0.1 and we have four equally sized SSTables with the following read rates: SSTable A has 100 reads/sec, B has 5 reads/sec, C has 4 reads/sec, and D has 3 reads/sec. In total, the SSTables have 112 reads/sec. With cold_reads_to_omit set to 0.1, we can ignore the coldest SSTables as long as they collectively have less than 11.2 reads/sec. This means that C and D, with only 7 reads/sec total, can be ignored for compaction purposes, while A and B are still candidates.
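The selection logic described above can be written down directly (a sketch of the described behavior, not Cassandra's actual code): walk the SSTables from coldest to hottest, omitting tables while the omitted total stays within the read-rate budget.

```python
def sstables_to_compact(read_rates: dict, cold_reads_to_omit: float):
    """Split SSTables into compaction candidates and ignored ones.
    Coldest tables are omitted as long as their combined reads/sec
    stay under cold_reads_to_omit of the table's total reads/sec."""
    total = sum(read_rates.values())
    budget = cold_reads_to_omit * total
    omitted, kept, omitted_rate = set(), set(), 0.0
    for name, rate in sorted(read_rates.items(), key=lambda kv: kv[1]):
        if omitted_rate + rate <= budget:
            omitted.add(name)
            omitted_rate += rate
        else:
            kept.add(name)
    return kept, omitted

# The example from the text: budget = 0.1 * 112 = 11.2 reads/sec.
kept, omitted = sstables_to_compact({"A": 100, "B": 5, "C": 4, "D": 3}, 0.1)
print(sorted(kept), sorted(omitted))   # ['A', 'B'] ['C', 'D']
```

C and D together account for 7 reads/sec, under the 11.2 budget, so they are ignored; adding B would push the omitted total to 12, so B stays a candidate.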
Starting in Cassandra 2.1, this feature will be enabled by default with a cold_reads_to_omit value of 0.05. The option is also available in 2.0.3 and later, but is disabled by default. To enable it, use the following cqlsh statement:
ALTER TABLE mykeyspace.mytable WITH compaction = {'class': 'SizeTieredCompactionStrategy', 'cold_reads_to_omit': 0.05};
When tuning this option, consider what percentage of your reads hit “cold” data and set the value slightly below that. You may want to keep an eye on the “SSTables” column of nodetool cfhistograms to see if too many reads are spanning a large number of SSTables. If so, consider lowering the value. A value of 0.0 will result in all SSTables being candidates for compaction.
Leveled compaction – More beneficial when rows get frequently updated or deleted. It spends more I/O to guarantee a bound on the number of SSTables that hold a row.
Every SSTable is created when a fixed (relatively small) size limit is reached. By default L0 gets 5 MB files, and each subsequent level is 10x the size (in L1 you'll have 50 MB of data, in L2 500 MB, and so on). SSTables within a level are created with the guarantee that they don't overlap. When a level fills up, a compaction is triggered and SSTables from level L are promoted to level L+1. So in L1 you'll have 50 MB in ~10 files, in L2 500 MB in ~100 files, etc.
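The level sizing above follows a simple geometric rule, which can be sketched as (using the 5 MB SSTable size and 10x fanout from the text; real deployments tune `sstable_size_in_mb`):

```python
def level_capacity_mb(level: int, sstable_mb: int = 5, fanout: int = 10) -> int:
    """Target capacity of a level in leveled compaction: each level
    holds fanout times the data of the previous one, built from
    fixed-size SSTables."""
    if level == 0:
        return sstable_mb          # L0: freshly flushed files
    return sstable_mb * fanout ** level

# L1 ~ 50 MB (~10 files), L2 ~ 500 MB (~100 files), L3 ~ 5 GB, ...
print([level_capacity_mb(l) for l in (1, 2, 3)])   # [50, 500, 5000]
```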
DateTieredCompactionStrategy is a compaction strategy written specifically for time-series-like data, where data is mostly appended to existing partitions. The problem with leveled and size-tiered compaction is that they don't care about when the data was written, meaning they mix new and old data, while one characteristic of time-series workloads is that you mostly want to read the most recent data. This forces Cassandra to read from many SSTables. Leveled compaction can give great read performance, but with a big write amplification cost, since data needs to be recompacted a lot. DTCS improves on this by only compacting together SSTables that contain data with timestamps close to each other, meaning that for a query requesting the most recent data we can greatly reduce the number of SSTables touched during a read. It also limits write amplification by having an option to stop compacting data that is old and rarely read.
In the read path of Cassandra, there are many in-memory structures as well as disk components involved. The in-memory components such as the row cache, bloom filters, key cache and partition summary play a pivotal role in making Cassandra reads faster.
Row cache: If the row cache is enabled, the complete row is cached in memory. On a row cache hit, Cassandra immediately returns the data; this is the quickest response Cassandra can provide. The option is not commonly used because of its heavy memory utilization: the complete partition used to be placed in memory. From 2.1 onwards, we have the flexibility to specify how many rows from each partition are stored in memory. With this change, many retrieval use cases, especially those with a definite retrieval pattern, will benefit. Patterns such as reading the most recent customer interactions or the last 10 account activities can be served much faster. Another use case where the row cache is effective is when the data in the table is small and accessed very frequently; a typical example is reference data models, where the volume of data is very low but retrievals are much more frequent.
Key cache: The key cache is enabled by default. If the row cache is not enabled, or there is no hit for the key the query is looking for, Cassandra checks the bloom filters to eliminate the locations where it would otherwise have to search for that key. If the partition key is present in the key cache, Cassandra gets from there the location of the data in specific SSTables and performs a seek directly. If there are archive tables where reads happen once in a blue moon, those are good candidates for turning off all caching and saving some memory.
If the application has some data that is frequently read and some that is sparsely read, splitting them into two different column families, enabling the row cache for the frequently read one and tuning the key cache for the sparsely read one, will give optimal read performance without spending too much memory.
The tuning process has to be iterative especially when it comes to caching. Several configurations need to be tested to identify the right fit parameter for that use case / application.
The database features and drivers are evolving very fast.
New features requested by users are implemented quickly.
Users face compatibility and learning challenges with such drastic evolution.
In the traditional RDBMS world, there used to be a very clear demarcation between the activities done by DBAs, developers and system admins. With the NoSQL invasion, the distinction between the three roles has blurred considerably: the same person gets involved in modeling, integrating and tuning the database.
To quote an example, hardly any RDBMS developer thinks about how the buffer pool hit ratio can be increased or how logs can be written effectively. In the NoSQL world, the same person gets involved in deciding what sort of caching a column family should have and whether the commit log and data directories point to the same location.