In the last few years, technology has seen a major shift away from the dominance of traditional RDBMS databases across different domains. The expeditious adoption of NoSQL databases, especially Cassandra, in the industry opens up many discussions on the major challenges faced during a Cassandra implementation and how to mitigate them. Many a time we conclude that a migration or POC (proof of concept) was not successful; however, the real flaw might be in the data modeling, or in identifying the right hardware configuration, database parameters, consistency level and so on. There is no one model or configuration that fits all use cases and all applications. Performance tuning an application is truly an art and requires perseverance. This paper delves into the performance tuning considerations and anti-patterns that need to be addressed during a Cassandra migration or implementation, to make sure we reap the benefits that make Cassandra a 'Visionary' in Gartner's 2014 Magic Quadrant for Operational Database Management Systems.
There are multiple relational databases available in the market, and most of them are well equipped to store huge amounts of structured data and provide easy access to read and manipulate that data.
All relational databases have good transaction control mechanisms and provide concurrency control.
Almost all relational databases have very good security management and different levels of access control.
Relational databases have evolved over a long period of time and implemented a lot of functionality based on user needs.
SQL, being common across relational databases, makes data access standard across them, though there can be minimal differences in syntax.
Key concepts of relational databases such as tables, joins, indexing and clustering remain almost the same across products.
Though relational databases have a lot of advantages, they are not perfect:
Support for clusters – With the increase in data volume, many organizations moved to clusters of small machines instead of big servers to support the data and traffic. Relational databases are not designed to run on clusters.
Cost – Most NoSQL databases are open source, so the cost incurred for a NoSQL implementation is low compared to the leading relational databases in the market.
Impedance mismatch – This is the difference between the structure of data in the code / in memory vs. the structure in the database. If a high-level language uses a complex data structure, e.g. a map or a list, we cannot port it directly to a relational database; we have to translate it into a format a relational database can understand, whereas most NoSQL databases provide complex data types natively.
Adaptability to newer workloads / data – The popularity of mobile and social media started pumping in huge amounts of data that do not gel well with relational databases. The increased need for real-time analytics is another major workload where NoSQL is preferred.
Gartner's Magic Quadrant clearly indicates that Cassandra is one of the prominent choices for organizations migrating from RDBMS to NoSQL. The report categorizes Cassandra as a 'Visionary' and calls out the major differentiating factors that helped its customers.
At the same time, one of the drawbacks mentioned in the report is that 53% of the respondents who evaluated DataStax did not select it due to poor performance during POC testing. It is very surprising that 'Strengths' lists 'High performance' while 'Cautions' lists 'Poor performance'.
There are two major pitfalls.
Most POCs are done quick and dirty, with not enough time spent on capacity planning and performance tuning. These play a crucial role in determining whether a POC is successful or not.
In some cases, organizations move to production after functionality testing and very minimal performance testing, assuming that things will behave the same way in production. Because of the scale and volume in production, the solution may not work. Enough effort needs to be spent on tuning the application with comparable or equal volume before it is implemented in production.
Before concluding that a POC is unsuccessful, we need to make sure that all possible tuning techniques have been tried. Tuning can be done in different areas such as data modeling, integration and database parameters.
Tuning of an application takes place in various stages. It starts with data modeling and sizing activities and continues through the coding and integration steps. Many tuning techniques also need to be tried out during performance testing by adjusting several database parameters.
During data modeling, we unfortunately have to unlearn some of the basics from the RDBMS world. Unlike RDBMS modeling, Cassandra data modeling is heavily dependent on the access query patterns. Data redundancy is very much acceptable to cater to different read use cases. De-normalize the data wherever needed, since there are no joins; client-side joins and round trips are very costly and should be avoided. Adding a secondary index to improve read performance is very common in the RDBMS world, but it does not work the same way in Cassandra because of its distributed nature. Adding too many secondary indexes and using them in queries can harm query performance considerably.
Selecting the right partitioning key is crucial. We need to know the data well to make sure it is uniformly distributed across partitions, without which we cannot expect the best performance. Unlike RDBMS systems, once the primary key (partitioning and clustering columns) is decided, we have little control over the range of data each partition stores, since it is completely dependent on the hash generated from the partition key.
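The effect of key cardinality on distribution can be sketched as follows. This is a simplified simulation, not the real partitioner: MD5 stands in for Cassandra's Murmur3 hash, and the key names are hypothetical.

```python
import hashlib
from collections import Counter

def node_for(key: str, num_nodes: int) -> int:
    """Emulate hash partitioning: a hash of the partition key (MD5 here,
    standing in for Murmur3) determines which node owns the row."""
    token = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return token % num_nodes

# A high-cardinality key (e.g. a user id) spreads rows evenly...
even = Counter(node_for(f"user-{i}", 5) for i in range(10_000))

# ...while a low-cardinality, skewed key (e.g. country) piles rows
# onto at most as many nodes as there are distinct values.
skewed = Counter(node_for(c, 5) for c in ["US"] * 9_000 + ["DE"] * 1_000)

print(sorted(even.values()))    # five counts, each close to 2000
print(sorted(skewed.values()))  # one or two nodes hold everything
```

The second counter shows why a uniform hash alone cannot save a skewed key choice: every row with the same partition key lands on the same replicas.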
Use the 'WITH CLUSTERING ORDER BY' clause while creating tables to order the data based on the read pattern. For time-series data this is especially beneficial and provides very fast retrievals, since it makes the data physically stored according to the retrieval pattern.
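A minimal sketch of the idea, using a hypothetical `activity` table: with a descending clustering order, "the most recent N events" of a partition becomes a contiguous prefix read rather than a scan plus sort.

```python
# Hypothetical time-series table. Within a partition, Cassandra keeps
# rows physically sorted by the clustering column, so the most recent
# events sit at the front when the order is DESC.
ddl = """
CREATE TABLE activity (
    account_id text,
    event_time int,
    detail     text,
    PRIMARY KEY ((account_id), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);
"""

# Simulate the on-disk layout of one partition:
events = [(ts, f"event-{ts}") for ts in (3, 1, 4, 2, 5)]
partition = sorted(events, key=lambda row: row[0], reverse=True)

latest_two = partition[:2]   # a sequential prefix read, no re-sorting
print(latest_two)            # [(5, 'event-5'), (4, 'event-4')]
```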
Anyone from an RDBMS background would assume that batching queries will improve performance. In Cassandra, the contrary is true: batching can really harm performance, since it overloads the coordinator node with all the requests.
In the given example, when a BATCH insert is done, all three queries in the batch hit the same coordinator node. Once the coordinator identifies the nodes corresponding to each partition key, the queries are redirected accordingly. When the inserts are done individually, based on the load balancing policy specified, the requests are sent to multiple nodes without overloading a single node.
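The coordinator-load difference can be illustrated with a toy simulation (three hypothetical nodes, nine multi-partition inserts; a real driver would pick coordinators via its load balancing policy):

```python
from collections import Counter
from itertools import cycle

nodes = ["node1", "node2", "node3"]
inserts = [f"INSERT ... partition {i}" for i in range(9)]

# A BATCH: every statement lands on one coordinator, which must then
# fan all the writes out to the owning replicas itself.
batch_load = Counter({"node1": len(inserts)})  # node1 picked as coordinator

# Individual executes with a round-robin policy: each statement picks
# the next coordinator, spreading the coordination work.
rr = cycle(nodes)
individual_load = Counter(next(rr) for _ in inserts)

print(batch_load)       # Counter({'node1': 9})
print(individual_load)  # Counter({'node1': 3, 'node2': 3, 'node3': 3})
```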
One of the main advantages of Cassandra is tunable consistency. Depending on the type of workload and the need for consistency, we can adjust the consistency levels to get optimal results. The rule of thumb is to reduce the work done by the coordinator node for the most probable use case.
In the case of a read-heavy application, we obviously want all reads to happen very fast and with the least overhead on the coordinator node. Going with a read consistency of ONE quickly sends back the response, and the overhead for the coordinator is very low. But to have immediately consistent data, we need to make sure that the read and write replica sets overlap in at least one node. So in this use case we have to go with a write consistency of ALL, to make sure that data is updated in all replicas before the write is acknowledged. During writes, the coordinator node then has the overhead of ensuring that data is updated in all replicas.
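The overlap requirement is the classic R + W > RF rule, which a small helper makes concrete (a sketch; only ONE, QUORUM and ALL are modeled here):

```python
# Replicas contacted for each consistency level, given replication factor rf.
CL = {"ONE": lambda rf: 1,
      "QUORUM": lambda rf: rf // 2 + 1,
      "ALL": lambda rf: rf}

def immediately_consistent(read_cl: str, write_cl: str, rf: int) -> bool:
    """A read is guaranteed to see the latest write when the read and
    write replica sets must overlap in at least one node: R + W > RF."""
    return CL[read_cl](rf) + CL[write_cl](rf) > rf

# The read-heavy pattern from the text: cheap reads, expensive writes.
print(immediately_consistent("ONE", "ALL", rf=3))       # True  (1 + 3 > 3)
# A common symmetric middle ground:
print(immediately_consistent("QUORUM", "QUORUM", rf=3)) # True  (2 + 2 > 3)
print(immediately_consistent("ONE", "ONE", rf=3))       # False (1 + 1 <= 3)
```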
A load balancing policy determines which node runs a query. Since a client can read from or write to any node, this can sometimes be inefficient: if a node receives a read or write owned by another node, it has to coordinate that request for the client. We can use a load balancing policy to control this. The TokenAwarePolicy ensures that a request goes to the node or replica responsible for the data indicated by the partition key. It is wrapped around DCAwareRoundRobinPolicy to make sure the requests stay in the local datacenter. This is a good choice for use cases with multiple datacenters and requests coming from different geographical locations. When the user requests are not coming from distributed locations, analyze in detail the impact of using DCAware.
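The combined policy can be sketched in a few lines. This is a conceptual stand-in, not the driver's implementation: the node names and datacenters are hypothetical, and MD5 again substitutes for the real partitioner.

```python
import hashlib

class Node:
    def __init__(self, name: str, dc: str):
        self.name, self.dc = name, dc

nodes = [Node("a1", "dc1"), Node("a2", "dc1"), Node("b1", "dc2")]

def pick_coordinator(partition_key: str, local_dc: str) -> Node:
    """Sketch of TokenAwarePolicy wrapping DCAwareRoundRobinPolicy:
    first restrict candidates to the local datacenter (DC-aware), then
    pick the node owning the key's token (token-aware), so the
    coordinator is also a replica and no extra hop is needed."""
    local = [n for n in nodes if n.dc == local_dc]
    token = int(hashlib.md5(partition_key.encode()).hexdigest(), 16)
    return local[token % len(local)]

owner = pick_coordinator("customer-42", "dc1")
print(owner.name, owner.dc)   # always a dc1 node, chosen by token
```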
Querying data that has tombstoned columns can bring down performance.
Be extra cautious when dealing with applications that have a high volume of delete use cases.
Do not insert NULL into Cassandra; this sets a tombstone. Instead, use dynamically built queries that omit the null columns.
Consider partitioning data with a heavy churn rate into separate rows and deleting the entire rows when you no longer need them. Alternatively, partition it into separate tables and truncate them when they aren't needed anymore.
Group the columns that require deletion with a similar expiry date (TTL) and set the same gc_grace_seconds.
It is possible to improve on this hypothetical queue scenario. Specifically, when a consumer knows what the last entry was, it can specify the start column and thus somewhat mitigate the effect of tombstones, by neither having to start scanning at the beginning of the row nor having to collect and keep all the irrelevant tombstones in memory.
TTL and gc_grace_seconds go hand in hand.
Even after data is deleted (a tombstone is set), it still occupies space until gc_grace_seconds has passed.
This has a direct impact on storage and performance.
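The timing can be expressed as a one-line predicate (a simplification: in practice the space is only freed by a compaction that runs after the grace period, not at the instant it elapses):

```python
def space_reclaimable(deleted_at: int, gc_grace_seconds: int, now: int) -> bool:
    """A tombstone, and the data it shadows, may only be purged by a
    compaction running after gc_grace_seconds have elapsed; until then
    the 'deleted' data still occupies disk and is scanned by reads."""
    return now >= deleted_at + gc_grace_seconds

GC_GRACE = 864_000  # the 10-day default, in seconds
print(space_reclaimable(0, GC_GRACE, now=500_000))    # False: still on disk
print(space_reclaimable(0, GC_GRACE, now=1_000_000))  # True: eligible to purge
```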
Know your data well before you decide on compaction
Cassandra 2.1 improves read performance after compaction by performing an incremental replacement of compacted SSTables. Instead of waiting for the entire compaction to finish and then throwing away the old SSTable (and cache), Cassandra can read data directly from the new SSTable even before it finishes writing.
As data is written to the new SSTable and reads are directed to it, the corresponding data in the old SSTables is no longer accessed and is evicted from the page cache. Thus begins an incremental process of caching the new SSTable, while directing reads away from the old one. The dramatic cache miss is gone. Cassandra provides predictable high performance even under heavy load.
Increase the memtable size, or prevent premature flushing:
Less frequent memtable flushes result in fewer SSTable files and less compaction.
Fewer compactions reduce SSTable I/O contention and therefore improve read operations.
Bigger memtables absorb more overwrites for updates to the same keys, and therefore accommodate more read/write operations between flushes.
Size-tiered compaction – A good fit for WORM use cases; a row may be spread across many SSTables (with no guarantee on the number), which impacts reads, and it can need up to 100% free disk space during compaction. Cassandra 2.0.2 laid the foundation to allow Cassandra to allocate its resources more intelligently by tracking SSTable read rates.
Compact the Hottest SSTables First
The first optimization is to prioritize compaction of the hottest SSTables. That is, if there are multiple sets of SSTables that can be compacted next, the set with the highest collective reads/sec per key will be compacted first. Ideally, this will help to more quickly merge partition fragments that are read frequently.
Avoid Compacting Cold SSTables
The second optimization tries to avoid compacting cold SSTables at all. A new compaction strategy option, cold_reads_to_omit, was added to STCS and may be set per table. The value should be a float between 0.0 and 1.0 representing the maximum percentage of reads/sec that the ignored sstables may account for. In other words, as many cold sstables as possible will be ignored during compaction while retaining at least 1 - cold_reads_to_omit of the total reads/sec for the table.
An example may clarify this. Suppose cold_reads_to_omit is set to 0.1 and we have four equally sized SSTables with the following read rates: SSTable A has 100 reads/sec, B has 5 reads/sec, C has 4 reads/sec, and D has 3 reads/sec. In total, the SSTables have 112 reads/sec. With cold_reads_to_omit set to 0.1, we can ignore the coldest SSTables as long as they collectively have less than 11.2 reads/sec. This means that C and D, with only 7 reads/sec total, can be ignored for compaction purposes, while A and B are still candidates.
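The selection logic described above can be written down directly (a sketch of the described behavior, not Cassandra's actual code): walk the SSTables from coldest to hottest, omitting tables while the omitted total stays within the read-rate budget.

```python
def sstables_to_compact(read_rates: dict, cold_reads_to_omit: float):
    """Split SSTables into compaction candidates and ignored ones.
    Coldest tables are omitted as long as their combined reads/sec
    stay under cold_reads_to_omit of the table's total reads/sec."""
    total = sum(read_rates.values())
    budget = cold_reads_to_omit * total
    omitted, kept, omitted_rate = set(), set(), 0.0
    for name, rate in sorted(read_rates.items(), key=lambda kv: kv[1]):
        if omitted_rate + rate <= budget:
            omitted.add(name)
            omitted_rate += rate
        else:
            kept.add(name)
    return kept, omitted

# The example from the text: budget = 0.1 * 112 = 11.2 reads/sec.
kept, omitted = sstables_to_compact({"A": 100, "B": 5, "C": 4, "D": 3}, 0.1)
print(sorted(kept), sorted(omitted))   # ['A', 'B'] ['C', 'D']
```

C and D together account for 7 reads/sec, under the 11.2 budget, so they are ignored; adding B would push the omitted total to 12, so B stays a candidate.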
Starting in Cassandra 2.1, this feature will be enabled by default with a cold_reads_to_omit value of 0.05. The option is also available in 2.0.3 and later, but is disabled by default. To enable it, use the following cqlsh statement:
ALTER TABLE mykeyspace.mytable WITH compaction = {'class': 'SizeTieredCompactionStrategy', 'cold_reads_to_omit': 0.05};
When tuning this option, consider what percentage of your reads hit “cold” data and set the value slightly below that. You may want to keep an eye on the “SSTables” column of nodetool cfhistograms to see if too many reads are spanning a large number of SSTables. If so, consider lowering the value. A value of 0.0 will result in all SSTables being candidates for compaction.
Leveled compaction – More beneficial when rows get frequently updated or deleted. It spends more I/O to guarantee a bound on the number of SSTables that hold a row.
Every SSTable is created when a fixed (relatively small) size limit is reached. By default L0 gets 5 MB files, and each subsequent level is 10x the size (in L1 you'll have 50 MB of data, in L2 500 MB, and so on). SSTables within a level are created with the guarantee that they don't overlap. When a level fills up, a compaction is triggered and SSTables from level L are promoted to level L+1. So in L1 you'll have 50 MB in ~10 files, in L2 500 MB in ~100 files, etc.
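The level sizing above follows a simple geometric rule, which can be sketched as (using the 5 MB SSTable size and 10x fanout from the text; real deployments tune `sstable_size_in_mb`):

```python
def level_capacity_mb(level: int, sstable_mb: int = 5, fanout: int = 10) -> int:
    """Target capacity of a level in leveled compaction: each level
    holds fanout times the data of the previous one, built from
    fixed-size SSTables."""
    if level == 0:
        return sstable_mb          # L0: freshly flushed files
    return sstable_mb * fanout ** level

# L1 ~ 50 MB (~10 files), L2 ~ 500 MB (~100 files), L3 ~ 5 GB, ...
print([level_capacity_mb(l) for l in (1, 2, 3)])   # [50, 500, 5000]
```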
DateTieredCompactionStrategy is a compaction strategy written specifically for time-series-like data, where data is mostly appended to existing partitions. The problem with leveled and size-tiered compaction is that they don't care about when the data was written, meaning they mix new and old data, while one characteristic of time-series workloads is that you mostly want to read the most recent data. This forces Cassandra to read from many SSTables. Leveled compaction can give great read performance, but with a big write amplification cost, since data needs to be recompacted a lot. DTCS improves on this by only compacting together SSTables that contain data with timestamps close to each other, meaning that for a query requesting the most recent data we can greatly reduce the number of SSTables touched during a read. It also limits write amplification by having an option to stop compacting data that is old and rarely read.
In the read path of Cassandra, there are many in-memory structures as well as disk components involved. The in-memory components such as the row cache, bloom filters, key cache and partition summary play a pivotal role in making Cassandra reads faster.
Row cache: If the row cache is enabled, the complete row is cached in memory. On a row cache hit, Cassandra immediately returns the data; this is the quickest response Cassandra can provide. The option is not commonly used because of its heavy memory utilization: the complete partition used to be placed in memory. From 2.1 onwards, we have the flexibility to specify how many rows from each partition are stored in memory. With this change, many retrieval use cases, especially those with a definite retrieval pattern, will benefit. Patterns such as reading the most recent customer interactions or the last 10 account activities can be served much faster. Another use case where the row cache is effective is when the data in the table is small and accessed very frequently; a typical example is reference data models, where the volume of data is very low but retrievals are much more frequent.
Key cache: The key cache is enabled by default. If the row cache is not enabled, or there is no hit for the key the query is looking for, Cassandra checks the bloom filters to eliminate the locations where it would otherwise have to search for that key. If the partition key is present in the key cache, Cassandra gets from there the location of the data in specific SSTables and performs a seek directly. If there are archive tables where reads happen once in a blue moon, those are good candidates for turning off all caching and saving some memory.
If the application has some data that is frequently read and some that is sparsely read, splitting them into two different column families, enabling the row cache for the frequently read one and tuning the key cache for the sparsely read one, will give optimal read performance without spending too much memory.
The tuning process has to be iterative especially when it comes to caching. Several configurations need to be tested to identify the right fit parameter for that use case / application.
The database features and drivers are evolving very fast.
New features requested by users are implemented quickly.
Users face compatibility and learning challenges with such drastic evolution.
In the traditional RDBMS world, there used to be a very clear demarcation between the activities done by DBAs, developers and system admins. With the NoSQL invasion, the distinction between the three roles has blurred considerably: the same person gets involved in modeling, integrating and tuning the database.
To quote an example, hardly any RDBMS developer thinks about how the buffer pool hit ratio can be increased or how logs can be written effectively. In the NoSQL world, the same person gets involved in deciding what sort of caching a column family should have and whether the commit log and data directories point to the same location.