Instaclustr has a diverse customer base, including Ad Tech, IoT and messaging applications, ranging from small start-ups to large enterprises. In this presentation we share our experiences, common issues, diagnosis methods, and some tips and tricks for managing your Cassandra cluster.
About the Speaker
Brooke Jensen, VP Technical Operations & Customer Services, Instaclustr
Instaclustr is the only provider of fully managed Cassandra as a Service in the world. Brooke Jensen manages our team of Engineers that maintains the operational performance of our diverse fleet of clusters, as well as providing 24/7 advice and support to our customers. Brooke has over 10 years' experience as a Software Engineer, specializing in performance optimization of large systems, and has extensive experience managing and resolving major system incidents.
NetworkTopologyStrategy places replicas in the same data center by walking the ring clockwise until reaching the first node in another rack.
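For reference, a keyspace using NetworkTopologyStrategy might be defined like this (the keyspace name, data center names and replication factors below are purely illustrative):

```sql
-- Hypothetical keyspace: three replicas in each of two example DCs.
CREATE KEYSPACE app_data
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'us_east': 3,
    'us_west': 3
  };
```

Declaring replication per data center up front is what makes adding or migrating DCs later straightforward.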
Also leaves open the possibility of DC migrations later on.
DataStax recommends against using logical racks, because:
Most users tend to ignore or forget the rack requirements – nodes should be added in alternating rack order.
Same number of racks as nodes? Use number of racks = RF.
Expanding is difficult – though not if you're using vnodes.
Makes repairing easier
Cluster operations
Minimises downtime.
You can lose a whole rack of nodes without downtime.
Rack and data center information for the local node is defined in cassandra-rackdc.properties.
prefer_local=true tells Cassandra to use the local IP address when communication is not across different data centers.
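A minimal cassandra-rackdc.properties might look like this (the dc and rack names are examples only):

```properties
# Read by GossipingPropertyFileSnitch on each node.
dc=us_east
rack=rack1
# Use the node's local/private IP for traffic within the same DC:
prefer_local=true
```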
RF=2 with QUORUM causes downtime when adding or losing nodes – a quorum of 2 replicas is both of them, so there is no margin for an unavailable node.
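The arithmetic behind this (a quorum is floor(RF/2) + 1, so RF=2 gives no headroom) can be sketched as:

```python
def quorum(rf: int) -> int:
    """Replicas that must respond for a QUORUM read/write: floor(RF/2) + 1."""
    return rf // 2 + 1

def tolerable_failures(rf: int) -> int:
    """Replicas that can be down while QUORUM still succeeds."""
    return rf - quorum(rf)

for rf in (2, 3, 5):
    print(f"RF={rf}: quorum={quorum(rf)}, can lose {tolerable_failures(rf)}")
# RF=2 needs both replicas, so any node down (or being replaced) fails QUORUM.
```

With RF=3, QUORUM still succeeds with one replica down, which is why RF=3 is the usual recommendation for quorum workloads.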
Driver configuration and consistency (QUORUM, DefaultRetryPolicy).
Change to DowngradingConsistencyRetryPolicy.
Cassandra 2.1 handles compaction of large partitions better.
Quorum queries do read repairs, so one slower (compacting) node will take much longer to return its digest to the coordinator, making the whole operation slower.
SSTables per read
Tombstones per read
High values indicate that compactions are not keeping up or the compaction strategy is not appropriate.
nodetool stop only stops the current compaction; it does not prevent more compactions from occurring, so the same (problematic) compaction will be kicked off again later.
Compaction throughput (nodetool setcompactionthroughput) – on 2.1 a new value applies only to new compactions; on 2.2.5+ it applies instantly.
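The throttle can be inspected and changed at runtime with nodetool (64 MB/s below is just an example value):

```
nodetool getcompactionthroughput      # show the current throttle in MB/s
nodetool setcompactionthroughput 64   # change the throttle to 64 MB/s
nodetool setcompactionthroughput 0    # 0 disables throttling entirely
```

The change is not persisted across restarts; set compaction_throughput_mb_per_sec in cassandra.yaml to make it permanent.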
Particularly in 2.0
The -H flag makes the output human-readable.
Keep ~50% of the disk free for STCS.
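The reason for the ~50% headroom: in the worst case, SizeTieredCompactionStrategy must hold both the input SSTables and the merged output on disk at once, so a compaction can need free space roughly equal to the data being compacted. A back-of-the-envelope check (names and numbers are made up):

```python
def stcs_compaction_fits(sstable_sizes_gb, free_gb):
    """Worst case: the merged output can be as large as the sum of the
    inputs, and must coexist with them until the compaction finishes."""
    return sum(sstable_sizes_gb) <= free_gb

# Four 20 GB SSTables in one tier: need ~80 GB free to merge them safely.
print(stcs_compaction_fits([20, 20, 20, 20], 100))  # True
print(stcs_compaction_fits([20, 20, 20, 20], 50))   # False
```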
Filling up a disk can cause corrupt SSTables, as compactions fail halfway through.
Cassandra won't restart with a full disk.
Snapshots can consume considerable space on disk.
I like to look at the data files on disk – easier than cfstats.
Look for large CFs. Can you remove data?
Note: might not just be your data. Space can commonly be consumed by snapshots or even system keyspaces.
We’ve had nodes nearly fill up because of stored hinted handoffs.
Be wary of changing gc_grace – make sure no other nodes are down, or that they come back up within three hours (the hint window), or else the tombstone won't be delivered via hinted handoff.
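The three-hour figure is Cassandra's default hint window (max_hint_window_in_ms); the timing check is simple arithmetic, sketched here with illustrative values:

```python
MAX_HINT_WINDOW_HOURS = 3  # default max_hint_window_in_ms = 3 hours

def hints_cover_outage(downtime_hours: float) -> bool:
    """Hints (including tombstones) are only stored while a node has been
    down for less than the hint window; longer outages need a repair."""
    return downtime_hours <= MAX_HINT_WINDOW_HOURS

print(hints_cover_outage(2))  # True  - hinted handoff replays the tombstone
print(hints_cover_outage(5))  # False - repair the node before gc_grace expires
```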
Recovery – add EBS, start Cassandra, compact, etc.
~25 GB of compaction remaining but only 4 GB free – unlikely to complete.
This example was SizeTieredCompactionStrategy
If you are watching the node closely, you can let the available disk space get VERY low (megabytes free) during compactions. Be prepared to stop Cassandra on the node if it gets too low.
We had added the three 800s previously – we didn't think they'd need the storage, but they needed the compute.
QUORUM - A write must be written to the commit log and memtable on a quorum of replicas
I guess they were having a busy day.
- When Cassandra attempts a read at a CL higher than ONE, it requests the data from the likely fastest node and a digest from the remaining nodes. It then compares the digest values to ensure they agree.
- If they don't agree, Cassandra applies the "most recent data wins" rule and also repairs the inconsistency across nodes. To do this, it issues a query at CL=ALL.
- CASSANDRA-7947 (https://issues.apache.org/jira/browse/CASSANDRA-7947) changed the behaviour of Cassandra so that a failure at the original CL is reported rather than a failure at CL=ALL.
- In this case, two nodes were up but one was missing updates/data, which triggered the CL=ALL reads and failures.
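A toy model of that read path – not the real implementation, just the shape of the logic, with timestamps standing in for "most recent data wins":

```python
import hashlib

def digest(value: str) -> str:
    """Stand-in for Cassandra's digest of a row."""
    return hashlib.md5(value.encode()).hexdigest()

def quorum_read(replicas):
    """replicas: list of (value, timestamp), or None for a down node.
    Models a QUORUM read over RF=3: full data from one replica, a digest
    from another; a digest mismatch escalates to reading all replicas."""
    up = [r for r in replicas if r is not None]
    if len(up) < 2:                    # quorum of RF=3 is 2
        raise RuntimeError("QUORUM unavailable")
    data, digests = up[0], up[1:2]     # one data read + one digest read
    if all(digest(d[0]) == digest(data[0]) for d in digests):
        return data[0]
    # Mismatch: resolving newest-wins and repairing touches ALL replicas,
    # so it fails if any replica is down. Pre-CASSANDRA-7947 this surfaced
    # as a failure at CL=ALL rather than at the original CL.
    if len(up) < len(replicas):
        raise RuntimeError("read repair needs all replicas but one is down")
    return max(up, key=lambda r: r[1])[0]

print(quorum_read([("new", 2), ("new", 2), ("old", 1)]))  # new
```

The second RuntimeError is the situation described above: with one node down and the two live replicas disagreeing, the coordinator cannot complete the repair read.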
In our experience it is faster overall to take a slow-and-steady approach rather than overloading the cluster and having to recover it.
E.g. don't execute many rebuilds in parallel.