Building Apache Cassandra
clusters for massive scale
Covering theory and operational aspects of bringing up Apache Cassandra clusters - this presentation can be used
as a field reference.
Alex Thompson, Solution Architect APAC - DataStax Australia Pty Ltd
Operationalise the rollout of
nodes
2
Build a best practice reproducible machine
image using automation:
Use one of the core tested Linux distributions and versions: RHEL, CentOS or Ubuntu Server.
Select a cloud server or on-premise hardware that at least meets minimum specifications for Apache Cassandra, refer
to this guide for details: Planning Apache Cassandra Hardware
For production, load testing and production-like workloads do NOT use a SAN, NAS, Ceph or any other type of shared
storage; DO use directly attached SSDs.
More RAM is better and more CPU is better, but don't get stuck in the RDBMS trap of vertical scaling: Apache Cassandra
works best with many medium-spec'd nodes rather than a small number of very large nodes - think horizontal scaling,
not vertical scaling.
3
Build a best practice reproducible machine
image using automation:
Use an automation tool like Ansible, Salt, Chef or Puppet to:
1. Apply Apache Cassandra OS specific settings for Linux
2. Install Java JDK 1.8.latest
3. Install but do not start Apache Cassandra via yum or apt (a tarball is also available)
4. Copy over this node's cassandra.yaml and cassandra-env.sh
5. Lock down all ports except the required Apache Cassandra ports in iptables. You can see a list of the ports and
their usage here: Securing Firewall. As a simple list you need access on 22 (SSH), 7000, 7001 (SSL), 9042
(CQL), 9160 (Thrift - optional) and 7199 (JMX - optional)
For an in-depth discussion on automation, refer to the presentation by Jon from Macquarie Bank on the use of Ansible
and lessons learned - November 2016 meetup.
4
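As a rough sketch of steps 2-5 above, an Ansible play might look like the following; the package names, config paths and the assumption of a yum-based host with a default-DROP iptables policy are mine, adapt them to your environment:

- hosts: cassandra_nodes
  become: yes
  tasks:
    # Steps 2 & 3: install Java 8 and Apache Cassandra but do not start the service
    - name: Install Java 8 and Apache Cassandra
      yum:
        name: [java-1.8.0-openjdk, cassandra]
        state: present
    # Step 4: push this node's pre-generated configuration files
    - name: Copy this node's cassandra.yaml and cassandra-env.sh
      copy:
        src: "files/{{ inventory_hostname }}/{{ item }}"
        dest: "/etc/cassandra/conf/{{ item }}"
      loop: [cassandra.yaml, cassandra-env.sh]
    # Step 5: open only the required Cassandra ports (assumes everything else is dropped)
    - name: Allow required Cassandra ports
      iptables:
        chain: INPUT
        protocol: tcp
        destination_port: "{{ item }}"
        jump: ACCEPT
      loop: ['22', '7000', '7001', '9042', '9160', '7199']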
Minimum node specific cassandra.yaml
fields for automation deployment scripts:
cluster_name All nodes participating in a cluster must have the identical cluster name.
hints_directory Where to store hints for other nodes that are down, small disk space requirement.
authenticator Used to identify users; default is wide open, lock this down in combination with transport layer security and
on disk encryption if internet exposed.
authorizer Used to limit access/provide permissions; default is wide open, lock this down in combination with transport
layer security and on disk encryption if internet exposed.
data_file_directories Where you will store data for this node; this will be the largest consumer of disk space. You should put your
commitlog_directory and data_file_directories on different drives for performance.
commitlog_directory You should put your commitlog_directory and data_file_directories on different drives for performance.
saved_caches_directory Where to store your “fast start-up” cache; small disk space requirement.
5
Minimum node specific cassandra.yaml
fields for automation deployment scripts:
seeds When bootstrapping a new node into a cluster, the bootstrapping node will refer to a seed node to learn the
topology of the cluster; with this information it can take ownership of token ranges and begin data transfer.
listen_address The ip-address of the node for a single homed 1x NIC node.
rpc_address The ip-address of the node for a single homed 1x NIC node.
endpoint_snitch GossipingPropertyFileSnitch
1. The parameter list above is for a basic C* cluster, leaving many unlisted parameters at their default settings. The
defaults are very sane for most use cases but can be fine-tuned to maximize performance and hardware
utilisation; only tweak the unlisted parameters when you know what you are doing.
2. The parameters listed above are in top-down order as of 13/2/2017 for the github.com master Apache Cassandra
repository here: cassandra.yaml
6
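Putting the fields from the last two slides together, a node's cassandra.yaml fragment might look like this; the directory paths and addresses are illustrative only, and note that seeds actually lives inside the seed_provider block:

cluster_name: 'MyCluster'
hints_directory: /var/lib/cassandra/hints
authenticator: PasswordAuthenticator
authorizer: CassandraAuthorizer
data_file_directories:
    - /data/cassandra/data
commitlog_directory: /commitlog/cassandra
saved_caches_directory: /var/lib/cassandra/saved_caches
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.10.3.62"
listen_address: 10.10.3.63
rpc_address: 10.10.3.63
endpoint_snitch: GossipingPropertyFileSnitch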
Minimum node specific cassandra-env.sh
fields for automation deployment scripts:
If cassandra-env.sh is left in its default form it will allocate ¼ of the node's RAM to Apache Cassandra; this can be
problematic on very small spec'd nodes, as C* really needs a minimum 4GB HEAP allocation to function even in development.
As a general rule if HEAP =< 16GB use ParNew/CMS GC otherwise HEAP > 16GB use G1 GC.
You set the HEAP by uncommenting the following in the cassandra-env.sh:
#MAX_HEAP_SIZE="4G"
#HEAP_NEWSIZE="800M"
G1 requires that only MAX_HEAP_SIZE be set.
In production the HEAP settings for G1 GC are usually 16, 24 or 32GB.
ParNew/CMS requires that both are set; as a guide, HEAP_NEWSIZE should be 20-25% of MAX_HEAP_SIZE.
7
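As an illustration, the uncommented settings for the two GC options might look like this (the sizes are examples following the guidance above, not recommendations for your hardware):

# ParNew/CMS: set both, with HEAP_NEWSIZE at 20-25% of MAX_HEAP_SIZE
MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="2G"

# G1: set only MAX_HEAP_SIZE, commonly 16, 24 or 32GB in production
# MAX_HEAP_SIZE="24G"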
Summary so far
We now have a node that:
1. Is on the correct hardware
2. Has the correct OS with basic tuning in place
3. Has the correct Java JDK version
4. Has Apache Cassandra installed via yum or apt
5. Has customised cassandra.yaml and cassandra-env.sh files
6. Has been secured at IPtable level
7. Can now be started and bootstrapped against a seed node in the cluster
8
Construction of the cluster
9
Bringing up the first node...
This is a new cluster, so when bringing up the first node there is in effect nothing to bootstrap against; Cassandra
understands this and initialises the node without going through the bootstrapping phase.
>service cassandra start
Check /var/log/cassandra/system.log for startup process and monitor for any warnings or exceptions.
You most likely want to bring up multiple nodes at once in a new cluster; for the sake of this presentation I am
looking at one node at a time so that I can break down the bootstrapping phases. To skip that and bring multiple nodes up at
once, follow the documentation here:
Initializing a multiple node cluster (single datacenter)
10
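A small sketch of the start-and-verify step described above; the log path matches a package install, adjust it if you use the tarball:

>service cassandra start
>tail -f /var/log/cassandra/system.log | grep -i -E 'WARN|ERROR|Exception'
>nodetool status    # the single node should report UN (Up Normal) once startup completes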
Load some data
Load some data into the first node.
Here I am going to use the
cassandra-stress tool to load 100GB of
sample data.
Cassandra-stress can be used for
loading sample data and/or stress
testing a Cassandra cluster with read /
write workloads.
You can read more about
cassandra-stress here.
[Diagram: node 1 - tokens 0-9, 100GB of data on disk]
11
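For illustration, a cassandra-stress write run against the first node might look like the line below; the row count needed to reach roughly 100GB depends entirely on your schema, so treat n= as a placeholder:

>cassandra-stress write n=50000000 -rate threads=100 -node 10.10.3.62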
Bootstrapping the second node...
Put the ip-address of the first node in the seed list of this node’s cassandra.yaml
>service cassandra start
Check /var/log/cassandra/system.log for bootstrapping progress.
12
Bootstrapping the second node...
Run the following on the first node and you will see your new node in UJ state - Up Joining:
>nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 10.10.3.62 100 GB 256 ? c934ced4-b1c9-4f0f-b278-83282cd7107f RAC2
UJ 10.10.3.63 3 MB 256 ? 1a3df7fa-a1e7-464a-9495-c6a52d61eafa RAC3
13
Bootstrapping...what happened?
So what is happening in this bootstrapping phase?
In the Up Joining (UJ) state the node does not actively participate in any queries, read or write, for either internode or
client-to-node traffic.
1. A calculation is done for this node's share of the token space; in this case it takes half of the token space, as it is
one of only two nodes in the ring, and in taking half the token space it takes responsibility for half the data in
the ring.
2. The node begins streaming in the data from the first node for its tokens.
3. The node completes streaming its data from the first node; this can take time for hundreds of GBs of data.
4. The node changes state to UN (Up Normal)
5. The node can now be discovered by drivers and their application servers, and can now start responding to read /
write requests.
14
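To watch steps 2-4 while they happen, the standard nodetool commands below can be run on the joining node (a quick sketch, not a full monitoring setup):

>nodetool netstats          # streaming progress from the existing node
>nodetool compactionstats   # compaction backlog building as the data arrives
>nodetool status            # the new node flips from UJ to UN once streaming completes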
Data streaming
during bootstrap
Be aware of the costs of bootstrapping on small clusters: the data streaming phase can consume considerable
resources and takes increasingly long for very large amounts of data.
[Diagram: node 1 - tokens 0-4, 100GB on disk; node 2 - tokens 5-9, data on disk growing]
15
Second node
added
Notice that the second node now owns
half of the tokens in the ring.
Notice that the data on node 1 is
100GB on disk and the data on the
new node 2 is only 50GB on disk.
[Diagram: node 1 - tokens 0-4, 100GB on disk; node 2 - tokens 5-9, 50GB on disk]
16
Bootstrapping data...WTF?
In bootstrapping the new node, we know it took half the data off the first node, but the amount of disk space used on the
first node didn't change - it didn't go down. WTF is going on here? Something is broken!
Rule: Bootstrapping a new node into a cluster does NOT clean up after itself and delete the orphaned data on the
original nodes!
Don't get me wrong, the data on the first node is not hurting anything - it's just not used anymore and sits there using up
precious space. Let's get rid of it by running the following command on the first node:
>nodetool cleanup
Note that in a Vnode cluster (most likely what you will be using) you have to run nodetool cleanup on all nodes in the
DC except of course the node you just added.
17
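A minimal sketch of running cleanup across the pre-existing nodes; the host list is hypothetical, and in this two-node example only the first node needs cleaning:

# run against every node that was in the DC before the new node joined
for host in 10.10.3.62; do
    ssh "$host" nodetool cleanup
done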
After cleanup
After [nodetool cleanup] has run, data is once again evenly distributed over the nodes.
[Diagram: node 1 - tokens 0-4, 50GB on disk; node 2 - tokens 5-9, 50GB on disk]
18
Powerful
implications
We just doubled the raw compute
capacity of our database tier in the
following ways:
1. Doubled IO throughput
2. Doubled the amount of RAM
3. Doubled the amount of disk
4. Doubled the number of CPUs
[Diagram: node 1 - tokens 0-4, 50GB on disk; node 2 - tokens 5-9, 50GB on disk]
19
Powerful
implications
The effect at the application tier is arguably more profound: we have doubled the workload capacity of the
underlying database tier to handle increases in application tier traffic. So as our workload increases at the
application tier, we simply add nodes at the Cassandra cluster level to soak up the workload increase.
*The tps figures in this series are not real; your tps limits will depend on your hardware, data model,
replication_factor and how you read / write data. Use cassandra-stress to emulate your real-world traffic patterns
and record performance behaviour.
[Diagram: application server (max 5000 tps) → node 1 handling 1000 tps]
20
Powerful
implications
The effect at the application tier is arguably more profound: we have doubled the workload capacity of the
underlying database tier to handle increases in application tier traffic. So as our workload increases at the
application tier, we simply add nodes at the Cassandra cluster level to soak up the workload increase.
[Diagram: application server (max 5000 tps) → nodes 1 and 2 handling 1000 tps each]
21
Practical
considerations
There is not much use in having a two-node cluster; you really want a minimum of 3 nodes and a
replication_factor of 3, and then scale out your cluster from there.
[Diagram: three-node ring]
22
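With three nodes in place, a keyspace with a replication_factor of 3 can be created as below; the keyspace name is an example and DC1 matches the data center name from the earlier nodetool output:

CREATE KEYSPACE myKeyspace
WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3'}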
Practical
considerations
Here we have stayed with a single application server, which is not a really good idea from a redundancy
perspective, but there is another problem.
The tps capacity of the database tier has scaled past the tps capacity of the application tier, leaving the database
tier under-utilized.
[Diagram: application server (max 5000 tps) → nine-node cluster capable of 9000 tps]
23
Practical
considerations
Time to start scaling out the
application tier to fully utilize the
capacity of the database tier.
[Diagram: application tier (max 10000 tps) → nine-node cluster capable of 9000 tps]
24
Triggers for adding more nodes and
capacity planning
Too much data per node You want to aim for 500GB-1TB of data per node; the more data per node, the longer repairs,
bootstrapping and compactions take.
Insufficient free space on drives For SizeTieredCompactionStrategy (the default) you need 50% of the disk free at all times in the
worst case.
Poor IO performance If you have done everything right in regard to the amount of data per node, have directly attached
SSDs and have tuned both your hardware and Cassandra to maximize IO performance, and you
still have poor IO performance, then you need to scale out of the problem.
Bottlenecked CPUs Same as above: if you have done everything right and tuned both your hardware and Cassandra
to maximize CPU performance and you still have poor CPU performance, then you need to scale
out of the problem.
25
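Two quick checks that feed into the first two triggers (the data path is illustrative):

>nodetool status          # the Load column shows data per node; aim for 500GB-1TB
>df -h /data/cassandra    # for SizeTieredCompactionStrategy keep roughly 50% of the drive free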
Triggers for adding more nodes and
capacity planning
Poor JVM GC behaviour This can be tricky to troubleshoot; more than likely it's just a scale-out fix, as you are
overloading the nodes with read / write traffic, but there are cases where a poor access pattern
or a problematic use case is the cause of GC churning.
Adding additional keyspaces and
application workloads to the cluster
Workloads are cumulative in resource demand.
Increases in application tier traffic The relationship with Cassandra is linear: if you double the amount of requests against
your application tier, you will need to double the number of nodes in your cluster to maintain
the same performance - it's simple maths.
26
Summary so far
Now we have a basic cluster of 9 nodes that we can continue to scale out.
What we do not have is any form of redundancy:
1. What if a shared switch goes down?
2. What if a common rack chassis power supply goes down?
3. What if we lose the network to this physical data center?
Cassandra has probably the best answer to this of any DB solution available: the logical data center.
27
Redundancy, replication and
workload isolation via logical
Cassandra data centers
28
Data centers
Cassandra data centers (DCs) are a logical, not a physical, concept.
A Cassandra cluster is made up of
data centers and each data center
holds a complete token range.
You write your data to one data center and it is replicated to another data center; that other data center could be
in the same rack or across the world.
A cluster can have many data centers
but practical limits do apply.
[Diagram: one cluster containing DC1 and DC2, each a nine-node ring]
29
Data centers
Data centers are a versatile concept and can be used for many differing purposes; here are some examples:
1. Simple redundancy
2. Active failover from app tier
3. Geo edge serving
4. Workload isolation
As mentioned before, each DC holds the complete token range for the keyspaces that are replicated to it; you
decide which keyspaces are replicated.
[Diagram: one cluster containing DC1 and DC2, each a nine-node ring]
CREATE KEYSPACE myKeyspace
WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3', 'DC2': '3'}
30
Simple redundancy
This multi-DC cluster is a simple redundancy setup: if we lose us-east-1 due to an outage, we can access
us-west-1 for the data for business continuity.
[Diagram: one cluster containing us-east-1 (the read/write DC) and us-west-1, each a nine-node ring]
CREATE KEYSPACE myKeyspace
WITH replication = {'class': 'NetworkTopologyStrategy', 'us-east-1': '3', 'us-west-1': '3'}
31
Active failover
This multi-DC cluster is an active failover setup: if we lose us-east-1 due to an outage, we can fail over the
application servers to us-west-1. This can be configured at the Cassandra driver level*, in custom code, at the
network layer or at the DNS level.
* See the April 2016 Sydney Cassandra Users
Meetup talk that covers most aspects of driver
configuration and strategies.
[Diagram: one cluster containing us-east-1 (the read/write DC) and us-west-1, each a nine-node ring; the application tier actively fails over to the us-west-1 DC]
CREATE KEYSPACE myKeyspace
WITH replication = {'class': 'NetworkTopologyStrategy', 'us-east-1': '3', 'us-west-1': '3'}
32
Geo edge serving
All DCs are close to their own in-country app servers.
Writes can be handled in any number of ways; reads are always served from the closest DC.
Any write to any DC replicates to the other three geographic locations.
[Diagram: one cluster containing US-DC, EU-DC, ME-DC and AP-DC, each a nine-node ring]
CREATE KEYSPACE myKeyspace
WITH replication =
{'class': 'NetworkTopologyStrategy', 'US-DC': '3', 'EU-DC': '3', 'ME-DC': '3', 'AP-DC': '3'}
33
Workload isolation
34
Workload isolation
Apart from simple redundancy this is the most important use of logical data centers in Cassandra.
Different workloads are pointed at different data centers, allowing us to isolate, say, a spiky web workload from an
analytic Spark workload; we can then independently scale each DC to its own workload, making the most efficient
use of resources.
In this example we replicate cass-DC tables to spark-DC, perform analytics on them and write to recommendation
tables in the spark-DC, which replicate back to the cass-DC.
[Diagram: one cluster containing cass-DC (serving the app server) and spark-DC (serving Spark jobs), each a nine-node ring]
CREATE KEYSPACE web_tables
WITH replication = {'class': 'NetworkTopologyStrategy', 'cass-DC': '3', 'spark-DC': '2'}
CREATE KEYSPACE recommendation_tables
WITH replication = {'class': 'NetworkTopologyStrategy', 'spark-DC': '2', 'cass-DC': '3'}
35
C* Learning resources
The DataStax documentation has more extensive descriptions of all the concepts listed here; please refer to it if
you need more in-depth knowledge, and don't forget academy.datastax.com for full courses and a multitude of
Apache Cassandra learning resources.
36
Thanks!
Contact us:
DataStax Australia
alex.thompson@datastax.com
www.datastax.com
37