METRONOM, a multinational B2B supermarket, migrated around 80 clusters with over 500 nodes from Rackspace in the UK to Google Cloud in Belgium, saving money and surviving Brexit. This is the story of how we managed this with zero production downtime, the problems we encountered along the way, and how we solved them. It is an example of using Cassandra datacenters to migrate data across geographical boundaries.
2. Gilberto Müller
• Engineering Manager
• 17 years of experience
• Background in infrastructure and datastores
• METRONOM for 2.5 years
• Previously HSBC, Wipro, MasterCard
• SRE enthusiast
3. Paul Chandler
• Independent Cassandra Consultant
• First used Cassandra in 2014
• Designed this Google Move process
• Historically based in the travel industry: British Airways, Avis, TUI, etc.
4. METRO
• Leading international wholesale and retail food specialist company
• 50+ years old
• 35 countries
• 764 stores (in 25 countries)
• 150,000 people worldwide
• ~24mn customers
• €36.5bn in sales in fiscal year 2017/18
5. METRONOM
• "The biggest software company you have never heard of" (from our CEO)
• Digital transformation started in 2015
• Platform as a Service and Dev
• Cassandra started as the only option
• 8 Platform teams (changing over time)
• Multiple DCs in different countries, hybrid-cloud (EU, CH, and RU*)
• 100+ application development teams
• MCC is the main customer
6. NoSQL Team
• 9 people from 10 different places
• Agile: Dash
• Shared responsibility
• Consultancy
• SRE
• DevOps
• Infrastructure as Code
• Provisioning, patching, upgrades
• Support
• Migrations
• We offer a platform, not a DBA service
• Service wrapper (whole platform)
• Backup and restore (whole platform)
• On-call
7. Products
• Apache Cassandra
• DataStax Enterprise
• Apache Solr (Solr Cloud)
• DSE Search
• Apache Spark
• HDFS*
DataStax is a registered trademark of DataStax, Inc. and its subsidiaries in the United States and/or other countries.
Apache, Apache Cassandra, Cassandra, Apache Solr, Apache Spark, Spark, Apache Zookeeper, Zookeeper, Apache Hadoop, and Hadoop are
either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other
countries.
10. Steady State - 1 Datacenter RS_UK
• Multiple clusters
• Move 1 cluster at a time
• No downtime allowed
11. Application Pre-Requisites
• Use local consistency levels for reads and writes:
LOCAL_ONE or LOCAL_QUORUM (see the sketch below)
• The application driver needs a DC-aware load balancing policy
• Lightweight Transactions (LWT) must use LOCAL_SERIAL
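As a minimal illustration of the consistency pre-requisite, this is how a local consistency level can be set for ad-hoc queries from cqlsh; in application code the equivalent is configured on the driver, together with a DC-aware load balancing policy. The keyspace and table names here are placeholders.
-- cqlsh: keep requests in the local datacenter
CONSISTENCY LOCAL_QUORUM;
SELECT * FROM my_keyspace.my_table WHERE id = 1;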
12. Step 1 – Alter system keyspaces
ALTER KEYSPACE system_auth WITH replication =
{'class': 'NetworkTopologyStrategy', 'RS_UK': 3,
'GL_EU': 3};
Keyspaces:
system_auth
system_schema
dse_leases
system_distributed
dse_perf
system_traces
dse_security
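The same ALTER KEYSPACE pattern is applied to each keyspace in the list above; a hedged sketch for two more of them (the replication factor of 3 per datacenter follows the slide, and the exact keyspace list depends on the Cassandra/DSE version in use):
ALTER KEYSPACE system_distributed WITH replication =
{'class': 'NetworkTopologyStrategy', 'RS_UK': 3, 'GL_EU': 3};
ALTER KEYSPACE system_traces WITH replication =
{'class': 'NetworkTopologyStrategy', 'RS_UK': 3, 'GL_EU': 3};
-- repeat for the remaining keyspaces in the list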
13. Step 2 - Create Nodes in New Datacenter
• The number of nodes in GL_EU can be different from RS_UK
• Only system keyspaces are automatically migrated
• Should be quick
14. Step 2 - Create Nodes in New Datacenter
cassandra.yaml
• cluster_name: must be the same for both datacenters
• seeds: should point to seeds in RS_UK
cassandra-rackdc.properties
• dc should be the new datacenter
• Continue using GossipingPropertyFileSnitch
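A hedged example of what these settings can look like on a new GL_EU node; the cluster name, IP addresses and rack name are made up for the example, everything else should match the existing cluster:
# cassandra.yaml (excerpt)
cluster_name: 'my_cluster'
endpoint_snitch: GossipingPropertyFileSnitch
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.0.1.10,10.0.1.11"   # existing seed nodes in RS_UK for now
# cassandra-rackdc.properties
dc=GL_EU
rack=rack1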
16. Nodes created and system keyspaces copied
• Applications must still connect to RS_UK
• No data in GL_EU
17. Step 3 – Alter Replication for User Keyspaces
• ALTER KEYSPACE user_keyspace1 WITH replication = {'class': 'NetworkTopologyStrategy', 'RS_UK': 3, 'GL_EU': 3};
• ALTER KEYSPACE user_keyspace2 WITH replication = {'class': 'NetworkTopologyStrategy', 'RS_UK': 3, 'GL_EU': 3};
• ALTER KEYSPACE user_keyspace3 WITH replication = {'class': 'NetworkTopologyStrategy', 'RS_UK': 3, 'GL_EU': 3};
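Since every user keyspace needs the same change, this step is easy to script; a minimal sketch (the node name is a placeholder and authentication options are omitted):
for ks in user_keyspace1 user_keyspace2 user_keyspace3; do
  cqlsh cass-node-01 -e "ALTER KEYSPACE $ks WITH replication = {'class': 'NetworkTopologyStrategy', 'RS_UK': 3, 'GL_EU': 3};"
done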
18. Keyspaces Replicated
At this point:
• Newly inserted data is replicated to GL_EU
• Old data is not replicated (yet)
• Applications still don't connect to GL_EU
• Lots of data missing
19. Step 4 – Rebuild Nodes
On each new node, run in turn:
• nodetool rebuild RS_UK
This will take some time; it is best to script this section (see the sketch below).
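A minimal sketch of scripting the rebuild from a control host with SSH access to the new nodes; the host names are hypothetical, and the nodes are rebuilt one at a time to limit streaming load on RS_UK:
#!/bin/bash
# Rebuild each GL_EU node in turn, streaming its data from the RS_UK datacenter
set -e
for node in gl-eu-node-01 gl-eu-node-02 gl-eu-node-03; do
  echo "Rebuilding $node from RS_UK..."
  ssh "$node" 'nodetool rebuild RS_UK'
done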
22. Prepare for Decommission
• cassandra.yaml: change seed nodes to be nodes in GL_EU
• Point all applications to the new datacenter
• Run a full repair on all nodes in the new datacenter (see the sketch below)
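The full repair can be scripted the same way as the rebuild; a sketch under the same assumptions (hypothetical host names, one node at a time):
for node in gl-eu-node-01 gl-eu-node-02 gl-eu-node-03; do
  ssh "$node" 'nodetool repair -full'
done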
24. Alter Replication to one Datacenter for ALL keyspaces
• ALTER KEYSPACE user_keyspace1 WITH replication = {'class': 'NetworkTopologyStrategy', 'GL_EU': 3};
• ALTER KEYSPACE user_keyspace2 WITH replication = {'class': 'NetworkTopologyStrategy', 'GL_EU': 3};
• ALTER KEYSPACE user_keyspace3 WITH replication = {'class': 'NetworkTopologyStrategy', 'GL_EU': 3};
• Plus system keyspaces
37. Heap Size
• Streaming and compaction use up memory
• Heap size can be increased during the migration (see the example below)
• No need to worry about GC pauses while applications are not yet connected
• Change back before connecting applications
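How the heap is raised depends on the Cassandra/DSE version and packaging; purely as an illustration, on installations configured through cassandra-env.sh it can look like this (the sizes are made-up example values):
# cassandra-env.sh (excerpt) - temporary values for the rebuild phase only
MAX_HEAP_SIZE="24G"
HEAP_NEWSIZE="2G"
# Revert these and restart before pointing applications at GL_EU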
38. Compaction Throughput
• Large amount of data streamed
• Compaction lags behind the incoming streams
• Lots of small SSTables
• Update Compaction Throughput
nodetool setcompactionthroughput xxxxx
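The value is in MB/s and the right number depends on the node's disks; an illustrative example:
nodetool setcompactionthroughput 256   # MB/s; 0 removes the throttle entirely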
39. Streaming Throughput
• Reduce pressure if needed
• Reduce only streaming between datacenters
nodetool setinterdcstreamthroughput xxxxx
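This throttle is expressed in megabits per second and only affects streaming between datacenters; an illustrative example:
nodetool setinterdcstreamthroughput 200   # megabits/s, cross-DC streaming only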
44. Lightweight Transactions (LWT)
INSERT INTO my_table (id, name)
VALUES (1, 'Name')
IF NOT EXISTS;
• Uses the Paxos algorithm
• Uses a different consistency level for the Paxos phase:
SERIAL or LOCAL_SERIAL
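In cqlsh the serial consistency level used for the Paxos phase is set separately from the regular consistency level; a small sketch (keyspace and table names are placeholders):
CONSISTENCY LOCAL_QUORUM;
SERIAL CONSISTENCY LOCAL_SERIAL;
INSERT INTO my_keyspace.my_table (id, name) VALUES (1, 'Name') IF NOT EXISTS;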
46. Implementation
• DB of cluster and node names
• Automatic scripts to create cloud instances
• Scale clusters up or down
• Puppet
• Jenkins jobs
• Rebuild stage
• Decommission stage
• Service wrapper to protect integrity of cluster
48. Success
• 91 Clusters moved
• Solr migration (not covered here)
• No C* cluster downtime
• Incorrect consistency sometimes caused application downtime
• April 2018 - October 2018
• One cluster delayed until February 2019
• Padding 0s with compression
• Automation is a must
49. Process can also be used for
• Splitting clusters (e.g. splitting a multi-tenant cluster)
• Updating non-trivial configuration
• num_tokens
• Upgrading underlying operating system
• Ubuntu upgrades (upstart -> systemd)
50. Thank You
More details can be found at:
https://bit.ly/2Lnosw6
Paul Chandler and Gilberto Müller
Any Questions?