METRONOM, a multinational B2B supermarket, migrated around 80 clusters with over 500 nodes from Rackspace in the UK to Google Cloud in Belgium, saving money and surviving Brexit. This is the story of how we managed this with zero production downtime, the problems we encountered along the way, and how we solved them. It is an example of using Cassandra datacenters to migrate data across geographical boundaries.
2. Gilberto Müller
• Engineering Manager
• 17 years of experience
• Background in infrastructure and datastores
• METRONOM for 2.5 years
• Previously HSBC, Wipro, MasterCard
• SRE enthusiast
3. Paul Chandler
• Independent Cassandra Consultant
• First used Cassandra in 2014
• Designed this Google Move process
• Historically based in the travel industry: British Airways, Avis, TUI, etc.
4. METRO
• Leading international wholesale and retail food specialist company
• 50+ years old
• 35 countries
• 764 stores (in 25 countries)
• 150,000 people worldwide
• ~24mn customers
• €36.5bn in sales in fiscal year 2017/18
5. METRONOM
• "The biggest software company you have never heard of" (from our CEO)
• Digital transformation started in 2015
• Platform as a Service and Dev
• Cassandra started as the only option
• 8 Platform teams (changing over time)
• Multiple DCs in different countries, hybrid-cloud (EU, CH, and RU*)
• 100+ application development teams
• MCC is the main customer
6. NoSQL Team
• 9 people from 10 different places
• Agile: Dash
• Shared responsibility
• Consultancy
• SRE
• DevOps
• Infrastructure as Code
• Provisioning, patching, upgrades
• Support
• Migrations
• We offer a platform, not a DBA service
• Service wrapper (whole platform)
• Backup and restore (whole platform)
• On-call
7. Products
• Apache Cassandra
• DataStax Enterprise
• Apache Solr (Solr Cloud)
• DSE Search
• Apache Spark
• HDFS*
DataStax is a registered trademark of DataStax, Inc. and its subsidiaries in the United States and/or other countries.
Apache, Apache Cassandra, Cassandra, Apache Solr, Apache Spark, Spark, Apache Zookeeper, Zookeeper, Apache Hadoop, and Hadoop are
either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other
countries.
10. Steady State - 1 Datacenter RS_UK
• Multiple clusters
• Move 1 cluster at a time
• No downtime allowed
11. Application Pre-Requisites
• Use local consistency levels for reads and writes:
LOCAL_ONE or LOCAL_QUORUM (see the sketch below)
• The application driver needs a DC-aware load balancing policy
• Lightweight Transactions (LWT) must use LOCAL_SERIAL
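As a minimal illustration of the consistency pre-requisite, this is how a local consistency level can be set for ad-hoc queries from cqlsh; in application code the equivalent is configured on the driver, together with a DC-aware load balancing policy. The keyspace and table names here are placeholders.
-- cqlsh: keep requests in the local datacenter
CONSISTENCY LOCAL_QUORUM;
SELECT * FROM my_keyspace.my_table WHERE id = 1;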
12. Step 1 – Alter system keyspaces
ALTER KEYSPACE system_auth WITH replication =
{'class': 'NetworkTopologyStrategy', 'RS_UK': 3,
'GL_EU': 3};
Keyspaces:
system_auth
system_schema
dse_leases
system_distributed
dse_perf
system_traces
dse_security
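The same ALTER KEYSPACE pattern is applied to each keyspace in the list above; a hedged sketch for two more of them (the replication factor of 3 per datacenter follows the slide, and the exact keyspace list depends on the Cassandra/DSE version in use):
ALTER KEYSPACE system_distributed WITH replication =
{'class': 'NetworkTopologyStrategy', 'RS_UK': 3, 'GL_EU': 3};
ALTER KEYSPACE system_traces WITH replication =
{'class': 'NetworkTopologyStrategy', 'RS_UK': 3, 'GL_EU': 3};
-- repeat for the remaining keyspaces in the list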
13. Step 2 - Create Nodes in New Datacenter
• The number of nodes in GL_EU can be different from RS_UK
• Only system keyspaces are automatically migrated
• Should be quick
14. Step 2 - Create Nodes in New Datacenter
cassandra.yaml
• cluster_name: must be the same for both datacenters
• seeds: should point to seeds in RS_UK
cassandra-rackdc.properties
• dc should be the new datacenter
• Continue using GossipingPropertyFileSnitch
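A hedged example of what these settings can look like on a new GL_EU node; the cluster name, IP addresses and rack name are made up for the example, everything else should match the existing cluster:
# cassandra.yaml (excerpt)
cluster_name: 'my_cluster'
endpoint_snitch: GossipingPropertyFileSnitch
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.0.1.10,10.0.1.11"   # existing seed nodes in RS_UK for now
# cassandra-rackdc.properties
dc=GL_EU
rack=rack1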
16. Nodes created and system keyspaces copied
• Applications must still connect to RS_UK
• No data in GL_EU
17. Step 3 – Alter Replication for User Keyspaces
• ALTER KEYSPACE user_keyspace1 WITH replication = {'class': 'NetworkTopologyStrategy', 'RS_UK': 3, 'GL_EU': 3};
• ALTER KEYSPACE user_keyspace2 WITH replication = {'class': 'NetworkTopologyStrategy', 'RS_UK': 3, 'GL_EU': 3};
• ALTER KEYSPACE user_keyspace3 WITH replication = {'class': 'NetworkTopologyStrategy', 'RS_UK': 3, 'GL_EU': 3};
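Since every user keyspace needs the same change, this step is easy to script; a minimal sketch (the node name is a placeholder and authentication options are omitted):
for ks in user_keyspace1 user_keyspace2 user_keyspace3; do
  cqlsh cass-node-01 -e "ALTER KEYSPACE $ks WITH replication = {'class': 'NetworkTopologyStrategy', 'RS_UK': 3, 'GL_EU': 3};"
done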
18. Keyspaces Replicated
At this point:
• Newly inserted data is replicated to GL_EU
• Old data is not replicated (yet)
• Applications still don't connect to GL_EU
• Lots of data missing
19. Step 4 – Rebuild Nodes
On each new node, run in turn:
• nodetool rebuild RS_UK
This will take some time; it is best to script this section (see the sketch below).
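A minimal sketch of scripting the rebuild from a control host with SSH access to the new nodes; the host names are hypothetical, and the nodes are rebuilt one at a time to limit streaming load on RS_UK:
#!/bin/bash
# Rebuild each GL_EU node in turn, streaming its data from the RS_UK datacenter
set -e
for node in gl-eu-node-01 gl-eu-node-02 gl-eu-node-03; do
  echo "Rebuilding $node from RS_UK..."
  ssh "$node" 'nodetool rebuild RS_UK'
done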
22. Prepare for Decommission
• cassandra.yaml: change seed nodes to be nodes in GL_EU
• Point all applications to the new datacenter
• Run a full repair on all nodes in the new datacenter (see the sketch below)
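The full repair can be scripted the same way as the rebuild; a sketch under the same assumptions (hypothetical host names, one node at a time):
for node in gl-eu-node-01 gl-eu-node-02 gl-eu-node-03; do
  ssh "$node" 'nodetool repair -full'
done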
24. Alter Replication to one Datacenter for ALL keyspaces
• ALTER KEYSPACE user_keyspace1 WITH replication = {'class': 'NetworkTopologyStrategy', 'GL_EU': 3};
• ALTER KEYSPACE user_keyspace2 WITH replication = {'class': 'NetworkTopologyStrategy', 'GL_EU': 3};
• ALTER KEYSPACE user_keyspace3 WITH replication = {'class': 'NetworkTopologyStrategy', 'GL_EU': 3};
• Plus system keyspaces
37. Heap Size
• Streaming and compaction use up memory
• Heap size can be increased during the migration (see the example below)
• No need to worry about GC pauses while applications are not yet connected
• Change back before connecting applications
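How the heap is raised depends on the Cassandra/DSE version and packaging; purely as an illustration, on installations configured through cassandra-env.sh it can look like this (the sizes are made-up example values):
# cassandra-env.sh (excerpt) - temporary values for the rebuild phase only
MAX_HEAP_SIZE="24G"
HEAP_NEWSIZE="2G"
# Revert these and restart before pointing applications at GL_EU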
38. Compaction Throughput
• Large amount of data streamed
• Compaction lags behind the incoming streams
• Lots of small SSTables
• Update Compaction Throughput
nodetool setcompactionthroughput xxxxx
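The value is in MB/s and the right number depends on the node's disks; an illustrative example:
nodetool setcompactionthroughput 256   # MB/s; 0 removes the throttle entirely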
39. Streaming Throughput
• Reduce pressure if needed
• Reduce only streaming between datacenters
nodetool setinterdcstreamthroughput xxxxx
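This throttle is expressed in megabits per second and only affects streaming between datacenters; an illustrative example:
nodetool setinterdcstreamthroughput 200   # megabits/s, cross-DC streaming only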
44. Lightweight Transactions (LWT)
INSERT INTO my_table (id, name)
VALUES (1, 'Name')
IF NOT EXISTS;
• Uses the Paxos algorithm
• Uses a different consistency level for the Paxos phase:
SERIAL or LOCAL_SERIAL
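In cqlsh the serial consistency level used for the Paxos phase is set separately from the regular consistency level; a small sketch (keyspace and table names are placeholders):
CONSISTENCY LOCAL_QUORUM;
SERIAL CONSISTENCY LOCAL_SERIAL;
INSERT INTO my_keyspace.my_table (id, name) VALUES (1, 'Name') IF NOT EXISTS;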
46. Implementation
• DB of cluster and node names
• Automatic scripts to create cloud instances
• Scale clusters up or down
• Puppet
• Jenkins jobs
• Rebuild stage
• Decommission stage
• Service wrapper to protect integrity of cluster
48. Success
• 91 Clusters moved
• Solr migration (not covered here)
• No C* cluster downtime
• Incorrect consistency sometimes caused application downtime
• April 2018 - October 2018
• One cluster delayed until February 2019
• Padding 0s with compression
• Automation is a must
49. Process can also be used for
• Splitting clusters (e.g. splitting a multi-tenant cluster)
• Updating non-trivial configuration
• num_tokens
• Upgrading underlying operating system
• Ubuntu upgrades (upstart -> systemd)
50. Thank You
More details can be found at:
https://bit.ly/2Lnosw6
Paul Chandler and Gilberto Müller
Any Questions?