Data now grows so fast that teams running a relational database can easily reach the limits of its capacity. In this session, Kwangbock Lee explains how Samsung uses ClustrixDB to handle fast-growing data without manual database sharding. He shares Samsung's experience migrating to ClustrixDB and the lessons learned, including a few hiccups along the way.
2. Agenda
1. Introduction to Samsung Cloud Platform
2. Requirements & Features
3. Samsung Cloud + ClustrixDB Journey
4. Issues & Enhancements
5. Wrap Up
4. User Benefit
Backup and restore data and settings
- Home screen, app data, contacts, messages, device settings, music, documents, etc.
Your photos on multiple devices, any time
- Sync photos, videos, and notes using native applications across Samsung devices
15 GB of free storage; upgrade for more
- Premium Plans
. Korea, 29 countries in the EU (’16, Nov)
. US models (excl. VZW, AT&T; ’17, Feb)
. Brazil (unlocked devices, ’18, Mar)
* No. 1 request from customers
5. Figures of Samsung Cloud
Members: hundreds of millions
Daily requests: tens of billions
Storage: hundreds of PiB
Databases: ClustrixDB, Cassandra, MySQL, DynamoDB
6. Samsung Cloud Architecture
Access Layer: API Gateway
Application Layer: Service Modules (Basic Modules, User Modules), Backend Modules
Data Processing Layer
Data Layer: ClustrixDB, Cassandra
7. User Architecture – Before Migration
(Diagram: Shard Info; Shard #1, Shard #2, …, each with one master and several slaves)
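The routing that this sharded design pushes into the application can be sketched as below. This is a minimal, hypothetical model, not Samsung's code: it assumes the Shard Info store maps each user ID to a shard name, and the class names, host names, and read-from-the-first-slave policy are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    name: str
    master: str                                      # hypothetical master host
    slaves: list[str] = field(default_factory=list)  # hypothetical read replicas


class ShardRouter:
    """Hypothetical application-side router driven by a 'shard info' map."""

    def __init__(self, shards, shard_info):
        self.shards = {s.name: s for s in shards}
        self.shard_info = dict(shard_info)  # user_id -> shard name

    def shard_for(self, user_id):
        return self.shards[self.shard_info[user_id]]

    def write_host(self, user_id):
        # Every write must go to the master of the user's shard.
        return self.shard_for(user_id).master

    def read_host(self, user_id):
        # Reads may use a slave when one exists (replication lag is ignored).
        shard = self.shard_for(user_id)
        return shard.slaves[0] if shard.slaves else shard.master

    def move_user(self, user_id, new_shard):
        # Re-pointing the map is the easy part; the user's rows must already
        # have been copied to new_shard, which is the migration overhead.
        if new_shard not in self.shards:
            raise KeyError(new_shard)
        self.shard_info[user_id] = new_shard
```

Every new shard, and every user moved between shards, means touching this layer: the kind of sharding and migration code that the move to ClustrixDB was meant to remove.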
8. Key Challenges
● RDBMS Scaling Strategy
○ Sharding Overhead
○ Migration Overhead
○ Additional code for both sharding and migration
● High Availability
● Analytic Query
○ The query must run on every shard DB and the results must be merged
● Online Schema Change
● Online Backup / Restore
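The analytic-query problem is a scatter-gather: the same query runs on every shard and the application merges the partial results. A minimal sketch, using in-memory SQLite databases as stand-ins for the shard DBs (an assumption made only so the example runs anywhere); the `users` table and its columns are hypothetical.

```python
import sqlite3
from collections import Counter


def make_shard(rows):
    # Each in-memory SQLite database stands in for one shard DB.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, country TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", rows)
    return conn


def users_per_country(shards):
    # Scatter: run the same aggregate query on every shard.
    # Gather: merge the partial counts in the application.
    totals = Counter()
    for conn in shards:
        for country, n in conn.execute(
            "SELECT country, COUNT(*) FROM users GROUP BY country"
        ):
            totals[country] += n
    return dict(totals)
```

Counts merge by simple addition, but not every aggregate does: an average, for example, cannot be computed by averaging per-shard averages, so each shard would have to return its SUM and COUNT. Writing and maintaining this merge logic for every report is part of the overhead.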
9. Requirements & Clustrix Features
Requirements
● Scalability, No more Sharding!
● ACID Compliant
● MySQL Compatible
● Fault Tolerance, No SPOF!
● OLTP and Operational Analytics
● Online Schema Change
● Online Backup / Restore
Clustrix Features
● Scalable
● High-Volume, High-Concurrency OLTP
● Automatic Data Distribution
● Distributed Query Execution
● Fault-Tolerant
● Flexible Deployment Options
● MySQL Compatible
● Easy to Migrate from MySQL
● Fast Backup and Restore
10. Key Features of ClustrixDB
Scalability
● Scalable Architecture
○ Can scale linearly as nodes are added
○ Automatically distributes both data and query execution to scale
○ Flex Up & Flex Down
● Rebalancer
○ Automatically manages the distribution of data across the cluster
○ Handles read/write imbalance across nodes/zones (ranking replica)
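To make "automatically distributes data" concrete, here is a deliberately simplified model: rows are hashed by a distribution key into a fixed number of slices, and each slice's replicas are spread over distinct nodes. The hash function, round-robin placement, and function names are assumptions for illustration; ClustrixDB's actual distribution and rebalancer logic is more sophisticated.

```python
import hashlib


def slice_of(key, num_slices):
    # Stable hash of the distribution key -> slice number (same on every run).
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_slices


def place_replicas(num_slices, nodes, replicas):
    # Put the replicas of each slice on distinct nodes, round-robin.
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        s: [nodes[(s + r) % len(nodes)] for r in range(replicas)]
        for s in range(num_slices)
    }


def load_per_node(placement):
    # Count how many slice replicas each node stores.
    load = {}
    for owners in placement.values():
        for node in owners:
            load[node] = load.get(node, 0) + 1
    return load
```

With 12 slices and 2 replicas, three nodes hold 8 slice replicas each; after a Flex Up to four nodes, each holds 6. Recomputing the whole placement, as this sketch does, would move far more slices than necessary; a real rebalancer moves only what it needs to even out the load.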
11. Key Features of ClustrixDB
Fault-Tolerant
● Built-in fault tolerance: automatically maintains 2 copies of all data, so the cluster can survive a single node failure
● Replication
● Deploying Across Zones
○ AWS Availability Zones (requires 3 AZs)
● MAX_FAILURES
○ Number of node failures the cluster can tolerate simultaneously
○ ALTER CLUSTER SET MAX_FAILURES = number of simultaneous node failures
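The MAX_FAILURES / REPLICAS pairs on the Current Architecture slide (MAX_FAILURES = 1 with REPLICAS = 2, MAX_FAILURES = 2 with REPLICAS = 3) follow replicas = MAX_FAILURES + 1: if every slice keeps one more copy than the number of nodes that may fail together, at least one copy always survives, provided the copies sit on distinct nodes. The sketch below checks that by brute force over every failure combination; the round-robin placement and function names are hypothetical, and it ignores cluster membership and quorum.

```python
from itertools import combinations


def replicas_for(max_failures):
    # MAX_FAILURES = 1 -> 2 replicas, MAX_FAILURES = 2 -> 3 replicas.
    return max_failures + 1


def place(num_slices, nodes, replicas):
    # Hypothetical round-robin placement: replicas of a slice on distinct nodes.
    return {s: [nodes[(s + r) % len(nodes)] for r in range(replicas)]
            for s in range(num_slices)}


def survives(placement, failed):
    # No data is lost if every slice keeps at least one replica on a live node.
    return all(any(n not in failed for n in owners) for owners in placement.values())


def tolerates(placement, nodes, k):
    # Check every possible combination of k simultaneous node failures.
    return all(survives(placement, set(f)) for f in combinations(nodes, k))
```

On five nodes, a placement built with `replicas_for(1)` survives any single failure but not every pair of failures; one built with `replicas_for(2)` survives any two.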
12. Key Features of ClustrixDB
Online Schema Change
● Does not block reads or writes to the table
○ Requires additional disk space while running
● Distributed Parallel Query Execution – FANOUT option
○ query_fanout
○ query_fanout_insert_select
○ query_fanout_all_writes
● Monitoring the Progress of an ALTER
○ system.alter_progress
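Because the online ALTER does not block the table, it needs room to build the altered data alongside the old. A rough pre-check, assuming (an assumption, not a documented formula) that the ALTER writes one complete new copy of the table, every replica included, before switching over; the function names and the 10% headroom are hypothetical.

```python
GIB = 1024 ** 3


def alter_space_needed(table_bytes, replicas, headroom_pct=10):
    # Assumption: the online ALTER writes a complete new copy of the table
    # (every replica) before switching over, so the cluster needs free space
    # for one more full copy plus some headroom. Integer math avoids rounding.
    return table_bytes * replicas * (100 + headroom_pct) // 100


def alter_fits(free_bytes, table_bytes, replicas, headroom_pct=10):
    # True if the cluster's free space covers the estimated extra copy.
    return free_bytes >= alter_space_needed(table_bytes, replicas, headroom_pct)
```

For a 100 GiB table stored with 2 replicas, the sketch asks for 220 GiB of free space before starting the ALTER.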
14. Issues & Enhancements
Replication Configuration with MySQL 5.7
● For migration deployment
● MySQL 5.7 (master) – ClustrixDB (slave)
Phase: PoC (ClustrixDB v7.6, 2016)
15. Issues & Enhancements
Fast Backup and Restore
√ Fast Backup and Restore is a binary backup mechanism
√ Each node sends its data directly to the backup target in parallel
√ Supports SFTP for backup and restore
√ Concurrency can be controlled
(Diagram: ClustrixDB sending backups to an FTP server over Secure FTP)
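The parallel, per-node backup described on this slide, with a cap on concurrency, can be modeled with a thread pool. This is a hypothetical sketch: `backup_node` stands in for a node streaming its slices to the backup target (an SFTP server in the deck), and the target here is an in-memory dict so the example runs without a server.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def backup_node(node, slices, target, lock):
    # Hypothetical stand-in for one node sending its own slices straight to
    # the backup target; each slice is assumed to be backed up from one node.
    data = {s: f"<binary data of {s} from {node}>" for s in slices}
    with lock:
        target.update(data)
    return node, len(slices)


def parallel_backup(node_slices, concurrency):
    # All nodes back up in parallel; `concurrency` caps how many transfer
    # at the same time (it is the thread pool's max_workers).
    target, lock = {}, threading.Lock()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(backup_node, node, slices, target, lock)
                   for node, slices in node_slices.items()]
        results = dict(f.result() for f in futures)
    return target, results
```

`concurrency` plays the role of the concurrency setting: raising it finishes sooner but puts more simultaneous load on the network and the backup server.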
17. Issues & Enhancements
Enhanced Security
● SSL
○ Supports SSL Encrypted Connections
○ Requires a mysql client 5.6.38 or higher
● SHA256 Password Plugin
○ Provides stronger password credentials than the mysql_native_password plugin
● Audit (User Logging)
○ Provides audit logs of user login/logout (user.log)
○ SET GLOBAL session_log_users = true;
Phase: Expansion (ClustrixDB v9.1, 2018)
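The SSL requirement is a minimum client version, and comparing versions as strings gets it wrong ("5.6.9" sorts after "5.6.38"). A small helper that compares numeric tuples instead; the function names and the handling of suffixes such as "-log" are assumptions.

```python
MIN_SSL_CLIENT = (5, 6, 38)  # minimum mysql client version from the slide


def parse_version(version):
    # "5.6.38" -> (5, 6, 38); drops a trailing suffix such as "-log" (assumption).
    core = version.split("-")[0]
    return tuple(int(part) for part in core.split("."))


def client_supports_ssl(version):
    # Tuple comparison orders versions numerically, part by part.
    return parse_version(version) >= MIN_SSL_CLIENT
```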
18. Issues & Enhancements
Monitoring Tools
● Built-in Monitoring tool - ClustrixGUI
● Network security policy blocks access to ClustrixGUI
● Long-term historical data is needed
√ Monitoring with InfluxDB & Grafana
○ Collector script
○ Grafana dashboard
√ Other tools are available
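The collector script itself is not shown in the deck. As a hedged illustration, assuming it periodically gathers per-node numbers and writes them to InfluxDB in the InfluxDB line protocol (measurement, tags, fields, timestamp), the formatting step could look like this; how the numbers are read from ClustrixDB and how lines are sent to InfluxDB are left out, and the measurement, tag, and field names are hypothetical.

```python
import time


def _escape_key(s):
    # Line protocol: commas, '=' and spaces in tag keys/values and field keys
    # must be backslash-escaped.
    return s.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def _field_value(v):
    if isinstance(v, bool):          # check bool before int (bool is an int)
        return "true" if v else "false"
    if isinstance(v, int):
        return f"{v}i"               # integer fields carry an 'i' suffix
    if isinstance(v, float):
        return repr(v)
    s = str(v).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{s}"'                  # anything else is sent as a string field


def to_line_protocol(measurement, tags, fields, ts_ns=None):
    head = measurement.replace(",", r"\,").replace(" ", r"\ ")
    for k, v in sorted(tags.items()):  # tags sorted by key
        head += f",{_escape_key(k)}={_escape_key(v)}"
    body = ",".join(f"{_escape_key(k)}={_field_value(v)}" for k, v in fields.items())
    ts = time.time_ns() if ts_ns is None else ts_ns  # nanosecond timestamp
    return f"{head} {body} {ts}"
```

Tags are emitted sorted by key, which InfluxDB recommends for write performance; Grafana can then graph the fields and filter by tags such as node and zone.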
19. Current Architecture
Architecture #1: ClustrixDB (master) and ClustrixDB (slave) across Zone 1, Zone 2, Zone 3
Architecture #2: ClustrixDB
Settings shown: MAX_FAILURES = 2 with REPLICAS = 3; MAX_FAILURES = 1 with REPLICAS = 2
20. Current Deployment & Usage
(Diagram: Region #1, Region #2, Region #3, with M (master) and S (slave) clusters)
230 Million TPS
16 Billion Rows
2 Services
3 Regions
21. Benefits
● Operation Cost
○ No Additional Resources for Migration or Sharding
○ Downsized Instance Spec.
○ No Standby Replicas for HA, Backup, Analytics
○ Less Man-Month
● Easy Scalability
● No SPOF, Strong HA
● Better Maintenance & Monitoring
● Analytic Query
● Tech Support
● Simplified Application Architecture
● No Additional Code for Migration or Sharding
● Focus on Service Logic Development
22. Wrap Up
● Future Work
○ BINLOG / Replication Enhancement
○ ETL Tools
● Q&A