Sizing a database cluster makes or breaks your application. Too small and you cannot sustain spikes in usage or recover from a node loss or an operational slowdown. Too big and your cluster costs more than it should and wastes valuable human resources.
Since different workloads have different requirements, sizing should be optimized for both throughput and latency. In many cases, however, these requirements contradict each other.
In this webinar, we explain how to reconcile these contradictory forces and build a sustainable cluster that meets both performance and resiliency requirements.
Sizing your Scylla Cluster in AWS, Azure and GCP
1. Eyal Gutkind - VP of Solutions, ScyllaDB
Sizing your Scylla Cluster
A walk through the process
2. Presenter
Eyal Gutkind
Eyal Gutkind is VP of Solutions at ScyllaDB. Prior to joining
ScyllaDB, Eyal held product management roles at Mirantis
and DataStax, and spent 12 years with Mellanox
Technologies in various engineering management and
product marketing roles. Eyal holds a BSc. degree in
Electrical and Computer Engineering from Ben Gurion
University in Israel and an MBA from Fuqua School of
Business at Duke University.
3. About ScyllaDB
+ The Real-Time Big Data Database
+ Drop-in replacement for Cassandra
+ 10X the performance & low tail latency
+ New: Scylla Cloud, DBaaS
+ Open source and enterprise editions
+ Founded by the creators of the KVM hypervisor
+ HQs: Palo Alto, CA; Herzliya, Israel
4. Agenda
+ Understand your workload
+ The machines
+ Let’s build a system
+ How it looks on different IaaS providers
+ Our sizing process
9-12. Make sure you have all requirements set
+ Business
+ Application
+ Infrastructure
+ Resiliency
+ Developer and Operator Friendliness
13. The obvious...
+ Data volume ingested per second/hour/day/year
+ Data attrition (retention) policy
+ Data format: text, binary blob
+ Required replication factor
+ What's in your storage system?
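To make the first bullet concrete, a per-second ingest rate can be converted into daily and yearly raw volume. This is a sketch with hypothetical numbers (the 1,000 records/sec rate is illustrative; the 3KB record size matches the example later in the deck):

```python
# Convert a steady ingest rate into daily/yearly raw volume (illustrative numbers).
RECORDS_PER_SEC = 1_000        # hypothetical ingest rate, not from the slides
RECORD_BYTES = 3 * 1024        # 3KB records, as in the sizing example below

bytes_per_day = RECORDS_PER_SEC * RECORD_BYTES * 86_400   # seconds per day
bytes_per_year = bytes_per_day * 365

print(f"Per day:  {bytes_per_day / 1e9:.1f} GB")    # -> 265.4 GB
print(f"Per year: {bytes_per_year / 1e12:.1f} TB")  # -> 96.9 TB
```

Numbers like these feed directly into the attrition policy and replication-factor decisions on this slide: raw ingest is only the starting point before replication and compaction headroom.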
22. Let’s build a system
Business requirements:
+ 1st year customers: 30M profiles
+ Number of records per profile: 12
+ Avg. size of each record: 3KB
+ Data types: text
22
23. Let’s build a system
Business requirements:
+ 1st year customers: 30M profiles
+ Number of records per profile: 12
+ Avg. size of each record: 3KB
+ Data types: text
Application requirements:
+ 99% write response time: 15ms
+ 99% read response time: 10ms
+ Peak throughput: 150,000 operations/sec
+ Read:Write ratio: 70:30
23
24. Let’s build a system
Business requirements:
+ 1st year customers: 30M profiles
+ Number of records per profile: 12
+ Avg. size of each record: 3KB
+ Data types: text
Application requirements:
+ 99% write response time: 15ms
+ 99% read response time: 10ms
+ Peak throughput: 150,000 operations/sec
+ Read:Write ratio: 70:30
Infrastructure:
+ AWS
+ Available instance types: all
+ Multi-DC: Oregon and N.Virgina
+ OS: CentOS 7.6
+ Replication Factor: 3
24
25. Let’s build a system
Business requirements:
+ 1st year customers: 30M profiles
+ Number of records per profile: 12
+ Avg. size of each record: 3KB
+ Data types: text
Application requirements:
+ 99% write response time: 15ms
+ 99% read response time: 10ms
+ Peak throughput: 150,000 operations/sec
+ Read:Write ratio: 70:30
Infrastructure:
+ AWS
+ Available instance types: all
+ Multi-DC: Oregon and N.Virgina
+ OS: CentOS 7.6
+ Replication Factor: 3
Auxiliary applications:
+ Spark
25
27. Let’s build a system
+ End of year Total raw data size: 2TB, Starting with ~1TB
+ Typical record size read/written by the application: 3KB
+ Data model: 20-30 tables, up to 20 columns per row, 10-50 rows per partition, mainly text
+ Latency requirements: 10ms Write, 15ms Read, for the 99%
+ Read:Write ratio: 70:30
+ Throughput: 150,000 database op/s
+ IaaS: AWS, multi-region, multi-availability-zone, N. Virginia and Oregon
+ Replication Factor: 3
+ Spark for analytics
27
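The per-datacenter figures on the following slides can be reproduced with a quick back-of-the-envelope calculation. This is a sketch: the 2x compaction/operational headroom factor is an assumption consistent with the 2TB-to-12TB jump on the slides, and the ~5,000 ops/s-per-vCPU throughput figure is likewise inferred to match the stated 30 threads, not stated in the deck:

```python
# Back-of-the-envelope ScyllaDB sizing, using the numbers from the example.
RAW_DATA_TB = 2.0          # end-of-year raw data set
REPLICATION_FACTOR = 3     # replicas per data center
HEADROOM_FACTOR = 2.0      # assumed compaction / operational headroom

disk_per_dc_tb = RAW_DATA_TB * REPLICATION_FACTOR * HEADROOM_FACTOR
print(f"Disk needed per DC: {disk_per_dc_tb:.0f} TB")   # -> 12 TB

PEAK_OPS = 150_000
OPS_PER_VCPU = 5_000       # assumed per-vCPU throughput for ~3KB records

vcpus = PEAK_OPS / OPS_PER_VCPU
print(f"Threads (vCPUs): {vcpus:.0f}, physical cores: {vcpus / 2:.0f}")
# -> 30 threads, 15 physical cores, matching the slides
```

The same two outputs (disk per DC, core count) drive every instance choice in the AWS, Azure, GCP, and on-premise options that follow.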
28. Let’s build a system, Amazon Web Services
+ Per Data Center
+ Needed disk space: 12TB
+ Media type: NVMe drives to meet latency SLA
+ Requires 30 threads, 15 physical cores
+ Per Data center instance options
+ 3 x i3.4xlarge → Total disk: 11.5TB
and
+ 3 x i3.2xlarge for the Spark cluster
+ 1x i3.2xlarge for Scyla monitoring and Scylla manager
28
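As a sanity check, the 3 x i3.4xlarge option can be verified against the requirement. This sketch assumes the published i3.4xlarge specs (16 vCPUs, two 1,900GB NVMe drives per instance):

```python
# Sanity-check the 3 x i3.4xlarge option against the per-DC requirement.
NODES = 3
NVME_GB_PER_NODE = 2 * 1900    # i3.4xlarge: two 1,900GB NVMe drives
VCPUS_PER_NODE = 16

total_disk_tb = NODES * NVME_GB_PER_NODE / 1000
total_vcpus = NODES * VCPUS_PER_NODE

print(f"Total disk:  {total_disk_tb:.1f} TB")  # -> 11.4 TB (the ~11.5TB on the slide)
print(f"Total vCPUs: {total_vcpus}")           # -> 48, above the 30 threads required
```

Note the disk total lands slightly under the 12TB target; the slides accept this gap, presumably because the 2x headroom factor already contains margin.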
29. Let’s build a system, Azure
+ Per Data Center
+ Needed disk space: 12TB
+ Media type: NVMe drives to meet latency SLA
+ Requires 30 threads, 15 physical cores
+ Per Data center instance options
+ 3 x standard L16 v2 → Total disk: 11.5TB
and
+ 3 x standard L8 v2 for the Spark cluster
+ 1x standard L8 v2 for Scyla monitoring and Scylla manager
29
30. Let’s build a system, Google Cloud
+ Per Data Center
+ Needed disk space: 12TB
+ Media type: NVMe drives to meet latency SLA
+ Requires 30 threads, 15 physical cores
+ Per Data center instance options
+ 6 x n1-standard-16 + 5x NVMe based direct attached, 375GB drives
and
+ 3 x n1-standard-8 for the Spark cluster
+ 1 x n1-standard-8 for Scyla monitoring and Scylla manager
30
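Google Cloud attaches local NVMe in fixed 375GB units, which is why the GCP option needs more nodes with several small drives each rather than a few large-disk instances. A sketch of the arithmetic behind the configuration above:

```python
# GCP local NVMe comes in fixed 375GB units attached per node.
NODES = 6
DRIVES_PER_NODE = 5
SSD_UNIT_GB = 375

total_tb = NODES * DRIVES_PER_NODE * SSD_UNIT_GB / 1000
print(f"Total local NVMe: {total_tb:.2f} TB")   # -> 11.25 TB
```

This lands in the same ~11.5TB range accepted for the AWS and Azure options, again slightly under the 12TB target but within the headroom margin.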
31. Let’s build a system, Scylla Cloud
+ Per Data Center
+ Needed disk space: 12TB
+ Media type: NVMe drives to meet latency SLA
+ Requires 30 threads, 15 physical cores
+ Per Data center instance options
+ 3 x i3.4xlarge → Total disk: 11.5TB
and
+ 3 x i3.2xlarge for the Spark cluster
32. Let’s build a system, on premise
+ Per Data Center
+ Needed disk space: 12TB
+ Media type: NVMe drives to meet latency SLA
+ Requires 30 threads, 15 physical cores
+ Per Data center instance options
+ 3 machines with 8 physical cores each and at least 4TB of SSD direct attached drives
+ 3 machines with 4 or more physical cores for Spark
+ 1 x machine for Scylla Monitoring and Scylla Manager
+ Scylla nodes and Spark nodes: 128GB RAM
+ Network: 10GbE
35. Summary
+ Do not think only about storage!
+ Gather application and business requirements
+ Throughput and SLAs
+ Growth expectations
+ Security and compliance needs
+ Select the right infrastructure
+ Think about resiliency and high availability
Ask us questions!
36. Best Practices for Data Modeling
August 7, 2019 | 10:00 AM PT - 1:00 PM ET
How to Shrink Your
Datacenter Footprint by 50%
August 14, 2019 | 10:00 AM PT - 1:00 PM ET