Sizing a database cluster makes or breaks your application. Too small and you cannot sustain spikes in usage or recover from a node loss or an operational slowdown. Too big and your cluster costs more than it should and wastes valuable human resources.
Since different workloads have different requirements, sizing should be optimized for both throughput and latency. In many cases, however, these requirements contradict each other.
In this webinar, we explain how to reconcile these contradictory forces and build a sustainable cluster that meets both performance and resiliency requirements.
Sizing your Scylla Cluster in AWS, Azure and GCP
1. Eyal Gutkind - VP of Solutions, ScyllaDB
Sizing your Scylla Cluster
A walk through the process
2. Presenter
Eyal Gutkind
Eyal Gutkind is VP of Solutions at ScyllaDB. Prior to joining
ScyllaDB, Eyal held product management roles at Mirantis
and DataStax, and spent 12 years with Mellanox
Technologies in various engineering management and
product marketing roles. Eyal holds a BSc. degree in
Electrical and Computer Engineering from Ben Gurion
University in Israel and an MBA from Fuqua School of
Business at Duke University.
3. About ScyllaDB
+ The Real-Time Big Data Database
+ Drop-in replacement for Cassandra
+ 10X the performance & low tail latency
+ New: Scylla Cloud, DBaaS
+ Open source and enterprise editions
+ Founded by the creators of the KVM hypervisor
+ HQs: Palo Alto, CA; Herzliya, Israel
4. Agenda
+ Understand your workload
+ The machines
+ Let’s build a system
+ How it looks on different IaaS providers
+ Our sizing process
9-12. Make sure you have all requirements set
+ Business
+ Application
+ Infrastructure
+ Resiliency
+ Developer and Operator Friendliness
13. The obvious...
+ Data volume ingested per second/hour/day/year
+ Data attrition (retention) policy
+ Data format: text, binary blob
+ Required replication factor
+ What's in your storage system?
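To make the first bullet concrete, a per-second ingest rate can be converted into daily and yearly raw volume. This is a sketch with hypothetical numbers (the 1,000 records/sec rate is illustrative; the 3KB record size matches the example later in the deck):

```python
# Convert a steady ingest rate into daily/yearly raw volume (illustrative numbers).
RECORDS_PER_SEC = 1_000        # hypothetical ingest rate, not from the slides
RECORD_BYTES = 3 * 1024        # 3KB records, as in the sizing example below

bytes_per_day = RECORDS_PER_SEC * RECORD_BYTES * 86_400   # seconds per day
bytes_per_year = bytes_per_day * 365

print(f"Per day:  {bytes_per_day / 1e9:.1f} GB")    # -> 265.4 GB
print(f"Per year: {bytes_per_year / 1e12:.1f} TB")  # -> 96.9 TB
```

Numbers like these feed directly into the attrition policy and replication-factor decisions on this slide: raw ingest is only the starting point before replication and compaction headroom.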
22. Let’s build a system
Business requirements:
+ 1st year customers: 30M profiles
+ Number of records per profile: 12
+ Avg. size of each record: 3KB
+ Data types: text
22
23. Let’s build a system
Business requirements:
+ 1st year customers: 30M profiles
+ Number of records per profile: 12
+ Avg. size of each record: 3KB
+ Data types: text
Application requirements:
+ 99% write response time: 15ms
+ 99% read response time: 10ms
+ Peak throughput: 150,000 operations/sec
+ Read:Write ratio: 70:30
23
24. Let’s build a system
Business requirements:
+ 1st year customers: 30M profiles
+ Number of records per profile: 12
+ Avg. size of each record: 3KB
+ Data types: text
Application requirements:
+ 99% write response time: 15ms
+ 99% read response time: 10ms
+ Peak throughput: 150,000 operations/sec
+ Read:Write ratio: 70:30
Infrastructure:
+ AWS
+ Available instance types: all
+ Multi-DC: Oregon and N.Virgina
+ OS: CentOS 7.6
+ Replication Factor: 3
24
25. Let’s build a system
Business requirements:
+ 1st year customers: 30M profiles
+ Number of records per profile: 12
+ Avg. size of each record: 3KB
+ Data types: text
Application requirements:
+ 99% write response time: 15ms
+ 99% read response time: 10ms
+ Peak throughput: 150,000 operations/sec
+ Read:Write ratio: 70:30
Infrastructure:
+ AWS
+ Available instance types: all
+ Multi-DC: Oregon and N.Virgina
+ OS: CentOS 7.6
+ Replication Factor: 3
Auxiliary applications:
+ Spark
25
27. Let’s build a system
+ End of year Total raw data size: 2TB, Starting with ~1TB
+ Typical record size read/written by the application: 3KB
+ Data model: 20-30 tables, up to 20 columns per row, 10-50 rows per partition, mainly text
+ Latency requirements: 10ms Write, 15ms Read, for the 99%
+ Read:Write ratio: 70:30
+ Throughput: 150,000 database op/s
+ IaaS: AWS, multi-region, multi-availability-zone, N. Virginia and Oregon
+ Replication Factor: 3
+ Spark for analytics
27
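The per-datacenter figures on the following slides can be reproduced with a quick back-of-the-envelope calculation. This is a sketch: the 2x compaction/operational headroom factor is an assumption consistent with the 2TB-to-12TB jump on the slides, and the ~5,000 ops/s-per-vCPU throughput figure is likewise inferred to match the stated 30 threads, not stated in the deck:

```python
# Back-of-the-envelope ScyllaDB sizing, using the numbers from the example.
RAW_DATA_TB = 2.0          # end-of-year raw data set
REPLICATION_FACTOR = 3     # replicas per data center
HEADROOM_FACTOR = 2.0      # assumed compaction / operational headroom

disk_per_dc_tb = RAW_DATA_TB * REPLICATION_FACTOR * HEADROOM_FACTOR
print(f"Disk needed per DC: {disk_per_dc_tb:.0f} TB")   # -> 12 TB

PEAK_OPS = 150_000
OPS_PER_VCPU = 5_000       # assumed per-vCPU throughput for ~3KB records

vcpus = PEAK_OPS / OPS_PER_VCPU
print(f"Threads (vCPUs): {vcpus:.0f}, physical cores: {vcpus / 2:.0f}")
# -> 30 threads, 15 physical cores, matching the slides
```

The same two outputs (disk per DC, core count) drive every instance choice in the AWS, Azure, GCP, and on-premise options that follow.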
28. Let’s build a system, Amazon Web Services
+ Per Data Center
+ Needed disk space: 12TB
+ Media type: NVMe drives to meet latency SLA
+ Requires 30 threads, 15 physical cores
+ Per Data center instance options
+ 3 x i3.4xlarge → Total disk: 11.5TB
and
+ 3 x i3.2xlarge for the Spark cluster
+ 1x i3.2xlarge for Scyla monitoring and Scylla manager
28
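As a sanity check, the 3 x i3.4xlarge option can be verified against the requirement. This sketch assumes the published i3.4xlarge specs (16 vCPUs, two 1,900GB NVMe drives per instance):

```python
# Sanity-check the 3 x i3.4xlarge option against the per-DC requirement.
NODES = 3
NVME_GB_PER_NODE = 2 * 1900    # i3.4xlarge: two 1,900GB NVMe drives
VCPUS_PER_NODE = 16

total_disk_tb = NODES * NVME_GB_PER_NODE / 1000
total_vcpus = NODES * VCPUS_PER_NODE

print(f"Total disk:  {total_disk_tb:.1f} TB")  # -> 11.4 TB (the ~11.5TB on the slide)
print(f"Total vCPUs: {total_vcpus}")           # -> 48, above the 30 threads required
```

Note the disk total lands slightly under the 12TB target; the slides accept this gap, presumably because the 2x headroom factor already contains margin.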
29. Let’s build a system, Azure
+ Per Data Center
+ Needed disk space: 12TB
+ Media type: NVMe drives to meet latency SLA
+ Requires 30 threads, 15 physical cores
+ Per Data center instance options
+ 3 x standard L16 v2 → Total disk: 11.5TB
and
+ 3 x standard L8 v2 for the Spark cluster
+ 1x standard L8 v2 for Scyla monitoring and Scylla manager
29
30. Let’s build a system, Google Cloud
+ Per Data Center
+ Needed disk space: 12TB
+ Media type: NVMe drives to meet latency SLA
+ Requires 30 threads, 15 physical cores
+ Per Data center instance options
+ 6 x n1-standard-16 + 5x NVMe based direct attached, 375GB drives
and
+ 3 x n1-standard-8 for the Spark cluster
+ 1 x n1-standard-8 for Scyla monitoring and Scylla manager
30
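Google Cloud attaches local NVMe in fixed 375GB units, which is why the GCP option needs more nodes with several small drives each rather than a few large-disk instances. A sketch of the arithmetic behind the configuration above:

```python
# GCP local NVMe comes in fixed 375GB units attached per node.
NODES = 6
DRIVES_PER_NODE = 5
SSD_UNIT_GB = 375

total_tb = NODES * DRIVES_PER_NODE * SSD_UNIT_GB / 1000
print(f"Total local NVMe: {total_tb:.2f} TB")   # -> 11.25 TB
```

This lands in the same ~11.5TB range accepted for the AWS and Azure options, again slightly under the 12TB target but within the headroom margin.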
31. Let’s build a system, Scylla Cloud
+ Per Data Center
+ Needed disk space: 12TB
+ Media type: NVMe drives to meet latency SLA
+ Requires 30 threads, 15 physical cores
+ Per Data center instance options
+ 3 x i3.4xlarge → Total disk: 11.5TB
and
+ 3 x i3.2xlarge for the Spark cluster
32. Let’s build a system, on premise
+ Per Data Center
+ Needed disk space: 12TB
+ Media type: NVMe drives to meet latency SLA
+ Requires 30 threads, 15 physical cores
+ Per Data center instance options
+ 3 machines with 8 physical cores each and at least 4TB of SSD direct attached drives
+ 3 machines with 4 or more physical cores for Spark
+ 1 x machine for Scylla Monitoring and Scylla Manager
+ Scylla nodes and Spark nodes: 128GB RAM
+ Network: 10GbE
35. Summary
+ Do not think only about storage!
+ Gather application and business requirements
+ Throughput and SLAs
+ Growth expectations
+ Security and compliance needs
+ Select the right infrastructure
+ Think about resiliency and high availability
Ask us questions!
36. Best Practices for Data Modeling
August 7, 2019 | 10:00 AM PT - 1:00 PM ET
How to Shrink Your
Datacenter Footprint by 50%
August 14, 2019 | 10:00 AM PT - 1:00 PM ET