1. Building Data Pipelines with SMACK:
Storage Strategies for Scale & Performance
June 8, 2016
Jonathan Shook, Solution Architect, DataStax
3. 1 Essential Storage Concepts
2 Design Strategies
3 Storage Selection
4 Q & A
5. Important Terms
• Topology
• Bandwidth, Throughput, Headroom
• Latency, Minimum Latency
• Concurrency, Parallelism, Contention
6. Basic System Topology
Every modern system is
essentially a network of
components.
The language of message
delivery applies at every
level of design.
System Topology Example (high level)
7. Term: Bandwidth, Throughput, Headroom
• Bandwidth - Maximum rated transfer speed of a device
• Throughput - Measurement of achievable transfer speed
• Headroom - Safety margin above normal usage - “reserve
capacity”
8. Throughput Example: SATA3
Using a popular SSD and an online benchmark...
Bandwidth: 6Gb/s (750MB/s)
Throughput: 40MB/s-500MB/s as tested, depending on operation type
Headroom: 30%, for example. This is a design parameter.
In this case, if you can achieve 200MB/s throughput on the drive
for your operational patterns, headroom of 30% means you
should be scaling out before your metrics show 140MB/s.
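A minimal sketch of that headroom arithmetic, using the illustrative 200MB/s and 30% figures above:

```python
# Headroom math sketch using the illustrative numbers above.
measured_throughput_mb_s = 200.0   # achievable throughput for your operational pattern
headroom_fraction = 0.30           # design parameter: keep 30% in reserve

scale_out_threshold = measured_throughput_mb_s * (1 - headroom_fraction)
print(f"Scale out before sustained throughput reaches {scale_out_threshold:.0f} MB/s")
# -> Scale out before sustained throughput reaches 140 MB/s
```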
9. Term: Latency and Minimum Latency
• Latency - How long it takes to receive a response, once a
request is submitted
• Minimum Latency - Latency which is possible on a single
node when there is no resource contention
Single Node:
• However fast that node can service the request, uncontended.

Replica Set of 3 Nodes with LOCAL_QUORUM:
• Writes: The fastest 2 of 3 nodes in the replica set to respond.
• Reads: Usually the fastest 2 of 3, based on latency trends.
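As a rough illustration only (not from the deck), a LOCAL_QUORUM read could be issued with the DataStax Python driver as below; the keyspace, table, and column names are hypothetical placeholders:

```python
# Hedged sketch: a LOCAL_QUORUM read with the DataStax Python driver.
# Keyspace/table/column names are hypothetical placeholders.
from cassandra.cluster import Cluster
from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("my_keyspace")

stmt = SimpleStatement(
    "SELECT * FROM events WHERE id = %s",
    consistency_level=ConsistencyLevel.LOCAL_QUORUM,  # waits for 2 of 3 local replicas
)
row = session.execute(stmt, ("some-id",)).one()
```

The observed latency of such a call tracks the second-fastest of the three replicas, which is why the replica-set numbers above differ from the single-node case.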
10. Latency and Throughput Example:
Random reads at different block sizes
• SATA HDD has an unavoidable seek-time penalty at all op sizes. Throughput tops out at 180MB/s at 16MB read sizes, with over 1.5 seconds of latency.
• SATA SSD performs well: 550MB/s is possible, but desirable latencies are found below 1MB read sizes.
• The NVMe drive can push two CDs' worth of data per second at 128KB read sizes. At 16MB, latency is only 0.25 seconds.
11. Latency and Throughput Example:
Compared by Drive Type
This shows the same measurements compared between drive types.
12. Latency & Throughput Example:
Comparative Numbers
1 block read (512 bytes)
             KB/s        latency (µs)   IOPS
NVMe         62006       177            124013
SATA SSD     38700       306            77400
SATA HDD     215         119000         430

256 block read (128 KB)
             KB/s        latency (µs)   IOPS
NVMe         1707520     1160           13339
SATA SSD     549133      2320           4290
SATA HDD     41198       157000         321

32K block read (16 MB)
             KB/s        latency (µs)   IOPS
NVMe         1339596.8   235000         81
SATA SSD     554920      594000         33
SATA HDD     179063      1647000        10
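The three columns are related by throughput ≈ block size × IOPS. A quick sanity check of the NVMe rows above (small rounding differences against the table are expected):

```python
# Sanity check: IOPS ≈ throughput / block size, using the NVMe rows above.
def iops(throughput_kb_s, block_kb):
    return throughput_kb_s / block_kb

print(iops(62006, 0.5))        # 512-byte reads -> ~124012 IOPS
print(iops(1707520, 128))      # 128 KB reads   -> ~13340 IOPS
print(iops(1339596.8, 16384))  # 16 MB reads    -> ~82 IOPS
```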
13. Term: Concurrency, Parallelism, Contention
• Concurrency - Multiple requests in flight
• Parallelism - Simultaneous processing of requests
• Resource Contention - When work is blocked awaiting
access to a shared resource
Concurrency without parallelism causes resource contention,
queueing, latency increases, and unhappy users.
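A toy Python sketch (not from the deck) of concurrency without parallelism: twenty requests are in flight, but a single lock serializes the work, so observed latency grows with queue depth:

```python
# Toy illustration: concurrency without parallelism -> contention and queueing.
import threading, time

lock = threading.Lock()
latencies = []

def request():
    start = time.monotonic()
    with lock:            # the contended shared resource
        time.sleep(0.01)  # 10 ms of actual work
    latencies.append(time.monotonic() - start)

threads = [threading.Thread(target=request) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"fastest {min(latencies)*1000:.0f} ms, slowest {max(latencies)*1000:.0f} ms")
```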
15. Key Design Strategies
1. Design to the Workload
2. Simplify the Storage Path
3. Maintain Headroom
4. Balance Compute and I/O
5. Balance I/O Caching
16. Strategy #1: Design to the Workload
• Estimate your workloads.
Focus on the read patterns.
• Can your users endure effects
of resource contention?
• Can they endure disruptive
outliers?
• How do you know?
17. Strategy #2: Simplify the Storage Path
• Avoid unnecessary hardware layers. Go directly from your
system chipset to the drive when possible.
• Favor JBOD over storage aggregation.
• Only use RAID for:
– Datacenter or Operator Standards with HDDs.
(Try to avoid RAID with SSDs if possible.)
– Aggregating smaller disks.
(Why not just get larger drives for JBOD?)
18. Strategy #3: Maintain Headroom
• Build in headroom according to your loading patterns.
• Measure your system with bench tools (see the sketch below).
• Saturate during non-prod testing, and use that as a reference point in production.
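One way to get that reference point, sketched here with fio (the tool behind the perfscripts linked later); the target path and job parameters are assumptions to adapt to your own environment:

```python
# Hedged sketch: probe a data drive with a sustained random-read fio job and
# record the steady-state throughput as a saturation reference point.
# The filename/path and job parameters are placeholders, not recommendations.
import subprocess

subprocess.run([
    "fio", "--name=randread-probe",
    "--filename=/data/fio-testfile", "--size=4g",
    "--rw=randread", "--bs=128k", "--direct=1",
    "--ioengine=libaio", "--iodepth=32",
    "--runtime=60", "--time_based",
], check=True)
```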
19. Strategy #4: Balance Compute and I/O
• Databases are not just storage APIs.
• You need to keep your CPU and IO throughput in relative
balance.
• Perfection is not required, but extreme imbalances are no
fun.
• There will always be a bottleneck.
20. Strategy #5: Balance I/O Caching
• Understand the potential benefits of caching: best and
worst cases.
• “Unused” memory in Linux is available for caching.
• Don’t depend on cache to solve cold read latencies.
• Design around cold-read performance first.
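For cold-read testing, one common approach (an assumption here, not something from the deck) is to flush and drop the Linux page cache before each measurement; this requires root:

```python
# Hedged sketch: drop the Linux page cache so the next read is genuinely cold.
# Requires root; /proc/sys/vm/drop_caches is the standard Linux interface.
import subprocess

def drop_page_cache():
    subprocess.run(["sync"], check=True)          # flush dirty pages first
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")                            # 3 = pagecache + dentries + inodes
```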
22. SANs for distributed databases...
It's a bad idea.
Have strong skepticism when anybody tells you otherwise. Perhaps they haven't tried it yet, or are ignoring the obvious.
You don't have to suffer the pains of others in order to learn from their experiences. Still, some insist on trying.
23. HDD vs. SSD

HDD
Pro:
● Cheap?
Con:
● All concurrent operations are contended
● Random access is slow - drive seek
● Power usage
● Lower latencies come with much higher costs
● Little room for further improvement

SSD
Pro:
● Cheap? (1TB ~ $300)
● Fast
● Low internal contention
● Runs cooler / lower wattage
● Faster transport technology available
Con:
● Initial capacities available - encouraged RAID shenanigans → No longer an issue for reasonable data densities with Cassandra/DSE
● MTBF of earlier designs → No longer an issue, as SSDs have made huge strides in reliability and DWPD limits
● Initial cost → No longer an issue
25. Selecting SSD vs. HDD
Favor modern SSDs by default.
Use HDDs only if you must for:
● High-write applications with low read concurrency
● Archival or Logging systems with low read concurrency
● Commit log storage, if you have the option
● Persistent messaging systems
● Non-latency sensitive batch/analytics workloads
26. Storage Path
A) Direct SSD
B) Direct HDD
C) NVMe
D) SSDs via HBA
E) HDDs via HBA
F) Combo via HBA
We’ll come back to this
slide if we have time.
27. Data Density
• Keep data density in reasonable bounds.
• Every database must deal with the realities of storage traversal.
• Avoid trying to store too much data on a node.
28. In Conclusion...
• Provision with headroom to avoid unnecessary contention.
• Select hardware to support user and workload requirements.
• Keep the storage path as simple as possible.
• Consider SSDs by default for your data directories.
29. Coming Soon!
● June 23: Top 5 Reasons Why DSE is Game Changing
● July 7: Proofpoint & DataStax Webinar
● For the latest schedule of webinars, check out our Webinars
page: http://www.datastax.com/resources/webinars
30. Get your SMACK on!
Thank You!
Follow me on Twitter: @Shookinator
32. Q & A
35. Math relating to Scale & Performance
Little's Law
Relates latency, concurrency, and throughput as averages (a short sketch follows below).
Amdahl's Law
Relates latency to improvements in working resources.
Pigeonhole principle
Statistics of the pigeonhole principle come up again and again in distributed computing.
Latency numbers every programmer should know.
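A minimal Little's Law sketch, useful for sizing in-flight request limits; the numbers are illustrative only:

```python
# Little's Law: average concurrency = throughput x average latency.
throughput_ops_s = 50_000    # target operations per second
avg_latency_s = 0.002        # 2 ms average response time

required_concurrency = throughput_ops_s * avg_latency_s
print(required_concurrency)  # -> 100.0 requests in flight, on average
```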
36. Online Resources
C* Microbench scripts
Fio scripts to measure a disk subsystem across many C*-style workloads.
https://github.com/jshook/perfscripts
Al’s Tuning Guide: https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html
38. Addendum: What about RAID?
See IBM Patent 4092732 about a 1978 solution to a 1978
problem: drives were very unreliable, and systems were not
resilient to failure. In 1978, parallelism was pronounced
“mainframe”. Times have changed.
System topologies of today expose storage parallelism all
the way to the drive. Cassandra allows drive failure without
cluster failure. Cassandra can make direct use of the
parallelism exposed at the storage layer.