Learn about the various approaches to sharding your data with MongoDB. This presentation will help you answer questions such as when to shard and how to choose a shard key.
Everything You Need to Know About Sharding
1. Everything You Need to Know About Sharding
Dylan Tong
dylan.tong@mongodb.com
Senior Solutions Architect
2. 2
Agenda
Overview
• What is sharding?
• Why and what should I use sharding for?
Building your First Sharded Cluster
• What do I need to know to succeed with sharding?
Q&A
3. 3
What is Sharding?
Sharding is a means of partitioning data across servers to enable:
• Scale: support the massive workloads and data volumes of modern applications.
• Geo-Locality: support geographically distributed deployments and an optimal UX for customers across vast geographies.
• Hardware Optimizations: balance performance vs. cost.
• Lower Recovery Times: make "Recovery Time Objectives" (RTO) feasible.
4. 4
What is Sharding?
Sharding involves a shard key defined by a data modeler
that describes the partition space of a data set.
Data is partitioned into data chunks by the shard key, and
these chunks are distributed evenly across shards that
reside across many physical servers.
MongoDB provides three sharding strategies:
• Ranged
• Hashed
• Tag-aware
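As a rough mongo-shell sketch (the database and collection names here are hypothetical), each strategy is selected when the collection is sharded:

```javascript
// Run against a mongos. Enable sharding for the database first.
sh.enableSharding("mydb")

// Ranged: chunks are contiguous ranges of the key's values.
sh.shardCollection("mydb.devices", { deviceId: 1 })

// Hashed: chunks are ranges of the hashed key values.
sh.shardCollection("mydb.readings", { deviceId: "hashed" })

// Tag-aware: ranged or hashed sharding plus tag ranges that pin
// sub-ranges of the key to tagged shards (covered in later slides).
sh.addShardTag("shard0000", "West")
```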
6. 6
Hash Sharding
Hash sharding is a special case of range sharding. MongoDB applies an
MD5 hash to the key when a hashed shard key is used:
Hash Shard Key(deviceId) = MD5(deviceId)
This ensures data is distributed pseudo-randomly across the range of MD5 values.
(Diagram: chunks split at hashed boundaries, e.g. …3333 | …3334…8000 | …8001…AAAA | …AAAB…DDDD | …DDDF.)
7. 7
Tag-aware Sharding
Tag-aware sharding allows a subset of shards to be tagged and assigned
to a sub-range of the shard key.
Example: sharding user data belonging to users from 100 "regions"
Collection: Users, Shard Key: {uId, regionCode}
Tag  | Start          | End
West | MinKey, MinKey | MaxKey, 50
East | MinKey, 50     | MaxKey, MaxKey
(Diagram: regions 1-50 are assigned to the West tag and regions 51-100 to the East tag. Shard1 and Shard2 carry Tag=West; Shard3 and Shard4 carry Tag=East. Each shard is a replica set with secondaries.)
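A sketch of the shell commands behind the example (shard names are hypothetical; since tag ranges must be contiguous, non-overlapping intervals of the full shard key, this sketch assumes regionCode leads the key, i.e. {regionCode, uId}):

```javascript
// Tag shards by region (hypothetical shard names).
sh.addShardTag("shard0001", "West")
sh.addShardTag("shard0002", "East")

// Regions 1-50 -> West, regions 51-100 -> East
// (upper bounds are exclusive).
sh.addTagRange("mydb.Users",
  { regionCode: MinKey, uId: MinKey },
  { regionCode: 51, uId: MinKey },
  "West")
sh.addTagRange("mydb.Users",
  { regionCode: 51, uId: MinKey },
  { regionCode: MaxKey, uId: MaxKey },
  "East")
```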
8. 8
Applying Sharding
Usage                 | Required Strategy
Scale                 | Range or Hash
Geo-Locality          | Tag-aware
Hardware Optimization | Tag-aware
Lower Recovery Times  | Range or Hash
9. 9
Sharding for Scale
Performance Scale: Throughput and Latency
Data Scale: Cardinality, Data Volume
10. 10
Typical Small Deployment
Highly available, but not scalable.
(Diagram: a single replica set receiving writes and reads.)
• Writes: limited by the capacity of the Primary's host.
• Reads, when immediate consistency matters: limited by the capacity of the Primary's host.
• Reads, when eventual consistency is acceptable: limited by the capacity of the available replica set members.
11. 11
Sharded Architecture
• Auto-balancing: data is partitioned based on a shard key, and automatically balanced across shards by MongoDB.
• Query Routing: database operations are transparently routed across the cluster through a routing proxy process (software).
• Horizontal Scalability: load is distributed and resources are pooled across commodity servers, increasing read/write capacity.
• Decoupling for Development and Operational Simplicity: Ops can add capacity without app dependencies and Dev involvement.
12. 13
Value of Scale-out Architecture
(Chart: system capacity vs. $. Apps of the past hit scale-up limits; the modern-app trend is to scale out to 100s-1000s of servers, optimizing capacity/$ and allowing scale-down to re-allocate resources.)
13. 14
Sharding for Geo-Locality
Adobe Cloud Services, among other popular consumer and enterprise
services, uses sharding to run servers across multiple data centers
spanning geographies.
Network latency from West to East is ~80ms
● Amazon - Every 1/10 second delay resulted in 1% loss of
sales.
● Google - Half a second delay caused a 20% drop in traffic.
● Aberdeen Group - 1-second delay in page-load time
o 11% fewer page views
o 16% decrease in customer satisfaction
o 7% loss in conversions
14. 15
Multi-Active DCs via Tag-aware
Sharding
Tag  | Start          | End
West | MinKey, MinKey | MaxKey, 50
East | MinKey, 50     | MaxKey, MaxKey
(Diagram: two data centers. Shards tagged West have replica-set member priorities favoring the WEST DC (Priority=10 in the West, 5 in the East); shards tagged East favor the EAST DC. Updates route to each shard's local primary, and local reads are served via Read Preference = Nearest.)
Collection: Users, Shard Key: {uId, regionCode}
15. 17
Optimizing Latency and Cost
Magnitudes of Difference in Speed
Event      | Latency | Normalized to 1 s
RAM access | 120 ns  | 6 min
SSD access | 150 μs  | 6 days
HDD access | 10 ms   | 12 months

Magnitudes of Difference in Cost
Storage Type | Avg. Cost ($/GB) | Cost at 100 TB ($)
RAM          | 5.50             | 550K
SSD          | 0.50-1.00        | 50K to 100K
HDD          | 0.03             | 3K
16. 18
Optimizing Latency and Cost
Use Case: Sensor data collected from millions of devices. Data used for
real-time decision automation, real-time monitoring and historical
reporting.
Data Type               | Description                                        | Latency SLA             | Data Volume
Meta Data               | Fast look-ups to drive real-time decisions         | 95th percentile < 1 ms  | < 1 TB
Last 90 days of Metrics | 95+% of data reported and monitored                | 95th percentile < 30 ms | < 10 TB
Historic                | Used for historic reporting; accessed infrequently | 95th percentile < 2 s   | > 100 TB
17. 19
Hardware Optimizations
Tag ranges:
Collection | Tag     | Start                | End
Meta       | Cache   | MinKey               | MaxKey
Metrics    | Flash   | MinKey, MinKey       | MaxKey, 90 days ago
Metrics    | Archive | MinKey, >90 days ago | MaxKey, MaxKey

Shard keys:
Collection | Shard Key
Meta       | DeviceId
Metrics    | DeviceId, Timestamp

(Diagram: Tag: Cache shards run on servers with a high memory ratio and fast cores; Tag: Flash shards on servers with a medium memory ratio, high compute and SSDs; Tag: Archive shards on servers with a low memory ratio, medium compute and HDDs. Each shard is a replica set with secondaries.)
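As a sketch, the tag assignments might be issued like this (shard and collection names are hypothetical; note that because DeviceId leads the Metrics shard key, the timestamp boundary is conceptual here, and a rolling "90 days ago" cutoff must be refreshed periodically, e.g. by a scheduled script):

```javascript
// Tier shards by hardware profile (hypothetical shard names).
sh.addShardTag("shardMem", "Cache")   // high memory ratio, fast cores
sh.addShardTag("shardSSD", "Flash")   // high compute, SSDs
sh.addShardTag("shardHDD", "Archive") // low memory ratio, HDDs

// The Meta collection lives entirely on the Cache tier.
sh.addTagRange("iot.meta", { DeviceId: MinKey }, { DeviceId: MaxKey }, "Cache")

// Metrics: a rolling 90-day boundary, recomputed by a scheduled job.
var cutoff = new Date(Date.now() - 90 * 24 * 3600 * 1000)
sh.addTagRange("iot.metrics",
  { DeviceId: MinKey, Timestamp: MinKey },
  { DeviceId: MaxKey, Timestamp: cutoff },
  "Archive")
```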
18. 20
Restoration Times
Scenario: an application bug causes logical corruption of the data, and the database
needs to be rolled back to a previous point in time (PIT). What RTO does your business
require in this event?
Total DB Snapshot Size = 100TB
• N = 10: 10 x 10TB snapshots generated and/or transferred in parallel.
• N = 100: 100 x 1TB snapshots generated and/or transferred in parallel; potentially 10X faster restoration time.
(Diagram: per-shard tar.gz snapshot archives.)
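The intuition is simple back-of-envelope math; a sketch (the throughput number is illustrative):

```javascript
// Estimate wall-clock restore time when per-shard snapshots are
// restored in parallel: time ~ (total size / shard count) / per-node throughput.
function restoreHours(totalTB, shards, tbPerHourPerNode) {
  return totalTB / shards / tbPerHourPerNode;
}

// 100 TB at an assumed 1 TB/hour per node:
console.log(restoreHours(100, 10, 1));  // 10 shards  -> 10 hours
console.log(restoreHours(100, 100, 1)); // 100 shards -> 1 hour (10X faster)
```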
20. Predictive Maintenance Platform: a cloud platform for building
predictive maintenance applications and services—such as a service
that monitors various vehicle components by collecting data from
sensors, and automatically prescribes actions to take.
•Allow tenants to register, ingest and modify data collected by sensors
•Define and apply workflows and business rules
•Publish and subscribe to notifications
•Data access API for things like reporting
22
Life-cycle of Sharding
Product Definition: Starts with an idea to build something big!
Design &
Development Test/QA Pre-Production Production
21. 23
Life-cycle of Sharding
Data Modeling: Do I need to Shard?
Throughput: data from millions of sensors updated in real-time
Latency: The values of certain attributes need to be accessed
with 95th percentile < 10ms to support real-time decisions
and automation.
Volume: 1TB of data collected per day. Retained for 5
years.
Design &
Development Test/QA Pre-Production Production
22. 26
Life-cycle of Sharding
Data Modeling: Select a Good Shard Key
Critical Step
•Sharding is only as effective as the shard key.
•Shard key attributes are immutable.
•Re-sharding is non-trivial: it requires re-partitioning the
data.
Design &
Development Test/QA Pre-Production Production
23. 27
Good Shard Key
Cardinality
Write Distribution
Query Isolation
Reliability
Index Locality
31. 35
Index Locality
Random Access Index (Key = Hash(DeviceId)): MD5 hashing scatters inserts across the whole range (… Device N), so most of the index tends to be in the working set.
Right-Balanced Index (Key = DeviceId, Timestamp): recent entries (-2 day, -1 day, -0 day) cluster at the right edge, so a right-balanced index may only need to be partially in RAM to be effective.
32. 36
Good Shard Key
Key = DeviceId, Timestamp (DeviceId: random; Timestamp: sequential)
(Scorecard: the key evaluated against Cardinality, Write Distribution, Query Isolation, Reliability and Index Locality.)
33. 37
Life-cycle of Sharding
Performance Testing: Avoid Pitfalls
Splits happen on demand as chunks grow; migrations happen when an
imbalance is detected. Sharding a collection without pre-splitting
results in massive performance degradation!
Best Practices:
•Pre-split:
1. Hash shard key: specify numInitialChunks:
http://docs.mongodb.org/manual/reference/command/shardCollection/
2. Custom shard key: create a pre-split script on demand:
http://docs.mongodb.org/manual/tutorial/create-chunks-in-sharded-cluster/
• Run mongos (query router) on the app server if possible.
Design &
Development Test/QA Pre-Production Production
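A sketch of the two pre-split approaches (collection names and the chunk/split counts are hypothetical; numInitialChunks applies only to hashed keys):

```javascript
// 1. Hashed shard key: have MongoDB create and distribute
//    empty chunks up front via numInitialChunks.
db.adminCommand({
  shardCollection: "iot.metrics",
  key: { deviceId: "hashed" },
  numInitialChunks: 8192
})

// 2. Custom (ranged) shard key: split chunks yourself before
//    loading, one split point per known key-range boundary.
for (var prefix = 0; prefix < 100; prefix++) {
  sh.splitAt("iot.devices", { deviceId: "device-" + prefix })
}
```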
34. 38
Life-cycle of Sharding
Capacity Planning: How many shards do I need?
Sizing:
•What are the total resources required for your initial
deployment?
•What are the ideal hardware specs, and the # shards
necessary?
Capacity Planning: create a model to scale MongoDB for a
specific app.
•How do I determine when more shards need to be added?
•How much capacity do I gain from adding a shard?
Design &
Development Test/QA Pre-Production Production
35. 39
How Many Servers?
Strategy                 | Accuracy                                                      | Level of Effort | Feasibility of Early Project Analysis
Domain Expert            | High to Low: inversely related to complexity of the application | Low           | Yes
Empirical (Load Testing) | High                                                          | High            | Unlikely
36. 41
Domain Expert
Normally performed by a MongoDB Solutions Architect:
http://bit.ly/1rkXcfN
• What is the document model? Collections, documents, indexes
• What are the major operations?
- Throughput
- Latency
• What is the working set? E.g. the last 90 days of orders
Business Analysis → Solution Model and Load Definition → Resource Analysis → Hardware Specification
37. 42
Domain Expert
Resource | Methodology
RAM | Standard: Working Set + Indexes. Adjust more or less depending on latency vs. cost requirements. Very large clusters should account for connection pooling/thread overhead (1MB per active thread).
IOPs | Primarily based on throughput requirements: writes + an estimation of query page faults. Assume random IO. Account for replication, journal and log (note: sequential IO), and for overhead like fragmentation if applicable. Ideally estimated empirically through prototype testing; experts can use experience from similar applications as an estimate. Spot testing may be needed.
Storage | Estimate using throughput, document and index size approximations, and retention.
CPU | Rarely the bottleneck; a lot less CPU intensive than RDBMSs. Using current commodity CPU specs will suffice.
Network | Estimate using throughput and document size approximations.
Business Analysis → Solution Model and Load Definition → Resource Analysis → Hardware Specification
38. 44
Sizing by Empirical Testing
• Sizing can be more accurately obtained by prototyping your application, and
performing load tests on selected hardware.
• Capacity Planning can be simultaneously accomplished through load testing.
• Past Webinars: http://www.mongodb.com/presentations/webinar-capacity-planning
Strategy:
1. Implement a prototype that can at least simulate major workloads
2. Select an economical server that you plan to scale-out on.
3. Saturate a single replicaSet or shard (maintaining latency SLA as needed). Address
bottlenecks, optimize and repeat.
4. Add an additional shard (as well as mongos and clients as needed). Saturate and
confirm roughly linear scaling.
5. Repeat step 4 until you are able to model capacity gains (throughput + latency)
versus #physical servers.
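Step 3's "saturate, measure, repeat" loop can be sketched generically (a toy harness, not a real load tester; the `op` callback is a stand-in for your driver calls):

```javascript
// Measure sustained throughput (ops/sec) and mean latency of an
// operation over a fixed number of iterations.
function benchSync(op, iterations) {
  const start = process.hrtime.bigint();
  for (let i = 0; i < iterations; i++) op(i);
  const elapsedSec = Number(process.hrtime.bigint() - start) / 1e9;
  return {
    opsPerSec: iterations / elapsedSec,
    meanLatencyMs: (elapsedSec / iterations) * 1000,
  };
}

// Example: benchmark a stand-in CPU-bound operation.
const result = benchSync((i) => JSON.stringify({ deviceId: i, ts: Date.now() }), 10000);
console.log(result.opsPerSec, result.meanLatencyMs);
```

Running the same harness before and after adding a shard is what lets you confirm roughly linear scaling in step 4.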
39. 45
Operational Scale
Business Critical Operations: How do I manage 100s to 1000s of nodes?
MongoDB Management Services (MMS): https://mms.mongodb.com
• Real-time monitoring and visualization of cluster health
• Alerting
• Automated cluster provisioning
• Automation of daily operational tasks like no-downtime upgrades
• Centralized configuration management
• Automated PIT snapshotting of clusters
• PITR support for sharded clusters
Design &
Development Test/QA Pre-Production Production
40. 46
MMS Automation
(Diagram: an MMS Agent runs alongside server resources anywhere; it communicates with MMS, which runs on-prem or as SaaS.)
42. Get Expert Advice on Scaling. For Free.
For a limited time, if you're considering a commercial relationship with MongoDB, you can sign up for a free one-hour consult about scaling with one of our MongoDB Engineers.
Sign Up: http://bit.ly/1rkXcfN
EA Sports FIFA: the world's best-selling sports video game franchise. MongoDB stores user data and game state for millions of players.
Yandex: The largest search engine in Russia uses MongoDB to manage all user and metadata for its file sharing service. MongoDB has scaled to support tens of billions of objects and TBs of data, growing at 10 million new file uploads per day.
eBay
FourSquare: Foursquare is used by over 50 million people worldwide, who have checked in over 6 billion times, with millions more added every day. MongoDB is Foursquare’s main database, supporting hundreds of thousands of operations per second and storing all check-ins and history, user and venue data along with reviews.
AHL, a part of Man Group plc, is a quantitative investment manager based in London and Hong Kong, with over $11.3 billion in assets under management. After evaluating multiple technology options, AHL used MongoDB to replace its relational and specialised "tick" databases. MongoDB supports 250 million ticks per second, at 40x lower cost than the legacy technologies it replaced.
Adobe: Many of the world's most recognizable brands use Adobe Experience Manager to accelerate development of digital experiences that increase customer loyalty, engagement and demand. Adobe uses MongoDB to store petabytes of data in the large-scale content repositories underpinning Experience Manager.
MongoDB MMS Back-up: 2PB
Mcafee: MongoDB powers McAfee Global Threat Intelligence (GTI), a cloud-based intelligence service that correlates data from millions of sensors around the globe. Billions of documents are stored and analyzed in MongoDB to deliver real-time threat intelligence to other McAfee end-client products.
Carfax: CARFAX relies on its Vehicle History database to connect potential buyers with used vehicles in their area, and for analytics to guide the business. To improve customer experience, CARFAX migrated to MongoDB which now manages over 13 billion documents, before replication across multiple data centers.
Limited capacity: monolithic architecture, siloed data access, or complex application-level sharding that is difficult and expensive to implement successfully.
Scale down: systems with seasonal load (e.g. online games that are popular at launch and fade over time). Sharding enables scaling down to re-allocate resources on bare-metal servers; you can't do the same with a database appliance.
Hash Key isn’t always a good key. Can be good for the most most simplistic k-v access patterns, but any query retrieving multiple documents like range based queries can potentially lead to scatter-gather queries, which have high overhead and impair the ability to scale linearly.