Everything You Need to Know About Sharding
Dylan Tong 
dylan.tong@mongodb.com 
Senior Solutions Architect
2 
Agenda 
Overview 
• What is sharding? 
• Why and what should I use sharding for? 
Building your First Sharded Cluster 
• What do I need to know to succeed with sharding? 
Q&A
3 
What is Sharding? 
Sharding is a means of partitioning data across servers to enable:
• Scale needed by modern applications to support massive workloads and data volume.
• Geo-Locality to support geographically distributed deployments that deliver optimal UX for customers across vast geographies.
• Hardware Optimizations on Performance vs. Cost
• Lower Recovery Times to make “Recovery Time Objectives” (RTO) feasible.
4 
What is Sharding? 
Sharding involves a shard key defined by a data modeler 
that describes the partition space of a data set. 
Data is partitioned into data chunks by the shard key, and 
these chunks are distributed evenly across shards that 
reside across many physical servers. 
MongoDB provides 3 Types of Sharding Strategies: 
• Ranged 
• Hashed 
• Tag-aware
5 
Range Sharding 
Shard Key: {deviceId} 
…1000 1001……2000 2001……3000 3001……4000 4001… 
Composite Keys Supported: {deviceId, timestamp} 
…1000,1418244824 …1000,1418244825
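A minimal sketch of how a range-sharded key maps to a chunk, assuming hypothetical split points that mirror the ranges on this slide. The function and constant names are illustrative, not part of MongoDB. Composite keys are modeled as tuples, which compare field by field.

```python
from bisect import bisect_right

# Hypothetical split points on deviceId. Chunk i covers [lower, upper),
# so keys up to 1000 land in chunk 0, 1001..2000 in chunk 1, and so on.
DEVICE_SPLITS = [1001, 2001, 3001, 4001]

# Hypothetical split point on the composite key (deviceId, timestamp).
COMPOSITE_SPLITS = [(1000, 1418244825)]


def chunk_index(split_points, key):
    """Return the index of the chunk whose key range contains `key`."""
    return bisect_right(split_points, key)


print(chunk_index(DEVICE_SPLITS, 1000))
print(chunk_index(DEVICE_SPLITS, 2500))
print(chunk_index(COMPOSITE_SPLITS, (1000, 1418244824)))
print(chunk_index(COMPOSITE_SPLITS, (1000, 1418244825)))
```

Because adjacent key values fall into the same chunk, range queries on the shard key can usually be served by a small number of chunks.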
6 
Hash Sharding 
Hash Sharding is a subset of Range Sharding. 
MongoDB applies an MD5 hash on the key when a hashed shard key is used:
Hash Shard Key(deviceId) = MD5(deviceId) 
Ensures data is distributed randomly within the range of MD5 values 
…3333 …3334…8000 …8001…AAAA …AAAB…DDDD …DDDF
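A rough illustration of the idea, assuming the key is hashed as a string and the first 64 bits of the MD5 digest are used. This is not MongoDB's exact internal computation. It only shows that sequential deviceIds end up spread across the hash range.

```python
import hashlib

HASH_BITS = 64  # assumption: keep the first 64 bits of the MD5 digest


def hashed_shard_key(device_id):
    """Map a deviceId to a pseudo-random position in the hash range."""
    digest = hashlib.md5(str(device_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def bucket_counts(device_ids, num_buckets):
    """Count keys per equal-width slice of the hash range."""
    counts = [0] * num_buckets
    for device_id in device_ids:
        bucket = (hashed_shard_key(device_id) * num_buckets) >> HASH_BITS
        counts[bucket] += 1
    return counts


# 10,000 sequential deviceIds split over 4 equal hash ranges.
counts = bucket_counts(range(1, 10001), 4)
print(counts)
```

Each of the four ranges should receive roughly a quarter of the keys, even though the input ids are strictly sequential.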
7 
Tag-aware Sharding 
Tag-aware sharding allows a subset of shards to be tagged and assigned
to a sub-range of the shard key.
Example: Sharding user data belonging to users from 100 “regions”
Collection: Users, Shard Key: {uId, regionCode} 
Tag | Start | End
West | MinKey, MinKey | MaxKey, 50
East | MinKey, 50 | MaxKey, MaxKey
[Diagram: four shards, each a replica set with two secondaries. Shard1 and Shard2 are tagged West; Shard3 and Shard4 are tagged East.]
Assign Regions 1-50 to the West
Assign Regions 51-100 to the East
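The region-to-tag assignment above can be sketched as a simple lookup. The data structures and function names here are illustrative assumptions, not MongoDB APIs. In a real cluster the balancer enforces the tag ranges.

```python
# Region ranges (inclusive) per tag, following the slide: 1-50 West, 51-100 East.
TAG_RANGES = [("West", 1, 50), ("East", 51, 100)]

# Shard-to-tag assignment, following the diagram.
SHARD_TAGS = {
    "Shard1": "West",
    "Shard2": "West",
    "Shard3": "East",
    "Shard4": "East",
}


def tag_for_region(region_code):
    """Return the tag whose region range contains region_code."""
    for tag, low, high in TAG_RANGES:
        if low <= region_code <= high:
            return tag
    raise ValueError(f"region {region_code} is not covered by any tag range")


def shards_for_region(region_code):
    """Return the shards allowed to hold data for this region."""
    tag = tag_for_region(region_code)
    return sorted(name for name, t in SHARD_TAGS.items() if t == tag)


print(shards_for_region(12))
print(shards_for_region(75))
```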
8 
Applying Sharding 
Usage | Required Strategy
Scale | Range or Hash
Geo-Locality | Tag-aware
Hardware Optimization | Tag-aware
Lower Recovery Times | Range or Hash
9 
Sharding for Scale 
Performance Scale: Throughput and Latency 
Data Scale: Cardinality, Data Volume
10 
Typical Small Deployment 
Highly Available 
but not Scalable 
Replica Set 
Writes: Limited by capacity of the Primary’s host
Reads:
• When Immediate Consistency Matters: Limited by capacity of the Primary’s host
• When Eventual Consistency is Acceptable: Limited by capacity of available replica set members
11 
Sharded Architecture 
• Auto-balancing: data is partitioned based on a shard key, and automatically balanced across shards by MongoDB.
• Query Routing: database operations are transparently routed across the cluster through a routing proxy process (software).
• Horizontal Scalability: load is distributed and resources are pooled across commodity servers.
Increasing read/write capacity
• Decoupling for Development and Operational Simplicity: Ops can add capacity without app dependencies and Dev involvement.
13 
Value of Scale-out Architecture 
[Chart: System Capacity vs. $. Apps of the Past: scale-up, bounded by Scale-up Limits. Modern Apps Trend: scale-out to 100-1000s of servers, optimize on Capacity/$ (High Capacity/$), scale down and re-allocate resources.]
14 
Sharding for Geo-Locality 
Adobe Cloud Services among other popular consumer and 
Enterprise services use sharding to run servers across multiple data 
centers across geographies. 
Network latency from West to East is ~80ms 
● Amazon - Every 1/10 second delay resulted in 1% loss of 
sales. 
● Google - Half a second delay caused a 20% drop in traffic. 
● Aberdeen Group - 1-second delay in page-load time 
o 11% fewer page views 
o 16% decrease in customer satisfaction 
o 7% loss in conversions
15 
Multi-Active DCs via Tag-aware 
Sharding 
Collection: Users, Shard Key: {uId, regionCode}
Tag | Start | End
West | MinKey, MinKey | MaxKey, 50
East | MinKey, 50 | MaxKey, MaxKey
[Diagram: WEST and EAST data centers; Shard 1, Shard 2 and Shard 3, each tagged West or East, with secondaries in both data centers and member priorities of 10 and 5; Query and Update paths; Local Reads (e.g. Read Preference = Nearest).]
17 
Optimizing Latency and Cost 
Magnitudes of Difference in Speed 
Event | Latency | Normalized to 1 s
RAM access | 120 ns | 6 min
SSD access | 150 μs | 6 days
HDD access | 10 ms | 12 months
Magnitudes of Difference in Cost
Storage Type | Avg. Cost ($/GB) | Cost at 100TB ($)
RAM | 5.50 | 550K
SSD | 0.50-1.00 | 50K to 100K
HDD | 0.03 | 3K
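The cost column can be reproduced with a few lines of arithmetic. This sketch assumes decimal units (1 TB = 1000 GB), which is what the slide's figures imply. The function name is illustrative.

```python
GB_PER_TB = 1000  # assumption: decimal units, matching 100TB of RAM -> 550K

# (low, high) average cost in $/GB, taken from the table above.
COST_PER_GB = {
    "RAM": (5.50, 5.50),
    "SSD": (0.50, 1.00),
    "HDD": (0.03, 0.03),
}


def cost_range(storage_type, terabytes):
    """Return the (low, high) cost in dollars for the given capacity."""
    low, high = COST_PER_GB[storage_type]
    gigabytes = terabytes * GB_PER_TB
    return (round(low * gigabytes, 2), round(high * gigabytes, 2))


for storage_type in COST_PER_GB:
    print(storage_type, cost_range(storage_type, 100))
```

The roughly 180x gap between RAM and HDD at 100TB is what makes placing data on different hardware tiers attractive.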
18 
Optimizing Latency and Cost 
Use Case: Sensor data collected from millions of devices. Data used for 
real-time decision automation, real-time monitoring and historical 
reporting. 
Data Type | Description | Latency SLA | Data Volume
Meta Data | Fast look-ups to drive real-time decisions | 95th Percentile < 1ms | < 1 TB
Last 90 days of Metrics | 95+% of data reported and monitored | 95th Percentile < 30ms | < 10 TB
Historic | Used for historic reporting. Accessed infrequently | 95th Percentile < 2s | > 100TB
19 
Hardware Optimizations 
Collection | Shard Key
Meta | DeviceId
Metrics | DeviceId, Timestamp
Collections | Tag | Start | End
Meta | Cache | MinKey | MaxKey
Metrics | Flash | MinKey, MinKey | MaxKey, 90 days ago
Metrics | Archive | MinKey, >90 days ago | MaxKey, MaxKey
Tag | Hardware Profile
Cache | High Memory Ratio, Fast Cores, HDD
Flash | Medium Memory Ratio, High Compute, SSDs
Archive | Low Memory Ratio, Medium Compute, HDD
[Diagram: the shards under each tag are replica sets with secondaries.]
20 
Restoration Times 
Scenario: An application bug causes logical corruption of the data, and the database
needs to be rolled back to a previous point in time (PIT). What RTO does your business
require in this event?
Total DB Snapshot Size = 100TB
N = 10: 10 x 10TB snapshots generated and/or transferred in parallel.
N = 100: 100 x 1TB snapshots generated and/or transferred in parallel. Potentially 10X faster restoration time.
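A back-of-the-envelope model of the comparison above. It assumes every shard restores its own snapshot in parallel at the same sustained throughput, with no shared bottleneck. The 200 MB/s figure is a hypothetical placeholder, not a measured value.

```python
def parallel_restore_hours(total_tb, shards, mb_per_sec_per_shard):
    """Estimate restore time when each shard restores its snapshot in parallel."""
    bytes_per_shard = total_tb * 1e12 / shards
    seconds = bytes_per_shard / (mb_per_sec_per_shard * 1e6)
    return seconds / 3600


THROUGHPUT_MB_S = 200  # hypothetical per-shard restore throughput

hours_10 = parallel_restore_hours(100, 10, THROUGHPUT_MB_S)
hours_100 = parallel_restore_hours(100, 100, THROUGHPUT_MB_S)
print(f"N=10:  {hours_10:.1f} h")
print(f"N=100: {hours_100:.1f} h")
print(f"speed-up: {hours_10 / hours_100:.1f}x")
```

The 10X figure only holds if the network and snapshot storage can sustain all streams at once, which is why the slide says "potentially".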
Building Your First Sharded Cluster
Predictive Maintenance Platform: a cloud platform for building
predictive maintenance applications and services, such as a service
that monitors various vehicle components by collecting data from
sensors, and automatically prescribes actions to take.
• Allow tenants to register, ingest and modify data collected by sensors
• Define and apply workflows and business rules
• Publish and subscribe to notifications
• Data access API for things like reporting
22 
Life-cycle of Sharding 
Product Definition: Starts with an idea to build something big! 
Design & 
Development Test/QA Pre-Production Production
23 
Life-cycle of Sharding 
Data Modeling: Do I need to Shard? 
Throughput: data from millions of sensors updated in real-time 
Latency: The values of certain attributes need to be accessed
with 95th percentile < 10ms to support real-time decisions
and automation.
Volume: 1TB of data collected per day. Retained for 5 
years. 
26 
Life-cycle of Sharding 
Data Modeling: Select a Good Shard Key 
Critical Step 
• Sharding is only as effective as the shard key.
• Shard key attributes are immutable.
• Re-sharding is non-trivial. It requires re-partitioning the data.
27 
Good Shard Key 
Cardinality 
Write Distribution 
Query Isolation 
Reliability 
Index Locality
28 
Cardinality 
Key = Data Center
29 
Cardinality 
Key = Timestamp
30 
Write Distribution 
Key = Timestamp
31 
Write Distribution 
Key = Hash(Timestamp)
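A small simulation of the two write-distribution slides, under simplifying assumptions: four shards, fixed chunk boundaries derived from older data, and the hash range cut into four equal slices. The helper names and split points are hypothetical.

```python
import hashlib
from bisect import bisect_right
from collections import Counter

NUM_SHARDS = 4

# Hypothetical chunk boundaries created from timestamps seen in the past.
TIMESTAMP_SPLITS = [1418244000, 1418244300, 1418244600]


def range_shard(timestamp):
    """Range sharding on Timestamp: new, larger values fall in the last chunk."""
    return bisect_right(TIMESTAMP_SPLITS, timestamp)


def hashed_shard(timestamp):
    """Hashed sharding: the hash range is cut into NUM_SHARDS equal slices."""
    digest = hashlib.md5(str(timestamp).encode("utf-8")).digest()
    hashed = int.from_bytes(digest[:8], "big")
    return (hashed * NUM_SHARDS) >> 64


# 1,000 inserts with monotonically increasing timestamps.
new_writes = [1418244824 + i for i in range(1000)]
range_counts = Counter(range_shard(ts) for ts in new_writes)
hashed_counts = Counter(hashed_shard(ts) for ts in new_writes)

print("range :", dict(range_counts))
print("hashed:", dict(hashed_counts))
```

With Key = Timestamp every new write lands on the shard holding the highest range, while Key = Hash(Timestamp) spreads the same writes over all shards.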
32 
Query Isolation 
Key = Hash(Timestamp) 
“Scatter-gather Query”
33 
Query Isolation 
Key = Hash(DeviceId) 
*Assumes bulk of queries on collection are in 
context of a single deviceId
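A simplified sketch of the routing decision behind query isolation, assuming a shard key of Hash(DeviceId) and four shards. The routing function is an illustration of the concept, not the mongos implementation.

```python
import hashlib

NUM_SHARDS = 4
SHARD_KEY_FIELD = "deviceId"


def shard_for_device(device_id):
    """Pick the shard that owns the hash of this deviceId (illustrative)."""
    digest = hashlib.md5(str(device_id).encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") * NUM_SHARDS) >> 64


def target_shards(query):
    """Return the shards a router would have to contact for this query."""
    if SHARD_KEY_FIELD in query:
        # Equality match on the shard key: a single shard can answer.
        return [shard_for_device(query[SHARD_KEY_FIELD])]
    # No shard key in the query: scatter-gather across every shard.
    return list(range(NUM_SHARDS))


print(target_shards({"deviceId": 42}))
print(target_shards({"timestamp": 1418244824}))
```

A query without the shard key has to wait for the slowest shard, so latency and load grow with cluster size.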
34 
Reliability 
Key = Hash(Timestamp) Key = Hash(DeviceId)
Key = Hash(DeviceId) Key = DeviceId, Timestamp 
35 
Index Locality 
Random Access Index: MD5
Right-Balanced Index: ... -2 day, -1 day, -0 day
A right-balanced index may only need to be partially in RAM to be effective.
[Diagram labels: Working Set; … Device N]
36 
Good Shard Key 
Key = DeviceId (Random), Timestamp (Sequential)
Cardinality 
Write Distribution 
Query Isolation 
Reliability 
Index Locality
37 
Life-cycle of Sharding 
Performance Testing: Avoid Pitfalls 
Pitfall: “Sharding results in massive performance degradation!”
• Splits happen on demand as chunks grow
• Migrations happen when an imbalance is detected
Best Practices:
• Pre-split:
1. Hash Shard Key: specify numInitialChunks:
http://docs.mongodb.org/manual/reference/command/shardCollection/
2. Custom Shard Key: create a pre-split script:
http://docs.mongodb.org/manual/tutorial/create-chunks-in-sharded-cluster/
• Run mongos (query router) on app server if possible.
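A minimal sketch of what a pre-split script computes. It only builds the command documents and does not connect to a cluster. The namespace "iot.metrics", the key range and the chunk counts are hypothetical. The command shapes follow the shardCollection and split commands described at the links above, and would be sent to the admin database through mongos with a driver.

```python
def hashed_shard_command(namespace, key_field, num_initial_chunks):
    """Build a shardCollection command for a hashed key with pre-created chunks."""
    return {
        "shardCollection": namespace,
        "key": {key_field: "hashed"},
        "numInitialChunks": num_initial_chunks,
    }


def presplit_commands(namespace, key_field, low, high, num_chunks):
    """Build split commands that cut [low, high) into equal-width chunks."""
    step = (high - low) // num_chunks
    split_points = [low + step * i for i in range(1, num_chunks)]
    return [{"split": namespace, "middle": {key_field: p}} for p in split_points]


print(hashed_shard_command("iot.metrics", "deviceId", 8))
for command in presplit_commands("iot.metrics", "deviceId", 0, 4000, 4):
    print(command)
```

Creating the chunks before a bulk load means the cluster does not have to split and migrate data while it is absorbing writes.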
38 
Life-cycle of Sharding 
Capacity Planning: How many shards do I need? 
Sizing: 
•What are the total resources required for your initial 
deployment? 
•What are the ideal hardware specs, and the # shards 
necessary? 
Capacity Planning: create a model to scale MongoDB for a 
specific app. 
•How do I determine when more shards need to be added? 
•How much capacity do I gain from adding a shard? 
39 
How Many Servers? 
Strategy | Accuracy | Level of Effort | Feasibility of Early Project Analysis
Domain Expert | High to Low: inversely related to complexity of the application | Low | Yes
Empirical (Load Testing) | High | High | Unlikely
41 
Domain Expert 
Normally, performed by MongoDB Solution Architect: 
http://bit.ly/1rkXcfN 
• What is the document model? Collections, documents, indexes 
• What are the major operations? 
- Throughput 
- Latency 
• What is the working set? Eg. Last 90 days of orders 
Process: Business Solution Analysis -> Model and Load Definition -> Resource Analysis -> Hardware Specification
42
Domain Expert
Resource | Methodology
RAM | Standard: Working Set + Indexes. Adjust more or less depending on latency vs. cost requirements. Very large clusters should account for connection pooling/thread overhead (1MB per active thread).
IOPs | Primarily based on throughput requirements: writes + estimation of query page faults. Assume random IO. Account for replication, journal and log (note: sequential IO). Ideally, estimated empirically through prototype testing. Experts can use experience from similar applications as an estimate. Spot testing may be needed.
Storage | Estimate using throughput, document and index size approximations, and retention requirements. Account for overhead like fragmentation if applicable.
CPU | Rarely the bottleneck; a lot less CPU intensive than RDBMS. Using current commodity CPU specs will suffice.
Network | Estimate using throughput and document size approximations.
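A rough sizing sketch based on the RAM and Storage rows above. All input numbers are hypothetical placeholders, and the formulas are a simplification of the methodology: RAM = working set + indexes + 1MB per active thread, and storage = write throughput x document size x retention, padded for indexes and overhead.

```python
import math

SECONDS_PER_DAY = 86400


def ram_gb(working_set_gb, index_gb, active_threads):
    """Working set + indexes + 1MB per active thread, in GB."""
    return working_set_gb + index_gb + active_threads / 1024


def storage_tb(writes_per_sec, avg_doc_bytes, retention_days,
               index_ratio, overhead_ratio):
    """Raw data over the retention period, padded for indexes and overhead."""
    raw_bytes = writes_per_sec * avg_doc_bytes * SECONDS_PER_DAY * retention_days
    return raw_bytes * (1 + index_ratio) * (1 + overhead_ratio) / 1e12


def shards_needed(total_ram_gb, total_storage_tb,
                  ram_per_shard_gb, storage_per_shard_tb):
    """The tighter of the RAM and storage constraints decides the shard count."""
    by_ram = math.ceil(total_ram_gb / ram_per_shard_gb)
    by_storage = math.ceil(total_storage_tb / storage_per_shard_tb)
    return max(by_ram, by_storage)


total_ram = ram_gb(working_set_gb=100, index_gb=28, active_threads=2048)
total_storage = storage_tb(10000, 1000, 100, index_ratio=0.0, overhead_ratio=0.0)
print(total_ram, total_storage)
print(shards_needed(total_ram, 20, ram_per_shard_gb=64, storage_per_shard_tb=4))
```

IOPs and network would be estimated the same way from throughput figures, ideally checked against prototype tests as the table notes.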
44 
Sizing by Empirical Testing 
• Sizing can be more accurately obtained by prototyping your application, and 
performing load tests on selected hardware. 
• Capacity Planning can be simultaneously accomplished through load testing. 
• Past Webinars: http://www.mongodb.com/presentations/webinar-capacity-planning 
Strategy: 
1. Implement a prototype that can at least simulate major workloads 
2. Select an economical server that you plan to scale-out on. 
3. Saturate a single replicaSet or shard (maintaining latency SLA as needed). Address 
bottlenecks, optimize and repeat. 
4. Add an additional shard (as well as mongos and clients as needed). Saturate and 
confirm roughly linear scaling. 
5. Repeat step 4 until you are able to model capacity gains (throughput + latency) 
versus #physical servers.
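Steps 4 and 5 can be turned into a simple capacity model once a few load-test points exist. The throughput numbers below are invented for illustration. The model assumes capacity keeps growing by the average per-shard gain observed so far.

```python
import math


def scaling_efficiency(throughputs):
    """Throughput with n shards relative to n times the single-shard result."""
    base = throughputs[0]
    return [t / (base * (i + 1)) for i, t in enumerate(throughputs)]


def shards_for_target(throughputs, target):
    """Extrapolate the shard count needed to reach a target throughput."""
    if target <= throughputs[0]:
        return 1
    gain_per_shard = (throughputs[-1] - throughputs[0]) / (len(throughputs) - 1)
    return 1 + math.ceil((target - throughputs[0]) / gain_per_shard)


# Hypothetical saturated throughput (ops/sec) measured with 1, 2, 3 and 4 shards.
measured = [10000, 19500, 29000, 38500]

print([round(e, 3) for e in scaling_efficiency(measured)])
print(shards_for_target(measured, 100000))
```

If efficiency drops sharply as shards are added, the bottleneck is likely elsewhere (shard key, mongos, clients) and should be addressed before extrapolating.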
45 
Operational Scale 
Business Critical Operations: How do I manage 100s to 1000s of nodes? 
MongoDB Management Service (MMS): https://mms.mongodb.com
• Real-time monitoring and visualization of cluster health
• Alerting
• Automated cluster provisioning
• Automation of daily operational tasks like no-downtime upgrades
• Centralized configuration management
• Automated PIT snapshotting of clusters
• PITR support for sharded clusters
46 
MMS Automation 
Server Resources 
(anywhere) 
Agent 
MMS 
On-Prem or 
SaaS
48 
Scalable, Anywhere 
Quick Demo
Get Expert Advice on Scaling. For Free. 
For a limited time, if you’re considering a commercial relationship with
MongoDB, you can sign up for a free one hour consult about scaling with one of
our MongoDB Engineers.
Sign Up: http://bit.ly/1rkXcfN
Webinar Q&A 
dylan.tong@mongodb.com 
Stay tuned after the webinar and take our survey for your chance to win
MongoDB swag.
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB
 


Everything You Need to Know About Sharding

  • 1. Everything You Need to Know About Sharding Dylan Tong dylan.tong@mongodb.com Senior Solutions Architect
  • 2. Agenda. Overview: What is sharding? Why and what should I use sharding for? Building Your First Sharded Cluster: What do I need to know to succeed with sharding? Q&A.
  • 3. What is Sharding? Sharding is a means of partitioning data across servers to enable: (1) Scale, needed by modern applications to support massive workloads and data volumes; (2) Geo-locality, to support geographically distributed deployments that give customers across vast geographies an optimal UX; (3) Hardware optimizations, trading off performance against cost; (4) Lower recovery times, to make "Recovery Time Objectives" (RTO) feasible.
  • 4. What is Sharding? Sharding involves a shard key, defined by a data modeler, that describes the partition space of a data set. Data is partitioned into chunks by the shard key, and these chunks are distributed evenly across shards that reside on many physical servers. MongoDB provides three types of sharding strategies: ranged, hashed, and tag-aware.
  • 5. Range Sharding. Shard key: {deviceId}. Chunk ranges: …1000 | 1001…2000 | 2001…3000 | 3001…4000 | 4001… Composite keys are supported: {deviceId, timestamp}, e.g. …1000,1418244824 | …1000,1418244825
  • 6. Hash Sharding. Hash sharding is a subset of range sharding. MongoDB applies an MD5 hash to the key when a hashed shard key is used: Hash Shard Key(deviceId) = MD5(deviceId). This ensures data is distributed randomly within the range of MD5 values. Chunk ranges: …3333 | …3334…8000 | …8001…AAAA | …AAAB…DDDD | …DDDF
  • 7. Tag-aware Sharding. Tag-aware sharding allows a subset of shards to be tagged and assigned to a sub-range of the shard key. Example: sharding user data belonging to users from 100 "regions". Collection: Users; shard key: {uId, regionCode}. Tag ranges (Tag: Start to End): West: (MinKey, MinKey) to (MaxKey, 50); East: (MinKey, 50) to (MaxKey, MaxKey). Shard1 and Shard2 are tagged West; Shard3 and Shard4 are tagged East; each shard is drawn with two secondaries. Regions 1-50 are assigned to the West and regions 51-100 to the East.
  • 8. Applying Sharding
       Usage                  | Required Strategy
       Scale                  | Range or Hash
       Geo-Locality           | Tag-aware
       Hardware Optimization  | Tag-aware
       Lower Recovery Times   | Range or Hash
  • 9. Sharding for Scale. Performance scale: throughput and latency. Data scale: cardinality, data volume.
  • 10. Typical Small Deployment: a replica set is highly available but not scalable. Writes: limited by the capacity of the primary's host. Reads, when immediate consistency matters: limited by the capacity of the primary's host. Reads, when eventual consistency is acceptable: limited by the capacity of the available replica set members.
  • 11. Sharded Architecture. Auto-balancing: data is partitioned based on a shard key and automatically balanced across shards by MongoDB. Query routing: database operations are transparently routed across the cluster through a routing proxy process (software). Horizontal scalability: load is distributed and resources are pooled across commodity servers, increasing read/write capacity. Decoupling for development and operational simplicity: Ops can add capacity without app dependencies or Dev involvement.
  • 12. Value of Scale-out Architecture. [Chart: system capacity versus cost ($). Labels: scale-up limits; high capacity/$; scale out to 100s-1000s of servers; scale down and re-allocate resources; optimize on capacity/$. Trend: from apps of the past to modern apps.]
  • 13. Sharding for Geo-Locality. Adobe Cloud Services, among other popular consumer and enterprise services, uses sharding to run servers across multiple data centers in different geographies. Network latency from West to East is ~80 ms. Amazon: every 1/10-second delay resulted in a 1% loss of sales. Google: a half-second delay caused a 20% drop in traffic. Aberdeen Group: a 1-second delay in page-load time led to 11% fewer page views, a 16% decrease in customer satisfaction, and a 7% loss in conversions.
  • 14. Multi-Active DCs via Tag-aware Sharding. Collection: Users; shard key: {uId, regionCode}. Tag ranges (Tag: Start to End): West: (MinKey, MinKey) to (MaxKey, 50); East: (MinKey, 50) to (MaxKey, MaxKey). [Diagram: WEST and EAST data centers hosting Shard 1, Shard 2 and Shard 3, tagged West or East, each with secondaries. Labels: Query, Update, Local Reads (e.g. Read Preference = Nearest), Priority=10, Priority=5.]
  • 15. Optimizing Latency and Cost
       Magnitudes of difference in speed:
       Event       | Latency | Normalized to 1 s
       RAM access  | 120 ns  | 6 min
       SSD access  | 150 μs  | 6 days
       HDD access  | 10 ms   | 12 months
       Magnitudes of difference in cost:
       Storage Type | Avg. Cost ($/GB) | Cost at 100 TB ($)
       RAM          | 5.50             | 550K
       SSD          | 0.50-1.00        | 50K to 100K
       HDD          | 0.03             | 3K
  • 16. Optimizing Latency and Cost. Use case: sensor data collected from millions of devices, used for real-time decision automation, real-time monitoring and historical reporting.
       Data Type               | Description                                         | Latency SLA               | Data Volume
       Meta Data               | Fast look-ups to drive real-time decisions          | 95th percentile < 1 ms    | < 1 TB
       Last 90 days of Metrics | 95+% of data reported and monitored                 | 95th percentile < 30 ms   | < 10 TB
       Historic                | Used for historic reporting; accessed infrequently  | 95th percentile < 2 s     | > 100 TB
  • 17. Hardware Optimizations
       Collection | Tag     | Start                 | End
       Meta       | Cache   | MinKey                | MaxKey
       Metrics    | Flash   | MinKey, MinKey        | MaxKey, 90 days ago
       Metrics    | Archive | MinKey, >90 days ago  | MaxKey, MaxKey
       Collection | Shard Key
       Meta       | DeviceId
       Metrics    | DeviceId, Timestamp
       Hardware per tag (each shard drawn with secondaries): Tag: Cache: high memory ratio, fast cores, HDD. Tag: Flash: medium memory ratio, high compute, SSDs. Tag: Archive: low memory ratio, medium compute, HDD.
  • 18. Restoration Times. Scenario: an application bug causes logical corruption of the data, and the database needs to be rolled back to a previous point in time (PIT). What RTO does your business require in this event? Total DB snapshot size = 100 TB. N = 10: 10 × 10 TB snapshots generated and/or transferred in parallel. N = 100: 100 × 1 TB snapshots generated and/or transferred in parallel, potentially 10× faster restoration time. [Diagram: tar.gz snapshot archives.]
  • 19. Building Your First Sharded Cluster
  • 20. Life-cycle of Sharding. Product Definition: starts with an idea to build something big! Predictive Maintenance Platform: a cloud platform for building predictive maintenance applications and services, such as a service that monitors various vehicle components by collecting data from sensors and automatically prescribes actions to take. Requirements: allow tenants to register, ingest and modify data collected by sensors; define and apply workflows and business rules; publish and subscribe to notifications; data access API for things like reporting. Stages: Design & Development, Test/QA, Pre-Production, Production.
  • 21. Life-cycle of Sharding. Data Modeling: do I need to shard? Throughput: data from millions of sensors is updated in real time. Latency: the values of certain attributes need to be accessed with a 95th-percentile latency < 10 ms to support real-time decisions and automation. Volume: 1 TB of data collected per day, retained for 5 years. Stages: Design & Development, Test/QA, Pre-Production, Production.
  • 22. Life-cycle of Sharding. Data Modeling: select a good shard key. This is a critical step: sharding is only as effective as the shard key; shard key attributes are immutable; re-sharding is non-trivial and requires re-partitioning data. Stages: Design & Development, Test/QA, Pre-Production, Production.
  • 23. Good Shard Key: cardinality, write distribution, query isolation, reliability, index locality.
  • 24. Cardinality. Key = Data Center
  • 25. Cardinality. Key = Timestamp
  • 26. Write Distribution. Key = Timestamp
  • 27. Write Distribution. Key = Hash(Timestamp)
  • 28. Query Isolation. Key = Hash(Timestamp): "scatter-gather query"
  • 29. Query Isolation. Key = Hash(DeviceId) (*assumes the bulk of queries on the collection are in the context of a single deviceId)
  • 30. Reliability. Key = Hash(Timestamp) versus Key = Hash(DeviceId)
  • 31. Index Locality. Key = Hash(DeviceId): random-access index (MD5). Key = DeviceId, Timestamp: right-balanced index (per device, up to Device N: … -2 day, -1 day, -0 day). A right-balanced index may only need to be partially in RAM (the working set) to be effective.
  • 32. Good Shard Key. Key = DeviceId (random), Timestamp (sequential). Criteria: cardinality, write distribution, query isolation, reliability, index locality.
  • 33. Life-cycle of Sharding. Performance Testing: avoid pitfalls. Best practices: Pre-split: sharding results in massive performance degradation! (Splits happen on demand as chunks grow; migrations happen when an imbalance is detected.) 1. Hash shard key: specify numInitialChunks: http://docs.mongodb.org/manual/reference/command/shardCollection/ 2. Custom shard key: create a pre-split script: http://docs.mongodb.org/manual/tutorial/create-chunks-in-sharded-cluster/ Run mongos (query router) on the app server if possible. Stages: Design & Development, Test/QA, Pre-Production, Production.
  • 34. Life-cycle of Sharding. Capacity Planning: how many shards do I need? Sizing: What are the total resources required for your initial deployment? What are the ideal hardware specs and the number of shards necessary? Capacity planning: create a model to scale MongoDB for a specific app. How do I determine when more shards need to be added? How much capacity do I gain from adding a shard? Stages: Design & Development, Test/QA, Pre-Production, Production.
  • 35. How Many Servers?
       Strategy                 | Accuracy                                                               | Level of Effort | Feasibility of Early Project Analysis
       Domain Expert            | High to low: inversely related to the complexity of the application    | Low             | Yes
       Empirical (Load Testing) | High                                                                   | High            | Unlikely
  • 36. Domain Expert. Normally performed by a MongoDB Solution Architect: http://bit.ly/1rkXcfN Questions: What is the document model (collections, documents, indexes)? What are the major operations (throughput, latency)? What is the working set (e.g. the last 90 days of orders)? Process: Business Solution Analysis, Model and Load Definition, Resource Analysis, Hardware Specification.
  • 37. Domain Expert
       Resource | Methodology
       RAM      | Standard: working set + indexes. Adjust more or less depending on latency vs. cost requirements. Very large clusters should account for connection pooling/thread overhead (1 MB per active thread).
       IOPs     | Primarily based on throughput requirements: writes + an estimate of query page faults. Assume random IO. Account for replication, journal and log (note: sequential IO). Ideally estimated empirically through prototype testing. Experts can use experience from similar applications as an estimate. Spot testing may be needed.
       CPU      | Rarely the bottleneck; a lot less CPU-intensive than RDBMSs. Current commodity CPU specs will suffice.
       Network  | Estimate using throughput and document size approximations.
       Storage  | Estimate using throughput, document and index size approximations, and retention requirements. Account for overhead like fragmentation if applicable.
       Process: Business Solution Analysis, Model and Load Definition, Resource Analysis, Hardware Specification.
  • 38. Sizing by Empirical Testing. Sizing can be obtained more accurately by prototyping your application and performing load tests on selected hardware. Capacity planning can be accomplished simultaneously through load testing. Past webinars: http://www.mongodb.com/presentations/webinar-capacity-planning Strategy: 1. Implement a prototype that can at least simulate major workloads. 2. Select an economical server that you plan to scale out on. 3. Saturate a single replica set or shard (maintaining the latency SLA as needed). Address bottlenecks, optimize and repeat. 4. Add an additional shard (as well as mongos and clients as needed). Saturate and confirm roughly linear scaling. 5. Repeat step 4 until you are able to model capacity gains (throughput + latency) versus the number of physical servers.
  • 39. Operational Scale. Business-critical operations: how do I manage 100s to 1000s of nodes? MongoDB Management Service (MMS): https://mms.mongodb.com Features: real-time monitoring and visualization of cluster health; alerting; automated cluster provisioning; automation of daily operational tasks like no-downtime upgrades; centralized configuration management; automated point-in-time (PIT) snapshotting of clusters; point-in-time recovery (PITR) support for sharded clusters. Stages: Design & Development, Test/QA, Pre-Production, Production.
  • 40. MMS Automation. [Diagram: an agent running on server resources (anywhere), connected to MMS, on-prem or SaaS.]
  • 41. 48 Scalable, Anywhere Quick Demo
  • 42. Get Expert Advice on Scaling. For Free. For a limited time, if you’re considering a commercial relationship with MongoDB, you can sign up for a free one-hour consultation about scaling with one of our MongoDB Engineers. Sign Up: http://bit.ly/1rkXcfN
  • 43. Webinar Q&A dylan.tong@mongodb.com Stay tuned after the webinar and take our survey for your chance to win MongoDB swag.
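
The resource methodology on slide 37 (RAM as working set plus indexes plus about 1MB per active thread, storage from throughput, sizes and retention, network from throughput and document size) can be sketched as a back-of-the-envelope calculation. This is only an illustrative sketch: the `Workload` fields, the `estimate` function and the sample numbers are hypothetical, and it assumes every write inserts a new document that is kept for the whole retention period. It ignores replication, journal, fragmentation and IOPs, which the slide says should also be accounted for.

```python
from dataclasses import dataclass

MB = 1024 ** 2
GB = 1024 ** 3


@dataclass
class Workload:
    # All fields are hypothetical inputs a modeler would estimate.
    writes_per_sec: float
    reads_per_sec: float
    avg_doc_bytes: int
    avg_index_bytes_per_doc: int
    retention_days: int
    working_set_fraction: float  # share of the data that is accessed regularly
    active_connections: int


def estimate(w: Workload) -> dict:
    # Assumption: every write is an insert retained for retention_days.
    docs_retained = w.writes_per_sec * 86_400 * w.retention_days
    data_bytes = docs_retained * w.avg_doc_bytes
    index_bytes = docs_retained * w.avg_index_bytes_per_doc

    # RAM: working set + indexes + ~1MB per active connection/thread.
    ram_bytes = (
        data_bytes * w.working_set_fraction
        + index_bytes
        + w.active_connections * MB
    )
    # Network: operations per second times average document size.
    network_bytes_per_sec = (w.writes_per_sec + w.reads_per_sec) * w.avg_doc_bytes

    return {
        "storage_gb": (data_bytes + index_bytes) / GB,
        "ram_gb": ram_bytes / GB,
        "network_mb_per_sec": network_bytes_per_sec / MB,
    }


# Made-up example workload, for illustration only.
example = Workload(
    writes_per_sec=1000,
    reads_per_sec=4000,
    avg_doc_bytes=1024,
    avg_index_bytes_per_doc=128,
    retention_days=30,
    working_set_fraction=0.1,
    active_connections=1024,
)
for name, value in estimate(example).items():
    print(f"{name}: {value:.1f}")
```

The totals are cluster-wide; dividing them by the capacity of the chosen server gives a first guess at the shard count, which the empirical testing on slide 38 should then confirm.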

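Steps 4 and 5 of the empirical sizing strategy on slide 38 ask you to confirm roughly linear scaling as shards are added. One way to check this from load-test results is sketched below. The function names, the 15% tolerance and the sample throughput figures are assumptions made for illustration; they are not from the presentation.

```python
def scaling_efficiency(throughput_by_shards: dict[int, float]) -> dict[int, float]:
    """Measured throughput divided by ideal linear throughput, per shard count.

    The single-shard measurement (key 1) is the baseline, so it must be present.
    """
    baseline = throughput_by_shards[1]
    return {
        shards: throughput / (shards * baseline)
        for shards, throughput in sorted(throughput_by_shards.items())
    }


def is_roughly_linear(throughput_by_shards: dict[int, float],
                      tolerance: float = 0.15) -> bool:
    """True if every measurement is within `tolerance` of ideal linear scaling."""
    efficiencies = scaling_efficiency(throughput_by_shards)
    return all(e >= 1.0 - tolerance for e in efficiencies.values())


# Hypothetical saturation throughput (ops/sec) measured at 1, 2 and 3 shards.
measured = {1: 10_000, 2: 19_000, 3: 27_500}
print(scaling_efficiency(measured))
print(is_roughly_linear(measured))
```

A falling efficiency as shards are added usually points to a bottleneck outside the shards, such as too few mongos or client processes, or to scatter-gather queries.
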
Editor's Notes

  1. EA Sports FIFA: world's best-selling sports video game franchise. User data and game state for millions of players. Yandex: The largest search engine in Russia uses MongoDB to manage all user and metadata for its file sharing service. MongoDB has scaled to support tens of billions of objects and TBs of data, growing at 10 million new file uploads per day. eBay. Foursquare: Foursquare is used by over 50 million people worldwide, who have checked in over 6 billion times, with millions more added every day. MongoDB is Foursquare’s main database, supporting hundreds of thousands of operations per second and storing all check-ins and history, user and venue data along with reviews. AHL, a part of Man Group plc, is a quantitative investment manager based in London and Hong Kong, with over $11.3 billion in assets under management. After evaluating multiple technology options, AHL used MongoDB to replace its relational and specialised "tick" databases. MongoDB supports 250 million ticks per second, at 40x lower cost than the legacy technologies it replaced. Adobe: Many of the world’s most recognizable brands use Adobe Experience Manager to accelerate development of digital experiences that increase customer loyalty, engagement and demand. Adobe uses MongoDB to store petabytes of data in the large-scale content repositories underpinning the Experience Manager. MongoDB MMS Backup: 2PB. McAfee: MongoDB powers McAfee Global Threat Intelligence (GTI), a cloud-based intelligence service that correlates data from millions of sensors around the globe. Billions of documents are stored and analyzed in MongoDB to deliver real-time threat intelligence to other McAfee end-client products. Carfax: CARFAX relies on its Vehicle History database to connect potential buyers with used vehicles in their area, and for analytics to guide the business. To improve customer experience, CARFAX migrated to MongoDB, which now manages over 13 billion documents, before replication across multiple data centers.
  2. Limited capacity: monolithic architecture, siloed data access, or complex application-level sharding that is difficult and expensive to implement successfully. Scale down: systems with seasonal load (e.g. online games that are popular from the start and fade over time). Sharding enables scaling down to reallocate resources on bare-metal servers. You can’t do the same for a database appliance.
  3. A hashed key isn’t always a good key. It can be good for the simplest k-v access patterns, but any query retrieving multiple documents, such as a range-based query, can potentially lead to scatter-gather queries, which have high overhead and impair the ability to scale linearly.
  4. www.mongodb.com/lp/contact/scaling-101 http://www.mongodb.com/lp/contact/planning-for-scale
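
The scatter-gather effect described in note 3 can be illustrated with a toy routing simulation. This is a sketch under stated assumptions, not MongoDB's actual routing code: four shards, range chunk boundaries borrowed from the Range Sharding slide, and the MD5 hash space split into four equal ranges following the Hash Sharding slide's description. The function names are hypothetical.

```python
import hashlib
from bisect import bisect_right

NUM_SHARDS = 4  # assumption for this toy model


def range_shard(device_id: int) -> int:
    # Chunks ..1000, 1001..2000, 2001..3000, 3001.. each live on one shard.
    lower_bounds = [1001, 2001, 3001]
    return bisect_right(lower_bounds, device_id)


def hashed_shard(device_id: int) -> int:
    # Hash the key with MD5, then split the 64-bit prefix space into equal ranges.
    digest = hashlib.md5(str(device_id).encode()).digest()
    prefix = int.from_bytes(digest[:8], "big")
    return (prefix * NUM_SHARDS) >> 64


def shards_touched(route, device_ids) -> set:
    """Set of shards a query over these keys would have to contact."""
    return {route(d) for d in device_ids}


# A range query over 100 consecutive device ids.
query_ids = range(1500, 1600)
print("range sharding :", sorted(shards_touched(range_shard, query_ids)))
print("hashed sharding:", sorted(shards_touched(hashed_shard, query_ids)))
```

With the range key, all ids in the query fall into one chunk, so a single shard answers it. With the hashed key, neighbouring ids are spread across the hash space, so the same query generally has to be sent to most or all shards and the results merged.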