How queries work with sharding

•

12 gostaram•7,033 visualizações

The document discusses how queries work in sharded MongoDB environments. It explains that MongoDB collections are partitioned into chunks based on a shard key, and each chunk is assigned to a particular shard. When a query is executed, the mongos process routes it to the correct shard(s) based on the shard key range in the query. Queries involving only the shard key are efficient, targeting specific shards. Queries on non-shard keys require scattering and gathering across all shards, but secondary indexes can help efficiency on each shard.

Tecnologia

MongoDB
Sharding

How
queries
work
in
sharded
environments

One
small
server.

We
want
more

capacity.

What
to
do?

TradiAonally,
we
would
scale

verAcally
with
a
bigger
box.

With
sharding
we
instead
scale

horizontally
to
achieve
the
same

computaAonal/storage/memory

footprint
from
smaller
servers.

m=10

We
will
show
the
verAcally
scale
db
and
the
horizontally
scaled
db

side-‐by-‐side
for
comparison.

A
sharded
MongoDB
collecAon
has
a
shard
key.

The
collecAon
is

parAAoned
in
an
order-‐preserving
manner
using
this
key.

In
this

example
a
is
our
shard
key:

{
a
:
…,
b
:
…,
c
:
…
}

a
is
declared
shard
key
for
the
collec0on

{
a
:
…,
b
:
…,
c
:
…
}

a
is
shard
key

Metadata
is
maintained
on
chunks
which
are
represented
by

shard
key
ranges.

Each
chunk
is
assigned
to
a
parAcular
shard.

Range
Shard

a
in
[-‐∞,2000)
2

a
in
[2000,2100)
8

a
in
[2100,5500)
3

…
…

a
in
[88700,
∞)
0

When
a
chunk
becomes
too
large,
MongoDB
automaAcally
splits

it,
and
the
balancer
will
later
migrate
chunks
as
necessary.

{
a
:
…,
b
:
…,
c
:
…
}

a
is
shard
key

ﬁnd(
{
a
:
{
$gt
:
333,
$lt
:
400
}
)

Range
Shard

a
in
[-‐∞,2000)
2

a
in
[2000,2100)
8

a
in
[2100,5500)
3

…
…

a
in
[88700,
∞)
0

The
mongos
process
routes
a
query
to
the
correct
shard(s).

For

the
query
above,
all
data
possibly
relevant
is
on
shard
2,
so
the

query
is
sent
to
that
node
only,
and
processed
there.

{
a
:
…,
b
:
…,
c
:
…
}

a
is
declared
shard
key

ﬁnd(
{
a
:
{
$gt
:
333,
$lt
:
2012
}
)

Range
Shard

a
in
[-‐∞,2000)
2

a
in
[2000,2100)
8

a
in
[2100,5500)
3

…
…

a
in
[88700,
∞)
0

SomeAmes
a
query
range
might
span
more
than
one
shard,
but

very
few
in
total.
This
is
reasonably
eﬃcient.

{
a
:
…,
b
:
…,
c
:
…
}

non-‐shard
key
query,
no
index

ﬁnd(
{
b
:
99
}
)

Range
Shard

a
in
[-‐∞,2000)
2

a
in
[2000,2100)
8

a
in
[2100,5500)
3

…
…

a
in
[88700,
∞)
0

Queries
not
involving
the
shard
key
will
be
sent
to
all
shards
as
a

“scader/gather”
operaAon.

This
is
someAmes
ok.

Here
on
both

our
tradiAonal
machine
and
the
shards,
we
do
a
table
scan
-‐-‐

equally
expensive
(roughly)
on
both.

{
a
:
…,
b
:
…,
c
:
…
}

Sca8er
/
gather
with
secondary
index

ensureIndex({b:1})

ﬁnd(
{
b
:
99
}
)

Range
Shard

a
in
[-‐∞,2000)
2

a
in
[2000,2100)
8

a
in
[2100,5500)
3

…
…

a
in
[88700,
∞)
0

Once
again
a
query
with
a
shard
key
results
in
a
scader/gather

operaAon.

However
at
each
shard,
we
can
use
the
{b:1}
index
to

make
the
operaAon
eﬃcient
for
that
shard.

We
have
a
lidle
extra

overhead
over
the
verAcal
conﬁguraAon
for
the
communicaAons

eﬀort
from
mongos
to
each
shard
-‐-‐
not
too
bad
if
number
of

shards
is
small
(10)
but
quite
substanAal
for
say,
a
1000
shard

system.

{
a
:
…,
b
:
…,
c
:
…
}

Non-‐shard
key
query,
secondary
index

ﬁnd(
{
b
:
99,
a
:
100
}
)

Range
Shard

a
in
[-‐∞,2000)
2

a
in
[2000,2100)
8

a
in
[2100,5500)
3

…
…

a
in
[88700,
∞)
0

The
a
term
involves
the
shard
key
and
allows
mongos
to

intelligently
route
the
query
to
shard
2.

Once
the
query
reaches

shard
2,
the
{
b
:
1
}
index
can
be
used
to
eﬃciently
process
the

query.

When
sorAng
is
speciﬁed,
the
relevant
shards
sort
locally,
and

then
mongos
merges
the
results.

Thus
the
mongos
resource

usage
is
not
terribly
high.

client

Adam

Bob

David

Julie

Sue

Time

Zack

mongos

Bob

David
Sue
Adam

Julie
Tim
Zack

When
using
replicaAon
(typically
a
replica
set),
we
simply
have

more
than
one
node
per
shard.

(arrows
below
indicate
replica0on,
tradi0onal
vs.
sharded

environments)

Mais conteúdo relacionado

Mais procurados

ProxySQL and the Tricks Up Its Sleeve - Percona Live 2022.pdfJesmar Cannao'

MongoDB - Aggregation PipelineJason Terpko

Galera cluster for high availability Mydbops

MySQL Shell for DBAsFrederic Descamps

Achieving compliance With MongoDB Security Mydbops

Proxysql shardingMarco Tusa

ProxySQL High Avalability and Configuration Management OverviewRené Cannaò

Introduction to Oracle Data Guard BrokerZohar Elkayam

Intro To Mongo Dbchriskite

Oracle Multitenant meets Oracle RAC - IOUG 2014 VersionMarkus Michalewicz

How to understand and analyze Apache Hive query execution plan for performanc...DataWorks Summit/Hadoop Summit

Deep dive into PostgreSQL statistics.Alexey Lesovsky

MariaDB MaxScaleMariaDB plc

Oracle Database Performance Tuning Advanced Features and Best Practices for DBAsZohar Elkayam

Introduction VAUUM, Freezing, XID wraparoundMasahiko Sawada

What’s New in Oracle Database 19c - Part 1Satishbabu Gunukula

Percona Live 2022 - MySQL ArchitecturesFrederic Descamps

Faster, better, stronger: The new InnoDBMariaDB plc

MySQL Day Roma - MySQL Shell and Visual Studio Code ExtensionFrederic Descamps

Open Source 101 2022 - MySQL Indexes and HistogramsFrederic Descamps

Mais procurados (20)

ProxySQL and the Tricks Up Its Sleeve - Percona Live 2022.pdf

MongoDB - Aggregation Pipeline

Galera cluster for high availability

MySQL Shell for DBAs

Achieving compliance With MongoDB Security

Proxysql sharding

ProxySQL High Avalability and Configuration Management Overview

Introduction to Oracle Data Guard Broker

Intro To Mongo Db

Oracle Multitenant meets Oracle RAC - IOUG 2014 Version

How to understand and analyze Apache Hive query execution plan for performanc...

Deep dive into PostgreSQL statistics.

MariaDB MaxScale

Oracle Database Performance Tuning Advanced Features and Best Practices for DBAs

Introduction VAUUM, Freezing, XID wraparound

What’s New in Oracle Database 19c - Part 1

Percona Live 2022 - MySQL Architectures

Faster, better, stronger: The new InnoDB

MySQL Day Roma - MySQL Shell and Visual Studio Code Extension

Open Source 101 2022 - MySQL Indexes and Histograms

Destaque

Sharding Methods for MongoDBMongoDB

Everything You Need to Know About ShardingMongoDB

MongoDB for Time Series Data Part 3: ShardingMongoDB

MongoDB for Time Series Data Part 1: Setting the Stage for Sensor ManagementMongoDB

Choosing a Shard keyMongoDB

Lessons Learned from Building a Multi-Tenant Saas Content Management System o...MongoDB

Eclipse Paho - MQTT and the Internet of ThingsAndy Piper

Securing MongoDB to Serve an AWS-Based, Multi-Tenant, Security-Fanatic SaaS A...MongoDB

How to monitor MongoDBServer Density

Efficient Pagination Using MySQLSurat Singh Bhati

Pagination Done the Right WayMarkus Winand

Open Source IoT at EclipseIan Skerrett

BigData_TP5 : Neo4JLilia Sfaxi

BigData_TP2: Design Patterns dans HadoopLilia Sfaxi

BigData_TP4 : CassandraLilia Sfaxi

BigData_Chp5: Putting it all togetherLilia Sfaxi

BigData_TP1: Initiation à Hadoop et Map-ReduceLilia Sfaxi

BigData_TP3 : SparkLilia Sfaxi

BigData_Chp3: Data ProcessingLilia Sfaxi

BigData_Chp2: Hadoop & Map-ReduceLilia Sfaxi

Destaque (20)

Sharding Methods for MongoDB

Everything You Need to Know About Sharding

MongoDB for Time Series Data Part 3: Sharding

MongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management

Choosing a Shard key

Lessons Learned from Building a Multi-Tenant Saas Content Management System o...

Eclipse Paho - MQTT and the Internet of Things

Securing MongoDB to Serve an AWS-Based, Multi-Tenant, Security-Fanatic SaaS A...

How to monitor MongoDB

Efficient Pagination Using MySQL

Pagination Done the Right Way

Open Source IoT at Eclipse

BigData_TP5 : Neo4J

BigData_TP2: Design Patterns dans Hadoop

BigData_TP4 : Cassandra

BigData_Chp5: Putting it all together

BigData_TP1: Initiation à Hadoop et Map-Reduce

BigData_TP3 : Spark

BigData_Chp3: Data Processing

BigData_Chp2: Hadoop & Map-Reduce

Mais de MongoDB

MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB

MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB

MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB

MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB

MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB

MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB

MongoDB SoCal 2020: MongoDB Atlas Jump StartMongoDB

MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB

MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB

MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB

MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB

MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB

MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB

MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB

MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB

MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB

MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB

MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB

MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB

MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB

Mais de MongoDB (20)

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas

MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!

MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...

MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB

MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...

MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data

MongoDB SoCal 2020: MongoDB Atlas Jump Start

MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]

MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2

MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...

MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!

MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset

MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart

MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...

MongoDB .local San Francisco 2020: Aggregation Pipeline Power++

MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...

MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive

MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang

MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...

MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...

Último

Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya

Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun

How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes

Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10

Boost PC performance: How more available memory can improve productivityPrincipled Technologies

2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong

presentation ICT roal in 21st century educationjfdjdjcjdnsjd

🐬 The future of MySQL is Postgres 🐘RTylerCroy

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@

Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays

Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Principled Technologies

From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software

Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer

Manulife - Insurer Innovation Award 2024The Digital Insurer

Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun

TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc

Scaling API-first – The story of a global engineering organizationRadu Cotescu

Why Teams call analytics are critical to your entire businesspanagenda

Partners Life - Insurer Innovation Award 2024The Digital Insurer

How queries work with sharding

1. MongoDB Sharding How queries work in sharded environments

2. One small server. We want more capacity. What to do?

3. TradiAonally, we would scale verAcally with a bigger box.

4. With sharding we instead scale horizontally to achieve the same computaAonal/storage/memory footprint from smaller servers. m=10

5. We will show the verAcally scale db and the horizontally scaled db side-‐by-‐side for comparison.

6. A sharded MongoDB collecAon has a shard key. The collecAon is parAAoned in an order-‐preserving manner using this key. In this example a is our shard key: { a : …, b : …, c : … } a is declared shard key for the collec0on

7. { a : …, b : …, c : … } a is shard key Metadata is maintained on chunks which are represented by shard key ranges. Each chunk is assigned to a parAcular shard. Range Shard a in [-‐∞,2000) 2 a in [2000,2100) 8 a in [2100,5500) 3 … … a in [88700, ∞) 0 When a chunk becomes too large, MongoDB automaAcally splits it, and the balancer will later migrate chunks as necessary.

8. { a : …, b : …, c : … } a is shard key ﬁnd( { a : { $gt : 333, $lt : 400 } ) Range Shard a in [-‐∞,2000) 2 a in [2000,2100) 8 a in [2100,5500) 3 … … a in [88700, ∞) 0 The mongos process routes a query to the correct shard(s). For the query above, all data possibly relevant is on shard 2, so the query is sent to that node only, and processed there.

9. { a : …, b : …, c : … } a is declared shard key ﬁnd( { a : { $gt : 333, $lt : 2012 } ) Range Shard a in [-‐∞,2000) 2 a in [2000,2100) 8 a in [2100,5500) 3 … … a in [88700, ∞) 0 SomeAmes a query range might span more than one shard, but very few in total. This is reasonably eﬃcient.

10. { a : …, b : …, c : … } non-‐shard key query, no index ﬁnd( { b : 99 } ) Range Shard a in [-‐∞,2000) 2 a in [2000,2100) 8 a in [2100,5500) 3 … … a in [88700, ∞) 0 Queries not involving the shard key will be sent to all shards as a “scader/gather” operaAon. This is someAmes ok. Here on both our tradiAonal machine and the shards, we do a table scan -‐-‐ equally expensive (roughly) on both.

11. { a : …, b : …, c : … } Sca8er / gather with secondary index ensureIndex({b:1}) find( { b : 99 } ) Range Shard a in [-‐∞,2000) 2 a in [2000,2100) 8 a in [2100,5500) 3 … … a in [88700, ∞) 0 Once again a query with a shard key results in a scader/gather operaAon. However at each shard, we can use the {b:1} index to make the operaAon efficient for that shard. We have a lidle extra overhead over the verAcal configuraAon for the communicaAons effort from mongos to each shard -‐-‐ not too bad if number of shards is small (10) but quite substanAal for say, a 1000 shard system.

12. { a : …, b : …, c : … } Non-‐shard key query, secondary index ﬁnd( { b : 99, a : 100 } ) Range Shard a in [-‐∞,2000) 2 a in [2000,2100) 8 a in [2100,5500) 3 … … a in [88700, ∞) 0 The a term involves the shard key and allows mongos to intelligently route the query to shard 2. Once the query reaches shard 2, the { b : 1 } index can be used to eﬃciently process the query.

13. When sorAng is speciﬁed, the relevant shards sort locally, and then mongos merges the results. Thus the mongos resource usage is not terribly high. client Adam Bob David Julie Sue Time Zack mongos Bob David Sue Adam Julie Tim Zack

14. When using replicaAon (typically a replica set), we simply have more than one node per shard. (arrows below indicate replica0on, tradi0onal vs. sharded environments)

How queries work with sharding

Recomendados

Recomendados

Mais conteúdo relacionado

Mais procurados

Mais procurados (20)

Destaque

Destaque (20)

Mais de MongoDB

Mais de MongoDB (20)

Último

Último (20)

How queries work with sharding