2. DB Scalability @ eBay
eBay is one of the first and largest BASE
environments based on Oracle DB
• Basically Available
• Soft state
• Eventual consistency
Every database we use is sharded and partitioned
• N logical host names are defined for each use case ahead
of time
• These logical hosts are mapped to physical hosts based on static
mapping tables which are controlled by DBAs
• A common ORM framework called DAL provides powerful
and consistent patterns for data scalability
If the client provides a hint along with every DB
query:
• DAL maps the hint to a logical host using one of N mapping
schemes (ex: modulus, lookup table, range, etc.)
• Logical host is then mapped to a physical host using the L-to-Ph map
• The query is sent to just one shard
If the client does not have a hint, the query is sent to
all shards and the results are joined on the client
with the help of the DAL framework
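The routing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the host names, the `LOGICAL_TO_PHYSICAL` table, and the `route` and `modulus_scheme` functions are hypothetical and are not part of the DAL framework itself.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical static mapping table (logical host -> physical host).
# In the scheme described above, a table like this is controlled by DBAs.
LOGICAL_TO_PHYSICAL: Dict[str, str] = {
    "user-db-0": "phys-host-a",
    "user-db-1": "phys-host-b",
    "user-db-2": "phys-host-a",
    "user-db-3": "phys-host-c",
}
LOGICAL_HOSTS: List[str] = sorted(LOGICAL_TO_PHYSICAL)


def modulus_scheme(hint: int) -> str:
    """One possible mapping scheme: hint modulo the number of logical hosts."""
    return LOGICAL_HOSTS[hint % len(LOGICAL_HOSTS)]


def route(hint: Optional[int],
          scheme: Callable[[int], str] = modulus_scheme) -> List[Tuple[str, str]]:
    """Return the (logical, physical) targets a query must be sent to."""
    if hint is None:
        # No hint: scatter to every shard; the caller merges the results.
        return [(name, LOGICAL_TO_PHYSICAL[name]) for name in LOGICAL_HOSTS]
    logical = scheme(hint)
    # With a hint, exactly one shard receives the query.
    return [(logical, LOGICAL_TO_PHYSICAL[logical])]
```

Calling `route(6)` selects a single shard, while `route(None)` returns every shard, which corresponds to the scatter-gather case.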
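Side-effects:
• Hint is not part of the query; client has to manage it
• Logical-to-physical mapping scheme becomes an extra piece of
client configuration
• Shard rebalancing is “DBA magic”
[Diagram: applications App1 and App2 (business logic) pass a hint (shard key) to the DAL framework; F1(Hint) and F2(Hint) select logical DB hosts (shards); config maps these to physical master DB hosts and physical standby DB hosts]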
3. Key desired improvements
All eBay site-facing applications use the scheme outlined above
It’s proven to scale to tens of thousands of developers, petabytes of data, hundreds
of millions of SQL queries per day
But there is always room for improvement and new ideas
• ORM is not the fastest way to develop; how do we achieve faster development cycles and reduce
schema mapping friction?
• How do we add new attributes to tables faster and without DBA involvement? A schema-free approach
sounds interesting.
• Can we make the hint transparent, ex: auto-extract it from queries?
• Can we rebalance the data seamlessly and automatically?
• Can we add shards faster in order to scale out on demand and transparently to applications?
• How do we deploy new DBs to the cloud on demand?
And what about performance? Can we use RAM more aggressively and
seamlessly to speed up queries?
4. Enter MongoDB
We have been playing with MongoDB since 2010.
Why?
Its scalability scheme is very similar to how
we shard RDBMS
• Single master for writes, eventually consistent slaves for
reads
• Horizontal partitioning of data sets is the norm at eBay
• MongoS performs the familiar scatter-gather and
client-side merge-sorts
We have not used distributed transactions since
day 1; the transactional updates of multiple tables
that we do use can be simulated by atomic
updates of a single Mongo document
MongoDB offers a number of features that
help address our goals mentioned earlier:
• Developers love the document model and schema-free
persistence
• Hints are embedded into the queries
• MongoDB has automatic shard rebalancing
• Shards can be added on demand without application
restart and data will be auto-rebalanced
• We can easily bring it up in the cloud since cloud
machines have storage
[Diagram: business logic with a document model uses the Morphia/Mongo driver and dynamic config to reach MongoS, which applies F(Shard Key) to route to shards, each with replicas]
5. Case Study #1: eBay Search Suggestions
Search suggestion list is a MongoDB document
indexed by word prefix as well as by some
metadata: product category, search domain,
etc.
Must have < 60-70 msec round trip end to end
MongoDB query < 1.4 msec
Data set fits in RAM; hundreds of millions of documents
Data is bulk loaded once a day from Hadoop,
but can be tweaked on demand during sale
promotions, etc.
Single replica set, no shards in this case
MongoDB benefits:
• Multiple indexes allow flexible lookups
• In-memory data placement ensures lookup speed
• Large data set is durable and replicated
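To make the "multiple indexes" point concrete, here is a small in-memory sketch in Python. It is not eBay's schema: the field names (`prefix`, `category`, `suggestions`) and the sample documents are assumptions chosen only to show how one suggestion document can be reached through more than one index.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical suggestion documents; field names are illustrative only.
DOCS: List[dict] = [
    {"prefix": "ip", "category": "electronics", "suggestions": ["iphone", "ipad"]},
    {"prefix": "ip", "category": "music", "suggestions": ["ipod dock"]},
    {"prefix": "ca", "category": "electronics", "suggestions": ["camera"]},
]

# Two in-memory "indexes" over the same documents:
# one by prefix alone, one by (prefix, category).
by_prefix: Dict[str, List[dict]] = defaultdict(list)
by_prefix_and_category: Dict[Tuple[str, str], dict] = {}
for doc in DOCS:
    by_prefix[doc["prefix"]].append(doc)
    by_prefix_and_category[(doc["prefix"], doc["category"])] = doc


def suggest(prefix: str, category: Optional[str] = None) -> List[str]:
    """Look up suggestions by prefix, optionally narrowed by category."""
    if category is not None:
        doc = by_prefix_and_category.get((prefix, category))
        return list(doc["suggestions"]) if doc else []
    results: List[str] = []
    for doc in by_prefix.get(prefix, []):
        results.extend(doc["suggestions"])
    return results
```

In MongoDB itself the two dictionaries would correspond to two declared indexes on the collection, and the lookup would be an indexed query rather than a dictionary access.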
6. Case Study #2: Cloud Manager “State Hub”
State Hub powers eBay Cloud
Every resource provisioned by the cloud is
represented by a single Mongo document
Documents contain highly structured
metadata reflecting roles and grouping of
the resources
Lookup by both primary and secondary
indexes
Several GB data sets, easily fit in RAM
Documents are not uniform
All resources have “State” field which is
updated periodically to reflect health state
of the underlying resource
Mixed workload: lots of in-place writes, but
also lots of read queries
[Diagram labels: Query Resources and Topology; Provision Resources; Update resource state; State Hub; Mongo]
7. Case Study #3: eBay Merchandizing Info Cache
Merchandizing backend powers eBay product/item
classification and categorization
Each MongoDB document represents a cluster of similar
products
Numerous relationships between clusters are modeled as
document attributes
Relationship hierarchy traversal is achieved by issuing a
number of queries on “edge” attributes
Each instance of such a hierarchy is called a model; there
are lots of models
[Diagram: Cluster1, Cluster2 and Cluster3 connected by relationships R1, R2 and R3]
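The traversal by repeated queries on "edge" attributes can be illustrated with a breadth-first walk. This is a sketch only: the `edges` field, the sample wiring of R1, R2 and R3 between clusters, and the `find_by_id` stand-in for a database query are all assumptions, not the actual data model.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical cluster documents. "edges" maps a relationship name to the
# _id of the target cluster; the wiring below is invented for illustration.
CLUSTERS: Dict[str, dict] = {
    "Cluster1": {"_id": "Cluster1", "edges": {"R1": "Cluster2", "R2": "Cluster3"}},
    "Cluster2": {"_id": "Cluster2", "edges": {"R3": "Cluster3"}},
    "Cluster3": {"_id": "Cluster3", "edges": {}},
}


def find_by_id(cluster_id: str) -> Optional[dict]:
    """Stand-in for one database query by _id."""
    return CLUSTERS.get(cluster_id)


def traverse(root_id: str) -> List[str]:
    """Breadth-first traversal that issues one query per visited cluster."""
    visited: List[str] = []
    seen = {root_id}
    queue = deque([root_id])
    while queue:
        doc = find_by_id(queue.popleft())
        if doc is None:
            continue  # dangling edge: the target document does not exist
        visited.append(doc["_id"])
        for target in doc["edges"].values():
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return visited
```

Because every step is an independent lookup, this pattern works best when the data set fits in RAM, as it does in this use case.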
Again, data set fits in RAM, single replica set
Replica set members are located in 3 different data
centers (3+2+2), with all members in one data center
given a higher weight to avoid moving the master away from it
MongoDB benefits:
• Schema-free design and declarative indexes are perfect for this use
case where new attributes and new queries are constantly being
added
• Async replication across multiple data centers
• MongoDB Java Driver automatically detects the proximity
of clients to replica set members; reads with slaveOK=true are
served from local data center nodes, which ensures low
response latency
8. Case Study #4: Zoom – Media Metadata Store
This is a new mega project which is a work in progress
MongoDB is being evaluated as a storage backend for all media-related
metadata on the site (example: picture IDs with lots of attributes)
Requirements:
• Tens of TBs data set, millions of documents: data set must be partitioned; this is our
first use case where MongoDB sharding is used
• System of record for picture info; data cannot be lost!
• Replication/DR across 2 data centers; local DC reads are required
• Queries are from site-facing flows; <10msec response time SLA
• Mixed workload: both inserts and reads are happening concurrently all the time
Can MongoDB do it?
9. Zoom: Data Model
2 main collections: Item and Image
• Item references multiple Images
Item represents eBay Item:
• _id in Item is the external ID of the item in the eBay site DB
• These IDs are already sharded and balanced across N
logical DB hosts using ID ranges
• We use MongoDB pre-split points for the initial
mapping of our N site DB shards to M MongoDB shards
• This ensures good balance between the shards
Image represents a picture attached to an
Item
• _id in Image is the MD5 of the image content
• This ensures good distribution across any number of
shards
• MD5 is also used to find duplicate images
Our choice of document IDs in both
collections ensures good balance across
Mongo shards
We never query both collections in a single
service request to ensure data consistency
and to have only one index lookup
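Both ID choices can be sketched with the Python standard library. The split points, the chunk-to-shard table, and the function names below are hypothetical; only the use of MD5 of the image content as `_id` and the idea of range-based pre-split points come from the slide.

```python
import bisect
import hashlib
from typing import Dict, List


def image_id(content: bytes) -> str:
    """_id for an Image document: the MD5 hex digest of the image content."""
    return hashlib.md5(content).hexdigest()


def store_image(images: Dict[str, bytes], content: bytes) -> bool:
    """Store the image unless an identical one exists; True if newly stored."""
    _id = image_id(content)
    if _id in images:
        return False  # same MD5: treated as a duplicate image
    images[_id] = content
    return True


# Hypothetical pre-split points over item IDs. Three points give four ranges:
# below 1000, [1000, 2000), [2000, 3000), and 3000 or above.
SPLIT_POINTS: List[int] = [1000, 2000, 3000]
# Hypothetical mapping of N = 4 ranges onto M = 2 MongoDB shards.
CHUNK_TO_SHARD: List[str] = ["shard0", "shard1", "shard0", "shard1"]


def shard_for_item(item_id: int) -> str:
    """Find the range an item ID falls into and return its shard."""
    return CHUNK_TO_SHARD[bisect.bisect_right(SPLIT_POINTS, item_id)]
```

Because an MD5 digest is effectively uniform over its output space, image IDs spread evenly without any pre-splitting, whereas item IDs rely on the range table to stay balanced.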
10. Zoom: Service Topology and Configuration
MongoS is deployed on app servers
• Ensures network IO on MongoS won’t become a bottleneck
• This is a very familiar pattern at eBay, as was explained at the
beginning of this presentation
M shards; each replica set has 6 members
• 3 + 3 in 2 data centers
• Master can be only in one DC during automatic failover; manual
failover may activate another DC
• One slave in the secondary DC is invisible for reads and is
dedicated to periodic backups/snapshots (more on this later)
For reads, the client first sets SlaveOK=true and, if the
required document is not found, flips to
SlaveOK=false to read from the master
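The read path above amounts to a two-step fallback. The sketch below assumes a hypothetical `find(doc_id, slave_ok)` callable in place of the real driver call, and uses two in-memory dictionaries to imitate a slave that lags behind the master.

```python
from typing import Callable, Dict, Optional

# Hypothetical signature: find(doc_id, slave_ok) -> document or None.
Finder = Callable[[str, bool], Optional[dict]]


def read_with_fallback(find: Finder, doc_id: str) -> Optional[dict]:
    """Try a slave first; if the document is missing, read from the master."""
    doc = find(doc_id, True)       # slave_ok=True: may hit a lagging slave
    if doc is None:
        doc = find(doc_id, False)  # slave_ok=False: authoritative read
    return doc


# In-memory stand-ins: "img2" has been written but not yet replicated.
MASTER: Dict[str, dict] = {"img1": {"_id": "img1"}, "img2": {"_id": "img2"}}
SLAVE: Dict[str, dict] = {"img1": {"_id": "img1"}}


def fake_find(doc_id: str, slave_ok: bool) -> Optional[dict]:
    """Imitates the driver: reads the slave copy when slave_ok is set."""
    source = SLAVE if slave_ok else MASTER
    return source.get(doc_id)
```

Note that this only covers documents missing on the slave; a stale version of an existing document would still be returned from the slave.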
Home-grown MongoDB configuration and monitoring
agent is running on every node
• Fetches MongoD configuration from a central configuration store
and saves it to a local config file
• Manages lifecycle of MongoD
• Monitors state and metrics
[Diagram: M shards; nodes labeled M in DC1 (Primary) replicate to DC2 (Secondary), which also holds nodes labeled B]
11. Zoom: Data Backup and Restore strategy
Goals:
• Take periodic backups of the entire data set
• Be able to recover from backup
• Do not lose any writes that have happened after the last snapshot
• Brief service unavailability during recovery is better than data
loss
Dual writes on the client
• Regular write to main cluster
• Second write to another Mongo cluster: single replica set,
capped collection, the data written is similar to a REDO log record
Hidden slave in each shard has a volume mounted on a
remote storage appliance capable of instant file
system snapshot; captures both DB files and journal
files
If DB recovery is activated:
• All MongoD on primary cluster are shut down
• NFS slave is remounted to snapshot volume
• MongoD on this machine is started as a master
• MongoD on other replica set members are started cold
• Full sync-up from master
• Master is switched to a regular member
• Writes that occurred since the time when the backup was taken
are replayed from the REDO log capped collection in the
secondary cluster
[Diagram labels: Application; Dual-write to capped collection; M; C; B; Recovery Agent; Instant Snapshot Capable device]
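The dual-write and replay idea can be modeled in memory. This is a simplified sketch, not the actual recovery agent: the `DualWriter` class, the sequence numbers, and the use of a bounded `deque` as a stand-in for a capped collection are assumptions made for illustration.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Tuple

# (sequence number, document _id, document body), similar to a REDO record.
RedoRecord = Tuple[int, str, dict]


class DualWriter:
    """Writes every document to the main store and to a bounded redo log."""

    def __init__(self, redo_capacity: int) -> None:
        self.main: Dict[str, dict] = {}
        # A deque with maxlen drops its oldest entries when full,
        # which loosely mirrors the behavior of a capped collection.
        self.redo: Deque[RedoRecord] = deque(maxlen=redo_capacity)
        self.seq = 0

    def write(self, doc_id: str, doc: dict) -> None:
        self.seq += 1
        self.main[doc_id] = dict(doc)                    # regular write
        self.redo.append((self.seq, doc_id, dict(doc)))  # second write

    def snapshot(self) -> Tuple[int, Dict[str, dict]]:
        """Return the current sequence number and a copy of the main store."""
        return self.seq, {key: dict(value) for key, value in self.main.items()}


def recover(snapshot_seq: int, snapshot_data: Dict[str, dict],
            redo: Iterable[RedoRecord]) -> Dict[str, dict]:
    """Restore the snapshot, then replay redo records written after it."""
    restored = {key: dict(value) for key, value in snapshot_data.items()}
    for seq, doc_id, doc in redo:
        if seq > snapshot_seq:
            restored[doc_id] = dict(doc)
    return restored
```

The redo log must be large enough to hold every write made between two snapshots; once older records are dropped, writes from that period can no longer be replayed.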
12. Key Learning
MongoDB can be a very powerful tool but use it wisely
Deletes can be slow; automatic balancer is dangerous; use it only when you
must (example: be careful when adding new shards)
Use explain for every query; disable full scans to discover inefficiencies
early
Query profiler is great
Retry every failed query at least once; long tail in response times is possible
when data set > RAM size
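The "retry at least once" advice can be expressed as a small wrapper. This is a generic sketch; the `with_retry` helper and its `retries` parameter are hypothetical and not part of any MongoDB driver.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retry(query: Callable[[], T], retries: int = 1) -> T:
    """Run query(); on failure, retry up to `retries` more times."""
    attempt = 0
    while True:
        try:
            return query()
        except Exception:
            if attempt >= retries:
                raise  # retries exhausted: surface the last error
            attempt += 1
```

A caller would wrap each query in a zero-argument function, for example `with_retry(lambda: run_query(...))`, where `run_query` stands for whatever call the application makes. Retries are only safe for reads and for writes that can be repeated without side effects.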