MongoDB Knowledge Sharing

Mongo
Philip.Zhong/Chen.Tao/Leaf.Zhu, 2014

Agenda
• What’s Mongo?
• Mongo Advantages & Limitations
• Mongo Case Studies

What’s Mongo?

http://www.mongodb.org/

$1,200,000,000 (2007-2013)

http://www.mongodb.com/

Red Hat (1993-2013)

$16.75 billion

$30.0+ billion

What’s Mongo?
• MongoDB (from "humongous") is an open-source document database and the leading NoSQL database. It is written in C++.
• The most SQL-like NoSQL database.
• Mongo is an open, schemaless, document-oriented NoSQL database with rich queries, high performance, high availability, high scalability, and high flexibility.

1. Document Data Model. Documents, BSON.
2. Rich Query Model. Full index support, various query types.
3. Idiomatic Drivers. Drivers for over 17 languages.
4. Horizontal Scalability. Easy to add capacity.
5. High Availability. HA, journaling, automatic recovery.
6. In-Memory Performance. Memory-mapped files, reads/writes in RAM.
7. Flexibility. Schema-free, multi-datacenter deployments, tunable consistency, widely used across many industries.

Data Model

• Max BSON document size: 16 MB
• Max nesting depth for a BSON document: 100 levels
• Atomic operations at the document level

Query Type
1. Key-value queries
2. Range queries
3. Text search: AND, OR, NOT, etc.
4. Aggregation: count, min, max, average, etc.
5. MapReduce

Cursor

• A query returns a cursor
• Iterate the cursor to get the results
• The first batch returns 101 documents or about 1 MB of data; batchSize or limit overrides this, but a batch never exceeds 16 MB

Write Concern

• Errors Ignored
• Unacknowledged
• Acknowledged
• Journaled
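
The levels above differ in how long the client waits before it treats a write as done. As a rough sketch, they can be written as the `w` and `j` options described in MongoDB 2.4-era documentation. The dictionary keys and helper names below are illustrative only and are not part of any driver.

```python
# Write concern levels from the slide, expressed as the `w` / `j` options
# described in MongoDB 2.4-era documentation. The dictionary keys and the
# helpers below are illustrative names, not part of any driver.
WRITE_CONCERNS = {
    "errors_ignored": {"w": -1},            # no acknowledgement, errors suppressed
    "unacknowledged": {"w": 0},             # write itself is not acknowledged
    "acknowledged":   {"w": 1},             # primary confirms the write
    "journaled":      {"w": 1, "j": True},  # confirmed after the journal commit
}


def waits_for_acknowledgement(level):
    """Return True if the client blocks until the server confirms the write."""
    return WRITE_CONCERNS[level]["w"] >= 1


def waits_for_journal(level):
    """Return True if the confirmation also covers the on-disk journal."""
    return WRITE_CONCERNS[level].get("j", False)


for name in WRITE_CONCERNS:
    print(name, waits_for_acknowledgement(name), waits_for_journal(name))
```

Each step down the list trades write latency for a stronger guarantee, so the level is usually chosen per operation rather than once for the whole application.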

Index

Index types:
1. Single Field Indexes
2. Compound Indexes
3. Array Indexes
4. Geospatial Indexes
5. Hash Indexes
6. Text Search Indexes (V2.4, Beta)

Index properties:
1. Unique Indexes
2. Sparse Indexes

Index
• Each index takes at least 8 KB.
• Indexes slow down write operations, so they are expensive for collections with a high write-to-read ratio.
• Indexes benefit collections with a high read-to-write ratio.
• Indexes consume disk space and memory, so track and plan them carefully.

Mongo Replication
• A replica set has up to 12 mongod instances
• A replica set has a Primary member, which receives write requests

Basic Concepts
• Config Servers
– Exist in sets of three
– Maintain metadata
– Are mongod instances
• Shards
– Contain fractions of the global data
– Process fractions of APP requests
– Are replica sets in production
– Can be queried directly by clients (not recommended)
• Replica Set
– A group of mongod processes
– Includes Primary and Secondaries
• Mongos
– Direct requests to shards
– Direct results to clients
– Exist as 1+
– Are mongos instances
– Cache metadata

Schema Design
• Remember, "schemaless" doesn't mean you don't need to design your schema!
• Considerations to avoid the pitfalls of MongoDB schema design:
1. Avoid growing documents
3. Pay attention to BSON data types
5. Field names take up space
6. Consider using _id for your own purposes
7. Can you use covered indexes?
8. Use collections and databases to your advantage
• Test everything
• Schema design affects performance
• Schema design affects infrastructure: RAM > indexes + hot data = better performance
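
The rule of thumb "RAM > indexes + hot data" can be checked with simple arithmetic. The sketch below assumes you already know the three sizes. The function name and the sample figures are hypothetical and do not come from the deck.

```python
def working_set_fits(ram_bytes, index_bytes, hot_data_bytes):
    """Return True when RAM exceeds total index size plus hot data size."""
    return ram_bytes > index_bytes + hot_data_bytes


GIB = 1024 ** 3

# Hypothetical server: 64 GiB RAM, 12 GiB of indexes, 40 GiB of hot data.
print(working_set_fits(64 * GIB, 12 * GIB, 40 * GIB))

# Same server after the hot data grows to 60 GiB.
print(working_set_fits(64 * GIB, 12 * GIB, 60 * GIB))
```

When the check fails, the schema-design considerations listed above (shorter field names, fewer or covered indexes) are ways to shrink the right-hand side before buying more RAM.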

MongoDB for MDS – Sharding Strategy
• When do you need to shard?
– Your data set approaches or exceeds the storage capacity of a single MongoDB instance.
– The size of your system's active working set will soon exceed the capacity of your system's maximum RAM.
– A single MongoDB instance cannot meet the demands of your write operations, and all other approaches have not reduced contention.

• The considerations for sharding
– Multiple ways to model a domain problem
– Understand the key use cases of your app
– Balance between ease of query vs. ease of write
– Random I/O should be avoided

• Meeting behavior and sharding considerations (from 10G)
– Schedule meeting: ~800K meeting writes/day
– ~20% instant meetings
– Scalability best practice: Don't scale by using replication. Scale by using local read nodes.
– Recommend implementing local writes to meet the JOIN meetings use case requirements
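
To put the daily figures above in perspective, the following back-of-the-envelope sketch converts them into an average write rate. It assumes the 20% share applies to the same ~800K daily writes, which the slide does not state explicitly. Peak-hour rates would be higher than this average.

```python
SECONDS_PER_DAY = 24 * 60 * 60

meeting_writes_per_day = 800_000   # "~800K" meeting writes/day from the slide
instant_percent = 20               # "~20% instant meetings" from the slide

# Average rate over a full day; real traffic is bursty, so peaks are higher.
average_writes_per_second = meeting_writes_per_day / SECONDS_PER_DAY

# Assumption: the 20% share is taken from the same 800K daily writes.
instant_meetings_per_day = meeting_writes_per_day * instant_percent // 100

print(round(average_writes_per_second, 2))
print(instant_meetings_per_day)
```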

Cross DC latency Testing
Local vs. remote write/read latency test:
Scenario: Create two shards, each a three-member replica set. Make sure the Primary of one runs in the local DC (SJ), whereas the Primary of the second runs in the remote DC (TX). Run a small number of writes from the local DC against the Replica1 Primary, then run the same writes against the Replica2 Primary. Write concern = majority. Average object size is 1500 bytes. Ping time from the local DC (SJ) to the remote DC (TX) is 46 ms.

Local vs Remote Insert Tests (YCSB test):

Replication delay cross DC
Replication lag between data centers:
Scenario: In the local DC (SJ), where the replication Primary runs, insert 500 records at a time, up to a total of 550,000 records. Record the record count and the current timestamp after every 500 insertions. This is a single-threaded operation: only one process inserts the records. In the remote DC (TX), where the third secondary runs (this node is the farthest of all the secondaries, so it is not part of the initial write), call db.collection.count() in a loop and, whenever the count is a multiple of 500, record the count and the current timestamp. From the data collected on the Primary and the remote secondary, compute the replication delay.
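
The final step of this scenario, computing the delay from the two logs, could look roughly like the sketch below. It assumes each side produced a list of (record count, timestamp in seconds) pairs and that the clocks in both data centers are synchronized. The function name and the sample values are hypothetical, not measurements from the test.

```python
def replication_delays(primary_log, secondary_log):
    """Match checkpoints by record count and return {count: delay_seconds}.

    Each log is a list of (record_count, timestamp_seconds) pairs. Counts
    that the secondary never observed exactly are skipped.
    """
    seen_on_secondary = {}
    for count, ts in secondary_log:
        # Keep the first time the secondary reported this count.
        seen_on_secondary.setdefault(count, ts)

    delays = {}
    for count, ts in primary_log:
        if count in seen_on_secondary:
            delays[count] = seen_on_secondary[count] - ts
    return delays


# Hypothetical sample data, not results from the actual test.
primary = [(500, 10.00), (1000, 10.50), (1500, 11.00)]
secondary = [(500, 10.25), (1000, 10.75), (1500, 11.50)]

delays = replication_delays(primary, secondary)
print(delays)
print(max(delays.values()))
```

Because the secondary polls the count in a loop, it can miss an exact multiple of 500 when several batches arrive between two polls. That is why unmatched counts are skipped rather than treated as errors.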

MongoDB for MDS – Sharding
Goals:
- write to a shard primary node with physical proximity to the application server

- keep the shard primary node in close proximity to the application server [monitor the primary node of the replica set and if possible, restore the primary t
- reduce 'scatter/gather' on reads - use smart shard keys

Solution:

Add a geo-location based field in the schema, create a shard index based on that field, assign a tag to each shard and assign specific shard index field ra

e.g., Say we can add a 'DC' field into our collection. Assuming that the application somehow knows the data center it is running on, it can use this value for

Associate the tag ranges with the specific tagged shards.
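
The idea of tagging shards and binding shard-key ranges to tags can be illustrated with a small in-memory model. This is only a sketch of the routing logic, not MongoDB code. The shard names, the tag values, the (DC, userId) key layout, and the use of infinities as stand-ins for the minimum and maximum key are all assumptions made for illustration.

```python
# Hypothetical shards and the tag assigned to each one.
SHARD_TAGS = {"shard_sj": "SJ", "shard_tx": "TX"}

# Each tag range covers shard-key values in [low, high) and names a tag.
# The shard key is modeled as a (dc, user_id) tuple; the infinities stand
# in for the smallest and largest possible user_id.
TAG_RANGES = [
    (("SJ", float("-inf")), ("SJ", float("inf")), "SJ"),
    (("TX", float("-inf")), ("TX", float("inf")), "TX"),
]


def shards_for(shard_key):
    """Return the shards allowed to hold a document with this shard key."""
    for low, high, tag in TAG_RANGES:
        if low <= shard_key < high:
            return sorted(name for name, t in SHARD_TAGS.items() if t == tag)
    # Keys outside every tag range are not pinned to any particular shard.
    return sorted(SHARD_TAGS)


print(shards_for(("SJ", 1001)))
print(shards_for(("TX", 7)))
print(shards_for(("NY", 1)))
```

Because the data center value is the leading part of the key, every write from an application server in SJ falls into the SJ range and therefore lands on a shard tagged for that location.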
Inferred Technical Requirements
1. MongoDB sharding (shard keys: region + siteId + userId, region + siteId + meetingUUID) to support 3 regions (US, EMEA, APAC).
2. Sharding by siteId + userId or siteId + meetingUUID allows hosts from the same company (siteId) in the same region to create meetings in different shards. If we need to scale horizontally, the shard config adds another shard for the same siteId.
3. Based on the shard keys, we can support the requirements for local writes and local reads.
4. Replication requirement: replicate 600,000 meetings/day within 15 minutes between 2 nodes (remark: early benchmarking shows 11M meetings data replicated across 3 sites within 4 minutes).
5. Availability requirement: a primary node fails over to a secondary node within the same data center in <= 30 sec; a primary node fails over to a secondary in a different data center in <= 10 minutes.

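MongoDB Case Studies

• BillRun billing system
– Ofer Cohen released BillRun, a next-generation open-source billing solution that uses MongoDB as its back-end store. The system already runs in production at Israel's fastest-growing mobile operator and processes more than 500M call detail records (CDRs) per month.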

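• Visual China (视觉中国)
– Stores comments, feeds, and full-text search data
– Problems:
– Failover did not work because the replica set was not configured correctly; it needs at least 1 primary + 2 secondaries + n arbiters.
– Out-of-memory errors caused downtime. Fix: add memory and use the correct (non-development) driver version.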

• Youku (优酷)
– Youku has partially migrated its online comments service to MongoDB; operational data analysis and mining currently use Hadoop/HBase.

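• Qihoo 360 (奇虎360)
– More than 100 million documents
– Problem: timeouts (data larger than memory, random reads/writes, chunk migration time)
– Solution: add memory (or even use SSDs), reduce space usage (schema refactoring), and adjust the balancer window to avoid peak hours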

• Mailbox
– 100 million messages per day; stores email and related data in MongoDB
– https://tech.dropbox.com/2013/09/scaling-mongodb-at-mailbox/
– Lesson: write lock contention. Solution: move the hot collection to a standalone cluster; sharding

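• Other
– Baidu Open Cloud: its non-relational cloud database service uses MongoDB, and many small and medium-sized developers build on MongoDB
– Amazon E2: MongoDB as the back-end database, if the applications on it data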
