10. Albatross structure
[Diagram: request flow. Labels: Internet -> WEB servers; Request -> SessionDB (MongoDB ReplSet); Get page layout and Get components -> LayoutDB (MongoDB ReplSet); Call APIs -> API servers, with Memcache; Retrieve data -> ContentsDB (MongoDB ReplSet).]
11. Albatross structure
[Diagram: content management flow. Labels: Developer -> HTML markup and API settings -> CMS; Set page layout & Deploy API and Set components -> LayoutDB (MongoDB ReplSet); Batch servers -> Insert Data -> ContentsDB (MongoDB ReplSet); API servers.]
16. MapReduce
Our usage
We have never used MapReduce in regular operation.
However, we have used it for some irregular cases:
• To search for invalid articles that should be removed
because of someone’s mistake...
• To count the number of new articles posted per day.
• To count how many times an article has been updated.
• We will start considering regular use of it for
social data analysis before long...
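To make the "new articles posted per day" case concrete, here is a minimal sketch of the map and reduce steps, emulated in plain Python rather than run inside MongoDB. The sample documents and the `posted_at` field name are assumptions for illustration; the slides do not show the real schema.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical article documents; the field names are assumptions.
articles = [
    {"_id": 1, "posted_at": datetime(2012, 1, 10, 9, 30)},
    {"_id": 2, "posted_at": datetime(2012, 1, 10, 18, 5)},
    {"_id": 3, "posted_at": datetime(2012, 1, 11, 7, 45)},
]

def map_article(doc):
    """Map step: emit (day, 1) for each article."""
    yield doc["posted_at"].strftime("%Y-%m-%d"), 1

def reduce_counts(key, values):
    """Reduce step: sum the counts emitted for one day."""
    return sum(values)

def map_reduce(docs, map_fn, reduce_fn):
    """Group the emitted values by key, then reduce each group."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}

per_day = map_reduce(articles, map_article, reduce_counts)
```

The same shape (emit a key and a count, then sum per key) would also fit the "updated number of an article" case by keying on the article id instead of the day.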
18. Structure
We are using very poor machines (virtual machines)!!
• Intel(R) Xeon(R) CPU X5650 2.67GHz, 1 core!!
• 4 GB memory
• 50 GB disk space (iSCSI)
• CentOS 5.5 64-bit
• MongoDB 1.8.0
– ReplicaSet: 5 nodes (+ 1 arbiter)
– Oplog size: 1.2 GB
– Average object size: 1 KB
19. Structure
Researched environment
We’ve also researched the following environments...
• Virtual machine, 1 core
– 1 KB data, 6,000,000 documents
– 8 KB data, 200,000 documents
• Virtual machine, 3 cores
– 1 KB data, 6,000,000 documents
– 8 KB data, 200,000 documents
• EC2 large instance
– 2 KB data, 60,000,000 documents (100 GB)
20. Performance
I found a formula for making a rough estimate of QPS
Conditions: 1~8 KB documents + 1 unique index
C = Number of CPU cores (Xeon 2.67 GHz)
DD = Score of the ‘dd’ command (bytes/sec)
S = Document size (bytes)
• GET qps = 4500 × C
• SET(fsync) qps = 0.05 × DD ÷ S
• SET(nsync) qps = 4500, BUT...
there is a chance of going STALE
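The rough formula above can be written as a small function. This is only a sketch of the slide's rule of thumb: since DD ÷ S has units of documents per second, the fsync figure is treated here as a QPS value. The 100 MB/s `dd` score in the example is an assumed input, not a measurement from the slides.

```python
def estimate_qps(cores, dd_bytes_per_sec, doc_size_bytes):
    """Rough QPS estimate for 1~8 KB documents with 1 unique index."""
    return {
        "get": 4500 * cores,
        # (bytes/sec) / (bytes/doc) = docs/sec
        "set_fsync": 0.05 * dd_bytes_per_sec / doc_size_bytes,
        # Flat figure from the slide; writes at this rate risk STALE.
        "set_nsync": 4500,
    }

# Assumed example: 1 core, dd score of 100 MB/s, 1 KB documents.
est = estimate_qps(cores=1, dd_bytes_per_sec=100_000_000, doc_size_bytes=1024)
```

With these assumed inputs the fsync estimate comes out a little under 4,900 writes per second, which is in the same range as the flat 4,500 figure for non-synced writes.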
22. Performance example (on EC2 large)
Environment and amount of data
EC2 large instance
– 2kb data , 60,000,000 documents. ( 100GB )
– 1 unique index
Data-type
{
shop: 'someone',
item: 'something',
description: 'item explanation sentences...'
}
29. Index problem
Online indexing is completely useless, even in the latest version (2.0.2)
Indexing is a blocking operation by default.
Indexing can run in the background
on the primary. But...
It CANNOT run in the background on the secondaries
Moreover, all the secondaries run the indexing
at the same time!!
As a result...
All slaves freeze! orz...
43. Index problem
According to mongodb.org, this problem will be fixed in 2.1.0
But that has not been formally released yet.
So I checked out the latest source code.
It certainly looks like it will be fixed!
Moreover, it seems that indexing will run in the foreground
when the slave’s status isn’t SECONDARY
(that is, RECOVERING)
44. Index problem
Probable 2.1.X indexing
[Diagram labels: Batch -> save -> Primary; three Secondaries; five Clients.]
48. Index problem
Background indexing 2.1.X
But I think it’s not enough.
I think it can be fatal for the system if
all the secondaries slow down at the same time!!
So...
56. Index problem
But... I can easily guess that it’s difficult to do with the current Oplog
It would be great if I could run indexing manually
on each secondary
66. Index problem
Manual indexing
[Diagram: Batch and the Primary (Complete); three Secondaries, two marked Complete and one marked Indexing (Slowdown); five Clients.]
It needs to support
ensureIndex(manual,background)
as a background operation,
just in case the ReplSet has only
one Secondary
72. Unknown log & the ReplSet going out of control
We often suffered from the Secondaries going out of control...
• Secondaries flip their status repeatedly, within moments,
between Secondary and Recovering (1.8.0)
• Then we found a strange line in the log...
[rsSync] replSet error RS102 too stale to catch up
73. What’s Stale ?
stale [stéil] (Level: essential for working adults) powered by goo.ne.jp
• (of food, drink, etc.) not fresh (⇔ fresh);
• gone flat; (of coffee) having lost its aroma,
• (of bread) dried out, hardened,
• (of air, smells, etc.) stuffy,
• foul-smelling
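74. What’s Stale ?
stale [stéil] (Level: essential for working adults) powered by goo.ne.jp
• (of food, drink, etc.) not fresh (⇔ fresh);
• gone flat; (of coffee) having lost its aroma,
• (of bread) dried out, hardened,
• (of air, smells, etc.) stuffy,
• foul-smelling
Apparently it is something very bad...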
82. Stale
[Diagram: a Client, a Primary mongod and a Secondary mongod, each with a Database and an Oplog. Primary: Database {A}, Oplog [Insert A]. Secondary: Database {A}, Oplog [Insert A].]
83. Insert & Replication 2
[Diagram: the Client inserts B. Primary: Database {A, B}, Oplog [Insert A, Insert B]. Secondary unchanged.]
84. Insert & Replication 2
[Diagram: the Client inserts C. Primary: Database {A, B, C}, Oplog [Insert A, Insert B, Insert C]. Secondary unchanged.]
85. Insert & Replication 2
[Diagram: the Client updates A. Primary: Oplog [Insert A, Insert B, Insert C, Update A]. Secondary unchanged.]
86. Insert & Replication 2
[Diagram: Check Oplog. The Secondary checks the Primary's Oplog.]
87. Insert & Replication 2
[Diagram: Sync. Secondary: Database {A, B, C}, Oplog [Insert A, Insert B, Insert C, Update A], matching the Primary.]
89. Stale
[Diagram: Primary: Database {A}, Oplog [Insert A]. Secondary: Database {A}, Oplog [Insert A].]
90. Stale
[Diagram: the Client inserts B. Primary: Database {A, B}, Oplog [Insert A, Insert B]. Secondary unchanged.]
91. Stale
[Diagram: the Client inserts C. Primary: Database {A, B, C}, Oplog [Insert A, Insert B, Insert C]. Secondary unchanged.]
92. Stale
[Diagram: the Client updates A. Primary: Oplog [Insert A, Insert B, Insert C, Update A]. Secondary unchanged.]
93. Stale
[Diagram: the Client updates C. Primary: Oplog [Insert B, Insert C, Update A, Update C]; [Insert A] has been pushed out of the Oplog. Secondary still at Database {A}, Oplog [Insert A].]
94. Stale
[Diagram: the Client inserts D. Primary: Database {A, B, C, D}, Oplog [Insert C, Update A, Update C, Insert D]; [Insert A] and [Insert B] have been pushed out. Secondary unchanged.]
95. Stale
[Diagram: Check Oplog. The Secondary looks for [Insert A] in the Primary's Oplog: not found!!]
96. Stale
[Diagram: Check Oplog. [Insert A] not found!! The Secondary changes to Recovering.]
It cannot get information about [Insert B].
So it cannot sync!!
This is called STALE
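The sequence in the Stale slides can be replayed with a tiny model. This is a simplified sketch, not MongoDB's real implementation: the oplog is modeled as a fixed-capacity queue of operation labels (capacity 4, as the diagrams suggest), whereas the real oplog is a capped collection sized in bytes and positions are tracked by timestamp. The function name `ops_to_apply` is made up for this example.

```python
from collections import deque

def ops_to_apply(source_oplog, last_applied):
    """Return the operations after last_applied in the source oplog.

    Returns None when last_applied has already been pushed out of the
    source oplog, i.e. the reader is stale and cannot catch up.
    """
    entries = list(source_oplog)
    if last_applied not in entries:
        return None
    return entries[entries.index(last_applied) + 1:]

# Capped oplog on the Primary: appending past capacity drops the oldest entry.
primary_oplog = deque(maxlen=4)
for op in ["Insert A", "Insert B", "Insert C", "Update A"]:
    primary_oplog.append(op)

# The Secondary last applied "Insert A" and checks in time: it can sync.
in_time = ops_to_apply(primary_oplog, "Insert A")

# Two more writes push "Insert A" and "Insert B" out of the Primary's oplog.
for op in ["Update C", "Insert D"]:
    primary_oplog.append(op)

# Now the Secondary's position is gone from the oplog: it is stale.
too_late = ops_to_apply(primary_oplog, "Insert A")
```

In this model the only ways to avoid the stale case are a larger capacity or a Secondary that checks more often than the oplog wraps around.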
97. Stale
We have to understand the importance of adjusting the oplog size
We can specify the oplog size as a command-line option,
but only the first time mongod starts on a given dbpath
(which is also specified on the command line).
Also, we cannot change the oplog size
without clearing the dbpath.
Be careful!
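A back-of-the-envelope way to judge an oplog size is to estimate how long it takes for writes to wrap around it. The sketch below uses the 1.2 GB oplog and 1 KB average object size from the Structure slide; the write rate of 500 per second is an assumed figure, and treating each oplog entry as roughly one average object is also an assumption (real entries carry some overhead, and update entries may be smaller).

```python
def oplog_window_seconds(oplog_bytes, avg_entry_bytes, writes_per_sec):
    """Approximate time until the oldest oplog entry is overwritten."""
    return oplog_bytes / (avg_entry_bytes * writes_per_sec)

OPLOG_BYTES = 1.2 * 1024**3   # 1.2 GB oplog (from the Structure slide)
AVG_ENTRY_BYTES = 1024        # ~1 KB average object size (from the same slide)

# Assumed sustained write rate; not a measured value.
window = oplog_window_seconds(OPLOG_BYTES, AVG_ENTRY_BYTES, writes_per_sec=500)
window_minutes = window / 60
```

Under these assumptions the window is roughly 42 minutes: a Secondary that falls further behind than that, or stays down longer, would come back stale.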
100. InitialSync
[Diagram: Primary: Database {A, B, C, D}, Oplog [Insert C, Update A, Update C, Insert D]. The new member is in Startup with an empty Database and Oplog.]
101. InitialSync
[Diagram: Get last Oplog. The new member records [Insert D] in its Oplog and changes to Recovering.]
102. InitialSync
[Diagram: Cloning DB. A, B, C and D are being copied from the Primary; the new member's Database is still empty.]
103. InitialSync
[Diagram: Cloning DB. The new member now has Database {A}.]
104. InitialSync
[Diagram: Cloning DB continues; the new member has Database {A, B}. Meanwhile the Client inserts E. Primary: Database {A, B, C, D, E}, and [Insert E] is added to its Oplog.]
105. InitialSync
[Diagram: Cloning DB complete. The new member has Database {A, B, C, D}, Oplog [Insert D]. Meanwhile the Client updates B. Primary: Oplog [..., Insert D, Insert E, Update B].]
106. InitialSync
[Diagram: Check Oplog. The new member checks the Primary's Oplog.]
107. InitialSync
[Diagram: Sync. The new member applies [Insert E] and [Update B]: Database {A, B, C, D, E}, Oplog [Insert D, Insert E, Update B]. It changes to Secondary.]
108. Additional information
From the source code. (I’ve never verified these...)
A Secondary will try to sync from other Secondaries
when it cannot reach the Primary or
might be stale against the Primary.
There is a small chance that the sync problem does not occur if another
Secondary has an older Oplog or a larger Oplog space than the Primary
109. Sync from another secondary
[Diagram: three members. Primary: Database {A, B, C, D}, Oplog [Insert C, Update A, Update C, Insert D]; [Insert A] and [Insert B] have been pushed out. Lagging Secondary: Database {A}, Oplog [Insert A]. Other Secondary: Database {A, B, C, D}, with an Oplog that still holds [Insert A] and [Insert B].]
110. Sync from another secondary
[Diagram: Check Oplog. The lagging Secondary looks for [Insert A] in the Primary's Oplog: not found!!]
111. Sync from another secondary
[Diagram: Check Oplog against the other Secondary.]
But it is found at the other secondary
So it’s able to sync
112. Sync from the other secondary
[Diagram: Sync. All three members now have Database {A, B, C, D} and the operations Insert A, Insert B, Insert C, Update A, Update C, Insert D.]
But it is found at the other secondary
So it’s able to sync
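The idea in these slides can be sketched as "pick any member whose oplog still contains my last applied operation". The function below is a toy model under that assumption only; it is not MongoDB's actual sync-source selection, and the member names and the helper `pick_sync_source` are hypothetical.

```python
def pick_sync_source(last_applied, candidates):
    """Return the first member whose oplog still holds last_applied.

    candidates maps a member name to its oplog (oldest entry first).
    Returns None when no member can serve the reader, i.e. it is stale
    against every candidate.
    """
    for name, oplog in candidates.items():
        if last_applied in oplog:
            return name
    return None

# State from the slides: the Primary has dropped [Insert A] and [Insert B],
# while the other Secondary still keeps them (older or larger oplog).
members = {
    "primary": ["Insert C", "Update A", "Update C", "Insert D"],
    "secondary2": ["Insert A", "Insert B", "Insert C",
                   "Update A", "Update C", "Insert D"],
}

source = pick_sync_source("Insert A", members)
no_source = pick_sync_source("Insert A", {"primary": members["primary"]})
```

With only the Primary available the lagging member has no usable source, which corresponds to the RS102 "too stale to catch up" case.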
116. Disk space
Data fragments sparsely across the DB files...
We ran into an unfavorable situation in our DBs
It appeared in some of our collections
around 3 months after we launched the services
db.ourcol.storageSize() = 16200727264 (15GB)
db.ourcol.totalSize() = 16200809184
db.ourcol.totalIndexSize() = 81920
db.ourcol.dataSize() = 2032300 (2MB)
What happened to them?!
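A quick calculation on the figures above shows how extreme the fragmentation is. The numbers are copied from the slide; the variable names are just for this sketch.

```python
# Sizes in bytes, as reported on the slide.
storage_size = 16200727264      # storageSize()
total_index_size = 81920        # totalIndexSize()
total_size = 16200809184        # totalSize()
data_size = 2032300             # dataSize()

# totalSize should be storage plus indexes; check the figures agree.
sizes_consistent = (storage_size + total_index_size == total_size)

# How many bytes of disk are allocated per byte of live data.
overhead_ratio = storage_size / data_size

# Share of the allocated storage that actually holds data.
used_fraction = data_size / storage_size
```

The three reported sizes are consistent with each other, and the collection occupies nearly 8,000 times the space of its live data, so only about 0.01% of the allocated storage is in use.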
117. Disk space
Data fragments sparsely across the DB files...
It seems to be caused by a specific pattern of operations:
inserting, updating and deleting over and over.
Anyway, we have to shrink the disk space in use regularly,
just like PostgreSQL’s vacuum.
But how do we do it?
118. Disk space
Shrinking the disk space in use
MongoDB offers some functions for this case.
But we couldn’t use them in our case!
repairDatabase:
Can only be run on the Primary.
It takes a long time and BLOCKS all operations!!
compact:
Can only be run on a Secondary.
It zero-fills the blank space instead of shrinking the disk space.
So it cannot shrink...
119. Disk space
Our countermeasures
For temporary collections:
Issue the drop command regularly.
For other collections:
1. Remove one secondary from the ReplSet.
2. Shut it down.
3. Remove all its DB files.
4. Rejoin it to the ReplSet.
5. Repeat these operations for each secondary, one after another.
6. Step down the Primary. (Change the Primary node)
7. Finally, do steps 1 – 4 on the former Primary.
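The ordering of the steps above can be sketched as a plan generator. It only builds a list of human-readable steps and does not talk to MongoDB; the node names and function names are hypothetical.

```python
def resync_steps(node):
    """Steps 1 to 4 for a single node."""
    return [
        f"remove {node} from the ReplSet",
        f"shut down {node}",
        f"remove all DB files on {node}",
        f"rejoin {node} to the ReplSet",
    ]

def rolling_resync_plan(primary, secondaries):
    """Resync every secondary one at a time, then the former Primary."""
    plan = []
    for node in secondaries:
        plan.extend(resync_steps(node))      # step 5: one after another
    plan.append(f"step down {primary}")      # step 6
    plan.extend(resync_steps(primary))       # step 7
    return plan

plan = rolling_resync_plan("node0", ["node1", "node2"])
```

In practice each node presumably has to finish its initial sync before the next one is taken out; that waiting is an assumption here rather than something stated on the slide.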
121. PHP client
We tried 1.4.4 and 1.2.2
1.4.4:
There are some critical bugs around the connection pool.
We struggled to invalidate broken connections.
I think you should use 1.2.X instead of 1.4.X
1.2.2:
The connection pool seems to be fixed.
But there are 2 critical bugs!
– Socket handle leak
– Useless sleep
However, this version is relatively stable
as long as these bugs are fixed
122. PHP client
We tried 1.4.4 and 1.2.2
https://github.com/crumbjp/Personal
- mongo1.2.2.non-wait.patch
- mongo1.2.2.sock-leak.patch
125. Closing
What’s MongoDB ?
It has very good READ performance.
We can use mongo instead of memcached,
if we can accept the limited write performance.
Die hard!
MongoDB has high availability even under severe stress.
It can be used easily without deep consideration
We can manage to do anything once we get started.
Let’s forget the awkward, trivial things that have bothered us:
How to treat huge data?
How to put in a cache system?
How to keep availability?
And so on....
126. Closing
Keep in mind
Sharding is challenging...
It’s the last resort!
It’s hard to operate. In particular, maintaining the config servers.
[Mongos] is also difficult to keep alive.
I want a way to fail over Mongos.
Mongo is able to run in a poor environment, but...
The ONE thing you should set aside is plenty of disk space
Huge writes are a sensitive matter
Adjust the oplog size carefully
The indexing function is still unfinished
Indexes cannot be applied online