8. 8
MongoDB Architecture
[Diagram: example applications (content repo, IoT sensor backend, ad service, customer analytics, archive) use the MongoDB Query Language (MQL) + native drivers and the MongoDB document data model, which sit on pluggable storage engines: MMAPv1 and WiredTiger supported in MongoDB 3.0, an in-memory engine in beta, and question marks for possible future engines, alongside management and security components. Labeled "Example Future State".]
9. 9
Storage Engine API
• Allows different storage engines to be "plugged in"
  – Different use cases require different performance characteristics
  – MMAPv1 is not ideal for all workloads
  – More flexibility
• Storage engines can be mixed on the same replica set or sharded cluster
• Opportunity for further integration (HDFS, native encryption, hardware optimized, …)
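Each node picks its engine at startup. A minimal mongod.conf sketch (the path is an example; in MongoDB 3.0 the default engine is still MMAPv1):

```yaml
# mongod.conf: select the storage engine for this node
storage:
  dbPath: /data/db
  engine: wiredTiger   # or: mmapv1
```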
11. 2 Storage Engines Available
… and more in the making!
MMAPv1 | WiredTiger
12. 12
MongoDB Architecture
[Same architecture overview diagram as slide 8, shown again for context.]
15. 15
MMAPv1
• Improved concurrency control
• Great performance on read-heavy workloads
• Data & indexes are memory-mapped into the virtual address space
• Data access is paged into RAM
• The OS evicts pages using LRU
• More frequently used pages stay in RAM
(Default storage engine in 3.0)
18. 18
What is WiredTiger?
• Storage engine company founded by BerkeleyDB alumni
• Recently acquired by MongoDB
• Available as a storage engine option in MongoDB 3.0
19. 19
Motivation for WiredTiger
• Take advantage of modern hardware:
  – many CPU cores
  – lots of RAM
• Minimize contention between threads
  – lock-free algorithms, e.g., hazard pointers
  – eliminate blocking due to concurrency control
• Hotter cache and more work per I/O
  – compact file formats
  – compression
35. Cache
[Diagram: a page tree in memory; the root page points to internal pages, which point to leaf pages holding documents (Doc0 to Doc5). Readers and writers traverse standard in-memory pointers between tree levels instead of traditional file-system offset pointers.]
36. Page in Cache
[Diagram: a clean page in cache is just the on-disk page image plus an index. An insert such as db.collection.insert({"_id": 123}) goes into a separate updates structure, turning it into a dirty page. An eviction thread later writes dirty pages back to disk, where they are stored as page images.]
37. Page in Cache
[Same diagram as slide 36. Note: the in-memory index is built during the read that brings the page into cache.]
38. Page in Cache
[Diagram: successive inserts, db.collection.insert({"_id": 123}), then {"_id": 125}, then {"_id": 124}, accumulate over time as update sets Updates', Updates'', Updates''' on the dirty page. Updates are held in a skip list; the on-disk page image and its index are not modified in place.]
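The clean-image-plus-updates split can be sketched like this (a toy model only: `CachedPage` is invented for illustration, and a sorted array stands in for WiredTiger's skip list):

```javascript
// Sketch: a cached page keeps its clean on-disk image immutable and
// collects writes in a separate ordered update list.
class CachedPage {
  constructor(diskImage) {
    this.diskImage = diskImage; // clean image, never modified in place
    this.updates = [];          // [{key, doc}], kept sorted by key
  }
  insert(key, doc) {
    // binary-search the position so lookups stay cheap, as in a skip list
    let lo = 0, hi = this.updates.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (this.updates[mid].key < key) lo = mid + 1; else hi = mid;
    }
    this.updates.splice(lo, 0, { key, doc });
  }
  find(key) {
    // newest data wins: consult the update list before the disk image
    const upd = this.updates.find(u => u.key === key);
    return upd ? upd.doc : this.diskImage[key];
  }
}
```

Inserting _id 123, 125, then 124 keeps the update list ordered (123, 124, 125) without ever touching the clean image, which is what lets eviction reconcile the two later.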
39. Reconciliation
[Diagram: the eviction thread reconciles a dirty page (on-disk page image, index, and accumulated update sets Updates', Updates'', Updates''') into new page images on disk. Reconciliation happens when the cache is full, after X operations, or on a checkpoint.]
40. 40
In-memory performance
• Trees in cache are optimized for in-memory access
• Follow pointers to traverse a tree
  – no locking to access pages in cache
• Keep updates separate from clean data
• Do structural changes (eviction, splits) in background threads
41. 41
Takeaways
• Page and tree access is optimized
• Allows concurrent operations (updates go into skip lists)
• Tuning option you should be aware of:
  – storage.wiredTiger.engineConfig.cacheSizeGB
• Lets you determine the space allocated for your cache
• Makes your deployment and sizing more predictable
https://docs.mongodb.org/manual/reference/configuration-options/#storage.wiredTiger.engineConfig.cacheSizeGB
Note: this limits only the WiredTiger cache; MongoDB as a whole uses more memory than the storage engine alone.
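A minimal config sketch for the option above (the 4 GB value is only an example; size it for your working set and leave headroom for the rest of mongod):

```yaml
# mongod.conf: cap the WiredTiger cache
storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 4
```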
43. 43
What is Concurrency Control?
• Computers have
  – multiple CPU cores
  – multiple I/O paths
• To make the most of the hardware, software has to execute multiple operations in parallel
• Concurrency control has to keep data consistent
• Common approaches:
  – locking
  – keeping multiple versions of data (MVCC)
45. 45
Multiversion Concurrency Control (MVCC)
• Multiple versions of records are kept in cache
• Readers see the version committed before their transaction started
  – MongoDB "yields" turn large operations into small transactions
• Writers can create new versions concurrently with readers
• Concurrent updates to a single record cause write conflicts
  – MongoDB retries with back-off
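The retry-with-back-off behaviour can be sketched as optimistic versioned writes (illustrative only: `store`, `updateWithRetry`, and the version scheme are invented for this sketch, not MongoDB internals):

```javascript
// Each record carries a version; a writer commits only if the version it
// read is still current, otherwise it backs off and retries.
function updateWithRetry(store, key, mutate, maxRetries = 5) {
  let backoffMs = 1;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const { version, value } = store.read(key); // committed snapshot
    const newValue = mutate(value);
    if (store.commit(key, version, newValue)) return newValue;
    backoffMs *= 2; // write conflict: in real code, sleep backoffMs here
  }
  throw new Error("too many write conflicts");
}

// Tiny versioned store standing in for the cache
const store = {
  data: new Map([["x", { version: 0, value: 0 }]]),
  read(key) { return { ...this.data.get(key) }; },
  commit(key, expectedVersion, value) {
    const rec = this.data.get(key);
    if (rec.version !== expectedVersion) return false; // conflict detected
    this.data.set(key, { version: rec.version + 1, value });
    return true;
  },
};
```

A commit with a stale version simply fails instead of blocking, which is the property that lets readers and writers proceed without locks.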
48. 48
WiredTiger Page IO
[Diagram: the document data model sits on the WiredTiger engine, whose layers are transactions, snapshots, schema & cursors, row storage, the cache with page read/write, the write-ahead log (WAL), and block management, backed by DB files and the journal on disk. A dirty page in cache (on-disk page image, index, Updates''') is reconciled to a page on disk, with page allocation and splitting.]
49. 49
WiredTiger Page IO
[Same diagram as slide 48; here reconciliation of the dirty page produces new pages Page' and Page'' via page allocation and splitting.]
50. 50
WiredTiger Page IO
[Same diagram as slide 48; pages are compressed during reconciliation before being written to disk.]
Compression:
– snappy (default)
– zlib
– none
51. 51
Compression
• WiredTiger uses snappy compression by default in MongoDB
• Supported compression algorithms:
  – snappy [default]: good compression, low overhead
  – zlib: better compression, more CPU
  – none
• Indexes also use prefix compression
  – stays compressed in memory
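The compressor can also be tuned per collection at creation time. A mongo shell sketch (assumes a running 3.0+ server using WiredTiger; "logs" is an example collection name):

```javascript
// Override the default snappy block compressor for one collection
db.createCollection("logs", {
  storageEngine: {
    wiredTiger: { configString: "block_compressor=zlib" }
  }
});
```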
52. 52
Checksums
• A checksum is stored with every uncompressed page
• Checksums are validated during page read
  – detects filesystem corruption, random bit flips
• WiredTiger stores the checksum with the page address (typically in a parent page)
  – extra safety against reading a stale page image
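The scheme can be sketched as follows (a toy model: `writePage`/`readPage` and the FNV-1a hash are stand-ins for WiredTiger's real on-disk format and checksum):

```javascript
// FNV-1a hash, a simple stand-in for a real page checksum
function checksum(bytes) {
  let h = 0x811c9dc5;
  for (const b of bytes) {
    h ^= b;
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// The parent keeps { addr, checksum } next to the child's address,
// not inside the child page itself
function writePage(disk, addr, bytes) {
  disk.set(addr, bytes);
  return { addr, checksum: checksum(bytes) };
}

function readPage(disk, ref) {
  const bytes = disk.get(ref.addr);
  if (checksum(bytes) !== ref.checksum) {
    // catches corruption, bit flips, or a stale page image at this address
    throw new Error("checksum mismatch at page " + ref.addr);
  }
  return bytes;
}
```

Because the expected checksum lives with the address in the parent, a stale-but-internally-valid page at that address still fails the read, which a checksum stored inside the page could not detect.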
54. 54
Takeaway
• Main feature that impacts the vast majority of MongoDB projects and use cases
• CPU bound
• Different algorithms for different workloads
  – Collection-level compression tuning
• Smaller data means faster I/O
  – Not only improves the disk space footprint
  – Also I/O
  – And index traversal
56. 56
Index per Directory
• Use different drives for collections and indexes
  – Parallelization of write operations
  – MMAPv1 only allows directoryPerDB
[Diagram: one mongod with Collection A and its "name" index, and Collection B and its "_id" index, stored in separate directories.]
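A config sketch for the WiredTiger option that splits indexes into their own directory, which can then be mounted or symlinked onto a separate drive (paths are examples):

```yaml
# mongod.conf: keep index files apart from collection files
storage:
  dbPath: /data/db
  wiredTiger:
    engineConfig:
      directoryForIndexes: true
```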
57. 57
Consistency without Journaling
• MMAPv1 uses a write-ahead log (journal) to guarantee consistency
• WiredTiger doesn't strictly need one: it makes no in-place updates
  – Its write-ahead log is committed at checkpoints and with j:true
  – Better for insert-heavy workloads
  – Journaling is enabled by default!
• Replication also guarantees durability
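The j:true knob mentioned above looks like this in the mongo shell (assumes a running server; "orders" is an example collection):

```javascript
// Ask the server to commit this write to the write-ahead log
// before acknowledging it
db.orders.insert({ _id: 1, total: 9.99 }, { writeConcern: { j: true } });
```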
60. 60
What's next for WiredTiger?
• Tune for (many) more workloads
  – avoid stalls during checkpoints with 100 GB+ caches
  – make capped collections (including the oplog) more efficient
• Adding encryption
• More advanced transactional semantics in the storage engine API
61. 61
Updates and Upgrades
• What you can't do:
  – Copy database files between storage engines
  – Just restart with the same dbpath under a different engine
• What works:
  – Initial sync from a replica set
  – mongodump/mongorestore
• Rolling upgrade of a replica set to WiredTiger:
  – Shut down a secondary
  – Delete its dbpath
  – Relaunch with --storageEngine=wiredTiger
  – Wait for the resync
  – Roll over to the next member
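The rolling-upgrade steps above as a shell sketch for one secondary (the path and replica set name are examples; repeat per member, stepping down the primary last):

```shell
# 1. Stop the secondary cleanly (Linux; use your service manager if you have one)
mongod --shutdown --dbpath /data/db

# 2. Clear the old MMAPv1 files
rm -rf /data/db/*

# 3. Relaunch the member with WiredTiger
mongod --dbpath /data/db --replSet rs0 --storageEngine wiredTiger

# 4. Wait for the initial sync to finish before touching the next member
```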