2. Information
•Tweet: Hashtag #jawsdays #ijaws
•Please register for ijaws on Doorkeeper (next meetup in mid-April)
•There’s the JOB board behind the wall
4. Introducing MongoDB
•Hybrid of NoSQL and RDB
•Scales easily (up to a certain point)
•Stores JSON documents as ‘BSON’
•Has secondary indexes (on any part of a JSON document) and a query optimizer
•Replication, Sharding ready
5. Making MongoDB run fast on AWS
•You have to understand:
•Its memory management architecture
•The workload pattern of your application
•The size of your ‘HOT’ data
6. What’s the ‘HOT’ data?
•‘Hot’ data is the data that is accessed frequently
•Ex: If you simply write data such as access logs and transfer it somewhere else, the ‘hot’ spot can be very small
•If the collection has indexes, a single write can make many places hot
7. MongoDB does not manage memory
•Most DBMSs have a built-in memory management system (MMS), but MongoDB doesn’t.
•MongoDB accesses its database files through memory-mapped files and lets the OS manage the buffer
[Figure: a traditional RDB (App, Memory Buffer, DB Files) compared with MongoDB (App, Memory-Mapped DB Files, OS)]
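To make the idea of memory-mapped files concrete, here is a minimal sketch using Python's standard `mmap` module. It is not MongoDB's code; the helper names `read_via_mmap` and `demo` are made up for illustration. It only shows that a process can read part of a file through a mapping while the OS decides which pages stay in memory.

```python
import mmap
import os
import tempfile


def read_via_mmap(path, offset, length):
    """Read `length` bytes at `offset` through a memory mapping.

    The process does no buffering of its own here: the OS pages the file
    in on demand and keeps recently touched pages in its cache.
    """
    with open(path, "rb") as f:
        # Length 0 maps the whole (non-empty) file, read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[offset:offset + length]


def demo():
    """Write a small file, then read one 'hot' record back via mmap."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"A" * 4096 + b"hot-record" + b"B" * 4096)
        return read_via_mmap(path, 4096, 10)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

Only the pages that are actually touched need to be resident, which is why the size of the ‘hot’ data matters more than the total size of the files.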
8. The Rules of Thumb about Memory
•Give the OS enough memory to hold the ‘HOT’ data
•Don’t forget about the indexes
•Use dedicated EC2 instances
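The rules of thumb above can be turned into a rough back-of-the-envelope check. The sketch below is an illustration only: the function name `fits_in_memory` and the 20% reserve for the OS and other processes are assumptions, not figures from the talk.

```python
GIB = 1024 ** 3


def fits_in_memory(hot_data_bytes, index_bytes, ram_bytes, os_reserve_fraction=0.2):
    """Return True if the hot data plus the indexes fit in the usable RAM.

    `os_reserve_fraction` is a hypothetical safety margin left for the OS
    and other processes; tune it for your own instances.
    """
    usable = ram_bytes * (1 - os_reserve_fraction)
    return hot_data_bytes + index_bytes <= usable


if __name__ == "__main__":
    # 4 GiB of hot data and 2 GiB of indexes on an 8 GiB instance
    print(fits_in_memory(4 * GIB, 2 * GIB, 8 * GIB))
    # 6 GiB of hot data and 2 GiB of indexes on the same instance
    print(fits_in_memory(6 * GIB, 2 * GIB, 8 * GIB))
```

Counting the indexes separately is a reminder that they compete with the documents for the same memory.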
9. Keep your data safe with Replication
•Using a replica set, you can easily distribute your data to many places
•You have choices for keeping your data safe from crashes
•EBS or instance store: a trade-off between cost, safety, and performance
[Figure: a replica set with one Primary and two Secondaries]
Try a MongoDB replica set with:
https://bitbucket.org/tamagawa_ryuji/mongodb_replicaset_playground_on_vagrant
10. Storage Performance Evaluated
• Converted Wikipedia-ja’s page data (about 1,700,000 documents) to JSON
• Wrote them to MongoDB on EC2 from another instance
• The data writer is a simple Python application using the pymongo driver, running 4 processes
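A writer of this kind might look roughly like the sketch below. This is not the script used in the evaluation: the connection URI, the database and collection names (`wikipedia`, `pages`), and the helper names are assumptions for illustration. It expects one JSON document per input line and needs pymongo installed and a reachable MongoDB server to actually write.

```python
import json
from multiprocessing import Pool

MONGO_URI = "mongodb://localhost:27017"  # assumption: replace with your EC2 host
NUM_WORKERS = 4                          # the talk mentions 4 processes


def split_round_robin(items, n):
    """Split `items` into `n` lists, dealing them out in turn."""
    return [items[i::n] for i in range(n)]


def write_chunk(json_lines):
    """Insert one chunk of JSON lines; runs inside a worker process."""
    # Imported here so that each process creates its own client.
    from pymongo import MongoClient

    client = MongoClient(MONGO_URI)
    try:
        docs = [json.loads(line) for line in json_lines]
        if docs:
            # Hypothetical database and collection names.
            client["wikipedia"]["pages"].insert_many(docs, ordered=False)
        return len(docs)
    finally:
        client.close()


def load(json_lines):
    """Write all lines using NUM_WORKERS processes; return the document count."""
    chunks = split_round_robin(json_lines, NUM_WORKERS)
    with Pool(NUM_WORKERS) as pool:
        return sum(pool.map(write_chunk, chunks))
```

Calling `load(lines)` from under an `if __name__ == "__main__":` guard keeps the worker processes from re-running the loader on platforms that spawn rather than fork.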