2. Trends
• More data
• Complex data
• Cloud computing + computer architecture trends ->
– Many commodity-type servers rather than one large server; commodity-type storage
• Expectations of fast application start->deploy cycles ->
– Agile software development methodologies / iteration
– Service-oriented architectures
3. Wants
• Horizontal scaling
• Ability to store complex data and deal with the malleability of real-world schemas without pain
• Works with my (object-oriented) programming language without friction
• Works with my frequent release cycles (iteration) without friction
• High single-server performance
• Cloud-friendly
4. Wants
• Horizontal scaling
• Ability to store complex data and deal with the malleability of real-world schemas without pain
• Works with my (object-oriented) programming language without friction
• Works with my frequent release cycles (iteration) without friction
• High single-server performance
• Cloud-friendly
We need: a way to scale out, and a new data model.
5. Approach
• A new data model gives us a way to
scale, and a way to solve our development
wants
• Goals for data model:
– Maintain data separation from code
– Low friction and low mapping cost to our programming language
– Malleability for adapting to the constant changes of the real world
– Ability to deal with polymorphic data (see the sketch below)
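A minimal mongo shell sketch of the polymorphism goal – the collection and field names here are hypothetical illustrations, not from the deck:

// Two differently shaped documents coexist in one collection,
// so polymorphic data needs no schema migration.
db.products.insert({ type: "book",  title: "Ancient Egypt", isbn: "00e8da9b" })
db.products.insert({ type: "album", artist: "Miles Davis",  tracks: 9 })

// One query spans both shapes; absent fields simply don't match.
db.products.find({ type: "book" })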
6. Approach
• Rich documents + Partitioning
– Each document lives on one shard (partitioning)
• The catch:
– No complex transactions (see the sketch below)
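A minimal mongo shell sketch of rich documents plus partitioning – the database, collection, shard key, and fields are hypothetical:

// Shard a collection so each order document lives on exactly one shard.
sh.enableSharding("shop")
sh.shardCollection("shop.orders", { customer_id: 1 })

// A rich document keeps an order and its line items together...
db.orders.insert({
  customer_id: 42,
  status: "open",
  items: [ { sku: "asdf23f", qty: 1 } ]
})

// ...so this update touches a single document and is atomic on its shard –
// which is how you live without multi-document transactions.
db.orders.update(
  { customer_id: 42, status: "open" },
  { $push: { items: { sku: "qwer88x", qty: 2 } } }
)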
8. Wants
• Horizontal scaling
• Ability to store complex data and deal with the malleability of real-world schemas without pain
• Works with my (object-oriented) programming language without friction
• Works with my frequent release cycles (iteration) without friction
• High single-server performance
• Cloud-friendly
• Caveats / trade-offs:
– No complex transactions
Thus implying use cases…
9. When should you consider
using MongoDB?
• You find yourself coding around database performance issues – for example, adding lots of caching.
• You are storing data in flat files.
• You are batch processing yet you need real-time results.
• You are doing agile development, e.g., Scrum.
• Your data is complex to model in a relational DB – e.g., a complex derivative security, or electronic health records.
• Your project is late :-)
• You are forced to use expensive SANs, proprietary servers, or proprietary networks for your existing DB.
• You are deploying to a public or private cloud.
10. When should you use
something else?
• Problems requiring SQL.
• Systems with a heavy emphasis on complex transactions, such as banking and accounting systems.
• Traditional non-realtime data warehousing (sometimes). Traditional relational data warehouses and variants (columnar relational) are well suited to certain business intelligence problems – especially if you need SQL for your client tool (e.g. MicroStrategy). Exceptions where MongoDB is a good fit:
• cases where the analytics are realtime
• cases where the data is very complicated to model relationally
• cases where the data volume is huge
• cases where the source data is already in a MongoDB database
12. [Diagram: the data tier over time – "The beginning": the RDBMS alone; "Last 10 years": RDBMS + data warehouse; "Today": RDBMS + data warehouse + NoSQL DB]
14. High Volume Data Feeds
Machine Generated Data
• More machines, more sensors, more data
• Variably structured (see the sketch below)
Stock Market Data
• High frequency trading
Social Media Firehose
• Multiple sources of data
• Each changes their format constantly
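A minimal sketch of "variably structured" feeds in the mongo shell – the collection and sensor fields are hypothetical:

// Readings of entirely different shapes land in the same collection,
// with no schema change when a new sensor type appears.
db.readings.insert({ sensor: "thermo-17", temp_c: 21.4, ts: new Date() })
db.readings.insert({ sensor: "gps-03", location: [ -122.41, 37.77 ], ts: new Date() })
db.readings.insert({ sensor: "cam-09", motion: true, frames: 12, ts: new Date() })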
15. Operational Intelligence
Ad Targeting
• Large volume of state about users
• Very strict latency requirements
Real Time Dashboards
• Expose report data to millions of customers
• Report on large volumes of data
• Reports that update in real time (see the counter sketch below)
Social Media Monitoring
• What are people talking about?
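One common pattern behind real-time dashboards is the atomic, upserted counter; a minimal sketch, assuming a hypothetical pageviews collection:

// Increment a per-page, per-hour counter; upsert creates the
// document the first time a page/hour combination is seen.
db.pageviews.update(
  { page: "/products/laptop", hour: "2012-03-07T18" },
  { $inc: { views: 1 } },
  { upsert: true }
)

// The dashboard reads the live counters directly – no batch job.
db.pageviews.find({ hour: "2012-03-07T18" }).sort({ views: -1 })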
16. Intuit relies on a MongoDB-powered real-time analytics tool for small businesses to derive interesting and actionable patterns from their customers' website traffic

Problem
• Intuit hosts more than 500,000 websites
• Wanted to collect and analyze data to recommend conversion and lead-generation improvements to customers
• With 10 years' worth of user data, it took several days to process the information using a relational database

Impact
• In one week Intuit was able to become proficient in MongoDB development
• Developed application features more quickly for MongoDB than for relational databases
• MongoDB was 2.5 times faster than MySQL

"We did a prototype for one week, and within one week we had made big progress. Very big progress. It was so amazing that we decided, 'Let's go with this.'" – Nirmala Ranganathan, Intuit
17. Marketing Personalization
Rich profiles collecting multiple complex actions (funnel: 1 see ad, 2 see ad, 3 click, 4 convert)
• Scale out to support high throughput of tracked activities
• Dynamic schemas make it easy to track vendor-specific attributes
• Indexing and querying to support matching and frequency capping (see the sketch below)

{ cookie_id: "1234512413243",
  advertiser: {
    apple: {
      actions: [
        { impression: "ad1", time: 123 },
        { impression: "ad2", time: 232 },
        { click: "ad2", time: 235 },
        { add_to_cart: "laptop", sku: "asdf23f", time: 254 },
        { purchase: "laptop", time: 354 }
      ]
    }
  }
}
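A minimal sketch of maintaining and querying such a profile – the collection name is hypothetical; the fields follow the document above, and the frequency-capping query is an illustrative example:

// Append a newly observed action to the visitor's profile.
db.profiles.update(
  { cookie_id: "1234512413243" },
  { $push: { "advertiser.apple.actions": { click: "ad2", time: 235 } } }
)

// Frequency capping: index the embedded impression field, then
// find profiles that have already seen a given ad.
db.profiles.ensureIndex({ "advertiser.apple.actions.impression": 1 })
db.profiles.find({ "advertiser.apple.actions.impression": "ad2" })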
18. Meta Data Management
Data Archiving
• Meta data about artifacts
• Content in the library
Information Discovery
• Have data sources that you don't have access to
• Store meta-data on those stores and figure out which ones have the content
Biometrics
• Retina scans
• Fingerprints
19. Meta data
Indexing and rich query API for easy searching and sorting (see the index sketch below)

db.archives.find({ "country": "Egypt" });

Flexible data model for similar, but different objects

{ type: "Artefact",
  medium: "Ceramic",
  country: "Egypt",
  year: "3000 BC"
}

{ ISBN: "00e8da9b",
  type: "Book",
  country: "Egypt",
  title: "Ancient Egypt"
}
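Without an index the find above scans the whole collection; a minimal sketch of backing it with one (the sort field is an illustrative choice):

// Index the field shared by both document shapes, then search and sort.
db.archives.ensureIndex({ country: 1 })
db.archives.find({ country: "Egypt" }).sort({ year: 1 })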
20. Shutterfly uses MongoDB to safeguard more than six billion images for millions of customers in the form of photos and videos, and turn everyday pictures into keepsakes

Problem
• Managing 20TB of data (six billion images for millions of customers), partitioning by function
• Home-grown key-value store on top of their Oracle database offered sub-par performance
• Codebase for this hybrid store became hard to manage
• High licensing and hardware costs

Why MongoDB
• JSON-based data structure
• Provided Shutterfly with an agile, high-performance, scalable solution at a low cost
• Works seamlessly with Shutterfly's services-based architecture

Impact
• 500% cost reduction and 900% performance improvement compared to previous Oracle implementation
• Accelerated time-to-market for nearly a dozen projects on MongoDB
• Improved performance by reducing average insert latency from 400ms to 2ms

The "really killer reason" for using MongoDB is its rich JSON-based data structure, which offers Shutterfly an agile approach to develop software. With MongoDB, the Shutterfly team can quickly develop and deploy new applications, especially Web 2.0 and social features. – Kenny Gorman, Director of Data Services
21. Content Management
News Site
• Comments and user-generated content
• Personalization of content and layout
Multi-Device Rendering
• Generate layout on the fly for each device that connects
• No need to cache static pages
Sharing
• Store large objects
• Simple modeling of metadata
22. Content Management
Flexible data model for similar, but different objects
Geo spatial indexing for location based searches (see the sketch below)
GridFS for large object storage
Horizontal scalability for large data sets

{ camera: "Nikon d4",
  location: [ -122.418333, 37.775 ]
}

{ camera: "Canon 5d mkII",
  people: [ "Jim", "Carol" ],
  taken_on: ISODate("2012-03-07T18:32:35.002Z")
}

{ origin: "facebook.com/photos/xwdf23fsdf",
  license: "Creative Commons CC0",
  size: {
    dimensions: [ 124, 52 ],
    units: "pixels"
  }
}
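A minimal sketch of the geospatial piece, using the photo documents above – the 2d index and $near operator are the classic mongo shell API; the collection name is assumed:

// Index the [longitude, latitude] pairs for location-based searches.
db.photos.ensureIndex({ location: "2d" })

// Find photos taken near downtown San Francisco.
db.photos.find({ location: { $near: [ -122.418333, 37.775 ] } }).limit(10)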
23. Wordnik uses MongoDB as the foundation for its "live" dictionary that stores its entire text corpus – 3.5T of data in 20 billion records

Problem
• Analyze a staggering amount of data for a system built on a continuous stream of high-quality text pulled from online sources
• Adding too much data too quickly resulted in outages; tables locked for tens of seconds during inserts
• Initially launched entirely on MySQL but quickly hit performance roadblocks

Why MongoDB
• Migrated 5 billion records in a single day with zero downtime
• MongoDB powers every website request: 20m API calls per day
• Ability to eliminate the memcached layer, creating a simplified system that required fewer resources and was less prone to error

Impact
• Reduced code by 75% compared to MySQL
• Fetch time cut from 400ms to 60ms
• Sustained insert speed of 8k words per second, with frequent bursts of up to 50k per second
• Significant cost savings and 15% reduction in servers

Life with MongoDB has been good for Wordnik. Our code is faster, more flexible and dramatically smaller. Since we don't spend time worrying about the database, we can spend more time writing code for our application. – Tony Tam, Vice President of Engineering and Technical Co-founder
In the beginning, there was the RDBMS, and if you needed to store data, that was what you used. But the RDBMS is performance-critical, and BI workloads tended to suck up system resources. So we carved off the data warehouse as a place to store a copy of the operational data for use in analytical queries. This offloaded work from the RDBMS and bought us cycles to scale higher. Today, we're seeing another split: a new set of workloads is saturating the RDBMS, and these are being carved off into yet another tier of our data architecture – the NoSQL store.