- MongoDB enables businesses to scale databases horizontally on commodity hardware or cloud infrastructure, handling terabytes or petabytes of data without downtime. Its flexible data model also makes it simple to adapt, adding new data types and sources as needs change. Additionally, MongoDB supports rich querying across diverse and changing data sets in real time, unlocking insights from data. Case studies show how MongoDB has helped companies improve performance, innovate faster, and gain competitive advantages over relational databases.
3. Your Industry Has Changed
• Business: upfront → subscribe
• Applications: years → months
• Customers: PC → mobile
• Engagement: ads → social
• Infrastructure: servers → cloud
4. Your Data Has Changed
• 90% of the world's data was created in the last 2 years
• 80% of enterprise data is unstructured
• Unstructured data is growing 2x faster than structured data
Sources: IBM, Gartner 2012
6. How Do You Manage Big Data?
Top Big Data Issues*:
• Data Variety (68%) – diverse, streaming, or new data types
• Data Volume (15%) – greater than 100 TB
• Other (17%) – less than 100 TB
"Of Gartner's '3Vs' of big data (volume, velocity, variety), the variety of data sources is seen by our clients as both the greatest challenge and the greatest opportunity." – Forrester, 2014
* From a Big Data executive summary of 50+ executives from Fortune 100 companies and government organizations
10. Your Database Must Not Throttle Success
• Horizontal scale – on commodity hardware or in the cloud – is mandatory
• Most apps require TBs of data, but you want PBs of headroom
Examples:
• An ambitious startup scaled to 1M+ users in weeks, sending hundreds of millions of emails per month
• A global media company scaled MongoDB to 4.5 PB across public cloud infrastructure
• Automated failover and the ability to add nodes mean scale without downtime; one customer was "blown away by MongoDB's performance"
11. Iterate…
A powerful predictive analytics system that started on the Chief Data Officer's laptop.

Problem
• Diverse data from 30+ different government agencies
• Limited budget – had to prove the system to justify budget
• Had to be able to integrate geospatial data with other highly unstructured data

Why MongoDB
• Scales from a single node to many, many servers
• Easy-to-manage dynamic data model enables limitless growth
• Support for ad hoc queries, geospatial

Results
• Award-winning government project
• Cost effective while delivering exceptional performance
• Easily extended to incorporate new data sources
14. From Complexity to Simplicity
What an RDBMS spreads across multiple joined tables fits in a single MongoDB document:

{
  _id: ObjectId("4c4ba5e5e8aabf3"),
  employee_name: "Dunham, Justin",
  department: "Marketing",
  title: "Product Manager, Web",
  report_up: "Neray, Graham",
  pay_band: "C",
  benefits: [
    { type: "Health", plan: "PPO Plus" },
    { type: "Dental", plan: "Standard" }
  ]
}
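To make the contrast concrete, here is a minimal sketch in plain Python (no database required, names taken from the slide's document) of building and reading that employee record. The embedded `benefits` array replaces what an RDBMS would model as a separate table requiring a join.

```python
# The slide's employee record as a plain Python dict. In an RDBMS the
# benefits would live in a separate table and require a join; here they
# are embedded directly in the document.
employee = {
    "employee_name": "Dunham, Justin",
    "department": "Marketing",
    "title": "Product Manager, Web",
    "report_up": "Neray, Graham",
    "pay_band": "C",
    "benefits": [
        {"type": "Health", "plan": "PPO Plus"},
        {"type": "Dental", "plan": "Standard"},
    ],
}

# Reading embedded data: no join, just traverse the document.
dental = next(b["plan"] for b in employee["benefits"] if b["type"] == "Dental")
```

With the pymongo driver and a running server, the same dict could be stored as-is via `collection.insert_one(employee)` – no schema migration needed first.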
15. Your Database Must Make It Simple to Add New Data Sources and Types
• Break free of schema servitude: focus on your app, not object-relational mapping and rigid schema design
Examples:
• One company struggled for years with an RDBMS because schema customization was too difficult; MongoDB "added flexibility and easy scalability"
• Another shaved years off projects to under 4 months and lowered TCO with no security compromise: "Devs build apps w/out becoming DBAs"
• A third dramatically decreased drug development time by making it easy to add new data types; MongoDB integrates seamlessly with its RDBMS
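The dynamic-schema idea can be sketched in plain Python (no database; field names are illustrative): documents in the same collection need not share identical fields, so a new data source can contribute new fields without an ALTER TABLE-style migration.

```python
# Existing documents in a collection (modeled here as a list of dicts).
employees = [
    {"employee_name": "Dunham, Justin", "department": "Marketing"},
]

# A new data source starts supplying office locations; we simply store
# the extra field -- no migration, no downtime, older docs untouched.
employees.append(
    {
        "employee_name": "Neray, Graham",
        "department": "Marketing",
        "office": {"city": "New York", "floor": 5},  # new, optional field
    }
)

# Queries treat the new field as optional on older documents.
cities = [e.get("office", {}).get("city") for e in employees]
```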
16. Diverse Data…
Single view of customer data (virtually impossible with an RDBMS).

Problem
• 70+ disparate data sources (mainframe, RDBMS)
• RDBMS could not support centralized data management and federation of information services

Why MongoDB
• Document model allows easy integration of diverse data sources
• Fast, easy scalability
• Full query language

Results
• Delivers high scalability, fast performance, and easy maintenance, while keeping support costs low
• Successful POC in 3 weeks; in production within 90 days
• Single view of the customer (improved customer experience, improved sales)
• 71% less expensive
19. Your Database Must Enable Rich Querying of Your Data
• Storing data for fast access isn't enough – the questions you can ask of it matter most
• Your database must support rich queries, indexing, aggregation, and search across multi-structured, rapidly changing data sets in real time
Examples:
• Transformed cumbersome data storage into high-performance data analytics
• A MongoDB-based Internet of Things platform that takes advantage of ever-changing sensor data and runs analytics against it
• Runs a unified data store serving hundreds of diverse web properties on MongoDB
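The multi-attribute query style can be sketched in plain Python (no database). `match` and `get_path` below are hypothetical helpers written for illustration; in MongoDB itself this is roughly what `collection.find(query)` does, backed by indexes, including dotted paths that reach into embedded arrays.

```python
def get_path(doc, path):
    """Resolve a dotted path like "benefits.type", fanning out across lists."""
    values = [doc]
    for key in path.split("."):
        next_values = []
        for v in values:
            items = v if isinstance(v, list) else [v]
            for item in items:
                if isinstance(item, dict) and key in item:
                    next_values.append(item[key])
        values = next_values
    return values

def match(doc, query):
    """True if every attribute in the query document matches the document."""
    return all(value in get_path(doc, path) for path, value in query.items())

doc = {
    "department": "Marketing",
    "benefits": [{"type": "Health"}, {"type": "Dental"}],
}
ok = match(doc, {"department": "Marketing", "benefits.type": "Dental"})
```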
20. Multi-attribute Queries
50% increase in paid subscribers due to a 95% performance improvement over the RDBMS.

Problem
• RDBMS couldn't handle high-volume, bi-directional searches
• Couldn't persist a billion-plus matches
• RDBMS was difficult to manage in production (schema changes were painful; hard to scale)

Why MongoDB
• Ease of management: auto-scaling, auto-sharding, no downtime
• Complex queries across 250+ different attributes
• Exceptional performance
• Ability to dynamically update the schema without complex schema redesign

Results
• 95% performance improvement: 3 billion matches daily using 60 million complex queries across 250+ attributes
• Big increase in customer satisfaction and paid subscribers
• Significantly less expensive
21. “I have not failed. I've just found 10,000 ways that won't work.”
― Thomas A. Edison
23. Build on NoSQL's Largest Ecosystem
• 8,000,000+ MongoDB downloads
• 1,000+ customers across all industries; hundreds of thousands of users
• 600+ technology and services partners
• 35,000+ MongoDB Management Service (MMS) users
• 35,000+ MongoDB User Group members
• 200,000+ online education registrants
Good to start by asking who in the audience is:
• here for the first time
• a current user
• etc.
Talk about the relational database and how incredible it was for our transactional systems of record.
One thing hasn’t changed: data means money. Your business depends on data more than ever before. Therefore, finding ways to optimize productivity with your data is crucial.
There is no such thing as NoSQL. Not as we tend to think of it, anyway. While NoSQL was born as a movement away from rigid relational data models so web giants could embrace Big Data with scale-out architectures, the term has come to categorize a set of databases that are more different than they are the same.
This broad categorization doesn’t work. It’s not helpful.
While we at MongoDB still sometimes refer to NoSQL, we try to do it sparingly, given its propensity to confuse rather than enlighten.
Deconstructing NoSQL
Today the NoSQL category includes a cacophony of over 100 document, key-value, wide-column and graph databases. Each of these database types comes with its own strengths and limits. Each differs markedly from the others, with disparate models and capabilities relative to data storage, querying, consistency, scalability and high availability.
Comparing a document database to a key-value store, for example, is like comparing a smartphone to a beeper. A beeper is exceptionally useful for getting a simple message from Point A to Point B. It's fast. It's reliable. But it's nowhere near as functional as a smartphone, which transmits messages just as quickly and reliably and can do so much more.
Both are useful, but the smartphone fits a far broader range of applications than the more limited beeper.
As such, organizations searching for a database to tackle Gartner’s three V’s of Big Data -- volume, velocity and variety -- won’t find an immediate answer in “NoSQL.” Instead, they need to probe deeper for a modern database that can handle all of their Big Data application requirements.
One of these requirements is, of course, the ability to handle large volumes of data, the original impetus behind the NoSQL movement. But the ability to handle volume, or scale, is something all databases categorized as “NoSQL” share. MongoDB, for example, counts among its users those who regularly store petabytes of data, perform over 1,000,000 operations per second and clusters that exceed 1,000 nodes.
A modern database, however, must do more than scale. Scalability is table stakes. It also must enable agility to accelerate development and time to market. It must allow organizations to iterate as they embrace new business requirements.
And a modern database must, above all, enable enterprises to take advantage of rapidly growing data variety. Indeed the “greatest challenge and opportunity” for enterprises, as Forrester notes, is managing a “variety of data sources,” including data types and sources that may not even exist today.
In general, all so-called NoSQL databases are much more helpful than relational databases at storing a wide variety of data types and sources, including mobile device, geospatial, social and sensor data. But the hallmark of a modern database is its ability to let organizations do useful things with their data.
eHarmony:
Started with a simple architecture running Oracle. As their data volumes ballooned, they found they couldn’t perform high volume, bi-directional searches. And the second problem was that they could no longer persist a billion-plus potential matches at scale.
They turned to Postgres running on a bunch of high-end, expensive servers. Each one of eHarmony’s compatibility matching platform applications was co-located with a local Postgres database server that stored a complete copy of all searchable data, so that it could perform queries locally, hence reducing the load on the central database.
This worked until the data size became bigger, and the data model became more complex.
Compounding the problem, every time they needed to make a schema change, such as adding a new attribute to the data model, it was a complete nightmare for both their engineering and ops teams. They would spend several hours extracting a data dump from Postgres, massaging the data, copying it to multiple servers and machines, and reloading it back into Postgres, which translated into high operational cost to maintain the solution. And it was far worse if that attribute needed to be part of an index.
They decided they needed something different.
They didn’t want to repeat the same mistake, that is, a decentralized SQL solution based on Postgres. It had to support auto-scaling.
They also wanted a solution that didn’t require that they spend a lot of time maintaining the database, like adding a new shard, a new cluster, a new server to the cluster, and so forth. They needed auto-sharding.
As their big data got bigger, they wanted to be able to spread the data across multiple shards, on multiple physical servers, to maintain high throughput performance without any server upgrade. They also needed the database to auto-balance data to ensure even distribution across shards seamlessly. In addition, the new database had to support fast, complex, multi-attribute queries with high throughput.
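The core idea behind auto-sharding can be sketched in a few lines of plain Python (a toy model, not MongoDB's implementation): hashing a shard key picks which shard stores a document, so data spreads across servers without application-level routing. MongoDB's real balancer additionally migrates chunks between shards to keep the distribution even as data grows.

```python
import hashlib

NUM_SHARDS = 4

def shard_for(shard_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Deterministically map a shard key to a shard via a hash."""
    digest = hashlib.md5(shard_key.encode()).hexdigest()
    return int(digest, 16) % num_shards

# Route 1,000 documents to shards purely by hashing their keys.
shards = {i: [] for i in range(NUM_SHARDS)}
for user_id in (f"user-{n}" for n in range(1000)):
    shards[shard_for(user_id)].append(user_id)

# Every document lands on exactly one deterministic shard.
total = sum(len(v) for v in shards.values())
```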
So eHarmony chose MongoDB. Result?
eHarmony is now able to generate over 3 billion potential matches each day, which depends on over 60 million complex queries across 250+ attributes each day. Their systems store and manage roughly 200 million photos and another 4B+ relationship questionnaires, comprising many tens of terabytes of data.
Whereas eHarmony's RDBMS solution took two weeks to reprocess all of the people in its database, MongoDB has cut that by more than 95%, to under 12 hours, analyzing 3 billion-plus potential matches every single day. As a result, eHarmony now sees a 30% increase in two-way communication, a 50% increase in paid subscribers, and a 60%+ increase in traffic growth in terms of unique visitors and visits.
Big Data is new, and you’re likely going to fail as you start. But it’s almost guaranteed, as well, that you won’t know which data to capture, or how to leverage it, without trial and error. As such, if you were to “design for failure,” what key things would you need? You need to reduce the cost of failure, both in terms of time and money. You’d need to build on data infrastructure that supports your iterations toward success and then rewards you by making it easy and cost effective to scale.
In 1985, storage was the key expense: $100,000 per GB; developer salary: $28,000 per year
So relational databases were built to optimize for storage
In 2013, storage is cheap: $0.05 per GB. Developers are expensive: $90,000 per year
So MongoDB was built to optimize for developer productivity
This is what the ratio of those expenses looks like, in 1985 and today
Assumptions:
3-year TCO
1985: 2 developers and 5 GB
2013: 2 developers and 5 TB
Developer costs comprise the lion’s share relative to storage today. So optimize for developer productivity
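The slide's cost comparison can be checked with a little arithmetic (figures from the slide; assuming 1 TB = 1024 GB, which affects the exact 2013 ratio):

```python
def tco(storage_gb, cost_per_gb, devs, salary, years=3):
    """Return (storage cost, developer cost) over a multi-year TCO window."""
    storage = storage_gb * cost_per_gb
    developers = devs * salary * years
    return storage, developers

# 1985: 5 GB at $100,000/GB vs 2 developers at $28,000/year for 3 years.
storage_1985, devs_1985 = tco(5, 100_000, 2, 28_000)       # $500,000 vs $168,000
# 2013: 5 TB at $0.05/GB vs 2 developers at $90,000/year for 3 years.
storage_2013, devs_2013 = tco(5 * 1024, 0.05, 2, 90_000)   # $256 vs $540,000

# In 1985 storage dominated (~3x developer cost); by 2013 developers
# dominate (~2,000x storage cost) -- hence optimizing for productivity.
ratio_1985 = storage_1985 / devs_1985
ratio_2013 = devs_2013 / storage_2013
```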