MongoDB is the leading NoSQL database for many reasons: it is an open-source, general-purpose, document-oriented database supported by a large community and an educational platform. Its horizontal scalability allows it to fit operational big data scenarios where the business needs real-time analytics over ever-increasing data sets. This talk focuses on using MongoDB for operational big data, why it is well suited to such scenarios, and how it integrates with other notable big data technologies such as Hadoop and BI tools.
Norberto Leite - Senior Solutions Architect, @MongoDB.
MongoDB presentation at the Pentaho & Big Data Ecosystem Live Seminar 2013
7. MongoDB Overview
300+ employees
Offices in New York, Palo Alto, Washington DC, London, Dublin, Barcelona and Sydney
600+ customers
Over $231 million in funding
9. MongoDB Vision
To provide the best database for how we build and run apps today
Build
– New and complex data
– Flexible
– New languages
– Faster development
Run
– Big Data scalability
– Real-time
– Commodity hardware
– Cloud
12. MongoDB is full featured
Rich Queries
• Find Paul’s cars
• Find everybody in London with a car built between 1970 and 1980
Geospatial
• Find all of the car owners within 5km of Trafalgar Sq.
Text Search
• Find all the cars described as having leather seats
Aggregation
• Calculate the average value of Paul’s car collection
Map Reduce
• What is the ownership pattern of colors by geography over time? (is purple trending up in China?)
MongoDB
{
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London',
  location: [45.123, 47.232],
  cars: [
    { model: 'Bentley',
      year: 1973,
      value: 100000, … },
    { model: 'Rolls Royce',
      year: 1965,
      value: 330000, … }
  ]
}
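The queries listed on this slide can be expressed as query documents against the sample record. This is a minimal sketch, assuming PyMongo-style filters written as plain Python dicts; it does not connect to a server. The two helper functions are hypothetical pure-Python stand-ins used only to check the expected answers on the sample record. The Trafalgar Square coordinates are approximate, and the text search assumes a text index on a car-description field that the sample document does not have.

```python
# Sketch only: query documents in the shape PyMongo accepts, plus two
# hypothetical pure-Python helpers that reproduce the answers on the sample.

paul = {
    "first_name": "Paul",
    "surname": "Miller",
    "city": "London",
    "location": [45.123, 47.232],  # placeholder values, not London coordinates
    "cars": [
        {"model": "Bentley", "year": 1973, "value": 100000},
        {"model": "Rolls Royce", "year": 1965, "value": 330000},
    ],
}

# Rich query: Paul's cars (the projection returns only the embedded array).
pauls_cars_filter = {"first_name": "Paul", "surname": "Miller"}
pauls_cars_projection = {"cars": 1, "_id": 0}

# Rich query: people in London with a car built 1970-1980.
# $elemMatch makes both bounds apply to the same array element.
london_1970s_filter = {
    "city": "London",
    "cars": {"$elemMatch": {"year": {"$gte": 1970, "$lte": 1980}}},
}

# Geospatial: within 5 km of Trafalgar Square (approximate [lon, lat]).
# Requires a 2dsphere index on "location"; $maxDistance is in metres here.
near_trafalgar_filter = {
    "location": {
        "$near": {
            "$geometry": {"type": "Point", "coordinates": [-0.1281, 51.5080]},
            "$maxDistance": 5000,
        }
    }
}

# Text search: requires a text index on a (hypothetical) description field.
leather_filter = {"$text": {"$search": "leather seats"}}

# Aggregation: average value of Paul's cars.
avg_value_pipeline = [
    {"$match": {"first_name": "Paul", "surname": "Miller"}},
    {"$unwind": "$cars"},
    {"$group": {"_id": None, "avg_value": {"$avg": "$cars.value"}}},
]


def cars_built_between(person, lo, hi):
    """Hypothetical stand-in for the $elemMatch range test on one record."""
    return [c["model"] for c in person["cars"] if lo <= c["year"] <= hi]


def average_car_value(person):
    """Hypothetical stand-in for the $unwind + $group/$avg pipeline."""
    values = [c["value"] for c in person["cars"]]
    return sum(values) / len(values)


if __name__ == "__main__":
    print(cars_built_between(paul, 1970, 1980))  # ['Bentley']
    print(average_car_value(paul))  # 215000.0
```

Against a running server, these documents would be passed to PyMongo calls such as `collection.find(pauls_cars_filter, pauls_cars_projection)` and `collection.aggregate(avg_value_pipeline)`, with `collection.create_index([("location", "2dsphere")])` created first for the geospatial query; the collection itself is an assumption. `$elemMatch` is used for the date range because two separate conditions on `cars.year` could be satisfied by two different cars.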
16. RDBMS Scale = Bigger Computers
“Clients can also opt to run zEC12 without a raised datacenter floor -- a first for high-end IBM mainframes.”
IBM press release, 28 Aug 2012
19. This Was a Problem for Google
2010 search index size: 100,000,000 GB
New data added per day: 100,000+ GB
Databases they could use: 0
250,000+ MacBook Pros (MBPs) == 4.1 miles
Source: http://googleblog.blogspot.com/2010/06/our-new-search-index-caffeine.html
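The slide's figures can be checked with a little arithmetic. This is a minimal sketch using only the numbers on the slide, assuming the 4.1 miles is the height of the laptops stacked on top of each other, which the slide does not state outright.

```python
# Back-of-the-envelope check of the Google slide's figures.
INDEX_GB = 100_000_000   # 2010 search index size (from the slide)
DAILY_GB = 100_000       # new data per day, lower bound (from the slide)
LAPTOPS = 250_000        # MacBook Pros quoted on the slide
STACK_MILES = 4.1        # distance quoted on the slide (assumed stack height)
INCHES_PER_MILE = 63_360

# Storage each laptop would have to hold to fit the whole index.
gb_per_laptop = INDEX_GB / LAPTOPS
# Implied thickness per laptop if 250,000 of them make a 4.1-mile stack.
inches_per_laptop = STACK_MILES * INCHES_PER_MILE / LAPTOPS
# Extra laptops needed per day just to absorb the new data.
laptops_per_day = DAILY_GB / gb_per_laptop

print(gb_per_laptop)                # 400.0
print(round(inches_per_laptop, 2))  # 1.04
print(laptops_per_day)              # 250.0
```

An implied thickness of about one inch per laptop is consistent with the stacking reading, and the daily growth alone would fill another 250 such laptops every day.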
21. And for Facebook
2010: 13,000,000 queries per second
TPC #1 DB: 504,161 tps
TPC Top Results
Top 10 combined: 1,370,368 tps
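To put the Facebook figure in proportion, the slide's numbers can be divided directly. This is a minimal sketch using only the figures on the slide; a query per second and a TPC transaction per second are not the same unit of work, so the ratios indicate scale rather than a like-for-like benchmark.

```python
# Ratio of Facebook's 2010 query rate to the TPC results quoted on the slide.
FB_QPS = 13_000_000        # Facebook, 2010 (from the slide)
TPC_TOP1_TPS = 504_161     # TPC #1 DB (from the slide)
TPC_TOP10_TPS = 1_370_368  # top 10 combined (from the slide)

ratio_top1 = FB_QPS / TPC_TOP1_TPS
ratio_top10 = FB_QPS / TPC_TOP10_TPS

print(f"{ratio_top1:.1f}x the #1 system")          # 25.8x the #1 system
print(f"{ratio_top10:.1f}x the top 10 combined")   # 9.5x the top 10 combined
```

Even the ten best published results added together fall short by roughly an order of magnitude.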
23. Living in the Post-transactional Future
Order-processing systems largely “done” (RDBMS); primary focus on better search and recommendations or adapting prices on the fly (NoSQL)
The vast majority of its engineering is focused on recommending better movies (NoSQL), not processing monthly bills (RDBMS)
The easy part is processing the credit card (RDBMS). The hard part is making it location aware, so it knows where you are and what you’re buying (NoSQL)
29. MongoDB/NoSQL Is Good for…
• 360° View of the Customer
• Mobile & Social Apps
• Fraud Detection
• User Data Management
• Content Management & Delivery
• Reference Data
• Product Catalogs
• Machine to Machine Apps
• Data Hub
30. MongoDB and Enterprise IT Stack
Applications
– CRM, ERP, Collaboration, Mobile, BI
Data Management
– Online Data: RDBMS, RDBMS
– Offline Data: Hadoop, EDW
Infrastructure
– OS & Virtualization, Compute, Storage, Network
Security & Auditing
Management & Monitoring
36. Fortune 500 & Global 500
• 10 of the Top Financial Services Institutions
• 10 of the Top Electronics Companies
• 10 of the Top Media and Entertainment Companies
• 8 of the Top Retailers
• 6 of the Top Telcos
• 5 of the Top Technology Companies
• 4 of the Top Healthcare Companies