MongoDB & Hadoop - Understanding Your Big Data

Hadoop & MongoDB
Understanding your Big Data

3
Speakers
Jnan Dash
Senior Advisor
jnan.dash@mongodb.com
Kelly Stirman
Director of Products
kelly.stirman@mongodb.com

4
Jnan Dash
• Last 12 years (2002-now): Executive consultant, serving on the boards and advisory boards of several new software companies, including Big Data players such as MongoDB
• 10 years (1992-2002): Oracle, Group Vice President, Systems Architecture and Technology, responsible for server product planning and rollout
• 16 years (1975-1992): IBM, planner, architect, and development manager for the DB2 product line at the Silicon Valley Lab and Austin Lab; head of IBM's database architecture, strategy, and technology

5
Why am I excited about MongoDB?
• Finally, some real innovation in DBMS
• MongoDB momentum is unprecedented!
• The changing landscape needs MongoDB
– "Internet scale" distributed operations + highly flexible data model for agile development + open source
• Perfect fit for cloud, mobility, and big data

6
Agenda
• Big Data - Observations
• Evolution of Database Technology
• Hadoop + MongoDB
• Customer Examples
• Roadmap
• Summary

7
The Fourth Paradigm
1. A thousand years ago – Experimental Science
Description of natural phenomena
2. Last few hundred years – Theoretical Science
Newton's laws, Maxwell's equations, ...
3. Last few decades – Computational Science
Simulation of complex phenomena
4. Today – Data-intensive Science
Scientists overwhelmed by the data deluge
Unify theory, experiment & simulation

8
Internet Scale Commercial Supercomputing
• Originated with companies operating at Internet scale (to process an ever-increasing number of users and volume of data)
– Yahoo in the 1990s, then Google, Facebook, Twitter
– They needed to do it quickly and affordably at scale
• Hadoop is the first commercial supercomputing software platform
– Works at scale, affordable at scale
• HPC was used for scientific supercomputing in meteorology and engineering. Big data is the commercial equivalent of HPC
– Less about equations, more about discovery and patterns
• Many of the technologies have been around for decades
– Clustering
– Parallel processing
– Distributed file systems

11
What’s driving Big Data
- Ad-hoc querying and reporting
- Data mining techniques
- Structured data, typical sources
- Small to mid-size datasets
- Optimizations and predictive analytics
- Complex statistical analysis
- All types of data, and many sources
- Very large datasets
- More of a real-time

12
Big Data – the full spectrum
[Diagram: technologies arranged along an axis from Online/Realtime to Offline/Batch]
Workloads: Transaction Processing; Analytical Processing; Data Mining, Visualization, and Integration Tools
Technologies: RDBMS; OLAP/DW; DW Appliance; Hadoop, Impala, ...; NoSQL; NewSQL, In-Memory, Stream...

13
Hadoop Ecosystem
Core Apache Hadoop
• HDFS (Hadoop Distributed File System)
• MapReduce (Distributed Programming Framework)
Related Apache Projects
• Hive (SQL)
• Pig (Data Flow)
• HBase (Wide Column Storage)
• HCatalog (Metadata)
• HMS (Management)
• Zookeeper (Coordination)
Layers shown: Programming Languages, Computation, Table Storage, Object Storage

15
Data Management over the years
• 1960s: File Systems
• 1970s: 1st Generation DBMS; Data as Shared Resource
• 1980s: Relational Technology; Ease of Query
• 1990s: New data types, OLAP/DW, Web Support, Unstructured Data
• 2005+: Big Data; Post-PC, Data Deluge, 3Vs, NoSQL

16
Operational vs. Analytics
[Diagram: operational databases and data warehouses by decade]
• 1990: RDBMS
• 2000: RDBMS; OLAP/DW
• 2010: RDBMS; NoSQL (Key-Value/Wide-column, Document DB); OLAP/DW; Hadoop
Categories shown: Operational Database, Data warehouse

17
MongoDB Features
• JSON Document Model with Dynamic Schemas
• Auto-Sharding for Horizontal Scalability
• Text Search
• Aggregation Framework and MapReduce
• Full, Flexible Index Support and Rich Queries
• Native Replication for High Availability
• Advanced Security
• Large Media Storage with GridFS

18
Documents are Rich Data Structures
{
  first_name: 'Paul',
  surname: 'Miller',
  cell: '+447557505611',
  city: 'London',
  location: [45.123, 47.232],
  Profession: ['banking', 'finance', 'trader'],
  cars: [
    { model: 'Bentley',
      year: 1973,
      value: 100000, … },
    { model: 'Rolls Royce',
      year: 1965,
      value: 330000, … }
  ]
}
Callouts:
• Fields
• Typed field values
• Fields can contain arrays
• Fields can contain an array of sub-documents
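As a rough illustration of the document model on this slide, the same record can be sketched as a plain Python dictionary. This is only an in-memory stand-in, not a MongoDB API call; the variable names are hypothetical, and the sub-document fields elided with "…" on the slide are simply left out.

```python
import json

# Hypothetical in-memory copy of the slide's document; the fields the
# slide elides with "…" are omitted rather than guessed.
person = {
    "first_name": "Paul",
    "surname": "Miller",
    "cell": "+447557505611",
    "city": "London",
    "location": [45.123, 47.232],                    # array of numbers
    "Profession": ["banking", "finance", "trader"],  # array of strings
    "cars": [                                        # array of sub-documents
        {"model": "Bentley", "year": 1973, "value": 100000},
        {"model": "Rolls Royce", "year": 1965, "value": 330000},
    ],
}

# Typed values survive a JSON round trip: numbers stay numbers.
restored = json.loads(json.dumps(person))

# Nested data is reached by walking fields and array positions.
first_car_model = restored["cars"][0]["model"]
car_years = [car["year"] for car in restored["cars"]]
```

The point of the sketch is that one record carries its related data (the cars) inside it, rather than spreading it across joined tables.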

20
Fast Moving Data
• Hundreds of thousands of records per second
• Fast response required
• Sometimes all data is kept, sometimes just a summary
• Horizontal scalability required

21
Data is Structured, but Varied…
• A machine generates a specific kind of data
• The data model is unlikely to change
• But there are so many different machines…
• Queryability across all types

22
Time Series Data
• Event data written multiple times per second, minute, or hour
• Tracking progression of metrics over time
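To make the time-series case a little more concrete, here is a minimal sketch of rolling raw events up into one summary document per minute. The event data, field names, and the `summarize_by_minute` helper are all hypothetical; the slides do not prescribe a schema, and this runs purely in memory.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical raw events: (timestamp, metric value), several per minute.
events = [
    (datetime(2014, 1, 1, 12, 0, 5), 10.0),
    (datetime(2014, 1, 1, 12, 0, 35), 14.0),
    (datetime(2014, 1, 1, 12, 1, 10), 20.0),
    (datetime(2014, 1, 1, 12, 1, 50), 22.0),
    (datetime(2014, 1, 1, 12, 1, 59), 24.0),
]

def summarize_by_minute(events):
    """Group readings into one summary document per minute."""
    buckets = defaultdict(list)
    for ts, value in events:
        # Truncate the timestamp to the start of its minute.
        minute = ts.replace(second=0, microsecond=0)
        buckets[minute].append(value)
    return [
        {"minute": minute, "count": len(vals), "avg": mean(vals), "max": max(vals)}
        for minute, vals in sorted(buckets.items())
    ]

summaries = summarize_by_minute(events)
```

Whether to store every raw event, only such summaries, or both is the trade-off the fast-moving-data slide alludes to.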

23
Do More With Your Data
MongoDB
Rich Queries
• Find Paul's cars
• Find everybody in London with a car built between 1970 and 1980
Geospatial
• Find all of the car owners within 5 km of Trafalgar Sq.
Text Search
• Find all the cars described as having leather seats
Aggregation
• Calculate the average value of Paul's car collection
Map Reduce
• What is the ownership pattern of colors by geography over time? (Is purple trending up in China?)
{
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London',
  location: [51.524, -0.087],
  cars: [
    { model: 'Bentley',
      year: 1973,
      value: 100000, … },
    { model: 'Rolls Royce',
      year: 1965,
      value: 330000, … }
  ]
}
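A few of the questions on this slide can be sketched in plain Python over an in-memory list. This is an illustration under assumptions, not the deck's own code: only Paul's document comes from the slide, the second document and all helper names are made up, the year range is treated as inclusive, and the MongoDB-style filter is shown as data only (it is never sent to a server here).

```python
from statistics import mean

# Hypothetical collection contents; only Paul's document comes from the
# slide (with the elided "…" fields omitted), the second one is made up.
people = [
    {"first_name": "Paul", "surname": "Miller", "city": "London",
     "location": [51.524, -0.087],
     "cars": [{"model": "Bentley", "year": 1973, "value": 100000},
              {"model": "Rolls Royce", "year": 1965, "value": 330000}]},
    {"first_name": "Ann", "surname": "Lee", "city": "Paris",
     "location": [48.857, 2.352],
     "cars": [{"model": "Mini", "year": 1975, "value": 9000}]},
]

# MongoDB-style filter for "everybody in London with a car built between
# 1970 and 1980"; shown as data only, it is not executed here.
london_1970s_filter = {
    "city": "London",
    "cars": {"$elemMatch": {"year": {"$gte": 1970, "$lte": 1980}}},
}

def cars_of(docs, first_name):
    """Rich query: return the cars of everyone with this first name."""
    return [car for d in docs if d["first_name"] == first_name
            for car in d["cars"]]

def owners_in_city_with_car_between(docs, city, start, end):
    """Plain-Python equivalent of london_1970s_filter (inclusive range)."""
    return [d for d in docs
            if d["city"] == city
            and any(start <= car["year"] <= end for car in d["cars"])]

def average_car_value(docs, first_name):
    """Aggregation: average value of one person's car collection."""
    return mean(car["value"] for car in cars_of(docs, first_name))

paul_models = [car["model"] for car in cars_of(people, "Paul")]
londoners = owners_in_city_with_car_between(people, "London", 1970, 1980)
paul_average = average_car_value(people, "Paul")
```

In a real deployment these questions would be answered by the database using its indexes; the helpers above only mirror the logic.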

25
Enterprise Big Data Stack
[Diagram: layers of the enterprise stack]
• Applications: CRM, ERP, Collaboration, Mobile, BI
• Data Management
– Online Data: RDBMS
– Offline Data: EDW, Hadoop
• Infrastructure: OS & Virtualization, Compute, Storage, Network
• Across all layers: Management & Monitoring, Security & Auditing

26
MongoDB & Hadoop
Operational (MongoDB)
• Online, Real-time
• High concurrency & HA
• Live analytics
Analytical (Hadoop)
• Multi-source analytics
• Interactive & Batch
• Data lake
Linked by the MongoDB Connector for Hadoop

27
Hadoop Is Good for…
• Risk Modeling
• Churn Analysis
• Recommendation Modeling
• Ad Targeting
• Transaction Analysis
• Trade Surveillance
• Network Failure Prediction
• Search Quality
• Data Lake

28
MongoDB Is Good for…
• Single View
• Mobile Apps
• Fraud Detection
• Customer Data Management
• Content Management & Delivery
• Database-as-a-Service
• Product & Asset Catalogs
• Internet of Things
• Social & Collaboration

30
Many more examples
Use cases
• Big Data
• Product & Asset Catalogs
• Security & Fraud
• Internet of Things
• Database-as-a-Service
• Mobile Apps
• Customer Data Management
• Single View
• Social & Collaboration
• Content Management
Example customers
• Intelligence Agencies
• Top Investment and Retail Banks
• Top US Retailer
• Top Global Shipping Company
• Top Industrial Equipment Manufacturer
• Top Media Company

32
MongoDB+Hadoop Connector
• Makes MongoDB a Hadoop-enabled file system
• Full use of MongoDB's indexes
• Read and write live data, in place
• Copy data between Hadoop and MongoDB
• Full support for data processing
– Hive
– MapReduce
– Pig
– Streaming
– EMR
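The processing model that such a job applies to documents can be sketched locally. The example below is an assumption-laden illustration of the map, shuffle, and reduce steps in plain Python; it does not use Hadoop or the connector's API, and the documents, `map_doc`, `reduce_counts`, and `run_job` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical documents, as a Hadoop job might receive them from MongoDB.
docs = [
    {"city": "London", "cars": [{"year": 1973}, {"year": 1965}]},
    {"city": "London", "cars": [{"year": 1978}]},
    {"city": "Paris", "cars": [{"year": 1975}]},
]

def map_doc(doc):
    """Map: emit ((city, decade), 1) for every car in a document."""
    for car in doc["cars"]:
        decade = car["year"] // 10 * 10
        yield (doc["city"], decade), 1

def reduce_counts(key, values):
    """Reduce: sum the counts emitted for one key."""
    return key, sum(values)

def run_job(docs):
    # Shuffle: group mapped values by key, as the framework would.
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_doc(doc):
            grouped[key].append(value)
    return dict(reduce_counts(k, v) for k, v in grouped.items())

cars_per_city_decade = run_job(docs)
```

In a cluster the map and reduce functions would run in parallel across many nodes; the grouping step in `run_job` stands in for the framework's shuffle.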

33
Customer Example – MetLife
Customer Service
• Insurance policies
• Demographic data
• Customer web data
• Call center data
• Real-time churn detection
Churn Analysis
• Customer action analysis
• Churn prediction algorithms
Linked by the MongoDB Connector for Hadoop

34
Customer Example - eCommerce
Travel
• Flights, hotels and cars
• Real-time offers
• User profiles, reviews
• User metadata (previous purchases, clicks, views)
Algorithms
• User segmentation
• Offer recommendation engine
• Ad serving engine
• Bundling engine
Linked by the MongoDB Connector for Hadoop

35
Roadmap
Capability          Today                  Soon
Connectivity        Custom                 Centralized Administration
MongoDB -> Hadoop   Dynamic reads          Automated Snapshots
BSON Support        MapReduce, Hive, Pig   Impala, Tez, Spark
Hadoop -> MongoDB   Dynamic writes         Bulk Loader

36
Summary
• Big Data covers a wide spectrum
– Volume, Velocity, Variety
– Hence the equation "Big Data = Hadoop" is a myth
• Enterprises are more concerned about Variety
– MongoDB provides the best platform
• Hadoop and MongoDB are complementary
– MongoDB for operational workloads
– Hadoop for analytical workloads
