SlideShare uma empresa Scribd logo
1 de 35
Paris 
Tugdual Grall 
Technical Evangelist 
tug@mongodb.com 
@tgrall
MongoDB & Hadoop 
Tugdual Grall 
Technical Evangelist 
tug@mongodb.com 
@tgrall
Agenda 
Evolving Data Landscape 
MongoDB & Hadoop Use Cases 
MongoDB Connector Features 
Demo
Evolving Data Landscape
Hadoop 
“The Apache Hadoop software library is a framework 
that allows for the distributed processing of large 
data sets across clusters of computers using simple 
programming models.” 
• Terabyte and Petabyte datasets 
• Data warehousing 
• Advanced analytics 
http://hadoop.apache.org
‹#› 
Enterprise IT Stack
‹#› 
Operational vs. Analytical: Enrichment 
Applications, Interactions Warehouse, Analytics
Operational: MongoDB 
First-Level 
Analytics 
Internet of Things 
Mobile Apps 
Social 
Product/Asset 
Catalog 
Security & Fraud 
Customer Data 
Management 
Single View 
Churn Analysis 
Risk Modeling 
Trade 
Surveillance 
Sentiment 
Analysis 
Recommender 
Warehouse & ETL 
Predictive 
Analytics 
Ad Targeting
Analytical: Hadoop 
First-Level 
Analytics 
Internet of Things 
Mobile Apps 
Social 
Product/Asset 
Catalog 
Security & Fraud 
Customer Data 
Management 
Single View 
Churn Analysis 
Risk Modeling 
Trade 
Surveillance 
Sentiment 
Analysis 
Recommender 
Warehouse & ETL 
Predictive 
Analytics 
Ad Targeting
Operational & Analytical: Lifecycle 
First-Level 
Analytics 
Internet of Things 
Mobile Apps 
Social 
Product/Asset 
Catalog 
Security & Fraud 
Customer Data 
Management 
Single View 
Churn Analysis 
Risk Modeling 
Trade 
Surveillance 
Sentiment 
Analysis 
Recommender 
Warehouse & ETL 
Predictive 
Analytics 
Ad Targeting
MongoDB & Hadoop Use Cases
Commerce 
Applications 
powered by 
Analysis 
powered by 
Products & Inventory 
Recommended products 
Customer profile 
Session management 
Elastic pricing 
Recommendation models 
Predictive analytics 
Clickstream history 
MongoDB Connector 
for Hadoop
Insurance 
Applications 
powered by 
Analysis 
powered by 
Customer profiles 
Insurance policies 
Session data 
Call center data 
Customer action analysis 
Churn analysis 
Churn prediction 
Policy rates 
MongoDB Connector 
for Hadoop
Fraud Detection 
Payments Nightly Analysis 
MongoDB Connector 
for Hadoop 
3rd Party 
Data Sources 
Results Cache 
Fraud 
Detection 
Query Only 
Query Only
MongoDB Connector for Hadoop
‹#› 
Connector Overview 
DATA 
• Read/Write MongoDB 
• Read/Write BSON 
TOOLS 
• MapReduce 
• Pig 
• Hive 
• Spark 
PLATFORMS 
• Apache Hadoop 
• Cloudera CDH 
• Hortonworks HDP 
• MapR 
• Amazon EMR
‹#› 
Connector Features and Functionality 
• Computes splits to read data 
• Single Node, Replica Sets, Sharded Clusters 
• Mappings for Pig and Hive 
• MongoDB as a standard data source/destination 
• Support for 
• Filtering data with MongoDB queries 
• Authentication 
• Reading from Replica Set tags 
• Appending to existing collections
‹#› 
MapReduce Configuration 
• MongoDB input/output 
mongo.job.input.format = com.mongodb.hadoop.MongoInputFormat 
mongo.input.uri = mongodb://mydb:27017/db1.collection1 
mongo.job.output.format = com.mongodb.hadoop.MongoOutputFormat 
mongo.output.uri = mongodb://mydb:27017/db1.collection2 
• BSON input/output 
mongo.job.input.format = com.hadoop.BSONFileInputFormat 
mapred.input.dir = hdfs:///tmp/database.bson 
mongo.job.output.format = com.hadoop.BSONFileOutputFormat 
mapred.output.dir = hdfs:///tmp/output.bson
‹#› 
Pig Mappings 
• Input: BSONLoader and MongoLoader 
data = LOAD ‘mongodb://mydb:27017/db.collection’ 
using com.mongodb.hadoop.pig.MongoLoader 
• Output: BSONStorage and MongoInsertStorage 
STORE records INTO ‘hdfs:///output.bson’ 
using com.mongodb.hadoop.pig.BSONStorage
‹#› 
Hive Support 
• Access collections as Hive tables 
• Use with MongoStorageHandler or BSONStorageHandler 
CREATE TABLE mongo_users (id int, name string, age int) 
STORED BY "com.mongodb.hadoop.hive.MongoStorageHandler" 
WITH SERDEPROPERTIES("mongo.columns.mapping” = "_id,name,age”) 
TBLPROPERTIES("mongo.uri" = "mongodb://host:27017/test.users”)
‹#› 
Spark 
• Use with MapReduce input/output 
formats 
• Create Configuration objects with 
input/output formats and data URI 
• Load/save data using SparkContext 
Hadoop file API
‹#› 
Data Movement 
Dynamic queries to MongoDB vs. BSON snapshots in HDFS 
Dynamic queries with most 
recent data 
Puts load on operational 
database 
Snapshots move load to 
Hadoop 
Snapshots add predictable 
load to MongoDB
Demo : Recommendation Platform
‹#› 
Movie Web
‹#› 
MovieWeb Web Application 
• Browse 
- Top movies by ratings count 
- Top genres by movie count 
• Log in to 
- See My Ratings 
- Rate movies 
• Recommendations 
- Movies You May Like 
- Recommendations
‹#› 
MovieWeb Components 
• MovieLens dataset 
– 10M ratings, 10K movies, 70K users 
– http://grouplens.org/datasets/movielens/ 
• Python web app to browse movies, recommendations 
– Flask, PyMongo 
• Spark app computes recommendations 
– MLLib collaborative filter 
• Predicted ratings are exposed in web app 
– New predictions collection
‹#› 
Spark Recommender 
• Apache Hadoop (2.3) 
- HDFS & YARN 
- Top genres by movie count 
• Spark (1.0) 
- Execute within YARN 
- Assign executor resources 
• Data 
- From HDFS, MongoDB 
- To MongoDB
‹#› 
MovieWeb Workflow 
Snapshot db 
as BSON 
Predict ratings for 
all pairings 
Write Prediction to 
MongoDB 
collection 
Store BSON 
in HDFS 
Read BSON 
into Spark App 
Create user movie 
pairing 
Web Application 
exposes 
recommendations 
Train Model from 
existing ratings 
Repeat Process
‹#› 
Execution 
$ spark-submit --master local  
--driver-memory 2G --executor-memory 2G  
--jars mongo-hadoop-core.jar,mongo-java-driver.jar  
--class com.mongodb.workshop.SparkExercise  
./target/spark-1.0-SNAPSHOT.jar  
hdfs://localhost:9000  
mongodb://127.0.0.1:27017/movielens  
predictions
Should I use MongoDB or Hadoop?
‹#› 
Business First! 
First-Level 
Analytics 
Internet of 
Things 
Mobile Apps 
Social 
What/Why How 
Product/Asse 
t Catalog 
Security & 
Fraud 
Customer 
Data 
Management 
Single View 
Churn 
Analysis 
Risk 
Modeling 
Trade 
Surveillance 
Sentiment 
Analysis 
Recommend 
er 
Warehouse 
& ETL 
Predictive 
Analytics 
Ad Targeting
‹#› 
The good tool for the task 
• Dataset size 
• Data processing complexity 
• Continuous improvement 
V1.0
‹#› 
The good tool for the task 
• Dataset size 
• Data processing complexity 
• Continuous improvement 
V2.0
‹#› 
Resources / Questions 
• MongoDB Connector for Hadoop 
- http://github.com/mongodb/mongo-hadoop 
• Getting Started with MongoDB and Hadoop 
- http://docs.mongodb.org/ecosystem/tutorial/getting-started- 
with-hadoop/ 
• MongoDB-Spark Demo 
- https://github.com/crcsmnky/mongodb-hadoop-workshop
MongoDB & Hadoop 
Tugdual Grall 
Technical Evangelist 
tug@mongodb.com 
@tgrall

Mais conteúdo relacionado

Mais procurados

MongoDB Evenings Dallas: What's the Scoop on MongoDB & Hadoop
MongoDB Evenings Dallas: What's the Scoop on MongoDB & HadoopMongoDB Evenings Dallas: What's the Scoop on MongoDB & Hadoop
MongoDB Evenings Dallas: What's the Scoop on MongoDB & HadoopMongoDB
 
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...MongoDB
 
Using MongoDB + Hadoop Together
Using MongoDB + Hadoop TogetherUsing MongoDB + Hadoop Together
Using MongoDB + Hadoop TogetherMongoDB
 
Real Time Data Analytics with MongoDB and Fluentd at Wish
Real Time Data Analytics with MongoDB and Fluentd at WishReal Time Data Analytics with MongoDB and Fluentd at Wish
Real Time Data Analytics with MongoDB and Fluentd at WishMongoDB
 
HBaseCon 2017: Community-Driven Graph with JanusGraph (updated)
HBaseCon 2017: Community-Driven Graph with JanusGraph (updated)HBaseCon 2017: Community-Driven Graph with JanusGraph (updated)
HBaseCon 2017: Community-Driven Graph with JanusGraph (updated)Jing Chen (Jerry) He
 
Webinar: The Anatomy of the Cloudant Data Layer
Webinar: The Anatomy of the Cloudant Data LayerWebinar: The Anatomy of the Cloudant Data Layer
Webinar: The Anatomy of the Cloudant Data LayerIBM Cloud Data Services
 
Augmenting Mongo DB with treasure data
Augmenting Mongo DB with treasure dataAugmenting Mongo DB with treasure data
Augmenting Mongo DB with treasure dataTreasure Data, Inc.
 
MongoDB Evenings DC: Get MEAN and Lean with Docker and Kubernetes
MongoDB Evenings DC: Get MEAN and Lean with Docker and KubernetesMongoDB Evenings DC: Get MEAN and Lean with Docker and Kubernetes
MongoDB Evenings DC: Get MEAN and Lean with Docker and KubernetesMongoDB
 
Key note big data analytics ecosystem strategy
Key note   big data analytics ecosystem strategyKey note   big data analytics ecosystem strategy
Key note big data analytics ecosystem strategyIBM Sverige
 
Data persistence using pouchdb and couchdb
Data persistence using pouchdb and couchdbData persistence using pouchdb and couchdb
Data persistence using pouchdb and couchdbDimgba Kalu
 
MongoDB Evenings Houston: What's the Scoop on MongoDB and Hadoop? by Jake Ang...
MongoDB Evenings Houston: What's the Scoop on MongoDB and Hadoop? by Jake Ang...MongoDB Evenings Houston: What's the Scoop on MongoDB and Hadoop? by Jake Ang...
MongoDB Evenings Houston: What's the Scoop on MongoDB and Hadoop? by Jake Ang...MongoDB
 
Webinar: Elevate Your Enterprise Architecture with In-Memory Computing
Webinar: Elevate Your Enterprise Architecture with In-Memory ComputingWebinar: Elevate Your Enterprise Architecture with In-Memory Computing
Webinar: Elevate Your Enterprise Architecture with In-Memory ComputingMongoDB
 
MongoDB and Azure Databricks
MongoDB and Azure DatabricksMongoDB and Azure Databricks
MongoDB and Azure DatabricksMongoDB
 
MongoDB Launchpad 2016: MongoDB 3.4: Your Database Evolved
MongoDB Launchpad 2016: MongoDB 3.4: Your Database EvolvedMongoDB Launchpad 2016: MongoDB 3.4: Your Database Evolved
MongoDB Launchpad 2016: MongoDB 3.4: Your Database EvolvedMongoDB
 
MongoDB World 2019: Raiders of the Anti-patterns: A Journey Towards Fixing Sc...
MongoDB World 2019: Raiders of the Anti-patterns: A Journey Towards Fixing Sc...MongoDB World 2019: Raiders of the Anti-patterns: A Journey Towards Fixing Sc...
MongoDB World 2019: Raiders of the Anti-patterns: A Journey Towards Fixing Sc...MongoDB
 
MongoDB World 2019: MongoDB in Data Science: How to Build a Scalable Product ...
MongoDB World 2019: MongoDB in Data Science: How to Build a Scalable Product ...MongoDB World 2019: MongoDB in Data Science: How to Build a Scalable Product ...
MongoDB World 2019: MongoDB in Data Science: How to Build a Scalable Product ...MongoDB
 
Joins and Other MongoDB 3.2 Aggregation Enhancements
Joins and Other MongoDB 3.2 Aggregation EnhancementsJoins and Other MongoDB 3.2 Aggregation Enhancements
Joins and Other MongoDB 3.2 Aggregation EnhancementsAndrew Morgan
 

Mais procurados (20)

MongoDB and Spark
MongoDB and SparkMongoDB and Spark
MongoDB and Spark
 
MongoDB Evenings Dallas: What's the Scoop on MongoDB & Hadoop
MongoDB Evenings Dallas: What's the Scoop on MongoDB & HadoopMongoDB Evenings Dallas: What's the Scoop on MongoDB & Hadoop
MongoDB Evenings Dallas: What's the Scoop on MongoDB & Hadoop
 
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...
 
Using MongoDB + Hadoop Together
Using MongoDB + Hadoop TogetherUsing MongoDB + Hadoop Together
Using MongoDB + Hadoop Together
 
Real Time Data Analytics with MongoDB and Fluentd at Wish
Real Time Data Analytics with MongoDB and Fluentd at WishReal Time Data Analytics with MongoDB and Fluentd at Wish
Real Time Data Analytics with MongoDB and Fluentd at Wish
 
HBaseCon 2017: Community-Driven Graph with JanusGraph (updated)
HBaseCon 2017: Community-Driven Graph with JanusGraph (updated)HBaseCon 2017: Community-Driven Graph with JanusGraph (updated)
HBaseCon 2017: Community-Driven Graph with JanusGraph (updated)
 
Webinar: The Anatomy of the Cloudant Data Layer
Webinar: The Anatomy of the Cloudant Data LayerWebinar: The Anatomy of the Cloudant Data Layer
Webinar: The Anatomy of the Cloudant Data Layer
 
Augmenting Mongo DB with treasure data
Augmenting Mongo DB with treasure dataAugmenting Mongo DB with treasure data
Augmenting Mongo DB with treasure data
 
MongoDB Evenings DC: Get MEAN and Lean with Docker and Kubernetes
MongoDB Evenings DC: Get MEAN and Lean with Docker and KubernetesMongoDB Evenings DC: Get MEAN and Lean with Docker and Kubernetes
MongoDB Evenings DC: Get MEAN and Lean with Docker and Kubernetes
 
Key note big data analytics ecosystem strategy
Key note   big data analytics ecosystem strategyKey note   big data analytics ecosystem strategy
Key note big data analytics ecosystem strategy
 
MongoDB on Azure
MongoDB on AzureMongoDB on Azure
MongoDB on Azure
 
Data persistence using pouchdb and couchdb
Data persistence using pouchdb and couchdbData persistence using pouchdb and couchdb
Data persistence using pouchdb and couchdb
 
MongoDB Evenings Houston: What's the Scoop on MongoDB and Hadoop? by Jake Ang...
MongoDB Evenings Houston: What's the Scoop on MongoDB and Hadoop? by Jake Ang...MongoDB Evenings Houston: What's the Scoop on MongoDB and Hadoop? by Jake Ang...
MongoDB Evenings Houston: What's the Scoop on MongoDB and Hadoop? by Jake Ang...
 
Webinar: Elevate Your Enterprise Architecture with In-Memory Computing
Webinar: Elevate Your Enterprise Architecture with In-Memory ComputingWebinar: Elevate Your Enterprise Architecture with In-Memory Computing
Webinar: Elevate Your Enterprise Architecture with In-Memory Computing
 
MongoDB and Azure Databricks
MongoDB and Azure DatabricksMongoDB and Azure Databricks
MongoDB and Azure Databricks
 
MongoDB Launchpad 2016: MongoDB 3.4: Your Database Evolved
MongoDB Launchpad 2016: MongoDB 3.4: Your Database EvolvedMongoDB Launchpad 2016: MongoDB 3.4: Your Database Evolved
MongoDB Launchpad 2016: MongoDB 3.4: Your Database Evolved
 
MongoDB + Spring
MongoDB + SpringMongoDB + Spring
MongoDB + Spring
 
MongoDB World 2019: Raiders of the Anti-patterns: A Journey Towards Fixing Sc...
MongoDB World 2019: Raiders of the Anti-patterns: A Journey Towards Fixing Sc...MongoDB World 2019: Raiders of the Anti-patterns: A Journey Towards Fixing Sc...
MongoDB World 2019: Raiders of the Anti-patterns: A Journey Towards Fixing Sc...
 
MongoDB World 2019: MongoDB in Data Science: How to Build a Scalable Product ...
MongoDB World 2019: MongoDB in Data Science: How to Build a Scalable Product ...MongoDB World 2019: MongoDB in Data Science: How to Build a Scalable Product ...
MongoDB World 2019: MongoDB in Data Science: How to Build a Scalable Product ...
 
Joins and Other MongoDB 3.2 Aggregation Enhancements
Joins and Other MongoDB 3.2 Aggregation EnhancementsJoins and Other MongoDB 3.2 Aggregation Enhancements
Joins and Other MongoDB 3.2 Aggregation Enhancements
 

Semelhante a MongoDB and Hadoop

Webinar: MongoDB and Hadoop - Working Together to provide Business Insights
Webinar: MongoDB and Hadoop - Working Together to provide Business InsightsWebinar: MongoDB and Hadoop - Working Together to provide Business Insights
Webinar: MongoDB and Hadoop - Working Together to provide Business InsightsMongoDB
 
Webinar: Data Streaming with Apache Kafka & MongoDB
Webinar: Data Streaming with Apache Kafka & MongoDBWebinar: Data Streaming with Apache Kafka & MongoDB
Webinar: Data Streaming with Apache Kafka & MongoDBMongoDB
 
Webinar: Managing Real Time Risk Analytics with MongoDB
Webinar: Managing Real Time Risk Analytics with MongoDB Webinar: Managing Real Time Risk Analytics with MongoDB
Webinar: Managing Real Time Risk Analytics with MongoDB MongoDB
 
Java Persistence Frameworks for MongoDB
Java Persistence Frameworks for MongoDBJava Persistence Frameworks for MongoDB
Java Persistence Frameworks for MongoDBMongoDB
 
MongoDB Days Germany: Data Processing with MongoDB
MongoDB Days Germany: Data Processing with MongoDBMongoDB Days Germany: Data Processing with MongoDB
MongoDB Days Germany: Data Processing with MongoDBMongoDB
 
Java Persistence Frameworks for MongoDB
Java Persistence Frameworks for MongoDBJava Persistence Frameworks for MongoDB
Java Persistence Frameworks for MongoDBTobias Trelle
 
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB
 
MongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDB
MongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDBMongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDB
MongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDBMongoDB
 
MongoDB World 2016: Get MEAN and Lean with MongoDB and Kubernetes
MongoDB World 2016: Get MEAN and Lean with MongoDB and KubernetesMongoDB World 2016: Get MEAN and Lean with MongoDB and Kubernetes
MongoDB World 2016: Get MEAN and Lean with MongoDB and KubernetesMongoDB
 
MongoDB & Hadoop, Sittin' in a Tree
MongoDB & Hadoop, Sittin' in a TreeMongoDB & Hadoop, Sittin' in a Tree
MongoDB & Hadoop, Sittin' in a TreeMongoDB
 
How sitecore depends on mongo db for scalability and performance, and what it...
How sitecore depends on mongo db for scalability and performance, and what it...How sitecore depends on mongo db for scalability and performance, and what it...
How sitecore depends on mongo db for scalability and performance, and what it...Antonios Giannopoulos
 
MongoDB Introduction, Installation & Execution
MongoDB Introduction, Installation & ExecutionMongoDB Introduction, Installation & Execution
MongoDB Introduction, Installation & ExecutioniFour Technolab Pvt. Ltd.
 
Unlocking Operational Intelligence from the Data Lake
Unlocking Operational Intelligence from the Data LakeUnlocking Operational Intelligence from the Data Lake
Unlocking Operational Intelligence from the Data LakeMongoDB
 
MongoDB.local Sydney: An Introduction to Document Databases with MongoDB
MongoDB.local Sydney: An Introduction to Document Databases with MongoDBMongoDB.local Sydney: An Introduction to Document Databases with MongoDB
MongoDB.local Sydney: An Introduction to Document Databases with MongoDBMongoDB
 
MongoDB_Spark
MongoDB_SparkMongoDB_Spark
MongoDB_SparkMat Keep
 
Machine Learning on Google Cloud with H2O
Machine Learning on Google Cloud with H2OMachine Learning on Google Cloud with H2O
Machine Learning on Google Cloud with H2OSri Ambati
 
Using MongoDB For BigData in 20 Minutes
Using MongoDB For BigData in 20 MinutesUsing MongoDB For BigData in 20 Minutes
Using MongoDB For BigData in 20 MinutesAndrás Fehér
 
Data Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDBData Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDBconfluent
 
Benchmarking Hadoop and Big Data
Benchmarking Hadoop and Big DataBenchmarking Hadoop and Big Data
Benchmarking Hadoop and Big DataNicolas Poggi
 

Semelhante a MongoDB and Hadoop (20)

Webinar: MongoDB and Hadoop - Working Together to provide Business Insights
Webinar: MongoDB and Hadoop - Working Together to provide Business InsightsWebinar: MongoDB and Hadoop - Working Together to provide Business Insights
Webinar: MongoDB and Hadoop - Working Together to provide Business Insights
 
Webinar: Data Streaming with Apache Kafka & MongoDB
Webinar: Data Streaming with Apache Kafka & MongoDBWebinar: Data Streaming with Apache Kafka & MongoDB
Webinar: Data Streaming with Apache Kafka & MongoDB
 
Webinar: Managing Real Time Risk Analytics with MongoDB
Webinar: Managing Real Time Risk Analytics with MongoDB Webinar: Managing Real Time Risk Analytics with MongoDB
Webinar: Managing Real Time Risk Analytics with MongoDB
 
Java Persistence Frameworks for MongoDB
Java Persistence Frameworks for MongoDBJava Persistence Frameworks for MongoDB
Java Persistence Frameworks for MongoDB
 
MongoDB Days Germany: Data Processing with MongoDB
MongoDB Days Germany: Data Processing with MongoDBMongoDB Days Germany: Data Processing with MongoDB
MongoDB Days Germany: Data Processing with MongoDB
 
Java Persistence Frameworks for MongoDB
Java Persistence Frameworks for MongoDBJava Persistence Frameworks for MongoDB
Java Persistence Frameworks for MongoDB
 
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
MongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDB
MongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDBMongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDB
MongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDB
 
MongoDB World 2016: Get MEAN and Lean with MongoDB and Kubernetes
MongoDB World 2016: Get MEAN and Lean with MongoDB and KubernetesMongoDB World 2016: Get MEAN and Lean with MongoDB and Kubernetes
MongoDB World 2016: Get MEAN and Lean with MongoDB and Kubernetes
 
MongoDB & Hadoop, Sittin' in a Tree
MongoDB & Hadoop, Sittin' in a TreeMongoDB & Hadoop, Sittin' in a Tree
MongoDB & Hadoop, Sittin' in a Tree
 
How sitecore depends on mongo db for scalability and performance, and what it...
How sitecore depends on mongo db for scalability and performance, and what it...How sitecore depends on mongo db for scalability and performance, and what it...
How sitecore depends on mongo db for scalability and performance, and what it...
 
MongoDB Introduction, Installation & Execution
MongoDB Introduction, Installation & ExecutionMongoDB Introduction, Installation & Execution
MongoDB Introduction, Installation & Execution
 
Unlocking Operational Intelligence from the Data Lake
Unlocking Operational Intelligence from the Data LakeUnlocking Operational Intelligence from the Data Lake
Unlocking Operational Intelligence from the Data Lake
 
MongoDB.local Sydney: An Introduction to Document Databases with MongoDB
MongoDB.local Sydney: An Introduction to Document Databases with MongoDBMongoDB.local Sydney: An Introduction to Document Databases with MongoDB
MongoDB.local Sydney: An Introduction to Document Databases with MongoDB
 
MongoDB 3.4 webinar
MongoDB 3.4 webinarMongoDB 3.4 webinar
MongoDB 3.4 webinar
 
MongoDB_Spark
MongoDB_SparkMongoDB_Spark
MongoDB_Spark
 
Machine Learning on Google Cloud with H2O
Machine Learning on Google Cloud with H2OMachine Learning on Google Cloud with H2O
Machine Learning on Google Cloud with H2O
 
Using MongoDB For BigData in 20 Minutes
Using MongoDB For BigData in 20 MinutesUsing MongoDB For BigData in 20 Minutes
Using MongoDB For BigData in 20 Minutes
 
Data Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDBData Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDB
 
Benchmarking Hadoop and Big Data
Benchmarking Hadoop and Big DataBenchmarking Hadoop and Big Data
Benchmarking Hadoop and Big Data
 

Mais de Tugdual Grall

Introduction to Streaming with Apache Flink
Introduction to Streaming with Apache FlinkIntroduction to Streaming with Apache Flink
Introduction to Streaming with Apache FlinkTugdual Grall
 
Introduction to Streaming with Apache Flink
Introduction to Streaming with Apache FlinkIntroduction to Streaming with Apache Flink
Introduction to Streaming with Apache FlinkTugdual Grall
 
Fast Cars, Big Data - How Streaming Can Help Formula 1
Fast Cars, Big Data - How Streaming Can Help Formula 1Fast Cars, Big Data - How Streaming Can Help Formula 1
Fast Cars, Big Data - How Streaming Can Help Formula 1Tugdual Grall
 
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!Tugdual Grall
 
Proud to be Polyglot - Riviera Dev 2015
Proud to be Polyglot - Riviera Dev 2015Proud to be Polyglot - Riviera Dev 2015
Proud to be Polyglot - Riviera Dev 2015Tugdual Grall
 
Introduction to NoSQL with MongoDB - SQLi Workshop
Introduction to NoSQL with MongoDB - SQLi WorkshopIntroduction to NoSQL with MongoDB - SQLi Workshop
Introduction to NoSQL with MongoDB - SQLi WorkshopTugdual Grall
 
Enabling Telco to Build and Run Modern Applications
Enabling Telco to Build and Run Modern Applications Enabling Telco to Build and Run Modern Applications
Enabling Telco to Build and Run Modern Applications Tugdual Grall
 
Proud to be polyglot
Proud to be polyglotProud to be polyglot
Proud to be polyglotTugdual Grall
 
Drop your table ! MongoDB Schema Design
Drop your table ! MongoDB Schema DesignDrop your table ! MongoDB Schema Design
Drop your table ! MongoDB Schema DesignTugdual Grall
 
Devoxx 2014 : Atelier MongoDB - Decouverte de MongoDB 2.6
Devoxx 2014 : Atelier MongoDB - Decouverte de MongoDB 2.6Devoxx 2014 : Atelier MongoDB - Decouverte de MongoDB 2.6
Devoxx 2014 : Atelier MongoDB - Decouverte de MongoDB 2.6Tugdual Grall
 
Some cool features of MongoDB
Some cool features of MongoDBSome cool features of MongoDB
Some cool features of MongoDBTugdual Grall
 
Building Your First MongoDB Application
Building Your First MongoDB ApplicationBuilding Your First MongoDB Application
Building Your First MongoDB ApplicationTugdual Grall
 
Opensourceday 2014-iot
Opensourceday 2014-iotOpensourceday 2014-iot
Opensourceday 2014-iotTugdual Grall
 
Softshake 2013: Introduction to NoSQL with Couchbase
Softshake 2013: Introduction to NoSQL with CouchbaseSoftshake 2013: Introduction to NoSQL with Couchbase
Softshake 2013: Introduction to NoSQL with CouchbaseTugdual Grall
 
Introduction to NoSQL with Couchbase
Introduction to NoSQL with CouchbaseIntroduction to NoSQL with Couchbase
Introduction to NoSQL with CouchbaseTugdual Grall
 
Why and How to integrate Hadoop and NoSQL?
Why and How to integrate Hadoop and NoSQL?Why and How to integrate Hadoop and NoSQL?
Why and How to integrate Hadoop and NoSQL?Tugdual Grall
 
NoSQL Matters 2013 - Introduction to Map Reduce with Couchbase 2.0
NoSQL Matters 2013 - Introduction to Map Reduce with Couchbase 2.0NoSQL Matters 2013 - Introduction to Map Reduce with Couchbase 2.0
NoSQL Matters 2013 - Introduction to Map Reduce with Couchbase 2.0Tugdual Grall
 
Big Data Paris : Hadoop and NoSQL
Big Data Paris : Hadoop and NoSQLBig Data Paris : Hadoop and NoSQL
Big Data Paris : Hadoop and NoSQLTugdual Grall
 

Mais de Tugdual Grall (20)

Introduction to Streaming with Apache Flink
Introduction to Streaming with Apache FlinkIntroduction to Streaming with Apache Flink
Introduction to Streaming with Apache Flink
 
Introduction to Streaming with Apache Flink
Introduction to Streaming with Apache FlinkIntroduction to Streaming with Apache Flink
Introduction to Streaming with Apache Flink
 
Fast Cars, Big Data - How Streaming Can Help Formula 1
Fast Cars, Big Data - How Streaming Can Help Formula 1Fast Cars, Big Data - How Streaming Can Help Formula 1
Fast Cars, Big Data - How Streaming Can Help Formula 1
 
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
 
Big Data Journey
Big Data JourneyBig Data Journey
Big Data Journey
 
Proud to be Polyglot - Riviera Dev 2015
Proud to be Polyglot - Riviera Dev 2015Proud to be Polyglot - Riviera Dev 2015
Proud to be Polyglot - Riviera Dev 2015
 
Introduction to NoSQL with MongoDB - SQLi Workshop
Introduction to NoSQL with MongoDB - SQLi WorkshopIntroduction to NoSQL with MongoDB - SQLi Workshop
Introduction to NoSQL with MongoDB - SQLi Workshop
 
Enabling Telco to Build and Run Modern Applications
Enabling Telco to Build and Run Modern Applications Enabling Telco to Build and Run Modern Applications
Enabling Telco to Build and Run Modern Applications
 
Proud to be polyglot
Proud to be polyglotProud to be polyglot
Proud to be polyglot
 
Drop your table ! MongoDB Schema Design
Drop your table ! MongoDB Schema DesignDrop your table ! MongoDB Schema Design
Drop your table ! MongoDB Schema Design
 
Devoxx 2014 : Atelier MongoDB - Decouverte de MongoDB 2.6
Devoxx 2014 : Atelier MongoDB - Decouverte de MongoDB 2.6Devoxx 2014 : Atelier MongoDB - Decouverte de MongoDB 2.6
Devoxx 2014 : Atelier MongoDB - Decouverte de MongoDB 2.6
 
Some cool features of MongoDB
Some cool features of MongoDBSome cool features of MongoDB
Some cool features of MongoDB
 
Building Your First MongoDB Application
Building Your First MongoDB ApplicationBuilding Your First MongoDB Application
Building Your First MongoDB Application
 
Opensourceday 2014-iot
Opensourceday 2014-iotOpensourceday 2014-iot
Opensourceday 2014-iot
 
Neotys conference
Neotys conferenceNeotys conference
Neotys conference
 
Softshake 2013: Introduction to NoSQL with Couchbase
Softshake 2013: Introduction to NoSQL with CouchbaseSoftshake 2013: Introduction to NoSQL with Couchbase
Softshake 2013: Introduction to NoSQL with Couchbase
 
Introduction to NoSQL with Couchbase
Introduction to NoSQL with CouchbaseIntroduction to NoSQL with Couchbase
Introduction to NoSQL with Couchbase
 
Why and How to integrate Hadoop and NoSQL?
Why and How to integrate Hadoop and NoSQL?Why and How to integrate Hadoop and NoSQL?
Why and How to integrate Hadoop and NoSQL?
 
NoSQL Matters 2013 - Introduction to Map Reduce with Couchbase 2.0
NoSQL Matters 2013 - Introduction to Map Reduce with Couchbase 2.0NoSQL Matters 2013 - Introduction to Map Reduce with Couchbase 2.0
NoSQL Matters 2013 - Introduction to Map Reduce with Couchbase 2.0
 
Big Data Paris : Hadoop and NoSQL
Big Data Paris : Hadoop and NoSQLBig Data Paris : Hadoop and NoSQL
Big Data Paris : Hadoop and NoSQL
 

Último

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 

Último (20)

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 

MongoDB and Hadoop

  • 1. Paris Tugdual Grall Technical Evangelist tug@mongodb.com @tgrall
  • 2. MongoDB & Hadoop Tugdual Grall Technical Evangelist tug@mongodb.com @tgrall
  • 3. Agenda Evolving Data Landscape MongoDB & Hadoop Use Cases MongoDB Connector Features Demo
  • 5. Hadoop “The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.” • Terabyte and Petabyte datasets • Data warehousing • Advanced analytics http://hadoop.apache.org
  • 7. ‹#› Operational vs. Analytical: Enrichment Applications, Interactions Warehouse, Analytics
  • 8. Operational: MongoDB First-Level Analytics Internet of Things Mobile Apps Social Product/Asset Catalog Security & Fraud Customer Data Management Single View Churn Analysis Risk Modeling Trade Surveillance Sentiment Analysis Recommender Warehouse & ETL Predictive Analytics Ad Targeting
  • 9. Analytical: Hadoop First-Level Analytics Internet of Things Mobile Apps Social Product/Asset Catalog Security & Fraud Customer Data Management Single View Churn Analysis Risk Modeling Trade Surveillance Sentiment Analysis Recommender Warehouse & ETL Predictive Analytics Ad Targeting
  • 10. Operational & Analytical: Lifecycle First-Level Analytics Internet of Things Mobile Apps Social Product/Asset Catalog Security & Fraud Customer Data Management Single View Churn Analysis Risk Modeling Trade Surveillance Sentiment Analysis Recommender Warehouse & ETL Predictive Analytics Ad Targeting
  • 11. MongoDB & Hadoop Use Cases
  • 12. Commerce Applications powered by Analysis powered by Products & Inventory Recommended products Customer profile Session management Elastic pricing Recommendation models Predictive analytics Clickstream history MongoDB Connector for Hadoop
  • 13. Insurance Applications powered by Analysis powered by Customer profiles Insurance policies Session data Call center data Customer action analysis Churn analysis Churn prediction Policy rates MongoDB Connector for Hadoop
  • 14. Fraud Detection Payments Nightly Analysis MongoDB Connector for Hadoop 3rd Party Data Sources Results Cache Fraud Detection Query Only Query Only
  • 16. ‹#› Connector Overview DATA • Read/Write MongoDB • Read/Write BSON TOOLS • MapReduce • Pig • Hive • Spark PLATFORMS • Apache Hadoop • Cloudera CDH • Hortonworks HDP • MapR • Amazon EMR
  • 17. ‹#› Connector Features and Functionality • Computes splits to read data • Single Node, Replica Sets, Sharded Clusters • Mappings for Pig and Hive • MongoDB as a standard data source/destination • Support for • Filtering data with MongoDB queries • Authentication • Reading from Replica Set tags • Appending to existing collections
  • 18. ‹#› MapReduce Configuration • MongoDB input/output mongo.job.input.format = com.mongodb.hadoop.MongoInputFormat mongo.input.uri = mongodb://mydb:27017/db1.collection1 mongo.job.output.format = com.mongodb.hadoop.MongoOutputFormat mongo.output.uri = mongodb://mydb:27017/db1.collection2 • BSON input/output mongo.job.input.format = com.hadoop.BSONFileInputFormat mapred.input.dir = hdfs:///tmp/database.bson mongo.job.output.format = com.hadoop.BSONFileOutputFormat mapred.output.dir = hdfs:///tmp/output.bson
  • 19. ‹#› Pig Mappings • Input: BSONLoader and MongoLoader data = LOAD ‘mongodb://mydb:27017/db.collection’ using com.mongodb.hadoop.pig.MongoLoader • Output: BSONStorage and MongoInsertStorage STORE records INTO ‘hdfs:///output.bson’ using com.mongodb.hadoop.pig.BSONStorage
  • 20. ‹#› Hive Support • Access collections as Hive tables • Use with MongoStorageHandler or BSONStorageHandler CREATE TABLE mongo_users (id int, name string, age int) STORED BY "com.mongodb.hadoop.hive.MongoStorageHandler" WITH SERDEPROPERTIES("mongo.columns.mapping” = "_id,name,age”) TBLPROPERTIES("mongo.uri" = "mongodb://host:27017/test.users”)
  • 21. ‹#› Spark • Use with MapReduce input/output formats • Create Configuration objects with input/output formats and data URI • Load/save data using SparkContext Hadoop file API
  • 22. ‹#› Data Movement Dynamic queries to MongoDB vs. BSON snapshots in HDFS Dynamic queries with most recent data Puts load on operational database Snapshots move load to Hadoop Snapshots add predictable load to MongoDB
  • 25. ‹#› MovieWeb Web Application • Browse - Top movies by ratings count - Top genres by movie count • Log in to - See My Ratings - Rate movies • Recommendations - Movies You May Like - Recommendations
  • 26. ‹#› MovieWeb Components • MovieLens dataset – 10M ratings, 10K movies, 70K users – http://grouplens.org/datasets/movielens/ • Python web app to browse movies, recommendations – Flask, PyMongo • Spark app computes recommendations – MLLib collaborative filter • Predicted ratings are exposed in web app – New predictions collection
  • 27. ‹#› Spark Recommender • Apache Hadoop (2.3) - HDFS & YARN - Top genres by movie count • Spark (1.0) - Execute within YARN - Assign executor resources • Data - From HDFS, MongoDB - To MongoDB
  • 28. ‹#› MovieWeb Workflow Snapshot db as BSON Predict ratings for all pairings Write Prediction to MongoDB collection Store BSON in HDFS Read BSON into Spark App Create user movie pairing Web Application exposes recommendations Train Model from existing ratings Repeat Process
  • 29. ‹#› Execution $ spark-submit --master local --driver-memory 2G --executor-memory 2G --jars mongo-hadoop-core.jar,mongo-java-driver.jar --class com.mongodb.workshop.SparkExercise ./target/spark-1.0-SNAPSHOT.jar hdfs://localhost:9000 mongodb://127.0.0.1:27017/movielens predictions
  • 30. Should I use MongoDB or Hadoop?
  • 31. ‹#› Business First! First-Level Analytics Internet of Things Mobile Apps Social What/Why How Product/Asse t Catalog Security & Fraud Customer Data Management Single View Churn Analysis Risk Modeling Trade Surveillance Sentiment Analysis Recommend er Warehouse & ETL Predictive Analytics Ad Targeting
  • 32. ‹#› The good tool for the task • Dataset size • Data processing complexity • Continuous improvement V1.0
  • 33. ‹#› The good tool for the task • Dataset size • Data processing complexity • Continuous improvement V2.0
  • 34. ‹#› Resources / Questions • MongoDB Connector for Hadoop - http://github.com/mongodb/mongo-hadoop • Getting Started with MongoDB and Hadoop - http://docs.mongodb.org/ecosystem/tutorial/getting-started- with-hadoop/ • MongoDB-Spark Demo - https://github.com/crcsmnky/mongodb-hadoop-workshop
  • 35. MongoDB & Hadoop Tugdual Grall Technical Evangelist tug@mongodb.com @tgrall

Notas do Editor

  1. Apache def, a framework to enable many things Distributed File system one of the core component is MapReduce Now it is more YARN, that is resource manager, and MR is just one type of jobs you can manage Mongo DB : GB and Terabytes Hadoop : Tb and Pb
  2. You have 2 places where you deal with data
  3. You have to think about “enrichment” MongoDB is here to enrich data that are in Hadoop Hadoop is here to enrich data that are in Mongodb Let’s look at the different uses cases between Operational and Analytics
  4. First level could be done in MongoDB “what is your application is talking to?”
  5. Hadoop will be there to analyze a bigger problem and do some treatment We are talking of hadoop when it is Pb of data
  6. We are trying to solve the bigger problem, by connecting the 2 technologies when it makes sense
  7. Split the data when reading data (Mapper) But also filtering queries, for example to take data from a specific timestamp To reduce the load on your cluster read from replicaset tags a new feature that people asked for, is adding result to the existing collection
  8. Spark in a new data processing that happens most in memory Take all the power of the connector Open a new “hadoop file” that is loaded in RDD ( Resilient Distributed Dataset )
  9. Load Data Read Users from MongoDB (user collection) Read Movies from BSON (HDFS) Read Ratings from MongoDB (ratings collection) Data Processing Generate (user/movies) pairs users.cartesian(movies) Train : Collaborative Filter ALS.train(ratings.rdd(), 10, 10, 0.01); Predict/Recommend Save data into MongoDB prediction collection