Massimo Brignoli
Principal Solution Architect
massimo@mongodb.com
@massimobrignoli
The Next-Generation Enterprise-Class Architecture
Agenda
• The birth of Data Lakes
• MongoDB overview
• A proposed EDM architecture
• Case Study & Scenarios
• Data Lake Lessons Learned
How much data?
• One thing companies are not short of: data
– Sensor streams
– Social media sentiment
– Server logs
– Mobile apps
• Analysts estimate that data volume grows by 40% per year,
and that 90% of it is unstructured.
• Traditional technologies (some designed 40 years ago) are not
sufficient
The Promise of "Big Data"
• Discovering information by collecting and analyzing data
holds the promise of
– A competitive advantage
– Cost savings
• A common example of Big Data technology in use is the
"Single View": aggregating everything known about a
customer to improve engagement and revenue
• The traditional EDW is creaking under the load, overwhelmed by
the volume and variety of data (and by its high cost).
The Birth of Data Lakes
• Many companies have started looking at an architecture
called the Data Lake:
– A platform for managing data flexibly
– Aggregates cross-silo data in a single place
– Allows exploration of all the data
• The most popular platform at the moment is Hadoop:
– Scales horizontally on commodity hardware
– Supports a read-optimized schema for varied data
– Includes data processing layers in SQL and common languages
– Major references (Yahoo and Google first among them)
Why Hadoop?
• The Hadoop Distributed File System (HDFS) is designed to scale
for large batch operations
• It provides a write-once, read-many, append-only model
• It is optimized for long scans over TB or PB of data
• This ability to handle multi-structured data is used for:
– Customer segmentation for marketing campaigns and
recommendations
– Predictive analytics
– Risk models
But is it good for everything?
• Data Lakes are designed to deliver Hadoop's output to
online applications. These applications have requirements
including:
– Response latency in milliseconds
– Random access to an indexed subset of the data
– Support for expressive queries and data aggregations
– Real-time updates of data whose values change frequently
Is Hadoop the answer to everything?
• In today's data-driven world, milliseconds
matter.
– IBM researchers state that 60% of data loses its value within
milliseconds of being generated
– For example, identifying a fraudulent stock trade is useless
after a few minutes
• Gartner predicts that 70% of Hadoop deployments will fail
to meet their cost and revenue-growth
goals.
Enterprise Data Management Pipeline
[Diagram: siloed source databases, external feeds (batch), and streams feed pub-sub, ETL, file imports, and stream processing; data then flows through Store raw data → Transform → Aggregate → Analyze to users and other systems]
Stream icon from: https://en.wikipedia.org/wiki/File:Activity_Streams_icon.png
More specifically
• Unnecessary joins cause poor performance
• Scaling vertically is expensive
• A rigid schema makes it hard to consolidate
variable or unstructured data
• Differences between records have to be eliminated
during the aggregation phase
• Process often takes many hours overnight
• Data is too stale for intraday decisions and
engagement
First, Quick MongoDB Background
Documents Enable Dynamic Schema & Optimal
Performance
Relational MongoDB
{ customer_id : 1,
  first_name : "Mark",
  last_name : "Smith",
  city : "San Francisco",
  phones: [
    {
      number : "1-212-777-1212",
      dnc : true,
      type : "home"
    },
    {
      number : "1-212-777-1213",
      type : "cell"
    }]
}
Customer ID  First Name  Last Name  City
0            John        Doe        New York
1            Mark        Smith      San Francisco
2            Jay         Black      Newark
3            Meagan      White      London
4            Edward      Daniels    Boston

Phone Number    Type  DNC     Customer ID
1-212-555-1212  home  T       0
1-212-555-1213  home  T       0
1-212-555-1214  cell  F       0
1-212-777-1212  home  T       1
1-212-777-1213  cell  (null)  1
1-212-888-1212  home  F       2
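To make the relational-versus-document comparison concrete, here is a minimal plain JavaScript sketch (no database required) that folds phone rows into their customer record. The arrays are hypothetical in-memory copies of a few rows from the two tables, and `toDocuments` is an illustrative helper, not part of any MongoDB API.

```javascript
// Hypothetical in-memory copies of a few rows from the two relational tables.
const customers = [
  { customer_id: 0, first_name: "John", last_name: "Doe", city: "New York" },
  { customer_id: 1, first_name: "Mark", last_name: "Smith", city: "San Francisco" },
];
const phones = [
  { number: "1-212-555-1212", type: "home", dnc: true, customer_id: 0 },
  { number: "1-212-777-1212", type: "home", dnc: true, customer_id: 1 },
  { number: "1-212-777-1213", type: "cell", dnc: null, customer_id: 1 },
];

// Fold each customer's phone rows into an embedded array. The foreign key
// is dropped, and null columns are simply omitted from the document.
function toDocuments(customerRows, phoneRows) {
  return customerRows.map((c) => ({
    ...c,
    phones: phoneRows
      .filter((p) => p.customer_id === c.customer_id)
      .map(({ customer_id, ...rest }) =>
        Object.fromEntries(Object.entries(rest).filter(([, v]) => v !== null))
      ),
  }));
}

const docs = toDocuments(customers, phones);
console.log(JSON.stringify(docs[1], null, 2));
```

Reading one customer with all of their phones then needs no join: the data that is used together is stored together, and the (null) DNC value becomes an absent field.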
Document Model Benefits
Agility and flexibility
Data model supports business change
Rapidly iterate to meet new requirements
Intuitive, natural data representation
Eliminates ORM layer
Developers are more productive
Reduces the need for joins, disk seeks
Programming is simpler
Performance delivered at scale
{
  customer_id : 1,
  first_name : "Mark",
  last_name : "Smith",
  city : "San Francisco",
  phones: [
    {
      number : "1-212-777-1212",
      dnc : true,
      type : "home"
    },
    {
      number : "1-212-777-1213",
      type : "cell"
    }]
}
MongoDB Technical Capabilities
[Diagram: Application → Driver → Mongos → Shard 1, Shard 2, … Shard N; each shard has one Primary and two Secondaries]
1. Dynamic document schema
   { name: "John Smith",
     date: "2013-08-01",
     address: "10 3rd St.",
     phone: {
       home: 1234567890,
       mobile: 1234568138 }
   }
   db.customer.insert({…})
   db.customer.find({ name: "John Smith" })
2. Native language drivers
3. High availability
4. Workload isolation
5. High performance
   - Data locality
   - Indexes
   - RAM
6. Horizontal scalability
   - Sharding
Drivers & Ecosystem
Morphia, MEAN Stack
Java, Python, Perl, Ruby
Scale
Performance: 250M ticks/sec, 300K+ ops/sec, 500K+ ops/sec
Data: petabytes, 10s of billions of objects, 13B documents
Cluster: 1,400 servers, 1,000+ servers, 250+ servers
Customers shown: Fed Agency, Entertainment Co., Asian Internet Co.
3.2 Features Relevant for EDM
• WiredTiger as default storage engine
• In-memory storage engine
• Encryption at rest
• Document Validation Rules
• Compass (data viewer & query builder)
• Connector for BI (Visualization)
• Connector for Hadoop
• Connector for Spark
• $lookup (left outer join)
Data Governance with Document Validation
Implement data governance without
sacrificing agility that comes from dynamic
schema
• Enforce data quality across multiple
teams and applications
• Use familiar MongoDB expressions to
control document structure
• Validation is optional and can be as
simple as a single field, all the way to
every field, including existence, data
types, and regular expressions
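As a rough illustration of what such rules check, the sketch below writes a validator with the query operators `$exists`, `$type`, and `$regex`, then evaluates it with a tiny plain JavaScript interpreter. The field names and the `isValid` helper are hypothetical, and the interpreter handles only these three operators and only the "string" type; in MongoDB 3.2 the server does the real enforcement when a validator document is passed to `db.createCollection`.

```javascript
// Validator expressed with MongoDB query operators. Field names are hypothetical.
const validator = {
  name: { $exists: true, $type: "string" },
  phone: { $type: "string", $regex: /^\d{3}-\d{3}-\d{4}$/ },
};

// Simplified stand-ins for three operators; only the "string" type is handled.
const operators = {
  $exists: (value, wanted) => (value !== undefined) === wanted,
  $type: (value, wanted) => wanted === "string" && typeof value === "string",
  $regex: (value, pattern) => typeof value === "string" && pattern.test(value),
};

// A document passes when every rule on every listed field holds.
function isValid(doc, rulesByField) {
  return Object.entries(rulesByField).every(([field, rules]) =>
    Object.entries(rules).every(([op, arg]) => operators[op](doc[field], arg))
  );
}

console.log(isValid({ name: "John Smith", phone: "123-456-7890" }, validator));
console.log(isValid({ name: 42, phone: "123-456-7890" }, validator));
```

Fields the validator does not mention (an extra `address`, for example) are not checked at all, which is how governance and a dynamic schema can coexist.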
MongoDB Compass
For fast schema discovery and
visual construction of ad-hoc
queries
• Visualize schema
– Frequency of fields
– Frequency of types
– Determine validator rules
• View Documents
• Graphically build queries
• Authenticated access
MongoDB Connector for BI
Visualize and explore multi-dimensional
documents using SQL-based BI tools. The
connector does the following:
• Provides the BI tool with the schema of the
MongoDB collection to be visualized
• Translates SQL statements issued by the BI
tool into equivalent MongoDB queries that
are sent to MongoDB for processing
• Converts the results into the tabular format
expected by the BI tool, which can then
visualize the data based on user
requirements
Dynamic Lookup
Combine data from multiple
collections with left outer joins for
richer analytics & more flexibility in
data modeling
• Blend data from multiple sources
for analysis
• Higher performance analytics with
less application-side code and less
effort from your developers
• Executed via the new $lookup
operator, a stage in the MongoDB
Aggregation Framework pipeline
Aggregation Framework – Pipelined Analysis
Start with the original collection; each record
(document) contains a number of shapes (keys),
each with a particular color (value)
• $match filters out documents that don’t
contain a red diamond
• $project adds a new “square” attribute with a
value computed from the value (color) of the
snowflake and triangle attributes
• $lookup performs a left outer join with
another collection, with the star being the
comparison key
• Finally, the $group stage groups the data by
the color of the square and produces statistics
for each group
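The four stages described above can be imitated in plain JavaScript to show how data flows from one stage to the next. This is only a sketch of the semantics under stated assumptions: the sample documents, the `starInfo` collection, and the helper functions are hypothetical, and a real pipeline would be passed to `aggregate()` and executed inside the database.

```javascript
// Hypothetical sample collection: shapes are keys, colors are values.
const shapes = [
  { _id: 1, diamond: "red", snowflake: "blue", triangle: "green", star: "yellow" },
  { _id: 2, diamond: "red", snowflake: "blue", triangle: "green", star: "purple" },
  { _id: 3, diamond: "blue", snowflake: "red", triangle: "red", star: "yellow" },
  { _id: 4, diamond: "red", snowflake: "green", triangle: "green", star: "orange" },
];
// Hypothetical second collection, joined on the star color.
const starInfo = [
  { star: "yellow", label: "gold" },
  { star: "purple", label: "royal" },
];

// Stand-ins for the stages; each takes documents and returns new documents.
const match = (docs, predicate) => docs.filter(predicate);
const project = (docs, fn) => docs.map(fn);
// Left outer join: a document with no partner keeps an empty array.
const lookup = (docs, from, localField, foreignField, as) =>
  docs.map((d) => ({ ...d, [as]: from.filter((f) => f[foreignField] === d[localField]) }));
// Group by a key and count the members of each group.
const group = (docs, keyFn) => {
  const counts = new Map();
  for (const d of docs) counts.set(keyFn(d), (counts.get(keyFn(d)) || 0) + 1);
  return [...counts].map(([_id, count]) => ({ _id, count }));
};

const matched = match(shapes, (d) => d.diamond === "red");
const projected = project(matched, (d) => ({ ...d, square: `${d.snowflake}-${d.triangle}` }));
const joined = lookup(projected, starInfo, "star", "star", "info");
const result = group(joined, (d) => d.square);
console.log(result);
```

Document 4 has an orange star with no partner in `starInfo`, so it survives the join with an empty `info` array; that is the left outer join behavior the slide attributes to $lookup.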
DB & Partner Ecosystem
RANK  DBMS                  MODEL            SCORE  GROWTH (20 MO)
1.    Oracle                Relational DBMS  1,442  -5%
2.    MySQL                 Relational DBMS  1,294  2%
3.    Microsoft SQL Server  Relational DBMS  1,131  -10%
4.    MongoDB               Document Store   277    172%
5.    PostgreSQL            Relational DBMS  273    40%
6.    DB2                   Relational DBMS  201    11%
7.    Microsoft Access      Relational DBMS  146    -26%
8.    Cassandra             Wide Column      107    87%
9.    SQLite                Relational DBMS  105    19%
Only non-relational in the top 5; 2.5x ahead of nearest NoSQL Competitor
Partner Ecosystem (500+)
MongoDB Architecture Patterns
1. Operational Data Store (ODS)
2. Enterprise Data Service
3. Datamart/Cache
4. Master Data Distribution
5. Single Operational View
6. Operationalizing Hadoop
System of Record
System of Engagement
Enterprise Data Management Pipeline
[Pipeline diagram: sources → Store raw data → Transform → Aggregate → Analyze → users and other systems]
MongoDB Hadoop/Spark Connector
MongoDB:
• Sub-second latency
• Expressive querying
• Flexible indexing
• Aggregations in database
• Great for any subset of data
Hadoop/Spark (distributed processing/analytics):
• Longer jobs
• Batch analytics
• Append-only files
• Great for scanning all data or large subsets in files
Connectors:
- MongoDB Hadoop Connector
- Spark-mongodb
Both provide:
• Schema-on-read
• Low TCO
• Horizontal scale
How to choose the data management layer for each or
all stages?
Processing layer: ?
MongoDB, when you want:
1. Secondary indexes
2. Sub-second latency
3. Aggregations in DB
4. Updates of data
Files (e.g. HDFS), for:
1. Scanning files
2. When indexes are not needed
Wide column store (e.g. HBase), for:
1. Primary key queries
2. If multiple indexes & slices are not needed
3. Optimized for writing, not reading
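The selection criteria on this slide can be read as a simple rule of thumb. The sketch below encodes one possible reading in plain JavaScript; the flag names, the `chooseStore` helper, and the "MongoDB" and "files on HDFS" labels for the first two options are assumptions drawn from the rest of the deck, not a definitive rule.

```javascript
// Store labels; the first and last are assumed from the surrounding slides.
const MONGODB = "MongoDB";
const WIDE_COLUMN = "wide column store (e.g. HBase)";
const FILES = "files on HDFS";

// Hypothetical helper: pick a data store for one pipeline stage.
// `needs` is a plain object of boolean flags (names are illustrative).
function chooseStore(needs) {
  // Indexes, low latency, in-database aggregation, or in-place updates.
  if (needs.secondaryIndexes || needs.subSecondLatency ||
      needs.inDbAggregations || needs.updates) {
    return MONGODB;
  }
  // Write-heavy access by primary key only, with no slicing by other fields.
  if (needs.primaryKeyQueriesOnly) {
    return WIDE_COLUMN;
  }
  // Otherwise full scans over files are enough and indexes add nothing.
  return FILES;
}

console.log(chooseStore({ subSecondLatency: true }));
console.log(chooseStore({ primaryKeyQueriesOnly: true }));
console.log(chooseStore({}));
```

In practice the choice is made per stage, so the same pipeline may end up using more than one store.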
Data Store for Transformed Dataset
[Pipeline diagram: sources → Store raw data → Transform → Aggregate → Analyze → users and other systems]
Data Store for Raw Dataset
[Pipeline diagram, highlighting the Store raw data → Transform step]
Store raw data
- Typically just writing record-by-record from source data
- Usually just need high write volumes
- All 3 options handle that
Transform read requirements
- Benefits to reading multiple datasets sorted [by index], e.g. to do a merge
- Might want to look up across tables with indexes (and join functionality in MDB v3.2)
- Want high read performance while writes are happening
Interactive querying on the raw data could use indexes with MongoDB
Data Store for Transformed Dataset
[Pipeline diagram, highlighting the Transform → Aggregate step]
There are often benefits to updating data while merging multiple datasets
Dashboards & reports can have sub-second latency with indexes
Aggregate read requirements
- Benefits to using indexes for grouping
- Aggregations natively in the DB would help
- With indexes, can do aggregations on slices of data
- Might want to look up across tables with indexes to aggregate
Data Store for Aggregated Dataset
[Pipeline diagram, highlighting the Aggregate → Analyze step]
Dashboards & reports can have sub-second latency with indexes
Analytics read requirements
- For scanning all of the data, could be in any data store
- Often want to analyze a slice of data (using indexes)
- Querying on slices is best in MongoDB
Data Store for Last Dataset
[Pipeline diagram, highlighting the Analyze → Users / Other Systems step]
Users
Dashboards & reports can have sub-second latency with indexes
- At the last step, there are many consuming systems and users
- Need expressive querying with secondary indexes
- MongoDB is the best option for the publication or distribution of analytical results and operationalization of data
Other Systems
Often digital applications
- High scale
- Expressive querying
- JSON preferred
Often RESTful services, APIs
More Complete EDM Architecture & Data Lake
[Diagram labels: siloed source databases, external feeds (batch), streams; pub-sub, ETL, file imports; stream processing; data processing pipeline; drivers & stacks; downstream systems (Single CSR Application, Unified Digital Apps, Operational Reporting, Analytic Reporting); distributed processing (Customer Clustering, Churn Analysis, Predictive Analytics); Data Lake]
- Governance to choose where to load and process data
- Optimal location for providing operational response times & slices
- Can run processing on all data or slices
Example scenarios
1. Single Customer View
a. Operational
b. Analytics on customer segments
c. Analytics on all customers
2. Customer profiles & clustering
3. Presenting churn analytics on high value customers
Single View of Customer
Spanish bank replaces Teradata and Microstrategy to
increase business and avoid significant cost
Problem
- Took days to implement new functionality and business policies, inhibiting revenue growth
- Branches needed an app providing a single view of the customer and real-time recommendations for new products and services
- Multi-minute latency for accessing customer data stored in Teradata and Microstrategy
Solution (Why MongoDB)
- Built single view of customer on MongoDB: a flexible and scalable app that is easy to adapt to new business needs
- Super fast, ad hoc query capabilities (milliseconds), and real-time analytics thanks to MongoDB's Aggregation Framework
- Can now leverage distributed infrastructure and commodity hardware for lower total cost of ownership and greater availability
Results
- Cost avoidance of $10M+
- Application developed and deployed in less than 6 months. New business policies easily deployed and executed, bringing new revenue to the company
- Current capacity allows branches to load all customer info in milliseconds, providing a great customer experience
Large Spanish
Bank
Case Study
Insurance leader generates coveted single view of
customers in 90 days – “The Wall”
Problem
- No single view of customer, leading to poor customer experience and churn
- 145 years of policy data, 70+ systems, 15+ apps that are not integrated
- Spent 2 years and $25M trying to build a single view with Oracle, and failed
Solution (Why MongoDB)
- Built "The Wall", pulling in disparate data and serving a single view to customer service reps in real time
- Flexible data model to aggregate disparate data into a single data store
- Churn analysis done with Hadoop, with relevant results output to MongoDB
Results
- Prototyped in 2 weeks
- Deployed to production in 90 days
- Decreased churn and improved ability to upsell/cross-sell
Top 15
Global Bank
Kicking Out Oracle
Global bank with 48M customers in 50 countries terminates
Oracle ULA & makes MongoDB database of choice
Problem
- Slow development cycles due to the RDBMS's rigid data model, hindering ability to meet business demands
- High TCO for hardware, licenses, development, and support (>$50M Oracle ULA)
- Poor overall performance of customer-facing and internal applications
Solution (Why MongoDB)
- Building dozens of apps on MongoDB, both net new and migrations from Oracle (e.g., significant portion of retail banking, including customer-facing and backoffice apps, fraud detection, card activation, equity research content mgt.)
- Flexible data model to develop apps quickly and accommodate diverse data
- Ability to scale infrastructure and costs elastically
Results
- Able to cancel Oracle ULA. Evaluating what apps can be migrated to MongoDB. For new apps, MongoDB is the default choice
- Apps built in weeks instead of months or years, e.g., ebanking app prototyped in 2 weeks and in production in 4 weeks
- 70% TCO reduction
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 

Next-Generation Enterprise-Class Architecture

  • 1. Massimo Brignoli Principal Solution Architect massimo@mongodb.com @massimobrignoli Next-Generation Enterprise-Class Architecture
  • 2. Agenda • The rise of data lakes • MongoDB overview • A proposed EDM architecture • Case studies & scenarios • Data lake lessons learned
  • 3. How much data? • One thing companies are not short of: data – Sensor streams – Social media sentiment – Server logs – Mobile apps • Analysts estimate that data volume is growing 40% per year, and that 90% of it is unstructured. • Traditional technologies (some designed 40 years ago) are not sufficient
  • 4. The Promise of “Big Data” • Discovering insights by collecting and analyzing data holds the promise of – A competitive advantage – Cost savings • A common example of Big Data technology in use is the “Single View”: aggregating everything known about a customer to improve engagement and revenue • The traditional EDW is creaking under the load, overwhelmed by the volume and variety of data (and by its high cost).
  • 5. The Rise of Data Lakes • Many companies have started looking at an architecture called the Data Lake: – A platform for managing data flexibly – Aggregates cross-silo data in a single place – Allows exploration of all the data • The most popular platform right now is Hadoop: – Scales horizontally on commodity hardware – Supports a read-optimized schema for varied data – Includes data processing layers in SQL and common languages – Major references (Yahoo and Google first and foremost)
  • 6. Why Hadoop? • The Hadoop Distributed File System (HDFS) is designed to scale for large batch operations • It provides a write-once, read-many, append-only model • Optimized for long scans over TB or PB of data • This ability to handle multi-structured data is used for: – Customer segmentation for marketing campaigns and recommendations – Predictive analytics – Risk models
  • 7. But is it right for everything? • Data lakes are designed to deliver Hadoop's output to online applications. These applications have requirements including: – Response latency in milliseconds – Random access to an indexed subset of data – Support for expressive queries and data aggregations – Real-time updates of data whose values change frequently
  • 8. Is Hadoop the answer to everything? • In today's data-driven world, milliseconds matter. – IBM researchers state that 60% of data loses value within milliseconds of being generated – For example, identifying a fraudulent stock trade is useless after a few minutes • Gartner predicts that 70% of Hadoop installations will fail because they do not meet their cost and revenue-growth objectives.
  • 9. Enterprise Data Management Pipeline [Diagram: siloed source databases, external feeds (batch), and streams feed, via pub-sub, ETL, file imports, and stream processing, a pipeline of Store raw data → Transform → Aggregate → Analyze, which serves users and other systems.] Stream icon from: https://en.wikipedia.org/wiki/File:Activity_Streams_icon.png
  • 10. More specifically • Unnecessary joins cause poor performance • Scaling vertically is expensive • A rigid schema makes it hard to consolidate variable or unstructured data • Differences between records must be eliminated during the aggregation phase • Processing often takes many hours overnight • Data is too stale for intraday decisions and engagement
  • 11. First, Quick MongoDB Background
  • 12. Documents Enable Dynamic Schema & Optimal Performance. MongoDB: { customer_id : 1, first_name : "Mark", last_name : "Smith", city : "San Francisco", phones: [ { number : "1-212-777-1212", dnc : true, type : "home" }, { number : "1-212-777-1213", type : "cell" } ] } Relational: Customers table (Customer ID | First Name | Last Name | City): 0 | John | Doe | New York; 1 | Mark | Smith | San Francisco; 2 | Jay | Black | Newark; 3 | Meagan | White | London; 4 | Edward | Daniels | Boston. Phones table (Phone Number | Type | DNC | Customer ID): 1-212-555-1212 | home | T | 0; 1-212-555-1213 | home | T | 0; 1-212-555-1214 | cell | F | 0; 1-212-777-1212 | home | T | 1; 1-212-777-1213 | cell | (null) | 1; 1-212-888-1212 | home | F | 2
  • 13. Document Model Benefits • Agility and flexibility – Data model supports business change – Rapidly iterate to meet new requirements • Intuitive, natural data representation – Eliminates ORM layer – Developers are more productive • Reduces the need for joins, disk seeks – Programming is simpler – Performance delivered at scale. Example document: { customer_id : 1, first_name : "Mark", last_name : "Smith", city : "San Francisco", phones: [ { number : "1-212-777-1212", dnc : true, type : "home" }, { number : "1-212-777-1213", type : "cell" } ] }
  • 14. MongoDB Technical Capabilities 1. Dynamic document schema: { name: "John Smith", date: "2013-08-01", address: "10 3rd St.", phone: { home: 1234567890, mobile: 1234568138 } } 2. Native language drivers: db.customer.insert({…}), db.customer.find({ name: "John Smith" }) 3. High availability 4. Workload isolation 5. High performance – Data locality – Indexes – RAM 6. Horizontal scalability – Sharding [Diagram: Application → Driver → Mongos → Shard 1, Shard 2, … Shard N, each shard with one primary and two secondaries.]
  • 16. Scale. Performance: 250M ticks/sec, 300K+ ops/sec, 500K+ ops/sec. Data: petabytes, 10s of billions of objects, 13B documents. Cluster: 1,400 servers, 1,000+ servers, 250+ servers. Organizations shown: Fed Agency, Entertainment Co., Asian Internet Co.
  • 17. 3.2 Features Relevant for EDM • WiredTiger as default storage engine • In-memory storage engine • Encryption at rest • Document Validation Rules • Compass (data viewer & query builder) • Connector for BI (Visualization) • Connector for Hadoop • Connector for Spark • $lookup (left outer join)
  • 18. Data Governance with Document Validation Implement data governance without sacrificing agility that comes from dynamic schema • Enforce data quality across multiple teams and applications • Use familiar MongoDB expressions to control document structure • Validation is optional and can be as simple as a single field, all the way to every field, including existence, data types, and regular expressions
  • 19. MongoDB Compass For fast schema discovery and visual construction of ad-hoc queries • Visualize schema – Frequency of fields – Frequency of types – Determine validator rules • View Documents • Graphically build queries • Authenticated access
  • 20. MongoDB Connector for BI Visualize and explore multi-dimensional documents using SQL-based BI tools. The connector does the following: • Provides the BI tool with the schema of the MongoDB collection to be visualized • Translates SQL statements issued by the BI tool into equivalent MongoDB queries that are sent to MongoDB for processing • Converts the results into the tabular format expected by the BI tool, which can then visualize the data based on user requirements
  • 21. Dynamic Lookup Combine data from multiple collections with left outer joins for richer analytics & more flexibility in data modeling • Blend data from multiple sources for analysis • Higher performance analytics with less application-side code and less effort from your developers • Executed via the new $lookup operator, a stage in the MongoDB Aggregation Framework pipeline
  • 22. Aggregation Framework – Pipelined Analysis Start with the original collection; each record (document) contains a number of shapes (keys), each with a particular color (value) • $match filters out documents that don’t contain a red diamond • $project adds a new “square” attribute with a value computed from the value (color) of the snowflake and triangle attributes • $lookup performs a left outer join with another collection, with the star being the comparison key • Finally, the $group stage groups the data by the color of the square and produces statistics for each group
  • 23. DB & Partner Ecosystem
  • 24. DBMS ranking (RANK | DBMS | MODEL | SCORE | GROWTH (20 MO)): 1 | Oracle | Relational DBMS | 1,442 | -5%; 2 | MySQL | Relational DBMS | 1,294 | 2%; 3 | Microsoft SQL Server | Relational DBMS | 1,131 | -10%; 4 | MongoDB | Document Store | 277 | 172%; 5 | PostgreSQL | Relational DBMS | 273 | 40%; 6 | DB2 | Relational DBMS | 201 | 11%; 7 | Microsoft Access | Relational DBMS | 146 | -26%; 8 | Cassandra | Wide Column | 107 | 87%; 9 | SQLite | Relational DBMS | 105 | 19%. Only non-relational in the top 5; 2.5x ahead of nearest NoSQL competitor
  • 26. MongoDB Architecture Patterns 1. Operational Data Store (ODS) 2. Enterprise Data Service 3. Datamart/Cache 4. Master Data Distribution 5. Single Operational View 6. Operationalizing Hadoop System of Record System of Engagement
  • 27. Enterprise Data Management Pipeline [Diagram: siloed source databases, external feeds (batch), and streams feed, via pub-sub, ETL, file imports, and stream processing, a pipeline of Store raw data → Transform → Aggregate → Analyze, which serves users and other systems.] Stream icon from: https://en.wikipedia.org/wiki/File:Activity_Streams_icon.png
  • 28. MongoDB Hadoop/Spark Connector. MongoDB: • Sub-second latency • Expressive querying • Flexible indexing • Aggregations in database • Great for any subset of data. Hadoop/Spark (distributed processing/analytics): • Longer jobs • Batch analytics • Append-only files • Great for scanning all data or large subsets in files. Connectors: – MongoDB Hadoop Connector – Spark-mongodb. Both provide: • Schema-on-read • Low TCO • Horizontal scale
  • 29. How to choose the data management layer for each or all stages? Processing Layer ? When you want: 1. Secondary indexes 2. Sub-second latency 3. Aggregations in DB 4. Updates of data For: 1. Scanning files 2. When indexes not needed Wide column store (e.g. HBase) For: 1. Primary key queries 2. If multiple indexes & slices not needed 3. Optimized for writing, not reading
  • 30. Data Store for Transformed Dataset [Diagram: siloed source databases, external feeds (batch), and streams feed, via pub-sub, ETL, file imports, and stream processing, a pipeline of Store raw data → Transform → Aggregate → Analyze, which serves users and other systems.] Stream icon from: https://en.wikipedia.org/wiki/File:Activity_Streams_icon.png
  • 31. Data Store for Raw Dataset [Pipeline diagram as on slide 9, highlighting the Store raw data and Transform stages.] Store raw data: – Typically just writing record-by-record from source data – Usually just need high write volumes – All 3 options handle that. Transform read requirements: – Benefits to reading multiple datasets sorted [by index], e.g. to do a merge – Might want to look up across tables with indexes (and join functionality in MDB v3.2) – Want high read performance while writes are happening. Interactive querying on the raw data could use indexes with MongoDB
  • 32. Data Store for Transformed Dataset [Pipeline diagram as on slide 9, highlighting the Transform and Aggregate stages.] • Often benefits from updating data while merging multiple datasets • Dashboards & reports can have sub-second latency with indexes. Aggregate read requirements: – Benefits to using indexes for grouping – Aggregations natively in the DB would help – With indexes, can do aggregations on slices of data – Might want to look up across tables with indexes to aggregate
  • 33. Data Store for Aggregated Dataset [Pipeline diagram as on slide 9, highlighting the Aggregate and Analyze stages.] • Dashboards & reports can have sub-second latency with indexes. Analytics read requirements: – For scanning all of the data, it could be in any data store – Often want to analyze a slice of data (using indexes) – Querying on slices is best in MongoDB
  • 34. Data Store for Last Dataset [Pipeline diagram as on slide 9, highlighting the Analyze stage and the consuming users and other systems.] Users: dashboards & reports can have sub-second latency with indexes. – At the last step, there are many consuming systems and users – Need expressive querying with secondary indexes – MongoDB is the best option for the publication or distribution of analytical results and operationalization of data. Other systems: often digital applications (– High scale – Expressive querying – JSON preferred); often RESTful services, APIs
  • 35. More Complete EDM Architecture & Data Lake [Diagram: siloed source databases, external feeds (batch), and streams enter a data processing pipeline via pub-sub, ETL, file imports, and stream processing. Downstream systems: single CSR application, unified digital apps, operational reporting, analytic reporting (through drivers & stacks). Distributed processing: customer clustering, churn analysis, predictive analytics. Annotations: governance to choose where to load and process data; optimal location for providing operational response times & slices; can run processing on all data or slices. Label: Data Lake.]
  • 36. Example scenarios 1. Single Customer View a. Operational b. Analytics on customer segments c. Analytics on all customers 2. Customer profiles & clustering 3. Presenting churn analytics on high value customers
  • 37. Single View of Customer (Large Spanish Bank): Spanish bank replaces Teradata and Microstrategy to increase business and avoid significant cost. Problem: • Took days to implement new functionality and business policies, inhibiting revenue growth • Branches needed an app providing single view of the customer and real-time recommendations for new products and services • Multi-minute latency for accessing customer data stored in Teradata and Microstrategy. Solution: • Built single view of customer on MongoDB: a flexible and scalable app that is easy to adapt to new business needs • Super fast, ad hoc query capabilities (milliseconds), and real-time analytics thanks to MongoDB’s Aggregation Framework • Can now leverage distributed infrastructure and commodity hardware for lower total cost of ownership and greater availability. Results: • Cost avoidance of $10M+ • Application developed and deployed in less than 6 months. New business policies easily deployed and executed, bringing new revenue to the company • Current capacity allows branches to load all customer info in milliseconds, providing a great customer experience
  • 38. Case Study: Insurance leader generates coveted single view of customers in 90 days – “The Wall”. Problem: • No single view of customer, leading to poor customer experience and churn • 145 years of policy data, 70+ systems, 15+ apps that are not integrated • Spent 2 years and $25M trying to build single view with Oracle – failed. Solution: • Built “The Wall”, pulling in disparate data and serving single view to customer service reps in real time • Flexible data model to aggregate disparate data into single data store • Churn analysis done with Hadoop, with relevant results output to MongoDB. Results: • Prototyped in 2 weeks • Deployed to production in 90 days • Decreased churn and improved ability to upsell/cross-sell
  • 39. Top 15 Global Bank Kicking Out Oracle: Global bank with 48M customers in 50 countries terminates Oracle ULA & makes MongoDB database of choice. Problem: • Slow development cycles due to RDBMS’ rigid data model hindering ability to meet business demands • High TCO for hardware, licenses, development, and support (>$50M Oracle ULA) • Poor overall performance of customer-facing and internal applications. Solution: • Building dozens of apps on MongoDB, both net new and migrations from Oracle (e.g., significant portion of retail banking, including customer-facing and backoffice apps, fraud detection, card activation, equity research content mgt.) • Flexible data model to develop apps quickly and accommodate diverse data • Ability to scale infrastructure and costs elastically. Results: • Able to cancel Oracle ULA. Evaluating what apps can be migrated to MongoDB. For new apps, MongoDB is default choice • Apps built in weeks instead of months or years, e.g., ebanking app prototyped in 2 weeks and in production in 4 weeks • 70% TCO reduction
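
The four-stage analysis described on slide 22 ($match, $project, $lookup, $group) can be written down as an aggregation pipeline definition. The sketch below is illustrative only: the collection names (`shapes`, `stars`) and the field names (`diamond`, `snowflake`, `triangle`, `star`, `square`, `starDetails`) are assumptions invented to mirror the slide's shapes-and-colors metaphor, not names from the talk. It only builds the pipeline as plain data; actually running it would need a MongoDB 3.2+ server.

```javascript
// Hypothetical pipeline mirroring slide 22; all collection and field names are assumed.
const pipeline = [
  // Keep only documents that contain a red diamond.
  { $match: { diamond: "red" } },
  // Add a computed "square" field derived from the snowflake and triangle values.
  { $project: {
      star: 1,
      square: { $concat: ["$snowflake", "-", "$triangle"] }
  } },
  // Left outer join against the (assumed) "stars" collection on the star value.
  { $lookup: {
      from: "stars",
      localField: "star",
      foreignField: "star",
      as: "starDetails"
  } },
  // Group by the value of the square and count the documents in each group.
  { $group: { _id: "$square", count: { $sum: 1 } } }
];

// In the mongo shell this would be executed with: db.shapes.aggregate(pipeline)
const stageNames = pipeline.map((stage) => Object.keys(stage)[0]);
console.log(stageNames.join(" -> "));
```

Keeping the pipeline as an ordinary array makes the stage order explicit: each stage consumes the output of the one before it, so the filter runs first and the grouping last.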

Editor's Notes

  1. Stream processing is often a separate processing layer from batch processing, but its output can be stored in the data stores at various stages
  2. Could make more visual
  3. More info: http://www.mongodb.com/mongodb-scale
  4. Kernel 3.2 Scope tracking: https://docs.google.com/spreadsheets/d/1L1EbbWoshUIHXBzCh5e3sALtAFxm_dJ52SRPR6GzeAY/edit#gid=0 Release notes for 3.1.6: http://docs.mongodb.org/manual/release-notes/3.1-dev-series/
  5. Determine validator rules: You can use the tool to figure out what you want to set as validation rules
  6. $lookup – this creates new documents which contain everything from the previous stage but augmented with data from any document from the second collection containing a matching colored star (i.e., the blue and yellow stars had matching lookup values, whereas the red star had none)
  7. In terms of reporting, a number of Business Intelligence (BI) vendors have developed connectors to integrate MongoDB as a data source with their suites, alongside traditional relational databases. This integration provides reporting, visualization, and dashboarding of MongoDB data
  8. Stream processing is often a separate processing layer from batch processing, but its output can be stored in the data stores at various stages
  9. Just a logical diagram. Processing could be on same physical servers as storage nodes to minimize data movement
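
The left-outer-join behavior described in note 6 (every document from the previous stage is kept and augmented with any matching documents from the second collection) can be illustrated with a small in-memory sketch. This is not MongoDB's implementation, only a plain JavaScript imitation of the semantics; the function name `leftOuterLookup` and the sample data are hypothetical.

```javascript
// In-memory illustration of left-outer-join semantics; NOT MongoDB's implementation.
function leftOuterLookup(left, right, localField, foreignField, as) {
  return left.map((doc) => {
    const matches = right.filter((other) => other[foreignField] === doc[localField]);
    // Every input document is kept; unmatched ones get an empty array.
    return { ...doc, [as]: matches };
  });
}

// Hypothetical sample data echoing the note: blue and yellow stars match, red does not.
const shapes = [
  { _id: 1, star: "blue" },
  { _id: 2, star: "yellow" },
  { _id: 3, star: "red" }
];
const stars = [
  { _id: "a", star: "blue", size: "large" },
  { _id: "b", star: "yellow", size: "small" }
];

const joined = leftOuterLookup(shapes, stars, "star", "star", "starDetails");
for (const doc of joined) {
  console.log(doc._id, doc.star, doc.starDetails.length);
}
```

The document with the red star still appears in the result with an empty `starDetails` array, which is what distinguishes a left outer join from an inner join.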