4. Evolution of Computing Models
Large Data Center: manage H/W; wasted resources; $$$$$
Virtual Machines: manage less H/W; better utilization; $$$$
VMs in the Cloud (EC2): size & provision VMs; rent, not buy; $$$
Containers: size & provision containers; rent with less waste; $$
Serverless & Services: just send in requests; pay as you go; $
5. So you decided to build your app with MongoDB and Docker…
• Easy: Work with data in a natural, intuitive way
• Flexible: Adapt and make changes quickly
• Fast: Get great performance with less code
• Versatile: Supports a wide variety of data models and queries
• Modern platform for all applications
• Faster time to market
• Developer productivity
• Developer velocity
• IT infra reduction
• IT ops efficiency
• Faster issue resolution
6. So you decided to build your app with MongoDB and Docker…
Focus on Developers
Focus on Time to Market
Focus on What Matters
Same in Dev as Prod
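7. So, let's start simply…
FROM mongo
# Install the Node.js runtime next to MongoDB in the same image, then clear the apt cache
RUN apt-get update && apt-get install -y git nodejs npm && apt-get clean
# source/dest are placeholders for the application code
COPY source dest
# RUN only executes at build time, so mongod must be started when the container runs;
# --fork backgrounds mongod (it requires --logpath) and the app then runs in the foreground
CMD mongod --fork --logpath /var/log/mongod.log && npm run start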
8. Congrats! Your application is a success!
Everything works great until a new business requirement arrives: Analytics
9. For more than a decade, organizations have been pursuing the promise of Digital Transformation
10. 88% of CIOs believe they have yet to benefit from their digital strategy
Source: Harvey Nash / KPMG CIO Survey 2017
13. For Example, the Auto/Transportation Industry
“The source of the value is no longer the vehicle itself”
Millennials don’t care about owning cars
People will always need to get from “A” to “B”
Must find new sources of product/service value
“Cars as Product” to “Transportation as a Service”
14. Disruptors Force Incumbents to Transform
[Diagram: disruptors vs. incumbents, with incumbents known for consumer appliances/industrial manufacturing, tire manufacturing, and internet/marketing]
15. We accelerated time to value to get new apps and new features to our customers, but the data is still SILOED | COMPLEX | TRAPPED
16. Old World
Code user authentication
Code data access controls
Provision backend server
Install runtime environment
Add code to make backend HA
Add code to scale backend
Monitor & manage backend infrastructure
Code REST API for frontend to use backend
Code backend application logic
Code application frontend
Code against each external service API
Continuously poll database for changes
New World
Simple JSON config
Handled automatically by services
Code frontend using a single SDK/API to access backend services
Provide code for Functions
(Layers compared: Backend, Data Access, Frontend)
17. Data Democratization: getting data to the right people in the right format at the right time so that they may turn it into actionable information, knowledge, and wisdom.
18. Data Democratization – What and Why?
“We spend more time shoveling coal than steering the ship. We want to shift our energy to looking at the data and navigating where we are going.”
-- Robert Kagarise, director of population health informatics and IT for the Delaware Valley Accountable Care Organization
What it is...
• Data and insights accessible to all who need them
• No technical barriers to accessing the data and insights
• Proper access control and governance
Why it exists...
• Data is siloed/complex, making it difficult to reconcile and deliver insights to consumers of varying technical abilities
• Wrangling data is time-intensive for IT/sys admins/data stewards, delaying insights and their business benefit
• Security is non-negotiable (esp. in healthcare)
19. Data Democratization – Who and How?
Technicians
• Create a foundation for data that's easy to organize and enrich over time
• Reduce complexity and technical debt
• Define security logic that works across orgs and apps
Developers
• Connect to the data via APIs & microservices instead of duplicating it or coupling tightly to monoliths
Analysts
• Use self-service analytics tools
Front-Line Workers
• Access insights through pre-built dashboards or directly within apps
20. Both the Opportunity and the Challenge Are Data...
Patients, Members, Billing, Physicians, Hospitals, Pharmacy, Lab Results, Procedures, Medications, Sales, Enrollment, Claims, Web, Mobile, Social
...but data across the ecosystem is shaped very differently
21. Need to Think About the Data Differently
Tabular (Relational) Data Model: related data split across multiple records and tables
Document Data Model: related data contained in a single, rich document
{
  "_id" : ObjectId("5ad88534e3632e1a35a58d00"),
  "name" : {
    "first" : "John",
    "last" : "Doe" },
  "address" : [
    { "location" : "work",
      "address" : {
        "street" : "16 Hatfields",
        "city" : "London",
        "postal_code" : "SE1 8DJ"},
      "geo" : { "type" : "Point", "coordinates" : [
        -0.109081, 51.5065752]}},
    + {...}
  ],
  "phone" : [
    { "location" : "work",
      "number" : "+44-1234567890"},
    + {...}
  ],
  "dob" : ISODate("1977-04-01T05:00:00Z"),
  "retirement_fund" : NumberDecimal("1292815.75")
}
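To make the contrast concrete, here is a minimal Python sketch (plain dictionaries, no database) of what the document model does: rows that the relational model keeps in separate tables are folded into one nested document per person. The table layout and the person_id column are hypothetical; only the field names mirror the example document above.

```python
from collections import defaultdict

# Hypothetical rows, shaped the way a relational model would store them:
# one table per entity, linked by person_id.
people = [{"person_id": 1, "first": "John", "last": "Doe"}]
addresses = [
    {"person_id": 1, "location": "work", "street": "16 Hatfields",
     "city": "London", "postal_code": "SE1 8DJ"},
]
phones = [{"person_id": 1, "location": "work", "number": "+44-1234567890"}]


def to_documents(people, addresses, phones):
    """Fold the child-table rows into one nested document per person."""
    addresses_by_person = defaultdict(list)
    for row in addresses:
        addresses_by_person[row["person_id"]].append({
            "location": row["location"],
            "address": {k: row[k] for k in ("street", "city", "postal_code")},
        })

    phones_by_person = defaultdict(list)
    for row in phones:
        phones_by_person[row["person_id"]].append(
            {"location": row["location"], "number": row["number"]})

    return [
        {
            "_id": p["person_id"],
            "name": {"first": p["first"], "last": p["last"]},
            "address": addresses_by_person[p["person_id"]],
            "phone": phones_by_person[p["person_id"]],
        }
        for p in people
    ]


docs = to_documents(people, addresses, phones)
print(docs[0]["address"][0]["address"]["city"])  # London
```

Reading a person back then means fetching one document instead of joining three tables; the trade-off is that the document's shape should follow how the application actually reads the data.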
22.
Tabular (Relational) Data Model
Related data split across multiple records and tables
This becomes complex & rigid to change
23. How to Approach This Transformation?
Data Layer Optimization:
• unlocks the value of data stored in silos and legacy systems
• drives rapid, iterative integration of data sources for new and existing consuming applications
• supports enterprise data governance efforts
• builds deep technical expertise and best practices
[Diagram: Data Source Integration, Data APIs, Data Loading and Streaming, Legacy Systems Offloading, Data Analytics, Data Governance]
25. Freedom to run anywhere
Local, on-premises (server & mainframe), private cloud, hybrid cloud, public cloud, fully managed cloud service
• Database that runs the same everywhere
• Leverage the benefits of a multi-cloud strategy
• Global coverage
• Avoid lock-in
Convenience: same codebase, same APIs, same tools, wherever you run
26. Docker Deployment Options with MongoDB
Docker container running mongod
Docker Swarm & Compose with a replica set
Docker & Kubernetes with replica sets / sharded clusters, using the Kubernetes Operator with Ops Manager
Cloud-native & MongoDB Atlas
27. MongoDB Solution: Replica Sets
Replica Set – up to 50 members, at most 7 of which vote in elections
Self-healing
Data Center Aware
Addresses availability considerations:
• High Availability
• Disaster Recovery
• Maintenance
Workload Isolation: operational & analytics
[Diagram: Application → Driver → Primary, replicating to two Secondaries]
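Self-healing rests on majority voting: a new primary can only be elected when a strict majority of the voting members can reach each other. A small Python sketch of that arithmetic, assuming that members vote up to MongoDB's limit of seven voting members and ignoring arbiters, priorities, and hidden members:

```python
MAX_MEMBERS = 50  # replica set member limit
MAX_VOTING = 7    # at most 7 members can vote in elections


def majority(voting_members: int) -> int:
    """Smallest number of voting members that forms a strict majority."""
    return voting_members // 2 + 1


def describe(members: int) -> dict:
    """Election math, assuming up to 7 of the members are voting members."""
    if not 1 <= members <= MAX_MEMBERS:
        raise ValueError(f"a replica set has 1 to {MAX_MEMBERS} members")
    voting = min(members, MAX_VOTING)
    needed = majority(voting)
    return {
        "members": members,
        "voting": voting,
        "majority": needed,
        # voting members that can fail while a primary can still be elected
        "fault_tolerance": voting - needed,
    }


for n in (3, 4, 5, 50):
    print(describe(n))
```

An even member count adds no fault tolerance (4 members tolerate one failure, just like 3), which is why odd-sized replica sets are the usual recommendation.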
28. Put data where you need it: Workload Isolation
Enable different workloads on the same data
• Combine operational and analytical workloads on a single data platform
• Extract live insights from real-time data to enrich applications
• One set of nodes serving operational apps, replicating to dedicated nodes serving analytics: up to 50 nodes in a single replica set
• ETL-free
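Routing analytics to dedicated members is typically done with read preference tag sets: members carry tags, and the analytics client asks for secondaries whose tags match. The Python sketch below mimics only that matching step in memory; the hosts and the nodeType tag values are hypothetical, and real drivers also filter by member state, staleness, and latency before picking a server.

```python
# Hypothetical replica set: two electable members serve the operational app,
# one member is tagged for analytics.
MEMBERS = [
    {"host": "db0:27017", "state": "PRIMARY", "tags": {"nodeType": "ELECTABLE"}},
    {"host": "db1:27017", "state": "SECONDARY", "tags": {"nodeType": "ELECTABLE"}},
    {"host": "db2:27017", "state": "SECONDARY", "tags": {"nodeType": "ANALYTICS"}},
]


def matches(member_tags: dict, tag_set: dict) -> bool:
    """A member matches a tag set if it carries every key/value pair in it."""
    return all(member_tags.get(key) == value for key, value in tag_set.items())


def eligible_secondaries(members: list, tag_sets: list) -> list:
    """Try tag sets in order; return secondaries for the first set that matches any."""
    secondaries = [m for m in members if m["state"] == "SECONDARY"]
    for tag_set in tag_sets:
        chosen = [m for m in secondaries if matches(m["tags"], tag_set)]
        if chosen:
            return chosen
    return []


analytics = eligible_secondaries(MEMBERS, [{"nodeType": "ANALYTICS"}])
print([m["host"] for m in analytics])  # ['db2:27017']
```

The operational application keeps reading from the primary, so analytical queries on db2 do not compete with it. Appending an empty tag set {} as a final fallback lets the analytics client use any secondary when no member carries the tag.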
31. Put data where you need it: Scalability with Sharding
Auto-Sharding
● Automatically scale beyond the constraints of a single node
● Application transparent
● Scale and rebalance incrementally, in real time
● Unlike other NoSQL systems that spread data randomly across a cluster, MongoDB exposes multiple data distribution policies to optimize for query patterns and locality
[Diagram: Shard 1, Shard 2, Shard 3 … Shard N]
● Multiple sharding policies: hashed, ranged, zoned
● Increase or decrease capacity as you go
● Automatic balancing for elasticity
Horizontally Scalable
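The three policies differ in how a shard key value is mapped to a shard. A rough in-memory Python sketch follows; the chunk bounds, zones, and shard names are hypothetical, the hash is a stand-in (MD5 truncated to 64 bits) rather than necessarily the function MongoDB uses, and in a real cluster chunk ownership is tracked by the config servers and balancer rather than computed like this.

```python
import bisect
import hashlib

SHARDS = ["shard1", "shard2", "shard3"]

# Ranged: hypothetical chunk lower bounds on a numeric shard key. Each chunk
# covers [bound, next bound); the first starts at -infinity (like MinKey).
CHUNK_LOWER_BOUNDS = [float("-inf"), 1000, 5000]
CHUNK_OWNERS = ["shard1", "shard2", "shard3"]


def ranged_shard(key: int) -> str:
    # bisect_right - 1 is the index of the last lower bound <= key
    return CHUNK_OWNERS[bisect.bisect_right(CHUNK_LOWER_BOUNDS, key) - 1]


def hashed_shard(key) -> str:
    # Illustrative hash only: hash the key, then place the 64-bit value into
    # one of len(SHARDS) equal ranges of the hash space.
    digest = hashlib.md5(repr(key).encode()).digest()
    value = int.from_bytes(digest[:8], "big")
    return SHARDS[value * len(SHARDS) >> 64]


# Zoned: hypothetical key ranges on a compound key (region, customer_id),
# each pinned to the shard that serves that region.
ZONE_RANGES = [
    (("EU", float("-inf")), ("EU", float("inf")), "shard-eu"),
    (("US", float("-inf")), ("US", float("inf")), "shard-us"),
]


def zoned_shard(key: tuple):
    for lower, upper, shard in ZONE_RANGES:
        if lower <= key < upper:
            return shard
    return None  # outside every zone: may be placed on any shard


print(ranged_shard(42), ranged_shard(1000), ranged_shard(10**6))  # shard1 shard2 shard3
print(zoned_shard(("EU", 7)))  # shard-eu
```

Ranged keys keep neighboring values together (good for range queries, though monotonically increasing keys all land in the last chunk), hashed keys spread neighbors apart for even write distribution, and zones pin key ranges to particular shards, for example to keep EU data on EU hardware.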
33. Declarative Data Access Rules
Fine-grained data access controls
Authorize the user, not just the app
34. API Layer is about more than just serving data
[Diagram: consumers make API calls to an API layer with declarative access controls, which sits in front of an optimized data layer]
Optimized Data Layer: Patient Profile
{
  "patient_id": … ,
  "basicProfile": {…} ,
  "completeHistory": {…}
}
Declarative Access Controls
basicProfile (Read Rule)
{
  "%or": [
    {"$$pipeline.currentRole": "Nurse"},
    {"$$pipeline.currentRole": "Doctor"}
  ]
}
basicProfile (Write Rule)
{"$$pipeline.currentRole": "Doctor"}
completeHistory (Read/Write Rules)
{"$$pipeline.currentRole": "Doctor"}
Pipeline for Analysis
{"$$pipeline.currentRole": "Scientist"}
Consumers
Doctor: {Read/Write All}
Nurse: {Read Basic Data}
Scientist: {Aggregated Data}
● Fine-grained data access controls
● Base access on document, field, or value
● Authorize the user, not just the app
● Associate with user profile or any other info
● Defined with JSON rules, not code
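To show what "defined with JSON rules, not code" buys, here is a hypothetical Python mini-interpreter for rule documents shaped like the ones above. It supports only equality on $$pipeline.currentRole and %or, and it filters a patient document field by field; it is an illustration under those assumptions, not the engine that actually evaluates these rules.

```python
# Rule documents copied from the slide; the evaluator below is a hypothetical
# mini-interpreter, not the actual rules engine behind them.
ROLE = "$$pipeline.currentRole"
RULES = {
    "basicProfile": {
        "read": {"%or": [{ROLE: "Nurse"}, {ROLE: "Doctor"}]},
        "write": {ROLE: "Doctor"},
    },
    "completeHistory": {
        "read": {ROLE: "Doctor"},
        "write": {ROLE: "Doctor"},
    },
}


def evaluate(rule: dict, context: dict) -> bool:
    """Every key in a rule must hold; '%or' holds if any sub-rule holds."""
    for key, expected in rule.items():
        if key == "%or":
            if not any(evaluate(sub, context) for sub in expected):
                return False
        elif context.get(key) != expected:
            return False
    return True


def readable_fields(document: dict, role: str) -> dict:
    """Return only the protected fields whose read rule passes for this role."""
    context = {ROLE: role}
    return {
        field: document[field]
        for field, rules in RULES.items()
        if field in document and evaluate(rules["read"], context)
    }


def can_write(field: str, role: str) -> bool:
    return evaluate(RULES[field]["write"], {ROLE: role})


# Hypothetical patient record with the slide's two top-level sections.
patient = {"patient_id": 1, "basicProfile": {"name": "Jane"},
           "completeHistory": {"visits": 3}}

print(sorted(readable_fields(patient, "Nurse")))   # ['basicProfile']
print(sorted(readable_fields(patient, "Doctor")))  # ['basicProfile', 'completeHistory']
print(can_write("basicProfile", "Nurse"))          # False
```

The Scientist role passes neither read rule, so it gets an empty view and would reach the data only through the analysis pipeline. Changing who may see what means editing RULES, not the functions.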
35. “Big Data” is More than Just Hadoop
• 24% CAGR: Hadoop, Spark & Streaming
• 18% CAGR: Databases
• Databases are key components within the big data landscape