Webinar: How MongoDB is Used to Manage Reference Data - May 2014

How MongoDB is Used to Manage Reference Data
Daniel Roberts
@dmroberts
#MongoDB

2
Agenda
• Problem space
• Existing technology solutions
• Why MongoDB?
• Case study

4
Problem Space
• How do you globally distribute reference data?
– Polymorphic data
• Price / Products / Securities Master
• Counterparty information (KYC)
• Corporate Actions
• Golden / single source of truth
– Often changing in structure
• e.g. new products
– Often high volume
• How is this typically solved today?

5
Problem Space
• How do you make this available to client applications?
– Easy to access
– No stale data
• Data is distributed through multiple technologies
• What happens when schema changes are required?
– Multiple downstream systems are affected

6
Relational: All Data is Column/Row
IssID    IssuerName                   PVCurrency
117883   DWS Vietnam Fund             USD
69461    Independence III Cdo Ltd     USD
102862   Zamano Plc                   EUR
73277    Green Way                    BMD
65134    First European Growth Inc.   CHF

SecID    EventID   Company_Meeting   IssID
762288   407341    AGM               117883
81198    243459    SDCHG             69461
422999   410626    AGM               102862
422999   243440    SDCHG             102862
75128    20056     ISCHG             65134

7
MongoDB stores data as JSON
Relational MongoDB
{
  "IssID" : 65134,
  "IssuerName" : "First European Growth Inc.",
  "PVCurrency" : "USD",
  "actions" : [
    {
      "Company_Meeting" : "ISCHG",
      "EventID" : 20056,
      "SecID" : 75128
    },
    {
      "Company_Meeting" : "AGM",
      "EventID" : 2716296,
      "SecID" : 75128
    }
  ]
}
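
As a rough illustration of how parent and child rows collapse into one document, the sketch below folds event rows into an "actions" array on each issuer. The sample rows are copied from the relational slide; the helper name embed_actions is hypothetical and this is not presented as the speaker's loading code.

```python
# Rows copied from the relational slide (a subset, for brevity).
issuers = [
    {"IssID": 102862, "IssuerName": "Zamano Plc", "PVCurrency": "EUR"},
    {"IssID": 65134, "IssuerName": "First European Growth Inc.", "PVCurrency": "CHF"},
]
events = [
    {"SecID": 422999, "EventID": 410626, "Company_Meeting": "AGM", "IssID": 102862},
    {"SecID": 422999, "EventID": 243440, "Company_Meeting": "SDCHG", "IssID": 102862},
    {"SecID": 75128, "EventID": 20056, "Company_Meeting": "ISCHG", "IssID": 65134},
]


def embed_actions(issuers, events):
    """Fold child rows into an "actions" array on each issuer document (hypothetical helper)."""
    docs = {row["IssID"]: {**row, "actions": []} for row in issuers}
    for event in events:
        # The foreign key is redundant once the row is embedded in its parent.
        action = {k: v for k, v in event.items() if k != "IssID"}
        docs[event["IssID"]]["actions"].append(action)
    return list(docs.values())


documents = embed_actions(issuers, events)
```

Each resulting document can be read with a single lookup by IssID, with no join against a second table.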

8
Do More With Your Data
MongoDB
Rich Queries
• Find all company meeting AGMs that happened last week
Text Search
• Find all actions where IssuerName includes "European"
Aggregations
• How many companies have PVCurrency as USD?
{
  "IssID" : 65134,
  "IssuerName" : "First European Growth Inc.",
  "PVCurrency" : "USD",
  "actions" : [
    {
      "Company_Meeting" : "ISCHG",
      "EventID" : 20056,
      "SecID" : 75128
    },
    {
      "Company_Meeting" : "AGM",
      "EventID" : 2716296,
      "SecID" : 75128
    }
  ]
}
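
The three questions on this slide could be expressed roughly as the query documents below, written as plain Python dictionaries in the shape MongoDB drivers accept. This is a sketch under assumptions: the sample document carries no date, so EventDate is an invented field; "last week" is taken to mean the previous seven days; and the text search presumes a text index on IssuerName.

```python
from datetime import datetime, timedelta


def agms_last_week(now):
    """Filter for documents holding an AGM action dated in the seven days before `now`."""
    week_ago = now - timedelta(days=7)
    return {
        "actions": {
            # $elemMatch requires both conditions to hold on the same array element.
            "$elemMatch": {
                "Company_Meeting": "AGM",
                "EventDate": {"$gte": week_ago, "$lt": now},  # EventDate is an assumed field
            }
        }
    }


# Text search filter; assumes a text index has been created on IssuerName.
issuer_text_search = {"$text": {"$search": "European"}}

# Aggregation pipeline that counts documents whose PVCurrency is USD.
usd_count_pipeline = [
    {"$match": {"PVCurrency": "USD"}},
    {"$group": {"_id": "$PVCurrency", "count": {"$sum": 1}}},
]
```

With a driver such as PyMongo, the first two dictionaries would be passed to a collection's find() and the pipeline to aggregate().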

10
Current Implementations
• What do reference data solutions look like today?
• Storage
– Relational database and/or caching technologies
– File
• Replication
– ETL or messaging
• Complex, costly and brittle
– Maintenance
• Schema changes / infrastructure
• Multiple technologies

11
Why MongoDB?
• What features in MongoDB are ideally suited for globally replicated reference data systems?
1. Dynamic and flexible schema

12
Document Model Benefits
• Agility and flexibility
– Data model supports business change
– Rapidly iterate to meet new requirements
• Intuitive, natural data representation
– Eliminates ORM layer
– Developers are more productive
• Reduces the need for joins, disk seeks
– Programming is simpler
– Performance delivered at scale
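
To make the flexibility point concrete, here is a small sketch in plain Python (no database involved) of documents with different shapes living side by side. The first two entries reuse values from the relational slide; the third entry, its field names, and the helper issuer_names are invented for illustration.

```python
# Two documents using only the fields shown on the relational slide.
securities = [
    {"IssID": 117883, "IssuerName": "DWS Vietnam Fund", "PVCurrency": "USD"},
    {"IssID": 102862, "IssuerName": "Zamano Plc", "PVCurrency": "EUR"},
]

# A new product type arrives with fields no earlier document has.
# Everything in this entry is hypothetical.
securities.append({
    "IssID": 999999,
    "IssuerName": "Example Structured Product Ltd",
    "tranches": [{"name": "A", "rating": "AAA"}],
})


def issuer_names(docs, currency):
    """Return issuer names for one currency; documents lacking the field are skipped."""
    return [d["IssuerName"] for d in docs if d.get("PVCurrency") == currency]


usd_issuers = issuer_names(securities, "USD")
```

Existing readers keep working because they only touch the fields they know about; no migration of the older documents is needed when the new shape appears.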

13
Developers are more productive

14
Why MongoDB?
2. Built-in replication and high availability

15
Replica Sets
• Replica Set – two or more copies
• Self-healing
• Addresses availability considerations:
– High Availability
– Disaster Recovery
– Maintenance
• Deployment Flexibility
– Data locality to users
– Workload isolation: operational & analytics
[Diagram: Application → Driver → Primary, with replication to two Secondaries]
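
One way an application might take advantage of data locality is through the connection string it hands to the driver. The sketch below assembles such a string using the standard replicaSet and readPreference URI options; the host names, the replica set name "refdata", and the helper replica_set_uri are assumptions made for illustration.

```python
from urllib.parse import urlencode


def replica_set_uri(hosts, replica_set, read_preference="nearest"):
    """Build a MongoDB connection string for a replica set (hypothetical helper)."""
    options = urlencode({"replicaSet": replica_set, "readPreference": read_preference})
    return "mongodb://" + ",".join(hosts) + "/?" + options


# Hypothetical members, one per region.
uri = replica_set_uri(
    ["nyc.example.com:27017", "lon.example.com:27017", "syd.example.com:27017"],
    replica_set="refdata",
)
```

With readPreference set to nearest, reads go to the lowest-latency member, which may be a secondary, while writes still go to the primary.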

16
Global Replication
• Single data vendor interface
• Avoid complicated and costly internal data distribution infrastructure
[Diagram: Bloomberg, IDC and Reuters feeds arriving through an integration layer]

17
Add many nodes
[Diagram: one Primary replicating in real time to seven Secondaries]

18
Why MongoDB?
2. Built-in replication and high availability
3. Tag-aware sharding (geo)

19
Automatic Sharding
• Three types of sharding: hash-based, range-based, tag-aware
• Increase or decrease capacity as you go
• Automatic balancing

20
Query Routing
• Multiple query optimization models
• Each sharding option appropriate for different apps

21
Read Global/Write Local
[Diagram: three primaries (NYC, LON, SYD) and six secondaries (two each in NYC, LON and SYD)]
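
The routing idea behind write-local can be modelled in a few lines of plain Python. This is a simulation rather than MongoDB code: it assumes a shard key that begins with a region code, and the shard names and the helper shard_for are hypothetical. In a real cluster the equivalent mapping is declared with the shell helpers sh.addShardTag and sh.addTagRange.

```python
# Hypothetical shards, each tagged with the region where its primary runs.
SHARD_TAGS = {"shard-nyc": "NYC", "shard-lon": "LON", "shard-syd": "SYD"}

# Tag ranges over an assumed (region, IssID) shard key:
# (inclusive lower bound, exclusive upper bound, tag).
TAG_RANGES = [
    (("LON", float("-inf")), ("LON", float("inf")), "LON"),
    (("NYC", float("-inf")), ("NYC", float("inf")), "NYC"),
    (("SYD", float("-inf")), ("SYD", float("inf")), "SYD"),
]


def shard_for(region, iss_id):
    """Return the shard whose tag range contains the shard key (region, iss_id)."""
    key = (region, iss_id)
    for low, high, tag in TAG_RANGES:
        if low <= key < high:
            return next(name for name, t in SHARD_TAGS.items() if t == tag)
    raise KeyError("no tag range covers " + repr(key))


london_write = shard_for("LON", 102862)
```

A document keyed to LON is written to the shard whose primary is in London, while the secondaries of that shard in the other cities serve local reads.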

23
Case Study: Global investment bank
Distribute reference data globally in real-time for fast local accessing and querying
Problem
• Delays up to 20 hours in distributing data via ETL
• Charged multiple times globally for same data
• Incurring regulatory penalties from missing SLAs
• Had to manage 20 distributed systems with same data
Why MongoDB
• Dynamic schema: easy to load initially & over time
• Auto-replication: data distributed in real-time, read locally
• Both cache and database: cache always up-to-date
• Simple data modeling & analysis: easy changes and understanding
Results
• Will save considerable costs
• Individual groups use internal data instead of paying vendors separately
• Data in sync globally, usually within seconds
• Moving towards one global shared data service

24
Previous Reference Data Management Architecture
Feeds & Batch data
• Pricing
• Accounts
• Securities Master
• Corporate actions
[Diagram: Source Master Data (RDBMS) feeding Destination Data (RDBMS) systems over seven ETL links]
Each represents
• People $
• Hardware $
• License $
• Reg penalty $
• & other downstream problems

25
Solution with MongoDB
Feeds & Batch data
• Pricing
• Accounts
• Securities Master
• Corporate actions
[Diagram: MongoDB Primary replicating to MongoDB Secondaries over seven real-time links]
Each represents
• No people $
• Less hardware $
• Less license $
• No penalty $
• & many fewer problems

26
Summary
• Reference data technology requirements:
– Database
– Cache
– Geographically replicated
– Rich query & search
– Flexible schema
– Scalable
– Cost effective
MongoDB: single technology to meet all these needs

27
For More Information
Resource               Location
MongoDB Downloads      mongodb.com/download
Free Online Training   education.mongodb.com
Webinars and Events    mongodb.com/events
White Papers           mongodb.com/white-papers
Case Studies           mongodb.com/customers
Presentations          mongodb.com/presentations
Documentation          docs.mongodb.org
Additional Info        info@mongodb.com

28
MongoDB World – June 23-25, New York City
• Learn to Build & Manage Modern Apps in Two Days
• Largest Gathering of MongoDB World Experts Ever
• 80+ Sessions from Fundamentals to Advanced Opps. Use cases from all industries
• Connect with developers, administrators & execs building innovative applications
• Ecosystem Partners: IBM, AWS, Microsoft + More
• Meet the Experts – Includes Founder Dwight Merriman
• Code Webinar300 - $300 off Registration
• www.mongodbworld.com
