Managing and distributing reference data globally has always been a challenge for financial institutions. Maintaining database schemas while integrating and replicating that data across geographies is costly and time-consuming. MongoDB's native replication capabilities and partitioned architecture make it simple to distribute and synchronize data efficiently across the globe. MongoDB’s dynamic schema dramatically reduces database maintenance for schema migrations – data structure changes can be applied with no downtime and with no impact on existing applications. For example, by migrating its reference data management application to MongoDB, a Tier 1 bank dramatically reduced the license and hardware costs associated with the proprietary relational database it previously ran.
4. 4
Problem Space
• How do you globally distribute reference data?
– Polymorphic data
• Price / Products / Securities Master
• Counterparty information - KYC
• Corporate Actions
• Golden / single source of truth
– Often changing in structure
• e.g. new products
– Often high volume
• How is this typically solved today?
5. 5
Problem Space
• How do you make this available to client applications?
– Easy to access
– No stale data
• Distribute data through multiple technologies
• What happens when schema changes are required?
– Multiple downstream systems affected.
6. 6
Relational: All Data is Column/Row

IssID   IssuerName                  PVCurrency
117883  DWS Vietnam Fund            USD
69461   Independence III Cdo Ltd    USD
102862  Zamano Plc                  EUR
73277   Green Way                   BMD
65134   First European Growth Inc.  CHF

SecID   EventID  Company_Meeting  IssID
762288  407341   AGM              117883
81198   243459   SDCHG            69461
422999  410626   AGM              102862
422999  243440   SDCHG            102862
75128   20056    ISCHG            65134
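The two tables are linked by IssID, so reading an issuer together with its events means joining them. A minimal Python sketch of that join, folding the event rows into one nested record per issuer, might look like this. The function name `embed_actions` and the `actions` key are illustrative choices, not something the deck prescribes.

```python
from collections import defaultdict

# Rows copied from the two tables above.
issuers = [  # (IssID, IssuerName, PVCurrency)
    (117883, "DWS Vietnam Fund", "USD"),
    (69461, "Independence III Cdo Ltd", "USD"),
    (102862, "Zamano Plc", "EUR"),
    (73277, "Green Way", "BMD"),
    (65134, "First European Growth Inc.", "CHF"),
]
events = [  # (SecID, EventID, Company_Meeting, IssID)
    (762288, 407341, "AGM", 117883),
    (81198, 243459, "SDCHG", 69461),
    (422999, 410626, "AGM", 102862),
    (422999, 243440, "SDCHG", 102862),
    (75128, 20056, "ISCHG", 65134),
]


def embed_actions(issuers, events):
    """Join events to issuers on IssID and nest them under an 'actions' list."""
    by_issuer = defaultdict(list)
    for sec_id, event_id, meeting, iss_id in events:
        by_issuer[iss_id].append(
            {"Company_Meeting": meeting, "EventID": event_id, "SecID": sec_id}
        )
    return [
        {
            "IssID": iss_id,
            "IssuerName": name,
            "PVCurrency": currency,
            # Issuers with no events get an empty list.
            "actions": by_issuer.get(iss_id, []),
        }
        for iss_id, name, currency in issuers
    ]


documents = embed_actions(issuers, events)
```

After this step each issuer and its events can be read as a single record, with no join at query time.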
8. 8
Do More With Your Data
MongoDB
Rich Queries
• Find all company meeting AGMs that happened last week.
Text Search
• Find all actions where IssuerName includes “European”
Aggregations
• How many companies have PVCurrency as USD
{
  "IssID" : 65134,
  "IssuerName" : "First European Growth Inc.",
  "PVCurrency" : "USD",
  "actions" : [
    {
      "Company_Meeting" : "ISCHG",
      "EventID" : 20056,
      "SecID" : 75128
    },
    {
      "Company_Meeting" : "AGM",
      "EventID" : 2716296,
      "SecID" : 75128
    }
  ]
}
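The three questions on this slide can be sketched in plain Python over documents of this shape, which makes the intent of each query explicit without needing a running server. Two assumptions are worth flagging: `EventDate` is a hypothetical field added here because the slide's document carries no date, and the two extra issuers are borrowed from the relational slide. The MongoDB filters in the comments are rough equivalents, not a tested deployment.

```python
import re
from datetime import date

# First document follows the slide; "EventDate" is a hypothetical addition.
docs = [
    {"IssID": 65134, "IssuerName": "First European Growth Inc.", "PVCurrency": "USD",
     "actions": [
         {"Company_Meeting": "ISCHG", "EventID": 20056, "SecID": 75128,
          "EventDate": date(2014, 1, 10)},
         {"Company_Meeting": "AGM", "EventID": 2716296, "SecID": 75128,
          "EventDate": date(2014, 5, 20)},
     ]},
    {"IssID": 117883, "IssuerName": "DWS Vietnam Fund", "PVCurrency": "USD",
     "actions": [
         {"Company_Meeting": "AGM", "EventID": 407341, "SecID": 762288,
          "EventDate": date(2014, 2, 11)},
     ]},
    {"IssID": 102862, "IssuerName": "Zamano Plc", "PVCurrency": "EUR",
     "actions": [
         {"Company_Meeting": "AGM", "EventID": 410626, "SecID": 422999,
          "EventDate": date(2014, 5, 22)},
     ]},
]


def recent_agms(docs, since):
    """Rich query: AGM actions dated on or after `since`.
    Rough MongoDB filter: {"actions": {"$elemMatch":
        {"Company_Meeting": "AGM", "EventDate": {"$gte": since}}}}"""
    return [
        (d["IssuerName"], a["EventID"])
        for d in docs
        for a in d["actions"]
        if a["Company_Meeting"] == "AGM" and a["EventDate"] >= since
    ]


def issuer_name_contains(docs, word):
    """Text search: whole-word, case-insensitive match on IssuerName.
    Rough MongoDB filter: {"$text": {"$search": word}} (needs a text index)."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return [d["IssID"] for d in docs if pattern.search(d["IssuerName"])]


def count_by_currency(docs, currency):
    """Aggregation: number of issuers with the given PVCurrency.
    Rough MongoDB equivalent: count documents matching {"PVCurrency": currency}."""
    return sum(1 for d in docs if d["PVCurrency"] == currency)


last_week = recent_agms(docs, since=date(2014, 5, 19))
european = issuer_name_contains(docs, "European")
usd_count = count_by_currency(docs, "USD")
```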
10. 10
Current Implementations
• What do reference data solutions look like today?
• Storage
– Relational Database and/or Caching Technologies
– File
• Replication
– ETL or Messaging
• Complex, Costly and Brittle
– Maintenance
• schema changes / infrastructure
• Multiple technologies
11. 11
Why MongoDB?
• What features in MongoDB are ideally suited for globally replicated reference data systems?
1. Dynamic and flexible schema
12. 12
Document Model Benefits
• Agility and flexibility
– Data model supports business change
– Rapidly iterate to meet new requirements
• Intuitive, natural data representation
– Eliminates ORM layer
– Developers are more productive
• Reduces the need for joins, disk seeks
– Programming is simpler
– Performance delivered at scale
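As an illustration of the agility point, the sketch below keeps records of different shapes side by side and shows that a reader written against the shared fields keeps working when a new product type arrives. It uses plain Python dictionaries to stand in for stored documents; every field name and value is made up for the example.

```python
# Three reference-data records of different shapes, as they might sit side by
# side in one collection. All field names and values are illustrative only.
securities = [
    {"_id": 1, "type": "equity", "name": "Example Corp", "currency": "USD"},
    {"_id": 2, "type": "bond", "name": "Example 5% 2030", "currency": "EUR",
     "coupon": 5.0, "maturity": "2030-06-30"},
    # A product type introduced later, with fields no earlier record has.
    {"_id": 3, "type": "swap", "name": "Example IRS", "currency": "USD",
     "legs": [{"side": "pay", "rate": "fixed"},
              {"side": "receive", "rate": "floating"}]},
]


def summarize(record):
    """A reader written before 'swap' existed: it touches only shared fields."""
    return f'{record["name"]} ({record["currency"]})'


def coupon_of(record):
    """Newer code reads an optional field and tolerates its absence."""
    return record.get("coupon")


summaries = [summarize(r) for r in securities]
```

The older reader needs no change for the new record shape, which is the property the slide attributes to a flexible schema; validation of required fields would still be the application's responsibility in this sketch.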
14. 14
Why MongoDB?
• What features in MongoDB are ideally suited for globally replicated reference data systems?
1. Dynamic and flexible schema
2. Built in replication and high availability
15. 15
Replica Sets
• Replica Set – two or more copies
• Self-healing
• Addresses availability considerations:
– High Availability
– Disaster Recovery
– Maintenance
• Deployment Flexibility
– Data locality to users
– Workload isolation: operational & analytics
[Diagram: Application → Driver → Primary, with replication from the Primary to two Secondaries]
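For the data locality bullet, one common approach is to list several replica set members in the connection string and ask the driver to read from the closest one. The sketch below only builds such a string. `replicaSet` and `readPreference` are standard MongoDB connection string options; the host names and the set name `refdata` are hypothetical.

```python
from urllib.parse import urlencode


def replica_set_uri(hosts, replica_set, read_preference="nearest"):
    """Build a mongodb:// URI that lists several members of one replica set."""
    options = urlencode(
        {"replicaSet": replica_set, "readPreference": read_preference}
    )
    return f"mongodb://{','.join(hosts)}/?{options}"


# Hypothetical members, one per region.
uri = replica_set_uri(
    ["nyc.example.com:27017", "ldn.example.com:27017", "sgp.example.com:27017"],
    replica_set="refdata",
)
```

With `nearest`, reads may be served by a secondary, and secondaries replicate asynchronously, so an application that must never see slightly older data would likely keep the default of reading from the primary.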
18. 18
Why MongoDB?
• What features in MongoDB are ideally suited for globally replicated reference data systems?
1. Dynamic and flexible schema
2. Built in replication and high availability
3. Tag Aware Sharding (Geo)
19. 19
Automatic Sharding
• Three types of sharding: hash-based, range-based, tag-aware
• Increase or decrease capacity as you go
• Automatic balancing
23. 23
Case Study: Global investment bank
Distribute reference data globally in real-time for fast local accessing and querying
Problem
• Delays up to 20 hours in distributing data via ETL
• Charged multiple times globally for same data
• Incurring regulatory penalties from missing SLAs
• Had to manage 20 distributed systems with same data
Why MongoDB
• Dynamic schema: easy to load initially & over time
• Auto-replication: data distributed in real-time, read locally
• Both cache and database: cache always up-to-date
• Simple data modeling & analysis: easy changes and understanding
Results
• Will save considerable costs.
• Individual groups use internal data instead of paying vendors separately
• Data in sync globally, usually within seconds
• Moving towards one global shared data service
24. 24
Previous Reference Data Management Architecture
[Diagram: Feeds & Batch data (Pricing, Accounts, Securities Master, Corporate actions) → Source Master Data (RDBMS) → seven ETL links → Destination Data (RDBMS)]
Each represents
• People $
• Hardware $
• License $
• Reg penalty $
• & other downstream problems
25. 25
Solution with MongoDB
[Diagram: Feeds & Batch data (Pricing, Accounts, Securities Master, Corporate actions) → MongoDB Primary → seven Real-time links → MongoDB Secondaries]
Each represents
• No people $
• Less hardware $
• Less license $
• No penalty $
• & far fewer problems
26. 26
Summary
• Reference Data technology requirements:
– Database
– Cache
– Geographically replicated
– Rich Query & Search
– Flexible Schema
– Scalable
– Cost Effective
• MongoDB: Single Technology to meet all these needs
27. 27
For More Information

Resource              Location
MongoDB Downloads     mongodb.com/download
Free Online Training  education.mongodb.com
Webinars and Events   mongodb.com/events
White Papers          mongodb.com/white-papers
Case Studies          mongodb.com/customers
Presentations         mongodb.com/presentations
Documentation         docs.mongodb.org
Additional Info       info@mongodb.com
28. 28
MongoDB World – June 23-25, New York City
• Learn to Build & Manage Modern Apps in Two Days
• Largest Gathering of MongoDB World Experts Ever
• 80+ Sessions from Fundamentals to Advanced Ops. Use cases from all industries
• Connect with developers, administrators & execs building innovative applications
• Ecosystem Partners: IBM, AWS, Microsoft + More
• Meet the Experts – Includes Founder Dwight Merriman
• Code Webinar300 - $300 off Registration
• www.mongodbworld.com
Editor's Notes
117883, 69461, 102862, 73277, 65134
High Availability – Ensure application availability during many types of failures
Disaster Recovery – Address the RTO and RPO goals for business continuity
Maintenance – Perform upgrades and other maintenance operations with no application downtime
Secondaries can be used for a variety of applications – failover, hot backup, rolling upgrades, data locality and privacy and workload isolation
MongoDB provides horizontal scale-out for databases using a technique called sharding, which is transparent to applications. Sharding distributes data across multiple physical partitions called shards. Sharding allows MongoDB deployments to address the hardware limitations of a single server, such as bottlenecks in RAM or disk I/O, without adding complexity to the application.
MongoDB supports three types of sharding:
• Range-based Sharding. Documents are partitioned across shards according to the shard key value. Documents with shard key values “close” to one another are likely to be co-located on the same shard. This approach is well suited for applications that need to optimize range-based queries.
• Hash-based Sharding. Documents are uniformly distributed according to an MD5 hash of the shard key value. Documents with shard key values “close” to one another are unlikely to be co-located on the same shard. This approach guarantees a uniform distribution of writes across shards, but is less optimal for range-based queries.
• Tag-aware Sharding. Documents are partitioned according to a user-specified configuration that associates shard key ranges with shards. Users can optimize the physical location of documents for application requirements such as locating data in specific data centers.
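The three placement strategies can be compared with a toy Python sketch. It is a simplification under stated assumptions: the shard names, split points, and region tags are invented, and the modulo step in `hash_shard` stands in for MongoDB's actual scheme, which assigns ranges of hashed values to chunks.

```python
import bisect
import hashlib

SHARDS = ["shard0", "shard1", "shard2"]  # hypothetical shard names


def range_shard(key, boundaries=(100_000, 200_000)):
    """Range-based: contiguous key ranges map to shards, so nearby keys
    usually land together."""
    return SHARDS[bisect.bisect_right(boundaries, key)]


def hash_shard(key):
    """Hash-based: an MD5 digest of the key picks the shard, which scatters
    nearby keys. The modulo is a simplification for this sketch."""
    digest = hashlib.md5(str(key).encode()).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


# Tag-aware: user-chosen ranges of a (region, IssID) shard key are pinned to
# shards in specific data centers. Ranges are [lower, upper).
TAG_RANGES = [
    (("AMER", 0), ("AMER", float("inf")), "shard-nyc"),
    (("APAC", 0), ("APAC", float("inf")), "shard-sgp"),
    (("EMEA", 0), ("EMEA", float("inf")), "shard-ldn"),
]


def tag_shard(key):
    """Return the shard whose configured range contains the compound key."""
    for lower, upper, shard in TAG_RANGES:
        if lower <= key < upper:
            return shard
    raise KeyError(f"no tag range covers {key!r}")


# Two neighbouring IssIDs share a shard under range-based placement.
neighbours = (range_shard(65134), range_shard(69461))
# A European issuer is pinned to the London shard under tag-aware placement.
zamano_home = tag_shard(("EMEA", 102862))
```

The design trade-off mirrors the notes: range placement keeps range scans local, hashing spreads writes evenly, and tag ranges let an operator decide where data physically lives.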
MongoDB automatically balances the data in the cluster as the data grows or the size of the cluster increases or decreases.
Sharding is transparent to applications; whether there are one or one hundred shards, the application code for querying MongoDB is the same. Applications issue queries to a query router that dispatches the query to the appropriate shards.
For key-value queries that are based on the shard key, the query router will dispatch the query to the shard that manages the document with the requested key. When using range-based sharding, queries that specify ranges on the shard key are only dispatched to shards that contain documents with values within the range. For queries that don’t use the shard key, the query router will dispatch the query to all shards and aggregate and sort the results as appropriate. Multiple query routers can be used with a MongoDB system, and the appropriate number is determined based on performance and availability requirements of the application.
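The dispatch rules described above can be sketched with a toy router over range-partitioned shards. This is an illustration of the routing logic only, not MongoDB's query router; the `Router` class, its split points, and the choice of IssID as shard key are assumptions made for the example.

```python
import bisect


class Router:
    """Toy query router over range-partitioned shards keyed on IssID."""

    def __init__(self, boundaries):
        self.boundaries = boundaries  # sorted split points between shards
        self.shards = [[] for _ in range(len(boundaries) + 1)]

    def _shard_for(self, key):
        return bisect.bisect_right(self.boundaries, key)

    def insert(self, doc):
        self.shards[self._shard_for(doc["IssID"])].append(doc)

    def find_by_key(self, iss_id):
        """Key-value query on the shard key: exactly one shard is contacted."""
        i = self._shard_for(iss_id)
        return [i], [d for d in self.shards[i] if d["IssID"] == iss_id]

    def find_by_range(self, low, high):
        """Range on the shard key: only shards overlapping [low, high]."""
        targets = list(range(self._shard_for(low), self._shard_for(high) + 1))
        hits = [d for i in targets for d in self.shards[i]
                if low <= d["IssID"] <= high]
        return targets, sorted(hits, key=lambda d: d["IssID"])

    def find_by_field(self, field, value):
        """No shard key in the query: scatter to all shards, gather, sort."""
        targets = list(range(len(self.shards)))
        hits = [d for i in targets for d in self.shards[i]
                if d.get(field) == value]
        return targets, sorted(hits, key=lambda d: d["IssID"])


router = Router(boundaries=[70_000, 110_000])
for iss_id, name, ccy in [
    (117883, "DWS Vietnam Fund", "USD"),
    (69461, "Independence III Cdo Ltd", "USD"),
    (102862, "Zamano Plc", "EUR"),
    (73277, "Green Way", "BMD"),
    (65134, "First European Growth Inc.", "CHF"),
]:
    router.insert({"IssID": iss_id, "IssuerName": name, "PVCurrency": ccy})

key_targets, key_hits = router.find_by_key(102862)
range_targets, range_hits = router.find_by_range(60_000, 80_000)
all_targets, usd_hits = router.find_by_field("PVCurrency", "USD")
```

Each method returns the list of shards it had to contact alongside the results, which makes the cost difference between targeted and scatter-gather queries visible.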