Organize & manage master meta data centrally, built upon kong, cassandra, neo4j & elasticsearch. Managing master & meta data is a very common problem with no good opensource alternative as far as I know, so initiating this project – MasterMetaData.
1. Organize & manage master
meta data centrally, built
upon kong, cassandra, neo4j
& elasticsearch.
2. Hello!
I am Akhil Agrawal
Managing master & meta data is
a very common problem with
no good opensource alternative
as far as I know, so initiating this
project – MasterMetaData
Started BIZense in 2008 &
Digikrit in 2015
4. Less Frequently Changing
Master data and meta data both have one common
behavior of less frequent changes although their
purpose is different.
The less frequently changing data whether it is data
about real world entities (master data) or data
about other data (meta data), both can be stored,
accessed and managed in very similar ways.
Why MasterMetaData ?
5. No Open Source Option
There are MDM solutions (mostly from ERP
vendors like SAP, Oracle etc. & analytics
companies like Informatica, SAS) but the
master meta data intersection is being
explored only recently.
There is no open source alternatives for smaller
companies or something that can be
embedded with SAAS products.
Why MasterMetaData ?
7. Definition of Data Categories
Meta Data
meta information
about other forms of
data (can describe
master, transaction
or lower level meta
data)
Master Data
real world entities
like customer,
partner etc. (only the
stable attributes are
considered part of
master data)
Transaction Data
real world
interactions which
have very short
lifespan and
occurrence is linked
with time/space
(unstable/changing
attribute values,
although
definition/description
is stable but each new
data point is unique)
Master Meta Data
combination of master and meta data
defined at application, enterprise or global
level (although the volume and variety
of master & meta data is very different, they
have lot of common access patterns)
10. Background
◎ Faced difficulty with managing master
and meta data in previous projects
◎ Implemented custom solution while
building mobile ad platform
◎ Currently implementing same features
required for the communication platform
◎ Have worked with elasticsearch + kibana
while kong + cassandra seems useful
11. Build With Following Technologies
neo4j
highly scalable native graph
database that leverages data
relationships as first-class entities,
handles evolving data challenges
elasticsearch
search and analyze data in real
time, defacto standard for making
data accessible through search
and aggregations
cassandra
right choice when you need linear
scalability and high availability
without compromising
performance & durability
kong
the open-source management
layer for APIs and microservices,
delivering security, high
performance and reliability
lua
lua is a powerful, fast, lightweight,
embeddable scripting language.
For writing kong plugins for access
to various meta master data
kibana
explore and visualize data in
elasticsearch, opensource project
from elasticsearch team, intuitive
interface, visualization & dashboards
13. Challenges
Complex & hierarchical
data sets
Real-time query
performance
Dynamic structure
Evolving relationships
Why neo4j for mastermetadata ?
Why neo4j ?
Native graph store
Flexible schema
Performance and
scalability
High availability
Referenced from
http://neo4j.com/use-cases/master-data-management
14. Why elasticsearch for mastermetadata ?
Scale
◎ Real-Time Data
◎ Massively
Distributed
◎ High Availability
◎ Multitenancy
◎ Per-Operation
Persistence
Search
◎ Full-Text Search
◎ Document-
Oriented
◎ Schema-Free
◎ Developer-
Friendly, RESTful
API
◎ Build on top of
Apache Lucene™
Analytics
◎ Real-Time Advanced
Analytics
◎ Very flexible Query
DSL
◎ Flexible analytics &
visualization
platform - Kibana
◎ Real-time summary
and charting of
streaming data
Referenced from https://www.elastic.co/products/elasticsearch
15. Why kong for mastermetadata ?
Secure, Manage &
Extend your APIs and
Microservices
RESTful Interface
Plugin Oriented
Platform Agnostic
Referenced from
https://getkong.org/
Without Kong With Kong
17. Master & Metadata Management Interesection
Maximized Metadata
Model
◎data model describing the metadata
needs to be “maximized” to cover as
many use cases possible
◎meta data model needs to be inclusive
of all metadata in the organization as
well as cover the master data
◎governance of metadata model
requires the ability to describe
maximum metadata in the system to
provide ability to govern data
describing other data
Minimalistic Master
Data Model
◎master data model describing master
data needs to be “minimalist”
◎master data model is neither inclusive
of all data in the organization, nor
specific to applications using it for
specific purpose
◎central governance of master data
requires that data model backing it is
minimalistic to be able to govern
without application specific details
◎master data model is basically
metadata describing the master data
Referenced from http://blogs.gartner.com/andrew_white/2011/04/26/more-
on-metadata-and-master-data-management-intersection/
18. From Big Data To Smart Data
Zero Latency Organization
data
◎latency linked to the data
(capturing)
◎latency linked to analytical
processes (processing)
structural
◎latency linked to decision
making processes
◎time needed to implement
actions linked with decisions
action
◎data latency added with
structural latency
◎time needed from capturing of
data till the action takes place
value
data is considered smart based on
the value it brings in decision
making and action taking (than
anything else like size, source, etc)
master
data which represents real world
entities and also remains stable
over time is the smart data as it
helps with common data reference
meta
data which describes other data
whether master, transactional or
lower level meta data is also smart
data as it helps in understanding
Types Of Latency
Smart Data
21. Areas where you can get involved ?
DEMO
Functional Tests,
Integration Tests,
Run Demo
CODE
Implement Ideas,
Fix Bugs,
Enhance Features
DOCUMENT
User
Documentation,
Developer
Documentation
22. Current Focus
Devices
Storage: Device,
Browser, OS
Access: User
Agent
Locations
Storage: Country,
State, City
Access: IP Address
Tours
Storage: People,
Interest, Culture,
Destination, City,
Activity, Duration
Access: What, Where,
For
23. Storage & Access
Master Data Storage
Storage which is highly efficient
for read but at the same time
efficient for writes. Additional
requirement to be able to search
the stored data as well as flexible
efficient query interface to
enable faster access
Meta Data Storage
Storage which is highly flexible
in defining relationships like
inheritance, composition or
other relationships. Graph
modeled relationships are most
flexible to change as and when
the model evolves
Diagram featured by poweredtemplate.com
Meta Data Access
CRUD, Fill in the blanks,
Semantic Query, Search
Master Data Access
CRUD, Query (Structured /
Unstructured) & Search
25. Thanks!
Any questions?
You can find me at:
@digikrit
akhil@digikrit.com
Special thanks to all the people who made and released these awesome
resources for free:
Presentation template by SlidesCarnival
Presentation models by SlideModel & PoweredTemplate
To companies behind kong, cassandra, neo4j & elasticsearch