We explain how we use Grakn as part of a wider solution to deliver next-generation Data Operations (Data Ops) tooling, enabling sophisticated "Run Graph Analytics".
The Run Graph is a component that passively tracks and traces our data assets as they move across the organisation. It is used to quickly reverse engineer our global flows of data, to better plan change, and to surface hidden dependencies. When operational failures do arise, we demonstrate how Grakn lets us quickly assess the inferred downstream impacts, and prioritise and communicate the impacts of outages to stakeholders.
5. “Many hands” legacy:
Loosely connected yet critical data pipelines are remarkably complex in enterprises when viewed as a whole system. They are hard to manage, operate and improve as a group. “Entanglement” is a major risk.
Loosely connected, use-case focussed, data pipelines
6. How Complex?
Typically across an Enterprise:
100s of Production OLTP databases
Multiple Orchestration/scheduling tools
10s of ETL tools / instances
Many Kafka/Confluent installations
Multiple Logging/monitoring frameworks
10-100 OLAP reporting solutions
1000s of Reports
1000s of Web pages and/or microservices
Several Clouds and Data Centres
Several Data Warehouses
10+ Data Science sandboxes
Multiple Data Lakes
7. Challenges:
Data management is difficult:
● Managing change effectively
● Managing quality of service
● Delivering service oversight
● Attributing clear issue ownership
● Resolving complex failures
● Delivering trust: “Ground Truth”
8. A side issue, also surfacing
Top-down enterprise data architecture (methods/governance) is deeply unpopular, especially with engineers. Why? It is ineffective.
10. How complex? (a)
Example: here’s a “logical” summary of data flows in one enterprise, between production systems. It shows hundreds of logical data pipelines, made up of:
- Batch ETL
- Messaging
- Streaming
Complex Pipeline Dependencies
11. How complex? (b)
Complexity also exists in the content, not just in the pipes. Here’s a conceptual model, a “canonical data model,” for most of a global firm: 410 core entities, 14 subjects.
Complex Content Dependencies
12. How complex? (c)
Even our OLAP reporting architectures are now pipeline oriented, and are “inside out” rather than the older “star schemas”:
- Fact pipelines and sinks
- Core dimension pipelines and sinks
- Peripheral dimensions: “side inputs,” lookups, dictionaries, tags
Complex Information Dependencies
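The “side input” pattern above can be sketched as a small enrichment step: a fact pipeline joined against a peripheral dimension held as an in-memory lookup. The record and dictionary names below are illustrative assumptions, not taken from the deck:

```python
# Hypothetical sketch: enriching a fact pipeline with a peripheral
# dimension supplied as a "side input" (an in-memory lookup dictionary).
# All names (currency_dim, the fact fields) are illustrative.

currency_dim = {"GBP": "Pound Sterling", "EUR": "Euro"}  # side input

facts = [
    {"trade_id": 1, "ccy": "GBP", "amount": 100.0},
    {"trade_id": 2, "ccy": "EUR", "amount": 250.0},
]

def enrich(fact, dim):
    """Join one fact against the dimension; tag misses for data quality."""
    out = dict(fact)
    out["ccy_name"] = dim.get(fact["ccy"], "UNKNOWN")
    return out

enriched = [enrich(f, currency_dim) for f in facts]
```

Tagging lookup misses as `UNKNOWN` rather than dropping the record is what lets a Run Graph later surface them as data-quality signals.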
13. Notice the shape of this metadata?
Notice the amount of existing engineering that must sit behind these views?
14. The weight of legacy
There is a huge volume of legacy data pipelines, and migrating them requires retesting everything; the approach is heterogeneous.
“Can you stabilise my operation, while moving net-new functionality to the cloud?”
- Many legacy ETL systems
- Many orchestration/scheduling instances
- Many datacentres, not just cloud
- Many monolithic applications, still
- Many legacy flows undocumented and misunderstood
- Many hidden pipelines, in DB stored procedures
- New functionality in the cloud
Legacy pipelines + new pipelines: the combined service.
25. RunGraph
● We can register ANY pipeline on our estate, run using any orchestration tool or ETL scheduler
● We can retrofit legacy pipelines into the Run Graph, even legacy ETL tools
● We can build up complex enterprise architecture views, and establish ground truth
● We can determine “normal” pipeline behaviours, identify strange behaviours, and raise flags
● We can use Grakn ML facilities to start doing predictive analytics on operations
Data Ops Enabled Pipelines
We build pipeline intelligence via Grakn: registration (tool agnostic) + instrumentation (tool agnostic).
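Tool-agnostic registration can be sketched as a minimal call that any orchestrator or legacy scheduler makes when a pipeline runs. The registry shape and the `register_pipeline` function below are illustrative assumptions, not Grakn's API; a real deployment would persist the record into the Grakn-backed Run Graph:

```python
# Minimal sketch of tool-agnostic pipeline registration.
# RUN_GRAPH is a stand-in for the Grakn-backed registry; the record
# fields are illustrative assumptions.

RUN_GRAPH = {}  # keyed by pipeline id

def register_pipeline(pipeline_id, tool, sources, sinks):
    """Record a pipeline run by any orchestrator or ETL scheduler."""
    RUN_GRAPH[pipeline_id] = {
        "tool": tool,          # e.g. a modern orchestrator or a legacy ETL tool
        "sources": sources,    # upstream data assets
        "sinks": sinks,        # downstream data assets
    }
    return RUN_GRAPH[pipeline_id]

# A legacy ETL job and a new cloud job are registered the same way:
register_pipeline("fx_rates_load", "legacy-etl", ["fx_feed"], ["fx_table"])
register_pipeline("fx_report", "cloud-orchestrator", ["fx_table"], ["fx_dashboard"])
```

Because both calls emit the same record shape, legacy and new pipelines land in one graph, which is what makes the combined-service views possible.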
26. RunGraph
● Studies all pipeline instrumentation
● Tracks data flows / lineage
● Creates data quality expectations
● Does impact analysis of failures (usual ancestors) and prioritisation
● Identifies key and critical data assets (i.e. core dimensions)
● Tracks data lineage vs data quality
● Maps complex consumers to sources, bringing commercial line of sight
● Does change impact analysis
27. Hybrid Data Ops Console
Once we can instrument across legacy and new cloud environments, we can construct a combined Ops Console.
Legacy pipelines + new pipelines: the combined service, with consumer service dashboards and an operations console.
31. RunGraph: Registration + Job
Core registration entities: Policy, Source, Feeds, Jobs, Data.
We can summarise the core registration needs here. Registering these makes them addressable, actionable, and enriches the pipeline analytics.
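One way to sketch those five registration entities is as plain record types with the relationships the slide implies (a Feed originates from a Source, a Job consumes Feeds and produces Data, a Policy governs registered things). The field names are illustrative assumptions, not the deck's schema:

```python
# Sketch of the core registration entities named on the slide:
# Policy, Source, Feed, Job, Data. Field names are illustrative.
from dataclasses import dataclass


@dataclass
class Source:
    name: str           # e.g. a production OLTP database


@dataclass
class Feed:
    name: str
    source: Source      # where the feed originates


@dataclass
class DataAsset:
    name: str


@dataclass
class Job:
    name: str
    consumes: list      # Feeds read by this job
    produces: list      # DataAssets written by this job


@dataclass
class Policy:
    name: str
    applies_to: list    # registered things this policy governs


# Registering makes each item addressable and actionable:
src = Source("trades_db")
feed = Feed("trades_feed", src)
asset = DataAsset("trades_curated")
job = Job("trades_load", consumes=[feed], produces=[asset])
policy = Policy("retest_on_change", applies_to=[job])
```

In a Grakn model these would naturally become entity and relation types rather than flat records; the point is only that each registered thing gets an addressable identity the analytics can hang off.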
32. RunGraph: Analytics
Even simple use cases drive out value quickly.
On failure or unplanned change:
- Find descendants: remediation based on impact and contagion
- Find ancestors: apply pressure / corrections upstream
On planned change:
- Run analytic queries to show typical connections over 6 months, to reverse engineer your architectures
- Identify key risks in planned change
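The "find descendants / find ancestors" queries above amount to graph traversal over pipeline dependencies. A minimal sketch using a plain adjacency dict (edges point downstream; the toy graph is illustrative, not real lineage data):

```python
# Sketch of descendant/ancestor traversal over a pipeline dependency
# graph. Edges point downstream; the example graph is a toy.
from collections import deque

downstream = {
    "source_db": ["etl_job"],
    "etl_job": ["warehouse"],
    "warehouse": ["report_a", "report_b"],
}

def descendants(graph, node):
    """Breadth-first walk: everything impacted if `node` fails."""
    seen, queue = set(), deque(graph.get(node, []))
    while queue:
        n = queue.popleft()
        if n not in seen:
            seen.add(n)
            queue.extend(graph.get(n, []))
    return seen

def ancestors(graph, node):
    """Invert the edges, then reuse the same walk to go upstream."""
    inverted = {}
    for src, dsts in graph.items():
        for dst in dsts:
            inverted.setdefault(dst, []).append(src)
    return descendants(inverted, node)
```

So `descendants(downstream, "etl_job")` yields the blast radius of an ETL failure, and `ancestors(downstream, "warehouse")` yields the upstream systems to press for corrections. In Grakn the same queries fall out of the relation model (with rule inference handling transitivity) rather than hand-rolled traversal.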
34. Try it at home
There are some great open-source projects to check out.
35. Get in touch
Dr. Daniel A. Smith
Emerging Technology
dan.smith@6point6.co.uk
About 6point6
Integrating digital technology into your business can result in
fundamental changes to how you operate and deliver value to your
customers. To go digital is to reinvent yourself to the core, opening
yourself and your clients to a world of possibilities.
6point6 is a technology consultancy. We bring a wealth of hands-on
experience to help financial service providers, media houses and
government achieve more with digital. Using cutting edge technology
and agile delivery methods, we help you reinvent, transform and
secure a brighter digital future.
Visit us at www.6point6.co.uk
Twitter: @6point6ltd
LinkedIn: linkedin.com/company/6point6