Slides from a recent webinar by Objectivity showing how the ThingSpan platform applies graph analytics to uncover patterns and insights within large, complex data sets, supporting efficient decision-making.
Transaction events are read from a Kafka topic, but they could just as easily have been read from other streaming technologies such as Spark Streaming, Flume, or StreamSets. Each event is formed into vertices and edges. The edges are further decomposed into triples to reduce lock contention and allow edges to be processed in parallel. This lowers the latency of each operation and increases throughput.
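The decomposition step can be sketched in plain Python. The field names (`src`, `dst`, `event_id`, `type`) and the three-triple shape are illustrative assumptions, not ThingSpan's actual event schema:

```python
# Sketch: turn one transaction event into vertex upserts plus an edge
# decomposed into triples. One logical edge becomes three independent
# triples, so each can be written under its own lock instead of
# locking the whole edge. Field names are assumptions.

def event_to_graph_ops(event):
    """Return (vertices, triples) for a single transaction event."""
    src, dst = event["src"], event["dst"]
    vertices = [{"id": src}, {"id": dst}]
    edge_id = f'{src}->{dst}:{event["event_id"]}'
    triples = [
        (src, "out_edge", edge_id),      # source vertex -> edge
        (edge_id, "label", event["type"]),  # edge attributes
        (edge_id, "in_vertex", dst),     # edge -> target vertex
    ]
    return vertices, triples

vertices, triples = event_to_graph_ops(
    {"src": "acct1", "dst": "acct2", "event_id": 42, "type": "payment"}
)
print(len(vertices), len(triples))  # 2 3
```

Because the three triples share no lock, writers touching different parts of the same edge no longer contend with each other.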
The upsert of vertices and the insert of edges (decomposed to triples) are funneled into Samza tasks running on the cluster and managed by YARN. These upserts are consistent and idempotent.
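Idempotence means that replaying the same event, for example after a Samza task restart, leaves the store unchanged. A minimal in-memory sketch of the property, with a dict standing in for the ThingSpan store:

```python
# Sketch of an idempotent vertex upsert: applying the same operation
# twice yields the same state as applying it once. The dict stands in
# for the ThingSpan store; this is illustrative only.

def upsert_vertex(store, vertex_id, attrs):
    existing = store.get(vertex_id, {})
    existing.update(attrs)  # create if absent, merge if present
    store[vertex_id] = existing
    return store

store = {}
upsert_vertex(store, "acct1", {"balance": 100})
state_after_once = dict(store["acct1"])
upsert_vertex(store, "acct1", {"balance": 100})  # replayed event
assert store["acct1"] == state_after_once  # replay changed nothing
```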
ThingSpan runs queries in parallel. Each query is split into partitions, and each partition is sent to a machine where it is executed as a YARN job. Each partition returns multiple paths, and these are collated into a single result. In Spark terms, this is equivalent to transforming each input partition with mapPartitions into an RDD or DataFrame.
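The partition-then-collate pattern can be sketched without a cluster. Here a thread pool stands in for one YARN task per machine, and each partition holds a slice of the edges; the edge data is made up for illustration:

```python
# Sketch: run a query against each partition independently, then
# collate the per-partition path lists into one result. Thread pool
# stands in for per-machine YARN tasks.
from concurrent.futures import ThreadPoolExecutor

def run_partition(edges, start):
    """Return all one-hop paths from `start` within this partition."""
    return [(start, dst) for (src, dst) in edges if src == start]

partitions = [
    [("a", "b"), ("c", "d")],   # edges held by machine 1
    [("a", "e"), ("b", "f")],   # edges held by machine 2
]
with ThreadPoolExecutor() as pool:
    parts = pool.map(run_partition, partitions, ["a"] * len(partitions))

result = [path for part in parts for path in part]  # collate
print(sorted(result))  # [('a', 'b'), ('a', 'e')]
```

In Spark the same shape would be `rdd.mapPartitions(...)` followed by a collect, with the collation done by the driver.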
Using Spark DataFrames allows results from ThingSpan to be processed further. Spark SQL statements can join, aggregate, and select across multiple tables, and DataFrame operations are processed in parallel across the cluster.
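As an illustration of that post-processing, the sketch below uses plain Python standing in for Spark SQL; with PySpark the same shape would be a join followed by a grouped aggregation. The column names and data are invented:

```python
# Sketch: join query results (paths ending at an account) with a
# reference table, then aggregate -- the same shape as a SQL
# JOIN + GROUP BY over two DataFrames. Names are illustrative.
paths = [
    {"account": "acct1", "amount": 100},
    {"account": "acct2", "amount": 250},
    {"account": "acct1", "amount": 50},
]
accounts = {"acct1": "EMEA", "acct2": "APAC"}  # reference "table"

totals = {}
for row in paths:
    region = accounts[row["account"]]                       # the join
    totals[region] = totals.get(region, 0) + row["amount"]  # the aggregate

print(totals)  # {'EMEA': 150, 'APAC': 250}
```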
The parallelism of queries allows near linear scaling of query throughput by “scaling out” the cluster.
Graph size for a billion FIX transaction events
2. For a client basket, show all tasks that processed it and their timing.
Start point: Basket: m_Id
End point: Service
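The basket-to-service question above is a path query from a start vertex to an end vertex. A minimal breadth-first-search sketch over an invented processing graph (the vertex names are assumptions, and real data would come from ThingSpan):

```python
# Sketch: find the chain of tasks between a Basket vertex and a
# Service vertex with breadth-first search. Graph is illustrative.
from collections import deque

graph = {
    "Basket:42": ["TaskA"],
    "TaskA": ["TaskB"],
    "TaskB": ["Service:pricing"],
}

def find_path(graph, start, end):
    """Return the first path from start to end, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == end:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(find_path(graph, "Basket:42", "Service:pricing"))
# ['Basket:42', 'TaskA', 'TaskB', 'Service:pricing']
```

The intermediate vertices on the returned path are exactly the tasks that processed the basket; attaching timestamps to those vertices would answer the timing half of the question.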
DevOps
- Discovery of hotspots
- Resource usage
- Metrics