Financial crime prevention is something that affects everyone in one way or another. From the Deutsche Banks of the world to small and medium online merchants, regulations for anti-money laundering, know your customer, and customer due diligence apply.
Failing to comply with such regulations can bring on substantial fines. Even more importantly, it can hurt the bottom line and reputation of businesses, having far-reaching side effects. Complying with such regulations, and actively cracking down on financial crime, however, is not easy.
Cross-referencing interconnected data across various datasets, applying detection rules, and discovering patterns in the data is complicated. It takes expertise, effort, and the right technology to do this efficiently.
A natural and efficient way of looking for patterns and applying rules in troves of interconnected data is to model and view that data as a graph. By modeling data as a graph and applying graph algorithms such as PageRank or centrality measures, traversing paths, discovering connections, and extracting insights become possible.
Graphs and graph databases are the fastest-growing area of data management technology, not least because they are a perfect match for use cases involving interconnected data.
Queries that would be very complicated to express and very slow to execute using relational databases or other NoSQL technologies are feasible using graph databases. With the rising complexity of modern financial markets, detecting financial crime requires going 4 to 11 levels deep into the account-payment graph, and this calls for a different solution than either relational or NoSQL databases.
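As a rough illustration of what "going N levels deep" means, here is a depth-limited breadth-first search over a payment graph in plain Python (account names and the adjacency-dict representation are hypothetical; a graph database does this natively, at scale):

```python
from collections import deque

def accounts_within_hops(payments, start, max_hops):
    """Return every account reachable from `start` in at most
    `max_hops` transfers, mapped to its hop distance.
    `payments` maps an account to the accounts it has paid."""
    seen = {start: 0}
    frontier = deque([start])
    while frontier:
        acct = frontier.popleft()
        if seen[acct] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in payments.get(acct, ()):
            if nxt not in seen:
                seen[nxt] = seen[acct] + 1
                frontier.append(nxt)
    return seen

# A toy 5-hop chain of payments: a -> b -> c -> d -> e -> f
chain = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"], "e": ["f"]}
```

On a relational system each extra hop is another self-join; here it is just one more frontier expansion.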
How are organizations such as Alibaba, OpenCorporates, and Visa using graph database technology to not just stay on top of regulation, but be one step ahead in the race against financial crime?
Is it possible to do this in real time?
What do graph query languages have to do with this?
10. Using Subgraph or Relationship Discovery Combined with Graph Computation to Find Diamonds of Money Laundering
Financial institutions collaborate to build transactional + knowledge graphs to identify money laundering rings and layering.
• Layering: split "dirty money" into smaller amounts, transfer it from account to account, and eventually merge it.
• Money laundering ring: money is transferred in a circle.
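A ring is simply a cycle in the transfer graph. As a minimal sketch (plain Python, hypothetical account names; production systems express this as a graph query over billions of edges), a depth-first search can enumerate transfer paths that return money to the originating account:

```python
def find_rings(transfers, start, max_len=8):
    """Find simple cycles through `start` in a transfer graph:
    chains of transfers that return money to the originating account.
    `transfers` maps an account to the accounts it has paid."""
    rings = []

    def dfs(node, path):
        if len(path) > max_len:
            return  # bound the search depth
        for nxt in transfers.get(node, ()):
            if nxt == start and len(path) > 1:
                rings.append(path + [start])   # closed the circle
            elif nxt not in path:              # keep the cycle simple
                dfs(nxt, path + [nxt])

    dfs(start, [start])
    return rings

# Toy graph: a -> b -> c -> a forms a ring; c -> d is a dead end.
g = {"a": ["b"], "b": ["c"], "c": ["a", "d"]}
```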
23. The Age of the Graph Is Upon Us (Again)
• Early-mid-90s: semi- or unstructured data research was all the rage
• data logically viewed as graph, initially motivated by modeling WWW (page=vertex, link=edge)
• query languages expressing constrained reachability in graph
• Late 90s-late 2000s: special case XML (graph restricted to tree shape)
• Mature: W3C standard ecosystem for modeling and querying (XQuery, XPath, XLink, XSLT, XML Schema, …)
• Since mid 2000s: JSON and friends (also graphs restricted to tree shape)
• MongoDB, Couchbase, SPARQL, GraphQL, AsterixDB, …
• ~2010 to present: back to unrestricted graphs
• Initially motivated by analytic tasks in social networks
• Now universal use (most interesting data is linked, after all)
24. The Graph Data Model
• Nodes model real-world entities
• Edges are binary, they model relationships
• may be directed or undirected (asymmetric, resp. symmetric relationships)
• Nodes and edges may carry labels
• Nodes and edges annotated with data
• both have sets of attributes (key-value pairs)
25. Example Graph
Vertex types:
• Product (name, category, price)
• Customer (ssn, name, address)
Edge types:
• Bought (discount, quantity)
• Customer c bought 100 units of product p at a 5% discount, modeled by the edge:
c --(Bought {discount=5%, quantity=100})--> p
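The example above can be held in a minimal property-graph structure. This is a plain-Python sketch of the data model (not any particular database's API); node ids and attribute values are made up:

```python
class PropertyGraph:
    """Minimal property graph: labeled nodes and edges, each carrying a
    dict of key-value attributes, mirroring the vertex/edge types above."""

    def __init__(self):
        self.nodes = {}   # node id -> (label, attributes)
        self.edges = []   # (source id, label, attributes, target id)

    def add_node(self, node_id, label, **attrs):
        self.nodes[node_id] = (label, attrs)

    def add_edge(self, src, label, dst, **attrs):
        self.edges.append((src, label, attrs, dst))

g = PropertyGraph()
g.add_node("c1", "Customer", ssn="123-45-6789", name="Ann", address="1 Main St")
g.add_node("p1", "Product", name="Blocks", category="Toys", price=9.99)
# Customer c1 bought 100 units of product p1 at a 5% discount:
g.add_edge("c1", "Bought", "p1", discount=0.05, quantity=100)
```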
26. Key Language Ingredients from the Past
• Pioneered by academic work on relational query extensions for graphs (since ‘87)
• Path expressions (PEs) for navigation
• Variables for manipulating data found during navigation
• Stitching multiple PEs into complex navigation patterns: conjunctive path queries
• Constructors for new nodes and edges
28. Current Representative Graph QLs
in Order of Appearance
• SPARQL
• mature, W3C standard recommendation, but not aimed at analytics of arbitrary graphs: RDF, ontologies, semantic web
• Cypher (Neo4j)
• essentially 1990s’ StruQL with bells and whistles; inherits CRPQ syntactic style
• Gremlin (Apache project and commercial products)
• dataflow programming model: graph annotated with tokens (“traversers”) that flow through it according to user program
• GSQL (TigerGraph)
• Inspired by SQL, with support for massively parallel graph analytics
29. Key Language Ingredients Needed in Modern Applications
• All primitives inherited from the past (path expressions, conjunctive patterns, variables, node/edge construction): SPARQL, Cypher, Gremlin, GSQL
• Support for large-scale graph analytics:
• Customizable path traversal semantics: Gremlin, GSQL
• Aggregation of data encountered during traversal: SPARQL (partial), Cypher, Gremlin, GSQL
• Control flow for the class of iterative algorithms that converge in multiple steps (e.g. PageRank-class, recommender systems, shortest paths, etc.): Gremlin, GSQL
• Intermediate results assigned to nodes/edges to support parallel computation (programming mindset + execution): GSQL
31. Aggregation in Current Graph QLs
• Cypher’s RETURN clause uses syntax similar to aggregation-extended CRPQs
• Gremlin and SPARQL use an SQL-style GROUP BY clause
• GSQL uses aggregating containers called “accumulators”
• soon to add the above modes as syntactic sugar, but will preserve accumulators, which remain strictly more versatile
32. GSQL Accumulators
• GSQL traversals collect and aggregate data by writing it into accumulators
• Accumulators are containers (data types) that
• hold a data value
• accept inputs
• aggregate inputs into the data value using a binary operation
• May be built-in (sum, max, min, etc.) or user-defined
• May be
• global (a single container)
• vertex-attached (one container per vertex)
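A rough Python analogy for how accumulators behave (illustrative only; GSQL's actual accumulators are declared types that the engine evaluates in parallel): a container holding a value, folding every accepted input into it with a fixed binary operation.

```python
class SumAccum:
    """Accumulator whose binary operation is addition (GSQL spells
    the accept-input step `+=`)."""
    def __init__(self, initial=0):
        self.value = initial
    def accum(self, x):
        self.value += x

class MaxAccum:
    """Accumulator whose binary operation is max."""
    def __init__(self, initial=float("-inf")):
        self.value = initial
    def accum(self, x):
        self.value = max(self.value, x)

# Global accumulator: a single container.
total = SumAccum()
for amount in (10, 5, 7):
    total.accum(amount)

m = MaxAccum()
m.accum(2)
m.accum(5)

# Vertex-attached accumulators: one container per vertex.
per_vertex = {v: SumAccum() for v in ("a", "b")}
per_vertex["a"].accum(3)
```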
33. Vertex-Attached Accumulator Example: Revenue per Customer and per Product
• Maximize opportunities for parallel evaluation
SumAccum<float> @cSales, @pSales;
SELECT c
FROM Customer :c -(-Bought-> :b)- Product :p
ACCUM float thisSaleRevenue = b.quantity*(1-b.discount)*p.price,
c.@cSales += thisSaleRevenue,
p.@pSales += thisSaleRevenue;
• vertex-attached accums: one instance per node; groups are distributed, and each node accumulates its own group, so the computation can be parallelized
• this sale’s revenue contributes to two aggregations, each with a distinct grouping criterion
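The same computation rendered as plain Python (hypothetical sale tuples), which makes the two simultaneous groupings explicit, though it loses the parallel evaluation the vertex-attached form enables:

```python
from collections import defaultdict

def revenue_per_customer_and_product(sales):
    """Each sale (customer, product, quantity, discount, price)
    contributes to two running sums at once: the customer's revenue
    and the product's revenue, mirroring the ACCUM clause above."""
    c_sales = defaultdict(float)   # plays the role of @cSales
    p_sales = defaultdict(float)   # plays the role of @pSales
    for c, p, quantity, discount, price in sales:
        this_sale_revenue = quantity * (1 - discount) * price
        c_sales[c] += this_sale_revenue
        p_sales[p] += this_sale_revenue
    return c_sales, p_sales

sales = [("ann", "blocks", 100, 0.05, 10.0),   # 950.0
         ("ann", "doll", 2, 0.0, 20.0)]        # 40.0
```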
34. Recommended Toys Ranked by Log-Cosine Similarity
SumAccum<float> @rank, @lc;
SumAccum<int> @inCommon;
I = {Customer.1};
SELECT p INTO ToysILike, o INTO OthersWhoLikeThem
FROM I:c -(-Likes->)- Product:p -(<-Likes-)- Customer:o
WHERE p.category == "Toys" AND o != c
ACCUM o.@inCommon += 1
POST-ACCUM o.@lc = log(1 + o.@inCommon);
SELECT t INTO ToysTheyLike
FROM OthersWhoLikeThem:o -(-Likes->)- Product:t
WHERE t.category == "Toys"
ACCUM t.@rank += o.@lc;
RecommendedToys = ToysTheyLike - ToysILike;
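A plain-Python rendering of the two-hop query (toy data, hypothetical names), useful for checking the ranking logic: each other customer o gets a log-cosine weight log(1 + |toys we both like|), and every toy o likes inherits that weight.

```python
import math

def recommend_toys(likes, me, category):
    """Two-hop recommendation mirroring the GSQL query above.
    `likes` maps a customer to the products they like;
    `category` maps a product to its category."""
    my_toys = {p for p in likes.get(me, ()) if category[p] == "Toys"}

    in_common = {}                        # @inCommon, per other customer
    for o, prods in likes.items():
        if o == me:
            continue
        shared = my_toys & set(prods)
        if shared:
            in_common[o] = len(shared)

    rank = {}                             # @rank, per candidate toy
    for o, n in in_common.items():
        lc = math.log(1 + n)              # @lc for customer o
        for t in likes[o]:
            if category[t] == "Toys":
                rank[t] = rank.get(t, 0.0) + lc

    # RecommendedToys = ToysTheyLike - ToysILike
    return {t: r for t, r in rank.items() if t not in my_toys}

likes = {"me": ["ball", "lego"], "o1": ["ball", "train"], "o2": ["lego", "train"]}
category = {"ball": "Toys", "lego": "Toys", "train": "Toys"}
```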
35. Essential: Control-Flow, Particularly Loops
• Loops (until condition is satisfied)
• Necessary to program iterative algorithms, e.g. PageRank, recommender
systems, shortest-path, etc.
• They synergize with accumulators. This GSQL-unique combination
concisely expresses sophisticated graph analytics
• Can be used to program unbounded-length path traversal under various
semantics
36. PageRank in GSQL
CREATE QUERY pageRank (float maxChange, int maxIteration, float dampingFactor) {
MaxAccum<float> @@maxDifference = 9999; // max score change in an iteration
SumAccum<float> @received_score = 0; // sum of scores received from neighbors
SumAccum<float> @score = 1; // initial score for every vertex is 1.
AllV = {Page.*}; // start with all vertices of type Page
WHILE @@maxDifference > maxChange LIMIT maxIteration DO
@@maxDifference = 0;
S= SELECT s
FROM AllV:s -(Linkto)-> :t
ACCUM t.@received_score += s.@score/s.outdegree()
POST-ACCUM s.@score = 1-dampingFactor + dampingFactor * s.@received_score,
s.@received_score = 0,
@@maxDifference += abs(s.@score - s.@score');
END;
}
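For readers more comfortable with an imperative rendering, the same loop structure in plain Python (a sketch only; the GSQL version evaluates the SELECT in parallel across vertices, and `s.@score'` reads the score from before the update):

```python
def pagerank(links, max_change=0.001, max_iteration=25, damping=0.85):
    """Iterate until the largest per-vertex score change drops below
    `max_change` or `max_iteration` rounds have run. `links` maps each
    vertex to the vertices it links to; every vertex must be a key."""
    score = {v: 1.0 for v in links}               # @score, initially 1
    for _ in range(max_iteration):
        received = {v: 0.0 for v in links}        # @received_score
        for s, targets in links.items():
            if targets:
                share = score[s] / len(targets)   # s.@score / s.outdegree()
                for t in targets:
                    received[t] += share
        max_diff = 0.0                            # @@maxDifference
        for v in links:
            new_score = (1 - damping) + damping * received[v]
            max_diff = max(max_diff, abs(new_score - score[v]))
            score[v] = new_score
        if max_diff <= max_change:
            break
    return score
```

On a symmetric two-page graph the scores settle at 1.0 immediately, which is a quick sanity check for the update rule.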