Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money Laundering Rings
1.
Sept 30, 2020
Victor Lee, TigerGraph
Kumar Deepak, Xilinx
2.
| GRAPHAIWORLD.COM | #GRAPHAIWORLD |
Our Presenters
Victor Lee
Head of Product Strategy &
Developer Relations
● BS in Electrical Engineering and
Computer Science from UC Berkeley,
MS in Electrical Engineering from
Stanford University
● PhD in Computer Science from Kent
State University focused on graph data
mining
● 20+ years in tech industry
Kumar Deepak
Distinguished Engineer
● BS in Electronics and Communication
Engineering from the Indian Institute of
Technology, Kharagpur
● Leads Xilinx engineering efforts to
accelerate database and analytics
● 20+ years of experience in architecting
and developing large-scale complex
software and hardware systems
3.
Agenda
● How Graph Analytics provide better and faster insights
● How FPGAs amplify the speed and value of analytics
● Use Case: Fraud Detection and Money Laundering
- Finding Connected Communities for fraud detection
● How FPGAs work
● Louvain Modularity run on FPGA
● Benchmark
4.
Graph-Powered Analytics & Machine Learning
Richer Data
● Relationships are 1st Class Citizens
● Connects different datasets and silos
Deeper Questions
● Look for semantic patterns of relationships
● Search far and wide more easily
Additional Computational Options
● Graph algorithms
● Graph-enhanced machine learning
Explainable Results
● Semantic data model, queries, and answers
● Visual exploration and results
5.
The TigerGraph Difference
● Real-Time Deep-Link Querying (5 to 10+ hops deep)
- Design difference: native graph design; C++ engine, for high performance; storage architecture
- Benefit: uncovers hard-to-find patterns; operational, real-time; HTAP (Transactions + Analytics)
● Handling Massive Scale
- Design difference: distributed DB architecture; massively parallel processing; compressed storage reduces footprint and messaging
- Benefit: integrates all your data; automatic partitioning; elastic scaling of resource usage
● In-Database Analytics
- Design difference: GSQL, a high-level yet Turing-complete language; user-extensible graph algorithm library, runs in-DB; ACID (OLTP) and Accumulators (OLAP)
- Benefit: avoids transferring data; richer graph context; in-DB machine learning
6.
TigerGraph Platform: Deploy Anywhere
[Architecture diagram]
● Input data: operational data, master data, DBs, Spark, Kafka, files
● ETL Data Loader
● Graph Storage Engine (GSE): graph data storage, ID service, indexing
● Graph Processing Engine (GPE): parallel query processing, data snapshots
● GSQL Server: GSQL queries
● GraphStudio Server: visual design UI
● RESTPP: RESTful APIs
● Message queuing (Spark / Kafka / Zookeeper)
● User queries, graph algorithms
● Connected systems: business intelligence, analytics, visualization, dashboards, reports, data warehouses, master data stores, machine learning
7.
TigerGraph Platform: Deploy Anywhere
[Same architecture diagram as on the previous slide, with one addition: C++ UDF on Alveo]
8.
TigerGraph + XILINX = faster, deeper, and wider insights.
● Vertical market: Healthcare
- TigerGraph use case: Member Journey / Customer 360
- XILINX acceleration: “Show similar members” via Cosine Similarity, 400X faster on Alveo U50
- Customer benefit: $150M/year call center savings
● Vertical market: Financial Services
- TigerGraph use case: Anti-fraud / Anti-Money Laundering
- XILINX acceleration: “Show fraud ring activity” via Louvain Community Detection, ~20X faster on Alveo U50 (WIP)
- Customer benefit: $500M credit card fraud prevention
● Vertical market: Manufacturing
- TigerGraph use case: Supply Chain Optimization
- XILINX acceleration: “Balance portfolio forecast”, Soon…
- Customer benefit: £400M supply chain savings
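The "show similar members" use case relies on cosine similarity between feature vectors. As a rough illustration only, here is a minimal pure-Python sketch; the member names, the feature vectors, and the helper `most_similar` are hypothetical and are not taken from the TigerGraph or Xilinx implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here: a zero vector is similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(target, members, k=2):
    """Return the k member names whose vectors are closest to the target's."""
    scores = [
        (cosine_similarity(members[target], vec), name)
        for name, vec in members.items()
        if name != target
    ]
    return [name for _, name in sorted(scores, reverse=True)[:k]]


# Hypothetical feature vectors (for example, counts of three event types).
members = {
    "m1": [1, 0, 2],
    "m2": [2, 0, 4],
    "m3": [0, 3, 0],
    "m4": [1, 1, 2],
}
similar_to_m1 = most_similar("m1", members)
```

On a real graph the vectors would be derived from each member's connected attributes, and the all-pairs comparison is the step that benefits from hardware acceleration.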
10.
Fraud Detection with Graph-enhanced ML
● Sophisticated fraud is multi-step, multi-actor, orchestrated
● Graph Algorithms & ML both provide valuable detection and investigative capabilities
11.
Graph Algorithms for Fraud Detection
Shortest Path
• Is this entity closely connected to known suspicious/risky entities?
Community Detection
• Narrow the focus of the investigation
• How many high-risk entities are in the community?
Cycle Detection
• Is there a closed loop of related entities where there shouldn’t be (conflicts of interest, etc.)?
• Is there a closed loop in money flow (money laundering)?
Other valuable algorithms: PageRank, Cosine Similarity, etc.
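The three questions above can be sketched in plain Python on a toy graph. This is an illustrative sketch, not the GSQL library implementation: the account names, the function names, and the community labels are all hypothetical, and a production system would run the equivalent algorithms in-database.

```python
from collections import Counter, deque


def hops_to_risky(adj, risky):
    """Shortest path question: multi-source BFS giving each reachable
    entity its hop distance to the nearest known risky entity."""
    dist = {r: 0 for r in risky}
    queue = deque(risky)
    while queue:
        node = queue.popleft()
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def find_money_cycle(flows):
    """Cycle detection question: return one directed cycle in a
    payer -> payees mapping (as a list of nodes), or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in flows.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:  # back edge: the path loops onto itself
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for start in list(flows):
        if color.get(start, WHITE) == WHITE:
            cycle = visit(start)
            if cycle:
                return cycle
    return None


def risky_per_community(community_of, risky):
    """Community detection question: count known risky entities per community."""
    return Counter(community_of[e] for e in risky if e in community_of)


# Hypothetical toy data.
links = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
flows = {"A": ["B"], "B": ["C"], "C": ["A", "D"], "D": []}
community_of = {"A": 0, "B": 0, "C": 1, "D": 1}
risky = {"A"}

hops = hops_to_risky(links, risky)
cycle = find_money_cycle(flows)
counts = risky_per_community(community_of, risky)
```

The recursive DFS is fine for a toy graph; on large graphs an iterative traversal avoids Python's recursion limit.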
12.
Louvain Modularity Method for Community Detection
● Suppose we partition a graph into communities.
● The modularity score measures how good a particular graph partition is:
Mod ~ (% of edges that are in-group) minus
(expected % of in-group edges, if edges were randomized in a certain way)
● Task: Find the partitioning that has the highest modularity
● Challenge: Exponential number of possible partitionings
● Solution: Louvain is one of the fastest methods for modularity-based partitioning
[Figure: two example partitions of the same graph. First try ⇒ Mod(case 1); better ⇒ Mod(case 2)]
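The modularity comparison can be made concrete with a small sketch. It assumes the standard Newman-Girvan definition, Q = sum over communities c of [ L_c / m - (d_c / (2m))^2 ], where m is the number of edges, L_c the number of edges inside c, and d_c the total degree of the nodes in c. The slide does not spell out the exact formula, so this definition, the function name `modularity`, and the toy graph (two triangles joined by one edge) are assumptions for illustration.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every node to its community label.
    """
    m = len(edges)
    internal = defaultdict(int)  # L_c: edges with both endpoints in c
    degree = defaultdict(int)    # d_c: sum of node degrees in c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    # Observed in-group fraction minus the fraction expected at random.
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Toy graph: triangle 0-1-2 and triangle 3-4-5, bridged by edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

# "First try": a partition that cuts through both triangles.
first_try = {0: "a", 1: "a", 2: "b", 3: "b", 4: "c", 5: "c"}
# "Better": one community per triangle.
better = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

mod_first_try = modularity(edges, first_try)  # about 0.082
mod_better = modularity(edges, better)        # 5/14, about 0.357
```

Louvain searches for a high-modularity partition greedily instead of scoring every possible partition, which is what makes it practical given the exponential number of partitionings.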