How to use graphs to identify credit card thieves?
1. How to use graphs to identify credit card thieves
SAS founded in 2013 in Paris | http://linkurio.us | @linkurious
2. WHAT IS A GRAPH?
[Figure: a small graph with two "Father Of" relationships and one "Siblings" relationship]
This is a graph
3. WHAT IS A GRAPH: NODES AND RELATIONSHIPS
[Figure: the same graph with two "Father Of" relationships and one "Siblings" relationship, annotated "This is a node" and "This is a relationship"]
A graph is a set of nodes linked by relationships
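The definition above can be sketched in a few lines of Python. This is only an illustration: the `Node` and `Relationship` classes, the `neighbours` helper and the three people are hypothetical, since the slide shows only the relationship types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    label: str


@dataclass(frozen=True)
class Relationship:
    source: Node
    kind: str
    target: Node


# Hypothetical people: the slide names only the relationship types.
dad = Node("Dad", "Person")
kid_a = Node("Kid A", "Person")
kid_b = Node("Kid B", "Person")

# A graph is a set of nodes linked by relationships.
graph = [
    Relationship(dad, "FATHER_OF", kid_a),
    Relationship(dad, "FATHER_OF", kid_b),
    Relationship(kid_a, "SIBLINGS", kid_b),
]


def neighbours(node, relationships):
    """Return the names of all nodes directly linked to `node`, in either direction."""
    linked = set()
    for rel in relationships:
        if rel.source == node:
            linked.add(rel.target.name)
        elif rel.target == node:
            linked.add(rel.source.name)
    return sorted(linked)


print(neighbours(kid_a, graph))
```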
4. DIFFERENT DOMAINS WHERE GRAPHS ARE IMPORTANT
Graphs can be used to model many domains
- Supply chains: suppliers, roads, warehouses, products
- Social networks: people, objects, movies, restaurants, music
- Communications: antennas, servers, phones, people
5. GRAPH AND FRAUD DETECTION
But why can graphs help identify credit card thieves?
6. HOW CREDIT CARD THIEVES OPERATE
Get access to the numbers... and turn them into cash
- Steal the credit card: the criminal is an employee in a store. During check-out, he copies the credit card information of certain customers.
- Make online purchases: later on, the criminal uses the credit card numbers to make purchases online. He chooses items that can be resold.
- Turn goods into cash: the criminal intercepts the goods at the shipping address. He resells the goods: now he has cash!
7. A GRAPH DATA MODEL TO IDENTIFY CARD THIEVES
The first step to detect card thieves is to turn transaction history into a graph
[Figure: Paul (Person) and Nicole (Person) linked to five (Merchant) nodes by HAS_BOUGHT_AT relationships, each labelled with an amount and a date: 29$ (05/05/2014), 8$ (05/05/2014), 19.5$ (05/05/2014), 8$ (06/05/2014), 10.5$ (05/05/2014), 199$ (08/05/2014), 78.9$ (08/05/2014)]
The edges are transactions. The two fraudulent transactions are shown in red.
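As a rough sketch of this first step, the snippet below turns a flat transaction history into a graph keyed by person. The rows, field names and `build_graph` helper are assumptions made for illustration; they are not the data from the figure.

```python
from collections import defaultdict

# Hypothetical transaction history: (person, merchant, amount, date, status).
# These rows are illustrative and do not reproduce the slide's figure.
transactions = [
    ("Paul", "Coffee shop", 4.5, "2014-05-05", "Undisputed"),
    ("Paul", "Bookstore", 22.0, "2014-05-05", "Undisputed"),
    ("Paul", "Online store", 150.0, "2014-05-08", "Disputed"),
    ("Nicole", "Coffee shop", 3.0, "2014-05-06", "Undisputed"),
    ("Nicole", "Online store", 60.0, "2014-05-08", "Disputed"),
]


def build_graph(rows):
    """Map each person node to the HAS_BOUGHT_AT edges that leave it."""
    graph = defaultdict(list)
    for person, merchant, amount, date, status in rows:
        graph[person].append(
            {"merchant": merchant, "amount": amount, "date": date, "status": status}
        )
    return dict(graph)


graph = build_graph(transactions)
for person, edges in graph.items():
    for edge in edges:
        print(f"({person})-[HAS_BOUGHT_AT {edge['amount']}$]->({edge['merchant']})")
```

Each transaction becomes one edge, so the amount, date and dispute status live on the relationship rather than on the person or the merchant.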
8. WHERE IS THE THIEF?
We are looking for the common connection between the 2 victims...
9. LOOKING AT THE COMMON CONNECTION
The only place the theft could have happened is at the coffee shop...
[Figure: the transaction graph from slide 7 again, with Paul (Person) and Nicole (Person) linked to (Merchant) nodes by HAS_BOUGHT_AT relationships]
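Looking for the common connection amounts to intersecting the sets of merchants each victim has visited. The sketch below assumes a hypothetical list of (person, merchant) purchases; the merchant names other than the coffee shop are invented.

```python
# Hypothetical purchases as (person, merchant) pairs.
purchases = [
    ("Paul", "Coffee shop"),
    ("Paul", "Bookstore"),
    ("Paul", "Supermarket"),
    ("Nicole", "Coffee shop"),
    ("Nicole", "Gas station"),
]


def merchants_of(rows, person):
    """Return the set of merchants where `person` has bought something."""
    return {merchant for buyer, merchant in rows if buyer == person}


def common_merchants(rows, victims):
    """Return the merchants visited by every victim in `victims`."""
    shared = None
    for victim in victims:
        visited = merchants_of(rows, victim)
        shared = visited if shared is None else shared & visited
    return shared or set()


print(common_merchants(purchases, ["Paul", "Nicole"]))
```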
10. WHAT IF WE NEED TO ANALYSE >100M TRANSACTIONS?
Doing it in real life involves querying a large number of transactions to find connections
11. THE PAINS OF WORKING ON CONNECTED DATA WITH RELATIONAL TECHNOLOGIES
Relational databases are not good at handling... relationships
Depth | RDBMS execution time (s) | Neo4j execution time (s) | Records returned
2     | 0.016                    | 0.01                     | ~2,500
3     | 30.267                   | 0.168                    | ~110,000
4     | 1543.505                 | 1.359                    | ~600,000
5     | Unfinished               | 2.132                    | ~800,000
Finding extended friends in a social network of 1M people (from the book Graph Databases)
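To make "depth" concrete, here is a minimal breadth-first sketch of a friends-of-friends search over an in-memory adjacency list. It is not the benchmark from the book: the `friends` dictionary and the `friends_within` function are hypothetical and only show what a depth-limited traversal looks like.

```python
from collections import deque

# Hypothetical social network as an adjacency list.
friends = {
    "Ann": ["Bob"],
    "Bob": ["Ann", "Cid"],
    "Cid": ["Bob", "Dee"],
    "Dee": ["Cid"],
}


def friends_within(network, start, max_depth):
    """Return everyone reachable from `start` in at most `max_depth` hops."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for friend in network.get(person, ()):
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, depth + 1))
    seen.discard(start)
    return seen


print(sorted(friends_within(friends, "Ann", 2)))
```

In this sketch each extra level of depth only follows the adjacency lists of the people already reached, which is the access pattern a depth-limited friends query needs.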
12. GRAPH DATABASES MAKE IT POSSIBLE TO QUERY LARGE GRAPHS
Graph databases make it possible to identify the fraud patterns in real time
- An event triggers security checks: customer complaint, suspicious transaction, merchant alert
- A Neo4j Cypher query runs to detect patterns
- Identification of the fraudsters
13. EXAMPLE: A GRAPH QUERY TO IDENTIFY CREDIT CARD THIEVES
MATCH (victim:person)-[r:HAS_BOUGHT_AT]->(merchant)
WHERE r.status = "Disputed"
MATCH (victim)-[t:HAS_BOUGHT_AT]->(othermerchants)
WHERE t.status = "Undisputed" AND t.time < r.time
WITH victim, othermerchants, t ORDER BY t.time DESC
RETURN DISTINCT othermerchants.name AS suspicious_store,
       count(DISTINCT t) AS count,
       collect(DISTINCT victim.name) AS victims
ORDER BY count DESC
14. EXAMPLE: A GRAPH QUERY TO IDENTIFY CREDIT CARD THIEVES
MATCH (victim:person)-[r:HAS_BOUGHT_AT]->(merchant)
WHERE r.status = "Disputed"
We select the victims: people involved in “disputed” transactions
MATCH (victim)-[t:HAS_BOUGHT_AT]->(othermerchants)
WHERE t.status = "Undisputed" AND t.time < r.time
We look at the transactions that happened before the fraudulent transactions
WITH victim, othermerchants, t ORDER BY t.time DESC
RETURN DISTINCT othermerchants.name AS suspicious_store,
       count(DISTINCT t) AS count,
       collect(DISTINCT victim.name) AS victims
ORDER BY count DESC
We return the list of suspicious merchants, ordered by the number of transactions they are involved in
Complete explanation and dataset: see the additional resources on the last slide.
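For readers without a Neo4j instance at hand, the three steps of the query can be mimicked on an in-memory list. This is a sketch under assumptions: the transaction rows, the integer `time` values and the `suspicious_stores` function are hypothetical, and ties are broken alphabetically, which the Cypher query does not specify.

```python
from collections import defaultdict

# Hypothetical transactions: (id, victim, merchant, time, status).
transactions = [
    (1, "Paul", "Coffee shop", 1, "Undisputed"),
    (2, "Paul", "Bookstore", 2, "Undisputed"),
    (3, "Paul", "Online store", 5, "Disputed"),
    (4, "Nicole", "Coffee shop", 3, "Undisputed"),
    (5, "Nicole", "Gas station", 7, "Undisputed"),
    (6, "Nicole", "Online store", 6, "Disputed"),
]


def suspicious_stores(rows):
    """Rank merchants by undisputed transactions made before a victim's disputed one."""
    # Step 1: select the victims, i.e. people with a disputed transaction.
    disputed = [row for row in rows if row[4] == "Disputed"]
    tx_ids = defaultdict(set)
    victims = defaultdict(set)
    # Step 2: collect each victim's earlier, undisputed transactions.
    for _, victim, _, disputed_time, _ in disputed:
        for tx_id, buyer, merchant, time, status in rows:
            if buyer == victim and status == "Undisputed" and time < disputed_time:
                tx_ids[merchant].add(tx_id)
                victims[merchant].add(buyer)
    # Step 3: order merchants by number of distinct transactions, highest first.
    ranked = sorted(tx_ids, key=lambda m: (-len(tx_ids[m]), m))
    return [(m, len(tx_ids[m]), sorted(victims[m])) for m in ranked]


for store, count, names in suspicious_stores(transactions):
    print(store, count, names)
```

With this sample data the coffee shop ranks first because both victims bought there before their disputed transactions, while the gas station is excluded because that purchase came after the fraud.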
15. WHAT IS THE IMPACT OF LINKURIOUS
If something suspicious comes up, the analysts can use Linkurious to quickly assess the situation
- Treat false positives: Linkurious allows you to control the alerts and make sure your customers are not treated like criminals.
- Investigate serious cases: Linkurious allows the fraud teams to go deep into the data and build cases against fraud rings.
- Save money: the fraud teams act faster and more fraud cases can be avoided.
20. SOME ADDITIONAL RESOURCES TO CONSIDER
Article on credit card thieves identification:
- the article: http://linkurio.us/stolen-credit-cards-and-fraud-detection-with-neo4j/
- the dataset: https://www.dropbox.com/s/4uij4gs2iyva5bd/credit%20card%20fraud.zip
GraphGist on credit card fraud:
- the GraphGist: http://gist.neo4j.org/?3ad4cb2e3187ab21416b