2. ...with software.
Co-founder of the Gephi project - 2008
Co-founder of the Linkurious startup - 2013
PhD in computer science, UPMC LIP6 - 2013
A few words about me
I democratise graph thinking
(with pink titles)
makes graphs handy
3. Open source project started in 2008
Built to solve large graph visualization problems
Latest version downloaded ~ 400,000 times
http://gephi.org
A few words about me / Gephi
5. A few words about me / Linkurious
Started in 2012 as a collaboration with Stanford (Mapping the
Republic of Letters) and DensityDesign.
Now a French startup of 3 people.
Linkurious helps companies make sense of data with user-
friendly visualization software.
We help business analysts, R&D teams, developers and
scientists.
8. 0. Why?
1. Key takeaways
a. The 5 questions
b. User stories
c. Design visualization + interaction
2. Fraud detection use case
3. Q&A
How to create and use graph visualization successfully?
Agenda
PRACTICE
10. What is a graph?
This is a graph.
Father Of
Father Of
Siblings
11. What is a graph? / Nodes & relationships
A graph is a set of nodes
linked by relationships.
Father Of
Father Of
Siblings
This is a node
This is a
relationship
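As a minimal sketch of the definition above, the family graph on the slide can be written as a set of nodes plus a list of typed relationships. The node names and the `Relationship` class are assumptions made for illustration; the slide only shows the relationship labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    source: str
    kind: str
    target: str


# Hypothetical node names; the slide only shows the relationship labels.
nodes = {"Father", "Child A", "Child B"}
relationships = [
    Relationship("Father", "FATHER_OF", "Child A"),
    Relationship("Father", "FATHER_OF", "Child B"),
    Relationship("Child A", "SIBLINGS", "Child B"),
]


def neighbours(node):
    """Return every node linked to `node`, ignoring direction."""
    linked = set()
    for rel in relationships:
        if rel.source == node:
            linked.add(rel.target)
        elif rel.target == node:
            linked.add(rel.source)
    return linked
```

Calling `neighbours("Child A")` returns the father and the sibling, which is the kind of question a graph model makes cheap to ask.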
12. People, objects, movies,
restaurants, music...
Antennas, servers, phones,
people...
Supplier, roads, warehouses,
products...
Graphs can be used
to model many domains.
Supply chains Social networks Communications
Different domains where graphs are important
14. “The greatest
value of a picture
is when it forces
us to notice what
we never
expected to see.”
Why?
John Tukey
(1962)
15. How to create and use graph visualization successfully?
1. Key takeaways
to kick-start your
projects.
a. Ask 5 questions.
b. Write user stories.
c. Design visualization and
interaction.
16. Ask 5 questions / Q1: Data, tadaa?
You need data.
sourcing - cleaning - update
17. sensemaking - scale - complexity
Ask 5 questions / Q1: Data, tadaa?
Can you model data as graphs?
image: Martin Grandjean
18. Hypothesis discovery,
evidence finding
Impact analysis, reporting
Data modelling, database
administration
Set up your goal.
Administrate Understand Monitor
Ask 5 questions / Q2: Why use graph visualization in your project?
images: XKCD & the web
19. Ask 5 questions / Q3: Who will use it?
Define personas.
data scientist business analyst
developer public audience
images: PhdComics & Despicable Me
20. Short-term memory
max 7 items; beyond that, the ability to make
decisions drops
Vision
displaying more than 10,000 nodes is generally useless
Ask 5 questions / Q4: What are the constraints?
Acknowledge human limits.
21. 50 nodes – 1B nodes Graph size
Machine performance
Server side VS client side rendering
Interactive VS print
Ask 5 questions / Q4: What are the constraints?
Acknowledge technical limits.
22. individual use VS collaborative work
artwork VS integrated into an application
Ask 5 questions / Q5: How is it used?
Define scope.
23. 1. What are the data?
2. What is your goal?
3. Who is your end-user?
4. What are the constraints?
5. How is it used?
Ask 5 questions / Summary
The 5 questions
24. Ask 5 questions / Your turn!
Answer the 5
questions of
your project.
PRACTICE
26. I define a data model.
I generate a significant graph sample.
I create a business query with Cypher.
I visualize the query result.
I iterate on the data model until it is satisfying.
Write user story / The developer story
“I am creating a Neo4j graph
database for my application.”
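The first two steps of this story (define a data model, generate a graph sample) might be sketched as follows. The labels, relationship types and function names are hypothetical, and the Cypher query and visualization steps are left out: this only shows how a sample can be generated and checked against the model before iterating on it.

```python
import random

# Hypothetical data model: (source label, relationship type, target label).
DATA_MODEL = [
    ("Customer", "HAS_PHONE", "Phone"),
    ("Customer", "HAS_ADDRESS", "Address"),
]


def generate_sample(n_customers, n_shared, seed=0):
    """Generate a random sample graph that follows DATA_MODEL.

    Each customer gets one phone and one address drawn from a pool of
    `n_shared` values, so some values end up shared between customers.
    """
    rng = random.Random(seed)
    edges = []
    for i in range(n_customers):
        customer = ("Customer", f"customer-{i}")
        for _, rel_type, target_label in DATA_MODEL:
            value = f"{target_label.lower()}-{rng.randrange(n_shared)}"
            edges.append((customer, rel_type, (target_label, value)))
    return edges


def conforms(edges):
    """Check that every edge uses a pattern allowed by DATA_MODEL."""
    allowed = set(DATA_MODEL)
    return all((s[0], rel, t[0]) in allowed for s, rel, t in edges)


sample = generate_sample(n_customers=100, n_shared=20)
```

Fixing the seed keeps the sample reproducible, so a change in the query result can be traced to a change in the data model rather than to the random data.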
27. Write user story / Your turn!
Write your
own user story.
PRACTICE
31. (a) Nodes are ordered as rows and columns; connections are indicated as filled cells.
(b) A matrix representation of a typical biological pathway. in (Gehlenborg 2012)
Design visualization / Common graph representations
Matrices
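A matrix representation can be sketched in a few lines: nodes index both rows and columns, and a filled cell marks a connection. The three-node graph below is a made-up example, not the pathway from the figure, and rows are assumed to be sources and columns targets.

```python
def adjacency_matrix(nodes, edges):
    """Build a 0/1 matrix where matrix[i][j] == 1 means nodes[i] -> nodes[j]."""
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target in edges:
        matrix[index[source]][index[target]] = 1
    return matrix


# Made-up directed graph for illustration.
nodes = ["A", "B", "C"]
edges = [("A", "B"), ("A", "C"), ("B", "C")]
matrix = adjacency_matrix(nodes, edges)

# Print one row per node, "#" for a filled cell and "." for an empty one.
for node, row in zip(nodes, matrix):
    print(node, " ".join("#" if cell else "." for cell in row))
```

For an undirected graph, the same function can be called with each edge listed in both directions, which gives a symmetric matrix.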
32. (a) A directed graph typical of a biological pathway. (b) An undirected graph with
nodes arranged in a circle. (c) A spring-embedded layout of data from b. in
(Gehlenborg 2012)
Design visualization / Common graph representations
Node-link diagrams
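The circular arrangement in panel (b) is the simplest node-link layout to compute: nodes are spaced evenly on a circle. The sketch below assumes a unit radius by default and returns plain (x, y) pairs; it does not cover the spring-embedded layout of panel (c).

```python
import math


def circular_layout(nodes, radius=1.0):
    """Place nodes evenly on a circle and return {node: (x, y)}."""
    positions = {}
    for i, node in enumerate(nodes):
        angle = 2 * math.pi * i / len(nodes)
        positions[node] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions


positions = circular_layout(["A", "B", "C", "D"])
```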
36. Design visualization and interaction / Graph Viz 101
Learn more at
http://linkurio.us/graph-viz-101
39. Use case / The cost of fraud
$28.6B
AITE Group estimates that first party
fraud will cost $28.6 billion in credit
card losses a year by 2016.
http://news.alaric.com/industry-news/fraud/a-new-approach-to-first-party-fraud-reducing-bad-debt/
http://bankinganalyticsblog.fico.com/2013/02/first-party-fraud-it-was-me.html
40. A look at a common fraud
scenario banks face.
Create a fake
identity
A criminal or a group of
criminals mixes pieces of
information (addresses,
phone numbers, social
security numbers) to create a
“synthetic identity”.
Go to the bank,
ask for a loan
The criminal uses the fake
identity to register a bank
account. He acts like a
normal customer and tries to
secure a loan.
Disappear with
the money
Once the criminal feels he
cannot get access to more
money, he carefully prepares
his exit: in a short amount of
time he empties all of his
accounts and disappears.
Use case / A common fraud scenario
41. Use case / How do we set up a graph-based fraud detection system?
Let’s ask our 5 questions.
1. What are the data?
2. What is your goal?
3. Who is your end-user?
4. What are the constraints?
5. How is it used?
42. Use case / Q1: What are the data?
We model customer data as a graph.
Loan: $25k
Home address: 58, Eisenhower Square
Customer name: J. Smith
Phone number: +33 5 68 98 25 74
Credit card: $1,234
ID: J. Smith
A graph showing a legitimate customer and the information she is linked to.
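One way to sketch this customer graph is as a list of (source, relationship, target) triples, with each node written as a (label, value) pair. The values come from the slide; the relationship type names are assumptions, since the slide does not label the links.

```python
customer = ("Customer", "J. Smith")

# Relationship type names are hypothetical; the slide does not label the links.
edges = [
    (customer, "HAS_ADDRESS", ("Address", "58, Eisenhower Square")),
    (customer, "HAS_PHONE", ("Phone", "+33 5 68 98 25 74")),
    (customer, "HAS_LOAN", ("Loan", "$25k")),
    (customer, "HAS_CREDIT_CARD", ("CreditCard", "$1,234")),
    (customer, "HAS_ID", ("ID", "J. Smith")),
]


def linked_information(node, edges):
    """Return {label: value} for every node directly linked from `node`."""
    return {target[0]: target[1] for source, _, target in edges if source == node}


info = linked_information(customer, edges)
```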
43. Use case / Q1: What are the data?
In a fraud ring people share the same
information.
58, Eisenhower Square
14, Roses Street
+33 6 75 89 22 14
$7k
P. Martin
$12.5k
+33 1 42 58 66 00
J. Smith
SSN 17873897893
31195855
$20k
E. Selmati
SSN 1787576553
$45k
P. Smith
SSN 1787579953
SSN 1267576553
31184274
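The idea that ring members share the same information can be sketched as a grouping problem: link customers who hold a common identifier, then collect the connected groups. The names, addresses and phone numbers below are taken from the slide, but which customer holds which value is an assumption, and "A. Legit" is an invented control record.

```python
from collections import defaultdict

# Assumed assignment of identifiers to customers, for illustration only.
records = {
    "J. Smith": {"58, Eisenhower Square", "+33 6 75 89 22 14"},
    "P. Smith": {"58, Eisenhower Square", "+33 1 42 58 66 00"},
    "P. Martin": {"14, Roses Street", "+33 6 75 89 22 14"},
    "E. Selmati": {"14, Roses Street", "+33 1 42 58 66 00"},
    "A. Legit": {"1, Hypothetical Lane", "+33 1 00 00 00 00"},  # invented
}


def find_rings(records):
    """Group customers that are connected through shared identifiers."""
    holders = defaultdict(set)
    for customer, identifiers in records.items():
        for identifier in identifiers:
            holders[identifier].add(customer)

    # Two customers are linked when they hold at least one common identifier.
    linked = defaultdict(set)
    for customers in holders.values():
        for customer in customers:
            linked[customer] |= customers - {customer}

    rings, seen = [], set()
    for start in records:
        if start in seen or not linked[start]:
            continue  # already grouped, or shares nothing with anyone
        ring, stack = set(), [start]
        while stack:
            current = stack.pop()
            if current in ring:
                continue
            ring.add(current)
            stack.extend(linked[current] - ring)
        seen |= ring
        rings.append(ring)
    return rings


rings = find_rings(records)
```

A shared address or phone number is not proof of fraud (families share both), so a group found this way is a candidate for an analyst to review rather than a verdict.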
44. Use case / Q2: What is your goal?
We want to detect
fake customer
identities.
45. She is a fraud expert but has limited
data and computer skills.
She works with a team of data
analysts for a large bank.
When an alert is triggered, she
checks if the customer account
belongs to a potential fraud ring.
Use case / Q3: Who is your end-user?
Our user is a fraud analyst.
image: PhdComics
46. Thousands of new loans per month.
Time: a few days
Investigate before transferring more money.
Interaction
Detect fraud rings by exploring the graph
gradually.
Use case / Q4: What are the constraints?
We have a large graph on a
single database.
47. Use case / Q5: How is it used?
The visualization is embedded
in a business process.
Lifecycle events trigger
security checks
A new customer opens
an account
An existing customer asks
for a loan
A customer skips a loan
payment
A Neo4j Cypher query
runs to detect patterns
An analyst visualizes the
connections to make an
informed decision.
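The process above can be sketched as a small event handler: a lifecycle event triggers a pattern check, and a hit is queued for an analyst. The event names and the `detect_pattern` callable are hypothetical; in the setup described on the slide, that check would be the Neo4j Cypher query.

```python
# Hypothetical event names for the three lifecycle events on the slide.
TRIGGER_EVENTS = {"account_opened", "loan_requested", "payment_skipped"}


def handle_event(event_type, customer, detect_pattern, review_queue):
    """Run the pattern check for trigger events and queue hits for an analyst.

    `detect_pattern` stands in for the Cypher query; it takes a customer
    and returns True when a suspicious pattern is found.
    """
    if event_type not in TRIGGER_EVENTS:
        return False
    if detect_pattern(customer):
        review_queue.append((event_type, customer))
        return True
    return False


queue = []
flagged = handle_event("loan_requested", "customer-1", lambda c: True, queue)
```

The handler only queues the case; the decision stays with the analyst, who explores the connections in the visualization.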
48. Use case / The user story
The fraud team acts faster
and more fraud cases can be
avoided.
If something suspicious comes up, the analysts can
use Linkurious to quickly assess the situation.
Linkurious allows the fraud
teams to go deep in the data
and build cases against fraud
rings.
Treat false
positives
Investigate
serious cases
Save money
Linkurious allows you to
control the alerts and make
sure your customers are not
treated like criminals.
49. Max 200 nodes visualized
Relationship information is important
Multiple node categories (address, phone, ...)
-> node-link diagram
-> icons or node colors by category
Interactivity: yes
Display node and relationship information on demand
Expand node connections on demand
Use case / Visualization and interaction design
Design
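Expanding connections on demand while staying under the 200-node budget might look like the sketch below. The function name, the adjacency dictionary and the choice to stop adding neighbours once the limit is reached are assumptions, not a description of how Linkurious implements it.

```python
MAX_VISIBLE_NODES = 200  # budget taken from the slide


def expand(visible, node, adjacency, limit=MAX_VISIBLE_NODES):
    """Add the neighbours of `node` to the visible set, up to `limit` nodes.

    Returns the list of neighbours that were actually added.
    """
    added = []
    for neighbour in adjacency.get(node, []):
        if len(visible) >= limit:
            break  # budget reached: leave remaining neighbours hidden
        if neighbour not in visible:
            visible.add(neighbour)
            added.append(neighbour)
    return added


# Made-up graph: "a" has three neighbours, but only three nodes may be shown.
adjacency = {"a": ["b", "c", "d"]}
visible = {"a"}
first = expand(visible, "a", adjacency, limit=3)
```

Starting from a single flagged customer and expanding step by step keeps the picture within what the analyst can read, in line with the human limits discussed earlier.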
54. Detailed use case on our blog:
● Part 1: http://linkurio.us/how-to-detect-bank-loan-fraud-with-graphs-part-1/
● Part 2: http://linkurio.us/how-to-detect-bank-loan-fraud-with-graphs-part-2/
● Neo4j data set: https://www.dropbox.com/s/wk8k5r23syp6kbx/fraud%20detection.zip
GraphGist by Kenny Bastani: http://gist.neo4j.org/?github-neo4j-contrib%2Fgists%2F%2Fother%2FBankFraudDetection.adoc
Video demonstration: https://vimeo.com/76891393 (around the 12-minute mark)
Graph Visualization 101: http://linkurio.us/graph-viz-101/
Resources
55. Research papers
Visual Analysis of Complex Networks for Business Intelligence with Gephi. Sébastien Heymann and Bénédicte Le Grand. To appear in the Proceedings of the 1st International Symposium on Visualisation and Business Intelligence, in conjunction with the 17th International Conference Information Visualisation (IV 2013 - VBI).
Gephi: an open source software for exploring and manipulating networks. Mathieu Bastian, Sébastien Heymann and Mathieu Jacomy. In Proceedings of the Third International AAAI Conference on Weblogs and Social Media (ICWSM'09), in American Journal of Sociology (2009), pp. 361-362.
Points of View: Bar Charts and Box Plots. M Streit and N Gehlenborg. Nature Methods 11(2):117 (2014).
Book chapters
Exploratory Network Analysis: Visualization and Interaction. Sébastien Heymann and Bénédicte Le Grand. To appear in Hocine Cherifi (editor), Complex Networks and their Applications, Cambridge University Press.
Gephi. Sébastien Heymann. To appear in the Encyclopedia of Social Networks and Mining (ESNAM), Springer.
Books
Exploratory data analysis. Tukey, J. W. (1977).
References