Graph databases may be the unsung heroes of data platforms. They are poised to expand dramatically in the next few years as the analytics data that matters most increasingly concerns relationships. We live and work today in a highly connected world where individuals and their relationships shape brand perceptions, consumer behaviors, and many other business success factors. Where relationships form patterns, it is imperative to understand them. Graph databases are the technology best suited to determining and understanding data relationships.
This code-lite session will help you determine why, how, and where to apply graphs in your enterprise.
Advanced Analytics: Graph Database Use Cases
1. Graph Database Use Cases
Presented by: William McKnight
“#1 Global Influencer in Big Data” Thinkers360
President, McKnight Consulting Group
An Inc. 5000 Company in 2018 and 2017
@williammcknight
www.mcknightcg.com
(214) 514-1444
Second Thursday of Every Month, at 2:00 ET
2. 2023 Advanced Analytics Topics
1. 2023 Trends in Enterprise Analytics
2. Showing ROI for your Analytic Project
3. Architecture, Products and Total Cost of Ownership of the Leading Machine Learning Stacks
4. Competitive Analytic Architectures: Comparing the Data Mesh, Data Fabric, Data Lakehouse and Data Cloud
5. Why Analytics Leaders deploy Master Data Management
6. What Does Information Management Maturity Look Like in 2023
7. Understanding the Modern Applications of Graph Databases
8. Common Misconceptions About Master Data Management
9. Organizational Change Management: Will it Hold Back Artificial Intelligence Deployments?
10. Open-Source vs Commercial Vendor Software in the Enterprise
11. Data Quality: The ROI of Adding Intelligence to Data
12. Strategies for Machine Learning Success
3. Relational DBs Can’t Handle Data Relationships Well
• Cannot model or store data and relationships without complexity
• Performance degrades with number and levels of relationships, and database size
• Query complexity grows with need for JOINs
• Adding new types of data and relationships requires schema redesign, increasing time to market
Slow development
Poor performance
Low scalability
Hard to maintain
… making traditional databases inappropriate when data relationships are valuable in real-time
4. Use the Right Database for the Right Job
Graph Databases are designed for data relationships
• Spectrum, from discrete, minimally connected data to connected data focused on data relationships:
– Other NoSQL
– Relational DBMS
– Graph DB
• Development Benefits
– Model maintenance
• Deployment Benefits
– Performance
– Minimal resource usage
5. What Can Be Vertices?
• Things
– Bank accounts
– Customer accounts
    • Mobile phones
– Products
– Trading networks, auctions
– Water, power, gas grids
– Disease, drugs, molecules
    • Interactions, transmission
– Insurance policies
– Machines, servers, URLs
– Sensor networks
• People
– Customers, families
– Employees
– Affinity groups, clubs
    • Politics, causes, doctors
    • Professionals (LinkedIn)
– Companies, institutions
• Places
– Map locations
    • Cities, landmarks
– Retail stores
– Houses or buildings
– Communication networks
– Transportation hubs
    • Airports, shipping lanes, etc.
6. What Can Be Edges?
• People
– Relationships
– Ideas, preferences
– Email, phone calls, SMS, IM
– Collaborations
• Places
– Roads, routes, railways
– Water, power, gas, pipelines, telephone lines
– Anything with GPS coordinates
• Things
– Events
– Money Transactions
– Purchases
– Pressure
– Diseases
– Contraband
– URLs
– Phone calls
– Citations
– Weights, scores
– Timestamps
7. Actions
Model actions depending on what you want as vertices
(Bill)-[:SENT]->(email)-[:TO]->(Jim)
OR
(Bill)-[:EMAILED]->(Jim)
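The two modeling options above can be sketched with plain Python tuples. This is only an illustration of the trade-off, not a graph database API; the identifier `email_1` and the helpers `recipients_of` and `emailed_by` are hypothetical names.

```python
# Option 1: the email is its own vertex, so it could carry properties
# (subject, timestamp) and connect to several recipients.
# "email_1" is a hypothetical identifier for the (email) vertex.
edges_email_as_vertex = [
    ("Bill", "SENT", "email_1"),
    ("email_1", "TO", "Jim"),
]

# Option 2: the action is collapsed into a single edge.
edges_email_as_edge = [
    ("Bill", "EMAILED", "Jim"),
]


def recipients_of(sender, edges):
    """Follow SENT and then TO from `sender` (two hops)."""
    emails = {dst for src, rel, dst in edges if src == sender and rel == "SENT"}
    return {dst for src, rel, dst in edges if src in emails and rel == "TO"}


def emailed_by(sender, edges):
    """Follow a single EMAILED edge from `sender` (one hop)."""
    return {dst for src, rel, dst in edges if src == sender and rel == "EMAILED"}


print(recipients_of("Bill", edges_email_as_vertex))
print(emailed_by("Bill", edges_email_as_edge))
```

Both queries reach Jim. The first model costs an extra hop but leaves room to ask questions about the email itself; the second is shorter when only the person-to-person link matters.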
9. Semantic/RDF/Knowledge Graphs
• A triple is a data entity composed of subject-predicate-object
– “Bob is 35”
– “Bob knows Fred”
– “William likes running”
• In the image:
– Subject: John R Peterson; Predicate: Knows; Object: Frank T Smith
– Subject: Triple #1; Predicate: Confidence Percent; Object: 70
– Subject: Triple #1; Predicate: Provenance; Object: Mary L Jones
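The triples listed for the image can be written down directly. The sketch below uses a plain Python `NamedTuple` rather than any RDF library, so the `Triple` class and the `about` helper are assumptions made for illustration. It shows how a triple can itself be the subject of other triples, which is how the confidence and provenance statements attach to Triple #1.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: object   # a name, or another Triple
    predicate: str
    obj: object


# Triple #1 from the slide.
triple_1 = Triple("John R Peterson", "Knows", "Frank T Smith")

statements = [
    triple_1,
    # Statements whose subject is Triple #1 itself.
    Triple(triple_1, "Confidence Percent", 70),
    Triple(triple_1, "Provenance", "Mary L Jones"),
]


def about(subject, triples):
    """Collect predicate -> object for every triple with this subject."""
    return {t.predicate: t.obj for t in triples if t.subject == subject}


print(about("John R Peterson", statements))
print(about(triple_1, statements))
```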
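12. PageRank
Starting rank: Page A 1.0, Page B 1.0, Page C 1.0, Page D 1.0
Each page passes on 0.85 of its rank, split evenly across its outbound links:
– Page A to Page B: 1*0.85/2
– Page A to Page C: 1*0.85/2
– Page B to Page C: 1*0.85
– Page C to Page A: 1*0.85
– Page D to Page C: 1*0.85
New rank = Sum of inputs + 0.15
http://www.whitelines.nl/html/google-page-rank.html (see spreadsheet)
http://www.cs.princeton.edu/~chazelle/courses/BIB/pagerank.htm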
13. PageRank: Results After 1st Iteration
Page A 1.0, Page B 0.575, Page C 2.275, Page D 0.15
– A Total 1.00 = 0.15 + 0.85 (from page C)
– B Total 0.575 = 0.150 + 0.425 (from page A)
– C Total 2.275 = 0.150 + 0.850 (from page D) + 0.850 (from page B) + 0.425 (from page A)
– D Total 0.150 = 0.150 (no inputs)
http://www.whitelines.nl/html/google-page-rank.html (see spreadsheet)
15. PageRank: 20 Iterations Until Convergence
Page A 1.49, Page B 0.78, Page C 1.58, Page D 0.15
• Page C is the most important web page
• Page C increases page A importance
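The arithmetic on the PageRank slides can be replayed in a few lines of Python. This is a minimal sketch, assuming the link structure implied by the per-page totals (A links to B and C; B, C and D each have one outbound link) and the same simplified update the slides use, new rank = sum of inputs + 0.15. The names `links`, `pagerank_step` and `pagerank` are hypothetical.

```python
# Outbound links inferred from the slide totals (an assumption):
# A -> B, A -> C, B -> C, C -> A, D -> C.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}


def pagerank_step(ranks, links, factor=0.85, base=0.15):
    """One pass: every page starts at `base` and receives its inputs."""
    new_ranks = {page: base for page in ranks}
    for page, targets in links.items():
        # A page passes on `factor` of its rank, split across its links.
        share = ranks[page] * factor / len(targets)
        for target in targets:
            new_ranks[target] += share
    return new_ranks


def pagerank(links, iterations):
    """Start every page at 1.0 and apply the update `iterations` times."""
    ranks = {page: 1.0 for page in links}
    for _ in range(iterations):
        ranks = pagerank_step(ranks, links)
    return ranks


first = pagerank(links, 1)
after_20 = pagerank(links, 20)
print({page: round(rank, 3) for page, rank in sorted(first.items())})
print({page: round(rank, 2) for page, rank in sorted(after_20.items())})
```

Under these assumptions the first pass gives A 1.0, B 0.575, C 2.275 and D 0.15, and after 20 passes the ranks should round to the converged figures shown above. Page D never receives an input, so it stays at 0.15.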
16. Betweenness
• Find bridges across different communities
• High score = edge links different communities
• In the image: two bridge vertices
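To make the "bridge" idea concrete, the sketch below scores every edge by the number of shortest paths that cross it, using a breadth-first variant of Brandes' accumulation. It assumes an unweighted, undirected graph; the six-vertex graph (two triangles joined by one edge) and the function name `edge_betweenness` are made up for illustration.

```python
from collections import deque


def edge_betweenness(adj):
    """Score each undirected edge by the shortest paths that cross it."""
    scores = {}
    for source in adj:
        # Breadth-first search: distance and shortest-path count per vertex.
        dist = {source: 0}
        paths = {source: 1.0}
        preds = {v: [] for v in adj}
        order = []
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    paths[w] = 0.0
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Walk back from the farthest vertices, crediting each edge used.
        credit = {v: 0.0 for v in order}
        for w in reversed(order):
            for v in preds[w]:
                flow = paths[v] / paths[w] * (1.0 + credit[w])
                edge = tuple(sorted((v, w)))
                scores[edge] = scores.get(edge, 0.0) + flow
                credit[v] += flow
    # Every pair of vertices was counted once from each endpoint.
    return {edge: score / 2 for edge, score in scores.items()}


# Hypothetical graph: triangle a-b-c and triangle d-e-f, joined by c-d.
graph = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
scores = edge_betweenness(graph)
bridge = max(scores, key=scores.get)
print(bridge, scores[bridge])
```

The edge between `c` and `d` gets the highest score because every path between the two triangles has to cross it, which makes `c` and `d` the bridge vertices in this toy graph.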
18. Eigenvector Centrality
• Measures the importance of a vertex by the importance of its neighbors
• In the image: a vertex whose neighbors are all important must be important
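One common way to compute this measure is power iteration: repeatedly set each vertex's score to the sum of its neighbors' scores and rescale. The sketch below assumes a small, connected, undirected graph that contains a triangle (so the iteration settles); the graph and the function name `eigenvector_centrality` are hypothetical.

```python
def eigenvector_centrality(adj, iterations=100):
    """Power iteration: a vertex scores the sum of its neighbors' scores."""
    scores = {v: 1.0 for v in adj}
    for _ in range(iterations):
        new_scores = {v: sum(scores[n] for n in adj[v]) for v in adj}
        # Rescale so the largest score is 1.0 and values stay bounded.
        top = max(new_scores.values())
        scores = {v: s / top for v, s in new_scores.items()}
    return scores


# Hypothetical graph: a, b, c, d are all connected to each other,
# e hangs off d, and f hangs off e.
graph = {
    "a": ["b", "c", "d"],
    "b": ["a", "c", "d"],
    "c": ["a", "b", "d"],
    "d": ["a", "b", "c", "e"],
    "e": ["d", "f"],
    "f": ["e"],
}
centrality = eigenvector_centrality(graph)
for vertex in sorted(centrality, key=centrality.get, reverse=True):
    print(vertex, round(centrality[vertex], 3))
```

In this toy graph `d` ranks first because its neighbors are themselves well connected, while `f` ranks last because its only neighbor is a minor vertex.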
19. Clustering Coefficient: Cascading Churn
If two people churn, what is the likelihood others will?
The two churners affect the central influencer
Finally: All contacts churn.
An individual-focused model underestimates churn by 6X.
SELECT *
FROM LocalClusteringCoefficient(
    ON Calls AS edges
    PARTITION BY caller_from
    ON caller_from AS vertices
    PARTITION BY caller_id
    targetKey('caller_to')
    directed('f')
    degreeRange('[3:]')
    accumulate('personId')
);
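For readers without access to the SQL function above, the quantity it reports can be sketched in plain Python. The local clustering coefficient of a vertex is the share of pairs of its neighbors that are themselves connected. The call graph below is invented, and the `min_degree=3` filter reflects an assumed reading of `degreeRange('[3:]')` as "degree of at least 3"; it is not taken from vendor documentation.

```python
from itertools import combinations

# Hypothetical, undirected call graph: who has spoken with whom.
calls = {
    "influencer": {"p1", "p2", "p3", "p4"},
    "p1": {"influencer", "p2"},
    "p2": {"influencer", "p1"},
    "p3": {"influencer", "p4"},
    "p4": {"influencer", "p3"},
}


def local_clustering(adj, vertex):
    """Fraction of neighbor pairs of `vertex` that are linked to each other."""
    neighbors = sorted(adj[vertex])
    k = len(neighbors)
    if k < 2:
        return 0.0
    linked = sum(1 for u, w in combinations(neighbors, 2) if w in adj[u])
    return linked / (k * (k - 1) / 2)


def coefficients(adj, min_degree=3):
    """Report only vertices with at least `min_degree` neighbors."""
    return {v: local_clustering(adj, v) for v in adj if len(adj[v]) >= min_degree}


result = coefficients(calls)
print(result)
```

Here only the central influencer passes the degree filter, and two of its six neighbor pairs are connected, giving a coefficient of one third.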
20. Great Questions for Graph Databases
• In what order did a specific set of related events happen?
• Are there patterns of events in our data that seem to be related by time?
• How far apart in a (social or physical) network are two “actors” and how strong is their relationship?
• What are the identifiable social groups and what are the general patterns of such groups?
• How important is any given “actor” in any given network and event?
• What type of messages emanate from a specific area?
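The "how far apart" question in the list above is a shortest-path query. A minimal sketch, assuming an unweighted network held as an adjacency dictionary, is a breadth-first search that counts hops; the people and the function name `hops_between` are hypothetical.

```python
from collections import deque


def hops_between(adj, start, goal):
    """Return the fewest hops from start to goal, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        vertex, distance = queue.popleft()
        for neighbor in adj.get(vertex, ()):
            if neighbor == goal:
                return distance + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, distance + 1))
    return None


# Hypothetical social network; "zed" has no connections.
network = {
    "ann": ["bob", "eve"],
    "bob": ["ann", "cara"],
    "cara": ["bob", "dev"],
    "dev": ["cara"],
    "eve": ["ann"],
    "zed": [],
}
print(hops_between(network, "ann", "dev"))
print(hops_between(network, "ann", "zed"))
```

Relationship strength is not modeled here; it would need edge weights (for example, message counts) and a weighted search.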
21. How to Identify a Graph Workload
• Workload is described with words like “network, hierarchy, tree, ancestry, structure”
• You are planning to use relational performance tricks
• Your queries will be about pathing
• You are limiting queries by their complexity
• You are looking for “non-obvious” patterns in the data
22. Healthcare Fraud
• Monitor drugs and treatments
– Excessive prescribers
– Excessive consumers
• Patients connected to
– Doctors, pharmacies, medications
• Use Graph Access
– Find outliers and investigate
• In the image: excessive relationships
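One simple way to surface "excessive prescribers" is to count how many distinct patients each doctor is connected to and flag doctors far above the typical count. The sketch below is an assumption-heavy illustration: the edge data is invented, and the rule "more than three times the median" is an arbitrary threshold chosen for the example, not a method from the presentation.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (doctor, patient) prescription edges.
prescriptions = [
    ("dr_a", "p1"), ("dr_a", "p2"),
    ("dr_b", "p2"), ("dr_b", "p3"),
    ("dr_c", "p1"), ("dr_c", "p3"), ("dr_c", "p4"),
] + [("dr_x", f"p{i}") for i in range(1, 11)]


def excessive_prescribers(edges, factor=3):
    """Flag doctors linked to more than `factor` times the median patient count."""
    patients = defaultdict(set)
    for doctor, patient in edges:
        patients[doctor].add(patient)
    counts = {doctor: len(linked) for doctor, linked in patients.items()}
    typical = median(counts.values())
    return sorted(d for d, count in counts.items() if count > factor * typical)


print(excessive_prescribers(prescriptions))
```

The same counting works in the other direction (patients linked to unusually many doctors or pharmacies) by swapping the two columns of each edge.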
23. Online Shopping
• Bring fast context to a shopping experience
• Need to recall past similar interactions
• Need probabilistic models
– Product catalog
– Shopper attributes
24. Major Insurer
• Insight into risk environment
• Risks such as
– People appearing in multiple policies and claims
– Premium leakage, e.g., underestimated mileage, undeclared drivers, false garaging
– Padded claims
• Policyholder graph with risk indicators
– Risk indicators spread in graph
• Worker’s Compensation Fraud
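Two of the ideas on this slide can be sketched over a small policyholder graph: finding people who appear in multiple policies and claims, and letting a risk indicator spread to everyone who shares a document with a flagged person. All names, documents and helper functions below are hypothetical, and one-hop spreading is just one simple choice.

```python
from collections import defaultdict

# Hypothetical (person, document) links, where a document is a policy or claim.
links = [
    ("pat", "policy_1"), ("pat", "policy_2"), ("pat", "claim_9"),
    ("lee", "policy_2"), ("sam", "policy_3"), ("kim", "claim_9"),
]


def build_index(links):
    """Index the links in both directions."""
    docs_by_person = defaultdict(set)
    people_by_doc = defaultdict(set)
    for person, doc in links:
        docs_by_person[person].add(doc)
        people_by_doc[doc].add(person)
    return docs_by_person, people_by_doc


def multi_document_people(links, minimum=2):
    """People who appear on at least `minimum` policies or claims."""
    docs_by_person, _ = build_index(links)
    return sorted(p for p, docs in docs_by_person.items() if len(docs) >= minimum)


def spread_risk(links, flagged):
    """People who share a policy or claim with any flagged person."""
    docs_by_person, people_by_doc = build_index(links)
    exposed = set()
    for person in flagged:
        for doc in docs_by_person[person]:
            exposed |= people_by_doc[doc]
    return sorted(exposed - set(flagged))


suspects = multi_document_people(links)
print(suspects)
print(spread_risk(links, suspects))
```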
25. Television, Magazine and Media
• Analyze content and consumption for personalization
• Most users don’t “log in”
• Identified anonymous users through unique cookies
– Cookies unstable; used third-party data to enrich, which needed vetting
• Determine valuable (connected) providers, audience segments
• Enabled evaluation of the accuracy of vendor data
– And cut the cost of using unreliable data
26. Cybersecurity
• Can categorize new websites and sources
• Continuously updated knowledge of classifications, risk scores and identification of new cyber threats
27. Automotive
• Identify which robotic parts were about to fail so they could replace the failing parts all at once
• Able to reconcile data to the same piece of the production line machinery
• Able to identify when a part is about to fail so they can pre-plan and avoid unnecessary breaks in the production assembly line
28. Pharmaceutical/Research
• Need to connect data from disparate parts of the company to increase research and operational efficiency, increase output, and accelerate drug research
– Allow analysts to quickly and easily access the full body of institutional knowledge
• Graph allowed bioinformaticians to more easily identify useful signals within large sets of noisy data and to answer highly specific questions
• Link targets, genes, and disease data across different parts of the company
29. Financial Services
• Anti-Money Laundering
– Identify connections
– Display the connections surrounding a specific point
– Identify which connections and situations of interest lead to productive investigations and inform work
• In the image: Company, Trading Partner, Customer, Creditor
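"Display the connections surrounding a specific point" amounts to collecting everything within a few hops of one vertex. The sketch below is a breadth-first search with a hop limit over an invented set of relationships; the entity names reuse the roles from the slide image, but the specific links and the function name `neighborhood` are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical undirected relationships between entities.
relationships = [
    ("company_1", "trading_partner_1"),
    ("company_1", "customer_1"),
    ("customer_1", "creditor_1"),
    ("creditor_1", "company_2"),
]

adjacency = defaultdict(set)
for left, right in relationships:
    adjacency[left].add(right)
    adjacency[right].add(left)


def neighborhood(adj, center, max_hops):
    """Map every vertex within `max_hops` of `center` to its hop count."""
    depth = {center: 0}
    queue = deque([center])
    while queue:
        vertex = queue.popleft()
        if depth[vertex] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in adj.get(vertex, ()):
            if neighbor not in depth:
                depth[neighbor] = depth[vertex] + 1
                queue.append(neighbor)
    return depth


around = neighborhood(adjacency, "company_1", 2)
print(sorted(around.items(), key=lambda item: (item[1], item[0])))
```

Raising `max_hops` widens the investigation; with a limit of 2, `company_2` stays out of view because it is three hops from `company_1`.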
30. Conclusion
• Graph is a fast-growing data category
• It’s all about the Use Case; Good for Graph:
– Real-time recommendations
– Fraud detection
– Network and IT operations
– Identity and access management
– Graph-based search
– Identifying relative importance
• Reimagine your data as a graph
– The whiteboard model is the physical model
• Remember PageRank
31. Graph Database Use Cases
Presented by: William McKnight
“#1 Global Influencer in Data Warehousing” OnAlytica
President, McKnight Consulting Group
An Inc. 5000 Company in 2018 and 2017
@williammcknight
www.mcknightcg.com
(214) 514-1444
Second Thursday of Every Month, at 2:00 ET