11. Connectedness and Size of Data Set
[Chart: response time versus connectedness and size of data set]
Relational and other NoSQL databases: 0 to 2 hops, 0 to 3 degrees, thousands of connections
Neo4j: tens to hundreds of hops, thousands of degrees, billions of connections
1000x advantage: “Minutes to milliseconds”
14. Multi-Database: Use Cases
• B2B SaaS:
Greatly simplified management of DB infrastructure for your customers.
• Multi-tenancy:
A single instance of Neo4j Server/Cluster may serve multiple customers/users within an organization.
• Rapid Testing/Development/Deployment:
Manage separate databases for development, testing, staging, etc. in a single infrastructure.
• Scalability:
Disjoint data is organized in physically separate structures, giving strong isolation.
• Cloud-Friendly:
Databases can be associated with cloud storage and easily detached from one server and attached to another.
18. Fabric: Distributed Graph Query
• Scale-out model
• Two ways of using it:
  • Operate over a single large, decomposed graph
  • Query across disjoint graphs, per business domain
Data Scientists
Run analysis on large, distributed databases.
Developers
Develop large-scale applications on laptops/desktops and deploy in a network of Neo4j clusters.
Enterprises
Keep data in designated geographies. Analyse graphs without replicating or moving them.
19. Cypher Queries
SQL
Cypher in Neo4j
MATCH (boss)-[:MANAGES*0..3]->(sub),
(sub)-[:MANAGES*1..3]->(report)
RETURN boss.name AS Boss,
sub.name AS Subordinate,
count(report) AS Total
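To make the pattern concrete, here is a minimal Python sketch of what the query computes, assuming a tree-shaped management hierarchy. The sample names and the helpers `within_hops` and `report_counts` are hypothetical illustrations, not part of Neo4j. The sketch only mirrors the two variable-length MANAGES patterns: 0 to 3 hops from boss to subordinate, then 1 to 3 hops from subordinate to report.

```python
from collections import defaultdict

# Hypothetical sample data: (manager, direct report) pairs forming a tree.
MANAGES = [
    ("Ann", "Bob"),
    ("Ann", "Cat"),
    ("Bob", "Dan"),
    ("Dan", "Eve"),
]


def build_children(edges):
    """Index direct reports by manager."""
    children = defaultdict(list)
    for manager, report in edges:
        children[manager].append(report)
    return children


def within_hops(children, start, min_hops, max_hops):
    """Return people reachable from start in min_hops..max_hops MANAGES steps."""
    found = []
    frontier = [start]
    for depth in range(0, max_hops + 1):
        if depth >= min_hops:
            found.extend(frontier)
        frontier = [c for person in frontier for c in children.get(person, [])]
    return found


def report_counts(edges):
    """Mimic RETURN boss.name, sub.name, count(report) for a tree hierarchy."""
    children = build_children(edges)
    people = sorted({p for edge in edges for p in edge})
    rows = []
    for boss in people:
        for sub in within_hops(children, boss, 0, 3):
            total = len(within_hops(children, sub, 1, 3))
            if total:  # pairs without any report produce no row, as in MATCH
                rows.append((boss, sub, total))
    return rows
```

With the sample data, `report_counts(MANAGES)` includes the rows `('Ann', 'Ann', 4)` and `('Ann', 'Bob', 2)`: the zero-hop case makes Ann her own "subordinate", which is why the first pattern starts at 0.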
20. Multi-graph Cypher Queries
Cypher in Neo4j 4.0
UNWIND corporate.graphIds() AS gid
CALL {
  USE corporate.graph(gid)
  MATCH (boss)-[:MANAGES*0..3]->(sub),
        (sub)-[:MANAGES*1..3]->(report)
  RETURN boss.name AS Boss,
         sub.name AS Subordinate,
         count(report) AS Total
}
RETURN Boss, Subordinate, Total ORDER BY Total
• Executes queries in parallel on multiple databases, combining or aggregating results.
• Chains queries together from multiple databases for sophisticated real-time analyses.
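The fan-out-and-combine idea can be sketched in Python. This is only an illustration under stated assumptions: `GRAPHS`, `run_subquery` and `fan_out` are hypothetical names, and the canned rows stand in for whatever each per-graph subquery would return. It does not show how Fabric itself schedules work.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-graph results, standing in for the rows that the
# subquery would return for each graph id.
GRAPHS = {
    0: [("Ann", "Bob", 2), ("Ann", "Ann", 4)],
    1: [("Raj", "Raj", 3)],
    2: [("Mia", "Lee", 1)],
}


def run_subquery(gid):
    """Stand-in for running the same subquery against one graph."""
    return GRAPHS[gid]


def fan_out(graph_ids):
    """Run the subquery on every graph concurrently, then combine the rows."""
    with ThreadPoolExecutor() as pool:
        per_graph = list(pool.map(run_subquery, graph_ids))
    combined = [row for rows in per_graph for row in rows]
    # Mirrors the final ORDER BY Total (ascending).
    return sorted(combined, key=lambda row: row[2])
```

Calling `fan_out([0, 1, 2])` returns one merged list ordered by the `Total` column, regardless of which graph each row came from.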
21. How will this help a Telco to scale?
The foundation: Causal Cluster
The evolution: Fabric
Large Data Volumes: CDRs, Network Metrics, Customer Metrics
Scaling R/W access
28. Security and Data Privacy
• Based on role-based access control for graphs
• Restrictions on what data different users can see, applied to all database interactions
• Implicit security view of the data for each user through schema-based security definitions
• Grant/deny permissions to traverse, read or write data based on node labels, relationship types, or database and property names
• Security rules are replicated across the cluster via the roles associated with users
[Figure: users Bob and Joe; clearance levels Baseline_Personnel_Security_Standard, Security_Check, Counter_Terrorism_Check, Developed_Vetting]
30. Constraints
• Call Centre Agent:
-> needs the doctor’s name
-> not allowed to read the diagnosis
• Doctor:
-> can view patient records
-> can view patient diagnoses
31. Security Config
// Doctors get wide-ranging access
GRANT ACCESS ON DATABASE healthcare TO doctor;
GRANT TRAVERSE {*} ON GRAPH healthcare TO doctor;
GRANT READ {*} ON GRAPH healthcare TO doctor;
GRANT WRITE ON GRAPH healthcare TO doctor;
// Agents get narrower access: they may traverse everything,
// but read only the name property of Doctor and Patient nodes
GRANT ACCESS ON DATABASE healthcare TO agent;
GRANT TRAVERSE {*} ON GRAPH healthcare TO agent;
GRANT READ {name} ON GRAPH healthcare NODES Doctor TO agent;
GRANT READ {name} ON GRAPH healthcare NODES Patient TO agent;
32. Call Centre Agent
MATCH (:CallcenterAgent {name: 'Alice'})
<-[:CALLED]-(p:Patient)-[:HAS_DIAGNOSIS]-(dia)
<-[:ESTABLISHED]-(d:Doctor)
RETURN p.name, d.name, dia.name;
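The effect of these grants on the agent's query can be modelled with a small Python sketch. It assumes that a property the role may not read comes back as a missing value (`None` here), and that the granted property is the same `name` property the query returns. `READ_GRANTS`, `can_read`, `read_property`, `query_row`, the sample nodes and the `Diagnosis` label are all hypothetical; this is a toy model, not Neo4j's enforcement logic.

```python
# Hypothetical model of the read grants: role -> label -> readable properties.
# "*" stands for every label or every property.
READ_GRANTS = {
    "doctor": {"*": {"*"}},
    "agent": {"Doctor": {"name"}, "Patient": {"name"}},
}


def can_read(role, label, prop):
    """Check whether the role holds a read grant covering label.prop."""
    grants = READ_GRANTS.get(role, {})
    for granted_label in (label, "*"):
        props = grants.get(granted_label, set())
        if prop in props or "*" in props:
            return True
    return False


def read_property(role, node, prop):
    """Return the property value, or None when the role may not read it."""
    if can_read(role, node["label"], prop):
        return node["props"].get(prop)
    return None


# Hypothetical sample nodes matched by the query.
patient = {"label": "Patient", "props": {"name": "Pat"}}
doctor = {"label": "Doctor", "props": {"name": "Dr. Kim"}}
diagnosis = {"label": "Diagnosis", "props": {"name": "Asthma"}}


def query_row(role):
    """Mimic RETURN p.name, d.name, dia.name for the given role."""
    return tuple(
        read_property(role, node, "name")
        for node in (patient, doctor, diagnosis)
    )
```

In this model the agent still gets a row back, because traversal is allowed, but the diagnosis column is empty: `query_row("agent")` gives `('Pat', 'Dr. Kim', None)`, while `query_row("doctor")` gives all three values.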
34. Reactive Architecture
• Flow control throughout the stack, allowing the client application to fully control the production and flow of records within a result
• Synchronous/asynchronous execution
• Based on a reactive streams library with non-blocking backpressure
• Client applications can pull or discard the whole result or N elements
• Queries can also be gracefully cancelled
• Exposed through a reactive API in Drivers v4.0
• Use cases:
  • Long queries with large result sets
  • Paged results
  • Thin/small clients
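The pull-based flow control described here can be illustrated with a short Python sketch. `RecordStream` and `numbered_records` are hypothetical names invented for this example; they are not the driver's reactive API. The sketch only shows the core idea that records are produced when the client asks for them and that the rest can be discarded.

```python
class RecordStream:
    """Hypothetical pull-based result: records are produced only on request."""

    def __init__(self, source):
        self._source = iter(source)
        self._cancelled = False
        self.produced = 0  # how many records the producer has generated

    def request(self, n):
        """Pull at most n records; the producer does no work beyond that."""
        batch = []
        while len(batch) < n and not self._cancelled:
            try:
                batch.append(next(self._source))
            except StopIteration:
                break
            self.produced += 1
        return batch

    def cancel(self):
        """Discard the rest of the result without producing it."""
        self._cancelled = True


def numbered_records(limit):
    """Lazy producer standing in for a long-running query."""
    for i in range(limit):
        yield {"row": i}
```

A thin client paging through a large result might call `stream.request(3)` once and then `stream.cancel()`. In this sketch only three records are ever generated, even if the underlying result has a million rows.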
35. Graph Data Science
A science-driven approach to gaining knowledge from the relationships and structures in data, typically to power predictions. It uses multi-disciplinary workflows that may include queries, statistics, algorithms and machine learning.
Graph Recipes & Analytics
Answers specific questions to gain insights from connections in existing/historical data. Approaches typically include global queries and algorithms, with direct use of the results.
Graph Enhanced ML & AI
Trains models (ML) on graph-structured data to emulate human, probabilistic decisions within a solution or application (AI system).
36. The Graph Data Science Library
Optimized for Analytics
• Leverage custom data structures optimized for global traversals and aggregation
• Flexibly decompose and reshape your graph for specific use cases
Algorithms for Insights
• Robust algorithms that are highly parallelized and scale to billions of nodes
• Early access to dozens of experimental implementations
Intuitive Interface
• Drastically simplified and standardized API that enables custom configurations
• Documentation, training, and examples so getting started is simple
Product Supported & Under Active Development
37. Graph Data Science
Analytics projections:
- Specialized data structure for algorithms, capable of supporting billions of nodes
- Cypher loaders for experimentation
- Quickly reshape, combine, aggregate, and deduplicate your transactional data
- Support for multiple node labels, relationship types, and properties
- Manage multiple in-memory analytics graphs for different workloads
- Memory footprint allowing large-scale use
Graph algorithms & more:
- 40+ algorithms in 5 categories: community, centrality, similarity, pathfinding, and link prediction
- Helper algorithms like graph generation, one-hot encoding, and random walk
- Early previews of new implementations in the alpha & beta namespaces
- Supported, scalable algorithms include seeding, determinism, and incremental calculations
- Estimate mode for memory requirements
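As a rough illustration of the "reshape, aggregate, and deduplicate" step of an analytics projection, the Python sketch below collapses parallel relationships into one weighted edge per node pair. The `CALLS` sample data and the `project` helper are hypothetical; the sketch shows the idea only and says nothing about the library's actual data structures.

```python
from collections import defaultdict

# Hypothetical transactional data: one relationship per phone call,
# as (caller, callee, duration in seconds).
CALLS = [
    ("alice", "bob", 60),
    ("alice", "bob", 30),
    ("bob", "carol", 45),
    ("alice", "bob", 10),
]


def project(relationships):
    """Collapse parallel relationships into one weighted edge per node pair."""
    weights = defaultdict(int)
    for source, target, seconds in relationships:
        weights[(source, target)] += seconds  # aggregate duplicates by summing
    adjacency = defaultdict(dict)
    for (source, target), total in weights.items():
        adjacency[source][target] = total
    return {node: dict(neighbours) for node, neighbours in adjacency.items()}
```

Here the three calls from alice to bob become a single edge of weight 100, which is the kind of compact structure an algorithm would then run on.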
38. Graph Data Science Algorithms
Generally Unsupervised
Graph algorithms are a subset of data science algorithms that come from network science; they enable reasoning about network structure.
• Pathfinding and Search
• Centrality (Importance)
• Community Detection
• Heuristic Link Prediction
• Similarity
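To give a feel for the centrality category, here is a textbook power-iteration PageRank in Python. It is a minimal sketch for small graphs, not the Graph Data Science Library's implementation. The function name, the default damping factor of 0.85 and the fixed iteration count are assumptions chosen for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration over a list of (source, target)."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for source, target in edges:
        out_links[source].append(target)

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1 - damping + damping * dangling) / len(nodes)
        new_rank = {n: base for n in nodes}
        for source in nodes:
            for target in out_links[source]:
                share = damping * rank[source] / len(out_links[source])
                new_rank[target] += share
        rank = new_rank
    return rank
```

For the edges `[("a", "c"), ("b", "c"), ("c", "a")]`, node `c` ends up with the highest score because two nodes point to it, and `b` the lowest because nothing points to it.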
39. Conclusions
• Neo4j provides:
  • Scalability for telcos
  • Carrier-grade high availability with Causal Cluster
  • Security features to fulfill privacy requirements
  • Graph analytics to provide data science infrastructure for telcos