SlideShare uma empresa Scribd logo
1 de 71
Optimizing Cypher
Queries in Neo4j
Wes Freeman (@wefreema)
Mark Needham (@markhneedham)
Today's schedule
• Brief overview of cypher syntax
• Graph global vs Graph local queries
• Labels and indexes
• Optimization patterns
• Profiling cypher queries
• Applying optimization patterns
Cypher Syntax
• Statement parts
o Optional: Querying part (MATCH|WHERE)
o Optional: Updating part (CREATE|MERGE)
o Optional: Returning part (WITH|RETURN)
• Parts can be chained together
Cypher Syntax - Refresher
MATCH (n:Label)-[r:LINKED]->(m)
WHERE n.prop = "..."
RETURN n, r, m
Starting points
• Graph scan (global; potentially slow)
• Label scan (usually reserved for aggregation
queries; not ideal)
• Label property index lookup (local; good!)
Introducing the football dataset
The 1.9 global scan
O(n)
n = # of nodes
START pl = node(*)
MATCH (pl)-[:played]->(stats)
WHERE pl.name = "Wayne Rooney"
RETURN stats
150ms w/ 30k nodes, 120k rels
The 2.0 global scan
MATCH (pl)-[:played]->(stats)
WHERE pl.name = "Wayne Rooney"
RETURN stats
130ms w/ 30k nodes, 120k rels
O(n)
n = # of nodes
Why is it a global scan?
• Cypher is a pattern matching language
• It doesn't discriminate unless you tell it to
o It must try to start at all nodes to find this pattern, as
specified
Introduce a label
Label your starting points
CREATE (player:Player
{name: "Wayne Rooney"} )
O(k)
k = # of nodes with that labelLabel scan
MATCH (pl:Player)-[:played]->(stats)
WHERE pl.name = "Wayne Rooney"
RETURN stats
80ms w/ 30k nodes, 120k rels (~900 :Player nodes)
Indexes don't come for free
CREATE INDEX ON :Player(name)
OR
CREATE CONSTRAINT ON pl:Player
ASSERT pl.name IS UNIQUE
O(log k)
k = # of nodes with that labelIndex lookup
MATCH (pl:Player)-[:played]->(stats)
WHERE pl.name = "Wayne Rooney"
RETURN stats
6ms w/ 30k nodes, 120k rels (~900 :Player nodes)
Optimization Patterns
• Avoid cartesian products
• Avoid patterns in the WHERE clause
• Start MATCH patterns at the lowest
cardinality and expand outward
• Separate MATCH patterns with minimal
expansion at each stage
Introducing the movie data set
Anti-pattern: Cartesian Products
MATCH (m:Movie), (p:Person)
Subtle Cartesian Products
MATCH (p:Person)-[:KNOWS]->(c)
WHERE p.name="Tom Hanks"
WITH c
MATCH (k:Keyword)
RETURN c, k
Counting Cartesian Products
MATCH (pl:Player),(t:Team),(g:Game)
RETURN COUNT(DISTINCT pl),
COUNT(DISTINCT t),
COUNT(DISTINCT g)
80000 ms w/ ~900 players, ~40 teams, ~1200 games
MATCH (pl:Player)
WITH COUNT(pl) as players
MATCH (t:Team)
WITH COUNT(t) as teams, players
MATCH (g:Game)
RETURN COUNT(g) as games, teams, players8ms w/
~900 players, ~40 teams, ~1200 games
Better Counting
Directions on patterns
MATCH (p:Person)-[:ACTED_IN]-(m)
WHERE p.name = "Tom Hanks"
RETURN m
Parameterize your queries
MATCH (p:Person)-[:ACTED_IN]-(m)
WHERE p.name = {name}
RETURN m
Fast predicates first
Bad:
MATCH (t:Team)-[:played_in]->(g)
WHERE NOT (t)-[:home_team]->(g)
AND g.away_goals > g.home_goals
RETURN t, COUNT(g)
Better:
MATCH (t:Team)-[:played_in]->(g)
WHERE g.away_goals > g.home_goals
AND NOT (t)-[:home_team]->()
RETURN t, COUNT(g)
Fast predicates first
Patterns in WHERE clauses
• Keep them in the MATCH
• The only pattern that needs to be in a
WHERE clause is a NOT
MERGE and CONSTRAINTs
• MERGE is MATCH or CREATE
• MERGE can take advantage of unique
constraints and indexes
MERGE (without index)
MERGE (g:Game
{date:1290257100,
time: 1245,
home_goals: 2,
away_goals: 3,
match_id: 292846,
attendance: 60102})
RETURN g
188 ms w/ ~400 games
Adding an index
CREATE INDEX ON :Game(match_id)
MERGE (with index)
MERGE (g:Game
{date:1290257100,
time: 1245,
home_goals: 2,
away_goals: 3,
match_id: 292846,
attendance: 60102})
RETURN g
6 ms w/ ~400 games
Alternative MERGE approach
MERGE (g:Game { match_id: 292846 })
ON CREATE
SET g.date = 1290257100
SET g.time = 1245
SET g.home_goals = 2
SET g.away_goals = 3
SET g.attendance = 60102
RETURN g
Profiling queries
• Use the PROFILE keyword in front of the
query
o from webadmin or shell - won't work in browser
• Look for db_hits and rows
• Ignore everything else (for now!)
Reviewing the football dataset
Football Optimization
MATCH (game)<-[:contains_match]-(season:Season),
(team)<-[:away_team]-(game),
(stats)-[:in]->(game),
(team)<-[:for]-(stats)<-[:played]-(player)
WHERE season.name = "2012-2013"
RETURN player.name,
COLLECT(DISTINCT team.name),
SUM(stats.goals) as goals
ORDER BY goals DESC
LIMIT 103137 ms w/ ~900 players, ~20 teams, ~400 games
Football Optimization
==> ColumnFilter(symKeys=["player.name", " INTERNAL_AGGREGATEe91b055b-a943-4ddd-9fe8-e746407c504a", "
INTERNAL_AGGREGATE240cfcd2-24d9-48a2-8ca9-fb0286f3d323"], returnItemNames=["player.name", "COLLECT(DISTINCT
team.name)", "goals"], _rows=10, _db_hits=0)
==> Top(orderBy=["SortItem(Cached( INTERNAL_AGGREGATE240cfcd2-24d9-48a2-8ca9-fb0286f3d323 of type Number),false)"],
limit="Literal(10)", _rows=10, _db_hits=0)
==> EagerAggregation(keys=["Cached(player.name of type Any)"], aggregates=["( INTERNAL_AGGREGATEe91b055b-a943-4ddd-9fe8-
e746407c504a,Distinct(Collect(Property(team,name(0))),Property(team,name(0))))", "( INTERNAL_AGGREGATE240cfcd2-24d9-48a2-
8ca9-fb0286f3d323,Sum(Property(stats,goals(13))))"], _rows=503, _db_hits=10899)
==> Extract(symKeys=["stats", " UNNAMED12", " UNNAMED108", "season", " UNNAMED55", "player", "team", " UNNAMED124", "
UNNAMED85", "game"], exprKeys=["player.name"], _rows=5192, _db_hits=5192)
==> PatternMatch(g="(player)-[' UNNAMED124']-(stats)", _rows=5192, _db_hits=0)
==> Filter(pred="Property(season,name(0)) == Literal(2012-2013)", _rows=5192, _db_hits=15542)
==> TraversalMatcher(trail="(season)-[ UNNAMED12:contains_match WHERE true AND true]->(game)<-[ UNNAMED85:in WHERE
true AND true]-(stats)-[ UNNAMED108:for WHERE true AND true]->(team)<-[ UNNAMED55:away_team WHERE true AND true]-
(game)", _rows=15542, _db_hits=1620462)
Break out the match statements
MATCH (game)<-[:contains_match]-(season:Season)
MATCH (team)<-[:away_team]-(game)
MATCH (stats)-[:in]->(game)
MATCH (team)<-[:for]-(stats)<-[:played]-(player)
WHERE season.name = "2012-2013"
RETURN player.name,
COLLECT(DISTINCT team.name),
SUM(stats.goals) as goals
ORDER BY goals DESC
LIMIT 10200 ms w/ ~900 players, ~20 teams, ~400 games
Start small
• Smallest cardinality label first
• Smallest intermediate result set first
Exploring cardinalities
MATCH (game)<-[:contains_match]-(season:Season)
RETURN COUNT(DISTINCT game), COUNT(DISTINCT season)
1140 games, 3 seasons
MATCH (team)<-[:away_team]-(game:Game)
RETURN COUNT(DISTINCT team), COUNT(DISTINCT game)
25 teams, 1140 games
Exploring cardinalities
MATCH (stats)-[:in]->(game:Game)
RETURN COUNT(DISTINCT stats), COUNT(DISTINCT game)
31117 stats, 1140 games
MATCH (stats)<-[:played]-(player:Player)
RETURN COUNT(DISTINCT stats), COUNT(DISTINCT player)
31117 stats, 880 players
Look for teams first
MATCH (team)<-[:away_team]-(game:Game)
MATCH (game)<-[:contains_match]-(season)
WHERE season.name = "2012-2013"
MATCH (stats)-[:in]->(game)
MATCH (team)<-[:for]-(stats)<-[:played]-(player)
RETURN player.name,
COLLECT(DISTINCT team.name),
SUM(stats.goals) as goals
ORDER BY goals DESC
LIMIT 10162 ms w/ ~900 players, ~20 teams, ~400 games
==> ColumnFilter(symKeys=["player.name", " INTERNAL_AGGREGATEbb08f36b-a70d-46b3-9297-b0c7ec85c969", "
INTERNAL_AGGREGATE199af213-e3bd-400f-aba9-8ca2a9e153c5"], returnItemNames=["player.name", "COLLECT(DISTINCT
team.name)", "goals"], _rows=10, _db_hits=0)
==> Top(orderBy=["SortItem(Cached( INTERNAL_AGGREGATE199af213-e3bd-400f-aba9-8ca2a9e153c5 of type Number),false)"],
limit="Literal(10)", _rows=10, _db_hits=0)
==> EagerAggregation(keys=["Cached(player.name of type Any)"], aggregates=["( INTERNAL_AGGREGATEbb08f36b-a70d-46b3-9297-
b0c7ec85c969,Distinct(Collect(Property(team,name(0))),Property(team,name(0))))", "( INTERNAL_AGGREGATE199af213-e3bd-400f-
aba9-8ca2a9e153c5,Sum(Property(stats,goals(13))))"], _rows=503, _db_hits=10899)
==> Extract(symKeys=["stats", " UNNAMED12", " UNNAMED168", "season", " UNNAMED125", "player", "team", " UNNAMED152", "
UNNAMED51", "game"], exprKeys=["player.name"], _rows=5192, _db_hits=5192)
==> PatternMatch(g="(stats)-[' UNNAMED152']-(team),(player)-[' UNNAMED168']-(stats)", _rows=5192, _db_hits=0)
==> PatternMatch(g="(stats)-[' UNNAMED125']-(game)", _rows=10394, _db_hits=0)
==> Filter(pred="Property(season,name(0)) == Literal(2012-2013)", _rows=380, _db_hits=380)
==> PatternMatch(g="(season)-[' UNNAMED51']-(game)", _rows=380, _db_hits=1140)
==> TraversalMatcher(trail="(game)-[ UNNAMED12:away_team WHERE true AND true]->(team)", _rows=1140,
_db_hits=1140)
Look for teams first
Filter games a bit earlier
MATCH (game)<-[:contains_match]-(season:Season)
WHERE season.name = "2012-2013"
MATCH (team)<-[:away_team]-(game)
MATCH (stats)-[:in]->(game)
MATCH (team)<-[:for]-(stats)<-[:played]-(player)
RETURN player.name,
COLLECT(DISTINCT team.name),
SUM(stats.goals) as goals
ORDER BY goals DESC
LIMIT 10148 ms w/ ~900 players, ~20 teams, ~400 games
Filter out stats with no goals
MATCH (game)<-[:contains_match]-(season:Season)
WHERE season.name = "2012-2013"
MATCH (team)<-[:away_team]-(game)
MATCH (stats)-[:in]->(game)WHERE stats.goals > 0
MATCH (team)<-[:for]-(stats)<-[:played]-(player)
RETURN player.name,
COLLECT(DISTINCT team.name),
SUM(stats.goals) as goals
ORDER BY goals DESC
LIMIT 10
59 ms w/ ~900 players, ~20 teams, ~400 games
Movie query optimization
MATCH (movie:Movie {title: {title} })
MATCH (genre)<-[:HAS_GENRE]-(movie)
MATCH (director)-[:DIRECTED]->(movie)
MATCH (actor)-[:ACTED_IN]->(movie)
MATCH (writer)-[:WRITER_OF]->(movie)
MATCH (actor)-[:ACTED_IN]->(actormovies)
MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie)
WITH DISTINCT movies as related, count(DISTINCT keyword) as weight,
count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as
genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name)
as writers
ORDER BY weight DESC, actormoviesweight DESC
WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie,
collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres,
directors, writers
MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies)
WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors,
genres, directors, writers
ORDER BY keyword_weight
RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
Movie query optimization
MATCH (movie:Movie {title: 'The Matrix' })
MATCH (genre)<-[:HAS_GENRE]-(movie)
MATCH (director)-[:DIRECTED]->(movie)
MATCH (actor)-[:ACTED_IN]->(movie)
MATCH (writer)-[:WRITER_OF]->(movie)
MATCH (actor)-[:ACTED_IN]->(actormovies)
MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie)
WITH DISTINCT movies as related, count(DISTINCT keyword) as weight,
count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as
genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name)
as writers
ORDER BY weight DESC, actormoviesweight DESC
WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie,
collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres,
directors, writers
MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies)
WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors,
genres, directors, writers
ORDER BY keyword_weight
RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
Movie query optimization
MATCH (movie:Movie {title: 'The Matrix' })MATCH (genre)<-[:HAS_GENRE]-
(movie)
MATCH (director)-[:DIRECTED]->(movie)
MATCH (actor)-[:ACTED_IN]->(movie)
MATCH (writer)-[:WRITER_OF]->(movie)
MATCH (actor)-[:ACTED_IN]->(actormovies)
MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie)
WITH DISTINCT movies as related, count(DISTINCT keyword) as weight,
count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as
genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name)
as writers
ORDER BY weight DESC, actormoviesweight DESC
WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie,
collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres,
directors, writers
MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies)
WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors,
genres, directors, writers
ORDER BY keyword_weight
RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
Movie query optimization
MATCH (movie:Movie {title: 'The Matrix' })
MATCH (genre)<-[:HAS_GENRE]-(movie)MATCH (director)-[:DIRECTED]-
>(movie)
MATCH (actor)-[:ACTED_IN]->(movie)
MATCH (writer)-[:WRITER_OF]->(movie)
MATCH (actor)-[:ACTED_IN]->(actormovies)
MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie)
WITH DISTINCT movies as related, count(DISTINCT keyword) as weight,
count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as
genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name)
as writers
ORDER BY weight DESC, actormoviesweight DESC
WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie,
collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres,
directors, writers
MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies)
WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors,
genres, directors, writers
ORDER BY keyword_weight
RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
Movie query optimization
MATCH (movie:Movie {title: 'The Matrix' })
MATCH (genre)<-[:HAS_GENRE]-(movie)
MATCH (director)-[:DIRECTED]->(movie)MATCH (actor)-[:ACTED_IN]-
>(movie)
MATCH (writer)-[:WRITER_OF]->(movie)
MATCH (actor)-[:ACTED_IN]->(actormovies)
MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie)
WITH DISTINCT movies as related, count(DISTINCT keyword) as weight,
count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as
genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name)
as writers
ORDER BY weight DESC, actormoviesweight DESC
WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie,
collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres,
directors, writers
MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies)
WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors,
genres, directors, writers
ORDER BY keyword_weight
RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
Movie query optimization
MATCH (movie:Movie {title: 'The Matrix' })
MATCH (genre)<-[:HAS_GENRE]-(movie)
MATCH (director)-[:DIRECTED]->(movie)
MATCH (actor)-[:ACTED_IN]->(movie)MATCH (writer)-[:WRITER_OF]-
>(movie)
MATCH (actor)-[:ACTED_IN]->(actormovies)
MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie)
WITH DISTINCT movies as related, count(DISTINCT keyword) as weight,
count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as
genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name)
as writers
ORDER BY weight DESC, actormoviesweight DESC
WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie,
collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres,
directors, writers
MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies)
WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors,
genres, directors, writers
ORDER BY keyword_weight
RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
Movie query optimization
MATCH (movie:Movie {title: 'The Matrix' })
MATCH (genre)<-[:HAS_GENRE]-(movie)
MATCH (director)-[:DIRECTED]->(movie)
MATCH (actor)-[:ACTED_IN]->(movie)
MATCH (writer)-[:WRITER_OF]->(movie)MATCH (actor)-[:ACTED_IN]-
>(actormovies)
MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie)
WITH DISTINCT movies as related, count(DISTINCT keyword) as weight,
count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as
genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name)
as writers
ORDER BY weight DESC, actormoviesweight DESC
WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie,
collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres,
directors, writers
MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies)
WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors,
genres, directors, writers
ORDER BY keyword_weight
RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
Movie query optimization
MATCH (movie:Movie {title: 'The Matrix' })
MATCH (genre)<-[:HAS_GENRE]-(movie)
MATCH (director)-[:DIRECTED]->(movie)
MATCH (actor)-[:ACTED_IN]->(movie)
MATCH (writer)-[:WRITER_OF]->(movie)
MATCH (actor)-[:ACTED_IN]->(actormovies)MATCH
(movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie)
WITH DISTINCT movies as related, count(DISTINCT keyword) as weight,
count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as
genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name)
as writers
ORDER BY weight DESC, actormoviesweight DESC
WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie,
collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres,
directors, writers
MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies)
WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors,
genres, directors, writers
ORDER BY keyword_weight
RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
Movie query optimization
MATCH (movie:Movie {title: 'The Matrix' })
MATCH (genre)<-[:HAS_GENRE]-(movie)
MATCH (director)-[:DIRECTED]->(movie)
MATCH (actor)-[:ACTED_IN]->(movie)
MATCH (writer)-[:WRITER_OF]->(movie)
MATCH (actor)-[:ACTED_IN]->(actormovies)
MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie)WITH DISTINCT movies as related,
count(DISTINCT keyword) as weight,
count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as
genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name)
as writersORDER BY weight DESC, actormoviesweight DESC
WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie,
collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres,
directors, writers
MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies)
WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors,
genres, directors, writers
ORDER BY keyword_weight
RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
Movie query optimization
MATCH (movie:Movie {title: 'The Matrix' })
MATCH (genre)<-[:HAS_GENRE]-(movie)
MATCH (director)-[:DIRECTED]->(movie)
MATCH (actor)-[:ACTED_IN]->(movie)
MATCH (writer)-[:WRITER_OF]->(movie)
MATCH (actor)-[:ACTED_IN]->(actormovies)
MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie)
WITH DISTINCT movies as related, count(DISTINCT keyword) as weight,
count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as
genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name)
as writers
ORDER BY weight DESC, actormoviesweight DESCWITH collect(DISTINCT {name: actor.name, weight:
actormoviesweight}) as actors,
movie, collect(DISTINCT {related: {title: related.title}, weight: weight}) as
related, genres, directors, writers
MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies)
WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors,
genres, directors, writers
ORDER BY keyword_weight
RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
Movie query optimization
MATCH (movie:Movie {title: 'The Matrix' })
MATCH (genre)<-[:HAS_GENRE]-(movie)
MATCH (director)-[:DIRECTED]->(movie)
MATCH (actor)-[:ACTED_IN]->(movie)
MATCH (writer)-[:WRITER_OF]->(movie)
MATCH (actor)-[:ACTED_IN]->(actormovies)
MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie)
WITH DISTINCT movies as related, count(DISTINCT keyword) as weight,
count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as
genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name)
as writers
ORDER BY weight DESC, actormoviesweight DESC
WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors,
movie, collect(DISTINCT {related: {title: related.title}, weight: weight}) as
related, genres, directors, writersMATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-
[:HAS_KEYWORD]-(movies)
WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors,
genres, directors, writers
ORDER BY keyword_weight
RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
Movie query optimization
MATCH (movie:Movie {title: 'The Matrix' })
MATCH (genre)<-[:HAS_GENRE]-(movie)
MATCH (director)-[:DIRECTED]->(movie)
MATCH (actor)-[:ACTED_IN]->(movie)
MATCH (writer)-[:WRITER_OF]->(movie)
MATCH (actor)-[:ACTED_IN]->(actormovies)
MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie)
WITH DISTINCT movies as related, count(DISTINCT keyword) as weight,
count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as
genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name)
as writers
ORDER BY weight DESC, actormoviesweight DESC
WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors,
movie, collect(DISTINCT {related: {title: related.title}, weight: weight}) as
related, genres, directors, writers
MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies)WITH keyword.name as keyword,
count(movies) as keyword_weight, movie, related,
actors, genres, directors, writers
ORDER BY keyword_weight
RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
Movie query optimization
MATCH (movie:Movie {title: 'The Matrix' })<-[:ACTED_IN]-(actor)
WITH movie, actor, length((actor)-[:ACTED_IN]->()) as actormoviesweight // 1 row per actor
ORDER BY actormoviesweight DESC
WITH movie, collect({name: actor.name, weight: actormoviesweight}) as actors // 1 row
MATCH (movie)-[:HAS_GENRE]->(genre)
WITH movie, actors, collect(genre) as genres // 1 row
MATCH (director)-[:DIRECTED]->(movie)
WITH movie, actors, genres, collect(director.name) as directors // 1 row
MATCH (writer)-[:WRITER_OF]->(movie)
WITH movie, actors, genres, directors, collect(writer.name) as writers // 1 row
MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie)
WITH DISTINCT movies as related, count(DISTINCT keyword.name) as keywords, movie, genres,
directors, actors, writers // 1 row per related movie
ORDER BY keywords DESC
WITH collect(DISTINCT { related: { title: related.title }, weight: keywords }) as related,
movie, actors, genres, directors, writers // 1 row
MATCH (movie)-[:HAS_KEYWORD]->(keyword)
RETURN collect(keyword.name) as keywords, related, movie, actors, genres, directors, writers
10x faster
Movie query optimization
MATCH (movie:Movie {title: 'The Matrix' })<-[:ACTED_IN]-(actor)
WITH movie, actor, length((actor)-[:ACTED_IN]->()) as actormoviesweight // 1 row per actor
ORDER BY actormoviesweight DESC
WITH movie, collect({name: actor.name, weight: actormoviesweight}) as actors // 1 row
MATCH (movie)-[:HAS_GENRE]->(genre)
WITH movie, actors, collect(genre) as genres // 1 row
MATCH (director)-[:DIRECTED]->(movie)
WITH movie, actors, genres, collect(director.name) as directors // 1 row
MATCH (writer)-[:WRITER_OF]->(movie)
WITH movie, actors, genres, directors, collect(writer.name) as writers // 1 row
MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie)
WITH DISTINCT movies as related, count(DISTINCT keyword.name) as keywords, movie, genres,
directors, actors, writers // 1 row per related movie
ORDER BY keywords DESC
WITH collect(DISTINCT { related: { title: related.title }, weight: keywords }) as related,
movie, actors, genres, directors, writers // 1 row
MATCH (movie)-[:HAS_KEYWORD]->(keyword)
RETURN collect(keyword.name) as keywords, related, movie, actors, genres, directors, writers
10x faster
Movie query optimization
MATCH (movie:Movie {title: 'The Matrix' })<-[:ACTED_IN]-(actor)WITH movie, actor, length((actor)-[:ACTED_IN]-
>()) as actormoviesweight
ORDER BY actormoviesweight DESC // 1 row per actor
WITH movie, collect({name: actor.name, weight: actormoviesweight}) as actors // 1 row
MATCH (movie)-[:HAS_GENRE]->(genre)
WITH movie, actors, collect(genre) as genres // 1 row
MATCH (director)-[:DIRECTED]->(movie)
WITH movie, actors, genres, collect(director.name) as directors // 1 row
MATCH (writer)-[:WRITER_OF]->(movie)
WITH movie, actors, genres, directors, collect(writer.name) as writers // 1 row
MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie)
WITH DISTINCT movies as related, count(DISTINCT keyword.name) as keywords, movie, genres,
directors, actors, writers // 1 row per related movie
ORDER BY keywords DESC
WITH collect(DISTINCT { related: { title: related.title }, weight: keywords }) as related,
movie, actors, genres, directors, writers // 1 row
MATCH (movie)-[:HAS_KEYWORD]->(keyword)
RETURN collect(keyword.name) as keywords, related, movie, actors, genres, directors, writers
10x faster
Movie query optimization
MATCH (movie:Movie {title: 'The Matrix' })<-[:ACTED_IN]-(actor)
WITH movie, actor, length((actor)-[:ACTED_IN]->()) as actormoviesweight
ORDER BY actormoviesweight DESC // 1 row per actorWITH movie, collect({name: actor.name, weight:
actormoviesweight}) as actors // 1 row
MATCH (movie)-[:HAS_GENRE]->(genre)
WITH movie, actors, collect(genre) as genres // 1 row
MATCH (director)-[:DIRECTED]->(movie)
WITH movie, actors, genres, collect(director.name) as directors // 1 row
MATCH (writer)-[:WRITER_OF]->(movie)
WITH movie, actors, genres, directors, collect(writer.name) as writers // 1 row
MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie)
WITH DISTINCT movies as related, count(DISTINCT keyword.name) as keywords, movie, genres,
directors, actors, writers // 1 row per related movie
ORDER BY keywords DESC
WITH collect(DISTINCT { related: { title: related.title }, weight: keywords }) as related,
movie, actors, genres, directors, writers // 1 row
MATCH (movie)-[:HAS_KEYWORD]->(keyword)
RETURN collect(keyword.name) as keywords, related, movie, actors, genres, directors, writers
10x faster
Movie query optimization
MATCH (movie:Movie {title: 'The Matrix' })<-[:ACTED_IN]-(actor)
WITH movie, actor, length((actor)-[:ACTED_IN]->()) as actormoviesweight
ORDER BY actormoviesweight DESC // 1 row per actor
WITH movie, collect({name: actor.name, weight: actormoviesweight}) as actors // 1 row MATCH (movie)-[:HAS_GENRE]-
>(genre)
WITH movie, actors, collect(genre) as genres // 1 row MATCH (director)-[:DIRECTED]->(movie)
WITH movie, actors, genres, collect(director.name) as directors // 1 row
MATCH (writer)-[:WRITER_OF]->(movie)
WITH movie, actors, genres, directors, collect(writer.name) as writers // 1 row
MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie)
WITH DISTINCT movies as related, count(DISTINCT keyword.name) as keywords, movie, genres,
directors, actors, writers // 1 row per related movie
ORDER BY keywords DESC
WITH collect(DISTINCT { related: { title: related.title }, weight: keywords }) as related,
movie, actors, genres, directors, writers // 1 row
MATCH (movie)-[:HAS_KEYWORD]->(keyword)
RETURN collect(keyword.name) as keywords, related, movie, actors, genres, directors, writers
10x faster
Movie query optimization
MATCH (movie:Movie {title: 'The Matrix' })<-[:ACTED_IN]-(actor)
WITH movie, actor, length((actor)-[:ACTED_IN]->()) as actormoviesweight
ORDER BY actormoviesweight DESC // 1 row per actor
WITH movie, collect({name: actor.name, weight: actormoviesweight}) as actors // 1 row
MATCH (movie)-[:HAS_GENRE]->(genre)
WITH movie, actors, collect(genre) as genres // 1 row
MATCH (director)-[:DIRECTED]->(movie)
WITH movie, actors, genres, collect(director.name) as directors // 1 row
MATCH (writer)-[:WRITER_OF]->(movie)
WITH movie, actors, genres, directors, collect(writer.name) as writers // 1 row MATCH (movie)-[:HAS_KEYWORD]-
>(keyword)<-[:HAS_KEYWORD]-(movies:Movie)
WITH DISTINCT movies as related, count(DISTINCT keyword.name) as keywords, movie,
genres, directors, actors, writers // 1 row per related movieORDER BY keywords DESC
WITH collect(DISTINCT { related: { title: related.title }, weight: keywords }) as related,
movie, actors, genres, directors, writers // 1 row
MATCH (movie)-[:HAS_KEYWORD]->(keyword)
RETURN collect(keyword.name) as keywords, related, movie, actors, genres, directors, writers
10x faster
Movie query optimization
MATCH (movie:Movie {title: 'The Matrix' })<-[:ACTED_IN]-(actor)
WITH movie, actor, length((actor)-[:ACTED_IN]->()) as actormoviesweight
ORDER BY actormoviesweight DESC // 1 row per actor
WITH movie, collect({name: actor.name, weight: actormoviesweight}) as actors // 1 row
MATCH (movie)-[:HAS_GENRE]->(genre)
WITH movie, actors, collect(genre) as genres // 1 row
MATCH (director)-[:DIRECTED]->(movie)
WITH movie, actors, genres, collect(director.name) as directors // 1 row
MATCH (writer)-[:WRITER_OF]->(movie)
WITH movie, actors, genres, directors, collect(writer.name) as writers // 1 row
MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie)
WITH DISTINCT movies as related, count(DISTINCT keyword.name) as keywords, movie,
genres, directors, actors, writers // 1 row per related movie
ORDER BY keywords DESCWITH collect(DISTINCT { related: { title: related.title }, weight: keywords }) as
related, movie, actors, genres, directors, writers // 1 row
MATCH (movie)-[:HAS_KEYWORD]->(keyword)
RETURN collect(keyword.name) as keywords, related, movie, actors, genres, directors, writers
10x faster
Movie query optimization
MATCH (movie:Movie {title: 'The Matrix' })<-[:ACTED_IN]-(actor)
WITH movie, actor, length((actor)-[:ACTED_IN]->()) as actormoviesweight
ORDER BY actormoviesweight DESC // 1 row per actor
WITH movie, collect({name: actor.name, weight: actormoviesweight}) as actors // 1 row
MATCH (movie)-[:HAS_GENRE]->(genre)
WITH movie, actors, collect(genre) as genres // 1 row
MATCH (director)-[:DIRECTED]->(movie)
WITH movie, actors, genres, collect(director.name) as directors // 1 row
MATCH (writer)-[:WRITER_OF]->(movie)
WITH movie, actors, genres, directors, collect(writer.name) as writers // 1 row
MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie)
WITH DISTINCT movies as related, count(DISTINCT keyword.name) as keywords, movie,
genres, directors, actors, writers // 1 row per related movie
ORDER BY keywords DESC
WITH collect(DISTINCT { related: { title: related.title }, weight: keywords }) as
related, movie, actors, genres, directors, writers // 1 rowMATCH (movie)-[:HAS_KEYWORD]->(keyword)
RETURN collect(keyword.name) as keywords, related, movie, actors, genres, directors,
writers // 1 row
10x faster
Design for Queryability
Model
Design for Queryability
Query
Design for Queryability
Model
Making the implicit explicit
• When you have implicit relationships in the
graph you can sometimes get better query
performance by modeling the relationship
explicitly
Making the implicit explicit
Refactor property to node
Bad:
MATCH (g:Game)
WHERE
g.date > 1343779200
AND g.date < 1369094400
RETURN g
Good:
MATCH (s:Season)-[:contains]->(g)
WHERE season.name = "2012-2013"
RETURN g
Refactor property to node
Conclusion
• Avoid the global scan
• Add indexes / unique constraints
• Split up MATCH statements
• Measure, measure, measure, tweak, repeat
• Soon Cypher will do a lot of this for you!
Bonus tip
• Use transactions/transactional cypher
endpoint
Q & A
• If you have them send them in

Mais conteúdo relacionado

Mais procurados

The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
Databricks
 
The openCypher Project - An Open Graph Query Language
The openCypher Project - An Open Graph Query LanguageThe openCypher Project - An Open Graph Query Language
The openCypher Project - An Open Graph Query Language
Neo4j
 
Training Series: Build APIs with Neo4j GraphQL Library
Training Series: Build APIs with Neo4j GraphQL LibraryTraining Series: Build APIs with Neo4j GraphQL Library
Training Series: Build APIs with Neo4j GraphQL Library
Neo4j
 

Mais procurados (20)

Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 
How to Use JSON in MySQL Wrong
How to Use JSON in MySQL WrongHow to Use JSON in MySQL Wrong
How to Use JSON in MySQL Wrong
 
ProxySQL - High Performance and HA Proxy for MySQL
ProxySQL - High Performance and HA Proxy for MySQLProxySQL - High Performance and HA Proxy for MySQL
ProxySQL - High Performance and HA Proxy for MySQL
 
Cassandra Data Modeling - Practical Considerations @ Netflix
Cassandra Data Modeling - Practical Considerations @ NetflixCassandra Data Modeling - Practical Considerations @ Netflix
Cassandra Data Modeling - Practical Considerations @ Netflix
 
Intro to Neo4j and Graph Databases
Intro to Neo4j and Graph DatabasesIntro to Neo4j and Graph Databases
Intro to Neo4j and Graph Databases
 
Building Applications with a Graph Database
Building Applications with a Graph DatabaseBuilding Applications with a Graph Database
Building Applications with a Graph Database
 
Neo4j in Depth
Neo4j in DepthNeo4j in Depth
Neo4j in Depth
 
Storing time series data with Apache Cassandra
Storing time series data with Apache CassandraStoring time series data with Apache Cassandra
Storing time series data with Apache Cassandra
 
How to Performance-Tune Apache Spark Applications in Large Clusters
How to Performance-Tune Apache Spark Applications in Large ClustersHow to Performance-Tune Apache Spark Applications in Large Clusters
How to Performance-Tune Apache Spark Applications in Large Clusters
 
Neo4j Fundamentals
Neo4j FundamentalsNeo4j Fundamentals
Neo4j Fundamentals
 
Working With a Real-World Dataset in Neo4j: Import and Modeling
Working With a Real-World Dataset in Neo4j: Import and ModelingWorking With a Real-World Dataset in Neo4j: Import and Modeling
Working With a Real-World Dataset in Neo4j: Import and Modeling
 
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
 
Altinity Quickstart for ClickHouse
Altinity Quickstart for ClickHouseAltinity Quickstart for ClickHouse
Altinity Quickstart for ClickHouse
 
Inside PostgreSQL Shared Memory
Inside PostgreSQL Shared MemoryInside PostgreSQL Shared Memory
Inside PostgreSQL Shared Memory
 
Real-time Data Ingestion from Kafka to ClickHouse with Deterministic Re-tries...
Real-time Data Ingestion from Kafka to ClickHouse with Deterministic Re-tries...Real-time Data Ingestion from Kafka to ClickHouse with Deterministic Re-tries...
Real-time Data Ingestion from Kafka to ClickHouse with Deterministic Re-tries...
 
Solving PostgreSQL wicked problems
Solving PostgreSQL wicked problemsSolving PostgreSQL wicked problems
Solving PostgreSQL wicked problems
 
The openCypher Project - An Open Graph Query Language
The openCypher Project - An Open Graph Query LanguageThe openCypher Project - An Open Graph Query Language
The openCypher Project - An Open Graph Query Language
 
Clickhouse at Cloudflare. By Marek Vavrusa
Clickhouse at Cloudflare. By Marek VavrusaClickhouse at Cloudflare. By Marek Vavrusa
Clickhouse at Cloudflare. By Marek Vavrusa
 
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEOClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
 
Training Series: Build APIs with Neo4j GraphQL Library
Training Series: Build APIs with Neo4j GraphQL LibraryTraining Series: Build APIs with Neo4j GraphQL Library
Training Series: Build APIs with Neo4j GraphQL Library
 

Destaque

Apache HBase Application Archetypes
Apache HBase Application ArchetypesApache HBase Application Archetypes
Apache HBase Application Archetypes
Cloudera, Inc.
 

Destaque (20)

Data Modeling with Neo4j
Data Modeling with Neo4jData Modeling with Neo4j
Data Modeling with Neo4j
 
Neo4j tms
Neo4j tmsNeo4j tms
Neo4j tms
 
Graph Databases
Graph DatabasesGraph Databases
Graph Databases
 
OrientDB
OrientDBOrientDB
OrientDB
 
OrientDB - the 2nd generation of (Multi-Model) NoSQL
OrientDB - the 2nd generation  of  (Multi-Model) NoSQLOrientDB - the 2nd generation  of  (Multi-Model) NoSQL
OrientDB - the 2nd generation of (Multi-Model) NoSQL
 
Intro to Cypher
Intro to CypherIntro to Cypher
Intro to Cypher
 
Downloading the internet with Python + Scrapy
Downloading the internet with Python + ScrapyDownloading the internet with Python + Scrapy
Downloading the internet with Python + Scrapy
 
Apache HBase Application Archetypes
Apache HBase Application ArchetypesApache HBase Application Archetypes
Apache HBase Application Archetypes
 
Building a Graph-based Analytics Platform
Building a Graph-based Analytics PlatformBuilding a Graph-based Analytics Platform
Building a Graph-based Analytics Platform
 
OrientDB introduction - NoSQL
OrientDB introduction - NoSQLOrientDB introduction - NoSQL
OrientDB introduction - NoSQL
 
OrientDB vs Neo4j - and an introduction to NoSQL databases
OrientDB vs Neo4j - and an introduction to NoSQL databasesOrientDB vs Neo4j - and an introduction to NoSQL databases
OrientDB vs Neo4j - and an introduction to NoSQL databases
 
Neo4j -[:LOVES]-> Cypher
Neo4j -[:LOVES]-> CypherNeo4j -[:LOVES]-> Cypher
Neo4j -[:LOVES]-> Cypher
 
Building Cloud Native Architectures with Spring
Building Cloud Native Architectures with SpringBuilding Cloud Native Architectures with Spring
Building Cloud Native Architectures with Spring
 
OrientDB Distributed Architecture v2.0
OrientDB Distributed Architecture v2.0OrientDB Distributed Architecture v2.0
OrientDB Distributed Architecture v2.0
 
Big Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache SparkBig Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache Spark
 
Bootstrapping Recommendations with Neo4j
Bootstrapping Recommendations with Neo4jBootstrapping Recommendations with Neo4j
Bootstrapping Recommendations with Neo4j
 
Intro to Neo4j presentation
Intro to Neo4j presentationIntro to Neo4j presentation
Intro to Neo4j presentation
 
GraphTalks Rome - The Italian Business Graph
GraphTalks Rome - The Italian Business GraphGraphTalks Rome - The Italian Business Graph
GraphTalks Rome - The Italian Business Graph
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databases
 
Webinar: Stop Complex Fraud in its Tracks with Neo4j
Webinar: Stop Complex Fraud in its Tracks with Neo4jWebinar: Stop Complex Fraud in its Tracks with Neo4j
Webinar: Stop Complex Fraud in its Tracks with Neo4j
 

Semelhante a Optimizing Cypher Queries in Neo4j

Football graph - Neo4j and the Premier League
Football graph - Neo4j and the Premier LeagueFootball graph - Neo4j and the Premier League
Football graph - Neo4j and the Premier League
Mark Needham
 
Tips and Tricks for Avoiding Common Query Pitfalls
Tips and Tricks for Avoiding Common Query PitfallsTips and Tricks for Avoiding Common Query Pitfalls
Tips and Tricks for Avoiding Common Query Pitfalls
MongoDB
 

Semelhante a Optimizing Cypher Queries in Neo4j (20)

How to win $10m - analysing DOTA2 data in R (Sheffield R Users Group - May)
How to win $10m - analysing DOTA2 data in R (Sheffield R Users Group - May)How to win $10m - analysing DOTA2 data in R (Sheffield R Users Group - May)
How to win $10m - analysing DOTA2 data in R (Sheffield R Users Group - May)
 
Delivering Winning Results with Sports Analytics and HPCC Systems
Delivering Winning Results with Sports Analytics and HPCC SystemsDelivering Winning Results with Sports Analytics and HPCC Systems
Delivering Winning Results with Sports Analytics and HPCC Systems
 
How to write SQL queries | pgDay Paris 2019 | Dimitri Fontaine
How to write SQL queries | pgDay Paris 2019 | Dimitri FontaineHow to write SQL queries | pgDay Paris 2019 | Dimitri Fontaine
How to write SQL queries | pgDay Paris 2019 | Dimitri Fontaine
 
MongoDB.local Austin 2018: Tips and Tricks for Avoiding Common Query Pitfalls
MongoDB.local Austin 2018: Tips and Tricks for Avoiding Common Query PitfallsMongoDB.local Austin 2018: Tips and Tricks for Avoiding Common Query Pitfalls
MongoDB.local Austin 2018: Tips and Tricks for Avoiding Common Query Pitfalls
 
GraphConnect 2014 SF: Betting the Company on a Graph Database - Part 2
GraphConnect 2014 SF: Betting the Company on a Graph Database - Part 2GraphConnect 2014 SF: Betting the Company on a Graph Database - Part 2
GraphConnect 2014 SF: Betting the Company on a Graph Database - Part 2
 
Tips and Tricks for Avoiding Common Query Pitfalls
Tips and Tricks for Avoiding Common Query PitfallsTips and Tricks for Avoiding Common Query Pitfalls
Tips and Tricks for Avoiding Common Query Pitfalls
 
Extracting And Analyzing Cricket Statistics with R and Pyspark
Extracting And Analyzing Cricket Statistics with R and PysparkExtracting And Analyzing Cricket Statistics with R and Pyspark
Extracting And Analyzing Cricket Statistics with R and Pyspark
 
Building Streaming Recommendation Engines on Apache Spark with Rui Vieira
Building Streaming Recommendation Engines on Apache Spark with Rui VieiraBuilding Streaming Recommendation Engines on Apache Spark with Rui Vieira
Building Streaming Recommendation Engines on Apache Spark with Rui Vieira
 
Football graph - Neo4j and the Premier League
Football graph - Neo4j and the Premier LeagueFootball graph - Neo4j and the Premier League
Football graph - Neo4j and the Premier League
 
Predictions European Championships 2020
Predictions European Championships 2020Predictions European Championships 2020
Predictions European Championships 2020
 
Tactical data engineering
Tactical data engineeringTactical data engineering
Tactical data engineering
 
Mignesh Birdi Assignment3.pdf
Mignesh Birdi Assignment3.pdfMignesh Birdi Assignment3.pdf
Mignesh Birdi Assignment3.pdf
 
MongoDB World 2019: How to Keep an Average API Response Time Less than 5ms wi...
MongoDB World 2019: How to Keep an Average API Response Time Less than 5ms wi...MongoDB World 2019: How to Keep an Average API Response Time Less than 5ms wi...
MongoDB World 2019: How to Keep an Average API Response Time Less than 5ms wi...
 
Common Performance Pitfalls in Odoo apps
Common Performance Pitfalls in Odoo appsCommon Performance Pitfalls in Odoo apps
Common Performance Pitfalls in Odoo apps
 
MongoDB .local Toronto 2019: Tips and Tricks for Effective Indexing
MongoDB .local Toronto 2019: Tips and Tricks for Effective IndexingMongoDB .local Toronto 2019: Tips and Tricks for Effective Indexing
MongoDB .local Toronto 2019: Tips and Tricks for Effective Indexing
 
R for you
R for youR for you
R for you
 
Tips and Tricks for Avoiding Common Query Pitfalls
Tips and Tricks for Avoiding Common Query PitfallsTips and Tricks for Avoiding Common Query Pitfalls
Tips and Tricks for Avoiding Common Query Pitfalls
 
Simple Strategies for faster knowledge discovery in big data
Simple Strategies for faster knowledge discovery in big dataSimple Strategies for faster knowledge discovery in big data
Simple Strategies for faster knowledge discovery in big data
 
Row Pattern Matching in Oracle Database 12c
Row Pattern Matching in Oracle Database 12cRow Pattern Matching in Oracle Database 12c
Row Pattern Matching in Oracle Database 12c
 
Microsoft NERD Talk - R and Tableau - 2-4-2013
Microsoft NERD Talk - R and Tableau - 2-4-2013Microsoft NERD Talk - R and Tableau - 2-4-2013
Microsoft NERD Talk - R and Tableau - 2-4-2013
 

Mais de Neo4j

Mais de Neo4j (20)

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
QIAGEN: Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
QIAGEN: Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansQIAGEN: Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
QIAGEN: Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...
ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...
ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...
 
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafos
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafosBBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafos
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafos
 
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
 
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
 
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdfRabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
 
Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!
 
IA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeIA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG time
 
Neo4j: Data Engineering for RAG (retrieval augmented generation)
Neo4j: Data Engineering for RAG (retrieval augmented generation)Neo4j: Data Engineering for RAG (retrieval augmented generation)
Neo4j: Data Engineering for RAG (retrieval augmented generation)
 
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdfNeo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
 
Enabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge GraphsEnabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge Graphs
 
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdf
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdfNeo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdf
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdf
 
Neo4j Jesus Barrasa The Art of the Possible with Graph
Neo4j Jesus Barrasa The Art of the Possible with GraphNeo4j Jesus Barrasa The Art of the Possible with Graph
Neo4j Jesus Barrasa The Art of the Possible with Graph
 

Último

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Último (20)

Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 

Optimizing Cypher Queries in Neo4j

  • 1. Optimizing Cypher Queries in Neo4j Wes Freeman (@wefreema) Mark Needham (@markhneedham)
  • 2. Today's schedule • Brief overview of cypher syntax • Graph global vs Graph local queries • Labels and indexes • Optimization patterns • Profiling cypher queries • Applying optimization patterns
  • 3. Cypher Syntax • Statement parts o Optional: Querying part (MATCH|WHERE) o Optional: Updating part (CREATE|MERGE) o Optional: Returning part (WITH|RETURN) • Parts can be chained together
  • 4. Cypher Syntax - Refresher MATCH (n:Label)-[r:LINKED]->(m) WHERE n.prop = "..." RETURN n, r, m
  • 5. Starting points • Graph scan (global; potentially slow) • Label scan (usually reserved for aggregation queries; not ideal) • Label property index lookup (local; good!)
  • 7. The 1.9 global scan O(n) n = # of nodes START pl = node(*) MATCH (pl)-[:played]->(stats) WHERE pl.name = "Wayne Rooney" RETURN stats 150ms w/ 30k nodes, 120k rels
  • 8. The 2.0 global scan MATCH (pl)-[:played]->(stats) WHERE pl.name = "Wayne Rooney" RETURN stats 130ms w/ 30k nodes, 120k rels O(n) n = # of nodes
  • 9. Why is it a global scan? • Cypher is a pattern matching language • It doesn't discriminate unless you tell it to o It must try to start at all nodes to find this pattern, as specified
  • 10. Introduce a label Label your starting points CREATE (player:Player {name: "Wayne Rooney"} )
  • 11. O(k) k = # of nodes with that labelLabel scan MATCH (pl:Player)-[:played]->(stats) WHERE pl.name = "Wayne Rooney" RETURN stats 80ms w/ 30k nodes, 120k rels (~900 :Player nodes)
  • 12. Indexes don't come for free CREATE INDEX ON :Player(name) OR CREATE CONSTRAINT ON pl:Player ASSERT pl.name IS UNIQUE
  • 13. O(log k) k = # of nodes with that labelIndex lookup MATCH (pl:Player)-[:played]->(stats) WHERE pl.name = "Wayne Rooney" RETURN stats 6ms w/ 30k nodes, 120k rels (~900 :Player nodes)
  • 14. Optimization Patterns • Avoid cartesian products • Avoid patterns in the WHERE clause • Start MATCH patterns at the lowest cardinality and expand outward • Separate MATCH patterns with minimal expansion at each stage
  • 16. Anti-pattern: Cartesian Products MATCH (m:Movie), (p:Person)
  • 17. Subtle Cartesian Products MATCH (p:Person)-[:KNOWS]->(c) WHERE p.name="Tom Hanks" WITH c MATCH (k:Keyword) RETURN c, k
  • 18. Counting Cartesian Products MATCH (pl:Player),(t:Team),(g:Game) RETURN COUNT(DISTINCT pl), COUNT(DISTINCT t), COUNT(DISTINCT g) 80000 ms w/ ~900 players, ~40 teams, ~1200 games
  • 19. MATCH (pl:Player) WITH COUNT(pl) as players MATCH (t:Team) WITH COUNT(t) as teams, players MATCH (g:Game) RETURN COUNT(g) as games, teams, players8ms w/ ~900 players, ~40 teams, ~1200 games Better Counting
  • 20. Directions on patterns MATCH (p:Person)-[:ACTED_IN]-(m) WHERE p.name = "Tom Hanks" RETURN m
  • 21. Parameterize your queries MATCH (p:Person)-[:ACTED_IN]-(m) WHERE p.name = {name} RETURN m
  • 22. Fast predicates first Bad: MATCH (t:Team)-[:played_in]->(g) WHERE NOT (t)-[:home_team]->(g) AND g.away_goals > g.home_goals RETURN t, COUNT(g)
  • 23. Better: MATCH (t:Team)-[:played_in]->(g) WHERE g.away_goals > g.home_goals AND NOT (t)-[:home_team]->() RETURN t, COUNT(g) Fast predicates first
  • 24. Patterns in WHERE clauses • Keep them in the MATCH • The only pattern that needs to be in a WHERE clause is a NOT
  • 25. MERGE and CONSTRAINTs • MERGE is MATCH or CREATE • MERGE can take advantage of unique constraints and indexes
  • 26. MERGE (without index) MERGE (g:Game {date:1290257100, time: 1245, home_goals: 2, away_goals: 3, match_id: 292846, attendance: 60102}) RETURN g 188 ms w/ ~400 games
  • 27. Adding an index CREATE INDEX ON :Game(match_id)
  • 28. MERGE (with index) MERGE (g:Game {date:1290257100, time: 1245, home_goals: 2, away_goals: 3, match_id: 292846, attendance: 60102}) RETURN g 6 ms w/ ~400 games
  • 29. Alternative MERGE approach MERGE (g:Game { match_id: 292846 }) ON CREATE SET g.date = 1290257100 SET g.time = 1245 SET g.home_goals = 2 SET g.away_goals = 3 SET g.attendance = 60102 RETURN g
  • 30. Profiling queries • Use the PROFILE keyword in front of the query o from webadmin or shell - won't work in browser • Look for db_hits and rows • Ignore everything else (for now!)
  • 32. Football Optimization MATCH (game)<-[:contains_match]-(season:Season), (team)<-[:away_team]-(game), (stats)-[:in]->(game), (team)<-[:for]-(stats)<-[:played]-(player) WHERE season.name = "2012-2013" RETURN player.name, COLLECT(DISTINCT team.name), SUM(stats.goals) as goals ORDER BY goals DESC LIMIT 103137 ms w/ ~900 players, ~20 teams, ~400 games
  • 33. Football Optimization ==> ColumnFilter(symKeys=["player.name", " INTERNAL_AGGREGATEe91b055b-a943-4ddd-9fe8-e746407c504a", " INTERNAL_AGGREGATE240cfcd2-24d9-48a2-8ca9-fb0286f3d323"], returnItemNames=["player.name", "COLLECT(DISTINCT team.name)", "goals"], _rows=10, _db_hits=0) ==> Top(orderBy=["SortItem(Cached( INTERNAL_AGGREGATE240cfcd2-24d9-48a2-8ca9-fb0286f3d323 of type Number),false)"], limit="Literal(10)", _rows=10, _db_hits=0) ==> EagerAggregation(keys=["Cached(player.name of type Any)"], aggregates=["( INTERNAL_AGGREGATEe91b055b-a943-4ddd-9fe8- e746407c504a,Distinct(Collect(Property(team,name(0))),Property(team,name(0))))", "( INTERNAL_AGGREGATE240cfcd2-24d9-48a2- 8ca9-fb0286f3d323,Sum(Property(stats,goals(13))))"], _rows=503, _db_hits=10899) ==> Extract(symKeys=["stats", " UNNAMED12", " UNNAMED108", "season", " UNNAMED55", "player", "team", " UNNAMED124", " UNNAMED85", "game"], exprKeys=["player.name"], _rows=5192, _db_hits=5192) ==> PatternMatch(g="(player)-[' UNNAMED124']-(stats)", _rows=5192, _db_hits=0) ==> Filter(pred="Property(season,name(0)) == Literal(2012-2013)", _rows=5192, _db_hits=15542) ==> TraversalMatcher(trail="(season)-[ UNNAMED12:contains_match WHERE true AND true]->(game)<-[ UNNAMED85:in WHERE true AND true]-(stats)-[ UNNAMED108:for WHERE true AND true]->(team)<-[ UNNAMED55:away_team WHERE true AND true]- (game)", _rows=15542, _db_hits=1620462)
  • 34. Break out the match statements MATCH (game)<-[:contains_match]-(season:Season) MATCH (team)<-[:away_team]-(game) MATCH (stats)-[:in]->(game) MATCH (team)<-[:for]-(stats)<-[:played]-(player) WHERE season.name = "2012-2013" RETURN player.name, COLLECT(DISTINCT team.name), SUM(stats.goals) as goals ORDER BY goals DESC LIMIT 10200 ms w/ ~900 players, ~20 teams, ~400 games
  • 35. Start small • Smallest cardinality label first • Smallest intermediate result set first
  • 36. Exploring cardinalities MATCH (game)<-[:contains_match]-(season:Season) RETURN COUNT(DISTINCT game), COUNT(DISTINCT season) 1140 games, 3 seasons MATCH (team)<-[:away_team]-(game:Game) RETURN COUNT(DISTINCT team), COUNT(DISTINCT game) 25 teams, 1140 games
  • 37. Exploring cardinalities MATCH (stats)-[:in]->(game:Game) RETURN COUNT(DISTINCT stats), COUNT(DISTINCT game) 31117 stats, 1140 games MATCH (stats)<-[:played]-(player:Player) RETURN COUNT(DISTINCT stats), COUNT(DISTINCT player) 31117 stats, 880 players
  • 38. Look for teams first MATCH (team)<-[:away_team]-(game:Game) MATCH (game)<-[:contains_match]-(season) WHERE season.name = "2012-2013" MATCH (stats)-[:in]->(game) MATCH (team)<-[:for]-(stats)<-[:played]-(player) RETURN player.name, COLLECT(DISTINCT team.name), SUM(stats.goals) as goals ORDER BY goals DESC LIMIT 10162 ms w/ ~900 players, ~20 teams, ~400 games
  • 39. ==> ColumnFilter(symKeys=["player.name", " INTERNAL_AGGREGATEbb08f36b-a70d-46b3-9297-b0c7ec85c969", " INTERNAL_AGGREGATE199af213-e3bd-400f-aba9-8ca2a9e153c5"], returnItemNames=["player.name", "COLLECT(DISTINCT team.name)", "goals"], _rows=10, _db_hits=0) ==> Top(orderBy=["SortItem(Cached( INTERNAL_AGGREGATE199af213-e3bd-400f-aba9-8ca2a9e153c5 of type Number),false)"], limit="Literal(10)", _rows=10, _db_hits=0) ==> EagerAggregation(keys=["Cached(player.name of type Any)"], aggregates=["( INTERNAL_AGGREGATEbb08f36b-a70d-46b3-9297- b0c7ec85c969,Distinct(Collect(Property(team,name(0))),Property(team,name(0))))", "( INTERNAL_AGGREGATE199af213-e3bd-400f- aba9-8ca2a9e153c5,Sum(Property(stats,goals(13))))"], _rows=503, _db_hits=10899) ==> Extract(symKeys=["stats", " UNNAMED12", " UNNAMED168", "season", " UNNAMED125", "player", "team", " UNNAMED152", " UNNAMED51", "game"], exprKeys=["player.name"], _rows=5192, _db_hits=5192) ==> PatternMatch(g="(stats)-[' UNNAMED152']-(team),(player)-[' UNNAMED168']-(stats)", _rows=5192, _db_hits=0) ==> PatternMatch(g="(stats)-[' UNNAMED125']-(game)", _rows=10394, _db_hits=0) ==> Filter(pred="Property(season,name(0)) == Literal(2012-2013)", _rows=380, _db_hits=380) ==> PatternMatch(g="(season)-[' UNNAMED51']-(game)", _rows=380, _db_hits=1140) ==> TraversalMatcher(trail="(game)-[ UNNAMED12:away_team WHERE true AND true]->(team)", _rows=1140, _db_hits=1140) Look for teams first
  • 40. Filter games a bit earlier MATCH (game)<-[:contains_match]-(season:Season) WHERE season.name = "2012-2013" MATCH (team)<-[:away_team]-(game) MATCH (stats)-[:in]->(game) MATCH (team)<-[:for]-(stats)<-[:played]-(player) RETURN player.name, COLLECT(DISTINCT team.name), SUM(stats.goals) as goals ORDER BY goals DESC LIMIT 10148 ms w/ ~900 players, ~20 teams, ~400 games
  • 41. Filter out stats with no goals MATCH (game)<-[:contains_match]-(season:Season) WHERE season.name = "2012-2013" MATCH (team)<-[:away_team]-(game) MATCH (stats)-[:in]->(game)WHERE stats.goals > 0 MATCH (team)<-[:for]-(stats)<-[:played]-(player) RETURN player.name, COLLECT(DISTINCT team.name), SUM(stats.goals) as goals ORDER BY goals DESC LIMIT 10 59 ms w/ ~900 players, ~20 teams, ~400 games
  • 42. Movie query optimization MATCH (movie:Movie {title: {title} }) MATCH (genre)<-[:HAS_GENRE]-(movie) MATCH (director)-[:DIRECTED]->(movie) MATCH (actor)-[:ACTED_IN]->(movie) MATCH (writer)-[:WRITER_OF]->(movie) MATCH (actor)-[:ACTED_IN]->(actormovies) MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword) as weight, count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name) as writers ORDER BY weight DESC, actormoviesweight DESC WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie, collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres, directors, writers MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies) WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors, genres, directors, writers ORDER BY keyword_weight RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
  • 43. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' }) MATCH (genre)<-[:HAS_GENRE]-(movie) MATCH (director)-[:DIRECTED]->(movie) MATCH (actor)-[:ACTED_IN]->(movie) MATCH (writer)-[:WRITER_OF]->(movie) MATCH (actor)-[:ACTED_IN]->(actormovies) MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword) as weight, count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name) as writers ORDER BY weight DESC, actormoviesweight DESC WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie, collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres, directors, writers MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies) WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors, genres, directors, writers ORDER BY keyword_weight RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
  • 44. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' })MATCH (genre)<-[:HAS_GENRE]- (movie) MATCH (director)-[:DIRECTED]->(movie) MATCH (actor)-[:ACTED_IN]->(movie) MATCH (writer)-[:WRITER_OF]->(movie) MATCH (actor)-[:ACTED_IN]->(actormovies) MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword) as weight, count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name) as writers ORDER BY weight DESC, actormoviesweight DESC WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie, collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres, directors, writers MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies) WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors, genres, directors, writers ORDER BY keyword_weight RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
  • 45. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' }) MATCH (genre)<-[:HAS_GENRE]-(movie)MATCH (director)-[:DIRECTED]- >(movie) MATCH (actor)-[:ACTED_IN]->(movie) MATCH (writer)-[:WRITER_OF]->(movie) MATCH (actor)-[:ACTED_IN]->(actormovies) MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword) as weight, count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name) as writers ORDER BY weight DESC, actormoviesweight DESC WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie, collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres, directors, writers MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies) WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors, genres, directors, writers ORDER BY keyword_weight RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
  • 46. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' }) MATCH (genre)<-[:HAS_GENRE]-(movie) MATCH (director)-[:DIRECTED]->(movie)MATCH (actor)-[:ACTED_IN]- >(movie) MATCH (writer)-[:WRITER_OF]->(movie) MATCH (actor)-[:ACTED_IN]->(actormovies) MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword) as weight, count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name) as writers ORDER BY weight DESC, actormoviesweight DESC WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie, collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres, directors, writers MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies) WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors, genres, directors, writers ORDER BY keyword_weight RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
  • 47. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' }) MATCH (genre)<-[:HAS_GENRE]-(movie) MATCH (director)-[:DIRECTED]->(movie) MATCH (actor)-[:ACTED_IN]->(movie)MATCH (writer)-[:WRITER_OF]- >(movie) MATCH (actor)-[:ACTED_IN]->(actormovies) MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword) as weight, count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name) as writers ORDER BY weight DESC, actormoviesweight DESC WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie, collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres, directors, writers MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies) WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors, genres, directors, writers ORDER BY keyword_weight RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
  • 48. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' }) MATCH (genre)<-[:HAS_GENRE]-(movie) MATCH (director)-[:DIRECTED]->(movie) MATCH (actor)-[:ACTED_IN]->(movie) MATCH (writer)-[:WRITER_OF]->(movie)MATCH (actor)-[:ACTED_IN]- >(actormovies) MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword) as weight, count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name) as writers ORDER BY weight DESC, actormoviesweight DESC WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie, collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres, directors, writers MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies) WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors, genres, directors, writers ORDER BY keyword_weight RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
  • 49. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' }) MATCH (genre)<-[:HAS_GENRE]-(movie) MATCH (director)-[:DIRECTED]->(movie) MATCH (actor)-[:ACTED_IN]->(movie) MATCH (writer)-[:WRITER_OF]->(movie) MATCH (actor)-[:ACTED_IN]->(actormovies)MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword) as weight, count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name) as writers ORDER BY weight DESC, actormoviesweight DESC WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie, collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres, directors, writers MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies) WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors, genres, directors, writers ORDER BY keyword_weight RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
  • 50. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' }) MATCH (genre)<-[:HAS_GENRE]-(movie) MATCH (director)-[:DIRECTED]->(movie) MATCH (actor)-[:ACTED_IN]->(movie) MATCH (writer)-[:WRITER_OF]->(movie) MATCH (actor)-[:ACTED_IN]->(actormovies) MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie)WITH DISTINCT movies as related, count(DISTINCT keyword) as weight, count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name) as writersORDER BY weight DESC, actormoviesweight DESC WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie, collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres, directors, writers MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies) WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors, genres, directors, writers ORDER BY keyword_weight RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
  • 51. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' }) MATCH (genre)<-[:HAS_GENRE]-(movie) MATCH (director)-[:DIRECTED]->(movie) MATCH (actor)-[:ACTED_IN]->(movie) MATCH (writer)-[:WRITER_OF]->(movie) MATCH (actor)-[:ACTED_IN]->(actormovies) MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword) as weight, count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name) as writers ORDER BY weight DESC, actormoviesweight DESCWITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie, collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres, directors, writers MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies) WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors, genres, directors, writers ORDER BY keyword_weight RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
  • 52. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' }) MATCH (genre)<-[:HAS_GENRE]-(movie) MATCH (director)-[:DIRECTED]->(movie) MATCH (actor)-[:ACTED_IN]->(movie) MATCH (writer)-[:WRITER_OF]->(movie) MATCH (actor)-[:ACTED_IN]->(actormovies) MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword) as weight, count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name) as writers ORDER BY weight DESC, actormoviesweight DESC WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie, collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres, directors, writersMATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<- [:HAS_KEYWORD]-(movies) WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors, genres, directors, writers ORDER BY keyword_weight RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
  • 53. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' }) MATCH (genre)<-[:HAS_GENRE]-(movie) MATCH (director)-[:DIRECTED]->(movie) MATCH (actor)-[:ACTED_IN]->(movie) MATCH (writer)-[:WRITER_OF]->(movie) MATCH (actor)-[:ACTED_IN]->(actormovies) MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword) as weight, count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name) as writers ORDER BY weight DESC, actormoviesweight DESC WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie, collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres, directors, writers MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies)WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors, genres, directors, writers ORDER BY keyword_weight RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
  • 54. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' })<-[:ACTED_IN]-(actor) WITH movie, actor, length((actor)-[:ACTED_IN]->()) as actormoviesweight // 1 row per actor ORDER BY actormoviesweight DESC WITH movie, collect({name: actor.name, weight: actormoviesweight}) as actors // 1 row MATCH (movie)-[:HAS_GENRE]->(genre) WITH movie, actors, collect(genre) as genres // 1 row MATCH (director)-[:DIRECTED]->(movie) WITH movie, actors, genres, collect(director.name) as directors // 1 row MATCH (writer)-[:WRITER_OF]->(movie) WITH movie, actors, genres, directors, collect(writer.name) as writers // 1 row MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword.name) as keywords, movie, genres, directors, actors, writers // 1 row per related movie ORDER BY keywords DESC WITH collect(DISTINCT { related: { title: related.title }, weight: keywords }) as related, movie, actors, genres, directors, writers // 1 row MATCH (movie)-[:HAS_KEYWORD]->(keyword) RETURN collect(keyword.name) as keywords, related, movie, actors, genres, directors, writers 10x faster
  • 55. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' })<-[:ACTED_IN]-(actor) WITH movie, actor, length((actor)-[:ACTED_IN]->()) as actormoviesweight // 1 row per actor ORDER BY actormoviesweight DESC WITH movie, collect({name: actor.name, weight: actormoviesweight}) as actors // 1 row MATCH (movie)-[:HAS_GENRE]->(genre) WITH movie, actors, collect(genre) as genres // 1 row MATCH (director)-[:DIRECTED]->(movie) WITH movie, actors, genres, collect(director.name) as directors // 1 row MATCH (writer)-[:WRITER_OF]->(movie) WITH movie, actors, genres, directors, collect(writer.name) as writers // 1 row MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword.name) as keywords, movie, genres, directors, actors, writers // 1 row per related movie ORDER BY keywords DESC WITH collect(DISTINCT { related: { title: related.title }, weight: keywords }) as related, movie, actors, genres, directors, writers // 1 row MATCH (movie)-[:HAS_KEYWORD]->(keyword) RETURN collect(keyword.name) as keywords, related, movie, actors, genres, directors, writers 10x faster
  • 56. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' })<-[:ACTED_IN]-(actor)WITH movie, actor, length((actor)-[:ACTED_IN]- >()) as actormoviesweight ORDER BY actormoviesweight DESC // 1 row per actor WITH movie, collect({name: actor.name, weight: actormoviesweight}) as actors // 1 row MATCH (movie)-[:HAS_GENRE]->(genre) WITH movie, actors, collect(genre) as genres // 1 row MATCH (director)-[:DIRECTED]->(movie) WITH movie, actors, genres, collect(director.name) as directors // 1 row MATCH (writer)-[:WRITER_OF]->(movie) WITH movie, actors, genres, directors, collect(writer.name) as writers // 1 row MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword.name) as keywords, movie, genres, directors, actors, writers // 1 row per related movie ORDER BY keywords DESC WITH collect(DISTINCT { related: { title: related.title }, weight: keywords }) as related, movie, actors, genres, directors, writers // 1 row MATCH (movie)-[:HAS_KEYWORD]->(keyword) RETURN collect(keyword.name) as keywords, related, movie, actors, genres, directors, writers 10x faster
  • 57. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' })<-[:ACTED_IN]-(actor) WITH movie, actor, length((actor)-[:ACTED_IN]->()) as actormoviesweight ORDER BY actormoviesweight DESC // 1 row per actorWITH movie, collect({name: actor.name, weight: actormoviesweight}) as actors // 1 row MATCH (movie)-[:HAS_GENRE]->(genre) WITH movie, actors, collect(genre) as genres // 1 row MATCH (director)-[:DIRECTED]->(movie) WITH movie, actors, genres, collect(director.name) as directors // 1 row MATCH (writer)-[:WRITER_OF]->(movie) WITH movie, actors, genres, directors, collect(writer.name) as writers // 1 row MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword.name) as keywords, movie, genres, directors, actors, writers // 1 row per related movie ORDER BY keywords DESC WITH collect(DISTINCT { related: { title: related.title }, weight: keywords }) as related, movie, actors, genres, directors, writers // 1 row MATCH (movie)-[:HAS_KEYWORD]->(keyword) RETURN collect(keyword.name) as keywords, related, movie, actors, genres, directors, writers 10x faster
  • 58. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' })<-[:ACTED_IN]-(actor) WITH movie, actor, length((actor)-[:ACTED_IN]->()) as actormoviesweight ORDER BY actormoviesweight DESC // 1 row per actor WITH movie, collect({name: actor.name, weight: actormoviesweight}) as actors // 1 row MATCH (movie)-[:HAS_GENRE]- >(genre) WITH movie, actors, collect(genre) as genres // 1 row MATCH (director)-[:DIRECTED]->(movie) WITH movie, actors, genres, collect(director.name) as directors // 1 row MATCH (writer)-[:WRITER_OF]->(movie) WITH movie, actors, genres, directors, collect(writer.name) as writers // 1 row MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword.name) as keywords, movie, genres, directors, actors, writers // 1 row per related movie ORDER BY keywords DESC WITH collect(DISTINCT { related: { title: related.title }, weight: keywords }) as related, movie, actors, genres, directors, writers // 1 row MATCH (movie)-[:HAS_KEYWORD]->(keyword) RETURN collect(keyword.name) as keywords, related, movie, actors, genres, directors, writers 10x faster
  • 59. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' })<-[:ACTED_IN]-(actor) WITH movie, actor, length((actor)-[:ACTED_IN]->()) as actormoviesweight ORDER BY actormoviesweight DESC // 1 row per actor WITH movie, collect({name: actor.name, weight: actormoviesweight}) as actors // 1 row MATCH (movie)-[:HAS_GENRE]->(genre) WITH movie, actors, collect(genre) as genres // 1 row MATCH (director)-[:DIRECTED]->(movie) WITH movie, actors, genres, collect(director.name) as directors // 1 row MATCH (writer)-[:WRITER_OF]->(movie) WITH movie, actors, genres, directors, collect(writer.name) as writers // 1 row MATCH (movie)-[:HAS_KEYWORD]- >(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword.name) as keywords, movie, genres, directors, actors, writers // 1 row per related movieORDER BY keywords DESC WITH collect(DISTINCT { related: { title: related.title }, weight: keywords }) as related, movie, actors, genres, directors, writers // 1 row MATCH (movie)-[:HAS_KEYWORD]->(keyword) RETURN collect(keyword.name) as keywords, related, movie, actors, genres, directors, writers 10x faster
  • 60. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' })<-[:ACTED_IN]-(actor) WITH movie, actor, length((actor)-[:ACTED_IN]->()) as actormoviesweight ORDER BY actormoviesweight DESC // 1 row per actor WITH movie, collect({name: actor.name, weight: actormoviesweight}) as actors // 1 row MATCH (movie)-[:HAS_GENRE]->(genre) WITH movie, actors, collect(genre) as genres // 1 row MATCH (director)-[:DIRECTED]->(movie) WITH movie, actors, genres, collect(director.name) as directors // 1 row MATCH (writer)-[:WRITER_OF]->(movie) WITH movie, actors, genres, directors, collect(writer.name) as writers // 1 row MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword.name) as keywords, movie, genres, directors, actors, writers // 1 row per related movie ORDER BY keywords DESCWITH collect(DISTINCT { related: { title: related.title }, weight: keywords }) as related, movie, actors, genres, directors, writers // 1 row MATCH (movie)-[:HAS_KEYWORD]->(keyword) RETURN collect(keyword.name) as keywords, related, movie, actors, genres, directors, writers 10x faster
  • 61. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' })<-[:ACTED_IN]-(actor) WITH movie, actor, length((actor)-[:ACTED_IN]->()) as actormoviesweight ORDER BY actormoviesweight DESC // 1 row per actor WITH movie, collect({name: actor.name, weight: actormoviesweight}) as actors // 1 row MATCH (movie)-[:HAS_GENRE]->(genre) WITH movie, actors, collect(genre) as genres // 1 row MATCH (director)-[:DIRECTED]->(movie) WITH movie, actors, genres, collect(director.name) as directors // 1 row MATCH (writer)-[:WRITER_OF]->(movie) WITH movie, actors, genres, directors, collect(writer.name) as writers // 1 row MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword.name) as keywords, movie, genres, directors, actors, writers // 1 row per related movie ORDER BY keywords DESC WITH collect(DISTINCT { related: { title: related.title }, weight: keywords }) as related, movie, actors, genres, directors, writers // 1 rowMATCH (movie)-[:HAS_KEYWORD]->(keyword) RETURN collect(keyword.name) as keywords, related, movie, actors, genres, directors, writers // 1 row 10x faster
  • 65. Making the implicit explicit • When you have implicit relationships in the graph you can sometimes get better query performance by modeling the relationship explicitly
  • 67. Refactor property to node Bad: MATCH (g:Game) WHERE g.date > 1343779200 AND g.date < 1369094400 RETURN g
  • 68. Good: MATCH (s:Season)-[:contains]->(g) WHERE season.name = "2012-2013" RETURN g Refactor property to node
  • 69. Conclusion • Avoid the global scan • Add indexes / unique constraints • Split up MATCH statements • Measure, measure, measure, tweak, repeat • Soon Cypher will do a lot of this for you!
  • 70. Bonus tip • Use transactions/transactional cypher endpoint
  • 71. Q & A • If you have them send them in