BudgetMatch is a subgraph matching algorithm for large networks that uses a dynamic cost model. It assigns an initial cost estimate and processes nodes using the current cost as a budget. If the budget is exceeded, processing is aborted and the cost estimate updated. Experiments on a network with 1.12 billion edges showed BudgetMatch outperformed the DOGMA algorithm on a set of benchmark queries, with query times improving as the cost update parameter λ increased. Initializing costs using average degree statistics also improved performance.
2. BudgetMatch
[Figure: example social network. People (Bob, Mark, John, Peter, Francis, Jennifer, Ashley, Jessie, Alice, Melissa, Emily, Jon), events (e.g. Halloween 2008, Pizza Feast, Peter’s Bday party, Homecoming 09), movies (e.g. Star Wars IV, Titanic, Pulp Fiction, Inception) and genres (Sci-Fi, Drama, Thriller, Comedy, Mystery, Family) are connected by edges labeled friend, likes, attended, organized and type.]
6. BudgetMatch
Prior Work
Systems (Storage, Index, Query answering)
- Jena, Sesame, RDF-3X, YARS, DOGMA, COSI, Hexastore, column stores, etc.
- AllegroGraph, Neo4J, OWLIM, etc.
Query Optimization
- Stocker (WWW’08) and others
- Similar to RDBMS with schema discovery
• Selectivity estimation, query plan search and join ordering
10. BudgetMatch
Subgraph Matching
On networks with power-law degree distributions, subgraph matching algorithms that use static cost models will end up visiting high-degree nodes
- Statistics won’t help us avoid those
- Existing subgraph matching cost models are static
11. BudgetMatch
[Figure: the example social network again, with a "?" annotation on the graph.]
12. BudgetMatch
BudgetMatch
IDEA: Use a dynamic cost model which updates its cost estimates as it learns more about the network
- Assigns an initial cost estimate
• Fixed or based on average statistics
- Processes nodes using its current cost estimate as a budget for processing
- If the budget is exceeded, processing is aborted and the cost estimate is updated
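The budget-and-abort idea above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `scan_with_budget`, the adjacency-dict graph, and the update factor `LAMBDA = 5` are all hypothetical choices made for the example.

```python
from itertools import islice

LAMBDA = 5  # assumed multiplicative update factor (hypothetical value)

def scan_with_budget(graph, node, cost):
    """Retrieve the neighbors of `node`, spending at most `cost[node]` steps.

    Returns the neighbor list if it fits in the budget. Otherwise the scan
    is aborted, the cost estimate for `node` is raised, and None is returned
    so that the caller can process a cheaper node first.
    """
    budget = cost[node]
    # Read one neighbor more than the budget allows to detect an overrun.
    seen = list(islice(graph[node], budget + 1))
    if len(seen) > budget:
        cost[node] = budget * LAMBDA  # learned: this node is expensive
        return None
    return seen

# Toy graph (made up): one high-degree node and one low-degree node.
graph = {
    "hub": [f"n{i}" for i in range(12)],
    "leaf": ["hub", "n1"],
}
cost = {"hub": 5, "leaf": 5}  # constant initial cost estimate

print(scan_with_budget(graph, "leaf", cost))  # fits in the budget
print(scan_with_budget(graph, "hub", cost))   # aborted, estimate raised
print(scan_with_budget(graph, "hub", cost))   # fits in the raised budget
```

The point of the design is that a high-degree node costs at most one budget's worth of work before the algorithm learns that it is expensive and can defer it.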
13. BudgetMatch
BudgetMatch
Depth-first search query answering algorithm
- Memory efficient
- Parallelizable
Based on the DOGMA query answering algorithm
- ISWC’09
Provably correct
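To make the depth-first strategy concrete, here is a minimal backtracking matcher over labeled edges. It is a generic sketch, not the DOGMA or BudgetMatch algorithm: it has no index and no cost model, it matches by homomorphism (two variables may bind to the same node), and the data triples are made up for illustration.

```python
def is_var(term):
    """Query terms starting with '?' are variables; all others are constants."""
    return term.startswith("?")

def match(query, triples, theta=None):
    """Yield every substitution that maps all query edges onto data edges.

    The search is depth first: only the current partial substitution is
    kept in memory, and it is extended one query edge at a time.
    """
    theta = theta or {}
    if not query:
        yield dict(theta)
        return
    (s, label, o), rest = query[0], query[1:]
    for ds, dlabel, do in triples:
        if dlabel != label:
            continue
        new = dict(theta)
        ok = True
        for q, d in ((s, ds), (o, do)):
            if is_var(q):
                # Bind the variable, or check an existing binding.
                if new.setdefault(q, d) != d:
                    ok = False
                    break
            elif q != d:
                ok = False
                break
        if ok:
            yield from match(rest, triples, new)

# Hypothetical data, loosely in the style of the example network.
triples = [
    ("Peter", "organized", "party"),
    ("Francis", "attended", "party"),
    ("Jennifer", "attended", "party"),
    ("Jennifer", "likes", "Titanic"),
    ("Titanic", "type", "Drama"),
]
query = [
    ("Peter", "organized", "?p"),
    ("?u", "attended", "?p"),
    ("?u", "likes", "?b"),
    ("?b", "type", "Drama"),
]
for theta in match(query, triples):
    print(theta)
```

Because each branch of the search is independent of the others, the top-level loop over candidate bindings is also the natural place to parallelize.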
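14. BudgetMatch
Example Query
[Figure: example query graph over the constants Peter, Francis and Drama and the variables ?p, ?u, ?f and ?b, with edges labeled attended (twice), organized, friend, likes (twice) and type.]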
16. BudgetMatch
BudgetMatch Example I
[Figure: query graph annotated per vertex with its cost c and set R]
?p: c=5, R = {}
Francis: c=5, R = {francis}
?u: c=5, R = {}
?f: c=5, R = {}
Peter: c=5, R = {peter}
?b: c=5, R = {}
Drama: c=5, R = {drama}
ANS = {}
θ = {}
17. BudgetMatch
BudgetMatch Example II
[Figure: query graph annotated per vertex with its cost c and sets R and R’]
?p: c=5, R = {}, R’ = {Peter’s bday party, Homecoming 09, Silvester 2009}
Francis: c=5, R = {francis}, R’ = {}
?u: c=5, R = {}
?f: c=5, R = {}, R’ = {Mark, John}
Peter: c=5, R = {peter}
?b: c=5, R = {}
Drama: c=5, R = {drama}
ANS = {}
θ = {}
18. BudgetMatch
BudgetMatch Example III
[Figure: query graph annotated per vertex with its cost c and sets R and R’]
?p: c=5, R = {Peter’s bday party, Silvester 2009}
Francis: c=5, R’ = {}
?u: c=5, R = {}
?f: c=5, R = {Mark, John}
Peter: c=5, R = {peter}
?b: c=5, R = {}
Drama: c=25, R = {drama}, R’ = {drama}
ANS = {}
θ = {}
19. BudgetMatch
BudgetMatch Example IV
[Figure: query graph annotated per vertex with its cost c and sets R and R’]
?p: c=5, R = {Peter’s bday party, Silvester 2009}
Francis: c=5, R’ = {}
?u: c=5, R = {}
?f: c=5, R = {}
Peter: c=5, R = {peter}
?b: c=25, R = {Titanic, Star Wars IV}, R’ = {Titanic, Star Wars IV}
Drama: c=25, R = {drama}
ANS = {}
θ = {?f/Mark}
20. BudgetMatch
BudgetMatch Example V
[Figure: query graph annotated per vertex with its cost c and sets R and R’]
?p: c=5, R = {}
Francis: c=5, R’ = {}
?u: c=5, R = {Francis, Jennifer, Ashley}, R’ = {}
?f: c=5, R = {}
Peter: c=5, R = {peter}
?b: c=25, R = {Titanic, Star Wars IV}, R’ = {Titanic}
Drama: c=25, R = {drama}
ANS = {}
θ = {?f/Mark, ?p/Peter’s bday party, ?u/Jennifer}
21. BudgetMatch
BudgetMatch Example VI
[Figure: query graph annotated per vertex with its cost c and sets R and R’]
?p: c=5, R = {}
Francis: c=5, R’ = {}
?u: c=5, R = {}
?f: c=5, R = {}
Peter: c=5, R = {peter}
?b: c=25, R = {}
Drama: c=25, R = {}
ANS = {θ}
θ = {?f/Mark, ?p/Peter’s bday party, ?u/Jennifer, ?b/Titanic}
22. BudgetMatch
Cost Initialization & Update
Initialize cost
- Constant initial cost
- Using average degree statistics
Cost estimate update
- Multiply by a constant
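The two initialization options and the multiplicative update can be sketched as below. The function names, the default constant `c0=5`, and the factor `lam=5` are assumptions for illustration, and using one global average degree is only the simplest reading of "average degree statistics". The jump from c=5 to c=25 in the walkthrough slides is consistent with a factor of 5, but the deck does not state the value used there.

```python
import math

def constant_costs(nodes, c0=5):
    """Option 1: every node starts with the same constant cost estimate."""
    return {n: c0 for n in nodes}

def average_degree_costs(graph):
    """Option 2: every node starts at the mean degree of the graph, rounded up.

    `graph` maps each node to its list of neighbors.
    """
    avg = sum(len(nbrs) for nbrs in graph.values()) / len(graph)
    c0 = max(1, math.ceil(avg))
    return {n: c0 for n in graph}

def update_cost(cost, node, lam=5):
    """Multiply the estimate of a node whose budget was exceeded by `lam`."""
    cost[node] *= lam
    return cost[node]

# Toy graph (made up): degrees 12, 2 and 1, so the average degree is 5.
graph = {
    "hub": [f"n{i}" for i in range(12)],
    "a": ["hub", "b"],
    "b": ["a"],
}
cost = average_degree_costs(graph)
print(cost["hub"])               # initial estimate from the average degree
print(update_cost(cost, "hub"))  # estimate after one exceeded budget
```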
24. BudgetMatch
Experiments
Evaluated on a network with 1.12 billion edges
- Delicious social network crawl (partial)
Used Neo4J as storage engine
- Custom batch loading, degree lookup
Compared against the DOGMA algorithm
Evaluated on a set of 9 diverse benchmark queries (5-12 edges)
29. BudgetMatch
Comparison
Compared configuration 4 against
- Neo4J subgraph matching (SN-1)
- DOGMA without statistics (SN-2)
- DOGMA with statistics (SN-3)

             SN-1      SN-2   SN-3
Cold Cache   12,867x   12x    11x
Warm Cache   44,794x   18x    14x
30. BudgetMatch
DOGMA Index
[Figure: DOGMA index, relating graph locality to disk pages, illustrated on an example graph of bills, amendments, sponsors and terms of office.]
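31. BudgetMatch
COSI Architecture
[Figure: COSI architecture. A client loads graph data, sends a query (example pattern over A, B, C and the variables ?X, ?Y, ?Z) and receives the results. The system partitions the graph and distributes the data automatically, dispatches and forwards the query, and exchanges data to answer it, hiding this complexity from the client.]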
32. BudgetMatch
COSI Partitioning
Key Theorem
Suppose vertex retrieval and inter-node communication are uniform across storage nodes. The partition of the DB that minimizes query execution time coincides with the partition that minimizes edge cut cost in the graph (V, V×V) with weight function w(u,v) = (E(u,v)) + (E(v,u)).
SO MIN EDGE-CUT IN COMPLETE GRAPHS IS CLOSELY RELATED TO MINIMIZING QUERY EXECUTION TIME.
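As a small worked illustration of the edge cut cost that the theorem refers to, the sketch below sums the weights of all vertex pairs that a partition places on different storage nodes. The function name `edge_cut_cost`, the integer weights, and the two candidate partitions are hypothetical; in COSI the weights come from the weight function w, which this example does not compute.

```python
def edge_cut_cost(weights, assignment):
    """Sum the weights of all vertex pairs placed on different storage nodes.

    `weights` maps an unordered vertex pair (u, v) to w(u, v);
    `assignment` maps each vertex to the storage node that holds it.
    """
    return sum(
        w for (u, v), w in weights.items()
        if assignment[u] != assignment[v]
    )

# Hypothetical weights on the complete graph over three vertices.
weights = {("a", "b"): 9, ("a", "c"): 1, ("b", "c"): 2}

split_ab = {"a": 0, "b": 1, "c": 1}  # cuts the pairs a-b and a-c
keep_ab = {"a": 0, "b": 0, "c": 1}   # cuts the pairs a-c and b-c

print(edge_cut_cost(weights, split_ab))
print(edge_cut_cost(weights, keep_ab))
```

Under these made-up weights, keeping the heavily weighted pair a-b on the same storage node gives the lower cut cost, which is the partition the theorem would favor.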
33. BudgetMatch
Further Information
COSI: Cloud Oriented Subgraph Identification in Massive Social Networks
Matthias Bröcheler, Andrea Pugliese and V.S. Subrahmanian, The 2010 International Conference on Advances in Social Networks Analysis and Mining
- Patent Pending -
DOGMA: A Disk-Oriented Graph Matching Algorithm
Matthias Bröcheler, Andrea Pugliese and V.S. Subrahmanian, Proceedings of the 8th International Semantic Web Conference
- Patent Pending -
35. BudgetMatch
Conclusion
Dynamic cost models are beneficial for networks with heavy-tailed degree distributions
Developed the BudgetMatch query answering algorithm, which dynamically updates its cost estimates during execution
BudgetMatch yields huge improvements over standard static approaches for some queries