We consider the problem of counting motifs in bipartite affiliation networks, such as author-paper, user-product, and actor-movie relations. We focus on counting the number of occurrences of a "butterfly", a complete 2 × 2 biclique, the simplest cohesive higher-order structure in a bipartite graph. Our main contribution is a suite of randomized algorithms that can quickly approximate the number of butterflies in a graph with a provable guarantee on accuracy. An experimental evaluation on large real-world networks shows that our algorithms return accurate estimates within a few seconds, even for networks with trillions of butterflies and hundreds of millions of edges.
2. Bipartite Network Definition
• A graph G = (V, E) is called a bipartite graph if V can be partitioned into two disjoint subsets L and R such that E ⊆ L × R, i.e., every edge connects a vertex in L to a vertex in R.
[Example figure: a bipartite graph with partition L = {1, 2, 3} and partition R = {x, y, w, z}]
3. Applications of Bipartite Networks (I)
• Web Search [He et al., '17]:
  o L is the set of users.
  o R is the set of web pages.
  o E is the set of user actions ("clicks").
4. Applications of Bipartite Networks (II)
• Tag Recommendation [Song et al., '08]:
  o L is the set of documents.
  o R is the set of hashtags.
  o There is an edge if the hashtag appears in the document's content.
5. Characteristics of Bipartite Networks
• Average Degree
• Diameter
• Betweenness Centrality
• Maximum Cardinality Matching
• Motif Count, e.g., Butterfly Count
7. Properties of a Butterfly
• The butterfly is the smallest non-trivial unit of cohesion.
• The butterfly is the smallest structure with multiple vertices on each side.
• The butterfly is the smallest cycle in a bipartite graph.
[Figure: a butterfly on vertices a, b, c, d]
8. Problem Statement
• Input: a bipartite graph G = (V = (L, R), E).
• Goal: compute the number of butterflies in G.
• Challenges:
  o Graphs are very large (|E| ≥ 3 × 10⁸, |V| ≥ 3 × 10⁷).
  o There may be trillions of butterflies.
  o Exact counting algorithms are expensive.
• Can we find the number of butterflies fast?
9. Butterfly Applications (I)
• Metamorphosis Coefficient [Aksoy et al., '17]:
  Let ⋈ be the number of butterflies in a bipartite graph, and let P be the number of caterpillars (paths of three edges). Then
  c = 4⋈ / P.
• It is a building block for creating community structure.
[Figure: a caterpillar on vertices 1, 2, x, y]
10. Butterfly Applications (II)
• Bipartite graph decomposition [Sarıyüce and Pinar, '18]:
  A subgraph is called a k-tip iff:
  o the subgraph is connected;
  o each vertex in the subgraph participates in at least k butterflies;
  o the subgraph is maximal.
11. Related Work (I)
• Matrix multiplication via arithmetic progressions [Coppersmith and Winograd, '87]
  o Exact algorithm with O(n^2.376) time complexity.
• Doulion: counting triangles in massive graphs with a coin [Tsourakakis et al., '09]
• Colorful triangle counting and a MapReduce implementation [Pagh and Tsourakakis, '12]
• Counting and sampling triangles from a graph stream [Pavan et al., '13]
• Triangle counting in the streaming setting [Stefani et al., '17]
[Figure: a small graph on vertices 1–5]
12. Related Work (II)
• Short cycle counting in bipartite graphs [Halford and Chugg, '06]
  o An O(gn³) algorithm, where n = max{|L|, |R|} and g is the girth of the graph.
• Butterfly counting in bipartite graphs [Wang et al., '14]
  o Exact algorithm to count butterflies in a bipartite network.
  o We reduce its time complexity.
• No prior randomized algorithm counts butterflies in bipartite networks.
  o We propose several randomized algorithms to count butterflies.
13. Exact Butterfly Counting
• Exact Butterfly Counting (ExactBFC):
  Goal: count the number of butterflies in a bipartite graph G = (V = (L, R), E).
  Method:
  o Iterate through all vertices in one partition.
  o For each vertex, count the butterflies containing it, and add the counts up.
  o Do not enumerate all butterflies.
  o Do not count any butterfly twice.
14. Exact Butterfly Counting
Algorithm ExactBFC:
[1] Input: graph G = (V = (L, R), E)
[2] Output: ⋈, the exact number of butterflies.
[3] A ← L, ⋈ ← 0.
[4] For every vertex v ∈ A:
[5]   C ← hashmap, initialized with zeros.
[6]   For every neighbor u of v:
[7]     For every neighbor w of u s.t. ID(w) > ID(v):
[8]       C[w] ← C[w] + 1
[9]   For every w ∈ C s.t. C[w] > 0:
[10]    ⋈ ← ⋈ + (C[w] choose 2)
[11] Return ⋈
[Example figure: a bipartite graph with partition L = {1, 2, 3} and partition R = {x, y, t, z}]
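The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not the authors' optimized implementation; the dictionary adjacency-list format and comparable left-vertex IDs are assumptions made for the example.

```python
from collections import defaultdict
from math import comb

def exact_bfc(adj_L, adj_R):
    """Count butterflies exactly, iterating over the L partition.

    adj_L maps each left vertex to its right-side neighbors; adj_R is the
    reverse map. Left-vertex IDs are assumed comparable (e.g. integers).
    """
    total = 0
    for v in adj_L:
        C = defaultdict(int)        # C[w] = # common neighbors of v and w
        for u in adj_L[v]:          # u: right-side neighbor of v
            for w in adj_R[u]:      # w: left-side neighbor of u
                if w > v:           # visit each pair (v, w) once
                    C[w] += 1
        # every pair of common neighbors of (v, w) closes one butterfly
        for c in C.values():
            total += comb(c, 2)
    return total

# The 2x2 biclique on {1, 2} and {'x', 'y'} is exactly one butterfly.
adj_L = {1: ['x', 'y'], 2: ['x', 'y']}
adj_R = {'x': [1, 2], 'y': [1, 2]}
print(exact_bfc(adj_L, adj_R))  # -> 1
```

Note that the `w > v` check is what prevents double counting: each butterfly is charged to its lower-ID left vertex.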
15–23. Exact Butterfly Counting (worked example)
Slides 15–23 trace ExactBFC on the example graph (L = {1, 2, 3}, R = {x, y, t, z}) for the first vertex v = 1:
o Line [3]: A ← L. Line [5]: C = {}.
o Lines [6]–[8]: visiting each neighbor u of v and each neighbor w of u with ID(w) > ID(v), the hashmap grows step by step: {⟨2, 1⟩} → {⟨2, 2⟩} → {⟨2, 3⟩, ⟨3, 1⟩} → {⟨2, 4⟩, ⟨3, 2⟩}.
o Lines [9]–[10]: with C = {⟨2, 4⟩, ⟨3, 2⟩}, the count ⋈ increases by (4 choose 2) + (2 choose 2) = 7 butterflies involving v = 1.
24. Time Complexity of ExactBFC
• The time complexity of the algorithm of [Wang et al., '14] is O(Σ_{v∈R} d_v²).
• Our exact algorithm requires O(min{Σ_{v∈L} d_v², Σ_{v∈R} d_v²}).
28. Exact Counting Algorithms are Slow
• What if min{Σ_{v∈L} d_v², Σ_{v∈R} d_v²} is very large (e.g., ≥ 10¹⁰)?
• Do we really need the exact number of butterflies?
• What if we estimate the number of butterflies instead?
• Our randomized algorithms are fast.
• We prove guarantees on the accuracy of the algorithms.
29. Randomized Algorithm
• Edge Sampling:
  o Sample an edge.
  o Estimate the number of butterflies in the graph based on the sampled edge.
• Repeat the edge sampling and take the average of all estimates.
  o This increases the accuracy.
30. Edge Sampling
Algorithm ESamp:
[1] Input: graph G = (V = (L, R), E)
[2] Output: estimate of the number of butterflies.
[3] Choose an edge e ∈ E uniformly at random.
[4] ⋈_e ← the number of butterflies containing e.
[5] Return (|E| / 4) · ⋈_e
[Example figure: a bipartite graph with partition L = {1, 2, 3} and partition R = {x, y, w, z}]
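ESamp, combined with the repeat-and-average step from slide 29, might look like this in Python. The brute-force per-edge counter is an illustrative stand-in (the function names and edge-list format are assumptions, not the authors' code):

```python
import random

def butterflies_on_edge(edges, e):
    """Brute-force count of butterflies containing edge e = (v, u).

    A butterfly on (v, u) needs a second left vertex w and a second right
    vertex x with all four edges present. Slow, but fine for a sketch.
    """
    E = set(edges)
    v, u = e
    left = {a for a, _ in E}
    right = {b for _, b in E}
    count = 0
    for w in left - {v}:
        for x in right - {u}:
            if {(v, x), (w, u), (w, x)} <= E:
                count += 1
    return count

def esamp(edges, reps=1000, seed=0):
    """Average of `reps` independent ESamp estimates of the butterfly count."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        e = rng.choice(edges)                       # line [3]
        bf_e = butterflies_on_edge(edges, e)        # line [4]
        total += (len(edges) / 4) * bf_e            # line [5]
    return total / reps

# The 2x2 biclique has exactly 1 butterfly; every edge lies on it, so each
# single-edge estimate is (4/4)*1 = 1 and the average is exact.
edges = [(1, 'x'), (1, 'y'), (2, 'x'), (2, 'y')]
print(esamp(edges, reps=100))  # -> 1.0
```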
34. ESamp – Expectation Analysis
[Left panel: Algorithm ESamp, as on slide 30.]
Lemma: Let Y be the random variable returned by ESamp. Then E[Y] = ⋈.
Proof:
Let X_i be an indicator random variable that is one if the i-th butterfly contains the sampled edge e, and zero otherwise (1 ≤ i ≤ ⋈). Let X = Σ_{i=1}^{⋈} X_i.
Since each butterfly contains exactly four of the |E| edges, Pr[X_i = 1] = 4 / |E|.
Hence E[X] = Σ_{i=1}^{⋈} E[X_i] = Σ_{i=1}^{⋈} Pr[X_i = 1] = (4 / |E|) · ⋈.
Since Y = X · |E| / 4, it follows that E[Y] = ⋈. ∎
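The lemma can be sanity-checked deterministically on a toy graph: averaging the estimator over all edges computes E[Y] exactly, and it matches ⋈ precisely because each butterfly is counted once per each of its four edges. The graph and helper below are illustrative assumptions:

```python
def bf_on_edge(E, v, u):
    # Butterflies containing edge (v, u), by brute force: every other
    # edge (w, x) in E closes a butterfly iff (v, x) and (w, u) exist too.
    return sum(1 for w, x in E
               if w != v and x != u and (v, x) in E and (w, u) in E)

# Toy graph: K_{2,3} plus a pendant edge (3, 'z'); K_{2,3} holds 3 butterflies.
E = {(1, 'x'), (1, 'y'), (1, 'z'), (2, 'x'), (2, 'y'), (2, 'z'), (3, 'z')}

# Sum of per-edge butterfly counts is 4*bowtie (each butterfly has 4 edges).
bf = sum(bf_on_edge(E, v, u) for v, u in E) / 4

# E[Y]: uniform edge choice, estimator (|E|/4) * bf_on_edge.
expected_Y = sum((len(E) / 4) * bf_on_edge(E, v, u) for v, u in E) / len(E)

print(bf, expected_Y)  # -> 3.0 3.0
```

The pendant edge (3, 'z') contributes nothing to any butterfly, yet the expectation still comes out exact, since the |E|/4 scaling accounts for zero-count edges on average.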
41. Edge Sampling Performance
• The error rate is high. What is the reason?
• Exactly counting the butterflies on a sampled edge is very slow,
• so only a few edges are sampled within the time budget,
• and the accuracy is low.
42. Number of Sampled Edges
Graphs        |L|     |R|     |E|    Σ_{l∈L} d_l²  Σ_{r∈R} d_r²  ⋈
Delicious     833K    33.7M   101M   85.9B         52.7B         56.8B
Orkut         2.7M    8.7M    327M   156M          4.9T          22.1T
WebTrackers   27.6M   12.7M   140M   1.7T          211T          20T
[Chart (log scale): number of edges ESamp samples in 60 seconds vs. |E| — Delicious 22,550; Orkut 980; WebTrackers 360 — a tiny fraction of the hundreds of millions of edges.]
43. Fast Edge Sampling
Algorithm Fast ESamp:
[1] Input: graph G = (V = (L, R), E)
[2] Output: estimate of the number of butterflies.
[3] Choose an edge e ∈ E uniformly at random.
[4] ⋈_e ← Fast EBFC(e).
[5] Return (|E| / 4) · ⋈_e
[Example figure: a bipartite graph with partition L = {1, 2, 3} and partition R = {x, y, w, z}]
44. Fast Local Butterfly Counting
Algorithm Fast EBFC:
[1] Input: sampled edge e = ⟨v, u⟩
[2] Output: estimate of the number of butterflies containing e.
[3] Choose a neighbor w of vertex u uniformly at random.
[4] Choose a neighbor x of vertex v uniformly at random.
[5] If (u, v, w, x) forms a butterfly: β ← 1.
[6] Else: β ← 0.
[7] Return β · d_u · d_v
Comment: d_v is the degree of vertex v.
[Example figure: a bipartite graph with partition L = {1, 2, 3} and partition R = {x, y, w, z}]
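The pseudocode above can be sketched in Python roughly as follows (the adjacency-map format is an assumption). The estimator is unbiased for ⋈_e: each pair (w, x) is drawn with probability 1 / (d_u · d_v), and scaling by d_u · d_v makes the expectation equal the number of pairs that close a butterfly.

```python
import random

def fast_ebfc(e, adj, rng=random):
    """One-sample estimate of the number of butterflies containing edge e.

    adj maps every vertex (on either side) to the list of its neighbors.
    """
    v, u = e
    w = rng.choice(adj[u])   # random neighbor of u (same side as v)
    x = rng.choice(adj[v])   # random neighbor of v (same side as u)
    # (v, u, w, x) forms a butterfly iff w != v, x != u, and edge (w, x) exists
    beta = 1 if w != v and x != u and x in adj[w] else 0
    return beta * len(adj[u]) * len(adj[v])

# On the 2x2 biclique, edge (1, 'x') lies on exactly one butterfly, so the
# single-sample estimates (each 0 or 4) average to about 1.
rng = random.Random(0)
adj = {1: ['x', 'y'], 2: ['x', 'y'], 'x': [1, 2], 'y': [1, 2]}
est = sum(fast_ebfc((1, 'x'), adj, rng) for _ in range(4000)) / 4000
```

Each call does O(1) work (plus one membership test), which is why many more edges can be sampled per second than with the exact per-edge count of ESamp.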
45. Fast ESamp – Variance Analysis
• Let Y be the value returned by Fast ESamp. Then
  Var[Y] ≤ (|E| / 4) · (⋈ + PE),
  where PE is the number of pairs of butterflies sharing a common edge.
• Let Z be the average of α = 8|E|(1 + PE / ⋈) / (ε²⋈) independent instances of Y. Using Chebyshev's inequality:
  Pr[|Z − ⋈| ≥ ε⋈] ≤ Var[Z] / (ε²⋈²) = Var[Y] / (αε²⋈²) ≤ |E|(⋈ + PE) / (4αε²⋈²) = 1/32.
[Figure: a pair of butterflies sharing a common edge, on vertices {1, 2, 3} and {x, y, w}.]
46. Performance of Fast ESamp
Graph: Delicious
|L| = 833K, |R| = 33.7M, |E| = 101M
Σ_{l∈L} d_l² = 85.9B, Σ_{r∈R} d_r² = 52.7B, ⋈ = 56.8B
# of sampled edges after 60s: 355K
Exact-algorithm runtimes: Wang et al. 311 secs; ExactBFC 202 secs.
[Chart: Error(%) = |Estimate − Exact| / Exact × 100 vs. number of sampled edges, annotated at 120K sampled edges.]
47. Performance of Fast ESamp
Graph: Orkut
|L| = 2.7M, |R| = 8.7M, |E| = 327M
Σ_{l∈L} d_l² = 156M, Σ_{r∈R} d_r² = 4.9T, ⋈ = 22.1T
# of sampled edges after 60s: 125K
Exact-algorithm runtimes: Wang et al. 15281 secs; ExactBFC 423 secs.
[Chart: Error(%) = |Estimate − Exact| / Exact × 100 vs. number of sampled edges.]
48. Performance of Fast ESamp
Graph: WebTracker
|L| = 27.6M, |R| = 12.7M, |E| = 140M
Σ_{l∈L} d_l² = 1.7T, Σ_{r∈R} d_r² = 211T, ⋈ = 20T
# of sampled edges after 60s: 125K
Exact-algorithm runtimes: Wang et al. 4·10⁴ secs; ExactBFC 9165 secs.
[Chart: Error(%) = |Estimate − Exact| / Exact × 100 vs. number of sampled edges.]
50. Resources
Butterfly counting implementation:
https://github.com/beginner1010/butterfly-counting
Our paper at KDD:
Seyed-Vahid Sanei-Mehri, Ahmet Erdem Sarıyüce, and Srikanta Tirthapura. 2018. Butterfly Counting in Bipartite Networks. In KDD '18: The 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, August 19–23, 2018, London, United Kingdom.
The long version of the paper on arXiv:
Sanei-Mehri, Seyed-Vahid, Ahmet Erdem Sarıyüce, and Srikanta Tirthapura. "Butterfly Counting in Bipartite Networks." arXiv preprint arXiv:1801.00338 (2017).
51. References (I)
• He, Xiangnan, Ming Gao, Min-Yen Kan, and Dingxian Wang. "Birank: Towards ranking on bipartite graphs." IEEE Transactions on
Knowledge and Data Engineering 29, no. 1 (2017): 57-71.
• Song, Yang, Ziming Zhuang, Huajing Li, Qiankun Zhao, Jia Li, Wang-Chien Lee, and C. Lee Giles. "Real-time automatic tag
recommendation." In Proceedings of the 31st annual international ACM SIGIR conference on Research and development in
information retrieval, pp. 515-522. ACM, 2008.
• Aksoy, Sinan G., Tamara G. Kolda, and Ali Pinar. "Measuring and modeling bipartite graphs with community structure." Journal of
Complex Networks 5, no. 4 (2017): 581-603.
• Sarıyüce, Ahmet Erdem, and Ali Pinar. "Peeling bipartite networks for dense subgraph discovery." In Proceedings of the Eleventh
ACM International Conference on Web Search and Data Mining, pp. 504-512. ACM, 2018.
• Coppersmith, Don, and Shmuel Winograd. "Matrix multiplication via arithmetic progressions." In Proceedings of the nineteenth
annual ACM symposium on Theory of computing, pp. 1-6. ACM, 1987.
• Tsourakakis, Charalampos E., U. Kang, Gary L. Miller, and Christos Faloutsos. "Doulion: counting triangles in massive graphs with
a coin." In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 837-846.
ACM, 2009.
• Pagh, Rasmus, and Charalampos E. Tsourakakis. "Colorful triangle counting and a mapreduce implementation." Information
Processing Letters 112, no. 7 (2012): 277-281.
52. References (II)
• Pavan, Aduri, Kanat Tangwongsan, Srikanta Tirthapura, and Kun-Lung Wu. "Counting and sampling triangles from a graph
stream." Proceedings of the VLDB Endowment 6, no. 14 (2013): 1870-1881.
• Stefani, Lorenzo De, Alessandro Epasto, Matteo Riondato, and Eli Upfal. "Triest: Counting local and global triangles in fully
dynamic streams with fixed memory size." ACM Transactions on Knowledge Discovery from Data (TKDD) 11
• Halford, Thomas R., and Keith M. Chugg. "An algorithm for counting short cycles in bipartite graphs." IEEE Transactions on
Information Theory 52, no. 1 (2006): 287-292.
• Wang, Jia, Ada Wai-Chee Fu, and James Cheng. "Rectangle counting in large bipartite graphs." In Big Data (BigData Congress),
2014 IEEE International Congress on, pp. 17-24. IEEE, 2014.
53. Open Problems
• Can we count larger motifs suited to bipartite networks?
• Can we count larger graphlets in bipartite networks, e.g., (a, b)-bicliques instead of (2, 2)-bicliques?
• Can we adapt our algorithms to the streaming setting?
54. Time Complexity of ExactBFC (I)
• If A ← L, the time complexity of ExactBFC is O(Σ_{v∈R} d_v²): each right-side vertex u is reached once from each of its d_u neighbors in L, and each such visit scans u's d_u neighbors, for Σ_{u∈R} d_u² total work.
55. Time Complexity of ExactBFC (II)
Algorithm ExactBFC (with side selection):
[1] Input: graph G = (V = (L, R), E)
[2] Output: ⋈, the exact number of butterflies.
[3] A ← L, ⋈ ← 0.
[4] If Σ_{v∈L} d_v² < Σ_{v∈R} d_v²: A ← R
[5] For every vertex v ∈ A:
[6]   C ← hashmap, initialized with zeros.
[7]   For every neighbor u of v:
[8]     For every neighbor w of u s.t. ID(w) > ID(v):
[9]       C[w] ← C[w] + 1
[10]  For every w ∈ C s.t. C[w] > 0:
[11]    ⋈ ← ⋈ + (C[w] choose 2)
[12] Return ⋈
• The complexity improves from O(Σ_{v∈R} d_v²) to O(min{Σ_{v∈L} d_v², Σ_{v∈R} d_v²}).
[Example figure: a bipartite graph with partition L = {1, 2, 3} and partition R = {x, y, w, z}]
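Line [4]'s optimization is just a comparison of the two squared-degree sums; a minimal sketch (the function name and adjacency format are assumptions). Note the cross-over: iterating over a side costs the squared-degree sum of the opposite side, so we iterate over the side opposite the larger sum.

```python
def choose_iteration_side(adj_L, adj_R):
    """Return which partition ExactBFC should iterate over ('L' or 'R').

    Iterating over L costs O(sum of d_v^2 over R), and vice versa, so
    per line [4] we switch to R when L's squared-degree sum is smaller.
    """
    sq_L = sum(len(nbrs) ** 2 for nbrs in adj_L.values())
    sq_R = sum(len(nbrs) ** 2 for nbrs in adj_R.values())
    return 'R' if sq_L < sq_R else 'L'

# One high-degree hub on the right makes sq_R huge; iterating over R
# instead pays only the small left-side squared-degree sum.
adj_L = {i: ['hub'] for i in range(100)}   # 100 degree-1 left vertices
adj_R = {'hub': list(range(100))}          # one degree-100 right vertex
print(choose_iteration_side(adj_L, adj_R))  # -> 'R'
```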
Editor's Notes
• Web search: what are users' perspectives, based on their clicks on different web pages?
• High coefficients indicate a relatively large number of butterflies, i.e., cohesion or community structure; this is rare in sparse random graphs unless some behavior goes beyond the degree structure alone. The butterfly is also adopted in recent work by Aksoy et al. to generate bipartite graphs with community structure.
• Protein–protein interaction (PPI) networks model cell signaling, both stable and transient. One can experimentally determine which proteins bind to each other; connected subgraphs then reveal which proteins are functionally related. Because the interaction patterns are heterogeneous, generic ML algorithms are a poor fit.