SlideShare uma empresa Scribd logo
1 de 55
Seyed-Vahid Sanei-Mehri
Ahmet Erdem Sariyuce
Srikanta Tirthapura
Summer 2018
Bipartite Network Definition
2
• A graph G = (V, E) is called a bipartite graph if:
 V can be partitioned into two disjoint subsets L and R such that E ⊆ L × R, i.e. Every edge
connects a vertex in L and a vertex in R.
1
x
y
w
z
2
3
Partition
L
Partition R
Applications of Bipartite Networks (I)
• Web Search [He et. al, ‘17]:
 The set L is set of users.
 The set R is set of web pages.
 E is user actions (“clicks”).
3
Partition
L
Partition R
Applications of Bipartite Networks (II)
• Tag Recommendation [Song et. al, ‘08]:
 The set L is set of documents.
 The set R is set of hash tags.
 If hashtag exists in the content,
there is an edge.
4
Partition
L
Partition R
Characteristics of Bipartite Networks
5
• Average Degree
• Diameter
• Betweenness Centrality
• Maximum Cardinality Matching
• Motif Count, e.g. Butterfly Count
Butterfly Definition
6
• Butterfly is a 2,2-biclique.
1
x
y
w
z
2
3 wz
23
Properties of a Butterfly
7
• Butterfly is the smallest non-trivial unit of cohesion.
• Butterfly is the smallest structure with multiple vertices at each
side.
• Butterfly is the smallest cycle in the bipartite graph.
dc
ba
Problem Statement
8
• Input: We are given a bipartite graph G = V = L, R , E
• Goal: We need to compute the number of butterflies in G.
• Challenges:
• Graphs are very large ( 𝐸 ≥ 3 × 108
, V ≥ 3 × 107
)
• There might be trillions of butterflies.
• Exact counting algorithm is expensive.
• Can we find the number of butterflies fast?
Butterfly Applications (I)
9
• Metamorphosis Coefficient [AKSOY et. al, ‘17]
 Let ⋈ be the number of butterflies in a bipartite graph.
 Let be the number of caterpillars.
𝐜 =
𝟒 ⋈
• It is a building block for creating community structure.
1
x
y
2
Butterfly Applications (II)
10
• Bipartite graphs decomposition: [Sarıyüce and Pinar, ‘18]
 A subgraph is called K-tip iff:
 The subgraph is connected.
 Each vertex in the subgraph has at least k number of butterflies.
 Subgraph is maximal.
Related Work (I)
11
• Matrix multiplication via arithmetic progressions [Copper Smith et. al, ’87]
• Exact algorithm with O(n2.371
) time complexity
• Doulion: counting triangles in massive graphs with a coin [Tsourakakis et. al, ’09]
• Colorful triangle counting and a mapreduce implementation [Pagh and Tsourakakis,
’12]
• Counting and sampling triangles from a graph stream [Pavan et. al, ‘13]
• Triangle counting in streaming setting [Stefani et. al, ‘17]
5
4
1
2
3
Related Work (II)
12
• Short cycle counting in bipartite graphs [Halford and Chugg et. al, ‘06]
• Proposed O(gn3
), where n = max L , R , and g is the girth of graph.
• Butterfly counting in bipartite graphs [Wang et. al, ‘14]
• Exact algorithm to count butterflies in a bipartite network.
• We reduced their time complexity.
• No randomized algorithm to count butterflies in bipartite networks
• We proposed several randomized algorithms to count butterflies.
Exact Butterfly Counting
13
• Exact Butterfly Counting (ExactBFC):
 Goal: count the number of butterflies in a bipartite graph G = V = L, R , E .
 Method:
oIterate thorough all vertices in one partition.
oCount the number of butterflies containing the vertices and add them up.
oDo not enumerate all butterflies.
oDo not count a butterfly twice.
Exact Butterfly Counting
14
Lines Algorithm::ExactBFC Example
[1] Input: graph G = (V = L, R , E)
[2] Output: ⋈, exact number of butterflies.
[3] A ← L, ⋈← 0.
[4] For every vertex v ∈ A:
[5] C ← hashmap, initialized with zero
[6] For every neighbor u of v:
[7] For every neighbor w of u s.t. IDw > IDv:
[8] C w ← C w + 1
[9] For every w ∈ C s.t. C x > 0:
[10] ⋈ ← ⋈ +
C x
2
[11] Return ⋈
1
x
y
t
z
2
3
Partition
L
Partition R
Exact Butterfly Counting
Lines Algorithm::ExactBFC Example
[1] Input: graph G = (V = L, R , E)
[2] Output: ⋈, exact number of butterflies.
[3] A ← L, ⋈← 0.
[4] For every vertex v ∈ A:
[5] C ← hashmap, initialized with zero
[6] For every neighbor u of v:
[7] For every neighbor w of u s.t. IDw > IDv:
[8] C w ← C w + 1
[9] For every w ∈ C s.t. C x > 0:
[10] ⋈ ← ⋈ +
C x
2
[11] Return ⋈
1
x
y
t
z
2
3
A ← L
15
Exact Butterfly Counting
Lines Algorithm::ExactBFC Example
[1] Input: graph G = (V = L, R , E)
[2] Output: ⋈, exact number of butterflies.
[3] A ← L, ⋈← 0.
[4] For every vertex v ∈ A:
[5] C ← hashmap, initialized with zero
[6] For every neighbor u of v:
[7] For every neighbor w of u s.t. IDw > IDv:
[8] C w ← C w + 1
[9] For every w ∈ C s.t. C x > 0:
[10] ⋈ ← ⋈ +
C x
2
[11] Return ⋈
1
x
y
t
z
2
3
Vertex v
A
16
Exact Butterfly Counting
Lines Algorithm::ExactBFC Example
[1] Input: graph G = (V = L, R , E)
[2] Output: ⋈, exact number of butterflies.
[3] A ← L, ⋈← 0.
[4] For every vertex v ∈ A:
[5] C ← hashmap, initialized with zero
[6] For every neighbor u of v:
[7] For every neighbor w of u s.t. IDw > IDv:
[8] C w ← C w + 1
[9] For every w ∈ C s.t. C x > 0:
[10] ⋈ ← ⋈ +
C x
2
[11] Return ⋈
1
x
y
t
z
2
3
Vertex v
hashmap C = { }
17
Exact Butterfly Counting
Lines Algorithm::ExactBFC Example
[1] Input: graph G = (V = L, R , E)
[2] Output: ⋈, exact number of butterflies.
[3] A ← L, ⋈← 0.
[4] For every vertex v ∈ A:
[5] C ← hashmap, initialized with zero
[6] For every neighbor u of v:
[7] For every neighbor w of u s.t. IDw > IDv:
[8] C w ← C w + 1
[9] For every w ∈ C s.t. C x > 0:
[10] ⋈ ← ⋈ +
C x
2
[11] Return ⋈
1
x
y
t
z
2
3
Vertex v
hashmap C = { }
Vertex u
18
Exact Butterfly Counting
Lines Algorithm::ExactBFC Example
[1] Input: graph G = (V = L, R , E)
[2] Output: ⋈, exact number of butterflies.
[3] A ← L, ⋈← 0.
[4] For every vertex v ∈ A:
[5] C ← hashmap, initialized with zero
[6] For every neighbor u of v:
[7] For every neighbor w of u s.t. IDw > IDv:
[8] C w ← C w + 1
[9] For every w ∈ C s.t. C x > 0:
[10] ⋈ ← ⋈ +
C x
2
[11] Return ⋈
1
x
y
t
z
2
3
Vertex v
hashmap C = {〈′2′, 1〉}
Vertex u
Vertex w
hashmap C = { }
19
Exact Butterfly Counting
Lines Algorithm::ExactBFC Example
[1] Input: graph G = (V = L, R , E)
[2] Output: ⋈, exact number of butterflies.
[3] A ← L, ⋈← 0.
[4] For every vertex v ∈ A:
[5] C ← hashmap, initialized with zero
[6] For every neighbor u of v:
[7] For every neighbor w of u s.t. IDw > IDv:
[8] C w ← C w + 1
[9] For every w ∈ C s.t. C x > 0:
[10] ⋈ ← ⋈ +
C x
2
[11] Return ⋈
1
x
y
t
z
3
Vertex v
Vertex u
hashmap C = {〈′2′, 1〉}
2
Vertex w
hashmap C = {〈′2′, 2〉}
20
Exact Butterfly Counting
Lines Algorithm::ExactBFC Example
[1] Input: graph G = (V = L, R , E)
[2] Output: ⋈, exact number of butterflies.
[3] A ← L, ⋈← 0.
[4] For every vertex v ∈ A:
[5] C ← hashmap, initialized with zero
[6] For every neighbor u of v:
[7] For every neighbor w of u s.t. IDw > IDv:
[8] C w ← C w + 1
[9] For every w ∈ C s.t. C x > 0:
[10] ⋈ ← ⋈ +
C x
2
[11] Return ⋈
1
x
y
t
z
2
3
Vertex v
Vertex u
hashmap C = {〈′2′, 3〉, 〈′3′, 1〉}hashmap C = {〈′2′, 2〉}
21
Exact Butterfly Counting
Lines Algorithm::ExactBFC Example
[1] Input: graph G = (V = L, R , E)
[2] Output: ⋈, exact number of butterflies.
[3] A ← L, ⋈← 0.
[4] For every vertex v ∈ A:
[5] C ← hashmap, initialized with zero
[6] For every neighbor u of v:
[7] For every neighbor w of u s.t. IDw > IDv:
[8] C w ← C w + 1
[9] For every w ∈ C s.t. C x > 0:
[10] ⋈ ← ⋈ +
C x
2
[11] Return ⋈
1
x
y
t
z
2
3
Vertex v
Vertex u
hashmap C = {〈′2′, 3〉, 〈′3′, 1〉}hashmap C = {〈′2′, 4〉, 〈′3′, 2〉}
22
Exact Butterfly Counting
Lines Algorithm::ExactBFC Example
[1] Input: graph G = (V = L, R , E)
[2] Output: ⋈, exact number of butterflies.
[3] A ← L, ⋈← 0.
[4] For every vertex v ∈ A:
[5] C ← hashmap, initialized with zero
[6] For every neighbor u of v:
[7] For every neighbor w of u s.t. IDw > IDv:
[8] C w ← C w + 1
[9] For every w ∈ C s.t. C x > 0:
[10] ⋈ ← ⋈ +
C x
2
[11] Return ⋈
1
x
y
t
z
2
3
Vertex v
hashmap C = {〈′2′, 4〉, 〈′3′, 2〉}
23
Time Complexity of ExactBFC
24
• The time Complexity proposed by [Wang et. al, ‘14] is O v∈R dv
2 .
• Our exact algorithm requires O min{ v∈L dv
2 , v∈R dv
2}
Performance of ExactBFC
25
Graphs |L| |R| |E|
𝒍∈𝑳
𝒅𝒍
𝟐
𝒓∈𝑹
𝒅 𝒓
𝟐
⋈
Delicious 833K 33.7M 101M 85.9B 52.7B 56.8B
Orkut 2.7M 8.7M 327M 156M 4.9T 22.1T
WebTrackers 27.6M 12.7M 140M 1.7T 211T 20T
202.8
432.8
9165.7
311.3
15281.9
1
10
100
1000
10000
100000
1000000
10000000
Delicious Orkut Web trackers
TIME(SEC)
GRAPHS
ExactBFC
ExactBFC Wang et. al
Performance of ExactBFC
Graphs |L| |R| |E|
𝒍∈𝑳
𝒅𝒍
𝟐
𝒓∈𝑹
𝒅 𝒓
𝟐
⋈
Delicious 833K 33.7M 101M 85.9B 52.7B 56.8B
Orkut 2.7M 8.7M 327M 156M 4.9T 22.1T
WebTrackers 27.6M 12.7M 140M 1.7T 211T 20T
202.8
432.8
9165.7
311.3
15281.9
1
10
100
1000
10000
100000
1000000
10000000
Delicious Orkut Web trackers
TIME(SEC)
GRAPHS
ExactBFC
ExactBFC Wang et. al
26
Performance of ExactBFC
Graphs |L| |R| |E|
𝒍∈𝑳
𝒅𝒍
𝟐
𝒓∈𝑹
𝒅 𝒓
𝟐
⋈
Delicious 833K 33.7M 101M 85.9B 52.7B 56.8B
Orkut 2.7M 8.7M 327M 156M 4.9T 22.1T
WebTrackers 27.6M 12.7M 140M 1.7T 211T 20T
202.8
432.8
9165.7
311.3
15281.9
1
10
100
1000
10000
100000
1000000
10000000
Delicious Orkut Web trackers
TIME(SEC)
GRAPHS
ExactBFC
ExactBFC Wang et. al
27
Exact Counting Algorithms are Slow
28
• What if O min{ v∈L dv
2 , v∈R dv
2} is very high (e.g. ≥ 1010)
• Do we really need the exact number of butterflies?
• What if we estimate the number of butterflies?
• Our randomized algorithms are fast.
• We proposed provable guarantees on the accuracy of algorithms.
Randomized Algorithm
• Edge Sampling
o Sample an edge.
o Estimate number of butterflies in the graph based on the sampled edge.
• Repeat sampling of edges and finally take the average of all estimates.
o This will increase the accuracy
29
Edge Sampling
30
Lines Algorithm:: ESamp Example
[1] Input: graph G = (V = L, R , E)
[2] Output: estimate of number of butterflies.
[3] Choose an edge 𝑒 ∈ 𝐸 uniformly at random.
[4] ⋈e ← Count the number of butterflies containing 𝑒.
[5] Return
|𝐸|
4
⋈e
1
x
y
w
z
2
3
Partition
L
Partition R
Edge Sampling
Lines Algorithm:: ESamp Example
[1] Input: graph G = (V = L, R , E)
[2] Output: estimate of number of butterflies.
[3] Choose an edge 𝑒 ∈ 𝐸 uniformly at random.
[4] ⋈e ← Count the number of butterflies containing 𝑒.
[5] Return
|𝐸|
4
⋈e
1
x
y
w
z
2
3
31
Edge Sampling
Lines Algorithm:: ESamp Example
[1] Input: graph G = (V = L, R , E)
[2] Output: estimate of number of butterflies.
[3] Choose an edge e ∈ E uniformly at random.
[4] ⋈e ← Count the number of butterflies containing e.
[5] Return
|E|
4
⋈e
1
x
y
w
z
2
3
32
Edge Sampling
Lines Algorithm:: ESamp Example
[1] Input: graph G = (V = L, R , E)
[2] Output: estimate of number of butterflies.
[3] Choose an edge e ∈ E uniformly at random.
[4] ⋈e ← Count the number of butterflies containing e.
[5] Return
|E|
4
⋈e
1
x
y
w
z
2
3
33
Lines Algorithm:: ESamp
[1] Input: graph G = (V = L, R , E)
[2]
Output: estimate of number of
butterflies.
[3] Choose an edge 𝑒 ∈ 𝐸 uniformly at random.
[4]
⋈e ← Count the number of butterflies
containing 𝑒.
[5] Return
|𝐸|
4
⋈e
ESamp – Expectation Analysis
34
Analysis
Lemma: Let Y be the random variable returned by ESamp. Then E Y = ⋈
Proof:
Let Xi be a random variable and is one if ith
butterfly contains the sampled
edge e. Otherwise, Xi is zero (1 ≤ 𝑖 ≤⋈).
Suppose X = i=1
⋈
Xi = 1.
We have Pr Xi = 1 =
4
|E|
.
E X = i=1
⋈
E[Xi] = i=1
⋈
Pr[Xi = 1] =
4
|E|
⋈.
Since Y = X .
|E|
4
, it follows that E Y = ⋈.
Lines Algorithm::Fast ESamp
[1] Input: graph G = (V = L, R , E)
[2]
Output: estimate of number of
butterflies.
[3] Choose an edge 𝑒 ∈ 𝐸 uniformly at random.
[4]
⋈e ← Count the number of butterflies
containing 𝑒.
[5] Return
|𝐸|
4
⋈e
ESamp – Expectation Analysis
Analysis
Lemma: Let Y be the random variable returned by ESamp. Then E Y = ⋈
Proof:
Let Xi be a random variable and is one if ith
butterfly contains the sampled
edge e. Otherwise, Xi is zero (1 ≤ 𝑖 ≤⋈).
Suppose X = i=1
⋈
Xi = 1.
We have Pr Xi = 1 =
4
|E|
.
E X = i=1
⋈
E[Xi] = i=1
⋈
Pr[Xi = 1] =
4
|E|
⋈.
Since Y = X .
|E|
4
, it follows that E Y = ⋈.
35
Lines Algorithm:: ESamp
[1] Input: graph G = (V = L, R , E)
[2]
Output: estimate of number of
butterflies.
[3] Choose an edge 𝑒 ∈ 𝐸 uniformly at random.
[4]
⋈e ← Count the number of butterflies
containing 𝑒.
[5] Return
|𝐸|
4
⋈e
ESamp – Expectation Analysis
Analysis
Lemma: Let Y be the random variable returned by ESamp. Then E Y = ⋈
Proof:
Let Xi be a random variable and is one if ith
butterfly contains the sampled
edge e. Otherwise, Xi is zero (1 ≤ 𝑖 ≤⋈).
Suppose X = i=1
⋈
Xi = 1.
We have Pr Xi = 1 =
4
|E|
.
E X = i=1
⋈
E[Xi] = i=1
⋈
Pr[Xi = 1] =
4
|E|
⋈.
Since Y = X .
|E|
4
, it follows that E Y = ⋈.
36
Lines Algorithm:: ESamp
[1] Input: graph G = (V = L, R , E)
[2]
Output: estimate of number of
butterflies.
[3] Choose an edge 𝑒 ∈ 𝐸 uniformly at random.
[4]
⋈e ← Count the number of butterflies
containing 𝑒.
[5] Return
|𝐸|
4
⋈e
ESamp – Expectation Analysis
Analysis
Lemma: Let Y be the random variable returned by ESamp. Then E Y = ⋈
Proof:
Let Xi be a random variable and is one if ith
butterfly contains the sampled
edge e. Otherwise, Xi is zero (1 ≤ 𝑖 ≤⋈).
Suppose X = i=1
⋈
Xi = 1.
We have Pr Xi = 1 =
4
|E|
.
E X = i=1
⋈
E[Xi] = i=1
⋈
Pr[Xi = 1] =
4
|E|
⋈.
Since Y = X .
|E|
4
, it follows that E Y = ⋈.
37
Lines Algorithm:: ESamp
[1] Input: graph G = (V = L, R , E)
[2]
Output: estimate of number of
butterflies.
[3] Choose an edge 𝑒 ∈ 𝐸 uniformly at random.
[4]
⋈e ← Count the number of butterflies
containing 𝑒.
[5] Return
|𝐸|
4
⋈e
ESamp – Expectation Analysis
Analysis
Lemma: Let Y be the random variable returned by ESamp. Then E Y = ⋈
Proof:
Let Xi be a random variable and is one if ith
butterfly contains the sampled
edge e. Otherwise, Xi is zero (1 ≤ 𝑖 ≤⋈).
Suppose X = i=1
⋈
Xi = 1.
We have Pr Xi = 1 =
4
|E|
.
E X = i=1
⋈
E[Xi] = i=1
⋈
Pr[Xi = 1] =
4
|E|
⋈.
Since Y = X .
|E|
4
, it follows that E Y = ⋈.
38
Lines Algorithm::Fast ESamp
[1] Input: graph G = (V = L, R , E)
[2]
Output: estimate of number of
butterflies.
[3] Choose an edge 𝑒 ∈ 𝐸 uniformly at random.
[4]
⋈e ← Count the number of butterflies
containing 𝑒.
[5] Return
|𝐸|
4
⋈e
ESamp – Expectation Analysis
Analysis
Lemma: Let Y be the random variable returned by ESamp. Then E Y = ⋈
Proof:
Let Xi be a random variable and is one if ith
butterfly contains the sampled
edge e. Otherwise, Xi is zero (1 ≤ 𝑖 ≤⋈).
Suppose X = i=1
⋈
Xi = 1.
We have Pr Xi = 1 =
4
|E|
.
E X = i=1
⋈
E[Xi] = i=1
⋈
Pr[Xi = 1] =
4
|E|
⋈.
Since Y = X .
|E|
4
, it follows that E Y = ⋈.
39
Lines Algorithm:: ESamp
[1] Input: graph G = (V = L, R , E)
[2]
Output: estimate of number of
butterflies.
[3] Choose an edge 𝑒 ∈ 𝐸 uniformly at random.
[4]
⋈e ← Count the number of butterflies
containing 𝑒.
[5] Return
|𝐸|
4
⋈e
ESamp – Expectation Analysis
Analysis
Lemma: Let Y be the random variable returned by ESamp. Then E Y = ⋈
Proof:
Let Xi be a random variable and is one if ith
butterfly contains the sampled
edge e. Otherwise, Xi is zero (1 ≤ 𝑖 ≤⋈).
Suppose X = i=1
⋈
Xi = 1.
We have Pr Xi = 1 =
4
|E|
.
E X = i=1
⋈
E[Xi] = i=1
⋈
Pr[Xi = 1] =
4
|E|
⋈.
Since Y = X .
|E|
4
, it follows that E Y = ⋈.
40
Edge Sampling Performance
• Error rate is high.
• What is the reason?
• Exact counting # of butterflies in a sampled edge is very slow.
• Few edges are sampled.
• The accuracy is low.
41
Number of Sampled Edges
42
Graphs |L| |R| |E|
𝒍∈𝑳
𝒅𝒍
𝟐
𝒓∈𝑹
𝒅 𝒓
𝟐
⋈
Delicious 833K 33.7M 101M 85.9B 52.7B 56.8B
Orkut 2.7M 8.7M 327M 156M 4.9T 22.1T
WebTrackers 27.6M 12.7M 140M 1.7T 211T 20T
1
100
10000
1000000
100000000
Delicious Orkut Web trackers
22550
980 360
10179895 32703748 14061376
# of sampled edges using ESamp
|E|
# of sampled edges in 60s
Fast Edge Sampling
Lines Algorithm:: Fast ESamp
[1] Input: graph G = (V = L, R , E)
[2] Output: estimate of number of butterflies.
[3] Choose an edge e ∈ E uniformly at random.
[4] ⋈e ← Fast EBFC(e).
[5] Return
|E|
4
⋈e
1
x
y
w
z
2
3
43
Fast Local Butterfly Counting
Lines Algorithm:: Fast EBFC Example
[1] Input: Sampled edge 𝐞 = 〈𝐯, 𝐮〉
[2] Output: estimate of number of butterflies containing 𝐞
[3] Choose a neighbor w of vertex u uniformly at random.
[4] Choose a neighbor x of vertex v uniformly at random.
[5] If (𝐮, 𝐯, 𝐰, 𝐱) forms a butterfly: β ← 1.
[6] Else β ← 0.
[7] Return 𝛃 ⋅ 𝐝 𝐮⋅ 𝐝 𝐯
1
x
y
w
z
2
3
Comment: dv is the degree of vertex v
44
Fast Esamp – Variance Analysis
• Let Y be the returned value by Fast ESamp.
Var Y ≤
E
4
(⋈ +PE)
• Let Z be the average of α =
8 E 1+
PE
⋈
ϵ2⋈
independent instances of Y .
Using Chebyshev’s inequality:
Pr Z − ⋈ ≥ ϵ ⋈ ≤
Var Z
ϵ2 ⋈2
=
Var Y
αϵ2 ⋈2
≤
E ⋈ +PE
4αϵ2 ⋈2
=
1
32
45
1 x
y
w
2
3
A pair of butterfly with a
common edge.
45
Performance of Fast ESamp
46
Graph Delicious
|𝐋| 833K
|𝐑| 33.7M
|𝐄| 101M
𝒍∈𝑳
𝒅𝒍
𝟐
85.9B
𝒓∈𝑹
𝒅 𝒓
𝟐 52.7B
⋈ 56.8B
# of sampled
edges after 60s
355K
Wang et. Al 311secs
ExactBFC 202secsError(%) =
|Estimate −Exact|
Exact
× 100
120K sampled
edges
Performance of Fast ESamp
47
Graph Orkut
|𝐋| 2.7M
|𝐑| 8.7M
|𝐄| 327M
𝒍∈𝑳
𝒅𝒍
𝟐
156M
𝒓∈𝑹
𝒅 𝒓
𝟐 4.9T
⋈ 22.1T
# of sampled
edges after 60s
125K
Wang et. Al 15281secs
ExactBFC 423secsError(%) =
|Estimate −Exact|
Exact
× 100
Performance of Fast ESamp
48
Graph WebTracker
|𝐋| 27.6M
|𝐑| 12.7M
|𝐄| 140M
𝒍∈𝑳
𝒅𝒍
𝟐
1.7T
𝒓∈𝑹
𝒅 𝒓
𝟐 211T
⋈ 20T
# of sampled
edges after 60s
125K
Wang et. Al 𝟒 ⋅ 𝟏𝟎 𝟒 secs
ExactBFC 9165 secs
Error(%) =
|Estimate −Exact|
Exact
× 100
Our Contribution
 Exact Butterfly Counting Algorithm
 Randomized Butterfly Counting Algorithms
 One-shot Sparsification: Colorful Sparsification, Edge Sparsification
 Local Sampling: Vertex Sampling, Wedge Sampling, Edge Sampling, Fast Edge
Sampling.
 Provable Guarantees
 Extensive experimental results on real-world networks
49
Resources
 Butterfly Counting Implementation:
 https://github.com/beginner1010/butterfly-counting
 Our Paper on KDD:
 Seyed-Vahid Sanei-Mehri, Ahmet Erdem Sarıyüce, and Srikanta Tirthapura.
2018. Butterfly Counting in Bipartite Networks. In KDD ’18: The 24th ACM
SIGKDD International Conference on Knowledge Discovery & Data Mining,
August 19–23, 2018, London, United Kingdom.
 The Long version of paper on arXiv:
 Sanei-Mehri, Seyed-Vahid, Erdem Saryuce, and Srikanta Tirthapura.
"Butterfly Counting in Bipartite Networks." arXiv preprint
arXiv:1801.00338 (2017). 50
References (I)
• He, Xiangnan, Ming Gao, Min-Yen Kan, and Dingxian Wang. "Birank: Towards ranking on bipartite graphs." IEEE Transactions on
Knowledge and Data Engineering 29, no. 1 (2017): 57-71.
• Song, Yang, Ziming Zhuang, Huajing Li, Qiankun Zhao, Jia Li, Wang-Chien Lee, and C. Lee Giles. "Real-time automatic tag
recommendation." In Proceedings of the 31st annual international ACM SIGIR conference on Research and development in
information retrieval, pp. 515-522. ACM, 2008.
• Aksoy, Sinan G., Tamara G. Kolda, and Ali Pinar. "Measuring and modeling bipartite graphs with community structure." Journal of
Complex Networks 5, no. 4 (2017): 581-603.
• Sarıyüce, Ahmet Erdem, and Ali Pinar. "Peeling bipartite networks for dense subgraph discovery." In Proceedings of the Eleventh
ACM International Conference on Web Search and Data Mining, pp. 504-512. ACM, 2018.
• Coppersmith, Don, and Shmuel Winograd. "Matrix multiplication via arithmetic progressions." In Proceedings of the nineteenth
annual ACM symposium on Theory of computing, pp. 1-6. ACM, 1987.
• Tsourakakis, Charalampos E., U. Kang, Gary L. Miller, and Christos Faloutsos. "Doulion: counting triangles in massive graphs with
a coin." In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 837-846.
ACM, 2009.
• Pagh, Rasmus, and Charalampos E. Tsourakakis. "Colorful triangle counting and a mapreduce implementation." Information
Processing Letters 112, no. 7 (2012): 277-281.
51
References (II)
• Pavan, Aduri, Kanat Tangwongsan, Srikanta Tirthapura, and Kun-Lung Wu. "Counting and sampling triangles from a graph
stream." Proceedings of the VLDB Endowment 6, no. 14 (2013): 1870-1881.
• Stefani, Lorenzo De, Alessandro Epasto, Matteo Riondato, and Eli Upfal. "Triest: Counting local and global triangles in fully
dynamic streams with fixed memory size." ACM Transactions on Knowledge Discovery from Data (TKDD) 11
• Halford, Thomas R., and Keith M. Chugg. "An algorithm for counting short cycles in bipartite graphs." IEEE Transactions on
Information Theory 52, no. 1 (2006): 287-292.
• Wang, Jia, Ada Wai-Chee Fu, and James Cheng. "Rectangle counting in large bipartite graphs." In Big Data (BigData Congress),
2014 IEEE International Congress on, pp. 17-24. IEEE, 2014.
52
Open Problems
• Can we count larger motifs suitable to bipartite
networks?
• Can we count larger graphlets in bipartite
networks? For example, (a, b)- bicliques instead of
(2,2)- bicliques.
• Can we adapt our algorithms in the streaming
scenario?
Time Complexity of ExactBFC (I)
1*
Lines Algorithm::ExactBFC Example
[1] Input: graph G = (V = L, R , E)
[2] Output: ⋈, exact number of butterflies.
[3] A ← L, ⋈← 0.
[4] For every vertex v ∈ A:
[5] C ← hashmap, initialized with zero
[6] For every neighbor u of v:
[7] For every neighbor w of u s.t. IDw > IDv:
[8] C w ← C w + 1
[9] For every w ∈ C s.t. C x > 0:
[10] ⋈ ← ⋈ +
C x
2
[11] Return ⋈
1
x
y
w
z
2
3
• If A ← L, the time Complexity is O v∈R dv
2
.
Time Complexity of ExactBFC (II)
2*
Lines Algorithm::ExactBFC Example
[1] Input: graph G = (V = L, R , E)
[2] Output: ⋈, exact number of butterflies.
[3] A ← L, ⋈← 0.
[4] If v∈L dv
2
< v∈R dv
2
: A ← R
[5] For every vertex v ∈ A:
[6] C ← hashmap, initialized with zero
[7] For every neighbor u of v:
[8] For every neighbor w of u s.t. IDw > IDv:
[9] C w ← C w + 1
[10] For every w ∈ C s.t. C x > 0:
[11] ⋈ ← ⋈ +
C x
2
[12] Return ⋈
1
x
y
w
z
2
3
• O v∈R dv
2
→ O min{ v∈L dv
2
, v∈R dv
2
} .

Mais conteúdo relacionado

Mais procurados

Planning in AI(Partial order planning)
Planning in AI(Partial order planning)Planning in AI(Partial order planning)
Planning in AI(Partial order planning)Vicky Tyagi
 
Knnowledge representation and logic lec 11 to lec 15
Knnowledge representation and logic lec 11 to lec 15Knnowledge representation and logic lec 11 to lec 15
Knnowledge representation and logic lec 11 to lec 15Subash Chandra Pakhrin
 
Vygyanic Chetana Ke Vahak Sir Chandra Shekhar Venkat Ramamn (CV Raman) Class 9
Vygyanic Chetana Ke Vahak Sir Chandra Shekhar Venkat Ramamn (CV Raman) Class 9Vygyanic Chetana Ke Vahak Sir Chandra Shekhar Venkat Ramamn (CV Raman) Class 9
Vygyanic Chetana Ke Vahak Sir Chandra Shekhar Venkat Ramamn (CV Raman) Class 9HrithikSinghvi
 
elevator problem.pdf
elevator problem.pdfelevator problem.pdf
elevator problem.pdfJayaprasanna4
 
Bitwa pod Wiedniem
Bitwa pod WiedniemBitwa pod Wiedniem
Bitwa pod Wiedniemgblonska
 
Madam rides the bus
Madam rides the busMadam rides the bus
Madam rides the busNVSBPL
 
The best christmas present in the world
The best christmas present in the worldThe best christmas present in the world
The best christmas present in the worldNAGAPRANEETHBYSANI7B
 

Mais procurados (8)

Planning in AI(Partial order planning)
Planning in AI(Partial order planning)Planning in AI(Partial order planning)
Planning in AI(Partial order planning)
 
Knnowledge representation and logic lec 11 to lec 15
Knnowledge representation and logic lec 11 to lec 15Knnowledge representation and logic lec 11 to lec 15
Knnowledge representation and logic lec 11 to lec 15
 
Vygyanic Chetana Ke Vahak Sir Chandra Shekhar Venkat Ramamn (CV Raman) Class 9
Vygyanic Chetana Ke Vahak Sir Chandra Shekhar Venkat Ramamn (CV Raman) Class 9Vygyanic Chetana Ke Vahak Sir Chandra Shekhar Venkat Ramamn (CV Raman) Class 9
Vygyanic Chetana Ke Vahak Sir Chandra Shekhar Venkat Ramamn (CV Raman) Class 9
 
elevator problem.pdf
elevator problem.pdfelevator problem.pdf
elevator problem.pdf
 
Forward Backward Chaining
Forward Backward ChainingForward Backward Chaining
Forward Backward Chaining
 
Bitwa pod Wiedniem
Bitwa pod WiedniemBitwa pod Wiedniem
Bitwa pod Wiedniem
 
Madam rides the bus
Madam rides the busMadam rides the bus
Madam rides the bus
 
The best christmas present in the world
The best christmas present in the worldThe best christmas present in the world
The best christmas present in the world
 

Semelhante a Butterfly Counting in Bipartite Networks

MATLAB Questions and Answers.pdf
MATLAB Questions and Answers.pdfMATLAB Questions and Answers.pdf
MATLAB Questions and Answers.pdfahmed8651
 
Matlab ch1 (3)
Matlab ch1 (3)Matlab ch1 (3)
Matlab ch1 (3)mohsinggg
 
parameterized complexity for graph Motif
parameterized complexity for graph Motifparameterized complexity for graph Motif
parameterized complexity for graph MotifAMR koura
 
CS 354 More Graphics Pipeline
CS 354 More Graphics PipelineCS 354 More Graphics Pipeline
CS 354 More Graphics PipelineMark Kilgard
 
1 College Algebra Final Examination---FHSU Math & C.S.docx
 1 College Algebra Final Examination---FHSU Math & C.S.docx 1 College Algebra Final Examination---FHSU Math & C.S.docx
1 College Algebra Final Examination---FHSU Math & C.S.docxjoyjonna282
 
Lab 4 Three-Bit Binary Adder
Lab 4 Three-Bit Binary AdderLab 4 Three-Bit Binary Adder
Lab 4 Three-Bit Binary AdderKatrina Little
 
Introduction to matlab lecture 2 of 4
Introduction to matlab lecture 2 of 4Introduction to matlab lecture 2 of 4
Introduction to matlab lecture 2 of 4Randa Elanwar
 
An analysis between exact and approximate algorithms for the k-center proble...
An analysis between exact and approximate algorithms for the  k-center proble...An analysis between exact and approximate algorithms for the  k-center proble...
An analysis between exact and approximate algorithms for the k-center proble...IJECEIAES
 
Introduction to Data Science With R Lab Record
Introduction to Data Science With R Lab RecordIntroduction to Data Science With R Lab Record
Introduction to Data Science With R Lab RecordLakshmi Sarvani Videla
 
Ejercicios asignados a yonathan david diaz granados
Ejercicios asignados a yonathan david diaz granadosEjercicios asignados a yonathan david diaz granados
Ejercicios asignados a yonathan david diaz granadosYonathanDavidDiazGra
 
Matlab solved tutorials 2017 june.
Matlab solved tutorials  2017 june.Matlab solved tutorials  2017 june.
Matlab solved tutorials 2017 june.musadoto
 
0580 w09 qp_4
0580 w09 qp_40580 w09 qp_4
0580 w09 qp_4King Ali
 

Semelhante a Butterfly Counting in Bipartite Networks (20)

Algorithms Design Exam Help
Algorithms Design Exam HelpAlgorithms Design Exam Help
Algorithms Design Exam Help
 
MATLAB Questions and Answers.pdf
MATLAB Questions and Answers.pdfMATLAB Questions and Answers.pdf
MATLAB Questions and Answers.pdf
 
0606 m18 2_2_qp
0606 m18 2_2_qp0606 m18 2_2_qp
0606 m18 2_2_qp
 
Matlab ch1 (3)
Matlab ch1 (3)Matlab ch1 (3)
Matlab ch1 (3)
 
0580 s12 qp_41
0580 s12 qp_410580 s12 qp_41
0580 s12 qp_41
 
parameterized complexity for graph Motif
parameterized complexity for graph Motifparameterized complexity for graph Motif
parameterized complexity for graph Motif
 
CS 354 More Graphics Pipeline
CS 354 More Graphics PipelineCS 354 More Graphics Pipeline
CS 354 More Graphics Pipeline
 
L08.pdf
L08.pdfL08.pdf
L08.pdf
 
Algorithms Design Assignment Help
Algorithms Design Assignment HelpAlgorithms Design Assignment Help
Algorithms Design Assignment Help
 
1 College Algebra Final Examination---FHSU Math & C.S.docx
 1 College Algebra Final Examination---FHSU Math & C.S.docx 1 College Algebra Final Examination---FHSU Math & C.S.docx
1 College Algebra Final Examination---FHSU Math & C.S.docx
 
CS.3.Arrays.pdf
CS.3.Arrays.pdfCS.3.Arrays.pdf
CS.3.Arrays.pdf
 
Lab 4 Three-Bit Binary Adder
Lab 4 Three-Bit Binary AdderLab 4 Three-Bit Binary Adder
Lab 4 Three-Bit Binary Adder
 
Counting sort
Counting sortCounting sort
Counting sort
 
Introduction to matlab lecture 2 of 4
Introduction to matlab lecture 2 of 4Introduction to matlab lecture 2 of 4
Introduction to matlab lecture 2 of 4
 
An analysis between exact and approximate algorithms for the k-center proble...
An analysis between exact and approximate algorithms for the  k-center proble...An analysis between exact and approximate algorithms for the  k-center proble...
An analysis between exact and approximate algorithms for the k-center proble...
 
Introduction to Data Science With R Lab Record
Introduction to Data Science With R Lab RecordIntroduction to Data Science With R Lab Record
Introduction to Data Science With R Lab Record
 
Ejercicios asignados a yonathan david diaz granados
Ejercicios asignados a yonathan david diaz granadosEjercicios asignados a yonathan david diaz granados
Ejercicios asignados a yonathan david diaz granados
 
R Programming Intro
R Programming IntroR Programming Intro
R Programming Intro
 
Matlab solved tutorials 2017 june.
Matlab solved tutorials  2017 june.Matlab solved tutorials  2017 june.
Matlab solved tutorials 2017 june.
 
0580 w09 qp_4
0580 w09 qp_40580 w09 qp_4
0580 w09 qp_4
 

Último

PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.Nitya salvi
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsAArockiyaNisha
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfSumit Kumar yadav
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPirithiRaju
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINsankalpkumarsahoo174
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfSumit Kumar yadav
 
fundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyfundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyDrAnita Sharma
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticssakshisoni2385
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bSérgio Sacani
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSarthak Sekhar Mondal
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfSumit Kumar yadav
 

Último (20)

PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
fundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyfundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomology
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 

Butterfly Counting in Bipartite Networks

  • 1. Seyed-Vahid Sanei-Mehri Ahmet Erdem Sariyuce Srikanta Tirthapura Summer 2018
  • 2. Bipartite Network Definition 2 • A graph G = (V, E) is called a bipartite graph if:  V can be partitioned into two disjoint subsets L and R such that E ⊆ L × R, i.e. Every edge connects a vertex in L and a vertex in R. 1 x y w z 2 3 Partition L Partition R
  • 3. Applications of Bipartite Networks (I) • Web Search [He et. al, ‘17]:  The set L is set of users.  The set R is set of web pages.  E is user actions (“clicks”). 3 Partition L Partition R
  • 4. Applications of Bipartite Networks (II) • Tag Recommendation [Song et. al, ‘08]:  The set L is set of documents.  The set R is set of hash tags.  If hashtag exists in the content, there is an edge. 4 Partition L Partition R
  • 5. Characteristics of Bipartite Networks 5 • Average Degree • Diameter • Betweenness Centrality • Maximum Cardinality Matching • Motif Count, e.g. Butterfly Count
  • 6. Butterfly Definition 6 • Butterfly is a 2,2-biclique. 1 x y w z 2 3 wz 23
  • 7. Properties of a Butterfly 7 • Butterfly is the smallest non-trivial unit of cohesion. • Butterfly is the smallest structure with multiple vertices at each side. • Butterfly is the smallest cycle in the bipartite graph. dc ba
  • 8. Problem Statement 8 • Input: We are given a bipartite graph G = V = L, R , E • Goal: We need to compute the number of butterflies in G. • Challenges: • Graphs are very large ( 𝐸 ≥ 3 × 108 , V ≥ 3 × 107 ) • There might be trillions of butterflies. • Exact counting algorithm is expensive. • Can we find the number of butterflies fast?
  • 9. Butterfly Applications (I) 9 • Metamorphosis Coefficient [AKSOY et. al, ‘17]  Let ⋈ be the number of butterflies in a bipartite graph.  Let be the number of caterpillars. 𝐜 = 𝟒 ⋈ • It is a building block for creating community structure. 1 x y 2
  • 10. Butterfly Applications (II) 10 • Bipartite graphs decomposition: [Sarıyüce and Pinar, ‘18]  A subgraph is called K-tip iff:  The subgraph is connected.  Each vertex in the subgraph has at least k number of butterflies.  Subgraph is maximal.
  • 11. Related Work (I) 11 • Matrix multiplication via arithmetic progressions [Copper Smith et. al, ’87] • Exact algorithm with O(n2.371 ) time complexity • Doulion: counting triangles in massive graphs with a coin [Tsourakakis et. al, ’09] • Colorful triangle counting and a mapreduce implementation [Pagh and Tsourakakis, ’12] • Counting and sampling triangles from a graph stream [Pavan et. al, ‘13] • Triangle counting in streaming setting [Stefani et. al, ‘17] 5 4 1 2 3
  • 12. Related Work (II) 12 • Short cycle counting in bipartite graphs [Halford and Chugg et. al, ‘06] • Proposed O(gn3 ), where n = max L , R , and g is the girth of graph. • Butterfly counting in bipartite graphs [Wang et. al, ‘14] • Exact algorithm to count butterflies in a bipartite network. • We reduced their time complexity. • No randomized algorithm to count butterflies in bipartite networks • We proposed several randomized algorithms to count butterflies.
  • 13. Exact Butterfly Counting 13 • Exact Butterfly Counting (ExactBFC):  Goal: count the number of butterflies in a bipartite graph G = V = L, R , E .  Method: oIterate thorough all vertices in one partition. oCount the number of butterflies containing the vertices and add them up. oDo not enumerate all butterflies. oDo not count a butterfly twice.
  • 14. Exact Butterfly Counting 14 Lines Algorithm::ExactBFC Example [1] Input: graph G = (V = L, R , E) [2] Output: ⋈, exact number of butterflies. [3] A ← L, ⋈← 0. [4] For every vertex v ∈ A: [5] C ← hashmap, initialized with zero [6] For every neighbor u of v: [7] For every neighbor w of u s.t. IDw > IDv: [8] C w ← C w + 1 [9] For every w ∈ C s.t. C x > 0: [10] ⋈ ← ⋈ + C x 2 [11] Return ⋈ 1 x y t z 2 3 Partition L Partition R
  • 15. Exact Butterfly Counting Lines Algorithm::ExactBFC Example [1] Input: graph G = (V = L, R , E) [2] Output: ⋈, exact number of butterflies. [3] A ← L, ⋈← 0. [4] For every vertex v ∈ A: [5] C ← hashmap, initialized with zero [6] For every neighbor u of v: [7] For every neighbor w of u s.t. IDw > IDv: [8] C w ← C w + 1 [9] For every w ∈ C s.t. C x > 0: [10] ⋈ ← ⋈ + C x 2 [11] Return ⋈ 1 x y t z 2 3 A ← L 15
  • 16. Exact Butterfly Counting Lines Algorithm::ExactBFC Example [1] Input: graph G = (V = L, R , E) [2] Output: ⋈, exact number of butterflies. [3] A ← L, ⋈← 0. [4] For every vertex v ∈ A: [5] C ← hashmap, initialized with zero [6] For every neighbor u of v: [7] For every neighbor w of u s.t. IDw > IDv: [8] C w ← C w + 1 [9] For every w ∈ C s.t. C x > 0: [10] ⋈ ← ⋈ + C x 2 [11] Return ⋈ 1 x y t z 2 3 Vertex v A 16
  • 17. Exact Butterfly Counting Lines Algorithm::ExactBFC Example [1] Input: graph G = (V = L, R , E) [2] Output: ⋈, exact number of butterflies. [3] A ← L, ⋈← 0. [4] For every vertex v ∈ A: [5] C ← hashmap, initialized with zero [6] For every neighbor u of v: [7] For every neighbor w of u s.t. IDw > IDv: [8] C w ← C w + 1 [9] For every w ∈ C s.t. C x > 0: [10] ⋈ ← ⋈ + C x 2 [11] Return ⋈ 1 x y t z 2 3 Vertex v hashmap C = { } 17
  • 18. Exact Butterfly Counting Lines Algorithm::ExactBFC Example [1] Input: graph G = (V = L, R , E) [2] Output: ⋈, exact number of butterflies. [3] A ← L, ⋈← 0. [4] For every vertex v ∈ A: [5] C ← hashmap, initialized with zero [6] For every neighbor u of v: [7] For every neighbor w of u s.t. IDw > IDv: [8] C w ← C w + 1 [9] For every w ∈ C s.t. C x > 0: [10] ⋈ ← ⋈ + C x 2 [11] Return ⋈ 1 x y t z 2 3 Vertex v hashmap C = { } Vertex u 18
  • 19. Exact Butterfly Counting Lines Algorithm::ExactBFC Example [1] Input: graph G = (V = L, R , E) [2] Output: ⋈, exact number of butterflies. [3] A ← L, ⋈← 0. [4] For every vertex v ∈ A: [5] C ← hashmap, initialized with zero [6] For every neighbor u of v: [7] For every neighbor w of u s.t. IDw > IDv: [8] C w ← C w + 1 [9] For every w ∈ C s.t. C x > 0: [10] ⋈ ← ⋈ + C x 2 [11] Return ⋈ 1 x y t z 2 3 Vertex v hashmap C = {〈′2′, 1〉} Vertex u Vertex w hashmap C = { } 19
  • 20. Exact Butterfly Counting Lines Algorithm::ExactBFC Example [1] Input: graph G = (V = L, R , E) [2] Output: ⋈, exact number of butterflies. [3] A ← L, ⋈← 0. [4] For every vertex v ∈ A: [5] C ← hashmap, initialized with zero [6] For every neighbor u of v: [7] For every neighbor w of u s.t. IDw > IDv: [8] C w ← C w + 1 [9] For every w ∈ C s.t. C x > 0: [10] ⋈ ← ⋈ + C x 2 [11] Return ⋈ 1 x y t z 3 Vertex v Vertex u hashmap C = {〈′2′, 1〉} 2 Vertex w hashmap C = {〈′2′, 2〉} 20
  • 21. Exact Butterfly Counting Lines Algorithm::ExactBFC Example [1] Input: graph G = (V = L, R , E) [2] Output: ⋈, exact number of butterflies. [3] A ← L, ⋈← 0. [4] For every vertex v ∈ A: [5] C ← hashmap, initialized with zero [6] For every neighbor u of v: [7] For every neighbor w of u s.t. IDw > IDv: [8] C w ← C w + 1 [9] For every w ∈ C s.t. C x > 0: [10] ⋈ ← ⋈ + C x 2 [11] Return ⋈ 1 x y t z 2 3 Vertex v Vertex u hashmap C = {〈′2′, 3〉, 〈′3′, 1〉}hashmap C = {〈′2′, 2〉} 21
  • 22. Exact Butterfly Counting Lines Algorithm::ExactBFC Example [1] Input: graph G = (V = L, R , E) [2] Output: ⋈, exact number of butterflies. [3] A ← L, ⋈← 0. [4] For every vertex v ∈ A: [5] C ← hashmap, initialized with zero [6] For every neighbor u of v: [7] For every neighbor w of u s.t. IDw > IDv: [8] C w ← C w + 1 [9] For every w ∈ C s.t. C x > 0: [10] ⋈ ← ⋈ + C x 2 [11] Return ⋈ 1 x y t z 2 3 Vertex v Vertex u hashmap C = {〈′2′, 3〉, 〈′3′, 1〉}hashmap C = {〈′2′, 4〉, 〈′3′, 2〉} 22
  • 23. Exact Butterfly Counting Lines Algorithm::ExactBFC Example [1] Input: graph G = (V = L, R , E) [2] Output: ⋈, exact number of butterflies. [3] A ← L, ⋈← 0. [4] For every vertex v ∈ A: [5] C ← hashmap, initialized with zero [6] For every neighbor u of v: [7] For every neighbor w of u s.t. IDw > IDv: [8] C w ← C w + 1 [9] For every w ∈ C s.t. C x > 0: [10] ⋈ ← ⋈ + C x 2 [11] Return ⋈ 1 x y t z 2 3 Vertex v hashmap C = {〈′2′, 4〉, 〈′3′, 2〉} 23
  • 24. Time Complexity of ExactBFC 24 • The time Complexity proposed by [Wang et. al, ‘14] is O v∈R dv 2 . • Our exact algorithm requires O min{ v∈L dv 2 , v∈R dv 2}
  • 25. Performance of ExactBFC 25 Graphs |L| |R| |E| 𝒍∈𝑳 𝒅𝒍 𝟐 𝒓∈𝑹 𝒅 𝒓 𝟐 ⋈ Delicious 833K 33.7M 101M 85.9B 52.7B 56.8B Orkut 2.7M 8.7M 327M 156M 4.9T 22.1T WebTrackers 27.6M 12.7M 140M 1.7T 211T 20T 202.8 432.8 9165.7 311.3 15281.9 1 10 100 1000 10000 100000 1000000 10000000 Delicious Orkut Web trackers TIME(SEC) GRAPHS ExactBFC ExactBFC Wang et. al
  • 26. Performance of ExactBFC Graphs |L| |R| |E| 𝒍∈𝑳 𝒅𝒍 𝟐 𝒓∈𝑹 𝒅 𝒓 𝟐 ⋈ Delicious 833K 33.7M 101M 85.9B 52.7B 56.8B Orkut 2.7M 8.7M 327M 156M 4.9T 22.1T WebTrackers 27.6M 12.7M 140M 1.7T 211T 20T 202.8 432.8 9165.7 311.3 15281.9 1 10 100 1000 10000 100000 1000000 10000000 Delicious Orkut Web trackers TIME(SEC) GRAPHS ExactBFC ExactBFC Wang et. al 26
  • 27. Performance of ExactBFC Graphs |L| |R| |E| 𝒍∈𝑳 𝒅𝒍 𝟐 𝒓∈𝑹 𝒅 𝒓 𝟐 ⋈ Delicious 833K 33.7M 101M 85.9B 52.7B 56.8B Orkut 2.7M 8.7M 327M 156M 4.9T 22.1T WebTrackers 27.6M 12.7M 140M 1.7T 211T 20T 202.8 432.8 9165.7 311.3 15281.9 1 10 100 1000 10000 100000 1000000 10000000 Delicious Orkut Web trackers TIME(SEC) GRAPHS ExactBFC ExactBFC Wang et. al 27
  • 28. Exact Counting Algorithms are Slow 28 • What if O min{ v∈L dv 2 , v∈R dv 2} is very high (e.g. ≥ 1010) • Do we really need the exact number of butterflies? • What if we estimate the number of butterflies? • Our randomized algorithms are fast. • We proposed provable guarantees on the accuracy of algorithms.
  • 29. Randomized Algorithm • Edge Sampling o Sample an edge. o Estimate number of butterflies in the graph based on the sampled edge. • Repeat sampling of edges and finally take the average of all estimates. o This will increase the accuracy 29
  • 30. Edge Sampling 30 Lines Algorithm:: ESamp Example [1] Input: graph G = (V = L, R , E) [2] Output: estimate of number of butterflies. [3] Choose an edge 𝑒 ∈ 𝐸 uniformly at random. [4] ⋈e ← Count the number of butterflies containing 𝑒. [5] Return |𝐸| 4 ⋈e 1 x y w z 2 3 Partition L Partition R
  • 31. Edge Sampling Lines Algorithm:: ESamp Example [1] Input: graph G = (V = L, R , E) [2] Output: estimate of number of butterflies. [3] Choose an edge 𝑒 ∈ 𝐸 uniformly at random. [4] ⋈e ← Count the number of butterflies containing 𝑒. [5] Return |𝐸| 4 ⋈e 1 x y w z 2 3 31
  • 32. Edge Sampling Lines Algorithm:: ESamp Example [1] Input: graph G = (V = L, R , E) [2] Output: estimate of number of butterflies. [3] Choose an edge e ∈ E uniformly at random. [4] ⋈e ← Count the number of butterflies containing e. [5] Return |E| 4 ⋈e 1 x y w z 2 3 32
  • 33. Edge Sampling Lines Algorithm:: ESamp Example [1] Input: graph G = (V = L, R , E) [2] Output: estimate of number of butterflies. [3] Choose an edge e ∈ E uniformly at random. [4] ⋈e ← Count the number of butterflies containing e. [5] Return |E| 4 ⋈e 1 x y w z 2 3 33
  • 34. Lines Algorithm:: ESamp [1] Input: graph G = (V = L, R , E) [2] Output: estimate of number of butterflies. [3] Choose an edge 𝑒 ∈ 𝐸 uniformly at random. [4] ⋈e ← Count the number of butterflies containing 𝑒. [5] Return |𝐸| 4 ⋈e ESamp – Expectation Analysis 34 Analysis Lemma: Let Y be the random variable returned by ESamp. Then E Y = ⋈ Proof: Let Xi be a random variable and is one if ith butterfly contains the sampled edge e. Otherwise, Xi is zero (1 ≤ 𝑖 ≤⋈). Suppose X = i=1 ⋈ Xi = 1. We have Pr Xi = 1 = 4 |E| . E X = i=1 ⋈ E[Xi] = i=1 ⋈ Pr[Xi = 1] = 4 |E| ⋈. Since Y = X . |E| 4 , it follows that E Y = ⋈.
  • 35. Lines Algorithm::Fast ESamp [1] Input: graph G = (V = L, R , E) [2] Output: estimate of number of butterflies. [3] Choose an edge 𝑒 ∈ 𝐸 uniformly at random. [4] ⋈e ← Count the number of butterflies containing 𝑒. [5] Return |𝐸| 4 ⋈e ESamp – Expectation Analysis Analysis Lemma: Let Y be the random variable returned by ESamp. Then E Y = ⋈ Proof: Let Xi be a random variable and is one if ith butterfly contains the sampled edge e. Otherwise, Xi is zero (1 ≤ 𝑖 ≤⋈). Suppose X = i=1 ⋈ Xi = 1. We have Pr Xi = 1 = 4 |E| . E X = i=1 ⋈ E[Xi] = i=1 ⋈ Pr[Xi = 1] = 4 |E| ⋈. Since Y = X . |E| 4 , it follows that E Y = ⋈. 35
  • 36. Lines Algorithm:: ESamp [1] Input: graph G = (V = L, R , E) [2] Output: estimate of number of butterflies. [3] Choose an edge 𝑒 ∈ 𝐸 uniformly at random. [4] ⋈e ← Count the number of butterflies containing 𝑒. [5] Return |𝐸| 4 ⋈e ESamp – Expectation Analysis Analysis Lemma: Let Y be the random variable returned by ESamp. Then E Y = ⋈ Proof: Let Xi be a random variable and is one if ith butterfly contains the sampled edge e. Otherwise, Xi is zero (1 ≤ 𝑖 ≤⋈). Suppose X = i=1 ⋈ Xi = 1. We have Pr Xi = 1 = 4 |E| . E X = i=1 ⋈ E[Xi] = i=1 ⋈ Pr[Xi = 1] = 4 |E| ⋈. Since Y = X . |E| 4 , it follows that E Y = ⋈. 36
  • 37. Lines Algorithm:: ESamp [1] Input: graph G = (V = L, R , E) [2] Output: estimate of number of butterflies. [3] Choose an edge 𝑒 ∈ 𝐸 uniformly at random. [4] ⋈e ← Count the number of butterflies containing 𝑒. [5] Return |𝐸| 4 ⋈e ESamp – Expectation Analysis Analysis Lemma: Let Y be the random variable returned by ESamp. Then E Y = ⋈ Proof: Let Xi be a random variable and is one if ith butterfly contains the sampled edge e. Otherwise, Xi is zero (1 ≤ 𝑖 ≤⋈). Suppose X = i=1 ⋈ Xi = 1. We have Pr Xi = 1 = 4 |E| . E X = i=1 ⋈ E[Xi] = i=1 ⋈ Pr[Xi = 1] = 4 |E| ⋈. Since Y = X . |E| 4 , it follows that E Y = ⋈. 37
  • 38. Lines Algorithm:: ESamp [1] Input: graph G = (V = L, R , E) [2] Output: estimate of number of butterflies. [3] Choose an edge 𝑒 ∈ 𝐸 uniformly at random. [4] ⋈e ← Count the number of butterflies containing 𝑒. [5] Return |𝐸| 4 ⋈e ESamp – Expectation Analysis Analysis Lemma: Let Y be the random variable returned by ESamp. Then E Y = ⋈ Proof: Let Xi be a random variable and is one if ith butterfly contains the sampled edge e. Otherwise, Xi is zero (1 ≤ 𝑖 ≤⋈). Suppose X = i=1 ⋈ Xi = 1. We have Pr Xi = 1 = 4 |E| . E X = i=1 ⋈ E[Xi] = i=1 ⋈ Pr[Xi = 1] = 4 |E| ⋈. Since Y = X . |E| 4 , it follows that E Y = ⋈. 38
  • 39. Lines Algorithm::Fast ESamp [1] Input: graph G = (V = L, R , E) [2] Output: estimate of number of butterflies. [3] Choose an edge 𝑒 ∈ 𝐸 uniformly at random. [4] ⋈e ← Count the number of butterflies containing 𝑒. [5] Return |𝐸| 4 ⋈e ESamp – Expectation Analysis Analysis Lemma: Let Y be the random variable returned by ESamp. Then E Y = ⋈ Proof: Let Xi be a random variable and is one if ith butterfly contains the sampled edge e. Otherwise, Xi is zero (1 ≤ 𝑖 ≤⋈). Suppose X = i=1 ⋈ Xi = 1. We have Pr Xi = 1 = 4 |E| . E X = i=1 ⋈ E[Xi] = i=1 ⋈ Pr[Xi = 1] = 4 |E| ⋈. Since Y = X . |E| 4 , it follows that E Y = ⋈. 39
  • 40. Lines Algorithm:: ESamp [1] Input: graph G = (V = L, R , E) [2] Output: estimate of number of butterflies. [3] Choose an edge 𝑒 ∈ 𝐸 uniformly at random. [4] ⋈e ← Count the number of butterflies containing 𝑒. [5] Return |𝐸| 4 ⋈e ESamp – Expectation Analysis Analysis Lemma: Let Y be the random variable returned by ESamp. Then E Y = ⋈ Proof: Let Xi be a random variable and is one if ith butterfly contains the sampled edge e. Otherwise, Xi is zero (1 ≤ 𝑖 ≤⋈). Suppose X = i=1 ⋈ Xi = 1. We have Pr Xi = 1 = 4 |E| . E X = i=1 ⋈ E[Xi] = i=1 ⋈ Pr[Xi = 1] = 4 |E| ⋈. Since Y = X . |E| 4 , it follows that E Y = ⋈. 40
  • 41. Edge Sampling Performance • Error rate is high. • What is the reason? • Exact counting # of butterflies in a sampled edge is very slow. • Few edges are sampled. • The accuracy is low. 41
  • 42. Number of Sampled Edges 42 Graphs |L| |R| |E| 𝒍∈𝑳 𝒅𝒍 𝟐 𝒓∈𝑹 𝒅 𝒓 𝟐 ⋈ Delicious 833K 33.7M 101M 85.9B 52.7B 56.8B Orkut 2.7M 8.7M 327M 156M 4.9T 22.1T WebTrackers 27.6M 12.7M 140M 1.7T 211T 20T 1 100 10000 1000000 100000000 Delicious Orkut Web trackers 22550 980 360 10179895 32703748 14061376 # of sampled edges using ESamp |E| # of sampled edges in 60s
  • 43. Fast Edge Sampling Lines Algorithm:: Fast ESamp [1] Input: graph G = (V = L, R , E) [2] Output: estimate of number of butterflies. [3] Choose an edge e ∈ E uniformly at random. [4] ⋈e ← Fast EBFC(e). [5] Return |E| 4 ⋈e 1 x y w z 2 3 43
  • 44. Fast Local Butterfly Counting Lines Algorithm:: Fast EBFC Example [1] Input: Sampled edge 𝐞 = 〈𝐯, 𝐮〉 [2] Output: estimate of number of butterflies containing 𝐞 [3] Choose a neighbor w of vertex u uniformly at random. [4] Choose a neighbor x of vertex v uniformly at random. [5] If (𝐮, 𝐯, 𝐰, 𝐱) forms a butterfly: β ← 1. [6] Else β ← 0. [7] Return 𝛃 ⋅ 𝐝 𝐮⋅ 𝐝 𝐯 1 x y w z 2 3 Comment: dv is the degree of vertex v 44
  • 45. Fast Esamp – Variance Analysis • Let Y be the returned value by Fast ESamp. Var Y ≤ E 4 (⋈ +PE) • Let Z be the average of α = 8 E 1+ PE ⋈ ϵ2⋈ independent instances of Y . Using Chebyshev’s inequality: Pr Z − ⋈ ≥ ϵ ⋈ ≤ Var Z ϵ2 ⋈2 = Var Y αϵ2 ⋈2 ≤ E ⋈ +PE 4αϵ2 ⋈2 = 1 32 45 1 x y w 2 3 A pair of butterfly with a common edge. 45
  • 46. Performance of Fast ESamp 46 Graph Delicious |𝐋| 833K |𝐑| 33.7M |𝐄| 101M 𝒍∈𝑳 𝒅𝒍 𝟐 85.9B 𝒓∈𝑹 𝒅 𝒓 𝟐 52.7B ⋈ 56.8B # of sampled edges after 60s 355K Wang et. Al 311secs ExactBFC 202secsError(%) = |Estimate −Exact| Exact × 100 120K sampled edges
  • 47. Performance of Fast ESamp 47 Graph Orkut |𝐋| 2.7M |𝐑| 8.7M |𝐄| 327M 𝒍∈𝑳 𝒅𝒍 𝟐 156M 𝒓∈𝑹 𝒅 𝒓 𝟐 4.9T ⋈ 22.1T # of sampled edges after 60s 125K Wang et. Al 15281secs ExactBFC 423secsError(%) = |Estimate −Exact| Exact × 100
  • 48. Performance of Fast ESamp 48 Graph WebTracker |𝐋| 27.6M |𝐑| 12.7M |𝐄| 140M 𝒍∈𝑳 𝒅𝒍 𝟐 1.7T 𝒓∈𝑹 𝒅 𝒓 𝟐 211T ⋈ 20T # of sampled edges after 60s 125K Wang et. Al 𝟒 ⋅ 𝟏𝟎 𝟒 secs ExactBFC 9165 secs Error(%) = |Estimate −Exact| Exact × 100
  • 49. Our Contribution  Exact Butterfly Counting Algorithm  Randomized Butterfly Counting Algorithms  One-shot Sparsification: Colorful Sparsification, Edge Sparsification  Local Sampling: Vertex Sampling, Wedge Sampling, Edge Sampling, Fast Edge Sampling.  Provable Guarantees  Extensive experimental results on real-world networks 49
  • 50. Resources  Butterfly Counting Implementation:  https://github.com/beginner1010/butterfly-counting  Our Paper on KDD:  Seyed-Vahid Sanei-Mehri, Ahmet Erdem Sarıyüce, and Srikanta Tirthapura. 2018. Butterfly Counting in Bipartite Networks. In KDD ’18: The 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, August 19–23, 2018, London, United Kingdom.  The Long version of paper on arXiv:  Sanei-Mehri, Seyed-Vahid, Erdem Saryuce, and Srikanta Tirthapura. "Butterfly Counting in Bipartite Networks." arXiv preprint arXiv:1801.00338 (2017). 50
  • 51. References (I) • He, Xiangnan, Ming Gao, Min-Yen Kan, and Dingxian Wang. "Birank: Towards ranking on bipartite graphs." IEEE Transactions on Knowledge and Data Engineering 29, no. 1 (2017): 57-71. • Song, Yang, Ziming Zhuang, Huajing Li, Qiankun Zhao, Jia Li, Wang-Chien Lee, and C. Lee Giles. "Real-time automatic tag recommendation." In Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, pp. 515-522. ACM, 2008. • Aksoy, Sinan G., Tamara G. Kolda, and Ali Pinar. "Measuring and modeling bipartite graphs with community structure." Journal of Complex Networks 5, no. 4 (2017): 581-603. • Sarıyüce, Ahmet Erdem, and Ali Pinar. "Peeling bipartite networks for dense subgraph discovery." In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 504-512. ACM, 2018. • Coppersmith, Don, and Shmuel Winograd. "Matrix multiplication via arithmetic progressions." In Proceedings of the nineteenth annual ACM symposium on Theory of computing, pp. 1-6. ACM, 1987. • Tsourakakis, Charalampos E., U. Kang, Gary L. Miller, and Christos Faloutsos. "Doulion: counting triangles in massive graphs with a coin." In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 837-846. ACM, 2009. • Pagh, Rasmus, and Charalampos E. Tsourakakis. "Colorful triangle counting and a mapreduce implementation." Information Processing Letters 112, no. 7 (2012): 277-281. 51
  • 52. References (II) • Pavan, Aduri, Kanat Tangwongsan, Srikanta Tirthapura, and Kun-Lung Wu. "Counting and sampling triangles from a graph stream." Proceedings of the VLDB Endowment 6, no. 14 (2013): 1870-1881. • Stefani, Lorenzo De, Alessandro Epasto, Matteo Riondato, and Eli Upfal. "Triest: Counting local and global triangles in fully dynamic streams with fixed memory size." ACM Transactions on Knowledge Discovery from Data (TKDD) 11 • Halford, Thomas R., and Keith M. Chugg. "An algorithm for counting short cycles in bipartite graphs." IEEE Transactions on Information Theory 52, no. 1 (2006): 287-292. • Wang, Jia, Ada Wai-Chee Fu, and James Cheng. "Rectangle counting in large bipartite graphs." In Big Data (BigData Congress), 2014 IEEE International Congress on, pp. 17-24. IEEE, 2014. 52
  • 53. Open Problems • Can we count larger motifs suitable to bipartite networks? • Can we count larger graphlets in bipartite networks? For example, (a, b)- bicliques instead of (2,2)- bicliques. • Can we adapt our algorithms in the streaming scenario?
  • 54. Time Complexity of ExactBFC (I) 1* Lines Algorithm::ExactBFC Example [1] Input: graph G = (V = L, R , E) [2] Output: ⋈, exact number of butterflies. [3] A ← L, ⋈← 0. [4] For every vertex v ∈ A: [5] C ← hashmap, initialized with zero [6] For every neighbor u of v: [7] For every neighbor w of u s.t. IDw > IDv: [8] C w ← C w + 1 [9] For every w ∈ C s.t. C x > 0: [10] ⋈ ← ⋈ + C x 2 [11] Return ⋈ 1 x y w z 2 3 • If A ← L, the time Complexity is O v∈R dv 2 .
  • 55. Time Complexity of ExactBFC (II) 2* Lines Algorithm::ExactBFC Example [1] Input: graph G = (V = L, R , E) [2] Output: ⋈, exact number of butterflies. [3] A ← L, ⋈← 0. [4] If v∈L dv 2 < v∈R dv 2 : A ← R [5] For every vertex v ∈ A: [6] C ← hashmap, initialized with zero [7] For every neighbor u of v: [8] For every neighbor w of u s.t. IDw > IDv: [9] C w ← C w + 1 [10] For every w ∈ C s.t. C x > 0: [11] ⋈ ← ⋈ + C x 2 [12] Return ⋈ 1 x y w z 2 3 • O v∈R dv 2 → O min{ v∈L dv 2 , v∈R dv 2 } .

Notas do Editor

  1. What is the users’ perspectives based on their clicks on different web pages.
  2. What is the users’ perspectives based on their clicks on different web pages.
  3. Frequent Subgraph Mining (e.g., Protein-Protein Interaction Networks)
  4. Frequent Subgraph Mining (e.g., Protein-Protein Interaction Networks)
  5. Frequent Subgraph Mining (e.g., Protein-Protein Interaction Networks)
  6. Frequent Subgraph Mining (e.g., Protein-Protein Interaction Networks)
  7. High coefficients are indicative of a relatively large number of butterflies indicating cohesion or community structure, which is rare in sparse random graphs unless there are some behaviors that go beyond just the degree structure. . The butterfly is also adopted in a recent work by Aksoy et al. [4] to generate bipartite graphs with community struct
  8. High coefficients are indicative of a relatively large number of butterflies indicating cohesion or community structure, which is rare in sparse random graphs unless there are some behaviors that go beyond just the degree structure.
  9. Frequent Subgraph Mining (e.g., Protein-Protein Interaction Networks)
  10. Frequent Subgraph Mining (e.g., Protein-Protein Interaction Networks)
  11. Frequent Subgraph Mining (e.g., Protein-Protein Interaction Networks)
  12. PPI -> cell signaling. Stability and Transient. You can experimentally which proteins are bound to each other. We need connected subgraphs to see what proteins are functionally related. Because the patterns are heterogonous, it is not good to use ML algorithms.
  13. PPI -> cell signaling. Stability and Transient. You can experimentally which proteins are bound to each other. We need connected subgraphs to see what proteins are functionally related. Because the patterns are heterogonous, it is not good to use ML algorithms.
  14. PPI -> cell signaling. Stability and Transient. You can experimentally which proteins are bound to each other. We need connected subgraphs to see what proteins are functionally related. Because the patterns are heterogonous, it is not good to use ML algorithms.
  15. PPI -> cell signaling. Stability and Transient. You can experimentally which proteins are bound to each other. We need connected subgraphs to see what proteins are functionally related. Because the patterns are heterogonous, it is not good to use ML algorithms.
  16. PPI -> cell signaling. Stability and Transient. You can experimentally which proteins are bound to each other. We need connected subgraphs to see what proteins are functionally related. Because the patterns are heterogonous, it is not good to use ML algorithms.
  17. PPI -> cell signaling. Stability and Transient. You can experimentally which proteins are bound to each other. We need connected subgraphs to see what proteins are functionally related. Because the patterns are heterogonous, it is not good to use ML algorithms.
  18. PPI -> cell signaling. Stability and Transient. You can experimentally which proteins are bound to each other. We need connected subgraphs to see what proteins are functionally related. Because the patterns are heterogonous, it is not good to use ML algorithms.
  19. PPI -> cell signaling. Stability and Transient. You can experimentally which proteins are bound to each other. We need connected subgraphs to see what proteins are functionally related. Because the patterns are heterogonous, it is not good to use ML algorithms.
  20. PPI -> cell signaling. Stability and Transient. You can experimentally which proteins are bound to each other. We need connected subgraphs to see what proteins are functionally related. Because the patterns are heterogonous, it is not good to use ML algorithms.
  21. PPI -> cell signaling. Stability and Transient. You can experimentally which proteins are bound to each other. We need connected subgraphs to see what proteins are functionally related. Because the patterns are heterogonous, it is not good to use ML algorithms.
  22. PPI -> cell signaling. Stability and Transient. You can experimentally which proteins are bound to each other. We need connected subgraphs to see what proteins are functionally related. Because the patterns are heterogonous, it is not good to use ML algorithms.
  23. PPI -> cell signaling. Stability and Transient. You can experimentally which proteins are bound to each other. We need connected subgraphs to see what proteins are functionally related. Because the patterns are heterogonous, it is not good to use ML algorithms.
  24. PPI -> cell signaling. Stability and Transient. You can experimentally which proteins are bound to each other. We need connected subgraphs to see what proteins are functionally related. Because the patterns are heterogonous, it is not good to use ML algorithms.
  25. PPI -> cell signaling. Stability and Transient. You can experimentally which proteins are bound to each other. We need connected subgraphs to see what proteins are functionally related. Because the patterns are heterogonous, it is not good to use ML algorithms.
  26. PPI -> cell signaling. Stability and Transient. You can experimentally which proteins are bound to each other. We need connected subgraphs to see what proteins are functionally related. Because the patterns are heterogonous, it is not good to use ML algorithms.
  27. PPI -> cell signaling. Stability and Transient. You can experimentally which proteins are bound to each other. We need connected subgraphs to see what proteins are functionally related. Because the patterns are heterogonous, it is not good to use ML algorithms.
  28. PPI -> cell signaling. Stability and Transient. You can experimentally which proteins are bound to each other. We need connected subgraphs to see what proteins are functionally related. Because the patterns are heterogonous, it is not good to use ML algorithms.
  29. PPI -> cell signaling. Stability and Transient. You can experimentally which proteins are bound to each other. We need connected subgraphs to see what proteins are functionally related. Because the patterns are heterogonous, it is not good to use ML algorithms.
  30. PPI -> cell signaling. Stability and Transient. You can experimentally which proteins are bound to each other. We need connected subgraphs to see what proteins are functionally related. Because the patterns are heterogonous, it is not good to use ML algorithms.
  31. PPI -> cell signaling. Stability and Transient. You can experimentally which proteins are bound to each other. We need connected subgraphs to see what proteins are functionally related. Because the patterns are heterogonous, it is not good to use ML algorithms.
  32. PPI -> cell signaling. Stability and Transient. You can experimentally which proteins are bound to each other. We need connected subgraphs to see what proteins are functionally related. Because the patterns are heterogonous, it is not good to use ML algorithms.
  33. PPI -> cell signaling. Stability and Transient. You can experimentally which proteins are bound to each other. We need connected subgraphs to see what proteins are functionally related. Because the patterns are heterogonous, it is not good to use ML algorithms.
  34. PPI -> cell signaling. Stability and Transient. You can experimentally which proteins are bound to each other. We need connected subgraphs to see what proteins are functionally related. Because the patterns are heterogonous, it is not good to use ML algorithms.
  35. PPI -> cell signaling. Stability and Transient. You can experimentally which proteins are bound to each other. We need connected subgraphs to see what proteins are functionally related. Because the patterns are heterogonous, it is not good to use ML algorithms.
  36. After 3.3 secs
  37. After 3.3 secs
  38. After 3.3 secs
  39. Frequent Subgraph Mining (e.g., Protein-Protein Interaction Networks)
  40. Frequent Subgraph Mining (e.g., Protein-Protein Interaction Networks)
  41. PPI -> cell signaling. Stability and Transient. You can experimentally which proteins are bound to each other. We need connected subgraphs to see what proteins are functionally related. Because the patterns are heterogonous, it is not good to use ML algorithms.
  42. PPI -> cell signaling. Stability and Transient. You can experimentally which proteins are bound to each other. We need connected subgraphs to see what proteins are functionally related. Because the patterns are heterogonous, it is not good to use ML algorithms.