NS-CUK Joint Journal Club : S.T.Nguyen, Review on "Graph Neural Networks for Link Prediction with Subgraph Sketching", ICLR 2023.

Nguyen Thanh Sang
Network Science Lab
Dept. of Artificial Intelligence
The Catholic University of Korea
E-mail: sang.ngt99@gmail.com
23/05/2023

1
➢ Introduction
• Link Prediction
• Graph Isomorphism and Automorphism
• Subgraphs
➢ Methods
• Subgraph methods for link prediction
• Subgraph Sketching
• Scaling with preprocessing
➢ Evaluations
• Results
➢ Conclusions

2
Graphs
Graphs (Networks) are complex.
Several applications of graph mining:
• Link prediction: predict whether there are missing
links between two nodes
• Ex: Knowledge graph completion
• Node classification: predict a property of a node
• Ex: Categorize online users / items
• Graph classification: categorize different graphs
• Ex: Molecule property prediction
• Clustering: detect if nodes form a community
• Ex: Social circle detection
• Other tasks:
• Graph generation: drug discovery
• Graph evolution: physical simulation

3
Link Prediction
Link Prediction (LP) is an important problem in graph ML with many industrial applications
+ For example, recommender systems can be
formulated as LP;
+ Link prediction is also a key process in drug
discovery and knowledge graph construction.

4
Link Prediction
➢ There are three main classes of LP methods:
o Heuristics: estimate the distance between two nodes (e.g., personalized PageRank (PPR) or graph
distance) or the similarity of their neighborhoods (e.g., Common Neighbors (CN), Adamic-Adar
(AA), or Resource Allocation (RA));
o Unsupervised node embeddings or factorization methods: these account for the majority of
production systems;
o Graph Neural Networks, in particular of the Message-Passing type (MPNNs).
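The neighborhood-similarity heuristics listed above have simple closed forms: CN counts shared neighbors, AA weights each shared neighbor w by 1/log deg(w), and RA weights it by 1/deg(w). A minimal Python sketch, assuming an undirected graph stored as a dict from node to neighbor set (the function name `lp_heuristics` and the toy graph are illustrative, not taken from the paper):

```python
import math


def lp_heuristics(adj, u, v):
    """Common Neighbors, Adamic-Adar and Resource Allocation scores for (u, v)."""
    common = adj[u] & adj[v]
    cn = len(common)
    # A common neighbor w is adjacent to both u and v, so deg(w) >= 2 and log(deg) > 0.
    aa = sum(1.0 / math.log(len(adj[w])) for w in common)
    # Resource Allocation uses 1/deg(w) instead of 1/log(deg(w)).
    ra = sum(1.0 / len(adj[w]) for w in common)
    return cn, aa, ra


# Toy undirected graph: nodes 0 and 1 share the neighbors 2 and 3.
edges = [(0, 2), (0, 3), (1, 2), (1, 3), (3, 4)]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

cn, aa, ra = lp_heuristics(adj, 0, 1)
print(cn, round(aa, 4), round(ra, 4))
```

Node 3 has degree 3 while node 2 has degree 2, so node 3 contributes less to AA and RA than node 2 does, even though both count equally for CN.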

5
Graph Isomorphism and Graph Automorphism
Two graphs are called isomorphic whenever there exists an isomorphism between them, i.e., a bijection
between their vertex sets that preserves adjacency.
+ An automorphism of a graph is a graph isomorphism
with itself, i.e., a bijection from the vertices of the given
graph G onto the vertices of G that preserves adjacency.
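For tiny graphs, the automorphisms and the orbits they induce can be found by brute force. The sketch below (hypothetical helper names `automorphisms` and `orbits`; it is exponential in the number of nodes, so it is for illustration only) tests every vertex permutation for edge preservation:

```python
from itertools import permutations


def automorphisms(nodes, edges):
    """All bijections of the vertex set onto itself that map the edge set onto itself."""
    edge_set = {frozenset(e) for e in edges}
    autos = []
    for perm in permutations(nodes):
        mapping = dict(zip(nodes, perm))
        if {frozenset((mapping[a], mapping[b])) for a, b in edges} == edge_set:
            autos.append(mapping)
    return autos


def orbits(nodes, edges):
    """Group nodes that some automorphism maps onto each other."""
    autos = automorphisms(nodes, edges)
    seen, result = set(), []
    for n in nodes:
        if n in seen:
            continue
        orbit = {g[n] for g in autos}
        seen |= orbit
        result.append(sorted(orbit))
    return result


# Path graph 0-1-2-3.
path_nodes, path_edges = [0, 1, 2, 3], [(0, 1), (1, 2), (2, 3)]
print(orbits(path_nodes, path_edges))  # [[0, 3], [1, 2]]
```

On the path 0-1-2-3 the only non-trivial automorphism is the reversal, so the two endpoints are automorphic, as are the two inner nodes.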

6
Subgraphs
+ The state-of-the-art methods for LP restrict computation to subgraphs enclosing a link,
transforming link prediction into binary subgraph classification.
+ Subgraph GNNs (SGNN) are inspired by the strong performance of LP heuristics compared
to more sophisticated techniques and are motivated as an attempt to learn data-driven LP
heuristics.

7
Problems
➢ MPNNs tend to perform poorly in link prediction:
+ Standard MPNNs are incapable of counting triangles and
consequently of counting Common Neighbors or computing one-hop
or two-hop LP heuristics.
+ GNN-based LP approaches combine permutation-equivariant
structural node representations and a readout function that maps from
two node representations to a link probability.
➢ All nodes u in the same orbit induced by the graph automorphism
group have equal representations.
➢ Such automorphic nodes cannot be distinguished, so links (u, v₁) and (u, v₂) whose
endpoints v₁ and v₂ are automorphic receive the same score.
➢ Some existing subgraph-based methods solve this problem by
labeling the nodes, but extracting the subgraphs is complicated and
not easy to parallelize.
➢ SGNNs show strong performance in LP.
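The triangle-counting limitation can be illustrated with 1-WL color refinement, which is a well-known upper bound on the distinguishing power of standard MPNNs. In the sketch below (hypothetical helpers; colors are kept as uncompressed nested tuples so that colors from different graphs are directly comparable), a 6-cycle and two disjoint triangles receive identical color histograms, even though one graph contains no triangles and the other contains two:

```python
from collections import Counter


def wl_colors(adj, rounds=3):
    """1-WL color refinement: a node's new color is (old color, sorted neighbor colors)."""
    colors = {n: 0 for n in adj}
    for _ in range(rounds):
        colors = {n: (colors[n], tuple(sorted(colors[w] for w in adj[n]))) for n in adj}
    return colors


def count_triangles(adj):
    # Summing common neighbors over edges (u < v) counts every triangle once per edge.
    return sum(len(adj[u] & adj[v]) for u in adj for v in adj[u] if u < v) // 3


def cycle(n, offset=0):
    return {offset + i: {offset + (i - 1) % n, offset + (i + 1) % n} for i in range(n)}


hexagon = cycle(6)
two_triangles = {**cycle(3), **cycle(3, offset=3)}

same = Counter(wl_colors(hexagon).values()) == Counter(wl_colors(two_triangles).values())
print(same)                                                      # True
print(count_triangles(hexagon), count_triangles(two_triangles))  # 0 2
```

Every node in both graphs has degree 2, so refinement never splits the colors; the per-edge common-neighbor counts (0 versus 1) are invisible to it.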

8
Problems
➢ Some methods use features that count triangles:
➢ Node features differ for every target edge.
➢ Automorphic nodes become distinguishable.
➢ However, these methods are complicated and lack scalability.

9
Problems
➢ SGNNs suffer from some serious limitations:
1. Constructing the subgraphs is expensive;
2. Subgraphs are irregular and so batching them is inefficient on GPUs;
3. Each step of inference is almost as expensive as each training step because subgraphs must
be constructed for every test link.

10
Problems
➢ SEAL extracts a subgraph around each target link.
➢ A subgraph must be generated for every link.
➢ Node labels must be generated for every subgraph.
➢ This makes SEAL difficult to use at large scale.
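Per-link extraction in the SEAL style can be sketched as a truncated BFS from both endpoints followed by taking the induced subgraph. This is a minimal illustration with hypothetical names; SEAL implementations typically also hide the target edge and attach node labels, and both steps are omitted here:

```python
from collections import deque


def k_hop_nodes(adj, source, k):
    """Nodes within k hops of source (BFS truncated at depth k)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return set(dist)


def enclosing_subgraph(adj, u, v, k):
    """Subgraph induced on the union of the k-hop neighborhoods of u and v."""
    nodes = k_hop_nodes(adj, u, k) | k_hop_nodes(adj, v, k)
    return {n: adj[n] & nodes for n in nodes}


# Star graph: hub 0 connected to leaves 1..50.
star = {0: set(range(1, 51))}
for leaf in range(1, 51):
    star[leaf] = {0}

for k in (1, 2):
    print(k, len(enclosing_subgraph(star, 1, 2, k)))  # 1 3, then 2 51
```

On the 50-leaf star, the 1-hop enclosing subgraph of link (1, 2) has 3 nodes while the 2-hop one already contains all 51, which shows both the per-link cost and the irregular subgraph sizes that make batching hard.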

11
Contributions
• Analyze the SGNN components and reveal which properties of the subgraphs are salient
to the LP problem.
➢ Develop an MPNN (ELPH) that passes subgraph sketches as messages.
• The sketches allow the most important qualities of the subgraphs to be
summarized in the nodes.
• The resulting model removes the need for explicit subgraph construction and is a full-graph
MPNN with complexity similar to GCN.
• ELPH is strictly more expressive than MPNNs for LP => it solves the automorphic node
problem.
• Propose BUDDY, a highly scalable model that precomputes sketches and node features
to solve scalability issues when the data exceeds GPU memory.

12
Sketches for Intersection Estimation
➢ Two sketching techniques, given sets A and B:
➢ HyperLogLog efficiently estimates the cardinality of the union |A ∪ B|.
➢ MinHashing estimates the Jaccard index J(A, B) = |A ∩ B|/|A ∪ B|.
➢ The authors combine these approaches to estimate the intersection of node sets
produced by graph traversals.
➢ Both techniques represent sets as sketches.
➢ Each technique has a parameter p controlling the trade-off between
accuracy and computational cost.
➢ The sketch of a union of sets is obtained from the individual sketches by permutation-invariant
operations (element-wise min for MinHash and element-wise max for
HyperLogLog).
=> The main idea is to give each node features, derived from the edges, from which
triangles and other neighborhood intersections can be counted.
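The combination rests on an identity that follows directly from the definition of the Jaccard index above. Writing mh(·) and hll(·) for the MinHash and HyperLogLog sketches of a set (our notation), the intersection is estimated from two quantities that are both computable from union sketches alone:

```latex
\begin{aligned}
|A \cap B| &= J(A, B)\,|A \cup B| \;\approx\; \hat{J}(A, B)\,\widehat{|A \cup B|},\\
\mathrm{mh}(A \cup B) &= \min\big(\mathrm{mh}(A), \mathrm{mh}(B)\big), \qquad
\mathrm{hll}(A \cup B) = \max\big(\mathrm{hll}(A), \mathrm{hll}(B)\big),
\end{aligned}
```

where \hat{J} is read off the MinHash sketches, \widehat{|A \cup B|} off the HyperLogLog sketch of the union, and min and max act element-wise.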

13
HyperLogLog
• HyperLogLog efficiently estimates the cardinality of
large sets.
• It accomplishes this by representing sets using a
constant size data sketch.
• These sketches can be combined in time that is
constant w.r.t. the data size and linear in the sketch size,
using the element-wise maximum, to estimate the size of a
set union.
• The algorithm computes the harmonic mean of 2^(M[j]) over
the m registers M[j].
• Up to a bias-correction constant, this mean estimates the cardinality of the set divided
by m.
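A minimal HyperLogLog in Python, following the description above: the first b bits of a 64-bit hash pick a register, the register keeps the largest leftmost-1-bit rank it has seen, and the estimate is α_m · m times the harmonic mean of 2^(M[j]). The SHA-1 based hash, the class and method names, and the choice b = 8 (m = 256 registers) are our assumptions; b plays the role of the accuracy/cost parameter p mentioned earlier:

```python
import hashlib
import math


def hash64(item):
    """Deterministic 64-bit hash of an item's string form."""
    return int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")


class HyperLogLog:
    def __init__(self, b=8):
        self.b = b                      # number of index bits
        self.m = 1 << b                 # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        x = hash64(item)
        j = x >> (64 - self.b)                     # top b bits select a register
        w = x & ((1 << (64 - self.b)) - 1)         # remaining 64 - b bits
        rank = (64 - self.b) - w.bit_length() + 1  # 1-based position of leftmost 1-bit
        self.registers[j] = max(self.registers[j], rank)

    def union(self, other):
        """Sketch of the set union: element-wise max of the registers."""
        out = HyperLogLog(self.b)
        out.registers = [max(r, s) for r, s in zip(self.registers, other.registers)]
        return out

    def count(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)                             # bias correction, m >= 128
        harmonic_mean = m / sum(2.0 ** -r for r in self.registers)  # of 2^(M[j])
        estimate = alpha * m * harmonic_mean
        zeros = self.registers.count(0)
        if estimate <= 2.5 * m and zeros:                            # small-range correction
            estimate = m * math.log(m / zeros)
        return estimate


sketch_a, sketch_b = HyperLogLog(), HyperLogLog()
for i in range(3000):
    sketch_a.add(i)
for i in range(2000, 5000):
    sketch_b.add(i)
union = sketch_a.union(sketch_b)
print(round(union.count()))  # estimate of |A ∪ B|; the true value is 5000
```

The union sketch is identical to the sketch one would build from A ∪ B directly, which is why the union can be estimated without revisiting the data.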

14
Minhashing
• The MinHash algorithm estimates the Jaccard index.
• It can similarly be expressed in three functions Initialize,
Union, and J.
• The algorithm stores the minimum value for each of the p
permutations of all hashed elements.
• The Jaccard estimate of the similarity of two sets is given by
the Hamming similarity of their sketches.
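A minimal MinHash in the same spirit: each of the p seeded hash functions stands in for one random permutation, the sketch keeps the minimum hash per function, unions take the element-wise min, and the Jaccard estimate is the fraction of matching positions. The BLAKE2b-based hash and all names are our assumptions:

```python
import hashlib


def seeded_hash(item, seed):
    """64-bit hash of item under hash function number `seed`."""
    data = f"{seed}:{item}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


class MinHash:
    def __init__(self, p=128):
        self.p = p                          # number of simulated permutations
        self.mins = [float("inf")] * p

    def add(self, item):
        for i in range(self.p):
            self.mins[i] = min(self.mins[i], seeded_hash(item, i))

    def union(self, other):
        """Sketch of the set union: element-wise min."""
        out = MinHash(self.p)
        out.mins = [min(x, y) for x, y in zip(self.mins, other.mins)]
        return out

    def jaccard(self, other):
        """Hamming similarity: fraction of positions where the sketches agree."""
        return sum(x == y for x, y in zip(self.mins, other.mins)) / self.p


sketch_a, sketch_b = MinHash(), MinHash()
for i in range(600):
    sketch_a.add(i)
for i in range(400, 1000):
    sketch_b.add(i)
print(sketch_a.jaccard(sketch_b))  # estimate of J(A, B); the true value is 200 / 1000 = 0.2
```

Multiplying this Jaccard estimate by a HyperLogLog estimate of |A ∪ B| gives the intersection estimate described on the previous slides.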

15
Analyzing Subgraph Methods for Link Prediction
➢ SGNNs can be decomposed into the following steps:
1. Subgraph extraction around every pair of nodes for which one desires to
perform LP;
2. Augmentation of the subgraph nodes with structure features;
3. Feature propagation over the subgraphs using a GNN, and
4. Learning a graph-level readout function to predict the link.

16
➢ Structure features address limitations in GNN expressivity stemming
from the inherent inability of message passing to distinguish
automorphic nodes.
➢ The three best known are Zero-One (ZO) encoding, Double Radius
Node Labeling (DRNL) and Distance Encoding (DE).
➢ Figure 3 shows that most of the predictive performance is concentrated
at low distances.
Structure Features
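DRNL can be sketched as follows, assuming the SEAL definition: the two target nodes get label 1, nodes unreachable from either target get 0, and every other node gets 1 + min(d_u, d_v) + (d/2)[(d/2) + (d mod 2) − 1] with d = d_u + d_v and integer division, where the distance to one target is computed with the other target removed. The helper names are hypothetical:

```python
from collections import deque


def bfs_dist(adj, source, blocked):
    """Shortest-path distances from source, never passing through node `blocked`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb != blocked and nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def drnl_labels(adj, u, v):
    du = bfs_dist(adj, u, blocked=v)
    dv = bfs_dist(adj, v, blocked=u)
    labels = {}
    for n in adj:
        if n in (u, v):
            labels[n] = 1                     # the two target nodes
        elif n in du and n in dv:
            d = du[n] + dv[n]
            half, rem = divmod(d, 2)
            labels[n] = 1 + min(du[n], dv[n]) + half * (half + rem - 1)
        else:
            labels[n] = 0                     # unreachable from one of the targets
    return labels


edges = [(0, 2), (1, 2), (2, 3), (0, 4)]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)
print(drnl_labels(adj, 0, 1))
```

For link (0, 1), the common neighbor 2 (distances (1, 1)) gets label 2, node 3 (distances (2, 2)) gets 5, and node 4, which can reach node 1 only through node 0, gets 0.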

17
➢ Propagation / GNN: structure features are usually embedded
into a continuous space, concatenated to any node features
and propagated over the subgraphs.
➢ Readout / Pooling Function: a readout function R(𝑆𝑢𝑣, 𝑌𝑢𝑣)
maps the resulting representations to link probabilities.

18
Link Prediction with Subgraph Sketching
+ Let 𝐴𝑢𝑣[𝑑𝑢, 𝑑𝑣] be the number of (𝑑𝑢, 𝑑𝑣) labels for the link (u, v), which is equivalent to the number of
nodes at distances exactly 𝑑𝑢 and 𝑑𝑣 from u and v, respectively.
• Compute 𝐴𝑢𝑣[𝑑𝑢, 𝑑𝑣] for all 𝑑𝑢, 𝑑𝑣 up to the receptive field k; this guarantees that the number of counts
does not depend on the graph size and mitigates overfitting.
+ To alleviate the loss of information coming from a fixed k, also count the number of nodes at distance d
from u and at distance > k from v.
Structure Features Counts
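Computed exactly, these counts only need BFS distances from u and v. The sketch below (hypothetical names `structure_counts`, `a` and `b`; index 0 of each array is unused) makes explicit what the sketches later approximate without any BFS; note that the entry a[1][1] is the common-neighbor count:

```python
from collections import deque


def bfs_dist(adj, source):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def structure_counts(adj, u, v, k):
    """a[du][dv]: nodes at distance du from u and dv from v (1 <= du, dv <= k);
    b[d]: nodes at distance d from u and at distance > k from v."""
    dist_u, dist_v = bfs_dist(adj, u), bfs_dist(adj, v)
    a = [[0] * (k + 1) for _ in range(k + 1)]
    b = [0] * (k + 1)
    for n, du in dist_u.items():
        if not 1 <= du <= k:
            continue
        dv = dist_v.get(n, float("inf"))
        if 1 <= dv <= k:
            a[du][dv] += 1
        elif dv > k:
            b[du] += 1
    return a, b


adj = {0: {1, 2, 4}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}, 4: {0, 5}, 5: {4}}
a, b = structure_counts(adj, 0, 1, k=2)
print(a[1][1], a[1][2], a[2][2], b)  # 1 1 1 [0, 0, 1]
```

Node 2 is the single common neighbor of 0 and 1, node 4 is at distances (1, 2), node 3 at (2, 2), and node 5 is 2 hops from node 0 but 3 hops from node 1, so it lands in b[2].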

19
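• Approximate the intersection of neighborhood sets as |A ∩ B| ≈ Ĵ(A, B) · |A ∪ B|, where A and B are the neighborhoods of u and v up to 𝑑𝑢 and 𝑑𝑣 hops, Ĵ is estimated from the MinHash sketches and |A ∪ B| from the HyperLogLog sketches.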
Estimating Intersections and Cardinalities
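One way to turn intersection estimates into the distance-specific counts is two-dimensional inclusion-exclusion over cumulative neighborhoods. Writing N^{≤d}(u) for the set of nodes within d hops of u (our notation; the paper's exact formulation may differ), the following identities are exact for d_u, d_v, d ≥ 1, so substituting sketch-based estimates for each intersection and cardinality yields estimates of the counts:

```latex
\begin{aligned}
I(a, b) &= \big|N^{\le a}(u) \cap N^{\le b}(v)\big|, \qquad N^{\le 0}(u) = \{u\},\\
A_{uv}[d_u, d_v] &= I(d_u, d_v) - I(d_u - 1, d_v) - I(d_u, d_v - 1) + I(d_u - 1, d_v - 1),\\
\#\{w : d(u, w) = d,\; d(v, w) > k\}
  &= \big(|N^{\le d}(u)| - |N^{\le d - 1}(u)|\big) - \big(I(d, k) - I(d - 1, k)\big).
\end{aligned}
```

For an existing edge (u, v), the first identity gives A_uv[1, 1] = (CN + 2) − 1 − 1 + 0 = CN, the common-neighbor count.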

20
• By augmenting the messages with subgraph sketches, ELPH achieves higher expressiveness for the same
asymptotic complexity.
• Sketches are computed by aggregating neighbor sketches with the element-wise min (MinHash) and max (HyperLogLog) operators.
=> The intersection estimates up to the l-hop neighborhood are used as edge features.
=> These features modulate message transmission based on local graph structure, similarly to how attention is used to
modulate message transmission based on feature couplings.
• A link predictor:
Efficient Link Prediction with Hashes (ELPH)
[Equations: ELPH message passing and link predictor. Annotated terms: learnable functions; MinHashing sketch; HyperLogLog sketch; a local permutation-invariant aggregation function; MLP]
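Because min and max are associative and idempotent, aggregating sketches over {u} ∪ N(u) for l rounds yields the sketch of u's whole l-hop neighborhood, with no subgraph ever extracted. A minimal NumPy sketch, assuming adjacency lists and integer register arrays of shape (n, p) for MinHash and (n, m) for HyperLogLog; the function name is hypothetical, and the random initial registers are stand-ins, since the propagation identity holds for any initial sketches:

```python
import numpy as np


def propagate_sketches(adj, init_min, init_max, hops):
    """Row u of mins[l] / maxs[l] summarizes the set of nodes within l hops of u.
    init_min / init_max hold each node's singleton sketch."""
    mins, maxs = [init_min], [init_max]
    for _ in range(hops):
        prev_min, prev_max = mins[-1], maxs[-1]
        new_min, new_max = prev_min.copy(), prev_max.copy()  # start from u itself
        for u, nbrs in adj.items():
            for w in nbrs:
                new_min[u] = np.minimum(new_min[u], prev_min[w])  # MinHash union
                new_max[u] = np.maximum(new_max[u], prev_max[w])  # HyperLogLog union
        mins.append(new_min)
        maxs.append(new_max)
    return mins, maxs


rng = np.random.default_rng(0)
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}      # path 0-1-2-3
init_min = rng.integers(0, 2**31, size=(4, 8))    # stand-in singleton MinHash sketches
init_max = rng.integers(0, 20, size=(4, 4))       # stand-in singleton HLL registers
mins, maxs = propagate_sketches(adj, init_min, init_max, hops=2)

# Node 0's 2-hop neighborhood is {0, 1, 2}: its hop-2 sketches match the
# union sketches built directly from those nodes' singleton sketches.
print((mins[2][0] == init_min[[0, 1, 2]].min(axis=0)).all())  # True
print((maxs[2][0] == init_max[[0, 1, 2]].max(axis=0)).all())  # True
```

Each round costs one pass over the edges with work linear in the sketch size, which is consistent with the claim that ELPH keeps an MPNN-like complexity.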

21
Problem solved
• Counting intersection cardinalities distinguishes automorphic
nodes.
• More expressive than MPNNs.
=> Improves link prediction performance.
Automorphic nodes

22
Scaling ELPH with Preprocessing (BUDDY)
• ELPH is efficient when the dataset fits into GPU memory. When it does not, the graph must be batched
into subgraphs.
• Preprocessing: a fixed (non-learnable) propagation of the node features almost recovers the performance of
learnable SGNN propagation.
+ Sketches can also be precomputed in a similar way:
+ Concatenate features diffused at different hops to obtain the input node features:
• Link Predictor:
• Time Complexity:
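The feature-diffusion part of this preprocessing can be sketched as computing [X, ÂX, Â²X, ...] once, before training. The GCN-style normalization Â = D^(-1/2)(A + I)D^(-1/2) is our assumption, as are the dense NumPy matrices (a large graph would need sparse ones); the function name is hypothetical:

```python
import numpy as np


def precompute_diffused_features(adj_matrix, x, hops):
    """Return [X, ÂX, Â²X, ..., Â^hops X] concatenated column-wise."""
    a = adj_matrix + np.eye(adj_matrix.shape[0])          # add self-loops (assumption)
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    a_hat = d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]  # D^-1/2 (A + I) D^-1/2
    feats = [x]
    for _ in range(hops):
        feats.append(a_hat @ feats[-1])                    # one more diffusion hop
    return np.concatenate(feats, axis=1)


adj_matrix = np.array([[0, 1, 0],
                       [1, 0, 1],
                       [0, 1, 0]], dtype=float)
x = np.eye(3)
z = precompute_diffused_features(adj_matrix, x, hops=2)
print(z.shape)  # (3, 9)
```

Under this reading, the diffusion is fixed, so the link predictor only needs row lookups into the precomputed matrix and training batches never have to touch the graph.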

23
Experiments
• Subgraph statistics are generated by expanding k-
hop subgraphs around 1000 randomly selected
links.
• The size of subgraphs is highly irregular with high
standard deviations making efficient parallelization
in scalable architectures challenging.
Datasets

24
Experiments
• Either ELPH or BUDDY achieves the best performance
in five of the seven datasets.
• Being a full-graph method, ELPH runs out of
memory on the two largest datasets.
• There is no clear winner between ELPH and BUDDY
in terms of performance.
• BUDDY is orders of magnitude faster both in
training and inference.
Baseline comparisons

25
Conclusions
• Proposed a new model for LP which achieves better time and space complexity and superior predictive
performance on a range of standard benchmarks.
• The current work is limited to undirected graphs, or directed graphs that are first preprocessed to make
them undirected, as is common in GNN research.
