Nguyen Thanh Sang
Network Science Lab
Dept. of Artificial Intelligence
The Catholic University of Korea
E-mail: sang.ngt99@gmail.com
23/05/2023
1
▪ Introduction
• Link Prediction
• Graph Isomorphism and Automorphism
• Subgraphs
▪ Methods
• Subgraph methods for link prediction
• Subgraph Sketching
• Scaling with preprocessing
▪ Evaluations
• Results
▪ Conclusions
2
Graphs
Graphs (Networks) are complex.
Several applications of Graph mining:
• Link prediction: predict whether there are missing links between two nodes
  • Ex: Knowledge graph completion
• Node classification: predict a property of a node
  • Ex: Categorize online users / items
• Graph classification: categorize different graphs
  • Ex: Molecule property prediction
• Clustering: detect if nodes form a community
  • Ex: Social circle detection
• Other tasks:
  • Graph generation: drug discovery
  • Graph evolution: physical simulation
3
Link Prediction
Link Prediction (LP) is an important problem in graph ML with many industrial applications:
+ For example, recommender systems can be formulated as LP;
+ Link prediction is also a key process in drug discovery and knowledge graph construction.
4
Link Prediction
▪ There are three main classes of LP methods:
o Heuristics: estimate the distance between two nodes (e.g., Personalized PageRank (PPR) or graph distance) or the similarity of their neighborhoods (e.g., Common Neighbors (CN), Adamic-Adar (AA), or Resource Allocation (RA));
o Unsupervised node embeddings or factorization methods, which account for the majority of production recommendation systems;
o Graph Neural Networks, in particular of the Message-Passing type (MPNNs).
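The neighborhood heuristics named above can be sketched in a few lines. This is a minimal illustration on a toy graph, using the usual textbook definitions of CN, AA, and RA; the function names and the edge list are made up for the example and do not come from the slides.

```python
import math

def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def common_neighbors(adj, u, v):
    """CN: number of nodes adjacent to both u and v."""
    return len(adj[u] & adj[v])

def adamic_adar(adj, u, v):
    """AA: common neighbors weighted by 1 / log(degree)."""
    # A common neighbor of two distinct nodes has degree >= 2, so log(degree) > 0.
    return sum(1.0 / math.log(len(adj[w])) for w in adj[u] & adj[v])

def resource_allocation(adj, u, v):
    """RA: common neighbors weighted by 1 / degree."""
    return sum(1.0 / len(adj[w]) for w in adj[u] & adj[v])

# Toy graph (hypothetical): nodes 0 and 1 share the neighbors 2 and 3.
edges = [(0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
adj = build_adjacency(edges)
cn = common_neighbors(adj, 0, 1)
aa = adamic_adar(adj, 0, 1)
ra = resource_allocation(adj, 0, 1)
print(cn, aa, ra)
```

All three scores only look at the shared one-hop neighborhood, which is why they are cheap and why triangle counting matters for learning them.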
5
Graph Isomorphism and Graph Automorphism
Two graphs are called isomorphic whenever there exists an isomorphism between them, i.e., a bijection between their vertex sets that preserves adjacency.
+ An automorphism of a graph G is an isomorphism from G to itself, i.e., a bijection from the vertices of G back to the vertices of G that maps edges to edges and non-edges to non-edges.
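To make the definition concrete, the following sketch enumerates the automorphisms of a tiny graph by brute force and groups nodes into orbits. It is only meant for graphs with a handful of nodes, since it tries every permutation; the helper names and the example path graph are assumptions for illustration.

```python
from itertools import permutations

def is_automorphism(edges, perm):
    """True if the node mapping `perm` maps the edge set onto itself."""
    edge_set = {frozenset(e) for e in edges}
    mapped = {frozenset((perm[u], perm[v])) for u, v in edges}
    return mapped == edge_set

def automorphisms(nodes, edges):
    """Yield every automorphism as a dict node -> node (brute force)."""
    for image in permutations(nodes):
        perm = dict(zip(nodes, image))
        if is_automorphism(edges, perm):
            yield perm

def orbits(nodes, edges):
    """Nodes share an orbit if some automorphism maps one onto the other."""
    orbit_of = {u: set() for u in nodes}
    for perm in automorphisms(nodes, edges):
        for u in nodes:
            orbit_of[u].add(perm[u])
    return {frozenset(o) for o in orbit_of.values()}

# Path graph 0 - 1 - 2 - 3: the only non-trivial automorphism reverses the path.
nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 3)]
path_orbits = orbits(nodes, edges)
print(path_orbits)
```

In this path graph the two end nodes form one orbit and the two inner nodes form another, so a model that assigns equal representations within an orbit cannot tell node 0 from node 3.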
6
Subgraphs
+ State-of-the-art methods for LP restrict computation to subgraphs enclosing a link, transforming link prediction into binary subgraph classification.
+ Subgraph GNNs (SGNNs) are inspired by the strong performance of LP heuristics compared to more sophisticated techniques and are motivated as an attempt to learn data-driven LP heuristics.
7
Problems
▪ MPNNs tend to perform poorly in link prediction:
+ Standard MPNNs are incapable of counting triangles and consequently of counting Common Neighbors or computing one-hop or two-hop LP heuristics.
+ GNN-based LP approaches combine permutation-equivariant structural node representations and a readout function that maps from two node representations to a link probability.
=> All nodes u in the same orbit induced by the graph automorphism group have equal representations.
=> The model cannot distinguish automorphic nodes, so the links (u, v) and (u, w) receive the same probability whenever v and w share an orbit.
▪ Some existing subgraph-based methods solve this problem by sorting the nodes, but subgraph extraction is complicated and not easy to parallelize.
=> SGNNs show strong performance in LP.
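The triangle-counting limitation can be illustrated with 1-WL color refinement, which is commonly used as an upper bound on what a standard MPNN without node features can distinguish. The sketch below is an illustration under that assumption, not the paper's experiment: it runs refinement on the disjoint union of a 6-cycle and two triangles, so that colors are comparable across both graphs.

```python
def wl_colors(adj, rounds=3):
    """1-WL colour refinement: every node starts with the same colour."""
    colors = {u: 0 for u in adj}
    for _ in range(rounds):
        # A node's new colour depends on its colour and the multiset of neighbour colours.
        signatures = {
            u: (colors[u], tuple(sorted(colors[v] for v in adj[u])))
            for u in adj
        }
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        colors = {u: palette[signatures[u]] for u in adj}
    return colors

def count_triangles(adj):
    """Count each triangle once by enforcing u < v < w."""
    return sum(
        1
        for u in adj for v in adj[u] for w in adj[v]
        if u < v < w and w in adj[u]
    )

# A 6-cycle (nodes 0..5) and two disjoint triangles (nodes 6..11).
hexagon = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
two_triangles = {
    6: {7, 8}, 7: {6, 8}, 8: {6, 7},
    9: {10, 11}, 10: {9, 11}, 11: {9, 10},
}

# Refine on the disjoint union so both graphs share one colour palette.
colors = wl_colors({**hexagon, **two_triangles})
print(set(colors.values()), count_triangles(hexagon), count_triangles(two_triangles))
```

Every node ends up with the same color because all nodes have degree 2, yet one graph contains no triangles and the other contains two. Any heuristic built on common neighbors is therefore out of reach for this kind of message passing.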
8
Problems
▪ Some methods add features that count triangles:
=> Every edge gets different node features.
=> Nodes are no longer automorphic.
=> These methods are complicated and lack scalability.
9
Problems
▪ SGNNs suffer from some serious limitations:
1. Constructing the subgraphs is expensive;
2. Subgraphs are irregular and so batching them is inefficient on GPUs;
3. Each step of inference is almost as expensive as each training step because subgraphs must be constructed for every test link.
10
Problems
▪ SEAL generates a subgraph around each link.
=> A subgraph must be generated for every link.
=> Labels must be generated for every subgraph.
=> Difficult to use at large scale.
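The per-link cost can be seen in a small sketch of subgraph extraction. It assumes the enclosing subgraph of a link (u, v) is the subgraph induced by all nodes within k hops of u or of v; the function names and the toy graph are hypothetical.

```python
from collections import deque

def k_hop_distances(adj, source, k):
    """Breadth-first search that stops expanding at depth k."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def enclosing_subgraph(adj, u, v, k):
    """Subgraph induced by nodes within k hops of u or v."""
    nodes = set(k_hop_distances(adj, u, k)) | set(k_hop_distances(adj, v, k))
    return {n: adj[n] & nodes for n in nodes}

# Toy graph: a hub (node 0) with five neighbours, plus a tail 5 - 6 - 7 - 8.
edges = [(0, i) for i in range(1, 6)] + [(5, 6), (6, 7), (7, 8)]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

near_hub = enclosing_subgraph(adj, 0, 1, k=1)
near_tail = enclosing_subgraph(adj, 7, 8, k=1)
print(len(near_hub), len(near_tail))
```

Two breadth-first searches run for every candidate link, and the resulting subgraphs differ in size depending on where the link sits, which is what makes batching awkward.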
11
Contributions
• Analyze the SGNN components and reveal which properties of the subgraphs are salient to the LP problem.
=> Develop an MPNN (ELPH) that passes subgraph sketches as messages.
• The sketches allow the most important qualities of the subgraphs to be summarized in the nodes.
• The resulting model removes the need for explicit subgraph construction and is a full-graph MPNN with complexity similar to GCN.
• ELPH is strictly more expressive than MPNNs for LP => it solves the automorphic node problem.
• Propose BUDDY, a highly scalable model that precomputes sketches and node features to solve scalability issues when the data exceeds GPU memory.
12
Sketches for Intersection Estimation
▪ Two sketching techniques, given sets A and B:
▪ HyperLogLog efficiently estimates the cardinality of the union |A ∪ B|.
▪ MinHashing estimates the Jaccard index J(A, B) = |A ∩ B| / |A ∪ B|.
=> Combine these approaches to estimate the intersection of node sets produced by graph traversals.
▪ These techniques represent sets as sketches.
▪ Each technique has a parameter p controlling the trade-off between accuracy and computational cost.
▪ The sketches of the union of sets are given by permutation-invariant operations (element-wise min for MinHash and element-wise max for HyperLogLog).
=> The main idea is to build node features from both edges and triangle counts.
13
HyperLogLog
• HyperLogLog efficiently estimates the cardinality of
large sets.
• It accomplishes this by representing sets using a
constant size data sketch.
• These sketches can be combined in time that is
constant w.r.t the data size and linear in the sketch size
using elementwise maximum to estimate the size of a
set union.
• The algorithm finds the harmonic mean of 2^{M[j]} over the m registers M[1], ..., M[m].
• This mean estimates the cardinality of the set divided by m.
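A minimal HyperLogLog sketch along these lines is shown below. It is an illustrative implementation, not the one used in the paper: the SHA-1 based 64-bit hash, the bias constant for m >= 128 registers, and the small-range (linear counting) correction are standard choices assumed here, and all function names are made up.

```python
import hashlib
import math

def hash64(item):
    """Deterministic 64-bit hash (assumption: first 8 bytes of SHA-1)."""
    digest = hashlib.sha1(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def hll_initialize(items, p=12):
    """Build m = 2**p registers; each keeps the largest 'rank' seen."""
    m = 1 << p
    registers = [0] * m
    for item in items:
        h = hash64(item)
        j = h >> (64 - p)                       # first p bits pick the register
        rest = h & ((1 << (64 - p)) - 1)        # remaining bits
        rank = (64 - p) - rest.bit_length() + 1 # position of the leftmost 1-bit
        registers[j] = max(registers[j], rank)
    return registers

def hll_union(a, b):
    """Sketch of the set union: element-wise maximum."""
    return [max(x, y) for x, y in zip(a, b)]

def hll_cardinality(registers):
    """Bias-corrected harmonic-mean estimate (constant valid for m >= 128)."""
    m = len(registers)
    alpha = 0.7213 / (1 + 1.079 / m)
    estimate = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if estimate <= 2.5 * m and zeros:
        estimate = m * math.log(m / zeros)      # linear counting for small sets
    return estimate

set_a = range(0, 10000)
set_b = range(5000, 15000)
sketch_a = hll_initialize(set_a)
sketch_b = hll_initialize(set_b)
sketch_union = hll_union(sketch_a, sketch_b)
print(round(hll_cardinality(sketch_a)), round(hll_cardinality(sketch_union)))
```

The union step never touches the original elements: merging two sketches costs time linear in the number of registers, regardless of how large the sets are.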
14
Minhashing
• The MinHash algorithm estimates the Jaccard index.
• It can similarly be expressed in three functions Initialize,
Union, and J.
• The algorithm stores the minimum value for each of the p
permutations of all hashed elements.
• The Jaccard estimate of the similarity of two sets is given by
the Hamming similarity of their sketches.
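The three functions can be sketched as follows. This is an illustration under stated assumptions: random affine hash functions modulo a Mersenne prime stand in for the p permutations, input sets are assumed non-empty, and the names are hypothetical.

```python
import hashlib
import random

PRIME = (1 << 61) - 1  # modulus for the affine hash functions (assumption)

def hash64(item):
    """Deterministic 64-bit hash (assumption: first 8 bytes of SHA-1)."""
    digest = hashlib.sha1(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def make_hash_params(p=512, seed=0):
    """p pairs (a, b); each defines one pseudo-permutation h -> (a*h + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(p)]

def minhash_initialize(items, params):
    """Keep, for every hash function, the minimum value over the (non-empty) set."""
    hashed = [hash64(x) % PRIME for x in items]
    return [min((a * h + b) % PRIME for h in hashed) for a, b in params]

def minhash_union(s, t):
    """Sketch of the set union: element-wise minimum."""
    return [min(x, y) for x, y in zip(s, t)]

def minhash_jaccard(s, t):
    """Hamming similarity: fraction of positions where the sketches agree."""
    return sum(x == y for x, y in zip(s, t)) / len(s)

params = make_hash_params()
set_a = range(0, 1000)
set_b = range(500, 1500)          # true Jaccard index is 500 / 1500
sketch_a = minhash_initialize(set_a, params)
sketch_b = minhash_initialize(set_b, params)
jaccard_estimate = minhash_jaccard(sketch_a, sketch_b)
print(jaccard_estimate)
```

A larger p gives a tighter estimate at the cost of a longer sketch, which is the accuracy versus cost trade-off controlled by the sketch parameter.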
15
Analyzing Subgraph Methods for Link Prediction
▪ SGNNs can be decomposed into the following steps:
1. Subgraph extraction around every pair of nodes for which one desires to
perform LP;
2. Augmentation of the subgraph nodes with structure features;
3. Feature propagation over the subgraphs using a GNN, and
4. Learning a graph-level readout function to predict the link.
16
Analyzing Subgraph Methods for Link Prediction
▪ Structure Features: these address limitations in GNN expressivity that stem from the inherent inability of message passing to distinguish automorphic nodes.
▪ The three best known are Zero-One (ZO) encoding, Double Radius Node Labeling (DRNL), and Distance Encoding (DE).
▪ Figure 3 shows that most of the predictive performance is concentrated in low distances.
Structure Features
17
Analyzing Subgraph Methods for Link Prediction
▪ Propagation / GNN: structure features are usually embedded into a continuous space, concatenated to any node features, and propagated over subgraphs.
▪ Readout / Pooling Function: a readout function R(S_uv, Y_uv) maps the resulting representations to link probabilities.
18
Link Prediction with Subgraph Sketching
+ Let A_uv[d_u, d_v] be the number of (d_u, d_v) labels for the link (u, v), which is equivalent to the number of nodes at distances exactly d_u and d_v from u and v respectively.
• Compute A_uv[d_u, d_v] for all d_u, d_v less than the receptive field k, which guarantees a number of counts that does not depend on the graph size and mitigates overfitting.
+ To alleviate the loss of information coming from a fixed k, also compute, for each distance d, the number of nodes at distance d from u and at distance > k from v.
Structure Features Counts
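The counts described on this slide can be computed exactly with two breadth-first searches, which is a useful reference point before sketches are introduced. The sketch below makes three assumptions that are not stated on the slide: distances up to and including k are treated as inside the receptive field, the endpoints u and v are excluded from the counts, and the names `counts_a` and `counts_far` are hypothetical.

```python
import math
from collections import Counter, deque

def bfs_distances(adj, source):
    """Shortest-path distance from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def structure_counts(adj, u, v, k):
    """counts_a[(d_u, d_v)]: nodes at exactly those distances from u and v.
    counts_far[d]: nodes at distance d from u and more than k from v."""
    dist_u, dist_v = bfs_distances(adj, u), bfs_distances(adj, v)
    counts_a, counts_far = Counter(), Counter()
    for node in adj:
        if node in (u, v):
            continue  # assumption: the link endpoints are not counted
        d_u = dist_u.get(node, math.inf)
        d_v = dist_v.get(node, math.inf)
        if d_u <= k and d_v <= k:
            counts_a[(d_u, d_v)] += 1
        elif d_u <= k:
            counts_far[d_u] += 1
    return counts_a, counts_far

# Toy graph (hypothetical): nodes 2 and 3 are common neighbours of 0 and 1.
edges = [(0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (0, 5)]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

counts_a, counts_far = structure_counts(adj, 0, 1, k=2)
print(dict(counts_a), dict(counts_far))
```

The entry for distances (1, 1) is the Common Neighbors count, so these features contain the classical heuristics as special cases.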
19
Link Prediction with Subgraph Sketching
• Approximate the intersection of neighborhood sets as:
Estimating Intersections and Cardinalities
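From the definition of the Jaccard index, |A ∩ B| = J(A, B) · |A ∪ B|, so an intersection estimate follows from a MinHash estimate of J and a HyperLogLog estimate of the union. The sketch below illustrates this for one-hop neighborhoods, obtaining each node's neighborhood sketch by aggregating the sketches of its neighbors with min and max. It is a simplified illustration rather than the paper's implementation; the hash construction, sketch sizes, and node ids are assumptions.

```python
import hashlib
import math
import random

PRIME = (1 << 61) - 1
P_BITS = 8        # HyperLogLog uses 2**P_BITS registers (assumption)
NUM_PERM = 256    # number of MinHash functions (assumption)
_rng = random.Random(0)
PARAMS = [(_rng.randrange(1, PRIME), _rng.randrange(PRIME)) for _ in range(NUM_PERM)]

def hash64(item):
    digest = hashlib.sha1(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def node_sketches(node):
    """MinHash and HyperLogLog sketches of the singleton set {node}."""
    h = hash64(node)
    minhash = [(a * (h % PRIME) + b) % PRIME for a, b in PARAMS]
    registers = [0] * (1 << P_BITS)
    rest = h & ((1 << (64 - P_BITS)) - 1)
    registers[h >> (64 - P_BITS)] = (64 - P_BITS) - rest.bit_length() + 1
    return minhash, registers

def aggregate(sketches):
    """Set union: element-wise min for MinHash, element-wise max for HyperLogLog."""
    minhashes, registers = zip(*sketches)
    return ([min(col) for col in zip(*minhashes)],
            [max(col) for col in zip(*registers)])

def cardinality(registers):
    m = len(registers)
    estimate = (0.7213 / (1 + 1.079 / m)) * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if estimate <= 2.5 * m and zeros:
        estimate = m * math.log(m / zeros)  # small-range correction
    return estimate

def estimate_intersection(sketch_u, sketch_v):
    (min_u, reg_u), (min_v, reg_v) = sketch_u, sketch_v
    jaccard = sum(x == y for x, y in zip(min_u, min_v)) / len(min_u)
    union_size = cardinality([max(x, y) for x, y in zip(reg_u, reg_v)])
    return jaccard * union_size

# Nodes 0 and 1 each have 100 neighbours, 50 of which are shared.
neighbors = {0: set(range(2, 102)), 1: set(range(52, 152))}
sketch_0 = aggregate([node_sketches(n) for n in neighbors[0]])
sketch_1 = aggregate([node_sketches(n) for n in neighbors[1]])
estimate = estimate_intersection(sketch_0, sketch_1)
print(estimate, len(neighbors[0] & neighbors[1]))
```

The estimate is noisy for sketches this small, but its cost depends only on the sketch sizes and not on the neighborhood sizes.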
20
Link Prediction with Subgraph Sketching
• By augmenting the messages with subgraph sketches, ELPH achieves higher expressiveness for the same asymptotic complexity.
• Sketches are computed by aggregating with min and max operators.
=> compute the intersection estimations up to the l-hop neighborhood as edge features.
=> modulate message transmission based on local graph structures, similarly to how attention is used to
modulate message transmission based on feature couplings.
• A link predictor:
Efficient Link Prediction with Hashes (ELPH)
[Equation labels: learnable functions; MinHashing sketch; HyperLogLog sketch; a local permutation-invariant aggregation function; MLP]
21
Problem solved
• Count intersection cardinality to distinguish automorphic
nodes.
• More expressive than MPNN.
=> Improve performance of link prediction.
Automorphic nodes
22
Scaling ELPH with Preprocessing (BUDDY)
• ELPH is efficient when the dataset fits into GPU memory. When it does not, the graph must be batched
into subgraphs.
• Preprocessing: replacing the learnable SGNN propagation with a fixed propagation of the node features almost recovers its performance.
+ Sketches can also be precomputed in a similar way:
+ Concatenate features diffused at different hops to obtain the input node features:
• Link Predictor:
• Time Complexity:
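The feature preprocessing described on this slide can be sketched as a parameter-free diffusion that is run once before training. The version below is an assumption-laden illustration: it uses plain neighbor averaging as the propagation operator (the normalization actually used by BUDDY may differ), and the function names and toy data are hypothetical.

```python
def diffuse(adj, features):
    """One fixed propagation step: average the neighbours' feature vectors."""
    out = {}
    for u, nbrs in adj.items():
        dim = len(features[u])
        if not nbrs:
            out[u] = [0.0] * dim
            continue
        out[u] = [sum(features[v][i] for v in nbrs) / len(nbrs) for i in range(dim)]
    return out

def precompute_features(adj, features, hops):
    """Concatenate the raw features with their 1..hops diffused versions."""
    per_hop = [features]
    for _ in range(hops):
        per_hop.append(diffuse(adj, per_hop[-1]))
    return {u: [x for hop in per_hop for x in hop[u]] for u in adj}

# Path graph 0 - 1 - 2 with a one-dimensional feature on each node.
adj = {0: {1}, 1: {0, 2}, 2: {1}}
features = {0: [1.0], 1: [0.0], 2: [0.0]}
inputs = precompute_features(adj, features, hops=2)
print(inputs)
```

Because nothing here is learned, the result can be stored and reused, so training only needs to read per-node rows and never has to hold the whole graph on the GPU.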
23
Experiments
• Subgraph statistics are generated by expanding k-
hop subgraphs around 1000 randomly selected
links.
• The size of subgraphs is highly irregular with high
standard deviations making efficient parallelization
in scalable architectures challenging.
Datasets
24
Experiments
• Either ELPH or BUDDY achieves the best performance in five of the seven datasets.
• Being a full-graph method, ELPH runs out of
memory on the two largest datasets.
• There is no clear winner between ELPH and BUDDY
in terms of performance.
• BUDDY is orders of magnitude faster both in
training and inference.
Baseline comparisons
25
Conclusions
• Proposed a new model for LP which achieves better time and space complexity and superior predictive
performance on a range of standard benchmarks.
• The current work is limited to undirected graphs or directed graphs that are first preprocessed to make
them undirected as is common in GNN research.
26

NS-CUK Joint Journal Club : S.T.Nguyen, Review on "Graph Neural Networks for Link Prediction with Subgraph Sketching", ICLR 2023.

  • 1. Nguyen Thanh Sang Network Science Lab Dept. of Artificial Intelligence The Catholic University of Korea E-mail: sang.ngt99@gmail.com 23/05/2023
  • 2. 1  Introduction • Link Prediction • Graph Isomorphic and Automorphic • Subgraphs  Methods • Subgraph methods for link prediction • Subgraph Sketching • Scaling with preprocessing  Evaluations • Results  Conclusions
  • 3. 2 Graphs Graphs (Networks) are complex. Several applications of Graph mining: • Link prediction: predict whether there are missing links between two nodes • Ex: Knowledge graph completion • Node classification: predict a property of a node • Ex: Categorize online users / items • Graph classification: categorize different graphs • Ex: Molecule property prediction • Clustering: detect if nodes form a community • Ex: Social circle detection • Other tasks: • Graph generation: drug discovery • Graph evolution: physical simulation
  • 4. 3 Link Prediction Link Prediction (LP) is an important problem in graph ML with many industrial applications + For example, recommender systems can be formulated as LP; + Link prediction is also a key process in drug discovery and knowledge graph construction.
  • 5. 4 Link Prediction  There are three main classes of LP methods: o Heuristics: estimate the distance between two nodes (e.g. personalized page rank (PPR) or graph distance) or the similarity of their neighborhoods (e.g Common Neighbors (CN), Adamic-Adar (AA), or Resource Allocation (RA)); o Unsupervised node embeddings or factorization methods: encompass the majority of production. o Graph Neural Networks, in particular of the Message-Passing type (MPNNs).
  • 6. 5 Graph Isomorphic and Graph Automorphic Two graphs are also called isomorphic whenever there exists an isomorphism between the two. + An automorphism of a graph is a graph isomorphism with itself, i.e., a mapping from the vertices of the given graph G back to vertices of G such that the resulting graph is isomorphic with G.
  • 7. 6 Subgraphs + The state-of-the-art methods for LP restrict computation to subgraphs enclosing a link, transforming link prediction into binary subgraph classification. + Subgraph GNNs (SGNN) are inspired by the strong performance of LP heuristics compared to more sophisticated techniques and are motivated as an attempt to learn data-driven LP heuristics.
  • 8. 7 Problems  MPNNs tend to be poor performance in link prediction: + Standard MPNNs are incapable of counting triangles and consequently of counting Common Neighbors or computing one-hop or two-hop LP heuristics. + GNN-based LP approaches combine permutation-equivariant structural node representations and a readout function that maps from two node representations to a link probability.  All nodes u in the same orbit induced by the graph automorphism group have equal representations.  Cannot distinguish nodes in the graph.  Some existing subgraph-based methods solve this problem by sorting the nodes but the extraction of subgraph is complicated and not easy to parallelize.  SGNN shows a strong performance in LP.
  • 9. 8 Problems  Some methods use feature count triangles:  Every edge has different node features.  No automorphic.  Complicated and lack of scalability.
  • 10. 9 Problems  SGNNs suffer from some serious limitations: 1. Constructing the subgraphs is expensive; 2. Subgraphs are irregular and so batching them is inefficient on GPUs; 3. Each step of inference is almost as expensive as each training step because subgraphs must be constructed for every test link.
  • 11. 10 Problems  SEAL generates a subgraph around a link.  Must generated for every link.  Labels must be generated for every subgraph.  Difficult to use for large scale.
  • 12. 11 Contributions • Analyze the SGNN components and reveal which properties of the subgraphs are salient to the LP problem.  Develop an MPNN (ELPH) that passes subgraph sketches as messages. • Using the sketches which allow the most important qualities of the subgraphs to be summarized in the nodes. • The resulting model removes the need for explicit subgraph construction and is a full- graph MPNN with the similar complexity to GCN. • ELPH is strictly more expressive than MPNNs for LP => solve automorphic node problem. • Proposed BUDDY, a highly scalable model that precomputes sketches and node features to solve scalability issues when the data exceeds GPU memory.
  • 13. Sketches for Intersection Estimation
      • Two sketching techniques, given sets A and B:
          + HyperLogLog efficiently estimates the cardinality of the union |A ∪ B|.
          + MinHashing estimates the Jaccard index J(A, B) = |A ∩ B| / |A ∪ B|.
      • Combine these approaches to estimate the intersection of node sets produced by graph traversals.
      • These techniques represent sets as sketches.
      • Each technique has a parameter p controlling the trade-off between accuracy and computational cost.
      • The sketches of the union of sets are given by permutation-invariant operations (element-wise min for MinHash and element-wise max for HyperLogLog).
      => The main idea is to consider node features based on both edges and triangle counts.
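As a minimal sketch of why the two estimates combine, the following uses exact Python sets in place of sketches (an assumption made only for illustration; the function names are hypothetical). Multiplying the Jaccard index by the union size recovers the intersection size.

```python
def jaccard(a, b):
    """Exact Jaccard index |A ∩ B| / |A ∪ B| (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def intersection_size(a, b):
    """Recover |A ∩ B| as J(A, B) * |A ∪ B|.

    With sketches, J would come from MinHash and |A ∪ B| from HyperLogLog;
    here both are computed exactly to show the identity.
    """
    return jaccard(a, b) * len(a | b)


a = {1, 2, 3, 4}
b = {3, 4, 5}
print(intersection_size(a, b))  # approximately the size of {3, 4}
```

With real sketches both factors are estimates, so the product is an estimate as well and its error depends on the parameter p of each sketch.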
  • 14. HyperLogLog
      • HyperLogLog efficiently estimates the cardinality of large sets.
      • It accomplishes this by representing sets using a constant-size data sketch.
      • These sketches can be combined, using the element-wise maximum, in time that is constant w.r.t. the data size and linear in the sketch size, to estimate the size of a set union.
      • The algorithm finds the harmonic mean of 2^(M[j]) over the m registers M[1], ..., M[m].
      • This mean estimates the cardinality of the set divided by m.
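The description above can be sketched in a few lines of Python. This is a simplified illustration, not the implementation used in the paper: the function names, the choice of SHA-1 as the hash, and the default p = 10 (m = 2^p registers) are assumptions. The bias constant and the small-range correction are the standard ones from the HyperLogLog literature.

```python
import hashlib
import math


def _hash64(item):
    """Map an item to a 64-bit integer (SHA-1 is an arbitrary choice here)."""
    digest = hashlib.sha1(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def hll_initialize(items, p=10):
    """Build a sketch with m = 2**p registers from an iterable of items."""
    m = 1 << p
    registers = [0] * m
    for item in items:
        x = _hash64(item)
        index = x >> (64 - p)                # first p bits pick the register
        rest = x & ((1 << (64 - p)) - 1)     # remaining 64 - p bits
        # 1-based position of the leftmost 1-bit in `rest`
        rank = (64 - p) - rest.bit_length() + 1
        registers[index] = max(registers[index], rank)
    return registers


def hll_union(a, b):
    """Sketch of the union: element-wise maximum of the registers."""
    return [max(x, y) for x, y in zip(a, b)]


def hll_cardinality(registers):
    """Estimate the set size from the harmonic mean of 2**M[j]."""
    m = len(registers)
    alpha = 0.7213 / (1 + 1.079 / m)         # bias constant, valid for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:             # small-range correction
        return m * math.log(m / zeros)
    return raw


sketch_a = hll_initialize(range(0, 6000))
sketch_b = hll_initialize(range(4000, 10000))
print(hll_cardinality(hll_union(sketch_a, sketch_b)))  # estimate of |A ∪ B| = 10000
```

Because the union is an element-wise maximum, merging two sketches gives exactly the sketch that would have been built from the union of the underlying sets.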
  • 15. MinHashing
      • The MinHash algorithm estimates the Jaccard index.
      • It can similarly be expressed in three functions: Initialize, Union, and J.
      • The algorithm stores the minimum value, over all hashed elements, for each of the p permutations.
      • The Jaccard estimate of the similarity of two sets is given by the Hamming similarity of their sketches.
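The three functions named above can be sketched as follows. This is an illustrative version under stated assumptions: the p permutations are simulated with p salted BLAKE2b hashes, and the function names are hypothetical rather than taken from the paper's code.

```python
import hashlib


def _hash(item, seed):
    """Salted 64-bit hash; each seed stands in for one random permutation."""
    data = str(item).encode("utf-8")
    salt = seed.to_bytes(8, "big")
    digest = hashlib.blake2b(data, digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "big")


def minhash_initialize(items, p=128):
    """Sketch of a non-empty set: the minimum hash under each of p seeds."""
    items = list(items)
    return [min(_hash(x, seed) for x in items) for seed in range(p)]


def minhash_union(a, b):
    """Sketch of the union: element-wise minimum of the two sketches."""
    return [min(x, y) for x, y in zip(a, b)]


def minhash_jaccard(a, b):
    """Jaccard estimate: fraction of positions where the sketches agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


set_a = set(range(0, 300))
set_b = set(range(100, 400))
estimate = minhash_jaccard(minhash_initialize(set_a), minhash_initialize(set_b))
print(estimate)  # estimate of the true Jaccard index 200 / 400 = 0.5
```

A larger p lowers the variance of the estimate at the cost of a longer sketch and more hashing work.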
  • 16. Analyzing Subgraph Methods for Link Prediction
      • SGNNs can be decomposed into the following steps:
          1. Subgraph extraction around every pair of nodes for which one desires to perform LP;
          2. Augmentation of the subgraph nodes with structure features;
          3. Feature propagation over the subgraphs using a GNN; and
          4. Learning a graph-level readout function to predict the link.
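Step 1 is the part that dominates the cost. A minimal sketch of it, assuming the graph is stored as a plain adjacency dictionary and that the subgraph is the one induced by all nodes within k hops of either endpoint (the function names are hypothetical):

```python
from collections import deque


def k_hop_nodes(adj, source, k):
    """Return the set of nodes within k hops of `source` (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # do not expand beyond k hops
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return set(dist)


def enclosing_subgraph(adj, u, v, k):
    """Induced subgraph on the nodes within k hops of u or of v."""
    nodes = k_hop_nodes(adj, u, k) | k_hop_nodes(adj, v, k)
    return {n: [nb for nb in adj[n] if nb in nodes] for n in nodes}


# Path graph 0 - 1 - 2 - 3 - 4 - 5
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
print(sorted(enclosing_subgraph(path, 2, 3, 1)))
```

This extraction has to be repeated for every candidate link, and the resulting subgraphs differ in size, which is what makes batching them awkward.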
  • 17. Analyzing Subgraph Methods for Link Prediction
      • Structure Features: address limitations in GNN expressivity that stem from the inherent inability of message passing to distinguish automorphic nodes.
      • The three best-known are Zero-One (ZO) encoding, Double Radius Node Labeling (DRNL), and Distance Encoding (DE).
      • Figure 3 shows that most of the predictive performance is concentrated in low distances.
  • 18. Analyzing Subgraph Methods for Link Prediction
      • Propagation / GNN: structure features are usually embedded into a continuous space, concatenated to any node features, and propagated over subgraphs.
      • Readout / Pooling Function: a readout function R(S_uv, Y_uv) maps representations to link probabilities.
  • 19. Link Prediction with Subgraph Sketching: Structure Features Counts
      + Let A_uv[d_u, d_v] be the number of (d_u, d_v) labels for the link (u, v), which is equivalent to the number of nodes at distances exactly d_u and d_v from u and v respectively.
          • Compute A_uv[d_u, d_v] for all d_u, d_v less than the receptive field k, which guarantees a number of counts that does not depend on the graph size and mitigates overfitting.
      + To alleviate the loss of information coming from a fixed k, also count the number of nodes at distance d from u and at distance > k from v.
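The counts defined above can be computed exactly with two breadth-first searches. The following is a minimal sketch, assuming an adjacency-dictionary graph, distances taken on the graph as given, and A covering distances 1 to k inclusive so that the remaining counts cover distance > k. The names `structure_counts`, `B_u`, and `B_v` are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path distance from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def structure_counts(adj, u, v, k):
    """Return (A, B_u, B_v) for the candidate link (u, v).

    A[(du, dv)]: nodes at distance exactly du from u and dv from v, 1 <= du, dv <= k.
    B_u[d]: nodes at distance d from u and more than k from v (B_v is symmetric).
    """
    dist_u, dist_v = bfs_distances(adj, u), bfs_distances(adj, v)
    inf = float("inf")
    A = {(i, j): 0 for i in range(1, k + 1) for j in range(1, k + 1)}
    B_u = {d: 0 for d in range(1, k + 1)}
    B_v = {d: 0 for d in range(1, k + 1)}
    for node in adj:
        if node in (u, v):
            continue  # the endpoints themselves are not counted
        du, dv = dist_u.get(node, inf), dist_v.get(node, inf)
        if du <= k and dv <= k:
            A[(du, dv)] += 1
        elif du <= k:
            B_u[du] += 1
        elif dv <= k:
            B_v[dv] += 1
    return A, B_u, B_v


graph = {0: [1, 2, 3], 1: [0, 2, 4], 2: [0, 1], 3: [0, 5], 4: [1], 5: [3]}
A, B_u, B_v = structure_counts(graph, 0, 1, 2)
print(A[(1, 1)])  # number of common neighbors of nodes 0 and 1
```

Note that A[(1, 1)] is the Common Neighbors count, so these features include the classic one-hop heuristic as a special case.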
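  • 20. Link Prediction with Subgraph Sketching: Estimating Intersections and Cardinalities
      • Approximate the intersection of neighborhood sets as the MinHash estimate of their Jaccard index multiplied by the HyperLogLog estimate of the cardinality of their union: |A ∩ B| ≈ J(A, B) · |A ∪ B|.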
  • 21. Link Prediction with Subgraph Sketching
      • By augmenting the messages with subgraph sketches, the model achieves higher expressiveness for the same asymptotic complexity.
      • Sketches are computed by aggregating with min and max operators.
      => Compute the intersection estimates up to the l-hop neighborhood as edge features.
      => Modulate message transmission based on local graph structures, similarly to how attention is used to modulate message transmission based on feature couplings.
      • A link predictor: Efficient Link Prediction with Hashes (ELPH). Components labeled on the slide: MinHashing sketch, HyperLogLog sketch, learnable functions, a local permutation-invariant aggregation function, and an MLP.
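The min and max aggregation of sketches can be illustrated without any learnable parts. The sketch below is a simplification under stated assumptions: each node aggregates over its neighbors and itself, so the sketch at hop l summarizes all nodes within l hops, and the initial sketches are small hand-written lists rather than real hashes. The function name `propagate_sketches` is hypothetical.

```python
def propagate_sketches(adj, init, combine, hops):
    """Propagate per-node sketches for `hops` rounds.

    `combine` is `min` for MinHash sketches and `max` for HyperLogLog registers.
    Returns a list `layers` where layers[l][node] is the sketch after l rounds.
    """
    layers = [dict(init)]
    for _ in range(hops):
        previous = layers[-1]
        current = {}
        for node, neighbors in adj.items():
            sketch = list(previous[node])          # include the node itself
            for neighbor in neighbors:
                sketch = [combine(a, b) for a, b in zip(sketch, previous[neighbor])]
            current[node] = sketch
        layers.append(current)
    return layers


# Path graph 0 - 1 - 2 - 3 with toy two-entry sketches
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
toy_minhash = {0: [5, 9], 1: [7, 2], 2: [3, 8], 3: [1, 6]}
toy_hll = {0: [1, 0], 1: [0, 2], 2: [3, 0], 3: [0, 4]}

minhash_layers = propagate_sketches(adj, toy_minhash, min, hops=2)
hll_layers = propagate_sketches(adj, toy_hll, max, hops=2)
print(minhash_layers[2][0], hll_layers[2][0])
```

Because min and max are permutation-invariant and idempotent, the order of the neighbors does not matter and seeing the same node through several paths does not change the result.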
  • 22. Problem solved: Automorphic nodes
      • Counting intersection cardinalities distinguishes automorphic nodes.
      • More expressive than MPNNs.
      => Improves link prediction performance.
  • 23. Scaling ELPH with Preprocessing (BUDDY)
      • ELPH is efficient when the dataset fits into GPU memory. When it does not, the graph must be batched into subgraphs.
      • Preprocessing: a fixed propagation of the node features almost recovers the performance of learnable SGNN propagation.
          + Sketches can also be precomputed in a similar way.
          + Features diffused at different hops are concatenated to obtain the input node features.
      • Link Predictor: (equation on slide)
      • Time Complexity: (equation on slide)
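The preprocessing step can be sketched as a fixed, parameter-free diffusion followed by concatenation. The choice of mean aggregation over neighbors as the propagation operator is an assumption made for this illustration, and `diffuse_features` is a hypothetical name.

```python
def diffuse_features(adj, feats, hops):
    """Precompute [X(0) || X(1) || ... || X(hops)] for every node.

    X(0) is the raw feature vector; X(l) is the mean of the neighbors' X(l-1).
    Nothing here is learned, so it can run once before training.
    """
    layers = [feats]
    for _ in range(hops):
        previous = layers[-1]
        current = {}
        for node, neighbors in adj.items():
            dim = len(previous[node])
            if neighbors:
                current[node] = [
                    sum(previous[nb][i] for nb in neighbors) / len(neighbors)
                    for i in range(dim)
                ]
            else:
                current[node] = [0.0] * dim   # isolated node: nothing to aggregate
        layers.append(current)
    # Concatenate the per-hop features of each node into one input vector
    return {node: [x for layer in layers for x in layer[node]] for node in adj}


star = {0: [1, 2], 1: [0], 2: [0]}
raw = {0: [1.0], 1: [3.0], 2: [5.0]}
print(diffuse_features(star, raw, hops=2))
```

Since the propagation has no parameters, the output can be stored once and the link predictor can then be trained on mini-batches of node pairs without touching the graph again.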
  • 24. Experiments: Datasets
      • Subgraph statistics are generated by expanding k-hop subgraphs around 1000 randomly selected links.
      • The size of the subgraphs is highly irregular, with high standard deviations, which makes efficient parallelization in scalable architectures challenging.
  • 25. Experiments: Baseline comparisons
      • Either ELPH or BUDDY achieves the best performance in five of the seven datasets.
      • Being a full-graph method, ELPH runs out of memory on the two largest datasets.
      • There is no clear winner between ELPH and BUDDY in terms of performance.
      • BUDDY is orders of magnitude faster in both training and inference.
  • 26. Conclusions
      • Proposed a new model for LP that achieves better time and space complexity and superior predictive performance on a range of standard benchmarks.
      • The current work is limited to undirected graphs, or to directed graphs that are first preprocessed to make them undirected, as is common in GNN research.