NS-CUK Seminar: S.T.Nguyen, Review on "Improving Graph Neural Network Expressivity via Subgraph Isomorphism Counting", IEEE 2020
LAB SEMINAR
Nguyen Thanh Sang
Network Science Lab
Dept. of Artificial Intelligence
The Catholic University of Korea
E-mail: sang.ngt99@gmail.com
Improving Graph Neural Network Expressivity via
Subgraph Isomorphism Counting
--- Giorgos Bouritsas, Fabrizio Frasca, Stefanos Zafeiriou, and Michael M. Bronstein ---
2023-06-01
Introduction
Graph Neural Networks (GNNs) have achieved remarkable results in a variety of applications.
GNNs use an aggregation function to update the vector representation of each node by transforming and
aggregating the vector representations of its neighbours.
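In generic message-passing form (a sketch in the notation used later in these slides; N(v) is the neighbourhood of v and t indexes layers):

    h_v^{t+1} = UP^{t+1}( h_v^t, M^{t+1}( { (h_v^t, h_u^t, e_{u,v}) : u ∈ N(v) } ) )

where M^{t+1} computes and aggregates the messages from the neighbours and UP^{t+1} updates the node state.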
Graph Isomorphism
+ Two graphs are called isomorphic whenever there exists an isomorphism between them.
+ In graph theory, an isomorphism of graphs 𝐺 and 𝐻
• A bijection between the vertex sets of 𝐺 and 𝐻: f : V(G) → V(H)
• such that any two vertices u and v of G are adjacent in G if and only if f(u) and f(v) are adjacent in H.
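As a quick illustration, a minimal sketch with networkx (the example graphs are arbitrary):

import networkx as nx

# A 4-cycle and a relabelled copy of it: isomorphic but not identical.
G = nx.cycle_graph(4)                                   # nodes 0..3
H = nx.relabel_nodes(G, {0: "a", 1: "b", 2: "c", 3: "d"})

print(nx.is_isomorphic(G, H))                           # True: a bijection f exists
print(nx.is_isomorphic(G, nx.path_graph(4)))            # False: different structure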
Graph Automorphism
+ An isomorphism of a graph onto itself (a bijection from the vertex set to itself)
• The case where 𝐺 and 𝐻 are one and the same graph
• A form of symmetry of the graph
+ Problems
• Testing whether a graph has a nontrivial automorphism
=> High computational complexity
• Constructing the automorphism group
=> Its action partitions the vertices into orbits
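A minimal sketch with networkx (GraphMatcher enumerates isomorphisms; matching a graph against itself yields its automorphisms, whose images give the vertex orbits):

import networkx as nx
from networkx.algorithms.isomorphism import GraphMatcher

def vertex_orbits(G):
    """Group together vertices that some automorphism of G exchanges."""
    orbit = {v: {v} for v in G}
    for auto in GraphMatcher(G, G).isomorphisms_iter():   # all automorphisms
        for v, w in auto.items():
            orbit[v].add(w)
    return {frozenset(s) for s in orbit.values()}

# Path 0-1-2: flipping it is a nontrivial automorphism, so the endpoints
# {0, 2} form one orbit and the centre {1} another.
print(vertex_orbits(nx.path_graph(3)))   # {frozenset({0, 2}), frozenset({1})}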
Problems
❖ The Weisfeiler-Lehman (WL) test: the representative test for isomorphism
• Low computational complexity
• Works well for almost all graphs
Limits in some cases
• Does not apply to some graphs that arise in real-world data.
• Node colours are arbitrarily initialized for the test.
[Figure: WL colour refinement, showing the initial colouring and the 1st, 2nd, and 3rd iterations]
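A minimal sketch of 1-WL colour refinement (nodes are relabelled by hashing their colour together with the multiset of their neighbours' colours; equal colour histograms mean "possibly isomorphic", unequal means "definitely not"):

import networkx as nx
from collections import Counter

def wl_histogram(G, iterations=3):
    colour = {v: 0 for v in G}                   # arbitrary uniform initialization
    for _ in range(iterations):
        colour = {v: hash((colour[v], tuple(sorted(colour[u] for u in G[v]))))
                  for v in G}
    return Counter(colour.values())

# A 6-cycle vs. two disjoint triangles: both 2-regular, so 1-WL fails.
G1 = nx.cycle_graph(6)
G2 = nx.disjoint_union(nx.cycle_graph(3), nx.cycle_graph(3))
print(wl_histogram(G1) == wl_histogram(G2))      # True, although G1 and G2 are not isomorphic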
• Since message-passing GNNs are at most as powerful as the Weisfeiler-Leman (WL) test, they
are limited in their ability to adequately exploit the graph structure, e.g. by counting
substructures, which are important in the study of complex networks.
How to go beyond isotropic, i.e., locally symmetric, aggregation functions?
How to capture the structural characteristics of the graph?
How to achieve the above two without sacrificing invariance to isomorphism?
Contributions
• Break local symmetries by introducing structural information in the aggregation function.
• Each neighbour (message) is transformed differently depending on its structural relationship with
the central node; this structural information is obtained by counting the appearance of certain substructures.
• Graph Substructure Network (GSN) is strictly more expressive than traditional GNNs for the vast
majority of substructures, while retaining the locality of message passing, as opposed to higher-
order methods.
• When choosing the structural inductive biases based on domain-specific knowledge, GSN
achieves state-of-the-art results.
Structural Features
+ Features encoded from structural roles by counting the appearance
of certain substructures.
+ Step 1: Fix a set of small connected graphs ℋ = {H_1, H_2, ..., H_K}, e.g., cycles, paths, cliques, or trees.
  - For each graph H ∈ ℋ, find its isomorphic subgraphs in G, denoted G_S.
  - For each node v ∈ V(G_S), infer its role w.r.t. H by obtaining the orbit of its mapping f(v) in H, Orb_H(f(v)).
+ Step 2: Build the vertex structural feature x^V_H(v) of v by counting all the possible appearances of the different orbits at v:
  - For all i ∈ {1, 2, ..., d_H}:
        x^V_H(v)_i = |{ G_S ≃ H : v ∈ V(G_S), f(v) ∈ O^V_{H,i} }|,
    i.e., the number of subgraphs isomorphic to H in which v is mapped into the i-th vertex orbit O^V_{H,i} of H.
  - f: an isomorphism mapping a subgraph G_S to H; it determines the orbit membership of each node v.
  - Feature vector: x^V_v = [ x^V_{H_1}(v), x^V_{H_2}(v), ..., x^V_{H_K}(v) ].
  - The edge structural feature x^E_H(u, v) of an edge (u, v) is defined analogously:
        x^E_H(u, v)_i = |{ G_S ≃ H : (u, v) ∈ E(G_S), (f(u), f(v)) ∈ O^E_{H,i} }|,
        x^E_{u,v} = [ x^E_{H_1}(u, v), x^E_{H_2}(u, v), ..., x^E_{H_K}(u, v) ].
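A minimal sketch of the vertex structural features with networkx (suitable only for small graphs; vertex_orbits as on the automorphism slide, and GraphMatcher's subgraph_isomorphisms_iter enumerating induced subgraphs G_S ≃ H):

import networkx as nx
from networkx.algorithms.isomorphism import GraphMatcher
from collections import defaultdict

def vertex_orbits(H):
    orbit = {v: {v} for v in H}
    for auto in GraphMatcher(H, H).isomorphisms_iter():
        for v, w in auto.items():
            orbit[v].add(w)
    return {v: min(s) for v, s in orbit.items()}   # orbit id = smallest member

def vertex_structural_features(G, H):
    """x^V_H(v): for each node v of G, count the subgraphs G_S ≃ H
    in which v is mapped into each vertex orbit of H."""
    orbit_of = vertex_orbits(H)
    counts = {v: defaultdict(int) for v in G}
    seen = set()
    for f in GraphMatcher(G, H).subgraph_isomorphisms_iter():
        nodes = frozenset(f)            # node set of the matched subgraph G_S
        if nodes in seen:               # same G_S reached via another automorphism of H
            continue
        seen.add(nodes)
        for v, h in f.items():          # node v of G is mapped to node h of H
            counts[v][orbit_of[h]] += 1
    return counts

# Triangles have a single vertex orbit, so the feature reduces to
# "how many triangles contain v"; in K4 every node lies in 3 triangles.
feats = vertex_structural_features(nx.complete_graph(4), nx.cycle_graph(3))
print({v: dict(c) for v, c in feats.items()})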
Structure-aware Message Passing
The substructure layer as a Message Passing Neural Network:
[Message Info.] + [Structural Roles Info.]
    h_v^{t+1} = UP^{t+1}( h_v^t, m_v^{t+1} )

    m_v^{t+1} = M^{t+1}( { (h_v^t, h_u^t, x^V_v, x^V_u, e_{u,v}) : u ∈ N(v) } )   (GSN-v: vertex structural identifiers)
    m_v^{t+1} = M^{t+1}( { (h_v^t, h_u^t, x^E_{u,v}, e_{u,v}) : u ∈ N(v) } )      (GSN-e: edge structural identifiers)

UP^{t+1}: an arbitrary function approximator (e.g., an MLP)
M^{t+1}: the neighbourhood aggregation function, an arbitrary function on multisets
e_{u,v}: the edge features
x^V_v, x^V_u: the vertex structural identifiers
x^E_{u,v}: the edge structural identifiers
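A minimal PyTorch sketch of a GSN-v layer under these definitions (not the authors' implementation; hidden_dim and struct_dim are hypothetical sizes, and sum aggregation with MLPs stands in for M and UP):

import torch
import torch.nn as nn

class GSNvLayer(nn.Module):
    """Messages are conditioned on the structural identifiers of both endpoints."""
    def __init__(self, hidden_dim, struct_dim):
        super().__init__()
        self.msg = nn.Sequential(                       # plays the role of M
            nn.Linear(2 * (hidden_dim + struct_dim), hidden_dim), nn.ReLU())
        self.update = nn.Sequential(                    # plays the role of UP
            nn.Linear(2 * hidden_dim, hidden_dim), nn.ReLU())

    def forward(self, h, edge_index, x_struct):
        # h: [N, hidden_dim] node states; x_struct: [N, struct_dim] orbit counts
        u, v = edge_index                               # directed edges u -> v
        messages = self.msg(torch.cat([h[v], x_struct[v], h[u], x_struct[u]], dim=-1))
        m = torch.zeros_like(h).index_add_(0, v, messages)   # sum over N(v)
        return self.update(torch.cat([h, m], dim=-1))

# Toy usage: 5 nodes, hidden size 8, 2 structural (orbit-count) features.
layer = GSNvLayer(8, 2)
out = layer(torch.randn(5, 8), torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]]), torch.randn(5, 2))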
Power of GSNs
+ GSN ≥ MPNN: GSN is itself an MPNN-based architecture, so it is at least as powerful.
+ GSN > 1-WL: strictly more expressive when all possible orbits are considered.
+ Open problem: how to choose the fixed set of substructures has not been settled yet.
[Figure: the 4×4 Rook's graph and the Shrikhande graph, a pair of strongly regular graphs on which even 2-FWL fails; the Rook's graph contains 4-cliques, whereas the Shrikhande graph's largest cliques are triangles.]
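A sketch verifying this clique difference with networkx (the Shrikhande graph built via its standard Cayley-graph construction on Z4 × Z4; both graphs are strongly regular with the same parameters, so WL-style tests see them as identical):

import networkx as nx
from itertools import product

# 4x4 Rook's graph: the Cartesian product K4 x K4.
rook = nx.cartesian_product(nx.complete_graph(4), nx.complete_graph(4))

# Shrikhande graph: Cayley graph on Z4 x Z4, connection set {±(1,0), ±(0,1), ±(1,1)}.
shrikhande = nx.Graph()
S = {(1, 0), (3, 0), (0, 1), (0, 3), (1, 1), (3, 3)}
for a, b in product(range(4), repeat=2):
    for c, d in product(range(4), repeat=2):
        if ((c - a) % 4, (d - b) % 4) in S:
            shrikhande.add_edge((a, b), (c, d))

for name, G in [("Rook's 4x4", rook), ("Shrikhande", shrikhande)]:
    print(name, "max clique:", max(len(c) for c in nx.find_cliques(G)))
# Rook's 4x4 max clique: 4  /  Shrikhande max clique: 3
# Counting 4-cliques therefore separates the pair, although 2-FWL cannot.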
Experiments
Settings
+ Baseline: MPNN with MLP
+ Substructure families: Cycles, paths, trees and cliques
+ Substructure size: k
+ Datasets: Synthetic, TUD, ZINC, and OGB-MOLHIV
Synthetic Graph Isomorphism Test
+ Dataset: a collection of Strongly Regular graphs of size up to 35 nodes
Isomorphism decision
+ Two graphs are deemed isomorphic when the Euclidean distance of their representations is smaller than a predefined threshold ε.
+ The number of failure cases of GSN decreases rapidly as k increases (cycles and
paths of maximum length k = 6).
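A sketch of this decision rule (emb_1, emb_2 and eps are hypothetical names for the two graph representations and the threshold):

import torch

def deemed_isomorphic(emb_1, emb_2, eps=1e-2):
    # graphs are declared isomorphic when their embeddings nearly coincide
    return torch.norm(emb_1 - emb_2).item() < eps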
TUD Graph Classification
• Datasets: bioinformatics and social networks
• Comparison: GNNs and graph kernels, with 10-fold cross-validation
• Base architecture: GIN
• Best performing substructures are reported for both GSN-e and GSN-v.
=> The proposed model obtains SOTA performance on most of the datasets, with a considerable margin
over the main GNN baselines in some cases.
ZINC Molecular Graphs
• Dataset
+ Commercially-available compounds for virtual screening (John J. Irwin et al.)
+ Mainly used for graph regression
• Task
+ Substructures: k-cycle counting
+ Molecules: 10k / 2k
+ Regression (metric: MAE)
=> GSN achieves state-of-the-art results, outperforming all the baseline architectures.
OGB-MOLHIV
• GSN seamlessly improves the performance of the base architecture
• Cyclical substructures are a good inductive bias when learning on molecules, confirming our
results on the ZINC dataset, while the same holds for triangles in PPA networks. Tasks defined on
graphs with community structure correlate with the presence of triangles (or cliques), as was the
case for social networks in the TU Datasets experiments.
• General purpose GNNs benefit from symmetry breaking mechanisms, either in the form of
eigenvectors (DGN) or in the form of substructures.
Ablation Studies
• The test error is not guaranteed to decrease as the identifiers become more discriminative.
• Unique node identifiers fail to improve the baseline architecture's performance on the test set:
identifiers can be hard to generalise from when chosen in a non-permutation-equivariant way. This
motivates once more choosing the identifiers not only based on their discriminative power, but also
in a way that allows incorporating the appropriate inductive biases.
• GSN manages to generalise much better, even with a small fraction of the training dataset.
Conclusions
• A novel way to design structure-aware graph neural networks, motivated by the limitations of traditional
GNNs in capturing important topological properties of the graph.
• A message-passing scheme enhanced with structural features that are extracted by subgraph
isomorphism counting.
• For some types of substructures, such as paths and cycles, the counting can be done with significantly
lower complexity (see the sketch below).
• The computationally expensive step is done only once, as preprocessing, and thus does not affect
network training and inference, which remain linear, the same as in message-passing neural
networks. The memory complexity in training and inference is linear as well.
• Most importantly, the expressive power of GSN is different from that of the k-WL tests and in some cases
is stronger.
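For instance, triangle (3-cycle) counts can be read off the cube of the adjacency matrix; a minimal sketch with numpy, cross-checked against networkx (the example graph is arbitrary):

import networkx as nx
import numpy as np

G = nx.karate_club_graph()
A = nx.to_numpy_array(G)

# diag(A^3)[v] counts closed walks of length 3 from v,
# i.e. twice the number of triangles that contain v.
per_node = np.diag(np.linalg.matrix_power(A, 3)) / 2
total = per_node.sum() / 3                     # each triangle is seen at 3 nodes

print(int(total), sum(nx.triangles(G).values()) // 3)   # identical counts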