How can we apply machine learning techniques to graphs to obtain predictions in a variety of domains? Learn more from Sami Abu-El-Haija, an AI Scientist with experience in both industry (Google Research) and academia (University of Southern California).
2. Agenda
● Background & Motivation
● [Breadth] ML Models on Graphs
● [Depth] Recent ML Models on Graphs
  ● MixHop (ICML'19)
  ● Watch Your Step (NeurIPS'18)
● Fast Training
  ● GTTF (ICLR'21)
  ● Fast GRL with unique optimal solutions (ICLR'21 Workshop GTRL)
5. What is a graph?
Nodes: entities
Edges: relationships between entities
6. What is a graph?
Nodes: entities
Edges: relationships between entities
[Figure: example graph; each node carries a feature vector x, and some nodes carry a label y]
x: features
y: labels
7. What is a graph?
Social Network
nodes = people
edges = friendship
y = engaging with ads
x = [age, gender, …]
8. What is a graph?
News Articles
nodes = articles
edges = citations
y = article type
x = article text
9. What is a graph?
Chemical compounds can be viewed as graphs:
y = molecule properties (per graph)
x = [H, F, C, O, N, …]
10. Why ML on Graphs?
Motivation
Across domains, practitioners benefit from predictions on graphs.
Some Popular Tasks:
● Predict node labels (node classification)
  ● E.g., predict users' engagement with ads (in a social network).
● Predict missing edges (link prediction / edge classification)
  ● E.g., predict which proteins interact with each other.
● Classify an entire graph
  ● E.g., predict the physical properties of a chemical molecule represented as a graph.
● Generate graphs [e.g., with certain properties]
  ● E.g., answer: "Give me a chemical molecule with the following properties."
11. High Level of Various Graph Algorithms
Fine! You have a graph.
You want to predict information on the graph. How to proceed?
Next: Identify the modeling technique!
● Option (Graph Embeddings): Place nodes into an embedding space, then throw the graph away but keep the embeddings.
● Option (Graph Regularization): Use the graph as a regularizer. No graph is needed after model training.
● Option (Graph Convolution / Message Passing): The representation of a node is a function of its neighbors. The graph is needed for training and inference.
13. (undirected) Graph
Adjacency Matrix
Degree Matrix
Feature Matrix
Transition Matrix
Laplacian Matrix
Quiz: What does TX encode?
L gives relaxed estimates to NP-hard problems, e.g., graph partitioning.
Its eigenbasis provides continuous axes on which nodes live.
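Not from the slides, but a minimal NumPy sketch of these matrices on a toy undirected graph (the graph and variable names are illustrative); the final comment answers the quiz:

```python
import numpy as np

# Toy undirected graph on 4 nodes with edges (0-1), (0-2), (2-3).
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # Adjacency Matrix
D = np.diag(A.sum(axis=1))                  # Degree Matrix
T = np.linalg.inv(D) @ A                    # Transition Matrix (row-stochastic)
L = D - A                                   # (unnormalized) Laplacian Matrix
X = np.random.randn(4, 16)                  # Feature Matrix (one row per node)

# Quiz answer: row i of T @ X is the average of node i's neighbors' features.
TX = T @ X
```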
15. High Level of Various Graph Algorithms
● Option (Graph Embeddings)
● Option (Graph Regularization)
● Option (Graph Convolution / Message Passing)
16. Overview: Graph Embedding
[Figure: graph with nodes v1…v11, embedded as points in R^d]
Embed in R^d by one of:
● Factorize A or L [1]
● Auto-encode A [2]
● Skipgram on E[walk] [3, B, D]
[1] Belkin & Niyogi, Laplacian Eigenmaps for Dimensionality Reduction and Data Representation, Neural Computation 2003
[2] Wang et al, Structural Deep Network Embedding, KDD'2016
[3] Perozzi et al, DeepWalk, KDD'2014
[B] Abu-El-Haija et al, Watch Your Step: Learning Node Embeddings via Graph Attention, NeurIPS'2018
[D] Abu-El-Haija et al, Learning Edge Representations via Low-Rank Asymmetric Projections, CIKM'2017
[E] Lee, Abu-El-Haija, Varadarajan, Natsev, Collaborative Deep Metric Learning for Video Understanding, KDD'2018
17. Overview: Graph Embedding
[Figure: same graph and embedding as the previous slide, plus a random walk over the graph]
Random Walk: v3 → v5 → v9 → v11 → v5 → …
Many random walks give many (long) Random Walk Sequences, which are fed to the word2vec algorithm.
18-21. Review: Embedding via Random Walks
● Word2vec learns embeddings by stochastically moving the embedding of an anchor node closer to that of a neighboring context node.
Random Walk Sequences: v3 → v5 → v9 → v11 → v5 → …
[Figure: embeddings Y plotted in a 2D (x, y) space; a context node is sampled within a window of the anchor node, and a stochastic update pulls the anchor's embedding toward the context's]
Mikolov et al., Distributed Representations of Words and Phrases and their Compositionality, NIPS 2013
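A DeepWalk-style sketch of this pipeline (not the authors' code; assumes gensim ≥ 4.0, and the toy graph is made up):

```python
import random
from gensim.models import Word2Vec  # skip-gram word2vec

# Toy graph as an adjacency list (node -> neighbors).
graph = {
    "v1": ["v2", "v3"], "v2": ["v1", "v5"], "v3": ["v1", "v4", "v5"],
    "v4": ["v3"], "v5": ["v2", "v3", "v9"], "v9": ["v5", "v11"],
    "v6": ["v11"], "v11": ["v9", "v6"],
}

def random_walk(graph, start, length=10):
    """Uniform random walk of fixed length starting at `start`."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(random.choice(graph[walk[-1]]))
    return walk

# Many random walks == many (long) "sentences" for word2vec.
walks = [random_walk(graph, node) for node in graph for _ in range(20)]

# Skip-gram (sg=1) stochastically pulls each anchor's embedding toward its context nodes.
model = Word2Vec(sentences=walks, vector_size=64, window=5, sg=1, min_count=1)
print(model.wv["v3"].shape)  # (64,)
```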
22. High Level of Various Graph Algorithms
● Option (Graph Embeddings)
● Option (Graph Regularization)
● Option (Graph Convolution / Message Passing)
23-25. Overview: Graph Regularization
[Figure: graph with nodes v1…v11; the labeled nodes v6 and v11 carry features x6 and x11]
Train a model f_Θ : X → Y on the labeled nodes, using the graph only as a regularizer. For the pair (v6, v11):

min_Θ λ ‖f_Θ(x6) − f_Θ(x11)‖²_ℓ2 − y6 log f_Θ(x6) − y11 log f_Θ(x11)

Overall objective:

min_Θ Σ_{i,j} [ λ A_{i,j} ‖f_Θ(x_i) − f_Θ(x_j)‖²_ℓ2 − y_i log f_Θ(x_i) − y_j log f_Θ(x_j) ]

[4] Belkin et al, Manifold regularization: A geometric framework for learning from labeled and unlabeled examples, JMLR'2006
[5] Bui et al, Neural Graph Machines, arXiv'2017
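A minimal NumPy sketch of this objective (function and variable names are illustrative; in practice only labeled nodes contribute the cross-entropy term):

```python
import numpy as np

def graph_regularized_loss(A, X, y, labeled, f, lam=0.1):
    """Supervised cross-entropy on labeled nodes plus a graph smoothness penalty.

    A: (n, n) adjacency, X: (n, d) features, y: (n,) integer labels,
    labeled: boolean mask of labeled nodes, f: X -> (n, k) class probabilities.
    """
    probs = f(X)                                      # model predictions for all nodes
    # Supervised term: -log p(y_i) over labeled nodes only.
    sup = -np.log(probs[labeled, y[labeled]] + 1e-9).sum()
    # Graph term: sum_ij A_ij * ||f(x_i) - f(x_j)||^2 pulls neighbors' predictions together.
    diffs = probs[:, None, :] - probs[None, :, :]     # (n, n, k)
    reg = (A * (diffs ** 2).sum(-1)).sum()
    return sup + lam * reg
```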
26. High Level of Various Graph Algorithms
● Option (Graph Embeddings)
● Option (Graph Regularization)
● Option (Graph Convolution / Message Passing)
27. Overview: Message Passing
The first neural network on graph data (that I am aware of):
[6] Scarselli et al, The graph neural network model, IEEE Transactions on Neural Networks'2009
30. Watch Your Step (Node Embedding Method)
Watch Your Step learns the context distribution (while learning the embeddings).
Shortcoming of DeepWalk / node2vec: they use a fixed context distribution, controlled by the context-window-size hyperparameter C. Different graphs prefer different C:
[B] Abu-El-Haija et al, Watch Your Step: Learning Node Embeddings via Graph Attention, NeurIPS'2018
31. Watch Your Step (WYS): Derivation
Rather than factorizing [the matrix shown on the slide] or [the alternative shown on the slide]
into low-rank L × Rᵀ, with the objective [shown on the slide],
WYS factorizes [the matrix shown on the slide],
additionally training Q, "the context distribution".
[B] Abu-El-Haija et al, Watch Your Step: Learning Node Embeddings via Graph Attention, NeurIPS'2018
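As a rough reconstruction of the missing formulas (paraphrased from memory of [B]; the exact weighting and regularization may differ), WYS factorizes an expected co-occurrence matrix whose per-distance weights q_k are learned jointly with the embeddings L, R:

```latex
% Expected context co-occurrences under walks of up to C steps, with learnable q:
\mathbb{E}[D; q] \;=\; \textstyle\sum_{k=1}^{C} q_k \, T^{k}
% Low-rank factorization objective, trained jointly over L, R, and q:
\min_{L,\,R,\,q}\;
  -\sum_{u,v} \mathbb{E}[D; q]_{uv}\,\log\sigma\!\left(L_u^{\top} R_v\right)
  \;-\;\lambda \sum_{(u,v):\,A_{uv}=0} \log\sigma\!\left(-L_u^{\top} R_v\right)
```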
32. WYS Results: Link Prediction
[B] Abu-El-Haija et al, Watch Your Step: Learning Node Embeddings via Graph Attention, NeurIPS'2018
33. WYS Results: Node Classification & T-SNE plot
[B] Abu-El-Haija et al, Watch Your Step: Learning Node Embeddings via Graph Attention, NeurIPS'2018
34. WYS Experiments: What does Q learn?
Different distributions for different graphs!
These correspond to a manual sweep of node2vec's context window:
[B] Abu-El-Haija et al, Watch Your Step: Learning Node Embeddings via Graph Attention, NeurIPS'2018
37. Recall: Image Convolution
● State-of-the-art on image / video / speech
  ● (segmentation, detection, classification, etc.)
[Figure: 2D (spatial) convolutional layer, representing the image as a regular grid; a 4D trainable filter maps input to output vectors. This is a form of message passing.]
38. What are Graph Convolutions?
● There are multiple definitions; we survey them in [H].
● For now, we stick to the most popular [7] (= [61] above).
[H] Chami, Abu-El-Haija, Perozzi, Re, Murphy, Machine Learning on Graphs: A Model and Comprehensive Taxonomy, arXiv'2020
[7] Kipf & Welling, Semi-supervised classification with graph convolutional networks, ICLR'2017
39. GCN [7] for semi-supervised node classification
[7] Kipf & Welling, Semi-supervised classification with graph convolutional networks, ICLR'2017
41-43. GCN [7] for semi-supervised node classification
[Figure: input features x1…x6 on the graph's nodes; some nodes are labeled (y2, y4); the features are fed through GC Layer 1]
Some nodes are labeled.
Task: can we guess the labels of the unlabeled nodes?
[7] Kipf & Welling, Semi-supervised classification with graph convolutional networks, ICLR'2017
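A minimal NumPy sketch of the propagation rule in [7] (dense matrices for clarity; variable names are mine):

```python
import numpy as np

def gcn_layer(A, H, W, activation=np.tanh):
    """One GCN layer [7]: H' = act( D^-1/2 (A + I) D^-1/2 @ H @ W )."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))    # symmetric normalization
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt
    return activation(A_norm @ H @ W)

# Stacking two such layers gives every node a 2-hop receptive field;
# a softmax over the final layer's output yields per-node class probabilities.
```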
60. MixHop
MixHop GC Layer
✓ Can incorporate distant nodes
✓ Can mix neighbors across distances, i.e., can learn Gabor-like filters!
[C] Abu-El-Haija et al, MixHop, ICML 2019
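A sketch of the layer's core idea (dense matrix powers for readability; the paper's exact normalization and output sizes may differ):

```python
import numpy as np

def mixhop_layer(A_norm, H, weights):
    """MixHop-style layer [C]: concatenate, over adjacency powers j, the terms A_norm^j @ H @ W_j.

    `weights` maps each power j (e.g. {0, 1, 2}) to its own trainable matrix W_j,
    so the layer can mix neighbors across distances (power 0 = the node itself).
    """
    outputs = []
    for j, W_j in sorted(weights.items()):
        propagated = np.linalg.matrix_power(A_norm, j) @ H   # j-hop propagation
        outputs.append(propagated @ W_j)
    return np.tanh(np.concatenate(outputs, axis=1))
```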
62. [G] Markowitz* et al, Graph traversal with tensor functionals: a meta-algorithm for scalable learning, ICLR'2021
63. Goal of GTTF
● Take any graph learning algorithm.
● Re-write it using GTTF functions (AccumulateFn and BiasFn), as sketched below.
● This makes the algorithm scalable to arbitrarily large graphs!
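A schematic sketch of the traversal idea only (the real GTTF operates on tensors in batch; these function signatures are my own illustrative assumptions, not the library's API):

```python
import random

def traverse(graph, root, depth, fanout, accumulate_fn, bias_fn):
    """From each visited node, sample `fanout` neighbors (probabilities from bias_fn)
    and let accumulate_fn record each step of the walk tree."""
    frontier = [root]
    for hop in range(depth):
        next_frontier = []
        for u in frontier:
            neighbors = graph[u]
            if not neighbors:
                continue
            weights = bias_fn(u, neighbors)              # e.g. uniform or degree-biased
            for v in random.choices(neighbors, weights=weights, k=fanout):
                accumulate_fn(u, v, hop)                 # e.g. build a sampled adjacency
                next_frontier.append(v)
        frontier = next_frontier

# Example: accumulate a sampled (rooted) adjacency, as a GCN on top of GTTF would.
sampled_edges = []
graph = {"a": ["b", "c"], "b": ["a"], "c": ["a", "b"]}
traverse(graph, "a", depth=2, fanout=2,
         accumulate_fn=lambda u, v, hop: sampled_edges.append((u, v)),
         bias_fn=lambda u, nbrs: [1.0] * len(nbrs))
```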
64. GTTF
[G] Markowitz* et al, Graph traversal with tensor functionals: a meta-algorithm for scalable learning, ICLR'2021
65. [G] Markowitz* et al, Graph traversal with tensor functionals: a meta-algorithm for scalable learning, ICLR'2021
66. Graph Convolution on top of GTTF
Define the GTTF functions:
Run the model on the sampled (rooted) adjacency:
[G] Markowitz* et al, Graph traversal with tensor functionals: a meta-algorithm for scalable learning, ICLR'2021
67. Node Embeddings on top of GTTF
Define the accumulation function (no BiasFn needed):
[G] Markowitz* et al, Graph traversal with tensor functionals: a meta-algorithm for scalable learning, ICLR'2021
69. Algorithms on top of GTTF are scalable
[G] Markowitz* et al, Graph traversal with tensor functionals: a meta-algorithm for scalable learning, ICLR'2021
70. GTTF: Scale Performance Experiments [G]
[G] Markowitz* et al, Graph traversal with tensor functionals: a meta-algorithm for scalable learning, ICLR'2021
71. GTTF: Test Metrics Experiments
[G] Markowitz* et al, Graph traversal with tensor functionals: a meta-algorithm for scalable learning, ICLR'2021
72. [J] Abu-El-Haija et al, Fast Graph Learning with Unique Optimal Solutions, ICLR'21 GTRL
73. What is SVD?
[J] Abu-El-Haija et al, Fast Graph Learning with Unique Optimal Solutions, arXiv 2021
74. We open-source a Functional SVD for TensorFlow
https://github.com/samihaija/tf-fsvd. Useful if:
● You want to run SVD on a sparse matrix in TensorFlow (our code, out of the box, provides a specialization of tf.linalg.svd to sparse matrices).
● You want to run SVD on a dense matrix M that is expensive to construct explicitly, but M is structured (e.g., a geometric sum of sparse matrices) so that multiplying M by vectors is much cheaper than constructing M (see the sketch below).
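Not the tf-fsvd API itself, but a SciPy sketch of the same idea, under assumed inputs (a random sparse T and a truncated geometric sum standing in for the structured M):

```python
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, svds

n = 1000
T = sp.random(n, n, density=0.01, format="csr", random_state=0)  # sparse matrix

# M = T + T^2 + T^3 is never built; we only describe how M (and M^T) multiply vectors.
def matvec(v):
    tv = T @ v
    ttv = T @ tv
    return tv + ttv + T @ ttv           # (T + T^2 + T^3) v

def rmatvec(v):
    tv = T.T @ v
    ttv = T.T @ tv
    return tv + ttv + T.T @ ttv         # (T + T^2 + T^3)^T v

M_op = LinearOperator((n, n), matvec=matvec, rmatvec=rmatvec)
U, S, Vt = svds(M_op, k=32)             # rank-32 SVD without materializing M
```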
75. SVD for Graph Learning
● SVD can be used as an ML technique for graphs.
● Steps:
  ● Linearize the model.
  ● Make the objective function convex.
● We show this next for two popular techniques (a sketch follows below).
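A hedged illustration of what "linearize + convexify" can look like (my paraphrase, not the paper's exact formulation): drop the nonlinearity from a 1-layer GCN so predictions become A_norm @ X @ W, and fit W with a squared loss, whose minimum-norm optimum is unique and obtained from an SVD:

```python
import numpy as np

def linear_gcn_fit(A_norm, X, Y_labeled, labeled_idx):
    """Closed-form W for the linearized model A_norm @ X @ W ~= Y on labeled nodes."""
    Z = (A_norm @ X)[labeled_idx]                      # propagated features of labeled nodes
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    S_inv = np.where(S > 1e-10, 1.0 / np.maximum(S, 1e-10), 0.0)
    return Vt.T @ np.diag(S_inv) @ U.T @ Y_labeled     # W = Z^+ Y (unique minimum-norm optimum)

# Inference: scores for all nodes are A_norm @ X @ W.
```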
81. References
[A] Abu-El-Haija et al, YouTube-8M: A Large-Scale Video Classification Benchmark, arXiv'2016
[B] Abu-El-Haija et al, Watch Your Step: Learning Node Embeddings via Graph Attention, NeurIPS'2018
[C] Abu-El-Haija, …, Ver Steeg, Aram Galstyan, MixHop: Higher-Order Graph Convolution, ICML'2019
[D] Abu-El-Haija et al, Learning Edge Representations via Low-Rank Asymmetric Projections, CIKM'2017
[E] Lee, Abu-El-Haija, Varadarajan, Natsev, Collaborative Deep Metric Learning for Video Understanding, KDD'2018
[F] Ge, Abu-El-Haija, Xin, Itti, Zero-shot Synthesis with Group-Supervised Learning, ICLR'2021
[G] Markowitz* et al, Graph traversal with tensor functionals: a meta-algorithm for scalable learning, ICLR'2021
[H] Chami, Abu-El-Haija, Perozzi, Re, Murphy, Machine Learning on Graphs: A Model and Comprehensive Taxonomy, arXiv'2020
[I] Abu-El-Haija et al, N-GCN: Multi-scale Graph Convolution for Semi-supervised Node Classification, UAI'2019
[J] Abu-El-Haija et al, Fast Graph Learning with Unique Optimal Solutions, arXiv 2021
[1] Belkin & Niyogi, Laplacian Eigenmaps for Dimensionality Reduction and Data Representation, Neural Computation 2003
[2] Wang et al, Structural Deep Network Embedding, KDD'2016
[3] Perozzi et al, DeepWalk, KDD'2014
[4] Belkin et al, Manifold regularization: A geometric framework for learning from labeled and unlabeled examples, JMLR'2006
[5] Bui et al, Neural Graph Machines, arXiv'2017
[6] Scarselli et al, The graph neural network model, IEEE Transactions on Neural Networks'2009
[7] Kipf & Welling, Semi-supervised classification with graph convolutional networks, ICLR'2017
[8] Daugman, Two-dimensional spectral analysis of cortical receptive field profiles, Vision Research'1980
[9] Daugman, Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by …, JOSA'1985
[10] Honglak Lee et al, ICML'2009
[11] Alex Krizhevsky et al, NeurIPS'2012
[12] Gordon et al, MorphNet: Fast & Simple Resource-Constrained Structure Learning of Deep Networks, CVPR'2018
83-84. MixHop Sparsification
We add a group lasso (L2) regularizer that drops out columns of the feature matrices, similar to [12].
The 2nd layer on Cora drops out the zeroth power completely.
[images are rotated for space]
[12] Gordon et al, MorphNet: Fast & Simple Resource-Constrained Structure Learning of Deep Networks, CVPR'2018
87-88. MixHop Results on (Synthetic) Homophily Datasets
With less homophily, our performance gap increases.
With less homophily, our method learns more feature differences (i.e., Gabor-like filters).
90. Ad: Message Passing for Zero-Shot Synthesis
Graph of semantic similarity between training samples.
We can develop an auto-encoder with a disentangled feature space.
If two samples share one attribute value (per graph edge), they need to prove it:
[F] Ge, Abu-El-Haija, Xin, Itti, Zero-shot Synthesis with Group-Supervised Learning, ICLR'2021
Editor's Notes
Data structure that can represent entities and their relationships.
Many random walks == many (long) sequences.
Current embedding.
Sample a context node within a distance from the anchor node.