Machine Learning on
Graph-Structured Data
Sami Abu-El-Haija, presented to JOSA
<firstName>@haija.org
Agenda
● Background & Motivation
● [Breadth] ML Models on Graphs
● [Depth] Recent ML Models on Graphs
○ MixHop (ICML’19)
○ Watch Your Step (NeurIPS’18)
● Fast Training
○ GTTF (ICLR’21)
○ Fast GRL with unique optimal solutions (ICLR’21 Workshop GTRL)
Background & Motivation
Motivation
Why Graphs?
● Graphs are a natural way to encode data from various domains.
What is a graph?
Nodes: entities
Edges: relationships between entities
What is a graph?
Nodes: entities
Edges: relationships between entities
[diagram: a graph whose nodes carry feature vectors x; a few nodes also carry labels y]
x: features
y: labels
What is a graph?
Social Network: nodes = people, edges = friendship
x = [age, gender, ...]
y = engages with ads
What is a graph?
News Articles: nodes = articles, edges = citations
x = article text
y = article type
What is a graph?
Chemical compounds can be viewed as graphs:
x = [H, F, C, O, N, ...] (per node)
y = molecule properties (per graph)
Why ML on Graphs?
Motivation
Across domains, practitioners benefit from predictions on graphs.
Some Popular Tasks:
● Predict node labels (node classification)
○ E.g., predict users' engagement with ads (in a social network).
● Predict missing edges (link prediction / edge classification)
○ E.g., predict which proteins interact with each other.
● Classify an entire graph
○ E.g., predict physical properties of a chemical molecule represented as a graph
● Generate Graphs [e.g. with certain properties]
○ E.g., can answer “Give me a chemical molecule with the following properties”
High-Level View of Various Graph Algorithms
Fine! You have a graph. You want to predict information on the graph. How to proceed?
Next: identify the modeling technique!
● Option (Graph Embeddings): Place nodes onto an embedding space → throw the graph away but keep the embeddings.
● Option (Graph Regularization): Use the graph as a regularizer. No graph is needed after model training.
● Option (Graph Convolution ⊂ Message Passing): The representation of a node is a function of its neighbors. The graph is needed for both training and inference.
[Math Refresher]
Graph Matrices
(undirected) Graph
Adjacency Matrix: A
Degree Matrix: D = diag(A·1)
Feature Matrix: X
Transition Matrix: T = D⁻ÂčA
Laplacian Matrix: L = D − A
Quiz: What does TX encode? (See the sketch below.)
L gives relaxed estimates to NP-hard problems, e.g., Graph Partitioning. Its eigenbasis provides continuous axes on which the nodes live.
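A minimal NumPy sketch (my illustration, not from the slides): build A, D, T and L for a toy undirected graph. The final assertion answers the quiz: row i of TX is the average of the feature vectors of node i's neighbors.

```python
import numpy as np

# Adjacency matrix of a small undirected graph (path 0-1-2-3 plus edge 1-3).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)

D = np.diag(A.sum(axis=1))      # Degree matrix
T = np.linalg.inv(D) @ A        # Transition matrix (rows sum to 1)
L = D - A                       # (Unnormalized) Laplacian
X = np.random.randn(4, 3)       # Feature matrix: one row per node

# Quiz: row i of T @ X averages the features of node i's neighbors.
assert np.allclose((T @ X)[0], X[1])   # node 0's only neighbor is node 1
```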
[Breadth]
ML on Graphs
High-Level View of Various Graph Algorithms
● Option (Graph Embeddings)
● Option (Graph Regularization)
● Option (Graph Convolution ⊂ Message Passing)
Overview: Graph Embedding
[diagram: graph nodes v1 
 v11 mapped to points .v1 
 .v11 in the embedding space]
Embed in R^d:
● Factorize A or L [1]
● Auto-encode A [2]
● Skipgram on E[walk] [3, B, D]
[1] Belkin & Niyogi, Laplacian Eigenmaps for Dimensionality Reduction and Data Representation, Neural Computation 2013
[2] Wang et al, Structural Deep Network Embedding, KDD’2016
[3] Perozzi et al, DeepWalk, KDD’2014
[B] Abu-El-Haija et al, Watch Your Step: Learning Node Embeddings via Graph Attention, NeurIPS’2018
[D] Abu-El-Haija et al, Learning Edge Representations via Low-Rank Asymmetric Projections, CIKM’2017
[E] Lee, Abu-El-Haija, Varadarajan, Natsev, Collaborative Deep Metric Learning for Video Understanding, KDD’2018
Overview: Graph Embedding
Random Walk: v3 → v5 → v9 → v11 → v5 → ...
Random Walk Sequences → word2vec algorithm
Review: Embedding via Random Walks
● Word2vec learns embeddings by stochastically moving the embedding of an anchor node closer to that of a neighboring context node.
v3 → v5 → v9 → v11 → v5 → ...   (Random Walk Sequences)
[diagram: embeddings Y plotted on (x, y) axes; a stochastic update pulls the anchor node's embedding toward the context node's]
Mikolov et al, Distributed Representations of Words and Phrases and their Compositionality, NIPS 2013
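A minimal DeepWalk-style sketch of this pipeline (my own illustration, not the authors' code), assuming networkx for the graph and gensim for skipgram word2vec; all hyperparameters are illustrative.

```python
import random
import networkx as nx
from gensim.models import Word2Vec

G = nx.karate_club_graph()

def random_walk(G, start, length=10):
    """Uniform random walk; word2vec expects string 'tokens'."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(random.choice(list(G.neighbors(walk[-1]))))
    return [str(v) for v in walk]

walks = [random_walk(G, v) for v in G.nodes() for _ in range(20)]

# sg=1 selects skipgram; `window` is the context window size C.
model = Word2Vec(walks, vector_size=64, window=5, sg=1, min_count=0)
embedding_of_node_0 = model.wv["0"]
```

Here `window` plays the role of the fixed context window size C that Watch Your Step (below) learns instead of hand-tuning.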
High-Level View of Various Graph Algorithms
● Option (Graph Embeddings)
● Option (Graph Regularization)
● Option (Graph Convolution ⊂ Message Passing)
Overview: Graph Regularization
[diagram: graph with features x6, x11 on nodes v6, v11; a model fΘ : X → Y produces predictions fΘ(x)]
minΘ λ ‖fΘ(x6) − fΘ(x11)‖₂ÂČ − y6 log fΘ(x6) − y11 log fΘ(x11)
Overall Objective:
minΘ Σᔹ,ⱌ λ Aᔹ,ⱌ ‖f(xᔹ) − f(xⱌ)‖₂ÂČ − yᔹ log f(xᔹ) − yⱌ log f(xⱌ)
[4] Belkin et al, Manifold regularization: A geometric framework for learning from labeled and unlabeled examples, JMLR’2006
[5] Bui et al, Neural Graph Machines, Arxiv’17
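A minimal PyTorch sketch of this overall objective (my illustration; it loops over a dense adjacency for clarity, not speed, and applies the supervised term once per labeled node rather than per edge).

```python
import torch
import torch.nn.functional as F

def graph_regularized_loss(f, X, A, labeled, Y, lam=0.1):
    """f: any model; X: [N, d] features; A: [N, N] dense adjacency;
    labeled: indices of labeled nodes; Y: their integer labels."""
    H = f(X)                                    # [N, num_classes] logits
    # Supervised term: cross-entropy on labeled nodes only.
    sup = F.cross_entropy(H[labeled], Y)
    # Regularizer: adjacent nodes should receive similar predictions.
    P = F.softmax(H, dim=1)
    i, j = A.nonzero(as_tuple=True)
    reg = (A[i, j] * (P[i] - P[j]).pow(2).sum(dim=1)).sum()
    return sup + lam * reg
```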
High-Level View of Various Graph Algorithms
● Option (Graph Embeddings)
● Option (Graph Regularization)
● Option (Graph Convolution ⊂ Message Passing)
Overview: Message Passing
The first neural network on graph data (that I am aware of):
[6] Scarselli et al, The graph neural network model, IEEE Transactions on Neural Networks’2009
[Depth]
ML on Graphs - Models for:
Node Embedding
& Message Passing
Node Embedding: Watch Your Step (NeurIPS’18)
Watch Your Step (Node Embedding Method)
Shortcoming of DeepWalk (/ node2vec): they have a fixed context distribution, controlled by the context-window-size hyperparameter C. Different graphs prefer different C.
Watch Your Step learns the context distribution (while learning the embeddings).
[B] Abu-El-Haija et al, Watch Your Step: Learning Node Embeddings via Graph Attention, NeurIPS’2018
WatchYourStep (WYS): Derivation
Rather than factorizing the walk co-occurrence matrix D, or its expectation under a fixed context distribution, into low-rank L × Ráµ€ with the skipgram objective, WYS factorizes the expectation E[D; Q], additionally training Q, “the context distribution” (the weight placed on each walk distance), jointly with the embeddings.
[B] Abu-El-Haija et al, Watch Your Step: Learning Node Embeddings via Graph Attention, NeurIPS’2018
WYS Results: Link Prediction
[B] Abu-El-Haija et al, Watch Your Step: Learning Node Embeddings via Graph Attention, NeurIPS’2018
WYS Results: Node Classification & t-SNE plot
[B] Abu-El-Haija et al, Watch Your Step: Learning Node Embeddings via Graph Attention, NeurIPS’2018
WYS Experiments: What does Q learn?
A different distribution for each graph! These correspond to manual sweeps of node2vec:
[B] Abu-El-Haija et al, Watch Your Step: Learning Node Embeddings via Graph Attention, NeurIPS’2018
Graph Convolution (a popular form of Message Passing)
We review image convolution, then graph convolution.
Recall: Image Convolution
● State-of-the-art on image / video / speech
○ (segmentation, detection, classification, etc.)
2D (Spatial) Convolutional Layer: representing the image as a regular grid, a 4D trainable filter is convolved with the input to produce output vectors (itself a form of Message Passing).
What are Graph Convolutions?
● There are multiple definitions, which we survey in [H].
● For now, we stick to the most popular [7] (= [61] above).
[H] Chami, Abu-El-Haija, Perozzi, Re, Murphy, Machine Learning on Graphs: A Model and Comprehensive Taxonomy, arxiv’2020
[7] Kipf & Welling, Semi-supervised classification with graph convolutional networks, ICLR’2017
GCN [7] for semi-supervised node classification
[diagram, built up over several slides:]
● Input Features: each node carries a feature vector (x1 
 x6).
● Some nodes are labeled (y2, y4). Task: can we guess the labels of the unlabeled nodes?
● GC Layer 1 maps the Input Features to Latent Features (h1^(1) 
 h6^(1)), one vector per node.
● Stacking layers 
 GC Layer L produces Output Features (h1^(L) 
 h6^(L)).
● Classification Loss is measured on the labeled nodes (e.g., y4, y2); the GC layers are optimized with SGD to reduce the loss.
[7] Kipf & Welling, Semi-supervised classification with graph convolutional networks, ICLR’2017
Graph Convolutional Networks (GCN) [7]
H^(l+1) = σ( Â H^(l) W^(l) ),  with H^(0) = X
● Â: N×N normalized adjacency matrix (sparse)
● H^(l): N×d_l feature matrix, one row per node
● W^(l): d_l×d_(l+1) trainable “filter”; location invariant, with dimensions independent of N
[7] Kipf & Welling, Semi-supervised classification with graph convolutional networks, ICLR’2017
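A minimal NumPy sketch of this layer (my illustration; real implementations keep Â sparse and train the W's with autodiff).

```python
import numpy as np

def normalized_adjacency(A):
    """A_hat = D^{-1/2} (A + I) D^{-1/2}: the renormalized adjacency of [7]."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gc_layer(A_hat, H, W):
    """One vanilla GC layer: H_next = ReLU(A_hat @ H @ W)."""
    return np.maximum(A_hat @ H @ W, 0)

# Two-layer GCN forward pass with random weights on a random graph.
rng = np.random.default_rng(0)
N, d0, d1, num_classes = 6, 8, 16, 3
A = (rng.random((N, N)) < 0.3).astype(float)
A = np.maximum(A, A.T); np.fill_diagonal(A, 0)   # undirected, no self-loops
X = rng.normal(size=(N, d0))
A_hat = normalized_adjacency(A)
H1 = gc_layer(A_hat, X, rng.normal(size=(d0, d1)))
logits = A_hat @ H1 @ rng.normal(size=(d1, num_classes))  # last layer: no ReLU
```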
Shortcoming of Vanilla GCN
Vanilla GC Layer: H^(l+1) = σ( Â H^(l) W^(l) )
😱 Appendix experiments of [7] show no gains beyond 2 layers.
😱 It cannot arbitrarily mix neighbors from various distances, i.e., it cannot learn Gabor filters!
[7] Kipf & Welling, Semi-supervised classification with graph convolutional networks, ICLR’2017
MixHop
Motivation: extend the class of representations realizable by GCNs.
[C] Abu-El-Haija et al, MixHop, ICML 2019
MixHop: High-Order Graph Conv Layer [C]
[diagram: input features x1 
 x6 feed MixHop Layer 1, producing h1^(1) 
 h6^(1)]
[C] Abu-El-Haija et al, MixHop, ICML 2019
MixHop
Vanilla GC Layer: H^(l+1) = σ( Â H^(l) W^(l) )
MixHop GC Layer: H^(l+1) = ∄_{j ∈ P} σ( Â^j H^(l) W_j^(l) ), concatenating over a set P of adjacency powers. A couple of lines of code implement the concatenation.
😀 Can incorporate distant nodes
😀 Can mix neighbors across distances, i.e., can learn Gabor filters!
[C] Abu-El-Haija et al, MixHop, ICML 2019
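A minimal NumPy sketch of a MixHop layer under the same conventions as the GCN sketch above (my illustration; it assumes consecutive powers so Â^j can be built incrementally).

```python
import numpy as np

def mixhop_layer(A_hat, H, Ws, powers=(0, 1, 2)):
    """Concatenate ReLU(A_hat^j @ H @ W_j) over consecutive powers j in P."""
    outputs = []
    A_pow = np.eye(A_hat.shape[0])        # A_hat^0
    for j, W in zip(powers, Ws):
        outputs.append(np.maximum(A_pow @ H @ W, 0))
        A_pow = A_pow @ A_hat             # advance to the next power
    return np.concatenate(outputs, axis=1)

# Example, reusing A_hat and X from the GCN sketch above:
#   Ws = [np.random.randn(X.shape[1], 16) for _ in range(3)]
#   H1 = mixhop_layer(A_hat, X, Ws)       # shape: [N, 3 * 16]
```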
Faster Training
[G] Markowitz* et al, Graph traversal with tensor functionals: a meta-algorithm for scalable learning, ICLR’2021
Goal of GTTF
● Take any graph learning algorithm.
● Re-write it using the “GTTF” functions (AccumulateFn and BiasFn), as sketched below.
● This makes the algorithm scalable to arbitrarily large graphs!
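A rough sketch of how I read the GTTF interface from the slides (not the authors' implementation): a stochastic traversal walks a sampled tree rooted at a seed node, calling AccumulateFn on every sampled edge and BiasFn to reweight neighbor sampling. Only the two function names come from the slides; everything else here is an assumption.

```python
import random

def traverse(adj, node, accumulate_fn, bias_fn=None, fanout=3, depth=2):
    """Sample up to `fanout` neighbors per node, recursing to `depth` hops."""
    if depth == 0:
        return
    neighbors = adj[node]
    if not neighbors:
        return
    weights = bias_fn(node, neighbors) if bias_fn else [1.0] * len(neighbors)
    sampled = random.choices(neighbors, weights=weights,
                             k=min(fanout, len(neighbors)))
    for nbr in sampled:
        accumulate_fn(node, nbr)   # e.g., record edge for a sampled adjacency
        traverse(adj, nbr, accumulate_fn, bias_fn, fanout, depth - 1)

# Example: accumulate a sampled (rooted) adjacency for one minibatch root.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
edges = []
traverse(adj, 0, accumulate_fn=lambda u, v: edges.append((u, v)))
```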
GTTF
[G] Markowitz* et al, Graph traversal with tensor functionals: a meta-algorithm for scalable learning, ICLR’2021
Graph Convolution on top of GTTF
Define the GTTF functions:
Run model on sampled (rooted) Adjacency:
[G] Markowitz* et al, Graph traversal with tensor functionals: a meta-algorithm for scalable learning, ICLR’2021
Node Embeddings on top of GTTF
Define the accumulation function (no BiasFn needed):
[G] Markowitz* et al, Graph traversal with tensor functionals: a meta-algorithm for scalable learning, ICLR’2021
GTTF approximates the learning of the underlying algorithms.
Algorithms on top of GTTF are scalable.
[G] Markowitz* et al, Graph traversal with tensor functionals: a meta-algorithm for scalable learning, ICLR’2021
GTTF: Scale Performance Experiments [G]
[G] Markowitz* et al, Graph traversal with tensor functionals: a meta-algorithm for scalable learning, ICLR’2021
GTTF: Test Metrics Experiments
[G] Markowitz* et al, Graph traversal with tensor functionals: a meta-algorithm for scalable learning, ICLR’2021
[J] Abu-El-Haija et al, Fast Graph Learning with Unique Optimal Solutions, ICLR’21 GTRL
What is SVD?
[J] Abu-El-Haija et al, Fast Graph Learning with Unique Optimal Solutions, arxiv 2021
We open-source a Functional SVD for TensorFlow:
https://github.com/samihaija/tf-fsvd. Useful if:
● You want to run SVD on a sparse matrix in TensorFlow (our code, out of the box, provides a specialization of tf.linalg.svd to sparse matrices).
● You want to run SVD on a dense matrix M that is expensive to construct. However, your matrix M is structured (e.g., a geometric sum of sparse matrices), such that multiplying M by vectors is much cheaper than explicitly constructing M.
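The same “functional” idea, sketched with SciPy rather than tf-fsvd (I do not reproduce tf-fsvd's API here): a truncated SVD of the structured matrix M = S + SÂČ for sparse S, defined only through matrix-vector products, so M is never materialized.

```python
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, svds

S = sp.random(1000, 1000, density=0.01, format="csr", random_state=0)

def matvec(v):       # M @ v, where M = S + S @ S, never formed explicitly
    Sv = S @ v
    return Sv + S @ Sv

def rmatvec(v):      # M.T @ v, needed by svds
    Stv = S.T @ v
    return Stv + S.T @ Stv

M_op = LinearOperator(S.shape, matvec=matvec, rmatvec=rmatvec)
U, sigma, Vt = svds(M_op, k=16)   # rank-16 truncated SVD of M
```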
SVD for Graph Learning
● SVD can be used as an ML technique for graphs.
○ Steps:
■ Linearize the models.
■ Make the objective function convex.
● We show this next, for two popular techniques.
Classification via Graph Neural Networks (GNN)
Traditionally, GNN models are nonlinear message-passing networks, optimized with many steps of SGD on the classification loss.
Our Approximation: a linearized model, optimized as a convex least-squares objective whose unique optimal solution is computed in closed form (via truncated SVD).
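A sketch of the linearize-then-solve recipe (my reading of the slides, not necessarily the paper's exact estimator): replace the nonlinear GCN with the linear model Ŷ = ÂXW, minimize ‖ÂXW − Y‖FÂČ, and read the unique minimum-norm W off a truncated SVD.

```python
import numpy as np

def solve_linear_gnn(A_hat, X, Y, rank=32):
    """Closed-form W = pinv(A_hat @ X) @ Y via a rank-truncated SVD."""
    M = A_hat @ X                              # "linearized" node representations
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    U, s, Vt = U[:, :rank], s[:rank], Vt[:rank]
    W = Vt.T @ np.diag(1.0 / s) @ (U.T @ Y)
    return W                                   # predict with A_hat @ X @ W
```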
Link Prediction via Network Embedding
Traditionally, skipgram models (e.g., DeepWalk) are optimized with SGD over sampled walks.
Our Approximation is optimized with SVD. Recall SVD: M ≈ U ÎŁ Váµ€ is the optimal rank-d approximation; the solution reads the left and right node embeddings off U, ÎŁ and V.
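A sketch of SVD-based embeddings for link prediction; splitting Σ evenly between the two sides is a common convention I assume here, not necessarily the paper's choice.

```python
import numpy as np

def svd_embeddings(M, dim=32):
    """Factor M ~= L @ R.T with L = U sqrt(S), R = V sqrt(S)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    L = U[:, :dim] * np.sqrt(s[:dim])      # "source" embeddings
    R = Vt[:dim].T * np.sqrt(s[:dim])      # "target" embeddings
    return L, R

def edge_score(L, R, u, v):
    return L[u] @ R[v]                     # higher score => more likely edge
```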
Results: Link Prediction
More results: Classification (left) Link Prediction (right)
Thank you!
Questions?
References
[A] Abu-El-Haija et al, YouTube-8M: A Large-Scale Video Classification Benchmark, Arxiv’2016
[B] Abu-El-Haija et al, Watch Your Step: Learning Node Embeddings via Graph Attention, NeurIPS’2018
[C] Abu-El-Haija et al, MixHop: Higher-Order Graph Convolution, ICML’2019
[D] Abu-El-Haija et al, Learning Edge Representations via Low-Rank Asymmetric Projections, CIKM’2017
[E] Lee, Abu-El-Haija, Varadarajan, Natsev, Collaborative Deep Metric Learning for Video Understanding, KDD’2018
[F] Ge, Abu-El-Haija, Xin, Itti, Zero-shot Synthesis with Group-Supervised Learning, ICLR’2021
[G] Markowitz* et al, Graph traversal with tensor functionals: a meta-algorithm for scalable learning, ICLR’2021
[H] Chami, Abu-El-Haija, Perozzi, Re, Murphy, Machine Learning on Graphs: A Model and Comprehensive Taxonomy, arxiv’2020
[I] Abu-El-Haija et al, N-GCN: Multi-scale Graph Convolution for Semi-supervised Node Classification, UAI’2019
[J] Abu-El-Haija et al, Fast Graph Learning with Unique Optimal Solutions, arxiv 2021
[1] Belkin & Niyogi, Laplacian Eigenmaps for Dimensionality Reduction and Data Representation, Neural Computation’2013
[2] Wang et al, Structural Deep Network Embedding, KDD’2016
[3] Perozzi et al, DeepWalk, KDD’2014
[4] Belkin et al, Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. JMLR’2006.
[5] Bui et al, Neural Graph Machines, Arxiv’17
[6] Scarselli et al, The graph neural network model, IEEE Transactions on Neural Networks’2009
[7] Kipf & Welling, Semi-supervised classification with graph convolutional networks, ICLR’2017
[8] Daugman, Two-dimensional spectral analysis of cortical receptive field profiles, Vision Research’1980
[9] Daugman, Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by ..., JOSA’1985
[10] Honglak Lee et al, ICML’2009
[11] Alex Krizhevsky et al, NeurIPS’2012
[12] Gordon et al, MorphNet: Fast & Simple Resource-Constrained Structure Learning of Deep Networks, CVPR’2018
Optional Material
(time permitting)
MixHop Sparsification
We add group-Lasso (L2) regularization to drop out columns of the feature matrices, similar to [12].
The 2nd layer on Cora drops out the zeroth power completely.
[12] Gordon et al, MorphNet: Fast & Simple Resource-Constrained Structure Learning of Deep Networks, CVPR’2018
Further Experiments
MixHop Results on Citation Datasets
MixHop Results on (Synthetic) Homophily Datasets
With less homophily, our performance gap increases.
With less homophily, our method learns more feature differences (i.e., Gabor-like filters).
Advertisement
Ad: Message Passing for Zero-Shot Synthesis
Using a graph of semantic similarity between training samples, we can develop an auto-encoder with a disentangled feature space. If two samples share one attribute value (per graph edge), they need to prove it:
[F] Ge, Abu-El-Haija, Xin, Itti, Zero-shot Synthesis with Group-Supervised Learning, ICLR’2021
Speaker Notes
  1. A data structure that can represent entities and their relationships.
  2. Many random walks == many (long) sequences.
  3. Sample a context node within distance C of the anchor node.