Geometric deep learning targets non-grid 3D data such as point clouds and meshes, as well as inherently graph-based data.
Inherently graph-based data include, for example, brain connectivity analysis, scientific-article citation networks, and (social) network analysis.
Alternative download link:
https://www.dropbox.com/s/2o3cofcd6d6e2qt/geometricGraph_deepLearning.pdf?dl=0
1. Geometric Deep Learning
2. Structure of the presentation
What
High-level overview of the emerging field of geometric
deep learning (and graph deep learning)
How
Presentation focused on startup-style organizations
where everyone does a bit of everything and everyone
needs to understand a bit of everything. The CEO cannot
be the ‘idea guy’ who knows nothing about graphs
and geometric deep learning if you are operating in
this space.
EFFECTUATION – THE BEST THEORY OF ENTREPRENEURSHIP YOU ACTUALLY FOLLOW,
WHETHER YOU’VE HEARD OF IT OR NOT
by Ricardo dos Santos
4. Geometric Deep Learning #1
Bronstein et al. (July 2017): “Geometric deep learning (
http://geometricdeeplearning.com/) is an umbrella term for emerging
techniques attempting to generalize (structured) deep neural models to
non-Euclidean domains, such as graphs and manifolds. The purpose of this article
is to overview different examples of geometric deep-learning problems and
present available solutions, key difficulties, applications, and future research
directions in this nascent field.”
SCNN (2013)
GCNN/ChebNet (2016)
GCN (2016)
GNN (2009)
Geodesic CNN (2015)
Anisotropic CNN (2016)
MoNet (2016)
Localized SCNN (2015)
5. Geometric Deep Learning #2
Bronstein et al. (July 2017): “The non-Euclidean nature of data
implies that there are no such familiar properties as global
parameterization, common system of coordinates, vector space
structure, or shift-invariance. Consequently, basic operations like
convolution that are taken for granted in the Euclidean case are even
not well defined on non-Euclidean domains.”
“First attempts to generalize neural networks to graphs we are aware of
are due to Gori et al. (2005), who proposed a scheme combining
recurrent neural networks and random walk models. This approach
went almost unnoticed, re-emerging in a modern form in
Sukhbaatar et al. (2016) and Li et al. (2015) due to the renewed recent
interest in deep learning.”
“In a parallel effort in the computer vision and graphics community,
Masci et al. (2015) showed the first CNN model on meshed surfaces,
resorting to a spatial definition of the convolution operation based on
local intrinsic patches. Among other applications, such models were
shown to achieve state-of-the-art performance in finding
correspondence between deformable 3D shapes. Followup works
proposed different construction of intrinsic patches on point clouds
Boscaini et al. (2016)a,b and general graphs Monti et al. (2016).”
In calculus, the notion of derivative describes
how the value of a function changes with an
infinitesimal change of its argument. One of the
big differences distinguishing classical calculus
from differential geometry is a lack of vector
space structure on the manifold, prohibiting us
from naïvely using expressions like f(x+dx). The
conceptual leap that is required to generalize
such notions to manifolds is the need to work
locally in the tangent space.
Physically, a tangent vector field can be
thought of as a flow of material on a manifold.
The divergence measures the net flow of a field
at a point, allowing one to distinguish between field
‘sources’ and ‘sinks’. Finally, the Laplacian (or
Laplace–Beltrami operator in differential-geometric
jargon) Δf = −div(∇f) can be interpreted as the
difference between the value of a function at a point
and its average over an infinitesimal neighborhood.
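On graphs, these operators have a simple discrete counterpart: the combinatorial Laplacian L = D − A, where D is the degree matrix and A the adjacency matrix. Applied to a signal on the vertices, it measures how each value deviates from the sum over its neighbors. A minimal numpy sketch (the toy graph and signal are my own illustration, not from the paper):

```python
import numpy as np

# Toy undirected graph on 4 vertices: a path 0-1-2-3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))          # degree matrix
L = D - A                           # combinatorial graph Laplacian

f = np.array([1.0, 2.0, 4.0, 8.0])  # a signal on the vertices

# (L f)[i] = deg(i) * f[i] - sum of f over the neighbors of i:
# the deviation of f[i] from its neighborhood, scaled by degree.
Lf = L @ f
print(Lf)  # e.g. vertex 1: 2*2 - (1 + 4) = -1
```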
“A centerpiece of classical Euclidean signal processing is the property of the Fourier
transform diagonalizing the convolution operator, colloquially referred to as the
Convolution Theorem. This property allows to express the convolution f⋆g of two
functions in the spectral domain as the element-wise product of their Fourier transforms.
Unfortunately, in the non-Euclidean case we cannot even define the operation x-x’ on the
manifold or graph, so the notion of convolution does not directly extend to this case.”
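In practice, the standard workaround is to define convolution spectrally: expand both signals in the eigenbasis of the graph Laplacian (the "graph Fourier basis"), multiply element-wise, and transform back, mirroring the Convolution Theorem. A numpy sketch of this definition on a toy cycle graph (my own illustration, not any specific paper's implementation):

```python
import numpy as np

# Cycle graph on 6 vertices (a discretized ring).
n = 6
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

# Graph Fourier basis: eigenvectors of the symmetric Laplacian.
_, U = np.linalg.eigh(L)

def graph_convolve(f, g):
    """Spectral convolution: element-wise product in the Laplacian eigenbasis."""
    return U @ ((U.T @ f) * (U.T @ g))

rng = np.random.default_rng(0)
f, g = rng.standard_normal(n), rng.standard_normal(n)
h = graph_convolve(f, g)
```

Like its Euclidean counterpart, this operation is commutative and linear in both arguments, even though no vertex-domain expression x − x′ is ever needed.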
6. Geometric Deep Learning #3
Bronstein et al. (July 2017): “We expect the following years to bring exciting new approaches
and results, and conclude our review with a few observations of current key difficulties and
potential directions of future research.”
Generalization: Generalizing
deep learning models to
geometric data requires not only
finding non-Euclidean
counterparts of basic building
blocks (such as convolutional
and pooling layers), but also
generalization across different
domains. Generalization
capability is a key requirement in
many applications, including
computer graphics, where a
model is learned on a training set
of non-Euclidean domains (3D
shapes) and then applied to
previously unseen ones.
Time-varying domains: An
interesting extension of geometric
deep learning problems discussed
in this review is coping with signals
defined over a dynamically
changing structure. In this case, we
cannot assume a fixed domain and
must track how these changes
affect signals. This could prove
useful to tackle applications such
as abnormal activity detection in
social or financial networks. In the
domain of computer graphics and
vision, potential applications deal
with dynamic shapes (e.g. 3D video
captured by a range sensor).
Computation: The final consideration is
a computational one. All existing deep
learning software frameworks are
primarily optimized for Euclidean data.
One of the main reasons for the
computational efficiency of deep
learning architectures (and one of the
factors that contributed to their
renaissance) is the assumption of
regularly structured data on 1D or 2D
grid, allowing to take advantage of
modern GPU hardware. Geometric data,
on the other hand, in most cases do not
have a grid structure, requiring different
ways to achieve efficient computations.
It seems that computational paradigms
developed for large-scale graph
processing are more adequate
frameworks for such applications.
8. Graph theory especially useful for network analysis
https://doi.org/10.1126/science.286.5439.509
Cited by 29,071 articles
https://doi.org/10.1038/30918
Cited by 33,772
Random rewiring procedure for interpolating between a
regular ring lattice and a random network, without
altering the number of vertices or edges in the graph.
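The rewiring procedure can be sketched in a few lines of Python (a simplified sketch of the Watts–Strogatz construction; the function and parameter names are mine): start from a ring lattice where each node links to its k nearest neighbors on each side, then rewire each edge with probability p, keeping the number of vertices and edges fixed.

```python
import random

def watts_strogatz(n, k, p, seed=42):
    """Ring lattice on n nodes (k neighbors per side), each edge rewired
    with probability p; node and edge counts are preserved."""
    rng = random.Random(seed)
    lattice = set()
    for i in range(n):
        for j in range(1, k + 1):
            lattice.add(tuple(sorted((i, (i + j) % n))))
    edges = set()
    for (u, v) in sorted(lattice):
        if rng.random() < p:
            # Redirect the edge to a random new endpoint, avoiding
            # self-loops and edges that already exist.
            w = rng.randrange(n)
            while w == u or tuple(sorted((u, w))) in lattice or tuple(sorted((u, w))) in edges:
                w = rng.randrange(n)
            edges.add(tuple(sorted((u, w))))
        else:
            edges.add((u, v))
    return edges
```

With p = 0 the ring lattice is returned unchanged; small p already produces the small-world regime of short average paths with high clustering.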
http://www.bbc.co.uk/newsbeat/article/35500398/how-facebook-updated-six-degrees-of-separation-its-now-357
https://research.fb.com/three-and-a-half-degrees-of-separation/
http://slideplayer.com/slide/9267536/
9. Graph theory: common metrics and definitions
Graph-theoretic node
importance mining on
network topology
- Xue et al. (2017)
The graph-theoretic node importance mining methods based on
network topologies comprise two main categories: node
relevance and shortest path. The method of node relevance is
measured by degree analysis. The methods of shortest path that
aim at finding optimal spreading paths are measured by several
node importance analyses, e.g., betweenness, closeness
centrality, eigenvector centrality, Bonacich centrality and alter-based centrality.
Betweenness is used particularly for measurements of power
while closeness centrality and eigenvector centrality are
used particularly for measurements of centrality. Bonacich
centrality is an extension of eigenvector centrality which
measures node importance on both centrality and power. The
other mining methods for node importance based on network
topologies included in this review are via processes such as
node deleting, node contraction, and data mining and machine
learning embedded techniques. For heterogeneous network
structures, fusion methods integrate all the previously
mentioned measurements.
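Several of these metrics are straightforward to compute directly. A plain-Python sketch on a small toy graph (the graph and values are illustrative only): degree centrality from neighbor counts, closeness centrality from BFS distances, and eigenvector centrality by power iteration.

```python
from collections import deque

# Toy undirected graph as an adjacency dict (illustrative, not real data).
G = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 4], 3: [0], 4: [2, 5], 5: [4]}
n = len(G)

# Degree centrality: fraction of other nodes a node touches.
degree = {v: len(G[v]) / (n - 1) for v in G}

# Closeness centrality: inverse of the average shortest-path distance (BFS).
def closeness(v):
    dist = {v: 0}
    q = deque([v])
    while q:
        u = q.popleft()
        for w in G[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    return (n - 1) / sum(dist[u] for u in dist if u != v)

# Eigenvector centrality: power iteration on the adjacency structure.
x = {v: 1.0 for v in G}
for _ in range(200):
    x_new = {v: sum(x[w] for w in G[v]) for v in G}
    norm = max(x_new.values())
    x = {v: s / norm for v, s in x_new.items()}
```

Betweenness centrality would additionally require counting shortest paths through each node (e.g. via Brandes' algorithm) and is omitted here for brevity.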
28 February, 2013
Google’s Knowledge Graph: one step
closer to the semantic web?
By Andrew Isidoro
Knowledge Graph, a database of over 570m of the most
searched-for people, places and things (entities), including around
18bn cross-references.
The knowledge graph as the default data
model for learning on heterogeneous
knowledge
Wilcke, Xander; Bloem, Peter; de Boer, Victor
Data Science, vol. Preprint, no. Preprint, pp. 1-19, 2017
http://doi.org/10.3233/DS-170007
The FuhSen Architecture. High-level architecture
comprising (a) Mediator and wrappers architecture to
build the (b) knowledge graph on demand. The answer of a
keyword query corresponds to an RDF subject-molecule
that integrates RDF molecules collected from the
wrappers. (c) The components to enrich the results KG.
FuhSen: A Federated Hybrid
Search Engine for building a
knowledge graph on-demand
July 2016
https://doi.org/10.1007/978-3-319-48472-3_47
+ https://doi.org/10.1109/ICSC.2017.85
researchgate.net
10. Ranking in time-varying complex networks
Ranking in evolving complex networks
Hao Liao, Manuel Sebastian Mariani, Matúš Medo, Yi-Cheng Zhang, Ming-Yang Zhou
Physics Reports Volume 689, 19 May 2017, Pages 1-54
https://doi.org/10.1016/j.physrep.2017.05.001
Top: The often-studied Zachary’s karate club network has 34 nodes
and 78 links (here visualized with the Gephi software). Bottom:
Ranking of the nodes in the Zachary karate club network by the
centrality metrics described in this section. Node labels on the
horizontal axis correspond to the node labels in the top panel.
For the APS citation data from the period 1893–2015
(560,000 papers in total), we compute the ranking of
papers according to various metrics— citation count c,
PageRank centrality p (with the teleportation
parameter α = 0.5), and rescaled PageRank R(p). The
figure shows the median ranking position of the top 1%
of papers from each year. The three curves show three
distinct patterns. For c, the median rank is stable until
approximately 1995; then it starts to grow because the
best young papers have not yet reached sufficiently
high citation counts. For p, the median rank grows
during the whole displayed time period because
PageRank applied on an acyclic time-ordered citation
network favors old papers. By contrast, the curve is
approximately flat for R(p) during the whole period
which confirms that the metric is not biased by paper
age and gives equal chances to all papers.
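As a rough illustration of the age bias described above, PageRank with the same teleportation parameter α = 0.5 can be run with plain power iteration on a tiny citation-style graph where newer papers cite older ones (my own toy sketch, not the paper's rescaled-PageRank implementation):

```python
def pagerank(out_links, alpha=0.5, iters=100):
    """Power-iteration PageRank with teleportation probability 1 - alpha.
    Dangling nodes spread their score uniformly over all nodes."""
    nodes = sorted(out_links)
    n = len(nodes)
    p = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1 - alpha) / n for v in nodes}
        for v in nodes:
            targets = out_links[v] or nodes  # dangling node: uniform spread
            share = alpha * p[v] / len(targets)
            for w in targets:
                new[w] += share
        p = new
    return p

# Tiny citation-style DAG: newer papers ("d") cite older ones ("a").
ranks = pagerank({"a": [], "b": ["a"], "c": ["a", "b"], "d": ["b", "c"]})
```

The oldest node accumulates the highest score, which is exactly the bias toward old papers that rescaled PageRank R(p) is designed to correct.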
An illustration of the difference between the first-order Markovian
(time-aggregated) and second-order network representation of
the same data. Panels A–B represent the destination cities (the
right-most column) of flows of passengers from Chicago to other
cities, given the previous location (the left-most column). When
including memory effects (panel B), the fraction of passengers
coming back to the original destination is large, in agreement with
our intuition. A similar effect is found for the network of academic
journals
11. information diffusion intro
Many graphs can be used to model or predict how information flows through them.
● How influential are your Instagram posts, tweets, LinkedIn posts, etc.?
● How does a tweet affect the stock market, or, in more general terms, how can causality be inferred from a graph?
● In practice, you also see heat-diffusion methods applied to information diffusion
Random walks and diffusion on networks
Naoki Masuda, Mason A. Porter, Renaud Lambiotte
Physics Reports (Available online 31 August 2017)
https://doi.org/10.1016/j.physrep.2017.07.007
Fig. 12. The weary random walker retires from the network and heads off
into the distant sunset. [This picture was drawn by Yulian Ng.].
Inferring networks of diffusion and influence
Manuel Gomez Rodriguez, Jure Leskovec, Andreas Krause
KDD '10 Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
https://doi.org/10.1145/1835804.1835933
There are several interesting directions for future work.
Here we only used time difference to infer edges and thus
it would be interesting to utilize more informative features
(e.g., textual content of postings etc.) to more accurately
estimate the influence probabilities. Moreover, our work
considers static propagation networks, however real
influence networks are dynamic and thus it would be
interesting to relax this assumption. Last, there are many
other domains where our methodology could be useful:
inferring interaction networks in systems biology (protein-protein and gene interaction networks), neuroscience
(inferring physical connections between neurons) and
epidemiology. We believe that our results provide a
promising step towards understanding complex processes
on networks based on partial observations.
12. information diffusion Social Networks #1
Nonlinear Dynamics of Information Diffusion in Social Networks
ACM Transactions on the Web (TWEB) Volume 11 Issue 2, May 2017 Article No. 11
https://doi.org/10.1145/3057741
Online Social Networks and information diffusion: The role of ego networks
Valerio Arnaboldi, Marco Conti, Andrea Passarella, Robin I.M. Dunbar
Online Social Networks and Media 1 (2017) 44–55
http://dx.doi.org/10.1016/j.osnem.2017.04.001
Data Driven Modeling of Continuous Time Information Diffusion in Social
Networks
Liang Liu ; Bin Chen ; Bo Qu ; Lingnan He ; Xiaogang Qiu
Data Science in Cyberspace (DSC), 2017 IEEE
https://doi.org/10.1109/DSC.2017.103
Online Bayesian Inference of Diffusion Networks
Shohreh Shaghaghian ; Mark Coates
IEEE Transactions on Signal and Information Processing over Networks ( Volume: 3, Issue: 3, Sept. 2017 )
https://doi.org/10.1109/TSIPN.2017.2731160
Modeling the reemergence of information diffusion in social network
Dingda Yang, Xiangwen Liao, Huawei Shen, Xueqi Cheng, Guolong Chen
Physica A: Statistical Mechanics and its Applications [Available online 1 September 2017]
http://dx.doi.org/10.1016/j.physa.2017.08.115
Information Diffusion in Online Social Networks: A Survey
Adrien Guille, Hakim Hacid, Cécile Favre, Djamel A. Zighed
ACM SIGMOD Record, Volume 42, Issue 2, May 2013, Pages 17-28
https://doi.org/10.1145/2503792.2503797
13. information diffusion Social Networks #2
Literature Survey on Interplay of Topics, Information Diffusion and
Connections on Social Networks
Kuntal Dey, Saroj Kaushik, L. Venkata Subramaniam
(Submitted on 3 Jun 2017)
https://arxiv.org/abs/1706.00921
14. information diffusion scientific citation networks #1
Integration of Scholarly Communication Metadata
Using Knowledge Graphs
Afshin Sadeghi, Christoph Lange, Maria-Esther Vidal, Sören Auer
International Conference on Theory and Practice of Digital Libraries
TPDL 2017: Research and Advanced Technology for Digital Libraries pp 328-341
https://doi.org/10.1007/978-3-319-67008-9_26
Particularly, we demonstrate the benefits of exploiting semantic web
technology to reconcile data about authors, papers, and conferences.
A Recommendation System Based on Hierarchical
Clustering of an Article-Level Citation Network
Jevin D. West ; Ian Wesley-Smith ; Carl T. Bergstrom
IEEE Transactions on Big Data ( Volume: 2, Issue: 2, June 1 2016 )
https://doi.org/10.1109/TBDATA.2016.2541167
http://babel.eigenfactor.org/
The scholarly literature is expanding at a rate that necessitates
intelligent algorithms for search and navigation. For the most part, the
problem of delivering scholarly articles has been solved. If one knows the
title of an article, locating it requires little effort and, paywalls permitting,
acquiring a digital copy has become trivial. However, the navigational
aspect of scientific search - finding relevant, influential articles that
one does not know exist - is in its early development.
Big Scholarly Data: A Survey
Feng Xia ; Wei Wang ; Teshome Megersa Bekele ; Huan Liu
IEEE Transactions on Big Data ( Volume: 3, Issue: 1, March 1 2017 )
https://doi.org/10.1109/TBDATA.2016.2641460
ASNA - Academic Social Network Analysis
15. information diffusion scientific citation networks #2
Implicit Multi-Feature Learning for Dynamic Time
Series Prediction of the Impact of Institutions
Xiaomei Bai ; Fuli Zhang ; Jie Hou ; Feng Xia ; Amr Tolba ; Elsayed Elashkar
IEEE Access ( Volume: 5 )
https://doi.org/10.1109/ACCESS.2017.2739179
Predicting the impact of research institutions is an important tool for
decision makers, e.g., for resource allocation by funding bodies.
Despite significant effort in adopting quantitative indicators to measure
the impact of research institutions, little is known about how the impact
of institutions evolves over time.
The Role of Positive and Negative Citations in
Scientific Evaluation
Xiaomei Bai ; Ivan Lee ; Zhaolong Ning ; Amr Tolba ; Feng Xia
IEEE Access ( Volume: PP, Issue: 99 )
https://doi.org/10.1109/ACCESS.2017.2740226
Recommendation for Cross-Disciplinary Collaboration
Based on Potential Research Field Discovery
Wei Liang ; Xiaokang Zhou ; Suzhen Huang ;
Chunhua Hu ; Qun Jin
Advanced Cloud and Big Data (CBD), 2017
https://doi.org/10.1109/CBD.2017.67
The cross-disciplinary information is hidden
in tons of publications, and the relationships
between different fields are complicated,
which makes it challenging to recommend
cross-disciplinary collaboration for a
specific researcher.
Petteri: Whether to recommend “outliers”
i.e. unexpected combinations of fields, or
something outside your field that would be
useful to you. Or just the typical landmark
papers of your field? Depends on your
needs for sure.
https://iris.ai/
http://www.bibblio.org/learning-and-knowledge
In the future, we will further explore the relationships between the impact of
institutions and the features driving changes in that impact, to
enhance the prediction performance. In addition, this work is conducted only
on literature from the eight top conferences in the Microsoft Academic
Graph (MAG) dataset; examining other conferences for the same observed
patterns could widen the significance of our findings.
16. information diffusion Finance, Quant trading, decision making
Information Diffusion, Cluster formation
and Entropy-based Network Dynamics
in Equity and Commodity Markets
Stelios Bekiros, Duc Khuong Nguyen, Leonidas Sandoval Junior, Gazi Salah Uddin
European Journal of Operational Research (2016)
http://dx.doi.org/10.1016/j.ejor.2016.06.052
https://www.prowler.io/
https://www.causalitylink.com/
https://www.forbes.com/sites/antoinegara/2017/02/28/kensho-sp-500-million-valuation-jpmorgan-morgan-stanley/#6fe4bb0b5cbf
Technology that brings transparency to complex systems
https://www.kensho.com/
Our platform uses artificial intelligence
to discover, extract and index events,
variables and relationships about
markets, sectors, industries and
equities. It absorbs news articles,
analysts’ point of view or equity-related
materials as they are
published. Save time and get ahead by
letting AI do the repetitive reading for
you. Focus on new knowledge.
Analysis of Investment Relationships
Between Companies and Organizations
Based on Knowledge Graph
Xiaobo Hu, Xinhuai Tang, Feilong Tang
In: Barolli L., Enokido T. (eds) Innovative Mobile and Internet Services in
Ubiquitous Computing. IMIS 2017. Advances in Intelligent Systems and
Computing, vol 612
https://doi.org/10.1007/978-3-319-61542-4_20
A design for a common-sense
knowledge-enhanced decision-support
system: Integration of high-frequency
market data and real-time news
Kun Chen, Jian Yin, Sulin Pang
Expert Systems (June 2017) doi: 10.1111/exsy.12209
Compared with previous work, our model is the
first to incorporate broad common-sense
knowledge into a decision support system, thereby
improving the news analysis process through the
application of a graphic random-walk framework.
Prototype and experiments based on Hong Kong
stock market data have demonstrated that
common-sense knowledge is an important factor
in building financial decision models that
incorporate news information.
Dynamics of financial markets and
transaction costs: A graph-based study
Felipe Lillo, Rodrigo Valdés
Research in International Business and Finance
Volume 38, September 2016, Pages 455-465
Using financialization as a conceptual framework
to understand the current trading patterns of
financial markets, this work employs a market
graph model for studying the stock indexes of
geographically separated financial markets. By
using an edge creation condition based on a
transaction cost threshold, the resulting market
graph features a strong connectivity, some traces
of a power law in the degree distribution and an
intensive presence of cliques.
Ponzi scheme diffusion in complex
networks
Anding Zhu, Peihua Fu, Qinghe Zhang, Zhenyue Chen
Physica A: Statistical Mechanics and its Applications
Volume 479, 1 August 2017, Pages 128-136
https://doi.org/10.1016/j.physa.2017.03.015
17. “Intelligent knowledge graphs” with “actionable insights”
Model-Driven Analytics: Connecting Data,
Domain Knowledge, and Learning
Thomas Hartmann, Assaad Moawad, Francois Fouquet, Gregory Nain,
Jacques Klein, Yves Le Traon, Jean-Marc Jezequel
(Submitted on 5 Apr 2017)
https://arxiv.org/abs/1704.01320
Gaining profound insights from collected data of today's application domains like IoT, cyber-physical
systems, health care, or the financial sector is business-critical and can create the next multi-billion
dollar market. However, analyzing these data and turning it into valuable insights is a huge challenge.
This is often not alone due to the large volume of data but due to an incredibly high domain complexity,
which makes it necessary to combine various extrapolation and prediction methods to understand the
collected data. Model-driven analytics is a refinement process of raw data driven by a model reflecting
deep domain understanding, connecting data, domain knowledge, and learning.
18. Graph theory: example applications beyond typical networks
Construction (BIM): “Graph theory based
representation of building information models (BIM)
for access control applications”
Automation in Construction, Volume 68, August 2016, Pages 44-51
https://doi.org/10.1016/j.autcon.2016.04.001
IFC 4 model, IFC-SPF format.
Medical Imaging (OCT): “Improving Segmentation
of 3D Retina Layers Based on Graph Theory
Approach for Low Quality OCT Images”
Metrology and Measurement System Volume 23, Issue 2 (Jun 2016)
https://doi.org/10.1515/mms-2016-0016
Dijkstra shortest path algorithm
Risk Assessment: “A New Risk
Assessment Framework Using Graph
Theory for Complex ICT Systems”
MIST '16: Proceedings of the 8th ACM CCS International Workshop on Managing Insider Security Threats
https://doi.org/10.1145/2995959.2995969
Biodiversity management: “Multiscale
connectivity and graph theory highlight
critical areas for conservation under
climate change”
Ecological Applications (8 June 2016)
http://doi.org/10.1890/15-0925
Brain Imaging: “‘Small World’ architecture in
brain connectivity and hippocampal volume in
Alzheimer’s disease: a study via graph theory
from EEG data”
Brain Imaging and Behavior April 2017, Volume 11, Issue 2, pp
473–485 doi: 10.1007/s11682-016-9528-3
Small World trends in the two groups of subjects
Medical Imaging (OCT):
“Reconstruction of 3D surface maps from anterior
segment optical coherence tomography images using
graph theory and genetic algorithms”
Biomedical Signal Processing and Control
Volume 25, March 2016, Pages 91-98
https://doi.org/10.1016/j.bspc.2015.11.004
Cybersecurity: “Big Data Behavioral Analytics
Meet Graph Theory: On Effective Botnet
Takedowns”
IEEE Network ( Volume: 31, Issue: 1, January/February 2017 )
https://doi.org/10.1109/MNET.2016.1500116NM
19. Graph Signal Processing and quantitative graph theory
Defferrard et al. (2016): “The emerging field of Graph Signal Processing (GSP) aims at bridging the gap
between signal processing and spectral graph theory [Shuman et al. 2013], a blend between graph theory and
harmonic analysis. A goal is to generalize fundamental analysis operations for signals from regular grids to
irregular structures embodied by graphs. We refer the reader to Belkin and Niyogi 2008 for an introduction of
the field.”
Matthias Dehmer, Frank Emmert-Streib, Yongtang Shi
https://doi.org/10.1016/j.ins.2017.08.009
The main goal of quantitative graph theory is
the structural quantification of information
contained in complex networks by employing
a measurement approach based on numerical
invariants and comparisons. Furthermore, the
methods as well as the networks do not need to be
deterministic but can be statistical.
Shuman et al. 2013
Perraudin and Vandergheynst 2016:
”the proposed Wiener regularization framework offers a
compelling way to solve traditional problems such as denoising,
regression or semi-supervised learning”
Experiments on the temperature of Molene. Top: A
realization of the stochastic graph signal (first measure).
Bottom center: the temperature of the Island of Brehat.
Bottom right: Recovery errors (inpainting error) for different
noise levels
20. Graph Fourier Transform (GFT)
The use of Graph Fourier Transform in image
processing: A new solution to classical problems
Francesco Verdoja, PhD thesis, 2017
https://doi.org/10.1109/ICASSP.2017.7952886
On the Graph Fourier Transform for
Directed Graphs
Stefania Sardellitti ; Sergio Barbarossa ; Paolo Di Lorenzo
IEEE Journal of Selected Topics in Signal Processing ( Volume: 11, Issue: 6, Sept. 2017 )
https://doi.org/10.1109/JSTSP.2017.2726979
The analysis of signals defined over a graph is relevant in many applications, such as social and
economic networks, big data or biological networks, and so on. A key tool for analyzing these
signals is the so called Graph Fourier Transform (GFT). Alternative definitions of GFT have been
suggested in the literature, based on the eigen-decomposition of either the graph Laplacian or
adjacency matrix. In this paper, we address the general case of directed graphs and we propose an
alternative approach that builds the graph Fourier basis as the set of orthonormal vectors that
minimize a continuous extension of the graph cut size, known as the Lovasz extension.
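For an undirected graph, the Laplacian-based GFT mentioned above amounts to projecting a vertex signal onto the Laplacian's eigenvectors, whose eigenvalues play the role of frequencies. A brief numpy sketch (toy path graph, my own illustration):

```python
import numpy as np

# Undirected toy graph: a 5-node path.
n = 5
A = np.diag(np.ones(n - 1), 1)
A = A + A.T
L = np.diag(A.sum(axis=1)) - A

lam, U = np.linalg.eigh(L)   # eigenvalues act as "graph frequencies"

def gft(x):
    return U.T @ x           # forward graph Fourier transform

def igft(xhat):
    return U @ xhat          # inverse transform

x = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
xhat = gft(x)
# A smooth signal concentrates its energy in the low-frequency
# (small-eigenvalue) coefficients of xhat.
```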
Graph-based approaches have recently seen a spike of interest in the
image processing and computer vision communities, and many
classical problems are finding new solutions thanks to these
techniques. The Graph Fourier Transform (GFT), the equivalent of the
Fourier transform for graph signals, is used in many domains to
analyze and process data modeled by a graph.
In this thesis we present some classical image processing problems
that can be solved through the use of GFT. We’ll focus our attention
on two main research area: the first is image compression, where
the use of the GFT is finding its way in recent literature; we’ll propose
two novel ways to deal with the problem of graph weight
encoding. We’ll also propose approaches to reduce overhead costs
of shape-adaptive compression methods.
The second research field is image anomaly detection; GFT has
never been proposed to date to solve this class of problems.
We’ll discuss here a novel technique and we’ll test its application on
hyperspectral and medical (PET tumor scan) images.
21. Graph Signal Processing #1
Adaptive Least Mean Squares Estimation of Graph
Signals
Paolo Di Lorenzo ; Sergio Barbarossa ; Paolo Banelli ; Stefania Sardellitti
IEEE Transactions on Signal and Information Processing over Networks
( Volume: 2, Issue: 4, Dec. 2016 )
https://doi.org/10.1109/TSIPN.2016.2613687
Distributed Adaptive Learning of Graph Signals
Paolo Di Lorenzo ; Sergio Barbarossa ; Paolo Banelli ; Stefania Sardellitti
IEEE Transactions on Signal Processing ( Volume: 65, Issue: 16, Aug. 15, 2017 )
https://doi.org/10.1109/TSP.2017.2708035
The aim of this paper is to propose a least mean squares (LMS) strategy for adaptive estimation of
signals defined over graphs. Assuming the graph signal to be band-limited, over a known bandwidth, the
method enables reconstruction, with guaranteed performance in terms of mean-square error, and tracking
from a limited number of observations over a subset of vertices.
Furthermore, to cope with the case where the bandwidth is not known beforehand, we propose a method
that performs a sparse online estimation of the signal support in the (graph) frequency domain, which
enables online adaptation of the graph sampling strategy. Finally, we apply the proposed method to build
the power spatial density cartography of a given operational region in a cognitive network environment.
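Stripped to its core, the LMS idea can be sketched as follows (my own simplified sketch, not the authors' code): assume the signal lies in the span of the first F Laplacian eigenvectors, observe a noisy subset of vertices at each step, and move the estimate along the bandlimited projection of the instantaneous error.

```python
import numpy as np

rng = np.random.default_rng(1)
n, F = 8, 3                          # vertices, assumed bandwidth

# Toy undirected path graph.
A = np.diag(np.ones(n - 1), 1)
A = A + A.T
L = np.diag(A.sum(axis=1)) - A
_, U = np.linalg.eigh(L)

B = U[:, :F] @ U[:, :F].T            # projector onto the bandlimited subspace
x_true = B @ rng.standard_normal(n)  # an (unknown) bandlimited graph signal

D = np.diag([1.0, 0, 1, 1, 0, 1, 1, 0])  # sampling mask: observed vertices

mu, x_hat = 0.5, np.zeros(n)         # step size and initial estimate
for _ in range(2000):
    y = x_true + 0.01 * rng.standard_normal(n)   # noisy streaming observation
    x_hat = x_hat + mu * B @ D @ (y - x_hat)     # LMS update

# x_hat tracks x_true while only ever reading the sampled vertices.
```

Convergence hinges on the sampled vertex set being informative for the bandlimited subspace; with too few or badly placed samples, part of the signal stays unidentifiable.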
“We apply the proposed distributed framework to power density cartography in cognitive
radio (CR) networks. We consider a 5G scenario, where a dense deployment of radio access
points (RAPs) is envisioned to provide a service environment characterized by very low latency
and high rate access. Each RAP collects data related to the transmissions of primary users
(PUs) at its geographical position, and communicates with other RAPs with the aim of
implementing advanced cooperative sensing techniques”
“This paper represents the first work that merges the well established field
of adaptation and learning over networks, and the emerging topic of
signal processing over graphs. Several interesting problems are still open,
e.g., distributed reconstruction in the presence of directed and/or
switching graph topologies, online identification of the graph signal support
from streaming data, distributed inference of the (possibly unknown)
graph signal topology, adaptation of the sampling strategy to time-varying
scenarios, optimization of the sampling probabilities, just to name a few.
We plan to investigate on these exciting problems in our future works”
22. Graph Signal Processing #2
Kernel Regression for Signals over Graphs
Arun Venkitaraman, Saikat Chatterjee, Peter Händel
(Submitted on 7 Jun 2017)
https://arxiv.org/abs/1706.02191
Uncertainty Principles and Sparse
Eigenvectors of Graphs
Arun Venkitaraman, Saikat Chatterjee, Peter Händel
IEEE Transactions on Signal Processing ( Volume: 65, Issue: 20, Oct. 15, 2017 )
https://doi.org/10.1109/TSP.2017.2731299
We propose kernel regression for signals over graphs. The optimal regression coefficients are
learnt using a constraint that the target vector is a smooth signal over an underlying graph.
The constraint is imposed using a graph-Laplacian based regularization. We discuss how
the proposed kernel regression exhibits a smoothing effect, simultaneously achieving
noise reduction and graph-smoothness. We further extend the kernel regression to
simultaneously learn the underlying graph and the regression coefficients.
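The graph-Laplacian regularization at the heart of this approach is easiest to see in its linear form (my own toy sketch of the general idea, not the authors' kernel method): reconstruct a signal from a few observed vertices by trading the data fit against the smoothness penalty xᵀLx.

```python
import numpy as np

# 6-cycle graph.
n = 6
A = np.zeros((n, n))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

# Noisy values observed on vertices 0, 2 and 4 only.
M = np.diag([1.0, 0, 1, 0, 1, 0])        # observation mask
y = np.array([1.0, 0, 0.5, 0, -1.0, 0])  # entries outside the mask are ignored

# argmin_x ||M(x - y)||^2 + lam * x^T L x   =>   (M + lam*L) x = M y
lam = 0.1
x_hat = np.linalg.solve(M + lam * L, M @ y)
# x_hat fits the observed vertices and smoothly interpolates the rest.
```

Larger lam enforces more graph-smoothness at the cost of data fit; in the paper's kernel setting the same trade-off is applied to the kernel-regression output rather than to x directly.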
Our hypothesis was that incorporating the graph smoothness constraint would help
kernel regression to perform better, particularly when we lack sufficient and reliable
training data. Our experiments illustrate that this is indeed the case in practice.
Through experiments we also conclude that graph signals carry sufficient
information about the underlying graph structure which may be extracted in the
regression setting even with moderately small number of samples in comparison with
the graph dimension. Thus, our approach helps both predict and infer the underlying
topology of the network or graph.
When the graph has repeated eigenvalues, we explained that a graph Fourier basis (GFB) is not unique, and the derived lower bound can have different values depending on the selected GFB. We provided a constructive method to find a GFB that yields the smallest uncertainty bound. In order to find the signals that achieve the derived lower bound, we considered sparse eigenvectors of the graph. We showed that the graph Laplacian has a 2-sparse eigenvector if and only if there exists a pair of nodes with the same neighbors. When this happens, the uncertainty bound is very low and the 2-sparse eigenvectors achieve this bound. We presented examples of both classical and real-world graphs with 2-sparse eigenvectors. We also discussed that, in some examples, the neighborhood structure has a meaningful interpretation.
23. Graph Signal Processing #3: Time-varying graphs
Kernel-Based Reconstruction of Space-Time
Functions on Dynamic Graphs
Daniel Romero ; Vassilis N. Ioannidis ; Georgios B. Giannakis
IEEE Journal of Selected Topics in Signal Processing ( Volume: 11, Issue: 6, Sept. 2017 )
https://doi.org/10.1109/JSTSP.2017.2726976
Filtering Random Graph Processes Over Random
Time-Varying Graphs
Kai Qiu ; Xianghui Mao ; Xinyue Shen ; Xiaohan Wang ; Tiejian Li ; Yuantao Gu
IEEE Journal of Selected Topics in Signal Processing ( Volume: 11, Issue: 6, Sept. 2017 )
https://doi.org/10.1109/JSTSP.2017.2726969
Abbreviations: DSLR = distributed least-squares reconstruction; LMS = least mean-squares; KKF = kernel Kalman filter; ECoG = electrocorticography; NMSE = cumulative normalized mean-square error.
This paper investigated kernel-based reconstruction of space-time functions on graphs. The adopted approach relied on the construction of an extended graph, which treats the time dimension just as another spatial dimension. Several kernel designs were introduced, together with batch and online function estimators; the latter is a kernel Kalman filter developed from a purely deterministic standpoint, without the need to adopt any state-space model. Future research will deal with multi-kernel and distributed versions of the proposed algorithms.
Schemes tailored for time-evolving functions on graphs include [Bach and Jordan 2004] and [Mei and Moura 2016], which predict the function values at time t given observations up to time t − 1. However, these schemes assume that the function of interest adheres to a specific vector autoregression and that all vertices are observed at previous time instances. Moreover, [Bach and Jordan 2004] requires Gaussianity along with an ad hoc form of stationarity.
However, many real-world graph signals are time-varying and evolve smoothly, so instead of the signals themselves being bandlimited or smooth on the graph, it is more reasonable to assume that their temporal differences are smooth on the graph. In this paper, a new batch reconstruction method for time-varying graph signals is proposed by exploiting the smoothness of the temporal difference signals, and the uniqueness of the solution to the corresponding optimization problem is theoretically analyzed. Furthermore, driven by practical applications faced with real-time requirements, huge data volumes, the lack of a computing center, or communication difficulties between non-neighboring vertices, an online distributed method is proposed by applying local properties of the temporal difference operator and the graph Laplacian matrix.
In the future, we will further study the applications of smoothness of temporal difference signals, and may combine it with other properties of signals, such as low rank. Besides, it is also interesting to consider the situation where both the signal and the graph are time-varying.
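The batch criterion described above (fit the observed entries while making the temporal differences smooth on the graph) can be sketched as one linear solve. The sampling-mask formulation, the tiny ridge term, and the toy graph are assumptions for illustration, not the paper's exact algorithm.

```python
import numpy as np

def reconstruct_tv(Y, M, L, alpha=1.0, eps=1e-9):
    """Batch reconstruction of a time-varying graph signal:
    argmin_X ||M*(X - Y)||_F^2 + alpha * sum_t (x_t - x_{t-1})' L (x_t - x_{t-1}),
    i.e. the temporal differences, not the frames themselves, are required
    to be smooth on the graph. Y: N x T observations; M: 0/1 mask; L: Laplacian."""
    N, T = Y.shape
    D = (np.eye(T) - np.eye(T, k=-1))[1:]   # (T-1) x T temporal difference op
    H = np.kron(D.T @ D, L)                 # acts on column-stacked frames vec(X)
    S = np.diag(M.flatten(order="F"))
    x = np.linalg.solve(S + alpha * H + eps * np.eye(N * T),
                        S @ Y.flatten(order="F"))
    return x.reshape((N, T), order="F")
```

With a full observation mask this reduces to a denoiser: the reconstruction is guaranteed not to increase the temporal-difference smoothness penalty relative to the raw observations.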
24. Graph Signal Processing #4 Time-varying graphs
Signal Processing on Graphs: Causal Modeling of
Unstructured Data
Jonathan Mei, José M. F. Moura
(Submitted on 28 Feb 2015 (v1), last revised 8 Feb 2017 (this version, v6))
https://arxiv.org/abs/1503.00173
Learning directed Graph Shifts from High-
Dimensional Time Series
Lukas Nagel(June 2017)
Master Thesis, Institute of Telecommunications (TU Wien)
https://pdfs.semanticscholar.org/8822/526b7b2862f6374f5f950c89a14a7a931820.pdf
Many applications collect a large number of time series, for example, the financial data of companies quoted in a stock
exchange, the health care data of all patients that visit the emergency room of a hospital, or the temperature sequences
continuously measured by weather stations across the US. These data are often referred to as unstructured.
A first task in its analytics is to derive a low dimensional representation, a graph or discrete manifold, that describes well
the interrelations among the time series and their intrarelations across time. This paper presents a computationally
tractable algorithm for estimating this graph that structures the data. The resulting graph is directed and weighted,
possibly capturing causal relations, not just reciprocal correlations as in many existing approaches in the literature. A
convergence analysis is carried out. The algorithm is demonstrated on random graph datasets and real network time
series datasets, and its performance is compared to that of related methods. The adjacency matrices estimated with the
new method are close to the true graph in the simulated data and consistent with prior physical knowledge in the real
dataset tested.
Frequency ordering depending on the position of the eigenvalues λ in C. Both graphics are from Sandryhaila and Moura (2014).
Causal graph signal process. Visualization of the
information spreading through graph shifts for P3(A, c)
We want to apply the causal graph process estimation algorithm to
stock prices and especially point out some additional points of
failure we spotted.
In the shift matrix shown in Figure 4.9a, we observe that the stocks
number 2, 16 and 24 have many incoming connections. It appears
unlikely that this is due to some economic relations and points
towards a numerical problem.
As we were interested in potential interpretations of the shift recovered from the stock data, we chose to visualize the largest possible directions of the shift shown in Figure 4.11 as a graph in Figure 4.12. The only observation we could draw from the graph is that there are multiple bank stocks which affect multiple other stocks. Otherwise, the connected companies show neither a common ownership structure nor even similar or related products.
The stocks example, with no clear expectation, did not lead to promising results. Despite this, we described two preprocessing steps, scaling and averaging, that could be applied before starting the estimation algorithm. It is unclear whether further tuning is needed or whether the domain of daily stock data cannot reasonably be modeled with causal graph processes; we therefore leave this question open for future research.
25. Graph Wavelet transform vs. GFT #1
Compression of dynamic 3D point clouds using
subdivisional meshes and graph wavelet transforms
Aamir Anis ; Philip A. Chou ; Antonio Ortega
University of Southern California, Los Angeles, CA; † Microsoft Research, Redmond, WA
Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE
https://doi.org/10.1109/ICASSP.2016.7472901
The subdivisional structure also allows us to obtain a sequence of bipartite graphs that facilitate the use of GraphBior [Narang et al. (2012)] to compute the wavelet transform coefficients of the geometry and color attributes.
Compact Support Biorthogonal Wavelet
Filterbanks for Arbitrary Undirected Graphs
Sunil K. Narang, Antonio Ortega
(Submitted on 30 Oct 2012 (v1), last revised 19 Nov 2012 (this version, v2))
https://arxiv.org/abs/1210.8129
In this paper, we provide a framework for compression of 3D point cloud sequences. Our approach involves
representing sets of frames by a consistently-evolving high-resolution subdivisional triangular mesh. This
representation helps us facilitate efficient implementations of motion estimation and graph wavelet transforms.
The subdivisional structure plays a crucial role in designing a simple hierarchical method for efficiently
estimating these meshes, and the application of Biorthogonal Graph Wavelet Filterbanks for compression.
Preliminary experimental results show promising performances of both the estimation and the compression
steps, and we believe this work shall open new avenues of research in this emerging field.
Comparison of graph wavelet designs in terms of key properties: zero highpass response for a constant graph signal (DC), critical sampling (CS), perfect reconstruction (PR), compact support (Comp), orthogonal expansion (OE), and whether graph simplification is required (GS).
In this paper we have presented novel graph-wavelet filterbanks that provide a critically sampled representation with compactly supported basis functions. The filterbanks come in two flavors: (a) nonzeroDC filterbanks and (b) zeroDC filterbanks. The former are designed as polynomials of the normalized graph Laplacian matrix, and the latter are extensions of the former that provide a zero response from the highpass operators. Preliminary results showed that the filterbanks are useful not only for arbitrary graphs but also for the standard regular signal-processing domains. Extensions of this work will focus on the application of these filters to different scenarios, including, for example, social network analysis, sensor networks, etc.
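The two design ideas quoted above (filters as polynomials of the normalized Laplacian, and the zeroDC property of annihilating the DC component) can be illustrated in a few lines. The path graph and the choice of highpass kernel h(λ) = λ are assumptions for illustration, not the paper's filterbank.

```python
import numpy as np

def poly_filter(Ln, coeffs, x):
    """Graph filter as a polynomial of the symmetric normalized
    Laplacian: h(Ln) x = sum_k coeffs[k] * Ln^k x, evaluated with
    repeated matrix-vector products (sparse-friendly, no eigenbasis)."""
    out = np.zeros_like(x, dtype=float)
    p = x.astype(float).copy()
    for c in coeffs:
        out = out + c * p
        p = Ln @ p
    return out

# Path graph on 4 nodes; Ln = I - D^{-1/2} A D^{-1/2}.
A = np.zeros((4, 4))
for u, v in [(0, 1), (1, 2), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
d = A.sum(1)
Ln = np.eye(4) - A / np.sqrt(np.outer(d, d))
dc = np.sqrt(d)                       # "DC" eigenvector of Ln (eigenvalue 0)
hp = poly_filter(Ln, [0.0, 1.0], dc)  # zeroDC highpass: h(lam) = lam
```

Because h(0) = 0, the highpass operator gives a zero response on the DC eigenvector, which is exactly the zeroDC property the abstract describes.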
26. Graph Wavelet transform vs. GFT #2
Bipartite Approximation for Graph
Wavelet Signal Decomposition
Jin Zeng ; Gene Cheung ; Antonio Ortega
IEEE Transactions on Signal Processing ( Volume: 65, Issue: 20, Oct. 15, 2017 )
https://doi.org/10.1109/TSP.2017.2733489
Splines and Wavelets on Circulant Graphs
Madeleine S. Kotzagiannidis, Pier Luigi Dragotti
(Submitted on 15 Mar 2016)
https://arxiv.org/abs/1603.04917
(a) Two-channel wavelet filterbank on a bipartite graph; (b) kernels of H0, H1 in graphBior [Narang et al. (2012)] with filter length of 19.
Unlike previous works, our design of the two metrics relates directly to energy
compaction for bipartite subgraph decomposition. Comparison with the state-
of-the-art schemes validates our proposed metrics for energy compaction and
illustrates the efficiency of our approach. We are currently working on different
applications of graphBior with our bipartite approximation, e.g., graph-signal
denoising, which will benefit from the energy compaction in the wavelet domain.
In this paper, we have introduced novel families of wavelets and associated filterbanks on circulant graphs with vanishing-moment properties, which reveal (e-)spline-like functions on graphs and promote sparse multiscale representations. Moreover, we have discussed generalizations to arbitrary graphs in the form of a multidimensional wavelet analysis scheme based on graph product decomposition, facilitating a sparsity-promoting generalization with the advantage of lower-dimensional processing. In our future work, we wish to further explore the sets of graph signals which can be annihilated with existing and/or evolved graph wavelets, as well as refine its extensions and relevance for arbitrary graphs.
27. Graphlet induced subgraphs of a large network
Estimation of Graphlet Statistics
Ryan A. Rossi, Rong Zhou, and Nesreen K. Ahmed
(Submitted on 6 Jan 2017 (v1), last revised 28 Feb 2017 (this version, v2))
https://arxiv.org/abs/1701.01772
28. Graph Computing Accelerations
Parallel Local Algorithms for Core, Truss,
and Nucleus Decompositions
Ahmet Erdem Sariyuce, C. Seshadhri, Ali Pinar
Sandia National Laboratories, University of California
(Submitted on 2 Apr 2017)
https://arxiv.org/abs/1704.00386
Finding the dense regions of a graph and relations among them is a fundamental task in network
analysis. Nucleus decomposition is a principled framework of algorithms that generalizes the k-core
and k-truss decompositions. It can leverage the higher-order structures to locate the dense
subgraphs with hierarchical relations. … We present a framework of local algorithms to obtain the
exact and approximate nucleus decompositions. Our algorithms are pleasingly parallel and can
provide approximations to explore time and quality trade-offs. Our shared-memory implementation
verifies the efficiency, scalability, and effectiveness of our algorithms on real-world networks. In
particular, using 24 threads, we obtain up to 4.04x and 7.98x speedups for k-truss and (3, 4)
nucleus decompositions.
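The k-core decomposition that nucleus decomposition generalizes has a classic sequential baseline, the peeling algorithm: repeatedly remove a minimum-degree vertex. A minimal sketch (the parallel local algorithms in the paper are far more involved):

```python
def core_numbers(adj):
    """Peeling algorithm for k-core decomposition.
    adj: dict node -> set of neighbours (undirected, simple graph).
    Returns dict node -> core number."""
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    alive = set(adj)
    core, k = {}, 0
    while alive:
        v = min(alive, key=deg.get)   # peel a minimum-degree vertex
        k = max(k, deg[v])            # core number never decreases
        core[v] = k
        alive.remove(v)
        for u in adj[v]:
            if u in alive:
                deg[u] -= 1
    return core

# Example: a triangle {0, 1, 2} with a pendant vertex 3 attached to 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
cores = core_numbers(adj)   # pendant is 1-core, triangle is 2-core
```

The local algorithms in the paper compute the same decompositions (and their truss/nucleus generalizations) without this global sequential ordering, which is what makes them pleasingly parallel.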
29. P-Laplacian on graphs
p-Laplacian Regularized Sparse Coding for Human
Activity Recognition
Weifeng Liu ; Zheng-Jun Zha ; Yanjiang Wang ; Ke Lu ; Dacheng Tao
IEEE Transactions on Industrial Electronics ( Volume: 63, Issue: 8, Aug. 2016 )
https://doi.org/10.1109/TIE.2016.2552147
On the game p-Laplacian on weighted
graphs with applications in image
processing and data clustering
A. ELMOATAZ, X. DESQUESNES and M. TOUTAIN
(3 July 2017) European Journal of Applied Mathematics
https://doi.org/10.1017/S0956792517000122
In this paper, we have introduced a new class of normalized p-Laplacian operators as a discrete adaptation of the game-theoretic p-Laplacian on weighted graphs. This class is based on a new partial difference operator which interpolates between the normalized 2-Laplacian, 1-Laplacian, and ∞-Laplacian on graphs. This operator is also connected to non-local averaging operators such as the non-local mean, non-local median, and non-local midrange. It generalizes the normalized p-Laplacian on graphs for 1 ≤ p ≤ ∞. We have shown the connections with local and non-local PDEs of p-Laplacian type and the stochastic game Tug-of-War with noise (Peres et al. 2008). We have proved existence and uniqueness for the Dirichlet problem involving operators of this new class. Finally, we have illustrated the interest and behaviour of such operators in some inverse problems in image processing and machine learning.
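The interpolation idea can be made concrete on an unweighted graph: the 2-Laplacian at a vertex is a non-local mean minus the centre value, the ∞-Laplacian a non-local midrange minus the centre value. The mixing weight t = 1 − 2/p below is an illustrative choice to show the interpolation, not the paper's exact coefficients.

```python
import numpy as np

def local_operators(u, nbrs, v):
    """Unweighted, normalized graph operators at vertex v:
    2-Laplacian = non-local mean minus u(v);
    infinity-Laplacian = non-local midrange minus u(v)."""
    vals = np.array([u[w] for w in nbrs[v]], dtype=float)
    lap2 = vals.mean() - u[v]
    lap_inf = 0.5 * (vals.max() + vals.min()) - u[v]
    return lap2, lap_inf

def game_p_laplacian(u, nbrs, v, p):
    """Sketch of a game-style p-Laplacian as a convex combination of the
    2- and infinity-Laplacians; with t = 1 - 2/p, p = 2 recovers the mean
    operator and p -> infinity recovers the midrange operator (assumed form)."""
    lap2, lap_inf = local_operators(u, nbrs, v)
    t = 1.0 - 2.0 / p
    return (1.0 - t) * lap2 + t * lap_inf
```

For example, with neighbour values {0, 4, 5} around a centre value 1, the mean-based operator gives 2.0 and the midrange-based operator 1.5; intermediate p values interpolate between the two.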
The framework of human activity recognition. First, we extract the representative features of human activity, including SIFT, STIP, and MFCC. Second, we concatenate the histograms formed by bags of each feature. Third, we learn the sparse codes of each sample and the corresponding dictionary simultaneously with the p-Laplacian regularized sparse coding algorithm. Finally, we input the learned sparse codes into classifiers, i.e., support vector machines, to conduct human activity recognition.
As a sparse representation, the proposed p-Laplacian regularized sparse coding algorithm can also be employed for modern industry using data-based techniques [Jung et al. 2015; Shen et al. 2015] and other computer vision applications such as video summarization and visual tracking [Bai and Li 2014; Yu et al. 2016]. In the future, we will apply the proposed p-Laplacian regularized sparse coding to more practical implementations. We will also study extensions to multiview learning and deep architecture construction for more attractive performance.
Sparse coding has achieved promising performance in classification. The most prominent variant, Laplacian regularized sparse coding, employs Laplacian regularization to preserve the manifold structure; however, Laplacian regularization suffers from poor generalization. To tackle this problem, we present a p-Laplacian regularized sparse coding algorithm, introducing the nonlinear generalization of the standard graph Laplacian to exploit the local geometry. Compared to the conventional graph Laplacian, the p-Laplacian has a tighter isoperimetric inequality, and p-Laplacian regularized sparse coding can therefore achieve stronger theoretical guarantees.
30. “Applied Laplacian” Mesh processing #1A
Spectral Mesh Processing
H. Zhang, O. Van Kaick, R. Dyer
Computer Graphics Forum 9 April 2010
http://dx.doi.org/10.1111/j.1467-8659.2010.01655.x
31. Graph Framework for Manifold-valued Data image processing
Nonlocal Inpainting of Manifold-valued Data on Finite
Weighted Graphs
Ronny Bergmann, Daniel Tenbrinck
(Submitted on 21 Apr 2017 (v1), last revised 12 Jul 2017 (this version, v2))
https://arxiv.org/abs/1704.06424
open source code: http://www.mathematik.uni-kl.de/imagepro/members/bergmann/mvirt/
A Graph Framework for Manifold-valued Data
Ronny Bergmann, Daniel Tenbrinck
(Submitted on 17 Feb 2017)
https://arxiv.org/abs/1702.05293
Recently, there has been a strong ambition to translate models and algorithms from traditional image processing to non-Euclidean domains, e.g., to manifold-valued data. While the task of denoising has been extensively studied in recent years, there has rarely been an attempt to perform image inpainting on manifold-valued data. In this paper we present a nonlocal inpainting method for manifold-valued data given on a finite weighted graph.
First numerical examples using a nonlocal graph construction with patch-based similarity measures demonstrate the capabilities and performance of the inpainting algorithm applied to manifold-valued images. Besides an analytic investigation of the convergence of the presented scheme, future work includes the further development of numerical algorithms, as well as properties of the ∞-Laplacian for manifold-valued vertex functions on graphs.
Illustration of the basic
definitions and concepts on a
Riemannian manifold M.
In the following we present several examples illustrating the
large variety of problems that can be tackled using the
proposed manifold-valued graph framework. Furthermore,
we compare our framework for the special case of nonlocal
denoising of phase-valued data to a state-of-the-art method.
Finally, we demonstrate a real-world application from
denoising surface normals in digital elevation maps from
LiDAR data. Subsequently, we model manifold-data
measured on samples of an explicitly given surface and in
particular illustrate denoising of diffusion tensors measured
on a sphere. Finally, we investigate denoising of real DT-MRI
data from medical applications both on a regular pixel grid as
well as on an implicitly given surface. All algorithms were implemented in Mathworks Matlab by extending the open-source software package Manifold-valued Image Restoration Toolbox (MVIRT).
Reconstruction results of measured surface normals in digital
elevation maps (DEM) generated by light detection and ranging
(LiDAR) measurements of earth’s surface topology.
Reconstruction results of manifold-valued data given on the
implicit surface of the open Camino brain data set.
32. Segmentation of graphs #1
Convex variational methods for multiclass data
segmentation on graphs
Egil Bae, Ekaterina Merkurjev
(Submitted on 4 May 2016 (v1), last revised 16 Feb 2017 (this version, v4))
https://arxiv.org/abs/1605.01443 | https://doi.org/10.1007/s10851-017-0713-9
Theoretical Analysis of Active Contours on
Graphs
Christos Sakaridis, Kimon Drakopoulos, Petros Maragos
(Submitted on 24 Oct 2016)
https://arxiv.org/abs/1610.07381
Detection of triangle on a random geometric graph. Edges are
omitted for illustration purposes. (a) Original triangle on graph (b)–
(f) Instances of active contour evolution at intervals of 60
iterations, with vertices in the contour’s interior shown in red and
the rest in blue (g) Final detection result after 300 iterations, using
green for true positives, blue for true negatives, red for false
positives and black for false negatives.
Experiments on 3D point clouds acquired by a LiDAR in outdoor scenes demonstrate that the
scenes can accurately be segmented into object classes such as vegetation, the ground plane
and regular structures. The experiments also demonstrate fast and highly accurate convergence
of the algorithms, and show that the approximation difference between the convex and original
problems vanishes or becomes extremely low in practice.
In the future, it would be interesting to investigate region
homogeneity terms for general unsupervised classification
problems. In addition to avoiding the problem of trivial global
minimizers, the region terms may improve the accuracy compared
to models based primarily on boundary terms. Region
homogeneity may for instance be defined in terms of the
eigendecomposition of the covariance matrix or graph Laplacian.
33. Segmentation of graphs #2:
Scalable Motif-aware Graph Clustering
CE Tsourakakis, J Pachocki, Michael Mitzenmacher Harvard University, Cambridge, MA, USA
WWW '17 Proceedings of the 26th International Conference on World Wide Web
Pages 1451-1460
https://doi.org/10.1145/3038912.3052653
Coarsening Massive Influence Networks
for Scalable Diffusion Analysis
Naoto Ohsaka, Tomohiro Sonobe, Sumio Fujita, Ken-ichi Kawarabayashi
SIGMOD '17 Proceedings of the 2017 ACM International Conference on
Management of Data Pages 635-650
https://doi.org/10.1145/3035918.3064045
“Superpixelization”/clustering to speed up computations
Higher-order organization of complex networks
Austin R. Benson, David F. Gleich, Jure Leskovec (Submitted on 26 Dec 2016)
https://arxiv.org/abs/1612.08447 → preprint of the Science article:
https://doi.org/10.1126/science.aad9029
Theoretical results in the supplementary materials also explain why classes of hypergraph partitioning methods are more general than previously assumed, and how motif-based clustering provides a rigorous framework for the special case of partitioning directed graphs. Finally, the higher-order network clustering framework is generally applicable to a wide range of network types, including directed, undirected, weighted, and signed networks.
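The first step of motif-based clustering (Benson et al.) is to replace the plain adjacency matrix with a motif adjacency matrix and then cluster that instead. For the triangle motif on an undirected graph this step is one line; the toy graph in the example is an assumption for illustration:

```python
import numpy as np

def triangle_motif_adjacency(A):
    """Motif adjacency matrix for the triangle motif:
    W[i, j] = number of triangles containing edge (i, j).
    (A @ A)[i, j] counts common neighbours; multiplying elementwise by A
    restricts the count to pairs that are actually connected."""
    A = (np.asarray(A) > 0).astype(int)
    return (A @ A) * A

# Triangle {0, 1, 2} plus a pendant edge (2, 3).
A = np.zeros((4, 4), int)
for u, v in [(0, 1), (1, 2), (0, 2), (2, 3)]:
    A[u, v] = A[v, u] = 1
W = triangle_motif_adjacency(A)
```

Edges inside the triangle receive weight 1, while the pendant edge (2, 3), which participates in no triangle, receives weight 0 and is effectively cut away by the subsequent spectral clustering step.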
34. Graph Summarization #1A
Graph Summarization: A Survey
Yike Liu, Abhilash Dighe, Tara Safavi, Danai Koutra
(Submitted on 14 Dec 2016 (v1), last revised 12 Apr 2017 (this version, v2))
https://arxiv.org/abs/1612.04883
The abundance of generated data and its velocity call for data summarization, one of the main data mining tasks. … This survey focuses on summarizing interconnected data, otherwise known as graphs or networks. … In general, graph summarization (or coarsening, or aggregation) approaches seek to find a short representation of the input graph, often in the form of a summary or sparsified graph, which reveals patterns in the original data and preserves specific structural or other properties, depending on the application domain.
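The simplest instance of the coarsening/aggregation idea described above is to collapse node groups into supernodes and aggregate the edge weights between groups. This is a generic sketch, not any specific method from the survey; the grouping is assumed to be given:

```python
import numpy as np

def coarsen(A, groups):
    """Summarize a graph by collapsing each group of nodes into a
    supernode; the summary weight between two supernodes is the total
    edge weight between their groups (an aggregation-style summary).
    groups: list of lists of original node indices."""
    P = np.zeros((A.shape[0], len(groups)))
    for j, g in enumerate(groups):
        P[g, j] = 1.0            # node-to-supernode assignment matrix
    return P.T @ A @ P

# Example: 4 nodes with edges (0,2), (1,2), (1,3), grouped as {0,1} | {2,3}.
A = np.zeros((4, 4))
for u, v in [(0, 2), (1, 2), (1, 3)]:
    A[u, v] = A[v, u] = 1.0
S = coarsen(A, [[0, 1], [2, 3]])
```

The diagonal of the summary holds (twice) the internal edge weight of each group, and the off-diagonal entries hold the cross-group weight, which is what a summary graph must preserve.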
35. Graph Summarization #1B
Table I: Qualitative comparison of static graph summarization techniques. The first six columns describe the type of the input graph (e.g., with weighted/directed edges, and one/multiple types of node entities), followed by three algorithm-specific properties (i.e., user parameters; algorithmic complexity, linear in the number of edges or higher; and type of output). The last column gives the final purpose of each approach. Notation: (1) ∗ indicates that the algorithm can be extended to handle the corresponding type of input, but the authors do not provide details in the paper; for complexity, ∗ indicates sub-linear; (2) + means that at least one parameter can be set by the user, but it is not required (i.e., the algorithm provides a default value). - Liu et al. (2017)
36. Point cloud resampling via graphs
Fast Resampling of 3D Point Clouds via Graphs
Siheng Chen ; Dong Tian ; Chen Feng ; Anthony Vetro ; Jelena Kovačević
Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE
https://doi.org/10.1109/ICASSP.2017.7952695
https://arxiv.org/abs/1702.06397
The proposed resampling strategy enhances the contours of a point cloud. Plots (a) and (b) resample 2% of the points from a 3D point cloud of a building containing 381,903 points. Plot (b) is more visually friendly than plot (a). Note that the proposed resampling strategy is able to enhance any information depending on users' preferences.
37. 2D Image Processing with graphs
Directional graph weight prediction for
image compression
Francesco Verdoja ; Marco Grangetto
Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE
https://doi.org/10.1109/ICASSP.2017.7952410
The experimental results showed that the proposed technique is able to improve compression efficiency; as an example, we reported a Bjøntegaard Delta (BD) rate reduction of about 30% over JPEG. Future work will investigate the integration of the proposed method into more advanced image and video coding tools comprising adaptive block sizes and a richer set of intra-prediction modes.
Luminance coding in graph-based
representation of multiview images
Thomas Maugey ; Yung-Hsuan Chao ; Akshay Gadde ; Antonio Ortega ; Pascal Frossard
Image Processing (ICIP), 2014 IEEE
https://doi.org/10.1109/ICIP.2014.7025025
(a) Wavelet decomposition on graphs in GraphBior, where the shapes {circle, triangle, square, cross} denote coefficients in the LL, LH, HL, HH subbands. (b) Parent-children relationship: a P node in the LH band of level l + 1 has five children from two views in level l, marked in blue. (c) The procedure of finding the children nodes in level l for the parent node in level l + 1 (be
39. Graph structure known or not?
GRAPH KNOWN
”The graph is well defined when the temperature measurement positions are known and the temperature measurement uncertainty is small”
- Perraudin and Vandergheynst 2016
GRAPH “Semi-KNOWN”
”In a way the structure is known, as we can quantify the graph signal as the number of citations with some journal-impact-factor weighting, but does this really represent the impact of an article? Scientists are known to game the system and just respond to the metrics [*]. Are there alternative ways to improve the graph to represent better the impact of an article and the
GRAPH NOT KNOWN
“A point cloud measured with a terrestrial laser scanner is an unordered point cloud given on non-grid x, y, z coordinates. It is not trivial to define how the points are connected to each other”
Bibliometric network analysis by Nees Jan van Eck
[*] See e.g.
Clauset, Aaron, Daniel B. Larremore, and Roberta Sinatra. "Data-driven
predictions in the science of science." Science 355.6324 (2017): 477-480. DOI:
10.1126/science.aal4217
Sinatra, Roberta, et al. "Quantifying the evolution of individual scientific
impact." Science 354.6312 (2016): aaf5239. DOI: 10.1126/science.aaf5239
Furlanello, Cesare, et al. "Towards a scientific blockchain framework for
reproducible data analysis." arXiv preprint arXiv:1707.06552 (2017).
The R-factor, with R standing for reputation, reproducibility, responsibility, and
robustness: http://verumanalytics.io/
Overview of the segmentation method: (a) the initial LiDAR point cloud, (b)
height raster image, (c) patches formed with adjacent cells of the same
value, (d) hierarchized patches, (e) weighted graph, (f) graph partition, (g)
partition result on the raster, (h) segmented point cloud.
- Strimbu and Strimbu (2015)
The Graphics and Media Lab (GML) is part of the Department of Computational Mathematics and Cybernetics of M.V. Lomonosov Moscow State University.
http://graphics.cs.msu.ru/en/node/922
http://slideplayer.com/slide/8146222/
40. Convolutions for graphs #1
Deep Convolutional Networks on
Graph-Structured Data
Mikael Henaff, Joan Bruna, Yann LeCun
(Submitted on 16 Jun 2015)
https://arxiv.org/abs/1506.05163
https://github.com/mdeff/cnn_graph
However, as our results demonstrate, their extension poses significant challenges:
• Although the learning complexity requires O(1) parameters per feature map, the evaluation, both forward and backward, requires a multiplication by the Graph Fourier Transform, which costs O(N²) operations. This is a major difference with respect to traditional ConvNets, which require only O(N). Fourier implementations of ConvNets bring the complexity to O(N log N) thanks again to the specific symmetries of the grid. An open question is whether one can find approximate eigenbases of general graph Laplacians using Givens decompositions similar to those of the FFT.
• Our experiments show that when the input graph structure is not known a priori, graph estimation is the statistical bottleneck of the model, requiring O(N²) for general graphs and O(MN) for M-dimensional graphs. Supervised graph estimation performs significantly better than unsupervised graph estimation based on low-order moments. Furthermore, we have verified that the architecture is quite sensitive to graph estimation errors. In the supervised setting, this step can be viewed in terms of a bootstrapping mechanism, where an initially unconstrained network self-adjusts to become more localized and to share weights.
• Finally, the statistical assumptions of stationarity and compositionality are not always verified. In those situations, the constraints imposed by the model risk reducing its capacity for no reason. One possibility for addressing this issue is to insert fully connected layers between the input and the spectral layers, such that data can be transformed into the appropriate statistical model. Another strategy, left for future work, is to relax the notion of weight sharing by introducing instead a commutation error ∥WᵢL − LWᵢ∥ with the graph Laplacian, which puts a soft penalty on transformations that do not commute with the Laplacian, instead of imposing exact commutation as is the case in the spectral net.
We explore two areas of application to which it has not previously been possible to apply convolutional networks: text categorization and bioinformatics. Our results show that our method is capable of matching or outperforming large, fully-connected networks trained with dropout, using fewer parameters.
Our main contributions can be summarized as follows:
● We extend the ideas from Bruna et al. (2013) to large-scale
classification problems, specifically Imagenet Object
Recognition, text categorization and bioinformatics.
● We consider the most general setting where no prior information
on the graph structure is available, and propose unsupervised
and new supervised graph estimation strategies in combination
with the supervised graph convolutions.
41. Convolutions for graphs #2
Learning Convolutional Neural Networks
for Graphs
Mathias Niepert, Mohamed Ahmed, Konstantin Kutzkov ;
Proceedings of The 33rd International Conference on Machine
Learning, PMLR 48:2014-2023, 2016.
http://proceedings.mlr.press/v48/niepert16.html
A CNN with a receptive field of size 3x3. The field is moved over an image from
left to right and top to bottom using a particular stride (here: 1) and zero-
padding (here: none) (a). The values read by the receptive fields are
transformed into a linear layer and fed to a convolutional architecture (b). The
node sequence for which the receptive fields are created and the shapes of
the receptive fields are fully determined by the hyper-parameters.
An illustration of the proposed architecture. A node sequence is selected
from a graph via a graph labeling procedure. For some nodes in the sequence,
a local neighborhood graph is assembled and normalized. The normalized
neighborhoods are used as receptive fields and combined with existing CNN
components.
The normalization is performed for each of the graphs induced on the neighborhood of a root node v (the red node; node colors indicate distance to
the root node). A graph labeling is used to rank the nodes and to create the normalized receptive fields, one of size k (here: k = 9) for node attributes
and one of size k × k for edge attributes. Normalization also includes cropping of excess nodes and padding with dummy nodes. Each vertex (edge)
attribute corresponds to an input channel with the respective receptive field.
Visualization of RBM features learned with 1-dimensional WL normalized receptive fields of size 9 for a torus (periodic lattice, top left), a preferential
attachment graph (Barabási & Albert 1999, bottom left), a co-purchasing network of political books (top right), and a random graph (bottom right).
Instances of these graphs with about 100 nodes are depicted on the left. A visual representation of the feature’s weights (the darker a pixel, the stronger
the corresponding weight) and 3 graphs sampled from the RBMs by setting all but the hidden node corresponding to the feature to zero. Yellow nodes
have position 1 in the adjacency matrices
“Directions for future work include the use of alternative neural network architectures such as recurrent neural networks (RNNs); combining different receptive field sizes; pretraining with restricted Boltzmann machines (RBMs) and autoencoders; and statistical relational models based on the ideas of the approach.”
42. Convolutions for graphs #3
Geometric deep learning on graphs
and manifolds using mixture model
CNNs
Federico Monti, Davide Boscaini, Jonathan Masci,
Emanuele Rodolà, Jan Svoboda, Michael M. Bronstein
(Submitted on 25 Nov 2016 (v1), last revised 6 Dec 2016 (this version, v3))
https://arxiv.org/abs/1611.08402
Left: intrinsic local polar coordinates (ρ, θ) on a manifold around a point marked in white. Right: patch-operator weighting functions wᵢ(ρ, θ) used in different generalizations of convolution on the manifold (hand-crafted in GCNN and ACNN, learned in MoNet). All kernels are L∞-normalized; red curves represent the 0.5 level set.
Representation of images as graphs. Left: regular grid
(the graph is fixed for all images). Right: graph of superpixel
adjacency (different for each image). Vertices are shown as
red circles, edges as red lines.
Learning configuration used for Cora
and PubMed experiments.
Predictions obtained by
applying MoNet to the
Cora dataset. Marker fill
color represents the
predicted class; marker
outline color represents
the ground-truth class.
In this paper, we propose a unified framework that generalizes CNN architectures to non-Euclidean domains (graphs
and manifolds) and learns local, stationary, and compositional task-specific features. We show that various non-Euclidean
CNN methods previously proposed in the literature can be considered as particular instances of our framework. We test the
proposed method on standard tasks from the realms of image, graph and 3D shape analysis and show that it consistently
outperforms previous approaches.
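The heart of MoNet is a patch operator whose weighting functions are learnable Gaussian kernels over pseudo-coordinates. A minimal sketch, with assumptions: diagonal covariances (the paper allows general ones), degree-based pseudo-coordinates as toy values, and invented names throughout.

```python
import numpy as np

def monet_patch_operator(feat, edges, pseudo, mu, sigma):
    """Toy MoNet-style patch operator: each of the J Gaussian kernels
    (mu[j], sigma[j]) weights neighbor features by the edge
    pseudo-coordinates. Returns an (n, J, c) tensor: one aggregated
    patch per kernel, which a learned filter would then combine."""
    n, c = feat.shape
    J = mu.shape[0]
    out = np.zeros((n, J, c))
    for (x, y), u in zip(edges, pseudo):
        # diagonal-covariance Gaussian weight w_j(u)
        w = np.exp(-0.5 * np.sum(((u - mu) / sigma) ** 2, axis=1))  # shape (J,)
        out[x] += w[:, None] * feat[y][None, :]
    return out

# toy graph: 3 nodes; pseudo-coordinates loosely inspired by degree features
feat = np.eye(3)
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
pseudo = np.array([[1.0, 0.7], [0.7, 1.0], [0.7, 1.0], [1.0, 0.7]])
mu = np.array([[1.0, 0.7], [0.7, 1.0]])   # J = 2 learnable kernel centers
sigma = np.ones((2, 2))
patches = monet_patch_operator(feat, edges, pseudo, mu, sigma)
```

Fixing mu and sigma by hand recovers the hand-crafted GCNN/ACNN patch operators; learning them is what distinguishes MoNet.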
43. Convolutions for graphs #4
Convolutional Neural Networks on Graphs
with Fast Localized Spectral Filtering
Michaël Defferrard, Xavier Bresson, Pierre Vandergheynst
Advances in Neural Information Processing Systems 29 (NIPS 2016)
https://arxiv.org/abs/1606.09375
https://github.com/mdeff/cnn_graph
https://youtu.be/cIA_m7vwOVQ
Architecture of a CNN on graphs and the four ingredients of a (graph)
convolutional layer.
It is however known that graph clustering is NP-hard [Bui and Jones, 1992] and
that approximations must be used. While there exist many clustering
techniques, e.g. the popular spectral clustering [von Luxburg, 2007], we are
most interested in multilevel clustering algorithms where each level produces a
coarser graph which corresponds to the data domain seen at a different
resolution.
Future works will investigate two directions.
On one hand, we will enhance the proposed framework with newly developed tools in
GSP. On the other hand, we will explore applications of this generic model to important
fields where the data naturally lies on graphs, which may then incorporate external
information about the structure of the data rather than artificially created graphs which
quality may vary as seen in the experiments.
Another natural and future approach, pioneered in [Henaff et al. 2015], would be to
alternate the learning of the CNN parameters and the graph.
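The fast localized filtering above rests on the Chebyshev recurrence, which applies a K-localized spectral filter without any eigendecomposition. A small NumPy sketch (dense matrices for clarity; the paper uses sparse operations, and λ_max would be estimated rather than computed exactly):

```python
import numpy as np

def chebyshev_filter(L, x, theta):
    """Sketch of ChebNet-style filtering: y = sum_k theta[k] T_k(L_tilde) x,
    computed with the recurrence T_k = 2 L_tilde T_{k-1} - T_{k-2},
    where L_tilde rescales the Laplacian spectrum into [-1, 1]."""
    lam_max = np.linalg.eigvalsh(L).max()          # in practice: approximated
    L_t = 2.0 * L / lam_max - np.eye(L.shape[0])   # rescaled Laplacian
    xs = [x, L_t @ x]                              # T_0 x and T_1 x
    for _ in range(2, len(theta)):
        xs.append(2.0 * L_t @ xs[-1] - xs[-2])     # Chebyshev recurrence
    return sum(t * xk for t, xk in zip(theta, xs))

# path graph on 3 nodes
A = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
L = np.diag(A.sum(1)) - A
x = np.array([1.0, 0.0, 0.0])
y = chebyshev_filter(L, x, theta=[0.5, 0.3, 0.2])
```

Because T_k(L̃) involves only k-hop matrix-vector products, the resulting filter is exactly K-localized on the graph and costs O(K|E|) with sparse L.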
44. Convolutions for graphs #5
Top: Schematic
illustration of a
standard CNN where
patches of w×h
pixels are convolved
with D×E filters to
map the D
dimensional input
features to E
dimensional output
features.
Middle: same, but
representing the
CNN parameters as
a set of M = w×h
weight matrices,
each of size D×E.
Each weight matrix is
associated with a
single relative
position in the input
patch.
Bottom: our graph
convolutional network,
where each relative
position in the input
patch is associated in a
soft manner with each of
the M weight matrices
using the function
q(x_i, x_j).
45. Convolutions for graphs #6
CayleyNets: Graph Convolutional Neural
Networks with Complex Rational Spectral Filters
Ron Levie, Federico Monti, Xavier Bresson, Michael M. Bronstein
(Submitted on 22 May 2017)
https://arxiv.org/abs/1705.07664
The core ingredient of our model is a new class of parametric rational complex functions
(Cayley polynomials) that allow efficient computation of localized regular filters on graphs
specializing on frequency bands of interest. Our model scales linearly with the size of the input
data for sparsely connected graphs, can handle different constructions of Laplacian
operators, and typically requires fewer parameters than previous models.
Filters (spatial domain, top and spectral domain, bottom) learned by
CayleyNet (left) and ChebNet (center, right) on the MNIST dataset.
Cayley filters are able to realize larger supports for the same order r.
Eigenvalues of the unnormalized Laplacian ∆ of the 15-communities
graph mapped onto the complex unit half-circle by means of the Cayley
transform with spectral zoom values (left to right) h = 0.1, 1, and 10. The first
15 frequencies, carrying most of the information about the communities, are
marked in red. Larger values of h (right) zoom in on the low-frequency band.
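The spectral zoom in the caption is just the Cayley transform C(λ) = (hλ − i)/(hλ + i), which maps the real Laplacian spectrum onto the complex unit circle; h controls how the low-frequency band is spread out. A few lines suffice to illustrate it (toy eigenvalues, not from the paper):

```python
import numpy as np

def cayley_transform(eigvals, h):
    """Cayley transform with spectral zoom h: maps each real eigenvalue
    lambda to (h*lambda - i) / (h*lambda + i) on the complex unit circle."""
    lam = h * np.asarray(eigvals, dtype=float)
    return (lam - 1j) / (lam + 1j)

eigvals = np.array([0.0, 0.5, 2.0, 10.0])
for h in (0.1, 1.0, 10.0):
    z = cayley_transform(eigvals, h)
    # every eigenvalue lands exactly on the unit circle
    assert np.allclose(np.abs(z), 1.0)
```

Note that C(0) = −1 regardless of h, while larger h pushes small eigenvalues further around the circle, which is exactly the "zoom on the low frequency band" shown in the figure.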
46. Convolutions for graphs #7
Graph Convolutional Matrix Completion
Rianne van den Berg, Thomas N. Kipf, Max Welling
(Submitted on 7 Jun 2017)
https://arxiv.org/abs/1706.02263
Left: Rating matrix M with entries that correspond to user-item interactions (ratings between 1-
5) or missing observations (0). Right: User-item interaction graph with bipartite structure. Edges
correspond to interaction events, numbers on edges denote the rating a user has given to a
particular item. The matrix completion task (i.e. predictions for unobserved interactions) can
be cast as a link prediction problem and modeled using an end-to-end trainable graph auto-
encoder.
Schematic of a forward-pass through the GC-MC model, which is comprised of a graph convolutional encoder [U, V ] =
f(X, M1, . . . , MR) that passes and transforms messages from user to item nodes, and vice versa, followed by a bilinear
decoder model that predicts entries of the (reconstructed) rating matrix M = g(U, V), based on pairs of user and item
embeddings.
“Our model can be seen as a first step towards
modeling recommender systems where the
interaction data is integrated into other structured
modalities, such as a social network or a
knowledge graph.
As a next step, it would be interesting to investigate
how the differentiable message passing scheme of
our encoder model can be extended to such
structured data environments. We expect that
further approximations, e.g. subsampling of local
graph neighborhoods, will be necessary in order to
keep requirements in terms of computation and
memory in a feasible range.”
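The bilinear decoder mentioned above can be sketched compactly: for a user embedding u and item embedding v, each rating level r gets a logit uᵀQ_r v, a softmax turns the logits into a distribution over levels, and the prediction is the expectation. The toy weights and names below are illustrative, not the paper's parameters.

```python
import numpy as np

def bilinear_decoder(u, v, Q):
    """Sketch of a GC-MC-style bilinear decoder: score each rating level r
    with u^T Q[r] v, softmax over levels, return the expected rating
    (levels 1..R)."""
    scores = np.array([u @ Qr @ v for Qr in Q])   # one logit per rating level
    p = np.exp(scores - scores.max())             # numerically stable softmax
    p /= p.sum()
    levels = np.arange(1, len(Q) + 1)
    return float(p @ levels)                      # expected rating in [1, R]

rng = np.random.default_rng(0)
u, v = rng.normal(size=4), rng.normal(size=4)
Q = rng.normal(size=(5, 4, 4))                    # R = 5 rating levels
r_hat = bilinear_decoder(u, v, Q)
```

In the full model, u and v come from the graph convolutional encoder over the bipartite interaction graph, and the cross-entropy over the softmax distribution is the training loss.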
47. Convolutions for graphs #8
Graph Based Convolutional Neural Network
Michael Edwards, Xianghua Xie
(Submitted on 28 Sep 2016)
https://arxiv.org/abs/1609.08965
Graph-based Convolutional Neural Network components. The GCNN is built from an architecture of
graph convolution and pooling operator layers. Each convolution layer generates O output feature maps,
where O is selected per layer. Graph pooling layers coarsen the current graph and graph signal based on
the selected vertex-reduction method.
Two levels of graph pooling operation on regular and irregular grid with MNIST signal. From left: Regular grid, AMG level 1, AMG
level 2, Irregular grid, AMG level 1, AMG level 2.
Feature maps formed by a feed-forward pass of the regular domain. From left: Original image, Convolution round 1, Pooling round
1, Convolution round 2, Pooling round 2
Feature maps formed by a feed-forward pass of the irregular domain. From left: Original image, Convolution round 1, Pooling
round 1, Convolution round 2, Pooling round 2.
This study proposes a novel method of performing deep convolutional learning on the
irregular graph by coupling standard graph signal processing techniques and
backpropagation based neural network design.
Convolutions are performed in the spectral domain of the graph Laplacian and allow for the learning of
spatially localized features whilst handling the nontrivial irregular kernel design. Results are provided on
both a regular and irregular domain classification problem and show the ability to learn localized feature
maps across multiple layers of a network. A graph pooling method is provided that agglomerates
vertices in the spatial domain to reduce complexity and generalize the features learnt. GPU performance
of the algorithm improves upon training and testing speed, however further optimization is needed.
Although the results on the regular grid are outperformed by standard CNN architecture this is
understandable due to the direct use of a local kernel in the spatial domain.
The major contribution over standard CNNs, the ability to operate on irregular graphs, is not to be
underestimated. Graph-based CNNs require costly forward and inverse graph Fourier transforms, and
this requires further work to enhance usability in the community. Ongoing study of graph construction
and reduction techniques is required to encourage uptake across a wider range of problem domains.
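The "convolutions in the spectral domain of the graph Laplacian" that both this paper and the earlier spectral approaches rely on reduce to: graph Fourier transform (project onto Laplacian eigenvectors), pointwise filtering, inverse transform. A minimal dense sketch (real networks avoid the explicit eigendecomposition precisely because of the cost noted above):

```python
import numpy as np

def spectral_graph_conv(L, x, g):
    """Spectral-domain graph convolution: y = U g(Lambda) U^T x, where
    (Lambda, U) is the eigendecomposition of the graph Laplacian L and
    g is a filter applied pointwise to the eigenvalues."""
    lam, U = np.linalg.eigh(L)           # graph Fourier basis
    return U @ (g(lam) * (U.T @ x))      # GFT -> filter -> inverse GFT

# star graph on 3 nodes
A = np.array([[0., 1., 1.], [1., 0., 0.], [1., 0., 0.]])
L = np.diag(A.sum(1)) - A
x = np.array([1.0, 2.0, 3.0])
# an all-pass filter recovers the input signal
assert np.allclose(spectral_graph_conv(L, x, lambda lam: np.ones_like(lam)), x)
# a low-pass filter (heat kernel) smooths the signal over the graph
y = spectral_graph_conv(L, x, lambda lam: np.exp(-lam))
```

The O(n²) transforms here are exactly the "costly forward and inverse graph Fourier transforms" the conclusion warns about; polynomial filters (ChebNet-style) sidestep them.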
48. Convolutions for graphs #9
Generalizing CNNs for data structured on
locations irregularly spaced out
Jean-Charles Vialatte, Vincent Gripon, Grégoire Mercier
(Submitted on 3 Jun 2016 (v1), last revised 4 Jul 2017 (this version, v3))
https://arxiv.org/abs/1606.01166
In this paper, we have defined a generalized convolution
operator. This operator makes it possible to transport the
CNN paradigm to irregular domains. It retains the
properties of a regular convolution operator: it is
linear, locally supported, and uses the same kernel of
weights for each local operation. The generalized
convolution operator can then naturally be used in place
of convolutional layers in a deep learning framework.
Typically, the resulting model is well suited to input data
that has an underlying graph structure.
The definition of this operator is flexible enough to allow
its weight-allocation map to be adapted to any input
domain, so that, depending on the case, the kernel
weights can be distributed in a way that is natural for
that domain. In some cases, however, there is no single
natural way but multiple acceptable methods to define the
weight allocation. In further work, we plan to study these
methods. We also plan to apply the generalized operator
to unsupervised learning tasks.
49. Convolutions for graphs #10
Robust Spatial Filtering with Graph
Convolutional Neural Networks
Felipe Petroski Such, Shagan Sah, Miguel Dominguez, Suhas Pillai,
Chao Zhang, Andrew Michael, Nathan Cahill, Raymond Ptucha
(Submitted on 2 Mar 2017 (v1), last revised 14 Jul 2017 (this version, v3))
https://arxiv.org/abs/1703.00792
https://github.com/fps7806/Graph-CNN
Two types of graph datasets. Left: Homogeneous
datasets. All samples in a homogeneous graph data
have identical graph structure, but different vertex
values or “signals”. Right: Heterogeneous graph
samples. Heterogeneous graph samples can vary in
number of vertices, structure of edge connections,
and in the vertex values.
General vertex-edge domain Graph-CNN architecture. Convolution and pooling layers are cascaded into a deep network. FC are fully-
connected layers for graph classification. V is vertex set and A is adjacency matrix that define a graph.
Graph convolution and pooling setting. The convolution operation
obtains a filtered representation of the graph after a multi-hop vertex
filter; likewise, a pooling layer obtains a compact representation of
the graph.
50. Convolutions for graphs #11
A Generalization of Convolutional Neural
Networks to Graph-Structured Data
Yotam Hechtlinger, Purvasha Chakravarti, Jining Qin
(Submitted on 26 Apr 2017)
https://arxiv.org/abs/1704.08165
https://github.com/hechtlinger/graph_cnn
Visualization of the graph convolution size 5. For a given node, the convolution
is applied on the node and its 4 closest neighbors selected by the random
walk. As the right figure demonstrates, the random walk can expand further
into the graph to higher degree neighbors. The convolution weights are
shared according to the neighbors’ closeness to the nodes and applied
globally on all nodes.
Visualization of a row of Q^(k) on the graph generated over the 2-D grid at a node near the center, when connecting each node to
its 8 adjacent neighbors. For k = 1, most of the weight is on the node itself, with smaller weights on the first-order neighbors; this
corresponds to a standard 3 × 3 convolution. As k increases, the number of active neighbors also increases, giving greater
weight to neighbors farther away while still preserving the local information.
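The random-walk neighbor selection underlying this convolution can be sketched with powers of the transition matrix P = D⁻¹A: rank nodes by expected visits within k steps and keep the node plus its p closest neighbors. The selection heuristic below is a simplified stand-in for the paper's construction of Q^(k), with a toy 4-cycle as input.

```python
import numpy as np

def random_walk_neighbors(A, node, p, k):
    """Sketch of random-walk neighbor selection: from the transition
    matrix P = D^-1 A, rank nodes by expected number of visits within
    k steps and keep the node plus its p closest neighbors."""
    P = A / A.sum(axis=1, keepdims=True)
    visits = sum(np.linalg.matrix_power(P, i) for i in range(1, k + 1))[node]
    visits[node] = np.inf                 # the node itself always ranks first
    order = np.argsort(-visits)           # most-visited first
    return order[: p + 1].tolist()

# 4-cycle: a node's closest random-walk neighbors are its ring neighbors
A = np.array([[0., 1., 0., 1.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 0., 1., 0.]])
print(random_walk_neighbors(A, 0, p=2, k=3))
```

Shared convolution weights are then assigned by this closeness ranking and applied identically at every node, which is what makes the operation transferable across graph structures.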
We propose a generalization of convolutional neural networks from grid-structured data to
graph-structured data, a problem that is being actively researched by our community. Our novel
contribution is a convolution over a graph that can handle different graph structures as its
input. The proposed convolution contains many sought-after attributes; it has a natural and
intuitive interpretation, it can be transferred within different domains of knowledge, it is
computationally efficient and it is effective.
Furthermore, the convolution can be applied on standard regression or classification problems
by learning the graph structure in the data, using the correlation matrix or other methods.
Compared to a fully connected layer, the suggested convolution has significantly fewer parameters
while providing stable convergence and comparable performance. Our experimental results on the
Merck Molecular Activity data set and MNIST data demonstrate the potential of this approach.
Convolutional Neural Networks have already revolutionized the fields of computer vision, speech
recognition and language processing. We think an important step forward is to extend it to other
problems which have an inherent graph structure.
51. Autoencoders for graphs
Variational Graph Auto-Encoders
Thomas N. Kipf, Max Welling
(Submitted on 21 Nov 2016)
https://arxiv.org/abs/1611.07308
https://github.com/tkipf/gae
→ http://tkipf.github.io/graph-convolutional-networks/
Latent space of unsupervised VGAE model
trained on Cora citation network dataset.
Grey lines denote citation links. Colors
denote document class (not provided during
training).
Future work will investigate
better-suited prior
distributions (instead of
Gaussian here), more flexible
generative models and the
application of a stochastic
gradient descent algorithm for
improved scalability.
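The VGAE generative half is small enough to sketch directly: sample z via the reparameterization trick from the encoder's Gaussian, then decode edge probabilities with an inner product. This omits the GCN encoder (the mu and log-sigma below are toy stand-ins for its output).

```python
import numpy as np

def vgae_sample_and_decode(mu, log_sigma, rng):
    """VGAE-style reparameterized sampling and inner-product decoder:
    z = mu + sigma * eps, then A_hat = sigmoid(z z^T) gives the
    probability of each (potential) edge for link prediction."""
    eps = rng.normal(size=mu.shape)
    z = mu + np.exp(log_sigma) * eps          # reparameterization trick
    logits = z @ z.T
    return 1.0 / (1.0 + np.exp(-logits))      # sigmoid -> edge probabilities

rng = np.random.default_rng(0)
mu = rng.normal(size=(4, 2))                  # latent means (from the GCN encoder)
log_sigma = np.full((4, 2), -2.0)             # small latent variance
A_hat = vgae_sample_and_decode(mu, log_sigma, rng)
```

Training maximizes a reconstruction term over observed edges plus a KL penalty against the Gaussian prior, which is exactly the prior the quoted future-work note proposes to replace.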
Modeling Relational Data with Graph Convolutional Networks
Michael Schlichtkrull, Thomas N. Kipf, Peter Bloem, Rianne van den Berg, Ivan Titov, Max Welling
(Submitted on 17 Mar 2017 (v1), last revised 6 Jun 2017 (this version, v3))
https://arxiv.org/abs/1703.06103
In this work, we introduce relational GCNs (R-GCNs). R-GCNs are specifically designed to deal with highly multi-relational data,
characteristic of realistic knowledge bases. Our entity classification model, similarly to Kipf and Welling [see left], uses softmax
classifiers at each node in the graph. The classifiers take node representations supplied by an R-GCN and predict the labels. The
model, including R-GCN parameters, is learned by optimizing the cross-entropy loss. Our link prediction model can be regarded as
an autoencoder consisting of (1) an encoder: an R-GCN producing latent feature representations of entities, and (2) a decoder: a
tensor factorization model exploiting these representations to predict labeled edges. Though in principle the decoder can rely on any
type of factorization (or generally any scoring function), we use one of the simplest and most effective factorization methods: DistMult [
Yang et al. 2014].
(a) R-GCN per-layer update for a single graph node (in light red). Activations from neighboring nodes (dark blue) are
gathered and then transformed for each relation type individually (for both in- and outgoing edges). The resulting
representation is accumulated in a (normalized) sum and passed through an activation function (such as the ReLU). This
per-node update can be computed in parallel with shared parameters across the whole graph. (b) Depiction of an R-GCN
model for entity classification with a per-node loss function. (c) Link prediction model with an R-GCN encoder
(interspersed with fully-connected/dense layers) and a DistMult decoder that takes pairs of hidden node representations
and produces a score for every (potential) edge in the graph. The loss is evaluated per edge.
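The per-node update in panel (a) can be written out in a few lines: transform neighbor activations per relation type, normalize by the per-relation neighbor count, add a self-connection, apply a ReLU. This sketch omits the paper's basis/block-diagonal decomposition of the relation weights and uses toy matrices.

```python
import numpy as np

def rgcn_update(h, rel_edges, W_rel, W_self):
    """Sketch of an R-GCN per-layer update: h_i' = ReLU(W_self h_i +
    sum_r sum_{j in N_i^r} W_rel[r] h_j / c_{i,r}), with c_{i,r} the
    number of relation-r neighbors of node i."""
    out = h @ W_self.T                              # self-loop term
    for r, edges in enumerate(rel_edges):           # one edge list per relation
        for i, j in edges:
            c = sum(1 for (a, _) in edges if a == i)  # normalization c_{i,r}
            out[i] += (W_rel[r] @ h[j]) / c
    return np.maximum(out, 0.0)                     # ReLU

h = np.eye(3)
rel_edges = [[(0, 1), (0, 2)], [(1, 0)]]            # two relation types
W_rel = np.stack([np.eye(3), 2 * np.eye(3)])
W_self = np.eye(3)
h_next = rgcn_update(h, rel_edges, W_rel, W_self)
```

Since every node applies the same relation-specific weights, the update parallelizes across the whole graph, as the caption notes.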
52. Representation Learning For graphs #1
Inductive Representation Learning on Large Graphs
William L. Hamilton, Rex Ying, Jure Leskovec (Submitted on 7 Jun 2017)
https://arxiv.org/abs/1706.02216
http://snap.stanford.edu/graphsage/
We propose a general framework, called GraphSAGE (SAmple and
aggreGatE), for inductive node embedding. Unlike embedding approaches that
are based on matrix factorization, we leverage node features (e.g., text
attributes, node profile information, node degrees) in order to learn an
embedding function that generalizes to unseen nodes. By incorporating node
features in the learning algorithm, we simultaneously learn the topological
structure of each node’s neighborhood as well as the distribution of node
features in the neighborhood. While we focus on feature-rich graphs (e.g.,
citation data with text attributes, biological data with functional/molecular
markers), our approach can also make use of structural features that are
present in all graphs (e.g., node degrees). Thus, our algorithm can also be
applied to graphs without node features (i.e. point clouds with only the xyz-
coordinates without RGB texture, normals, etc.)
Low-dimensional vector embeddings of nodes in large graphs have proved
extremely useful as feature inputs for a wide variety of prediction and graph analysis
tasks. The basic idea behind node embedding approaches is to use dimensionality
reduction techniques to distill the high-dimensional information about a node’s
neighborhood into a dense vector embedding. These node embeddings can then be
fed to downstream machine learning systems and aid in tasks such as node
classification, clustering, and link prediction (e.g. LINE, see below).
However, previous works have focused on embedding nodes from a single fixed graph,
and many real-world applications require embeddings to be quickly generated for
unseen nodes, or entirely new (sub)graphs. This inductive capability is essential for
high-throughput, production machine learning systems, which operate on evolving
graphs and constantly encounter unseen nodes (e.g., posts on Reddit, users and videos
on Youtube). An inductive approach to generating node embeddings also facilitates
generalization across graphs with the same form of features: for example, one could
train an embedding generator on protein-protein interaction graphs derived from a
model organism, and then easily produce node embeddings for data collected on new
organisms using the trained model.
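The inductive trick is that GraphSAGE learns an aggregation *function* rather than per-node embeddings, so it applies to unseen nodes. A minimal sketch of one layer with the mean aggregator (single layer, no neighbor sampling, toy weight matrix; the paper also offers LSTM and pooling aggregators):

```python
import numpy as np

def graphsage_mean_layer(h, neighbors, W):
    """One GraphSAGE-style layer, mean aggregator: each node concatenates
    its own representation with the mean of its neighbors', applies a
    shared weight matrix and ReLU, then L2-normalizes."""
    out = []
    for v, nbrs in enumerate(neighbors):
        agg = np.mean(h[nbrs], axis=0)                 # mean of neighbor features
        z = np.maximum(W @ np.concatenate([h[v], agg]), 0.0)
        out.append(z / (np.linalg.norm(z) + 1e-12))    # unit-norm embedding
    return np.array(out)

h = np.array([[1., 0.], [0., 1.], [1., 1.]])           # input node features
neighbors = [[1, 2], [0], [0, 1]]                      # adjacency lists
W = np.array([[1., 0., 1., 0.],                        # toy 2 x (2+2) weights
              [0., 1., 0., 1.]])
emb = graphsage_mean_layer(h, neighbors, W)
```

Because W is shared across nodes and graphs, the same trained layer can embed a node never seen during training, given only its features and neighborhood.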
LINE: Large-scale Information Network Embedding
Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, Qiaozhu Mei
(Submitted on 12 Mar 2015)
https://arxiv.org/abs/1503.03578
https://github.com/tangjianpku/LINE
53. Representation Learning For graphs #2
Skip-graph: Learning graph embeddings with an
encoder-decoder model
John Boaz Lee, Xiangnan Kong
04 Nov 2016 (modified: 11 Jan 2017) ICLR 2017 conference submission
https://openreview.net/forum?id=BkSqjHqxg&noteId=BkSqjHqxg
We introduced an unsupervised method, based on the encoder-decoder model, for
generating feature representations for graph-structured data. The model was
evaluated on the binary classification task on several real-world datasets. The
method outperformed several state-of-the-art algorithms on the tested datasets.
There are several interesting directions for future work. For instance, we can try
training multiple encoders on random walks generated using very different
neighborhood selection strategies. This may allow the different encoders to capture
different properties in the graphs. We would also like to test the approach using
different neural network architectures. Finally, it would be interesting to test the
method on other types of heterogeneous information networks.
54. Semi-supervised Learning For graphs
Neural Graph Machines: Learning Neural Networks Using Graphs
Thang D. Bui, Sujith Ravi, Vivek Ramavajjala
University of Cambridge, United Kingdom; Google Research, Mountain View, CA, USA
(Submitted on 14 Mar 2017)
https://arxiv.org/abs/1703.04818
We have revisited graph-augmentation training of neural networks and proposed
Neural Graph Machines as a general framework for doing so. Its label propagation (for
semi-supervised CNNs see e.g. Tarvainen and Valpola 2017) objective function
encourages the neural networks to make accurate node-level predictions, as in vanilla
neural network training, as well as constrains the networks to learn similar hidden
representations for nodes connected by an edge in the graph. Importantly, the
objective can be trained by stochastic gradient descent and scaled to large graphs.
We validated the efficacy of the graph-augmented objective on various tasks including
bloggers’ interest, text category and semantic intent classification problems, using a
wide range of neural network architectures (FFNNs, CNNs and LSTM RNNs). The
experimental results demonstrated that graph-augmented training almost always
helps to find better neural networks that outperform other techniques in
predictive performance, or even much smaller networks that are faster and easier to
train. Additionally, the node-level input features can be combined with graph features
as inputs to the neural networks. We showed that a neural network that simply takes
the adjacency matrix of a graph and produces node labels, can perform better
than a recently proposed two-stage approach using sophisticated graph embeddings
and a linear classifier. Our framework also excels when the neural network is small,
or when there is limited supervision available.
While our objective can be applied to multiple graphs which come from different
domains, we have not fully explored this aspect and leave this as future work. We
expect the domain-specific networks can interact with the graphs to determine the
importance of each domain/graph source in prediction. We also did not explore using
graph regularisation for different hidden layers of the neural networks; we expect
this is key for the multi-graph transfer setting (Yosinski et al., 2014). Another possible
future extension is to use our objective on directed graphs, that is to control the
direction of influence between nodes during training.
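The graph-augmented objective described above is a supervised loss plus a regularizer pulling the hidden representations of edge-connected nodes together. A toy sketch, with assumptions: squared error stands in for the supervised term (the paper uses cross-entropy), edges are unweighted, and all names are illustrative.

```python
import numpy as np

def graph_augmented_loss(pred, target, hidden, edges, alpha):
    """Sketch of a Neural-Graph-Machines-style objective: supervised
    term on labeled nodes plus alpha * sum over edges of the squared
    distance between connected nodes' hidden representations."""
    supervised = np.mean((pred - target) ** 2)
    graph_term = sum(np.sum((hidden[u] - hidden[v]) ** 2) for u, v in edges)
    return supervised + alpha * graph_term

pred = np.array([0.9, 0.1])                   # network outputs on labeled nodes
target = np.array([1.0, 0.0])
hidden = np.array([[1., 0.], [1., 0.], [0., 1.]])
loss = graph_augmented_loss(pred, target, hidden,
                            edges=[(0, 1), (1, 2)], alpha=0.1)
```

Both terms decompose over nodes and edges, which is why the objective trains with plain stochastic gradient descent and scales to large graphs.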
55. Recurrent Networks for graphs #1
Geometric Matrix Completion with Recurrent
Multi-Graph Neural Networks
Federico Monti, Michael M. Bronstein, Xavier Bresson
(Submitted on 22 Apr 2017)
https://arxiv.org/abs/1704.06803
Main contribution. In this work, we treat the matrix completion problem as deep learning on graph-structured
data. We introduce a novel neural network architecture that is able to extract local stationary patterns
from the high-dimensional spaces of users and items, and use these meaningful representations to infer the
non-linear temporal diffusion mechanism of ratings. The spatial patterns are extracted by a new CNN
architecture designed to work on multiple graphs. The temporal dynamics of the rating diffusion is produced
by a Long Short-Term Memory (LSTM) recurrent neural network (RNN). To our knowledge, our work is the
first application of graph-based deep learning to the matrix completion problem.
Recurrent GCNN (RGCNN) architecture
using the full matrix completion model and
operating simultaneously on the rows and
columns of the matrix X. The output of the
Multi-Graph CNN (MGCNN) module is a q-
dimensional feature vector for each element
of the input matrix. The number of
parameters to learn is O(1) and the learning
complexity is O(mn).
Separable Recurrent GCNN (sRGCNN) architecture using the factorized
matrix completion model and operating separately on the rows and
columns of the factors W, Hᵀ. The output of the GCNN module is a
q-dimensional feature vector for each input row/column, respectively.
The number of parameters to learn is O(1) and the learning complexity
is O(m + n).
Evolution of the matrix X(t) with our architecture using full matrix completion model RGCNN (top) and factorized matrix completion model
sRGCNN (bottom). Numbers indicate the RMS error.
Absolute value of the first 8 spectral filters learnt by our two-dimensional
convolution. On the left, the first filter is shown with the reference axes
associated with the row and column graph eigenvalues.
56. Recurrent Networks for graphs #2
Learning From Graph Neighborhoods Using
LSTMs
Rakshit Agrawal, Luca de Alfaro, Vassilis Polychronopoulos
(Submitted on 21 Nov 2016)
https://arxiv.org/abs/1611.06882
https://sites.google.com/view/ml-on-structures
→ https://github.com/ML-on-structures/blockchain-lstm
→ → Bitcoin blockchain data used in paper
“The approach is based on a multi-level architecture built from Long Short-Term
Memory neural nets (LSTMs); the LSTMs learn how to summarize the
neighborhood from data. We demonstrate the effectiveness of the proposed
technique on a synthetic example and on real-world data related to
crowdsourced grading, Bitcoin transactions, and Wikipedia edit reversions.”
The blockchain is the public immutable distributed ledger where Bitcoin transactions are recorded [20]. In Bitcoin, coins
are held by addresses, which are hash values; these address identifiers are used by their owners to anonymously hold
bitcoins, with ownership provable with public key cryptography. A Bitcoin transaction involves a set of source addresses,
and a set of destination addresses: all coins in the source addresses are gathered, and they are then sent in various
amounts to the destination addresses.
Mining data on the blockchain is challenging [Meiklejohn et al. 2013] due to the anonymity of addresses. We use data
from the blockchain to predict whether an address will spend the funds that were deposited to it.
We obtain a dataset of addresses by using a slice of the blockchain. In particular, we consider all the addresses where
deposits happened in a short range of 101 blocks, from 200,000 to 200,100 (inclusive). These blocks contain 15,709 unique
addresses where deposits took place. Looking at the state of the blockchain a further 50,000 blocks later (roughly one year,
as each block is mined on average every 10 minutes), 3,717 of those addresses still had funds sitting: we call these
"hoarding addresses". The goal is to predict which addresses are hoarding addresses and which spent the funds.
We randomly split the 15,709 addresses into a training set of 10,000 and a validation set of 5,709 addresses.
We built a graph with addresses as nodes, and transactions as edges. Each edge was labeled with features of the
transaction: its time, amount of funds transmitted, number of recipients, and so forth, for a total of 9 features. We
compared two different algorithms:
● Baseline: an informative guess; it guesses a label with a probability equal to its percentage in the training set.
● MLSL of depths 1, 2, 3. The output and memory sizes of the learners for the reported results are K2 = K3 = 3.
Increasing these to 5 maintained virtually the same performance while increasing training time; using only 1 output
and memory cell did not improve performance.
Quantitative Analysis of the Full Bitcoin Transaction Graph
Dorit Ron, Adi Shamir Financial Cryptography 2012
http://doi.org/10.1007/978-3-642-39884-1_2
57. Time-series analysis with graphs #1
Spectral Algorithms for Temporal Graph Cuts
Arlei Silva, Ambuj Singh, Ananthram Swami
(Submitted on 15 Feb 2017)
https://arxiv.org/abs/1702.04746
We propose novel formulations and algorithms for
computing temporal cuts using spectral graph theory,
multiplex graphs, divide-and-conquer and low-rank
matrix approximation. Furthermore, we extend our
formulation to dynamic graph signals, where cuts
also capture node values, as graph wavelets.
Experiments show that our solutions are accurate and
scalable, enabling the discovery of dynamic
communities and the analysis of dynamic graph
processes.
This work opens several lines for future investigation:
(i) temporal cuts, as a general framework for solving
problems involving dynamic data, can be applied in
many scenarios; we are particularly interested to see
how our method performs in computer vision tasks;
(ii) Perturbation Theory can provide deeper
theoretical insights into the properties of temporal
cuts [Sole-Ribalta et al. 2013; Taylor et al. 2015];
(iii) finally, we want to study Cheeger inequalities [Chung 1996]
for temporal cuts, as a means to better understand the
performance of our algorithms.
Temporal graph cut for a primary school network. The cut, represented as node colors, reflects the
network dynamics, capturing major changes in the children’s interactions.
58. Active learning on Graphs
Active Learning for Graph Embedding
Hongyun Cai, Vincent W. Zheng, Kevin Chen-Chuan Chang
(Submitted on 15 May 2017)
https://arxiv.org/abs/1705.05085
https://github.com/vwz/AGE
In this paper, we proposed a novel active learning
framework for graph embedding named Active
Graph Embedding (AGE). Unlike the traditional
active learning algorithms, AGE processes the
data with structural information and learnt
representations (node embeddings), and it is
carefully designed to address the challenges
brought by these two characteristics.
First, to exploit the graphical information, a
graphical centrality based measurement is
considered in addition to the popular information
entropy based and information density based
query criteria.
Second, active learning and graph
embedding are run jointly, by posing
the label query at the end of every
epoch of the graph embedding training process.
Moreover, time-sensitive weights are placed on
the three active learning query criteria, which
focus on graphical centrality at the beginning
and shift the focus to the other two embedding-
based criteria as the training process progresses
(i.e., as more accurate embeddings are learnt).
59. Transfer learning on Graphs
Intrinsic Geometric Information Transfer
Learning on Multiple Graph-Structured
Datasets
Jaekoo Lee, Hyunjae Kim, Jongsun Lee, Sungroh Yoon
(Submitted on 15 Nov 2016 (v1), last revised 5 Dec 2016 (this version, v2))
https://arxiv.org/abs/1611.04687
Conventional CNN works on a regular grid domain (top); proposed
transfer learning framework for CNN, which can transfer intrinsic
geometric information obtained from a source graph domain to a
target graph domain (bottom).
Overview of the proposed method.
Conclusion: We have proposed a new transfer learning framework for deep learning on graph-structured data. Our approach can transfer the
intrinsic geometric information learned from the graph representation of the source domain to the target domain. We observed that
knowledge transfer between task domains is most effective when the source and target domains possess high similarity in their graph
representations. We anticipate that adoption of our methodology will help extend the territory of deep learning to data in non-grid structure as
well as to cases with limited quantity and quality of data. To prove this, we plan to apply our approach to diverse datasets in different
domains.
60. Transfer learning on Graphs #2
Deep Feature Learning for Graphs
Ryan A. Rossi, Rong Zhou, Nesreen K. Ahmed
(Submitted on 28 Apr 2017)
https://arxiv.org/abs/1704.08829
This paper presents a general graph representation learning framework called DeepGL for learning
deep node and edge representations from large (attributed) graphs. In particular, DeepGL begins by
deriving a set of base features (e.g., graphlet features) and automatically learns a multi-layered
hierarchical graph representation where each successive layer leverages the output from the
previous layer to learn features of a higher order. Contrary to previous work, DeepGL learns relational
functions (each representing a feature) that generalize across networks and are therefore useful for graph-
based transfer learning tasks. Moreover, DeepGL naturally supports attributed graphs, learns
interpretable features, and is space-efficient (by learning sparse feature vectors).
Thus, features learned by DeepGL are interpretable
and naturally generalize to across-network transfer
learning tasks, as they can be derived on any arbitrary
graph.
graph. The framework is flexible with many
interchangeable components, expressive, interpretable,
parallel, and is both space- and time-efficient for large
graphs with runtime that is linear in the number of edges.
DeepGL has all the following desired properties:
● Effective for attributed graphs and across-network transfer learning tasks
● Space-efficient requiring up to 6× less memory
● Fast with up to 182× speedup in runtime
● Accurate with a mean improvement of 20% or more on many applications
● Parallel with strong scaling results.
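The layered feature composition above can be sketched in a few lines of NumPy. This is a hedged simplification, not the DeepGL implementation: it uses node degree as the only base feature and mean/max neighbor aggregation as the relational functions, whereas DeepGL also derives graphlet features and prunes/sparsifies the learned feature vectors.

```python
import numpy as np

def base_features(A):
    """Base per-node feature: degree (DeepGL would also add graphlet counts)."""
    return A.sum(axis=1)[:, None]          # shape (n, 1)

def relational_aggregate(A, X):
    """One layer: apply two relational functions (neighbor mean and max)
    to every feature from the previous layer."""
    n = A.shape[0]
    feats = []
    for j in range(X.shape[1]):
        col, mean_f, max_f = X[:, j], np.zeros(n), np.zeros(n)
        for v in range(n):
            nbrs = np.nonzero(A[v])[0]
            if len(nbrs):
                mean_f[v] = col[nbrs].mean()
                max_f[v] = col[nbrs].max()
        feats += [mean_f, max_f]
    return np.column_stack(feats)

def hierarchical_features(A, layers=2):
    """Stack base features with successively higher-order layers."""
    reps = [base_features(A)]
    for _ in range(layers):
        reps.append(relational_aggregate(A, reps[-1]))
    return np.hstack(reps)

# Path graph on 4 nodes: 0-1-2-3
A = np.array([[0,1,0,0],[1,0,1,0],[0,1,0,1],[0,0,1,0]], float)
F = hierarchical_features(A, layers=2)
```

Because every feature is a composition of relational functions over graph structure alone, the same feature definitions can be re-evaluated on any other graph, which is what makes the across-network transfer described above possible.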
61. Learning Graphs: learning the graph itself #1
Learning Graph While Training: An Evolving
Graph Convolutional Neural Network
Ruoyu Li, Junzhou Huang
(Submitted on 10 Aug 2017)
https://arxiv.org/abs/1708.04675
“In this paper, we propose a more general and flexible graph convolution network
(EGCN) fed by batch of arbitrarily shaped data together with their evolving graph
Laplacians trained in supervised fashion. Extensive experiments have been
conducted to demonstrate the superior performance in terms of both the acceleration
of parameter fitting and the significantly improved prediction accuracy on multiple
graph-structured datasets.”
In this paper, we explore our approach primarily on
chemical molecular datasets, although the network
can be straightforwardly trained on other graph-
structured data, such as point clouds, social networks,
and so on. Our contributions can be summarized as
follows:
● A novel spectral graph convolution layer boosted by
Laplacian learning (SGC-LL) has been proposed to
dynamically update the residual graph Laplacians via metric
learning for deep graph learning.
● Re-parametrization on the feature domain has been
introduced in K-hop spectral graph convolution to
enable our proposed deep graph learning and to
grant graph CNNs a feature-extraction capability on
graph data similar to that of classical CNNs on grid
data.
● An evolving graph convolution network (EGCN) has
been designed to be fed by a batch of arbitrarily
shaped graph-structured data. The network is able to
construct and learn, for each data sample, the graph
structure that best serves the prediction part of the
network. Extensive experimental results demonstrate
the benefits of the evolving graph structure of the data.
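The SGC-LL idea of updating the graph Laplacian via metric learning can be illustrated with a minimal NumPy sketch. This is an assumption-laden simplification of the paper's approach: `W` below stands in for the learnable Mahalanobis-metric parameters, and a Gaussian kernel over the learned distance is one common way to turn distances into an adjacency; in EGCN itself, `W` would be updated by backpropagation together with the convolution weights, so the graph evolves during training.

```python
import numpy as np

def learned_laplacian(X, W, sigma=1.0):
    """Build a graph Laplacian from node features X under a learned metric.

    W (d x k) parameterizes a Mahalanobis distance with M = W W^T:
    d_M(x_i, x_j)^2 = (x_i - x_j)^T M (x_i - x_j) = ||W^T x_i - W^T x_j||^2.
    """
    Z = X @ W                                             # project into metric space
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    A = np.exp(-d2 / (2 * sigma ** 2))                    # Gaussian-kernel adjacency
    np.fill_diagonal(A, 0.0)                              # no self-loops
    D = np.diag(A.sum(axis=1))
    return D - A                                          # unnormalized Laplacian

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))    # 5 nodes, 3 input features per node
W = rng.normal(size=(3, 2))    # stand-in for the learnable metric parameters
L = learned_laplacian(X, W)
```

Since the Laplacian is a differentiable function of `W`, gradients from the prediction loss can flow back into the metric, letting each data sample induce its own graph structure.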
62. Graph structure as the “signal” for prediction
DeepGraph: Graph Structure Predicts
Network Growth
Cheng Li, Xiaoxiao Guo, Qiaozhu Mei
(Submitted on 20 Oct 2016)
https://arxiv.org/abs/1610.06251
“Extensive experiments on five large collections of real-world networks demonstrate that the
proposed prediction model significantly improves the effectiveness of existing methods,
including linear or nonlinear regressors that use hand-crafted features, graph kernels, and
competing deep learning methods.”
Graph descriptor vs. adjacency matrix.
We have described the process of
converting an adjacency matrix into our
graph descriptor, which is then passed
through a deep neural network for further
feature extraction. All computation in this
process aims to obtain a low-level
representation of the topological structure
that is more effective than the original
adjacency matrix.
First, isomorphic graphs can be
represented by many different adjacency
matrices, while our graph descriptor
provides a unique representation for those
isomorphic graphs. This unique
representation simplifies the neural
network structures needed for network
growth prediction.
Second, our graph descriptor provides
similar representations for graphs with
similar structures. This similarity between
graphs is far less preserved in the
adjacency matrix representation, and
such information loss places a heavy burden
on deep neural networks in growth
prediction tasks.
Third, our graph descriptor is a universal
graph structure representation that
depends on neither vertex ordering nor the
number of vertices, whereas the adjacency
matrix depends on both.
The motivation for adopting the Heat Kernel Signature (HKS) is its
theoretically proven properties as a graph representation: HKS is an intrinsic
and informative representation for graphs [31]. Intrinsicness means that
isomorphic graphs map to the same HKS representation, and
informativeness means that if two graphs have the same HKS representation,
then they must be isomorphic.
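The HKS itself is computed from the eigendecomposition of the graph Laplacian: HKS(v, t) = Σᵢ exp(−λᵢ t) φᵢ(v)², where (λᵢ, φᵢ) are the Laplacian's eigenpairs and t ranges over a set of time scales. A minimal NumPy sketch, assuming an unweighted undirected graph and the unnormalized Laplacian L = D − A:

```python
import numpy as np

def heat_kernel_signature(A, ts):
    """HKS(v, t) = sum_i exp(-lambda_i * t) * phi_i(v)^2,
    from the eigendecomposition of the Laplacian L = D - A."""
    L = np.diag(A.sum(axis=1)) - A
    lam, phi = np.linalg.eigh(L)       # eigenvalues ascending, eigenvectors in columns
    # one column of the signature per time scale t
    return np.stack([(np.exp(-lam * t) * phi ** 2).sum(axis=1) for t in ts], axis=1)

# Triangle graph: every vertex is equivalent under graph isomorphism
A = np.ones((3, 3)) - np.eye(3)
H = heat_kernel_signature(A, ts=[0.1, 1.0, 10.0])
```

On the triangle graph all three rows of `H` coincide, because the signature depends only on intrinsic structure, not on vertex labels; this illustrates the intrinsicness property described above.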
A meaningful future direction is to
integrate network structure with other
types of information, such as the content
of information cascades in the network. A
joint representation of multi-modal
information may maximize the
performance of particular prediction
tasks.