StructMatrix: large-scale visualization of graphs by means of structure detection and dense matrices

Hugo Gualdron, Robson L. F. Cordeiro, Jose F Rodrigues-Jr
University of Sao Paulo
In collaboration with Carnegie Mellon University
(Prof. Christos Faloutsos, and PhD Danai Koutra)
Funded by the research agency FAPESP (grants 2013/03906-0, 2014/07879-0, 2015/18335)
In: The Fifth IEEE ICDM Workshop on Data Mining in Networks,
Atlantic City, NJ, USA - November, 2015
http://www.icmc.usp.br/pessoas/junio

Introduction
Motivation
Big Data!!!
A lot of information, much of it in the form of relationships;
Large-scale graphs: graphs generated by applications whose users or
entities are spread across large geographical areas, even the entire
planet;
Social networks, recommendation networks, road nets, e-commerce,
computer networks, client-product logs, and many others.
Data analysis is the differentiator in industrial competition.
General Electric & Accenture.

Introduction
Problem
Such graphs are too big:
node-link visualization cannot handle even thousand-vertex graphs;
adjacency matrices are limited by the number of pixels on the screen;
in any case, the sheer number of nodes defeats visual reasoning;
non-visual analytical techniques may produce far too many patterns
for a human to absorb.
Still, we want to characterize the structure of graphs for:
understanding the overall structure, beyond what distribution-based
analyses reveal;
spotting outliers and trends that are not dominant;
requesting details on demand concerning subregions of the graph
topology.

Introduction
Problem
Layouts: node-link and adjacency matrix
Scalability:
node-link: hundreds of nodes;
adjacency matrix: thousands of nodes.

Introduction
Methodology overview
Assumptions:
graphs are made of recurrent simple structures (cliques, bipartite
cores, stars, and chains);
such structures are more meaningful than individual nodes;
even at lower resolutions, a visualization preserves the main
properties of the graph.
Hypothesis: we reach more scalable and meaningful graph visualizations
with:
graph summarization by detecting recurrent structures of the graph;
dense adjacency matrices.

Methodology
Proposed method: StructMatrix
Our method has two parts:
1 An algorithm to detect substructures;
2 A dense adjacency matrix of the structures that were detected.

Methodology
1.Structure detection
We designed a graph partitioning algorithm based on the fact that
real-world graphs follow power-law degree distributions;
In such graphs, a few nodes have very high degree and the majority of
nodes have low degree;
Kang and Faloutsos [1] demonstrated that removing the highest-degree
nodes in order takes the hubs out of the giant connected component
(CC), creating much smaller satellite connected components;
This ordered removal lends itself to a structural scan of the graph.

Methodology
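1.Structure detection–Structure vocabulary
StructMatrix vocabulary ψ = {fs, st, ch, nc, fc, nb, fb}: false star,
star, chain, near clique, full clique, near bipartite core, full
bipartite core.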

Methodology
1.Structure detection–Algorithm
1 While the queue holds connected components, StructMatrix takes the
first one for processing.

Methodology
2 StructMatrix selects the vertices with the highest degree (up to 1%
of the vertices) and removes their edges, which yields a set of smaller
connected subcomponents.

Methodology
3 We classify the subcomponents according to the vocabulary.

Methodology
1.Structure detection–Structure classification
\[
\alpha = \frac{n^2}{4}, \qquad \beta = \frac{n(n-1)}{2}, \qquad \varepsilon = 0.2
\]
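The slide gives only the reference quantities, not the decision rules, so the following is a minimal sketch of how a connected subcomponent with n nodes and m edges might be labeled. It assumes that β is the edge count of a full clique, that α is the largest edge count a bipartite core on n nodes can have, and that 0.2 is a tolerance on both. The function names, the order of the tests, and the omission of the false-star case are all illustrative assumptions, not the authors' implementation.

```python
from collections import deque


def from_edges(edges):
    """Build an undirected adjacency dict: node -> set of neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def two_color(adj):
    """BFS two-coloring of a connected component; None if it is not bipartite."""
    start = next(iter(adj))
    color = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in color:
                color[v] = 1 - color[u]
                queue.append(v)
            elif color[v] == color[u]:
                return None
    return color


def classify(adj, eps=0.2):
    """Label one connected component, or return None if nothing matches."""
    n = len(adj)
    if n < 3:
        return None  # assumption: tiny components are left unlabeled
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    max_deg = max(len(nbrs) for nbrs in adj.values())
    alpha = n * n / 4          # densest possible bipartite core on n nodes
    beta = n * (n - 1) // 2    # full clique on n nodes
    if m == beta:
        return "fc"
    if m == n - 1:             # the component is a tree
        if max_deg <= 2:
            return "ch"
        if max_deg == n - 1:
            return "st"
        return None
    if m >= (1 - eps) * beta:
        return "nc"
    color = two_color(adj)
    if color is not None:
        side_a = sum(1 for c in color.values() if c == 0)
        if m == side_a * (n - side_a):
            return "fb"
        if m >= (1 - eps) * alpha:  # assumed role of alpha
            return "nb"
    return None


print(classify(from_edges([(0, 1), (1, 2), (0, 2)])))
```

Checking the chain before the star matters for three-node paths, which satisfy both tests; the sketch arbitrarily calls them chains.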

Methodology
4 We store the classified subcomponents; those that were not
identified return to the queue and wait for a new round of shattering.

Methodology
5 We proceed to the next element in the queue.
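The five steps can be sketched as a queue-driven loop. This is a minimal illustration under stated assumptions: the names `detect_structures`, `shatter`, and `toy_classify` are hypothetical, hubs are dropped outright rather than kept as isolated vertices, at least one vertex is removed per round so the loop always terminates, and components below `min_size` are discarded. The classifier is passed in as a parameter; the toy one used here only recognizes chains.

```python
from collections import deque


def build_adj(edges):
    """Build an undirected adjacency dict: node -> set of neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def connected_components(adj):
    """Split an adjacency dict into one adjacency dict per component."""
    seen, parts = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        comp, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    comp.add(v)
                    queue.append(v)
        parts.append({u: adj[u] & comp for u in comp})
    return parts


def shatter(adj, fraction=0.01):
    """Step 2: remove the highest-degree vertices and return what is left."""
    k = max(1, int(len(adj) * fraction))  # assumption: always remove at least one
    hubs = set(sorted(adj, key=lambda u: len(adj[u]), reverse=True)[:k])
    rest = {u: adj[u] - hubs for u in adj if u not in hubs}
    return connected_components(rest)


def detect_structures(adj, classify, fraction=0.01, min_size=3):
    """Run the shatter-and-classify loop; returns (label, node set) pairs."""
    found = []
    queue = deque(connected_components(adj))
    while queue:                               # steps 1 and 5
        comp = queue.popleft()
        for sub in shatter(comp, fraction):    # step 2
            if len(sub) < min_size:
                continue                       # assumption: drop tiny leftovers
            label = classify(sub)              # step 3
            if label is not None:
                found.append((label, set(sub)))  # step 4: store
            else:
                queue.append(sub)              # step 4: back to the queue
    return found


def toy_classify(comp):
    """Hypothetical classifier: a tree whose degrees are at most 2 is a chain."""
    n = len(comp)
    m = sum(len(nbrs) for nbrs in comp.values()) // 2
    if m == n - 1 and max(len(nbrs) for nbrs in comp.values()) <= 2:
        return "ch"
    return None


# One hub (node 0) holding three chains together.
graph = build_adj([(0, 1), (0, 4), (0, 7), (0, 2), (1, 2), (2, 3),
                   (4, 5), (5, 6), (7, 8), (8, 9)])
for label, nodes in detect_structures(graph, toy_classify):
    print(label, sorted(nodes))
```

Passing the classifier as an argument keeps the loop independent of the vocabulary, so a richer classifier can be swapped in without touching the shattering logic.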

Methodology
1.Structure detection–Structure detection results
Graph           # Structures    fs    st    ch    nc    fc    nb    fb
DBLP                 160,885   76%    5%    2%    2%   15%   <1%     -
WWW-barabasi          15,652   32%   52%    5%    3%    2%    4%    2%
cit-HepPh             14,479   79%   13%    6%    1%   <1%   <1%   <1%
Wikipedia-vote         1,706   65%   33%    2%     -     -   <1%     -
Epinions               8,774   52%   31%   14%   <1%   <1%    2%   <1%
Roadnet PA            51,175   23%   45%   27%     -     -    5%     -
Roadnet CA            88,993   27%   39%   29%     -     -    4%     -
Roadnet TX            62,614   25%   43%   28%     -     -    4%     -

Methodology
1.Structure detection–Runtime
Compared with the VoG algorithm (Koutra et al. [2]), StructMatrix shows
better runtime performance and uses a larger vocabulary.

Methodology
2.Visualization–Projection
After structure detection, we build a structure-to-structure adjacency
matrix whose edge weights indicate the number of edges between the
nodes of each pair of structures;
Although smaller than the original matrix, for million-scale graphs
the structure matrix is still too large to fit on the screen;
For this reason, we create a dense matrix through a linear mapping
(x, y) → (ρx, ρy) given by:
\[
\rho_x = (Res_x - 1)\,\frac{x - x_{min}}{x_{max} - x_{min}} + \frac{1}{2},
\qquad
\rho_y = (Res_y - 1)\,\frac{y - y_{min}}{y_{max} - y_{min}} + \frac{1}{2}
\tag{1}
\]
where (x, y) are points of the original matrix and Res_x, Res_y are the
target resolutions; the higher the resolution, the more detail is
shown. These parameters allow details to be explored interactively.
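A minimal sketch of the projection in Equation (1), written out to show how a large sparse matrix collapses onto a fixed pixel grid. Two details are assumptions rather than something the slide states: the real-valued result is floored to a pixel index (so the 1/2 term acts as round-to-nearest), and weights that land on the same pixel are summed. The names `project` and `densify` are hypothetical.

```python
import math


def project(x, y, bounds, res_x, res_y):
    """Map matrix coordinates (x, y) to pixel indices following Equation (1)."""
    x_min, x_max, y_min, y_max = bounds
    # Guard against a degenerate axis (a single row or column).
    span_x = (x_max - x_min) or 1
    span_y = (y_max - y_min) or 1
    rho_x = (res_x - 1) * (x - x_min) / span_x + 0.5
    rho_y = (res_y - 1) * (y - y_min) / span_y + 0.5
    # Assumption: flooring turns the +1/2 offset into round-to-nearest.
    return math.floor(rho_x), math.floor(rho_y)


def densify(entries, res_x, res_y):
    """Accumulate sparse (x, y, weight) entries into a res_y by res_x grid."""
    xs = [x for x, _, _ in entries]
    ys = [y for _, y, _ in entries]
    bounds = (min(xs), max(xs), min(ys), max(ys))
    grid = [[0] * res_x for _ in range(res_y)]
    for x, y, w in entries:
        px, py = project(x, y, bounds, res_x, res_y)
        grid[py][px] += w  # assumption: colliding entries are summed
    return grid


# Four entries of a 1000 x 1000 matrix squeezed into 10 x 10 pixels.
entries = [(0, 0, 1), (999, 999, 2), (500, 500, 3), (501, 501, 4)]
for row in densify(entries, 10, 10):
    print(row)
```

Raising `res_x` and `res_y` separates entries that share a pixel at lower resolutions, which is the interactive detail-on-demand behavior the slide describes.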

Methodology
2.Visualization–Layout
We order the rows and columns of the matrix by structure type and by
number of edges; color encodes the size of each structure (its number
of nodes).
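A minimal sketch of how the structure-to-structure weights and this ordering might be computed. Several choices are assumptions for illustration: edges inside a structure and edges touching unassigned nodes are skipped, the type order follows the column order of the results table, and structures with more edges come first within each type. The names `structure_matrix` and `layout_order` and the record layout are hypothetical.

```python
from collections import Counter

TYPE_ORDER = ("fs", "st", "ch", "nc", "fc", "nb", "fb")


def structure_matrix(edges, membership):
    """Count the edges between each pair of structures.

    membership maps a node to the id of the structure that contains it.
    """
    weights = Counter()
    for u, v in edges:
        su, sv = membership.get(u), membership.get(v)
        if su is None or sv is None or su == sv:
            continue  # assumption: skip unassigned nodes and internal edges
        weights[(su, sv)] += 1
        weights[(sv, su)] += 1  # symmetric, since the graph is undirected
    return weights


def layout_order(structures, weights):
    """Order structure ids by type, then by edges to other structures."""
    out_edges = Counter()
    for (s, _), w in weights.items():
        out_edges[s] += w
    rank = {t: i for i, t in enumerate(TYPE_ORDER)}
    # Assumption: descending edge count within each type.
    return sorted(structures,
                  key=lambda s: (rank[structures[s]["type"]], -out_edges[s]))


edges = [(1, 2), (2, 3), (3, 4), (4, 5), (1, 5), (2, 5)]
membership = {1: "A", 2: "A", 3: "B", 4: "C", 5: "C"}
structures = {
    "C": {"type": "ch", "size": 2},
    "B": {"type": "st", "size": 1},
    "A": {"type": "st", "size": 2},
}
weights = structure_matrix(edges, membership)
print(layout_order(structures, weights))
```

The `size` field is not used for ordering; it is carried along because the layout maps it to color.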

Experiments
Experiments–Real datasets–WWW-barabasi
WWW-barabasi: webpages and the links between them.
Stars (st and fs) correspond to webpages with many outgoing links.
Most webpages have fewer than one thousand connections; however, a
few have an unusually high number, in the thousands.

Experiments
Experiments–Real datasets–Road nets
Figure panels: Pennsylvania, California, Texas.
The three road graphs have a similar structure, as all are U.S. road
networks;
There is a hierarchical connectivity: bigger to smaller cities;
Surprising grid-like structure (due to symmetry): intersections
correspond to hub cities, and lines correspond to inter-city paths.

Experiments
Experiments–Real datasets–Road nets
Comparison: Structure-to-structure vs Node-to-node.
Figure panels: California (structure-to-structure), California (node-to-node).
Main differences:
1 The partitioning according to structures;
2 The ordering by number of edges to other structures;
3 There is a hierarchical connectivity: bigger to smaller cities;
4 Surprising grid-like structure: intersections refer to hub cities, and
lines refer to inter-city paths.

Experiments
Experiments–Real datasets–DBLP
Figure panels: Overall, FC-FC zoom.
DBLP is mainly characterized by false stars, possibly because
advisors have students, and students connect to one another;
By zooming into the FC-FC region, one can see outliers, for instance
k3 = “The Biomolecular Interaction Network Database and related tools
2005 update”, with 75 authors.

Conclusions
Contributions
Visualization technique: we introduce a processing and visualization
methodology that combines algorithmic techniques and design in order
to reach large-scale visualizations;
Analytical scalability: our technique extends the most scalable
technique found in the literature; in addition, it is engineered to
plot millions of edges in a matter of seconds;
Practical analysis: we show that large-scale graphs have well-defined
behaviors concerning the distribution of structures, their size, and
how they relate to one another; finally, using a standard laptop, our
technique allowed us to experiment on real, large-scale graphs from
high-impact domains, namely WWW, Wikipedia, Roadnet, and DBLP.

References
[1] U. Kang and C. Faloutsos, “Beyond ‘caveman communities’: Hubs
and spokes for graph compression and mining,” in ICDM, 2011, pp.
300–309.
[2] D. Koutra, U. Kang, J. Vreeken, and C. Faloutsos, “VoG:
Summarizing and understanding large graphs,” in SDM, 2014.
