StructMatrix: large-scale visualization of graphs by
means of structure detection and dense matrices
Hugo Gualdron, Robson L. F. Cordeiro, Jose F Rodrigues-Jr
University of Sao Paulo
In collaboration with Carnegie Mellon University
(Prof. Christos Faloutsos, and PhD Danai Koutra)
Funding by research agency Fapesp (2013/03906-0, 2014/07879-0, 2015/18335)
In: The Fifth IEEE ICDM Workshop on Data Mining in Networks,
Atlantic City, NJ, USA - November, 2015
http://www.icmc.usp.br/pessoas/junio
Jose F Rodrigues-Jr (University of Sao Paulo) 1 / 20
Introduction
Motivation
Big Data!!!
A lot of information, much of it in the form of relationships;
Large-scale graphs: graphs generated by applications whose users
or entities are spread across large geographical areas, even the
entire planet;
Social networks, recommendation networks, road nets, e-commerce,
computer networks, client-product logs, and many others.
Data analysis is a key differentiator in industrial competition.
General Electric & Accenture.
Problem
Such graphs are too big:
node-link visualizations cannot handle even graphs with a thousand vertices;
adjacency matrices are limited by the number of pixels on the screen;
in any case, the sheer number of nodes prevents reasoning about the graph;
non-visual analytical techniques may produce too many patterns
for a human to take in.
Still, we want to characterize the structure of graphs for:
understanding the overall structure, beyond what
distribution-based analyses provide;
spotting outliers and trends that are not dominant;
requesting details on demand concerning subregions of the graph
topology.
Layouts: node-link vs. adjacency matrix
[Figure panels: node-link layout; adjacency-matrix layout]
Scalability: node-link, hundreds of nodes; adjacency matrix, thousands of nodes.
Methodology overview
Assumptions:
graphs are made of recurrent simple structures (cliques, bipartite
cores, stars, and chains);
such structures are more meaningful than individual nodes;
even at lower resolutions, the graph's main properties are preserved in
a visualization.
Hypothesis: we reach more scalable and meaningful graph visualizations
with:
graph summarization by detecting recurrent structures of the graph;
dense adjacency matrices.
Methodology
Proposed method: StructMatrix
Our method has two parts:
1 An algorithm to detect substructures;
2 A dense adjacency matrix of the structures that were detected.
1.Structure detection
We designed a graph partitioning algorithm based on the fact that
real-world graphs follow power-law degree distributions;
In such graphs, a few nodes have very high degree and the majority of
nodes have low degree;
Kang and Faloutsos [1] demonstrated that removing the
highest-degree nodes in order strips the hubs from the giant connected
component (CC), creating satellite (much smaller) connected components;
This ordered removal lends itself to a structural scan of the graph.
1.Structure detection–Structure vocabulary
StructMatrix Vocabulary ψ
1.Structure detection–Algorithm
1 If the queue has connected components, StructMatrix gets the first
element for processing.
2 StructMatrix selects the vertices with the highest degree (up to 1% of the
vertices) and removes their edges.
This yields a set of smaller connected subcomponents.
3 We classify the subcomponents according to the vocabulary.
1.Structure detection–Structure classification
\[
\alpha = \frac{n^2}{4}, \qquad \beta = \frac{n(n-1)}{2}
\]
[symbol missing in the slide text] = 0.2
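As a reading aid, here is a small worked example. It assumes that n is the number of nodes in a subcomponent; the slide text does not say how the classifier uses these quantities. What is certain is that n(n-1)/2 is the number of edges of a complete graph on n nodes, and that n^2/4 is the largest number of edges a bipartite graph on n nodes can have (for even n).

```latex
% Worked example for n = 10 (n chosen for illustration only)
\[
\alpha = \frac{n^2}{4} = \frac{10^2}{4} = 25,
\qquad
\beta = \frac{n(n-1)}{2} = \frac{10 \cdot 9}{2} = 45 .
\]
```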
1.Structure detection–Algorithm
4 We store the classified subcomponents; those that could not be
classified go back to the queue for a new round of shattering.
5 We proceed to the next element in the queue.
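The queue-driven loop described in the steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 1% hub fraction comes from the slides, while the toy `classify` function (exact stars, chains and full cliques only), the `min_size` cutoff and all function names are assumptions made for the example. The graph is a plain dictionary mapping each node to the set of its neighbors, with no self-loops.

```python
from collections import deque
from math import ceil


def components(adj, nodes):
    """Connected components of the subgraph induced by `nodes`."""
    nodes, seen, out = set(nodes), set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, stack = set(), [start]
        seen.add(start)
        while stack:
            u = stack.pop()
            comp.add(u)
            for v in adj[u]:
                if v in nodes and v not in seen:
                    seen.add(v)
                    stack.append(v)
        out.append(comp)
    return out


def classify(adj, comp):
    """Toy classifier (assumption): recognizes only exact structures."""
    n = len(comp)
    degs = sorted(len(adj[u] & comp) for u in comp)
    m = sum(degs) // 2                      # number of edges inside comp
    if m == n * (n - 1) // 2:
        return "fc"                         # full clique
    if m == n - 1 and degs[-1] == n - 1:
        return "st"                         # star: one center, n-1 leaves
    if m == n - 1 and degs[-1] <= 2:
        return "ch"                         # chain: a tree with max degree 2
    return None


def shatter(adj, hub_fraction=0.01, min_size=3):
    """Repeatedly remove hub edges and classify the resulting pieces."""
    adj = {u: set(vs) for u, vs in adj.items()}   # work on a copy
    queue = deque(components(adj, adj))
    found = []
    while queue:
        comp = queue.popleft()                            # step 1
        k = max(1, ceil(hub_fraction * len(comp)))        # step 2: up to 1%
        hubs = sorted(comp, key=lambda u: len(adj[u]), reverse=True)[:k]
        for h in hubs:
            for v in adj[h]:
                adj[v].discard(h)
            adj[h].clear()
        for sub in components(adj, comp):
            if len(sub) < min_size:                       # assumption: drop tiny pieces
                continue
            label = classify(adj, sub)                    # step 3
            if label:
                found.append((label, sub))                # step 4: store
            else:
                queue.append(sub)                         # step 4: shatter again
    return found
```

Each round removes at least the edges of the highest-degree vertex in a component, so the loop terminates. A real vocabulary would also need the "near" and "false" variants, which this sketch leaves out.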
1.Structure detection–Structure detection results
Graph            # Structures    fs    st    ch    nc    fc    nb    fb
DBLP                  160,885   76%    5%    2%    2%   15%   <1%     -
WWW-barabasi           15,652   32%   52%    5%    3%    2%    4%    2%
cit-HepPh              14,479   79%   13%    6%    1%   <1%   <1%   <1%
Wikipedia-vote          1,706   65%   33%    2%     -     -   <1%     -
Epinions                8,774   52%   31%   14%   <1%   <1%    2%   <1%
Roadnet PA             51,175   23%   45%   27%     -     -    5%     -
Roadnet CA             88,993   27%   39%   29%     -     -    4%     -
Roadnet TX             62,614   25%   43%   28%     -     -    4%     -
1.Structure detection–Runtime
Compared with the VoG algorithm (Koutra et al. [2]), StructMatrix achieves
better runtime performance and uses a larger vocabulary.
2.Visualization–Projection
After structure detection, we build a structure-to-structure
adjacency matrix whose cell weights give the number of
edges between the nodes of each pair of structures;
Although smaller than the original matrix, for million-scale graphs
the structure matrix is still too large to fit on the screen;
For this reason, we create a dense matrix by a direct proportional
mapping (x, y) → (ρx, ρy):
\[
\rho_x = (Res_x - 1)\,\frac{x - x_{min}}{x_{max} - x_{min}} + \frac{1}{2},
\qquad
\rho_y = (Res_y - 1)\,\frac{y - y_{min}}{y_{max} - y_{min}} + \frac{1}{2}
\tag{1}
\]
where (x, y) are points of the original matrix and Resx, Resy are the
target resolutions; the higher the resolution, the more detail is shown,
so these parameters let the user explore details interactively.
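The proportional mapping can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code: it rounds (Res - 1)(v - v_min)/(v_max - v_min) + 1/2 down to an integer pixel index, and it sums the weights of all original cells that land on the same pixel. Both the rounding and the summing are assumptions, as are the function and parameter names.

```python
from collections import Counter


def project(cells, res_x, res_y):
    """Map sparse matrix cells {(x, y): weight} onto a res_x by res_y grid."""
    xs = [x for x, _ in cells]
    ys = [y for _, y in cells]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)

    def scale(v, v_min, v_max, res):
        if v_max == v_min:              # degenerate axis: everything in pixel 0
            return 0
        # (res - 1) * (v - v_min) / (v_max - v_min) + 1/2, rounded down
        return int((res - 1) * (v - v_min) / (v_max - v_min) + 0.5)

    dense = Counter()
    for (x, y), w in cells.items():
        px = scale(x, x_min, x_max, res_x)
        py = scale(y, y_min, y_max, res_y)
        dense[(px, py)] += w            # assumption: colliding cells are summed
    return dense
```

Raising `res_x` and `res_y` spreads the same cells over more pixels, which is how a larger target resolution exposes more detail.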
2.Visualization–Layout
We order the matrix by structure type and then by number of
edges; the size of each structure (its number of nodes) is encoded by color.
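A minimal sketch of how such a layout could be built from detected structures is given below. It assumes that each node belongs to at most one structure, that edges inside a single structure are not drawn, that structure types follow the column order of the results table, and that structures of the same type are sorted by descending number of edges to other structures. All names are hypothetical.

```python
from collections import Counter

# Assumed type order, taken from the column order of the results table.
TYPE_ORDER = ["fs", "st", "ch", "nc", "fc", "nb", "fb"]


def structure_matrix(edges, structures):
    """Count edges between structures.

    edges: iterable of (u, v) node pairs.
    structures: list of (label, set_of_nodes).
    Returns {(i, j): number of edges between structure i and structure j}.
    """
    owner = {u: i for i, (_, nodes) in enumerate(structures) for u in nodes}
    weights = Counter()
    for u, v in edges:
        a, b = owner.get(u), owner.get(v)
        if a is None or b is None or a == b:
            continue                    # skip unassigned nodes and internal edges
        weights[(a, b)] += 1
        weights[(b, a)] += 1            # the matrix is kept symmetric
    return weights


def layout(structures, weights):
    """Reorder rows and columns by structure type, then by external edge count."""
    external = Counter()
    for (a, _), w in weights.items():
        external[a] += w
    order = sorted(
        range(len(structures)),
        key=lambda i: (TYPE_ORDER.index(structures[i][0]), -external[i]),
    )
    position = {s: p for p, s in enumerate(order)}
    return {(position[a], position[b]): w for (a, b), w in weights.items()}
```

The color channel is left out here; in a plotting step it would be driven by `len(nodes)` for each structure.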
Experiments
Experiments–Real datasets–WWW-barabasi
WWW-barabasi: webpages and the links between them.
Stars (st and fs) correspond to webpages with many outgoing links.
Most webpages have fewer than one thousand connections;
a few, however, have an unusually high number, in the thousands.
Experiments–Real datasets–Road nets
[Figure panels: Pennsylvania, California, Texas]
The three road graphs have a similar structure – all U.S. roads;
There is a hierarchical connectivity: bigger to smaller cities;
Surprising grid-like (due to symmetry) structure: intersections refer to
hub cities, and lines refer to inter-city paths.
Experiments–Real datasets–Road nets
Comparison: Structure-to-structure vs Node-to-node.
California (structure-to-structure) California (node-to-node)
Main differences:
1 The partitioning according to structures;
2 The ordering by number of edges to other structures;
3 There is a hierarchical connectivity: bigger to smaller cities;
4 Surprising grid-like structure: intersections refer to hub cities, and
lines refer to inter-city paths.
Experiments–Real datasets–DBLP
[Figure panels: overall view; FC-FC zoom]
DBLP is mainly characterized by false stars, possibly because
advisors have students, and students connect to one another;
Zooming into the FC-FC region reveals outliers, for instance k3 = “The
Biomolecular Interaction Network Database and related tools 2005
update”, with 75 authors.
Conclusions
Contributions
Visualization technique: we introduce a processing and visualization
methodology that combines algorithmic techniques and design to
reach large-scale visualizations;
Analytical scalability: our technique extends the most scalable
technique found in the literature; moreover, it is engineered to plot millions
of edges in a matter of seconds;
Practical analysis: we show that large-scale graphs have well-defined
behaviors concerning the distribution of structures, their size, and
how they relate to one another; finally, using a standard
laptop, our techniques allowed us to experiment on real, large-scale
graphs from high-impact domains, namely WWW, Wikipedia,
Roadnet, and DBLP.
References
[1] U. Kang and C. Faloutsos, “Beyond ‘caveman communities’: Hubs
and spokes for graph compression and mining,” in ICDM, 2011, pp.
300–309.
[2] D. Koutra, U. Kang, J. Vreeken, and C. Faloutsos, “VoG:
Summarizing and understanding large graphs,” in SDM, 2014.
Jose F Rodrigues-Jr (University of Sao Paulo) 18 / 20

Mais conteúdo relacionado

Destaque

6 7-metodologia depesquisaemcienciadacomputacao-escritadeartigocientifico-plagio
6 7-metodologia depesquisaemcienciadacomputacao-escritadeartigocientifico-plagio6 7-metodologia depesquisaemcienciadacomputacao-escritadeartigocientifico-plagio
6 7-metodologia depesquisaemcienciadacomputacao-escritadeartigocientifico-plagio
Universidade de São Paulo
 
On the Support of a Similarity-Enabled Relational Database Management System ...
On the Support of a Similarity-Enabled Relational Database Management System ...On the Support of a Similarity-Enabled Relational Database Management System ...
On the Support of a Similarity-Enabled Relational Database Management System ...
Universidade de São Paulo
 
Multimodal graph-based analysis over the DBLP repository: critical discoverie...
Multimodal graph-based analysis over the DBLP repository: critical discoverie...Multimodal graph-based analysis over the DBLP repository: critical discoverie...
Multimodal graph-based analysis over the DBLP repository: critical discoverie...
Universidade de São Paulo
 
Techniques for effective and efficient fire detection from social media images
Techniques for effective and efficient fire detection from social media imagesTechniques for effective and efficient fire detection from social media images
Techniques for effective and efficient fire detection from social media images
Universidade de São Paulo
 
Vertex Centric Asynchronous Belief Propagation Algorithm for Large-Scale Graphs
Vertex Centric Asynchronous Belief Propagation Algorithm for Large-Scale GraphsVertex Centric Asynchronous Belief Propagation Algorithm for Large-Scale Graphs
Vertex Centric Asynchronous Belief Propagation Algorithm for Large-Scale Graphs
Universidade de São Paulo
 

Destaque (14)

Unveiling smoke in social images with the SmokeBlock approach
Unveiling smoke in social images with the SmokeBlock approachUnveiling smoke in social images with the SmokeBlock approach
Unveiling smoke in social images with the SmokeBlock approach
 
6 7-metodologia depesquisaemcienciadacomputacao-escritadeartigocientifico-plagio
6 7-metodologia depesquisaemcienciadacomputacao-escritadeartigocientifico-plagio6 7-metodologia depesquisaemcienciadacomputacao-escritadeartigocientifico-plagio
6 7-metodologia depesquisaemcienciadacomputacao-escritadeartigocientifico-plagio
 
On the Support of a Similarity-Enabled Relational Database Management System ...
On the Support of a Similarity-Enabled Relational Database Management System ...On the Support of a Similarity-Enabled Relational Database Management System ...
On the Support of a Similarity-Enabled Relational Database Management System ...
 
Visualizing Networks
Visualizing NetworksVisualizing Networks
Visualizing Networks
 
Supervised-Learning Link Recommendation in the DBLP co-authoring network
Supervised-Learning Link Recommendation in the DBLP co-authoring networkSupervised-Learning Link Recommendation in the DBLP co-authoring network
Supervised-Learning Link Recommendation in the DBLP co-authoring network
 
Multimodal graph-based analysis over the DBLP repository: critical discoverie...
Multimodal graph-based analysis over the DBLP repository: critical discoverie...Multimodal graph-based analysis over the DBLP repository: critical discoverie...
Multimodal graph-based analysis over the DBLP repository: critical discoverie...
 
Techniques for effective and efficient fire detection from social media images
Techniques for effective and efficient fire detection from social media imagesTechniques for effective and efficient fire detection from social media images
Techniques for effective and efficient fire detection from social media images
 
Fire Detection on Unconstrained Videos Using Color-Aware Spatial Modeling and...
Fire Detection on Unconstrained Videos Using Color-Aware Spatial Modeling and...Fire Detection on Unconstrained Videos Using Color-Aware Spatial Modeling and...
Fire Detection on Unconstrained Videos Using Color-Aware Spatial Modeling and...
 
Graph-based Relational Data Visualization
Graph-based RelationalData VisualizationGraph-based RelationalData Visualization
Graph-based Relational Data Visualization
 
Fast Billion-scale Graph Computation Using a Bimodal Block Processing Model
Fast Billion-scale Graph Computation Using a Bimodal Block Processing ModelFast Billion-scale Graph Computation Using a Bimodal Block Processing Model
Fast Billion-scale Graph Computation Using a Bimodal Block Processing Model
 
Vertex Centric Asynchronous Belief Propagation Algorithm for Large-Scale Graphs
Vertex Centric Asynchronous Belief Propagation Algorithm for Large-Scale GraphsVertex Centric Asynchronous Belief Propagation Algorithm for Large-Scale Graphs
Vertex Centric Asynchronous Belief Propagation Algorithm for Large-Scale Graphs
 
Dawarehouse e OLAP
Dawarehouse e OLAPDawarehouse e OLAP
Dawarehouse e OLAP
 
Complexidade de Algoritmos, Notação assintótica, Algoritmos polinomiais e in...
Complexidade de Algoritmos, Notação assintótica, Algoritmos polinomiais e in...Complexidade de Algoritmos, Notação assintótica, Algoritmos polinomiais e in...
Complexidade de Algoritmos, Notação assintótica, Algoritmos polinomiais e in...
 
A Fast and Dirty Intro to NetworkX (and D3)
A Fast and Dirty Intro to NetworkX (and D3)A Fast and Dirty Intro to NetworkX (and D3)
A Fast and Dirty Intro to NetworkX (and D3)
 

Semelhante a StructMatrix: large-scale visualization of graphs by means of structure detection and dense matrices

Poster Abstracts
Poster AbstractsPoster Abstracts
Poster Abstracts
butest
 
Fault detection and diagnosis for non-Gaussian stochastic distribution system...
Fault detection and diagnosis for non-Gaussian stochastic distribution system...Fault detection and diagnosis for non-Gaussian stochastic distribution system...
Fault detection and diagnosis for non-Gaussian stochastic distribution system...
ISA Interchange
 
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
IJEACS
 
accessible-streaming-algorithms
accessible-streaming-algorithmsaccessible-streaming-algorithms
accessible-streaming-algorithms
Farhan Zaki
 

Semelhante a StructMatrix: large-scale visualization of graphs by means of structure detection and dense matrices (20)

Traffic Outlier Detection by Density-Based Bounded Local Outlier Factors
Traffic Outlier Detection by Density-Based Bounded Local Outlier FactorsTraffic Outlier Detection by Density-Based Bounded Local Outlier Factors
Traffic Outlier Detection by Density-Based Bounded Local Outlier Factors
 
Poster Abstracts
Poster AbstractsPoster Abstracts
Poster Abstracts
 
NS-CUK Journal club: HELee, Review on "Graph embedding on biomedical networks...
NS-CUK Journal club: HELee, Review on "Graph embedding on biomedical networks...NS-CUK Journal club: HELee, Review on "Graph embedding on biomedical networks...
NS-CUK Journal club: HELee, Review on "Graph embedding on biomedical networks...
 
EffectiveOcclusion Handling for Fast Correlation Filter-based Trackers
EffectiveOcclusion Handling for Fast Correlation Filter-based TrackersEffectiveOcclusion Handling for Fast Correlation Filter-based Trackers
EffectiveOcclusion Handling for Fast Correlation Filter-based Trackers
 
Object Detection with Computer Vision
Object Detection with Computer VisionObject Detection with Computer Vision
Object Detection with Computer Vision
 
A framework for outlier detection in
A framework for outlier detection inA framework for outlier detection in
A framework for outlier detection in
 
A Novel Feature Selection with Annealing For Computer Vision And Big Data Lea...
A Novel Feature Selection with Annealing For Computer Vision And Big Data Lea...A Novel Feature Selection with Annealing For Computer Vision And Big Data Lea...
A Novel Feature Selection with Annealing For Computer Vision And Big Data Lea...
 
NRNB Annual Report 2012
NRNB Annual Report 2012NRNB Annual Report 2012
NRNB Annual Report 2012
 
Detecting outliers and anomalies in data streams
Detecting outliers and anomalies in data streamsDetecting outliers and anomalies in data streams
Detecting outliers and anomalies in data streams
 
Sunbelt 2013 Presentation
Sunbelt 2013 PresentationSunbelt 2013 Presentation
Sunbelt 2013 Presentation
 
An improved particle filter tracking
An improved particle filter trackingAn improved particle filter tracking
An improved particle filter tracking
 
An information-theoretic, all-scales approach to comparing networks
An information-theoretic, all-scales approach to comparing networksAn information-theoretic, all-scales approach to comparing networks
An information-theoretic, all-scales approach to comparing networks
 
LCF: A Temporal Approach to Link Prediction in Dynamic Social Networks
 LCF: A Temporal Approach to Link Prediction in Dynamic Social Networks LCF: A Temporal Approach to Link Prediction in Dynamic Social Networks
LCF: A Temporal Approach to Link Prediction in Dynamic Social Networks
 
Graphical Structure Learning accelerated with POWER9
Graphical Structure Learning accelerated with POWER9Graphical Structure Learning accelerated with POWER9
Graphical Structure Learning accelerated with POWER9
 
Fault detection and diagnosis for non-Gaussian stochastic distribution system...
Fault detection and diagnosis for non-Gaussian stochastic distribution system...Fault detection and diagnosis for non-Gaussian stochastic distribution system...
Fault detection and diagnosis for non-Gaussian stochastic distribution system...
 
Top Cited Articles in Data Mining - International Journal of Data Mining & Kn...
Top Cited Articles in Data Mining - International Journal of Data Mining & Kn...Top Cited Articles in Data Mining - International Journal of Data Mining & Kn...
Top Cited Articles in Data Mining - International Journal of Data Mining & Kn...
 
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
 
Nguyen - Science of Information, Computation and Fusion - Spring Review 2013
Nguyen - Science of Information, Computation and Fusion - Spring Review 2013Nguyen - Science of Information, Computation and Fusion - Spring Review 2013
Nguyen - Science of Information, Computation and Fusion - Spring Review 2013
 
accessible-streaming-algorithms
accessible-streaming-algorithmsaccessible-streaming-algorithms
accessible-streaming-algorithms
 
Retinal Blood Vessels Exudates Classification For Detection Of Hemmorages Tha...
Retinal Blood Vessels Exudates Classification For Detection Of Hemmorages Tha...Retinal Blood Vessels Exudates Classification For Detection Of Hemmorages Tha...
Retinal Blood Vessels Exudates Classification For Detection Of Hemmorages Tha...
 

Mais de Universidade de São Paulo

Metric s plat - a platform for quick development testing and visualization of...
Metric s plat - a platform for quick development testing and visualization of...Metric s plat - a platform for quick development testing and visualization of...
Metric s plat - a platform for quick development testing and visualization of...
Universidade de São Paulo
 
Hierarchical visual filtering pragmatic and epistemic actions for database vi...
Hierarchical visual filtering pragmatic and epistemic actions for database vi...Hierarchical visual filtering pragmatic and epistemic actions for database vi...
Hierarchical visual filtering pragmatic and epistemic actions for database vi...
Universidade de São Paulo
 

Mais de Universidade de São Paulo (11)

A gentle introduction to Deep Learning
A gentle introduction to Deep LearningA gentle introduction to Deep Learning
A gentle introduction to Deep Learning
 
Computação: carreira e mercado de trabalho
Computação: carreira e mercado de trabalhoComputação: carreira e mercado de trabalho
Computação: carreira e mercado de trabalho
 
Introdução às ferramentas de Business Intelligence do ecossistema Hadoop
Introdução às ferramentas de Business Intelligence do ecossistema HadoopIntrodução às ferramentas de Business Intelligence do ecossistema Hadoop
Introdução às ferramentas de Business Intelligence do ecossistema Hadoop
 
Metric s plat - a platform for quick development testing and visualization of...
Metric s plat - a platform for quick development testing and visualization of...Metric s plat - a platform for quick development testing and visualization of...
Metric s plat - a platform for quick development testing and visualization of...
 
Hierarchical visual filtering pragmatic and epistemic actions for database vi...
Hierarchical visual filtering pragmatic and epistemic actions for database vi...Hierarchical visual filtering pragmatic and epistemic actions for database vi...
Hierarchical visual filtering pragmatic and epistemic actions for database vi...
 
Java generics-basics
Java generics-basicsJava generics-basics
Java generics-basics
 
Java collections-basic
Java collections-basicJava collections-basic
Java collections-basic
 
Java network-sockets-etc
Java network-sockets-etcJava network-sockets-etc
Java network-sockets-etc
 
Java streams
Java streamsJava streams
Java streams
 
Infovis tutorial
Infovis tutorialInfovis tutorial
Infovis tutorial
 
Java platform
Java platformJava platform
Java platform
 

Último

Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
shivangimorya083
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
shambhavirathore45
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
MarinCaroMartnezBerg
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
shivangimorya083
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
Lars Albertsson
 

Último (20)

Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 

StructMatrix: large-scale visualization of graphs by means of structure detection and dense matrices

  • 1. StructMatrix: large-scale visualization of graphs by means of structure detection and dense matrices Hugo Gualdron, Robson L. F. Cordeiro, Jose F Rodrigues-Jr University of Sao Paulo In collaboration with Carnegie Mellon University (Prof. Christos Faloutsos, and PhD Danai Koutra) Funding by research agency Fapesp (2013/03906-0, 2014/07879-0, 2015/18335) In: The Fifth IEEE ICDM Workshop on Data Mining in Networks, Atlantic City, NJ, USA - November, 2015 http://www.icmc.usp.br/pessoas/junio Jose F Rodrigues-Jr (University of Sao Paulo) 1 / 20
  • 2. Introduction Motivation Big Data!!! A lot of information, much of it in the form of relationships; Large-scale graphs: graphs generated by applications in which users or entities are distributed along large geographical areas - even the entire planet; Social networks, recommendation networks, road nets, e-commerce, computer networks, client-product logs, and many others. Data analysis is the differential for industrial competition. General Electric & Accenture. Jose F Rodrigues-Jr (University of Sao Paulo) 2 / 20
  • 3. Introduction Problem Such graphs are too big: node-link visualization cannot handle even thousand-vertices graphs; adjacency matrices are limited by the number of pixels of the screen; in any case, the cardinality of the nodes prevents rationalization; non-visual analytical techniques might produce way too many patterns preventing human cognition. Still, we want to characterize the structure of graphs for: understanding the overall structure, and not only the distribution-based analyses; spotting outliers and trends that are not dominant; requesting details on demand concerning subregions of the graph topology. Jose F Rodrigues-Jr (University of Sao Paulo) 3 / 20
  • 4. Introduction: Problem. [Figure: node-link and adjacency-matrix layouts. Scalability: node-link, hundreds of nodes; adjacency matrix, thousands of nodes.]
  • 5. Introduction: Methodology overview. Assumptions: graphs are made of recurrent simple structures (cliques, bipartite cores, stars, and chains); such structures are more meaningful than individual nodes; even at lower resolutions, a visualization preserves the main properties of the graph. Hypothesis: we reach more scalable and meaningful graph visualizations by combining graph summarization, which detects the recurrent structures of the graph, with dense adjacency matrices.
  • 6. Methodology: Proposed method, StructMatrix. Our method has two parts: (1) an algorithm to detect substructures; (2) a dense adjacency matrix of the detected structures.
  • 7. Methodology: 1. Structure detection.
  • 8. Methodology: 1. Structure detection. We designed a graph-partitioning algorithm based on the fact that real-world graphs follow power-law degree distributions: a few nodes have very high degree and the majority have low degree. Kang and Faloutsos [1] demonstrated that removing the highest-degree nodes in order strips the hubs from the giant connected component (CC), creating much smaller satellite connected components. This ordered removal lends itself to a structural scan of the graph.
  • 9. Methodology: 1. Structure detection, structure vocabulary. [Figure: the StructMatrix vocabulary ψ.]
  • 10. Methodology: 1. Structure detection, algorithm, step 1. If the queue contains connected components, StructMatrix takes the first one for processing.
  • 11. Methodology: 1. Structure detection, algorithm, step 2. StructMatrix selects the highest-degree vertices (up to 1% of the vertices) and removes their edges.
  • 12. Methodology: 1. Structure detection, algorithm, step 2. The result is a set of smaller connected subcomponents.
  • 13. Methodology: 1. Structure detection, algorithm, step 3. The subcomponents are classified according to the vocabulary.
  • 14. Methodology: 1. Structure detection, structure classification. $\alpha = \frac{n^2}{4}$, $\beta = \frac{n(n-1)}{2}$, and a further parameter set to $0.2$.
  • 15. Methodology: 1. Structure detection, algorithm, step 4. The classified subcomponents are stored; those that were not identified go back to the queue to wait for a new round of shattering.
  • 16. Methodology: 1. Structure detection, algorithm, step 5. Processing continues with the next element in the queue.
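The five steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`build_adj`, `components`, `classify`, `detect_structures`), the `min_size` cutoff, and the choice to drop hub vertices entirely (rather than only their edges) are all hypothetical. The classifier recognizes only exact cliques, stars, and chains; the "false" and "near" variants and the bipartite cores of the vocabulary are left out because the slides do not give their rules.

```python
from collections import deque


def build_adj(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def components(nodes, adj):
    """Connected components of the subgraph induced by the set `nodes`."""
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        comp, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in adj[u] & nodes:
                if v not in seen:
                    seen.add(v)
                    comp.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps


def classify(comp, adj):
    """Label a connected component as an exact clique, star, or chain (else None)."""
    n = len(comp)
    deg = {u: len(adj[u] & comp) for u in comp}
    m = sum(deg.values()) // 2              # number of edges inside comp
    if m == n * (n - 1) // 2:
        return "clique"
    if m == n - 1 and max(deg.values()) == n - 1:
        return "star"
    if m == n - 1 and max(deg.values()) <= 2:
        return "chain"
    return None


def detect_structures(adj, hub_fraction=0.01, min_size=3):
    """Queue-driven shattering: remove hubs, classify what falls out, requeue the rest."""
    queue = deque(components(set(adj), adj))
    found = []
    while queue:
        comp = queue.popleft()                                    # step 1
        deg = {u: len(adj[u] & comp) for u in comp}
        k = max(1, int(hub_fraction * len(comp)))                 # assumption: at least one hub
        hubs = set(sorted(comp, key=lambda u: (-deg[u], u))[:k])  # step 2
        for sub in components(comp - hubs, adj):
            if len(sub) < min_size:                               # assumption: ignore tiny leftovers
                continue
            label = classify(sub, adj)                            # step 3
            if label is not None:
                found.append((label, sub))                        # step 4: store
            else:
                queue.append(sub)                                 # step 4: new round of shattering
    return found                                                  # step 5 is the loop itself
```

Each round removes at least one vertex from the component it processes, so the loop always terminates. Vertex identifiers need to be mutually comparable (for example, integers) because ties in degree are broken by identifier.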
  • 17. Methodology: 1. Structure detection, structure detection results.
      Graph            # Structures   fs    st    ch    nc    fc    nb    fb
      DBLP                  160,885   76%   5%    2%    2%    15%   <1%   -
      WWW-barabasi           15,652   32%   52%   5%    3%    2%    4%    2%
      cit-HepPh              14,479   79%   13%   6%    1%    <1%   <1%   <1%
      Wikipedia-vote          1,706   65%   33%   2%    -     -     <1%   -
      Epinions                8,774   52%   31%   14%   <1%   <1%   2%    <1%
      Roadnet PA             51,175   23%   45%   27%   -     -     5%    -
      Roadnet CA             88,993   27%   39%   29%   -     -     4%    -
      Roadnet TX             62,614   25%   43%   28%   -     -     4%    -
  • 18. Methodology: 1. Structure detection, runtime. Compared with the VoG algorithm (Koutra et al. [2]), StructMatrix has better performance and a bigger vocabulary.
  • 19. Methodology: 2. Visualization, projection. After structure detection, we build a structure-to-structure adjacency matrix whose edge weights give the number of edges between the nodes of each pair of structures. Although smaller than the original matrix, for million-scale graphs this structure matrix is still too large to fit on the screen. For this reason we create a dense matrix by the proportional mapping $(x, y) \to (\rho_x, \rho_y)$, with $\rho_x = (Res_x - 1)\,\frac{x - x_{min}}{x_{max} - x_{min}} + \frac{1}{2}$ and $\rho_y = (Res_y - 1)\,\frac{y - y_{min}}{y_{max} - y_{min}} + \frac{1}{2}$ (1), where $(x, y)$ are points of the original matrix and $Res_x$, $Res_y$ are the target resolutions. The higher the resolution, the more detail is shown; these parameters allow details to be explored interactively.
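A small sketch of how the mapping in Eq. (1) could be applied. It rests on two assumptions that the slide does not state: the result is truncated to an integer pixel index (so the added 1/2 acts as rounding to the nearest pixel), and weights of cells that land on the same pixel are summed. The names `project` and `scale` and the dictionary input format are hypothetical.

```python
def scale(v, v_min, v_max, res):
    """Eq. (1) for one axis, truncated to an integer pixel index (assumption)."""
    if v_max == v_min:          # degenerate axis: everything maps to pixel 0
        return 0
    return int((res - 1) * (v - v_min) / (v_max - v_min) + 0.5)


def project(cells, res_x, res_y):
    """Project sparse matrix cells {(x, y): weight} onto a res_y-by-res_x dense grid."""
    xs = [x for x, _ in cells]
    ys = [y for _, y in cells]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    dense = [[0] * res_x for _ in range(res_y)]
    for (x, y), weight in cells.items():
        px = scale(x, x_min, x_max, res_x)
        py = scale(y, y_min, y_max, res_y)
        dense[py][px] += weight  # assumption: colliding cells are summed
    return dense
```

For instance, projecting a 10-by-10 coordinate range onto a 2-by-2 grid sends coordinates 0 and 1 to pixel 0 and coordinates 8 and 9 to pixel 1; raising `res_x` and `res_y` separates them again, which is the interactive trade-off the slide describes.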
  • 21. Methodology: 2. Visualization, layout. We organize the matrix by structure type and by number of edges; the size of each structure (its number of nodes) is given by color.
  • 23. Experiments: Real datasets. The structure detection results table from slide 17 is shown again for the eight real graphs: DBLP, WWW-barabasi, cit-HepPh, Wikipedia-vote, Epinions, Roadnet PA, Roadnet CA, and Roadnet TX.
  • 24. Experiments: Real datasets, WWW-barabasi. WWW-barabasi: webpages and the links between them. Stars (st and fs) correspond to webpages with many out-links. Most webpages have fewer than one thousand connections; however, a few stand out with connections in the thousands.
  • 25. Experiments: Real datasets, road nets. [Figure panels: Pennsylvania, California, Texas.] The three road graphs have a similar structure, as all are U.S. road networks. There is hierarchical connectivity, from bigger to smaller cities. A surprising grid-like structure (due to symmetry) appears: intersections correspond to hub cities, and lines correspond to inter-city paths.
  • 26. Experiments: Real datasets, road nets. Comparison of structure-to-structure vs. node-to-node. [Figure panels: California (structure-to-structure), California (node-to-node).] Main differences: (1) the partitioning according to structures; (2) the ordering by number of edges to other structures; (3) hierarchical connectivity, from bigger to smaller cities; (4) a surprising grid-like structure, in which intersections correspond to hub cities and lines to inter-city paths.
  • 27. Experiments: Real datasets, DBLP. [Figure panels: overall view, FC-FC zoom.] DBLP is mainly characterized by false stars, possibly because advisors have students and the students connect to one another. Zooming into the FC-FC region reveals outliers, for instance k3 = “The Biomolecular Interaction Network Database and related tools 2005 update”, with 75 authors.
  • 28. Conclusions: Contributions. Visualization technique: we introduce a processing and visualization methodology that combines algorithmic techniques and design to reach large-scale visualizations. Analytical scalability: our technique extends the most scalable technique found in the literature and is engineered to plot millions of edges in a matter of seconds. Practical analysis: we show that large-scale graphs have well-defined behaviors concerning the distribution of structures, their sizes, and how they relate to one another; using a standard laptop, our technique allowed us to experiment on real, large-scale graphs from high-impact domains, namely WWW, Wikipedia, Roadnet, and DBLP.
  • 29. References. [1] U. Kang and C. Faloutsos, “Beyond ‘caveman communities’: Hubs and spokes for graph compression and mining,” in ICDM, 2011, pp. 300–309. [2] D. Koutra, U. Kang, J. Vreeken, and C. Faloutsos, “VoG: Summarizing and understanding large graphs,” in SDM, 2014.