SlideShare uma empresa Scribd logo
1 de 36
Baixar para ler offline
Tweet Along @dgleich


                                                                                                                              Spectra
                                                                                                                                   of
                                                                                                                                Large
                                                                                                                             Networks

                                                                                                     David F. Gleich
                                                                                        Sandia National Laboratories

                                                                                                        iCME la/opt seminar
                                                                                                          24 February 2011

Thanks to Ali Pinar, Jaideep Ray, Tammy                     Supported by Sandia’s John von Neumann postdoctoral fellowship
Kolda, C. Seshadhri, Rich Lehoucq, and                                and the DOE Office of Science’s ASCR Graphs project.
Jure Leskovec for helpful discussions.

Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin
                       Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.
2/24/2011                                                  Stanford ICME seminar                                                           1/36
Tweet Along @dgleich



There’s information inside the spectra




      Words in dictionary definitions                      Internet router network
      111,000 vertices, 2.7M edges                        192k vetices, 1.2M edges
                                         Also noted in a study of smaller graphs by Banerjee and Jost (2009)
2/24/2011                         Stanford ICME seminar                                                2/36
Tweet Along @dgleich



Overview
Graphs and their matrices

Data for our experiments

Computing spectra for large networks

Issues with computing spectra

Many examples of graph spectra

Conclusion Future work
              Images taken from Stanford, flickr, and Purdue, respectively
2/24/2011                            Stanford ICME seminar                                  3/36
Tweet Along @dgleich




              GRAPHS            As well as things we
            AND THEIR           already know about
                                graph spectra.
             MATRICES




2/24/2011        Stanford ICME seminar                                  4
Tweet Along @dgleich



Matrices from graphs
Adjacency matrix                                Not covered
                                                Signless Laplacian matrix

       if                                       Incidence matrix
                                                 (It is incidentally discussed)

                                                Seidel matrix
Laplacian matrix
                                                Random walk matrix
                                                 
 
                                                Modularity matrix
                                                 
Normalized Laplacian matrix                      
 


                                                                     Everything is undirected.
2/24/2011               Stanford ICME seminar                                            5/36
Tweet Along @dgleich



Why are we interested in the spectra?
Modeling

Properties
  Moments of the adjacency

Anomalies

Regularities

Network Comparison
  Fay et al. 2010 – Weighted Spectral Density
  The network is as19971108 from Jure’s snap collect (a few thousand nodes) and we insert random connections from 50 nodes
2/24/2011                                         Stanford ICME seminar                                               6/36
Tweet Along @dgleich



Spectral bounds from Gerschgorin

 

 

 
    (from a slightly different approach)




2/24/2011                   Stanford ICME seminar                  7/36
Tweet Along @dgleich



Semi-circle law
Wigner’s semi-circle law
 Eigenvalues of a random symmetric matrix where
 each entry is independent have a semi-circle density




Erdős–Rényi graphs obey a special case
                                                  Image from Mathworld
2/24/2011               Stanford ICME seminar                     8/36
Tweet Along @dgleich



Erdős–Rényi Semi-circles
The eigenvalues of the adjacency matrix for
  n=1000, averaged over 10 trials
Semi-circle with outlier if average degree is large enough.




                   Observed by Farkas and in the book “Network Alignment” edited by Brandes (Chapter 14)
2/24/2011                       Stanford ICME seminar                                               9/36
Tweet Along @dgleich



Previous results
Farkas et al. : Significant deviation from the
  semi-circle law for the adjacency matrix

Mihail and Papadimitriou : Leading eigenvalues
  of the adjacency matrix obey a power-law
  based on the degree-sequence

Chung et al. : Normalized Laplacian still obeys
  a semi-circle law

Banerjee and Jost : Study of types of patterns
  that emerge in evolving graph models –
  explain many features of the spectra
2/24/2011                  Stanford ICME seminar                 10/36
Tweet Along @dgleich



In comparison …
We use “exact” computation of spectra,
 instead of approximation.

We study “all” of the standard matrices
 over a range of large networks.

Our “large” is bigger.

We look at a few different random graph
 models.



2/24/2011                Stanford ICME seminar                 11/36
Tweet Along @dgleich




            DATA




2/24/2011   Stanford ICME seminar                   12
Tweet Along @dgleich



Data sources
 SNAP               Various                       100s-100,000s
 SNAP-p2p           Gnutella Network              5-60k, ~30 inst.
 SNAP-as-733        Autonomous Sys.               ~5,000, 733 inst.
 SNAP-caida         Router networks               ~20,000, ~125 inst.
 Pajek              Various                       100s-100,000s
 Models             Copying Model                 1k-100k 9 inst. 324 gs
                    Pref. Attach                  1k-100k 9 inst. 164 gs
                    Forest Fire                   1k-100k 9 inst. 324 gs
 Mine               Various                       2k-500k
 Newman             Various
 Arenas             Various
 Porter             Facebook                      100 schools, 5k-60k
 IsoRank, Natalie   Protein-Protein               <10k , 4 graphs
                                                     Thanks to all who make data available
2/24/2011                 Stanford ICME seminar                                      13/36
Tweet Along @dgleich



Big graphs
Arxiv             86376               1035126      Co-authorship
Dblp              9356                356290       Co-authorship
Dictionary(*)     111982              2750576      Word definitions
Internet(*)       124651              414428       Routers
Itdk0304          190914              1215220      Routers
p2p-Gnutella(*)   62561               295756       Peer-to-peer
Patents(*)        230686              1109898      Citations
Roads             126146              323900       Roads
Wordnet(*)        75606               240036       Word relationship
Web-NotreDame     325729              2994268      Web




2/24/2011                  Stanford ICME seminar                   14/36
Tweet Along @dgleich



Models
Preferential Attachment
Start graph with a k-node clique. Add a new node and
  connect to k random nodes, chosen proportional to degree.

Copying model
Start graph with a k-node clique. Add a new node and pick a
  parent uniformly at random. Copy edges of parent and
  make an error with probability  

Forest Fire
Start graph with a k-node clique. Add a new node and pick a
  parent uniformly at random. Do a random “bfs’/”forest fire”
  and link to all nodes “burned”
2/24/2011                Stanford ICME seminar                 15/36
Tweet Along @dgleich




                COMPUTING
                 SPECTRA OF
            LARGE NETWORKS




2/24/2011              Stanford ICME seminar                   16
Tweet Along @dgleich



Matlab!
Always a great starting point.
  My desktop has 24GB of RAM (less than $2500 now!)

24GB/8 bytes (per double) = 3 billion numbers
  ~ 50,000-by-50,000 matrix

Possibilities
  D = eig(A) – needs twice the memory for A,D
  [V,D] = eig(A) – needs three times the memory for A,D,V

These limit us to ~38000 and ~31000 respectively.


2/24/2011               Stanford ICME seminar                 17/36
Tweet Along @dgleich



ScaLAPACK
LAPACK with distributed memory dense matrices
Scalapack uses a 2d block-cyclic dense matrix distribution




2/24/2011                Stanford ICME seminar                 18/36
Tweet Along @dgleich



Eigenvalues with ScaLAPACK
Mostly the same approach as in LAPACK

1. Reduce to tridiagonal form
   (most time consuming part)
2. Distribute tridiagonals to
    all processors
3. Each processor finds
   all eigenvalues
4. Each processor computes a
   subset of eigenvectors

Using the MRRR algorithm steps 3 and 4 are more intricate
and faster.            MRRR due to Parlett and Dhillon; implemented in ScaLAPACK by Christof Vomel.
2/24/2011                                Stanford ICME seminar                                 19/36
Tweet Along @dgleich



Alternatives
Use ARPACK to get extrema

Use ARPACK to get interior around                via the folded spectrum
 




                                                         Farkas et al. used this approach.
2/24/2011                Stanford ICME seminar                                      20/36
Tweet Along @dgleich




            ISSUES WITH
            COMPUTING
                SPECTRA




2/24/2011          Stanford ICME seminar                   21
Tweet Along @dgleich



Bugs - Matlab


eig(A)
Returns incorrect eigenvectors




                                                 Seems to be the result of a bug in Intel’s MKL library.
2/24/2011                Stanford ICME seminar                                                    22/36
Tweet Along @dgleich



Bug – ScaLAPACK default

sudo apt-get install scalapack-openmpi
Allocate 36000x36000 local matrix
Run on 4 processors

Code crashes




2/24/2011               Stanford ICME seminar                 23/36
Tweet Along @dgleich



Bug - LAPACK

Scalapack MRRR
Compare standard lapack/blas to atlas performance
Result: correct output from atlas
Result: incorrect output from lapack
Hypothesis: lapack contains a known bug that’s apparently in
  the default ubuntu lapack




2/24/2011                Stanford ICME seminar                 24/36
Tweet Along @dgleich



Moral


Always test your software.
Extensively.




2/24/2011                Stanford ICME seminar                 25/36
Tweet Along @dgleich



(Super)-Computers
Redsky
  2x Intel Nehalem 2.9 GHz/8 core
  12 GB/node
  22TB memory total
  used up to 500 nodes/
      6 TB memory

Hopper (I)
  2x AMD quad-core
      2.4 GHz/8 cores
  16 GB/node

Cielo (testbed) 20 nodes
2/24/2011                  Stanford ICME seminar                 26/36
Tweet Along @dgleich



Adding MPI tasks vs. using threads
Most math libraries have threaded versions
   (Intel MKL, AMD ACML)
Is it better to use threads or MPI tasks?

It depends.



                                              Threads          Ranks             Time
Threads Ranks    Time-T   Time-E
                                              1                64                1412.5
1           36   1271.4   339.0
                                              4                16                1881.4
4           16   1058.1   456.6
                                              16               4                 Omitted.

                                  Normalized Laplacian for 36k-by-36k co-author graph of CondMat
2/24/2011                 Stanford ICME seminar                                            27/36
Tweet Along @dgleich



    Strong Parallel Scaling
                                             Time  
Time in hours




                                             Good strong
                                               scaling up to
                                               325,000 vertices

                                             Estimated time for
                                               500,000 nodes
                                               9 hours with
                                               925 nodes
                                               (7400 procs)
                  
    2/24/2011        Stanford ICME seminar                          28/36
Tweet Along @dgleich




            EXAMPLES




2/24/2011       Stanford ICME seminar                   29
Tweet Along @dgleich



A $40,000 matrix computation




2/24/2011     Stanford ICME seminar                 30/36
Tweet Along @dgleich



Spikes?




2/24/2011   Stanford ICME seminar                 31/36
Tweet Along @dgleich



Nullspaces of the adjacency matrix
 

So unit eigenvalues of the normalized Laplacian are null-
  vectors of the adjacency matrix.




2/24/2011                Stanford ICME seminar                 32/36
Tweet Along @dgleich



Stanford’s Facebook Network




                                       Data from Mason Porter
2/24/2011     Stanford ICME seminar                     33/36
Tweet Along @dgleich



Movies of spectra…
As these models evolve, what do the spectra look like?




2/24/2011                Stanford ICME seminar                 34/36
Tweet Along @dgleich




2/24/2011   Stanford ICME seminar                 35/36
Code will be available eventually. Image from good financial cents.
2/24/2011                            Stanford ICME seminar                        36 of <Total>

Mais conteúdo relacionado

Semelhante a Spectra of Large Network

Middleware Solutions for Simulation & Modeling
Middleware Solutions for Simulation & Modeling Middleware Solutions for Simulation & Modeling
Middleware Solutions for Simulation & Modeling
Leila Jalali
 
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel PlatformsSTING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
Jason Riedy
 
Victoria A. White Head, Computing Division Fermilab
Victoria A. White Head, Computing Division FermilabVictoria A. White Head, Computing Division Fermilab
Victoria A. White Head, Computing Division Fermilab
Videoguy
 
Cloud Programming Models: eScience, Big Data, etc.
Cloud Programming Models: eScience, Big Data, etc.Cloud Programming Models: eScience, Big Data, etc.
Cloud Programming Models: eScience, Big Data, etc.
Alexandru Iosup
 
論文サーベイ(Sasaki)
論文サーベイ(Sasaki)論文サーベイ(Sasaki)
論文サーベイ(Sasaki)
Hajime Sasaki
 

Semelhante a Spectra of Large Network (20)

The Spectre of the Spectra
The Spectre of the SpectraThe Spectre of the Spectra
The Spectre of the Spectra
 
The spectre of the spectrum
The spectre of the spectrumThe spectre of the spectrum
The spectre of the spectrum
 
Emc 2013 Big Data in Astronomy
Emc 2013 Big Data in AstronomyEmc 2013 Big Data in Astronomy
Emc 2013 Big Data in Astronomy
 
A Dynamic Topic Model of Learning Analytics Research
A Dynamic Topic Model of Learning Analytics ResearchA Dynamic Topic Model of Learning Analytics Research
A Dynamic Topic Model of Learning Analytics Research
 
TensorFlow London: Cutting edge generative models
TensorFlow London: Cutting edge generative modelsTensorFlow London: Cutting edge generative models
TensorFlow London: Cutting edge generative models
 
Deep learning 1.0 and Beyond, Part 1
Deep learning 1.0 and Beyond, Part 1Deep learning 1.0 and Beyond, Part 1
Deep learning 1.0 and Beyond, Part 1
 
Coupling Australia’s Researchers to the Global Innovation Economy
Coupling Australia’s Researchers to the Global Innovation EconomyCoupling Australia’s Researchers to the Global Innovation Economy
Coupling Australia’s Researchers to the Global Innovation Economy
 
Simulation Informatics
Simulation InformaticsSimulation Informatics
Simulation Informatics
 
An End-to-End Campus-Scale High Performance Cyberinfrastructure for Data-Inte...
An End-to-End Campus-Scale High Performance Cyberinfrastructure for Data-Inte...An End-to-End Campus-Scale High Performance Cyberinfrastructure for Data-Inte...
An End-to-End Campus-Scale High Performance Cyberinfrastructure for Data-Inte...
 
Middleware Solutions for Simulation & Modeling
Middleware Solutions for Simulation & Modeling Middleware Solutions for Simulation & Modeling
Middleware Solutions for Simulation & Modeling
 
Neural Networks, Spark MLlib, Deep Learning
Neural Networks, Spark MLlib, Deep LearningNeural Networks, Spark MLlib, Deep Learning
Neural Networks, Spark MLlib, Deep Learning
 
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel PlatformsSTING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
 
The Future of the Internet and its Impact on Digitally Enabled Genomic Medicine
The Future of the Internet and its Impact on Digitally Enabled Genomic MedicineThe Future of the Internet and its Impact on Digitally Enabled Genomic Medicine
The Future of the Internet and its Impact on Digitally Enabled Genomic Medicine
 
Victoria A. White Head, Computing Division Fermilab
Victoria A. White Head, Computing Division FermilabVictoria A. White Head, Computing Division Fermilab
Victoria A. White Head, Computing Division Fermilab
 
A Pocket Dictionary of Tomorrow’s Electronics_Franz_IPC-TLP2021.pdf
A Pocket Dictionary of Tomorrow’s Electronics_Franz_IPC-TLP2021.pdfA Pocket Dictionary of Tomorrow’s Electronics_Franz_IPC-TLP2021.pdf
A Pocket Dictionary of Tomorrow’s Electronics_Franz_IPC-TLP2021.pdf
 
Cloud Programming Models: eScience, Big Data, etc.
Cloud Programming Models: eScience, Big Data, etc.Cloud Programming Models: eScience, Big Data, etc.
Cloud Programming Models: eScience, Big Data, etc.
 
Sgg crest-presentation-final
Sgg crest-presentation-finalSgg crest-presentation-final
Sgg crest-presentation-final
 
Semi-Supervised Classification with Graph Convolutional Networks @ICLR2017読み会
Semi-Supervised Classification with Graph Convolutional Networks @ICLR2017読み会Semi-Supervised Classification with Graph Convolutional Networks @ICLR2017読み会
Semi-Supervised Classification with Graph Convolutional Networks @ICLR2017読み会
 
論文サーベイ(Sasaki)
論文サーベイ(Sasaki)論文サーベイ(Sasaki)
論文サーベイ(Sasaki)
 
169 s170
169 s170169 s170
169 s170
 

Mais de David Gleich

Mais de David Gleich (20)

Engineering Data Science Objectives for Social Network Analysis
Engineering Data Science Objectives for Social Network AnalysisEngineering Data Science Objectives for Social Network Analysis
Engineering Data Science Objectives for Social Network Analysis
 
Correlation clustering and community detection in graphs and networks
Correlation clustering and community detection in graphs and networksCorrelation clustering and community detection in graphs and networks
Correlation clustering and community detection in graphs and networks
 
Spectral clustering with motifs and higher-order structures
Spectral clustering with motifs and higher-order structuresSpectral clustering with motifs and higher-order structures
Spectral clustering with motifs and higher-order structures
 
Higher-order organization of complex networks
Higher-order organization of complex networksHigher-order organization of complex networks
Higher-order organization of complex networks
 
Spacey random walks and higher-order data analysis
Spacey random walks and higher-order data analysisSpacey random walks and higher-order data analysis
Spacey random walks and higher-order data analysis
 
Non-exhaustive, Overlapping K-means
Non-exhaustive, Overlapping K-meansNon-exhaustive, Overlapping K-means
Non-exhaustive, Overlapping K-means
 
Using Local Spectral Methods to Robustify Graph-Based Learning
Using Local Spectral Methods to Robustify Graph-Based LearningUsing Local Spectral Methods to Robustify Graph-Based Learning
Using Local Spectral Methods to Robustify Graph-Based Learning
 
Spacey random walks and higher order Markov chains
Spacey random walks and higher order Markov chainsSpacey random walks and higher order Markov chains
Spacey random walks and higher order Markov chains
 
Localized methods in graph mining
Localized methods in graph miningLocalized methods in graph mining
Localized methods in graph mining
 
PageRank Centrality of dynamic graph structures
PageRank Centrality of dynamic graph structuresPageRank Centrality of dynamic graph structures
PageRank Centrality of dynamic graph structures
 
Iterative methods with special structures
Iterative methods with special structuresIterative methods with special structures
Iterative methods with special structures
 
Big data matrix factorizations and Overlapping community detection in graphs
Big data matrix factorizations and Overlapping community detection in graphsBig data matrix factorizations and Overlapping community detection in graphs
Big data matrix factorizations and Overlapping community detection in graphs
 
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...Anti-differentiating approximation algorithms: A case study with min-cuts, sp...
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...
 
Localized methods for diffusions in large graphs
Localized methods for diffusions in large graphsLocalized methods for diffusions in large graphs
Localized methods for diffusions in large graphs
 
Anti-differentiating Approximation Algorithms: PageRank and MinCut
Anti-differentiating Approximation Algorithms: PageRank and MinCutAnti-differentiating Approximation Algorithms: PageRank and MinCut
Anti-differentiating Approximation Algorithms: PageRank and MinCut
 
Fast relaxation methods for the matrix exponential
Fast relaxation methods for the matrix exponential Fast relaxation methods for the matrix exponential
Fast relaxation methods for the matrix exponential
 
Gaps between the theory and practice of large-scale matrix-based network comp...
Gaps between the theory and practice of large-scale matrix-based network comp...Gaps between the theory and practice of large-scale matrix-based network comp...
Gaps between the theory and practice of large-scale matrix-based network comp...
 
MapReduce Tall-and-skinny QR and applications
MapReduce Tall-and-skinny QR and applicationsMapReduce Tall-and-skinny QR and applications
MapReduce Tall-and-skinny QR and applications
 
Recommendation and graph algorithms in Hadoop and SQL
Recommendation and graph algorithms in Hadoop and SQLRecommendation and graph algorithms in Hadoop and SQL
Recommendation and graph algorithms in Hadoop and SQL
 
Personalized PageRank based community detection
Personalized PageRank based community detectionPersonalized PageRank based community detection
Personalized PageRank based community detection
 

Último

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Último (20)

🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 

Spectra of Large Network

  • 1. Tweet Along @dgleich Spectra of Large Networks David F. Gleich Sandia National Laboratories iCME la/opt seminar 24 February 2011 Thanks to Ali Pinar, Jaideep Ray, Tammy Supported by Sandia’s John von Neumann postdoctoral fellowship Kolda, C. Seshadhri, Rich Lehoucq, and and the DOE Office of Science’s ASCR Graphs project. Jure Leskovec for helpful discussions. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000. 2/24/2011 Stanford ICME seminar 1/36
  • 2. Tweet Along @dgleich There’s information inside the spectra Words in dictionary definitions Internet router network 111,000 vertices, 2.7M edges 192k vetices, 1.2M edges Also noted in a study of smaller graphs by Banerjee and Jost (2009) 2/24/2011 Stanford ICME seminar 2/36
  • 3. Tweet Along @dgleich Overview Graphs and their matrices Data for our experiments Computing spectra for large networks Issues with computing spectra Many examples of graph spectra Conclusion Future work Images taken from Stanford, flickr, and Purdue, respectively 2/24/2011 Stanford ICME seminar 3/36
  • 4. Tweet Along @dgleich GRAPHS As well as things we AND THEIR already know about graph spectra. MATRICES 2/24/2011 Stanford ICME seminar 4
  • 5. Tweet Along @dgleich Matrices from graphs Adjacency matrix Not covered   Signless Laplacian matrix   if   Incidence matrix (It is incidentally discussed) Seidel matrix Laplacian matrix   Random walk matrix     Modularity matrix   Normalized Laplacian matrix     Everything is undirected. 2/24/2011 Stanford ICME seminar 5/36
  • 6. Tweet Along @dgleich Why are we interested in the spectra? Modeling Properties Moments of the adjacency Anomalies Regularities Network Comparison Fay et al. 2010 – Weighted Spectral Density The network is as19971108 from Jure’s snap collect (a few thousand nodes) and we insert random connections from 50 nodes 2/24/2011 Stanford ICME seminar 6/36
  • 7. Tweet Along @dgleich Spectral bounds from Gerschgorin       (from a slightly different approach) 2/24/2011 Stanford ICME seminar 7/36
  • 8. Tweet Along @dgleich Semi-circle law Wigner’s semi-circle law Eigenvalues of a random symmetric matrix where each entry is independent have a semi-circle density Erdős–Rényi graphs obey a special case Image from Mathworld 2/24/2011 Stanford ICME seminar 8/36
  • 9. Tweet Along @dgleich Erdős–Rényi Semi-circles The eigenvalues of the adjacency matrix for n=1000, averaged over 10 trials Semi-circle with outlier if average degree is large enough. Observed by Farkas and in the book “Network Alignment” edited by Brandes (Chapter 14) 2/24/2011 Stanford ICME seminar 9/36
  • 10. Tweet Along @dgleich Previous results Farkas et al. : Significant deviation from the semi-circle law for the adjacency matrix Mihail and Papadimitriou : Leading eigenvalues of the adjacency matrix obey a power-law based on the degree-sequence Chung et al. : Normalized Laplacian still obeys a semi-circle law Banerjee and Jost : Study of types of patterns that emerge in evolving graph models – explain many features of the spectra 2/24/2011 Stanford ICME seminar 10/36
  • 11. Tweet Along @dgleich In comparison … We use “exact” computation of spectra, instead of approximation. We study “all” of the standard matrices over a range of large networks. Our “large” is bigger. We look at a few different random graph models. 2/24/2011 Stanford ICME seminar 11/36
  • 12. Tweet Along @dgleich DATA 2/24/2011 Stanford ICME seminar 12
  • 13. Tweet Along @dgleich Data sources SNAP Various 100s-100,000s SNAP-p2p Gnutella Network 5-60k, ~30 inst. SNAP-as-733 Autonomous Sys. ~5,000, 733 inst. SNAP-caida Router networks ~20,000, ~125 inst. Pajek Various 100s-100,000s Models Copying Model 1k-100k 9 inst. 324 gs Pref. Attach 1k-100k 9 inst. 164 gs Forest Fire 1k-100k 9 inst. 324 gs Mine Various 2k-500k Newman Various Arenas Various Porter Facebook 100 schools, 5k-60k IsoRank, Natalie Protein-Protein <10k , 4 graphs Thanks to all who make data available 2/24/2011 Stanford ICME seminar 13/36
  • 14. Tweet Along @dgleich Big graphs Arxiv 86376 1035126 Co-authorship Dblp 9356 356290 Co-authorship Dictionary(*) 111982 2750576 Word definitions Internet(*) 124651 414428 Routers Itdk0304 190914 1215220 Routers p2p-Gnutella(*) 62561 295756 Peer-to-peer Patents(*) 230686 1109898 Citations Roads 126146 323900 Roads Wordnet(*) 75606 240036 Word relationship Web-NotreDame 325729 2994268 Web 2/24/2011 Stanford ICME seminar 14/36
  • 15. Tweet Along @dgleich Models Preferential Attachment Start graph with a k-node clique. Add a new node and connect to k random nodes, chosen proportional to degree. Copying model Start graph with a k-node clique. Add a new node and pick a parent uniformly at random. Copy edges of parent and make an error with probability   Forest Fire Start graph with a k-node clique. Add a new node and pick a parent uniformly at random. Do a random “bfs’/”forest fire” and link to all nodes “burned” 2/24/2011 Stanford ICME seminar 15/36
  • 16. Tweet Along @dgleich COMPUTING SPECTRA OF LARGE NETWORKS 2/24/2011 Stanford ICME seminar 16
  • 17. Tweet Along @dgleich Matlab! Always a great starting point. My desktop has 24GB of RAM (less than $2500 now!) 24GB/8 bytes (per double) = 3 billion numbers ~ 50,000-by-50,000 matrix Possibilities D = eig(A) – needs twice the memory for A,D [V,D] = eig(A) – needs three times the memory for A,D,V These limit us to ~38000 and ~31000 respectively. 2/24/2011 Stanford ICME seminar 17/36
  • 18. Tweet Along @dgleich ScaLAPACK LAPACK with distributed memory dense matrices Scalapack uses a 2d block-cyclic dense matrix distribution 2/24/2011 Stanford ICME seminar 18/36
  • 19. Tweet Along @dgleich Eigenvalues with ScaLAPACK Mostly the same approach as in LAPACK 1. Reduce to tridiagonal form (most time consuming part) 2. Distribute tridiagonals to all processors 3. Each processor finds all eigenvalues 4. Each processor computes a subset of eigenvectors Using the MRRR algorithm steps 3 and 4 are more intricate and faster. MRRR due to Parlett and Dhillon; implemented in ScaLAPACK by Christof Vomel. 2/24/2011 Stanford ICME seminar 19/36
  • 20. Tweet Along @dgleich Alternatives Use ARPACK to get extrema Use ARPACK to get interior around   via the folded spectrum   Farkas et al. used this approach. 2/24/2011 Stanford ICME seminar 20/36
  • 21. Tweet Along @dgleich ISSUES WITH COMPUTING SPECTRA 2/24/2011 Stanford ICME seminar 21
  • 22. Tweet Along @dgleich Bugs - Matlab eig(A) Returns incorrect eigenvectors Seems to be the result of a bug in Intel’s MKL library. 2/24/2011 Stanford ICME seminar 22/36
  • 23. Tweet Along @dgleich Bug – ScaLAPACK default sudo apt-get install scalapack-openmpi Allocate 36000x36000 local matrix Run on 4 processors Code crashes 2/24/2011 Stanford ICME seminar 23/36
  • 24. Tweet Along @dgleich Bug - LAPACK Scalapack MRRR Compare standard lapack/blas to atlas performance Result: correct output from atlas Result: incorrect output from lapack Hypothesis: lapack contains a known bug that’s apparently in the default ubuntu lapack 2/24/2011 Stanford ICME seminar 24/36
  • 25. Tweet Along @dgleich Moral Always test your software. Extensively. 2/24/2011 Stanford ICME seminar 25/36
  • 26. Tweet Along @dgleich (Super)-Computers Redsky 2x Intel Nehalem 2.9 GHz/8 core 12 GB/node 22TB memory total used up to 500 nodes/ 6 TB memory Hopper (I) 2x AMD quad-core 2.4 GHz/8 cores 16 GB/node Cielo (testbed) 20 nodes 2/24/2011 Stanford ICME seminar 26/36
  • 27. Tweet Along @dgleich Adding MPI tasks vs. using threads Most math libraries have threaded versions (Intel MKL, AMD ACML) Is it better to use threads or MPI tasks? It depends. Threads Ranks Time Threads Ranks Time-T Time-E 1 64 1412.5 1 36 1271.4 339.0 4 16 1881.4 4 16 1058.1 456.6 16 4 Omitted. Normalized Laplacian for 36k-by-36k co-author graph of CondMat 2/24/2011 Stanford ICME seminar 27/36
  • 28. Tweet Along @dgleich Strong Parallel Scaling Time   Time in hours Good strong scaling up to 325,000 vertices Estimated time for 500,000 nodes 9 hours with 925 nodes (7400 procs)   2/24/2011 Stanford ICME seminar 28/36
  • 29. Tweet Along @dgleich EXAMPLES 2/24/2011 Stanford ICME seminar 29
  • 30. Tweet Along @dgleich A $40,000 matrix computation 2/24/2011 Stanford ICME seminar 30/36
  • 31. Tweet Along @dgleich Spikes? 2/24/2011 Stanford ICME seminar 31/36
  • 32. Tweet Along @dgleich Nullspaces of the adjacency matrix   So unit eigenvalues of the normalized Laplacian are null- vectors of the adjacency matrix. 2/24/2011 Stanford ICME seminar 32/36
  • 33. Tweet Along @dgleich Stanford’s Facebook Network Data from Mason Porter 2/24/2011 Stanford ICME seminar 33/36
  • 34. Tweet Along @dgleich Movies of spectra… As these models evolve, what do the spectra look like? 2/24/2011 Stanford ICME seminar 34/36
  • 35. Tweet Along @dgleich 2/24/2011 Stanford ICME seminar 35/36
  • 36. Code will be available eventually. Image from good financial cents. 2/24/2011 Stanford ICME seminar 36 of <Total>