2. Resource
• Unit: LINCOLN LABORATORY JOURNAL
• Date: VOLUME 20, NUMBER 1, 2013
• Download:
https://www.ll.mit.edu/publications/journal/pdf/vol20_no1/20_1_5_Campbell.pdf
3. Authors
• William M. Campbell
– Machine learning, mathematics at CU
• Charlie K. Dagli
– Computer engineering at UIUC
• Clifford J. Weinstein
– Electrical engineering at MIT
4. Agenda
• Introduction
– A consequence of changing economic
– Graph Construction
• Community Detection
– Modularity optimization
– Infomap
– Spectral Clustering
• Summary
5. We don’t cover
• Community Dynamics
– Latent semantic indexing(LSI)
– SVD
– Tensor
• Time-Profile-specific sub-network
– Classification(C4.5, PETS)
– Relational probability tree(RPT)
6. A consequence of
changing economic
• Big data: a wide variety of data sources now
supply large-scale, real-world
sociographic data.
• Tasks: constructing social networks, analyzing
the structure and dynamics of
communities, and drawing inferences
from social networks
7. Graph Construction
• Challenge: too many factors
– Ambiguity of human language
– Multiple aliases for the same user
– Incompatible representations of information
– Ambiguity of relationship between
individuals
8. Data Source and
Information Extraction
• Newswire and sensors
– Smartphones and proximity devices capture
dynamic interactions
• Communications and social media
– Followers on Twitter, people who are
related by current news topics. Even FB…
– Email related to Enron’s bankruptcy
9. Introduction -
Information Extraction from Text
• Named-entity
recognition (NER)
extracts people, places,
and organizations.
• Links are built from
the co-occurrence of
entities in a document
https://web.cs.umass.edu/publication/docs/2012/UM-CS-2012-015.pdf
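The co-occurrence idea on this slide can be sketched as follows. This is a minimal illustration, assuming NER has already been run and each document is reduced to a list of entity names; the function name `cooccurrence_edges` and the sample entities are hypothetical, not taken from the paper.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(documents):
    """Count how many documents mention each unordered pair of entities."""
    edges = Counter()
    for entities in documents:
        # De-duplicate within a document, then sort so each pair has one key.
        for a, b in combinations(sorted(set(entities)), 2):
            edges[(a, b)] += 1
    return edges

# Hypothetical NER output: one list of entity names per document.
docs = [
    ["Alice", "Bob", "Acme Corp"],
    ["Bob", "Alice"],
    ["Carol"],
]
edges = cooccurrence_edges(docs)
for (a, b), weight in sorted(edges.items()):
    print(a, "--", b, "weight", weight)
```

The edge weight is the number of documents in which both entities appear; a document with a single entity contributes no edge.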
11. Community Detection
• High connectivity within a group and low
connectivity across groups
• Modularity
optimization(Clauset/Newman/Moore)
• Infomap
• Spectral clustering
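"High connectivity within a group and low connectivity across groups" is commonly quantified by Newman's modularity score. The slides do not give the formula, so the standard definition is written out here for reference; the symbols (A for the adjacency matrix, k_i for the degree of node i, m for the number of edges, c_i for the community of node i) are the usual conventions, not notation from the slides.

```latex
Q = \frac{1}{2m} \sum_{i,j} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(c_i, c_j)
```

Here δ(c_i, c_j) is 1 when nodes i and j are in the same community and 0 otherwise, so Q compares the observed within-community edges with the number expected if edges were placed at random while preserving degrees.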
12. DataSet
• Name: ISVG (Institute for the Study of
Violent Groups)
• Coverage: terrorist and criminal activity
from open-source documents, including news
articles, court documents, and police reports.
• More than 100,000 incidents
• More than 1,500 hand-annotated types
• Nearly 30,000 individuals and 3,000
groups
13. Modularity optimization
(Clauset/Newman/Moore)
• Missing-link prediction (e.g.,
recommending friendships,
Folding@home)
• Uses a similarity probability
to associate nodes
• Do strongly similar nodes
tend to be linked?
(Men vs. men in a sexual
network)
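The slide does not say how the similarity is computed. As a stand-in only, and not the method of the cited authors, the sketch below scores every unlinked pair by the Jaccard similarity of their neighbor sets, which is one common baseline for missing-link prediction. The function name `jaccard_scores` and the toy graph are hypothetical.

```python
from itertools import combinations

def jaccard_scores(edges):
    """Score each unlinked node pair by |N(u) & N(v)| / |N(u) | N(v)|."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    linked = {frozenset(e) for e in edges}
    scores = {}
    for u, v in combinations(sorted(neighbors), 2):
        if frozenset((u, v)) in linked:
            continue  # only score pairs that are not already connected
        union = neighbors[u] | neighbors[v]
        scores[(u, v)] = len(neighbors[u] & neighbors[v]) / len(union)
    return scores

# Toy graph (hypothetical): a square a-b-d-c-a with a tail d-e.
toy_edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
scores = jaccard_scores(toy_edges)
best_pair = max(scores, key=scores.get)
print(best_pair, scores[best_pair])
```

A high score only says two nodes share neighbors. As the slide's question suggests, similar nodes are not always likely to be linked, so a score like this is a heuristic rather than a probability.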
14. Modularity optimization
(Clauset/Newman/Moore)
• Performs even better on the terrorist-association
and grassland-species networks
• Both datasets have explicit hierarchical organization.
• It is considered for large-scale networks, where the
computation is heavy
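Greedy modularity optimization can be illustrated with a naive agglomerative sketch: start with every node in its own community and repeatedly apply the merge that raises modularity the most. This is an assumption-laden toy, not the published Clauset/Newman/Moore algorithm, which reaches the same kind of result with far more efficient data structures; the function names and the two-triangle graph are hypothetical.

```python
from itertools import combinations

def modularity(edges, communities):
    """Q = sum over communities of (inside_edges / m) - (degree_sum / 2m)^2."""
    m = len(edges)
    label = {node: i for i, group in enumerate(communities) for node in group}
    inside = [0] * len(communities)
    degree = [0] * len(communities)
    for u, v in edges:
        degree[label[u]] += 1
        degree[label[v]] += 1
        if label[u] == label[v]:
            inside[label[u]] += 1
    return sum(e / m - (d / (2 * m)) ** 2 for e, d in zip(inside, degree))

def greedy_merge(edges):
    """Merge the best pair of communities until no merge improves Q."""
    nodes = sorted({n for e in edges for n in e})
    communities = [frozenset([n]) for n in nodes]
    best_q = modularity(edges, communities)
    while len(communities) > 1:
        candidates = []
        for a, b in combinations(communities, 2):
            merged = [c for c in communities if c not in (a, b)] + [a | b]
            candidates.append((modularity(edges, merged), merged))
        q, merged = max(candidates, key=lambda t: t[0])
        if q <= best_q:
            break  # no merge increases modularity
        best_q, communities = q, merged
    return communities, best_q

# Two triangles joined by a single bridge edge (2, 3).
graph = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
found, q = greedy_merge(graph)
print(sorted(sorted(c) for c in found), round(q, 4))
```

Recomputing Q from scratch for every candidate merge is expensive; it keeps the sketch short but is exactly the cost a scalable implementation avoids.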
15. Infomap
• Convert the graph to a Markov model via
random walks
• Use entropy to determine clusters
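The two bullets can be made concrete with the two-level map equation, which Infomap minimizes: it gives the expected per-step description length of a random walk under a given partition. The sketch below only evaluates that score for partitions supplied by hand, assuming an undirected, unweighted graph; it does not perform Infomap's search, and the function names and example graph are hypothetical.

```python
import math

def plogp(p):
    """p * log2(p), with the convention 0 * log(0) = 0."""
    return p * math.log2(p) if p > 0 else 0.0

def map_equation(edges, modules):
    """Description length (bits per step) of a random walk for a partition."""
    two_m = 2 * len(edges)
    label = {node: i for i, group in enumerate(modules) for node in group}
    # Stationary visit rate of a node on an undirected graph: degree / 2m.
    visit = {}
    for u, v in edges:
        visit[u] = visit.get(u, 0.0) + 1 / two_m
        visit[v] = visit.get(v, 0.0) + 1 / two_m
    # Exit rate of a module: number of edges leaving it, divided by 2m.
    exit_prob = [0.0] * len(modules)
    for u, v in edges:
        if label[u] != label[v]:
            exit_prob[label[u]] += 1 / two_m
            exit_prob[label[v]] += 1 / two_m
    within = [0.0] * len(modules)
    for node, p in visit.items():
        within[label[node]] += p
    return (plogp(sum(exit_prob))
            - 2 * sum(plogp(q) for q in exit_prob)
            - sum(plogp(p) for p in visit.values())
            + sum(plogp(q + w) for q, w in zip(exit_prob, within)))

# Two triangles joined by a single bridge edge (2, 3).
graph = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
one_module = map_equation(graph, [{0, 1, 2, 3, 4, 5}])
two_modules = map_equation(graph, [{0, 1, 2}, {3, 4, 5}])
singletons = map_equation(graph, [{n} for n in range(6)])
print(one_module, two_modules, singletons)
```

The partition with the shortest description is preferred: splitting at the bridge costs fewer bits than either one big module or six singletons.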
16. Spectral Clustering (SC)
• Converges to a global optimum and handles
clusters of arbitrary shape
• Suited to few clusters and non-flat geometry
• Uses the eigenvalues/eigenvectors of the Laplacian matrix
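A minimal two-cluster sketch of the Laplacian step, assuming an undirected, unweighted, connected graph: it approximates the eigenvector of the second-smallest eigenvalue of L = D - A (the Fiedler vector) by shifted power iteration and splits nodes by sign. Real implementations use a linear-algebra library and k-means on several eigenvectors; the function name `fiedler_vector`, the iteration count, and the example graph are assumptions made for illustration.

```python
import math

def fiedler_vector(n, edges, iterations=500):
    """Approximate the Fiedler vector of L = D - A by power iteration."""
    adj = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1.0
    deg = [sum(row) for row in adj]
    shift = 2.0 * max(deg)  # upper bound on the largest eigenvalue of L
    x = [float(i) for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(iterations):
        # Remove the constant component (the eigenvector for eigenvalue 0).
        mean = sum(x) / n
        x = [xi - mean for xi in x]
        # y = (shift * I - L) x, with (L x)_i = deg_i * x_i - sum_j A_ij * x_j
        y = [shift * x[i]
             - (deg[i] * x[i] - sum(adj[i][j] * x[j] for j in range(n)))
             for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return x

# Two triangles joined by a single bridge edge (2, 3).
graph = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
vec = fiedler_vector(6, graph)
labels = [0 if value < 0 else 1 for value in vec]
print(labels)
```

Shifting by a constant turns the smallest nonzero eigenvalue of L into the largest eigenvalue of the shifted matrix, which is what power iteration finds once the constant vector is projected out.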
17. Eigenvalue & Eigenvectors
• After the image is transformed, the red line
keeps the same direction but the yellow line
flips to the opposite direction
• Red line (eigenvalue = 1)
• Yellow line (eigenvalue = -1)
• The two eigenvectors are orthogonal
• Av = λv (λ=Eigenvalue, v=Eigenvector)
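The relation Av = λv can be checked numerically. The slide's image is not recoverable here, so this sketch assumes the transformation is a reflection across the horizontal axis, with the "red" direction along that axis and the "yellow" direction perpendicular to it; the matrix and vectors are illustrative choices, not values from the slide.

```python
def matvec(a, v):
    """Multiply matrix a (list of rows) by vector v."""
    return [sum(a[i][j] * v[j] for j in range(len(v))) for i in range(len(a))]

# Assumed transformation: reflection across the horizontal axis.
A = [[1, 0],
     [0, -1]]
red = [1, 0]      # unchanged by the reflection: eigenvalue 1
yellow = [0, 1]   # flipped by the reflection: eigenvalue -1

red_image = matvec(A, red)
yellow_image = matvec(A, yellow)
dot = sum(r * y for r, y in zip(red, yellow))

print(red_image, yellow_image, dot)
```

The red vector maps to itself, the yellow vector maps to its negative, and their dot product is zero, matching the three bullets above.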
Folding@home: a Stanford University research project.
Multiplying a nonzero vector by a matrix applies a linear transformation to that vector, such as rotation, stretching, or shearing.
Divide and conquer: divide a big problem into smaller problems,
solve the small problems, and then combine the results. Examples include MapReduce, quicksort, and merge sort.