Selected highlights of the Coursera Social Networking course, taught by Prof. Lada Adamic of the Univ. of Michigan. Presented at the annual RTP Analytics Unconference, May 4, 2013
Presentation on social network analysis for the RTP Analytics Unconference
1. Consolidated Behaviors and Attitudes1
Analyzing Networks
An Overview, and Discussion of Network
Analysis (NA) and Social Network Analysis (SNA)
Prepared for 2013 AnalyticsCamp:
An Annual Unconference, Held in the
Research Triangle Park, NC Area, on May 4, 2013
By Bruce Conner
2.
Full Disclosure
• I just finished the Social Networking course
on Coursera, taught by Lada Adamic, Assoc.
Prof. of Information at the Univ. of Michigan
– All of the content of this deck is derived
from that course (not original)
– For purposes of this unconference, I will
not be further citing or footnoting this
content
3.
My Interest in Social Network Analysis (SNA)
• Interest in marketing analytics and quantitative market research
– Rise of social media and social marketing
– Big data and marketing analytics
– The strengths and weaknesses of behavioral data (Web, mobile, CRM,
transactional, scanner, telemetry, etc.) in marketing applications
• A long-term interest in clustering and segmentation as tools for
targeting products, services, and messages: can
social relationships and social communities enhance this?
• Marketing issues such as:
– The role of opinion leaders in influencing brand preferences and purchases of
goods and services
– Diffusion of products, services, innovations, brands, preferences, etc.
– Formation of preferences for products/services/brands
– Targeted marketing to communities and individuals in those communities
4.
Agenda
• Brief introduction to the applications and issues that
Social Network Analysis (SNA), and more broadly
Network Analysis (NA), try to address
• Brief overview of some methods, approaches, and
statistics involved
• Possible Discussion Topics:
– Who is currently using SNA (or NA) -- and what are your
applications?
– How (else) might SNA (or NA) be used in your work?
– Specifically, how might SNA (or NA) be used in marketing,
product development, or other business applications (or
other applications)?
– Other topics/questions/thoughts?
6.
Quick Overview of Applications of SNA:
Anti-Terrorism and National Security
7.
A Quick Overview of
Applications of SNA (2)
• Anti-terrorism
• Criminal justice
– Conspiracy (e.g., Enron)
– Insider trading
– Fraud
8.
A Quick Overview of
Applications of SNA (3)
• Anti-terrorism
• Criminal justice
• Social media
9.
A Quick Overview of
Applications of SNA (4)
• Anti-terrorism
• Criminal justice
• Social media
• Gaming
– Game (social) experience
– Recruitment/virality/engagement/retention/conversion
10.
And Some More
Applications of SNA (5)
• Organizational analysis/communities of practice
• Marketing based on affiliation with “communities”
• Inputs to clustering/segmentation/profiling
• Biological networks (health care, genomics, etc.)
• Predictive analytics (e.g., predicting improvements in recipes based on ingredient networks)
• Sociology/Economics/Political Science/etc.
• Computer networks
12.
Kinds Of Questions that
SNA/NA Address
• How do networks form and grow?
– Compare real-world networks (e.g., the Internet,
Facebook, biological networks) with various
theoretical models
• Do the theoretical models help explain the behavior and growth
dynamics of the real network?
• Example: Randomly-formed network vs. “preferential
attachment”
13.
Kinds Of Questions that
SNA/NA Address (2)
• How does network structure (topology) affect the
way that information disseminates, or that
infections spread?
14.
Kinds Of Questions that
SNA/NA Address (3)
• Based on the number, strength, directionality, and/or
characteristics/attributes of “links,” and the
characteristics of individuals/nodes,
how do we identify (and characterize)
communities?
16.
What are networks?
• Networks are sets of nodes connected by edges.
“Network” ≡ “Graph”
Nodes       Edges              Field
points      lines
vertices    edges, arcs        math
nodes       links              computer science
sites       bonds              physics
actors      ties, relations    sociology
[Figure: a node and an edge, labeled]
17.
Network elements: edges
• Directed (also called arcs, links)
– A -> B
• A likes B, A gave a gift to B, A is B’s child
• Undirected
– A <-> B or A – B
• A and B like each other
• A and B are siblings
• A and B are co-authors
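A minimal sketch of how the two kinds of edge might be stored, assuming a plain adjacency-set representation. The helper name `build_adjacency` is hypothetical and is not taken from the course material.

```python
from collections import defaultdict


def build_adjacency(edges, directed):
    """Return {node: set of neighbors} from a list of (source, target) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b]  # touch the key so b is listed even if it has no outgoing edge
        if not directed:
            adj[b].add(a)  # an undirected edge is stored in both directions
    return dict(adj)


# Directed: "A likes B" says nothing about whether B likes A.
likes = build_adjacency([("A", "B")], directed=True)

# Undirected: "A and B are siblings" holds in both directions.
siblings = build_adjacency([("A", "B")], directed=False)
```

In the directed case only A lists B as a neighbor; in the undirected case each lists the other.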
18.
Directed networks
[Figure: directed graph of choices among Ada, Cora, Louise, Jean, Helen, Martha, Alice, Robin, Marion, Maxine, Lena, Hazel, Hilda, Frances, Eva, Ruth, Edna, Adele, Jane, Anna, Mary, Betty, Ella, Ellen, Laura, and Irene]
• Girls’ school dormitory dining-table partners, 1st and 2nd choices (Moreno,
The sociometry reader, 1960)
21.
2 Ways that NA is Different From
Conventional (Frequentist) Statistics
• Non-independence of “edge rows”:
– Example: if I am “linked” to two individuals, it often increases the
probability that they are linked to each other
– Implication: one cannot necessarily use statistical tests based on statistical
independence, normal distribution, etc., to understand statistical
significance
• Exploration of real-world “graphs” by comparing them to various
hypothetical (strawman) models
– A Monte Carlo approach:
• Generate large numbers of graphs based on hypothetical models
• Compare the various characteristics of the real-world graph to the
distribution of the same characteristics across the multiple hypothetical
graphs, to test the null hypothesis that the real graph is
no different from the hypothetical graphs
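The Monte Carlo comparison can be sketched roughly as follows. This is an illustration under stated assumptions, not the course's own procedure: the characteristic compared is the triangle count, the hypothetical model is a random graph with the same number of nodes and edges, and all function names are made up for this example.

```python
import random
from itertools import combinations


def count_triangles(n, edges):
    """Count triangles in an undirected graph on nodes 0..n-1."""
    adj = {i: set() for i in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return sum(
        1
        for a, b, c in combinations(range(n), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )


def random_edges(n, m, rng):
    """Pick m distinct node pairs uniformly at random (the strawman model)."""
    return rng.sample(list(combinations(range(n), 2)), m)


def monte_carlo_p_value(n, observed_edges, trials=500, seed=0):
    """Share of random graphs with at least as many triangles as observed."""
    rng = random.Random(seed)
    observed = count_triangles(n, observed_edges)
    m = len(observed_edges)
    at_least = sum(
        count_triangles(n, random_edges(n, m, rng)) >= observed
        for _ in range(trials)
    )
    # Add-one correction so the estimate is never exactly zero.
    return observed, (at_least + 1) / (trials + 1)


# Observed graph: two tight groups of four friends plus two isolated people.
group_a = list(combinations(range(0, 4), 2))
group_b = list(combinations(range(4, 8), 2))
observed_triangles, p_value = monte_carlo_p_value(10, group_a + group_b)
```

A small p-value suggests the observed graph has more triangles than the random model would produce by chance, which is the non-independence of edges described above.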
23.
Erdős–Rényi Random Graph:
Simplest Network Model
• Assumptions
– Nodes connect at random
– Network is undirected
• Key parameters
– Number of nodes N
– Either “p” or “M”
• p = probability that any two nodes share an edge
• M = total number of edges in the graph
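A short sketch of the two parameterizations, assuming nodes are numbered 0..N-1. The function names `gnp` and `gnm` are illustrative choices, not names from the deck.

```python
import random
from itertools import combinations


def gnp(n, p, rng):
    """Each of the n*(n-1)/2 possible edges is included independently with probability p."""
    return [pair for pair in combinations(range(n), 2) if rng.random() < p]


def gnm(n, m, rng):
    """Exactly m edges, chosen uniformly among all possible node pairs."""
    return rng.sample(list(combinations(range(n), 2)), m)


rng = random.Random(42)
edges_p = gnp(20, 0.15, rng)  # number of edges varies from run to run
edges_m = gnm(20, 30, rng)    # always exactly 30 edges
```

With `p` the edge count is random (its expected value is p times the number of node pairs); with `M` it is fixed.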
25.
Preferential Attachment Networks
• Preferential attachment of growing networks:
– New nodes prefer to attach to well-connected nodes
over less-well-connected nodes
• Process also known as
– Cumulative advantage
– Rich-get-richer
– Matthew effect
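One simple way to simulate this growth process is sketched below. It assumes each new node adds exactly one edge and that the attachment probability is proportional to current degree; both are modeling choices made for illustration.

```python
import random
from collections import Counter


def preferential_attachment(n, rng):
    """Grow a graph to n nodes; each new node links to one existing node
    chosen with probability proportional to that node's degree."""
    edges = [(0, 1)]
    # Every node appears in this list once per edge endpoint, so a uniform
    # draw from it is a degree-proportional draw over nodes.
    endpoints = [0, 1]
    for new in range(2, n):
        target = rng.choice(endpoints)
        edges.append((new, target))
        endpoints.extend([new, target])
    return edges


edges = preferential_attachment(1000, random.Random(3))
degree = Counter(v for edge in edges for v in edge)
hubs = degree.most_common(5)  # the "rich" nodes, typically early arrivals
```

Inspecting `hubs` usually shows a few nodes with far more links than the typical node, which is the rich-get-richer pattern.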
28.
Node Statistics
• Node network properties
– From immediate connections
• indegree
how many directed edges (arcs) are incident on a node
• outdegree
how many directed edges (arcs) originate at a node
• degree (in or out)
number of edges incident on a node
– From the entire graph
• Centrality (betweenness, closeness)
[Figure: example node with indegree = 3, outdegree = 2, degree = 5]
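The degree counts from immediate connections can be computed as in this sketch. The arc list is a made-up example chosen to give one node two outgoing and three incoming arcs; `degree_table` is a hypothetical helper name.

```python
from collections import Counter


def degree_table(arcs):
    """Return {node: {"in": ..., "out": ..., "degree": ...}} for directed arcs."""
    outdeg = Counter(src for src, _ in arcs)
    indeg = Counter(dst for _, dst in arcs)
    nodes = set(outdeg) | set(indeg)
    return {
        v: {"in": indeg[v], "out": outdeg[v], "degree": indeg[v] + outdeg[v]}
        for v in nodes
    }


# X points at two nodes and is pointed at by three others.
arcs = [("X", "a"), ("X", "b"), ("c", "X"), ("d", "X"), ("e", "X")]
table = degree_table(arcs)
```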
29.
Giant Component
• If the largest component encompasses a significant fraction of the graph, it is
called the giant component
30.
“Percolation Threshold”
[Figure: size of giant component plotted against average degree, with example graphs at av deg = 0.99, 1.18, and 3.96]
Percolation threshold: how many edges need
to be added before the giant component
appears?
As the average degree increases to z = 1, a
giant component suddenly appears
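The threshold can be explored with a small simulation. The sketch below assumes a random graph with a fixed number of edges and tracks components with a union-find structure; the function name and the sizes used are illustrative.

```python
import random


def largest_component_fraction(n, avg_degree, rng):
    """Fraction of nodes in the largest component of a random graph
    with n nodes and avg_degree * n / 2 edges."""
    m = int(avg_degree * n / 2)
    edges = set()
    while len(edges) < m:
        a, b = rng.randrange(n), rng.randrange(n)
        if a != b:  # skip self-loops; the set drops duplicate pairs
            edges.add((min(a, b), max(a, b)))

    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)

    sizes = {}
    for v in range(n):
        root = find(v)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / n


rng = random.Random(1)
below = largest_component_fraction(2000, 0.5, rng)  # well under the threshold
above = largest_component_fraction(2000, 4.0, rng)  # well over the threshold
```

Sweeping `avg_degree` through values near 1 is one way to see the largest component go from a tiny fraction of the graph to most of it.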
31.
Shortest Path – And
Average Shortest Path
• How many hops between two nodes?
• On average, how many hops between each
pair of nodes?
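For an unweighted graph, hop counts can be found with breadth-first search. This is a minimal sketch that assumes the graph is connected and stored as an adjacency dictionary; the function names are illustrative.

```python
from collections import deque


def bfs_distances(adj, source):
    """Number of hops from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_shortest_path(adj):
    """Mean hop count over all ordered pairs of distinct, reachable nodes."""
    total = pairs = 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs


# A simple chain: A - B - C - D
chain = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
hops_a_to_d = bfs_distances(chain, "A")["D"]
mean_hops = average_shortest_path(chain)
```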
35.
Example of Eigenvector Centrality (a
Recursive Measure) in Directed Networks
• PageRank brings order to the Web:
– it's not just the pages that point to you, but how many
pages point to those pages, etc.
– more difficult to artificially inflate centrality with a
recursive definition
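The recursive idea can be sketched with a basic power iteration. This is a simplified illustration, not the production algorithm: the damping factor of 0.85, the iteration count, and the handling of pages without outgoing links are all assumptions made for this example.

```python
def pagerank(out_links, damping=0.85, iterations=100):
    """Power-iteration PageRank on {page: list of pages it links to}.
    Every linked page must also appear as a key."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            # A page with no outgoing links spreads its rank over all pages.
            targets = out_links[v] or nodes
            share = damping * rank[v] / len(targets)
            for t in targets:
                new[t] += share
        rank = new
    return rank


# A and B each have exactly one incoming link, but A's comes from the
# well-linked page C while B's comes from D, which nobody links to.
web = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["B"]}
rank = pagerank(web)
```

In this toy web, A ends up ranked above B even though both have the same indegree, because the page pointing at A is itself highly ranked.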
36.
Degree Distributions: An Example –
With a Log-Log Distribution
• Sexual networks: great variation in contact numbers
39.
Ties and Geography
“The geographic movement of the [message] from Nebraska to
Massachusetts is striking. There is a progressive closing in on the target
area as each new person is added to the chain”
S. Milgram, ‘The small world problem’, Psychology Today, 1967
[Figure: message paths from NE (Nebraska) to MA (Massachusetts)]
40.
Kleinberg’s geographical small world model
nodes are placed on a lattice and connect to nearest neighbors
additional links placed with:
p(link between u and v) ∝ distance(u,v)^{-r}
If you set r = 2 on a two-dimensional lattice, you get the optimum ability to get
between nodes with minimal jumps
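A sketch of how one additional long-range link might be drawn under this rule. It assumes a square two-dimensional grid with lattice (Manhattan) distance, and normalizes the distance^(-r) weights over all other nodes; the function names are hypothetical.

```python
import random


def lattice_distance(u, v):
    """Number of grid steps between two lattice points."""
    return abs(u[0] - v[0]) + abs(u[1] - v[1])


def long_range_contact(u, side, r, rng):
    """Pick one extra contact for u, with probability proportional to distance^(-r)."""
    candidates = [
        (x, y) for x in range(side) for y in range(side) if (x, y) != u
    ]
    weights = [lattice_distance(u, v) ** -r for v in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


rng = random.Random(5)
center = (10, 10)
contact = long_range_contact(center, side=21, r=2, rng=rng)
```

Setting `r = 0` makes every other node equally likely, while a very large `r` makes the extra link almost always land on an immediate neighbor; `r = 2` sits between the two.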
42.
Why Care About Communities?
• Opinion formation and uniformity
If each node adopts the opinion of the majority
of its neighbors, it is possible to have different
opinions in different cohesive subgroups
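A toy simulation of this majority rule, assuming two fully connected groups joined by a single bridge, synchronous updates, and that a node keeps its current opinion on a tie. The graph, the opinions, and the function names are invented for illustration.

```python
from collections import Counter


def two_cliques_with_bridge(k):
    """Two fully connected groups of k nodes joined by one edge."""
    adj = {v: set() for v in range(2 * k)}
    for group in (range(k), range(k, 2 * k)):
        for a in group:
            for b in group:
                if a != b:
                    adj[a].add(b)
    adj[k - 1].add(k)
    adj[k].add(k - 1)
    return adj


def majority_step(adj, opinion):
    """Every node adopts the most common opinion among its neighbors."""
    new = {}
    for v, nbrs in adj.items():
        ranked = Counter(opinion[u] for u in nbrs).most_common()
        if ranked and (len(ranked) == 1 or ranked[0][1] > ranked[1][1]):
            new[v] = ranked[0][0]
        else:
            new[v] = opinion[v]  # tie or no neighbors: keep current opinion
    return new


adj = two_cliques_with_bridge(4)
opinion = {v: "red" if v < 4 else "blue" for v in adj}
opinion[0] = "blue"  # one dissenter inside the red group
for _ in range(5):
    opinion = majority_step(adj, opinion)
```

The dissenter is absorbed by its own group, but the single bridge is not enough for either group to convert the other, so the two opinions persist side by side.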
44.
Community Finding
• Social and other networks have a natural community structure
• We want to discover this structure rather than impose a certain
size of community or fix the number of communities
• Without “looking”, can we discover community structure in an
automated way?
45.
Hierarchical clustering
• Process:
– after calculating the “distances” for all pairs of vertices
– start with all n vertices disconnected
– add edges between pairs one by one in order of
decreasing weight
– result: nested components, where one can take a
‘slice’ at any level of the tree
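The process above can be sketched with a union-find structure. This example assumes the pairwise weights are similarities (a larger weight means a closer pair), which is why edges are added in decreasing order; the weights and the function name are made up for illustration.

```python
def hierarchical_levels(n, weighted_pairs):
    """Add edges in order of decreasing weight and record the components
    after every merge. weighted_pairs holds (weight, u, v) tuples."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    levels = []
    for weight, u, v in sorted(weighted_pairs, reverse=True):
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            continue  # this edge joins nodes that are already together
        parent[root_u] = root_v
        groups = {}
        for x in range(n):
            groups.setdefault(find(x), []).append(x)
        levels.append((weight, sorted(groups.values())))
    return levels


pairs = [
    (0.9, 0, 1), (0.8, 2, 3), (0.2, 1, 2),
    (0.1, 0, 3), (0.05, 0, 2), (0.05, 1, 3),
]
levels = hierarchical_levels(4, pairs)
```

Each entry of `levels` is one possible slice of the tree: earlier entries give many small groups, later entries give fewer and larger ones.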
47.
Betweenness Clustering
• Successively removing edges of highest betweenness (the bridges, or local
bridges) breaks up the network into separate components
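A compact sketch of one round of this idea for an unweighted, undirected graph. Edge betweenness is computed here with a breadth-first shortest-path count from every node, and edges are removed until the graph splits; the implementation details and names are this example's own assumptions.

```python
from collections import deque


def edge_betweenness(adj):
    """Shortest-path betweenness for every undirected edge."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist = {s: 0}
        paths = {v: 0.0 for v in adj}  # number of shortest paths from s
        paths[s] = 1.0
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    paths[v] += paths[u]
                    preds[v].append(u)
        # Walk back from the farthest nodes, crediting each edge used.
        credit = {v: 0.0 for v in adj}
        for w in reversed(order):
            for u in preds[w]:
                c = paths[u] / paths[w] * (1.0 + credit[w])
                score[frozenset((u, w))] += c
                credit[u] += c
    return score


def connected_components(adj):
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in comp:
                    comp.add(v)
                    stack.append(v)
        seen |= comp
        comps.append(comp)
    return comps


def split_by_betweenness(adj):
    """Remove highest-betweenness edges until the component count grows."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    start = len(connected_components(adj))
    while len(connected_components(adj)) == start:
        score = edge_betweenness(adj)
        if not score:
            break
        u, v = max(score, key=score.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return connected_components(adj)


# Two triangles joined by the bridge 2-3.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
parts = split_by_betweenness(graph)
```

In this example the bridge carries every shortest path between the two triangles, so it is the first edge removed.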
48.
Modularity
• Algorithm
– Start with all vertices as isolates
– Follow a greedy strategy:
• successively join clusters with the greatest increase ΔQ in modularity
• stop when the maximum possible ΔQ from joining any two clusters is ≤ 0
– Successfully used to find community structure in a graph
with > 400,000 nodes with > 2 million edges
• Amazon’s people who bought this also bought that…
– Alternatives to achieving optimum ΔQ:
• simulated annealing rather than greedy search
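The greedy strategy can be sketched on a toy graph as follows. This version assumes the common definition of modularity, Q = sum over communities of (fraction of edges inside) minus (fraction of edge endpoints in the community) squared, and it recomputes Q from scratch for every candidate merge. That is far too slow for graphs of the size mentioned above, where ΔQ would be updated incrementally; the function names are illustrative.

```python
from itertools import combinations


def modularity(edges, communities):
    """Q for an undirected, unweighted graph and a partition of its nodes."""
    m = len(edges)
    label = {v: i for i, c in enumerate(communities) for v in c}
    inside = [0] * len(communities)
    degree = [0] * len(communities)
    for a, b in edges:
        degree[label[a]] += 1
        degree[label[b]] += 1
        if label[a] == label[b]:
            inside[label[a]] += 1
    return sum(
        inside[i] / m - (degree[i] / (2 * m)) ** 2
        for i in range(len(communities))
    )


def greedy_modularity(nodes, edges):
    """Start from isolates; repeatedly apply the merge with the largest ΔQ."""
    communities = [frozenset([v]) for v in nodes]
    current = modularity(edges, communities)
    while len(communities) > 1:
        best_gain, best = 0.0, None
        for a, b in combinations(communities, 2):
            merged = [c for c in communities if c not in (a, b)] + [a | b]
            gain = modularity(edges, merged) - current
            if gain > best_gain:
                best_gain, best = gain, merged
        if best is None:  # the largest possible ΔQ is <= 0, so stop
            break
        communities, current = best, current + best_gain
    return communities, current


# Two triangles joined by the bridge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
communities, q = greedy_modularity(range(6), edges)
```

On this graph the search stops with the two triangles as separate communities, because joining them would lower Q.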