The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
Jaideep
1. Social Analytics
Mining Behaviors of a Connected
World
PAKDD School
April 11, 2013
Sydney, Ausytralia
Jaideep Srivastava
University of Minnesota
srivasta@cs.umn.edu
4/9/2013 University of Minnesota 1
2. Course Outline
• Module 1
• Introduction to Social Analytics – applying data mining to social computing
systems; examples of a number of social computing systems, e.g.
FaceBook, MMO games, etc.
• Module 2
• Computational online trust
• Identifying key influencers
• Information flow in networks
• Module 3
• Analysis of clandestine networks
• Katana - game analytics engine
4/9/2013 University of Minnesota 2
4. Social Network Analysis
O a iz tio a
rg n a n l Sc l
o ia
A th p lo y
n ro o g T e ry
ho P y h lo y
sco g
C g itiv
on e
P rc p n S c -C g itiv
e e tio o io o n e K o le g
nw de
N tw rk
e o s N tw rk
e o s
Ra
e lity Sc l
o ia K o le g
nw de
N tw rk
e o s N tw rk
e o s
A q a ta c
c u in n e K o le g
nw de
E id m lo y
p e io g (lin s
k) (c n n
o te t)
S c lo y
o io g
Social science networks have widespread application in various
fields
Most of the analyses techniques have come from Sociology,
Statistics and Mathematics
See (Wasserman and Faust, 1994) for a comprehensive introduction
to social network analysis
12/02/06 IEEE ICDM 2006 4
5. What have been it‟s key scientific successes?
In classical social sciences numerous results
„Six degree of separation‟ [Milgram]
Popularized by the „Kevin Bacon game‟
„The strength of weak ties‟ [Granovetter]
„Online networks as social networks‟ [Wellman, Krackhardt]
„Dunbar Number‟
Various types of centrality measures
Etc.
In the Web era
„The Bow-Tie model of the Web‟ [Raghavan]
„Preferential attachment model‟ [Barabasi] (Yes and No)
„Powerlaw of degree distribution‟ [Lots of people] (NO!)
Etc.
6. Application successes
Numerous in social sciences
Google – PageRank
LinkedIn – expanding your Cognitive Social Network
making you aware that „you‟re more connected and closer than you
think you are‟
Expertise discovery in organizations
Knowledge experts, „authorities‟
Well-connected individuals, „hubs‟
Rapid-response teams in emergency management
Information flow in organizations
Twitter – real time information dissemination
Etc.
8. Player Behavior & Revenue Model
Blizzard (subscription) Zynga (free2play)
World of Warcraft Farmville, Fishville, Mafia
12 million subscribers Wars, etc.
Revenue model 180 million players
$15/month Revenue model
Approx $3billion annual Virtual goods
revenue $700 million in 2010
4 hours a day, 7 days a 0.5 hrs a day, 7 days a week
week!
Hard core gamers Everyone
Less socially acceptable More socially acceptable
Like Cocaine Like Caffeine
9. Implications of this „addiction‟
3 billion hours a week are being spent playing online
games
Jane McGonigal in “Reality is Broken”
Labor economics
What is the impact of so much labor being removed from the pool
[Castranova]
Entertainment economics
If MMO players can get 100 hrs/month of entertainment by spending
$25 or so, what will happen to other entertainment industries?
Psychological/Sociological
Is it an addiction – the prevailing view (Chinese government‟s „detox
centers‟ for kids)
Are they fulfilling a deeper need that real world is not (McGonigal)
Societal
A trend far too important to not be taken seriously!
11. Levis‟ – Example of Social Retail
Levis‟ leverages its brand to ensure customers provide their social
network
Levis‟ can leverage predictive social analytics technology to understand
the value of the customer‟s social network
11
12. Opportunity, Innovation, Impact
Companies do not understand the social graph of their customers
It‟s not just about how they relate to their customers, but also about
how customers relate to each other
vs.
Understanding these relationships unlocks immense value
Innovation: Understanding the social network of customers
Key influencers, relationship strength, …
Impact: Deriving actionable insights from this understanding
Customer acquisition, retention, customer care, …
Social recommendation, influence-based marketing, identifying trend-setters, …
Ninja Metrics confidential information. Copyright 2012 12
13. Unlocking true value by product, category, or store
0021
-$128.61
-$293.79
-$79.63
13
14. True Value of each customer
True value = individual value + social value
Who really matters, and to what degree
Some empirical facts
31% activity due to socialization
23% more individual + 8% more social activity
The individual‟s their social and their true
lifetime value influence total
14
15. Impact of New Instrumentation on Science
1950s
Invention of the electron microscope fundamentally changed
chemistry from „playing with colored liquids in a lab‟ to „truly
understanding what‟s going on‟
1970s
Invention of gene sequencing fundamentally changed biology from
a qualitative field to a quantitative field
1980s
Deployment of the Hubble (and other) Space telescopes has had
fundamental impact on astronomy and astrophysics
2000s
Massive adoption is fundamentally changing social science
research
Massively Multiplayer Online Games (MMOGs) and Virtual Worlds
(VWs) are acting as „macroscopes of human behavior‟
16. The Virtual World
Observatory (VWO) Project
• Four PIs, 30+ Post-docs, PhD and MS students, UGs, high-schoolers
• Noshir Contractor, Northwestern: Networks
• M. Scott Poole, Illinois Urbana-Champaign/NCSA: Groups
• Jaideep Srivastava, Minnesota: Computer Science
• Dmitri Williams, USC: Social Psychology
• Collaborators
• Castronova (Sociology, Indiana), Yee (Xerox PARC), Consalvo, Caplan
(Economics, Delaware), Burt (Sociology, U of Chicago), Adamic (Info Sci,
Michigan), …
• Data and technology partners
• Sony (EverQuest 2), Linden Labs (2nd Life), Bungie (Halo3), Kingsoft
(Chevalier‟s Romance), others …
• Cloudera Systems (Hadoop), Microsoft (SQL Server), Weka, …
17. Overall Goals of the VWO Project
Basic Science
• behavior, socialization, …
• novel, scalable algorithms
• NSF
Sticky Social
Media tons of Analytics Government applications
data • Social science
• Games • team dynamics (Army)
models • skills acq, leadership (Army)
• Virtual Worlds • New algorithms • social influence & adolescent
• Other social • Ultra-scalability health (CDC)
• etc.
apps
Business applications
• customer churn
• game design
• anti-social behavior
• etc.
21. Who is playing?
It is not just a
bunch of kids
Average age is
31.16 (US
population median
is 35)
More players in
their 30s than in
their 20s.
22. How much do they play?
Mean is 25.86
hours/week
Compares to
US mean of
31.5 for TV
(Hu et al,
2001)
• From prior experimental work, MMO play eats into entertainment TV
and going out, not news
• So much for kids being the ones with the free time.
23. Gender Differences
More men players (78/22%)
Men played to compete , and women played to socialize
Men play more other games, but it was the women who were more
satisfied EQ2 players
Women: 29.32 hours/week
Men: 25.03 hours/week
Likelihood of quitting: “no plans to quit”:
women 48.66%, men 35.08%
Self reported play times
Women: 26.03 (3 hours less than actual)
Men: 24.10 (1 hour less than actual)
Boys and girls are socialized early on, and thus have clear role
expectations for their behaviors and identities (Gender Role Theory
in action!!)
25. Inferring RW gender from VW data
Goal
What virtual world behaviors and characteristics predict
real world gender?
Data:
Survey Data n=7119
Survey Character Store
EQ2 Character Store
Variables
Avatar Characteristics:
Gender, Race, Class, Experience, Guild Rank, Alignment,
Archetypes
Game play Behaviors:
Total Deaths & Quests, PvP Kills & Deaths, Achievement Points,
Number of Characters, Time played, Communication patterns
25
26. Gender prediction results
Close to 95% prediction accuracy
Decision trees work rather well
Character Gender, Race and Class are significant
predictors to real life gender.
Gender swapping is rarer, but systematically different by
real gender
Players tend to choose:
character gender based on their real life gender
character races that are gendered: women play
elves/men play barbarians
classes that are gendered: women play priests/men
play fighters
26
27. Gender swapping behavior
Game Character Real Gender
Gender
Male Female Total
Male 4065 82.6% 98 8.2% 4163 68.0%
Female 855 17.4% 1104 91.8% 1959 32.0%
Total 4920 100.0% 1202 100.0% 6122 100.0%
Observation
• Far more males gender swap than females
• Why?
• Men are more creative?
• Women have less identity confusion?
• Women get their „fill of gender swapping in real life‟
27
28. Economics: A test of RW VW
mapping
Do players behave in virtual worlds as we expect
them to in the actual world?
Economics is an obvious dimension to test
In the real world, perfect aggregate data are hard to
get
29. GDP and Price Level
GDP and price levels are robust but comparatively unstable
GDP and Prices on Antonia Bayle
5,000,000 160
4,500,000 140
4,000,000
120
Prices (January = 100)
3,500,000
100
GDP (Gold)
3,000,000
2,500,000 80
2,000,000 60
1,500,000
40
1,000,000
20
500,000
0 0
January February March April May
Nominal GDP Price Level (January = 100)
30. Money Supply and Price
Change in Money Supply and Population on Antonia Bayle
The instability is 2500 4000
explicable through 3000
Change in Money (000 Gold)
2000
the Quantity Theory 2000
Change in Accounts
1000
of Money 1500
0
-1000
a rapid influx of 1000
-2000
money . . . 500 -3000
-4000
0 -5000
February March April May
Change in Money Supply (000 Gold) Change in Active Accounts
. . . dramatically Price Level on Antonia Bayle
boosted prices 50
40
Percent Change in Price Level
30
More evidence that 20
this behaves like a 10
real economy 0
-10 February March April May June
-20
Price Level
31. Networks in Virtual Worlds
SONIC
Advancing the Science of
Networks in Communities
32. Why do we create and sustain
networks?
Theories of self-interest Theories of contagion
Theories of social and Theories of balance
resource exchange Theories of homophily
Theories of mutual interest Theories of proximity
and collective action Theories of co-evolution
Sources:
Contractor, N. S., Wasserman, S. & Faust, K. (2006). Testing multi-theoretical multilevel hypotheses
about organizational networks: An analytic framework and empirical example. Academy of
Management Review.
Monge, P. R. & Contractor, N. S. (2003). Theories of Communication Networks. New York: Oxford
University Press.
SONIC
Advancing the Science of
Networks in Communities
33. “Structural signatures” of Social Theories
A A
B
B F
+ F
+ -
C
- E
C
E
D
D
Self interest Exchange Balance
A
A
B F
B F
- +
+ C E
C D
E
Novice
D Expert
Collective Action Homophily Contagion
SONIC
Advancing the Science of
Networks in Communities
34. Black: male
Red: female
Partnership Instant messaging
SONIC
Trade Mail
Advancing the Science of
Networks in Communities
35. 4 5 2
0 0 0
(1)
0 0 0
(2)
0 1 0
1 3 0
1 2 0
0 1 0
(n)
0 0 0
Social Networks (1) (2) (n)
as network structure frequency Cluster Structure
vectors in a bag-of-words model Vectors using
Text clustering
methods
Social theory + IR Cluster means provide Attribute values for
based network modes of network each cluster can be
analysis structure configurations used to discover
Making up all the trends between network
social networks structures and attributes
37. Results
Clusters 1 and 4 are similar
Groups kill fewer monsters
Group members in cluster 4 do not communicate much
Group members in cluster 1 generally limit their communication to just
one other person in the group
Most people belong to these two clusters
Consistent with previous research - users in virtual environments are
less likely to interact with strangers
[N. Ducheneaut, N. Yee, E. Nickell and R. Moore, “Alone Together?” Exploring
the social dynamics of massively multiplayer online games, Proceedings
CHI06, ACM Press, New York, 407-416.]
Cluster 5 groups have many 1-edge and 2-out stars
Most of the communication is one way possibly indicating presence of
central people
Maximum number of monsters killed out of all clusters
Performance of the groups is very good
Minimal communication
It is possible that cluster 5 consists of groups more focused on playing
and performing well in the game and less on socializing
38. Some Open Questions
How similar/dissimilar are online social networks from real-
world social networks?
Is online socialization
Only a sustainability activity of real-world networks?
Causing new social networks to be formed?
Is fundamentally a different type of networking activity?
A fundamental tenet of socialization has been
“geography/proximity drives socialization”
How is this being impacted by online socialization?
40. Description and motivation
The goal of this work is to improve skill assessment approaches
used in multiplayer games, especially team-based games
Xbox Live, PSN, Steam, and Battle.net have between 149-167
million total users, collectively (and that number is growing)
Many of the most popular games are team-based: Halo, Call of Duty,
TF2, CS, etc
Why?
Better skill assessment = fairer games = less player attrition
More accurate rankings of players/teams
Most previous work has focused on individuals, not teams
It‟s a hard problem
Online setting: Updates to a player‟s skill distribution must be done
after each game is played
Applicability: Any generalized assessment technique cannot include
game-specific data
Our datasets are from the games of professional Halo 3 players
40
41. What is Halo?
Halo is one of the most well-known “first-person shooter” video
game series in the world
Online/LAN-based multiplayer is its most significant component
650,000 to 850,000 unique users per day at its peak
41
42. Major League Gaming
Our work focuses on
professional Halo 3 players
Online scrimmage data, as
well as complete tournament
data, is readily available from
bungie.net and mlgpro.com
Players are highly-skilled
individually – allows us to
better focus on group-level
What is Major League Gaming (MLG)? performance characteristics
• A professional league for competitive Players change teams
gaming, including Halo 3 from ‟08-‟10
• Best players in the world compete in this regularly, helps isolate
league impact of particular players
• Last tournament watched by over on overall team performance
1,000,000 people online
Unexplored boundary case
• Our datasets are comprised of games
played by these players for skill assessment
42
43. Skill assessment
An old problem (with different applications in different
contexts)
Paired comparison estimation
Foundational work by Thurstone (1927) and Bradley-Terry (1952)
Elo (1959), popularized in chess ranking (FIDE, USCF)
Glicko (1993) – player-level ratings volatility incorporated (σ2),
addition of rating periods
TrueSkill (2006) - factor-graph based approach used in
Microsoft‟s Xbox Live gaming service
Used to match players/teams up with each other online
Our work focuses on Elo, Glicko, and TrueSkill
43
44. Elo (1959)
Arpad Elo, Dept of Physics, Marquette University
Was a master chess player in USCF
Proposed the Elo Rating System
Replaced earlier systems of competitive rewards (i.e.,
tournaments) in USCF/FIDE
Simple to implement: assumes player skill is normally-
distributed with constant variance β2
Still widely-used today
44
45. Glicko (1993)
Mark E. Glickman, Department of Health Policy
and Management , Boston University
Proposed the Glicko Rating System
Addresses rating reliability issue through the use of
rating periods and player-specific variances
Elo is a special case of Glicko
Iterative approach for approximating the
marginal posterior distribution of a player‟s skill
conditional on other players‟ priors
More computationally tractable for large datasets
45
46. TrueSkill (2006)
Ralf Herbrich and Thore
Graepel at Microsoft
Research, Cambridge, UK
Used for automated
ranking/matchmaking on Xbox
Live
Uses factor graphs to model
multi-player, multi-team
environments
Converges quickly (~5 games
for a player)
Large-scale deployment
30 million Xbox Live members
150+ games use TrueSkill for
ranking & matchmaking
46
47. Issues with current approaches
1 2 3 4
+
1234
Basic idea
Given: skill ratings of each team member, i.e., si ~ N(μi, σi2)
Sum across all team members
Not intuitive: Team chemistry is a well-known concept in team-based
competition [Martens 1987, Yukelson 1997] and it is not captured in
any of these models
Can think of it as the overall dynamics of a team resulting from
leadership, confidence, relationships, and mutual trust
Independence assumption not realistic in teams, especially at
high levels of play
47
48. More information is available, however…
1 2 3 4
12 13 14 23 24 34
123 124 134 234
1234
Observation: We have more information than just the history of
individual players – we also know the histories of groups of players
Player 1‟s history ∩ player 2‟s history history of {1, 2}
Existing approaches only make use of top row
Idea: Estimate the skills of subgroups of players on a team, combine
in some way, and use to produce better estimate of a team‟s skill
48
49. Reframing the assessment
problem
k=1 1 2 3 4
k=2 12 13 14 23 24 34
k=3 123 124 134 234
k=4 1234
Alterations to existing approaches necessary (i.e., Elo/Glicko/TrueSkill)
Player-level skill representation generalized subgroup skill
representation
For each game and for each k <= size of the team (K), treat each
group as you would an individual player and update skill accordingly
Hashing of skill variable matrix according to unordered subgroup
membership
Treat Elo/Glicko/TrueSkill as „base learners‟, a la boosting
Each rating says something about the skill of that particular subgroup
…but how do we aggregate these ratings to estimate the skill of a
team?
49
50. TeamSkill
Four different aggregation approaches:
TeamSkill-K
TeamSkill-AllK
TeamSkill-AllK-EV
TeamSkill-AllK-LS
50
51. Aggregation issues
black = history available red = no history available
time: t=1 t = 100 t = 200
After 1 game New player – 5 - who‟s New player – 6 – who‟s
never played with 1, 2, or 3 played with 3 or 5 (not both)
Real world case: assume that players can leave/join the 4-player
team and look at the timeline
The question at each point in time, t: how best to combine the
available group ratings to produce a team rating?
The problem: the feature space is expanding and contracting over
time.
51
52. Data set overview
Collected over the course of
2009
7,590 games (2,076 from
tournaments and 5,514 from
Xbox Live scrimmages)
448 players on 140 different
professional and semi-
professional teams
Games took place during
January 2008 through January
2010
Websites pertaining to this data
http://stats.halofit.org - Player/team
statistics
http://halofit.org – Datasets and related
information
“Friend or foe” social network for all tournaments in 2008 and
2009
52
53. Evaluation
The TeamSkill approaches were evaluated by predicting the outcomes
of games occurring prior to 10 MLG tournaments and comparing their
accuracy to unaltered versions (k = 1) of their base learner rating
systems - Elo, Glicko, and TrueSkill (for TeamSkill-K, all possible
choices of k for teams of 4, 1 ≤ k ≤ 4, were used)
For each tournament, we evaluated each rating approach using:
3 types of training data sets - games consisting only of previous tournament data,
games from online scrimmages only, and games of both types.
3 periods of game history - all data except for the data between the test tournament
and the one preceding it (“long”), all data between the test tournament and the one
preceding it (“recent”), and all data before the test tournament (“complete”).
We will only show “complete” as results for “long” and “recent” mirrored those for “complete”
2 types of games - the full dataset and those games considered “close” (i.e., prior
probability of one team winning close to 50%).
53
54. Results – all games
Prediction accuracy for both tournament and scrimmage/custom games using complete history
Prediction accuracy for tournament games using complete history
Prediction accuracy for scrimmage/custom games using complete history
54
55. Results – “close” games
Prediction accuracy for both tournament and scrimmage/custom games using complete history
Prediction accuracy for tournament games using complete history
Prediction accuracy for scrimmage/custom games using complete history
55
56. Conclusions
Modified several existing assessment approaches to
operate on generalized player subgroup entities (instead of
just individual players)
Introduced four aggregation methods for combining player
subgroup information to produce final forecast
Shown evidence that close games are decided on the
basis of team chemistry, consistent with sports psychology
research
56
59. Time Equals Stickiness
6.00
5.00
6
Ratio of Quitters to Stayers
4.00
3.00
2.00
1.00
0.00
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63 65 67 69
Character Level
There are less quitters as the levels go up, and focus
should be on the first 20 levels.
60. Solo vs. Social Players
6.00
5.00
Ratio of Quitters to Stayers
4.00
3.00 Solo
Social
2.00
1.00
0.00
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63 65 67 69
Character Level
Isolated players are 3.5x more likely to quit (B = 1.26, p<.001).
Focus design on facilitating social interaction.
61. Problem Statement & Approach
Objective: At time t, for player p, given a window w, compute
the probability, P(p,t,w), of p churning within the interval
(t,t+w), and the confidence in this probability.
Approach: Estimate P(p,t,w) and its confidence using
Data about p‟s activity and socialization behavior
Statistics and machine learning (linear, non-linear, ensemble, etc.)
Use of socio-psychological theories of player motivation
Novel synthesis of data driven (DD) and theory driven (TD) approaches
61
65. Model Evaluation (Confusion Matrix)
Model performance for different features (10
fold cross-validation)
Features Tree size Precision (%) Recall (%) F-measure
(nodes) (%)
Pure DD 485 69.3 84 76
TD: 59 67.1 76.2 71.3
Achievement+Social
TD: Achievement 37 67.2 73.2 70.1
TD: Socialization 7 64.3 55.4 59.5
Observations
Decision tree learning works quite well
DD model very accurate (76%); difficult to interpret (485
nodes)
TD model interpretable (7 – 59 nodes), but less accurate
Achievement-orientation more important than socialization
65
66. Model Evaluation (Lift Chart)
Observations
TD model does better in
predicting top quintile of
churners
DD model performs better in the
40%-70% range
66
67. Ensemble Approach – Model
Evaluation
Ensemble approach
A „committee of classifiers‟ votes on the result
Heterogeneity helps
No. of Precision Recall F-measure
clusters
5 66.23 91.94 76.99
10 66.49 91.08 76.87
15 67.58 89.80 77.12
20 67.6 90.13 77.25
Improvements over single model
Best recall value shows 7.94% improvement , i.e. model
can identify larger proportion of potential churners)
Best F-measure shows 1.25% improvement
67
68. Conclusions
Comparison of theory-driven and data-driven model in
terms of prediction accuracy and model interpretability
Achievement-orientation is more important than
socialization-orientation in identifying potential churners
Ensemble model can identify larger proportion of
potential churners as compared to single global model
68
69. Course Outline
• Module 1
• Introduction to Social Analytics – applying data mining to social computing
systems; examples of a number of social computing systems, e.g.
FaceBook, MMO games, etc.
• Module 2
• Computational online trust
• Identifying key influencers
• Module 3
• Analysis of clandestine networks
• Katana - game analytics engine
4/9/2013 University of Minnesota 69
71. “If you want to go fast walk alone, if you
want to go far then walk with a group.”
- Proverb from Ghana
72. Virtual Worlds & Massive Online
Games
Massively Multiplayer Online Role
Playing Games
(MMORPGs/MMOs)
Simulated Environments like
SecondLife
Millions of people can interact with
one another is shared virtual
environment
People can engage in a large number
of activities with one another and with
the environment
Many of the observed behaviors have
offline analogs
72
72
73. Big Picture Questions
How is trust expressed differently in different social
contexts?
Cooperative (PvE), Adversarial (PvP), …
How is trust expressed in different types of social
networks?
Housing, Mentoring, Trade, Group, …
What are the characteristics of trust and related networks
in MMOs?
Similarities and differences with social networks in other domains
e.g., citation networks, co-authorship networks
What role can features derived from the trust network play
in prediction tasks e.g., link prediction (formation,
breakage, change), trust propensity, success prediction
73
74. Computational Trust in Multiplayer Online Games
Department of Computer Science, University of Minnesota
Muhammad Aurangzeb Ahmad
Trust as a Multi-Level Network Phenomenon
Trust as a multi-modal multi-level Network formulations of Traditional Incorporating social science theories Generative Network Models for Trust based
network phenomenon Trust related concepts in trust related prediction task social interactions
75. Computational Trust in Multiplayer Online Games
Department of Computer Science, University of Minnesota
Muhammad Aurangzeb Ahmad
Trust as a Multi-Level Network Phenomenon
Trust as a multi-modal multi-level Network formulations of Traditional Incorporating social science theories Generative Network Models for Trust based
network phenomenon Trust related concepts in trust related prediction task social interactions
Social Characteristics of Trust
Effect of Social Environments on Trust Trust and Homophily Trust and Clandestine Behavior Trust and Mentoring Trust and Trade
Most types of homophily do not
Different Social environments result in carry over to the MMO domain No Honor Amongst Thieves
differences in network signatures
Trust based and other social networks in MMOs
exhibit anomalous network characteristics
76. Computational Trust in Multiplayer Online Games
Department of Computer Science, University of Minnesota
Muhammad Aurangzeb Ahmad
Trust as a Multi-Level Network Phenomenon
Trust as a multi-modal multi-level Network formulations of Traditional Incorporating social science theories Generative Network Models for Trust based
network phenomenon Trust related concepts in trust related prediction task social interactions
Social Characteristics of Trust
Effect of Social Environments on Trust Trust and Homophily Trust and Clandestine Behavior Trust and Mentoring Trust and Trade
Most types of homophily do not
Different Social environments result in carry over to the MMO domain No Honor Amongst Thieves
differences in network signatures
Trust based and other social networks in MMOs
exhibit anomalous network characteristics
Trust and Prediction
Trust Prediction Family of Problems Item Recommendation Success Prediction (Social Capital)
Predict Formation, Change, Breakage of Trust with in the same
network and across social networks. Predict trust propensity.
Effects of social environments on trust Effect of network structure on trust
77. Computational Trust in Multiplayer Online Games
Department of Computer Science, University of Minnesota
Muhammad Aurangzeb Ahmad
Trust as a Multi-Level Network Phenomenon
Trust as a multi-modal multi-level Network formulations of Traditional Incorporating social science theories Generative Network Models for Trust based
network phenomenon Trust related concepts in trust related prediction task social interactions
Social Characteristics of Trust
Effect of Social Environments on Trust Trust and Homophily Trust and Clandestine Behavior Trust and Mentoring Trust and Trade
Most types of homophily do not
Different Social environments result in carry over to the MMO domain No Honor Amongst Thieves
differences in network signatures
Trust based and other social networks in MMOs
exhibit anomalous network characteristics
Trust and Prediction
Trust Prediction Family of Problems Item Recommendation Success Prediction (Social Capital)
Predict Formation, Change, Breakage of Trust with in the same
network and across Social networks. Predict trust propensity.
Effects of social environments on trust Effect of network structure on trust
78. Computational Trust in Multiplayer Online Games
Department of Computer Science, University of Minnesota
Muhammad Aurangzeb Ahmad
Computational Social Science
Semantics Structure
Trust and Homophily Characteristics of Trust Networks
Most types of homophily do not Trust based and other social networks in MMOs
carry over to the MMO domain exhibit anomalous network characteristics
Algorithmic
Trust Prediction Family of Problems
Predict Formation, Change, Breakage of Trust with in the same
network and across social networks. Predict trust propensity.
Applications
Detection of Clandestine Actors
No Honor Amongst Thieves
Gold Farmer Detection
79. Computational Trust in Multiplayer Online Games
Department of Computer Science, University of Minnesota
Muhammad Aurangzeb Ahmad
Computational Social Science
Semantics Structure
Trust and Homophily Characteristics of Trust Networks
Most types of homophily do not Trust based and other social networks in MMOs
carry over to the MMO domain exhibit anomalous network characteristics
Algorithmic
Trust Prediction Family of Problems
Predict Formation, Change, Breakage of Trust with in the same
network and across social networks. Predict trust propensity.
Applications
Detection of Clandestine Actors
No Honor Amongst Thieves
Gold Farmer Detection
80. Housing-Trust in EQ2
• Access permissions to in-game house
as trust relationships
• None: Cannot enter house.
• Visitor: Can enter the house and can
interact with objects in the house.
• Friend: Visitor + move items
• Trustee: Friend + remove items
• Houses can contain also items which
allow sales to other characters without
exchanging on the market
80
81. Homophily and Trust
Homophily: Birds of a feather flock together
There is no one form of homophily and homophily in
general is described in multiple ways: Status vs. Value
Homophily
Each of these homophiles are in turn defined in multiple
ways themselves
Previous literature instantiates homophily in MMOs in
terms of player characteristics and behavior in the game
81
82. Network Models, Homophily and
Trust Networks in MMOs
RQ1: Does homophily in MMOs operate in ways similar to
homophily in the offline world?
RQ2: How do we map characteristics that define
homophily in the offline world to online settings?
82
83. Mapping Homophily in MMOs
In general, studies of homophily in MMOs assume only one type of homophily and
generalize based on that type
Even in the offline world homophily is of different types
Hence the necessity of Mapping Homophily which we address here
Mapping and Proteus Effect
83
84. Trust and Homophily in MMOs
Homophily Type Hypothesis Observation
H Gender Homophily Players trust other players who ?
1 are of the same gender
H Age Homophily Players trust other players who ?
2 are of the same age cohorts
H Class Homophily Players trust other players who ?
3 are of the same class
H Race Homophily Players trust other players who ?
4 are of the same race
H Guild Homophily Players trust other players who ?
5 belong to the same guild
H Level Homophily Players trust other players who ?
6 are at a similar level
H Challenge Players trust other players who ?
7 Homophily like similar types of challenges
84
85. Key Observations
H1: Players trust other players who are of the same gender?
In general players trust other players who are of
the same gender
H2: Players trust other players who are of similar age?
The stronger the type of trust the lesser is the age
difference between the people specifying trust
H3: Players trust other players who are of the same class?
Class does not seem to effect the choice of trusting
others
85
86. Key Observations
H4: Players trust other players who are of the same race?
Race does not seem to effect the choice of trusting
others
H5: Players trust other players who are of the same guild?
In general, the stronger the type of trust, the greater
is the percentage of the people who trust people in
their own guilds
H6: Players trust other players more who are level at a similar rate?
Leveling at the same rate does not seem to greatly
effect trust amongst players
86
87. Key Observations
H7: Players trust other players who are of the same level?
• Level difference seems to have some effect on trust
• For Trustee (strongest) and the Visitor (weakest) form of trust, the lower level
players are more likely to trust players who are at a higher level
Summary:
• Homophily is observed for a subset of types in MMOs as compared to what it is
observed for in the offline world
• The types of homophily which are not observed in MMOs are the ones which are
greatly effected by game mechanics
87
88. Computational Trust in Multiplayer Online Games
Department of Computer Science, University of Minnesota
Muhammad Aurangzeb Ahmad
Computational Social Science
Semantics Structure
Trust and Homophily Characteristics of Trust Networks
Most types of homophily do not Trust based and other social networks in MMOs
carry over to the MMO domain exhibit anomalous network characteristics
Algorithmic
Trust Prediction Family of Problems
Predict Formation, Change, Breakage of Trust with in the same
network and across social networks. Predict trust propensity.
Applications
Detection of Clandestine Actors
No Honor Amongst Thieves
Gold Farmer Detection
89. Social Networks: General
Observations
There is an extensive literature on characteristics of social
networks (Leskovec PAKDD 2005, Leskovec ICDM 2005,
McGlohon ICDM 2008, McGlohon KDD 2008)
The network exhibits monotonically
shrinking diameter over time (Leskovec PAKDD 2005,
Leskovec ICDM 2005, McGlohon ICDM 2008)
At a certain point in time called the Gelling Point many
smaller connected connect together and become part of
the largest connected component (Leskovec PAKDD 2005,
Leskovec ICDM 2005, McGlohon ICDM 2008)
The largest connected component (LCC) comprises of
the majority of the nodes in the network (>= 80%)
(McGlohon ICDM 2008, McGlohon KDD 2008)
89
90. Social Networks: General
Observations
The size of the second and the third largest connected
components remain constant (more or less) even though
the identity of these components change over time
(Leskovec PAKDD 2005, Leskovec ICDM 2005, McGlohon
ICDM 2008, McGlohon KDD 2008)
Network isolates are few in number (<5%)
(Leskovec PAKDD 2005, Leskovec ICDM 2005)
The number of connected components decreases over
time
(Leskovec PAKDD 2005, Leskovec ICDM 2005, McGlohon
ICDM 2008)
Relatively fast growth of LCC close to the gelling point
(Leskovec PAKDD 2005, Leskovec ICDM 2005)
90
91. Trust Networks in MMOs
Data from 4 servers is available. Results from one server
(Player vs. Environment, „guk‟) are shown
The network consists of 15,237 nodes, 30,686 edges
and 1,476 connected components
Dataset spans from January 2006 to August 2006
Average node degree of 4.03. The size of the three
largest connected components are as follows: 9039, 51
and 49. The largest connected component accounts for
59% of all the nodes in the network
The Trust Network on „guk‟ on August 31, 2006
91
92. Key Observations
Observation 1: Preferential Attachment: The rich get
richer but not too rich
Explanation: Social bandwidth is limited, Dunbar
Number
Observation 2: The growth of the LCC is retarded after
the gelling point
Explanation: The trust network has a relatively low
growth rate as compared to the other networks
Observation 3: Non-monotonic change in the diameter
of the largest connected component
Explanation: Players have different levels of activity at
various points in time and can also “drop out” of the
network if they churn from the game
92
93. Key Observations
Observation 4: A large number of isolate components are
observed (> 1000)
Explanation: People join in groups and spend all the time
playing with one another instead of interacting with people
from the outside
Observation 5: The number of isolate components
increases monotonically over time
Explanation: (Same as observation 4)
Observation 6: Nodes in the non-LCC constitute a
significant portion of the network (41%, 8 months after
gelling point)
Explanation: (Same as observation 4)
93
94. Generative Models of Trust Networks
Time bound Preferential Attachment: The rich get rich
but not so much after a certain point in time. Edge
formation is bound by time
Presence of Auxiliary Components: Isolate components
are added to the network at an almost constant rate over
time
94
95. Generative Models of Trust Networks
(ii)
Non-Monotonic Decrease in the Diameter:
Nodes become inert after a certain point in time. Sample
the lifetime of nodes from a normal distribution
Homophily in Edge formation: Probability of edge
formation dependent upon node degree as well as
agreement (similarity) in node characteristics
95
96. Results
Diameter: Non-Monotonically Changing Diameter
% LCC as being relatively small
96
97. Results
Number of Connected Components
Network growth and the gelling point
97
98. Conclusion: Trust Networks
Trust Networks in MMOs exhibit many properties which are
not exhibited by other social networks in most other
domains
Proposed a model based on observations and domain
knowledge
Models of social networks should incorporate the
peculiarities which are observed in MMOs in general
Generalization? Similar observations have been made for
mentoring networks but not for PvP, Trade and Chat
Networks
98
99. Computational Trust in Multiplayer Online Games
Department of Computer Science, University of Minnesota
Muhammad Aurangzeb Ahmad
Computational Social Science
Semantics Structure
Trust and Homophily Characteristics of Trust Networks
Most types of homophily do not Trust based and other social networks in MMOs
carry over to the MMO domain exhibit anomalous network characteristics
Algorithmic
Trust Prediction Family of Problems
Predict Formation, Change, Breakage of Trust with in the same
network and across social networks. Predict trust propensity.
Applications
Detection of Clandestine Actors
No Honor Amongst Thieves
Gold Farmer Detection
100. Trust Prediction Family of
Problems
Trust Prediction: Given a trust network G predict which
nodes are going to trust one another in the future
Prediction Across Networks: Given a set of actors who
participate in multiple types of interactions using features from
one network predict the existence of links in the other network
100
101. Trust Prediction Family of
Problems
Machine Learning/Classification approach to the problem
of trust prediction
Introduced a set of new problems (inter-network link
prediction, trust propensity prediction)
Proposed an algorithm for link prediction which used
domain knowledge from social science theories
New Contributions:
Does the social context (adversarial vs. cooperative) effect
prediction results?
If so then what is the implication for generalization?
101
102. Prediction Task
Trust Prediction as a classification problem
60,000 examples for each prediction task
10 Fold Cross-validation
Data from Guk (Cooperative) and Nagafen (Adversarial)
Servers
Six Standard Classifiers for Comparison: J48, JRip,
AdaBoost, Bayes Network, Naive Bayes and k-nearest
neighbor
Positive Example: Negative Example:
Training Period Test Period Training Period Test Period
102
103. Prediction: Group Network
In general good prediction results are obtained for the
combat network (in either direction), except one case
The grouping network is a union of all grouping
instances and thus it is extremely dense (1,796,438
edges, 31,900 nodes)
103
104. Prediction: Group Network
Grouping is not a good predictor of mentoring but the vice
versa is correct
Mentoring is (more often than not is accompanied by
grouping) but grouping can be in a variety of contexts e.g.,
raids, quests, dungeon instances, …
104
105. Prediction: Combat Network
In general good prediction results are obtained for the combat
network (in either direction)
In general if players have played against another person then
they have friends in other networks who have done the same
or something similar
105
106. Comparison across Social
Environments
T: Trust; M: Mentoring; B: Trade
In general the results are similar for both the
environments with some exceptions
Mentoring-Mentoring: In the adversarial environment a
more cliqueish behavior is observed i.e., friends of
friends are likely to mentor one another in the future
which is not the case for the cooperative environment
106
107. Comparison across Social
Environments
Mentoring-Related Tasks: In general mentoring is
much more prevalent in the adversarial environment as
compared to the cooperative environment (~3 times) and
is much more intense. Overlap between mentoring and
trade is thus more likely
107
108. Computational Trust in Multiplayer Online Games
Department of Computer Science, University of Minnesota
Muhammad Aurangzeb Ahmad
Computational Social Science
Semantics Structure
Trust and Homophily Characteristics of Trust Networks
Most types of homophily do not Trust based and other social networks in MMOs
carry over to the MMO domain exhibit anomalous network characteristics
Algorithmic
Trust Prediction Family of Problems
Predict Formation, Change, Breakage of Trust with in the same
network and across social networks. Predict trust propensity.
Applications
Detection of Clandestine Actors
No Honor Amongst Thieves
Gold Farmer Detection
109. Trust Amongst Clandestine Actors
“A plague upon‟t when thieves
cannot be true one to another!”
– Sir Falstaff, Henry IV, Part 1, II.ii
Do gold farmers
trust each other?
109
110. Hypergraphs to Represent Tripartite Graphs
• Accounts can have several characters
• Houses can be accessed by several characters
• Projecting to one- or two-model data obscures crucial
information about embededdness and paths
• Figure 2a: Can ca31 access the same house as ca11?
• Figure 2b: Are characters all owned by same account?
110
111. Hypergraphs: Key Concepts
• Hyperedge: An edge between three or
more nodes in a graph. We use three
types of nodes: Character, account and
house
• Node Degree: The number of
hyperedges which are connected to a
node
NDh1 = 3
• Edge Degree: The number of
hyperedges that an edge participates in
EDa1-h1 = 2
111
112. Network Characteristics
• Long tail distributions are observed for the various degree distributions
• The mapping from character-house to an account is always unique
• Players who are connected to a large number of houses are highly active
players otherwise as well
112
113. Characteristics of Hypergraph Projection Networks
• Account Projection: Majority of the gold farmer nodes are isolates
(79%). Affiliates well-connected (8.89) vs. non-affiliates (3.47)
• Character Projection: Majority of the gold farmer nodes are isolates
(84%). Affiliates well-connected (10.42) vs. non-affiliates (3.23)
• House Projection: 521 gold farmer houses. Most are isolates (not
shown) but others are part of complex structures. Densely connected
network with gold farmers (7.56) and affiliates (84.02)
The Housing Projection Network
113
114. Key Observations
• Gold farmers grant trust ties less frequently than either
affiliates or general players
• Gold farmers grant and receive fewer housing permissions
(1.82) than their affiliates (4.03) or general player population
(2.73)
Total degree In degree Out degree
<n> < nGF > < nAff > <n> < nGF > < nAff > <n> < nGF > < nAff >
Farmers 1.82 0.29 1.82 0.89 0.29 0.89 1.07 0.29 1.07
Affiliates 4.03 1.28 0.70 1.55 0.75 0.70 2.88 0.63 0.70
Non-Affiliates 2.73 - 7.77 1.57 - 5.98 1.56 - 2.34
114
115. Key Observations
• No honor among thieves
• Gold farmers also have very low tendency to grant other
gold farmers permission (0.29)
• Affiliates also unlikely to trust other affiliates (0.70)
Total degree In degree Out degree
<n> < nGF > < nAff > <n> < nGF > < nAff > <n> < nGF > < nAff >
Farmers 1.82 0.29 1.82 0.89 0.29 0.89 1.07 0.29 1.07
Affiliates 4.03 1.28 0.70 1.55 0.75 0.70 2.88 0.63 0.70
Non-Affiliates 2.73 - 7.77 1.57 - 5.98 1.56 - 2.34
115
116. Key Observations
• Affiliates are brokers:
• Farmers trust affiliates more (1.82) than other farmers (0.29)
• Affiliates trust farmers more (1.28) than other affiliates (0.70)
• Non-affiliates have a greater tendency to grant permissions to
affiliates (7.77) than in general (2.73)
Total degree In degree Out degree
<n> < nGF > < nAff > <n> < nGF > < nAff > <n> < nGF > < nAff >
Farmers 1.82 0.29 1.82 0.89 0.29 0.89 1.07 0.29 1.07
Affiliates 4.03 1.28 0.70 1.55 0.75 0.70 2.88 0.63 0.70
Non-Affiliates 2.73 - 7.77 1.57 - 5.98 1.56 - 2.34
116
117. Frequent Itemset Mining for Frequent Hyper-subgraphs
Support of a Hyper-subgraph: Given a sub-hypergraph of size k,
subP is the pattern of interest containing the label P, shP is a pattern
of the same size as subP and contains the label P, the support is
defined as follows:
Support of pattern also containing a gold farmer (red) = 5/8
117
118. Frequent Itemset Mining for Frequent Hyper-subgraphs
Confidence of a Hyper-Subgraph: Given a sub-hypergraph of
size k, subP is the pattern of interest containing the label P, subG is
a pattern which is structurally equivalent but which does not contain
the label P, the confidence is defined as follows:
Confidence of pattern and containing a gold farmer = 5/7
118
119. Frequent Patterns of GFs
• Very low (s ≤ 0.1) support and confidence for almost all
(except 8) frequent patterns with gold farmers
• Remaining 8 patterns can be used for discrimination
between gold farmers and non-gold farmers in a subset of
the instances
• Gold farmers & affiliates are more connected: A 3rd of more
complex patterns are associated with affiliates (15/44)
119
120. Application: Gold Farmer Detection (Behavioral)
Classification using in-game behavioral and demographic features
Each model corresponds to a grouping of different features
120
121. Application: Gold Farmer Detection (Label
Propagation)
• Problem: Not all people who are labeled as normal players are such. Some
of them are gold farmers but have not been identified as such
• Analogue: If someone is socializing almost exclusively with criminals then
he may be a criminal
Classifier Metric Initial Dataset Label Prop Change In Performance
Bayes Net Precision 0.17 0.189 0.019
Recall 0.834 0.819 -0.015
F-Score 0.282 0.307 0.025
J48 Precision 0.494 0.62 0.126
Recall 0.189 0.337 0.148
F-Score 0.273 0.437 0.164
J Rip Precision 0.495 0.537 0.042
Recall 0.462 0.462 0
F-Score 0.478 0.497 0.019
KNN Precision 0.436 0.46 0.024
Recall 0.396 0.428 0.032
F-Score 0.415 0.443 0.028
Logistic Regression Precision 0.455 0.534 0.079
Recall 0.189 0.271 0.082
F-Score 0.267 0.36 0.093
Naïve Bayes Precision 0.146 0.142 -0.004
Recall 0.538 0.502 -0.036
F-Score 0.23 0.221 -0.009
Adaboost w/ DT Precision 0.405 0.471 0.066
Recall 0.105 0.08 -0.025
F-Score 0.167 0.137 -0.03
121
122. Gold Farmer Detection
Model 1 (Player Attribute Based Features): These features are
based on the attributes of the player‟s character in the game e.g.,
character race, character gender, distribution of gaming activities etc.
Model 2 (Item Based Features): These are the features which are
derived from items bought and sold from the consignment network.
These features are based on the frequency of the frequent items sold
or bought by gold farmers.
Model 3 (Player Attribute & Item Based Features): All the attributes
from the previous two models.
Model 4 (Item Network Based Features): Features which are derived
from the item network in a manner analogous to Model 2.
Model 5 (Player Attribute & Item-Network Based Features): A
combination of features from Model 1 and Model 4.
Model 6 (Item Network & Item-Network Based Features): A
combination of features from Model 2 and Model 4.
Model 7 (Player Attribute, Item & Item-Network Based Features):
Union of all the features described above.
122
123. Application: Structural Signatures Approach Applied to Trade Networks
Not sufficient data is present to use the structural signature approach to catch
gold farmers
Alternative: Application of the same approach in other networks: Trade Network
A set of standard machine learning models are used: Naive Bayes, Bayes Net,
Logistic Regression, KNN, J48, JRip, AdaBoost and SMO
123
124. Conclusion: Gold Farmer Behavior Analysis and Prediction
Representation Related Issues
addressed with respect to trust in
MMOs
Application of frequent pattern mining to
discover distinct trust patterns associated
with gold farmers
No honor between thieves:
Gold farmers tend not to trust other
gold farmers
124
125. Computational Trust in Multiplayer Online Games
Department of Computer Science, University of Minnesota
Muhammad Aurangzeb Ahmad
Trust as a Multi-Level Network Phenomenon
Trust as a multi-modal multi-level Network formulations of Traditional Incorporating social science theories Generative Network Models for Trust based
network phenomenon Trust related concepts in trust related prediction task social interactions
Social Characteristics of Trust
Effect of Social Environments on Trust Trust and Homophily Trust and Clandestine Behavior Trust and Mentoring Trust and Trade
Most types of homophily do not
Different Social environments result in carry over to the MMO domain No Honor Amongst Thieves
differences in network signatures
Trust based and other social networks in MMOs
exhibit anomalous network characteristics
Trust and Prediction
Trust Prediction Family of Problems Item Recommendation Success Prediction (Social Capital)
Predict Formation, Change, Breakage of Trust with in the same
network and across social networks. Predict trust propensity.
Effects of social environments on trust Effect of network structure on trust
126. Computational Trust in Multiplayer Online Games
Department of Computer Science, University of Minnesota
Muhammad Aurangzeb Ahmad
Computational Social Science
Semantics Structure
Trust and Homophily Characteristics of Trust Networks
Most types of homophily do not Trust based and other social networks in MMOs
carry over to the MMO domain exhibit anomalous network characteristics
Predictive Analysis
Trust Prediction Family of Problems
Predict Formation, Change, Breakage of Trust with in the same
network and across social networks. Predict trust propensity.
Applications
Detection of Clandestine Actors
No Honor Amongst Thieves
Gold Farmer Detection
127. Conclusion
Availability of data allows one to analyze phenomenon
where it was not possible to do so in the past (Trust in
MMOs in the current case)
Explored a series of big-picture questions pertaining to
trust in MMOs
Similar affordances and contexts (with respect to the offline
world) lead to similar outcomes in the online world
Explored and expanded the scope of trust related
prediction tasks
Analysis is required on more datasets for generalizability
127
128. Acknowledgement
DMR Lab
Professor Jaideep Srivastava and all DMR lab members especially
Zoheb Borbora, Amogh Mahapatra, Young Ae Kim, Nishith Pathak,
Kyong Jin Shim and Nisheeth Srivastava
Virtual Worlds Observatory
Professor Noshir Contractor (Northwestern), Professor Scott Poole
(UIUC), Professor Dmitri Williams (USC)
External Collaborators at Northwestern, UIUC, USC, U Toronto
especially Brian Keegan
Funding Agencies:
NSF, AFRL, NSCTA, IARPA
128
129. Publication Summary
Publication Type Total Thesis Related Comment
Book 1 1 Book on Clandestine
Behaviors and Networks
(Springer 2012)
Conference 14 5
Papers
Book Chapters 1 -
Journal Papers 2 -
Workshop Papers 10 3 2 Best Paper Awards
Short/Poster 8 1
Papers
Technical Reports 4 1
Tutorials 4 1
27 Co-authors
Patents 2 1
129
133. Motivation
Original
Retweet
Global retweets of Tweets coming from Japan for one
hour after the earthquake
133
Courtesy:http://blog.twitter.com/2011/06/global-pulse.html
134. Optimal Target People Selection
Goal: Find the optimal targets to maximize influence
spread in network
Why is it important?
Public Polls: Sentiment spread
Sales and Marketing: Word-of-mouth spread
Public Health: Disease spread
Influence: A force that attempts to change the opinion or
behavior of the individual
Influence is causal
My friend buys a product -> I buy a product
134
135. Current Approaches (1/2)
Assume a certain model for propagation of influence [1]
Independent Cascade Model
Each node is independently influences the neighbor
using a biased coin toss
Linear Threshold Model
Each node picks a random threshold
Each node infects the neighbor by a specified
amount
All of them need to know the
propagation probability 135
136. Current Approaches (1/2)
Use a two step approach
Step 1: Choose influence Model
Step 2: Find the subset of top-k nodes that maximizes
the influence
Bad News: Step 2 is NP-Hard [2]
Popular method for step 2: Greedy Heuristics
Greedy heuristics gives (1-1/e) approximation
Address scalability issues in the second step [3][4][5][6]
CELF [3]: Influence maximization function follows law
of diminishing returns (sub-modular)
136
137. Limitations of current approaches
Estimation/Learning of propagation
probabilities
Data-driven and model free approach
Propagation models are not content
unaware
Content-aware influencer mining
No causality effect in influence model
Add causal effect in influence
propagation
137
138. Proposed Approach
Communication
Logs
Frequent Network
Sequence Discovery of
Mining Influencers
Ordered list of
influencers
Network
Structure
138
139. Sequence Cascade Mining
Example
Temporal order
Append neighbors from network i=2
Check downward closure and
support count
Peter, Charles, Kim, Vicky, Nancy
Network structure
Let Support = 2
139
140. Sequence Cascade Mining
Example
Append neighbors from network i=3
Check downward closure and
support count
Peter, Charles, Kim, Vicky, Nancy
Network structure
Let Support = 2
140
141. Sequence Cascade Mining
Example
Append neighbors from network i=4
Break loop; No more work to do…
Peter, Charles, Kim, Vicky, Nancy
Network structure
Return Frequent Sequences
Let Support = 2
141
142. Sequence Cascade Mining
L1 Frequent Sequence Mining
Candidate Generation for k+1
By appending neighbors
Downward closure pruning
Support Count and reorganize
142
143. Influence Function
Node pattern Influence Set (Q(i,p))
=
Influence Set of a Node
Influence Set of a Node Set
It can be shown that influence function I(V,s) is
monotone and sub-modular
143
144. Network Discovery of Influencers
Greedy heuristic with (1-1/e) approximation
guarantee 144
145. Network Discovery of Influencers
Frequent Sequences
Example
i=1
Choose the node with max out-degree
Find top-3 (Randomly break tie between P
influencers and C)
P C chosen
Current Set of Influenced People
C N
C K P
K V
145
146. Network Discovery of Influencers
Frequent Sequences
Example
i=2
Choose the node with max out-degree
P P chosen
Current Set of Influenced People
C N
C K P V N
K V
146
147. Network Discovery of Influencers
Frequent Sequences
Example
i=3
Choose the node with max out-degree
(Randomly break tie between K, V,
N)
P K chosen
Current Set of Influenced People
C N
C K P V N
K V
147
148. Context-Aware Influencers
Context Specific
influencers
Bag words Frequent Network
representing Sequence Discovery of
the context Mining Influencers
148
149. Initial Results (1/2)
DBLP Dataset USPTO Dataset
Significant performance gain over established
baseline PrefixSpan 149