Social Analytics
  Mining Behaviors of a Connected
                            World
                   PAKDD School
                             April 11, 2013
                         Sydney, Australia


                                Jaideep Srivastava
                            University of Minnesota
                             srivasta@cs.umn.edu

4/9/2013         University of Minnesota              1
Course Outline
• Module 1
   •        Introduction to Social Analytics – applying data mining to social computing
            systems; examples of a number of social computing systems, e.g.
            FaceBook, MMO games, etc.
• Module 2
   •        Computational online trust
   •        Identifying key influencers
   •        Information flow in networks
• Module 3
   •        Analysis of clandestine networks
   •        Katana - game analytics engine




Part I: Background
Social Network Analysis
     [Figure: fields surrounding the grid below: Organizational Theory,
      Social Psychology, Anthropology, Epidemiology, Sociology]

                      Acquaintance (links)        Knowledge (content)
       Perception     Socio-Cognitive Networks    Cognitive Knowledge Networks
       Reality        Social Networks             Knowledge Networks



     Social science networks have widespread application in various
      fields
     Most of the analysis techniques have come from Sociology,
      Statistics and Mathematics
     See (Wasserman and Faust, 1994) for a comprehensive introduction
      to social network analysis
12/02/06                            IEEE ICDM 2006                        4
What have been its key scientific successes?
   In classical social sciences, numerous results
       'Six degrees of separation' [Milgram]
         Popularized by the 'Kevin Bacon game'

       'The strength of weak ties' [Granovetter]
       'Online networks as social networks' [Wellman, Krackhardt]
       'Dunbar Number'
       Various types of centrality measures
       Etc.
   In the Web era
       'The Bow-Tie model of the Web' [Raghavan]
       'Preferential attachment model' [Barabasi] (Yes and No)
       'Power law of degree distribution' [Lots of people] (NO!)
       Etc.
Application successes
   Numerous in social sciences
   Google – PageRank
   LinkedIn – expanding your Cognitive Social Network
       making you aware that 'you're more connected and closer than you
        think you are'
   Expertise discovery in organizations
       Knowledge experts, 'authorities'
       Well-connected individuals, 'hubs'
   Rapid-response teams in emergency management
   Information flow in organizations
   Twitter – real time information dissemination
   Etc.
Online (Multiplayer) Games

   [Figure: games plotted by how social they are (low to high) against how engaging they are (low to high)]
Player Behavior & Revenue Model
                           Blizzard (subscription)          Zynga (free2play)
   Games                   World of Warcraft                Farmville, Fishville, Mafia Wars, etc.
   Audience                12 million subscribers           180 million players
   Revenue model           $15/month                        Virtual goods
   Revenue                 Approx. $3 billion annually      $700 million in 2010
   Play time               4 hours a day, 7 days a week!    0.5 hrs a day, 7 days a week
   Players                 Hard core gamers                 Everyone
   Social acceptability    Less socially acceptable         More socially acceptable
   Analogy                 Like Cocaine                     Like Caffeine
Implications of this 'addiction'
   3 billion hours a week are being spent playing online
    games
       Jane McGonigal in "Reality is Broken"
   Labor economics
       What is the impact of so much labor being removed from the pool?
        [Castronova]
   Entertainment economics
       If MMO players can get 100 hrs/month of entertainment by spending
        $25 or so, what will happen to other entertainment industries?
   Psychological/Sociological
       Is it an addiction – the prevailing view (Chinese government's 'detox
        centers' for kids)?
       Are they fulfilling a deeper need that the real world is not? (McGonigal)
   Societal
       A trend far too important to not be taken seriously!
Business Example
Levi's – Example of Social Retail




   Levi's leverages its brand to ensure customers provide their social
    network
   Levi's can leverage predictive social analytics technology to understand
    the value of the customer's social network
Opportunity, Innovation, Impact
     Companies do not understand the social graph of their customers
     It's not just about how they relate to their customers, but also about
      how customers relate to each other



                                                vs.




     Understanding these relationships unlocks immense value
         Innovation: Understanding the social network of customers
             Key influencers, relationship strength, …
         Impact: Deriving actionable insights from this understanding
             Customer acquisition, retention, customer care, …
             Social recommendation, influence-based marketing, identifying trend-setters, …

                                    Ninja Metrics confidential information. Copyright 2012     12
Unlocking true value by product, category, or store

   [Figure: labels recoverable from the original graphic: 0021, -$128.61, -$293.79, -$79.63]
True Value of each customer
     True value = individual value + social value
     Who really matters, and to what degree
     Some empirical facts
         31% activity due to socialization
         23% more individual + 8% more social activity

      [Figure: the individual's lifetime value, their social influence, and their true total]
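A minimal sketch of the "true value = individual value + social value" idea, assuming each customer already has the two components estimated. The class name, field names and dollar amounts are hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    individual_value: float  # value of the customer's own activity (hypothetical)
    social_value: float      # value of activity they induce in others (hypothetical)

    @property
    def true_value(self) -> float:
        # True value = individual value + social value
        return self.individual_value + self.social_value


customers = [
    Customer("A", individual_value=120.0, social_value=15.0),
    Customer("B", individual_value=60.0, social_value=140.0),
]

# Ranking by true value can reorder customers relative to individual value alone.
ranked = sorted(customers, key=lambda c: c.true_value, reverse=True)
for c in ranked:
    print(c.name, c.true_value)
```

In this made-up example customer B spends less individually but ranks first once social value is counted, which is the "who really matters" point of the slide.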
Impact of New Instrumentation on Science
   1950s
       Invention of the electron microscope fundamentally changed
        chemistry from 'playing with colored liquids in a lab' to 'truly
        understanding what's going on'
   1970s
       Invention of gene sequencing fundamentally changed biology from
        a qualitative field to a quantitative field
   1980s
       Deployment of the Hubble (and other) Space telescopes has had
        fundamental impact on astronomy and astrophysics
   2000s
       Massive adoption is fundamentally changing social science
        research
       Massively Multiplayer Online Games (MMOGs) and Virtual Worlds
        (VWs) are acting as 'macroscopes of human behavior'
The Virtual World Observatory (VWO) Project
•   Four PIs, 30+ Post-docs, PhD and MS students, UGs, high-schoolers
     • Noshir Contractor, Northwestern: Networks
     • M. Scott Poole, Illinois Urbana-Champaign/NCSA: Groups
     • Jaideep Srivastava, Minnesota: Computer Science
     • Dmitri Williams, USC: Social Psychology
•   Collaborators
     • Castronova (Sociology, Indiana), Yee (Xerox PARC), Consalvo, Caplan
        (Economics, Delaware), Burt (Sociology, U of Chicago), Adamic (Info Sci,
        Michigan), …
•   Data and technology partners
     • Sony (EverQuest 2), Linden Lab (Second Life), Bungie (Halo 3), Kingsoft
        (Chevalier's Romance), others …
     • Cloudera Systems (Hadoop), Microsoft (SQL Server), Weka, …
Overall Goals of the VWO Project

Sticky Social Media → tons of data → Analytics → Basic Science, Government applications, Business applications

Sticky Social Media
   • Games
   • Virtual Worlds
   • Other social apps
Analytics
   • Social science models
   • New algorithms
   • Ultra-scalability
Basic Science
   • behavior, socialization, …
   • novel, scalable algorithms
   • NSF
Government applications
   • team dynamics (Army)
   • skills acq, leadership (Army)
   • social influence & adolescent health (CDC)
   • etc.
Business applications
   • customer churn
   • game design
   • anti-social behavior
   • etc.
Collaborators, Sponsors, Partners
   Team



   Financial Sponsors



   Data Partners



   Technology Partners
Part II – Impact on Science
Findings from a Player Survey
Who is playing?

   It is not just a bunch of kids
   Average age is 31.16 (US population median is 35)
   More players in their 30s than in their 20s
How much do they play?

   Mean is 25.86 hours/week
   Compares to US mean of 31.5 hours/week for TV (Hu et al., 2001)




    • From prior experimental work, MMO play eats into entertainment TV
      and going out, not news
    • So much for kids being the ones with the free time.
Gender Differences
   More men players (78/22%)
   Men played to compete, and women played to socialize
   Men play more other games, but women were the more
    satisfied EQ2 players
      Women: 29.32 hours/week

      Men: 25.03 hours/week

      Likelihood of quitting: “no plans to quit”:
        women 48.66%, men 35.08%
   Self reported play times
      Women: 26.03 (3 hours less than actual)
      Men: 24.10 (1 hour less than actual)
      Boys and girls are socialized early on, and thus have clear role
        expectations for their behaviors and identities (Gender Role Theory
        in action!!)
Playing with a partner
Inferring RW gender from VW data
Goal
    What virtual world behaviors and characteristics predict
    real world gender?
Data:
 Survey Data n=7119
 Survey Character Store
 EQ2 Character Store
Variables
 Avatar Characteristics:
       Gender, Race, Class, Experience, Guild Rank, Alignment,
        Archetypes
   Game play Behaviors:
       Total Deaths & Quests, PvP Kills & Deaths, Achievement Points,
        Number of Characters, Time played, Communication patterns


Gender prediction results
   Close to 95% prediction accuracy
       Decision trees work rather well
   Character gender, race and class are significant
    predictors of real-life gender
   Gender swapping is rare, but systematically different by
    real gender
   Players tend to choose:
     character gender based on their real life gender
     character races that are gendered: women play
      elves/men play barbarians
     classes that are gendered: women play priests/men
      play fighters



Gender swapping behavior
Game Character     Real Gender
Gender             Male               Female             Total
Male               4065    82.6%        98     8.2%      4163    68.0%
Female              855    17.4%      1104    91.8%      1959    32.0%
Total              4920   100.0%      1202   100.0%      6122   100.0%


    Observation
    • Far more males gender swap than females
    • Why?
         •   Men are more creative?
         •   Women have less identity confusion?
         •   Women get their 'fill of gender swapping in real life'
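The swap rates behind the observation can be recomputed directly from the counts in the table. This is only an arithmetic sketch; the dictionary layout and function name are illustrative choices, not part of the original analysis.

```python
# Counts from the table, indexed as counts[real_gender][character_gender].
counts = {
    "male": {"male": 4065, "female": 855},
    "female": {"male": 98, "female": 1104},
}


def swap_rate(real_gender: str) -> float:
    """Fraction of players of a real gender whose character has another gender."""
    row = counts[real_gender]
    swapped = sum(n for char_gender, n in row.items() if char_gender != real_gender)
    return swapped / sum(row.values())


male_swap = swap_rate("male")      # 855 / 4920
female_swap = swap_rate("female")  # 98 / 1202

print(f"male players with female characters: {male_swap:.1%}")
print(f"female players with male characters: {female_swap:.1%}")
print(f"ratio of the two rates: {male_swap / female_swap:.2f}")
```

The rates match the 17.4% and 8.2% cells of the table, so male players in this sample gender swap at roughly twice the rate of female players.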
Economics: A test of RW-to-VW mapping

    Do players behave in virtual worlds as we expect
     them to in the actual world?
    Economics is an obvious dimension to test
    In the real world, perfect aggregate data are hard to
     get
GDP and Price Level
   GDP and price levels are robust but comparatively unstable

   [Figure: GDP and Prices on Antonia Bayle, January to May. Left axis: GDP (Gold), 0 to 5,000,000. Right axis: Prices (January = 100), 0 to 160. Series: Nominal GDP, Price Level (January = 100).]
Money Supply and Price
   The instability is explicable through the Quantity Theory of Money
       a rapid influx of money . . .
       . . . dramatically boosted prices
   More evidence that this behaves like a real economy

   [Figure: Change in Money Supply and Population on Antonia Bayle, February to May. Left axis: Change in Money (000 Gold). Right axis: Change in Accounts. Series: Change in Money Supply (000 Gold), Change in Active Accounts.]
   [Figure: Price Level on Antonia Bayle, February to June. Axis: Percent Change in Price Level.]
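A minimal sketch of the Quantity Theory of Money in its textbook form, M * V = P * Q (money supply times velocity equals price level times real output). All numbers are hypothetical and are not the Antonia Bayle data; velocity and output are assumed constant to isolate the effect of the money influx.

```python
def price_level(money_supply: float, velocity: float, real_output: float) -> float:
    """Solve M * V = P * Q for the price level P."""
    return money_supply * velocity / real_output


# Hypothetical economy: velocity and real output held fixed.
velocity = 2.0
real_output = 20_000.0

before = price_level(1_000_000.0, velocity, real_output)
after = price_level(1_500_000.0, velocity, real_output)  # 50% more money in circulation

pct_change = (after - before) / before * 100.0
print(f"price level before: {before:.1f}, after: {after:.1f}, change: {pct_change:.0f}%")
```

Under these assumptions a 50% increase in the money supply raises the price level by the same 50%, which is the mechanism the slide points to.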
Networks in Virtual Worlds




                                      SONIC

                             Advancing the Science of
                             Networks in Communities
Why do we create and sustain networks?
   Theories of self-interest
   Theories of social and resource exchange
   Theories of mutual interest and collective action
   Theories of contagion
   Theories of balance
   Theories of homophily
   Theories of proximity
   Theories of co-evolution

Sources:
Contractor, N. S., Wasserman, S. & Faust, K. (2006). Testing multi-theoretical multilevel hypotheses
  about organizational networks: An analytic framework and empirical example. Academy of
  Management Review.
Monge, P. R. & Contractor, N. S. (2003). Theories of Communication Networks. New York: Oxford
  University Press.




“Structural signatures” of Social Theories
   [Figure: six example networks on nodes A to F, one per theory: Self interest, Exchange, Balance, Collective Action, Homophily, Contagion. Some ties are marked + or -; legend: Novice, Expert.]
   [Figure: four network panels: Partnership, Instant messaging, Trade, Mail. Node color: black = male, red = female.]
Social theory + IR based network analysis
   Social networks (1), (2), ..., (n) are represented as network structure
    frequency vectors in a bag-of-words model
       Example vectors as shown on the slide, one column per network:
            (1)    (2)    (n)
             4      5      2
             0      0      0
             0      0      0
             0      1      0
             1      3      0
             1      2      0
             0      1      0
             0      0      0
   Cluster the structure vectors using text clustering methods
   Cluster means provide modes of network structure configurations
    making up all the social networks
   Attribute values for each cluster can be used to discover trends
    between network structures and attributes
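A rough sketch of the bag-of-words view of network structure, assuming each network has already been reduced to counts of structure types. The three vectors are the example columns from the slide as extracted; the normalization and cosine similarity are standard text-retrieval building blocks chosen for illustration, not necessarily the clustering method used in the original study.

```python
import math

# Structure-count vectors, one per network (values as extracted from the slide).
# Each position counts one type of network structure, like a word count in a document.
vectors = {
    "network_1": [4, 0, 0, 0, 1, 1, 0, 0],
    "network_2": [5, 0, 0, 1, 3, 2, 1, 0],
    "network_n": [2, 0, 0, 0, 0, 0, 0, 0],
}


def normalize(counts):
    """Convert raw counts to relative frequencies so network size does not dominate."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def cosine(a, b):
    """Cosine similarity, a common similarity measure in text clustering."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


normalized = {name: normalize(v) for name, v in vectors.items()}
sim_1_n = cosine(normalized["network_1"], normalized["network_n"])
sim_1_2 = cosine(normalized["network_1"], normalized["network_2"])
print(f"sim(1, n) = {sim_1_n:.3f}, sim(1, 2) = {sim_1_2:.3f}")
```

A text clustering method would then group networks whose vectors are similar, and the mean vector of each cluster describes a typical structure configuration.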
Results – normalized network structure vector means for all clusters
Results
   Clusters 1 and 4 are similar
         Groups kill fewer monsters
         Group members in cluster 4 do not communicate much
         Group members in cluster 1 generally limit their communication to just
          one other person in the group
         Most people belong to these two clusters
         Consistent with previous research - users in virtual environments are
          less likely to interact with strangers
         [N. Ducheneaut, N. Yee, E. Nickell and R. Moore, "'Alone Together?' Exploring
          the social dynamics of massively multiplayer online games", Proceedings
          CHI06, ACM Press, New York, 407-416.]
   Cluster 5 groups have many 1-edge and 2-out stars
        Most of the communication is one-way, possibly indicating the presence of
         central people
        Maximum number of monsters killed out of all clusters
        Performance of the groups is very good
        Minimal communication
        It is possible that cluster 5 consists of groups more focused on playing
         and performing well in the game and less on socializing
Some Open Questions
   How similar or dissimilar are online social networks to real-world
    social networks?
   Is online socialization
       Only an activity that sustains real-world networks?
       Causing new social networks to be formed?
       A fundamentally different type of networking activity?
   A fundamental tenet of socialization has been
    “geography/proximity drives socialization”
       How is this being impacted by online socialization?
TeamSkill: Modeling Team Chemistry in Online Multi-Player Games
Description and motivation
   The goal of this work is to improve skill assessment approaches
    used in multiplayer games, especially team-based games
   Xbox Live, PSN, Steam, and Battle.net collectively have between 149 and 167
    million total users (and that number is growing)
       Many of the most popular games are team-based: Halo, Call of Duty,
        TF2, CS, etc
   Why?
       Better skill assessment = fairer games = less player attrition
       More accurate rankings of players/teams
       Most previous work has focused on individuals, not teams
   It's a hard problem
       Online setting: Updates to a player's skill distribution must be done
        after each game is played
       Applicability: Any generalized assessment technique cannot include
        game-specific data
   Our datasets are from the games of professional Halo 3 players



What is Halo?




    Halo is one of the most well-known “first-person shooter” video
     game series in the world
    Online/LAN-based multiplayer is its most significant component
        650,000 to 850,000 unique users per day at its peak
Major League Gaming
 What is Major League Gaming (MLG)?
 • A professional league for competitive gaming, including Halo 3 from '08-'10
 • Best players in the world compete in this league
 • Last tournament watched by over 1,000,000 people online
 • Our datasets are comprised of games played by these players

 Our work focuses on professional Halo 3 players
 • Online scrimmage data, as well as complete tournament data, is readily
   available from bungie.net and mlgpro.com
 • Players are highly-skilled individually – allows us to better focus on
   group-level performance characteristics
 • Players change teams regularly, helps isolate impact of particular
   players on overall team performance
 • Unexplored boundary case for skill assessment
Skill assessment
    An old problem (with different applications in different
     contexts)
    Paired comparison estimation
        Foundational work by Thurstone (1927) and Bradley-Terry (1952)
        Elo (1959), popularized in chess ranking (FIDE, USCF)
        Glicko (1993) – player-level ratings volatility incorporated (σ²),
         addition of rating periods
    TrueSkill (2006) - factor-graph based approach used in
     Microsoft's Xbox Live gaming service
        Used to match players/teams up with each other online
    Our work focuses on Elo, Glicko, and TrueSkill




Elo (1959)
     Arpad Elo, Dept of Physics, Marquette University
         Was a master chess player in USCF
     Proposed the Elo Rating System
         Replaced earlier systems of competitive rewards (i.e.,
          tournaments) in USCF/FIDE
         Simple to implement: assumes player skill is normally
          distributed with constant variance β²
         Still widely-used today
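A minimal sketch of an Elo-style update, for illustration only. It uses the logistic expected-score curve with scale 400 and K = 32, which are conventional choices in chess rating practice and are assumptions here; the slide itself describes the normally distributed formulation. Function names are hypothetical.

```python
def elo_expected(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B (logistic Elo curve, scale 400)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return both new ratings after one game.

    score_a is 1.0 if A wins, 0.5 for a draw, 0.0 if A loses.
    k (assumed 32 here) controls how far one result moves a rating.
    """
    delta = k * (score_a - elo_expected(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Two equally rated players: the winner gains exactly what the loser gives up.
new_a, new_b = elo_update(1500.0, 1500.0, score_a=1.0)
print(new_a, new_b)
```

An upset against a higher-rated opponent moves ratings more than an expected win, because the update is proportional to the gap between the actual and expected score.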




Glicko (1993)
    Mark E. Glickman, Department of Health Policy
     and Management, Boston University
    Proposed the Glicko Rating System
        Addresses rating reliability issue through the use of
         rating periods and player-specific variances
        Elo is a special case of Glicko
    Iterative approach for approximating the
     marginal posterior distribution of a player's skill
     conditional on other players' priors
        More computationally tractable for large datasets



TrueSkill (2006)
    Ralf Herbrich and Thore
     Graepel at Microsoft
     Research, Cambridge, UK
    Used for automated
     ranking/matchmaking on Xbox
     Live
    Uses factor graphs to model
     multi-player, multi-team
     environments
    Converges quickly (~5 games
     for a player)
    Large-scale deployment
        30 million Xbox Live members
        150+ games use TrueSkill for
         ranking & matchmaking




Issues with current approaches
     [Diagram: the ratings of players 1, 2, 3 and 4 are summed (+) to give the rating of team 1234]

     Basic idea
       Given: skill ratings of each team member, i.e., s_i ~ N(μ_i, σ_i²)
       Sum across all team members
     Not intuitive: Team chemistry is a well-known concept in team-based
      competition [Martens 1987, Yukelson 1997] and it is not captured in
      any of these models
       Can think of it as the overall dynamics of a team resulting from
         leadership, confidence, relationships, and mutual trust
       Independence assumption not realistic in teams, especially at
         high levels of play
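The "sum across all team members" idea can be sketched as follows, assuming each player's skill is an independent Gaussian s_i ~ N(μ_i, σ_i²). Under that independence assumption the team skill is Gaussian with the means and the variances each summed. The function name and player numbers are hypothetical.

```python
import math
from typing import List, Tuple


def team_skill(players: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Combine (mu, sigma) pairs into a team (mu, sigma), assuming independent skills."""
    team_mu = sum(mu for mu, _ in players)
    team_var = sum(sigma ** 2 for _, sigma in players)  # variances add, not std devs
    return team_mu, math.sqrt(team_var)


# Hypothetical four-player team given as (mu, sigma) per player.
team_mu, team_sigma = team_skill([(25.0, 1.0), (27.0, 2.0), (30.0, 2.0), (22.0, 4.0)])
print(team_mu, team_sigma)
```

Nothing in this sum depends on who plays with whom, which is why it cannot represent team chemistry.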


More information is available, however…
                                    1    2     3     4

                         12     13       14    23    24        34

                              123       124    134       234

                                          1234

     Observation: We have more information than just the history of
      individual players – we also know the histories of groups of players
         Player 1's history ∩ player 2's history → history of {1, 2}
         Existing approaches only make use of the top row
     Idea: Estimate the skills of subgroups of players on a team, combine
      them in some way, and use the result to produce a better estimate of a team's skill




Reframing the assessment problem
                     k=1            1    2     3     4

                     k=2 12      13      14    23    24        34

                     k=3      123       124    134       234

                     k=4                  1234

    Alterations to existing approaches (Elo/Glicko/TrueSkill) are necessary
      Player-level skill representation → generalized subgroup skill
         representation
          For each game and for each k ≤ K (the size of the team), treat each
            size-k group as you would an individual player and update its skill accordingly
      Hashing of skill variable matrix according to unordered subgroup
         membership
    Treat Elo/Glicko/TrueSkill as 'base learners', à la boosting
      Each rating says something about the skill of that particular subgroup
      …but how do we aggregate these ratings to estimate the skill of a
         team?
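
A minimal sketch of the subgroup idea with Elo as the base learner is given below. Ratings are stored in a dictionary keyed by `frozenset`, so lookup does not depend on player order. The constants and the choice to update each subgroup against the mean rating of the opposing size-k subgroups are assumptions made for illustration; the slides do not specify these details.

```python
from itertools import combinations

K_FACTOR = 32.0   # assumed Elo update constant
START = 1500.0    # assumed initial rating

def expected_score(r_a, r_b):
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def update_subgroups(ratings, winners, losers, k):
    """Treat every size-k subgroup as a 'player' and apply an Elo update."""
    win_groups = [frozenset(c) for c in combinations(sorted(winners), k)]
    lose_groups = [frozenset(c) for c in combinations(sorted(losers), k)]
    # Opponent strength: mean pre-game rating of the other side's subgroups
    win_mean = sum(ratings.get(g, START) for g in win_groups) / len(win_groups)
    lose_mean = sum(ratings.get(g, START) for g in lose_groups) / len(lose_groups)
    for g in win_groups:
        r = ratings.get(g, START)
        ratings[g] = r + K_FACTOR * (1.0 - expected_score(r, lose_mean))
    for g in lose_groups:
        r = ratings.get(g, START)
        ratings[g] = r + K_FACTOR * (0.0 - expected_score(r, win_mean))
    return ratings

ratings = {}
for k in range(1, 5):
    update_subgroups(ratings, winners={1, 2, 3, 4}, losers={5, 6, 7, 8}, k=k)
```

After one game, every subgroup of the winning team has its own rating, which leaves open the aggregation question raised in the last bullet.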
                                          49
TeamSkill
    Four different aggregation approaches:
        TeamSkill-K
        TeamSkill-AllK
        TeamSkill-AllK-EV
        TeamSkill-AllK-LS




                             50
Aggregation issues

        [Figure: subgroup lattices at three points in time; black = history available, red = no history available]
            t = 1: after 1 game
            t = 100: new player 5, who has never played with 1, 2, or 3
            t = 200: new player 6, who has played with 3 or 5 (not both)
        Real world case: assume that players can leave/join the 4-player
         team and look at the timeline
        The question at each point in time, t: how best to combine the
         available group ratings to produce a team rating?
        The problem: the feature space is expanding and contracting over
         time.
                                                                                                 51
Data set overview
    Collected over the course of
     2009
    7,590 games (2,076 from
     tournaments and 5,514 from
     Xbox Live scrimmages)
    448 players on 140 different
     professional and semi-
     professional teams
    Games took place during
     January 2008 through January
     2010
    Websites pertaining to this data
        http://stats.halofit.org - Player/team
         statistics
        http://halofit.org – Datasets and related
         information
                                                     “Friend or foe” social network for all tournaments in 2008 and
                                                     2009




                                                                                                              52
Evaluation
    The TeamSkill approaches were evaluated by predicting the outcomes
     of games occurring prior to 10 MLG tournaments and comparing their
     accuracy to unaltered versions (k = 1) of their base learner rating
     systems - Elo, Glicko, and TrueSkill (for TeamSkill-K, all possible
     choices of k for teams of 4, 1 ≤ k ≤ 4, were used)
    For each tournament, we evaluated each rating approach using:
        3 types of training data sets - games consisting only of previous tournament data,
         games from online scrimmages only, and games of both types.
        3 periods of game history - all data except for the data between the test tournament
         and the one preceding it (“long”), all data between the test tournament and the one
         preceding it (“recent”), and all data before the test tournament (“complete”).
            We will only show “complete” as results for “long” and “recent” mirrored those for “complete”
        2 types of games - the full dataset and those games considered “close” (i.e., prior
         probability of one team winning close to 50%).
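
The notion of a "close" game can be sketched as a filter on the prior win probability. The Elo-style probability, the 5-point margin, and the toy games below are assumptions for illustration; the slides do not give the exact threshold.

```python
def win_probability(rating_a, rating_b):
    """Elo-style prior probability that team A beats team B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def is_close(rating_a, rating_b, margin=0.05):
    """A game is 'close' if the prior is within `margin` of 50% (assumed margin)."""
    return abs(win_probability(rating_a, rating_b) - 0.5) <= margin

def accuracy(games):
    """games: (rating_a, rating_b, a_won) tuples; predict the higher-rated team."""
    games = list(games)
    correct = sum((ra >= rb) == a_won for ra, rb, a_won in games)
    return correct / len(games)

# Hypothetical games: (team A rating, team B rating, did A win?)
games = [(1600, 1500, True), (1510, 1500, False),
         (1400, 1550, False), (1495, 1500, True)]
close_games = [g for g in games if is_close(g[0], g[1])]
```

Accuracy can then be reported separately for `games` and `close_games`, mirroring the two result sets in the evaluation.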




                                                     53
Results – all games



       Prediction accuracy for both tournament and scrimmage/custom games using complete history




       Prediction accuracy for tournament games using complete history




       Prediction accuracy for scrimmage/custom games using complete history
                                                          54
Results – “close” games



       Prediction accuracy for both tournament and scrimmage/custom games using complete history




       Prediction accuracy for tournament games using complete history




       Prediction accuracy for scrimmage/custom games using complete history
                                                          55
Conclusions
    Modified several existing assessment approaches to
     operate on generalized player subgroup entities (instead of
     just individual players)
    Introduced four aggregation methods for combining player
     subgroup information to produce final forecast
    Shown evidence that close games are decided on the
     basis of team chemistry, consistent with sports psychology
     research




                                56
Part III: Impact on Business
Player Churn Prediction
Time Equals Stickiness

     [Figure: ratio of quitters to stayers (y-axis, 0.00 to 6.00) plotted against character level (x-axis, 1 to 69)]


                                     There are fewer quitters as the levels go up, so retention
                                      efforts should focus on the first 20 levels.
Solo vs. Social Players

     [Figure: ratio of quitters to stayers (y-axis, 0.00 to 6.00) plotted against character level (x-axis, 1 to 69), with separate series for Solo and Social players]

                              Isolated players are 3.5x more likely to quit (B = 1.26, p<.001).
                               Focus design on facilitating social interaction.
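
Assuming B is a logistic-regression coefficient on an "isolated player" indicator (the slide does not name the model), exponentiating it gives the odds ratio, which is consistent with the 3.5x figure:

```python
import math

B = 1.26                   # coefficient reported on the slide
odds_ratio = math.exp(B)   # multiplicative change in the odds of quitting
```

Strictly speaking, this is a ratio of odds rather than of probabilities.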
Problem Statement & Approach
   Objective: At time t, for player p, given a window w, compute
    the probability, P(p,t,w), of p churning within the interval
    (t,t+w), and the confidence in this probability.
   Approach: Estimate P(p,t,w) and its confidence using
        Data about p's activity and socialization behavior
        Statistics and machine learning (linear, non-linear, ensemble, etc.)
        Use of socio-psychological theories of player motivation
        Novel synthesis of data driven (DD) and theory driven (TD) approaches




    61
Player Motivation Theories



                             Richard Bartle, 1996




                              Nick Yee, 2005




62
Mapping MMO Behaviors to Theory

    Example MMORPG
        Persistent virtual world
        Players have avatars
        Quest-driven
        Participation in groups,
           guilds
    Dataset (game logs)
        Time period: FEB-JUN, 2006
        No. of accounts: ~16000
        Churner definition: 2 months of inactivity
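
The churner definition can be turned into a labeling rule along the following lines. This is a sketch only: reading "2 months" as 60 days, the end date of the observation window, and the account names are all assumptions.

```python
from datetime import date, timedelta

INACTIVITY = timedelta(days=60)   # assumed reading of "2 months"

def is_churner(last_active, observation_end):
    """True if the account has been inactive for at least INACTIVITY."""
    return observation_end - last_active >= INACTIVITY

observation_end = date(2006, 6, 30)   # hypothetical end of the log window
last_seen = {"acct_a": date(2006, 3, 15), "acct_b": date(2006, 6, 20)}
labels = {acct: is_churner(d, observation_end) for acct, d in last_seen.items()}
```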




63
Synthesis of TD and DD approach




64
Model Evaluation (Confusion Matrix)

     Model performance for different features (10
      fold cross-validation)
          Features                  Tree size (nodes)   Precision (%)   Recall (%)   F-measure (%)
          Pure DD                         485               69.3           84            76
          TD: Achievement+Social           59               67.1           76.2          71.3
          TD: Achievement                  37               67.2           73.2          70.1
          TD: Socialization                 7               64.3           55.4          59.5

     Observations
           Decision tree learning works quite well
           DD model has the highest F-measure (76%) but is difficult to interpret (485
            nodes)
           TD model interpretable (7 – 59 nodes), but less accurate
           Achievement-orientation more important than socialization
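
The F-measure column can be checked from the precision and recall columns, assuming it is the usual harmonic mean (F1):

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall) pairs copied from the table
rows = {
    "Pure DD": (69.3, 84.0),
    "TD: Achievement+Social": (67.1, 76.2),
    "TD: Achievement": (67.2, 73.2),
    "TD: Socialization": (64.3, 55.4),
}
f_scores = {name: f_measure(p, r) for name, (p, r) in rows.items()}
```

The recomputed values agree with the table to within rounding.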
 65
Model Evaluation (Lift Chart)




    Observations
        TD model does better in
         predicting top quintile of
         churners
        DD model performs better in the
         40%-70% range

66
Ensemble Approach – Model
Evaluation
    Ensemble approach
        A 'committee of classifiers' votes on the result
        Heterogeneity helps
              No. of clusters   Precision (%)   Recall (%)   F-measure (%)
                     5              66.23         91.94         76.99
                    10              66.49         91.08         76.87
                    15              67.58         89.80         77.12
                    20              67.6          90.13         77.25


    Improvements over single model
        Best recall value shows a 7.94 percentage-point improvement (i.e., the
         model can identify a larger proportion of potential churners)
        Best F-measure shows a 1.25 percentage-point improvement
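
The voting step of a committee can be sketched as a simple majority vote. The three rules and the feature names below are hypothetical, and the sketch does not reproduce the cluster-based construction behind the table.

```python
from collections import Counter

def committee_vote(classifiers, features):
    """Return the label predicted by the most committee members."""
    votes = Counter(clf(features) for clf in classifiers)
    return votes.most_common(1)[0][0]

# Three deliberately different, hypothetical rules
committee = [
    lambda f: f["days_since_login"] > 14,
    lambda f: f["quests_last_week"] == 0,
    lambda f: f["guild_member"] is False,
]
player = {"days_since_login": 20, "quests_last_week": 3, "guild_member": False}
will_churn = committee_vote(committee, player)
```

Two of the three members vote "churn" for this player, so the committee predicts churn even though one rule disagrees.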

67
Conclusions

    Comparison of theory-driven and data-driven model in
     terms of prediction accuracy and model interpretability
    Achievement-orientation is more important than
     socialization-orientation in identifying potential churners
    Ensemble model can identify larger proportion of
     potential churners as compared to single global model




68
Course Outline

• Module 1
   •        Introduction to Social Analytics – applying data mining to social computing
            systems; examples of a number of social computing systems, e.g.
            FaceBook, MMO games, etc.
• Module 2
   •        Computational online trust
   •        Identifying key influencers
• Module 3
   •        Analysis of clandestine networks
   •        Katana - game analytics engine




 4/9/2013                                 University of Minnesota                     69
Computational Trust in
   Multiplayer Online
               Games
“If you want to go fast walk alone, if you
want to go far then walk with a group.”
                           - Proverb from Ghana
Virtual Worlds & Massive Online
Games

   Massively Multiplayer Online Role
    Playing Games
    (MMORPGs/MMOs)
   Simulated Environments like
    SecondLife
   Millions of people can interact with
    one another in a shared virtual
    environment
   People can engage in a large number
    of activities with one another and with
    the environment
   Many of the observed behaviors have
    offline analogs
                                              72
Big Picture Questions
   How is trust expressed differently in different social
    contexts?
       Cooperative (PvE), Adversarial (PvP), …
   How is trust expressed in different types of social
    networks?
       Housing, Mentoring, Trade, Group, …
   What are the characteristics of trust and related networks
    in MMOs?
       Similarities and differences with social networks in other domains
        e.g., citation networks, co-authorship networks
   What role can features derived from the trust network play
    in prediction tasks, e.g., link prediction (formation,
    breakage, change), trust propensity, success prediction?

                                                                             73
Computational Trust in Multiplayer Online Games
Department of Computer Science, University of Minnesota
Muhammad Aurangzeb Ahmad

Trust as a Multi-Level Network Phenomenon
   Trust as a multi-modal, multi-level network phenomenon
   Network formulations of traditional trust-related concepts
   Incorporating social science theories in trust-related prediction tasks
   Generative network models for trust-based social interactions

Social Characteristics of Trust
   Effect of Social Environments on Trust: different social environments result in differences in network signatures
   Trust and Homophily: most types of homophily do not carry over to the MMO domain
   Trust and Clandestine Behavior: No Honor Amongst Thieves
   Trust and Mentoring, Trust and Trade: trust-based and other social networks in MMOs exhibit anomalous network characteristics

Trust and Prediction
   Trust Prediction Family of Problems: predict formation, change, and breakage of trust within the same network and across social networks; predict trust propensity
   Item Recommendation: effects of social environments on trust
   Success Prediction (Social Capital): effect of network structure on trust
Computational Trust in Multiplayer Online Games
Department of Computer Science, University of Minnesota
Muhammad Aurangzeb Ahmad

Computational Social Science
   Semantics: Trust and Homophily
      Most types of homophily do not carry over to the MMO domain
   Structure: Characteristics of Trust Networks
      Trust-based and other social networks in MMOs exhibit anomalous network characteristics
Algorithmic
   Trust Prediction Family of Problems
      Predict formation, change, and breakage of trust within the same network and across social networks. Predict trust propensity.
Applications
   Detection of Clandestine Actors
      Gold Farmer Detection: No Honor Amongst Thieves
Housing-Trust in EQ2

 • Access permissions to in-game house
   as trust relationships
    • None: Cannot enter house.
    • Visitor: Can enter the house and can
      interact with objects in the house.
    • Friend: Visitor + move items
    • Trustee: Friend + remove items


 • Houses can also contain items which
   allow sales to other characters without
   exchanging on the market
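
Because each access level includes the rights of the one below it, the four levels can be modeled as an ordered enumeration. The class and helper names below are hypothetical and do not come from the game's own data model.

```python
from enum import IntEnum

class HouseAccess(IntEnum):
    NONE = 0      # cannot enter the house
    VISITOR = 1   # can enter and interact with objects
    FRIEND = 2    # visitor rights + move items
    TRUSTEE = 3   # friend rights + remove items

def can_move_items(level):
    return level >= HouseAccess.FRIEND

def can_remove_items(level):
    return level >= HouseAccess.TRUSTEE

# A directed trust edge: a house owner grants a level to another character
grants = {("owner_1", "char_7"): HouseAccess.FRIEND}
```

Each grant is then a directed, weighted edge in the housing-trust network, with the level acting as the strength of trust.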




                                             80
Homophily and Trust
   Homophily: Birds of a feather flock together
   There is no single form of homophily; homophily in
    general is described in multiple ways, e.g., Status vs. Value
    Homophily
   Each of these types of homophily is in turn defined in multiple
    ways
   Previous literature instantiates homophily in MMOs in
    terms of player characteristics and behavior in the game




                                                          81
Network Models, Homophily and
Trust Networks in MMOs


   RQ1: Does homophily in MMOs operate in ways similar to
    homophily in the offline world?
   RQ2: How do we map characteristics that define
    homophily in the offline world to online settings?




                                                        82
Mapping Homophily in MMOs




   In general, studies of homophily in MMOs assume only one type of homophily and
    generalize based on that type
   Even in the offline world homophily is of different types
   Hence the necessity of Mapping Homophily which we address here
   Mapping and Proteus Effect
                                                                                     83
Trust and Homophily in MMOs

      Homophily Type        Hypothesis                                                          Observation
H1    Gender Homophily      Players trust other players who are of the same gender                  ?
H2    Age Homophily         Players trust other players who are of the same age cohorts             ?
H3    Class Homophily       Players trust other players who are of the same class                   ?
H4    Race Homophily        Players trust other players who are of the same race                    ?
H5    Guild Homophily       Players trust other players who belong to the same guild                ?
H6    Level Homophily       Players trust other players who are at a similar level                  ?
H7    Challenge Homophily   Players trust other players who like similar types of challenges        ?

                                                                        84
Key Observations

H1: Players trust other players who are of the same gender?


                           In general players trust other players who are of
                           the same gender


H2: Players trust other players who are of similar age?

                           The stronger the type of trust, the smaller the age
                           difference between the people specifying trust



H3: Players trust other players who are of the same class?

                           Class does not seem to affect the choice of trusting
                           others

                                                                                  85
Key Observations

H4: Players trust other players who are of the same race?


                           Race does not seem to affect the choice of trusting
                           others


H5: Players trust other players who are of the same guild?

                           In general, the stronger the type of trust, the greater
                           is the percentage of the people who trust people in
                           their own guilds


H6: Players trust other players who level at a similar rate?

                           Leveling at the same rate does not seem to greatly
                           affect trust amongst players

                                                                                     86
Key Observations

H7: Players trust other players who are of the same level?




 • Level difference seems to have some effect on trust
 • For Trustee (strongest) and the Visitor (weakest) form of trust, the lower level
    players are more likely to trust players who are at a higher level


 Summary:
 • In MMOs, homophily is observed for only a subset of the types for which it is
   observed in the offline world
 • The types of homophily which are not observed in MMOs are the ones which are
   greatly affected by game mechanics
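
One simple way to test a homophily hypothesis such as H5 is to compare the share of trust edges that join same-attribute players with the share expected if pairs were formed at random. This is a sketch under that assumption; it is not necessarily the test used in the study, and the toy data is hypothetical.

```python
def same_attribute_fraction(edges, attribute):
    """Fraction of trust edges whose endpoints share an attribute value."""
    same = sum(attribute[u] == attribute[v] for u, v in edges)
    return same / len(edges)

def baseline_fraction(attribute):
    """Chance that two distinct players picked at random share the value."""
    values = list(attribute.values())
    n = len(values)
    pairs = n * (n - 1) / 2
    same = sum(c * (c - 1) / 2
               for c in (values.count(v) for v in set(values)))
    return same / pairs

guild = {"a": "red", "b": "red", "c": "blue", "d": "blue"}   # hypothetical
trust_edges = [("a", "b"), ("c", "d"), ("a", "c")]           # hypothetical
observed = same_attribute_fraction(trust_edges, guild)
expected = baseline_fraction(guild)
```

An observed share well above the baseline would be read as evidence of homophily for that attribute.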

                                                                                 87
Social Networks: General
Observations
   There is an extensive literature on characteristics of social
    networks (Leskovec PAKDD 2005, Leskovec ICDM 2005,
    McGlohon ICDM 2008, McGlohon KDD 2008)
   The network exhibits monotonically
    shrinking diameter over time (Leskovec PAKDD 2005,
    Leskovec ICDM 2005, McGlohon ICDM 2008)
   At a certain point in time, called the gelling point, many
    smaller connected components merge and become part of
    the largest connected component (Leskovec PAKDD 2005,
    Leskovec ICDM 2005, McGlohon ICDM 2008)
   The largest connected component (LCC) comprises
    the majority of the nodes in the network (>= 80%)
    (McGlohon ICDM 2008, McGlohon KDD 2008)

                                                                89
Social Networks: General
Observations
   The sizes of the second and third largest connected
    components remain roughly constant even though
    the identity of these components changes over time
    (Leskovec PAKDD 2005, Leskovec ICDM 2005, McGlohon
    ICDM 2008, McGlohon KDD 2008)
   Network isolates are few in number (<5%)
    (Leskovec PAKDD 2005, Leskovec ICDM 2005)
   The number of connected components decreases over
    time
    (Leskovec PAKDD 2005, Leskovec ICDM 2005, McGlohon
    ICDM 2008)
   Relatively fast growth of LCC close to the gelling point
    (Leskovec PAKDD 2005, Leskovec ICDM 2005)
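
The quantities in these observations (component sizes, share of nodes in the LCC, number of isolates) can be computed from an edge list. A minimal union-find sketch on a toy graph is shown below; the function name is hypothetical.

```python
from collections import Counter

def component_sizes(nodes, edges):
    """Sizes of connected components, largest first, via union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    sizes = Counter(find(n) for n in nodes)
    return sorted(sizes.values(), reverse=True)

nodes = range(8)                             # toy graph
edges = [(0, 1), (1, 2), (2, 3), (4, 5)]
sizes = component_sizes(nodes, edges)
lcc_share = sizes[0] / len(nodes)
isolates = sum(1 for s in sizes if s == 1)
```

Running this on successive snapshots of a network gives the time series that the observations describe.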

                                                          90
Trust Networks in MMOs

   Data from 4 servers is available. Results from one server
    (Player vs. Environment, 'guk') are shown
   The network consists of 15,237 nodes, 30,686 edges
    and 1,476 connected components
   Dataset spans from January 2006 to August 2006
   Average node degree of 4.03. The sizes of the three
    largest connected components are 9,039, 51
    and 49. The largest connected component accounts for
    59% of all the nodes in the network
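
The summary figures are mutually consistent, as a quick check shows. The average-degree formula assumes each edge is counted as undirected.

```python
nodes, edges, lcc = 15237, 30686, 9039   # figures from the slide

average_degree = 2 * edges / nodes   # each undirected edge adds 2 to the degree sum
lcc_share = lcc / nodes              # fraction of nodes in the largest component
```

This gives an average degree of about 4.03 and an LCC share of about 59%, well below the 80% or more reported for other social networks.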


                                      The Trust Network on 'guk' on August 31, 2006




                                                                               91
Key Observations

   Observation 1: Preferential Attachment: The rich get
    richer but not too rich
    Explanation: Social bandwidth is limited, Dunbar
    Number
   Observation 2: The growth of the LCC is retarded after
    the gelling point
    Explanation: The trust network has a relatively low
    growth rate as compared to the other networks
   Observation 3: Non-monotonic change in the diameter
    of the largest connected component
    Explanation: Players have different levels of activity at
    various points in time and can also “drop out” of the
    network if they churn from the game

                                                                92
Key Observations
   Observation 4: A large number of isolate components are
    observed (> 1000)
    Explanation: People join in groups and spend all the time
    playing with one another instead of interacting with people
    from the outside
   Observation 5: The number of isolate components
    increases monotonically over time
    Explanation: (Same as observation 4)
   Observation 6: Nodes in the non-LCC constitute a
    significant portion of the network (41%, 8 months after
    gelling point)
    Explanation: (Same as observation 4)


                                                             93
Generative Models of Trust Networks
   Time-bound Preferential Attachment: The rich get richer,
    but not so much after a certain point in time. Edge
    formation is bound by time
   Presence of Auxiliary Components: Isolate components
    are added to the network at an almost constant rate over
    time




                                                          94
Generative Models of Trust Networks
(ii)

   Non-Monotonic Decrease in the Diameter:
    Nodes become inert after a certain point in time. Sample
    the lifetime of nodes from a normal distribution
   Homophily in Edge formation: Probability of edge
    formation dependent upon node degree as well as
    agreement (similarity) in node characteristics
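
The four ingredients above can be combined into a toy simulation. This is only a sketch of the general idea, not the authors' model: every parameter value, the binary node trait, and the factor of 2 for matching traits are assumptions.

```python
import random

def simulate_trust_network(steps=200, lifetime_mean=50.0, lifetime_sd=15.0,
                           isolate_every=10, seed=0):
    rng = random.Random(seed)
    birth, lifetime, trait, degree = [], [], [], []
    edges = set()

    def add_node(t):
        birth.append(t)
        # Node lifetime sampled from a normal distribution (floored at 1)
        lifetime.append(max(1.0, rng.gauss(lifetime_mean, lifetime_sd)))
        trait.append(rng.randint(0, 1))   # hypothetical binary characteristic
        degree.append(0)
        return len(birth) - 1

    def add_edge(u, v):
        edges.add((min(u, v), max(u, v)))
        degree[u] += 1
        degree[v] += 1

    for t in range(steps):
        # Time-bound attachment: only nodes still within their lifetime can gain edges
        active = [n for n in range(len(birth)) if t - birth[n] < lifetime[n]]
        new = add_node(t)
        if active:
            # Preferential attachment weighted by homophily (matching trait)
            weights = [(degree[n] + 1) * (2.0 if trait[n] == trait[new] else 1.0)
                       for n in active]
            add_edge(new, rng.choices(active, weights=weights, k=1)[0])
        if t % isolate_every == 0:
            # Auxiliary component: a disconnected pair added at a constant rate
            a = add_node(t)
            b = add_node(t)
            add_edge(a, b)
    return len(birth), edges

n_nodes, sim_edges = simulate_trust_network()
```

The resulting edge list can be passed to any component-counting routine to compare the simulated LCC share and number of components against the observed network.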




                                                               95
Results

     Diameter: Non-Monotonically Changing Diameter




              Share of nodes in the LCC: relatively small




                                                     96
Results

          Number of Connected Components




          Network growth and the gelling point




                                                 97
Conclusion: Trust Networks
   Trust Networks in MMOs exhibit many properties which are
    not exhibited by other social networks in most other
    domains
   Proposed a model based on observations and domain
    knowledge
   Models of social networks should incorporate the
    peculiarities which are observed in MMOs in general
   Generalization? Similar observations have been made for
    mentoring networks but not for PvP, Trade and Chat
    Networks




                                                          98
Trust Prediction Family of Problems
Trust Prediction: Given a trust network G, predict which
  nodes are going to trust one another in the future


Prediction Across Networks: Given a set of actors who
participate in multiple types of interactions, use features from
one network to predict the existence of links in the other network
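As a minimal sketch of prediction across networks, the snippet below describes each pair of actors using only the structure of a source network; those descriptions could then be fed to a classifier that predicts links in a second network. The feature choice (existing link, common neighbors, Jaccard overlap), the function names, and the toy mentoring data are assumptions for illustration, not the feature set used in the study.

```python
from itertools import combinations

def neighbors(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def cross_network_features(source_edges, actors):
    """For every pair of actors, compute features from the source network only."""
    adj = neighbors(source_edges)
    feats = {}
    for u, v in combinations(sorted(actors), 2):
        nu, nv = adj.get(u, set()), adj.get(v, set())
        common = len(nu & nv)
        union = len(nu | nv)
        feats[(u, v)] = {
            "linked_in_source": int(v in nu),
            "common_neighbors": common,
            "jaccard": common / union if union else 0.0,
        }
    return feats

# Hypothetical toy data: a mentoring network used to describe pairs
# whose trust links we would like to predict.
mentoring = [("ann", "bob"), ("bob", "cat"), ("ann", "cat"), ("cat", "dan")]
features = cross_network_features(mentoring, {"ann", "bob", "cat", "dan"})
print(features[("ann", "dan")])
```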




                                                                    100
Trust Prediction Family of Problems
   • Machine learning/classification approach to the problem
     of trust prediction
   • Introduced a set of new problems (inter-network link
     prediction, trust propensity prediction)
   • Proposed an algorithm for link prediction that uses
     domain knowledge from social science theories
   • New Contributions:
       • Does the social context (adversarial vs. cooperative) affect
         prediction results?
       • If so, what is the implication for generalization?




                                                                       101
Prediction Task

   • Trust prediction as a classification problem
   • 60,000 examples for each prediction task
   • 10-fold cross-validation
   • Data from the Guk (cooperative) and Nagafen (adversarial)
     servers
   • Six standard classifiers for comparison: J48, JRip,
     AdaBoost, Bayes Network, Naive Bayes and k-nearest
     neighbor

   Figure: a positive example and a negative example, each shown
   over a training period followed by a test period
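The figure with the positive and negative examples did not survive extraction. A common way to set up such a task, assumed here rather than taken from the slides, is to treat a pair with no link in the training period as positive if the link appears in the test period and negative otherwise. The sketch also includes a plain 10-fold index split; both helper names are hypothetical.

```python
def label_pairs(train_edges, test_edges, candidate_pairs):
    """Label candidate pairs for link prediction.

    Assumption: pairs already linked during training are skipped; the rest are
    positive (1) if the link shows up in the test period, else negative (0).
    """
    train = {frozenset(e) for e in train_edges}
    test = {frozenset(e) for e in test_edges}
    labels = {}
    for u, v in candidate_pairs:
        pair = frozenset((u, v))
        if pair in train:
            continue
        labels[(u, v)] = 1 if pair in test else 0
    return labels

def k_fold_indices(n, k=10):
    """Split range(n) into k contiguous folds; yields (train_idx, test_idx)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, test_idx
        start += size
```

In practice the examples would be shuffled before splitting so that each fold contains both classes.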




                                                                         102
Prediction: Group Network




   In general good prediction results are obtained for the
    combat network (in either direction), except one case
   The grouping network is a union of all grouping
    instances and thus it is extremely dense (1,796,438
    edges, 31,900 nodes)
                                                              103
Prediction: Group Network




   • Grouping is not a good predictor of mentoring, but the
     converse holds
   • Mentoring is, more often than not, accompanied by
     grouping, but grouping can occur in a variety of contexts, e.g.,
     raids, quests, dungeon instances, …
                                                                   104
Prediction: Combat Network




   In general good prediction results are obtained for the combat
    network (in either direction)
   In general if players have played against another person then
    they have friends in other networks who have done the same
    or something similar
                                                                 105
Comparison across Social Environments




   • T: Trust; M: Mentoring; B: Trade
   • In general the results are similar for both
     environments, with some exceptions
   • Mentoring-Mentoring: In the adversarial environment
     more cliquish behavior is observed, i.e., friends of
     friends are likely to mentor one another in the future,
     which is not the case in the cooperative environment

                                                         106
Comparison across Social
Environments




   Mentoring-Related Tasks: In general mentoring is
    much more prevalent in the adversarial environment as
    compared to the cooperative environment (~3 times) and
    is much more intense. Overlap between mentoring and
    trade is thus more likely




                                                        107
Trust Amongst Clandestine Actors

“A plague upon't when thieves
cannot be true one to another!”
– Sir John Falstaff, Henry IV, Part 1, II.ii




                         Do gold farmers
                         trust each other?
                                             109
Hypergraphs to Represent Tripartite Graphs




   • Accounts can have several characters
   • Houses can be accessed by several characters
   • Projecting to one- or two-mode data obscures crucial
     information about embeddedness and paths
       • Figure 2a: Can ca31 access the same house as ca11?
       • Figure 2b: Are all characters owned by the same account?
                                                                110
Hypergraphs: Key Concepts

                      •     Hyperedge: An edge between three or
                            more nodes in a graph. We use three
                            types of nodes: Character, account and
                            house
                      •     Node Degree: The number of
                            hyperedges which are connected to a
                            node
                                NDh1 = 3
                      •     Edge Degree: The number of
                            hyperedges that an edge participates in
                                EDa1-h1 = 2
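The two degree notions can be made concrete with a small sketch. The hyperedges below are hypothetical toy data, chosen only so that the counts match the two values quoted on the slide (node degree 3 for h1, edge degree 2 for a1-h1); the original figure is not available.

```python
from collections import Counter
from itertools import combinations

# Each hyperedge joins one character, one account and one house.
hyperedges = [
    ("c11", "a1", "h1"),
    ("c12", "a1", "h1"),
    ("c21", "a2", "h1"),
    ("c21", "a2", "h2"),
]

def node_degrees(hyperedges):
    """Number of hyperedges each node takes part in."""
    return Counter(node for edge in hyperedges for node in set(edge))

def edge_degrees(hyperedges):
    """Number of hyperedges each unordered node pair takes part in."""
    counts = Counter()
    for edge in hyperedges:
        for pair in combinations(sorted(set(edge)), 2):
            counts[pair] += 1
    return counts

nd = node_degrees(hyperedges)
ed = edge_degrees(hyperedges)
print(nd["h1"])          # 3
print(ed[("a1", "h1")])  # 2
```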




                                                                      111
Network Characteristics




• Long tail distributions are observed for the various degree distributions
• The mapping from character-house to an account is always unique
• Players who are connected to a large number of houses are highly active
  players otherwise as well
                                                                         112
Characteristics of Hypergraph Projection Networks
 • Account Projection: Majority of the gold farmer nodes are isolates
   (79%). Affiliates well-connected (8.89) vs. non-affiliates (3.47)
 • Character Projection: Majority of the gold farmer nodes are isolates
   (84%). Affiliates well-connected (10.42) vs. non-affiliates (3.23)
 • House Projection: 521 gold farmer houses. Most are isolates (not
   shown) but others are part of complex structures. Densely connected
   network with gold farmers (7.56) and affiliates (84.02)




                         The Housing Projection Network
                                                                     113
Key Observations

          • Gold farmers grant trust ties less frequently than either
            affiliates or general players
          • Gold farmers grant and receive fewer housing permissions
            (1.82) than their affiliates (4.03) or general player population
            (2.73)




                   Total degree              In degree                 Out degree
                   <n>    <nGF>   <nAff>     <n>    <nGF>   <nAff>     <n>    <nGF>   <nAff>
  Farmers          1.82   0.29    1.82       0.89   0.29    0.89       1.07   0.29    1.07
  Affiliates       4.03   1.28    0.70       1.55   0.75    0.70       2.88   0.63    0.70
  Non-Affiliates   2.73   -       7.77       1.57   -       5.98       1.56   -       2.34


                                                                                                    114
Key Observations

          • No honor among thieves
                 • Gold farmers also have very low tendency to grant other
                   gold farmers permission (0.29)
                 • Affiliates also unlikely to trust other affiliates (0.70)






                                                                                                      115
Key Observations

      • Affiliates are brokers:
               • Farmers trust affiliates more (1.82) than other farmers (0.29)
               • Affiliates trust farmers more (1.28) than other affiliates (0.70)
               • Non-affiliates have a greater tendency to grant permissions to
                 affiliates (7.77) than in general (2.73)






                                                                                                       116
Frequent Itemset Mining for Frequent Hyper-subgraphs


    Support of a Hyper-subgraph: Given a sub-hypergraph of size k,
    subP is the pattern of interest containing the label P, shP is a pattern
    of the same size as subP and contains the label P, the support is
    defined as follows:




    Support of pattern         also containing a gold farmer (red) = 5/8



                                                                         117
Frequent Itemset Mining for Frequent Hyper-subgraphs


    Confidence of a Hyper-Subgraph: Given a sub-hypergraph of
    size k, subP is the pattern of interest containing the label P, subG is
    a pattern which is structurally equivalent but which does not contain
    the label P, the confidence is defined as follows:




    Confidence of pattern        and containing a gold farmer = 5/7
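The formulas on this slide and the previous one were images and are missing from the text. One reading consistent with the definitions and with the two quoted values (5/8 and 5/7) is: support is the share of size-k labeled sub-hypergraphs that have the pattern's shape, and confidence is the share of instances of that shape that carry the label. The sketch below encodes that reading; the instance list and the shape names are hypothetical.

```python
from fractions import Fraction

# Hypothetical size-k sub-hypergraph instances: (shape, contains_gold_farmer).
instances = (
    [("star", True)] * 5      # the pattern of interest, with a gold farmer
    + [("chain", True)] * 3   # other size-k patterns that contain a gold farmer
    + [("star", False)] * 2   # same shape, no gold farmer
)

def support(instances, shape):
    """Share of labeled size-k instances that have the given shape."""
    labeled = [s for s, has_label in instances if has_label]
    return Fraction(labeled.count(shape), len(labeled))

def confidence(instances, shape):
    """Share of instances with the given shape that carry the label."""
    same_shape = [has_label for s, has_label in instances if s == shape]
    return Fraction(sum(same_shape), len(same_shape))

print(support(instances, "star"))     # 5/8
print(confidence(instances, "star"))  # 5/7
```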



                                                                        118
Frequent Patterns of GFs
• Very low (s ≤ 0.1) support and confidence for almost all
  (except 8) frequent patterns with gold farmers
• Remaining 8 patterns can be used for discrimination
  between gold farmers and non-gold farmers in a subset of
  the instances
• Gold farmers & affiliates are more connected: A third of the more
  complex patterns are associated with affiliates (15/44)




                                                             119
Application: Gold Farmer Detection (Behavioral)
  Classification using in-game behavioral and demographic features
  Each model corresponds to a grouping of different features




                                                                     120
Application: Gold Farmer Detection (Label Propagation)
•   Problem: Not all players labeled as normal are in fact normal. Some
    of them are gold farmers who have not yet been identified as such
•   Analogue: If someone socializes almost exclusively with criminals, then
    he may be a criminal

    Classifier            Metric      Initial Dataset   Label Prop   Change in Performance
    Bayes Net             Precision   0.17              0.189         0.019
                          Recall      0.834             0.819        -0.015
                          F-Score     0.282             0.307         0.025
    J48                   Precision   0.494             0.62          0.126
                          Recall      0.189             0.337         0.148
                          F-Score     0.273             0.437         0.164
    JRip                  Precision   0.495             0.537         0.042
                          Recall      0.462             0.462         0
                          F-Score     0.478             0.497         0.019
    KNN                   Precision   0.436             0.46          0.024
                          Recall      0.396             0.428         0.032
                          F-Score     0.415             0.443         0.028
    Logistic Regression   Precision   0.455             0.534         0.079
                          Recall      0.189             0.271         0.082
                          F-Score     0.267             0.36          0.093
    Naïve Bayes           Precision   0.146             0.142        -0.004
                          Recall      0.538             0.502        -0.036
                          F-Score     0.23              0.221        -0.009
    AdaBoost w/ DT        Precision   0.405             0.471         0.066
                          Recall      0.105             0.08         -0.025
                          F-Score     0.167             0.137        -0.03
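The slides do not say how labels were propagated. A minimal sketch of the idea in the analogue above, assuming a simple neighborhood rule, is to relabel a "normal" player as a farmer when at least a given share of that player's contacts are known farmers, and to repeat until nothing changes. The function name, the label strings, and the 0.8 threshold are assumptions.

```python
def propagate_labels(adj, labels, threshold=0.8, max_rounds=10):
    """Relabel 'normal' players whose contacts are mostly gold farmers."""
    labels = dict(labels)  # work on a copy
    for _ in range(max_rounds):
        flipped = []
        for player, contacts in adj.items():
            if labels.get(player) == "farmer" or not contacts:
                continue
            share = sum(labels.get(c) == "farmer" for c in contacts) / len(contacts)
            if share >= threshold:
                flipped.append(player)
        if not flipped:
            break
        # Apply all flips at the end of the round so the order of iteration
        # does not affect the result.
        for player in flipped:
            labels[player] = "farmer"
    return labels

# Hypothetical toy network: p1 only interacts with known farmers.
adj = {
    "p1": {"f1", "f2"},
    "p2": {"p1", "n1"},
    "n1": {"p2"},
    "f1": {"p1"},
    "f2": {"p1"},
}
labels = {"f1": "farmer", "f2": "farmer", "p1": "normal", "p2": "normal", "n1": "normal"}
updated = propagate_labels(adj, labels)
print(updated["p1"], updated["p2"])  # farmer normal
```

The relabeled dataset would then be used to retrain the classifiers, which is the comparison the table reports.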



                                                                                                        121
Gold Farmer Detection
   • Model 1 (Player Attribute Based Features): These features are
     based on the attributes of the player's character in the game, e.g.,
     character race, character gender, distribution of gaming activities etc.
   • Model 2 (Item Based Features): These features are
     derived from items bought and sold through the consignment network.
     They are based on the frequency of the frequent items sold
     or bought by gold farmers.
   • Model 3 (Player Attribute & Item Based Features): All the attributes
     from the previous two models.
   • Model 4 (Item Network Based Features): Features derived
     from the item network in a manner analogous to Model 2.
   • Model 5 (Player Attribute & Item-Network Based Features): A
     combination of features from Model 1 and Model 4.
   • Model 6 (Item & Item-Network Based Features): A
     combination of features from Model 2 and Model 4.
   • Model 7 (Player Attribute, Item & Item-Network Based Features):
     Union of all the features described above.

                                                                         122
Application: Structural Signatures Approach Applied to Trade Networks

There is not enough data to use the structural signature approach to catch
gold farmers
Alternative: Apply the same approach to other networks: the Trade Network




A set of standard machine learning models is used: Naive Bayes, Bayes Net,
Logistic Regression, KNN, J48, JRip, AdaBoost and SMO



                                                                               123
Conclusion: Gold Farmer Behavior Analysis and Prediction


   • Representation-related issues addressed with respect to trust in MMOs
   • Application of frequent pattern mining to discover distinct trust
     patterns associated with gold farmers
   • No honor among thieves: Gold farmers tend not to trust other gold farmers




                                                           124
Computational Trust in Multiplayer Online Games
Muhammad Aurangzeb Ahmad
Department of Computer Science, University of Minnesota

Trust as a Multi-Level Network Phenomenon
   • Trust as a multi-modal, multi-level network phenomenon
   • Network formulations of traditional trust-related concepts
   • Incorporating social science theories in trust-related prediction tasks
   • Generative network models for trust-based social interactions

Social Characteristics of Trust
   • Effect of Social Environments on Trust: Different social environments
     result in differences in network signatures
   • Trust and Homophily: Most types of homophily do not carry over to the
     MMO domain
   • Trust and Clandestine Behavior: No Honor Amongst Thieves
   • Trust and Mentoring
   • Trust and Trade
   • Trust-based and other social networks in MMOs exhibit anomalous
     network characteristics

Trust and Prediction
   • Trust Prediction Family of Problems: Predict formation, change, and
     breakage of trust within the same network and across social networks.
     Predict trust propensity.
   • Item Recommendation: Effects of social environments on trust
   • Success Prediction (Social Capital): Effect of network structure on trust
Conclusion
   • Availability of data allows one to analyze phenomena
     that could not be studied in the past (trust in
     MMOs in the current case)
   • Explored a series of big-picture questions pertaining to
     trust in MMOs
   • Similar affordances and contexts (with respect to the offline
     world) lead to similar outcomes in the online world
   • Explored and expanded the scope of trust-related
     prediction tasks
   • Analysis of more datasets is required for generalizability




                                                              127
Acknowledgement
   DMR Lab
       Professor Jaideep Srivastava and all DMR lab members especially
        Zoheb Borbora, Amogh Mahapatra, Young Ae Kim, Nishith Pathak,
        Kyong Jin Shim and Nisheeth Srivastava
   Virtual Worlds Observatory
       Professor Noshir Contractor (Northwestern), Professor Scott Poole
        (UIUC), Professor Dmitri Williams (USC)
       External Collaborators at Northwestern, UIUC, USC, U Toronto
        especially Brian Keegan
   Funding Agencies:
       NSF, AFRL, NSCTA, IARPA




                                                                       128
Publication Summary

Publication Type      Total   Thesis Related   Comment
Book                  1       1                Book on Clandestine Behaviors and Networks (Springer 2012)
Conference Papers     14      5
Book Chapters         1       -
Journal Papers        2       -
Workshop Papers       10      3                2 Best Paper Awards
Short/Poster Papers   8       1
Technical Reports     4       1
Tutorials             4       1
Patents               2       1
27 Co-authors
                                                                      129
A Computational Model for Social Influence
Objective: Model social
influence in a dynamic network
setting as a spread of cascades
to identify key influential nodes



                                    131
Outline

   Team Overview
   Optimal Target Selection
   Current approaches
   Proposed Approach
   Results
   Update summary
   Q&A




                               132
Motivation




    Figure: Global retweets of Tweets coming from Japan for one
            hour after the earthquake (legend: Original, Retweet)
  Courtesy: http://blog.twitter.com/2011/06/global-pulse.html
                                                                 133
Optimal Target People Selection
   Goal: Find the optimal targets to maximize influence
    spread in network
   Why is it important?
       Public Polls: Sentiment spread
       Sales and Marketing: Word-of-mouth spread
       Public Health: Disease spread
   Influence: A force that attempts to change the opinion or
    behavior of the individual
   Influence is causal
       My friend buys a product -> I buy a product



                                                           134
Current Approaches (1/2)

   • Assume a certain model for propagation of influence [1]
       • Independent Cascade Model
           • Each newly activated node independently tries to
             influence each neighbor using a biased coin toss
       • Linear Threshold Model
           • Each node picks a random threshold
           • Each active neighbor contributes a specified weight;
             a node becomes active once the total weight of its
             active neighbors reaches its threshold

             All of them need to know the
                propagation probability
                                                              135
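Both propagation models are easy to simulate, which also makes the dependence on the propagation probabilities (or edge weights) explicit. The sketch below is a textbook-style rendering under stated assumptions: a uniform probability `p` for the independent cascade, and per-edge weights into each node for the linear threshold model. Function and parameter names are illustrative.

```python
import random

def independent_cascade(graph, seeds, p=0.1, rng=None):
    """One run of the independent cascade model.

    graph maps each node to a list of out-neighbors; p is an assumed uniform
    propagation probability. Returns the set of activated nodes.
    """
    rng = rng or random.Random()
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, []):
                # A newly active node gets one biased coin toss per neighbor.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active

def linear_threshold(in_weights, seeds, rng=None):
    """One run of the linear threshold model.

    in_weights[v] maps each in-neighbor u of v to the weight of edge (u, v);
    the weights into a node are assumed to sum to at most 1.
    """
    rng = rng or random.Random()
    thresholds = {v: rng.random() for v in in_weights}
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v, nbrs in in_weights.items():
            if v in active:
                continue
            total = sum(w for u, w in nbrs.items() if u in active)
            if total >= thresholds[v]:
                active.add(v)
                changed = True
    return active
```

Averaging the size of the returned set over many runs gives the usual Monte Carlo estimate of the spread of a seed set.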
Current Approaches (2/2)

   • Use a two-step approach
       • Step 1: Choose an influence model
       • Step 2: Find the subset of top-k nodes that maximizes
         the influence
   • Bad news: Step 2 is NP-hard [2]
   • Popular method for Step 2: greedy heuristics
       • The greedy heuristic gives a (1-1/e) approximation
   • Later work addresses scalability issues in the second step [3][4][5][6]
       • CELF [3]: The influence maximization function follows the law
         of diminishing returns (sub-modular)
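The diminishing-returns property is what makes lazy evaluation possible: a marginal gain computed in an earlier round can only shrink, so it is a valid upper bound. The sketch below is a CELF-style lazy greedy written from that idea, not a transcription of the cited implementation. The `spread` function is a stand-in set-coverage objective, and the `reach` data is hypothetical.

```python
import heapq

def celf(candidates, spread, k):
    """Lazy greedy (CELF-style) selection of k seeds.

    spread(S) is assumed to be monotone and sub-modular, so stale marginal
    gains are upper bounds and most re-evaluations can be skipped.
    """
    selected, current = [], 0.0
    # Heap entries: (-gain, node, number of seeds chosen when gain was computed).
    heap = [(-spread([c]), c, 0) for c in candidates]
    heapq.heapify(heap)
    while heap and len(selected) < k:
        neg_gain, node, stamp = heapq.heappop(heap)
        if stamp == len(selected):
            # The gain is up to date, so this node is the true best choice.
            selected.append(node)
            current += -neg_gain
        else:
            # Stale entry: recompute its marginal gain and push it back.
            gain = spread(selected + [node]) - current
            heapq.heappush(heap, (-gain, node, len(selected)))
    return selected, current

# Hypothetical coverage objective: each candidate reaches a set of users.
reach = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}

def spread(seeds):
    covered = set()
    for s in seeds:
        covered |= reach[s]
    return len(covered)

print(celf(list(reach), spread, 2))  # (['c', 'a'], 7.0)
```

In this run candidates b and d are never re-evaluated after the first round, which is where the savings over plain greedy come from.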

                                                             136
Limitations of current approaches


   • Propagation probabilities must be estimated or learned
       • Proposed: data-driven and model-free approach
   • Propagation models are content-unaware
       • Proposed: content-aware influencer mining
   • No causality effect in the influence model
       • Proposed: add a causal effect to influence
         propagation
                                              137
Proposed Approach


Inputs: Communication Logs, Network Structure
   -> Frequent Sequence Mining
   -> Network Discovery of Influencers
Output: Ordered list of influencers
                                                  138
Sequence Cascade Mining

                                              Example


                 Temporal order
                                    Append neighbors from network i=2



                                    Check downward closure and
                                    support count
Peter, Charles, Kim, Vicky, Nancy
       Network structure


   Let Support = 2
                                                                        139
Sequence Cascade Mining

                                              Example



                                    Append neighbors from network i=3



                                    Check downward closure and
                                    support count
Peter, Charles, Kim, Vicky, Nancy
       Network structure


   Let Support = 2
                                                                        140
Sequence Cascade Mining

                                              Example



                                    Append neighbors from network i=4


                                    Break loop; No more work to do…

Peter, Charles, Kim, Vicky, Nancy
       Network structure
                                      Return Frequent Sequences

   Let Support = 2
                                                                        141
Sequence Cascade Mining



                   L1 Frequent Sequence Mining




                   Candidate Generation for k+1
                   By appending neighbors

                   Downward closure pruning

                   Support Count and reorganize




                                              142
Influence Function
   Node pattern Influence Set (Q(i,p))
                    =
   Influence Set of a Node

   Influence Set of a Node Set




    It can be shown that influence function I(V,s) is
              monotone and sub-modular
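The formulas on this slide were images and are not recoverable from the text. As a hedged illustration of the closing claim, the sketch below assumes a coverage-style influence function: a node influences itself and every node that follows it in some frequent sequence, and the influence of a node set is the size of the union. Under that assumption, monotonicity and sub-modularity can be checked by brute force on a small example. The sequences are hypothetical.

```python
from itertools import combinations

def influence_sets(frequent_sequences):
    """Assumed reading: a node influences itself and every node that follows
    it in some frequent sequence."""
    reach = {}
    for seq in frequent_sequences:
        for i, node in enumerate(seq):
            reach.setdefault(node, {node}).update(seq[i + 1:])
    return reach

def influence(nodes, reach):
    """Size of the union of the influence sets of the given nodes."""
    covered = set()
    for n in nodes:
        covered |= reach.get(n, {n})
    return len(covered)

def is_monotone_submodular(reach):
    ground = sorted(reach)
    subsets = [frozenset(c) for r in range(len(ground) + 1)
               for c in combinations(ground, r)]
    for a in subsets:
        for b in subsets:
            if not a <= b:
                continue
            if influence(a, reach) > influence(b, reach):
                return False  # adding nodes reduced influence: not monotone
            for x in ground:
                if x in b:
                    continue
                gain_a = influence(a | {x}, reach) - influence(a, reach)
                gain_b = influence(b | {x}, reach) - influence(b, reach)
                if gain_a < gain_b:
                    return False  # diminishing returns violated
    return True

sequences = [("C", "K"), ("C", "P"), ("P", "V"), ("P", "N")]
reach = influence_sets(sequences)
print(influence(["C"], reach), influence(["C", "P"], reach))  # 3 5
print(is_monotone_submodular(reach))                          # True
```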
                                                        143
Network Discovery of Influencers




  Greedy heuristic with (1-1/e) approximation guarantee
                                                        144
Network Discovery of Influencers

Example: find the top-3 influencers from the frequent sequences
   Graph nodes: P, C, N, K, V

   i=1
   • Choose the node with max out-degree
     (randomly break the tie between P and C)
   • C chosen
   • Current set of influenced people: C, K, P
                                                              145
Network Discovery of Influencers

Example (continued): top-3 influencers from the frequent sequences
   Graph nodes: P, C, N, K, V

   i=2
   • Choose the node with max out-degree
   • P chosen
   • Current set of influenced people: C, K, P, V, N
                                                          146
Network Discovery of Influencers

Example (continued): top-3 influencers from the frequent sequences
   Graph nodes: P, C, N, K, V

   i=3
   • Choose the node with max out-degree
     (randomly break the tie between K, V, N)
   • K chosen
   • Current set of influenced people: C, K, P, V, N
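The three-step walk-through can be replayed with a plain greedy loop. The graph itself is not recoverable from the text, so the influence sets below are inferred from the "current set of influenced people" shown at each step and should be treated as an assumption. Ties are broken alphabetically here for reproducibility, whereas the slides break them at random.

```python
def greedy_influencers(reach, k):
    """Pick k nodes, each time taking the one that adds the most new people."""
    chosen, influenced = [], set()
    for _ in range(k):
        remaining = [n for n in sorted(reach) if n not in chosen]
        if not remaining:
            break
        # max() keeps the first of equally good nodes, so ties go to the
        # alphabetically smallest name.
        best = max(remaining, key=lambda n: len(reach[n] - influenced))
        chosen.append(best)
        influenced |= reach[best]
    return chosen, influenced

# Influence sets assumed from the walk-through: C reaches K and P, P reaches V and N.
reach = {
    "C": {"C", "K", "P"},
    "P": {"P", "V", "N"},
    "K": {"K"},
    "V": {"V"},
    "N": {"N"},
}
chosen, influenced = greedy_influencers(reach, 3)
print(chosen)              # ['C', 'P', 'K']
print(sorted(influenced))  # ['C', 'K', 'N', 'P', 'V']
```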
                                                          147
Context-Aware Influencers


Bag of words representing the context
   -> Frequent Sequence Mining
   -> Network Discovery of Influencers
   -> Context-specific influencers




                                                 148
Initial Results (1/2)




       DBLP Dataset             USPTO Dataset


Significant performance gain over the established
             baseline, PrefixSpan
                                                149
Initial Results (2/2)



DBLP
Dataset




USPTO
Dataset


                        150
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep
Jaideep



Jaideep

  • 1. Social Analytics: Mining Behaviors of a Connected World. PAKDD School, April 11, 2013, Sydney, Australia. Jaideep Srivastava, University of Minnesota, srivasta@cs.umn.edu. 4/9/2013 University of Minnesota
  • 2. Course Outline • Module 1: Introduction to Social Analytics – applying data mining to social computing systems; examples of a number of social computing systems, e.g. Facebook, MMO games, etc. • Module 2: Computational online trust; identifying key influencers; information flow in networks • Module 3: Analysis of clandestine networks; Katana - game analytics engine
  • 4. Social Network Analysis [Figure: diagram linking Organizational Theory, Social Psychology, Anthropology, Sociology and Epidemiology to Cognitive Knowledge Networks, Socio-Cognitive Networks, Social Networks and Knowledge Networks, organized by Perception vs. Reality and by Acquaintance (links) vs. Knowledge (content)] • Social science networks have widespread application in various fields • Most of the analysis techniques have come from Sociology, Statistics and Mathematics • See (Wasserman and Faust, 1994) for a comprehensive introduction to social network analysis. 12/02/06 IEEE ICDM 2006
  • 5. What have been its key scientific successes? • Numerous results in the classical social sciences: 'Six degrees of separation' [Milgram], popularized by the 'Kevin Bacon game'; 'The strength of weak ties' [Granovetter]; 'Online networks as social networks' [Wellman, Krackhardt]; 'Dunbar Number'; various types of centrality measures; etc. • In the Web era: 'The Bow-Tie model of the Web' [Raghavan]; 'Preferential attachment model' [Barabasi] (Yes and No); 'Power law of degree distribution' [Lots of people] (NO!); etc.
  • 6. Application successes • Numerous in social sciences • Google – PageRank • LinkedIn – expanding your Cognitive Social Network: making you aware that 'you're more connected and closer than you think you are' • Expertise discovery in organizations: knowledge experts ('authorities'); well-connected individuals ('hubs') • Rapid-response teams in emergency management • Information flow in organizations • Twitter – real-time information dissemination • Etc.
  • 7. Online (Multiplayer) Games [Figure: chart with axes 'How social' and 'How engaging', each running from Low to High]
  • 8. Player Behavior & Revenue Model • Blizzard (subscription): World of Warcraft; 12 million subscribers; revenue model: $15/month, approx. $3 billion annual revenue; 4 hours a day, 7 days a week; hard-core gamers; less socially acceptable; like cocaine • Zynga (free2play): Farmville, Fishville, Mafia Wars, etc.; 180 million players; revenue model: virtual goods, $700 million in 2010; 0.5 hrs a day, 7 days a week; everyone; more socially acceptable; like caffeine
  • 9. Implications of this 'addiction' • 3 billion hours a week are being spent playing online games (Jane McGonigal in “Reality is Broken”) • Labor economics: What is the impact of so much labor being removed from the pool? [Castronova] • Entertainment economics: If MMO players can get 100 hrs/month of entertainment by spending $25 or so, what will happen to other entertainment industries? • Psychological/Sociological: Is it an addiction, the prevailing view (Chinese government's 'detox centers' for kids)? Are they fulfilling a deeper need that the real world is not (McGonigal)? • Societal: A trend far too important to not be taken seriously!
  • 11. Levi's – Example of Social Retail • Levi's leverages its brand to ensure customers provide their social network • Levi's can leverage predictive social analytics technology to understand the value of the customer's social network
  • 12. Opportunity, Innovation, Impact • Companies do not understand the social graph of their customers • It's not just about how they relate to their customers, but also about how customers relate to each other • Understanding these relationships unlocks immense value • Innovation: Understanding the social network of customers: key influencers, relationship strength, … • Impact: Deriving actionable insights from this understanding: customer acquisition, retention, customer care, …; social recommendation, influence-based marketing, identifying trend-setters, … Ninja Metrics confidential information. Copyright 2012
  • 13. Unlocking true value by product, category, or store [Figure: chart with values -$128.61, -$293.79, -$79.63]
  • 14. True Value of each customer • True value = individual value + social value • Who really matters, and to what degree • Some empirical facts: 31% activity due to socialization; 23% more individual + 8% more social activity [Figure: the individual's lifetime value, their social influence, and their true total]
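
The decomposition on this slide can be sketched in a few lines of Python. The customer names and dollar amounts are hypothetical, chosen only to show that ranking customers by true value can differ from ranking them by individual value alone; the `social_share` helper is likewise an illustrative addition, not something the slides define.

```python
# Hypothetical customers: (individual value, social value) in dollars.
customers = {
    "alice": (120.0, 95.0),
    "bob": (200.0, 0.0),
    "carol": (80.0, 30.0),
}

def true_value(individual, social):
    """True value = individual value + social value (as stated on the slide)."""
    return individual + social

def social_share(individual, social):
    """Fraction of a customer's true value that comes from social influence."""
    total = true_value(individual, social)
    return social / total if total else 0.0

# Rank customers two ways: by individual value only, and by true value.
by_individual = sorted(customers, key=lambda c: customers[c][0], reverse=True)
by_true_value = sorted(customers, key=lambda c: true_value(*customers[c]), reverse=True)

print(by_individual)
print(by_true_value)
```

In this made-up data the highest individual spender is not the most valuable customer once social value is counted, which is the point of the "who really matters" bullet.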
  • 15. Impact of New Instrumentation on Science • 1950s: Invention of the electron microscope fundamentally changed chemistry from 'playing with colored liquids in a lab' to 'truly understanding what's going on' • 1970s: Invention of gene sequencing fundamentally changed biology from a qualitative field to a quantitative field • 1980s: Deployment of the Hubble (and other) space telescopes has had fundamental impact on astronomy and astrophysics • 2000s: Massive adoption is fundamentally changing social science research; Massively Multiplayer Online Games (MMOGs) and Virtual Worlds (VWs) are acting as 'macroscopes of human behavior'
  • 16. The Virtual World Observatory (VWO) Project • Four PIs, 30+ post-docs, PhD and MS students, UGs, high-schoolers • Noshir Contractor, Northwestern: Networks • M. Scott Poole, Illinois Urbana-Champaign/NCSA: Groups • Jaideep Srivastava, Minnesota: Computer Science • Dmitri Williams, USC: Social Psychology • Collaborators • Castronova (Sociology, Indiana), Yee (Xerox PARC), Consalvo, Caplan (Economics, Delaware), Burt (Sociology, U of Chicago), Adamic (Info Sci, Michigan), … • Data and technology partners • Sony (EverQuest 2), Linden Lab (Second Life), Bungie (Halo 3), Kingsoft (Chevalier's Romance), others … • Cloudera Systems (Hadoop), Microsoft (SQL Server), Weka, …
  • 17. Overall Goals of the VWO Project • Sticky social media (games, virtual worlds, other social apps) produce tons of data • Analytics: social science models, new algorithms, ultra-scalability • Basic science: behavior, socialization, …; novel, scalable algorithms; NSF • Government applications: team dynamics (Army); skills acq, leadership (Army); social influence & adolescent health (CDC); etc. • Business applications: customer churn; game design; anti-social behavior; etc.
  • 18. Collaborators, Sponsors, Partners • Team • Financial Sponsors • Data Partners • Technology Partners
  • 19. Part II – Impact on Science
  • 20. Findings from a Player Survey
  • 21. Who is playing? • It is not just a bunch of kids • Average age is 31.16 (US population median is 35) • More players in their 30s than in their 20s
  • 22. How much do they play? • Mean is 25.86 hours/week • Compares to US mean of 31.5 for TV (Hu et al, 2001) • From prior experimental work, MMO play eats into entertainment TV and going out, not news • So much for kids being the ones with the free time.
  • 23. Gender Differences • More men players (78/22%) • Men played to compete, and women played to socialize • Men play more other games, but it was the women who were the more satisfied EQ2 players • Actual play time: women 29.32 hours/week, men 25.03 hours/week • Likelihood of quitting ('no plans to quit'): women 48.66%, men 35.08% • Self-reported play time: women 26.03 (3 hours less than actual), men 24.10 (1 hour less than actual) • Boys and girls are socialized early on, and thus have clear role expectations for their behaviors and identities (Gender Role Theory in action!!)
  • 24. Playing with a partner
  • 25. Inferring RW gender from VW data • Goal: What virtual world behaviors and characteristics predict real world gender? • Data: Survey Data (n=7119); Survey Character Store; EQ2 Character Store • Variables • Avatar characteristics: Gender, Race, Class, Experience, Guild Rank, Alignment, Archetypes • Game play behaviors: Total Deaths & Quests, PvP Kills & Deaths, Achievement Points, Number of Characters, Time played, Communication patterns
  • 26. Gender prediction results • Close to 95% prediction accuracy • Decision trees work rather well • Character Gender, Race and Class are significant predictors of real life gender • Gender swapping is rarer, but systematically different by real gender • Players tend to choose: character gender based on their real life gender; character races that are gendered (women play elves, men play barbarians); classes that are gendered (women play priests, men play fighters)
  • 27. Gender swapping behavior
      Character gender   Real male        Real female      Total
      Male               4065 (82.6%)     98 (8.2%)        4163 (68.0%)
      Female             855 (17.4%)      1104 (91.8%)     1959 (32.0%)
      Total              4920 (100.0%)    1202 (100.0%)    6122 (100.0%)
    Observation • Far more males gender swap than females • Why? • Men are more creative? • Women have less identity confusion? • Women get their 'fill of gender swapping in real life'
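
The column percentages in this table can be recomputed directly from the four cell counts. The short Python sketch below does that; the dictionary layout and the function name `swap_rate` are illustrative choices, while the counts are the ones on the slide.

```python
# Counts from the slide's table, keyed by (real gender, character gender).
counts = {
    ("male", "male"): 4065,
    ("male", "female"): 855,
    ("female", "male"): 98,
    ("female", "female"): 1104,
}

def swap_rate(real_gender):
    """Share of players of a given real gender whose character has the other gender."""
    total = sum(n for (real, _), n in counts.items() if real == real_gender)
    swapped = sum(
        n for (real, char), n in counts.items()
        if real == real_gender and char != real_gender
    )
    return swapped / total

for gender in ("male", "female"):
    print(gender, round(100 * swap_rate(gender), 1))
```

This gives 17.4% for male players and 8.2% for female players, matching the table, so male players swap gender roughly twice as often.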
  • 28. Economics: A test of RW → VW mapping • Do players behave in virtual worlds as we expect them to in the actual world? • Economics is an obvious dimension to test • In the real world, perfect aggregate data are hard to get
  • 29. GDP and Price Level • GDP and price levels are robust but comparatively unstable [Figure: GDP and Prices on Antonia Bayle; nominal GDP (gold) and price level (January = 100), January to May]
  • 30. Money Supply and Price • The instability is explicable through the Quantity Theory of Money • a rapid influx of money . . . • . . . dramatically boosted prices • More evidence that this behaves like a real economy [Figure: Change in Money Supply and Population on Antonia Bayle; change in money supply (000 gold) and change in active accounts, February to May] [Figure: Price Level on Antonia Bayle; percent change in price level, February to June]
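
The Quantity Theory of Money invoked on this slide is usually written as the equation of exchange. The symbols below are the standard textbook ones rather than anything taken from the slides: M is the money supply, V the velocity of money, P the price level and Q real output.

```latex
M V = P Q
\qquad\Longrightarrow\qquad
\frac{\Delta M}{M} + \frac{\Delta V}{V} \approx \frac{\Delta P}{P} + \frac{\Delta Q}{Q}
```

If velocity and output change little over a period, a rapid increase in M shows up mostly as an increase in P, which is the pattern the slide describes for the game economy.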
  • 31. Networks in Virtual Worlds SONIC Advancing the Science of Networks in Communities
  • 32. Why do we create and sustain networks? • Theories of self-interest • Theories of social and resource exchange • Theories of mutual interest and collective action • Theories of contagion • Theories of balance • Theories of homophily • Theories of proximity • Theories of co-evolution. Sources: Contractor, N. S., Wasserman, S. & Faust, K. (2006). Testing multi-theoretical multilevel hypotheses about organizational networks: An analytic framework and empirical example. Academy of Management Review. Monge, P. R. & Contractor, N. S. (2003). Theories of Communication Networks. New York: Oxford University Press.
  • 33. “Structural signatures” of Social Theories [Figure: example network motifs for Self interest, Exchange, Balance, Collective Action, Homophily (Novice/Expert) and Contagion]
  • 34. [Figure: four network panels: Partnership, Instant messaging, Trade, Mail; black: male, red: female]
  • 35. Social theory + IR based network analysis • Social networks (1), (2), …, (n) are represented as network structure frequency vectors in a bag-of-words model • Cluster the structure vectors using text clustering methods • Cluster means provide modes of network structure configurations making up all the social networks • Attribute values for each cluster can be used to discover trends between network structures and attributes
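
One way to picture the "bag-of-words" representation is to count a few simple structures in each group's network and normalize the counts, so that each network becomes a short frequency vector that text-clustering tools can compare. The sketch below is only an illustration under stated assumptions: the three structure types (edges, mutual dyads, 2-out-stars) and the use of cosine similarity are choices made here, since the slides do not say which structures or which clustering method were actually used.

```python
from collections import Counter
from math import sqrt

def structure_vector(edges):
    """Count simple structures in a directed network and normalize the counts.

    Returns frequencies for (edges, mutual dyads, 2-out-stars).
    """
    edge_set = set(edges)
    out_degree = Counter(src for src, _ in edge_set)
    n_edges = len(edge_set)
    # A mutual dyad is a pair of nodes with ties in both directions.
    n_mutual = sum(1 for a, b in edge_set if a < b and (b, a) in edge_set)
    # A 2-out-star is one node sending ties to two different nodes.
    n_out_stars = sum(d * (d - 1) // 2 for d in out_degree.values())
    counts = [n_edges, n_mutual, n_out_stars]
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)

def cosine(u, v):
    """Cosine similarity, a common way to compare bag-of-words vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

# Two hypothetical group networks: one-way broadcasting vs. a reciprocal pair.
star = [("a", "b"), ("a", "c"), ("a", "d")]
pair = [("a", "b"), ("b", "a")]

print(structure_vector(star))
print(structure_vector(pair))
print(cosine(structure_vector(star), structure_vector(pair)))
```

Vectors like these could then be fed to any standard clustering routine; the cluster means would play the role of the "modes of network structure configurations" mentioned on the slide.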
  • 36. Results – normalized network structure vector means for all clusters
  • 37. Results • Clusters 1 and 4 are similar • Groups kill fewer monsters • Group members in cluster 4 do not communicate much • Group members in cluster 1 generally limit their communication to just one other person in the group • Most people belong to these two clusters • Consistent with previous research: users in virtual environments are less likely to interact with strangers [N. Ducheneaut, N. Yee, E. Nickell and R. Moore, “Alone Together?” Exploring the social dynamics of massively multiplayer online games, Proceedings CHI06, ACM Press, New York, 407-416.] • Cluster 5 groups have many 1-edge and 2-out stars • Most of the communication is one way, possibly indicating presence of central people • Maximum number of monsters killed out of all clusters • Performance of the groups is very good • Minimal communication • It is possible that cluster 5 consists of groups more focused on playing and performing well in the game and less on socializing
  • 38. Some Open Questions • How similar/dissimilar are online social networks from real-world social networks? • Is online socialization • Only a sustainability activity of real-world networks? • Causing new social networks to be formed? • Fundamentally a different type of networking activity? • A fundamental tenet of socialization has been “geography/proximity drives socialization” • How is this being impacted by online socialization?
  • 39. TeamSkill: Modeling Team Chemistry in Online Multi-Player Games
  • 40. Description and motivation • The goal of this work is to improve skill assessment approaches used in multiplayer games, especially team-based games • Xbox Live, PSN, Steam, and Battle.net collectively have between 149 and 167 million total users (and that number is growing) • Many of the most popular games are team-based: Halo, Call of Duty, TF2, CS, etc. • Why? • Better skill assessment = fairer games = less player attrition • More accurate rankings of players/teams • Most previous work has focused on individuals, not teams • It's a hard problem • Online setting: Updates to a player's skill distribution must be done after each game is played • Applicability: Any generalized assessment technique cannot include game-specific data • Our datasets are from the games of professional Halo 3 players
  • 41. What is Halo? • Halo is one of the most well-known “first-person shooter” video game series in the world • Online/LAN-based multiplayer is its most significant component • 650,000 to 850,000 unique users per day at its peak
  • 42. Major League Gaming • Our work focuses on professional Halo 3 players • Online scrimmage data, as well as complete tournament data, is readily available from bungie.net and mlgpro.com • Players are highly skilled individually, which allows us to better focus on group-level performance characteristics • Players change teams regularly, which helps isolate the impact of particular players on overall team performance • Unexplored boundary case for skill assessment • What is Major League Gaming (MLG)? • A professional league for competitive gaming, including Halo 3 from '08-'10 • Best players in the world compete in this league • Last tournament watched by over 1,000,000 people online • Our datasets are comprised of games played by these players
  • 43. Skill assessment • An old problem (with different applications in different contexts) • Paired comparison estimation • Foundational work by Thurstone (1927) and Bradley-Terry (1952) • Elo (1959), popularized in chess ranking (FIDE, USCF) • Glicko (1993) – player-level ratings volatility incorporated (σ²), addition of rating periods • TrueSkill (2006) – factor-graph based approach used in Microsoft's Xbox Live gaming service • Used to match players/teams up with each other online • Our work focuses on Elo, Glicko, and TrueSkill
  • 44. Elo (1959)
      • Arpad Elo, Dept. of Physics, Marquette University
      • Was a master chess player in USCF
      • Proposed the Elo Rating System
      • Replaced earlier systems of competitive rewards (i.e., tournaments) in USCF/FIDE
      • Simple to implement: assumes player skill is normally distributed with constant variance β²
      • Still widely used today
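To make the rating update concrete, here is a minimal sketch of an Elo-style update. It uses the logistic expected-score form common in chess federations rather than the normal-distribution form mentioned on the slide; the K-factor of 32, the 400-point scale, and the function names are illustrative assumptions, not values taken from this work.

```python
def elo_expected(r_a: float, r_b: float, scale: float = 400.0) -> float:
    """Expected score of player A against player B (logistic form, assumed)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / scale))


def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """Return updated (r_a, r_b); score_a is 1 for a win, 0.5 for a draw, 0 for a loss."""
    delta = k * (score_a - elo_expected(r_a, r_b))
    # Zero-sum update: what A gains, B loses.
    return r_a + delta, r_b - delta
```

For two equally rated players at 1500, a win moves the winner to 1516 and the loser to 1484 under these assumed constants.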
  • 45. Glicko (1993)  Mark E. Glickman, Department of Health Policy and Management , Boston University  Proposed the Glicko Rating System  Addresses rating reliability issue through the use of rating periods and player-specific variances  Elo is a special case of Glicko  Iterative approach for approximating the marginal posterior distribution of a player‟s skill conditional on other players‟ priors  More computationally tractable for large datasets 45
  • 46. TrueSkill (2006)  Ralf Herbrich and Thore Graepel at Microsoft Research, Cambridge, UK  Used for automated ranking/matchmaking on Xbox Live  Uses factor graphs to model multi-player, multi-team environments  Converges quickly (~5 games for a player)  Large-scale deployment  30 million Xbox Live members  150+ games use TrueSkill for ranking & matchmaking 46
  • 47. Issues with current approaches
      [Figure: players 1, 2, 3, 4 summed into team 1234]
      • Basic idea
          • Given: skill ratings of each team member, i.e., s_i ~ N(μ_i, σ_i²)
          • Sum across all team members
      • Not intuitive: team chemistry is a well-known concept in team-based competition [Martens 1987, Yukelson 1997] and it is not captured in any of these models
          • Can think of it as the overall dynamics of a team resulting from leadership, confidence, relationships, and mutual trust
      • Independence assumption not realistic in teams, especially at high levels of play
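The "sum across all team members" idea can be sketched as follows. This is a simplified illustration assuming independent Gaussian skills; it leaves out the per-game performance noise that full systems such as TrueSkill add, and the function names are hypothetical.

```python
import math


def team_skill(players):
    """players: list of (mu, sigma) pairs; returns (team_mu, team_var).

    Under the independence assumption, means and variances simply add.
    """
    team_mu = sum(mu for mu, _ in players)
    team_var = sum(sigma ** 2 for _, sigma in players)
    return team_mu, team_var


def win_probability(team_a, team_b):
    """P(team A's summed skill exceeds team B's), via the normal CDF."""
    mu_a, var_a = team_skill(team_a)
    mu_b, var_b = team_skill(team_b)
    z = (mu_a - mu_b) / math.sqrt(var_a + var_b)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

Nothing in this computation depends on which players are on the team together, which is exactly the limitation the slide points out.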
  • 48. More information is available, however…
      [Figure: subgroup lattice for a 4-player team. Rows: 1, 2, 3, 4 / 12, 13, 14, 23, 24, 34 / 123, 124, 134, 234 / 1234]
      • Observation: we have more information than just the history of individual players. We also know the histories of groups of players
          • Player 1's history ∩ player 2's history → history of {1, 2}
      • Existing approaches only make use of the top row
      • Idea: estimate the skills of subgroups of players on a team, combine them in some way, and use the result to produce a better estimate of a team's skill
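The subgroup lattice for a team can be enumerated directly. The sketch below is illustrative only; the function name and the use of `frozenset` to represent unordered groups are assumptions.

```python
from itertools import combinations


def all_subgroups(team):
    """Map each group size k to the list of size-k subgroups of the team."""
    members = sorted(team)
    return {
        k: [frozenset(c) for c in combinations(members, k)]
        for k in range(1, len(members) + 1)
    }
```

For a 4-player team this yields 4, 6, 4, and 1 subgroups for k = 1 to 4, i.e. 15 groups in total, matching the rows of the lattice.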
  • 49. Reframing the assessment problem
      [Figure: subgroup lattice by group size. k=1: 1, 2, 3, 4 / k=2: 12, 13, 14, 23, 24, 34 / k=3: 123, 124, 134, 234 / k=4: 1234]
      • Alterations to existing approaches necessary (i.e., Elo/Glicko/TrueSkill)
          • Player-level skill representation → generalized subgroup skill representation
          • For each game and for each k <= size of the team (K), treat each group as you would an individual player and update skill accordingly
          • Hashing of skill variable matrix according to unordered subgroup membership
      • Treat Elo/Glicko/TrueSkill as 'base learners', à la boosting
          • Each rating says something about the skill of that particular subgroup
          • …but how do we aggregate these ratings to estimate the skill of a team?
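A rough sketch of the "treat each group as an individual player" step, using an Elo-style base learner and a dictionary keyed by `frozenset` as the unordered-membership hash. How each subgroup is paired against the opposing team is not stated on the slide; updating each size-k subgroup against the mean rating of the opponent's size-k subgroups is purely an assumption made here for illustration, as are the default rating of 1500 and the K-factor.

```python
from itertools import combinations

DEFAULT_RATING = 1500.0  # assumed starting rating for unseen subgroups


def subgroups(team, k):
    """All unordered size-k subgroups of a team, usable as dictionary keys."""
    return [frozenset(c) for c in combinations(sorted(team), k)]


def expected(r_a, r_b):
    """Elo-style expected score (logistic form, assumed base learner)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_game(ratings, winners, losers, k_factor=32.0):
    """Update every subgroup rating of both teams after one game."""
    for k in range(1, min(len(winners), len(losers)) + 1):
        win_groups = subgroups(winners, k)
        lose_groups = subgroups(losers, k)
        # Assumption: opponents are summarised by their mean size-k rating.
        win_avg = sum(ratings.get(g, DEFAULT_RATING) for g in win_groups) / len(win_groups)
        lose_avg = sum(ratings.get(g, DEFAULT_RATING) for g in lose_groups) / len(lose_groups)
        for g in win_groups:
            r = ratings.get(g, DEFAULT_RATING)
            ratings[g] = r + k_factor * (1.0 - expected(r, lose_avg))
        for g in lose_groups:
            r = ratings.get(g, DEFAULT_RATING)
            ratings[g] = r + k_factor * (0.0 - expected(r, win_avg))
    return ratings
```

Because keys are frozensets, `{"a", "b"}` and `{"b", "a"}` refer to the same rating entry regardless of player order.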
  • 50. TeamSkill  Four different aggregation approaches:  TeamSkill-K  TeamSkill-AllK  TeamSkill-AllK-EV  TeamSkill-AllK-LS 50
  • 51. Aggregation issues
      [Figure: timeline of subgroup histories for a 4-player team; black = history available, red = no history available]
          • t = 1: after 1 game
          • t = 100: new player (5) who has never played with 1, 2, or 3
          • t = 200: new player (6) who has played with 3 or 5 (not both)
      • Real-world case: assume that players can leave/join the 4-player team and look at the timeline
      • The question at each point in time, t: how best to combine the available group ratings to produce a team rating?
      • The problem: the feature space is expanding and contracting over time
  • 52. Data set overview  Collected over the course of 2009  7,590 games (2,076 from tournaments and 5,514 from Xbox Live scrimmages)  448 players on 140 different professional and semi- professional teams  Games took place during January 2008 through January 2010  Websites pertaining to this data  http://stats.halofit.org - Player/team statistics  http://halofit.org – Datasets and related information “Friend or foe” social network for all tournaments in 2008 and 2009 52
  • 53. Evaluation  The TeamSkill approaches were evaluated by predicting the outcomes of games occurring prior to 10 MLG tournaments and comparing their accuracy to unaltered versions (k = 1) of their base learner rating systems - Elo, Glicko, and TrueSkill (for TeamSkill-K, all possible choices of k for teams of 4, 1 ≤ k ≤ 4, were used)  For each tournament, we evaluated each rating approach using:  3 types of training data sets - games consisting only of previous tournament data, games from online scrimmages only, and games of both types.  3 periods of game history - all data except for the data between the test tournament and the one preceding it (“long”), all data between the test tournament and the one preceding it (“recent”), and all data before the test tournament (“complete”).  We will only show “complete” as results for “long” and “recent” mirrored those for “complete”  2 types of games - the full dataset and those games considered “close” (i.e., prior probability of one team winning close to 50%). 53
  • 54. Results – all games Prediction accuracy for both tournament and scrimmage/custom games using complete history Prediction accuracy for tournament games using complete history Prediction accuracy for scrimmage/custom games using complete history 54
  • 55. Results – “close” games Prediction accuracy for both tournament and scrimmage/custom games using complete history Prediction accuracy for tournament games using complete history Prediction accuracy for scrimmage/custom games using complete history 55
  • 56. Conclusions  Modified several existing assessment approaches to operate on generalized player subgroup entities (instead of just individual players)  Introduced four aggregation methods for combining player subgroup information to produce final forecast  Shown evidence that close games are decided on the basis of team chemistry, consistent with sports psychology research 56
  • 57. Part III: Impact on Business
  • 59. Time Equals Stickiness
      [Figure: ratio of quitters to stayers (y-axis, 0.00 to 6.00) by character level (x-axis, 1 to 69)]
      • There are fewer quitters as the levels go up, so the focus should be on the first 20 levels.
  • 60. Solo vs. Social Players
      [Figure: ratio of quitters to stayers (y-axis, 0.00 to 6.00) by character level (x-axis, 1 to 69), plotted separately for Solo and Social players]
      • Isolated players are 3.5x more likely to quit (B = 1.26, p < .001). Focus design on facilitating social interaction.
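The 3.5x figure is consistent with reading B as a logistic-regression coefficient (a log-odds value), which is an assumption here since the slide does not name the model. Under that reading the odds ratio is exp(B):

```python
import math


def odds_ratio(b: float) -> float:
    """Convert a log-odds coefficient into an odds ratio."""
    return math.exp(b)


# exp(1.26) is roughly 3.5, matching the "3.5x more likely to quit" claim.
solo_vs_social = odds_ratio(1.26)
```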
  • 61. Problem Statement & Approach  Objective: At time t, for player p, given a window w, compute the probability, P(p,t,w), of p churning within the interval (t,t+w), and the confidence in this probability.  Approach: Estimate P(p,t,w) and its confidence using  Data about p‟s activity and socialization behavior  Statistics and machine learning (linear, non-linear, ensemble, etc.)  Use of socio-psychological theories of player motivation  Novel synthesis of data driven (DD) and theory driven (TD) approaches 61
  • 62. Player Motivation Theories Richard Bartle, 1996 Nick Yee, 2005 62
  • 63. Mapping MMO Behaviors to Theory  Example MMORPG  Persistent virtual world  Players have avatars  Quest-driven  Participation in groups, guilds  Dataset (game logs)  Time period: FEB-JUN, 2006  No. of accounts: ~16000  Churner definition: 2 months of inactivity 63
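The churner definition above can be turned into a labeling rule. The sketch below is a hypothetical illustration: it approximates "2 months" as 60 days and treats an account with no recorded activity as churned, both of which are assumptions rather than details from the study.

```python
from datetime import date, timedelta


def is_churner(activity_dates, observation_end, inactivity_days=60):
    """Label an account as churned if it was inactive for the final
    `inactivity_days` of the observation period."""
    if not activity_dates:
        return True  # assumption: no recorded activity counts as churn
    last_seen = max(activity_dates)
    return (observation_end - last_seen) >= timedelta(days=inactivity_days)
```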
  • 64. Synthesis of TD and DD approach 64
  • 65. Model Evaluation (Confusion Matrix)
      • Model performance for different features (10-fold cross-validation)
      Features | Tree size (nodes) | Precision (%) | Recall (%) | F-measure (%)
      Pure DD | 485 | 69.3 | 84 | 76
      TD: Achievement+Social | 59 | 67.1 | 76.2 | 71.3
      TD: Achievement | 37 | 67.2 | 73.2 | 70.1
      TD: Socialization | 7 | 64.3 | 55.4 | 59.5
      • Observations
          • Decision tree learning works quite well
          • DD model very accurate (F-measure 76%); difficult to interpret (485 nodes)
          • TD model interpretable (7–59 nodes), but less accurate
          • Achievement-orientation more important than socialization
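The F-measure column follows from precision and recall as their harmonic mean. A small check, with a hypothetical helper name:

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rows from the table above: (precision %, recall %).
rows = {
    "Pure DD": (69.3, 84.0),
    "TD: Achievement+Social": (67.1, 76.2),
    "TD: Achievement": (67.2, 73.2),
    "TD: Socialization": (64.3, 55.4),
}
scores = {name: f_measure(p, r) for name, (p, r) in rows.items()}
```

The computed values agree with the reported F-measures to within rounding.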
  • 66. Model Evaluation (Lift Chart)  Observations  TD model does better in predicting top quintile of churners  DD model performs better in the 40%-70% range 66
  • 67. Ensemble Approach – Model Evaluation
      • Ensemble approach
          • A 'committee of classifiers' votes on the result
          • Heterogeneity helps
      No. of clusters | Precision | Recall | F-measure
      5 | 66.23 | 91.94 | 76.99
      10 | 66.49 | 91.08 | 76.87
      15 | 67.58 | 89.80 | 77.12
      20 | 67.6 | 90.13 | 77.25
      • Improvements over single model
          • Best recall value shows a 7.94 percentage-point improvement (91.94 vs. 84), i.e., the model can identify a larger proportion of potential churners
          • Best F-measure shows a 1.25 percentage-point improvement (77.25 vs. 76)
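The committee idea can be illustrated with a plain majority vote over per-classifier predictions. This is a generic sketch; the slide does not specify the voting rule or how the cluster-specific classifiers are built, so the function below is an assumption.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine predictions from several classifiers.

    predictions: list of label lists, one per classifier, all the same length.
    Returns one label per example, chosen by simple majority.
    """
    combined = []
    for votes in zip(*predictions):
        label, _ = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined
```

With an odd number of binary classifiers there are no ties; for other cases a tie-breaking rule would have to be chosen explicitly.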
  • 68. Conclusions  Comparison of theory-driven and data-driven model in terms of prediction accuracy and model interpretability  Achievement-orientation is more important than socialization-orientation in identifying potential churners  Ensemble model can identify larger proportion of potential churners as compared to single global model 68
  • 69. Course Outline • Module 1 • Introduction to Social Analytics – applying data mining to social computing systems; examples of a number of social computing systems, e.g. FaceBook, MMO games, etc. • Module 2 • Computational online trust • Identifying key influencers • Module 3 • Analysis of clandestine networks • Katana - game analytics engine 4/9/2013 University of Minnesota 69
  • 70. Computational Trust in Multiplayer Online Games
  • 71. “If you want to go fast walk alone, if you want to go far then walk with a group.” - Proverb from Ghana
  • 72. Virtual Worlds & Massive Online Games
      • Massively Multiplayer Online Role Playing Games (MMORPGs/MMOs)
      • Simulated environments like SecondLife
      • Millions of people can interact with one another in a shared virtual environment
      • People can engage in a large number of activities with one another and with the environment
      • Many of the observed behaviors have offline analogs
  • 73. Big Picture Questions  How is trust expressed differently in different social contexts?  Cooperative (PvE), Adversarial (PvP), …  How is trust expressed in different types of social networks?  Housing, Mentoring, Trade, Group, …  What are the characteristics of trust and related networks in MMOs?  Similarities and differences with social networks in other domains e.g., citation networks, co-authorship networks  What role can features derived from the trust network play in prediction tasks e.g., link prediction (formation, breakage, change), trust propensity, success prediction 73
  • 74. Computational Trust in Multiplayer Online Games Department of Computer Science, University of Minnesota Muhammad Aurangzeb Ahmad Trust as a Multi-Level Network Phenomenon Trust as a multi-modal multi-level Network formulations of Traditional Incorporating social science theories Generative Network Models for Trust based network phenomenon Trust related concepts in trust related prediction task social interactions
  • 75. Computational Trust in Multiplayer Online Games Department of Computer Science, University of Minnesota Muhammad Aurangzeb Ahmad Trust as a Multi-Level Network Phenomenon Trust as a multi-modal multi-level Network formulations of Traditional Incorporating social science theories Generative Network Models for Trust based network phenomenon Trust related concepts in trust related prediction task social interactions Social Characteristics of Trust Effect of Social Environments on Trust Trust and Homophily Trust and Clandestine Behavior Trust and Mentoring Trust and Trade Most types of homophily do not Different Social environments result in carry over to the MMO domain No Honor Amongst Thieves differences in network signatures Trust based and other social networks in MMOs exhibit anomalous network characteristics
  • 76. Computational Trust in Multiplayer Online Games Department of Computer Science, University of Minnesota Muhammad Aurangzeb Ahmad Trust as a Multi-Level Network Phenomenon Trust as a multi-modal multi-level Network formulations of Traditional Incorporating social science theories Generative Network Models for Trust based network phenomenon Trust related concepts in trust related prediction task social interactions Social Characteristics of Trust Effect of Social Environments on Trust Trust and Homophily Trust and Clandestine Behavior Trust and Mentoring Trust and Trade Most types of homophily do not Different Social environments result in carry over to the MMO domain No Honor Amongst Thieves differences in network signatures Trust based and other social networks in MMOs exhibit anomalous network characteristics Trust and Prediction Trust Prediction Family of Problems Item Recommendation Success Prediction (Social Capital) Predict Formation, Change, Breakage of Trust with in the same network and across social networks. Predict trust propensity. Effects of social environments on trust Effect of network structure on trust
  • 78. Computational Trust in Multiplayer Online Games Department of Computer Science, University of Minnesota Muhammad Aurangzeb Ahmad Computational Social Science Semantics Structure Trust and Homophily Characteristics of Trust Networks Most types of homophily do not Trust based and other social networks in MMOs carry over to the MMO domain exhibit anomalous network characteristics Algorithmic Trust Prediction Family of Problems Predict Formation, Change, Breakage of Trust with in the same network and across social networks. Predict trust propensity. Applications Detection of Clandestine Actors No Honor Amongst Thieves Gold Farmer Detection
  • 80. Housing-Trust in EQ2
      • Access permissions to in-game house as trust relationships
          • None: cannot enter house
          • Visitor: can enter the house and can interact with objects in the house
          • Friend: Visitor + move items
          • Trustee: Friend + remove items
      • Houses can also contain items which allow sales to other characters without exchanging on the market
  • 81. Homophily and Trust
      • Homophily: birds of a feather flock together
      • There is no single form of homophily; homophily in general is described in multiple ways: Status vs. Value Homophily
      • Each of these forms of homophily is in turn defined in multiple ways
      • Previous literature instantiates homophily in MMOs in terms of player characteristics and behavior in the game
  • 82. Network Models, Homophily and Trust Networks in MMOs  RQ1: Does homophily in MMOs operate in ways similar to homophily in the offline world?  RQ2: How do we map characteristics that define homophily in the offline world to online settings? 82
  • 83. Mapping Homophily in MMOs  In general, studies of homophily in MMOs assume only one type of homophily and generalize based on that type  Even in the offline world homophily is of different types  Hence the necessity of Mapping Homophily which we address here  Mapping and Proteus Effect 83
  • 84. Trust and Homophily in MMOs
      ID | Homophily type | Hypothesis | Observation
      H1 | Gender Homophily | Players trust other players who are of the same gender | ?
      H2 | Age Homophily | Players trust other players who are of the same age cohorts | ?
      H3 | Class Homophily | Players trust other players who are of the same class | ?
      H4 | Race Homophily | Players trust other players who are of the same race | ?
      H5 | Guild Homophily | Players trust other players who belong to the same guild | ?
      H6 | Level Homophily | Players trust other players who are at a similar level | ?
      H7 | Challenge Homophily | Players trust other players who like similar types of challenges | ?
  • 85. Key Observations
      • H1: Players trust other players who are of the same gender? In general, players trust other players who are of the same gender
      • H2: Players trust other players who are of similar age? The stronger the type of trust, the smaller the age difference between the people specifying trust
      • H3: Players trust other players who are of the same class? Class does not seem to affect the choice of trusting others
  • 86. Key Observations
      • H4: Players trust other players who are of the same race? Race does not seem to affect the choice of trusting others
      • H5: Players trust other players who are of the same guild? In general, the stronger the type of trust, the greater the percentage of people who trust people in their own guilds
      • H6: Players trust other players more who level at a similar rate? Leveling at the same rate does not seem to greatly affect trust amongst players
  • 87. Key Observations
      • H7: Players trust other players who are of the same level?
          • Level difference seems to have some effect on trust
          • For the Trustee (strongest) and the Visitor (weakest) forms of trust, lower-level players are more likely to trust players who are at a higher level
      • Summary:
          • Homophily is observed in MMOs for only a subset of the types for which it is observed in the offline world
          • The types of homophily which are not observed in MMOs are the ones which are greatly affected by game mechanics
  • 88. Computational Trust in Multiplayer Online Games Department of Computer Science, University of Minnesota Muhammad Aurangzeb Ahmad Computational Social Science Semantics Structure Trust and Homophily Characteristics of Trust Networks Most types of homophily do not Trust based and other social networks in MMOs carry over to the MMO domain exhibit anomalous network characteristics Algorithmic Trust Prediction Family of Problems Predict Formation, Change, Breakage of Trust with in the same network and across social networks. Predict trust propensity. Applications Detection of Clandestine Actors No Honor Amongst Thieves Gold Farmer Detection
  • 89. Social Networks: General Observations
      • There is an extensive literature on characteristics of social networks (Leskovec PAKDD 2005, Leskovec ICDM 2005, McGlohon ICDM 2008, McGlohon KDD 2008)
      • The network exhibits monotonically shrinking diameter over time (Leskovec PAKDD 2005, Leskovec ICDM 2005, McGlohon ICDM 2008)
      • At a certain point in time, called the gelling point, many smaller connected components join together and become part of the largest connected component (Leskovec PAKDD 2005, Leskovec ICDM 2005, McGlohon ICDM 2008)
      • The largest connected component (LCC) comprises the majority of the nodes in the network (>= 80%) (McGlohon ICDM 2008, McGlohon KDD 2008)
  • 90. Social Networks: General Observations  The size of the second and the third largest connected components remain constant (more or less) even though the identity of these components change over time (Leskovec PAKDD 2005, Leskovec ICDM 2005, McGlohon ICDM 2008, McGlohon KDD 2008)  Network isolates are few in number (<5%) (Leskovec PAKDD 2005, Leskovec ICDM 2005)  The number of connected components decreases over time (Leskovec PAKDD 2005, Leskovec ICDM 2005, McGlohon ICDM 2008)  Relatively fast growth of LCC close to the gelling point (Leskovec PAKDD 2005, Leskovec ICDM 2005) 90
  • 91. Trust Networks in MMOs
      • Data from 4 servers is available. Results from one server (Player vs. Environment, 'guk') are shown
      • The network consists of 15,237 nodes, 30,686 edges and 1,476 connected components
      • Dataset spans from January 2006 to August 2006
      • Average node degree of 4.03. The sizes of the three largest connected components are 9039, 51 and 49. The largest connected component accounts for 59% of all the nodes in the network
      [Figure: the trust network on 'guk' on August 31, 2006]
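The summary statistics on this slide can be reproduced from node and edge counts, and component sizes can be computed with a small union-find. The sketch below is illustrative; the function names are assumptions and the toy edge list in the usage note is not from the dataset.

```python
def average_degree(n_nodes: int, n_edges: int) -> float:
    """Each undirected edge contributes to the degree of two nodes."""
    return 2 * n_edges / n_nodes


def component_sizes(nodes, edges):
    """Sizes of connected components, largest first, via union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)

    sizes = {}
    for n in nodes:
        root = find(n)
        sizes[root] = sizes.get(root, 0) + 1
    return sorted(sizes.values(), reverse=True)
```

With the reported counts, `average_degree(15237, 30686)` rounds to 4.03 and 9039 / 15237 is about 59%, matching the slide.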
  • 92. Key Observations  Observation 1: Preferential Attachment: The rich get richer but not too rich Explanation: Social bandwidth is limited, Dunbar Number  Observation 2: The growth of the LCC is retarded after the gelling point Explanation: The trust network has a relatively low growth rate as compared to the other networks  Observation 3: Non-monotonic change in the diameter of the largest connected component Explanation: Players have different levels of activity at various points in time and can also “drop out” of the network if they churn from the game 92
  • 93. Key Observations  Observation 4: A large number of isolate components are observed (> 1000) Explanation: People join in groups and spend all the time playing with one another instead of interacting with people from the outside  Observation 5: The number of isolate components increases monotonically over time Explanation: (Same as observation 4)  Observation 6: Nodes in the non-LCC constitute a significant portion of the network (41%, 8 months after gelling point) Explanation: (Same as observation 4) 93
  • 94. Generative Models of Trust Networks  Time bound Preferential Attachment: The rich get rich but not so much after a certain point in time. Edge formation is bound by time  Presence of Auxiliary Components: Isolate components are added to the network at an almost constant rate over time 94
  • 95. Generative Models of Trust Networks (ii)  Non-Monotonic Decrease in the Diameter: Nodes become inert after a certain point in time. Sample the lifetime of nodes from a normal distribution  Homophily in Edge formation: Probability of edge formation dependent upon node degree as well as agreement (similarity) in node characteristics 95
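The ingredients listed on these two slides can be combined into a toy growth process. This is a loose sketch, not the authors' model: the parameter values, the rule that isolated components arrive as node pairs, and the omission of the homophily term are all assumptions made to keep the example short.

```python
import random


def grow_trust_network(steps, p_component=0.2, mean_life=30.0, sd_life=10.0, seed=0):
    """Grow a network with time-bound preferential attachment.

    Each step adds one node. With probability p_component it arrives with a
    partner as a new isolated component; otherwise it attaches to an existing
    node that is still within its lifetime, chosen proportionally to degree.
    """
    rng = random.Random(seed)
    birth, life, degree, edges = {}, {}, {}, []

    def add_node(t):
        node = len(birth)
        birth[node] = t
        # Node lifetime sampled from a normal distribution, floored at 1 step.
        life[node] = max(1.0, rng.gauss(mean_life, sd_life))
        degree[node] = 0
        return node

    def connect(u, v):
        edges.append((u, v))
        degree[u] += 1
        degree[v] += 1

    connect(add_node(0), add_node(0))  # seed edge
    for t in range(1, steps + 1):
        new = add_node(t)
        if rng.random() < p_component:
            connect(new, add_node(t))  # auxiliary (isolated) component
        else:
            # Only nodes that have not yet become inert can receive edges.
            active = [n for n in birth if n != new and t - birth[n] <= life[n]]
            weights = [degree[n] for n in active]
            connect(new, rng.choices(active, weights=weights, k=1)[0])
    return edges, degree
```

Setting `p_component=1.0` produces only isolated pairs, while `p_component=0.0` yields a single component whose hubs stop growing once they exceed their lifetime.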
  • 96. Results Diameter: Non-Monotonically Changing Diameter % LCC as being relatively small 96
  • 97. Results Number of Connected Components Network growth and the gelling point 97
  • 98. Conclusion: Trust Networks  Trust Networks in MMOs exhibit many properties which are not exhibited by other social networks in most other domains  Proposed a model based on observations and domain knowledge  Models of social networks should incorporate the peculiarities which are observed in MMOs in general  Generalization? Similar observations have been made for mentoring networks but not for PvP, Trade and Chat Networks 98
  • 99. Computational Trust in Multiplayer Online Games Department of Computer Science, University of Minnesota Muhammad Aurangzeb Ahmad Computational Social Science Semantics Structure Trust and Homophily Characteristics of Trust Networks Most types of homophily do not Trust based and other social networks in MMOs carry over to the MMO domain exhibit anomalous network characteristics Algorithmic Trust Prediction Family of Problems Predict Formation, Change, Breakage of Trust with in the same network and across social networks. Predict trust propensity. Applications Detection of Clandestine Actors No Honor Amongst Thieves Gold Farmer Detection
  • 100. Trust Prediction Family of Problems Trust Prediction: Given a trust network G predict which nodes are going to trust one another in the future Prediction Across Networks: Given a set of actors who participate in multiple types of interactions using features from one network predict the existence of links in the other network 100
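As a minimal illustration of prediction across networks, one simple baseline scores each unlinked pair by the number of neighbours they share in a source network and uses that score as a feature for predicting a link in a target network. The slide does not list the features actually used; common neighbours is a standard link-prediction baseline chosen here as an assumption.

```python
from itertools import combinations


def adjacency(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbor_scores(source_edges):
    """Score every unlinked pair by its number of shared neighbours."""
    adj = adjacency(source_edges)
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # already linked in the source network
        shared = len(adj[u] & adj[v])
        if shared:
            scores[(u, v)] = shared
    return scores
```

For example, scores computed on a mentoring network could be fed to a classifier that predicts housing-trust links between the same players.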
  • 101. Trust Prediction Family of Problems
      • Machine learning/classification approach to the problem of trust prediction
      • Introduced a set of new problems (inter-network link prediction, trust propensity prediction)
      • Proposed an algorithm for link prediction which used domain knowledge from social science theories
      • New contributions:
          • Does the social context (adversarial vs. cooperative) affect prediction results?
          • If so, what is the implication for generalization?
  • 102. Prediction Task
      • Trust prediction as a classification problem
      • 60,000 examples for each prediction task
      • 10-fold cross-validation
      • Data from Guk (cooperative) and Nagafen (adversarial) servers
      • Six standard classifiers for comparison: J48, JRip, AdaBoost, Bayes Network, Naive Bayes and k-nearest neighbor
      [Figure: a positive example and a negative example, each shown across a training period and a test period]
  • 103. Prediction: Group Network  In general good prediction results are obtained for the combat network (in either direction), except one case  The grouping network is a union of all grouping instances and thus it is extremely dense (1,796,438 edges, 31,900 nodes) 103
  • 104. Prediction: Group Network
      • Grouping is not a good predictor of mentoring, but the reverse holds: mentoring is a good predictor of grouping
      • Mentoring is, more often than not, accompanied by grouping, but grouping can occur in a variety of contexts, e.g., raids, quests, dungeon instances, …
  • 105. Prediction: Combat Network  In general good prediction results are obtained for the combat network (in either direction)  In general if players have played against another person then they have friends in other networks who have done the same or something similar 105
  • 106. Comparison across Social Environments  T: Trust; M: Mentoring; B: Trade  In general the results are similar for both the environments with some exceptions  Mentoring-Mentoring: In the adversarial environment a more cliqueish behavior is observed i.e., friends of friends are likely to mentor one another in the future which is not the case for the cooperative environment 106
  • 107. Comparison across Social Environments  Mentoring-Related Tasks: In general mentoring is much more prevalent in the adversarial environment as compared to the cooperative environment (~3 times) and is much more intense. Overlap between mentoring and trade is thus more likely 107
  • 108. Computational Trust in Multiplayer Online Games Department of Computer Science, University of Minnesota Muhammad Aurangzeb Ahmad Computational Social Science Semantics Structure Trust and Homophily Characteristics of Trust Networks Most types of homophily do not Trust based and other social networks in MMOs carry over to the MMO domain exhibit anomalous network characteristics Algorithmic Trust Prediction Family of Problems Predict Formation, Change, Breakage of Trust with in the same network and across social networks. Predict trust propensity. Applications Detection of Clandestine Actors No Honor Amongst Thieves Gold Farmer Detection
  • 109. Trust Amongst Clandestine Actors
      “A plague upon't when thieves cannot be true one to another!” – Sir Falstaff, Henry IV, Part 1, II.ii
      Do gold farmers trust each other?
  • 110. Hypergraphs to Represent Tripartite Graphs
      • Accounts can have several characters
      • Houses can be accessed by several characters
      • Projecting to one- or two-mode data obscures crucial information about embeddedness and paths
      • Figure 2a: Can ca31 access the same house as ca11?
      • Figure 2b: Are characters all owned by same account?
  • 111. Hypergraphs: Key Concepts
      • Hyperedge: an edge between three or more nodes in a graph. We use three types of nodes: character, account and house
      • Node degree: the number of hyperedges which are connected to a node, e.g., ND(h1) = 3
      • Edge degree: the number of hyperedges that an edge participates in, e.g., ED(a1-h1) = 2
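Both degree definitions reduce to counting hyperedges. In the sketch below each hyperedge is a (character, account, house) tuple; the three hyperedges are a toy example constructed so that the counts match the values quoted on the slide, and are not taken from the original figure.

```python
def node_degree(hyperedges, node):
    """Number of hyperedges that contain the node."""
    return sum(1 for h in hyperedges if node in h)


def edge_degree(hyperedges, u, v):
    """Number of hyperedges that contain both endpoints of the edge (u, v)."""
    return sum(1 for h in hyperedges if u in h and v in h)


# Hypothetical data: two characters of account a1 and one of account a2
# all have access to house h1.
hyperedges = [
    ("c11", "a1", "h1"),
    ("c12", "a1", "h1"),
    ("c21", "a2", "h1"),
]
```

Here `node_degree(hyperedges, "h1")` is 3 and `edge_degree(hyperedges, "a1", "h1")` is 2.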
  • 112. Network Characteristics • Long tail distributions are observed for the various degree distributions • The mapping from character-house to an account is always unique • Players who are connected to a large number of houses are highly active players otherwise as well 112
  • 113. Characteristics of Hypergraph Projection Networks • Account Projection: Majority of the gold farmer nodes are isolates (79%). Affiliates well-connected (8.89) vs. non-affiliates (3.47) • Character Projection: Majority of the gold farmer nodes are isolates (84%). Affiliates well-connected (10.42) vs. non-affiliates (3.23) • House Projection: 521 gold farmer houses. Most are isolates (not shown) but others are part of complex structures. Densely connected network with gold farmers (7.56) and affiliates (84.02) The Housing Projection Network 113
  • 114. Key Observations
      • Gold farmers grant trust ties less frequently than either affiliates or general players
      • Gold farmers grant and receive fewer housing permissions (1.82) than their affiliates (4.03) or the general player population (2.73)
      Group | Total <n> | Total <nGF> | Total <nAff> | In <n> | In <nGF> | In <nAff> | Out <n> | Out <nGF> | Out <nAff>
      Farmers | 1.82 | 0.29 | 1.82 | 0.89 | 0.29 | 0.89 | 1.07 | 0.29 | 1.07
      Affiliates | 4.03 | 1.28 | 0.70 | 1.55 | 0.75 | 0.70 | 2.88 | 0.63 | 0.70
      Non-Affiliates | 2.73 | - | 7.77 | 1.57 | - | 5.98 | 1.56 | - | 2.34
  • 115. Key Observations
      • No honor among thieves
          • Gold farmers also have a very low tendency to grant other gold farmers permission (0.29)
          • Affiliates are also unlikely to trust other affiliates (0.70)
      Group | Total <n> | Total <nGF> | Total <nAff> | In <n> | In <nGF> | In <nAff> | Out <n> | Out <nGF> | Out <nAff>
      Farmers | 1.82 | 0.29 | 1.82 | 0.89 | 0.29 | 0.89 | 1.07 | 0.29 | 1.07
      Affiliates | 4.03 | 1.28 | 0.70 | 1.55 | 0.75 | 0.70 | 2.88 | 0.63 | 0.70
      Non-Affiliates | 2.73 | - | 7.77 | 1.57 | - | 5.98 | 1.56 | - | 2.34
  • 116. Key Observations
      • Affiliates are brokers:
          • Farmers trust affiliates more (1.82) than other farmers (0.29)
          • Affiliates trust farmers more (1.28) than other affiliates (0.70)
          • Non-affiliates have a greater tendency to grant permissions to affiliates (7.77) than in general (2.73)
      Group | Total <n> | Total <nGF> | Total <nAff> | In <n> | In <nGF> | In <nAff> | Out <n> | Out <nGF> | Out <nAff>
      Farmers | 1.82 | 0.29 | 1.82 | 0.89 | 0.29 | 0.89 | 1.07 | 0.29 | 1.07
      Affiliates | 4.03 | 1.28 | 0.70 | 1.55 | 0.75 | 0.70 | 2.88 | 0.63 | 0.70
      Non-Affiliates | 2.73 | - | 7.77 | 1.57 | - | 5.98 | 1.56 | - | 2.34
  • 117. Frequent Itemset Mining for Frequent Hyper-subgraphs Support of a Hyper-subgraph: Given a sub-hypergraph of size k, subP is the pattern of interest containing the label P, shP is a pattern of the same size as subP and contains the label P, the support is defined as follows: Support of pattern also containing a gold farmer (red) = 5/8 117
  • 118. Frequent Itemset Mining for Frequent Hyper-subgraphs Confidence of a Hyper-Subgraph: Given a sub-hypergraph of size k, subP is the pattern of interest containing the label P, subG is a pattern which is structurally equivalent but which does not contain the label P, the confidence is defined as follows: Confidence of pattern and containing a gold farmer = 5/7 118
  • 119. Frequent Patterns of GFs • Very low (s ≤ 0.1) support and confidence for almost all (except 8) frequent patterns with gold farmers • Remaining 8 patterns can be used for discrimination between gold farmers and non-gold farmers in a subset of the instances • Gold farmers & affiliates are more connected: A 3rd of more complex patterns are associated with affiliates (15/44) 119
  • 120. Application: Gold Farmer Detection (Behavioral) Classification using in-game behavioral and demographic features Each model corresponds to a grouping of different features 120
  • 121. Application: Gold Farmer Detection (Label Propagation) • Problem: Not all players labeled as normal are normal; some are gold farmers who have not been identified as such • Analogue: Someone who socializes almost exclusively with criminals may be a criminal
    Classifier            Metric      Initial Dataset   Label Prop   Change in Performance
    Bayes Net             Precision   0.17              0.189         0.019
                          Recall      0.834             0.819        -0.015
                          F-Score     0.282             0.307         0.025
    J48                   Precision   0.494             0.62          0.126
                          Recall      0.189             0.337         0.148
                          F-Score     0.273             0.437         0.164
    JRip                  Precision   0.495             0.537         0.042
                          Recall      0.462             0.462         0
                          F-Score     0.478             0.497         0.019
    KNN                   Precision   0.436             0.46          0.024
                          Recall      0.396             0.428         0.032
                          F-Score     0.415             0.443         0.028
    Logistic Regression   Precision   0.455             0.534         0.079
                          Recall      0.189             0.271         0.082
                          F-Score     0.267             0.36          0.093
    Naïve Bayes           Precision   0.146             0.142        -0.004
                          Recall      0.538             0.502        -0.036
                          F-Score     0.23              0.221        -0.009
    AdaBoost w/ DT        Precision   0.405             0.471         0.066
                          Recall      0.105             0.08         -0.025
                          F-Score     0.167             0.137        -0.03
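The slide does not describe the propagation rule itself. A minimal sketch of the "socializes almost exclusively with criminals" idea, assuming a simple neighbor-fraction rule, is shown below. The function name, the 0.8 threshold, and the toy contact graph are hypothetical and are not the procedure behind the reported numbers.

```python
def propagate_labels(neighbors, farmers, threshold=0.8, max_rounds=10):
    """Relabel players whose contacts are mostly flagged gold farmers.

    neighbors : dict node -> set of nodes it interacts with
    farmers   : set of nodes initially labeled as gold farmers
    threshold : assumed fraction of flagged contacts needed to relabel
    """
    flagged = set(farmers)
    for _ in range(max_rounds):
        # Evaluate every unflagged node against the labels from the last round.
        newly = {
            node
            for node, contacts in neighbors.items()
            if node not in flagged
            and contacts
            and len(contacts & flagged) / len(contacts) >= threshold
        }
        if not newly:
            break
        flagged |= newly
    return flagged

# Toy contact graph: "a" only interacts with known farmers g1 and g2.
neighbors = {
    "g1": {"a"}, "g2": {"a"},
    "a": {"g1", "g2"},
    "b": {"a", "n1"},
    "n1": {"b"},
}
flagged = propagate_labels(neighbors, farmers={"g1", "g2"})
print(sorted(flagged))
```

The expanded label set would then replace the initial labels before retraining the classifiers, which is one way to read the "Label Prop" column.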
  • 122. Gold Farmer Detection • Model 1 (Player Attribute Based Features): These features are based on the attributes of the player's character in the game, e.g., character race, character gender, distribution of gaming activities, etc. • Model 2 (Item Based Features): These features are derived from items bought and sold in the consignment network. They are based on the frequency of the frequent items sold or bought by gold farmers. • Model 3 (Player Attribute & Item Based Features): All the attributes from the previous two models. • Model 4 (Item Network Based Features): Features derived from the item network in a manner analogous to Model 2. • Model 5 (Player Attribute & Item-Network Based Features): A combination of features from Model 1 and Model 4. • Model 6 (Item & Item-Network Based Features): A combination of features from Model 2 and Model 4. • Model 7 (Player Attribute, Item & Item-Network Based Features): Union of all the features described above.
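The seven models are just unions of three base feature groups, which can be written down directly. In the sketch below the individual feature names are hypothetical placeholders; only the grouping structure comes from the slide.

```python
# Base feature groups (feature names are illustrative placeholders).
PLAYER = {"race", "gender", "activity_distribution"}
ITEM = {"freq_items_sold", "freq_items_bought"}
ITEM_NETWORK = {"item_net_sold", "item_net_bought"}

# Model number -> feature set, following the slide's descriptions.
MODELS = {
    1: PLAYER,
    2: ITEM,
    3: PLAYER | ITEM,
    4: ITEM_NETWORK,
    5: PLAYER | ITEM_NETWORK,
    6: ITEM | ITEM_NETWORK,
    7: PLAYER | ITEM | ITEM_NETWORK,
}

for number, features in MODELS.items():
    print(number, sorted(features))
```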
  • 123. Application: Structural Signatures Approach Applied to Trade Networks • There is not enough data to use the structural signature approach to catch gold farmers • Alternative: apply the same approach to other networks, here the trade network • A set of standard machine learning models is used: Naive Bayes, Bayes Net, Logistic Regression, KNN, J48, JRip, AdaBoost and SMO
  • 124. Conclusion: Gold Farmer Behavior Analysis and Prediction • Representation-related issues addressed with respect to trust in MMOs • Application of frequent pattern mining to discover distinct trust patterns associated with gold farmers • No honor among thieves: gold farmers tend not to trust other gold farmers
  • 125. Computational Trust in Multiplayer Online Games • Muhammad Aurangzeb Ahmad, Department of Computer Science, University of Minnesota • Trust as a Multi-Level Network Phenomenon: Trust as a multi-modal, multi-level network phenomenon; Network formulations of traditional trust-related concepts; Incorporating social science theories in trust-related prediction tasks; Generative network models for trust-based social interactions • Social Characteristics of Trust and Effect of Social Environments on Trust: Trust and Homophily (most types of homophily do not carry over to the MMO domain); Trust and Clandestine Behavior (No Honor Amongst Thieves); Trust and Mentoring; Trust and Trade; Different social environments result in differences in network signatures; Trust-based and other social networks in MMOs exhibit anomalous network characteristics • Trust and Prediction: Trust Prediction Family of Problems (predict formation, change, and breakage of trust within the same network and across social networks; predict trust propensity); Item Recommendation; Success Prediction (Social Capital); Effects of social environments on trust; Effect of network structure on trust
  • 126. Computational Trust in Multiplayer Online Games • Muhammad Aurangzeb Ahmad, Department of Computer Science, University of Minnesota • Computational Social Science • Semantics: Trust and Homophily (most types of homophily do not carry over to the MMO domain) • Structure: Characteristics of Trust Networks (trust-based and other social networks in MMOs exhibit anomalous network characteristics) • Predictive Analysis: Trust Prediction Family of Problems (predict formation, change, and breakage of trust within the same network and across social networks; predict trust propensity) • Applications: Detection of Clandestine Actors (No Honor Amongst Thieves); Gold Farmer Detection
  • 127. Conclusion • Availability of data allows one to analyze phenomena that could not be studied in the past (trust in MMOs in the current case) • Explored a series of big-picture questions pertaining to trust in MMOs • Similar affordances and contexts (with respect to the offline world) lead to similar outcomes in the online world • Explored and expanded the scope of trust-related prediction tasks • Analysis on more datasets is required for generalizability
  • 128. Acknowledgement • DMR Lab: Professor Jaideep Srivastava and all DMR lab members, especially Zoheb Borbora, Amogh Mahapatra, Young Ae Kim, Nishith Pathak, Kyong Jin Shim and Nisheeth Srivastava • Virtual Worlds Observatory: Professor Noshir Contractor (Northwestern), Professor Scott Poole (UIUC), Professor Dmitri Williams (USC) • External collaborators at Northwestern, UIUC, USC, U Toronto, especially Brian Keegan • Funding agencies: NSF, AFRL, NSCTA, IARPA
  • 129. Publication Summary
    Publication Type      Total   Thesis Related   Comment
    Book                  1       1                Book on Clandestine Behaviors and Networks (Springer 2012)
    Conference Papers     14      5
    Book Chapters         1       -
    Journal Papers        2       -
    Workshop Papers       10      3                2 Best Paper Awards
    Short/Poster Papers   8       1
    Technical Reports     4       1
    Tutorials             4       1                27 Co-authors
    Patents               2       1
  • 130. A Computational Model for Social Influence
  • 131. Objective: Model social influence in a dynamic network setting as a spread of cascades, in order to identify key influential nodes
  • 132. Outline • Team Overview • Optimal Target Selection • Current approaches • Proposed Approach • Results • Update summary • Q&A
  • 133. Motivation • [Figure: global retweets of tweets coming from Japan for one hour after the earthquake; legend: Original, Retweet] • Courtesy: http://blog.twitter.com/2011/06/global-pulse.html
  • 134. Optimal Target People Selection • Goal: Find the optimal targets to maximize influence spread in the network • Why is it important? • Public polls: sentiment spread • Sales and marketing: word-of-mouth spread • Public health: disease spread • Influence: a force that attempts to change the opinion or behavior of an individual • Influence is causal • My friend buys a product -> I buy a product
  • 135. Current Approaches (1/2) • Assume a certain model for the propagation of influence [1] • Independent Cascade Model: each node independently influences each neighbor using a biased coin toss • Linear Threshold Model: each node picks a random threshold; each node infects its neighbor by a specified amount • All of these models need to know the propagation probability
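The two propagation models named above can be sketched in a few lines each. This is a minimal illustration rather than the formulation in [1]: it assumes a single propagation probability `p` shared by all edges in the independent cascade model and per-edge weights supplied by the caller in the linear threshold model. The function names and the toy graph are hypothetical.

```python
import random

def independent_cascade(graph, seeds, p, rng):
    """One run of the independent cascade model.

    graph : dict node -> list of out-neighbors
    p     : propagation probability, assumed equal for every edge
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for node in frontier:
            for nbr in graph.get(node, []):
                # One biased coin toss per (newly active node, inactive neighbor).
                if nbr not in active and rng.random() < p:
                    active.add(nbr)
                    next_frontier.append(nbr)
        frontier = next_frontier
    return active

def linear_threshold(graph, weights, seeds, rng):
    """One run of the linear threshold model.

    weights : dict (u, v) -> influence weight of u on v
    """
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    thresholds = {v: rng.random() for v in nodes}  # random threshold per node
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v in nodes - active:
            incoming = sum(w for (u, t), w in weights.items()
                           if t == v and u in active)
            if incoming >= thresholds[v]:
                active.add(v)
                changed = True
    return active

graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
weights = {("a", "b"): 1.0, ("a", "c"): 1.0, ("b", "d"): 1.0}
rng = random.Random(0)
print(sorted(independent_cascade(graph, {"a"}, p=0.5, rng=rng)))
print(sorted(linear_threshold(graph, weights, {"a"}, rng=rng)))
```

Both functions need `p` or `weights` as input, which is the dependence on known propagation probabilities that the slide points out.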
  • 136. Current Approaches (2/2) • Use a two-step approach • Step 1: Choose an influence model • Step 2: Find the subset of top-k nodes that maximizes the influence • Bad news: Step 2 is NP-hard [2] • Popular method for Step 2: greedy heuristics • The greedy heuristic gives a (1-1/e) approximation • Address scalability issues in the second step [3][4][5][6] • CELF [3]: the influence maximization function follows the law of diminishing returns (sub-modular)
  • 137. Limitations of current approaches • Estimation/learning of propagation probabilities is required -> data-driven, model-free approach • Propagation models are content-unaware -> content-aware influencer mining • No causality effect in the influence model -> add a causal effect to influence propagation
  • 138. Proposed Approach • Inputs: Communication Logs, Network Structure • Pipeline: Frequent Sequence Mining -> Network Discovery of Influencers • Output: Ordered list of influencers
  • 139. Sequence Cascade Mining Example • [Figure: messages in temporal order and network structure over Peter, Charles, Kim, Vicky, Nancy] • Let Support = 2 • i=2: Append neighbors from the network; check downward closure and support count
  • 140. Sequence Cascade Mining Example • [Figure: network structure over Peter, Charles, Kim, Vicky, Nancy] • Let Support = 2 • i=3: Append neighbors from the network; check downward closure and support count
  • 141. Sequence Cascade Mining Example • [Figure: network structure over Peter, Charles, Kim, Vicky, Nancy] • Let Support = 2 • i=4: Append neighbors from the network; break loop, no more work to do • Return frequent sequences
  • 142. Sequence Cascade Mining • L1: frequent sequence mining • Candidate generation for k+1 by appending neighbors • Downward closure pruning • Support count and reorganize
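The level-wise loop on this slide can be sketched as follows. This is one reading of the steps, not the authors' implementation: it assumes each communication log is a time-ordered list of users, that support is the number of logs containing the candidate as an ordered subsequence, and that downward closure is checked on the candidate's length-k suffix. The names come from the example slides, but the logs and edges are hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if pattern occurs in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)

def mine_cascades(logs, network, min_support):
    """Level-wise mining of frequent sequences that follow network edges."""
    def support(pattern):
        return sum(is_subsequence(pattern, seq) for seq in logs)

    nodes = sorted({user for seq in logs for user in seq})
    # L1: frequent single users.
    level = [(u,) for u in nodes if support((u,)) >= min_support]
    frequent = list(level)
    while level:
        current = set(level)
        candidates = []
        for pattern in level:
            # Candidate generation: append network neighbors of the last user.
            for nbr in sorted(network.get(pattern[-1], ())):
                cand = pattern + (nbr,)
                # Downward closure: the length-k suffix must itself be frequent.
                if nbr not in pattern and cand[1:] in current:
                    candidates.append(cand)
        # Support count keeps only the frequent candidates for the next level.
        level = [c for c in candidates if support(c) >= min_support]
        frequent.extend(level)
    return frequent

network = {"Peter": {"Kim", "Vicky"}, "Charles": {"Nancy", "Kim"}}
logs = [
    ["Charles", "Nancy", "Kim"],
    ["Charles", "Kim"],
    ["Peter", "Kim", "Vicky"],
    ["Peter", "Vicky"],
    ["Charles", "Nancy"],
]
result = mine_cascades(logs, network, min_support=2)
print([seq for seq in result if len(seq) > 1])
```

Restricting candidates to network neighbors keeps the candidate set far smaller than appending every frequent user, since only sequences that can correspond to an actual path in the network are counted.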
  • 143. Influence Function • Node pattern Influence Set (Q(i,p)) = • Influence Set of a Node • Influence Set of a Node Set • It can be shown that the influence function I(V,s) is monotone and sub-modular
  • 144. Network Discovery of Influencers • Greedy heuristic with (1-1/e) approximation guarantee
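For reference, the two properties claimed for the influence function can be stated for a generic set function $f$ over node sets. The notation $f$, $S$, $T$, $v$ is generic and is not the slide's $I(V,s)$; the final inequality is the standard greedy guarantee for normalized, monotone, submodular functions.

```latex
% Monotone: adding nodes never decreases influence.
f(S) \le f(T) \quad \text{for all } S \subseteq T

% Submodular (diminishing returns): a node adds less to a larger set.
f(S \cup \{v\}) - f(S) \;\ge\; f(T \cup \{v\}) - f(T)
\quad \text{for all } S \subseteq T,\; v \notin T

% Greedy guarantee: if f(\emptyset) = 0, f is monotone and submodular,
% S_k is the greedy set of size k and S_k^* an optimal set of size k, then
f(S_k) \;\ge\; \left(1 - \tfrac{1}{e}\right) f(S_k^*)
```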
  • 145. Network Discovery of Influencers: Example • Find the top-3 influencers from the frequent sequences • [Figure: frequent sequences over nodes C, N, K, P, V and the current set of influenced people] • i=1: Choose the node with max out-degree (randomly break the tie between P and C) • C chosen
  • 146. Network Discovery of Influencers: Example • [Figure: frequent sequences over nodes C, N, K, P, V and the current set of influenced people] • i=2: Choose the node with max out-degree • P chosen
  • 147. Network Discovery of Influencers: Example • [Figure: frequent sequences over nodes C, N, K, P, V and the current set of influenced people] • i=3: Choose the node with max out-degree (randomly break the tie between K, V, N) • K chosen
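The three example steps can be sketched as a greedy selection loop. The edge list below is a hypothetical stand-in, since the figure's frequent sequences are not recoverable from the transcript. The sketch also makes two assumptions that differ from the slides: it ranks nodes by how many not-yet-influenced people they would add (counting the node itself), and it breaks ties alphabetically instead of randomly so the output is reproducible.

```python
def greedy_influencers(edges, k):
    """Pick k influencers by largest marginal gain in influenced people.

    edges : iterable of (influencer, influenced) pairs taken from
            frequent sequences (hypothetical toy data here)
    """
    successors = {}
    nodes = set()
    for src, dst in edges:
        successors.setdefault(src, set()).add(dst)
        nodes.update((src, dst))
    # Assumed influence set of a node: itself plus its direct successors.
    influence = {v: {v} | successors.get(v, set()) for v in nodes}

    chosen, covered = [], set()
    for _ in range(min(k, len(nodes))):
        # Sorted candidates + max() gives alphabetical tie-breaking,
        # because max() returns the first maximal element it sees.
        candidates = sorted(nodes - set(chosen))
        best = max(candidates, key=lambda v: len(influence[v] - covered))
        chosen.append(best)
        covered |= influence[best]
    return chosen

edges = [("C", "N"), ("C", "K"), ("P", "K"), ("P", "V")]
top3 = greedy_influencers(edges, 3)
print(top3)
```

On this toy input the picks are C, then P, then K, the same order as in the example slides, with the first and third picks decided by the tie-break.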
  • 148. Context-Aware Influencers • Bag of words representing the context -> Frequent Sequence Mining -> Network Discovery of Influencers -> Context-specific influencers
  • 149. Initial Results (1/2) • DBLP Dataset • USPTO Dataset • Significant performance gain over the established baseline, PrefixSpan