Towards Maximising Cross-Community Information Diffusion

Digital Enterprise Research Institute www.deri.ie

Towards Cross-Community Information Diffusion Maximisation
Václav Belák, Samantha Lam, Conor Hayes

© Copyright 2011 Digital Enterprise Research Institute. All rights reserved.

Enabling Networked Knowledge

Motivation

•  Information cascades of high interest in marketing, CRM, etc.
•  A common approach is to maximise information diffusion by
targeting influential actors
•  In the context of many online communities (e.g. discussion
fora) the information is shared to the community as a whole
and not to individual actors

[Figure: the common case (targeting individuals) compared with the cross-community case (targeting communities)]


Objectives

•  Our main hypothesis is that it is possible to efficiently
spread a message over the information flow network by
targeting highly influential communities

•  The main problem is then formulated as a prediction of
the set of communities to target such that the message is
spread over the network as much as possible
•  Spread over the actors, i.e. user activation fraction
•  Spread over the communities, i.e. community
activation fraction


Methods: Definition of Impact

•  We propose (Belák et al., ’12) to take two factors into account:
1.  degree of community membership of the users
2.  centrality of the users within each community

•  Impact of community A on community B defined as an average centrality of
actors from A within B, weighted by their membership in A
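The verbal definition above can be sketched in a few lines. This is only an illustration under stated assumptions: `membership[u, a]` is taken to be the degree of membership of user u in community a, `centrality[u, b]` the centrality of user u within community b, and the average is normalised by the total membership weight of A. The function name, the matrix layout, and the normalisation are assumptions made here; the exact formula is given in Belák et al. (2012).

```python
import numpy as np

def impact_matrix(membership, centrality):
    """Hypothetical sketch: impact[a, b] is the membership-weighted
    average centrality of community a's actors within community b."""
    membership = np.asarray(membership, dtype=float)   # users x communities
    centrality = np.asarray(centrality, dtype=float)   # users x communities
    weighted = membership.T @ centrality               # sum_u M[u, a] * C[u, b]
    totals = membership.sum(axis=0)                    # total membership weight of each a
    safe_totals = np.where(totals > 0, totals, 1.0)    # empty communities get impact 0
    return weighted / safe_totals[:, None]

# Two users, two communities (toy numbers, not from the data-sets).
M = [[1.0, 0.0],
     [0.5, 0.5]]
C = [[0.2, 0.0],
     [0.4, 0.6]]
impact = impact_matrix(M, C)
```

Row a of the result describes how central the members of community a are in every community b, including a itself.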


Methods: Targeting Communities

•  Level of dispersion (heterogeneity) of total impact of community i can be
measured as an entropy of an i-th row/column of the impact matrix

•  We propose to target communities by means of the product of the total
impact of community i and its entropy: impact focus (IF)
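A rough sketch of the impact focus score, assuming the impact matrix is read row-wise (impact of community i on the others), the row is normalised to a probability distribution before taking the entropy, and the natural logarithm is used. These choices, and the helper names `impact_focus` and `top_q_communities`, are assumptions for illustration; the slides only state that IF is the product of total impact and entropy.

```python
import numpy as np

def impact_focus(impact):
    """Hypothetical sketch: total impact of each row times the entropy of that row."""
    impact = np.asarray(impact, dtype=float)
    total = impact.sum(axis=1)                          # total impact of community i
    safe_total = np.where(total > 0, total, 1.0)
    p = impact / safe_total[:, None]                    # row as a distribution
    log_p = np.log(p, where=p > 0, out=np.zeros_like(p))  # treat 0 * log 0 as 0
    entropy = -(p * log_p).sum(axis=1)                  # dispersion of the impact
    return total * entropy

def top_q_communities(impact, q):
    """Indices of the q communities with the highest impact focus."""
    scores = impact_focus(impact)
    return [int(i) for i in np.argsort(-scores, kind="stable")[:q]]

toy_impact = [[1.0, 1.0, 0.0],
              [3.0, 0.0, 0.0],
              [0.5, 0.5, 0.5]]
scores = impact_focus(toy_impact)
targets = top_q_communities(toy_impact, 2)
```

In the toy matrix the second community has the largest total impact, but all of it falls on a single community, so its entropy and therefore its score are zero.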

•  We simulated the diffusion by extending Independent Cascade (ICM) and
Linear Threshold (LTM) models (Kempe et al., ’03)
1.  Take q target communities and sample s users from each of them
2.  Run the original models from the union of sampled users
•  Information diffusion network derived from the reply-to network:
[Figure: a reply edge r_ji (j replies to i) gives an information-flow edge w_ij (from i to j)]
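The two-step extension can be sketched for the Independent Cascade case as follows. It is a minimal illustration, not the authors' implementation: the names `sample_seeds`, `independent_cascade`, `members`, and `weights` are hypothetical, and the flow weight w_ij is assumed to act directly as the probability that an active user i activates user j.

```python
import random

def sample_seeds(target_communities, members, s, rng):
    """Step 1: sample up to s users from each targeted community."""
    seeds = set()
    for community in target_communities:
        pool = sorted(members[community])          # sorted for reproducibility
        seeds.update(rng.sample(pool, min(s, len(pool))))
    return seeds

def independent_cascade(weights, seeds, rng):
    """Step 2: standard ICM started from the union of sampled users.
    weights[i][j] is assumed to be the chance that active i activates j."""
    active = set(seeds)
    frontier = sorted(seeds)
    while frontier:
        newly_active = []
        for i in frontier:                         # each new user gets one attempt per neighbour
            for j, w in weights.get(i, {}).items():
                if j not in active and rng.random() < w:
                    active.add(j)
                    newly_active.append(j)
        frontier = newly_active
    return active

rng = random.Random(0)
members = {0: {"a"}, 1: {"d"}}                     # toy communities
weights = {"a": {"b": 1.0}, "b": {"c": 1.0}, "c": {"d": 0.0}}
seeds = sample_seeds([0], members, 5, rng)
activated = independent_cascade(weights, seeds, rng)
user_activation_fraction = len(activated) / 4      # 4 users in the toy network
```

The Linear Threshold variant would reuse `sample_seeds` unchanged and replace only the second function.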


Evaluation Strategy

•  IF was compared with random targeting (R) and group in-degree (GI)
(Everett & Borgatti, ’99)

•  The main aim was to investigate robustness of our framework with
respect to:
•  Character of the system
•  Diffusion models
•  User and Community Activation Fractions

•  Procedural outline
1.  Target q communities using one of the heuristics evaluated on
the data from time-slice t
2.  Run the diffusion model on the network from time-slice t+1
3.  Compute the average user and community spreads over all
pairs (t, t+1)
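The procedural outline amounts to a loop over consecutive time-slices. The sketch below assumes the heuristic and the diffusion model are passed in as callables; `evaluate_heuristic`, `choose_targets`, and `run_diffusion` are hypothetical names, and `run_diffusion` is assumed to return a pair (user activation fraction, community activation fraction).

```python
def evaluate_heuristic(snapshots, choose_targets, run_diffusion, q):
    """Average user and community spreads over all pairs (t, t+1)."""
    user_spreads, community_spreads = [], []
    for current, following in zip(snapshots, snapshots[1:]):
        targets = choose_targets(current, q)        # step 1: heuristic on slice t
        u, c = run_diffusion(following, targets)    # step 2: diffusion on slice t+1
        user_spreads.append(u)
        community_spreads.append(c)
    if not user_spreads:
        raise ValueError("at least two time-slices are needed")
    n = len(user_spreads)
    return sum(user_spreads) / n, sum(community_spreads) / n   # step 3

# Stub callables, only to show the call shape.
toy_snapshots = [1, 2, 3]
mean_u, mean_c = evaluate_heuristic(
    toy_snapshots,
    choose_targets=lambda snapshot, q: [snapshot] * q,
    run_diffusion=lambda snapshot, targets: (snapshot / 10, len(targets) / 10),
    q=2,
)
```

Choosing targets on slice t and diffusing on slice t+1 keeps the heuristic from seeing the network it is evaluated on.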


Evaluation Data-Sets

•  51 weeks of data from the largest Irish
discussion board system (Boards)
•  Segmented using a 1-week sliding window
•  A 1-week window represents approx. 84% of
cross-fora posting activity
•  540 communities, 5.3k users/snapshot (avg)

•  5 years of data from the technical support fora of SAP
•  Used only for the diffusion experiments
•  Segmented using a 2-month sliding window
•  A 2-month window represents approx. 50% of cross-fora posting
activity
•  33 communities, 2k users/snapshot (avg)


User Activation Fraction

One targeted community

[Figure: mean user activation fraction (u) against user sample size (s) for q=1 under LTM, in two panels (Boards and SAP), comparing IF, GI, and R]


Community Activation Fraction

One targeted community

[Figure: mean community activation fraction (c) in two panels, comparing IF, GI, and R]



Community Activation Fraction

Five targeted communities

[Figure: community activation fraction in two panels, comparing IF, GI, and R]



Results Highlights

•  Diffusion process became saturated at approximately 80% of users
or communities in Boards, and 30% in SAP
•  More efficient to target few communities

•  Impact Focus outperformed the other two strategies with respect to
both user and community activation fractions, especially for a small
number of targeted communities (q in [1, 2]) and
seed users (s in [1, 20])
•  Diminishing returns

•  For a high number of targeted communities and seed users, the random
strategy outperformed the other two with respect to the community
activation fraction in the SAP data-set
•  SAP network fragmented into many small components, which
made it hard to reach peripheral communities


Conclusion

•  The evaluation demonstrated that the framework
•  is able to identify highly influential communities
•  can predict which communities to target such that the
message spreads efficiently over both individual users
and communities

•  We aim to extend it with content analysis
•  E.g. What are the most influential communities with
respect to a particular topic?

•  We will also investigate empirically-observed topic
cascades and modify our models accordingly if needed


Questions?

References

•  V. Belák, S. Lam, and C. Hayes. Cross-Community Influence in Discussion
Fora. ICWSM. AAAI, 2012.
•  M. Everett and S. Borgatti. The centrality of groups and classes. J. of
Mathematical Sociology, 23(3):181–201, 1999.
•  D. Kempe, J. Kleinberg, and É. Tardos. Maximizing the spread of
influence through a social network. SIGKDD. ACM, 2003.

