Twitter is a prominent online social media platform used to share information and opinions. Previous research has shown that current real-world news topics and events dominate the discussions on Twitter. In this paper, we present a preliminary study to identify and characterize communities from a set of users who post messages on Twitter during crisis events. We present our work in progress by analyzing three major crisis events of 2011 as case studies (Hurricane Irene, Riots in England, and Earthquake in Virginia). Hurricane Irene alone caused about 7-10 billion USD in damage and claimed 56 lives. The aim of this paper is to identify the different user communities and characterize them by their top central users. First, we defined a similarity metric between users based on their links, the content they posted and their metadata. Second, we applied spectral clustering to obtain the communities of users formed during three different crisis events. Third, we evaluated a mechanism to identify the top central users using degree centrality; we showed that the top users represent the topics and opinions of all the users in the community with 81% accuracy on average. The top central users identified represent what the entire community shares. Therefore, to understand a community, we need to monitor and analyze only these top users rather than all the users in the community.
3. Problem Statement
To identify and characterize user communities on Twitter during crisis events, using semi-automated techniques.
4. Data: Three 2011 Crisis Events
- Hurricane Irene: 90,237 tweets collected, 55,718 users
- Earthquake in Virginia: 277,604 tweets collected, 219,621 users
- England Riots: 1,165,628 tweets collected, 546,966 users
5. Our Contributions
- Defined a novel similarity metric to compute similarities between users based on their links, content posted and metadata.
- Applied spectral clustering to obtain communities of users formed during three different crisis events.
- Showed that the most central users (by degree centrality) represent what users in the cluster are talking about, with 81% accuracy on average.
6. Methodology
STEP 1: Compute the User*User similarity matrix
STEP 2: Use spectral clustering to obtain communities of users
STEP 3: Use degree centrality to compute the top users in each cluster
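Step 2 can be illustrated with a minimal two-way spectral clustering sketch. The slides do not state which spectral clustering variant or how many clusters were used, so this assumes the normalized-cut relaxation of Shi and Malik (listed in the references): split users by the sign of the second eigenvector of the normalized Laplacian of the similarity matrix U. The function name `spectral_bipartition` and the toy matrix are hypothetical, and the sketch assumes NumPy is available.

```python
import numpy as np


def spectral_bipartition(U):
    """Split users into two communities from a symmetric similarity matrix U.

    Normalized-cut relaxation (Shi and Malik): take the eigenvector for the
    second-smallest eigenvalue of L_sym = I - D^{-1/2} U D^{-1/2}, map it back
    with D^{-1/2}, and assign users by its sign.
    """
    U = np.asarray(U, dtype=float)
    degrees = U.sum(axis=1)
    if np.any(degrees == 0):
        raise ValueError("every user needs at least one non-zero similarity")
    d_inv_sqrt = 1.0 / np.sqrt(degrees)
    # Symmetric normalized Laplacian of the similarity graph.
    L_sym = np.eye(len(U)) - d_inv_sqrt[:, None] * U * d_inv_sqrt[None, :]
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    _, eigvecs = np.linalg.eigh(L_sym)
    fiedler = d_inv_sqrt * eigvecs[:, 1]
    labels = (fiedler > 0).astype(int)
    # Eigenvector sign is arbitrary; fix it so that user 0 gets label 0.
    if labels[0] == 1:
        labels = 1 - labels
    return labels


# Toy example: users 0-2 and 3-5 are strongly similar within each group,
# with only weak similarity (0.1) across groups.
toy_U = np.array([
    [0.0, 3.0, 3.0, 0.1, 0.0, 0.0],
    [3.0, 0.0, 3.0, 0.0, 0.1, 0.0],
    [3.0, 3.0, 0.0, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 0.0, 3.0, 3.0],
    [0.0, 0.1, 0.0, 3.0, 0.0, 3.0],
    [0.0, 0.0, 0.1, 3.0, 3.0, 0.0],
])
labels = spectral_bipartition(toy_U)  # [0, 0, 0, 1, 1, 1]
```

For more than two communities, a common choice is to cluster the rows of the first k eigenvectors with k-means, or to apply the two-way split recursively.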
7. Methodology
User*User similarity matrix, U[i,j]:
- Content Similarity, C[i,j]: computed from the common words, hashtags and URLs in the users' tweets
- Link Similarity, L[i,j]: based on whether the users retweet, mention or reply to each other's tweets, or to a common third person's tweet
- Metadata Similarity, M[i,j]: similarity between two users based on their location (country, state or city)
U[i,j] = C[i,j] + L[i,j] + M[i,j], for each pair of users i and j
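The slides name the three components of U but not their exact formulas. The sketch below assumes Jaccard overlap of word/hashtag/URL sets for C, a direct-interaction indicator (falling back to the overlap of third parties both users interacted with) for L, and the fraction of matching location levels for M. The `User` record and all helper names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    # Hypothetical per-user record; field names are illustrative only.
    name: str
    tokens: set = field(default_factory=set)           # words, #hashtags, URLs from tweets
    interacted_with: set = field(default_factory=set)  # users retweeted, mentioned or replied to
    location: tuple = ("", "", "")                     # (country, state, city); "" if unknown


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def content_similarity(u, v):
    # Assumed C[i,j]: Jaccard overlap of words, hashtags and URLs.
    return jaccard(u.tokens, v.tokens)


def link_similarity(u, v):
    # Assumed L[i,j]: 1.0 for a direct retweet/mention/reply in either direction,
    # otherwise the Jaccard overlap of the third parties both users interacted with.
    if v.name in u.interacted_with or u.name in v.interacted_with:
        return 1.0
    return jaccard(u.interacted_with, v.interacted_with)


def metadata_similarity(u, v):
    # Assumed M[i,j]: fraction of known location levels (country, state, city) that match.
    matches = sum(1 for a, b in zip(u.location, v.location) if a and a == b)
    return matches / 3


def similarity_matrix(users):
    n = len(users)
    U = [[0.0] * n for _ in range(n)]  # diagonal left at 0 (no self-similarity)
    for i in range(n):
        for j in range(i + 1, n):
            s = (content_similarity(users[i], users[j])
                 + link_similarity(users[i], users[j])
                 + metadata_similarity(users[i], users[j]))
            U[i][j] = U[j][i] = s  # U[i,j] = C[i,j] + L[i,j] + M[i,j], symmetric
    return U


alice = User("alice", {"irene", "#hurricane", "evacuate"}, {"fema"}, ("US", "NY", "NYC"))
bob = User("bob", {"irene", "#hurricane", "flood"}, {"fema", "alice"}, ("US", "NY", "Albany"))
carol = User("carol", {"riots", "#londonriots"}, {"metpolice"}, ("UK", "", "London"))
U = similarity_matrix([alice, bob, carol])
# U[0][1] = 0.5 (content) + 1.0 (bob replied to alice) + 2/3 (same country and state)
```

Summing the three terms unweighted follows the slide's formula; the weighted variant listed under Future Work would multiply each term by a tunable coefficient.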
8. Top Users Identification
Use degree centrality
- A Social Network Analysis metric
- Number of links incident upon a node (i.e., the number of ties that a node has)
In each cluster obtained
- We compute the top 30 users using degree centrality
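A minimal sketch of top-user selection, assuming an unweighted tie wherever U[i][j] > 0 and counting a user's degree only over members of the same cluster (the slides do not say whether cross-cluster ties count). `top_users_by_degree` is a hypothetical helper; `k` defaults to the 30 users used here.

```python
def top_users_by_degree(U, labels, names, k=30):
    """Return the k highest-degree users in each cluster.

    Assumption: users i and j are tied when U[i][j] > 0, and degree counts
    only ties to other members of the same cluster.
    """
    top = {}
    for cluster in sorted(set(labels)):
        members = [i for i, c in enumerate(labels) if c == cluster]
        degree = {
            i: sum(1 for j in members if j != i and U[i][j] > 0)
            for i in members
        }
        # Highest degree first; ties broken by username for determinism.
        ranked = sorted(members, key=lambda i: (-degree[i], names[i]))
        top[cluster] = [names[i] for i in ranked[:k]]
    return top


user_names = ["a", "b", "c", "d", "e"]
cluster_labels = [0, 0, 0, 1, 1]
adj = [
    [0, 1, 1, 0, 0],
    [1, 0, 0, 0, 1],  # the b-e tie crosses clusters and is not counted
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 1, 0],
]
top = top_users_by_degree(adj, cluster_labels, user_names, k=2)
# {0: ['a', 'b'], 1: ['d', 'e']}
```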
9. Evaluation Metrics
Modularity
- How good is the division of the network into communities?
Mean average precision
- With how much precision can we say that the influencers in a cluster represent the entire cluster?
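Modularity here is taken to be Newman's measure: the fraction of edge weight that falls inside communities minus the fraction expected if edges were placed at random with the same degrees. A plain-Python sketch for a symmetric, possibly weighted adjacency matrix; `modularity` is a hypothetical helper name.

```python
def modularity(A, labels):
    """Newman modularity Q of a partition of an undirected (weighted) graph.

    Q = (1 / 2m) * sum_ij [A_ij - k_i * k_j / (2m)] * delta(c_i, c_j),
    where k_i is the weighted degree of node i and 2m is the sum of all degrees.
    """
    n = len(A)
    k = [sum(row) for row in A]
    two_m = sum(k)
    if two_m == 0:
        raise ValueError("graph has no edges")
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:  # delta(c_i, c_j)
                q += A[i][j] - k[i] * k[j] / two_m
    return q / two_m


# Example: two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

q_split = modularity(A, [0, 0, 0, 1, 1, 1])  # 5/14, about 0.357
q_single = modularity(A, [0] * 6)            # ~0: everything in one community
```

Putting every node in one community always gives Q = 0, so positive values indicate more within-community ties than chance.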
10. Results: Modularity score
The modularity score is greater than 0.5 for all events:
- Hurricane Irene = 0.67
- England Riots = 0.51
- Virginia Earthquake = 0.53
We conclude that the division of the graph into communities is of good quality.
11. Results: MAP Performance
• Top users based on degree centrality represent the opinions and topics of the entire cluster during crisis events.
• In the case of a random Twitter sample, the above is not true.
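The slides do not define how relevance was judged when computing MAP. One possible setup, offered only as an assumption: rank the terms (e.g. hashtags) used by a cluster's top users by frequency, score that ranked list against the cluster's n most frequent terms overall, and average the per-cluster average precision. All names and the toy data are hypothetical, and this sketch does not reproduce the reported 81% figure.

```python
from collections import Counter


def average_precision(ranked, relevant):
    """Average precision of a ranked list against a set of relevant items."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank  # precision at this cut-off
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(clusters, n=10):
    """MAP over clusters under an assumed relevance definition.

    `clusters` maps a cluster id to (top_user_terms, all_user_terms): the terms
    used by the cluster's top users and by all of its users. The top users'
    terms, ranked by frequency, are scored against the cluster's n most
    frequent terms.
    """
    aps = []
    for top_terms, all_terms in clusters.values():
        ranked = [t for t, _ in Counter(top_terms).most_common(n)]
        relevant = {t for t, _ in Counter(all_terms).most_common(n)}
        aps.append(average_precision(ranked, relevant))
    return sum(aps) / len(aps)


toy_clusters = {
    "cluster_a": (["#irene", "#irene", "#nyc", "#flood"],
                  ["#irene", "#irene", "#irene", "#flood", "#nyc", "#evacuate"]),
    "cluster_b": (["#earthquake", "#lol", "#earthquake"],
                  ["#earthquake", "#earthquake", "#dc", "#va", "#lol"]),
}
map_score = mean_average_precision(toy_clusters, n=3)  # (1.0 + 1/3) / 2
```

Comparing this score for clusters against the same computation on a random sample of users is one way to express the contrast stated above.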
13. Results: Characterization
Hurricane Irene: top users in Community 2 (C2) and Community 3 (C3)
C2 (username: profile description)
- top_tw_science: none
- top_trend_world: none
- Neednewsnow: none
- iBreakings: none
- metofficestorm: Official Met Office page for latest information on tropical storms, hurricanes, typhoons and cyclones
- RES911CU: RES911CU Bringing you the latest news/weather from around the world & country using over 1000 news sources
- qianafgtt: none
- Georgeannla47: none
- moneyworlds: none
- LinJosh2011: none
- top_trend_uk: none
- ShareWith911: Emergency Information Sharing Network connecting 911,First Responders and Citizens before, during and after emergencies of all shapes and sizes.
C3 (username: profile description)
- YaekoAvilez3837: none
- MarxInbody5668: none
- KerrieRamagano3: none
- AureliaHickman1: none
- LenitaBross5906: none
- BeckieSteakley4: none
- KirbyPietig6221: none
- GailSterner5774: none
- LeroyRasheed: none
- ClorindaKan1926: none
- StaceeRosebure2: none
- VaniaAsmus: none
• Community 2: formed of official news media, emergency and alert profiles (ShareWith911, metofficestorm, RES911CU, Neednewsnow)
• Community 3: formed of common or general users of Twitter (KerrieRamagano3, YaekoAvilez3837, AureliaHickman1)
14. Conclusions
The results obtained from the case study of three events show the potential of applying spectral clustering and centrality measures to identify communities of users, and the prominent top users in each community, during crisis events.
15. Future Work
- Construct weighted similarity metrics to compute similarity between users
- Include behavioral features of users in the construction of the similarity metric
- Conduct a more general experiment with more crisis events
16. References
Gupta et al. Credibility ranking of tweets during high impact events. In Proceedings of the 1st Workshop on Privacy and Security in Online Social Media, PSOSM '12.
Java et al. Detecting Communities via Simultaneous Clustering of Graphs and Folksonomies. In Proceedings of the Tenth Workshop on Web Mining and Web Usage Analysis (WebKDD). ACM, August 2008.
Kwak et al. What is Twitter, a social network or a news media? In Proceedings of the 19th International Conference on World Wide Web, WWW '10. ACM.
Longueville et al. "omg, from here, i can see the flames!": a use case of mining location based social networks to acquire spatio-temporal data on forest fires. LBSN, 2009.
Mendoza et al. Twitter under crisis: can we trust what we RT? In Proceedings of the First Workshop on Social Media Analytics, SOMA '10.
Newman et al. Analysis of weighted networks. Phys Rev E Stat Nonlin Soft Matter Phys, 70(5 Pt 2), Nov. 2004.
Shi et al. Normalized cuts and image segmentation. In Proceedings of the 1997 Conference on Computer Vision and Pattern Recognition (CVPR '97).
Vieweg et al. Microblogging during two natural hazards events: what Twitter may contribute to situational awareness. In CHI '10.
Wang et al. Detecting community kernels in large social networks. In Proceedings of the International Conference on Data Mining, ICDM '11.
Welch et al. Topical semantics of Twitter links. In Proceedings of the Fourth International Conference on Web Search and Data Mining, WSDM '11.