3. Motivation, Goals
Mumbai Terror Attack 2008
Citizen sensor observations (Flickr, Twitter,
blogs..)
No matter where you looked, tapping into a
cultural perception was impossible
We wanted to know what people in India
were saying vs. those in Pakistan or the
U.S.A
4. Spatio-Temporal-Thematic Slices of
Real-time Data
Around NEWS-WORTHY EVENTS
Using space and time as cues for extracting
social perceptions (behind signals)
Summarizing hundreds and thousands of
real-time observations
12. Browsing Real-time Data in Context
Find resources related to social perceptions
News and Wikipedia articles to put extracted descriptors in context
SOYLENT GREEN and the HEALTH CARE REFORM
✓ Exploit spatio, temporal semantics for thematic aggregation
17. Topical Tweets
Gathering event-specific tweets: Iran Election
1: Pick trending hashtags from Twitter -
#iranelection; #iran ..
2: Google insights to expand hashtag list
22. Topical Tweets
3. Issue a Twitter Search (API) every 30 seconds
for every hashtag, keyword
1500 tweets per query
4. Obtain other Hashtags in crawled tweets
Check for topic drifts
5. Repeat from Step 3 and babysit!
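Step 4 and the drift check could be sketched roughly as follows. This is a minimal illustration, not the crawler behind these slides: the function names, the `min_count` threshold, and the idea of holding new hashtags for manual review (the "babysit" step) are assumptions, and the tweets are plain strings assumed to have been fetched already by the search step.

```python
import re
from collections import Counter

HASHTAG_RE = re.compile(r"#(\w+)")

def extract_hashtags(tweets):
    """Count hashtags (lowercased, without '#') across crawled tweet texts."""
    counts = Counter()
    for text in tweets:
        counts.update(tag.lower() for tag in HASHTAG_RE.findall(text))
    return counts

def candidate_hashtags(tweets, seed_tags, min_count=2):
    """Return untracked hashtags that appear at least min_count times.

    These are only candidates: a person should review them for topic
    drift before adding them to the crawl list (hypothetical workflow).
    """
    seeds = {t.lower().lstrip("#") for t in seed_tags}
    counts = extract_hashtags(tweets)
    return sorted(tag for tag, n in counts.items()
                  if n >= min_count and tag not in seeds)

# Made-up tweets for illustration only.
tweets = [
    "Protests in Tehran #iranelection #tehran",
    "Updates from the streets #iranelection #Tehran",
    "Lunch was great #food",
]
print(candidate_hashtags(tweets, ["#iranelection", "#iran"]))
```

Here `#food` appears only once and is dropped, which is one crude way to keep unrelated tags from widening the crawl.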
23. Architecture
Step1: Gathering event-relevant tweets
Step2: Spatial, Temporal metadata of tweets
24. Geo-Coordinates of Tweets
Location a tweet originates from
Location it mentions
Approximation: Poster location on Twitter
profile
Location: Dayton, OH (Google geocoder service, GeoDB)
Location: “best place in the world” (fail!)
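The profile-location approximation can be pictured with a small lookup. This is only a sketch: the `GAZETTEER` table and the function name are hypothetical stand-ins for a real geocoding service, and the entries other than Dayton, OH are made up for illustration.

```python
# Tiny hypothetical gazetteer; a real system would query a geocoding service.
GAZETTEER = {
    "dayton, oh": ("US", "OH"),
    "mumbai": ("IN", None),
    "lahore": ("PK", None),
}

def resolve_profile_location(raw):
    """Map a free-text profile location to (country, state), or None on failure."""
    if not raw:
        return None
    key = " ".join(raw.lower().split())  # normalize case and whitespace
    return GAZETTEER.get(key)

print(resolve_profile_location("Dayton, OH"))
print(resolve_profile_location("best place in the world"))
```

Tweets whose location resolves to `None` cannot be placed in a spatial cluster, so a pipeline has to decide whether to drop them or keep them in an "unknown" bucket.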
25. Architecture
Step1: Gathering event-relevant tweets
Step2: Spatial, Temporal metadata of tweets
Step3: Spatio-temporal clusters
26. Spatio-Temporal Clusters of Tweets
Because every event is different.. and we want to preserve social perceptions
that generated this data!
Long-running, world-wide events (Iran Election Protest)
clusters by country and week?
Short, world-wide events (Olympics)
clusters by country and day?
Long-running, evolving, local events (Health Care
Reform Debate)
clusters by state and day?
Tunable parameters
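The tunable spatial and temporal granularity can be sketched as a grouping key. The tweet dictionaries, field names, and the two supported granularities below are assumptions for illustration, not the system's actual data model.

```python
from collections import defaultdict
from datetime import date

def time_bucket(day, granularity):
    """Return a key for the given date at 'day' or 'week' granularity."""
    if granularity == "day":
        return day.isoformat()
    if granularity == "week":
        iso = day.isocalendar()  # (ISO year, ISO week, ISO weekday)
        return f"{iso[0]}-W{iso[1]:02d}"
    raise ValueError(f"unknown granularity: {granularity}")

def cluster_tweets(tweets, spatial_key="country", granularity="day"):
    """Group tweet texts by (spatial unit, time bucket)."""
    clusters = defaultdict(list)
    for tweet in tweets:
        key = (tweet[spatial_key], time_bucket(tweet["date"], granularity))
        clusters[key].append(tweet["text"])
    return dict(clusters)

# Made-up tweets for illustration only.
tweets = [
    {"text": "a", "country": "IN", "state": None, "date": date(2009, 6, 15)},
    {"text": "b", "country": "IN", "state": None, "date": date(2009, 6, 17)},
    {"text": "c", "country": "US", "state": "OH", "date": date(2009, 6, 15)},
]
by_week = cluster_tweets(tweets, "country", "week")
by_day = cluster_tweets(tweets, "country", "day")
print(sorted(by_week))
print(sorted(by_day))
```

Switching `spatial_key` to `"state"` and `granularity` to `"day"` would give the state-and-day slicing suggested for a local, evolving event.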
27. Tweets in a Spatio-Temporal Cluster
Spatio-temporal bias dictates the granularity of processing tweets
Mumbai Terror Attack
Cluster1: Tweets from India, 08/1/08
Cluster2: Tweets from Pakistan, 08/1/08
Cluster n: Tweets from USA, 08/13/08
28. Architecture
Step1: Gathering event-relevant tweets
Step2: Spatial, Temporal metadata of tweets
Step3: Spatio-temporal clusters
Step4: Thematic Descriptors in spatio-temporal cluster
30. n-gram descriptors
“President Obama in trying to regain control of the
health-care debate will likely shift his pitch in September”
1-grams: President, Obama, in, trying, to, regain, ...
2-grams: “President Obama”, “Obama in”, “in
trying”, “trying to”...
3-grams: “President Obama in”, “Obama in trying”;
“in trying to”...
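The n-gram listing above can be reproduced with a few lines. This sketch assumes plain whitespace tokenization, which is a simplification; the function name is illustrative.

```python
def ngrams(text, n):
    """Return all contiguous n-word sequences from whitespace-split text."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

sentence = ("President Obama in trying to regain control of the "
            "health-care debate will likely shift his pitch in September")

for n in (1, 2, 3):
    print(n, ngrams(sentence, n)[:4])
```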
34. Thematic Descriptors
“President” “President Obama” “President Obama in”
A descriptor is an n-gram weighted by:
Thematic Importance
redundancy: statistically discriminatory in nature
variability: contextually important
Spatial Importance (local vs. global popularity)
Temporal Importance (always popular vs. currently
trending)
35. Thematic Importance of an n-gram
“President” “President Obama” “President Obama in”
Exploiting Redundancy
tfidf of n-gram (Lucene Index)
amplify by fraction of nouns in the n-gram
(Stanford Natural Language Parser)
amplify by fraction of non-stop words (‘going to
try’)
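One way to read the two "amplify" steps is as multiplicative boosts on the tfidf value. The slide does not give the exact form, so the `(1 + fraction)` factors below are an assumption, as are the function name, the small stop-word list, and passing the set of nouns in directly instead of running a parser.

```python
# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"in", "to", "the", "of", "a", "an", "is"}

def amplified_score(ngram, tfidf, nouns, stop_words=STOP_WORDS):
    """Boost a tfidf value by the fraction of nouns and of non-stop words.

    Assumed form: tfidf * (1 + noun fraction) * (1 + non-stop-word fraction).
    """
    tokens = ngram.lower().split()
    noun_frac = sum(t in nouns for t in tokens) / len(tokens)
    content_frac = sum(t not in stop_words for t in tokens) / len(tokens)
    return tfidf * (1 + noun_frac) * (1 + content_frac)

# Nouns would come from a part-of-speech tagger; here they are given by hand.
nouns = {"president", "obama"}
for phrase in ("President Obama", "President Obama in", "going to try"):
    print(phrase, round(amplified_score(phrase, 1.0, nouns), 3))
```

With equal tfidf, the all-noun phrase scores highest and the noun-free phrase lowest, which matches the intent of favoring noun-heavy, stop-word-light descriptors.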
36. Thematic Importance of an n-gram
Exploiting Variability
Big three/Big 3; Ford, GM, Chrysler, General
Motors..
Contextually relevant words boost statistical importance
[Diagram: focus word “Big Three” linked to GM, Chrysler, General Motors, Ford]
Focus word (fw) : “big three”
Associated words (awi) : co-occurring in spatio-temporal set of tweets
37. Thematic Importance of an n-gram
[Diagram: focus word “Big Three” linked to GM, Chrysler, General Motors, Ford]
focus word (fw): Big Three
associated word (awi): Ford
Thematic importance of focus word is computed from:
tfidf of fw
tfidf of awi
association strength of fw and awi
38. Contextual Relevance
The goal is to boost the focus word only with its strongly associated words in the given spatio-temporal corpus
One way to measure strength of associations is to use word co-occurrence [9]; borrowing from past success in this area, the association strength between the focus word and the associated words is measured using point-wise mutual information in terms of co-occurrence
Association Strength depends on contexts of fw and awi
Contexts of associated word awi : ‘Ford’
chrysler, GM, big 3
focus, model, release..
Let the contexts for aw_i be C_{aw_i} = {caw_1, caw_2, ..}, where each caw_k is a strong descriptor that collocates with aw_i:
\[ assoc_{str}(fw, aw_i) = \frac{\sum_{k} pmi(fw, caw_k)}{|C_{aw_i}|}, \quad \forall\, caw_k \in C_{aw_i} \]
Pointwise Mutual Information
\[ pmi(fw, caw_k) = \log \frac{p(fw, caw_k)}{p(fw)\, p(caw_k)} = \log \frac{p(caw_k \mid fw)}{p(caw_k)} \]
where
\[ p(fw) = \frac{n(fw)}{N}, \qquad p(caw_k \mid fw) = \frac{n(caw_k, fw)}{n(fw)} \]
and n(fw) is the frequency of fw
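The association strength can be sketched directly from co-occurrence counts. The counts below are made up, and two details are assumptions not visible in the slide: that p(caw_k) is estimated as n(caw_k)/N in the same way as p(fw), and that N is the number of tweets in the spatio-temporal set. Zero co-occurrence counts are not handled.

```python
import math

def pmi(fw, caw, n_single, n_pair, total):
    """pmi(fw, caw) = log( p(caw | fw) / p(caw) ), from raw counts."""
    p_caw_given_fw = n_pair[(caw, fw)] / n_single[fw]
    p_caw = n_single[caw] / total  # assumed estimate: n(caw) / N
    return math.log(p_caw_given_fw / p_caw)

def assoc_strength(fw, contexts, n_single, n_pair, total):
    """Average PMI between fw and the context words of one associated word."""
    return sum(pmi(fw, c, n_single, n_pair, total) for c in contexts) / len(contexts)

# Made-up counts for illustration only.
total = 1000  # assumed: number of tweets in the spatio-temporal set
n_single = {"big three": 50, "chrysler": 100, "gm": 200}
n_pair = {("chrysler", "big three"): 25, ("gm", "big three"): 20}

strength = assoc_strength("big three", ["chrysler", "gm"], n_single, n_pair, total)
print(round(strength, 4))
```

A context word that co-occurs with the focus word more often than chance contributes a positive PMI, so contexts that are specific to the focus word raise the average.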
39. Temporal Importance of a Descriptor
[Fig. 2: (a) Extracted descriptors sorted by TFIDF vs. spatio-tempo... (b) Top 15 extracted descriptors in the US for Mumbai attack event]
The thematic weights of the focus word and all associations in C_fw, along with their strengths, are plugged into Eqn 1 to compute the thematic score ngram_i(th) of the n-gram descriptor
Certain descriptors will always dominate discussions
“Terrorism” in Mumbai Terror Attack Tweets
“Healthcare” in Health Care reform debate
Allow recent (possibly interesting) ones to surface
Discount the thematic score of a descriptor depending on how popular it has been in the recent past; the discount, a tuneable factor depending on the event, is calculated over a period of time as:
\[ ngram_i(te) = temporal_{bias} \cdot \sum_{d=1}^{D} \frac{ngram_i(th)_d}{d} \]
where ngram_i(th)_d is the enhanced thematic score of the descriptor for period d
0-1 bias: less to more importance to recent n-grams
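The temporal discount is a weighted sum over recent periods and can be sketched as below. The function name and the input list are illustrative, and one detail is an assumption: the list is ordered so that index d = 1 is the most recent period, which makes older scores count for less.

```python
def temporal_discount(thematic_by_period, temporal_bias):
    """Compute temporal_bias * sum_{d=1..D} ngram(th)_d / d.

    thematic_by_period[0] is taken as period d = 1 (assumed most recent).
    """
    if not 0.0 <= temporal_bias <= 1.0:
        raise ValueError("temporal_bias must lie in [0, 1]")
    dampened = sum(score / d
                   for d, score in enumerate(thematic_by_period, start=1))
    return temporal_bias * dampened

# Made-up thematic scores for D = 3 periods.
history = [4.0, 2.0, 3.0]
print(temporal_discount(history, 0.5))   # 0.5 * (4/1 + 2/2 + 3/3)
print(temporal_discount(history, 0.0))   # bias 0 switches the discount off
```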
D is the duration for which we wish to apply the dampening factor, for example the recent week. This temporal discount might not be relevant in every case, so a temporal bias weight ranging from 0 to 1 is also applied: a weight closer to 1 gives more importance, and a weight closer to 0 less importance, to past activity
40. Spatial Importance of a Descriptor
Local descriptors are more interesting compared to global ones
The importance of a descriptor is also discounted based on its occurrence in other spatio-temporal sets: descriptors that occur all over the world on a given day are less interesting compared to those that occur only in the spatio-temporal set of interest
Spatial discount: the fraction of spatial partitions (e.g. countries) that had activity surrounding this descriptor
\[ ngram_i(sp) = \frac{k}{|\text{spatio-temporal sets}|} \cdot (1 - spatial_{bias}) \]
k / |spatio-temporal sets|: fraction of spatio-temporal clusters the n-gram occurred in
spatial bias closer to 0 = global importance
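The spatial discount reduces to a one-line computation. The function and argument names below are illustrative, and the example numbers are made up.

```python
def spatial_discount(sets_with_ngram, total_sets, spatial_bias):
    """Compute (k / |spatio-temporal sets|) * (1 - spatial_bias)."""
    if total_sets <= 0:
        raise ValueError("total_sets must be positive")
    if not 0.0 <= spatial_bias <= 1.0:
        raise ValueError("spatial_bias must lie in [0, 1]")
    return (sets_with_ngram / total_sets) * (1.0 - spatial_bias)

# A descriptor seen in 3 of 4 country clusters on a given day.
print(spatial_discount(3, 4, 0.0))   # full weight on global presence
print(spatial_discount(3, 4, 1.0))   # bias 1 removes the discount
```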
41. STT Score of an n-gram
The spatial bias controls how much importance is given to the global presence of the descriptor. Depending on the event of interest, both discounting factors can differ across spatio-temporal sets. For example, when processing tweets for the Mumbai attack, setting the spatial bias to 1 removes the spatial discount, since the factor (1 - spatial bias) becomes 0. While processing tweets from the US, a global bias may be preferred, given that the event did not originate there. These values are set before we begin the processing of observations
Spatio-temporal-thematic score of a descriptor = thematic score - spatio-temporal discounts
The spatial and temporal effects are discounted from the thematic score to give the final spatio-temporal-thematic (STT) weight of the n-gram:
\[ w_i = ngram_i(th) - ngram_i(te) - ngram_i(sp) \]
The figure illustrates the effect of the enhanced STT weights on descriptors pertaining to the Mumbai terror attack event
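Putting the three components together, the STT weight and a ranking by it can be sketched as follows. The function names and all numbers are made up for illustration; "local detail" stands in for any descriptor that is specific to one cluster.

```python
def stt_score(thematic, temporal, spatial):
    """w = ngram(th) - ngram(te) - ngram(sp)."""
    return thematic - temporal - spatial

def rank_descriptors(components):
    """Sort descriptors by STT weight, highest first.

    components maps each n-gram to a (th, te, sp) tuple.
    """
    scored = [(ngram, stt_score(*parts)) for ngram, parts in components.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Made-up component scores for two descriptors in one cluster.
components = {
    "terrorism": (10.0, 6.0, 3.0),     # always popular, seen everywhere
    "local detail": (7.0, 1.0, 0.5),   # hypothetical cluster-specific descriptor
}
for ngram, weight in rank_descriptors(components):
    print(ngram, weight)
```

In this toy case the descriptor with the higher raw thematic score ends up ranked lower once its temporal and spatial discounts are subtracted, which is the effect the discounts are meant to have.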