Finding bursty topics from microblogs
1. FINDING BURSTY TOPICS FROM MICROBLOGS
Qiming Diao, Jing Jiang, Feida Zhu, Ee-Peng Lim
Living Analytics Research Centre
School of Information Systems
Singapore Management University
2. Abstract
Goal: to find topics that have bursty patterns on microblogs.
Two observations:
1. Posts published around the same time are more likely to have the same topic.
2. Posts published by the same user are more likely to have the same topic.
3. Introduction
Retrospective bursty event detection:
Burst detection: state machine
Topic discovery: LDA
Two assumptions:
1. If a post is about a global event, it is likely to follow a global topic distribution that is time-dependent.
2. If a post is about a personal topic, it is likely to follow a personal topic distribution that is more or less stable over time.
4. Method
Preliminaries:
Each post d_i is associated with a user u_i, a timestamp t_i, and words w_{i,j}.
A bursty topic b is a word distribution coupled with a bursty interval, denoted as (ϕ_b, t_b^s, t_b^e).
Our task: to find meaningful bursty topics from the input text stream.
Our method: a topic discovery step followed by a burst detection step.
5. Our Topic Model
Assume:
1. There are C (latent) topics in the text stream, where each topic c has a word distribution ϕ_c.
2. There is a background word distribution ϕ_B.
3. A single post is most likely to be about a single topic.
4. There is a global topic distribution θ_t for each time point t.
6.
Since our focus is on finding popular global events, we need to separate out “personal” posts. We therefore add a time-independent topic distribution η_u for each user to capture her long-term topical interests.
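Slides 5 and 6 together sketch a generative story: each post picks its single topic either from the global, time-dependent distribution θ_t or from its author's personal distribution η_u, and each word comes either from the chosen topic or from the background distribution ϕ_B. A minimal forward-sampling sketch, assuming a per-user global-vs-personal weight pi_u and a topic-vs-background weight rho (both names are ours; the paper instead infers the latent variables with Gibbs sampling):

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_post(user, t, theta, eta, phi, phi_B, pi_u, rho, n_words):
    """Forward-sample one post under the assumptions of slides 5-6.

    theta[t]  : global topic distribution at time point t
    eta[user] : time-independent personal topic distribution of the user
    phi[c]    : word distribution of topic c; phi_B: background distribution
    pi_u      : prob. the post follows the global (not personal) distribution
    rho       : prob. a word is drawn from the topic rather than the background
    """
    is_global = rng.random() < pi_u
    dist = theta[t] if is_global else eta[user]
    z = rng.choice(len(dist), p=dist)            # a single topic per post
    words = []
    for _ in range(n_words):
        source = phi[z] if rng.random() < rho else phi_B
        words.append(rng.choice(len(source), p=source))
    return z, is_global, words
```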
11. Burst Detection
Assume:
A series of counts (m_{c,1}, m_{c,2}, …, m_{c,T}) represents the intensity of topic c at different time points.
These counts are generated by two Poisson distributions, corresponding to a bursty state and a normal state.
12. Burst Detection
We set σ_0 = 0.9 and σ_1 = 0.6 for all topics.
Finally, a burst is marked by a consecutive subsequence of bursty states.
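The slides do not spell out how the state sequence is decoded or how the Poisson rates are set; the sketch below uses Viterbi decoding and treats σ_0 and σ_1 as the self-transition probabilities of the normal and bursty states (one plausible reading). The rates mu0 and mu1 would come from the counts themselves, e.g. the empirical mean and a multiple of it:

```python
import numpy as np
from scipy.stats import poisson

def mark_bursts(m, mu0, mu1, sigma0=0.9, sigma1=0.6):
    """Decode bursty/normal states for one topic's count series m_{c,1..T}.

    m        : per-time-point counts of the topic
    mu0, mu1 : Poisson rates of the normal and bursty states (mu1 > mu0)
    Returns a boolean array; each run of consecutive True values is a burst.
    """
    m = np.asarray(m)
    T = len(m)
    log_em = np.stack([poisson.logpmf(m, mu0),   # emissions: normal state
                       poisson.logpmf(m, mu1)])  # emissions: bursty state
    log_tr = np.log([[sigma0, 1 - sigma0],       # transitions from normal
                     [1 - sigma1, sigma1]])      # transitions from bursty
    score = log_em[:, 0].copy()                  # uniform prior over start state
    back = np.zeros((2, T), dtype=int)
    for t in range(1, T):
        new = np.empty(2)
        for s in (0, 1):                         # best predecessor of state s
            cand = score + log_tr[:, s]
            back[s, t] = cand.argmax()
            new[s] = cand.max() + log_em[s, t]
        score = new
    states = np.empty(T, dtype=int)              # backtrack the best path
    states[-1] = score.argmax()
    for t in range(T - 1, 0, -1):
        states[t - 1] = back[states[t], t]
    return states == 1
```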
13. Experiments
Data Set:
We sampled 2,892 users from this dataset and extracted their tweets between September 1 and November 30, 2011 (91 days in total). The final dataset contains 3,967,927 tweets and 24,280,638 tokens.
14.
Ground Truth Generation:
We took the top-30 bursty topics from each model and asked two human judges to judge their quality by assigning a score of either 0 or 1.
Evaluation:
We set the number of topics C to 80, α to 50/C, and β to 0.01. Each model was run for 500 iterations of Gibbs sampling.
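These are the standard LDA hyperparameters. As a point of reference, the same settings plugged into an off-the-shelf collapsed-Gibbs LDA (a sketch using the third-party `lda` package; the models compared in the paper are custom extensions of LDA, so this only illustrates the shared configuration):

```python
import lda   # pip install lda -- plain LDA via collapsed Gibbs sampling

C = 80                           # number of topics
model = lda.LDA(n_topics=C,
                n_iter=500,      # 500 Gibbs sampling iterations
                alpha=50.0 / C,  # symmetric Dirichlet prior on doc-topic dists
                eta=0.01,        # beta: symmetric Dirichlet prior on topics
                random_state=0)
# X is a (n_docs x vocab_size) document-term count matrix:
# model.fit(X)
```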
18. Two case studies to demonstrate the effectiveness of our model
Effectiveness of Temporal Models: both TimeLDA and TimeUserLDA tend to group posts published on the same day into the same topic.
19. Two case studies to demonstrate the effectiveness of our model
Effectiveness of User Models: it is important to filter out users’ “personal” posts in order to find meaningful global events.
20. Conclusions
A new topic model that considers both the temporal information of microblog posts and users’ personal interests.
A Poisson-based state machine to identify bursty periods from the topics discovered by our model.
22. ABSTRACT (TM-LDA)
TM-LDA learns the transition parameters among topics by minimizing the prediction error on topic distributions in subsequent postings.
We develop an efficient updating algorithm to adjust the transition parameters as new documents stream in.
23.
Challenges:
1. To model and analyze latent topics in social textual data;
2. To adaptively update the models as massive social content streams in;
3. To facilitate temporal-aware applications of social media.
24. Contributions
First, we propose a novel temporally-aware topic language model, TM-LDA, which captures the latent topic transitions in temporally-sequenced documents.
Second, we design an efficient algorithm to update TM-LDA, which enables it to be performed on large-scale data.
Finally, we evaluate TM-LDA against the static topic modeling method (LDA).
25. METHODOLOGY
TM-LDA Algorithm:
If we define the space of topic distributions as X = {x ∈ ℝ₊ⁿ : ‖x‖₁ = 1}, TM-LDA can be considered as a function f : X → X, mapping the topic distribution of one document to the predicted topic distribution of the next. The transition parameters are learned by minimizing the prediction error of this mapping, and TM-LDA is modeled as a non-linear mapping (see the sketch below).
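The slide leaves the mapping implicit. Assuming, as in the TM-LDA paper, that the prediction is the product of the current topic distribution with the transition matrix T, re-normalized back onto the simplex X (the normalization is what makes f non-linear), a sketch:

```python
import numpy as np

def predict_next(x, T):
    """TM-LDA mapping f : X -> X (sketch).

    x : topic distribution of the current document (x >= 0, sum(x) == 1)
    T : learned topic-transition matrix, shape (n_topics, n_topics)
    """
    y = x @ T
    return y / y.sum()   # L1-normalize so the output stays in X
```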
33. Updating Transition Parameters with QR-factorization
Suppose the QR-factorization of matrix A is A = QR, where QᵀQ = I and R is an upper triangular matrix. The transition matrix T then satisfies RT = QᵀB.
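In this setup, each row of A is the topic distribution of a tweet and the corresponding row of B is the topic distribution of the tweet that follows it (our reading of the paper), and T solves the least-squares problem min_T ‖AT − B‖_F. A sketch of the batch solve via the slide's factorization:

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

def fit_transition(A, B):
    """Solve min_T ||A T - B||_F using the QR route from the slide.

    A : (n_pairs x n_topics) topic distributions of earlier tweets
    B : (n_pairs x n_topics) topic distributions of the following tweets
    """
    Q, R = qr(A, mode='economic')        # A = QR, Q'Q = I, R upper triangular
    return solve_triangular(R, Q.T @ B)  # back-substitute R T = Q'B
```

The efficiency claim in the abstract comes from updating the Q and R factors incrementally as new document pairs stream in, rather than refactorizing A from scratch; the sketch above shows only the batch solve.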
35. Predicting Future Tweets
TM-LDA first trains LDA on 7 days of historical tweets and computes the transition parameter matrix accordingly. Then, for each new tweet generated on the 8th day, it predicts the topic distribution of the following tweet.
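End to end, the experiment on this slide might look like the following sketch, reusing fit_transition and predict_next from above (A_train, B_train, and infer_topics are illustrative placeholders for the LDA outputs):

```python
# A_train, B_train: (n_pairs x n_topics) topic distributions of consecutive
# tweet pairs, inferred by LDA on the 7-day training window
T = fit_transition(A_train, B_train)

# for each new tweet on the 8th day, predict the following tweet's topics
x_new = infer_topics(new_tweet)   # hypothetical LDA inference step
x_next = predict_next(x_new, T)
```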
36.
Three topic distributions are compared:
Estimated Topic Distributions of “Future” Tweets: the topic distribution of tweet b as predicted by TM-LDA.
LDA Topic Distributions of “Future” Tweets: the inferred topic distribution of tweet b.
LDA Topic Distributions of “Previous” Tweets: the inferred topic distribution of tweet a.
38. Properties of Transition Parameters
T is a square matrix whose size is determined by the number of topics trained in LDA.
Each row of T sums to 1, which means that the overall weight emitted from a topic is 1.
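A quick way to check, or enforce, this row-stochasticity on a learned T (illustrative; whether the paper clips negative entries of the least-squares solution this way is our assumption):

```python
import numpy as np

def row_normalize(T):
    """Make every row of T a distribution over next topics."""
    T = np.maximum(T, 0.0)                  # assumption: clip negative entries
    return T / T.sum(axis=1, keepdims=True)

# sanity check on the property stated above:
# assert np.allclose(row_normalize(T).sum(axis=1), 1.0)
```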
43. CONCLUSIONS
A novel temporally-aware language model, TM-LDA, for efficiently modeling streams of social text, such as the Twitter stream of an author.
An efficient model-updating algorithm for TM-LDA.