FINDING BURSTY TOPICS
FROM MICROBLOGS
Qiming Diao, Jing Jiang, Feida Zhu, Ee-Peng Lim
Living Analytics Research Centre
School of Information Systems
Singapore Management University
Abstract
Goal: to find topics that have bursty patterns on microblogs.
Two observations:
1. Posts published around the same time are more likely to have the same topic.
2. Posts published by the same user are more likely to have the same topic.
Introduction
Retrospective bursty event detection:
- Burst detection: state machine
- Topic discovery: LDA
Two assumptions:
1. If a post is about a global event, it is likely to follow a global topic distribution that is time-dependent.
2. If a post is about a personal topic, it is likely to follow a personal topic distribution that is more or less stable over time.
Method
Preliminaries
- Each post d_i has a user u_i, a timestamp t_i, and words w_{i,j}.
- We define a bursty topic b as a word distribution coupled with a bursty interval, denoted as (ϕ^b, t^b_s, t^b_e).
Our task: to find meaningful bursty topics from the input text stream.
Our method: a topic discovery step and a burst detection step.
Our Topic Model
Assume:
1. There are C (latent) topics in the text stream, where each topic c has a word distribution ϕ_c.
2. There is a background word distribution ϕ_B.
3. A single post is most likely to be about a single topic.
4. There is a global topic distribution θ_t for each time point t.
Because our focus is on popular global events, we need to separate out "personal" posts.
A time-independent topic distribution η_u for each user captures her long-term topical interests.
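The assumptions above can be read as a generative story for one post. The following is a minimal sketch of that reading, not the authors' implementation: the switch probabilities `p_global` and `p_background` are hypothetical parameters introduced here for illustration, since the slides do not say how a post chooses between the global and personal distributions or how a word chooses between a topic and the background.

```python
import random

def sample_post(theta_t, eta_u, phi, phi_B, n_words,
                p_global=0.5, p_background=0.3, rng=random):
    """Sample one post under the slide's assumptions.

    theta_t: global topic distribution at the post's time point
    eta_u:   the author's time-independent topic distribution
    phi:     list of per-topic word distributions
    phi_B:   background word distribution
    p_global, p_background: hypothetical switch probabilities (assumption)
    """
    # Decide whether the post follows the global or the personal distribution.
    use_global = rng.random() < p_global
    topic_dist = theta_t if use_global else eta_u
    # A single post is about a single topic c.
    c = rng.choices(range(len(topic_dist)), weights=topic_dist)[0]
    words = []
    for _ in range(n_words):
        # Each word comes either from the background or from topic c.
        dist = phi_B if rng.random() < p_background else phi[c]
        words.append(rng.choices(range(len(dist)), weights=dist)[0])
    return use_global, c, words
```

Drawing one topic per post, rather than one per word as in standard LDA, mirrors assumption 3.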
Learning

Burst Detection
Assume:
- A series of counts (m^c_1, m^c_2, ..., m^c_T) represents the intensity of topic c at different time points.
- These counts are generated by two Poisson distributions corresponding to a bursty state and a normal state.
σ_0 = 0.9 and σ_1 = 0.6 for all topics.
Finally, a burst is marked by a consecutive subsequence of bursty states.
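A two-state Poisson state machine of this kind can be decoded with the Viterbi algorithm. The sketch below is illustrative only and rests on several assumptions the slides do not confirm: σ_0 and σ_1 are treated as the probabilities of staying in the normal and bursty state, the normal-state Poisson mean is the average count, the bursty-state mean is `burst_factor` times that (a hypothetical parameter), and both states are equally likely at the first time point.

```python
import math

def poisson_logpmf(k, mu):
    """Log probability of count k under a Poisson distribution with mean mu."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)

def detect_bursts(counts, sigma0=0.9, sigma1=0.6, burst_factor=3.0):
    """Return (states, bursts); state 1 is bursty, bursts are (start, end) index pairs."""
    T = len(counts)
    mu0 = max(sum(counts) / T, 1e-9)       # normal-state mean (assumption)
    mu = (mu0, burst_factor * mu0)         # bursty-state mean (assumption)
    log_trans = [[math.log(sigma0), math.log(1 - sigma0)],
                 [math.log(1 - sigma1), math.log(sigma1)]]
    # Viterbi forward pass with a uniform initial state distribution.
    score = [math.log(0.5) + poisson_logpmf(counts[0], mu[s]) for s in (0, 1)]
    back = []
    for t in range(1, T):
        new_score, ptr = [], []
        for s in (0, 1):
            prev = max((0, 1), key=lambda p: score[p] + log_trans[p][s])
            ptr.append(prev)
            new_score.append(score[prev] + log_trans[prev][s]
                             + poisson_logpmf(counts[t], mu[s]))
        score = new_score
        back.append(ptr)
    # Backtrack the most likely state sequence.
    state = 0 if score[0] >= score[1] else 1
    states = [state]
    for ptr in reversed(back):
        state = ptr[state]
        states.append(state)
    states.reverse()
    # A burst is a maximal run of consecutive bursty states.
    bursts, start = [], None
    for t, s in enumerate(states):
        if s == 1 and start is None:
            start = t
        elif s == 0 and start is not None:
            bursts.append((start, t - 1))
            start = None
    if start is not None:
        bursts.append((start, T - 1))
    return states, bursts

states, bursts = detect_bursts([2, 3, 2, 40, 45, 38, 2, 3, 2, 1])
print(bursts)  # [(3, 5)]
```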
Experiments
Data Set
- We sampled 2,892 users from this dataset and extracted their tweets between September 1 and November 30, 2011 (91 days in total).
- The final dataset contains 3,967,927 tweets and 24,280,638 tokens.
Ground Truth Generation
- The top-30 bursty topics were taken from each model.
- Two human judges assessed their quality by assigning a score of either 0 or 1.
Evaluation
- We set the number of topics C to 80, α to 50/C, and β to 0.01. Each model was run for 500 iterations of Gibbs sampling.
Sample Results and Discussions
Two case studies demonstrate the effectiveness of our model:
- Effectiveness of temporal models: both TimeLDA and TimeUserLDA tend to group posts published on the same day into the same topic.
- Effectiveness of user models: it is important to filter out users' "personal" posts in order to find meaningful global events.
Conclusions
- A new topic model that considers both the temporal information of microblog posts and users' personal interests.
- A Poisson-based state machine to identify bursty periods from the topics discovered by our model.
TM-LDA: EFFICIENT
ONLINE MODELING OF
THE LATENT TOPIC
TRANSITIONS IN SOCIAL
MEDIA
ABSTRACT
- TM-LDA learns the transition parameters among topics by minimizing the prediction error on topic distribution in subsequent postings.
- We develop an efficient updating algorithm to adjust transition parameters as new documents stream in.
Challenges:
1. To model and analyze latent topics in social textual data;
2. To adaptively update the models as the massive social content streams in;
3. To facilitate temporal-aware applications of social media.
CONTRIBUTION
- First, we propose a novel temporally-aware topic language model, TM-LDA, which captures the latent topic transitions in temporally-sequenced documents.
- Second, we design an efficient algorithm to update TM-LDA, which enables it to be performed on large-scale data.
- Finally, we evaluate TM-LDA against the static topic modeling method (LDA).
METHODOLOGY
TM-LDA Algorithm
- If we define the space of topic distributions as X = { x ∈ R^n_+ : ||x||_1 = 1 }, TM-LDA can be considered as a function f : X → X.
- The prediction error
- TM-LDA is modeled as a non-linear mapping:
Error Function of TM-LDA:
Iterative Minimization of the Error Function
Direct Minimization of the Error Function
TM-LDA for Twitter Stream
Let A = D(1; m) and B = D(2; m+1).
UPDATING TRANSITION PARAMETERS
Updating Transition Parameters with Sherman-Morrison-Woodbury Formula
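The equations for this slide did not survive extraction. For orientation, here is the standard Sherman-Morrison-Woodbury identity, followed by one plausible way it applies when the transition matrix is the least-squares solution T = (A′A)⁻¹A′B. The second equation is an assumption about the use case (appending k new rows A_new to A), not a reconstruction of the authors' derivation.

```latex
% Sherman-Morrison-Woodbury identity (standard result)
(M + U C V)^{-1} = M^{-1} - M^{-1} U \left(C^{-1} + V M^{-1} U\right)^{-1} V M^{-1}

% Assumed use: with M = A^{\top}A, U = A_{\text{new}}^{\top}, C = I_k, V = A_{\text{new}}
\left(A^{\top}A + A_{\text{new}}^{\top}A_{\text{new}}\right)^{-1}
  = G - G A_{\text{new}}^{\top}\left(I_k + A_{\text{new}} G A_{\text{new}}^{\top}\right)^{-1} A_{\text{new}} G,
\qquad G = (A^{\top}A)^{-1}
```

Under this reading, only a k × k matrix has to be inverted per update, which is cheap when few new documents arrive at a time.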


Updating Transition Parameters with QR-factorization
- Suppose the QR-factorization of matrix A is A = QR, where Q′Q = I and R is an upper triangular matrix. Then RT = Q′B.
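The relation RT = Q′B is the usual QR route to a least-squares solution of AT ≈ B. The sketch below illustrates it in plain Python under stated assumptions: rows of A are topic distributions of documents and rows of B are those of their successors, A has full column rank, and the `predict` step, which renormalizes xT to unit L1 norm, is an assumed form of the non-linear mapping (the slide's own formula was lost). All function names are hypothetical.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    Yt = transpose(Y)
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]

def qr_gram_schmidt(A):
    """Thin QR by modified Gram-Schmidt; assumes A has full column rank."""
    cols = transpose(A)
    n = len(cols)
    Q_cols = []
    R = [[0.0] * n for _ in range(n)]
    for j, v in enumerate(cols):
        v = list(v)
        for i, q in enumerate(Q_cols):
            # Remove the component of v along each earlier orthonormal column.
            R[i][j] = sum(a * b for a, b in zip(q, v))
            v = [a - R[i][j] * b for a, b in zip(v, q)]
        R[j][j] = sum(a * a for a in v) ** 0.5
        Q_cols.append([a / R[j][j] for a in v])
    return transpose(Q_cols), R

def solve_upper(R, Y):
    """Back substitution for R T = Y with R upper triangular."""
    n, k = len(R), len(Y[0])
    T = [[0.0] * k for _ in range(n)]
    for i in reversed(range(n)):
        for c in range(k):
            s = Y[i][c] - sum(R[i][j] * T[j][c] for j in range(i + 1, n))
            T[i][c] = s / R[i][i]
    return T

def fit_transition(A, B):
    """Solve R T = Q'B, the least-squares solution of A T ≈ B."""
    Q, R = qr_gram_schmidt(A)
    return solve_upper(R, matmul(transpose(Q), B))

def predict(x, T):
    """Assumed mapping: x T, renormalized so the result is a distribution."""
    y = matmul([x], T)[0]
    total = sum(abs(v) for v in y)
    return [v / total for v in y]
```

A production version would more likely call a linear algebra library; the hand-written factorization is used here only to keep the example dependency-free.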
EXPERIMENTS


Dataset



Using Perplexity as Evaluation Metric
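Perplexity for a topic model is conventionally the exponential of the negative average per-token log-likelihood. A minimal sketch follows, assuming each document's word probability is the mixture Σ_k θ_k ϕ_k(w); the function and argument names are hypothetical and the slides do not give the exact formula used.

```python
import math

def perplexity(docs, thetas, phi):
    """docs: lists of word ids; thetas: per-document topic distributions;
    phi: per-topic word distributions. Lower is better."""
    log_lik, n_tokens = 0.0, 0
    for words, theta in zip(docs, thetas):
        for w in words:
            # Probability of word w under the document's topic mixture.
            p = sum(theta[k] * phi[k][w] for k in range(len(theta)))
            log_lik += math.log(p)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)
```

As a sanity check, a model that spreads probability uniformly over V words has perplexity V.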
Predicting Future Tweets

TM-LDA first trains LDA on 7 days of historical tweets and computes the transition parameter matrix accordingly. Then, for each new tweet generated on the 8th day, it predicts the topic distribution of the following tweet.






- Estimated Topic Distributions of "Future" Tweets: the topic distribution of tweet b as estimated by TM-LDA.
- LDA Topic Distributions of "Future" Tweets: the inferred topic distribution of tweet b.
- LDA Topic Distributions of "Previous" Tweets: the inferred topic distribution of tweet a.
Efficiency of Updating Transition Parameters
Properties of Transition Parameters
- T is a square matrix whose size is determined by the number of topics trained in LDA.
- Each row of T sums to 1, which means that the overall weight emitted from a topic is 1.
APPLYING TM-LDA FOR TREND ANALYSIS AND SENSEMAKING
Changing Topic Transitions over Time
Various Topic Transition Patterns by Cities
CONCLUSIONS
- A novel temporally-aware language model, TM-LDA, for efficiently modeling streams of social text such as a Twitter stream for an author.
- An efficient model updating algorithm for TM-LDA.

Finding bursty topics from microblogs

  • 1. FINDING BURSTY TOPICS FROM MICROBLOGS Qiming Diao, Jing Jiang, Feida Zhu, Ee-Peng Lim Living Analytics Research Centre School of Information Systems Singapore Management University
  • 2. Abstract. Goal: to find topics that have bursty patterns on microblogs. Two observations: (1) posts published around the same time are more likely to have the same topic; (2) posts published by the same user are more likely to have the same topic.
  • 3. Introduction. Retrospective bursty event detection: burst detection with a state machine; topic discovery with LDA. Two assumptions: (1) If a post is about a global event, it is likely to follow a global topic distribution that is time-dependent. (2) If a post is about a personal topic, it is likely to follow a personal topic distribution that is more or less stable over time.
  • 4. Method. Preliminaries: each post d_i has a user u_i, a timestamp t_i, and words w_{i,j}. A bursty topic b is defined as a word distribution coupled with a bursty interval, denoted as (ϕ_b, t_s^b, t_e^b). Our task: to find meaningful bursty topics from the input text stream. Our method: a topic discovery step and a burst detection step.
  • 5. Our Topic Model. Assume: (1) C (latent) topics in the text stream, where each topic c has a word distribution ϕ_c. (2) A background word distribution ϕ_B. (3) A single post is most likely to be about a single topic. (4) A global topic distribution θ_t for each time point t.
  • 6. Because our focus is to find popular global events, we need to separate out "personal" posts. A time-independent topic distribution η_u for each user captures her long-term topical interests.
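The assumptions on slides 5 and 6 can be read as a generative story for a single post. The following is a minimal Python sketch of that reading, not the authors' implementation: the function name `sample_post`, the switch probabilities `p_global` and `p_background`, and the use of dictionaries for word distributions are all assumptions introduced here for illustration.

```python
import random


def sample_post(theta_t, eta_u, phi, phi_b, n_words,
                p_global=0.5, p_background=0.3, rng=random):
    """Sample one post as (topic, words).

    theta_t : global topic distribution for the post's time point (list of C probs)
    eta_u   : time-independent topic distribution of the post's user (list of C probs)
    phi     : list of C word distributions, each a dict word -> probability
    phi_b   : background word distribution, dict word -> probability
    p_global, p_background : hypothetical switch probabilities (assumptions)
    """
    # Decide whether the post is about a global event or a personal topic.
    source = theta_t if rng.random() < p_global else eta_u
    # A single post is assumed to be about a single topic.
    topic = rng.choices(range(len(source)), weights=source)[0]
    words = []
    for _ in range(n_words):
        # Each word comes either from the background or from the post's topic.
        dist = phi_b if rng.random() < p_background else phi[topic]
        vocab = list(dist)
        words.append(rng.choices(vocab, weights=[dist[w] for w in vocab])[0])
    return topic, words
```

Passing `rng=random.Random(seed)` makes the sampling reproducible. Inference runs this story in reverse: given the observed words, users, and timestamps, it estimates the distributions that most plausibly produced them.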
  • 7.
  • 8.
  • 11. Burst Detection. Assume: a series of counts (m^c_1, m^c_2, ..., m^c_T) representing the intensity of topic c at different time points. These counts are generated by two Poisson distributions corresponding to a bursty state and a normal state.
  • 12. Burst Detection. σ_0 = 0.9 and σ_1 = 0.6 for all topics. Finally, a burst is marked by a consecutive subsequence of bursty states.
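The two-state Poisson state machine described on slides 11 and 12 can be sketched as a small Viterbi decoder. This is an illustrative sketch under stated assumptions, not the authors' code: it assumes σ_0 and σ_1 are the probabilities of staying in the normal and bursty state respectively, that the normal-state rate is the mean count, and that the bursty-state rate is a fixed multiple of it (`burst_ratio=3.0` is a hypothetical default). The slides do not state the Poisson rates.

```python
import math


def poisson_logpmf(k, mu):
    """Log-probability of observing count k under a Poisson with rate mu."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def detect_bursts(counts, burst_ratio=3.0, stay_normal=0.9, stay_bursty=0.6):
    """Return (states, intervals): the most likely 0/1 state per time point
    and the (start, end) index pairs of consecutive bursty states."""
    mu0 = max(sum(counts) / len(counts), 1e-9)  # normal-state rate (assumption)
    rates = (mu0, burst_ratio * mu0)            # bursty-state rate (assumption)
    log_trans = (
        (math.log(stay_normal), math.log(1 - stay_normal)),
        (math.log(1 - stay_bursty), math.log(stay_bursty)),
    )
    # Viterbi forward pass with a uniform prior over the initial state.
    score = [math.log(0.5) + poisson_logpmf(counts[0], r) for r in rates]
    back = []
    for m in counts[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            prev = max((0, 1), key=lambda p: score[p] + log_trans[p][s])
            pointers.append(prev)
            new_score.append(score[prev] + log_trans[prev][s]
                             + poisson_logpmf(m, rates[s]))
        score = new_score
        back.append(pointers)
    # Backtrack from the best final state.
    state = max((0, 1), key=lambda s: score[s])
    states = [state]
    for pointers in reversed(back):
        state = pointers[state]
        states.append(state)
    states.reverse()
    # A burst is a maximal run of consecutive bursty (1) states.
    intervals, start = [], None
    for t, s in enumerate(states):
        if s == 1 and start is None:
            start = t
        elif s == 0 and start is not None:
            intervals.append((start, t - 1))
            start = None
    if start is not None:
        intervals.append((start, len(states) - 1))
    return states, intervals
```

For example, `detect_bursts([2, 3, 2, 2, 30, 35, 28, 3, 2, 2])[1]` yields a single interval covering time points 4 to 6 (0-indexed). The asymmetric stay probabilities make it cheap to remain in the normal state and comparatively easy to leave the bursty one, which discourages long spurious bursts.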
  • 13. Experiments. Data set: sampled 2,892 users from this dataset and extracted their tweets between September 1 and November 30, 2011 (91 days in total). The final dataset has 3,967,927 tweets and 24,280,638 tokens.
  • 14. Ground truth generation: top-30 bursty topics from each model; two human judges judged their quality by assigning a score of either 0 or 1. Evaluation: We set the number of topics C to 80, α to 50/C and β to 0.01. Each model was run for 500 iterations of Gibbs sampling.
  • 15.
  • 18. Two case studies to demonstrate the effectiveness of our model. Effectiveness of temporal models: both TimeLDA and TimeUserLDA tend to group posts published on the same day into the same topic.
  • 19. Two case studies to demonstrate the effectiveness of our model. Effectiveness of user models: it is important to filter out users' "personal" posts in order to find meaningful global events.
  • 20. Conclusions. (1) A new topic model that considers both the temporal information of microblog posts and users' personal interests. (2) A Poisson-based state machine to identify bursty periods from the topics discovered by our model.
  • 21. TM-LDA: EFFICIENT ONLINE MODELING OF THE LATENT TOPIC TRANSITIONS IN SOCIAL MEDIA
  • 22. ABSTRACT. TM-LDA learns the transition parameters among topics by minimizing the prediction error on topic distribution in subsequent postings. We develop an efficient updating algorithm to adjust transition parameters as new documents stream in.
  • 23. Challenges: (1) to model and analyze latent topics in social textual data; (2) to adaptively update the models as the massive social content streams in; (3) to facilitate temporal-aware applications of social media.
  • 24. Contribution. First, we propose a novel temporally-aware topic language model, TM-LDA, which captures the latent topic transitions in temporally-sequenced documents. Second, we design an efficient algorithm to update TM-LDA which enables it to be performed on large-scale data. Finally, we evaluate TM-LDA against the static topic modeling method (LDA).
  • 25. METHODOLOGY. TM-LDA Algorithm: if we define the space of topic distributions as X = { x ∈ R^n_+ : ||x||_1 = 1 }, TM-LDA can be considered as a function f : X → X. the prediction error. TM-LDA is modeled as a non-linear mapping:
  • 27. Iterative Minimization of the Error Function
  • 28. Direct Minimization of the Error Function
  • 29.
  • 31. TM-LDA for Twitter Stream. Let A = D(1; m) and B = D(2; m+1).
  • 32. UPDATING TRANSITION PARAMETERS. Updating Transition Parameters with Sherman-Morrison-Woodbury Formula
  • 33. Updating Transition Parameters with QR-factorization. Suppose the QR-factorization of matrix A is A = QR, where Q′Q = I and R is an upper triangular matrix. Then RT = Q′B.
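The relation RT = Q′B on slide 33 can be turned into a small solver. The sketch below is illustrative only and assumes that T is the least-squares solution of AT ≈ B, which is consistent with that relation, and that A has full column rank. The function names `qr_gram_schmidt` and `solve_transition` are hypothetical, and the pure-Python Gram-Schmidt routine stands in for a library QR routine such as `numpy.linalg.qr`.

```python
def qr_gram_schmidt(a):
    """Thin QR of a full-column-rank matrix (list of rows) by modified
    Gram-Schmidt. Returns (q_cols, r), where q_cols[k] is column k of Q."""
    m, n = len(a), len(a[0])
    cols = [[a[i][j] for i in range(m)] for j in range(n)]
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = cols[j][:]
        for k in range(j):
            # Remove the component of v along the already-built column q_k.
            r[k][j] = sum(q_cols[k][i] * v[i] for i in range(m))
            v = [v[i] - r[k][j] * q_cols[k][i] for i in range(m)]
        r[j][j] = sum(x * x for x in v) ** 0.5
        q_cols.append([x / r[j][j] for x in v])
    return q_cols, r


def solve_transition(a, b):
    """Solve R T = Q'B for T by back substitution, where A = QR."""
    q_cols, r = qr_gram_schmidt(a)
    m, n, p = len(a), len(r), len(b[0])
    # Right-hand side Q'B, one row per column of Q.
    qtb = [[sum(q_cols[k][i] * b[i][c] for i in range(m)) for c in range(p)]
           for k in range(n)]
    t = [[0.0] * p for _ in range(n)]
    # R is upper triangular, so solve from the last row upward.
    for k in reversed(range(n)):
        for c in range(p):
            acc = qtb[k][c] - sum(r[k][j] * t[j][c] for j in range(k + 1, n))
            t[k][c] = acc / r[k][k]
    return t
```

Here each row of `a` would be the topic distribution of one tweet and the matching row of `b` the topic distribution of the tweet that follows it. Working with R and Q′B avoids explicitly forming and inverting A′A.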
  • 35. Predicting Future Tweets. TM-LDA first trains LDA on 7-day historical tweets and computes the transition parameter matrix accordingly. Then, for each new tweet generated on the 8th day, it predicts the topic distribution of the following tweet.
  • 36. Estimated Topic Distributions of "Future" Tweets: the topic distribution of the tweet b. LDA Topic Distributions of "Future" Tweets: the inferred topic distribution of the tweet b. LDA Topic Distributions of "Previous" Tweets: the inferred topic distribution of the tweet a.
  • 37. Efficiency of Updating Transition Parameters
  • 38. Properties of Transition Parameters. T is a square matrix where the size of T is determined by the number of topics trained in LDA. The row sum of T is always 1, which means that the overall weight emitted from a topic is 1.
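The row-sum property has a practical consequence that a short example makes concrete: if every row of T sums to 1, then multiplying a topic distribution by T gives a vector whose entries again sum to 1. The sketch below assumes that the mapping f from slide 25 is the product xT followed by L1 normalization; the function name `predict_next` and the example matrix are hypothetical.

```python
def predict_next(x, t):
    """Predict the next topic distribution as x T, renormalized to sum to 1."""
    raw = [sum(x[i] * t[i][j] for i in range(len(x))) for j in range(len(t[0]))]
    total = sum(raw)
    return [v / total for v in raw]


# Hypothetical 2-topic transition matrix whose rows each sum to 1.
T = [[0.8, 0.2],
     [0.3, 0.7]]
current = [0.9, 0.1]                 # topic distribution of the current tweet
predicted = predict_next(current, T)  # approximately [0.75, 0.25]
```

With a row-stochastic T the normalization step leaves the product unchanged, because the entries of xT already sum to the same total as the entries of x.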
  • 40.
  • 42. Various Topic Transition Patterns by Cities
  • 43. CONCLUSIONS. (1) A novel temporally-aware language model, TM-LDA, for efficiently modeling streams of social text such as a Twitter stream for an author. (2) An efficient model updating algorithm for TM-LDA.