Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams

Data Bridges Workshop, Inria, Paris, April 12th 2012

Fabian Abel, Claudia Hauff, Geert-Jan Houben,
Richard Stronkman, Ke Tao
Web Information Systems, TU Delft, the Netherlands

Delft University of Technology

200,000,000
number of tweets published per day


Pukkelpop 2011

People tweet about everything,
everywhere :-)


200,000,000
Pukkelpop 2011
became a tragedy

Filtering

81,000 tweets in four hours

Search &
Browsing

Challenges
1. (Automatic) Filtering: Given a topic (e.g. expressed via
some keywords), how can one automatically identify
those tweets that are relevant to the topic?

2. Search & Browsing: How can one improve search and
browsing capabilities so that users can explore
information in the streams of tweets (that are relevant for
a topic)?
Twinder: filtering and search framework
[Diagram: Twitter streams → Filtering (topic) → Search & Browsing (information need)]

[Diagram: Twitter streams → Filtering → Search & Browsing]

1. Filtering of Twitter streams


Filtering on Twitter
Query:
www2012

Typical approach:
Keyword-based
matching

Are there further features that can be used as
indicators for estimating the relevance of a tweet
for a topic?


Syntactical feature: hashtags
Is a tweet more relevant if it contains a #hashtag?

Hypothesis: tweets that contain hashtags are more likely
to be relevant than tweets that do not contain hashtags.

#hashtag


Syntactical feature: URLs
Is a tweet that contains a URL more relevant?

Hypothesis: tweets that contain a URL are more likely to
be relevant than tweets that do not contain a URL.

URL


Syntactical feature: “mentions”
Is a tweet that mentions @somebody more relevant?

Hypothesis: tweets that are formulated as a reply to another
tweet are less likely to be relevant than other tweets.

Reply

@mention


Syntactical feature: length
Does the length of a tweet influence its relevance for a topic?

54 characters (9 words)

vs.
140 characters (20 words)

Hypothesis: the longer a tweet, the more likely it is to be
relevant and interesting.
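
The four syntactical features above can be computed directly from the tweet text. The following is a minimal sketch, not the presenters' implementation: the regular expressions are simplifications, and treating a tweet that starts with an @mention as a reply is an assumption (the slides do not say how replies are detected).

```python
import re

# Simplified patterns; real tweets may need more careful tokenization.
HASHTAG = re.compile(r"#\w+")
URL = re.compile(r"https?://\S+")


def syntactical_features(text: str) -> dict:
    """Compute topic-insensitive syntactical features of a tweet text."""
    return {
        "hasHashtag": bool(HASHTAG.search(text)),
        "hasURL": bool(URL.search(text)),
        # Assumption: a reply is approximated as a tweet starting with "@".
        "isReply": text.lstrip().startswith("@"),
        # Length measured in characters.
        "length": len(text),
    }


example = "@bob see you at #www2012 http://example.org"
features = syntactical_features(example)
```

None of these features depends on the query, which is why they are later grouped as topic-insensitive.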


Overview of features
Topic-sensitive and topic-insensitive features

Topic-sensitive: keyword-based relevance
Topic-insensitive: syntactical features

What about the semantics?


Semantic features: number of entities
Find semantics in a tweet to estimate the relevance

dbp:Tim_Berners-Lee dbp:World_Wide_Web

dbp:WWW_Conference dbp:France

dbp:Lyon

Hypothesis: the more entities a tweet mentions, the more
likely it is to be relevant and interesting.


Semantic features: diversity
The types of entities that are featured by a tweet matter

Entity types: Person, Thing, Event, Place, Place
vs.
"I plan to visit Paris, Bordeaux, Grenoble, Nice, Marseille and Lyon."
Entity types: Place, Place, Place, Place, Place, Place

Hypothesis: the higher the diversity of entities that are
mentioned in a tweet, the more likely it is to be relevant.
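
The two semantic features (number of entities and diversity) can be sketched as follows, assuming entity extraction has already produced (entity, type) pairs for each tweet. The type assigned to each entity below is an assumption made for illustration, and measuring diversity as the number of distinct entity types is one plausible reading of the slide rather than a confirmed definition.

```python
def entity_features(entities):
    """entities: list of (entity, entity_type) pairs extracted from one tweet."""
    distinct_types = {entity_type for _, entity_type in entities}
    return {
        "numEntities": len(entities),      # how many entities the tweet mentions
        "diversity": len(distinct_types),  # how many different entity types occur
    }


# Entity names follow the slides; the type labels are illustrative assumptions.
conference_tweet = [
    ("dbp:Tim_Berners-Lee", "Person"),
    ("dbp:World_Wide_Web", "Thing"),
    ("dbp:WWW_Conference", "Event"),
    ("dbp:France", "Place"),
    ("dbp:Lyon", "Place"),
]
travel_tweet = [
    (city, "Place")
    for city in ["Paris", "Bordeaux", "Grenoble", "Nice", "Marseille", "Lyon"]
]

conference = entity_features(conference_tweet)
travel = entity_features(travel_tweet)
```

Under these assumptions the travel tweet mentions more entities (6 versus 5) but is less diverse (1 type versus 4), which is the contrast the two hypotheses distinguish.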


Semantic features: sentiment
Opinions expressed in tweets are interesting

:-) "Looking forward to the WWW conference :-) Yes!"
vs.
neutral "I plan to visit Paris, Bordeaux, Grenoble, Nice, Marseille and Lyon."
vs.
:-( "Why are the big players not releasing query logs to the WWW community? :-( #fail"
Hypothesis: the likelihood of a tweet’s relevance is
influenced by its sentiment polarity.
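
As a rough illustration of a three-way polarity feature, the sketch below counts emoticons only. This is a crude stand-in chosen for brevity; the slides do not say how sentiment is actually detected, and the emoticon lists are assumptions.

```python
# Assumed emoticon lists; a real sentiment analyzer would use far richer cues.
POSITIVE = (":-)", ":)")
NEGATIVE = (":-(", ":(")


def emoticon_polarity(text: str) -> int:
    """Return 1 (positive), -1 (negative) or 0 (neutral) based on emoticons."""
    positive = sum(text.count(e) for e in POSITIVE)
    negative = sum(text.count(e) for e in NEGATIVE)
    if positive > negative:
        return 1
    if negative > positive:
        return -1
    return 0


tweets = [
    "Looking forward to the WWW conference :-) Yes!",
    "I plan to visit Paris, Bordeaux, Grenoble, Nice, Marseille and Lyon.",
    "Why are the big players not releasing query logs to the WWW community? :-( #fail",
]
polarities = [emoticon_polarity(t) for t in tweets]
```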


Semantic relatedness
Exploit semantics to relate query with tweets

dbp:International_World_Wide_Web_Conference

dbp:Tim_Berners-Lee


Overview of features
By now, we have 4 types of features.

Keyword-based Syntactical
Semantic-based Semantics
Context? Context?

What kind of contextual features
might be helpful?

Contextual feature: authority of the publisher
It matters who published a tweet

Hypothesis: the higher the number of tweets that have
been published by the creator of a tweet, the more likely
it is that the tweet is relevant.


Contextual feature: time w.r.t. query
When was a tweet published?
Hypothesis: the lower the temporal distance between the
query time and the creation time of a tweet, the more likely
the tweet is to be relevant to the topic.

[Timeline: tweet published March 31; query issued April 16]
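
The temporal feature can be sketched as the distance between the query time and the tweet's creation time. The slide gives only the dates March 31 and April 16, so the year used below is an assumption, as is the choice of days as the unit.

```python
from datetime import datetime


def temporal_distance_days(tweet_time: datetime, query_time: datetime) -> float:
    """Distance in days between the query time and the tweet's creation time."""
    return (query_time - tweet_time).total_seconds() / 86400.0


# The year is an assumption; the slide shows only "March 31" and "April 16".
tweet_time = datetime(2012, 3, 31)
query_time = datetime(2012, 4, 16)
distance = temporal_distance_days(tweet_time, query_time)
```

Under the hypothesis above, a smaller distance would make the tweet more likely to be relevant.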


Summary of Features

Semantic-based Semantics
Context-based Context


Results
Achieved for the TREC Microblog Challenge

Features            Precision  Recall  F-measure
keyword relevance   0.3040     0.2924  0.2981
semantic relevance  0.3053     0.2931  0.2991
without semantics   0.3363     0.4828  0.3965
all features        0.3674     0.4736  0.4138

Overall, we can achieve precision and recall of over
35% and 45%, respectively, by applying all the
features.
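
The F-measure values reported above are consistent with the balanced F-measure, the harmonic mean of precision and recall. A short check on two rows of the table (the function name is our own):

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs taken from the results table.
rows = {
    "keyword relevance": (0.3040, 0.2924),
    "all features": (0.3674, 0.4736),
}
scores = {name: round(f_measure(p, r), 4) for name, (p, r) in rows.items()}
```

The computed scores match the reported values of 0.2981 and 0.4138.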


Importance of Features
[Bar charts of feature importance per feature group; y-axis from -1 to 2]
Topic-sensitive:
  Keyword-based: keyword-based relevance
  Semantic-based: relevance, relatedness
  Context-based: temporal context
Topic-insensitive:
  Syntactical: hasHashtag, hasURL, isReply, length
  Semantics: #entities, diversity, sentiment
  Context: social context

Semantic relatedness, URLs, !isReply, diversity and
sentiment are good indicators for estimating the
relevance of a tweet.
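
To illustrate how such features could be combined into a single relevance estimate, here is a minimal sketch of a logistic scoring function. The slides do not state which model was used, and every weight below is hypothetical: only the signs are chosen to mirror the slide's conclusion (relatedness, URLs and diversity count for relevance, replies against it).

```python
import math

# Hypothetical weights: not taken from the slides, signs only follow the summary.
HYPOTHETICAL_WEIGHTS = {
    "semanticRelatedness": 1.0,
    "hasURL": 1.0,
    "isReply": -1.0,
    "diversity": 0.5,
}
HYPOTHETICAL_BIAS = -1.0


def relevance_probability(features: dict) -> float:
    """Map a feature dictionary to a score in (0, 1) with a logistic function."""
    z = HYPOTHETICAL_BIAS + sum(
        weight * float(features.get(name, 0.0))
        for name, weight in HYPOTHETICAL_WEIGHTS.items()
    )
    return 1.0 / (1.0 + math.exp(-z))


with_url = relevance_probability({"hasURL": True, "diversity": 2})
reply = relevance_probability({"isReply": True})
```

In practice the weights would be learned from labeled data rather than set by hand.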


[Diagram: Twitter streams → Filtering → Search & Browsing]

2. Search & Browsing in Twitter Streams


Idea: Faceted Search

Current Query: Eindhoven Music

Expand Query:
Suggestions:
+ Guilty Simpson
+ Area51
Locations more...
Events more...
Music Artists:
+ Guilty Simpson
+ Bryan Adams
+ Elton John
+ Golden Earring
more...

Results:
1. Yskiddd: Next saturday @thatsimpsonguy aka Guilty Simpson will be performing at Area51 in my homeytown Eindhoven. #realliveshit #iwillspinrecords2
2. Usee123: Cool #EV3door7980 !!! http://bit.ly/igyyRhL
3. sanmiquelmusic: This Saturday I'm joining @KrusadersMusic to Intents
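
The query-expansion idea in this mock-up can be sketched as follows: each tweet is represented as a set of facet-value pairs, tweets matching the current query are selected, and the remaining facet values are suggested in order of frequency. This is a non-personalized baseline under assumed data; the facet names and the frequency ranking are illustrative and are not claimed to be the presenters' ranking strategy.

```python
from collections import Counter


def suggest_facet_values(tweets, current_query):
    """Rank facet-value pairs that co-occur with the current query.

    tweets: list of sets of (facet, value) pairs, one set per tweet.
    current_query: set of (facet, value) pairs already selected.
    """
    counts = Counter()
    for pairs in tweets:
        if current_query <= pairs:  # tweet matches every selected facet value
            counts.update(pairs - current_query)
    return [pair for pair, _ in counts.most_common()]


# Hypothetical enriched tweets; facet names are assumptions for illustration.
tweets = [
    {("Location", "Eindhoven"), ("Topic", "Music"),
     ("Music Artist", "Guilty Simpson"), ("Location", "Area51")},
    {("Location", "Eindhoven"), ("Topic", "Music"),
     ("Music Artist", "Guilty Simpson")},
    {("Location", "Eindhoven"), ("Music Artist", "Elton John")},
]
query = {("Location", "Eindhoven"), ("Topic", "Music")}
suggestions = suggest_facet_values(tweets, query)
```

An adaptive variant would reorder these suggestions using the user and context model instead of raw frequency.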

Adaptive Faceted Search
[Architecture diagram, top to bottom: user; Adaptive Faceted Search; User and Context Modeling; Semantic Enrichment; Twitter posts]
• facet extraction
• query suggestions
How to represent the content of a tweet?
How to adapt the facet-value pair ranking to the current demands of the user?

Facet Extraction and Semantic Enrichment
Tweet-based enrichment:
  Tweet: "@bob: Julian Assange got arrested"
  Extracted entity: Julian Assange
Link-based enrichment:
  Linked article: "Julian Assange arrested. Julian Assange, the founder of WikiLeaks, is under arrest in London…"
  Extracted entities: Julian Assange, WikiLeaks, London

Faceted-search vs. hashtag-based
(keyword) search
Faceted search based on
semantic enrichment of
tweets outperforms
hashtag-based search
significantly.


Impact of link-based enrichment
Personalized strategy
outperforms baseline
significantly

Link-based enrichment
improves quality for both
strategies


Twitcident application

[Diagram: Twitter streams → Filtering → Search & Browsing]

Twitcident: Applying filter & search functionality
for distilling information from Twitter during
incidents (e.g. fires, extreme weather situations)


Search &
Browsing

Automatic
Filtering

Twitcident Pipeline

Faceted Search

Filtered Twitter stream

Real-time visualizations

Could we see it coming?

[Chart annotations: "Popular artist made a joke about the weather"; "Impact storm"]

Term usage 25 minutes before the incident

1. heavy weather, hail balls, lightning, pitch black…
2. drama, panic, hell, serious, extreme…


Spotting eyewitnesses

Real-time information from eyewitnesses

Summary
Automatic Filtering of Tweets: [#MSM@WWW ’12]
• Topic-sensitive and topic-insensitive features
• Semantic features (semantic relatedness, diversity, sentiment)
are beneficial
Search and browsing: [ISWC ’11]
• Faceted Search
• Personalization & contextualization help
Application: [Hypertext ’12, Demo@WWW ’12]
• Twitcident: fulfilling information needs during incidents
Future work:
• Weak signal detection based on tweets
• Duplicate detection

Thank you!

@fabianabel
http://wis.ewi.tudelft.nl/

