Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams
1. Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams
Data Bridges Workshop, Inria, Paris, April 12th 2012
Fabian Abel, Claudia Hauff, Geert-Jan Houben,
Richard Stronkman, Ke Tao
Web Information Systems, TU Delft, the Netherlands
Delft University of Technology
2. 200,000,000
number of tweets published per day
3. Pukkelpop 2011
People tweet about everything,
everywhere :-)
5. Challenges
1. (Automatic) Filtering: Given a topic (e.g. expressed via
some keywords), how can one automatically identify
those tweets that are relevant to the topic?
2. Search & Browsing: How can one improve search and
browsing capabilities so that users can explore
information in the streams of tweets (that are relevant for
a topic)?
[Diagram: Twinder, a filtering and search framework. Inputs: Twitter streams and a topic (information need). Components: Filtering; Search & Browsing.]
6. Part 1: Filtering of Twitter streams
[Diagram: Twitter streams and a topic (information need) feed into Filtering and Search & Browsing.]
7. Filtering on Twitter
Query: www2012
Typical approach: keyword-based matching
Are there further features that can be used as
indicators for estimating the relevance of a tweet
for a topic?
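The keyword-based baseline can be sketched in a few lines. This is a minimal illustration, assuming that a tweet matches when every query keyword occurs as a token in the tweet; the function names `tokenize` and `keyword_match` and the second example tweet are hypothetical and not taken from the slides.

```python
import re

def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def keyword_match(query, tweet):
    """Return True if every query keyword occurs as a token in the tweet."""
    tweet_tokens = set(tokenize(tweet))
    return all(term in tweet_tokens for term in tokenize(query))

# Illustrative tweets (assumed examples, not real data).
tweets = [
    "The #www2012 conference will take place in Lyon",
    "I plan to visit Paris and Lyon",
]
matches = [t for t in tweets if keyword_match("www2012", t)]
print(matches)
```

Such a filter only tells whether the keywords occur; it says nothing about how relevant or interesting a matching tweet is, which motivates the additional features.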
8. Syntactical feature: hashtags
Is a tweet more relevant if it contains a #hashtag?
Hypothesis: tweets that contain hashtags are more likely
to be relevant than tweets that do not contain hashtags.
#Hashtag
9. Syntactical feature: URLs
Is a tweet that contains a URL more relevant?
Hypothesis: tweets that contain a URL are more likely to
be relevant than tweets that do not contain a URL.
URL
10. Syntactical feature: “mentions”
Is a tweet that mentions @somebody more relevant?
Hypothesis: tweets that are formulated as a reply to another
tweet are less likely to be relevant than other tweets.
Reply
@mention
11. Syntactical feature: length
Does the length of a tweet influence its relevance for a topic?
54 characters (9 words)
vs.
140 characters (20 words)
Hypothesis: the longer a tweet, the more likely it is to be
relevant and interesting.
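The four syntactical features (hashtags, URLs, replies, length) could be extracted roughly as follows. This is a sketch under assumptions: the regular expressions are rough heuristics, a reply is assumed to be a tweet that starts with an @mention, and length is measured in characters. The function name `syntactic_features` is hypothetical.

```python
import re

def syntactic_features(tweet):
    """Extract four topic-insensitive syntactical features from a tweet's text."""
    return {
        # Heuristic: a '#' followed by word characters (may also hit URL fragments).
        "hasHashtag": re.search(r"#\w+", tweet) is not None,
        # Heuristic: anything that starts with http:// or https://.
        "hasURL": re.search(r"https?://\S+", tweet) is not None,
        # Assumption: replies start with an @mention.
        "isReply": tweet.lstrip().startswith("@"),
        # Length in characters.
        "length": len(tweet),
    }

features = syntactic_features("@bob see http://example.org #www2012")
print(features)
```

Each feature is topic-insensitive: it can be computed once per tweet, independently of the query.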
12. Overview of features
Topic-sensitive and topic-insensitive features
Topic-sensitive            Topic-insensitive
Keyword-based relevance    Syntactical features
What about the semantics?
13. Semantic features: number of entities
Find semantics in a tweet to estimate the relevance
dbp:Tim_Berners-Lee dbp:World_Wide_Web
dbp:WWW_Conference dbp:France
dbp:Lyon
Hypothesis: the more entities a tweet mentions, the more
likely it is to be relevant and interesting.
14. Semantic features: diversity
The types of entities that are featured by a tweet matter
[Example: a tweet whose entities span several types (Event, Person, Thing, Place)]
vs.
"I plan to visit Paris, Bordeaux, Grenoble, Nice, Marseille and Lyon." [entity type: Place only]
Hypothesis: the higher the diversity of entities that are
mentioned in a tweet, the more likely it is to be relevant.
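The number-of-entities and diversity features can be illustrated with a small sketch. It assumes that an entity extraction step (not shown, and not specified on the slides) has already produced (entity, type) pairs, and that diversity is the number of distinct entity types. The function name `semantic_features` and the type assignments in the example are illustrative assumptions.

```python
def semantic_features(entities):
    """Compute two semantic features from (entity_uri, entity_type) pairs.

    The pairs are assumed to come from an external entity extraction
    service that is not part of this sketch.
    """
    distinct_entities = {uri for uri, _ in entities}
    distinct_types = {entity_type for _, entity_type in entities}
    return {"#entities": len(distinct_entities), "diversity": len(distinct_types)}

# Illustrative annotations (assumed, for demonstration only).
travel_tweet = [
    ("dbp:Paris", "Place"), ("dbp:Bordeaux", "Place"), ("dbp:Grenoble", "Place"),
    ("dbp:Nice", "Place"), ("dbp:Marseille", "Place"), ("dbp:Lyon", "Place"),
]
conference_tweet = [
    ("dbp:WWW_Conference", "Event"), ("dbp:Lyon", "Place"), ("dbp:France", "Place"),
]
print(semantic_features(travel_tweet))
print(semantic_features(conference_tweet))
```

The travel tweet mentions more entities but only one type, while the conference tweet mentions fewer entities of more types, which is the contrast the diversity hypothesis relies on.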
15. Semantic features: sentiment
Opinions expressed in tweets are interesting
:-) "Looking forward to the WWW conference :-) Yes!"
vs.
neutral: "I plan to visit Paris, Bordeaux, Grenoble, Nice, Marseille and Lyon."
vs.
:-( "Why are the big players not releasing query logs to the WWW community? :-( #fail"
Hypothesis: the likelihood of a tweet’s relevance is
influenced by its sentiment polarity.
16. Semantic relatedness
Exploit semantics to relate query with tweets
dbp:International_World_Wide_Web_Conference
dbp:Tim_Berners-Lee
17. Overview of features
By now, we have 4 types of features.
Topic-sensitive    Topic-insensitive
Keyword-based      Syntactical
Semantic-based     Semantics
Context?           Context?
What kind of contextual features
might be helpful?
18. Contextual feature: authority of the publisher
It matters who published a tweet
Hypothesis: the higher the number of tweets that have
been published by the creator of a tweet, the more likely
it is that the tweet is relevant.
19. Contextual feature: time w.r.t. query
When was a tweet published?
Hypothesis: the lower the temporal distance between the
query time and the creation time of a tweet, the more likely
the tweet is relevant to the topic.
[Timeline: tweet published on March 31; query issued on April 16]
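The temporal feature reduces to a time difference. The sketch below assumes the distance is measured in days and that the timeline example refers to the year 2012 (the slide gives only day and month); the function name `temporal_distance_days` is hypothetical.

```python
from datetime import datetime

def temporal_distance_days(query_time, tweet_time):
    """Absolute distance between query time and tweet creation time, in days."""
    return abs((query_time - tweet_time).total_seconds()) / 86400.0

# Dates from the timeline example; the year 2012 is an assumption.
tweet_time = datetime(2012, 3, 31)
query_time = datetime(2012, 4, 16)
distance = temporal_distance_days(query_time, tweet_time)
print(distance)  # 16.0
```

Under the hypothesis, a smaller distance would push a tweet's relevance estimate up.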
20. Summary of Features
Topic-sensitive    Topic-insensitive
Keyword-based      Syntactical
Semantic-based     Semantics
Context-based      Context
21. Results
Achieved for the TREC Microblog Challenge
Features             Precision   Recall   F-measure
keyword relevance    0.3040      0.2924   0.2981
semantic relevance   0.3053      0.2931   0.2991
without semantics    0.3363      0.4828   0.3965
all features         0.3674      0.4736   0.4138
Overall, applying all the features achieves a precision of
over 35% and a recall of over 45%.
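The reported F-measure values are consistent with the balanced F-measure (F1), the harmonic mean of precision and recall. The short check below assumes that definition; the function name `f_measure` is hypothetical, and the input numbers are taken from the results table.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall) pairs from the results table.
rows = {
    "keyword relevance": (0.3040, 0.2924),
    "all features": (0.3674, 0.4736),
}
for name, (p, r) in rows.items():
    print(f"{name}: {f_measure(p, r):.4f}")
```

With these inputs the computed values round to 0.2981 and 0.4138, matching the table.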
22. Importance of Features
[Bar charts of feature importance (vertical axis from -1 to 2), one panel per feature group:
Topic-sensitive, keyword-based: keyword-based relevance
Topic-sensitive, semantic-based: semantic relatedness
Topic-insensitive, syntactical: hasHashtag, hasURL, isReply, length
Topic-insensitive, semantics: #entities, diversity, sentiment
Context-based: temporal context, social context]
Semantic relatedness, URLs, !isReply, diversity and sentiment are good indicators for estimating the relevance of a tweet.
23. Part 2: Search & Browsing in Twitter Streams
[Diagram: Twitter streams and a topic (information need) feed into Filtering and Search & Browsing.]
24. Idea: Faceted Search
Current Query: Eindhoven Music
Expand Query:
Suggestions:
+ Guilty Simpson
+ Area51
Locations: more...
Events: more...
Music Artists:
+ Guilty Simpson
+ Bryan Adams
+ Elton John
+ Golden Earring
more...
Results:
1. Yskiddd: Next saturday @thatsimpsonguy aka Guilty Simpson will be performing at Area51 in my homeytown Eindhoven. #realliveshit #iwillspinrecords2
2. Usee123: Cool #EV3door7980 !!! http://bit.ly/igyyRhL
3. sanmiquelmusic: This Saturday I'm joining @KrusadersMusic to Intents
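The faceted search idea can be sketched as filtering on facet-value pairs and then suggesting further pairs to narrow the result set. This is a minimal illustration, not the system's implementation: it assumes each tweet already carries a set of (facet, value) pairs, and it ranks suggestions by plain frequency rather than by an adaptive or personalized strategy. All names and the sample data are hypothetical.

```python
from collections import Counter

def faceted_search(tweets, query):
    """Return tweets whose facets contain every (facet, value) pair of the query."""
    return [t for t in tweets if query <= t["facets"]]

def suggest_facet_values(results, query):
    """Rank facet-value pairs not yet in the query by how many results contain them."""
    counts = Counter(pair for t in results for pair in t["facets"] - query)
    return [pair for pair, _ in counts.most_common()]

# Made-up sample data with assumed facet assignments.
tweets = [
    {"user": "user1", "facets": {("Location", "Eindhoven"),
                                 ("Music Artist", "Guilty Simpson"),
                                 ("Location", "Area51")}},
    {"user": "user2", "facets": {("Location", "Eindhoven"),
                                 ("Music Artist", "Guilty Simpson")}},
    {"user": "user3", "facets": {("Location", "Amsterdam"),
                                 ("Music Artist", "Elton John")}},
]
query = {("Location", "Eindhoven")}
results = faceted_search(tweets, query)
suggestions = suggest_facet_values(results, query)
print([t["user"] for t in results])
print(suggestions)
```

Selecting a suggestion adds its pair to the query set, so each click narrows the results further.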
25. Adaptive Faceted Search
[Diagram: Adaptive Faceted Search architecture. Components: user, Adaptive Faceted Search (query suggestions), User and Context Modeling, Semantic Enrichment (facet extraction), Twitter posts.]
• How to represent the content of a tweet?
• How to adapt the facet-value pair ranking to the current demands of the user?
26. Facet Extraction and Semantic Enrichment
[Diagram: facet extraction via semantic enrichment]
Tweet-based enrichment: "@bob: Julian Assange got arrested" yields the entity Julian Assange.
Link-based enrichment: the linked article "Julian Assange arrested: Julian Assange, the founder of WikiLeaks, is under arrest in London…" yields the entities Julian Assange, WikiLeaks and London.
27. Faceted search vs. hashtag-based (keyword) search
Faceted search based on semantic enrichment of tweets significantly outperforms hashtag-based search.
28. Impact of link-based enrichment
The personalized strategy significantly outperforms the baseline.
Link-based enrichment improves quality for both strategies.
29. Twitcident application
[Diagram: Twitter streams and a topic (information need) feed into Filtering and Search & Browsing.]
Twitcident: applying filter & search functionality for distilling information from Twitter during incidents (e.g. fires, extreme weather situations)
34. Could we see it coming?
[Timeline: a popular artist made a joke about the weather; later, the impact of the storm]
Term usage 25 minutes before the incident:
1. heavy weather, hail balls, lightning, pitch black…
2. drama, panic, hell, serious, extreme…
37. Summary
Automatic Filtering of Tweets: [#MSM@WWW ’12]
• Topic-sensitive and topic-insensitive features
• Semantic features (semantic relatedness, diversity, sentiment) are beneficial
Search and browsing: [ISWC ’11]
• Faceted Search
• Personalization & contextualization help
Application: [Hypertext ’12, Demo@WWW ’12]
• Twitcident: fulfilling information needs during incidents
Future work:
• Weak signal detection based on tweets
• Duplicate detection
38. Thank you!
@fabianabel
http://wis.ewi.tudelft.nl/
Make food and drink vouchers worth 75 to 100 euros available.
Traditional Twitter search
Highlight what keyword matching means: the keywords in the search query and in the tweets.
Title -> syntactical features
Box in the tweet
Green boxes for the hypotheses
Flow from keyword-based relevance to … Slide 5-8, flow
Subtitle -> question?
Introduction to the usage of @, including mentions and replies. Reply tweets frequently occur in private conversations; therefore, make a hypothesis about reply tweets in particular.
The 21st International World Wide Web Conference #www2012 will take place in Lyon, France April 16-20 2012 @www2012Lyon www2012.wwwconference.org
Subtitle question
One short, one long comparison
Fade in the question later.
Fade in the entities one by one.
Do not highlight WWW, Lyon, France.
Can we utilize the contextual features?
Titles,
Timeline
Number of features.
Comparison
Fade in the pairs
Highlight
Textbox -> Conclusion (precision)
Very time consuming and overwhelming indeed!
entity extraction and semantic enrichment and relation discovery.
Case #1: early warning (early signal detection)
Case 1: enforcement (picture of the situation around the incident)
Our framework extracts typed entities from enriched tweets/news and provides strategies for detecting semantic (trending) relationships between entities. We:
• investigated the precision and recall of the relation detection strategies,
• analyzed how the strategies perform for each type of relationships and
• evaluated the quality and speed for discovering trending relationships that possibly have a limited temporal validity.
Questions:
• Which strategy performs best in detecting relationships between entities?
• Does the accuracy depend on the type of entities which are involved in a relation?
• How do the strategies perform for discovering relationships which have temporal constraints, and how fast can the strategies detect (trending) relationships?