This document presents an adaptive faceted search framework for Twitter. It describes the challenges of searching tweets, which are short and unstructured, and proposes semantic enrichment and facet extraction to give tweets a structured representation. An evaluation shows that faceted search outperforms hashtag-based search and that strategies such as personalization and time-sensitivity improve search quality. The framework is applied in Twitcident, a crisis management system that demonstrates the benefits of semantic enrichment and faceted search for analyzing tweets.
Leveraging the Semantics of Tweets for Adaptive Faceted Search on Twitter
1. Leveraging the Semantics of Tweets for Adaptive Faceted Search on Twitter
ISWC, Bonn, Germany, Oct 27th 2011
Fabian Abel (1), Ilknur Celik (1), Geert-Jan Houben, Patrick Siehndel (2)
(1) Web Information Systems, TU Delft, the Netherlands
(2) L3S Research Center, Hannover, Germany
2. What we do: Science and Engineering for the Personal Web
domains: news, social media, cultural heritage, public data, e-learning
Personalized Recommendations, Adaptive Systems, Personalized Search
Analysis and User Modeling
Semantic Enrichment, Linkage and Alignment
user/usage data
Social Web
11. Music Artist
Page 60!!
The tweet I was looking for:
Next Saturday @thatsimpsonguy aka Guilty Simpson will be performing at Area51 in my hometown Eindhoven. #realliveshit #iwillspinrecords
about 9 hours ago via Blackberry
Locations
12. Is there an easier way?
Faceted Search can help (hypothesis)
Expand Query:
Locations more...
Events more...
Music Artists:
+ Guilty Simpson
+ Bryan Adams
+ Elton John
+ Golden Earring
+ Rihanna
+ The eagles
+ 3 Doors Down
more...
Intents
Current Query: Eindhoven Music
Results:
1. Yskiddd: Next saturday @thatsimpsonguy aka Guilty Simpson will be performing at Area51 in my homeytown Eindhoven. #realliveshit #iwillspinrecords2
2. Usee123: Cool #EV3door7980 !!! http://bit.ly/igyyRhL
3. sanmiquelmusic: This Saturday I'm joining @KrusadersMusic to
more...
13. Challenges
14. Facets of a Tweet
@bob: Julian Assange got arrested
Facet type      Facet value
Creator         @bob
Location        Delft, the Netherlands
Creation time   Nov 29th 2011
Challenge 1: How to infer facets that describe the content of a tweet?
15. Faceted Search: selecting facet-value pairs
Expand Query:
Locations
+ Aachen
+ Aalborg
+ Aalesund
+ Aarhus
+ Aasiaat
+ Abaiang
+ Abakan
more...
Events more...
Music Artists more…
Intents
Current Query: Music
Results:
1. Yskiddd: Next saturday @thatsimpsonguy aka Guilty Simpson will be performing at Area51 in my homeytown Eindhoven. #realliveshit #iwillspinrecords2
2. Usee123: Cool #EV3door7980 !!! http://bit.ly/igyyRhL
3. sanmiquelmusic: This Saturday I'm joining @KrusadersMusic to
more...
Number of selectable facet values may be very high!
Challenge 2: How to adapt the faceted search interface to the current demands of a user?
17. Adaptive Faceted Search Framework
Layers (top to bottom): user; Adaptive Faceted Search; User and Context Modeling; Semantic Enrichment (facet extraction); Twitter posts
• Semantic Enrichment: How to represent the content of a tweet?
• Adaptive Faceted Search: How to adapt the facet-value pair ranking to the current demands of the user?
18. Facet Extraction and Semantic Enrichment
• Tweet-based enrichment:
  Tweet: @bob: Julian Assange got arrested
  Extracted: Julian Assange
• Link-based enrichment:
  Linked article: Julian Assange arrested. Julian Assange, the founder of WikiLeaks, is under arrest in London…
  Extracted: Julian Assange, WikiLeaks, London
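The two enrichment modes on this slide can be sketched as follows. This is a minimal illustration, not the framework's implementation: the gazetteer lookup stands in for a real entity extraction service, and the names GAZETTEER, Tweet, linked_text and the facet type "organization" are assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical gazetteer standing in for a real named-entity extraction service.
GAZETTEER = {
    "Julian Assange": "person",
    "WikiLeaks": "organization",
    "London": "location",
}


def extract_entities(text):
    """Return the (facet type, facet value) pairs whose value occurs in the text."""
    return {(ftype, name) for name, ftype in GAZETTEER.items() if name in text}


@dataclass
class Tweet:
    creator: str
    text: str
    linked_text: str = ""  # content of a web page the tweet links to (assumed given)


def tweet_based_enrichment(tweet):
    """Facets extracted from the tweet text only."""
    return extract_entities(tweet.text)


def link_based_enrichment(tweet):
    """Facets from the tweet text plus the content of the linked page."""
    return tweet_based_enrichment(tweet) | extract_entities(tweet.linked_text)


tweet = Tweet(
    creator="@bob",
    text="Julian Assange got arrested",
    linked_text="Julian Assange, the founder of WikiLeaks, is under arrest in London",
)
tweet_facets = tweet_based_enrichment(tweet)
link_facets = link_based_enrichment(tweet)
```

In this example the tweet alone yields one facet-value pair, while the linked article adds WikiLeaks and London.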
19. Impact of Link-based enrichment
Representation of tweets: significantly more facets per tweet with link-based enrichment
20. Faceted Search Strategies
• Challenge: most-relevant facet-value pair should appear at the top of the ranking
  Locations, alphabetical: 1. Aachen, 2. Aalborg, 3. Aalesund, 4. Aarhus, …, 2145. Eindhoven
  Locations, relevance-ranked: 1. Eindhoven, 2. Delft, 3. Amsterdam, 4. Rotterdam, 5. London, …
• Baseline: hashtag-based keyword search
21. Faceted Search Strategies
• Challenge: most-relevant facet-value pair should appear at the top of the ranking
• Baseline: hashtag-based keyword search
• Faceted Search Strategies:
1. Occurrence frequency: count occurrence frequencies of FVP (baseline)
   rank of facet-value pair = number of tweets in the current hit list of matching tweets that contain the FVP
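A minimal sketch of the occurrence-frequency strategy: each tweet in the current hit list is represented by its set of facet-value pairs (FVPs), and FVPs are ranked by how many tweets contain them. The function name, the tuple representation of an FVP and the alphabetical tie-break are assumptions made for illustration.

```python
from collections import Counter


def rank_by_frequency(hit_list):
    """Rank FVPs by the number of tweets in the hit list that contain them."""
    counts = Counter()
    for fvps in hit_list:
        counts.update(set(fvps))  # count each FVP at most once per tweet
    # Descending count; ties are broken by the FVP itself to keep the order deterministic.
    return sorted(counts, key=lambda fvp: (-counts[fvp], fvp))


hit_list = [
    {("location", "Eindhoven"), ("person", "Guilty Simpson")},
    {("location", "Eindhoven"), ("location", "Delft")},
    {("location", "Eindhoven")},
    {("location", "Delft")},
]
frequency_ranking = rank_by_frequency(hit_list)
```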
22. Faceted Search Strategies
• Challenge: most-relevant facet-value pair should appear at the top of the ranking
• Baseline: hashtag-based keyword search
Personalized strategy: rank of the FVP = weight in user profile
User Profile:
FVP                    weight
(location, Delft)      6
(event, JazzBaltica)   4
(person, ChetBaker)    3
• Faceted Search Strategies:
1. Occurrence frequency: count occurrence frequencies of FVP (baseline)
2. Personalization: adapt ranking to user profile (different user modeling strategies possible; here: entire tweeting history of the user)
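The personalized strategy can be sketched as below, assuming the user profile is simply a count of how often each FVP occurs in the user's tweeting history. The function names and the handling of FVPs that are missing from the profile (weight 0, input order preserved) are assumptions for this example.

```python
from collections import Counter


def build_profile(history):
    """Build a user profile from the FVP sets of the tweets the user posted."""
    profile = Counter()
    for fvps in history:
        profile.update(set(fvps))
    return profile


def rank_personalized(candidates, profile):
    """Rank candidate FVPs by their weight in the user profile (0 if absent)."""
    # sorted() is stable, so FVPs with equal weight keep their input order.
    return sorted(candidates, key=lambda fvp: -profile[fvp])


# Profile weights taken from the slide.
profile = Counter({
    ("location", "Delft"): 6,
    ("event", "JazzBaltica"): 4,
    ("person", "ChetBaker"): 3,
})
candidates = [("location", "Aachen"), ("person", "ChetBaker"), ("location", "Delft")]
personalized_ranking = rank_personalized(candidates, profile)
```

With this profile, (location, Delft) moves to the top even though it comes last in the candidate list.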
23. Faceted Search Strategies
• Challenge: most-relevant facet-value pair should appear at the top of the ranking
• Baseline: hashtag-based keyword search
Diversification: minimize overlaps
  Genre: + Blues, + Jazz, + JazzMusic, + Rock, more...
  Genre (diversified): + Blues, + Jazz, + Rock, + Classic, more...
• Faceted Search Strategies:
1. Occurrence frequency: count occurrence frequencies of FVP (baseline)
2. Personalization: adapt ranking to user profile (different user modeling strategies possible; here: entire tweeting history of the user)
3. Diversification: increase variety among the top-ranked FVPs
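The slide only states the goal of diversification (increase variety, minimize overlaps). One possible way to realize it, shown here purely as an assumption, is a greedy selection that always picks the FVP covering the most tweets not yet covered by FVPs ranked earlier. The function name and the example tweet ids are hypothetical.

```python
def rank_diversified(fvp_to_tweets):
    """Greedy ranking: repeatedly pick the FVP that adds the most uncovered tweets."""
    remaining = dict(fvp_to_tweets)
    covered = set()
    ranking = []
    while remaining:
        # Iterate in sorted order so ties resolve deterministically (max keeps the first).
        best = max(sorted(remaining), key=lambda fvp: len(remaining[fvp] - covered))
        ranking.append(best)
        covered |= remaining.pop(best)
    return ranking


# FVP -> ids of tweets that contain it (made-up ids for illustration).
fvp_to_tweets = {
    ("genre", "Jazz"): {1, 2, 3, 4},
    ("genre", "JazzMusic"): {1, 2, 3},
    ("genre", "Rock"): {5, 6},
    ("genre", "Classic"): {7},
}
diversified_ranking = rank_diversified(fvp_to_tweets)
```

A pure frequency ranking would place JazzMusic second; because its tweets overlap entirely with Jazz, the greedy sketch pushes it to the end.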
24. Faceted Search Strategies
• Challenge: most-relevant facet-value pair should appear at the top of the ranking
• Baseline: hashtag-based keyword search
Time-sensitivity: occurrence frequency of FVP over time (June 20, June 27, July 4) for (event, JazzBaltica) and (event, FrenchOpen), relative to the time of the search
  Event: + JazzBaltica, + FrenchOpen, more...
• Faceted Search Strategies:
1. Occurrence frequency: count occurrence frequencies of FVP (baseline)
2. Personalization: adapt ranking to user profile (different user modeling strategies possible; here: entire tweeting history of the user)
3. Diversification: increase variety among the top-ranked FVPs
4. Time-sensitivity: adapt FVP ranking to temporal context
• Semantic enrichment: (i) tweet-based and (ii) link-based enrichment
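The slide does not specify how the ranking is adapted to the temporal context. As one hedged possibility, the sketch below weights each FVP occurrence by an exponential decay of its age at search time, so recently trending FVPs rank higher. The decay function, the half_life_days parameter, the dates and the function name are all assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime


def rank_time_sensitive(occurrences, search_time, half_life_days=7.0):
    """Rank FVPs by occurrence counts that decay with age relative to the search time."""
    scores = defaultdict(float)
    for fvp, timestamp in occurrences:
        if timestamp > search_time:
            continue  # ignore tweets published after the search
        age_days = (search_time - timestamp).total_seconds() / 86400.0
        scores[fvp] += 0.5 ** (age_days / half_life_days)
    return sorted(scores, key=lambda fvp: (-scores[fvp], fvp))


# Made-up timestamps: FrenchOpen is more frequent overall, JazzBaltica is more recent.
occurrences = [
    (("event", "FrenchOpen"), datetime(2011, 6, 5)),
    (("event", "FrenchOpen"), datetime(2011, 6, 5)),
    (("event", "FrenchOpen"), datetime(2011, 6, 5)),
    (("event", "JazzBaltica"), datetime(2011, 7, 2)),
    (("event", "JazzBaltica"), datetime(2011, 7, 2)),
]
time_sensitive_ranking = rank_time_sensitive(occurrences, datetime(2011, 7, 4))
```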
25. Research Questions
1. How well does faceted search that is supported by the
semantic enrichment perform in comparison to
keyword search?
2. What strategy performs best in ranking facet-value
pairs that allow users to find relevant tweets on Twitter?
3. How do the different building blocks of the faceted
search framework influence the performance?
26. Dataset
more than:
• 20,000 Twitter users
• 4 months
• 30,000,000 tweets
Timeline: Nov 15, Dec 15, Jan 15, Feb 15 (marked event: Egyptian revolution, Jan 25)
27. Evaluation Framework
• User Simulation Model [cf. Koren et al., WWW’08]:
  • Input: search settings = { (user who searches, relevant target tweet) }
  • Drill down search result list until no more FVPs can be applied or fewer than 10 tweets match the query
  • Simulating click behavior: first-matching FVP is selected (user knows target resource)
  • Ground truth: relevant target tweet = tweet that has been re-tweeted by the user
• Metrics:
  • Success@k: probability that relevant FVP appears in the top k (the higher the Success@k, the faster the search and the lower the user effort)
  • MRR: mean reciprocal rank of the target tweet when the user selected it
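The simulation and the metrics above can be sketched as follows. This is an illustrative simplification under stated assumptions: FVPs are ranked by occurrence frequency, the metric functions take any list of 1-based ranks (here the ranks of the selected FVPs), and all function names, the min_hits parameter and the toy data are hypothetical.

```python
from collections import Counter


def rank_by_frequency(hit_list):
    """Rank FVPs by the number of tweets in the hit list that contain them."""
    counts = Counter()
    for fvps in hit_list:
        counts.update(set(fvps))
    return sorted(counts, key=lambda fvp: (-counts[fvp], fvp))


def simulate_search(tweets, target_id, rank_fvps, min_hits=10):
    """Drill down towards the target tweet; return the rank of each selected FVP."""
    hits = dict(tweets)  # tweet id -> set of FVPs
    applied = set()
    ranks = []
    while len(hits) >= min_hits:
        ranking = [f for f in rank_fvps(list(hits.values())) if f not in applied]
        # The simulated user knows the target and selects the first FVP that matches it.
        choice = next((f for f in ranking if f in tweets[target_id]), None)
        if choice is None:
            break  # no more FVPs can be applied
        ranks.append(ranking.index(choice) + 1)
        applied.add(choice)
        hits = {tid: fvps for tid, fvps in hits.items() if choice in fvps}
    return ranks


def success_at_k(ranks, k):
    """Fraction of selections where the relevant item appeared in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Mean of 1/rank over all selections."""
    return sum(1.0 / r for r in ranks) / len(ranks)


tweets = {
    1: {("location", "Eindhoven"), ("person", "Guilty Simpson")},
    2: {("location", "Eindhoven")},
    3: {("location", "Delft")},
    4: {("location", "Delft")},
    5: {("location", "Delft"), ("person", "Guilty Simpson")},
}
# min_hits is lowered from 10 to 2 only because the toy data set is tiny.
ranks = simulate_search(tweets, target_id=1, rank_fvps=rank_by_frequency, min_hits=2)
```

Swapping rank_by_frequency for another ranking function is how the different strategies would be compared under the same simulated user.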
28. Faceted search vs. hashtag-based (keyword) search
Faceted search based on semantic enrichment of tweets significantly outperforms hashtag-based search.
29. Results: Overview
Personalized strategy achieves ~12% better performance than other semantic strategies (and 2x better than hashtag-based)
30. Impact of link-based enrichment
Personalized strategy outperforms baseline significantly
Link-based enrichment improves quality for both strategies
31. Impact of time-sensitivity
Time-sensitivity based ranking improves quality for both frequency and diversification strategies
32. Application of the Faceted Search
Framework
34. Conclusions
What we did:
• Adaptive Faceted Search on Twitter + Evaluation Framework
• Analysis and Evaluation (+ Application in Twitcident)
Findings:
1. Semantic Enrichment allows for structured representation of the
content of tweets, which is the basis for faceted search
2. Faceted search performs significantly better than hashtag-based
keyword search
3. Different building blocks for making faceted search on Twitter
adaptive improve the search quality:
a) Link-based enrichment: more discoverable tweets, better search performance
b) Personalization leads to significant improvements
c) Time-sensitivity improves performance as well
entity extraction and semantic enrichment and relation discovery.
Our framework extracts typed entities from enriched tweets/news and provides strategies for detecting semantic (trending) relationships between entities. We:
• investigated the precision and recall of the relation detection strategies,
• analyzed how the strategies perform for each type of relationship, and
• evaluated the quality and speed for discovering trending relationships that possibly have a limited temporal validity.
Questions:
• Which strategy performs best in detecting relationships between entities?
• Does the accuracy depend on the type of entities which are involved in a relation?
• How do the strategies perform for discovering relationships which have temporal constraints, and how fast can the strategies detect (trending) relationships?