Semantics + Filtering + Search = Twitcident - Exploring Information in Social Web Streams
1. Semantics + Filtering + Search = Twitcident
Exploring Information in Social Web Streams
Hypertext 2012, Milwaukee, WI – June 28
Fabian Abel, Claudia Hauff,
Geert-Jan Houben, Richard Stronkman, Ke Tao
Web Information Systems, TU Delft, the Netherlands
Delft University of Technology
2. 200,000,000
number of tweets published per day
3. Pukkelpop 2011
People tweet about everything,
everywhere :-)
6. First tweet…
And then your train blasts off full of the
anvils. #Nijmegen #veolia
7. First picture…
Astonishing! My train rams the platform at
Nijmegen! http://pic.twitter.com/QVVfJHyd
8. Traditional news media
A train rammed the anvils at Nijmegen.
9. Research Challenges
1. (Automatic) Filtering: Given an incident, how can one
automatically identify those tweets that are relevant to
the incident?
2. Search & Analytics: How can one improve search and
analytical capabilities so that users can explore
information in the streams of tweets?
[Diagram: Twitter streams, Filtering (topic), Search & Analytics (information need)]
10. Twitcident Pipeline
[Diagram: Automatic Filtering, Search & Analytics]
11. Twitcident system
Figure 2: Screenshot of the Twitcident system: (a) search and filtering functionality to explore and retrieve
particular Twitter messages, (b) messages that are related to the given incident (here: fires in Texas) and
match the given query of the user, and (c) real-time analytics of the matching messages.
12. Twitcident Pipeline
[Diagram: Automatic Filtering, Search & Analytics]
13. Incident detection
• Twitcident relies on Emergency Broadcasting Services for detecting incidents.
• In the Netherlands: P2000 communication network
[Diagram, steps 1 to 4: P2000 Broadcast, Twitcident Framework, Twitter, Twitcident system]
(i) Broadcasted incident description:
Prio 1 fire :: Vlasweg : 4 4782PW Moerdijk :: Chemie Pack
Initial query:
(Moerdijk OR Chemie-Pack) AND (fire OR smoke OR flame…) SINCE:2011-01-05
Refined query based on incident profiling:
(Moerdijk OR Dordrecht…) AND (#moerdijkFire OR toxic…)
(ii) Incident in Twitcident: Twitcident system
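The shape of the queries on this slide can be sketched in a few lines of Python. This is only an illustration, not the Twitcident implementation: the function name build_query and its parameters are assumptions, and only the example terms and the SINCE: date come from the slide. Terms elided on the slide ("…") are left out rather than guessed.

```python
from datetime import date


def build_query(location_terms, incident_terms, since=None):
    """Combine two keyword groups into a boolean query string.

    Each group is OR-ed internally and the two groups are AND-ed,
    mirroring the shape of the queries shown on the slide.
    (Hypothetical helper, not part of Twitcident.)
    """
    def or_group(terms):
        return "(" + " OR ".join(terms) + ")"

    query = or_group(location_terms) + " AND " + or_group(incident_terms)
    if since is not None:
        query += " SINCE:" + since.isoformat()
    return query


# Initial query: terms taken from the broadcast description.
initial = build_query(["Moerdijk", "Chemie-Pack"],
                      ["fire", "smoke", "flame"],
                      since=date(2011, 1, 5))

# Refined query: terms that incident profiling added later.
refined = build_query(["Moerdijk", "Dordrecht"],
                      ["#moerdijkFire", "toxic"])

print(initial)
print(refined)
```

Keeping the two keyword groups separate makes it easy to widen one group (for example, more place names) without touching the other when the profile changes.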
14. Incident Profiling
•For an incident i:
• The profile of an incident is described as a set of tuples.
• Each tuple includes a facet-value pair (f, v) and its weight to the incident i.
Example profile (facet, value, weight):
(Location, Netherlands, 0.4)
(Incident, Train accident, 0.5)
(Location, Nijmegen, 0.8)
(Organization, Veolia, 0.6)
(Incident, Crash, 1.0)
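As a minimal sketch, the profile above can be held as a list of weighted facet-value tuples. The type name ProfileEntry and the helper weight_of are assumptions made for illustration; the facets, values, and weights are the ones shown on the slide.

```python
from typing import NamedTuple


class ProfileEntry(NamedTuple):
    facet: str     # f, e.g. "Location"
    value: str     # v, e.g. "Nijmegen"
    weight: float  # importance of (f, v) for the incident i


# Example profile with the values from the slide.
profile = [
    ProfileEntry("Location", "Netherlands", 0.4),
    ProfileEntry("Incident", "Train accident", 0.5),
    ProfileEntry("Location", "Nijmegen", 0.8),
    ProfileEntry("Organization", "Veolia", 0.6),
    ProfileEntry("Incident", "Crash", 1.0),
]


def weight_of(profile, facet, value):
    """Return the weight of (facet, value) in the profile, or 0.0 if absent."""
    for entry in profile:
        if entry.facet == facet and entry.value == value:
            return entry.weight
    return 0.0


# The two most characteristic facet-value pairs of this incident.
top = sorted(profile, key=lambda e: e.weight, reverse=True)[:2]
print(top)
print(weight_of(profile, "Location", "Nijmegen"))
```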
15. Twitcident Pipeline
[Diagram: Automatic Filtering, Search & Analytics]
16. Social Media Aggregation
• Collecting Twitter messages, pictures, and
videos from Social Media Platforms e.g. Twitter,
PhotoBucket, Vimeo
17. Twitcident Pipeline
[Diagram: Automatic Filtering, Search & Analytics]
18. Semantic Enrichment
•Named Entity Recognition
•Classification : Casualties, Damages, Risks…
•Linkage : External Resources
•Metadata extraction
19. Twitcident Pipeline
[Diagram: Automatic Filtering, Search & Analytics]
20. Filtering
•Which tweets are relevant to the incidents?
• Preprocessing : Language detection
• Semantic Filtering : Compare tweet with P(i)
• Semantic Filtering with News Context
• P’(i) : P(i) complemented with f-v pairs from news
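One way to read "compare tweet with P(i)" is as a weighted overlap between the facet-value pairs extracted from a tweet and the pairs in the incident profile. The sketch below follows that reading. The scoring rule, the threshold, and the default weight for news-derived pairs are all assumptions, not the method evaluated in the talk; the function names are hypothetical.

```python
def relevance(tweet_pairs, profile):
    """Sum the weights of the profile pairs that also occur in the tweet.

    Assumed scoring rule: a simple weighted overlap.
    """
    return sum(w for pair, w in profile.items() if pair in tweet_pairs)


def is_relevant(tweet_pairs, profile, threshold=0.5):
    """Keep a tweet when its overlap score reaches the (assumed) threshold."""
    return relevance(tweet_pairs, profile) >= threshold


def with_news_context(profile, news_pairs, default_weight=0.3):
    """Build P'(i): P(i) plus facet-value pairs found in related news.

    Pairs already in P(i) keep their weight; new pairs get an assumed
    default weight.
    """
    extended = dict(profile)
    for pair in news_pairs:
        extended.setdefault(pair, default_weight)
    return extended


# P(i): part of the example profile from the profiling slide.
base_profile = {
    ("Location", "Nijmegen"): 0.8,
    ("Organization", "Veolia"): 0.6,
    ("Incident", "Crash"): 1.0,
}
tweet_a = {("Location", "Nijmegen"), ("Organization", "Veolia")}
tweet_b = {("Incident", "Train accident")}

# P'(i): suppose related news articles mention ("Incident", "Train accident").
news_profile = with_news_context(base_profile, [("Incident", "Train accident")])

print(is_relevant(tweet_a, base_profile))
print(is_relevant(tweet_b, base_profile))
print(is_relevant(tweet_b, news_profile, threshold=0.2))
```

With only P(i), tweet_b shares no pair with the profile and is dropped; with P'(i) it gains a small score, which illustrates how news context can recover tweets that use different vocabulary.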
21. Twitcident Pipeline
[Diagram: Automatic Filtering, Search & Analytics]
22. Faceted Search
•Strategies (ranking)
• Frequency-based
• Time-sensitive based
• Personalized
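The first two ranking strategies can be sketched as follows. This is a hedged illustration: the tweet representation (a dict with "time" in seconds and a list of facet-value "pairs"), the function names, and the exponential decay with a one-hour half-life are assumptions, not details taken from the talk. The personalized strategy is omitted because the slides do not describe its user model.

```python
from collections import Counter, defaultdict


def rank_by_frequency(tweets):
    """Rank facet-value pairs by the number of tweets that mention them."""
    counts = Counter(pair for tweet in tweets for pair in set(tweet["pairs"]))
    return [pair for pair, _ in counts.most_common()]


def rank_time_sensitive(tweets, now, half_life=3600.0):
    """Like frequency ranking, but recent mentions count more.

    Each mention contributes 0.5 ** (age / half_life), an assumed decay.
    """
    scores = defaultdict(float)
    for tweet in tweets:
        decay = 0.5 ** ((now - tweet["time"]) / half_life)
        for pair in set(tweet["pairs"]):
            scores[pair] += decay
    return sorted(scores, key=scores.get, reverse=True)


tweets = [
    {"time": 0, "pairs": [("Location", "Nijmegen")]},
    {"time": 100, "pairs": [("Location", "Nijmegen")]},
    {"time": 7200, "pairs": [("Organization", "Veolia")]},
]

print(rank_by_frequency(tweets))
print(rank_time_sensitive(tweets, now=7200))
```

In this toy data the frequency-based ranking puts Nijmegen first (two mentions), while the time-sensitive ranking prefers Veolia because its single mention is the most recent.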
23. Real-time analytics
What types of things are mentioned in the tweets?
What aspects are mentioned over time?
What do people report about over time?
[Chart label: Impact Area]
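A simple way to answer "what aspects are mentioned over time" is to count aspect labels per time bucket. The sketch below assumes each tweet already carries a timestamp in seconds and a list of aspect labels from the enrichment step (for example Casualties, Damages, Risks); the function name and the one-hour bucket size are illustrative choices.

```python
from collections import Counter


def aspects_over_time(tweets, bucket_seconds=3600):
    """Count aspect mentions per time bucket.

    Returns a dict mapping each bucket start time to a Counter of aspects.
    """
    timeline = {}
    for tweet in tweets:
        bucket = tweet["time"] - tweet["time"] % bucket_seconds
        timeline.setdefault(bucket, Counter()).update(tweet["aspects"])
    return timeline


tweets = [
    {"time": 10, "aspects": ["Damages"]},
    {"time": 1800, "aspects": ["Damages", "Risks"]},
    {"time": 4000, "aspects": ["Casualties"]},
]

timeline = aspects_over_time(tweets)
for start in sorted(timeline):
    print(start, dict(timeline[start]))
```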
24. Evaluation - Dataset
•Twitter corpus (TREC Microblog Track 2011)
• 16 million tweets (Jan. 24th – Feb. 8th, 2011)
• 4,766,901 tweets classified as English
• 6.2 million entity-extractions
•News (Same time period)
• 62 RSS News Feeds
• 13,959 News Articles
• 357,559 entity-extractions
26. Evaluation
For Tweet Filtering (2/2)
The semantic strategy is more robust and
achieves higher precision for complex topics.
28. Evaluation
For Faceted Search (2/2)
[Chart: faceted search strategies compared with semantic enrichment and without semantic enrichment]
The strategies with semantic enrichment outperform
the strategy without semantic enrichment in
predicting the appropriate facet-values.
29. Conclusions
• What we have done:
• Twitcident, a framework for filtering, searching, and
analyzing information about incidents that people publish in
their Social Web Streams
• What we have achieved:
• Better filtering of Twitter messages for a given incident.
• Better search for relevant information about an incident within
the filtered messages.
30. Thank you!
@wisdelft
http://twitcident.org
Ke Tao
@taubau
Editor's Notes
There are millions of tweets posted every day. Motivation: information overload; personalised “better” search.
People tweet about everything; e.g., when they are at a festival like Pukkelpop, they may report on their experiences...
This festival actually became a disaster (5 people died). 80k tweets were published in the first 4 hours; during the incident, the emergency services had problems getting an overview of the situation. How can one (a) automatically filter information from Twitter and (b) provide search and analytics? (s4)
Research challenges here.
Twitcident pipeline = how we tackle these challenges. We get information from emergency broadcasters, or formulate in advance what we want to monitor during big events. In these ways, we obtain the basic information about the incidents or events. Then we do the automatic filtering in 4 steps. First, we construct the profiles of the incidents, including incident metadata such as the location and the names of the organizations and people involved. Next, we aggregate information such as texts, pictures, and videos from the Social Web, especially from Twitter. Then we extract the semantics from these media to learn what the information is about and where it was posted. Then we filter the aggregated information to get the incident-relevant media and refine further. On top of this, we use search and various analytics to satisfy the information needs of authorities and the general public.
Search, Filtering, Analytics
Search, Filtering, Analytics
Koren et al., Personalized Interactive Faceted Search, WWW 2008