Semantics + Filtering + Search = Twitcident
Exploring Information in Social Web Streams

Hypertext 2012, Milwaukee, WI – June 28



                                          Fabian Abel, Claudia Hauff,
                        Geert-Jan Houben, Richard Stronkman, Ke Tao
                              Web Information Systems, TU Delft, the Netherlands

        Delft University of Technology
200,000,000
    number of tweets published per day



 Semantics + Filtering + Search = Twitcident - Exploring Information in Social Web Streams   2
Pukkelpop 2011




                 People tweet about everything,
                                 everywhere :-)




200,000,000
Pukkelpop 2011
became a tragedy

                            Filtering




 Useful tweets?    81,000 tweets in four hours



                            Search &
                            Analytics
Case Nijmegen
Train accident


First tweet…




         And then your train blasts off full of the
               anvils. #Nijmegen #veolia




First picture…
                       Astonishing! My train rams the platform at
                      Nijmegen! http://pic.twitter.com/QVVfJHyd




Traditional news media

  A train rammed the anvils at Nijmegen.




Research Challenges
  1. (Automatic) Filtering: Given an incident, how can one
     automatically identify those tweets that are relevant to
     the incident?

  2. Search & Analytics: How can one improve search and
     analytical capabilities so that users can explore
     information in the streams of tweets?


  [Diagram: Twitter streams → Filtering (given a topic) → Search & Analytics (given an information need)]
Twitcident Pipeline
  [Diagram: overview of the pipeline, with its stages grouped into Automatic Filtering and Search & Analytics]
Twitcident system
Figure 2: Screenshot of the Twitcident system: (a) search and filtering functionality to explore and retrieve particular Twitter messages, (b) messages that are related to the given incident (here: fires in Texas) and match the given query of the user and (c) real-time analytics of the matching messages.
Incident detection
•Twitcident relies on Emergency Broadcasting Services for detecting incidents.
  • In the Netherlands: P2000 communication network

[Diagram: P2000 Broadcast → Twitcident Framework → Twitter → Twitcident system]
  1. (i) Broadcasted incident description: Prio 1 fire :: Vlasweg : 4 4782PW Moerdijk :: Chemie Pack
  2. Initial query: (Moerdijk OR Chemie-Pack) AND (fire OR smoke OR flame…) SINCE:2011-01-05
  3. Refined query based on incident profiling: (Moerdijk OR Dordrecht…) AND (#moerdijkFire OR toxic…)
  4. (ii) Incident in Twitcident
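Both queries on this slide have the same shape: a group of location terms AND a group of incident terms, optionally followed by a start date. A minimal sketch of assembling such a query string is given here. The function name `build_query` and its parameters are hypothetical and not part of Twitcident, and only the terms that are legible on the slide are used (the slide elides the rest of each list).

```python
def build_query(location_terms, incident_terms, since=None):
    """Combine two term groups into one boolean query string (hypothetical helper)."""
    locations = " OR ".join(location_terms)
    incidents = " OR ".join(incident_terms)
    query = f"({locations}) AND ({incidents})"
    if since:
        # Optional start date, written in the SINCE:YYYY-MM-DD form shown on the slide.
        query += f" SINCE:{since}"
    return query


# Initial query derived from the broadcasted incident description.
initial = build_query(
    ["Moerdijk", "Chemie-Pack"],
    ["fire", "smoke", "flame"],
    since="2011-01-05",
)

# Refined query, using terms that incident profiling would contribute.
refined = build_query(["Moerdijk", "Dordrecht"], ["#moerdijkFire", "toxic"])

print(initial)
print(refined)
```

Keeping the two term groups separate makes the refinement step a matter of swapping in longer lists once the incident profile is available.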


Incident Profiling
•For an incident i:
  • The profile of an incident is described as a set of tuples.
  • Each tuple includes a facet-value pair (f, v) and its weight for the incident i.

  Facet           Value             Weight
  Location        Netherlands       0.4
  Incident        Train accident    0.5
  Location        Nijmegen          0.8
  Organization    Veolia            0.6
  Incident        Crash             1.0
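The profile on this slide can be written down directly as data. The sketch below is one possible representation, not the authors' implementation: the `ProfileEntry` type and the two helper functions are hypothetical names, while the facets, values and weights are the ones shown on the slide.

```python
from typing import List, NamedTuple


class ProfileEntry(NamedTuple):
    """One tuple of the incident profile: a facet-value pair and its weight."""
    facet: str
    value: str
    weight: float


# Profile of the Nijmegen train accident, copied from the slide.
profile = [
    ProfileEntry("Location", "Netherlands", 0.4),
    ProfileEntry("Incident", "Train accident", 0.5),
    ProfileEntry("Location", "Nijmegen", 0.8),
    ProfileEntry("Organization", "Veolia", 0.6),
    ProfileEntry("Incident", "Crash", 1.0),
]


def top_entries(entries: List[ProfileEntry], k: int) -> List[ProfileEntry]:
    """Return the k entries with the highest weight."""
    return sorted(entries, key=lambda e: e.weight, reverse=True)[:k]


def weight_of(entries: List[ProfileEntry], facet: str, value: str) -> float:
    """Look up the weight of a facet-value pair; 0.0 if it is not in the profile."""
    for entry in entries:
        if (entry.facet, entry.value) == (facet, value):
            return entry.weight
    return 0.0


for entry in top_entries(profile, 2):
    print(entry.facet, entry.value, entry.weight)
```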

Social Media Aggregation
• Collecting Twitter messages, pictures, and
  videos from Social Media Platforms e.g. Twitter,
  PhotoBucket, Vimeo




Semantic Enrichment
•Named Entity Recognition

•Classification : Casualties, Damages, Risks…

•Linkage : External Resources

•Metadata extraction
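To make two of these steps concrete, here is a small sketch of metadata extraction and classification for a single tweet. It is an illustration under strong assumptions: the keyword lists and the `enrich` function are hypothetical, and keyword matching merely stands in for whatever classifier the system uses. Named entity recognition and linkage to external resources are not covered.

```python
import re

# Hypothetical keyword lists per category; the slide only names the categories.
CATEGORY_KEYWORDS = {
    "Casualties": {"injured", "wounded", "dead", "victim"},
    "Damages": {"damage", "damaged", "destroyed", "collapsed"},
    "Risks": {"toxic", "smoke", "danger", "evacuate"},
}


def enrich(text):
    """Attach simple metadata and category labels to a tweet text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return {
        # Metadata extraction: hashtags (without '#') and URLs.
        "hashtags": re.findall(r"#(\w+)", text),
        "urls": re.findall(r"https?://\S+", text),
        # Classification: every category with at least one matching keyword.
        "categories": sorted(
            name for name, keywords in CATEGORY_KEYWORDS.items() if tokens & keywords
        ),
    }


print(enrich("Toxic smoke near the station #Moerdijk http://example.org/a"))
```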


Filtering
•Which tweets are relevant to the incidents?

  • Preprocessing : Language detection

  • Semantic Filtering : Compare tweet with P(i)

  • Semantic Filtering with News Context
    • P’(i) : P(i) complemented with f-v pairs from news
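One way to read "compare tweet with P(i)" is to sum the weights of the profile's facet-value pairs that also occur in the tweet, and to keep the tweet if the sum reaches a threshold. The sketch below follows that reading. The scoring rule, the threshold and the weight given to news-derived pairs are all assumptions made for illustration; the slides do not specify them.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]        # a facet-value pair (f, v)
Profile = Dict[Pair, float]   # P(i): pair -> weight


def score(tweet_pairs: Set[Pair], profile: Profile) -> float:
    """Sum the weights of all profile pairs that the tweet mentions."""
    return sum(weight for pair, weight in profile.items() if pair in tweet_pairs)


def extend_with_news(profile: Profile, news_pairs: Set[Pair],
                     news_weight: float = 0.5) -> Profile:
    """Build P'(i): P(i) complemented with facet-value pairs from news."""
    extended = dict(profile)
    for pair in news_pairs:
        # Pairs already in P(i) keep their weight; new ones get an assumed default.
        extended.setdefault(pair, news_weight)
    return extended


def is_relevant(tweet_pairs: Set[Pair], profile: Profile,
                threshold: float = 0.5) -> bool:
    """Keep a tweet when its score reaches the (assumed) threshold."""
    return score(tweet_pairs, profile) >= threshold


P = {
    ("Location", "Nijmegen"): 0.8,
    ("Incident", "Crash"): 1.0,
    ("Organization", "Veolia"): 0.6,
}
tweet_a = {("Location", "Nijmegen"), ("Organization", "Veolia")}
tweet_b = {("Object", "Platform")}  # only matches a pair found in news articles

P_news = extend_with_news(P, {("Object", "Platform")})
print(is_relevant(tweet_a, P), is_relevant(tweet_b, P), is_relevant(tweet_b, P_news))
```

In this toy setup `tweet_b` is rejected against P(i) but accepted against P'(i), which is the effect the news context is meant to have.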


Faceted Search
•Strategies (ranking)

  • Frequency-based

  • Time-sensitive based

  • Personalized
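The first two strategies can be sketched as two ways of ordering the facet-values offered to the user. The code below is an illustration, not the ranking functions from the paper: the tweet representation, the function names and the exponential half-life decay used for the time-sensitive variant are assumptions. The personalized strategy is left out because it needs a user profile.

```python
from collections import Counter, defaultdict


def rank_by_frequency(tweets):
    """Order facet-value pairs by how many matching tweets mention them."""
    counts = Counter(pair for tweet in tweets for pair in tweet["pairs"])
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(counts, key=lambda pair: (-counts[pair], pair))


def rank_time_sensitive(tweets, now, half_life=3600.0):
    """Like frequency ranking, but older mentions count less (assumed decay)."""
    scores = defaultdict(float)
    for tweet in tweets:
        age = now - tweet["time"]          # seconds since the tweet was posted
        decay = 0.5 ** (age / half_life)   # weight halves every half_life seconds
        for pair in tweet["pairs"]:
            scores[pair] += decay
    return sorted(scores, key=lambda pair: (-scores[pair], pair))


tweets = [
    {"time": 0, "pairs": [("Location", "Nijmegen"), ("Incident", "Crash")]},
    {"time": 0, "pairs": [("Location", "Nijmegen")]},
    {"time": 7200, "pairs": [("Organization", "Veolia")]},
]

print(rank_by_frequency(tweets))
print(rank_time_sensitive(tweets, now=7200))
```

With these sample tweets the frequency ranking puts Nijmegen first, while the time-sensitive ranking promotes the most recent mention (Veolia) to the top.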



Real-time analytics
  [Dashboard panels: "What type of things are mentioned in the tweets?", "Impact Area", "What aspects are mentioned over time?", "What do people report about over time?"]




Evaluation - Dataset

•Twitter corpus (TREC Microblog Track 2011)
  • 16 million tweets (Jan. 24th – Feb. 8th, 2011)
  • 4,766,901 tweets classified as English
  • 6.2 million entity-extractions

•News (Same time period)
  • 62 RSS News Feeds
  • 13,959 News Articles
  • 357,559 entity-extractions

Evaluation
For Tweet Filtering (1/2)
[Figure: bar chart comparing the keyword-based and semantic filtering strategies across the evaluation metrics]

Semantic strategies outperform keyword-based filtering on all metrics.
Evaluation
For Tweet Filtering (2/2)




The semantic strategy is more robust and
achieves higher precision for complex topics.
Evaluation
 For Faceted Search (1/2)
[Figure: bar chart comparing faceted search strategies with semantic enrichment and without semantic enrichment]


The semantic faceted search strategy improves
the search performance by 34.8% and 22.4%.
Evaluation
For Faceted Search (2/2)
[Figure: bar chart comparing facet-value prediction of strategies with semantic enrichment and without semantic enrichment]

The strategies with semantic enrichment outperform
the strategy without semantic enrichment in
predicting the appropriate facet-values.
Conclusions

• What we have done:
  • Twitcident, a framework for filtering, searching, and
   analyzing information about incidents that people publish in
   their Social Web Streams


• What we have achieved:
  • Better filtering of Twitter messages for a given incident.
  • Better search for relevant information about an incident within
   the filtered messages.


Thank you!
                            @wisdelft
                      http://twitcident.org

                    Ke Tao
                    @taubau



The Most Excellent Way | 1 Corinthians 13
 
Science lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lessonScience lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lesson
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
Class 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfClass 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdf
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 

Semantics + Filtering + Search = Twitcident - Exploring Information in Social Web Streams

  • 1. Semantics + Filtering + Search = Twitcident: Exploring Information in Social Web Streams. Hypertext 2012, Milwaukee, WI – June 28. Fabian Abel, Claudia Hauff, Geert-Jan Houben, Richard Stronkman, Ke Tao. Web Information Systems, TU Delft (Delft University of Technology), the Netherlands.
  • 2. 200,000,000: number of tweets published per day.
  • 3. Pukkelpop 2011. People tweet about everything, everywhere :-)
  • 4. 200,000,000. Pukkelpop 2011 became a tragedy: 81,000 tweets in four hours. Useful tweets? Filtering; Search & Analytics.
  • 6. First tweet… "And then your train blasts off full of the anvils. #Nijmegen #veolia"
  • 7. First picture… "Astonishing! My train rams the platform at Nijmegen! http://pic.twitter.com/QVVfJHyd"
  • 8. Traditional news media: "A train rammed the anvils at Nijmegen."
  • 9. Research Challenges. 1. (Automatic) Filtering: Given an incident, how can one automatically identify those tweets that are relevant to the incident? 2. Search & Analytics: How can one improve search and analytical capabilities so that users can explore information in the streams of tweets? (Diagram labels: Twitter streams, topic, Filtering, Search & Analytics, information need.)
  • 10. Twitcident Pipeline: Automatic Filtering; Search & Analytics.
  • 11. Twitcident system. Figure 2: Screenshot of the Twitcident system: (a) search and filtering functionality to explore and retrieve particular Twitter messages, (b) messages that are related to the given incident (here: fires in Texas) and match the given query of the user, and (c) real-time analytics of the matching messages.
  • 12. Twitcident Pipeline: Automatic Filtering; Search & Analytics.
  • 13. Incident detection. Twitcident relies on Emergency Broadcasting Services for detecting incidents. In the Netherlands: P2000 communication network. Diagram labels: P2000 broadcast; broadcasted incident description: "Prio 1 fire :: Vlasweg 4 4782PW Moerdijk :: Chemie Pack"; Twitcident Framework; initial query on Twitter: (Moerdijk OR Chemie-Pack) AND (fire OR smoke OR flame…) SINCE:2011-01-05; refined query based on incident profiling: (Moerdijk OR Dordrecht…) AND (#moerdijkFire OR toxic…); incident in Twitcident: Twitcident system.
  • 14. Incident Profiling. For an incident i, the profile of the incident is described as a set of tuples. Each tuple includes a facet-value pair (f, v) and its weight for the incident i. Example (train accident): (Location, Netherlands) 0.4; (Incident, Train accident) 0.5; (Location, Nijmegen) 0.8; (Organization, Veolia) 0.6; (Incident, Crash) 1.0.
  • 15. Twitcident Pipeline: Automatic Filtering; Search & Analytics.
  • 16. Social Media Aggregation. Collecting Twitter messages, pictures, and videos from social media platforms, e.g. Twitter, PhotoBucket, Vimeo.
  • 17. Twitcident Pipeline: Automatic Filtering; Search & Analytics.
  • 18. Semantic Enrichment. Named Entity Recognition; Classification: Casualties, Damages, Risks…; Linkage: External Resources; Metadata extraction.
  • 19. Twitcident Pipeline: Automatic Filtering; Search & Analytics.
  • 20. Filtering. Which tweets are relevant to the incident? Preprocessing: language detection. Semantic Filtering: compare tweet with P(i). Semantic Filtering with News Context: P’(i) is P(i) complemented with f-v pairs from news.
  • 21. Twitcident Pipeline: Automatic Filtering; Search & Analytics.
  • 22. Faceted Search. Strategies (ranking): frequency-based; time-sensitive; personalized.
  • 23. Real-time analytics. What type of things are mentioned in the tweets? Impact Area. What aspects are mentioned over time? What do people report about over time?
  • 24. Evaluation: Dataset. Twitter corpus (TREC Microblog Track 2011): 16 million tweets (Jan. 24th – Feb. 8th, 2011); 4,766,901 tweets classified as English; 6.2 million entity-extractions. News (same time period): 62 RSS news feeds; 13,959 news articles; 357,559 entity-extractions.
  • 25. Evaluation for tweet filtering (1/2). [Bar chart; axis labels and values are not recoverable from the extraction.] Semantic strategies outperform the keyword-based filtering regarding all metrics.
  • 26. Evaluation for tweet filtering (2/2). The semantic strategy is more robust and achieves higher precision for complex topics.
  • 27. Evaluation for Faceted Search (1/2). [Bar chart comparing strategies with and without semantic enrichment; values are not recoverable from the extraction.] The semantic faceted search strategy improves the search performance by 34.8% and 22.4%.
  • 28. Evaluation for Faceted Search (2/2). [Bar chart comparing strategies with and without semantic enrichment; values are not recoverable from the extraction.] The strategies with semantic enrichment outperform the strategy without semantic enrichment in predicting the appropriate facet-values.
  • 29. Conclusions. What we have done: Twitcident, a framework for filtering, searching, and analyzing information about incidents that people publish in their Social Web streams. What we have achieved: better filtering of Twitter messages for a given incident; better search for relevant information about an incident within the filtered messages.
  • 30. Thank you! @wisdelft http://twitcident.org Ke Tao @taubau
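
The Incident Profiling and Filtering slides describe a profile P(i) as weighted facet-value pairs and say only that a tweet is compared with P(i). A minimal Python sketch of one way this could work is given below. The scoring rule (sum of the weights of matching pairs), the threshold, and the names `relevance`, `is_relevant`, and `with_news_context` are assumptions made for illustration; the slides do not specify them. The profile weights are the ones shown on the Incident Profiling slide.

```python
from typing import Dict, Iterable, Tuple

# A facet-value pair (f, v), e.g. ("Location", "Nijmegen").
FacetValue = Tuple[str, str]
Profile = Dict[FacetValue, float]

# Example profile P(i) taken from the Incident Profiling slide.
NIJMEGEN_PROFILE: Profile = {
    ("Location", "Netherlands"): 0.4,
    ("Incident", "Train accident"): 0.5,
    ("Location", "Nijmegen"): 0.8,
    ("Organization", "Veolia"): 0.6,
    ("Incident", "Crash"): 1.0,
}


def relevance(tweet_pairs: Iterable[FacetValue], profile: Profile) -> float:
    """ASSUMPTION: score a tweet as the sum of the profile weights of the
    distinct facet-value pairs extracted from it. Unknown pairs score 0."""
    return sum(profile.get(pair, 0.0) for pair in set(tweet_pairs))


def is_relevant(tweet_pairs: Iterable[FacetValue], profile: Profile,
                threshold: float = 1.0) -> bool:
    """ASSUMPTION: a tweet passes the filter when its score reaches a
    hypothetical threshold; the slides give no cut-off value."""
    return relevance(tweet_pairs, profile) >= threshold


def with_news_context(profile: Profile, news_pairs: Profile) -> Profile:
    """Build P'(i): P(i) complemented with facet-value pairs from news.
    ASSUMPTION: weights already present in P(i) are kept unchanged."""
    extended = dict(news_pairs)
    extended.update(profile)
    return extended


# Pairs that semantic enrichment might extract from a tweet (illustrative).
tweet = [("Location", "Nijmegen"), ("Organization", "Veolia")]
off_topic = [("Location", "Netherlands")]
print(is_relevant(tweet, NIJMEGEN_PROFILE))      # score 0.8 + 0.6 passes
print(is_relevant(off_topic, NIJMEGEN_PROFILE))  # score 0.4 does not
```

Under these assumptions, adding news context can only raise a tweet's score, because P'(i) contains every pair of P(i) with its original weight.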

Editor's Notes

  1. There are millions of tweets posted every day. Motivation: information overload; personalised “better” search.
  2. People tweet about everything, e.g. when they are at a festival like Pukkelpop (普客pop) they may report about their experiences...
  3. This festival actually became a disaster (5 people died): 80k tweets were published in the first 4 hours (during the incident, the emergency services had problems getting an overview of the situation) -> how can one (a) automatically filter information from Twitter and (b) provide search and analytics? (s4)
  7. Research challenges here.
  8. Twitcident pipeline = how we tackle these challenges. We get information from emergency broadcasters, or formulate in advance something we want to monitor during big events. In these ways, we get the basic information about the incidents or events. Then we do the automatic filtering in 4 steps. First, we construct the profiles of the incidents, including incident metadata such as the location and the names of the organizations and people involved. Next, we aggregate information such as texts, pictures, and videos from the social web, especially Twitter. Then we extract the semantics from these media to learn more about what the information is about and where it was posted. Then we filter the aggregated information to get the incident-relevant media. Further refine. On top of these, we use search and various analytics to satisfy the information needs of authorities and the general public.
  9. Search, Filtering, Analytics
  25. Search, Filtering, Analytics. Koren et al., Personalized Interactive Faceted Search, WWW 2008.
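
The Faceted Search slide names a frequency-based ranking strategy without defining it. One plausible reading, sketched here in Python, ranks each facet-value pair by the number of filtered tweets that mention it. The function name `rank_facet_values`, the counting rule (each tweet counts a pair at most once), and the alphabetical tie-break are assumptions for illustration, not the authors' stated method.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# A facet-value pair (f, v), e.g. ("Incident", "Crash").
FacetValue = Tuple[str, str]


def rank_facet_values(
    tweets: Iterable[Iterable[FacetValue]],
) -> List[Tuple[FacetValue, int]]:
    """Rank facet-value pairs by how many tweets mention them.

    ASSUMPTION: a pair repeated inside one tweet is counted once, and
    ties are broken alphabetically so the order is deterministic.
    """
    counts: Counter = Counter()
    for pairs in tweets:
        counts.update(set(pairs))  # de-duplicate within a single tweet
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Illustrative facet-value pairs for three already-filtered tweets.
filtered_tweets = [
    [("Location", "Nijmegen"), ("Incident", "Crash")],
    [("Location", "Nijmegen"), ("Organization", "Veolia")],
    [("Location", "Nijmegen"), ("Incident", "Crash"), ("Incident", "Crash")],
]

for pair, count in rank_facet_values(filtered_tweets):
    print(count, pair)
```

The time-sensitive and personalized strategies listed on the same slide would need extra inputs (tweet timestamps, a user profile) that this sketch leaves out.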