Twitter, Twinder, Twitcident: Filtering
and Search on Social Web Streams

Data Bridges Workshop, Inria, Paris, April 12th 2012



                        Fabian Abel, Claudia Hauff, Geert-Jan Houben,
                                           Richard Stronkman, Ke Tao
                              Web Information Systems, TU Delft, the Netherlands

        Delft University of Technology
200,000,000
  number of tweets published per day



Pukkelpop 2011




                 People tweet about everything, everywhere :-)
Pukkelpop 2011 became a tragedy: 81,000 tweets in four hours

                            Filtering
                            Search & Browsing
Challenges
  1. (Automatic) Filtering: Given a topic (e.g. expressed via
     some keywords), how can one automatically identify
     those tweets that are relevant to the topic?

  2. Search & Browsing: How can one improve search and
     browsing capabilities so that users can explore
     information in the streams of tweets (that are relevant for
     a topic)?
[Diagram: Twitter streams pass through Filtering (driven by a topic) and Search & Browsing (driven by an information need); Twinder is the filtering and search framework]
1. Filtering of Twitter streams
Filtering on Twitter
Query: www2012
Typical approach: keyword-based matching

Are there further features that can be used as indicators for estimating the relevance of a tweet for a topic?
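The keyword-based baseline can be sketched as follows. This is a minimal sketch, not the Twinder implementation: it assumes relevance is the fraction of query terms found in the tweet, and the tokenizer, `keyword_relevance`, `keyword_filter` and the `threshold` parameter are hypothetical names chosen for illustration.

```python
import re


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphanumeric runs, so "#WWW2012" matches "www2012".
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_relevance(query: str, tweet: str) -> float:
    """Fraction of distinct query terms that also occur in the tweet (0.0 to 1.0)."""
    query_terms = set(tokenize(query))
    if not query_terms:
        return 0.0
    return len(query_terms & set(tokenize(tweet))) / len(query_terms)


def keyword_filter(query: str, tweets: list[str], threshold: float = 1.0) -> list[str]:
    # Keep tweets whose keyword relevance reaches the (assumed) threshold.
    return [t for t in tweets if keyword_relevance(query, t) >= threshold]


tweets = [
    "Heading to #WWW2012 in Lyon",
    "Great talk by Tim Berners-Lee",
]
print(keyword_filter("www2012", tweets))  # ['Heading to #WWW2012 in Lyon']
```

The second tweet shows the weakness the question above points at: it may be about the conference, yet pure keyword matching discards it.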
Syntactical feature: hashtags
Is a tweet more relevant if it contains a #hashtag?

  Hypothesis: tweets that contain hashtags are more likely
  to be relevant than tweets that do not contain hashtags.
Syntactical feature: URLs
Is a tweet that contains a URL more relevant?

  Hypothesis: tweets that contain a URL are more likely to
  be relevant than tweets that do not contain a URL.
Syntactical feature: “mentions”
Is a tweet that mentions @somebody more relevant?

   Hypothesis: tweets that are formulated as a reply to another
   tweet are less likely to be relevant than other tweets.
Syntactical feature: length
Does the length of a tweet influence its relevance for a topic?

     Example: 54 characters (9 words) vs. 140 characters (20 words)

  Hypothesis: the longer a tweet, the more likely it is to be
  relevant and interesting.
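The four syntactical features above (hashtag, URL, reply, length) can be extracted with a few regular expressions. This is a minimal sketch under stated assumptions: a reply is taken to be a tweet whose text starts with an @mention, and the class and function names are hypothetical.

```python
import re
from dataclasses import dataclass

HASHTAG = re.compile(r"#\w+")         # '#' followed by word characters
URL = re.compile(r"https?://\S+")     # http(s) link up to the next whitespace


@dataclass
class SyntacticalFeatures:
    has_hashtag: bool
    has_url: bool
    is_reply: bool
    length_chars: int
    length_words: int


def syntactical_features(text: str) -> SyntacticalFeatures:
    return SyntacticalFeatures(
        has_hashtag=HASHTAG.search(text) is not None,
        has_url=URL.search(text) is not None,
        # Assumption: a reply is a tweet whose text starts with an @mention.
        is_reply=text.lstrip().startswith("@"),
        length_chars=len(text),
        length_words=len(text.split()),
    )


print(syntactical_features("@bob see http://bit.ly/x #www2012"))
```

Under the hypotheses above, `has_hashtag`, `has_url` and a larger length would count in favour of relevance, while `is_reply` would count against it.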
Overview of features
Topic-sensitive and topic-insensitive features

     Topic sensitive                Topic insensitive
     Keyword-based relevance        Syntactical features

What about the semantics?
Semantic features: number of entities
Find semantics in a tweet to estimate its relevance

     Example entities: dbp:Tim_Berners-Lee, dbp:World_Wide_Web,
     dbp:WWW_Conference, dbp:France, dbp:Lyon

  Hypothesis: the more entities a tweet mentions, the more
  likely it is to be relevant and interesting.
Semantic features: diversity
The types of entities that are featured by a tweet matter

     Entities of several types (Person, Thing, Event, Place)
     vs.
     "I plan to visit Paris, Bordeaux, Grenoble, Nice, Marseille and Lyon." (Place, Place, Place, ...)

  Hypothesis: the higher the diversity of entities that are
  mentioned in a tweet, the more likely it is to be relevant.
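Given entity annotations from a semantic enrichment step, both semantic features are simple set sizes. A minimal sketch, assuming each annotation is a hypothetical `(uri, type)` pair; the type assigned to each example entity below is an illustrative assumption.

```python
def entity_features(entities: list[tuple[str, str]]) -> tuple[int, int]:
    """Return (#entities, diversity) for a tweet's (uri, type) annotations.

    #entities counts distinct entity URIs; diversity counts distinct entity types.
    """
    num_entities = len({uri for uri, _ in entities})
    diversity = len({etype for _, etype in entities})
    return num_entities, diversity


# Hypothetical annotations; in practice they would come from an entity extractor.
mixed = [
    ("dbp:Tim_Berners-Lee", "Person"),
    ("dbp:World_Wide_Web", "Thing"),
    ("dbp:WWW_Conference", "Event"),
    ("dbp:France", "Place"),
    ("dbp:Lyon", "Place"),
]
places = [(f"dbp:{city}", "Place")
          for city in ["Paris", "Bordeaux", "Grenoble", "Nice", "Marseille", "Lyon"]]

print(entity_features(mixed))   # (5, 4)
print(entity_features(places))  # (6, 1)
```

The city list mentions more entities but has the lowest possible diversity, which is exactly the contrast the two hypotheses draw.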
Semantic features: sentiment
Opinions expressed in tweets are interesting

     :-)       "Looking forward to the WWW conference :-) Yes!"
     neutral   "I plan to visit Paris, Bordeaux, Grenoble, Nice, Marseille and Lyon."
     :-(       "Why are the big players not releasing query logs to the WWW community? :-( #fail"

  Hypothesis: the likelihood of a tweet’s relevance is
  influenced by its sentiment polarity.
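Sentiment polarity can be approximated with a lexicon of polar tokens. This is a minimal sketch: the tiny emoticon/hashtag lexicon is an assumption for illustration, not the sentiment method used in the study.

```python
# Tiny illustrative lexicon (an assumption, not the lexicon used in the study).
POSITIVE = {":-)", ":)"}
NEGATIVE = {":-(", ":(", "#fail"}


def sentiment_polarity(text: str) -> int:
    """Return +1 (positive), 0 (neutral) or -1 (negative)."""
    tokens = [t.lower() for t in text.split()]
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    # Map the net score to its sign.
    return (score > 0) - (score < 0)


for tweet in [
    "Looking forward to the WWW conference :-) Yes!",
    "I plan to visit Paris, Bordeaux, Grenoble, Nice, Marseille and Lyon.",
    "Why are the big players not releasing query logs to the WWW community? :-( #fail",
]:
    print(sentiment_polarity(tweet), tweet)
```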
Semantic relatedness
Exploit semantics to relate the query to tweets

     Example: dbp:International_World_Wide_Web_Conference related to dbp:Tim_Berners-Lee
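One simple way to relate query entities to tweet entities is to check for direct links in a knowledge graph. This is a minimal sketch, assuming relatedness is the share of tweet entities that either are a query entity or are directly linked to one; the `links` graph and its single edge are hypothetical, not looked up in DBpedia.

```python
def semantic_relatedness(query_entities: set[str], tweet_entities: set[str],
                         links: dict[str, set[str]]) -> float:
    """Share of tweet entities that are a query entity or directly linked to one.

    `links` maps an entity URI to the URIs it is connected to (one direction only).
    """
    if not tweet_entities:
        return 0.0
    related = sum(
        1 for e in tweet_entities
        if e in query_entities or links.get(e, set()) & query_entities
    )
    return related / len(tweet_entities)


# Toy graph; this single edge is hypothetical.
links = {
    "dbp:Tim_Berners-Lee": {"dbp:International_World_Wide_Web_Conference"},
}
query = {"dbp:International_World_Wide_Web_Conference"}
print(semantic_relatedness(query, {"dbp:Tim_Berners-Lee", "dbp:France"}, links))  # 0.5
```

Unlike keyword matching, this scores a tweet about Tim Berners-Lee as related to the conference query even though it never contains the query string.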
Overview of features
By now, we have four types of features.

     Topic sensitive        Topic insensitive
     Keyword-based          Syntactical
     Semantic-based         Semantics
     Context?               Context?

What kind of contextual features might be helpful?
Contextual feature: authority of the publisher
It matters who published a tweet

     Hypothesis: the higher the number of tweets that have
     been published by the creator of a tweet, the more likely
     it is that the tweet is relevant.
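The authority feature only needs a per-author tweet count. A minimal sketch, assuming the count is taken over the collected tweets (a user's total tweet count from their profile would be an alternative); `publisher_activity` is a hypothetical name.

```python
from collections import Counter


def publisher_activity(tweets: list[tuple[str, str]]) -> Counter:
    """Count how many tweets each author published in the collection.

    tweets: (author, text) pairs.
    """
    return Counter(author for author, _ in tweets)


corpus = [
    ("alice", "Keynote at #www2012 starts now"),
    ("alice", "Slides are online"),
    ("bob", "lunch"),
]
activity = publisher_activity(corpus)
print(activity["alice"], activity["bob"], activity["carol"])  # 2 1 0
```

A `Counter` returns 0 for unseen authors, so the feature is defined for every tweet without special-casing.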
Contextual feature: time w.r.t. query
When was a tweet published?
 Hypothesis: the lower the temporal distance between the
 query time and the creation time of a tweet, the more likely
 the tweet is to be relevant to the topic.

     Example: tweet published March 31, query issued April 16
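The temporal distance for the slide's example is a plain date difference. This sketch also adds a hypothetical `recency_score` with an assumed half-life, one possible way to turn the distance into a feature that decreases as the distance grows; the year in the example is arbitrary.

```python
from datetime import datetime


def temporal_distance_days(tweet_time: datetime, query_time: datetime) -> float:
    """Days between a tweet's creation time and the query time (>= 0 for past tweets)."""
    return (query_time - tweet_time).total_seconds() / 86400


def recency_score(distance_days: float, half_life_days: float = 7.0) -> float:
    # Hypothetical exponential decay: the score halves every `half_life_days`.
    return 0.5 ** (distance_days / half_life_days)


# The slide's example dates; the year is an arbitrary choice.
d = temporal_distance_days(datetime(2012, 3, 31), datetime(2012, 4, 16))
print(d)  # 16.0
print(round(recency_score(d), 3))
```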
Summary of Features

     Topic sensitive        Topic insensitive
     Keyword-based          Syntactical
     Semantic-based         Semantics
     Context-based          Context
Results
Achieved for the TREC Microblog Challenge

Features               Precision    Recall    F-measure
keyword relevance      0.3040       0.2924    0.2981
without semantics      0.3363       0.4828    0.3965
semantic relevance     0.3053       0.2931    0.2991
all features           0.3674       0.4736    0.4138

 Overall, applying all the features achieves a precision
 above 35% and a recall above 45%.
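The F-measure column is the harmonic mean of the precision and recall columns, F = 2PR / (P + R). The short check below recomputes it for each row; the values agree with those reported to within rounding of the four-digit precision and recall figures.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-measure) from the results table
results = {
    "keyword relevance": (0.3040, 0.2924, 0.2981),
    "without semantics": (0.3363, 0.4828, 0.3965),
    "semantic relevance": (0.3053, 0.2931, 0.2991),
    "all features": (0.3674, 0.4736, 0.4138),
}
for name, (p, r, reported) in results.items():
    print(f"{name:20s} computed={f_measure(p, r):.4f} reported={reported:.4f}")
```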
Importance of Features
[Chart: importance of each feature (y-axis from -1 to 2), in six panels: Keyword-based (keyword-based relevance); Syntactical (hasHashtag, hasURL, isReply, length); Semantic-based (relevance, relatedness); Semantics (#entities, diversity, sentiment); Context-based (temporal context); Context (social context)]

Semantic relatedness, URLs, !isReply, diversity and sentiment are good indicators for estimating the relevance of a tweet.
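One way to combine such indicators into a single relevance estimate is a logistic model. This is an assumption for illustration, not the model behind the chart: the weights below are made up, with signs chosen to follow the indicators named above (positive for semantic relatedness, URLs and diversity, negative for replies).

```python
import math

# Hypothetical weights for illustration only; they are NOT read off the chart.
WEIGHTS = {
    "semantic_relatedness": 1.5,
    "has_url": 0.8,
    "is_reply": -0.6,
    "diversity": 0.4,
}
BIAS = -1.0


def relevance_probability(features: dict[str, float]) -> float:
    """Logistic combination of feature values; unknown feature names are ignored."""
    z = BIAS + sum(WEIGHTS[name] * value
                   for name, value in features.items() if name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


reply = {"semantic_relatedness": 0.5, "has_url": 0, "is_reply": 1, "diversity": 1}
linked = {"semantic_relatedness": 0.5, "has_url": 1, "is_reply": 0, "diversity": 1}
print(round(relevance_probability(reply), 3), round(relevance_probability(linked), 3))
```

In practice the weights would be learned from labelled tweets, for example with a logistic regression classifier, rather than set by hand.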
2. Search & Browsing in Twitter Streams
Idea: Faceted Search
[Screenshot: faceted search for the current query "Eindhoven", "Music". Left: query-expansion suggestions (+ Guilty Simpson, + Area51) and facets for Locations, Events and Music Artists (+ Guilty Simpson, + Bryan Adams, + Elton John, + Golden Earring, more...). Right: the matching tweets, e.g. "Next saturday @thatsimpsonguy aka Guilty Simpson will be performing at Area51 in my homeytown Eindhoven."]
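The core of faceted search can be sketched as set containment over facet-value pairs. A minimal sketch, assuming each tweet carries a hypothetical set of `(facet type, value)` annotations: tweets matching all selected pairs are returned, and the remaining pairs in those tweets are counted as candidate query expansions. The annotations below are made up.

```python
from collections import Counter

Facet = tuple[str, str]  # (facet type, value), e.g. ("Location", "Eindhoven")


def faceted_search(tweets: list[dict], selected: set[Facet]):
    """Return tweets annotated with every selected facet-value pair,
    plus counts of the remaining pairs (candidate query expansions)."""
    matching = [t for t in tweets if selected <= t["facets"]]
    suggestions = Counter(pair for t in matching for pair in t["facets"] - selected)
    return matching, suggestions


# Hypothetical annotated tweets.
tweets = [
    {"text": "Guilty Simpson performing at Area51 in Eindhoven",
     "facets": {("Location", "Eindhoven"), ("Topic", "Music"),
                ("MusicArtist", "Guilty Simpson"), ("Location", "Area51")}},
    {"text": "Concert in Eindhoven tonight",
     "facets": {("Location", "Eindhoven"), ("Topic", "Music")}},
    {"text": "Rain in Delft", "facets": {("Location", "Delft")}},
]
matching, suggestions = faceted_search(tweets, {("Location", "Eindhoven"), ("Topic", "Music")})
print(len(matching), suggestions.most_common())
```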
Adaptive Faceted Search
[Architecture, layers from bottom to top: Twitter posts, Semantic Enrichment, User and Context Modeling, Adaptive Faceted Search, user]
• How to represent the content of a tweet? (facet extraction)
• How to adapt the facet-value pair ranking to the current demands of the user? (query suggestions)
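Adapting the facet-value ranking to a user can be sketched as boosting frequent pairs by the user's interest in them. This is a minimal sketch: the `user_profile` (interest scores in [0, 1], e.g. derived from the user's own tweets) and the multiplicative boosting formula are assumptions, not the strategy evaluated in the study.

```python
from collections import Counter


def rank_facet_values(counts: Counter, user_profile: dict, weight: float = 1.0) -> list:
    """Order facet-value pairs by frequency boosted by the user's interest in them.

    user_profile maps a facet-value pair to an interest score in [0, 1];
    the boosting formula is an illustrative assumption.
    """
    def score(pair):
        return counts[pair] * (1.0 + weight * user_profile.get(pair, 0.0))
    return sorted(counts, key=score, reverse=True)


counts = Counter({
    ("MusicArtist", "Bryan Adams"): 3,
    ("MusicArtist", "Guilty Simpson"): 2,
    ("MusicArtist", "Elton John"): 1,
})
profile = {("MusicArtist", "Guilty Simpson"): 1.0}  # hypothetical user interest

print(rank_facet_values(counts, {}))       # frequency only
print(rank_facet_values(counts, profile))  # personalized
```

With an empty profile the ranking falls back to plain frequency, which makes it a natural non-personalized baseline.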
Facet Extraction and Semantic Enrichment

Tweet-based enrichment: the tweet "@bob: Julian Assange got arrested" yields the entity Julian Assange.
Link-based enrichment: the linked article "Julian Assange arrested: Julian Assange, the founder of WikiLeaks, is under arrest in London…" additionally yields WikiLeaks and London.
Resulting facet values: Julian Assange, WikiLeaks, London
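The two enrichment strategies combine naturally as a set union: entities from the tweet text plus entities from every page it links to. A minimal sketch: the entity extractor and page fetcher are injected as hypothetical callables, and the stubs below stand in for a real extraction service and HTTP client.

```python
import re
from typing import Callable

URL = re.compile(r"https?://\S+")


def enrich(tweet: str,
           extract_entities: Callable[[str], set[str]],
           fetch_page: Callable[[str], str]) -> set[str]:
    """Tweet-based enrichment plus link-based enrichment of every linked page."""
    entities = set(extract_entities(tweet))                  # tweet-based
    for url in URL.findall(tweet):
        entities |= set(extract_entities(fetch_page(url)))   # link-based
    return entities


# Stubs standing in for a real entity extractor and HTTP client (hypothetical).
KNOWN = {"Julian Assange", "WikiLeaks", "London"}
PAGES = {"http://example.org/assange":
         "Julian Assange, the founder of WikiLeaks, is under arrest in London"}


def extract(text: str) -> set[str]:
    return {name for name in KNOWN if name in text}


def fetch(url: str) -> str:
    return PAGES.get(url, "")


print(sorted(enrich("@bob: Julian Assange got arrested http://example.org/assange",
                    extract, fetch)))
# ['Julian Assange', 'London', 'WikiLeaks']
```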
Faceted search vs. hashtag-based (keyword) search
Faceted search based on semantic enrichment of tweets significantly outperforms hashtag-based search.
Impact of link-based enrichment
• The personalized strategy significantly outperforms the baseline.
• Link-based enrichment improves quality for both strategies.
Twitcident application
Twitcident applies the filtering and search functionality to distill information from Twitter during incidents (e.g. fires, extreme weather situations).
Twitcident Pipeline
[Diagram: the Twitcident pipeline, with Automatic Filtering and Search & Browsing stages]
Faceted Search
[Screenshot: faceted search over the filtered Twitter stream]
Real-time visualizations
Could we see it coming?
[Chart annotated "Popular artist made a joke about the weather" and "Impact storm"]

Term usage 25 minutes before the incident:
     1.   heavy weather, hail balls, lightning, pitch black…
     2.   drama, panic, hell, serious, extreme…
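Term usage in a window before an incident can be computed by counting tokens in tweets whose timestamps fall inside that window. A minimal sketch, assuming whitespace tokenization and a 25-minute window; the function name and the timestamps in the example are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def term_usage_before(tweets, incident_time: datetime,
                      window: timedelta = timedelta(minutes=25)) -> Counter:
    """Count lowercase tokens in tweets posted in [incident_time - window, incident_time)."""
    start = incident_time - window
    counts = Counter()
    for posted_at, text in tweets:
        if start <= posted_at < incident_time:
            counts.update(text.lower().split())
    return counts


incident = datetime(2011, 8, 18, 18, 0)  # hypothetical timestamp
stream = [
    (incident - timedelta(minutes=30), "nice weather"),  # outside the window
    (incident - timedelta(minutes=20), "pitch black sky and heavy weather"),
    (incident - timedelta(minutes=5), "panic heavy weather"),
]
counts = term_usage_before(stream, incident)
print(counts.most_common(2))
```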
Spotting eyewitnesses
Real-time information from eyewitnesses
Summary
Automatic filtering of tweets: [#MSM@WWW ’12]
• Topic-sensitive and topic-insensitive features
• Semantic features (semantic relatedness, diversity, sentiment) are beneficial
Search and browsing: [ISWC ’11]
• Faceted search
• Personalization & contextualization help
Application: [Hypertext ’12, Demo@WWW ’12]
• Twitcident: fulfilling information needs during incidents
Future work:
• Weak signal detection based on tweets
• Duplicate detection
Thank you!


                 @fabianabel
                 http://wis.ewi.tudelft.nl/

"ML in Production",Oleksandr BaganFwdays
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 

Último (20)

The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 

Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams

  • 1. Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams. Data Bridges Workshop, Inria, Paris, April 12th 2012. Fabian Abel, Claudia Hauff, Geert-Jan Houben, Richard Stronkman, Ke Tao. Web Information Systems, TU Delft, the Netherlands. Delft University of Technology
  • 2. 200,000,000 number of tweets published per day
  • 3. Pukkelpop 2011: People tweet about everything, everywhere :-)
  • 4. 200,000,000 tweets per day; Pukkelpop 2011 became a tragedy: 81,000 tweets in four hours. [Diagram: Filtering → Search & Browsing]
  • 5. Challenges: 1. (Automatic) Filtering: Given a topic (e.g. expressed via some keywords), how can one automatically identify those tweets that are relevant to the topic? 2. Search & Browsing: How can one improve search and browsing capabilities so that users can explore information in the streams of tweets (that are relevant for a topic)? [Diagram: Twinder, a filtering and search framework: Twitter streams → Filtering (topic) → Search & Browsing (information need)]
  • 6. 1. Filtering of Twitter streams [Diagram: Twitter streams → Filtering (topic) → Search & Browsing (information need)]
  • 7. Filtering on Twitter. Query: www2012. Typical approach: keyword-based matching. Are there further features that can be used as indicators for estimating the relevance of a tweet for a topic?
  • 8. Syntactical feature: hashtags. Is a tweet more relevant if it contains a #hashtag? Hypothesis: tweets that contain hashtags are more likely to be relevant than tweets that do not contain hashtags.
  • 9. Syntactical feature: URLs. Is a tweet that contains a URL more relevant? Hypothesis: tweets that contain a URL are more likely to be relevant than tweets that do not contain a URL.
  • 10. Syntactical feature: “mentions”. Is a tweet that mentions @somebody more relevant? Hypothesis: tweets that are formulated as a reply to another tweet are less likely to be relevant than other tweets.
  • 11. Syntactical feature: length. Does the length of a tweet influence its relevance for a topic? 54 characters (9 words) vs. 140 characters (20 words). Hypothesis: the longer a tweet, the more likely it is to be relevant and interesting.
  • 12. Overview of features: topic-sensitive and topic-insensitive features. Topic-sensitive: keyword-based relevance. Topic-insensitive: syntactical features. What about the semantics?
  • 13. Semantic features: number of entities. Find semantics in a tweet to estimate its relevance: dbp:Tim_Berners-Lee, dbp:World_Wide_Web, dbp:WWW_Conference, dbp:France, dbp:Lyon. Hypothesis: the more entities a tweet mentions, the more likely it is to be relevant and interesting.
  • 14. Semantic features: diversity. The types of entities featured in a tweet matter: a tweet mentioning a Person, a Thing, an Event and Places vs. "I plan to visit Paris, Bordeaux, Grenoble, Nice, Marseille and Lyon." (Places only). Hypothesis: the higher the diversity of entities that are mentioned in a tweet, the more likely it is to be relevant.
  • 15. Semantic features: sentiment. Opinions expressed in tweets are interesting: "Looking forward to the WWW conference :-) Yes!" (positive) vs. "I plan to visit Paris, Bordeaux, Grenoble, Nice, Marseille and Lyon." (neutral) vs. "Why are the big players not releasing query logs to the WWW community? :-( #fail" (negative). Hypothesis: the likelihood of a tweet’s relevance is influenced by its sentiment polarity.
  • 16. Semantic relatedness: exploit semantics to relate the query with tweets, e.g. dbp:International_World_Wide_Web_Conference and dbp:Tim_Berners-Lee.
  • 17. Overview of features: by now, we have 4 types of features. Topic-sensitive: keyword-based, semantic-based. Topic-insensitive: syntactical, semantics. Context? What kind of contextual features might be helpful?
  • 18. Contextual feature: authority of the publisher. It matters who published a tweet. Hypothesis: the higher the number of tweets that have been published by the creator of a tweet, the more likely it is that the tweet is relevant.
  • 19. Contextual feature: time w.r.t. the query. When was a tweet published? Hypothesis: the lower the temporal distance between the query time and the creation time of a tweet, the more likely the tweet is relevant to the topic. [Timeline: tweet on March 31, query on April 16]
  • 20. Summary of features. Topic-sensitive: keyword-based, semantic-based, context-based. Topic-insensitive: syntactical, semantics, context.
  • 21. Results achieved for the TREC Microblog Challenge:
      Features             Precision  Recall  F-measure
      keyword relevance    0.3040     0.2924  0.2981
      without semantics    0.3363     0.4828  0.3965
      semantic relevance   0.3053     0.2931  0.2991
      all features         0.3674     0.4736  0.4138
    Overall, by applying all the features we achieve a precision of over 35% and a recall of over 45%.
  • 22. Importance of features. [Figure: importance of each feature on a scale from -1 to 2, in topic-sensitive and topic-insensitive panels: keyword-based (keyword-based relevance), syntactical (hasHashtag, hasURL, isReply, length), semantic-based (semantic relevance, semantic relatedness), semantics (#entities, diversity, sentiment), context-based (temporal context, social context)] Semantic relatedness, URLs, !isReply, diversity and sentiment are good indicators for estimating the relevance of a tweet.
  • 23. 2. Search & Browsing in Twitter Streams [Diagram: Twitter streams → Filtering (topic) → Search & Browsing (information need)]
  • 24. Idea: faceted search. [Screenshot: current query "Eindhoven Music" with suggestions to expand the query (+ Guilty Simpson, + Area51); facets Locations, Events, Music Artists (+ Guilty Simpson, + Bryan Adams, + Elton John, + Golden Earring) and Intents; results: 1. Yskiddd: "Next saturday @thatsimpsonguy aka Guilty Simpson will be performing at Area51 in my homeytown Eindhoven. #realliveshit #iwillspinrecords2", 2. Usee123: "Cool #EV3door7980 !!! http://bit.ly/igyyRhL", 3. sanmiquelmusic: "This Saturday I'm joining @KrusadersMusic to …"]
  • 25. Adaptive Faceted Search. [Architecture diagram, top to bottom: user; Adaptive Faceted Search (facet extraction, query suggestions); User and Context Modeling; Semantic Enrichment; Twitter posts] Open questions: How to represent the content of a tweet? How to adapt the facet-value pair ranking to the current demands of the user?
  • 26. Facet Extraction and Semantic Enrichment. Tweet-based enrichment: "@bob: Julian Assange got arrested" yields Julian Assange. Link-based enrichment: the linked article ("Julian Assange, the founder of WikiLeaks, is under arrest in London…") adds WikiLeaks and London.
  • 27. Faceted search vs. hashtag-based (keyword) search: faceted search based on semantic enrichment of tweets significantly outperforms hashtag-based search.
  • 28. Impact of link-based enrichment: the personalized strategy significantly outperforms the baseline, and link-based enrichment improves quality for both strategies.
  • 29. Twitcident application [Diagram: Twitter streams → Filtering (topic) → Search & Browsing (information need)]. Twitcident: applying filter & search functionality for distilling information from Twitter during incidents (e.g. fires, extreme weather situations).
  • 30. 200,000,000 tweets per day; Pukkelpop 2011 became a tragedy: 81,000 tweets in four hours. [Diagram: Filtering → Search & Browsing]
  • 31. Twitcident pipeline [Diagram: Automatic Filtering, Search & Browsing]
  • 34. Could we see it coming? [Chart: term usage over time, annotated "Popular artist made a joke about the weather" and "Impact storm"] Term usage 25 minutes before the incident: 1. heavy weather, hail balls, lightning, pitch black… 2. drama, panic, hell, serious, extreme…
  • 36. Real-time information from eyewitnesses
  • 37. Summary. Automatic filtering of tweets [#MSM@WWW '12]: topic-sensitive and topic-insensitive features; semantic features (semantic relatedness, diversity and sentiment are beneficial). Search and browsing [ISWC '11]: faceted search; personalization and contextualization help. Application [Hypertext '12, Demo@WWW '12]: Twitcident, fulfilling information needs during incidents. Future work: weak signal detection based on tweets; duplicate detection.
  • 38. Thank you! @fabianabel http://wis.ewi.tudelft.nl/
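
Slide 7 names keyword-based matching as the typical filtering approach. A minimal sketch of it as a term-overlap score follows; the tokenizer and the fraction-of-query-terms score are assumptions for illustration, not necessarily the keyword-relevance measure used in the Twinder experiments.

```python
import re
from typing import Set


def tokenize(text: str) -> Set[str]:
    """Lower-case the text and return its alphanumeric tokens ('#' and '@' are dropped)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_relevance(query: str, tweet: str) -> float:
    """Fraction of query terms that also occur in the tweet, between 0.0 and 1.0."""
    query_terms = tokenize(query)
    if not query_terms:
        return 0.0
    return len(query_terms & tokenize(tweet)) / len(query_terms)


print(keyword_relevance("www2012", "See you in Lyon at #www2012!"))
```

Because the hashtag sign is stripped, the query "www2012" matches the tweet's "#www2012"; a tweet about the conference that never uses the term scores 0, which is why slide 7 asks for further features.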
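
The syntactical features of slides 8 to 11 (hashtags, URLs, replies, length) can be computed from the raw tweet text alone. This sketch assumes simple regular expressions for hashtags and URLs and treats a tweet that starts with an @mention as a reply; reply metadata delivered with the tweet would be a more reliable signal in practice.

```python
import re
from dataclasses import dataclass

HASHTAG = re.compile(r"(?<!\w)#\w+")   # '#' not preceded by a word character
URL = re.compile(r"https?://\S+")


@dataclass
class SyntacticalFeatures:
    has_hashtag: bool
    has_url: bool
    is_reply: bool
    length_chars: int
    length_words: int


def syntactical_features(text: str) -> SyntacticalFeatures:
    """Extract the topic-insensitive syntactical features of one tweet."""
    return SyntacticalFeatures(
        has_hashtag=HASHTAG.search(text) is not None,
        has_url=URL.search(text) is not None,
        # Assumption: a tweet is treated as a reply when it starts with an @mention.
        is_reply=text.lstrip().startswith("@"),
        length_chars=len(text),
        length_words=len(text.split()),
    )


print(syntactical_features("@bob see http://example.org #www2012"))
```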
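
Slides 13 and 14 contrast the number of entities with their diversity. The sketch below assumes the tweets are already annotated with (DBpedia resource, type) pairs, as an entity-linking service would produce them; the type assigned to each resource is hand-made here, and defining diversity as the number of distinct entity types is an assumption.

```python
from typing import List, Tuple

# (DBpedia resource, entity type); hand-made, hypothetical annotations.
Entity = Tuple[str, str]

WWW_TWEET: List[Entity] = [
    ("dbp:Tim_Berners-Lee", "Person"),
    ("dbp:World_Wide_Web", "Thing"),
    ("dbp:WWW_Conference", "Event"),
    ("dbp:France", "Place"),
    ("dbp:Lyon", "Place"),
]
CITY_TWEET: List[Entity] = [
    (f"dbp:{city}", "Place")
    for city in ["Paris", "Bordeaux", "Grenoble", "Nice", "Marseille", "Lyon"]
]


def number_of_entities(entities: List[Entity]) -> int:
    """Count the distinct entities mentioned in a tweet."""
    return len({uri for uri, _ in entities})


def diversity(entities: List[Entity]) -> int:
    """Assumption: diversity = number of distinct entity types in a tweet."""
    return len({etype for _, etype in entities})


for name, ents in [("WWW tweet", WWW_TWEET), ("city tweet", CITY_TWEET)]:
    print(name, number_of_entities(ents), diversity(ents))
```

Under these assumptions the city tweet mentions more entities (6 vs. 5) but has lower diversity (1 type vs. 4), which is the contrast slide 14 draws.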
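
The talk does not say how the sentiment polarity of slide 15 is detected. As a toy stand-in only, the sketch below scores polarity from emoticon cues, since the slide's examples carry emoticons; the cue lists are assumptions, and a real system would use a trained sentiment classifier.

```python
# Hypothetical cue lists; ":-)" does not contain ":)" as a substring, so cues are not double-counted.
POSITIVE_CUES = (":-)", ":)")
NEGATIVE_CUES = (":-(", ":(", "#fail")


def sentiment_polarity(text: str) -> int:
    """Return +1 (positive), -1 (negative) or 0 (neutral) by counting emoticon cues."""
    lowered = text.lower()
    score = sum(lowered.count(cue) for cue in POSITIVE_CUES)
    score -= sum(lowered.count(cue) for cue in NEGATIVE_CUES)
    return (score > 0) - (score < 0)


print(sentiment_polarity("Looking forward to the WWW conference :-) Yes!"))
```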
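
Slide 16 relates the query to tweets through semantics, e.g. dbp:International_World_Wide_Web_Conference and dbp:Tim_Berners-Lee. One simple way to sketch this, assuming a knowledge graph is available, is to count the tweet entities that are identical or directly linked to a query entity. The link set below is a hypothetical, hand-made slice, and the scoring rule is an illustration rather than the measure used in the talk.

```python
from typing import Iterable, Set, Tuple

# Hypothetical, hand-made slice of a knowledge graph: undirected links between resources.
LINKS: Set[Tuple[str, str]] = {
    ("dbp:International_World_Wide_Web_Conference", "dbp:Tim_Berners-Lee"),
    ("dbp:International_World_Wide_Web_Conference", "dbp:World_Wide_Web"),
    ("dbp:Tim_Berners-Lee", "dbp:World_Wide_Web"),
}


def related(a: str, b: str) -> bool:
    """Two resources are related if they are identical or directly linked."""
    return a == b or (a, b) in LINKS or (b, a) in LINKS


def semantic_relatedness(query_entities: Iterable[str], tweet_entities: Iterable[str]) -> float:
    """Share of the tweet's entities that are related to at least one query entity."""
    query = list(query_entities)
    tweet = list(tweet_entities)
    if not tweet:
        return 0.0
    hits = sum(1 for t in tweet if any(related(q, t) for q in query))
    return hits / len(tweet)


query = ["dbp:International_World_Wide_Web_Conference"]
print(semantic_relatedness(query, ["dbp:Tim_Berners-Lee", "dbp:Lyon"]))
```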
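
The two contextual features of slides 18 and 19 can be sketched as the temporal distance between query and tweet, and the number of tweets per author as a proxy for the publisher's authority. The tweet records (dicts with a "user" key) are a hypothetical structure, and the year 2012 for the slide-19 dates is assumed, since the slide gives only March 31 and April 16.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable


def temporal_distance_days(query_time: datetime, tweet_time: datetime) -> float:
    """Absolute distance between query time and tweet creation time, in days."""
    return abs((query_time - tweet_time).total_seconds()) / 86400


def author_activity(tweets: Iterable[Dict[str, str]]) -> Counter:
    """Number of tweets per author, a simple proxy for the publisher's authority."""
    return Counter(tweet["user"] for tweet in tweets)


print(temporal_distance_days(datetime(2012, 4, 16), datetime(2012, 3, 31)))
print(author_activity([{"user": "a"}, {"user": "b"}, {"user": "a"}]))
```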
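
The F-measure column on slide 21 is consistent with the usual F1 definition, F = 2PR / (P + R). Recomputing it from the rounded precision and recall values reproduces each reported F-measure to within 0.0001 (the residual is rounding). A small check:

```python
def f_measure(precision: float, recall: float) -> float:
    """F1 score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-measure) per feature set, copied from slide 21
RESULTS = {
    "keyword relevance": (0.3040, 0.2924, 0.2981),
    "without semantics": (0.3363, 0.4828, 0.3965),
    "semantic relevance": (0.3053, 0.2931, 0.2991),
    "all features": (0.3674, 0.4736, 0.4138),
}

for name, (p, r, reported) in RESULTS.items():
    print(f"{name:<20} computed F = {f_measure(p, r):.4f}   reported F = {reported:.4f}")
```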
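
Slide 25 asks how to adapt the facet-value pair ranking to the user's current demands. The sketch below is hypothetical, not the ranking used in the talk: it scores each (facet, value) pair by the number of current result tweets that mention it, then boosts it by an assumed per-user interest weight. The facet names follow the slide-24 screenshot; assigning Area51 to Locations is an assumption.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

FacetValue = Tuple[str, str]  # (facet type, value), e.g. ("Music Artists", "Guilty Simpson")


def rank_facet_values(
    results: List[List[FacetValue]],
    user_profile: Optional[Dict[FacetValue, float]] = None,
    top_k: int = 5,
) -> List[Tuple[FacetValue, float]]:
    """Rank facet-value pairs found in the current result tweets.

    Base score: number of result tweets that mention the pair.
    Adaptation (assumption): multiply by (1 + interest weight) from a user profile.
    """
    profile = user_profile or {}
    counts = Counter(fv for tweet in results for fv in set(tweet))
    scored = {fv: n * (1.0 + profile.get(fv, 0.0)) for fv, n in counts.items()}
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scored.items(), key=lambda item: (-item[1], item[0]))[:top_k]


results = [
    [("Music Artists", "Guilty Simpson"), ("Locations", "Area51"), ("Locations", "Eindhoven")],
    [("Music Artists", "Guilty Simpson")],
    [("Music Artists", "Golden Earring")],
]
print(rank_facet_values(results))
print(rank_facet_values(results, {("Music Artists", "Golden Earring"): 2.0}))
```

Without a profile the most frequent pair wins; a user with a known interest in Golden Earring sees that facet value first.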
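
Slide 26 contrasts tweet-based with link-based enrichment. In the sketch below, a tiny dictionary lookup (toy_extract, hypothetical) stands in for a real entity-extraction service. Fetching the linked page is left out: the article text from the slide is passed in directly.

```python
from typing import Callable, Optional, Set

# Hypothetical stand-in for an entity-extraction service: a tiny name dictionary.
KNOWN_ENTITIES = {"Julian Assange", "WikiLeaks", "London"}


def toy_extract(text: str) -> Set[str]:
    """Return the known entity names that occur verbatim in the text."""
    return {name for name in KNOWN_ENTITIES if name in text}


def enrich(
    tweet_text: str,
    extract: Callable[[str], Set[str]],
    linked_article: Optional[str] = None,
) -> Set[str]:
    """Tweet-based enrichment, optionally extended with link-based enrichment."""
    entities = set(extract(tweet_text))      # entities found in the tweet itself
    if linked_article:
        entities |= extract(linked_article)  # entities found in the linked page
    return entities


tweet = "@bob: Julian Assange got arrested"
article = "Julian Assange, the founder of WikiLeaks, is under arrest in London..."
print(sorted(enrich(tweet, toy_extract)))
print(sorted(enrich(tweet, toy_extract, article)))
```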
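
Slide 34 looks at term usage in the 25 minutes before the incident. A minimal sketch of that count, assuming tweets arrive as (creation time, text) pairs and terms are matched as case-insensitive substrings; this is an illustration, not the Twitcident implementation.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


def term_usage_before(
    tweets: Iterable[Tuple[datetime, str]],
    incident_time: datetime,
    terms: List[str],
    window: timedelta = timedelta(minutes=25),
) -> Counter:
    """Count, per term, the tweets in [incident_time - window, incident_time) that mention it."""
    start = incident_time - window
    counts: Counter = Counter()
    for created_at, text in tweets:
        if start <= created_at < incident_time:
            lowered = text.lower()
            for term in terms:
                if term.lower() in lowered:
                    counts[term] += 1
    return counts
```

Calling it with the slide's first term group (heavy weather, hail balls, lightning, pitch black) returns a Counter that reports 0 for terms that did not occur in the window.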

Speaker Notes

  1. Motivation: information overload; personalised, “better” search
  2. Make food and drink vouchers worth 75 to 100 euros available.
  3. Make food and drink vouchers worth 75 to 100 euros available.
  4. Traditional Twitter search. Highlight what keyword matching means: the keywords in the search query and in the tweets.
  5. Title -> syntactical features. Box in the tweet. Green boxes for the hypotheses. Flow from keyword-based relevance to … Slides 5-8: flow.
  6. Subtitle -> question?
  7. Introduce the use of @, including mentions and replies. Reply tweets frequently occur in private conversations; therefore, make a specific hypothesis about reply tweets.
  8. "The 21st International World Wide Web Conference #www2012 will take place in Lyon, France April 16-20 2012 @www2012Lyon www2012.wwwconference.org" Subtitle: question. One short, one long: comparison.
  9. Fade in the question later.
  10. Fade in the entities one by one.
  11. Fade in the entities one by one.
  12. Fade in the entities one by one.
  13. Do not highlight www, lyon, france.
  14. 18: Can we utilize the contextual features?
  15. Titles.
  16. Timeline
  17. Number of features.
  18. Comparison. Fade in the pairs. Highlight. Textbox -> conclusion (precision).
  19. Very time consuming and overwhelming indeed!
  20. Entity extraction, semantic enrichment, and relation discovery.
  21. Make food and drink vouchers worth 75 to 100 euros available.
  22. Case #1: early warning
  23. Case 1: enforcement (situational picture around the incident)
  24. Our framework extracts typed entities from enriched tweets/news and provides strategies for detecting semantic (trending) relationships between entities. We investigated the precision and recall of the relation detection strategies, analyzed how the strategies perform for each type of relationship, and evaluated the quality and speed of discovering trending relationships that possibly have a limited temporal validity. Questions: Which strategy performs best in detecting relationships between entities? Does the accuracy depend on the type of entities involved in a relation? How do the strategies perform for discovering relationships that have temporal constraints, and how fast can the strategies detect (trending) relationships?