Twitris
Browsing real-time data by space,
        time and theme
           http://twitris.knoesis.org
Motivation, Goals
Mumbai Terror Attack 2008
  Citizen sensor observations (Flickr, Twitter,
  blogs, ...)
  No matter where you looked, tapping into a
  cultural perception was impossible

We wanted to know what people in India
were saying vs. those in Pakistan or the
U.S.A.
Spatio-Temporal-Thematic Slices of
         Real-time Data

  Around NEWS-WORTHY EVENTS
    Using space and time as cues for extracting
    social perceptions (behind signals)
    Summarizing hundreds of thousands of
    real-time observations
The Health Care Reform Debate
           in the U.S.
  Temporal navigation
  Spatial markers
Zooming in on Florida
n-gram Summaries
Zooming in on Washington
n-gram Summaries
Find resources related to
  social perceptions

   Browsing Real-time Data in Context
     News and Wikipedia articles
     to put extracted descriptors in context
     Example: SOYLENT GREEN and the HEALTH CARE REFORM

✓ Exploit spatial and temporal semantics for thematic aggregation
Core of Twitris
n-gram summaries - Spatio-temporal-thematic
           event descriptors
Architecture
  Step 1: Gathering event-relevant tweets
    Because tweets are not pre-categorized

  Skip if I run out of time...
Topical Tweets
 Gathering event-specific tweets: Iran Election
1. Pick trending hashtags from Twitter:
   #iranelection, #iran, ...

2. Use Google Insights for Search to expand the hashtag list
Topical Tweets

3. Issue a Twitter Search (API) query every 30 seconds
   for every hashtag and keyword
               up to 1,500 tweets per query

4. Obtain other hashtags from the crawled tweets
               Check for topic drift

5. Repeat from Step 3, and babysit!
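Steps 3 to 5 amount to a polling loop that grows its own query list. The sketch below is a minimal illustration under stated assumptions, not the Twitris crawler: `search_fn` is a hypothetical placeholder for the Twitter Search call (no real client library is used), and the topic-drift check (a new hashtag is adopted only if it appears in at least `min_support` newly crawled tweets in a round) is an assumed heuristic.

```python
import re
import time
from collections import Counter
from typing import Callable, Iterable

HASHTAG = re.compile(r"#\w+")


def crawl(seed_terms: Iterable[str],
          search_fn: Callable[[str], list],
          rounds: int = 1,
          interval_s: float = 30.0,
          min_support: int = 3):
    """Poll search_fn for every tracked term and grow the term list.

    search_fn is a hypothetical stand-in for a Twitter Search API call:
    it takes one query term and returns a list of tweet texts.
    Returns (tracked_terms, collected_tweets).
    """
    tracked = {t.lower() for t in seed_terms}
    collected = []
    seen = set()
    for r in range(rounds):
        candidates = Counter()
        # Step 3: one query per tracked hashtag or keyword.
        for term in sorted(tracked):
            for text in search_fn(term):
                if text in seen:  # overlapping queries return the same tweet
                    continue
                seen.add(text)
                collected.append(text)
                # Step 4: count hashtags that are not tracked yet.
                tags = {tag.lower() for tag in HASHTAG.findall(text)}
                candidates.update(tags - tracked)
        # Topic-drift guard (assumption): adopt a new hashtag only if it
        # appeared in at least min_support tweets gathered this round.
        tracked |= {tag for tag, n in candidates.items() if n >= min_support}
        if r < rounds - 1:
            time.sleep(interval_s)  # Step 5: wait, then repeat from Step 3
    return tracked, collected
```

Deduplicating by tweet text is a simplification; a real crawler would key on tweet IDs.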
Architecture
  Step 1: Gathering event-relevant tweets
  Step 2: Spatial, temporal metadata of tweets
  [Architecture diagram: data collection, analysis and visualization of citizen observations from Twitter]
Geo-Coordinates of Tweets
  Location a tweet originates from
  Location it mentions
  Approximation: the poster's location from their Twitter profile
    Location: Dayton, OH (Google geocoder service, GeoDB)
    Location: "best place in the world" (fail!)
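The profile-location approximation can be sketched as a normalize-then-lookup step. This is an assumption-laden toy: the in-memory `GAZETTEER` dictionary is a hypothetical stand-in for a real geocoding service such as those named on the slide, and its coordinates are rounded approximations. It shows why a place name resolves while free text like "best place in the world" does not.

```python
from typing import Optional, Tuple

# Hypothetical toy gazetteer standing in for a real geocoding service;
# coordinates are rounded approximations.
GAZETTEER = {
    "dayton, oh": (39.76, -84.19),
    "mumbai, india": (19.08, 72.88),
}


def geocode_profile_location(raw: Optional[str]) -> Optional[Tuple[float, float]]:
    """Approximate where a tweet came from using the poster's profile location."""
    if not raw:
        return None  # no profile location at all
    key = " ".join(raw.lower().split())  # normalize case and whitespace
    # Free text such as "best place in the world" has no entry, so it fails.
    return GAZETTEER.get(key)
```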
Architecture
  Step 1: Gathering event-relevant tweets
  Step 2: Spatial, temporal metadata of tweets
  Step 3: Spatio-temporal clusters
Spatio-Temporal Clusters of Tweets
Because every event is different, and we want to preserve the social perceptions
                         that generated this data!

     Long-running, world-wide events (Iran Election Protest)
         clusters by country and week?
     Short, world-wide events (Olympics)
         clusters by country and day?
     Long-running, evolving, local events (Health Care
     Reform Debate)
         clusters by state and day?
                                                Tunable parameters
Tweets in a Spatio-Temporal Cluster

   Spatio-temporal bias dictates the granularity at which
   tweets are processed
   Mumbai Terror Attack
     Cluster1: Tweets from India, 08/1/08
     Cluster2: Tweets from Pakistan, 08/1/08
     Cluster n: Tweets from USA, 08/13/08
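The tunable granularity above can be sketched as grouping geocoded tweets by a (region, time bucket) key. This is a minimal sketch, assuming each tweet already carries `country`, `state` and `day` fields from Step 2; the `Tweet` class and the `space`/`period` parameter names are hypothetical, and weeks are ISO calendar weeks.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Tweet:
    text: str
    country: str  # assumed to be filled in by the geocoding step
    state: str
    day: date


def cluster(tweets, space: str = "country", period: str = "day"):
    """Group tweets into spatio-temporal clusters keyed by (region, time bucket).

    space: "country" or "state"; period: "day" or "week" (ISO week).
    """
    if space not in ("country", "state"):
        raise ValueError(f"unsupported space granularity: {space}")
    clusters = defaultdict(list)
    for t in tweets:
        region = getattr(t, space)
        if period == "day":
            bucket = t.day
        elif period == "week":
            year, week, _ = t.day.isocalendar()
            bucket = (year, week)
        else:
            raise ValueError(f"unsupported period granularity: {period}")
        clusters[(region, bucket)].append(t)
    return dict(clusters)
```

A long-running world-wide event would call `cluster(tweets, "country", "week")`; a local debate, `cluster(tweets, "state", "day")`.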
Architecture
  Step 1: Gathering event-relevant tweets
  Step 2: Spatial, temporal metadata of tweets
  Step 3: Spatio-temporal clusters
  Step 4: Thematic descriptors in each spatio-temporal cluster
Thematic Descriptors

An event descriptor is an n-gram
  1-, 2- and 3-grams
n-gram descriptors
"President Obama in trying to regain control of the health-care debate will likely shift his pitch in September"

1-grams: President, Obama, in, trying, to, regain, ...
2-grams: "President Obama", "Obama in", "in trying", "trying to", ...
3-grams: "President Obama in", "Obama in trying", "in trying to", ...
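The n-gram enumeration above is a sliding window over tokens. A minimal sketch, assuming plain whitespace tokenization (so "health-care" stays one token); the talk does not say which tokenizer Twitris used.

```python
def ngrams(text: str, n_max: int = 3) -> dict:
    """Return {n: [n-grams]} for n = 1..n_max, using whitespace tokens."""
    tokens = text.split()
    return {
        n: [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        for n in range(1, n_max + 1)
    }


sentence = ("President Obama in trying to regain control of the health-care "
            "debate will likely shift his pitch in September")
grams = ngrams(sentence)
print(grams[2][:4])  # ['President Obama', 'Obama in', 'in trying', 'trying to']
```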
Thematic Descriptors
“President”    “President Obama”      “President Obama in”

A descriptor is an n-gram weighted by:
Thematic Importance
    redundancy: statistically discriminatory in nature
    variability: contextually important

Spatial Importance (local vs. global popularity)
Temporal Importance (always popular vs. currently
trending)
Thematic Importance of an n-gram
 “President”    “President Obama”      “President Obama in”


  Exploiting Redundancy
      tfidf of n-gram (Lucene Index)
      amplify by fraction of nouns in the n-gram
      (Stanford Natural Language Parser)
      amplify by fraction of non-stop words (‘going to
      try’)
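The redundancy score can be sketched as a tf-idf value scaled up by how noun-heavy and how content-bearing the n-gram is. This is a sketch under several assumptions: each tweet in the cluster is one document, the idf is a simple smoothed variant rather than Lucene's exact formula, the noun set is passed in (a POS tagger would supply it), `STOPWORDS` is a hypothetical toy list, and the multiplicative `(1 + fraction)` amplification is our guess at how "amplify" is applied.

```python
import math

# Hypothetical, deliberately tiny stop-word list for illustration.
STOPWORDS = {"a", "the", "in", "to", "of", "going", "will"}


def _count(tokens, gram):
    """Number of positions where the token list `gram` occurs in `tokens`."""
    n = len(gram)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == gram)


def redundancy_score(ngram, tweets, nouns):
    """tf-idf of an n-gram over one cluster's tweets, amplified by the fraction
    of nouns and the fraction of non-stop words in the n-gram.

    nouns: set of lower-cased words tagged as nouns. Each tweet is a document.
    """
    gram = ngram.lower().split()
    docs = [t.lower().split() for t in tweets]
    counts = [_count(d, gram) for d in docs]
    tf = sum(counts)
    df = sum(1 for c in counts if c > 0)
    if df == 0:
        return 0.0
    idf = math.log(len(docs) / df) + 1.0  # simple smoothed idf, not Lucene's exact formula
    noun_frac = sum(w in nouns for w in gram) / len(gram)
    content_frac = sum(w not in STOPWORDS for w in gram) / len(gram)
    # Multiplicative (1 + fraction) amplification is an assumption.
    return tf * idf * (1 + noun_frac) * (1 + content_frac)
```

With this scoring, "President Obama" outranks "going to try" at equal frequency because both of its words are nouns and neither is a stop word.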
Thematic Importance of an n-gram
  Exploiting Variability
    Big three/Big 3; Ford, GM, Chrysler, General Motors...
    Contextually relevant words boost statistical importance

  Focus word (fw): "big three"

  Associated words (aw_i): words co-occurring with fw in the spatio-temporal set of tweets
Thematic Importance of an n-gram
  [Figure: association graph with focus word (fw): Big Three and associated word (aw_i): Ford]

  Thematic importance of the focus word combines:
    the tfidf of fw
    the tfidf of each aw_i
    the association strength of fw and aw_i
Contextual Relevance: Association Strength
  Association strength depends on contexts
  Contexts of associated word aw_i ('Ford'): chrysler, GM, big 3, focus, model, release...

… focus word in the given spatio-temporal corpus. The goal is to … the focus word only with the strongly associated words. … to measure strength of associations is to use word co-occurrence … language [9]. Borrowing from past success in this area, we measure the strength between the focus word and the associated words as … the notion of point-wise mutual information in terms of co-occurrence. We measure assoc_str scores as a function of the point-wise mutual information between the focus word and the context of aw_i. This is done … association strengths are determined in the contexts that the … Let us call the contexts for aw_i C_{aw_i} = {caw_1, caw_2, …}, where caw_k are strong descriptors that collocate with aw_i. assoc_str(fw, aw_i) is calculated as:

$$\mathrm{assoc}_{str}(fw, aw_i) = \frac{\sum_{k} \mathrm{pmi}(fw, caw_k)}{|C_{aw_i}|}, \quad \forall\, caw_k \in C_{aw_i}$$

where the point-wise mutual information between fw and caw_k, pmi(fw, caw_k), is calculated as:

Pointwise Mutual Information

$$\mathrm{pmi}(fw, caw_k) = \log \frac{p(fw, caw_k)}{p(fw)\, p(caw_k)} = \log \frac{p(caw_k \mid fw)}{p(caw_k)}$$

$$p(fw) = \frac{n(fw)}{N}; \qquad p(caw_k \mid fw) = \frac{n(caw_k, fw)}{n(fw)}$$

where n(fw) is the frequency of fw …
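The association-strength formulas above can be computed directly from tweet-level counts. This is a minimal sketch with labeled assumptions: n(·) counts tweets containing a phrase, N is the number of tweets in the spatio-temporal set, p(caw_k) = n(caw_k)/N by analogy with p(fw), and a context that never co-occurs with the focus word contributes 0 rather than log 0.

```python
import math


def _contains(tokens, phrase):
    """True if the phrase's tokens occur contiguously in tokens."""
    p = phrase.lower().split()
    return any(tokens[i:i + len(p)] == p for i in range(len(tokens) - len(p) + 1))


def pmi(fw, caw, tweets):
    """pmi(fw, caw) = log(p(caw | fw) / p(caw)) with tweet-level counts."""
    docs = [t.lower().split() for t in tweets]
    N = len(docs)
    n_fw = sum(_contains(d, fw) for d in docs)
    n_caw = sum(_contains(d, caw) for d in docs)
    n_both = sum(_contains(d, fw) and _contains(d, caw) for d in docs)
    if n_both == 0:
        return 0.0  # assumption: unseen pairs contribute nothing instead of log(0)
    p_caw_given_fw = n_both / n_fw
    p_caw = n_caw / N  # assumed by analogy with p(fw) = n(fw) / N
    return math.log(p_caw_given_fw / p_caw)


def assoc_str(fw, contexts, tweets):
    """Average pmi between the focus word and each context of aw_i."""
    if not contexts:
        return 0.0
    return sum(pmi(fw, c, tweets) for c in contexts) / len(contexts)
```

For example, with four tweets where "gm" appears once and always alongside "big three" (which appears twice), p(gm | big three) = 1/2 and p(gm) = 1/4, so pmi = log 2.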
Fig. 2: (a) Extracted descriptors sorted by TFIDF vs. spatio-tempo… (b) Top 15 extracted descriptors in the US for the Mumbai attack event.

… focus word and all associations in C_fw. The thematic weights of … along with their strengths are plugged into Eqn. 1 to compute the thematic score, ngram_i(th), of the n-gram descriptor.
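The slides name the ingredients of the thematic score (tfidf of fw, tfidf of each aw_i, and their association strength) but the combining equation did not survive extraction. The sketch below assumes one plausible form, tfidf(fw) + Σ_i tfidf(aw_i) · assoc_str(fw, aw_i); treat it as an illustration of how the pieces could fit, not as the paper's Eqn. 1.

```python
def thematic_score(tfidf_fw, associations):
    """Assumed combination: tfidf(fw) + sum_i tfidf(aw_i) * assoc_str(fw, aw_i).

    associations: iterable of (tfidf_aw_i, assoc_str_fw_aw_i) pairs.
    Strongly associated words raise the score; negatively associated ones lower it.
    """
    return tfidf_fw + sum(t * a for t, a in associations)
```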
Temporal Importance of a Descriptor
  Certain descriptors will always dominate discussions
    "Terrorism" in Mumbai Terror Attack tweets
    "Healthcare" in Health Care Reform Debate tweets
  Allow recent (possibly interesting) ones to surface

B. Temporal Importance of an event descriptor: While th… are good indicators of what is important in a spatio-temporal set, certain descriptors tend to dominate discussions. In order to allow possibly interesting descriptors to surface, we discount the th… descriptor depending on how popular it has been in the recent past. The discount score for an n-gram, depending on a tuneable factor … event, is calculated over a period of time as:

$$\mathrm{ngram}_i(te) = \mathrm{temporal}_{bias} \ast \sum_{d=1}^{D} \mathrm{ngram}_i(th)_d$$

  temporal_bias (0 to 1): less to more importance to recent n-grams

where ngram_i(th)_d is the enhanced thematic score of the descriptor … duration for which we wish to apply the dampening factor, for example … week. However, this temporal discount might not be relevant f… For this reason, we also apply a temporal_bias weight ranging from 0 to 1. A weight closer to 1 gives more importance to recent activity, while a weight closer to 0 gives more importance to past activity.
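The temporal discount above translates to a one-line computation. A minimal sketch, assuming the n-gram's enhanced thematic scores for past days are available as a list ordered oldest first, with `window` playing the role of D; the function and parameter names are hypothetical.

```python
def temporal_discount(daily_th, temporal_bias, window):
    """ngram_i(te) = temporal_bias * sum of the last `window` daily thematic scores.

    daily_th: enhanced thematic scores of one n-gram, one per past day,
    oldest first (assumed input layout). window plays the role of D.
    """
    if not 0.0 <= temporal_bias <= 1.0:
        raise ValueError("temporal_bias must lie in [0, 1]")
    recent = daily_th[-window:] if window > 0 else []
    return temporal_bias * sum(recent)
```

With `temporal_bias = 0` nothing is discounted, so perennially popular descriptors such as "Healthcare" keep their weight; pushing it toward 1 lets newer descriptors surface.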

Spatial Importance of a Descriptor
  Local descriptors are more interesting compared to global ones

C. Spatial Importance of an event descriptor: We also discount the importance of a descriptor based on its occurrence in other spatio-temporal sets. … is that descriptors that occur all over the world on a given day are less interesting compared to those that occur only in the spatio-temporal set … We define the spatial discount score for an n-gram as a fraction of spatial partitions (e.g. countries) that had activity surrounding this descriptor:

$$\mathrm{ngram}_i(sp) = \frac{k}{|\text{spatio-temporal sets}|} \ast (1 - \mathrm{spatial}_{bias})$$

  Spatial discount: the fraction of spatio-temporal clusters the n-gram occurred in
  (1 − spatial_bias) closer to 0: more importance to global descriptors
… of importance to the global presence of the descriptor. Depending on the event of interest, both these discounting factors … different spatio-temporal sets. For example, when processing … Mumbai attack, setting the spatial_bias to 1 eliminates …ial signals. While processing tweets from the US, on … global bias given that the event did not originate there. … are set before we begin the processing of observations.

STT Score of an n-gram
  Spatio-temporal-thematic score of a descriptor = thematic score − spatio-temporal discounts

… the spatial and temporal effects are discounted from … final spatio-temporal-thematic (STT) weight of the n-gram:

$$w_i = \mathrm{ngram}_i(th) - \mathrm{ngram}_i(te) - \mathrm{ngram}_i(sp)$$
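The spatial discount and the final STT weight follow directly from the two formulas above. A minimal sketch, assuming `k` is the number of spatio-temporal sets in which the n-gram occurred and `num_sets` is the total number of sets; the function names are hypothetical, and the range checks are our own additions.

```python
def spatial_discount(k, num_sets, spatial_bias):
    """ngram_i(sp) = k / |spatio-temporal sets| * (1 - spatial_bias)."""
    if num_sets <= 0 or not 0 <= k <= num_sets:
        raise ValueError("need 0 <= k <= num_sets and num_sets > 0")
    if not 0.0 <= spatial_bias <= 1.0:
        raise ValueError("spatial_bias must lie in [0, 1]")
    return k / num_sets * (1.0 - spatial_bias)


def stt_weight(th, te, sp):
    """w_i = ngram_i(th) - ngram_i(te) - ngram_i(sp)."""
    return th - te - sp


# Example: an n-gram seen in 3 of 4 country/day sets, no spatial bias.
sp = spatial_discount(3, 4, 0.0)   # 0.75
print(stt_weight(5.0, 1.0, sp))    # 3.25
```

Setting `spatial_bias` to 1 turns the spatial discount off entirely, which matches the Mumbai example in the text.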


… illustrates the effect of our enhanced STT weights on the descriptors pertaining to the Mumbai terror attack event, …
Higher-order n-grams are picked over lower-order n-grams
(if they have the same scores)
Top X Descriptor Tag Cloud

 Tag size proportional to enhanced STT score
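Selecting the top X descriptors for the tag cloud can be sketched as ranking by STT score. This follows one reading of the tie rule above, which is an assumption: a lower-order n-gram is dropped when a longer n-gram containing it has the same score. Tag sizes are scaled linearly to a hypothetical maximum of 48 pixels, and non-positive scores are left out of the cloud.

```python
import math


def _is_sub(short, long_):
    """True if token list `short` occurs contiguously in token list `long_`."""
    n = len(short)
    return any(long_[i:i + n] == short for i in range(len(long_) - n + 1))


def top_descriptors(weights, x):
    """Drop an n-gram when a longer n-gram containing it has the same STT
    score, then return the x best (n-gram, score) pairs."""
    kept = {}
    for g, w in weights.items():
        toks = g.split()
        subsumed = any(
            len(h.split()) > len(toks) and math.isclose(w, hw) and _is_sub(toks, h.split())
            for h, hw in weights.items())
        if not subsumed:
            kept[g] = w
    # Highest score first; on ties prefer higher-order n-grams, then alphabetical.
    ranked = sorted(kept.items(), key=lambda kv: (-kv[1], -len(kv[0].split()), kv[0]))
    return ranked[:x]


def tag_sizes(top, max_px=48):
    """Font size proportional to STT score (positive scores only)."""
    best = max((w for _, w in top if w > 0), default=0.0)
    if best <= 0:
        return {}
    return {g: round(max_px * w / best) for g, w in top if w > 0}
```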

Mais conteúdo relacionado

Semelhante a Twitris

Tweet Segmentation and Its Application to Named Entity Recognition
Tweet Segmentation and Its Application to Named Entity RecognitionTweet Segmentation and Its Application to Named Entity Recognition
Tweet Segmentation and Its Application to Named Entity Recognition
1crore projects
 
NMIX 4200 Final Paper Report
NMIX 4200 Final Paper ReportNMIX 4200 Final Paper Report
NMIX 4200 Final Paper Report
Patrick Grant
 
Entity-Based Semantics Emerging from Personal Awareness Streams
Entity-Based Semantics Emerging from Personal Awareness Streams Entity-Based Semantics Emerging from Personal Awareness Streams
Entity-Based Semantics Emerging from Personal Awareness Streams
Amparo Elizabeth Cano Basave
 
Closing the Findability Gap: 8 better practices from information architecture
Closing the Findability Gap: 8 better practices from information architectureClosing the Findability Gap: 8 better practices from information architecture
Closing the Findability Gap: 8 better practices from information architecture
Louis Rosenfeld
 

Semelhante a Twitris (20)

Tweet Segmentation and Its Application to Named Entity Recognition
Tweet Segmentation and Its Application to Named Entity RecognitionTweet Segmentation and Its Application to Named Entity Recognition
Tweet Segmentation and Its Application to Named Entity Recognition
 
Intro to sentiment analysis
Intro to sentiment analysisIntro to sentiment analysis
Intro to sentiment analysis
 
Twitris - Web Information System 2011 Course
Twitris - Web Information System 2011 Course Twitris - Web Information System 2011 Course
Twitris - Web Information System 2011 Course
 
Hao lyu slides_sarcasm
Hao lyu slides_sarcasmHao lyu slides_sarcasm
Hao lyu slides_sarcasm
 
Making Sense of Millions of Thoughts: Finding Patterns in the Tweets
Making Sense of Millions of Thoughts: Finding Patterns in the TweetsMaking Sense of Millions of Thoughts: Finding Patterns in the Tweets
Making Sense of Millions of Thoughts: Finding Patterns in the Tweets
 
Conducting Twitter Reserch
Conducting Twitter ReserchConducting Twitter Reserch
Conducting Twitter Reserch
 
Relations for Reusing (R4R) in A Shared Context: An Exploration on Research P...
Relations for Reusing (R4R) in A Shared Context: An Exploration on Research P...Relations for Reusing (R4R) in A Shared Context: An Exploration on Research P...
Relations for Reusing (R4R) in A Shared Context: An Exploration on Research P...
 
NMIX 4200 Final Paper Report
NMIX 4200 Final Paper ReportNMIX 4200 Final Paper Report
NMIX 4200 Final Paper Report
 
Entity-Based Semantics Emerging from Personal Awareness Streams
Entity-Based Semantics Emerging from Personal Awareness Streams Entity-Based Semantics Emerging from Personal Awareness Streams
Entity-Based Semantics Emerging from Personal Awareness Streams
 
Closing the Findability Gap: 8 better practices from information architecture
Closing the Findability Gap: 8 better practices from information architectureClosing the Findability Gap: 8 better practices from information architecture
Closing the Findability Gap: 8 better practices from information architecture
 
Trend Analysis
Trend AnalysisTrend Analysis
Trend Analysis
 
Information Extraction from Text, presented @ Deloitte
Information Extraction from Text, presented @ DeloitteInformation Extraction from Text, presented @ Deloitte
Information Extraction from Text, presented @ Deloitte
 
Spatio-Temporal-Thematic Analysis of Citizen-Sensor Data: Challenges and Expe...
Spatio-Temporal-Thematic Analysis of Citizen-Sensor Data: Challenges and Expe...Spatio-Temporal-Thematic Analysis of Citizen-Sensor Data: Challenges and Expe...
Spatio-Temporal-Thematic Analysis of Citizen-Sensor Data: Challenges and Expe...
 
Data Visualization at Twitter
Data Visualization at TwitterData Visualization at Twitter
Data Visualization at Twitter
 
Twitter: Social Network Or News Medium?
Twitter: Social Network Or News Medium?Twitter: Social Network Or News Medium?
Twitter: Social Network Or News Medium?
 
Twitter: Social Network Or News Medium?
Twitter: Social Network Or News Medium?Twitter: Social Network Or News Medium?
Twitter: Social Network Or News Medium?
 
Twitter as a personalizable information service ii
Twitter as a personalizable information service iiTwitter as a personalizable information service ii
Twitter as a personalizable information service ii
 
Searching for Interestingness in Wikipedia and Yahoo! Answers
Searching for Interestingness in Wikipedia and Yahoo! AnswersSearching for Interestingness in Wikipedia and Yahoo! Answers
Searching for Interestingness in Wikipedia and Yahoo! Answers
 
Weather events identification in social media streams: tools to detect their ...
Weather events identification in social media streams: tools to detect their ...Weather events identification in social media streams: tools to detect their ...
Weather events identification in social media streams: tools to detect their ...
 
Citizen Sensing: Opportunities and Challenges in Mining Social Signals and Pe...
Citizen Sensing: Opportunities and Challenges in Mining Social Signals and Pe...Citizen Sensing: Opportunities and Challenges in Mining Social Signals and Pe...
Citizen Sensing: Opportunities and Challenges in Mining Social Signals and Pe...
 

Último

Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
KarakKing
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
QucHHunhnh
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
ciinovamais
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
AnaAcapella
 
Vishram Singh - Textbook of Anatomy Upper Limb and Thorax.. Volume 1 (1).pdf
Vishram Singh - Textbook of Anatomy  Upper Limb and Thorax.. Volume 1 (1).pdfVishram Singh - Textbook of Anatomy  Upper Limb and Thorax.. Volume 1 (1).pdf
Vishram Singh - Textbook of Anatomy Upper Limb and Thorax.. Volume 1 (1).pdf
ssuserdda66b
 

Último (20)

On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Dyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptxDyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptx
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - English
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
Vishram Singh - Textbook of Anatomy Upper Limb and Thorax.. Volume 1 (1).pdf
Vishram Singh - Textbook of Anatomy  Upper Limb and Thorax.. Volume 1 (1).pdfVishram Singh - Textbook of Anatomy  Upper Limb and Thorax.. Volume 1 (1).pdf
Vishram Singh - Textbook of Anatomy Upper Limb and Thorax.. Volume 1 (1).pdf
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 

Twitris

  • 1. Twitris Browsing real-time data by space, time and theme http://twitris.knoesis.org
  • 3. Motivation, Goals Mumbai Terror Attack 2008 Citizen sensor observations (flickr, twitter, blogs..) No matter where you looked, tapping into a cultural perception was impossible We wanted to know what people in India were saying vs. those in Pakistan or the U.S.A
  • 4. Spatio-Temporal-Thematic Slices of Real-time Data Around NEWS-WORTHY EVENTS Using space and time as cues for extracting social perceptions (behind signals) Summarizing hundreds and thousands of real-time observations
  • 5. The Health Care Reform Debate in the U.S
  • 6. The Health Care Reform Debate in the U.S Temporal navigation
  • 7. The Health Care Reform Debate in the U.S Temporal navigation Spatial Markers
  • 8. Zooming in on Florida
  • 10. Zooming in on Washington
  • 12. Find resources related to Find resources related to social perceptions social perceptions Browsing Real-time Data in Context News and News and Wikipedia articles Wikipedia articles toto put extracted put extracted SOYLENT GREEN and the HEALTH CARE REFORM descriptors in descriptors in context context News and Wikipedia articles to put extracted descriptors in context ✓Exploit spatio, temporal semantics for thematic aggregation Exploit spatio, temporal semantics for thematic aggregation
  • 13. Core of Twitris n-gram summaries - Spatio-temporal-thematic event descriptors
  • 14. Architecture Step1 : Gathering event- relevant tweets Because tweets are not pre-categorized Skip if I run out of time ..
  • 16. Topical Tweets Gathering event-specific tweets: Iran Election 1: Pick trending hashtags from Twitter - #iranelection; #iran ..
  • 17. Topical Tweets Gathering event-specific tweets: Iran Election 1: Pick trending hashtags from Twitter - #iranelection; #iran .. 2: Google insights to expand hashtag list
  • 18. Topical Tweets Gathering event-specific tweets: Iran Election 1: Pick trending hashtags from Twitter - #iranelection; #iran .. 2: Google insights to expand hashtag list
  • 19. Topical Tweets 3. Issue a Twitter Search (API) every 30 seconds for every hashtag, keyword 1500 tweets per query
  • 20. Topical Tweets 3. Issue a Twitter Search (API) every 30 seconds for every hashtag, keyword 1500 tweets per query 4. Obtain other Hashtags in crawled tweets
  • 21. Topical Tweets 3. Issue a Twitter Search (API) every 30 seconds for every hashtag, keyword 1500 tweets per query 4. Obtain other Hashtags in crawled tweets Check for topic drifts
  • 22. Topical Tweets 3. Issue a Twitter Search (API) every 30 seconds for every hashtag, keyword 1500 tweets per query 4. Obtain other Hashtags in crawled tweets Check for topic drifts 5. Repeat from Step 3 and babysit!
  • 23. Architecture Step1 : Gathering event- relevant tweets Step2: Spatial, Temporal ata Collection, analysis metadata of tweets and visualizing in ly Relevant Data ning citizen observations from Twitte
  • 24. Geo-Coordinates of Tweets Location a tweet originates from Location it mentions Approximation: Poster location on Twitter profile Location: Dayton, OH (Google geocoder service, GeoDB) Location: “best place in the world” (fail!)
  • 25. Architecture Step1 : Gathering event- relevant tweets Step2: Spatial, Temporal metadata of tweets ta Collection, analysis and visualizing in Step3: Spatio-temporal clusters y Relevant Data
  • 26. Spatio-Temporal Clusters of Tweets Because every event is different.. and we want to preserve social perceptions that generated this data! Long-running, world-wide events (Iran Election Protest) clusters by country and week? Short, world-wide events (Olympics) clusters by country and day? Long-running, evolving, local events (Health Care Reform Debate) clusters by state and day? Tunable parameters
  • 27. Tweets in a Spatio-Temporal Cluster Spatio-temporal bias dictate granularity of processing tweets Mumbai Terror Attack Cluster1: Tweets from India, 08/1/08 Cluster2: Tweets from Pakistan, 08/1/08 Cluster n: Tweets from USA, 08/13/08
  • 28. Architecture Step1 : Gathering event- relevant tweets Step2: Spatial, Temporal metadata of tweets Step3: Spatio-temporal ta Collection, analysis andclusters visualizing in Step4: Thematic Descriptors in spatio-temporal cluster y Relevant Data
  • 29. Thematic Descriptors An event descriptor is an n-gram 1,2 and 3 grams
  • 30. n-gram descriptors “President Obama in trying to regain control of the health-care debate will likely shift his pitch in September” 1-grams: President, Obama, in, trying, to, regain, ... 2-grams: “President Obama”, “Obama in”, “in trying”, “trying to”... 3-grams: “President Obama in”, “Obama in trying”; “in trying to”...
  • 31. Thematic Descriptors “President” “President Obama” “President Obama in” A descriptor is an n-gram weighted by:
  • 32. Thematic Descriptors “President” “President Obama” “President Obama in” A descriptor is an n-gram weighted by: Thematic Importance redundancy: statistically discriminatory in nature variability: contextually important
  • 33. Thematic Descriptors “President” “President Obama” “President Obama in” A descriptor is an n-gram weighted by: Thematic Importance redundancy: statistically discriminatory in nature variability: contextually important Spatial Importance (local vs. global popularity)
  • 34. Thematic Descriptors “President” “President Obama” “President Obama in” A descriptor is an n-gram weighted by: Thematic Importance redundancy: statistically discriminatory in nature variability: contextually important Spatial Importance (local vs. global popularity) Temporal Importance (always popular vs. currently trending)
  • 35. Thematic Importance of an n-gram “President” “President Obama” “President Obama in” Exploiting Redundancy tfidf of n-gram (Lucene Index) amplify by fraction of nouns in the n-gram (Stanford Natural Language Parser) amplify by fraction of non-stop words (‘going to try’)
  • 36. Thematic Importance of an n-gram: Exploiting variability. The same theme surfaces in varied forms: “Big three”/“Big 3”; Ford, GM, Chrysler, General Motors... Contextually relevant words boost statistical importance. Focus word (fw): “big three”. Associated words (aw_i): words co-occurring with the focus word in the spatio-temporal set of tweets.
  • 37. Thematic Importance of an n-gram: Focus word (fw): Big Three; associated word (aw_i): Ford. The thematic importance of the focus word is computed from the tf-idf of fw, the tf-idf of each aw_i, and the association strength between fw and aw_i.
  • 38. Contextual Relevance: association strength depends on contexts. Pointwise mutual information. Contexts of the associated word aw_i ‘Ford’: big, 3, chrysler, GM, focus, model, release, ...
… the focus word in the given spatio-temporal corpus. The goal is to measure the association strength of the focus word only with the strongly associated words. One way to measure the strength of associations is to use word co-occurrence in language [9]. Borrowing from past success in this area, we measure the association strength between the focus word and the associated words using the notion of point-wise mutual information in terms of co-occurrence. We measure assoc_str scores as a function of the point-wise mutual information between the focus word fw and the contexts of aw_i; association strengths are determined in the contexts that the associated words appear in. Let us call the contexts for aw_i $C_{aw_i} = \{c_{aw_1}, c_{aw_2}, \ldots\}$, where the $c_{aw_k}$ are strong descriptors that collocate with $aw_i$. Then

$$assoc_{str}(fw, aw_i) = \frac{\sum_k pmi(fw, c_{aw_k})}{\lvert C_{aw_i} \rvert}, \quad \forall c_{aw_k} \in C_{aw_i}$$

where the point-wise mutual information between $fw$ and $c_{aw_k}$ is calculated as:

$$pmi(fw, c_{aw_k}) = \log \frac{p(fw, c_{aw_k})}{p(fw)\, p(c_{aw_k})} = \log \frac{p(c_{aw_k} \mid fw)}{p(c_{aw_k})}$$

with $p(fw) = \frac{n(fw)}{N}$, $p(c_{aw_k} \mid fw) = \frac{n(c_{aw_k}, fw)}{n(fw)}$ and $p(c_{aw_k}) = \frac{n(c_{aw_k})}{N}$, where $n(\cdot)$ denotes frequency.
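The association strength can be sketched as the mean of pmi(fw, c) over the contexts c of an associated word, with pmi(fw, c) = log(p(c | fw) / p(c)). This is a minimal sketch, assuming n(·) counts the tweets that contain the term(s), N is the number of tweets in the spatio-temporal set, and the natural log is used (the base is not specified). Contexts that never co-occur with fw are given a pmi of 0 because log 0 is undefined; that choice, the helper names and the sample tweets are all hypothetical.

```python
import math

def count(tweets, *terms):
    """Number of tweets (sets of descriptors) containing every given term."""
    return sum(1 for t in tweets if all(term in t for term in terms))

def pmi(fw, c, tweets):
    """pmi(fw, c) = log(p(c | fw) / p(c)), estimated from tweet counts."""
    n_fw, n_c, n_joint = count(tweets, fw), count(tweets, c), count(tweets, fw, c)
    if n_joint == 0:
        return 0.0  # assumption: non-co-occurring contexts contribute nothing
    p_c_given_fw = n_joint / n_fw
    p_c = n_c / len(tweets)
    return math.log(p_c_given_fw / p_c)

def assoc_str(fw, contexts, tweets):
    """Average pmi between the focus word and the contexts C_awi of an associated word."""
    return sum(pmi(fw, c, tweets) for c in contexts) / len(contexts)

# Hypothetical spatio-temporal set: each tweet reduced to its descriptors.
tweets = [
    {"big three", "ford", "gm"},
    {"big three", "chrysler", "bailout"},
    {"ford", "focus", "model"},
    {"big three", "gm", "bailout"},
]
ford_contexts = ["gm", "chrysler", "focus"]  # hypothetical C_awi for aw_i = "ford"
print(assoc_str("big three", ford_contexts, tweets))
```

Here "gm" and "chrysler" co-occur with "big three" more often than chance and raise the score, while "focus" (the car model) never does and pulls the average down.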
  • 39. Temporal Importance of a Descriptor: Certain descriptors will always dominate discussions, e.g. “Terrorism” in Mumbai terror attack tweets and “Healthcare” in the health care reform debate. Allow recent (possibly interesting) ones to surface. temporal_bias from 0 to 1: less to more importance to recent n-grams.
[Fig. 2: (a) Extracted descriptors sorted by TFIDF vs. spatio-tempo…; (b) Top 15 extracted descriptors in the US for the Mumbai attack event]
… focus word and all associations in $C_{fw}$. The thematic weights of … along with their strengths are plugged into Eqn 1 to compute the thematic score, $ngram_i(th)$, of the n-gram descriptor.
B. Temporal Importance of an event descriptor: While thematic scores are good indicators of what is important in a spatio-temporal set, certain descriptors tend to dominate discussions. In order to allow possibly interesting descriptors to surface, we discount the thematic score of a descriptor depending on how popular it has been in the recent past. The discount score for an n-gram, a tuneable factor depending on the event, is calculated over a period of time as:

$$ngram_i(te) = temporal_{bias} \cdot \sum_{d=1}^{D} \frac{ngram_i(th)_d}{d}$$

where $ngram_i(th)_d$ is the enhanced thematic score of the descriptor …
  • 40. Spatial Importance of a Descriptor: Local descriptors are more interesting than global ones. Spatial discount: the fraction of spatio-temporal clusters the n-gram occurred in; a spatial_bias closer to 0 gives more importance to global presence.
… duration for which we wish to apply the dampening factor, for example the recent week. However, this temporal discount might not be relevant for all situations. For this reason, we also apply a temporal_bias weight ranging from 0 to 1: a weight closer to 1 gives more importance to recent activity, while a weight closer to 0 gives more importance to past activity.
Spatial Importance of an event descriptor: We also discount the importance of a descriptor based on its occurrence in other spatio-temporal sets. The intuition is that popular descriptors that occur all over the world on a given day are less interesting than those that occur only in the spatio-temporal set being processed. We define the spatial discount score for an n-gram as the fraction of spatial partitions (e.g. countries) that had activity surrounding this descriptor:

$$ngram_i(sp) = \frac{k}{\lvert \text{spatio-temporal sets} \rvert} \cdot (1 - spatial_{bias})$$

where $k$ is the number of spatio-temporal sets in which the n-gram occurred.
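Both discounts can be sketched in a few lines. This is a minimal sketch, assuming the temporal discount reads as temporal_bias × Σ_{d=1..D} ngram_i(th)_d / d (so older days weigh less), that the list of past scores holds d = 1 first, and that each spatio-temporal set is represented by the set of descriptors it produced. The function names and sample data are hypothetical.

```python
def temporal_discount(past_scores, temporal_bias):
    """ngram_i(te) = temporal_bias * sum_{d=1..D} ngram_i(th)_d / d.

    past_scores[0] is the thematic score for d = 1, past_scores[1] for d = 2, ...
    """
    if not 0.0 <= temporal_bias <= 1.0:
        raise ValueError("temporal_bias must lie in [0, 1]")
    return temporal_bias * sum(s / d for d, s in enumerate(past_scores, start=1))

def spatial_discount(ngram, clusters, spatial_bias):
    """ngram_i(sp) = k / |spatio-temporal sets| * (1 - spatial_bias),
    where k is the number of sets whose descriptors include the n-gram."""
    if not 0.0 <= spatial_bias <= 1.0:
        raise ValueError("spatial_bias must lie in [0, 1]")
    k = sum(1 for descriptors in clusters.values() if ngram in descriptors)
    return k / len(clusters) * (1.0 - spatial_bias)

# Hypothetical data.
past = [4.0, 2.0, 3.0]
clusters = {
    ("India", "day 1"): {"terrorism", "taj hotel"},
    ("Pakistan", "day 1"): {"terrorism"},
    ("USA", "day 1"): {"terrorism", "obama"},
    ("UK", "day 1"): {"cricket"},
}
print(temporal_discount(past, 0.5))                  # 3.0
print(spatial_discount("terrorism", clusters, 0.0))  # 0.75
print(spatial_discount("taj hotel", clusters, 0.0))  # 0.25
```

Setting `spatial_bias` to 1 removes the spatial discount entirely, and setting `temporal_bias` to 0 removes the temporal one.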
  • 41. STT Score of an n-gram: spatio-temporal-thematic score of a descriptor = thematic score - spatio-temporal discounts.
… of importance to the global presence of the descriptor. Depending on the event of interest, both these discounting factors can be tuned for different spatio-temporal sets. For example, when processing the Mumbai attack, setting the spatial_bias to 1 eliminates the effect of spatial signals. While processing tweets from the US, on… global bias, given that the event did not originate there. These biases are set for each set of observations before we begin the processing. The spatial and temporal effects are discounted from the thematic score to obtain the final spatio-temporal-thematic (STT) weight of the n-gram:

$$w_i = ngram_i(th) - ngram_i(te) - ngram_i(sp)$$

… illustrates the effect of our enhanced STT weights on descriptors pertaining to the Mumbai terror attack event.
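The final STT weight is a plain subtraction, but a small example shows why the discounts matter: a descriptor that is thematically strong yet long-popular and globally common can fall below a weaker but local, newly trending one. The scores below are hypothetical, and `stt_weight` is an illustrative helper name.

```python
def stt_weight(thematic, temporal_disc, spatial_disc):
    """w_i = ngram_i(th) - ngram_i(te) - ngram_i(sp)."""
    return thematic - temporal_disc - spatial_disc

# Hypothetical scores: a globally common, long-popular descriptor vs. a local, new one.
w_global = stt_weight(thematic=3.0, temporal_disc=1.5, spatial_disc=0.75)
w_local = stt_weight(thematic=2.0, temporal_disc=0.0, spatial_disc=0.25)
print(w_global, w_local)  # 0.75 1.75
```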
  • 42. Higher-order n-grams are picked over lower-order n-grams when their scores are equal.
  • 43. Top X Descriptor Tag Cloud: tag size is proportional to the enhanced STT score.
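The last two slides (tie-breaking in favor of higher-order n-grams, then tag sizes proportional to the STT score) can be sketched together. This is one reading, not the Twitris implementation: ties are broken by ordering on (score, n-gram length) rather than by dropping the shorter n-gram. It assumes the selected descriptors have positive weights, and `top_descriptors`, `tag_sizes`, `max_px` and the sample weights are hypothetical.

```python
def top_descriptors(weights, x):
    """Top-x descriptors by STT weight; on equal weights, prefer higher-order n-grams."""
    return sorted(weights, key=lambda g: (weights[g], len(g.split())), reverse=True)[:x]

def tag_sizes(weights, descriptors, max_px=40):
    """Font size proportional to STT weight; the top descriptor gets max_px.

    Assumes the selected descriptors all have positive weights.
    """
    top = max(weights[g] for g in descriptors)
    return {g: round(max_px * weights[g] / top) for g in descriptors}

# Hypothetical STT weights.
weights = {"obama": 2.0, "health care reform": 2.0, "public option": 1.0, "town hall": 0.5}
top = top_descriptors(weights, 3)
print(top)                      # ['health care reform', 'obama', 'public option']
print(tag_sizes(weights, top))  # {'health care reform': 40, 'obama': 40, 'public option': 20}
```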