Twitris
Browsing real-time data by space,
        time and theme
           http://twitris.knoesis.org
Motivation, Goals
Mumbai Terror Attack 2008
  Citizen sensor observations (Flickr, Twitter, blogs, ...)
  No matter where you looked, tapping into a cultural perception was impossible

We wanted to know what people in India were saying vs. those in Pakistan or the U.S.A.
Spatio-Temporal-Thematic Slices of
         Real-time Data

  Around NEWS-WORTHY EVENTS
    Using space and time as cues for extracting
    social perceptions (behind signals)
    Summarizing hundreds and thousands of
    real-time observations
The Health Care Reform Debate
           in the U.S
Temporal navigation   Spatial Markers
Zooming in on Florida
n-gram Summaries
Zooming in on Washington
n-gram Summaries
Browsing Real-time Data in Context
  SOYLENT GREEN and the HEALTH CARE REFORM
  Find resources related to social perceptions
  News and Wikipedia articles to put extracted descriptors in context

✓ Exploit spatio, temporal semantics for thematic aggregation
Core of Twitris
n-gram summaries - Spatio-temporal-thematic
           event descriptors
Architecture
  Step 1: Gathering event-relevant tweets
    Because tweets are not pre-categorized

  Skip if I run out of time ..
Topical Tweets
Gathering event-specific tweets: Iran Election
1. Pick trending hashtags from Twitter: #iranelection, #iran, ..
2. Use Google Insights to expand the hashtag list
3. Issue a Twitter Search (API) query every 30 seconds for every hashtag and keyword
               1500 tweets per query
4. Obtain other hashtags in the crawled tweets
               Check for topic drifts
5. Repeat from Step 3 and babysit!
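Step 4 of the crawl loop above (collecting other hashtags from the crawled tweets) can be sketched as follows. This is a minimal illustration, not the Twitris implementation: the function name `expand_hashtags`, the `min_count` threshold and the sample tweets are assumptions made for the example, and the actual search call is left out.

```python
import re
from collections import Counter

# Hypothetical helper: a hashtag is "#" followed by word characters.
HASHTAG = re.compile(r"#\w+")

def expand_hashtags(tweets, seeds, min_count=2):
    """Return hashtags that co-occur with a seed hashtag in at least min_count tweets."""
    seeds = {s.lower() for s in seeds}
    counts = Counter()
    for text in tweets:
        tags = {t.lower() for t in HASHTAG.findall(text)}
        if tags & seeds:                 # tweet is about the event
            counts.update(tags - seeds)  # count only the new hashtags
    return [tag for tag, n in counts.most_common() if n >= min_count]

# Made-up sample tweets, for illustration only.
tweets = [
    "Protests continue #iranelection #tehran",
    "Watching the news #iranelection #tehran #neda",
    "Lunch time #food",
    "#iran update #neda",
]
candidates = expand_hashtags(tweets, ["#iranelection", "#iran"])
print(candidates)
```

The candidates are only suggestions: the "check for topic drifts" and "babysit" notes on the slide imply that a person reviews new hashtags before they are added to the query list.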
Architecture
  Step 1: Gathering event-relevant tweets
  Step 2: Spatial, temporal metadata of tweets
Geo-Coordinates of Tweets
Location a tweet originates from
Location it mentions
Approximation: Poster location on Twitter
profile


  Location: Dayton, OH (Google geocoder service, GeoDB)
  Location: “best place in the world” (fail!)
Architecture
  Step 1: Gathering event-relevant tweets
  Step 2: Spatial, temporal metadata of tweets
  Step 3: Spatio-temporal clusters
Spatio-Temporal Clusters of Tweets
Because every event is different.. and we want to preserve social perceptions
                         that generated this data!

     Long-running, world-wide events (Iran Election Protest)
         clusters by country and week?
     Short, world-wide events (Olympics)
         clusters by country and day?
     Long-running, evolving, local events (Health Care
     Reform Debate)
         clusters by state and day?
                                                Tunable parameters
Tweets in a Spatio-Temporal Cluster

   Spatio-temporal bias dictates the granularity at which tweets are processed
   Mumbai Terror Attack
     Cluster1: Tweets from India, 08/1/08
     Cluster2: Tweets from Pakistan, 08/1/08
     Cluster n: Tweets from USA, 08/13/08
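A minimal sketch of how tweets might be grouped into spatio-temporal clusters with tunable granularity. The tweet dictionary layout (`text`, `country`, `created_at`), the function names and the sample dates are all assumptions made for illustration; they are not taken from the slides.

```python
from collections import defaultdict
from datetime import datetime

def cluster_key(tweet, space="country", time="day"):
    """Build a (place, time bucket) key; space and time are the tunable parameters."""
    ts = tweet["created_at"]
    if time == "day":
        bucket = ts.strftime("%Y-%m-%d")
    elif time == "week":
        iso = ts.isocalendar()               # (ISO year, ISO week, weekday)
        bucket = f"{iso[0]}-W{iso[1]:02d}"
    else:
        raise ValueError(f"unsupported time granularity: {time}")
    return (tweet[space], bucket)

def build_clusters(tweets, space="country", time="day"):
    """Group tweet texts by their spatio-temporal key."""
    clusters = defaultdict(list)
    for tweet in tweets:
        clusters[cluster_key(tweet, space, time)].append(tweet["text"])
    return dict(clusters)

# Hypothetical sample data.
sample = [
    {"text": "a", "country": "India", "created_at": datetime(2008, 11, 27, 9, 0)},
    {"text": "b", "country": "India", "created_at": datetime(2008, 11, 27, 18, 30)},
    {"text": "c", "country": "USA", "created_at": datetime(2008, 11, 28, 1, 15)},
]
by_day = build_clusters(sample, space="country", time="day")
by_week = build_clusters(sample, space="country", time="week")
print(sorted(by_day))
```

Switching `time` from "day" to "week" (or `space` to a state field, if the data has one) changes the granularity without touching the rest of the pipeline.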
Architecture
  Step 1: Gathering event-relevant tweets
  Step 2: Spatial, temporal metadata of tweets
  Step 3: Spatio-temporal clusters
  Step 4: Thematic descriptors in spatio-temporal cluster
Thematic Descriptors

An event descriptor is an n-gram
  1,2 and 3 grams
n-gram descriptors
“President Obama in trying to regain control of the

health-care debate will likely shift his pitch in September”


1-grams: President, Obama, in, trying, to, regain, ...
2-grams: “President Obama”, “Obama in”, “in
trying”, “trying to”...
3-grams: “President Obama in”, “Obama in trying”;
“in trying to”...
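The n-gram lists above can be reproduced with a few lines of code. This sketch assumes simple whitespace tokenization; the slides do not say how Twitris tokenizes tweets.

```python
def ngrams(text, n):
    """Return all contiguous n-grams of a whitespace-tokenized text."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

sentence = ("President Obama in trying to regain control of the "
            "health-care debate will likely shift his pitch in September")

unigrams = ngrams(sentence, 1)
bigrams = ngrams(sentence, 2)
trigrams = ngrams(sentence, 3)

print(unigrams[:6])
print(bigrams[:4])
print(trigrams[:3])
```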
Thematic Descriptors
“President”    “President Obama”      “President Obama in”

A descriptor is an n-gram weighted by:
Thematic Importance
    redundancy: statistically discriminatory in nature
    variability: contextually important

Spatial Importance (local vs. global popularity)
Temporal Importance (always popular vs. currently
trending)
Thematic Importance of an n-gram
 “President”    “President Obama”      “President Obama in”


  Exploiting Redundancy
      tfidf of n-gram (Lucene Index)
      amplify by fraction of nouns in the n-gram
      (Stanford Natural Language Parser)
      amplify by fraction of non-stop words (‘going to
      try’)
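The redundancy-based score can be sketched as a TF-IDF value amplified by two fractions. The slide does not give the exact way the amplification is combined, so the multiplicative form `(1 + fraction)` used here is an assumption, as are the function names. The toy noun and stop-word sets stand in for the Stanford parser and a real stop list.

```python
import math

def tfidf(term_count, total_terms, docs_with_term, total_docs):
    """Plain TF-IDF from raw counts (one common variant)."""
    tf = term_count / total_terms
    idf = math.log(total_docs / docs_with_term)
    return tf * idf

def amplified_score(ngram, base_tfidf, nouns, stop_words):
    """Amplify a TF-IDF score by the fraction of nouns and of non-stop words.

    Assumption: each fraction f amplifies the score by a factor (1 + f).
    """
    tokens = ngram.lower().split()
    noun_frac = sum(t in nouns for t in tokens) / len(tokens)
    content_frac = sum(t not in stop_words for t in tokens) / len(tokens)
    return base_tfidf * (1 + noun_frac) * (1 + content_frac)

# Toy lexicons, for illustration only.
NOUNS = {"president", "obama"}
STOP_WORDS = {"to", "in", "the", "of", "a"}

strong = amplified_score("President Obama", 1.0, NOUNS, STOP_WORDS)
weak = amplified_score("going to try", 1.0, NOUNS, STOP_WORDS)
print(strong, weak)
```

With equal base TF-IDF, an n-gram made entirely of nouns ends up ranked above a phrase such as "going to try", which is the behaviour the slide describes.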
Thematic Importance of an n-gram
  Exploiting Variability
    Big three/Big 3; Ford, GM, Chrysler, General Motors..
    Contextually relevant words boost statistical importance

  Focus word (fw): “big three”
  Associated words (awi): co-occurring in the spatio-temporal set of tweets, e.g. Ford

  [Figure: focus word “big three” linked to the associated words “gm”, “chrysler”, “general motors” and “ford”]

  Thematic importance of focus word, computed from:
    tfidf of fw
    tfidf of awi
    association strength of fw and awi
Contextual Relevance
  Association strength of fw and awi

  The goal is to boost the focus word only with strongly associated words. One way to measure the strength of associations is to use word co-occurrence in language [9]. Borrowing from past success in this area, the association strength between the focus word and the associated words is measured using the notion of point-wise mutual information in terms of co-occurrence.

  assoc_str scores are measured as a function of the point-wise mutual information between the focus word and the contexts of awi. Let us call the contexts for awi C_{aw_i} = {c_{aw_1}, c_{aw_2}, ..}, where the c_{aw_k} are strong descriptors that collocate with awi.

  $$\mathrm{assoc}_{str}(fw, aw_i) = \frac{\sum_{k} \mathrm{pmi}(fw, c_{aw_k})}{|C_{aw_i}|}, \quad \forall c_{aw_k} \in C_{aw_i}$$

  Pointwise Mutual Information
  The point-wise mutual information between fw and c_{aw_k} (a context of awi) is calculated as:

  $$\mathrm{pmi}(fw, c_{aw_k}) = \log \frac{p(fw, c_{aw_k})}{p(fw)\, p(c_{aw_k})} = \log \frac{p(c_{aw_k} \mid fw)}{p(c_{aw_k})}$$

  where p(fw) = n(fw) / N, p(c_{aw_k} | fw) = n(c_{aw_k}, fw) / n(fw), and n(fw) is the frequency of fw.

  [Figure: contexts of associated word awi ‘Ford’; labels “big three”, “chrysler, GM, big 3”, “focus, model, release..”]
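A small sketch of the pointwise mutual information and association-strength computation from raw counts. The function signatures are illustrative assumptions, as is treating N as the total count used to normalize the frequencies. Contexts with a zero co-occurrence count would need to be filtered out beforehand, since the logarithm is undefined for them.

```python
import math

def pmi(n_fw, n_c, n_joint, N):
    """pmi(fw, c) = log( p(fw, c) / (p(fw) * p(c)) ), estimated from counts.

    n_fw: frequency of the focus word, n_c: frequency of the context word,
    n_joint: co-occurrence count (must be > 0), N: normalizing total (assumption).
    """
    # (n_joint / N) / ((n_fw / N) * (n_c / N)) simplifies to the ratio below.
    return math.log(n_joint * N / (n_fw * n_c))

def assoc_str(n_fw, contexts, N):
    """Average pmi between the focus word and the contexts of an associated word.

    contexts: list of (n_c, n_joint) pairs, one per context word c_aw_k.
    """
    return sum(pmi(n_fw, n_c, n_joint, N) for n_c, n_joint in contexts) / len(contexts)

# Made-up counts: the first context always co-occurs with the focus word,
# the second co-occurs exactly as often as chance would predict.
strength = assoc_str(n_fw=10, contexts=[(10, 10), (20, 2)], N=100)
print(strength)
```

The second context contributes a pmi of zero, so the average is half the pmi of the first context.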
Fig. 2: (a) Extracted descriptors sorted by TFIDF vs. spatio-temporal ... (b) Top 15 extracted descriptors in the US for the Mumbai attack event

The thematic weights of the associated words, along with their association strengths, are plugged into Eqn 1 to compute the thematic score ngram_i(th) of the n-gram descriptor.

Temporal Importance of a Descriptor
  Certain descriptors will always dominate discussions
    “Terrorism” in Mumbai Terror Attack tweets
    “Healthcare” in Health Care Reform debate
  Allow recent (possibly interesting) ones to surface

  While thematic scores are good indicators of what is important in a spatio-temporal set, certain descriptors tend to dominate discussions. To allow possibly interesting descriptors to surface, the thematic score of a descriptor is discounted depending on how popular it has been in the recent past. The temporal discount score for an n-gram, a tuneable factor depending on the event, is calculated over a period of time as:

  $$\mathrm{ngram}_i(te) = \mathrm{temporal}_{bias} \cdot \sum_{d=1}^{D} \frac{\mathrm{ngram}_i(th)_d}{d}$$

  where ngram_i(th)_d is the enhanced thematic score of the descriptor at index d, and D is the duration for which we wish to apply the dampening factor, for example the recent week. This temporal discount might not always be relevant, so a temporal_bias weight ranging from 0 to 1 is also applied.
    0-1 bias: less to more importance to recent n-grams
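The temporal discount, temporal_bias times the sum over d = 1..D of ngram_i(th)_d / d, can be sketched as below. The list layout is an assumption: `thematic_by_day[0]` holds the thematic score for d = 1, `thematic_by_day[1]` for d = 2, and so on. The function name is also illustrative.

```python
def temporal_discount(thematic_by_day, temporal_bias):
    """temporal_bias * sum over d = 1..D of thematic_by_day[d - 1] / d.

    thematic_by_day: past thematic scores of one n-gram, index 0 being d = 1
    (layout assumed for this sketch); temporal_bias must lie in [0, 1].
    """
    if not 0.0 <= temporal_bias <= 1.0:
        raise ValueError("temporal_bias must be between 0 and 1")
    total = sum(score / d for d, score in enumerate(thematic_by_day, start=1))
    return temporal_bias * total

# Made-up history over D = 3 steps: 4/1 + 2/2 + 3/3 = 6, times 0.5 = 3.0.
discount = temporal_discount([4.0, 2.0, 3.0], temporal_bias=0.5)
print(discount)
```

Dividing by d makes older scores count less, and a temporal_bias of 0 switches the discount off entirely.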

Spatial Importance of a Descriptor
  Local descriptors are more interesting compared to global ones

  The importance of a descriptor is also discounted based on its occurrence in other spatio-temporal sets. The intuition is that descriptors that occur all over the world on a given day are less interesting than those that occur only in the spatio-temporal set of interest. The spatial discount score for an n-gram is defined as the fraction of spatial partitions (e.g. countries) that had activity surrounding this descriptor:

  $$\mathrm{ngram}_i(sp) = \frac{k}{|\text{spatio-temporal sets}|} \cdot (1 - \mathrm{spatial}_{bias})$$

    k / |spatio-temporal sets|: fraction of spatio-temporal clusters the n-gram occurred in
    spatial_bias closer to 0 = global importance, i.e. more importance to the global presence of the descriptor

  Depending on the event of interest, both discounting factors are set per spatio-temporal set before processing of the observations begins. For example, when processing tweets for the Mumbai attack, setting the spatial_bias to 1 makes the spatial discount zero. When processing tweets from the US, a global bias may be preferable, given that the event did not originate there.
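A minimal sketch of the spatial discount, (k / number of spatio-temporal sets) times (1 - spatial_bias). Here k is taken to be the number of spatio-temporal sets in which the n-gram occurred, following the slide's annotation; the function and parameter names are illustrative assumptions.

```python
def spatial_discount(k, num_sets, spatial_bias):
    """(k / num_sets) * (1 - spatial_bias).

    k: number of spatio-temporal sets the n-gram occurred in,
    num_sets: total number of spatio-temporal sets,
    spatial_bias: weight in [0, 1]; 1 disables the discount.
    """
    if num_sets <= 0:
        raise ValueError("num_sets must be positive")
    if not 0.0 <= spatial_bias <= 1.0:
        raise ValueError("spatial_bias must be between 0 and 1")
    return (k / num_sets) * (1.0 - spatial_bias)

# An n-gram seen in 5 of 10 sets: full discount at bias 0, none at bias 1.
print(spatial_discount(5, 10, 0.0))
print(spatial_discount(5, 10, 1.0))
```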
STT Score of an n-gram
  Spatio-temporal-thematic score of a descriptor = thematic score - spatio-temporal discounts

  The spatial and temporal effects are discounted to give the final spatio-temporal-thematic (STT) weight of the n-gram:

  $$w_i = \mathrm{ngram}_i(th) - \mathrm{ngram}_i(te) - \mathrm{ngram}_i(sp)$$

  The effect of the enhanced STT weights is illustrated on descriptors pertaining to the Mumbai terror attack event.

  Higher-order n-grams are picked over lower-order n-grams (if same scores)
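The final weight and the tie-breaking rule can be sketched together. The function names and the example scores are assumptions; the tie-break prefers the n-gram with more tokens when two weights are equal, as the slide states.

```python
def stt_weight(thematic, temporal, spatial):
    """w_i = ngram_i(th) - ngram_i(te) - ngram_i(sp)."""
    return thematic - temporal - spatial

def rank_descriptors(weights, top=None):
    """Sort n-grams by STT weight, highest first.

    On equal weights the higher-order n-gram (more tokens) comes first.
    weights: dict mapping n-gram -> STT weight; top=None returns all.
    """
    ranked = sorted(weights.items(),
                    key=lambda item: (-item[1], -len(item[0].split())))
    return ranked[:top]

# Made-up weights, for illustration only.
weights = {
    "president": 2.0,
    "president obama": 2.0,
    "obama in": 1.0,
}
ranking = rank_descriptors(weights)
print([ngram for ngram, _ in ranking])
```

A tag cloud of the top X descriptors would then simply take `rank_descriptors(weights, top=X)`.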
Top X Descriptor Tag Cloud

 Tag size proportional to enhanced STT score

Twitris

  • 1. Twitris Browsing real-time data by space, time and theme http://twitris.knoesis.org
  • 3. Motivation, Goals Mumbai Terror Attack 2008 Citizen sensor observations (flickr, twitter, blogs..) No matter where you looked, tapping into a cultural perception was impossible We wanted to know what people in India were saying vs. those in Pakistan or the U.S.A
  • 4. Spatio-Temporal-Thematic Slices of Real-time Data Around NEWS-WORTHY EVENTS Using space and time as cues for extracting social perceptions (behind signals) Summarizing hundreds and thousands of real-time observations
  • 5. The Health Care Reform Debate in the U.S
  • 6. The Health Care Reform Debate in the U.S Temporal navigation
  • 7. The Health Care Reform Debate in the U.S Temporal navigation Spatial Markers
  • 8. Zooming in on Florida
  • 10. Zooming in on Washington
  • 12. Browsing Real-time Data in Context Find resources related to social perceptions SOYLENT GREEN and the HEALTH CARE REFORM News and Wikipedia articles to put extracted descriptors in context ✓ Exploit spatio, temporal semantics for thematic aggregation
  • 13. Core of Twitris n-gram summaries - Spatio-temporal-thematic event descriptors
  • 14. Architecture Step1: Gathering event-relevant tweets Because tweets are not pre-categorized Skip if I run out of time ..
  • 16. Topical Tweets Gathering event-specific tweets: Iran Election 1: Pick trending hashtags from Twitter - #iranelection; #iran ..
  • 17. Topical Tweets Gathering event-specific tweets: Iran Election 1: Pick trending hashtags from Twitter - #iranelection; #iran .. 2: Google insights to expand hashtag list
  • 18. Topical Tweets Gathering event-specific tweets: Iran Election 1: Pick trending hashtags from Twitter - #iranelection; #iran .. 2: Google insights to expand hashtag list
  • 19. Topical Tweets 3. Issue a Twitter Search (API) every 30 seconds for every hashtag, keyword 1500 tweets per query
  • 20. Topical Tweets 3. Issue a Twitter Search (API) every 30 seconds for every hashtag, keyword 1500 tweets per query 4. Obtain other Hashtags in crawled tweets
  • 21. Topical Tweets 3. Issue a Twitter Search (API) every 30 seconds for every hashtag, keyword 1500 tweets per query 4. Obtain other Hashtags in crawled tweets Check for topic drifts
  • 22. Topical Tweets 3. Issue a Twitter Search (API) every 30 seconds for every hashtag, keyword 1500 tweets per query 4. Obtain other Hashtags in crawled tweets Check for topic drifts 5. Repeat from Step 3 and babysit!
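One round of steps 3 and 4 above can be sketched as follows. This is a minimal sketch under stated assumptions, not the Twitris crawler: `search` is a hypothetical callable standing in for whatever Twitter Search client is used, the `min_count` threshold is an illustrative choice, and the 30-second scheduling is left out. The candidates it returns are only proposals; checking them for topic drift (the "babysit" part of step 5) stays a human task.

```python
import re
from collections import Counter
from typing import Callable, Iterable, List, Set, Tuple

HASHTAG_RE = re.compile(r"#\w+")


def extract_hashtags(tweets: Iterable[str]) -> Counter:
    """Count lower-cased hashtags across a batch of tweet texts."""
    counts: Counter = Counter()
    for text in tweets:
        counts.update(tag.lower() for tag in HASHTAG_RE.findall(text))
    return counts


def crawl_round(
    seeds: Set[str],
    search: Callable[[str], List[str]],  # hypothetical search client
    min_count: int = 2,                  # assumed threshold, not from the slides
) -> Tuple[List[str], Set[str]]:
    """Query every seed term, then propose hashtags not yet in the seed set.

    Seeds are expected in lower case, e.g. {"#iranelection", "#iran"}.
    """
    # Step 3: query each seed; dict.fromkeys drops duplicate texts
    # while keeping first-seen order.
    fetched = (text for term in sorted(seeds) for text in search(term))
    tweets = list(dict.fromkeys(fetched))
    # Step 4: collect other hashtags that appear in the crawled tweets.
    counts = extract_hashtags(tweets)
    candidates = {
        tag for tag, n in counts.items() if n >= min_count and tag not in seeds
    }
    return tweets, candidates
```

Reviewed candidates would be merged into `seeds` before the next round, which corresponds to repeating from step 3.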
  • 23. Architecture Step1: Gathering event-relevant tweets Step2: Spatial, Temporal metadata of tweets
  • 24. Geo-Coordinates of Tweets Location a tweet originates from Location it mentions Approximation: Poster location on Twitter profile Location: Dayton, OH (Google geocoder service, GeoDB) Location: “best place in the world” (fail!)
  • 25. Architecture Step1: Gathering event-relevant tweets Step2: Spatial, Temporal metadata of tweets Step3: Spatio-temporal clusters
  • 26. Spatio-Temporal Clusters of Tweets Because every event is different.. and we want to preserve social perceptions that generated this data! Long-running, world-wide events (Iran Election Protest) clusters by country and week? Short, world-wide events (Olympics) clusters by country and day? Long-running, evolving, local events (Health Care Reform Debate) clusters by state and day? Tunable parameters
  • 27. Tweets in a Spatio-Temporal Cluster Spatio-temporal bias dictates granularity of processing tweets Mumbai Terror Attack Cluster1: Tweets from India, 08/1/08 Cluster2: Tweets from Pakistan, 08/1/08 Cluster n: Tweets from USA, 08/13/08
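The tunable clustering described on the two slides above can be sketched as a group-by over a spatial key and a time bucket. This is a minimal sketch: the tweet record layout (`country`, `state`, `date`, `text` fields) and the bucket labels are assumptions made for illustration, not the Twitris data model.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple


def time_bucket(day: date, granularity: str) -> str:
    """Label a date by day or by ISO week."""
    if granularity == "day":
        return day.isoformat()
    if granularity == "week":
        iso = day.isocalendar()  # (ISO year, ISO week, weekday)
        return f"{iso[0]}-W{iso[1]:02d}"
    raise ValueError(f"unsupported granularity: {granularity}")


def cluster_tweets(
    tweets: List[dict],
    spatial_key: str = "country",  # e.g. "country" or "state"
    granularity: str = "day",      # "day" or "week"
) -> Dict[Tuple[str, str], List[str]]:
    """Group tweet texts into (place, time bucket) clusters."""
    clusters: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for tweet in tweets:
        key = (tweet[spatial_key], time_bucket(tweet["date"], granularity))
        clusters[key].append(tweet["text"])
    return dict(clusters)
```

A long-running, world-wide event would call `cluster_tweets(tweets, "country", "week")`, while a local, evolving one would use `cluster_tweets(tweets, "state", "day")`.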
  • 28. Architecture Step1: Gathering event-relevant tweets Step2: Spatial, Temporal metadata of tweets Step3: Spatio-temporal clusters Step4: Thematic Descriptors in spatio-temporal cluster
  • 29. Thematic Descriptors An event descriptor is an n-gram 1,2 and 3 grams
  • 30. n-gram descriptors “President Obama in trying to regain control of the health-care debate will likely shift his pitch in September” 1-grams: President, Obama, in, trying, to, regain, ... 2-grams: “President Obama”, “Obama in”, “in trying”, “trying to”... 3-grams: “President Obama in”, “Obama in trying”; “in trying to”...
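The 1-, 2- and 3-gram extraction shown on the slide above can be reproduced with a sliding window over whitespace tokens. This is a minimal sketch; splitting on whitespace is an assumption, and a real pipeline would likely use a proper tokenizer.

```python
from typing import Dict, List


def ngrams(text: str, n: int) -> List[str]:
    """Return all contiguous n-token sequences in text, in order."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def descriptors(text: str, max_n: int = 3) -> Dict[int, List[str]]:
    """Candidate descriptors: every 1-gram up to max_n-gram of the text."""
    return {n: ngrams(text, n) for n in range(1, max_n + 1)}


candidates = descriptors("President Obama in trying to regain control")
# candidates[2] starts with "President Obama", "Obama in", "in trying"
```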
  • 31. Thematic Descriptors “President” “President Obama” “President Obama in” A descriptor is an n-gram weighted by:
  • 32. Thematic Descriptors “President” “President Obama” “President Obama in” A descriptor is an n-gram weighted by: Thematic Importance redundancy: statistically discriminatory in nature variability: contextually important
  • 33. Thematic Descriptors “President” “President Obama” “President Obama in” A descriptor is an n-gram weighted by: Thematic Importance redundancy: statistically discriminatory in nature variability: contextually important Spatial Importance (local vs. global popularity)
  • 34. Thematic Descriptors “President” “President Obama” “President Obama in” A descriptor is an n-gram weighted by: Thematic Importance redundancy: statistically discriminatory in nature variability: contextually important Spatial Importance (local vs. global popularity) Temporal Importance (always popular vs. currently trending)
  • 35. Thematic Importance of an n-gram “President” “President Obama” “President Obama in” Exploiting Redundancy tfidf of n-gram (Lucene Index) amplify by fraction of nouns in the n-gram (Stanford Natural Language Parser) amplify by fraction of non-stop words (‘going to try’)
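The redundancy-based score above can be illustrated with a small sketch. Several things here are assumptions, because the slide does not give the exact formulas: `tfidf` is the plain textbook definition rather than Lucene's scoring, the noun set is passed in by hand instead of coming from a parser, the stop-word list is a toy one, and "amplify" is read as multiplying by (1 + fraction).

```python
import math
from typing import Set

STOP_WORDS = {"a", "an", "the", "in", "to", "of", "is"}  # toy list, assumed


def tfidf(term_count: int, total_terms: int, n_docs: int, docs_with_term: int) -> float:
    """Textbook tf-idf: relative frequency times log inverse document frequency."""
    tf = term_count / total_terms
    idf = math.log(n_docs / docs_with_term)
    return tf * idf


def amplified_score(
    ngram: str,
    base_tfidf: float,
    nouns: Set[str],                    # lower-cased tokens tagged as nouns
    stop_words: Set[str] = STOP_WORDS,
) -> float:
    """Boost a tf-idf score by the n-gram's noun and non-stop-word fractions."""
    tokens = ngram.lower().split()
    if not tokens:
        raise ValueError("empty n-gram")
    noun_frac = sum(t in nouns for t in tokens) / len(tokens)
    content_frac = sum(t not in stop_words for t in tokens) / len(tokens)
    # Assumed form of "amplify": multiply by (1 + fraction) for each signal.
    return base_tfidf * (1 + noun_frac) * (1 + content_frac)
```

Under these assumptions "President Obama" is boosted more than "President Obama in", and an n-gram made only of stop words gets no boost at all.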
  • 36. Thematic Importance of an n-gram Exploiting Variability Big three/Big 3; Ford, GM, Chrysler, General Motors.. Contextually relevant words boost statistical importance Focus word (fw): “big three” Associated words (aw_i): co-occurring in spatio-temporal set of tweets
  • 37. Thematic Importance of an n-gram focus word (fw): Big Three associated word (aw_i): Ford Thematic importance of focus word: tfidf of fw; tfidf of aw_i; association strength of fw and aw_i
  • 38. Contextual Relevance Association Strength of fw and aw_i depends on contexts of aw_i. Contexts of associated word aw_i ‘Ford’: big 3, chrysler, GM, focus, model, release.. Let the contexts for $aw_i$ be $C_{aw_i} = \{caw_1, caw_2, \ldots\}$, where the $caw_k$ are strong descriptors that collocate with $aw_i$. Then $assoc_{str}(fw, aw_i) = \frac{\sum_k pmi(fw, caw_k)}{|C_{aw_i}|}, \ \forall caw_k \in C_{aw_i}$. Pointwise Mutual Information: $pmi(fw, caw_k) = \log \frac{p(fw, caw_k)}{p(fw)\,p(caw_k)} = \log \frac{p(caw_k \mid fw)}{p(caw_k)}$, where $p(fw) = \frac{n(fw)}{N}$, $p(caw_k \mid fw) = \frac{n(caw_k, fw)}{n(fw)}$, and $n(\cdot)$ is a frequency count.
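The pointwise mutual information and association strength on the slide above can be computed directly from co-occurrence counts. This is a minimal sketch: the count-based argument layout is an assumption, and so is the handling of contexts that never co-occur with the focus word (they contribute 0 here, since their PMI would otherwise be negative infinity).

```python
import math
from typing import Dict, Tuple


def pmi(n_fw: int, n_caw: int, n_joint: int, n_total: int) -> float:
    """PMI of focus word and context word from raw counts.

    Equals log(p(fw, caw) / (p(fw) * p(caw))) with p(x) = n(x) / n_total.
    """
    return math.log((n_joint * n_total) / (n_fw * n_caw))


def assoc_strength(
    n_fw: int,
    contexts: Dict[str, Tuple[int, int]],  # context -> (n(caw), n(caw, fw))
    n_total: int,
) -> float:
    """Average PMI between the focus word and the contexts of one associated word."""
    if not contexts:
        raise ValueError("associated word has no contexts")
    total = 0.0
    for n_caw, n_joint in contexts.values():
        if n_joint > 0:  # assumption: zero co-occurrence adds nothing
            total += pmi(n_fw, n_caw, n_joint, n_total)
    return total / len(contexts)
```

For example, with "big three" as the focus word and "Ford" as the associated word, `contexts` would hold counts for the words that collocate with "Ford", such as "chrysler" and "gm".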
  • 39. Temporal Importance of a Descriptor Certain descriptors tend to dominate discussions: “Terrorism” in Mumbai Terror Attack Tweets, “Healthcare” in Health Care reform debate. Allow recent (possibly interesting) ones to surface. Temporal discount: $ngram_i(te) = temporal_{bias} \cdot \sum_{d=1}^{D} \frac{ngram_i(th)_d}{d}$, where $ngram_i(th)_d$ is the enhanced thematic score of the descriptor and $D$ is the duration over which the dampening is applied. 0-1 bias: less to more importance to recent n-grams
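The temporal discount on the slide above can be sketched as a weighted sum over a descriptor's recent thematic scores. The formula is read as temporal_bias times the sum over d = 1..D of the thematic score at d divided by d. Treating d = 1 as the most recent past period is an assumption, since the slide text does not spell out the indexing.

```python
from typing import Sequence


def temporal_discount(past_scores: Sequence[float], temporal_bias: float) -> float:
    """Discount for a descriptor that has been popular in the recent past.

    past_scores[0] is the thematic score one period ago (d = 1),
    past_scores[1] two periods ago (d = 2), and so on up to D periods.
    """
    if not 0.0 <= temporal_bias <= 1.0:
        raise ValueError("temporal_bias must lie in [0, 1]")
    # Older scores are divided by a larger d, so they contribute less.
    damped = sum(score / d for d, score in enumerate(past_scores, start=1))
    return temporal_bias * damped
```

A descriptor with a long history of high scores gets a large discount, while a newly trending one with no history gets none, which lets it surface.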
  • 40. Spatial Importance of a Descriptor Local descriptors are more interesting compared to global ones. Spatial discount: $ngram_i(sp) = \frac{k}{|\text{spatio-temporal sets}|} \cdot (1 - spatial_{bias})$, where $\frac{k}{|\text{spatio-temporal sets}|}$ is the fraction of spatio-temporal clusters the n-gram occurred in; $spatial_{bias}$ closer to 0 = global importance
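The spatial discount on the slide above depends only on how many spatio-temporal clusters contain the n-gram. A minimal sketch follows, assuming each cluster is represented by the set of n-grams seen in it; that representation and the cluster names in the usage note are illustrative.

```python
from typing import Dict, Hashable, Set


def spatial_discount(
    ngram: str,
    cluster_ngrams: Dict[Hashable, Set[str]],  # cluster id -> n-grams seen there
    spatial_bias: float,
) -> float:
    """(fraction of clusters containing the n-gram) * (1 - spatial_bias)."""
    if not cluster_ngrams:
        raise ValueError("no spatio-temporal clusters given")
    if not 0.0 <= spatial_bias <= 1.0:
        raise ValueError("spatial_bias must lie in [0, 1]")
    k = sum(1 for grams in cluster_ngrams.values() if ngram in grams)
    return (k / len(cluster_ngrams)) * (1.0 - spatial_bias)
```

With four country clusters and a bias of 0, a descriptor present in three of them is discounted by 0.75, while one present in a single cluster is discounted by only 0.25.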
  • 41. STT Score of an n-gram Spatio-temporal-thematic score of a descriptor = thematic score - spatio-temporal discounts: $w_i = ngram_i(th) - ngram_i(te) - ngram_i(sp)$
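The final weight is the thematic score minus the two discounts. A small sketch of that bookkeeping follows; the class and field names are hypothetical, chosen only to mirror the three terms on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DescriptorScores:
    """Per-descriptor scores within one spatio-temporal cluster (hypothetical layout)."""
    ngram: str
    thematic: float           # ngram_i(th)
    temporal_discount: float  # ngram_i(te)
    spatial_discount: float   # ngram_i(sp)

    @property
    def stt_weight(self) -> float:
        """Thematic score minus the temporal and spatial discounts."""
        return self.thematic - self.temporal_discount - self.spatial_discount


example = DescriptorScores("big three", thematic=5.0,
                           temporal_discount=3.0, spatial_discount=0.75)
# example.stt_weight is 5.0 - 3.0 - 0.75
```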
  • 42. higher-order n-grams picked over lower-order n-grams (if same scores)
  • 43. Top X Descriptor Tag Cloud Tag size proportional to enhanced STT score
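The last two slides can be sketched as a ranking step followed by a size mapping. This is a minimal sketch under stated assumptions: the preference for higher-order n-grams is read as a tie-break on equal scores, "proportional" is read as scaling the top score to a maximum font size, and the point size used is arbitrary.

```python
from typing import Dict, List, Tuple


def top_descriptors(weights: Dict[str, float], x: int) -> List[Tuple[str, float]]:
    """Top x descriptors by STT weight; on ties, longer n-grams come first."""
    ranked = sorted(
        weights.items(),
        key=lambda item: (-item[1], -len(item[0].split())),
    )
    return ranked[:x]


def tag_sizes(ranked: List[Tuple[str, float]], max_pt: float = 36.0) -> Dict[str, float]:
    """Font size proportional to weight; the best descriptor gets max_pt."""
    top = max(weight for _, weight in ranked)
    if top <= 0:
        raise ValueError("tag sizes need at least one positive weight")
    return {ngram: max_pt * weight / top for ngram, weight in ranked}
```

Since the STT weight is a difference and can be negative, a real tag cloud would need a rule for such descriptors; this sketch simply assumes the selected ones are positive.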