Twitris – System for Understanding Perceptions From Social Data

http://twitris.knoesis.org/

Ohio Center of Excellence in Knowledge Enabled Computing (Kno.e.sis)

Wright State University, Dayton, OH



Twitris - Motivation

1.  Information overload
•    WHAT to be aware of
•    Multiple storylines about the same event!

Image: http://bit.ly/etFezl

Twitris - Motivation

2. Evolution of citizen observations
     •  Observations change with location, time, and the occurrence of other events

Twitris - Motivation

3. Big picture of the event
   –  How to find out:
     •  Location- and time-based interesting facts about an event, from Twitter
     •  Event-related information from other sources (images, videos, news and Wikipedia articles)

Twitris: Twitter + Tetris

•  Twitris lets you browse citizen reports using social perceptions as the fulcrum:
   –  What is being said about an event (theme)
   –  Where (spatial)
   –  When (temporal)

•  Contextual information from web resources such as news, Wikipedia articles, Flickr, TwitPic and YouTube

•  Study diversity and change in perceptions

Twitris Architecture

[Figure: Twitris architecture diagram with four numbered components (1 to 4)]

Data Collection and Preprocessing: Semi-automated Tweet Crawler

Extract topically relevant tweets using the Twitter Search API and search keywords
    –  Because tweets are not pre-categorized!

Strategy: semi-automated, multithreaded, continuous tweet crawler
       •    Start with manually selected keywords (seed)
       •    Crawl using keywords and hashtags
       •    Periodically update the keywords used for the crawl
             (to capture the evolution of the topic)
       •    Continue crawling

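The crawl strategy above can be sketched as a round-based loop. This is a minimal, single-threaded sketch under stated assumptions: `search` is a hypothetical stand-in for a keyword search call (not an actual Twitter Search API client), the keyword-update rule (add the most frequent hashtags not yet tracked) is assumed because the slide does not say how keywords are refreshed, and a fixed number of rounds replaces the continuous, multithreaded crawl.

```python
from collections import Counter
from typing import Callable, Iterable, List, Set, Tuple

# Hypothetical search backend: takes a keyword, returns matching tweet texts.
SearchFn = Callable[[str], Iterable[str]]


def update_keywords(batch: List[str], keywords: Set[str], max_new: int = 3) -> Set[str]:
    """Assumed update rule: add the most frequent hashtags not yet tracked."""
    tags = Counter(
        word.lower()
        for text in batch
        for word in text.split()
        if word.startswith("#") and len(word) > 1
    )
    new = [tag for tag, _ in tags.most_common() if tag not in keywords][:max_new]
    return keywords | set(new)


def crawl(seed: List[str], search: SearchFn, rounds: int = 3) -> Tuple[List[str], Set[str]]:
    """Round-based crawl: search every keyword, deduplicate, then refresh the keyword set."""
    keywords = {k.lower() for k in seed}
    seen: Set[str] = set()
    collected: List[str] = []
    for _ in range(rounds):  # a production crawler would keep looping
        batch = []
        for kw in sorted(keywords):
            for text in search(kw):
                if text not in seen:
                    seen.add(text)
                    batch.append(text)
        collected.extend(batch)
        keywords = update_keywords(batch, keywords)
    return collected, keywords


# Usage with an in-memory stand-in for the search API.
corpus = [
    "Quake hits Japan #japanquake",
    "Tsunami warning issued #tsunami #japanquake",
    "Relief efforts under way #tsunami",
]


def fake_search(term: str) -> List[str]:
    return [t for t in corpus if term in t.lower()]


tweets, keywords = crawl(["#japanquake"], fake_search, rounds=2)
print(len(tweets), sorted(keywords))
```

Starting from the seed `#japanquake`, the first round discovers `#tsunami`, and the second round uses it to pick up a tweet the seed alone would have missed.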
Data Collection and Preprocessing: Metadata Extraction

 •    Tweet publication date-time, author, location
 •    Location from which the tweet originated
      −  From the tweet
      −  From the author’s profile
            •    Location: Dayton, OH (resolved with the Google geocoder service)
            •    Location: “best place in the world” (geocoding fails!)
 •    Geocode lookup for the location
 •    Cache (location, latitude, longitude) tuples to speed up lookups

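A minimal sketch of the geocode-and-cache step, assuming a hypothetical `lookup` function in place of the Google geocoder and hypothetical tweet fields `tweet_location` and `profile_location`. The coordinates in the toy table are illustrative only. Failed lookups are cached too, so an ungeocodable profile string such as “best place in the world” costs only one backend call.

```python
from typing import Callable, Dict, Optional, Tuple

LatLon = Tuple[float, float]


class CachingGeocoder:
    """Wraps a geocoding function with an in-memory (location -> lat/lon) cache."""

    def __init__(self, lookup: Callable[[str], Optional[LatLon]]):
        self._lookup = lookup  # hypothetical backend, e.g. a call to a web geocoder
        self._cache: Dict[str, Optional[LatLon]] = {}
        self.backend_calls = 0

    def geocode(self, location: str) -> Optional[LatLon]:
        key = " ".join(location.lower().split())  # normalise case and whitespace
        if key not in self._cache:
            self.backend_calls += 1
            # Failed lookups (None) are cached as well, so they are not retried.
            self._cache[key] = self._lookup(location)
        return self._cache[key]


def tweet_location(tweet: dict, geocoder: CachingGeocoder) -> Optional[LatLon]:
    """Try the location carried by the tweet first, then the author's profile location."""
    for field in ("tweet_location", "profile_location"):  # hypothetical field names
        value = tweet.get(field)
        if value:
            coords = geocoder.geocode(value)
            if coords is not None:
                return coords
    return None


# Usage with a toy lookup table standing in for the geocoding service.
KNOWN = {"dayton, oh": (39.76, -84.19)}
gc = CachingGeocoder(lambda loc: KNOWN.get(loc.lower()))

first = tweet_location({"profile_location": "Dayton, OH"}, gc)
again = tweet_location({"profile_location": "Dayton, OH"}, gc)  # served from the cache
bad = tweet_location({"profile_location": "best place in the world"}, gc)
bad_again = tweet_location({"profile_location": "best place in the world"}, gc)
print(first, bad, gc.backend_calls)
```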
Key Phrase Extraction: 1. Spatio-Temporal Clustering

•  Objective: go from a large volume of tweets to key phrases that describe an event, while preserving the spatio-temporal-thematic aspects of social perceptions!

1.  Spatio-temporal clustering
    –  Group observations based on location and time
    –  Global events (Iran election protests, Japan earthquake)
       •  clusters by country and day
    –  Local events (healthcare reform debate, Austin plane crash)
       •  clusters by state and day

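The grouping step can be sketched as bucketing tweets by (region, day). The field names (`country`, `state`, `created_at`) are assumptions for this example, and the day is taken from the timestamp as given, with no time-zone normalization.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Tuple


def spatio_temporal_clusters(tweets: List[dict], scope: str = "global") -> Dict[Tuple, List[str]]:
    """Group tweet texts by (region, day): country for global events, state for local ones."""
    region_field = "country" if scope == "global" else "state"
    clusters: Dict[Tuple, List[str]] = defaultdict(list)
    for t in tweets:
        day = t["created_at"].date()  # calendar day of the timestamp as given
        clusters[(t[region_field], day)].append(t["text"])
    return dict(clusters)


tweets = [
    {"text": "Shaking felt in Tokyo", "country": "JP", "state": None,
     "created_at": datetime(2011, 3, 11, 6, 0)},
    {"text": "Tsunami alert for the coast", "country": "JP", "state": None,
     "created_at": datetime(2011, 3, 11, 9, 30)},
    {"text": "Praying for Japan", "country": "US", "state": "OH",
     "created_at": datetime(2011, 3, 12, 1, 15)},
]
clusters = spatio_temporal_clusters(tweets, scope="global")
for key, texts in sorted(clusters.items()):
    print(key, texts)
```

Each resulting set is the unit that the later n-gram scoring operates on.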
Key Phrase Extraction: Spatio-Temporal Clustering

[Figure: Twitris interface with temporal navigation and spatial markers]

Key Phrase Extraction: 2. N-gram Generation

     “President Obama in trying to regain control of the health-care debate will likely shift his pitch in September”
     1-grams: President, Obama, in, trying, to, regain, ...
     2-grams: “President Obama”, “Obama in”, “in trying”, “trying to”, ...
     3-grams: “President Obama in”, “Obama in trying”, “in trying to”, ...

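N-gram generation for the example sentence above, assuming simple whitespace tokenization (the slide does not say which tokenizer was used):

```python
from typing import Dict, List


def ngrams(text: str, n: int) -> List[str]:
    """All contiguous n-token sequences of a whitespace-tokenised text."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def all_ngrams(text: str, max_n: int = 3) -> Dict[int, List[str]]:
    return {n: ngrams(text, n) for n in range(1, max_n + 1)}


sentence = ("President Obama in trying to regain control of the "
            "health-care debate will likely shift his pitch in September")
grams = all_ngrams(sentence)
print(grams[2][:4])  # ['President Obama', 'Obama in', 'in trying', 'trying to']
```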
Key Phrase Extraction: 3. N-gram Weight Calculation

An n-gram’s weight is calculated from:

         1.  Thematic importance
             –    redundancy: statistically discriminatory in nature
             –    variability: contextually important
         2.  Spatial importance (local vs. global popularity)
         3.  Temporal importance (always popular vs. currently trending)

Key Phrase Extraction: 3.1.A Thematic Importance of an N-gram

A.  Exploiting redundancy
   1.  TF-IDF of the n-gram (Lucene index)
   2.  Amplify by the fraction of nouns in the n-gram (Stanford Natural Language Parser)
   3.  Amplify by the fraction of non-stop words (e.g., ‘going to try’)
   4.  Pick the higher-order n-gram (for overlapping segments with the same TF-IDF)
   5.  Select the top 5 n-grams for further analysis

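A sketch of steps 1 to 5 under several assumptions: a plain smoothed TF-IDF over tweets stands in for the Lucene index, a caller-supplied `nouns` set stands in for the Stanford parser's part-of-speech tags, the stop-word list is a toy one, and “amplify” is taken to mean multiplying by (1 + fraction), since the slide does not give the exact form.

```python
import math
from collections import Counter
from typing import List, Set, Tuple

# Tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "in", "of", "to", "is", "his", "will"}


def ngrams(tokens: List[str], n: int) -> List[str]:
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def contains(big: str, small: str) -> bool:
    """True if `small` is a shorter, contiguous token run inside `big`."""
    b, s = big.split(), small.split()
    return len(s) < len(b) and any(b[i:i + len(s)] == s for i in range(len(b) - len(s) + 1))


def redundancy_scores(tweets: List[str], nouns: Set[str], max_n: int = 3,
                      top_k: int = 5) -> List[Tuple[str, float]]:
    docs = [t.lower().split() for t in tweets]
    tf: Counter = Counter()
    df: Counter = Counter()
    for tokens in docs:
        grams = [g for n in range(1, max_n + 1) for g in ngrams(tokens, n)]
        tf.update(grams)
        df.update(set(grams))
    n_docs = len(docs)
    # Step 1: TF-IDF with smoothed IDF, each tweet treated as a document.
    tfidf = {g: tf[g] * (math.log((1 + n_docs) / (1 + df[g])) + 1) for g in tf}
    # Step 4: drop an n-gram when a longer n-gram containing it has the same TF-IDF.
    kept = [g for g in tfidf
            if not any(contains(h, g) and math.isclose(tfidf[h], tfidf[g]) for h in tfidf)]
    scores = {}
    for g in kept:
        words = g.split()
        noun_frac = sum(w in nouns for w in words) / len(words)               # step 2
        content_frac = sum(w not in STOP_WORDS for w in words) / len(words)  # step 3
        # Assumed form of "amplify": multiply by (1 + fraction).
        scores[g] = tfidf[g] * (1 + noun_frac) * (1 + content_frac)
    # Step 5: keep the top-k n-grams (ties broken alphabetically).
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


tweets = [
    "health care reform vote today",
    "senate debates health care reform",
    "going to try to watch the vote",
]
top = redundancy_scores(tweets, nouns={"health", "care", "reform", "vote", "senate"})
for phrase, score in top:
    print(f"{score:6.2f}  {phrase}")
```

In this toy run, “health care reform” outranks its sub-phrases (which step 4 removes), while the frequent but stop-word-only “to” falls out of the top 5.
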
Key Phrase Extraction: 3.1.B Thematic Importance of an N-gram

B. Exploiting variability
     –  Contextually relevant words boost statistical importance
•  Focus word (fw): the n-gram
•  Associated words (aw_i): top 5 co-occurring words in the spatio-temporal set of tweets
•  Association strength: pointwise mutual information (PMI)
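The association step can be sketched with pointwise mutual information, PMI(fw, aw) = log2(P(fw, aw) / (P(fw) P(aw))), estimating each probability as the fraction of tweets in the spatio-temporal set that contain the word or n-gram. This is an illustrative sketch: the slide does not specify the probability estimates or how the PMI values are folded back into the thematic score, so the function stops at returning the associated words and their PMI.

```python
import math
from collections import Counter
from typing import List, Tuple


def has_phrase(tokens: List[str], phrase: List[str]) -> bool:
    return any(tokens[i:i + len(phrase)] == phrase for i in range(len(tokens) - len(phrase) + 1))


def associated_words(focus: str, tweets: List[str], k: int = 5) -> List[Tuple[str, float]]:
    """Top-k words co-occurring with the focus n-gram, each paired with its PMI (base 2)."""
    focus_tokens = focus.lower().split()
    docs = [t.lower().split() for t in tweets]
    n_docs = len(docs)
    word_df: Counter = Counter()  # tweets containing each word
    co_df: Counter = Counter()    # tweets containing both the word and the focus n-gram
    n_focus = 0                   # tweets containing the focus n-gram
    for tokens in docs:
        vocab = set(tokens)
        word_df.update(vocab)
        if has_phrase(tokens, focus_tokens):
            n_focus += 1
            co_df.update(vocab - set(focus_tokens))
    if n_focus == 0:
        return []
    top = sorted(co_df.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    p_focus = n_focus / n_docs
    return [(w, math.log2((c / n_docs) / (p_focus * (word_df[w] / n_docs)))) for w, c in top]


tweets = [
    "obama pushes health care reform",
    "health care reform vote delayed",
    "obama speech tonight",
    "health care costs rising",
]
assoc = associated_words("health care", tweets, k=2)
print(assoc)
```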
Key Phrase Extraction: 3.2 Thematic-Temporal Importance

•  Temporal importance of the n-gram
     •  always popular vs. currently trending
•  Certain descriptors always dominate observations
     –  e.g., Obama, President in the US presidential election

    •  To allow less popular but interesting descriptors to surface, we discount the thematic score in proportion to the descriptor's recent popularity

    •  Spatio-temporal-thematic score of a descriptor
       = thematic score - spatio-temporal discounts

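A minimal sketch of the temporal discount, assuming recent popularity is the fraction of the previous `window` time slices in which the descriptor appeared, and that the discount is `alpha * thematic * popularity`. Both the popularity measure and the constant `alpha` are illustrative choices, not values from the slides.

```python
from typing import List, Set


def recent_popularity(descriptor: str, history: List[Set[str]], window: int = 3) -> float:
    """Fraction of the last `window` time slices whose descriptor sets include `descriptor`."""
    recent = history[-window:]
    if not recent:
        return 0.0
    return sum(descriptor in slice_ for slice_ in recent) / len(recent)


def thematic_temporal_score(descriptor: str, thematic: float, history: List[Set[str]],
                            alpha: float = 0.5, window: int = 3) -> float:
    """Thematic score minus a discount proportional to recent popularity."""
    discount = alpha * thematic * recent_popularity(descriptor, history, window)
    return thematic - discount


# Descriptors seen on the three previous days (toy data).
history = [{"obama", "president"}, {"obama", "debate"}, {"obama", "town hall"}]
print(thematic_temporal_score("obama", 10.0, history))     # 5.0
print(thematic_temporal_score("town hall", 8.0, history))  # about 6.67
```

With these toy numbers, the always-present “obama” drops below the newer “town hall” even though its raw thematic score is higher, which is the effect the slide describes.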
Key Phrase Extraction: 3.3 Thematic-Temporal-Spatial Importance

•  Descriptors that occur all over the world are not as interesting as those local to a region
   –  (local vs. global popularity)

•  Discount the thematic-temporal score in proportion to the number of spatial sets (other than the local one) that mention the descriptor

•  Final Spatio-Temporal-Thematic (STT) weight of an n-gram:
   [formula shown as an image on the original slide; not recoverable]

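The spatial discount can be sketched in the same way. Because the slide's exact STT formula is not recoverable, this sketch assumes the number of non-local spatial sets mentioning the descriptor is normalized to a fraction and that the discount is `beta * score * fraction`, with `beta` a hypothetical constant.

```python
from typing import Dict, Set


def spatial_spread(descriptor: str, local_region: str,
                   region_descriptors: Dict[str, Set[str]]) -> float:
    """Fraction of the other (non-local) spatial sets that also mention `descriptor`."""
    others = [r for r in region_descriptors if r != local_region]
    if not others:
        return 0.0
    return sum(descriptor in region_descriptors[r] for r in others) / len(others)


def stt_weight(descriptor: str, thematic_temporal: float, local_region: str,
               region_descriptors: Dict[str, Set[str]], beta: float = 0.5) -> float:
    """Thematic-temporal score, discounted by how widely the descriptor is mentioned elsewhere."""
    spread = spatial_spread(descriptor, local_region, region_descriptors)
    return thematic_temporal - beta * thematic_temporal * spread


regions = {
    "OH": {"health care", "dayton rally"},
    "TX": {"health care"},
    "CA": {"health care", "wildfire"},
}
print(stt_weight("health care", 6.0, "OH", regions))   # 3.0
print(stt_weight("dayton rally", 4.0, "OH", regions))  # 4.0
```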
Key Phrase Extraction: Results

[Figure: TF-IDF vs. Spatio-Temporal-Thematic (STT) scores of descriptors]

Key Phrase Extraction: Example

•  Objective: go from a large volume of tweets to key phrases that describe an event, while preserving the spatio-temporal-thematic aspects of social perceptions

Analysis of Embedded Links

•  Because of the 140-character limit on tweets, people increasingly embed hyperlinks (articles, blogs, images, videos) in their tweets
•  Steps:
   –  Extract and resolve the links
   –  Provide hyperlinks to articles and blogs
   –  Check the semantic relevance of images and videos
       •  based on their title and description

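A sketch of link extraction and a relevance check. The URL regex, the `MEDIA_HOSTS` list, and the keyword-overlap test on title and description are assumptions for illustration; resolving shortened links (following redirects) would need an HTTP request and is left out so the example stays self-contained.

```python
import re
from typing import List

URL_RE = re.compile(r"https?://\S+")

# Hypothetical hosts treated as image/video links for this sketch.
MEDIA_HOSTS = ("youtube.com", "youtu.be", "twitpic.com", "flickr.com")


def extract_links(text: str) -> List[str]:
    """URLs in a tweet, with trailing punctuation stripped."""
    return [u.rstrip(".,;:!?)") for u in URL_RE.findall(text)]


def is_media(url: str) -> bool:
    return any(host in url.lower() for host in MEDIA_HOSTS)


def is_relevant(title: str, description: str, key_phrases: List[str], min_hits: int = 1) -> bool:
    """Assumed heuristic: relevant if enough event key phrases appear in the title/description."""
    words = set(f"{title} {description}".lower().split())
    hits = sum(all(w in words for w in p.lower().split()) for p in key_phrases)
    return hits >= min_hits


tweet = "Footage of the quake http://youtu.be/abc123 and a report http://example.com/news."
links = extract_links(tweet)
media = [u for u in links if is_media(u)]
print(links)
print(media)
print(is_relevant("Japan earthquake footage", "Tsunami hits the coast", ["japan earthquake"]))  # True
print(is_relevant("Cute cat video", "", ["japan earthquake"]))  # False
```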
External Context for Understanding an Event

•  Wikipedia articles
•  Related news

Twitris: Widgets	





Sentiment Analysis

•  Using statistical and machine learning techniques

Entity-Relationship Graph

•  Using semantically annotated DBpedia entities mentioned in the tweets

Tweet Traffic Analysis	

•  Event popularity over a period of time

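Event popularity over time can be sketched as a daily tweet count, assuming each tweet carries a `datetime` timestamp. The daily granularity mirrors the day-level clusters described for spatio-temporal clustering, but it is otherwise an illustrative choice.

```python
from collections import Counter
from datetime import date, datetime
from typing import List, Tuple


def daily_volume(timestamps: List[datetime]) -> List[Tuple[date, int]]:
    """Tweets per calendar day, in chronological order."""
    counts = Counter(ts.date() for ts in timestamps)
    return sorted(counts.items())


stamps = [datetime(2011, 3, 11, 6, 0), datetime(2011, 3, 11, 9, 30), datetime(2011, 3, 12, 1, 15)]
for day, n in daily_volume(stamps):
    print(day.isoformat(), n)
```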
Twitris: Functional Overview

Twitris: Demo, Quick Show

    •  http://twitris.knoesis.org/

Ongoing Work

Continuous Semantics

Domain models to enhance understanding of the content

Coordination

•  Coordinating needs and resources in disaster situations
  –  Analyze SMS and web reports from the disaster location
  –  Use domain models for efficient and timely coordination

Image: http://bit.ly/hcp4PG

Twitris Team

Meena Nagarajan
Amit Sheth
Hemant Purohit
Ashutosh Jadhav
Lu Chen
Pramod Anantharam
Pavan Kapanipathi

References

1.  Twitris: Twitter through space, time and theme. http://twitris.knoesis.org
2.  Nagarajan, M., Gomadam, K., Sheth, A.P., Ranabahu, A., Jadhav, A., Mutharaju, R.: Spatio-temporal-thematic analysis of citizen-sensor data: challenges and experiences. In: Web Information Systems Engineering (WISE), 2009.
3.  Jadhav, A., Wang, W., Mutharaju, R., Anantharam, P., Nguyen, V., Sheth, A.P., Gomadam, K., Nagarajan, M., Ranabahu, A.: Twitris: Socially Influenced Browsing. Semantic Web Challenge 2009, 8th International Semantic Web Conference, Washington, DC, USA, Oct. 25-29, 2009.
4.  Sheth, A.: Semantic Integration of Citizen Sensor Data and Multilevel Sensing: A comprehensive path towards event monitoring and situational awareness. February 17, 2009.
5.  Sheth, A.: Citizen Sensing, Social Signals, and Enriching Human Experience. IEEE Internet Computing, July/August 2009.
6.  Thomas, C., Mehra, P., Brooks, R., Sheth, A.P.: Growing fields of interest: using an expand and reduce strategy for domain model extraction. In: Web Intelligence (2008), pp. 496–502.
7.  Mendes, P.N., Passant, A., Kapanipathi, P., Sheth, A.P.: Linked Open Social Signals. IEEE/WIC/ACM International Conference on Web Intelligence (WI-10), Toronto, Canada, Aug. 31 to Sep. 3, 2010.
8.  Nagarajan, M., Purohit, H., Sheth, A.: A Qualitative Examination of Topical Tweet and Retweet Practices. 4th Int'l AAAI Conference on Weblogs and Social Media (ICWSM), 2010.

                               * All the trademarks belong to their respective owners
 

Thanks!

Questions?

Twitris - Web Information System 2011 Course
