Project: Sub-event Detection on Social Media
Codebase:
https://github.com/pallavshah/TwitterSubeventDetector
Pallav Shah, Akshay Joshi, Rajat Bhardwaj, Ravneet Singh Kathuria
The Project
• Build a timeline (summary) of an event from a corpus of tweets commenting on it.
• The corpus consists of tweets from a specific domain that discuss a single major event.
• The objective of the project is to extract the sub-events within that event.
• Each summary is a short description of a sub-event.
Our Approach
We followed a two-step approach:
• Sub-event Detection: The first step is to identify whether and when a sub-event has occurred and, if it has, which tweets make up the sub-event.
• Tweet Selection: The second step is to choose a representative tweet that describes the sub-event appropriately.
Together, these two steps yield a set of tweets that summarizes the event.
Part 1: Detecting the sub-event
Sub-events are detected by measuring the distance between different tweets about the same event.
• Dictionary of words: The parsed data is used to create a dictionary that stores each relevant word and its count in the corpus.
• Vector for each tweet: Using the generated dictionary, a second pass over the parsed data produces one sparse vector per tweet. This vector contains the id and count of each word present in the tweet.
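The two passes described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the project's actual code: the stop-word list, the tokenizer, and the function names are assumptions, since the slides do not say how "relevant" words are chosen.

```python
from collections import Counter

# Hypothetical stop-word list; the slides do not define which words count as relevant.
STOP_WORDS = {"the", "a", "is", "to", "of", "and", "in"}

def tokenize(tweet):
    """Lowercase, split on whitespace, strip surrounding punctuation, drop stop words."""
    tokens = (tok.strip(".,!?:;\"'()") for tok in tweet.lower().split())
    return [tok for tok in tokens if tok and tok not in STOP_WORDS]

def build_dictionary(tweets):
    """First pass: give every word an integer id and count it over the whole corpus."""
    word_ids, corpus_counts = {}, Counter()
    for tweet in tweets:
        for tok in tokenize(tweet):
            word_ids.setdefault(tok, len(word_ids))
            corpus_counts[tok] += 1
    return word_ids, corpus_counts

def to_sparse_vector(tweet, word_ids):
    """Second pass: map one tweet to a sorted list of (word id, count) pairs."""
    counts = Counter(word_ids[tok] for tok in tokenize(tweet) if tok in word_ids)
    return sorted(counts.items())

tweets = ["Obama wins Ohio", "Romney concedes, Obama wins"]
word_ids, corpus_counts = build_dictionary(tweets)
vectors = [to_sparse_vector(t, word_ids) for t in tweets]
```

Only the words that actually occur in a tweet are stored, which keeps the vectors small even when the dictionary is large.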
Part 1: Detecting the sub-event (continued)
• The sub-event detector module:
The module uses the LSHash library for Python to compute the similarity distance between tweets. Each tweet is compared with the existing groups of similar tweets.
If the tweet's similarity to any group exceeds a high threshold, the tweet is assumed to belong to that group and is added to it.
Otherwise, a new group is created with that tweet as its representative tweet. In the end, all the tweets are thus partitioned into groups (or clusters) representing different sub-events.
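The grouping rule can be illustrated with a small sketch. Note the substitution: the project looks up similar tweets with LSHash, whereas this example compares each tweet directly with every group's representative using exact cosine similarity, so that it runs with the standard library alone. The threshold value of 0.5 and all names are assumptions; the slides only say that the threshold is high.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {word id: weight} dicts."""
    dot = sum(w * v.get(i, 0.0) for i, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def group_tweets(vectors, threshold=0.5):
    """Add each tweet to the most similar existing group, or start a new group."""
    groups = []  # each group: {"rep": representative vector, "members": [tweet indices]}
    for idx, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for group in groups:
            sim = cosine(vec, group["rep"])
            if sim >= best_sim:
                best, best_sim = group, sim
        if best is None:
            # No group is similar enough: this tweet becomes a new representative.
            groups.append({"rep": vec, "members": [idx]})
        else:
            best["members"].append(idx)
    return groups

vectors = [
    {0: 1, 1: 1, 2: 1},  # e.g. "obama wins ohio"
    {0: 1, 1: 1, 3: 1},  # e.g. "obama wins florida"
    {4: 1, 5: 1},        # e.g. "romney concedes"
]
groups = group_tweets(vectors)
```

In this toy input the first two tweets share two of three words (cosine similarity 2/3) and fall into one group, while the third shares none and starts its own. Comparing against every representative is linear in the number of groups; a locality-sensitive hash index avoids that scan by only retrieving candidates that hash to nearby buckets.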
Part 2: Summarization of Sub-event
• Term Frequency Inverse Document Frequency (TF-IDF): A statistical weighting technique that assigns each term within a document a weight that reflects the term's salience within the document. The TF-IDF value is composed of two primary parts.
The term frequency component (TF) assigns more weight to words that occur frequently within a document, because important words are often repeated.
The inverse document frequency component (IDF) compensates for the fact that some words, such as common stop words, are frequent across all documents and therefore carry little information.
• Normalization of tweets: The tweets are normalized to prevent bias towards longer tweets.
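The weighting and the length normalization can be illustrated as below. This is a sketch under stated assumptions rather than the project's scoring code (the project lists Gensim among its libraries): it uses the common form idf(t) = log(N / df(t)), treats each tweet as a document, and normalizes by dividing a tweet's summed TF-IDF weight by its token count. The function names are hypothetical.

```python
import math
from collections import Counter

def idf_table(docs):
    """IDF per term: log(number of documents / number of documents containing the term)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}

def tweet_score(doc, idf):
    """Sum of TF-IDF weights divided by tweet length, so long tweets gain no advantage."""
    if not doc:
        return 0.0
    tf = Counter(doc)
    return sum(count * idf.get(term, 0.0) for term, count in tf.items()) / len(doc)

# Three already-tokenized tweets; "the" appears in all of them.
docs = [
    ["the", "obama", "wins"],
    ["the", "romney", "concedes"],
    ["the", "obama", "speech"],
]
idf = idf_table(docs)
scores = [tweet_score(doc, idf) for doc in docs]
```

Here "the" occurs in every tweet, so its IDF is zero and it contributes nothing to any score, while a word that appears in only one tweet receives the largest weight. A representative tweet for a sub-event could then be chosen by ranking the tweets in its group by such a score; the slides do not state the exact selection rule.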
System Block Diagram
Technologies Used
We used the following Python libraries:
• LSHash: https://pypi.python.org/pypi/lshash/0.0.3dev
• Gensim: http://radimrehurek.com/gensim/
Dataset
We used the SNOW dataset, which contains tweets about the 2012 US General Elections.
Experiments and Results
• Tested on the 2012 US General Elections tweet dataset from SNOW 2014.
• The results showed around 60% accuracy when compared with a manual evaluation of the tweet data.

