SlideShare uma empresa Scribd logo
1 de 11
Baixar para ler offline
Topic Detection From Events
Abstract
We live in a world of information overload. Manually
annotating text with topics isn't an option anymore. In this
paper, we deal with tweets. Firstly we recognize the
topics/entities they speak about. Having done that, we
cluster them based on the recognized entities and verbs to
get hierarchy of clusters. The clusters are then labelled
based on the most frequent entities.
Tools Used
● Twitter NLP
● Wiki Semantic Distance
● Verb net
Approach
To get first level of clusters:
1. Tokenize the tweets.
2. Apply POS tagging.
3. Apply IOB tagging on each token using Feature Extraction.
4. Extract Entities by applying some rules on the tweet with IOB token.
5. For the identified entity, find the nearest wikipedia entity using string edit
distance.
6. Create an inverted index based on the identified entities.
Approach contd...
Then:
1. We use k-means clustering using jaccard similarity as the similarity metric
at each level.
2. We get the most frequent tags from each of the clusters and use them to
label the clusters.
Architecture
Results
Results contd...
Results contd...
Conclusion
Our methods successfully cluster tweets into a semantically related hierarchy.
We took a dataset that was constrained to a specific domain i.e. elections.
Future work may involve experimenting with different datasets. Wiki semantic
distance might be more useful in case of a more diverse dataset. Future work
can also focus on experimenting with different datasets to find out when wiki
semantic distance begins to significantly outperform jaccard similarity.
Thanks!
Team 15
Garima Ahuja
Harish Kolli
Ashwin
Venkatram

Mais conteúdo relacionado

Semelhante a Topic Detection From Tweets Using Entity Recognition & Clustering

Intro to Objective C
Intro to Objective CIntro to Objective C
Intro to Objective CAshiq Uz Zoha
 
Real-time Generation of Topic Maps from Speech Streams
Real-time Generation of Topic Maps from Speech StreamsReal-time Generation of Topic Maps from Speech Streams
Real-time Generation of Topic Maps from Speech Streamstmra
 
Frame-Script and Predicate logic.pptx
Frame-Script and Predicate logic.pptxFrame-Script and Predicate logic.pptx
Frame-Script and Predicate logic.pptxnilesh405711
 
Demystifying NLP Transformers: Understanding the Power and Architecture behin...
Demystifying NLP Transformers: Understanding the Power and Architecture behin...Demystifying NLP Transformers: Understanding the Power and Architecture behin...
Demystifying NLP Transformers: Understanding the Power and Architecture behin...NILESH VERMA
 
SubTopic Detection of Tweets Related to an Entity
SubTopic Detection of Tweets Related to an EntitySubTopic Detection of Tweets Related to an Entity
SubTopic Detection of Tweets Related to an EntityAnkita Kumari
 
Automating Tinder w/ Eigenfaces and StanfordNLP
Automating Tinder w/ Eigenfaces and StanfordNLPAutomating Tinder w/ Eigenfaces and StanfordNLP
Automating Tinder w/ Eigenfaces and StanfordNLPJustin Long
 
Data Acquisition for Sentiment Analysis
Data Acquisition for Sentiment AnalysisData Acquisition for Sentiment Analysis
Data Acquisition for Sentiment AnalysisAli BELCAID
 
Generating domain specific sentiment lexicons using the Web Directory
Generating domain specific sentiment lexicons using the Web Directory Generating domain specific sentiment lexicons using the Web Directory
Generating domain specific sentiment lexicons using the Web Directory acijjournal
 
Dataworkz odsc london 2018
Dataworkz odsc london 2018Dataworkz odsc london 2018
Dataworkz odsc london 2018Olaf de Leeuw
 
Flutter - Fluent-Utter
Flutter - Fluent-UtterFlutter - Fluent-Utter
Flutter - Fluent-UtterApurva Gupta
 
A Fuzzy Logic Intelligent Agent for Information Extraction
A Fuzzy Logic Intelligent Agent for Information ExtractionA Fuzzy Logic Intelligent Agent for Information Extraction
A Fuzzy Logic Intelligent Agent for Information ExtractionTarekMourad8
 
JAVA VIVA QUESTIONS_CODERS LODGE.pdf
JAVA VIVA QUESTIONS_CODERS LODGE.pdfJAVA VIVA QUESTIONS_CODERS LODGE.pdf
JAVA VIVA QUESTIONS_CODERS LODGE.pdfnofakeNews
 
The magic of machine translation 20 july 2017
The magic of machine translation 20 july 2017The magic of machine translation 20 july 2017
The magic of machine translation 20 july 2017SK Reddy
 

Semelhante a Topic Detection From Tweets Using Entity Recognition & Clustering (20)

Python made easy
Python made easy Python made easy
Python made easy
 
Intro to Objective C
Intro to Objective CIntro to Objective C
Intro to Objective C
 
Real-time Generation of Topic Maps from Speech Streams
Real-time Generation of Topic Maps from Speech StreamsReal-time Generation of Topic Maps from Speech Streams
Real-time Generation of Topic Maps from Speech Streams
 
Frame-Script and Predicate logic.pptx
Frame-Script and Predicate logic.pptxFrame-Script and Predicate logic.pptx
Frame-Script and Predicate logic.pptx
 
Demystifying NLP Transformers: Understanding the Power and Architecture behin...
Demystifying NLP Transformers: Understanding the Power and Architecture behin...Demystifying NLP Transformers: Understanding the Power and Architecture behin...
Demystifying NLP Transformers: Understanding the Power and Architecture behin...
 
SubTopic Detection of Tweets Related to an Entity
SubTopic Detection of Tweets Related to an EntitySubTopic Detection of Tweets Related to an Entity
SubTopic Detection of Tweets Related to an Entity
 
Automating Tinder w/ Eigenfaces and StanfordNLP
Automating Tinder w/ Eigenfaces and StanfordNLPAutomating Tinder w/ Eigenfaces and StanfordNLP
Automating Tinder w/ Eigenfaces and StanfordNLP
 
Memory models in c#
Memory models in c#Memory models in c#
Memory models in c#
 
Bitcoin Price Prediction
Bitcoin Price PredictionBitcoin Price Prediction
Bitcoin Price Prediction
 
Java unit 7
Java unit 7Java unit 7
Java unit 7
 
Python-Classes.pptx
Python-Classes.pptxPython-Classes.pptx
Python-Classes.pptx
 
Data Acquisition for Sentiment Analysis
Data Acquisition for Sentiment AnalysisData Acquisition for Sentiment Analysis
Data Acquisition for Sentiment Analysis
 
Generating domain specific sentiment lexicons using the Web Directory
Generating domain specific sentiment lexicons using the Web Directory Generating domain specific sentiment lexicons using the Web Directory
Generating domain specific sentiment lexicons using the Web Directory
 
Dataworkz odsc london 2018
Dataworkz odsc london 2018Dataworkz odsc london 2018
Dataworkz odsc london 2018
 
3. jvm
3. jvm3. jvm
3. jvm
 
Flutter - Fluent-Utter
Flutter - Fluent-UtterFlutter - Fluent-Utter
Flutter - Fluent-Utter
 
A Fuzzy Logic Intelligent Agent for Information Extraction
A Fuzzy Logic Intelligent Agent for Information ExtractionA Fuzzy Logic Intelligent Agent for Information Extraction
A Fuzzy Logic Intelligent Agent for Information Extraction
 
No more bad news!
No more bad news!No more bad news!
No more bad news!
 
JAVA VIVA QUESTIONS_CODERS LODGE.pdf
JAVA VIVA QUESTIONS_CODERS LODGE.pdfJAVA VIVA QUESTIONS_CODERS LODGE.pdf
JAVA VIVA QUESTIONS_CODERS LODGE.pdf
 
The magic of machine translation 20 july 2017
The magic of machine translation 20 july 2017The magic of machine translation 20 july 2017
The magic of machine translation 20 july 2017
 

Último

Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 

Último (20)

Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 

Topic Detection From Tweets Using Entity Recognition & Clustering

  • 2. Abstract We live in a world of information overload. Manually annotating text with topics isn't an option anymore. In this paper, we deal with tweets. Firstly we recognize the topics/entities they speak about. Having done that, we cluster them based on the recognized entities and verbs to get hierarchy of clusters. The clusters are then labelled based on the most frequent entities.
  • 3. Tools Used ● Twitter NLP ● Wiki Semantic Distance ● Verb net
  • 4. Approach To get first level of clusters: 1. Tokenize the tweets. 2. Apply POS tagging. 3. Apply IOB tagging on each token using Feature Extraction. 4. Extract Entities by applying some rules on the tweet with IOB token. 5. For the identified entity, find the nearest wikipedia entity using string edit distance. 6. Create an inverted index based on the identified entities.
  • 5. Approach contd... Then: 1. We use k-means clustering using jaccard similarity as the similarity metric at each level. 2. We get the most frequent tags from each of the clusters and use them to label the clusters.
  • 10. Conclusion Our methods successfully cluster tweets into a semantically related hierarchy. We took a dataset that was constrained to a specific domain i.e. elections. Future work may involve experimenting with different datasets. Wiki semantic distance might be more useful in case of a more diverse dataset. Future work can also focus on experimenting with different datasets to find out when wiki semantic distance begins to significantly outperform jaccard similarity.
  • 11. Thanks! Team 15 Garima Ahuja Harish Kolli Ashwin Venkatram