SlideShare uma empresa Scribd logo
1 de 13
Topic Modeling in Social
Media
Faculty Mentor: Dr. Vasudeva Varma
Student Mentor: Sandeep Panem
Kashyap Murthy Krishnakant Vishwakarma Sajal Sharma
Vinil Narang
Topic Modeling
A Topic Model is a statistical model for
discovering the abstract "topics" that
occur in a collection of documents.
LDA(Latent Dirichlet Allocation)
Topic Modeling represents documents as mixtures of topics that spit out words with certain
probabilities.
LDA is the most common topic model currently in use.
LDA is used in social platforms to come up with a model which can predict on which
topics related document a user prefers to read and write.
LDA …
LDA is a method which considers
documents beyond mere just bag of words.
It describes a generative process whereby,
given a Dirichlet conditioned bag filled with
topic-distributions for each document, we
draw a topic mixture from this bag. Then,
we repeatedly draw both a topic and then
a word from that topic to generate the
words in that document.
We have a generative model that
represents the process by which a corpora
of documents were created in terms of
Topics.
Approaches for topic modeling
Approach1: Finding LDA model for each user in the network.
Approach2: Finding top K influential users and applying the LDA model on these users only.
Approach3: Finding communities present in the network and approximating users topics by
applying LDA model of its corresponding community.
Comparing different approaches.
Approach1:
 Calculating LDA for each user is very costly since the data set is very large.
 So this approach is not feasible for practical purposes.
Comparing different approaches.
Approach 2:
 Since LDA can’t be applied to each user, therefore we have selected top k users on which we have
applied LDA.
 The top k users were found out using Page Rank algorithm.
 Then we extracted the tweets for these k users and applied LDA on them.
 For this project we followed approach 2.
Approach 2…
1. Step 1:
 Input : A file containing the follower-
followee relationship in the form of edge list.
It contains ~6.25 crore nodes(users) and
~146 crore edges(follower-followee
relationship)
 Output : A file containing the userid and
pageranks of all the nodes(users) in the
graph.
 Approach : Used GraphChi’s pagerank
algorithm to perform the above mentioned
task.
 
 2. Step 2:
 Input : A file containing the pageranks and
userids of all the nodes(users) in the graph.
 Output : A file containing the pageranks
and userids of all the users sorted on the
basis of pagerank.
 Approach : External merge sort is used to
perform this kind of sorting because the size
Approach 2…
3. Step 3:
 Input : A file containing the pageranks and userids of all the nodes sorted by pagerank.
 Output : A file containing the top 50 users(userids and pagerank)
 4. Step 4:
 Input : A file containing the top 50 users.
 Output : The tweets of these top 50 users.
 Approach : We used the tweet crawler to crawl the tweets of these users.
 5. Step 5:
 Input : 50 files containing tweets of top 50 users.
 Output : LDA models of these 50 users.
 Approach : Used Mallet to perform this task.
Approach 2…
Problems:
We encountered a case where all the influential elements could belong to a single
community.
Then LDA on the top k influential elements would not result in a generic model, hence this
technique is not robust and effective for this scenarios.
Hence we had to move on to the next approach of community detection before applying
the LDA.
Approach 3… and future works :
Since Approach 2 was not giving expected results, so we tried Community Detection
approach.
The GraphChi Community Detection code is still in early phase of development and is not
giving expected results.
There is another API named Infomap which tends to work well for small graphs but the code
burst for large graphs.
We are still working on approach 3 and aiming for better accuracy and results.
Thank you 

Mais conteúdo relacionado

Destaque

A hybrid approach for anaphora resolution in hindi
A hybrid approach for anaphora resolution in hindiA hybrid approach for anaphora resolution in hindi
A hybrid approach for anaphora resolution in hindikrishnakant vishwakarma
 
Intelligent Test
Intelligent TestIntelligent Test
Intelligent Testugo kelechi
 
Program pelatihan teknologi informasi untuk aparatur kelurahan
Program pelatihan teknologi informasi untuk aparatur kelurahanProgram pelatihan teknologi informasi untuk aparatur kelurahan
Program pelatihan teknologi informasi untuk aparatur kelurahanisak74
 
Pelatihan teknologi informasi
Pelatihan teknologi informasiPelatihan teknologi informasi
Pelatihan teknologi informasicucunursyamsu76
 
Museum transportasi - TMII
Museum transportasi - TMIIMuseum transportasi - TMII
Museum transportasi - TMIIsyalindrihs
 
Multimoda & inter. freight forwarding presentation
Multimoda & inter. freight forwarding presentationMultimoda & inter. freight forwarding presentation
Multimoda & inter. freight forwarding presentationsyalindrihs
 

Destaque (15)

Microsoft Cloud
Microsoft CloudMicrosoft Cloud
Microsoft Cloud
 
Ecosystems
EcosystemsEcosystems
Ecosystems
 
A hybrid approach for anaphora resolution in hindi
A hybrid approach for anaphora resolution in hindiA hybrid approach for anaphora resolution in hindi
A hybrid approach for anaphora resolution in hindi
 
Intelligent Test
Intelligent TestIntelligent Test
Intelligent Test
 
Ali Acar
Ali AcarAli Acar
Ali Acar
 
Ali Rıza Çalkan
Ali Rıza ÇalkanAli Rıza Çalkan
Ali Rıza Çalkan
 
Program pelatihan teknologi informasi untuk aparatur kelurahan
Program pelatihan teknologi informasi untuk aparatur kelurahanProgram pelatihan teknologi informasi untuk aparatur kelurahan
Program pelatihan teknologi informasi untuk aparatur kelurahan
 
Pelatihan teknologi informasi
Pelatihan teknologi informasiPelatihan teknologi informasi
Pelatihan teknologi informasi
 
Edibe Sözen
Edibe SözenEdibe Sözen
Edibe Sözen
 
Osman Develioğlu
Osman DevelioğluOsman Develioğlu
Osman Develioğlu
 
TAX
TAXTAX
TAX
 
Museum transportasi - TMII
Museum transportasi - TMIIMuseum transportasi - TMII
Museum transportasi - TMII
 
Multimoda & inter. freight forwarding presentation
Multimoda & inter. freight forwarding presentationMultimoda & inter. freight forwarding presentation
Multimoda & inter. freight forwarding presentation
 
ismail tunca
ismail tuncaismail tunca
ismail tunca
 
Tercera clase
Tercera claseTercera clase
Tercera clase
 

Semelhante a Topic Modeling in Social Media LDA

IOTA 2016 Social Recomender System Presentation.
IOTA 2016 Social Recomender System Presentation.IOTA 2016 Social Recomender System Presentation.
IOTA 2016 Social Recomender System Presentation.ASHISH JAGTAP
 
Semantic Annotation of Documents
Semantic Annotation of DocumentsSemantic Annotation of Documents
Semantic Annotation of Documentssubash chandra
 
onlinelibrarymanagementsystem-160511065906.pdf
onlinelibrarymanagementsystem-160511065906.pdfonlinelibrarymanagementsystem-160511065906.pdf
onlinelibrarymanagementsystem-160511065906.pdfrohanthombre2
 
Online library management system
Online library management systemOnline library management system
Online library management systemBharat Kunwar
 
Ontology-Based Data Access Mapping Generation using Data, Schema, Query, and ...
Ontology-Based Data Access Mapping Generation using Data, Schema, Query, and ...Ontology-Based Data Access Mapping Generation using Data, Schema, Query, and ...
Ontology-Based Data Access Mapping Generation using Data, Schema, Query, and ...Pieter Heyvaert
 
The "Big Data" Ecosystem at LinkedIn
The "Big Data" Ecosystem at LinkedInThe "Big Data" Ecosystem at LinkedIn
The "Big Data" Ecosystem at LinkedInSam Shah
 
The “Big Data” Ecosystem at LinkedIn
The “Big Data” Ecosystem at LinkedInThe “Big Data” Ecosystem at LinkedIn
The “Big Data” Ecosystem at LinkedInKun Le
 
Doctoral seminar (DBIS RWTH Aachen)
Doctoral seminar  (DBIS RWTH Aachen)Doctoral seminar  (DBIS RWTH Aachen)
Doctoral seminar (DBIS RWTH Aachen)Zina Petrushyna
 
EffectiveCrowdSourcingForProductFeatureIdeation v18
EffectiveCrowdSourcingForProductFeatureIdeation v18EffectiveCrowdSourcingForProductFeatureIdeation v18
EffectiveCrowdSourcingForProductFeatureIdeation v18Karthikeyan Rajasekharan
 
DevOps Support for an Ethical Software Development Life Cycle (SDLC)
DevOps Support for an Ethical Software Development Life Cycle (SDLC)DevOps Support for an Ethical Software Development Life Cycle (SDLC)
DevOps Support for an Ethical Software Development Life Cycle (SDLC)Mark Underwood
 
srd117.final.512Spring2016
srd117.final.512Spring2016srd117.final.512Spring2016
srd117.final.512Spring2016Saurabh Deochake
 
IEEE 2014 DOTNET CLOUD COMPUTING PROJECTS A scientometric analysis of cloud c...
IEEE 2014 DOTNET CLOUD COMPUTING PROJECTS A scientometric analysis of cloud c...IEEE 2014 DOTNET CLOUD COMPUTING PROJECTS A scientometric analysis of cloud c...
IEEE 2014 DOTNET CLOUD COMPUTING PROJECTS A scientometric analysis of cloud c...IEEEMEMTECHSTUDENTPROJECTS
 
Profile Analysis of Users in Data Analytics Domain
Profile Analysis of   Users in Data Analytics DomainProfile Analysis of   Users in Data Analytics Domain
Profile Analysis of Users in Data Analytics DomainDrjabez
 
Social Network Analysis based on MOOC's (Massive Open Online Classes)
Social Network Analysis based on MOOC's (Massive Open Online Classes)Social Network Analysis based on MOOC's (Massive Open Online Classes)
Social Network Analysis based on MOOC's (Massive Open Online Classes)ShankarPrasaadRajama
 
OOAD unit1 introduction to object orientation
 OOAD unit1 introduction to object orientation OOAD unit1 introduction to object orientation
OOAD unit1 introduction to object orientationDr Chetan Shelke
 
online news portal system
online news portal systemonline news portal system
online news portal systemArman Ahmed
 

Semelhante a Topic Modeling in Social Media LDA (20)

SocialLda
SocialLda SocialLda
SocialLda
 
IOTA 2016 Social Recomender System Presentation.
IOTA 2016 Social Recomender System Presentation.IOTA 2016 Social Recomender System Presentation.
IOTA 2016 Social Recomender System Presentation.
 
Semantic Annotation of Documents
Semantic Annotation of DocumentsSemantic Annotation of Documents
Semantic Annotation of Documents
 
onlinelibrarymanagementsystem-160511065906.pdf
onlinelibrarymanagementsystem-160511065906.pdfonlinelibrarymanagementsystem-160511065906.pdf
onlinelibrarymanagementsystem-160511065906.pdf
 
Online library management system
Online library management systemOnline library management system
Online library management system
 
Ontology-Based Data Access Mapping Generation using Data, Schema, Query, and ...
Ontology-Based Data Access Mapping Generation using Data, Schema, Query, and ...Ontology-Based Data Access Mapping Generation using Data, Schema, Query, and ...
Ontology-Based Data Access Mapping Generation using Data, Schema, Query, and ...
 
The "Big Data" Ecosystem at LinkedIn
The "Big Data" Ecosystem at LinkedInThe "Big Data" Ecosystem at LinkedIn
The "Big Data" Ecosystem at LinkedIn
 
The “Big Data” Ecosystem at LinkedIn
The “Big Data” Ecosystem at LinkedInThe “Big Data” Ecosystem at LinkedIn
The “Big Data” Ecosystem at LinkedIn
 
Doctoral seminar (DBIS RWTH Aachen)
Doctoral seminar  (DBIS RWTH Aachen)Doctoral seminar  (DBIS RWTH Aachen)
Doctoral seminar (DBIS RWTH Aachen)
 
EffectiveCrowdSourcingForProductFeatureIdeation v18
EffectiveCrowdSourcingForProductFeatureIdeation v18EffectiveCrowdSourcingForProductFeatureIdeation v18
EffectiveCrowdSourcingForProductFeatureIdeation v18
 
KDD Cup Research Paper
KDD Cup Research PaperKDD Cup Research Paper
KDD Cup Research Paper
 
DevOps Support for an Ethical Software Development Life Cycle (SDLC)
DevOps Support for an Ethical Software Development Life Cycle (SDLC)DevOps Support for an Ethical Software Development Life Cycle (SDLC)
DevOps Support for an Ethical Software Development Life Cycle (SDLC)
 
srd117.final.512Spring2016
srd117.final.512Spring2016srd117.final.512Spring2016
srd117.final.512Spring2016
 
IEEE 2014 DOTNET CLOUD COMPUTING PROJECTS A scientometric analysis of cloud c...
IEEE 2014 DOTNET CLOUD COMPUTING PROJECTS A scientometric analysis of cloud c...IEEE 2014 DOTNET CLOUD COMPUTING PROJECTS A scientometric analysis of cloud c...
IEEE 2014 DOTNET CLOUD COMPUTING PROJECTS A scientometric analysis of cloud c...
 
50120140505004
5012014050500450120140505004
50120140505004
 
Profile Analysis of Users in Data Analytics Domain
Profile Analysis of   Users in Data Analytics DomainProfile Analysis of   Users in Data Analytics Domain
Profile Analysis of Users in Data Analytics Domain
 
Computers
ComputersComputers
Computers
 
Social Network Analysis based on MOOC's (Massive Open Online Classes)
Social Network Analysis based on MOOC's (Massive Open Online Classes)Social Network Analysis based on MOOC's (Massive Open Online Classes)
Social Network Analysis based on MOOC's (Massive Open Online Classes)
 
OOAD unit1 introduction to object orientation
 OOAD unit1 introduction to object orientation OOAD unit1 introduction to object orientation
OOAD unit1 introduction to object orientation
 
online news portal system
online news portal systemonline news portal system
online news portal system
 

Último

Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parentsnavabharathschool99
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...Nguyen Thanh Tu Collection
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfphamnguyenenglishnb
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPCeline George
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...Postal Advocate Inc.
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Celine George
 
Q4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptxQ4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptxnelietumpap1
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfTechSoup
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxChelloAnnAsuncion2
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONHumphrey A Beña
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 

Último (20)

Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parents
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
Q4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptxQ4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptx
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptxYOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 

Topic Modeling in Social Media LDA

  • 1. Topic Modeling in Social Media Faculty Mentor: Dr. Vasudeva Varma Student Mentor: Sandeep Panem Kashyap Murthy Krishnakant Vishwakarma Sajal Sharma Vinil Narang
  • 2. Topic Modeling A Topic Model is a statistical model for discovering the abstract "topics" that occur in a collection of documents.
  • 3. LDA(Latent Dirichlet Allocation) Topic Modeling represents documents as mixtures of topics that spit out words with certain probabilities. LDA is the most common topic model currently in use. LDA is used in social platforms to come up with a model which can predict on which topics related document a user prefers to read and write.
  • 4. LDA … LDA is a method which considers documents beyond mere just bag of words. It describes a generative process whereby, given a Dirichlet conditioned bag filled with topic-distributions for each document, we draw a topic mixture from this bag. Then, we repeatedly draw both a topic and then a word from that topic to generate the words in that document. We have a generative model that represents the process by which a corpora of documents were created in terms of Topics.
  • 5. Approaches for topic modeling Approach1: Finding LDA model for each user in the network. Approach2: Finding top K influential users and applying the LDA model on these users only. Approach3: Finding communities present in the network and approximating users topics by applying LDA model of its corresponding community.
  • 6. Comparing different approaches. Approach1:  Calculating LDA for each user is very costly since the data set is very large.  So this approach is not feasible for practical purposes.
  • 7. Comparing different approaches. Approach 2:  Since LDA can’t be applied to each user, therefore we have selected top k users on which we have applied LDA.  The top k users were found out using Page Rank algorithm.  Then we extracted the tweets for these k users and applied LDA on them.  For this project we followed approach 2.
  • 8. Approach 2… 1. Step 1:  Input : A file containing the follower- followee relationship in the form of edge list. It contains ~6.25 crore nodes(users) and ~146 crore edges(follower-followee relationship)  Output : A file containing the userid and pageranks of all the nodes(users) in the graph.  Approach : Used GraphChi’s pagerank algorithm to perform the above mentioned task.    2. Step 2:  Input : A file containing the pageranks and userids of all the nodes(users) in the graph.  Output : A file containing the pageranks and userids of all the users sorted on the basis of pagerank.  Approach : External merge sort is used to perform this kind of sorting because the size
  • 9. Approach 2… 3. Step 3:  Input : A file containing the pageranks and userids of all the nodes sorted by pagerank.  Output : A file containing the top 50 users(userids and pagerank)  4. Step 4:  Input : A file containing the top 50 users.  Output : The tweets of these top 50 users.  Approach : We used the tweet crawler to crawl the tweets of these users.  5. Step 5:  Input : 50 files containing tweets of top 50 users.  Output : LDA models of these 50 users.  Approach : Used Mallet to perform this task.
  • 11. Problems: We encountered a case where all the influential elements could belong to a single community. Then LDA on the top k influential elements would not result in a generic model, hence this technique is not robust and effective for this scenarios. Hence we had to move on to the next approach of community detection before applying the LDA.
  • 12. Approach 3… and future works : Since Approach 2 was not giving expected results, so we tried Community Detection approach. The GraphChi Community Detection code is still in early phase of development and is not giving expected results. There is another API named Infomap which tends to work well for small graphs but the code burst for large graphs. We are still working on approach 3 and aiming for better accuracy and results.