SlideShare uma empresa Scribd logo
1 de 1
Baixar para ler offline
Interpretable Topic Modeling Using Near-Identity
Cross-Document Coreference Resolution
Anastasia Zhukova1, Felix Hamborg2, Bela Gipp1
Motivation
• Topic modelling is a widely-known technique for
document representation, summarization, and
classification.
• Approaches such as LDA, NMF, pLSA are state-of-the-
art techniques to reduce text dimensionality and
summarize documents.
• Often, topic modelling approaches produce
statistically valid yet hard to interpret topics [1,2,6,7,8].
• On the contrary, cross-document coreference
resolution (CDCR) aims at identification of clearly
defined events and entities [3, 4].
• SoTA CDCR methods identify narrow events and
entities well but suffer from lack of generalization
and summarization across identified concepts [4].
Contact
Web: https://dke.uni-wuppertal.de/en/
Email: zhukova@uni-wuppertal.de, felix.hamborg@uni.kn
1 University of Wuppertal, 2 University of Konstanz
Research Question
How to extract more interpretable and
representative topics with CDCR-based methods
from text documents?
NI-CDCR Topic Modelling
Target Concept Analysis (TCA) [4] is a CDCR approach that
resolves concept mentions with identity and near-
identity relations and uses concepts as topics.
260
References
[1] Loulwah Alsumait and Daniel Barbar. 2009. Topic Significance Ranking of LDA Generative Models. In Machine
Learning and Knowledge Discovery in Databases, European Conference, ECML PKDD 2009, Bled, Slovenia,
September 7-11, 2009, Proceedings, Part I, 67–82. [2] Jonathan Chang et al. 2009. Reading Tea Leaves: How
Humans Interpret Topic Models. Journal of Chemical Information and Modeling 53, (2009), 1689–1699. [3] Felix
Hamborg et al. 2019. Automated Identification of Media Bias by Word Choice and Labeling in News Articles.
Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL) (2019), 196–205. [4] Felix Hamborg et al.
2019. Illegal Aliens or Undocumented Immigrants ? Towards the Automated Identification of Bias by Word Choice
and Labeling. in Proceedings of the iConference 2019 (2019). [5] Han Kyul Kim et al. 2017. Bag-of-concepts:
Comprehending document representation through clustering words in distributed representation.
Neurocomputing 266, (2017), 336–352. [6] Jey Han Lau et al. 2014. Machine Reading Tea Leaves: Automatically
Evaluating Topic Coherence and Topic Model Quality. Proceedings of the 14th Conference of the European Chapter
of the Association for Computational Linguistics (2014), 530–539. [7] David Newman et al. 2010. Evaluating topic
models for digital libraries. January (2010), 215. [8] Hanna M Wallach and David Mimno. 2009. Evaluation Methods
for Topic Models. In Proceedings of the 26th annual international conference on machine learning, 1105–1112.
Future Work
• Evaluate the proposed approach using likelihood and
perplexity.
• Evaluate using more diverse real-world text datasets
• Improve the quality of CDCR-based topics.
Qualitative Evaluation
• Preliminary evaluation on NewsWCL50 [4]
o a news articles dataset for concept identification with large
lexical diversity.
• TCA extracts topics that distinctively describe
meaningful parts, e.g., entities: actors, actions,
locations.
Method Most representative terms of one topic
LSA Mr. Trump, say, Iran, nuclear, Comey, deal
LDA Comey, Trump, memo, write, president, Thursday
NMF visit, London, Trump, Khan, protest, July, state
BoC FBI, clandestine, Watergate, CIA, Comey, intel
TCA Russia investigation, Mr. Mueller's investigation, an
investigation into Russia's election meddling, a probe
into Russian interference
Quantitative Evaluation
• TCA’s coherence Cv = 0.52 < 0.68 achieved by NMF
• ↓ coherence, but ↑ #topics and ↑ interpretability
Method Unit Cv # topics Topic type
LSA L, B 0.47 2 Abstract
LDA L, B 0.62 10 Summary of events
NMF L, B 0.68 10 Summary of events
BoC L, B 0.52 534 Concepts
TCA L, P 0.52 97 Concepts
DACA recipients
undocumented
immigrants
illegally
brought minors
US military action
military
presence
military
threat
Document representation:
topic
topic’s
words
Three types of properties analysed in merging sieves:
Sieves 1-2 Core meaning of phrases
L – lemmas B – bigrams (freq. > 5) P – phrases (NPs, VPs)
Sieves 3-4
similar properties of two phrases?
→ merge as referring to one concept
TCA’s resolution principle:
Normalized frequencies of topic’s words in each doc
→ weighted distribution of topics
Donald Trump ↔ President
Core meaning modifiers
undocumented + immigrants ↔ DACA + recipients
Sieves 5-6 Frequent word patterns
White House officials ↔ White House representatives

Mais conteúdo relacionado

Semelhante a Interpretable Topic Modeling Using Near-Identity Cross-Document Coreference Resolution

Creating metadata for data visualization
Creating metadata for data visualizationCreating metadata for data visualization
Creating metadata for data visualizationgesinaphillips
 
A Review Of Text Mining Techniques And Applications
A Review Of Text Mining Techniques And ApplicationsA Review Of Text Mining Techniques And Applications
A Review Of Text Mining Techniques And ApplicationsLisa Graves
 
Workshop unpad2014 with ref
Workshop unpad2014 with refWorkshop unpad2014 with ref
Workshop unpad2014 with refLola Devung
 
LCC CTS 2 Option.docx
LCC CTS 2 Option.docxLCC CTS 2 Option.docx
LCC CTS 2 Option.docxwrite4
 
Argument Structures Of Political Debates
Argument Structures Of Political DebatesArgument Structures Of Political Debates
Argument Structures Of Political DebatesKate Campbell
 
Introduction To Critical Enquiry Research
Introduction To Critical Enquiry ResearchIntroduction To Critical Enquiry Research
Introduction To Critical Enquiry ResearchTerry Flew
 
IRJET-A Review on Topic Detection and Term-Term Relation Analysis in Big Data
IRJET-A Review on Topic Detection and Term-Term Relation Analysis in Big DataIRJET-A Review on Topic Detection and Term-Term Relation Analysis in Big Data
IRJET-A Review on Topic Detection and Term-Term Relation Analysis in Big DataIRJET Journal
 
Probabilistic Topic Models
Probabilistic Topic ModelsProbabilistic Topic Models
Probabilistic Topic ModelsSteve Follmer
 
Cultural text mining workshop
Cultural text mining workshopCultural text mining workshop
Cultural text mining workshopPim Huijnen
 
Dr.saleem gul assignment summary
Dr.saleem gul assignment summaryDr.saleem gul assignment summary
Dr.saleem gul assignment summaryJaved Riza
 
Content Analysis Overview for Persona Development
Content Analysis Overview for Persona DevelopmentContent Analysis Overview for Persona Development
Content Analysis Overview for Persona DevelopmentPamela Rutledge
 
Content analysis
Content analysisContent analysis
Content analysisAtul Thakur
 
Content analysis
Content analysisContent analysis
Content analysisAtul Thakur
 
Modeling Causal Reasoning in Complex Networks through NLP: an Introduction
Modeling Causal Reasoning in Complex Networks through NLP: an IntroductionModeling Causal Reasoning in Complex Networks through NLP: an Introduction
Modeling Causal Reasoning in Complex Networks through NLP: an IntroductionLuca Nannini
 
Introduction and Tools of Research
Introduction and Tools of ResearchIntroduction and Tools of Research
Introduction and Tools of ResearchMyke Evans
 
Detecting subtle text manipulations
Detecting subtle text manipulationsDetecting subtle text manipulations
Detecting subtle text manipulationsMartino Mensio
 

Semelhante a Interpretable Topic Modeling Using Near-Identity Cross-Document Coreference Resolution (20)

Creating metadata for data visualization
Creating metadata for data visualizationCreating metadata for data visualization
Creating metadata for data visualization
 
A Review Of Text Mining Techniques And Applications
A Review Of Text Mining Techniques And ApplicationsA Review Of Text Mining Techniques And Applications
A Review Of Text Mining Techniques And Applications
 
Introduction to Nvivo
Introduction to NvivoIntroduction to Nvivo
Introduction to Nvivo
 
Coding.ppt
Coding.pptCoding.ppt
Coding.ppt
 
Overview of the dissertation process. Tools, techniques by phase.
Overview of the dissertation process. Tools, techniques by phase.Overview of the dissertation process. Tools, techniques by phase.
Overview of the dissertation process. Tools, techniques by phase.
 
Workshop unpad2014 with ref
Workshop unpad2014 with refWorkshop unpad2014 with ref
Workshop unpad2014 with ref
 
LCC CTS 2 Option.docx
LCC CTS 2 Option.docxLCC CTS 2 Option.docx
LCC CTS 2 Option.docx
 
Guestion paper
Guestion paperGuestion paper
Guestion paper
 
Argument Structures Of Political Debates
Argument Structures Of Political DebatesArgument Structures Of Political Debates
Argument Structures Of Political Debates
 
Introduction To Critical Enquiry Research
Introduction To Critical Enquiry ResearchIntroduction To Critical Enquiry Research
Introduction To Critical Enquiry Research
 
IRJET-A Review on Topic Detection and Term-Term Relation Analysis in Big Data
IRJET-A Review on Topic Detection and Term-Term Relation Analysis in Big DataIRJET-A Review on Topic Detection and Term-Term Relation Analysis in Big Data
IRJET-A Review on Topic Detection and Term-Term Relation Analysis in Big Data
 
Probabilistic Topic Models
Probabilistic Topic ModelsProbabilistic Topic Models
Probabilistic Topic Models
 
Cultural text mining workshop
Cultural text mining workshopCultural text mining workshop
Cultural text mining workshop
 
Dr.saleem gul assignment summary
Dr.saleem gul assignment summaryDr.saleem gul assignment summary
Dr.saleem gul assignment summary
 
Content Analysis Overview for Persona Development
Content Analysis Overview for Persona DevelopmentContent Analysis Overview for Persona Development
Content Analysis Overview for Persona Development
 
Content analysis
Content analysisContent analysis
Content analysis
 
Content analysis
Content analysisContent analysis
Content analysis
 
Modeling Causal Reasoning in Complex Networks through NLP: an Introduction
Modeling Causal Reasoning in Complex Networks through NLP: an IntroductionModeling Causal Reasoning in Complex Networks through NLP: an Introduction
Modeling Causal Reasoning in Complex Networks through NLP: an Introduction
 
Introduction and Tools of Research
Introduction and Tools of ResearchIntroduction and Tools of Research
Introduction and Tools of Research
 
Detecting subtle text manipulations
Detecting subtle text manipulationsDetecting subtle text manipulations
Detecting subtle text manipulations
 

Mais de Anastasia Zhukova

What's in the News? Towards Identification of Bias by Commission, Omission, a...
What's in the News? Towards Identification of Bias by Commission, Omission, a...What's in the News? Towards Identification of Bias by Commission, Omission, a...
What's in the News? Towards Identification of Bias by Commission, Omission, a...Anastasia Zhukova
 
Seminar Paper: Putting News in a Perspective: Framing by Word Choice and Labe...
Seminar Paper: Putting News in a Perspective: Framing by Word Choice and Labe...Seminar Paper: Putting News in a Perspective: Framing by Word Choice and Labe...
Seminar Paper: Putting News in a Perspective: Framing by Word Choice and Labe...Anastasia Zhukova
 
M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...
M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...
M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...Anastasia Zhukova
 
Talk: Automated Identification of Media Bias by Word Choice and Labeling in N...
Talk: Automated Identification of Media Bias by Word Choice and Labeling in N...Talk: Automated Identification of Media Bias by Word Choice and Labeling in N...
Talk: Automated Identification of Media Bias by Word Choice and Labeling in N...Anastasia Zhukova
 
Automated Identification of Framing by Word Choice and Labeling to Reveal Med...
Automated Identification of Framing by Word Choice and Labeling to Reveal Med...Automated Identification of Framing by Word Choice and Labeling to Reveal Med...
Automated Identification of Framing by Word Choice and Labeling to Reveal Med...Anastasia Zhukova
 
Putting News in a Perspective: Framing by Word Choice and Labeling
Putting News in a Perspective: Framing by Word Choice and LabelingPutting News in a Perspective: Framing by Word Choice and Labeling
Putting News in a Perspective: Framing by Word Choice and LabelingAnastasia Zhukova
 
Interpretable and Comparative Textual Dataset Exploration Using Near-Identity...
Interpretable and Comparative Textual Dataset Exploration Using Near-Identity...Interpretable and Comparative Textual Dataset Exploration Using Near-Identity...
Interpretable and Comparative Textual Dataset Exploration Using Near-Identity...Anastasia Zhukova
 
Towards Evaluation of Cross-document Coreference Resolution Models Using Data...
Towards Evaluation of Cross-document Coreference Resolution Models Using Data...Towards Evaluation of Cross-document Coreference Resolution Models Using Data...
Towards Evaluation of Cross-document Coreference Resolution Models Using Data...Anastasia Zhukova
 
Concept Identification of Directly and Indirectly Related Mentions Referring ...
Concept Identification of Directly and Indirectly Related Mentions Referring ...Concept Identification of Directly and Indirectly Related Mentions Referring ...
Concept Identification of Directly and Indirectly Related Mentions Referring ...Anastasia Zhukova
 
ANEA: Automated (Named) Entity Annotation for German Domain-Specific Texts
ANEA: Automated (Named) Entity Annotation for German Domain-Specific TextsANEA: Automated (Named) Entity Annotation for German Domain-Specific Texts
ANEA: Automated (Named) Entity Annotation for German Domain-Specific TextsAnastasia Zhukova
 

Mais de Anastasia Zhukova (10)

What's in the News? Towards Identification of Bias by Commission, Omission, a...
What's in the News? Towards Identification of Bias by Commission, Omission, a...What's in the News? Towards Identification of Bias by Commission, Omission, a...
What's in the News? Towards Identification of Bias by Commission, Omission, a...
 
Seminar Paper: Putting News in a Perspective: Framing by Word Choice and Labe...
Seminar Paper: Putting News in a Perspective: Framing by Word Choice and Labe...Seminar Paper: Putting News in a Perspective: Framing by Word Choice and Labe...
Seminar Paper: Putting News in a Perspective: Framing by Word Choice and Labe...
 
M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...
M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...
M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...
 
Talk: Automated Identification of Media Bias by Word Choice and Labeling in N...
Talk: Automated Identification of Media Bias by Word Choice and Labeling in N...Talk: Automated Identification of Media Bias by Word Choice and Labeling in N...
Talk: Automated Identification of Media Bias by Word Choice and Labeling in N...
 
Automated Identification of Framing by Word Choice and Labeling to Reveal Med...
Automated Identification of Framing by Word Choice and Labeling to Reveal Med...Automated Identification of Framing by Word Choice and Labeling to Reveal Med...
Automated Identification of Framing by Word Choice and Labeling to Reveal Med...
 
Putting News in a Perspective: Framing by Word Choice and Labeling
Putting News in a Perspective: Framing by Word Choice and LabelingPutting News in a Perspective: Framing by Word Choice and Labeling
Putting News in a Perspective: Framing by Word Choice and Labeling
 
Interpretable and Comparative Textual Dataset Exploration Using Near-Identity...
Interpretable and Comparative Textual Dataset Exploration Using Near-Identity...Interpretable and Comparative Textual Dataset Exploration Using Near-Identity...
Interpretable and Comparative Textual Dataset Exploration Using Near-Identity...
 
Towards Evaluation of Cross-document Coreference Resolution Models Using Data...
Towards Evaluation of Cross-document Coreference Resolution Models Using Data...Towards Evaluation of Cross-document Coreference Resolution Models Using Data...
Towards Evaluation of Cross-document Coreference Resolution Models Using Data...
 
Concept Identification of Directly and Indirectly Related Mentions Referring ...
Concept Identification of Directly and Indirectly Related Mentions Referring ...Concept Identification of Directly and Indirectly Related Mentions Referring ...
Concept Identification of Directly and Indirectly Related Mentions Referring ...
 
ANEA: Automated (Named) Entity Annotation for German Domain-Specific Texts
ANEA: Automated (Named) Entity Annotation for German Domain-Specific TextsANEA: Automated (Named) Entity Annotation for German Domain-Specific Texts
ANEA: Automated (Named) Entity Annotation for German Domain-Specific Texts
 

Último

Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPirithiRaju
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisDiwakar Mishra
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000Sapana Sha
 
fundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyfundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyDrAnita Sharma
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...ssifa0344
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.Nitya salvi
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPirithiRaju
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsAArockiyaNisha
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfSumit Kumar yadav
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 

Último (20)

Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
fundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyfundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomology
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 

Interpretable Topic Modeling Using Near-Identity Cross-Document Coreference Resolution

  • 1. Interpretable Topic Modeling Using Near-Identity Cross-Document Coreference Resolution Anastasia Zhukova1, Felix Hamborg2, Bela Gipp1 Motivation • Topic modelling is a widely-known technique for document representation, summarization, and classification. • Approaches such as LDA, NMF, pLSA are state-of-the- art techniques to reduce text dimensionality and summarize documents. • Often, topic modelling approaches produce statistically valid yet hard to interpret topics [1,2,6,7,8]. • On the contrary, cross-document coreference resolution (CDCR) aims at identification of clearly defined events and entities [3, 4]. • SoTA CDCR methods identify narrow events and entities well but suffer from lack of generalization and summarization across identified concepts [4]. Contact Web: https://dke.uni-wuppertal.de/en/ Email: zhukova@uni-wuppertal.de, felix.hamborg@uni.kn 1 University of Wuppertal, 2 University of Konstanz Research Question How to extract more interpretable and representative topics with CDCR-based methods from text documents? NI-CDCR Topic Modelling Target Concept Analysis (TCA) [4] is a CDCR approach that resolves concept mentions with identity and near- identity relations and uses concepts as topics. 260 References [1] Loulwah Alsumait and Daniel Barbar. 2009. Topic Significance Ranking of LDA Generative Models. In Machine Learning and Knowledge Discovery in Databases, European Conference, ECML PKDD 2009, Bled, Slovenia, September 7-11, 2009, Proceedings, Part I, 67–82. [2] Jonathan Chang et al. 2009. Reading Tea Leaves: How Humans Interpret Topic Models. Journal of Chemical Information and Modeling 53, (2009), 1689–1699. [3] Felix Hamborg et al. 2019. Automated Identification of Media Bias by Word Choice and Labeling in News Articles. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL) (2019), 196–205. [4] Felix Hamborg et al. 2019. Illegal Aliens or Undocumented Immigrants ? Towards the Automated Identification of Bias by Word Choice and Labeling. in Proceedings of the iConference 2019 (2019). [5] Han Kyul Kim et al. 2017. Bag-of-concepts: Comprehending document representation through clustering words in distributed representation. Neurocomputing 266, (2017), 336–352. [6] Jey Han Lau et al. 2014. Machine Reading Tea Leaves: Automatically Evaluating Topic Coherence and Topic Model Quality. Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (2014), 530–539. [7] David Newman et al. 2010. Evaluating topic models for digital libraries. January (2010), 215. [8] Hanna M Wallach and David Mimno. 2009. Evaluation Methods for Topic Models. In Proceedings of the 26th annual international conference on machine learning, 1105–1112. Future Work • Evaluate the proposed approach using likelihood and perplexity. • Evaluate using more diverse real-world text datasets • Improve the quality of CDCR-based topics. Qualitative Evaluation • Preliminary evaluation on NewsWCL50 [4] o a news articles dataset for concept identification with large lexical diversity. • TCA extracts topics that distinctively describe meaningful parts, e.g., entities: actors, actions, locations. Method Most representative terms of one topic LSA Mr. Trump, say, Iran, nuclear, Comey, deal LDA Comey, Trump, memo, write, president, Thursday NMF visit, London, Trump, Khan, protest, July, state BoC FBI, clandestine, Watergate, CIA, Comey, intel TCA Russia investigation, Mr. Mueller's investigation, an investigation into Russia's election meddling, a probe into Russian interference Quantitative Evaluation • TCA’s coherence Cv = 0.52 < 0.68 achieved by NMF • ↓ coherence, but ↑ #topics and ↑ interpretability Method Unit Cv # topics Topic type LSA L, B 0.47 2 Abstract LDA L, B 0.62 10 Summary of events NMF L, B 0.68 10 Summary of events BoC L, B 0.52 534 Concepts TCA L, P 0.52 97 Concepts DACA recipients undocumented immigrants illegally brought minors US military action military presence military threat Document representation: topic topic’s words Three types of properties analysed in merging sieves: Sieves 1-2 Core meaning of phrases L – lemmas B – bigrams (freq. > 5) P – phrases (NPs, VPs) Sieves 3-4 similar properties of two phrases? → merge as referring to one concept TCA’s resolution principle: Normalized frequencies of topic’s words in each doc → weighted distribution of topics Donald Trump ↔ President Core meaning modifiers undocumented + immigrants ↔ DACA + recipients Sieves 5-6 Frequent word patterns White House officials ↔ White House representatives