
Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News Articles

The term media bias denotes differences in the news coverage of the same event. Slanted news coverage occurs when journalists frame information favorably, i.e., they report on the same concept with different word choice, which distorts readers’ perception of the information. A word choice and labeling (WCL) analysis system was implemented to reveal biased language in news articles. Within the field of Artificial Intelligence (AI), the WCL analysis system imitates the well-established content and framing analysis methodologies employed in the social sciences. The central contribution of the thesis is the development and implementation of the multi-step merging approach (MSMA). Unlike state-of-the-art natural language processing (NLP) techniques such as coreference resolution, MSMA identifies coreferential phrases in a broader sense, e.g., “undocumented immigrants” and “illegal aliens.” Evaluated on the extended NewsWCL50 dataset, the approach achieves F1 = 0.84, twice as high as the best-performing baseline. Finally, to enable visual exploration of the identified entities, a usability prototype with four visualizations was proposed and implemented. It supports exploring the entity composition of the analyzed news articles and the phrasing diversity of the identified entities.

  1. Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News Articles
      Anastasia Zhukova
      Doctoral supervisor: Felix Hamborg
      1st examiner: Prof. Dr. Bela Gipp
      2nd examiner: Prof. Dr. Karsten Donnay
      Date: 2019-03-07
  2. Agenda
      1. Introduction
      2. Project motivation and research objectives
      3. Related work and research gap
      4. Word choice and labeling (WCL) analysis system
      5. Usability prototype
      6. Multi-step merging approach
      7. Evaluation results
      8. Future work
      9. Conclusion
      07-Feb-23 Anastasia Zhukova - Automated Identification of Framing by Word Choice and Labeling 2
  3. Introduction
      • How the Russian president is perceived depends on how he is framed.
      Image source: https://tgram.ru/channels/otsuka_bld
  4. Introduction: Word Choice (WC) and Labeling (L)
      • “invasion forces” vs. “coalition forces”
      • “heart-wrenching tales of hardship” vs. “information on the lifestyles”
      Source: http://umich.edu/~newsbias/wordchoice.html
  5. Project motivation
      WCL depends on… [1-5]
      • actor or perspective selection
      • author position
      • goal of the message
      When not identified, WCL influences… [2, 4-6]
      • emotion evaluation
      • the decision-making process
      • the propagation of false information
      Existing solutions… (cf. [15-17])
      • involve manual annotation by social scientists
      • automated approaches yield simplistic results
      • results are neither scalable nor interactive
      *equal with some degree of approximation
      Image source: http://www.anmbadiary.com/2015/04/framing-effect-and-marketing.html
  6. Project research objectives
      RQ: How can we automatically identify instances of bias by WCL that refer to semantic concepts in a set of English news articles reporting on the same event, using natural language processing (NLP)?
      Research tasks:
      1. Design and develop a modular WCL analysis system.
      2. Develop a first usability prototype with interactive visualization to explore the results of the WCL analysis.
      3. Research, propose, and implement an approach based on NLP methods to identify semantic concepts that can be targets of bias by WCL.
      4. Evaluate the proposed semantic concept identification approach.
  7. Related work and research gap
      1. Social science methodology
         a. Content analysis [2, 7, 9]
         b. Framing analysis [1, 4, 6, 10, 11]
         → effective, but manual and time-consuming
      2. Automated WCL identification
         a. from a topic perspective [12, 14-17]
         b. from an actor perspective [13, 18]
         → require interpretation of the difference in word choice
         → no automatic concept-to-concept comparison
      3. Natural language processing
         a. Named entity recognition (NER) (cf. [21])
         b. Coreference resolution (cf. [12, 20, 24])
         c. Cross-document coreference resolution (cf. [22, 23])
         → do not resolve broad-sense anaphora
         → do not analyze differences in word choice
  8. Roadmap
      RT1: WCL analysis methodology and system
      RT2: Usability prototype
      RT3: Candidate alignment task: methodology of the multi-step merging approach
      RT4: Evaluation of the multi-step merging approach
  9. WCL analysis pipeline methodology
      Data preprocessing → Semantic concept identification → Framing analysis of semantic concepts → Framing similarity across news articles
      [Figure: example mentions of one person (“Putin”, “president”, “savior”, “tyrant”, “humble man”, “thief”) illustrating semantic concept identification. Image source: https://tgram.ru/channels/otsuka_bld]
  10. WCL analysis system
      • Preprocessing: related articles; parsing (sentence splitting, tokenization, POS tagging, dependency parsing, NE recognition, coreference resolution)
      • Concept identification: candidate extraction (coreferences, NPs); candidate alignment (multi-step merging: core meaning, core meaning modifiers, frequent word patterns)
      • Usability prototype: emotion frames (LIWC emotion dimensions, emotion clustering); visualization (Matrix view, Bar chart view, Article view)
      • Inductive analysis, i.e., no prior knowledge is given
      • The implementation focuses on the candidate alignment task
  11. Roadmap (repeated from slide 8)
  12. Usability prototype
      [Screenshot: Matrix view, Bar chart view, and Article view; WCL diversity]
  13. Usability prototype
      [Screenshot: selection mode of the Matrix view, Candidate view, selection mode of the Article view]
  14. Roadmap (repeated from slide 8)
  15. Candidate alignment task
      [Table comparing NER, coreference resolution, and candidate alignment on: categorization/grouping; cross-document coreferences; linking of mentions (a. common-knowledge anaphora, b. broad-sense anaphora)]
      • The candidate alignment task aims to resolve both common-knowledge and broad-sense anaphora.
  16. Multi-step merging approach (MSMA): overview
      • Initial entities: coreferences and NPs
      • Extract entity attributes to highlight certain properties
      • Specify the comparability of each entity to other entities
      • Iterate multiple times over all entities → merge entities based on the similarity of their attributes
      • Merging step = level in a hierarchy
      [Figure, Init. → Step 1 → … → Step N: all entities are sorted by size (similar color = similarity in one criterion); the first entity is compared to the other entities and merges the similar ones; the updated entity is placed at the end, and the process continues; before the next step, the entities are sorted by size again.]
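The merging loop in the overview can be sketched in Python. This is a minimal sketch under stated assumptions, not the thesis implementation. An entity is modeled as a plain list of mention phrases, and each merging step is passed in as a yes/no similarity function. An absorbed entity is treated as finished for that step, which is one reading of the figure. The `same_head` criterion, which treats the last token as the phrase head, is a hypothetical stand-in for the real step criteria.

```python
from typing import Callable, List, Sequence

Entity = List[str]  # assumption: an entity is just a list of its mention phrases
Similarity = Callable[[Entity, Entity], bool]


def merge_step(entities: List[Entity], similar: Similarity) -> List[Entity]:
    """One merging step: the largest remaining entity absorbs all entities similar to it."""
    queue = sorted(entities, key=len, reverse=True)  # all entities sorted by size
    merged: List[Entity] = []
    while queue:
        current, rest = queue[0], []
        for other in queue[1:]:
            if similar(current, other):
                current = current + other  # the considered entity merges a similar entity
            else:
                rest.append(other)
        merged.append(current)  # assumption: the updated entity is not revisited in this step
        queue = rest
    return sorted(merged, key=len, reverse=True)


def msma(entities: List[Entity], steps: Sequence[Similarity]) -> List[Entity]:
    """Apply the merging steps in order; each step is one level of the merge hierarchy."""
    for similar in steps:
        entities = merge_step(entities, similar)
    return entities


def _heads(entity: Entity) -> set:
    # Hypothetical head extraction: the last token of each phrase, lowercased.
    return {phrase.split()[-1].lower() for phrase in entity}


def same_head(a: Entity, b: Entity) -> bool:
    """Hypothetical step criterion: the two entities share at least one phrase head."""
    return bool(_heads(a) & _heads(b))


result = msma([["Donald Trump"], ["President Trump", "Trump"], ["the president"]], [same_head])
print(result)  # [['President Trump', 'Trump', 'Donald Trump'], ['the president']]
```

Passing the six step criteria as a list mirrors the "merging step = level in a hierarchy" idea. Each criterion only sees the entities produced by the previous one.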
  17. Multi-step merging approach: steps
      Step 1: Representative phrases’ heads → matching phrase heads, e.g., “President Trump” and “Donald Trump”
      Step 2: Head sets → semantically similar head sets, e.g., {“Trump”, “president”} and {“billionaire”}
      Step 3: Representative labeling phrases → semantically similar labeling phrases, e.g., “undocumented immigrants” and “illegal aliens”
      Step 4: Compounds → semantically similar compounds, e.g., “DACA illegals” and “DACA recipients”
      Step 5: Representative frequent wordsets → semantically similar frequent wordsets, e.g., “United States” and “U.S.”
      Step 6: Representative frequent phrases → string-similar frequent phrases, e.g., “Deferred Action for Childhood Arrivals” and “Childhood Arrivals”
  18. Multi-step merging approach: summary
      Type | Step | Goal | Problems
      Core meaning | Representative phrases’ heads | Compare on the output of coreference resolution | Applicable only to named entity (NE) types
      Core meaning | Head sets | Find synonyms of head words among entities | Word collocations carry more meaning than head words
      Core meaning modifiers | Representative labeling phrases | Identify the most prominent adjective + noun patterns | Adjectives are not the only core meaning modifiers
      Core meaning modifiers | Compounds | Compare similar noun-to-noun compounds | Phrases of more than two words are required to represent entities
      Frequent word patterns | Representative frequent wordsets | Identify frequently repeated wording | Wordsets disregard word order, which is important for pattern identification
      Frequent word patterns | Representative frequent phrases | Identify frequently repeated phrases | Requires extensive repetitive wording
  19. Roadmap (repeated from slide 8)
  20. Experiment setup
      • Dataset: extended NewsWCL50 corpus
        • Ten topics of 5 articles each: NewsWCL50 [25]
        • One topic of 25 articles, collected according to the NewsWCL50 methodology
        • Simplified content analysis (CA) annotation:
          • used annotation codes referring to entities
          • avoided complex semantic concepts, e.g., a reaction to something
          • annotated the extracted NPs and coreferential chains
      • Metrics
        • Weighted precision, recall, and F1-score (evaluation of the best matching entities (BMEs)) [27]
        • Homogeneity, completeness, and V-measure (general clustering evaluation) [26]
        • WCL complexity metric (phrasing diversity)
      • Baselines
        • Random baseline (B1)
        • CoreNLP coreference resolution: only coreferential chains are used (B2) [24]
        • Candidate clustering in the word vector space (B3)
      • Concept type categorization
        • Actor, e.g., Donald Trump
        • Group, i.e., a group of people acting as one entity
        • Country, i.e., country names, anaphora referring to them, and related organizations
        • Misc, i.e., events, objects, and abstract entities
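Homogeneity, completeness, and V-measure [26] follow directly from the conditional entropies of the gold concepts (classes) and the merged entities (clusters). Below is a minimal from-scratch Python sketch. It assumes each mention carries one gold label and one predicted entity id. The function names are illustrative, and this is not the evaluation code used in the thesis.

```python
from collections import Counter
from math import log
from typing import Hashable, Sequence, Tuple


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy H(X) of a label sequence (natural log)."""
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())


def conditional_entropy(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """H(A | B) for two parallel label sequences."""
    n = len(a)
    joint = Counter(zip(a, b))
    b_counts = Counter(b)
    return -sum(n_ab / n * log(n_ab / b_counts[lb]) for (_, lb), n_ab in joint.items())


def homogeneity_completeness_v(classes: Sequence[Hashable],
                               clusters: Sequence[Hashable]) -> Tuple[float, float, float]:
    """Homogeneity h, completeness c, and V-measure (harmonic mean of h and c)."""
    h_c, h_k = entropy(classes), entropy(clusters)
    # By convention, a score is 1 when the corresponding entropy is 0.
    h = 1.0 if h_c == 0 else 1 - conditional_entropy(classes, clusters) / h_c
    c = 1.0 if h_k == 0 else 1 - conditional_entropy(clusters, classes) / h_k
    v = 0.0 if h + c == 0 else 2 * h * c / (h + c)
    return h, c, v


# Gold concept per mention vs. entity id assigned by a (hypothetical) merging run.
print(homogeneity_completeness_v([0, 0, 1, 1], [1, 1, 0, 0]))  # (1.0, 1.0, 1.0)
```

Homogeneity penalizes entities that mix several gold concepts. Completeness penalizes concepts split across several entities, which is the typical failure mode of plain coreference chains.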
  21. Evaluation results
      Method | Precision | Recall | F1-score
      B1: Random guessing | 0.12 | 0.15 | 0.12
      B2: CoreNLP coreference resolution | 0.97 | 0.17 | 0.27
      B3: Candidate clustering | 0.87 | 0.32 | 0.42
      M: Multi-step merging approach | 0.91 | 0.82 | 0.84
  22. WCL complexity evaluation
      Concept type split:
      Concept type | WCL | F1
      Actor | 2.10 | 0.97
      Country | 4.49 | 0.74
      Misc | 5.67 | 0.82
      Group | 9.20 | 0.78
      Topic split:
      Topic | WCL | F1
      8 | 2.84 | 0.91
      7 | 2.89 | 0.89
      5 | 3.31 | 0.83
      4 | 3.54 | 0.87
      1 | 3.63 | 0.85
      3 | 3.95 | 0.87
      0 | 3.99 | 0.81
      9 | 4.63 | 0.88
      2 | 5.44 | 0.78
      6 | 8.37 | 0.82
      10 | 12.71 | 0.81
      [Charts: F1 vs. WCL metric, per concept type and per topic]
      • Logarithmic trend: concepts with high WCL diversity are harder to identify.
      • The most phrase-diverse topics, 6 and 10, perform comparably to the average performance (F1 = 0.84).
      ➢ WCL complexity is a metric that represents the phrasing diversity of the anaphora referring to a concept: high complexity = high phrasing variation.
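A least-squares fit of F1 = a + b·ln(WCL) makes the "logarithmic trend" concrete. The sketch below uses the four concept-type values reported on this slide (Actor, Country, Misc, Group). The fitting method is an assumption, because the slide does not say how its trend line was obtained.

```python
from math import log
from typing import Sequence, Tuple


def log_trend(wcl: Sequence[float], f1: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of f1 = intercept + slope * ln(wcl); returns (slope, intercept)."""
    xs = [log(w) for w in wcl]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(f1) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, f1))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Concept-type values from the slide: Actor, Country, Misc, Group
slope, intercept = log_trend([2.10, 4.49, 5.67, 9.20], [0.97, 0.74, 0.82, 0.78])
print(f"F1 ≈ {intercept:.2f} {slope:+.2f} · ln(WCL)")
```

A negative slope matches the slide's reading that F1 drops as phrasing diversity grows. With only four points, the fit is illustrative rather than statistically meaningful.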
  23. Merging steps evaluation: concept types
      F1-score at each step:
      Steps | Actor | Country | Misc | Group
      B1 | 0.123 | 0.124 | 0.107 | 0.112
      B2 | 0.407 | 0.297 | 0.198 | 0.137
      B3 | 0.450 | 0.428 | 0.468 | 0.289
      Init. | 0.408 | 0.298 | 0.204 | 0.140
      Step 1 | 0.872 | 0.634 | 0.298 | 0.222
      Step 2 | 0.927 | 0.685 | 0.779 | 0.502
      Step 3 | 0.927 | 0.685 | 0.803 | 0.744
      Step 4 | 0.970 | 0.700 | 0.803 | 0.744
      Step 5 | 0.970 | 0.736 | 0.808 | 0.783
      Step 6 | 0.970 | 0.736 | 0.817 | 0.783
      Difference in F1-score by merging step type:
      Merging step type | Actor | Country | Misc | Group
      Core meaning (Steps 1 & 2) | 0.519 | 0.388 | 0.575 | 0.362
      Core modifiers (Steps 3 & 4) | 0.043 | 0.014 | 0.024 | 0.242
      Word patterns (Steps 5 & 6) | 0.000 | 0.037 | 0.014 | 0.039
      Overall | 0.562 | 0.439 | 0.613 | 0.643
      • F1 never decreases from one merging step to the next.
      • Init. step: entities extracted from CoreNLP coreferential chains and NPs
      • Step 1 outperforms B3 on the NE-based types.
      • Step 2 outperforms B3 on the non-NE-based types.
      • Highest F1: “Actor” type, F1 = 0.97
      • Lowest F1: “Country” and “Group” types
      • Lowest F1 boost: “Country” type → lack of semantic similarity
      • Highest F1 boost: “Group” type → many semantic patterns captured
  24. Big vs. small topic comparison
      ➢ Does the approach perform better on small or big topics?
      • Big topic: 25 articles per topic
      • Small topic: three subsets of 5 articles each
      • We report the average performance: big: F1 = 0.81; small: F1 = 0.72
      • The big topic outperforms on the “Misc” and “Group” types.
      • Reason: semantically similar, repetitive word choice occurs often enough in a big topic.
      DACA topic | Actor | Misc | Group | Country
      F1, small (R5_avg) | 0.96 | 0.67 | 0.63 | 0.66
      F1, big (All25) | 0.96 | 0.88 | 0.75 | 0.59
      WCL, small (R5_avg) | 1.59 | 6.68 | 9.37 | 12.65
      WCL, big (All25) | 1.79 | 11.47 | 19.34 | 23.89
  25. Discussion summary
      • MSMA: F1 = 0.84; baseline B3: F1 = 0.42
      • Best performance on the “Actor” type: F1 = 0.97
      • Largest phrasing diversity: “Group” type
      • Largest performance boost on the “Group” type: ∆ = 0.643
      • Better performance on larger topics: big: F1 = 0.81; small: F1 = 0.72
      • Worst performance on the “Group” and “Country” types:
        “Group” type:
        o requires additional merging step(s)
        o requires concept sense disambiguation
        “Country” type:
        o poor semantic representation of the words by the chosen word vector model
        o broadly defined CA concepts: a mix of country names and organizations
      F1-score by concept type | Actor | Misc | Group | Country
      Init. step | 0.41 | 0.2 | 0.1 | 0.3
      All six steps | 0.97 | 0.82 | 0.78 | 0.74
      [Chart: overall F1-score of B1, B2, B3, and M: 0.12, 0.27, 0.42, 0.84]
  26. Future work
      • Additional merging step using local context
        • e.g., “Kim Jong Un” = “Little Rocket Man”
      • Concept sense disambiguation
        • e.g., “American people” ≠ “foreign people”
      • Different word vector models
        • find a better semantic representation of phrases
      • More complex concepts
        • identify concepts such as an action or a reaction to something
      • Next step: deductive analysis
        • collect a large corpus of “silver”-quality annotated topics
        • train a sequential neural network (SNN) model
        • identify framing by WCL in any news topic
  27. Conclusion
      Contributions:
      1. Proposed a methodology for the WCL analysis pipeline
      2. Implemented the WCL analysis system
      3. Proposed, implemented, and evaluated the multi-step merging approach (MSMA: F1 = 0.84; baseline B3: F1 = 0.42)
         Approach benefits:
         • resolves broad-sense anaphora
         • uses only candidate phrases, without their context
         • requires no additional, lengthy model training
         • tested on a dataset designed specifically for WCL analysis
      4. Implemented the first usability prototype
      Future work:
      • Concept sense disambiguation
      • SNN model for deductive WCL analysis
  28. References
      1. D. Kahneman and A. Tversky, “Choices, values, and frames,” American Psychologist, vol. 39, pp. 341–350, 1984.
      2. F. Hamborg, K. Donnay, and B. Gipp, “Automated identification of media bias in news articles: an interdisciplinary literature review,” International Journal on Digital Libraries, 2018.
      3. M. Linstrom and W. Marais, “Qualitative News Frame Analysis: A Methodology,” Communitas, vol. 17, pp. 21–38, 2012.
      4. D. Chong and J. N. Druckman, “Framing Theory,” Annual Review of Political Science, vol. 10, no. 1, pp. 103–126, 2007.
      5. A. Duzett, “Media Bias in Strategic Word Choice,” http://www.aim.org/on-targetblog/media-bias-in-strategic-word-choice/, 2011.
      6. J. N. Druckman, “Political Preference Formation: Competition and the (Ir)relevance of Framing Effects,” The American Political Science Review, vol. 98, no. 4, pp. 671–686, 2004.
      7. M. Linstrom and W. Marais, “Qualitative News Frame Analysis: A Methodology,” Communitas, vol. 17, pp. 21–38, 2012.
      8. F. Hamborg, A. Zhukova, and B. Gipp, “Illegal Aliens or Undocumented Immigrants? Towards the Automated Identification of Bias by Word Choice and Labeling,” in Proceedings of the iConference 2019, 2019.
      9. M. Schreier, Qualitative Content Analysis in Practice. Sage Publications, 2012.
      10. R. M. Entman, “Framing: Toward Clarification of a Fractured Paradigm,” Journal of Communication, vol. 43, no. 4, pp. 51–58, 1993.
      11. R. M. Entman, “Framing bias: Media in the distribution of power,” Journal of Communication, vol. 57, no. 1, pp. 163–173, 2007.
      12. Y. Tian and C. M. Stewart, “Framing the SARS crisis: A computer-assisted text analysis of CNN and BBC online news reports of SARS,” Asian Journal of Communication, vol. 15, no. 3, pp. 289–301, 2005.
      13. M. G. Sendén, S. Sikström, and T. Lindholm, “‘She’ and ‘He’ in news media messages: pronoun use reflects gender biases in semantic contexts,” Sex Roles, vol. 72, no. 1–2, pp. 40–49, 2015.
      14. B. Fortuna, C. Galleguillos, and N. Cristianini, “Detection of bias in media outlets with statistical learning methods,” in Text Mining, Chapman and Hall/CRC, 2009, pp. 57–80.
      15. M. Recasens, C. Danescu-Niculescu-Mizil, and D. Jurafsky, “Linguistic models for analyzing and detecting biased language,” in Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2013.
  29. References (continued)
      16. Z. Papacharissi and M. de Fatima Oliveira, “News frames terrorism: A comparative analysis of frames employed in terrorism coverage in U.S. and U.K. newspapers,” International Journal of Press/Politics, vol. 13, no. 1, pp. 52–74, 2008.
      17. D. M. Garyantes and P. J. Murphy, “Success or chaos?: Framing and ideology in news coverage of the Iraqi national elections,” International Communication Gazette, vol. 72, no. 2, pp. 151–170, 2010.
      18. D. Card, J. H. Gross, A. E. Boydstun, and N. A. Smith, “Analyzing Framing through the Casts of Characters in the News,” in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP-16), pp. 1410–1420, 2016.
      19. K. Clark and C. D. Manning, “Deep Reinforcement Learning for Mention-Ranking Coreference Models,” in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 2256–2262, 2016.
      20. H. Lee, “A Scaffolding Approach to Coreference Resolution Integrating Statistical and Rule-based Models,” Natural Language Engineering, vol. 23, no. 5, pp. 733–762, 2017.
      21. J. R. Finkel, T. Grenager, and C. Manning, “Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling,” in Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005), pp. 363–370, 2005.
      22. S. Dutta and G. Weikum, “Cross-Document Co-Reference Resolution using Sample-Based Clustering with Knowledge Enrichment,” Transactions of the Association for Computational Linguistics, vol. 3, pp. 15–28, 2015.
      23. S. Singh, A. Subramanya, F. Pereira, and A. McCallum, “Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models,” in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, vol. 1, pp. 793–803, 2011.
      24. K. Clark and C. D. Manning, “Improving Coreference Resolution by Learning Entity-Level Distributed Representations,” in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pp. 643–653, 2016.
      25. F. Hamborg, A. Zhukova, and B. Gipp, “Automated Identification of Media Bias by Word Choice and Labeling in News Articles,” manuscript submitted for publication, pp. 1–10.
      26. A. Rosenberg and J. Hirschberg, “V-measure: A conditional entropy-based external cluster evaluation measure,” in Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), 2007.
      27. N. Chinchor et al., “MUC-5 Evaluation Metrics,” Science Applications International Corporation, San Diego, CA, pp. 69–78, 1992.
  30. Thank you for your attention! Questions?
  31. Back-up slides
  32. Entity types
      Entity type | Entity subtype | Source | Example | CA concept type
      person | nn (noun, singular) | WordNet + POS | immigrant | Actor
      person | nns (noun, plural) | WordNet + POS | politicians | Group
      person | ne (named entity) | NER | Trump | Actor
      person | nes (named entity, plural) | NER + POS | Democrats | Group
      group | -- | WordNet | university | Group
      group | ne | NER | Congress | Country/Group
      country | -- | WordNet | Homeland | Country
      country | ne | NER | Germany | Country
      other | -- | -- | vote | Misc
      Idea:
      • Words can be similar in the vector space, yet the results can be irrelevant to the CA concepts.
      • Identifying entity types makes the results more effective.
      • Entity types resemble the concept types of manual CA.
  33. Step 1: Representative phrases’ heads
      Entity 1: “Donald Trump”, “Trump”, “Mr. Trump”, “forceful Mr. Trump” (head of phrases: Trump; representative phrase: Donald Trump)
      Entity 2: “President Trump”, “Donald Trump”, “the president”, “The president of the US” (head of phrases: Trump; representative phrase: Donald Trump)
      The heads of the representative phrases are identical by string comparison → the entities are merged.
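Step 1 can be illustrated with a short Python sketch. Two simplifications here are assumptions, not the thesis method. The representative phrase is taken to be the most frequent mention, and the head is approximated by the last token, where a real system would take it from the dependency parse. The repeated mentions in the example are made up to make the representative phrase unambiguous.

```python
from collections import Counter
from typing import List


def head(phrase: str) -> str:
    # Assumption: approximate the syntactic head by the last token.
    return phrase.split()[-1]


def representative_phrase(entity: List[str]) -> str:
    # Assumption: the most frequent mention represents the entity.
    return Counter(entity).most_common(1)[0][0]


def step1_similar(a: List[str], b: List[str]) -> bool:
    """Merge criterion: the heads of the representative phrases are identical strings."""
    return head(representative_phrase(a)) == head(representative_phrase(b))


entity_1 = ["Donald Trump", "Trump", "Donald Trump", "Mr. Trump"]
entity_2 = ["President Trump", "Donald Trump", "Donald Trump", "the president"]
entity_3 = ["the president", "the president", "The president of the US"]
print(step1_similar(entity_1, entity_2))  # True: both heads are "Trump"
print(step1_similar(entity_1, entity_3))  # False: "Trump" vs. "president"
```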
  34. Step 2: Head sets
      Entity 1 (head set {illegals}): “young illegals”, “the illegals”, “illegals who arrived as children”, “DACA illegals”
      Entity 2 (head set {immigrants}): “roughly 800,000 young undocumented immigrants”, “young immigrants”, “illegal immigrants”, “undocumented immigrants”
      Entity 3 (head set {aliens}): “illegal aliens who were brought as children”, “nearly 800,000 illegal aliens”, “illegal aliens”, “young illegal aliens”
      {illegals} and {immigrants} are similar in the vector space → Entities 1 and 2 are merged.
      Entity 3 is not merged at this step: the word “aliens” alone is related to UFOs, so the entity is merged later, as “illegal aliens”, at the third step.
  35. Step 3: Representative labeling phrases
      Entity 1: “young illegals”, “the illegals”, “illegals who arrived as children”, “DACA illegals”, “roughly 800,000 young undocumented immigrants”, “young immigrants”, “illegal immigrants”, “undocumented immigrants”, “endangered immigrants”, “additional illegals”
        Labeling phrases: young immigrants, undocumented immigrants, illegal immigrants, young illegals, endangered immigrants, additional illegals
        Representative labeling phrases: A1: young immigrants, A2: illegal immigrants, A3: young illegals
      Entity 2: “this group of young people”, “nearly 800,000 people”, “a people”, “people who are American in every way except through birth”, “foreign people”, “bad people”, “people affected by the move”, “the estimated 800,000 people”, “these people”, “young people”
        Labeling phrases: young people, foreign people, bad people, estimated people
        Representative labeling phrases: B1: young people, B2: foreign people
      Similarity matrix (1 = similar in the vector space): B1 is similar to A1, A2, and A3; B2 is similar to none of them.
      $\frac{3}{2 \times 3} = 0.5 \geq 0.3$ → the entities are merged.
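The merge test on this slide has two parts. Pairs of representative phrases are marked similar (1) or not (0), and the entities merge when the share of similar pairs reaches 0.3. Below is a minimal Python sketch of that ratio. The pairwise predicate is left as a parameter because the slide only says "similar in the vector space", and the function names are illustrative.

```python
from typing import Callable, List, Sequence

Matrix = List[List[int]]


def similarity_matrix(rows: Sequence[str], cols: Sequence[str],
                      is_similar: Callable[[str, str], bool]) -> Matrix:
    """Binary matrix: 1 where a row phrase is similar to a column phrase."""
    return [[1 if is_similar(r, c) else 0 for c in cols] for r in rows]


def similar_share(matrix: Matrix) -> float:
    """Sum of the entries / (number of rows × number of columns)."""
    cells = sum(len(row) for row in matrix)
    return sum(map(sum, matrix)) / cells if cells else 0.0


def should_merge(matrix: Matrix, threshold: float = 0.3) -> bool:
    return similar_share(matrix) >= threshold


# The matrix from the slide: rows B1, B2; columns A1, A2, A3.
slide_matrix = [[1, 1, 1], [0, 0, 0]]
print(similar_share(slide_matrix), should_merge(slide_matrix))  # 0.5 True
```

The same ratio reappears in Steps 4b and 5 (2/(2 × 2) and 5/(4 × 3)). Only the compared items change: compounds and wordsets instead of labeling phrases.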
  36. Step 4a: Headword-compound match
      Entity 1: “PM Theresa May”, “Mrs. May”, “UK Prime Minister Theresa May”; compounds: {Minister May, PM May, Mrs. May, Theresa May}
      Entity 2: “Prime Minister”, “The British prime minister”; heads of phrases: {minister, Minister}
      The head “Minister” of Entity 2 is identical, by string comparison, to the dependent word of the compound “Minister May” (dependent: Minister; governor: May) in Entity 1 → the entities are merged.
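The headword-compound match can be sketched as a lookup. Entity B's phrase heads are checked against the dependent words of entity A's compounds. This follows one reading of the slide's "dependent/governor" labels. Representing compounds as (dependent, governor) pairs and comparing case-insensitively are assumptions.

```python
from typing import Iterable, Tuple


def head_matches_compound(heads_b: Iterable[str],
                          compounds_a: Iterable[Tuple[str, str]]) -> bool:
    """True if a head word of entity B equals the dependent word of a compound in entity A.

    compounds_a holds (dependent, governor) pairs, e.g. ("Minister", "May").
    The comparison is case-insensitive (an assumption).
    """
    dependents = {dependent.lower() for dependent, _governor in compounds_a}
    return any(h.lower() in dependents for h in heads_b)


compounds_entity_1 = [("Minister", "May"), ("PM", "May"), ("Mrs.", "May"), ("Theresa", "May")]
print(head_matches_compound({"minister", "Minister"}, compounds_entity_1))  # True
```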
  37. Step 4b: Common compounds
      Entity 1: “DACA recipients”, “the program’s beneficiaries”, “DACA beneficiaries”, “800,000 recipients”; compounds: DACA recipients, program’s beneficiaries, DACA beneficiaries
      Entity 2: “DACA participants”, “800,000 participants”, “more than a quarter of DACA registrants”, “program participants”; compounds: DACA participants, DACA registrants, program participants
      Overlapping NE word in the compounds: {DACA} → compared compounds: A1: DACA recipients, A2: DACA beneficiaries; B1: DACA participants, B2: DACA registrants
      Similarity matrix (1 = similar in the vector space): B1 is similar to A1 and A2; B2 is similar to neither.
      $\frac{2}{2 \times 2} = 0.5 \geq 0.3$ → the entities are merged.
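The first part of Step 4b selects the compounds to compare: only compounds that contain a word shared by both entities are kept. A minimal Python sketch follows. It uses plain whitespace tokenization and skips the slide's restriction of the shared word to named entities, so it is an approximation. On the slide's compounds, it selects exactly A1, A2, B1, and B2. The selected pairs would then go through the same ≥ 0.3 similar-pair ratio as in Step 3.

```python
from typing import List, Sequence, Set, Tuple


def _words(compounds: Sequence[str]) -> Set[str]:
    return {word for compound in compounds for word in compound.split()}


def overlapping_compounds(a: Sequence[str], b: Sequence[str]
                          ) -> Tuple[Set[str], List[str], List[str]]:
    """Find words shared by both entities' compounds and keep only compounds containing them."""
    shared = _words(a) & _words(b)
    keep_a = [c for c in a if shared & set(c.split())]
    keep_b = [c for c in b if shared & set(c.split())]
    return shared, keep_a, keep_b


shared, a_sel, b_sel = overlapping_compounds(
    ["DACA recipients", "program's beneficiaries", "DACA beneficiaries"],
    ["DACA participants", "DACA registrants", "program participants"],
)
print(shared)  # {'DACA'}
print(a_sel)   # ['DACA recipients', 'DACA beneficiaries']
print(b_sel)   # ['DACA participants', 'DACA registrants']
```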
  38. Step 5: Representative frequent wordsets
      Entity 1: “illegals whose DACA protection is pending”, “DACA illegals”, “young illegals”, “illegal alien applicants”, “DACA applicants”
        Frequent wordsets: A1: {DACA, illegals}, A2: {illegals}, A3: {applicants}, A4: {DACA}
      Entity 2: “more than 2,000 DACA recipients”, “DACA beneficiaries”, “DACA recipients whose status expires on March 5”, “former DACA participants”, “the participants”
        Frequent wordsets: B1: {DACA, recipients}, B2: {DACA}, B3: {participants}
      Similarity matrix (1 = similar in the vector space): 5 of the 4 × 3 wordset pairs are similar.
      $\frac{5}{4 \times 3} \approx 0.42 \geq 0.3$ → the entities are merged.
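Frequent wordsets can be mined with a small itemset count over the phrases of one entity. The slide does not specify the mining algorithm, so this Python sketch uses closed frequent itemsets as a swapped-in technique. It keeps word sets that occur in at least `min_support` phrases and drops any set whose frequent superset has the same support. The parameters `min_support=2` and `max_size=2` are hypothetical. With them, the sketch reproduces A1 to A4 and B1 to B3 from the slide's phrases.

```python
from collections import Counter
from itertools import combinations
from typing import FrozenSet, Sequence, Set


def frequent_wordsets(phrases: Sequence[str], min_support: int = 2,
                      max_size: int = 2) -> Set[FrozenSet[str]]:
    """Closed frequent wordsets: word sets found in >= min_support phrases, minus any
    set that has a frequent superset with the same support."""
    counts: Counter = Counter()
    for phrase in phrases:
        words = sorted(set(phrase.split()))  # each phrase counts once per wordset
        for size in range(1, max_size + 1):
            for combo in combinations(words, size):
                counts[frozenset(combo)] += 1
    frequent = {s: n for s, n in counts.items() if n >= min_support}
    return {s for s, n in frequent.items()
            if not any(s < t and frequent[t] == n for t in frequent)}


entity_1 = ["illegals whose DACA protection is pending", "DACA illegals", "young illegals",
            "illegal alien applicants", "DACA applicants"]
entity_2 = ["more than 2,000 DACA recipients", "DACA beneficiaries",
            "DACA recipients whose status expires on March 5", "former DACA participants",
            "the participants"]
print(sorted(sorted(s) for s in frequent_wordsets(entity_1)))
# [['DACA'], ['DACA', 'illegals'], ['applicants'], ['illegals']]
print(sorted(sorted(s) for s in frequent_wordsets(entity_2)))
# [['DACA'], ['DACA', 'recipients'], ['participants']]
```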
  39. Step 6: Representative frequent phrases
      Phrases (frequency): “DACA program” (×10), “DACA” (×10), “Deferred Action Childhood Arrivals program” (×5), “Obama-era program” (×5), “Childhood Arrivals DACA” (×4), “Deferred Action for Childhood Arrivals program” (×3), “Deferred Action” (×5), “Deferred Action for Childhood Arrivals” (×2)
      Frequent phrases of Entity 1: A1: DACA, A2: program, A3: DACA program, A4: Childhood Arrivals program, A5: Obama-era program
      Frequent phrases of Entity 2: B1: Deferred Action Childhood Arrivals, B2: Deferred Action, B3: Childhood Arrivals, B4: Deferred Action Childhood Arrivals program, B5: Childhood Arrivals DACA
      $sim_{val} = sim_{hor} = \frac{4}{5} \geq 0.5$ → similar in the vector space → the entities are merged.
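Step 6 compares frequent phrases by string similarity. The slide does not name the string measure, so this Python sketch swaps in `difflib.SequenceMatcher.ratio()` from the standard library, with a hypothetical 0.5 cutoff per phrase pair. It builds only the binary pair matrix. The slide's aggregation into $sim_{val}$ and $sim_{hor}$ is not reproduced here.

```python
from difflib import SequenceMatcher
from typing import List, Sequence


def string_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (difflib's ratio), case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def string_similarity_matrix(rows: Sequence[str], cols: Sequence[str],
                             threshold: float = 0.5) -> List[List[int]]:
    """1 where a row phrase and a column phrase are string-similar."""
    return [[int(string_similarity(r, c) >= threshold) for c in cols] for r in rows]


print(round(string_similarity("Childhood Arrivals",
                              "Deferred Action Childhood Arrivals"), 2))  # 0.69
```

`ratio()` is 2·M/T, where M is the number of matched characters and T is the total length of both strings. A short phrase fully contained in a longer one therefore still scores well, which suits abbreviated program names.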
  40. WCL complexity metric
      $WCL = \sum_{h \in H} \frac{|S_h|}{|L_h|}$
      • $H$ is the set of phrase heads in a code,
      • $S_h$ is the set of unique phrases with the phrase head $h$,
      • $L_h$ is the list of non-unique phrases with the phrase head $h$.
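The WCL complexity metric can be computed in a few lines, assuming it sums, over all phrase heads h of a code, the number of unique phrases with head h divided by the number of all phrases with head h. In formula form, that reading is $WCL = \sum_{h \in H} |S_h| / |L_h|$. In this Python sketch, grouping phrases by their last token is a hypothetical stand-in for the parser-based head, and the mentions are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def group_by_head(phrases: Sequence[str]) -> Dict[str, List[str]]:
    """Hypothetical grouping: the last token of a phrase stands in for its head."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for phrase in phrases:
        groups[phrase.split()[-1]].append(phrase)
    return dict(groups)


def wcl(phrases_by_head: Dict[str, List[str]]) -> float:
    """Sum over heads h of |S_h| / |L_h| (unique phrases / all phrases with head h)."""
    return sum(len(set(ps)) / len(ps) for ps in phrases_by_head.values() if ps)


mentions = ["Trump", "Trump", "Donald Trump", "President Trump", "the president"]
print(wcl(group_by_head(mentions)))  # 1.75 = 3/4 + 1/1
```

Each head contributes at most 1, so the score grows with both the number of distinct heads and the variety of phrasing around each head.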
