
Talk: Automated Identification of Media Bias by Word Choice and Labeling in News Articles


  1. Automated Identification of Media Bias by Word Choice and Labeling in News Articles. Anastasia Zhukova, Data & Knowledge Engineering Group, 29.09.2020
  2. Short CV. 2008–2014: Information Technology, M.Eng., Moscow Aviation Institute. 2015–2019: Computer and Information Science, M.Sc., University of Konstanz. 2018: Graduate Student Researcher, Natural Language Processing group, National Institute of Informatics. 2019–present: Doctoral Researcher, Ph.D. candidate, Data & Knowledge Engineering group, University of Wuppertal.
  3. Motivation. https://www.theaustralian.com.au/world/alexander-lukashenko-locked-and-loaded-to-fight-belarus-rats/news-story/99175bd3cfc13f71e2176310cee98288 https://www.economist.com/europe/2020/06/20/waving-slippers-at-the-cockroach-president-of-belarus
  4. Agenda. • Background • Methodology • Results • Conclusion. https://tgram.ru/channels/otsuka_bld
  5. Media Bias Model (news production and consumption process). Reasons: ideological view; target audience; owners; advertisers; business interest; funding; ...; political interest; reputation; ...; political view; government. Process: news event (reality) → gathering → writing → editing → news → consumers (perception). Forms of bias: • Fact selection: event selection, source selection, commission, omission • Writing style: labeling, word choice • Presentation style: placement, size allocation, picture selection, picture explanation • Spin. Consumer context: background knowledge, attitude, social status, country. Example phrases for word choice (WC) and labeling (L): “an arrogant person”, “a genius”, “a smart person”. Source: F. Hamborg, K. Donnay, and B. Gipp, “Automated Identification of Media Bias in News Articles: An Interdisciplinary Literature Review,” International Journal on Digital Libraries (IJDL), 2018.
  6. WCL problem. Word choice & labeling: • strongly impacts the public perception of news topics • disturbs the decision-making process • leads to the propagation of false information. Example: Hurricane Katrina, 2005. Sources: F. Hamborg, A. Zhukova, and B. Gipp, “Illegal Aliens or Undocumented Immigrants? Towards the Automated Identification of Bias by Word Choice and Labeling,” in Proceedings of the iConference 2019, 2019. A. Zhukova, “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News,” University of Konstanz, Germany, 2019.
  7. Social science background. Social sciences: content analysis (“What to think about”) and frame analysis (“How to think about it”). Computer science: candidate extraction → cross-document coreference resolution → sentiment analysis, applied to event-related articles. Example: candidate extraction yields mentions such as “Putin”, “president”, “savior”, “tyrant”, “humble man”, “thief”, “war”, “sanctions”, “Crimea”, “army”; cross-document coreference resolution links “president”, “savior”, “tyrant”, “humble man”, and “thief” to Putin; sentiment analysis then scores these mentions. Figure labels: identified actors, actions, events, concepts, etc.; concept polarization; content analysis; cross-document coreference resolution. Sources: F. Hamborg, A. Zhukova, and B. Gipp, “Automated Identification of Media Bias by Word Choice and Labeling in News Articles,” in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019. A. Zhukova, “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News,” University of Konstanz, Germany, 2019. A. Zhukova, F. Hamborg, and B. Gipp, “Interpretable and Comparative Textual Dataset Exploration Using Near-Identity Mention Relations,” in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2020.
  8. Research question. How can an automated approach identify instances of bias by word choice and labeling in the concepts (in)directly referring to groups of people in a set of English news articles reporting on the same event? Given: • no training set • a set of event-related articles • extracted candidate phrases of groups of persons. Goal: • find phrases referring to the same concepts • use only the phrases themselves, i.e., no context information • exploratory unsupervised task. Directly referring mentions: “illegal aliens”, “undocumented immigrants”. Indirectly referring mentions: “White House officials”, “American authorities”. Sources: A. Zhukova, “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News,” University of Konstanz, Germany, 2019. A. Zhukova, F. Hamborg, K. Donnay, and B. Gipp, “Concept Identification of Directly and Indirectly Related Mentions Referring to Groups of Persons,” manuscript submitted for publication, 2020.
  9. Multi-step merging approach (MSMA) 1.0. Input: coreference chains and noun phrases (corefs. & NPs), ordered by decreasing number of mentions (↓). Each step: extraction of a specific attribute, then pairwise comparison & merging, applied recursively with a “winner takes it all” strategy. Sources: F. Hamborg, A. Zhukova, and B. Gipp, “Automated Identification of Media Bias by Word Choice and Labeling in News Articles,” in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019. A. Zhukova, “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News,” University of Konstanz, Germany, 2019.
  10. Merge using similar heads. Entity 1, head set {illegals}: “young illegals”, “the illegals”, “illegals who arrived as children”, “DACA illegals”. Entity 2, head set {immigrants}: “roughly 800,000 young undocumented immigrants”, “young immigrants”, “illegal immigrants”, “undocumented immigrants”. Entity 3, head set {aliens}: “illegal aliens who were brought as children”, “nearly 800,000 illegal aliens”, “illegal aliens”, “young illegal aliens”. The head sets {illegals} and {immigrants} are similar in the vector space, so Entities 1 and 2 are merged into one entity containing all eight of their phrases. The word “aliens” alone is related to UFOs; Entity 3 will be merged later, as “illegal alien”, at the third step. Sources: F. Hamborg, A. Zhukova, and B. Gipp, “Automated Identification of Media Bias by Word Choice and Labeling in News Articles,” in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019. A. Zhukova, “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News,” University of Konstanz, Germany, 2019.
  11. Merge using representative phrases. Entity 1 phrases: “young illegals”, “the illegals”, “illegals who arrived as children”, “DACA illegals”, “roughly 800,000 young undocumented immigrants”, “young immigrants”, “illegal immigrants”, “undocumented immigrants”, “endangered immigrants”, “additional illegals”. Entity 1 labeling phrases: young immigrants, undocumented immigrants, illegal immigrants, young illegals, endangered immigrants, additional illegals. Entity 1 representative labeling phrases: A1: young immigrants, A2: illegal immigrants, A3: young illegals. Entity 2 phrases: “this group of young people”, “nearly 800,000 people”, “a people”, “people who are American in every way except through birth”, “foreign people”, “bad people”, “people affected by the move”, “the estimated 800,000 people”, “these people”, “young people”. Entity 2 labeling phrases: young people, foreign people, bad people, estimated people. Entity 2 representative labeling phrases: B1: young people, B2: foreign people. Similarity matrix over A1–A3 and B1–B2 with entries 1, 1, 1, 0, 0, 0: $\frac{3}{2 \times 3} = 0.5 \ge 0.3$ → similar in the vector space, so the two entities are merged. Sources: F. Hamborg, A. Zhukova, and B. Gipp, “Automated Identification of Media Bias by Word Choice and Labeling in News Articles,” in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019. A. Zhukova, “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News,” University of Konstanz, Germany, 2019.
  12. MSMA 1.0 evaluation. [Figure: F1 of concept types (Actor, Country, Misc, Group) at Init. and Steps 1–4, on a 0–100% axis, with annotations “Core meaning” and “Core modifiers”.] Evaluation of the simplified version of the NewsWCL50 annotation. Sources: F. Hamborg, A. Zhukova, and B. Gipp, “Automated Identification of Media Bias by Word Choice and Labeling in News Articles,” in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019. A. Zhukova, “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News,” University of Konstanz, Germany, 2019.
  13. MSMA 1.0 Drawbacks. Problems of MSMA 1.0: • Overparametrization • Lack of stability: small variations in wording affected results • Few head modifiers used: only adjectives • Frequently falsely merged concepts: “American people”, “young immigrant people”; “Chinese officials”, “American officials” • Low recall & low precision: smaller related entities remain unmerged; unrelated entities are merged • “Winner takes it all” strategy is not optimal. Goals of MSMA 2.0: • Self-controlled merging • Default set of parameters for all datasets • Stable performance when phrases are added • Use all of the head’s modifiers • Keep concepts fine-grained • Improve merging of related smaller entities. Same challenge: unsupervised learning, no training set. Sources: A. Zhukova, F. Hamborg, and B. Gipp, “XCoref: Cross-document Coreference Resolution in the Wild,” manuscript submitted for publication, 2020. A. Zhukova, F. Hamborg, K. Donnay, and B. Gipp, “Concept Identification of Directly and Indirectly Related Mentions Referring to Groups of Persons,” manuscript submitted for publication, 2020.
  14. MSMA 2.0: Preprocessing. (1) Concept’s sub-type prioritization: non-NE persons; NE (ORG) persons; core mentions (non-NE group, non-NE person, ORG person); generalizing mentions; specializing mentions. Examples: “Republican establishment”; “GOP leaders”, “Republicans”; “a red attorney general”, “a Republican”. (2) More head modifiers: adjectival, noun, and compound modifiers, e.g. “immigrants”, “young + immigrants”. (3) NE-grid for operation restriction or similarity amplification, over terms such as GOP, Republicans, Republican, United_States, U.S., American, Americans, Spanish, Mexico. (4) Weighting of the NE components, e.g. “Americans”; “U.S. citizens” as U.S. + citizens, weighted as 2*U.S. + citizens; with a modifier, young + 2*U.S. + citizens and young + citizens. (5) Multiple similarity levels: head-similarity matrix SH, phrase-similarity matrix SP, core-phrase-similarity matrix SCP, ratio matrix RM. Sources: A. Zhukova, F. Hamborg, and B. Gipp, “XCoref: Cross-document Coreference Resolution in the Wild,” manuscript submitted for publication, 2020. A. Zhukova, F. Hamborg, K. Donnay, and B. Gipp, “Concept Identification of Directly and Indirectly Related Mentions Referring to Groups of Persons,” manuscript submitted for publication, 2020.
  15. MSMA 2.0: Pipeline. (1) Identification of cluster cores $CC_i$, $CC_j$. (2) Forming cluster bodies $CB_i$, $CB_j$ with minimum similarity 0.5: a mention $m$ joins a body if $\exists\, cc \in CC_i : SP_{m,cc} \ge 0.5$; mentions in $CB_i \cap CB_j$ are conflicts, and $m$ stays with $CB_i$ if its normalized similarity to $CC_i$ is larger than to $CC_j$. (3) Adding border points (border point or noise?): for body points $cb \in CB$, the condition combines the threshold 0.4 on $SP_{m,cb}$ with a minimum count of 2 (the example shows 3 for $CB_i$ and 2 for $CB_j$), and $m$ goes to $CB_i$ if its normalized similarity to $CB_i$ is larger than to $CB_j$. (4) Forming non-core clusters from mentions $cm \in CM$: $SCP_{cm_i,cm_j} \ge 0.4$ and $SH_{cm_i,cm_j} \ge 0.4$, $RM_{cm_i,cm_j} \ge \log_{5000}|M|$ (example matrix over $cm_0, cm_1, cm_3, cm_5, cm_6$ with values .7 and .8). (5) Merging final clusters $C_i$, $C_j$: • use all modifiers • on concept level • TF-IDF-weighted concept-similarity matrix (example matrix over $c_0, c_1, c_3, c_5, c_6$ with values .7 and .8). Sources: A. Zhukova, F. Hamborg, and B. Gipp, “XCoref: Cross-document Coreference Resolution in the Wild,” manuscript submitted for publication, 2020. A. Zhukova, F. Hamborg, K. Donnay, and B. Gipp, “Concept Identification of Directly and Indirectly Related Mentions Referring to Groups of Persons,” manuscript submitted for publication, 2020.
  16. Evaluation and results. Example resolved concepts (categories shown: indirect mentions: ORG; indirect mentions: GPEs; direct mentions): {Democrats, Democratic leaders, Illinois Democrat}; {American public, American families, U.S. citizens, Poor unskilled American workers, Voice of Americans}; {Demonstrators, DACA protesters, Opposition}; {Administration officials, USCIS employees, Executive authority, DHS officials, Chief of White House, Acting secretary}; {Mexican, Spanish, Mexican officials}; {GOP senators, Republicans, Republican leaders, A group of red state attorneys}; {European ally, The Europeans, European leaders, Western European Diplomats}; {Israeli officials, Israeli Ambassador, The Israelis}; {Russian agents, Russian nationals, The Russians}; {caravan participants, asylum-seeking immigrant caravan, members of the caravan, more than a few hundred asylum seekers, 150 migrants, many of whom were children, asylum-seekers, the people that are waiting outside, these large “caravans” of people, unauthorized immigrants, refugees, people traveling without documents, a caravan of hundreds of Central Americans, a group of about 100 people, Central American migrants and supporters}; {one of the chief critics of DACA, opponents of the policy, some immigration critics, immigration hard-liners, groups who support stricter immigration controls}. F1 (direct / indirect mentions): CoreNLP 27.9 / 31.4; Hier.Clust. 37.2 / 29.1; EECDCR 41.6 / 42.6; MSMA 1.0 44.7 / 40.9; MSMA 2.0 ELMo 42.1 / 40.1; MSMA 2.0 fastText 48.3 / 43.6; MSMA 2.0 word2vec 48.5 / 44.3. Sources: A. Zhukova, F. Hamborg, K. Donnay, and B. Gipp, “Towards a cross-document coreference resolution dataset with linguistically diverse and semantically complex concepts,” manuscript submitted for publication, 2020. A. Zhukova, F. Hamborg, and B. Gipp, “XCoref: Cross-document Coreference Resolution in the Wild,” manuscript submitted for publication, 2020.
  17. Conclusion. • Bias by WCL has a strong influence on readers • Revealing bias is a step towards mitigating it • MSMA 1.0 & 2.0 successfully resolve biased mentions. Newsalyze (for news readers and researchers; soon to be publicly available): https://github.com/fhamborg/newsalyze-backend. Aims: help social sciences with frame analyses; help news readers become aware of bias in media; help make the world a better place. [Figure labels: Objectivity, Frame 1, Frame 2.]
  18. Questions. Thank you for your attention! Contact: Anastasia Zhukova, Zhukova@uni-wuppertal.de, @ana_m_zhukova, http://dke.uni-wuppertal.de/zhukova
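
The head-based merge on slide 10 can be illustrated with a minimal sketch. It groups mentions by head word and then merges head sets whose vectors are close. The function name `merge_by_heads`, the two-dimensional toy vectors, and the 0.7 cosine threshold are assumptions made for illustration; the slides do not state which embedding model or threshold is used at this step.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def merge_by_heads(mentions, head_vectors, threshold=0.7):
    """Group (phrase, head) pairs by head, then greedily merge similar heads.

    Returns a list of (head_set, phrases) tuples. The threshold is a
    hypothetical value, not one taken from the talk.
    """
    by_head = defaultdict(list)
    for phrase, head in mentions:
        by_head[head].append(phrase)

    entities = []
    for head, phrases in by_head.items():
        for heads, members in entities:
            if any(cosine(head_vectors[head], head_vectors[h]) >= threshold
                   for h in heads):
                heads.add(head)
                members.extend(phrases)
                break
        else:
            # No existing entity has a similar head: start a new one.
            entities.append(({head}, list(phrases)))
    return entities


# Toy vectors (assumed): "illegals" and "immigrants" point the same way,
# "aliens" alone points elsewhere, mirroring the UFO remark on the slide.
HEAD_VECTORS = {
    "illegals": (0.9, 0.1),
    "immigrants": (0.8, 0.3),
    "aliens": (0.1, 0.9),
}
MENTIONS = [
    ("young illegals", "illegals"),
    ("DACA illegals", "illegals"),
    ("undocumented immigrants", "immigrants"),
    ("young immigrants", "immigrants"),
    ("illegal aliens", "aliens"),
]

entities = merge_by_heads(MENTIONS, HEAD_VECTORS)
for heads, phrases in entities:
    print(sorted(heads), phrases)
```

With these toy vectors the {illegals} and {immigrants} head sets end up in one entity while {aliens} stays separate, which is the situation the slide describes before its third step.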
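
The merge criterion on slide 11 is plain arithmetic: the share of representative-phrase pairs that count as similar must reach 0.3. The sketch below reproduces that calculation. The helper names are hypothetical, and the row and column placement of the ones and zeros is an assumption, since the slide transcript only preserves the entries 1, 1, 1, 0, 0, 0 for A1–A3 against B1–B2.

```python
def similar_pair_ratio(sim_matrix):
    """Fraction of cells equal to 1 in a binary similarity matrix."""
    total = sum(len(row) for row in sim_matrix)
    if total == 0:
        return 0.0
    return sum(sum(row) for row in sim_matrix) / total


def should_merge(sim_matrix, min_ratio=0.3):
    """Merge two entities when enough representative pairs are similar."""
    return similar_pair_ratio(sim_matrix) >= min_ratio


# Rows A1..A3, columns B1..B2; the cell layout is assumed, the sum (3) is not.
SLIDE_MATRIX = [
    [1, 1],
    [1, 0],
    [0, 0],
]

ratio = similar_pair_ratio(SLIDE_MATRIX)
print(ratio, should_merge(SLIDE_MATRIX))
```

Three similar pairs out of 2 × 3 give a ratio of 0.5, which clears the 0.3 threshold stated on the slide.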
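
The conflict rule for cluster bodies on slide 15 (a mention needs some core point with phrase similarity of at least 0.5, and goes to the core it is most similar to after normalization) might look like the sketch below. This is an illustration under assumptions: `assign_to_core` is a hypothetical name, and "normalized similarity" is read here as the mean similarity to a core's members, which the slide does not define.

```python
def assign_to_core(sim_to_cores, min_sim=0.5):
    """Pick the cluster core for one mention, or None if none qualifies.

    sim_to_cores maps a core id to the list of SP similarities between the
    mention and each member of that core. A core qualifies if at least one
    member reaches min_sim; among qualifying cores, the one with the highest
    mean similarity wins (the mean is an assumed normalization).
    """
    best_core, best_score = None, 0.0
    for core_id, sims in sim_to_cores.items():
        if not sims or max(sims) < min_sim:
            continue
        score = sum(sims) / len(sims)
        if score > best_score:
            best_core, best_score = core_id, score
    return best_core


# A mention that qualifies for both cores: mean 0.6 for CC_i, 0.4 for CC_j.
conflict = {"CC_i": [0.7, 0.8, 0.3], "CC_j": [0.6, 0.2]}
print(assign_to_core(conflict))

# A mention below the 0.5 threshold everywhere is left unassigned.
print(assign_to_core({"CC_i": [0.4, 0.3]}))
```

Mentions that stay unassigned here would be the candidates for the later border-point and non-core steps of the pipeline.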
