Fine-Grained Controversy Detection in Wikipedia
Siarhei Bykau (Purdue University)
Flip Korn (Google Research)
Divesh Srivastava (AT&T Labs-Research)
Yannis Velegrakis (University of Trento)
Wikipedia: The Wisdom of Crowds
● Collaborative Content Creation [Giles 2005]
– Up-to-date
– Pluralistic
– Neutral point of view
● Data Quality Problems:
– Reputation & Trust [Adler and Alfaro 2007, Adler et al. 2008]
– Vandalism [Chin et al. 2010, Potthast et al. 2008, Smets et al. 2008]
– Stability [Druck et al. 2008]
– Controversy
Controversy
● A prolonged dispute by a number of people on the same topic *
● Should be distinguished from:
– regular edits
– vandalism
● Detecting it helps in:
– preserving the neutral point of view (NPOV)
– requesting supporting evidence
* http://en.wikipedia.org/wiki/Controversy
Arab-Israeli Conflict
● Sensitive page, rife with controversial content
– Number of casualties, Israeli per-capita GDP, etc.
The Beatles
● Non-sensitive page, with controversial content
– Should it be “The Beatles” or “the Beatles”?
Caesar salad
Controversy Detection: Related Work
● machine learning [Kittur et al. 2007]
– # of revisions, # of unique authors, page length
● mutual reinforcement principle [Vuong et al. 2008]
– content is more controversial if page’s controversy is low
● bipolarities in the edit graph [Sepehri Rad and Barbosa 2011]
– nodes = authors
– edges = one author deletes/reverts content written by another
● revert statistics [Yasseri et al. 2012]
– number of authors who revert an article back to a previous version
Controversy Detection: Related Work
● None of these methods detects fine-grained controversies:
– WHERE a controversy is located
– WHO is involved in a controversy
– WHEN a controversy occurred
– WHAT the arguments of a controversy are
Caesar salad
● Previous work only detects that the Caesar salad page is controversial
The history of this popular salad is a controversial issue, even in the spelling of the name. There is a widely held misconception that it is named after [[Julius Caesar]], but the salad's creation is generally attributed to restaurateur '''[[Cesar Cardini]]''' (an [[Italy|Italian]]-born Mexican). As his daughter Rosa (1928–2003) reported,[2] her father invented the dish when a Fourth of July 1924 rush depleted the kitchen's supplies. Cardini made do with what he had, adding the dramatic flair of the table-side tossing "by the chef".

The history of this popular salad is a controversial issue, even in the spelling of the name. There is a widely held misconception that it is named after '''[[Cesar Cardini]]''', but the salad's creation is generally attributed to [[Julius Caesar]] (an [[Italy|Italian]]-born emperor). As his daughter Rosa (1928–2003) reported,[2] her father invented the dish when a Fourth of July 1924 rush depleted the kitchen's supplies. Cardini made do with what he had, adding the dramatic flair of the table-side tossing "by the chef".

- What are the different alternatives?
- When did the controversy occur?
- Who created the salad?
- After whom is it named?
Challenge: Fine-grained Controversies
● Controversies are typically expressed via substitutions
– Not Insertions/Deletions
– Alternating content
...There is a widely held misconception that it is named after [[Julius Caesar]], but the salad's creation is generally attributed to restaurateur '''[[Cesar Cardini]]''' (an [[Italy|Italian]]-born Mexican). As his daughter Rosa (1928–2003) reported,...

...There is a widely held misconception that it is named after '''[[Cesar Cardini]]''', but the salad's creation is generally attributed to [[Julius Caesar]] (an [[Italy|Italian]]-born emperor). As his daughter Rosa (1928–2003) reported,...
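
A minimal Python sketch of how substitutions might be separated from insertions and deletions between two revisions. It is an illustration, not the authors' implementation: the `tokenize` and `substitutions` helpers and the `max_tokens` parameter name are assumptions, and Python's `difflib.SequenceMatcher` is used as a stand-in for whatever diff algorithm the real system uses.

```python
import re
from difflib import SequenceMatcher


def tokenize(wikitext):
    """Split wikitext into tokens, keeping each [[wiki link]] as one token."""
    return re.findall(r"\[\[[^\]]*\]\]|[^\s\[\]]+", wikitext)


def substitutions(old, new, max_tokens=2):
    """Return (old_tokens, new_tokens) pairs where text was replaced in place.

    Pure insertions and deletions are ignored, as are replacements that span
    more than `max_tokens` tokens on either side (hypothetical size limit).
    """
    a, b = tokenize(old), tokenize(new)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    found = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace" and i2 - i1 <= max_tokens and j2 - j1 <= max_tokens:
            found.append((a[i1:i2], b[j1:j2]))
    return found


old_rev = "the salad is named after [[Julius Caesar]] and was created in [[Mexico]]"
new_rev = "the salad is named after [[Cesar Cardini]] and was created in [[Mexico]]"
print(substitutions(old_rev, new_rev))
# prints [(['[[Julius Caesar]]'], ['[[Cesar Cardini]]'])]
```

The simplified sentences in the usage example are made up for illustration; on real wikitext the diff boundaries depend on the tokenizer and the diff algorithm.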
Challenge: Track Topic across Revisions
● Positions of edits change significantly across revisions
● Text is ambiguous
● Surrounding context of edit clarifies semantics
– Edits with same or similar context likely refer to the same topic
...There is a widely held misconception that it is named after [[Julius Caesar]], but the salad's creation is generally attributed to restaurateur '''[[Cesar Cardini]]''' (an [[Italy|Italian]]-born Mexican). As his daughter Rosa (1928–2003) reported,...

...There is a widely held misconception that it is named after '''[[Cesar Cardini]]''', but the salad's creation is generally attributed to [[Julius Caesar]] (an [[Italy|Italian]]-born emperor). As his daughter Rosa (1928–2003) reported,...
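
The idea of matching edits by their surrounding context can be sketched as follows. This is an assumption-laden illustration: the slides do not say which similarity measure is used, so Jaccard similarity over token sets is chosen here, and the helper names `context`, `jaccard` and `same_topic` are hypothetical. The `radius` and `threshold` defaults mirror the "radius of context" and "context similarity" parameters listed in the evaluation setup.

```python
def context(tokens, start, end, radius=8):
    """Tokens within `radius` positions around the edit tokens[start:end].

    The edited tokens themselves are excluded, so two competing alternatives
    at the same spot end up with the same context.
    """
    left = tokens[max(0, start - radius):start]
    right = tokens[end:end + radius]
    return left + right


def jaccard(ctx_a, ctx_b):
    """Jaccard similarity of two contexts treated as token sets (assumed measure)."""
    a, b = set(ctx_a), set(ctx_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def same_topic(ctx_a, ctx_b, threshold=0.75):
    """Treat two edits as the same topic when their contexts are similar enough."""
    return jaccard(ctx_a, ctx_b) >= threshold


# The edit position shifts by two tokens between revisions, but the context matches.
rev1 = "it is named after X but the creation is attributed to Y".split()
rev2 = "note that it is named after Z but the creation is attributed to Y".split()
ctx1 = context(rev1, 4, 5, radius=4)
ctx2 = context(rev2, 6, 7, radius=4)
print(same_topic(ctx1, ctx2))  # prints True
```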
Challenge: Distinguish from Other Edits
● Cardinality
– # of edits
● Duration
– Lifespan of a controversy
● Plurality
– # of distinct authors
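
As a rough illustration, the three properties above could be computed for a group of related edits as shown below. The `Edit` record and its fields are hypothetical; the slides do not describe how edits are represented or how the three values are combined into a ranking.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Edit:
    """Hypothetical record of one edit: who made it and when."""
    author: str
    timestamp: datetime


def cardinality(edits):
    """Number of edits in the group."""
    return len(edits)


def duration(edits):
    """Time between the first and the last edit in the group."""
    if not edits:
        return timedelta(0)
    stamps = [e.timestamp for e in edits]
    return max(stamps) - min(stamps)


def plurality(edits):
    """Number of distinct authors in the group."""
    return len({e.author for e in edits})


edits = [
    Edit("alice", datetime(2013, 1, 1)),
    Edit("bob", datetime(2013, 1, 3)),
    Edit("alice", datetime(2013, 1, 11)),
]
print(cardinality(edits), duration(edits).days, plurality(edits))  # prints 3 10 2
```

A single vandal reverting the same text many times would score high on cardinality but low on plurality, which is one way these properties can help separate controversies from other edits.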
Challenge: Variability of Text Content
● sequence of wiki links, not words
– Link -> semantic concept
– Wikipedia encourages a high density of wiki links
olive oil Worcestershire sauce
Julius Caesar Cesar Cardini Italy
Mexican Hollywood
olive oil Worcestershire sauce
Caesar Cadini Julius Caesar
Caesar Cadini Italy Hollywood
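
A small sketch of turning a revision into a sequence of wiki-link targets, as in the two link sequences above. The regular expression and the `link_sequence` helper are assumptions made for illustration; a regex is only a simplistic stand-in for a real wikitext parser and does not filter out file or category links.

```python
import re

# [[Target]], [[Target|label]] or [[Target#Section|label]]; group 1 is the target.
WIKI_LINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def link_sequence(wikitext):
    """Return the link targets of a revision in document order."""
    return [m.group(1).strip() for m in WIKI_LINK.finditer(wikitext)]


text = ("attributed to restaurateur '''[[Cesar Cardini]]''' "
        "(an [[Italy|Italian]]-born Mexican), not [[Julius Caesar]]")
print(link_sequence(text))
# prints ['Cesar Cardini', 'Italy', 'Julius Caesar']
```

Using the target rather than the visible label means "[[Italy|Italian]]" and "[[Italy]]" map to the same concept.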
Challenge: Large Number of Revisions
● 4.5 million content pages, about 100 million revisions, 7 TB of data
● scalable controversy detection algorithm (CDA)
● Input: a Wikipedia page with its revision history
– Edit extraction // use Myers' diff algorithm, find substitutions
– Eliminate edits with low user support
– Cluster edits based on context // use DBSCAN for efficiency
– Cluster and merge the sets of edits based on the subject
● Output: ranked clusters of edits which represent controversies
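
The context-clustering step can be sketched with a small pure-Python DBSCAN. This is a sketch under stated assumptions, not the CDA implementation: Jaccard distance over context tokens is an assumed metric, `eps=0.25` is chosen to correspond to a context similarity of 0.75, `min_pts=2` is an arbitrary choice, and the quadratic neighbor search does not reflect whatever indexing the real system relies on for scale.

```python
def jaccard_distance(ctx_a, ctx_b):
    """1 - Jaccard similarity of two contexts treated as token sets."""
    a, b = set(ctx_a), set(ctx_b)
    union = a | b
    return 0.0 if not union else 1.0 - len(a & b) / len(union)


def dbscan(contexts, eps=0.25, min_pts=2):
    """Label each context with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(contexts)

    def neighbors(i):
        # A point counts as its own neighbor, as in the usual DBSCAN definition.
        return [j for j in range(n)
                if jaccard_distance(contexts[i], contexts[j]) <= eps]

    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # core point: keep growing the cluster
    return labels


contexts = [
    ["named", "after", "but", "the"],
    ["named", "after", "but", "the"],
    ["named", "after", "but", "the", "salad"],
    ["olive", "oil", "sauce"],
]
print(dbscan(contexts))  # prints [0, 0, 0, -1]
```

Each resulting cluster groups edits that touch the same spot in the text across revisions; the isolated edit is left as noise.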
Experimental Evaluation Setup
● Dataset: English-language Wikipedia dump from December 2013
– 4.5 million content pages, about 100 million revisions, 7 TB of data
● Implemented CDA in Java, used JWPL parser to discover links
– Baseline identifies controversies based on the number of revisions
Parameter                    Range           Default value
model                        link, text      link
radius of context            2, 4, 6, 8      8
max tokens in substitution   1, 2, 3, 4, 5   2
context similarity           [0...1]         0.75
number of authors            1, 2, 3, 4, 5   2
Sources of Controversy
● Wikipedia Provided Controversies (WPC)
– Metric: recall
● User surveys
– Metrics: noise/signal ratio, top-1 precision, # of distinct controversies
Recall on selected WPC
● Baseline: adapted from [Kittur et al. 2007]
● Text model has higher recall than link model; baseline is worst
Recall on full WPC using Text Model
● Text model retrieves 117 out of 263 WPCs in the top-10 results
– Clean controversies do not contain irrelevant substitutions
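
For reference, 117 out of 263 is a recall of roughly 44%. The sketch below shows one way recall within the top-k results could be computed. The data layout (topic labels per page) and the `recall_at_k` helper are hypothetical, and the example data is made up; the slides do not describe how a detected cluster is matched against a WPC entry.

```python
def recall_at_k(ranked_by_page, known_by_page, k=10):
    """Fraction of known controversies that appear in the top-k results of their page."""
    total = sum(len(topics) for topics in known_by_page.values())
    if total == 0:
        return 0.0
    found = 0
    for page, topics in known_by_page.items():
        top_k = set(ranked_by_page.get(page, [])[:k])
        found += sum(1 for topic in topics if topic in top_k)
    return found / total


# Made-up example: ranked detections and known controversies per page.
ranked = {"Page A": ["topic 2", "topic 1", "topic 3"], "Page B": ["topic 5"]}
known = {"Page A": ["topic 1"], "Page B": ["topic 4"]}
print(recall_at_k(ranked, known, k=10))  # prints 0.5
print(round(117 / 263, 3))               # prints 0.445
```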
New Previously Unknown Controversies
Page              WPC             New controversies
Chopin            nationality     birthday, photo, name
Avril Lavigne     song spelling   music genre, birthplace, religion
Bolzano           name spelling   language
Futurama          verb spelling   TV, seasons, channel
Freddie Mercury   origin          name spelling, image
Precision
● Link model has considerably higher precision than text model
– For many (cardinality, duration, plurality) ranking functions
[Figure: precision plots, panels "Link Model" and "Text Model"]
Substitutions vs Insertions/Deletions
Metric                        link   text   link ins/del   text ins/del   baseline
noise/signal                  0.19   0.25   0.64           0.57           0.75
# of distinct controversies   65     80     29             25             17
● Link model with substitutions has the lowest noise/signal ratio
● Models with insertions/deletions have a very high noise/signal ratio
● Text model with substitutions finds the highest # of controversies
● Models with insertions/deletions find a low number of controversies
Experiment Takeaways
● Text model with substitutions has a higher recall
– Able to retrieve 23% more controversies among WPC
● Link model with substitutions has a much higher precision
– Use of semantic concepts in wiki links doubles the precision
● Cardinality, duration, plurality: good ranking functions
– Validates the definition of controversy
Conclusions
● Detection of fine-grained controversies in Wikipedia
– answers the Where, What, Who and When questions
● Link model generates more semantically meaningful controversies than text model
● Experimental evaluation shows the efficiency and effectiveness of the proposed solutions
