
Kek, Cucks, and God Emperor Trump: A Measurement Study of 4chan's Politically Incorrect Forum and Its Effects on the Web



  1. WARNING: CONTENT IN THIS TALK IS OFFENSIVE AND UNCENSORED. 5/17/2017 ICWSM 2017
  2. Measuring 4chan’s Politically Incorrect Forum and Its Effects on the Web. Gabriel Emile Hine, Jeremiah Onaolapo, Emiliano De Cristofaro, Nicolas Kourtellis, Ilias Leontiadis, Riginos Samaras, Gianluca Stringhini, Jeremy Blackburn (@jhblackb)
  3. 4chan
  11. Why Do We Care About 4chan? • It’s a niche site, right?!
  12. What Exactly Is 4chan? • An image board • Conversations grouped into threads • Anonymous • No user accounts • Ephemeral • Dangerous?
  13. Basics • An “original poster” (OP) creates a new thread by making a post • Single image attached • Other users can reply • With or without images, possibly adding references to previous posts, quoted text, etc. • NB: Quotations are only implied by the > symbol, since users regularly use it to put words in other posters’ mouths • No likes, shares, favorites, etc. • No validation from other users except more posts!
  14. Boards • 4chan separates conversation into different areas of interest known as “boards” • We focus on /pol/ (politically incorrect) • We also look at /sp/ (sports) and /int/ (international) for comparison
  15. /pol/ - Politically Incorrect
  16. /pol/ - Politically Incorrect • Extremely lax moderation • Volunteer “janitors” as well as “admins” • Almost anything goes
  17. The Bump System • Limit each board to N live threads • Threads ordered by MRU • A new post in a thread “bumps” it up to the top • Create a new thread → an old thread dies • Bump limit • Maximum number of times a thread can be bumped • Ensures that no discussion will dominate forever • [Figure: CDF and CCDF of the number of posts per thread, by board (/int/, /pol/, /sp/)]
  18. The Bump System (continued) • Just crawling the data is challenging! • We need to build a crawler that works with the ephemeral nature of 4chan
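The bump rules on the two slides above can be illustrated with a toy model. This is a minimal sketch, not 4chan's actual implementation: the `Board` class and the names `max_threads`, `bump_limit`, `new_thread`, and `reply` are hypothetical, and it assumes that a thread past its bump limit still accepts replies but no longer moves to the top.

```python
class Board:
    """Toy model of a bump-ordered board (hypothetical, for illustration)."""

    def __init__(self, max_threads, bump_limit):
        self.max_threads = max_threads  # N live threads allowed on the board
        self.bump_limit = bump_limit    # maximum bumps a single thread can get
        self.order = []                 # thread ids, most recently bumped first
        self.bumps = {}                 # thread id -> bumps received so far
        self._next_id = 1

    def new_thread(self):
        """Create a thread at the top; return (new_id, pruned_id or None)."""
        tid = self._next_id
        self._next_id += 1
        self.order.insert(0, tid)
        self.bumps[tid] = 0
        pruned = None
        if len(self.order) > self.max_threads:
            pruned = self.order.pop()   # the thread at the bottom dies
            del self.bumps[pruned]
        return tid, pruned

    def reply(self, tid):
        """Post in a live thread; bump it unless it has hit the bump limit."""
        if tid not in self.bumps:
            raise KeyError(f"thread {tid} is not live")
        if self.bumps[tid] < self.bump_limit:
            self.bumps[tid] += 1
            self.order.remove(tid)
            self.order.insert(0, tid)
```

In this model a thread that has used up its bumps sinks as other threads are bumped or created, which is why no discussion can stay on top forever.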
  19. Datasets • Crawled /pol/, /sp/, /int/ • June 30 to September 12 • Using the 4chan JSON API • Methodology: • Every 5 minutes, take a snapshot of the catalog • Once a thread is pruned, retrieve its full/final contents from the archive • NB: Archives are pruned after 7 days! • EMAIL US IF YOU WANT TO WORK ON THIS DATA WITH US!!! • Threads: /pol/ 217K, /sp/ 14.4K, /int/ 24.9K, total 256K • Posts: /pol/ 8.3M, /sp/ 1.2M, /int/ 1.4M, total 10.9M
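The snapshot step of this methodology could be sketched as a diff between two consecutive catalog snapshots. The sketch assumes each snapshot is a list of pages, where every page has a `threads` list whose entries carry a thread number under `no` (my understanding of the catalog format returned by the 4chan JSON API). Fetching and archive retrieval are left out, and the function names are hypothetical rather than the authors' crawler.

```python
def live_thread_ids(catalog):
    """Collect the thread numbers present in one catalog snapshot."""
    return {t["no"] for page in catalog for t in page.get("threads", [])}


def pruned_between(previous, current):
    """Thread numbers seen in the previous snapshot but gone from the current one.

    A missing thread may have been pruned or deleted; the caller would then
    try to fetch its final contents from the archive.
    """
    return sorted(live_thread_ids(previous) - live_thread_ids(current))
```

Run every 5 minutes, a diff like this yields the threads whose final contents need to be retrieved before the archive itself is pruned.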
  20. Hate Speech Usage • Crowdsourced dictionary • Manually filtered a bit • /pol/ has by far the most hate speech use • /pol/ 12% • /sp/ 7.3% • /int/ 6.3% • Twitter 2.2%
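A dictionary-based measurement like this one could be sketched as follows. The sketch assumes the percentages are the share of posts containing at least one dictionary term, which the slide does not state explicitly. The function name, the tokenizer, and the placeholder term in the usage note are hypothetical.

```python
import re


def share_with_terms(posts, dictionary):
    """Fraction of posts containing at least one term from the dictionary."""
    terms = {t.lower() for t in dictionary}
    if not posts:
        return 0.0
    hits = 0
    for text in posts:
        # Simple lowercase word tokenization; real post text needs more care.
        words = set(re.findall(r"[a-z0-9']+", text.lower()))
        if words & terms:
            hits += 1
    return hits / len(posts)
```

For example, `share_with_terms(["a badword here", "clean post"], ["badword"])` counts one matching post out of two.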
  21. Raids • Attempts to disrupt another site • Not a DDoS • Disrupt the community that calls the service home, not the service itself • Raids are a favorite pastime on 4chan • “Pool’s closed!” • Have become less “funny” and more “scary” lately
  22. 22. Extended version only Case Study: Operation Google • /pol/ got wind that Google had some anti-hate AI • /pol/ does not take kindly to censorship • Unless they are censoring things of course ;) • /pol/ has beaten AI before • Search for “MS Tay bot” • google = nigger • skype = kike (jew) 5/17/2017 ICWSM 2017 21
  23. (Extended version only) Impact of Operation Google on /pol/ • Huge spike in code word use • No dip in “real” words • Relatively short-lived… • Why?!
  24. (Extended version only) Not That Successful on Twitter
  25. Raids On YouTube Videos • Many YouTube links posted on /pol/ • We’ve anecdotally observed them being raided • Can we find quantitative evidence of raids?!
  26. 26. How Might Raids Happen? • Someone posts a YouTube link • Maybe with a prompt like “you know what to do” • Thread is an aggregation point for raiders • E.g., “Hah! I called that person a nigger!” • If raid is taking place: • Peak in YouTube comments while thread alive? • /pol/ thread and YT comments synchronized? 5/17/2017 ICWSM 2017 25
  27. What Does a YouTube Raid Look Like?
  28. Activity Peaks • YT videos with peaks during 4chan thread • Determined via PDF of commenting time series
  29. Activity Peaks (continued) • 14% of videos see peak commenting activity during the /pol/ thread lifetime
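One way to check whether a video's commenting peak falls inside a thread's lifetime is to histogram the comment timestamps and test whether the busiest bin overlaps the thread interval. This is a rough sketch under that assumption. The slides only say the peak is determined from the PDF of the commenting time series, so the fixed-width binning, the one-hour default, and the function names are all illustrative choices.

```python
def peak_bin(comment_times, bin_width):
    """Return (start, end) of the histogram bin holding the most comments."""
    if not comment_times:
        raise ValueError("no comments")
    origin = min(comment_times)
    counts = {}
    for t in comment_times:
        k = int((t - origin) // bin_width)
        counts[k] = counts.get(k, 0) + 1
    # On ties, prefer the earliest bin.
    best = max(counts, key=lambda k: (counts[k], -k))
    start = origin + best * bin_width
    return start, start + bin_width


def peaks_during_thread(comment_times, thread_start, thread_end, bin_width=3600):
    """True if the busiest commenting bin overlaps the thread's lifetime."""
    start, end = peak_bin(comment_times, bin_width)
    return start < thread_end and end > thread_start
```

Timestamps here are plain numbers (for instance, seconds since the epoch), and the thread lifetime is the interval between its creation and its pruning.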
  30. Synchronization Example • Two series • Second randomly shifted from the first by 0.2 on average
  31. Synchronization Example • Blue lines → per-sample lag • Red area → density of the lags • Peak of density curve = 0.2
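The toy example on these two slides can be reproduced with a small sketch: compute a lag for each event in the first series to its nearest event in the second, then locate the peak of a kernel density estimate over those lags. The nearest-event matching, the Gaussian kernel, the bandwidth, and the function names are assumptions made for illustration, not necessarily the estimator used in the paper.

```python
import bisect
import math


def nearest_lags(a, b):
    """For each event time in a, the signed lag (b - a) to the nearest event in b."""
    b = sorted(b)
    if not b:
        return []
    lags = []
    for t in a:
        i = bisect.bisect_left(b, t)
        candidates = b[max(i - 1, 0): i + 1]  # neighbors on either side of t
        nearest = min(candidates, key=lambda x: abs(x - t))
        lags.append(nearest - t)
    return lags


def density_peak(lags, bandwidth=0.05, steps=200):
    """Location of the maximum of a Gaussian kernel density estimate of the lags."""
    lo, hi = min(lags), max(lags)
    if lo == hi:
        return lo
    best_x, best_density = lo, -1.0
    for k in range(steps + 1):
        x = lo + (hi - lo) * k / steps
        density = sum(math.exp(-0.5 * ((x - lag) / bandwidth) ** 2) for lag in lags)
        if density > best_density:
            best_x, best_density = x, density
    return best_x
```

With a second series built by shifting the first by roughly 0.2, the per-sample lags cluster around 0.2 and the density peak lands close to that value.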
  32. How Can We Validate? • Just because there is synchronization doesn’t mean a raid is going on • In fact, we expect some background noise • We need to come up with a metric to validate • If a raid is happening, we would expect to see elevated levels of hate speech → more hate comments per second • Raiders likely using the same YouTube accounts?
  33. Evidence for Raids
  34. Evidence for Raids • Synchronization → more overlap • I.e., the same YouTube commenters
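Commenter overlap between two videos can be quantified in several ways; a common choice is the Jaccard index of the two sets of commenter IDs. The sketch below assumes that measure for illustration, since the slides do not say which overlap metric was used.

```python
def jaccard(commenters_a, commenters_b):
    """Jaccard index of two collections of commenter IDs (0.0 if both are empty)."""
    a, b = set(commenters_a), set(commenters_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

A value near 1 means the two videos share almost all of their commenters, while a value near 0 means the commenter sets are nearly disjoint.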
  35. Summary • Bump system leads to interesting dynamics • Forces fresh content • /pol/ is a hateful place • Compared to Twitter and even by 4chan standards • There is evidence for raids • High level of synchronization between 4chan/YouTube • High overlap of YouTube commenters • Lots of results I didn’t show • 4chan users are well distributed throughout the world • Staggering amount of original content posted • Check the paper/our extended technical report!
  36. Thanks for your time! Questions?
  37. Backups
  38. Geographic Distribution of Users • /pol/ shows the flag of the country the user is posting from • Normalize the number of threads created per country by that country’s Internet-using population • /pol/ users are really well distributed! • Native English-speaking countries are most highly represented • Plenty of other countries are really well represented too, though!
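The normalization described on this slide amounts to dividing each country's thread count by its number of Internet users. A minimal sketch follows. The per-million scaling, the function name, and the choice to skip countries without a population figure are assumptions, and any numbers passed in would be the caller's own data.

```python
def threads_per_million_users(threads_by_country, internet_users_by_country):
    """Threads created per million Internet users, for countries with known users."""
    rates = {}
    for country, n_threads in threads_by_country.items():
        users = internet_users_by_country.get(country)
        if users:  # skip countries with missing or zero population figures
            rates[country] = n_threads * 1_000_000 / users
    return rates
```

Normalizing this way keeps large countries from dominating the ranking purely because they have more people online.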
  39. Are Flags Trustworthy? • Use spectral clustering of the topics that each country posts about • The clusters follow real-world socio-political blocs • While flags are not perfect, they seem reasonable
  40. Image Usage • 1,003,785 unique images out of 3,216,972 total images • 800 GB worth of images in ~3 months • 70% posted once • 95% no more than 5 times
  41. Most Popular Images on /pol/
