1) The document analyzes over 740,000 tweets from the second 2016 presidential debate between Donald Trump and Hillary Clinton to understand how quotes were recorded, interpreted, and shared on Twitter.
2) The tweets were filtered into collections based on memorable quotes from the debate and analyzed to find variations in how the quotes were reported, interpretive biases in the tweets, and sentiment toward the quotes.
3) The results showed that Twitter users often reported quotes with some variation or commentary, that word trees illustrated how the quotes changed over time, and that sarcastic tweets may have skewed the sentiment analysis of the quotes.
2. Our Project
Question
Is it “fake news” to misquote a presidential candidate by just one word? What about two? Three? When
exactly does “fake” news become fake?
Hypothesis
“Fake news” doesn’t only happen from the top down; it also happens at the very first moment of
interpretation, especially when words are shared on social media networks.
Goals
● Find out how Twitter users were recording, interpreting, and sharing the words spoken by Donald Trump and
Hillary Clinton in real time.
● Find out how the “facts” (the accurate transcription of the words) began to evolve into counterfacts
or alternate versions of their words.
● Find out if there is any interpretive bias and emotional valence in the tweets.
3. Dataset
Data Type: Tweets (~740,000 unique tweets)
Source: Social Feed Manager
Time Range: During and immediately after the Second Presidential Debate (10/09/2016)
Search terms: #debate, #debates, #debatenight, #debate2016, #debates2016, @HillaryClinton,
@realDonaldTrump, @debates
4. The Data Processing
Create Collections
Filter the JSON data based on several memorable debate quotes/topics
Collection 1 Quotes: “That was locker room talk.”
Keywords: locker room, locker-room, lockerroom
Collection 2 Quotes: “Nobody has more respect for women than I do.”
Keywords: respect for women
Collection 3 Quotes: “Because you’d be in jail.”
Keywords: jail
Collection 4 Quotes: “You need both a public and private position on certain issues.”
Keywords: public position, private position
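The keyword filter above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes the export is one JSON object per line with the tweet body under a `text` (or `full_text`) key, and the collection names and the helper functions `assign_collections` and `filter_tweets` are hypothetical.

```python
import json

# Keyword lists copied from the four collections above.
# The dictionary keys are hypothetical labels chosen for this sketch.
COLLECTION_KEYWORDS = {
    "locker_room": ["locker room", "locker-room", "lockerroom"],
    "respect_for_women": ["respect for women"],
    "jail": ["jail"],
    "public_private": ["public position", "private position"],
}


def assign_collections(tweet_text):
    """Return the names of every collection whose keywords occur in the text."""
    lowered = tweet_text.lower()
    return [
        name
        for name, keywords in COLLECTION_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    ]


def filter_tweets(json_lines):
    """Group tweet texts by collection.

    Assumes one JSON object per line with the body under "full_text" or "text";
    the real export format may differ.
    """
    collections = {name: [] for name in COLLECTION_KEYWORDS}
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        tweet = json.loads(line)
        text = tweet.get("full_text") or tweet.get("text") or ""
        for name in assign_collections(text):
            collections[name].append(text)
    return collections
```

Matching is a case-insensitive substring test, so a tweet can land in more than one collection, and a broad keyword such as "jail" will also pick up tweets that mention the word without quoting the candidate.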
5. Processing the Data
Pre-processing
change into lowercase
remove hashtags, mentions, URLs
remove stopwords
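A minimal sketch of these three pre-processing steps, using only the standard library. The stopword set here is a tiny illustrative sample rather than the list the project used, and the regular expressions and the `preprocess` helper are assumptions made for this example.

```python
import re

# Tiny illustrative stopword set (an assumption); a real run would use a fuller list.
STOPWORDS = {
    "a", "an", "the", "is", "was", "that", "than", "i", "do",
    "to", "of", "and", "in", "for", "has",
}

URL_RE = re.compile(r"https?://\S+")   # URLs
TAG_RE = re.compile(r"[#@]\w+")        # hashtags and mentions
TOKEN_RE = re.compile(r"[a-z']+")      # lowercase word tokens


def preprocess(text):
    """Lowercase, strip URLs, hashtags and mentions, then drop stopwords."""
    text = text.lower()
    text = URL_RE.sub(" ", text)       # remove URLs first so '#' fragments go with them
    text = TAG_RE.sub(" ", text)
    return [token for token in TOKEN_RE.findall(text) if token not in STOPWORDS]
```

For example, `preprocess("That was LOCKER ROOM talk! #debate @realDonaldTrump https://t.co/abc")` returns `['locker', 'room', 'talk']`.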
Tweet Variance
use TF-IDF (scikit-learn) to create the term vectors
calculate the cosine similarity among selected tweets
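The variance step can be illustrated with scikit-learn's `TfidfVectorizer` and `cosine_similarity`. This is a sketch under default vectorizer settings, which may not match the settings the project used; the `pairwise_similarity` helper is hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def pairwise_similarity(tweets):
    """Return an n x n matrix of cosine similarities between TF-IDF tweet vectors."""
    vectorizer = TfidfVectorizer()             # default tokenization and weighting
    matrix = vectorizer.fit_transform(tweets)  # one sparse row per tweet
    return cosine_similarity(matrix)           # dense array, 1.0 on the diagonal


tweets = [
    "that was locker room talk",
    "that was just locker room banter",
    "nobody has more respect for women",
]
similarities = pairwise_similarity(tweets)
```

Placing the accurate transcription of a quote in the first row makes `similarities[0]` a measure of how far each tweet drifts from the original wording: values near 1 indicate near-verbatim reporting, values near 0 indicate little shared vocabulary.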
Sentiment
calculate the sentiment value (nltk.sentiment.vader)
Topic Analysis
create topics in each collection (# of topics: 3, # of words / topic: 8) (gensim)
9. Conclusion
● Twitter users were recording, interpreting, and sharing the words spoken by
Donald Trump and Hillary Clinton in real time -- often with some variation or
comment
○ Sarcastic/insincere comments likely skewed sentiment analysis
● Further research would require improving the methods for cleaning the data
and analyzing how quotes changed over a longer period of time, how
those interpretations were reflected in other outlets, and how influential
variances and interpretive biases were in shaping public understanding of
what the candidates said, compared to deliberate “fake news”