1. MSR Presentation on
RUMOR DETECTION ON REAL-TIME TWITTER DATA USING SUPERVISED LEARNING
Presented By:-
Patel Divya M.
M.E. (Information Technology)
Enroll. No. : 160430723010
SHANTILAL SHAH ENGINEERING COLLEGE, BHAVNAGAR
Guided By:-
Dr. Dinesh B. Vaghela
Asst. Prof. of Information Technology Dept.
GUJARAT TECHNOLOGICAL UNIVERSITY
2. Outline
• Introduction
• Research Topic: Rumor Detection
• Research work: Objective
• Literature Review
• Problem Statement
• Implementation Strategy
• Implementation Environment
• Conclusion
• Future Work
• References
3. Introduction
• Twitter is the most popular micro-blogging service on social media [1].
• Ordinary people have a direct platform to share information and their opinions about
news events and any other topic [1].
• Not all the information posted on Twitter is correct or useful in informing other people
about an event [1].
4. Introduction: Rumor Detection
• What is a rumor?
An unverified statement that starts from one or more sources and spreads
over time [2].
A rumor can end in three ways: it can be resolved as true, resolved as false, or remain
unresolved [2].
• So, it is necessary to provide solutions for detecting this kind of activity as it spreads on
social media.
5. Research work: Objective
Phase 1:
• To survey the current methods and models available for detecting rumors.
• To study and analyze different methods of rumor detection on real-time Twitter data.
Phase 2:
• To design a new model/approach for detection of rumors.
Phase 3:
• To implement the proposed model/approach for detection of rumors.
• To evaluate the performance of rumor detection on Twitter with the proposed model.
6. Literature Review
Paper Name: “Towards Automated Real-Time Detection of Misinformation on Twitter”
Authors: Suchita Jain, Vanya Sharma and Rishabh Kaushal [3]
Publisher / Journal Name: IEEE, 2016.
Proposed Model:
• Provides an approach to automatically detect misinformation or rumors on Twitter in
real time.
• The approach rests on the assumption that verified news channel accounts on Twitter
give more credible information than the public accounts of ordinary users.
Observation:
• Accuracy is calculated from the tweets retrieved from both news channels and
general users.
Limitation:
• The feature selection/extraction step is missing.
7. Paper Name: “Automatic detection of Rumoured Tweets and finding its Origin”
Authors: Sahana V P, Alwyn R Pias, Richa Shastri, and Shweta Mandloi [4]
Publisher / Journal Name: IEEE, 2015.
Proposed Model:
• Focused on the topic “London Riots in 2011”.
• The methodology has three main sections: data, feature extraction, and
classification.
• Used 20 features based on tweet content and user accounts.
• Trained a classifier to classify the tweets, using the Weka tool for classification.
• Also proposed an algorithm to find the origin of the rumored tweets, i.e. to
obtain the account information of the user who first started spreading the
rumor on Twitter.
Observation:
• The J48 decision tree classification algorithm achieved the best accuracy.
• The reported recall is high, at 0.877.
Limitation:
• Focused on only one specific rumor topic.
• Real-time Twitter data were not considered.
8. Paper Name: “Detection and Analysis of 2016 US Presidential Election Related Rumors on
Twitter”
Authors: Zhiwei Jin, Juan Cao, Han Guo, Yongdong Zhang, Yu Wang, and Jiebo Luo [5]
Publisher / Journal Name: Springer, 2017.
Proposed Model:
• Focused on the 2016 U.S. presidential election.
• Presented an analysis of rumor tweets from the followers of the two
presidential candidates, Hillary Clinton and Donald Trump.
• Detected rumor tweets by matching a large number of tweets related
to the presidential election against verified rumor articles.
• Collected over 8 million tweets from the followers of the two
candidates.
• Compared the performance of five matching algorithms on the rumor
detection task: TF-IDF, BM25, Word2Vec, Doc2Vec, and a lexicon-based
algorithm.
Observation:
• The highest result reported for their detection algorithm is a precision of 94.7%.
Limitation:
• Focused on only one specific topic, i.e. “2016 US Presidential Election related
rumors”.
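The matching idea summarized for [5], comparing each tweet against a set of verified rumor articles, can be illustrated with a minimal TF-IDF cosine-similarity sketch. This is not the authors' implementation: the tokenizer, the smoothed IDF formula, the sample texts, and all function names are assumptions made for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_idf(documents):
    """Compute a smoothed inverse document frequency for every term."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))
    # Smoothed IDF (an assumption): log((1 + N) / (1 + df)) + 1, always positive.
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(text, idf):
    """Map a text to a sparse TF-IDF vector; terms unseen in the articles are ignored."""
    counts = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_match(tweet, articles):
    """Return (index, similarity) of the article most similar to the tweet."""
    idf = build_idf(articles)
    tweet_vec = tfidf_vector(tweet, idf)
    scores = [cosine(tweet_vec, tfidf_vector(a, idf)) for a in articles]
    best = max(range(len(articles)), key=scores.__getitem__)
    return best, scores[best]


# Hypothetical verified rumor articles and a hypothetical tweet.
articles = [
    "Candidate secretly owns a company that sells voting machines",
    "Celebrity endorses the candidate in a leaked private video",
]
tweet = "heard the candidate owns a voting machines company?!"
index, score = best_match(tweet, articles)
print(index, round(score, 3))
```

In practice a tweet would be flagged as a rumor only when the best similarity exceeds a chosen threshold; that threshold is a tuning decision not specified in the summary above.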
9. Paper Name: “Automatic Detection of Rumor on Social Network”
Authors: Qiao Zhang, Shuiyuan Zhang, Jian Dong, Jinhua Xiong, and Xueqi Cheng [6]
Publisher / Journal Name: Springer, 2015.
Proposed Model:
• Proposed an automatic rumor detection method based on a combination
of newly proposed implicit features and shallow features of the messages.
• The method is divided into three parts: data cleaning, feature extraction, and model
training.
• Used user-based implicit features and content-based implicit features.
• Used several supervised models, such as Support Vector
Machine and Random Forest.
Observation:
• Results show that the Implicit-Content-Based method improves significantly
on the Shallow-Content-Based method, with a
10.5% improvement in precision and 4.7% in recall.
Limitation:
• User credibility.
• Rumors are detected only on Chinese micro-blogging services.
10. Paper Name: “Detecting Rumors on Online Social Networks Using Multi-layer Autoencoder”
Authors: Yan Zhang, Weiling Chen, Chai Kiat Yeo, Chiew Tong Lau, Bu Sung Lee [7]
Publisher / Journal Name: IEEE, 2017.
Proposed Model:
• Proposed an anomaly detection method based on an autoencoder to perform
rumor detection.
• Used Sina Weibo, which is the most popular microblog in China.
• Proposed several self-adapting thresholds, which are calculated from
the properties of each recent Weibo set.
Observation:
• Results show that the autoencoder model achieves good accuracy (88%),
a good F1 score (82%), and a low false positive rate (7%).
Limitation:
• Rumors are detected only on Chinese micro-blogging services.
• The autoencoder with two hidden layers gives the best performance.
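The reviewed papers report accuracy, precision, recall, F1, and false positive rate, which are distinct measures. As a minimal sketch of how each is derived from a confusion matrix, assuming "rumor" is the positive class, the following uses hypothetical labels and predictions; the function names are also illustrative and do not come from any of the papers.

```python
def confusion_counts(y_true, y_pred, positive="rumor"):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def evaluate(y_true, y_pred, positive="rumor"):
    """Compute the metrics quoted in the literature review."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    precision = safe_div(tp, tp + fp)   # share of flagged tweets that are rumors
    recall = safe_div(tp, tp + fn)      # share of rumors that were flagged
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
        "false_positive_rate": safe_div(fp, fp + tn),
    }


# Hypothetical ground truth and predictions for eight tweets.
y_true = ["rumor", "rumor", "rumor", "non-rumor",
          "non-rumor", "non-rumor", "non-rumor", "non-rumor"]
y_pred = ["rumor", "rumor", "non-rumor", "rumor",
          "non-rumor", "non-rumor", "non-rumor", "non-rumor"]
metrics = evaluate(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because rumor datasets are often imbalanced, accuracy alone can look high even when few rumors are caught, which is why precision, recall, and F1 are usually reported alongside it.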
11. Literature Review
Sr. No. 1
Title: Towards Automated Real-Time Detection of Misinformation on Twitter [3]
Algorithm/Technique Used: Sentiment and semantic analysis
Advantages:
• Detects rumors on Twitter using tweets from verified news channels as a base.
• Detects rumors especially in critical times of emergency.
Disadvantages:
• The result is based only on the semantic and sentiment analysis of the tweets.
• No features and no classification techniques were used to detect rumors.
Sr. No. 2
Title: Automatic detection of Rumoured Tweets and finding its Origin [4]
Algorithm/Technique Used: J48 decision tree classifier
Advantages:
• Automatically detects the spread of rumored tweets.
Disadvantages:
• Focused on one specific rumor topic.
• Real-time Twitter data were not considered.
Sr. No. 3
Title: Detection and Analysis of 2016 US Presidential Election Related Rumors on Twitter [5]
Algorithm/Technique Used: TF-IDF and BM25, Word2Vec and Doc2Vec, lexicon matching
Advantages:
• Detects rumor tweets from the aspects of people, content, and time.
• The detection algorithm helps in understanding rumors during political events.
Disadvantages:
• Focused on only one specific topic, i.e. “2016 US Presidential Election related rumors”.
12. Literature Review (continued)
Sr. No. 4
Title: Automatic Detection of Rumor on Social Network [6]
Algorithm/Technique Used: Support vector machine
Advantages:
• Best result as compared to shallow features.
Disadvantages:
• User credibility.
• Rumors are detected only on Chinese micro-blogging services.
Sr. No. 5
Title: Detecting Rumors on Online Social Networks Using Multi-layer Autoencoder [7]
Algorithm/Technique Used: Autoencoder (artificial neural network)
Advantages:
• A multi-layer autoencoder is used.
• Self-adapting thresholds are used to distinguish rumors from non-rumors.
Disadvantages:
• Rumors are detected only on Chinese micro-blogging services.
• An unsupervised learning method is used to detect rumors.
Sr. No. 6
Title: Rumor Detection and Classification for Twitter Data [8]
Algorithm/Technique Used: J48 decision tree classifier
Advantages:
• Detects rumors as a type of misinformation propagation.
• Performs rumor detection and classification (RDC) within the context of microblog
social media.
Disadvantages:
• The result is no better than that of the pre-processing method applied to the algorithm.
• Real-time Twitter data were not considered.
Table 1: Comparative analysis of Literature
13. Problem Statement
• An advantage of social media is that everyone can share information and
give their opinions on the platform.
• The downside of such rapid diffusion of information is that false information
also spreads.
• Because rumors spread so fast and so easily on Twitter and other social media,
we need to provide solutions to detect them.
14. Proposed Work
Dataset Collection → Pre-processing → Feature Extraction → Classification
Figure 1: Basic steps for Proposed method
16. Data Pre-processing
• Remove all URLs (e.g. www.xyz.com), hashtags (e.g. #topic), and targets
(@username)
• Correct spelling; handle sequences of repeated characters
• Remove all punctuation, symbols, and numbers
• Remove stop words
• Remove non-English tweets
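The pre-processing steps listed above can be sketched as follows. This is a minimal illustration, not the presented implementation: the stop-word list, the ASCII-ratio language check, and the function names are assumptions, and spelling correction is reduced to collapsing runs of repeated characters because full spell checking needs an external dictionary.

```python
import re

# Small illustrative stop-word list (an assumption); a real system would use a fuller list.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "to", "of", "in",
              "on", "and", "it", "this", "that", "for"}

URL_RE = re.compile(r"(?:https?://|www\.)\S+")   # URLs such as www.xyz.com
TAG_RE = re.compile(r"[#@]\w+")                  # hashtags and @username targets
REPEAT_RE = re.compile(r"(.)\1{2,}")             # three or more repeated characters
NON_LETTER_RE = re.compile(r"[^a-z\s]")          # punctuation, symbols, numbers


def is_probably_english(text, threshold=0.9):
    """Crude language filter (an assumption): share of ASCII letters among all letters."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False
    ascii_letters = sum(1 for ch in letters if ch.isascii())
    return ascii_letters / len(letters) >= threshold


def preprocess(tweet):
    """Return a list of cleaned tokens, or None if the tweet looks non-English."""
    text = URL_RE.sub(" ", tweet)
    text = TAG_RE.sub(" ", text)
    if not is_probably_english(text):
        return None
    text = text.lower()
    text = REPEAT_RE.sub(r"\1\1", text)      # e.g. "sooooo" becomes "soo"
    text = NON_LETTER_RE.sub(" ", text)
    return [tok for tok in text.split() if tok not in STOP_WORDS]


# Hypothetical example tweet.
print(preprocess("Sooooo scary!!! The bridge is CLOSED, see www.xyz.com #topic @username 2018"))
```

URLs are removed before punctuation so that address fragments do not survive as stray tokens, and the language check runs before lowercasing and symbol removal so that non-Latin scripts are still visible to it.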
21. • Our proposed approach is divided into three steps: 1) Pre-processing, 2) Sentiment
Analysis, and 3) Classification.
• In the first step, we pre-process the real-time tweets to determine the topic
of each input tweet.
• In the second step, we find the sentiment polarity of each tweet by using
a sentiment score.
• In the final step, we apply this sentiment score as an input to different
classification algorithms.
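The second and third steps above can be sketched as a lexicon-based sentiment score that is then fed to a classifier. Everything below is an assumption for illustration: the tiny lexicon, the toy training tweets and their labels, and the use of a one-feature decision stump as a stand-in for the "different classification algorithms" mentioned in the slide.

```python
# Tiny hypothetical sentiment lexicon; a real system would use a published lexicon.
LEXICON = {"good": 1, "safe": 1, "confirmed": 1, "great": 1,
           "bad": -1, "fake": -1, "panic": -1, "dead": -1, "hoax": -1}


def sentiment_score(tokens):
    """Mean lexicon polarity over all tokens, in [-1, 1]; 0.0 for an empty tweet."""
    if not tokens:
        return 0.0
    return sum(LEXICON.get(tok, 0) for tok in tokens) / len(tokens)


def train_stump(scores, labels):
    """Fit a one-feature threshold classifier (decision stump) by exhaustive search.

    Returns (threshold, label_below, label_at_or_above) with the fewest training errors.
    """
    classes = sorted(set(labels))
    best = None
    for threshold in sorted(set(scores)):
        for below in classes:
            for above in classes:
                errors = sum(
                    (below if s < threshold else above) != y
                    for s, y in zip(scores, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, threshold, below, above)
    return best[1:]


def predict(stump, score):
    """Classify a single sentiment score with a trained stump."""
    threshold, below, above = stump
    return below if score < threshold else above


# Hypothetical pre-processed training tweets with hand-assigned labels.
train = [
    (["bridge", "collapse", "fake", "panic"], "rumor"),
    (["hoax", "dead", "celebrity"], "rumor"),
    (["road", "confirmed", "safe"], "non-rumor"),
    (["great", "news", "confirmed"], "non-rumor"),
    (["match", "starts", "today"], "non-rumor"),
]
scores = [sentiment_score(tokens) for tokens, _ in train]
labels = [label for _, label in train]
stump = train_stump(scores, labels)

print(predict(stump, sentiment_score(["fake", "hoax", "alert"])))
print(predict(stump, sentiment_score(["confirmed", "safe"])))
```

In this toy data negative sentiment coincides with the rumor label only by construction; on real tweets the sentiment score would normally be one feature among several, and the stump would be replaced by the classifiers under comparison.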
22. • We compare the proposed approach with a news-website approach on different
specific rumor topics.
• If both give the same result, we take this as evidence that our approach gives better accuracy.
• This comparison also provides verification of the rumor topic.
28. Conclusion
• A study of different research papers on rumor detection shows that different methods
are used to detect rumors and that many classifiers are available for the task.
This research work can be useful for detecting rumors on the Twitter platform
efficiently and accurately.
30. References
[1] Anubrata Das, Moumita Roy, Soumi Dutta, Saptarshi Ghosh, and Asit Kumar Das. “Predicting Trends in the Twitter Social Network: A Machine Learning Approach”, Springer International Publishing Switzerland, 2015.
[2] Soroush Vosoughi. “Automatic Detection and Verification of Rumors on Twitter”, PhD Thesis, June 2015.
[3] Suchita Jain, Vanya Sharma, and Rishabh Kaushal. “Towards Automated Real-Time Detection of Misinformation on Twitter”, Intl. Conference on Advances in Computing, Communications and Informatics (ICACCI), IEEE, 2016.
[4] Sahana V P, Alwyn R Pias, Richa Shastri, and Shweta Mandloi. “Automatic detection of Rumoured Tweets and finding its Origin”, Intl. Conference on Computing and Network Communications (CoCoNet'15), IEEE, 2015.
[5] Zhiwei Jin, Juan Cao, Han Guo, Yongdong Zhang, Yu Wang, and Jiebo Luo. “Detection and Analysis of 2016 US Presidential Election Related Rumors on Twitter”, Springer International Publishing AG, 2017.
[6] Qiao Zhang, Shuiyuan Zhang, Jian Dong, Jinhua Xiong, and Xueqi Cheng. “Automatic Detection of Rumor on Social Network”, Springer International Publishing Switzerland, 2015.
[7] Yan Zhang, Weiling Chen, Chai Kiat Yeo, Chiew Tong Lau, and Bu Sung Lee. “Detecting Rumors on Online Social Networks Using Multi-layer Autoencoder”, IEEE Technology & Engineering Management Conference (TEMSCON), IEEE, 2017.
[8] Sardar Hamidian and Mona Diab. “Rumor Detection and Classification for Twitter Data”, The Fifth International Conference on Social Media Technologies, Communication, and Informatics (SOTICS), 2015.