2. Introduction
Nowadays, people commonly use social networking sites to communicate with other users
and to share information across the world.
Twitter is one of several social networking sites that are growing daily.
Spam attacks on social media are increasing, and many social media users are exposed
to them. Spammers aim to collect personal information and to attack the user profiles
they identify. These attackers share spam content containing malware links and expect
users to install the malicious software on their computers.
Effective systems for detecting spam accounts and spam content are therefore needed,
so that social networks can be cleaned up and users can have a better experience.
3. Proposal
In this study, we group similar Twitter users and introduce a dynamic
feature selection technique that uses a different feature set for each user
group instead of a single static feature set. We then apply machine learning
algorithms to classify spam users on Twitter.
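The idea of choosing features separately for each user group can be sketched as follows. This is only an illustration under stated assumptions: the `group` and `is_spam` keys, the feature names, and the scoring rule (gap between class means divided by the feature's spread) are hypothetical, since the grouping method and selection criterion are not specified here.

```python
from collections import defaultdict
from statistics import mean, pstdev


def select_features_per_group(users, feature_names, top_k=2):
    """Return {group: top_k feature names} ranked by spam/non-spam separation."""
    by_group = defaultdict(list)
    for user in users:
        by_group[user["group"]].append(user)

    selected = {}
    for group, members in by_group.items():
        scores = {}
        for name in feature_names:
            spam = [u[name] for u in members if u["is_spam"]]
            ham = [u[name] for u in members if not u["is_spam"]]
            if not spam or not ham:
                scores[name] = 0.0  # classes cannot be compared in this group
                continue
            # Assumed criterion: gap between class means, scaled by the spread.
            spread = pstdev([u[name] for u in members]) or 1.0
            scores[name] = abs(mean(spam) - mean(ham)) / spread
        ranked = sorted(feature_names, key=lambda n: scores[n], reverse=True)
        selected[group] = ranked[:top_k]
    return selected


# Hypothetical, hand-made sample: two groups whose useful features differ.
users = [
    {"group": "new", "is_spam": True, "link_count": 0.9, "followers": 10, "mentions": 0.5},
    {"group": "new", "is_spam": True, "link_count": 0.8, "followers": 12, "mentions": 0.4},
    {"group": "new", "is_spam": False, "link_count": 0.1, "followers": 11, "mentions": 0.5},
    {"group": "new", "is_spam": False, "link_count": 0.2, "followers": 9, "mentions": 0.4},
    {"group": "old", "is_spam": True, "link_count": 0.5, "followers": 5, "mentions": 0.6},
    {"group": "old", "is_spam": True, "link_count": 0.4, "followers": 8, "mentions": 0.7},
    {"group": "old", "is_spam": False, "link_count": 0.5, "followers": 500, "mentions": 0.5},
    {"group": "old", "is_spam": False, "link_count": 0.4, "followers": 700, "mentions": 0.6},
]
selected = select_features_per_group(users, ["link_count", "followers", "mentions"])
```

With this sample, each group ends up with its own ranked feature list instead of one static list shared by all users, which is the point of the dynamic approach.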
5. Data extraction
A crawler has been developed using the Twitter REST and Streaming APIs to
collect user information. This software makes it possible to collect user
data without being tied to the limits of the API provided by Twitter, because
the restrictions that the API imposes on its users prevent intensive data
collection. Before the data collection process started, users were randomly
selected from among users who had posted about Twitter’s agenda items.
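The random selection step might look like the sketch below. It assumes that tweets have already been collected into plain dictionaries and that "agenda items" can be matched against hashtags; the `user_id` and `hashtags` keys and the function name are hypothetical, and no Twitter API call is shown.

```python
import random


def sample_candidate_users(tweets, agenda_items, sample_size, seed=None):
    """Randomly pick distinct users who posted about at least one agenda item."""
    wanted = {item.lower() for item in agenda_items}
    candidates = sorted({
        tweet["user_id"]
        for tweet in tweets
        if any(tag.lower() in wanted for tag in tweet["hashtags"])
    })
    rng = random.Random(seed)  # seeded so that a run can be repeated
    return rng.sample(candidates, min(sample_size, len(candidates)))


# Hypothetical, already-collected tweets.
tweets = [
    {"user_id": 1, "hashtags": ["WorldCup"]},
    {"user_id": 2, "hashtags": ["cats"]},
    {"user_id": 3, "hashtags": ["worldcup", "Election"]},
    {"user_id": 1, "hashtags": ["Election"]},
    {"user_id": 4, "hashtags": []},
]
picked = sample_candidate_users(tweets, ["worldcup", "election"], sample_size=2, seed=7)
```

Deduplicating user IDs before sampling keeps very active users from being over-represented in the selection.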
6. Feature extraction
A property pool is created by performing calculations on the raw user data.
In determining the features included in the property pool, attention was
paid to the features that are most commonly used in the literature and that
do not require very high costs.
In this way, we aim to establish a decision mechanism with high accuracy while
keeping the time and resources required for collecting and extracting
features to a minimum.
7. Feature Set
User-based features              Content-based features
User age                         Link count per tweet
Total tweet count                Average hashtag count
Total followers count            Average mention count
Total followings count           Average favorite count
Tweet count per age              Average retweet count
Follower count per age           Retweeted rate
Followings count per age         Followers count per tweet
Total common followers count     Average spam
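As an illustration of how ratio and average features of this kind could be derived from raw user data, here is a minimal sketch. The input field names, the use of days as the unit of account age, and the definition of the retweeted rate (share of tweets retweeted at least once) are assumptions rather than definitions taken from this study. The common followers count and the spam-related average are left out because they need data not modelled here.

```python
from datetime import date


def extract_features(profile, tweets, today):
    """Derive a few ratio and average features from raw profile and tweet data."""
    age_days = max((today - profile["created_at"]).days, 1)  # avoid dividing by zero
    total_tweets = max(profile["tweet_count"], 1)
    sampled = max(len(tweets), 1)
    return {
        "user_age": age_days,
        "tweets_per_age": profile["tweet_count"] / age_days,
        "followers_per_age": profile["followers_count"] / age_days,
        "followings_per_age": profile["followings_count"] / age_days,
        "followers_per_tweet": profile["followers_count"] / total_tweets,
        "links_per_tweet": sum(len(t["urls"]) for t in tweets) / sampled,
        "avg_hashtags": sum(len(t["hashtags"]) for t in tweets) / sampled,
        "avg_mentions": sum(len(t["mentions"]) for t in tweets) / sampled,
        "avg_favorites": sum(t["favorite_count"] for t in tweets) / sampled,
        "avg_retweets": sum(t["retweet_count"] for t in tweets) / sampled,
        # Assumed definition: share of tweets retweeted at least once.
        "retweeted_rate": sum(t["retweet_count"] > 0 for t in tweets) / sampled,
    }


# Hypothetical raw data for one user.
profile = {
    "created_at": date(2020, 1, 1),
    "tweet_count": 50,
    "followers_count": 100,
    "followings_count": 20,
}
tweets = [
    {"urls": ["http://example.com"], "hashtags": ["x", "y"], "mentions": [],
     "favorite_count": 4, "retweet_count": 0},
    {"urls": [], "hashtags": [], "mentions": ["someone"],
     "favorite_count": 0, "retweet_count": 2},
]
features = extract_features(profile, tweets, today=date(2020, 1, 11))
```

All of these values come from simple counts and divisions over data that is already collected, which keeps the cost of feature extraction low.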
8. Machine Learning
The machine learning phase is the last phase, in which a decision is made as to
whether the target user is a spam account. In this phase, the users, who have
already been grouped, are classified with various classification algorithms
using the properties determined dynamically for the group to which they
belong.
In this study, we use the k-NN and SVM classification algorithms.
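A minimal sketch of the k-NN part of this phase is shown below. It assumes each user carries a hypothetical `group` key and that a per-group feature list has already been chosen; the function names, feature names, and sample values are illustrative only. An SVM is not shown, since in practice it would normally come from a machine learning library rather than be written by hand.

```python
from collections import Counter
from math import dist


def knn_is_spam(query, labelled_users, features, k=3):
    """Majority vote among the k labelled users closest to `query`."""
    def vector(user):
        return [user[name] for name in features]

    nearest = sorted(labelled_users, key=lambda u: dist(vector(u), vector(query)))[:k]
    votes = Counter(u["is_spam"] for u in nearest)
    return votes.most_common(1)[0][0]


def classify(query, labelled_by_group, features_by_group, k=3):
    """Classify a user with the training data and features of its own group."""
    group = query["group"]
    return knn_is_spam(query, labelled_by_group[group], features_by_group[group], k)


# Hypothetical labelled users; the values are already on a comparable 0..1 scale.
labelled_by_group = {
    "new": [
        {"links_per_tweet": 0.9, "avg_mentions": 0.8, "is_spam": True},
        {"links_per_tweet": 0.8, "avg_mentions": 0.7, "is_spam": True},
        {"links_per_tweet": 0.7, "avg_mentions": 0.9, "is_spam": True},
        {"links_per_tweet": 0.1, "avg_mentions": 0.1, "is_spam": False},
        {"links_per_tweet": 0.2, "avg_mentions": 0.2, "is_spam": False},
        {"links_per_tweet": 0.1, "avg_mentions": 0.3, "is_spam": False},
    ],
}
features_by_group = {"new": ["links_per_tweet", "avg_mentions"]}

suspect = {"group": "new", "links_per_tweet": 0.85, "avg_mentions": 0.75}
regular = {"group": "new", "links_per_tweet": 0.15, "avg_mentions": 0.2}
suspect_is_spam = classify(suspect, labelled_by_group, features_by_group)
regular_is_spam = classify(regular, labelled_by_group, features_by_group)
```

Because k-NN relies on distances, features with very different ranges (for example, follower counts versus per-tweet averages) would need to be scaled before being compared in this way.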