Dynamic feature selection for spam detection
Rivika Jain, under the guidance of Dr Faraz Ahmad
19cs26
Introduction
Nowadays people commonly use social networking sites to communicate with other
users and to share information across the world.
Twitter is one of several social networking sites that are growing daily.
Spam attacks on social media are increasing, and many social media users are
exposed to them. Spammers aim to collect personal information and to attack the
user profiles they identify. These attackers share spam content containing
malware links and expect users to install the malware on their computers.
Effective systems for detecting spam accounts and spam content are therefore
needed, so that social networks can be cleaned up and users can have a better
experience.
Proposal
In this study, we group similar Twitter users and introduce a dynamic
feature selection technique that uses a different feature set for each user
group instead of a single static feature set. We then apply machine learning
algorithms to classify spam users on Twitter.
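
The idea of choosing features per user group can be sketched as follows. This is a minimal illustration, not the method used in the study: the slides do not say how users are grouped or how features are ranked, so the group labels are taken as given and the Fisher-style separation score, the function names, and the parameter k are all assumptions made for the example.

```python
from statistics import mean, pstdev


def feature_scores(rows, labels):
    """Score each feature by how well it separates spam (1) from non-spam (0).

    Assumed scoring rule (not from the slides): a Fisher-style ratio,
    |mean_spam - mean_ham| / (std_spam + std_ham).
    """
    spam = [r for r, y in zip(rows, labels) if y == 1]
    ham = [r for r, y in zip(rows, labels) if y == 0]
    if not spam or not ham:
        # A group with only one class carries no separation signal.
        return {name: 0.0 for name in rows[0]}
    scores = {}
    for name in rows[0]:
        s = [r[name] for r in spam]
        h = [r[name] for r in ham]
        spread = pstdev(s) + pstdev(h)
        scores[name] = abs(mean(s) - mean(h)) / (spread + 1e-9)
    return scores


def select_features_per_group(rows, labels, groups, k=2):
    """Return {group: [top-k feature names]}, ranked separately inside each group."""
    selected = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        scores = feature_scores([rows[i] for i in idx], [labels[i] for i in idx])
        ranked = sorted(scores, key=scores.get, reverse=True)
        selected[g] = ranked[:k]
    return selected
```

With a static feature set, every user would be scored on the same columns. Here each group gets its own ranking, so a feature that is uninformative for one group can still be selected for another.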
Methodology
1. Architecture
● Data collection
● Feature extraction
● Machine learning
Data collection
A crawler was developed using the Twitter REST and Streaming APIs to collect
user information. This software makes it possible to collect user data without
being bound by the API provided by Twitter, because the restrictions that the
Twitter API imposes on users prevent intensive data collection. Before the data
collection process started, users were randomly selected from among those who
had posted about Twitter's agenda items.
2. Feature extraction
A property pool is created by computing features from raw user data.
In choosing the features for the property pool, attention was paid to the
features most commonly used in the literature that do not require high cost
to compute.
In this way, we aim to build a decision mechanism with high accuracy while
keeping the time and resources needed for collecting and extracting features
to a minimum.
Feature Set
User-based features              Content-based features
User age                         Link count per tweet
Total tweet count                Average hashtag count
Total followers count            Average mention count
Total followings count           Average favorite count
Tweet count per age              Average retweet count
Follower count per age           Retweeted rate
Followings count per age         Followers count per tweet
Total common followers count     Average spam
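
A few of the features in the table can be computed from a raw user record as sketched below. The record layout (the keys "created", "total_tweets", "followers", "followings", "tweets") is hypothetical, and reading "per age" as "divided by account age in days" is an assumption; the slides do not define the unit.

```python
from datetime import date


def extract_features(user, today):
    """Compute a subset of the user-based and content-based features for one account.

    `user` is a hypothetical raw record; "per age" is assumed to mean
    "per day of account age".
    """
    age_days = max((today - user["created"]).days, 1)  # avoid division by zero
    tweets = user["tweets"]
    n = max(len(tweets), 1)  # accounts with no sampled tweets get zero averages
    return {
        "user_age": age_days,
        "tweet_count_per_age": user["total_tweets"] / age_days,
        "follower_count_per_age": user["followers"] / age_days,
        "followings_count_per_age": user["followings"] / age_days,
        "link_count_per_tweet": sum(t["links"] for t in tweets) / n,
        "avg_hashtag_count": sum(t["hashtags"] for t in tweets) / n,
        "avg_mention_count": sum(t["mentions"] for t in tweets) / n,
        "follower_count_per_tweet": user["followers"] / max(user["total_tweets"], 1),
    }
```

All of these are simple ratios over counts that are already present in the collected data, which is in line with the aim of keeping extraction cost low.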
Machine Learning
The machine learning phase is the last phase, in which a decision is made as
to whether the target user is a spam account. In this phase, users who have
already been grouped are classified with various classification algorithms,
using the features dynamically determined for the group they belong to.
In this study we use the k-NN and SVM classification algorithms.
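
The classification step can be illustrated with a small k-NN sketch that compares a user only with labelled users from the same group and only on that group's selected features. This is an illustration under assumptions, not the study's implementation: the data layout, the function names, the Euclidean distance, and the default k=3 are all chosen for the example, and the SVM variant is not shown.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training points (Euclidean)."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    votes = Counter(y for _, y in nearest)
    return votes.most_common(1)[0][0]


def classify_user(user, group, train, features_by_group, k=3):
    """Classify one user with only the features selected for the user's group.

    train: list of (feature_dict, group, label) tuples for labelled users.
    features_by_group: {group: [feature names]} from the feature selection step.
    """
    names = features_by_group[group]
    same_group = [(f, y) for f, g, y in train if g == group]
    X = [[f[n] for n in names] for f, _ in same_group]
    y = [label for _, label in same_group]
    return knn_predict(X, y, [user[n] for n in names], k)
```

Because k-NN relies on distances, features on very different scales would normally be normalised first; that step is omitted here for brevity. An SVM from a library such as scikit-learn could be used in place of `knn_predict` in the same per-group arrangement.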