With the advent of online social media, phishers have started using social networks such as Twitter, Facebook, and Foursquare to spread phishing scams. Twitter is an immensely popular micro-blogging network where people post short messages of up to 140 characters, called tweets. It has over 100 million active users who post about 200 million tweets every day. Because of this vast information dissemination, phishers have started using Twitter as a medium to spread phishing. Phishing is also harder to detect on Twitter than in email because phishing links spread quickly through the network, the content is short, and URLs are shortened (and thereby obfuscated) to fit the 140-character tweet limit. Our technique, PhishAri, detects phishing on Twitter in realtime. We use Twitter-specific features along with URL features to detect whether a tweet posted with a URL is phishing or not. Some of the Twitter-specific features we use are the tweet content and its characteristics, such as length, hashtags, and mentions. Others are characteristics of the Twitter user posting the tweet, such as the age of the account, the number of tweets, and the follower-followee ratio. These Twitter-specific features, coupled with URL-based features, prove to be a strong mechanism for detecting phishing tweets. We use machine learning classification techniques and detect phishing tweets with an accuracy of 92.52%. We have deployed our system for end users as an easy-to-use Chrome browser extension. The extension works in realtime and classifies a tweet as phishing or safe when it appears in a user's Twitter timeline. In this research, we show that we are able to detect phishing tweets at zero hour with high accuracy, which is much faster than public blacklists as well as Twitter's own defense mechanism for detecting malicious content. We also performed a quick user evaluation of PhishAri in a laboratory study, which showed that users like PhishAri and are happy to use it in the real world. To the best of our knowledge, this is the first realtime, comprehensive, and usable system to detect phishing on Twitter.
2. Motivation: Some Statistics
• $520 million was lost worldwide to phishing attacks in 2011 alone (RSA Report)
• In 2012, around 20% of all phishing attacks targeted Facebook
• Social network phishing attacks jumped 221% during Q1 of 2012
3. Phishing Detection on OSM: Current State-of-Art
• Offline Spam Characterization & Detection Studies
• No characterization of phishing on OSM
• Lack of Realtime detection mechanisms
• Absence of end-user deployed systems
• Dependence on Spam/Phishing Blacklists
4. What Did We Do to Fill the Gap?
• Built a mechanism to Automatically detect phishing on Twitter in Realtime
• No dependency on Blacklists
• Deployed end-user system for Twitter users - Chrome Extension
5. Twitter 101
• Tweets: <140 char
• Example: "Hey, I am in Puerto Rico attending @APWG eCrime research"
• Example: "Talking about #phishing on OSN"
• Example: "Earn Money #help #money http://bit.ly/Pw637z"
6. Twitter 101
• @Tag: to mention/reply to a Twitter user (e.g. "Hey, I am in Puerto Rico attending @APWG eCrime research")
• #Tag: to mention a topic (e.g. "Talking about #phishing on OSN")
• URL in Tweet: to link external media (e.g. "Earn Money #help #money http://bit.ly/Pw637z")
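The three tweet elements on this slide can be pulled out of raw tweet text with simple pattern matching. The sketch below is illustrative only: the regular expressions are simplified assumptions rather than Twitter's actual tokenization rules, and `extract_entities` is a hypothetical helper name that does not come from the PhishAri system.

```python
import re

# Simplified patterns for illustration; real Twitter tokenization is more involved.
MENTION_RE = re.compile(r"(?<!\w)@(\w+)")
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")
URL_RE = re.compile(r"https?://\S+")


def extract_entities(tweet: str) -> dict:
    """Return the mentions, hashtags, and URLs found in a tweet's text."""
    urls = URL_RE.findall(tweet)
    # Blank out URLs first so a '#' or '@' inside a URL is not counted as a tag.
    text = URL_RE.sub(" ", tweet)
    return {
        "mentions": MENTION_RE.findall(text),
        "hashtags": HASHTAG_RE.findall(text),
        "urls": urls,
    }


if __name__ == "__main__":
    print(extract_entities("Earn Money #help #money http://bit.ly/Pw637z"))
```

The counts of these entities are the kind of tweet-level signal that a classifier can consume alongside URL and account features.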
7. Twitter 101
• Followers: users who follow an account ("We’ll follow Blue!")
• Followees: accounts that a user follows ("I’ll follow Grey1!", "I’ll follow Grey2!")
• Retweet (RT): sharing another user's tweet with one's own network ("Nice! I’ll share this tweet in my network!")
8. Twitter 101
• Twitter Timeline of a user (e.g. @Blue) contains
• Tweets by Followees
• Retweets by Followees
• Tweets by Self
• Retweets by Self
• Tweets with @Blue
9. Challenges of Phishing Detection on Twitter
• Only 140 characters - very little information
• Use of short URLs in tweets
• 100,000 Tweets per minute - quick spread
• Phishing Blacklists are slow - not reliable
10. Our Contribution
• PhishAri: Automatic realtime phishing detection mechanism for Twitter
• More efficient than plain blacklisting method
• Better than Twitter’s own phishing detection mechanism
• Real-world implementation of the system - Chrome Extension for Twitter
11. Methodology
• Step 1: Classification Model for Phishing Detection
• Data Collection
• Feature Extraction
• Classification
• Step 2: Realtime end-user Interface
• Using pre-trained classification model
• Chrome Browser Extension
13. Features Used
• URL Features - Length, number of dots, characters, redirections
• WHOIS Features - domain name, ownership period
• Tweet Features - Number of #tags, @mentions, length, trending topics
• Network Features - Follower/Followee ratio, Age of account, Number of Tweets
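As a rough illustration of how some of these features could be assembled into one record per tweet, here is a minimal sketch. It covers only features that can be computed offline (URL length and dots, tweet length, #tag and @mention counts, and the account numbers); redirections, WHOIS lookups, and trending topics need network access and are left out. The `Account` class, the function names, and the dictionary keys are all hypothetical and are not taken from PhishAri.

```python
import re
from dataclasses import dataclass

URL_RE = re.compile(r"https?://\S+")


@dataclass
class Account:
    """Hypothetical container for the poster's public profile numbers."""
    followers: int
    followees: int
    age_days: int
    tweet_count: int


def url_features(url: str) -> dict:
    """Lexical URL features: overall length and number of dots."""
    return {"url_length": len(url), "url_dots": url.count(".")}


def tweet_features(text: str) -> dict:
    """Tweet features: length plus #tag and @mention counts (URLs excluded)."""
    without_urls = URL_RE.sub(" ", text)
    return {
        "tweet_length": len(text),
        "num_hashtags": len(re.findall(r"(?<!\w)#\w+", without_urls)),
        "num_mentions": len(re.findall(r"(?<!\w)@\w+", without_urls)),
    }


def network_features(acct: Account) -> dict:
    """Account features: follower/followee ratio, account age, tweet count."""
    # Avoid dividing by zero for accounts that follow nobody.
    ratio = acct.followers / acct.followees if acct.followees else float(acct.followers)
    return {
        "follower_followee_ratio": ratio,
        "account_age_days": acct.age_days,
        "num_tweets": acct.tweet_count,
    }


def feature_vector(text: str, url: str, acct: Account) -> dict:
    """Merge the three feature groups into a single flat record."""
    features = {}
    features.update(url_features(url))
    features.update(tweet_features(text))
    features.update(network_features(acct))
    return features
```

A flat dictionary like this can be converted to a numeric row for whichever classifier is used; the fallback for accounts with zero followees is one arbitrary choice among several.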
15. Evaluation
• Comparison with Blacklists
• 80.6% more phishing tweets detected by PhishAri at zero hour which were caught by blacklists after 3 days
• Comparison with Twitter’s defense mechanism
• 84.6% more phishing tweets detected by PhishAri at zero hour which were marked as suspicious by Twitter after 3 days
16. Time Evaluation
• Used Intel Xeon 16 core Ubuntu server with 2.67 GHz processor and 32 GB RAM
• Multiprocessing Modules for faster processing
• Time required for the feature extraction & classification of a tweet is a maximum of 0.522 seconds (Min: 0.167 sec, Avg: 0.425 sec, Median: 0.384 sec)
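Per-tweet timing statistics of this kind (minimum, average, median, maximum) can be gathered with a small harness like the one below. This is a sketch under assumptions: `time_per_item` is a hypothetical helper, `process` stands in for whatever feature extraction and classification step is being measured, and the multiprocessing setup mentioned on the slide is not reproduced.

```python
import statistics
import time
from typing import Callable, Iterable


def time_per_item(process: Callable[[str], object], items: Iterable[str]) -> dict:
    """Run process(item) on each item and summarize the latencies in seconds."""
    latencies = []
    for item in items:
        start = time.perf_counter()
        process(item)
        latencies.append(time.perf_counter() - start)
    if not latencies:
        raise ValueError("need at least one item to time")
    return {
        "min": min(latencies),
        "avg": statistics.mean(latencies),
        "median": statistics.median(latencies),
        "max": max(latencies),
    }
```

Reporting the maximum alongside the average matters here because the indicator for the slowest tweet determines the worst-case wait for the user.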
18. PhishAri: RESTful API
• Use above classification model to create a RESTful API
• POST requests can be made to API to query a tweet
• Pre-trained classifier model used for classification of new tweets
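The request flow described on this slide could look roughly like the following sketch, built on Python's standard `http.server` module. Everything specific in it is an assumption made for illustration: the `/classify` path, the `text` request field, the `label` response key, and the `classify_tweet` placeholder, which always answers "safe" and stands in for a call to the pre-trained model. The actual PhishAri API may differ in every one of these details.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def classify_tweet(tweet: dict) -> str:
    """Placeholder for the pre-trained classifier; always answers 'safe'.

    A real deployment would extract features and query the loaded model here.
    """
    return "safe"


def handle_classify(body: bytes) -> tuple:
    """Turn a raw POST body into an (HTTP status, response dict) pair."""
    try:
        tweet = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return 400, {"error": "body must be valid JSON"}
    if not isinstance(tweet, dict) or "text" not in tweet:
        return 400, {"error": "missing 'text' field"}
    return 200, {"label": classify_tweet(tweet)}


class ClassifyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/classify":  # hypothetical endpoint path
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_classify(self.rfile.read(length))
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port: int = 8000) -> None:
    """Start the sketch server on localhost; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), ClassifyHandler).serve_forever()
```

Keeping the request logic in `handle_classify`, separate from the HTTP handler, lets it be exercised without starting a server; calling `serve()` exposes it to a client such as a browser extension.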
20. PhishAri Chrome Extension
• Red / Green Indicators in front of Tweets with URLs
• Detects phishing tweets on
• User Timeline
• Twitter search results
• Profile of other users
• DMs (limited for now)
23. PhishAri Extension: User Experience and Statistics
• 78 Active Users
• User study shows that
• users want support for other browsers, mobile apps
• users found the extension useful
• users want more robustness
24. Conclusion
• “Phish” + “Ari” = Realtime Automatic Detection
• 92.52% Accuracy with Random Forest Classifier
• Efficient - takes at most 0.522 seconds for indicator to appear
• No dependency on Blacklists
• Faster than Blacklists
• Faster than Twitter’s own detection mechanism
25. Future Work
• Backend database for faster lookup
• Increase the scope of PhishAri from public to all tweets
• Reduce response time of PhishAri and speed up the appearance of indicators
• Support for other browsers and mobile apps