WarningBird: Detecting Suspicious URLs in
             Twitter Stream

           Sangho Lee and Jong Kim
   Pohang University of Science and Technology


                January 18, 2012
Threat
         Post URLs to attract traffic to website
         Can deliver various payloads

           Spam
           Phishing
           Download malicious software
Twitter
      Online micro-blogging service
          Large (about 100 million accounts)
          URL shortener services
          Tweets broadcast to legitimate users
      Good vector for attackers to attract traffic
          Many potential targets
          URL shorteners are common and mask the actual website
          Many users view tweets based on content and not authorship
Existing Detection Approaches and Limitations
    1. Detect accounts based on account information
           E.g., ratio of Tweets with URLs to Tweets without URLs
           Easily fabricated by attacker
    2. Detect accounts based on social graph
           E.g., connectivity measures for each node
           Hard to obtain and analyze large amounts of Twitter data
    3. Crawl URLs to classify them
           E.g., detect malicious URLs based on HTML content
           Attackers use redirection chains to hide the malicious page from crawlers
Redirection Chains




      Redirect chains start by resolving the shortened URL
      Several attacker-owned URLs then redirect the user, hop by hop
      The attacker dynamically chooses which page a visitor ultimately reaches
          Crawlers go to a legitimate URL
          Legitimate users go to the malicious URL
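
A minimal sketch of conditional redirection, simulated with an in-memory table rather than real HTTP requests. All URLs, the `REDIRECTS` table, and the crawler check on the User-Agent string are hypothetical; real attackers may use other signals to tell crawlers from users.

```python
from typing import Callable, Dict, List, Optional

# Each URL maps to a function that picks the next hop from the visitor's
# User-Agent. A URL missing from the table is treated as a landing page.
Hop = Callable[[str], Optional[str]]

def static_hop(next_url: str) -> Hop:
    """An unconditional redirect to next_url."""
    return lambda user_agent: next_url

def conditional_hop(user_agent: str) -> str:
    """Attacker-controlled hop (assumed logic): crawlers get a benign page."""
    if "bot" in user_agent.lower():
        return "http://benign.example/"
    return "http://malicious.example/"

# Hypothetical chain: shortened URL -> two attacker hops -> conditional split.
REDIRECTS: Dict[str, Hop] = {
    "http://short.example/abc": static_hop("http://hop1.example/r"),
    "http://hop1.example/r": static_hop("http://hop2.example/r"),
    "http://hop2.example/r": conditional_hop,
}

def follow_chain(url: str, user_agent: str,
                 redirects: Dict[str, Hop], max_hops: int = 10) -> List[str]:
    """Return every URL visited, starting with the posted URL."""
    chain = [url]
    while url in redirects and len(chain) <= max_hops:
        url = redirects[url](user_agent)
        chain.append(url)
    return chain

crawler_chain = follow_chain("http://short.example/abc", "ExampleBot/1.0", REDIRECTS)
user_chain = follow_chain("http://short.example/abc", "Mozilla/5.0", REDIRECTS)
```

The two chains end on different pages but share their first three URLs. That shared prefix is the part of the chain a crawler can still observe.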
Problem
      Given a URL posted on Twitter, determine whether a
      legitimate user would ultimately be directed to a malicious
      URL by visiting the URL on Twitter
      Assumptions:
          Cannot use features easily fabricated by attacker
          No access to large Twitter graph
          Have access to part of redirect chain available to crawlers
          Redirect chains cannot be fabricated
      Solution Overview:
          Create classifier
          Rely on redirect chain for features
          Validate accuracy/performance with Twitter data
WarningBird




      Input: tweets
      Output: suspicious URLs
      Live website shows recent suspicious URLs
Data Collection




      Use Twitter Streaming API to collect Tweets
      Keep only Tweets with URLs
      Crawl and store URL chain of each URL
      Queue many Tweets to be analyzed together
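
The filtering and queueing steps above can be sketched as follows. This is an illustration only: the tweet dictionaries, the `TweetWindow` class, and the window size are assumptions, and the Streaming API connection and the crawling step are left out.

```python
import re
from collections import deque

# Simple pattern for http(s) URLs in tweet text (an approximation).
URL_RE = re.compile(r"https?://\S+")

def extract_urls(text):
    """Return all URLs found in a tweet's text."""
    return URL_RE.findall(text)

class TweetWindow:
    """Fixed-size queue of URL-bearing tweets to be analyzed together."""

    def __init__(self, size):
        self.items = deque(maxlen=size)  # oldest entries drop off automatically

    def offer(self, tweet):
        """Queue the tweet if it contains a URL; return whether it was kept."""
        urls = extract_urls(tweet["text"])
        if not urls:
            return False
        self.items.append({"user": tweet["user"], "urls": urls})
        return True

    def is_full(self):
        return len(self.items) == self.items.maxlen

window = TweetWindow(size=2)
kept = [
    window.offer({"user": "a", "text": "good morning"}),
    window.offer({"user": "b", "text": "look http://short.example/abc"}),
    window.offer({"user": "c", "text": "win https://short.example/xyz now"}),
]
```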
Feature Extraction




     Group domains that share an IP address, e.g., xyz.com = 20.30.40.50 = abc.com
     Find entry point URLs
     11 features based on URL chains and Tweet context
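
A sketch of the first two steps, under stated assumptions. The slide does not define an entry point URL; here it is assumed to be the URL that appears in the most redirect chains after domain grouping. The `domain_to_ip` table stands in for real DNS lookups, and all helper names are hypothetical.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit

def group_domains(domain_to_ip):
    """Map each domain to one representative per shared IP address."""
    by_ip = defaultdict(list)
    for domain, ip in domain_to_ip.items():
        by_ip[ip].append(domain)
    representative = {}
    for domains in by_ip.values():
        leader = min(domains)  # deterministic choice of representative
        for domain in domains:
            representative[domain] = leader
    return representative

def normalize(url, representative):
    """Rewrite a URL so that grouped domains compare equal."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return representative.get(host, host) + parts.path

def entry_point(chains, representative):
    """Assumed definition: the normalized URL shared by the most chains."""
    counts = Counter()
    for chain in chains:
        # A set counts each URL at most once per chain.
        counts.update({normalize(url, representative) for url in chain})
    return counts.most_common(1)[0]

domain_to_ip = {
    "xyz.com": "20.30.40.50",
    "abc.com": "20.30.40.50",
    "t.example": "10.0.0.1",
    "u.example": "10.0.0.2",
}
chains = [
    ["http://t.example/1", "http://xyz.com/r", "http://land1.example/"],
    ["http://u.example/2", "http://abc.com/r", "http://land2.example/"],
]
rep = group_domains(domain_to_ip)
url, hits = entry_point(chains, rep)
```

Without grouping, the two chains share no URL. Once xyz.com and abc.com are treated as one domain, both chains pass through the same hop.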
Features
Classifier




       All features are normalized to between zero and one
       Logistic regression was experimentally found to be the best classifier
       Ground truth for supervised learning comes from Twitter account status
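
A toy sketch of min-max normalization followed by logistic regression, written in plain Python so it is self-contained. The two feature columns and the six training rows are invented for illustration and are not the 11 features or the data used by WarningBird. The training loop is simple stochastic gradient descent, which may differ from the solver the authors used.

```python
import math

def min_max_normalize(rows):
    """Scale every feature column to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias with stochastic gradient descent on log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    """Return 1 (suspicious) if the predicted probability is at least 0.5."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)

# Hypothetical features per URL: [chain length, accounts sharing the entry point]
raw = [[2, 1], [3, 1], [2, 2], [8, 40], [9, 55], [7, 30]]
labels = [0, 0, 0, 1, 1, 1]  # 1 = posted by an account that was later suspended

X = min_max_normalize(raw)
w, b = train_logistic(X, labels)
predictions = [predict(w, b, x) for x in X]
```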
Experimentation
      Real Twitter data from Twitter Streaming API
      Run on the authors' own commodity hardware
      Performed experiments on Twitter data to investigate
          Accuracy
          Performance
          Delay in Detection
Accuracy Results
      60 days of training data: 183k benign and 42k malicious URLs
      30 days of test data: 71k benign and 6.7k malicious URLs
      Achieved a 3.67% false positive rate (FPR) and a 3.21% false negative rate (FNR)
      Of 71k benign URLs, 2.6k were marked malicious
      Of 6.7k malicious URLs, about 200 were not detected
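
As a quick check, the two rates can be recomputed from the rounded counts on this slide. The inputs are the rounded figures (71k, 2.6k, 6.7k, 200), so the results only approximate the reported 3.67% and 3.21%, which were presumably computed from exact counts.

```python
def rate(errors, total):
    """Fraction of a class that was misclassified."""
    return errors / total

# FPR = benign URLs marked malicious / all benign URLs
fpr = rate(2_600, 71_000)
# FNR = malicious URLs missed / all malicious URLs
fnr = rate(200, 6_700)

print(f"FPR ~ {fpr:.2%}, FNR ~ {fnr:.2%}")
# prints: FPR ~ 3.66%, FNR ~ 2.99%
```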
Performance Results
      Running time of various components
          24 ms to crawl redirections (100 concurrent crawls)
          2 ms for domain grouping
          1.6 ms for feature extraction
          0.5 ms for classification
      Process 100,000 URLs in one hour
      Can distribute redirection crawling to improve this
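
A back-of-the-envelope check of the throughput claim, assuming the four timings are per-URL averages and the stages run one after another on a single machine (the slide does not state either).

```python
# Per-URL stage times in milliseconds, taken from the slide.
STAGE_MS = {
    "crawl redirections": 24.0,
    "domain grouping": 2.0,
    "feature extraction": 1.6,
    "classification": 0.5,
}

per_url_ms = sum(STAGE_MS.values())            # about 28.1 ms per URL
urls_per_hour = 3_600_000 / per_url_ms         # milliseconds in an hour / cost per URL
crawl_share = STAGE_MS["crawl redirections"] / per_url_ms

print(f"{per_url_ms:.1f} ms per URL, about {urls_per_hour:,.0f} URLs per hour")
print(f"crawling accounts for {crawl_share:.0%} of the time")
```

Under these assumptions the pipeline handles roughly 128,000 URLs per hour, consistent with the 100,000 figure. Crawling takes about 85% of the per-URL time, which is why distributing it is the natural way to scale.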
Delay Results




      WarningBird can detect suspicious accounts faster than Twitter
      Results are shown only for accounts that Twitter suspended within a day
Conclusion
      Found an important feature that others have ignored: the redirect chain
      Attackers must either pay for more redirection servers or risk being caught
