Jerry @ TDOHCON 2017-10-14
jlee58.tw@gmail.com

Lessons from Fighting Phishing and Scam Websites (對抗釣魚與詐騙網站的經驗談)


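Have you ever fallen into a phishing trap? The URL looks normal and the page looks familiar, but a moment of inattention is enough for your personal data to be stolen. This talk covers the current state of phishing and the existing techniques, and shares hands-on experience in fighting it.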



  1. Jerry @ TDOHCON 2017-10-14, jlee58.tw@gmail.com
  2. Jerry: https://jerrynest.io/
  16. http://best0969.cdn7-network17-server2.club, http://app9259.cdn7-network27-server2.club, http://apps4684.cdn7-bignetwork17-server9.top
  17. Redirect!
  24. Changing Fast / Hacked Server / Cross-platform
      • 84% of phishing sites exist for less than 24 hours, and some stay up for less than 15 minutes.
      • Almost all phishing sites are hidden within legitimate domains.
  25. 1. Establishment and maintenance of infrastructure
        – Collection of public phishing data
        – Blacklist updates
        – Streaming analysis with Storm
        – Duplication analysis
      2. The evolution of detection and prevention technology
        – List-based
        – Visual-based
        – Feature-based
        – Ensemble model
  26. Pipeline: Phishing site → Crawler → Feature Extraction → Detection Model → Analysis & Report
      Collected data: HTML, CSS, JavaScript, Fonts, …, HTTP response header, DNS record, WHOIS record, IP address, URL, SSL record, Screenshot, …
  27. PhishTank (figure captions: Suspicious phishing URL; The interface for verification)
  28. Alexa Top List: http://s3.amazonaws.com/alexa-static/top-1m.csv.zip
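
The Alexa file on slide 28 is a CSV of `rank,domain` rows, presumably used here as a source of legitimate sites. A minimal sketch of reading it might look like the following; the function name `load_top_domains`, the `limit` parameter, and the sample rows are illustrative assumptions, not something shown in the talk.

```python
import csv
import io


def load_top_domains(csv_text, limit=1000):
    """Parse 'rank,domain' rows and return the first `limit` domains in file order."""
    domains = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        rank, domain = row
        if not rank.strip().isdigit():
            continue  # skip a header row, if one is present
        domains.append(domain.strip().lower())
        if len(domains) >= limit:
            break
    return domains


# Made-up sample in the same rank,domain layout as top-1m.csv.
sample = "1,google.com\n2,youtube.com\n3,facebook.com\n"
top_two = load_top_domains(sample, limit=2)
print(top_two)
```

In practice the text would come from the unzipped `top-1m.csv`; the in-memory string only keeps the sketch self-contained.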
  30. • Multi-threaded crawlers are deployed in containers on Google Cloud Platform (GCP).
      • Features and screenshots are extracted and stored in file storage, an image server, and a MongoDB database.
      Architecture diagram labels: Data sources (Legitimate sites, Phishing sites, …), URL Fetcher, URL Pool, Web Crawlers (Crawler 1, Crawler 2, Crawler 3, … Crawler N), Feature Extractor, File Storage, Image Server, MongoDB, Analysis
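
The URL pool feeding several crawler threads on slide 30 can be sketched roughly as below. This is only an illustration under stated assumptions: `fake_fetch`, `extract_features`, and `crawl` are hypothetical names, the fetch step is a stub rather than a real download, and the storage back ends (file storage, image server, MongoDB) are left out.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(url):
    """Stand-in for a real page download; returns a tiny 'page' record."""
    return {"url": url, "html": "<html>stub for %s</html>" % url}


def extract_features(page):
    """Toy feature extractor: only the URL length and the HTML size."""
    return {
        "url": page["url"],
        "url_length": len(page["url"]),
        "html_size": len(page["html"]),
    }


def crawl(url_pool, fetch=fake_fetch, workers=4):
    """Fetch every URL in the pool on a thread pool, then extract features."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pages = list(pool.map(fetch, url_pool))  # map() keeps the input order
    return [extract_features(page) for page in pages]


results = crawl(["http://a.example/", "http://b.example/login"])
for record in results:
    print(record)
```

Passing a real downloader as `fetch` would turn the stub into a working crawler; threads suit this step because it is dominated by network waits.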
  34. wget --no-parent -Q10m --timestamping --reject otf,woff,woff2,ttf,eot --convert-links --page-requisites --span-hosts --adjust-extension --no-check-certificate -e robots=off -U "Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1" -P download/ "https://www.google.com.tw/"
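
When the wget command from slide 34 has to run once per URL in a pool, it helps to build the argument list in code. The sketch below mirrors the flags on the slide; `build_wget_command` and its parameters are hypothetical names, and the block only prints the command instead of running it.

```python
import shlex

# User-Agent string copied from the slide (an iPhone Safari profile).
MOBILE_UA = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) "
    "AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 "
    "Mobile/13B143 Safari/601.1"
)


def build_wget_command(url, out_dir="download/", quota="10m"):
    """Return the wget argument list for mirroring a single page."""
    return [
        "wget",
        "--no-parent",
        "-Q" + quota,                          # download quota
        "--timestamping",
        "--reject", "otf,woff,woff2,ttf,eot",  # skip font files
        "--convert-links",
        "--page-requisites",                   # also fetch CSS, JS, images
        "--span-hosts",
        "--adjust-extension",
        "--no-check-certificate",
        "-e", "robots=off",
        "-U", MOBILE_UA,
        "-P", out_dir,
        url,
    ]


cmd = build_wget_command("https://www.google.com.tw/")
print(" ".join(shlex.quote(part) for part in cmd))
# To actually download, the list could be passed to subprocess.run(cmd).
```

Passing a list rather than a shell string avoids quoting problems with hostile URLs, which matters when the inputs are phishing links.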
  39. • Domain based Features
        – Age of Domain
        – DNS Record
        – Website Traffic
        – PageRank
        – Google Index
        – Number of Links Pointing to Page
        – Statistical-Reports Based Feature
  40. Rule:
        IF URL length < 54 → feature = Legitimate
        ELSE IF URL length ≥ 54 and ≤ 75 → feature = Suspicious
        OTHERWISE → feature = Phishing
      Figure labels: Legitimate, Suspicious, Phishing
      Example URL (longer than 75 characters): http://federmacedoadv.com.br/3f/aze/ab51e2e319e51502f416dbe46b773a5e/?cmd=_home&dispatch=11004d58f5b74f8dc1e7c2e8dd4105e811004d58f5b74f8dc1e7c2e8dd4105e8@phishing.website.html
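
The URL-length rule on slide 40 translates directly into a small function. The thresholds 54 and 75 come from the slide; the function name `url_length_feature` is an illustrative assumption.

```python
def url_length_feature(url):
    """Map a URL to the three-valued length feature from the slide's rule."""
    length = len(url)
    if length < 54:
        return "Legitimate"
    if length <= 75:  # 54 <= length <= 75
        return "Suspicious"
    return "Phishing"


print(url_length_feature("https://www.google.com.tw/"))
print(url_length_feature("http://example.test/" + "a" * 60))
```

The result is one input feature among many, not a verdict on its own: plenty of legitimate URLs are long.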
  42. • Black/white list-based
        – Google Safe Browsing
        – PhishNet
        – Automated individual white-list (AIWL)
      • Visual-based
        – Earth Mover’s Distance (EMD) algorithm
        – SURF
        – Histogram of Oriented Gradients (HOG)
      • Feature-based
        – CANTINA
        – PhishWho
        – Mobile features
      • Ensemble
        – AJNA (SSL/TLS feature and JavaScript-based visual clues)
        – kAYO
        – MobileFish
  46. Figure captions: Displaying phishing information; The interface for labeler to verify phishing sites
  47. The lifecycle of phish on the PhishTank
      Diagram labels: Submitted, Voting, Verified, Blacklist, Invalid; Voting Module, Monitoring module; Crawl, Voter, Classifier, Evaluation
  48. Selenium
  51. IEEE DASC 2017 Accepted
  53. The integrated architecture
      Diagram labels: PhishBox, Infrastructure, Data collection, Blacklist update, Monitoring, ETL, Model (Validation/Detection), Visualization, Voting, Phishing Detection technology, Visual-based, Feature-based, Feature selection
  54. Two-stage phishing detection model
      • The two-stage phishing detection model combines a validation model and a detection model
        – Non-phish = invalid + legitimate
      • Build the validation model with manual labeling
        – Apply a supervised learning algorithm
        – Apply active learning
      • Improve the performance of the detection model with the validated phishing data
      Diagram labels: Target, Phish, Non-Phish, Valid, Invalid, Legitimate, Two-stage Model, Validation model, Detection model
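
The two-stage idea on slide 54 can be sketched as a validation step followed by a detection step. The two predicates below are toy stand-ins for trained models, and all names (`two_stage_classify`, `is_non_phish`, the page fields) are hypothetical.

```python
def two_stage_classify(page, is_valid, is_phish):
    """Stage 1 filters out invalid pages; stage 2 separates phish from legitimate."""
    if not is_valid(page):
        return "invalid"
    return "phish" if is_phish(page) else "legitimate"


def is_non_phish(label):
    """Non-phish = invalid + legitimate, as stated on the slide."""
    return label in ("invalid", "legitimate")


# Toy stand-ins for the trained validation and detection models.
def toy_is_valid(page):
    return page["status"] == 200


def toy_is_phish(page):
    return page["has_login_form"] and not page["known_brand_domain"]


pages = [
    {"status": 404, "has_login_form": False, "known_brand_domain": False},
    {"status": 200, "has_login_form": True, "known_brand_domain": False},
    {"status": 200, "has_login_form": True, "known_brand_domain": True},
]
labels = [two_stage_classify(p, toy_is_valid, toy_is_phish) for p in pages]
print(labels)
```

Keeping the stages separate means dead or taken-down pages never reach the detection model, which matches the slide's point that validated data improves detection.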
  55. Phishing data validation
      • A page is called invalid when it meets any of the following conditions:
        – Offline: the website is not reachable, e.g. status code 404.
        – Redirection: the page redirects to a legitimate page.
        – Invalid content: the content of the page has changed and contains an invalid keyword such as “this account has been suspended” or “the page is forbidden”.
      • Construct a validation classifier!
      Figure captions: [Invalid content] The account has been suspended by host provider; [Redirection] Redirect to google homepage
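
The three invalid conditions on slide 55 can be written as a rule-based baseline, for example to pre-label data before training. This is not the validation classifier the talk builds: the function name, its parameters, and the choice to treat every status code of 400 or above as offline are assumptions made for illustration.

```python
# Keywords quoted on the slide; a real list would be longer.
INVALID_KEYWORDS = (
    "this account has been suspended",
    "the page is forbidden",
)


def validate_page(status_code, final_host, html, legitimate_hosts):
    """Return 'offline', 'redirection', 'invalid content', or 'valid'."""
    # Assumption: no response or any status >= 400 counts as offline.
    if status_code is None or status_code >= 400:
        return "offline"
    # The URL ended up on a known legitimate host after redirects.
    if final_host in legitimate_hosts:
        return "redirection"
    text = html.lower()
    if any(keyword in text for keyword in INVALID_KEYWORDS):
        return "invalid content"
    return "valid"


legit = {"www.google.com"}
print(validate_page(404, "phish.example", "", legit))
print(validate_page(200, "www.google.com", "<html></html>", legit))
print(validate_page(200, "phish.example", "This account has been suspended.", legit))
print(validate_page(200, "phish.example", "<form>Sign in</form>", legit))
```

Hand-written rules like these miss parked domains and custom error pages, which is one reason to learn a classifier instead.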
  56. Examples of invalid pages: The page has been removed; Blocked by host provider; Domain Parking; Redirect to homepage; Error message from host provider; Redirect to legitimate site
  57. Examples of phishing pages: Multi-provider login page; Specific target
  60. Active learning
      Diagram labels: Ensemble validation model, Labeled training set, Unlabeled pool, Sampling algorithm, (Initial label size), (Query block size)
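
The slide does not say which sampling algorithm is used. One common choice that fits an ensemble is query-by-committee: send the pages the ensemble disagrees on most to the human labeler. The sketch below assumes that choice; the names and the toy threshold "models" are illustrative only.

```python
def minority_votes(votes):
    """Size of the minority side among binary votes (0 means full agreement)."""
    ones = sum(votes)
    return min(ones, len(votes) - ones)


def select_query_block(pool, ensemble, block_size):
    """Pick the `block_size` pool items the ensemble disagrees on most."""
    scored = [
        (minority_votes([model(item) for model in ensemble]), index)
        for index, item in enumerate(pool)
    ]
    # Highest disagreement first; ties keep the original pool order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [pool[index] for _, index in scored[:block_size]]


# Toy ensemble: three threshold "models" voting 0 or 1 on a numeric score.
ensemble = [
    lambda x: int(x > 2),
    lambda x: int(x > 5),
    lambda x: int(x > 8),
]
unlabeled_pool = [1, 3, 6, 9]
query_block = select_query_block(unlabeled_pool, ensemble, block_size=2)
print(query_block)  # [3, 6]
```

In a full loop the selected block would be labeled by hand, moved from the unlabeled pool into the labeled training set, and the ensemble retrained before the next round.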
  61. The rules of manual labeling
      1. Check the screenshots to confirm whether the page is invalid
      2. Check the URL and WHOIS to confirm whether the page is invalid
      3. Check the website with a search engine to confirm whether the page is invalid
      Interface labels: The screenshot on PhishTank; The screenshot we took; URL and host information; Label area
  65. Figure labels: Real, New version, Fake, Fake, Fake
  95. Recommended reading: 未來的犯罪:當萬物都可駭,我們該如何面對 (Future Crimes: When Everything Can Be Hacked, How Should We Respond)
  97. • PhishTank - https://www.phishtank.com/
      • UCI Phishing dataset - https://archive.ics.uci.edu/ml/datasets/phishing+websites
      • Google Cloud Platform - https://cloud.google.com/
      • Weka (Data mining) - https://www.cs.waikato.ac.nz/ml/weka/
      • scikit-learn (Machine Learning) - http://scikit-learn.org/stable/
      • Microsoft Machine Learning - https://azure.microsoft.com/zh-tw/services/machine-learning-studio/
      • PhishBox: An approach for phishing validation and detection
      • 不要被騙了!帶你分析 Google 會員抽獎詐騙網頁 (Don't be fooled! An analysis of the Google member prize-draw scam page) - https://jerrynest.io/google-scam-site/
  98. Blog: https://jerrynest.io/
      Facebook: https://www.facebook.com/jerrynest.io/
