CANTINA
A Content-Based Approach to Detecting Phishing Web Sites
ABSTRACT
• CANTINA is a content-based approach to detecting phishing web sites.
• It examines whether the content of a page is legitimate.
• It detects phishing URLs and links.
INTRODUCTION
• Phishing
A kind of attack in which victims are tricked by spoofed emails and fraudulent web sites into giving up personal information.
• How many phishing sites are there?
9,255 unique phishing sites were reported in June 2006 alone.
• How much does phishing cost each year?
$1 billion to $2.8 billion per year.
EXISTING SYSTEM
• NetCraft (surface characteristics)
• SpoofGuard (surface characteristics and blacklist)
• Cloudmark (blacklist)
PROPOSED SYSTEM
• Detects phishing web sites.
• Examines text-based content along with surface characteristics.
• Surface characteristics include:
  - Age of domain
  - Known images
  - Suspicious URL
  - Suspicious links
• Detects phishing links in users' email.
TF-IDF ALGORITHM
• Term Frequency (TF)
– The number of times a given term appears in a specific document
– Measures the importance of the term within that document
• Inverse Document Frequency (IDF)
– Measures how rare a term is across an entire collection of documents; the more documents contain the term, the lower its IDF
• A high TF-IDF weight means a high TF in the given document and a low document frequency across the collection
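The weighting described above can be sketched in a few lines of Python. This is a minimal illustration, not the CANTINA implementation: the `tf_idf` function, the toy corpus, and the plain `log(N / df)` form of IDF are assumptions chosen for clarity, and other TF-IDF variants exist.

```python
import math
from collections import Counter


def tf_idf(document, corpus):
    """Return {term: weight} for one tokenized document.

    TF is the raw count of the term in the document. IDF is log(N / df),
    where N is the number of documents in the corpus and df is the number
    of documents that contain the term. Both choices are assumptions.
    """
    term_counts = Counter(document)
    n_docs = len(corpus)
    weights = {}
    for term, count in term_counts.items():
        doc_freq = sum(1 for doc in corpus if term in doc)
        # Skip terms that never occur in the corpus to avoid dividing by zero.
        if doc_freq == 0:
            continue
        weights[term] = count * math.log(n_docs / doc_freq)
    return weights


# Toy corpus (hypothetical data): each document is a list of lowercase tokens.
corpus = [
    ["ebay", "sign", "in", "ebay", "account"],
    ["bank", "sign", "in", "account"],
    ["news", "weather", "sign", "in"],
]
weights = tf_idf(corpus[0], corpus)
```

In this toy corpus, "sign" and "in" occur in every document, so their weight is zero, while "ebay" occurs twice in the first document and nowhere else, so it receives the highest weight.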
REAL EBAY WEBPAGE
FAKE EBAY WEBPAGE
MODULES
• Parsing the web pages
• Generating the lexical signature
• Testing Process
• Report Generation
Parsing the web pages
• Links, anchor tags, form tags, and attachments in the web page are turned into corresponding objects (Text Link, HTML Link, etc.).
• This is done by parsing the text of each page.
• Uses the HTML Parser API, which extracts information from HTML code.
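The parsing step can be sketched as follows. Python's standard `html.parser` module is used here as a stand-in for the HTML Parser API named on the slide; the `PageParser` class, its attribute names, and the sample page are all hypothetical.

```python
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collect link targets, form actions, and visible text from HTML."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.form_actions = []
        self.text_parts = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "a" and attr_map.get("href"):
            self.links.append(attr_map["href"])
        elif tag == "form":
            self.form_actions.append(attr_map.get("action") or "")
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script and style blocks.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


# Hypothetical page used only to exercise the parser.
sample_html = """
<html><body>
  <h1>Sign in</h1>
  <a href="https://example.com/help">Help</a>
  <form action="https://example.com/login"><input name="user"></form>
  <script>var x = 1;</script>
</body></html>
"""
parser = PageParser()
parser.feed(sample_html)
parser.close()
```

After `feed`, `parser.links` and `parser.form_actions` hold the extracted URLs, and `parser.text_parts` holds the visible text that a later step could tokenize.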
Generating the lexical signature
• The TF-IDF algorithm is used to generate lexical signatures.
• Calculate the TF-IDF value for each word in a document.
• Select the words with the highest values.
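Selecting the signature from per-term weights might look like the sketch below. The `lexical_signature` function, the default signature length of 5, the alphabetical tie-breaking, and the sample weights are assumptions made for illustration; the slides do not state these details.

```python
def lexical_signature(weights, size=5):
    """Return the `size` terms with the highest TF-IDF weight.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:size]]


# Hypothetical TF-IDF weights for the terms of one page.
sample_weights = {
    "ebay": 2.2,
    "account": 0.4,
    "sign": 0.0,
    "password": 1.1,
    "in": 0.0,
}
signature = lexical_signature(sample_weights, size=3)
```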
Testing Process
• Feed the lexical signature to a search engine.
• Check whether the domain name of the current web page matches the domain name of any of the top N search results.
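The domain check can be sketched as below, assuming the search results are already available as a list of URLs (querying a real search engine is outside this sketch). The function names, the default `top_n=10`, and the sample URLs are hypothetical. Comparing full host names, with only a leading `www.` removed, is a simplification: a production check would likely compare registered domains instead.

```python
from urllib.parse import urlparse


def domain_of(url):
    """Return the lowercase host name of a URL, without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def matches_top_results(page_url, result_urls, top_n=10):
    """Return True if the page's domain equals that of any top-N result."""
    page_domain = domain_of(page_url)
    if not page_domain:
        # URLs without a scheme or host cannot be compared.
        return False
    return any(domain_of(url) == page_domain for url in result_urls[:top_n])


# Hypothetical search results for a signature such as "ebay password account".
results = ["https://www.ebay.com/", "https://en.wikipedia.org/wiki/EBay"]
real_page_ok = matches_top_results("http://ebay.com/signin", results)
fake_page_ok = matches_top_results("http://ebay-login.example.net/", results)
```

Here `real_page_ok` is `True` and `fake_page_ok` is `False`, since only the first page shares a domain with one of the results.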
Report Generation
• If a page is legitimate, the system returns “legitimate”.
• If a page is phishing, the system returns “phishing”.
APPLICATIONS
• Detects fraudulent web sites and emails.
• Protects users from giving up personal information such as credit card numbers, bank details, and account passwords.
• Detects suspicious links in email.
CONCLUSION
• A content-based approach for detecting phishing web sites.
• A user-friendly interface.
• An anti-phishing website that protects users from giving up their personal information.
