©2009CarnegieMellonUniversity:1
Smartening the Crowds:
Computational Techniques for
Improving Human Verification
to Fight Phishing Scams
Gang Liu
Wenyin Liu
Department of Computer Science
City University of Hong Kong
Guang Xiang
Bryan A. Pendleton
Jason I. Hong
Carnegie Mellon University
Detecting Phishing Websites
• Method 1: Use heuristics
– Unusual patterns in URL, HTML, topology
– Approach is favored by researchers
– High true positives, some false positives
• Method 2: Manually verify
– Approach used by industry blacklists today
(Microsoft, Google, PhishTank)
– Very few false positives, low risk of liability
– Slow, easy to overwhelm
Wisdom of Crowds Approach
• Mechanics of PhishTank
– Submissions require at least 4 votes
and 70% agreement
– Some votes weighted more
• Total stats (Oct2006 – Feb2011)
– 1.1M URL submissions from volunteers
– 4.3M votes
– resulting in about 646k identified phish
• Why so many votes for only 646k phish?
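The labeling rule above can be sketched in a few lines of Python. This is a minimal illustration, assuming unweighted votes and an inclusive 70% threshold; how PhishTank weights individual votes is not given on the slide, and the function name `aggregate` is hypothetical.

```python
from typing import Optional

MIN_VOTES = 4          # from the slide: at least 4 votes
MIN_AGREEMENT = 0.70   # from the slide: 70% agreement (assumed inclusive)


def aggregate(phish_votes: int, legit_votes: int) -> Optional[str]:
    """Return 'phish', 'legitimate', or None if no label can be assigned yet."""
    total = phish_votes + legit_votes
    if total < MIN_VOTES:
        return None  # not enough votes yet
    if phish_votes / total >= MIN_AGREEMENT:
        return "phish"
    if legit_votes / total >= MIN_AGREEMENT:
        return "legitimate"
    return None  # enough votes, but neither side reaches 70%


if __name__ == "__main__":
    for phish, legit in [(3, 0), (3, 1), (2, 2), (5, 3), (1, 3)]:
        print(phish, legit, aggregate(phish, legit))
```

Under these assumptions a 2-2 or 5-3 split stays unlabeled, so a URL can absorb well over four votes before it is resolved.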
PhishTank Statistics
Jan 2011
Submissions 16019
Total Votes 69648
Valid Phish 12789
Invalid Phish 549
Median Time 2hrs 23min
• 69648 votes → max of 17412 labels
– But only 12789 phish and 549 legitimate identified
– 2681 URLs not identified at all
• Median delay of 2+ hours still has room
for improvement
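The arithmetic behind these figures can be checked directly. The short Python sketch below uses only the January 2011 numbers from this slide; the variable names are illustrative.

```python
# January 2011 figures quoted on the slide
submissions = 16019
total_votes = 69648
valid_phish = 12789
invalid_phish = 549
min_votes_per_label = 4  # PhishTank's minimum number of votes per label

# If every label needed exactly four votes, this many labels were possible
max_labels = total_votes // min_votes_per_label

# Submissions that ended up with neither a "valid" nor an "invalid" label
unlabeled = submissions - valid_phish - invalid_phish

# Votes cast per label actually produced
votes_per_label = total_votes / (valid_phish + invalid_phish)

if __name__ == "__main__":
    print(max_labels)                 # 17412
    print(unlabeled)                  # 2681
    print(round(votes_per_label, 2))  # 5.22
```

Roughly 5.2 votes were cast for every label produced, against a theoretical minimum of 4.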
Why Care?
• Can improve performance of
human-verified blacklists
– Dramatically reduce time to blacklist
– Improve breadth of coverage
– Offer same or better level of accuracy
• More broadly, new way of improving
performance of crowd for a task
Ways of Smartening the Crowd
• Change the order URLs are shown
– Ex. most recent vs closest to completion
• Change how submissions are shown
– Ex. show one at a time or in groups
• Adjust threshold for labels
– PhishTank is 4 votes and 70%
– Ex. vote weights, algorithm also votes
• Motivating people / allocating work
– Filtering by brand, competitions,
teams of voters, leaderboards
Overview of Our Work
• Crawled unverified submissions from
PhishTank over a 2-week period
• Replayed URLs on MTurk over 2 weeks
– Required participants to play
2 rounds of Anti-Phishing Phil
– Clustered phish by HTML similarity
– Two cases: phish one at a time, or in a
cluster (not strictly separate conditions)
– Evaluated effectiveness of vote weight
algorithm after the fact
Anti-Phishing Phil
• We had MTurkers play two rounds of
Phil [Sheng 2007] to qualify (µ = 5.2min)
• Goal was to reduce lazy MTurkers and
ensure base level of knowledge
Clustering Phish
• Observations
– Most phish are generated by toolkits and
thus are similar in content and appearance
– Can potentially reduce labor by labeling
suspicious sites in bulk
– Labeling single sites as phish can be hard
if unfamiliar, easier if multiple examples
Clustering Phish
• Motivations
– Most phish are generated by toolkits and
thus similar
– Labeling single sites as phish can be hard,
easier if multiple examples
– Reduce labor by labeling suspicious sites
in bulk
Most Phish Can be Clustered
• With all data over two weeks, 3180 of
3973 web pages can be grouped (80%)
– Used shingling and DBSCAN (see paper)
– 392 clusters, size from 2 to 153 URLs
MTurk Tasks
• Two kinds of tasks, control and cluster
– Listed these two as separate HITs
– MTurkers paid $0.01 per label
– Cannot do between-conditions on MTurk
– MTurker saw a given URL at most once
• Four votes minimum, 70% threshold
– Stopped at 4 votes, cannot dynamically
request more votes on MTurk
– 153 (3.9%) in control and 127 (3.2%) in
cluster not labeled
MTurk Tasks
• URLs were replayed in order
– Ex. If crawled at 2:51am from PhishTank
on day 1, then we would replay at 2:51am
on day 1 of experiment
– Listed new HITs each day rather than a
HIT lasting two weeks (to avoid delays
and last minute rush)
Summary of Experiment
• 3973 suspicious URLs
– Ground truth from Google, MSIE, and
PhishTank, checked every 10 min
– 3877 were phish, 96 not
• 239 MTurkers participated
– 174 did HITs for both control and cluster
– 26 in Control only, 39 in Cluster only
• Total of 33,781 votes counted (16,308 + 17,473)
– 16,308 in control
– 11,463 placed in cluster (equivalent to 17,473 per-URL votes)
• Cost (participants + Amazon): $476.67 USD
Results of Aquarium
• All votes are the individual votes
• Labeled URLs are after aggregation
Comparing Coverage and Time
Voteweight
• Use time and accuracy to weight votes
– Those who vote early and accurately
are weighted more
– Older votes discounted
– Incorporates a penalty for wrong votes
• Done after data was collected
– Harder to do in real-time since we don’t
know true label until later
• See paper for parameter tuning
– Of threshold and penalty function
Voteweight Results
• Control condition, best scenario
– Before voteweight: 94.8% accuracy, avg 11.8 hrs, median 3.8 hrs
– After voteweight: 95.6% accuracy, avg 11.0 hrs, median 2.3 hrs
• Cluster condition, best scenario
– Before voteweight: 95.4% accuracy, avg 1.8 hrs, median 0.7 hrs
– After voteweight: 97.2% accuracy, avg 0.8 hrs, median 0.5 hrs
• Overall: small gains, though potentially more
fragile and more complex
Limitations of Our Study
• Two limitations of MTurk
– No separation between control and cluster
– ~3% of URLs left unlabeled after 4 votes (would need more votes)
• Possible learning effects?
– Hard to tease out with our data
– Aquarium doesn’t offer feedback
– Everyone played Phil
– No condition prioritized over other
• Optimistic case, no active subversion
Conclusion
• Investigated two techniques for
smartening the crowd for anti-phishing
– Clustering and voteweight
• Clustering offers significant advantages
with respect to time and coverage
• Voteweight offers smaller
improvements in effectiveness
Acknowledgments
• Research supported by CyLab under
ARO DAAD19-02-1-0389 and
W911NF-09-1-0273
• Research Grants Council of the Hong
Kong Special Administrative Region,
China [Project No. CityU 117907]
Individual Accuracy
4. Framework
4.1 Whitelists
• Include 3208 domains
– From Google Safe Browsing (2784)
http://sb.google.com/safebrowsing/update?version=goog-white-domain:1:1
– From millersmiles (424)
http://www.millersmiles.co.uk/scams.php
• Reduce false positives
• Save human effort
4.2 Clustering
• Content similarity measurement (shingling method)
$$ r(q, d) = \frac{|S(q) \cap S(d)|}{|S(q) \cup S(d)|} $$
– S(q) and S(d) denote the sets of unique n-grams in pages q and d
– The threshold is 0.65
– Average time cost: 0.063 microseconds (SD = 0.05) to calculate the
similarity of two web pages on a laptop with a 2 GHz dual-core CPU
and 1 GB of RAM
• DBSCAN
– Eps = 0.65 and MinPts = 2
– The time cost of clustering all 3973 pages collected
was about 1 second
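As a rough illustration of the shingling measure, the Python sketch below computes r(q, d) as the size of the intersection of the two n-gram sets divided by the size of their union. The slide does not say which tokens or which n were used, so word 3-grams are an assumption here, and the function names are hypothetical. The DBSCAN step is not shown.

```python
import re
from typing import Set, Tuple

THRESHOLD = 0.65  # similarity threshold quoted on the slide


def shingles(text: str, n: int = 3) -> Set[Tuple[str, ...]]:
    """Set of unique word n-grams (n = 3 is an illustrative assumption)."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def resemblance(q: str, d: str, n: int = 3) -> float:
    """r(q, d) = |S(q) & S(d)| / |S(q) | S(d)|, or 0.0 if both sets are empty."""
    s_q, s_d = shingles(q, n), shingles(d, n)
    union = s_q | s_d
    if not union:
        return 0.0
    return len(s_q & s_d) / len(union)


if __name__ == "__main__":
    page_a = "please sign in to your account to verify your billing details"
    page_b = "please sign in to your account to verify your payment details"
    r = resemblance(page_a, page_b)
    print(f"r = {r:.3f}, similar = {r >= THRESHOLD}")
```

On real pages the n-grams would be taken from the page content rather than from a single sentence; short texts like the demo strings are very sensitive to a single changed word.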
4.2 Clustering
• Incremental update of the data
– If there is no similar web page, we create a new cluster for
the new submission.
– If the similarity is above the given threshold and all similar
web pages are in the same cluster, we assign the new
submission to this cluster (unless the cluster is at its
maximum size).
– If there are similar web pages in several different clusters,
we choose the largest cluster that is not at its maximum
size.
• After a new submission is grouped in a cluster
– It has zero votes and does not inherit the votes of any other
submissions in the same cluster.
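The incremental update rules can be sketched as follows. This is a hedged Python illustration, not the authors' implementation: the maximum cluster size (`max_size=50`), the toy `word_overlap` similarity, and the choice to open a new cluster when every similar cluster is full are all assumptions that the slide does not specify.

```python
from typing import Callable, Dict, List


def word_overlap(a: str, b: str) -> float:
    """Toy similarity for the demo: shared words / all words (an assumption)."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def assign_cluster(
    new_page: str,
    clusters: Dict[int, List[str]],
    similarity: Callable[[str, str], float] = word_overlap,
    threshold: float = 0.65,  # threshold quoted in the slides
    max_size: int = 50,       # hypothetical cap; the slides give no value
) -> int:
    """Place new_page in a cluster and return that cluster's id."""
    # Clusters that contain at least one page similar to the new submission
    similar = [
        cid for cid, pages in clusters.items()
        if any(similarity(new_page, p) >= threshold for p in pages)
    ]
    # Only clusters below their maximum size can accept the submission
    has_room = [cid for cid in similar if len(clusters[cid]) < max_size]
    if has_room:
        # One candidate: use it. Several: choose the largest.
        chosen = max(has_room, key=lambda cid: len(clusters[cid]))
    else:
        # No similar page (or, by assumption, all similar clusters are full)
        chosen = max(clusters, default=-1) + 1
        clusters[chosen] = []
    clusters[chosen].append(new_page)
    return chosen


if __name__ == "__main__":
    clusters = {0: ["a b c d e"], 1: ["a b c d g", "a b c d h"]}
    print(assign_cluster("a b c d f", clusters))  # similar to both clusters
    print(assign_cluster("x y z", clusters))      # similar to none
```

Votes would be stored per submission, starting at zero, so joining a cluster never transfers votes from its other members.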
4.3 Voteweight
The core idea behind voteweight is that participants
who are more helpful in terms of time and accuracy are
weighted more than other participants.
• It measures how strongly a user's votes influence the final
status of a suspicious URL.
• Its value comes from the accuracy of the user's historical data
– A correct vote should be rewarded and a wrong one should
be penalized
– Recent behavior should be weighted more than past behavior
4.3 Voteweight
• In our model, we use y ∈ [t, +∞) or y ∈ (−∞, −t] to label the
status of a URL,
– where y is the sum of voteweight for a given URL,
– t is the voteweight threshold,
– y ≥ t means the URL has been voted a phishing URL,
– y ≤ −t means it has been voted legitimate.
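The decision rule on this slide amounts to a three-way comparison. A minimal Python sketch follows; the function name is hypothetical, and leaving the URL unlabeled when |y| < t is the natural reading of the slide rather than something it states.

```python
from typing import Optional


def label_from_score(y: float, t: float) -> Optional[str]:
    """Map the summed voteweight y to a label using threshold t (t > 0)."""
    if y >= t:
        return "phish"
    if y <= -t:
        return "legitimate"
    return None  # |y| < t: not enough weighted agreement yet


if __name__ == "__main__":
    for y in (0.10, -0.08, 0.03):
        print(y, label_from_score(y, t=0.08))
```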
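4.3 Voteweight
$$ v'_i = \frac{v_i}{\sum_{k=1}^{M} v_k} $$
$$ v_i = \begin{cases} RV_i & \text{if } RV_i \ge 0 \\ 0 & \text{otherwise} \end{cases} $$
$$ RV_i = R_i - \alpha \cdot P_i $$
$$ R_i = \sum_{j=1}^{N} \frac{T_j - T_0 + 1}{T - T_0} \cdot I_{C_{ij} = L_j}(j) $$
$$ P_i = \sum_{j=1}^{N} \frac{T_j - T_0 + 1}{T - T_0} \cdot I_{C_{ij} \ne L_j}(j) $$
$$ I_A(x) = \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{if } x \notin A \end{cases} $$
$$ l_t = \sum_{i=1}^{K} v'_i \cdot C_{it} $$
$$ C_{it} = \begin{cases} 1 & \text{if voted as phish} \\ -1 & \text{otherwise} \end{cases} $$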
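The voteweight formulas can be sketched in Python as below. The slide does not define the symbols, so the following readings are assumptions: T_0 is the start of the observation window, T is the current time, T_j is the time of the user's vote on URL j, C_ij is user i's vote, and L_j is the final label of URL j (+1 phish, -1 legitimate). Function and variable names are hypothetical, and T > T_0 is assumed.

```python
from typing import Dict, List, Tuple

# One history entry per past vote: (T_j, C_ij, L_j)
History = List[Tuple[float, int, int]]


def raw_voteweight(history: History, t0: float, t_now: float, alpha: float) -> float:
    """v_i = max(R_i - alpha * P_i, 0), with time-scaled reward and penalty."""
    span = t_now - t0  # assumed > 0
    reward = sum((tj - t0 + 1) / span for tj, c, l in history if c == l)
    penalty = sum((tj - t0 + 1) / span for tj, c, l in history if c != l)
    return max(reward - alpha * penalty, 0.0)


def normalized_voteweights(
    histories: Dict[str, History], t0: float, t_now: float, alpha: float
) -> Dict[str, float]:
    """v'_i = v_i / sum_k v_k (all zeros if nobody has positive weight)."""
    raw = {u: raw_voteweight(h, t0, t_now, alpha) for u, h in histories.items()}
    total = sum(raw.values())
    if total == 0:
        return {u: 0.0 for u in raw}
    return {u: v / total for u, v in raw.items()}


def url_score(votes: Dict[str, int], weights: Dict[str, float]) -> float:
    """l_t = sum_i v'_i * C_it, with C_it = +1 for phish and -1 otherwise."""
    return sum(weights.get(u, 0.0) * c for u, c in votes.items())


if __name__ == "__main__":
    histories = {
        "alice": [(1, 1, 1), (3, -1, -1)],  # two correct votes
        "bob": [(2, 1, -1)],                # one wrong vote
        "carol": [(4, 1, 1)],               # one recent correct vote
    }
    weights = normalized_voteweights(histories, t0=0, t_now=10, alpha=2.5)
    print(weights)
    print(url_score({"alice": 1, "bob": -1, "carol": -1}, weights))
```

With these toy histories, bob's single wrong vote drives his raw weight below zero, so it is clipped to 0 and his vote no longer affects the score.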
7. Investigating Voteweight
Tuning Parameters in Control Condition
• Voteweight achieves its best accuracy of 95.6% and a time cost of 11
hours with t = 0.08 and α = 2.5 in the control condition
• Average time cost drops to 11 hours (11.8 hours without
voteweight)
• Median time cost drops to 2.3 hours (3.8 hours without voteweight)
7. Investigating Voteweight
Tuning Parameters in Cluster Condition
• Voteweight achieves its best accuracy of 97.2% and a time cost of
0.8 hours with t = 0.06 and α = 1 in the cluster condition
• Average time cost drops to 0.8 hours (1.8 hours without
voteweight)
• Median time cost drops to 0.5 hours (0.7 hours without
voteweight)
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 

Último (20)

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 

Smartening the Crowd: Computational Techniques for Improving Human Verification, at SOUPS 2011

  • 1. ©2009CarnegieMellonUniversity:1 Smartening the Crowds: Computational Techniques for Improving Human Verification to Fight Phishing Scams
      Gang Liu, Wenyin Liu (Department of Computer Science, City University of Hong Kong)
      Guang Xiang, Bryan A. Pendleton, Jason I. Hong (Carnegie Mellon University)
  • 3. ©2011CarnegieMellonUniversity:3 Detecting Phishing Websites
      • Method 1: Use heuristics
          – Unusual patterns in URL, HTML, topology
          – Approach is favored by researchers
          – High true positives, some false positives
      • Method 2: Manually verify
          – Approach used by industry blacklists today (Microsoft, Google, PhishTank)
          – Very few false positives, low risk of liability
          – Slow, easy to overwhelm
  • 7. Wisdom of Crowds Approach
      • Mechanics of PhishTank
          – Submissions require at least 4 votes and 70% agreement
          – Some votes weighted more
      • Total stats (Oct 2006 – Feb 2011)
          – 1.1M URL submissions from volunteers
          – 4.3M votes
          – resulting in about 646k identified phish
      • Why so many votes for only 646k phish?
  • 8. PhishTank Statistics
      Jan 2011
          Submissions      16019
          Total Votes      69648
          Valid Phish      12789
          Invalid Phish    549
          Median Time      2 hrs 23 min
      • 69648 votes → max of 17412 labels
          – But only 12789 phish and 549 legitimate identified
          – 2681 URLs not identified at all
      • Median delay of 2+ hours still has room for improvement
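The arithmetic behind the figures on this slide can be checked with a short sketch. It assumes that the "max of 17412 labels" comes from dividing the vote total by PhishTank's 4-vote minimum; the variable names are illustrative and do not come from the talk.

```python
# Figures copied from the Jan 2011 PhishTank statistics on the slide.
submissions = 16019
total_votes = 69648
valid_phish = 12789
invalid_phish = 549

# Assumption: each label needs at least 4 votes (PhishTank's stated minimum).
min_votes_per_label = 4

# Upper bound on labels if every vote had been spent with no waste.
max_labels = total_votes // min_votes_per_label

# Submissions that never reached a verdict.
unlabeled = submissions - valid_phish - invalid_phish

# Average number of votes actually spent per labeled URL.
votes_per_label = total_votes / (valid_phish + invalid_phish)

print(max_labels, unlabeled, round(votes_per_label, 2))
```

Under these assumptions the labeled URLs consumed a little over five votes each on average, which is one way to read the gap between the vote total and the number of labels.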
  • 9. Why Care?
      • Can improve performance of human-verified blacklists
          – Dramatically reduce time to blacklist
          – Improve breadth of coverage
          – Offer same or better level of accuracy
      • More broadly, new way of improving performance of crowd for a task
  • 10. Ways of Smartening the Crowd
      • Change the order URLs are shown
          – Ex. most recent vs closest to completion
      • Change how submissions are shown
          – Ex. show one at a time or in groups
      • Adjust threshold for labels
          – PhishTank is 4 votes and 70%
          – Ex. vote weights, algorithm also votes
      • Motivating people / allocating work
          – Filtering by brand, competitions, teams of voters, leaderboards
  • 15. Overview of Our Work
      • Crawled unverified submissions from PhishTank over a 2-week period
      • Replayed URLs on MTurk over 2 weeks
          – Required participants to play 2 rounds of Anti-Phishing Phil
          – Clustered phish by HTML similarity
          – Two cases: phish one at a time, or in a cluster (not strictly separate conditions)
          – Evaluated effectiveness of vote weight algorithm after the fact
  • 16. Anti-Phishing Phil
      • We had MTurkers play two rounds of Phil [Sheng 2007] to qualify (µ = 5.2 min)
      • Goal was to reduce lazy MTurkers and ensure base level of knowledge
  • 17. Clustering Phish
      • Observations
          – Most phish are generated by toolkits and thus are similar in content and appearance
          – Can potentially reduce labor by labeling suspicious sites in bulk
          – Labeling single sites as phish can be hard if unfamiliar, easier if multiple examples
  • 18. Clustering Phish
      • Motivations
          – Most phish are generated by toolkits and thus similar
          – Labeling single sites as phish can be hard, easier if multiple examples
          – Reduce labor by labeling suspicious sites in bulk
  • 20. Most Phish Can be Clustered
      • With all data over two weeks, 3180 of 3973 web pages can be grouped (80%)
          – Used shingling and DBSCAN (see paper)
          – 392 clusters, size from 2 to 153 URLs
  • 22. MTurk Tasks
      • Two kinds of tasks, control and cluster
          – Listed these two as separate HITs
          – MTurkers paid $0.01 per label
          – Cannot do between-conditions on MTurk
          – MTurker saw a given URL at most once
      • Four votes minimum, 70% threshold
          – Stopped at 4 votes, cannot dynamically request more votes on MTurk
          – 153 (3.9%) in control and 127 (3.2%) in cluster not labeled
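The labeling rule on this slide (four votes minimum, 70% agreement) can be sketched as follows. This is a minimal illustration, not the study's code: the function name `aggregate_label` and the unweighted majority share are assumptions made for the example.

```python
from typing import Optional, Sequence


def aggregate_label(votes: Sequence[bool],
                    min_votes: int = 4,
                    agreement: float = 0.70) -> Optional[str]:
    """Return 'phish', 'legitimate', or None when no verdict is reached.

    votes: True means "this is phish", False means "this is legitimate".
    Hypothetical helper; every vote counts equally here.
    """
    n = len(votes)
    if n < min_votes:
        return None  # not enough votes yet
    phish_votes = sum(votes)
    legit_votes = n - phish_votes
    if phish_votes / n >= agreement:
        return "phish"
    if legit_votes / n >= agreement:
        return "legitimate"
    return None  # e.g. a 2-2 tie stays unlabeled


print(aggregate_label([True, True, True, False]))   # 3 of 4 agree
print(aggregate_label([True, True, False, False]))  # tie
```

With exactly four votes, a label needs at least three matching votes, so a 2-2 split cannot be resolved without more votes. That is consistent with the small share of URLs left unlabeled in both conditions.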
  • 23. MTurk Tasks
      • URLs were replayed in order
          – Ex. If crawled at 2:51am from PhishTank on day 1, then we would replay at 2:51am on day 1 of experiment
          – Listed new HITs each day rather than a HIT lasting two weeks (to avoid delays and last-minute rush)
  • 24. Summary of Experiment
      • 3973 suspicious URLs
          – Ground truth from Google, MSIE, and PhishTank, checked every 10 min
          – 3877 were phish, 96 not
      • 239 MTurkers participated
          – 174 did HITs for both control and cluster
          – 26 in control only, 39 in cluster only
      • Total of 33,781 votes placed (counting cluster votes at their per-URL equivalent)
          – 16,308 in control
          – 11,463 in cluster (17,473 equivalent)
      • Cost (participants + Amazon): $476.67 USD
  • 25. Results of Aquarium
      • All votes are the individual votes
      • Labeled URLs are after aggregation
  • 27. Voteweight
      • Use time and accuracy to weight votes
          – Those who vote early and accurately are weighted more
          – Older votes discounted
          – Incorporates a penalty for wrong votes
      • Done after data was collected
          – Harder to do in real time since we don’t know true label until later
      • See paper for parameter tuning
          – Of threshold and penalty function
  • 28. Voteweight Results
      • Control condition, best scenario (before vs. after voteweight)
          – Before: 94.8% accuracy, avg 11.8 hrs, median 3.8 hrs
          – After: 95.6% accuracy, avg 11.0 hrs, median 2.3 hrs
      • Cluster condition, best scenario (before vs. after voteweight)
          – Before: 95.4% accuracy, avg 1.8 hrs, median 0.7 hrs
          – After: 97.2% accuracy, avg 0.8 hrs, median 0.5 hrs
      • Overall: small gains, though potentially more fragile and more complex
  • 29. Limitations of Our Study
      • Two limitations of MTurk
          – No separation between control and cluster
          – ~3% tie votes unresolved (more votes)
      • Possible learning effects?
          – Hard to tease out with our data
          – Aquarium doesn’t offer feedback
          – Everyone played Phil
          – No condition prioritized over other
      • Optimistic case, no active subversion
  • 30. Conclusion
      • Investigated two techniques for smartening the crowd for anti-phishing
          – Clustering and voteweight
      • Clustering offers significant advantages with respect to time and coverage
      • Voteweight offers smaller improvements in effectiveness
  • 31. Acknowledgments
      • Research supported by CyLab under ARO DAAD19-02-1-0389 and W911NF-09-1-0273
      • Research Grants Council of the Hong Kong Special Administrative Region, China [Project No. CityU 117907]
  • 35. 4.1 Whitelists
      • Include 3208 domains
          – From Google Safe Browsing (2784): http://sb.google.com/safebrowsing/update?version=goog-white-domain:1:1
          – From millersmiles (424): http://www.millersmiles.co.uk/scams.php
      • Reduce false positives
      • Save human effort
  • 36. 4.2 Clustering
      • Content similarity measurement (shingling method)
          – $r(q, d) = \frac{|S(q) \cap S(d)|}{|S(q) \cup S(d)|}$, where S(q) and S(d) denote the sets of unique n-grams in pages q and d
          – The threshold is 0.65
          – Average time cost: 0.063 microseconds (SD = 0.05) to calculate the similarity of two web pages on a laptop with a 2 GHz dual-core CPU and 1 GB of RAM
      • DBSCAN
          – Eps = 0.65 and MinPts = 2
          – The time cost of clustering all 3973 collected pages was about 1 second
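The similarity measure and grouping step on this slide can be sketched in a few lines of Python. Several choices below are assumptions for illustration only: word-level 3-grams as shingles, treating 0.65 as a minimum similarity, and the helper names `shingles`, `resemblance`, and `cluster_pages`. The grouping is not a full DBSCAN. It relies on the observation that with MinPts = 2 (counting the point itself), every page with at least one sufficiently similar neighbour is a core point, so the clusters coincide with the connected components of the "similar enough" graph.

```python
import re
from itertools import combinations


def shingles(text, n=3):
    """Set of unique word n-grams in a page (assumed tokenization)."""
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def resemblance(q, d, n=3):
    """r(q, d) = |S(q) & S(d)| / |S(q) | S(d)|, in [0, 1]."""
    sq, sd = shingles(q, n), shingles(d, n)
    union = sq | sd
    return len(sq & sd) / len(union) if union else 0.0


def cluster_pages(pages, threshold=0.65, n=3):
    """Group page indices whose resemblance chains reach the threshold."""
    parent = list(range(len(pages)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(range(len(pages)), 2):
        if resemblance(pages[a], pages[b], n) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for i in range(len(pages)):
        groups.setdefault(find(i), []).append(i)
    # Singletons are treated as unclustered, as noise would be in DBSCAN.
    return [g for g in groups.values() if len(g) >= 2]


pages = [
    "please verify your paypal account login now to avoid suspension",
    "please verify your paypal account login now to avoid suspension today",
    "weather forecast for hong kong this weekend is sunny",
]
print(cluster_pages(pages))
```

The pairwise loop is quadratic in the number of pages, which is acceptable for a sketch at the scale of a few thousand pages but would need an index such as MinHash for larger collections.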
  • 37. 4.2 Clustering
      • Incremental update of the data
          – If there is no similar web page, we create a new cluster for the new submission.
          – If the similarity is above the given threshold and all similar web pages are in the same cluster, we assign the new submission to this cluster (unless the cluster is at its maximum size).
          – If there are many similar web pages in different clusters, we choose the largest cluster that is not at its maximum size.
      • After a new submission is grouped in a cluster
          – It has zero votes and does not inherit the votes of any other submissions in the same cluster.
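The incremental update rules on this slide can be sketched as a single assignment function. The data structures and the name `assign_submission` are hypothetical. The slide does not say what happens when every matching cluster is already full, so opening a new cluster in that case is an assumption of this sketch.

```python
def assign_submission(new_id, similar_ids, cluster_of, clusters, votes, max_size):
    """Place a new submission into a cluster and return the cluster id.

    similar_ids: ids of existing pages whose similarity to the new page is
                 above the threshold (computed elsewhere).
    cluster_of:  dict mapping page id -> cluster id.
    clusters:    dict mapping cluster id -> list of page ids.
    votes:       dict mapping page id -> list of votes.
    """
    candidates = {cluster_of[s] for s in similar_ids}
    open_clusters = [c for c in candidates if len(clusters[c]) < max_size]

    if open_clusters:
        # One matching cluster, or the largest of several that has room.
        target = max(open_clusters, key=lambda c: len(clusters[c]))
    else:
        # No similar page, or (assumption) all matching clusters are full.
        target = max(clusters, default=-1) + 1
        clusters[target] = []

    clusters[target].append(new_id)
    cluster_of[new_id] = target
    votes[new_id] = []  # starts with zero votes, nothing is inherited
    return target


clusters = {0: ["a", "b"], 1: ["c"]}
cluster_of = {"a": 0, "b": 0, "c": 1}
votes = {}
print(assign_submission("d", [], cluster_of, clusters, votes, max_size=3))
print(assign_submission("e", ["a", "c"], cluster_of, clusters, votes, max_size=3))
```

Keeping votes per submission rather than per cluster mirrors the last rule on the slide: joining a cluster changes how a page is shown, not what has already been decided about it.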
  • 38. 4.3 Voteweight
      The core idea behind voteweight is that participants who are more helpful in terms of time and accuracy are weighted more than other participants.
      • It measures how strongly a user's votes affect the final status of a suspicious URL.
      • Its value is derived from the accuracy of the user's past votes
          – a correct vote should be rewarded and a wrong one should be penalized
          – recent behavior should be weighted more than past behavior
  • 39. 4.3 Voteweight
      • In our model, we use $y \in [t, +\infty) \cup (-\infty, -t]$ to label the status of a URL,
          – where y is the sum of voteweight of a given URL,
          – t is the threshold of voteweight,
          – y ≥ t means a URL has been voted as a phishing URL,
          – y ≤ −t means voted as legitimate.
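  • 40. 4.3 Voteweight
      $v'_i = \frac{v_i}{\sum_{k=1}^{M} v_k}$
      $v_i = \begin{cases} RV_i & \text{if } RV_i \ge 0 \\ 0 & \text{otherwise} \end{cases}$
      $RV_i = R_i - \alpha \cdot P_i$
      $R_i = \sum_{j=1}^{N} \frac{T_j - T_0 + 1}{T - T_0} \cdot I_{(C_{ij} = L_j)}(j)$
      $P_i = \sum_{j=1}^{N} \frac{T_j - T_0 + 1}{T - T_0} \cdot I_{(C_{ij} \ne L_j)}(j)$
      $I_A(x) = \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{if } x \notin A \end{cases}$
      $l_t = \sum_{i=1}^{K} v'_i \cdot C_{it}$
      $C_{it} = \begin{cases} 1 & \text{if voted as phish} \\ -1 & \text{otherwise} \end{cases}$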
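A minimal sketch of how these voteweight formulas could be evaluated is given below. The slide does not define every symbol, so the following readings are assumptions: T_j is the time of a user's j-th vote, T_0 the start of the observation window, T the current time, C_ij the user's vote, L_j the true label, and the normalization runs over all users with a weight. The function names are hypothetical.

```python
def raw_voteweight(history, t0, now, alpha):
    """v_i for one user: time-weighted reward minus alpha times penalty.

    history: list of (vote_time, voted_phish, truly_phish) tuples.
    Recent votes get a larger factor (T_j - T_0 + 1) / (T - T_0).
    """
    span = now - t0
    reward = sum((tj - t0 + 1) / span for tj, c, l in history if c == l)
    penalty = sum((tj - t0 + 1) / span for tj, c, l in history if c != l)
    rv = reward - alpha * penalty
    return rv if rv >= 0 else 0.0  # negative weights are clipped to zero


def normalize(weights):
    """v'_i = v_i / sum of all v_k."""
    total = sum(weights.values())
    return {u: (w / total if total else 0.0) for u, w in weights.items()}


def url_score(votes, norm_weights):
    """Sum of v'_i * C_i, with C_i = +1 for a phish vote and -1 otherwise."""
    return sum(norm_weights.get(u, 0.0) * (1 if v else -1)
               for u, v in votes.items())


def label(score, t):
    """Threshold rule: score >= t is phish, score <= -t is legitimate."""
    if score >= t:
        return "phish"
    if score <= -t:
        return "legitimate"
    return None


weights = normalize({
    "A": raw_voteweight([(1, True, True), (2, True, True)], t0=0, now=4, alpha=2),
    "B": raw_voteweight([(1, True, False)], t0=0, now=4, alpha=2),
})
print(label(url_score({"A": True, "B": False}, weights), t=0.08))
```

In this toy run user B's single wrong vote drives the raw weight below zero, so B's vote no longer influences the score. Because the true label L_j is needed to compute the weights, this kind of scoring is easier to apply after the fact than in real time.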
  • 41. 7. Investigating Voteweight: Tuning Parameters in Control Condition
      • Voteweight achieves its best accuracy of 95.6% and time cost of 11 hours with t = 0.08 and α = 2.5 in the control condition
          – Average time cost drops to 11 hours (11.8 hours without voteweight)
          – Median time cost drops to 2.3 hours (3.8 hours without voteweight)
  • 42. 7. Investigating Voteweight: Tuning Parameters in Cluster Condition
      • Voteweight achieves its best accuracy of 97.2% and time cost of 0.8 hours with t = 0.06 and α = 1 in the cluster condition
          – Average time cost drops to 0.8 hours (1.8 hours without voteweight)
          – Median time cost drops to 0.5 hours (0.7 hours without voteweight)

Editor's Notes

  1. Average accuracy for each decile of users, sorted by accuracy. For example, the average accuracy of the top 10% of users in both conditions was 100%, whereas the average accuracy of the bottom 10% was under 30% for the Control Condition and under 50% in the Cluster Condition.