Attacks Against Captcha Systems - DefCamp 2012

Attacking CAPTCHAs explained

Ioan – Carol Plangu

What's a CAPTCHA?

Completely Automated Public Turing test to tell Computers and Humans Apart

Three attack methods

− Implementation attack
− Automated recognition
− Manual labor

The implementation attack
Scenario 1

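the image session id can be reused: the server does not invalidate the id after a correct answer, so one solved captcha (id + solution) can be replayed with every submission of the restricted form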

(Diagram: the id issued by the Captcha page is reused in the Restricted page form)
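A minimal sketch of why Scenario 1 works. `CaptchaStore`, `issue`, `verify` and `replay` are hypothetical names, not taken from any real captcha library: a verifier that only looks the id up (and never deletes it) accepts the same solved pair forever, while one that burns the id on first use accepts it exactly once.

```python
import secrets
from typing import Dict


class CaptchaStore:
    """Hypothetical server-side captcha store keyed by the image session id."""

    def __init__(self, single_use: bool) -> None:
        self.single_use = single_use
        self._answers: Dict[str, str] = {}

    def issue(self, answer: str) -> str:
        """Create a captcha with a known answer and return its session id."""
        captcha_id = secrets.token_hex(16)
        self._answers[captcha_id] = answer
        return captcha_id

    def verify(self, captcha_id: str, guess: str) -> bool:
        """Check an (id, solution) pair submitted with the restricted form."""
        if self.single_use:
            answer = self._answers.pop(captcha_id, None)  # id is burned on first use
        else:
            answer = self._answers.get(captcha_id)        # flaw: id stays valid forever
        return answer is not None and secrets.compare_digest(answer, guess)


def replay(store: CaptchaStore, captcha_id: str, solution: str, times: int) -> int:
    """The attack: solve one captcha by hand, then resubmit the same pair."""
    return sum(store.verify(captcha_id, solution) for _ in range(times))
```

Against the flawed store, one human solve buys an unlimited number of accepted submissions; the fix is simply to delete the id on the first verification attempt, whether it succeeds or not.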

Scenario 2

the number of captcha tests is limited

we just need to solve them all once and store the solutions in a hash table
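A sketch of Scenario 2, assuming the server serves byte-identical image files from its fixed pool. `image_key`, `build_table` and `lookup` are hypothetical helpers; `solve` stands for whoever or whatever solves each image once.

```python
import hashlib
from typing import Callable, Dict, Iterable, Optional


def image_key(image_bytes: bytes) -> str:
    """Identify a captcha image by the SHA-256 of its raw bytes."""
    return hashlib.sha256(image_bytes).hexdigest()


def build_table(pool: Iterable[bytes], solve: Callable[[bytes], str]) -> Dict[str, str]:
    """Solve every captcha in the (finite) pool once, e.g. by hand."""
    return {image_key(img): solve(img) for img in pool}


def lookup(table: Dict[str, str], image_bytes: bytes) -> Optional[str]:
    """Answer a captcha instantly if its image has been seen before."""
    return table.get(image_key(image_bytes))
```

If the server re-encodes or slightly alters the images on every request, an exact byte hash no longer matches and a perceptual hash would be needed instead.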

Scenario 3

hash of solution sent to client

the solution can be recovered offline from its hash: rainbow tables :)
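A sketch of Scenario 3, assuming (for illustration only) that the client receives an unsalted MD5 of a short solution over a known alphabet. This builds a full precomputed lookup table rather than a true rainbow table; rainbow tables store hash chains instead of every pair, trading lookup time for much less memory, which matters once the space grows (36^6 is already about 2.2 billion solutions).

```python
import hashlib
import itertools
from typing import Dict


def precompute(alphabet: str, length: int) -> Dict[str, str]:
    """Map md5(solution) -> solution for every possible solution string."""
    table: Dict[str, str] = {}
    for chars in itertools.product(alphabet, repeat=length):
        candidate = "".join(chars)
        table[hashlib.md5(candidate.encode()).hexdigest()] = candidate
    return table
```

Salting the hash with a server-side secret, or simply keeping the solution on the server, defeats this.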

Manual labor

There are two options:

Or not...

(Mock-up of an adult "XXX" site: "Complete this captcha form to continue". Its visitors unknowingly solve the target's captchas.)

Automated recognition
We're going to reproduce, by machine, the response a human would give to the challenge

The sound sample is usually generated speech

It's hard to add noise to the generated speech without making it hard for the human listener as well

The most common approach

− Greedy optimization: reverse engineer everything
− Character segmentation
− OCR
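To make the OCR step concrete, here is a deliberately simple stand-in (nearest-template matching by Hamming distance, not the OCR used in the talk) that reads characters once they have been segmented. `Glyph`, `hamming`, `classify` and `read` are illustrative names.

```python
from typing import Dict, List

Glyph = List[str]  # rows of '#' (ink) and '.' (background), all the same size


def hamming(a: Glyph, b: Glyph) -> int:
    """Number of pixels that differ between two equally sized glyphs."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def classify(glyph: Glyph, templates: Dict[str, Glyph]) -> str:
    """Return the label of the closest template (a very basic OCR)."""
    return min(templates, key=lambda label: hamming(glyph, templates[label]))


def read(segments: List[Glyph], templates: Dict[str, Glyph]) -> str:
    """Run the OCR stage on already segmented characters, left to right."""
    return "".join(classify(g, templates) for g in segments)
```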

Possible security measures

Funky background image
− usually can be removed with basic preprocessing
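A minimal preprocessing sketch, assuming the text is darker than the background: a global threshold drops the light background, and a despeckling pass removes isolated dark dots. The threshold of 100 and the one-neighbour rule are arbitrary illustrative values.

```python
from typing import List

Gray = List[List[int]]    # 0 = black ... 255 = white
Binary = List[List[int]]  # 1 = ink, 0 = background


def binarize(img: Gray, threshold: int = 100) -> Binary:
    """Keep only pixels darker than the threshold (assumes dark text)."""
    return [[1 if px < threshold else 0 for px in row] for row in img]


def despeckle(bw: Binary, min_neighbours: int = 1) -> Binary:
    """Drop ink pixels that have fewer than min_neighbours ink pixels around them."""
    h, w = len(bw), len(bw[0])
    out = [row[:] for row in bw]
    for y in range(h):
        for x in range(w):
            if not bw[y][x]:
                continue
            # Count ink in the 8-neighbourhood, clipped at the image border.
            n = sum(bw[yy][xx]
                    for yy in range(max(0, y - 1), min(h, y + 2))
                    for xx in range(max(0, x - 1), min(w, x + 2))
                    if (yy, xx) != (y, x))
            if n < min_neighbours:
                out[y][x] = 0
    return out
```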



Text distortions
− modern OCR techniques can beat it

Anti-segmentation measures

Beating segmentation

If a character's signature can be extracted from its vertical signature alone (the number of black pixels in each column), character segmentation becomes trivial
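The vertical signature can be sketched as a column profile (black pixels per column); cutting the image at near-empty columns then yields the character spans. `max_ink` is a hypothetical tolerance for thin joins between characters, not a parameter from the cited paper.

```python
from typing import List, Optional, Tuple

Binary = List[List[int]]  # 1 = ink, 0 = background


def vertical_profile(bw: Binary) -> List[int]:
    """Count ink pixels in each column (the vertical projection)."""
    return [sum(col) for col in zip(*bw)]


def segment(bw: Binary, max_ink: int = 0) -> List[Tuple[int, int]]:
    """Cut at columns with at most max_ink ink pixels; return [start, end) spans."""
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for x, count in enumerate(vertical_profile(bw)):
        if count > max_ink and start is None:
            start = x                    # entering a character
        elif count <= max_ink and start is not None:
            spans.append((start, x))     # leaving a character
            start = None
    if start is not None:
        spans.append((start, len(bw[0])))
    return spans
```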

A Low-cost Attack on a Microsoft CAPTCHA - Jeff Yan, Ahmad Salah El Ahmad
School of Computing Science, Newcastle University, UK


Otherwise, we can ignore segmentation altogether!

The following slides describe an experiment with this approach

A Monte-Carlo experiment


Note: to test performance, the variance of the characters was kept to a minimum

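$$f(x) \to y, \qquad x \in \{0, 1\}^{3000}, \qquad y \in \{0, 1, \dots, 9\}^{6}$$

i.e. x is a 3000-pixel binary image (2^3000 possible inputs) and y is a 6-digit string (10^6 possible outputs)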

Training:

− Select one character image at random
− Select N black spots from it
− Sort the points, so that the same set of spots always yields the same feature
− Subtract the first point from all the others, for position independence
− Assign the feature a 'weight' for each character:
  weight = matched characters count / sample size
− Assign it a 'score' (indicates classification quality):
  score = selected digit weight / (1 + sum of the other digit weights)
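A sketch of the training steps above, assuming glyphs are stored as sets of black-pixel coordinates, that a feature "matches" an image when its spot pattern fits on black pixels anywhere in it, and that "sample size" means the number of samples of that character. All names are hypothetical; this is not the code from the demo repository.

```python
import random
from typing import Dict, List, Set, Tuple

Point = Tuple[int, int]      # (row, col) of a black pixel
Image = Set[Point]           # a glyph stored as its set of black pixels
Offsets = Tuple[Point, ...]  # feature: spots relative to the first spot


def make_feature(image: Image, n: int, rng: random.Random) -> Offsets:
    """Select N black spots, sort them, and make them relative to the first one."""
    spots = sorted(rng.sample(sorted(image), n))      # sorted: one canonical form per set
    r0, c0 = spots[0]
    return tuple((r - r0, c - c0) for r, c in spots)  # position independence


def matches(feature: Offsets, image: Image) -> bool:
    """True if the spot pattern fits on black pixels anywhere in the image."""
    return any(all((r + dr, c + dc) in image for dr, dc in feature)
               for r, c in image)


def weights(feature: Offsets, samples: Dict[str, List[Image]]) -> Dict[str, float]:
    """weight[ch] = matched images of ch / number of samples of ch."""
    return {ch: sum(matches(feature, img) for img in imgs) / len(imgs)
            for ch, imgs in samples.items()}


def score(selected: str, w: Dict[str, float]) -> float:
    """Selected digit weight / (1 + sum of the other digits' weights)."""
    others = sum(v for ch, v in w.items() if ch != selected)
    return w[selected] / (1 + others)


def train(samples: Dict[str, List[Image]], n: int, rounds: int,
          seed: int = 0) -> List[Tuple[Offsets, str, float]]:
    """Run `rounds` random trials; each yields (feature, character, score)."""
    rng = random.Random(seed)
    labels = sorted(samples)
    features = []
    for _ in range(rounds):
        ch = rng.choice(labels)
        image = rng.choice(samples[ch])
        if len(image) < n:
            continue  # not enough black spots to sample from
        feature = make_feature(image, n, rng)
        features.append((feature, ch, score(ch, weights(feature, samples))))
    return features
```

A score close to 1 means the feature fires on its own character and almost never on the others; a feature that fires on every character is pushed towards 0 by the denominator.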

Recognition:

− Make a score map for all points
− Select the most appropriate character for each column
− Process the resulting string into a 6-digit string
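A matching sketch of the recognition steps, under the same assumptions as the training sketch (features are `(offsets, character, score)` triples). How the per-column string is turned into 6 digits is not specified on the slide; here runs of the same character are merged and the `length` strongest runs are kept in left-to-right order, which is only one plausible choice.

```python
from typing import Dict, List, Optional, Set, Tuple

Point = Tuple[int, int]                         # (row, col) of a black pixel
Feature = Tuple[Tuple[Point, ...], str, float]  # (offsets, character, score)


def score_map(image: Set[Point], features: List[Feature]) -> Dict[int, Dict[str, float]]:
    """For every point where a feature matches, add its score to that column."""
    cols: Dict[int, Dict[str, float]] = {}
    for offsets, ch, s in features:
        for r, c in image:
            if all((r + dr, c + dc) in image for dr, dc in offsets):
                col = cols.setdefault(c, {})
                col[ch] = col.get(ch, 0.0) + s
    return cols


def recognize(image: Set[Point], features: List[Feature], width: int,
              length: int = 6) -> str:
    cols = score_map(image, features)
    # Most appropriate character for each column (None where nothing matched).
    best: List[Optional[str]] = [max(cols[c], key=cols[c].get) if c in cols else None
                                 for c in range(width)]
    # Merge runs of the same character, summing their scores.
    runs: List[list] = []  # each entry: [character, total score]
    prev: Optional[str] = None
    for c, ch in enumerate(best):
        if ch is not None and ch == prev:
            runs[-1][1] += cols[c][ch]
        elif ch is not None:
            runs.append([ch, cols[c][ch]])
        prev = ch
    # Keep the `length` strongest runs, restored to left-to-right order.
    strongest = sorted(range(len(runs)), key=lambda i: runs[i][1], reverse=True)[:length]
    return "".join(runs[i][0] for i in sorted(strongest))
```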

An equivalent model

The whole stack below is equivalent to an OCR:

− input layer
− linear hidden layer (feature layer), without zero penalty
− threshold layers, with no biases for the first layer (avoids the 2*binary - 1 effect)
− softmax layer
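A sketch of the equivalent network evaluated at a single, fixed position (the Monte-Carlo matcher instead slides each feature over the image). Each feature is a bias-free linear unit with weight 1 on its N pixels and 0 elsewhere, so white pixels cost nothing; a threshold unit fires only when all N pixels are black; fired features add their score to their character, and a softmax normalizes the result. This layout is an assumption about the slide, not the demo's code.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A feature: indices of its N pixels, the character it votes for, and its score.
Feature = Tuple[Sequence[int], str, float]


def forward(x: Sequence[int], features: List[Feature], labels: List[str]) -> Dict[str, float]:
    """x is the flattened binary image (1 = black, 0 = white)."""
    logits = {label: 0.0 for label in labels}
    for pixels, label, score in features:
        # Linear unit, no bias: weight 1 on the feature's pixels, 0 elsewhere,
        # so white pixels contribute nothing (no zero penalty).
        activation = sum(x[i] for i in pixels)
        # Threshold unit: fires only when all N spots are black.
        if activation >= len(pixels):
            logits[label] += score
    # Softmax over the per-character evidence (shifted by the max for stability).
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}
```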

Hacking the OCR:

To negate the effect of the biases, we add random noise to the white areas of each image

This greatly improves recognition on noisy images
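A sketch of the noise trick, assuming "random noise in the white areas" means flipping white pixels to black with some probability `p` (a hypothetical parameter) while leaving the character's own black pixels untouched.

```python
import random
from typing import List

Binary = List[List[int]]  # 1 = black (ink), 0 = white


def add_white_noise(bw: Binary, p: float, rng: random.Random) -> Binary:
    """Flip each white pixel to black with probability p; black pixels are kept."""
    return [[1 if px or rng.random() < p else 0 for px in row] for row in bw]
```

Presumably, features that also fire on such random specks end up matching many characters, so their weights spread out and their scores drop.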

A more powerful model

− input layer
− hacked OCR layer
− score map
− output layer

The demo source is hosted at
https://github.com/theshark08/howtobreakacaptcha01
