Detecting Deception in Writing Style

Sadia Afroz, Michael Brennan and Rachel Greenstadt
Privacy, Security and Automation Lab
Drexel University

Overview

•  Authorship recognition
•  Authorship recognition in an adversarial environment
•  Deception detection
•  Experiments on different datasets

Authorship recognition

Who wrote the document?

Authorship recognition

Stylometry:
   –  An authorship recognition system based solely on writing style.
   –  Not handwriting.
   –  Only linguistic style: word choice, sentence length, parts-of-speech usage, …
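
A minimal sketch of what "linguistic style" features can look like in code. The function name, the sentence-splitting heuristic, and the choice of four features are illustrative assumptions, not the feature sets used in the talk (parts-of-speech usage, for instance, would need a tagger and is left out).

```python
import re

def stylometric_features(text: str) -> dict:
    """Compute a few simple writing-style features from raw text.

    Hypothetical helper for illustration; not the feature set from the talk.
    """
    # Rough sentence split on '.', '!' and '?' (an assumption, not a real tokenizer).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of letters and apostrophes, lowercased.
    words = re.findall(r"[a-z']+", text.lower())
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    return {
        "avg_sentence_length": len(words) / len(sentences),
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "unique_word_ratio": len(set(words)) / len(words),
        "short_word_ratio": sum(len(w) <= 3 for w in words) / len(words),
    }

features = stylometric_features("The cat sat. The dog ran away!")
print(features["avg_sentence_length"])  # words per sentence
```

Each document becomes a small vector of numbers, which is the form a classifier needs.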

Why does it work?

•  Everybody has learned language differently.

How regular authorship recognition works

[Diagram: Extract features → Machine Learning System]
[Diagram: Document of unknown authorship → Extract features → Machine Learning System → Determine authorship]
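
The pipeline above can be sketched in a few lines, assuming each document has already been turned into a numeric feature vector. This sketch swaps in a simple 1-nearest-neighbour rule as the "machine learning system"; it is not the SVM or the other classifiers named later in the talk, and `attribute_author` is a hypothetical name.

```python
import math

def attribute_author(unknown, known_samples):
    """Return the author whose known sample is closest to `unknown`.

    unknown: feature vector (list of floats) for the unknown document.
    known_samples: list of (author, feature_vector) pairs from known writing.
    Illustrative stand-in for the classifier; all names are assumptions.
    """
    if not known_samples:
        raise ValueError("need at least one known sample")
    best_author, best_dist = None, math.inf
    for author, vec in known_samples:
        dist = math.dist(unknown, vec)  # Euclidean distance between vectors
        if dist < best_dist:
            best_author, best_dist = author, dist
    return best_author

# Made-up feature vectors: (average sentence length, average word length).
known = [("alice", [12.0, 4.1]), ("bob", [25.0, 5.3])]
print(attribute_author([13.0, 4.0], known))
```

In practice the features would need scaling so that no single one dominates the distance.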

Assumptions

•  Writing style is invariant.
   –  It's like a fingerprint; you can't really change it.
   –  Authorship recognition can identify you if there are sufficient writing samples and a set of suspects.

Adversarial Attacks

•  Imitation or framing attack
    –  Where one author imitates another author
    –  Participants were asked to imitate Cormac McCarthy in writing about their day
•  Obfuscation attack
    –  Where an author hides their regular style
    –  Participants were asked to write about their neighborhood in a different style

M. Brennan and R. Greenstadt. Practical attacks against authorship recognition techniques. In Proceedings of the Twenty-First Conference on Innovative Applications of Artificial Intelligence (IAAI), Pasadena, CA, 2009.

Accuracy in detecting authorship of regular documents

[Line chart: accuracy (0 to 1) versus number of authors (5 to 40) for 9-Feature (NN), Synonym-Based, Writeprints Baseline (SVM), and Random.]

More than 80% accurate in detecting authorship with 40 authors in regular documents.
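
For reading the "Random" series in these charts: a sketch of the chance baseline, under the assumption that a random guesser picks uniformly among equally likely candidate authors. The function name is hypothetical.

```python
def random_chance_accuracy(num_authors: int) -> float:
    """Expected accuracy of guessing uniformly among `num_authors` candidates.

    Assumes every author is equally likely to be the true one.
    """
    if num_authors < 1:
        raise ValueError("num_authors must be at least 1")
    return 1.0 / num_authors

# Same author counts as the chart's x-axis.
for n in range(5, 45, 5):
    print(n, round(random_chance_accuracy(n), 3))
```

With 40 authors the baseline is 1/40, so any method worth using has to stay well above 2.5%.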

Accuracy in detecting authorship of obfuscated documents

[Line chart: accuracy (0 to 1) versus number of authors (5 to 40) for 9-Feature (NN), Synonym-Based, Writeprints Baseline (SVM), and Random.]

Accuracy is less than random chance in obfuscated documents.

Accuracy in detecting authorship of imitated documents

[Line chart: accuracy (0 to 1) versus number of authors (5 to 40) for 9-Feature (NN), Synonym-Based, Writeprints Baseline (SVM), and Random.]

Accuracy is less than random chance in imitated documents.

Can we detect Stylistic Deception?

[Diagram: Regular, Imitated, Obfuscated]

Extended-Brennan-Greenstadt Corpus

•  56 authors
   –  12 of the participants are from Drexel University
   –  The rest are paid workers from Amazon Mechanical Turk
•  Three kinds of writing samples
   –  Regular writing (5000 words)
   –  Imitated writing
      •  A 500-word article describing a day
      •  Imitate Cormac McCarthy from 'The Road'
   –  Obfuscated writing
      •  A 500-word article describing their neighborhood
      •  Hide own writing style

Detecting stylistic deception is possible

[Bar chart: detection accuracy (%) for Regular, Imitation, and Obfuscation documents, comparing Writeprint (SVM), Lying-detection (J48), and 9-feature set (J48). Bar labels on the chart: 98, 94.5, 89.5, 95.7, 85, 75.3, 59.9, 48, 43.]

Feature Changes in Imitation and Obfuscation

[Bar chart: change in each feature for Imitation and Obfuscation, x-axis from -80 to 100. Features: Personal pronoun, Sentence count, Particle, Short words, Verb, Unique words, Adverb, Existential there, Average syllables per word, Average word length, Adjective, Cardinal number, Gunning-Fog readability index, Average sentence length.]
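
One of the features in this chart, the Gunning-Fog readability index, has a standard formula: 0.4 × (words per sentence + 100 × complex words / words), where a complex word has three or more syllables. Below is a rough sketch of it. The vowel-group syllable counter and the sentence splitter are crude assumptions, and the function names are hypothetical; this is not the implementation used in the talk.

```python
import re

def count_syllables(word: str) -> int:
    """Rough syllable estimate: count groups of consecutive vowels (heuristic)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def gunning_fog(text: str) -> float:
    """Gunning-Fog index: 0.4 * (words/sentences + 100 * complex_words/words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    # "Complex" words are approximated as words with three or more syllables.
    complex_words = [w for w in words if count_syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences)
                  + 100 * len(complex_words) / len(words))

print(gunning_fog("The cat sat on the mat. It was happy."))
```

Longer sentences and more polysyllabic words both raise the score.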

Problem with the dataset: Topic Similarity

•  All the deceptive documents were of the same topic.
•  Non-content-specific features have the same effect as content-specific features.

[Bar chart: Effect of different feature sets in detecting adversarial authorship. F-measure (0 to 1) for imitation, regular, and obfuscation writing samples, using Syntactic, Lexical, and Content features.]

Hemingway-Faulkner Imitation Corpus

•  Articles from the International Imitation Hemingway Contest (2000-2005)
•  Articles from the Faux Faulkner Contest (2001-2005)
•  Original excerpts of Ernest Hemingway and William Faulkner

Deception detection is possible even when the topic is not similar

•  81.2% accurate in detecting imitated documents.

Long term deception: A Gay Girl In Damascus

[Images: Thomas MacMaster; fake picture of Amina Arraf.]

–  Original author was a 40-year-old American citizen, Thomas MacMaster.
–  Pretended to be a Syrian gay woman, Amina Arraf.
–  The author worked for at least 5 years to create a new style.

Long term deception is hard to detect

•  None of the blog posts were found to be deceptive.
•  But regular authorship recognition can help.
•  We tried to attribute authorship of the blog posts using Thomas (as himself), Thomas (as Amina), Britta (Thomas's wife).

Long term deception: Authorship recognition of the blog posts

•  Thomas MacMaster: 54%
•  Amina Arraf: 43%
•  Britta (Thomas's wife): 3%

Future work

•  Intrusion detection
•  Social spam detection
•  Identifying quality discourse

Two Tools

•  JStylo: Authorship Recognition Analysis Tool.
•  Anonymouth: Authorship Recognition Evasion Tool.
•  Free, Open Source. (GNU GPL)
•  Alpha releases available today at https://psal.cs.drexel.edu
   –  Migrating to GitHub soon.

Privacy, Security and Automation Lab (https://psal.cs.drexel.edu)

•  Faculty
   –  Dr. Rachel Greenstadt
•  Graduate Students
   –  Sadia Afroz (Deception Detection Lead)
   –  Diamond Bishop
   –  Michael Brennan
   –  Aylin Caliskan
   –  Ariel Stolerman (JStylo Lead Developer)
•  Undergraduate Students
   –  Pavan Kantharaju
   –  Andrew McDonald (Anonymouth Lead Developer)

Sadia Afroz: Detecting Hoaxes, Frauds, and Deception in Writing Style Online