SlideShare uma empresa Scribd logo
1 de 61
Baixar para ler offline
Analyzing Spammers' Social Networks for Fun
and Profit
   A Case Study of Cyber Criminal Ecosystem on Twitter


         Chao Yang, Texas A&M University
         Robert Harkreader, Texas A&M University
         Jialong Zhang, Texas A&M University




                                WMMKS Lab 郭至軒

WWW'12
Criminal Account




malicious behavior
Twitter Rule


A Twitter account can be considered to
be spamming, and thus be suspended
by Twitter.
Twitter Rule


If it has a small number of followers
compared to the amount of accounts
that it follows.


          Follow          Follow
     10            self            1000
Who follow criminal accounts?

       Let the criminal accounts still exist.
Cyber Criminal Ecosystem


criminal account

criminal supporter

legitimate account
                      victim




          inner       outer
Inner Social Relationship




inner
Inner Social Relationship




                                            2,060

                                            9,868

G = (V,E)

V: all criminal accounts
E: all follow relationship, directed edge
Inner Social Relationship




Relationship Graph      Connected Components
                           8 weakly connected
                           components (at least
                           3 nodes)

                           521 isolated nodes
Inner Social Relationship


Finding 1:
Criminal accounts tend to be socially connected, forming
a small-world network.
Inner Social Relationship


Graph Density

                             Follow
               Account                     Density
                           Relationship

   Criminal
   Space in     2,060         9,868       2.33 × 10-3
   Sample
    Entire
    Twitter   41.7 × 106    1.47 × 109    8.45 × 10-7
    Space
Inner Social Relationship


Graph Density

                             Follow
               Account                     Density
                           Relationship

   Criminal   Almost 3,000 times
   Space in     2,060         9,868       2.33 × 10-3
   Sample
    Entire
    Twitter   41.7 × 106    1.47 × 109    8.45 × 10-7
    Space
Inner Social Relationship


Reciprocity   Number of Bidirectional Links


                     Reciprocity of 95% criminal
                     accounts higher than 0.2.

                     Reciprocity of 55% normal
                     accounts higher than 0.2.


                     Reciprocity of around 20%
                     criminal accounts are nearly
                     1.0.
Inner Social Relationship


Average Shortest Path Length
        Average number of steps along the shortest
        paths for all possible pairs of graph nodes.


                                   ASPL


     Criminal Accounts              2.60


    Legitimate Accounts             4.12
Inner Social Relationship




Criminal accounts have strong social
connections with each other.
Inner Social Relationship




What are the main factors leading
to that structure?
Inner Social Relationship

Tend to follow many accounts without considering
those accounts' quality much.


     Following Quality:

     average follower number of an account's all
     following accounts
Inner Social Relationship

Tend to follow many accounts without considering
those accounts' quality much.




                         FQ of 85% criminal accounts
                         lower than 20,000.

                         FQ of 45% normal accounts
                         lower than 20,000.
Inner Social Relationship

Criminal accounts, belonging to the same criminal
organizations.
Inner Social Relationship

Criminal accounts, belonging to the same criminal
organizations.

 Group criminal accounts into
different criminal campaigns by              2,060
        malicious URL.

                                             9,868
        17 campaigns


         8,667 edges                      87.8 %
Inner Social Relationship

Provide followers to criminal accounts




1. Break the Following Limits
   Policy

2. Evade spam detection
Inner Social Relationship


Finding 2:
Compared with criminal leaves, criminal hubs are more
inclined to follow criminal accounts.
Inner Social Relationship


                        HITS algorithm to
                        calculate hub score


                        k-means algorithm to
                        cluster them


Relationship Graph      criminal hubs: 90
                        criminal leaf: 1,970
Inner Social Relationship




Criminal Following Ratio (CFR):

ratio of the number of an account’s criminal-
followings to its total following number
Inner Social Relationship



 CRF of 80% criminal hubs
 higher than 0.1.


 CRF of 20% criminal leaves
 higher than 0.1.


 CRF of 60% criminal leaves
 lower than 0.05.
Inner Social Relationship




Why?
Inner Social Relationship

Criminal hubs tend to obtain followers more
effectively by following other criminal accounts.


     Shared Following Ratio (SFR):

     percentage of an account’s followers, who
     also follows at least one of this account’s
     criminal-followings
Inner Social Relationship

Criminal hubs tend to obtain followers more
effectively by following other criminal accounts.




                           SRF of 80% criminal hubs
                           higher than 0.4.


                           CRF of 5% criminal leaves
                           higher than 0.4.
Inner Social Relationship


              criminal hubs

following leaves and acquiring their
followers’ information



             criminal leaves

randomly following other accounts to
expect them to follow back
Outer Social Relationship
Outer Social Relationship


criminal supporters
accounts outside the criminal community, who have close
"follow relationships" with criminal accounts
Outer Social Relationship

Malicious Relevance Score Propagation
Algorithm (Mr.SPA)
     MR score:

     measuring how closely this account follows
     criminal accounts



                      MR score
Outer Social Relationship

Malicious Relevance Score Propagation
Algorithm (Mr.SPA)
   1. the more criminal accounts followed, the higher
      score

   2. the further away from a criminal account, the
      lower score

   3. the closer the support relationship between a
      Twitter account and a criminal account, the higher
      score
Outer Social Relationship

Malicious Relevance Score Propagation
Algorithm (Mr.SPA)

      Malicious Relevance Graph, G = (V,E,W)

      V: all accounts
      E: all follow relationship, directed edge
      W: weight for each edge, closeness of
      relationship
Outer Social Relationship

Malicious Relevance Score Propagation
Algorithm (Mr.SPA)
       MR Score Initialization:

       Mi = 1, if Vi is criminal account
       Mi = 0, if Vi is not criminal account
Outer Social Relationship

Malicious Relevance Score Propagation
Algorithm (Mr.SPA)
       MR Score Aggregation:

       an account’s score should sum up all
       the scores inherited from the accounts
       it follows
                    C1

               MR(C1) = M1
                                     A

                    C2        MR(A) = M1 + M2

               MR(C2) = M2
Outer Social Relationship

Malicious Relevance Score Propagation
Algorithm (Mr.SPA)
       MR Score Dampening:

       the amount of MR score that an
       account inherits from other accounts
       should be multiplied by a dampening
       factor of α according to their social
       distances, where 0 < α < 1

          C               A1              A2

       MR(C) = M     MR(C) = α × M   MR(C) = α2 × M
Outer Social Relationship

Malicious Relevance Score Propagation
Algorithm (Mr.SPA)
       MR Score Splitting:

       the amount of MR score that an
       account inherits from the accounts it
       follows should be multiplied by a
                                                        A1
       relationship-closeness factor
                                                  MR(A1) = 0.5 × M
                                         C

                                      MR(C) = M         A2

                                                  MR(A2) = 0.5 × M
Outer Social Relationship

Malicious Relevance Score Propagation
Algorithm (Mr.SPA)




      n: number of total nodes
      Iij: { 0, 1 }, if (i,j) ∈ E, Iij = 1; otherwise, Iij = 0
Outer Social Relationship

Malicious Relevance Score Propagation
Algorithm (Mr.SPA)




      I: the column-vector normalized adjacency
      matrix of nodes
Outer Social Relationship


After Mr. SPA...

use x-means algorithm
to cluster accounts based      5,924 criminal supporters
on their MR scores


most accounts have             most accounts do not
relatively small scores        have very close follow
and are grouped into one       relationships with criminal
single cluster                 accounts
Outer Social Relationship

Social Butterflies
     Those accounts that have extraordinarily large numbers
     of followers and followings.



use 2,000 following
as a threshold



3,818 social
butterflies
Outer Social Relationship

Social Butterflies
   The reason why social butterflies tend to have close
   friendships with criminals is mainly because most of
   them usually follow back the users who follow them
   without careful examinations.
Outer Social Relationship

Social Promoters
      Those accounts that have large following-follower
      ratios, larger following numbers and relatively high URL
      ratios.


whose URL ratios are higher than
0.1, and following numbers and
following-follower ratios are both at
the top 10-percentile


508 social promoters
Outer Social Relationship

Social Promoters
   The reason why social promoters tend to have close
   friendships with criminal accounts is probably because
   most of them usually promote themselves or their
   business by actively following other accounts without
   considerations of those accounts’ quality.
Outer Social Relationship

Dummies
     Those accounts who post few tweets but have many
     followers.



post fewer than 5 tweets and
whose follower numbers are
at the top 10-percentile



81 dummies
Outer Social Relationship

Dummies
  The reason why dummies intend to have close
  friendship with criminals is mainly because most of
  them are controlled or utilized by cyber criminals.
Inferring Criminal Accounts




The number of Twitter accounts is HUGE!
Inferring Criminal Accounts

Criminal account Inference Algorithm
(CIA)




                    start from a seed set
Inferring Criminal Accounts

Criminal account Inference Algorithm
(CIA)

      1. criminal accounts tend to be socially
         connected

      2. criminal accounts usually share similar
         topics, thus having strong semantic
         coordinations among them
Inferring Criminal Accounts

Criminal account Inference Algorithm
(CIA)

      Malicious Relevance Graph, G = (V,E,W)

      V: all accounts
      E: all follow relationship, directed edge
      W: weight for each edge, WS(i,j)




                  P.S. SS: Semantic Similarity Score
Inferring Criminal Accounts

Criminal account Inference Algorithm
(CIA)




      n: number of total nodes
      Iij: { 0, 1 }, if (i,j) ∈ E, Iij = 1; otherwise, Iij = 0
Inferring Criminal Accounts

Evaluation of CIA




    Dataset I: around half million accounts from our
    previous study [35]

    Dataset II: another new crawled 30,000 accounts
    by starting from 10 newly identified criminal
    accounts and using BFS strategy
Inferring Criminal Accounts

Evaluation of CIA
        Dataset I
                              CA: Criminal Account
                              MA: Malicious Affected Account




   Selection Strategies
 100 seeds, select 4,000
       accounts
Inferring Criminal Accounts

Evaluation of CIA
       Dataset I
                         CA: Criminal Account
                         MA: Malicious Affected Account




    Selection Sizes
      100 seeds
Inferring Criminal Accounts

Evaluation of CIA
       Dataset I
                            CA: Criminal Account
                            MA: Malicious Affected Account




      Seed Sizes
 select 4,000 accounts
Inferring Criminal Accounts

Evaluation of CIA
        Dataset I
                              CA: Criminal Account
                              MA: Malicious Affected Account




       Seed Type
 100 seeds, select 4,000
       accounts
Inferring Criminal Accounts

Evaluation of CIA
       Dataset I
                             CA: Criminal Account
                             MA: Malicious Affected Account




  Recursive Inference
 50 seeds, select 4,000
       accounts
Inferring Criminal Accounts

Evaluation of CIA
       Dataset II
                             CA: Criminal Account
                             MA: Malicious Affected Account




      Seed Type
 10 seeds, select 4,000
       accounts
Conclusion


S   Provide a macro scale to view the
    criminal accounts.

    Focus on analysis and use heuristic
W   method. And the detail of semantic
    similarity score has omitted.
    Find out many malicious accounts,

O   maybe analyze them can improve
    accuracy of CIA to detect criminal
    accounts.
    There is bias for training dataset, and
T   let CIA improve not much when
    detect criminal accounts.
Thank You for Your Listening!

Mais conteúdo relacionado

Semelhante a [WWW2012] analyzing spammers' social networks for fun and profit

Assignment 2 Cybercrimes and Computer Security SystemsWith
Assignment 2 Cybercrimes and Computer Security SystemsWith Assignment 2 Cybercrimes and Computer Security SystemsWith
Assignment 2 Cybercrimes and Computer Security SystemsWith desteinbrook
 
INFORMS 2021 Social cohesion and emotion analysis of media during 2020 wildfi...
INFORMS 2021 Social cohesion and emotion analysis of media during 2020 wildfi...INFORMS 2021 Social cohesion and emotion analysis of media during 2020 wildfi...
INFORMS 2021 Social cohesion and emotion analysis of media during 2020 wildfi...Alex Gilgur
 
Why can’t police catch cyber criminals
Why can’t police catch cyber criminalsWhy can’t police catch cyber criminals
Why can’t police catch cyber criminalsChip Thornsburg
 
Private Distributed Collaborative Filtering
Private Distributed Collaborative FilteringPrivate Distributed Collaborative Filtering
Private Distributed Collaborative FilteringNeal Lathia
 
Trust influence and social media
Trust influence and social mediaTrust influence and social media
Trust influence and social mediaDawn Dawson
 
Can you trust everything?
Can you trust everything?Can you trust everything?
Can you trust everything?Colin Lieu
 
IBM Counter Financial Crimes Management
IBM Counter Financial Crimes ManagementIBM Counter Financial Crimes Management
IBM Counter Financial Crimes ManagementVirginia Fernandez
 
Spams meet cryptocurrencies
Spams meet cryptocurrenciesSpams meet cryptocurrencies
Spams meet cryptocurrenciesMatteo Romiti
 
Extended summary of "Into the Deep Web: Understanding E-commerce Fraud from A...
Extended summary of "Into the Deep Web: Understanding E-commerce Fraud from A...Extended summary of "Into the Deep Web: Understanding E-commerce Fraud from A...
Extended summary of "Into the Deep Web: Understanding E-commerce Fraud from A...PierantonioAzzalini
 
2000 Word Essay How Long Introduction. Online assignment writing service.
2000 Word Essay How Long Introduction. Online assignment writing service.2000 Word Essay How Long Introduction. Online assignment writing service.
2000 Word Essay How Long Introduction. Online assignment writing service.Tammy Adams
 
Graph Data Science DEMO for fraud analysis
Graph Data Science DEMO for fraud analysisGraph Data Science DEMO for fraud analysis
Graph Data Science DEMO for fraud analysisNeo4j
 
How To Develop A Concept Paper Goal Mentorship
How To Develop A Concept Paper Goal MentorshipHow To Develop A Concept Paper Goal Mentorship
How To Develop A Concept Paper Goal MentorshipRebecca Rivera
 
the answer does not have to be long at all the question just has to .docx
the answer does not have to be long at all the question just has to .docxthe answer does not have to be long at all the question just has to .docx
the answer does not have to be long at all the question just has to .docxanhcrowley
 
2019 Data Breach Investigations Report (DBIR)
2019 Data Breach Investigations Report (DBIR)2019 Data Breach Investigations Report (DBIR)
2019 Data Breach Investigations Report (DBIR)- Mark - Fullbright
 
1. Write A Composition Using One Of The Topics Listed Bel
1. Write A Composition Using One Of The Topics Listed Bel1. Write A Composition Using One Of The Topics Listed Bel
1. Write A Composition Using One Of The Topics Listed BelAngie Flores
 
Fraud Prevention Strategies to Fight First-Party Fraud and Synthetic Identity...
Fraud Prevention Strategies to Fight First-Party Fraud and Synthetic Identity...Fraud Prevention Strategies to Fight First-Party Fraud and Synthetic Identity...
Fraud Prevention Strategies to Fight First-Party Fraud and Synthetic Identity...TransUnion
 

Semelhante a [WWW2012] analyzing spammers' social networks for fun and profit (20)

CyberDen 2020
CyberDen 2020CyberDen 2020
CyberDen 2020
 
CRIME DETECTION
CRIME DETECTIONCRIME DETECTION
CRIME DETECTION
 
Assignment 2 Cybercrimes and Computer Security SystemsWith
Assignment 2 Cybercrimes and Computer Security SystemsWith Assignment 2 Cybercrimes and Computer Security SystemsWith
Assignment 2 Cybercrimes and Computer Security SystemsWith
 
INFORMS 2021 Social cohesion and emotion analysis of media during 2020 wildfi...
INFORMS 2021 Social cohesion and emotion analysis of media during 2020 wildfi...INFORMS 2021 Social cohesion and emotion analysis of media during 2020 wildfi...
INFORMS 2021 Social cohesion and emotion analysis of media during 2020 wildfi...
 
Why can’t police catch cyber criminals
Why can’t police catch cyber criminalsWhy can’t police catch cyber criminals
Why can’t police catch cyber criminals
 
Private Distributed Collaborative Filtering
Private Distributed Collaborative FilteringPrivate Distributed Collaborative Filtering
Private Distributed Collaborative Filtering
 
Social Engineering CSO Survival Guide
Social Engineering CSO Survival GuideSocial Engineering CSO Survival Guide
Social Engineering CSO Survival Guide
 
Trust influence and social media
Trust influence and social mediaTrust influence and social media
Trust influence and social media
 
Can you trust everything?
Can you trust everything?Can you trust everything?
Can you trust everything?
 
IBM Counter Financial Crimes Management
IBM Counter Financial Crimes ManagementIBM Counter Financial Crimes Management
IBM Counter Financial Crimes Management
 
IBM Counter Finalcial Crimes Management
IBM Counter Finalcial Crimes ManagementIBM Counter Finalcial Crimes Management
IBM Counter Finalcial Crimes Management
 
Spams meet cryptocurrencies
Spams meet cryptocurrenciesSpams meet cryptocurrencies
Spams meet cryptocurrencies
 
Extended summary of "Into the Deep Web: Understanding E-commerce Fraud from A...
Extended summary of "Into the Deep Web: Understanding E-commerce Fraud from A...Extended summary of "Into the Deep Web: Understanding E-commerce Fraud from A...
Extended summary of "Into the Deep Web: Understanding E-commerce Fraud from A...
 
2000 Word Essay How Long Introduction. Online assignment writing service.
2000 Word Essay How Long Introduction. Online assignment writing service.2000 Word Essay How Long Introduction. Online assignment writing service.
2000 Word Essay How Long Introduction. Online assignment writing service.
 
Graph Data Science DEMO for fraud analysis
Graph Data Science DEMO for fraud analysisGraph Data Science DEMO for fraud analysis
Graph Data Science DEMO for fraud analysis
 
How To Develop A Concept Paper Goal Mentorship
How To Develop A Concept Paper Goal MentorshipHow To Develop A Concept Paper Goal Mentorship
How To Develop A Concept Paper Goal Mentorship
 
the answer does not have to be long at all the question just has to .docx
the answer does not have to be long at all the question just has to .docxthe answer does not have to be long at all the question just has to .docx
the answer does not have to be long at all the question just has to .docx
 
2019 Data Breach Investigations Report (DBIR)
2019 Data Breach Investigations Report (DBIR)2019 Data Breach Investigations Report (DBIR)
2019 Data Breach Investigations Report (DBIR)
 
1. Write A Composition Using One Of The Topics Listed Bel
1. Write A Composition Using One Of The Topics Listed Bel1. Write A Composition Using One Of The Topics Listed Bel
1. Write A Composition Using One Of The Topics Listed Bel
 
Fraud Prevention Strategies to Fight First-Party Fraud and Synthetic Identity...
Fraud Prevention Strategies to Fight First-Party Fraud and Synthetic Identity...Fraud Prevention Strategies to Fight First-Party Fraud and Synthetic Identity...
Fraud Prevention Strategies to Fight First-Party Fraud and Synthetic Identity...
 

Mais de Chih-Hsuan Kuo

[Mozilla] content-select
[Mozilla] content-select[Mozilla] content-select
[Mozilla] content-selectChih-Hsuan Kuo
 
Ownership System in Rust
Ownership System in RustOwnership System in Rust
Ownership System in RustChih-Hsuan Kuo
 
在開始工作以前,我以為我會寫扣。
在開始工作以前,我以為我會寫扣。在開始工作以前,我以為我會寫扣。
在開始工作以前,我以為我會寫扣。Chih-Hsuan Kuo
 
Effective Modern C++ - Item 35 & 36
Effective Modern C++ - Item 35 & 36Effective Modern C++ - Item 35 & 36
Effective Modern C++ - Item 35 & 36Chih-Hsuan Kuo
 
Use C++ to Manipulate mozSettings in Gecko
Use C++ to Manipulate mozSettings in GeckoUse C++ to Manipulate mozSettings in Gecko
Use C++ to Manipulate mozSettings in GeckoChih-Hsuan Kuo
 
Pocket Authentication with OAuth on Firefox OS
Pocket Authentication with OAuth on Firefox OSPocket Authentication with OAuth on Firefox OS
Pocket Authentication with OAuth on Firefox OSChih-Hsuan Kuo
 
Protocol handler in Gecko
Protocol handler in GeckoProtocol handler in Gecko
Protocol handler in GeckoChih-Hsuan Kuo
 
面試面試面試,因為很重要所以要說三次!
面試面試面試,因為很重要所以要說三次!面試面試面試,因為很重要所以要說三次!
面試面試面試,因為很重要所以要說三次!Chih-Hsuan Kuo
 
Windows 真的不好用...
Windows 真的不好用...Windows 真的不好用...
Windows 真的不好用...Chih-Hsuan Kuo
 
[ACM-ICPC] Tree Isomorphism
[ACM-ICPC] Tree Isomorphism[ACM-ICPC] Tree Isomorphism
[ACM-ICPC] Tree IsomorphismChih-Hsuan Kuo
 
[ACM-ICPC] Dinic's Algorithm
[ACM-ICPC] Dinic's Algorithm[ACM-ICPC] Dinic's Algorithm
[ACM-ICPC] Dinic's AlgorithmChih-Hsuan Kuo
 
[ACM-ICPC] Disjoint Set
[ACM-ICPC] Disjoint Set[ACM-ICPC] Disjoint Set
[ACM-ICPC] Disjoint SetChih-Hsuan Kuo
 

Mais de Chih-Hsuan Kuo (20)

Rust
RustRust
Rust
 
[Mozilla] content-select
[Mozilla] content-select[Mozilla] content-select
[Mozilla] content-select
 
Ownership System in Rust
Ownership System in RustOwnership System in Rust
Ownership System in Rust
 
在開始工作以前,我以為我會寫扣。
在開始工作以前,我以為我會寫扣。在開始工作以前,我以為我會寫扣。
在開始工作以前,我以為我會寫扣。
 
Effective Modern C++ - Item 35 & 36
Effective Modern C++ - Item 35 & 36Effective Modern C++ - Item 35 & 36
Effective Modern C++ - Item 35 & 36
 
Use C++ to Manipulate mozSettings in Gecko
Use C++ to Manipulate mozSettings in GeckoUse C++ to Manipulate mozSettings in Gecko
Use C++ to Manipulate mozSettings in Gecko
 
Pocket Authentication with OAuth on Firefox OS
Pocket Authentication with OAuth on Firefox OSPocket Authentication with OAuth on Firefox OS
Pocket Authentication with OAuth on Firefox OS
 
Necko walkthrough
Necko walkthroughNecko walkthrough
Necko walkthrough
 
Protocol handler in Gecko
Protocol handler in GeckoProtocol handler in Gecko
Protocol handler in Gecko
 
面試面試面試,因為很重要所以要說三次!
面試面試面試,因為很重要所以要說三次!面試面試面試,因為很重要所以要說三次!
面試面試面試,因為很重要所以要說三次!
 
應徵軟體工程師
應徵軟體工程師應徵軟體工程師
應徵軟體工程師
 
面試心得分享
面試心得分享面試心得分享
面試心得分享
 
Windows 真的不好用...
Windows 真的不好用...Windows 真的不好用...
Windows 真的不好用...
 
Python @Wheel Lab
Python @Wheel LabPython @Wheel Lab
Python @Wheel Lab
 
Introduction to VP8
Introduction to VP8Introduction to VP8
Introduction to VP8
 
Python @NCKU CSIE
Python @NCKU CSIEPython @NCKU CSIE
Python @NCKU CSIE
 
[ACM-ICPC] Tree Isomorphism
[ACM-ICPC] Tree Isomorphism[ACM-ICPC] Tree Isomorphism
[ACM-ICPC] Tree Isomorphism
 
[ACM-ICPC] Dinic's Algorithm
[ACM-ICPC] Dinic's Algorithm[ACM-ICPC] Dinic's Algorithm
[ACM-ICPC] Dinic's Algorithm
 
[ACM-ICPC] Disjoint Set
[ACM-ICPC] Disjoint Set[ACM-ICPC] Disjoint Set
[ACM-ICPC] Disjoint Set
 
[ACM-ICPC] Traversal
[ACM-ICPC] Traversal[ACM-ICPC] Traversal
[ACM-ICPC] Traversal
 

Último

The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 

Último (20)

The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 

[WWW2012] analyzing spammers' social networks for fun and profit

  • 1. Analyzing Spammers' Social Networks for Fun and Profit A Case Study of Cyber Criminal Ecosystem on Twitter Chao Yang, Texas A&M University Robert Harkreader, Texas A&M University Jialong Zhang, Texas A&M University WMMKS Lab 郭至軒 WWW'12
  • 3. Twitter Rule A Twitter account can be considered to be spamming, and thus be suspended by Twitter.
  • 4. Twitter Rule If it has a small number of followers compared to the amount of accounts that it follows. Follow Follow 10 self 1000
  • 5. Who follow criminal accounts? Let the criminal accounts still exist.
  • 6. Cyber Criminal Ecosystem criminal account criminal supporter legitimate account victim inner outer
  • 8. Inner Social Relationship 2,060 9,868 G = (V,E) V: all criminal accounts E: all follow relationship, directed edge
  • 9. Inner Social Relationship Relationship Graph Connected Components 8 weakly connected components (at least 3 nodes) 521 isolated nodes
  • 10. Inner Social Relationship Finding 1: Criminal accounts tend to be socially connected, forming a small-world network.
  • 11. Inner Social Relationship Graph Density Follow Account Density Relationship Criminal Space in 2,060 9,868 2.33 × 10-3 Sample Entire Twitter 41.7 × 106 1.47 × 109 8.45 × 10-7 Space
  • 12. Inner Social Relationship Graph Density Follow Account Density Relationship Criminal Almost 3,000 times Space in 2,060 9,868 2.33 × 10-3 Sample Entire Twitter 41.7 × 106 1.47 × 109 8.45 × 10-7 Space
  • 13. Inner Social Relationship Reciprocity Number of Bidirectional Links Reciprocity of 95% criminal accounts higher than 0.2. Reciprocity of 55% normal accounts higher than 0.2. Reciprocity of around 20% criminal accounts are nearly 1.0.
  • 14. Inner Social Relationship Average Shortest Path Length Average number of steps along the shortest paths for all possible pairs of graph nodes. ASPL Criminal Accounts 2.60 Legitimate Accounts 4.12
  • 15. Inner Social Relationship Criminal accounts have strong social connections with each other.
  • 16. Inner Social Relationship What are the main factors leading to that structure?
  • 17. Inner Social Relationship Tend to follow many accounts without considering those accounts' quality much. Following Quality: average follower number of an account's all following accounts
  • 18. Inner Social Relationship Tend to follow many accounts without considering those accounts' quality much. FQ of 85% criminal accounts lower than 20,000. FQ of 45% normal accounts lower than 20,000.
  • 19. Inner Social Relationship Criminal accounts, belonging to the same criminal organizations.
  • 20. Inner Social Relationship Criminal accounts, belonging to the same criminal organizations. Group criminal accounts into different criminal campaigns by 2,060 malicious URL. 9,868 17 campaigns 8,667 edges 87.8 %
  • 21. Inner Social Relationship Provide followers to criminal accounts 1. Break the Following Limits Policy 2. Evade spam detection
  • 22. Inner Social Relationship Finding 2: Compared with criminal leaves, criminal hubs are more inclined to follow criminal accounts.
  • 23. Inner Social Relationship HITS algorithm to calculate hub score k-means algorithm to cluster them Relationship Graph criminal hubs: 90 criminal leaf: 1,970
  • 24. Inner Social Relationship Criminal Following Ratio (CFR): ratio of the number of an account’s criminal- followings to its total following number
  • 25. Inner Social Relationship CRF of 80% criminal hubs higher than 0.1. CRF of 20% criminal leaves higher than 0.1. CRF of 60% criminal leaves lower than 0.05.
  • 27. Inner Social Relationship Criminal hubs tend to obtain followers more effectively by following other criminal accounts. Shared Following Ratio (SFR): percentage of an account’s followers, who also follows at least one of this account’s criminal-followings
  • 28. Inner Social Relationship Criminal hubs tend to obtain followers more effectively by following other criminal accounts. SRF of 80% criminal hubs higher than 0.4. CRF of 5% criminal leaves higher than 0.4.
  • 29. Inner Social Relationship criminal hubs following leaves and acquiring their followers’ information criminal leaves randomly following other accounts to expect them to follow back
  • 31. Outer Social Relationship criminal supporters accounts outside the criminal community, who have close "follow relationships" with criminal accounts
  • 32. Outer Social Relationship Malicious Relevance Score Propagation Algorithm (Mr.SPA) MR score: measuring how closely this account follows criminal accounts MR score
  • 33. Outer Social Relationship Malicious Relevance Score Propagation Algorithm (Mr.SPA) 1. the more criminal accounts followed, the higher score 2. the further away from a criminal account, the lower score 3. the closer the support relationship between a Twitter account and a criminal account, the higher score
  • 34. Outer Social Relationship Malicious Relevance Score Propagation Algorithm (Mr.SPA) Malicious Relevance Graph, G = (V,E,W) V: all accounts E: all follow relationship, directed edge W: weight for each edge, closeness of relationship
  • 35. Outer Social Relationship Malicious Relevance Score Propagation Algorithm (Mr.SPA) MR Score Initialization: Mi = 1, if Vi is criminal account Mi = 0, if Vi is not criminal account
  • 36. Outer Social Relationship Malicious Relevance Score Propagation Algorithm (Mr.SPA) MR Score Aggregation: an account’s score should sum up all the scores inherited from the accounts it follows C1 MR(C1) = M1 A C2 MR(A) = M1 + M2 MR(C2) = M2
  • 37. Outer Social Relationship Malicious Relevance Score Propagation Algorithm (Mr.SPA) MR Score Dampening: the amount of MR score that an account inherits from other accounts should be multiplied by a dampening factor of α according to their social distances, where 0 < α < 1 C A1 A2 MR(C) = M MR(C) = α × M MR(C) = α2 × M
  • 38. Outer Social Relationship Malicious Relevance Score Propagation Algorithm (Mr.SPA) MR Score Splitting: the amount of MR score that an account inherits from the accounts it follows should be multiplied by a A1 relationship-closeness factor MR(A1) = 0.5 × M C MR(C) = M A2 MR(A2) = 0.5 × M
  • 39. Outer Social Relationship Malicious Relevance Score Propagation Algorithm (Mr.SPA) n: number of total nodes Iij: { 0, 1 }, if (i,j) ∈ E, Iij = 1; otherwise, Iij = 0
  • 40. Outer Social Relationship Malicious Relevance Score Propagation Algorithm (Mr.SPA) I: the column-vector normalized adjacency matrix of nodes
  • 41. Outer Social Relationship After Mr. SPA... use x-means algorithm to cluster accounts based 5,924 criminal supporters on their MR scores most accounts have most accounts do not relatively small scores have very close follow and are grouped into one relationships with criminal single cluster accounts
  • 42. Outer Social Relationship Social Butterflies Those accounts that have extraordinarily large numbers of followers and followings. use 2,000 following as a threshold 3,818 social butterflies
  • 43. Outer Social Relationship Social Butterflies The reason why social butterflies tend to have close friendships with criminals is mainly because most of them usually follow back the users who follow them without careful examinations.
  • 44. Outer Social Relationship Social Promoters Those accounts that have large following-follower ratios, larger following numbers and relatively high URL ratios. whose URL ratios are higher than 0.1, and following numbers and following-follower ratios are both at the top 10-percentile 508 social promoters
  • 45. Outer Social Relationship Social Promoters The reason why social promoters tend to have close friendships with criminal accounts is probably because most of them usually promote themselves or their business by actively following other accounts without considerations of those accounts’ quality.
  • 46. Outer Social Relationship Dummies Those accounts who post few tweets but have many followers. post fewer than 5 tweets and whose follower numbers are at the top 10-percentile 81 dummies
  • 47. Outer Social Relationship Dummies The reason why dummies intend to have close friendship with criminals is mainly because most of them are controlled or utilized by cyber criminals.
  • 48. Inferring Criminal Accounts The number of Twitter accounts is HUGE!
  • 49. Inferring Criminal Accounts Criminal account Inference Algorithm (CIA) start from a seed set
  • 50. Inferring Criminal Accounts Criminal account Inference Algorithm (CIA) 1. criminal accounts tend to be socially connected 2. criminal accounts usually share similar topics, thus having strong semantic coordinations among them
  • 51. Inferring Criminal Accounts Criminal account Inference Algorithm (CIA) Malicious Relevance Graph, G = (V,E,W) V: all accounts E: all follow relationship, directed edge W: weight for each edge, WS(i,j) P.S. SS: Semantic Similarity Score
  • 52. Inferring Criminal Accounts Criminal account Inference Algorithm (CIA) n: number of total nodes Iij: { 0, 1 }, if (i,j) ∈ E, Iij = 1; otherwise, Iij = 0
  • 53. Inferring Criminal Accounts Evaluation of CIA Dataset I: around half million accounts from our previous study [35] Dataset II: another new crawled 30,000 accounts by starting from 10 newly identified criminal accounts and using BFS strategy
  • 54. Inferring Criminal Accounts Evaluation of CIA Dataset I CA: Criminal Account MA: Malicious Affected Account Selection Strategies 100 seeds, select 4,000 accounts
  • 55. Inferring Criminal Accounts Evaluation of CIA Dataset I CA: Criminal Account MA: Malicious Affected Account Selection Sizes 100 seeds
  • 56. Inferring Criminal Accounts Evaluation of CIA Dataset I CA: Criminal Account MA: Malicious Affected Account Seed Sizes select 4,000 accounts
  • 57. Inferring Criminal Accounts Evaluation of CIA Dataset I CA: Criminal Account MA: Malicious Affected Account Seed Type 100 seeds, select 4,000 accounts
  • 58. Inferring Criminal Accounts Evaluation of CIA Dataset I CA: Criminal Account MA: Malicious Affected Account Recursive Inference 50 seeds, select 4,000 accounts
  • 59. Inferring Criminal Accounts Evaluation of CIA Dataset II CA: Criminal Account MA: Malicious Affected Account Seed Type 10 seeds, select 4,000 accounts
  • 60. Conclusion S Provide a macro scale to view the criminal accounts. Focus on analysis and use heuristic W method. And the detail of semantic similarity score has omitted. Find out many malicious accounts, O maybe analyze them can improve accuracy of CIA to detect criminal accounts. There is bias for training dataset, and T let CIA improve not much when detect criminal accounts.
  • 61. Thank You for Your Listening!