Content-Driven Author Reputation and Text Trust for the Wikipedia

Luca de Alfaro, UC Santa Cruz
Joint work with Bo Adler, Ian Pye, Caitlin Sadowski (UCSC)

Wikimania, August 2007
Author Reputation and Text Trust

Author Reputation:
• Goal: Encourage authors to provide lasting contributions.
Text Trust:
• Goal: provide a measure of the reliability of the text.
• Method: computed from the reputation of the authors
  who create and revise the text.
Reputation: Our guiding principles

• Do not alter the Wikipedia user experience
  – Compute reputation from content evolution, rather
    than user-to-user comments.
• Be welcoming to all users
  – Never publicly display user reputation values.
    Authors know only their own reputation.
• Be objective
  – Rely on content evolution rather than comments.
  – Quantitatively evaluate how well it works.
Content-driven reputation

• Authors of long-lived contributions gain reputation
• Authors of reverted contributions lose reputation

[Figure: timeline of a Wikipedia article, with time running downward. A edits. B builds on A’s edit, which raises A’s reputation (+). C reverts to A’s version, which lowers B’s reputation (-) and raises A’s (+).]
Content-driven reputation mitigates reputation wars

Wars in user-driven reputation:
[Figure: A and B trade negative ratings of each other (-2, then -3).]

Wars in content-driven reputation:
• B can badmouth A by undoing her work
• But this is risky: if others then re-instate A’s work, it is B’s reputation that suffers.
[Figure: B’s revert costs A reputation (-); if others restore A’s work, A gains (+) and B loses (-).]
Validation: Does our reputation have predictive value?

[Figure: timelines of Article 1, Article 2, Article 3, Article 4, ..., with marks for the edits by user A; one edit E is highlighted.]

• The reputation of author A at the time of an edit E depends on the history before the edit.
• The longevity of an edit E depends on the history after the edit.

Can we show a correlation between author reputation and edit longevity?
Building a content-driven reputation system for Wikipedia



This is a summary; for details see:
B.T. Adler, L. de Alfaro. A Content Driven Reputation
System for the Wikipedia. In Proc. of WWW 2007.
What is a “contribution”?

Text:
• We measure how long the added text survives.
• Based on text tracking.

Edit:
• We measure how long the “edit” (reorganization) survives.
• Based on edit distance.

[Figure: example word sequences before and after a change, one for added text and one for an edit (including a “buy viagra!” spam insertion).]
Text


 version 9     bla   bla   wuga   boink
               5     8     9      6

 version 10    bla   bla   wuga   wuga   wuga   boink
               5     8     10     10     9      6


 We label each word with the version where it was
 introduced. This enables us to keep track of how
 long it lives.
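
The labelling step can be sketched in a few lines of Python. This is only an illustration, not the authors' text-tracking algorithm: it assumes the standard-library `difflib.SequenceMatcher` as the word matcher, and `label_words` is a hypothetical helper name.

```python
from difflib import SequenceMatcher


def label_words(old_words, old_labels, new_words, new_version):
    """Return one origin label per word of new_words.

    Words matched against the previous version keep the label of the
    version that introduced them; unmatched words get new_version.
    """
    matcher = SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    labels = [new_version] * len(new_words)
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            labels[block.b + offset] = old_labels[block.a + offset]
    return labels


# The example from the slide: version 9 becomes version 10.
version_9 = ["bla", "bla", "wuga", "boink"]
labels_9 = [5, 8, 9, 6]
version_10 = ["bla", "bla", "wuga", "wuga", "wuga", "boink"]
labels_10 = label_words(version_9, labels_9, version_10, 10)
```

When a word is repeated, the matcher decides which copy counts as the surviving one, so the position of the label 9 among the three copies of "wuga" may differ from the slide. The number of words carrying each label is the same.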
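Text: the destiny of a contribution

[Figure: number of words (y-axis) against time in versions (x-axis), showing the amount of new text at the revision that introduces it and the amount of surviving text at later revisions.]

The life of the text introduced at a revision.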
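Text: Longevity

[Figure: number of words (y-axis) against time in versions (x-axis). $T_k$ words are introduced at version k; the amount still present at a later version j is approximated by the geometric decay $T_k \cdot \alpha_{text}^{\,j-k}$.]

• Text longevity: the $\alpha_{text} \in [0,1]$ that yields the best geometrical approximation for the amount of residual text.
• Short-lived text: $\alpha_{text} < 0.2$ (at most 20% of the text makes it from one version to the next).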
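
One way to obtain such a value is a simple fit, sketched below. The slides do not say how "best geometrical approximation" is computed, so the least-squares criterion, the grid search, and the name `fit_text_longevity` are all assumptions made for illustration.

```python
def fit_text_longevity(surviving, steps=1000):
    """Estimate alpha_text for one revision.

    surviving[0] is T_k, the number of words introduced at version k;
    surviving[i] is how many of those words are still present i versions
    later. Returns the alpha in [0, 1] whose curve T_k * alpha**i has the
    smallest squared error (an assumed criterion, see the lead-in).
    """
    t_k = surviving[0]
    if t_k <= 0:
        raise ValueError("the revision must introduce some text")
    best_alpha, best_error = 0.0, float("inf")
    for n in range(steps + 1):
        alpha = n / steps
        error = sum((t - t_k * alpha ** i) ** 2 for i, t in enumerate(surviving))
        if error < best_error:
            best_alpha, best_error = alpha, error
    return best_alpha


halving = fit_text_longevity([100, 50, 25, 12.5])   # text halves at each version
spam_like = fit_text_longevity([100, 10, 1])        # text vanishes quickly
is_short_lived = spam_like < 0.2
```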
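Text: Reputation update

[Figure: number of words (y-axis) against time in versions (x-axis). Author $A_k$ introduces $T_k$ words at version k; $T_j$ of them are still present at version j, written by author $A_j$.]

As a consequence of edit j, we increase the reputation of $A_k$ by an amount proportional to $T_j$ and to the reputation of $A_j$.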
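
The proportionality can be written as a one-line update. The sketch below is hypothetical: the slide gives neither the constant of proportionality nor the handling of self-review, so `scale` and the rule that authors do not credit themselves are assumptions.

```python
def credit_text_survival(reputation, author_k, author_j, surviving_words, scale=0.001):
    """Raise author_k's reputation because surviving_words of their text
    are still present in a version written by author_j.

    The gain is proportional to the surviving text and to the reputation
    of author_j; `scale` is an assumed constant.
    """
    if author_k == author_j:
        return reputation  # assumption: authors do not vouch for their own text
    gain = scale * surviving_words * reputation.get(author_j, 0.0)
    reputation[author_k] = reputation.get(author_k, 0.0) + gain
    return reputation


reputation = {"alice": 1.0, "bob": 10.0}
credit_text_survival(reputation, "alice", "bob", surviving_words=200)
```

Because the gain is weighted by the judge's reputation, text kept by an established author counts for more than text kept by a newcomer.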
Measuring surviving text

Version   “Live” text                        “Dead” text
9         wuga(7) boing(9) bla(6) ble(6)
10        buy(10) viagra(10) now!(10)        wuga(7) boing(9) bla(6) ble(6)   (stored as “dead”)
11        wuga(7) boing(9) bla(6) ble(6)

The number after each word is the version where it was introduced. The text of version 11 matches the dead text of version 10.

We track authorship of deleted text, and we match the text of new versions both with live and with dead text.
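
A rough sketch of matching against both live and dead text follows. It is an illustration under assumptions: `difflib.SequenceMatcher` stands in for the authors' matcher, live text is tried before dead text, and the names `carry_labels` and `next_revision` are hypothetical.

```python
from difflib import SequenceMatcher


def carry_labels(source, new_words, labels):
    """Copy origin labels from source, a list of (word, origin) pairs, onto
    still-unlabelled positions of new_words. Returns the indices of the
    source words that were matched."""
    words = [word for word, _ in source]
    matcher = SequenceMatcher(a=words, b=new_words, autojunk=False)
    used = set()
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            i, j = block.a + offset, block.b + offset
            if labels[j] is None:
                labels[j] = source[i][1]
                used.add(i)
    return used


def next_revision(live, dead, new_words, new_version):
    """Label new_words using live text first, then dead text, and return
    the updated (live, dead) lists of (word, origin) pairs."""
    labels = [None] * len(new_words)
    used_live = carry_labels(live, new_words, labels)
    used_dead = carry_labels(dead, new_words, labels)
    new_live = [(w, new_version if lab is None else lab)
                for w, lab in zip(new_words, labels)]
    # Dead text that was not revived stays dead; live text that was not
    # matched by the new version joins it.
    new_dead = [p for i, p in enumerate(dead) if i not in used_dead]
    new_dead += [p for i, p in enumerate(live) if i not in used_live]
    return new_live, new_dead


live_9 = [("wuga", 7), ("boing", 9), ("bla", 6), ("ble", 6)]
live_10, dead_10 = next_revision(live_9, [], ["buy", "viagra", "now!"], 10)
live_11, dead_11 = next_revision(live_10, dead_10, ["wuga", "boing", "bla", "ble"], 11)
```

In this run the spam of version 10 pushes the original words into the dead list, and the revert in version 11 restores them with their original labels rather than crediting them to the reverting author.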
Edit

[Figure: triangle over versions k-1, k (judged) and j (judge), with k < j and sides d(k-1, k), d(k, j) and d(k-1, j).]

We compute the edit distance between versions k-1, k, and j, with k < j (see paper for details on the distance).
Edit: good or bad?

[Figure: two triangles over versions k-1 (the past), k (judged) and j (judge, the future), with sides d(k, j) and d(k-1, j). On the left, k lies closer to j than k-1 does; on the right, it lies farther away.]

• k is good: d(k-1, j) > d(k, j)  (“k went towards the future”)
• k is bad: d(k-1, j) < d(k, j)  (“k went against the future”)
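Edit: Longevity

[Figure: triangle over versions k-1 (the past), k and j (the future). The side d(k-1, k) is the “work done”; the difference d(k-1, j) - d(k, j) is the “progress”.]

Edit longevity: the fraction of change that is in the same direction of the future, that is, progress divided by work done:

$\alpha_{edit} = \frac{d(k-1, j) - d(k, j)}{d(k-1, k)}$

• $\alpha_{edit} \simeq 1$: k is a good edit
• $\alpha_{edit} \simeq -1$: k is reverted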
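
The ratio of "progress" to "work done" can be tried out with a small sketch. The talk defers the actual distance to the paper, so a plain word-level Levenshtein distance is substituted here as an assumption; `word_edit_distance` and `edit_longevity` are hypothetical names.

```python
def word_edit_distance(a, b):
    """Word-level Levenshtein distance (a stand-in for the paper's distance)."""
    previous = list(range(len(b) + 1))
    for i, word_a in enumerate(a, 1):
        current = [i]
        for j, word_b in enumerate(b, 1):
            current.append(min(previous[j] + 1,                        # delete word_a
                               current[j - 1] + 1,                     # insert word_b
                               previous[j - 1] + (word_a != word_b)))  # substitute
        previous = current
    return previous[-1]


def edit_longevity(version_before, judged, judge):
    """(d(k-1, j) - d(k, j)) / d(k-1, k): progress divided by work done."""
    work_done = word_edit_distance(version_before, judged)
    if work_done == 0:
        return 0.0  # assumption: an edit that changes nothing is neutral
    progress = (word_edit_distance(version_before, judge)
                - word_edit_distance(judged, judge))
    return progress / work_done


kept = edit_longevity("a b c".split(), "a b c d".split(), "a b c d e".split())
reverted = edit_longevity("a b c".split(), "x y z".split(), "a b c".split())
```

With a distance that satisfies the triangle inequality, as this one does, the result always lies between -1 and 1.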
Edit: Updating reputation

[Figure: the same triangle over versions k-1, k and j, with author $A_k$ of the judged version k and author $A_j$ of the judging version j.]

Reputation update: the reputation of $A_k$
• increases if $\alpha_{edit} > 0$,
• decreases if $\alpha_{edit} < 0$.

(see paper for details)
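
A hypothetical update rule with this sign behaviour is sketched below. Only the direction of the change comes from the slide; the weighting by the amount of work and by the judge's reputation, the constant `scale`, and the floor at zero are assumptions.

```python
def update_edit_reputation(reputation, author_k, author_j,
                           alpha_edit, work_done, scale=0.1):
    """Move author_k's reputation up when alpha_edit > 0 and down when
    alpha_edit < 0, as judged by a later version written by author_j."""
    if author_k == author_j:
        return reputation  # assumption: authors do not judge themselves
    change = scale * alpha_edit * work_done * reputation.get(author_j, 0.0)
    # Assumed floor: reputation never drops below zero.
    reputation[author_k] = max(0.0, reputation.get(author_k, 0.0) + change)
    return reputation


reputation = {"ann": 5.0, "judge": 10.0}
update_edit_reputation(reputation, "ann", "judge", alpha_edit=1.0, work_done=2)
after_good_edit = reputation["ann"]
update_edit_reputation(reputation, "ann", "judge", alpha_edit=-1.0, work_done=20)
after_reverted_edit = reputation["ann"]
```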
Data Sets

• English, till Feb 07:   1,988,627 pages,  40,455,416 versions

• French, till Feb 07:      452,577 pages,   5,643,636 versions

• Italian, till May 07:     301,584 pages,   3,129,453 versions



The entire Wikipedias, with the whole history, not just a
  sample (we wanted to compute the reputation using all edits
  of each user).
Results: English Wikipedia, in detail

% of edits below a given longevity l, per reputation bin. Bins are on the scale log(1 + reputation); bins 0 and 1 are marked “low rep”, and the column l<-0.8 is marked “Short-Lived”.

Bin   % of data    l<0.8   l<0.4   l<0.0   l<-0.4   l<-0.8
  0      16.922    93.11   91.65   89.15    83.76    73.53
  1       1.191    77.24   69.83   65.60    61.11    56.00
  2       1.335    69.53   57.08   49.79    45.71    41.25
  3       1.627    38.00   28.61   20.23    16.16    13.62
  4       2.780    32.84   22.31   13.32     9.57     8.04
  5       4.408    41.70   15.76    5.90     3.80     2.57
  6       6.698    29.40   16.74    7.54     4.35     3.12
  7       8.281    32.04   15.16    5.44     2.25     1.40
  8      12.233    34.06   16.64    6.78     3.79     2.73
  9      44.524    32.55   15.51    5.05     1.88     1.14
Predictive power of low reputation

Low-reputation: lower 20% of range

• Short-lived edits: $\alpha_{edit} \le -0.8$ (almost entirely undone)
• Short-lived text: $\alpha_{text} \le 0.2$ (less than 20% survives each revision)
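
To make "predictive power" concrete, precision and recall for short-lived edits can be recomputed from the English Wikipedia results table. The sketch assumes that the %_data column is the share of all edits falling in each bin and that "low reputation" means bins 0 and 1, the two rows the slide marks as low rep; `precision_recall` is a hypothetical helper.

```python
# One entry per reputation bin 0..9:
# (share of all edits in the bin, % of the bin's edits with longevity < -0.8)
BINS = [
    (16.922, 73.53), (1.191, 56.00), (1.335, 41.25), (1.627, 13.62),
    (2.780, 8.04), (4.408, 2.57), (6.698, 3.12), (8.281, 1.40),
    (12.233, 2.73), (44.524, 1.14),
]
LOW_REPUTATION_BINS = {0, 1}  # assumption: the lower 20% of the range


def precision_recall(bins, low_bins):
    """Treat 'author has low reputation' as a predictor of 'edit is short-lived'."""
    short_lived = [share * pct / 100.0 for share, pct in bins]
    flagged = sum(share for i, (share, _) in enumerate(bins) if i in low_bins)
    hits = sum(s for i, s in enumerate(short_lived) if i in low_bins)
    precision = hits / flagged          # flagged edits that are short-lived
    recall = hits / sum(short_lived)    # short-lived edits that were flagged
    return precision, recall


precision, recall = precision_recall(BINS, LOW_REPUTATION_BINS)
```

Under these assumptions the table gives a precision of roughly 72% and a recall of roughly 85% for short-lived edits.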
Text trust

[Figure: author A revises the text
      Yadda yadda wuga wuga bla bla bla bing bong
into
      Yadda yadda wuga wuga yak yak yuk bla bla bla bing bong]

• Old text is colored according to the reputation of its original author, and of all subsequent revisors (including A).
• New text is colored according to the reputation of A.

• On the English Wikipedia, we should be able to spot
  untrusted content with over 80% recall and 60%
  precision!
   – In fact, we do even better than this, as new content
     is always flagged lower trust (see next).
Demo: http://trust.cse.ucsc.edu/
Text trust: How is “Fogh” spelled?
Text Trust: more examples from the demo
Text Trust: Details

Trust depends on:
• Authorship: Author lends 50% of their reputation to
  the text they create.
   – Thus, even text from high-rep authors is only medium-
     rep when added: high trust is achieved only via multiple
     reviews, never via a single author.
• Revision: When an author of reputation r preserves a
  word of trust t < r, the word increases in trust to
                          t + 0.3(r – t)
• The algorithms still need fine-tuning.
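
The two rules above fit in a few lines of Python. The constants 0.5 and 0.3 are taken from the slide; the function names are hypothetical, and leaving trust unchanged when t ≥ r is an assumption, since the slide only covers the case t < r.

```python
AUTHORSHIP_FACTOR = 0.5  # an author lends 50% of their reputation to new text
REVISION_FACTOR = 0.3    # share of the gap closed when a word is preserved


def initial_trust(author_reputation):
    """Trust of a word at the moment its author adds it."""
    return AUTHORSHIP_FACTOR * author_reputation


def revised_trust(trust, reviser_reputation):
    """Trust of a word after a reviser of the given reputation preserves it."""
    if trust < reviser_reputation:
        return trust + REVISION_FACTOR * (reviser_reputation - trust)
    return trust  # assumption: lower-reputation revisers leave trust unchanged


# A word added by an author of reputation 8, then preserved by three
# revisers who also have reputation 8.
history = [initial_trust(8.0)]
for _ in range(3):
    history.append(revised_trust(history[-1], 8.0))
```

Each preserved revision closes 30% of the remaining gap, so the word's trust climbs toward the revisers' reputation without reaching it in a single step.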
From fresh to trusted text
Batch Implementation

[Figure: the Wikipedia servers feed a separate trust server through periodic xml dumps (to initialize) and an edit feed (to keep updated).]

• No need to affect the main Wikipedia servers
• People can click “check trust” and visit the trust server.
• Good for experimenting with new ideas
• Necessary to color the past (come up to speed).
On-Line Implementation

Process edits as they arrive:
• Benefit: real-time colorization of text
• Need to integrate the code in MediaWiki
• Time to process an edit: < 1s (not much longer than
  parsing it).
• Storage required: proportional to the size of the last
  revision (not to the total history size!)
• Can be easily used for other Wikis
My questions:
• Feedback?
• Do you like it?
• Should we try to set up a “trust server” with
  an edit feed from the Wikipedia?
• Try the demo:

       http://trust.cse.ucsc.edu/

Your questions?

CNv6 Instructor Chapter 6 Quality of Service
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 

UC Santa Cruz Author Provides Content-Driven Reputation Model for Wikipedia

  • 1. Content-Driven Author Reputation and Text Trust for the Wikipedia Luca de Alfaro UC Santa Cruz Joint work with Bo Adler, Ian Pye, Caitlin Sadowski (UCSC) Wikimania, August 2007
  • 3. Author Reputation and Text Trust Author Reputation: • Goal: Encourage authors to provide lasting contributions. Text Trust: • Goal: provide a measure of the reliability of the text. • Method: computed from the reputation of the authors who create and revise the text.
  • 4. Reputation: Our guiding principles • Do not alter the Wikipedia user experience – Compute reputation from content evolution, rather than user-to-user comments. • Be welcoming to all users – Never publicly display user reputation values. Authors know only their own reputation. • Be objective – Rely on content evolution rather than comments. – Quantitatively evaluate how well it works.
  • 9. Content-driven reputation • Authors of long-lived contributions gain reputation • Authors of reverted contributions lose reputation [Diagram, timeline of one Wikipedia article: A edits; B builds on A's edit (+ for A); C reverts to A's version (+ for A, - for B).]
  • 13. Content-driven reputation mitigates reputation wars. Wars in user-driven reputation: A and B trade negative ratings (-2, -3). Wars in content-driven reputation: • B can badmouth A by undoing her work (- for A). • But this is risky: if others then re-instate A's work (+ for A), it is B's reputation that suffers (- for B).
  • 14. Validation: Does our reputation have predictive value? [Diagram: timelines of Article 1, Article 2, Article 3, Article 4, ..., with marks for the edits by user A.]
  • 15. Validation: Does our reputation have predictive value? [Same diagram, with one edit E highlighted.] The reputation of author A at the time of an edit E depends on the history before the edit. The longevity of an edit E depends on the history after the edit. Can we show a correlation between author reputation and edit longevity?
  • 16. Building a content-driven reputation system for Wikipedia This is a summary; for details see: B.T. Adler, L. de Alfaro. A Content Driven Reputation System for the Wikipedia. In Proc. of WWW 2007.
  • 17. What is a “contribution”? • Text: we measure how long the added text survives. Based on text tracking. • Edit: we measure how long the “edit” (reorganization) survives. Based on edit distance. [Figure: toy examples of added text (e.g. “buy viagra!”) and of an edit.]
  • 18. Text. Version 9: bla[5] bla[8] wuga[9] boink[6]. Version 10: bla[5] bla[8] wuga[10] wuga[10] wuga[9] boink[6]. We label each word with the version where it was introduced (shown in brackets). This enables us to keep track of how long it lives.
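The word labeling on slide 18 can be illustrated with a short sketch. It assumes a plain word-level diff from Python's `difflib`; the slides do not say which text-tracking algorithm the authors use, and `label_words` is a hypothetical helper name.

```python
from difflib import SequenceMatcher

def label_words(old_words, old_labels, new_words, new_version):
    """Carry word-origin labels from one version to the next.

    Words the diff matches keep the version number where they first
    appeared; words the diff reports as new are labeled `new_version`.
    Assumption: a generic difflib alignment, not the authors' tracker.
    """
    labels = []
    matcher = SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            labels.extend(old_labels[i1:i2])            # surviving words
        elif tag in ("insert", "replace"):
            labels.extend([new_version] * (j2 - j1))    # newly introduced words
        # "delete": the words are absent from the new version, nothing to label
    return labels

v9 = "bla bla wuga boink".split()
v9_labels = [5, 8, 9, 6]
v10 = "bla bla wuga wuga wuga boink".split()
v10_labels = label_words(v9, v9_labels, v10, new_version=10)
print(list(zip(v10, v10_labels)))
```

When a word is repeated, several alignments are equally valid, so this sketch may decide differently from the slide which "wuga" is the old one. The count of new words is the same either way.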
  • 19. Text: the destiny of a contribution. [Plot: number of words vs. time (versions), showing the amount of new text and the amount of surviving text.] The life of the text introduced at a revision.
  • 20. Text: Longevity. [Plot: number of words vs. time (versions); the text $T_k$ introduced at version $k$ is approximated at version $j$ by $T_k \cdot \alpha_{text}^{\,j-k}$.] • Text longevity: the $\alpha_{text} \in [0,1]$ that yields the best geometrical approximation for the amount of residual text. • Short-lived text: $\alpha_{text} < 0.2$ (at most 20% of the text makes it from one version to the next).
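A minimal sketch of how such an α_text could be fitted. The slide does not state the fitting criterion, so this assumes one simple choice: pick α so that the geometric series T_k·(1 + α + α² + ...) has the same total as the observed surviving text. `text_longevity` and the sample counts are hypothetical.

```python
def text_longevity(surviving, tol=1e-9):
    """Fit alpha in [0, 1] by matching the total amount of surviving text.

    surviving[i] is the number of words introduced at version k that are
    still present at version k + i, so surviving[0] is T_k.
    Assumption: "best geometrical approximation" is read as
    sum(surviving) == T_k * sum(alpha**i for each i).
    """
    t_k = surviving[0]
    if t_k <= 0 or len(surviving) < 2:
        raise ValueError("need T_k > 0 and at least one later version")
    target = sum(surviving) / t_k          # observed total, in units of T_k

    def geometric_sum(alpha):
        return sum(alpha ** i for i in range(len(surviving)))

    if target >= geometric_sum(1.0):
        return 1.0                         # nothing was ever removed
    lo, hi = 0.0, 1.0
    while hi - lo > tol:                   # geometric_sum is increasing in alpha
        mid = (lo + hi) / 2
        if geometric_sum(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

halving = text_longevity([100, 50, 25, 12.5])   # text halves at every version
spam = text_longevity([100, 5, 0, 0])           # almost all removed at once
print(halving, spam, spam < 0.2)
```

Bisection is enough here because the geometric sum grows monotonically with α on [0, 1].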
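  • 21. Text: Reputation update. [Plot: number of words vs. time (versions); $T_k$ is the text introduced at version $k$ by author $A_k$, and $T_j$ is the amount of it still present at version $j$, edited by author $A_j$.] As a consequence of edit $j$, we increase the reputation of $A_k$ by an amount proportional to $T_j$ and to the reputation of $A_j$.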
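  • 22. Measuring surviving text. [Figure with “live” and “dead” text per version: version 9 has live text “wuga boing bla ble” (labels 7 9 6 6); in version 10 the live text is “buy viagra now!” (labels 10 10 10) and “wuga boing bla ble” (7 9 6 6) is stored as “dead”; version 11 contains “wuga boing bla ble” again, which finds its best match in the dead text and keeps labels 7 9 6 6.] We track authorship of deleted text, and we match the text of new versions both with live and with dead text.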
  • 23. Edit. [Diagram: versions k-1, k (judged), and j (judge), with k < j, connected by the distances d(k-1, k), d(k, j), and d(k-1, j).] We compute the edit distance between versions k-1, k, and j, with k < j (see paper for details on the distance).
  • 24. Edit: good or bad? [Two diagrams: k-1 (the past), k (judged), j (judge, the future).] • k is good: d(k-1, j) > d(k, j), “k went towards the future”. • k is bad: d(k-1, j) < d(k, j), “k went against the future”.
  • 25. Edit: Longevity. [Diagram: k-1 (the past), k, j (the future); “work done” is d(k-1, k), “progress” is d(k-1, j) - d(k, j).] Edit longevity: $\alpha_{edit} = \frac{d(k-1,j) - d(k,j)}{d(k-1,k)}$, the fraction of change that is in the same direction of the future. • $\alpha_{edit} \simeq 1$: k is a good edit. • $\alpha_{edit} \simeq -1$: k is reverted.
  • 26. Edit: Updating reputation. [Diagram: k-1 (the past), k (by author $A_k$), j (the future, by author $A_j$); “work done” is d(k-1, k), “progress” is d(k-1, j) - d(k, j).] Reputation update: the reputation of $A_k$ • increases if $\alpha_{edit} > 0$, • decreases if $\alpha_{edit} < 0$ (see paper for details).
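The edit-longevity ratio on slides 23 to 26 (progress divided by work done) can be sketched as follows. The distance used by the authors is described only in their paper, so a plain word-level Levenshtein distance stands in for it here. `word_edit_distance` and `edit_longevity` are hypothetical names, and treating a null edit as neutral is an assumption of this sketch.

```python
def word_edit_distance(a, b):
    """Word-level Levenshtein distance (insert, delete, substitute cost 1).

    Stand-in only: the slides refer to the paper for the real distance.
    """
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i]
        for j, wb in enumerate(b, start=1):
            cost = 0 if wa == wb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def edit_longevity(v_prev, v_k, v_j):
    """(d(k-1, j) - d(k, j)) / d(k-1, k): "progress" over "work done"."""
    work_done = word_edit_distance(v_prev, v_k)
    if work_done == 0:
        return 0.0  # assumption: an edit that changes nothing is neutral
    progress = word_edit_distance(v_prev, v_j) - word_edit_distance(v_k, v_j)
    return progress / work_done

# Judged edit k is undone by the judge j: longevity -1, author of k loses reputation.
reverted = edit_longevity("a b c".split(), "a spam c".split(), "a b c".split())
# Judged edit k is kept and extended by j: longevity +1, author of k gains reputation.
kept = edit_longevity("a b".split(), "a b c".split(), "a b c d".split())
print(reverted, kept)
```

With any distance that satisfies the triangle inequality, the ratio stays within [-1, 1], which matches the two extremes named on slide 25.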
  • 27. Data Sets • English till Feb 07 1,988,627 pages, 40,455,416 versions • French till Feb 07 452,577 pages, 5,643,636 versions • Italian till May 07 301,584 pages, 3,129,453 versions The entire Wikipedias, with the whole history, not just a sample (we wanted to compute the reputation using all edits of each user).
  • 29. Results: English Wikipedia, in detail. % of edits below a given longevity l, by bin of log (1 + reputation):
      Bin   %_data   l<0.8   l<0.4   l<0.0   l<-0.4   l<-0.8
      0     16.922   93.11   91.65   89.15    83.76    73.53
      1      1.191   77.24   69.83   65.60    61.11    56.00
      2      1.335   69.53   57.08   49.79    45.71    41.25
      3      1.627   38.00   28.61   20.23    16.16    13.62
      4      2.780   32.84   22.31   13.32     9.57     8.04
      5      4.408   41.70   15.76    5.90     3.80     2.57
      6      6.698   29.40   16.74    7.54     4.35     3.12
      7      8.281   32.04   15.16    5.44     2.25     1.40
      8     12.233   34.06   16.64    6.78     3.79     2.73
      9     44.524   32.55   15.51    5.05     1.88     1.14
    Slide annotations: bins 0 and 1 are marked “low rep”; the l<-0.8 column is marked “Short-Lived”.
  • 30. Predictive power of low reputation. Low-reputation: lower 20% of range. • Short-lived edits: $\alpha_{edit} \le -0.8$ (almost entirely undone). • Short-lived text: $\alpha_{text} \le 0.2$ (less than 20% survives each revision).
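As a rough illustration of what "predictive power" could mean here, the sketch below computes precision and recall of "low reputation" as a predictor of "short-lived edit", using the English Wikipedia figures from slide 29. It rests on assumptions the slides do not confirm: that %_data is each bin's share of all edits, that the l<-0.8 column is the share of that bin's edits that are short-lived, and that edits are unweighted. The authors' own evaluation may weight edits differently.

```python
# (bin, % of all edits in the bin, % of the bin's edits with longevity < -0.8),
# copied from the slide 29 table; the column interpretation is an assumption.
ROWS = [
    (0, 16.922, 73.53), (1, 1.191, 56.00), (2, 1.335, 41.25),
    (3, 1.627, 13.62), (4, 2.780, 8.04), (5, 4.408, 2.57),
    (6, 6.698, 3.12), (7, 8.281, 1.40), (8, 12.233, 2.73),
    (9, 44.524, 1.14),
]
LOW_REP_BINS = {0, 1}  # "lower 20% of range" of the ten bins

def precision_recall(rows, low_bins):
    """Precision = P(short-lived | low rep); recall = P(low rep | short-lived)."""
    short_total = sum(share * short / 100 for _, share, short in rows)
    low_share = sum(share for b, share, _ in rows if b in low_bins)
    low_short = sum(share * short / 100 for b, share, short in rows if b in low_bins)
    return low_short / low_share, low_short / short_total

precision, recall = precision_recall(ROWS, LOW_REP_BINS)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

The same function works for any other longevity column of the table by swapping the third value of each row.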
  • 31. Text trust. [Example: author A edits “Yadda yadda wuga wuga bla bla bla bing bong” into “Yadda yadda wuga wuga yak yak yuk bla bla bla bing bong”.] Old text is colored according to the reputation of its original author, and of all subsequent revisors (including A). New text is colored according to the reputation of A.
  • 32. Text trust. [Example: author A edits “Yadda yadda wuga wuga bla bla bla bing bong” into “Yadda yadda wuga wuga yak yak yuk bla bla bla bing bong”.] • On the English Wikipedia, we should be able to spot untrusted content with over 80% recall and 60% precision! – In fact, we do even better than this, as new content is always flagged lower trust (see next).
  • 34. Text trust: How is “Fogh” spelled?
  • 35. Text Trust: more examples from the demo
  • 36. Text Trust: Details. Trust depends on: • Authorship: Author lends 50% of their reputation to the text they create. – Thus, even text from high-rep authors is only medium-rep when added: high trust is achieved only via multiple reviews, never via a single author. • Revision: When an author of reputation r preserves a word of trust t < r, the word increases in trust to t + 0.3(r – t). • The algorithms still need fine-tuning.
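The two trust rules on slide 36 can be written down directly. This is a sketch, not the authors' implementation: the function names and the 0 to 10 reputation scale in the example are hypothetical, and leaving trust unchanged when t ≥ r is an assumption, since the slide only covers t < r.

```python
AUTHOR_SHARE = 0.5    # slide 36: author lends 50% of their reputation
REVISION_GAIN = 0.3   # slide 36: t + 0.3 (r - t)

def trust_of_new_word(author_rep):
    """Initial trust of freshly inserted text."""
    return AUTHOR_SHARE * author_rep

def trust_after_revision(word_trust, reviser_rep):
    """Trust of a word that a reviser of reputation r left in place."""
    if word_trust < reviser_rep:
        return word_trust + REVISION_GAIN * (reviser_rep - word_trust)
    return word_trust  # assumption: the slide does not cover t >= r

# A top-reputation author (10 on a hypothetical 0-10 scale) adds a word,
# then two equally reputable authors revise the page and keep the word.
t0 = trust_of_new_word(10)
t1 = trust_after_revision(t0, 10)
t2 = trust_after_revision(t1, 10)
print(t0, t1, t2)
```

Each review closes 30% of the remaining gap to the reviser's reputation, so trust approaches that reputation only after several independent revisions, as the slide states.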
  • 37. From fresh to trusted text
  • 42. Batch Implementation. [Diagram: the Wikipedia servers feed a separate trust server through periodic XML dumps (to initialize) and an edit feed (to keep updated).] • No need to affect the main Wikipedia servers • People can click “check trust” and visit the trust server. • Good for experimenting with new ideas • Necessary to color the past (come up to speed).
  • 43. On-Line Implementation Process edits as they arrive: • Benefit: real-time colorization of text • Need to integrate the code in MediaWiki • Time to process an edit: < 1s (not much longer than parsing it). • Storage required: proportional to the size of the last revision (not to the total history size!) • Can be easily used for other Wikis
  • 44. My questions: • Feedback? • Do you like it? • Should we try to set up a “trust server” with an edit feed from the Wikipedia? • Try the demo: http://trust.cse.ucsc.edu/ Your questions?