Tags vs Shelves:
From Social Tagging to Social Classification
                    Hypertext 2011


Arkaitz Zubiaga, Christian Körner, Markus Strohmaier

                  UNED (Madrid, Spain)
                              &
        Graz University of Technology (Graz, Austria)


                     June 8th, 2011
Motivation


Index


1     Motivation

2     User Behavior Measures

3     Experiments

4     Results

5     Conclusions & Outlook




    Zubiaga, Körner, Strohmaier          Tags vs Shelves   June 8th, 2011   2 / 31
Book Cataloging




      Librarians have been cataloging books for centuries.
      Manually cataloging books becomes very expensive and labor-intensive
      for large collections.
             For instance, the Library of Congress reported an average cost of $94.58
             per cataloged book in 2002 (291,749 books, total: $27.5 million).
      Given the enormous cost and effort the task requires, research is
      moving towards automatic classification.






Automatic Classification of Books



         Problem: it is not easy to get data representing the aboutness of the
         books.
                 In addition, the content of books is not always available digitally.
         Solution:
                 Social tags provided by users have been shown to be helpful (Zubiaga et al.,
                 2009)¹.
                 Social tagging sites like LibraryThing and GoodReads are gathering
                 vast amounts of tag annotations on books.




    ¹ A. Zubiaga, R. Martínez, V. Fresno. Getting the Most Out of Social Annotations for Web Page Classification. DocEng 2009.


Tagging






Social Tagging






Problem Statement




Can we find a type of user whose tags more closely resemble the categorization
by experts?
Can we characterize those users?






User Behavior

        Körner et al.² suggested and described the existence of two kinds of
        user behavior: Categorizers and Describers.

                                                                Categorizer        Describer
                 Goal of Tagging                              later browsing     later retrieval
                 Change of Tag Vocabulary                          costly            cheap
                 Size of Tag Vocabulary                           limited             open
                 Tags                                           subjective          objective


        Previous work suggests that Describers' tags are more helpful for inferring
        semantic relations among tags.
        Our goal is to discover whether this kind of tagging behavior affects
        the usefulness of tags for the social classification of books.

   ² C. Körner. Understanding the Motivation behind Tagging. Hypertext 2009.


User Behavior Measures
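        Tags per Post (TPP) – Verbosity
        $$\mathrm{TPP}(u) = \frac{\sum_{r} |T_{ur}|}{|R_u|} \qquad (1)$$
        Orphan Ratio (ORPHAN) – Diversity
        $$n = \frac{|R(t_{max})|}{100} \qquad (2)$$
        $$\mathrm{ORPHAN}(u) = \frac{|T_u^o|}{|T_u|}, \quad T_u^o = \{t \mid |R(t)| \le n\} \qquad (3)$$
        Tag Resource Ratio (TRR) – Verbosity + Diversity
        $$\mathrm{TRR}(u) = \frac{|T_u|}{|R_u|} \qquad (4)$$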
C. Körner, R. Kern, H.-P. Grahsl, and M. Strohmaier. Of categorizers and describers: an evaluation of quantitative measures for
tagging motivation. Hypertext 2010.
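The three measures in Eqs. (1)-(4) can be computed per user as in the minimal Python sketch below. It assumes a hypothetical input format (a dict from book id to the set of tags one user assigned to that book), reads R(t) as the user's books annotated with tag t and t_max as the user's most frequently used tag, and computes n literally as in Eq. (2), without rounding. The `divisor` parameter is hypothetical: it defaults to the 100 of Eq. (2) and exists only so that a tiny example yields a non-zero orphan ratio.

```python
from collections import defaultdict

# Hypothetical input format: book id -> set of tags this user assigned to it.
Posts = dict[str, set[str]]


def tpp(posts: Posts) -> float:
    """Tags per post, Eq. (1): sum over books r of |T_ur|, divided by |R_u|."""
    return sum(len(tags) for tags in posts.values()) / len(posts)


def trr(posts: Posts) -> float:
    """Tag/resource ratio, Eq. (4): vocabulary size |T_u| divided by |R_u|."""
    vocabulary = set().union(*posts.values())
    return len(vocabulary) / len(posts)


def orphan_ratio(posts: Posts, divisor: float = 100) -> float:
    """Orphan ratio, Eqs. (2)-(3); `divisor` is a hypothetical knob (100 in Eq. 2)."""
    # R(t): the user's books annotated with tag t.
    resources_per_tag: dict[str, set[str]] = defaultdict(set)
    for book, tags in posts.items():
        for tag in tags:
            resources_per_tag[tag].add(book)
    # n = |R(t_max)| / divisor, where t_max is the user's most frequently used tag.
    n = max(len(books) for books in resources_per_tag.values()) / divisor
    # T_u^o: "orphan" tags used on at most n books.
    orphans = [t for t, books in resources_per_tag.items() if len(books) <= n]
    return len(orphans) / len(resources_per_tag)


posts = {
    "b1": {"fantasy", "to-read"},
    "b2": {"fantasy"},
    "b3": {"fantasy", "dragons", "magic"},
}
print(tpp(posts), trr(posts), orphan_ratio(posts, divisor=2))  # prints: 2.0 1.3333333333333333 0.75
```

With the default divisor, this toy user gets n = 0.03, so no tag counts as an orphan and the ratio is 0.0.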


Computing measures




      Each of the 3 measures assigns a weight to every user.
      These weights let us rank users according to each measure.
      From each ranking, we choose subsets of users as extreme
      Categorizers (highest-ranked) and extreme Describers (lowest-ranked).
      Subset sizes range from 10% to 100%, in steps of 10%.






Book Cataloging

      We define subsets by their share of tag assignments, not by their share of users.
      Selecting by percentage of users would be unfair, since subsets with the same
      share of users could contain very different amounts of tagging data.
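A minimal sketch of this selection step, assuming one weight per user for a given measure and a count of each user's tag assignments (both hypothetical dicts with illustrative names). Users are ranked by weight, and each extreme subset is grown from one end of the ranking until it covers the requested percentage of all tag assignments. The sketch just returns both ends; labeling the highest-ranked end as Categorizers follows the description above and depends on the ranking direction.

```python
def take_by_assignments(ranked_users, assignments, percent):
    """Add users from the front of a ranking until their tag assignments
    cover at least `percent`% of all tag assignments."""
    total = sum(assignments.values())
    chosen, covered = [], 0
    for user in ranked_users:
        # Integer comparison avoids float rounding in percent * total / 100.
        if covered * 100 >= percent * total:
            break
        chosen.append(user)
        covered += assignments[user]
    return chosen


def extreme_subsets(weights, assignments, percent):
    """Return (highest-ranked, lowest-ranked) users for one measure."""
    ranking = sorted(weights, key=weights.get, reverse=True)
    highest = take_by_assignments(ranking, assignments, percent)
    lowest = take_by_assignments(ranking[::-1], assignments, percent)
    return highest, lowest


weights = {"ann": 3.1, "bob": 2.4, "cat": 1.7, "dan": 0.9}  # e.g. TPP values (toy data)
assignments = {"ann": 40, "bob": 10, "cat": 25, "dan": 25}  # tag assignments per user
for percent in range(10, 101, 10):
    print(percent, extreme_subsets(weights, assignments, percent))
```

Because whole users are added, a subset can overshoot its target share by up to one user's assignments; at 100% both ends contain every user.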






Objective




      We aim to analyze whether:
             Categorizers provide tags that better help infer the categorization
             performed by experts.
             Describers provide tags that more closely resemble book descriptions.




Experiments


Datasets



      Set of 38,149 popular books categorized by experts:
             27,299 categorized according to the Dewey Decimal Classification (DDC, 10 categories).
             24,861 categorized according to the Library of Congress Classification (LCC, 20 categories).
      Tagging data from 153k+ users on LibraryThing and 110k+ users on
      GoodReads (each book was annotated by 100+ users).
      Additional descriptive data:
             Book synopses (Barnes & Noble).
             User reviews (LibraryThing, GoodReads, and Amazon.com).
             Editorial reviews (Amazon.com).






Tag-based Book Classification




        Software: multiclass Support Vector Machines (svm-multiclass³).
        Books are represented as vectors of tag frequencies.
        We draw 6 different training sets of 18,000 books each and
        report the average accuracy, where
        $$\text{Accuracy} = \frac{\#\,\text{correct guesses}}{\#\,\text{test set}}$$




   ³ http://svmlight.joachims.org/svm_multiclass.html
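svm-multiclass reads examples in the sparse SVM-light text format: one line per example, an integer class label followed by `feature:value` pairs in increasing feature order. The sketch below is a hypothetical preprocessing helper, not the authors' code. It turns per-book tag counts into such lines and implements the accuracy measure and the averaging over training-set selections. It does not call the svm-multiclass binaries, and the mapping of categories to labels (e.g., DDC classes to 1..10) is an assumption.

```python
from collections import Counter
from statistics import mean


def build_vocabulary(books):
    """Assign each tag a feature id starting at 1 (SVM-light feature ids are positive)."""
    tags = sorted({tag for counts in books.values() for tag in counts})
    return {tag: i for i, tag in enumerate(tags, start=1)}


def to_svm_line(label, counts, vocab):
    """One example: '<class> <feature>:<value> ...', with features in ascending id order."""
    features = sorted((vocab[tag], freq) for tag, freq in counts.items() if tag in vocab)
    return f"{label} " + " ".join(f"{fid}:{freq}" for fid, freq in features)


def accuracy(predicted, gold):
    """Accuracy = number of correct guesses / size of the test set."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Toy tag-frequency vectors for two books.
books = {
    "b1": Counter({"fantasy": 12, "dragons": 3}),
    "b2": Counter({"history": 7, "war": 2}),
}
vocab = build_vocabulary(books)
print(to_svm_line(8, books["b1"], vocab))  # prints: 8 1:3 2:12

# Average over several training/test selections (six in the study; two toy runs here).
runs = [([1, 2, 2, 3], [1, 2, 3, 3]), ([4, 4], [4, 4])]
print(mean(accuracy(p, g) for p, g in runs))  # prints: 0.875
```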


Descriptiveness of Tags


      Vector representation of each book r (T_r), using tag frequencies.
      Vector representation of each book r (R_r), using term frequencies
      over its descriptive data (synopses, reviews).
      Cosine similarity between T_r and R_r:

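      $$\mathrm{similarity}_r = \cos(\theta_r) = \frac{T_r \cdot R_r}{\lVert T_r \rVert \, \lVert R_r \rVert} = \frac{\sum_{i=1}^{n} T_{ri} \times R_{ri}}{\sqrt{\sum_{i=1}^{n} (T_{ri})^2} \times \sqrt{\sum_{i=1}^{n} (R_{ri})^2}} \qquad (5)$$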
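Eq. (5) over sparse vectors can be sketched as follows. The tokenizer (lowercase runs of letters) is an assumption for illustration, not the preprocessing used in the study; tags are compared to terms as plain strings, so multi-word tags only match if they are tokenized the same way.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Term-frequency vector R_r for descriptive text (naive tokenizer: lowercase letter runs)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(t, r):
    """Eq. (5): dot product of two sparse vectors divided by the product of their norms."""
    dot = sum(freq * r[term] for term, freq in t.items() if term in r)
    norm_t = math.sqrt(sum(v * v for v in t.values()))
    norm_r = math.sqrt(sum(v * v for v in r.values()))
    if norm_t == 0 or norm_r == 0:
        return 0.0  # convention for empty vectors, where Eq. (5) is undefined
    return dot / (norm_t * norm_r)


tags = Counter({"fantasy": 3, "magic": 1})  # T_r: toy tag frequencies for book r
synopsis = term_frequencies("Fantasy, magic and a castle. A castle!")  # R_r
print(round(cosine(tags, synopsis), 4))  # prints 0.3814
```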






Results

 [Results table: rows Classification and Descriptiveness; columns TPP (verb.), TRR (div.), and ORP. (verb. + div.) for GoodReads and for LibraryThing. Cell contents not recoverable.]




     1             TPP measure: Categorizers outperform Describers for classification.
     2             All the measures (especially TRR): Describers' tags more closely
                   resemble descriptive data.
     3             Verbosity helps find extreme Categorizers.
                         Users who have a specific shelf in mind for a book tend to assign a
                         tag identifying that shelf.
     4             Diversity does not help find Categorizers on GoodReads.
                         GoodReads suggests previously used tags to the user, which affects
                         the diversity of their tags.
     5             Users providing non-descriptive tags (i.e., unlike Describers)
                   produce more accurate classification.


Conclusions & Outlook



      Social classification of books using tagging data, distinguishing extreme
      Categorizers from extreme Describers.
      This complements previous research by showing that so-called
      Categorizers produce tags that yield more accurate classification.
      Non-verbose, non-descriptive, shelf-driven tagging produces more
      accurate classification of books.
      Outlook: further analysis of tagging behavior to identify generalists
      (users who provide general tags) and specialists (users who provide
      more specific, subject-focused tags).






Thank You


Achiu  Arigato  Danke  Dhannvaad  Dua Netjer en ek  Efcharisto
   Gracias  Gràcies  Gratia  Grazie  Guishepeli
   Hvala  Kiitos  Köszönöm  Mercé  Merci  Mila esker  Obrigado  Shukran  Tack  Tak  Takk  Shukriya
   Tänan  Tapadh leat  Tesekkür ederim  Thank you  Toda
      E-mail: azubiaga@lsi.uned.es
           @arkaitz
