Tags vs Shelves: From Social Tagging to Social Classification
Hypertext 2011
Arkaitz Zubiaga, Christian Körner, Markus Strohmaier
UNED (Madrid, Spain)
&
Graz University of Technology (Graz, Austria)
June 8th, 2011
2. Motivation
Index
1 Motivation
2 User Behavior Measures
3 Experiments
4 Results
5 Conclusions & Outlook
Zubiaga, Körner, Strohmaier () Tags vs Shelves June 8th, 2011 2 / 31
6. Motivation
Book Cataloging
Librarians have been cataloging books for centuries.
The task of manually cataloging books becomes very expensive and
effortful for large collections.
For instance, the Library of Congress reported an average cost of $94.58
for cataloging each book in 2002 (291,749 books, total: $27.5 million)
Given the enormous costs and efforts required for the task, research is
moving towards automatic classification.
7. Motivation
Automatic Classification of Books
Problem: it is not easy to get data representing the aboutness of the
books.
In addition, content of books is not always available digitally.
Solution:
Social tags provided by users have been shown to be helpful (Zubiaga et al.,
2009)¹.
Social tagging sites like LibraryThing and GoodReads are gathering
vast amounts of tag annotations on books.
¹ A. Zubiaga, R. Martínez, V. Fresno. Getting the Most Out of Social Annotations for Web Page Classification. DocEng 2009.
11. Motivation
Problem Statement
Can we find a type of user whose tags more closely resemble the categorization
by experts?
Can we characterize those users?
12. Motivation
User Behavior
Körner et al.² suggested and described the existence of two kinds of
user behavior: Categorizers and Describers.
                            Categorizer       Describer
Goal of Tagging             later browsing    later retrieval
Change of Tag Vocabulary    costly            cheap
Size of Tag Vocabulary      limited           open
Tags                        subjective        objective
Previous work suggests that tags from Describers are the more helpful ones
for inferring semantic relations among tags.
Our goal is to discover whether these kinds of tagging behavior affect
the usefulness of tags for the social classification of books.
² C. Körner. Understanding the Motivation behind Tagging. Hypertext 2009.
16. User Behavior Measures
User Behavior Measures
Tags per Post (TPP) – Verbosity
    TPP(u) = \frac{\sum_{r} |T_{ur}|}{|R_u|}    (1)
Orphan Ratio (ORPHAN) – Diversity
    n = \frac{|R(t_{max})|}{100}    (2)
    ORPHAN(u) = \frac{|T_u^o|}{|T_u|}, \quad T_u^o = \{ t \mid |R(t)| \le n \}    (3)
Tag Resource Ratio (TRR) – Verbosity + Diversity
    TRR(u) = \frac{|T_u|}{|R_u|}    (4)
C. Körner, R. Kern, H.-P. Grahsl, and M. Strohmaier. Of Categorizers and Describers: An Evaluation of Quantitative Measures for Tagging Motivation. Hypertext 2010.
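The three measures can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the input is taken to be a list of (user, resource, tag) triples, R(t) is counted per user, and the function name `behavior_measures` and its `divisor` parameter (100 in Eq. 2) are hypothetical. Eq. (2) is applied as written, without rounding.

```python
from collections import defaultdict

def behavior_measures(assignments, divisor=100):
    """Return {user: {"TPP": ..., "ORPHAN": ..., "TRR": ...}}.

    assignments: iterable of (user, resource, tag) triples (assumed input shape).
    divisor: the constant 100 of Eq. (2); exposed only so toy data can be tried.
    """
    posts = defaultdict(set)     # (user, resource) -> tags, i.e. T_ur
    tag_uses = defaultdict(set)  # (user, tag) -> resources, i.e. R(t) for that user
    for user, resource, tag in assignments:
        posts[(user, resource)].add(tag)
        tag_uses[(user, tag)].add(resource)

    resources = defaultdict(set)    # user -> R_u
    total_tags = defaultdict(int)   # user -> sum over r of |T_ur|
    for (user, resource), tags in posts.items():
        resources[user].add(resource)
        total_tags[user] += len(tags)

    tag_counts = defaultdict(dict)  # user -> {tag: |R(t)|}
    for (user, tag), tagged in tag_uses.items():
        tag_counts[user][tag] = len(tagged)

    measures = {}
    for user, counts in tag_counts.items():
        n = max(counts.values()) / divisor                    # Eq. (2)
        orphans = sum(1 for c in counts.values() if c <= n)   # |T_u^o|
        measures[user] = {
            "TPP": total_tags[user] / len(resources[user]),   # Eq. (1)
            "ORPHAN": orphans / len(counts),                  # Eq. (3)
            "TRR": len(counts) / len(resources[user]),        # Eq. (4)
        }
    return measures

# Toy data: "ann" reuses a small vocabulary, "bob" tags verbosely.
toy = [
    ("ann", "b1", "fiction"), ("ann", "b2", "fiction"), ("ann", "b3", "history"),
    ("bob", "b1", "dystopia"), ("bob", "b1", "orwell"), ("bob", "b1", "classic"),
    ("bob", "b2", "classic"),
]
toy_measures = behavior_measures(toy, divisor=2)
```

On the toy data, "bob" gets higher TPP and TRR values than "ann", which matches the intuition that verbose, open-vocabulary tagging sits on the Describer side.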
17. User Behavior Measures
Computing measures
These 3 measures provide a weight for each user.
These weights make it possible to rank users according to each
measure.
From these rankings, we choose subsets of users as extreme
Categorizers (highest-ranked) and extreme Describers (lowest-ranked).
Subsets range from 10% to 100%, with a step size of 10%.
18. User Behavior Measures
Book Cataloging
We select subsets of users according to their number of tag assignments.
Selecting by percentage of users would be unfair, since the resulting subsets
would contain different amounts of data.
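One way to read this selection rule is sketched below: walk down a ranking of users and stop once the selected users cover a given percentage of all tag assignments. The function name `top_users_by_assignments` is hypothetical, and the handling of the boundary user (included even if that overshoots the budget) is an assumption, since the slides do not specify it.

```python
def top_users_by_assignments(ranked_users, assignment_counts, percent):
    """Select users from the head of a ranking until they cover `percent`
    percent of all tag assignments.

    ranked_users: users ordered by one behavior measure (e.g. TPP).
    assignment_counts: {user: number of tag assignments by that user}.
    percent: share of tag assignments to cover, e.g. 10, 20, ..., 100.
    """
    total = sum(assignment_counts[u] for u in ranked_users)
    budget = total * percent / 100
    selected, covered = [], 0
    for user in ranked_users:
        if covered >= budget:
            break
        # Assumption: the user who crosses the budget is still included.
        selected.append(user)
        covered += assignment_counts[user]
    return selected

# Hypothetical ranking and per-user tag assignment counts.
ranked = ["a", "b", "c", "d"]
counts = {"a": 10, "b": 30, "c": 40, "d": 20}
head_10 = top_users_by_assignments(ranked, counts, 10)
tail_10 = top_users_by_assignments(list(reversed(ranked)), counts, 10)
```

Passing the ranking as is gives the subset from one extreme, and passing it reversed gives the subset from the other extreme, so both ends are compared on equal amounts of tagging data.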
19. User Behavior Measures
Objective
We aim at analyzing whether:
Categorizers provide tags that better help infer the categorization
performed by experts.
Describers provide tags that more closely resemble book descriptions.
21. Experiments
Datasets
Set of 38,149 popular books, with categorization data made by
experts:
27,299 categorized according to DDC (10 categories).
24,861 categorized according to LCC (20 categories).
Tagging data from 153k+ users on LibraryThing and 110k+ users on
GoodReads (100+ users annotated each book).
Additional descriptive data:
Book synopses (Barnes&Noble).
User reviews (LibraryThing, GoodReads, and Amazon.com).
Editorial reviews (Amazon.com).
22. Experiments
Tag-based Book Classification
Software: Multiclass Support Vector Machines (svm-multiclass³).
Vectorial representation of books, using tag frequency values.
We perform 6 different training set selections of 18,000 books, and
show the average accuracy.
Accuracy = \frac{\#\,\text{correct guesses}}{\#\,\text{test set}}
³ http://svmlight.joachims.org/svm_multiclass.html
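As a rough sketch of the data preparation, the snippet below turns per-book tag counts into lines of the sparse SVM-light text format (`<label> <index>:<value> ...`, indices starting at 1 and increasing) and computes accuracy as defined above. The helper names and the toy books are hypothetical, and how the authors actually built their input files is not stated in the slides.

```python
def build_vocabulary(books):
    """Map each tag to a 1-based feature index (tags sorted alphabetically)."""
    tags = sorted({tag for counts in books.values() for tag in counts})
    return {tag: index for index, tag in enumerate(tags, start=1)}

def to_svmlight_line(label, tag_counts, vocab):
    """Format one book as '<label> <index>:<frequency> ...', indices increasing."""
    features = sorted((vocab[t], c) for t, c in tag_counts.items() if t in vocab)
    return " ".join([str(label)] + [f"{i}:{c}" for i, c in features])

def accuracy(predicted, gold):
    """Number of correct guesses divided by the size of the test set."""
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(gold)

# Hypothetical tag frequencies for two books.
books = {
    "b1": {"fantasy": 3, "magic": 1},
    "b2": {"history": 2},
}
vocab = build_vocabulary(books)
line = to_svmlight_line(2, books["b1"], vocab)
```

The average over the 6 training set selections would then be the mean of the 6 accuracy values, for example with `statistics.mean`.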
23. Experiments
Descriptiveness of Tags
Vectorial representation of books (T_r), using tag frequency values.
Vectorial representation of books (R_r), using term frequency values
on descriptive data (synopses, reviews).
Cosine similarity between T_r and R_r:
    similarity_r = \cos(\theta_r) = \frac{T_r \cdot R_r}{\|T_r\| \, \|R_r\|} = \frac{\sum_{i=1}^{n} T_{ri} \times R_{ri}}{\sqrt{\sum_{i=1}^{n} (T_{ri})^2} \times \sqrt{\sum_{i=1}^{n} (R_{ri})^2}}    (5)
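Eq. (5) can be illustrated with sparse dictionaries. This is a minimal sketch, assuming that tags and description terms share one vocabulary and are matched as identical strings; the function name `cosine_similarity` and the zero-vector convention (returning 0.0) are assumptions, not part of the slides.

```python
import math

def cosine_similarity(tag_freqs, term_freqs):
    """Cosine of the angle between a tag vector T_r and a term vector R_r.

    Both arguments are {word: frequency} dictionaries; missing words count as 0.
    """
    dot = sum(value * term_freqs.get(word, 0) for word, value in tag_freqs.items())
    norm_t = math.sqrt(sum(value * value for value in tag_freqs.values()))
    norm_r = math.sqrt(sum(value * value for value in term_freqs.values()))
    if norm_t == 0 or norm_r == 0:
        return 0.0  # assumed convention for books without tags or without text
    return dot / (norm_t * norm_r)

# Hypothetical frequencies for one book.
tags = {"fantasy": 4, "magic": 2}
terms = {"fantasy": 1, "wizard": 3}
similarity = cosine_similarity(tags, terms)
```

Because frequencies are non-negative, the value lies between 0 (no shared words) and 1 (proportional vectors), so a higher value means the tags look more like the descriptive text.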
25. Results
Results
[Figure: Classification and Descriptiveness results on GoodReads and LibraryThing; panels: TPP (verb.), TRR (div.), ORP. (verb. + div.)]
1 TPP measure: Categorizers outperform Describers for classification.
2 All the measures (though especially TRR): tags by Describers more
closely resemble descriptive data.
26. Results
Results
3 Verbosity helps find extreme Categorizers.
Users who think of a specific shelf to place the book tend to assign a
tag identifying the shelf.
27. Results
Results
4 Diversity does not work to find Categorizers on GoodReads.
GoodReads suggests previously used tags to the user, which affects the
diversity of tags.
28. Results
Results
5 Users providing non-descriptive tags (i.e., different from Describers)
produce more accurate classification.
30. Conclusions & Outlook
Conclusions & Outlook
Social classification of books with tagging data, discriminating
extreme Categorizers and Describers.
It complements previous research by showing that the so-called
Categorizers produce more accurate classification.
Non-verbose, non-descriptive, shelf-driven tagging produces more
accurate classification of books.
Outlook: further analyze tagging behavior to identify generalists
(users who provide general tags) and specialists (users who provide
more specific tags focused on the subject).
31. Conclusions & Outlook
Thank You
Achiu Arigato Danke Dhannvaad Dua Netjer en ek Efcharisto
Gracias Gràcies Gratia Grazie Guishepeli
Hvala Kiitos Köszönöm Mercé Merci Mila esker
Obrigado Shukran Tack Tak Takk Shukriya
Tänan Tapadh leat Tesekkür ederim Thank you Toda
E-mail: azubiaga@lsi.uned.es
@arkaitz