Pragmatic Evaluation of Concept Hierarchies
1. Graz University of Technology
Pragmatic Evaluation of Concept
Hierarchies
Christoph Trattner, Philipp Singer
Denis Helic, Markus Strohmaier
Graz University of Technology, Austria
Trattner C., Singer P., Helic D., Strohmaier M. I-Know 2012
2. Graz University of Technology
What is this talk about?
Part 1
We will introduce a framework for evaluating concept
hierarchies that does not rely on a gold standard
The framework determines the pragmatic usefulness of
concept hierarchies using Kleinberg's idea of
hierarchical decentralized search
Part 2
We will show evidence that the framework works
not only in theory but also in practice
3. Graz University of Technology
What was the motivation of our research?
4. Graz University of Technology
Directories: Categorization by Experts
5. Graz University of Technology
Research question
Can a crowd of users contribute to the
creation of such categorizations?
How can we generate such hierarchical
structures automatically?
6. Graz University of Technology
Annotation by Users: Tagging
Folksonomy
Tuple (U, R, T, Y)
User (U)
Resource (R)
Tag (T)
Relation (Y)
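The tuple above can be written down as a small data structure. This is a minimal sketch, assuming Y is a set of (user, resource, tag) assignments and that U, R and T are derived from it; the class and method names are hypothetical and not taken from the talk.

```python
from collections import defaultdict
from typing import NamedTuple


class TagAssignment(NamedTuple):
    """One element of the relation Y: a user tagged a resource with a tag."""
    user: str
    resource: str
    tag: str


class Folksonomy:
    """Folksonomy as a tuple (U, R, T, Y); U, R and T are derived from Y here."""

    def __init__(self, assignments):
        self.Y = set(assignments)
        self.U = {a.user for a in self.Y}
        self.R = {a.resource for a in self.Y}
        self.T = {a.tag for a in self.Y}

    def tags_by_resource(self):
        """Map each resource to the set of tags assigned to it."""
        index = defaultdict(set)
        for a in self.Y:
            index[a.resource].add(a.tag)
        return dict(index)


# Toy data, invented for illustration only.
folksonomy = Folksonomy([
    TagAssignment("alice", "page1", "python"),
    TagAssignment("alice", "page1", "programming"),
    TagAssignment("bob", "page2", "python"),
])
```

The resource-to-tags index is the flat structure that hierarchy induction would start from.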
7. Graz University of Technology
Folksonomies
Emerge from the process of collaborative tagging
Latent hierarchical structures
Turn flat structure into hierarchy: taxonomy
induction algorithms
Generality-based algorithms (centrality in tag-to-tag networks)
Other algorithms possible: k-means, affinity propagation, ...
E.g., [Heymann and Garcia-Molina 2006] or [Benz et al. 2010]
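A generality-based induction step might look roughly like the sketch below. It assumes degree in the tag co-occurrence network as the centrality measure and attaches each tag to the already placed tag it co-occurs with most; this is a simplified illustration, not the exact procedure of the cited papers. All function names and the toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(tags_by_resource):
    """Tag-to-tag network: edge weight = number of resources two tags share."""
    weights = defaultdict(dict)
    for tags in tags_by_resource.values():
        for a, b in combinations(sorted(tags), 2):
            weights[a][b] = weights[a].get(b, 0) + 1
            weights[b][a] = weights[b].get(a, 0) + 1
    return weights


def induce_hierarchy(tags_by_resource, root="ROOT"):
    """Return a child -> parent map built from degree centrality."""
    graph = cooccurrence_graph(tags_by_resource)
    all_tags = {t for tags in tags_by_resource.values() for t in tags}
    # Generality proxy (assumption): degree in the tag-to-tag network,
    # with ties broken alphabetically so the result is deterministic.
    order = sorted(all_tags, key=lambda t: (-len(graph.get(t, {})), t))
    parent, placed = {}, []
    for tag in order:
        best, best_weight = root, 0
        for candidate in placed:
            weight = graph.get(tag, {}).get(candidate, 0)
            if weight > best_weight:
                best, best_weight = candidate, weight
        parent[tag] = best
        placed.append(tag)
    return parent


# Toy data, invented for illustration only.
toy = {
    "r1": {"programming", "python"},
    "r2": {"programming", "java"},
    "r3": {"programming", "python", "django"},
    "r4": {"python", "flask"},
    "r5": {"python", "django"},
}
hierarchy = induce_hierarchy(toy)
```

On the toy data, the most central tag ends up directly under the artificial root and more specific tags attach below the tags they co-occur with most.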
8. Graz University of Technology
Problem: How can we evaluate the
usefulness of these hierarchies?
Idea: Gold-standard-based methods
Problem: Lack of a gold standard [Strohmaier et al. 2012]
Little taxonomic overlap => results are not trustworthy
Very small overlap!
M. Strohmaier, D. Helic, D. Benz, C. Körner and R. Kern, Evaluation of Folksonomy Induction Algorithms, ACM Transactions on Intelligent Systems and Technology
9. Graz University of Technology
Question?
Can we somehow find another evaluation method?
10. Graz University of Technology
Stanley Milgram (1933-1984)
A social psychologist
Yale and Harvard University
Study on the Small World Problem,
beyond well defined communities
and relations
(such as actors, scientists, …)
"An Experimental Study of the Small World Problem"
11. Graz University of Technology
The simplest way of formulating the small-world problem is:
Starting with any two people in the world, what is the
likelihood that they will know each other?
A somewhat more sophisticated formulation, however, takes
account of the fact that while person X and Z may not know
each other directly, they may share a mutual acquaintance -
that is, a person who knows both of them. One can then think of
an acquaintance chain with X knowing Y and Y knowing Z.
Moreover, one can imagine circumstances in which X is linked
to Z not by a single link, but by a series of links, X-A-B-C-D…Y-
Z. That is to say, person X knows person A who in turn knows
person B, who knows C… who knows Y, who knows Z.
[Milgram 1967, according to
http://www.ils.unc.edu/dpr/port/socialnetworking/theory_paper.html#2]
12. Graz University of Technology
An Experimental Study of the Small World
Problem [Travers and Milgram 1969]
A Social Network Experiment tailored towards
Demonstrating
Defining
And measuring
Inter-connectedness in a large society (USA)
A test of the modern idea of “six degrees of
separation”
Which states that: every person on earth is
connected to any other person through a chain of
acquaintances not longer than 6
13. Graz University of Technology
Set Up
Target person: a Boston stockbroker
Three starting populations
100 "Nebraska stockholders"
96 "Nebraska random"
100 "Boston random"
14. Graz University of Technology
Results
How many of the starters would be able to establish
contact with the target?
64 out of 296 reached the target
How many intermediaries would be required to link
starters with the target?
Well, that depends: the overall mean is 5.2 links
Through hometown: 6.1 links
Through business: 4.6 links
Boston group faster than Nebraska groups
Nebraska stockholders not faster than Nebraska random
What form would the distribution of chain lengths
take?
15. Graz University of Technology
Decentralized Search
Search in (social) networks: people have only local
knowledge of the network
People have background knowledge of the network, e.g.
geography
Background knowledge defines the notion of distance
between nodes
People are greedy: at each step people select a node that
has the smallest distance to the target
Kleinberg explained the process of navigating a network and
finding others with only local knowledge
Decentralized search with hierarchical background
knowledge [Kleinberg 2000]
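The greedy procedure described above can be sketched as follows. This is an illustrative sketch, assuming the hierarchy is given as a child-to-parent map, that distance is the number of hierarchy edges between two nodes, and that the searcher never revisits a node; the function names, the hop limit and the toy data are hypothetical.

```python
def path_to_root(node, parent):
    """Nodes from `node` up to the root of the hierarchy, inclusive."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def tree_distance(a, b, parent):
    """Hierarchy edges between a and b via their lowest common ancestor."""
    steps_from_a = {n: i for i, n in enumerate(path_to_root(a, parent))}
    for j, n in enumerate(path_to_root(b, parent)):
        if n in steps_from_a:
            return steps_from_a[n] + j
    return float("inf")  # the two nodes lie in different trees


def greedy_search(network, parent, start, target, max_hops=20):
    """Return the visited path if the target is reached, else None."""
    path, visited, current = [start], {start}, start
    while current != target and len(path) <= max_hops:
        # Local knowledge only: the searcher sees the current node's links.
        candidates = [n for n in network.get(current, ()) if n not in visited]
        if not candidates:
            return None
        # Greedy step: move to the neighbour closest to the target
        # according to the background hierarchy (ties broken by name).
        current = min(candidates,
                      key=lambda n: (tree_distance(n, target, parent), n))
        path.append(current)
        visited.add(current)
    return path if current == target else None


# Toy hierarchy and link network, invented for illustration only.
parent = {"python": "programming", "java": "programming",
          "django": "python", "flask": "python"}
network = {
    "java": ["programming"],
    "programming": ["java", "python"],
    "python": ["programming", "django", "flask"],
    "django": ["python"],
    "flask": ["python"],
}
found = greedy_search(network, parent, "java", "django")
```

A search counts as successful when a path is returned, which is what a success-rate measurement would count.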
16. Graz University of Technology
Hierarchical decentralized searcher
Information
Network
Hierarchy
17. Graz University of Technology
Idea!
Use Kleinberg's model of decentralized search in social
networks and apply it to information networks.
18. Graz University of Technology
Framework
Hence, we implemented a framework that takes as input a given
hierarchy & network and determines the usefulness of this
hierarchy for navigating the network [Helic et al. 2011].
[Diagram: Hierarchy and Network are given to the Framework (Hierarchical Decentralized Searcher), which answers: Useful? Yes/No]
D. Helic, M. Strohmaier, C. Trattner, M. Muhr, K. Lerman, Pragmatic Evaluation of Folksonomies, 20th International World Wide Web Conference (WWW2011), Hyderabad, India, March 28 - April 1, ACM, 2011.
19. Graz University of Technology
Question?
To what extent are current tag hierarchy induction
algorithms useful for navigation?
20. Graz University of Technology
Evaluating Tag Hierarchy Induction
Algorithms
In [Helic et al. 2011] we used this kind of framework to
evaluate 5 different hierarchy induction algorithms on
5 different datasets (25 combinations)
BibSonomy
Delicious
CiteULike
Flickr
LastFM
Simulations were based on a random sample of
100,000 search pairs
Measuring the success rate and stretch for evaluation
21. Graz University of Technology
Evaluating Tag Hierarchy Induction
Algorithms
Results:
Centrality-based hierarchy induction
algorithms outperform complicated
methods such as K-Means or Affinity
Propagation
[Figure panels: BibSonomy, CiteULike, Delicious, Flickr, LastFM]
22. Graz University of Technology
Question
What are the differences and similarities of hierarchies
based on different types of annotations?
To what extent are hierarchies based on tags more useful for navigation
than hierarchies based on keywords?
23. Graz University of Technology
Tags
We
Keywords
24. Graz University of Technology
Results
Results:
Tag-based Hierarchies are more
useful for navigation than keyword-
based hierarchies
25. Graz University of Technology
Question???
To what extent is it justified to model human navigation
in information networks with hierarchical
decentralized search?
26. Graz University of Technology
Idea?
Compare Simulations with real world data!
Exploring the Differences and Similarities between Hierarchical Decentralized
Search and Human Navigation in Information Networks
27. Graz University of Technology
Evaluation
We compared simulations with
human click trails from the online game
The Wiki Game (http://thewikigame.com/)
The dataset contains 1,500,000
click trails of more
than 500,000 users with
(start; target) information.
28. Graz University of Technology
Hierarchy Creation
Two types of hierarchies were evaluated
1.) First type is based on our previous work
Categorial Concepts:
Tags from Delicious
Category labels from Wikipedia
Wikipedia Category Label Dataset:
2,300,000 category labels, 4,500,000 articles, 30,000,000 category
label assignments
Delicious Tag Dataset:
440,000 tags, 580,000 articles and
3,400,000 tag assignments
[Diagram: Similarity Graph, Latent Hierarchical Taxonomy]
29. Graz University of Technology
Hierarchy Creation
2.) Second type is based on the work of [Muchnik et al. 2007]
Simple idea: The algorithm iterates through all
links in the network and decides whether each link is
of a hierarchical type; if so, it
remains in the network, otherwise it is
removed.
Directed link-network dataset of the
English-Wikipedia from February
2012.
All in all, the dataset includes
around 10,000,000 articles and
around 250,000,000 links
Muchnik, L., Itzhack, R., Solomon S. and Louzoun Y.: Self-emergence of knowledge trees: Extraction
of the Wikipedia hierarchies, PHYSICAL REVIEW E 76, 016106 (2007)
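The link-filtering idea on this slide can be sketched as follows. The decision rule used here (keep a link only when its target has strictly more incoming links than its source) is a placeholder assumption chosen for illustration, not the criterion from [Muchnik et al. 2007]; the function names and the toy links are hypothetical as well.

```python
from collections import Counter


def in_degrees(links):
    """Count incoming links per node from (source, target) pairs."""
    return Counter(target for _, target in links)


def extract_hierarchy(links):
    """Iterate through all links and keep only those judged hierarchical."""
    indeg = in_degrees(links)
    kept = []
    for source, target in links:
        # Placeholder rule (assumption): a link is treated as hierarchical
        # when it points to a node with more incoming links than its source.
        if indeg[target] > indeg[source]:
            kept.append((source, target))
    return kept


# Toy directed link network, invented for illustration only.
links = [
    ("Python", "Programming language"),
    ("Java", "Programming language"),
    ("Programming language", "Python"),
    ("Python", "Java"),
]
hierarchy_links = extract_hierarchy(links)
```

Swapping in a different rule only requires changing the condition inside the loop; the iterate-and-filter structure stays the same.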
30. Graz University of Technology
Evaluation Metrics
Success Rate: Percentage of target nodes found
Number of Hops: Number of hops needed to reach the target
node
Stretch: Ratio of the number of steps taken to the length of the global
shortest path
Path Similarity: intersection(h_clicks,s_clicks)/s_clicks
Degree: median in- and out-degree values of the nodes visited
by the simulator and the human navigator
Transition Similarity
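The metrics defined above could be computed along these lines. This is a sketch under stated assumptions: a failed search is represented as None, paths are lists of visited nodes, and path similarity is computed on the sets of nodes visited by the human (h_clicks) and the simulator (s_clicks). The function names and the sample values are hypothetical.

```python
from statistics import median


def success_rate(results):
    """Percentage of searches that reached the target (path is not None)."""
    return 100.0 * sum(path is not None for path in results) / len(results)


def hops(path):
    """Number of hops needed to reach the target node."""
    return len(path) - 1


def stretch(path, shortest_path_length):
    """Ratio of steps taken to the length of the global shortest path."""
    return hops(path) / shortest_path_length


def path_similarity(h_clicks, s_clicks):
    """intersection(h_clicks, s_clicks) / s_clicks, on sets of visited nodes."""
    simulated = set(s_clicks)
    return len(set(h_clicks) & simulated) / len(simulated)


def median_degree(path, degree):
    """Median degree of the nodes visited along a path."""
    return median(degree[node] for node in path)


# Toy trails and degrees, invented for illustration only.
human = ["java", "programming", "python", "django"]
simulated = ["java", "flask", "python", "django"]
out_degree = {"java": 2, "flask": 1, "python": 4, "django": 1}

rate = success_rate([human, None, simulated, None])
human_stretch = stretch(human, shortest_path_length=3)
similarity = path_similarity(human, simulated)
typical_degree = median_degree(simulated, out_degree)
```

A stretch of 1.0 means the navigator matched the global shortest path; larger values mean detours.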
31. Graz University of Technology
What are the results??
32. Graz University of Technology
Results: Hops, Stretch, Success Rate
Humans: Success Rate 100%, Stretch 2.5
Searcher with Wikipedia Category Hierarchy: Success Rate 31.6%, Stretch 1.7
33. Graz University of Technology
Results: Hops, Stretch, Success Rate
Humans: Success Rate 100%, Stretch 2.5
Searcher with Wikipedia Delicious Hierarchy: Success Rate 69%, Stretch 8.8
34. Graz University of Technology
Results: Hops, Stretch, Success Rate
Humans: Success Rate 100%, Stretch 2.5
Searcher with Wikipedia Network Hierarchy: Success Rate 93%, Stretch 1.5
35. Graz University of Technology
Results: Path Similarity
Question: How similar are the paths taken by our searcher to those
taken by humans?
[Figure panels: Humans vs. Humans, Humans vs. Simulators]
36. Graz University of Technology
Results: Degree
[Figure panels: In-Degree, Out-Degree]
37. Graz University of Technology
Results: Transition Similarity
[Figure panels: Humans, Searcher]
38. Graz University of Technology
Conclusions
We have shown that our approach of hierarchical
decentralized search models human navigation in
information networks fairly well
Furthermore, we have shown that hierarchies created
directly from the link network are better suited for
navigation than hierarchies that are created from
external knowledge
39. Graz University of Technology
What do we plan for the future?
Enhance the framework to consider not only
navigation but also search (= search box)
Evaluation of alternative navigational structures
and many more things
40. Graz University of Technology
Take home message
Network hierarchies are better suited for
navigation than hierarchies created from
external knowledge
Thank you!
Christoph Trattner: ctrattner@iicm.edu, www.christophtrattner.info, @ctrattner
Philipp Singer: philipp.singer@tugraz.at, www.philippsinger.info, @ph_singer
Denis Helic: dhelic@tugraz.at, http://coronet.iicm.edu/denis/homepage/, @dhelic
Markus Strohmaier: markus.strohmaier@tugraz.at, www.markusstrohmaier.info, @mstrohm