Presentation at the 15th International Society of Scientometrics and Informetrics Conference, 29 June to 4 July, Istanbul, Turkey (http://www.issi2015.org).
Contextualization of topics - browsing through terms, authors, journals and cluster allocations
1. Introduction Method Results Summary
Contextualization of Topics
browsing through terms, authors, journals and cluster allocations
Rob Koopman1 Shenghui Wang1 Andrea Scharnhorst2
1OCLC Research 2DANS-KNAW
ISSI 2015
2. Introduction Method Results Summary
Introduction
What are essence and boundary of a scientific field?
Different ways to find clusters in scientific literature based on
connectivity in terms of authorship, citations, language
similarity, etc.
Ambiguous nature in science
3. Introduction Method Results Summary
Ariadne: interactive context explorer
Ariadne is an interactive interface which allows users to
explore the context of entities such as authors, journals,
topical terms, etc.
It builds on semantic indexing statistically computed from a
large scale bibliographic corpus
It was originally implemented to explore 1M topical terms, 3M
authors, 35K journals and 700+ Dewey decimal classes
associated with 65M articles.
4. Introduction Method Results Summary
Research questions
Q1: How does the Ariadne algorithm work on a much smaller,
field specific dataset?
Q2: Can we use Ariadne to label the clusters produce by the
different methods?
Q3: Can we use Ariadne to compare different clustering
solutions?
5. Introduction Method Results Summary
LittleAriadne
LittleAriadne: context explorer over Astrophysics data
Offline: generates a semantic representation for each entity
Online: finds the most related entities and using
multidimensional scaling to display
6. Introduction Method Results Summary
LittleAriadne
An example article
Article ID ISI:000276828000006
Title On the Mass Transfer Rate in SS Cyg
Abstract The mass transfer rate in SS Cyg at quiescence, estimated
from the observed luminosity of the hot spot, is log M-tr
= 16.8 +/- 0.3. This is safely below the critical mass
transfer rates of log M-crit = 18.1 (corresponding to log
T-crit(0) = 3.88) or log M-crit = 17.2 (corresponding to
the “revised” value of log T-crit(0) = 3.65). The mass
transfer rate during outbursts is strongly enhanced
Author [author:smak j]
ISSN [issn:0001-5237]
Subject [subject:accretion, accretion disks] [subject:cataclysmic
variables] [subject:disc instability model] [subject:dwarf novae]
[subject:novae, cataclysmic variables] [subject:outbursts]
[subject:parameters] [subject:stars] [subject:stars dwarf novae]
[subject:stars individual ss cyg] [subject:state] [subject:
superoutbursts]
7. Introduction Method Results Summary
LittleAriadne
An example article
Article ID ISI:000276828000006
Title On the Mass Transfer Rate in SS Cyg
Abstract The mass transfer rate in SS Cyg at quiescence, estimated
from the observed luminosity of the hot spot, is log M-tr
= 16.8 +/- 0.3. This is safely below the critical mass
transfer rates of log M-crit = 18.1 (corresponding to log
T-crit(0) = 3.88) or log M-crit = 17.2 (corresponding to
the “revised” value of log T-crit(0) = 3.65). The mass
transfer rate during outbursts is strongly enhanced
Author [author:smak j]
ISSN [issn:0001-5237]
Subject [subject:accretion, accretion disks] [subject:cataclysmic
variables] [subject:disc instability model] [subject:dwarf novae]
[subject:novae, cataclysmic variables] [subject:outbursts]
[subject:parameters] [subject:stars] [subject:stars dwarf novae]
[subject:stars individual ss cyg] [subject:state] [subject:
superoutbursts]
Cluster label [cluster:a 19] [cluster:b 16] [cluster:c 15]
[cluster:d 51] [cluster:e 17] [cluster:f 1]
8. Introduction Method Results Summary
LittleAriadne
Six different clustering solutions
x Source y=#Cluster #Cluster in LittleAriadne
a cwts 1.8 23 23
b UMSI 23 23
c oclc 20 20 20
d hu 139 48
e sts 5664 229
f ECOOM 15 15
9. Introduction Method Results Summary
LittleAriadne
Entities in the Astrophysis dataset
There are in total 90,343 entities associated with 111,616
astrophysics articles
59 journals
27,027 author names (no disambiguation applied)
39,577 topical terms
23,322 subjects (extracted from ”Author Keywords” and
”Keywords Plus”)
358 cluster labels (source + cluster id)
10. Introduction Method Results Summary
LittleAriadne
Build semantic representation
Basic assumptions
Entities can be represented by its context
Entities which share more context are more likely to be related
Context is the textual environment where an entity occurs
11. Introduction Method Results Summary
LittleAriadne
An example article
Article ID ISI:000276828000006
Title On the Mass Transfer Rate in SS Cyg
Abstract The mass transfer rate in SS Cyg at quiescence, estimated
from the observed luminosity of the hot spot, is log M-tr
= 16.8 +/- 0.3. This is safely below the critical mass
transfer rates of log M-crit = 18.1 (corresponding to log
T-crit(0) = 3.88) or log M-crit = 17.2 (corresponding to
the “revised” value of log T-crit(0) = 3.65). The mass
transfer rate during outbursts is strongly enhanced
Author [author:smak j]
ISSN [issn:0001-5237]
Subject [subject:accretion, accretion disks] [subject:cataclysmic
variables] [subject:disc instability model] [subject:dwarf novae]
[subject:novae, cataclysmic variables] [subject:outbursts]
[subject:parameters] [subject:stars] [subject:stars dwarf novae]
[subject:stars individual ss cyg] [subject:state] [subject:
superoutbursts]
Cluster label [cluster:a 19] [cluster:b 16] [cluster:c 15]
[cluster:d 51] [cluster:e 17] [cluster:f 1]
12. Introduction Method Results Summary
LittleAriadne
Dimension reduction using Random Projection
mass
transfer
rate
[subject:outburst]
[subject:sstars]
[subject:parameters]
[author:smak j]
[cluster: a19]
[issn:0001-5237]
13. Introduction Method Results Summary
LittleAriadne
Dimension reduction using Random Projection
mass
transfer
rate
[subject:outburst]
[subject:sstars]
[subject:parameters]
[author:smak j]
[cluster: a19]
[issn:0001-5237]
14. Introduction Method Results Summary
LittleAriadne
From semantic representation to visualisation and more
Each entity has its semantic representation
Cosine similarity between entities can be computed very fast,
based on which the 2D visualisation is implemented
15. Introduction Method Results Summary
LittleAriadne
From semantic representation to visualisation and more
Each entity has its semantic representation
Cosine similarity between entities can be computed very fast,
based on which the 2D visualisation is implemented
For each article, we collected the semantic representation of
all the entities in which it involves, and take an average as its
semantic representation
We applied a standard K-means clustering method to cluster
these articles based on their semantic representations
16. Introduction Method Results Summary
Experiment 1: Exploring context
Experiment 1: Exploring context
Now we can explore
Let’s start with stars
An overview of all journals
24. Introduction Method Results Summary
Experiment 3: Comparing clustering solutions
An overview of all clustering solutions
25. Introduction Method Results Summary
Summary
Summary
We present a method and an interface that allows visual
exploration through the contexts of entities
We can provide the most related topical terms to clusters
although expert knowledge is needed to transform them into
real labels/topics
LittleAriadne provides a visual way of comparing different
clustering solutions
Our na¨ıve way of clustering is worth exploring further
26. Introduction Method Results Summary
Future extensions
Future extensions
Add more types of entities, such as citations, publishers,
conferences, etc, to provide richer context
Add direct links to articles to answer information retrieval
needs
Study context sensitivity
compare ”young” and ”young”
27. Introduction Method Results Summary
Thank you
Thank you
http://thoth.pica.nl/astro/relate
Rob Koopman (rob.koopman@oclc.org)
Shenghui Wang (shenghui.wang@oclc.org)
Andrea Scharnhorst (andrea.scharnhorst@dans.knaw.nl)