Bibliometric-enhanced Retrieval Models for Big Scholarly Information Systems
1. Bibliometric-enhanced Retrieval Models for Big Scholarly Information Systems
philipp.mayr@gesis.org
Workshop on Scholarly Big Data: Challenges and Ideas. IEEE BigData 2013
3. Intro
• What are bibliometric-enhanced IR models?
– a set of methods to quantitatively analyze scientific and technological literature
– e.g. citation analysis (h-index)
– CiteSeer was a pioneering bibliometric-enhanced IR system
4. Background
• DFG-funded (2009-2013): projects IRM I and IRM II
– IRM = Information Retrieval Mehrwertdienste (value-added IR services)
• Goal: implementation and evaluation of value-added IR services for digital library systems
• Main idea: applying scholarly (science) models for IR
– Co-occurrence analysis of controlled vocabularies (thesauri)
– Bibliometric analysis of core journals (Bradford's law)
– Centrality in author networks (betweenness)
• In IRM I we concentrated on the basic evaluation
• In IRM II we concentrate on the implementation of reusable (web) services
http://www.gesis.org/en/research/external-funding-projects/archive/irm/
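The author-centrality idea from the list above can be sketched in plain Python. This is a minimal, unoptimized Brandes-style betweenness computation on an invented toy co-author graph (the author names and graph are illustrative, not project data):

```python
from collections import deque

def betweenness(graph):
    """Unnormalized betweenness centrality (Brandes' algorithm)
    for an undirected graph given as {node: [neighbours]}."""
    bc = {v: 0.0 for v in graph}
    for s in graph:
        stack, queue = [], deque([s])
        pred = {v: [] for v in graph}                  # shortest-path predecessors
        sigma = dict.fromkeys(graph, 0); sigma[s] = 1  # shortest-path counts
        dist = dict.fromkeys(graph, -1); dist[s] = 0
        while queue:                                   # BFS from source s
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = dict.fromkeys(graph, 0.0)              # dependency accumulation
        while stack:
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: c / 2 for v, c in bc.items()}           # undirected: halve

# toy co-author network: Carol bridges two groups
coauthors = {
    "Alice": ["Bob"],
    "Bob": ["Alice", "Carol"],
    "Carol": ["Bob", "Dan", "Eve"],
    "Dan": ["Carol"],
    "Eve": ["Carol"],
}
ranking = sorted(coauthors, key=betweenness(coauthors).get, reverse=True)
```

Ranking authors by this score puts the "bridging" author (Carol) first, which is the structural signal the centrality service exploits.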
5. Search Term Recommender (Petras 2006)
• Search Term Service: recommending strongly associated terms from a controlled vocabulary
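One way to read this service: vocabulary terms that frequently co-occur with the query term in document indexings get recommended. A minimal sketch follows; the toy corpus and the Jaccard association measure are my assumptions for illustration, not Petras' exact model:

```python
from collections import defaultdict
from itertools import combinations

# toy corpus: each document indexed with controlled-vocabulary terms
docs = [
    {"unemployment", "labour market", "social policy"},
    {"unemployment", "labour market"},
    {"unemployment", "social policy"},
    {"migration", "social policy"},
]

df = defaultdict(int)   # document frequency per term
co = defaultdict(int)   # co-occurrence count per unordered term pair
for terms in docs:
    for t in terms:
        df[t] += 1
    for a, b in combinations(sorted(terms), 2):
        co[(a, b)] += 1

def recommend(term, k=3):
    """Rank vocabulary terms by Jaccard association with `term`."""
    scores = {}
    for (a, b), n in co.items():
        if term in (a, b):
            other = b if a == term else a
            scores[other] = n / (df[term] + df[other] - n)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

For the query term "unemployment" the sketch ranks "labour market" above "social policy", because the former co-occurs in a larger share of the documents involving either term.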
6. Bradfordizing (White 1981, Mayr 2009)
• Bradford's Law of Scattering (Bradford 1948), idealized example for 450 articles:
– Nucleus/core: 150 papers in 3 journals
– Zone 2: 150 papers in 9 journals
– Zone 3: 150 papers in 27 journals
• Ranking by Bradfordizing: sorting the core journal papers / core books to the top
• Applied to monographs (analogous to a bradfordized list of journals in informetrics): the publisher serves as the sorting criterion
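The re-ranking step itself fits in a few lines: count how often each journal occurs in the result set, then move papers from the most productive (core) journals to the top, keeping the original rank as tie-breaker. A sketch with invented toy data (field names and titles are illustrative):

```python
from collections import Counter

def bradfordize(result_set):
    """Re-rank a result set: papers from the most productive journals
    (the Bradford core of this result set) move to the top; the
    original rank breaks ties within a journal-frequency class."""
    freq = Counter(doc["journal"] for doc in result_set)
    return sorted(result_set,
                  key=lambda d: (-freq[d["journal"]], d["rank"]))

hits = [
    {"rank": 1, "title": "a", "journal": "J. Rare Studies"},
    {"rank": 2, "title": "b", "journal": "Scientometrics"},
    {"rank": 3, "title": "c", "journal": "Scientometrics"},
    {"rank": 4, "title": "d", "journal": "J. Other"},
]
reranked = bradfordize(hits)
```

For monographs the same sketch applies with a `publisher` field in place of `journal`, per the sorting criterion named above.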
8. Scenarios for combined ranking services
• Iterative use: Result Set → Core Journal Papers → Central Author Papers → Relevant Papers
• Simultaneous use: Result Set → Central Author Papers + Core Journal Papers → Relevant Papers
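One plausible reading of the two combination scenarios in code: simultaneous use fuses both service scores into one ranking, iterative use ranks by one service and uses the other as tie-breaker. The score dictionaries and the weighting scheme are my assumptions, not the project's implementation:

```python
def simultaneous(core_score, author_score, w=0.5):
    """Fuse both services into one ranking via a weighted score sum."""
    docs = core_score.keys() | author_score.keys()
    fused = {d: w * core_score.get(d, 0.0)
                + (1 - w) * author_score.get(d, 0.0) for d in docs}
    return sorted(fused, key=fused.get, reverse=True)

def iterative(core_score, author_score):
    """Rank by the first service; the second service breaks ties."""
    docs = core_score.keys() | author_score.keys()
    return sorted(docs,
                  key=lambda d: (core_score.get(d, 0.0),
                                 author_score.get(d, 0.0)),
                  reverse=True)

# hypothetical per-document scores from the two services
core = {"p1": 1.0, "p2": 1.0, "p3": 0.0}
author = {"p1": 0.0, "p2": 0.5, "p3": 1.0}
```

Both orderings agree that p2 (strong in both services) comes first, but they disagree on how much weight a single strong signal like p3's author score carries.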
11. Main Research Issue: contribution to retrieval quality and usability
• Precision:
– Do central authors (core journals) provide more relevant hits?
– Do highly associated cowords have any positive effects?
• Value-adding effects:
– Do central authors (core journals) provide OTHER relevant hits?
– Do coword-relationships provide OTHER relevant search terms?
• Mashup effects:
– Do combinations of the services enhance the effects?
13. Evaluation of Bradfordizing on CLEF Data (Mayr 2013)
Precision between Bradford zones (core, zone 2, zone 3):

                   core   z2     z3
2003 articles      0.29   0.22   0.16
2004 articles      0.23   0.18   0.13
2005 articles      0.31   0.24   0.17
2006 articles      0.29   0.27   0.24
2007 articles      0.28   0.26   0.22
2005 monographs    0.21   0.16   0.19
2006 monographs    0.28   0.28   0.24
2007 monographs    0.24   0.21   0.23

• Journal articles: significant improvement of precision from zone 3 to the core
• Monographs: only a slight improvement in the precision distribution between the three zones
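The per-zone precision values in the table above are straightforward to compute once each judged document carries a zone assignment. A sketch with invented toy judgments (not the CLEF data):

```python
from collections import defaultdict

def zone_precision(judged_docs):
    """Fraction of relevant documents per Bradford zone."""
    relevant, total = defaultdict(int), defaultdict(int)
    for doc in judged_docs:
        total[doc["zone"]] += 1
        relevant[doc["zone"]] += int(doc["relevant"])
    return {z: relevant[z] / total[z] for z in total}

# toy relevance judgments with zone labels
judged = [
    {"zone": "core", "relevant": True},
    {"zone": "core", "relevant": True},
    {"zone": "core", "relevant": False},
    {"zone": "z2",   "relevant": True},
    {"zone": "z2",   "relevant": False},
    {"zone": "z3",   "relevant": False},
]
```

Here the toy core zone reaches a precision of 2/3 while zone 3 yields 0, mirroring the core-over-zone-3 pattern the slide reports for journal articles.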
14. Evaluation of Author Centrality on CLEF Data
• Moderate positive relationship between rate of networking and precision
(scatter plot: precision vs. giant component size; correlation precision@10 – giant size: 0.25)
• Precision of TF-IDF rankings (0.60) significantly higher than author-centrality-based rankings (0.31) – BUT:
• Very little overlap of documents at the top of the ranking lists: 90% of relevant hits provided by author centrality did not appear at the top of TF-IDF rankings
→ added precision of 28%
• Author centrality seems to favor OTHER relevant documents than traditional rankings
• Value-adding effect: another view of the information space

avg number of docs        517
avg number of authors     664
avg number of co-authors  302
avg giant size             24
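The value-adding overlap measure reported above can be sketched as: among the relevant hits in the alternative ranking's top-k, what share is absent from the baseline's top-k? The document IDs, lists, and k are illustrative, not the evaluation data:

```python
def added_share(topk_baseline, topk_alt, relevant):
    """Share of relevant docs in the alternative top-k that do NOT
    appear in the baseline top-k (the value-adding effect)."""
    rel_alt = set(topk_alt) & relevant
    novel = rel_alt - set(topk_baseline)
    return len(novel) / len(rel_alt) if rel_alt else 0.0

# toy top-3 lists from a TF-IDF baseline and a centrality ranking
tfidf_top = ["d1", "d2", "d3"]
centrality_top = ["d4", "d5", "d1"]
relevant = {"d1", "d4", "d5"}
share = added_share(tfidf_top, centrality_top, relevant)
```

In the toy example two of the three relevant centrality hits are new relative to the baseline, the same kind of "other relevant documents" effect the slide quantifies at 90%.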
21. IRM & Modeling Science
• Measuring the contribution of bibliometric-enhanced services to retrieval quality
• → deeper insights into the structure & functioning of science
• Bibliometric-enhanced services (structural attributes of the science system)
• → a way towards a formal model of science
22. References
• Mutschke, P., Mayr, P., Schaer, P., & Sure, Y. (2011). Science models as value-added services for scholarly information systems. Scientometrics, 89(1), 349–364. doi:10.1007/s11192-011-0430-x
• Lüke, T., Schaer, P., & Mayr, P. (2013). A framework for specific term recommendation systems. In Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval - SIGIR '13 (pp. 1093–1094). New York, NY, USA: ACM Press. doi:10.1145/2484028.2484207
• Mayr, P. (2013). Relevance distributions across Bradford Zones: Can Bradfordizing improve search? In J. Gorraiz, E. Schiebel, C. Gumpenberger, M. Hörlesberger, & H. Moed (Eds.), 14th International Society of Scientometrics and Informetrics Conference (pp. 1493–1505). Vienna, Austria. Retrieved from http://arxiv.org/abs/1305.0357
• Hienert, D., Schaer, P., Schaible, J., & Mayr, P. (2011). A novel combined term suggestion service for domain-specific digital libraries. In S. Gradmann, F. Borri, C. Meghini, & H. Schuldt (Eds.), International Conference on Theory and Practice of Digital Libraries (TPDL) (pp. 192–203). Berlin: Springer. doi:10.1007/978-3-642-24469-8_21