Automatic creation of mappings between
classification systems for bibliographic data
Prof. Magnus Pfeffer
Stuttgart Media University
pfeffer@hdm-stuttgart.de
Agenda


Motivation



Instance-based matching



Application to bibliographic data



Evaluation



Ongoing projects



RDF Representation

November 26th, 2013

Semantic Web in Libraries, Hamburg

Motivation

Current situation in Germany


Five regional library unions
  Subject headings
    Predominantly RSWK („Regeln für den Schlagwortkatalog“ - „Rules for the subject catalogue“) using a shared authority file
  Classification systems
    RVK (Regensburg Union Classification)
    BK (Basic Classification)
    DDC (Dewey Decimal Classification)
    Various local classification systems
  Low proportion of indexed titles (25-30%)

Current situation in Germany


National library
  Subject headings
    Predominantly RSWK („Regeln für den Schlagwortkatalog“ - „Rules for the subject catalogue“) using a shared authority file
  Classification systems
    DDC (Dewey Decimal Classification)
    Coarse categories
  DDC only for titles published since 2007
  Only „Reihe A“ (print trade publications) is fully indexed with RSWK

Austrian National Library


Subject headings




Predominantly RSWK („Regeln für den
Schlagwortkatalog“ - „Rules for the subject catalogue“)
using a shared authority file

Classification systems


BK since 2007



RVK in the Austrian library union catalogue

Goals


Re-use existing indexing information
  National level
    BK is used mainly in northern Germany / Austria
    RVK mainly in southern Germany
    DDC mainly by the National Library
  International level
    Make RVK data more accessible to DDC users
    Use DDC indexing information available from e.g. the Library of Congress

Ideas


Use of appropriate classification systems
  Faceted search in resource discovery systems
    Should be monohierarchical
    Should have a limited number of classes
    → DDC (first digits) or BK
  Browsing of similar titles
    Should be fine-grained
    → DDC (full) or RVK
  (Multi-lingual retrieval)

Ideas


Enable the use of existing tools and visualisations

[Figures: example visualisations from Denton (2012) and Legrady (2005)]
Instance-based Matching

Ontology matching


Well-studied problem in computer science



Several approaches


Based on the descriptors



Based on the structure



Based on the manifestations (instances)

Instances


Entries in catalogues with multiple classifications

Instance-based matching


Assumptions





Classes with semantic overlap co-occur in instances
The more often these classes co-occur, the stronger
the overlap

Preparation


Extraction of all pairs of classifications from the data



Count of the extracted pairs

Example


Entry 1
  DDC: 179.9
  RVK: CC 7200
  RVK: CC 7250

Entry 2
  DDC: 179.9
  RVK: CC 7200

Pairs
  179.9 / CC 7200
  179.9 / CC 7250
  179.9 / CC 7200
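
The pair extraction and counting for this example can be sketched in a few lines of Python. This is only an illustration, not the Perl prototype described later in the talk: the record layout (a dict with `ddc` and `rvk` lists) and the function name `count_pairs` are assumptions made here.

```python
from collections import Counter
from itertools import product

# Hypothetical in-memory records mirroring the two entries of the example.
entries = [
    {"ddc": ["179.9"], "rvk": ["CC 7200", "CC 7250"]},  # Entry 1
    {"ddc": ["179.9"], "rvk": ["CC 7200"]},             # Entry 2
]

def count_pairs(entries, system_a="ddc", system_b="rvk"):
    """Count how many entries carry each (class_a, class_b) pair."""
    counts = Counter()
    for entry in entries:
        # Sets guard against a class that is listed twice in one record.
        classes_a = set(entry.get(system_a, []))
        classes_b = set(entry.get(system_b, []))
        # Every combination of an A class and a B class forms one pair.
        counts.update(product(classes_a, classes_b))
    return counts

pair_counts = count_pairs(entries)
for (ddc, rvk), n in sorted(pair_counts.items()):
    print(f"{ddc} / {rvk}: {n}")
```

Entries that lack one of the two classifications contribute no pairs, which is why only multiply classified entries count as instances.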

Normalisation


Comparing solely absolute numbers is bad





Some classes are more often used than others
Number of pairs correlates with the number of entries
that are classified using a given class

Instead:
Use proportion of co-occurrence ↔ occurrence
$$\frac{|E_{c_1} \cap E_{c_2}|}{|E_{c_1} \cup E_{c_2}|}$$
number of entries with both classifications
divided by
number of entries with either classification
(Jaccard measure for overlap of sets)
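
As a minimal sketch of this normalisation, the Jaccard measure can be computed directly on sets of entry identifiers. The identifiers and the `entries_by_class` lookup below are made up for illustration; they reuse the classes from the earlier example.

```python
def jaccard(entries_c1, entries_c2):
    """Jaccard overlap of two sets of entry identifiers."""
    union = entries_c1 | entries_c2
    if not union:
        return 0.0  # neither class is used, so no overlap can be measured
    return len(entries_c1 & entries_c2) / len(union)

# Hypothetical lookup: which entries are classified with which class.
entries_by_class = {
    "179.9": {1, 2},
    "CC 7200": {1, 2},
    "CC 7250": {1},
}

strong = jaccard(entries_by_class["179.9"], entries_by_class["CC 7200"])
weak = jaccard(entries_by_class["179.9"], entries_by_class["CC 7250"])
print(strong, weak)
```

Here 179.9 and CC 7200 share both entries, while 179.9 and CC 7250 share only one of two, so the second pair scores lower even though both pairs co-occur.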
Further interpretation


a and b are two classes from two classification systems A and B

The classes a and b only occur together
→ exact match

a only co-occurs with b, but b co-occurs with other classes from A
→ a is a narrower concept than b

a co-occurs with several classes from B (including b)
→ a is a wider concept than b

a and b do not co-occur
→ cannot infer that a and b are unrelated
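
These rules can be sketched as a small decision function over the set of observed pairs. The sketch makes two assumptions that are not on the slide: "only occur together" is read as "have no other co-occurrence partner in the other system", and the label `overlapping` is introduced for the case where both classes have several partners.

```python
def interpret(pairs, a, b):
    """Label the relation of class a (system A) to class b (system B)."""
    partners_of_a = {y for x, y in pairs if x == a}  # B classes seen with a
    partners_of_b = {x for x, y in pairs if y == b}  # A classes seen with b
    if b not in partners_of_a:
        return "unknown"      # no co-occurrence: nothing can be inferred
    if partners_of_a == {b} and partners_of_b == {a}:
        return "exact"
    if partners_of_a == {b}:
        return "narrower"     # b also co-occurs with other classes from A
    if partners_of_b == {a}:
        return "wider"        # a also co-occurs with other classes from B
    return "overlapping"      # assumed label, not taken from the slide

# Hypothetical observed pairs (class from A, class from B).
pairs = {
    ("179.9", "CC 7200"),
    ("179.9", "CC 7250"),
    ("x1", "y1"),
    ("x2", "y1"),
    ("p", "q"),
}

print(interpret(pairs, "179.9", "CC 7200"))
print(interpret(pairs, "x1", "y1"))
print(interpret(pairs, "p", "q"))
```

In practice one would presumably apply a threshold on counts or on the Jaccard value before treating a pair as co-occurring at all, since a single stray entry would otherwise change the label.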

Prior work


Pfeffer (2009)


Analysis of classification system structure and actual
use







Locating classes that describe the same concept
Finding ways to improve existing mappings to RVK
Focus on RVK, using data from library union catalogues
Co-occurrence analysis

Results






High co-occurrence and close in the hierarchy:
→ classes are hard to assign properly
High co-occurrence and far in the hierarchy:
→ classes describe identical concepts
Mappings from RSWK to RVK could be augmented

Related work


Isaac et al. (2007)


Applied instance-based matching to bibliographic data



Data from the National Library of the Netherlands



Mapping from a thesaurus to a classification system



Results



Generated mappings are quite good
More sophisticated measures than Jaccard do not lead to
better mappings

Application to bibliographic data

Bibliographic data is different


Multiple editions



Multiple document types

Bibliographic data


Skewed data





Multiple editions → More pairs
Some co-occurrences could appear stronger than others

Solution: Pre-clustering individual titles on the „work“
level


Increases the chance of instances with more than one
classification



Each cluster contributes only once



Allows using absolute co-occurrence numbers



Cut-off for small numbers
Ranking of competing matches
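
A rough sketch of how the cut-off and ranking could work once every work-level cluster contributes only once. The cluster layout, the function name `ranked_matches` and the default threshold `min_count=2` are assumptions for illustration; the talk does not state a concrete cut-off value.

```python
from collections import Counter
from itertools import product

# Hypothetical clusters: each holds the consolidated classes of one "work",
# regardless of how many editions were merged into it.
clusters = [
    {"ddc": {"179.9"}, "rvk": {"CC 7200", "CC 7250"}},
    {"ddc": {"179.9"}, "rvk": {"CC 7200"}},
    {"ddc": {"179.9"}, "rvk": {"CC 7200"}},
]

def ranked_matches(clusters, min_count=2):
    """Count each pair once per cluster, drop rare pairs, rank the rest."""
    counts = Counter()
    for cluster in clusters:
        counts.update(product(cluster["ddc"], cluster["rvk"]))
    kept = [(pair, n) for pair, n in counts.items() if n >= min_count]
    # Highest count first; ties are broken by the pair itself for stability.
    return sorted(kept, key=lambda item: (-item[1], item[0]))

print(ranked_matches(clusters))
print(ranked_matches(clusters, min_count=1))
```

With the cut-off at 2, the pair 179.9 / CC 7250 is dropped because it is supported by a single cluster only.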

Prior work


Pfeffer (2013)


Matching bibliographic records


Based on author, title and uniform title




(as well as information on title changes)

Matches any edition and revision of a work


Including translations



Merge match sets → Discrete clusters



Consolidating indexing information




For indexing purposes, the differences between editions and
revisions are irrelevant
Subject headings and classifications are shared between all
members of a cluster
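
The step from match sets to discrete clusters, followed by consolidation, might look roughly like the following. This is a sketch under assumptions, not the method of Pfeffer (2013) as implemented: the match keys are invented `author|title` strings, the record fields are hypothetical, and a union-find structure is used here as one common way to merge overlapping match sets.

```python
from collections import defaultdict

# Hypothetical records; "keys" stands for match keys built from author,
# title and uniform title. r2 links the English and German titles.
records = [
    {"id": "r1", "keys": {"doe|ethics"}, "ddc": {"179.9"}, "rvk": set()},
    {"id": "r2", "keys": {"doe|ethics", "doe|ethik"}, "ddc": set(), "rvk": {"CC 7200"}},
    {"id": "r3", "keys": {"doe|ethik"}, "ddc": set(), "rvk": {"CC 7250"}},
    {"id": "r4", "keys": {"roe|logic"}, "ddc": {"160"}, "rvk": set()},
]

def cluster_records(records):
    """Merge records that share a match key and pool their classifications."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    first_with_key = {}
    for r in records:
        for key in r["keys"]:
            if key in first_with_key:
                # Same key seen before: both records belong to one work.
                parent[find(r["id"])] = find(first_with_key[key])
            else:
                first_with_key[key] = r["id"]

    clusters = defaultdict(lambda: {"members": set(), "ddc": set(), "rvk": set()})
    for r in records:
        cluster = clusters[find(r["id"])]
        cluster["members"].add(r["id"])
        cluster["ddc"] |= r["ddc"]  # indexing is shared by all members
        cluster["rvk"] |= r["rvk"]
    return list(clusters.values())

clusters = cluster_records(records)
for cluster in clusters:
    print(sorted(cluster["members"]), sorted(cluster["ddc"]), sorted(cluster["rvk"]))
```

Note that r1 and r3 share no key directly; they end up in the same cluster only because r2 matches both, which is the effect of merging match sets.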

Evaluation

Comparison with existing mappings


Existing (partial) mappings can be used as a basis for
evaluation
→ „Gold standard“



Comparison of automatic and manual mapping





Recall: Are all the mappings found?
Precision: Are all found mappings correct?

Analysis of additional links


Maybe the gold standard can be improved?
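
The comparison against a gold standard can be sketched as plain set arithmetic. The class identifiers below are placeholders, not real BK or RVK notations, and the function name is an assumption.

```python
def precision_recall(found, gold):
    """Compare automatically found mappings with a manual gold standard."""
    found, gold = set(found), set(gold)
    correct = found & gold
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    additional = found - gold  # candidates for improving the gold standard
    return precision, recall, additional

# Placeholder mappings (source class, target class).
gold = {("A1", "B1"), ("A2", "B2"), ("A3", "B3"),
        ("A4", "B4"), ("A5", "B5"), ("A6", "B6")}
found = {("A1", "B1"), ("A2", "B2"), ("A3", "B3"), ("A7", "B7")}

precision, recall, additional = precision_recall(found, gold)
print(precision, recall, additional)
```

Because the existing mappings are only partial, a link in `additional` is not necessarily wrong; it may simply be missing from the gold standard, which is why those links deserve a separate manual look.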

Ongoing projects

Data


Bibliographic data



German National Library catalogue



Austrian National Library catalogue





German library union catalogues

British national bibliography

Gold standards


Partial mappings BK ↔ RVK

Interesting Mappings


RVK → BK



BK well suited for faceted retrieval





Gold standard exists
RVK has largest proportion of classified titles

RVK ↔ DDC




Enable data sharing between the German National Library
and the RVK-using libraries

Not limited to classification systems


See Pfeffer (2009) and Wang et al. (2009)

Implementation: Tasks


Import and mapping of MAB2 and MARC data



Clustering



Matching and clustering





Generation of keys for the match process
Consolidation of indexing and classification information

Statistics





Co-occurrence counts
Jaccard measure

Output


Full mappings

Implementation: State


All steps implemented as a prototype





Perl scripts
File-based data and indexes

Current development


Still Perl scripts (but better documented)



All data is accumulated in a document store




MongoDB

Further plan: Porting to the Metafacture framework
RDF representation

Classification systems as Linked Data


DDC has been published as Linked Data



RVK has not been published as Linked Data





There is no versioning and no stable identifiers
A project to fix this and to publish RVK as Linked Data has
been cancelled by the university library of Regensburg

BK has not been published as Linked Data


There is authority data in the GVK union catalogue

→ One would have to create temporary URIs for the
RVK and BK classes

Direct Mappings


SKOS offers


skos:mappingRelation



skos:closeMatch



skos:exactMatch



skos:broadMatch



skos:narrowMatch



skos:relatedMatch
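
As an illustration of how generated matches could be written out with these SKOS properties, the following sketch emits Turtle using plain string formatting. The relation labels and the function `to_turtle` are assumptions made here; the example URIs are the ones used later in the talk, and the RVK URI is a temporary placeholder, since RVK has no published Linked Data identifiers.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Hypothetical relation labels mapped to SKOS mapping properties.
# Note the direction: "A skos:broadMatch B" states that B is the broader
# concept, so it is the property to use when A is narrower than B.
RELATION_TO_SKOS = {
    "exact": "skos:exactMatch",
    "close": "skos:closeMatch",
    "has_broader": "skos:broadMatch",
    "has_narrower": "skos:narrowMatch",
    "related": "skos:relatedMatch",
}

def to_turtle(mappings):
    """Serialise (source URI, relation label, target URI) tuples as Turtle."""
    lines = [f"@prefix skos: <{SKOS}> .", ""]
    for source, relation, target in mappings:
        lines.append(f"<{source}> {RELATION_TO_SKOS[relation]} <{target}> .")
    return "\n".join(lines)

mappings = [
    ("http://dewey.info/class/641/", "exact", "http://ex.org/rvk/BF_8150"),
]
turtle = to_turtle(mappings)
print(turtle)
```

A dedicated RDF library would be the safer choice for real output, because it takes care of escaping and validation; string formatting is used here only to keep the sketch self-contained.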

Further Mappings


1:n-Relationships





List of classes that are all narrow matches
Or: A combination of classes is a (near) exact match

Qualified mappings


Express the confidence of the proposed match



Allow applications to optimize for precision or recall

Qualification through indirection


RDF relations cannot be qualified
http://dewey.info/class/641/ → skos:exactMatch → http://ex.org/rvk/BF_8150
Qualification through indirection


So an intermediate node is used
[Diagram: http://dewey.info/class/641/ → ex:qualifiedMatch → _1 → ex:targetConcept → http://ex.org/rvk/BF_8150; the intermediate node _1 carries further information such as ex:confidence “1.0”; the label skos:exactMatch also appears in the diagram]
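
One possible way to write such a qualified match is a Turtle blank node that plays the role of the intermediate node. The sketch below is an assumption throughout: the `ex:` namespace URI is a placeholder, `ex:matchType` is a property invented here to record the kind of match, and only `ex:qualifiedMatch`, `ex:targetConcept` and `ex:confidence` are taken from the slide.

```python
PREFIXES = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix ex: <http://ex.org/ns#> .\n"  # placeholder namespace
)

def qualified_match(source, target, confidence, match_type="skos:exactMatch"):
    """Return Turtle for a match that is qualified via an intermediate node."""
    return (
        f"<{source}> ex:qualifiedMatch [\n"
        f"    ex:targetConcept <{target}> ;\n"
        f"    ex:matchType {match_type} ;\n"  # hypothetical property
        f'    ex:confidence "{confidence:.1f}"\n'
        f"] ."
    )

statement = qualified_match(
    "http://dewey.info/class/641/", "http://ex.org/rvk/BF_8150", 1.0
)
print(PREFIXES)
print(statement)
```

An application that favours precision could then keep only matches whose confidence exceeds a chosen threshold, while one that favours recall could accept lower values.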
Summary


Mappings between classification systems are an
important means for interoperability and sharing of
classification information and tools



Simple statistics on the existing data in catalogues can
generate candidates for matches between individual
classes



Where manually created mappings exist, they can be
used to evaluate the algorithmic results



Mappings can be expressed in SKOS terms




To qualify the mappings further, intermediate nodes
need to be introduced
There is no standard for this yet

Thank you for listening.

Slides available online
http://www.slideshare.net/MagnusPfeffer/
This work is licensed under a Creative Commons
Attribution-ShareAlike 3.0 Unported License.

References


Denton, W. (2012). On dentographs, a new method of visualizing library collections.
In: Code4Lib.



Isaac, A., Van Der Meij, L., Schlobach, S. and Wang, S. (2007).
An empirical study of instance-based ontology matching. In: The Semantic Web
(pp. 253-266). Springer Berlin Heidelberg.



Legrady, G. (2005). Making visible the invisible. Seattle Library Data Flow
Visualization. In: Digital Culture and Heritage. Proceedings of ICHIM05 Sept, 21-23.



Pfeffer, M. (2009). Äquivalenzklassen – Alle Doppelstellen der RVK finden.
Presentation given at the Librarian Workshop of the 33rd Annual Conference of the
German Classification Society on Data Analysis, Machine Learning, and
Applications (GfKl).



Pfeffer, M. (2013). Using clustering across union catalogues to enrich entries with
indexing information. In: Data Analysis, Machine Learning and Knowledge
Discovery - Proceedings of the 36th Annual Conference of the Gesellschaft für
Klassifikation e. V.. Springer Berlin Heidelberg.



Wang, S., Isaac, A., Schopman, B., Schlobach, S. and Van Der Meij, L. (2009).
Matching multi-lingual subject vocabularies. In: Research and Advanced
Technology for Digital Libraries (pp. 125-137). Springer Berlin Heidelberg.

Mais conteúdo relacionado

Mais procurados

Do it on your own - From 3 to 5 Star Linked Open Data with RMLio
Do it on your own - From 3 to 5 Star Linked Open Data with RMLioDo it on your own - From 3 to 5 Star Linked Open Data with RMLio
Do it on your own - From 3 to 5 Star Linked Open Data with RMLioOpen Knowledge Belgium
 
Achieving time effective federated information from scalable rdf data using s...
Achieving time effective federated information from scalable rdf data using s...Achieving time effective federated information from scalable rdf data using s...
Achieving time effective federated information from scalable rdf data using s...తేజ దండిభట్ల
 
Linked data-tooling-xml
Linked data-tooling-xmlLinked data-tooling-xml
Linked data-tooling-xmlFelix Sasaki
 
Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015
Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015
Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015Sergio Fernández
 
Are you talking to me? Researching a scenario for linking objects and publica...
Are you talking to me? Researching a scenario for linking objects and publica...Are you talking to me? Researching a scenario for linking objects and publica...
Are you talking to me? Researching a scenario for linking objects and publica...Ellen Van Keer
 
Nicoletta Fornara and Fabio Marfia | Modeling and Enforcing Access Control Ob...
Nicoletta Fornara and Fabio Marfia | Modeling and Enforcing Access Control Ob...Nicoletta Fornara and Fabio Marfia | Modeling and Enforcing Access Control Ob...
Nicoletta Fornara and Fabio Marfia | Modeling and Enforcing Access Control Ob...semanticsconference
 
Ivan Herman - Semantic Web Activities @ W3C
Ivan Herman - Semantic Web Activities @ W3CIvan Herman - Semantic Web Activities @ W3C
Ivan Herman - Semantic Web Activities @ W3Csssw2012
 
Data 2 Documents: Modular and Distributive Content Management in RDF
Data 2 Documents: Modular and Distributive Content Management in RDFData 2 Documents: Modular and Distributive Content Management in RDF
Data 2 Documents: Modular and Distributive Content Management in RDFNiels Ockeloen
 
RDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataRDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataGiorgos Santipantakis
 
Establishing the Connection: Creating a Linked Data Version of the BNB
Establishing the Connection: Creating a Linked Data Version of the BNBEstablishing the Connection: Creating a Linked Data Version of the BNB
Establishing the Connection: Creating a Linked Data Version of the BNBnw13
 
Documents, services, and data on the web
Documents, services, and data on the webDocuments, services, and data on the web
Documents, services, and data on the webChiara Del Vescovo
 
Introduction to OpenAIRE services and the OpenAIRE Research Graph
Introduction to OpenAIRE services and the OpenAIRE Research GraphIntroduction to OpenAIRE services and the OpenAIRE Research Graph
Introduction to OpenAIRE services and the OpenAIRE Research GraphOpenAIRE
 
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...Gezim Sejdiu
 
EC-WEB: Validator and Preview for the JobPosting Data Model of Schema.org
EC-WEB: Validator and Preview for the JobPosting Data Model of Schema.orgEC-WEB: Validator and Preview for the JobPosting Data Model of Schema.org
EC-WEB: Validator and Preview for the JobPosting Data Model of Schema.orgJindřich Mynarz
 
Linked data as a library data platform
Linked data as a library data platformLinked data as a library data platform
Linked data as a library data platformJindřich Mynarz
 
ROI in Linking Content to CRM by Applying the Linked Data Stack
ROI in Linking Content to CRM by Applying the Linked Data StackROI in Linking Content to CRM by Applying the Linked Data Stack
ROI in Linking Content to CRM by Applying the Linked Data StackMartin Voigt
 
Interaction with Linked Data
Interaction with Linked DataInteraction with Linked Data
Interaction with Linked DataEUCLID project
 
Regal - a Repository for Electronic Documents and Bibliographic Data
Regal - a Repository for Electronic Documents and Bibliographic DataRegal - a Repository for Electronic Documents and Bibliographic Data
Regal - a Repository for Electronic Documents and Bibliographic DataFelix Ostrowski
 

Mais procurados (20)

Do it on your own - From 3 to 5 Star Linked Open Data with RMLio
Do it on your own - From 3 to 5 Star Linked Open Data with RMLioDo it on your own - From 3 to 5 Star Linked Open Data with RMLio
Do it on your own - From 3 to 5 Star Linked Open Data with RMLio
 
Achieving time effective federated information from scalable rdf data using s...
Achieving time effective federated information from scalable rdf data using s...Achieving time effective federated information from scalable rdf data using s...
Achieving time effective federated information from scalable rdf data using s...
 
Linked data-tooling-xml
Linked data-tooling-xmlLinked data-tooling-xml
Linked data-tooling-xml
 
Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015
Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015
Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015
 
Are you talking to me? Researching a scenario for linking objects and publica...
Are you talking to me? Researching a scenario for linking objects and publica...Are you talking to me? Researching a scenario for linking objects and publica...
Are you talking to me? Researching a scenario for linking objects and publica...
 
Nicoletta Fornara and Fabio Marfia | Modeling and Enforcing Access Control Ob...
Nicoletta Fornara and Fabio Marfia | Modeling and Enforcing Access Control Ob...Nicoletta Fornara and Fabio Marfia | Modeling and Enforcing Access Control Ob...
Nicoletta Fornara and Fabio Marfia | Modeling and Enforcing Access Control Ob...
 
Ivan Herman - Semantic Web Activities @ W3C
Ivan Herman - Semantic Web Activities @ W3CIvan Herman - Semantic Web Activities @ W3C
Ivan Herman - Semantic Web Activities @ W3C
 
Data 2 Documents: Modular and Distributive Content Management in RDF
Data 2 Documents: Modular and Distributive Content Management in RDFData 2 Documents: Modular and Distributive Content Management in RDF
Data 2 Documents: Modular and Distributive Content Management in RDF
 
Linking library data
Linking library dataLinking library data
Linking library data
 
Linked data tooling XML
Linked data tooling XMLLinked data tooling XML
Linked data tooling XML
 
RDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataRDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival data
 
Establishing the Connection: Creating a Linked Data Version of the BNB
Establishing the Connection: Creating a Linked Data Version of the BNBEstablishing the Connection: Creating a Linked Data Version of the BNB
Establishing the Connection: Creating a Linked Data Version of the BNB
 
Documents, services, and data on the web
Documents, services, and data on the webDocuments, services, and data on the web
Documents, services, and data on the web
 
Introduction to OpenAIRE services and the OpenAIRE Research Graph
Introduction to OpenAIRE services and the OpenAIRE Research GraphIntroduction to OpenAIRE services and the OpenAIRE Research Graph
Introduction to OpenAIRE services and the OpenAIRE Research Graph
 
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
 
EC-WEB: Validator and Preview for the JobPosting Data Model of Schema.org
EC-WEB: Validator and Preview for the JobPosting Data Model of Schema.orgEC-WEB: Validator and Preview for the JobPosting Data Model of Schema.org
EC-WEB: Validator and Preview for the JobPosting Data Model of Schema.org
 
Linked data as a library data platform
Linked data as a library data platformLinked data as a library data platform
Linked data as a library data platform
 
ROI in Linking Content to CRM by Applying the Linked Data Stack
ROI in Linking Content to CRM by Applying the Linked Data StackROI in Linking Content to CRM by Applying the Linked Data Stack
ROI in Linking Content to CRM by Applying the Linked Data Stack
 
Interaction with Linked Data
Interaction with Linked DataInteraction with Linked Data
Interaction with Linked Data
 
Regal - a Repository for Electronic Documents and Bibliographic Data
Regal - a Repository for Electronic Documents and Bibliographic DataRegal - a Repository for Electronic Documents and Bibliographic Data
Regal - a Repository for Electronic Documents and Bibliographic Data
 

Destaque

Jetzt kommt zusammen, was zusammen gehört
Jetzt kommt zusammen, was zusammen gehörtJetzt kommt zusammen, was zusammen gehört
Jetzt kommt zusammen, was zusammen gehörtMagnus Pfeffer
 
Metadata Provenance Tutorial Part 2: Interoperable Metadata Provenance
Metadata Provenance Tutorial Part 2: Interoperable Metadata ProvenanceMetadata Provenance Tutorial Part 2: Interoperable Metadata Provenance
Metadata Provenance Tutorial Part 2: Interoperable Metadata ProvenanceMagnus Pfeffer
 
Bibliotheken und Linked Open Data Extended
Bibliotheken und Linked Open Data ExtendedBibliotheken und Linked Open Data Extended
Bibliotheken und Linked Open Data ExtendedMagnus Pfeffer
 
Bibliotheken und Linked Open Data
Bibliotheken und Linked Open DataBibliotheken und Linked Open Data
Bibliotheken und Linked Open DataMagnus Pfeffer
 
Cloud Computing für die Verarbeitung von Metadaten
Cloud Computing für die Verarbeitung von MetadatenCloud Computing für die Verarbeitung von Metadaten
Cloud Computing für die Verarbeitung von MetadatenMagnus Pfeffer
 
Automatisches Generieren von Konkordanzen
Automatisches Generieren von KonkordanzenAutomatisches Generieren von Konkordanzen
Automatisches Generieren von KonkordanzenMagnus Pfeffer
 
Clustering auf Werksebene
Clustering auf WerksebeneClustering auf Werksebene
Clustering auf WerksebeneMagnus Pfeffer
 
Altbestandserschließung: Automatische Übernahme von RVK und SWD über Verbundg...
Altbestandserschließung: Automatische Übernahme von RVK und SWD über Verbundg...Altbestandserschließung: Automatische Übernahme von RVK und SWD über Verbundg...
Altbestandserschließung: Automatische Übernahme von RVK und SWD über Verbundg...Magnus Pfeffer
 
Abgleich von Titeldaten zur Übernahme von Sacherschließungsinformationen übe...
Abgleich von Titeldaten zur Übernahme von Sacherschließungsinformationen  übe...Abgleich von Titeldaten zur Übernahme von Sacherschließungsinformationen  übe...
Abgleich von Titeldaten zur Übernahme von Sacherschließungsinformationen übe...Magnus Pfeffer
 
Fallbasierte automatische Klassifikation nach der RVK - k-nearest neighbour a...
Fallbasierte automatische Klassifikation nach der RVK - k-nearest neighbour a...Fallbasierte automatische Klassifikation nach der RVK - k-nearest neighbour a...
Fallbasierte automatische Klassifikation nach der RVK - k-nearest neighbour a...Magnus Pfeffer
 
Linked Data in der Lehre
Linked Data in der LehreLinked Data in der Lehre
Linked Data in der LehreMagnus Pfeffer
 
Bibliotheken und Linked Open Data
Bibliotheken und Linked Open DataBibliotheken und Linked Open Data
Bibliotheken und Linked Open DataMagnus Pfeffer
 
Resource Discovery: Herausforderung und Chance für die Sacherschließung
Resource Discovery:  Herausforderung und Chance für die SacherschließungResource Discovery:  Herausforderung und Chance für die Sacherschließung
Resource Discovery: Herausforderung und Chance für die SacherschließungMagnus Pfeffer
 
Ausleihdaten aus Bibliotheken als Linked Open Data publizieren und nutzen
Ausleihdaten aus Bibliotheken als Linked Open Data publizieren und nutzenAusleihdaten aus Bibliotheken als Linked Open Data publizieren und nutzen
Ausleihdaten aus Bibliotheken als Linked Open Data publizieren und nutzenMagnus Pfeffer
 
RVK 3.0 - Die Regensburger Verbundklassifikation als Normdatei für Bibliothek...
RVK 3.0 - Die Regensburger Verbundklassifikation als Normdatei für Bibliothek...RVK 3.0 - Die Regensburger Verbundklassifikation als Normdatei für Bibliothek...
RVK 3.0 - Die Regensburger Verbundklassifikation als Normdatei für Bibliothek...Magnus Pfeffer
 
Resource Discovery - Sacherschließung am Ende?
Resource Discovery - Sacherschließung am Ende?Resource Discovery - Sacherschließung am Ende?
Resource Discovery - Sacherschließung am Ende?Magnus Pfeffer
 
Open Source Software zur Verarbeitung und Analyse von Metadatenmanagement
Open Source Software zur Verarbeitung und Analyse von MetadatenmanagementOpen Source Software zur Verarbeitung und Analyse von Metadatenmanagement
Open Source Software zur Verarbeitung und Analyse von MetadatenmanagementMagnus Pfeffer
 

Destaque (17)

Jetzt kommt zusammen, was zusammen gehört
Jetzt kommt zusammen, was zusammen gehörtJetzt kommt zusammen, was zusammen gehört
Jetzt kommt zusammen, was zusammen gehört
 
Metadata Provenance Tutorial Part 2: Interoperable Metadata Provenance
Metadata Provenance Tutorial Part 2: Interoperable Metadata ProvenanceMetadata Provenance Tutorial Part 2: Interoperable Metadata Provenance
Metadata Provenance Tutorial Part 2: Interoperable Metadata Provenance
 
Bibliotheken und Linked Open Data Extended
Bibliotheken und Linked Open Data ExtendedBibliotheken und Linked Open Data Extended
Bibliotheken und Linked Open Data Extended
 
Bibliotheken und Linked Open Data
Bibliotheken und Linked Open DataBibliotheken und Linked Open Data
Bibliotheken und Linked Open Data
 
Cloud Computing für die Verarbeitung von Metadaten
Cloud Computing für die Verarbeitung von MetadatenCloud Computing für die Verarbeitung von Metadaten
Cloud Computing für die Verarbeitung von Metadaten
 
Automatisches Generieren von Konkordanzen
Automatisches Generieren von KonkordanzenAutomatisches Generieren von Konkordanzen
Automatisches Generieren von Konkordanzen
 
Clustering auf Werksebene
Clustering auf WerksebeneClustering auf Werksebene
Clustering auf Werksebene
 
Altbestandserschließung: Automatische Übernahme von RVK und SWD über Verbundg...
Altbestandserschließung: Automatische Übernahme von RVK und SWD über Verbundg...Altbestandserschließung: Automatische Übernahme von RVK und SWD über Verbundg...
Altbestandserschließung: Automatische Übernahme von RVK und SWD über Verbundg...
 
Abgleich von Titeldaten zur Übernahme von Sacherschließungsinformationen übe...
Abgleich von Titeldaten zur Übernahme von Sacherschließungsinformationen  übe...Abgleich von Titeldaten zur Übernahme von Sacherschließungsinformationen  übe...
Abgleich von Titeldaten zur Übernahme von Sacherschließungsinformationen übe...
 
Fallbasierte automatische Klassifikation nach der RVK - k-nearest neighbour a...
Fallbasierte automatische Klassifikation nach der RVK - k-nearest neighbour a...Fallbasierte automatische Klassifikation nach der RVK - k-nearest neighbour a...
Fallbasierte automatische Klassifikation nach der RVK - k-nearest neighbour a...
 
Linked Data in der Lehre
Linked Data in der LehreLinked Data in der Lehre
Linked Data in der Lehre
 
Bibliotheken und Linked Open Data
Bibliotheken und Linked Open DataBibliotheken und Linked Open Data
Bibliotheken und Linked Open Data
 
Resource Discovery: Herausforderung und Chance für die Sacherschließung
Resource Discovery:  Herausforderung und Chance für die SacherschließungResource Discovery:  Herausforderung und Chance für die Sacherschließung
Resource Discovery: Herausforderung und Chance für die Sacherschließung
 
Ausleihdaten aus Bibliotheken als Linked Open Data publizieren und nutzen
Ausleihdaten aus Bibliotheken als Linked Open Data publizieren und nutzenAusleihdaten aus Bibliotheken als Linked Open Data publizieren und nutzen
Ausleihdaten aus Bibliotheken als Linked Open Data publizieren und nutzen
 
RVK 3.0 - Die Regensburger Verbundklassifikation als Normdatei für Bibliothek...
RVK 3.0 - Die Regensburger Verbundklassifikation als Normdatei für Bibliothek...RVK 3.0 - Die Regensburger Verbundklassifikation als Normdatei für Bibliothek...
RVK 3.0 - Die Regensburger Verbundklassifikation als Normdatei für Bibliothek...
 
Resource Discovery - Sacherschließung am Ende?
Resource Discovery - Sacherschließung am Ende?Resource Discovery - Sacherschließung am Ende?
Resource Discovery - Sacherschließung am Ende?
 
Open Source Software zur Verarbeitung und Analyse von Metadatenmanagement
Open Source Software zur Verarbeitung und Analyse von MetadatenmanagementOpen Source Software zur Verarbeitung und Analyse von Metadatenmanagement
Open Source Software zur Verarbeitung und Analyse von Metadatenmanagement
 

Semelhante a Automatic creation of mappings between classification systems for bibliographic data

Is multi-model the future of NoSQL?
Automatic creation of mappings between classification systems for bibliographic data

  • 1. Automatic creation of mappings between classification systems for bibliographic data Prof. Magnus Pfeffer Stuttgart Media University pfeffer@hdm-stuttgart.de
  • 2. Agenda
      - Motivation
      - Instance-based matching
      - Application to bibliographic data
      - Evaluation
      - Ongoing projects
      - RDF representation
      November 26th, 2013, Semantic Web in Libraries, Hamburg
  • 3. Motivation
  • 4. Current situation in Germany
      - Five regional library unions
          - Subject headings
              - Predominantly RSWK („Regeln für den Schlagwortkatalog“ - „Rules for the subject catalogue“) using a shared authority file
          - Classification systems
              - RVK (Regensburg Union Classification)
              - BK (Basic Classification)
              - DDC (Dewey Decimal Classification)
              - Various local classification systems
      - Low proportion of indexed titles (25-30%)
  • 5. Current situation in Germany
      - National library
          - Subject headings
              - Predominantly RSWK („Regeln für den Schlagwortkatalog“ - „Rules for the subject catalogue“) using a shared authority file
          - Classification systems
              - DDC (Dewey Decimal Classification)
              - Coarse categories
      - DDC only for titles published since 2007
      - Only „Reihe A“ (print trade publications) is fully indexed with RSWK
  • 6. Austrian National Library
      - Subject headings
          - Predominantly RSWK („Regeln für den Schlagwortkatalog“ - „Rules for the subject catalogue“) using a shared authority file
      - Classification systems
          - BK since 2007
          - RVK in the Austrian library union catalogue
  • 7. Goals
      - Re-use existing indexing information
          - National level
              - BK is used mainly in northern Germany / Austria
              - RVK mainly in southern Germany
              - DDC mainly by the National Library
          - International level
              - Make RVK data more accessible to DDC users
              - Use DDC indexing information available from e.g. the Library of Congress
  • 8. Ideas
      - Use of appropriate classification systems
          - Faceted search in resource discovery systems
              - Should be monohierarchical
              - Should have a limited number of classes
              - → DDC (first digits) or BK
          - Browsing of similar titles
              - Should be fine-grained
              - → DDC (full) or RVK
          - (Multi-lingual retrieval)
  • 9. Ideas
      - Enable the use of existing tools and visualisations
      [Figures: example visualisations from Denton (2012) and Legrady (2005)]
  • 10. Instance-based Matching
  • 11. Ontology matching
      - Well-studied problem in computer science
      - Several approaches
          - Based on the descriptors
          - Based on the structure
          - Based on the manifestations (instances)
  • 12. Instances
      - Entries in catalogues with multiple classifications
  • 13. Instance-based matching
      - Assumptions
          - Classes with semantic overlap co-occur in instances
          - The more often these classes co-occur, the stronger the overlap
      - Preparation
          - Extraction of all pairs of classifications from the data
          - Count of the extracted pairs
  • 14. Example
  • 15. Example
      - Entry 1
          - DDC: 179.9
          - RVK: CC 7200
          - RVK: CC 7250
      - Entry 2
          - DDC: 179.9
          - RVK: CC 7200
      - Pairs
          - 179.9 / CC 7200
          - 179.9 / CC 7250
          - 179.9 / CC 7200
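The pair extraction and counting from the example can be sketched in a few lines of Python. This is only an illustration under assumptions: the dictionary layout of an entry and the function name `count_pairs` are hypothetical and not taken from the talk, whose prototype is described as Perl scripts.

```python
from collections import Counter
from itertools import product

# The two catalogue entries from the example slide (layout is an assumption).
entries = [
    {"DDC": ["179.9"], "RVK": ["CC 7200", "CC 7250"]},
    {"DDC": ["179.9"], "RVK": ["CC 7200"]},
]

def count_pairs(entries, system_a="DDC", system_b="RVK"):
    """Count how often each (class_a, class_b) pair occurs across all entries."""
    counts = Counter()
    for entry in entries:
        # Pair every class of system A with every class of system B in the entry.
        for pair in product(entry.get(system_a, []), entry.get(system_b, [])):
            counts[pair] += 1
    return counts

pair_counts = count_pairs(entries)
for (ddc, rvk), n in sorted(pair_counts.items()):
    print(f"{ddc} / {rvk}: {n}")
```

Entries that carry only one of the two systems contribute no pairs, which matches the idea that only multiply classified entries act as instances.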
  • 16. Normalisation
      - Comparing absolute numbers alone is misleading
          - Some classes are used more often than others
          - Number of pairs correlates with the number of entries that are classified using a given class
      - Instead: use the proportion of co-occurrence to occurrence
          $\frac{|E_{c_1} \cap E_{c_2}|}{|E_{c_1} \cup E_{c_2}|}$
          where $E_c$ is the set of entries classified with class $c$: number of entries with both classifications divided by number of entries with either classification (Jaccard measure for overlap of sets)
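A minimal Python sketch of the Jaccard measure, applied to the two entries of the earlier example. The entry identifiers 1 and 2 and the variable names are assumptions made for illustration.

```python
def jaccard(entries_c1, entries_c2):
    """|E_c1 ∩ E_c2| / |E_c1 ∪ E_c2| for two sets of entry identifiers."""
    union = entries_c1 | entries_c2
    if not union:
        return 0.0  # neither class is used at all; treat the overlap as zero
    return len(entries_c1 & entries_c2) / len(union)

# Entry ids per class: entries 1 and 2 carry DDC 179.9 and RVK CC 7200,
# only entry 1 carries RVK CC 7250.
e_ddc_1799 = {1, 2}
e_rvk_cc7200 = {1, 2}
e_rvk_cc7250 = {1}

score_cc7200 = jaccard(e_ddc_1799, e_rvk_cc7200)  # 2 shared / 2 total
score_cc7250 = jaccard(e_ddc_1799, e_rvk_cc7250)  # 1 shared / 2 total
print(score_cc7200, score_cc7250)
```

With only two entries the scores are of course not meaningful; the point is that the measure relates co-occurrence to overall usage instead of relying on raw counts.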
  • 17. Further interpretation
      - a and b are two classes from two classification systems A and B
      - The classes a and b only occur together → exact match
      - a only co-occurs with b, but b co-occurs with other classes from A → a is a narrower concept than b
      - a co-occurs with several classes from B (including b) → a is a wider concept than b
      - a and b do not co-occur → cannot infer that a and b are unrelated
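These rules of thumb can be written down as a small decision function. The sketch below is one possible reading, not the talk's implementation: the function name `interpret`, the labels it returns, and the representation of the data as a set of (a, b) pairs are all assumptions. It also ignores co-occurrence strength, which a real mapping would take into account.

```python
def interpret(a, b, pairs):
    """Classify the relation of class a (system A) to class b (system B).

    pairs is a set of (class_from_A, class_from_B) tuples that co-occur.
    """
    partners_of_a = {y for (x, y) in pairs if x == a}  # classes of B seen with a
    partners_of_b = {x for (x, y) in pairs if y == b}  # classes of A seen with b
    if b not in partners_of_a:
        return "unknown"   # no co-occurrence: nothing can be inferred
    if partners_of_a == {b} and partners_of_b == {a}:
        return "exact"     # a and b only occur together
    if partners_of_a == {b}:
        return "narrower"  # b also co-occurs with other classes from A
    return "wider"         # a co-occurs with several classes from B

# Hypothetical class labels, purely for illustration.
pairs = {("a1", "b1"), ("a2", "b2"), ("a3", "b2"), ("a4", "b4"), ("a4", "b5")}
print(interpret("a1", "b1", pairs), interpret("a2", "b2", pairs),
      interpret("a4", "b4", pairs), interpret("a1", "b2", pairs))
```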
  • 18. Prior work
      - Pfeffer (2009)
          - Analysis of classification system structure and actual use
              - Locating classes that describe the same concept
              - Finding ways to improve existing mappings to RVK
          - Focus on RVK, using data from library union catalogues
          - Co-occurrence analysis
      - Results
          - High co-occurrence and close in the hierarchy → classes are hard to assign properly
          - High co-occurrence and far in the hierarchy → classes describe identical concepts
          - Mappings from RSWK to RVK could be augmented
  • 19. Related work
      - Isaac et al. (2007)
          - Applied instance-based matching to bibliographic data
          - Data from the National Library of the Netherlands
          - Mapping from a thesaurus to a classification system
      - Results
          - Generated mappings are quite good
          - More sophisticated measures than Jaccard do not lead to better mappings
  • 20. Application to bibliographic data
  • 21. Bibliographic data is different
      - Multiple editions
      - Multiple document types
  • 22. Bibliographic data
      - Skewed data
          - Multiple editions → more pairs
          - Some co-occurrences could appear stronger than others
      - Solution: pre-clustering individual titles on the „work“ level
          - Increases the chance of instances with more than one classification
          - Each cluster contributes only once
          - Allows using absolute co-occurrence numbers
              - Cut-off for small numbers
              - Ranking of competing matches
  • 23. Prior work
      - Pfeffer (2013)
          - Matching bibliographic records
              - Based on author, title and uniform title (as well as information on title changes)
              - Matches any edition and revision of a work
                  - Including translations
              - Merge match sets → discrete clusters
          - Consolidating indexing information
              - For indexing purposes, the differences between editions and revisions are irrelevant
              - Subject headings and classifications are shared between all members of a cluster
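The step "merge match sets → discrete clusters" can be illustrated with a union-find structure, followed by pooling the classifications of all cluster members. This is a sketch under assumptions: the talk does not say how the merge is implemented, and the record identifiers, the function names and the RVK notation "CC 7200" attached to them are placeholders.

```python
def merge_match_sets(match_sets):
    """Merge overlapping sets of record ids into disjoint clusters (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for match_set in match_sets:
        members = list(match_set)
        if not members:
            continue
        root = find(members[0])
        for other in members[1:]:
            parent[find(other)] = root  # attach the other record's tree to root

    clusters = {}
    for record in list(parent):
        clusters.setdefault(find(record), set()).add(record)
    return list(clusters.values())

def consolidate(cluster, classifications):
    """Pool the classifications of all records in one cluster."""
    merged = set()
    for record in cluster:
        merged |= classifications.get(record, set())
    return merged

# Hypothetical record ids: r1/r2 match, r2/r3 match, r4 matches nothing else.
clusters = merge_match_sets([{"r1", "r2"}, {"r2", "r3"}, {"r4"}])
shared = consolidate({"r1", "r2", "r3"}, {"r1": {"179.9"}, "r3": {"CC 7200"}})
```

After consolidation each cluster, not each edition, would contribute one instance to the co-occurrence counts.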
  • 24. Evaluation
  • 25. Comparison with existing mappings
      - Existing (partial) mappings can be used as a basis for evaluation → „Gold standard“
      - Comparison of automatic and manual mapping
          - Recall: Are all the mappings found?
          - Precision: Are all found mappings correct?
      - Analysis of additional links
          - Maybe the gold standard can be improved?
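A short Python sketch of this comparison, assuming both the generated mapping and the gold standard are available as sets of (source class, target class) pairs. The function name and the placeholder class labels are hypothetical. Because the gold standard is only partial, the "additional" links are returned separately for manual inspection instead of being treated as plain errors.

```python
def evaluate(generated, gold):
    """Compare generated mappings against a gold standard of class pairs."""
    generated, gold = set(generated), set(gold)
    correct = generated & gold
    precision = len(correct) / len(generated) if generated else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    additional = generated - gold  # candidates for improving the gold standard
    return precision, recall, additional

# Placeholder class labels, not real RVK or BK notations.
generated = {("a", "x"), ("b", "y"), ("c", "z")}
gold = {("a", "x"), ("b", "y"), ("d", "w")}

precision, recall, additional = evaluate(generated, gold)
print(f"precision={precision:.2f} recall={recall:.2f} additional={additional}")
```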
  • 26. Ongoing projects
  • 27. Data
      - Bibliographic data
          - German National Library catalogue
          - Austrian National Library catalogue
          - German library union catalogues
          - British national bibliography
      - Gold standards
          - Partial mappings BK ↔ RVK
  • 28. Interesting Mappings
      - RVK → BK
          - BK well suited for faceted retrieval
          - Gold standard exists
          - RVK has the largest proportion of classified titles
      - RVK ↔ DDC
          - Enable data sharing between the German National Library and the RVK-using libraries
      - Not limited to classification systems
          - See Pfeffer (2009) and Wang et al. (2009)
  • 29. Implementation: Tasks
      - Import and mapping of MAB2 and MARC data
      - Clustering
          - Generation of keys for the match process
          - Matching and clustering
          - Consolidation of indexing and classification information
      - Statistics
          - Co-occurrence counts
          - Jaccard measure
      - Output
          - Full mappings
  • 30. Implementation: State
      - All steps implemented as a prototype
          - Perl scripts
          - File-based data and indexes
      - Current development
          - Still Perl scripts (but better documented)
          - All data is accumulated in a document store
              - MongoDB
      - Further plan: porting to the MetaFacture framework
  • 31. RDF representation
  • 32. Classification systems as Linked Data
      - DDC has been published as Linked Data
      - RVK has not been published as Linked Data
          - There is no versioning and there are no stable identifiers
          - A project to fix this and to publish RVK as Linked Data has been cancelled by the university library of Regensburg
      - BK has not been published as Linked Data
          - There is authority data in the GVK union catalogue
      - → One would have to create temporary URIs for the RVK and BK classes
  • 34. Further Mappings
      - 1:n relationships
          - List of classes that are all narrow matches
          - Or: a combination of classes is a (near) exact match
      - Qualified mappings
          - Express the confidence of the proposed match
          - Allow applications to optimize for precision or recall
  • 35. Qualification through indirection
      - RDF relations cannot be qualified
      [Diagram: http://dewey.info/class/641/ → skos:exactMatch → http://ex.org/rvk/BF_8150]
  • 36. Qualification through indirection
      - So an intermediate node is used
      [Diagram: http://dewey.info/class/641/ → ex:qualifiedMatch → intermediate node _1; _1 → ex:targetConcept → http://ex.org/rvk/BF_8150; _1 → ex:confidence → “1.0”; _1 → further information. The label skos:exactMatch also appears in the diagram.]
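To make the indirection concrete, the following Python sketch emits the intermediate-node pattern as Turtle text. It uses the property names shown on the slide (ex:qualifiedMatch, ex:targetConcept, ex:confidence); the namespace URI bound to `ex:`, the blank node label `_:m1` and the function name are assumptions, and the sketch leaves open how the match type (skos:exactMatch) is attached, since the flattened slide does not make that clear.

```python
def qualified_match_turtle(source, target, confidence, node="_:m1"):
    """Serialise one qualified match via an intermediate (blank) node."""
    lines = [
        "@prefix ex: <http://ex.org/ns#> .",  # hypothetical namespace
        "",
        f"<{source}> ex:qualifiedMatch {node} .",
        f"{node} ex:targetConcept <{target}> ;",
        f'    ex:confidence "{confidence}" .',
    ]
    return "\n".join(lines)

ttl = qualified_match_turtle(
    "http://dewey.info/class/641/",
    "http://ex.org/rvk/BF_8150",
    "1.0",
)
print(ttl)
```

An application that wants high precision could then filter on the confidence value before following the link to the target concept.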
  • 37. Summary
      - Mappings between classification systems are an important means for interoperability and sharing of classification information and tools
      - Simple statistics on the existing data in catalogues can generate candidates for matches between individual classes
      - Where manually created mappings exist, they can be used to evaluate the algorithmic results
      - Mappings can be expressed in SKOS terms
          - To qualify the mappings further, intermediate nodes need to be introduced
          - There is no standard for this yet
  • 38. Thank you for listening.
      - Slides available online: http://www.slideshare.net/MagnusPfeffer/
      - This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
  • 39. References
      - Denton, W. (2012). On dentographs, a new method of visualizing library collections. In: Code4Lib.
      - Isaac, A., Van Der Meij, L., Schlobach, S. and Wang, S. (2007). An empirical study of instance-based ontology matching. In: The Semantic Web (pp. 253-266). Springer Berlin Heidelberg.
      - Legrady, G. (2005). Making visible the invisible. Seattle Library Data Flow Visualization. In: Digital Culture and Heritage. Proceedings of ICHIM05 Sept, 21-23.
      - Pfeffer, M. (2009). Äquivalenzklassen – Alle Doppelstellen der RVK finden. Presentation given at the Librarian Workshop of the 33rd Annual Conference of the German Classification Society on Data Analysis, Machine Learning, and Applications (GfKl).
      - Pfeffer, M. (2013). Using clustering across union catalogues to enrich entries with indexing information. In: Data Analysis, Machine Learning and Knowledge Discovery - Proceedings of the 36th Annual Conference of the Gesellschaft für Klassifikation e. V. Springer Berlin Heidelberg.
      - Wang, S., Isaac, A., Schopman, B., Schlobach, S. and Van Der Meij, L. (2009). Matching multi-lingual subject vocabularies. In: Research and Advanced Technology for Digital Libraries (pp. 125-137). Springer Berlin Heidelberg.