Linked Data Applications: There is No-One-Size-Fits-All Formula - Asun Gomez Perez
1. Linked Data Applications: There is no One-Size-Fits-All Formula
Asunción Gómez-Pérez
Facultad de Informática, Universidad Politécnica de Madrid
Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid
http://www.oeg-upm.net
asun@fi.upm.es
Acknowledgements:
O. Corcho, D. Garijo, D. Vila, L. Vilches, B. Villazón
Work distributed under the license Creative Commons Attribution-Noncommercial-Share Alike 3.0
2. Table of contents
1. The concept
2. Foundations
3. The process
4. Examples
• Libraries: http://datos.bne.es
• Geo: http://geo.linkeddata.es/
• Meteorology: http://aemet.linkeddata.es/
• Travelling: http://webenemasuno.linkeddata.es/
SSSW-12: 9th Summer School on Ontological Engineering and the Semantic Web. Cercedilla. Spain 2
3. Complex queries using data from heterogeneous Web pages
A Cervantes enthusiast from Germany is visiting Madrid and wants to know more about Cervantes’ work and life. The answers are spread across several Web sites:
• http://www.bne.es/
• http://elviajero.elpais.com/
• http://www.viaf.org/
• http://www.aemet
*Picture attribution: http://commons.wikimedia.org/wiki/User:Gugerell
4. Data integration
[Figure: data about M. Cervantes and Don Quixote (El Quijote, published 1605; a 1960 edition translated into Hebrew) integrated from the BNE, VIAF, IGN, AEMET, Prisa and DBpedia databases: author/creator, “same as” links, birthPlace Alcalá de Henares, temperature (20º), and a travel guide (tapas, Siglo de Oro).]
5. Table of contents
1. The concept
2. Foundations
3. The process
4. Examples
• Libraries: http://datos.bne.es
• Geo: http://geo.linkeddata.es/
• Meteorology: http://aemet.linkeddata.es/
• Travelling: http://webenemasuno.linkeddata.es/
6. The model (Ontology) and the data
[Figure with two layers.
Ontology: the concepts Work, Person, Place, Year, Language and Library, related by “is creator of”, “birthPlace”, “publication date”, “translation”, “has subject” and “located at”.
Data: Cervantes is creator of El Quijote; Cervantes’ birthPlace is Alcalá de Henares; El Quijote, publication date 1960, translation into Catalan; “Vida de Cervantes” has subject Cervantes; located in the BNE library.]
7. The model (Ontology) and the data
[Figure: the same two layers, with URIs.
Ontology: Work http://iflastandards.info/ns/fr/frbr/frbrer/C1001; Person http://iflastandards.info/ns/fr/frbr/frbrer/C1005; Language / translation http://iflastandards.info/ns/fr/frbr/frbrer/C1002; Place http://geo.linkeddata.es/ontology/Municipio; Library http://xmlns.com/foaf/0.1/Organization; Year. Properties: translation, is creator of, publication date, birthPlace, has subject, located in.
Data (http://datos.bne.es/#): “Cervantes Saavedra, Miguel de” (http://datos.bne.es/resource/XX1718747) “es autor” (is author of) “Don Quijote de la Mancha”; birthPlace http://geo.linkeddata.es/resource/Alcalá de Henares; a Catalan translation, publication date 1960; “Vida de Miguel de Cervantes Saavedra” has subject Cervantes; located in BNE. Other resources shown: http://datos.bne.es/resource/XX1924295, http://datos.bne.es/resource/XX3383563, http://datos.bne.es/resource/bimo0002045496.]
8. Table of contents
1. The concept
2. Foundations
3. The process: Specification → Modelling → RDF Generation → Links Generation → Publication → Exploitation
4. Examples
9. Specification
• Data sources analysis
• URI design
• License definition
Bilateral meeting CNIG – OEG
OTALEX Project
10. Specification
URI design
• Meaningful URIs vs opaque URIs
• Separate the TBox (ontology model) from the ABox (data)
• Base URI
  http://linkeddata.es/
  http://geo.linkeddata.es/
  http://otalex.linkeddata.es/
• Ontology (TBox URIs)
  http://phenomenontology.linkeddata.es/ontology/{concept|property}
  http://phenomenontology.linkeddata.es/ontology/Municipality
• Data (ABox URIs)
  http://geo.linkeddata.es/resource/{resource type}/{resource name}
  http://geo.linkeddata.es/resource/Municipio/Azuaga
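The URI patterns above can be turned into a small helper. This is a minimal Python sketch, assuming that spaces in resource names become underscores and that any remaining non-ASCII character is percent-encoded; the function names `tbox_uri` and `abox_uri` are hypothetical, and only the base URIs and patterns come from the slide.

```python
from urllib.parse import quote

# Base URIs from the slide; the helper functions below are hypothetical.
TBOX_BASE = "http://phenomenontology.linkeddata.es/ontology"
ABOX_BASE = "http://geo.linkeddata.es/resource"


def tbox_uri(term: str) -> str:
    """Ontology (TBox) URI: {base}/{concept|property}."""
    return f"{TBOX_BASE}/{quote(term)}"


def abox_uri(resource_type: str, resource_name: str) -> str:
    """Data (ABox) URI: {base}/{resource type}/{resource name}.

    Assumption: spaces become underscores and any other unsafe or
    non-ASCII character is percent-encoded as UTF-8.
    """
    name = resource_name.strip().replace(" ", "_")
    return f"{ABOX_BASE}/{quote(resource_type)}/{quote(name)}"


print(tbox_uri("Municipality"))
print(abox_uri("Municipio", "Azuaga"))
print(abox_uri("Municipio", "Alcalá de Henares"))
```

These are meaningful URIs; an opaque scheme would instead mint identifiers such as the BNE's `XX1718747`, which stay stable even if a name changes.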
11. Specification
License definition
• Several possibilities:
  • The UK Open Government Licence
  • Open Database License
  • Public Domain Dedication and License
  • Open Data Commons Attribution License
  • The Creative Commons licenses (CC)
• It is also possible to reuse and apply the existing license of the (government) data sources.
12. Modelling
Ontology
• Ontologies:
  • A set of terms
  • A set of explicit assumptions regarding the intended meaning of the terms
  • Almost always including concepts and their classification
  • Almost always including properties between concepts
• Shared understanding of a domain of interest
• Ontologies are expressed in OWL or RDF(S), both based on RDF
• The NeOn methodology helps to build ontologies
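To make these bullet points concrete, here is a minimal Python sketch of a lightweight ontology held as plain (subject, predicate, object) triples: a few concepts, their classification through `rdfs:subClassOf`, and one property between concepts. The class and property names (`ex:Person`, `ex:Author`, `ex:isCreatorOf`, ...) are illustrative assumptions, and `superclasses` computes only the transitive subclass part of RDFS entailment, not a full reasoner.

```python
SUBCLASS = "rdfs:subClassOf"

# Hypothetical lightweight ontology: a small taxonomy plus one property.
ONTOLOGY = {
    ("ex:Author", SUBCLASS, "ex:Person"),
    ("ex:Person", SUBCLASS, "ex:Agent"),
    ("ex:Novel", SUBCLASS, "ex:Work"),
    ("ex:isCreatorOf", "rdfs:domain", "ex:Person"),
    ("ex:isCreatorOf", "rdfs:range", "ex:Work"),
}


def superclasses(cls: str, triples: set) -> set:
    """All direct and indirect superclasses of cls (transitive subClassOf)."""
    found, frontier = set(), [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            # 'o not in found' also protects against subclass cycles.
            if s == current and p == SUBCLASS and o not in found:
                found.add(o)
                frontier.append(o)
    return found


print(sorted(superclasses("ex:Author", ONTOLOGY)))  # ['ex:Agent', 'ex:Person']
```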
13. Vocabulary development
(Step 2 of 7: Identification of the data sources → Vocabulary development → Generation of the RDF data → Publication of the RDF data → Data cleansing → Linking the RDF data → Enable effective discovery)
• Features:
  • Lightweight: taxonomies and a few properties
  • Consensus-based vocabularies, to avoid mapping problems
  • Multilingual: linked data are multilingual
• The NeOn methodology can help to re-engineer non-ontological resources into ontologies
  • Pros: uses domain terminology already agreed on by domain experts
• Remove from heavyweight ontologies the features that you don’t need
• Reuse existing vocabularies
14. NeOn Methodology
[Figure: the NeOn scenarios for building ontologies. Knowledge resources are either non-ontological (glossaries, dictionaries, lexicons, classification schemas, taxonomies, thesauri) or ontological (ontology design patterns, ontology repositories and registries, ontologies in F-Logic, RDF(S) or OWL). Nine numbered scenarios lead into the core sequence O. Specification → O. Conceptualization → O. Formalization → O. Implementation: non-ontological resource reuse and reengineering, ontology design pattern reuse, ontological resource reuse and reengineering, ontology aligning and merging, ontology restructuring (pruning, extension, specialization, modularization) and ontology localization. Ontology support activities apply to all scenarios (1–9): knowledge acquisition (elicitation), documentation, configuration management, evaluation (V&V) and assessment.]
15. Modelling
Reuse available vocabularies
• Reuse suitable ontologies and vocabularies (e.g., Linked Open Vocabularies, …)
• Search for suitable non-ontological resources (domain-related sites, government catalogs, highly reliable Web sites)
• Are there suitable resources?
  • Yes: build the vocabulary by transforming the available resources
  • No: build the vocabulary from scratch
16. Publication
Data publication
• Metadata publication using VoID
To facilitate discovery:
• Register your dataset in CKAN
• Use sitemap4rdf to generate the site map
• Upload the site map to Google and Sindice
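As an illustration of metadata publication with VoID, the sketch below writes a small VoID description in Turtle from Python. The `void:`, `dcterms:` and `foaf:` terms are standard vocabulary terms, but every value passed in (dataset URI, homepage, SPARQL endpoint, triple count) is a placeholder, not the metadata of any real deployment.

```python
import json


def void_description(dataset_uri, title, homepage, sparql_endpoint,
                     triples, license_uri):
    """Return a minimal VoID description of one dataset as Turtle text."""
    return "\n".join([
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "@prefix foaf: <http://xmlns.com/foaf/0.1/> .",
        "",
        f"<{dataset_uri}> a void:Dataset ;",
        # json.dumps quotes the title and escapes quotes/backslashes,
        # which is Turtle-compatible for ordinary strings.
        f"    dcterms:title {json.dumps(title, ensure_ascii=False)} ;",
        f"    foaf:homepage <{homepage}> ;",
        f"    dcterms:license <{license_uri}> ;",
        f"    void:sparqlEndpoint <{sparql_endpoint}> ;",
        f"    void:triples {triples} .",
    ])


# Placeholder values only; substitute the real dataset metadata.
print(void_description(
    dataset_uri="http://example.org/dataset/catalogue",
    title="Library catalogue as Linked Data",
    homepage="http://example.org/",
    sparql_endpoint="http://example.org/sparql",
    triples=1000,
    license_uri="http://opendatacommons.org/licenses/odbl/1.0/",
))
```

The resulting file can be served next to the data; registering the dataset in CKAN and producing the site map are separate steps.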
17. Table of contents
1. The concept
2. Foundations
3. The process
4. Examples
• Libraries: http://datos.bne.es
  • http://linkeddata3.dia.fi.upm.es/bne-demo
• Geo: http://geo.linkeddata.es/
• Meteorology: http://aemet.linkeddata.es/
• Travelling: http://webenemasuno.linkeddata.es/
18. MARC21
• Different communication formats:
  • MARC 21 format for Bibliographic Data
  • MARC 21 format for Authority Data
  • Others: Holdings, Classification, etc.
• Three main elements:
  • Record structure: ISO 2709. Fields, indicators, subfields…
  • Content designation: “meaning” of codes and conventions
  • Content: defined outside the MARC standard (ISBD, AACR…)
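The ISO 2709 record structure can be illustrated with a small round trip: a 24-character leader, a directory of 12-character entries (tag, field length, starting position), then the variable fields. Fields end with the field terminator 0x1E, subfields start with the delimiter 0x1F, and the record ends with 0x1D. This is a simplified Python sketch, not a full MARC library: it assumes UTF-8 data, fills the leader positions other than length, counts and base address with illustrative values, and does no validation.

```python
FT = b"\x1e"   # field terminator
RT = b"\x1d"   # record terminator
SD = "\x1f"    # subfield delimiter (kept inside the field text)


def build_record(fields):
    """Serialize [(tag, text), ...] into an ISO 2709 byte string."""
    directory, body = b"", b""
    for tag, text in fields:
        chunk = text.encode("utf-8") + FT
        # Directory entry: tag (3) + field length (4) + start position (5).
        directory += f"{tag}{len(chunk):04d}{len(body):05d}".encode("ascii")
        body += chunk
    directory += FT
    base = 24 + len(directory)             # base address of the data
    total = base + len(body) + len(RT)     # total record length
    # Illustrative leader: only length, "22", base address and "4500" matter here.
    leader = f"{total:05d}nz  a22{base:05d}n  4500".encode("ascii")
    return leader + directory + body + RT


def parse_record(raw):
    """Return (leader, [(tag, text), ...]) from an ISO 2709 byte string."""
    leader = raw[:24].decode("ascii")
    base = int(leader[12:17])
    directory = raw[24:base - 1]           # drop the directory terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        data = raw[base + start:base + start + length - 1]  # drop FT
        fields.append((tag, data.decode("utf-8")))
    return leader, fields


def split_subfields(text):
    """Data field text -> (indicators, [(code, value), ...])."""
    indicators, *parts = text.split(SD)
    return indicators, [(p[0], p[1:]) for p in parts]


record = [
    ("001", "XX1721208"),
    ("005", "200012181124"),
    ("100", "10" + SD + "aCamus, Albert" + SD + "d1913-1960"),
]
raw = build_record(record)
leader, fields = parse_record(raw)
print(leader)
print(split_subfields(fields[2][1]))
```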
19. Specification @ BNE
• Records in the MARC 21 format
  • 3.9 million bibliographic records
  • 4.2 million authority records
  • Version: November 2011
AUTHORITY records: Persons; Corporate bodies; Conferences; Titles; Subject
BIBLIOGRAPHIC records:
• Maps: 76,576
• Sound recordings: 320,727
• Engravings, drawings, pictures: 166,017
• Manuscripts: 35,770
• Ancient books: 143,959
• Modern books: 2,696,560
• Scores: 178,473
• Electronic resources: 3,021
• Serials: 156,634
• Videos: 96,672
20. MARC21 record structure
• Authority record: Camus, Albert*
Control fields (tag, content):
001 XX1721208
005 200012181124
008 901120nn aijnnaabn n aaa
Data fields (tag, indicators, $subfield code, content):
016 $a BNE19900178994
040 $a SpMaBN $b spa $c SpMaBN $e rdc $f embne
100 10 $a Camus, Albert $d 1913-1960 (heading, 1XX)
670 $a El mite de Sísif, 1987 $b port. (Albert Camus)
670 $a Dic. de filosofía, de J. Ferrater Mora, 1980 $b (Camus., Albert (1913-1960); n. Mondovi, Argel)
670 $a Aut. BN-OPALE, 1995 $b (Camus, Albert)
* http://datos.bne.es/resource/XX1721208
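The human-readable notation used on this slide (tag, indicators, then `$` followed by a one-character subfield code) can be parsed with a few lines of Python. This is a minimal sketch assuming that `$` never occurs inside field content and that tags 001 to 009 are control fields without indicators or subfields, as in MARC 21; `parse_field` is a hypothetical helper name.

```python
import re


def parse_field(line: str):
    """Parse '100 10 $a Camus, Albert $d 1913-1960' into its parts.

    Returns (tag, indicators, subfields); a control field (00X) returns
    its whole content as a single subfield with an empty code.
    """
    tag, rest = line[:3], line[3:].strip()
    if tag.startswith("00"):
        return tag, "", [("", rest)]
    # re.split with a capturing group keeps the subfield codes.
    head, *pieces = re.split(r"\$(\w)", rest)
    subfields = [(pieces[i], pieces[i + 1].strip())
                 for i in range(0, len(pieces), 2)]
    return tag, head.strip(), subfields


for line in [
    "001 XX1721208",
    "100 10 $a Camus, Albert $d 1913-1960",
    "670 $a Aut. BN-OPALE, 1995 $b (Camus, Albert)",
]:
    print(parse_field(line))
```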
21. MARC21 record content designation
• Authority record: Camus, Albert*
001 XX1721208: Control Number
100 10: HEADING – Personal Name
  $a Camus, Albert: Personal name
  $d 1913-1960: Dates associated with name
670 $a El mite de Sísif, 1987 $b port. (Albert Camus): Source consulted (citation)
• Human reading:
An authority record that describes a Person, named Camus, Albert, with associated dates 1913-1960.
* http://datos.bne.es/resource/XX1721208
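The content designation is what lets a program produce the “human reading” of a record. The following Python sketch hard-codes only the codes shown on this slide (001, 100 $a, 100 $d and 670 $a) in a lookup table and renders the sentence for a personal-name authority record; the table `DESIGNATION` and the functions `describe` and `human_reading` are hypothetical, and the real MARC 21 designations are far more numerous.

```python
# Meanings of the codes shown on the slide (a tiny subset of MARC 21).
DESIGNATION = {
    ("001", ""): "Control number",
    ("100", "a"): "Personal name",
    ("100", "d"): "Dates associated with name",
    ("670", "a"): "Source consulted: citation",
}


def describe(fields):
    """Label each (tag, code, value) with its content designation."""
    return [(DESIGNATION.get((tag, code), "unknown"), value)
            for tag, code, value in fields]


def human_reading(fields) -> str:
    """Render a one-sentence reading of a personal-name authority record."""
    labels = dict(describe(fields))
    return ("An authority record that describes a Person, named "
            f"{labels['Personal name']}, with associated dates "
            f"{labels['Dates associated with name']}")


record = [
    ("001", "", "XX1721208"),
    ("100", "a", "Camus, Albert"),
    ("100", "d", "1913-1960"),
    ("670", "a", "El mite de Sísif, 1987"),
]
print(human_reading(record))
```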
22. Frequency of codes in records
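The slide shows only a chart, but the underlying idea (how often each combination of field and subfield codes occurs) is easy to sketch. The Python below counts signatures such as `100 $a $d`, the same form used in the “MARC21 info” column of the Marimba spreadsheets; the input format (a record as a list of `(tag, [subfield codes])` pairs), the function names and the sample records are assumptions for illustration.

```python
from collections import Counter


def code_signature(tag, codes):
    """Render a field and its subfield codes, e.g. '100 $a $d'."""
    return " ".join([tag] + [f"${c}" for c in codes])


def code_frequencies(records):
    """Count how many records contain each field/subfield-code signature."""
    counts = Counter()
    for record in records:
        # A set, so each signature counts at most once per record.
        counts.update({code_signature(tag, codes) for tag, codes in record})
    return counts


# Hypothetical sample: three records reduced to their field/subfield codes.
records = [
    [("001", []), ("100", ["a", "d"])],
    [("001", []), ("100", ["a", "d", "t"])],
    [("001", []), ("100", ["a", "d"])],
]
for signature, n in code_frequencies(records).most_common():
    print(f"{signature}: {n}")
```

Sorting by frequency shows which combinations deserve a mapping first and which rare ones are likely cataloguing errors.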
23. Specification
• Source data: MARC 21 records, not a relational database (RDB). A very flat structure that is difficult to map to richer models.
• Domain experts (catalogers) need to be part of the mapping process.
• Data quality is good, but there are still many errors, which need to be reported.
• Iterative and incremental transformation process: measure coverage and progress.
• Highly specialized library models: FRBR, ISBD.
• Multilinguality: collaboration with IFLA.
24. Model: FRBR at a glance
[Figure: the three linked levels of the model: Works (Work 1, Work 2, Work 3), Expressions (Expression 1, Expression 2) and Manifestations (Manifestation 1, Manifestation 2).]
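The FRBR hierarchy can be sketched as nested data structures: a Work is realized through Expressions (for example, the original text and a translation), and an Expression is embodied in Manifestations (for example, editions). The Python below is an illustrative model only; the class and field names are hypothetical, and the editions are made-up sample values.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Manifestation:
    label: str                      # e.g. a particular edition


@dataclass
class Expression:
    language: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    title: str
    expressions: List[Expression] = field(default_factory=list)

    def all_manifestations(self) -> List[Manifestation]:
        """Every manifestation of every expression of this work."""
        return [m for e in self.expressions for m in e.manifestations]


# Hypothetical sample values, for illustration only.
quijote = Work("Don Quijote de la Mancha", [
    Expression("spa", [Manifestation("Edition A"), Manifestation("Edition B")]),
    Expression("cat", [Manifestation("Edition C")]),
])
print([m.label for m in quijote.all_manifestations()])
```

Keeping the levels separate is what lets a catalogue group all editions and translations under a single Work.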
25. The Ontology: based on IFLA vocabularies
26. Who will be the mapping generator?
001 XX1721208
005 200012181124
008 901120nn aijnnaabn n aaa
016 $a BNE19900178994
040 $a SpMaBN $b spa $c SpMaBN $e rdc $f embne
100 10 $a Camus, Albert $d 1913-1960
670 $a El mite de Sísif, 1987 $b port. (Albert Camus)
670 $a Dic. de filosofía, de J. Ferrater Mora, 1980 $b (Camus., Albert (1913-1960); n. Mondovi, Argel)
670 $a Aut. BN-OPALE, 1995 $b (Camus, Albert)
27. Similar to mapping ontologies
[Figure: MARC elements are mapped as if they were ontology elements. Content under 100 $a maps to Person; content under 100 $a $t maps to Work; the “contained in” relation between 100 $a and 100 $a $t content maps to “is creator of”; the subfield 100 $t maps to the property “title of work”.]
28. Marimba allows librarians to create mappings
• Three spreadsheets: Classification mapping, Annotation mapping, Relationships mapping
Basic structure:
MARC21 info | Records count | Content sample | Mapping
100 $a $d | 888,880 | Camus, Albert 1913-1960 | foaf:Person
100 $a | 999,999 | Cervantes, Miguel de | foaf:name
100 $a $m | 10,000 | Cervantes, iguel | ERROR
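Since the mappings live in spreadsheets, a program only needs to read the rows back. The Python sketch below parses a CSV export of the classification sheet, with the columns shown on the slide (MARC21 info, Records count, Content sample, Mapping), and sets aside the rows marked ERROR for review. The CSV layout and the `load_mappings` name are assumptions; the slide does not describe how Marimba actually reads the Excel files.

```python
import csv
import io

# Hypothetical CSV export of the classification sheet (values from the slide).
SHEET = """MARC21 info,Records count,Content sample,Mapping
100 $a $d,888880,"Camus, Albert 1913-1960",foaf:Person
100 $a,999999,"Cervantes, Miguel de",foaf:name
100 $a $m,10000,"Cervantes, iguel",ERROR
"""


def load_mappings(text):
    """Split sheet rows into usable mappings and rows flagged as ERROR."""
    mappings, errors = {}, []
    for row in csv.DictReader(io.StringIO(text)):
        key = row["MARC21 info"].strip()
        target = row["Mapping"].strip()
        if target == "ERROR":
            errors.append((key, int(row["Records count"])))
        else:
            mappings[key] = target
    return mappings, errors


mappings, errors = load_mappings(SHEET)
print(mappings)
print(errors)
```

Keeping the record count next to each error tells the librarians how many records a cataloguing problem affects.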
29. Librarians create mappings using Excel
Classification mapping: the spreadsheet with the basic structure shown on the previous slide (MARC21 info, Records count, Content sample, Mapping), e.g., 100 $a $d → foaf:Person and 100 $a $m → ERROR.
30. Librarians create mappings using Excel
• Annotation mapping, e.g., place of publication, has dimensions
• Relationships mapping, e.g., is part of work
31. Marimba interprets the mappings and generates the RDF
MARC 21 record (input):
001 XX1721208
……
100 10 $a Camus, Albert $d 1913-1960
……
• Classify: exploiting the heading field and subfield codes.
  100 $a $d → Person (it has a personal name)
  100 $a $d $t → Work (it has a title)
• Annotate: using subfield codes and the content.
  100 $a "Camus, Albert" → frbr:P3039 "Camus, Albert"
  100 $t "La Peste" → frbr:P3001 "La Peste"
MARC 21 record (input) | Action | RDF (output)
100 $a $d | Classify | rdf:type frbr:C1005
100 $a Camus, Albert | Annotate | frbr:P3039 "Camus, Albert"
100 $d 1913-1960 | Annotate | frbr:P3040 "1913-1960"
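The classify and annotate steps can be sketched as two table lookups. In this Python sketch the rule tables cover only the cases on the slide: a heading with `$a $d` is a Person (frbr:C1005), one that also has `$t` is a Work (frbr:C1001, the FRBRer Work class), and Person headings annotate `$a` and `$d` with frbr:P3039 and frbr:P3040 as in the slide's table. Using frbr:P3001 for the title in `$t` is an assumption. The table names and function signatures are hypothetical and are not Marimba's API.

```python
# Rule tables restricted to the cases shown on the slide.
CLASSIFY = {
    frozenset("ad"): "frbr:C1005",    # personal name        -> Person
    frozenset("adt"): "frbr:C1001",   # name + title ($t)    -> Work
}
ANNOTATE = {
    "frbr:C1005": {"a": "frbr:P3039", "d": "frbr:P3040"},
    "frbr:C1001": {"t": "frbr:P3001"},   # assumed title property
}


def classify(heading):
    """Pick the class from the set of subfield codes in the 100 heading."""
    return CLASSIFY.get(frozenset(code for code, _ in heading))


def annotate(subject, heading):
    """Return (subject, predicate, object) triples for one 100 heading."""
    cls = classify(heading)
    if cls is None:
        return []                      # unmapped combination: report it
    triples = [(subject, "rdf:type", cls)]
    for code, value in heading:
        prop = ANNOTATE[cls].get(code)
        if prop:
            triples.append((subject, prop, f'"{value}"'))
    return triples


person = [("a", "Camus, Albert"), ("d", "1913-1960")]
work = [("a", "Camus, Albert"), ("d", "1913-1960"), ("t", "La peste")]
for t in annotate("bne:XX1721208", person) + annotate("bne:XX1910518", work):
    print(*t)
```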
32. Mapping process in more detail
• But what about the relationships between the entities?
• Relationships between records are not explicit in MARC.
Goal: the work "La Peste" was created by Albert Camus.
R1: 001 XX1721208
    100 10 $a Camus, Albert $d 1913-1960
R2*: 001 XX1910518
    100 10 $a Camus, Albert $d 1913-1960 $t La peste
$a and $d are common to both headings; the difference, $t, marks R2 as a Work.
We know the types of R1 and R2, and we look at the heading diff:
bne:XX1721208 frbr:P2010 (isCreatorOf) bne:XX1910518
* http://datos.bne.es/resource/XX1910518
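The heading diff can be sketched as follows: two records are related when one heading repeats all the subfields of the other and adds only a title ($t); the Person record is then the creator of the Work record. The Python below is a simplified illustration under these assumptions (exact string match on the shared subfields, and frbr:P2010 as the isCreatorOf property, as in the SPARQL example later on); `relate` is a hypothetical name. It compares all pairs, so a real run over millions of records would index headings instead.

```python
def relate(records):
    """Yield (person, 'frbr:P2010', work) links found by diffing headings.

    records maps a record id to its 100 heading as {code: value}.
    """
    for pid, p in records.items():
        for wid, w in records.items():
            shared = all(w.get(code) == value for code, value in p.items())
            diff = set(w) - set(p)
            # The work heading repeats the person heading and adds only $t.
            if pid != wid and "t" not in p and shared and diff == {"t"}:
                yield (pid, "frbr:P2010", wid)


records = {
    "bne:XX1721208": {"a": "Camus, Albert", "d": "1913-1960"},
    "bne:XX1910518": {"a": "Camus, Albert", "d": "1913-1960", "t": "La peste"},
}
for link in relate(records):
    print(*link)
```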
33. Marimba: Mapping process summary
MARC records:
001 XX1721208 / 100 10 $a Camus, Albert $d 1913-1960
001 XX1910518 / 100 10 $a Camus, Albert $d 1913-1960 $t La peste
Classify:
  bne:XX1721208 a frbr:Person .
  bne:XX1910518 a frbr:Work .
Annotate:
  bne:XX1721208 a frbr:Person ; frbr:name "Camus, Albert" ; frbr:hasDates "1913-1960" .
  bne:XX1910518 a frbr:Work ; frbr:title "La Peste" .
Relate:
  bne:XX1721208 a frbr:Person ; frbr:name "Camus, Albert" ; frbr:hasDates "1913-1960" ; frbr:isCreatorOf bne:XX1910518 .
  bne:XX1910518 a frbr:Work ; frbr:title "La Peste" ; frbr:isCreatedBy bne:XX1721208 .
34. Marimba uses the ontology to generate RDF
35. Marimba links with other resources: VIAF, DNB, SUDOC, LIBRIS, DBpedia
[Figure: the BNE resource for Cervantes, http://datos.bne.es/resource/XX1718747, linked with “Same As” to:
• DNB: http://d-nb.info/gnd/11851993X
• VIAF: http://viaf.org/viaf/17220427
• DBpedia: http://dbpedia.org/resource/Miguel_de_Cervantes
• SUDOC: http://www.idref.fr/026774771/id
• LIBRIS: http://libris.kb.se/resource/auth/45369]
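Once the equivalences are known, writing them out is straightforward. This Python sketch serializes the “Same As” links from the figure as owl:sameAs statements in N-Triples; it assumes the target URIs have already been found (for instance through VIAF) and does not show how Marimba discovers them. The helper name `same_as_ntriples` is hypothetical.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# URIs from the figure: the BNE authority for Cervantes and its matches.
BNE_CERVANTES = "http://datos.bne.es/resource/XX1718747"
MATCHES = [
    "http://viaf.org/viaf/17220427",
    "http://d-nb.info/gnd/11851993X",
    "http://dbpedia.org/resource/Miguel_de_Cervantes",
    "http://www.idref.fr/026774771/id",
    "http://libris.kb.se/resource/auth/45369",
]


def same_as_ntriples(subject, targets):
    """One N-Triples statement per owl:sameAs link, skipping self-links."""
    return [f"<{subject}> <{OWL_SAME_AS}> <{t}> ."
            for t in targets if t != subject]


print("\n".join(same_as_ntriples(BNE_CERVANTES, MATCHES)))
```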
36. Marimba links with other resources: VIAF, DNB, SUDOC, LIBRIS, DBpedia
37. Publication
Data publication
• Metadata publication using VoID
To facilitate discovery:
• Register your dataset in CKAN
• Use sitemap4rdf to generate the site map
• Upload the site map to Google and Sindice
38. Exploitation
[Figure: the process steps (Specification, Modelling, RDF generation, Links generation, Publication, Exploitation) feed a Web interface through SPARQL queries.]
Example: the number of works of which Cervantes (URI http://datos.bne.es/resource/XX1718747) “is author” (frbrer:P2010):
SELECT (COUNT(DISTINCT ?Obras) AS ?numObras) WHERE {
  <http://datos.bne.es/resource/XX1718747>
      <http://iflastandards.info/ns/fr/frbr/frbrer/P2010> ?Obras .
}
http://bne.linkeddata.es/
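A Web application can send a query like this one over the SPARQL 1.1 Protocol as an HTTP GET request with a `query` parameter. The Python sketch below only builds the request URL and the Accept header; the endpoint URL `http://datos.bne.es/sparql` is an assumption (the slide names only http://bne.linkeddata.es/), and the call that actually sends the request is left as a comment so the example runs offline.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint; replace with the real SPARQL endpoint of the dataset.
ENDPOINT = "http://datos.bne.es/sparql"

QUERY = """SELECT (COUNT(DISTINCT ?Obras) AS ?numObras) WHERE {
  <http://datos.bne.es/resource/XX1718747>
      <http://iflastandards.info/ns/fr/frbr/frbrer/P2010> ?Obras .
}"""


def sparql_get_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL 1.1 Protocol GET request asking for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = sparql_get_request(ENDPOINT, QUERY)
print(req.full_url)
# To run it against a live endpoint:
#   import json, urllib.request
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp)["results"]["bindings"])
```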
42. Technological Support
• Modelling:
  • Open Metadata Registry
  • NeOn Toolkit
• Mapping and generation:
  • MARiMbA: library-oriented; supports and facilitates the entire process of transformation from MARC21 to RDF
• Publication:
  • Virtuoso Universal Server
  • Pubby
  • CKAN registry
  • Sitemap4rdf
• Exploitation:
  • Web applications that visualize data using SPARQL
43. Results: datos.bne.es
• Total number of authority records: 4,100,000
• Total number of bibliographic records: 2,390,140
• Total number of RDF triples: 58,053,215
• Number of links: 587,520 (15% of authorities)
• Linked sources:
  • VIAF
  • SUDOC (French collective university catalogue), France
  • GND (authority file of the German National Library), Germany
  • LIBRIS, Sweden
  • DBpedia
  • Soon: BnF
http://bne.linkeddata.es/
Editor's Notes
The five MARC 21 communication formats (MARC 21 Format for Bibliographic Data, MARC 21 Format for Authority Data, MARC 21 Format for Holdings Data, MARC 21 Format for Classification Data, and MARC 21 Format for Community Information) are widely used standards for the representation and exchange of bibliographic, authority, holdings, classification, and community information data in machine-readable form.
A MARC record is composed of three elements: the record structure, the content designation, and the data content of the record.
• The record structure is an implementation of the international standard Format for Information Exchange (ISO 2709) and its American counterpart, Bibliographic Information Interchange (ANSI/NISO Z39.2).
• The content designation (the codes and conventions established explicitly to identify and further characterize the data elements within a record and to support the manipulation of that data) is defined by each of the MARC formats.
• The content of the data elements that comprise a MARC record is usually defined by standards outside the formats. Examples are the International Standard Bibliographic Description (ISBD), Anglo-American Cataloguing Rules, Library of Congress Subject Headings (LCSH), or other cataloging rules, subject thesauri, and classification schedules used by the organization that creates a record. The content of certain coded data elements is defined in the MARC formats (e.g., the Leader, field 008).
- We use the record heading field and subfield codes. The heading tells us about the entity being described (it has a personal name, it has a title, etc.).
- These codes tell us about the properties of the entity being described (the personal name, the title, etc.).