2. Who am I?
• Postdoc at Cornell Information Science
• Research areas
• linked data
• user-contributed data (annotations)
• (meta-)data interoperability
• Contact:
• bernhard.haslhofer@cornell.edu
3. Today we talk about...
http://www.youtube.com/watch?v=5Cb3ik6zP2I
4. Today we talk about...
• Movies, actors and other real-world entities
• How to make data about these entities
available on the Web (Linked Data)
• Enabling technologies, best-practices and
useful tools that help us in doing so
• Other Linked Data projects (BBC, LoC)
6. The World Wide Web (WWW)
• Internet != WWW != Google != Facebook
• Fundamental technologies
• URI - a simple and generic syntax for identifiers
• HTML - a markup language without formal schema
binding
• HTTP - a simple protocol to access and manipulate
resources and resource representations in a
distributed environment
• World Wide Web Consortium, W3C (http://www.w3.org)
7. URIs
• Identification of resources via Uniform
Resource Identifiers (URIs)
• Generic syntax: a hierarchical sequence of components (scheme,
authority, path, query, and fragment)
URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
Scheme and hier-part are required, though the path may be empty.
• Example URI with components:
  foo://example.com:8042/over/there?name=ferret#nose
  \_/   \______________/\_________/ \_________/ \__/
   |           |            |            |        |
scheme     authority       path        query   fragment
• URLs and URNs are both kinds of URI
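The component breakdown above can be checked with a few lines of Python. This is a minimal sketch using the standard library's urllib.parse.urlsplit; note that Python calls the authority component "netloc". The dictionary name components is only for illustration.

```python
from urllib.parse import urlsplit

# The example URI from the slide.
uri = "foo://example.com:8042/over/there?name=ferret#nose"
parts = urlsplit(uri)

# Map Python's attribute names onto the names used in the generic syntax.
components = {
    "scheme": parts.scheme,
    "authority": parts.netloc,  # Python's name for the authority
    "path": parts.path,
    "query": parts.query,
    "fragment": parts.fragment,
}

for name, value in components.items():
    print(f"{name:10} {value}")
```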
8. URIs / Resources
• Information Resource
• web pages, images, product catalogs, etc
• all their essential characteristics can be conveyed in a
message
• e.g., http://www.flickr.com/user2/photos/image.jpg
• Non-Information Resource
• other things such as dogs, people, this classroom, concepts
• their essence is not information
• e.g., http://www.example.com/ontology/meter
9. HTTP
• A stateless request-response protocol in the
client-server computing model
• HTTP methods: GET, POST, PUT, DELETE, ...
• Agents may use a URI to access the
referenced resource = dereferencing the URI
10. HTTP Content Negotiation
• A URI is not (necessarily) a filename
• Conneg = making available multiple resource
representations via the same URI
Resource (URI): http://example.com/The_Shining
• Plain Text - text/plain
• HTML (en) - text/html
• HTML (jp) - text/html
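How a server might choose among representations can be sketched as follows. This is a simplified illustration, not a complete implementation of HTTP content negotiation: it only looks at the Accept header, ignores partial wildcards such as text/* and the language dimension (en / jp), and the function names parse_accept and negotiate are made up for this example.

```python
def parse_accept(header):
    """Return (media_type, q) pairs, highest preference first."""
    prefs = []
    for item in header.split(","):
        fields = [f.strip() for f in item.split(";")]
        if not fields[0]:
            continue  # skip empty entries such as a trailing comma
        q = 1.0  # default quality when no q parameter is given
        for field in fields[1:]:
            if field.startswith("q="):
                q = float(field[2:])
        prefs.append((fields[0], q))
    # sorted() is stable, so entries with equal q keep their header order
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def negotiate(accept_header, available):
    """Pick the media type to serve, or None if nothing is acceptable."""
    prefs = parse_accept(accept_header)
    rejected = {media_type for media_type, q in prefs if q == 0}
    for media_type, q in prefs:
        if q == 0:
            continue  # q=0 means "not acceptable"
        if media_type == "*/*":
            for candidate in available:
                if candidate not in rejected:
                    return candidate
        elif media_type in available:
            return media_type
    return None


# Media types offered for http://example.com/The_Shining on the slide.
available = ["text/html", "text/plain"]
print(negotiate("text/plain, text/html;q=0.5", available))
print(negotiate("image/png", available))
```

The same URI thus yields different representations depending on what the client says it can handle.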
11. (X)HTML(5)
• A resource representation data format...
• ... for presentation markup
• rendered by user agents (typically browsers)
• focus on readability
• less formal, user-friendly syntax and semantics
12. Web Services
• Application-to-application communication
based on the Web architecture
• simple and open standards (HTTP, XML, JSON, ...)
• send data from Application A to Application B
through the Web
• usually define some API
Application A <-> Web <-> Application B
17. Why Linked Data?
• There is lots of information on the Web
• ...valuable information that can be (re-)used
• Problem
• information is usually expressed in the form of
HTML documents
• the underlying raw data are locked in closed data
silos (mostly DBMS)
19. Why Linked Data?
• The Web is successful because it provides
• Uniform encoding (HTML)
• Uniform addressing (URI)
• Uniform transportation (HTTP)
for the exchange of documents.
• Why not apply the same mechanism to the
underlying data?
22. What is Linked Data?
• A method to build a Web of Data
• Architectural style, set of standards
23. What is Linked Data?
• A set of four principles
• use URIs as names for things
• use HTTP URIs so that people can look up those
names
• when someone looks up a URI, provide useful
information, using the standards (RDF, SPARQL)
• include links to other URIs, so that they can
discover more things
25. Uniform Resource Identifiers (URI)
• Name and identify things (resources)
• Dereferenceable HTTP URIs
• http://dbpedia.org/resource/The_Shining_(film)
• http://data.linkedmdb.org/resource/film/2014
• http://rdf.freebase.com/ns/m/04fjzv
26. Resource Description Framework (RDF)
• A model for representing data on the Web
• Several statements (triples) form a graph
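Example graph (one triple per line):
<http://dbpedia.org/resource/The_Shining_(film)> rdf:type <http://dbpedia.org/ontology/Film>
<http://dbpedia.org/resource/The_Shining_(film)> rdfs:label "The Shining (film)"
<http://dbpedia.org/resource/The_Shining_(film)> dbpprop:starring <http://dbpedia.org/resource/Jack_Nicholson>
<http://dbpedia.org/resource/Jack_Nicholson> rdf:type <http://xmlns.com/foaf/0.1/Person>
<http://dbpedia.org/resource/Jack_Nicholson> foaf:name "Jack Nicholson"
<http://dbpedia.org/resource/Jack_Nicholson> dbpedia-owl:birthDate "1937-04-22"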
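To make the "triples form a graph" idea concrete, here is a minimal sketch that stores the statements from the slide as Python tuples and follows one edge. It is not an RDF library: URIs and literals are both plain strings here, the helper match is hypothetical, and the assignment of each label to its subject is read off the diagram.

```python
# Namespace URIs for the prefixes used on the slide.
DBR = "http://dbpedia.org/resource/"
DBO = "http://dbpedia.org/ontology/"   # dbpedia-owl:
DBP = "http://dbpedia.org/property/"   # dbpprop:
FOAF = "http://xmlns.com/foaf/0.1/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

FILM = DBR + "The_Shining_(film)"
ACTOR = DBR + "Jack_Nicholson"

# Each statement is a (subject, predicate, object) triple.
graph = [
    (FILM, RDF + "type", DBO + "Film"),
    (FILM, RDFS + "label", "The Shining (film)"),
    (FILM, DBP + "starring", ACTOR),
    (ACTOR, RDF + "type", FOAF + "Person"),
    (ACTOR, FOAF + "name", "Jack Nicholson"),
    (ACTOR, DBO + "birthDate", "1937-04-22"),
]


def match(triples, s=None, p=None, o=None):
    """Return all triples that agree with the given (optional) terms."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Follow the starring edge, then look up each actor's name.
actors = [o for _, _, o in match(graph, s=FILM, p=DBP + "starring")]
names = [o for a in actors for _, _, o in match(graph, s=a, p=FOAF + "name")]
print(names)
```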
28. RDF Vocabulary Description Language (RDFS)
• A language for describing the syntax and
semantics of vocabularies in a machine-
understandable way
<http://dbpedia.org/ontology/Film> rdfs:subClassOf <http://dbpedia.org/ontology/Work>
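What a machine can do with an rdfs:subClassOf statement can be sketched as follows: anything typed as a Film is also a Work. This is a toy illustration of subclass inference under the assumption that the hierarchy fits in a dictionary; the function names are hypothetical and no RDFS reasoner is involved.

```python
DBO = "http://dbpedia.org/ontology/"

# rdfs:subClassOf statements, keyed by subclass (the one from the slide).
SUBCLASS_OF = {DBO + "Film": [DBO + "Work"]}


def superclasses(cls, subclass_of):
    """All classes reachable from cls via rdfs:subClassOf (transitive)."""
    seen = []
    stack = list(subclass_of.get(cls, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.append(parent)
            stack.extend(subclass_of.get(parent, []))
    return seen


def inferred_types(asserted, subclass_of):
    """Asserted types plus every superclass they imply."""
    types = list(asserted)
    for cls in asserted:
        for parent in superclasses(cls, subclass_of):
            if parent not in types:
                types.append(parent)
    return types


# The Shining is asserted to be a Film; Work follows from the vocabulary.
types = inferred_types([DBO + "Film"], SUBCLASS_OF)
print(types)
```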
29. OWL - Web Ontology Language
• A more expressive (formal) language for defining the
syntax and semantics of vocabularies
• Solves RDFS shortcomings but introduces quite some
complexity
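<http://dbpedia.org/ontology/starring> rdf:type <http://www.w3.org/2002/07/owl#ObjectProperty>
<http://dbpedia.org/ontology/starring> rdfs:domain <http://dbpedia.org/ontology/Work>
<http://dbpedia.org/ontology/starring> rdfs:range <http://dbpedia.org/ontology/Person>
<http://dbpedia.org/ontology/starring> rdfs:label "starring"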
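The domain and range declarations on this slide let software derive types from plain data statements. The sketch below applies that rule by hand; the single data triple is an assumption added for illustration, and the helper infer_types is hypothetical rather than part of any reasoner.

```python
DBO = "http://dbpedia.org/ontology/"
DBR = "http://dbpedia.org/resource/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Property declarations from the slide: starring links a Work to a Person.
schema = {
    DBO + "starring": {"domain": DBO + "Work", "range": DBO + "Person"},
}

# Assumed example data: one statement that uses the property.
data = [
    (DBR + "The_Shining_(film)", DBO + "starring", DBR + "Jack_Nicholson"),
]


def infer_types(triples, schema):
    """Derive rdf:type triples from rdfs:domain and rdfs:range."""
    inferred = set()
    for s, p, o in triples:
        rule = schema.get(p)
        if rule is None:
            continue  # nothing declared for this property
        inferred.add((s, RDF_TYPE, rule["domain"]))  # subject is in the domain
        inferred.add((o, RDF_TYPE, rule["range"]))   # object is in the range
    return inferred


inferred = infer_types(data, schema)
for triple in sorted(inferred):
    print(triple)
```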
30. Simple Knowledge Organization System (SKOS)
• A language for describing controlled vocabularies
(taxonomies, thesauri, classification schemes)
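<http://dbpedia.org/resource/The_Shining_(film)> skos:subject <http://dbpedia.org/resource/Category:1980s_horror_films>
<http://dbpedia.org/resource/Category:1980s_horror_films> rdf:type <http://www.w3.org/2004/02/skos/core#Concept>
<http://dbpedia.org/resource/Category:1980s_horror_films> skos:broader <http://dbpedia.org/resource/Category:1980s_films>
<http://dbpedia.org/resource/Category:1980s_films> rdf:type <http://www.w3.org/2004/02/skos/core#Concept>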
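One practical use of skos:broader is finding everything filed under a category or any of its narrower categories. A minimal sketch under simplifying assumptions: each concept has at most one broader concept (SKOS itself allows several), and the helper names are hypothetical.

```python
DBR = "http://dbpedia.org/resource/"
HORROR = DBR + "Category:1980s_horror_films"
FILMS = DBR + "Category:1980s_films"

# skos:broader links from the slide (narrower concept -> broader concept).
broader = {HORROR: FILMS}

# skos:subject links from the slide (resource -> its categories).
subject = {DBR + "The_Shining_(film)": [HORROR]}


def is_under(category, target, broader):
    """True if category is target or lies below it in the hierarchy."""
    seen = set()
    while category is not None and category not in seen:
        if category == target:
            return True
        seen.add(category)  # guards against cycles in the data
        category = broader.get(category)
    return False


def resources_under(target, subject, broader):
    """Resources whose subject is target or one of its narrower concepts."""
    return [
        resource
        for resource, categories in subject.items()
        if any(is_under(c, target, broader) for c in categories)
    ]


# The Shining is only tagged as a 1980s horror film, yet it is found
# when asking for 1980s films in general.
found = resources_under(FILMS, subject, broader)
print(found)
```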
32. SPARQL
• A query language and protocol for accessing
RDF data on the Web
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?x
WHERE { ?x skos:subject <http://dbpedia.org/resource/Category:1980s_horror_films> }
LIMIT 10
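The "protocol" half of SPARQL means a query travels to an endpoint as an ordinary HTTP request, with the query text in a parameter named query. The sketch below only builds the request URL and sends nothing. The endpoint address http://dbpedia.org/sparql is an assumption, and the query keeps the slide's skos:subject property, which a live endpoint may no longer use.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://dbpedia.org/sparql"  # assumed endpoint address

QUERY = """\
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?x
WHERE { ?x skos:subject <http://dbpedia.org/resource/Category:1980s_horror_films> }
LIMIT 10
"""


def sparql_get_url(endpoint, query):
    """Build the URL for sending a SPARQL query via HTTP GET."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)
print(url)

# Decoding the URL again recovers the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```

Fetching this URL with any HTTP client (for example curl) would submit the query to the endpoint.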
34. Publishing Vocabularies
• Hash-based URIs
• e.g., http://example.com/example1#ClassA
• Suited to group the description of a moderate number of
related terms into one RDF document
• Agent can retrieve terms with a single request
• Slash-based URIs
• e.g., http://example.com/example1/ClassB
• Suited to split terms in large vocabularies into one
document per term
• No need to download a massive document
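The practical difference between the two styles comes from how clients treat the fragment: the part after "#" is stripped before the HTTP request is made, so every hash-based term in a vocabulary resolves to the same document. A small sketch using the standard library's urllib.parse.urldefrag; the function name document_to_fetch is made up for this example.

```python
from urllib.parse import urldefrag


def document_to_fetch(term_uri):
    """Return (URL actually requested, fragment kept on the client side)."""
    url, fragment = urldefrag(term_uri)
    return url, fragment


hash_term = "http://example.com/example1#ClassA"
slash_term = "http://example.com/example1/ClassB"

# Hash URI: the whole vocabulary document is requested once.
print(document_to_fetch(hash_term))

# Slash URI: the term has its own URL and its own document.
print(document_to_fetch(slash_term))
```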
39. Publishing Data
• Distinguish between non-information and
information resource
• Sample non-information resource
• http://dbpedia.org/resource/The_Shining_(film)
• Sample information resource
• http://dbpedia.org/page/The_Shining_(film) - HTML
• http://dbpedia.org/data/The_Shining_(film) - RDF
40. Publishing Data
GET http://dbpedia.org/resource/The_Shining_(film)
Accept: application/rdf+xml
303 See Other
Location: http://dbpedia.org/data/The_Shining_(film)
GET http://dbpedia.org/data/The_Shining_(film)
Accept: application/rdf+xml
200 OK
...
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF ...
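The exchange above follows a simple rule: a non-information resource cannot be sent over the wire, so the server answers 303 See Other and points to a document about it. The following toy model of that decision logic is an illustration only, not DBpedia's actual implementation; the function respond is hypothetical and its Accept handling is a naive substring check.

```python
RESOURCE_PREFIX = "http://dbpedia.org/resource/"  # non-information resources
DATA_PREFIX = "http://dbpedia.org/data/"          # RDF documents
PAGE_PREFIX = "http://dbpedia.org/page/"          # HTML documents


def respond(uri, accept):
    """Return (status, headers) for a GET on uri with the given Accept."""
    if uri.startswith(RESOURCE_PREFIX):
        name = uri[len(RESOURCE_PREFIX):]
        # Redirect to the document that matches what the client asked for.
        wants_rdf = "application/rdf+xml" in accept
        target = DATA_PREFIX if wants_rdf else PAGE_PREFIX
        return 303, {"Location": target + name}
    if uri.startswith(DATA_PREFIX):
        return 200, {"Content-Type": "application/rdf+xml"}
    if uri.startswith(PAGE_PREFIX):
        return 200, {"Content-Type": "text/html"}
    return 404, {}


# Step 1: ask for the thing itself and receive a redirect.
status, headers = respond(
    "http://dbpedia.org/resource/The_Shining_(film)", "application/rdf+xml"
)
print(status, headers["Location"])

# Step 2: follow the redirect and receive the RDF document.
status2, headers2 = respond(headers["Location"], "application/rdf+xml")
print(status2, headers2["Content-Type"])
```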
42. Linking? Open? Data Project?
• Open Data: a philosophy, practice, or policy holding that data
should be freely available to everyone, without restrictions from
copyright, patents, and so on
• Linked Data: method / best practices for exposing, sharing,
and connecting data using URIs and RDF
• Linking Open Data: a W3C community project with the
goal to extend the Web with a data commons by publishing
various open data sets as RDF on the Web and by setting
links between data items from different sources
59. RDF / Linked Data Wrappers
• D2RQ - SPARQL / Linked Data for relational
databases (http://www4.wiwiss.fu-berlin.de/bizer/d2rq/)
• OAI2LOD Server - expose any OAI-PMH
source as Linked Data
• TripFS - filesystem as Linked Data
• TripCel - XLS spreadsheets as Linked Data
• ...
60. Linked Data debugging
Start up your console / terminal
- native on Linux / Mac OS X
- Windows: http://www.cygwin.com/
Dereference resources with cURL (http://curl.haxx.se/)
curl -I -H "Accept: application/rdf+xml" http://dbpedia.org/resource/The_Shining_%28film%29
curl -H "Accept: application/rdf+xml" http://dbpedia.org/data/The_Shining_%28film%29
61. Linked Data debugging
Install the Raptor RDF Syntax Library (http://librdf.org/raptor/)
- Mac: brew install raptor
Use the rapper utility to dereference URIs
rapper http://dbpedia.org/resource/The_Shining_%28film%29
rapper -o rdfxml http://dbpedia.org/resource/The_Shining_%28film%29
63. Required Reading
• T. Heath, C. Bizer. Linked Data: Evolving the Web into a
Global Data Space, Chapters 1-5
http://linkeddatabook.com/editions/1.0/
64. Recommended Readings
• Linked Data Web Site: http://linkeddata.org
• Linked Data / Semantic Web Introduction:
http://www.linkeddatatools.com/semantic-web-basics
• Tim Berners-Lee. Linked Data Design Issues:
http://www.w3.org/DesignIssues/LinkedData.html
• Best Practice Recipes for Publishing RDF Vocabularies:
http://www.w3.org/TR/swbp-vocab-pub/
• How to Publish Linked Data on the Web:
http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/