This document provides an introduction to linked open data for museums. It discusses limitations of keyword searching and defines linked open data and its core principles. It also covers topics such as copyright and licensing, the RDF data model, triples, questions answered by RDF, core vocabularies, ontologies, finding links between data, embedding schema.org metadata, and managing linked open data. Exercises are provided to help readers practice converting data to RDF triples and linking data to external vocabularies.
3. Limitations of Keyword Searching
Polysemy: One word with multiple meanings. E.g.
man
crane
bank
Synonymy: Multiple words with the same meaning.
buy OR purchase
create OR make
eliminate OR remove OR abolish
Signal-to-noise ratio, e.g.
Try searching for the term “Mississippi”
4. What is Linked Open Data?
On the web, open license
Machine-readable data
Non-proprietary format
RDF Format
Linked RDF
5. Copyright and Licensing
If your content files are still under copyright and your institution is the
copyright owner, encourage your institution to license the content as
openly as possible.
CC0
CC-BY
CC-BY-SA
CC-BY-NC
6. What is RDF?
• “Resource Description Framework (RDF) is a standard model for data
interchange on the Web. RDF has features that facilitate data merging
even if the underlying schemas differ, and it specifically supports the
evolution of schemas over time without requiring all the data
consumers to be changed.” (from W3C)
• “…making Statements about resources (in particular web resources)
in the form of subject-predicate-object expressions.” (Wikipedia)
7. What are Triples?
• Triples are statements of fact (or assertions) composed of a
subject, predicate, and object. For example:
“David Henry”
Subject
“Lives in”
Predicate
“St. Louis”
Object
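The statement above can be sketched in Python as a three-field record. The class name `Triple` and the use of plain strings are illustrative assumptions, not part of any RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement of fact: subject, predicate, object."""
    subject: str
    predicate: str
    object: str


# The example statement from the slide, held as plain strings.
statement = Triple("David Henry", "Lives in", "St. Louis")

# A set of triples forms a graph; a set also drops exact duplicates.
graph = {statement, Triple("David Henry", "Lives in", "St. Louis")}

print(statement.subject, "|", statement.predicate, "|", statement.object)
print(len(graph))
```

Asserting the same fact twice does not grow the graph, which is one reason triples from different sources can be merged safely.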
8. What are Questions Answered by RDF?
Fact-Based
Interpretive
Theoretical
Subjective
Analytical
9. Fact-Based Questions ask Who, What, When, Where (Not so much Why)
Who directed “Citizen Kane”?
What’s a daguerreotype?
Where did Van Gogh paint ‘Starry Night’?
10. Fact Based Question:
Are there any daguerreotypes of the
Mississippian mounds in St. Louis, Missouri?
Title: Group of people standing on a partially destroyed Big
Mound.
Description: Group of people standing on a partially destroyed Big
Mound.
Place: St. Louis, Missouri
Dates: 1869
Type(s): photo, Daguerreotype
Maker/Creator: Thomas M. Easterly
Subjects: Mississippian Culture, mounds
Identifier: PHO:17665
Permalink: http://collections.mohistory.org/resource/9952
12. “Thomas M. Easterly”
Name: Thomas M. Easterly
Birth Date: October 3, 1809
Death Date: March 12, 1882
Places of Residence:
Guilford, Vermont
Liberty, Missouri
St. Louis, Missouri
Bio: Thomas M. Easterly was one of the leading
American Daguerreotypists ….
During the 1860s, improvements in photographic development
caused daguerreotypes to fall out of fashion. Easterly refused
to acknowledge these changes, believing the highly detailed
daguerreotypes were far superior in terms of beauty and
permanence, and urged the public to "save your old daguerreotypes for
you will never see their like again".
13. Exercise 1.
Time: 10-15 minutes Activity:
• Break into groups of 2-3.
• Write out one or more research questions.
• For each question, draw an entity-relationship graph
that could provide an answer to the question
15. What is a Uniform Resource Identifier?
Uniform Resource Locator (URL)
Purpose: To locate a web resource (document)
Uniform Resource Name (URN)
Purpose: To identify any resource
In Linked Open Data, URIs act as both URLs and URNs
16. Principles of Linked Data
• Use URIs to denote things.
• Use HTTP URIs so that these things can be referred to and
looked up ("dereferenced") by people and user agents.
• Provide useful information about the thing when its URI is
dereferenced, leveraging standards such as RDF, SPARQL.
• Include links to other related things (using their URIs) when
publishing data on the Web.
To make this happen subjects and predicates MUST be defined
by URIs. Objects may be URIs or literals.
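The last rule can be sketched as a small check in Python. The function names and the predicate URI `http://example.org/ns/hasType` are hypothetical; the subject is the permalink from the record shown earlier.

```python
from typing import List
from urllib.parse import urlparse


def is_http_uri(value: str) -> bool:
    """Return True when value parses as an absolute http(s) URI."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def check_triple(subject: str, predicate: str, obj: str) -> List[str]:
    """List the problems found; an empty list means the triple passes."""
    problems = []
    if not is_http_uri(subject):
        problems.append("subject is not an HTTP URI")
    if not is_http_uri(predicate):
        problems.append("predicate is not an HTTP URI")
    # Objects may be URIs or literals, so any string is accepted here.
    return problems


linked = check_triple(
    "http://collections.mohistory.org/resource/9952",
    "http://example.org/ns/hasType",  # hypothetical predicate URI
    "Daguerreotype",
)
plain = check_triple("David Henry", "Lives in", "St. Louis")
print(linked)
print(plain)
```

The plain-text triple from the earlier slide fails both checks, which is the gap that minting URIs is meant to close.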
17. Triples to Complex Graphs
[Graph diagram]
Resources: Resource:9952, Resource:92142, ns1:Subject_91011, ns1:type_80345
Literals: “Thomas M. Easterly”, “1839”, “Mississippian Culture”, “Daguerreotype”
Predicates: nso:hasSubject, nso:hasLabel, nso:hasType
19. What two words are most commonly found in a browser window?
Web links have a half life of about ten years.
In other words, 50% of links that are 10 years
old are broken.
22. Rules for persistent URIs
Cool URIs
• No date context
• No ownership context
• No technology context
• Re-use existing identifiers
• Link multiple representations
• Implement 303 redirects for
real world objects
Not Cool URIs
• Avoid stating ownership
• Avoid version numbers
• Avoid query strings
• Avoid file extensions
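Some of these rules can be checked mechanically. The sketch below is a set of rough heuristics, not a standard test: the function name, the regular expressions, and the `example.org` URI are assumptions, and rules such as ownership or technology context are not covered.

```python
import re
from typing import List
from urllib.parse import urlparse


def uri_warnings(uri: str) -> List[str]:
    """Flag features that the rules above advise against (heuristics only)."""
    parts = urlparse(uri)
    segments = [s for s in parts.path.split("/") if s]
    warnings = []
    if parts.query:
        warnings.append("query string")
    # A dot-suffix on the last path segment is treated as a file extension.
    if segments and re.search(r"\.[A-Za-z0-9]+$", segments[-1]):
        warnings.append("file extension")
    # Segments such as v2 or v1.0 are treated as version numbers.
    if any(re.fullmatch(r"v\d+(\.\d+)*", s, re.IGNORECASE) for s in segments):
        warnings.append("version number")
    # Four-digit segments from 1900 to 2099 are treated as date context.
    if any(re.fullmatch(r"(19|20)\d\d", s) for s in segments):
        warnings.append("date context")
    return warnings


cool = uri_warnings("http://collections.mohistory.org/resource/9952")
not_cool = uri_warnings("http://example.org/2014/v2/item.php?id=7")
print(cool)
print(not_cool)
```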
27. Exercise 2.
Time: 15 minutes Activity:
• Break into groups of 2-3.
• Using the graph defined in Exercise 1, define a set of
triples from the graph (Use your own URIs)
• Use the RDF validator at
http://www.rdfabout.com/demo/validator/
34. Example using CRM Core
[Graph diagram: Rodin and the “Monument to Balzac” modelled with CIDOC CRM]
Entities:
• E12 Production: Rodin making “Monument to Balzac” in 1898; Bronze casting “Monument to Balzac” in 1925
• E21 Person: Rodin Auguste; Honoré de Balzac
• E40 Legal Body: Rudier (Vve Alexis) et Fils
• E52 Time-Span: 1840; 1898; 1917; 1925
• E53 Place: France (nation)
• E55 Type: sculptors; companies; plaster; bronze
• E67 Birth: Rodin’s birth
• E69 Death: Rodin’s death
• E84 Information Carrier: The “Monument to Balzac” (plaster); The “Monument to Balzac” (S1296)
Properties:
• P2 has type
• P4 has time-span
• P7 took place at
• P14 carried out by
• P16B was used for
• P62 depicts
• P98B was born
• P100B died in
• P108B was produced by
• P120B occurs after
• P134 continued
35. Implementing Linked Open Data
Link existing data
• Low barrier to entry
• Controlled lists and
thesauri
• Not very descriptive
Manage data to fit an ontology
• High barrier to entry
• Ontologies
• Very descriptive
RDF facilitates the “evolution of schemas over time”
38. Finding Links
• Linked Open Vocabularies is a good starting point
• Other well-used sources include:
• DBPedia – for a wide range of types (people, places, subjects,
concepts)
• id.loc.gov – for name authorities and subjects
• viaf.org – for name authorities
• geonames.org – for geographic locations
Problem: There are no universal vocabularies
39. A Note of Caution
When re-using existing URIs, be sure to use the URI that represents
the entity (thing/concept/person) and not the web resource.
For example:
http://id.loc.gov/authorities/subjects/sh85126887.html
Is NOT the same as:
http://id.loc.gov/authorities/subjects/sh85126887
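A minimal sketch of this caution in Python follows. It assumes the publisher uses the pattern seen above, where the document URL is the entity URI plus a format suffix; the function name and the suffix list are hypothetical, and not every publisher follows this pattern.

```python
from urllib.parse import urlparse, urlunparse

# Assumed list of suffixes that mark a document rather than an entity.
DOCUMENT_SUFFIXES = (".html", ".rdf", ".json", ".nt", ".ttl")


def entity_uri(uri: str) -> str:
    """Drop a trailing document suffix so the URI names the entity."""
    parts = urlparse(uri)
    path = parts.path
    for suffix in DOCUMENT_SUFFIXES:
        if path.lower().endswith(suffix):
            path = path[: -len(suffix)]
            break
    return urlunparse(parts._replace(path=path))


page = "http://id.loc.gov/authorities/subjects/sh85126887.html"
print(entity_uri(page))
```

A URI that already names the entity passes through unchanged, so the function can be applied to a whole list of candidate links.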
44. Exercise 3.
Time: 15-20 minutes
Activity:
• Break into groups of 2-3.
• Using the triples you defined in Exercise 2, find existing URIs
to link with your local URIs.
• Be prepared to explain why you chose the URIs you chose.
45. How Tos
• Embed schema.org data in a web page
• Publish static RDF files
• Manage local vocabularies and align them with existing vocabularies
• Contribute to a collection aggregator, e.g. Europeana or DPLA
• Publish existing database records as RDF
• Manage RDF data in a triple (or quad) store
46. Embedding schema.org
<div itemscope itemtype="http://schema.org/CreativeWork">
<img src="http://collections.mohistory.org/resource/16679.jpg" class="item_image"
width="300" itemprop="image" />
<div id="record_detail">
<p><b>Title:</b> <span itemprop="name">Lord Fitzwilliam and manservant, hunting
on the Hunt Farm on Gravois Road.</span></p>
<p><b>Description:</b> <span itemprop="description"></span></p>
<p><b>Item:</b> <span itemprop="additionalType">Daguerreotype</span></p>
<p><b>Dates:</b> <span itemprop="dateCreated">1855 to 1865</span></p> .
Copy and paste entire text
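To see what a machine can read back from markup like this, the sketch below collects `itemprop` values with Python's standard `html.parser`. The class name is hypothetical, the sample markup is a shortened and closed version of the slide's, and only flat, non-nested properties are handled.

```python
from html.parser import HTMLParser


class ItemPropParser(HTMLParser):
    """Collect itemprop name/value pairs from microdata markup."""

    def __init__(self):
        super().__init__()
        self.properties = {}
        self._current = None  # itemprop whose text is being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        if prop is None:
            return
        if tag == "img":
            # For images the value is the src attribute, not text.
            self.properties[prop] = attrs.get("src", "")
        else:
            self._current = prop
            self.properties[prop] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.properties[self._current] += data

    def handle_endtag(self, tag):
        # Any closing tag ends the current property (no nesting support).
        self._current = None


sample = """
<div itemscope itemtype="http://schema.org/CreativeWork">
  <img src="http://collections.mohistory.org/resource/16679.jpg" itemprop="image" />
  <span itemprop="additionalType">Daguerreotype</span>
  <span itemprop="dateCreated">1855 to 1865</span>
</div>
"""

parser = ItemPropParser()
parser.feed(sample)
for name, value in parser.properties.items():
    print(name, "=", value)
```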
47. Publish static RDF files
• RDF files can be hand-written (what fun!) or rendered using templates
• Paths to RDF files can be submitted to RDF search engines such as
Sindice (http://sindice.com)
• Caution: Some content negotiation would be required.
• Remember: http://mydomain.org/resource/1234.rdf is NOT the same as
http://mydomain.org/resource/1234
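The content negotiation mentioned above can be sketched as a pure function that maps a request's Accept header to a redirect. This is a simplified illustration: the function name and the suffix table are assumptions, q-values are ignored, and the first recognised media type wins.

```python
from typing import Tuple

# Hypothetical mapping from media type to the suffix of the document URL.
SUFFIX_BY_MEDIA_TYPE = {
    "application/rdf+xml": ".rdf",
    "text/html": ".html",
}


def negotiate(resource_uri: str, accept_header: str) -> Tuple[int, str]:
    """Return (status, location) for a request to an entity URI."""
    for item in accept_header.split(","):
        media_type = item.split(";")[0].strip().lower()
        suffix = SUFFIX_BY_MEDIA_TYPE.get(media_type)
        if suffix:
            # 303 See Other: the entity itself is not a document.
            return 303, resource_uri + suffix
    # Fall back to the HTML page when nothing in Accept matches.
    return 303, resource_uri + ".html"


rdf_client = negotiate("http://mydomain.org/resource/1234", "application/rdf+xml")
browser = negotiate("http://mydomain.org/resource/1234",
                    "text/html,application/xhtml+xml;q=0.9")
print(rdf_client)
print(browser)
```

The entity URI stays the same for every client; only the document it redirects to changes.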
48. Manage local vocabularies and align them with existing vocabularies
Tools include:
PoolParty
Tematres
Karma
49. Contributing to a collection aggregator (e.g. Europeana or DPLA)
Service Hub
• Dataset A
• Dataset B
• Dataset C
Service Hub
• Dataset 1
• Dataset 2
• Dataset 3
Content Hub
• Dataset X
• Dataset Y
• Dataset Z
51. Managing RDF data in a triple (or quad) store
• Quad = triple + context
• Most stores feature a SPARQL interface to query across
all triples (quads) in a repository
• Tools:
• Sesame – from OpenRDF
• Virtuoso
• Mulgara
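The idea of "quad = triple + context" and of querying across a repository can be sketched with a toy in-memory store in Python. This is not how the tools above are implemented and it is not SPARQL; the class name, the `ex:` identifiers, and the wildcard matching are all illustrative assumptions.

```python
from typing import List, Optional, Tuple

Quad = Tuple[str, str, str, str]  # subject, predicate, object, context


class QuadStore:
    """A toy in-memory store; real stores add indexing and SPARQL."""

    def __init__(self) -> None:
        self.quads: List[Quad] = []

    def add(self, s: str, p: str, o: str, context: str) -> None:
        self.quads.append((s, p, o, context))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None,
              context: Optional[str] = None) -> List[Quad]:
        """Return quads matching the pattern; None acts as a wildcard."""
        pattern = (s, p, o, context)
        return [
            quad for quad in self.quads
            if all(want is None or want == got
                   for want, got in zip(pattern, quad))
        ]


store = QuadStore()
store.add("ex:resource/9952", "ex:hasType", "Daguerreotype", "ex:graph/museum")
store.add("ex:resource/9952", "ex:hasSubject", "mounds", "ex:graph/museum")
store.add("ex:resource/9952", "ex:hasType", "photo", "ex:graph/import")

museum_types = store.match(p="ex:hasType", context="ex:graph/museum")
print(museum_types)
```

The context field records where a statement came from, so the same question can be asked of one source or of all of them.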
Below are some open data options from the Creative Commons. These are listed from the least restrictive at the top of this slide to the most restrictive at the bottom of this slide.
CC0 is when a copyright owner waives their rights and dedicates the work to the public domain.
CC-BY is when the only requirement is attribution to the owner when reusing.
CC-BY-SA adds the additional criterion that others share alike under the same terms.
CC-BY-NC further restricts re-use to non-commercial uses only. I put this in red because some open data purists believe that a non-commercial restriction does not qualify for open content status.
RDF can be written in various formats, including RDF/XML, N-Triples, Turtle, and JSON-LD.
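As an illustration of the simplest of these formats, the sketch below writes one statement as an N-Triples line in Python. The function name is hypothetical, the escaping covers only backslashes, double quotes, and newlines, and the pairing of the record permalink with the Dublin Core `creator` property is an assumption made for the example.

```python
def to_ntriples(subject: str, predicate: str, obj: str,
                obj_is_uri: bool = False) -> str:
    """Format one statement as a single N-Triples line."""
    if obj_is_uri:
        obj_term = f"<{obj}>"
    else:
        # Escape the characters that would break a quoted literal.
        escaped = (obj.replace("\\", "\\\\")
                      .replace('"', '\\"')
                      .replace("\n", "\\n"))
        obj_term = f'"{escaped}"'
    return f"<{subject}> <{predicate}> {obj_term} ."


line = to_ntriples(
    "http://collections.mohistory.org/resource/9952",
    "http://purl.org/dc/elements/1.1/creator",
    "Thomas M. Easterly",
)
print(line)
```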
See http://data.nytimes.com/77498966567276420453 for an example of crosslinking “Joan Baez”. The linkage uses the owl:sameAs predicate to link the URI for Joan Baez at the New York Times with the URI at DBPedia.
Add FOAF
DCMI Types: http://dublincore.org/documents/2000/07/11/dcmi-type-vocabulary/
Library of Congress Subject Headings: http://id.loc.gov/authorities/subjects/sh85086237.html
CIDOC CRM: http://www.cidoc-crm.org/docs/cidoc_crm_version_5.1.2.pdf
Is there a URI for the type "Daguerreotype"?
1) Try Linked Open Vocabularies.
result: Nothing for "Daguerreotype"
ref: http://lov.okfn.org/dataset/lov/search/#s=Daguerreotype
result2: Many hits for "photograph"
ref: http://lov.okfn.org/dataset/lov/search/#s=photograph -- could use http://schema.org/photograph as a broad match
2) Try the LOC Linked Data Service.
result: Subject result for "Daguerreotype"
ref: http://id.loc.gov/search/?q=Daguerreotype&q=
result2: after filtering by the TGM, found http://id.loc.gov/vocabulary/graphicMaterials/tgm002852.html -- good result: well used vocabulary; fits within a hierarchy
3) Try DBPedia.
result: found a dbpedia resource
ref: http://lookup.dbpedia.org/api/search.asmx/PrefixSearch?QueryClass=&MaxHits=5&QueryString=daguerreotype