Short presentation on challenges encountered in publishing EAD data as Linked Data in LOCAH and Linking Lives projects.
Archives & Linked Data meeting, JISC, London, Tuesday 7 February 2012
From EAD to Linked Data: (still) a work in progress
1. From EAD to Linked Data:
(still) a work in progress
Archives & Linked Data meeting,
JISC London, 7 Feb 2012
Pete Johnston
Technical Researcher, Eduserv
pete.johnston@eduserv.org.uk
2. How?
• Model our “world”
• Design URI patterns
• Select/create RDF vocabularies
• Design mapping of existing data to RDF
• Convert/transform data
• Generate links
• Publish/expose data
• Maintain/sustain
3. [Diagram: data model]
• Entities: Finding Aid, Repository (Agent), Place, Postcode Unit, EAD Document, Biographical History, Level, Language, Archival Resource, Creation, Temporal Entity, Extent, Agent, Concept, Concept Scheme, Object, Person, Family, Organisation, Book, Birth, Death, Genre, Function
• Relationships: maintainedBy/maintains, administeredBy/administers, in, hasPart/partOf, encodedAs/encodes, accessProvidedBy/providesAccessTo, hasBiogHist/isBiogHistFor, topic/page, level, language, origination, product of, at time, associatedWith, extent, inScheme, representedBy, foaf:focus, is-a, participates in
4. [Diagram: data model, showing a subset of the entities]
• Entities: Finding Aid, Repository (Agent), Place, Archival Resource, Agent, Concept, Concept Scheme, Book, Person, Family, Organisation, Genre, Function
• Relationships: maintainedBy/maintains, administeredBy/administers, accessProvidedBy/providesAccessTo, topic/page, origination, hasPart/partOf, associatedWith, inScheme, foaf:focus, is-a
5. Design URI Patterns
Cool URIs for the Semantic Web
http://blogs.ukoln.ac.uk/locah/2010/11/16/identifying-the-things-uri-patterns-for-the-hub-linked-data/
Designing URI Sets for the UK Public Sector
http://www.cabinetoffice.gov.uk/resource-library/designing-uri-sets-uk-public-sector
http://example.org/id/person/p123456
http://example.org/doc/person/p123456
http://example.org/doc/person/p123456.html
http://example.org/doc/person/p123456.rdf
Identifying the “things”: URI Patterns for the Hub Linked Data
http://blogs.ukoln.ac.uk/locah/2010/11/16/identifying-the-things-uri-patterns-for-the-hub-linked-data/
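The four example URIs on this slide follow one pattern: an /id/ URI for the thing itself, a generic /doc/ URI for the document about it, and format-specific document URIs. A minimal sketch of that pattern, assuming the example.org base from the slide; the function names uri_set and document_for are hypothetical, not taken from the project code:

```python
BASE = "http://example.org"  # placeholder base URI, as used on the slide


def uri_set(entity_type: str, local_id: str, base: str = BASE) -> dict:
    """Build the thing URI and its document URIs for one entity."""
    path = f"{entity_type}/{local_id}"
    return {
        "thing": f"{base}/id/{path}",        # identifies the person, place, etc.
        "document": f"{base}/doc/{path}",    # generic document about the thing
        "html": f"{base}/doc/{path}.html",   # HTML representation
        "rdf": f"{base}/doc/{path}.rdf",     # RDF/XML representation
    }


def document_for(thing_uri: str, base: str = BASE) -> str:
    """Map a thing URI to its generic document URI (e.g. as a redirect target)."""
    prefix = f"{base}/id/"
    if not thing_uri.startswith(prefix):
        raise ValueError(f"not a thing URI: {thing_uri}")
    return f"{base}/doc/" + thing_uri[len(prefix):]
```

Keeping the thing URI and the document URIs distinct lets statements about a person be kept apart from statements about the page that describes the person.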
6. [Diagram: processing pipeline]
• EAD XML documents → Transform → RDF/XML → Triple Store
• Enhance: Data Sets → Triple Store
• Expose: HTML, XHTML+RDFa, SPARQL/API → Other Apps
7. [Diagram: pipeline detail: EAD XML documents → Transform → Triple Store]
8. Transform
• Transform EAD XML to RDF/XML using XSLT
• Translate RDF/XML to N-Triples
• Split N-Triples into chunks
• Post to Triple Store
• Manage inputs
• Capture metadata about each step of process
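The splitting and metadata-capture steps can be sketched in a few lines. This is an illustrative sketch only, not the project's scripts: split_ntriples and record_step are hypothetical names, the chunk size is arbitrary, and the XSLT transform, the RDF/XML to N-Triples translation and the post to the store are left out. It relies on N-Triples being line-based (one triple per line), so a file can be cut on line boundaries.

```python
from datetime import datetime, timezone


def split_ntriples(ntriples: str, chunk_size: int = 1000) -> list:
    """Split N-Triples text into chunks of at most chunk_size triples.

    Blank lines and comment lines are dropped; every chunk ends with a newline.
    """
    lines = [
        line for line in ntriples.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    return [
        "\n".join(lines[i:i + chunk_size]) + "\n"
        for i in range(0, len(lines), chunk_size)
    ]


def record_step(log: list, step: str, source: str, **details) -> None:
    """Append a metadata record about one processing step to a log."""
    log.append({
        "step": step,
        "source": source,  # e.g. an identifier for the input EAD document
        "at": datetime.now(timezone.utc).isoformat(),
        **details,
    })
```

Each chunk could then be posted to the store in its own request, with one record_step call per chunk so that a failed post can be traced back to its input.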
9. Challenges
• Archival description/Encoded Archival Description
• Document v data
• Hub as aggregation
• Messy data, from multiple sources
• Versioning
• What happens when EAD doc X is updated?
• Tracking triple/graph provenance
• Graph/quad support in store
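One common way to approach the versioning and provenance questions above is to keep the triples from each EAD document in their own named graph, so that an update to doc X replaces exactly one graph. The following is a toy in-memory sketch of that idea, not a description of how the project's store works; the class QuadStore and its methods are hypothetical.

```python
class QuadStore:
    """Toy store: each named graph holds the triples from one source."""

    def __init__(self):
        self._graphs = {}  # graph URI -> set of (s, p, o) tuples

    def replace_graph(self, graph_uri, triples):
        """Drop whatever the graph held before and load the new triples."""
        self._graphs[graph_uri] = set(triples)

    def all_triples(self):
        """Union of all graphs, i.e. the merged view a plain triple store gives."""
        merged = set()
        for triples in self._graphs.values():
            merged |= triples
        return merged

    def graphs_containing(self, triple):
        """Provenance lookup: which source graphs assert this triple?"""
        return sorted(g for g, ts in self._graphs.items() if triple in ts)
```

With a layout like this, re-running the transform for one updated document removes its stale triples without touching triples that other documents also assert.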
10. [Diagram: pipeline detail: Triple Store, SPARQL/API, Enhance, Data Sets]
11. Enhance
• Add supplementary data
• Repository postcode data
• Data about the project (DOAP), the dataset (VoID), etc.
• Internal links/consolidation
• Generate links to external resources
• Ordnance Survey – trivial from postcode
• VIAF – script to look up candidate matches
• LCSH – script to look up, match
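The Ordnance Survey link is "trivial" because the target URI can be built directly from the postcode. A minimal sketch, assuming the Ordnance Survey postcode-unit URI pattern shown in OS_POSTCODE_BASE (check it against the current OS linked data service before relying on it); the function name and the simplified postcode check are illustrative assumptions.

```python
import re

# Assumed URI pattern for Ordnance Survey postcode units.
OS_POSTCODE_BASE = "http://data.ordnancesurvey.co.uk/id/postcodeunit/"

# Simplified shape check for UK postcodes, not a full validator.
POSTCODE_SHAPE = re.compile(r"[A-Z]{1,2}[0-9][0-9A-Z]?[0-9][A-Z]{2}")


def postcode_unit_uri(postcode: str) -> str:
    """Build a postcode-unit URI from a postcode string such as 'M13 9PL'."""
    compact = re.sub(r"\s+", "", postcode).upper()
    if not POSTCODE_SHAPE.fullmatch(compact):
        raise ValueError(f"does not look like a UK postcode: {postcode!r}")
    return OS_POSTCODE_BASE + compact
```

Rejecting strings that do not look like postcodes keeps messy source data from producing links to resources that do not exist.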
12. Enhance
• Tools
• Silk – pattern matching
• Google Refine
• Use third-party links
• e.g. get DBpedia link from VIAF
• Use aggregator services
• e.g. sameas.org
• Capture metadata about each process
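Using third-party links amounts to following equivalence links from a resource that is already matched (for example, a VIAF URI) to further resources (for example, DBpedia), while recording which source supplied each hop. A small sketch over links already fetched into memory; follow_links is a hypothetical name, no network calls are made, and treating every link as symmetric is a simplifying assumption.

```python
from collections import deque


def follow_links(start, links):
    """Find URIs reachable from start via equivalence links.

    links is a list of (uri_a, uri_b, source) tuples, treated as symmetric.
    Returns {uri: [sources of the links used to reach it]}.
    """
    neighbours = {}
    for a, b, source in links:
        neighbours.setdefault(a, []).append((b, source))
        neighbours.setdefault(b, []).append((a, source))

    found = {}
    seen = {start}
    queue = deque([(start, [])])
    while queue:  # breadth-first, so each URI gets a shortest chain
        uri, via = queue.popleft()
        for other, source in neighbours.get(uri, []):
            if other not in seen:
                seen.add(other)
                found[other] = via + [source]
                queue.append((other, found[other]))
    return found
```

The list of sources returned for each URI is the kind of per-process metadata the last bullet asks for: it shows which links depend on a third party being right.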
13. Challenges
• Various target interfaces for lookup
• Identity/similarity/“sameAs” issues, verification
• Workflow
• Repeatability?
• Versioning
• Tracking triple/graph provenance
• Graph/quad support in store
• Exposing triple provenance
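The identity and verification issue can be made concrete with a scoring step that separates matches safe to accept automatically from those needing human review. This is an illustrative sketch using Python's difflib, not the matching used in the project; the thresholds, the function names and the sample labels are all assumptions.

```python
import re
from difflib import SequenceMatcher


def normalise(label: str) -> str:
    """Lower-case, replace punctuation with spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^0-9a-z]+", " ", label.lower()).split())


def classify_candidates(label, candidates, accept=0.95, review=0.80):
    """Score candidate (uri, label) pairs against a local label.

    Returns (uri, score, decision) tuples, best score first, where decision
    is 'accept', 'review' or 'reject'. Thresholds are arbitrary examples.
    """
    target = normalise(label)
    results = []
    for uri, candidate_label in candidates:
        score = SequenceMatcher(None, target, normalise(candidate_label)).ratio()
        if score >= accept:
            decision = "accept"
        elif score >= review:
            decision = "review"
        else:
            decision = "reject"
        results.append((uri, score, decision))
    return sorted(results, key=lambda r: r[1], reverse=True)
```

Only the "accept" band would justify an identity assertion such as owl:sameAs; anything in the "review" band is better held back or published as a weaker link until checked.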
14. [Diagram: tracking licences from inputs to outputs. Labels shown: sources EAD 1 and EAD 2; input RDF i1, i2, iX; output RDF o1, o2, o3, o3-1, o3-2; licences Lic A, Lic B, Lic C]
15. [Diagram: tracking metadata from inputs to outputs. Labels shown: input RDF i1, i2; Meta A, Meta B; "(from Linked Archives Hub)", "(from DBpedia)"; output HTML o1, o2, o2-1, o2-2; output RDF o1, o2, o2-1, o3-2; Meta oA, Meta oB]