A distributed network of digital
heritage information
Enno Meijers
Semantics Conference – Amsterdam – 12 September 2017
Contents
‱ Introduction to Dutch Digital Heritage Network
‱ Problems with the current infrastructure
‱ Strategies for improvement
‱ Building the distributed network
Introduction to Dutch Digital Heritage Network
The Digital Heritage Network (NDE) aims
to increase the social value of the
heritage information maintained by
libraries, archives, museums and
other cultural heritage institutions.
It is a long-term cooperation between the
government and the institutions at
national, regional and local level. It's
about information and people!
Three-layered approach for
improving the sustainability,
the usability and the visibility
of digital heritage information.
sustainable
usable
visible
In general - discovery of the “deep web”
‱ Institutional repositories, collection management systems
‱ Millions of ‘invisible’ datasets: publications, research data, heritage collections
‱ Poor coverage by regular search engines
‱ Metadata is key, describing physical materials or (licensed) digital content
‱ Dutch cultural heritage sector: 1500 institutions, >>1500 collections
‱ Demand for cross-institutional, cross-domain discovery
‱ Many specialized portals giving access to different views
General setup of these portals
And even networks of aggregators...
Evaluating current infrastructure
Evaluating current approach
Positive results so far:
‱ many data sources available through the OAI-PMH protocol
‱ powerful and smart protocol for metadata synchronization
‱ opened up data silos
‱ created the need for aligning data models
‱ made cross-collection and cross-domain discovery possible (e.g. Europeana)
But there are two problem areas:
‱ semantic alignment
‱ data integration
Problem #1: Poor semantic alignment
Not enough semantic alignment in the data sources:
‱ lack of sustainable URIs and shared identifiers
‱ no shared terminology sources available
‱ no provisions for linking between sources
‱ implementations lack support for multiple data models
‱ data is ‘flattened’ to a common data model (EDM, Dublin Core)
‱ loss of meaning due to transformation
=> poor capabilities for cross-collection, cross-domain discovery
=> cleaning, aligning and enriching are needed after harvesting
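The loss of meaning caused by flattening can be illustrated with a small sketch. The record layout, the role names and the `flatten_to_dc` helper are hypothetical; this is not an actual EDM or Dublin Core mapping used by any aggregator.

```python
# Hypothetical source record with role-specific contributors.
rich_record = {
    "title": "Windmill at dusk",
    "contributors": [
        {"name": "A. Painter", "role": "painter"},
        {"name": "B. Engraver", "role": "engraver"},
    ],
}


def flatten_to_dc(record):
    """Map a rich record onto flat Dublin Core-style fields (illustrative mapping)."""
    return {
        "dc:title": record["title"],
        # Every contributor ends up in dc:creator; the role is dropped.
        "dc:creator": [c["name"] for c in record["contributors"]],
    }


flat = flatten_to_dc(rich_record)
print(flat)
```

After flattening, a consumer can no longer tell the painter from the engraver, which is the kind of information that has to be restored by enrichment after harvesting.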
Problem #2: Inefficient data integration
Physical data integration based on OAI-PMH (= copying)
‱ synchronizing with the sources is hard work
‱ ownership, licensing, provenance, control over access are difficult topics
‱ no feedback loop to the data source (usage, cleaning, enrichments)
‱ data source owner and end user are disconnected
‱ centralized model leads to scalability problems
‱ OAI-PMH is not a web-centric protocol
See also:
Miel Vander Sande et al., Towards sustainable publishing and querying of distributed Linked Data archives - Journal of Documentation (2017)
Herbert Van de Sompel, Reminiscing About 15 Years of Interoperability Efforts - D-Lib Magazine - December 2015
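To make the "copying" concrete, here is a minimal sketch of the request cycle an OAI-PMH harvester repeats until a source is fully copied. The `verb`, `metadataPrefix` and `resumptionToken` arguments come from the OAI-PMH 2.0 protocol; the endpoint URL, the canned response and the helper names are assumptions for illustration, and no network call is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it replaces the other parameters.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def next_token(response_xml):
    """Return the resumptionToken of a response, or None when the list is complete."""
    root = ET.fromstring(response_xml)
    token = root.find(f".//{OAI_NS}resumptionToken")
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()


# Hypothetical endpoint and canned response, so the sketch runs offline.
BASE = "https://example.org/oai"
sample = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
    "<ListRecords><resumptionToken>page-2</resumptionToken></ListRecords>"
    "</OAI-PMH>"
)
first = list_records_url(BASE)
follow_up = list_records_url(BASE, resumption_token=next_token(sample))
print(first)
print(follow_up)
```

A harvester keeps issuing follow-up requests until no token is returned, and has to repeat the cycle periodically to stay in sync with each source.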
Strategies for improvement
‱ build portals as views based on a common data layer
‱ minimize the intermediate layers
‱ refer to the source instead of copying
‱ support decentralized discovery
‱ maximize the usability of data at the source
‱ develop a sustainable, ‘web-centric’ solution
‱ use HTTP, RDF and RESTful APIs as building blocks
=> implement the Linked Data principles
Inspired by the work of Ruben Verborgh, Herbert Van de Sompel and colleagues.
See for example: Miel Vander Sande et al., Towards sustainable publishing and querying of distributed Linked Data archives - Journal of Documentation (2017)
Design principles for a discovery infrastructure
At the data source level:
‱ use sustainable URIs to identify the resources
‱ use formal definitions for persons, places, concepts, events (API)
‱ use domain vocabularies / data models to describe the data
‱ add support for cross-domain discovery (EDM, Schema.org,...)
At the network level:
‱ create a ‘network of terms’ for shared entities
‱ provide tools for aligning and linking
‱ create alignments and links between different terminology sources
‱ provide easy access for collection management systems (API)
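As a rough illustration of what easy access to a network of terms could look like from a collection management system, the sketch below looks up one label across several terminology sources. The source names, labels, URIs and the `lookup` function are all hypothetical; the slides do not specify the actual API.

```python
# Hypothetical terminology sources: lower-cased label -> term URI, per source.
SOURCES = {
    "aat": {"windmolen": "aat:windmill"},
    "local_thesaurus": {"windmolen": "local:molen-12", "brug": "local:brug-3"},
}


def lookup(label, sources=SOURCES):
    """Return every (source, URI) pair whose label matches, ignoring case."""
    key = label.strip().lower()
    return [(name, terms[key]) for name, terms in sources.items() if key in terms]


print(lookup("Windmolen"))
```

A cataloguer would pick one of the returned URIs instead of typing a free-text keyword, which is what makes later alignment between collections possible.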
Implementing Linked Data principles
Building on previous work
But how will our Linked Data be found?
The Semantic Web is still a dream... #1
=> So discovery of Linked Data requires registering datasets?!
The Semantic Web is still a dream... #2
A tiny example... suppose a resource is defined as:
museum_X:object1
    a nde:painting ;
    dcterms:subject aat:windmill .
For ‘browsable Linked Data’ you should(!) add the inverse relation [1],[2]:
aat:windmill
    a skos:Concept ;
    skos:prefLabel "Windmolen"@nl ;
    dcterms:isSubjectOf museum_X:object1 . # "backlinks"
=> a Linked Data integration problem

[1]: Tim Berners-Lee on ‘browsable linked data’ (2006)
[2]: Tom Heath and Christian Bizer on ‘Incoming Links’ (2011)
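The backlink idea can be sketched in a few lines: given triples collected from several sources, invert the `dcterms:subject` relation to find which resources point at a term. The prefixed names follow the example above; the second resource (`archive_Y:photo7`) and the `build_backlinks` helper are hypothetical additions for illustration.

```python
from collections import defaultdict

# Triples as (subject, predicate, object), using the prefixed names from the example.
triples = [
    ("museum_X:object1", "rdf:type", "nde:painting"),
    ("museum_X:object1", "dcterms:subject", "aat:windmill"),
    # Hypothetical second collection pointing at the same term.
    ("archive_Y:photo7", "dcterms:subject", "aat:windmill"),
]


def build_backlinks(triples, predicate="dcterms:subject"):
    """Index, per term, the resources that refer to it through `predicate`."""
    backlinks = defaultdict(set)
    for s, p, o in triples:
        if p == predicate:
            backlinks[o].add(s)
    return backlinks


backlinks = build_backlinks(triples)
print(sorted(backlinks["aat:windmill"]))
```

The publisher of the term cannot compute this index alone, because the forward links live in other institutions' data. That is why backlinks turn into an integration problem.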
Comparing Linked Data integration approaches
1. Only semantic integration:
‱ just implement schema.org, let search engines ‘infer’ the relations
‱ is the data interesting enough for Google?
‱ what about special thematic or regional views? how about reuse?
‱ can we reuse the results of the integration? (NO!)
2. Physical integration:
‱ aggregate all the related Linked Data sources
‱ build large triplestore and infer the relations
‱ but like OAI-PMH, based on copying data
Special case: LOD Laundromat

3. Virtual integration by federated approach
‘Traditional’ solutions to federated querying are not feasible:
- publishing Linked Data in triplestores is hard for small data providers
- service is vulnerable because of rich functionality
- federated querying over SPARQL endpoints performs poorly
Follow the Linked Data Fragments approach:
- Linked Data available through Triple Pattern Fragments interface
- easier to implement for data providers
- federated querying is supported, even SPARQL
- more complexity at network level is acceptable
- even support for time-based versions (Memento)
See also: Miel Vander Sande et al. (2017), Towards sustainable publishing and querying of distributed Linked Data archives - Journal of Documentation
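The division of labour behind Triple Pattern Fragments can be sketched as follows: the server only answers single triple patterns, page by page and with a match count, while the client combines the answers itself. This is a simplified in-memory illustration, not the Linked Data Fragments software; the data, the `aat:bridge` term and all function names are assumptions.

```python
# Hypothetical data held by one source.
SOURCE = [
    ("museum_X:object1", "rdf:type", "nde:painting"),
    ("museum_X:object1", "dcterms:subject", "aat:windmill"),
    ("museum_X:object2", "rdf:type", "nde:painting"),
    ("museum_X:object2", "dcterms:subject", "aat:bridge"),
    ("museum_X:object3", "rdf:type", "nde:painting"),
]


def fragment(triples, s=None, p=None, o=None, page=1, page_size=2):
    """Server side: one page of matches for a single pattern, plus the total count.

    None acts as a wildcard for that position.
    """
    matches = [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]
    start = (page - 1) * page_size
    return {"triples": matches[start:start + page_size], "count": len(matches)}


def all_pages(triples, **pattern):
    """Client side: follow the pages of one fragment until it is exhausted."""
    page = 1
    while True:
        result = fragment(triples, page=page, **pattern)
        if not result["triples"]:
            return
        yield from result["triples"]
        page += 1


def paintings_about(triples, term):
    """Client-side join of two patterns.

    The subject pattern is assumed to be the more selective one here;
    a real client would compare the reported counts before choosing.
    """
    found = []
    for s, _, _ in all_pages(triples, p="dcterms:subject", o=term):
        if fragment(triples, s=s, p="rdf:type", o="nde:painting")["count"] > 0:
            found.append(s)
    return found


print(paintings_about(SOURCE, "aat:windmill"))
```

Because the server never evaluates a full query, it stays cheap to host, and the extra work moves to the client or the network level.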
Federated querying needs source selection
Use the backlinks to support the discovery process.
See also: Miel Vander Sande et al. (2016), Hypermedia-Based Discovery for Source Selection Using Low-Cost Linked Data Interfaces, IJSWIS 12(3), 79–110
More advanced: data source profiling or dataset summaries
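A minimal sketch of source selection with backlinks: before sending a federated query, consult an index that records which datasets link to which shared terms, and contact only those. The index contents, the dataset names and the `select_sources` function are hypothetical.

```python
# Hypothetical backlink summary: per shared term, the datasets known to link to it.
TERM_TO_DATASETS = {
    "aat:windmill": {"museum_X", "archive_Y"},
    "aat:bridge": {"museum_X"},
}


def select_sources(terms, index=TERM_TO_DATASETS):
    """Return only the datasets that have backlinks to every requested term."""
    candidate_sets = [index.get(term, set()) for term in terms]
    if not candidate_sets:
        return set()
    return set.intersection(*candidate_sets)


print(select_sources(["aat:windmill"]))
print(select_sources(["aat:windmill", "aat:bridge"]))
```

Datasets without a backlink to the requested terms are never queried, which keeps the number of requests per federated query small.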

To make discovery of Linked Data work:
1. Register organizations and datasets
2. Build a knowledge graph with backlinks for resource discovery
Implementations will depend on capabilities of cultural heritage institutions
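What a registry entry for a dataset might contain can be sketched as a JSON-LD description. The slides do not specify the registry format; the sketch below assumes schema.org's `Dataset`, `Organization` and `DataDownload` types, and every URI, name and the `dataset_description` helper are made up for illustration.

```python
import json


def dataset_description(dataset_uri, name, publisher_uri, publisher_name, distributions):
    """Build a schema.org Dataset description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "@id": dataset_uri,
        "name": name,
        "publisher": {
            "@type": "Organization",
            "@id": publisher_uri,
            "name": publisher_name,
        },
        # Each distribution tells a client where and in which format the data is offered.
        "distribution": [
            {"@type": "DataDownload", "contentUrl": url, "encodingFormat": fmt}
            for url, fmt in distributions
        ],
    }


# Hypothetical institution and dataset.
desc = dataset_description(
    "https://example.org/datasets/paintings",
    "Paintings collection",
    "https://example.org/org/museum-x",
    "Museum X",
    [("https://example.org/datasets/paintings.ttl", "text/turtle")],
)
print(json.dumps(desc, indent=2))
```

With organizations and datasets described this way, a registry can answer the question of which sources exist, and the knowledge graph with backlinks can then narrow down which of them are relevant.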
Building the distributed network
of heritage information
Strategy for our distributed network
1. build a knowledge graph for Dutch digital heritage entities
2. improve the usability of the data source:
- align object descriptions with shared entities
- publish data as Linked Data
3. build a discovery infrastructure:
- register organizations and datasets in a registry
- build knowledge graph to support discovery (backlinks)
4. implement virtual data integration technology:
- use registry and knowledge graph for selecting the resources
- support federated querying (or selective aggregation)
semantic
alignment
data
integration
https://github.com/netwerk-digitaal-erfgoed/high-level-design
High-level design of our discovery infrastructure
Roadmap
‱ Start with the existing (OAI-PMH) based infrastructure
‱ Build registry for organizations and datasets
‱ Build network of terms to provide shared entities for discovery
‱ Upgrade object descriptions with URIs
‱ Make aggregators Linked Data compliant
‱ Build knowledge graph with backlinks for discovery
‱ Support federated querying (or selective harvesting)
‱ Make collection management systems Linked Data compliant
‱ Transform aggregators to service portals for discovery
Thank you for your attention!
please share your thoughts with us...
email: enno.meijers at kb.nl
twitter, slideshare: ennomeijers
https://github.com/netwerk-digitaal-erfgoed
