Exposing EO Linked (meta-)Data from OpenSearch Catalogue
1. This document is part of a project that has received funding
from the European Union’s Horizon 2020 research and innovation programme
under agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or
reproduced without the formal approval of the DataBio Management Committee. Find us at www.databio.eu.
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 732064.
This project is part of BDV PPP.
EXPOSING EO LINKED (META-)DATA FROM OPENSEARCH CATALOGUE
Raul Palma¹, Yves Coene²
¹ Poznan Supercomputing and Networking Center, Poland
² Spacebel, Belgium
113th OGC Technical Committee meeting
Toulouse, 19th November 2019
Linked Data publication
• Linked Data (LD) is becoming an increasingly popular method for publishing data on the Web
• Improves data accessibility for both humans and machines, e.g., for finding, reuse and integration
• Makes it possible to discover more useful data through links (and inferencing), and to exploit data with semantic queries
• Growing number of datasets in the LOD cloud
• 1,239 datasets with 16,147 links (as of March 2019)
• Coverage of the LOD cloud
• Large cross-domain datasets (DBpedia, Freebase, etc.)
• Variable domain coverage (e.g., Geography, Government, BioInformatics)
• What about EO (meta-)data?
http://lod-cloud.net/
Earth Observation (EO) (meta-)data
• EO concerns the gathering of information about planet Earth’s physical, chemical and biological systems using remote sensing technologies such as satellites and aerial sensors, along with ground-based observations.
• Huge amounts of EO data are available, and the volume keeps growing
• Several geospatial datasets are already available as linked data
• Parts of CORINE, Urban Atlas, Open Land Use (TELEIOS, FOODIE)
• GeoNames, LinkedGeoData, GADM, NUTS, …
• Some ontologies are available to publish EO data
• Data Cube vocabulary
• DLR ontology
• OGC 17-003
• …
• Large catalogues of EO product metadata are available (e.g., NASA, ESA, DLR, …)
• Some initiatives provide Linked Data for EO product metadata (e.g., CREODIAS, data.eo.esa.int)
• However, their endpoints cannot be federated (e.g., to integrate data with external datasets), and in some cases they are difficult to keep up to date (e.g., the linked data has to be updated and stored in a triplestore)
www.rezatec.com
Source: Soille et al., 2018
Linked data principles & general tasks
• Simple set of principles & technologies
• URI, HTTP, RDF, SPARQL
• Involves a set of (common) general tasks
Datasets identification → Model specification → RDF data generation → Linking → Exploiting
Hyland et al.
Hausenblas et al.
Villazón-Terrazas et al.
Best Practices for Publishing Linked Data
5-star deployment scheme
for Linked Open Data
Linked data guidelines & patterns
T. Heath and C. Bizer. Linked Data: Evolving the Web into a Global Data Space,
http://linkeddatabook.com/editions/1.0/
B. Hyland, G. Atemezing, B. Villazón-Terrazas
Best Practices for Publishing Linked Data.
W3C Working Group Note
https://www.w3.org/TR/ld-bp/
From guidelines to practice
Implementing Linked Data publication pipelines
• Goal: to define and deploy (semi-)automatic processes that carry out the steps needed to transform and publish different input datasets as Linked Data.
• A pipeline connects different data processing components to carry out the transformation of data into RDF and their linking, and includes the mapping specifications to process the input datasets.
• Each pipeline is configured to support specific input dataset types (same format, model and delivery form).
• Principles
• Pipelines can be directly re-executed and re-applied (e.g., to extended/updated datasets)
• Pipelines must be easily reusable
• Pipelines must be easily adapted for new input datasets
• Pipeline execution should be as automatic as possible. The final target is fully automated processes.
• Pipelines should support both (mostly) static data and data streams (e.g., sensor data)
• The resulting datasets, available as Linked Data, will provide an integrated view over the initial (disconnected and heterogeneous) datasets, in compliance with any privacy and access control needs
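The principles above can be illustrated with a minimal Python sketch. It assumes a pipeline is a mapping specification plus an ordered list of reusable steps, so the same pipeline can be re-run on updated input. All names here (`Pipeline`, `to_triples`, `add_links`, the `example.org` URIs and the sample record) are hypothetical and are not taken from any DataBio component.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]
Step = Callable[[List[Triple]], List[Triple]]

DCT_TITLE = "http://purl.org/dc/terms/title"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def to_triples(records: List[Dict[str, str]], mapping: Dict[str, str], base: str) -> List[Triple]:
    """Apply a mapping specification (field name -> predicate URI) to input records."""
    triples: List[Triple] = []
    for record in records:
        subject = base + record["id"]
        for key, predicate in mapping.items():
            if key in record:
                triples.append((subject, predicate, record[key]))
    return triples


def add_links(links: Dict[str, str]) -> Step:
    """Build a linking step that adds owl:sameAs triples for known subjects."""
    def step(triples: List[Triple]) -> List[Triple]:
        subjects = sorted({s for s, _, _ in triples})
        extra = [(s, OWL_SAME_AS, links[s]) for s in subjects if s in links]
        return triples + extra
    return step


@dataclass
class Pipeline:
    mapping: Dict[str, str]          # mapping specification for one input dataset type
    base: str                        # URI prefix for generated resources
    steps: List[Step] = field(default_factory=list)

    def run(self, records: List[Dict[str, str]]) -> List[Triple]:
        """Transform records to triples, then apply each step in order."""
        triples = to_triples(records, self.mapping, self.base)
        for step in self.steps:
            triples = step(triples)
        return triples


# Hypothetical input record and link set, for illustration only.
records = [{"id": "parcel-1", "title": "Parcel 1"}]
pipeline = Pipeline(
    mapping={"title": DCT_TITLE},
    base="http://example.org/data/",
    steps=[add_links({"http://example.org/data/parcel-1": "http://example.org/other/p1"})],
)
result = pipeline.run(records)
```

Because `run` keeps no state between calls, re-executing the pipeline on an extended dataset only requires passing the new records. Adapting it to a new dataset type means supplying a different mapping.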
Use case: generating Linked Data from EO products metadata
• Existing approaches (example 1): http://data.eo.esa.int (site is down)
[Figure: metadata of EO products published as Linked Data. Source: Carlo Matteo Scalzo, CTO, Epistematica]
Steps
• RDF for EO products
• Define an EO vocabulary
• Develop an OGC-to-EO vocabulary parser to create RDF triples
• Extend the EO vocabulary to link EO products to EO ontologies
• Add RDF triples for each EO product to link the product with the ontologies
• Store the RDF triples in an RDF triple store
• Web access
• Create a web portal including semantic search & map visualization (CKAN)
• Ontology navigation
• Graphical user interface that allows users to explore an EO ontology
• Select a term to obtain a list of all the relevant EO products
Use case: generating Linked Data from EO products metadata
• Existing approaches (example 2): https://browser.creodias.eu/
Limitations of the examples (E1 = example 1, E2 = example 2)
• E1: New data needs to be parsed to create updated RDF triples
• E1: New RDF data needs to be linked
• E1, E2: RDF data needs to be physically stored in a triplestore (most probably in addition to the source data)
• E1: Although CKAN can expose a SPARQL endpoint, none is provided
• E1: No SPARQL interface (e.g., to create federated queries), and the endpoint is not externally accessible
• E2: Provides an interface to execute (federated) SPARQL queries, but the endpoint is not externally accessible*
• E1, E2: Linked Data cannot be exploited from other applications
• E1, E2: EO vocabularies used to represent metadata are internal and closed
• E1, E2: Limited set of missions/platforms supported
Our approach: serving Linked Data with hybrid services
• Many practical linked data use cases have to address hybrid information needs¹:
• Variety of data sources
• Variety of data modalities
• Variety of data processing techniques
• Although SPARQL queries can express data requests over RDF knowledge graphs, support for hybrid information needs is limited
• Query engines focus on retrieving RDF data and support a set of built-in services
• Approach: implement wrappers around the APIs that:
• Assign HTTP URIs to the resources about which the API provides data
• Upon URI dereference, rewrite the client’s request into a request against the API
• Transform the API results to RDF and send them back to the client.
¹ Nikolov, Andriy et al. “Ephedra: SPARQL Federation over RDF Data and Services.” International Semantic Web Conference (2017).
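The three wrapper responsibilities listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the URI space (`BASE_URI`), the API endpoint, its `uid` parameter and the shape of the JSON result are all hypothetical. The sketch makes no network call and does not show how Ephedra implements its wrappers.

```python
import json
from urllib.parse import urlencode

BASE_URI = "http://example.org/resource/"       # hypothetical URI space minted by the wrapper
API_ENDPOINT = "http://example.org/api/search"  # hypothetical REST API endpoint
DCT_IDENTIFIER = "http://purl.org/dc/terms/identifier"
DCT_TITLE = "http://purl.org/dc/terms/title"


def resource_uri(identifier: str) -> str:
    """Step 1: assign an HTTP URI to a resource the API provides data about."""
    return BASE_URI + identifier


def rewrite_request(uri: str) -> str:
    """Step 2: rewrite a dereferenced resource URI into a request against the API."""
    if not uri.startswith(BASE_URI):
        raise ValueError("URI is outside the wrapper's namespace")
    identifier = uri[len(BASE_URI):]
    return API_ENDPOINT + "?" + urlencode({"uid": identifier})


def literal(value: str) -> str:
    """Quote a string as an RDF literal (JSON escaping is close enough for this sketch)."""
    return json.dumps(value, ensure_ascii=False)


def to_ntriples(api_result: dict) -> str:
    """Step 3: transform a (hypothetical) JSON API result into N-Triples."""
    subject = resource_uri(api_result["id"])
    lines = [
        f"<{subject}> <{DCT_IDENTIFIER}> {literal(api_result['id'])} .",
        f"<{subject}> <{DCT_TITLE}> {literal(api_result['title'])} .",
    ]
    return "\n".join(lines)


api_url = rewrite_request(resource_uri("S2A_1"))
rdf = to_ntriples({"id": "S2A_1", "title": "Demo product"})
```

Nothing is stored in a triplestore: the RDF is produced per request, which is the property that avoids the maintenance problems of the materialised approaches.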
Use case: FedEO: Federated Earth Observation Gateway System
(ongoing work)
FedEO = Federated Earth Observation missions access
The FedEO system provides a unique entry point to a growing number of scientific catalogues and services.
FedEO Interfaces
• Endpoints:
• http://geo.spacebel.be/opensearch/readme.html (development server)
• http://fedeo.esa.int/opensearch/readme.html (operational server)
• External interfaces:
• See http://ceos.org/ourwork/workinggroups/wgiss/access/fedeo/
• EO Extension of OpenSearch (OGC 13-026r8, OGC 17-047)
• Multiple encodings: GeoJSON, JSON-LD, Atom
• Multiple metadata formats:
• OGC 10-157r4: EOP O&M
• OGC 17-003: GeoJSON(-LD) encoding Product Metadata
• OGC 17-084: GeoJSON(-LD) encoding Collection Metadata
• ISO 19139(-2)
• DIF-10 etc.
FedEO: Connected Data Assets
FedEO: Vocabulary Server - GUI
• ESA Thesauri:
• Operational (October ‘19): http://thesauri.eo.esa.int/thesaurus/en/
• Reference: http://fedeo.spacebel.be/thesaurus/en/?clang=en
• The SKOSMOS tool provides APIs, documented at http://skosmos.org/
Examples
http://fedeo.spacebel.be/rest/v1/thesaurus/data?uri=https://earth.esa.int/concept/envisat
http://fedeo.spacebel.be/rest/v1/thesaurus/data?uri=https://earth.esa.int/concept/envisat&format=application/json
http://fedeo.spacebel.be/rest/v1/thesaurus/data?uri=https://earth.esa.int/concept/meris&format=application/json
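The example URLs above all follow the same pattern: a `uri` parameter naming the concept and an optional `format` parameter. A small Python helper can build them, as sketched below. The function name `concept_data_url` is hypothetical. The helper percent-encodes the parameter values, which a server would normally decode to the same values as in the unencoded URLs above, although that has not been verified against this service.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Base URL taken from the examples on this slide.
THESAURUS_DATA = "http://fedeo.spacebel.be/rest/v1/thesaurus/data"


def concept_data_url(concept_uri: str, fmt: Optional[str] = None) -> str:
    """Build a thesaurus data request for one concept, optionally fixing the format."""
    params = {"uri": concept_uri}
    if fmt is not None:
        params["format"] = fmt
    return THESAURUS_DATA + "?" + urlencode(params)


url = concept_data_url("https://earth.esa.int/concept/meris", "application/json")

# Decoding the query string recovers the original parameter values.
decoded = parse_qs(urlparse(url).query)
```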
General steps
• Define/select semantic models to represent the data of the resources served by the API
• Implement a wrapper around the API that translates SPARQL requests into API calls on the fly and generates RDF data from the GeoJSON results
• Expose the generated RDF data via a SPARQL endpoint
• Query the REST API with SPARQL
• Process (e.g., format) any required output on the fly
• Link the generated RDF data with other datasets and thesauri (on the fly or with previously generated/discovered RDF links)
• Visualize and exploit the Linked Data
Ontologies for EO products metadata
• General rule: reuse standard and/or widely used ontologies/vocabularies whenever possible, and extend as needed
• In the case of FedEO, the metadata returned already uses semantic vocabularies in the (Geo)JSON-LD representation, so the results only need to be exposed as Linked Data
Ephedra: API Wrapper
• Ephedra is a SPARQL federation engine aimed at processing hybrid queries, which provides a flexible declarative mechanism for including hybrid services into a SPARQL federation.
• Ephedra is a component of Metaphactory (https://www.metaphacts.com/), an end-to-end Knowledge Graph Platform for knowledge graph management, rapid application development, and end-user oriented interaction.
Creating SPARQL wrapper with Ephedra
• Describe the REST service signature (mapping)
• Define input & output terms, e.g., dc:identifier
• Define input & output data types, e.g., xsd:string
• Define output (JSON) path, e.g., $.@graph[0].rdfs:member[*].dct:identifier
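To make the output path concrete, the sketch below walks a JSON-LD-like document in plain Python and collects the values that `$.@graph[0].rdfs:member[*].dct:identifier` would select. The sample `response` document is invented for illustration, `select_identifiers` is a hypothetical name, and this is not Ephedra's descriptor syntax or its path evaluator.

```python
from typing import Any, Dict, List

# Hypothetical response fragment; real catalogue responses carry many more fields.
response: Dict[str, Any] = {
    "@graph": [
        {
            "rdfs:member": [
                {"dct:identifier": "PRODUCT_A"},
                {"dct:identifier": "PRODUCT_B"},
                {"dct:title": "entry without an identifier"},
            ]
        }
    ]
}


def select_identifiers(doc: Dict[str, Any]) -> List[str]:
    """Mimic $.@graph[0].rdfs:member[*].dct:identifier with explicit lookups."""
    first_graph_node = doc["@graph"][0]          # $.@graph[0]
    members = first_graph_node["rdfs:member"]    # .rdfs:member[*]
    # .dct:identifier: members lacking the key contribute nothing.
    return [m["dct:identifier"] for m in members if "dct:identifier" in m]


identifiers = select_identifiers(response)
```

Each selected value becomes one binding of the output term, typed with the declared data type (xsd:string in the example above).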
Creating SPARQL wrapper with Ephedra
• Configure the AgroDataCube REST Service Repository
• Include this repository into the Ephedra federation
Expose generated RDF data via SPARQL endpoint
• SPARQL endpoint provided
• http://metaphactory.foodie-cloud.org/sparql?repository=ephedra
• Use SPARQL SERVICE keyword
• SERVICE <http://www.metaphacts.com/ontologies/platform/repository/federation#spacebelOS>
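As a rough illustration, the Python sketch below assembles a query that wraps a pattern in the SERVICE clause shown above and encodes it as a GET request using the `query` parameter of the SPARQL Protocol. The triple pattern inside the SERVICE block (`?product dc:identifier ?identifier`) and the function names are assumptions. The pattern the wrapped service actually accepts depends on its service description, which this transcript does not include. No request is sent.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Endpoint and service IRI as given on this slide.
ENDPOINT = "http://metaphactory.foodie-cloud.org/sparql?repository=ephedra"
SERVICE_IRI = "http://www.metaphacts.com/ontologies/platform/repository/federation#spacebelOS"


def build_query(service_iri: str, limit: int = 10) -> str:
    """Wrap an (assumed) triple pattern in a SERVICE clause for the given IRI."""
    return (
        "PREFIX dc: <http://purl.org/dc/elements/1.1/>\n"
        "SELECT ?product ?identifier WHERE {\n"
        f"  SERVICE <{service_iri}> {{\n"
        "    ?product dc:identifier ?identifier .\n"  # assumed pattern, see lead-in
        "  }\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def request_url(endpoint: str, query: str) -> str:
    """Encode the query as a SPARQL Protocol GET request URL."""
    separator = "&" if "?" in endpoint else "?"
    return endpoint + separator + urlencode({"query": query})


query = build_query(SERVICE_IRI, limit=5)
url = request_url(ENDPOINT, query)

# Round-trip check: decoding the URL recovers both parameters unchanged.
params = dict(parse_qsl(urlsplit(url).query))
```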
Query REST API with SPARQL
• Process (e.g., format) any required output on the fly
• Link the generated RDF data with other datasets and thesauri on the fly
Visualize and exploit the linked data
• Demo app: http://metaphactory.foodie-cloud.org/resource/:ESA-datasets
Thank you for your attention!
Special thanks to the Metaphacts team
Questions: rpalma@man.poznan.pl