Exposing EO Linked (meta-)Data from OpenSearch Catalogue


  1. Exposing EO Linked (meta-)Data from OpenSearch Catalogue
     Raul Palma (1), Yves Coene (2)
     (1) Poznan Supercomputing and Networking Center, Poland; (2) Spacebel, Belgium
     113th OGC Technical Committee meeting, Toulouse, 19th November 2019
     This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 732064. This project is part of BDV PPP.
     This document is part of a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or reproduced without the formal approval of the DataBio Management Committee. Find us at www.databio.eu.
  2. Linked Data publication
     • Linked Data (LD) is becoming an increasingly popular method for publishing data on the Web
     • Improves data accessibility for both humans and machines, e.g., for finding, reuse and integration
     • Makes it possible to discover more useful data through links (and inferencing), and to exploit data with semantic queries
     • Growing number of datasets in the LOD cloud (http://lod-cloud.net/)
       • 1,239 datasets with 16,147 links (as of March 2019)
     • Coverage of the LOD cloud
       • Large cross-domain datasets (DBpedia, Freebase, etc.)
       • Variable domain coverage (e.g., Geography, Government, BioInformatics)
     • What about EO (meta-)data?
  3. Earth Observation (EO) (meta-)data
     • EO concerns the gathering of information about planet Earth’s physical, chemical and biological systems using remote sensing technologies such as satellites and aerial sensors, along with ground-based observations.
     • Huge amounts of EO data are available, and the volume keeps growing
     • Several geospatial datasets are already available as linked data
       • Parts of CORINE, Urban Atlas, Open Land Use (TELEIOS, FOODIE)
       • GeoNames, LinkedGeoData, GADM, NUTS, …
     • Some ontologies are available to publish EO data
       • Data Cube vocabulary
       • DLR ontology
       • OGC 17-003
       • …
     • Large catalogues of EO product metadata are available (e.g., NASA, ESA, DLR, …)
     • Some initiatives provide Linked Data for EO product metadata (e.g., CREODIAS, data.eo.esa.int)
     • However, their endpoints cannot be federated (e.g., to integrate data with external datasets), and in some cases they are difficult to keep up to date (e.g., the linked data has to be regenerated and stored in a triplestore)
     Image credits: www.rezatec.com; Soille et al., 2018
  4. Linked data principles & general tasks
     • Simple set of principles & technologies
       • URI, HTTP, RDF, SPARQL
     • Involves a set of (common) general tasks: datasets identification, model specification, RDF data generation, linking, exploiting
     Sources shown on the slide: Hyland et al.; Hausenblas et al.; Villazón-Terrazas et al.; "Best Practices for Publishing Linked Data"; "5-star deployment scheme for Linked Open Data"
  5. Linked data guidelines & patterns
     • T. Heath and C. Bizer. Linked Data: Evolving the Web into a Global Data Space. http://linkeddatabook.com/editions/1.0/
     • B. Hyland, G. Atemezing, B. Villazón-Terrazas. Best Practices for Publishing Linked Data. W3C Working Group Note. https://www.w3.org/TR/ld-bp/
  6. From guidelines to practice
  7. Implementing Linked Data publication pipelines
     • Goal: to define and deploy (semi-)automatic processes that carry out the necessary steps to transform and publish different input datasets as Linked Data.
     • A pipeline connects different data processing components to carry out the transformation of data into RDF and its linking, and includes the mapping specifications to process the input datasets.
     • Each pipeline is configured to support specific input dataset types (same format, model and delivery form).
     • Principles
       • Pipelines can be directly re-executed and re-applied (e.g., to extended/updated datasets)
       • Pipelines must be easily reusable
       • Pipelines must be easily adaptable to new input datasets
       • Pipeline execution should be as automatic as possible; the final target is fully automated processes
       • Pipelines should support both (mostly) static data and data streams (e.g., sensor data)
     • The resulting datasets, available as Linked Data, will provide an integrated view over the initial (disconnected and heterogeneous) datasets, in compliance with any privacy and access control needs
  8. Use case: generating Linked Data from EO products metadata
     • Existing approaches (example 1): http://data.eo.esa.int (site is down). Source: Carlo Matteo Scalzo, CTO, Epistematica
     • Steps
       • RDF for EO products
         • Define an EO vocabulary
         • Develop an OGC-to-EO vocabulary parser to create RDF triples
         • Extend the EO vocabulary to link EO products to EO ontologies
         • Add RDF triples for each EO product to link the product with the ontologies
         • Store the RDF triples in an RDF triple store
       • Web access
         • Create a web portal including semantic search & map visualization (CKAN)
       • Ontology navigation
         • Graphical user interface that allows users to explore an EO ontology
         • Select a term to obtain a list of all the relevant EO products
     [Figure labels: Metadata; Linked Data; EO Products]
  9. Use case: generating Linked Data from EO products metadata
     • Existing approaches (example 2): https://browser.creodias.eu/
     • Limitations of the examples (E1 = example 1, E2 = example 2)
       • E1: New data will need to be parsed to create updated RDF triples
       • E1: New RDF data needs to be linked
       • E1, E2: RDF data needs to be physically stored in a triplestore (most probably in addition to the source data)
       • E1: Although CKAN makes it possible to expose a SPARQL endpoint, none is provided
       • E1: No SPARQL interface (e.g., to create federated queries), and the endpoint is not externally accessible
       • E2: Provides an interface to execute (federated) SPARQL queries, but the endpoint is not externally accessible*
       • E1, E2: Linked Data cannot be exploited from other applications
       • E1, E2: EO vocabularies used to represent metadata are internal and closed
       • E1, E2: Limited set of missions/platforms supported
  10. Our approach: serving Linked Data with hybrid services
     • Many practical linked data use cases have to address hybrid information needs [1]:
       • Variety of data sources
       • Variety of data modalities
       • Variety of data processing techniques
     • Although SPARQL queries can express data requests over RDF knowledge graphs, support for hybrid information needs is limited
       • Query engines focus on retrieving RDF data and support a set of built-in services
     • Approach: implement wrappers around the APIs that:
       • Assign HTTP URIs to the resources about which the API provides data
       • Upon URI dereference, rewrite the client’s request into a request against the API
       • Transform the API results to RDF and send them back to the client
     [1] Nikolov, Andriy et al. “Ephedra: SPARQL Federation over RDF Data and Services.” International Semantic Web Conference (2017).
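The three wrapper responsibilities listed on slide 10 can be illustrated with a minimal Python sketch. It is not the implementation used in the talk: the URI base, the API address, the `uid` query parameter and the shape of the JSON result are all hypothetical placeholders chosen for illustration.

```python
from urllib.parse import quote, unquote, urlencode

# Hypothetical values, chosen only for this sketch.
PRODUCT_BASE = "http://example.org/eo/product/"
API_BASE = "https://api.example.org/search"
DCT = "http://purl.org/dc/terms/"


def product_uri(identifier):
    """Step 1: assign an HTTP URI to a resource described by the API."""
    return PRODUCT_BASE + quote(identifier, safe="")


def uri_to_api_request(uri):
    """Step 2: rewrite a dereferenced resource URI into an API request URL."""
    if not uri.startswith(PRODUCT_BASE):
        raise ValueError("not a product URI: " + uri)
    identifier = unquote(uri[len(PRODUCT_BASE):])
    # "uid" is an assumed parameter name, not taken from a real API.
    return API_BASE + "?" + urlencode({"uid": identifier})


def _literal(value):
    """Serialize a Python value as an escaped N-Triples string literal."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def api_result_to_ntriples(result):
    """Step 3: transform an (assumed GeoJSON-like) API result into N-Triples lines."""
    lines = []
    for feature in result.get("features", []):
        props = feature.get("properties", {})
        identifier = props.get("identifier")
        if identifier is None:
            continue  # skip features that cannot be given a URI
        subject = f"<{product_uri(identifier)}>"
        lines.append(f"{subject} <{DCT}identifier> {_literal(identifier)} .")
        if "title" in props:
            lines.append(f"{subject} <{DCT}title> {_literal(props['title'])} .")
    return lines
```

A real wrapper would sit behind an HTTP server and perform the API call between steps 2 and 3; the sketch leaves the network out so that each step can be inspected on its own.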
  11. Use case: FedEO, the Federated Earth Observation Gateway System (ongoing work)
     • FedEO = Federated Earth Observation missions access
     • The FedEO system provides a unique entry point to a growing number of scientific catalogues and services.
  12. FedEO Interfaces
     • Endpoints:
       • http://geo.spacebel.be/opensearch/readme.html (development server)
       • http://fedeo.esa.int/opensearch/readme.html (operational server)
     • External interfaces:
       • See http://ceos.org/ourwork/workinggroups/wgiss/access/fedeo/
       • EO Extension of OpenSearch (OGC 13-026r8, OGC 17-047)
       • Multiple encodings: GeoJSON, JSON-LD, Atom
       • Multiple metadata formats:
         • OGC 10-157r4: EOP O&M
         • OGC 17-003: GeoJSON(-LD) encoding of Product Metadata
         • OGC 17-084: GeoJSON(-LD) encoding of Collection Metadata
         • ISO 19139(-2)
         • DIF-10, etc.
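An OpenSearch client normally reads a URL template from the server's description document and substitutes search parameters into it. The sketch below shows that substitution in Python, following the OpenSearch 1.1 convention that `{name}` is required and `{name?}` is optional. The template URL and the query parameter names on its left-hand sides are hypothetical; a real client should take the template from the description document published by the catalogue.

```python
import re
from urllib.parse import quote

# Matches {name} and {name?}; the name may carry a namespace prefix.
_PARAM = re.compile(r"\{([^{}?]+)(\?)?\}")

# Hypothetical template, written in the style of the EO, Geo and Time extensions.
EXAMPLE_TEMPLATE = (
    "https://catalogue.example.org/opensearch/request"
    "?parentIdentifier={eo:parentIdentifier}"
    "&startDate={time:start?}&endDate={time:end?}&bbox={geo:box?}"
)


def fill_template(template, values):
    """Substitute search parameters into an OpenSearch URL template.

    Missing optional parameters become empty strings; a missing required
    parameter raises KeyError.
    """
    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote(str(values[name]), safe="")
        if optional:
            return ""
        raise KeyError("missing required parameter: " + name)

    return _PARAM.sub(substitute, template)


if __name__ == "__main__":
    print(fill_template(EXAMPLE_TEMPLATE, {
        "eo:parentIdentifier": "EXAMPLE_COLLECTION",
        "time:start": "2019-01-01T00:00:00Z",
    }))
```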
  13. FedEO: Connected Data Assets
  14. FedEO: Vocabulary Server - GUI
     • ESA Thesauri:
       • Operational (October ‘19): http://thesauri.eo.esa.int/thesaurus/en/
       • Reference: http://fedeo.spacebel.be/thesaurus/en/?clang=en
     • The SKOSMOS tool provides APIs, documented at http://skosmos.org/
     • Examples
       • http://fedeo.spacebel.be/rest/v1/thesaurus/data?uri=https://earth.esa.int/concept/envisat
       • http://fedeo.spacebel.be/rest/v1/thesaurus/data?uri=https://earth.esa.int/concept/envisat&format=application/json
       • http://fedeo.spacebel.be/rest/v1/thesaurus/data?uri=https://earth.esa.int/concept/meris&format=application/json
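The example URLs on slide 14 follow a simple pattern: a `data` resource with a `uri` parameter and an optional `format` parameter. A small Python helper, given here only as a sketch, can build such URLs with proper percent-encoding. The base address and concept URIs are taken from the slide; the function name is an invention of this example, and no request is actually sent.

```python
from urllib.parse import urlencode

# Base address as shown in the slide's examples.
THESAURUS_DATA = "http://fedeo.spacebel.be/rest/v1/thesaurus/data"


def concept_data_url(concept_uri, fmt=None, base=THESAURUS_DATA):
    """Build a thesaurus 'data' URL for one concept.

    concept_uri: URI of the concept to look up.
    fmt: optional MIME type for the response, e.g. "application/json".
    """
    params = {"uri": concept_uri}
    if fmt is not None:
        params["format"] = fmt
    # urlencode percent-encodes ':' and '/', which is equivalent to the
    # unencoded form shown on the slide once the server decodes it.
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    print(concept_data_url("https://earth.esa.int/concept/envisat"))
    print(concept_data_url("https://earth.esa.int/concept/meris", "application/json"))
```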
  15. General steps
     • Define/select semantic models to represent the data of the resources provided by the API
     • Implement a wrapper around the API that transforms, on the fly, a SPARQL request into an API call and generates RDF data from the GeoJSON result
     • Expose the generated RDF data via a SPARQL endpoint
     • Query the REST API with SPARQL
     • Process (e.g., format) any required output on the fly
     • Link the generated RDF data with other datasets and thesauri (on the fly or with previously generated/discovered RDF links)
     • Visualize and exploit Linked Data
  16. Ontologies for EO products metadata
     • General rule: reuse standard and/or widely used ontologies/vocabularies whenever possible, and extend them as needed
     • In the case of FedEO, the metadata returned already uses semantic vocabularies in its (Geo)JSON-LD representation, so the results only need to be exposed as Linked Data
  17. Ephedra: API Wrapper
     • Ephedra is a SPARQL federation engine aimed at processing hybrid queries, which provides a flexible declarative mechanism for including hybrid services into a SPARQL federation.
     • Ephedra is a component of Metaphactory (https://www.metaphacts.com/), an end-to-end Knowledge Graph Platform for knowledge graph management, rapid application development, and end-user oriented interaction.
  18. Creating SPARQL wrapper with Ephedra
     • Describe the REST Service Signature (mapping)
       • Define input & output terms, e.g., dc:identifier
       • Define input & output data types, e.g., xsd:string
       • Define the output (JSON) path, e.g., $.@graph[0].rdfs:member[*].dct:identifier
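To make the output path on slide 18 concrete, the following Python sketch walks a JSON-LD-like document the way the path `$.@graph[0].rdfs:member[*].dct:identifier` describes: first element of `@graph`, every entry of `rdfs:member`, then the `dct:identifier` value. The sample document is invented for illustration and is not a real catalogue response; Ephedra evaluates such paths itself, so this code only shows what the path selects.

```python
def extract_identifiers(document):
    """Return the values selected by $.@graph[0].rdfs:member[*].dct:identifier."""
    graph = document.get("@graph", [])
    if not graph:
        return []
    members = graph[0].get("rdfs:member", [])
    # [*] means every member; members without the key are skipped.
    return [m["dct:identifier"] for m in members if "dct:identifier" in m]


# Invented sample with the minimal structure the path expects.
SAMPLE = {
    "@graph": [
        {
            "rdfs:member": [
                {"dct:identifier": "PRODUCT_A"},
                {"dct:identifier": "PRODUCT_B"},
                {"dct:title": "member without an identifier"},
            ]
        }
    ]
}

if __name__ == "__main__":
    print(extract_identifiers(SAMPLE))
```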
  19. Creating SPARQL wrapper with Ephedra
     • Configure the AgroDataCube REST Service Repository
     • Include this repository into the Ephedra federation
  20. Expose generated RDF data via SPARQL endpoint
     • SPARQL endpoint provided
       • http://metaphactory.foodie-cloud.org/sparql?repository=ephedra
     • Use the SPARQL SERVICE keyword
       • SERVICE <http://www.metaphacts.com/ontologies/platform/repository/federation#spacebelOS>
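As a rough illustration of how the SERVICE keyword from slide 20 might be used, the Python sketch below assembles a query that delegates a pattern to the wrapped service and encodes it as a SPARQL 1.1 Protocol GET request. The endpoint and service IRI come from the slide. The triple pattern inside the SERVICE block is a placeholder: the actual input and output terms depend on the service signature configured in Ephedra, which the slides do not spell out. The sketch only builds the URL and does not contact the endpoint.

```python
from urllib.parse import urlencode

ENDPOINT = "http://metaphactory.foodie-cloud.org/sparql?repository=ephedra"
SERVICE_IRI = (
    "http://www.metaphacts.com/ontologies/platform/repository/federation#spacebelOS"
)


def build_query(limit=10):
    """Build a SELECT query whose pattern is evaluated by the wrapped service."""
    return (
        "PREFIX dct: <http://purl.org/dc/terms/>\n"
        "SELECT ?product ?id WHERE {\n"
        f"  SERVICE <{SERVICE_IRI}> {{\n"
        # Placeholder pattern: real terms come from the service signature.
        "    ?product dct:identifier ?id .\n"
        "  }\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def request_url(query, endpoint=ENDPOINT):
    """Encode the query as the 'query' parameter of a protocol GET request."""
    separator = "&" if "?" in endpoint else "?"
    return endpoint + separator + urlencode({"query": query})


if __name__ == "__main__":
    print(build_query(5))
    print(request_url(build_query(5)))
```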
  21. Query REST API with SPARQL
     • Process (e.g., format) any required output on the fly
     • Link the generated RDF data with other datasets and thesauri on the fly
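One simple way to picture the on-the-fly linking mentioned on slide 21 is a lookup from keywords in the product metadata to thesaurus concept URIs. The Python sketch below does this with an in-memory table. The two concept URIs appear on slide 14; the labels used as lookup keys, the product URI, and the choice of `dct:subject` as the linking predicate are assumptions made for this example, not the mapping used in the demo.

```python
# Concept URIs from slide 14; the lower-case lookup labels are assumed.
CONCEPTS = {
    "envisat": "https://earth.esa.int/concept/envisat",
    "meris": "https://earth.esa.int/concept/meris",
}

# Assumed linking predicate for this sketch.
DCT_SUBJECT = "http://purl.org/dc/terms/subject"


def link_products(products, concepts=CONCEPTS, predicate=DCT_SUBJECT):
    """Return (subject, predicate, object) links from products to concepts.

    products maps a product URI to a list of free-text keywords such as
    platform or instrument names. Keywords without a matching concept are
    ignored.
    """
    links = []
    for product_uri, keywords in products.items():
        for keyword in keywords:
            concept = concepts.get(keyword.strip().lower())
            if concept is not None:
                links.append((product_uri, predicate, concept))
    return links


if __name__ == "__main__":
    sample = {"http://example.org/eo/product/P1": ["Envisat", "MERIS", "unknown"]}
    for triple in link_products(sample):
        print(triple)
```

Exact-label matching is the crudest possible strategy; it is used here only because it keeps the linking step visible in a few lines.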
  22. Visualize and exploit the linked data
     • Demo app: http://metaphactory.foodie-cloud.org/resource/:ESA-datasets
  23. Visualize and exploit the linked data
     • Demo app: http://metaphactory.foodie-cloud.org/resource/:ESA-datasets
  24. Visualize and exploit the linked data
  25. Visualize and exploit the linked data
  26. Visualize and exploit the linked data
  27. Thank you for your attention!
     Special thanks to the Metaphacts team.
     Questions: rpalma@man.poznan.pl
