
A Lightweight Approach to Explore, Enrich and Use Data with a Geospatial Dimension with Semantic Web Technologies


Paper presentation: Christophe Debruyne, Kris McGlinn, Lorraine McNerney and Declan O'Sullivan: A Lightweight Approach to Explore, Enrich and Use Data with a Geospatial Dimension with Semantic Web Technologies. Presented at the Fourth International ACM SIGMOD Workshop on Managing and Mining Enriched Geo-Spatial Data (GeoRich 2017), co-located with SIGMOD/PODS 2017 in Chicago, IL, USA.

Published in: Science

  1. A Lightweight Approach to Explore, Enrich and Use Data with a Geospatial Dimension using Semantic Web Technologies
     Christophe Debruyne (TCD), Kris McGlinn (TCD), Lorraine McNerney (OSi), and Declan O’Sullivan (TCD)
     2017-05-14 @ GeoRICH 2017
     The ADAPT Centre is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund.
  2. Introduction (www.adaptcentre.ie)
     • Linked Data is a set of best practices and guidelines to publish and interlink data on the Web by cleverly combining several standardized technologies.
     • Geospatial information is an important part of the LD Web. In fact, most datasets have a geospatial dimension, and location provides a convenient way to explore, analyze and align datasets.
     • Its importance is evidenced by the vast amount of geographic data on the Web and the bespoke tools that support the consumption of such data (e.g., GeoSPARQL-enabled triplestores).
     • How can we limit our reliance on such bespoke tools, and leverage engagement with geospatial Linked Data to analyze, explore and enrich non-RDF data?
  3. Context – Open Data Engagement Fund
     • Organized by the Department of Public Expenditure and Reform in conjunction with the Open Data Governance Board.
     • The goal of this initiative is to improve the availability and usage of data on the http://data.gov.ie/ portal.
     • We observed that few datasets are available as RDF, and thus few are available as Linked Data. They are available as CSV, TSV, etc. However, many datasets have a geospatial dimension, which ideally should be linked to authoritative geospatial Linked Data (as provided by the Ordnance Survey Ireland).
  4. Context – data.geohive.ie
     data.geohive.ie is an ongoing collaboration between ADAPT and the Ordnance Survey Ireland to publish OSi’s authoritative geospatial information as Linked Data. Starting from publicly available boundary data, it supports two use cases: the provision of different geometries for features, and the provenance and evolution of features and their geometries.
  5. Approach
     We propose a method and a set of tools for:
     • Uplift: transforming non-RDF into RDF
     • Interlinking
     • Downlift: transforming back to non-RDF, and
     • Engagement
     [Figure: pipeline diagram. CSV File → Uplift → Data in RDF; Create Links (link discovery tools, SPARQL CONSTRUCT queries, ...) → Links in RDF; Merge / Combine → Enriched descriptions in RDF → Engage, or Downlift → Enriched CSV File]
  6. Background
     Triple Pattern Fragments (Verborgh et al. 2016)
     • SPARQL endpoints require a lot of server resources, so publishers often provide data dumps and resolvable URIs as a “good enough” practice to avoid this problem (Verborgh et al. 2016)
     • Distribute load between a TPF client and server
     • Less load on the server at the cost of increased bandwidth
     • Easy to set up locally and to “simulate” federated queries
     GeoSPARQL
     • An OGC standard for representing and querying geospatial data on the Semantic Web. It defines both a vocabulary for representing geospatial information and an extension to the SPARQL query language.
     Reference: R. Verborgh, M. Vander Sande, O. Hartig, J. Van Herwegen, L. De Vocht, B. De Meester, G. Haesendonck, and P. Colpaert. 2016. Triple Pattern Fragments: A low-cost knowledge graph interface for the Web. J. Web Sem. 37-38 (2016), 184–206
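The division of labour between a TPF server and client can be illustrated with a small in-memory sketch. This is not the TPF protocol itself (which works over HTTP with hypermedia controls and count metadata); the store contents, the function names `fragment`, `all_matches` and `stations_with_county_label`, and the page size are all assumptions made for illustration. The "server" only answers single triple patterns one page at a time, and the "client" pages through the results and performs the join.

```python
# Toy in-memory "server" data; the identifiers are illustrative, not real IRIs.
STORE = [
    ("station/1", "within", "county/fingal"),
    ("station/2", "within", "county/fingal"),
    ("station/3", "within", "county/fingal"),
    ("county/fingal", "label", "Fingal"),
]


def fragment(store, pattern, page=0, page_size=2):
    """Server side: one page of triples matching a single triple pattern.

    None in the pattern acts as a wildcard. The total match count is
    returned alongside the page, loosely mimicking fragment metadata.
    """
    matches = [
        triple for triple in store
        if all(want is None or want == have for want, have in zip(pattern, triple))
    ]
    start = page * page_size
    return matches[start:start + page_size], len(matches)


def all_matches(store, pattern):
    """Client side: request page after page until the fragment is exhausted."""
    page, results = 0, []
    while True:
        triples, total = fragment(store, pattern, page)
        results.extend(triples)
        page += 1
        if len(results) >= total:
            return results


def stations_with_county_label(store):
    """Client side: join two triple patterns; the server never sees the join."""
    pairs = []
    for station, _, county in all_matches(store, (None, "within", None)):
        for _, _, label in all_matches(store, (county, "label", None)):
            pairs.append((station, label))
    return pairs
```

Each inner lookup is a separate request in this sketch, which is where the extra bandwidth comes from; the server, in turn, only ever evaluates simple pattern lookups.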
  7. R2RML
     R2RML: RDB to RDF Mapping Language
     • A W3C Recommendation since fall 2012
     • One creates an R2RML file that annotates a relational database with existing vocabularies and/or ontologies (RDFS or OWL).
     • That R2RML file goes through an R2RML mapping engine to produce RDF.
     • R2RML specifies:
       • An ontology to specify those mappings;
       • How those mappings should be interpreted to produce RDF.
     • R2RML files are thus stored as RDF.
  8. Background – R2RML Example

         @prefix rr: <http://www.w3.org/ns/r2rml#> .
         @prefix foaf: <http://xmlns.com/foaf/0.1/> .
         @prefix dbpedia: <http://dbpedia.org/ontology/> .

         <#CityTriplesMap>
             a rr:TriplesMap ;
             rr:logicalTable [ rr:tableName "City" ] ;
             rr:subjectMap [
                 rr:template "http://foo.example/City/{ID}" ;
                 rr:class dbpedia:Place ;
             ] ;
             rr:predicateObjectMap [
                 rr:predicate foaf:name ;
                 rr:objectMap [ rr:column "Name" ] ;
             ] ;
             .

     Callouts:
     • rr:logicalTable: What is being mapped? A logical table/view or an SQL query.
     • rr:subjectMap: How to generate and state something about the subject of those triples.
     • rr:predicateObjectMap: How to generate predicates and objects.

     Table "City":
         ID   Name
         1    Dublin
         2    Ghent
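To make the interpretation of such a mapping concrete, the following Python sketch applies the logic of `<#CityTriplesMap>` to the two rows of the "City" table by hand. It is not an R2RML engine: the helper names `expand_template` and `apply_city_triples_map` are hypothetical, the mapping is hard-coded rather than parsed, and the IRI-safe encoding of template values that R2RML prescribes is skipped.

```python
import re

# Rows of the "City" table from the slide.
CITY_ROWS = [{"ID": 1, "Name": "Dublin"}, {"ID": 2, "Name": "Ghent"}]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
DBPEDIA_PLACE = "http://dbpedia.org/ontology/Place"


def expand_template(template, row):
    """Replace each {COLUMN} placeholder with the row's value for that column."""
    return re.sub(r"\{([^}]+)\}", lambda match: str(row[match.group(1)]), template)


def apply_city_triples_map(rows):
    """Hypothetical helper: produce (subject, predicate, object) tuples per row."""
    triples = []
    for row in rows:
        # rr:subjectMap with rr:template
        subject = expand_template("http://foo.example/City/{ID}", row)
        # rr:class adds an rdf:type triple for every subject
        triples.append((subject, RDF_TYPE, DBPEDIA_PLACE))
        # rr:predicateObjectMap with rr:column "Name"
        triples.append((subject, FOAF_NAME, row["Name"]))
    return triples


triples = apply_city_triples_map(CITY_ROWS)
```

Each row yields one subject IRI and two triples: a type statement from `rr:class` and a `foaf:name` statement from the column-valued object map.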
  9. Approach – Running Example
     • Fingal County Council Weather Stations
     • Example from the http://data.gov.ie/ portal

     Records in fccweatherstationsp20110829-2221.csv:
         Name                 Weather Reading   Agency                     LAT           LONG
         M50 Blanchardstown   http://…          National Roads Authority   53.37046603   -6.380851447
         M50 Dublin Airport   http://…          National Roads Authority   53.40964111   -6.227597428
         Dublin Airport       http://…          Met Éireann                53.42150608   -6.29784754
  10. Approach – Uplift
      • Adopt the W3C RDB to RDF Mapping Language (R2RML) to generate a mapping from CSV to RDF
      • Insert metadata containing information about the non-RDF resource’s structure in the mapping

          @prefix rr: <http://www.w3.org/ns/r2rml#> .
          @prefix odef: <http://adaptcentre.ie/ont/odef#> .
          @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
          @prefix csv: <file:///...#> .

          <#TriplesMap>
              rr:logicalTable [
                  rr:sqlQuery """SELECT rownum() AS ROW_NUM, * FROM fccweatherstationsp201108292221;"""
              ] ;
              rr:subjectMap [
                  rr:template "http://www.example.org/record/{ROW_NUM}" ;
                  rr:class odef:Record ;
              ] ;
              rr:predicateObjectMap [
                  rr:predicate csv:LONG ;
                  odef:label "LONG" ;
                  odef:order "5"^^xsd:int ;
                  rr:objectMap [ rr:column "LONG" ] ;
              ] ;
              # Other predicate-object maps omitted for brevity
              rr:predicateObjectMap [
                  rr:predicate csv:ROW_NUM ;
                  rr:objectMap [ rr:column "ROW_NUM" ] ;
              ] .
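A skeleton mapping of this shape could be generated from the CSV header alone. The sketch below is not the authors' generate-mapping tool; `skeleton_mapping` and `column_name` are hypothetical helpers, and both the column-name normalisation (upper case, underscores) and the `csv:` namespace passed in by the caller are assumptions, since the slides elide the real namespace.

```python
import re


def column_name(header):
    """Assumed normalisation: 'Weather Reading' -> 'WEATHER_READING'."""
    return re.sub(r"\W+", "_", header.strip()).upper()


def skeleton_mapping(table, headers, csv_ns):
    """Return R2RML (Turtle) text with one predicate-object map per column.

    odef:label keeps the original header and odef:order its 1-based position,
    so the original file layout can be reconstructed later.
    """
    lines = [
        "@prefix rr: <http://www.w3.org/ns/r2rml#> .",
        "@prefix odef: <http://adaptcentre.ie/ont/odef#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        f"@prefix csv: <{csv_ns}> .",
        "",
        "<#TriplesMap>",
        f'  rr:logicalTable [ rr:sqlQuery """SELECT rownum() AS ROW_NUM, * FROM {table};""" ] ;',
        '  rr:subjectMap [ rr:template "http://www.example.org/record/{ROW_NUM}" ; rr:class odef:Record ] ;',
    ]
    for order, header in enumerate(headers, start=1):
        name = column_name(header)
        lines.append(
            f'  rr:predicateObjectMap [ rr:predicate csv:{name} ; odef:label "{header}" ; '
            f'odef:order "{order}"^^xsd:int ; rr:objectMap [ rr:column "{name}" ] ] ;'
        )
    lines.append(
        '  rr:predicateObjectMap [ rr:predicate csv:ROW_NUM ; rr:objectMap [ rr:column "ROW_NUM" ] ] .'
    )
    return "\n".join(lines)


mapping = skeleton_mapping(
    "fccweatherstationsp201108292221",
    ["Name", "Weather Reading", "Agency", "LAT", "LONG"],
    "http://www.example.org/csv#",  # hypothetical namespace
)
```

Keeping the label and order as metadata in the mapping is what later allows the original column layout to be rebuilt from the RDF.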
  11. Approach – Uplift
      Generate RDF with R2RML-F, an implementation of R2RML supporting functions in JavaScript (https://opengogs.adaptcentre.ie/debruync/r2rml)

          <http://www.example.org/record/2>
              a <http://adaptcentre.ie/ont/odef#Record> ;
              <file:///...#AGENCY> "National Roads Authority" ;
              <file:///...#LAT> "53.4096411069945" ;
              <file:///...#LONG> "-6.22759742761812" ;
              <file:///...#NAME> "M50 Dublin Airport" ;
              <file:///...#ROW_NUM> "2"^^<http://www.w3.org/2001/XMLSchema#int> ;
              <file:///...#WEATHER_READING> "http://www.nratraffic.ie/..." ;
              geo:asWKT "POINT(-6.22759742761812 53.4096411069945)"^^geo:wktLiteral .

      Reference: Christophe Debruyne, Declan O'Sullivan: R2RML-F: Towards Sharing and Executing Domain Logic in R2RML Mappings. LDOW@WWW 2016
  12. Approach – Interlinking
      Interlinking is done either with SPARQL CONSTRUCT queries, or by feeding the RDF to link discovery tools such as Silk (Jentzsch et al. 2010) or LIMES (Ngonga et al. 2011).
      SPARQL CONSTRUCT queries can be formulated using TPF. To support GeoSPARQL, we use an extension of the TPF client (Debruyne et al. 2017).
      In our running example, we can adopt GeoSPARQL to find out where the weather stations are based by either:
      • Concatenating the LAT and LONG to create a WKT point
      • Extending the skeleton mapping to generate WKT points
      References:
      • Anja Jentzsch, Robert Isele, Christian Bizer: Silk - Generating RDF Links while Publishing or Consuming Linked Data. ISWC Posters&Demos 2010
      • Axel-Cyrille Ngonga Ngomo, Sören Auer: LIMES - A Time-Efficient Approach for Large-Scale Link Discovery on the Web of Data. IJCAI 2011: 2312-2317
      • Christophe Debruyne, Eamonn Clinton, Declan O'Sullivan: Client-side Processing of GeoSPARQL Functions with Triple Pattern Fragments. LDOW@WWW 2017
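The kind of test a "within" link rests on can be sketched in plain Python. This is a simplified stand-in for a GeoSPARQL containment function, not the authors' client extension: it treats coordinates as planar, ignores polygon holes and boundary cases, and the rectangle `HYPOTHETICAL_AREA` is an invented region, not OSi's geometry for any county.

```python
import re


def parse_wkt_point(wkt):
    """Parse 'POINT(x y)' into a (x, y) tuple of floats."""
    match = re.fullmatch(r"\s*POINT\s*\(\s*([-+\d.eE]+)\s+([-+\d.eE]+)\s*\)\s*", wkt)
    if match is None:
        raise ValueError(f"not a WKT point: {wkt!r}")
    return float(match.group(1)), float(match.group(2))


def point_in_ring(x, y, ring):
    """Ray casting: count how many ring edges a ray going right from (x, y) crosses."""
    inside = False
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        if (y1 > y) != (y2 > y):  # the edge spans the ray's y value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Invented rectangle (longitude, latitude) used only for this illustration.
HYPOTHETICAL_AREA = [(-6.5, 53.3), (-6.0, 53.3), (-6.0, 53.7), (-6.5, 53.7)]

station = parse_wkt_point("POINT(-6.22759742761812 53.4096411069945)")
station_is_within = point_in_ring(station[0], station[1], HYPOTHETICAL_AREA)
```

A CONSTRUCT query would emit one link triple for each station and feature pair for which such a test succeeds.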
  13. Approach – Interlinking
      Extending the R2RML Mapping

          rr:predicateObjectMap [
              rr:predicate geo:asWKT ;
              rr:objectMap [
                  rr:template "POINT({LONG} {LAT})" ;
                  rr:termType rr:Literal ;
                  rr:datatype geo:wktLiteral ;
              ] ;
          ] ;

      [Figure: executing a CONSTRUCT query, with four labelled parts]
      A. OSi's TPF server
      B. the local TPF server
      C. the query (using our extension), and
      D. the resulting RDF graph.
      • Runs in a browser, and one does not need to install and populate a GeoSPARQL-enabled triplestore.
      • Straightforward federated querying.
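The effect of the `rr:template "POINT({LONG} {LAT})"` object map can be reproduced by hand for one record. The function name `wkt_point` is hypothetical; the record values are those of record 2 shown in the uplift output. Note the axis order: longitude comes first, which matches the default coordinate reference system (CRS84) of GeoSPARQL WKT literals.

```python
def wkt_point(row, lon_column="LONG", lat_column="LAT"):
    """Expand the template "POINT({LONG} {LAT})" for one CSV record."""
    return "POINT({} {})".format(row[lon_column], row[lat_column])


record = {
    "NAME": "M50 Dublin Airport",
    "LAT": "53.4096411069945",
    "LONG": "-6.22759742761812",
}
literal = wkt_point(record)
```

The values are kept as strings so that the literal carries exactly the digits found in the CSV file.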
  14. Approach
      Two possible approaches to extending TPF with GeoSPARQL:
      A) Extending a TPF client
      • TPF server specification intact (backwards compatible)
      • Possibly more network overhead
      B) Extending the TPF server
      • Outside the server specification, but proven to be viable for substring filtering (Van Herwegen et al. 2015)
      Additional requirement: a pure JavaScript implementation
      • Allows one to run the client in a browser and hence facilitates stakeholders in formulating GeoSPARQL queries
      Reference: J. Van Herwegen, L. De Vocht, R. Verborgh, E. Mannens, and R. Van de Walle. 2015. Substring Filtering for Low-Cost Linked Data Interfaces. In The Semantic Web - ISWC 2015 - 14th International Semantic Web Conference, Bethlehem, PA, USA, October 11-15, 2015, Proceedings, Part I (LNCS), Vol. 9366. Springer, 128–143.
  15. Approach – Downlift
      1. Merge the generated RDF and the links
      2. Provide the downlift algorithm with:
         1. The merged RDF
         2. The R2RML mapping, to reconstitute the original file
         3. A set of predicates (URIs) for the additional columns. Empty values will be generated when no links exist

      Enriched CSV file:
          Name                 Weather Reading   …   http://www.opengis.net/ont/geosparql#within
          M50 Blanchardstown   http://…          …   http://data.geohive.ie/resource/county/2AE19629144F13A3E055000000000001
          M50 Dublin Airport   http://…          …   http://data.geohive.ie/resource/county/2AE19629144F13A3E055000000000001
          Dublin Airport       http://…          …   http://data.geohive.ie/resource/county/2AE19629144F13A3E055000000000001
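The downlift step can be sketched as follows. This is not the authors' downlift tool: the function `downlift`, the representation of triples as plain tuples, the `CSV_NS` namespace, and the choice to join multiple link values with "; " are all assumptions. The column list stands in for the label and order metadata that the R2RML mapping would provide.

```python
import csv
import io

WITHIN = "http://www.opengis.net/ont/geosparql#within"
CSV_NS = "http://www.example.org/csv#"  # hypothetical; the slides elide the real namespace


def downlift(triples, columns, extra_predicates, row_predicate):
    """Rebuild a CSV file from merged triples.

    columns: (predicate, header label) pairs in the original column order.
    extra_predicates: predicates that become additional columns.
    row_predicate: predicate holding the original row number.
    """
    values = {}
    for subject, predicate, obj in triples:
        values.setdefault(subject, {}).setdefault(predicate, []).append(obj)

    # Records are the subjects with a row number; restore the original order.
    records = [s for s in values if row_predicate in values[s]]
    records.sort(key=lambda s: int(values[s][row_predicate][0]))

    predicates = [p for p, _ in columns] + list(extra_predicates)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([label for _, label in columns] + list(extra_predicates))
    for subject in records:
        # A missing predicate yields an empty cell, as when no link exists.
        writer.writerow(["; ".join(values[subject].get(p, [])) for p in predicates])
    return out.getvalue()


COLUMNS = [(CSV_NS + "NAME", "Name"), (CSV_NS + "AGENCY", "Agency")]
TRIPLES = [
    ("http://www.example.org/record/2", CSV_NS + "ROW_NUM", "2"),
    ("http://www.example.org/record/2", CSV_NS + "NAME", "M50 Dublin Airport"),
    ("http://www.example.org/record/2", CSV_NS + "AGENCY", "National Roads Authority"),
    ("http://www.example.org/record/1", CSV_NS + "ROW_NUM", "1"),
    ("http://www.example.org/record/1", CSV_NS + "NAME", "M50 Blanchardstown"),
    ("http://www.example.org/record/1", CSV_NS + "AGENCY", "National Roads Authority"),
    ("http://www.example.org/record/1", WITHIN,
     "http://data.geohive.ie/resource/county/2AE19629144F13A3E055000000000001"),
]
enriched = downlift(TRIPLES, COLUMNS, [WITHIN], CSV_NS + "ROW_NUM")
```

In this toy input only record 1 carries a link, so record 2 ends with an empty cell in the added column.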
  16. Implementation
      On the tools
      • Uplift: https://opengogs.adaptcentre.ie/debruync/generate-mapping
      • Downlift: https://opengogs.adaptcentre.ie/debruync/downlift
      • Our R2RML engine: https://opengogs.adaptcentre.ie/debruync/r2rml
      • All available with accessible licenses (MIT)
      On the GeoSPARQL extension for a TPF Client
      • We extended v2.0.4 of the TPF Node.js Client (available at https://github.com/chrdebru/Client.js)
      • We made available a web client using this extension (at http://theme-e.adaptcentre.ie/geo-tpf/)
  17. Discussion
      On the meaning of the generated RDF and enriched CSV
      • The goal was to enrich non-RDF data with a geospatial component in a way that also allows one to engage with that data.
      • The generated mappings are not necessarily meaningful; the generated RDF is just as meaningful as the original file. Users are not obliged to adopt established vocabularies, but they can.
      • Adoption of established vocabularies would allow for more meaningful links, which are then stored back into the CSV, and the mapping would provide a basis for creating meaningful RDF, and even Linked Data.
      On the datasets
      • We used a simple dataset and only demonstrated SPARQL CONSTRUCT with GeoSPARQL. More exercises are being conducted in the context of the Open Data Engagement Fund; to be reported.
  18. Discussion (not in paper – new)
      Appreciation during public seminars
      • Seminars were held on the 4th and 10th of May, with about 40 registered attendees from academia, government agencies, and industry.
      • First seminar on Linked Data (principles and case studies)
      • Second seminar on Linked Data publishing uplift
      • A recurring concern about Linked Data publishing and interlinking is the reliance on third parties to conduct the exercise, and how one can acquire the expertise.
      • The simulated direct mapping and its generator were appreciated, as they allow one to incrementally generate "meaningful" Linked Data. Downlift was appreciated as it lets attendees keep relying on their existing processes.
  19. With respect to related work…
      The GeoKnow EU FP7 Project (http://www.geoknow.eu/)
      • Relies on a particular software stack
      • No explicit notion of downlift
      • Supports uplift of unstructured text and shapefiles
      Our approach was aimed to be lightweight, but we are aware that the approach will likely not scale as well as bespoke systems built for this particular purpose. Our goal was to leverage the enrichment of non-RDF open data with a geospatial dimension with semantic technologies.
  20. Summary, lessons learned, and future work
      Summary
      • We presented a lightweight approach to uplift, interlinking and enriching, and downlift, as well as to engaging with that information.
      • Developed in the context of an initiative to improve the usage and availability of open data on the Irish open data portal
      • Prototypes have been made available with an accessible license
      Lessons learned
      • We deem our approach viable. Even though we are aware our approach will not outperform certain bespoke systems in terms of GeoSPARQL processing and analysis, our goal was to leverage the aforementioned processes.
      Future work
      • User studies, and inclusion of governance aspects
      • Additional exercises with other datasets, to be reported in a whitepaper