Web & Social Media Analytics Previous Year Question Paper.pdf
Building Scalable Semantic Geospatial RDF Stores
1.
2. Outline
™ Motivation
™ The Model stRDF
™ The Query Language stSPARQL
™ The Scalable Geospatial RDF Store Strabon
™ Experimental Evaluation
™ Other Extensions to RDF
™ Conclusions
3. Motivation
™ Need for modeling of
™ Geospatial and temporal information
™ Product metadata
™ Product content
™ Need to link to other data sources
™ GIS data
™ Other information on the Web
4. TELEIOS Project
™ TELEIOS EU project
™ Development of a Virtual Earth Observatory
™ Accessible to expert and non-expert users
™ Scalable (Storing/Querying PBs)
™ Use Cases
™ DLR: A Virtual Observatory for TerraSAR-X data
™ NOA: Real-time fire monitoring based on continuous
acquisitions of EO images and geospatial data
™ Technologies
™ NKUA: Geospatial extensions for RDF and SPARQL
™ CWI: Array extensions to SQL
5. The Data Model stRDF
™ stRDF extends RDF with:
™ Spatial literals encoded by Boolean combinations of
linear constraints
™ New Datatype for spatial literals (strdf:geometry)
™ Valid time of triples encoded by Boolean combinations
of temporal constraints
™ stRDF (most recent version, TELEIOS)
™ Spatial literals encoded in Well-Known Text/GML
(OGC standards)
™ Valid time of triples ignored for the time being
7. stRDF: An example
ex:BurntArea1 rdf:type noa:BurntArea.
ex:BurntArea1 noa:hasID "1"^^xsd:decimal.
ex:BurntArea1 noa:hasArea "23.7636"^^xsd:double.
ex:BurntArea1 strdf:geometry "POLYGON(( 38.16 23.7, 38.18
23.7, 38.18 23.8, 38.16 23.8, 38.16 23.7));<http://
spatialreference.org/ref/epsg/4121/>"^^strdf:WKT .
Spatial Data Type
Well-Known Text
Spatial Literal
8. stRDF: An example (GML)
ex:BurntArea1 rdf:type noa:BurntArea.
ex:BurntArea1 noa:hasID "1"^^xsd:decimal.
ex:BurntArea1 noa:hasArea "23.7636"^^xsd:double.
ex:BurntArea1 strdf:geometry "<gml:Polygon
srsName='http://www.opengis.net/def/crs/EPSG/0/4121'>
<gml:outerBoundaryIs><gml:LinearRing><gml:coordinates>38.16
,23.70 38.18,23.70 38.18,23.80 38.16,23.80
38.16,23.70</gml:coordinates></gml:LinearRing></
gml:outerBoundaryIs></gml:Polygon>"^^strdf:GML .
Spatial Data Type
GML
Spatial Literal
9. stSPARQL: An example
™ Find all burnt forests within 10km of a city
SELECT ?BA ?BAGEO
WHERE {
?R rdf:type noa:Region .
?R strdf:geometry ?RGEO .
?R noa:hasCorineLandCoverUse ?F .
?F rdfs:subClassOf clc:Forests .
?CITY rdf:type dbpedia:City .
?CITY strdf:geometry ?CGEO .
?BA rdf:type noa:BurntArea .
?BA strdf:geometry ?BAGEO .
FILTER(strdf:Intersect(?RGEO,?BAGEO) &&
strdf:Distance(?BAGEO,?CGEO) < 0.02) }
Spatial
Functions
10. stSPARQL: Geospatial SPARQL 1.1
™ We define a SPARQL extension function for each function
defined in the OpenGIS Simple Features Access standard
™ Basic functions
™ Get a property of a geometry (e.g., strdf:srid)
™ Get the desired representation of a geometry (e.g., strdf:AsText)
™ Test whether a certain condition holds (e.g., strdf:IsEmpty,
strdf:IsSimple)
™ Functions for testing topological spatial relationships
(e.g., strdf:equals, strdf:intersects)
™ Spatial analysis functions
™ Construct new geometric objects from existing geometric objects
(e.g., strdf:buffer, strdf:intersection, strdf:convexHull)
™ Spatial metric functions (e.g., strdf:distance, strdf:area)
™ We define spatial aggregate functions (e.g., strdf:union,
strdf:extent)
11. stSPARQL: Geospatial SPARQL 1.1
™ Spatial terms
™ Constants (e.g., "POLYGON((38.16 23.7, ...)) " ^^strdf:WKT)
™ Variables (e.g., ?GEO)
™ Results of set operations (e.g., strdf:intersection, strdf:union,)
™ Results of geometric operations (e.g., strdf:boundary, strdf:buffer)
™ Select clause
™ Construction of new geometries (e.g., strdf:buffer(?geo, 0.1))
™ Spatial aggregate functions (e.g., strdf:extent(?geo))
™ Metric functions (e.g., strdf:area(?geo))
™ Filter clause
™ Functions for testing topological spatial relationships between spatial terms (e.g., strdf:contains(?
G1, strdf:union(?G2, ?G3)))
™ Numeric expressions involving spatial metric functions (e.g., strdf:area(?G1) ≤ 2*strdf:area(?G2)+1)
™ Boolean combinations
™ Having clause
™ Boolean expressions involving spatial aggregate functions and spatial metric functions or functions
testing for topological relationships between spatial terms (e.g., strdf:area(strdf:union(?geo) ) > 1 )
™ Updates
™ Similar to the OGC standard GeoSPARQL
12. Strabon: A Scalable Geospatial
RDF Store
™ Storing: stRDF
™ Querying: stSPARQL, GeoSPARQL
™ Storage Backend: PostGIS, MonetDB
™ Extends: Sesame 2.6.3
™ Open Source: http://strabon.di.uoa.gr/
™ EU Projects: Semsorgrid4Env and TELEIOS
14. Query Evaluation
Sesame’s Native Store (Baseline)
™ Index (Geometries): None
(geometries treated as ordinary literals)
™ Query Evaluation
1. Triple pattern evaluation
2. Evaluation of spatial functions in Sesame using JTS library
Strabon + PostGIS
™ Index (Geometries): R-Tree
™ Query Evaluation
PostGIS optimizer
Evaluation of spatial functions in PostGIS
Triple pattern evaluation might come before/after evaluation of spatial
functions
15. Experimental Evaluation
™ Workload based on geospatial linked datasets
q 150 million triples
q Test queries based on logs from Dbpedia,
LinkedGeoData
1. Detailed evaluation is coming up
[ISWC paper submission under preparation]
17. LinkedGeoData
™ Q1: Find the names of the points of interest located in the
city of Rotterdam.
SELECT ?poiName
WHERE {
?poi rdf:type lgd:Amenity .
?poi rdfs:label ?poiName .
?poi geo:geometry ?poiGeo .
FILTER (strdf:inside(?poiGeo, "POLYGON((
3.9111328125 51.7401123046875, 3.9111328125 52.05322265625,
4.6087646484375 52.05322265625, 4.6087646484375 51.7401123046875,
3.9111328125 51.7401123046875))"^^strdf:WKT)) }
18. LinkedGeoData/GADM
™ Q2: Find the points of interest located in Luxembourg.
SELECT ?label
WHERE {
?poi rdf:type lgd:Amenity .
?poi rdfs:label ?label .
?poi geo:geometry ?poiGeo .
gadm:LUXEMBOURG gadm:hasGeometry ?luxGeo .
FILTER (strdf:inside(?luxGeo, ?poiGeo)) }
19. Response Time for Queries Q1, Q2
0
50
100
150
200
250
cold caches warm caches cold caches warm caches
Q1 Q2
ResponseTime(seconds)
Workload based on Geospatial Linked Datasets
Strabon
Sesame
20. Experimental Evaluation
™ Workload based on a synthetic dataset
q Dataset based on OpenStreetMaps
q 10 million triples (2 GB) up to 1 billion triples (110 GB)
q Triples with spatial literals: 1.4 and 140 million triples
™ Response time of queries with various thematic and
spatial selectivities
26. The Framework RDFi
™ Extension of RDF with incomplete information
™ New kind of literals (e-literals) for each datatype
™ Property values that exist but are unknown or partially known
™ Partial knowledge: captured by constraints
(appropriate constraint language L)
™ RDF graphs extended to RDFi databases: pair (G, φ)
G: RDF graph with e-literals
φ: quantifier-free formula of L
™ Formal semantics for RDFi and SPARQL query evaluation
™ Representation System: CONSTRUCT with AUF graph patterns
™ Certain Answer: semantics, algorithms, computational complexity when L is
a language of spatial topological constraints
™ Implementation in the context of Strabon has started with L being PCL
(topological constraints between variables and polygon constants)
[conference submission]
27. Challenges
™ Query processing in face of linked geospatial
data
™ Query optimization for spatial data
™ Federation
™ Efficient query processing with incomplete
information
28. Conclusions
™ stRDF/stSPARQL
™ Model and query language for the representation and
querying of geospatial data
™ Strabon
™ Scalable Geospatial RDF Store
™ RDFi
™ Framework for representation and querying of RDF
data with incomplete information