SlideShare uma empresa Scribd logo
1 de 38
Strabon
A Semantic Geospatial DBMS
 Kostis Kyzirakos, Manos Karpathiotakis, and
             Manolis Koubarakis



 Dept. of Informatics and Telecommunications,
 National and Kapodistrian University of Athens, Greece
Outline
 Introduction
 The data model stRDF
 The query language stSPARQL
  and a comparison to GeoSPARQL
 The system Strabon for
  stSPARQL and GeoSPARQL
 Experimental Evaluation
 Related Work & Conclusions
Main idea
 How do we represent and query geospatial
 information in the Semantic Web?

    Develop appropriate vocabularies and
     ontologies

    Extend RDF to take into account the
     geospatial dimension

    Extend SPARQL to query the new kinds of
     data

    Use Open Geospatial Consortium (OGC) and
     other geospatial industry standards
Example




          4
National Observatory of Athens:
Fire Products Ontology




                                  5
   Introduction
   The data model


                         The data model
    stRDF
   The query language
    stSPARQL and a


                         stRDF
    comparison to
    GeoSPARQL
   The system Strabon
    for stSPARQL and
    GeoSPARQL
   Experimental
    Evaluation
   Related Work &
    Conclusions
The Data Model stRDF

 stRDF extends RDF with:
    Spatial literals encoded by Boolean combinations of
     linear constraints
    New datatype for spatial literals (strdf:geometry)
    Valid time of triples encoded by Boolean
     combinations of temporal constraints


 stRDF (most recent version)
    Spatial literals encoded in Well-Known Text/GML
     (OGC standards)
    Valid time of triples ignored for the time being
Burnt Area Products
@prefix strdf:
<http://strdf.di.uoa.gr/
ontology#>.
@prefix noa:
<http://teleios.di.uoa.gr/
ontologies/noaOntology.owl#>.

noa:ba_15 rdf:type noa:BurntArea ;
          noa:hasGeometry noa:ba_15g.
                                 Spatial
                                  literal
noa:ba_15g noa:hasSerialization
  "MULTIPOLYGON(((393801.42 4198827.92,
             ..., 393008 424131)));
<http://www.opengis.net/def/crs/EPSG/0/2100>"
                        Spatial  ^^strdf:WKT.
                       data type
The stRDF Data Model
strdf:geometry rdf:type rdfs:Datatype;
              rdfs:subClassOf rdfs:Literal.




strdf:WKT   rdf:type rdfs:Datatype;
            rdfs:subClassOf strdf:geometry.

strdf:GML   rdf:type    rdfs:Datatype;
            rdfs:subClassOf   strdf:geometry.
   Introduction
   The data model
    stRDF
                     The query
   The query
    language         language
                     stSPARQL
    stSPARQL and a
    comparison to
    GeoSPARQL
   The system
    Strabon for
    stSPARQL and

                     and a comparison
    GeoSPARQL
   Experimental

                     to GeoSPARQL
    Evaluation
   Related Work &
    Conclusions
stSPARQL: Geospatial SPARQL 1.1
We define a SPARQL extension function for each function
defined in the OpenGIS Simple Features Access standard

 Basic functions
   Get a property of a geometry (e.g., strdf:srid)
   Get the desired representation of a geometry (e.g., strdf:AsText)
   Test whether a certain condition holds (e.g., strdf:IsEmpty,
    strdf:IsSimple)

 Functions for testing topological spatial relationships
 (e.g., strdf:equals, strdf:intersects)
   OGC Simple Features Access, Egenhofer, RCC-8


 Spatial analysis functions
    Construct new geometric objects from existing geometric objects (e.g.,
     strdf:buffer, strdf:intersection, strdf:convexHull)
    Spatial metric functions (e.g., strdf:distance, strdf:area)


 Spatial aggregate functions (e.g., strdf:union, strdf:extent)
stSPARQL: Geospatial SPARQL 1.1
Select clause
 Construction of new geometries (e.g., strdf:buffer(?geo, 0.1))
 Spatial aggregate functions (e.g., strdf:union(?geo))
 Metric functions (e.g., strdf:area(?geo))

Filter clause
 Functions for testing topological relationships between spatial terms (e.g.,
  strdf:contains(?G1, strdf:union(?G2, ?G3)))
 Numeric expressions involving spatial metric functions
  (e.g., strdf:area(?G1) ≤ 2*strdf:area(?G2))
 Boolean combinations

Having clause
 Boolean expressions involving spatial aggregate functions and spatial metric
  functions or functions testing for topological relationships between spatial
  terms (e.g., strdf:area(strdf:union(?geo))>1)
Updates
stSPARQL: An example (1/2)
Find coniferous forests that have been affected by
fires



SELECT ?forest ?burntArea
WHERE {
   ?burntArea rdf:type noa:BurntArea;
               noa:hasGeometry [
                noa:hasSerialization ?baGeo].
   ?forest   rdf:type noa:Region;
             clc:hasLandCover noa:coniferousForest;
   Spatial   clc:hasGeometry [
  Function   clc:hasSerialization ?fGeom].
    FILTER(strdf:intersects(?baGeom,?fGeom))
}
stSPARQL: An example (2/2)
Isolate the parts of the burnt areas that lie in
coniferous forests.                 Spatial
 SELECT ?burntArea                Aggregate
(strdf:intersection(?baGeom,
                     strdf:union(?fGeom))
 AS ?burntForest)
WHERE {
   ?burntArea rdf:type noa:BurntArea;
               noa:hasGeometry [
                noa:hasSerialization ?baGeo].
   ?forest   rdf:type noa:Region;
             clc:hasLandCover noa:coniferousForest;
   Spatial   clc:hasGeometry [
  Function   clc:hasSerialization ?fGeom].
   FILTER(strdf:intersects(?baGeom,?fGeom))
}
GROUP BY ?burntArea ?baGeom
The OGC Standard GeoSPARQL
                    Core



       Topology                  Geometry      Parameters
      Vocabulary                 Extension
      Extension
- relation family
                           - serialization
                           - version           • Serialization
                                                  • WKT
                                                  • GML
                       Geometry Topology
                           Extension
                       - serialization
                                               • Relation Family
                       - version
                       - relation family
                                                  • Simple
                                                     Features
                                                  • RCC-8
  Query Rewrite            RDFS Entailment
    Extension                 Extension           • Egenhofer
- serialization            - serialization
- version                  - version
- relation family          - relation family
   Introduction
   The data model
    stRDF            The system
   The query
    language
    stSPARQL and a
                     Strabon for
    comparison to
    GeoSPARQL        stSPARQL and
                     GeoSPARQL
   The system
    Strabon for
    stSPARQL and
    GeoSPARQL
   Experimental
    Evaluation


                     strabon.di.uoa.gr
   Related Work &
    Conclusions
Strabon Architecture
                             Strabon

 WKT       GML   Sesame v2.6.3
                   Query Engine         Storage Manager
                       Parser              Repository

                     Optimizer                 SAIL
       stRDF
       graphs         Evaluator                RDBMS

                     Transaction
                      Manager




  stSPARQL/
  GeoSPARQL
    queries
                                   GeneralDB




                     PostGIS
Storage Scheme                               uri_values
type_2
Triples                                      ID   VALUE
                                            Dictionary
SUBJEC OBJECT
SUBJECT PREDICATE OBJECT
             PREDICATE                      1
                                     OBJECT VALUE
                                           ID    noa:ba_15
T
noa:ba_15 3 rdf:type
noa:ba_15
1            rdf:type                      12 noa:ba_15
                                                 rdf:type
                                     noa:BurntArea .
1
noa:ba_15 2 noa:hasGeometry
                         3
noa:ba_15 6 noa:hasGeometry
5                                    noa:ba_15g. rdf:type
                                      noa:ba_15g noa:BurntArea
                                             23
1
noa:ba_15g4 rdf:type     5
noa:ba_15g rdf:type
hasgeom_4
                                     noa:Geometry .
                                             34 noa:BurntArea
                                      noa:Geometrynoa:hasGeometry
5
noa:ba_15g2 noa:hasSerialization
                         6           "MULTIPOLYGON(...)"^^
noa:ba_15g OBJECT
SUBJECT      noa:hasserialization            45 noa:hasGeometry
                                                  noa:ba_15g
                                      "MULTIPOLYGON(...)"^^
5         7              8            strdf:WKT .
                                      strdf:WKT noa:ba_15g
                                             56   noa:Geometry
1          5
                                             67 noa:Geometry
                                                  noa:hasSerialization
hasserial_7
                                            7label_values
                                                 noa:hasSerialization
SUBJECT OBJECT
                                            8 ID "MULTIPOLYGON(...)"^^
                                                    VALUE
5             8         geo_values               strdf:WKT
                                              8     "MULTIPOLYGON(...)"^^
                        ID   VALUE                 strdf:WKT
                        8                    datatype_values
                                             ID    VALUE
                                             8     strdf:WKT
Query Processing
                          Strabon   • Parser generates
 stSPARQL/                            abstract syntax tree
                Query Engine
GeoSPARQL
    query           Parser
                                    • Abstract syntax tree
                  Optimizer           mapped to internal
                   Evaluator          algebra of Sesame
                  Transaction
                   Manager          • Standard optimizations
                                      performed
                                    • Evaluator produces
                                      corresponding SQL
                        GeneralDB     query
                                    • DBMS evaluates the
                                      SQL query
             PostGIS
                                    • Post-processing
Query Processing (cont’d)
• Deviate from the evaluation strategy of Sesame
  for SPARQL extension functions
• Push the evaluation of extension functions to
  underlying DBMS
   • Spatial predicates evaluated by PostGIS
   • Spatial joins now affect query plan
   • Avoid Cartesian products


• Results may be returned in well-known industry
 formats
 • KML/KMZ
 • GeoJSON
 • GML
   Introduction
   The data model


                     Experimental
    stRDF
   The query
    language


                     Evaluation
    stSPARQL and a
    comparison to
    GeoSPARQL
   The system
    Strabon for
    stSPARQL and
    GeoSPARQL
   Experimental
    Evaluation
   Related Work &
    Conclusions
Experimental Evaluation
• Goal: Evaluate the performance of
 Strabon vs other systems

• Real workload based on geospatial
 linked datasets
 150 million triples


• Synthetic workload
  Half a billion triples
Strabon vs other systems

• Strabon over PostgreSQL
  (Strabon-PG)
• Strabon over System X
  (Strabon-X)
• Implementation over RDF-3X
  (Brodt et. al)
• Parliament (BBN Technologies)
• Naive Implementation over Sesame
Real world workload: Data
               DBpedia     GeoNames        LGD       Pachube   SwissEx      CLC        GADM


Size           7.1 GB       2.1 GB       6.6 GB      828 KB    33 MB       14 GB       146 MB

Triples       58,722,893   17,688,602   46,296,978    6,333    277,919   19,711,926      255

Spatial        386,205     1,262,356    5,414,032     101       687      2,190,214       51
Terms

Distinct       375,087     1,099,964    5,035,981      70       623      2,190,214       51
Spatial
Terms

Points         375,087     1,099,964    3,205,015      70       623          -             -

Linestrings       -            -         353,714        -         -          -     1       -
                                                                                       79,831
                                                                                       vertices
Polygons          -            -        1,704,650       -         -      2,190,214       51
Real world workload: Queries
               Commonly Used     Spatial Selection    Spatial Join
    Query 1           X
    Query 2           X
    Query 3                               X
    Query 4                               X
    Query 5                                                X
    Query 6           X                                    X
    Query 7           X                                    X
    Query 8                                                X

    Queries with Spatial Joins   Points       Lines    Polygons

    Query 5                        X                       X
                                   X                       X
    Query 6                        X           X           X
    Query 7                                                X
    Query 8                        X           X           X
Real world workload: Results
Cache    System       Q1     Q2     Q3      Q4      Q5       Q6      Q7     Q8
State
Cold     Naive        0.08   1.65   >8h     28.88   89       170     844    1.699
(sec.)
         Strabon      2.01   6.79   41.39   10.11   78.69    60.25   9.23   702.5
         PG                                                                 5
         Strabon      1.74   3.05   1623    46.52   12.57    2409    >8h    57.83
         X
         Parliament   2.12   6.46   >8h     229.72 1130      872     3627   3786

Warm     Naïve        0.01   0.03   >8h     0.79    43.07    88      708    1712
(sec.)
         Strabon      0.01   0.81   0.96    1.66    38.74    1.22    2.92   648.1
         PG
         Strabon      0.01   0.26   1604.9 35.59    0.18     3196    >8h    44.72
         X
         Parliament   0.01   0.04   >8h     10.91   358.92   483.29 2771    3502
Synthetic Workload
• Workload based on a synthetic dataset
  Dataset based on OpenStreetMaps
  10 million triples (2 GB) up to half a
   billion triples (50GB)
    Implemented custom bulk loader
 Triples with spatial literals: 1 up to 46
  million triples


• Response time of queries with various
 thematic and spatial selectivities
Synthetic Workload: Sample
Data
Synthetic Workload:
    Sample stSPARQL Query
SELECT *
WHERE {
    ?tag geordf:key "1" .
    ?node geordf:hasTag ?tag .
    ?node geo:hasGeography1 ?geo .
FILTER (strdf:inside(?geo,
        "POLYGON((-1 -1, 0.056568542 -1,
       0.056568542 0.056568542,
       -1 0.056568542, -1 -1))"^^strdf:WKT
      ))
}
Real-world Workload:
                 100 million triples – warm caches
Response time (sec)




                                                        Response time (sec)




                      number of Nodes in query region                         number of Nodes in query region

                             Tag 1                                                      Tag 1024
Real-world Workload:
100 million triples – cold caches




   number of Nodes in query region   number of Nodes in query region

         Tag 1                              Tag 1024
Strabon-PG: 500 million triples
Tags comparison
     Response time (sec)
Real-world Workload:
Query Plans
Findings
• Strabon over PostgreSQL outperforms
 other systems in case of warm caches

• Results in case of cold caches mixed


• PostreSQL optimizer needs to take into
 account spatial selectivity
 • PostGIS 2.0 moves towards it
System           Language      Index       Geometries    CRS support     Geospatial
                                                                          Function
                                                                          Support
Strabon        stSPARQL/    R-tree-over-   WKT / GML         Yes       • OGC-SFA
               GeoSPARQL*   GiST           support                     • Egenhofer
                                                                       • RCC-8
Parliament     GeoSPARQL*   R-Tree         WKT / GML         Yes       •OGC-SFA
                                           support                     •Egenhofer
                                                                       •RCC-8
Oracle 12c     GeoSPARQL    R-Tree,        WKT               Yes       •OGC-SFA
                            Quadtree

Brodt et al.   SPARQL       R-Tree         WKT support       No        OGC-SFA
(RDF-3X)
Perry          SPARQL-ST    R-Tree         GeoRSS GML        Yes       RCC-8


AllegroGraph   Extended     Distribution   2D point         Partial    •Buffer
               SPARQL       sweeping       geometries                  •Bounding Box
                            technique                                  •Distance

OWLIM          Extended     Custom         2D point          No        •Point-in-polygon
                                                                       •Buffer
               SPARQL                      geometries
                                                                       •Distance

Virtuoso       SPARQL       R-Tree         2D point          Yes       SQL/MM (subset)
                                           geometries

uSeekM         SPARQL       R-tree-over    WKT support       No        OGC-SFA
                            GiST
   Introduction




    The data model
    stRDF
    The query
                     Related Work &
    language
    stSPARQL and a
    comparison to
                     Conclusions
    GeoSPARQL
   The system
    Strabon for
    stSPARQL and
    GeoSPARQL
   Experimental
    Evaluation
   Related Work &
    Conclusions
Future Work
 Use even larger datasets
   Test with 1 billion triples successful

 Implement the temporal part of stSPARQL

 Develop a benchmark for geospatial RDF
 stores
  Consider more systems
    GeoSPARQL implementation of Oracle
    Virtuoso

 stSPARQL query processing in MonetDB

 Go distributed!
   Federated queries
Thanks! Any Questions?
 Strabon
   Manolis Koubarakis, Kostis Kyzirakos, Manos Karpathiotakis,
      Charalampos Nikolaou, Giorgos Garbis, Konstantina Bereta,
      Kallirroi Dogani, Stella Giannakopoulou and Panayiotis Smeros.
     Web site: http://strabon.di.uoa.gr
     Mercurial repository: http://hg.strabon.di.uoa.gr
     Trac: http://bug.strabon.di.uoa.gr
     Mailing list: http://cgi.di.uoa.gr/~mailman/listinfo/strabon-users


 Real Time Fire Monitoring Service,
 National Obervatory of Athens
   http://papos.space.noa.gr/fend_static


 Greek Linked Open Data
   http://www.linkedopendata.gr
  
 TELEIOS EU Project
   http://www.earthobservatory.eu


 SemsorGrid4Env EU Project
   http://www.semsorgrid4env.eu

Mais conteúdo relacionado

Semelhante a Strabon: A Semantic Geospatial Database System

RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as KnowledgeRDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
National Institute of Informatics
 
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as KnowledgeRDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
Rathachai Chawuthai
 
final_copy_camera_ready_paper (7)
final_copy_camera_ready_paper (7)final_copy_camera_ready_paper (7)
final_copy_camera_ready_paper (7)
Ankit Rathi
 
Le projet “Canadian Spatial Data Foundry”: Introduction à PostGIS WKT Raster
Le projet “Canadian Spatial Data Foundry”: Introduction à PostGIS WKT RasterLe projet “Canadian Spatial Data Foundry”: Introduction à PostGIS WKT Raster
Le projet “Canadian Spatial Data Foundry”: Introduction à PostGIS WKT Raster
ACSG Section Montréal
 

Semelhante a Strabon: A Semantic Geospatial Database System (20)

Geographica: A Benchmark for Geospatial RDF Stores
Geographica: A Benchmark for Geospatial RDF StoresGeographica: A Benchmark for Geospatial RDF Stores
Geographica: A Benchmark for Geospatial RDF Stores
 
Scalable Data Science with SparkR
Scalable Data Science with SparkRScalable Data Science with SparkR
Scalable Data Science with SparkR
 
SWRL2SPIN: Converting SWRL to SPIN
SWRL2SPIN: Converting SWRL to SPINSWRL2SPIN: Converting SWRL to SPIN
SWRL2SPIN: Converting SWRL to SPIN
 
Final_show
Final_showFinal_show
Final_show
 
Scalable Data Science in Python and R on Apache Spark
Scalable Data Science in Python and R on Apache SparkScalable Data Science in Python and R on Apache Spark
Scalable Data Science in Python and R on Apache Spark
 
(An Overview on) Linked Data Management and SPARQL Querying (ISSLOD2011)
(An Overview on) Linked Data Management and SPARQL Querying (ISSLOD2011)(An Overview on) Linked Data Management and SPARQL Querying (ISSLOD2011)
(An Overview on) Linked Data Management and SPARQL Querying (ISSLOD2011)
 
The Semantics of SPARQL
The Semantics of SPARQLThe Semantics of SPARQL
The Semantics of SPARQL
 
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as KnowledgeRDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
 
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as KnowledgeRDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
 
Revisiting the Representation of and Need for Raw Geometries on the Linked Da...
Revisiting the Representation of and Need for Raw Geometries on the Linked Da...Revisiting the Representation of and Need for Raw Geometries on the Linked Da...
Revisiting the Representation of and Need for Raw Geometries on the Linked Da...
 
final_copy_camera_ready_paper (7)
final_copy_camera_ready_paper (7)final_copy_camera_ready_paper (7)
final_copy_camera_ready_paper (7)
 
SWT Lecture Session 3 - SPARQL
SWT Lecture Session 3 - SPARQLSWT Lecture Session 3 - SPARQL
SWT Lecture Session 3 - SPARQL
 
Facete - Exploring the web of spatial data with facete
Facete - Exploring the web of spatial data with faceteFacete - Exploring the web of spatial data with facete
Facete - Exploring the web of spatial data with facete
 
Scalable Data Science with SparkR: Spark Summit East talk by Felix Cheung
Scalable Data Science with SparkR: Spark Summit East talk by Felix CheungScalable Data Science with SparkR: Spark Summit East talk by Felix Cheung
Scalable Data Science with SparkR: Spark Summit East talk by Felix Cheung
 
Linking the world with Python and Semantics
Linking the world with Python and SemanticsLinking the world with Python and Semantics
Linking the world with Python and Semantics
 
Web-Scale Graph Analytics with Apache® Spark™
Web-Scale Graph Analytics with Apache® Spark™Web-Scale Graph Analytics with Apache® Spark™
Web-Scale Graph Analytics with Apache® Spark™
 
Toward Next Generation of Gazetteer: Utilizing GeoSPARQL For Developing Link...
Toward Next Generation of Gazetteer:  Utilizing GeoSPARQL For Developing Link...Toward Next Generation of Gazetteer:  Utilizing GeoSPARQL For Developing Link...
Toward Next Generation of Gazetteer: Utilizing GeoSPARQL For Developing Link...
 
Semantic Variation Graphs the case for RDF & SPARQL
Semantic Variation Graphs the case for RDF & SPARQLSemantic Variation Graphs the case for RDF & SPARQL
Semantic Variation Graphs the case for RDF & SPARQL
 
Le projet “Canadian Spatial Data Foundry”: Introduction à PostGIS WKT Raster
Le projet “Canadian Spatial Data Foundry”: Introduction à PostGIS WKT RasterLe projet “Canadian Spatial Data Foundry”: Introduction à PostGIS WKT Raster
Le projet “Canadian Spatial Data Foundry”: Introduction à PostGIS WKT Raster
 
Geospatial Querying in Apache Marmotta - Apache Big Data North America 2016
Geospatial Querying in Apache Marmotta -  Apache Big Data North America 2016Geospatial Querying in Apache Marmotta -  Apache Big Data North America 2016
Geospatial Querying in Apache Marmotta - Apache Big Data North America 2016
 

Mais de Kostis Kyzirakos

ESWC2015 - Tutorial on Publishing and Interlinking Linked Geospatial Data
ESWC2015 - Tutorial on Publishing and Interlinking Linked Geospatial DataESWC2015 - Tutorial on Publishing and Interlinking Linked Geospatial Data
ESWC2015 - Tutorial on Publishing and Interlinking Linked Geospatial Data
Kostis Kyzirakos
 
Geographica: A Benchmark for Geospatial RDF Stores - ISWC 2013
Geographica: A Benchmark for Geospatial RDF Stores - ISWC 2013Geographica: A Benchmark for Geospatial RDF Stores - ISWC 2013
Geographica: A Benchmark for Geospatial RDF Stores - ISWC 2013
Kostis Kyzirakos
 
Modeling and Querying Metadata in the Semantic Sensor Web: stRDF and stSPARQL
Modeling and Querying Metadata in the Semantic Sensor Web: stRDF and stSPARQLModeling and Querying Metadata in the Semantic Sensor Web: stRDF and stSPARQL
Modeling and Querying Metadata in the Semantic Sensor Web: stRDF and stSPARQL
Kostis Kyzirakos
 

Mais de Kostis Kyzirakos (7)

ESWC2015 - Tutorial on Publishing and Interlinking Linked Geospatial Data
ESWC2015 - Tutorial on Publishing and Interlinking Linked Geospatial DataESWC2015 - Tutorial on Publishing and Interlinking Linked Geospatial Data
ESWC2015 - Tutorial on Publishing and Interlinking Linked Geospatial Data
 
The spatiotemporal RDF store Strabon
The spatiotemporal RDF store StrabonThe spatiotemporal RDF store Strabon
The spatiotemporal RDF store Strabon
 
Linked Earth Observation Data:The Projects TELEIOS and LEO
Linked Earth Observation Data:The Projects TELEIOS and LEOLinked Earth Observation Data:The Projects TELEIOS and LEO
Linked Earth Observation Data:The Projects TELEIOS and LEO
 
Geographica: A Benchmark for Geospatial RDF Stores - ISWC 2013
Geographica: A Benchmark for Geospatial RDF Stores - ISWC 2013Geographica: A Benchmark for Geospatial RDF Stores - ISWC 2013
Geographica: A Benchmark for Geospatial RDF Stores - ISWC 2013
 
Modeling and Querying Metadata in the Semantic Sensor Web: stRDF and stSPARQL
Modeling and Querying Metadata in the Semantic Sensor Web: stRDF and stSPARQLModeling and Querying Metadata in the Semantic Sensor Web: stRDF and stSPARQL
Modeling and Querying Metadata in the Semantic Sensor Web: stRDF and stSPARQL
 
Data Models and Query Languages for Linked Geospatial Data
Data Models and Query Languages for Linked Geospatial DataData Models and Query Languages for Linked Geospatial Data
Data Models and Query Languages for Linked Geospatial Data
 
Data Models and Query Languages for Linked Geospatial Data
Data Models and Query Languages for Linked Geospatial DataData Models and Query Languages for Linked Geospatial Data
Data Models and Query Languages for Linked Geospatial Data
 

Último

The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
heathfieldcps1
 

Último (20)

Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the Classroom
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptx
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Plant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptxPlant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptx
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - English
 

Strabon: A Semantic Geospatial Database System

  • 1. Strabon A Semantic Geospatial DBMS Kostis Kyzirakos, Manos Karpathiotakis, and Manolis Koubarakis Dept. of Informatics and Telecommunications, National and Kapodistrian University of Athens, Greece
  • 2. Outline  Introduction  The data model stRDF  The query language stSPARQL and a comparison to GeoSPARQL  The system Strabon for stSPARQL and GeoSPARQL  Experimental Evaluation  Related Work & Conclusions
  • 3. Main idea How do we represent and query geospatial information in the Semantic Web?  Develop appropriate vocabularies and ontologies  Extend RDF to take into account the geospatial dimension  Extend SPARQL to query the new kinds of data  Use Open Geospatial Consortium (OGC) and other geospatial industry standards
  • 5. National Observatory of Athens: Fire Products Ontology 5
  • 6. Introduction  The data model The data model stRDF  The query language stSPARQL and a stRDF comparison to GeoSPARQL  The system Strabon for stSPARQL and GeoSPARQL  Experimental Evaluation  Related Work & Conclusions
  • 7. The Data Model stRDF  stRDF extends RDF with:  Spatial literals encoded by Boolean combinations of linear constraints  New datatype for spatial literals (strdf:geometry)  Valid time of triples encoded by Boolean combinations of temporal constraints  stRDF (most recent version)  Spatial literals encoded in Well-Known Text/GML (OGC standards)  Valid time of triples ignored for the time being
  • 8. Burnt Area Products @prefix strdf: <http://strdf.di.uoa.gr/ ontology#>. @prefix noa: <http://teleios.di.uoa.gr/ ontologies/noaOntology.owl#>. noa:ba_15 rdf:type noa:BurntArea ; noa:hasGeometry noa:ba_15g. Spatial literal noa:ba_15g noa:hasSerialization "MULTIPOLYGON(((393801.42 4198827.92, ..., 393008 424131))); <http://www.opengis.net/def/crs/EPSG/0/2100>" Spatial ^^strdf:WKT. data type
  • 9. The stRDF Data Model strdf:geometry rdf:type rdfs:Datatype; rdfs:subClassOf rdfs:Literal. strdf:WKT rdf:type rdfs:Datatype; rdfs:subClassOf strdf:geometry. strdf:GML rdf:type rdfs:Datatype; rdfs:subClassOf strdf:geometry.
  • 10. Introduction  The data model stRDF The query  The query language language stSPARQL stSPARQL and a comparison to GeoSPARQL  The system Strabon for stSPARQL and and a comparison GeoSPARQL  Experimental to GeoSPARQL Evaluation  Related Work & Conclusions
  • 11. stSPARQL: Geospatial SPARQL 1.1 We define a SPARQL extension function for each function defined in the OpenGIS Simple Features Access standard  Basic functions  Get a property of a geometry (e.g., strdf:srid)  Get the desired representation of a geometry (e.g., strdf:AsText)  Test whether a certain condition holds (e.g., strdf:IsEmpty, strdf:IsSimple)  Functions for testing topological spatial relationships (e.g., strdf:equals, strdf:intersects)  OGC Simple Features Access, Egenhofer, RCC-8  Spatial analysis functions  Construct new geometric objects from existing geometric objects (e.g., strdf:buffer, strdf:intersection, strdf:convexHull)  Spatial metric functions (e.g., strdf:distance, strdf:area)  Spatial aggregate functions (e.g., strdf:union, strdf:extent)
  • 12. stSPARQL: Geospatial SPARQL 1.1 Select clause  Construction of new geometries (e.g., strdf:buffer(?geo, 0.1))  Spatial aggregate functions (e.g., strdf:union(?geo))  Metric functions (e.g., strdf:area(?geo)) Filter clause  Functions for testing topological relationships between spatial terms (e.g., strdf:contains(?G1, strdf:union(?G2, ?G3)))  Numeric expressions involving spatial metric functions (e.g., strdf:area(?G1) ≤ 2*strdf:area(?G2))  Boolean combinations Having clause  Boolean expressions involving spatial aggregate functions and spatial metric functions or functions testing for topological relationships between spatial terms (e.g., strdf:area(strdf:union(?geo))>1) Updates
  • 13. stSPARQL: An example (1/2) Find coniferous forests that have been affected by fires SELECT ?forest ?burntArea WHERE { ?burntArea rdf:type noa:BurntArea; noa:hasGeometry [ noa:hasSerialization ?baGeo]. ?forest rdf:type noa:Region; clc:hasLandCover noa:coniferousForest; Spatial clc:hasGeometry [ Function clc:hasSerialization ?fGeom]. FILTER(strdf:intersects(?baGeom,?fGeom)) }
  • 14. stSPARQL: An example (2/2) Isolate the parts of the burnt areas that lie in coniferous forests. Spatial SELECT ?burntArea Aggregate (strdf:intersection(?baGeom, strdf:union(?fGeom)) AS ?burntForest) WHERE { ?burntArea rdf:type noa:BurntArea; noa:hasGeometry [ noa:hasSerialization ?baGeo]. ?forest rdf:type noa:Region; clc:hasLandCover noa:coniferousForest; Spatial clc:hasGeometry [ Function clc:hasSerialization ?fGeom]. FILTER(strdf:intersects(?baGeom,?fGeom)) } GROUP BY ?burntArea ?baGeom
  • 15. The OGC Standard GeoSPARQL Core Topology Geometry Parameters Vocabulary Extension Extension - relation family - serialization - version • Serialization • WKT • GML Geometry Topology Extension - serialization • Relation Family - version - relation family • Simple Features • RCC-8 Query Rewrite RDFS Entailment Extension Extension • Egenhofer - serialization - serialization - version - version - relation family - relation family
  • 16. Introduction  The data model stRDF The system  The query language stSPARQL and a Strabon for comparison to GeoSPARQL stSPARQL and GeoSPARQL  The system Strabon for stSPARQL and GeoSPARQL  Experimental Evaluation strabon.di.uoa.gr  Related Work & Conclusions
  • 17. Strabon Architecture Strabon WKT GML Sesame v2.6.3 Query Engine Storage Manager Parser Repository Optimizer SAIL stRDF graphs Evaluator RDBMS Transaction Manager stSPARQL/ GeoSPARQL queries GeneralDB PostGIS
  • 18. Storage Scheme uri_values type_2 Triples ID VALUE Dictionary SUBJEC OBJECT SUBJECT PREDICATE OBJECT PREDICATE 1 OBJECT VALUE ID noa:ba_15 T noa:ba_15 3 rdf:type noa:ba_15 1 rdf:type 12 noa:ba_15 rdf:type noa:BurntArea . 1 noa:ba_15 2 noa:hasGeometry 3 noa:ba_15 6 noa:hasGeometry 5 noa:ba_15g. rdf:type noa:ba_15g noa:BurntArea 23 1 noa:ba_15g4 rdf:type 5 noa:ba_15g rdf:type hasgeom_4 noa:Geometry . 34 noa:BurntArea noa:Geometrynoa:hasGeometry 5 noa:ba_15g2 noa:hasSerialization 6 "MULTIPOLYGON(...)"^^ noa:ba_15g OBJECT SUBJECT noa:hasserialization 45 noa:hasGeometry noa:ba_15g "MULTIPOLYGON(...)"^^ 5 7 8 strdf:WKT . strdf:WKT noa:ba_15g 56 noa:Geometry 1 5 67 noa:Geometry noa:hasSerialization hasserial_7 7label_values noa:hasSerialization SUBJECT OBJECT 8 ID "MULTIPOLYGON(...)"^^ VALUE 5 8 geo_values strdf:WKT 8 "MULTIPOLYGON(...)"^^ ID VALUE strdf:WKT 8 datatype_values ID VALUE 8 strdf:WKT
  • 19. Query Processing Strabon • Parser generates stSPARQL/ abstract syntax tree Query Engine GeoSPARQL query Parser • Abstract syntax tree Optimizer mapped to internal Evaluator algebra of Sesame Transaction Manager • Standard optimizations performed • Evaluator produces corresponding SQL GeneralDB query • DBMS evaluates the SQL query PostGIS • Post-processing
  • 20. Query Processing (cont’d) • Deviate from the evaluation strategy of Sesame for SPARQL extension functions • Push the evaluation of extension functions to underlying DBMS • Spatial predicates evaluated by PostGIS • Spatial joins now affect query plan • Avoid Cartesian products • Results may be returned in well-known industry formats • KML/KMZ • GeoJSON • GML
  • 21. Introduction  The data model Experimental stRDF  The query language Evaluation stSPARQL and a comparison to GeoSPARQL  The system Strabon for stSPARQL and GeoSPARQL  Experimental Evaluation  Related Work & Conclusions
  • 22. Experimental Evaluation • Goal: Evaluate the performance of Strabon vs other systems • Real workload based on geospatial linked datasets 150 million triples • Synthetic workload Half a billion triples
  • 23. Strabon vs other systems • Strabon over PostgreSQL (Strabon-PG) • Strabon over System X (Strabon-X) • Implementation over RDF-3X (Brodt et. al) • Parliament (BBN Technologies) • Naive Implementation over Sesame
  • 24. Real world workload: Data DBpedia GeoNames LGD Pachube SwissEx CLC GADM Size 7.1 GB 2.1 GB 6.6 GB 828 KB 33 MB 14 GB 146 MB Triples 58,722,893 17,688,602 46,296,978 6,333 277,919 19,711,926 255 Spatial 386,205 1,262,356 5,414,032 101 687 2,190,214 51 Terms Distinct 375,087 1,099,964 5,035,981 70 623 2,190,214 51 Spatial Terms Points 375,087 1,099,964 3,205,015 70 623 - - Linestrings - - 353,714 - - - 1 - 79,831 vertices Polygons - - 1,704,650 - - 2,190,214 51
  • 25. Real world workload: Queries Commonly Used Spatial Selection Spatial Join Query 1 X Query 2 X Query 3 X Query 4 X Query 5 X Query 6 X X Query 7 X X Query 8 X Queries with Spatial Joins Points Lines Polygons Query 5 X X X X Query 6 X X X Query 7 X Query 8 X X X
  • 26. Real world workload: Results Cache System Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 State Cold Naive 0.08 1.65 >8h 28.88 89 170 844 1.699 (sec.) Strabon 2.01 6.79 41.39 10.11 78.69 60.25 9.23 702.5 PG 5 Strabon 1.74 3.05 1623 46.52 12.57 2409 >8h 57.83 X Parliament 2.12 6.46 >8h 229.72 1130 872 3627 3786 Warm Naïve 0.01 0.03 >8h 0.79 43.07 88 708 1712 (sec.) Strabon 0.01 0.81 0.96 1.66 38.74 1.22 2.92 648.1 PG Strabon 0.01 0.26 1604.9 35.59 0.18 3196 >8h 44.72 X Parliament 0.01 0.04 >8h 10.91 358.92 483.29 2771 3502
  • 27. Synthetic Workload • Workload based on a synthetic dataset Dataset based on OpenStreetMaps 10 million triples (2 GB) up to half a billion triples (50GB)  Implemented custom bulk loader Triples with spatial literals: 1 up to 46 million triples • Response time of queries with various thematic and spatial selectivities
  • 29. Synthetic Workload: Sample stSPARQL Query SELECT * WHERE { ?tag geordf:key "1" . ?node geordf:hasTag ?tag . ?node geo:hasGeography1 ?geo . FILTER (strdf:inside(?geo, "POLYGON((-1 -1, 0.056568542 -1, 0.056568542 0.056568542, -1 0.056568542, -1 -1))"^^strdf:WKT )) }
  • 30. Real-world Workload: 100 million triples – warm caches Response time (sec) Response time (sec) number of Nodes in query region number of Nodes in query region Tag 1 Tag 1024
  • 31. Real-world Workload: 100 million triples – cold caches number of Nodes in query region number of Nodes in query region Tag 1 Tag 1024
  • 32. Strabon-PG: 500 million triples Tags comparison Response time (sec)
  • 34. Findings • Strabon over PostgreSQL outperforms other systems in case of warm caches • Results in case of cold caches mixed • PostreSQL optimizer needs to take into account spatial selectivity • PostGIS 2.0 moves towards it
  • 35. System Language Index Geometries CRS support Geospatial Function Support Strabon stSPARQL/ R-tree-over- WKT / GML Yes • OGC-SFA GeoSPARQL* GiST support • Egenhofer • RCC-8 Parliament GeoSPARQL* R-Tree WKT / GML Yes •OGC-SFA support •Egenhofer •RCC-8 Oracle 12c GeoSPARQL R-Tree, WKT Yes •OGC-SFA Quadtree Brodt et al. SPARQL R-Tree WKT support No OGC-SFA (RDF-3X) Perry SPARQL-ST R-Tree GeoRSS GML Yes RCC-8 AllegroGraph Extended Distribution 2D point Partial •Buffer SPARQL sweeping geometries •Bounding Box technique •Distance OWLIM Extended Custom 2D point No •Point-in-polygon •Buffer SPARQL geometries •Distance Virtuoso SPARQL R-Tree 2D point Yes SQL/MM (subset) geometries uSeekM SPARQL R-tree-over WKT support No OGC-SFA GiST
  • 36. Introduction   The data model stRDF The query Related Work & language stSPARQL and a comparison to Conclusions GeoSPARQL  The system Strabon for stSPARQL and GeoSPARQL  Experimental Evaluation  Related Work & Conclusions
  • 37. Future Work  Use even larger datasets  Test with 1 billion triples successful  Implement the temporal part of stSPARQL  Develop a benchmark for geospatial RDF stores  Consider more systems  GeoSPARQL implementation of Oracle  Virtuoso  stSPARQL query processing in MonetDB  Go distributed!  Federated queries
  • 38. Thanks! Any Questions?  Strabon  Manolis Koubarakis, Kostis Kyzirakos, Manos Karpathiotakis, Charalampos Nikolaou, Giorgos Garbis, Konstantina Bereta, Kallirroi Dogani, Stella Giannakopoulou and Panayiotis Smeros.  Web site: http://strabon.di.uoa.gr  Mercurial repository: http://hg.strabon.di.uoa.gr  Trac: http://bug.strabon.di.uoa.gr  Mailing list: http://cgi.di.uoa.gr/~mailman/listinfo/strabon-users  Real Time Fire Monitoring Service, National Obervatory of Athens  http://papos.space.noa.gr/fend_static  Greek Linked Open Data  http://www.linkedopendata.gr   TELEIOS EU Project  http://www.earthobservatory.eu  SemsorGrid4Env EU Project  http://www.semsorgrid4env.eu

Notas do Editor

  1. Hello, I am KostisKyzirakos, and I will be presenting you the work done with my colleagues Manos Karpathiotakis and ManolisKoubarakis, on the development of a geospatial RDF store nick-named Strabon.
  2. Our extension of RDF for the representation of geospatial and temporal information is stRDF.stRDFextens RDF with a new kind of literals, called spatial literals, which are expresssionsof Boolean combinations of linear constraints.stRDF defines also a new datatype for spatial literals [which is a supertype of the respective datatypes for WKT and GML (standardized formats of the Open Geospatial Consortium for the encoding of geometries)]stRDF employs also a forth component to a triple to represent the valid time of triples.(???WHAT EXPRESSIONS CAN BE WRITTEN???)Constraints: theoretical representationIn the most recent version of stRDF, and to be compliant with geospatial industry standards,spatial literals are encoded in the WKT/GML formats, which are widely adopted standards of the Open Geospatial Consortium.The reason for moving to OGC standards was being realistic. No one had any spatial data expressedwith linear constraints! But all the nice work of the theoretical model remained valid. We only had to change the encoding used for spatial literals.(We are currently working on this feature, so this presentation does not consider time anymore.) We will describe our model through an example.
  3. Filter expressions are...stSPARQL supports also the construction of new spatial terms and spatial aggregate functions in SELECT clauses.(spatial aggregate functions are implemented in Strabon and we do not use the implementation of PostGIS to overcome issues with transformatios between cordinate reference systems, something that PostGIS does not take into account)OpenGIS Simple Feature Access specification defines a standard SQL schema that supports storage, retrieval, query and update of collections of spatial features using SQL
  4. Strabon extends the well-known RDF store Sesame, allowing it to manage both thematic and spatial data expressed in stRDF. In the case when PostGIS is used as the relational backend of Strabon,Strabon uses an R-tree-over-GiST (Generalised Search Tree) offered by PostGIS as a spatial index.Strabon is implemented by creating a layer that is included in Sesame&apos;s software stackin a transparent way so that it does not affect its range of functionalities, whilebenefitting from new versions of Sesame. Strabon 3.0 uses Sesame 2.6.3 andcomprises three modules: the storage manager, the query engine and the underlying databaseWe have extended Sesame as follows: Modified the storage scheme, optimizer, and evaluator to take into account geometries and not treat them as plain literals Added a new RDBMS abstract layer to Storage Manager so as any spatially-enabled database can be used with Strabon.Currently, we have been using PostGIS and MonetDB.
  5. “Decomposition Storage Model”
  6. The query engine works as follows. First, the parser generates an abstractsyntax tree. Then, this tree is mapped to the internal algebra of Sesame, re-sulting in a query tree. The query tree is then processed by the optimizer thatprogressively modies it, implementing the various optimization techniques ofStrabon. Afterwards, the query tree is passed to the evaluator to produce thecorresponding SQL query that will be evaluated by PostgreSQL. After the SQLquery has been posed, the evaluator receives the results and performs any post-processing actions needed. The nal step involves formatting the results. Besidesthe standard formats oered by RDF stores, Strabonoers KML and GeoJSONencodings, which are widely used in the mapping industry.We now discuss how the optimizer works. First, it applies all the Sesameoptimizations that deal with the standard SPARQL part of an stSPARQL query(e.g., it pushes down FILTERs to minimize intermediate results etc.). Then, twooptimizations specic to stSPARQL are applied. The rst optimization has todo with the extension functions of stSPARQL. By default, Sesame evaluatesthese after all bindings for the variables present in the query are retrieved. InStrabon, we modify the behaviour of the Sesame optimizer to incorporate allextension functions present in the SELECT and FILTER clause of an stSPARQLquery into the query tree prior to its transformation to SQL. In this way, theseextension functions will be evaluated using PostGIS spatial functions instead ofrelying on external libraries that would add an unneeded post-processing cost.The second optimization makes the underlying DBMS aware of the existence ofspatial joins in stSPARQL queries so that they would be evaluated eciently.Let us consider the query of Example 2. The rst three triple patterns of thequery are related to the rest via the topological function strdf:intersects.The query tree that is produced by the Sesame optimizer, fails to deal withthis spatial join appropriately. It will generate a Cartesian product for the thirdand the fth triple pattern of the query, and the evaluation of the spatial predi-catestrdf:intersects will be wrongly postponed after the calculation of thisCartesian product. Using a query graph as an intermediate representation of thequery, we identify such spatial joins and modify the query tree with appropriatenodes so that Cartesian products are avoided. For the query of Example 2, themodied query tree will result in a SQL query that contains a -join where isthe spatial function ST Intersects.At this stage, the query tree is a plain translation ofthe initial query to the algebra of Sesame. The query tree is then processed bythe optimizer that progressively restructures and enriches it, implementing thevarious optimization techniques of Strabon. Afterwards, the query tree is passedto the evaluator to produce the corresponding SQL query that will be evaluatedby PostgreSQL. After the SQL query has been posed, the evaluator receivesthe results and performs any post-processing actions needed.
  7. First, it applies all the Sesameoptimizations that deal with the standard SPARQL part of an stSPARQL query(e.g., it pushes down FILTERs to minimize intermediate results etc.). Then, twooptimizations specic to stSPARQL are applied. The rst optimization has todo with the extension functions of stSPARQL. By default, Sesame evaluatesthese after all bindings for the variables present in the query are retrieved. InStrabon, we modify the behaviour of the Sesame optimizer to incorporate allextension functions present in the SELECT and FILTER clause of an stSPARQLquery into the query tree prior to its transformation to SQL. In this way, theseextension functions will be evaluated using PostGIS spatial functions instead ofrelying on external libraries that would add an unneeded post-processing cost.The second optimization makes the underlying DBMS aware of the existence ofspatial joins in stSPARQL queries so that they would be evaluated eciently.Let us consider the query of Example 2. The rst three triple patterns of thequery are related to the rest via the topological function strdf:intersects.The query tree that is produced by the Sesame optimizer, fails to deal withthis spatial join appropriately. It will generate a Cartesian product for the thirdand the fth triple pattern of the query, and the evaluation of the spatial predi-catestrdf:intersects will be wrongly postponed after the calculation of thisCartesian product. Using a query graph as an intermediate representation of thequery, we identify such spatial joins and modify the query tree with appropriatenodes so that Cartesian products are avoided. For the query of Example 2, themodied query tree will result in a SQL query that contains a -join where isthe spatial function ST Intersects.At this stage, the query tree is a plain translation ofthe initial query to the algebra of Sesame. The query tree is then processed bythe optimizer that progressively restructures and enriches it, implementing thevarious optimization techniques of Strabon. Afterwards, the query tree is passedto the evaluator to produce the corresponding SQL query that will be evaluatedby PostgreSQL. After the SQL query has been posed, the evaluator receivesthe results and performs any post-processing actions needed.Query processing in Strabon is performed by the query engine which consistsof a parser, an optimizer, an evaluator and a transaction manager. The parserand the transaction manager are identical to the ones in Sesame. The optimizerand the evaluator have been implemented by modifying the corresponding com-ponents of Sesame as we describe below.The query engine works as follows. First, the parser generates an abstractsyntax tree. Then, this tree is mapped to the internal algebra of Sesame, re-sulting in a query tree. The query tree is then processed by the optimizer thatprogressively modies it, implementing the various optimization techniques ofStrabon. Afterwards, the query tree is passed to the evaluator to produce thecorresponding SQL query that will be evaluated by PostgreSQL. After the SQLquery has been posed, the evaluator receives the results and performs any post-processing actions needed. The nal step involves formatting the results. Besidesthe standard formats oered by RDF stores, Strabonoers KML and GeoJSONencodings, which are widely used in the mapping industry.We now discuss how the optimizer works. First, it applies all the Sesameoptimizations that deal with the standard SPARQL part of an stSPARQL query(e.g., it pushes down FILTERs to minimize intermediate results etc.). Then, twooptimizations specic to stSPARQL are applied. The rst optimization has todo with the extension functions of stSPARQL. By default, Sesame evaluatesthese after all bindings for the variables present in the query are retrieved. InStrabon, we modify the behaviour of the Sesame optimizer to incorporate allextension functions present in the SELECT and FILTER clause of an stSPARQLquery into the query tree prior to its transformation to SQL. In this way, theseextension functions will be evaluated using PostGIS spatial functions instead ofrelying on external libraries that would add an unneeded post-processing cost.The second optimization makes the underlying DBMS aware of the existence ofspatial joins in stSPARQL queries so that they would be evaluated eciently.Let us consider the query of Example 2. The rst three triple patterns of thequery are related to the rest via the topological function strdf:intersects.The query tree that is produced by the Sesame optimizer, fails to deal withthis spatial join appropriately. It will generate a Cartesian product for the thirdand the fth triple pattern of the query, and the evaluation of the spatial predi-catestrdf:intersects will be wrongly postponed after the calculation of thisCartesian product. Using a query graph as an intermediate representation of thequery, we identify such spatial joins and modify the query tree with appropriatenodes so that Cartesian products are avoided. For the query of Example 2, themodied query tree will result in a SQL query that contains a -join where isthe spatial function ST Intersects.At this stage, the query tree is a plain translation ofthe initial query to the algebra of Sesame. The query tree is then processed bythe optimizer that progressively restructures and enriches it, implementing thevarious optimization techniques of Strabon. Afterwards, the query tree is passedto the evaluator to produce the corresponding SQL query that will be evaluatedby PostgreSQL. After the SQL query has been posed, the evaluator receivesthe results and performs any post-processing actions needed.
  8. We designed a simple benchmark
  9. This section presents a detailed evaluation of the system Strabon using two different workloads: a workload based on linked data and a synthetic workload. Forboth workloads, we compare the response time of Strabon on top of PostgreSQL (called Strabon PG from now on) with our closest competitor implementationin [3], the naive, baseline implementation described in Section 4, and the RDF store Parliament. To identify potential benets from using a dierent DBMS asa relational backend for Strabon, we also executed the SQL queries produced by Strabon in a proprietary spatially-enabled DBMS (which we will call System X,and Strabon X the resulting combination). [3] presents an implementation which enhances the RDF-3X triple store withthe ability to perform spatial selections using an R-tree index. The implementation of [3] is not a complete system like Strabon and does not support afull-edged query language such as stSPARQL. In addition, the only way to load data in the system is the use of a generator which has been especially designed for the experiments of [3] thus it cannot be used to load other datasets in the implementation. Moreover, the geospatial indexing support of this implementation is limited to spatial selections. Spatial selections are pushed down inthe query tree (i.e., they are evaluated before other operators). Parliament is an RDF storage engine recently enhanced with GeoSPARQL processing capabilities [2], which is coupled with Jena (an RDF query processor) to provide a complete RDF system.
  10. We have used a number of linked geospatial datasets. I would like to stress the heterogeneity of the dataset.We have included datasets from many application domains (e.g., sensors, land use, administrative areas, gazeteer).They also vary significantly in terms of size and complexity of the spatial geometries. DBpediaGeoNamesLDG (LinkedGeoData, OpenStreetMaps)Pachube (“patch-bay”): Pachube (&quot;patch-bay&quot;) connects people to devices, applications, and the Internet of Things. As a web-based service built to manage the world&apos;s real-time data, Pachube gives people the power to share, collaborate, and make use of information generated from the world around them.SwissEx (Swiss Experiment): http://www.swiss-experiment.chA platform to enable real-time environmental experiments through wireless sensor networks and a common, modern, generic cyber-infrastructure. This infrastructure will be used to enable interdisciplinary environmental research, allowing scientists to work efficiently and collaboratively to find the key mechanisms in the triggering of natural hazards and to efficiently distribute the information to increase public awareness.CLC (Corine Land Cover)GADM (Global Administrative Areas): http://www.gadm.org/ GADM is a spatial database of the location of the world&apos;s administrative areas (or adminstrative boundaries) for use in GIS and similar software. Administrative areas in this database are countries and lower level subdivisions such as provinces, departments, bibhag, bundeslander, daerahistimewa, fivondronana, krong, landsvæðun, opština, sous-préfectures, counties, and thana. GADM describes where these administrative areas are (the &quot;spatial features&quot;), and for each area it provides some attributes, such as the name and variant names.
  11. Queries 1,2: no spatial dimension
  12. Cold caches vs work cachesGreen is the winning colorRed is the loosing colorThe various versions of Strabon are the winning onesNo cases where Parliament is the winning systemThere are many interesting things we found in this experiment. For example,Parliament: reference implementation of GeoSPARQLIn queries Q1 and Q2, the baseline implementation outperforms all other systems as these queries do not include any spatial function. As mentioned earlier, the native store of Sesame outperforms Sesame implementations on top of a DBMS. All other systems producecomparable result times for these non-spatial queries. In all other queries the DBMS-based implementations outperform Parliament and the naive implemen-tation. Strabon PG outperforms the naive implementation since it incorporates the two stSPARQL-specic optimizations discussed in Section 4. Thus, spatialoperations are evaluated by PostGIS using a spatial index, instead of being evaluated after all the results have been retrieved. The naive implementation andParliament fail to execute Q3 as this query involves a spatial join that is very expensive for systems using a naive approach. The only exception in the behaviorof the naive implementation is Q4 in the case of warm caches, where the non-spatial part of the query produces very few results and the file blocks needed forquery evaluation are cached in main memory. In this case, the non-spatial part of the query is executed rapidly while the evaluation of the spatial function overthe results thus far is not signicant. All queries except Q8 are executed significantly faster when run using Strabon on warm caches. Q8 involves many triplepatterns and spatial functions which result in the production of a large number of intermediate results. As these do not fit in the system&apos;s cache, the responsetime is unaffected by the cache contents. System X decides to ignore the spatial index in queries Q3, Q6-Q8 and evaluate any spatial predicate exhaustively overthe results of a thematic join. In queries Q6-Q8, it also uses Cartesian products in the query plan as it considers their evaluation more profitable. These decisionsare correct in the case of Q8 where it outperforms all other systems significantly, but very costly in the cases of Q3, Q6 and Q7.
  13. Για το synthetic dataset, έχουμε τρέξει πειράματα με 1 billion με τον Strabon. Το naive δεν μπορούσε να αποτιμήσει queries σε τόσο μεγάλο dataset οπότε παρουσιάζουμε γράφους μέχρι 500 millions.
  14. The data of the synthetic dataset follows a general version of the schema of Open Street Map. Each node has a spatial extent (the location of the node) and is placed uniformly on a grid with step 0.001 (min 0, max 10.24).In addition, each node is assigned a number of tags each of which consist of a key-value pair of strings.Every node is tagged with key 1, every second node with key 2, every fourth node with key 4, and so on up to key 1024.By modifying the step of the grid, we produce datasets of arbitrary size.
  15. In this query THIS_VALUE (value “1”) allows us to select a known fraction of the generated nodes. Since every node is tagged with key 1, this triple pattern will produce a binding for every node of the dataset.By altering this value to “2”, we select half the nodes of the dataset. By further changing it to “4”, we select half thenodes we previously did, and so on.The extent of the POLYGON in the filter expression defines how selective the spatial predicate is. We modify the size of the polygon in order to select 10, 100, 1000, etc. spatial nodes.
  16. After consulting the query logs of PostgreSQL, we no-ticed that the vast majority of the SQL queries that were posed and derivedfrom the query template adhered to one of the query plans of Figures 4,5. Ac-cording to plan A, query evaluation starts by evaluating the thematic selectionover table key using the appropriate index. The results are retrieved and joinedwith the hasTag and hasGeography predicate tables using appropriate indices.Finally, the spatial selection is evaluated by scanning the spatial index and theresults are joined with the intermediate results of the previous operations. Onthe contrary, plan B starts with the evaluation of the spatial selection, leavingthe application of the thematic selection for the end.Unfortunately, in the current version of PostGIS spatial selectivities are notcomputed properly. The functions that estimate the selectivity of a spatial se-lection/join return a constant number regardless of the actual selectivity of theoperator. Thus, only the thematic selectivity aects the choice of a query plan.To state it in terms of our query template, altering the value PARAM A between1 and 1024 was the only factor inuencing the selection of a query plan.
  17. In this case, as previously discussed, PostgreSQL does not select a good plan andthe response times are even higher than the times where PARAM A is equal to 1(Figure 6a), despite producing signicantly more intermediate results.
  18. After consulting the query logs of PostgreSQL, we no-ticed that the vast majority of the SQL queries that were posed and derivedfrom the query template adhered to one of the query plans of Figures 4,5. Ac-cording to plan A, query evaluation starts by evaluating the thematic selectionover table key using the appropriate index. The results are retrieved and joinedwith the hasTag and hasGeography predicate tables using appropriate indices.Finally, the spatial selection is evaluated by scanning the spatial index and theresults are joined with the intermediate results of the previous operations. Onthe contrary, plan B starts with the evaluation of the spatial selection, leavingthe application of the thematic selection for the end.Unfortunately, in the current version of PostGIS spatial selectivities are notcomputed properly. The functions that estimate the selectivity of a spatial se-lection/join return a constant number regardless of the actual selectivity of theoperator. Thus, only the thematic selectivity aects the choice of a query plan.To state it in terms of our query template, altering the value PARAM A between1 and 1024 was the only factor inuencing the selection of a query plan.
  19. Two different query plans.1. Evaluation of thematic selection first.2. Evaluation of spatial selection first.In Postgres, the functions that estimate the selectivity of a spatial selection or a spatial join return a constant number. In other words, the spatial selectivity of a query did not affect dynamically the decision whether the spatial selection should be the first operation to execute. Therefore, only the thematic selectivity affected the choice for a query plan. To state it in terms of our query template, altering the value for the key between 1 and 1024 was the only factor influencing the query plan.
  20. So, what I would like you to keep for this slide is that Strabon not only has a good performance but it is also one of the richest systems available at the moment.