DataBio Architecture for Big Data and Big Data Visualisation
This document is part of a project that has received funding
from the European Union’s Horizon 2020 research and innovation programme
under agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or
reproduced without the formal approval of the DataBio Management Committee. Find us at www.databio.eu.
This project has received funding from
the European Union’s Horizon 2020
research and innovation programme
under grant agreement No 732064
This project is part
of BDV PPP
DATABIO ARCHITECTURE FOR BIG DATA AND BIG DATA
VISUALISATION
Karel Charvat with support of
Thanasis Poulakidas, Tomáš Řezník, Šimon Leitgeb, Štěpán Kafka, Raul Palma,
Karel Charvat Jr, Vojtech Lukas, Soumya Brahma, Dmitrij Kozuch, Raitis Berzins,
Karel Jedlička
107th OGC Technical Committee
Colorado State University, Lory Student Center
Ft. Collins, Colorado, USA
Experience from our DataBio project
Project title: Data-Driven Bioeconomy
Project type: H2020 Innovation Action, in topic ICT-15-2016-2017 - Big Data PPP: Large
Scale Pilot actions in sectors best benefitting from data-driven innovation
Duration: 1 Jan. 2017 – 31 Dec. 2019 (36 months)
Total budget: 16.2 M€
Partners: 48 partners, 70+ associated partners
Pilots
• Fishing vessels immediate operational choices
  • Oceanic tuna fisheries immediate operational choices
  • Small pelagic fisheries immediate operational choices
• Fishing vessel trip and fisheries planning
  • Oceanic tuna fisheries planning
  • Small pelagic fisheries planning
• Fisheries sustainability and value
  • Pelagic fish stock assessments
  • Small pelagic market predictions and traceability
• Multisource and data crowdsourcing / e-services
  • Easy data sharing and networking
  • Monitoring and control tools for forest owners
• Forest Health / Remote/Crowd sensing, Invasive species/damage
  • Forest damage remote sensing
  • Monitoring of forest health
  • Invasive alien species control and monitoring
• Forest data management services (forecast/predict)
  • Web-mapping service for the government decision making
  • Shared multiuser forest data environment
• Precision Horticulture including vine and olives
  • Precision agriculture in olives, fruits, grapes (@Greece)
  • Precision agriculture in vegetable seed crops (@Italy)
  • Precision agriculture in vegetables -2 (Potatoes, @Netherlands)
  • Big Data management in greenhouse eco-systems (@Italy)
• Arable Precision Farming
  • Cereals, biomass and cotton crops 1 (@Spain)
  • Cereals, biomass and cotton crops 2 (@Greece)
  • Cereals, biomass and cotton crops 3 (@Italy)
  • Cereals, biomass and cotton crops 4 (@Czech Republic)
  • Machinery management (@Czech Republic, Italy)
• Subsidies and insurance
  • Insurance (@Greece)
  • Farm Weather Insurance Assessment (@Italy)
  • CAP Support (@Italy, Romania)
  • CAP Support (@Greece)
Big picture and expected outcomes
Sectors: AGRICULTURE, FORESTRY, FISHERY
• Big Data Sources and Big Data Types: structured and unstructured data; spatio-temporal data; machine generated data; image/sensor data; geospatial data; genomics data
• Data Management: collection, preparation, curation, linking, access
• Data Processing: batch, interactive, streaming, real-time
• Data Analytics: classification, clustering, regression, deep learning, optimization, simulation
• Data Visualization and User Interaction: 1D, 2D, 3D + temporal; Virtual and Augmented Reality
Expected outcomes: raw material production for food and energy supply chains; biomaterials; responsible production; sustainability
Combining drivers and assets
Sector        Variety                 Volume (TB)   Velocity (TB/year)
Agriculture   8 sources, 4 types      53            197
Forestry      8 sources, 7 types      11.39         12.12
              (Aerial/UAV: 100 GB/h)
Fishery       20 sources, 13 types    8.82          6.27
26 pilots, in 3 sectors x 3 thematic groups
DataBio platform
• The DataBio platform is a software development platform that provides
a Big Data toolset with functionalities for services, primarily in
agriculture, forestry and fishery
• 91 technology components
• Components are combined into 13 reusable and deployable pipelines
• A pipeline is a set of components with clear mutual interfaces that link them
to each other and to the platform environment, and that together fulfil specific
pilot functionalities
• Example (roles, pipeline and lifecycle views):
DataBio reports (new technical reports will come soon)
• All DataBio reports are available at
• https://www.databio.eu/en/publicdeliverables/
• The reports currently most relevant for the Agriculture DWG are
• https://www.databio.eu/wp-content/uploads/2017/05/DataBio_D1.1-Agriculture-Pilot-Definition_v1.1_2018-04-26_LESPRO.pdf
• https://www.databio.eu/wp-content/uploads/2017/05/DataBio_D6.4-Data-driven-bioeconomy-pilots_v1.0_2018-02-28_CiaoT.pdf
• https://www.databio.eu/wp-content/uploads/2017/05/DataBio_D7.1-Business-Plan_v2.1_2018-02-06_UStG.pdf
• https://www.databio.eu/wp-content/uploads/2017/05/DataBio_D7.3-PESTLE-Analysis_v1.0_2017-12-29_VTT.pdf
• https://www.databio.eu/wp-content/uploads/2017/05/DataBio_D5.1-EO-Component-Specification_v1.0_2017-12-29_SPACEBEL.pdf
• https://www.databio.eu/wp-content/uploads/2017/05/DataBio_D6.2-Data-Management-Plan_v1.0_2017-06-30_CREA.pdf
Three cases
• Unifying Data and Metadata
• Linked Open Data FOODIE Data Model
• 3D visualization of Big Data
Use cases
Unifying Data and Metadata
Why?
The way we currently handle geospatial metadata:
"Where can I find information on what’s inside?"
"We have an application exactly for that. Just go into the room at the
end of the shop, press the red button to start the scanner and then wait
a few seconds to see the information that appears."
Images adopted from: organicwineexchange.com, vectorstock.com
Current situation
Let’s move on
Image adopted from: Reznik, T., Chudy, R., Micietova, E. Normalized evaluation of the performance, capacity
and availability of catalogue services: a pilot study based on INfrastruture for SPatial InfoRmation in Europe.
International Journal of Digital Earth 9, 325-341 (2016). doi: 10.1080/17538947.2015.1019581
Software ingredients
• HSLayers NG
• Visualization library based on OpenLayers (OL), Cordova, Bootstrap, etc.
• http://ng.hslayers.org/
• Copernicus Open Access API
• Source of Sentinel images
• https://scihub.copernicus.eu
• NASA API
• Source of Landsat (and other) images
• https://api.nasa.gov/
Copernicus Open Access API
• Sample query
https://scihub.copernicus.eu/dhus/search?q=footprint:%22Intersects(POLYGON((16.75%2049.03,%2017.12%2049.04,%2017.06%2049.30,%2016.78%2049.29,%2016.75%2049.03)))%22&FORMAT=json
[Diagram: JSON response -> metadata parser]
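The sample query can be assembled from a list of coordinates rather than typed by hand. The following is a minimal sketch, assuming the polygon is given as a closed ring of (longitude, latitude) pairs; the function name `build_footprint_query` is hypothetical, and the code only builds the URL without sending any request.

```python
from urllib.parse import quote

# Search endpoint as shown in the sample query on the slide.
BASE_URL = "https://scihub.copernicus.eu/dhus/search"


def build_footprint_query(ring, base_url=BASE_URL):
    """Return a search URL for products intersecting a closed lon/lat ring.

    Hypothetical helper: coordinates are formatted with two decimals to
    match the sample query.
    """
    if ring[0] != ring[-1]:
        raise ValueError("polygon ring must be closed (first point == last point)")
    coords = ", ".join(f"{lon:.2f} {lat:.2f}" for lon, lat in ring)
    footprint = f'footprint:"Intersects(POLYGON(({coords})))"'
    # Leave ':', '(', ')' and ',' readable, as in the sample; encode quotes and spaces.
    return f"{base_url}?q={quote(footprint, safe=':(),')}&FORMAT=json"


ring = [(16.75, 49.03), (17.12, 49.04), (17.06, 49.30), (16.78, 49.29), (16.75, 49.03)]
url = build_footprint_query(ring)
print(url)
```

With the ring above, the sketch reproduces the sample query string character for character.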
Copernicus Open Access API
• The API returns JSON, which is first parsed and transformed into
GeoJSON so that the geospatial information is handled correctly (a Python
script was developed for this)
• Communication with the NASA API is in progress
[Diagram: JSON -> metadata parser -> GeoJSON]
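The JSON-to-GeoJSON step can be illustrated as follows. This is not the project's script, only a sketch under stated assumptions: each metadata record is taken to be an already simplified dictionary whose `footprint` key holds a single-ring WKT polygon, and all function and field names here are hypothetical.

```python
import json


def wkt_polygon_to_coordinates(wkt):
    """Parse a single-ring WKT POLYGON into GeoJSON coordinate nesting."""
    text = wkt.strip()
    if not text.upper().startswith("POLYGON"):
        raise ValueError("only POLYGON footprints are handled in this sketch")
    # Take the text between the outermost double parentheses.
    inner = text[text.index("((") + 2 : text.rindex("))")]
    ring = []
    for pair in inner.split(","):
        lon, lat = pair.split()
        ring.append([float(lon), float(lat)])
    return [ring]  # GeoJSON polygons are a list of rings


def entry_to_feature(entry):
    """Turn one metadata record into a GeoJSON Feature."""
    return {
        "type": "Feature",
        "geometry": {
            "type": "Polygon",
            "coordinates": wkt_polygon_to_coordinates(entry["footprint"]),
        },
        # Everything except the footprint is kept as plain properties.
        "properties": {k: v for k, v in entry.items() if k != "footprint"},
    }


def entries_to_feature_collection(entries):
    return {"type": "FeatureCollection", "features": [entry_to_feature(e) for e in entries]}


entries = [
    {
        "title": "example-product",  # made-up record for illustration
        "footprint": "POLYGON ((16.75 49.03, 17.12 49.04, 17.06 49.30, 16.78 49.29, 16.75 49.03))",
    }
]
collection = entries_to_feature_collection(entries)
print(json.dumps(collection, indent=2))
```

Polygons with holes or multipolygons would need a fuller WKT parser than this sketch provides.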
Current status
Outlook – Filtering
• Sample query
https://scihub.copernicus.eu/dhus/search?q=footprint:%22Intersects(POLYGON((16.75%2049.03,%2017.12%2049.04,%2017.06%2049.30,%2016.78%2049.29,%2016.75%2049.03)))%22&FORMAT=json
[Diagram: JSON -> metadata parser -> 328 satellite images available, with filter checkboxes for radar and multispectral]
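A filter of this kind could work on the parsed metadata alone. The sketch below assumes each record carries a `platformname` field and maps Sentinel-1 to radar and Sentinel-2 to multispectral; the field name, the mapping table and the sample records are assumptions made for illustration.

```python
from collections import Counter

# Assumed mapping from platform name to the sensor category shown in the filter.
SENSOR_TYPE_BY_PLATFORM = {
    "Sentinel-1": "radar",          # synthetic aperture radar
    "Sentinel-2": "multispectral",  # optical multispectral imager
}


def sensor_type(record):
    """Classify a metadata record; unknown platforms fall into 'other'."""
    return SENSOR_TYPE_BY_PLATFORM.get(record.get("platformname"), "other")


def filter_by_sensor(records, wanted):
    """Keep only the records whose sensor type is in the wanted set."""
    return [r for r in records if sensor_type(r) in wanted]


records = [  # made-up records for illustration
    {"title": "product-a", "platformname": "Sentinel-1"},
    {"title": "product-b", "platformname": "Sentinel-2"},
    {"title": "product-c", "platformname": "Sentinel-2"},
]

counts = Counter(sensor_type(r) for r in records)
radar_only = filter_by_sensor(records, {"radar"})
print(counts)
print([r["title"] for r in radar_only])
```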
Outlook – Notifications
New Sentinel-2B image is available.
70.8% cloud coverage
DOWNLOAD (SAFE, 750 MB) IGNORE
Work is also ongoing on integrating the NASA API (https://api.nasa.gov)
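The notification text can be generated from the same metadata. The sketch below is illustrative only: the record fields are hypothetical, and the optional cloud-cover threshold is an added assumption that does not appear on the slide.

```python
def format_notification(product, max_cloud_percent=None):
    """Return a notification string, or None when the image is too cloudy.

    max_cloud_percent is an assumed, optional threshold; None disables it.
    """
    cloud = product["cloud_cover_percent"]
    if max_cloud_percent is not None and cloud > max_cloud_percent:
        return None
    return (
        f"New {product['platform']} image is available. "
        f"{cloud:.1f}% cloud coverage "
        f"({product['format']}, {product['size_mb']} MB)"
    )


# Values taken from the mock-up notification; the field names are made up.
product = {
    "platform": "Sentinel-2B",
    "cloud_cover_percent": 70.8,
    "format": "SAFE",
    "size_mb": 750,
}

message = format_notification(product)
suppressed = format_notification(product, max_cloud_percent=50.0)
print(message)
print(suppressed)
```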
Use cases
Linked Open Data FOODIE Data Model
FOODIE Data Models
Core Data Model
VGI Data Model
Transport Data Model
Sensor Data Model
Linked data publication process overview
• Simple set of principles & technologies
• URI, HTTP, RDF, SPARQL
• Involves a set of tasks: datasets identification, model specification, RDF data generation, linking, exploiting
• Reference linked data publication pipelines: Hyland et al., Hausenblas et al., Villazón-Terrazas et al.
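To show how URI, HTTP and SPARQL fit together, the sketch below builds a SPARQL query and the GET request URL defined by the SPARQL 1.1 Protocol, where the query text travels in the `query` parameter. The endpoint address and the `ex:` vocabulary are placeholders, not the project's actual endpoint or ontology, and no request is sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "https://example.org/sparql"  # placeholder endpoint

# Placeholder vocabulary under the ex: prefix, for illustration only.
QUERY = """\
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ex: <http://example.org/farming#>
SELECT ?plot ?crop
WHERE {
  ?plot rdf:type ex:Plot ;
        ex:crop ?crop .
}
LIMIT 10
"""


def sparql_get_url(endpoint, query):
    """Build a SPARQL Protocol GET URL carrying the query in 'query'."""
    return f"{endpoint}?{urlencode({'query': query})}"


url = sparql_get_url(ENDPOINT, QUERY)
# Decoding the URL again should give back the original query text.
round_trip = parse_qs(urlsplit(url).query)["query"][0]
print(url)
print(round_trip == QUERY)
```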
Linked data publication technologies overview
• Technologies used:
• D2RQ for exposing relational databases as virtual RDF graphs
• RDF for the representation of data
• Farming ontology providing the underlying vocabulary and relations
• Virtuoso for storing the semantic datasets
• Silk for discovery of links
• SPARQL for querying semantic data
• HSLayers NG for visualisation of data
• Metaphactory for visualisation of data
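The idea of turning relational rows into RDF can be shown with a hand-written sketch. This is not D2RQ and does not use its mapping language; the table, the columns and the URI pattern are made up for illustration, and every value is emitted as a plain string literal.

```python
def row_to_ntriples(table, key_column, row, base_uri):
    """Render one relational row as N-Triples lines (illustrative only)."""
    subject = f"<{base_uri}{table}/{row[key_column]}>"
    lines = []
    for column, value in row.items():
        if column == key_column or value is None:
            continue  # the key is already in the subject URI; NULLs yield no triple
        predicate = f"<{base_uri}vocab/{table}_{column}>"
        # Escape backslashes, quotes and newlines as N-Triples requires.
        literal = (
            str(value)
            .replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
        )
        lines.append(f'{subject} {predicate} "{literal}" .')
    return lines


# Made-up row, as it might come from a PostgreSQL table named "plot".
row = {"id": 7, "name": "North field", "crop": "wheat", "note": None}
lines = row_to_ntriples("plot", "id", row, "http://example.org/")
print("\n".join(lines))
```

A real mapping would also type the literals and link rows to each other through foreign keys, which this sketch leaves out.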
Datasets identification
• Goal: to publish linked data from the pilots of the FOODIE project (data available in a
PostgreSQL database):
• Precision viticulture (Spain)
• Delivered a web-based solution providing advisory services on different aspects of
winegrowing, such as disease prevention, production estimation and harvest scheduling
• Open Data for Strategic and Tactical planning (Czech Republic)
• Delivered two main applications, one for farm telemetry and the other for estimating yield potential
Transformation from UML model to OWL ontology
• Followed a semi-automatic approach
• ShapeChange tool, which implements the ISO 19150-2 standard
rules for mapping ISO geographic information UML models to
OWL ontologies
• Required different processing tasks:
• Pre-processing
• Source model preparation
• ShapeChange tool configuration: encoding rules; mappings of UML classes to OWL elements;
namespace definitions
• Base ontology fixes (INSPIRE common, ISO 19100 series standards)
• Post-processing tasks
• Manual fixes in the ontology
• Manual creation of ontology elements of the base INSPIRE schemas (AF)
[Diagram: ShapeChange outputs XML schemas, feature catalogs, and RDF/OWL]
Ontology for farming data - overview
• ShapeChange output
• UML featureTypes and dataTypes modelled as classes, and
their attributes as datatype or object properties
• UML codeLists modelled as classes/concepts, and their
attributes as concept members
• Cardinality restrictions defined on properties (exactly,
min, max)
• Datatype property ranges defined according to
model/mappings
• Object property ranges defined according to
model/mappings
• inverseOf defined for object properties
[Figures: top hierarchy, FeatureType hierarchy, codelist hierarchy, datatype hierarchy]
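The mapping pattern listed above (featureType to class, attribute to datatype property, cardinality to a restriction) can be illustrated with a small generator of Turtle text. This is a hand-written sketch, not ShapeChange output; the `ex:` namespace, the `Plot` feature type and its attributes are made up.

```python
PREFIXES = (
    "@prefix ex: <http://example.org/farming#> .\n"
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
)


def feature_type_to_turtle(name, attributes):
    """Render one UML featureType as Turtle.

    attributes: list of (attribute name, XSD datatype, exact cardinality or None).
    """
    blocks = [f"ex:{name} a owl:Class ."]
    for attr, xsd_type, exactly in attributes:
        # Each attribute becomes a datatype property with domain and range.
        blocks.append(
            f"ex:{attr} a owl:DatatypeProperty ;\n"
            f"    rdfs:domain ex:{name} ;\n"
            f"    rdfs:range xsd:{xsd_type} ."
        )
        if exactly is not None:
            # An exact cardinality becomes an owl:Restriction on the class.
            blocks.append(
                f"ex:{name} rdfs:subClassOf [\n"
                f"    a owl:Restriction ;\n"
                f"    owl:onProperty ex:{attr} ;\n"
                f'    owl:cardinality "{exactly}"^^xsd:nonNegativeInteger\n'
                f"] ."
            )
    return PREFIXES + "\n" + "\n\n".join(blocks) + "\n"


turtle = feature_type_to_turtle("Plot", [("code", "string", 1), ("area", "double", None)])
print(turtle)
```

Minimum and maximum cardinalities would follow the same shape with `owl:minCardinality` and `owl:maxCardinality`.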
Exploiting the Linked Data – visualisation
• Map visualisation: http://ng.hslayers.org/examples/foodie-zones/
Links to models
• https://github.com/Wirelessinfo/FOODIE-data-model
• https://github.com/FOODIE-cloud/ontology
Use cases
3D visualization of Big Data
Methodology: 3D visualisation of Big Data
• Use a 3D visualisation as a unifying environment for portraying different
types of data.
• Based on the agricultural point of view:
• Raw data, for picking the right dataset for further data processing.
• Processed data (transformed / harmonized / analyzed / …),
for exploring the results and for decision support.
Technology
• The diversity of data structures calls for a robust and easily customizable
application for data visualization
• Framework
• HSLayers NG (~ OpenLayers based JavaScript Library)
• https://github.com/hslayers/hslayers-ng
• Cesium
• https://cesiumjs.org/
• Data connectors
• Web Map Service ~ for raster and imagery data
• GeoJSON ~ for vector data
• Resource Description Framework (RDF) ~ for linked data
• OpenStreetMap live data pump ~ for vector data from OSM
• Tailored applications
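As an example of what the Web Map Service connector requests from a server, the sketch below builds a WMS 1.3.0 GetMap URL. The server address and layer name are placeholders, `CRS:84` is chosen so that the bounding box reads as longitude/latitude, and the code is Python for illustration rather than the JavaScript of the framework itself.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def wms_getmap_url(base_url, layer, bbox, width, height,
                   crs="CRS:84", image_format="image/png"):
    """Build a WMS 1.3.0 GetMap URL.

    bbox is (min_lon, min_lat, max_lon, max_lat) when crs is CRS:84.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the default style
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return f"{base_url}?{urlencode(params)}"


# Placeholder server and layer name, for illustration only.
url = wms_getmap_url(
    "https://example.org/wms", "example_layer",
    bbox=(16.75, 49.03, 17.12, 49.3), width=512, height=512,
)
decoded = parse_qs(urlsplit(url).query, keep_blank_values=True)
print(url)
```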
Major Outcomes
• Developed best practice application examples
• Best practice examples of processed data visualization, tailored for the
Data Experimentation and Proof of Concept phases of the DataBio project.
• The applications were created using the framework mentioned above.
• The work started in the previous FOODIE project and now continues as part of the Czech
agriculture pilots of the DataBio project. To speed up development, three new
large-scale testbeds were developed as part of the INSPIRE Hack.
• http://www.foodie-project.eu/
• http://databio.eu/
• http://www.plan4all.eu/inspire-hack-2017/
Major Outcomes
• Open Land Use (http://ng.hslayers.org/examples/3d-olu)
Major Outcomes
• Perspective visualization of estimated yield (http://ng.hslayers.org/examples/rostenice)
Major Outcomes
• Linked data integration (http://ng.hslayers.org/examples/produce-3d)
Thank you for your attention!
W www.databio.eu
E charvat@lesprojekt.cz,
E info@databio.eu
agriXchange / DataBio
@DataBio_eu
DataBioProject