DataBio Architecture for Big Data and Big Data Visualisation

Karel Charvat

Published in: Data & Analytics

  1. DATABIO ARCHITECTURE FOR BIG DATA AND BIG DATA VISUALISATION
     Karel Charvat, with support of Thanasis Poulakidas, Tomáš Řezník, Šimon Leitgeb, Štěpán Kafka, Raul Palma, Karel Charvat Jr, Vojtech Lukas, Soumya Brahma, Dmitrij Kozuch, Raitis Berzins, Karel Jedlička
     107th OGC Technical Committee, Colorado State University, Lory Student Center, Ft. Collins, Colorado, USA
     This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 732064. This project is part of BDV PPP.
     This document is part of a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or reproduced without the formal approval of the DataBio Management Committee. Find us at www.databio.eu.
  2. Experience from our DataBio project
     • Project title: Data-Driven Bioeconomy
     • Project type: H2020 Innovation Action, in topic ICT-15-2016-2017 - Big Data PPP: Large Scale Pilot actions in sectors best benefitting from data-driven innovation
     • Duration: 1 Jan. 2017 – 31 Dec. 2019 (36 months)
     • Total budget: 16.2 M€
     • Partners: 48 partners, 70+ associated partners
  3. Pilots
     • Fishing vessels immediate operational choices: Oceanic tuna fisheries immediate operational choices; Small pelagic fisheries immediate operational choices
     • Fishing vessel trip and fisheries planning: Oceanic tuna fisheries planning; Small pelagic fisheries planning
     • Fisheries sustainability and value: Pelagic fish stock assessments; Small pelagic market predictions and traceability
     • Multisource and data crowdsourcing / e-services: Easy data sharing and networking; Monitoring and control tools for forest owners
     • Forest Health / Remote/Crowd sensing, Invasive species/damage: Forest damage remote sensing; Monitoring of forest health; Invasive alien species control and monitoring
     • Forest data management services (forecast/predict): Web-mapping service for the government decision making; Shared multiuser forest data environment
     • Precision Horticulture including vine and olives: Precision agriculture in olives, fruits, grapes (@Greece); Precision agriculture in vegetable seed crops (@Italy); Precision agriculture in vegetables -2 (Potatoes, @Netherlands); Big Data management in greenhouse eco-systems (@Italy)
     • Arable Precision Farming: Cereals, biomass and cotton crops 1 (@Spain); Cereals, biomass and cotton crops 2 (@Greece); Cereals, biomass and cotton crops 3 (@Italy); Cereals, biomass and cotton crops 4 (@Czech Republic); Machinery management (@Czech Republic, Italy)
     • Subsidies and insurance: Insurance (@Greece); Farm Weather Insurance Assessment (@Italy); CAP Support (@Italy, Romania); CAP Support (@Greece)
  4. Big picture and expected outcomes
     • Sectors: AGRICULTURE, FORESTRY, FISHERY
     • Big Data Sources and Big Data Types: Structured and unstructured data; Spatio-temporal data; Machine generated data; Image/sensor data; Geospatial data; Genomics data
     • Data Management: Collection, Preparation, Curation, Linking, Access
     • Data Processing: Batch, Interactive, Streaming, Real-time
     • Data Analytics: Classification, Clustering, Regression, Deep learning, Optimization, Simulation
     • Data Visualization and User Interaction: 1D, 2D, 3D + temporal; Virtual and Augmented Reality
     • RAW MATERIAL PRODUCTION FOR FOOD AND ENERGY SUPPLY CHAINS BIOMATERIALS RESPONSIBLE PRODUCTION SUSTAINABILITY
  5. Combining drivers and assets
     Sector      | Variety              | Volume (TB) | Velocity (TB/Year)
     Agriculture | 8 sources, 4 types   | 53          | 197
     Forestry    | 8 sources, 7 types   | 11.39       | 12.12
     Aerial/UAV  |                      |             | 100 GB/h
     Fishery     | 20 sources, 13 types | 8.82        | 6.27
     26 pilots, in 3 sectors x 3 thematic groups
  6. DataBio platform
     • The DataBio platform is a software development platform, providing a Big Data toolset, offering functionalities for services primarily in agriculture, forestry, fishery
     • 91 technology components
     • Formed 13 reusable and deployable pipelines
     • Pipelines are sets of components, with clear mutual interfaces linking them together and to the platform environment, fulfilling specific pilot functionalities
     • [Figure: example with roles, pipeline and lifecycle views]
  7. DataBio reports (new technical reports will come soon)
     • All DataBio reports are at https://www.databio.eu/en/publicdeliverables/
     • The reports currently most relevant to the Agriculture DWG are:
     • https://www.databio.eu/wp-content/uploads/2017/05/DataBio_D1.1-Agriculture-Pilot-Definition_v1.1_2018-04-26_LESPRO.pdf
     • https://www.databio.eu/wp-content/uploads/2017/05/DataBio_D6.4-Data-driven-bioeconomy-pilots_v1.0_2018-02-28_CiaoT.pdf
     • https://www.databio.eu/wp-content/uploads/2017/05/DataBio_D7.1-Business-Plan_v2.1_2018-02-06_UStG.pdf
     • https://www.databio.eu/wp-content/uploads/2017/05/DataBio_D7.3-PESTLE-Analysis_v1.0_2017-12-29_VTT.pdf
     • https://www.databio.eu/wp-content/uploads/2017/05/DataBio_D5.1-EO-Component-Specification_v1.0_2017-12-29_SPACEBEL.pdf
     • https://www.databio.eu/wp-content/uploads/2017/05/DataBio_D6.2-Data-Management-Plan_v1.0_2017-06-30_CREA.pdf
  8. Three cases
     • Unifying Data and Metadata
     • Linked Open Data FOODIE Data Model
     • 3D visualization of Big Data
  9. Use cases: Unifying Data and Metadata
  10. Why? The way we currently handle geospatial metadata.
      • Customer: Where can I find information on what’s inside?
      • Shop assistant: We have an application exactly for that. Just go into the room at the end of the shop, press the red button to start the scanner and then wait a few seconds to see the information that appears.
      • Images adopted from: organicwineexchange.com, vectorstock.com
  11. Current situation
  12. Let’s move on
      Image adopted from: Reznik, T., Chudy, R., Micietova, E. Normalized evaluation of the performance, capacity and availability of catalogue services: a pilot study based on INfrastruture for SPatial InfoRmation in Europe. International Journal of Digital Earth 9, 325-341 (2016). doi: 10.1080/17538947.2015.1019581
  13. Software ingredients
      • HSLayers NG: visualization library based on OL, Cordova, Bootstrap etc. (http://ng.hslayers.org/)
      • Copernicus Open Access API: source of Sentinel images (https://scihub.copernicus.eu)
      • NASA API: source of Landsat (and other) images (https://api.nasa.gov/)
  14. Copernicus Open Access API
      • Sample query: https://scihub.copernicus.eu/dhus/search?q=footprint:%22Intersects(POLYGON((16.75%2049.03,%2017.12%2049.04,%2017.06%2049.30,%2016.78%2049.29,%2016.75%2049.03)))%22&FORMAT=json
      • [Diagram: JSON → metadata parser]
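
The sample query can be assembled programmatically rather than by hand-encoding the polygon. The following is a minimal sketch, not the project's own script: the function name `footprint_query` is hypothetical, and the endpoint and the `FORMAT` parameter spelling are simply copied from the slide. Note that the standard library encodes more characters (for example `:` and `(`) than the hand-written URL does; both forms decode to the same query.

```python
from urllib.parse import quote, urlencode

# Endpoint taken from the slide; treat it as an assumption about the service.
BASE_URL = "https://scihub.copernicus.eu/dhus/search"


def footprint_query(ring):
    """Build a search URL for products intersecting a closed (lon, lat) ring."""
    if ring[0] != ring[-1]:
        raise ValueError("polygon ring must be closed (first point == last point)")
    coords = ", ".join(f"{lon} {lat}" for lon, lat in ring)
    q = f'footprint:"Intersects(POLYGON(({coords})))"'
    # quote_via=quote encodes spaces as %20, matching the style of the slide's URL.
    return f"{BASE_URL}?{urlencode({'q': q, 'FORMAT': 'json'}, quote_via=quote)}"


# Area of interest from the slide's sample query.
ring = [(16.75, 49.03), (17.12, 49.04), (17.06, 49.30), (16.78, 49.29), (16.75, 49.03)]
url = footprint_query(ring)
print(url)
```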
  15. Copernicus Open Access API
      • The API produces JSON, which is first parsed and transformed into GeoJSON so that geospatial information is handled correctly (a Python script was developed for this)
      • Communication with the NASA API is in progress
      • [Diagram: JSON → metadata parser → GeoJSON]
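
The JSON-to-GeoJSON step might look roughly like the sketch below. It is not the project's script: it assumes each parsed record is a flat dictionary whose `footprint` value is a single-ring WKT `POLYGON`, and the record shown (including the keys `title`, `platform` and `cloudcover`) is made up for illustration. The real API response is nested differently and would need to be flattened first.

```python
import json
import re


def wkt_polygon_to_geojson(wkt):
    """Convert a single-ring WKT POLYGON string to a GeoJSON geometry dict."""
    match = re.fullmatch(r"\s*POLYGON\s*\(\(\s*(.*?)\s*\)\)\s*", wkt, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unsupported WKT: {wkt!r}")
    # Each "lon lat" pair becomes a [lon, lat] position; holes are not supported.
    ring = [[float(v) for v in pair.split()] for pair in match.group(1).split(",")]
    return {"type": "Polygon", "coordinates": [ring]}


def to_feature_collection(records):
    """Wrap records as GeoJSON features, moving 'footprint' into the geometry."""
    features = [
        {
            "type": "Feature",
            "geometry": wkt_polygon_to_geojson(record["footprint"]),
            "properties": {k: v for k, v in record.items() if k != "footprint"},
        }
        for record in records
    ]
    return {"type": "FeatureCollection", "features": features}


# Hypothetical record; field names are assumptions, not the API's schema.
records = [
    {
        "title": "example-scene",
        "platform": "Sentinel-2",
        "cloudcover": 70.8,
        "footprint": "POLYGON((16.75 49.03,17.12 49.04,17.06 49.30,16.78 49.29,16.75 49.03))",
    }
]
fc = to_feature_collection(records)
print(json.dumps(fc, indent=2))
```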
  16. Current status
  17. Outlook – Filtering
      • Sample query: https://scihub.copernicus.eu/dhus/search?q=footprint:%22Intersects(POLYGON((16.75%2049.03,%2017.12%2049.04,%2017.06%2049.30,%2016.78%2049.29,%2016.75%2049.03)))%22&FORMAT=json
      • [Diagram: JSON → metadata parser → "328 satellite images available", with filter checkboxes for radar and multispectral]
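
A filter of this kind could be sketched as follows. This is an illustration only: the `platform` key, the record contents and the `SENSOR_TYPE` lookup are assumptions (Sentinel-1 carries a radar instrument and Sentinel-2 a multispectral one, but how the project classifies products is not stated on the slide).

```python
# Assumed mapping from platform name to sensor family; extend as needed.
SENSOR_TYPE = {"Sentinel-1": "radar", "Sentinel-2": "multispectral"}


def filter_by_sensor(records, wanted):
    """Keep records whose platform maps to one of the wanted sensor families."""
    wanted = set(wanted)
    return [r for r in records if SENSOR_TYPE.get(r.get("platform")) in wanted]


# Hypothetical parsed metadata records.
records = [
    {"title": "scene-a", "platform": "Sentinel-1"},
    {"title": "scene-b", "platform": "Sentinel-2"},
    {"title": "scene-c", "platform": "Sentinel-2"},
]
multispectral = filter_by_sensor(records, {"multispectral"})
print(f"{len(multispectral)} of {len(records)} satellite images match")
```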
  18. Outlook – Notifications
      • Example notification: "New Sentinel-2B image is available. 70.8% cloud coverage", with the actions DOWNLOAD (SAFE, 750 MB) and IGNORE
      • Work is also ongoing on integration of the NASA API (https://api.nasa.gov)
  19. Use cases: Linked Open Data FOODIE Data Model
  20. FOODIE Data Models
      • Core Data Model
      • VGI Data Model
      • Transport Data Model
      • Sensor Data Model
  21. Linked data publication process overview
      • Simple set of principles & technologies: URI, HTTP, RDF, SPARQL
      • Involves a set of tasks: Datasets identification, Model specification, RDF data generation, Linking, Exploiting
      • Reference linked data publication pipelines: Hyland et al., Hausenblas et al., Villazón-Terrazas et al.
  22. Linked data publication technologies overview
      • Used technologies:
      • D2RQ for exposing relational databases as virtual RDF graphs
      • RDF for the representation of data
      • Farming ontology providing the underlying vocabulary and relations
      • Virtuoso for storing the semantic datasets
      • Silk for discovery of links
      • SPARQL for querying semantic data
      • HSLayers NG for visualisation of data
      • Metaphactory for visualisation of data
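
To make the relational-to-RDF idea concrete, the sketch below turns one database row into N-Triples lines. It only illustrates the general principle; it is not D2RQ's API or mapping language, and the base URI `http://example.org/foodie/`, the table name `plot` and the column names are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"


def escape_literal(text):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def literal(value):
    """Serialize a Python value as a plain or typed N-Triples literal."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return f'"{str(value).lower()}"^^<{XSD}boolean>'
    if isinstance(value, int):
        return f'"{value}"^^<{XSD}integer>'
    if isinstance(value, float):
        return f'"{value}"^^<{XSD}double>'
    return f'"{escape_literal(str(value))}"'


def row_to_ntriples(table, key, row, base="http://example.org/foodie/"):
    """Map one relational row to triples: one subject per row, one predicate per column."""
    subject = f"<{base}{table}/{row[key]}>"
    lines = [f"{subject} <{RDF_TYPE}> <{base}{table}> ."]
    for column, value in row.items():
        if column == key or value is None:  # NULL columns produce no triple
            continue
        lines.append(f"{subject} <{base}{table}#{column}> {literal(value)} .")
    return lines


# Hypothetical row from a hypothetical "plot" table.
triples = row_to_ntriples("plot", "id", {"id": 7, "crop": "wheat", "area_ha": 12.5, "note": None})
print("\n".join(triples))
```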
  23. Datasets identification
      • Goal: to publish linked data from pilots in the FOODIE project (available in a PostgreSQL database):
      • Precision viticulture (Spain): delivered a web-based solution providing advisory services in different aspects related to winegrowing, like disease prevention, production estimation or harvesting schedule
      • Open Data for Strategic and Tactical planning (Czech Republic): delivered two main applications, one for farm telemetry and the other for estimation of yield potential
  24. Transformation from UML model to OWL ontology
      • Followed a semi-automatic approach
      • ShapeChange tool that implements the ISO 19150-2 standard rules for mapping ISO geographic information UML models to OWL ontologies
      • Required different processing tasks:
      • Pre-processing: source model preparation; ShapeChange tool configuration (encoding rules; mappings UML classes - OWL elements; namespaces definition); base ontologies fixes (INSPIRE common, ISO 19100 series standards)
      • Post-processing: manual fixes in the ontology; manual creation of ontology elements of the base INSPIRE schemas (AF)
      • [Diagram label: XML schemas, feature catalogs, and RDF/OWL]
  25. Ontology for farming data - overview
      • ShapeChange output:
      • UML featureTypes and dataTypes modelled as classes, and their attributes as datatype or object properties
      • UML codeLists modelled as classes/concepts, and their attributes as concept members
      • Cardinality restrictions defined on properties (exactly, min, max)
      • DataType property ranges defined according to model/mappings
      • Object property ranges defined according to model/mappings
      • Object property inverseOf defined
      • [Figure panels: Top hierarchy, FeatureType hierarchy, Codelist hierarchy, Datatype hierarchy]
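
The shape of such output can be illustrated with a small generator that emits Turtle for one feature type. This is a hand-written sketch of the mapping pattern described above, not ShapeChange output or configuration; the namespace `http://example.org/farming#`, the class `Plot` and the attribute `area` are hypothetical.

```python
PREFIXES = (
    "@prefix : <http://example.org/farming#> .\n"
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
)

# The three restriction kinds named on the slide and their OWL terms.
CARDINALITY_TERMS = {
    "exactly": "owl:cardinality",
    "min": "owl:minCardinality",
    "max": "owl:maxCardinality",
}


def feature_type_to_turtle(name, attributes):
    """Emit a class with cardinality restrictions plus one datatype property per attribute.

    attributes: list of (attribute_name, xsd_type, kind, n) tuples.
    """
    restrictions, properties = [], []
    for attr, xsd_type, kind, n in attributes:
        restrictions.append(
            f"    rdfs:subClassOf [ a owl:Restriction ; owl:onProperty :{attr} ; "
            f'{CARDINALITY_TERMS[kind]} "{n}"^^xsd:nonNegativeInteger ]'
        )
        properties.append(
            f":{attr} a owl:DatatypeProperty ; rdfs:domain :{name} ; rdfs:range xsd:{xsd_type} ."
        )
    head = f":{name} a owl:Class"
    if restrictions:
        head += " ;\n" + " ;\n".join(restrictions)
    return "\n".join([head + " .", *properties])


ttl = feature_type_to_turtle("Plot", [("area", "double", "exactly", 1)])
print(PREFIXES + ttl)
```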
  26. Exploiting the Linked Data – visualisation
      • Map visualisation: http://ng.hslayers.org/examples/foodie-zones/
  27. Links to models
      • https://github.com/Wirelessinfo/FOODIE-data-model
      • https://github.com/FOODIE-cloud/ontology
  28. Use cases: 3D visualization of Big Data
  29. Methodology: 3D visualisation of Big Data
      • Use a 3D visualisation as a unifying environment for portraying different types of data.
      • Based on the agricultural point of view:
      • Raw data, for picking the right dataset for further data processing.
      • Processed data (transformed / harmonized / analyzed / …), for exploring the results and for decision support.
  30. Technology
      • The diversity of the data structures implies a need for a robust and easily customizable application for data visualization
      • Framework: HSLayers NG (~ OpenLayers based JavaScript library), https://github.com/hslayers/hslayers-ng; Cesium, https://cesiumjs.org/
      • Data connectors: Web Map Service ~ for raster and imagery data; GeoJSON ~ for vector data; Resource Description Framework (RDF) ~ for linked data; OpenStreetMap live data pump ~ for vector data from OSM
      • Tailored applications
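
As an illustration of what the Web Map Service connector ultimately requests, the sketch below builds a WMS 1.3.0 GetMap URL with the standard parameters. The endpoint `https://example.org/wms` and the layer name `yield_potential` are hypothetical, and the code is not taken from HSLayers NG.

```python
from urllib.parse import urlencode


def getmap_url(endpoint, layer, bbox, width, height):
    """Build a WMS 1.3.0 GetMap request for a PNG image of one layer.

    bbox is (min_lon, min_lat, max_lon, max_lat); CRS:84 uses lon/lat axis order.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the server's default style
        "CRS": "CRS:84",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint and layer name.
wms_url = getmap_url("https://example.org/wms", "yield_potential", (16.75, 49.03, 17.12, 49.30), 512, 512)
print(wms_url)
```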
  31. Major Outcomes
      • Developed best-practice application examples: examples of processed data visualization tailored for the purposes of the DataBio project's Data Experimentation and Proof of Concept phases.
      • The applications were created using the above-mentioned framework.
      • The work started in the previous project, FOODIE, and now continues as part of the Czech agriculture pilots of the DataBio project. To speed up development, three new large-scale testbeds were developed as part of the INSPIRE Hack.
      • http://www.foodie-project.eu/
      • http://databio.eu/
      • http://www.plan4all.eu/inspire-hack-2017/
  32. Major Outcomes
      • Open Land Use (http://ng.hslayers.org/examples/3d-olu)
  33. Major Outcomes
      • Perspective visualization of estimated yield (http://ng.hslayers.org/examples/rostenice)
  34. Major Outcomes
      • Linked data integration (http://ng.hslayers.org/examples/produce-3d)
  35. Thank you for your attention!
      • Web: www.databio.eu
      • Email: charvat@lesprojekt.cz, info@databio.eu
      • Social media: agriXchange / DataBio, @DataBio_eu, DataBioProject
