Spatial Data Integrator: software presentation and use cases
National Geographic Community Meeting Day
Ministry of Ecology and Sustainable Development – Ministry of Agriculture
mathieu.rajerison
Summary
Place of an ETL inside a data infrastructure
The different interface elements of SDI
Connecting the components inside the workspace
Configuring the tMap component
Executing the job
Merging layers
Chaining the quality checking of layers
Migrating data to PostgreSQL/PostGIS
Other applications
Links
1- Software presentation
General aspects
Based on Talend Open Studio (TOS)
It adds a spatial layer to TOS through components for accessing and processing geospatial data
Developed in Java: Eclipse environment, UDig elements, GeoTools library, Java Topology Suite, Sextante
Place of an ETL in a data infrastructure [Diagram labels: Dashboards, Portal]
The interface elements: the map window. This window displays geographic data. It is useful for checking the results of a processing step. This window comes from the UDig software.
The tool: the business modeler. The business modeler is used to model the job processes. It lets a wide audience take part in designing the data flow and follow the progress of development, without requiring any computer skills. Modelling in this window has no impact on job execution.
The interface elements: the repository metadata tab. The repository contains, among other things, the metadata section. The metadata section is where the data access parameters are stored. On the image, you can see the different types of data sources. Note that geographic data is not configured inside the metadata section (we'll see that later in the demo).
The interface elements: the graphical workspace. The main window is where you create your jobs. You pick your components and place them here. There are different types of relations between components; they are not detailed in this keynote.
The interface elements: the components palette. The palette contains the different components. It's a kind of toolbox. Spatial Data Integrator adds the geo part to it. The palette is extensible thanks to developer contributions. As it is open source, you can develop your own components.
The interface elements: the configuration tab. The bottom window is where you configure the behaviour of each component. It also lets you set the execution parameters of your job.
2- Demonstration: how to manage outer joins
Configuring the data access and creating the schemas. The first step consists in configuring access to your data source.
Connecting the components inside the workspace. You place the components in the workspace and connect them.
Configuring the tMap component. Here, the city name links the two tables. Two output flows are generated: one for the inner join results and one for the outer join results.
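The two output flows can be pictured with a minimal Java sketch. It is not the code SDI generates: the class `CityJoinSketch`, the `Flows` holder, and the sample rows are hypothetical, and the sketch assumes that the second flow receives the main-flow rows whose city name has no match in the lookup table.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: a join on city name that produces two output flows. */
final class CityJoinSketch {

    /** Holder for the two flows: rows that found a match, and rows that did not. */
    static final class Flows {
        final List<String> matched = new ArrayList<>();
        final List<String> unmatched = new ArrayList<>();
    }

    /**
     * mainRows: {cityName, value} pairs from the main flow.
     * lookup:   cityName -> attribute taken from the lookup table.
     */
    static Flows join(List<String[]> mainRows, Map<String, String> lookup) {
        Flows flows = new Flows();
        for (String[] row : mainRows) {
            String city = row[0];
            String attribute = lookup.get(city);
            if (attribute != null) {
                // The city name exists in both tables: inner join result.
                flows.matched.add(city + ";" + row[1] + ";" + attribute);
            } else {
                // No match in the lookup table: the row goes to the second flow.
                flows.unmatched.add(city + ";" + row[1]);
            }
        }
        return flows;
    }

    public static void main(String[] args) {
        // Made-up sample data.
        Map<String, String> lookup = new LinkedHashMap<>();
        lookup.put("Lyon", "zone-1");
        lookup.put("Nantes", "zone-2");

        List<String[]> mainRows = new ArrayList<>();
        mainRows.add(new String[] {"Lyon", "a"});
        mainRows.add(new String[] {"Lyonn", "b"}); // misspelled on purpose

        Flows flows = join(mainRows, lookup);
        System.out.println("matched:   " + flows.matched);
        System.out.println("unmatched: " + flows.unmatched);
    }
}
```

Keeping the unmatched rows in their own flow, rather than discarding them, is what makes a later correction step possible.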
The job execution. The job can now be executed. There are two modes of execution: - statistics mode displays the number of rows for each flow - traces mode displays the content of each flow. Both modes run in streaming.
Going further: detecting similarities between rows. Here, we use a fuzzy logic component named tFuzzyMatch. It detects similarities between rows coming from two different flows. It can be useful for finding which rows of a reference (lookup) table are closest to the outer join results.
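As a rough illustration of the idea, the sketch below finds, for a rejected city name, the closest name in a reference list. It assumes Levenshtein edit distance as the similarity measure; the slides do not say how tFuzzyMatch was configured, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: nearest reference name by Levenshtein edit distance. */
final class FuzzyMatchSketch {

    /** Classic dynamic-programming Levenshtein distance, using two rows of the matrix. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns the reference name closest to the given name, or null if the list is empty. */
    static String closest(String name, List<String> reference) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String candidate : reference) {
            int d = levenshtein(name.toLowerCase(), candidate.toLowerCase());
            if (d < bestDistance) {
                bestDistance = d;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> reference = Arrays.asList("Lyon", "Nantes", "Paris");
        System.out.println(closest("Lyonn", reference));
    }
}
```

In practice a maximum accepted distance would normally be set, so that names with no plausible counterpart are left unmatched instead of being forced onto the nearest candidate.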
3- Use Cases
Scheduling the aggregation of data. A web geographic portal requires periodically joining data from different sources. Here, the source is an Access database fed by users. We'll associate its entries with the city objects. [Diagram labels: WMS, Access, SHP, BDCARTO, Map Server, Sybase, XML, ..., Client part, SCP, SHP]
Scheduling the aggregation of data
- SDI task scheduler
- crontab for Linux environments
- Windows Task Scheduler

  • 34. Merging layers. Imagine a data infrastructure where geographic layers are spread across as many files as there are cities, that is, one file per city. The job aims at merging all these files into a single table. [Diagram labels: SHP1, SHP2, SHP3, SHP4, SHP5 merged into SHP]
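The merge can be sketched in plain Java as follows. This is only an illustration under simplifying assumptions: each per-city file is represented as an in-memory list of text rows instead of a real shapefile, and the class name, file names, and the added source column are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: merges one row list per city file into a single table. */
final class MergeLayersSketch {

    /**
     * rowsPerFile: file name -> rows read from that file.
     * Returns one table where each row is prefixed with the file it came from.
     */
    static List<String> merge(Map<String, List<String>> rowsPerFile) {
        List<String> table = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : rowsPerFile.entrySet()) {
            for (String row : entry.getValue()) {
                // Keep track of the source file so rows stay traceable after the merge.
                table.add(entry.getKey() + ";" + row);
            }
        }
        return table;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for per-city layers.
        Map<String, List<String>> rowsPerFile = new LinkedHashMap<>();
        rowsPerFile.put("city_a.shp", Arrays.asList("1;road"));
        rowsPerFile.put("city_b.shp", Arrays.asList("1;river", "2;road"));

        for (String row : merge(rowsPerFile)) {
            System.out.println(row);
        }
    }
}
```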
  • 36. Chaining the Quality Control of Digitized Documents. After digitizing a large volume of data, we must check all of it. Both the geometry of the objects and their attributes must be checked. This task is very time-consuming with ordinary mapping software. Steps: checking the table structure, checking the content, checking the geometric compliance, comparison to the reference data.
  • 37. Chaining the Quality Control of Digitized Documents. With a single click, SDI runs this series of checks. Reports list the errors related to the geometric compliance of the objects or to their attribute values. Steps: checking the table structure, checking the content, checking the geometric compliance, comparison to the reference data.
  • 38. Chaining the Quality Control of Digitized Documents
  • 39. Chaining the Quality Control of Digitized Documents. Job comparing the Urban Planning Project Map to the Cadastral Reference Data.
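The chaining principle can be sketched as a list of checks applied to every row, with each failure appended to a report. The sketch below is an assumption-laden illustration, not the SDI job: the names are hypothetical, rows are plain string arrays, and only a structure check and a content check are shown, since geometric compliance and comparison to reference data would need a geometry library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Illustrative only: runs a chain of checks on each row and collects an error report. */
final class QualityChainSketch {

    /** A check returns an error message, or null when the row passes. */
    interface Check extends Function<String[], String> {}

    /** Structure check: assumes every row must have exactly 3 fields. */
    static final Check STRUCTURE =
            row -> row.length == 3 ? null : "expected 3 fields, got " + row.length;

    /** Content check: assumes the second field (a name) must not be blank. */
    static final Check CONTENT =
            row -> (row.length > 1 && !row[1].trim().isEmpty()) ? null : "empty name attribute";

    static List<String> run(List<String[]> rows, List<Check> checks) {
        List<String> report = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            for (Check check : checks) {
                String error = check.apply(rows.get(i));
                if (error != null) {
                    report.add("row " + i + ": " + error);
                }
            }
        }
        return report;
    }

    public static void main(String[] args) {
        // Made-up sample rows: the second one is deliberately broken.
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"1", "Lyon", "POINT(0 0)"});
        rows.add(new String[] {"2", " "});

        for (String line : run(rows, Arrays.asList(STRUCTURE, CONTENT))) {
            System.out.println(line);
        }
    }
}
```

Because every check shares the same signature, further steps can be appended to the list without changing the loop that produces the report.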
  • 40. Chaining the Quality Control of Digitized Documents. tMap joining component. Used function and result type: row4.the_geom.symDifference(row2.the_geom) returns a geometry; GeometryOperation.GETAREA(row4.the_geom.difference(row2.the_geom)) returns a float.
  • 41. Migrating data into a PostgreSQL/PostGIS database. At a regional scale, we want to pool data and integrate it into a PostgreSQL/PostGIS database management system. [Diagram: folder tree to relational database system]
  • 42. Migrating data into a PostgreSQL/PostGIS database
  • 44. Dividing an image into multiple images, each cut along a city contour and named after the city used to cut it
  • 45. Using Talend with GDAL-OGR: conversion to other formats
  • 47. Extending the possibilities by using auxiliary Java libraries
  • 49. Makes it possible to migrate and consolidate spatial data infrastructures
  • 50. Simplifies tasks that are usually time-consuming
  • 51. Avoids errors caused by repeating manual operations and improves the quality of controls
  • 52. A very active community
  • 53. New components will become available
  • 55. Multiple ways to access data: SCP, FTP, web services, POP
  • 56. Automatic metadata creation: MEF and XML files for GeoNetwork
  • 57. Raster processing using Sextante