4. 1 Motivation | Spatial data quality matters | Potsdam in different spatial datasets | OpenStreetMap | Analog topo map 1:10K | Brandenburg Viewer
9. 3 Data conflation | Optimising spatial data quality | One spatial object, multiple geometries | OpenStreetMap | TeleAtlas | ATKIS
11. 4 Data conflation at work | Automated workflow | Producing best-fit dataset | Workflow diagram: data sources (dataset 1, dataset 2), pre-processing, object assignment, new dataset
17. Thank you for your attention Questions? Comments? Feedback? Contact Hartmut Asche | gislab@uni-potsdam.de Dept of Geography | University of Potsdam | GER Web www.geographie.uni-potsdam.de/geoinformatik ICCSA 2011 | GEOG-AN-MOD 2011 | University of Santander | 20-23/06/2011
Editor's Notes
With the introduction of digital mapping techniques in the 1960s, and of GIS shortly afterwards, researchers realized that error and uncertainty in digital spatial data could cause problems that had not been experienced with paper maps. In the early 1980s an international trend began to design and implement data transfer standards that include data quality information, which had disappeared from the margins of paper maps in the transition to digital data products. The main intention of this work is to present data conflation as one option for improving spatial data quality.
In a number of fields, the approach to quality has evolved into a definition based on fitness for use. ISO 8402 defines quality as the ‘totality of characteristics of a product that bear on its ability to satisfy stated or implied needs’. This means that two pieces of information are needed to define quality: information on the data being used and information on the users' needs. Spatial data are considered fit for use if they meet the requirements of the target application. Data quality is described by one or more quality dimensions. Quality dimensions for geographic data are called spatial data quality elements. They include completeness, logical consistency, positional accuracy, temporal accuracy (the accuracy of the time information reported with the data) and thematic/semantic or attribute accuracy. Typically, metadata for spatial data describe data quality and include information about these elements.
During the conflation process, information from the source input dataset (SDS) and the target input dataset (TDS) has to be assigned to each other. The SDS is the dataset from which geospatial information (e.g. thematic information) is taken. The TDS is the dataset to which the information taken from the SDS is transferred, i.e. the dataset that is expanded.
To translate the real world into a form that a computer can process, it has to be modeled in simplified form according to specific rules. Such data models represent real-world objects as points, lines or areas (polygons). Each of these objects has x- and y-coordinates and carries information on its spatial reference. This example shows how the data formats of the same object differ.
Different producers of spatial data capture the same real-world object differently. There are no uniform rules for the acquisition of spatial data. As a result, different abstract representations of one and the same real-world object may arise. This figure shows an example of alternative geometric representations of the same real-world object. Each representation was generated by a different spatial data provider.
The approach presented here improves the quality of spatial data. The method illustrates how to increase the geometrical completeness of road network data. Objects that are available in the source dataset, such as roundabouts, must be found in the target dataset and assigned to the new, amended dataset. The problem is that crossroads that are actually roundabouts may be stored in a dataset as simple crossroads. First, the positions of all crossroads in both datasets have to be found. A roundabout is found if at least three edges of the road network share the same start or end point. If three edges share a node, regardless of whether that node is the start or the end point of each edge, then this intersection is part of the roundabout.
In this way every crossroad in the dataset is checked. If a roundabout is identified, the second step is to search for the corresponding crossroad in the second dataset. For this purpose, the points that serve as traffic entrances or exits are used.
All entrances and exits of the roundabout are found in the first input dataset. The corresponding edges in the second input dataset are found as well. Now the geometrical information about the new objects can be assigned.
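The node test from the first step can be sketched as follows. This is a minimal illustration rather than the workflow's actual implementation: the function name `junction_nodes`, the edge identifiers and the node labels are all hypothetical, and the road network is assumed to be given as edges with a start and an end node.

```python
from collections import defaultdict


def junction_nodes(edges, min_edges=3):
    """Return the nodes that are shared by at least `min_edges` edges.

    `edges` maps a (hypothetical) edge ID to a (start_node, end_node) pair.
    A node counts for an edge regardless of whether it is its start or end.
    """
    incident = defaultdict(set)
    for edge_id, (start, end) in edges.items():
        incident[start].add(edge_id)
        incident[end].add(edge_id)
    return {
        node: sorted(edge_ids)
        for node, edge_ids in incident.items()
        if len(edge_ids) >= min_edges
    }


# Illustrative network: a ring A-B-C with one access road per ring node.
edges = {
    "e1": ("A", "B"),
    "e2": ("B", "C"),
    "e3": ("C", "A"),
    "e4": ("A", "D"),
    "e5": ("B", "E"),
    "e6": ("C", "F"),
}

candidates = junction_nodes(edges)
for node, edge_ids in sorted(candidates.items()):
    print(node, edge_ids)
```

In this toy network the ring nodes A, B and C each touch three edges and are reported as candidates, while the outer nodes D, E and F are not. Matching the candidates against the second dataset is a separate step that is not covered by this sketch.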
After two or more datasets have been merged, the completeness of the input data is increased. This applies to all data types: polygons, lines and points. One condition must be fulfilled: one of the input datasets must contain more information than the other. Not every new geometrical object in the end dataset carries attribute information. The end dataset therefore cannot be fully complete in terms of thematic information; datasets generated by conflation can be complete only in terms of geometrical information. Figure 3 illustrates this problem with an example of two datasets. The first dataset (source dataset) includes information about 6 buildings. In the real world, however, the total number of buildings is 8, so two objects are missing from this dataset. The source dataset includes thematic information about the type of use of these buildings. The second dataset (target dataset) includes geometrical information about 5 objects. Information about the existence of buildings 6, 7 and 8 is not available. Unlike the source dataset, the target dataset has information about the number of floors, which is missing in the first dataset. The end dataset in Figure 3 is complete in terms of geometric information. The table below it shows the gain in attributes. Geometrical objects that are available in both input datasets have 100% thematic completeness. The remaining objects have the thematic information of only one input dataset.
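The building example can be reproduced with a short sketch. Several things here are assumptions made for illustration only: objects are matched through a shared building ID (a real conflation workflow would match geometries), the source dataset is assumed to lack buildings 4 and 5 (the notes only state that two objects are missing), and the attribute values are invented.

```python
def conflate(source, target):
    """Union two datasets keyed by object ID, merging attributes of shared objects."""
    merged = {}
    for obj_id in sorted(source.keys() | target.keys()):
        attrs = dict(source.get(obj_id, {}))
        attrs.update(target.get(obj_id, {}))
        merged[obj_id] = attrs
    return merged


def thematic_completeness(merged, attribute_names):
    """Share of the expected attributes present for each object (0.0 to 1.0)."""
    return {
        obj_id: sum(name in attrs for name in attribute_names) / len(attribute_names)
        for obj_id, attrs in merged.items()
    }


# Source dataset: 6 buildings with type of use (IDs and values are assumed).
source = {i: {"use": "residential"} for i in (1, 2, 3, 6, 7, 8)}
# Target dataset: 5 buildings with number of floors (values are assumed).
target = {i: {"floors": 2} for i in (1, 2, 3, 4, 5)}

end_dataset = conflate(source, target)
completeness = thematic_completeness(end_dataset, ["use", "floors"])

for obj_id, share in completeness.items():
    print(obj_id, f"{share:.0%}")
```

Under these assumptions the end dataset contains all 8 buildings, but only the buildings present in both inputs carry both attributes; the others reach 50% thematic completeness.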
Conflation approaches can improve positional and temporal accuracy as well. The positional accuracy of a dataset can be increased with the information given by another input dataset. If both datasets deviate considerably from the real world, the arithmetic mean of the positions in all input datasets can improve this quality element. The temporal accuracy is improved if the metadata provide information about how current the spatial data are.
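Averaging positions can be sketched in a few lines. The function name `average_position` and the coordinates are hypothetical, and the sketch assumes that the points have already been matched as representations of the same real-world object in a common coordinate system. Averaging tends to help when the errors of the inputs are independent; it does not remove a bias that all inputs share.

```python
def average_position(points):
    """Arithmetic mean of matched (x, y) positions from several input datasets."""
    if not points:
        raise ValueError("at least one position is required")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    return (mean_x, mean_y)


# The same object as recorded in two input datasets (coordinates are invented).
observations = [(100.0, 200.0), (104.0, 198.0)]
averaged = average_position(observations)
print(averaged)
```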