A Method for Determining and Improving the Horizontal Accuracy of Geospatial Features

Juan Tobar, Shakir Ahmed, Linda McCafferty, and Carlos Piccirillo
South Florida Water Management District, West Palm Beach, FL, USA

Abstract

Many data sets stewarded by geospatial professionals are spatially correlated derivatives of higher accuracy data sets such as parcels and road networks. This article documents the use of the Buffer-Overlay method of Goodchild and Hunter (1997) to determine and improve the horizontal accuracy of geospatial features. The method relies on a comparison with a representation of higher accuracy, and estimates the percentage of the total length of the higher accuracy representation that is within a specified distance of the lower accuracy representation. The method is then extended using topological operators to extract lower accuracy representations and replace them with those of higher accuracy.

Introduction

The South Florida Water Management District (SFWMD) regulates water supply, water quality, groundwater withdrawals, and surface water runoff through the issuance of permits for these activities on specific land parcels. The District’s Regulatory GIS consists of approximately 85,000 permits spread over a 16-county jurisdictional area from Orlando to the Keys. The permits are maintained in an SDE database in 18 feature classes based on permit type. About half of these permits (Environmental Resource Permits) never expire, and the other half (Water Use Permits) are valid for 20 years before they need to be renewed. These feature classes are used by engineers, environmental scientists, hydrologists, and compliance staff to make informed decisions during application review and post-permit compliance. For these reasons it is important that even the oldest permits are depicted as accurately as possible in the GIS.

The Data - Permits

From 1980 to 1987 (15 years) permits were drawn directly on USGS 1:24,000 topographic quadrangle maps and mylar overlays. From 1987 to 1995 (8 years) the maps were migrated to CAD and permits were heads-up digitized using SPOT 10 Meter Panchromatic and 20 Meter Multi-Spectral Scanner imagery. From 1995 to 1999 (4 years) 1 meter Digital Ortho-photo Quarter Quads were used, and by 1999 some permits were being digitized using county parcel data. Today all permits are digitized to parcels, but we have 23 years of data that were digitized against far less than optimal base maps.

The Data - Parcels

The District uses a contiguous parcel base that is composed of features from the 16 counties within the District’s jurisdiction. The State of Florida’s Cadastral Mapping Guidelines recommend that horizontal accuracy should meet or exceed U.S. National Map Accuracy Standards (NMAS). These standards state that at “scales larger than 1:20,000, not more than 10 percent of the points tested shall be in error by more than 1/30 inch, measured on the publication scale.” Common scales for cadastral maps range from 1:500 to 1:10,000. Assuming that they follow NMAS, horizontal positional accuracy at the 90% confidence level will range from ±1.38 to ±27.78 feet (Table 1).

Map Scale                 NMAS CMAS          NSSDA RMSE(R)    NSSDA Accuracy(R)
                          (90% confidence)                    (95% confidence)
1:1,200  (1” = 100’)      3.33 ft            2.20 ft          3.80 ft
1:2,400  (1” = 200’)      6.67 ft            4.39 ft          7.60 ft
1:4,800  (1” = 400’)      13.33 ft           8.79 ft          15.21 ft
1:6,000  (1” = 500’)      16.67 ft           10.98 ft         19.01 ft
1:12,000 (1” = 1000’)     33.33 ft           21.97 ft         38.02 ft

Table 1: Comparison of NMAS, NSSDA Horizontal Accuracy for Parcels

The two data sets are spatially correlated, because permits are based on the same legal boundaries used for parcels. We can therefore use parcels as a control to test the accuracy of our permits. In general, the horizontal accuracy of the parcels can be considered to be an order of magnitude better than that of the permits.

Literature Review

Positional accuracy or spatial accuracy refers to the accuracy of a test feature when compared to a control feature. Methods for determining the positional accuracy of points are well established, and the error is usually given by the Euclidean distance between the test point and a control point. The error can be reported as errors in x, y, and z, and descriptive statistics can be generated from these numbers. Determining the positional accuracy of a line is more complex, since a line is composed of multiple points, each of which may or may not have a matching control point.
Additional problems include the determination of an appropriate search radius and the identification of equivalent features to be used for comparison. Atkinson-Gordo and Ariza-Lopez (2002) provide an excellent review of methods for measuring the positional accuracy of linear features. Methods for measuring the positional accuracy of polygons are extensions of the methods used to measure the positional accuracy of lines. The five primary methods from Atkinson-Gordo and Ariza-Lopez, in brief, are as follows:

Epsilon Band Error methods are based on defining an uncertainty band around a polygon feature. The band width is known as Epsilon, and the wider it is the greater the uncertainty in the position of a line. The band can be derived by error propagation or by the comparison of test line segments to a control. The method determines an error band rather than determining or quantifying the accuracy of the line.

Figure 1: Epsilon Bands

The Buffer-Overlay method of Goodchild and Hunter (1997) is based on defining a buffer around a control line of higher accuracy and computing the percentage of the length of the less accurate line that falls within the buffer zone. The width of the buffer is then increased and the percentage computed again. The process is repeated several times, producing a probability distribution.

Figure 2: Buffer-Overlay

The Buffer Overlay Statistics method of Tveite and Langaas (1999) involves buffering, overlay, and generating statistics. First, both the test line (X) and the control line (Q) are buffered to produce buffers XB and QB. An overlay operation is then performed, resulting in four types of areas (Figure 3):

Type 1: Area outside XB and outside QB
Type 2: Area outside XB and inside QB
Type 3: Area inside XB and outside QB
Type 4: Area inside XB and inside QB

A number of different statistics can be generated from the above metrics, but for our purposes the most interesting is Type 4, which will dominate if the test and control polygons are very similar. When the lines are similar in form but differ in position (displacement is present), an estimate of the positional accuracy can be made when Type 4 approaches 50%.

Figure 3: Buffer Overlay Statistics

The Hausdorff Distance method of Abbas, Grussenmeyer and Hunter (1995) is based on calculating the Hausdorff distance on a pair of equivalent lines that have been generalized and normalized using the RMSE and a generalization factor. Two values are computed for the evaluation of a line: the percentage of agreement (the ratio between the normalized lines and the original lines) and the RMSE for planimetric features (computed from all the normalized lines).

Figure 4: Hausdorff Distance

The Maximum Proportion Standard (MPS) and Maximum Distortion Standard (MDS) method of Veregin (2000) is based on the computation of the uniform distortion (UDD). The UDD is computed from the areas between two lines and the length of the line in the map. A diagram of cumulative frequencies is then built for a given band width at a given level of confidence.

Figure 5: MPS and MDS

The advantages of the Buffer-Overlay method over the other methods discussed are that: (1) it can perform effectively without the need to extract both the test and the control polygon, (2) it does not require matching of points between the two representations, (3) it is relatively insensitive to outlying values, and (4) it is statistically based. Additionally, the algorithm uses common buffering and clipping functions available in all major GIS.

The Test Area

In order to thoroughly test the limits of our procedures for determining and improving horizontal accuracy, we chose to run our test on a subset of the data. Specifically, we extracted the Environmental Resource Permits for Township 44S Range 25E in Lee County, Florida. Lee County was selected because it was an area known to have permits that were highly displaced from their parcel counterparts. All permits that intersected this township and range were extracted into a File Geodatabase consisting of 259 features.

Figure 6: Test Area

Methods

A straightforward method for determining the horizontal accuracy of a polygon feature class is to measure the offset between polygon vertices and parcel vertices and then calculate the Root Mean Square Error (RMSE). To facilitate this activity, a C# program was written that allows staff to create a database of coordinate sample points. The RMSE provides the accuracy of the entire feature class but does not tell us the accuracy of individual permits; hence the need for Buffer-Overlay.

Buffer-Overlay is usually implemented by buffering a control line and quantifying how much of the test line is found within each buffer. This works well with small control data sets such as a shoreline but is not practical when using parcels. In this case it would require buffering each parcel line segment and then checking for an overlapping permit line segment that in the majority of cases does not exist. This implementation therefore buffers the permit lines (test) and quantifies how much of the parcel line (control) is found within each buffer. The output is the cumulative probability (CP) curve for each individual permit. The pseudo code for calculating the initial horizontal accuracy is as follows:

Convert parcel polygons to parcel lines
For each permit
  o Buffer from 0.5 ft to 60 ft at 0.5 ft intervals
      Clip the parcel lines (control) using the buffer distance
      Drop dangling nodes (where length = buffer)
      Calculate the CP
      If CP >= 1, the horizontal accuracy is the buffer distance
      Else if CP <= 0.999 and buffer < 60 ft, go to the next buffer

Clipping produces short and long line segment dangles as artifacts, the lengths of which are directly related to the buffer distance used to clip. Short dangles are easily removed by eliminating segments whose length equals the buffer distance. In the case of long dangles, the CP reaches 1 before a complete ring can be extracted, which results in a failed polygon build.
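
The accuracy loop in the pseudo code above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' implementation: instead of calling a GIS buffer and clip tool, it cuts the control line into short pieces and counts a piece as inside the buffer when its midpoint lies within the buffer distance of the permit boundary. The function names, the 0.25 ft sampling step, and the toy coordinates are hypothetical, and dangle removal is left out.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def ring_segments(ring):
    """Consecutive vertex pairs of a closed ring (first vertex not repeated)."""
    return [(ring[i], ring[(i + 1) % len(ring)]) for i in range(len(ring))]

def piece_distances(permit_ring, control_segments, step_ft=0.25):
    """Cut control lines into short pieces; return (distance to permit, length) pairs."""
    permit_segs = ring_segments(permit_ring)
    pieces = []
    for a, b in control_segments:
        length = math.dist(a, b)
        n = max(1, math.ceil(length / step_ft))
        for i in range(n):
            t = (i + 0.5) / n  # midpoint of the i-th piece
            mid = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
            d = min(point_segment_distance(mid, s, e) for s, e in permit_segs)
            pieces.append((d, length / n))
    return pieces

def horizontal_accuracy(permit_ring, control_segments, max_ft=60.0, interval_ft=0.5):
    """Return (accuracy, curve); accuracy is None if CP never reaches 1."""
    perimeter = sum(math.dist(a, b) for a, b in ring_segments(permit_ring))
    pieces = piece_distances(permit_ring, control_segments)
    curve = []
    for k in range(1, int(round(max_ft / interval_ft)) + 1):
        buffer_ft = k * interval_ft
        # Stand-in for "clip the parcel lines using the buffer".
        clipped = sum(length for d, length in pieces if d <= buffer_ft)
        cp = clipped / perimeter
        curve.append((buffer_ft, cp))
        if cp >= 1.0 - 1e-9:  # small tolerance for floating-point sums
            return buffer_ft, curve
    return None, curve

# Toy data: a 100 ft square permit and the same parcel displaced 3 ft east, 4 ft north.
permit = [(0, 0), (100, 0), (100, 100), (0, 100)]
parcel = [(3, 4), (103, 4), (103, 104), (3, 104)]
accuracy, curve = horizontal_accuracy(permit, ring_segments(parcel))
print(accuracy)
```

In the toy case the farthest parcel corner is 5 ft from the permit boundary, so the curve first reaches 1 at the 5.0 ft buffer. A production version would replace the midpoint test with the buffer and clip tools of the GIS in use.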

This algorithm was run on all polygons in the test area, resulting in 259 curves composed of the individual probability at each buffer distance for each feature. Figure 7 displays a random sample of CP curves for 21 permits. On this graph the x-axis represents the distance buffered, from 0.5 to 60 feet at 0.5 ft intervals. The y-axis represents the CP; when the curve reaches 1 or more, the length of clipped parcel line is greater than or equal to the perimeter of the permit
line. In these cases the buffer distance used is assigned as the horizontal accuracy of the permit. Those curves that never reach 1 are outside of our maximum buffer distance of 60 feet.

Figure 7: Cumulative Probability for Individual Permits (x-axis: Buffer Distance (ft), 0.5 to 58.5; y-axis: Cumulative Probability, 0 to 1.2)

Phase I Correction

Phase I involved converting the extracted parcel line segments into polygons using geospatial tools. This functionality is built into many GIS and is best associated with the creation of parcel polygons from metes and bounds entered using Coordinate Geometry. The pseudo code for this is as follows:

For each permit
  o Buffer at the accuracy level previously determined
  o Clip the parcel lines (control) using the test buffer
  o Drop dangling nodes (where length = buffer)
  o Build parcel lines as polygons
      Compare area of polygon to original permit
      Only accept if polygon area = permit area (+/- 0.03 * permit area)
  o Build succeeds/fails

In some cases long line segment artifacts are extracted that form closed rings. These result in polygon builds that are significantly larger or smaller in area than the original permit, and they can be excluded through an area comparison.
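
The area comparison that closes Phase I can be illustrated with a short sketch. It assumes the acceptance rule is a tolerance of plus or minus 3% of the permit area, which is how the Phase II pseudo code states it; the function names and sample rings are hypothetical, and rings are taken to be simple (x, y) vertex lists without holes.

```python
def ring_area(ring):
    """Unsigned area of a simple ring via the shoelace formula.

    ring is a list of (x, y) vertices; the first vertex is not repeated.
    """
    twice_area = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

def accept_build(built_ring, permit_ring, tolerance=0.03):
    """Accept a rebuilt polygon only if its area is within tolerance of the permit."""
    permit_area = ring_area(permit_ring)
    return abs(ring_area(built_ring) - permit_area) <= tolerance * permit_area

# Hypothetical example: a 100 ft x 100 ft permit and two candidate rebuilds.
permit = [(0, 0), (100, 0), (100, 100), (0, 100)]
close_fit = [(0, 0), (101, 0), (101, 100), (0, 100)]   # 1% larger
artifact = [(0, 0), (110, 0), (110, 100), (0, 100)]    # 10% larger

print(accept_build(close_fit, permit))
print(accept_build(artifact, permit))
```

An area check alone cannot tell a correct ring from a wrong ring of similar size, so it works as a filter on gross artifacts rather than as a proof of a good build.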

Once complete, each permit feature will have a CP and an assigned horizontal accuracy. The RMSE will be recalculated to quantify the improvements on the entire feature class.

Phase II Correction

Phase II involved adding arc segments to parcel lines with gaps in order to form a closed ring that could be built into a permit polygon. The pseudo code for this is as follows:

For each permit that failed to build
  o Buffer at the accuracy level previously determined
  o Clip the parcel lines (control) using the test buffer
  o Drop dangling nodes (where length = buffer)
  o For each remaining node
      Identify the closest node
      Connect the two nodes with a line segment
  o Build lines as polygons
  o Compare area of polygon to original permit
      Accept if polygon area = permit area (+/- 0.03 * permit area)

In this case an improved CP cannot be calculated: the CP is based on the parcel lines, and Phase II adds line segments to permit features precisely where parcel line segments are missing. However, we can recalculate the RMSE to quantify any improvements.

Results

The initial confidence interval on the estimate of RMSE for x and y at 95% probability was calculated using 30 coordinate pairs from the entire test area. The initial values were 20.22 ± 5.74 in x and 22.18 ± 7.29 in y (Table 2). The RMSE is roughly circular, meaning that the x and y values are similar; this indicates that there is no systematic error in the data that would produce larger errors in any particular direction.

Confidence interval on the estimate of RMSEx at 95% probability
  Definition:     RMSEx + 1.96 * SRMSE > exi > RMSEx - 1.96 * SRMSE
  Initial value:  20.22 ± 5.74 = 14.49 to 25.95

Confidence interval on the estimate of RMSEy at 95% probability
  Definition:     RMSEy + 1.96 * SRMSE > eyi > RMSEy - 1.96 * SRMSE
  Initial value:  22.18 ± 7.29 = 14.89 to 29.46

Table 2: Initial Root Mean Square Error (RMSE)
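
The quantities in Table 2 can be reproduced with a few lines of Python. This is a sketch only: the paper does not say how SRMSE was estimated, so it is passed in as an input here, and the function names and sample offsets are hypothetical.

```python
import math

def rmse(errors):
    """Root mean square of the offsets along one axis."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def interval_95(rmse_value, s_rmse):
    """Bounds RMSE - 1.96 * S and RMSE + 1.96 * S, as in the Table 2 definition."""
    half_width = 1.96 * s_rmse
    return rmse_value - half_width, rmse_value + half_width

# Hypothetical x offsets (ft) between permit vertices and parcel vertices.
x_offsets = [3.0, -4.0, 3.0, -4.0]
print(round(rmse(x_offsets), 3))

# Reported initial x value: 20.22 with a half-width of 5.74.
low, high = interval_95(20.22, 5.74 / 1.96)
print(round(low, 2), round(high, 2))
```

Using the rounded values 20.22 and 5.74 gives bounds that differ from the published 14.49 and 25.95 in the last digit, which suggests the published bounds were computed from unrounded values.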

Figure 8 is a graph of the initial horizontal accuracy distribution from 0 to 60 feet for all 259 permit features (x-axis: Buffer Distance (ft); y-axis: number of features). The distribution has two peaks, one at either extreme, representing a large number of high accuracy features (0.5 feet) and a large number of low accuracy features (60 feet). In between, the curve is randomly distributed and contains a significant number of features.

Figure 9 is a graph of the cumulative horizontal accuracy distribution (x-axis: Buffer Distance (ft); y-axis: number of features). In the best case scenario this would be a straight line across the y-axis at 259, indicating that all features had accuracies of 0.5 feet. About 10% of the features have accuracies of 0.5 feet, then there is a steady stream of features of various accuracies up to 60 feet (80%), and lastly about 10% of the records were not measured because their accuracy was beyond 60 feet.

Figure 10 is a classified map of the initial horizontal accuracies. Permits in green have accuracies of 1 foot, yellow from 2 to 59 feet, and red from 60 feet to 999, where 999 represents features beyond our 60 foot buffer distance.

Phase I Correction was applied once the RMSE for the feature class and the individual feature accuracies had been generated. Phase I correction consisted of buffering features at the previously determined accuracy, using this buffer to clip parcels, and then building higher accuracy replacement polygons. Buffer-Overlay was then used to recalculate the horizontal accuracy for all permits.

Figure 11 is a graph of the initial (red) and Phase I (green) accuracy distribution for all 259 permits (x-axis: Buffer Distance (ft); y-axis: number of features; series: Initial, Phase 1 Correction). After correction the number of features with displacements of 1 foot increased by 105 records, or 40%.

Figure 12 is a close-up view of the curve for horizontal accuracies between 0 and 30 feet (same axes and series as Figure 11). Here we see that the amplitude of the curve has been reduced and that the Phase I curve (green) runs above the initial conditions (red) for accuracies of 1 foot or better and below them for the rest of the curve.

Figure 13 is a graph of the cumulative curve for both the Initial (red) and Phase I (green) conditions (x-axis: Buffer Distance (ft); y-axis: number of features). Here we see that an additional 105 records now have accuracies of 1 foot.

In Table 3, the before and after RMSE are provided for comparison, displaying a significant reduction in the mean of the RMSE.

Confidence interval on the estimate of RMSEx at 95% probability
  Definition:     RMSEx + 1.96 * SRMSE > exi > RMSEx - 1.96 * SRMSE
  Initial value:  20.22 ± 5.74 = 14.49 to 25.95
  Phase I value:  15.52 ± 5.78 = 9.75 to 21.3

Confidence interval on the estimate of RMSEy at 95% probability
  Definition:     RMSEy + 1.96 * SRMSE > eyi > RMSEy - 1.96 * SRMSE
  Initial value:  22.18 ± 7.29 = 14.89 to 29.46
  Phase I value:  12.76 ± 4.6 = 8.16 to 17.35

Table 3: RMSE Initial and Phase 1 Correction

In Figure 14, two maps are shown depicting the horizontal accuracy before (left) and after Phase I (right).

Figure 14: Before and After Accuracy Classification

In the process of building higher accuracy features in Phase I, some polygons could not be built because the clipped line segments did not form a complete ring. Phase II attempts to correct these features by adding line segments at dangling nodes in order to form a complete ring. This operation resulted in 6%, or 15 additional records, being classified as 1 foot (Figures 15 and 16).

Figure 15: Initial, Phase I, and Phase II Horizontal Accuracy Distribution (x-axis: Buffer Distance (ft), 0.5 to 29.5; y-axis: number of features; series: Initial, Phase 1 Correction, Phase 2 Correction)

Figure 16: Initial, Phase I, and Phase II Cumulative Curves (x-axis: Buffer Distance (ft), 0.5 to 61; y-axis: number of features; series: Initial, Phase 1 Correction, Phase 2 Correction)
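
The gap-closing step that Phase II relies on can be sketched as follows. The pseudo code only says to connect each remaining node to its closest node, so the greedy pairing used here is one possible reading and an assumption of this sketch. The function names and coordinates are hypothetical, and end points are assumed to be snapped so that shared nodes have identical coordinates.

```python
import math
from collections import Counter

def dangling_nodes(segments):
    """End points that belong to exactly one segment."""
    degree = Counter()
    for a, b in segments:
        degree[a] += 1
        degree[b] += 1
    return [node for node, count in degree.items() if count == 1]

def close_gaps(segments):
    """Pair each dangling node with its nearest unpaired dangling node."""
    open_nodes = dangling_nodes(segments)
    bridges = []
    while len(open_nodes) >= 2:
        node = open_nodes.pop(0)
        nearest = min(open_nodes, key=lambda other: math.dist(node, other))
        open_nodes.remove(nearest)
        bridges.append((node, nearest))
    return bridges

# Hypothetical clipped parcel lines: a square ring with a 3 ft gap on its west side.
clipped = [
    ((0, 0), (10, 0)),
    ((10, 0), (10, 10)),
    ((10, 10), (0, 10)),
    ((0, 10), (0, 3)),
]
bridges = close_gaps(clipped)
closed = clipped + bridges
print(bridges)
print(dangling_nodes(closed))
```

After the bridge is added no dangling nodes remain, so the lines can be passed to a polygon build and then to the area comparison.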

Discussion

Many of the data sets stewarded by geospatial professionals are based on or directly related to higher accuracy data sets that could be used to improve horizontal spatial accuracy. In this paper we have demonstrated the use of Buffer-Overlay to determine and improve the accuracy of permits whose boundaries are related to higher accuracy parcel boundaries.

The initial accuracy assessment included the RMSE for the feature class, and then each feature was assigned a horizontal spatial accuracy from 0 to 60 feet at 0.5 foot intervals. Phase I used these accuracy measures to clip parcel lines and build higher accuracy polygons. The result was a 40% increase in the number of records with accuracies of 1 foot or better. Phase II examined those records that failed to build in Phase I. Line segments were added between node gaps in order to form rings that could be built into polygons. The result was a 6% increase in the number of records with accuracies of 1 foot or better.

In general, we find that Buffer-Overlay is an effective method for quantifying and improving the accuracy of features where control data exists. Most data stewards would acknowledge having a data set that should be improved but lack the time and money to make such improvements. The cost of improving data using Buffer-Overlay is confined to algorithm development, and once the process is automated the time requirements boil down to CPU cycles, leaving the steward free to focus on the capture and accuracy of new data.

References

[1] Goodchild, M.F., and G.J. Hunter, 1997. A simple positional accuracy measure for linear features, International Journal of Geographical Information Science, 11(3):299-306.
[2] Atkinson, A.D.J., and F. Ariza, 2002. Nuevo Enfoque para el Analisis de la Calidad Posicional en cartografica Mediante Estudios Basados en la geometria Lineal, Proceedings XIV International Congress of Engineering Graphics, Santander, Spain.
[3] Tveite, H., and S. Langaas, 1999. An accuracy assessment method for geographical line data sets based on buffering, International Journal of Geographical Information Science, 13(1):27-47.