GEOSTATISTICAL_ANALYST

Putu Santikayasa, Researcher at Bogor Agricultural University
Geostatistical Analyst
Summarized by – Dr. Danny M. Vaughn
(08/14/06)
Spatial Analysis by Visual & Statistical Description
• Contour Map – lines connecting points of equal value (p42).
• Symbol Map – Symbols posted representing a class of data values
(p43).
• Grayscale Map – Similar to symbol map, but uses gradations of
gray values (p44).
• Indicator Maps – Binary raster plots delineating class
values (p45).
• Each plot represents a different range or threshold of
values, e.g., concentrations of dissolved solids in a lake.
• Distribution of values above or below a threshold are
displayed & their spatial patterns may be analyzed for
trends (directional bias).
• Moving Window Statistics – used to assess
anomalies of average values and variability
(heteroscedasticity) of values throughout a region (p46).
• Create a grid whose size is dependent upon the size of
the data set.
• Analogous to a raster spacing (spatial resolution).
• Given a data set with a 1 meter grid spacing and 100
points:
• Select a window (search neighborhood) that is
not too large to over-generalize, yet not too small
that anomalies are not picked up.
• Search neighborhoods are used to compute
mean and standard deviations.
• If local means (based upon the number of values used
in the grid) are heavily influenced by erratic values, use
the median.
• Mean (top value) & standard deviations (bottom value)
for moving windows posted (p47).
• Note patterns: standard deviations are higher where the
variability of values in a window is greatest; higher
means correspond to windows containing the highest values.
• Standard deviations vary more than means throughout
the region.
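The moving window procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the textbook's implementation: the 4 x 4 grid, the 2 x 2 window, the non-overlapping step, and the use of the population standard deviation are all assumptions made for the example.

```python
import statistics

def moving_window_stats(grid, size, step):
    """Return (row, col, mean, stdev, median) for each size x size window.

    row, col give the upper-left cell of the window; stdev is the
    population standard deviation (an assumption for this sketch).
    """
    results = []
    n_rows, n_cols = len(grid), len(grid[0])
    for r in range(0, n_rows - size + 1, step):
        for c in range(0, n_cols - size + 1, step):
            window = [grid[i][j]
                      for i in range(r, r + size)
                      for j in range(c, c + size)]
            results.append((r, c,
                            statistics.mean(window),
                            statistics.pstdev(window),
                            statistics.median(window)))
    return results

# Hypothetical gridded values; the 30 is an erratic value.
grid = [
    [1, 2, 10, 12],
    [2, 1, 11, 30],
    [1, 1, 10, 11],
    [2, 2, 12, 10],
]
stats = moving_window_stats(grid, size=2, step=2)
for row, col, mean, sd, median in stats:
    print(row, col, mean, round(sd, 2), median)
```

In the window containing the erratic value the mean is pulled well above the median, which is the situation where the notes suggest posting the median instead.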
Proportional Effect
• Anomalies in the local variability have an impact on
the accuracy of estimations in spatial statistical
techniques.
• Areas of uniform values (low variability) render the
most credible estimates.
• Consider four relationships between local mean and
local variability (p49).
Graph A - Local mean (straight line) and variability
are constant. Data values fluctuate about the local
mean, but no obvious change in variability.
Graph B – A trend in the local mean, but the
variability remains constant w/ rising & falling
mean.
Graph C – Constant local mean while the variability
exhibits a trend (rises).
Graph D – A trend in both the local mean and
variability (both mean and standard deviation rise).
Most common case in Earth Science data.
• Ideally, data should be as A or B (relatively constant
variability), but D is acceptable: variability changes
noticeably, but it is related to the local mean and is therefore
somewhat predictable.
• When a relationship exists between the local mean and
local standard deviation, a proportional effect exists.
• A scatterplot (mean vs. standard deviation) shows a
visual means for assessing this relationship (p51).
• A low correlation coefficient indicates a low
relationship.
• Univariate, normally distributed data can be spatially
arranged to render a proportional effect.
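One way to put a number on the proportional effect is the correlation coefficient between the local means and local standard deviations from the moving windows. The sketch below uses made-up window statistics (not values from the textbook) and a hand-written Pearson coefficient.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical local statistics from five moving windows (Graph D case:
# the standard deviation rises with the mean).
local_means = [10, 20, 30, 40, 50]
local_sds = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(local_means, local_sds)
print(round(r, 3))
```

A coefficient near 1, as here, would indicate a strong proportional effect; a value near 0 would indicate that local variability is unrelated to the local mean.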
Geostatistical Analysis - an ArcGIS extension used to create a
statistically valid, continuous surface.
Basic steps to creating a surface
• Add layers in ArcGIS.
• Explore statistical properties of data through:
• Histogram (univariate distribution).
• Voronoi polygons – analyze spatial
variability.
• Normal QQPlot – Check for normality.
• Trend Analysis – identify global trends.
• Semi-variogram/covariance cloud – analyze spatial dependencies
(autocorrelation) in the data.
• General QQPlot – explore whether two datasets have the same
distributions.
• Cross-covariance cloud – analyze cross-covariance between two
datasets.
• Select a surface model to create a surface.
• Assess the output surface.
• Compare other models for optimal fit.
• Two basic methods of interpolation –
• Deterministic – use mathematical functions.
• Geostatistical – use statistical & mathematical
functions.
Deterministic Methods
• Sample points (observed) of phenomena are established at
locations.
• Algorithms interpolate (predict) other (unmeasured) points &
create a continuous surface.
• Premise – objects that are close to each other tend to be more alike
than objects further apart (Tobler, 1970).
• Inverse distance weighted method – based upon Local similarity.
• General inverse distance weighted formula – $\hat{Z}(s_0) = \sum_{i=1}^{N} \lambda_i Z(s_i)$
• Weight formula – $\lambda_i = d_{i0}^{-p} / \sum_{j=1}^{N} d_{j0}^{-p}$
$\hat{Z}(s_0)$ = a predicted value for location $s_0$.
$\lambda_i$ = weights assigned to each measured point.
$Z(s_i)$ = an observed value at location $s_i$.
$d_{i0}$ = distance between the prediction location $s_0$ & each
measured location $s_i$.
$p$ = a power parameter (higher powers plot as a steeper
curve on the left end of a graph of relative weight (y
axis) vs. distance (x axis)).
$N$ = number of measured points used in the prediction.
• Weights decrease rapidly w/ distance & are scaled so that $\sum \lambda_i = 1$.
• Optimal p values are derived as the minimum RMS prediction error.
• Cross-validation process computes the difference from a
measured vs. predicted value for that location.
• An exact (predicts a value identical to the measured
value at a known location w/o prediction standard error, & forces the
surface to pass through data values) interpolator.
• An output surface is sensitive to clustering & outliers.
• Works best when the data is evenly distributed.
• A predicted surface can not extend above a maximum or below a
minimum value.
• IDW assumes the surface is driven by local variation captured by
neighborhood values.
• Use the Validation option by creating training (contains measured
locations) & test datasets (used to validate the predictions).
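The IDW properties listed above (weights that sum to 1, exactness at measured locations, predictions bounded by the data minimum and maximum) can be checked with a short sketch. The function name and the choice of p = 2 are assumptions for illustration; the five sample points are the ones used later in the empirical semivariogram example.

```python
import math

def idw_predict(samples, target, power=2.0):
    """Inverse distance weighted prediction at target = (x, y).

    samples is a list of ((x, y), value) pairs. All samples are used,
    i.e. no search neighborhood is applied in this sketch.
    """
    weighted = []
    for (x, y), value in samples:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            # Exact interpolator: return the measured value at a measured location.
            return value
        weighted.append((d ** -power, value))
    total = sum(w for w, _ in weighted)
    # Dividing by the total scales the weights so they sum to 1.
    return sum(w / total * v for w, v in weighted)

samples = [((1, 5), 100), ((3, 4), 105), ((1, 3), 105),
           ((4, 5), 100), ((5, 1), 115)]

at_measured = idw_predict(samples, (3, 4))
at_unmeasured = idw_predict(samples, (2, 4))
print(at_measured, round(at_unmeasured, 2))
```

Because the prediction is a weighted average with non-negative weights, it can never fall outside the range of the measured values.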
• Global & Local Polynomial Interpolators – modeled by polynomial
equations.
• First order – linear.
• Second order – one bend in the plane (Quadratic).
• Third order – two bends (cubic).
• An inexact (does not predict a value identical to the measured value
at a measured location, & does not force the surface to pass through
data values) interpolator.
• Points will plot above & below the fitted plane.
Global Interpolation.
• Use for gradually varied surfaces.
• To examine & remove global trends (trend analysis).
• Surfaces are highly susceptible to extreme values,
especially near edges.
• Anisotropic* (ellipse) or Isotropic* (circle) distributed
data can be modeled.
• Anisotropic – spatial dependence (autocorrelation) changes
w/ distance & direction between two locations.
• Isotropic - spatial dependence (autocorrelation) changes
w/ only distance between two locations.
Local Interpolation.
• Accuracy of any surface increases w/ an ability to
identify local trends.
• Fits many polynomials w/in overlapping neighborhoods.
• Accounts for local variation in a surface.
• Fits a surface using all points in a defined neighborhood.
• Provides a smooth fit for short range variation in longer range
trends.
• Successive iterations cross-validate an output surface using
optimal parameters based upon a minimum RMS predicted error.
• Local polynomials are sensitive to neighborhood distance.
• Anisotropic (ellipse) or Isotropic (circle) distributed data can be
modeled.
• Radial basis functions – (Local/neighborhood) polynomial
interpolation (smoothing).
• An exact (forces the surface to pass through data values)
interpolator.
• Thin-plate spline, spline w/ tension, completely
regularized spline, multiquadric spline, & inverse
multiquadric spline.
• Conceptually like rubber sheeting through points w/ minimum
curvature.
• A predicted surface can extend above a maximum or below a
minimum value.
• Uses smaller, overlapping planes.
• Thin-plate splines.
• Splines w/ tension.
• Optimal parameter is derived as the minimum RMS prediction
error. Higher values produce smoother surfaces.
• Use to create smooth surfaces from large sets of points.
• Best for gently varied surfaces.
• Not good for irregular, highly varied data, & data prone to error or
uncertainty.
Geostatistical Methods
• Regionalized variable in geostatistics – consists of: Structured
aspect reflecting large scale tendencies, and a random aspect
reflecting local irregularities.
• The accuracy of any surface increases w/ an ability to identify
local trends.
• Must characterize error & variability of the predicted surface.
• Based on statistical models.
• Geostatistical methods provide measures of error & uncertainty
(accuracy of predictions).
Basic Principles of Geostatistical Methods
• Assumes all values are a result of random processes w/ dependence
(as opposed to an independent random process, e.g. a coin toss).
• Dependent random process - a coin is tossed three times, tails,
heads, & heads. A fourth position is determined by a rule: since the
second & third flips are heads, assign the fourth the same as the first
(tails).
• Reality is represented by introducing randomness through local
fluctuations termed, drift, (as opposed to errors in Trend analysis)
around a fixed surface.
• Fluctuations are not errors in Geostatistical analysis, rather actual
features with their own structure
• Autocorrelation – statistical correlation between spatial random
variables of the same type where correlation is dependent upon
distance and/or direction (spatially dependent) separating the
locations.
• In geostatistics, knowing spatial locations allows distances to be
computed between observations & autocorrelation modeled as a
function of distance through the semi-variogram.
• Observe scatter of points in the semi-variogram. If points are
close to the line of best fit on one side, yet spread out more on the
other side, directional autocorrelation may exist.
Spatial Continuity
• Exists for most Earth Science data (similar values with
data close to each other, positive spatial autocorrelation).
• Scatterplots may be employed in semivariograms to
show the value of one variable and the value of the same
variable, but at a nearby location.
h – Scatterplots
• Shows all possible pairs of data values whose locations
are separated by a distance in a direction.
• The location of any point can be described by a vector
as can the separation between two points (p52).
• Given an origin at (0, 0), point (xi, yi) can be written as
vector ti, and point (xj, yj) can be written as vector tj.
• The separation between points i & j is tj - ti. This can
also be expressed as:
(xj - xi, yj - yi). The vector between these two
points (from i to j) will now be referred to as hij.
• On a h - scatterplot, the x axis is labeled V(t), & the y
axis is labeled V(t+h).
• The x coordinate of a point is a value at a particular
location, and the y coordinate is a value a distance and
direction h away.
• Recall an h value is a vector, thus we can express it as
h = (0,1), where we have taken each data location and
paired it with the data location whose easting is the same,
and whose northing is 1 m larger to the north (p53,A).
• The shape of the cloud of points on an h – scatterplot
delineates how continuous the data values are over a
certain distance in a specific direction (p54).
• If data values at locations separated by h are similar,
then the pairs will plot close to the line x = y, a
45˚ line through the data cloud (p54).
• As the separation distance increases (the y component of
h increases), the similarity between pairs of values
decreases, and the points on the h - scatterplot spread out
further from the diagonal line.
Correlation Functions, Covariance functions, and
Variograms
• h – scatterplots require some quantitative summary
of the information contained within them.
• One essential feature is fatness of the cloud of points.
• Summarized by a correlation coefficient, $\rho(h)$.
• As the cloud gets fatter (as the y component of h increases),
the coefficient gets smaller.
h        Correlation   Covariance (ppm²)   Moment of inertia (ppm²)
(0, 1)   0.742         448.8               312.8
(0, 2)   0.590         341.0               479.2
(0, 3)   0.560         323.8               521.4
(0, 4)   0.478         291.5               652.9
• The relation between the correlation coefficient of an
h – scatterplot, $\rho(h)$, and h is termed a correlation
function or correlogram (p57).
• The correlation coefficient is dependent on h, which is a
vector having both magnitude (distance) and direction.
• Covariance (C(h)) – an alternative index for spatial
continuity, & termed covariance function (C(h)).
• A covariance function is also plotted as the covariance
(C(h)) against h, & is also inversely related to thickness
(y increases) (p57).
• Moment of inertia, $\gamma(h)$ – computed about the line x = y (45˚) as
$\gamma(h) = \frac{1}{2n} \sum_{i=1}^{n} (x_i - y_i)^2$, where n is the number of pairs.
• As h increases, points drift away from the 45˚ line,
thus the moment of inertia is a measure of the fatness
of the cloud – it increases as the cloud gets fatter.
• The relationship between the moment of inertia of an h
– scatterplot and h is termed a semivariogram
(variogram for short) (p57).
• Aberrant points can have a pronounced impact on
summary statistics.
• Note the significant difference in a correlation
coefficient w/ & w/out a single point value.
Correlation Coefficient
h        All Points   19 ppm Excluded
(0, 1)   0.742        0.761
(0, 2)   0.590        0.625
(0, 3)   0.560        0.551
(0, 4)   0.478        0.559
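The three h-scatterplot summaries discussed in this section (correlation coefficient, covariance, and moment of inertia) can be computed from the same list of pairs. The sketch below is a minimal illustration on a hypothetical 3 x 3 grid of values, not the textbook dataset; `h_pairs` and `h_summary` are names invented for the example, and the covariance uses a 1/n divisor.

```python
import math

def h_pairs(values, h):
    """Pair the value at each location t with the value at t + h.

    values maps (x, y) grid coordinates to a measured value.
    """
    dx, dy = h
    return [(v, values[(x + dx, y + dy)])
            for (x, y), v in values.items()
            if (x + dx, y + dy) in values]

def h_summary(pairs):
    """Return (correlation, covariance, moment of inertia) for the pairs."""
    n = len(pairs)
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    inertia = sum((x - y) ** 2 for x, y in pairs) / (2 * n)
    return cov / (sd_x * sd_y), cov, inertia

# Hypothetical values on a 3 x 3 grid, keyed by (easting, northing).
values = {(0, 0): 10, (1, 0): 12, (2, 0): 14,
          (0, 1): 11, (1, 1): 13, (2, 1): 16,
          (0, 2): 13, (1, 2): 14, (2, 2): 19}

pairs_1 = h_pairs(values, (0, 1))
pairs_2 = h_pairs(values, (0, 2))
rho_1, cov_1, gamma_1 = h_summary(pairs_1)
rho_2, cov_2, gamma_2 = h_summary(pairs_2)
print(round(gamma_1, 3), round(gamma_2, 3))
```

For this toy grid the moment of inertia grows as h goes from (0, 1) to (0, 2), matching the pattern in the textbook table: the cloud gets fatter as the separation increases.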
Components of the Semivariogram
A Semi-variogram surface is controlled by the lag size and number
of lags.
Lag – distance between pairs of points.
• If the lag is too large, short-range autocorrelation is masked. If
the lag is too small, many empty bins will result.
• Rule of thumb – lag size x number of lags should be less than
one-half the largest distance among all points.
• Lag (bin) size – a limit of the distance (width) defining a bin into
which pairs of locations of approx. equal distance & direction are
placed to reduce the number of combinations.
• Increase or decrease lag based on a small or large range of points.
Binning (classification of lag size)- reduces the number of points plotted
making the semivariogram easier to interpret.
• Locations are grouped by distance & direction.
• Semivariogram (distance h) =
0.5 * average[(value at location Z(si) – value at location Z(sj))²].
• Only average distance & semi-variance for all pairs in a bin are plotted
as a single point on the empirical semivariogram.
• An empirical semivariogram value (y axis) for each bin for
each direction is plotted as a red dot. The x axis is the distance from the
center of the bin to the center of the semivariogram surface.
• A semivariogram surface is plotted for each bin value (average
distance & semi-variance for all pairs in a bin) represented as color-
coded pixels.
Range – the distance where the semivariogram model
levels out.
• Sample points with distances closer (within) than the
range are spatially autocorrelated. Points beyond the
range make no useful contribution to the interpolation.
• The range defines the “zone of influence.”
Sill – The value on the y axis that defines the semi-
variance at the range.
• The flattened line from this point represents no change
in variation w/ increases in distance.
Nugget – The point where the semi-variogram model
intercepts the y axis.
• Due to measurement error, and/or variation at
distances smaller than the sampling interval.
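To see how the range, sill, and nugget shape a semivariogram model, here is a small sketch using the spherical model. The choice of the spherical form is an assumption (the notes do not name a model at this point), and the parameter values are hypothetical. The sill is treated as the total semivariance at the range, as defined above, so the partial sill is sill minus nugget.

```python
def spherical_semivariogram(h, nugget, sill, range_):
    """Spherical semivariogram model value at separation distance h.

    nugget: model value as h approaches 0 from above (y-axis intercept).
    sill:   semivariance at and beyond the range.
    range_: distance at which the model levels out.
    """
    if h <= 0:
        return 0.0  # semivariance of a point with itself
    if h >= range_:
        return sill  # flat beyond the range: no further change with distance
    ratio = h / range_
    return nugget + (sill - nugget) * (1.5 * ratio - 0.5 * ratio ** 3)

# Hypothetical parameters.
nugget, sill, range_ = 5.0, 100.0, 4.0
curve = [spherical_semivariogram(h, nugget, sill, range_)
         for h in (0.0, 0.01, 2.0, 4.0, 8.0)]
print([round(g, 3) for g in curve])
```

The curve starts just above the nugget, rises with distance, and stays at the sill for every distance at or beyond the range.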
Modeling the Semivariogram - Assuming stationarity,
autocorrelation can be examined & quantified (spatial modeling).
Stationarity
• Statistics relies on some notion of replication.
• It follows that estimates can be derived, & the variation & uncertainty
of an estimate understood, from repeated observations.
• Through continued observations of many samples,
dependencies become apparent.
• Stationarity is used to obtain replication in a spatial setting.
• A variogram is only used for a given practical distance.
• Stationarity – a variable is stationary if its distribution is
invariant (does not change) under translation (the entire
distribution is shifted through coordinate space en masse).
• Normally only the mean and covariance are required to
satisfy this requirement.
• Mean stationarity – constant (no spatial drift) between samples &
independent of location.
• Also, constant variance (no outliers) across space.
• Second-order stationarity – assumes covariance is the same between
two points that are the same distance & direction apart regardless of
points chosen.
• Covariance is dependent on the distance between any two
values, not location.
• For semi-variograms, intrinsic (genuine, essential, real) stationarity
is the assumption that the variance of the difference (increment) is the
same between any two points that are the same distance apart,
regardless of which two points are chosen. Also referred to as quasi-
stationarity.
• Local fluctuations dominate the overall trend.
• This is exhibited within a natural increase in the variance beyond
these shorter sections (observations) of the distribution of a regional
variable as distance increases from each observation.
• Similar distance between points that provides replication to
estimate dependence rules & allow predictions to be made.
Assuming intrinsic stationarity, autocorrelation can be
quantified.
• Termed spatial modeling, structural analysis, or variography.
• A line is fitted in the empirical semivariogram similar to a least
squares line in regression analysis.
• Parameters of the curve should minimize the
deviations from the points by some criterion
• Used to quantify spatial autocorrelation – assumes
closer points (left side, x axis) are more similar (lower
semi-variance on y axis) than points further away (right
side, x axis).
• Semivariogram Model – defined as one-half the variance of the
difference between two variables at two locations.
• $\gamma(s_i, s_j) = \frac{1}{2} \mathrm{var}(Z(s_i) - Z(s_j))$ where,
$\gamma(s_i, s_j)$ = semivariogram (predicted values).
var = variance.
$(s_i, s_j)$ = two locations.
$Z(s)$ = observed (measured) values.
• This is a dissimilarity function of increased variability (semi-
variance, y axis) w/ increased distance.
• Semivariogram function – 0.5 * average[(Z(si) - Z(sj))²].
• With observed values close to each other, the average
is small.
• As observed values get further apart, their values become
more dissimilar w/ higher variability & higher averages.
Calculating the Empirical Semivariogram
Step 1 – Given a data set of five locations [(1,5), (3,4), (1,3), (4,5),
(5,1)] and their associated values [(100), (105), (105), (100), (115)],
calculate the distance (x values) between each pair of locations
using the Pythagorean theorem.
Table 1
Values       Locations       Distance Calculation    Distance
100 & 105    (1,5), (3,4)    √[(1-3)² + (5-4)²]      2.236
100 & 105    (1,5), (1,3)    √[(0)² + (2)²]          2
100 & 100    (1,5), (4,5)    √[(3)² + (0)²]          3
100 & 115    (1,5), (5,1)    √[(4)² + (4)²]          5.657
105 & 105    (3,4), (1,3)    √[(2)² + (1)²]          2.236
105 & 100    (3,4), (4,5)    √[(1)² + (1)²]          1.414
105 & 115    (3,4), (5,1)    √[(2)² + (3)²]          3.606
105 & 100    (1,3), (4,5)    √[(3)² + (2)²]          3.606
105 & 115    (1,3), (5,1)    √[(4)² + (2)²]          4.472
100 & 115    (4,5), (5,1)    √[(1)² + (4)²]          4.123
Step 2 – Calculate the empirical semivariance (y values) as
0.5 * (value at location i – value at location j)², or one-half times
the squared difference of the values at the paired locations.
Increasing the number of points can soon result in a very large number
of paired combinations: n points form n(n-1)/2 pairs, so the five
points in this example give the 10 pairs of Table 1.
Table 2
Value Diff.   Diff.²         Empirical Semivariance
5             (5)² = 25      12.5
5             (5)² = 25      12.5
0             (0)² = 0       0
15            (15)² = 225    112.5
0             (0)² = 0       0
5             (5)² = 25      12.5
10            (10)² = 100    50
5             (5)² = 25      12.5
10            (10)² = 100    50
15            (15)² = 225    112.5
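Steps 1 and 2 can be reproduced with a short Python sketch using the five locations and values from the example. The variable names are illustrative; `itertools.combinations` happens to visit the pairs in the same order as the rows of Tables 1 and 2.

```python
import math
from itertools import combinations

locations = [(1, 5), (3, 4), (1, 3), (4, 5), (5, 1)]
values = [100, 105, 105, 100, 115]

pairs = []
for i, j in combinations(range(len(locations)), 2):
    (xi, yi), (xj, yj) = locations[i], locations[j]
    # Step 1: distance between the two locations (Pythagorean theorem).
    distance = math.hypot(xi - xj, yi - yj)
    # Step 2: one-half the squared difference of the two values.
    semivariance = 0.5 * (values[i] - values[j]) ** 2
    pairs.append((distance, semivariance))

for distance, semivariance in pairs:
    print(round(distance, 3), semivariance)
```

Five points give 5 x 4 / 2 = 10 pairs; the last pair, with a value difference of 15, has a semivariance of 0.5 x 225 = 112.5.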
Step 3 – Binning the Empirical Semivariogram. Since a
data set can consist of 100’s to 10,000’s of control points,
the individual vectors (delineated by two ordered
coordinate pairs) and the empirical semivariance values
associated with each pair, will quickly result in a cloud of
points in the semivariogram that will be unintelligible to
interpret.
• Binning reduces the total number of points plotted in
the semivariogram by assigning a bin (storage
compartment) based upon the lag size (distance) and
number of lags.
• Binning classifies locations grouped by distance &
direction.
• Only the average distance and average semivariance for
all pairs included in the bin are plotted as a single point.
• The basic equation would now be – Semivariogram
(distance h) = 0.5 * average[(value at location i – value at
location j)²].
• This equation is interpreted as, compute one-half the
average of the differences squared of the values for all
paired locations separated by a distance h. Distance is
the lag distance.
• Considering a lag distance of some unit measure (e.g.
1), we create 5 bins, & will compute the average
semivariance for all pairs of points falling within the
distance limits (lag distance) assigned for each bin.
• Only an average distance & average semivariance for
all pairs in a bin are plotted as a single point on the
empirical semivariogram.
Table 3
Lag Dist. (Bin)   Pair Distances     Avg. Dist.   Semivariances   Avg. Semivar.
1-2               1.414, 2           1.707        12.5, 12.5      12.5
2-3               2.236, 2.236, 3    2.491        12.5, 0, 0      4.167
3-4               3.606, 3.606       3.606        50, 12.5        31.25
4-5               4.472, 4.123       4.298        50, 112.5       81.25
5+                5.657              5.657        112.5           112.5
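The binning in Step 3 can be sketched as follows. This is an illustration under stated assumptions rather than the software's algorithm: bins are distance-only (direction is ignored), each bin k holds pairs with (k-1) x lag < distance <= k x lag so that a distance of exactly 2 falls in the 1-2 bin as in Table 3, and the open-ended "5+" bin shows up here as k = 6.

```python
import math
from collections import defaultdict
from itertools import combinations

def binned_semivariogram(locations, values, lag_size):
    """Return {bin index k: (average distance, average semivariance)}."""
    bins = defaultdict(list)
    for i, j in combinations(range(len(locations)), 2):
        (xi, yi), (xj, yj) = locations[i], locations[j]
        distance = math.hypot(xi - xj, yi - yj)
        semivariance = 0.5 * (values[i] - values[j]) ** 2
        k = math.ceil(distance / lag_size)  # (k-1)*lag < distance <= k*lag
        bins[k].append((distance, semivariance))
    summary = {}
    for k in sorted(bins):
        distances = [d for d, _ in bins[k]]
        semivariances = [s for _, s in bins[k]]
        # One plotted point per bin: the two averages.
        summary[k] = (sum(distances) / len(distances),
                      sum(semivariances) / len(semivariances))
    return summary

locations = [(1, 5), (3, 4), (1, 3), (4, 5), (5, 1)]
values = [100, 105, 105, 100, 115]
summary = binned_semivariogram(locations, values, lag_size=1.0)
for k, (avg_dist, avg_semivar) in summary.items():
    print(k, round(avg_dist, 3), round(avg_semivar, 3))
```

Ten individual pairs collapse to five plotted points, one per occupied bin, which is the reduction that makes the empirical semivariogram readable for large datasets.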
Trend Analysis in Geostatistics
• A trend surface is made up of global (structured or deterministic)
variation & random short-range variation (uncorrelated, local
irregularities referred to as errors in Trend Analysis).
• Global trend – all measurements are controlled by some (non-
random) deterministic factor. (a physical process, e.g. prevailing wind,
data collected along a transect, etc.).
• May be represented by a polynomial formula & removed from the
analysis, then returned before predictions are made.
• Modeling long-range or coarse scale variation.
• Note asymmetry of the trend model and variance in change across
each axis.
• Global trends are plotted as a 3-D box diagram.
• Plotted as sideways views along the x, z & y, z planes.
• Polynomials (best fit line) are created for each plane.
• Flat lines indicate no trend.
• A definite pattern to the polynomial suggests a global trend.
• Rotating the model enables an enhanced visual distribution.
• Model the residuals & reintroduce the trend in the prediction model.
• Used to examine local characteristics of spatial autocorrelation.
Removing the Global Trend
• $Z_j^k(s_i)$ = the jth measurement of variable type k at the ith
spatial location $s_i$, decomposed as $Z_j^k(s_i) = \mu(s) + \varepsilon(s)$ where,
$\mu(s)$ = a deterministic (predicted mean) trend.
$\varepsilon(s)$ = random, autocorrelated errors (distance dependent).
$s_i$ = a location.
• Employed to satisfy normality, stationarity assumptions & to
model local, short-range variations.
• Stick to Ordinary Kriging unless there are strong reasons to
remove the trend surface.
• More parameters must be estimated when a trend is removed,
reducing the precision of the surface model.
• Keep the surfaces simple. i.e. 1st or 2nd order polynomials.
• Always check w/ cross-validation, & especially validation.
Random Trends
• Random implies governed by rules of probability, including
dependence of neighboring values (autocorrelation).
• Alignment of block-faulted mountain system is a global trend, but a
specific time-event surficial processes (volcanism) may affect short-
term variations in the mountain’s morphology.
• The shape of the semivariogram/covariance curve may vary (be
influenced) with direction (anisotropy) after the global trend has
been removed, or if no global trend exists at all.
• Anisotropy– spatial dependence (autocorrelation) changes w/ both
distance & direction between two locations.
• Usually not a deterministic process.
• Modeled as a random process (Random error usually of an
unknown cause) w/ higher autocorrelation in a preferred direction.
• Observe scatter of points in the semivariogram. If points are close
to the line of best fit on one side, yet spread out more on the other
side, directional autocorrelation may exist.
• When the data is anisotropic (the anisotropy box is checked), the
yellow line becomes several lines indicating many directions.
• The surface cloud is elliptical (w/ an angular component) in shape.
• The outline of the range is displayed as a blue line throughout the
semivariogram/covariance surface cloud.
• The lengths of the semi-minor & semi-major axes determine the
ranges in the surface cloud, & define the search neighborhood.
• Once the shape is determined, establish constraints (number of
points, & partition the shape into bins to avoid directional bias due to
clustering or transected points w/in the shape).
• Ensure there are enough points to account for a meaningful
prediction.
• If enough points are not available for the bins, outside points
(uncorrelated) will be used which degrades the model.
• Points separated by distances greater than the range are not correlated.
• When the pairs of points in the semivariogram yield a straight
(horizontal) line (no variability), there is no spatial autocorrelation in
the data, thus a surface would be meaningless.
• The length of the semi-minor axis (minor range) from the center of
the cloud will reach the sill sooner (plot as a steeper grade) than the
semi-major (major range) axis.
• Directional Binning (Anisotropy)
Under the Semivariogram/Covariance Modeling dialog box –
• Use the Show Search Direction box under the
Semivariogram/Covariance Cloud dialog, and rotate or type in angle
(search) direction, angle tolerance, bandwidth, and lag
(distance/width).
• An angle of tolerance determines how close points will be included
from an observed point to the bandwidth.
• Bandwidth is the width of the search.
• Bins (lag distance/width) contain pairs of locations with a
predetermined distance (interval) & direction apart.
• Isotropic - spatial dependence (autocorrelation) changes only w/
changes in distance between two locations.
• When the data is isotropic, the shape of the semivariogram will be a
circle.
• The outline of the range is displayed as a blue line throughout the
semivariogram/covariance surface cloud.
• The length of any axis from the center of the semi-
variogram/covariance cloud will always be the same.
• The range will always be the same in all directions.
• Thus, the sill is equidistant in any direction.
Covariance – a statistical tendency of two variables
to vary in ways related to each other.
• A scaled version of correlation.
• A similarity function: covariance (y axis) decreases
w/ increased distance (along the x axis).
• $C(s_i, s_j) = \mathrm{cov}(Z(s_i), Z(s_j))$ where,
$C(s_i, s_j)$ = covariance (predicted values).
cov = covariance.
$(s_i, s_j)$ = two locations.
$Z(s)$ = observed (measured) values.
$\bar{Z}$ = mean of all the data.
• Covariance function – $\mathrm{average}[(Z(s_i) - \bar{Z})(Z(s_j) - \bar{Z})]$.
• With two variables close to each other (approaching 0
distance on the x axis), they are expected to be similar
w/ a large covariance & correlation.
• Positive covariance – when both variables tend to be
above their respective means together.
• Negative covariance - when one variable tends to be
below & the other variable above its mean.
Summary Statements on Modeling the
Semivariogram
• The semivariogram & covariance functions quantify
autocorrelation by measuring the statistical correlation
as a function of distance.
• A relationship between the semivariogram &
covariance function is expressed as:
$\gamma(s_i, s_j) = \mathrm{sill} - C(s_i, s_j)$ where,
$\gamma(s_i, s_j)$ = semi-variogram (predicted values).
$C(s_i, s_j)$ = covariance (predicted values).
• Predictions are made using either function because of
their equivalence.
• Finally, a line of best fit (a continuous function or
curve) is fitted through the cloud of points plotted in the
semivariogram from which predictions of attribute
values at unsampled locations can be made through the
Kriging steps.
• When the curve is steep near the origin (short range),
closer neighbors have a pronounced influence on the
prediction.
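The equivalence of the semivariogram and covariance functions follows in a few lines from the definitions, assuming second-order stationarity (constant mean, covariance depending only on the separation h) and taking the sill to be the variance $C(0)$. The single-argument notation $\gamma(h)$, $C(h)$ is shorthand introduced here for two locations separated by h.

```latex
\begin{align*}
\gamma(h) &= \tfrac{1}{2}\,\mathrm{var}\bigl(Z(s) - Z(s+h)\bigr) \\
          &= \tfrac{1}{2}\bigl[\mathrm{var}\,Z(s) + \mathrm{var}\,Z(s+h)
             - 2\,\mathrm{cov}\bigl(Z(s), Z(s+h)\bigr)\bigr] \\
          &= \tfrac{1}{2}\bigl[C(0) + C(0) - 2\,C(h)\bigr] \\
          &= C(0) - C(h) = \mathrm{sill} - C(h).
\end{align*}
```

As h grows past the range, $C(h)$ falls to 0 and $\gamma(h)$ levels out at the sill, which is why either function carries the same information for prediction.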
Exploratory Spatial Data Analysis (ESDA)
• ESDA provides interactive graphical models of the dataset.
Histogram - shows a frequency distribution.
• Shows the shape (skewness & kurtosis) of the distribution.
• Normal distributions are required for quantile & probability maps
using ordinary, simple, and universal Kriging.
Properties include –
• Variances of data values are more constant & the variance
approaches the mean.
• Mean, median, and mode are nearly equal in a normal distribution.
• Variance & standard deviation should approach 1.
• Skewness should approach 0, & kurtosis 3.
• Outliers in a skewed distribution can be global or local.
• Global – high or low value in the distribution.
• Local – a value w/in the normal range of the dataset, yet
locally it is unusually high.
• Both outliers may indicate real abnormalities in the measured
phenomena, or incorrect measurements.
• Trend removal helps justify assumptions of normality &
stationarity.
Normal Score Transformation (NST)
• Goal is to normally distribute all random errors for the entire
population, & make the variances more constant throughout the study
area.
• Perform this step after detrending since covariance & variograms
are calculated on residuals after trend correction.
• Ranks the dataset from low to high by creating an observed
cumulative histogram (OCH).
• Then matches the ranks of the OCH to equivalent ranks from a
cumulative standard normal distribution.
• Transformation is defined by taking values from the normal
distribution at that rank.
Three methods for NST –
• Direct – uses the observed cumulative distribution (the OCH) directly
(least smooth w/ least assumptions).
• Linear – fits lines between each step of the cumulative
distribution (middle of the road for smoothness & assumptions).
• Gaussian kernels – approximates the probability distribution by
fitting a linear combination of density cumulative distributions
(smoothest back transformation w/ strongest assumptions).
• Must back-transform to bring values back to their original scale.
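The rank-matching idea behind the normal score transformation can be sketched with Python's standard library. This is a minimal version in the spirit of the direct method, not the extension's implementation: the plotting position (rank - 0.5) / n is an assumption, tied values are not given special treatment, and the data are made up.

```python
from statistics import NormalDist

def normal_score_transform(values):
    """Replace each value by the standard normal quantile of its rank."""
    n = len(values)
    # Indices of the data sorted from low to high (the observed ranking).
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, index in enumerate(order, start=1):
        p = (rank - 0.5) / n  # cumulative probability for this rank (assumed form)
        scores[index] = NormalDist().inv_cdf(p)
    return scores

# Hypothetical, strongly skewed data.
data = [1, 2, 3, 10, 100]
scores = normal_score_transform(data)
print([round(s, 3) for s in scores])
```

The scores are symmetric about 0 regardless of how skewed the input is; back-transforming a prediction requires keeping the value-to-score table so a score can be mapped to the original scale.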
Other transformations (BAL) to Normality
• Any relationship between the variance & the trend are removed w/
these transformations.
• Hence, trend & model autocorrelation are removed (optionally).
• Box-Cox (B) – Small counts in a region of a distribution w/ low
variability (variance). If variance approaches the mean, it approaches
a normal distribution.
• Square Root – as a power function of the data variances will be
made more constant with trends to normality.
• Log (L) – Used as part of the Box-Cox transformation w/
positively skewed distributions & few large values in the dataset.
• Arcsine (A) – used for data that is a proportion (percentage). Again,
makes the variances in the data set more constant.
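The link between the Box-Cox, square root, and log transformations is easiest to see in the usual one-parameter Box-Cox form, Y = (Z^λ - 1) / λ, with the natural log as the λ = 0 case. That formula is assumed here (the notes do not state it), and the function names are illustrative. With λ = 0.5 the transformation is the square root up to a shift and scale.

```python
import math

def box_cox(z, lam):
    """One-parameter Box-Cox transformation of a positive value z."""
    if z <= 0:
        raise ValueError("Box-Cox requires positive data")
    if lam == 0:
        return math.log(z)  # log transformation as the limiting case
    return (z ** lam - 1.0) / lam

def inverse_box_cox(y, lam):
    """Back-transform a Box-Cox value to the original scale."""
    if lam == 0:
        return math.exp(y)
    return (lam * y + 1.0) ** (1.0 / lam)

# Hypothetical positively skewed data.
data = [1.0, 2.0, 4.0, 9.0, 100.0]
sqrt_like = [box_cox(z, 0.5) for z in data]
logged = [box_cox(z, 0.0) for z in data]
print([round(v, 3) for v in sqrt_like])
print([round(v, 3) for v in logged])
```

Both choices compress the few large values far more than the small ones, which is how they pull a positively skewed distribution toward normality.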
Declustering Preferential Data -
• Use when data are not random or regular spaced, i.e. higher
concentration of points in some regions.
• May weight the data w/ denser points receiving less weight.
• Cell declustering –
• A grid of rectangular cells is assigned over the data locations.
• A weight is attached to each data location, & inversely proportional
to the number of points in a cell.
• Optimal cell size by –
1.Use a minimum weighted mean if data is preferentially sampled in
areas of high values.
2.Use a maximum weighted mean if data is preferentially sampled in
areas of low values.
• Voronoi Map –
• All locations w/in the polygon are closer to the sample point than
any other sample point.
• Weight each polygon in proportion to the area it represents.
• Border polygons are often given more weight since the borders are
composed of truncated rectangular sides.
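Cell declustering, described earlier in this section, can be sketched as follows. The points, values, and cell size are hypothetical, the grid origin is fixed at (0, 0), and scaling the weights to sum to 1 is an assumption made so the weighted mean can be written as a simple sum.

```python
from collections import Counter

def cell_declustering_weights(points, cell_size):
    """Weight each point inversely to the number of points in its grid cell."""
    cells = [(int(x // cell_size), int(y // cell_size)) for x, y in points]
    counts = Counter(cells)
    raw = [1.0 / counts[cell] for cell in cells]
    total = sum(raw)  # equals the number of occupied cells
    return [w / total for w in raw]

# Three clustered points in a high-valued area plus two isolated points.
points = [(0.5, 0.5), (0.6, 0.4), (0.7, 0.8), (3.5, 0.5), (0.5, 3.5)]
values = [50, 52, 54, 10, 12]

weights = cell_declustering_weights(points, cell_size=1.0)
naive_mean = sum(values) / len(values)
declustered_mean = sum(w * v for w, v in zip(weights, values))
print(round(naive_mean, 3), round(declustered_mean, 3))
```

Because the cluster sits in a high-valued area, the declustered mean is lower than the naive mean; repeating the calculation over a range of cell sizes and keeping the size that gives the minimum weighted mean corresponds to rule 1 above.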
Normal & General QQPlot
• QQPlots graph the quantiles of two distributions against each other.
• Normal QQPlot – a plot of data values (y axis) versus
standardized units (x axis).
• Points that do not fall on a straight line represent
departures from the normal trend & the data is
asymmetric.
• General QQPlot – used to assess the similarity of the
distributions of two sets of data, one set on each axis.
• Two identical distributions are plotted as a straight
line.
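The points of a Normal QQPlot can be computed as sketched below. This is a minimal illustration, assuming the plotting position (i + 0.5)/n, which is one common convention and not necessarily the one used by Geostatistical Analyst; the function name and the sample values are hypothetical.

```python
from statistics import NormalDist

def normal_qq_points(data):
    """Pair each sorted data value (y) with the standard normal quantile
    (x) at its plotting position (i + 0.5) / n."""
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [
        (std_normal.inv_cdf((i + 0.5) / n), value)
        for i, value in enumerate(ordered)
    ]

# Made-up sample with one unusually large value.
sample = [2.1, 3.4, 1.9, 5.6, 2.8, 3.0, 12.5]
for theoretical, observed in normal_qq_points(sample):
    print(f"{theoretical:6.2f}  {observed:6.2f}")
```

Plotting the pairs and checking how far they stray from a straight line gives the visual normality check described above. Here the large value would sit well above the line at the right end.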
Kriging Methodology
Kriging family – Does not require a normal distribution, except for
obtaining probability and quantile maps in Ordinary, Simple, &
Universal Kriging.
• Data collected as point events.
• Values are actually spatially continuous (values occur everywhere
on a surface).
• Continuous (Real) numbers, integers, ordered
categorical, unordered categorical, or binary.
• Predictions are weighted averages of surrounding measured values
(like IDW).
• Quantifies closeness of objects & spatial arrangement (spatial
autocorrelation) by variography.
• Weights are statistically determined by the spatial structure
expressed in the semi-variogram.
• Points closer in distance (to each other or to a predicted location)
tend to have more similar values, i.e. smaller variability.
• Must minimize the weighted least-square differences.
• Optimize the locations contributing w/ a search neighborhood.
• Circle if there is no directional bias.
• Ellipse for directional bias.
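The variography step mentioned above can be illustrated with an empirical semi-variogram: half the squared difference of every point pair, averaged within distance bins. This is a minimal sketch that ignores direction (isotropic case); the function name, the binning rule, and the sample data are assumptions made for illustration.

```python
import math
from collections import defaultdict

def empirical_semivariogram(points, values, lag_size, n_lags):
    """Return {lag index: (mean distance, semivariance, pair count)}.
    Pairs with distance >= lag_size * n_lags are ignored."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # distance, semivariance, count
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            h = math.dist(points[i], points[j])
            k = int(h // lag_size)
            if k >= n_lags:
                continue
            bin_sums = sums[k]
            bin_sums[0] += h
            bin_sums[1] += 0.5 * (values[i] - values[j]) ** 2
            bin_sums[2] += 1
    return {k: (d / c, g / c, c) for k, (d, g, c) in sorted(sums.items())}

# Four made-up samples along a transect.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [1.0, 2.0, 4.0, 7.0]
variogram = empirical_semivariogram(points, values, lag_size=1.0, n_lags=3)
```

In this example the semivariance grows with distance, which is the pattern that spatial autocorrelation produces. A model fitted to these binned values would then supply the Kriging weights.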
• Assumes the random errors are second-order stationary: they
have a mean = 0, & the covariance between any two random
errors depends only on the distance & direction separating
them, not on their exact locations.
• Also capable of producing maps of:
• Kriging standard errors associated w/ predicted values.
• Probability whether a critical level is exceeded.
• Subsets to assess the quality of the output surface.
• Quantiles* for a predetermined probability level.
* Quantile – for a proportion p between 0 & 1, the pth quantile
is the value below which a proportion p (of unity) of the data
falls.
• Basic Kriging equation – $Z_j^k(s_i) = m(s) + e(s)$ where,
$Z_j^k(s_i)$ = the jth measurement of variable type k at the ith spatial
location $s_i$, decomposed into:
m(s) = a deterministic (predicted mean) trend.
e(s) = random, autocorrelated errors (distance dependent).
$s_i$ = a location.
• Trend removal helps justify assumptions of normality &
stationarity.
Kriging Methods
• While sampling provides accurate information at data points, it
does not tell us what values exist between them.
• Kriging – an estimation (weighted average) method yielding the
best (minimum variance) unbiased linear estimates of point values.
• Accuracy is based upon:
• Number of samples & data quality.
• Geometry of the sample points (Dispersed
distributions are optimal).
• Distance between sample points.
• Spatial continuity (regularity) of a variable.
• Ordinary Kriging – use if m (mean) is an unknown constant.
• Must have data points sampled from continuous phenomena.
• May be used for data that have a trend.
• May create training & test datasets for validation.
• May de-trend & perform Kriging on the residuals.
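How Ordinary Kriging turns a semivariogram into weights can be sketched as below. This is a simplified illustration, not the Geostatistical Analyst implementation: it assumes a spherical semivariogram with hypothetical parameters (nugget 0, sill 1, range 10), uses all points rather than a search neighborhood, and solves the Kriging system with a small Gaussian elimination routine.

```python
import math

def spherical(h, nugget=0.0, sill=1.0, major_range=10.0):
    """Spherical semivariogram model (parameter values are assumptions)."""
    if h == 0.0:
        return 0.0
    if h >= major_range:
        return sill
    r = h / major_range
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x

def ordinary_kriging(points, values, target, gamma=spherical):
    """Return (prediction, kriging variance, weights) at target."""
    n = len(points)
    # Semivariances between data points, bordered by the unbiasedness
    # constraint that forces the weights to sum to 1.
    a = [[gamma(math.dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(math.dist(p, target)) for p in points] + [1.0]
    solution = solve(a, b)
    weights, lagrange = solution[:n], solution[n]
    prediction = sum(w * z for w, z in zip(weights, values))
    variance = sum(w * g for w, g in zip(weights, b[:n])) + lagrange
    return prediction, variance, weights

# Made-up samples; predict at an unmeasured location.
points = [(0.0, 0.0), (2.0, 0.0), (0.0, 3.0), (4.0, 4.0)]
values = [1.0, 3.0, 2.0, 5.0]
prediction, variance, weights = ordinary_kriging(points, values, (1.0, 1.0))
```

The unknown constant mean never appears explicitly. It is handled by the constraint that the weights sum to 1, which is what distinguishes this from Simple Kriging.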
• Simple Kriging – when the trend is completely known (all
parameters & covariates known), whether constant or not.
• m (mean) is a known constant.
• Must have data points sampled from continuous phenomena.
• It follows, you also know e(s) as deviations from the mean.
• May produce quantile, probability, prediction, or standard error
maps.
• May create training & test datasets for validation.
• Consider checking for bivariate normality.
• Consider declustering when using the normal score transform &
data has been preferentially sampled w/ denser points in some
areas.
• Consider parameters for semivariogram/covariance modeling
(mathematically expresses autocorrelation), search neighborhoods, &
cross validation.
• Indicator Kriging – recodes Z(s) as an indicator variable (0 or 1):
0 if Z(s) is below some threshold, e.g. an ozone concentration of
0.12 ppm, or 1 if it is above this value. Kriging the indicator yields
a probability map of the threshold being exceeded.
• Uses a binary variable.
• A threshold is computed to establish limits for binary assignments.
• Assumes an unknown constant mean.
• Creates a probability or standard error map from 0-1.
• Consider parameters for semivariogram/covariance (mathematically
expresses autocorrelation) & search neighborhoods, although
estimation is difficult w/ 0’s & 1’s.
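The indicator coding behind Indicator Kriging can be sketched as follows. The threshold of 0.12 ppm comes from the text; the sample concentrations, the weights, and the function names are hypothetical. In practice the weights would come from a Kriging system built on the indicator semivariogram.

```python
def indicator(values, threshold):
    """Recode each value as 1 if it exceeds the threshold, else 0."""
    return [1 if v > threshold else 0 for v in values]

def exceedance_estimate(flags, weights):
    """Weighted average of indicators; with weights summing to 1 the
    result can be read as an estimated probability of exceedance."""
    return sum(f * w for f, w in zip(flags, weights))

# Made-up ozone concentrations (ppm) at five neighboring stations.
ozone_ppm = [0.08, 0.13, 0.11, 0.15, 0.12]
flags = indicator(ozone_ppm, threshold=0.12)

# Hypothetical interpolation weights for one prediction location.
weights = [0.1, 0.4, 0.2, 0.2, 0.1]
probability = exceedance_estimate(flags, weights)
```

Since the indicators are only 0s and 1s, the prediction is a weighted share of the neighbors above the threshold, which is why the output is read as a probability.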
• Probability Kriging – form an indicator variable of Z(s) & predict
it using the original (untransformed) data Z(s) in a cokriging model.
• Like Indicator Kriging, but uses cokriging, since there are two
variables (the indicator & the original data), each w/ its own random
error (e1(s) & e2(s)) & constant mean (m1 & m2).
• Assumes unknown constant means.
• Creates a probability or standard error map from 0-1.
• Consider parameters for semivariogram/covariance
(mathematically expresses autocorrelation) & search
neighborhoods, although estimation is difficult w/ 0’s & 1’s.
• Probability kriging tries to exploit the extra information in the
original data in addition to the binary variable.
• Requires estimating the autocorrelation of each variable plus their
cross-correlation; each added estimate introduces more uncertainty.
• Disjunctive Kriging – used to form predictors of functions of
variables.
• Ordinary Kriging uses linear combinations of data to find the final
predictor.
• Disjunctive Kriging can predict either the value (predictor), or an
indicator (derived from the threshold) by generalizing the Indicator
Kriging method to form the predictor.
• Assumes a model - f(Z(s)) = m1 + e(s) where,
• m1 (mean) is an unknown constant.
• f(Z(s)) is some arbitrary function of Z(s).
• e(s) = random, autocorrelated errors (distance
dependent).
• May produce prediction, probability, standard error of indicators,
or standard error maps.
• A standard error map quantifies uncertainty of the prediction.
• If the prediction errors are normally distributed, the true value is
w/in plus/minus 2x the prediction standard error about 95% of the time.
• Consider an appropriate transformation & detrending,
semi-variogram/covariance models (mathematically expresses
autocorrelation), & search neighborhoods.
• Decluster when the data has been preferentially sampled.
• Assumes bivariate normality, which should be examined.
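The plus/minus 2x standard error rule above, and the probability and quantile maps mentioned for the Kriging family, all follow from treating the prediction error as normal. A minimal sketch for a single location, assuming normally distributed errors; the function names and the numbers are hypothetical.

```python
from statistics import NormalDist

def prediction_interval(prediction, standard_error, level=0.95):
    """Symmetric interval expected to contain the true value with the
    given probability, assuming normally distributed prediction errors."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for 95%
    half_width = z * standard_error
    return prediction - half_width, prediction + half_width

def exceedance_probability(prediction, standard_error, threshold):
    """Probability that the true value exceeds the threshold."""
    return 1.0 - NormalDist(prediction, standard_error).cdf(threshold)

def quantile(prediction, standard_error, p):
    """Value below which the true value falls with probability p."""
    return NormalDist(prediction, standard_error).inv_cdf(p)

# Made-up prediction and prediction standard error at one location.
low, high = prediction_interval(10.0, 2.0)
p_exceed = exceedance_probability(10.0, 2.0, threshold=12.0)
upper_quartile = quantile(10.0, 2.0, 0.75)
```

Evaluating these functions at every cell of a prediction surface and its standard error surface would give interval, probability, and quantile maps respectively.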
• Universal Kriging – operates like polynomial regression, only the
regression is w/ spatial coordinates as explanatory variables & the
errors are assumed auto-correlated, rather than independent.
• If m(s) is a second order polynomial trend, subtract (detrend) it
from the original data Z(s) & Krig the random errors e(s).
• The mean of e(s) is 0.
• The autocorrelation is now modeled from the random errors.
• Consider parameters for semivariogram/covariance modeling
(mathematically expresses autocorrelation), search neighborhoods,
& cross validation.
• May produce quantile, probability, prediction, or standard error
maps.
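The detrending step of Universal Kriging can be sketched as a regression on the spatial coordinates. For brevity this illustration assumes a first order (planar) trend rather than the second order polynomial mentioned above, and it fits the trend by ordinary least squares; the function names and the data are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x

def fit_plane(points, values):
    """Least-squares fit of m(s) = b0 + b1*x + b2*y (normal equations)."""
    rows = [(1.0, x, y) for x, y in points]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
           for i in range(3)]
    atz = [sum(r[i] * z for r, z in zip(rows, values)) for i in range(3)]
    return solve(ata, atz)

def detrend(points, values):
    """Return the trend coefficients and the residuals e(s) = Z(s) - m(s)."""
    b0, b1, b2 = fit_plane(points, values)
    trend = [b0 + b1 * x + b2 * y for x, y in points]
    residuals = [z - m for z, m in zip(values, trend)]
    return (b0, b1, b2), residuals

# Made-up data: a planar trend plus a small disturbance at the last point.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
values = [1.0, 3.0, 4.0, 6.0, 9.0]
coefficients, residuals = detrend(points, values)
```

The residuals have a mean of 0 by construction. They are what would be passed to the semivariogram modeling and Kriging steps, with the trend added back to the predictions afterwards.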
• Cokriging – a family of Kriging methods using multiple
variables.
• Considers different trends on multiple variables Z1(s) &
Z2(s) (e.g. ozone & particulate matter), & measures
autocorrelation for the errors of each variable & cross-correlation
between the errors e1(s) & e2(s) of the two variables.
• Ordinary Cokriging models:
• Z1(s) = m1 + e1(s)
• Z2(s) = m2 + e2(s) where,
m1 & m2 = unknown constants.
e1(s) & e2(s) = random errors.
Z1(s) & Z2(s) = predicted measurement
variables.
• Requires estimating the auto-correlation for each variable & the
cross-correlation between them.
• Each estimate for auto-correlation introduces additional variability.
Gains in precision of predictions may not be defensible.
• Consider parameters for semivariogram/covariance (mathematically
expresses autocorrelation), search neighborhoods, cross-covariance
(mathematically expresses cross-correlation), cross validation, &
detrending.
• May produce prediction, probability, quantile, standard error of
indicators, or standard error maps.
Cross-covariance – Correlation between spatial random variables of
different types with correlation dependent upon distance &/or
direction separating the locations (e.g. NO2 & Ozone).
• Positive cross-covariance – both variables are above (or both below)
their respective means.
• Negative cross-covariance – one variable is above its mean & the
other is below its mean.
• Cross-covariance cloud – shows empirical cross-covariance for all
pairs of locations between two datasets placed in bins as a function of
distance between the two locations.
• Bins are averaged & smoothed to create a cross-covariance
surface. Controlled by the lag size and number of lags.
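The binning behind the cross-covariance cloud can be sketched as below: for every pair of locations (one from each dataset), multiply the two deviations from their means and average the products within distance bins. This is a minimal illustration that ignores direction and smoothing; the function name and the data are hypothetical.

```python
import math
from collections import defaultdict

def cross_covariance_bins(pts1, vals1, pts2, vals2, lag_size, n_lags):
    """Return {lag index: mean product of deviations} for all pairs of
    locations between the two datasets, binned by separation distance."""
    mean1 = sum(vals1) / len(vals1)
    mean2 = sum(vals2) / len(vals2)
    bins = defaultdict(lambda: [0.0, 0])  # running sum, pair count
    for p, a in zip(pts1, vals1):
        for q, b in zip(pts2, vals2):
            k = int(math.dist(p, q) // lag_size)
            if k < n_lags:
                bins[k][0] += (a - mean1) * (b - mean2)
                bins[k][1] += 1
    return {k: total / count for k, (total, count) in sorted(bins.items())}

# Two made-up variables measured at the same three stations.
stations = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
no2 = [1.0, 2.0, 3.0]
ozone = [2.0, 4.0, 6.0]
cloud = cross_covariance_bins(stations, no2, stations, ozone,
                              lag_size=1.0, n_lags=3)
```

In this example the cross-covariance is positive at short distances and negative at the longest lag. The lag size and number of lags determine how the pairs are grouped.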
• A global outlier (unusually high value) identified in the histogram
is also visible with all pairings of points in the semi-variogram
cloud, regardless of distance.
• Two distinct groups of points occur in the semi-
variogram cloud for global outliers.
• A local outlier will have high semivariogram values (approaching
the top of the y axis) associated with a single point at close
distances (left side along x axis) in the semi-variogram cloud.
• Both types of outliers can have detrimental effects on a prediction
surface.
• Use the Show Search Direction box under the Semi-
variogram/Covariance Cloud dialog, and rotate or type in angle
direction, angle tolerance, bandwidth, and lag.
Cross-Validation & Validation
• Seeks to establish how well any surfacing model predicts
the values at unknown locations.
• Cross Validation – uses all data points to estimate the
autocorrelation model.
• Each data location is selected, & the remaining known
locations are used to predict its value.
• The predicted & actual (known measurement) values are
compared & graphed.
• Ideally, they should be close to the same value.
• Unusual points are set aside, & may require refitting the
autocorrelation model.
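The leave-one-out procedure described above can be sketched as follows. To keep the example short it uses inverse distance weighting as a stand-in predictor, so no autocorrelation model is fitted here; the function names, the power of 2, and the data are assumptions made for illustration.

```python
import math

def idw_predict(points, values, target, power=2.0):
    """Inverse distance weighted prediction at target (stand-in predictor)."""
    numerator = denominator = 0.0
    for p, z in zip(points, values):
        d = math.dist(p, target)
        if d == 0.0:
            return z  # exact interpolator at a measured location
        w = d ** -power
        numerator += w * z
        denominator += w
    return numerator / denominator

def leave_one_out(points, values, predict=idw_predict):
    """Remove each location in turn, predict it from the rest, and
    return a list of (measured, predicted) pairs."""
    pairs = []
    for i in range(len(points)):
        rest_points = points[:i] + points[i + 1:]
        rest_values = values[:i] + values[i + 1:]
        pairs.append((values[i], predict(rest_points, rest_values, points[i])))
    return pairs

# Three made-up samples along a transect.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
values = [1.0, 2.0, 3.0]
pairs = leave_one_out(points, values)
```

In this small example the lowest value is over-predicted and the highest is under-predicted, the same smoothing tendency that shows up in cross-validation scatterplots.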
• Validation – creates a model from a subset of the data set (training
data) & uses it to predict the remaining (test) data.
• Used to check the validity of a protocol of decisions.
• Choice of semivariogram model, lag size, search
neighborhood.
• Predicted plot – scatterplot of predicted vs. measured values.
• Kriging tends to under predict large values, & over
predict small values.
• Thus, the slope of the solid blue line is usually less than
1 (black dashed line).
• Error plot – same as the predicted plot, except true values are
subtracted from the predicted values.
• Standardized Error plot – measurement values are subtracted from
the predicted values, then divided by the estimated Kriging standard
errors.
• All three plots show strength of Kriging predictions.
• With autocorrelation & a good model, the predicted line (blue)
should be close to the measured (dashed black) line.
• If all the data were independent (no autocorrelation), all predictions
would equal the mean of the measured data, & the plot would be a
horizontal line.
• QQPlot – quantiles of the difference between predicted &
measurement values divided by the estimated Kriging standard
errors, & quantiles from the standard normal distribution.
• If the predicted errors from the observed (measured) values are
normally distributed, points should lie close to the dashed black line.
• If so, probability & quantile maps from Ordinary, Simple, &
Universal Kriging, which rely on a normal distribution, can be
used w/ confidence.
Summary of Cross-Validation & Validation Analysis
• Predictions should be unbiased w/ a mean prediction error near 0.
• Standardized prediction errors – prediction errors divided by their
prediction standard errors to standardize (scale). The mean
standardized prediction error should approach 0.
• RMS prediction error (Root-Mean-Square in dialog box) should be
small. Shorter green lines indicate closer predictions to their true
values.
• Average (estimated) standard errors should be close to RMS
prediction errors to correctly assess the variability in a prediction.
• If average standard errors are > RMS prediction errors, variability is
overestimated.
• If average standard errors are < RMS prediction errors, variability is
underestimated.
• If RMS standardized errors are >1, variability is underestimated.
• If RMS standardized errors are <1, variability is overestimated.
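The summary statistics listed above can be computed from three columns: measured values, predicted values, and prediction standard errors. This is a minimal sketch; the dictionary keys and function names are hypothetical, and the average standard error is taken here as the root of the mean squared standard error, which is an assumption about the definition.

```python
import math

def cross_validation_summary(measured, predicted, standard_errors):
    """Summary statistics for a set of cross-validation predictions."""
    n = len(measured)
    errors = [p - m for p, m in zip(predicted, measured)]
    standardized = [e / s for e, s in zip(errors, standard_errors)]
    return {
        "mean_error": sum(errors) / n,
        "rms_error": math.sqrt(sum(e * e for e in errors) / n),
        # Assumed definition: root of the mean squared standard error.
        "average_standard_error": math.sqrt(
            sum(s * s for s in standard_errors) / n),
        "mean_standardized_error": sum(standardized) / n,
        "rms_standardized_error": math.sqrt(
            sum(s * s for s in standardized) / n),
    }

def variability_assessment(summary, tolerance=0.05):
    """Interpret the RMS standardized error as described in the text."""
    rmss = summary["rms_standardized_error"]
    if rmss > 1.0 + tolerance:
        return "underestimated"
    if rmss < 1.0 - tolerance:
        return "overestimated"
    return "about right"

# Made-up cross-validation results.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]
standard_errors = [0.5, 0.5, 0.5, 0.5]
summary = cross_validation_summary(measured, predicted, standard_errors)
verdict = variability_assessment(summary)
```

Here the mean error is 0, and the RMS error equals the average standard error, so the RMS standardized error is 1 and the variability is judged about right. The 5% tolerance is an arbitrary choice for the illustration.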
Cross Validation & Validation Comparison
• Assesses two surfacing models (e.g. IDW vs. Ordinary Kriging) w/
statistics & scatterplots for each model in a single dialog box.
• Generally, the best model has the parameters as per those listed in
the previous summary.
• Average (estimated) standard errors should be close to RMS
prediction errors for a more valid model. This criterion supersedes
RMS prediction error alone: a model that meets it may be preferred
even if the other model has a lower RMS prediction error.
Measurement Error
• Occurs when it is possible to have several different observations at
the same location.
• e.g. a soil series description from a survey document vs. actual
field measurements.
• Measurement error model is expressed as:
• Z(s) = m(s) + e(s) + d(s) where,
• Z(s) = a variable of interest.
• d(s) = measurement error.
• m(s) = deterministic trend.
• e(s) = autocorrelated errors (microscale variation or
nugget effect in this model).
• W/ no measurement error, Kriging is an exact interpolator.
• When errors exist, consider predicting only the filtered value
(m(s) + e(s)) that does not contain the measurement error (d(s)).
• It follows that, where data have been collected (actually measured),
the filtered values will not equal the measured values.
• Thus the final map will be smoother, & Kriging becomes an inexact
interpolator.
  • 1. Geostatistical Analyst Summarized by – Dr. Danny M. Vaughn (08/14/06) Spatial Analysis by Visual & Statistical Description • Contour Map – lines delineating values of equality (p42). • Symbol Map – Symbols posted representing a class of data values (p43). • Grayscale Map – Similar to symbol map, but uses gradations of gray values (p44).
  • 5. • Indicator Maps – Binary raster plots delineating class values (p45). • Each plot represents a different range or thresholds of values, e,g, concentrations of dissolved solids in a lake. • Distribution of values above or below a threshold are displayed & their spatial patterns may be analyzed for trends (directional bias).
  • 7. • Moving Window Statistics – used to assess anomalies of average values and variability (heteroscedasticity) of values throughout a region (p46). • Create a grid whose size is dependent upon the size of the data set. • Analogous to a raster spacing (spatial resolution). • Given a data set with a 1 meter grid spacing and 100 points: • Select a window (search neighborhood) that is not too large to over-generalize, yet not too small that anomalies are not picked up. • Search neighborhoods are used to compute mean and standard deviations.
  • 9. • If local means (based upon the number of values used in the grid) are heavily influenced by erratic values, use the median. • Mean (top value) & standard deviations (bottom value) for moving windows posted (p47). • Note patterns; higher standard deviations where variability of values is greatest in a window, Higher means correspond to highest values in a window. • Standard deviations vary more than means throughout the region.
  • 11. Proportional Effect • Anomalies in the local variability have an impact on the accuracy of estimations in spatial statistical techniques. • Areas of uniform values (low variability) render the most credible estimates. • Consider four relationships between local mean and local variability (p49). Graph A - Local mean (straight line) and variability are constant. Data values fluctuate about the local mean, but no obvious change in variability. Graph B – A trend in the local mean, but the variability remains constant w/ rising & falling mean.
  • 13. Graph C – Constant local mean while the variability exhibits a trend (rises). Graph D – A trend in both the local mean and variability (both mean and standard deviation rise). Most common case in Earth Science data. • Ideally, data should be as A or B (relatively constant variability), but D is acceptable since variability changes noticeably, but it is related to local mean, therefore somewhat predicable. • When a relationship exists between the local mean and local standard deviation, a proportional effect exists.
  • 14. • A scatterplot (mean vs. standard deviation) shows a visual means for assessing this relationship (p51). • A low correlation coefficient indicates a low relationship. • Univariate, normally distributed data can be spatially arranged to render a proportional effect.
  • 16. Geostatistical Analysis - an ArcGIS extension used to create a statistically valid, continuous surface. Basic steps to creating a surface • Add layers in ArcGIS. • Explore statistical properties of data through: • Histogram (univariate distribution). • Voronoi polygons – analyze spatial variability. • Normal QQPlot – Check for normality. • Trend Analysis – identify global trends.
  • 17. • Semi-variogram/covariance cloud – analyze spatial dependencies (autocorrelation) in the data. • General QQPlot – explore whether two datasets have the same distributions. • Cross-covariance cloud – analyze cross-covariance between two datasets. • Select a surface model to create a surface. • Assess the output surface. • Compare other models for optimal fit.
  • 18. • Two basic methods of interpolation – • Deterministic – use mathematical functions. • Geostatistical – use statistical & mathematical functions. Deterministic Methods • Sample points (observed) of phenomena are established at locations. • Algorithms interpolate (predict) other (unmeasured) points & create a continuous surface. • Premise – objects that are close to each other tend to be more alike than objects further apart (Tobler, 1970).
  • 19. • Inverse distance weighted method – based upon local similarity. • General inverse distance weighted formula – $\hat{Z}(s_0) = \sum_i \lambda_i Z(s_i)$ • Weight formula – $\lambda_i = d_{i0}^{-p} / \sum_j d_{j0}^{-p}$ where, $\hat{Z}(s_0)$ = a predicted value for location $s_0$. $\lambda_i$ = weights assigned to each measured point. $Z(s_i)$ = an observed value at location $s_i$. $d_{i0}$ = distance between prediction location $s_0$ & each measured location $s_i$. $p$ = a power parameter (higher powers plot as a steeper curve on the left end of a graph of relative weight (y axis) vs. distance (x axis)).
  • 20. • Weights decrease rapidly w/ distance & are scaled so that $\sum_i \lambda_i = 1$. • Optimal p values are derived as the minimum RMS prediction error. • Cross-validation computes the difference between the measured & predicted value at each location. • An exact interpolator (predicts a value identical to the measured value at a known location w/o prediction standard error, & forces the surface to pass through data values). • An output surface is sensitive to clustering & outliers.
  • 21. • Works best when the data is evenly distributed. • A predicted surface can not extend above a maximum or below a minimum value. • IDW assumes the surface is driven by local variation captured by neighborhood values. • Use the Validation option by creating training (contains measured locations) & test datasets (used to validate the predictions).
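The IDW weighting described above can be sketched as follows. This is a minimal sketch that uses every sample point rather than a search neighborhood; the function name `idw_predict` is hypothetical, and the sample points are borrowed from the five-location worked example later in these notes.

```python
import math

def idw_predict(points, target, power=2.0):
    """Inverse distance weighted prediction.

    points: list of ((x, y), value); target: (x, y); power: the exponent p.
    """
    inverse = []
    for (x, y), value in points:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return value  # exact interpolator: reproduce the measured value
        inverse.append((d ** -power, value))
    total = sum(w for w, _ in inverse)
    # Dividing by the total scales the weights so they sum to 1.
    return sum(w / total * v for w, v in inverse)

samples = [((1, 5), 100), ((3, 4), 105), ((1, 3), 105), ((4, 5), 100), ((5, 1), 115)]
estimate = idw_predict(samples, (2, 4))
print(round(estimate, 2))
```

Because the prediction is a weighted average with non-negative weights, it can never fall outside the minimum and maximum measured values, which matches the limitation noted above.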
  • 22. • Global & Local Polynomial Interpolators – modeled by polynomial equations. • First order – linear. • Second order – one bend in the plane (quadratic). • Third order – two bends (cubic). • An inexact interpolator (does not predict a value identical to the measured value at a measured location, & does not force the surface to pass through data values). • Points will plot above & below the fitted plane.
  • 23. Global Interpolation. • Use for gradually varied surfaces. • To examine & remove global trends (trend analysis). • Surfaces are highly susceptible to extreme values, especially near edges. • Anisotropic* (ellipse) or Isotropic* (circle) distributed data can be modeled. • Anisotropic – spatial dependence (autocorrelation) changes w/ distance & direction between two locations. • Isotropic - spatial dependence (autocorrelation) changes w/ only distance between two locations.
  • 24. Local Interpolation. • Accuracy of any surface increases w/ an ability to identify local trends. • Fits many polynomials w/in overlapping neighborhoods. • Accounts for local variation in a surface. • Fits a surface using all points in a defined neighborhood.
  • 25. • Provides a smooth fit for short range variation in longer range trends. • Successive iterations cross-validate an output surface using optimal parameters based upon a minimum RMS predicted error. • Local polynomials are sensitive to neighborhood distance. • Anisotropic (ellipse) or Isotropic (circle) distributed data can be modeled.
  • 26. • Radial basis functions – (Local/neighborhood) polynomial interpolation (smoothing). • An exact (forces the surface to pass through data values) interpolator. • Thin-plate spline, spline w/ tension, completely regularized spline, multiquadric spline, & inverse multiquadric spline. • Conceptually like rubber sheeting through points w/ minimum curvature.
  • 27. • A predicted surface can extend above a maximum or below a minimum value. • Uses smaller, overlapping planes. • Thin-plate splines. • Splines w/ tension. • Optimal parameter is derived as the minimum RMS prediction error. Higher values produce smoother surfaces. • Use to create smooth surfaces from large sets of points. • Best for gently varied surfaces. • Not good for irregular, highly varied data, & data prone to error or uncertainty.
  • 28. Geostatistical Methods • Regionalized variable in geostatistics – consists of: Structured aspect reflecting large scale tendencies, and a random aspect reflecting local irregularities. • The accuracy of any surface increases w/ an ability to identify local trends. • Must characterize error & variability of the predicted surface. • Based on statistical models. • Geostatistical methods provide measures of error & uncertainty (accuracy of predictions).
  • 29. Basic Principles of Geostatistical Methods • Assumes all values are a result of random processes w/ dependence (as opposed to an independent random process, e.g. a coin toss). • Dependent random process - a coin is tossed three times, tails, heads, & heads. A fourth position is determined by a rule: since the second & third flips are heads, assign the fourth the same as the first (tails). • Reality is represented by introducing randomness through local fluctuations termed, drift, (as opposed to errors in Trend analysis) around a fixed surface. • Fluctuations are not errors in Geostatistical analysis, rather actual features with their own structure
  • 30. • Autocorrelation – statistical correlation between spatial random variables of the same type where correlation is dependent upon distance and/or direction (spatially dependent) separating the locations. • In geostatistics, knowing spatial locations allows distances to be computed between observations & autocorrelation modeled as a function of distance through the semi-variogram. • Observe scatter of points in the semi-variogram. If points are close to the line of best fit on one side, yet spread out more on the other side, directional autocorrelation may exist.
  • 31. Spatial Continuity • Exists for most Earth Science data (similar values with data close to each other, positive spatial autocorrelation). • Scatterplots may be employed in semivariograms to show the value of one variable and the value of the same variable, but at a nearby location. h – Scatterplots • Shows all possible pairs of data values whose locations are separated by a distance in a direction. • The location of any point can be described by a vector as can the separation between two points (p52).
  • 33. • Given an origin at 0,0, point (xi, yi) can be written as vector ti, and point (xj, yj) can be written as vector tj. • The separation between points i & j is tj – ti. This can also be expressed as: (xj – xi, yj – yi). The vector between these two points (i to j) will now be referred to as hij. • On an h – scatterplot, the x axis is labeled V(t), & the y axis is labeled V(t+h). • The x coordinate of a point is a value at a particular location, and the y coordinate is a value a distance and direction h away.
  • 34. • Recall an h value is a vector, thus we can express it as h = (0,1), where we have taken each data location and paired it with the data location whose easting is the same, and whose northing is 1 m larger to the north (p53,A). • The shape of the cloud of points on an h – scatterplot delineates how continuous the data values are over a certain distance in a specific direction (p54). • If data values at locations separated by h are similar, then the pairs will plot close to the line x = y, a 45˚ line through the data cloud (p54). • As the separation distance increases (the northing component of h increases), the similarity between pairs of values decreases, and the points on the h – scatterplot spread out further from the diagonal line.
  • 37. Correlation Functions, Covariance Functions, and Variograms • h – scatterplots require some quantitative summary of the information contained within them. • One essential feature is the fatness of the cloud of points. • Summarized by a correlation coefficient (ρ(h)). • As the cloud gets fatter (the northing component of h increases), the coefficient gets smaller.
  • 38.
    h        Correlation   Covariance (ppm²)   Moment of inertia (ppm²)
    (0, 1)   0.742         448.8               312.8
    (0, 2)   0.590         341.0               479.2
    (0, 3)   0.560         323.8               521.4
    (0, 4)   0.478         291.5               652.9
  • 39. • The relation between the correlation coefficient of an h – scatterplot (ρ(h)) and h is termed a correlation function or correlogram (p57). • The correlation coefficient is dependent on h, which is a vector having both magnitude (separation distance) and direction. • Covariance (C(h)) – an alternative index for spatial continuity; its relation to h is termed the covariance function. • A covariance function is plotted as the covariance (C(h)) against h, & is also inversely related to the fatness of the cloud (it decreases as the northing component of h increases) (p57).
  • 41. • Moment of inertia (γ(h)) – computed about the line x = y (45˚) as $\gamma(h) = \frac{1}{2n} \sum_{i=1}^{n} (x_i - y_i)^2$. • As h increases, points drift away from the 45˚ line, thus the moment of inertia is a measure of the fatness of the cloud – it increases as the cloud gets fatter. • The relationship between the moment of inertia of an h – scatterplot and h is termed a semivariogram (variogram for short) (p57). • Aberrant points can have a pronounced impact on summary statistics. • Note the significant difference in a correlation coefficient w/ & w/out a single point value.
  • 42. Correlation Coefficient
    h        All Points   19 ppm Excluded
    (0, 1)   0.742        0.761
    (0, 2)   0.590        0.625
    (0, 3)   0.560        0.551
    (0, 4)   0.478        0.559
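The three summaries of an h-scatterplot (correlation coefficient, covariance, and moment of inertia) can be computed from the list of paired values. The sketch below uses made-up pairs, not the data behind the tables above, and the function name `h_scatter_summary` is hypothetical.

```python
from statistics import mean

def h_scatter_summary(pairs):
    """Return (correlation, covariance, moment of inertia) for (V(t), V(t+h)) pairs."""
    n = len(pairs)
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in pairs) / n
    sx = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5
    sy = (sum((y - my) ** 2 for y in ys) / n) ** 0.5
    rho = cov / (sx * sy)
    # Moment of inertia about the line x = y: 1/(2n) * sum of (x - y)^2.
    inertia = sum((x - y) ** 2 for x, y in pairs) / (2 * n)
    return rho, cov, inertia

# Hypothetical pairs: a thin cloud (short h) and a fat cloud (longer h).
thin_cloud = [(10, 11), (20, 19), (30, 32), (40, 38)]
fat_cloud = [(10, 25), (20, 8), (30, 41), (40, 22)]

rho_thin, cov_thin, inertia_thin = h_scatter_summary(thin_cloud)
rho_fat, cov_fat, inertia_fat = h_scatter_summary(fat_cloud)
print(round(rho_thin, 3), inertia_thin)
print(round(rho_fat, 3), inertia_fat)
```

The fatter cloud gives a lower correlation and covariance and a higher moment of inertia, which is the pattern the tables show as h grows.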
  • 43. Components of the Semivariogram A semi-variogram surface is controlled by the lag size and number of lags. Lag – distance between pairs of points. • If the lag is too large, short-range autocorrelation is masked. If the lag is too small, many empty bins will result. • Rule of thumb – lag size × number of lags should be < 1/2 the largest distance among all points. • Lag (bin) size – a limit of the distance (width) defining a bin into which pairs of locations of approx. equal distance & direction are placed to reduce the number of combinations. • Increase or decrease lag based on a small or large range of points.
  • 44. Binning (classification of lag size) – reduces the number of points plotted, making the semivariogram easier to interpret. • Locations are grouped by distance & direction. • Semivariogram (distance h) = 0.5 * average[(Z(si) – Z(sj))²], where Z(si) & Z(sj) are the values at locations si & sj. • Only the average distance & semi-variance for all pairs in a bin are plotted as a single point on the empirical semivariogram. • An empirical semivariogram value (y axis) for each bin for each direction is plotted as a red dot. The x axis is the distance from the center of the bin to the center of the semivariogram surface. • A semivariogram surface is plotted for each bin value (average distance & semi-variance for all pairs in a bin) represented as color-coded pixels.
  • 46. Range – the distance where the semivariogram model levels out. • Sample points with distances closer (within) than the range are spatially autocorrelated. Points beyond the range make no useful contribution to the interpolation. • The range defines the “zone of influence.” Sill – The value on the y axis that defines the semi- variance at the range. • The flattened line from this point represents no change in variation w/ increases in distance. Nugget – The point where the semi-variogram model intercepts the y axis. • Due to measurement error, and/or variation at distances smaller than the sampling interval.
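To make the range, sill, and nugget concrete, the sketch below evaluates a spherical semivariogram model. The spherical form is an assumption chosen for illustration (these notes do not name a specific model), the parameter values are hypothetical, and "sill" here means the semi-variance reached at the range, as defined above.

```python
def spherical_semivariogram(h, nugget, sill, range_):
    """Semivariance at separation distance h for a spherical model."""
    if h <= 0:
        return 0.0              # a location compared with itself
    if h >= range_:
        return float(sill)      # levelled out: no autocorrelation beyond the range
    ratio = h / range_
    return nugget + (sill - nugget) * (1.5 * ratio - 0.5 * ratio ** 3)

# Hypothetical parameters, chosen only for illustration.
NUGGET, SILL, RANGE = 10.0, 110.0, 5.0
curve = [(h, spherical_semivariogram(h, NUGGET, SILL, RANGE)) for h in range(0, 8)]
for h, gamma in curve:
    print(h, round(gamma, 2))
```

The curve starts just above the nugget for small h, rises, and stays flat at the sill for every distance at or beyond the range.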
  • 47. Modeling the Semivariogram - Assuming stationarity, autocorrelation can be examined & quantified (spatial modeling). Stationarity • Statistics relies on some notion of replication. • It follows, estimates can be derived, & variation & uncertainty of an estimate understood from repeated observations. • Through continued observations of many samples, dependencies become apparent.
  • 48. • Stationarity is used to obtain replication in a spatial setting. • A variogram is only used for a given practical distance. • Stationarity – a variable is stationary if its distribution is invariant (does not change) under translation (the entire distribution is shifted through coordinate space in mass). • Normally only the mean and covariance are required to satisfy this requirement.
  • 49. • Mean stationarity – the mean is constant (no spatial drift) between samples & independent of location. • Also, constant variance (no outliers) across space. • Second-order stationarity – assumes covariance is the same between two points that are the same distance & direction apart regardless of points chosen. • Covariance is dependent on the distance between any two values, not location. • For semi-variograms, intrinsic (genuine, essential, real) stationarity is the assumption that the variance of the difference is the same between any two points that are the same distance apart (increments) regardless of which two points are chosen. Also referred to as quasi-stationarity.
  • 50. • Local fluctuations dominate the overall trend. • This is exhibited within a natural increase in the variance beyond these shorter sections (observations) of the distribution of a regional variable as distance increases from each observation. • Similar distance between points that provides replication to estimate dependence rules & allow predictions to be made. Assuming intrinsic stationarity, autocorrelation can be quantified. • Termed spatial modeling, structural analysis, or variography. • A line is fitted in the empirical semivariogram similar to a least squares line in regression analysis.
  • 51. • Parameters of the curve should minimize the deviations from the points by some criterion • Used to quantify spatial autocorrelation – assumes closer points (left side, x axis) are more similar (lower semi-variance on y axis) than points further away (right side, x axis).
  • 52. • Semivariogram Model – defined as one-half the variance of the difference between two variables at two locations. • $\gamma(s_i, s_j) = \tfrac{1}{2}\,\mathrm{var}(Z(s_i) - Z(s_j))$ where, $\gamma(s_i, s_j)$ = semivariogram (predicted values). var = variance. $(s_i, s_j)$ = two locations. $Z(s)$ = observed (measured) values. • This is a dissimilarity function of increased variability (semi-variance, y axis) w/ increased distance.
  • 53. • Semivariogram function – 0.5 * average[(Z(si) – Z(sj))²]. • With observed values close to each other, the average is small. • As observed values get further apart, their values become more dissimilar w/ higher variability & higher averages.
  • 54. Calculating the Empirical Semivariogram Step 1 – Given a data set of five locations [(1,5), (3,4), (1,3), (4,5), (5,1)]and their associated values [(100), (105), (105), (100), (115)], calculate the distance (x values) between each pair of locations using the Pythagorean theorem.
  • 55. Table 1
    Values      Locations       Distance Calculation    Distance
    100 & 105   (1,5), (3,4)    √[(1–3)² + (5–4)²]      2.236
    100 & 105   (1,5), (1,3)    √[0² + 2²]              2
    100 & 100   (1,5), (4,5)    √[3² + 0²]              3
    100 & 115   (1,5), (5,1)    √[4² + 4²]              5.657
    105 & 105   (3,4), (1,3)    √[2² + 1²]              2.236
    105 & 100   (3,4), (4,5)    √[1² + 1²]              1.414
    105 & 115   (3,4), (5,1)    √[2² + 3²]              3.606
    105 & 100   (1,3), (4,5)    √[3² + 2²]              3.606
    105 & 115   (1,3), (5,1)    √[4² + 2²]              4.472
    100 & 115   (4,5), (5,1)    √[1² + 4²]              4.123
  • 56. Step 2 – Calculate the empirical semivariance (y values) as 0.5 * [(value at location i – value at location j)²], i.e. one-half times the difference squared for the values of the paired locations. Increasing the number of points (five in this example) can soon result in a very large number of paired combinations (Table 1), since every point is paired with every other point to form a vector: n points give n(n – 1)/2 pairs, or 10 pairs for five points.
  • 57. Table 2
    Value Diff.   Diff.²        Empirical Semivariance
    5             (5)² = 25     12.5
    5             (5)² = 25     12.5
    0             (0)² = 0      0
    15            (15)² = 225   112.5
    0             (0)² = 0      0
    5             (5)² = 25     12.5
    10            (10)² = 100   50
    5             (5)² = 25     12.5
    10            (10)² = 100   50
    15            (15)² = 225   112.5
  • 58. Step 3 – Binning the Empirical Semivariogram. Since a data set can consist of 100’s to 10,000’s of control points, the individual vectors (delineated by two ordered coordinate pairs) and the empirical semivariance values associated with each pair, will quickly result in a cloud of points in the semivariogram that will be unintelligible to interpret. • Binning reduces the total number of points plotted in the semivariogram by assigning a bin (storage compartment) based upon the lag size (distance) and number of lags. • Binning classifies locations grouped by distance & direction. • Only the average distance and average semivariance for all pairs included in the bin are plotted as a single point.
  • 59. • The basic equation would now be – Semivariogram (distance h) = 0.5 * average[(value at location i – value at location j)²]. • This equation is interpreted as: compute one-half the average of the squared differences of the values for all paired locations separated by a distance h. Distance is the lag distance. • Considering a lag distance of some unit measure (e.g. 1), we create 5 bins, & will compute the average semivariance for all pairs of points falling within the distance limits (lag distance) assigned for each bin. • Only an average distance & average semivariance for all pairs in a bin are plotted as a single point on the empirical semivariogram.
  • 60. Table 3
    Lag Dist. (Bin)   Pair Distances        Avg. Dist.   Semivariances   Avg. Semivar.
    1-2               1.414, 2              1.707        12.5, 12.5      12.5
    2-3               2.236, 2.236, 3       2.491        12.5, 0, 0      4.167
    3-4               3.606, 3.606          3.606        50, 12.5        31.25
    4-5               4.472, 4.123          4.298        50, 112.5       81.25
    5+                5.657                 5.657        112.5           112.5
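Steps 1 to 3 can be reproduced with a short script. This is a minimal sketch of the worked example only (omnidirectional, no angle tolerance or bandwidth); the function name `empirical_semivariogram` is hypothetical. Bins are treated as right-inclusive, an assumption inferred from Table 3, where the distance 2 falls in bin 1-2 and the distance 3 in bin 2-3.

```python
import math
from itertools import combinations

locations = [(1, 5), (3, 4), (1, 3), (4, 5), (5, 1)]
values = [100, 105, 105, 100, 115]

def empirical_semivariogram(locations, values, lag=1.0, max_bin=5):
    """Return rows of (bin index, average distance, average semivariance)."""
    bins = {}
    for i, j in combinations(range(len(locations)), 2):
        (xi, yi), (xj, yj) = locations[i], locations[j]
        dist = math.hypot(xi - xj, yi - yj)            # Step 1: Pythagorean distance
        semivar = 0.5 * (values[i] - values[j]) ** 2   # Step 2: half the squared difference
        # Step 3: right-inclusive bins (k, k + 1]; everything beyond max_bin goes in the last bin.
        index = min(math.ceil(dist / lag) - 1, max_bin)
        bins.setdefault(index, []).append((dist, semivar))
    rows = []
    for index in sorted(bins):
        dists = [d for d, _ in bins[index]]
        semis = [s for _, s in bins[index]]
        rows.append((index, sum(dists) / len(dists), sum(semis) / len(semis)))
    return rows

rows = empirical_semivariogram(locations, values)
for index, avg_dist, avg_semivar in rows:
    print(index, round(avg_dist, 3), round(avg_semivar, 3))
```

Each printed row corresponds to one plotted point on the empirical semivariogram and should match the averages in Table 3.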
  • 61. Trend Analysis in Geostatistics • A trend surface is made up of global (structured or deterministic) variation & random short-range variation (uncorrelated, local irregularities referred to as errors in Trend Analysis). • Global trend – all measurements are controlled by some (non-random) deterministic factor (a physical process, e.g. prevailing wind, data collected along a transect, etc.). • May be represented by a polynomial formula & removed from the analysis, then returned before predictions are made. • Modeling long-range or coarse scale variation. • Note asymmetry of the trend model and variance in change across each axis.
  • 62. • Global trends are plotted as a 3-D box diagram. • Plotted as sideways views along the x, z & y, z planes. • Polynomials (best fit line) are created for each plane. • Flat lines indicate no trend. • A definite pattern to the polynomial suggests a global trend. • Rotating the model enables an enhanced visual distribution. • Model the residuals & reintroduce the trend in the prediction model. • Used to examine local characteristics of spatial autocorrelation.
  • 65. Removing the Global Trend • $Z_k^j(s_i)$ = the jth measurement of variable type k at the ith spatial location $s_i$, decomposed into: m(s) = a deterministic (predicted mean) trend. e(s) = random, autocorrelated errors (distance dependent). $s_i$ = a location.
  • 66. • Employed to satisfy normality, stationarity assumptions & to model local, short-range variations. • Stick to Ordinary Kriging unless there are strong reasons to remove the trend surface. • More parameters must be estimated when a trend is removed, reducing the precision of the surface model. • Keep the surfaces simple. i.e. 1st or 2nd order polynomials. • Always check w/ cross-validation, & especially validation.
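The decomposition into a trend m(s) and residuals e(s) can be illustrated with a first-order polynomial along a single coordinate. This is a simplified sketch: the transect data are hypothetical, the function name `fit_linear_trend` is invented for the example, and a real trend surface would be fitted in both x and y.

```python
from statistics import mean

def fit_linear_trend(xs, zs):
    """Least-squares first-order trend m(x) = intercept + slope * x."""
    mx, mz = mean(xs), mean(zs)
    slope = (sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
             / sum((x - mx) ** 2 for x in xs))
    return mz - slope * mx, slope

# Hypothetical measurements along a transect.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
zs = [10.2, 11.9, 14.1, 15.8, 18.0]

intercept, slope = fit_linear_trend(xs, zs)
trend = [intercept + slope * x for x in xs]          # m(s)
residuals = [z - m for z, m in zip(zs, trend)]       # e(s), modeled by Kriging
restored = [m + e for m, e in zip(trend, residuals)] # trend returned before prediction
print(round(intercept, 3), round(slope, 3), [round(e, 3) for e in residuals])
```

The residuals average to zero, and adding the trend back recovers the original values, which is the "remove, model, return" sequence described above.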
  • 67. Random Trends • Random implies governed by rules of probability, including dependence of neighboring values (autocorrelation). • Alignment of block-faulted mountain system is a global trend, but a specific time-event surficial processes (volcanism) may affect short- term variations in the mountain’s morphology. • The shape of the semivariogram/covariance curve may vary (be influenced) with direction (anisotropy) after the global trend has been removed, or if no global trend exists at all.
  • 68. • Anisotropy – spatial dependence (autocorrelation) changes w/ both distance & direction between two locations. • Usually not a deterministic process. • Modeled as a random process (random error usually of an unknown cause) w/ higher autocorrelation in a preferred direction. • Observe scatter of points in the semivariogram. If points are close to the line of best fit on one side, yet spread out more on the other side, directional autocorrelation may exist. • When the data is anisotropic (the anisotropic box checked), the yellow line becomes several lines indicating many directions.
  • 70. • The surface cloud is elliptical (w/ an angular component) in shape. • The outline of the range is displayed as a blue line throughout the semivariogram/covariance surface cloud. • The lengths of the semi-minor & semi-major axes determine the ranges in the surface cloud, & define the search neighborhood. • Once the shape is determined, establish constraints (number of points, & partition the shape into bins to avoid directional bias due to clustering or transected points w/in the shape). • Ensure there are enough points to account for a meaningful prediction.
  • 72. • If not enough points are available for the bins, outside (uncorrelated) points will be used, which degrades the model. • Points separated by distances greater than the range are not correlated. • When the pairs of points in the semivariogram yield a straight (horizontal) line (no variability), there is no spatial autocorrelation in the data, thus a surface would be meaningless. • The semi-minor axis (minor range) from the center of the cloud will reach the sill sooner (plot as a steeper grade) than the semi-major axis (major range).
  • 73. • Directional Binning (Anisotropy) Under the Semivariogram/Covariance Modeling dialog box – • Use the Show Search Direction box under the Semivariogram/Covariance Cloud dialog, and rotate or type in angle (search) direction, angle tolerance, bandwidth, and lag (distance/width). • An angle of tolerance determines how close points will be included from an observed point to the bandwidth. • Bandwidth is the width of the search. • Bins (lag distance/width) contain pairs of locations with a predetermined distance (interval) & direction apart.
  • 74. • Isotropic - spatial dependence (autocorrelation) changes only w/ changes in distance between two locations. • When the data is isotropic, the shape of the semivariogram will be a circle. • The outline of the range is displayed as a blue line throughout the semivariogram/covariance surface cloud. • The length of any axis from the center of the semi- variogram/covariance cloud will always be the same. • The range will always be the same in all directions. • Thus, the sill is equidistant in any direction.
  • 76. Covariance – a statistical tendency of two variables to vary in ways related to each other. • A scaled version of correlation. • A similarity function: covariance (y axis) decreases w/ increased distance (along the x axis). • $C(s_i, s_j) = \mathrm{covar}(Z(s_i), Z(s_j))$ where, $C(s_i, s_j)$ = covariance (predicted values). covar = covariance. $(s_i, s_j)$ = two locations. $Z(s)$ = observed (measured) values. $\bar{Z}$ = mean of all the data. • Covariance function – average[$(Z(s_i) - \bar{Z})(Z(s_j) - \bar{Z})$].
  • 77. • With two variables close to each other (approaching 0 distance on the x axis), they are expected to be similar w/ a large covariance & correlation. • Positive covariance – when both variables tend to be above their respective means together. • Negative covariance - when one variable tends to be below & the other variable above its mean.
  • 78. Summary Statements on Modeling the Semivariogram • The semivariogram & covariance functions quantify autocorrelation by measuring the statistical correlation as a function of distance. • A relationship between the semivariogram & covariance function is expressed as: $\gamma(s_i, s_j) = \text{sill} - C(s_i, s_j)$ where, $\gamma(s_i, s_j)$ = semi-variogram (predicted values). $C(s_i, s_j)$ = covariance (predicted values).
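The link between the semivariogram γ and the covariance C follows from the definitions, assuming second-order stationarity (a constant mean and a covariance that depends only on the separation between locations). The symbol σ² for the constant variance is introduced here for the derivation; it equals the sill because the covariance drops to zero beyond the range.

```latex
\begin{aligned}
\gamma(s_i, s_j) &= \tfrac{1}{2}\,\mathrm{var}\big(Z(s_i) - Z(s_j)\big) \\
  &= \tfrac{1}{2}\big[\mathrm{var}(Z(s_i)) + \mathrm{var}(Z(s_j))
     - 2\,\mathrm{covar}(Z(s_i), Z(s_j))\big] \\
  &= \tfrac{1}{2}\big[\sigma^2 + \sigma^2 - 2\,C(s_i, s_j)\big] \\
  &= \sigma^2 - C(s_i, s_j) \;=\; \text{sill} - C(s_i, s_j)
\end{aligned}
```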
  • 79. • Predictions are made using either function because of their equivalence. • Finally, a line of best fit (a continuous function or curve) is fitted through the cloud of points plotted in the semivariogram from which predictions of attribute values at unsampled locations can be made through the Kriging steps. • When the curve is steep near the origin (short range), closer neighbors have a pronounced influence on the prediction.
  • 80. Exploratory Spatial Data Analysis (ESDA) • ESDA provides interactive graphical models of the dataset. Histogram - shows a frequency distribution. • Shows the shape (skewness & kurtosis) of the distribution. • Normal distributions are required for quantile & probability maps using ordinary, simple, and universal Kriging. Properties include – • Variances of data values are more constant & the variance approaches the mean. • Mean, median, and mode are nearly equal in a normal distribution.
  • 82. • Variance & standard deviation should approach 1. • Skewness should approach 0, & kurtosis 3. • Outliers in a skewed distribution can be global or local. • Global – high or low value in the distribution. • Local – a value w/in the normal range of the dataset, yet locally it is unusually high. • Both outliers may indicate real abnormalities in the measured phenomena, or incorrect measurements. • Trend removal helps justify assumptions of normality & stationarity.
  • 83. Normal Score Transformation (NST) • Goal is to normally distribute all random errors for the entire population, & make the variances more constant throughout the study area. • Perform this step after detrending since covariance & variograms are calculated on residuals after trend correction. • Ranks the dataset from low to high by creating an observed cumulative histogram (OCH). • Then matches the ranks of the OCH to equivalent ranks from a cumulative standard normal distribution.
  • 86. • Transformation is defined by taking values from the normal distribution at that rank. Three methods for NST – • Direct – uses the OCH (least smooth w/ least assumptions). • Linear – fits lines between each step of the cumulative distribution (middle of the road for smoothness & assumptions). • Gaussian kernels – approximates the probability distribution by fitting a linear combination of density cumulative distributions (smoothest back transformation w/ strongest assumptions). • Must back-transform to bring values back to their original scale.
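A rank-based normal score transform in the spirit of the direct method can be sketched with the standard library. The plotting position (rank − 0.5)/n is an assumption made for this example, ties are broken arbitrarily, and the function name `normal_score_transform` is hypothetical; this is not the Geostatistical Analyst implementation.

```python
from statistics import NormalDist

def normal_score_transform(values):
    """Map each value to the standard normal quantile of its rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    standard_normal = NormalDist()
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        p = (rank - 0.5) / n          # assumed plotting position in (0, 1)
        scores[i] = standard_normal.inv_cdf(p)
    return scores

# Hypothetical positively skewed sample.
data = [3.0, 120.0, 7.5, 1.2, 15.0]
scores = normal_score_transform(data)
print([round(s, 3) for s in scores])
```

The transform preserves the ordering of the data, so a back-transformation can look up the original value at the same rank.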
  • 89. Other transformations (BAL) to Normality • Any relationship between the variance & the trend are removed w/ these transformations. • Hence, trend & model autocorrelation are removed (optionally). • Box-Cox (B) – Small counts in a region of a distribution w/ low variability (variance). If variance approaches the mean, it approaches a normal distribution.
  • 90. • Square Root – as a power function of the data variances will be made more constant with trends to normality. • Log (L) – Used as part of the Box-Cox transformation w/ positively skewed distributions & few large values in the dataset. • Arcsine (A) – used for data that is a proportion (percentage). Again, makes the variances in the data set more constant.
  • 91. Declustering Preferential Data - • Use when data are not random or regular spaced, i.e. higher concentration of points in some regions. • May weight the data w/ denser points receiving less weight. • Cell declustering – • A grid of rectangular cells is assigned over the data locations. • A weight is attached to each data location, & inversely proportional to the number of points in a cell.
  • 93. • Optimal cell size by – 1.Use a minimum weighted mean if data is preferentially sampled in areas of high values. 2.Use a maximum weighted mean if data is preferentially sampled in areas of low values. • Voronoi Map – • All locations w/in the polygon are closer to the sample point than any other sample point. • Weight each polygon in proportion to the area it represents. • Border polygons are often given more weight since the borders are composed of truncated rectangular sides.
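Cell declustering weights can be sketched as below. This is a minimal illustration with a fixed cell size and hypothetical data; it does not search for the optimal cell size described above, and the function name `cell_declustering_weights` is invented for the example.

```python
from collections import Counter

def cell_declustering_weights(points, cell_size):
    """Weight each point inversely to the number of points sharing its grid cell."""
    cells = [(int(x // cell_size), int(y // cell_size)) for x, y in points]
    counts = Counter(cells)
    raw = [1.0 / counts[cell] for cell in cells]
    total = sum(raw)
    return [w / total for w in raw]   # normalised so the weights sum to 1

# Hypothetical preferential sampling: three clustered high values, one isolated low value.
points = [(0.5, 0.5), (0.6, 0.4), (0.7, 0.8), (3.5, 3.5)]
values = [50.0, 52.0, 48.0, 10.0]

weights = cell_declustering_weights(points, cell_size=1.0)
naive_mean = sum(values) / len(values)
declustered_mean = sum(w * v for w, v in zip(weights, values))
print(naive_mean, round(declustered_mean, 3))
```

The clustered points share one cell and so share its weight, which pulls the weighted mean below the naive mean when high values were oversampled.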
  • 95. Normal & General QQPlot • QQPlots are quantiles from two distributions. • Normal QQPlot – a plot of data values (y axis) versus standardized units (x axis). • Points that do not fall on a straight line represent departures from the normal trend & the data is asymmetric. • General QQPlot – used to assess the similarity of the distributions of two sets of data, one set on each axis. • Two identical distributions are plotted as a straight line.
  • 97. Kriging Methodology Kriging family – Does not require a normal distribution, except for obtaining probability and quantile maps in Ordinary, Simple, & Universal Kriging. • Data collected as point events. • Values are actually spatially continuous (values occur everywhere on a surface). • Continuous (Real) numbers, integers, ordered categorical, unordered categorical, or binary. • Weighted surrounding measured values (like IDW).
  • 98. • Quantifies closeness of objects & spatial arrangement (spatial autocorrelation) by variography. • Weights are statistically determined by the spatial structure expressed in the semi-variogram. • Point pairs closer in distance (to a predicted location) have smaller variability. • Must minimize the weighted least-square differences. • Optimize the locations contributing w/ a search neighborhood. • Circle if there is no directional bias. • Ellipse for directional bias.
  • 100. • Assumes all random errors are second-order stationary: random errors have a mean = 0, & the covariance between any two random errors depends on the distance & direction separating them, not on their exact locations. • Also capable of producing maps of: • Kriging standard errors associated w/ predicted values. • Probability whether a critical level is exceeded. • Subsets to assess the quality of the output surface. • Quantiles* for a predetermined probability level. * Quantile – for a proportion p between 0 & 1, the pth quantile is the value below which the proportion p of the data falls.
  • 101. • Basic Kriging equation – $Z_k^j(s_i) = m(s) + e(s)$ where, $Z_k^j(s_i)$ = the jth measurement of variable type k at the ith spatial location $s_i$, decomposed into: m(s) = a deterministic (predicted mean) trend. e(s) = random, autocorrelated errors (distance dependent). $s_i$ = a location. • Trend removal helps justify assumptions of normality & stationarity.
  • 102. Kriging Methods • While sampling provides accurate information at data points, it does not tell us what values exist between them. • Kriging – an estimation (weighted average) method yielding the best (minimum variance) unbiased linear estimates of point values. • Accuracy is based upon: • Number of samples & data quality. • Geometry of the sample points (Dispersed distributions are optimal). • Distance between sample points. • Spatial continuity (regularity) of a variable.
  • 103. Ordinary Kriging - Use if m (mean) is unknown. • Must have data points sampled from continuous phenomena. • May be used for data that have a trend. • May create training & test datasets for validation. • May de-trend & perform Kriging on the residuals.
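An ordinary kriging prediction at one location can be sketched in pure Python. Everything specific here is an assumption for illustration: the spherical semivariogram and its parameters are hypothetical (not fitted to data), all points are used instead of a search neighborhood, and the sample data are the five locations from the empirical semivariogram example. The weights come from solving the kriging system with a Lagrange multiplier that forces them to sum to 1.

```python
import math

def spherical(h, nugget=0.0, sill=120.0, range_=6.0):
    """Assumed spherical semivariogram; parameters are illustrative only."""
    if h == 0:
        return 0.0
    if h >= range_:
        return sill
    r = h / range_
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for c in range(col, n + 1):
                m[row][c] -= factor * m[col][c]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][c] * x[c] for c in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x

def ordinary_kriging(locations, values, target, gamma=spherical):
    """Return (prediction, weights, kriging variance) at target."""
    n = len(locations)
    a = [[gamma(math.dist(p, q)) for q in locations] + [1.0] for p in locations]
    a.append([1.0] * n + [0.0])                    # unbiasedness: weights sum to 1
    b = [gamma(math.dist(p, target)) for p in locations] + [1.0]
    solution = solve(a, b)
    weights, lagrange = solution[:n], solution[n]
    prediction = sum(w * z for w, z in zip(weights, values))
    variance = sum(w * g for w, g in zip(weights, b[:n])) + lagrange
    return prediction, weights, variance

locations = [(1, 5), (3, 4), (1, 3), (4, 5), (5, 1)]
values = [100, 105, 105, 100, 115]
prediction, weights, variance = ordinary_kriging(locations, values, (2, 4))
print(round(prediction, 2), [round(w, 3) for w in weights], round(variance, 2))
```

Unlike IDW, the weights depend on the semivariogram and on how the sample points are arranged relative to each other, and the square root of the returned variance is the prediction standard error.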
  • 104. • Simple Kriging – when the trend is completely known (all parameters & covariates known), whether constant or not. • m (mean) is a known constant. • Must have data points sampled from continuous phenomena. • It follows, you also know e(s) as deviations from the mean. • May produce quantile, probability, prediction, or standard error maps. • May create training & test datasets for validation.
  • 105. • Consider checking for bivariate normality. • Consider declustering when using the normal score transform & data has been preferentially sampled with denser points in some areas. • Consider parameters for semivariogram/covariance modeling (mathematically expresses autocorrelation), search neighborhoods, & cross validation.
  • 106. • Indicator Kriging – Z(s) is changed to an indicator (binary) variable: 0 if Z(s) is below some value, e.g. 0.12 ppm ozone concentration, or 1 if it is above this value. A probability map is developed to assess the probability that a value exceeds the given threshold. • Uses a binary variable. • A threshold is computed to establish limits for binary assignments. • Assumes an unknown constant mean. • Creates a probability or standard error map from 0-1. • Consider parameters for semivariogram/covariance (mathematically expresses autocorrelation) & search neighborhoods, although estimation is difficult w/ 0’s & 1’s.
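The indicator transform that precedes Indicator Kriging is a simple thresholding step. In this sketch the ozone readings are hypothetical, the function name `indicator` is invented, and a value exactly equal to the threshold is assigned 0, which is an assumption since the notes only cover "below" and "above".

```python
def indicator(values, threshold):
    """Return 1 where a value exceeds the threshold, otherwise 0."""
    return [1 if v > threshold else 0 for v in values]

# Hypothetical ozone concentrations in ppm.
ozone = [0.08, 0.13, 0.12, 0.15, 0.10]
flags = indicator(ozone, threshold=0.12)
share_above = sum(flags) / len(flags)
print(flags, share_above)
```

Kriging the resulting 0/1 values yields predictions between 0 and 1, which are read as the probability of exceeding the threshold.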
  • 107. • Probability Kriging – form an indicator variable of Z(s) & predict it using the original transformed data Z(s) in a cokriging model. • Like Indicator Kriging, but uses cokriging, since there are two types of random errors (e1(s) & e2(s)) associated with two constants (m1 & m2). • Assumes unknown constant means. • Creates a probability or standard error map from 0-1.
  • 108. • Consider parameters for semivariogram/covariance (mathematically expresses autocorrelation) & search neighborhoods, although estimation is difficult w/ 0’s & 1’s. • Probability Kriging tries to exploit the extra information in the original data in addition to the binary variable. • Requires estimating autocorrelation for each variable plus the cross-correlation between them, & each additional estimate creates more uncertainty.
  • 109. • Disjunctive Kriging – used to form predictors of functions of variables. • Ordinary Kriging uses linear combinations of data to find the final predictor. • Disjunctive Kriging can predict either the value (predictor), or an indicator (derived from the threshold) by generalizing the Indicator Kriging method to form the predictor. • Assumes a model - f(Z(s)) = m1 + e(s) where, • m1 (mean) is an unknown constant. • f(Z(s)) is some arbitrary function of Z(s). • e(s) = random, autocorrelated errors (distance dependent).
  • 110. • May produce prediction, probability, standard error of indicators, or standard error maps. • A standard error map quantifies uncertainty of the prediction. • In a normal distribution, the true value is w/in plus/minus 2x the prediction standard error about 95% of the time. • Consider an appropriate transformation & detrending, semivariogram/covariance models (mathematically express autocorrelation), & search neighborhoods. • Decluster when the data have been preferentially sampled. • Assumes bivariate normality, which should be examined.
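The "plus/minus 2x the standard error" rule can be checked with the Python standard library. This is a small sketch under the slide's normality assumption; the function names are hypothetical.

```python
from statistics import NormalDist

def prediction_interval(prediction, standard_error, k=2.0):
    """Interval of +/- k prediction standard errors around a prediction."""
    half_width = k * standard_error
    return prediction - half_width, prediction + half_width

def normal_coverage(k=2.0):
    """Probability that a normal variable lies within k standard deviations."""
    nd = NormalDist()
    return nd.cdf(k) - nd.cdf(-k)

lower, upper = prediction_interval(10.0, 1.5)   # made-up prediction and error
coverage = normal_coverage(2.0)                 # roughly 0.95
```

The exact multiplier for 95% coverage is about 1.96, so 2 is a convenient rounding that gives slightly more than 95%.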
  • 111. • Universal Kriging – operates like polynomial regression, only the regression is w/ spatial coordinates as explanatory variables & the errors are assumed autocorrelated, rather than independent. • If m(s) is a second order polynomial trend, subtract (detrend) it from the original data (Z(s)) & Krig the random errors (e(s)). • The mean of e(s) is 0. • The autocorrelation is now modeled from the random errors. • Consider parameters for semivariogram/covariance modeling (mathematically expresses autocorrelation), search neighborhoods, & cross validation. • May produce quantile, probability, prediction, or standard error maps.
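The detrending step can be sketched as an ordinary least-squares fit of a second order polynomial in the coordinates. This is a simplification offered as an assumption: it ignores the autocorrelation of the errors when fitting the trend, which a full Universal Kriging solution would account for. Function names and data are hypothetical.

```python
import numpy as np

def quadratic_design(coords):
    """Design matrix for a second order polynomial trend in x and y."""
    x, y = coords[:, 0], coords[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x * x, x * y, y * y])

def detrend(coords, z):
    """Fit m(s) by least squares; return (trend, residuals e(s), coefficients)."""
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(z, dtype=float)
    X = quadratic_design(coords)
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    trend = X @ beta
    return trend, z - trend, beta

# Synthetic data: a known quadratic surface plus random noise.
rng = np.random.default_rng(42)
demo_coords = rng.uniform(0.0, 10.0, size=(60, 2))
demo_z = (2.0 + 0.3 * demo_coords[:, 0] + 0.05 * demo_coords[:, 1] ** 2
          + rng.normal(0.0, 0.1, size=60))
demo_trend, demo_residuals, demo_beta = detrend(demo_coords, demo_z)
```

The residuals have mean 0 by construction (the fit includes an intercept); they are what would be Kriged, with the fitted trend added back to the predictions afterward.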
  • 112. • Cokriging – a family of Kriging methods using multiple variables. • Considers different trends on multiple variables (Z1(s) & Z2(s), e.g. ozone & particulate matter) & measures autocorrelation of the errors within each variable & cross-correlation between the errors e1(s) & e2(s) of the two variables. • Ordinary Cokriging models: • Z1(s) = m1 + e1(s) • Z2(s) = m2 + e2(s) where, m1 & m2 = unknown constants. e1(s) & e2(s) = random errors. Z1(s) & Z2(s) = predicted measurement variables.
  • 113. • Requires estimating autocorrelation for each variable & the cross-correlation between them. • Each estimate of autocorrelation introduces additional variability. Gains in precision of predictions may not be defensible. • Consider parameters for semivariogram/covariance (mathematically expresses autocorrelation), search neighborhoods, cross-covariance (mathematically expresses cross-correlation), cross validation, & detrending. • May produce prediction, probability, quantile, standard error of indicators, or standard error maps.
  • 114. Cross-covariance – Correlation between spatial random variables of different types with correlation dependent upon distance &/or direction separating the locations (e.g. NO2 & Ozone). • Positive cross-covariance – when both variables are above their respective means. • Negative cross-covariance – when one variable is above its mean & the other is below its mean. • Cross-covariance cloud – shows empirical cross-covariance for all pairs of locations between two datasets placed in bins as a function of distance between the two locations.
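A cross-covariance cloud and its distance bins can be sketched as below. This is an illustrative assumption of how such a cloud may be computed (products of deviations from each dataset's mean), not the exact Geostatistical Analyst procedure; it bins by distance only and ignores direction. All names are hypothetical.

```python
import numpy as np

def cross_covariance_cloud(coords1, z1, coords2, z2):
    """Distance and cross-covariance product for every pair between two datasets."""
    coords1 = np.asarray(coords1, dtype=float)
    coords2 = np.asarray(coords2, dtype=float)
    dev1 = np.asarray(z1, dtype=float) - np.mean(z1)
    dev2 = np.asarray(z2, dtype=float) - np.mean(z2)
    dist = np.linalg.norm(coords1[:, None, :] - coords2[None, :, :], axis=2)
    # Positive when both values sit on the same side of their means.
    products = dev1[:, None] * dev2[None, :]
    return dist.ravel(), products.ravel()

def bin_cloud(dist, products, lag_size, n_lags):
    """Average the cloud within distance bins; empty bins are returned as NaN."""
    bin_index = np.floor(dist / lag_size).astype(int)
    means = np.full(n_lags, np.nan)
    for k in range(n_lags):
        in_bin = bin_index == k
        if in_bin.any():
            means[k] = products[in_bin].mean()
    return means

# Two made-up variables measured at the same two locations.
locations = [[0.0, 0.0], [1.0, 0.0]]
cloud_dist, cloud_prod = cross_covariance_cloud(locations, [1.0, 3.0],
                                                locations, [10.0, 30.0])
binned = bin_cloud(cloud_dist, cloud_prod, lag_size=0.5, n_lags=3)
```

Lag size and number of lags play the same role here as in the dialog: together they set how far the bins extend and how finely the cloud is averaged.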
  • 115. • Bins are averaged & smoothed to create a cross-covariance surface. Controlled by the lag size and number of lags. • A global outlier (unusually high value) identified in the histogram is also visible with all pairings of points in the semivariogram cloud, regardless of distance. • Two distinct groups of points occur in the semivariogram cloud for global outliers. • A local outlier will have high semivariogram values (approaching the top of the y axis) associated with a single point at close distances (left side along x axis) in the semivariogram cloud.
  • 117. • Both types of outliers can have detrimental effects on a prediction surface. • Use the Show Search Direction box under the Semivariogram/Covariance Cloud dialog, and rotate or type in angle direction, angle tolerance, bandwidth, and lag.
  • 118. Cross-Validation & Validation • Seeks to establish how well any surfacing model predicts the values at unknown locations. • Cross Validation – uses all data points to estimate the autocorrelation model. • Each data location is selected, & the remaining known locations are used to predict its value. • The predicted & actual (known measurement) are compared & graphed. • Ideally, they should be close to the same value. • Unusual points are set aside, & may require refitting the autocorrelation model.
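The leave-one-out loop described on this slide can be sketched with any point predictor. Inverse distance weighting (IDW) is swapped in here only because it is short; it is a stand-in, not a Kriging predictor, and it has no autocorrelation model to fit. Function names and the power parameter are assumptions.

```python
import numpy as np

def idw_predict(coords, values, target, power=2.0):
    """Inverse-distance-weighted prediction at one target location."""
    d = np.linalg.norm(coords - target, axis=1)
    if np.any(d == 0):
        # Target coincides with a sample: return that sample's value.
        return float(values[d == 0][0])
    w = 1.0 / d ** power
    return float(w @ values / w.sum())

def leave_one_out(coords, values, predict=idw_predict):
    """Remove each point in turn and predict it from the remaining points."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    predictions = np.empty(len(values))
    for i in range(len(values)):
        keep = np.arange(len(values)) != i
        predictions[i] = predict(coords[keep], values[keep], coords[i])
    return predictions

# Three made-up collinear samples.
cv_coords = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
cv_values = [0.0, 1.0, 2.0]
cv_predictions = leave_one_out(cv_coords, cv_values)
cv_errors = cv_predictions - np.asarray(cv_values)
```

Plotting the predictions against the measured values gives the scatterplot used to judge the model; points far from the 1:1 line are the unusual ones to inspect.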
  • 119. • Validation – creates a model for a subset of the data set. • Used to check the validity of a protocol of decisions. • Choice of semivariogram model, lag size, search neighborhood. • Predicted plot – scatterplot of predicted vs. measured values. • Kriging tends to underpredict large values, & overpredict small values. • Thus, the slope of the fitted (solid blue) line is usually less than 1, the slope of the black dashed line. • Error plot – same as the predicted plot, except true values are subtracted from the predicted values.
  • 123. • Standardized Error plot – measurement values are subtracted from the predicted values, then divided by the estimated Kriging standard errors. • All three plots show strength of Kriging predictions. • With autocorrelation & a good model, the predicted line (blue) should be close to the measured (dashed black) line. • If all the data were independent (no autocorrelation), all predictions would equal the mean of the measured data, & the plot would be a horizontal line.
  • 125. • QQPlot – plots quantiles of the standardized errors (the difference between predicted & measurement values divided by the estimated Kriging standard errors) against quantiles from the standard normal distribution. • If the prediction errors from the observed (measured) values are normally distributed, points should lie close to the dashed black line. • In that case, probability and quantile maps from Ordinary, Simple, & Universal Kriging methods, which rely on a normal distribution, can be used with confidence.
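The points of such a QQ plot can be sketched with the standard library. The plotting positions (i + 0.5) / n are an assumption chosen for simplicity; other conventions exist and the software may use a different one. The function name is hypothetical.

```python
from statistics import NormalDist

def qq_points(measured, predicted, std_errors):
    """Pairs of (standard normal quantile, sorted standardized error)."""
    standardized = sorted(
        (p - m) / s for m, p, s in zip(measured, predicted, std_errors)
    )
    n = len(standardized)
    # Assumed plotting positions: the midpoint of each of n equal probability bands.
    normal_q = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(normal_q, standardized))

# Made-up measured values, predictions, and Kriging standard errors.
qq = qq_points([1.0, 2.0, 3.0], [2.0, 2.0, 2.0], [1.0, 1.0, 1.0])
```

If the standardized errors are close to standard normal, the pairs fall near the line y = x.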
  • 127. Summary of Cross-Validation & Validation Analysis • Predictions should be unbiased w/ a mean prediction error near 0. • Standardized prediction errors – prediction errors divided by their prediction standard errors to standardize (scale). The mean standardized prediction error should approach 0. • RMS prediction error (Root-Mean-Square in dialog box) should be small. Shorter green lines indicate closer predictions to their true values.
  • 128. • Average (estimated) standard errors should be close to RMS prediction errors to correctly assess the variability in a prediction. • If average standard errors are > RMS prediction errors, variability is overestimated. • If average standard errors are < RMS prediction errors, variability is underestimated. • If RMS standardized errors are >1, variability is underestimated. • If RMS standardized errors are <1, variability is overestimated.
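The summary statistics from the last two slides can be computed as in the sketch below. The function name and dictionary keys are hypothetical, and the average standard error is taken as the root of the mean squared standard errors, which is an assumption about the definition used in the dialog box.

```python
import numpy as np

def cv_summary(measured, predicted, std_errors):
    """Cross-validation summary statistics for a set of predictions."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    errors = predicted - measured
    standardized = errors / std_errors
    return {
        "mean_error": errors.mean(),                       # should be near 0
        "rms_error": np.sqrt(np.mean(errors ** 2)),        # should be small
        # Assumed definition: root of the mean squared standard errors.
        "average_standard_error": np.sqrt(np.mean(std_errors ** 2)),
        "mean_standardized_error": standardized.mean(),    # should be near 0
        "rms_standardized_error": np.sqrt(np.mean(standardized ** 2)),  # near 1
    }

def variability_assessment(summary):
    """Apply the slide's rule of thumb to the RMS standardized error."""
    rmss = summary["rms_standardized_error"]
    if rmss > 1:
        return "underestimated"
    if rmss < 1:
        return "overestimated"
    return "correctly assessed"

# Made-up values for illustration.
demo_summary = cv_summary([1.0, 2.0, 3.0], [2.0, 2.0, 2.0], [1.0, 1.0, 1.0])
demo_assessment = variability_assessment(demo_summary)
```

In this made-up case the average standard error (1.0) exceeds the RMS prediction error (about 0.82), so both rules agree that variability is overestimated.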
  • 129. Cross Validation & Validation Comparison • Assesses two surfacing models (e.g. IDW vs. Ordinary Kriging) w/ statistics & scatterplots for each model in a single dialog box. • Generally, the best model has the parameters as per those listed in the previous summary. • Average (estimated) standard errors should be close to RMS prediction errors for a more valid model. This comparison supersedes RMS prediction error alone: a model that satisfies it may be preferred even if the second model has a lower RMS prediction error.
  • 134. Measurement Error • Occurs when it is possible to have several different observations at the same location. • E.g. a soils series description from a survey document & actual field measurements. • Measurement error model is expressed as: • Z(s) = m(s) + e(s) + d(s) where, • Z(s) = a variable of interest. • d(s) = measurement error. • m(s) = deterministic trend. • e(s) = autocorrelated errors (microscale variation or nugget effect in this model).
  • 135. • W/ no measurement error, Kriging is an exact interpolator. • When errors exist, consider predicting only the filtered value (m(s) + e(s)) that does not contain the measurement error (d(s)). • It follows, where data have been collected (actually measured), the filtered values will not be the same as the measured values. • Thus the final map will be more smoothed, & Kriging becomes an inexact interpolator.
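The difference between exact and filtered prediction can be sketched with Simple Kriging and a known constant mean. This is an illustration under stated assumptions: the exponential covariance model, its parameters, and the function names are all hypothetical, and the measurement error variance is added only to the data-to-data covariances, so the prediction targets the error-free value m(s) + e(s).

```python
import numpy as np

def exponential_cov(h, sill=1.0, range_=1.0):
    """Exponential covariance of the autocorrelated part e(s) (assumed model)."""
    return sill * np.exp(-np.asarray(h, dtype=float) / range_)

def simple_kriging(coords, values, target, mean, meas_var=0.0,
                   cov=exponential_cov):
    """Simple Kriging prediction of the filtered value at one target location."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    target = np.asarray(target, dtype=float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    d0 = np.linalg.norm(coords - target, axis=1)
    # Measurement error d(s) inflates the variance of each observation only.
    C = cov(d) + meas_var * np.eye(len(values))
    weights = np.linalg.solve(C, cov(d0))
    return float(mean + weights @ (values - mean))

# Made-up samples along a line; predict at the middle sample's own location.
me_coords = [[0.0], [1.0], [2.0]]
me_values = [1.0, 5.0, 2.0]
exact_at_sample = simple_kriging(me_coords, me_values, [1.0], mean=2.0)
filtered_at_sample = simple_kriging(me_coords, me_values, [1.0], mean=2.0,
                                    meas_var=0.5)
```

With no measurement error the prediction reproduces the sample value of 5.0; with a measurement error variance the prediction at the same location is pulled away from the measured value, which is the smoothing the slide describes.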