Digital soil mapping uses statistical methods and environmental data to predict soil properties across continuous landscapes. It involves preparing soil data and predictor variables like climate, vegetation and remote sensing data. Predictor data is harmonized using techniques like principal components analysis. Soil data is also harmonized by estimating mean values at standard depth intervals. Regression models are selected to relate soil properties to predictors and create continuous prediction maps. Maps are validated and uncertainty is estimated using confidence intervals or bootstrapping. The process is implemented using the R programming language and specialized soil mapping packages.
3. Soil mapping – creation of cartographic models of soils and
soil properties
4. Pedometrics – application of mathematical and statistical
methods for the study of the distribution and genesis of soils
5. Digital Soil Mapping (DSM)
Digital Soil Mapping (also called pedometric mapping or predictive soil
mapping) is the data-driven generation of soil property and
class maps based on the use of statistical methods
(T. Hengl, 2003)
6. Spatial phenomena
Spatial phenomena can generally be thought of as either:
● discrete objects with clear boundaries (e.g. river, road,
town)
● or as a continuous phenomenon that can be observed
everywhere, but does not have natural boundaries
(e.g. elevation, temperature, and air quality).
A lake is a discrete object: it has clear boundaries.
Elevation is a continuous phenomenon: it exists at every point.
7. Spatial phenomena
Questions:
● Is soil a discrete or continuous phenomenon?
● Are soil properties (carbon, texture, pH, etc.) discrete or continuous?
● Are soil measurements discrete or continuous?
8. Spatial phenomena
Answers:
● Soil is a continuous phenomenon. It covers almost the entire land surface.
● However, we classify soils into discrete soil types that have boundaries.
9. Spatial phenomena
Answers:
● Most soil properties are continuous. They exist at every point of the soil.
● Soil measurements are discrete. We sample soil and measure soil properties at discrete locations.
10. We have:
● Discrete soil observations as point data at sampling locations
We need:
● Continuous estimates of soil properties at every point of the land surface
Task of predictive soil mapping:
● To predict continuous soil properties (or soil classes) at every point of the land surface based on discrete measurements from soil sampling.
But how do we do it?
Digital Mapping of Soil Properties
12. Gridded maps (rasters)
• Gridded maps (rasters) are well suited to representing continuous data, such as soil properties.
• A raster is a regular grid of cells (pixels) with a soil property value in each cell.
• Raster resolution defines the cell size and the level of spatial detail.
• Higher resolution = smaller cell size = finer spatial detail = bigger files = higher computational demands.
• E.g. a cell size of 1x1 km is suitable for global and national mapping.
• Most common raster data format: GeoTIFF (.tif, .tiff).
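A minimal sketch of the raster idea, written in Python purely for illustration (the course itself works in R): a soil property is stored in a regular grid, and the value at a map coordinate is found by working out which cell contains it. The grid values, the upper-left corner coordinates and the function name `cell_value` are all invented for this example.

```python
# Hypothetical 3 x 4 raster of topsoil organic carbon (%); values are invented.
grid = [
    [1.2, 1.4, 1.1, 0.9],
    [1.6, 1.8, 1.3, 1.0],
    [2.1, 2.0, 1.5, 1.2],
]
x_min, y_max = 500000.0, 4200000.0  # assumed upper-left corner of the raster (m)
cell_size = 1000.0                  # 1 x 1 km cells


def cell_value(x, y):
    """Return the raster value of the cell that contains point (x, y)."""
    col = int((x - x_min) // cell_size)   # columns count eastwards from x_min
    row = int((y_max - y) // cell_size)   # rows count southwards from y_max
    if not (0 <= row < len(grid) and 0 <= col < len(grid[0])):
        raise ValueError("point lies outside the raster extent")
    return grid[row][col]


value = cell_value(501500.0, 4198700.0)  # falls in row 1, column 1
print(value)
```

Halving the cell size in this layout would quadruple the number of cells covering the same area, which is where the larger files and higher computational demands come from.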
15. V. Dokuchaev (1883):
The genesis and evolution of soils are the result of the interaction of a number of environmental parameters:
● Climate
● Organisms
● Parent Material
● Relief
● Time
Drivers of Soil Formation
16. State equation of soil formation
Hans JENNY (1941):
● Conceptualization of soil formation as a state equation.
● Soil and soil properties are a function of a number of
environmental parameters named soil forming factors:
S = ƒ(cl, o, r, p, t) (CLORPT model)
where cl = climate, o = organisms, r = relief, p = parent material, t = time
17. Digital Soil Mapping (DSM)
Definition of Digital Soil Mapping (DSM)
The creation and population of spatial soil information systems by numerical
models inferring the spatial and temporal variations of soil types and soil properties
from soil observations and knowledge and from related environmental variables
(Lagacherie and McBratney, 2007).
McBRATNEY, 2012: Conceptualization of forming factors. Soil and
soil properties are a function of a number of environmental
parameters named soil forming factors:
S = ƒ(s, c, o, r, p, a, n) + ε (SCORPAN model)
where S = soil attribute to predict, ƒ = function, s = soil properties, c = climate, o = organisms, r = relief, p = parent material, a = age, n = location, ε = residuals (errors)
21. R – a powerful and versatile tool for DSM
• R is a programming language that lets anyone develop scripts with maximum flexibility;
• It is free, which enables scientific work even in organizations with limited budgets;
• Full access to algorithms;
• Possibility to modify existing functions and packages;
• Developed by a huge community of experts in many different fields;
• More than 10,000 R packages available for download;
• Lots of free learning material.
22. R packages (examples)
aqp
● Algorithms for quantitative pedology
● We will use it to restructure our soil dataset into a soil profile collection
GSIF
● Tools, functions and sample datasets for digital soil mapping, e.g. depth harmonization
raster
● Reading, writing, manipulating, analyzing and modeling of gridded spatial data (raster data)
rgdal
● Provides access to the Geospatial Data Abstraction Library (GDAL) and to projection/transformation operations from the PROJ.4 library
soilassessment
● Functions used in digital mapping of soil properties
23. Digital Soil Mapping in R
1. Prepare Predictors
• Relief
• Climate
• Vegetation
• Geology
• Remote sensing data
24. Digital Soil Mapping in R
2. Harmonize Predictors
Removing collinearity using Principal Components Analysis
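To illustrate why principal components remove collinearity, here is a small Python sketch (illustration only; the course uses R) for the simplest possible case of two standardized predictors, where the components can be written in closed form. The elevation and temperature values are invented, and a real workflow would run a PCA routine over many covariate rasters rather than this two-variable shortcut.

```python
import math

# Two hypothetical, strongly correlated predictors at five locations
elevation = [120.0, 340.0, 560.0, 800.0, 1010.0]   # m
temperature = [18.9, 17.6, 16.1, 14.8, 13.2]       # deg C


def standardize(values):
    """Centre to mean 0 and scale to sample standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def covariance(a, b):
    """Sample covariance of two variables that are already centred."""
    return sum(x * y for x, y in zip(a, b)) / (len(a) - 1)


z1 = standardize(elevation)
z2 = standardize(temperature)
r = covariance(z1, z2)  # for standardized variables this is the correlation

# The correlation matrix [[1, r], [r, 1]] has eigenvectors (1, 1)/sqrt(2)
# and (1, -1)/sqrt(2) with eigenvalues 1 + r and 1 - r.
comp_sum = [(a + b) / math.sqrt(2) for a, b in zip(z1, z2)]
comp_diff = [(a - b) / math.sqrt(2) for a, b in zip(z1, z2)]

var_sum = covariance(comp_sum, comp_sum)      # equals 1 + r
var_diff = covariance(comp_diff, comp_diff)   # equals 1 - r
cross = covariance(comp_sum, comp_diff)       # zero up to rounding

print("correlation of predictors:", round(r, 3))
print("share of variance per component:", var_sum / 2, var_diff / 2)
print("covariance between components:", cross)
```

The two original predictors carry almost the same information, while the two components are uncorrelated and one of them holds nearly all of the variance, so the other can be dropped with little loss.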
25. Digital Soil Mapping in R
3. Prepare soil data
• Identifiers
• Coordinates
• Depth layers
(horizons)
• Measured soil
properties
26. Digital Soil Mapping in R
3. Prepare soil data
See Lesson 2 – Data Organization and Software installation
27. Digital Soil Mapping in R
4. Harmonize soil data
● Profile data has soil parameters measured for every horizon (depth layer).
● We need to estimate the mean value for each target depth interval, e.g. 0-30 cm, 30-100 cm.
● For that we can use equal-area splines. This technique fits continuous depth functions that model the variability of soil properties with depth.
Depth harmonization
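The goal of depth harmonization can be sketched with a deliberately simpler stand-in for equal-area splines: a thickness-weighted mean of the horizons that overlap the target interval. This is not the spline method named above, only an illustration of what "mean value for 0-30 cm" means. The profile values and the function name `depth_weighted_mean` are hypothetical, and Python is used for illustration only.

```python
# Hypothetical soil profile: (top depth cm, bottom depth cm, organic carbon %)
horizons = [
    (0, 12, 2.4),
    (12, 35, 1.5),
    (35, 80, 0.6),
    (80, 120, 0.3),
]


def depth_weighted_mean(horizons, top, bottom):
    """Mean value over [top, bottom] cm, weighting each horizon by the
    thickness of its overlap with the target interval."""
    weighted_sum = 0.0
    covered = 0.0
    for h_top, h_bottom, value in horizons:
        overlap = min(bottom, h_bottom) - max(top, h_top)
        if overlap > 0:
            weighted_sum += overlap * value
            covered += overlap
    if covered == 0:
        return None  # the profile has no data in this interval
    return weighted_sum / covered


oc_0_30 = depth_weighted_mean(horizons, 0, 30)
oc_30_100 = depth_weighted_mean(horizons, 30, 100)
print(oc_0_30, oc_30_100)
```

Unlike this step-wise average, an equal-area spline produces a smooth depth function while preserving the mean value of each horizon.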
28. Digital Soil Mapping in R
4. Harmonize soil data
Statistical distribution
● The normal distribution, also known as the Gaussian distribution, is a probability distribution that is symmetric about the mean: data near the mean occur more frequently than data far from the mean.
The Normal Distribution has:
● mean = median = mode
● symmetry about the center
● 50% of values less than the mean and 50% greater
than the mean
Statistical analyses often assume that the data are normally distributed. If they are not, it may be useful to transform the data.
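As a hedged illustration of such a transformation, the Python sketch below (illustration only; the course uses R) applies a natural-log transform to an invented, right-skewed set of organic carbon values and compares the skewness before and after. The log transform is one common choice, assumed here rather than prescribed by the slides, and it only works for strictly positive values.

```python
import math


def skewness(values):
    """Moment-based skewness: 0 for a perfectly symmetric sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed organic carbon measurements (%)
oc = [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.1, 1.4, 2.2, 6.5]

log_oc = [math.log(v) for v in oc]      # transform before modelling
skew_raw = skewness(oc)
skew_log = skewness(log_oc)

# Predictions made on the log scale are back-transformed with exp()
back_transformed = [math.exp(v) for v in log_oc]

print("skewness before:", round(skew_raw, 2))
print("skewness after: ", round(skew_log, 2))
```

The transformed values are closer to symmetric, although a single large value still leaves some skew, so it is worth checking a histogram rather than assuming the transform has worked.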
31. Regression
● Regression is a statistical method that allows us to summarize and study relationships between two variables:
● variable X is regarded as the predictor, explanatory, or independent variable;
● variable Y is regarded as the response, outcome, or dependent variable.
The goal is to build a mathematical formula that defines Y as a function of the X variable.
Once we have built a statistically significant model, we can use it to predict outcomes for new X values.
32. Example: linear regression
The mathematical formula of linear regression can be written as follows:
Y = b0 + b1·X + e
● the best-fit regression line is in blue
● the intercept (b0) and the slope (b1) are shown in green
● the residuals (errors) e are represented by vertical red lines
Multiple linear regression can have several predictors (X variables):
Y = b0 + b1·X1 + b2·X2 + ... + bn·Xn + e
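A minimal sketch of fitting the intercept b0 and slope b1 by ordinary least squares, in Python for illustration only (in R this is what `lm()` does). The precipitation and organic carbon values are invented, and `fit_line` is a hypothetical helper name.

```python
def fit_line(x, y):
    """Ordinary least squares estimates of intercept b0 and slope b1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical data: annual precipitation (mm) as X, organic carbon (%) as Y
precip = [300.0, 450.0, 600.0, 750.0, 900.0]
oc = [0.8, 1.1, 1.5, 1.7, 2.2]

b0, b1 = fit_line(precip, oc)

# Residuals e: observed value minus the value on the fitted line
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(precip, oc)]

# Prediction for a new X value
predicted = b0 + b1 * 700.0

print("intercept:", b0, "slope:", b1)
print("predicted OC at 700 mm:", predicted)
```

With an intercept in the model, the least-squares residuals always sum to zero, which is a quick sanity check on any fit.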
33. Regression
However, most relationships in nature are not linear!
Relationships between soil
properties and environmental
factors can be very complicated
and require a complex model.
35. Digital Soil Mapping in R
9. Estimate uncertainty
● Uncertainty is an acknowledgement of error: we are aware that our representation of reality may differ from reality, and we express this by being uncertain.
● In the presence of uncertainty, we cannot identify a single true value for each pixel of the map.
● But we can identify all possible values and a probability for each one, that is, characterise the uncertain variable with a probability distribution.
● If the distribution is normal, it is easy to construct a confidence interval: e.g. we are 95% confident that the true value lies within about 2 (more precisely 1.96) standard deviations of the mean (the prediction).
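For the normal case, the interval for a single pixel can be sketched as follows (Python for illustration only). The predicted value and its standard deviation are invented; in practice they would come from the fitted model.

```python
from statistics import NormalDist

# Hypothetical prediction and standard deviation for one pixel (organic carbon, %)
prediction = 1.8
sd = 0.3

# Two-sided 95% interval: leave 2.5% in each tail of the normal distribution
z = NormalDist().inv_cdf(0.975)   # close to 1.96
lower = prediction - z * sd
upper = prediction + z * sd
width = upper - lower             # a wider interval means a less certain prediction

print("95% interval:", lower, "to", upper)
print("interval width:", width)
```

Repeating this for every pixel and mapping the interval width gives one simple form of uncertainty map.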
36. Digital Soil Mapping in R
9. Create uncertainty maps
If the distribution is not normal, a confidence interval can still be constructed through bootstrapping.
95% confidence interval: we expect 95% of the unknown values to lie within the prediction interval.
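The resampling idea behind bootstrapping can be sketched on a small scale: the Python example below (illustration only) builds a 95% percentile interval for the mean of an invented, skewed sample. A mapping workflow would instead refit the prediction model on each resample and collect the per-pixel predictions, so treat this as a sketch of the principle and `bootstrap_interval` as a hypothetical helper.

```python
import random


def bootstrap_interval(values, n_resamples=2000, level=0.95, seed=42):
    """Percentile bootstrap interval for the mean of `values`."""
    rng = random.Random(seed)          # fixed seed so the result is repeatable
    n = len(values)
    means = []
    for _ in range(n_resamples):
        # Resample with replacement, same size as the original sample
        resample = [rng.choice(values) for _ in range(n)]
        means.append(sum(resample) / n)
    means.sort()
    alpha = (1 - level) / 2
    lower = means[int(alpha * n_resamples)]
    upper = means[int((1 - alpha) * n_resamples) - 1]
    return lower, upper


# Hypothetical right-skewed organic carbon measurements (%)
oc = [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.1, 1.4, 2.2, 6.5]

lower, upper = bootstrap_interval(oc)
print("95% bootstrap interval for the mean:", lower, "to", upper)
```

Because the interval is read directly from the sorted resampled means, it can be asymmetric around the estimate, which is exactly what a skewed distribution calls for.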