Predictive Habitat Distribution Models, Leire Ibaibarriaga

EURO-BASIN, www.euro-basin.eu
Introduction to Statistical Modelling Tools for Habitat Models Development, 26-28th Oct 2011

2

OUTLINE
• Why model?

• Habitat models

• Model properties

• Steps for modelling

• What about data?

3

WHY MODEL?
• “All models are wrong, but some are useful” (G. Box)

• Models are how we understand the world:
We see the world through models
We learn about the world using formal descriptions

• Model types:

– Static vs dynamic
– Explanatory vs predictive
– Deterministic vs stochastic
– Discrete vs continuous

4

HABITAT MODELS
• Habitat models focus on how environmental factors control
the distribution of species and communities.

• Multiple applications:

– Biogeography, impact of global change, management,
conservation, ecology, …

• New conceptual and operational advances due to the growth in
computing power, e.g. GIS, remote sensing, new (computer-intensive)
statistical modelling tools, etc.

5

MODEL PROPERTIES
Some desirable model properties:

• Parsimony (Occam’s razor): “All things being equal, the simplest
solution tends to be the best one”
• Tractability: easy to analyse
• Conceptually insightful: reveals fundamental properties
• Generalizability: can be applied to other situations/species/…
• Empirical consistency: consistent with the available data
• Falsifiability: can be tested by observations
• Predictive precision

6

MODEL PROPERTIES

Predictive habitat
distribution models

Levins (1966); Sharpe (1990); Guisan and Zimmermann (2000)

7

MODEL PROPERTIES

COMPLEXITY

GENERALITY

The most complex model is not necessarily the best…

8

STEPS FOR MODELLING
1) Conceptual phase

2) Model formulation

3) Model calibration

4) Spatial predictions

5) Model evaluation

6) Model applicability

9

STEPS FOR MODELLING

Guisan and Zimmermann (2000)

10

1. Conceptual phase
• A theoretical model should be in mind before a statistical
model is even considered
• This phase includes:
– Literature review
– Define an up-to-date conceptual model
– Set multiple hypotheses
– Assess available and missing data
– Identify an appropriate sampling strategy for new data
– Choose appropriate spatio-temporal resolution and geographic
extent
– Identify the most appropriate statistical methods for the other
phases

11

STEPS FOR MODELLING


12

2. Model formulation
• The model depends on the type of response variable and its
associated probability distribution

Distribution         Examples
Gaussian             Biomass
Poisson              Individual counts
Negative Binomial    Individual counts
Multinomial          Communities
Binomial             Presence/absence
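
The table above can be read as a rough decision rule. The sketch below is only an illustration of that reading: the function name `suggest_distribution` and the "variance more than twice the mean" cut-off for overdispersed counts are assumptions made here, not part of the course material. The multinomial case (communities) is multivariate and is left out.

```python
from statistics import mean, pvariance

def suggest_distribution(values):
    """Suggest a candidate distribution for a univariate numeric response.

    Rules of thumb only; the final choice needs ecological and
    statistical judgement.
    """
    if all(v in (0, 1) for v in values):
        return "Binomial"  # presence/absence
    if all(float(v).is_integer() and v >= 0 for v in values):
        # Counts: the Poisson assumes variance equal to the mean.
        # A much larger variance (overdispersion) points towards the
        # Negative Binomial. The factor 2 is an arbitrary assumption.
        if pvariance(values) > 2 * mean(values):
            return "Negative Binomial"
        return "Poisson"
    return "Gaussian"  # continuous response, e.g. biomass
```

For instance, a vector of zeros and ones is flagged as binomial, while counts with a few very large values among many zeros are flagged as negative binomial.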

13



14

REGRESSION ANALYSIS (2. Model formulation)

[Figure: plot of y (0 to 50) against x (0 to 10)]

oct-11 © AZTI-Tecnalia

15


[Figure: plot of y (0 to 50) against x (0 to 10)]

16


[Figure: plot of y (-5 to 10) against x (0.0 to 1.0)]

17


[Figure: plot of y (-5 to 10) against x (0.0 to 1.0)]

18


[Equation annotation: LINK FUNCTION]

The response variable y can follow distributions such as:
NORMAL, BINOMIAL, POISSON, GAMMA, etc.

McCullagh and Nelder (1989); Dobson (2008)
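
To make the role of the link function concrete, here is a minimal sketch of a Poisson regression with a log link, log(E[y]) = b0 + b1·x, fitted by iteratively reweighted least squares. The function name `fit_poisson_glm`, the single-predictor restriction and the noise-free demo data are assumptions chosen for illustration; in practice a statistical package would be used.

```python
import math

def fit_poisson_glm(x, y, n_iter=25, tol=1e-10):
    """Fit log(E[y]) = b0 + b1*x by iteratively reweighted least squares."""
    # Start from the data; the small offset keeps log() defined when y == 0.
    eta = [math.log(yi + 0.1) for yi in y]
    b0 = b1 = 0.0
    for _ in range(n_iter):
        mu = [math.exp(e) for e in eta]
        # Working response and weights for the log link.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        w = mu
        # Weighted least squares for a straight line (2x2 normal equations).
        sw = sum(w)
        sx = sum(wi * xi for wi, xi in zip(w, x))
        sz = sum(wi * zi for wi, zi in zip(w, z))
        sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        sxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = sw * sxx - sx * sx
        new_b1 = (sw * sxz - sx * sz) / det
        new_b0 = (sz - new_b1 * sx) / sw
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        eta = [b0 + b1 * xi for xi in x]
        if converged:
            break
    return b0, b1

# Hypothetical noise-free data generated with b0 = 0.5 and b1 = 0.3.
x = [float(i) for i in range(11)]
y = [math.exp(0.5 + 0.3 * xi) for xi in x]
b0, b1 = fit_poisson_glm(x, y)
print(round(b0, 3), round(b1, 3))
```

The linear predictor lives on the log scale, so fitted means exp(b0 + b1·x) are always positive, which is what makes this link suitable for counts.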

19


[Equation annotations: LINK FUNCTION, SMOOTHS]

The response variable y can follow distributions such as:
NORMAL, BINOMIAL, POISSON, GAMMA, etc.

Hastie and Tibshirani (1990); Wood (2006)

20


Linear model (LM)                 Additive model (AM)

Generalized linear model (GLM)    Generalized additive model (GAM)
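
The four model classes differ only in whether a link function is applied to the mean and whether the predictors enter linearly or through smooth functions. The notation below is an assumption made for this summary (g is the link function, the f_j are smooth functions, the x_j are the explanatory variables); it is the standard textbook form rather than a transcription of the slide.

```latex
\begin{align*}
\text{LM:}  \quad & E[y] = \beta_0 + \sum_{j=1}^{p} \beta_j x_j \\
\text{AM:}  \quad & E[y] = \beta_0 + \sum_{j=1}^{p} f_j(x_j) \\
\text{GLM:} \quad & g\bigl(E[y]\bigr) = \beta_0 + \sum_{j=1}^{p} \beta_j x_j \\
\text{GAM:} \quad & g\bigl(E[y]\bigr) = \beta_0 + \sum_{j=1}^{p} f_j(x_j)
\end{align*}
```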


21

Other regression models:

• Mixed models: LM, GLM and GAMs including random effect
terms. Useful for meta-analysis.

• Quantile regression: the quantiles are modelled instead of
the mean. Useful for finding limiting factors

• Segmented regression: the model changes depending on a
partition of the explanatory variable. Useful for detecting
regime changes

• Spatial autocorrelation and autoregressive models
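
As an illustration of the segmented-regression idea, the sketch below searches for the single breakpoint that best splits the data into two regimes, fitting a separate least-squares line on each side. This is a simplified variant chosen here for brevity (two independent lines, no continuity constraint at the breakpoint); the function names and the demo data are hypothetical.

```python
def _line_sse(xs, ys):
    """Sum of squared residuals of an ordinary least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = my - slope * mx
    return sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))

def find_breakpoint(x, y, min_points=3):
    """Return the x value that best splits the data into two linear regimes."""
    pairs = sorted(zip(x, y))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    best_sse, best_break = float("inf"), None
    # Try every split that leaves at least min_points on each side.
    for i in range(min_points, len(xs) - min_points + 1):
        sse = _line_sse(xs[:i], ys[:i]) + _line_sse(xs[i:], ys[i:])
        if sse < best_sse:
            best_sse = sse
            best_break = (xs[i - 1] + xs[i]) / 2  # midpoint between regimes
    return best_break

# Hypothetical series with a regime change between x = 4 and x = 5.
x_demo = list(range(10))
y_demo = [0, 1, 2, 3, 4, 10, 13, 16, 19, 22]
breakpoint_x = find_breakpoint(x_demo, y_demo)
```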

23

CLASSIFICATION TECHNIQUES (2. Model formulation)
• Classification is the placement of species and/or sample units
into groups based on the environmental variables

• Many techniques are included: classification decision trees,
regression decision trees, rule-based classification,
maximum-likelihood classification

• Mainly two groups:
– Supervised classification: a training data set is required
(groups are known beforehand)
– Unsupervised classification: groups are unknown and need
to be defined, as in cluster analysis
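
A supervised classifier in its simplest form can be sketched as a decision tree with a single split (a "stump") on one environmental variable. The names `fit_stump` and `predict_stump`, the restriction to two groups, and the temperature example are assumptions made for illustration; real classification trees split recursively on several variables.

```python
def fit_stump(values, labels):
    """Find the threshold on one variable that best separates two groups.

    Returns (threshold, label_at_or_below, label_above).
    """
    classes = sorted(set(labels))
    assert len(classes) == 2, "this sketch handles two groups only"
    points = sorted(set(values))
    assert len(points) >= 2, "need at least two distinct values"
    best = None
    # Candidate thresholds: midpoints between consecutive observed values.
    for lo, hi in zip(points, points[1:]):
        threshold = (lo + hi) / 2
        for below, above in (classes, classes[::-1]):
            errors = sum(
                (below if v <= threshold else above) != lab
                for v, lab in zip(values, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, below, above)
    return best[1:]

def predict_stump(stump, value):
    """Assign a new observation to one of the two groups."""
    threshold, below, above = stump
    return below if value <= threshold else above

# Hypothetical training data: sea temperature and observed presence.
temperature = [8, 9, 10, 15, 16, 18]
observed = ["absent", "absent", "absent", "present", "present", "present"]
stump = fit_stump(temperature, observed)
```

The training labels are what make this supervised: the groups are known beforehand and the model only learns where to draw the boundary.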

25

ENVIRONMENTAL ENVELOPES (2. Model formulation)
• The environmental envelope of a species is defined as the set
of environments within which it is believed that the species can
persist (Walker and Cocks, 1991)

• Examples of models:

– BIOCLIM: minimal rectilinear envelopes based on
classification trees
– HABITAT: convex polytope envelopes based on
classification trees
– DOMAIN: based on multivariate distance metrics
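
The rectilinear-envelope idea can be sketched in a few lines: take the minimum and maximum of each environmental variable over the sites where the species was recorded, and call a new site suitable if it falls inside all of those ranges. This is a deliberately simplified illustration and not the BIOCLIM algorithm itself; the function names and the records are hypothetical.

```python
def fit_envelope(presence_records):
    """Per-variable (min, max) over sites where the species was observed."""
    variables = presence_records[0].keys()
    return {
        v: (min(r[v] for r in presence_records),
            max(r[v] for r in presence_records))
        for v in variables
    }

def inside_envelope(envelope, site):
    """True if every environmental variable of the site is within range."""
    return all(lo <= site[v] <= hi for v, (lo, hi) in envelope.items())

# Hypothetical presence records.
records = [
    {"temp": 12.0, "salinity": 34.5},
    {"temp": 15.5, "salinity": 35.2},
    {"temp": 14.0, "salinity": 35.0},
]
envelope = fit_envelope(records)
```

Only presence records are needed, which is one practical attraction of envelope methods when absences are unreliable.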

26

ORDINATION TECHNIQUES
• Ordination is the arrangement or ‘ordering’ of species and/or
sample units along gradients

• Usually applied to community data matrices (row: species,
column: samples, value: abundance)

27

ORDINATION TECHNIQUES
• Indirect gradient analysis (no environmental data used)
  – Distance-based approaches:
    • Polar ordination, Principal Coordinates Analysis, Nonmetric
      Multidimensional Scaling
  – Eigenanalysis-based approaches:
    • Linear model
      – Principal Components Analysis
    • Unimodal model
      – Correspondence Analysis, Detrended Correspondence Analysis
• Direct gradient analysis (environmental data used)
  – Linear model
    • Redundancy Analysis
  – Unimodal model
    • Canonical Correspondence Analysis, Detrended Canonical
      Correspondence Analysis

ter Braak and Prentice (1988)
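
As a small illustration of the eigenanalysis-based, linear case, the sketch below extracts the first principal component of a data matrix by power iteration on its covariance matrix. The function name, the convention that rows are sample units and columns are variables, and the toy data are assumptions made here; dedicated ordination software computes all axes at once.

```python
import math

def first_principal_component(rows, n_iter=200):
    """Unit vector along the direction of greatest variance."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix (p x p).
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(p)]
           for a in range(p)]
    # Power iteration: repeatedly multiply by cov and renormalise.
    v = [1.0] * p
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Toy data in which the second variable is exactly twice the first.
toy = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
pc1 = first_principal_component(toy)
```

Sample units can then be ordered along the gradient by projecting each centred row onto `pc1`.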

28

NEURAL NETWORKS
• Models inspired by the human brain (an interconnected group of
neurons)

• They define a non-linear function that is decomposed as a
weighted sum of functions, each of which can in turn be
decomposed further, and so on. The result is a complex
non-parametric model (a black box?)

• Adjusted by varying parameters, connection weights, or
specifics of the architecture such as the number of neurons or
their connectivity

• Few examples available yet
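
The "weighted sum of functions" structure can be shown with the forward pass of a network with one hidden layer. The sigmoid activation, the function names and the argument layout are assumptions chosen for this sketch; fitting the weights (training) is not shown.

```python
import math

def sigmoid(z):
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Output of a one-hidden-layer network for one observation.

    Each hidden neuron applies sigmoid to a weighted sum of the inputs;
    the output applies sigmoid to a weighted sum of the hidden neurons.
    """
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + b)
        for weights, b in zip(hidden_weights, hidden_biases)
    ]
    return sigmoid(sum(v * h for v, h in zip(output_weights, hidden)) + output_bias)
```

With environmental variables as inputs, the output can be read as a predicted probability of presence; changing the number of rows in `hidden_weights` changes the number of hidden neurons.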

29

STEPS FOR MODELLING


30

3. Model calibration
• It includes model fitting (finding the values of the unknown
parameters that give the best agreement between the data and the
model outputs) and model selection (deciding which explanatory
variables to include)

• Points to take into account:
– Use of predictors that are ecologically relevant: direct vs indirect
(proxy) variables
– Correlation between explanatory variables

• Each method has its own diagnostic tools, according to its
assumptions, e.g. the residual deviance in regression models
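
Correlation between explanatory variables can be screened before model selection. The sketch below lists the pairs of predictors whose absolute Pearson correlation exceeds a threshold. The function names, the 0.7 cut-off and the example values are assumptions made for illustration; the appropriate threshold depends on the method and the data.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def correlated_pairs(predictors, threshold=0.7):
    """Names of predictor pairs whose |r| exceeds the threshold."""
    return [
        (p, q)
        for p, q in combinations(sorted(predictors), 2)
        if abs(pearson(predictors[p], predictors[q])) > threshold
    ]

# Hypothetical predictors: temperature falls steadily with depth.
predictors = {
    "depth": [10, 20, 30, 40, 50],
    "temperature": [18, 16, 14, 12, 10],
    "chlorophyll": [1.0, 3.0, 0.5, 2.5, 1.2],
}
flagged = correlated_pairs(predictors)
```

When a pair is flagged, keeping the variable with the more direct ecological meaning is in line with the point about direct versus proxy predictors.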

31

STEPS FOR MODELLING


32

4. Spatial predictions

• Spatial predictions can be made on the data set used for calibration
or on new data sets. Care must be taken when predicting on a new
data set that contains new combinations of the explanatory variables
or values outside the range covered by the calibration data set

• GIS tools are very often used, but many statistical models are
still not implemented in a GIS environment
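
A simple safeguard against extrapolation is to flag, for each prediction site, the variables whose values fall outside the range seen during calibration. The sketch below does only that; the function names and data are hypothetical, and it checks each variable separately, so it does not detect new combinations of values that are individually within range.

```python
def calibration_ranges(calibration):
    """(min, max) of each explanatory variable in the calibration data."""
    return {name: (min(vals), max(vals)) for name, vals in calibration.items()}

def extrapolated_variables(ranges, new_site):
    """Variables whose value at new_site lies outside the calibration range."""
    return [
        name for name, (lo, hi) in ranges.items()
        if not lo <= new_site[name] <= hi
    ]

# Hypothetical calibration data and prediction sites.
ranges = calibration_ranges({"temp": [10, 12, 15], "depth": [20, 50, 100]})
warm_site = {"temp": 17, "depth": 60}
typical_site = {"temp": 11, "depth": 60}
```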

33

STEPS FOR MODELLING


34

5. Model evaluation
• The aim is to evaluate the predictive power of a model

• If only one data set is available (the one already used for
calibration), resampling methods are used: bootstrap, cross-validation,
jackknife

• If other data sets are available (independent of the calibration data
set), predicted and observed values are compared using:
– the same goodness-of-fit measure as used for model calibration
– any other measure of association

The data sets for calibration and evaluation are called, respectively,
the training and evaluation data sets. Sometimes the original single
data set is split into two (split-sample approach)
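
The split-sample approach and k-fold cross-validation both come down to partitioning the records at random. A minimal sketch follows; the function names, the default 30% evaluation fraction and the fixed random seed are assumptions made for illustration.

```python
import random

def split_sample(records, eval_fraction=0.3, seed=0):
    """Randomly split one data set into (training, evaluation) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_eval = round(len(shuffled) * eval_fraction)
    return shuffled[n_eval:], shuffled[:n_eval]

def k_folds(records, k=5, seed=0):
    """Yield (training, evaluation) pairs for k-fold cross-validation."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    for i in range(k):
        # Every k-th record, starting at position i, forms the evaluation fold.
        evaluation = shuffled[i::k]
        training = [r for j, r in enumerate(shuffled) if j % k != i]
        yield training, evaluation

# Hypothetical record identifiers.
records = list(range(20))
training, evaluation = split_sample(records)
```

In cross-validation every record is used for evaluation exactly once, which makes better use of a small data set than a single split.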

35

STEPS FOR MODELLING

APPLICABILITY


36

6. Model applicability
• It refers to the domain over which a validated model can be properly
used

• Potential uses (Decoursey, 1992):

– Screening

– Research

– Planning, monitoring and assessment

37

WHAT ABOUT DATA?
• Data is even more important than the model itself.

• Usually from multiple sources: surveys (continuous, stations, vertical
profiles), remote sensing, circulation models, …

• The scale of the response and the environmental variables might not
be the same, so a common scale unit needs to be defined. Sometimes
interpolation is needed, which can introduce additional
uncertainties

• Simple exploratory statistics and figures can be very useful before
even starting to think about a model. They also help to spot errors in
the data.
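
Bringing variables onto a common scale often means interpolating one of them at the positions where the other was sampled. The sketch below uses linear interpolation along one dimension; the function name and the temperature profile are hypothetical, and the method itself is just one simple option among many.

```python
def interpolate(x_known, y_known, x_new):
    """Linear interpolation of y at x_new.

    x_known must be strictly increasing. Values outside the observed
    range are refused rather than extrapolated.
    """
    if not x_known[0] <= x_new <= x_known[-1]:
        raise ValueError("x_new lies outside the observed range")
    points = list(zip(x_known, y_known))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x_new <= x1:
            return y0 + (y1 - y0) * (x_new - x0) / (x1 - x0)

# Hypothetical vertical temperature profile (depth in m, temperature in °C).
depths = [0, 10, 20, 50]
temperatures = [18.0, 17.0, 14.0, 11.0]
temp_at_15m = interpolate(depths, temperatures, 15)
```

The interpolated value is an estimate, not a measurement, which is the source of the additional uncertainty mentioned above.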
