A four dimensional analysis of agricultural data
1. CAR Models for Agricultural data
Margaret Donald
Joint work with Clair Alston, Chris Strickland, Rick Young, and
Kerrie Mengersen
Bayes on the Beach
October 6-7, 2011
2. Outline
Modelling agricultural data in three spatial dimensions & in a fourth dimension, time
1. Data
2. Modelling in 3 spatial dimensions
2.1 Random effects
2.2 Treatment effects
2.3 Results
3. Modelling in 4 dimensions
3.1 Model
3.2 Results
4. Selected References
3. Agriculture: Modelling in 3 dimensions
Data
Data are moisture measurements from a field experiment
designed to determine the cropping method least likely to lead to salinification.
Each record consists of treatment, moisture value, row, column, x co-ord, y co-ord, depth
and date.
108 sites × 15 depths = 1620 measurements per day, over 56 days, or 90720 moisture observations.
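The counts above follow directly from the design. A minimal Python sketch, assuming hypothetical 1-based integer labels for sites, depths and days (the labelling is not given on the slide), enumerates the observation index:

```python
from itertools import product

N_SITES, N_DEPTHS, N_DAYS = 108, 15, 56  # dimensions quoted on the slide

# Hypothetical labelling: sites, depths and days are simply numbered from 1.
daily_index = list(product(range(1, N_SITES + 1), range(1, N_DEPTHS + 1)))
full_index = [(t, i, d) for t in range(1, N_DAYS + 1) for (i, d) in daily_index]

print(len(daily_index))  # measurements per day
print(len(full_index))   # moisture observations in total
```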
The treatments were
Long Fallowing (3 phases-treatments)
Continuous cropping (1 treatment)
Response cropping (2 treatments)
Pastures (Lucerne, Lucerne mixture)
Pastures (native grasses)
The purpose was to determine the difference between long fallowing and response
cropping.
5. Agriculture: Modelling in 3 dimensions
Methods
CAR or kriging
Kriging models are
slow to converge, because at each MCMC iteration they
involve all the data
require matrix inversions
Here, with a complex regression model, they failed to converge
Conditional Autoregressive (CAR) models
deal with spatial auto-correlation using the notion of neighbour
thought of as ‘areal’ models
are easy to use
flexible
appropriate for dealing with localised spatial similarity
CAR models have been shown to be closely related to kriging (Rue and Tjelmeland, 2002;
Hrafnkelsson and Cressie, 2003; Besag and Mondal, 2005; Lindgren et al., 2010)
6. Agriculture: Modelling in 3 dimensions
Methods
Model for a single day
For site $i \in I$ at depth $d \in D$, the model is
$$y_{id} = \mu_{j(i)d} + \psi_{id} + \epsilon_{id}$$
where
$\mu_{j(i)d}$ is the treatment effect, $j$, at site, $i$, and depth, $d$
$\psi_{id}$ is the spatial residual at $(i, d)$
$\epsilon_{id}$ is the unstructured residual at $(i, d)$, with $\epsilon_{id} \sim N(0, \sigma^2)$
Relabel $\psi_{id}$ as $\psi_s$, where $s \in I \times D$ indexes points on the 3-dimensional
lattice; then the conditional probability of the spatial residual, $\psi_s$,
given its neighbours, $\psi_k$, is given by
$$\psi_s \mid \psi_k, k \in \partial s \sim N\left(\sum_{k \in \partial s} \frac{w_{sk}\psi_k}{w_{s+}}, \; \frac{\gamma^2}{w_{s+}}\right)$$
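As an illustration of the conditional distribution above, the following Python sketch computes its mean and variance for one lattice point. It assumes the usual convention that $w_{s+}$ is the sum of the weights $w_{sk}$ over the neighbours of $s$ (the slide does not define it); the function name and the toy values are hypothetical.

```python
def car_conditional(s, psi, neighbours, weights, gamma2):
    """Mean and variance of psi_s given its neighbours under the CAR model."""
    w_plus = sum(weights[(s, k)] for k in neighbours[s])  # w_{s+}, assumed sum of weights
    mean = sum(weights[(s, k)] * psi[k] for k in neighbours[s]) / w_plus
    return mean, gamma2 / w_plus

# Toy example (hypothetical values): point 0 has three neighbours, each with weight 1.
psi = {0: 0.0, 1: 0.2, 2: -0.1, 3: 0.5}
neighbours = {0: [1, 2, 3]}
weights = {(0, k): 1.0 for k in neighbours[0]}
mean, var = car_conditional(0, psi, neighbours, weights, gamma2=0.09)
print(mean, var)
```

With 0/1 weights the conditional mean is the plain average of the neighbouring residuals, and the conditional variance shrinks as the number of neighbours grows.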
7. Agriculture: Modelling in 3 dimensions
Random effects
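A single spatial variance across all depths is forced on us when
$$\psi_s \mid \psi_k, k \in \partial s \sim N\left(\sum_{k \in \partial s} \frac{w_{sk}\psi_k}{w_{s+}}, \; \frac{\gamma^2}{w_{s+}}\right)$$
and neighbours include depth neighbours. If, however, we define
neighbourhoods only within a layer, then there are two possibilities:
$$\psi_{id} \mid \psi_{kd}, k \in \partial i \sim N\left(\sum_{k \in \partial i} \frac{w_{ik}\psi_{kd}}{w_{i+}}, \; \frac{\gamma^2}{w_{i+}}\right)$$
Or
$$\psi_{id} \mid \psi_{kd}, k \in \partial i \sim N\left(\sum_{k \in \partial i} \frac{w_{ik}\psi_{kd}}{w_{i+}}, \; \frac{\gamma_d^2}{w_{i+}}\right)$$
Similarly, $\epsilon_{id} \sim N(0, \sigma^2)$, or perhaps $\epsilon_{id} \sim N(0, \sigma_d^2)$.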
8. Agriculture: Modelling in 3 dimensions
Random effects
Random Effects - continued
To specify a CAR model, we nominate
which sites are neighbours
a weight for each pair of neighbours
Use weights of 0 and 1
Use the DIC to determine choice of neighbours.
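The two ingredients listed above (neighbour sets and 0/1 weights) can be sketched in Python as follows. The 6 × 9 grid is a hypothetical layout for 54 sites, not the trial's actual plot plan, and the function name is illustrative only.

```python
def lattice_neighbours(n_rows, n_cols, diagonal=False):
    """Neighbour lists (weight 1 for listed pairs, 0 otherwise) on a regular grid."""
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]           # at most 4 neighbours
    if diagonal:
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]    # at most 8 neighbours
    nbrs = {}
    for r in range(n_rows):
        for c in range(n_cols):
            nbrs[(r, c)] = [(r + dr, c + dc) for dr, dc in steps
                            if 0 <= r + dr < n_rows and 0 <= c + dc < n_cols]
    return nbrs

four = lattice_neighbours(6, 9)                   # hypothetical 6 x 9 layout
eight = lattice_neighbours(6, 9, diagonal=True)
print(max(len(v) for v in four.values()), max(len(v) for v in eight.values()))
```

Each candidate neighbourhood definition gives a different CAR model, which is why a criterion such as the DIC is needed to choose between them.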
9. Agriculture: Modelling in 3 dimensions
Treatment effects: functional description
Modelling the treatment effect
15 depth measurements for each treatment site
Treatment effect is a function of depth
smooth
continuous
Choices
orthogonal polynomials
splines
linear
cubic
cubic radial bases
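To make the spline choices concrete, here is a small Python sketch of two of the bases, in the mixed-model form described by Ngo and Wand (2004): truncated lines for the linear spline and $|z - \kappa|^3$ terms for the cubic radial basis. The knot positions are hypothetical; the slides give only the number of knots.

```python
def linear_spline_basis(z, knots):
    """Design-matrix row: 1, z, and truncated lines (z - knot)_+."""
    return [1.0, z] + [max(z - k, 0.0) for k in knots]

def cubic_radial_basis(z, knots):
    """Design-matrix row: 1, z, and radial cubics |z - knot|^3."""
    return [1.0, z] + [abs(z - k) ** 3 for k in knots]

knots = [3.0, 6.0, 9.0, 12.0]  # hypothetical knot positions on the depth index
X_linear = [linear_spline_basis(float(d), knots) for d in range(1, 16)]
X_radial = [cubic_radial_basis(float(d), knots) for d in range(1, 16)]
print(len(X_linear), len(X_linear[0]))
```

Both bases give a treatment effect that is continuous in depth; the radial cubic version is also smooth at the knots.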
10. Agriculture: Modelling in 3 dimensions
Treatment effects: functional description
Errors-in-measurement model
true depth $z$ interval-censored
related to the observed depth index $d$:
$$z_d \mid d \sim N(d, \sigma_z^2)\, I(z_{d-1}, z_{d+1}) \quad \text{for } d = 2, 3, \ldots, 14$$
$$z_1 \mid d = 1 \sim N(1, \sigma_z^2)\, I(0, z_2)$$
$$z_{15} \mid d = 15 \sim N(15, \sigma_z^2)\, I(z_{14}, 16)$$
where $\sigma_z \sim \text{Half-Cauchy}(1)$
Find the treatment effect as a function of $z$ for each site and
nominal depth, $d$.
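The interval censoring keeps the latent depths in order. The Python sketch below illustrates only that ordering constraint: it sweeps through the truncated-normal conditionals with $\sigma_z$ held fixed at an arbitrary value and ignores the moisture likelihood, so it is not the sampler used in the analysis. The function names and the rejection step are choices made for this illustration.

```python
import random

def truncated_normal(rng, mu, sigma, lower, upper):
    """Draw from N(mu, sigma^2) restricted to (lower, upper) by rejection."""
    while True:
        x = rng.gauss(mu, sigma)
        if lower < x < upper:
            return x

def sweep_depths(rng, z, sigma_z):
    """One pass updating each latent depth z_d given its current neighbours."""
    n = len(z)  # z[d - 1] holds z_d for d = 1, ..., n
    for d in range(1, n + 1):
        lower = 0.0 if d == 1 else z[d - 2]        # I(0, .) for the first depth
        upper = float(n + 1) if d == n else z[d]   # I(., 16) for the last depth
        z[d - 1] = truncated_normal(rng, d, sigma_z, lower, upper)
    return z

rng = random.Random(1)
z = [float(d) for d in range(1, 16)]      # start at the nominal depth indices
for _ in range(20):
    z = sweep_depths(rng, z, sigma_z=0.3)  # sigma_z fixed here for illustration only
print([round(v, 2) for v in z])
```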
11. Agriculture: Modelling in 3 dimensions
Results for half the field, or 810 measurements
Results: Choosing the neighbourhood and variance
structures
Table: Comparing spatial residual modelling: Fixed component identical
for all models (Orthogonal polynomial degree 8).
Description                                pD    DIC
Null model: No spatial residuals           81   -2690
Linear CAR (maximum 2 neighbours)         264   -2811
CAR (maximum 4 neighbours)                358   -2990
CAR (maximum 8 neighbours)                320   -2930
AR(1), AR(1) at each depth                436   -2789
CAR (max 4 horiz, 2 depth neighbours)*    109   -2752
CAR (max 4 horiz)*                        110   -2960
CAR (max 4 horiz)**                       121   -2766
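For readers unfamiliar with the criterion, the sketch below shows how pD and the DIC are obtained from MCMC output following Spiegelhalter et al. (2002), and then picks the smallest DIC among the values reported in the table. The function name and its two inputs are illustrative; the slides do not show how the deviance was computed.

```python
def dic(deviance_draws, deviance_at_posterior_mean):
    """Return (pD, DIC): pD = mean deviance - deviance at posterior mean."""
    mean_dev = sum(deviance_draws) / len(deviance_draws)
    p_d = mean_dev - deviance_at_posterior_mean
    return p_d, mean_dev + p_d

# (pD, DIC) pairs copied from the table above.
table = {
    "Null model: No spatial residuals": (81, -2690),
    "Linear CAR (maximum 2 neighbours)": (264, -2811),
    "CAR (maximum 4 neighbours)": (358, -2990),
    "CAR (maximum 8 neighbours)": (320, -2930),
    "AR(1), AR(1) at each depth": (436, -2789),
    "CAR (max 4 horiz, 2 depth neighbours)*": (109, -2752),
    "CAR (max 4 horiz)*": (110, -2960),
    "CAR (max 4 horiz)**": (121, -2766),
}
best = min(table, key=lambda name: table[name][1])
print(best)  # CAR (maximum 4 neighbours)
```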
12. Agriculture: Modelling in 3 dimensions
Results for half the field, or 810 measurements
Results: Choosing the treatment effect function
Table: Comparing ‘Fixed’ modelling: 4 neighbour CAR with 15 depth
variances
pD    DIC     Degree/Knots   Type
297   -2970   6              Orthogonal poly
358   -2990   8              Orthogonal poly
371   -2967   10             Orthogonal poly
318   -2923   4              Linear spline
369   -3002   4              Linear spline (+error in depth)
401   -2999   5              Linear spline (+error in depth)
327   -2954   5              Cubic radial bases
368   -3013   5              Cubic radial bases (+error in depth)
13. Agriculture: Modelling in 3 dimensions
Results for half the field, or 810 measurements
Results
Figure: ‘Fixed’ part: Linear spline treatment effects, depth measured with
error & 95% credible intervals, CAR model, sites 1-54, December 22,
1998. Depth differences are those implied by the errors-in-measurement
model.
14. Agriculture: Modelling in 3 dimensions
Results for half the field, or 810 measurements
Results
Figure: 95% CI for the ratio of square root of the spatial variance to that
of the unstructured variance at the fifteen depths: Cubic radial bases
model with errors-in-measurement for depth.
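An interval of this kind is read directly off the MCMC iterates: form the ratio draw by draw, then take its quantiles. The Python sketch below uses placeholder gamma draws in place of the real iterates (which are not reproduced in these slides), so only the mechanics, not the numbers, are meaningful.

```python
import math
import random

def credible_interval(draws, level=0.95):
    """Equal-tailed interval read off the sorted draws."""
    ordered = sorted(draws)
    tail = (1.0 - level) / 2.0
    last = len(ordered) - 1
    return ordered[int(tail * last)], ordered[int((1.0 - tail) * last)]

rng = random.Random(0)
# Placeholder draws only: in practice these would be the saved MCMC iterates
# of the spatial and unstructured variances at a single depth.
spatial_var = [rng.gammavariate(20.0, 0.0005) for _ in range(2000)]
unstructured_var = [rng.gammavariate(20.0, 0.0002) for _ in range(2000)]

# Ratio of the square roots, computed iterate by iterate.
ratio = [math.sqrt(g / s) for g, s in zip(spatial_var, unstructured_var)]
lower, upper = credible_interval(ratio)
print(lower, upper)
```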
15. Agriculture: Modelling in 3 dimensions
Results for half the field, or 810 measurements
Conclusions from modelling in three dimensions
From this modelling we concluded that
layered CAR models, where the neighbours of a point belong to
the same horizontal depth layer,
best model the spatially structured variation.
They are also
easier to define
and faster to run
than a CAR model based on all three dimensions.
16. Four Dimensional Analysis of Agricultural Data
Model for Agricultural data which includes time
Considerations
In moving to four dimensions, it was clear that a model such
as $y_{tid} = f_{j(i)}(t, d) + \psi_{id} + \eta_t + \epsilon_{tid}$, with spatial
effects ($\psi_{id}$) common across time and time residuals ($\eta_t$) common
across sites and depths, was unlikely to describe the data well.
We wanted to use the full field, 108 × 15 = 1620 data points
per day, rather than the 54 × 15 = 810 of the three-dimensional
modelling, and that implied the need for a different computing
platform.
Preliminary modelling of 5 days of the full dataset, which used
pyMCMC (Strickland, 2010) and a block-updating Gibbs
sampler, firmed the view that the data might best be
modelled (initially) by repeated use of the daily model.
17. Four Dimensional Analysis of Agricultural Data
Model for Agricultural data which includes time
Model
Let $y_{tid}$ be the response variable measured on date $t$, at site $i$ (of $I$
plot sites in the horizontal plane), at depth $d$ ($d = 1, \ldots, 15$). Let
$j$ be the treatment at site $i$.
Then
$$y_{tid} = f_{tj}(d) + \psi_{tid} + \epsilon_{tid}, \quad \epsilon_{tid} \sim N(0, \sigma_{td}^2),$$
with
$$f_{tj}(d) = \alpha_{tjd}, \qquad \psi_{tid} \mid \psi_{ti'd}, i \neq i' \sim N\left(\rho_t \sum_{i' \in \partial i} \frac{\psi_{ti'd}}{n_i}, \; \frac{\tau_{td}^2}{n_i}\right), \qquad (1)$$
where $n_i$ is the number of sites adjacent to site $i$, and $i' \in \partial i$
denotes that site $i'$ is a neighbour of site $i$. $\rho_t$ is common across all
depths for a given date, $t$. $f_{tj}(d)$ indicates that a function of $d$ is
estimated for each treatment and date.
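By the standard result for proper CAR models (see, for example, Banerjee et al., 2004), the conditional specification in (1) corresponds, within one date and depth layer, to a joint Gaussian with precision matrix $(D - \rho W)/\tau^2$, where $D = \mathrm{diag}(n_i)$ and $W$ is the 0/1 adjacency matrix. The Python sketch below builds this matrix for a hypothetical 2 × 3 block of sites and checks that it is symmetric and strictly diagonally dominant, which holds whenever $|\rho| < 1$ (the slide does not state the range of $\rho_t$, so that bound is an assumption here).

```python
def car_precision(neighbours, rho, tau2):
    """Precision matrix (D - rho * W) / tau2 as a nested list."""
    sites = sorted(neighbours)
    pos = {s: a for a, s in enumerate(sites)}
    Q = [[0.0] * len(sites) for _ in sites]
    for s in sites:
        Q[pos[s]][pos[s]] = len(neighbours[s]) / tau2   # n_i / tau^2 on the diagonal
        for k in neighbours[s]:
            Q[pos[s]][pos[k]] = -rho / tau2             # -rho / tau^2 for each neighbour
    return Q

# Hypothetical 2 x 3 block of sites (0 1 2 / 3 4 5) with edge-sharing neighbours.
neighbours = {
    0: [1, 3], 1: [0, 2, 4], 2: [1, 5],
    3: [0, 4], 4: [1, 3, 5], 5: [2, 4],
}
Q = car_precision(neighbours, rho=0.9, tau2=0.04)

n = len(Q)
symmetric = all(Q[a][b] == Q[b][a] for a in range(n) for b in range(n))
dominant = all(Q[a][a] > sum(abs(Q[a][b]) for b in range(n) if b != a)
               for a in range(n))
print(symmetric, dominant)
```

Reading off row $i$ of the matrix recovers the conditional mean $\rho \sum \psi_{i'} / n_i$ and conditional variance $\tau^2 / n_i$ of (1).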
18. Four Dimensional Analysis of Agricultural Data
Results from four dimensional model
Figure: Long fallowing vs Response cropping. Saturated model. Contour
graph (Depth against Day) from the point estimates from the MCMC iterates
of the full model.
19. Four Dimensional Analysis of Agricultural Data
Results from four dimensional model
Figure: Square root of variances & 95% credible intervals at depth 100 cm
20. Four Dimensional Analysis of Agricultural Data
Results from four dimensional model
Figure: Square root of unstructured variance: Days by Depth (contour graph, Depth against Day)
21. Four Dimensional Analysis of Agricultural Data
Results from four dimensional model
Figure: Square root of spatial variance: Days by Depth (contour graph, Depth against Day)
22. Selected References
Banerjee, S., Carlin, B. P. and Gelfand, A. E.: 2004, Hierarchical modeling and
analysis for spatial data, Monographs on statistics and applied probability,
Chapman & Hall, Boca Raton, London, New York, Washington D.C.
Besag, J. E.: 1974, Spatial interaction and the statistical analysis of lattice systems
(with discussion), J. R. Statist. Soc. B 36(2), 192–236.
Besag, J. E. and Mondal, D.: 2005, First-order intrinsic autoregressions and the de
Wijs process, Biometrika 92 (4), 909–920.
Besag, J., York, J. and Mollié, A.: 1991, Bayesian image restoration, with two applications
in spatial statistics (with discussion), Annals of the Institute of Statistical
Mathematics 43, 1–59.
Cressie, N. A. C.: 1993, Statistics for spatial data. Wiley series in probability and
mathematical statistics. Applied probability and statistics. New York: John Wiley.
Donald, M., Alston, C., Young, R. and Mengersen, K.: 2011, A Bayesian analysis of
an agricultural field trial with three spatial dimensions, Computational Statistics
and Data Analysis 55, 3320–3332.
Gelfand, A. E. and Vounatsou, P.: 2003, Proper multivariate conditional autoregressive
models for spatial data analysis, Biostatistics 4(1), 11–25.
23. Selected References
Hrafnkelsson, B. and Cressie, N.: 2003, Hierarchical modeling of count data with
application to nuclear fall-out, Environmental and Ecological Statistics 10,
179–200.
Lindgren, F., Rue, H. and Lindström, J.: 2010, An explicit link between Gaussian fields
and Gaussian Markov random fields: The SPDE approach, Journal of the Royal
Statistical Society Series B, to appear.
Lunn, D. J., Thomas, A., Best, N. and Spiegelhalter, D.: 2000, WinBUGS - a Bayesian
modelling framework: Concepts, structure, and extensibility, Statistics and
Computing 10(4), 325–337.
Ngo, L. and Wand, M.: 2004, Smoothing with mixed model software, Journal of
Statistical Software 9, 1–56.
Rue, H. and Held, L.: 2005, Gaussian Markov random fields: Theory and
applications, Chapman & Hall/CRC, Boca Raton.
Rue, H. and Tjelmeland, H.: 2002, Fitting Gaussian Markov random fields to Gaussian
fields, Scandinavian Journal of Statistics 29(1), 31–49.
Spiegelhalter, D. J., Best, N. G., Carlin, B. P. and van der Linde, A.: 2002, Bayesian
measures of model complexity and fit, Journal of the Royal Statistical Society
Series B (Statistical Methodology) 64(4), 583–639.
25. DICs, Priors and Fit
Priors for Equations 2-5
Table: Various priors used for the precisions of the timeseries models of
Method 2
          Precision for observational error   Precision for random walk error*
Prior 1   ∼ Gamma(.000001,.000001)            ∼ Gamma(.000001,.000001)
Prior 2   ∼ Gamma(.0001,.0001)                ∼ Gamma(.0001,.0001)
Prior 3   mean τ                              ∼ Gamma(.000001,.000001)
Prior 4   total ∗ r                           total ∗ (1 − r)
Prior 5   ∼ Gamma(.000001,.000001)            mean τ
total ∼ Gamma(a, b), r ∼ Beta(1, 1)
a, b calculated via method of moments from mean & 95% CI for posterior in Method 1
26. DICs, Priors and Fit
Priors for Equations 2-5 (continued)
Table: Constants for priors 3-5 for the precisions of the timeseries models
of Method 2
Depth (cm)   Mean τ   a         b
100          1395     6.934     .004971
120          1759     6.024     .003425
140          2241     12.413    .005538
160          3019     52.316    .017327
180          3226     87.249    .027045
200          3201     180.410   .056354
220          2175     82.412    .037894
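The constants in the table are consistent with a Gamma(a, b) distribution parameterised by shape a and rate b, for which the mean is a/b and the variance is a/b². The Python sketch below checks that a/b reproduces the Mean τ column and shows the moment-matching step itself. How the standard deviation was derived from the 95% CI is not stated in the slides, so the function simply takes a standard deviation as input; its name is hypothetical.

```python
def gamma_from_moments(mean, sd):
    """Shape a and rate b of a Gamma with the given mean and standard deviation."""
    b = mean / sd ** 2      # rate:  mean / variance
    return mean * b, b      # shape: mean^2 / variance

# (depth in cm, mean tau, a, b) copied from the table above.
rows = [
    (100, 1395, 6.934, 0.004971),
    (120, 1759, 6.024, 0.003425),
    (140, 2241, 12.413, 0.005538),
    (160, 3019, 52.316, 0.017327),
    (180, 3226, 87.249, 0.027045),
    (200, 3201, 180.410, 0.056354),
    (220, 2175, 82.412, 0.037894),
]
for depth, mean_tau, a, b in rows:
    implied_mean = a / b         # should match the Mean tau column
    implied_sd = a ** 0.5 / b    # prior standard deviation implied by a and b
    print(depth, mean_tau, round(implied_mean, 1), round(implied_sd, 1))
```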
27. DICs, Priors and Fit
Model Comparisons & the DIC
Table: Summary of DICs for Contrast 1 (Long fallowing vs Response
cropping) at Depth 140
                             Prior 1          Prior 2
Model                        pD    DIC        pD    DIC
Regression                   30    -377
AR(1)                         4    -343        4    -343
AR(1)(12)                    -2    -356       -2    -355
AR(2)                         4    -343        5    -342
RW(1)                        69    -435       36    -379
RW(1) (weighted)             73    -468 *     40    -392 *
RW(1) (t10 distribution)     73    -450       39    -378
RW(2)                        20    -370       23    -373
RW(2) (weighted)             26    -390       43    -395 *
RW(1) (1768 time points)     49    -304 (Prior 5)