Los Angeles R users group - July 12 2011 - Part 1

Using R for multilevel modeling of salmon habitat
Yasmin Lucero, Statistical Consultant

Kelly Burnett, PNW Research Station, USFS
Kelly Christiansen, PNW Research Station, USFS
E. Ashley Steel, PNW Research Station, USFS
Eli Holmes, NW Fisheries Science Center, NOAA

Acknowledgements:
NRC-RAP, National Academy of Sciences
ISEMP Monitoring Program, NOAA

Outline

• Background on fish ecology and the data

• Background on multilevel modeling

• Demo of lme4 package in R

The big goal: measure effect of stream
habitat quality on fish survival

Photo by David Wolman

Schooling Juvenile Coho Salmon

Land Area Affected by
Endangered Species
Act Listings of Salmon
& Steelhead
* 28 distinct population segments:
6 endangered, 22 threatened

* 176,000 sq. miles in Washington,
Oregon, Idaho & California study area
* 61% of Washington’s land area,
55% of Oregon’s, 26% of Idaho’s, &
32% of California’s

February 2008

The Data

~266 study sites
Oregon coastal region
juvenile coho salmon habitat
sparsely sampled, longitudinal study design
12 year time series
35 data layers
~100 landscape level variates
~22 habitat level variates

Abundance increases over time due to variation in
ocean conditions (i.e., external to our analysis)

[Figure: two panels, both titled coho.obs, plotted by year for 1998–2009. Axis labels: fs.coho.obs, coefficient, fs.year, year.]

Sparsely sampled longitudinal data
• Only fish data has time component

• year effects exogenous

• Landscape data everywhere

• Habitat data some places

• Fish data some places

• Not always same places

[Figure: 16 panels, one per site, showing fs.coho.obs (0.0 to 3.0) by year. Sites: 17100201010102, 17100202030201, 17100203020501, 17100203020902, 17100203040402, 17100203040602, 17100203070101, 17100203090101, 17100204050303, 17100205040105, 17100205070202, 17100206010504, 17100206010603, 17100303080202, 17100304010604, 17100305060202.]
Figure Legend. Mean density of coho at 16 frequently visited sites for 1998–2009

How the landscape data is acquired
GIS map layers
summarize across area surrounding study site

Habitat level data is collected by survey visits:
labor intensive to collect, therefore less abundant

gradient
pool density
debris
flow rates
drainage area
channel width
etc.

[Photos: high structure: rocks and woody debris; shallow, highly channelized]

Multilevel structure for two reasons

(1) longitudinal sampling design
(2) varying scales of predictors

[Diagram: three nested levels: landscape, habitat, fish]

Generalized linear mixed models
(aka hierarchical, multilevel, or random effects models)

canonical example: school test scores

[Diagram: classes nested within schools, schools nested within a state]

student_score ~ class_average + school_average + state_average

state level predictors: $\mathrm{Norm}(0, \sigma^2_{\text{state}})$
state

school level predictors: $\mathrm{Norm}(\mu_{\text{state1}}, \sigma^2_{\text{school}})$
school 1, school 2, school 3, school 4

class level predictors: $\mathrm{Norm}(\mu_{\text{school1}}, \sigma^2_{\text{class}})$
class 1, class 2, class 3, class 4

student level predictors: $\mathrm{Norm}(\mu_{\text{class3}}, \sigma^2_{\text{student}})$
student 1, student 2, student 3, student 4
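
The nesting above can be made concrete by simulating it. The following is a minimal sketch, not code from the talk: the standard deviations, group sizes, and column names are all assumed for illustration. Each level's mean is drawn around the mean of the level above it, and an optional lme4 fit (only run if the package is installed) estimates the variance components back from the simulated scores.

```r
# Hypothetical illustration: simulate nested group means, then student scores.
set.seed(1)

# Assumed standard deviations. The slide writes variances (sigma^2);
# rnorm() takes a standard deviation.
sd_state   <- 1.0
sd_school  <- 0.8
sd_class   <- 0.5
sd_student <- 2.0

n_school  <- 4   # schools in the single state (assumed)
n_class   <- 4   # classes per school (assumed)
n_student <- 25  # students per class (assumed)

# One state mean, drawn from Norm(0, sd_state).
mu_state <- rnorm(1, mean = 0, sd = sd_state)

# School means are centred on the state mean.
mu_school <- rnorm(n_school, mean = mu_state, sd = sd_school)

# One row per class; class means are centred on their school's mean.
classes <- expand.grid(class = seq_len(n_class), school = seq_len(n_school))
classes$mu_class <- rnorm(nrow(classes),
                          mean = mu_school[classes$school],
                          sd = sd_class)

# One row per student; scores are centred on their class's mean.
students <- classes[rep(seq_len(nrow(classes)), each = n_student), ]
students$score <- rnorm(nrow(students),
                        mean = students$mu_class,
                        sd = sd_student)
students$class_id <- interaction(students$school, students$class, drop = TRUE)

# Optional: estimate the school and class variance components with lme4.
# With a single state, the state-level variance cannot be estimated.
if (requireNamespace("lme4", quietly = TRUE)) {
  fit <- lme4::lmer(score ~ 1 + (1 | school) + (1 | class_id), data = students)
  print(lme4::VarCorr(fit))
}
```

With only four schools the school-level variance is estimated very imprecisely, which is one reason multilevel fits need a reasonable number of groups at each level.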

Our model structure is not so complicated
global

landscape level predictors

site 1 site 2 site 3 site 4

habitat level predictors
& year effects

obs 1 obs 2 obs 3 obs 4

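Modeling presence/absence of fish:
logistic mixed model with site and year effects

$$\operatorname{logit}(\Pr\{y_i = 1\}) = \underbrace{\gamma \cdot \text{year}}_{\text{year effects}} + \underbrace{\beta_{h1} x_{h1} + \beta_{h2} x_{h2} + \dots}_{\text{habitat level predictors}} + \underbrace{\alpha_{\text{site}}}_{\text{site effects}}$$

$$\alpha_{\text{site}} \sim \mathrm{Norm}\big(\underbrace{\beta_{l1} x_{l1} + \beta_{l2} x_{l2} + \dots}_{\text{landscape level predictors}},\ \sigma^2_{\text{site}}\big)$$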
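
One way to read the two-level model is to simulate from it and then fit it. The sketch below is an illustration under assumptions, not the analysis from the talk: the predictor names (elev, grad), sample sizes, and coefficient values are all made up. Because the mean of the site effect is linear in the landscape predictors, the site effect can be written as that linear term plus a zero-mean Normal deviation, so in lme4 the landscape predictor enters as an ordinary fixed effect alongside a random intercept `(1 | site)`.

```r
set.seed(42)

n_site  <- 60   # hypothetical number of sites
n_visit <- 8    # hypothetical visits (years) per site

# Site level: one landscape predictor per site (hypothetical name: elev).
site <- data.frame(site = factor(seq_len(n_site)), elev = rnorm(n_site))

# Assumed true values, chosen only for this simulation.
beta_l  <- 0.8   # landscape slope in the mean of alpha_site
sd_site <- 1.0   # sigma_site (standard deviation, not variance)
gamma   <- 0.3   # year effect
beta_h  <- -0.6  # habitat slope

# alpha_site ~ Norm(beta_l * elev, sd_site)
site$alpha <- rnorm(n_site, mean = beta_l * site$elev, sd = sd_site)

# Observation level: one row per site visit.
obs <- site[rep(seq_len(n_site), each = n_visit), ]
obs$year <- rep(seq_len(n_visit), times = n_site) - mean(seq_len(n_visit))
obs$grad <- rnorm(nrow(obs))  # habitat predictor (hypothetical name: grad)

# logit(Pr{y = 1}) = gamma * year + beta_h * grad + alpha_site
eta <- gamma * obs$year + beta_h * obs$grad + obs$alpha
obs$present <- rbinom(nrow(obs), size = 1, prob = plogis(eta))

# Fit the same structure with lme4, if it is installed.
if (requireNamespace("lme4", quietly = TRUE)) {
  fit <- lme4::glmer(present ~ year + grad + elev + (1 | site),
                     data = obs, family = binomial)
  print(summary(fit))
}
```

In the fitted summary, the random-intercept standard deviation corresponds to the site-level sigma, and the coefficient on `elev` corresponds to the landscape slope in the mean of the site effect.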

Fit a lot of models, some predictors rose to the top
Best predictors:
gradient
debris level
drainage area
mean elevation

[Figure: AIC (about 1100 to 1300) plotted against logLik (about −620 to −520) for candidate models m1 through m34.]
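
The two axes of the model comparison plot are linked by AIC = 2k − 2·logLik, where k is the number of estimated parameters. The sketch below is a hedged illustration: it uses plain `glm` fits on simulated data (the variables x1, x2, x3 and the four models are hypothetical, not the talk's m1 through m34) to build the kind of table such a plot is drawn from. The same `logLik()` and `AIC()` calls apply to models fitted with lme4.

```r
set.seed(7)
n <- 300

# Simulated data: x1 and x2 matter, x3 is pure noise (all names hypothetical).
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * d$x1 - 0.8 * d$x2))

# A small family of candidate models.
models <- list(
  m1 = glm(y ~ 1,            data = d, family = binomial),
  m2 = glm(y ~ x1,           data = d, family = binomial),
  m3 = glm(y ~ x1 + x2,      data = d, family = binomial),
  m4 = glm(y ~ x1 + x2 + x3, data = d, family = binomial)
)

# One row per model: parameter count, log-likelihood, and AIC.
tab <- data.frame(
  model  = names(models),
  k      = sapply(models, function(m) attr(logLik(m), "df")),
  logLik = sapply(models, function(m) as.numeric(logLik(m))),
  AIC    = sapply(models, AIC)
)

# Lower AIC is better; sort so the preferred models come first.
tab <- tab[order(tab$AIC), ]
print(tab)
```

Adding a predictor can never lower the maximized log-likelihood of a nested model, so AIC is what penalizes the extra parameter when the gain in fit is small.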

Overall model performance is strong at some
things, weak at others

[Figure: fitted probabilities from fitted(models.ls$m24): fitted probability (0.0 to 1.0) grouped by absence (0) and presence (1), alongside a histogram of fitted probabilities (count axis 0 to 800).]

Another look at model fit: some heavy outliers

[Figure: p/a of coho obs (data) plotted against fitted, with points keyed by year, 1998–2009. Panel title, truncated in the source: pa.obs ~ s.year + (fs.grad.rs + fs.cfs.down.rs + fs.vol.len.rs + el.mean.rs | catchment]

conclusions

• site matters

• we can explain about half of the site-to-site variation
with 4-5 predictors

• habitat data more valuable than landscape data

• a small number of predictions are very wrong, and we can’t seem
to improve them

Thanks. yasmin.lucero@gmail.com

Model predicted probabilities given presence/absence,
with and without site effects

[Figure: two panels, m0 and m1, each showing Pr{coho present} (0.0 to 1.0) grouped by coho presence (FALSE, TRUE).]
