3. Missing biological data
• Substantial proportions of missing biological data
• A number of processes can result in missing
biological data in biosocial surveys:
– No main interview
– Main interview but no nurse visit
– Nurse visit but no blood sample
– Blood sample but no blood analytes
4. Blood sampling outcome in UKHLS wave 2
Outcome                          Freq.   Percent
blood sample obtained            13021   64.0%
blood sample not obtained         1412    6.9%
refused blood sample or nurse     4271   21.0%
ineligible for blood sample       1570    7.7%
blood sample lost in transit        81    0.4%
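As a quick check on the table above, the percentages can be recomputed from the frequencies. The sketch below assumes the five rows are exhaustive, so that their sum is the denominator; the variable names are illustrative only.

```python
# Frequencies copied from the UKHLS wave 2 blood sampling table.
outcomes = {
    "blood sample obtained": 13021,
    "blood sample not obtained": 1412,
    "refused blood sample or nurse": 4271,
    "ineligible for blood sample": 1570,
    "blood sample lost in transit": 81,
}

# Assumption: the five categories cover every participant in the table.
total = sum(outcomes.values())
shares = {label: 100 * n / total for label, n in outcomes.items()}

for label, pct in shares.items():
    print(f"{label:32s} {outcomes[label]:6d} {pct:5.1f}%")

# Share of participants without an obtained blood sample, for any reason.
missing_share = 100 - shares["blood sample obtained"]
print(f"no blood sample obtained: {missing_share:.1f}%")
```

Under that assumption, roughly a third of participants have no blood sample, which is the scale of missingness the later slides are concerned with.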
5. Use survey data for correlates of
missing biological data
Detailed survey information on participants
It is standard practice for biological data collectors (usually
nurses) to record the reasons for missing biological
samples.
These data can be useful in informing us about the
reasons for missing biological samples.
6. Typical variables included in
developing non-response weights
• Sample month/quarter
• Region or other geographical aggregates of postcode
sectors
• Deprivation indices (IMD, Townsend, Carstairs) and other
geographically-referenced indicators
• Interviewer observation variables regarding the dwelling
and immediate surroundings
• Social, demographic and economic indicators from the
household and individual questionnaires
7. English Longitudinal Study of Ageing (ELSA):
wave 6 hair analytes
• At wave 6, ELSA hair samples were collected for the
first time.
• From the hair samples, analytes such as cortisol
were measured. Hair cortisol is an integrated
measure of hypothalamic-pituitary-adrenal (HPA) axis
activity, with higher levels indicating higher
physiological stress responses.
• Around 2 cm of hair was collected, which is
indicative of stress levels over the last 2-3 months.
8. ELSA Hair Sample Protocol
To measure cortisol in a hair sample, the sample needs to be at least
2 cm long and weigh at least 10 mg.
http://www.elsa-project.ac.uk/uploads/elsa/docs_w6/hair_sample_card.pdf
9. ELSA Hair Sample Protocol
The hair sample should be taken from an area on the back of the head, indicated
by the yellow circles on the pictures
10. ELSA wave 6: missing hair cortisol data
• Out of 7,419 ELSA participants in the nurse data collection,
there were only 2,558 participants with hair cortisol data.
• This is partly because some people were ineligible for the
data collection (having less than 2 cm of hair).
• Others refused to give hair samples, mainly for reasons
related to appearance.
• Funding constraints also meant that only a subset of the
hair samples could be processed to produce hair cortisol
data.
11. Possible characteristics of ELSA
participants with missing hair data
• Baldness predominantly affects men and older adults
• Given the importance of appearance to some participants, it
is likely that having a negative self-image is linked to missing
hair cortisol data.
• The ELSA survey asks detailed questions about depressive
symptoms (the CESD questionnaire).
• As stress and depression are interlinked, depressive
symptoms may predict both missing hair cortisol data and
higher levels of hair cortisol.
12. Predicted probability of missing hair cortisol
data by age/gender and depressive symptoms
[Figure: two panels, each with y-axis "predicted prob. of hair cortisol sample" (0 to 0.5).
Panel "age/gender interaction": age groups under 60, 60-64, 65-69, 70-74, 75-79 and over 80, with separate series for women and men.
Panel "Depressive symptoms": CESD depressive symptoms score, less than 4 vs 4 or more.]
13. Differences in complete case and weighted
regression estimates of (log) hair cortisol by
depressive symptoms
[Figure: predicted (log) hair cortisol (y-axis, 0 to 3) from the complete case and weighted analyses, with separate series for CESD score less than 4 and CESD score 4 or more.]
14. Missing biosocial data: methodological
considerations
• Rich survey and nurse observation data allow us
to discover factors that are correlated with both
the missingness mechanism and our
outcome of interest.
• Inference based on complete case analyses may
be biased if we don’t take account of such
factors.
• Researchers need to investigate the reasons behind
missing biological data and incorporate such information
in their methods for dealing with missing data.
16. Methodological considerations when analysing
biosocial data
• A biosocial theoretical framework is key
• Consider:
– normal ranges of biological variables (if available)
– identify outliers
– relevant medication use
– context of blood sampling like time of day, room temperature,
recent operations, smoking, food & alcohol, etc
– quality control processes in producing biological data
– transformations (for skewed biological dependent variables)
• Identify relevant predictors of missing biological data to use
in non-response methods from rich social/attitudinal data
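The point about transformations for skewed biological variables can be illustrated with a minimal sketch. The values below are hypothetical, not survey data, and the natural log is only one of several possible transformations.

```python
import math
import statistics

# Hypothetical right-skewed analyte values (e.g. in pg/mg); not survey data.
values = [2.1, 3.4, 4.0, 5.2, 6.8, 9.5, 14.0, 55.0]

# Natural-log transform, as is common for skewed biomarkers.
log_values = [math.log(v) for v in values]

# On the raw scale, one large value pulls the mean far above the median.
raw_gap = statistics.mean(values) - statistics.median(values)

# On the log scale, the mean and median sit much closer together.
log_gap = statistics.mean(log_values) - statistics.median(log_values)

# Exponentiating the mean of the logs gives the geometric mean,
# which is how estimates on the log scale map back to the original units.
geometric_mean = math.exp(statistics.mean(log_values))

print(f"raw mean - median: {raw_gap:.2f}")
print(f"log mean - median: {log_gap:.2f}")
print(f"geometric mean:    {geometric_mean:.2f}")
```

Note that an exponentiated mean of logs is a geometric mean, not the arithmetic mean, so back-transformed estimates should be described accordingly.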
Editor's Notes
We used these three variables (depressive symptoms and the interaction between age and gender) in a logistic regression model to predict missingness, and derived inverse probability weights for missing hair cortisol data from this response model. We then applied these weights to the regression model predicting (log) cortisol with depression as the explanatory variable, and compared the estimates with those from the complete case analysis, where no weights are used. Both analyses accounted for the clustering and stratification of the survey design. Figure 1 shows that, although the complete case and weighted analyses show similar patterns, the association of depression with (log) cortisol is stronger in the weighted analysis. When the log cortisol estimates are exponentiated, the cortisol estimates from the complete case and weighted analyses differ by 2.38 pg/mg. This suggests that the complete case analysis under-estimates the association between depression and cortisol, possibly because fewer depressed people are willing to give a hair sample.
There are many caveats to this analysis. The model of missingness would normally include many more predictors, and it does not take into account the other complexities of missing ELSA data at wave 6, such as survey weights for the nurse visit. The main point of this article, however, is to highlight the richness of the survey data, which allows us to discover factors that are correlated with both the missingness mechanism and our outcome of interest. Our inference based on complete case analyses may be biased if we don’t take account of such factors. Researchers using biosocial datasets should use the rich survey data to investigate the reasons behind missing biological data, discover the missingness mechanisms, and incorporate such information in their methods for dealing with missing data.
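The inverse probability weighting idea in these notes can be sketched in a few lines. This is a simplified illustration, not the analysis described above: it replaces the logistic response model with observed response rates in cells, ignores clustering and stratification, and uses made-up toy numbers. The cell variables (`depressed`, `older`) and all function names are hypothetical.

```python
from collections import defaultdict


def build_toy_sample():
    """Made-up records: (depressed, older, responded, log_cortisol)."""
    spec = [
        # depressed, older, n_total, n_responded, log cortisol of responders
        (0, 0, 40, 20, 1.0),
        (0, 1, 40, 10, 1.4),
        (1, 0, 20, 10, 1.6),
        (1, 1, 20, 4, 2.4),
    ]
    rows = []
    for dep, old, n_total, n_resp, y in spec:
        for i in range(n_total):
            responded = i < n_resp
            rows.append((dep, old, responded, y if responded else None))
    return rows


def response_propensities(rows):
    """Observed response rate in each (depressed, older) cell."""
    total = defaultdict(int)
    resp = defaultdict(int)
    for dep, old, responded, _ in rows:
        total[(dep, old)] += 1
        resp[(dep, old)] += responded
    return {cell: resp[cell] / total[cell] for cell in total}


def group_means(rows, propensities=None):
    """Mean log cortisol by depression group among responders.

    Without propensities this is the complete case estimate; with them,
    each responder is weighted by the inverse of their cell's response rate.
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for dep, old, responded, y in rows:
        if not responded:
            continue
        w = 1.0 if propensities is None else 1.0 / propensities[(dep, old)]
        num[dep] += w * y
        den[dep] += w
    return {dep: num[dep] / den[dep] for dep in num}


rows = build_toy_sample()
props = response_propensities(rows)
complete_case = group_means(rows)
weighted = group_means(rows, props)

cc_gap = complete_case[1] - complete_case[0]
ipw_gap = weighted[1] - weighted[0]
print(f"complete case gap in log cortisol: {cc_gap:.3f}")
print(f"weighted gap in log cortisol:      {ipw_gap:.3f}")
```

In this toy sample the cells with the lowest response rates also have the highest cortisol, so weighting widens the gap between depression groups. The weights only change the estimate because response varies with a second factor within each depression group; weighting on depression alone would leave the group means unchanged.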