This document summarizes research comparing different weighting procedures for volunteer online panels. It finds that post-stratification weighting and propensity score adjustment had limited impact on reducing differences between online panel and population data. Weighting did not consistently improve representativeness, and for some weights, differences became larger. While weighting aims to correct for systematic errors, other issues like biases in reference surveys, mode effects, and unmeasured variables hindered improving representativeness. Future research could combine weighting techniques, use improved reference surveys collected in parallel, and question how representativeness is defined.
1. Comparing different weighting procedures for volunteer online panels
Stephanie Steinmetz
and Kea Tijdens
AIAS Lunch Seminar, 1 October 2009
erasmus studio
2. Outline
Background
Sources of errors in ((volunteer) web) surveys
Weighting - a solution?
Example for the German and Dutch WageIndicator data
Results
Conclusion and Outlook
3. Background
Increasing importance of web surveys
Germany: from 3% to 27% between 2000 and 2007 (ADM, 2007)
Advantages
time and cost reduction, interactivity, flexibility,
‘worldwide’ coverage, no interviewer influence
Disadvantages
Representativeness? To what degree are
(volunteer) web survey results representative of the
general public?
4. Types of web surveys (see Couper, 2000)
Sample selection is
probability based = representative
- intercept surveys,
- online access panels,
- mixed-mode surveys
not probability based = representative?
- entertainment surveys
- self-selected web surveys
- volunteer online panels
5. Sources of error
Combination of causes
1 (Non)Coverage: number of people having internet access +
differences between the persons with and without internet
access.
2 Sampling/Self-selection: no comprehensive list of Internet
users to draw probability-based sample + people with specific
characteristics participate in a volunteer online panel.
3 Non-response: Not all persons finish the questionnaire,
people with specific characteristics might have a higher non-
response.
+ measurement, processing and adjustment errors
6. Weighting - a possible solution?
Weighting is a means of correcting, after data collection, for
systematic survey errors and of adjusting the sample to the
target population.
Expectation: disappearance of significant differences
between web survey & random reference survey.
☺ = web survey data can be adjusted to be representative
of the general public.
☹ = persistence of differences due to other error sources,
like measurement and processing errors
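Whether a difference "disappears" can be checked by testing a weighted web-survey proportion against the reference-survey proportion. The following is a minimal sketch, not the authors' procedure: the slides do not say which test was used, so a pooled two-proportion z-test is assumed here, with Kish's effective sample size standing in for the loss of precision caused by unequal weights. All data and function names are hypothetical.

```python
import math

def weighted_proportion(values, weights):
    """Weighted share of cases with value 1."""
    return sum(w for v, w in zip(values, weights) if v) / sum(weights)

def kish_effective_n(weights):
    """Kish's approximation: (sum of weights)^2 / sum of squared weights."""
    return sum(weights) ** 2 / sum(w * w for w in weights)

def two_proportion_z(p1, n1, p2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical toy data: 1 = satisfied, 0 = not satisfied (web survey).
web_values = [1] * 60 + [0] * 40
web_weights = [0.5] * 60 + [1.75] * 40   # assumed weights, for illustration only

p_web = weighted_proportion(web_values, web_weights)
n_web = kish_effective_n(web_weights)

# Hypothetical reference survey: 35% satisfied among 200 respondents.
z, p_value = two_proportion_z(p_web, n_web, 0.35, 200)
print(round(p_web, 3), round(n_web, 1), round(z, 2), round(p_value, 2))
```

A large p-value here would correspond to the ☺ outcome on the slide, a small one to persisting differences.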
7. Solution: Post-stratification weighting
Aim: Adjustment for demographic under- and over-
representations between sample and target
population
Method: %population (reference data) / %sample (web) =
weighting coefficient
Findings: Necessary but has a rather limited impact
(Vehovar et al. 1999, Loosvelt and Sonck 2008)
corrects for proportionality but not necessarily
for representativeness of substantive answers
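The weighting coefficient described under "Method" can be sketched in a few lines. This is an illustrative example with made-up cells and population shares, not the WageIndicator setup; the function name and the toy data are assumptions.

```python
from collections import Counter

def poststrat_weights(sample_cells, population_shares):
    """One weight per respondent: population share / sample share of the respondent's cell."""
    n = len(sample_cells)
    counts = Counter(sample_cells)
    return [population_shares[cell] / (counts[cell] / n) for cell in sample_cells]

# Hypothetical web sample of 10 respondents; a cell is a (gender, education) pair.
sample = ([("m", "high")] * 6 + [("f", "high")] * 2
          + [("m", "low")] * 1 + [("f", "low")] * 1)

# Hypothetical reference (population) shares per cell.
population = {("m", "high"): 0.25, ("f", "high"): 0.25,
              ("m", "low"): 0.25, ("f", "low"): 0.25}

weights = poststrat_weights(sample, population)

# After weighting, each cell's weighted share matches its population share.
total = sum(weights)
weighted_share_m_high = sum(w for c, w in zip(sample, weights) if c == ("m", "high")) / total
print(round(weights[0], 3), round(weighted_share_m_high, 3))
```

Over-represented cells (here highly educated men) receive weights below 1 and under-represented cells receive weights above 1. The cell proportions are corrected by construction, which is exactly why the weight says nothing about whether substantive answers within a cell are representative.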
8. ...but
Previous research comparing web and traditional
methods
Significant differences can be observed for web
respondents. They...
– are more intensive users of the Internet, more
technically-oriented (Bandilla et al. 2003; Vehovar et al. 1999)
– have greater social trust & greater subjective control
over their lives (Lenhart et al. 2003)
– are more politically and socially active (Duffy et al. 2005)
9. Solution: Propensity Score Adjustment (PSA)
Origin: experimental studies (Rosenbaum & Rubin, 1983)
Aim: to correct for differences due to the varying
inclination to participate in web surveys
(Harris Interactive).
Findings: Mixed (Taylor 2005; Bethlehem & Stoop 2007)
- some differences disappeared by demographic
weighting,
- some only after additional PSA, and
- others continued to exist or became even larger
10. PSA - method (see Schonlau et al. 2009)
Web and probability-based reference survey are
combined in one data file
Logistic regression of people's probability to
participate in the web survey given demographic
and/or attitudinal variables → estimation of the propensity score (PS)
Make the distribution of these propensity scores similar
for web survey and random sample = calculation of
weight w_ps,i (1/ps_i if W = 1 (in the web survey), and 1/(1 - ps_i) if
W = 0 (in the reference survey))
→ web survey and random sample should no longer differ
significantly on the selected variables included in the PS
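The steps on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of Schonlau et al. or of the authors: the logistic regression is fitted with a plain gradient-ascent loop written for this example, the single covariate (woman = 1) and all data are hypothetical, and in practice a statistics package would be used instead.

```python
import math

def predict(beta, x):
    """Propensity score: P(W = 1 | x) under the logistic model."""
    z = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, W, lr=0.5, epochs=3000):
    """Fit intercept + coefficients by gradient ascent on the mean log-likelihood."""
    beta = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(beta)
        for x, w in zip(X, W):
            err = w - predict(beta, x)
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def psa_weights(X, W, beta):
    """1/ps for web cases (W = 1), 1/(1 - ps) for reference cases (W = 0)."""
    return [1.0 / predict(beta, x) if w == 1 else 1.0 / (1.0 - predict(beta, x))
            for x, w in zip(X, W)]

def weighted_share(X, weights, W, group):
    """Weighted share of women (x[0] == 1) within one survey."""
    num = sum(wt for x, wt, w in zip(X, weights, W) if w == group and x[0] == 1)
    den = sum(wt for wt, w in zip(weights, W) if w == group)
    return num / den

# Hypothetical combined file: web survey has 6 men + 2 women, reference has 4 + 4.
X = [[0]] * 6 + [[1]] * 2 + [[0]] * 4 + [[1]] * 4
W = [1] * 8 + [0] * 8

beta = fit_logistic(X, W)
weights = psa_weights(X, W, beta)
web_share = weighted_share(X, weights, W, 1)
ref_share = weighted_share(X, weights, W, 0)
print(round(web_share, 3), round(ref_share, 3))
```

In this toy case both weighted shares of women move to the share in the combined file, so the two samples no longer differ on the variable included in the PS. Nothing in the construction guarantees the same for variables left out of the model.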
11. Example - the WageIndicator data
Web surveys: German and Dutch WageIndicator
data, year 2006, employees, age 16-75, gross
monthly income 400€-10000€ (Dutch net hourly
income)
N (German) = 21914
N (Dutch) = 8015
Reference surveys: Same restrictions
Germany (GSOEP, 2006) N= 7993
Netherlands (OSA, 2006) N= 2019
12. Selection bias - socio-demographics
[Figure: bar charts (0-100%) for Germany (LS vs. SOEP) and the Netherlands (LW vs. OSA), by sex (women, men), education (low, medium, high) and cohort (16-34, 35-44, 45-75)]
13. Selection bias - Labour market
[Figure: bar charts (0-100%) for Germany (LS vs. SOEP) and the Netherlands (LW vs. OSA), by occupation (manual, nonmanual), working time (full, part) and unemployment (below, above)]
14. Selection bias - satisfaction
[Figure: bar charts (0-100%) for Germany (LS vs. SOEP) and the Netherlands (LW vs. OSA), by health satisfaction and job satisfaction (not satisfied, satisfied)]
15. Summary
Similarities: underrepresentation of
women, people between 45 and 75, part-timers,
persons from regions with high unemployment,
unsatisfied people
Differences: underrepresentation of
DE: highly educated, manual workers
NL: low and medium educated, non-manual workers
Two possible solutions
a) Post-stratification weighting
b) PSA
16. Weights
A) 6 post-stratification weights:
W1= gender (2), education (2) and cohort (2)
W2= gender (2), education (2), cohort (2) and part time (2)
W3= gender (2), education (2), cohort (2) and nonmanual (2)
W4= gender (2), education (2), cohort (2), part time (2) and jobsat
W5= gender (2), education (2), cohort (2), nonmanual (2) and jobsat
W6= part time (2) and jobsat (2)
B) 4 PSA weights
PS1 = treat women edu2 coh2 nonman part perm nojob logwagemo
PS2 = treat women edu2 coh2 nonman part perm nojob logwagemo + healthsat
PS3 = treat women edu2 coh2 nonman part perm nojob logwagemo + jobsat
PS4 = treat women edu2 coh2 nonman part perm nojob logwagemo + healthsat jobsat
23. Conclusion
Impact of weighting
Both weighting methods show no substantial impact
Moreover
- no consistency within weights
- for some weights differences become larger (?!)
- effects of weights differ between countries
Weighting cannot improve representativeness of
(volunteer) web surveys
Problems
- Reference surveys (also biased?), mode effects,
unobservables (not measured)
24. Discussion
Possible solutions for representativeness:
Improving weights through inclusion of more
variables or advanced/mixed weighting procedures
Only mixed-mode surveys (time and cost advantages
disappear)
Non-representative use of web survey data
(only for experiments or exploratory analysis)
OR
questioning the definition of representativeness
(content vs. methodological)
survey quality ≠ absolute
25. Representativeness of surveys
[Figure: share of respondents (0%-14%) per combination of working time (full-time, part-time), sex (male, female), age group (15-24, 25-44, 45-64 yr) and education (lower, middle, higher). Series: Telepanel NL 2002, WageIndicator NL 2005, World Value Survey NL 1999, Labour Force Survey NL 2005]
26. Outlook
Comparison: more countries
Methods:
- Combination of different weighting
techniques (see Lee & Valliant, 2009)
- Weighting with a 'better' reference survey and
more webographic variables (LISS panel =
parallel survey, identical questionnaire + same mode)
27. The end
Thanks for listening...
...questions ?
...comments and suggestions?
contact: steinmetz@fsw.eur.nl