Online Opinions – A Pilot to Extend Social Data
Collection Capabilities within the Office for National
Statistics
Kathryn Ashton and Ed Dunn
1. Summary
In the face of falling response rates, increases in traditional face-to-face social data collection
costs and challenging public sector efficiency targets, there is increasing pressure to review
and improve methods of social data collection. This has led to the exploration of the internet
as a tool for social data collection.
In November 2008 the Office for National Statistics’ (ONS) Social Survey Division ran a pilot
on the Opinions survey (OPN - previously known as the Omnibus survey) providing
respondents with the option to complete on-line rather than via a traditional face-to-face
interview. Key objectives of the pilot were to determine the unit and item response rates that
might be obtained, gain valuable experience of designing and implementing a web survey, and
to investigate the characteristics of respondents who would respond on-line.
The pilot demonstrated that it is possible to get a substantial, if minority, response from a web
survey. The pilot obtained responses from around 20 per cent of issued addresses. However,
the respondent profile of those completing on-line does appear to be substantially different to
a standard face-to-face respondent. The results suggest that internet-only surveys risk
substantial bias in estimators but that the internet could possibly be used to supplement
traditional methods of data collection. However, further work would be required to establish
if this would cause mode effects in the quality of the data collected.
2. Introduction
A combination of pressures has led ONS to explore the use of web surveys
as part of a mixed mode approach to social data collection. Challenging public sector
efficiency targets and the increasing costs and difficulty of face-to-face data collection have
both led to web-based data collection becoming an increasingly attractive mode for
government social research surveys. In addition, over the past few years, there has been a
steady decline in the response rates of our major social surveys. These pressures have
enhanced the appeal of web based data collection, as part of a mixed mode approach and as a
potential opportunity to enter the online-only survey market.
While face-to-face, telephone and paper-based interviewing are well established activities, the
development of web-based data collection within ONS has not, so far, extended beyond
consideration of the significant and challenging issues web-based data collection poses. As
Flatley et al (2001) indicated, much development work is needed. Key methodological issues
and concerns have to some extent prevented further exploration of web-based data collection
so far.
This paper will highlight the methodological issues with web surveys, outline the key
objectives of the pilot, describe the methodology used and present the main findings.
46 SMB 53 1/04
3. Methodological Issues with Web Surveys
There are a number of methodological issues and concerns surrounding the use of web surveys
as a method of data collection within ONS national household surveys. Household surveys
have several key characteristics:
- the use of random probability sampling to gain a representative sample
- the questionnaire is often very long and questions require complex routing
- they require a high response rate in order to produce results with the required accuracy
and precision
- questions frequently require responses from all members of the household
All these characteristics are difficult to achieve in a web survey. Past research (Fricker and
Schonlau 2002) indicates that only a certain sub-set of the population with similar
characteristics will respond to a web survey. This will cause bias in response, for example if
higher income households are more likely to complete on-line than lower income households,
or if those from a certain geographical area are more likely to complete the survey on-line than
others.
Random probability sampling is difficult to achieve in a web survey. To date, there is no
sampling frame of those who use the internet or are able to use the internet within the UK. As a
result, a wholly internet-based survey would have to use different sampling methods.
There is a wide choice of software for designing a web-based survey.
The software available ranges from the simple HTML approach to more complex
programming.
As in all methods of data collection, security is a key concern with internet surveys. Due to
recent data security issues within government, respondents may be more sensitive about
disclosing personal details over the internet, and this is an aspect which the design of an
internet survey must attempt to overcome. It is important that the software used complies with
relevant data security and confidentiality regulations, not only when storing the data, but also
during transmission.
Inevitably, the developmental work required can be both costly and time-consuming.
4. Pilot Objectives
Key objectives of the web pilot were to:
- determine the unit and item response rates that might be obtained using a web-based
survey as the preferred approach
- investigate the characteristics of people responding to the on-line survey and, where
possible, compare to the face-to-face mode of the OPN survey
- evaluate the on-line software used and determine our current hardware and software
capabilities in this area, including consideration of data security issues for web-based
surveys
- examine the effect of offering an incentive to half of the sample on unit and item
response rates
- gain valuable experience in translating current computer assisted personal interview
(CAPI) programmes to a computer assisted web interview (CAWI) approach
- gain experience in designing and launching a CAWI survey and the logistical
requirements involved
- where the scale of the pilot allows, ensure that the needs of OPN customers, in terms
of data quality, quantity and timeliness of delivery, can continue to be met
- deliver the pilot to a short timescale (by December 2008) in order to feed into a
review of social data collection within ONS and provide recommendations for further
work in this area
Given the limited nature and short timescale for the pilot, the objectives did not seek to:
- investigate the multitude of mode effects and question/response ordering effects that
may occur
- make robust comparisons between the web pilot and the 'normal' face-to-face OPN
survey being run in parallel
- test the impact of different incentives
- test the impact of alternative advance letter design or format
- test the impact of alternative question wording
- investigate alternative designs and format of the web survey
5. Sample Design and Respondent Selection
5.1 Sample Design and Respondent Selection of the Web Pilot
The web pilot took a simple random sample of 2,000 addresses from the Postcode Address
File (PAF). By using a simple random sample, all cases within the sampling frame had an
equal chance of being selected. This ensured the highest level of confidence in the unit
response rate was obtained. As previously noted, there is no sampling frame available for
those who have access to, or are able to use the internet. Only English households were
selected since an existing sampling program was available to use and there was no capacity to
amend the program to include Scottish, Welsh or Northern Irish households.
Half of the sample (1,000 addresses) were selected at random and offered an incentive to take
part on-line. Respondents in the other half of the sample were offered no incentive to take
part. The incentive offered was a £10 Amazon 'electronic' gift certificate which was emailed
to the respondent on receipt of their fully completed response.
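The selection and allocation described above can be sketched in a few lines of Python. This is only an illustration: the frame of address identifiers, the function name and the seed are hypothetical, and the sampling program actually used for the pilot is not described in this paper.

```python
import random


def draw_sample_and_allocate(frame, sample_size=2000, seed=None):
    """Draw a simple random sample and split it at random into two equal groups."""
    rng = random.Random(seed)
    # Simple random sample without replacement: every address in the frame
    # has the same chance of selection.
    sample = rng.sample(frame, sample_size)
    # random.sample returns items in random selection order, so any slice
    # of the result is itself a random subset of the sample.
    half = sample_size // 2
    return {"incentive": sample[:half], "no_incentive": sample[half:]}


# Hypothetical frame of address identifiers standing in for the PAF.
frame = [f"ADDR{i:07d}" for i in range(100_000)]
arms = draw_sample_and_allocate(frame, sample_size=2000, seed=2008)
print(len(arms["incentive"]), len(arms["no_incentive"]))
```

Fixing the seed makes the draw reproducible, which is useful when the allocation has to be audited later.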
Each household within the sample was sent a modified version of the Integrated Household
Survey (IHS) advance letter, inviting them to take part in the 'Opinions Survey' giving some
basic details about the survey and its uses. The on-line aspect was heavily emphasised, but
the letter also suggested that if they did not complete on-line by December 1st an interviewer
may call to conduct the survey in person. The letter contained the website address
'www.ons.gov.uk/takepart' and a unique 12 character ‘userid’ (in the format CPS08123456x)
and a five character password (in the format x12yz) for each respondent. The website
'www.ons.gov.uk/takepart' contained some basic completion information for respondents and
a direct link to the on-line survey.
The letter asked the adult with the most recent birthday to complete the survey on-line. This
is a method of random selection of the respondent within the household (Dillman 2007). It is
important to note that no attempt was made to evaluate the effectiveness of this response
selection method within this pilot.
Half way through the field period, a follow-up letter was sent to non-responders, but worded
as if to all respondents. At the survey close, a further letter was sent to non-responding
households informing them that their participation was no longer required and that an
interviewer would no longer be calling.
5.2 Sample Design and Respondent Selection of the Opinions Survey (OPN)
The OPN face-to-face survey differs from the web pilot in that it uses multistage cluster
sampling, as opposed to simple random sampling. The November OPN which was in the field
at the same time as the pilot had a sample size of 2,010 households.
The face-to-face OPN survey uses a Kish grid to select a respondent. This involves
constructing a list of eligible household members, based on their age, then selecting the
respondent using the address’ serial number.
6. Questionnaire Design
The pilot used a set of questions and question blocks designed to be broadly equivalent (in
terms of questionnaire length and the variety of questions included) to the November OPN
survey. However, the full Integrated Household Survey (IHS) core, which asks each member
of the household to complete a section of the questions, was not included in the web survey.
This was due to concerns over the impact on response (e.g., it would require all members of the
household to log-on and complete the section) and because of the extra complexity in
programming.
The questionnaire consisted of:
- IHS core questions asked of the individual selected to respond (e.g. socio-
demographics, economic activity questions)
- smoking OPN module
- healthy eating OPN module
- charities OPN module
- tax OPN module
- disability OPN module
- follow-up question for consent to further research
Questions were reviewed to take web-based self-completion into consideration. Most
questions and responses were incorporated in their existing format. However, some
modifications were necessary; e.g., replacing 'Code all that apply' with 'Please select all that
apply.' Dillman (2007) found that respondents are more likely to drop out of a questionnaire if
they encounter problems so, as response was the main quality measure, it was important the
pilot questionnaire was as intuitive and easy to complete as possible.
The literature in this area indicates that the primary tasks of the respondent are to read,
comprehend and respond. The pilot's web page design and layout reflected this. ONS logos
and colours were used throughout the questionnaire; no multimedia effects were used as they
may have affected the respondent's interaction with the instrument and influenced their answers.
The design also respected the Western visual flow of reading from left to right. In considering
the page layout, an attempt was made to minimise the amount of screen scrolling required by
the respondent while also attempting to maximise the numbers of questions per page. This is a
standard approach to web surveys, as suggested by Dillman (2007). A progress bar was
included within the web-based questionnaire to motivate respondents to proceed through the
questionnaire and not drop out.
7. Findings
7.1 Unit response
Table 1 provides headline response figures to the pilot web survey with highlights from the
November OPN figures for comparison.
The headline response figures were:
- 18% of the total issued sample of the web pilot responded, compared with the
November OPN, which gained a response rate of 53%
- of the incentivised group on the web pilot, 18% responded, and the non-incentivised
group had a response rate of 19%
- 69% of those giving a full response to the web pilot answered the income question,
compared with 84% for the November OPN.
Table 1
Headline Response Figures
                                                   Web pilot Web pilot   Nov OPN   Nov OPN
                                                           n         %         n         %
Total issued sample                                     2000                2010
Full response~                                           364        18      1083       53^
Partial response #                                        32         2
Full response based on assumptions
  on eligibility and internet access*                    364        33      1083       59¬
Issued sample offered incentive                         1000
Full response~ (incentivised group)                      179        18
Partial response # (incentivised group)                   14         1
Issued sample not offered incentive                     1000
Full response rate~ (non-incentivised group)             185        19
Partial response rate # (non-incentivised group)          18         2
Response to income question (full or banded)             251        69       910        84
  within incentivised group                              124        69
  within non-incentivised group                          127        69
~ Full response is defined as where respondents have answered all questions including a refusal to
provide income details. Refusal was a valid response category.
# Partial response is defined as being where a respondent has not finished completing the survey
^ This is a crude response rate based on issued addresses as for the web pilot we have no information on
eligibility. More detailed response rates are available for November OPN
* Uses assumption of 10% ineligibility and 61% household internet access rate applied to 2,000 issued
sample = 1,098 eligible households with internet access.
¬ This is the published OPN response rate for November calculated as the number of achieved
interviews as a percentage of the eligible sample.
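The adjusted web pilot rate in Table 1 can be reproduced from the assumptions given in the notes. The sketch below is illustrative only; the function and variable names are ours, and the 10% ineligibility and 61% internet access figures are the assumptions stated in the table notes rather than measured values for this sample.

```python
def adjusted_response_rate(full_responses, issued, ineligible_rate, internet_access_rate):
    """Return the assumed eligible base with internet access and the rate on that base."""
    # Remove the assumed share of ineligible addresses, then keep only the
    # assumed share of households with internet access.
    eligible_with_access = issued * (1 - ineligible_rate) * internet_access_rate
    return eligible_with_access, full_responses / eligible_with_access


base, rate = adjusted_response_rate(
    full_responses=364, issued=2000, ineligible_rate=0.10, internet_access_rate=0.61
)
print(round(base), round(100 * rate))  # 1098 33
```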
These response rates are rather healthy if we compare them with those of other recent internet
surveys. For example, the Scottish Census test obtained a response rate of 17 per cent and the
Canadian Census in 2006 achieved a response rate of 19 per cent. Research, including work
by Fricker and Schonlau (2002), indicates that web surveys achieve only fairly modest
response rates, which is why they are better used as part of a mixed mode approach.
Solomon (2001) states that partial responders are most likely to stop completing a
questionnaire when: 1) encountering the very first question; 2) encountering complex
household grid questions; or 3) asked to supply their email addresses. Our pilot
supported this argument. Partial responders either stopped at the first question, the household
grid question, which asked for information on each household member, or the question which
requested their email address at the end of the interview. Methods to overcome this could be
derived by further investigation into the issues of partial response.
The pilot results showed that there was little difference between the incentive and non-
incentive groups in terms of response. But these results may have been an effect of the
approach which was taken, which offered the incentive on receipt of the full completion of the
survey. Literature in the US has suggested that pre-paid incentives may have more of a
substantive effect (Dillman 2007, Göritz 2006, Singer, Van Hoewyk and Maher 2000).
7.2 Item Response
When compared with the November OPN, response rates were likely to be lower on the
internet pilot for the name variable, personal income questions and employment status
questions. In particular, the face-to-face interview gained an item response rate of 84 per cent
on the personal income questions, whereas the internet pilot received a response rate of 69 per
cent. All of these results were statistically significant.
The results also indicated that response rates were likely to be higher on the internet pilot than
the face-to-face interview for questions on health and religion.
7.3 Survey metrics
7.3.1 Timing
Overall, 80 per cent of respondents appear to have completed the survey in 25 minutes or
less. Table 2 provides further details.
Table 2
Length of Survey in Minutes
Length in Minutes Frequency Percent
0-5 5 1
6-10 20 6
11-15 107 29
16-20 100 28
21-25 58 16
26-30 25 7
31-35 18 5
36-40 11 3
41-45 7 2
46-50 5 1
51-55 1 0
56-60 0 0
61-70 4 1
Over 70 3 1
Total 364 100
This is notably shorter than the average time taken to complete a face-to-face OPN
survey, which is 45 minutes, excluding interview administration time.
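The 80 per cent figure quoted above follows directly from the frequencies in Table 2. A short sketch of the arithmetic is given below; the variable names are ours, and the open-ended final band is represented by None.

```python
from itertools import accumulate

# Upper bound of each completion-time band in minutes (None = over 70),
# with the frequencies reported in Table 2.
upper_bounds = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 70, None]
frequencies = [5, 20, 107, 100, 58, 25, 18, 11, 7, 5, 1, 0, 4, 3]

total = sum(frequencies)
# Running share of respondents finishing within each band's upper bound.
cumulative_share = [running / total for running in accumulate(frequencies)]
share_within_25 = cumulative_share[upper_bounds.index(25)]
print(total, round(100 * share_within_25))  # 364 80
```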
7.3.2 Time and day of completion
Overall, 75 per cent completed the survey between 8am and 6pm, and 11 per cent
between 7pm and 8pm. However, the most popular hour in which to begin the survey was
7pm-8pm, with 4pm-5pm a close second. Just under a quarter of responses were
completed at the weekend and Wednesdays and Fridays proved to be the most popular days -
this reflects the sending of the advance and follow up letters. Tables 3 and 4 provide further
details.
Table 3
Day of completion
Day of Completion Frequency Percent (%)
Monday 36 10
Tuesday 36 10
Wednesday 76 21
Thursday 56 15
Friday 70 19
Saturday 43 12
Sunday 47 13
Total 364 100
Table 4
Starting Hours of On-line Completion
Starting Hour Frequency Percent (%)
7.00 3 1
8.00 5 1
9.00 21 6
10.00 25 7
11.00 32 9
12.00 21 6
13.00 21 6
14.00 28 8
15.00 25 7
16.00 35 10
17.00 28 8
18.00 26 7
19.00 39 11
20.00 19 5
21.00 20 6
22.00 11 3
23.00 2 1
24.00 3 1
Total 364 100
In hindsight, a useful additional question on the survey would have been a question that asked
where the respondent completed the survey; for example, at home, at work or at a friend's
house. This variable would have supported further analysis into factors of response, and
would be useful for development work of web-based surveys in the future.
8. Results - Profile and Key question responses
This section provides a basic summary of the responses obtained from the web-based pilot and
the characteristics of the on-line respondents; it also highlights some key question responses
within the OPN modules. Readers are reminded of the pilot objectives, as stated in section
4.0, when looking at the response analysis.
The OPN survey is routinely weighted to an overall population estimate using the person level
weights provided with the data, which are calibrated to a set of population constraints. It was
not desirable to use the same set of calibration totals on the internet data because the sample
size was too small to produce a robust set of results. Consequently, a reduced set of
calibration constraints was constructed for the internet data. The same set of constraints was
then applied to the OPN design weight in order to produce a new calibration weight for the
OPN data that was in accord with that used for the internet data. This ensured that any
differences arising between estimates produced by the two datasets were not caused by
spurious differences in weighting procedures.
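To illustrate the general idea of calibration, the sketch below shows its simplest form, in which design weights are scaled so that weighted totals match known population totals for a single set of categories. The paper does not state which constraints or which calibration method were used, so the function name, the groups and all the figures here are hypothetical.

```python
from collections import defaultdict


def calibrate_to_totals(design_weights, groups, population_totals):
    """Scale design weights so each group's weighted total matches its population total."""
    weighted_total = defaultdict(float)
    for weight, group in zip(design_weights, groups):
        weighted_total[group] += weight
    # Each weight is multiplied by (population total / current weighted total)
    # for the group the respondent belongs to.
    return [
        weight * population_totals[group] / weighted_total[group]
        for weight, group in zip(design_weights, groups)
    ]


# Hypothetical respondents, design weights and population totals.
design_weights = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
groups = ["16-44", "16-44", "45+", "45+", "45+", "45+"]
population_totals = {"16-44": 300.0, "45+": 200.0}

calibrated = calibrate_to_totals(design_weights, groups, population_totals)
print(calibrated)
```

Applying the same reduced set of constraints to both datasets, as described above, means that the same population totals are passed in for each survey's weights.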
The sample designs of the internet and OPN surveys are independent; hence, the standard error
of the difference between the estimators based on data from the respective surveys can be
obtained by combining the standard errors of the individual estimators: these are squared
and summed, and the square root of the sum gives the standard error of the difference.
Significance tests were carried out using the ratio of the difference between the two estimates to
its estimated standard error as the test statistic. We assumed that the distribution of this ratio is
approximated by a t-distribution and hence used the t-test (two-sided with significance level
0.05).
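The test described above can be sketched as follows, using the income question counts from Table 1 as the worked example. Two simplifying assumptions are made for illustration and are not the method used for the published results: the standard errors are computed as for an unweighted simple random sample, ignoring the calibration weights and the clustered OPN design, and the critical value of 1.96 is the large-sample approximation to the two-sided 5 per cent point of the t-distribution. The function names are ours.

```python
import math


def se_proportion(p, n):
    """Standard error of a proportion, assuming an unweighted simple random sample."""
    return math.sqrt(p * (1 - p) / n)


def difference_test(est1, se1, est2, se2, critical_value=1.96):
    """Test the difference between two independent estimates."""
    # Independent samples: square the standard errors, add them, take the root.
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    # Test statistic: difference between the estimates over its standard error.
    t_stat = (est1 - est2) / se_diff
    return se_diff, t_stat, abs(t_stat) > critical_value


# Income question response: 251 of 364 (web pilot), 910 of 1083 (November OPN).
p_web, n_web = 251 / 364, 364
p_opn, n_opn = 910 / 1083, 1083

se_diff, t_stat, significant = difference_test(
    p_web, se_proportion(p_web, n_web), p_opn, se_proportion(p_opn, n_opn)
)
print(round(se_diff, 4), round(t_stat, 2), significant)
```

With design-based standard errors the figures would differ, but the structure of the calculation is the same.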
8.1 Profile of Responders of the Web Pilot compared with the OPN face-to-face
Some clear (and statistically significant) differences were found in the profile of internet pilot
respondents compared with the face-to-face November OPN and key question responses.
When compared with the face-to-face OPN, internet pilot respondents were more likely to be:
- aged 25-44 and 55-64
- married and living with partner
- white
- better educated (with degree level qualifications)
- managers or supervisors
- in good health
It is possible to see a response bias emerging from individuals who responded on-line.
While it may be thought that younger age groups are more likely to complete on-line, the web
pilot does not support this as the youngest age group (age 16-24) was one of the age groups
with the lowest response rates. This may have been due to the approach we took in contacting
respondents, for example the advance letter to the household. However, overall, the profile of
the web pilot respondent is, as might be expected, slightly younger than that of the OPN face-
to-face respondent profile.
Chart 1
Age Profile of the Respondents
[Chart: percentage of respondents by age group (16-17, 18-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-64, 65-74, 75 or over). Horizontal axis: Age in Years. Vertical axis: 0 to 25. Series: Nov 2008 %, Pilot %.]
In terms of key question responses, when compared with the November OPN face-to-face
survey, the pilot respondents were less likely to:
- smoke
- have a disability
- think charities played an important role in society
- think HM Revenue and Customs treated them fairly
Also, respondents to the pilot were more likely to:
- eat healthily (5 portions of fruit or vegetables a day)
Again, all of these results were statistically significant.
The opinion question modules produced some interesting results. The pilot
respondents were less likely to think charities played an important role in society, or to think HM
Revenue and Customs treated them fairly. This may indicate that individuals feel more
comfortable disclosing such opinions on-line than in a face-to-face environment with a
Government field interviewer. Further research would be needed to confirm whether the
differences in these results are a result of mode effects.
9. Next Steps for the Future of Internet Surveys within ONS
This pilot has demonstrated the viability of conducting a major ONS survey on-line and
achieving a significant, if minority, response. The limitations of this pilot are widely accepted
and much more time and resource within ONS is required to fully investigate the use of an
internet survey.
The web-based pilot has highlighted logistical and technological issues of designing and
administering a web survey and also reinforces the methodological issues highlighted in
section 3.0. For example, the translation of a questionnaire to on-line software may be
difficult due to its complexity, and issues with the sampling frame used in a web survey would
need to be investigated further.
The pilot has also highlighted the risk of significant bias in estimates that are produced using
data collected via an internet survey. Although the web pilot achieved a positive response rate,
there are differences in the characteristics of the respondents and the survey responses that
require much more investigation. Further pilot work within ONS will take forward the
findings of this initial pilot to examine, more robustly, the modal differences and respondent
selection issues that the initial pilot did not attempt to explore. The pilot has provided a solid
foundation for further work.
References
Dillman, D.A. (2007). Mail and Internet Surveys – The Tailored Design Method (2nd ed). J
Wiley and Sons: New Jersey
Flatley, J. (2001). The Internet as a Mode of Data Collection in Government Social Surveys:
Issues and Investigation. Social Survey Methodology Bulletin, ONS 49(7)
Fricker, R. and Schonlau, M. (2002). Advantages and Disadvantages of Internet Research
Surveys: Evidence from the Literature. Field Methods 14(4)
Göritz, A. (2006). Incentives in Web Studies: Methodological Issues and a Review.
International Journal of Internet Science 1(1) pp. 58-70
Singer, E., Van Hoewyk, J. and Maher, M. (2000). Experiments with Incentives in Telephone
Surveys. Public Opinion Quarterly 64 pp. 171-188
Solomon, D. (2001). Conducting Web-Based Surveys. Practical Assessment, Research and Evaluation 7(19)