Quantitative research methods

DBA6000

Quantitative Business Research Methods

Rob J Hyndman

© Rob J Hyndman, 2008.

Professor Rob Hyndman
Department of Econometrics and Business Statistics
Monash University (Clayton campus)
VIC 3800.

Email: Rob.Hyndman@buseco.monash.edu.au
Telephone: (03) 9905 2358
www.robhyndman.info

Contents

Preface

1 Research design
1.1 Statistics in research
1.2 Organizing a quantitative research study
1.3 Some quantitative research designs
1.4 Data structure
1.5 The survey process
Appendix A: Case studies

2 Data collection
2.1 Introduction
2.2 Data collecting instruments
2.3 Errors in statistical data
2.4 Questionnaire design
2.5 Data processing
2.6 Sampling schemes
2.7 Scale development
Appendix B: Case studies

3 Data summary
3.1 Summarising categorical data
3.2 Summarizing numerical data
3.3 Summarising two numerical variables
3.4 Measures of reliability
3.5 Normal distribution

4 Computing and quantitative research
4.1 Data preparation
4.2 Using a statistics package
4.3 Further reading
4.4 SPSS exercise

5 Significance
5.1 Proportions
5.2 Numerical differences

6 Statistical models and regression
6.1 One numerical explanatory variable
6.2 One categorical explanatory variable
6.3 Several explanatory variables
6.4 Comparing regression models
6.5 Choosing regression variables
6.6 Multicollinearity
6.7 SPSS exercises

7 Significance in regression
7.1 Statistical model
7.2 ANOVA tables and F-tests
7.3 t-tests and confidence intervals for coefficients
7.4 Post-hoc tests
7.5 SPSS exercises

8 Dimension reduction
8.1 Factor analysis
8.2 Further reading

9 Data analysis with a categorical response variable
9.1 Chi-squared test
9.2 Logistic and multinomial regression
9.3 SPSS exercises

10 A survey of statistical methodology

11 Further methods
11.1 Classification and regression trees
11.2 Structural equation modelling
11.3 Time series models
11.4 Rank-based methods

12 Presenting quantitative research
12.1 Numerical tables
12.2 Graphics
Appendix: Good graphs for better business

13 Readings

DBA6000: Quantitative Business Research Methods 4

Preface

Subject convenor

Professor Rob J Hyndman
B.Sc.(Hons), Ph.D., A.Stat
Department of Econometrics and Business Statistics
Location: Room 671, Menzies Building, Clayton.
Phone: (03) 9905 2358
Email: Rob.Hyndman@buseco.monash.edu.au
WWW: http://www.robhyndman.info

Objectives

On completion of this subject, students should have:

• the necessary quantitative skills to conduct high-quality independent research related to business administration;
• a comprehensive grounding in a number of quantitative methods of data production and analysis;
• been introduced to quantitative data analysis through a practical research activity.

Synopsis

This unit considers the quantitative research methods used in studying business, management
and organizational analysis. Topics to be covered:

1. research design including experimental designs, observational studies, case studies, longitudinal analysis and cross-sectional analysis;
2. data collection including designing data collection instruments, sampling strategies and
assessing the appropriateness of archival data for a research purpose;
3. data analysis including graphical and numerical techniques for the exploration of large data sets and a survey of advanced statistical methods for modelling the relationships between variables;
4. communication of quantitative research; and
5. the use of statistical software packages such as SPSS in research.

The effective use of several quantitative research methods will be illustrated through reading
research papers drawn from several disciplines.

References

None of these is a required text; they provide useful background material if you want to read further. Huck (2007) is excellent on interpreting statistical results in academic papers. Pallant (2007) is very helpful when using SPSS and in giving advice on how to write up research results. Use Wild and Seber (2000) if you need to brush up on your basic statistics; it contains lots of helpful advice and interesting examples.

1. HUCK, S.W. (2007) Reading statistics and research, 5th ed. Allyn & Bacon: Boston, MA.
2. PALLANT, J. (2007) SPSS survival manual, 3rd ed. Allen & Unwin.
3. DE VAUS, D. (2002) Analyzing social science data. SAGE Publications: London.
4. WILD, C.J., & SEBER, G.A.F. (2000) Chance encounters: a first course in data analysis and inference. John Wiley & Sons: New York.

Timetable

17 July        Introduction/Chapter 1
24 July        Chapter 2
31 July        Chapter 3
7 August       Chapter 4                  SPSS tutorial
14 August      Chapter 5
21 August      Chapter 6
28 August      Chapter 7                  SPSS tutorial
4 September    Chapters 8–9               SPSS tutorial
11 September   Chapter 10
18 September   Chapters 11–12             First assignment due
25 September   No class
2 October      No class
9 October      SPSS tutorial
16 October     Oral presentations         Second assignment due



Assessment
1. A written report presenting and critiquing a research paper which uses quantitative research methods. 45%
• It can be a published research paper from a scholarly journal, or a company report.
It must contain substantial quantitative research. It must be approved in advance.
• Your report should include comments on the research questions addressed, the appropriateness of the data used, how the data were collected, the method of analysis chosen, and the conclusions drawn.
• Length: 4000–5000 words excluding tables and graphs.
• Due: 17 September
2. A written report presenting some original quantitative analysis of a suitable multivariate
data set. 45%
• You may use your own data, or use data that I will provide. The data set must
include at least four variables. It can be data from your workplace.
• Your report should include comments on the research questions addressed, the appropriateness of the data used, how the data were collected, the method of analysis chosen, and the conclusions drawn.
• You may use any statistical computing package or Excel for analysis.
• Length: 4000–5000 words excluding tables and graphs.
• Due: 15 October
3. A 20-minute oral presentation of one of the above reports. 10%
• On either 8 or 15 October.

Assignment marking scheme

• Research questions addressed: 6%
• Appropriateness of data: 6%
• Data collection: 6%
• Description of statistical methods used: 6%
• Suitability of statistical methods: 6%
• Discussion of statistical results: 8%
• Conclusions (are they supported/valid?): 7%

Choosing a paper for Assignment 1

Choose something you are interested in. For example, it can be an article you are reading as
part of your other DBA studies or something you have read as part of your professional life.

The following journals contain some articles that would be suitable. There are also many others.

• Australian Journal of Management
• International Journal of Human Resource Management
• Journal of Advertising
• Journal of Applied Management Studies
• Journal of Management
• Journal of Management Accounting Research



• Journal of Management Development
• Journal of Managerial Issues
• Journal of Marketing
• Management Decision

You can obtain online copies of some of these via the Monash Voyager Catalogue. Hard copies should be in the Monash library.

Things to look for:

• it should involve some substantial data analysis;
• it should involve more than summary statistics (e.g., a regression model, or some chi-squared tests);
• it should not use sophisticated statistical methods that are beyond this subject (e.g., avoid
factor analysis and structural equation models).

All papers should be approved by Rob Hyndman before you begin work on the assignment.

Choosing a data set for Assignment 2

• Choose something you know about. The best data analyses involve a mix of good knowledge of the data context as well as good use of statistical methodology.
• Don’t try to do too much. One response variable with 3–5 explanatory variables is usually
sufficient. Resist the temptation to write a long treatise!
• You will find it easier if the response variable is numeric. Analysing categorical response
variables with several explanatory variables can be tricky.
• Be clear about the purpose of your analysis. State some explicit objectives or hypotheses,
and address them via your statistical analysis.
• Think about what you include. A few well-chosen graphics that tell a story are better than pages of computer output that mean very little.
• Start early. Even before we cover much methodology, you can do some basic data summaries and think about the key questions you want to address.
• All data sets should be approved by Rob Hyndman before you begin work on the assignment.

Readings

Most weeks we will read a case study from a research journal and discuss the analysis. Please
read these in advance. We will discuss them in the third hour. You cannot use a paper we
have discussed for your first assessment task. If you have a suggestion of a paper that may be
suitable for class discussion, please let me know.


CHAPTER 1
Research design

1.1 Statistics in research
“Statistics is the study of making sense of data.” Ott and Mendenhall
“The key principle of statistics is that the analysis of observations
doesn’t depend only on the observations but also on how they were
obtained.” Anonymous

• Data beat anecdotes
“For example” proves nothing. (Hebrew proverb)
• Data beat intuition
“Belief is no substitute for arithmetic.” (Henry Spencer)
• Data beat “expert” opinion
“When information becomes unavailable, the expert comes into his own.” (A.J. Liebling)

1.1.1 Statistics answers questions using data

• Do pollutants cause asthma?
• Do transaction volumes on the stock market react to price changes?
• Does deregulation reduce unemployment?
• Does fluoride reduce tooth decay?

A definition

Statistical Analysis: Mysterious, sometimes bizarre, manipulations performed upon the collected data of an experiment in order to obscure the fact that the results have no generalizable meaning for humanity. Commonly, computers are used, lending an additional aura of unreality to the proceedings.
(Source unknown)

97.3% of all statistics are made up.


Part 1. Research design

1.1.2 Some statistics stories

The Challenger disaster

[Figure: scatterplot of the number of O-rings damaged (vertical axis) against ambient temperature at launch (horizontal axis).]

Charlie’s chooks

[Figure: scatterplot of Y: percentage mortality (vertical axis) against X: percentage Tegel birds (horizontal axis).]



Risk factors for heart disease

A doctor wants to investigate who is most at risk for coronary-related deaths. He selects 12
patients at random from his clinic and records their age, blood pressure and drug used. He
also records whether they eventually died from heart disease or not.

Age   BP   Drug   L/D
18    68   1      D
20    64   2      L
22    72   1      D
25    67   2      L
29    80   –      D
33    70   –      D
34    86   1      D
36    85   –      D
37    73   2      L
39    82   –      L
41    90   1      D
45    87   2      L

(L/D: whether the patient lived or died.)

Drug    Lived   Died   % lived
1       0       4      0%
2       4       0      100%
–       1       3      25%
Total   5       7

Drug 1 looks bad; Drug 2 looks good.
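The summary table can be reproduced directly from the raw data by cross-tabulating drug against outcome. The following is a minimal Python sketch using only the standard library; the list `patients` simply re-types the 12 rows of the table above, and the function name `lived_died_by_drug` is our own illustration, not something from these notes.

```python
from collections import Counter

# (age, blood pressure, drug, outcome) for the 12 patients in the table above.
# "–" is copied from the table's Drug column; "L"/"D" mean lived/died.
patients = [
    (18, 68, "1", "D"), (20, 64, "2", "L"), (22, 72, "1", "D"),
    (25, 67, "2", "L"), (29, 80, "–", "D"), (33, 70, "–", "D"),
    (34, 86, "1", "D"), (36, 85, "–", "D"), (37, 73, "2", "L"),
    (39, 82, "–", "L"), (41, 90, "1", "D"), (45, 87, "2", "L"),
]

def lived_died_by_drug(rows):
    """Return {drug: (number lived, number died, percent lived)}."""
    counts = Counter((drug, outcome) for _, _, drug, outcome in rows)
    table = {}
    for drug in sorted({drug for _, _, drug, _ in rows}):
        lived, died = counts[(drug, "L")], counts[(drug, "D")]
        table[drug] = (lived, died, 100 * lived / (lived + died))
    return table

for drug, (lived, died, pct) in lived_died_by_drug(patients).items():
    print(f"Drug {drug}: lived {lived}, died {died}, {pct:.0f}% lived")
```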



1.1.3 Causation and association

Smoking and Lung Cancer

There is a strong positive correlation between smoking and lung cancer. There are several
possible explanations.

• Causal hypothesis: Smoking causes lung cancer.
• Genetic hypothesis: There is a hereditary trait which predisposes people to both nicotine
addiction and lung cancer.
• Sloppy lifestyle hypothesis: Smoking is most prevalent amongst people who also drink
too much, don’t exercise, eat unhealthy food, etc.

Postnatal care

Mothers who return home from hospital soon after birth do better than those who stay in
hospital longer.

• Causation hypothesis: Hospital is harmful and/or home is helpful.
• Common response hypothesis: Mothers return home early because they are coping well.
• Confounding hypothesis: Mothers return home early if there is someone at home to help.

University applicants

Male Female Total
Accept 70 40 110
Reject 100 100 200
Total 170 140 310

Is there evidence of discrimination?

Course: Introduction to bean counting

Male Female Total
Accept 60 20 80
Reject 60 20 80
Total 120 40 160



Course: Advanced welding

Male Female Total
Accept 10 20 30
Reject 40 80 120
Total 50 100 150

This is an example of Simpson’s Paradox. Simpson’s Paradox occurs when the association between variables is reversed, or appears or disappears, when data from several groups are combined.
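The arithmetic behind this example is easy to check. The following minimal Python sketch recomputes the acceptance rates from the two course tables and then pools them; the dictionary layout and the function name `acceptance_rate` are our own choices.

```python
# (accepted, rejected) by course and sex, copied from the two course tables above
courses = {
    "Introduction to bean counting": {"Male": (60, 60), "Female": (20, 20)},
    "Advanced welding":              {"Male": (10, 40), "Female": (20, 80)},
}

def acceptance_rate(accepted, rejected):
    """Proportion of applicants accepted."""
    return accepted / (accepted + rejected)

# Within each course, men and women are accepted at the same rate
for course, by_sex in courses.items():
    rates = {sex: f"{acceptance_rate(*counts):.0%}" for sex, counts in by_sex.items()}
    print(course, rates)

# Pooling the courses creates an apparent gap, because most women
# applied to the course with the lower acceptance rate
pooled = {}
for by_sex in courses.values():
    for sex, (acc, rej) in by_sex.items():
        a, r = pooled.get(sex, (0, 0))
        pooled[sex] = (a + acc, r + rej)
print("Combined", {sex: f"{acceptance_rate(*c):.0%}" for sex, c in pooled.items()})
```

Within each course the acceptance rates are identical (50% and 20%), yet the pooled counts reproduce the university table, with 41% of men and 29% of women accepted.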

Other examples of Simpson’s paradox

• The average tax rate has increased over time even though the rate in every income category has decreased. Why?
• The average salary of female B.Sc. graduates is lower than that of male B.Sc. graduates. Why?

Causality or association?

1. A positive correlation between blood pressure and income is observed. Does
this indicate a causal connection?
2. In a survey in 1960, it was found that for 25–34-year-old males there was a positive correlation between years of school completed and height. Does going to school longer make a man taller?
3. The same survey showed a negative correlation between age and educational level for persons aged over 25. Why?
4. Students at fee-paying private schools perform better on average in VCE than students at government-funded schools. Why?

Some subtle differences

• Distinguish between: causation & association, prediction & causation, prediction & explanation.
• Note the difference between deterministic and probabilistic causation.



1.2 Organizing a quantitative research study

As a quick check, ask the following questions:

1. What is your hypothesis (your research question)?

2. What is already known about the problem (literature review)?

3. What sort of design is best suited to studying your hypothesis? (method)

4. What data will you collect to test your hypothesis? (sample)

5. How will you analyse these data? (data analysis)

6. What will you do with the results of the study? (communication)

These questions are broken down in more detail below. (These are mostly taken from Rubin et
al. (1990), and have also appeared in Balnaves and Caputi (2001).)

1.2.1 Hypothesis

• What is the goal of the research?
• What is the problem, issue, or critical focus to be researched?
• What are the important terms? What do they mean?
• What is the significance of the problem?
• Do you want to test a theory?
• Do you want to extend a theory?
• Do you want to test competing theories?
• Do you want to test a method?
• Do you want to replicate a previous study?
• Do you want to correct previous research that was conducted in an inadequate manner?
• Do you want to resolve inconsistent results from earlier studies?
• Do you want to solve a practical problem?
• Do you want to add to the body of knowledge in another manner?

1.2.2 Review of literature

• What does previous research reveal about the problem?
• What is the theoretical framework for the investigation?
• Are there complementary or competing theoretical frameworks?
• What are the hypotheses and research questions that have emerged from the literature
review?



1.2.3 Method

• What methods or techniques will be used to collect the data? (This holds for applied and
non-applied research)
• What procedures will be used to apply the methods or techniques?
• What are the limitations of these methods?
• What factors will affect the study’s internal and external validity?
• Will any ethical principles be jeopardized?

1.2.4 Sample

• Who (what) will provide (constitute) the data for the research?
• What is the population being studied?
• Who will be the participants for the research?
• What sampling technique will be used?
• What materials and information are necessary to conduct the research?
• How will they be obtained?
• What special problems can be anticipated in acquiring needed materials and information?
• What are the limitations in the availability and reporting of materials and information?

1.2.5 Data analysis

• How will data be analysed?
• What statistics will be used?
• What criteria will be used to determine whether hypotheses are supported?
• What was discovered (about the goal, data, method, and data analysis) as a result of
doing preliminary work (if conducted)?

1.2.6 Communication

• How will the final research report be organised? (Outline)
• What sources have you examined thus far that pertain to your study? (Reference list)
• What additional information does the reader need?
• What time frame (deadlines) have you established for collecting, analysing and presenting data? (Timetable)

1.3 Some quantitative research designs
• Case study: questionnaire, interview, observation. Best for exploratory work and hypothesis generation. Limited quantitative analysis possible.
• Survey: questionnaire, interview, observation. Best if sample is random.
• Experiment: questionnaire, interview, observation. Best for demonstrating
causality.



1.3.1 Cross-sectional vs longitudinal analysis

All designs can be either cross-sectional or longitudinal.

• Cross-sectional design involves data collection for one time only.
• Longitudinal design involves successive data collection over a period of time. Necessary
if you want to study changes over time.

1.3.2 Case study designs

• requires intensive involvement with a few cases rather than limited involvement with many cases
• can’t generalize results easily
• useful in exploring ideas and generating hypotheses

1.3.3 Survey designs

• Most popular in business/management research
• useful when you cannot control the things you want to study
• difficult to get random and representative samples

1.3.4 Experimental designs

• requires a control group to allow for the placebo effect
• requires the experimenter to control all variables other than the variable of interest
• requires randomization to groups
• allows causation to be tested
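Randomization to groups is straightforward to carry out in software. The following is a minimal Python sketch, assuming a simple design with two groups of (nearly) equal size; the employee IDs and the function name `randomize_to_groups` are hypothetical. Fixing the seed makes the allocation reproducible, so it can be documented and checked later.

```python
import random

def randomize_to_groups(units, seed=None):
    """Randomly split units into treatment and control groups of (nearly) equal size.

    With an odd number of units, the control group gets the extra one.
    """
    rng = random.Random(seed)
    shuffled = list(units)   # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

employees = [f"E{i:03d}" for i in range(1, 21)]   # 20 hypothetical employee IDs
groups = randomize_to_groups(employees, seed=2008)
print("Treatment:", groups["treatment"])
print("Control:  ", groups["control"])
```

Because every unit has the same chance of ending up in either group, differences in other variables between the groups are due only to chance, not to how units were chosen.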

Which research design would you use?

Hypotheses:
1. Women believe they are better at managing than men.
2. Children who listen to poetry in early childhood make better progress in learning to read than those who do not.
3. A business will run more efficiently if no person is directly responsible for more
than five other people.
4. There are inherent advantages in businesses staying small.
5. Employees with postgraduate qualifications have shorter job expectancy than
employees without postgraduate qualifications.

What data would you collect in each case?



1.4 Data structure

1.4.1 Populations and samples

A population is the entire collection of ‘things’ in which we are interested. A sample is a subset of
a population. We wish to make an inference about a population of interest based on information
obtained from a sample from that population.

EXAMPLES:

• You measure the profit/loss of 50 public hospitals in Victoria, randomly selected.
Population:
Sample:
Points of interest:
• Sales on 500 products from one company for the last 5 years are analysed.
Population:
Sample:
Points of interest:

1.4.2 Cases and variables

Think about your data in terms of cases and variables.

• A case is the unit about which you are taking measurements. E.g., a person, a business.
• A variable is a measurement taken on each case.
E.g., age, score on test, grade-level, income.

1.4.3 Types of Data

The ways of organizing, displaying and analysing data depend on the type of data we are investigating.

• Categorical Data (also called nominal or qualitative)

e.g. sex, race, type of business, postcode
Averages don’t make sense. Ordered categories are called ordinal data.

• Numerical Data (also called scale, interval and ratio)

e.g. income, test score, age, weight, temperature, time.
Averages make sense.

Note that we sometimes treat numerical data as categories. (e.g. three age groups.)
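The practical difference between the two types shows up in how they are summarized. The following Python sketch uses a small made-up set of cases (the business types, ages and age-group cut-offs are all hypothetical): categorical data are summarized by counts, numerical data by averages, and a numerical variable can be turned into categories by grouping.

```python
from collections import Counter
from statistics import mean

# Hypothetical cases: one categorical and one numerical variable per case
business_type = ["retail", "finance", "retail", "mining", "retail", "finance"]
age = [23, 35, 41, 52, 29, 64]

# Categorical: count the cases in each category ("average business type" is meaningless)
print(Counter(business_type).most_common())

# Numerical: averages make sense
print(f"mean age = {mean(age):.1f}")

def age_group(a):
    """Treat a numerical variable as categorical by grouping (three age groups)."""
    if a < 30:
        return "under 30"
    if a < 50:
        return "30-49"
    return "50 and over"

print(Counter(age_group(a) for a in age))
```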



1.4.4 Response and explanatory variables

Response variable: measures the outcome of a study. Also called the dependent variable.

Explanatory variable: attempts to explain the variation in the observed outcomes. Also called an independent variable.

Many statistical problems can be thought of in terms of a response variable and one or more explanatory variables.

• Study of profit/loss in Victorian hospitals.
Response variable:
Explanatory variables:

• Monthly sales of 500 products
Response variable:
Explanatory variables: competitor advertising.

1.5 The survey process
1. Planning a survey
State the objectives: In order to state the objectives we often need to ask questions such as:
• What is the survey’s exact purpose?
• What do we not know and want to know?
• What inferences do we need to draw?
Begin by developing a specific list of information needs. Then write focused survey questions.
2. Design the sampling procedure
Identify the target population: Whom are we drawing conclusions about?
Select a sampling scheme: Examples: simple random sampling, stratified random sampling,
systematic sampling, and cluster sampling.
3. Select a survey method
Decide how to collect the data: personal interviews, telephone interviews, mailed questionnaires, diaries, . . .
4. Develop the questionnaire
Write the questionnaire. Decide on the wording, types of questions, and other issues.
5. Pretest the questionnaire
Select a very small sample from the sampling frame. Conduct the survey and see what
goes wrong. Correct any problems before carrying out the full-scale study.
6. Conduct the survey
Run the survey in an efficient and timely manner.
7. Analyze the data
Gather the results and determine outcomes.
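The four sampling schemes named in step 2 differ mainly in how units are drawn from the sampling frame. The following is a minimal Python sketch, assuming the frame is a Python list and that strata and clusters have already been defined (here as dictionaries); the 100-unit frame, the strata, the "regions" and all function names are hypothetical. These functions only select units; estimating population quantities from stratified or cluster samples generally needs appropriate weighting, which is not shown.

```python
import random

def simple_random_sample(frame, n, rng):
    """Every subset of n units is equally likely to be chosen."""
    return rng.sample(frame, n)

def stratified_sample(strata, n_per_stratum, rng):
    """A simple random sample of n_per_stratum units from each stratum."""
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}

def systematic_sample(frame, n, rng):
    """Random start, then every k-th unit, where k = len(frame) // n."""
    k = len(frame) // n
    start = rng.randrange(k)
    return frame[start::k][:n]

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and include every unit in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

rng = random.Random(42)
frame = list(range(1, 101))                           # hypothetical frame of 100 units
strata = {"small": frame[:60], "large": frame[60:]}   # hypothetical strata
clusters = {f"region{j}": frame[10 * j:10 * (j + 1)] for j in range(10)}

print(simple_random_sample(frame, 5, rng))
print(stratified_sample(strata, 3, rng))
print(systematic_sample(frame, 10, rng))
print(cluster_sample(clusters, 2, rng))
```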



Appendix A: Case studies

Injury management in NSW

Four injury management pilots (IMP) running during 2001:

• private hospitals and nursing homes within NSW;

• all industry groups within the Central West NSW region;

• two insurance companies (QBE and EML).

We wish to do a statistical comparison of the injury management pilots with the current standard injury management arrangements.

Performance measures

• incidence of specific payment types
• duration of claims
• number of claims
• proportion of claimants in receipt of weekly benefits at 4, 8, 13 and 26 weeks.
• costs for claimants at 4, 8, 13 and 26 weeks.
– medical, rehabilitation, physiotherapy, chiropractic
– weekly benefits
• timeliness
– number of days from injury to agent notification
– number of days from injury to first payment

Some potential driving variables

• age
• gender
• injury type
• agency (e.g., powered tools)
• severity of injury
• medical interventions
• employer size
• insuring agency
• weekly pay at time of injury
• industry (ANZSIC code)
• occupation (ASCO code)

• Driving variables affect the performance measures.
• Variations between groups in key driver variables can induce apparent differences between groups. These are then confused with any real differences due to the programs being evaluated.
• Therefore any comparisons of groups of employees should either eliminate the effect of
drivers or try to measure the effect of the drivers.



The ideal design!

Ideally, we would use a randomized controlled trial. This eliminates the effect of driving variables.

• The control group would be employees on the old IM system.
• The treatment group would be employees in the new IMP.
• Employees would be randomly allocated to the two groups.
• Statistical comparisons between the two groups would show differences between the old
IM system and the new IMP.
• This random allocation would prevent any systematic differences between those in the
IMP and those not in the IMP.
• Such a scheme is impracticable.

The actual design

We have to use pseudo-control groups and eliminate differences between the control and IMP
groups using statistical models.

• All injuries within the specified industry group, geographical region or insurer will be
subject to the new IMP during 2001.
• The pseudo-controls will be the equivalent groups of employees in 2000 who are not
subject to the new IMP.

Problem of confounding

• If there are differences between the IMP and the control, is it due to the different IM
program or the different group?

Solution:

• adjust for as many driving variables as possible;
• compare similar groups not subject to the IMP.

Comparisons undertaken

IMP group: Private hospitals/nursing homes in NSW 2001
Pseudo-control: Private hospitals/nursing homes 2000

IMP group: Central West NSW region 2001
Pseudo-controls: Central West NSW region 2000

IMP group: Insurance company 2001
Pseudo-control: Insurance company 2000

Non-IMP group: Comparable industry group 2001
Pseudo-controls: Comparable industry group 2000

Non-IMP group: Comparable NSW region 2001
Pseudo-controls: Comparable NSW region 2000



We do not directly compare:

• private hospitals/nursing homes with other industry groups;
• Central West NSW region with other geographical regions.

Instead, we compare the change between 2000 and 2001 in each industry group and each geographical region.

How to interpret the results. . .

• If all 2001 groups are different from the 2000 groups after taking into account all drivers,
then it is likely there are changes between years not reflected in the drivers. We won’t be
able to attribute any changes to the IMP.

• If all IMP 2001 groups are different from the 2000 groups after taking into account all drivers,
but the non-IMP 2001 groups are not different from the 2000 groups, then it is likely the
changes between years are due to the IMP.
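The comparison strategy above is essentially what is usually called a difference-in-differences comparison: the 2000-to-2001 change in an IMP group is set against the change in a comparable non-IMP group, so that changes common to both years cancel out. The following Python sketch uses made-up numbers for a single performance measure (mean claim duration) and assumes the driver adjustment has already been done; in the actual study that adjustment would come from statistical models, which are not shown.

```python
# Hypothetical driver-adjusted mean claim durations (weeks); NOT real data
results = {
    "IMP: Central West NSW":      {2000: 10.0, 2001: 8.0},
    "Non-IMP: comparable region": {2000: 10.5, 2001: 10.0},
}

def change(group):
    """Change from 2000 to 2001 within one group (negative = shorter claims)."""
    return results[group][2001] - results[group][2000]

imp_change = change("IMP: Central West NSW")
control_change = change("Non-IMP: comparable region")

# Year-to-year changes shared by both groups cancel in the difference
did = imp_change - control_change
print(f"IMP change {imp_change:+.1f}, non-IMP change {control_change:+.1f}, "
      f"difference {did:+.1f} weeks")
```

If the non-IMP change were as large as the IMP change, the difference would be near zero, matching the first interpretation above: the change would reflect something about the years rather than the IMP.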



Needlestick injuries

You are interested in the number and severity of needlestick injuries amongst health workers involved in blood donation and transfusion. Work in groups of three to carefully define the objectives of your survey. You will need to specify

• the objective of the survey
• what data are to be collected
• the target population
• the survey population
• the sample
• the data collection method
• potential errors which could occur in your survey.

Palliative care referrals

A few years ago, I helped the Health Department with a survey on palliative care. As part
of the study, it was necessary to study the ‘referral’ pattern for palliative care providers: how
many patients they send to hospital (for inpatient or outpatient treatment); how many they
refer to consultants for specialist comment; how many to community health programs; and so
on.

Possible sampling schemes:

1. sample a group of palliative care practitioners and study their referral patterns;
2. sample a group of palliative care patients and study their referral patterns.

Discuss the possible advantages and disadvantages of the two schemes.


CHAPTER 2
Data collection

2.1 Introduction

“You don’t have to eat the whole ox to know that the meat is tough.”
Samuel Johnson

Sampling is very familiar to all of us, because we often reach conclusions about phenomena
on the basis of a sample of such phenomena. You may test a swimming pool’s temperature by
dipping your toe in the water or the performance of a new vehicle by a short test drive. These
are among the countless small samples that we rely on when making personal decisions. We
tend to use haphazard methods in picking our sample and risk substantial sampling error.

Research also usually reaches its conclusions on the basis of sampling, but the methods used must adhere to certain rules, which are discussed in this chapter. The goal in obtaining data through survey sampling is to use a sample to make precise inferences about the target population. We want to be highly confident about our inferences. It is important to have a substantial grasp of sampling theory to appraise the reliability and validity of the conclusions drawn from the sample taken.
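One reason properly drawn samples support precise inferences is that sampling error shrinks as the sample grows. The following small simulation sketch in Python assumes a hypothetical skewed population of 10,000 values (not data from these notes) and simple random sampling; the function name `spread_of_sample_means` is our own. The spread of the sample means falls roughly in proportion to 1/√n.

```python
import random
import statistics

def spread_of_sample_means(population, n, reps=500, seed=0):
    """Standard deviation of the sample mean over many simple random samples of size n."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]
    return statistics.stdev(means)

rng = random.Random(1)
# Hypothetical population of 10,000 skewed values (e.g. incomes)
population = [rng.lognormvariate(10, 0.5) for _ in range(10_000)]

for n in (10, 100, 1000):
    spread = spread_of_sample_means(population, n)
    print(f"n = {n:4d}: typical sampling error of the mean ~ {spread:,.0f}")
```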

2.2 Data collecting instruments

The choice of data collection instrument is crucial to the success of the survey. When determining an appropriate data collection method, many factors need to be taken into account, including complexity or sensitivity of the topic, response rate required, time or money available for the survey and the population that is to be targeted. Some of the most common data collection methods are described in the following sections.


Part 2. Data collection

2.2.1 Interviewer enumerated surveys

Interviewer enumerated surveys involve a trained interviewer going to the potential respondent, asking the questions and recording the responses.

The advantages of using this methodology are:
• provides better data quality
• special questioning techniques can be used
• greater rapport established with the respondent
• allows more complex issues to be included
• produces higher response rates
• more flexibility in explaining things to respondents
• greater success in dealing with language problems

The disadvantages of using this methodology are:
• expensive to conduct
• training for interviewers is required
• more intrusive for the respondent
• interviewer bias may become a source of error

2.2.2 Web surveys

Web surveys are increasingly popular, although care must be taken to avoid sample selection
bias and multiple responses from an individual.

The advantages of this methodology are:
• cheap to administer
• private and confidential
• easy to use conditional questions, and to prompt when a question is left unanswered or
answered inappropriately
• can build in live checking of answers
• can provide multiple language versions

The disadvantages of this methodology are:
• respondent bias may become a source of error
• not everyone has access to the internet
• language and interface must be very simple
• cannot build up a rapport with respondents
• resolution of queries is difficult
• only appropriate when straightforward data are to be collected
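The "live checking" listed among the advantages of web surveys can be sketched as follows. This is a minimal illustration under stated assumptions, not part of any particular survey package: the age question, the function name `check_age` and the plausible range of 18 to 110 years are all hypothetical choices made for the example. The check runs as each answer is entered and returns a prompt if the answer is missing, non-numeric or implausible.

```python
from typing import Optional

# Hypothetical plausible range for an adult-population survey (an assumption).
MIN_AGE, MAX_AGE = 18, 110


def check_age(raw: str) -> Optional[str]:
    """Return a prompt for the respondent if the answer fails a live check, else None."""
    text = raw.strip()
    if not text:
        return "Please answer this question before continuing."
    if not text.isdecimal():
        return "Please enter your age in whole years, using digits only."
    age = int(text)
    if not MIN_AGE <= age <= MAX_AGE:
        return f"Please check your answer: age must be between {MIN_AGE} and {MAX_AGE}."
    return None  # answer accepted


for answer in ["", "forty", "7", "42"]:
    print(repr(answer), "->", check_age(answer))
```

In a real web form the same logic would run in the browser or on the survey server; the point is only that invalid answers are caught while the respondent is still present to correct them.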

2.2.3 Mail surveys

In self-enumeration mail surveys, the questionnaire is left with the respondent to complete.

The advantages of this methodology are:
• cheaper to administer
• more private and confidential
• in some cases does not require interviewers

The disadvantages of this methodology are:
• difficult to follow up non-response
• respondent bias may become a source of error
• response rates are much lower
• language must be very simple
• problems with poor English and literacy skills
• cannot build up a rapport with respondents
• resolution of queries is difficult
• only appropriate when straightforward data are to be collected

2.2.4 Telephone surveys

In a telephone survey, a potential respondent is phoned and asked the survey questions over
the phone.

The advantages of this methodology are:
• cheap to administer
• convenient for interviewers and respondents

The disadvantages of this methodology are:
• interviews easily terminated by respondent
• cannot use prompt cards to provide alternatives for answers
• burden placed on interviewers and respondents
• sample biased towards households with phones

2.2.5 Diaries

Diaries can be used as a format for a survey. In these surveys respondents are directed to record
the required information over a predetermined period in the diary, book or booklet supplied.

The advantages of this methodology are:
• high quality and detailed data from the completed diaries
• more private and confidential circumstances for the respondent
• does not require interviewers

The disadvantages of this methodology are:
• response rates are lower and the diaries are rarely completed well
• language must be simple
• can only include relatively simple concepts
• cannot build up a rapport
• cannot explain the purpose of survey items to respondents



                                                Face-to-face   Telephone      Mail
Response rates                                  Good           Good           Good

Representative samples
  Avoidance or refusal bias                     Good           Good           Poor
  Control over who completes the questionnaire  Good           Good           Satisfactory
  Gaining access to the selected person         Satisfactory   Good           Good
  Locating the selected person                  Satisfactory   Good           Good

Effects on questionnaire design
  Ability to handle:
    Long questionnaires                         Good           Satisfactory   Satisfactory
    Complex questions                           Good           Poor           Satisfactory
    Boring questions                            Good           Satisfactory   Poor
    Item non-response                           Good           Good           Satisfactory
    Filter questions                            Good           Good           Satisfactory
    Question sequence control                   Good           Good           Poor
    Open ended questions                        Good           Good           Poor

Quality of answers
  Minimize socially desirable responses         Poor           Satisfactory   Good
  Ability to avoid distortion due to:
    Interviewer characteristics                 Poor           Satisfactory   Good
    Interviewer opinions                        Satisfactory   Satisfactory   Good
    Influence of other people                   Satisfactory   Good           Poor
  Allows opportunities to consult               Satisfactory   Poor           Good
  Avoids subversion                             Poor           Satisfactory   Good

Implementing the survey
  Ease of finding suitable staff                Poor           Good           Good
  Speed                                         Poor           Good           Satisfactory
  Cost                                          Poor           Satisfactory   Good

Table 2.1: Advantages and disadvantages of three methods of data collection. Table taken from
de Vaus (2001), who adapted it from Dillman (1978).

2.2.6 Ideas for increasing response rates

1. Provide a reward.
2. Follow up non-respondents systematically.
3. Keep the questionnaire short.
4. Make the topic interesting to respondents.



2.2.7 Archival data

Rather than collecting your own data, you may use some existing data. If you do, keep the
following points in mind.

Available information: Is there sufficient documentation of the original research proposal for
which the data were collected? If not, there may be hidden problems in re-using the data.

Geographical area: Are the data relevant to the geographical area you are studying? For
example, what country, city, state or other area do the archival data cover?

Time period: Are the data relevant to the time period you are studying? Does your research
area cover recent events, is it historical, or does it look at changes over a specified range
of time? Most data are at least a year old before they are released to the public.

Population: What population do you wish to study? This can refer to a group or groups of
people, particular events, official records, etc. In addition, you should consider whether
you will look at a specific sample or subset of people, events, records, etc.

Context: Do the archival data contain the information relevant to your research area?

2.3 Errors in statistical data

In sample surveys there are two types of error that can occur:

• sampling error, which arises because only part of the population is used to represent the
whole population; and
• non-sampling error, which can occur at any stage of a sample survey.

It is important to be aware of these errors so that they can be minimized.

2.3.1 Sampling error

Sampling error is the error we make in selecting samples that are not representative of the
population. Since it is practically impossible for a smaller segment of a population to be exactly
representative of the population, some degree of sampling error will be present whenever we
select a sample. It is important to report sampling error when publishing survey results, as
it indicates the accuracy of the estimates and therefore how much confidence can be placed in
interpretations of them.

If sampling principles are carefully applied within the constraints of available resources, sam-
pling error can be accurately measured and kept to a minimum. Sampling error is affected
by:

• sample size
• variability within the population
• sampling scheme



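Generally, larger samples give smaller sampling error. Under simple random sampling, the
standard error of an estimate such as a mean or a proportion is proportional to 1/√n, where n is
the sample size, so to halve the sampling error the sample size has to be increased fourfold. In
fact, sampling error can be completely eliminated by increasing the sample size to include every
element in the population.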
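The fourfold rule can be checked numerically. The sketch below assumes simple random sampling of a proportion and ignores the finite population correction; the true proportion p = 0.5 is a hypothetical value, chosen because it gives the largest standard error for a proportion. The helper name `se_proportion` is illustrative only.

```python
import math


def se_proportion(p: float, n: int) -> float:
    """Standard error of a sample proportion under simple random sampling
    (finite population correction ignored)."""
    return math.sqrt(p * (1 - p) / n)


p = 0.5  # assumed true population proportion
for n in [100, 400, 1600]:
    print(f"n = {n:5d}: standard error = {se_proportion(p, n):.4f}")
```

This prints standard errors of 0.0500, 0.0250 and 0.0125: each fourfold increase in n halves the standard error.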

The population variability also affects the error: more variable populations give rise to larger
errors, because estimates calculated from different samples are more likely to vary widely. The
effect of the variability within the population can be reduced by increasing the sample size to
make the sample more representative of the target population.
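A small simulation illustrates the effect of population variability. This is a sketch under stated assumptions: both populations are hypothetical normal populations with mean 50, one with standard deviation 5 and one with standard deviation 20, and the helper `sd_of_sample_means` is an illustrative name. Repeated samples of the same size (n = 25) are drawn from each, and the spread of the resulting sample means is compared.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible


def sd_of_sample_means(pop_sd: float, n: int, reps: int = 2000) -> float:
    """Draw `reps` samples of size n from a normal population (mean 50, given SD)
    and return the standard deviation of the resulting sample means."""
    means = [
        statistics.fmean(random.gauss(50, pop_sd) for _ in range(n))
        for _ in range(reps)
    ]
    return statistics.stdev(means)


low = sd_of_sample_means(pop_sd=5, n=25)
high = sd_of_sample_means(pop_sd=20, n=25)
print(f"population SD  5: spread of sample means ~ {low:.2f}")
print(f"population SD 20: spread of sample means ~ {high:.2f}")
```

The spreads should come out close to 5/√25 = 1 and 20/√25 = 4: with the same sample size, the more variable population gives sample means that vary about four times as much.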

2.3.2 Non-sampling error

Non-sampling error is any error in a survey that is not caused by the fact that we have selected
only part of the population. Even if we were to undertake a complete enumeration of the
population, non-sampling errors might remain. In fact, as the size of the sample increases, the
non-sampling errors may get larger, because of such factors as a possible increase in the
non-response rate, interviewer errors, and data processing errors.

For the most part we cannot measure the effect that non-sampling errors will have on the
results, and because of their nature these errors cannot be totally eliminated. Perhaps the
biggest source of non-sampling error is a poorly designed questionnaire. The questionnaire can
influence the response rate achieved in the survey, the quality of responses obtained and
consequently the conclusions drawn from survey results.

Some common sources of non-sampling error are discussed in the following paragraphs.

Target population
Failure to identify clearly who is to be surveyed. This can result in an inadequate sampling
frame, imprecise definitions of concepts, and poor coverage rules.
Non-response
A non-response error occurs when the respondents do not reflect the sampling frame.
This can happen when the people who do not respond to the survey differ from the people
who do respond, as often occurs in voluntary response polls. For example, suppose that
in an air bag study we asked respondents to call a 0018 number to be interviewed. Because
a 0018 call costs $2 per minute, many drivers may not respond. Furthermore, those who do
respond may be the people who have had bad experiences with air bags. Thus the final
sample of respondents may not even represent the sampling frame.
Similar problems arise whenever part of the target population is hard to reach:
• telephone polls miss those people without phones
• household surveys miss homeless people, prisoners, students living in colleges, etc.
• train surveys only target public transport users and tend to over-represent regular public
transport users.
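The air bag example can be made concrete with a simulation of non-response bias. Every number here is a hypothetical assumption chosen for illustration: a frame of 100,000 drivers, 20% of whom had a bad air bag experience, with those drivers responding with probability 0.6 and all others with probability 0.1. The point is that the respondents' share of bad experiences is far from the frame's true share.

```python
import random

random.seed(2)  # fixed seed so the sketch is reproducible

N = 100_000                                  # hypothetical sampling frame of drivers
P_BAD = 0.20                                 # assumed true share with a bad experience
RESPONSE_PROB = {True: 0.60, False: 0.10}    # assumed: unhappy drivers respond more

# True = driver had a bad air bag experience
frame = [random.random() < P_BAD for _ in range(N)]
# Each driver responds with a probability that depends on their experience
respondents = [bad for bad in frame if random.random() < RESPONSE_PROB[bad]]

true_share = sum(frame) / len(frame)
observed_share = sum(respondents) / len(respondents)
print(f"true share with bad experience: {true_share:.3f}")
print(f"share among respondents:        {observed_share:.3f}")
print(f"response rate:                  {len(respondents) / N:.3f}")
```

Under these assumptions the respondent share settles near 0.12 / (0.12 + 0.08) = 0.60 rather than 0.20, and making N larger only pins it more tightly around 0.60: a bigger sample does not remove non-response bias.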



Manufacturers and advertising agencies often use interviews at shopping malls to
gather information about the habits of consumers and the effectiveness of ads. A
sample of mall shoppers is fast and cheap. “Mall interviewing is being propelled
primarily as a budget issue”, one expert told the New York Times. But people con-
tacted at shopping malls are not representative of the entire population. They are
richer, for example, and more likely to be teenagers or retired. Moreover, mall
interviewers tend to select neat, safe-looking individuals from the stream of customers.
Decisions based on mall interviews may not reflect the preferences of all consumers.

In 1991 it was claimed that data showed that right-handed persons live on average
almost a decade longer than left-handed or ambidextrous persons. The investigators
had compared the mean ages at death of people who had been identified by their
survivors as left-handed, right-handed or mixed-handed.
• What is the problem?
The questionnaire
Poorly designed questionnaires with mistakes in wording, content or layout may make it
difficult to record accurate answers. The most effective methods of designing a
questionnaire are discussed in Section 2.4. Following these principles will help reduce the
non-sampling error associated with the questionnaire.
Interviewers
If an interviewer is used to administer the survey, their work has the potential to produce
non-sampling error. This can be due to the personal characteristics of the interviewer.
For example, an elderly person will often be more comfortable giving information to a
female interviewer. The interviewer's own opinions may also influence the respondent's
answers.

In 1968, one year after a major racial disturbance in Detroit, a sample of black resi-
dents was asked:
Do you personally feel that you can trust most white people, some white people,
or none at all?
Of those interviewed by whites, 35% answered “Most”, while only 7% of those in-
terviewed by blacks gave this answer. Many questions were asked in this study.
Only on some topics, particularly black-white trust or hostility, did the race of the
interviewer have a strong effect on the answers given. The interviewer was a large
source of non-sampling error in this study.

Respondents
Respondents can also be a source of non-sampling error. They may refuse to answer ques-
tions, or provide inaccurate information to protect themselves. They may have memory
lapses and/or lack of motivation to answer the questionnaire, particularly if the ques-
tionnaire is lengthy, overly complicated or of a sensitive nature. Respondent fatigue is a
very important factor.

Social desirability bias refers to the effect where respondents will provide answers which
they think are more acceptable, or which they think the interviewer wants to hear. For
example, respondents may state that they have a higher income than is actually the case
if they feel this will increase their status.



Respondents may refuse to answer a question which they find embarrassing or choose
a response which prevents them from continuing with the questions. For example, if
asked the question: “Are you taking oral contraceptive pills for any reason?”, and know-
ing that if they respond “Yes” they will be asked for more details, respondents who are
embarrassed by the question are likely to answer “No”, even if this is incorrect.

Fatigue can be a problem in surveys which require a high level of commitment from
respondents. The level of accuracy and detail supplied may decrease as respondents become
tired of recording all information. Sometimes interviewer fatigue can also be a problem,
particularly when the interviewers have a large number of interviews to conduct.

Processing and collection
Processing and collection errors can be a source of non-sampling error. For example,
the results from the survey may be entered incorrectly. The time of year in which the survey
is conducted can also produce non-sampling error. For example, if the survey is conducted in
the school holidays, potential respondents with school children could be away or hard to
contact.

The Shere Hite surveys

In 1987, Shere Hite published a best-selling book called Women and Love. The author distributed
100,000 questionnaires through various women’s groups, asking questions about love, sex, and
relations between women and men. She based her book on the 4.5% of questionnaires that were
returned.

• 95% said they were unhappily married
• 91% of those who were divorced said that they had initiated the divorce

What are the problems with this research?
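One way to see how little a 4.5% return can tell us is to compute worst-case bounds. The sketch below makes two simplifying assumptions that should be kept in mind: it treats the reported 95% as applying to all returned questionnaires (in the book it refers to married respondents), and it bounds the share only among the 100,000 women who received a questionnaire, not among women in general. It then supposes that none, or all, of the non-respondents were unhappily married.

```python
distributed = 100_000
response_rate = 0.045
returned = round(distributed * response_rate)   # questionnaires returned
not_returned = distributed - returned           # questionnaires not returned

# Simplification (assumption): treat the reported 95% as 95% of all returns.
unhappy_among_returned = 0.95 * returned

# Worst-case bounds over everyone who received a questionnaire:
# none of the non-respondents unhappy (lower) or all of them unhappy (upper).
lower = unhappy_among_returned / distributed
upper = (unhappy_among_returned + not_returned) / distributed

print(f"returned: {returned}, not returned: {not_returned}")
print(f"share unhappily married could lie anywhere from {lower:.1%} to {upper:.1%}")
```

With 4,500 returns and 95,500 non-returns, the bounds run from about 4.3% to about 99.8%, so the returned questionnaires alone say almost nothing about the group that received them.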

Exercise 1: In Case 2, it was necessary to study the ‘referral’ pattern for palliative
care providers: how many patients they send to hospital (for inpatient or out-
patient treatment); how many they refer to consultants for specialist comment;
how many to community health programs; and so on. Two alternative sam-
pling schemes are available: sample a group of palliative care practitioners
and study their referral patterns; or sample a group of palliative care patients
and study their referral patterns. Discuss the possible advantages and disad-
vantages of the two schemes.

2.4 Questionnaire design

2.4.1 Introduction

The purpose of a questionnaire is to obtain specific information with tolerable accuracy and
completeness. Before the questionnaire is designed, the collection objectives should be defined.
These include:



• clarifying the objectives of the survey
• determining who is to be interviewed
• defining the content
• justifying the content
• prioritizing the data that are to be collected. This is important as it makes it easier to
discard items if the survey, once developed, is too lengthy.

Careful consideration should be given to the content, wording and format of the questionnaire
as one of the largest sources of non-sampling error is poor questionnaire design. This error can
be minimized by considering the objectives of the survey and the required output, and then
devising a list of questions that will accurately obtain the information required.

2.4.2 Content of the questionnaire

Relevant questions

It is important to ask only questions that are directly related to the objectives of a survey, as
a means of minimizing the burden placed on respondents. The concept of a fatigue point, which
occurs when respondents can no longer be bothered answering questions, should be recognized,
and questions designed so that the respondent finishes the form before this point is reached.

Towards the end of long questionnaires, respondents may give less thought to their answers
and concentrate less on the instructions and questions, thereby decreasing the accuracy of in-
formation they provide. Very long questionnaires can also lead the respondent to refuse to
complete the questionnaire. Hence it is necessary to ensure only relevant questions are asked.

Reliable questions

It is important to include questions in a questionnaire that can be easily answered. This objec-
tive can be achieved by adhering to the following techniques.

Appropriate recall: If information is requested by recall, the events should be sufficiently
recent or familiar to respondents. People tend to remember what they should have done, have
selective memories, and shift activities that surround the event into the reference period.
Minimizing the need for recall improves the accuracy of response.

Common reference periods: To make it easier for the respondent to answer, use reference
periods which match those of the respondent’s records.

Results justify efforts: The amount of effort to which a respondent goes to obtain the data
must be worth it. It is reasonable to accept a respondent’s estimate when calculating the exact
figures would make little difference to the outcome.

Filtering: Respondents should not be asked questions they cannot answer. Filter questions
should be asked to exclude respondents from irrelevant questions.



2.4.3 Types of questions

Factual questions
These questions seek information rather than opinion. For example, respondents could be
asked about behaviour patterns (e.g., When did you last visit a General Practitioner?).

Classification or demographic questions
These are used to gain a profile of the population that has been surveyed and provide
important data for analysis.

Opinion questions
Rather than facts, these questions seek opinion. There are many problems associated with
opinion questions:

• a respondent may not have an opinion/attitude towards the subject so the response
may be provided without much thought;
• opinion questions are very sensitive to changes in wording;
• it is impossible to check the validity of responses to opinion questions.

Hypothetical questions
The “What would you do if . . . ?” type of question. The problems with these questions
are similar to those of opinion questions. You can never be certain how valid any answer to
a hypothetical question is likely to be.

2.4.4 Answer formats

Questions can generally be classified as one of two types, open or closed, depending on the
amount of freedom allowed in answering the question. When deciding which type of question
to use, consideration should be given to the kind of information sought, ease of processing the
response, and the availability of the resources of time, money, and personnel.

Open questions

Open questions allow the respondents to answer the question in their own words. These
questions allow any number of possible answers, and they can collect exact values from a wide
range of possible values. Hence, open questions are used when the list of responses is very
long and not obvious.

The major disadvantage of open questions is that they are far more demanding than closed
questions, both to answer and to process. Also, the answers to these questions depend on the
respondents’ ability to write or speak as much as on their knowledge. Two respondents might
have the same knowledge and opinions, but their answers may seem different because of their
varying abilities.



Open ended
    Which country makes the best cars?
    ...............................................

Multiple choice questions
    Which country makes the best cars?
    1. USA   2. Germany   3. Japan

Partially closed questions
    Which country makes the best cars?
    1. USA   2. Germany   3. Japan
    4. Other (please specify)

Checklist questions
    For the list provided, indicate which brand/s of cars you have owned.
    1. Ford   2. Toyota   3. BMW

Likert scale (opinion) questions
    I believe Japanese cars are less reliable than European cars.
    Strongly agree   Agree   No opinion   Disagree   Strongly disagree
          1            2         3           4              5

Closed questions

Closed questions ask the respondents to choose an answer from the alternatives provided.
These questions should be used when the full range of responses is known. Closed questions
are far easier to process than open questions. The main disadvantage of closed questions is
that the reasons behind a particular selection cannot be determined.

There are a number of types of closed questions.

• Limited choice questions require the respondent to choose one of two mutually exclusive
answers, for example yes/no.
• Multiple choice questions require the respondent to choose from a number of responses
provided.
• Checklist questions allow a respondent to choose more than one of the responses
provided.
• Partially closed questions provide a list of alternatives where the last alternative is “Other,
please specify”. These questions are useful when it is difficult to list all possible choices.
• Opinion (Likert) scale: An opinion scale question seeks to locate a respondent’s opinion
on a rating scale with a limited number of points. For example, a five-point scale
measure of strong and weak attitudes would ask the respondent whether they strongly
agree/agree/are neutral/disagree/strongly disagree with a particular statement of opinion.
A three-point scale, by contrast, would only measure whether they agree, disagree or are
neutral. Opinion scales of this sort are called Likert scales.
Five point scales are best because:
–
–
–
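Closed questions are easy to process because each response maps to a numeric code. The sketch below codes hypothetical answers to the Likert item shown earlier ("I believe Japanese cars are less reliable than European cars.") using that item's 1 to 5 scheme, then tabulates them. The ten answers are invented for illustration. Because Likert codes are ordinal, the median is often reported in preference to the mean.

```python
import statistics
from collections import Counter

# Coding scheme from the example item: 1 = Strongly agree ... 5 = Strongly disagree.
CODES = {
    "Strongly agree": 1,
    "Agree": 2,
    "No opinion": 3,
    "Disagree": 4,
    "Strongly disagree": 5,
}

# Hypothetical answers from ten respondents.
answers = ["Agree", "Disagree", "Strongly disagree", "Disagree", "No opinion",
           "Agree", "Disagree", "Strongly agree", "Disagree", "Strongly disagree"]

coded = [CODES[a] for a in answers]   # convert labels to numeric codes
counts = Counter(coded)               # frequency of each code

for code in sorted(CODES.values()):
    print(code, counts.get(code, 0))
print("median code:", statistics.median(coded))
```

Here four of the ten respondents chose "Disagree" (code 4), and the median code is 4.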

Response Categories

When questions have categories provided, it is important that every response is catered for.

Number of Categories
Too few categories can reduce the quality of the data, as the respondent may have
difficulty finding one which accurately describes their situation. Too many categories
can cause the same difficulty.

Don’t Know: A ‘Don’t Know’ category can be included so that respondents are not forced to
express decisions or attitudes that they would not normally hold. Excluding the option is not
usually good; however, it is hard to predict the effect of including it. The decision of whether
or not to include a ‘Don’t Know’ option depends, to a large extent, on the subject matter.
I was gratified to be able to answer promptly, and I did. I said I didn’t know.
Mark Twain, Life on the Mississippi

2.4.5 Wording of questions

Language

Questions which employ complex or technical language or jargon can confuse or irritate
respondents. Respondents who do not understand the question may be unwilling to appear
ignorant by asking the interviewer to explain it or, if an interviewer is not present, may not
answer or may answer incorrectly.

Ambiguity

If ambiguous words or phrases are included in a question, the meaning may be interpreted
differently by different people. This will introduce errors in the data, since different
respondents will effectively be answering different questions.

For example, consider the question “Why did you fly to New Zealand on Qantas airlines?” Most
people might interpret this question as intended, but it contains three possible questions, so
the response might concern any of these:

• I flew (rather than another mode of travel) because . . .
• I went to New Zealand because . . .
• I selected Qantas because . . .



Double-barreled questions

When one question contains two concepts, it is known as a double-barreled question. For
example: “How often do you go grocery shopping and do you enjoy it?”

Because each concept in the question may have a different answer, or one concept may not be
relevant, respondents may be unsure how to respond. The interpretation of the answers to these
questions is almost impossible. Double-barreled questions should be split into two or more
separate questions.

Leading questions

Questions which lead respondents to answers can introduce error. For example, the question
“How many days did you work last week?”, if asked without first determining whether
respondents did in fact work in the previous week, is a leading question. It implies that
the person would have been at work. Respondents may answer incorrectly to avoid telling the
interviewer that they were not working.

Unbalanced questions
“Are you in favour of euthanasia?” is an unbalanced question because it provides only one
alternative. It can be reworded to “Do you favour or not favour euthanasia?” to give respondents
more than one alternative.
Similarly, the use of a persuasive tone can affect the respondent’s answers. Wording should be
chosen carefully to avoid a tone that may produce bias in responses.

Recall/memory error
Respondents tend to remember what should have been done rather than what was done. The
quality of data collected from recall questions is influenced by the importance of the event to
the respondent and the length of time since the event took place. Subjects of greater interest or
importance to the respondent, or events which happen infrequently, will be remembered over
longer periods and more accurately. Minimizing the recall period also helps to reduce memory
bias.
Telescoping is a specific type of memory error. It occurs when the respondent reports events
as occurring either earlier or later than they actually occurred. Error results when respondents
include details of an event which actually occurred outside the specified reference period.

Sensitive questions
Questions on topics which respondents may see as embarrassing or highly sensitive can
produce inaccurate answers. If respondents are required to answer questions with information
that might seem socially undesirable, they may provide the interviewer with responses they
believe are more ‘acceptable’. If such questions are placed at the beginning of the
questionnaire, they could lead to non-response if respondents are unwilling to continue with
the remaining questions.
For example, “Approximately how many cans of beer do you consume each week, on
average?”
1. None
2. 1–3 cans
3. 4–6 cans
4. More than 6
A respondent might choose response 2 or 3 rather than admit to consuming the greatest
quantity on the scale. Consider extending the range of choices far beyond what is expected,
so that the respondent can select an answer closer to the middle and feel they are in the
normal range.
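The beer question also shows the earlier requirement that every response be catered for: the categories must be exhaustive (every count falls somewhere) and mutually exclusive (no count falls in two places). The sketch below encodes the four categories from the example, assuming responses are whole numbers of cans, and checks both properties over a range of counts. The names `CATEGORIES` and `categorize` are hypothetical.

```python
# Response categories from the example question (cans of beer per week).
# Each entry is (label, lowest count, highest count); None means no upper limit.
CATEGORIES = [
    ("None", 0, 0),
    ("1–3 cans", 1, 3),
    ("4–6 cans", 4, 6),
    ("More than 6", 7, None),
]


def categorize(cans: int) -> str:
    """Return the single category a whole-number weekly count falls into."""
    matches = [label for label, lo, hi in CATEGORIES
               if cans >= lo and (hi is None or cans <= hi)]
    if len(matches) != 1:
        # Zero matches: categories not exhaustive; two or more: not mutually exclusive.
        raise ValueError(f"{cans} cans matches {len(matches)} categories")
    return matches[0]


# Every plausible count must fall into exactly one category.
for cans in range(0, 101):
    categorize(cans)

print(categorize(0), "|", categorize(5), "|", categorize(24))
```

Extending the range, as suggested above, would mean adding further categories (for example splitting "More than 6") while keeping the same exactly-one-match check.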

In 1980, the New York Times/CBS News Poll asked a random sample of Americans
about abortion. When asked “Do you think there should be an amendment to the
Constitution prohibiting abortions, or shouldn’t there be such an amendment?”
29% were in favour and 62% were opposed. The rest of the sample were uncertain.
The same people were later asked a different question: “Do you believe there
should be an amendment to the Constitution protecting the life of the unborn child,
or shouldn’t there be such an amendment?” Now 50% were in favour and only
39% were opposed.

Acquiescence

This situation arises in a long series of questions answered on the same response scale.
Respondents get used to giving the same answer and may answer inaccurately.
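During data processing, one simple check for acquiescence is to flag respondents who give the same answer to a long run of consecutive items. The sketch below assumes hypothetical coded answers (1 to 5) to a block of eight items sharing one response scale, and a hypothetical cut-off of six identical answers in a row. A flag is only a prompt for closer inspection, since a run of identical answers can be genuine.

```python
# Hypothetical coded answers (1-5) to a block of 8 items with the same response scale.
responses = {
    "R01": [2, 4, 3, 2, 5, 1, 2, 4],
    "R02": [3, 3, 3, 3, 3, 3, 3, 3],
    "R03": [4, 4, 4, 4, 4, 4, 4, 5],
}

THRESHOLD = 6  # hypothetical cut-off; choose to suit the length of the block


def longest_run(answers):
    """Length of the longest run of identical consecutive answers."""
    if not answers:
        return 0
    best = run = 1
    for prev, cur in zip(answers, answers[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


flagged = [rid for rid, a in responses.items() if longest_run(a) >= THRESHOLD]
print(flagged)
```

Here R02 (eight identical answers) and R03 (seven) are flagged, while R01 is not.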

2.4.6 Questionnaire format

Including an introduction

It can be advantageous to include an introductory statement or explanation at the beginning of
a survey. The introduction may include such information as the purpose of the survey or the
scope of collection. It will aid respondents when answering the questions if they know why
the information is being sought, as this gives them a context in which to frame their answers.
An assurance of confidentiality will give respondents confidence that their answers will not
be disclosed to unwanted parties.

Question and page numbers

To ensure that the questionnaire can be easily administered by interviewers or respondents, the
pages of the questionnaire and the questions should be numbered consecutively with a simple
numbering system. Question numbers provide sign-posts along the way. They help if remedial
action is required later and you want to refer the interviewer or respondent back to a
particular place.

Sequencing

The questions in a questionnaire should follow an order which is logical and flows smoothly
from one question to the next. The questionnaire layout should have the following
characteristics.



Related questions grouped
Questions which are related should be grouped together and where necessary placed into
sections. Sections should contain an introductory heading or statement.

If possible, question ordering should try to anticipate the order in which respondents
will supply information. In a well-designed survey, a question not only prompts an
answer but also prompts the answer to a question that follows shortly after.

Question ordering
It is important to be aware that earlier questions can influence the responses of later ques-
tions, so the order of questions should be carefully decided. In attitudinal questions, it
is important to avoid conditioning respondents in an early question which could then
bias their responses to later questions. For example, you should ask about awareness of
a concept before any other mention of the concept.

Respondent motivation

Whenever possible, start the questionnaire with easy and pleasant questions to promote inter-
est in the survey and give the respondent confidence in their ability to complete the survey.
The opening questions should ensure that the particular respondent is a member of the survey
population.

Questions that are perceived as irritating or obtrusive tend to get a low response rate and
may trigger a refusal from the respondent. These questions need to be positioned carefully in
the questionnaire, where they are least likely to cause offence.

It is also important that respondents are asked only relevant questions; otherwise they may
become annoyed and uninterested. Include filter questions to direct respondents to skip
questions which do not apply to them. Filter questions often identify sub-populations. For
example,

“Do you usually speak English at home?” Yes (Go to Q34)
No (Go to Q10)
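In a web or computer-assisted survey, filter instructions like these become routing rules. The sketch below is a minimal illustration, not any survey package's API: it assumes the filter question above is numbered Q9 (a hypothetical number, since the example does not give one), stores its "Go to" instructions in a routing table, and otherwise moves to the next question in sequence.

```python
# Hypothetical routing table: the filter question is assumed to be Q9.
# Each answer with a skip instruction maps to the question asked next.
ROUTING = {
    "Q9": {"Yes": "Q34", "No": "Q10"},
}


def next_question(current: str, answer: str) -> str:
    """Follow a skip instruction if one applies; otherwise go to the next number."""
    routes = ROUTING.get(current, {})
    if answer in routes:
        return routes[answer]
    number = int(current[1:])  # question labels assumed to look like "Q12"
    return f"Q{number + 1}"


print(next_question("Q9", "Yes"))   # follows the filter instruction to Q34
print(next_question("Q9", "No"))    # follows the filter instruction to Q10
print(next_question("Q12", "Yes"))  # no filter on Q12, so continue to Q13
```

Keeping the skip instructions in one table, rather than scattered through the question logic, makes it easier to check that every route leads to a real question.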

Questionnaire layout

The questionnaire layout should be aesthetically pleasing, so the layout does not contribute to
respondent fatigue. Things that can interfere with the answering of a questionnaire are: unclear
instructions and questions, insufficient space to provide answers, hard-to-read text, difficulty
in understanding language, and back-tracking through the form. Many of these are the result
of bad form design and are avoidable.

Only include essentials on the questionnaire form. Keep the amount of ink on the form to the
minimum necessary for the form to work properly. Anything that is not necessary contributes
to the fatigue point of the respondent and to the subsequent detriment of the data quality.



General layout

Consistency of layout: If consistency and logical patterns are introduced into the form design, it
eases the form filler’s task. Patterns that can be useful are:

• white spaces for responses
• using the same question type throughout the form
• using the same layout throughout the form
• using a different style, consistently, for instructions or directions.

Type size: A font size between 10 and 12 points is considered the best in most circumstances.
If the respondent does not have perfect vision, or ideal working conditions, small fonts can
cause problems.

Use of all upper-case text: It is best to avoid upper case text. Upper case text has been shown to
be hard to read, especially where large amounts of text are involved. Words lose their
shape when in upper case, becoming converted to rectangles. Upper case should be reserved
for titles or emphasis, but even these can often be achieved just as well using other methods,
such as bold, italics, or a slightly larger type size.

Line length: As the eye has a clear focus range of only a few degrees, lines should be kept short.
It takes the eye several movements to scan a line of text. If more than 2 or 3 such
movements occur, the eye can become fatigued and tends to lose track of which line it is
reading. This leads to backtracking through the text or to misinterpretation.

Character and line spacing: It is very important to leave enough space on a form for answers.
Research has shown that forms requiring handwritten responses need 7–8 mm between lines and
a width of 4–5 mm for each character.
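These spacing figures translate directly into the size of an answer space. The helper below is a hypothetical sketch (not from the source) that multiplies them out; it assumes the generous end of each range (5 mm per character, 8 mm between lines) as defaults.

```python
def answer_box_size(chars_per_line: int, n_lines: int = 1,
                    char_width_mm: float = 5.0,
                    line_spacing_mm: float = 8.0) -> tuple[float, float]:
    """Return (width_mm, height_mm) for a handwritten answer space.

    Hypothetical helper: defaults take the upper end of the 4-5 mm
    character width and 7-8 mm line spacing quoted in the text.
    """
    if chars_per_line < 1 or n_lines < 1:
        raise ValueError("need at least one character and one line")
    width = chars_per_line * char_width_mm     # horizontal space for the characters
    height = n_lines * line_spacing_mm         # vertical space for the lines
    return width, height

# A 20-character surname field on one line:
print(answer_box_size(20))             # (100.0, 8.0)
# A three-line answer of up to 30 characters per line:
print(answer_box_size(30, n_lines=3))  # (150.0, 24.0)
```

Using the lower end of the ranges instead (4 mm, 7 mm) gives the minimum space a form designer should allow.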

Response layout

Obtaining responses: A popular way of obtaining responses is to use tick boxes. However, it is
usually preferable to use a labelled list (e.g., a, b, c, ...) and ask respondents to circle their
response. This makes coding and data entry easier.

If a written response is required, it is best to provide empty answer spaces with dotted
lines.

Positioning of responses: Vertical alignment of responses is preferred to horizontal alignment. It
is easier to read down a list and select the correct box than to read across the page and locate
an item in a horizontal string. Forms with captions to the left of the answer box are easier for
respondents to complete.

Order of response options: The order of response options matters because it can be a source of
bias. The options presented first may be selected because they make an impact on respondents,
or because respondents lose concentration and do not hear or read the remaining options. The
last options may be chosen because they are the most easily recalled, particularly if respondents
are faced with a long list of options. Long or complex response options may also make recall
more difficult and increase the effects due to the order of response options.

Prompt card: If the questionnaire is interviewer-based and a number of response options are
given for some questions, a prompt card may be appropriate. A prompt card is a list of the
possible responses to a question, displayed on a separate card that the interviewer shows to
respondents. This helps to reduce errors arising from respondents being unable to remember all
the options read out. However, respondents with poor eyesight, migrants with limited English,
and adults with literacy problems will still have difficulty answering accurately.

Exercise 2: (Case 2) The questionnaire on pages 47–48 was an early draft of the
questionnaire prepared by the client. The questionnaire on pages 49–51 is a
later draft of the questionnaire after I had provided the client with some advice.
See if you can determine why each of the changes has been made. How could
you further improve the questionnaire?

2.4.7 Pretesting the questionnaire

A pretest of a questionnaire should be considered mandatory. Even if the designer has reviewed
the draft questionnaire meticulously against every principle of good design, it is still likely to
contain faults. Normally, a number of these emerge only when the form is used in the field,
because the researcher did not fully anticipate what would take place. The only way to detect
these faults fully is to administer the survey to the types of respondents who would be sampled
in the study.

Four types of pretesting are described below: skirmishing, focus groups, observational studies
and pilot testing. Each is used at a different stage of survey development and aims to test
different aspects of the survey.

Skirmishing
Skirmishing is the process of informally testing questionnaire design with groups of respondents.
The questionnaire is basically unstructured and is tested with a group of people who can
provide feedback on issues such as each question’s frame of reference, the level of knowledge
needed to answer the questions, the range of likely answers to questions, and how respondents
formulate their answers. Skirmishing is also used to detect flaws or awkward wording in
questionnaires and to test alternative designs. At this stage we may use open-ended response
categories to work out likely responses. The questionnaire should be redrafted after skirmishing.

Focus groups
A skirmish tests the questionnaire design against general respondents, whilst focus groups
concentrate on a specific audience. For example, a survey studying the effects of living on
unemployment benefits could use a group of unemployed people as a focus group.

A focus group can also be used to test questions directed at small sub-populations. For example,
if we were looking at community services, we might have a filter question to target disabled
people. Since few disabled people may be chosen in the sample, we need to test those questions
on a focus group of disabled people, even though such a group is a biased sample.



Observational studies
In an observational study, respondents complete a draft questionnaire in the presence of an
observer. Whilst completing the form, the respondents explain their understanding of the
questions and of the method required to provide the information. These studies can identify
problem questions through observation, through the questions respondents ask, or through the
time taken to complete a particular question. Data availability and the most appropriate person
to supply the information can also be gauged through observational studies. It is the form that
is being tested, not the respondent, and this should be stressed to the respondent.

Pilot testing
Pilot testing involves formally testing a questionnaire or survey with a small representative
sample of respondents. Semi-closed questions are usually used in pilot testing to gather a range
of likely responses, which are then used to develop a more highly structured questionnaire with
closed questions. Pilot testing is used to identify any problems associated with the form, such as
questionnaire format, length and question wording, and it allows alternative versions of a
questionnaire to be compared.

2.5 Data processing

Data processing involves translating the answers on a questionnaire into a form that can be
manipulated to produce statistics. In general, this involves coding, editing, data entry, and
monitoring the whole data processing procedure. The main aim of checking the various stages
of data processing is to produce a file of data that is as error-free as possible.

2.5.1 Data coding

Up to this point, the questionnaire has been considered mainly as a means of communication
with the respondent. Just as important, the questionnaire is a working document for transferring
data onto a computer file. Consequently, it is important to design the questionnaire to facilitate
data entry.

Unless all the questions on a questionnaire are “closed” questions, some degree of coding is
required before the survey data can be sent for data entry. The appropriate codes should be
devised before the questionnaires are processed, and are usually based on the results of pretesting.

Coding consists of labelling the responses to questions (using numerical or alphabetic codes) in
order to facilitate data entry and manipulation. Codes should be simple and easy to use. For
example, if Question 1 has four responses, those four responses could be given the codes a, b, c
and d. The advantage of coding is that data are stored simply, as codes of a few characters, rather
than as lengthy text descriptions, which would almost certainly not be easy to categorize.
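A minimal Python sketch of this kind of coding, assuming a coding frame held as a dictionary. The four response labels for “Question 1” and the names `Q1_CODES` and `code_responses` are hypothetical, chosen only to illustrate the a–d scheme described above.

```python
# Hypothetical coding frame for "Question 1" (labels are illustrative only).
Q1_CODES = {
    "Strongly agree": "a",
    "Agree": "b",
    "Disagree": "c",
    "Strongly disagree": "d",
}

def code_responses(responses, frame):
    """Replace each verbal response with its short code from the frame.

    Raises KeyError for any response not in the frame, which flags it
    for attention rather than silently storing uncoded text.
    """
    return [frame[r] for r in responses]

answers = ["Agree", "Strongly disagree", "Agree"]
print(code_responses(answers, Q1_CODES))  # ['b', 'd', 'b']
```

Storing single-character codes rather than the full wording keeps the data file compact and makes tabulation straightforward.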

Coding is relatively expensive in terms of resources and effort. However, automated techniques
are continually being developed to reduce this cost. Other options include self-coding, where
respondents select the appropriate code themselves, or having the interviewer perform the
coding task.

The coding frame for most questions can be devised before interviewing begins. That is, the
likely responses are obvious from previous similar surveys or thorough pilot testing, so those
responses and their codes can be printed on the questionnaire. An “Other (please specify)”
answer code is often added at the end of a question, with space for interviewers to write the
answer. The standard instruction to interviewers who are in doubt about any precode is to write
the answer on the questionnaire in full, so that a coder can deal with it later.
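The handling of precodes and the “Other (please specify)” code can be sketched in Python as follows. The question, its precodes and the function name `apply_coding_frame` are all hypothetical; the point is that any answer outside the precoded list receives the “Other” code while its full wording is kept for a coder.

```python
# Hypothetical precoded question: "How did you travel to work today?"
PRECODES = {"Car": "a", "Bus": "b", "Train": "c", "Walked": "d"}
OTHER_CODE = "e"  # "Other (please specify)"

def apply_coding_frame(answer, precodes=PRECODES, other_code=OTHER_CODE):
    """Return (code, verbatim_text_or_None) for one recorded answer.

    Answers matching a precode are coded directly. Anything else gets the
    'Other' code, and the full verbatim answer is kept so that a coder can
    deal with it later.
    """
    text = answer.strip()
    if text in precodes:
        return precodes[text], None
    return other_code, text

recorded = ["Bus", "Bicycle", "Train", "Ferry then walked"]
coded = [apply_coding_frame(a) for a in recorded]
for_coder = [text for code, text in coded if text is not None]
print(coded)      # [('b', None), ('e', 'Bicycle'), ('c', None), ('e', 'Ferry then walked')]
print(for_coder)  # ['Bicycle', 'Ferry then walked']
```

If the same “Other” answers keep recurring, that is a sign they deserve their own precode in the next version of the form.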

2.5.2 Data entry

Ensure that the questionnaire is designed so that data entry personnel need to handle the pages
as little as possible. For example, all codes should be on the left-hand (or right-hand) side of the
page. It is advisable to use trained data entry staff to enter the data: they are quicker and more
reliable, and therefore more cost-effective.

2.6 Sampling schemes

When you have a clear idea of the aims of the survey, the data requirements and the degree of
accuracy required, and have considered the resources and time available, you are in a position
to decide on the sample size and on how the sampling units will be selected.

The two qualities most desired in a sample (besides that of providing the appropriate findings)
are its representativeness and stability. Sample units may be selected in a variety of ways, but
sampling schemes fall into two general types: probability and non-probability methods.

2.6.1 Non-probability samples

If the probability of selection for each unit is unknown, or cannot be calculated, the sample is
called a non-probability sample. For non-probability samples, since there is no control over the
representativeness of the sample, it is not possible to evaluate accurately the precision of
estimates (i.e., the closeness of estimates under repeated sampling of the same size). However,
where time and financial constraints make probability sampling infeasible, or where knowing the
level of accuracy in the results is not an important consideration, non-probability samples do
have a role to play. They are inexpensive and easy to run, and no sampling frame is required.
This form of sampling is popular amongst market researchers and political pollsters, as many of
their surveys are based on a predetermined sample of respondents from certain categories.

One common method of non-probability sampling is voluntary response polling. A general
appeal is made (often via television) for people to contact the researcher with their opinion.
Voluntary response samples are rarely useful because they over-represent people with strong
opinions, most often negative ones.



2.6.2 Probability sampling schemes

Probability sampling schemes are those in which the population elements have a known chance
of being selected for inclusion in a sample. Probability sampling rigorously adheres to a
precisely specified system that permits no arbitrary or biased selection. There are four main
types of probability sampling schemes.

Simple Random Sample: If a sample of size n is drawn from a population of size N in such a
way that every possible sample of size n has the same chance of being selected, the sampling
procedure is called simple random sampling. The sample thus obtained is called a simple
random sample. This is the simplest form of probability sample to analyse.
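In Python, the standard library’s `random.sample` draws n distinct items so that every subset of size n is equally likely, which is exactly the definition above. A minimal sketch, assuming the frame is a list of numbered units; the wrapper name `simple_random_sample` is hypothetical.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units so that every subset of size n is equally likely.

    frame: a list of the N population units (the sampling frame)
    seed:  optional, for a reproducible draw
    """
    if not 0 <= n <= len(frame):
        raise ValueError("n must be between 0 and the population size N")
    rng = random.Random(seed)
    return rng.sample(frame, n)  # sampling without replacement

population = list(range(1, 101))  # units numbered 1..N, with N = 100
sample = simple_random_sample(population, 10, seed=1)
print(sorted(sample))
```

Using a seeded generator plays the role of a published random number table: the selection can be reproduced and checked later.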

Stratified Random Sample: A stratified random sample is one obtained by separating the
population elements into non-overlapping groups, called strata, and then selecting a simple
random sample from each stratum. This can be useful when a population is naturally divided
into several groups. If the results differ greatly between strata, then it is possible to obtain more
efficient estimators (and therefore more precise results) than would be possible without
stratification.

Systematic Sample: A sample obtained by randomly selecting one element from the first k
elements in the frame and every kth element thereafter is called a 1-in-k systematic sample
with a random start. This is a simple method to apply when the frame is a list of elements.
Systematic sampling will provide better results than simple random sampling when the variance
within the systematic sample is larger than the variance of the population as a whole. This can
occur when the frame is ordered.
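A minimal Python sketch of a 1-in-k systematic sample with a random start, assuming the frame is an ordered list; the function name is hypothetical. The random start is drawn from the first k positions, and list slicing then takes every kth element.

```python
import random

def systematic_sample(frame, k, seed=None):
    """1-in-k systematic sample with a random start.

    Picks a random start r among the first k positions (0-based), then
    takes every kth element thereafter: positions r, r + k, r + 2k, ...
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    if not frame:
        return []
    start = random.Random(seed).randrange(min(k, len(frame)))
    return frame[start::k]

frame = [f"invoice-{i:03d}" for i in range(1, 101)]  # hypothetical ordered list
print(systematic_sample(frame, 10, seed=5))          # 10 invoices, 10 apart
```

When N is not a multiple of k, the sample size depends on the start. If the list has a cycle whose length matches k (a periodic population), every selected element may fall at the same point in the cycle, which is why systematic sampling should not be used in that case.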

Cluster Sample: A cluster sample is a probability sample in which each sampling unit is a
collection, or cluster, of elements. The population is divided into clusters, and one or more of
the clusters are chosen at random and sampled. Sometimes every element of a chosen cluster is
sampled; on other occasions a simple random sample of elements is taken from within each
chosen cluster. Cluster sampling is usually done for administrative convenience, and is especially
useful if the population has a hierarchical structure.
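Both variants of cluster sampling can be sketched in Python, assuming the frame is a dictionary mapping each cluster to its list of elements. The function name and the hypothetical hospitals-and-patients frame are illustrative only. With `n_within=None` every element of each chosen cluster is taken; otherwise a simple random sample of that many elements is drawn within each chosen cluster (two-stage sampling).

```python
import random

def cluster_sample(clusters, m, n_within=None, seed=None):
    """Randomly choose m clusters, then either take every element of each
    chosen cluster (n_within=None) or a simple random sample of n_within
    elements from within each chosen cluster (two-stage sampling)."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)  # stage 1: choose clusters
    result = {}
    for c in chosen:
        units = clusters[c]
        if n_within is None or n_within >= len(units):
            result[c] = list(units)                 # whole cluster
        else:
            result[c] = rng.sample(units, n_within)  # stage 2: SRS within cluster
    return result

# Hypothetical frame: six hospitals (clusters), five patients each.
clusters = {f"H{i}": [f"H{i}-p{j}" for j in range(1, 6)] for i in range(1, 7)}
print(cluster_sample(clusters, 2, seed=4))               # all patients in 2 hospitals
print(cluster_sample(clusters, 2, n_within=3, seed=4))   # 3 patients from each of 2 hospitals
```

Only the chosen clusters need a list of their elements, which is why cluster sampling is convenient when a full frame is unavailable.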

A comparison of these four sampling schemes appears in the table below.

Example (Case 2): A few years ago, I advised the Department of Health and Community
Services on a survey of palliative care patients in Victoria.
Objective: To estimate the proportion of palliative care patients in Victorian hospitals.
Difficulties: What is a “palliative care patient”? Proportion of what?
Target population: Patients in acute beds at the time of the survey?
Survey population: All patients in acute beds in Victorian hospitals except for very small
(< 10 bed) country hospitals.
Sampling scheme: Stratified (hospital types) and clustered (hospitals). Random selection of
hospitals within each stratum. Total coverage of patients in the selected hospitals.
Sample: All patients in the 18 hospitals selected out of 115 hospitals in Victoria.
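The sampling scheme in this example combines the two previous ideas: hospitals are stratified by type, hospitals (clusters) are chosen at random within each stratum, and every patient in a chosen hospital is included. A minimal Python sketch under those assumptions; the function name, hospital types, hospital names and patient lists are all hypothetical and do not reflect the actual survey.

```python
import random

def stratified_cluster_sample(frame, n_per_stratum, seed=None):
    """Stratified cluster sample with total coverage of selected clusters.

    frame:         dict stratum -> dict cluster name -> list of elements
                   (here: hospital type -> hospital -> patients)
    n_per_stratum: dict stratum -> number of clusters to select at random
    Returns a dict mapping each selected cluster to all of its elements.
    """
    rng = random.Random(seed)
    selected = {}
    for stratum, clusters in frame.items():
        for name in rng.sample(sorted(clusters), n_per_stratum[stratum]):
            selected[name] = list(clusters[name])  # every patient is included
    return selected

# Hypothetical frame: two hospital types, three hospitals each.
frame = {
    "metropolitan": {"M1": ["m1-a", "m1-b"], "M2": ["m2-a"], "M3": ["m3-a", "m3-b"]},
    "country": {"C1": ["c1-a"], "C2": ["c2-a", "c2-b"], "C3": ["c3-a"]},
}
print(stratified_cluster_sample(frame, {"metropolitan": 2, "country": 1}, seed=7))
```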



Simple random sample
  How to select sample: Assign numbers to elements. Use a random number table or random
  number generator to select the sample.
  Strengths/weaknesses:
  • The basic building block in sampling.
  • Simple, but often costly.
  • Cannot be used unless we can assign a number to each element in the target population.

Stratified sample
  How to select sample: Divide the population into groups that are similar within and different
  between on the variable of interest. Use random numbers to select the sample from each
  stratum.
  Strengths/weaknesses:
  • With proper strata, can produce very accurate estimates.
  • Less costly than simple random sampling.
  • Must stratify the target population correctly.

Systematic sample
  How to select sample: Select every kth element from a list after a random start.
  Strengths/weaknesses:
  • Produces very accurate estimates when elements in a population exhibit order.
  • Used when simple random or stratified sampling is impractical: e.g., the population size
    is not known.
  • Simplifies the selection process.
  • Do not use with periodic populations.

Cluster sample
  How to select sample: Randomly choose clusters and sample all elements within each cluster.
  Strengths/weaknesses:
  • With proper clusters, can produce very accurate estimates.
  • Useful when a sampling frame is unavailable or travel costs are high.
  • Must cluster the target population correctly.


Quentative research method

Recomendados

Recomendados

Mais conteúdo relacionado

Mais procurados

Mais procurados (20)

Destaque

Destaque (13)

Semelhante a Quentative research method

Semelhante a Quentative research method (20)

Mais de Marketing Utopia

Mais de Marketing Utopia (20)

Último

Último (20)

Quentative research method