Introduction to Web Survey Usability Design and Testing

Introduction to Web Survey Usability
Design and Testing
DC-AAPOR Workshop

Amy Anderson Riemer
Jennifer Romano Bergstrom
The views expressed on statistical or methodological issues are those of the presenters
and not necessarily those of the U.S. Census Bureau.

Schedule
9:00 – 9:15 Introduction & Objectives
9:15 – 11:45 Web Survey Design: Desktop &
Mobile
11:45 – 12:45 Lunch
12:45 – 2:30 Assessing Your Survey
2:30 – 2:45 Break
2:45 – 3:30 Mixed Modes Data Quality
3:30 – 4:00 Wrap Up

2

Objectives
Web Survey Design: Desktop & Assessing Your Survey
Mobile
• Paging vs. Scrolling • Paradata
• Navigation
• Scrolling lists vs. double-banked • Usability
response options
• Edits & Input fields
• Checkboxes & Radio buttons Quality of Mixed Modes
• Instructions & Help
• Graphics • Mixed Mode Surveys
• Emphasizing Text & White Space
• Authentication
• Response Rates
• Progress Indicators • Mode Choice
• Consistency

3

Web Survey Design


Activity #1
1. Today’s date
2. How long did it took you to get to BLS today?
3. What do you think about the BLS entrance?

5

Why is Design Important?
• No interviewer present to correct/advise
• Visual presentation affects responses
– (Couper’s activity)
• While the Internet provides many ways to
enhance surveys, design tools may be misused

6

• Respondents extract meaning from how
question and response options are displayed
• Design may distract from or interfere with
responses
• Design may affect data quality

7


8
http://www.cc.gatech.edu/gvu/user_surveys/

• Many surveys are long (> 30min)
• Long surveys have higher nonresponse rates
• Length affects quality

Adams & Darwin, 1982; Dillman et al., 1993;
9
Haberlein & Baumgartner, 1978

• Respondents are more tech savvy today and
use multiple technologies
• It is not just about reducing respondent
burden and nonresponse
• We must increase engagement
• High-quality design = trust in the designer

Adams & Darwin, 1982; Dillman et al., 1993;
10
Haberlein & Baumgartner, 1978

http://www.pewinternet.org/Static-Pages/Trend-
11
Data-(Adults)/Device-Ownership.aspx

http://www.pewinternet.org/Static-Pages/Trend-
12
Data-(Adults)/Device-Ownership.aspx

http://www.nielsen.com/content/dam/corporate/us
/en/reports-downloads/2012-Reports/Nielsen- 13
Multi-Screen-Media-Report-May-2012.pdf

http://www.nielsen.com/content/dam/corporate/us
/en/reports-downloads/2012-Reports/Nielsen- 14
Multi-Screen-Media-Report-May-2012.pdf

Nielsen: The Cross-Platform Report, Quarter 2, 15
2012-US

UX Design Failure
• Poor planning
• “It’s all about me.” (Redish: filing cabinets)
• Human cognitive limitations
– Memory & Perception
– (fun activity time)

UX Design Failure
• Poor planning
• “It’s all about me.” (Redish: filing cabinets)
• Human cognitive limitations
– Memory & Perception
– (fun activity time)
- Primacy - Chunking
- Recency - Patterns

Web Survey Design
• Paging vs. Scrolling • Graphics
• Navigation • Emphasizing Text
• Scrolling vs. Double- • White Space
Banked • Authentication
• Edits and Input Fields • Progress Indicators
• Checkboxes and • Consistency
Radio Buttons
• Instructions and Help

23

Paging vs. Scrolling
Paging Scrolling
• Multiple questions per page • All on one static page
• Complex skip patterns • No data is saved until
submitted at end
• Not restricted to one item
– Can lose all data
per screen
• Respondent can
• Data from each page saved review/change responses
– Can be suspended/resumed • Questions can be answered
• Order of responding can be out of order
controlled • Similar look-and-feel as
• Requires more mouse clicks paper

24

Paging vs. Scrolling
• Little advantage (breakoffs, nonresponse,
time, straightlining) of one over the other
• Mixed approach may be best
• Choice should be driven by content and target
audience
– Scrolling for short surveys with few skip patterns;
respondent needs to see previous responses
– Paging for long surveys with intricate skip
patterns; questions should be answered in order

Couper, 2001; Gonyea, 2007; Peytchev, 2006;
25
Vehovar, 2000

Web Survey Design
Radio Buttons

26

Navigation
• In a paging survey, after entering a response
– Proceed to next page
– Return to previous page (sometimes)
– Quit or stop
– Launch separate page with Help, definitions, etc.

27

Navigation: NP
• Next should be on the left
– Reduces the amount of time to move cursor to
primary navigation button
– Frequency of use

Couper, 2008; Dillman et al., 2009; Faulkner,
28
1998; Koyani et al., 2004; Wroblewski, 2008

Navigation NP Example

Peytchev & Peytcheva, 2011
29

Navigation: PN
• Previous should be on the left
– Web application order
– Everyday devices
– Logical reading order

30

Navigation PN Example

31


32


33


34

Navigation Usability Study/Experiment

Romano & Chen, 2011
35

Method
• Lab-based usability study
• TA read introduction and left letter on desk
• Separate rooms
• R read letter and logged in to survey
• Think Aloud
• Eye Tracking
• Satisfaction Questionnaire
• Debriefing
Romano & Chen, 2011
36

Results: Satisfaction I

* p < 0.0001

Romano & Chen, 2011
37

8.5
Results: Satisfaction II
8.5
Mean Satisfaction

8

Mean Satisfaction
8
7.5
Rating

Rating
7.5
7 7
6.5 6.5
6 6
Mean N_P PN Mean N_P PN
Overall reaction to the survey: Information displayed on the screens:
terrible – wonderful. p < 0.05. inadequate – adequate. p = 0.07.
8.5
8.5
Mean Satisfaction

Mean Satisfaction
8
8
7.5
Rating

Rating
7.5
7 7
6.5 6.5
6 6
Mean N_P PN Mean N_P PN
Arrangement of information on the screens: Forward navigation:
illogical – logical. p = 0.19. impossible – easy. p = 0.13.

Romano & Chen, 2011
38

Eye Tracking

• Participants looked at Previous and Next in PN conditions
• Many participants looked at Previous in the N_P conditions
–
39 Couper et al. (2011): Previous gets used more when it is on the right.

N_P vs. PN: Respondent Debriefing
• N_P version
– Counterintuitive
– Don’t like the “buttons being flipped.”
– Next on the left is “really irritating.”
– Order is “opposite of what most people would
design.”
• PN version
– “Pretty standard, like what you typically see.”
– The location is “logical.”

Romano & Chen, 2011
40

Navigation Alternative
• Previous below Next
– Buttons can be closer
– But what about older adults?
– What about on mobile?

Couper et al., 2011; Wroblewski, 2008
41

Navigation Alternative
• Previous below Next
– Buttons can be closer
– But what about older adults?
– What about on mobile?

Couper et al., 2011; Wroblewski, 2008
42

Navigation Alternative: Large primary
navigation button; secondary smaller

43

Navigation Alternative: No
back/previous option

44

Confusing Navigation

45

Web Survey Design
Radio Buttons

46

Long List of Response Options
• One column: Scrolling
– Visually appear to belong to one group
– When there are two columns, 2nd one may not be
seen (Smyth et al., 1997)
• Two columns: Double banked
– No scrolling
– See all options at once
– Appears shorter

47

1 Column vs. 2 Column Study

Romano & Chen, 2011
48

Seconds to First Fixation
25

20

15
first half
* p < 0.01
10 second half

5

0
2 column 1 column

Romano & Chen, 2011
49

Total Number of Fixations
40

35

30

25

20 first half
15 second half

10

5

0
2 column 1 column

Romano & Chen, 2011
50

Time to Complete Item
120

100

80
Seconds

60 1 col
2 col
40

20

0
Mean Min Max

Romano & Chen, 2011
51

1 Col. vs. 2 Col.: Debriefing
• 25 had a preference
– 6 preferred one column
• They had received the one-column version
– 19 preferred 2 columns
• 7 had received the one-column version
• Prefer not to scroll
• Want to see and compare everything at once
• It is easier to “look through,” to scan, to read
• Re one column, “How long is this list going to be?”

Romano & Chen, 2011
52

Long Lists
• Consider breaking list into smaller questions
• Consider series of yes/no questions
• Use logical order or randomize
• If using double-banked, do not separate
columns widely

53

Web Survey Design
Radio Buttons

54

Input Fields Activity

55

Input Fields
• Smaller text boxes = more restricted
• Larger text boxes = less restricted
– Encourage longer responses
• Visual/Verbal Miscommunication
– Visual may indicate “Write a story”
– Verbal may indicate “Write a number”
• What do you want to allow?

56

Types of Open-Ended Responses
• Narrative
– E.g., Describe…
• Short verbal responses
– E.g., What was your occupation?
• Single word/phrase responses
– E.g., Country of residence
• Frequency/Numeric response
– E.g., How many times…
• Formatted number/verbal
– E.g., Telephone number

57

Open-Ended Responses: Narrative
• Avoid vertical scrolling when possible
• Always avoid horizontal scrolling

58

Open-Ended Responses: Narrative
• Avoid vertical scrolling when possible
• Always avoid horizontal scrolling

32.8 characters 38.4 characters

~700 Rs
Wells et al., 2012
59

Open-Ended Responses: Numeric
• Is there a better way?

60

• Is there a better way?

61

• Use of templates reduces ill-formed responses
– E.g., $_________.00

Couper et al., 2009; Fuchs, 2007
62

Open-Ended Responses: Date
• Not a good use: intended response will always
be the same format
• Same for state, zip code, etc.
• Note
– “Month” = text
– “mm/yyyy” = #s

63

Web Survey Design
Radio Buttons

64

Check Boxes and Radio Buttons
• Perceived Affordances
• Design according to existing conventions and
expectations
• What are the conventions?

65

Check Boxes: Select all that apply

66

Check Boxes in drop-down menus

67

Radio Buttons: Select only one

68

Radio Buttons: Select only one

69

Radio Buttons: In grids

70

Radio Buttons on mobile
• Would something
else be better?

71

Reducing Options
• What is necessary?

72

Web Survey Design
Radio Buttons

73

Placement of Instructions
• Place them near
the item
• “Don’t make me
think”
• Are they
necessary?

74

• Place them near
the item
think”
• Are they
necessary?

75

• Place them near
the item
think”
• Are they
necessary?

76

Instructions
• Key info in first 2 sentences
• People skim
– Rule of 2s: Key info in first two
paragraphs, sentences, words

77

Placement of Clarifying Instructions
• Help respondents have the same
interpretation
• Definitions, instructions, examples

Conrad & Schober, 2000; Conrad et al., 2006;
Conrad et al., 2007; Martin, 2002; Schober & 80
Conrad, 1997; Tourangeau et al., 2010


81
Redline, 2013

• Percentage of valid responses was higher with
clarification
• Longer response time when before item
• No effects of changing the font style
• Before item is better than after
• Asking a series of questions is best

82
Redline, 2013

Placement of Help
• People are less likely to use help when they
have to click than when it is near item
• “Don’t make me think”

83

Placement of Error Message
• Should be near the item
• Should be positive and helpful, suggesting
HOW to help
• Bad error message:

84

Placement of Error Message
• Should be near the item
• Should be positive and helpful, suggesting
HOW to help
• Bad error message:

85

Error Message Across Devices

86

Error Message Across Devices

87

Web Survey Design
Radio Buttons

88

Graphics
• Improve motivation, engagement, satisfaction
with “fun”
• Decrease nonresponse & measurement error
• Improve data quality
• Gamification

89
Henning, 2012; Manfreda et al., 2002

Graphics
• Use when they supply meaning
– Survey about advertisements
• Use when user experience is improved
– For children or video-game players
– For low literacy

90
Libman, 2012

Graphics

http://glittle.org/smiley-slider/

http://www.ollie.net.nz/casestudies/smiley_slider/

92

Graphics Experiment 1.1
• Appearance
– Decreasing boldness (bold  faded)
– Increasing boldness (faded  bold)
– Adding face symbols to response options (  )
• ~ 2400 respondents
• Rated satisfaction re health-related things
• 5-pt scale: very satisfied  very dissatisfied

93
Medway & Tourangeau, 2011

• Bold side selected more
Very Somewhat Somewhat Very
satisfied satisfied Neutral dissatisfied dissatisfied
Your physician O O O O O

• Less satisfaction when face symbols present
Very Very
satisfied dissatisfied

 
Your physician O O O O O

94
Medway & Tourangeau, 2011

• Appearance
– Radio buttons
– Face symbols (    )
• ~ 1000 respondents
• Rated satisfaction with a journal
• 6-pt scale: very dissatisfied  very satisfied

95
Emde & Fuchs, 2011

• Faces were equivalent to radio buttons
• Respondents were more attentive when faces
were present
– Time to respond

96
Emde & Fuchs, 2011

Slider Usability Study

• Participants thought 1 was selected and did not move the slider. 0 was actually
selected if they did not respond.
97
Strohl, Romano Bergstrom & Krulikowski, 2012

• Modified the visual design of survey items
– Increase novelty and interest on select items
– Other items were standard
• ~ 100 respondents in experimental condition
• ~ 1200 in control
• Questions about military perceptions and
media usage
• Variety of question types
98
Gibson, Luchman & Romano Bergstrom, 2013

• No differences

99

• Slight differences:
– Those with enhanced version skipped more often
– Those in standard responded more negatively.

100

• Slight differences:
– Those with enhanced version skipped more often

101

• No major differences

102

Graphics Considerations
• Mixed results
• “Ad blindness”
• Internet speed and
download time
• Unintended meaning

103


104


105


106

Web Survey Design
Radio Buttons

107

Emphasizing Text
• Font
– Never underline plain text
– Never use red for plain text
– Use bold and italics sparingly

108

Emphasizing Text

109

Emphasizing Text

110

Emphasizing Text
• Hypertext
– Use meaningful
words and phrases
– Be specific
– Avoid “more” and
“click here.”

111

Web Survey Design
Radio Buttons

112

White Space
• White space on a page
• Differentiates sections
• Don’t overdo it

113

Web Survey Design
Radio Buttons

115

Authentication
• Ensures respondent is the selected person
• Prevents entry by those not selected
• Prevents multiple entries by selected
respondent

116

Authentication
• Passive
– ID and password embedded in URL
• Active
– E-mail entry
– ID and password entry
• Avoid ambiguous passwords (Couper et al., 2001)
– E.g., contains 1, l, 0, o
• Security concerns can be an issue
• Don’t make it more difficult than it needs to be

117

Web Survey Design
Radio Buttons

119

Progress Indicators
• Reduce breakoffs
• Reduce burden by displaying length of survey
• Enhance motivation and visual feedback
• Not needed in scrolling design
• Little evidence of benefit

Couper et al., 2001; Crawford et al., 2001;
Conrad et al., 2003, 2005; Sakshaug & 120
Crawford, 2009

Progress Indicators: At the bottom

121

Progress Indicators: At the top

122

Progress Indicators: Mobile

123

Progress Indicators
• They should provide meaning

124

Web Survey Design
Radio Buttons

125

Consistency
• Predictable
– User can anticipate what the system will do
• Dependable
– System fulfills user’s expectations
• Habit-forming
– System encourages behavior
• Transferable
– Habits in one context can transfer to another
• Natural
– Consistent with user’s knowledge

126

Inconsistency

129

Assessing Your Survey


Assessing Your Survey
Paradata Usability
• Usability vs. User
• Background Experience
• Why, When, What?
• Uses of Paradata by • Methods
mode – Focus Groups, In-Depth
Interviews
• Paradata issues – Ethnographic
Observations, Diary Studies
– Usability & Cognitive
Testing
• Lab, Remote, In-the-Field
• Obstacles

Types of Data
• Survey Data – collected information from R’s
• Metadata – data that describes the survey
– Codebook
– Description of the project/survey
• Paradata – data about the process of
answering the survey at the R level
• Auxiliary/Administrative Data – not collected
directly, but acquired from external sources

Paradata
• Term coined by Mick Couper
– Originally described data that were by-products of
computer-assisted interviewing
– Expanded to include data from other self-
administered modes
• Main uses:
– Adaptive / Responsive design
– Nonresponse adjustment
– Measurement error identification

Total Survey Error Framework

Groves et al. 2004; Groves & Lyberg 2010

TSE Framework & Paradata

Krueter, 2012

Adaptive / Responsive Design
• Create process indicators
• Real-time monitoring (charts & “dashboards”)
• Adjust resources during data collection to achieve
higher response rate and/or cost savings
• Goal:
– Achieve high response rates in a cost-effective way
– Introduce methods to recruit uncooperative – and
possibly different – sample members (reducing
nonresponse bias)

Nonresponse Adjustment
• Decreasing response rates have encouraged
researchers to look at other sources of
information to learn about nonrespondents
– Doorstep interactions
– Interviewer observations
– Contact history data

Contact History Instrument (CHI)
• CHI developed by the U.S. Census Bureau
(Bates, 2003)
• Interviewers take time after each attempt (refusal
or non-contact) to answer questions in the CHI
• Use CHI information to create models (i.e., heat
maps) to identify optimal contact time
• Typically a quick set of questions to answer

• European Social Survey uses a standard contact
form (Stoop et al., 2003)

Contact History Inventory (CHI)
U.S. Census Bureau CHI

Paradata
• Background information about Paradata
• Uses of Paradata by mode
• Paradata issues

Uses of Paradata by Mode
• CAPI
• CATI
• Web
• Mail
• Post-hoc

Uses of Paradata - CAPI
• Information collected can include:
– Interviewer time spent calling sampled households
– Time driving to sample areas
– Time conversing with household members
– Interview time
– GPS coordinates (tablets/mobile devices)
• Information can be used to:
– Inform cost-quality decisions (Kreuter, 2009)
– Develop cost per contact
– Predict the likelihood of response by using interviewer
observations of the response unit (Groves & Couper, 1998)
– Monitor interviewers and identify any falsification

Uses of Paradata - CATI
– Call transaction history (record of each attempt)
– Contact rates
– Sequence of contact attempts & contact rates
• Information can be used to:
– Optimize call back times
– Interviewer monitoring
– Inform a responsive design

Uses of Paradata - Web
• Server-side vs. client-side
– Device information (i.e., browser type, operating
system, screen resolution, detection of JavaScript
or Flash)
– Questionnaire navigation information

Callegaro, 2012

Web Paradata - Server-side
• Page requests or “visits” to a web page from
the web server
• Identify device information and monitor
survey completion

Web Paradata - Server-side cont.
• Typology of response behaviors in web surveys
1. Complete responders
2. Unit non-responders
3. Answering drop-outs
4. Lurkers
5. Lurking drop-outs
6. Item non-responders
7. Item non-responding drop-outs

Bosnjak, 2001

Web Paradata – Client-Side
• Collected on the R’s computer
• Logs each “meaningful” action
• Heerwegh (2003) developed code / guidance for
client-side paradata collected using Java-Script
– Clicking on a radio button
– Clicking and selecting a response option in a drop-
down box
– Clicking a check box (checking / unchecking)
– Writing text in an input field
– Clicking a hyperlink
– Submitting the page

Web Paradata – Client-Side cont.
• Stern (2008) used Heerwegh’s paradata
techniques to identify:
– Whether R’s changed answers; what direction
– The order that questions are answered when more
than one are displayed on the screen
– Response latencies – the time that elapsed between
when the screen loaded on the R’s computer and they
submitted an answer
• Heerwegh (2003) found that the longer the response
time, the greater the probability of changing answers and an
incorrect response

Browser Information / Operating
System Information
• Programmers use this information to ensure they are
developing the optimal design
• Desktop, laptop, smartphone, tablet, or other device
• Sood (2011) found a correlation between browser type
and survey breakoff & number of missing items
– Completion rates for older browsers were lower
– Using browser type as a proxy for age of device and
possible connection speed
– Older browsers were more likely to display survey
incorrectly; possible explanation for higher drop-out rates

JavaScript & Flash
• Helps to understand what the R can see and do in
a survey
• JavaScript adds functionality such as question
validations, auto-calculations, interactive help
– 2% or less of computer users have JavaScript disabled
(Zakas, 2010)
• Flash is used for question types such as drag &
drop or slide-bar questions
– Without Flash installed, R’s may not see the question

Questionnaire Navigation Paradata
• Mouse clicks/coordinates
– Captured with JavaScript
– Excessive movements can indicate
• An issue with the question
• Potential for lower quality
• Changing answers
– Can indicate potential confusion with a question
– Paradata can capture answers that were erased
– Changes more frequent for opinion question than
factual questions

Stieger & Reips, 2010

Questionnaire Navigation Paradata cont.

• Order of answering
– When multiple questions are displayed on a
screen
– Can indicate how respondents read the questions
• Movement through the questionnaire
(forward and back)
– Unusual patterns can indicate confusion and a
possible issue with the questionnaire (i.e., poor
question order)

• Number of prompts/error messages/data
validation messages
• Quality Index (Haraldsen, 2005)

• Goal is to decrease number of activated errors by
improving the visual design and clarity of the
questions

• Clicks on non-question links
– Help, FAQs, etc.
– Indication of when and where Rs use help or other
information built into the survey and displayed as a
link
• Last question answered before dropping out
– Helps to determine if the data collected can be
classified as complete, partial, or breakoff
– Used for response rate computation
– Peytchev (2009) analyzed breakoff by question type
• Open ended increased break-off chances by 2.5x; long
questions by 3x; slider bars by 5x; introductory screens by
2.6x

• Time per screen / time latency
– Attitude strength
– Response uncertainty
– Response error
• Examples
– Heerwegh (2003)
• R’s with weaker attitudes take more time in answering
survey questions than R’s with stronger attitudes
– Yan and Tourangeu (2008)
• Higher-educated R’s respond faster than lower-educated R’s
• Younger R’s respond faster than older R’s.

Uses of Paradata – Call Centers
• Self-administered (mail or electronic) surveys
• Call transaction history software
– Incoming calls
• Date and time: useful for hiring, staffing, and workflow
decisions
• Purpose of the call
– Content issue: useful for identifying problematic questions or
support information
– Technical issue: useful for identifying usability issues or system
problems
• Call outcome: type of assistance provided

Paradata Issues
• Reliability of data collected
• Costs
• Privacy and Ethical Issues

Reliability of data collected
• Interviewers can erroneously record housing
unit characteristics, misjudge features about
respondents & fail to record a contact attempt
• Web surveys can fail to load properly, and
client-side paradata fails to be captured
• Recordings of interviewers can be unusable
(e.g., background noise, loose microphones)

Casas-Cardero 2010; Sinibaldi 2010; West 2010

Paradata costs
• Data storage – very large files
• Instrument performance
• Development within systems
• Analysis

Privacy and Ethical Issues
• IP addresses along with e-mail address or
other information can be used to identify a
respondent
• This information needs to be protected

Paradata Activity
• Should the respondent be informed that the
organization is capturing paradata?
• If so, how should that be communicated?

Privacy and Ethical Issues cont.
• Singer & Couper asked members of the Dutch
Longitudinal Internet Studies for the Social
Sciences (LISS) panel at the end of the survey
if they could collect paradata – 38.4% agreed
• Asked before the survey – 63.4% agreed
• Evidence that asking permission to use
paradata might make R’s less willing to
participate in a survey

Couper & Singer, 2011

Privacy and Ethical Issues cont.
• Reasons for failing to inform R’s about
paradata or get their consent
– Concept of paradata is unfamiliar and difficult for
R’s to grasp
– R’s associate it with the activities of advertisers,
hackers or phishers
– Asking for consent gives it more salience
– Difficult to convey benefits of paradata for the R

Usability Assessment
• Usability vs. User Experience
• Why, When, What?
• Methods
• Focus Groups, In-Depth Interviews
• Ethnographic Observations, Diary Studies
• Usability and Cognitive Testing
• Lab, Remote, In-the-Field
• Obstacles

Background Knowledge
• What does usability mean to you?
• Have you been involved in usability research?
• How is “user experience” different from
“usability?”

Usability vs. User Experience
• Usability: “the extent to which a product can
be used by specified users to achieve specified
goals with effectiveness, efficiency, and
satisfaction in a specified context of use.” ISO
9241-11
• Usability.gov
• User experience includes emotions, needs and
perceptions.

Understanding Users
Whitney’s 5 E’s of Usability Peter’s User Experience Honeycomb

The 5 Es to Understanding Users (W. Quesenbery):
http://www.wqusability.com/articles/getting-started.html
User Experience Design (P. Morville):
http://semanticstudios.com/publications/semantics/000029.php

Measuring the UX
• How does it work for the end user?
• What does the user expect?
• How does it make the user feel?
• What is the user’s story and habits?
• What is the user’s needs?

What people do on the Web

Krug, S. Don’t Make Me Think

Why is Testing Important?
• Put it in the hands of the users.
• Things may seem straightforward to you but
maybe not to your users.

Why is Testing Important?
• Put it in the hands of the users.
• Things may seem straightforward to you but
maybe not to your users.
• You might have overlooked something big!

When to test

Test Test

Test Final Test
with Concept Prototype with
users
Test Test Product users

Test

What can be tested?
• Existing surveys
• Low-fidelity prototypes
– Paper mockups or mockups on computer
– Basic idea is there but not functionality or
graphical look
• High-fidelity prototypes
– As close as possible to final interface in look and
feel

Methods to Understand Users
ensure users
understand
Ethnographi Usability can use
interactions in
products
c Testing natural
efficiently & w
Observation environment
satisfaction

assess assess
User ensure content
Linguistic emotions, per interactional
Cognitive Experience is understood ceptions, and motivations
Testing Research Analysis as intended reactions and goals

Focus randomly
Groups and discuss users’
sample the
Surveys perceptions
In-Depth population of
and reactions
Interviews interest

Method Assessme
nt

Focus Groups
• Structured script
• Moderator discusses the survey
with actual or typical users
– Actual usage of survey
– Workflow beyond survey
– Expectations and opinions
– Desire for new features and
functionality
• Benefit of participants
stimulating conversations, but
risk of “group think”

In-Depth Interviews
• Structured or unstructured
• Talk one-on-one with
users, in person or
remotely
– Actual usage of the survey
– Workflow beyond survey
– Expectations and opinions
– Desire for new features and
functionality

Ethnographic Observations
• Observe users in home, office
or any place that is “real-
world.”
• Observer is embedded in the
user’s culture.
• Allows conversation & activity
to evolve naturally, with
minimum interference.
• Observe settings and artifacts
(other real-world objects).
• Focused on context and
meaning making.

Diaries/Journals
• Users are given a journal or a web site to
complete on a regular basis (often daily).
• They record how/when they used the survey,
what they did, and what their perceptions were.
• User-defined data
• Feedback/responses develop and change over time
• Insight into how technology is used “on-the-go.”
• There is often a daily set of structured questions
and/or free-form comments.

Diaries/Journals
• Users are given a journal or a web site to
complete on a regular basis (often daily).
• They record how/when they used the
survey, what they did, and what their perceptions
were.
• User-defined data
• Feedback/responses develop and change over time
• Insight into how technology is used “on-the-go.”
• There is often a daily set of structured questions
and/or free-form comments.

Usability Testing
•Participants respond to survey items
•Assess interface flow and design
• Understanding
• Confusion
• Expectations
•Ensure skip intricate response patterns work as
intended
•Can test final product or early prototypes

Cognitive Testing
•Participants respond to survey items
•Assess text
• Confusion
• Understanding
• Thought process
•Ensure questions are understood as intended
and resulting data is valid
•Proper formatting is not necessary.

Usability vs. Cognitive Testing
Usability Testing Metrics Cognitive Testing Metrics
• Accuracy • Accuracy
• In completing item/ survey • Of interpretations
• Number/severity of errors • Verbalizations
• Efficiency
• Time to complete item/survey
• Path to complete item/survey
• Satisfaction
• Item-based
• Survey-based
• Verbalizations

Moderating Techniques
Techniques Pros Cons
Concurrent Think Aloud Understand participants’ Can interfere with usability
thoughts as they occur and as metrics, such as accuracy and
(CTA)
they attempt to work through time on task
issues they encounter
Elicit real-time feedback and
emotional responses
Retrospective Think Aloud Does not interfere with usability Overall session length increases
metrics Difficulty in remembering
(RTA)
thoughts from up to an hour
before = poor data
Concurrent Probing Understand participants’ Interferes with natural thought
thoughts as they attempt to process and progression that
(CP)
work through a task participants would make on
their own, if uninterrupted
Retrospective Probing Does not interfere with usability Difficulty in remembering = poor
metrics data
(RP)

Romano Bergstrom, Moderating Usability Tests:
http://www.usability.gov/articles/2013/04/moderating-usability-tests.html

Choosing a Moderating Technique
• Can the participant work completely alone?
• Will you need time on task and accuracy data?
• Are the tasks multi layered and/or require
concentration?
• Will you be conducting eye tracking?

Tweaking vs. Redesign
Tweaking Redesign
• Less work • Lots of work after much has
• Small changes occur quickly. already been invested
• Small changes are likely to • May break something else
happen. • A lot of people
• A lot of meetings

Lab vs. Remote vs. In the Field
• Controlled environment
Laboratory Remote In the Field
• All participants have the
• Participants in their • Participants tend to be
same experience
natural environments more comfortable in
• Record and (e.g., home, work) their natural
communicate from environments
• Use video chat
control room
(moderated sessions) • Recruit hard-to-reach
• Observers watch from or online programs populations (e.g.,
control room and (unmoderated) children, doctors)
provide additional
• Conduct many sessions • Moderator travels to
probes (via moderator)
quickly various locations
in real time
• Recruit participants in • Bring equipment (e.g.,
• Incorporate
many locations (e.g., eye tracker)
physiological measures
states, countries)
(e.g., eye tracking, • Natural observations
EDA)
• No travel costs

Lab-Based Usability Testing
Participant in
the testing
room
Live streaming
close-up screen shot
of the participant’s
screen

0

Participant in
the testing Large screens
room to display
material
Observation during focus
area for groups
We maneuver the clients
cameras, record, and
communicate through
microphones and
speakers from the
control room so we do
not interfere

Fors Marsh Group UX Lab

Eye Tracking

• Desktop
• Mobile
• Paper


Remote Moderated Testing

Observer
taking
notes, remain
s unseen from
Moderator participant
working from
the office

Participant
working on
the survey
from her
home in
another state


Field Studies
Participant
uses books
from her
natural
environment
Researcher to complete
goes to tasks on the
participant’s website
workplace to
conduct
session. She
observes and
takes notes

Participant is
in her natural
environment,
completing
tasks on a site
she normally
uses for work

Obstacles to Testing
• “There is no time.”
– Start early in development process.
– One morning a month with 3 users (Krug)
– 12 people in 3 days (Anderson Riemer)
– 12 people in 2 days (Lebson & Romano Bergstrom)
• “I can’t find representative users.”
– Everyone is important.
– Travel
– Remote testing
• “We don’t have a lab.”
– You can test anywhere.

Final Thoughts
• Test across devices.
• “User experience is an ecosystem.”
• Test across demographics.
• Older adults perform differently than young.
• Start early.

Kornacki, 2013, The Long Tail of UX

Quality of Mixed Modes

The views expressed on statistical or methodological issues are those of the
presenters and not necessarily those of the U.S. Census Bureau.

• Mixed Mode Surveys
• Response Rates
• Mode Choice

213

Mixed Mode Surveys
• Definition: Any combination of survey data
collection methods/modes
• Mixed vs. Multi vs. Multiple – Modes
• Survey organization goal:
– Identify optimal data collection procedure (for the
research question)
– Reduce Total Survey Error
– Stay within time/budget constraints

214

Mixed Mode Designs
• Sequential
– Different modes for different phases of interaction
(initial contact, data collection, follow-up)
– Different modes used in sequence during data
collection (i.e., panel survey which begins in one
mode and moves to another)
• Concurrent – different modes implemented at
the same time

215
de Lueew & Hox, 2008

Why Do Mixed Mode?
• Cost savings
• Improve Timeliness
• Reduces Total Survey Error
– Coverage error
– Nonresponse error
– Measurement error

216

Mixed Modes – Cost Savings
• Mixed mode designs give an opportunity to
compensate for the weaknesses of each individual
mode in a cost effective way (de Leeuw, 2005)
• Dillman 2009 Internet, Mail, and Mixed-Mode
Surveys book:
– Organizations often start with lower cost mode
and move to more expensive one
• In the past: start with paper then do CATI or in person
nonresponse follow-up (NRFU)
• Current: start with Internet then paper NRFU

217

Mixed Modes – Cost Savings cont.
• Examples:
• U.S. Current Population Survey (CPS) – panel survey
– Initially does in-person interview and collects a
telephone number
– Subsequent calls made via CATI to reduce cost
• U.S. American Community Survey
– Phase 1: mail
– Phase 2: CATI NRFU
– Phase 3: face-to-face with a subsample of remaining
nonrespondents

218

Mixed Mode - Timeliness
• Collect responses more quickly
• Examples:
– Current Employment Statistics (CES) offers 5
modes (Fax, Web, Touch-tone Data Entry,
Electronic Data Interchange, & CATI) to facilitate
timely monthly reporting

219

Why Do Mixed Mode?
• Cost savings
• Improve Timeliness
• Reduces Total Survey Error
– Coverage error
– Nonresponse error
– Measurement error

220

Mixed Mode - Coverage Error
• Definition: proportion of the target population
that is not covered by the survey frame and
the difference in the survey statistic between
those covered and not covered
• Telephone penetration
• Landlines vs mobile phones
– Web penetration

222
Groves, 1989

Coverage – Telephone
• 88% of U.S. adults have a cell phone
• Young adults, those with lower education, and
lower household income more likely to use
mobile devices as main source of internet
access

223
Smith, 2011; Zickuhr & Smith, 2012

Coverage - Internet
• Coverage is limited
– No systematic directory of addresses
• 1 in 5 in U.S. do not use the Internet

224
Zickuhr & Smith, 2012

World Internet Statistics

226

Coverage –Web cont.
• Indications that Internet adoption rates have
leveled off
• Demographics least likely to have Internet
– Older
– Less education
– Lower household income
• Main reason for not going online: not relevant

227
Pew, 2012

European Union – Characteristics of
Internet Users

228

Coverage - Web cont.
• R’s reporting via Internet can be different from
those reporting via other modes
– Internet vs. mail (Diment & Garret-Jones, 2007;
Zhang, 2000)
• R’s cannot be contacted through the Internet
because e-mail addresses lack structure for
generating random samples (Dillman, 2009)

229

Mixed Mode – Nonresponse Error
• Definition: inability to obtain complete
measurements on the survey sample
(Groves, 1998)
– Unit nonresponse - entire sampling unit fails to
respond
– Item nonresponse – R’s fail to respond to all
questions
• Concern is that respondents and non-
respondents may differ on variable of interest
230

Mixed Mode – Nonresponse cont.
• Overall response rates have been declining
• Mixed mode is a strategy used to increase
overall response rates while keeping costs low
• Some R’s have a mode preference
(Miller, 2009)

231

Mixed Mode – Nonresponse cont.
• Some evidence of a reduction in overall
response rates when multiple modes offered
concurrently in population/household surveys
– Examples: Delivery Sequence File Study
(Dillman, 2009); Arbitron Radio Diaries
(Gentry, 2008), American Communities Survey
(Griffen, et al, 2001), Survey of Doctorate
Recipients (Grigorian & Hoffer, 2008)
• Could assign R’s to modes based on known
preferences
232

Mixed Mode – Measurement Error
• Definition: “observational errors” arising from
the interviewer, instrument, mode of
communication, or respondent (Groves, 1998)
• Providing mixed modes can help reduce the
measurement error associated with collecting
sensitive information
– Example: Interviewer begins face-to-face
interview (CAPI) then lets R continue on the
computer with headphones (ACASI) to answer
sensitive questions
233

Mode Comparison Research
• Meta-analysis of articles by
– Harder to get mail responses
– Overall non-response rates & item non-response
rates are higher in self-administered
questionnaires, BUT answered items are of high
quality
– Small difference in quality between face-to-face
and telephone (CATI) surveys.
– Face-to-face surveys had slightly less item non-
response rates

234
de Leeuw, 1992

Mode Comparison Research cont.
• Question order and response order effects
less likely in self-administered than telephone
– R’s more likely to choose last option heard in CATI
(recency effect)
– R’s more likely to choose the first option seen in
self-administered (primacy effect)
– Mixed results on item-nonresponse rates in Web

235
de Leeuw, 1992; 2008

Mode Comparison Research cont.
• Some indication that Internet surveys are
more like mail than telephone surveys
– Visual presentation vs auditory
• Conflicting evidence item non-response (some
show higher item non-response on Internet
v.s. mail while others show no difference)
• Some evidence of better quality data
– Fewer post-data collection edits needed for
electronic v.s. mail responses

236
Sweet & Ramos, 1995; Griffin et. al, 2001

Disadvantages of Mixed Mode
• Mode Effects
– Concerns for measurement error due to the mode
• R’s providing different answers to the same questions
displayed in different modes
– Different contact/cooperation rates because of
different strategies used to contact R’s

237

Disadvantages of Mixed Mode
• Decrease in overall response rates
– Why: Effects of offering a mix of mail/web mixed
– What: Meta-analysis of 16 studies that compared
mixed mode surveys with mail and web options

Results: empirical evidence that offering mail and
Web concurrently resulted in a significant
reduction in response rates

238
Medway & Fulton, 2012

Response Rates in Mixed Mode Surveys
• Why this is happening?
– Potential Hypothesis #1: R’s dissuaded from
responding because they have to make a choice
• Offering multiple modes increases burden (Dhar, 1997)
• While judging pros/cons of each mode, neither appear
attractive (Schwartz, 2004)
– Potential Hypothesis #2: R’s choose Web, but never
actually do it
• If R’s receive invitation in mail, there is a break in their
response process (Griffin, et. al, 2001)
– Potential Hypothesis #3: R’s that choose Web may get
frustrated with the instrument and abandon the
whole process (Couper, 2000)

239

Overall Goals
• Find the optimal mix given the research
questions and population of interest
• Other factors to consider:
– Reducing Total Survey Error (TSE)
– Budget
– Time
– Ethics and/or privacy issues

240
Biemer & Lyberg, 2003

• Response Rates
• Mode Choice

241

Technique for Increasing Response
Rates to Web in Multi-Mode Surveys
• “Pushing” R’s to the web
– Sending R’s an invitation to report via Web
– No paper questionnaire in the initial mailing
– Invitation contains information for obtaining the
alternative version (typically paper)
– Paper versions are mailed out during follow-up to
capture responses to those that do not have web
access or do not want to respond via web
– “Responding to Mode in Hand” Principal

242

“Pushing” Examples
• Example 1: Lewiston-Clarkson Quality of Life
Survey
• Example 2: 2007 Stockholm County Council
Public Health Survey
• Example 3: American Community Survey
• Example 4: 2011 Economic Census Re-file
Survey

243

Pushing Example 1 –
Lewiston-Clarkson Quality of Life Survey
• Goals: increase web response rates in a
paper/web mixed-mode survey and identify
mode preferences
• Method:
– November 2007 – January 2008
– Random sample of 1,800 residential addresses
– Four treatment groups
– To assess mode preference, this question was at the
end of the survey:
• “If you could choose how to answer surveys like this, which
one of the following ways of answering would you prefer?”
• Answer options: web or mail or telephone

244
Miller, O’Neill, Dillman, 2009

Pushing Example 1 – cont.
• Group A: Mail preference with web option
– Materials suggested mail was preferred but web
was acceptable
• Group B: Mail Preference
– Web option not mentioned until first follow-up
• Group C: Web Preference
– Mail option not mentioned until first follow-up
• Group D: Equal Preference

245

• Results

246

“If you could choose how to answer surveys like this, which one
of the following ways of answering would you prefer?”

247


248


Group C = Web Preference Group

249

• Who can be pushed to the Web?

250

Pushing Example 2 – 2007 Stockholm
County Council Public Health Survey
• Goal: increase web response rates in a
paper/web mixed-mode survey
• Method:
– 50,000 (62% response rate)
– 4 treatments that varied in “web intensity”
– Plus a “standard” option – paper and web login
data

251
Holmberg, Lorenc, Werner, 2008

Pushing Example 2 – Cont.
• Overall response rates

S= Standard A1= very paper “intense”
A2= paper “intense” A3= web “intense”
A4= very web “intense”

252

• Web responses

S= Standard A1= very paper “intense”
A2= paper “intense” A3= web “intense”
A4= very web “intense”

253

Pushing Example 3 – American
Community Survey
• Goals:
– Increase web response rates in a paper/web
mixed-mode survey
– Identify ideal timing for non-response follow-up
– Evaluate advertisement of web choice

254
Tancreto et. al., 2012

• Method
– Push: 3 versus 2 weeks until paper questionnaire
– Choice: Prominent and Subtle
– Mail only (control)

– Tested among segments of US population
• Targeted
• Not Targeted

255

Response Rates by Mode in Targeted
Areas
45
40.5
40 38.1 38.2 37.6
35 31.1 12.5
30 2.5
25 28.4
20 34.1
38.1
15 28.6 28.0
10
5 9.8
3.5
0
Ctrl (Mail only) Prom Choice Subtle Choice Push (3 weeks) Push (2 weeks)
Internet Mail

256

Response Rates by Mode in Not
Targeted Areas
50
45
40
35 29.7 30.4 29.9 29.8
30
25 19.8
12.6
20 24.1 2.7
15 29.7 27.8

10 17.1 17.2
5
6.3
0 2.1

Ctrl (Mail only) Prom Choice Subtle Choice Push (3 weeks) Push (2 weeks)
Internet Mail

257

Example 4: Economic Census Refile
• Goal: to increase Internet response rates in a
paper/Internet establishment survey during
non-response follow-up
• Method: 29,000 delinquent respondents were
split between two NRFU mailings
– Letter-only mailing mentioning Internet option
– Letter and paper form mailing

258
Marquette, 2012

Example 4: Cont.

259

• Response Rates
• Mode Choice

260

Why Respondents Choose Their Mode?
• Concern about “mode paralysis”
– When two option are offered, R’s much choose
between tradeoffs
– This choice makes each option less appealing
– By offering a choice between Web and mail;
possibly discouraging response

261
Miller and Dillman, 2011

Mode Choice
• American Community Survey – Attitudes and
Behavior Study
• Goals:
– Measure why respondents chose the Internet or
paper mode during the American Community
Survey Internet Test
– Why there was nonresponse and if it was linked to
the multi-mode offer

262
Nichols, 2012

Mode Choice – cont.
• CATI questionnaire was developed in
consultation with survey methodologists
• Areas of interest included:
– Salience of the mailing materials and messages
– Knowledge of the mode choice
– Consideration of reporting by Internet
– Mode preference

263

• 100 completed interviews per notification
strategy (push

264

• Results
– Choice/Push Internet respondents opted for perceived
benefits – easy, convenient, fast
– Push R’s noted that not having the paper form
motivated them to use the Internet to report
– Push R’s that reported via mail did so because they did
not have the Internet access or had computer
problems
– The placement of the message about the Internet
option was reasonable to R’s
– R’s often recalled the letter that accompanied the
mailing package mentioning the mode choice

265

• Results cont.
– Several nonrespondents cited not knowing that a
paper option was available as a reason for not
reporting
– Very few nonrespondents attempted to access the
online form
– Salience of the mailing package and being busy
were main reasons for nonresponse
– ABS study did NOT find “mode paralysis”

266

Amy Anderson Riemer
US Census Bureau
amy.e.anderson.riemer@census.gov

Jennifer Romano Bergstrom
Fors Marsh Group
jbergstrom@forsmarshgroup.com

Introduction to Web Survey Usability Design and Testing

Recomendados

Recomendados

Mais conteúdo relacionado

Destaque

Destaque (20)

Semelhante a Introduction to Web Survey Usability Design and Testing

Semelhante a Introduction to Web Survey Usability Design and Testing (20)

Mais de Jennifer Romano Bergstrom

Mais de Jennifer Romano Bergstrom (20)

Último

Último (20)

Introduction to Web Survey Usability Design and Testing

Notas do Editor