To encourage professional development in the wider UX community, I have been an active contributor to the Washington, D.C. Jobs to Be Done group. This presentation, which I gave at two separate JTBD sessions over the past six months, discusses survey design and common errors in exploring user needs through surveys.
50. WHAT IS WRONG WITH THIS QUESTION?
SOURCE: https://measuringu.com/left-side-bias/
Knowing that the population of the U.S. is 316 million, what is the population of Canada?
54. IMPACT OF STRONG LANGUAGE
SOURCE: https://measuringu.com/extreme-items/
I think this is one of my all-time favorite websites.
I never want to use the website again.
55. IMPACT OF STRONG LANGUAGE
SOURCE: https://measuringu.com/extreme-items/
• User Experience Professional Association Website (N=60)
• 95% confidence level
63. SEVEN! (95% Confidence)
SOURCE: (Nunnally 1978).
https://measuringu.com/scale-points/
2.5% of 858 responses: users provided responses between the points.
0% of 840 responses: users provided responses between the points.
(Finstad, 2010)
64. WHAT’S WRONG WITH THESE QUESTIONS?
What was your total family income from all sources in 2013?
Do you support or oppose making immigration into the United States difficult?
What is your favorite sport to watch on television?
(If employed) Does your employer provide you with health insurance?
68. Keeping track of my stock ideas
Investing advice based on a potential market theme (e.g., Artificial Intelligence, eSports, Personalized Medicine)
Compare my holdings to a recommended investment allocation
Figuring out the best time to buy a stock
69. SURVEY
• Market Pass Members (N=253)
• 6.6% response rate
• Satisfaction & Importance Survey
• Usage of site features
• Demographics
• Misc. questions based on findings from interviews/operating hypotheses
80. THE GOOD
1. Keeping track of my stock ideas
2. Investing advice based on a potential market theme (e.g., Artificial Intelligence, eSports, etc.)
8. Finding stocks I believe in
81. THE OPPORTUNITY
5. Understanding how much money I am getting from my investment each month
6. Figuring out the best time to buy a stock
9. Determining the best time to sell a stock
10. Learning how to invest
82. NEXT STEPS
• Evaluate Scalability of Findings
• Determine Future Priorities
• Identify Surprises and Outliers
• Rinse/Repeat
• Frequency vs. Pain
• Focused Job Interviews
• More iterations for addressing those jobs
83. YOUR NEXT STEPS
What jobs do you think would scale? Which ones would you want to survey?
How could you use this in your work?
Reflection Questions
What biases will you need to control for in your questions?
How can you minimize survey error?
Editor's Notes
affordable, scalable, speedy
Good at measuring attitudes, opinions, beliefs, values, behaviors, factual items, and perceptions
quantitative people like them
Multiple people can conduct a survey without affecting the validity or reliability of the results
Provides baselines over time
Anonymity/confidentiality for users
You can ask a lot of questions
OK at measuring behaviors
patterns, frequency, ease and success of use
user needs, expectations, perspectives, priorities and preferences
user satisfaction with collections and services
shifts in user attitudes and opinions
relevance of collections and services to user needs
trends (by repetition over time).
HUFF POST – some wise guy had feminism students signing away their suffrage (Daily Show segment)
Disadvantages - what they aren’t so good at (5 minutes)
Difficult to measure attitude changes over time (unless designed that way)
Example: Conducting a survey right after a massive organizational restructure
Aspirational Responses/Dishonesty
May not feel encouraged to provide accurate, honest answers (social bias)
May not feel comfortable providing answers that present themselves in an unfavorable manner
Users may not have realistic perceptions of their own behaviors
Many argue that surveys are inadequate at capturing emotions, feelings and behaviors
No way to tell how truthful a respondent is being
Differences in Understanding and Interpreting Questions
Understanding Questions
Subjectivity of Respondent
Who in here is someone who never gives an establishment a 10? Maybe you give them a 9.4 or a 9.5, but never a ten?
Subjectivity of the Researcher/Analyst
Ways to mitigate
Need to get a representative sample
Ways to encourage users to participate
Coverage error is when the list from which sample members are drawn does not accurately represent the population on the characteristic(s) one wants to estimate with the survey. Coverage error is the difference between the estimate produced when the list is inaccurate and what would have been produced with an accurate list.
If your herd had way more brown goats than what was sampled, that would be coverage error.
Coverage error is another important source of variability in survey statistics; it is the degree to which statistics are off due to the fact that the sample used does not properly represent the underlying population being measured.
Response rate is only an indirect indicator of survey quality.
Error occurs when the characteristics of those who chose to respond differ from those of the people who chose not to respond in a way that is relevant to the survey results.
Favorable environment example – If those who have positive views of the environment are the only ones who responded, then the sample would be biased because of non-response error. [take something questionable – like a Big Mac or something]
HIV testing example with 84% response rate with remarkably low levels of HIV. Later learned that those who participated did not report behaviors that would be consistent with the contraction of HIV.
At the Motley Fool we look for at least a 10-15% response rate for external surveys. Internal surveys typically hit 30-40% response rates
One early example of a finding was reported by Visser, Krosnick, Marquette and Curtin (1996), who showed that surveys with lower response rates (near 20%) yielded more accurate measurements than did surveys with higher response rates (near 60 or 70%). In another study, Keeter et al. (2006) compared results of a 5-day survey employing the Pew Research Center’s usual methodology (with a 25% response rate) with results from a more rigorous survey conducted over a much longer field period and achieving a higher response rate of 50%. In 77 out of 84 comparisons, the two surveys yielded results that were statistically indistinguishable. Among the items that manifested significant differences across the two surveys, the differences in proportions of people giving a particular answer ranged from 4 percentage points to 8 percentage points.
How many of you are familiar with statistics?
When you send out a survey you likely won’t want to send it to your whole file. Sampling error is the difference between an estimate based on your sample and the value you would get if you measured the entire population.
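To put a rough number on sampling error, you can compute a margin of error for a proportion. A minimal sketch in Python; the N=253 sample size is borrowed from the survey slide above, and the worst-case p = 0.5 is an assumption for illustration:

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of a normal-approximation 95% CI for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

# Worst case (p = 0.5) for a sample of 253 respondents:
moe = margin_of_error(0.5, 253)
print(f"+/-{moe:.1%}")  # roughly +/-6.2 percentage points
```

Bigger samples shrink this margin only with the square root of n, which is why doubling your sample does not halve your error.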
Measurement error occurs when respondents are unwilling or unable to provide accurate answers. This can be due to poor question design, survey mode effects, or data collection mistakes.
Good examples could be:
Socially deviant or illegal behaviors
Cannot answer because the words are not understood or phrases are confusing
Question structure may encourage certain answers when another would not. For example, items that ask respondents to check all that apply tend to result in fewer selections than items that ask for an explicit positive or negative answer to each option
Question order could impact measurement error
Fun fact: scalar questions are likely to be answered differently in visual versus aural survey modes
Assuming prior knowledge or understanding
To what extent do you agree or disagree that there are enough arts and cultural activities in your local area? [1 - strongly disagree, 5 - strongly agree]
To what extent do you agree or disagree that there are enough arts and cultural activities in your local area? By ‘local area’ we mean within a 10-minute drive of your home. [1 - strongly disagree, 5 - strongly agree]
Asking two questions in one:
A double-barreled question is one that has more than one question embedded within it. Participants may answer one but not both, or may disagree with part or all of the question.
Double-barreled question: Do you agree that campus parking is a problem and that the administration should be working diligently on a solution?
Revised question: Is campus parking a problem? (If the participant responds yes): Should the administration be responsible for solving this problem?
Bad Question: How short was Napoleon?
The word “short” immediately brings images to the mind of the respondent. If the question is rewritten to be neutral-sounding, it can eliminate the leading bias.
Good Question: How would you describe Napoleon’s height?
Starting with demographics
Asking two questions in one
Inadequate response options
Example: How long have you been working in this job? [1-2 years, 2-5 years, 5 or more years]
Alternative: How long have you been working in this job? [Less than a year, 1-2 years, 2-5 years, 5 or more years].
Using negative question wording
Negative question wording makes respondents do a double take when trying to respond. Such questions usually include the word ‘not’ in the question itself, then ask respondents to agree or disagree with the position or statement. Designing your survey questions in positive language gives you one less source of error and bias to worry about, and that’s not a bad thing!
Example: Do you agree or disagree that KFC’s Double Down Sandwich is not a healthy option for our growing children? [Agree/ Disagree/ Don’t know]
Alternative: Do you think the arts are recognized as a valuable tool in helping to strengthen communities? [Yes/ No/ Don’t know]
Example 2.7. Anchoring
Consider two different wordings for a particular question:
Wording 1: Knowing that the population of the U.S. is 316 million, what is the population of Canada?
Wording 2: Knowing that the population of Australia is 23 million, what is the population of Canada?
This survey was conducted in Stat 100 classes where both wordings of the question were randomly distributed. The students did not know that there were two versions of this question so each only answered the question that they received. The results for this survey are found in Figure 2.4.
Figure 2.4. STAT 100 Survey Results
As you can see, the students were influenced by the wording of the question that they were asked to answer. People's perceptions can be severely distorted when they are provided with a reference point or an anchor. People tend to stay close to the anchor, either because they have limited knowledge about the topic or because they are distracted by the anchor. You should also consider the following three points:
The sample sizes were large enough to detect a difference in the two groups
Canada's population is about 35 million
The anchor might be less distracting if the following wording were used: "What is the population of Canada, when knowing that the population of the U.S. is 316 million?" but it is best to leave out the anchoring statement altogether.
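A randomized wording experiment like the Stat 100 one above can be analyzed with a simple permutation test on the difference in group means. A minimal sketch in Python; the response values below are invented for illustration, not the actual Stat 100 data:

```python
import random
import statistics

def permutation_test(a, b, n_iter=10_000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)  # simulate the null: wording labels are exchangeable
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    return hits / n_iter  # p-value: share of shuffles at least as extreme

# Hypothetical guesses of Canada's population (millions) under each anchor:
us_anchor  = [120, 150, 100, 90, 130, 110, 95, 140, 105, 125]  # anchored at 316M
aus_anchor = [30, 40, 25, 35, 45, 28, 38, 32, 27, 42]          # anchored at 23M
p_value = permutation_test(us_anchor, aus_anchor)
```

With clearly separated groups like these the p-value comes out near zero; shuffling the group labels simulates the null hypothesis that the wording made no difference.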
User Experience Professional Association Website (N=60)
95% confidence level
People adjust to the wording and disagree; they tend to compensate for the extreme language being used.
Additional work is needed, but it suggests you may want to be careful how you use language in your surveys or else encounter this type of bias.
Too many open-ended questions
The short answer is that 7-point scales are a little better than 5-points—but not by much. The psychometric literature suggests that having more scale points is better but there is a diminishing return after around 11 points (Nunnally 1978). Having seven points tends to be a good balance between having enough points of discrimination without having to maintain too many response options. So what are the consequences of this?
A recent article tested the SUS response error by counting the number of times users couldn’t decide between two points. The users were allowed to “interpolate” or pick between points such as 3.5. In 2.5% of the 858 responses users provided responses between two points (95% CI between 1.6% to 3.9%).
In contrast, when a 7-point version of the SUS was used there was no interpolating for any of the 840 ratings. While this seems like compelling evidence to always use 7-point over 5-point scales, there are two tempering factors.
While there was error in the five-point SUS, it is unclear how much of an impact this actually has on the final SUS score, since the study used different systems for each scale.
Errors in statistics have a way of cancelling themselves out. It is likely that many responses that are “forced” into higher numbers will be cancelled out by those forced into lower numbers.
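The confidence interval on the interpolation rate can be reproduced approximately with a Wilson score interval. A sketch in Python; the exact interpolated count isn't given in the text, so 21 of 858 (about 2.5%) is an assumption, and the original authors may have used a different interval method:

```python
import math

def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (95% by default)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Assumed count: ~2.5% of 858 responses is about 21 interpolated ratings.
lo, hi = wilson_ci(21, 858)
print(f"{lo:.1%} to {hi:.1%}")  # close to the reported 1.6% to 3.9%
```

The Wilson interval is preferable to the plain normal approximation here because the observed proportion is small, where the normal approximation can produce bounds near or below zero.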
There are many things that can go wrong with questions and responses: users misinterpret the question, users select the wrong box or administrators forget to invert the scales. The data suggests this type of error is small but can be addressed in the following four ways.