4. Usability testing
• When: common for comparison of products or
prototypes
• Tasks & questions focus on how well users
perform tasks with the product
– Focus is on time to complete task & number & type of
errors
• Data collected by video & interaction logging
• Experiments are central in usability testing
– Usability inquiry tends to use questionnaires &
interviews
5. Testing conditions
• Usability lab or other controlled space
• Emphasis on:
– Selecting representative users
– Developing representative tasks
• Small sample (5-10 users) typically selected
• Tasks usually last no longer than 30 minutes
• The test conditions should be the same for every
participant
6. Some types of data
Time to complete a task
Time to complete a task after a specified time
away from the product
Number and type of errors per task
Number of errors per unit of time
Number of navigations to online help or manuals
Number of users making a particular error
Number of users completing task successfully
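As a minimal illustrative sketch (not from the slides; all names and values are hypothetical), most of these measures reduce to simple counts and sums over a log of task attempts, e.g. in Python:

# Hypothetical record for one logged task attempt
from dataclasses import dataclass

@dataclass
class TaskAttempt:
    user: str
    task: str
    seconds: float     # time to complete the task
    errors: int        # number of errors during the attempt
    help_visits: int   # navigations to online help or manuals
    success: bool      # task completed successfully?

log = [TaskAttempt("u1", "t1", 95.0, 2, 1, True),
       TaskAttempt("u2", "t1", 140.0, 5, 0, False)]

# Two of the measures listed above
users_completing = sum(1 for a in log if a.task == "t1" and a.success)
errors_per_minute = sum(a.errors for a in log) / (sum(a.seconds for a in log) / 60)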
7. How many participants are
enough for user testing?
• The number is a practical issue
• Depends on:
– Schedule for testing
– Availability of participants
– Cost of running tests
• Typically 5-10 participants
– Some experts argue that testing should
continue with additional users until no new
insights are gained
8. Examples
• The next slides describe 2 experiments:
the one behind the book Prioritizing Web
Usability and a fictional one on
OpenSMSDroid
• Both use Thinking Aloud and video/screen
recording for data collection
9. Prioritizing Web Usability
• Prioritizing Web Usability (Nielsen and Loranger, 2006)
used the Thinking Aloud method to collect insight on
user behaviour:
– 69 users, all with at least one year of experience
using the web
– Broad range of job backgrounds and web experience – but no
one working in IT or marketing
– 25 web sites tested with specific tasks
– Windows desktops with 1024x768 resolution running Internet
Explorer
– Recordings of monitor and upper body for each session
– Broadband speed between 1 and 3 Mbps
10. Prioritizing Web Usability (2)
• The tasks that the users were asked to perform
included:
– Go to ups.com and find how much it costs to
send a postcard to China
– You want to visit the Getty Museum this weekend. Go
to getty.edu and find opening times/prices
– Go to nestle.com and find a snack to eat during
workouts
– Go to bankone.com and find the best savings
account if you have a $1,000 balance
11. Prioritizing Web Usability (3)
• The results of the research are presented
as a book:
– Organising the findings into categories (including
searching, navigation, typography and
writing style)
– Using plenty of examples and screenshots to
demonstrate the usability issues that were
identified
12. Prioritizing Web Usability:
findings
• People succeed 66% of the time when
working on “single site” activities and 60%
of the time when they have to browse the
web for information
13. Prioritizing Web Usability:
findings (2)
• Experienced users spend about 25
seconds on a homepage and 45 on an
interior page (35 and 60 for
inexperienced users)
• Only 23% of users scroll on their first
visit to a homepage
– The number decreases after the first visit
– The average scroll on a first visit is 0.8 of a
screen
14. Prioritizing Web Usability:
findings (3)
• 88% of users go to search engines to find
information
• Font face and size: different font faces for
print and screen
– Different font size depending on target
audience
• More in the book…
15. OpenSMSDroid evaluation
• You have been tasked to evaluate the usability
of a new (fictional) Android application for writing
short text messages, OpenSMSDroid
• You have decided to set up an experiment
– The next experiment is (loosely) adapted from
“Experimental Evaluation of Techniques for Usability
Testing of Mobile Systems in a Laboratory Setting”
(Beck, Christiansen, Kjeldskov, Kolbe and Stage,
2003)
16. OpenSMSDroid evaluation
• Your test users will perform a set of
tasks in specific configurations, using the
thinking aloud method for data collection
– A constraint of 5 minutes has been set for
each of the tasks
– The usability researcher will record the
session and take notes
17. OpenSMSDroid evaluation:
testing configurations
• Configurations for the test (tentative list):
– Sitting on a chair at a table
– Walking on a treadmill at constant speed
– Walking on a treadmill at varying speed
– Walking on an 8-shaped course that is changing as
obstructions are being moved, within 2 meters of a
person that walks at constant speed
– Walking on an 8-shaped course that is changing as
obstructions are being moved, within 2 meters of a
person that walks at varying speed
– Walking in Westfield Stratford at 16:00 on Saturday
18. OpenSMSDroid evaluation:
testing configurations (2)
• For practical reasons and after reviewing the
literature, these settings have been selected for
this evaluation:
– Sitting on a chair at a table
– Walking on a treadmill at constant speed
– Walking in Westfield Stratford at 16:00 on
Saturday
19. OpenSMSDroid evaluation:
tasks
• Writing a new SMS containing the phrase “The quick
brown fox jumps over the lazy dog” repeated 2 times to
an existing contact (without using predictive text
features)
• Writing a new SMS containing the phrase “The quick
brown fox jumps over the lazy dog” repeated 2 times to
an existing contact (using predictive text features)
• Taking a picture and sending it to an existing contact
• Taking a short 1-minute video and sending it to an
existing contact
20. OpenSMSDroid evaluation:
tasks (2)
• In each test, you can collect:
– Quantitative data: time needed to perform the
task, and whether the task has been completed
– Qualitative data: asking the user to think
aloud while interacting with the device and
recording the interaction
21. OpenSMSDroid evaluation: data
analysis
• The evaluation will analyse the data
collected and report the findings,
noting any differences in
performance and suggesting possible
changes to the interface
– An experiment can also generate further
hypotheses to be tested in subsequent
experiments
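As an illustrative sketch only (the data and the choice of test are assumptions, not part of the evaluation design above), the quantitative comparison between configurations could start with a two-sample t-test on task times:

# Hypothetical task times (seconds) for one task in two configurations
from scipy import stats

sitting = [62, 71, 58, 80, 66]     # sitting on a chair at a table
walking = [85, 92, 78, 101, 88]    # treadmill at constant speed

# Welch's t-test: is the difference in mean time significant?
t, p = stats.ttest_ind(sitting, walking, equal_var=False)
print(f"t = {t:.2f}, p = {p:.3f}")  # a small p suggests a real difference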
24. Usability inspection methods
• Heuristic evaluation and walkthroughs
are the most common usability inspection
methods
– We'll also see several other methods
25. Usability inspections and
heuristics (2)
• Usability inspection methods are based on
having evaluators inspect a user
interface
• Usability inspection methods aim to
examine usability-related aspects of a
user interface, even if the interface has
not yet been developed
– Can be used to perform usability evaluations
in the initial stages of development
26. Heuristic evaluations
• Heuristic evaluation is a method that
requires usability specialists to judge
whether each element of a user interface
follows established usability principles and
guidelines
– E.g. Jakob Nielsen’s heuristics
• Heuristics are being developed for mobile
devices, wearables, virtual worlds, etc.
27. Nielsen’s heuristics: discount
evaluations
• A heuristic evaluation is referred to as a
discount evaluation when 5 evaluators
are used
– Empirical evidence suggests that on
average 5 evaluators identify 75-80% of
usability problems on generalist web
sites
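The model usually cited for such figures is Nielsen and Landauer's found(n) = 1 − (1 − λ)^n, where λ is the probability that a single evaluator finds a given problem (about 0.31 on average in their data). A quick check of the claim above:

# Proportion of problems found by n evaluators (Nielsen & Landauer model)
lam = 0.31                     # per-evaluator detection rate (reported average)
for n in (1, 3, 5, 10):
    print(n, round(1 - (1 - lam) ** n, 2))
# 5 evaluators -> about 0.84, close to the 75-80% range quoted above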
28. Heuristic evaluations: stages
• Briefing session to tell experts what to do
• Evaluation period of 1-2 hours in which
each expert works separately:
– One pass to get a feel for the product
– A second pass to focus on specific features
• Debriefing session in which experts work
together to prioritize and categorise the
problems
29. Relevant standards and
guidelines
• Relevant standards and guidelines
include:
– W3C HTML and CSS standards
– W3C WAI guidelines
– Mobile Web Best Practices guidelines
– ISO 9241
30. Heuristic evaluations:
advantages and problems
• Few ethical & practical issues to consider
because users are not involved
– Can be difficult & expensive to find experts
– Experts should have knowledge of application domain
& of the evaluation method used
• Critical points:
– Important problems may get missed
– Focus can be lost on trivial problems
– Experts have biases
31. Cognitive walkthroughs
• Focus on ease of learning
• Designer presents an aspect of the design
& usage scenarios
• Expert is told the assumptions about user
population, context of use, task details.
• One or more experts walk through the
design prototype with the scenario.
32. Cognitive walkthroughs (2)
• Experts are guided by 3 questions:
– Will the correct action be sufficiently evident to
the user?
– Will the user notice that the correct action is
available?
– Will the user associate and interpret the
response from the action correctly?
• As the experts work through the scenario
they note problems.
33. Pluralistic walkthrough
• Variation on the cognitive walkthrough
theme.
– Performed by a team
• The panel of experts begins by working
separately
• Then there is managed discussion that
leads to agreed decisions.
• The approach lends itself well to
participatory design
34. Feature inspection
• Feature inspection is a technique that focuses
on the features of a product or of a web site
– A group of inspectors is given some use cases
and asked to analyse each feature of the web
site with regard to availability, understandability,
and other aspects of usability
– This technique is best suited to the middle stages
of development, when features are known but the
artefact cannot yet be evaluated with methods
such as lab experiments
35. Standards inspection
• Standards inspection is a technique used to
ensure the compliance of a web site with
some standard
• A usability professional with extensive
knowledge of the relevant standards inspects a
web site for compliance
• Different standard inspections can be run on the
same artefact
– Nielsen’s heuristics include standards inspection
37. Truckers and mobile devices
[...] a major mobile device company was trying to understand why there were
so many data entry errors on a mobile device for long-haul truck drivers. Many
people in the company tended to blame the truckers, whom they assumed were
uneducated.
None of them had ever actually met a trucker, but they figured it couldn’t be too
hard to type in a word or two. One winter, a senior user interface (UI) designer
decided to see for himself.
The designer spent a week at a truck stop watching truck drivers use the
device and talking to them about it. He quickly discovered that the truckers
could spell perfectly. Instead, the problem was the device. The truckers tended
to be big men, with big fingers. To make matters worse, they often wore bulky
gloves in the winter.
The device had tiny buttons, making typing with big fingers in warm gloves
frustrating.
(Observing the User Experience; Goodman, Kuniavsky and Moed, 2012)
38. Truckers and mobile devices (2)
• The team realized it had been basing
important design decisions on faulty
assumptions
• The team redesigned the UI so that it
required less typing and increased the size
of buttons.
– The error rates dropped dramatically
39. Truckers and mobile devices (3)
• That's an example of how usability inquiry
methods work
40. Usability inquiry
• Usability inquiry methods focus (to
different degrees) on analysing an artefact
either from “the native point of view” or
looking for “the native point of view”
– Used to obtain information about users' likes,
dislikes, needs, and understanding of the
system
41. Usability inquiry (2)
• They may use one or more of these
techniques:
– Talking to users
– Observing users using a system in a real
working situation
– Letting the users answer questions (verbally
or in written form)
42. Data collection & analysis
• Data collection:
– Observation & interviews (e.g. contextual inquiry)
– Notes, pictures, recordings, diaries
– Video
– Logging
• Analysis
– Categorizing the findings
– Using categories provided by pre-existing
research
43. Usability inquiry methods
• The next slides will cover some popular
usability inquiry methods:
– Diary
– Contextual inquiry
– Interviews and focus groups
– Surveys
44. Diary method
• The diary method requires users to keep a
diary of their interactions
• Diaries can be free form or structured
– The diary method is best used when the
researcher lacks the time, resources or
access needed for user-monitoring methods,
or when the level of detail provided by those
methods is not needed
45. Contextual inquiry
• Contextual inquiry is a structured field
interviewing method which typically evaluates:
– User opinions
– User experience
– Motivation
– Context
• It is a study based on dialogue and interaction between
interviewer and user, and it is one of the best methods
to use when researchers need to understand the users'
work context.
46. Interviews and focus groups
• Interviews and focus groups are research
methods based on interaction between
researchers and users
– The researcher facilitates the discussion
about the issues raised by the questions
– In focus groups (multiple users present), the
interaction among the users may raise
additional issues, or identify common
problems that many people experience
47. Surveys
• Surveys are a quantitative research
method, where a set list of questions is
asked and the users' responses recorded
– When the questions are administered by a
researcher, the survey is called a structured
interview
– When the respondents complete the
questions themselves, the survey is
referred to as a questionnaire
48. And now…
• You have had an overview of a wide
selection of usability evaluation methods
– And you are ready to use them in your
assignment
49. Measuring the User Experience
• The next slides are based on the core
textbook for this module, "Measuring the User
Experience"
53. Task success
• It measures how effectively users are able
to complete a given set of tasks
• To measure task success, each task that
users are asked to perform must have a
clear end state or goal
54. Task success (2)
• You can:
– Ask users to articulate the answer verbally
– Ask users to provide an answer in a
structured way (e.g. using an online tool or
paper)
– Use proxy measures (e.g. when the correct
solution is not easily identifiable as it may
depend on individual circumstances)
55. Task failure
• There are many different ways in which a
participant might fail
– Giving up: participants indicate that they would not
continue with the task if they were doing this on their
own
– Moderator “calls” it because the participant is not
making any progress
– Too long: the participant completed the task but not
within a predefined time
– Wrong: participants thought that they completed the
task successfully, but they actually did not
56. Binary success
• Binary success is the simplest and most
common way of measuring task success
– Each time users perform a task, they should
be given a “success” or “failure” score (0 or 1)
– Users either completed a task successfully or
they didn’t
– That's the case, for example, for e-commerce
– Sometimes, you should measure
perceived success rather than factual
success (why?)
57. Binary success: analysing and
presenting data
• The most common way to analyse and
present binary success rates is by task.
• This involves simply presenting the
percentage of participants who completed
each task successfully
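A minimal sketch of this calculation, assuming success is recorded as 0/1 per participant per task (task names and scores below are hypothetical):

# Binary success scores per task: 1 = success, 0 = failure
scores = {
    "find price":   [1, 1, 0, 1, 1, 0, 1, 1],
    "send message": [1, 0, 0, 1, 0, 1, 0, 1],
}

for task, results in scores.items():
    rate = 100 * sum(results) / len(results)
    print(f"{task}: {rate:.0f}% of participants succeeded")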
59. Binary success: presenting data
by user
• Success rates can also be broken down
by user group, for example by:
– Frequency of use (infrequent users versus
frequent users)
– Previous experience using the product
– Domain expertise
– Age group
60. Levels of success
• Identifying levels of success is useful
when there are reasonable shades of grey
associated with task success.
61. Levels of success: complete,
partial and failure
• Complete success
– With assistance
– Without assistance
• Partial success
– With assistance
– Without assistance
• Failure
– User thought it was complete, but it wasn’t
– User gave up
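One common way to analyse such levels (a judgment call, not prescribed here) is to map each level to a numeric score so that averages can be computed; the scheme below is a hypothetical example:

# Hypothetical scoring scheme for levels of success
LEVEL_SCORES = {
    "complete, no assistance":   1.00,
    "complete, with assistance": 0.75,
    "partial, no assistance":    0.50,
    "partial, with assistance":  0.25,
    "failure":                   0.00,
}

observations = ["complete, no assistance", "failure",
                "partial, no assistance"]
mean_score = sum(LEVEL_SCORES[o] for o in observations) / len(observations)
print(f"mean task score: {mean_score:.2f}")  # 0.50 for this sample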
63. Time on task
• In most situations, the faster a user can
complete a task, the better the experience
– Time on task is particularly important for
products where tasks are performed
repeatedly by the user.
64. Time on task: analysing and
presenting data
• The most common way is to look at the
average amount of time spent on any
particular task
– You should always report a confidence
interval to show the variability in the time data
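A minimal sketch of the calculation, assuming times are recorded in seconds (hypothetical data; uses scipy for the t critical value):

# Mean time on task with a 95% confidence interval
from statistics import mean, stdev
from scipy import stats

times = [34.0, 41.5, 29.0, 52.0, 38.5, 45.0]  # seconds, hypothetical
m, s, n = mean(times), stdev(times), len(times)
t_crit = stats.t.ppf(0.975, df=n - 1)         # two-sided 95%
half = t_crit * s / n ** 0.5
print(f"mean = {m:.1f}s, 95% CI = [{m - half:.1f}, {m + half:.1f}]")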
65. Time on task: range and
threshold
• A variation is to create ranges and report
the frequency of users who fall into each
interval
• Another useful way to analyse task time
data is by using a threshold.
– In many situations, the only thing that matters
is whether users can complete certain tasks
within an acceptable amount of time.
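The threshold analysis then reduces to a single proportion; a sketch with a hypothetical 60-second threshold:

# Percentage of participants completing the task within a threshold
times = [34.0, 41.5, 29.0, 52.0, 38.5, 75.0, 66.0]  # seconds, hypothetical
THRESHOLD = 60.0                                    # assumed acceptable time
within = sum(1 for t in times if t <= THRESHOLD)
print(f"{100 * within / len(times):.0f}% finished within {THRESHOLD:.0f}s")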
66. Errors
• Errors reflect the mistakes made during a
task. Errors can be useful in pointing out
particularly confusing or misleading parts
of an interface.
67. Errors (2)
• Examples of errors include:
– Entering incorrect data into a form field
– Making the wrong choice in a menu or drop-
down list
– Taking an incorrect sequence of actions
– Failing to take a key action
68. Errors: analysing and
presenting data
• The most common approach is to look at
average error rates per task or per
participant
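A sketch of the per-task view, assuming errors are tallied per participant per task (all counts hypothetical):

# Error counts: rows = participants, columns = tasks
errors = {
    "p1": {"task A": 2, "task B": 0},
    "p2": {"task A": 4, "task B": 1},
    "p3": {"task A": 1, "task B": 3},
}

for task in ("task A", "task B"):
    per_task = [row[task] for row in errors.values()]
    print(f"{task}: {sum(per_task) / len(per_task):.1f} errors per participant")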
69. Efficiency
• Efficiency can be assessed by examining
the amount of effort a user expends to
complete a task, such as the number of
clicks in a website or the number of button
presses on a mobile phone.
70. Efficiency (2)
• Efficiency typically measures the number
of actions or steps that users took in
performing each task.
– Identify the action(s) to be measured: for
websites, it's typically mouse clicks or page
views
– Define the start and end of an action
– Count the action
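As a minimal sketch, assuming an interaction log of timestamped events (event names are hypothetical), the three steps map onto a filter-and-count:

# Count clicks between the start and end of a task in an event log
events = [
    ("task_start", 0.0), ("click", 1.2), ("click", 3.8),
    ("page_view", 4.0), ("click", 7.5), ("task_end", 9.0),
]

counting = False
clicks = 0
for name, _time in events:
    if name == "task_start":            # defined start of the task
        counting = True
    elif name == "task_end":            # defined end of the task
        counting = False
    elif counting and name == "click":  # the action being measured
        clicks += 1                     # count it
print(f"{clicks} clicks during the task")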
72. Learnability (2)
• Learnability is normally measured using
performance metrics: time on task, errors,
number of steps, or task success per
minute.
73. Learnability: analysing and
presenting data
• The most common way to analyse and
present learnability data is by examining a
specific performance metric (such as time
on task, number of steps, or number of
errors) by trial, for each task or aggregated
across all tasks
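A minimal sketch, assuming the same task is repeated over several trials and a time is recorded per participant per trial (hypothetical data):

# Mean time on task per trial, to show a learning curve
trials = {1: [95, 110, 88], 2: [70, 82, 66], 3: [55, 60, 52]}

for trial, times in sorted(trials.items()):
    print(f"trial {trial}: mean {sum(times) / len(times):.0f}s")
# Decreasing means across trials indicate the product is being learned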
75. References
• Beck, E., Christiansen, M., Kjeldskov, J.,
Kolbe, N. and Stage, J. (2003).
‘Experimental Evaluation of Techniques for
Usability Testing of Mobile Systems in a
Laboratory Setting’, OzCHI 2003.
• Goodman, E., Kuniavsky, M. and Moed, A.
(2012). Observing the User Experience,
2nd edn. Morgan Kaufmann.
• Nielsen, J. and Loranger, H. (2006).
Prioritizing Web Usability. New Riders.