1. 11/17/2009
An Evidence-Based Approach to the Selection of Outcome Measures for AHPs
Donna Kennedy, BSc OT, MSc, CHT
Clinical Specialist Hand Therapy
Honorary Research Associate

What are outcome measures?
Any measurement of a patient’s health status that can change as a result of time, treatment or disease (MacDermid J 2002)

How many outcome measures are you aware of?
How many outcome measures do you use?

Pub Med Nov 2009
Outcome measures - 486,379
Outcome measures and OT - 2169

Standardised Outcome Measures
• Published
• Detailed instructions for administration, scoring and interpreting the test
• Defined purpose
• Population specific
• Published data indicating acceptable reliability and validity

How can we use outcome measures?
• To determine if treatment is causing a change
• To demonstrate to others that treatment has resulted in clinically important change
• To evaluate programs of care
• To identify subgroups of patients who most benefit from care
• To evaluate quality improvements
• Clinical research
(MacDermid J 2002)

The International Classification of Functioning, Disability and Health (ICF) (WHO 2002)
• Impairments- loss or abnormality of psychologic, physiologic, or anatomic structure or function
• Activity limitations- difficulties in performing activities in a manner or within a range that is considered normal
• Participation Restriction- a disadvantage resulting from impairment or activity limitation that limits or prevents fulfilment of a role that is normal for the individual

Measuring for Quality Improvement in the NHS
"We can only be sure to improve what we can actually measure"
Lord Darzi, High Quality Care for All, June 2008

Barriers
• Time
• Cost
• Training requirements

Helpful hints…….
• Be organised
• Keep notes
• Date your work

PI (C) O
• Patient group
• Intervention
• (Control)
• Outcome
(CEBM 2009)

PI(C)O Questions
Element | Define | Example
Patient | “How would I succinctly describe these patients?” | Do adults with traumatic lower limb amputation…
Intervention | “What is the main action I am considering?” | …who receive OT in the acute care setting
(Control) | “What is (are) the other option(s)?” | compared with patients who do not receive OT
Outcome | “What do I/ the patient want to happen/ not happen?” | demonstrate greater independence in ADL at discharge?

Practical Example
“Do patients with rheumatoid arthritis demonstrate improved hand function following occupational therapy?”

Ask a PICO Question
P: Adults with RA
I: OT
C: (no OT)
O: Hand function (Activity limitation)

Planning your search
When searching, throw a big net!
Inclusions
(P) Adults, RA
(I) OT, Hand therapy
Exclusions
(P) Paediatrics, rheumatologic disease other
than RA
(I) Hand Surgery, rheumatologic medication
(O) Impairment (grip strength, ROM)
Participation restriction (quality of life)

Literature Searching
1: rheumatoid adj arthritis
2: adults
3: (1 and 2)
4: occupational therapy
5: hand therapy
6: (4 or 5)
7: activities adj of adj daily adj living
8: hand adj function
9: (7 or 8)
10: assessment
11: evaluation
12: outcome measure
13: (10 or 11 or 12)
14: (3 and 6 and 9 and 13)

Psychometric Properties
1. Conduct electronic search
2. Apply inclusion/exclusion criteria to titles, abstracts
3. Hand search reference lists for additional tools
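
The numbered search strategy (lines 1 to 14) combines result sets with "and" and "or". A minimal sketch of that logic, treating each search line as a set of record IDs, is given here. The IDs are hypothetical placeholders, not real database results, and the sketch does not model the "adj" adjacency operator.

```python
# Hypothetical record IDs returned by each single-term search line.
# These numbers are illustrative only, not real search results.
hits = {
    1: {101, 102, 103, 104, 105},  # rheumatoid adj arthritis
    2: {101, 102, 103, 106},       # adults
    4: {101, 102, 107},            # occupational therapy
    5: {103, 108},                 # hand therapy
    7: {101, 109},                 # activities adj of adj daily adj living
    8: {102, 103},                 # hand adj function
    10: {101, 110},                # assessment
    11: {111},                     # evaluation
    12: {103, 104},                # outcome measure
}

# "and" narrows a search (set intersection); "or" widens it (set union).
hits[3] = hits[1] & hits[2]                        # 3: (1 and 2)
hits[6] = hits[4] | hits[5]                        # 6: (4 or 5)
hits[9] = hits[7] | hits[8]                        # 9: (7 or 8)
hits[13] = hits[10] | hits[11] | hits[12]          # 13: (10 or 11 or 12)
hits[14] = hits[3] & hits[6] & hits[9] & hits[13]  # 14: (3 and 6 and 9 and 13)

print(sorted(hits[14]))
```

Grouping synonyms with "or" first and only then combining the groups with "and" keeps the net wide within each PICO element while still requiring every element to be present.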

Reliability
Is the measurement consistent and free from error?

Validity
Does the test measure what it is intended to measure?

Responsiveness
Is the measure able to detect change over time?

Search 2; Psychometric Properties
1: Michigan adj Hand adj Outcomes adj Measure
2: MHQ
3: (1 or 2)
4: Patient adj Evaluation adj Measure
5: PEM
6: (4 or 5)
7: reliability
8: validity
9: responsiveness
10: (7 or 8 or 9)
11: ((3 or 6) and 10)

Hierarchy of Evidence
Level 1a Systematic reviews & meta-analysis
Level 1b Randomized controlled trial (RCT)
Level 2a Systematic reviews & meta-analysis of randomized & non-randomized controlled trials
Level 2b Controlled trials, cohort & poor quality RCTs
Level 4 Case series
Level 5 Expert opinion including literature/ narrative reviews, consensus statements, descriptive studies & individual case studies
Level ? What someone told me once or I learnt 15 years ago

Types of Reliability
Intrarater
Interrater
Test-retest

Scales of Measurement
Scale | Description | Examples
Ratio | Units with equal intervals, measured from true zero | Distance, age, time, weight
Interval | Equal intervals between numbers, but not related to true zero | Calendar years, IQ, degrees centigrade
Ordinal | Rank order of observations | MMT, functional status, pain
Nominal | Category labels or classification | Sex, nationality, blood type
(from Portney and Watkins 2000)

Statistical Analysis of Reliability
Interval or ratio data (age, time, weight, grip strength, IQ) - Intraclass correlation coefficients (ICC)
Interpretation
< .50 – poor
.50 to .75 - moderate
> .75 - good
> .90 – suggested for clinical measurements
(Portney and Watkins 2000)

Statistical Analysis of Reliability
Nominal data (sex, blood type, diagnosis) - Kappa statistic
Interpretation
< 40% - poor to fair agreement
40 – 60% - moderate agreement
> 60% - substantial agreement
> 80% - excellent agreement
(Landis and Koch 1977)
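
For nominal data, the kappa statistic compares observed agreement between two raters with the agreement expected by chance. The sketch below assumes the two-rater form (Cohen's kappa), which the slides do not specify. The diagnosis labels are hypothetical, and `interpret_kappa` is an illustrative helper that applies the bands listed above with kappa on a 0 to 1 scale (0.40 = 40%).

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items (nominal data)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    # Undefined (division by zero) if both raters use one identical category.
    return (observed - expected) / (1 - expected)


def interpret_kappa(kappa):
    """Apply the interpretation bands listed on the slide (hypothetical helper)."""
    if kappa > 0.80:
        return "excellent agreement"
    if kappa > 0.60:
        return "substantial agreement"
    if kappa >= 0.40:
        return "moderate agreement"
    return "poor to fair agreement"


# Hypothetical diagnoses assigned to ten patients by two therapists.
therapist_a = ["RA"] * 6 + ["OA"] * 4
therapist_b = ["RA"] * 5 + ["OA", "RA"] + ["OA"] * 3

kappa = cohen_kappa(therapist_a, therapist_b)
print(f"kappa = {kappa:.2f}: {interpret_kappa(kappa)}")
```

In this example the therapists agree on 8 of 10 patients (80%), yet kappa is lower because about half of that agreement would be expected by chance alone.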

Standard Error of Measurement (SEM)
Test-retest reliability of pain-free grip strength for one trial left and right hands (Kennedy D 2008)
Measure | ICC 2,1 | SEM (Kg)
One grip left hand | 0.96 | 0.8
One grip right hand | 0.92 | 1.2
[Figure: grip strength scale centred on the mean grip, marked 7.6, 8.4, 9.2, 10, 10.8, 11.6, 12.4]
68% chance grip is +/- 1 SEM or +/- 0.8
95% chance grip is +/- 2 SEM or +/- 1.6

Reliability
• Reliability estimates, standard errors reported?
• Are methods of collecting reliability data clear?
• Might reliability estimates or standard errors of measurement differ substantially for various populations?
• Rationale for time elapsed between tests in study design to ensure changes in health status were minimal?
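
The ICC 2,1 and SEM values reported for grip strength can be related through the commonly used formula SEM = SD × √(1 − ICC). The sketch below assumes the two-way random-effects, absolute-agreement, single-measurement form of ICC(2,1). The trial data and the standard deviation of 4.0 Kg are hypothetical. Only the ICC of 0.96, the SEM of 0.8 Kg and the central axis value of 10 come from the slide.

```python
import math
from statistics import mean


def icc_2_1(scores):
    """ICC(2,1) from a table with one row per subject, one column per trial."""
    n, k = len(scores), len(scores[0])
    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]
    # Two-way ANOVA sums of squares: subjects, trials, and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def standard_error_of_measurement(sd, icc):
    """SEM = SD * sqrt(1 - ICC)."""
    return sd * math.sqrt(1 - icc)


def band(score, sem, multiplier):
    """Range of score +/- multiplier * SEM (1 for about 68%, 2 for about 95%)."""
    return score - multiplier * sem, score + multiplier * sem


# Hypothetical test-retest grip strength (Kg): one row per person, two sessions.
trials = [[10.0, 10.5], [14.0, 13.5], [8.0, 8.5], [12.0, 12.5], [16.0, 15.0]]
print(f"ICC(2,1) = {icc_2_1(trials):.2f}")

# Hypothetical SD of 4.0 Kg combined with the ICC of 0.96 from the slide.
sem = standard_error_of_measurement(sd=4.0, icc=0.96)
low68, high68 = band(10.0, sem, 1)
low95, high95 = band(10.0, sem, 2)
print(f"SEM = {sem:.1f} Kg")
print(f"68% band: {low68:.1f} to {high68:.1f} Kg")
print(f"95% band: {low95:.1f} to {high95:.1f} Kg")
```

With a central value of 10 Kg and an SEM of 0.8 Kg, the bands run from 9.2 to 10.8 Kg and from 8.4 to 11.6 Kg, which are the values marked on the slide's scale.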

Validity
• Face validity- (weakest form) indicates a tool appears to test what it is supposed to test
• Content validity - indicates that the items in a tool adequately sample the content that defines the variable being measured
• Construct validity- ability to measure an abstract concept
• Criterion- related validity- (most practical and most objective) indicates that the outcomes of one tool, the tool being assessed, can be used as a substitute measure for a gold standard
• (Portney and Watkins 2000, pg 82)

Criterion-related and predictive validity
• Statistics -Spearman’s rank or Pearson’s correlation
• Score 0 to 1.0 - scores closer to 1 have higher correlation.
[Figure: scale running from 0 to 1.0]
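
Criterion-related validity is usually reported as a correlation between the tool being assessed and the gold standard. The sketch below shows both statistics named on the slide. The scores are hypothetical, and `ranks` is an illustrative helper that gives tied values the average of their rank positions.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values starting at i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            result[idx] = average_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Hypothetical scores for eight patients on a new tool and a gold standard.
new_tool = [12, 18, 25, 31, 40, 44, 52, 60]
gold_standard = [15, 20, 22, 35, 38, 50, 49, 65]

print(f"Pearson r = {pearson_r(new_tool, gold_standard):.2f}")
print(f"Spearman rho = {spearman_rho(new_tool, gold_standard):.2f}")
```

Spearman's rank correlation suits ordinal scores because it depends only on rank order, while Pearson's correlation assumes interval or ratio data.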

Validity
• Clear description of methods to collect validity data?
• Is validation sample described in enough detail (gender, age, ethnicity, and language)?
• Is there reason to believe validity will differ substantially for various populations?
• Is evidence of content validity presented?
• Is evidence of construct validity presented for each proposed use?
• Are criterion validity data presented with a clear rationale and support for the choice of criteria measure?

Responsiveness
• The ability to detect change over time
• If testing effectiveness, then score must change in proportion to the patient’s status change, and remain the same when the patient has not changed
• For research - the change must be large enough to be statistically significant
• For clinical purposes- the change must be precise enough to show increments of meaningful change
(Portney and Watkins 2000)

Analysis of Responsiveness
• Independent samples t-test – compares the mean scores of two different groups of people or conditions
• Paired-samples t-test- compares mean scores for the same group of people on two different occasions
• Analysis of variance- used with 3 or more conditions or groups
(Pallant 2005)

Effect Size
• T-test tells us if the difference between groups is statistically significant
• Effect size indicates the relative magnitude of the differences between the means
• Interpretation:
< .4 – small
.5 moderate
.8 large
(Cohen 1988)
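
A paired-samples t-test and an effect size can both be computed from the same before and after scores. The sketch below assumes Cohen's d with a pooled standard deviation as the effect size, since the slides do not name a specific formula. All scores are hypothetical, and the t statistic still has to be compared with a t distribution (for example in SPSS or a t table) to obtain a p value.

```python
import math
from statistics import mean, stdev, variance


def paired_t(before, after):
    """Paired-samples t statistic and degrees of freedom for after - before."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    # Mean change divided by the standard error of the mean change.
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


def cohens_d(group_a, group_b):
    """Effect size: difference between means in pooled standard deviation units."""
    na, nb = len(group_a), len(group_b)
    pooled_variance = (
        (na - 1) * variance(group_a) + (nb - 1) * variance(group_b)
    ) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_variance)


# Hypothetical hand function scores for five patients before and after therapy.
before = [10, 12, 9, 11, 13]
after = [12, 15, 10, 14, 15]

t, df = paired_t(before, after)
print(f"paired t = {t:.2f} with {df} degrees of freedom")

# Hypothetical scores for two independent groups (treated vs not treated).
treated = [2, 4, 6]
untreated = [1, 3, 5]
print(f"effect size d = {cohens_d(treated, untreated):.2f}")
```

The t statistic indicates whether a change is likely to be real, while the effect size describes how large it is relative to the spread of scores, so the two are best reported together.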

Responsiveness
• Is information provided on change scores?
• Is effect size reported with information on methods used in calculation?
• Are responsiveness claims derived from longitudinal data?
• Is the population being tested clearly identified?

Interpretability
• Is information provided on the relationship of scores to clinically recognised conditions or need for specific treatments?
• Is information provided on the relationship of scores or changes in scores to commonly recognised life events?
• Is information provided on how well scores predict known relevant events?

Respondent Burden
• Does the instrument place undue strain on the respondent?
• Information provided on time needed to complete the instrument?
• Information provided about the reading level assumed?
• Information provided about special requirements or requests placed on subjects?
• Information provided on the acceptability of the instrument?

Administrative Burden
• Information provided on the amount of training/education/expertise needed by staff to administer, score or use instrument?
• Information provided about any resources required for administration of instrument, such as computer hardware?

What do we do now?
Ask yourself……..
Can you demonstrate that your treatment is causing a change?
Can you demonstrate to others that your treatment has resulted in clinically important change?

Next Steps
Identify and implement outcome measures….
• in your setting
• Locally
• Nationally
• Internationally

• Andresen EM (2000) “Criteria for Assessing the Tools of Disability Outcomes Research”,
Archives of Physical Medicine and Rehabilitation, 81:2, S15-S20.
• Brettle A, Grant MJ (2003) Finding Evidence for Practice: a workbook for health professionals.
Edinburgh: Churchill Livingstone.
• Cohen J (1988) Statistical Power Analysis for the Behavioral Sciences, 2nd ed. Lawrence
Erlbaum Associates, Hillsdale, NJ.
• Jerosch-Herold C (2005) “An Evidence-Based Approach to Choosing Outcome Measures: a
Checklist for the Critical Appraisal of Validity, Reliability and Responsiveness Studies”, British
Journal of Occupational Therapy, 68:8, 347-353.
• Kendall N (1997) Developing outcome assessments: a step by step approach New Zealand
Journal of Physiotherapy Dec, 11 - 17
• Landis JR, Koch GG (1977) “The measurement of observer agreement for categorical data”,
Biometrics, 33: 159-74.
• Lohr KN, Aaronson NK, Alonso J, Burnam MA, Patrick DL (1996) “Evaluating Quality-of-Life
and Health Status Instruments: Development of Scientific Review Criteria”, Clinical
Therapeutics, 18:5, 979-992.
• MacDermid J (2002) “Outcome Measurement in the Upper Extremity” in Rehabilitation of the
Hand and Upper Extremity, 5th edition, Mosby, St Louis.
• Oxford Centre for Evidence-Based Medicine (2009) Focusing clinical questions.
www.cebm.net/index.aspx?o=1036
• Pallant J (2005) SPSS Survival Manual, 2nd ed. Open University Press, Berkshire.
• Portney LG, Watkins MP (2000) Foundations of Clinical Research, Prentice Hall Health, New
Jersey.
• World Health Organisation (2002) “Towards a Common Language for Functioning, Disability
and Health: ICF”, Geneva, http://www.who.int/classification/icf