SlideShare a Scribd company logo
1 of 5
Download to read offline
Authors:
Jan E. Lexell, MD, PhD
David Y. Downham, PhD                                                                                   Statistics

Affiliations:
From the Department of
Rehabilitation, Lund University
Hospital, Lund, Sweden (JEL); the         INVITED REVIEW
Department of Health Sciences, Lund
University, Malmo, Sweden (JEL); the
                 ¨
Department of Health Sciences,
     ¨
Lulea University of Technology,
Boden, Sweden (JEL); and the              How to Assess the Reliability of
Department of Mathematical
Sciences, University of Liverpool,
Liverpool, UK (DYD).
                                          Measurements in Rehabilitation
Correspondence:                           ABSTRACT
All correspondence and requests for
reprints should be addressed to Jan       Lexell JE, Downham DY: How to assess the reliability of measurements in
E. Lexell, MD, PhD, Department of         rehabilitation. Am J Phys Med Rehabil 2005;84:719 –723.
Rehabilitation, Lund University
Hospital, Orupssjukhuset, 22185           To evaluate the effects of rehabilitation interventions, we need reliable measure-
Lund, Sweden.                             ments. The measurements should also be sufficiently sensitive to enable the
                                          detection of clinically important changes. In recent years, the assessment of
Disclosures:                              reliability in clinical practice and medical research has developed from the use of
Supported, in part, by grants from        correlation coefficients to a comprehensive set of statistical methods. In this
the Gun and Bertil Stohne                 review, we present methods that can be used to assess reliability and describe
Foundation, Magn. Bergvall
Foundation, the Crafoord Foundation,
                                          how data from reliability analyses can aid the interpretation of results from
the Swedish Society of Medicine, the      rehabilitation interventions.
Swedish Stroke Association, the
                                          Key Words:     Reference Values, Reproducibility of Results, Research Design, Statistics
Swedish Association of Neurologically
Disabled (NHR), the Council for
Medical Health Care Research in
South Sweden, Lund University
                        ¨
Hospital, and Region Skane.

0894-9115/05/8409-0719/0
                                          I n rehabilitation, measurements are obtained for clinical practice and research
                                          purposes. To be clinically and scientifically useful, measurements must be
American Journal of Physical
Medicine & Rehabilitation
                                          reliable. Reliability refers to the reproducibility of measurements. Measure-
Copyright © 2005 by Lippincott            ments are reliable if they are stable over time and show adequate levels of
Williams & Wilkins                        measurement variability.1 They must also be sufficiently sensitive to detect
                                          clinically important changes after rehabilitation interventions. The more sen-
DOI: 10.1097/01.phm.0000176452.17771.20   sitive your measurement, the easier it is to detect improvements after inter-
                                          ventions or deterioration over time.
                                               Reliability in clinical practice and medical research is most commonly
                                          determined from measurements of the same subjects on two occasions: so-
                                          called retest reliability.2 By applying well-defined statistical methods, physia-
                                          trists and clinical scientists can determine whether measurements are suffi-
                                          ciently reliable for a particular purpose. In recent years, there has been a
                                          growing interest in the statistical methods for the assessment of reliability in
                                          clinical practice and medical research. Today, there is general agreement that a
                                          comprehensive set of statistical methods is required to address the reliability of
                                          measurements.3– 8
                                               In this review, we present the statistical methods that are part of the
                                          current approach to the assessment of retest reliability based on continuous
                                          data and illustrate them using measurements of isokinetic muscle strength. We
                                          also describe how data from reliability analyses can aid the interpretation of
                                          results from rehabilitation interventions and the detection of clinically impor-
                                          tant changes. In the Appendix, we present the equations that are used to
                                          calculate reliability. This provides physiatrists and clinical scientists with a

September 2005                                                  Reliability of Measurements in Rehabilitation              719
source of reference for the determination of the        ple, it can be used with small samples and with data
            characteristics of measurements and how to inter-       from more than two test occasions. There are dif-
            pret data obtained from rehabilitation interven-        ferent types of ICCs available for different study
            tions.                                                  designs, but in practice, their values are often very
                                                                    similar.6,10 The ICC2,1 (Equation 1, Appendix),
            RETEST CORRELATION COEFFICIENTS                         which is calculated from a two-way repeated-mea-
                 The most common approach in the assessment         sures analysis of variance, covers most situations
            of retest reliability is to measure a group of sub-     and also has the advantage that it provides the basis
            jects on two occasions, separated by hours or days.     for the calculations of some of the other reliability
            This is done on the same type of subjects or pa-        indices. (Reliability can also be determined for or-
            tients who one wants to use in an intervention.         dinal data obtained from, for example, clinical rat-
            Once the data are obtained, the first step is to plot    ing scales. The equivalent to a correlation coeffi-
            the data. In Figure 1, data on concentric ankle         cient is then the kappa coefficient.11)
            dorsiflexor muscle strength at 30 degrees/sec ob-             The data plotted in Figure 1 were analyzed
            tained from 30 healthy young men and women at           using a two-way analysis of variance, and the values
            two test occasions 7 days apart are presented. The      from the analysis of variance table were substituted
            data come from a larger study in which we evalu-        into Equation 1. The value of ICC2,1 is 0.915, which
            ated the reliability of peak torque, work, and torque   is very close to the Pearson’s r. This is often the
            at a specific time at different angular velocities       case when measurements of the same subjects on
            using the Biodex dynamometer.5                          two occasions are analyzed.5
                 Clearly, the closer the values are to a straight        How do we then interpret the values of ICC? As
            line, the better is the reliability. The usual Pear-    a matter of fact, no generally agreed ICC “cut-off”
            son’s correlation coefficient (Pearson’s r) can be       points exist. In their original publication from
            used to quantify the reliability. The value of Pear-    1979, Shrout and Fleiss9 suggested that specific
            son’s r is here 0.91; as a value of 1.00 represents     values of ICC could be considered to represent
            perfect agreement and a value of 0.00 no agree-         “acceptable,” “good,” and “fair” reliability. Fleiss12
            ment, we can already at this stage presume that our     later recommended that ICC values of Ͼ0.75 rep-
            measurements are highly reliable.                       resent “excellent reliability” and values between 0.4
                 The intraclass correlation coefficient (ICC) is     and 0.75 represent “fair to good reliability.”
            nowadays the preferred retest correlation coeffi-             It is becoming clear that the use of only ICCs
            cient.9 The ICC has several advantages: for exam-       for the analyses of reliability is not sufficient.3,4,7,8
                                                                    The ICC can give misleading results—for example,
                                                                    if the sample is homogeneous, the value of ICC
                                                                    (and Pearson’s r) may be low. We should therefore
                                                                    avoid assessing reliability just from retest correla-
                                                                    tions and also avoid using global terms, such as
                                                                    “good reliability,” and instead interpret reliability
                                                                    from several statistical methods.

                                                                    CHANGES IN THE MEAN
                                                                         The next step in the reliability analysis is to
                                                                    calculate changes in the mean from the measure-
                                                                    ments obtained from the two test occasions. The
                                                                    change in the mean values between two test occa-
                                                                    sions can consist of two components: a random
                                                                    change and a systematic change. A random change
                                                                    is often referred to as the “sampling error” and
                                                                    comes from the variability in the actual test situa-
                                                                    tion. It can be due to the variability in the equip-
                                                                    ment or method used and to the inherent biolog-
                                                                    ical variability. A systematic change is a
                                                                    nonrandom change in the mean values between
                                                                    the two test occasions. This occurs if the subject or
            FIGURE 1 To illustrate the analysis of reliability,
                         measurements of concentric ankle dorsi-    patient systematically performs better (or worse)
                         flexor muscle strength at 30 degrees/sec    on the second test occasion as a result of, for
                         are used. The data were obtained from 30   example, a change in behavior, a learning effect, or
                         healthy young men and women at two         fatigue if performance is measured.
                         test occasions 7 days apart.5                   To detect a systematic change, several indices

720   Lexell and Downham                                                   Am. J. Phys. Med. Rehabil.     ●   Vol. 84, No. 9
can be calculated (Equations 2– 4). Based on these
calculations, a 95% confidence interval for the
mean difference between the two test occasions can
be formed (Equation 5). This will allow you to
determine if there is a true systematic difference
between the two test occasions. If the mean value is
positive or negative, the measurements from one
test occasion tend to be larger or smaller than
those from the other occasion. When the 95%
confidence interval does not include zero, this in-
dicates a systematic change in the mean, for exam-
ple, due to a significant learning effect. If this
happens, one should choose tests or equipment
that minimize this learning effect or let the sub-
jects familiarize themselves before the real trials.
This is less of a problem in a controlled study if
both groups are equally affected by systematic
                                                        FIGURE 2 Bland–Altman plot for the data in Figure
changes in the mean.
                                                                     1, with the approximate 95% confidence
     Using the data from Figure 1, the indices rep-                  interval (dashed line) of the mean differ-
resenting the changes in the mean are calculated                     ence (solid line) between the two test
from Equations 2–5 and are presented in Table 1.                     occasions.
The mean difference between the two test occa-
sions is here positive (test occasion 2 minus 1),
indicating that the isokinetic muscle strength at
the second test occasion tended to be larger than at    confidence interval, and that there are no apparent
the first occasion. However, as zero is well within      systematic biases or outliers in the data.
the 95% confidence interval, there is no significant
change in the mean muscle strength between the
                                                        MEASUREMENT VARIABILITY
two test occasions, and we can therefore conclude            Once we have established if there are any sys-
that there is no systematic change between the two      tematic changes in the mean, we want to quantify
test occasions.                                         the actual size of the variability between the mea-
     The next step is then to present the data graph-   surements obtained from the two test occasions.
ically in so-called Bland–Altman plots.2 In these       This is often referred to as the “within-subject
plots, the differences between measurements from        variation,” “typical error,” or “typical variation”7
                                                        and is one of the more important reliability indices.
the two test occasions are plotted against the mean
                                                        Quite naturally, a change after any intervention
of the two test occasions for each subject, and any
                                                        will be easier to detect if we have established that
systematic bias or outliers can be seen. The Bland–
                                                        the variability between measurements from two
Altman plot for the data in Figure 1 is presented in
                                                        test occasions is small.3,7
Figure 2; the mean difference in muscle strength
                                                             A simple index of the measurement variability
between the two test occasions (test occasion 2
                                                        is the standard deviation of the differences between
minus 1) and the 95% confidence interval are also
                                                        the two test occasions (SDdiff) (Equation 3). Divid-
included. It is clearly seen that the mean difference
                                                        ing the SDdiff by ͌2 yields the “method error”
is close to zero, that zero is included in the 95%
                                                        (ME) (Equation 6).13 An alternative index is the
                                                        “standard error of measurement” (SEM). There are
                                                        different ways to calculate SEM; a preferred way is
 TABLE 1. Indices of changes in the mean                to take the square root of the mean square error
          between two test occasions                    term (WMS) from the analysis of variance (Equa-
                                                        tion 7). If the sample size is sufficiently large and
 Mean difference between two          0.03              the mean difference small, both highly likely con-
                   ៮
   test occasions (d)                                   ditions, ME and SEM take similar values. These
 Standard deviation of the            2.88              values are, on their own, not easily interpreted
   differences between two
                                                        because we do not know how much of the variabil-
   test occasions (SDdiff)
                                                        ity comes from the change in the mean and how
 Standard error of ៮ (SE)
                   d                  0.53
 95% confidence interval               Ϫ1.04 to 1.10     much is due to the typical variation.7
                                                             This can be overcome by expressing the mea-
   of ៮ (95% CI)
      d
                                                        surement variability as a coefficient of variation.
                                                        The ME or the SEM is then divided by the mean of

September 2005                                                   Reliability of Measurements in Rehabilitation    721
all the measurements and multiplied by 100 to give       ence range or “error band”14 around the mean
            a percentage value. These indices—the CV%                difference of the measurements from the two test
            (Equation 8) and the SEM% (Equation 9)—are               occasions, a 95% SRD can be calculated (Equation
            independent of the units of measurements and are         11). The SRD is then divided by the mean of all the
            therefore more easily interpreted. The CV% and           measurements and multiplied by 100 to give a
            SEM% indicate the typical variation expressed as a       percentage value, the SRD% (Equation 12).15 The
            percentage value and can be used to interpret the        SRD%, like the SEM% and CV%, is independent of
            results of an improvement after an intervention.         the units of measurements and therefore more
                 Substituting the data from Figure 1 into Equa-      easily interpreted.
            tions 6 –9, we can calculate the values of ME, SEM,           To form practically useful 95% SRD and
            CV%, and SEM%. As can be seen in Table 2, ME             SRD%, a fairly large sample size is required. This
            and SEM and CV% and SEM% are very similar. We            brings us to the question of how many subjects or
            can then determine that with a CV% of 6.36 and a         patients you need to determine the reliability of a
            mean isokinetic muscle strength of 31.9 Nm (based
                                                                     measurement. In comparison with clinical trials,
            on the data in Fig. 1), the average person’s muscle
                                                                     sample sizes have received little attention in the
            strength has a typical variation from one test oc-
                                                                     reliability literature. Some scientists have recom-
            casion to the other of 2.03 Nm. An improvement
                                                                     mended specific sample sizes, but these recom-
            after an intervention that is smaller than the typ-
                                                                     mendations are usually based on practical experi-
            ical variation does not, in most situations, indicate
            a clinically important improvement.                      ence. A sample size of 15–20 is often used in
                                                                     reliability studies with continuous data.12 Larger
            CLINICALLY IMPORTANT CHANGES                             sample sizes, 30 –50, have been suggested more
                 To evaluate more specifically if a measurement       recently7 and would be required to form practically
            represents a clinically important change, we can         useful 95% SRD and SRD%.
            use the data from the two test occasions to calcu-            The values of SRD, 95% SRD, and SRD% are
            late an interval, which represents the 95% likely        then calculated from Equations 10 –12 using the
            range for the difference between a subject’s mea-        data from Figure 1 and are presented in Table 3.
            surements from these two test occasions. This is a       The SRD% is 18.1, which indicates that a measure-
            form of “reference range” that can be used to            ment should exceed that value to indicate a clini-
            determine if differences between measurements            cally important change. Taking the mean value of
            subsequently obtained before and after “real” inter-     the isokinetic muscle strength in Figure 1 (31.9
            ventions represent clinically important changes. If      Nm), this has to change 5.7 Nm to indicate a
            the difference for a subject or patient is outside (or   clinically important improvement in strength.
            within) this reference range, it does (or does not)           Having done all this, one should remember
            represent a clinically important change. It is clear     that there is an essential difference between the
            that the smaller the reference range, the more           reliability indices described here and clinically im-
            sensitive are the measurements, and also that the        portant changes. The reliability indices describe
            measurements can be considered highly reliable,          the clinometric property of a measurement,
            but the reference range is too wide to be clinically     whereas clinically important changes are more ar-
            or scientifically useful.                                 bitrarily chosen values that physiatrists and clinical
                 The “smallest real difference” (SRD), intro-        scientists judge as minimally (and clinically) im-
            duced by Beckerman et al.14 is one way to evaluate       portant. An interesting area for future research is
            clinically important changes. The SRD is similar to      to explore the clinometric property of measure-
            the “limits of agreement” proposed by Bland and          ments and how that corresponds to what physia-
            Altman.4 The SRD (Equation 10) is formed by tak-         trists and clinical scientists judge as clinically im-
            ing the measurement variability, represented by
                                                                     portant. Such research will help us to define
            the SEM, and multiplying it by ͌2 and by 1.96 to
                                                                     optimal outcome measures for clinical practice and
            include 95% of the observations of the difference
                                                                     research purposes.
            between the two measurements. To obtain a refer-


             TABLE 2. Indices of measurement variability              TABLE 3. Indices of clinically important
                                                                               changes
             Method error (ME)                              2.03
             Standard error of the measurement (SEM)        2.01      Smallest real difference (SRD)           5.81
             Coefficient of variation (CV%)                  6.36      Smallest real difference (95% SRD)       Ϫ5.78 to 5.84
             Standard error of the measurement (SEM%)       6.30      Smallest real difference (SRD%)          18.1%



722   Lexell and Downham                                                    Am. J. Phys. Med. Rehabil.     ●   Vol. 84, No. 9
APPENDIX                                                    The SEM% is defined by
Retest Correlation Coefficients                             SEM% ϭ (SEM/mean) ϫ 100                                   [9]
     The intraclass correlation coefficient (ICC2,1)
for n subjects and for two test occasions is defined    where mean in Equations 8 and 9 is the mean of all
by                                                     the data from the two test occasions.
     ICC2,1 ϭ (BMS Ϫ EMS)/
                                                       CLINICALLY IMPORTANT CHANGES
    (BMS ϩ EMS ϩ 2[JMS Ϫ EMS]/n)                [1]         The smallest real difference (SRD) is defined by
where BMS is the between-subjects mean square,
                                                            SRD ϭ 1.96 ϫ SEM ϫ ͌2                                   [10]
EMS is the residual mean square, JMS is the with-
in-subjects mean square, all obtained from the              The 95% SRD is defined by
two-way analysis of variance, and n is the number
of subjects.                                                          ៮
                                                            95% SRD ϭ d Ϯ SRD                                      [11]

CHANGES IN THE MEAN                                         The SRD% is defined by
     To evaluate changes in the mean between two            SRD% ϭ (SRD/mean) ϫ 100                                 [12]
test occasions, ៮ is defined by
                d
                                                       where mean in Equation 12 is the mean of all the
    ៮
    d ϭ the mean difference between the two test       data from the two test occasions.
occasions                                    [2]
                                                       REFERENCES
                        ៮ is assessed by
    The variation about d                               1. Rothstein JM: Measurement and clinical practise: Theory
                                                           and application, in Rothstein JM (ed): Measurement in
    SDdiffϭ                                                Physical Therapy. New York, Churchill Livingstone, 1985
    the standard deviation of the differences    [3]    2. Bland JM, Altman DG: Statistical methods for assessing
                                                           agreement between two methods of clinical measurement.
    The variation of ៮ is assessed by the standard
                     d                                     Lancet 1986;1:307–10
error of the mean (SE)                                  3. Atkinson G, Nevill AM: Statistical methods for assessing
                                                           measurement error (reliability) in variables relevant to
    SE ϭ SDdiff/͌n                               [4]       sports medicine. Sports Med 1998;26:217–38
                                                        4. Bland JM, Altman DG: Measuring agreement in method
where n is the number of subjects. SDdiff represents       comparison studies. Stat Methods Med Res 1999;8:135–60
the variability of the differences between the two      5. Holmback AM, Porter MM, Downham DY, et al: Reliability
                                                                   ¨
test occasions and SE represents the precision of ៮d       of isokinetic ankle dorsiflexor strength measurements in
                                                           healthy young men and women. Scand J Rehabil Med
as an estimate of the underlying change in the             1999;31:229–39
mean. An approximate 95% confidence interval             6. Holmback AM, Porter MM, Downham DY, et al: Ankle
                                                                   ¨
                                    ៮
(95% CI) of the mean difference (d) between the            dorsiflexor muscle performance in healthy young men and
                                                           women: Reliability of eccentric peak torque and work.
two test occasions is defined by                            J Rehabil Med 2001;33:90–6
    95% CI ϭ ៮ Ϯ 2 ϫ SE
             d                                   [5]    7. Hopkins WG: Measures of reliability in sports medicine and
                                                           science. Sports Med 2000;30:1–15
    The multiplier of SE in Equation 5 depends on       8. Rankin G, Stokes M: Reliability of assessment tools in
the number of subjects, n; 2 is a good approxima-          rehabilitation: An illustration of appropriate statistical anal-
                                                           yses. Clin Rehabil 1998;12:187–99
tion when n is Ͼ20. For the data in Figure 1, it
                                                        9. Shrout PE, Fleiss JL: Intraclass correlations: Uses in assess-
should be 2.045, which is obtained from the t-table        ing rater reliability. Psychol Bull 1979;86:420–8
with 29 (n Ϫ 1) degrees of freedom.                    10. Baumgartner TA: Norm-referenced measurement: Reliabil-
                                                           ity, in Safrit M, Wood T (eds): Measurement Concepts in
MEASUREMENT VARIABILITY                                    Physical Education and Exercise Science. Champaign, IL,
                                                           Human Kinetics, 1989, pp 45–72
    The method error (ME) is defined by                 11. Sim J, Wright CC: The kappa statistic in reliability studies:
    ME ϭ SDdiff/͌2                               [6]       Use, interpretation, and sample size requirements. Phys
                                                           Ther 2005;85:257–68
where SDdiff comes from Equation 3.                    12. Fleiss JL: The Design and Analysis of Clinical Experiments.
    The standard error of measurement (SEM) is             New York, John Wiley Sons, 1986, pp 1–31
defined by                                              13. Portney LG, Watkins MP: Statistical measures of reliability,
                                                           in: Foundations of Clinical Research: Applications to Prac-
    SEM ϭ ͌WMS                                  [7]        tice . Englewood Cliffs, NJ, Prentice Hall, 1993, pp 525–6
                                                       14. Beckerman H, Roebroeck ME, Lankhorst GJ, et al: Smallest
where WMS is the mean square error term from               real difference: A link between reproducibility and respon-
the analysis of variance.                                  siveness. Qual Life Res 2001;10:571–8
    The coefficient of variation (CV%) is defined by     15. Flansbjer UB, Holmback AM, Downham DY, et al: Reliability
                                                                                  ¨
                                                           of gait performance test in men and women with hemipa-
    CV% ϭ (ME/mean) ϫ 100                       [8]        resis after stroke. J Rehabil Med 2005;37:75–82



September 2005                                                     Reliability of Measurements in Rehabilitation              723

More Related Content

What's hot

Measurement of variables IN RESEARCH
Measurement of variables IN RESEARCHMeasurement of variables IN RESEARCH
Measurement of variables IN RESEARCHVinold John
 
East bayesian power calculations
East bayesian power calculationsEast bayesian power calculations
East bayesian power calculationsCytel
 
Measures and Feedback (miller & schuckard, 2014)
Measures and Feedback (miller & schuckard, 2014)Measures and Feedback (miller & schuckard, 2014)
Measures and Feedback (miller & schuckard, 2014)Scott Miller
 
Research Methodology
Research MethodologyResearch Methodology
Research MethodologyAneel Raza
 
Reese Norsworthy Rowlands
Reese Norsworthy RowlandsReese Norsworthy Rowlands
Reese Norsworthy RowlandsBarry Duncan
 
Measures and feedback 2016
Measures and feedback 2016Measures and feedback 2016
Measures and feedback 2016Scott Miller
 
Conceptualization, operationalization and measurement
Conceptualization, operationalization and measurementConceptualization, operationalization and measurement
Conceptualization, operationalization and measurementKatherine Bautista
 
Educ 243 final report pepito
Educ 243 final report pepitoEduc 243 final report pepito
Educ 243 final report pepitodeped
 
Anderson%20and%20 gerbing%201988
Anderson%20and%20 gerbing%201988Anderson%20and%20 gerbing%201988
Anderson%20and%20 gerbing%201988piyushsinghal2003
 
Webinar slides- alternatives to the p-value and power
Webinar slides- alternatives to the p-value and power Webinar slides- alternatives to the p-value and power
Webinar slides- alternatives to the p-value and power nQuery
 
ReeseUsherBowmanetetal
ReeseUsherBowmanetetalReeseUsherBowmanetetal
ReeseUsherBowmanetetalBarry Duncan
 
PCOMS ICCE SAMHSA Review
PCOMS ICCE SAMHSA ReviewPCOMS ICCE SAMHSA Review
PCOMS ICCE SAMHSA ReviewScott Miller
 
Kinds Of Variables Kato Begum
Kinds Of Variables Kato BegumKinds Of Variables Kato Begum
Kinds Of Variables Kato BegumDr. Cupid Lucid
 
Designing studies with recurrent events | Model choices, pitfalls and group s...
Designing studies with recurrent events | Model choices, pitfalls and group s...Designing studies with recurrent events | Model choices, pitfalls and group s...
Designing studies with recurrent events | Model choices, pitfalls and group s...nQuery
 
Goal attainment scaling
Goal attainment scalingGoal attainment scaling
Goal attainment scalingAntonio Moset
 

What's hot (19)

Ch # 6 brm
Ch # 6 brmCh # 6 brm
Ch # 6 brm
 
Measurement of variables IN RESEARCH
Measurement of variables IN RESEARCHMeasurement of variables IN RESEARCH
Measurement of variables IN RESEARCH
 
East bayesian power calculations
East bayesian power calculationsEast bayesian power calculations
East bayesian power calculations
 
Measures and Feedback (miller & schuckard, 2014)
Measures and Feedback (miller & schuckard, 2014)Measures and Feedback (miller & schuckard, 2014)
Measures and Feedback (miller & schuckard, 2014)
 
Research Methodology
Research MethodologyResearch Methodology
Research Methodology
 
Reese Norsworthy Rowlands
Reese Norsworthy RowlandsReese Norsworthy Rowlands
Reese Norsworthy Rowlands
 
Measures and feedback 2016
Measures and feedback 2016Measures and feedback 2016
Measures and feedback 2016
 
Conceptualization, operationalization and measurement
Conceptualization, operationalization and measurementConceptualization, operationalization and measurement
Conceptualization, operationalization and measurement
 
Educ 243 final report pepito
Educ 243 final report pepitoEduc 243 final report pepito
Educ 243 final report pepito
 
Anderson%20and%20 gerbing%201988
Anderson%20and%20 gerbing%201988Anderson%20and%20 gerbing%201988
Anderson%20and%20 gerbing%201988
 
Logit Model Artifact
Logit Model ArtifactLogit Model Artifact
Logit Model Artifact
 
Webinar slides- alternatives to the p-value and power
Webinar slides- alternatives to the p-value and power Webinar slides- alternatives to the p-value and power
Webinar slides- alternatives to the p-value and power
 
ReeseUsherBowmanetetal
ReeseUsherBowmanetetalReeseUsherBowmanetetal
ReeseUsherBowmanetetal
 
Reeseetal2013
Reeseetal2013Reeseetal2013
Reeseetal2013
 
PCOMS ICCE SAMHSA Review
PCOMS ICCE SAMHSA ReviewPCOMS ICCE SAMHSA Review
PCOMS ICCE SAMHSA Review
 
NCME 2015 Hurtz Brown
NCME 2015 Hurtz BrownNCME 2015 Hurtz Brown
NCME 2015 Hurtz Brown
 
Kinds Of Variables Kato Begum
Kinds Of Variables Kato BegumKinds Of Variables Kato Begum
Kinds Of Variables Kato Begum
 
Designing studies with recurrent events | Model choices, pitfalls and group s...
Designing studies with recurrent events | Model choices, pitfalls and group s...Designing studies with recurrent events | Model choices, pitfalls and group s...
Designing studies with recurrent events | Model choices, pitfalls and group s...
 
Goal attainment scaling
Goal attainment scalingGoal attainment scaling
Goal attainment scaling
 

Similar to Assessing Reliability in Rehabilitation Measurements

Enhancing The Reliability of Physician Performance Measures
Enhancing The Reliability of Physician Performance MeasuresEnhancing The Reliability of Physician Performance Measures
Enhancing The Reliability of Physician Performance MeasuresRobert Sutter
 
RESEARCH Open AccessA methodological review of resilience.docx
RESEARCH Open AccessA methodological review of resilience.docxRESEARCH Open AccessA methodological review of resilience.docx
RESEARCH Open AccessA methodological review of resilience.docxverad6
 
How to establish and evaluate clinical prediction models - Statswork
How to establish and evaluate clinical prediction models - StatsworkHow to establish and evaluate clinical prediction models - Statswork
How to establish and evaluate clinical prediction models - StatsworkStats Statswork
 
Effective strategies to monitor clinical risks using biostatistics - Pubrica.pdf
Effective strategies to monitor clinical risks using biostatistics - Pubrica.pdfEffective strategies to monitor clinical risks using biostatistics - Pubrica.pdf
Effective strategies to monitor clinical risks using biostatistics - Pubrica.pdfPubrica
 
Care expert assistant for Medicare system using Machine learning
Care expert assistant for Medicare system using Machine learningCare expert assistant for Medicare system using Machine learning
Care expert assistant for Medicare system using Machine learningIRJET Journal
 
Application of EQ-5D in Reimbursement Decision Making: The Case of NICE
Application of EQ-5D in Reimbursement Decision Making: The Case of NICEApplication of EQ-5D in Reimbursement Decision Making: The Case of NICE
Application of EQ-5D in Reimbursement Decision Making: The Case of NICEOffice of Health Economics
 
[Typ]Poster[Sbj]1593Synoptics[Dte]20150906
[Typ]Poster[Sbj]1593Synoptics[Dte]20150906[Typ]Poster[Sbj]1593Synoptics[Dte]20150906
[Typ]Poster[Sbj]1593Synoptics[Dte]20150906Mark Gusack
 
Valid and Reliable ToolsThe goal of an evaluation is to determin.docx
Valid and Reliable ToolsThe goal of an evaluation is to determin.docxValid and Reliable ToolsThe goal of an evaluation is to determin.docx
Valid and Reliable ToolsThe goal of an evaluation is to determin.docxnealwaters20034
 
11.forecast analysis of food price inflation in pakistan
11.forecast analysis of food price inflation in pakistan11.forecast analysis of food price inflation in pakistan
11.forecast analysis of food price inflation in pakistanAlexander Decker
 
Forecast analysis of food price inflation in pakistan
Forecast analysis of food price inflation in pakistanForecast analysis of food price inflation in pakistan
Forecast analysis of food price inflation in pakistanAlexander Decker
 
Automated Extraction Of Reported Statistical Analyses Towards A Logical Repr...
Automated Extraction Of Reported Statistical Analyses  Towards A Logical Repr...Automated Extraction Of Reported Statistical Analyses  Towards A Logical Repr...
Automated Extraction Of Reported Statistical Analyses Towards A Logical Repr...Nat Rice
 
Shekelle et. al 2011 advancing the science of patient safety
Shekelle et. al 2011 advancing the science of patient safetyShekelle et. al 2011 advancing the science of patient safety
Shekelle et. al 2011 advancing the science of patient safetyJoya Smit
 
Practical Methods To Overcome Sample Size Challenges
Practical Methods To Overcome Sample Size ChallengesPractical Methods To Overcome Sample Size Challenges
Practical Methods To Overcome Sample Size ChallengesnQuery
 
Critical appraisal.docx IMPORTAN TO HEALTH SCIENCE STUDENTS
Critical appraisal.docx IMPORTAN TO HEALTH SCIENCE STUDENTSCritical appraisal.docx IMPORTAN TO HEALTH SCIENCE STUDENTS
Critical appraisal.docx IMPORTAN TO HEALTH SCIENCE STUDENTSMulugetaAbeneh1
 

Similar to Assessing Reliability in Rehabilitation Measurements (20)

Enhancing The Reliability of Physician Performance Measures
Enhancing The Reliability of Physician Performance MeasuresEnhancing The Reliability of Physician Performance Measures
Enhancing The Reliability of Physician Performance Measures
 
RESEARCH Open AccessA methodological review of resilience.docx
RESEARCH Open AccessA methodological review of resilience.docxRESEARCH Open AccessA methodological review of resilience.docx
RESEARCH Open AccessA methodological review of resilience.docx
 
How to establish and evaluate clinical prediction models - Statswork
How to establish and evaluate clinical prediction models - StatsworkHow to establish and evaluate clinical prediction models - Statswork
How to establish and evaluate clinical prediction models - Statswork
 
Oac guidelines
Oac guidelinesOac guidelines
Oac guidelines
 
my reliability
my reliabilitymy reliability
my reliability
 
Effective strategies to monitor clinical risks using biostatistics - Pubrica.pdf
Effective strategies to monitor clinical risks using biostatistics - Pubrica.pdfEffective strategies to monitor clinical risks using biostatistics - Pubrica.pdf
Effective strategies to monitor clinical risks using biostatistics - Pubrica.pdf
 
Care expert assistant for Medicare system using Machine learning
Care expert assistant for Medicare system using Machine learningCare expert assistant for Medicare system using Machine learning
Care expert assistant for Medicare system using Machine learning
 
Application of EQ-5D in Reimbursement Decision Making: The Case of NICE
Application of EQ-5D in Reimbursement Decision Making: The Case of NICEApplication of EQ-5D in Reimbursement Decision Making: The Case of NICE
Application of EQ-5D in Reimbursement Decision Making: The Case of NICE
 
Confidence intervals
Confidence intervalsConfidence intervals
Confidence intervals
 
Validity & reliability
Validity & reliabilityValidity & reliability
Validity & reliability
 
[Typ]Poster[Sbj]1593Synoptics[Dte]20150906
[Typ]Poster[Sbj]1593Synoptics[Dte]20150906[Typ]Poster[Sbj]1593Synoptics[Dte]20150906
[Typ]Poster[Sbj]1593Synoptics[Dte]20150906
 
Valid and Reliable ToolsThe goal of an evaluation is to determin.docx
Valid and Reliable ToolsThe goal of an evaluation is to determin.docxValid and Reliable ToolsThe goal of an evaluation is to determin.docx
Valid and Reliable ToolsThe goal of an evaluation is to determin.docx
 
11.forecast analysis of food price inflation in pakistan
11.forecast analysis of food price inflation in pakistan11.forecast analysis of food price inflation in pakistan
11.forecast analysis of food price inflation in pakistan
 
Forecast analysis of food price inflation in pakistan
Forecast analysis of food price inflation in pakistanForecast analysis of food price inflation in pakistan
Forecast analysis of food price inflation in pakistan
 
Automated Extraction Of Reported Statistical Analyses Towards A Logical Repr...
Automated Extraction Of Reported Statistical Analyses  Towards A Logical Repr...Automated Extraction Of Reported Statistical Analyses  Towards A Logical Repr...
Automated Extraction Of Reported Statistical Analyses Towards A Logical Repr...
 
Shekelle et. al 2011 advancing the science of patient safety
Shekelle et. al 2011 advancing the science of patient safetyShekelle et. al 2011 advancing the science of patient safety
Shekelle et. al 2011 advancing the science of patient safety
 
Practical Methods To Overcome Sample Size Challenges
Practical Methods To Overcome Sample Size ChallengesPractical Methods To Overcome Sample Size Challenges
Practical Methods To Overcome Sample Size Challenges
 
Clinical Prediction Rules
Clinical Prediction RulesClinical Prediction Rules
Clinical Prediction Rules
 
Critical appraisal.docx IMPORTAN TO HEALTH SCIENCE STUDENTS
Critical appraisal.docx IMPORTAN TO HEALTH SCIENCE STUDENTSCritical appraisal.docx IMPORTAN TO HEALTH SCIENCE STUDENTS
Critical appraisal.docx IMPORTAN TO HEALTH SCIENCE STUDENTS
 
01 validity and its type
01 validity and its type01 validity and its type
01 validity and its type
 

More from analisedecurvas

Measuring agreement in method comparison
Measuring agreement in method comparisonMeasuring agreement in method comparison
Measuring agreement in method comparisonanalisedecurvas
 
Kadaba (1989) -_repeatability_of_kinematic,_kinetic_and_elctromiophic_data_in...
Kadaba (1989) -_repeatability_of_kinematic,_kinetic_and_elctromiophic_data_in...Kadaba (1989) -_repeatability_of_kinematic,_kinetic_and_elctromiophic_data_in...
Kadaba (1989) -_repeatability_of_kinematic,_kinetic_and_elctromiophic_data_in...analisedecurvas
 
Duhamel, 2004 statistical tools for clinical gait analysis
Duhamel, 2004   statistical tools for clinical gait analysisDuhamel, 2004   statistical tools for clinical gait analysis
Duhamel, 2004 statistical tools for clinical gait analysisanalisedecurvas
 

More from analisedecurvas (6)

Measuring agreement in method comparison
Measuring agreement in method comparisonMeasuring agreement in method comparison
Measuring agreement in method comparison
 
Kadaba (1989) -_repeatability_of_kinematic,_kinetic_and_elctromiophic_data_in...
Kadaba (1989) -_repeatability_of_kinematic,_kinetic_and_elctromiophic_data_in...Kadaba (1989) -_repeatability_of_kinematic,_kinetic_and_elctromiophic_data_in...
Kadaba (1989) -_repeatability_of_kinematic,_kinetic_and_elctromiophic_data_in...
 
Duhamel, 2004 statistical tools for clinical gait analysis
Duhamel, 2004   statistical tools for clinical gait analysisDuhamel, 2004   statistical tools for clinical gait analysis
Duhamel, 2004 statistical tools for clinical gait analysis
 
Besier (2003)
Besier (2003)Besier (2003)
Besier (2003)
 
10.1.1.133.8405
10.1.1.133.840510.1.1.133.8405
10.1.1.133.8405
 
Mackey (2005)
Mackey (2005)Mackey (2005)
Mackey (2005)
 

Recently uploaded

The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 

Recently uploaded (20)

The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 

Assessing Reliability in Rehabilitation Measurements

  • 1. Authors: Jan E. Lexell, MD, PhD David Y. Downham, PhD Statistics Affiliations: From the Department of Rehabilitation, Lund University Hospital, Lund, Sweden (JEL); the INVITED REVIEW Department of Health Sciences, Lund University, Malmo, Sweden (JEL); the ¨ Department of Health Sciences, ¨ Lulea University of Technology, Boden, Sweden (JEL); and the How to Assess the Reliability of Department of Mathematical Sciences, University of Liverpool, Liverpool, UK (DYD). Measurements in Rehabilitation Correspondence: ABSTRACT All correspondence and requests for reprints should be addressed to Jan Lexell JE, Downham DY: How to assess the reliability of measurements in E. Lexell, MD, PhD, Department of rehabilitation. Am J Phys Med Rehabil 2005;84:719 –723. Rehabilitation, Lund University Hospital, Orupssjukhuset, 22185 To evaluate the effects of rehabilitation interventions, we need reliable measure- Lund, Sweden. ments. The measurements should also be sufficiently sensitive to enable the detection of clinically important changes. In recent years, the assessment of Disclosures: reliability in clinical practice and medical research has developed from the use of Supported, in part, by grants from correlation coefficients to a comprehensive set of statistical methods. In this the Gun and Bertil Stohne review, we present methods that can be used to assess reliability and describe Foundation, Magn. Bergvall Foundation, the Crafoord Foundation, how data from reliability analyses can aid the interpretation of results from the Swedish Society of Medicine, the rehabilitation interventions. Swedish Stroke Association, the Key Words: Reference Values, Reproducibility of Results, Research Design, Statistics Swedish Association of Neurologically Disabled (NHR), the Council for Medical Health Care Research in South Sweden, Lund University ¨ Hospital, and Region Skane. 0894-9115/05/8409-0719/0 I n rehabilitation, measurements are obtained for clinical practice and research purposes. To be clinically and scientifically useful, measurements must be American Journal of Physical Medicine & Rehabilitation reliable. Reliability refers to the reproducibility of measurements. Measure- Copyright © 2005 by Lippincott ments are reliable if they are stable over time and show adequate levels of Williams & Wilkins measurement variability.1 They must also be sufficiently sensitive to detect clinically important changes after rehabilitation interventions. The more sen- DOI: 10.1097/01.phm.0000176452.17771.20 sitive your measurement, the easier it is to detect improvements after inter- ventions or deterioration over time. Reliability in clinical practice and medical research is most commonly determined from measurements of the same subjects on two occasions: so- called retest reliability.2 By applying well-defined statistical methods, physia- trists and clinical scientists can determine whether measurements are suffi- ciently reliable for a particular purpose. In recent years, there has been a growing interest in the statistical methods for the assessment of reliability in clinical practice and medical research. Today, there is general agreement that a comprehensive set of statistical methods is required to address the reliability of measurements.3– 8 In this review, we present the statistical methods that are part of the current approach to the assessment of retest reliability based on continuous data and illustrate them using measurements of isokinetic muscle strength. We also describe how data from reliability analyses can aid the interpretation of results from rehabilitation interventions and the detection of clinically impor- tant changes. In the Appendix, we present the equations that are used to calculate reliability. This provides physiatrists and clinical scientists with a September 2005 Reliability of Measurements in Rehabilitation 719
  • 2. source of reference for the determination of the ple, it can be used with small samples and with data characteristics of measurements and how to inter- from more than two test occasions. There are dif- pret data obtained from rehabilitation interven- ferent types of ICCs available for different study tions. designs, but in practice, their values are often very similar.6,10 The ICC2,1 (Equation 1, Appendix), RETEST CORRELATION COEFFICIENTS which is calculated from a two-way repeated-mea- The most common approach in the assessment sures analysis of variance, covers most situations of retest reliability is to measure a group of sub- and also has the advantage that it provides the basis jects on two occasions, separated by hours or days. for the calculations of some of the other reliability This is done on the same type of subjects or pa- indices. (Reliability can also be determined for or- tients who one wants to use in an intervention. dinal data obtained from, for example, clinical rat- Once the data are obtained, the first step is to plot ing scales. The equivalent to a correlation coeffi- the data. In Figure 1, data on concentric ankle cient is then the kappa coefficient.11) dorsiflexor muscle strength at 30 degrees/sec ob- The data plotted in Figure 1 were analyzed tained from 30 healthy young men and women at using a two-way analysis of variance, and the values two test occasions 7 days apart are presented. The from the analysis of variance table were substituted data come from a larger study in which we evalu- into Equation 1. The value of ICC2,1 is 0.915, which ated the reliability of peak torque, work, and torque is very close to the Pearson’s r. This is often the at a specific time at different angular velocities case when measurements of the same subjects on using the Biodex dynamometer.5 two occasions are analyzed.5 Clearly, the closer the values are to a straight How do we then interpret the values of ICC? As line, the better is the reliability. The usual Pear- a matter of fact, no generally agreed ICC “cut-off” son’s correlation coefficient (Pearson’s r) can be points exist. In their original publication from used to quantify the reliability. The value of Pear- 1979, Shrout and Fleiss9 suggested that specific son’s r is here 0.91; as a value of 1.00 represents values of ICC could be considered to represent perfect agreement and a value of 0.00 no agree- “acceptable,” “good,” and “fair” reliability. Fleiss12 ment, we can already at this stage presume that our later recommended that ICC values of Ͼ0.75 rep- measurements are highly reliable. resent “excellent reliability” and values between 0.4 The intraclass correlation coefficient (ICC) is and 0.75 represent “fair to good reliability.” nowadays the preferred retest correlation coeffi- It is becoming clear that the use of only ICCs cient.9 The ICC has several advantages: for exam- for the analyses of reliability is not sufficient.3,4,7,8 The ICC can give misleading results—for example, if the sample is homogeneous, the value of ICC (and Pearson’s r) may be low. We should therefore avoid assessing reliability just from retest correla- tions and also avoid using global terms, such as “good reliability,” and instead interpret reliability from several statistical methods. CHANGES IN THE MEAN The next step in the reliability analysis is to calculate changes in the mean from the measure- ments obtained from the two test occasions. The change in the mean values between two test occa- sions can consist of two components: a random change and a systematic change. A random change is often referred to as the “sampling error” and comes from the variability in the actual test situa- tion. It can be due to the variability in the equip- ment or method used and to the inherent biolog- ical variability. A systematic change is a nonrandom change in the mean values between the two test occasions. This occurs if the subject or FIGURE 1 To illustrate the analysis of reliability, measurements of concentric ankle dorsi- patient systematically performs better (or worse) flexor muscle strength at 30 degrees/sec on the second test occasion as a result of, for are used. The data were obtained from 30 example, a change in behavior, a learning effect, or healthy young men and women at two fatigue if performance is measured. test occasions 7 days apart.5 To detect a systematic change, several indices 720 Lexell and Downham Am. J. Phys. Med. Rehabil. ● Vol. 84, No. 9
  • 3. can be calculated (Equations 2– 4). Based on these calculations, a 95% confidence interval for the mean difference between the two test occasions can be formed (Equation 5). This will allow you to determine if there is a true systematic difference between the two test occasions. If the mean value is positive or negative, the measurements from one test occasion tend to be larger or smaller than those from the other occasion. When the 95% confidence interval does not include zero, this in- dicates a systematic change in the mean, for exam- ple, due to a significant learning effect. If this happens, one should choose tests or equipment that minimize this learning effect or let the sub- jects familiarize themselves before the real trials. This is less of a problem in a controlled study if both groups are equally affected by systematic FIGURE 2 Bland–Altman plot for the data in Figure changes in the mean. 1, with the approximate 95% confidence Using the data from Figure 1, the indices rep- interval (dashed line) of the mean differ- resenting the changes in the mean are calculated ence (solid line) between the two test from Equations 2–5 and are presented in Table 1. occasions. The mean difference between the two test occa- sions is here positive (test occasion 2 minus 1), indicating that the isokinetic muscle strength at the second test occasion tended to be larger than at confidence interval, and that there are no apparent the first occasion. However, as zero is well within systematic biases or outliers in the data. the 95% confidence interval, there is no significant change in the mean muscle strength between the MEASUREMENT VARIABILITY two test occasions, and we can therefore conclude Once we have established if there are any sys- that there is no systematic change between the two tematic changes in the mean, we want to quantify test occasions. the actual size of the variability between the mea- The next step is then to present the data graph- surements obtained from the two test occasions. ically in so-called Bland–Altman plots.2 In these This is often referred to as the “within-subject plots, the differences between measurements from variation,” “typical error,” or “typical variation”7 and is one of the more important reliability indices. the two test occasions are plotted against the mean Quite naturally, a change after any intervention of the two test occasions for each subject, and any will be easier to detect if we have established that systematic bias or outliers can be seen. The Bland– the variability between measurements from two Altman plot for the data in Figure 1 is presented in test occasions is small.3,7 Figure 2; the mean difference in muscle strength A simple index of the measurement variability between the two test occasions (test occasion 2 is the standard deviation of the differences between minus 1) and the 95% confidence interval are also the two test occasions (SDdiff) (Equation 3). Divid- included. It is clearly seen that the mean difference ing the SDdiff by ͌2 yields the “method error” is close to zero, that zero is included in the 95% (ME) (Equation 6).13 An alternative index is the “standard error of measurement” (SEM). There are different ways to calculate SEM; a preferred way is TABLE 1. Indices of changes in the mean to take the square root of the mean square error between two test occasions term (WMS) from the analysis of variance (Equa- tion 7). If the sample size is sufficiently large and Mean difference between two 0.03 the mean difference small, both highly likely con- ៮ test occasions (d) ditions, ME and SEM take similar values. These Standard deviation of the 2.88 values are, on their own, not easily interpreted differences between two because we do not know how much of the variabil- test occasions (SDdiff) ity comes from the change in the mean and how Standard error of ៮ (SE) d 0.53 95% confidence interval Ϫ1.04 to 1.10 much is due to the typical variation.7 This can be overcome by expressing the mea- of ៮ (95% CI) d surement variability as a coefficient of variation. The ME or the SEM is then divided by the mean of September 2005 Reliability of Measurements in Rehabilitation 721
  • 4. all the measurements and multiplied by 100 to give ence range or “error band”14 around the mean a percentage value. These indices—the CV% difference of the measurements from the two test (Equation 8) and the SEM% (Equation 9)—are occasions, a 95% SRD can be calculated (Equation independent of the units of measurements and are 11). The SRD is then divided by the mean of all the therefore more easily interpreted. The CV% and measurements and multiplied by 100 to give a SEM% indicate the typical variation expressed as a percentage value, the SRD% (Equation 12).15 The percentage value and can be used to interpret the SRD%, like the SEM% and CV%, is independent of results of an improvement after an intervention. the units of measurements and therefore more Substituting the data from Figure 1 into Equa- easily interpreted. tions 6 –9, we can calculate the values of ME, SEM, To form practically useful 95% SRD and CV%, and SEM%. As can be seen in Table 2, ME SRD%, a fairly large sample size is required. This and SEM and CV% and SEM% are very similar. We brings us to the question of how many subjects or can then determine that with a CV% of 6.36 and a patients you need to determine the reliability of a mean isokinetic muscle strength of 31.9 Nm (based measurement. In comparison with clinical trials, on the data in Fig. 1), the average person’s muscle sample sizes have received little attention in the strength has a typical variation from one test oc- reliability literature. Some scientists have recom- casion to the other of 2.03 Nm. An improvement mended specific sample sizes, but these recom- after an intervention that is smaller than the typ- mendations are usually based on practical experi- ical variation does not, in most situations, indicate a clinically important improvement. ence. A sample size of 15–20 is often used in reliability studies with continuous data.12 Larger CLINICALLY IMPORTANT CHANGES sample sizes, 30 –50, have been suggested more To evaluate more specifically if a measurement recently7 and would be required to form practically represents a clinically important change, we can useful 95% SRD and SRD%. use the data from the two test occasions to calcu- The values of SRD, 95% SRD, and SRD% are late an interval, which represents the 95% likely then calculated from Equations 10 –12 using the range for the difference between a subject’s mea- data from Figure 1 and are presented in Table 3. surements from these two test occasions. This is a The SRD% is 18.1, which indicates that a measure- form of “reference range” that can be used to ment should exceed that value to indicate a clini- determine if differences between measurements cally important change. Taking the mean value of subsequently obtained before and after “real” inter- the isokinetic muscle strength in Figure 1 (31.9 ventions represent clinically important changes. If Nm), this has to change 5.7 Nm to indicate a the difference for a subject or patient is outside (or clinically important improvement in strength. within) this reference range, it does (or does not) Having done all this, one should remember represent a clinically important change. It is clear that there is an essential difference between the that the smaller the reference range, the more reliability indices described here and clinically im- sensitive are the measurements, and also that the portant changes. The reliability indices describe measurements can be considered highly reliable, the clinometric property of a measurement, but the reference range is too wide to be clinically whereas clinically important changes are more ar- or scientifically useful. bitrarily chosen values that physiatrists and clinical The “smallest real difference” (SRD), intro- scientists judge as minimally (and clinically) im- duced by Beckerman et al.14 is one way to evaluate portant. An interesting area for future research is clinically important changes. The SRD is similar to to explore the clinometric property of measure- the “limits of agreement” proposed by Bland and ments and how that corresponds to what physia- Altman.4 The SRD (Equation 10) is formed by tak- trists and clinical scientists judge as clinically im- ing the measurement variability, represented by portant. Such research will help us to define the SEM, and multiplying it by ͌2 and by 1.96 to optimal outcome measures for clinical practice and include 95% of the observations of the difference research purposes. between the two measurements. To obtain a refer- TABLE 2. Indices of measurement variability TABLE 3. Indices of clinically important changes Method error (ME) 2.03 Standard error of the measurement (SEM) 2.01 Smallest real difference (SRD) 5.81 Coefficient of variation (CV%) 6.36 Smallest real difference (95% SRD) Ϫ5.78 to 5.84 Standard error of the measurement (SEM%) 6.30 Smallest real difference (SRD%) 18.1% 722 Lexell and Downham Am. J. Phys. Med. Rehabil. ● Vol. 84, No. 9
  • 5. APPENDIX The SEM% is defined by Retest Correlation Coefficients SEM% ϭ (SEM/mean) ϫ 100 [9] The intraclass correlation coefficient (ICC2,1) for n subjects and for two test occasions is defined where mean in Equations 8 and 9 is the mean of all by the data from the two test occasions. ICC2,1 ϭ (BMS Ϫ EMS)/ CLINICALLY IMPORTANT CHANGES (BMS ϩ EMS ϩ 2[JMS Ϫ EMS]/n) [1] The smallest real difference (SRD) is defined by where BMS is the between-subjects mean square, SRD ϭ 1.96 ϫ SEM ϫ ͌2 [10] EMS is the residual mean square, JMS is the with- in-subjects mean square, all obtained from the The 95% SRD is defined by two-way analysis of variance, and n is the number of subjects. ៮ 95% SRD ϭ d Ϯ SRD [11] CHANGES IN THE MEAN The SRD% is defined by To evaluate changes in the mean between two SRD% ϭ (SRD/mean) ϫ 100 [12] test occasions, ៮ is defined by d where mean in Equation 12 is the mean of all the ៮ d ϭ the mean difference between the two test data from the two test occasions. occasions [2] REFERENCES ៮ is assessed by The variation about d 1. Rothstein JM: Measurement and clinical practise: Theory and application, in Rothstein JM (ed): Measurement in SDdiffϭ Physical Therapy. New York, Churchill Livingstone, 1985 the standard deviation of the differences [3] 2. Bland JM, Altman DG: Statistical methods for assessing agreement between two methods of clinical measurement. The variation of ៮ is assessed by the standard d Lancet 1986;1:307–10 error of the mean (SE) 3. Atkinson G, Nevill AM: Statistical methods for assessing measurement error (reliability) in variables relevant to SE ϭ SDdiff/͌n [4] sports medicine. Sports Med 1998;26:217–38 4. Bland JM, Altman DG: Measuring agreement in method where n is the number of subjects. SDdiff represents comparison studies. Stat Methods Med Res 1999;8:135–60 the variability of the differences between the two 5. Holmback AM, Porter MM, Downham DY, et al: Reliability ¨ test occasions and SE represents the precision of ៮d of isokinetic ankle dorsiflexor strength measurements in healthy young men and women. Scand J Rehabil Med as an estimate of the underlying change in the 1999;31:229–39 mean. An approximate 95% confidence interval 6. Holmback AM, Porter MM, Downham DY, et al: Ankle ¨ ៮ (95% CI) of the mean difference (d) between the dorsiflexor muscle performance in healthy young men and women: Reliability of eccentric peak torque and work. two test occasions is defined by J Rehabil Med 2001;33:90–6 95% CI ϭ ៮ Ϯ 2 ϫ SE d [5] 7. Hopkins WG: Measures of reliability in sports medicine and science. Sports Med 2000;30:1–15 The multiplier of SE in Equation 5 depends on 8. Rankin G, Stokes M: Reliability of assessment tools in the number of subjects, n; 2 is a good approxima- rehabilitation: An illustration of appropriate statistical anal- yses. Clin Rehabil 1998;12:187–99 tion when n is Ͼ20. For the data in Figure 1, it 9. Shrout PE, Fleiss JL: Intraclass correlations: Uses in assess- should be 2.045, which is obtained from the t-table ing rater reliability. Psychol Bull 1979;86:420–8 with 29 (n Ϫ 1) degrees of freedom. 10. Baumgartner TA: Norm-referenced measurement: Reliabil- ity, in Safrit M, Wood T (eds): Measurement Concepts in MEASUREMENT VARIABILITY Physical Education and Exercise Science. Champaign, IL, Human Kinetics, 1989, pp 45–72 The method error (ME) is defined by 11. Sim J, Wright CC: The kappa statistic in reliability studies: ME ϭ SDdiff/͌2 [6] Use, interpretation, and sample size requirements. Phys Ther 2005;85:257–68 where SDdiff comes from Equation 3. 12. Fleiss JL: The Design and Analysis of Clinical Experiments. The standard error of measurement (SEM) is New York, John Wiley Sons, 1986, pp 1–31 defined by 13. Portney LG, Watkins MP: Statistical measures of reliability, in: Foundations of Clinical Research: Applications to Prac- SEM ϭ ͌WMS [7] tice . Englewood Cliffs, NJ, Prentice Hall, 1993, pp 525–6 14. Beckerman H, Roebroeck ME, Lankhorst GJ, et al: Smallest where WMS is the mean square error term from real difference: A link between reproducibility and respon- the analysis of variance. siveness. Qual Life Res 2001;10:571–8 The coefficient of variation (CV%) is defined by 15. Flansbjer UB, Holmback AM, Downham DY, et al: Reliability ¨ of gait performance test in men and women with hemipa- CV% ϭ (ME/mean) ϫ 100 [8] resis after stroke. J Rehabil Med 2005;37:75–82 September 2005 Reliability of Measurements in Rehabilitation 723