1 Reliability and Validity in Physical Therapy Tests

Lecture I
Dr. Amal HM. Ibrahim
Professor of Physical Therapy

aebrahim123@hotmail.com

OBJECTIVES
• Levels of measurements
• Define validity and reliability
• Understand the purpose for needing valid
and reliable measures
• Know the most utilized and important
types of validity seen in assessment
• Know the most utilized and important
types of reliability seen in assessment

Levels of Measurements
• Physiotherapists deal with measurements.
• Measurement is the process of observing
and recording the observations that are
collected as part of a research effort.
• Levels of measurements are categorized
for measuring variables.


INTRODUCTION
• Examination of physical therapy practice demonstrates the
growing importance of measurement. Walking through a
physical therapy clinic, you may observe a patient's range of
motion being measured, or you may see a therapist testing
the inspiratory capacity of a patient. Other therapists may be
measuring the developmental status of a child or the
accessory motion of the knee joint in a postsurgical patient.
Still other therapists may be measuring the functional status
of a patient with hemiplegia.


INTRODUCTION
• Physical therapists need to obtain measurements
because they make decisions, offer consultative
opinions and document changes in patient
status. The physical therapy evaluation is the
foundation for the measurement of the outcome
of our therapeutic intervention and we must
measure these outcomes. Quality assurance
studies with an outcome focus can provide a
measure of our progress toward achieving that
goal.



Why Is the Level of Measurement Important?
• The level of measurement helps to decide how to
interpret the data from that variable.
• Knowing the level of measurement helps to
decide what statistical analysis is appropriate for
the values that were assigned. If a measure is
nominal, then you know that you would never
average the data values or do a t-test on the data.
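The link between level of measurement and choice of analysis can be sketched as a small lookup. This is a simplified illustration, not an exhaustive rule: the `PERMISSIBLE` table and the `is_appropriate` helper are hypothetical names introduced here, and the statistics listed are only the ones commonly treated as appropriate at each level.

```python
# Hypothetical lookup: which summary operations are commonly treated as
# appropriate at each level of measurement (a simplification).
PERMISSIBLE = {
    "nominal": {"count", "mode"},
    "ordinal": {"count", "mode", "median"},
    "interval": {"count", "mode", "median", "mean", "difference"},
    "ratio": {"count", "mode", "median", "mean", "difference", "ratio"},
}


def is_appropriate(level: str, statistic: str) -> bool:
    """Return True if the statistic is listed for the given level."""
    return statistic in PERMISSIBLE[level]


print(is_appropriate("nominal", "mean"))   # False: never average nominal codes
print(is_appropriate("interval", "mean"))  # True
print(is_appropriate("interval", "ratio")) # False: the zero point is arbitrary
```

Each level inherits the operations of the level below it, which is why the sets grow from nominal to ratio.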



• From the least to the most sensitive, the
scales are:
1- Nominal
2- Ordinal
3- Interval
4- Ratio



In nominal measurement the numerical values just "name" the attribute
uniquely


Nominal Level of Measurements

• It is the first level of measurement. At the
nominal level the numerical values just
“name”, so they can’t be added or
subtracted or ordered or subjected to any
arithmetic process. However, the number of
observations in each category can be counted.


• Clinical examples include:
• - Classifying a group of patients as
right-handed or left-handed.
• - Classifying arthritic patients into
osteoarthritis and rheumatoid arthritis.
• - Classifying blood groups, where the letters
A, B, O, and AB represent the different
classes.


• - We can classify our observations into the
categories "females" and "males," with 1
representing females and 2 representing males.
We could use any of a variety of symbols to
represent the different categories of a nominal
variable; however, when numbers are used to
represent the different categories, we do not
imply anything about the magnitude or
quantitative difference between the categories.
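A minimal sketch of why nominal codes can be counted but not averaged. The data are hypothetical, and the second coding (10 and 20) is an arbitrary relabelling chosen only for illustration: the counts survive the relabelling, while the "mean" changes, which shows that it carries no meaning.

```python
from collections import Counter
from statistics import mean

# Hypothetical sample: 1 = female, 2 = male (codes are just names).
codes = [1, 2, 2, 1, 2, 2, 1, 2]
counts = Counter(codes)

# Relabel the same categories with different, equally arbitrary numbers.
relabel = {1: 10, 2: 20}
recoded = [relabel[c] for c in codes]
recoded_counts = Counter(recoded)

print(counts[1], counts[2])                    # 3 5
print(recoded_counts[10], recoded_counts[20])  # 3 5 (counts are unchanged)
print(mean(codes), mean(recoded))              # 1.625 16.25 (depends on labels)
```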


Second Level of
Measurements

In ordinal measurement the attributes can be rank-ordered


• At this level, relative magnitude (rank order)
is added to categorization. Ordinal numbers
indicate nothing more than the rank order of
the objects. The numbers do not imply a
definite magnitude, nor do they imply
that the categories are equally spaced in terms
of the quantity that they represent.


Distances between orders do not have any
meaning. It does not imply that the intervals
between the numbers are equal.

• The values of ordinal measurements can be
summarized by frequency of occurrence, by
percentage of the whole or by counting the
members in each category. The ordinal
measurement level is not appropriate for
arithmetical computation. Put simply, an
ordinal scale can be written as a, b, c, ..., n,
indicating that a > b > c > ... > n
in some property.

Clinical Examples
• In manual muscle testing we know that a muscle
with grade 5 is stronger than a muscle with grade
4, and a muscle with grade 4 is stronger than a
muscle with grade 3. So numbers on an ordinal
scale represent a rough and ready ordering of
measurements, but the differences or ratios
between any two measurements (grade 5 and 4,
or 4 and 3) along the scale are not necessarily
the same.
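The summaries suited to ordinal data (frequency, percentage, rank-based position) can be sketched with manual muscle testing grades. The grades below are hypothetical, and the use of `median_low` is an assumption made here: it returns an actual observed grade, so no arithmetic is performed on the grade numbers.

```python
from collections import Counter
from statistics import median_low

# Hypothetical manual muscle testing grades for ten patients.
grades = [3, 4, 4, 5, 3, 4, 5, 5, 4, 2]

freq = Counter(grades)
percent = {g: 100 * n / len(grades) for g, n in sorted(freq.items())}

print(dict(sorted(freq.items())))  # {2: 1, 3: 2, 4: 4, 5: 3}
print(percent)                     # {2: 10.0, 3: 20.0, 4: 40.0, 5: 30.0}
print(median_low(grades))          # 4 (middle grade by rank, no averaging)
```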


Clinical Examples
• As with the nominal scale, with ordinal scales
you can use textual labels instead of numbers to
represent the categories. In muscle testing we
can use normal, good, and fair instead of
grades 5, 4, and 3.
• For example, on a pain scale with 5 possible
levels, it is not possible to equate the difference
between "no" and "slight pain" with the difference
between "severe" and "intolerable pain." These
descriptors are subject to a wide range of
interpretations.

Third Level of
Measurements

In interval measurement the distance between attributes does have meaning.


• This measurement includes all the qualities
of the ordinal level of measurement and also
includes units that are equal in size, so the
distances between levels are equal. This
permits arithmetic operations such as addition
and subtraction, but the zero point is arbitrary
(e.g. centigrade temperature measurement,
where zero does not indicate the absence of
heat but rather is an arbitrary point).


Example
• Examples of interval data include the
measurement of temperature in degrees
(Celsius or Fahrenheit). The two
temperature scales place zero at different
points. Fresh water
freezes at 0° on the Celsius scale and at
32° on the Fahrenheit scale. The temperature at
which salt water freezes is arbitrarily
designated as 0° on the Fahrenheit scale.
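A short sketch of what an arbitrary zero means in practice, using the standard Celsius to Fahrenheit conversion. The two temperatures are arbitrary illustrative values. Differences stay proportional across the two scales, but the ratio "twice as hot" does not survive the change of scale, so ratios of interval data are not meaningful.

```python
def c_to_f(celsius: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


a_c, b_c = 10.0, 20.0                # illustrative temperatures in Celsius
a_f, b_f = c_to_f(a_c), c_to_f(b_c)  # 50.0 and 68.0 in Fahrenheit

ratio_c = b_c / a_c  # 2.0 on the Celsius scale
ratio_f = b_f / a_f  # about 1.36 on the Fahrenheit scale: ratios disagree

# Differences only change by the constant unit factor 9/5.
diff_factor = (b_f - a_f) / (b_c - a_c)  # 1.8

print(ratio_c, ratio_f, diff_factor)
```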

Fourth Level of
Measurements

In ratio measurement there is always an absolute zero that is meaningful.


• The ratio scale is a fixed relation in degree
or number between two similar things.
The major difference between the interval and
ratio data classifications is that
ratio data have an absolute zero. Ratio data are
the class most frequently used by
healthcare professionals who deal with
patients’ physical attributes.


Examples

• Examples of ratio data include
height, weight, velocity, distance,
heart rate, VO2 max, force, torque,
etc. With this classification, all
mathematical operations are valid.
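By contrast with interval data, a ratio between two ratio-level values keeps its meaning when the unit changes, because zero means "none" in every unit. A minimal sketch with hypothetical body weights; the conversion constant is the usual approximation of pounds per kilogram.

```python
import math

KG_TO_LB = 2.20462  # approximate pounds per kilogram


def kg_to_lb(kg: float) -> float:
    """Convert kilograms to pounds (pure rescaling, zero stays zero)."""
    return kg * KG_TO_LB


light_kg, heavy_kg = 40.0, 80.0  # hypothetical patient weights

ratio_kg = heavy_kg / light_kg
ratio_lb = kg_to_lb(heavy_kg) / kg_to_lb(light_kg)

print(ratio_kg)                          # 2.0
print(math.isclose(ratio_kg, ratio_lb))  # True: "twice as heavy" in any unit
print(kg_to_lb(0.0))                     # 0.0: absolute zero is preserved
```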


RELIABILITY
The consistency of measurements

A RELIABLE TEST
Produces similar scores across
various conditions and situations,
including different evaluators and
testing environments.


A measurement procedure is reliable when it
yields consistent scores while the phenomenon
being measured is not changing.
Degree to which scores are free of
“measurement error”
Consistency of measurement


• Valid = faithful, true
• What is assessed is indeed what is
intended to be assessed
• Denotes the extent to which an
instrument is measuring what it is
supposed to measure.


• Necessary but not sufficient
• Reliability is a prerequisite for
measurement validity
• One needs reliability, but it’s not
enough


• Measuring height with a reliable
bathroom scale (consistent, but not a valid
measure of height)
• Measuring “aggression” with observer
agreement by observing a kid hitting a
Bobo doll


• Inter-Rater or Inter-Observer
Reliability
• Test-Retest Reliability
• Parallel-Forms Reliability
• Internal Consistency Reliability


• Used to assess the degree to which different
raters/observers give consistent estimates of
the same phenomenon.

• So how do we determine whether two
observers are being consistent in their
observations?


• The degree of agreement
between the scores from two
raters following observation
and rating of the same subject;
a correlation of .85 or higher is
expected when comparing the
objective competency of
two raters under the same testing
condition.

Intra-rater reliability
• Means that one person should come
out with the same results on every
repetition of the test, within an
acceptable level.
• Consistency in measurement and
scoring by the evaluator when two
test results from two similar
situations are correlated.

• Test-retest
• SAME TEST – DIFFERENT TIMES
• Testing a phenomenon at two different
times. Used to assess the consistency of a
measure from one time to another.


• This approach assumes that there is no
substantial change in the construct being
measured between the two occasions.


• The amount of time allowed between
measures is critical. We know that if we
measure the same thing twice, the
correlation between the two
observations will depend in part on how
much time elapses between the two
measurement occasions.
• The shorter the time gap, the higher the
correlation; the longer the time gap, the
lower the correlation.


• Used to assess the consistency of the results of
two tests constructed in the same way from the
same content domain.


• In parallel forms reliability you first have to
create two parallel forms. One way to accomplish
this is to create a large set of questions that
address the same construct and then randomly
divide the questions into two sets. You administer
both instruments to the same sample of people.


• Useful when multiple equivalent forms of the same
test are needed; particularly useful when one's
responses to the earlier test items can easily be
recalled and can influence the responses on the
second test after a lapse of time (alternate);
while the forms contain different questions,
similar items on each form are expected to be
equivalent, making the forms equal
at a given point in time (parallel); the correlation
should be .80.


• Used to assess the consistency of
results across items within a test.
• (Internal consistency): The
association of answers to a set of
questions designed to measure the
same concept.

• In internal consistency reliability
estimation we use our single measurement
instrument administered to a group of
people on one occasion to estimate
reliability.
• In effect we judge the reliability of the
instrument by estimating how well the
items that reflect the same construct yield
similar results.
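A widely used index of internal consistency is Cronbach's alpha, which the slides do not name; it is offered here as one possible way to quantify "how well the items yield similar results". For k items, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). The questionnaire data below are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for rows = respondents, columns = items."""
    k = len(responses[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical answers of five respondents to four items (1-5 scale),
# all intended to measure the same construct.
responses = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.2f}")  # about 0.96 for these data
```

Values close to 1 indicate that the items move together; the whole estimate comes from a single administration, as the slide describes.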


• When ratings are by an observer rather
than the subjects themselves, this is called
Intraobserver Reliability or Intrarater
Reliability.
• Answers about the past are less reliable
when they are very specific, because the
questions may exceed the subjects’ capacity
to remember accurately.


• Construct validity
  • Translation validity
    • Face validity
    • Content validity
  • Criterion-related validity
    • Predictive validity
    • Concurrent validity
    • Convergent validity
    • Discriminant validity

• Construct validity is the
approximate truth of the conclusion
that your operationalization
accurately reflects its construct.


• Face Validity
• Confidence gained from careful inspection of a
concept to see if it’s appropriate “on its face”
• In face validity, you look at the test items and
see whether "on their face" they seem like a good
translation of the construct. This is probably
the weakest way to try to demonstrate
construct validity.


• Content validity is also called “sampling validity”
• Establishes that the measure covers the full
range of the concept’s meaning, i.e., covers all
dimensions of a concept
• In content validity, you essentially check the
test items against the relevant content domain
for the construct. This approach assumes that
you have a good detailed description of the
content domain.


• Actually, I think face and content
validity are probably the same thing


Empirical Validity
• Establishes that the results from one
measure match those obtained with a
more direct or already validated
measure of the same phenomenon (the
“criterion”)
• Includes concurrent and predictive validity


Concurrent Validity
• Validity exists when a measure yields scores that
are closely related to scores on a criterion
measured at the same time
• Does the new instrument correlate highly with
an old measure of the same concept that we
assume (judge) to be valid? (use of “good”
judgment).
• The extent of agreement between two
simultaneous measures of the same behavior or
trials.

Predictive Validity
• Exists when a measure is validated by
predicting scores on a criterion measured
in the future
• Are future events, which we judge to be a
result of the concept we’re measuring,
anticipated [predicted] by the scores we’re
attempting to validate?
• Use of “good” judgment


Construct validity
• Established by showing that
(1) The measure is related to a variety of other
measures as specified in a theory; used
when no clear criterion exists for
validation purposes
(2) The test has a set of
interrelated items
(3) The operationalization has not
included separate concepts

• Check the intercorrelation of items used
to measure a construct judged to be valid
• Use theory to predict a relationship,
use a judged-to-be-valid measure of the
other variable, then check for the
relationship
• Demonstrate that your measure isn’t
related to judged-to-be-valid measures
of unrelated concepts


• Convergent validity: achieved when one
measure of a concept is associated with
different types of measures of the same
concept (this relies on the same type of
logic as measurement triangulation)
• Measures are intercorrelated


• Discriminant validity: scores on the
measure to be validated are compared to
scores on measures of different but
related concepts and discriminant
validity is achieved if the measure to be
validated is NOT strongly associated
with the measures of different concepts
• Measure not related to unrelated
concepts



Questions?

