This document discusses reliability and validity in physical therapy tests. It begins by defining levels of measurement, including nominal, ordinal, interval and ratio scales. It then defines reliability as the consistency of measurements and validity as measuring what is intended. The document discusses various types of reliability, including inter-rater, test-retest, parallel-forms and internal consistency. It also discusses different types of validity such as face, content, concurrent, predictive and construct validity.
1. Reliability and Validity in
Physical Therapy Tests
Lecture I
Dr. Amal HM. Ibrahim
Professor of Physical Therapy
aebrahim123@hotmail.com
2. OBJECTIVES
• Levels of measurements
• Define validity and reliability
• Understand the purpose for needing valid
and reliable measures
• Know the most utilized and important
types of validity seen in assessment
• Know the most utilized and important
types of reliability seen in assessment
3. Levels of Measurements
• Physiotherapists deal with measurements.
• Measurement is the process of observing
and recording the observations that are
collected as part of a research effort.
• Levels of measurement categorize how
variables are measured.
4. INTRODUCTION
• Examination of physical therapy practice demonstrates the
growing importance of measurement. Walking through a
physical therapy clinic, you may observe a patient's range of
motion being measured, or you may see a therapist testing
the inspiratory capacity of a patient. Other therapists may be
measuring the developmental status of a child or the
accessory motion of the knee joint in a postsurgical patient.
Still other therapists may be measuring the functional status
of a patient with hemiplegia.
5. INTRODUCTION
• Physical therapists need to obtain measurements
because they make decisions, offer consultative
opinions and document changes in patient
status. The physical therapy evaluation is the
foundation for the measurement of the outcome
of our therapeutic intervention and we must
measure these outcomes. Quality assurance
studies with an outcome focus can provide a
measure of our progress toward achieving that
goal.
6. Levels of Measurements
Why Is the Level of Measurement Important?
• The level of measurement helps to decide how to
interpret the data from that variable.
• Knowing the level of measurement helps to
decide what statistical analysis is appropriate for
the values that were assigned. If a measure is
nominal, then you know that you would never
average the data values or do a t-test on the data.
7. Levels of Measurements
• From the least to the most sensitive, the
scales are:
1- Nominal
2- Ordinal
3- Interval
4- Ratio
8. Levels of Measurements
In nominal measurement, the numerical values just "name" the attribute
uniquely.
9. Nominal Level of Measurements
• It is the first level of measurement. At the
nominal level the numerical values only
“name” the categories, so they cannot be
added, subtracted, ordered, or subjected to
any arithmetic process. The number of
observations in each category, however,
can be counted.
10. Nominal Level of Measurements
• Clinical examples:
- Classifying a group of patients as
right-handed or left-handed.
- Classifying arthritic patients as having
osteoarthritis or rheumatoid arthritis.
- Classifying blood groups, where the letters
A, B, O, and AB represent the different
classes.
11. Nominal Level of Measurements
• We can classify our observations into the
categories "females" and "males," with 1
representing females and 2 representing males.
We could use any of a variety of symbols to
represent the different categories of a nominal
variable; however, when numbers are used to
represent the different categories, we do not
imply anything about the magnitude or
quantitative difference between the categories.
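A minimal Python sketch of this point. The patient codes and the alternative labeling (7 and 3) are hypothetical, chosen only for illustration: counts per category stay the same under any relabeling, while an "average code" changes with the arbitrary labels.

```python
from collections import Counter
from statistics import mean

# Hypothetical sample coded as on the slide: 1 = female, 2 = male.
codes = [1, 2, 2, 1, 1, 2, 1, 1]

# An equally valid relabeling of the same two categories (assumed for illustration).
relabel = {1: 7, 2: 3}
recoded = [relabel[c] for c in codes]

# Counting category membership is meaningful and does not depend on the labels.
counts = Counter(codes)
recoded_counts = Counter(recoded)
print(counts[1], counts[2])                  # females, males
print(recoded_counts[7], recoded_counts[3])  # same counts under the new labels

# The "average code" depends on the arbitrary labels, so it carries no meaning.
print(mean(codes), mean(recoded))
```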
12. Second Level of
Measurements
In ordinal measurement, the attributes can be rank-ordered.
13. • At this level, relative magnitude (rank
order) is added to categorization. Ordinal
numbers indicate nothing more than the
rank order of the objects. They do not
imply a definite magnitude, nor do they
imply that the categories are equal in the
quantity that they represent.
14. • Distances between ranks do not have any
meaning. The scale does not imply that the
intervals between the numbers are equal.
• The values of ordinal measurements can be
summarized by frequency of occurrence, by
percentage of the whole or by counting the
members in the category. The ordinal
measurement level is not appropriate for
arithmetical computation. Put simply, an
ordinal scale can be written as a, b, c, ..., n,
indicating that a > b > c > ... > n in some
property.
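The summaries named above (frequency of occurrence and percentage of the whole) can be sketched in a few lines of Python. The ratings below are hypothetical and serve only as an illustration.

```python
from collections import Counter

# Hypothetical ordinal ratings for ten patients (a higher rank means more of the property).
ratings = [3, 4, 4, 5, 3, 4, 5, 5, 4, 2]

frequency = Counter(ratings)
total = len(ratings)

# Frequency and percentage of the whole for each rank, listed in rank order.
for rank in sorted(frequency):
    percent = 100 * frequency[rank] / total
    print(f"rank {rank}: n={frequency[rank]}, {percent:.0f}%")
```

Note that the sketch only counts and orders the ranks. It never adds or averages them, in line with the restriction on arithmetic at this level.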
15. Clinical Examples
• In manual muscle testing we know that a muscle
with grade 5 is stronger than a muscle with grade
4, and a muscle with grade 4 is stronger than a
muscle with grade 3. So numbers on an ordinal
scale represent a rough and ready ordering of
measurements, but the differences or ratios
between any two measurements (grades 5 and 4,
or 4 and 3) along the scale will not necessarily
be the same.
16. Clinical Examples
• As with nominal scales, with ordinal scales you
can use textual labels instead of numbers to
represent the categories. In muscle testing we
can use normal, good, and fair instead of grades
5, 4, and 3.
• For example, in a pain scale with 5 possible levels,
it is not possible to equate the difference between
"no" and "slight pain" with the difference between
"severe" and "intolerable pain." These
descriptors are subject to a wide range of
interpretations.
17. Third Level of
Measurements
In interval measurement, the distance between attributes does have meaning.
18. • This measurement includes all the qualities
of ordinal-level measurement and also
includes units that are equal in size, so the
distances between levels are equal. This
permits addition and subtraction, but the
zero point is arbitrary (e.g., Celsius
(centigrade) temperature, where zero does
not indicate the absence of heat but rather is
an arbitrary point).
19. Example
• Examples of interval data include the
measurement of temperature in degrees
(Celsius or Fahrenheit). The two
temperature scales place zero at different
points. Fresh water freezes at 0° on the
Celsius scale and at 32° on the Fahrenheit
scale. The temperature at which salt water
freezes is arbitrarily designated as 0° on
the Fahrenheit scale.
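A short Python sketch of why the arbitrary zero matters. The two temperatures are arbitrary illustrative values and the function name is our own: differences between readings keep their meaning across the two scales, but ratios do not.

```python
def celsius_to_fahrenheit(c: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

a_c, b_c = 10.0, 20.0
a_f, b_f = celsius_to_fahrenheit(a_c), celsius_to_fahrenheit(b_c)

# Differences are preserved up to the unit size (1 Celsius degree = 1.8 Fahrenheit degrees).
print(b_c - a_c, b_f - a_f)   # 10.0 18.0

# Ratios are not preserved, because each scale places zero at a different point.
print(b_c / a_c, b_f / a_f)   # 2.0 1.36
```

So 20 °C cannot be called "twice as hot" as 10 °C. The same two readings expressed in Fahrenheit give a ratio of 1.36.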
20. Fourth Level of
Measurements
In ratio measurement, there is always an absolute zero that is meaningful.
21. • A ratio is a fixed relation in degree
or number between two similar things.
The major difference between interval
and ratio data is that ratio data have an
absolute zero. Ratio data are the class
most frequently used by healthcare
professionals who deal with patients’
physical attributes.
22. Examples
• Examples of ratio data include
height, weight, velocity, distance,
heart rate, VO2 max, force, torque,
etc. With this classification, all
mathematical operations are valid.
23. RELIABILITY
The consistency of measurements
A RELIABLE TEST
Produces similar scores across
various conditions and situations,
including different evaluators and
testing environments.
24. • A measurement procedure is reliable when it
yields consistent scores while the phenomenon
being measured is not changing.
• Degree to which scores are free of
“measurement error”
• Consistency of measurement
25. • Valid = faithful, true
• What is assessed is indeed what is
intended to be assessed
• Denotes the extent to which an
instrument is measuring what it is
supposed to measure.
26. • Necessary but not sufficient
• Reliability is a prerequisite for
measurement validity
• One needs reliability, but it’s not
enough
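27. • Measuring height with a reliable
bathroom scale: the readings are
consistent, but they reflect weight, not
height
• Measuring “aggression” with observer
agreement by observing a kid hitting a
Bobo doll: observers may agree closely,
yet agreement alone does not show that
hitting the doll reflects aggression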
29. Inter-rater reliability
• Used to assess the degree to which
different raters/observers give consistent
estimates of the same phenomenon.
30. • So how do we determine whether two
observers are being consistent in their
observations?
31. • The degree of agreement
between the scores from two
raters following observation
and rating of the same subject.
A correlation of .85 or higher is
expected when comparing the
objective competency of two
raters under the same testing
condition.
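A minimal Python sketch of such a check. It assumes the Pearson correlation coefficient as the measure of agreement (the slide says only "correlation") and uses hypothetical range-of-motion readings; the function name `pearson_r` is our own.

```python
from math import sqrt

def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical knee-flexion readings (degrees) for six patients, one list per rater.
rater_a = [110, 95, 120, 130, 100, 115]
rater_b = [112, 93, 118, 133, 104, 113]

r = pearson_r(rater_a, rater_b)
print(f"inter-rater r = {r:.2f}, meets the .85 threshold: {r >= 0.85}")
```

A correlation only shows that the two raters rank and space the patients similarly. It would not reveal one rater reading consistently higher than the other, so the raw differences are worth inspecting as well.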
32. Intra-rater reliability
• Means that one person should come
out with the same results on every
repetition of the test, within an
acceptable level.
• Consistency in measurement and
scoring by the evaluator when two
test results from two similar
situations are correlated.
33. • Test-retest
• SAME TEST – DIFFERENT TIMES
• Testing the same phenomenon at two
different times. Used to assess the
consistency of a measure from one time
to another.
34. • This approach assumes that there is no
substantial change in the construct being
measured between the two occasions.
35. • The amount of time allowed between
measures is critical. If we measure the
same thing twice, the correlation
between the two observations will
depend in part on how much time
elapses between the two measurement
occasions.
• The shorter the time gap, the higher the
correlation; the longer the time gap, the
lower the correlation.
36. Parallel-forms reliability
• Used to assess the consistency of the results of
two tests constructed in the same way from the
same content domain.
37. • In parallel-forms reliability, you first
have to create two parallel forms. One way
to accomplish this is to create a large set
of questions that address the same
construct and then randomly divide the
questions into two sets. You administer
both instruments to the same sample of
people.
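The random division described above can be sketched in Python. The item identifiers, the pool size of 20, and the seed value are all assumptions made for illustration.

```python
import random

# Hypothetical pool of 20 item identifiers, all written for the same construct.
item_pool = [f"Q{i}" for i in range(1, 21)]

# Seeded generator so the split is reproducible; the seed value is arbitrary.
rng = random.Random(42)
shuffled = item_pool[:]
rng.shuffle(shuffled)

# Randomly divide the pool into two forms of equal length.
half = len(shuffled) // 2
form_a, form_b = shuffled[:half], shuffled[half:]
print("Form A:", form_a)
print("Form B:", form_b)
```

Both forms would then be given to the same sample of people, and the scores on the two forms compared to estimate reliability.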
38. • Useful when multiple equivalent forms of the same
test are needed.
• Particularly useful when responses to the earlier test
items can easily be recalled and could influence the
responses on the second test after a lapse of time
(alternate forms).
• Although the forms contain different questions,
corresponding items on each form are expected to be
equivalent, making the tests equal at a given point in
time (parallel forms).
• The correlation between the forms should be .80.
39. • Used to assess the consistency of
results across items within a test.
• (Internal consistency): The
association of answers to a set of
questions designed to measure the
same concept.
40. • In internal consistency reliability
estimation, we use a single measurement
instrument, administered to a group of
people on one occasion, to estimate
reliability.
• In effect, we judge the reliability of the
instrument by estimating how well the
items that reflect the same construct yield
similar results.
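One widely used statistic for this estimate is Cronbach's alpha. The slides do not name a specific statistic, so its use here is an assumption, as are the function name and the hypothetical responses. For k items, alpha = k / (k - 1) × (1 − sum of item variances / variance of total scores).

```python
from statistics import variance

def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha, where scores[p][i] is respondent p's score on item i."""
    k = len(scores[0])                        # number of items
    items = list(zip(*scores))                # one tuple of scores per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Hypothetical responses: 5 respondents x 4 items, from a single administration.
responses = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}")
```

Values closer to 1 indicate that the items vary together, that is, they yield similar results for the same respondents.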
41. • When the consistency of ratings made by
the same observer, rather than by the
subjects themselves, is assessed across
occasions, this is called intraobserver
reliability or intrarater reliability.
• Answers about the past are less reliable
when they are very specific, because the
questions may exceed the subjects’ capacity
to remember accurately.
43. • Construct validity is the
approximate truth of the conclusion
that your operationalization
accurately reflects its construct.
44. • Face Validity
• Confidence gained from careful inspection of a
concept to see if it’s appropriate “on its face”
• In face validity, you look at the test items and
see whether, "on their face," they seem like a
good translation of the construct. This is
probably the weakest way to try to demonstrate
construct validity.
45. • Content Validity
• Also called “sampling validity”
• Establishes that the measure covers the full
range of the concept’s meaning, i.e., covers all
dimensions of a concept
• In content validity, you essentially check the
test items against the relevant content domain
for the construct. This approach assumes that
you have a good, detailed description of the
content domain.
46. • Actually, I think face and content
validity are probably the same thing
47. EMPIRICAL Validity
• Establishes that the results from one
measure match those obtained with a
more direct or already validated
measure of the same phenomenon (the
“criterion”)
• Includes:
- Concurrent validity
- Predictive validity
48. Concurrent Validity
• Validity exists when a measure yields scores that
are closely related to scores on a criterion
measured at the same time
• Does the new instrument correlate highly with
an old measure of the same concept that we
assume (judge) to be valid? (use of “good”
judgment).
• The extent of agreement between two
simultaneous measures of the same behavior or
trials.
49. Predictive Validity
• Exists when a measure is validated by
predicting scores on a criterion measured
in the future
• Are future events, which we judge to be a
result of the concept we’re measuring,
anticipated [predicted] by the scores we’re
attempting to validate?
• Use of “good” judgment
50. Construct validity
• Established by showing that:
(1) The measure is related to a variety of
other measures as specified in a theory
(used when no clear criterion exists for
validation purposes)
(2) The test consists of a set of
interrelated items
(3) The operationalization has not
included separate concepts
51. • Check the intercorrelation of the items
used to measure the construct judged to
be valid
• Use theory to predict a relationship,
use a measure of the other variable that
is judged to be valid, then check for the
relationship
• Demonstrate that your measure isn’t
related to measures, judged to be valid,
of unrelated concepts
52. • Convergent validity: achieved when one
measure of a concept is associated with
different types of measures of the same
concept (this relies on the same type of
logic as measurement triangulation)
• Measures are intercorrelated
53. • Discriminant validity: scores on the
measure to be validated are compared to
scores on measures of different but
related concepts. Discriminant validity
is achieved if the measure to be
validated is NOT strongly associated
with the measures of different concepts
• Measure is not related to unrelated
concepts