[Figure: Reports 1 to 3 are obtained on Days 1 to 3. In one case the report is the same every time (reliability) and matches what was asked (validity). In the other case the reports differ every time or differ from what was asked.]
What do reliability, validity and utility mean?
• The same information is obtained every time the same situation arises - RELIABILITY
• The information is what is wanted - VALIDITY
• It is practically possible to obtain the information - UTILITY
• These qualities help us trust a person or a machine
• They facilitate outcomes such as making friends, hiring employees, getting service deals for equipment, etc.
Gareis and Grant (2008), Teacher-made Assessments: How to Connect Curriculum, Instruction, and Student Learning, p. 33, taken from https://reliablerubrics.com/2015/03/18/what-is-reliability-and-validity/
What are reliability, validity and utility?
• Reliability is related to the consistency of measurement
• Validity is related to success in measuring what is intended
• If a tool is valid, it is also reliable, BUT if a tool is reliable, it is not necessarily valid
• A tool may be reliable and valid, but it should also be practically possible to use it
• When experiments, tests, or measuring procedures are reliable and valid, results from replicated studies can support claims of generalization of findings and contribute to research-based evidence
Sources of ‘error’ in measurement
• ‘Error’ in measurement means a variation from the true reading
• ‘Errors’ in measurement can be due to different reasons
• The sampling of items:
  • type of items
  • relation to the construct and its aspects
  • number of items
• How the tool is used
• How participants respond:
  • Guessing
  • Marking answers incorrectly
  • Skipping questions by mistake
  • Misinterpreting test instructions
Random error and systematic error
Any measurement can have two types of errors (variations in repeated readings):
• Random error: caused by chance. E.g., if the responder is distracted
• Systematic error: caused by a specific reason. E.g., if the responder already has experience in the construct being tested
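The difference between the two error types can be illustrated with a small simulation. This is only a sketch under stated assumptions: the true score of 50, the noise level, the bias of 4 and the function name `simulate_readings` are all hypothetical and not taken from any study. Random error scatters readings around the true value and largely averages out over many readings, while systematic error shifts every reading in the same direction and does not average out.

```python
import random
import statistics


def simulate_readings(true_value, n, noise_sd=0.0, bias=0.0, seed=0):
    """Return n simulated readings of true_value.

    noise_sd models random error (chance variation around the true value).
    bias models systematic error (a constant shift with a specific cause).
    All values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    return [true_value + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]


true_score = 50.0

# Random error only: readings vary, but their mean stays near the true score.
random_only = simulate_readings(true_score, n=2000, noise_sd=5.0)

# Random plus systematic error: the mean is shifted by roughly the bias.
with_bias = simulate_readings(true_score, n=2000, noise_sd=5.0, bias=4.0)

mean_random_only = statistics.mean(random_only)
mean_with_bias = statistics.mean(with_bias)

print("Mean with random error only:", round(mean_random_only, 2))
print("Mean with added systematic error:", round(mean_with_bias, 2))
```

Taking more readings narrows the effect of random error on the mean, but it leaves the systematic shift untouched.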
How to test reliability?
Test the measurement tool for:
• Stability: Is the tool giving the same result on repeated measurements?
• Internal consistency: Are the items in the tool measuring the same construct?
• Equivalence: When two people administer the tool, will the results be the same? If we have two versions of the tool, will they give equivalent results?
Stability
• Question asked: Is the instrument or data collection (measurement) tool able to give the same results on repeated administrations?
• The method: test-retest reliability
• The instrument is administered two times (about 15 days apart) and the correlation coefficient of the readings obtained on the two occasions is used as the reliability coefficient
• Advantage: can see consistency across time
• Limitations: can be affected by:
  • Memory, if the duration between tests is too short
  • Maturation, if the duration between tests is too long
E.g., test-retest correlation for two sets of scores of many college students on the Rosenberg Self-Esteem Scale. Pearson’s r for these data is +.95
https://opentextbc.ca/researchmethods/chapter/reliability-and-validity-of-measurement/
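The test-retest coefficient described above is a Pearson correlation between the two administrations. A minimal sketch follows, assuming made-up scores for eight respondents (these are not the Rosenberg data from the example) and a hand-written helper named `pearson_r`, both of which are illustrative only.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores of the same 8 respondents, about 15 days apart.
first_administration = [22, 25, 17, 30, 19, 27, 24, 21]
second_administration = [23, 24, 18, 29, 20, 26, 25, 20]

test_retest_r = pearson_r(first_administration, second_administration)
print("Test-retest reliability coefficient:", round(test_retest_r, 3))
```

A coefficient close to +1 indicates that respondents kept roughly the same relative standing across the two administrations.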
Internal Consistency
• Question asked: Are the items in the tool measuring the same construct or concept, or parts of the same concept?
• The methods: split-half reliability, Cronbach’s alpha (coefficient alpha) and Kuder-Richardson Formula 20 (KR-20)
• For split-half reliability, the items are split into two halves. The correlation coefficient for the two sets of readings from the two halves is the split-half reliability coefficient.
• Advantage: is not affected by memory or maturation effects
• Limitation: does not consider fluctuations across time
E.g., split-half correlation between scores on the odd-numbered and even-numbered items for many college students on the Rosenberg Self-Esteem Scale. Pearson’s r for these data is +.88
https://opentextbc.ca/researchmethods/chapter/reliability-and-validity-of-measurement/
Find the split-half correlations for all possible combinations of halves and take their mean. Conceptually, that mean is Cronbach’s alpha.
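Both internal-consistency measures can be sketched in a few lines. The data below (six respondents, four items) are hypothetical, as are the helper names. Note one assumption: alpha is computed here with the usual variance-based formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), rather than by literally averaging every possible split-half correlation.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def split_half_r(item_scores):
    """Correlate totals on odd-numbered items with totals on even-numbered items.

    item_scores is a list of respondents, each a list of item scores.
    """
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return pearson_r(odd_totals, even_totals)


def cronbach_alpha(item_scores):
    """Cronbach's alpha from item variances and the variance of total scores."""
    k = len(item_scores[0])  # number of items
    item_variances = [statistics.variance(col) for col in zip(*item_scores)]
    total_variance = statistics.variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: 6 respondents x 4 items on a 1-5 scale.
responses = [
    [4, 4, 3, 4],
    [2, 3, 2, 2],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
    [4, 5, 4, 4],
]

half_r = split_half_r(responses)
alpha = cronbach_alpha(responses)
print("Split-half correlation (odd vs even items):", round(half_r, 3))
print("Cronbach's alpha:", round(alpha, 3))
```

The odd/even split is only one of many possible splits, which is why alpha is often preferred as a single summary.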
Equivalence
Questions asked are:
• When two people administer the tool, will the results be the same?
• If we have two versions of the tool, will they give equivalent results?
The methods:
• Inter-rater reliability
• Alternate form test reliability
The statistics used include:
• Kendall’s tau
• Intra-class correlations
• Rasch’s item-response model
Bannigan & Watson, 2009; Drost, 2011
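As an illustration of inter-rater reliability, Kendall’s tau can be computed from how often two raters agree on the ordering of each pair of respondents. The sketch below assumes the simple tau-a variant, which suits data without tied scores; the rankings and the function name `kendall_tau_a` are hypothetical.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs.

    Tied pairs count as neither concordant nor discordant, so this variant
    is only a reasonable choice when ties are absent or rare.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both raters order this pair the same way
        elif direction < 0:
            discordant += 1   # the raters order this pair oppositely
    n_pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / n_pairs


# Hypothetical rankings of 6 respondents by two raters using the same tool.
rater_a = [1, 2, 3, 4, 5, 6]
rater_b = [2, 1, 3, 4, 6, 5]

tau = kendall_tau_a(rater_a, rater_b)
print("Kendall's tau between raters:", round(tau, 3))
```

The same function could be applied to scores from two alternate forms of a tool instead of two raters.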
How to make tests more reliable?
• Write items clearly
• Make test instructions easily understood
• Train raters effectively by making the rules of scoring clear
• Add more items
• Obtain a reliability coefficient of 0.7 to 0.8
• When the situation demands, a reliability coefficient of 0.9 can be sought
Most attributes that are to be measured in the social sciences, such as education, are construct variables.
Examples: happiness, intelligence, anxiety, academic achievement, fear, personality, etc.
Construct variables are variables or attributes that are abstract in nature. They cannot be universally defined, and hence are not understood in the same way by everyone.
Construct variables have to be operationally defined based on the theory about them.
What are the types of validity?
• Content validity at two levels
  • Face validity
  • Content validity
• Criterion validity
  • Concurrent validity
  • Predictive validity
• Construct validity
  • Convergent validity
  • Divergent validity
  • Factorial validity
  • Discriminant validity
• Statistical conclusion validity
Content Validity at two levels
Face Validity
• Subject experts or researchers check the test tool to see if the items are reasonable and relevant
• Focus is to confirm ‘subject’s acceptance of text’
• Informal
Content Validity
• Critical review by an expert panel to check if the content of the test tool matches all aspects of the construct, for clarity and completeness
• Comparison with literature
• Both of the above
• Formal
• Content validity index (CVI) or content validity ratio (CVR) can be used
Bannigan & Watson (2009)
Calculation of Content validity index
• Calculate the item-wise content validity index (I-CVI):
  • I-CVI = number of experts who rate the item ‘very relevant’ / total number of experts
• The above measure is calculated for each item on the tool
• Items with I-CVI > 0.79 are taken as relevant
• Items with I-CVI between 0.70 and 0.79 are taken as needing revision
• Items with I-CVI below 0.70 are eliminated
(Rodrigues, Adachi and Beattie, 2017)
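The I-CVI rule above is simple arithmetic and can be sketched as follows. The expert ratings, item names, rating labels and function names are hypothetical. The cut-offs are the ones stated above (above 0.79 relevant, 0.70 to 0.79 needing revision, below 0.70 eliminated).

```python
def item_cvi(ratings, relevant_label="very relevant"):
    """I-CVI: share of experts who rated the item with the relevant label."""
    if not ratings:
        raise ValueError("at least one expert rating is required")
    agreeing = sum(1 for rating in ratings if rating == relevant_label)
    return agreeing / len(ratings)


def classify_item(i_cvi):
    """Apply the cut-offs: > 0.79 relevant, 0.70-0.79 revise, < 0.70 eliminate."""
    if i_cvi > 0.79:
        return "relevant"
    if i_cvi >= 0.70:
        return "needs revision"
    return "eliminate"


# Hypothetical ratings from a panel of 4 experts for 3 items.
expert_ratings = {
    "item_1": ["very relevant", "very relevant", "very relevant", "very relevant"],
    "item_2": ["very relevant", "very relevant", "very relevant", "somewhat relevant"],
    "item_3": ["very relevant", "not relevant", "somewhat relevant", "very relevant"],
}

decisions = {}
for item, ratings in expert_ratings.items():
    score = item_cvi(ratings)
    decisions[item] = classify_item(score)
    print(item, "I-CVI =", round(score, 2), "->", decisions[item])
```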
Criterion validity: comparison to established ‘criteria’
Concurrent validity
• The test tool is compared to already established ‘criteria’ by conducting both tests at the same time
• Procedure applied to a scale under development
• Correlation of each question with the criterion score is used to refine the questionnaire
• E.g., a culture-relevant short test for self-esteem is created and compared to an existing self-esteem tool.
Predictive validity
• The test tool is compared to already established ‘criteria’ by conducting the test tool at one time and the criterion tool at a future time
• Procedure used to check whether the test can predict performance in the construct
• Correlation between test scores and targeted outcomes
• E.g., TET test scores used to predict how well teachers perform in their classes
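The idea of refining a questionnaire by correlating each question with the criterion score can be sketched as below. Everything here is an assumption made for illustration: the item names, the scores, and especially the 0.3 cut-off for flagging a weak item, which does not come from the sources cited in these slides.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical criterion scores from an established tool for 6 respondents.
criterion_scores = [10, 14, 18, 22, 26, 30]

# Hypothetical responses of the same respondents to 3 items of the new scale.
new_scale_items = {
    "q1": [1, 2, 2, 3, 4, 4],
    "q2": [3, 1, 4, 2, 3, 2],
    "q3": [2, 2, 3, 4, 4, 5],
}

WEAK_ITEM_CUTOFF = 0.3  # assumed threshold, for illustration only

item_correlations = {
    item: pearson_r(scores, criterion_scores)
    for item, scores in new_scale_items.items()
}
weak_items = [item for item, r in item_correlations.items() if r < WEAK_ITEM_CUTOFF]

for item, r in item_correlations.items():
    print(item, "correlation with criterion:", round(r, 3))
print("Items to review:", weak_items)
```

For predictive validity the same calculation applies, with the criterion scores collected at a later time than the test scores.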
Construct validity: correlation between the test tool and the construct under investigation
• It is an indirect approach
• Relevant when the scale or test tool has been developed based on the assumption of a particular theory
• Starts with defining the topic or construct to be assessed. This involves stating:
  • Hypotheses about correlations with other instruments
  • Which respondents would score low or high
  • Other findings that can be predicted from the scores
Construct validity
• Convergent validity: How similar is this scale to other scales measuring the same or related concepts?
• Divergent validity: How different is this scale from scales measuring different concepts?
• Factorial validity: What are the factors that are exactly covered by the items on the scale? Uses Factor Analysis
• Discriminant validity: Can the scale discriminate among people with differing values on the construct? Uses Discriminant Analysis
• Exploratory factor analysis: Many variables. Find the variables that would relate to the construct
• Confirmatory factor analysis: Variables and their relations are known. Use data to test hypotheses about their relations
Statistical conclusion validity
• Question asked: Is the inference obtained on the relationship tested trustworthy and dependable?
• Threats to statistical conclusion validity are:
  • Low statistical power
  • Violation of assumptions
  • Reliability of measures
  • Reliability of treatment
  • Random irrelevancies of experimental setting
  • Random heterogeneity of respondents
Drost, 2011, p. 115
Utility or Practicality of a test
Can my test tool or scale actually be used in the field?
• Time to administer
• Ease of administering
• Easy language
• Does not bore respondents, since boredom can increase error
• Need to test the scale in different settings as part of reliability
• The tests should be cost-effective
Summary
• Reliability and validity are important checks to ensure trustworthiness of research findings
• Reliability addresses random errors in measurement
• Validity addresses systematic errors in measurement
• Utility addresses the practicability of using the measurement tool
• Reliability measures are in the form of correlation coefficients
• Validity measures are in the form of comments, reviews, and correlation coefficients
• Correlation coefficients, factor analysis and discriminant analysis can be obtained using statistical software such as SPSS, PSPP and JASP
References
• https://opentextbc.ca/researchmethods/chapter/reliability-and-validity-of-measurement/
• Drost, E. A. (2011). Validity and Reliability in Social Science Research. Education Research and Perspectives, 38(1), 105-123. https://www3.nd.edu/~ggoertz/sgameth/Drost2011.pdf
• Bannigan, K., & Watson, R. (2009). Reliability and validity in a nutshell. Journal of Clinical Nursing, 18(23), 3237–3243. https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1365-2702.2009.02939.x
• Rodrigues, I. B., Adachi, J. D., Beattie, K. A., et al. (2017). Development and validation of a new tool to measure the facilitators, barriers and preferences to exercise in people with osteoporosis. BMC Musculoskelet Disord, 18, 540. https://doi.org/10.1186/s12891-017-1914-5. https://bmcmusculoskeletdisord.biomedcentral.com/track/pdf/10.1186/s12891-017-1914-5.pdf