Reliability and validity are important concepts for researchers to consider when developing and evaluating measurement tools and methods. Reliability refers to the consistency of a measure, while validity refers to its accuracy. There are different types of reliability, including test-retest reliability and internal consistency; the latter is commonly estimated using Cronbach's alpha. Types of validity include face, content, construct, internal, external, statistical conclusion, and criterion-related validity. Researchers must ensure their measures are both reliable, providing consistent results, and valid, accurately measuring the intended construct.
2. • Measurement involves assigning scores to individuals so that they represent some
characteristic of the individuals. But how do researchers know that the scores actually
represent the characteristic, especially when it is a construct like intelligence, self-
esteem, depression, or working memory capacity? The answer is that they conduct
research using the measure to confirm that the scores make sense based on their
understanding of the construct being measured. This is an extremely important point.
Psychologists do not simply assume that their measures work. Instead, they collect
data to demonstrate that they work. If their research does not demonstrate that a
measure works, they stop using it.
• As an informal example, imagine that you have been dieting for a month. Your
clothes seem to be fitting more loosely, and several friends have asked if you have
lost weight. If at this point your bathroom scale indicated that you had lost 10 pounds,
this would make sense and you would continue to use the scale. But if it indicated
that you had gained 10 pounds, you would rightly conclude that it was broken and
either fix it or get rid of it. In evaluating a measurement method, psychologists
consider two general dimensions: reliability and validity.
3. Why reliability and validity?
4. Reliability alone is not enough; measures need to be both reliable and valid. For example, if a
weighing scale is consistently off by 4 kg (it always shows 4 kg less than the actual weight), it can
be described as reliable, because the scale displays the same weight every time we weigh a specific item.
However, the scale is not valid, because it does not display the item's actual weight.
5. Quantitative Data
•Reliability
1. Test/retest
2. Internal Consistency
• Validity
1. Face validity
2. Content validity
3. Construct validity
4. Internal validity
5. External validity
6. Statistical conclusion validity
7. Criterion-related validity
Reliability is about the consistency of a measure,
and validity is about the accuracy of a measure
6. Reliability
• Reliability is the consistency of your measurement, or the
degree to which an instrument measures the same way each
time it is used under the same condition with the same subjects.
• In short, it is the repeatability of your measurement.
• A measure is considered reliable if a person's score on the
same test given twice is similar. It is important to remember that
reliability is not measured, it is estimated.
7. There are two ways that reliability is usually estimated:
1. test/retest
The more conservative method to estimate reliability. Simply put, the idea
behind test/retest is that you should get the same score on test 1 as you do on
test 2.
• The three main components to this method are as follows:
1. Implement your measurement instrument at two separate times for
each subject;
2. Compute the correlation between the two separate measurements;
3. Assume there is no change in the underlying condition (or trait you are
trying to measure) between test 1 and test 2.
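The three steps above can be sketched in a few lines. The scores below are invented for illustration; a real study would use the actual two administrations of the instrument:

```python
import numpy as np

# Hypothetical scores for six subjects on the same instrument,
# administered at two separate times (all data invented).
test1 = np.array([12, 15, 11, 18, 14, 16])
test2 = np.array([13, 14, 11, 17, 15, 16])

# Step 2: compute the correlation between the two administrations.
# Step 3 is an assumption, not a computation: the underlying trait did
# not change between test 1 and test 2. Under that assumption, a
# coefficient close to 1 suggests good test/retest reliability.
r = np.corrcoef(test1, test2)[0, 1]
print(round(r, 3))
```

With these invented scores the correlation comes out high, which is what a reliable instrument should produce when the trait is stable.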
8. 2. Internal Consistency
• Internal consistency estimates reliability by grouping questions in a questionnaire that measure the same concept. One common way of
computing correlation values among the questions on your instrument is by using Cronbach's alpha.
• The formula for Cronbach's alpha is:
α = (N · c̄) / (v̄ + (N − 1) · c̄)
Where:
• N = the number of items.
• c̄ = the average covariance between item pairs.
• v̄ = the average item variance.
• In short, Cronbach's alpha splits all the questions on your instrument every possible way and computes correlation values for them all
(we use a computer program for this part).
• In the end, your computer output generates one number for Cronbach's alpha - and just like a correlation coefficient, the closer it is to
one, the higher the reliability estimate of your instrument. Cronbach's alpha is a less conservative estimate of reliability than test/retest.
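As a sketch of what the computer program does, the formula for Cronbach's alpha can be applied directly to an item-score matrix. The questionnaire data here are invented for illustration:

```python
import numpy as np

# Hypothetical questionnaire data: rows = respondents, columns = items
# (all values invented).
scores = np.array([
    [4, 5, 4, 4],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 3],
    [4, 4, 5, 4],
])

n_items = scores.shape[1]
cov = np.cov(scores, rowvar=False)   # item covariance matrix
v_bar = np.mean(np.diag(cov))        # average item variance
# Average covariance between item pairs (off-diagonal entries).
c_bar = (cov.sum() - np.trace(cov)) / (n_items * (n_items - 1))

# Cronbach's alpha: N * c_bar / (v_bar + (N - 1) * c_bar)
alpha = (n_items * c_bar) / (v_bar + (n_items - 1) * c_bar)
print(round(alpha, 3))
```

As with a correlation coefficient, the closer the resulting alpha is to one, the higher the reliability estimate.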
11. What is validity in research?
validity is about the accuracy of a measure
• Research validity in surveys relates to the extent to which the survey measures
the elements that need to be measured. In simple terms, validity refers to how
well an instrument measures what it is intended to measure.
• Validity and reliability make the difference between “good” and “bad”
research reports. Quality research depends on a commitment to testing and
increasing the validity as well as the reliability of your research results.
12. Types of Validity:
Here are the 7 key types of validity in research:
1. Face validity
2. Content validity
3. Construct validity
4. Internal validity
5. External validity
6. Statistical conclusion validity
7. Criterion-related validity
13. 1. Face validity
• Face validity is how valid your results seem based
on what they look like. This is the least scientific
method of validity, as it is not quantified using
statistical methods.
• Face validity is not validity in a technical sense of the
term. It is concerned with whether it seems like we
measure what we claim.
• Face validity is one of the methods used to assess content
validity. It is important to check whether an
instrument is valid for a particular culture and
whether it contains unclear or unrelated items.
Statistically, however, face validity is considered weak
because the judgments involved may be subjective.
14. 2. Content validity
• Content validity is similar to face validity, but the
two use different approaches to check for validity.
Face validity is an informal check: anyone could take
a test at its "face value" and say it looks good.
Content validity uses a more formal, systematic
approach, usually with experts in the field. These
experts judge the questions on how well they cover
the material.
• Content validity and internal consistency are similar,
but they are not the same thing. Content validity is
how well an instrument (i.e. a test or questionnaire)
measures a theoretical construct. Internal consistency
measures how well some test items or
questions measure particular characteristics or
variables in the model. For example, you might have a
ten-question customer satisfaction survey with three
questions that test for “overall satisfaction with phone
service.” Testing those three questions for satisfaction
with phone service is an example of checking for
internal consistency; taking the whole survey and
making sure it measures “customer satisfaction”
would be an example of content validity.
15. 3. Construct validity
• Construct validity is one way to test the validity of a test; it is used in education,
the social sciences, and psychology. It demonstrates that the test is actually
measuring the construct it claims to measure. For example, you might try to
find out whether an educational program increases emotional maturity in elementary-
school-age children. Construct validity would assess whether your research is actually
measuring emotional maturity.
• Examples are measurements of mental attributes, such as intelligence, level of
emotion, proficiency, or ability.
• There are two types of construct validity:
• Convergent validity
• Discriminant validity
16. Convergent validity and discriminant
validity are commonly regarded as
subsets of construct validity.
• Convergent validity tests that constructs that are expected to be related are, in fact, related. Discriminant validity
(or divergent validity) tests that constructs that should have no relationship do, in fact, have no relationship.
• Imagine that a researcher wants to measure self-esteem (self-respect), but she also knows that four other
constructs (self-worth, confidence, social skills, and self-appraisal) are related to self-esteem and have some
overlap with it. The ultimate goal is to isolate self-esteem.
• In this example, convergent validity would test that those four constructs are, in fact, related to self-esteem in
the study. The researcher would also check that self-worth and confidence, and social skills and self-appraisal,
are related to each other.
• Discriminant validity would ensure that, in the study, the non-overlapping factors do not overlap. For example,
self-esteem and intelligence should not relate (too much) in most research projects.
• As you can see, separating and isolating constructs is difficult, and it is one of the factors that makes social
science extremely difficult.
• Social science rarely produces research that gives a yes or no answer, and the process of gathering knowledge is
slow and steady, building on top of what is already known.
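A rough numerical sketch of the self-esteem example: in practice, researchers check convergent and discriminant validity by examining correlations between construct scores. All scores below are invented for illustration:

```python
import numpy as np

# Invented scores on three constructs for ten respondents.
self_esteem  = np.array([3, 5, 4, 2, 5, 3, 4, 2, 5, 4])
self_worth   = np.array([3, 5, 5, 2, 4, 3, 4, 2, 5, 4])
intelligence = np.array([4, 3, 5, 3, 2, 5, 4, 2, 3, 4])

# Convergent validity: constructs expected to be related (self-esteem
# and self-worth) should correlate highly.
r_convergent = np.corrcoef(self_esteem, self_worth)[0, 1]

# Discriminant validity: constructs expected to be unrelated (self-esteem
# and intelligence) should correlate weakly.
r_discriminant = np.corrcoef(self_esteem, intelligence)[0, 1]
print(round(r_convergent, 3), round(r_discriminant, 3))
```

A high convergent correlation together with a near-zero discriminant correlation is the pattern that supports construct validity.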
17. Questionnaire
The THinK questionnaire included 16 items, using a 5-level Likert scale (yes, much, somewhat, little, no). Questions concerned general knowledge about vaccination (acceptance, administration, effectiveness), HPV and related risks, and acceptability of the vaccine. The age, birthplace, and education of each respondent were also requested.
18. Internal consistency of the THinK questionnaire (Cronbach's alpha):
Whole questionnaire: 0.816
kHPV (knowledge of HPV infection): 0.882
aHPV (attitude toward getting vaccinated against HPV): 0.784
KV (knowledge about vaccines): 0.732
19. 4. Internal validity
• Internal validity is the extent to which a study establishes a trustworthy cause-and-
effect relationship between a treatment and an outcome. It also reflects whether a given study
makes it possible to eliminate alternative explanations for a finding. For example, if you
implement a smoking cessation program with a group of individuals, how sure can
you be that any improvement seen in the treatment group is due to the treatment that you
administered?
• Internal validity depends largely on the procedures of a study and how rigorously it is
performed.
• Internal validity is not a "yes or no" type of concept. Instead, we consider how confident
we can be in the findings of a study, based on whether it avoids traps that may make the
findings questionable.
• The less chance there is for confounding in a study, the higher the internal
validity and the more confident we can be in the findings. Confounding refers to a
situation in which other factors come into play that confuse the outcome of a study,
leaving us unsure whether we can trust that we have identified the
"cause-and-effect" relationship described above.
20. 5. External validity
• External validity refers to how well the outcome of a study can be expected to
apply to other settings. In other words, this type of validity refers to how
generalizable the findings are. For instance, do the findings apply to other people,
settings, situations, and time periods?
• Ecological validity, an aspect of external validity, refers to whether a study's
findings can be generalized to the real world.
• While rigorous research methods can ensure internal validity, those same methods
may limit external validity.
• Another term called transferability relates to external validity and refers to
the qualitative research design. Transferability refers to whether results transfer to
situations with similar characteristics.
21. External vs. Internal Validity
Internal validity is a way to gauge how strong your research
methods were. External validity helps to answer the question: can the
research be applied to the “real world”? If your research is applicable to
other situations, external validity is high. If the research cannot be replicated
in other situations, external validity is low.
22. 6. Statistical conclusion validity
Statistical conclusion validity (SCV), or simply conclusion validity, is a measure of
how reasonable a research or experimental conclusion is. For example, let's say you
ran some research to find out whether two years of preschool is more effective than one.
Based on the data, you conclude that there's a positive relationship between how
well a child does in school and how many years of preschool they attended.
Conclusion validity tells you how trustworthy that conclusion is.
Conclusion validity is only concerned with the question:
based on the data, is there a relationship or isn't there? It
doesn't delve into specifics (like reliability tests) about what
kind of relationship exists. It can be used for qualitative
research as well as quantitative research. That said, if you use the
term statistical conclusion validity, that's usually taken to
mean some type of statistical data analysis is involved
(i.e. that your research has quantitative data).
23. 7. Criterion-related validity
• Criterion validity (or criterion-related validity) measures how well one measure predicts an
outcome for another measure. A test has this type of validity if it is useful for predicting
performance or behavior in another situation (past, present, or future). For example:
• A job applicant takes a performance test during the interview process. If this test accurately
predicts how well the employee will perform on the job, the test is said to have criterion validity.
• A graduate student takes the GRE. The GRE has been shown as an effective tool (i.e. it has
criterion validity) for predicting how well a student will perform in graduate studies.
• The first measure (in the above examples, the job performance test and the GRE) is sometimes
called the predictor variable or the estimator. The second measure is called the criterion
variable, provided it is known to be a valid indicator of the outcome of interest.
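As a sketch, criterion-related validity is often summarized by correlating the predictor with the criterion, yielding a "validity coefficient". The hiring-test data below are invented for illustration:

```python
import numpy as np

# Invented data: a hiring performance test (the predictor) and later
# on-the-job performance ratings (the criterion) for eight employees.
test_scores = np.array([55, 70, 62, 80, 48, 75, 66, 90])
job_ratings = np.array([3.1, 3.8, 3.4, 4.4, 2.9, 4.0, 3.6, 4.7])

# Validity coefficient: how well the predictor tracks the criterion.
# A value close to 1 supports criterion-related validity of the test.
validity_coef = np.corrcoef(test_scores, job_ratings)[0, 1]
print(round(validity_coef, 3))
```

With these invented numbers the coefficient is very high; real validity coefficients for selection tests are usually far more modest.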