Validity and Reliability
Session 2
Chapter 4
Colton & Covert (2007)
What is Validity?
According to Colton and Covert (2007), validity is
“the ability of an instrument to measure what you intend
it to measure” (p. 65).
Validity ensures trustworthy and credible information.
“Validity is a matter of degree.”
(Colton & Covert, 2007, p. 65)
• Assessment instruments are not merely valid or
invalid.
• Validity exists in varying degrees across a
continuum.
• Validity is a characteristic of the responses/data
gathered.
• The greater the evidence of validity, the greater
the likelihood of credible, trustworthy data.
• Hence the importance of establishing/testing
validity before the instrument is used.
In order to gather evidence that an
instrument is valid, we need to establish
that it is measuring:
1. the right content (Content Validity)
(Does the instrument measure the content it’s intended to measure?)
2. the right construct (Construct Validity)
(Does the instrument measure the construct it’s designed to measure?)
3. the right criterion (Criterion Validity)
(Do the instrument scores align with 1 or more standards or outcomes
related to the instrument’s intent?)
Establishing Evidence of
Content Validity
To determine this, ask:
Do the items in the instrument represent
the topics or process being investigated?
Ex: An instrument designed to measure alcohol use
should measure behaviors associated with alcohol use
(not smoking, drug use, etc.).
These steps are done during the assessment development stage:
1. Define the content domain that the assessment intends to measure.
2. Define the components of the content domain that should be
represented in the assessment through a literature review.
3. Write the items/questions that reflect this defined content domain.
4. Have a panel of topic experts review the items/questions.
Establishing Evidence of
Content Validity
You are to design an instrument to measure undergraduate college teaching
effectiveness.
1. Clearly define the domain of the content that the assessment intends to represent.
Determine the topics/principles related to college teaching effectiveness using the
literature.
2. Define the components of the content domain that should be represented in the
assessment.
Select the content areas that are specific to effective undergraduate college
teaching (not graduate school or adult learning)
3. Write items/questions that reflect this defined content domain.
Write response items for each component.
4. Have a panel of topic experts review the response items for clarity and coverage.
An Example: Establishing Evidence of
Content-related Validity
Recommended method for a response-item
review by a panel of topic experts (Popham, 2000):
1. Have a panel of experts individually examine each item for
content relevance, noting YES (it’s relevant) or NO (not
relevant).
2. Calculate the percentage of YES responses for each item
and then the average percent of YES for all items. This
reflects item relevance.
3. Have panel members individually review the instrument for
content coverage, noting a percentage.
4. Compute an average percentage of all panelist estimates of
coverage. This reflects content coverage.
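A minimal sketch of this tally in Python, assuming three hypothetical panelists rating a five-item instrument (all names and numbers are made up for illustration, not from Popham, 2000):

```python
# Each row holds one panelist's relevance judgments (True = "YES, relevant").
relevance = [
    [True, True, True, False, True],   # panelist 1, items 1-5
    [True, True, True, True,  True],   # panelist 2
    [True, False, True, True, True],   # panelist 3
]
n_panelists = len(relevance)

# Steps 1-2: percent of YES responses per item, then the average across items.
per_item = [100 * sum(col) / n_panelists for col in zip(*relevance)]
item_relevance = sum(per_item) / len(per_item)

# Steps 3-4: average the panelists' content-coverage estimates (0-100%).
coverage_estimates = [90, 85, 80]
content_coverage = sum(coverage_estimates) / len(coverage_estimates)

print(f"Item relevance: {item_relevance:.0f}%")
print(f"Content coverage: {content_coverage:.0f}%")
```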
What do the results mean?
95% item relevance
85% content coverage
Impressive evidence of content-related validity!
You could say, with relative confidence, that the instrument
validly measures the content it intends to measure.
---------------------
65% item relevance
40% content coverage
Poor evidence of content-related validity
You could NOT say, with confidence, that the instrument validly
measures the content it intends to measure.
Establishing Criterion Validity
To determine this, ask:
Are the results from the instrument
comparable to an external standard or
outcome?
There are 2 types:
1. Predictive-related validity
2. Concurrent-related validity
1. Predictive-related Criterion Validity
The assessment scores are valid for predicting future
outcomes regarding similar criteria.
A significant lag time is needed.
Ex: A group of students take a standardized math & verbal
aptitude test in 10th grade and score very low. In the
students’ senior year, 2 years later, the students’ math and
verbal aptitude scores (criterion data) on the SAT (a college
entrance exam) turn out to be similarly low.
In this case, evidence of predictive criterion-related validity has been established.
We can trust the predictive inferences regarding math & verbal skills made from
this standardized instrument.
2. Concurrent-related Criterion Validity
The assessment scores are valid for indicating
current behavior.
Ex: A group of students take a standardized math &
reading comprehension aptitude test in 10th grade
and receive very low scores. The scores are
compared to grades in 10th grade algebra and English
literature courses. They are equally low.
In this case, evidence of concurrent criterion-related validity has been established.
We trust the inferences regarding math and reading comprehension scores made
from the standardized instrument.
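Both criterion studies boil down to the same computation: correlate the instrument's scores with the criterion scores. A minimal sketch, assuming scipy is available and using fabricated scores (pair the test with later SAT scores for predictive evidence, or with current course grades for concurrent evidence):

```python
from scipy.stats import pearsonr

# Fabricated 10th-grade aptitude scores and criterion scores for 8 students.
tenth_grade_scores = [410, 450, 380, 500, 430, 470, 390, 520]
criterion_scores   = [420, 470, 360, 510, 450, 460, 400, 540]

r, p_value = pearsonr(tenth_grade_scores, criterion_scores)
print(f"criterion-related validity coefficient: r = {r:.2f} (p = {p_value:.3f})")
```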
Establishing Construct Validity
To determine this, ask:
Does the instrument measure the construct
(i.e. psychological characteristic or human
behavior) it’s designed to measure?
Note: “Constructs” are hypothetical or theoretical.
Example: “Love” is a theoretical construction. Everyone
has constructed their own theory of what it is.
Establishing Construct Validity
• The first step is to use the literature to
operationalize (i.e. define) the construct.
• A panel of topic experts can provide additional
support.
• Specific studies provide additional evidence.
Studies for Establishing Construct
Validity
1. Intervention studies
2. Differential-population studies
3. Related-measures studies: compare scores to
other measures that measure the same
construct
1. Intervention studies
Demonstrate pre-post changes in the construct
of interest based on a treatment.
Ex : An inventory designed to measure test-anxiety is given to 25
students self-identified as having test anxiety and 25 students
who claim they do not. The inventory is administered just
before a high-stakes final exam. As predicted, the scores were
significantly different between the test anxiety group and
non-test anxiety group.
In this case, evidence of construct-related validity has been established.
We can trust inferences regarding anxiety based on the anxiety inventory scores.
2. Differential-population studies:
Demonstrate that different populations score differently
on the measure.
Ex : An inventory is designed to measure insecurity due to
baldness. The inventory is given to bald-headed men and
men with a head full of hair. As predicted, the bald-headed
men had much higher scores than the men with hair.
Evidence of construct-related validity has been established.
We can trust inferences regarding bald-headed insecurity based on the inventory scores.
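A minimal sketch of such a group comparison, assuming scipy and fabricated inventory scores (higher = more insecurity); the same test fits the known-groups anxiety example on the previous slide:

```python
from scipy.stats import ttest_ind

bald_scores = [42, 38, 45, 40, 44, 39, 41]   # bald-headed men
hair_scores = [22, 25, 19, 24, 21, 26, 20]   # men with a head full of hair

# A significant difference in the predicted direction supports the construct.
t, p = ttest_ind(bald_scores, hair_scores)
print(f"t = {t:.2f}, p = {p:.4f}")
```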
3. Related-measures studies:
Correlate scores (positively or negatively) with other
measures of similar constructs.
Ex : An inventory is designed to measure introversion.
The inventory is given to salespeople who score high on an
extroversion inventory. As predicted, the salespeople’s
introversion scores are very low.
Evidence of construct-related validity has been established.
We can trust inferences regarding introversion based on the inventory scores.
It is recommended to continually re-establish construct-related validity as the instrument
is used; the theoretical definition of a construct changes over time.
Other types of validity
• Convergent Validity
• Discriminant Validity
• Multicultural Validity: Evidence that the instrument measures what it
intends to measure as understood by participants of a particular culture.
For example: If your instrument is to be administered to the Hmong population, then
the language, phrases, and connotations should be understood by this culture.
• Convergent and discriminant validity are both types of construct validity.
• Convergent validity refers to evidence that
similar constructs are strongly related.
For example: If your instrument is
measuring Depression, the response items
related to Sadness should score similarly.
• Discriminant validity refers to evidence that
dissimilar constructs are NOT related.
For example: If your instrument is
measuring Depression, the response items
related to Happiness should score
dissimilarly.
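A minimal sketch of checking both patterns at once, assuming numpy and fabricated subscale totals for the same respondents:

```python
import numpy as np

depression = np.array([30, 25, 40, 35, 20, 45, 28])
sadness    = np.array([28, 24, 38, 33, 22, 44, 27])  # similar construct
happiness  = np.array([15, 22, 10, 12, 25,  8, 20])  # dissimilar construct

# Convergent evidence: a strong positive correlation with Sadness.
print("convergent r   =", round(np.corrcoef(depression, sadness)[0, 1], 2))
# Discriminant evidence: a weak or negative correlation with Happiness.
print("discriminant r =", round(np.corrcoef(depression, happiness)[0, 1], 2))
```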
Quick summary
 In order to make valid decisions we need to use
appropriate instruments that have established
evidence of content-related, construct-related, and
criterion-related validity.
 This is determined in the developmental stage of the
instrument.
 If you are designing an instrument, you need to
establish this.
 If the instrument is already designed, review the
instrument’s manual to determine how this was done.
If you alter an established instrument from its original
state, you need to re-establish validity and reliability.
Reliability
Assessment instruments need to yield valid data
AND be
reliable
What’s “Reliability”?
The ability to gather consistent results from a
particular instrument.
There are 3 approaches to establishing
instrument reliability.
1. Stability reliability
2. Alternate-form reliability
3. Internal consistency reliability
Each uses a statistical test of correlation to measure
consistency.
1. Stability Reliability
Definition: Consistent results over time
Also known as “test-retest” reliability
Use this if the assessment is to be given to the same individuals
at different times.
How do we determine this?
 Give the assessment over again to the same group of people.
 Calculate the correlation between the 2 scores.
 Be sure to wait several days or a few weeks.
 Long enough to reduce the influence of the 1st testing (i.e.
memory of test items) and short enough to reduce the
influence of intervening influences.
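A minimal sketch of the test-retest correlation, assuming scipy and fabricated scores from the same seven people a few weeks apart:

```python
from scipy.stats import pearsonr

first_administration  = [78, 85, 62, 90, 73, 88, 69]
second_administration = [80, 83, 65, 88, 75, 90, 66]

r, _ = pearsonr(first_administration, second_administration)
print(f"stability (test-retest) reliability: r = {r:.2f}")
```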
2. Alternate-form reliability
Definition: Consistent results between different forms of the
same test.
Also known as “parallel form” reliability
Use this if multiple test forms are needed for interchangeability —
usually for test security (i.e. prevent cheating).
How do we determine this?
Create different forms that are similar in content (i.e. “content
parallel”) and difficulty (i.e. “difficulty-parallel”). Administer both
forms to the same group of people and calculate the correlation.
Are stability reliability and alternate-form reliability ever combined?
YES!
This is called stability and alternate-form reliability.
This is where there are consistent results over time using two
different test forms of parallel-content and parallel-difficulty.
3. Internal Consistency reliability
The degree to which all test items measure the content domain
consistently.
Use this when there is no concern about stability over time and no
need for an alternate form.
How do we do this?
Split-half technique: Divide the test in half by treating the odd-numbered
items and even-numbered items as 2 separate tests. The entire
test is administered and the 2 sub-scores (scores from even items
& scores from odd items) are correlated.
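A minimal sketch of the split-half technique, assuming scipy and a fabricated person-by-item score matrix. The half-test correlation is usually stepped up with the Spearman-Brown formula, which estimates the reliability of the full-length test (this is the “Spearman Brown” option in the quiz below):

```python
from scipy.stats import pearsonr

# Rows = people, columns = items (1 = correct, 0 = incorrect).
scores = [
    [1, 0, 1, 1, 0, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 0, 1, 0],
]
odd_totals  = [sum(person[0::2]) for person in scores]  # items 1, 3, 5, 7
even_totals = [sum(person[1::2]) for person in scores]  # items 2, 4, 6, 8

r_half, _ = pearsonr(odd_totals, even_totals)
r_full = 2 * r_half / (1 + r_half)   # Spearman-Brown step-up
print(f"half-test r = {r_half:.2f}, Spearman-Brown estimate = {r_full:.2f}")
```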
Reliability Coefficients (known as “r”)
When establishing reliability, a correlation between the two sets of data
needs to be calculated using an appropriate statistical formula.
Reliability method: statistical formula
• Stability reliability: Pearson product-moment
• Alternate-form reliability: Pearson product-moment
• Internal consistency reliability: Pearson product-moment (used to
correlate each half), Kuder-Richardson, or Cronbach’s alpha
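A minimal sketch of Cronbach’s alpha from the standard formula, assuming numpy and a fabricated person-by-item matrix of Likert-style scores:

```python
import numpy as np

# Rows = people, columns = items (e.g., 1-5 Likert responses).
scores = np.array([
    [3, 4, 3, 5],
    [2, 2, 3, 2],
    [4, 5, 5, 4],
    [3, 3, 2, 3],
    [5, 4, 5, 5],
])
k = scores.shape[1]                              # number of items
item_variances = scores.var(axis=0, ddof=1)      # variance of each item
total_variance = scores.sum(axis=1).var(ddof=1)  # variance of total scores

# alpha = k/(k-1) * (1 - sum of item variances / variance of totals)
alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
print(f"Cronbach's alpha = {alpha:.2f}")
```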
Acceptable r values
A reliability value of 0.00 means a complete absence of
reliability, whereas a value of 1.00 means perfect reliability.
An acceptable reliability coefficient should not be below
0.80; values lower than this indicate inadequate reliability.
However, with combined stability and alternate-form
reliability, .70 is acceptable since more sources of variance
are involved.
So let’s check your understanding
You design an instrument to be used as a pre-post assessment.
Which form of reliability should definitely be established?
____Stability
____Alternative-form
____Internal consistency
What type of statistical formula should you use to correlate the two
results (i.e., test and retest scores)?
____Pearson-Product Moment
____ Spearman Brown
The reliability coefficient was .70. Is the assessment reliable?
____Yes
____No (It needs to be at least .80.)
Remember….
In order for an assessment to be worthwhile it
needs to be
RELIABLE
and able to yield
VALID data
AND….
It’s quite possible for an instrument to be
RELIABLE
and not
provide VALID inferences
HOWEVER….
It’s NOT possible for an instrument to provide
VALID inferences
without being
RELIABLE
This ends Info Session 2
“Validity and Reliability”
I highly recommend traveling through this session at least TWICE.
Editor’s Notes
  1. It can be argued that it is almost impossible to establish predictive validity since so many outside variables can impact the results over time.
  2. NOTE: Many believe it is nearly impossible to create two tests of the SAME difficulty. Assessment experts equalize the difficulty variance through a mathematical adjustment.