VALIDITY AND VALIDATION:
THEORIES AND PROCEDURES
25/12/2015
VALIDATION TASK
To establish whether the interpretation and uses
of the VSTEP test scores were valid for measuring the
English language competence of test-takers
from level 3 to level 5 on the Vietnamese English
language competence scale
VALIDITY & VALIDATION
Validity is an integrated evaluative judgment of the degree to
which empirical evidence and theoretical rationales support the
adequacy and appropriateness of inferences and actions based
on test scores or other modes of assessment.
(Messick, 1989)
Validation is to marshal evidence and arguments in support of,
or counter to, proposed interpretations and uses of test scores.
(Messick, 1989)
VALIDITY THEORIES
• 1985 – The 1985 Testing Standards
  - Unified concept of validity
  - Construct-related evidence
  - Content-related evidence
  - Criterion-related evidence
• 1989 – Messick's validity chapter
  - Unified concept of validity
  - Evidential basis (construct validity, relevance, utility)
  - Consequential basis (value implications, social consequences)
MESSICK'S (1989) ASPECTS OF VALIDITY
Content
Structural
Consequential
External
Generalizability
Substantive
MESSICK'S (1989) ASPECTS OF VALIDITY
• The content aspect
  - Content relevance
  - Representativeness
  - Technical quality
• The substantive aspect: theoretical rationales for observed consistencies in responses
  - Process of performance
  - Empirical evidence of process
MESSICK'S (1989) ASPECTS OF VALIDITY
• The structural aspect: the fidelity of the scoring structure to the construct structure
• The generalizability aspect: the extent to which score properties and interpretations generalize to and across groups, settings and tasks
  - Reliability
  - Content representativeness
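Reliability is one source of generalizability evidence. As a minimal sketch, assuming item scores arranged as a persons × items matrix (the layout and the function name `cronbach_alpha` are illustrative, not taken from the study), Cronbach's alpha can be computed as follows:

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a persons x items score matrix (list of rows).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    Population variances are used for both items and totals; the ratio is
    the same as with sample variances because the n/(n-1) factors cancel.
    """
    k = len(scores[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*scores))  # transpose: one tuple of scores per item
    item_var_sum = sum(pvariance(item) for item in items)
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical 4 test-takers x 3 items, perfect Guttman pattern.
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(round(cronbach_alpha(toy), 2))  # 0.75
```

Alpha is only one reliability estimate; a Rasch analysis would instead report person and item separation reliability.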
MESSICK'S (1989) ASPECTS OF VALIDITY
• The external aspect
  - Convergent and discriminant evidence
  - Criterion relevance
  - Applied utility
• The consequential aspect: value implications as a basis for action, and the consequences of test use
  - Bias
  - Fairness
MESSICK'S (1989) VALIDITY FRAMEWORK
• Value
  - The most influential framework of validity
• Criticisms
  - Too abstract
  - Too extensive for a single researcher to carry out
  - No specific guidance for particular validation contexts
VALIDITY THEORIES
• Kane's (1992, 2006) argument-based approach to validation
  - Interpretive argument: the network of inferences and assumptions (the development stage)
  - Validity argument: logical and empirical evidence (the appraisal stage)
KANE'S (1992) VALIDITY FRAMEWORK
• Values
  - The most practical and objective framework of validity
  - A unique interpretive argument for each test, with consistent steps for building the validity argument (Bachman, 2004)
• Criticisms
  - No attention to the structural aspect (Messick, 1995)
  - Inadequate attention to, and methods for, the policy context and consequences of tests (McNamara & Roever, 2006)
LANGUAGE TEST VALIDATION
• Bachman's (1990) framework, based on Messick (1989)
• Bachman's (2004) framework, based on Kane (1992)
CHOICE OF VALIDITY FRAMEWORK
• Messick's (1989) framework
  - Six aspects:
Content
Structural
Consequential
External
Generalizability
Substantive
VALIDATION QUESTIONS
1. To what extent was the test content relevant to and representative of the domain of English language ability?
2. To what extent was each sub-test successful in measuring students' English language ability?
3. How well did the test-takers' scores on the VSTEP correlate with their scores on the IELTS?
4. What were the consequences of the interpretation and use of the VSTEP test scores?
01 CONTENT
• Content relevance
• Technical quality
• Content representativeness
RELEVANCE
• Topical content
• Typical behavior
• Underlying process
• Test specifications
TECHNICAL QUALITY
Empirical Evidence
• difficulty level
• discriminating power
Expert Judgment
• readability level
• freedom from ambiguity and irrelevance
• appropriateness of keyed answers & distractors
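The two empirical indices above are often estimated with classical item analysis. The sketch below is illustrative only: it assumes 0/1 item scores, uses the proportion correct as the difficulty (facility) index, and uses the upper-minus-lower group difference with 27% extreme groups as the discrimination index. These are common conventions, not necessarily the procedure used in this study, and `item_analysis` is a hypothetical name.

```python
def item_analysis(scores, group_fraction=0.27):
    """Classical difficulty and discrimination indices for 0/1 item scores.

    scores: list of rows, one row per test-taker, one 0/1 entry per item.
    Returns one (difficulty, discrimination) tuple per item, where
      difficulty     = proportion of all test-takers answering correctly
      discrimination = proportion correct in the upper group minus that in
                       the lower group; the groups are the top and bottom
                       `group_fraction` of test-takers ranked by total score.
    """
    n = len(scores)
    g = max(1, round(n * group_fraction))  # size of each extreme group
    ranked = sorted(scores, key=sum, reverse=True)  # highest totals first
    upper, lower = ranked[:g], ranked[-g:]
    results = []
    for j in range(len(scores[0])):
        p = sum(row[j] for row in scores) / n
        p_upper = sum(row[j] for row in upper) / g
        p_lower = sum(row[j] for row in lower) / g
        results.append((p, p_upper - p_lower))
    return results


# Hypothetical responses of 4 test-takers to 2 items.
print(item_analysis([[1, 1], [1, 0], [1, 0], [0, 0]]))  # [(0.75, 1.0), (0.25, 1.0)]
```

With real data the groups would be much larger than in this toy example; negative or near-zero discrimination values usually send an item to expert review.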
REPRESENTATIVENESS
“The breadth of the content specifications for a test should
reflect the breadth of the construct invoked in score
interpretation” (Messick, 1989, p. 35).
All essential components of the construct domain are
covered (Messick, 1994, p. 12).
CONTENT ANALYSIS BY EXPERTS
• What knowledge and skills are needed to do each
item correctly?
• How relevant are the items to their assigned
objectives and domain?
Domain
• English secondary school curricula
• English program at the college
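The source does not say how the experts' relevance judgments were summarised. One common option, shown here only as an illustration, is the item-level content validity index (I-CVI): the proportion of experts who rate an item 3 or 4 on a 4-point relevance scale. The 4-point scale, the cut-off of 3, and the function name are assumptions.

```python
def item_cvi(ratings, relevant_from=3):
    """Item-level content validity index (I-CVI).

    ratings: one list per item, holding each expert's relevance rating on
    an assumed 1-4 scale (1 = not relevant, 4 = highly relevant).
    Returns, for each item, the proportion of experts whose rating is at
    least `relevant_from`.
    """
    return [sum(r >= relevant_from for r in item) / len(item) for item in ratings]


# Hypothetical ratings from 4 experts on 2 items.
print(item_cvi([[4, 3, 4, 2], [1, 2, 3, 4]]))  # [0.75, 0.5]
```

Items with a low index would be candidates for revision against the curriculum domains listed above.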
RASCH ANALYSIS
Item fit statistics
Smith (2004) suggested using item fit statistics to evaluate the
extent to which items tap into the same construct and place
test-takers in the same order.
- the extent to which the use of each item is consistent with the
way people have responded to the other items
- does the item rank order the individuals in a manner similar to
other items? (p. 106)
Smith (2004) argued that test-takers should be ranked
consistently by items measuring the same construct. If they are not,
items that misfit the Rasch model, i.e., items that may be measuring
a different construct, should be revised or eliminated
(p. 107).
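Item fit of the kind Smith (2004) describes is usually reported as outfit and infit mean squares. The sketch below assumes the dichotomous Rasch model, P(x = 1) = exp(θ − b) / (1 + exp(θ − b)), and takes person abilities θ and the item difficulty b as already estimated (in practice by software such as ConQuest); it only shows how the two mean squares are formed from residuals. Values near 1 indicate the amount of randomness the model expects; the acceptable range is a judgment call the source does not specify, and the function names are hypothetical.

```python
import math


def rasch_p(theta, b):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(responses, thetas, b):
    """Outfit and infit mean squares for a single item.

    responses: 0/1 responses of each person to this item
    thetas:    estimated ability of each person (same order)
    b:         estimated difficulty of the item
    """
    sq_std_resid = []  # squared standardized residuals (for outfit)
    sq_resid = 0.0     # sum of squared raw residuals (for infit)
    info = 0.0         # sum of model variances P(1 - P) (for infit)
    for x, theta in zip(responses, thetas):
        p = rasch_p(theta, b)
        var = p * (1.0 - p)
        sq_std_resid.append((x - p) ** 2 / var)
        sq_resid += (x - p) ** 2
        info += var
    outfit = sum(sq_std_resid) / len(sq_std_resid)  # unweighted mean square
    infit = sq_resid / info                          # information-weighted
    return outfit, infit


# Hypothetical misfit: the able person fails, the weak person succeeds.
print(item_fit([0, 1], [2.0, -2.0], 0.0))  # both mean squares well above 1
```

Outfit is sensitive to unexpected responses from persons far from the item's difficulty, while infit weights responses from persons near it more heavily, which is why both are usually inspected.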
02 SUBSTANTIVE & STRUCTURAL
To what extent were the VSTEP sub-tests successful in
measuring students’ English language competence?
ITEM RESPONSE THEORY (RASCH MODEL)
item fit
item discrimination
item cluster
DESCRIPTIVE STATISTICS
choice response analysis
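Choice response (distractor) analysis tabulates how often each option of a multiple-choice item is chosen by stronger and weaker test-takers; a distractor that attracts more high scorers than low scorers, or that nobody chooses, is usually flagged for review. A minimal sketch, assuming raw option letters and a median split on total score (the median split and the name `option_proportions` are illustrative assumptions, not the study's stated procedure):

```python
from collections import Counter
from statistics import median


def option_proportions(choices, totals, options=None):
    """Proportion choosing each option in the high and low total-score groups.

    choices: the option (e.g. 'A'-'D') each test-taker chose for one item
    totals:  each test-taker's total test score (same order)
    options: all options on the item; pass this so never-chosen options
             still appear in the table
    Test-takers above the median total form the high group, the rest the
    low group. Returns {option: (proportion in high, proportion in low)}.
    """
    cut = median(totals)
    high = [c for c, t in zip(choices, totals) if t > cut]
    low = [c for c, t in zip(choices, totals) if t <= cut]
    high_counts, low_counts = Counter(high), Counter(low)
    keys = sorted(options) if options is not None else sorted(set(choices))
    return {
        opt: (
            high_counts[opt] / len(high) if high else 0.0,
            low_counts[opt] / len(low) if low else 0.0,
        )
        for opt in keys
    }


# Hypothetical item keyed 'A', answered by 4 test-takers.
print(option_proportions(["A", "A", "B", "C"], [10, 9, 3, 2], options="ABCD"))
```

Here the key attracts only the high group and distractors B and C attract only the low group, the pattern expected of a well-functioning item; D is never chosen and would merit a second look.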
03 CRITERION-RELATED
How well did the test-takers’ VSTEP overall and
sub-test scores correlate with their
overall and sub-test IELTS scores?
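A minimal sketch of the criterion-related correlation, assuming paired VSTEP and IELTS scores for the same test-takers (any values used with it here are hypothetical). Pearson's r is shown; Spearman's rank correlation would be a reasonable alternative if band scores are treated as ordinal, and the source does not say which coefficient was used. The same function would be applied separately to the overall scores and to each pair of matching sub-test scores.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two paired score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores with zero variance have no correlation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired scores (VSTEP-like, IELTS-like) for 3 test-takers.
print(pearson_r([1, 2, 3], [2, 4, 6]))  # 1.0
```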
04 CONSEQUENCES
• The value implications of score interpretation
• The actual and potential consequences of score use
(Messick, 1989)
FOCUS: the validity of test score interpretation and use, in terms of construct under-representation or construct-irrelevant variance
Sources of evidence
• Content relevance and representativeness
• Item bias
• Technical quality of the test
• Expert judgment
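Item bias is commonly examined through differential item functioning (DIF). As one illustrative option (the source does not name a DIF method), the Mantel-Haenszel procedure matches reference-group and focal-group test-takers on total score and pools the odds of answering correctly across score levels. The sketch assumes each matched score level is supplied as counts (A, B, C, D) = (reference correct, reference incorrect, focal correct, focal incorrect); the ETS delta-scale transformation −2.35 ln α is included for reference, and the function name is hypothetical.

```python
import math


def mantel_haenszel_dif(strata):
    """Mantel-Haenszel common odds ratio and delta-scale DIF for one item.

    strata: one (A, B, C, D) tuple per matched total-score level, where
      A = reference group correct,  B = reference group incorrect,
      C = focal group correct,      D = focal group incorrect.
    alpha_MH > 1 (MH D-DIF < 0) means the item favours the reference group.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty score level contributes nothing
        num += a * d / n
        den += b * c / n
    if num == 0 or den == 0:
        raise ValueError("common odds ratio is undefined for these counts")
    alpha = num / den
    return alpha, -2.35 * math.log(alpha)


# Hypothetical single score level where the reference group does better.
print(mantel_haenszel_dif([(20, 5, 10, 15)]))  # alpha = 6.0, negative D-DIF
```

Flagged items would then go to expert judgment, since a statistical DIF result alone does not establish that an item is biased.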
References
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1985). Standards for Educational and Psychological Testing. Washington, DC: Authors.
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for Educational and Psychological Testing. Washington, DC: American Educational Research Association.
Andrich, D., & Mercer, A. (1997). International perspectives on selection methods of entry into higher education. Canberra: National Board of Employment, Education and Training [and] Higher Education Council.
Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford: Oxford University Press.
Bachman, L. F. (2004). Statistical analyses for language assessment. Cambridge: Cambridge University Press.
Berk, R. A. (1980). Item analysis. In R. A. Berk (Ed.), Criterion-referenced measurement: The state of the art. Baltimore and London: The Johns Hopkins University Press.
Cureton, E. E. (1951). Validity. In E. F. Lindquist (Ed.), Educational measurement (pp. 621-694). Washington, DC: American Council on Education.
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage Publications.
Kane, M. T. (1992). An argument-based approach to validity. Psychological Bulletin, 112(3), 527-535.
Kane, M. T. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 17-64). Westport, CT: American Council on Education/Praeger.
Loevinger, J. (1957). Objective tests as instruments of psychological theory. Psychological Reports, 3(4), 635-694.
McNamara, T., & Roever, C. (2006). Language testing: The social dimension. Malden, MA: Blackwell Publishing.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13-103). New York: American Council on Education/Macmillan.
MOET. (2006). Secondary Education Curriculum: English. Hanoi: Education Publisher.
Moss, P. A. (2007). Reconstructing validity. Educational Researcher, 36(8), 470-476.
Popham, W. J. (1997). Consequential validity: Right concern, wrong concept. Educational Measurement: Issues and Practice, 16(2), 9-13.
Purpura, J. E. (1999). Learner strategy use and performance on language tests: A structural equation modeling approach. Cambridge: Cambridge University Press.
Smith, E. V. (2004). Evidence for reliability of measures and validity of measure interpretation: A Rasch measurement perspective. In E. V. Smith & R. M. Smith (Eds.), Introduction to Rasch measurement: Theory, models and applications. Maple Grove, MN: JAM Press.
Wu, M. L., Adams, R. J., & Haldane, S. (2008). ConQuest: Generalised item response modelling software [computer program]. Camberwell: Australian Council for Educational Research.
THANK YOU
FOR YOUR ATTENTION
Q & A
2825/12/2015

Mais conteúdo relacionado

Mais procurados

Validity, reliability & practicality
Validity, reliability & practicalityValidity, reliability & practicality
Validity, reliability & practicality
Samcruz5
 

Mais procurados (20)

Presentation validity
Presentation validityPresentation validity
Presentation validity
 
Validation
ValidationValidation
Validation
 
Validity, reliability & practicality
Validity, reliability & practicalityValidity, reliability & practicality
Validity, reliability & practicality
 
validity its types and importance
validity its types and importancevalidity its types and importance
validity its types and importance
 
Presentation on validity and reliability in research methods
Presentation on validity and reliability in research methodsPresentation on validity and reliability in research methods
Presentation on validity and reliability in research methods
 
Rep
RepRep
Rep
 
Validity in psychological testing
Validity in psychological testingValidity in psychological testing
Validity in psychological testing
 
Validity in Assessment
Validity in AssessmentValidity in Assessment
Validity in Assessment
 
Validity & Reliability
Validity & ReliabilityValidity & Reliability
Validity & Reliability
 
Reliability and validity w3
Reliability and validity w3Reliability and validity w3
Reliability and validity w3
 
Reliablity and Validity
Reliablity and ValidityReliablity and Validity
Reliablity and Validity
 
Content &statistical validity
Content &statistical validityContent &statistical validity
Content &statistical validity
 
Validity
ValidityValidity
Validity
 
Reliability and validity ppt
Reliability and validity pptReliability and validity ppt
Reliability and validity ppt
 
Validity & reliability seminar
Validity & reliability seminarValidity & reliability seminar
Validity & reliability seminar
 
15th batch NPTI Validity & Reliablity Business Research Methods
15th batch NPTI Validity & Reliablity Business Research Methods 15th batch NPTI Validity & Reliablity Business Research Methods
15th batch NPTI Validity & Reliablity Business Research Methods
 
Validity, reliability and feasibility
Validity, reliability and feasibilityValidity, reliability and feasibility
Validity, reliability and feasibility
 
Reliability and validity
Reliability and validityReliability and validity
Reliability and validity
 
Tools in Qualitative Research: Validity and Reliability
Tools in Qualitative Research: Validity and ReliabilityTools in Qualitative Research: Validity and Reliability
Tools in Qualitative Research: Validity and Reliability
 
Validity, reliabiltiy and alignment to determine the effectiveness of assessment
Validity, reliabiltiy and alignment to determine the effectiveness of assessmentValidity, reliabiltiy and alignment to determine the effectiveness of assessment
Validity, reliabiltiy and alignment to determine the effectiveness of assessment
 

Destaque

Table of specifications 2013 copy
Table of specifications 2013   copyTable of specifications 2013   copy
Table of specifications 2013 copy
Marciano Melchor
 

Destaque (6)

Ail apresentation(kumazawa)
Ail apresentation(kumazawa)Ail apresentation(kumazawa)
Ail apresentation(kumazawa)
 
Peering through the Looking Glass: Towards a Programmatic View of the Qualify...
Peering through the Looking Glass: Towards a Programmatic View of the Qualify...Peering through the Looking Glass: Towards a Programmatic View of the Qualify...
Peering through the Looking Glass: Towards a Programmatic View of the Qualify...
 
Language testing the social dimension
Language testing  the social dimensionLanguage testing  the social dimension
Language testing the social dimension
 
Table of specifications 2013 copy
Table of specifications 2013   copyTable of specifications 2013   copy
Table of specifications 2013 copy
 
Why Process Measures Are Often More Important Than Outcome Measures in Health...
Why Process Measures Are Often More Important Than Outcome Measures in Health...Why Process Measures Are Often More Important Than Outcome Measures in Health...
Why Process Measures Are Often More Important Than Outcome Measures in Health...
 
Table of specifications
Table of specificationsTable of specifications
Table of specifications
 

Semelhante a Xác trị slide 1 - validation basics

HND_MSCP_W5_Reliability_and_Validity_of_Research.pdf
HND_MSCP_W5_Reliability_and_Validity_of_Research.pdfHND_MSCP_W5_Reliability_and_Validity_of_Research.pdf
HND_MSCP_W5_Reliability_and_Validity_of_Research.pdf
MohammedAskar22
 
Principles of language assessment ( evaluation of language teaching)
Principles of language assessment ( evaluation of language teaching)Principles of language assessment ( evaluation of language teaching)
Principles of language assessment ( evaluation of language teaching)
Alfi Suru
 
Principles of language assessment ( evaluation of language teaching)
Principles of language assessment ( evaluation of language teaching)Principles of language assessment ( evaluation of language teaching)
Principles of language assessment ( evaluation of language teaching)
Alfi Suru
 
Presentation Validity & Reliability
Presentation Validity & ReliabilityPresentation Validity & Reliability
Presentation Validity & Reliability
songoten77
 

Semelhante a Xác trị slide 1 - validation basics (20)

reliability and validity psychology 1234
reliability and validity psychology 1234reliability and validity psychology 1234
reliability and validity psychology 1234
 
Validity and reliability of questionnaires
Validity and reliability of questionnairesValidity and reliability of questionnaires
Validity and reliability of questionnaires
 
Copie de PRESENTATION_ RELIABILITY _ VALIDITY.pptx
Copie de PRESENTATION_ RELIABILITY _ VALIDITY.pptxCopie de PRESENTATION_ RELIABILITY _ VALIDITY.pptx
Copie de PRESENTATION_ RELIABILITY _ VALIDITY.pptx
 
Principles of Language Assessment
Principles of Language AssessmentPrinciples of Language Assessment
Principles of Language Assessment
 
Designing classsroom
Designing classsroomDesigning classsroom
Designing classsroom
 
HND_MSCP_W5_Reliability_and_Validity_of_Research.pdf
HND_MSCP_W5_Reliability_and_Validity_of_Research.pdfHND_MSCP_W5_Reliability_and_Validity_of_Research.pdf
HND_MSCP_W5_Reliability_and_Validity_of_Research.pdf
 
Principles of language assessment ( evaluation of language teaching)
Principles of language assessment ( evaluation of language teaching)Principles of language assessment ( evaluation of language teaching)
Principles of language assessment ( evaluation of language teaching)
 
Principles of language assessment ( evaluation of language teaching)
Principles of language assessment ( evaluation of language teaching)Principles of language assessment ( evaluation of language teaching)
Principles of language assessment ( evaluation of language teaching)
 
Test construction
Test constructionTest construction
Test construction
 
Qualitative Research Methods
Qualitative Research MethodsQualitative Research Methods
Qualitative Research Methods
 
JC-16-23June2021-rel-val.pptx
JC-16-23June2021-rel-val.pptxJC-16-23June2021-rel-val.pptx
JC-16-23June2021-rel-val.pptx
 
The Components of Test Specifications
The Components of Test SpecificationsThe Components of Test Specifications
The Components of Test Specifications
 
Language Testing : Principles of language assessment
Language Testing : Principles of language assessment Language Testing : Principles of language assessment
Language Testing : Principles of language assessment
 
Validity & reliability
Validity & reliabilityValidity & reliability
Validity & reliability
 
Validity
ValidityValidity
Validity
 
CRITERIA OF A GOOD TEST.pptx
CRITERIA OF A GOOD TEST.pptxCRITERIA OF A GOOD TEST.pptx
CRITERIA OF A GOOD TEST.pptx
 
NQC Presentation On Validation And Moderation
NQC Presentation On Validation And ModerationNQC Presentation On Validation And Moderation
NQC Presentation On Validation And Moderation
 
Intro assessmentcmm
Intro assessmentcmmIntro assessmentcmm
Intro assessmentcmm
 
Presentation Validity & Reliability
Presentation Validity & ReliabilityPresentation Validity & Reliability
Presentation Validity & Reliability
 
Item development.pdf for national examination development
Item development.pdf for national examination developmentItem development.pdf for national examination development
Item development.pdf for national examination development
 

Mais de englishonecfl

Mais de englishonecfl (20)

Chương trình và nội dung hội nghị Mạc tộc lần thứ II
Chương trình và nội dung hội nghị Mạc tộc lần thứ IIChương trình và nội dung hội nghị Mạc tộc lần thứ II
Chương trình và nội dung hội nghị Mạc tộc lần thứ II
 
Basic pronunciation online in Moodle 25.08.2016
Basic pronunciation online in Moodle 25.08.2016Basic pronunciation online in Moodle 25.08.2016
Basic pronunciation online in Moodle 25.08.2016
 
Assessing speaking
Assessing speakingAssessing speaking
Assessing speaking
 
Reading 2 - test specification for writing test - vstep
Reading 2 - test specification for writing test - vstepReading 2 - test specification for writing test - vstep
Reading 2 - test specification for writing test - vstep
 
Reading 2 guideline for item writing writing test
Reading 2 guideline for item writing writing testReading 2 guideline for item writing writing test
Reading 2 guideline for item writing writing test
 
Reading 1 guidelines for designing writing prompts
Reading 1 guidelines for designing writing promptsReading 1 guidelines for designing writing prompts
Reading 1 guidelines for designing writing prompts
 
Guiding questions for reading materials
Guiding questions for reading materialsGuiding questions for reading materials
Guiding questions for reading materials
 
Listening item submission template
Listening item submission templateListening item submission template
Listening item submission template
 
Examining reading
Examining readingExamining reading
Examining reading
 
Reading
ReadingReading
Reading
 
Reading
ReadingReading
Reading
 
Writing good multiple choice test questions
Writing good multiple choice test questionsWriting good multiple choice test questions
Writing good multiple choice test questions
 
Reading
ReadingReading
Reading
 
Nghe slide - testing listening skill slides
Nghe   slide - testing listening skill slidesNghe   slide - testing listening skill slides
Nghe slide - testing listening skill slides
 
Vstep listening item writer
Vstep listening item writerVstep listening item writer
Vstep listening item writer
 
Tham chiếu khung cefr của các bài thi
Tham chiếu khung cefr của các  bài thiTham chiếu khung cefr của các  bài thi
Tham chiếu khung cefr của các bài thi
 
Online version 20151003 main issues in language testing
Online version 20151003 main issues in language   testingOnline version 20151003 main issues in language   testing
Online version 20151003 main issues in language testing
 
Ke hoach to chuc bd can bo ra de thi 2015
Ke hoach to chuc bd can bo ra de thi   2015Ke hoach to chuc bd can bo ra de thi   2015
Ke hoach to chuc bd can bo ra de thi 2015
 
Khung chtr của 2 hợp phần
Khung chtr của 2 hợp phầnKhung chtr của 2 hợp phần
Khung chtr của 2 hợp phần
 
Google Forms
Google FormsGoogle Forms
Google Forms
 

Último

1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
QucHHunhnh
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
kauryashika82
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
QucHHunhnh
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
heathfieldcps1
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
negromaestrong
 

Último (20)

On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
 

Validation slide 1 - validation basics

  • 1. VALIDITY AND VALIDATION: THEORIES AND PROCEDURES (25/12/2015)
  • 2. VALIDATION TASK: To establish whether the interpretation and uses of the VSTEP test scores were valid for measuring the English language competence of test-takers from level 3 to level 5 on the Vietnamese English language competence scale.
  • 3. VALIDITY & VALIDATION: "Validity is an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment" (Messick, 1989). Validation is the marshaling of evidence and arguments in support of, or counter to, proposed interpretations and uses of test scores (Messick, 1989).
  • 4. VALIDITY THEORIES: 1985, the 1985 Testing Standards: unified concept of validity; construct-related evidence; content-related evidence; criterion-related evidence. 1989, Messick’s validity chapter: unified concept of validity; evidential basis (construct, relevance, utility); consequential basis (values, social consequences).
  • 5. MESSICK’S (1989) ASPECTS OF VALIDITY: content, structural, consequential, external, generalizability, substantive.
  • 6. MESSICK’S (1989) ASPECTS OF VALIDITY: The content aspect: content relevance; representativeness; technical quality. The substantive aspect: theoretical rationales for observed consistencies in responses; process of performance; empirical evidence of process.
  • 7. MESSICK’S (1989) ASPECTS OF VALIDITY: The structural aspect: the fidelity of the scoring structure to the construct structure. The generalizability aspect: the extent to which score properties and interpretations generalize to and across groups, settings and tasks; reliability; content representativeness.
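Slide 7 lists reliability as evidence for the generalizability aspect. The slides do not say which reliability index the study used; as a minimal sketch, assuming dichotomously scored (0/1) items and a small hypothetical score matrix, Cronbach's alpha (which equals KR-20 for 0/1 items) can be computed like this:

```python
from statistics import pvariance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for a persons x items score matrix.

    Population variances are used for both the items and the total score,
    so for 0/1 items the result equals KR-20.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [pvariance([row[j] for row in responses]) for j in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 5 test-takers x 4 dichotomous items (not VSTEP data).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(scores), 2))  # 0.8
```

Alpha is only one reliability estimate; a Rasch-based person separation reliability, for example, would be an equally plausible choice for a study built on the Rasch model.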
  • 8. MESSICK’S (1989) ASPECTS OF VALIDITY: The external aspect: convergent and discriminant evidence; criterion relevance; applied utility. The consequential aspect: value implications as a basis for action; consequences of test use; bias; fairness.
  • 9. MESSICK’S (1989) VALIDITY FRAMEWORK: Value: the most influential framework of validity. Criticisms: abstract; difficult for a single researcher to carry out; no specific guidance for a particular validation context.
  • 10. VALIDITY THEORIES: Kane (1992, 2006), argument-based approach to validation. The development stage: the interpretive argument, i.e. the network of inferences and assumptions. The appraisal stage: the validity argument, supported by logical and empirical evidence.
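Kane's two stages can be pictured as a data structure: the development stage lays out a chain of inferences with their assumptions, and the appraisal stage attaches evidence to each link. The sketch below is illustrative only; the inference labels (scoring, generalization, extrapolation, decision) are ones commonly used in argument-based validation, while every claim, assumption and piece of evidence shown is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Inference:
    """One link in an interpretive argument (development stage)."""
    name: str
    claim: str
    assumptions: list[str]
    # Backing gathered during the appraisal stage (logical or empirical).
    evidence: list[str] = field(default_factory=list)


def unsupported(argument: list[Inference]) -> list[str]:
    """Names of inferences whose assumptions have no evidence yet."""
    return [step.name for step in argument if not step.evidence]


# Hypothetical interpretive argument; contents are invented for illustration.
argument = [
    Inference("scoring", "Performances are scored accurately",
              ["Keys and rating scales are appropriate"],
              ["Expert review of keyed answers"]),
    Inference("generalization", "Observed scores estimate expected scores",
              ["Tasks are a representative sample"],
              ["Reliability estimate"]),
    Inference("extrapolation", "Scores reflect target-domain competence",
              ["Test tasks resemble target-domain tasks"]),
    Inference("decision", "Scores support level 3 to 5 classifications",
              ["Cut scores are appropriate"]),
]
print(unsupported(argument))  # ['extrapolation', 'decision']
```

Listing the unsupported links makes explicit where the validity argument is still weakest, which is the practical appeal of the argument-based approach.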
  • 11. KANE’S (1992) VALIDITY FRAMEWORK: Values: the most practical, objective framework of validity; a unique interpretive argument and consistent validity argument steps (Bachman, 2004). Criticisms: no attention to the structural aspect (Messick, 1995); inadequate attention to, and methods for, the policy context and consequences of tests (McNamara & Roever, 2006).
  • 12. LANGUAGE TEST VALIDATION: Bachman’s (1990) framework, after Messick (1989); Bachman’s (2004) framework, after Kane (1992).
  • 13. CHOICE OF VALIDITY FRAMEWORK: Messick’s (1989) six aspects: content, structural, consequential, external, generalizability, substantive.
  • 14. VALIDATION QUESTIONS: 1. To what extent was the test content relevant to and representative of the domain of English language ability? 2. To what extent was each sub-test successful in measuring students’ English language ability? 3. How well did the test-takers’ scores on the VSTEP correlate with their scores on the IELTS? 4. What were the consequences of the interpretation and use of the VSTEP English test scores?
  • 15. 01 CONTENT: content relevance; technical quality; content representativeness.
  • 16. 01 CONTENT RELEVANCE: topical content; typical behavior; underlying process; test specifications.
  • 17. 01 CONTENT TECHNICAL QUALITY: Empirical evidence: difficulty level; discriminating power. Expert judgment: readability level; freedom from ambiguity and irrelevancy; appropriateness of keyed answers and distractors.
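The two empirical indices on slide 17 are usually computed from a 0/1 response matrix. As a minimal sketch, assuming dichotomous items and hypothetical data, difficulty is taken here as item facility (proportion correct) and discriminating power as the corrected item-total correlation; the slides do not specify which discrimination index was used.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns nan if either variable has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def item_statistics(responses: list[list[int]]) -> list[tuple[float, float]]:
    """(facility, corrected item-total correlation) for each 0/1 item."""
    stats = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        # Rest score: total minus this item, so the item is not correlated with itself.
        rest = [sum(row) - row[j] for row in responses]
        facility = sum(item) / len(item)  # proportion answering correctly
        stats.append((facility, pearson(item, rest)))
    return stats


# Hypothetical 0/1 responses (rows = test-takers, columns = items).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
for number, (p, r) in enumerate(item_statistics(scores), start=1):
    print(f"item {number}: facility={p:.2f}, discrimination={r:.2f}")
```

Items with very high or very low facility, or with near-zero or negative discrimination, would be the ones to pass on to expert judgment.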
  • 18. 01 CONTENT REPRESENTATIVENESS: "The breadth of the content specifications for a test should reflect the breadth of the construct invoked in score interpretation" (Messick, 1989, p. 35). All essential components of the construct domain are covered (Messick, 1994, p. 12).
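One way to check representativeness is to map each item to a construct component and compare the result with the intended weights in the test specifications. The sketch below assumes such a mapping exists; the component labels, item IDs and blueprint weights are all hypothetical.

```python
from collections import Counter


def coverage_report(item_components: dict[str, str],
                    blueprint: dict[str, float]):
    """Compare intended blueprint weights with the share of items per component.

    Returns ({component: (intended, actual)}, [components with no items]).
    """
    counts = Counter(item_components.values())
    n_items = len(item_components)
    report = {c: (w, counts.get(c, 0) / n_items) for c, w in blueprint.items()}
    missing = [c for c in blueprint if counts.get(c, 0) == 0]
    return report, missing


# Hypothetical reading sub-test: component labels and weights are invented.
items = {"R1": "main idea", "R2": "main idea", "R3": "detail", "R4": "inference"}
blueprint = {"main idea": 0.3, "detail": 0.3, "inference": 0.2,
             "vocabulary in context": 0.2}
report, missing = coverage_report(items, blueprint)
print(report["main idea"])  # (0.3, 0.5)
print(missing)              # ['vocabulary in context']
```

A component with no items points to construct underrepresentation; a large gap between intended and actual weights points to imbalance rather than omission.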
  • 19. 01 CONTENT ANALYSIS BY EXPERTS: What knowledge and skills are needed to answer each item correctly? How relevant are the items to their assigned objectives and domain? Domain: English secondary school curricula; the English program at the college.
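The slides do not say how expert relevance judgments were aggregated. One common option from the wider measurement literature, swapped in here purely for illustration, is the item-level content validity index (I-CVI): the share of experts who rate an item 3 or 4 on a 1-4 relevance scale. The ratings and the flagging threshold below are hypothetical.

```python
def item_cvi(ratings: dict[str, list[int]]) -> dict[str, float]:
    """Share of experts rating each item 3 or 4 on a 1-4 relevance scale."""
    return {item: sum(r >= 3 for r in rs) / len(rs) for item, rs in ratings.items()}


def flag_items(ratings: dict[str, list[int]], threshold: float) -> list[str]:
    """Items whose I-CVI falls below the chosen threshold."""
    return sorted(item for item, cvi in item_cvi(ratings).items() if cvi < threshold)


# Hypothetical ratings from five experts; the 0.8 threshold is illustrative only.
ratings = {
    "Q1": [4, 4, 3, 4, 3],
    "Q2": [2, 3, 2, 1, 4],
    "Q3": [4, 3, 3, 3, 2],
}
print(item_cvi(ratings))         # {'Q1': 1.0, 'Q2': 0.4, 'Q3': 0.8}
print(flag_items(ratings, 0.8))  # ['Q2']
```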
  • 21. 01 CONTENT: Item fit statistics. Smith (2004) suggested using item fit statistics to evaluate the extent to which items tap into the same construct and place test-takers in the same order: the extent to which the use of each item is consistent with the way people have responded to the other items, and whether the item rank-orders individuals in a manner similar to the other items (p. 106). Smith (2004) argued that test-takers should be ranked consistently by items measuring the same construct; if they are not, items that misfit the Rasch model, i.e. items that measure a different construct, should be revised or eliminated (p. 107).
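The Rasch item fit statistics Smith (2004) describes are usually reported as infit and outfit mean-squares. As a sketch, assuming person abilities and item difficulties have already been estimated (e.g. by software such as ConQuest, cited in the references), both statistics can be computed from the residuals of the dichotomous Rasch model. This does not reproduce any program's exact output, and the abilities and responses below are hypothetical.

```python
from math import exp


def rasch_p(theta: float, b: float) -> float:
    """Probability of a correct answer under the dichotomous Rasch model."""
    return 1.0 / (1.0 + exp(-(theta - b)))


def item_fit(responses: list[int], thetas: list[float], b: float) -> tuple[float, float]:
    """Infit and outfit mean-squares for one item, given estimated parameters."""
    sq_resid, info, z_sq = [], [], []
    for x, theta in zip(responses, thetas):
        p = rasch_p(theta, b)
        w = p * (1.0 - p)              # model variance of the response
        sq_resid.append((x - p) ** 2)
        info.append(w)
        z_sq.append((x - p) ** 2 / w)  # squared standardized residual
    outfit = sum(z_sq) / len(z_sq)     # unweighted mean square
    infit = sum(sq_resid) / sum(info)  # information-weighted mean square
    return infit, outfit


# Hypothetical abilities (logits) and an item of difficulty 0.
thetas = [-2.0, -1.0, 1.0, 2.0]
print(item_fit([0, 0, 1, 1], thetas, 0.0))  # both below 1: more predictable than the model expects
print(item_fit([1, 1, 0, 0], thetas, 0.0))  # both well above 1: reversed rank order
```

Values near 1 indicate responses about as noisy as the model predicts; the second pattern is the case Smith warns about, where the item orders test-takers differently from the other items.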
  • 22. 02 SUBSTANTIVE & STRUCTURAL: To what extent were the VSTEP sub-tests successful in measuring students’ English language competence? ITEM RESPONSE THEORY (RASCH MODEL): item fit; item discrimination; item cluster. DESCRIPTIVE STATISTICS: choice response analysis.
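Choice response (distractor) analysis compares how often high- and low-scoring test-takers pick each option. A minimal sketch, assuming groups are formed from the top and bottom third by total score (an illustrative choice, not one the slides state) and using hypothetical responses to an item keyed "A":

```python
from collections import Counter


def distractor_table(choices: list[str], totals: list[float], options: str,
                     fraction: float = 1 / 3) -> dict[str, tuple[float, float]]:
    """Share of the upper and lower scoring groups choosing each option.

    Groups are the top and bottom `fraction` of test-takers by total score
    (ties broken by input order); the default of one third is illustrative.
    """
    order = sorted(range(len(totals)), key=lambda i: totals[i])
    n = max(1, round(len(totals) * fraction))
    lower = Counter(choices[i] for i in order[:n])
    upper = Counter(choices[i] for i in order[-n:])
    return {opt: (upper[opt] / n, lower[opt] / n) for opt in options}


# Hypothetical item keyed "A": choices and total scores for six test-takers.
choices = ["A", "A", "B", "C", "A", "B"]
totals = [30, 28, 12, 10, 25, 15]
for opt, (up, low) in distractor_table(choices, totals, "ABCD").items():
    print(f"{opt}: upper={up:.2f} lower={low:.2f}")
```

The expected pattern is a key chosen mostly by the upper group and distractors chosen mostly by the lower group; here option D attracts nobody, which would mark it as a non-functioning distractor.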
  • 23. 03 CRITERION-RELATED: How well did the test-takers’ VSTEP overall and sub-test scores correlate with their overall and sub-test IELTS scores?
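The criterion-related question pairs each test-taker's VSTEP score with the same person's IELTS score, overall and by sub-test. A minimal sketch using the Pearson correlation, assuming scores are already aligned by test-taker; the score lists and part names are hypothetical. Because IELTS reports coarse half-band scores, a rank-based coefficient such as Spearman's could be a reasonable alternative.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between paired scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length score lists with at least two pairs")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def criterion_correlations(vstep: dict[str, list[float]],
                           ielts: dict[str, list[float]]) -> dict[str, float]:
    """Correlate VSTEP and IELTS scores for every part present in both tests."""
    return {part: pearson_r(vstep[part], ielts[part]) for part in vstep if part in ielts}


# Hypothetical scores for the same five test-takers, listed in the same order.
vstep = {"overall": [4.5, 5.0, 6.5, 7.0, 8.5], "reading": [4.0, 5.5, 6.0, 7.5, 8.0]}
ielts = {"overall": [4.5, 5.0, 6.0, 6.5, 7.5], "reading": [4.5, 5.0, 6.5, 6.5, 7.0]}
for part, r in criterion_correlations(vstep, ielts).items():
    print(f"{part}: r = {r:.2f}")
```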
  • 24. 04 CONSEQUENCES: the value implications of score interpretation; the actual and potential consequences of score uses (Messick, 1989). FOCUS: the validity of test score interpretation and use, i.e. construct underrepresentation or construct-irrelevant variance.
  • 25. 04 CONSEQUENCES: Sources of evidence: content relevance and representativeness; item bias; technical quality of the test; expert judgment.
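The slides list item bias as a source of evidence without naming a detection method. One widely used procedure, shown here as an illustration rather than the study's method, is Mantel-Haenszel differential item functioning (DIF): test-takers are matched on total score, a 2x2 table (group by correct/incorrect) is built at each score level, and the common odds ratio is converted to the ETS delta scale. Group labels and data below are hypothetical.

```python
from collections import defaultdict
from math import log


def mantel_haenszel_dif(item: list[int], totals: list[int], groups: list[str],
                        reference: str = "ref", focal: str = "focal"):
    """Mantel-Haenszel common odds ratio and ETS delta (MH D-DIF) for one item.

    Test-takers are matched on total score; each score level forms one 2x2 table.
    The group labels "ref" and "focal" are hypothetical defaults.
    """
    # Per stratum: [ref correct, ref incorrect, focal correct, focal incorrect]
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for x, total, group in zip(item, totals, groups):
        if group == reference:
            strata[total][0 if x == 1 else 1] += 1
        elif group == focal:
            strata[total][2 if x == 1 else 3] += 1
    num = den = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    alpha = num / den  # ZeroDivisionError if no stratum has b * c > 0
    return alpha, -2.35 * log(alpha)


# Hypothetical item answered by 8 test-takers with the same total score.
item = [1, 1, 1, 0, 1, 0, 0, 0]
totals = [5] * 8
groups = ["ref"] * 4 + ["focal"] * 4
alpha, delta = mantel_haenszel_dif(item, totals, groups)
print(f"alpha={alpha:.2f}, delta={delta:.2f}")  # alpha=9.00, delta=-5.16
```

An odds ratio near 1 (delta near 0) suggests no DIF; a negative delta, as here, means the item favors the reference group among equally able test-takers and would be passed to expert review for possible bias.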
  • 26. References
    American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1985). Standards for educational and psychological testing. Washington, DC: Authors.
    American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
    Andrich, D., & Mercer, A. (1997). International perspectives on selection methods of entry into higher education. Canberra: National Board of Employment, Education and Training [and] Higher Education Council.
    Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford: Oxford University Press.
    Bachman, L. F. (2004). Statistical analyses for language assessment. Cambridge: Cambridge University Press.
    Berk, R. A. (1980). Item analysis. In R. A. Berk (Ed.), Criterion-referenced measurement: The state of the art. Baltimore and London: The Johns Hopkins University Press.
    Cureton, E. E. (1951). Validity. In E. F. Lindquist (Ed.), Educational measurement (pp. 621-694). Washington, DC: American Council on Education.
    Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage Publications.
    Kane, M. T. (1992). An argument-based approach to validity. Psychological Bulletin, 112(3), 527-535.
    Kane, M. T. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 17-64). Westport, CT: American Council on Education/Praeger.
    Loevinger, J. (1957). Objective tests as instruments of psychological theory. Psychological Reports, 3(4), 635-694.
    McNamara, T., & Roever, C. (2006). Language testing: The social dimension. Malden, MA: Blackwell Publishing.
    Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13-103). New York: American Council on Education/Macmillan.
    MOET. (2006). Secondary education curriculum: English. Hanoi: Education Publisher.
    Moss, P. A. (2007). Reconstructing validity. Educational Researcher, 36(8), 470-476.
    Popham, W. J. (1997). Consequential validity: Right concern--wrong concept. Educational Measurement: Issues and Practice, 16(2), 9-13.
    Purpura, J. E. (1999). Learner strategy use and performance on language tests: A structural equation modeling approach. Cambridge: Cambridge University Press.
    Smith, E. V. (2004). Evidence for reliability of measures and validity of measure interpretation: A Rasch measurement perspective. In E. V. Smith & R. M. Smith (Eds.), Introduction to Rasch measurement: Theory, models and applications. Maple Grove, MN: JAM Press.
    Wu, M. L., Adams, R. J., & Haldane, S. (2008). ConQuest: Generalised item response modelling software [Computer program]. Camberwell: Australian Council for Educational Research.
  • 27. THANK YOU FOR YOUR ATTENTION