1. Build Bright University
Language Testing and Assessment
Chapter 2
Principles of Language Assessment
Prepared by Kheang Sokheng
Ph.D. Candidate and MEd in TESOL
2. Principles of Language Assessment
Five cardinal criteria for “testing a test” are as follows:
Practicality
Reliability
Validity
Authenticity
Washback
3. Practicality
An effective test is practical. This means that it
is not excessively expensive,
stays within appropriate time constraints,
is relatively easy to administer, and
has a scoring/evaluation procedure that is specific and time-efficient.
4. Example of a Practicality Checklist
1. Are administrative details clearly established before the test?
2. Can students complete the test reasonably within the set time frame?
3. Is the cost of the test within budget limits?
5. Reliability
Reliability means the degree to which an assessment tool produces stable and consistent results.
A reliable test is consistent and dependable.
A test is reliable if it meets this condition:
“[If] you give the same test to the same student or matched students on two different occasions, the test should yield similar results” (Brown, 2004).
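The "same test on two different occasions" idea can be made concrete with a test-retest correlation. The sketch below is only an illustration: the slides do not name a statistic, so the use of Pearson's r is an assumption, and the scores and the names `pearson_r`, `first_sitting`, and `second_sitting` are invented for the example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical scores: the same six students sit the same test twice.
first_sitting = [55, 62, 70, 48, 81, 66]
second_sitting = [58, 60, 73, 50, 79, 68]

r = pearson_r(first_sitting, second_sitting)
print(f"test-retest r = {r:.2f}")
```

A value near 1 suggests that the two sittings rank the students in much the same way; a low value suggests that something other than ability is driving the scores.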
6. Student-Related Reliability
The most common learner-related threats to reliability are temporary illness, fatigue, a “bad day”, anxiety, and other physical or psychological factors.
7. Rater Reliability
Inter-rater reliability:
Unreliability occurs when two or more scorers yield inconsistent scores for the same test.
Factors: lack of attention to scoring criteria, inexperience, inattention, etc.
Intra-rater reliability:
Unreliability within a single scorer can stem from unclear scoring criteria, fatigue, bias toward particular “good” and “bad” students, or simple carelessness.
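One way to check whether two scorers are consistent is to compare their agreement with what chance alone would produce. The sketch below uses Cohen's kappa, a standard statistic that the slides do not mention, so treat the choice as an assumption; the band labels, the ratings, and the name `cohens_kappa` are likewise invented for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on the same scripts."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    # Proportion of scripts on which the two raters gave the same band.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's band frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[band] * counts_b[band] for band in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical band throughout: full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical bands awarded by two teachers to the same eight essays.
teacher_1 = ["B", "A", "C", "B", "B", "A", "C", "B"]
teacher_2 = ["B", "A", "C", "C", "B", "A", "B", "B"]

kappa = cohens_kappa(teacher_1, teacher_2)
print(f"kappa = {kappa:.2f}")
```

The same function can be applied to intra-rater reliability by passing one teacher's first and second marking of the same scripts.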
8. Test Administration Reliability
This involves the conditions in which the test is administered.
Unreliability occurs due to outside interference like noise, variations in photocopying, temperature variations, the amount of light in various parts of the room, and even the condition of desks and chairs.
9. Test Administration Reliability
Brown (2010) stated that he once witnessed the administration of a test of aural comprehension in which an audio player was used to deliver items for comprehension, but due to street noise outside the building, test-takers sitting next to open windows could not hear the stimuli clearly.
10. Test Reliability
Factors that cause unreliability:
If a test is too long, test-takers may become fatigued by the time they reach the later items and hastily respond incorrectly.
Ambiguous items
11. Validity
Validity is “the extent to which inferences made from assessment results are appropriate, meaningful, and useful in terms of the purpose of the assessment” (Gronlund, 1998, p. 226).
“Measuring what should be measured”
Content-related evidence
Criterion-related evidence
Construct-related evidence
Consequential validity
Face validity
12. Content-Related Evidence
A test can claim content-related evidence of validity if it
samples the subject matter about which conclusions are to be drawn, and
requires the test-taker to perform the behavior that is being measured.
13. Criterion-Related Evidence
Criterion-related evidence is used to demonstrate the accuracy of a measure or procedure by comparing it with another measure or procedure which has been demonstrated to be valid.
14. Criterion-Related Evidence
For instance, imagine a hands-on driving test has been shown to be an accurate test of driving skills. Comparing the scores on a written driving test with the scores from the hands-on driving test can then validate the written test. This is a criterion-related strategy, with the hands-on test serving as the criterion.
1. Concurrent validity (empirical validity): a test result is supported by other concurrent performance beyond the assessment itself. For example, the validity of a high score on the final exam of a foreign language course will be substantiated by actual proficiency in the language.
15. Criterion-Related Evidence
2. Predictive validity is used to assess (and predict) a test-taker’s likelihood of future success.
E.g., placement tests, admissions assessment batteries, language aptitude tests.
16. Consequential Validity
It encompasses all the consequences of a test, including such considerations as its accuracy in measuring intended criteria, its impact on the preparation of the test-takers, its effect on the learner, and the (intended and unintended) social consequences of a test’s interpretation.
17. Face Validity
Face validity refers to “the degree to which a test looks right, and appears to measure the knowledge or abilities it claims to measure, based on the subjective judgment of the examinees who take it, the administrative personnel who decide on its use, and other psychometrically unsophisticated observers” (Mousavi, 2002, p. 244).
18. Face Validity
Sometimes students don’t know what is being tested when they tackle a test. They may feel, for a variety of reasons, that a test isn’t testing what it is “supposed” to test.
Face validity means that the students perceive the test to be valid.
Face validity will likely be high if the learners encounter:
a well-constructed, expected format with familiar tasks,
19. Face Validity
a test that is clearly doable within the allotted time limit,
items that are clear and uncomplicated,
directions that are crystal clear,
tasks that relate to their course work (content validity), and
a difficulty level that presents a reasonable challenge.
20. Authenticity
Bachman and Palmer (1996, p. 23) define authenticity as “the degree of correspondence of the characteristics of a given language test task to the features of a target language task,” and then suggest an agenda for identifying those target language tasks and for transforming them into valid test items.
Authenticity of a test may be present in the following ways:
21. Authenticity
The language in a test is as natural as possible.
Items are contextualized rather than isolated.
Topics are meaningful (relevant, interesting) for the learner.
Some thematic organization to items is provided, such as through a story line or episode.
Tasks represent, or closely approximate, real-world tasks.
22. Washback
The term “washback” (or “backwash”) refers to “the effect of testing on teaching and learning” (Hughes, 2003, p. 1).
For instance, washback includes the extent to which assessment affects a student’s future language development.
A test that provides beneficial washback (Brown, 2010):
positively influences what and how teachers teach and students learn,
23. Washback
offers learners a chance to adequately prepare,
gives learners feedback that enhances their language development,
is more formative in nature than summative, and
provides conditions for peak performance by learners.
In large-scale assessment, washback refers to the effects that tests have on instruction in terms of how students prepare for the test, e.g., cram courses and teaching to the test.
24. Washback
Washback also includes the effects of an assessment on teaching and learning prior to the assessment itself, i.e., on preparation for the assessment.
The challenge to teachers is to create classroom tests that serve as learning devices through which washback is achieved.
Washback enhances a number of basic principles of language acquisition: intrinsic motivation, autonomy, and self-confidence, among others.
25. Washback
Ways to improve washback:
Comment generously and specifically on test performance.
Specify the numerical scores on the various subsections of the test.
Formative versus summative tests:
Formative tests provide washback in the form of information to the learner on progress towards goals.
Summative tests provide washback for learners to initiate further pursuits, more learning, more goals, and more challenges to face.