6. PRACTICALITY
◈ The ‘doability’ of a test.
An effective test is practical. This means that it:
⬩ Is not excessively expensive,
⬩ Stays within appropriate time constraints,
⬩ Is relatively easy to administer, and
⬩ Has a scoring/evaluation procedure that is specific and time-efficient.
7. PRACTICALITY
◈ “The value and quality of
a test sometimes hinge on
such nitty-gritty, practical
considerations.” (Brown,
2003)
10. RELIABILITY
◈ The test consistently yields the same results in a given population.
A number of factors may contribute to the unreliability of a test (adapted from Mousavi, 2002, p. 804):
◈ Student-related reliability: factors such as affect or illness may make a student's performance unreliable
◈ Rater reliability
⬩ inter-rater reliability
⬩ intra-rater reliability
◈ Test administration reliability (environment)
◈ Test reliability (timed tests, tests that are too long, directions)
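One way to put a number on inter-rater reliability is to compare the scores two raters give to the same set of responses. The sketch below is illustrative only: the slides do not name a statistic, so percent agreement and Cohen's kappa are assumptions chosen here, and the function names and sample scores are hypothetical.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of responses on which both raters gave the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of responses.")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for agreement expected by chance."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability that both raters pick the same score
    # if each assigned scores independently at their own observed rates.
    expected = sum(counts_a[score] * counts_b[score] for score in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical score throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical scores (1-4 scale) from two raters on the same eight essays.
scores_rater_a = [3, 4, 2, 4, 3, 1, 4, 2]
scores_rater_b = [3, 4, 2, 3, 3, 1, 4, 3]

print(percent_agreement(scores_rater_a, scores_rater_b))        # 0.75
print(round(cohens_kappa(scores_rater_a, scores_rater_b), 2))   # 0.66
```

The same two functions could be applied to intra-rater reliability by passing one rater's first and second readings of the same papers.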
13. VALIDITY
◈ “an integrated evaluative judgement on the
degree to which empirical evidence and
theoretical rationales support the adequacy
and appropriateness of the inferences and
actions that are based on the test scores and
other modes of assessment”
14. VALIDITY
A valid test:
◈ Measures exactly what it proposes to measure (content-related evidence)
◈ Measures whether the criterion of the test has been achieved (criterion-related evidence)
Criterion validity can be established in two ways:
⬩ Concurrent validity
⬩ Predictive validity
◈ Is supported by research-based argument (construct-related evidence)
◈ Takes into account the impact of the assessment before and after it is given, i.e. washback (consequential validity)
◈ Is viewed by the students who take it as relevant and useful (face validity)
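Criterion-related evidence is often summarized as a correlation between test scores and a criterion measure: one collected at about the same time for concurrent validity, or a later outcome for predictive validity. A minimal sketch, assuming Pearson's r as the statistic (the slides do not specify one); `pearson_r` and the sample scores are hypothetical.

```python
import math


def pearson_r(test_scores, criterion_scores):
    """Pearson correlation between test scores and a criterion measure."""
    n = len(test_scores)
    if n != len(criterion_scores) or n < 2:
        raise ValueError("Need two equally long lists with at least two scores each.")
    mean_x = sum(test_scores) / n
    mean_y = sum(criterion_scores) / n
    covariance = sum((x - mean_x) * (y - mean_y)
                     for x, y in zip(test_scores, criterion_scores))
    spread_x = sum((x - mean_x) ** 2 for x in test_scores)
    spread_y = sum((y - mean_y) ** 2 for y in criterion_scores)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("Correlation is undefined when all scores in a list are equal.")
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical data: placement test scores and the same students' later course grades.
placement_scores = [55, 62, 70, 48, 81, 66]
later_course_grades = [60, 65, 72, 50, 85, 70]

# A value close to 1 would count as predictive evidence under this assumption.
print(round(pearson_r(placement_scores, later_course_grades), 2))
```

For concurrent validity the second list would instead hold scores from another measure taken at about the same time.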
17. AUTHENTICITY
“how well did the characteristics of a task correlate to the target language; how likely will the language task actually be performed in the real world”
An authentic test…
◈ Contains language that is natural
◈ Includes linguistic items in context, not isolated
◈ Includes tasks that replicate real-world tasks
20. WASHBACK
◈ Helps a teacher to become a better teacher
◈ Helps students learn from the tests
◈ Provides feedback for students and teachers
◈ Is achieved by providing feedback to the test
takers and test developers
◈ Functions more like a diagnostic component of the test, so that students and test creators can better identify their strengths and weaknesses.
22. CONCLUSION
A test is good if it has practicality, good validity, high reliability, authenticity, and positive washback.
The five principles provide guidelines for both constructing and evaluating tests.
Teachers should apply these five principles in constructing or evaluating tests that will be used in assessment activities.
23. Thank You!
Prepared By: Honey Mae Lingcallo
REFERENCE
Brown, H. Douglas. (2003). Language Assessment: Principles and Classroom Practices. California: San Francisco State University.
24. APPLYING PRINCIPLES TO THE
EVALUATION OF CLASSROOM TESTS
Prepared by: Aranda, Adeleine Joy C.
25. Applying Principles to the Evaluation of Classroom Tests
Language assessment is a broad discipline with many branches, interest areas, and issues, so other principles may also be invoked in evaluating and designing assessments.
26. Applying Principles to the Evaluation of Classroom Tests
It is far too complex to be consolidated into five principles. However, these five principles will serve as an excellent foundation on which to evaluate existing instruments and to build your own.
27. Applying Principles to the Evaluation of Classroom Tests
Remember, however, that the sequence of these questions does not imply a priority order. But when all is said and done, if validity is not substantiated, all other considerations may be rendered useless.
29. 1. Are the test procedures practical?
• Are the administrative details clearly
established before the test?
• Can students complete the test reasonably
within the set time frame?
• Can the test be administered smoothly, without procedural glitches?
30. 1. Are the test procedures practical?
• Are all materials and equipment ready?
• Is the cost of the test within budgeted limits?
• Is the scoring/evaluation system feasible in the teacher’s time frame?
• Are methods for reporting results determined in advance?
32. 2. Is the test Reliable?
Test Administration Reliability
-Part of achieving test reliability depends on
the physical context.
33. 2. Is the test Reliable?
Intra-rater Reliability
-Intra-rater reliability for open-ended responses may be enhanced by the following guidelines:
1. Use consistent sets of criteria for a correct response.
2. Give uniform attention to those sets
throughout the evaluation time.
34. 2. Is the test Reliable?
3. Read through tests at least twice to
check for consistency
4. If you have made “mid-stream” modifications of what you consider a correct response, go back and apply the same standards to all.
35. 2. Is the test Reliable?
5. Avoid fatigue by reading the test in
several sittings, especially if the time
requirement is a matter of several
hours.
36. 3. Does the procedure
demonstrate content
validity?
37. 3. Does the Procedure demonstrate content validity?
-There are two steps to evaluating the content validity of a classroom test:
1. Are classroom objectives identified and appropriately framed?
2. Are lesson objectives represented in the
form of test specifications?
38. Test Specifications
◈ A test specification (“spec”) is a generative
blueprint for test items or tasks from which
many equivalent test items or tasks can be
produced. Specs present an operationalization
of the test content, that is, they present the test
in measurable terms (Davidson, Hudson, &
Lynch, 1985).
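To illustrate the idea of a spec as a generative blueprint, the sketch below models a spec as a prompt template with slots, each filled from a list of interchangeable values. This is an assumption made for illustration, not the format Davidson, Hudson, and Lynch describe; `ItemSpec`, its fields, and the sample objective are hypothetical.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class ItemSpec:
    """Hypothetical test spec: one objective, one template, interchangeable slot values."""
    objective: str
    prompt_template: str
    slots: dict

    def generate(self):
        """Yield one item prompt for every combination of slot values."""
        names = list(self.slots)
        for values in product(*(self.slots[name] for name in names)):
            yield self.prompt_template.format(**dict(zip(names, values)))


spec = ItemSpec(
    objective="Ask for directions politely",
    prompt_template="You are at the {place}. Ask a stranger how to get to the {destination}.",
    slots={
        "place": ["train station", "airport"],
        "destination": ["library", "post office", "museum"],
    },
)

items = list(spec.generate())
print(len(items))  # 6 (2 places x 3 destinations)
print(items[0])
```

Every generated prompt targets the same objective, which is what would make the items equivalent in the sense described above.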
39. Test Specification
◈ An understanding of the fundamental concepts of measurement is essential to the development and use of language tests (Bachman, 1990).
◈ These include the terms measurement, test, and evaluation.
40. Measurement
◈ The process of quantifying the characteristics of persons according to explicit procedures and rules.
Three distinguishing features:
1. Quantification
2. Characteristics
3. Explicit rules and procedures
41. Test
◈ A psychological or educational test is a procedure designed to elicit certain behavior from which one can make inferences about certain characteristics of an individual (Carroll 1968: 46).
◈ A measurement instrument designed to elicit a specific sample of an individual’s behavior.
42. Evaluation
◈ Defined as the systematic gathering of information for the purpose of making decisions (Weiss 1972).
◈ Example: the use of qualitative descriptions of student performance for diagnosing learning problems.
43. 4. Is the procedure Face
Valid and “biased for
best”?
44. 4. Is the procedure Face valid and “biased for best”?
- This question combines the concept of face validity with the importance of structuring an assessment procedure to elicit the optimal performance of the student.
45. 4. Is the procedure Face valid and “biased for best”?
The test is face valid if…
• directions are clear
• the structure of the test is organized logically
• difficulty level is appropriately pitched
• the test has no surprises
• timing is appropriate
46. 4. Is the procedure Face valid and “biased for best”?
“Biased for best” concerns how the students view the test strategically and how the teacher prepares, sets up, and follows up on the test itself.
In an assessment procedure that is “biased for best” (Swain, 1984), a teacher:
• offers students appropriate review and preparation for the test,
• suggests strategies that will be beneficial, and
47. 4. Is the procedure Face valid and “biased for best”?
• structures the test so that the best students will be modestly challenged and the weaker students will not be overwhelmed.
48. 4. Is the procedure Face valid and “biased for best”?
Test-Taking Strategies
Before the test…
1. Give students all the information you can about the test: What will the test cover? Which topics will be most important? What kind of items will be on it? How long will it be?
2. Encourage students to do a systematic review of material.
49. 4. Is the procedure Face valid and “biased for best”?
Before the test…
3. Give them practice tests or exercises, if available.
4. Facilitate the formation of a study group, if possible.
5. Caution students to get a good night’s rest before the test.
6. Remind students to get to the classroom early.
50. 4. Is the procedure Face valid and “biased for best”?
During the test…
1. After the test is distributed, tell students to
look over the whole test quickly in order
to get a good grasp of its different parts.
2. Remind them to mentally figure out how much time they will need for each part.
51. 4. Is the procedure Face valid and “biased for best”?
During the test…
3. Advise them to concentrate as carefully as possible.
4. Warn students a few minutes before the end of the class period so that they can finish on time, proofread their answers, and catch careless errors.
52. 4. Is the procedure Face valid and “biased for best”?
After the test…
1. When you return the test, include feedback
on specific things.
2. Advise students to pay careful attention in
class to whatever you say about the test
results
53. 4. Is the procedure Face valid and “biased for best”?
After the test…
3. Encourage questions from students
4. Advise students to pay special attention in
the future to points on which they are
weak
54. 5. Are the test tasks as
authentic as possible?
55. 5. Are the test tasks as authentic as possible?
Evaluate the extent to which a test is authentic by asking the following questions:
• Is the language in the test as natural as possible?
• Are items as contextualized as possible rather than isolated?
57. 5. Are the test tasks as authentic as possible?
• Are topics and situations interesting, enjoyable, and/or humorous?
• Is some thematic organization provided, such as through a story line or episode?
• Do tasks represent, or closely approximate, real-world tasks?
58. 6. Does the test offer
beneficial washback to
the learners?
59. 6. Does the test offer beneficial washback to the learners?
An effective test should point the way to
beneficial washback.
A test that achieves content validity
demonstrates relevance to the curriculum in
question and thereby sets the stage for
washback.
A valid test must aim to provide a true measure of the particular skill it is intended to measure.
Validity is the most complex principle, but it is also the most important.
There are several kinds of evidence:
Speaking skill- test paper
Criterion-referenced test: how well the students have learned specific knowledge/skills in a certain course
Concurrent: the result is supported by other performance
Predictive: future outcome, as with a placement test
Construct: theories/studies/models/hypotheses
Consequential: effect of the test on the performance of students
Face: students view the test as fair, relevant, or useful for improving their learning