Topic 9: Testing the four language skills
(Hughes, 1993/2002, Chapters 9 to 12)
Hoa Nguyen
1. Testing listening
1. Specify what candidates should be able to do
• Content
– Operation: macro-skills and micro-skills
– Types of text: dialogue, short talk, lecture
– Addressees: students, ESP students…
– Topics: general interests or specific
• Setting criterial levels of performance: full mark or
partial credit for each item, and the requirement for a
“pass”.
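The full-mark/partial-credit distinction and the “pass” requirement above can be put in concrete terms. A minimal sketch (the item weights and the 60% pass threshold are invented for illustration, not taken from Hughes):

```python
# Hypothetical sketch of scoring with full-mark and partial-credit
# items plus a pass threshold. The 0.6 cut-off is an assumption.

def score_test(item_scores, item_maxima, pass_threshold=0.6):
    """Return (percentage, passed) for one candidate.

    item_scores: points awarded per item (may include partial credit)
    item_maxima: maximum points available per item
    """
    percentage = sum(item_scores) / sum(item_maxima)
    return percentage, percentage >= pass_threshold

# Three items: full credit, partial credit, no credit.
pct, passed = score_test([1.0, 0.5, 0.0], [1.0, 1.0, 1.0])
```

With a 0.6 threshold this candidate, at 50%, would not reach a “pass”; raising or lowering the threshold is exactly the kind of criterial decision the slide refers to.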
1. Testing listening (cont.)
2. Setting the task
– Selecting samples of speech
– Speech should be chosen with the test specifications in mind (authentic
sources such as radio or TV programmes, or teaching materials)
– Quality of the recording should be taken into consideration to ensure
the validity and reliability of the test.
– Writing items
– Note key information
– Time gap between items
– Recording vs. live presentations
– Item formats:
» Multiple choice/true–false
» Short answer
» Information transfer
» Note-taking
» Partial dictation
3. Scoring the listening tests
– Errors of grammar or spelling should not be penalized,
provided that the intended response is clearly correct.
2. Testing reading
1. Specify what candidates should be able to do
• Content
– Operation: macro-skills (locate specific information,
obtain the gist, identify the stages of an argument) vs.
micro-skills (identify the referents of pronouns, use context
to guess the meaning of unfamiliar words)
– Types of text: textbooks, newspapers, academic
journals, magazines, newspaper advertisements….
– Addressees: intended audience for reading materials
– Topics: general interests or specific
• Setting criterial levels of performance: full mark or
partial credit for each item, and the requirement for a
“pass” (what percentage?).
2. Testing reading (cont.)
2. Setting the task
2.1 Selecting texts
- Keep the specifications in mind and choose a representative
sample
- Text length
- Choose a number of reading passages to increase reliability
- A passage with plenty of discrete pieces of information is
good for testing scanning skills
- Avoid texts that require specialist knowledge of a particular field or
a specific cultural background (culturally laden texts)
2. Testing reading (cont.)
2.2 Writing items
• Set tasks which will involve candidates in providing
evidence of successful reading
• Questions types:
• Multiple choice/true–false (problems?)
• Unique answer (single word or number)
• Short answer
• Guided short answers
• Summary cloze
• Information transfer
• Identify order of events, topics, or arguments
• Identifying referents
• Guessing the meaning of unfamiliar words from context.
• Others: matching the headings
2. Testing reading (cont.)
Procedure for writing items
– Read the text carefully
– Keep specifications in mind
– Note the interesting/key information
– Determine reasonable tasks for test takers relating to
interesting/ key information
– Draft items:
• Items should follow the order of the text if scanning skills are
required.
• Responses should make minimal demands on writing ability
• Avoid items whose correct response can be found without
reading or understanding the text
– Present drafted items to colleagues for moderation
– Revise the items based on feedback
2. Testing reading (cont.)
2.3 Scoring the reading tests
• Errors of grammar, spelling, or punctuation should not
be penalized, provided that the candidate has
successfully performed the reading task set by the
item.
3. Testing speaking
1. Specifying all appropriate tasks
• Content:
– Operation (expressing opinions, comment, attitude…)
– Text types
– Addressees and topics
– Criterial levels of performance
• Format
– Interview
– Interaction with peers
– Responses to tape recordings (uniformity of elicitation
vs. inflexibility: no way of following up candidates’
responses)
3. Testing speaking (cont.)
2. Obtaining a sample of candidate’s speaking ability
• Planning and conducting the test carefully
– Make the oral test as long as is feasible
– Include a wide sample of specified content
– Plan the test carefully: follow some pattern
– Give the candidate as many fresh ‘starts’ as possible
– Select interviewers carefully and train them
– Use two interviewers
– Tasks and topics should be familiar to test takers (not difficult in their
own language)
– Carry out the interview in a quiet room with good acoustics
– Put candidates at ease: easy questions to start with, natural
transitions between topics, end the interview at a level candidates feel
comfortable (leaving him or her with a sense of accomplishment).
– Collect enough relevant information
– Do not talk too much.
3. Testing speaking (cont.)
• Eliciting techniques
– Avoid Yes-No questions
– Use pictures for description
– Role play
– Discussion between candidates
– Use tape-recorded stimuli (see the example on page
109 of Hughes’s book) and give the test taker some time
(e.g. 10 seconds) before requiring a response
3. Obtaining valid and reliable scoring
• Describing the criterial levels
• Training of scorers
4. Testing writing
1. Setting the task
• Selecting all appropriate tasks and selecting a
sample, keeping specifications in mind
• Operations: describe, explain, compare &
contrast, argue
• Text types: form (letter, postcard, note, forms,
argument); types (announcement, description,
narration, comment/opinion); length
• Addressees: friends, university lecturers,
unspecified
• Topics: various
4. Testing writing (cont.)
2. Eliciting sample of writing
• Set as many tasks as is feasible: several tasks
increase validity, but practicality must also be
ensured.
• Test only writing ability: instructions should be
clear and NOT too long.
• Restrict candidates (writing tasks should be
well-defined: what is required, scope, aspect)
• Give no choice of task or a very limited choice
of task.
4. Testing writing (cont.)
3. Scoring reliably
• Holistic scoring (‘impressionistic’ scoring):
• A single score on the basis of an overall impression.
• E.g. two tasks are set to see whether a test taker’s writing ability is
adequate for university study; each task is scored twice, giving a
scorer reliability of 0.9.
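A scorer-reliability figure like the 0.9 above can be estimated as the correlation between two independent scorings of the same scripts. A minimal sketch using the Pearson coefficient (the two score lists are invented example data, not from Hughes):

```python
# Illustrative sketch: scorer reliability as the Pearson correlation
# between two independent scorings of the same six scripts.
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

scorer_a = [4, 5, 3, 6, 4, 5]  # first scoring of six scripts
scorer_b = [4, 6, 3, 5, 4, 5]  # second, independent scoring
r = pearson(scorer_a, scorer_b)
```

The closer r is to 1, the more consistently the two scorings rank the candidates; with the invented data above r comes out around 0.82.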
• Analytic methods of scoring
• Methods that assign a separate score to each of a number of aspects
of a task (e.g. a 1–6 scale applied to grammar, vocabulary,
mechanics, fluency, and organization) are said to be analytic.
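An analytic scheme of this kind can be sketched as follows. The aspect names follow the slide; the candidate's bands are invented, and averaging the five aspects equally is an assumption, since real schemes often weight aspects differently:

```python
# Hypothetical sketch of analytic scoring: each aspect gets its own
# 1-6 band, and the profile is summarized by an unweighted average.

ASPECTS = ["grammar", "vocabulary", "mechanics", "fluency", "organization"]

def analytic_score(bands):
    """bands: mapping of aspect -> band on a 1-6 scale."""
    for aspect in ASPECTS:
        if not 1 <= bands[aspect] <= 6:
            raise ValueError(f"{aspect}: band must be between 1 and 6")
    return sum(bands[a] for a in ASPECTS) / len(ASPECTS)

candidate = {"grammar": 4, "vocabulary": 5, "mechanics": 3,
             "fluency": 4, "organization": 4}
avg = analytic_score(candidate)
```

Keeping the per-aspect bands alongside the average preserves the diagnostic profile that is the main advantage of analytic over holistic scoring.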
• The conduct of scoring
– Holistic or analytic
– Identifying benchmarks
– Marking environment (well-lit, quiet) and scorer’s state of mind
– Multiple scoring