Session 2 Highlights
1. Session 1 revisited
2. Test production cycle
3. Approaches to listening item writing
Activity 1.1. CEFR-VN Listening Re-Familiarisation
• Work in groups of 4-5.
• Read the given list of descriptors on Worksheet 1.1.
• Decide on the appropriate level of each descriptor (B1, B2, C1).
• Be prepared to explain your answers.
• Please do NOT refer to the training materials.
Activity 1.1. CEFR-VN Listening Re-Familiarisation
• Italicised descriptors: newly added, subject to ongoing revision
• More descriptors in the training booklet (Tables 1.1–1.5)
Refer to these during the test development process.
Activity 1.2. Defining test construct
• Construct = what we want to measure
• Various ways to define it
- Competence-based: the underlying ability of test takers
- Task-based: the tasks they can perform (Target Language Use Domain)
• Each approach has drawbacks (Buck, 2001, pp. 102-108)
• Default listening test construct: one that can be operationalised
Activity 1.2. Defining test construct
• Spend 5’ reading Section 3.1 (p153/
• Then work in pairs & complete Worksheet 3.1 in 10’.
• Get ready to share your ideas with the whole class.
Activity 1.2. Defining test construct
1. purpose, TLU domain, resources
2. unique
3. fast, automatic, online processing
4. texts and topics
5. discourse skills, pragmatic knowledge, strategic
competence
6. inferred meanings
7. shared by ALL test-takers
8. general cognitive abilities
9. intelligence & common sense
10. Sociolinguistic variables, pragmatic inferences, TASKS
11. common communicative situations
Activity 1.3. (Listening) test specifications
• What?
Generative, explanatory documents for the creation of test tasks
A test blueprint
• Why?
- Test equivalence: same difficulty & objectives (even with different content)
- Critical review by test developers & users
Activity 1.3. (Listening) test specifications
• Popham (1978), Davidson & Lynch (2002)
- Guiding language
+ General description (skills to be tested)
+ Prompt attributes (texts & items)
+ Response attributes (student’s answer)
- Samples
- Specification supplement(s) (Optional)
Activity 1.3. (Listening) test specifications
• Weir’s socio-cognitive framework (2005)
Test development and validation
More detailed, item writer-friendly
• Now study a sample test spec in pairs.
(Worksheet 3.3)
- Does it include the components in Popham’s
model?
- Does it include any new/different information?
Activity 1.3. (Listening) test specifications
• Reverse engineering
- Study the listening test task
- Complete the test specifications
2. (Listening) test production cycle
Discussion Question
• How do you often prepare listening tests for your students? (What steps do you go through?)
2. (Listening) test production cycle
Discussion Question
• What should item writers be responsible for?
3. Approaches to item writing
• Item writing = craft
Learn from experts/ experienced item writers
- Expert vs. non-expert outcomes
- Expert performance process
- Items first vs. Script first
- Discussion of performance processes
Expert vs. Non-expert outcomes
Experts:
• Test scripts: speakerly quality (oral mode, contextual info, instructions)
• Item type: appropriate
• Adhere to specs
• Keys: constrained, low inference, salient points, well-spaced
Non-experts:
• Test scripts: written quality, little context
• Item type: inappropriate
• Syntactic vs. semantic
• Items do not follow text order / not enough processing time
• Keys: not constrained, high inference
4. Item writing techniques
• Purpose: comply with the test specifications
- Tested sub-skill (detail, main idea, inference, vocabulary)
- Level of difficulty (B1, B2, C1)
- Spacing
- Constrained keys
• Typical tricks/ tips/ ruses
4. Item writing techniques
• Typical tricks/ tips/ ruses
- Text-item barter: swap words between the text & the items
+ Simpler words: go into the items
+ More challenging synonyms: go into the text
- Plausible distraction: esp. for MCQs
E.g. time of arrival / departure / delay
- Text trimming
+ reduce text length
+ constrain the key
e.g. note completion: compact noun phrases give a shorter key (≤ 3 words)
- Script padding: add words to the text to provide more processing time
- Key modification / word-form shift: esp. in MCQs, to avoid dictation effects (paraphrase)
5. Controlled practice 1
• Work in groups of four.
• Look at the truncated specifications for a
listening test task.
• Then study the provided base text.
• Split each group into 2 pairs: 2 Qs per pair (20’)
• Review each other’s questions (10’)
• Share with the group next to you (5’)
• Be ready to present to the whole class.
Session 4 Highlights
6. Preparing listening texts
7. Working with native speakers
8. Practice with short texts (P1)
- Devise items
- Review items
6. Preparing listening texts
Step 1: Choose base texts
• Start with a topic (consult B1-C1 books)
Remember to avoid
• specialized materials
• biases (culture, gender, age, etc.)
• taboo topics (war, death, politics, religious
beliefs, terminal illnesses, etc.)
6. Preparing listening texts
Step 1: Choose base texts
• Two approaches
- Spoken texts: choose ones as similar as possible to the final edited texts (short news extracts/announcements for P1, conversations for P2, lectures for P3)
- Written texts: magazine/newspaper articles (P2, P3), forum chats (P1)
6. Preparing listening texts
• Spoken texts
- Less editing required (questionable?)
- Authentic (a quality contributing to test usefulness)
BUT
- Hard to find relevant ones
- Copyright issues
- Test security
• Written texts
- Extensive editing required
- Inauthentic (written as a script), esp. for inexperienced writers
BUT
- Easier to find base texts
- Better security
- Less risk of copyright infringement
Tip: Immerse yourself in authentic texts of the same genre!
6. Preparing listening texts
Step 2: Craft items // Edit texts
• Look for testable points
• Draft the stems & distractors
• Edit the text accordingly (bartering, padding,
trimming, etc.)
• Check the text level
6. Preparing listening texts
• Useful tools for examining linguistic demands (see the sketch after this list)
- Lexical resources: Lex Tutor, English Vocabulary Profile
- Grammatical resources (syntactic complexity): Coh-Metrix (the site is often down)
- General difficulty: readability indices (NB: originally designed for reading texts)
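As a rough illustration of the kind of readability check mentioned above, here is a minimal Python sketch computing the Flesch Reading Ease score for a draft text. The syllable counter is a crude vowel-group heuristic and the sample sentences are invented, so treat the output only as a quick signal; the dedicated tools listed above (and the caveat that readability formulas were designed for reading texts) still apply.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count groups of consecutive vowels (at least 1)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / len(sentences)) - 84.6 * (syllables / len(words))

if __name__ == "__main__":
    # Invented draft announcement, used only to demonstrate the check
    draft = ("The city council has announced that the central library "
             "will stay open late on weekdays. Visitors can now borrow "
             "books until nine in the evening.")
    print(f"Flesch Reading Ease: {flesch_reading_ease(draft):.1f}")
    # Higher scores indicate easier text; scores around 60-70 are usually read as plain English.
```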
7. Working with native speakers
• Vetting & editing
- Ask for reasons/ explanations
- Consult dictionaries/grammar books
• Studio recording
- Best to choose English teachers
- Avoid strong regional accents (e.g. Scottish)
- Calibrate before & during the session
- Supervise, remind & motivate voice actors
7. Working with native speakers
• Speech rates (wps = words per second; wpm = words per minute)
E.g. (Field, 2009)
A2 (KET): 2.51 wps = 150.6 wpm
B1 (PET): 2.79 wps = 167.4 wpm
B2 (FCE): 3.46 wps = 207.6 wpm
vs. normal speech: medium 200 wpm or 3.3 wps
Range: LOW 3.1 sps (2.25 wps / 135 wpm); HIGH 5.4 sps (7.44 wps / 446 wpm)
A quick rate check for a recording is sketched below.
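Since wpm is simply wps × 60, checking whether a draft recording sits near the intended band is basic arithmetic. The Python sketch below is illustrative only: the reference rates are the Field (2009) sample figures from this slide, not official VSTEP cut-offs, and the 420-word / 150-second recording is a made-up example.

```python
# Sample rates (words per second) from Field (2009), used here only as reference points
FIELD_2009_RATES_WPS = {"A2 (KET)": 2.51, "B1 (PET)": 2.79, "B2 (FCE)": 3.46}

def speech_rate(word_count: int, duration_seconds: float) -> tuple[float, float]:
    """Return (words per second, words per minute) for a recording."""
    wps = word_count / duration_seconds
    return wps, wps * 60  # wpm is wps scaled by 60

def nearest_band(wps: float) -> str:
    """Name the Field (2009) sample level whose rate is closest to this recording."""
    return min(FIELD_2009_RATES_WPS, key=lambda level: abs(FIELD_2009_RATES_WPS[level] - wps))

if __name__ == "__main__":
    # e.g. a hypothetical 420-word B1 script read in 150 seconds
    wps, wpm = speech_rate(420, 150)
    print(f"{wps:.2f} wps = {wpm:.1f} wpm, closest to {nearest_band(wps)}")
```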
8. Hands-on Practice (P1)
• Work in groups of four (1,2,3,4).
• Design one question for each base text
following the test specifications.
• Write down your question on the A0 paper
(10’).
• Stick it on the board.
• Let’s review
8. Hands-on Practice (P2)
• Work in pairs.
• Review the given items CRITICALLY (15’)
Fun Fact
Dear Colleagues,
We all know that good quality test items are difficult to produce, and when finished they represent a considerable investment by test developers/publishers. Writing the item is only the first phase; items then go through a number of review and revision stages, usually by highly paid professionals and often by whole committees. Then they are piloted on a (large) number of suitable test takers. This whole process is expensive and time-consuming, especially when we consider that at each step of this process a number of items are discarded, often amounting to a large proportion of the original number.
Items that make it to final forms represent a considerable investment. The question I have for the list is whether anyone knows of any data or research available that attempts to calculate the actual cost of final-form items in typical high-stakes testing environments?
The only thing I have found is a 2005 paper by Richard Luecht on the cost of various CBT models, where he discusses the average cost per item (ACPI) as typically being from several hundred dollars to fifteen hundred dollars per item; later he assumes about $500 per m/c item. This makes sense to me, but he gives no source for these assumptions.
Overview of VSTEP Listening Test
• Spend 2-3’ looking at the sample test
(p63/53).
• Answer the following Qs
1. How many parts?
2. How many questions (Qs)?
3. How many recordings & Qs in Part 1/2/3?
4. How many speakers?
5. Single/ Double play?
Overview of VSTEP Listening Test
Part | Time (min) | Qs | Recordings | Speakers | Play | Text type
1 | 10-11 | 8 | 8 | 1-2 | Single | Instructions, announcements, short conversations in standard accents
2 | 10-12 | 12 | 3 | 2-3 | Single | Conversations between 2-3 native/fluent speakers using a variety of accents, including ESL/EFL accents (e.g. Vietnamese)
3 | 12-15 | 15 | 3 | 1 | Single | Public talks/lectures using a variety of accents, including ESL/EFL accents (e.g. Vietnamese)
Whole test | 32-38 (+7’ transfer) | 35 | 14 | - | Single | Variety
Cognitive validation asks…
• Does a test elicit from test takers the kind of process that they would use in a real-world context? In the case of listening, are we testing the kinds of process that listeners would actually use?
• Or do the recordings and formats that we use lead
test takers to behave differently from the way they
would in real life?
Source: Field, 2013
Phases of listening
(Field 2008, 2013)
(Diagram: speech signal → words → meaning, via five processes)
1. Input decoding
2. Lexical search
3. Parsing
4. Meaning construction
5. Discourse construction
Source: Field, 2013
Issues of cognitive validity
• A. To what extent do the processes elicited by a test
resemble real-world processes?
• B. To what extent are the processes elicited by a test
comprehensive enough to represent the range of
processes that make up a skill?
• C. Are the processes finely enough calibrated to
reflect what a listener is capable of at the target
level?
Source: Field, 2013