Specs design

Language Testing and Assessment
Summary of Units B4 and C4

Unit B4:

Optimal Specification
Design

The central question in this unit is:

“What is the optimal design for a test
specification and what elements
should it include?”

The Difference between
Prompt Attributes ‘PA’
&
Response Attributes ‘RA’

In Popham’s model, the Prompt Attribute
describes the input to the examinee while
the Response Attribute describes what the
examinee does as a result.

Bachman and Palmer (1996) phrased this
same distinction in different terms using
‘characteristics of the input’ versus
‘characteristics of the response.’

Clearly, the author disagrees with the
majority. It’s also clear that he’s open to
change of perspective. How do we know
this?

(a) The author states precisely a willingness to
change.
(b) In line 10 ‘there may be some value to the
opposing view if...’ suggests a willingness to
change.
(c) The title suggests a willingness to change.
(d) The comment near the end indicates
willingness to change: ‘I may be persuaded if...’

1- The RA for this item might simply
read: “The student will select the
correct answer from among the choices
given.”

If that were the case, then the spec
writer has decided on a rather
minimalist approach to the PA/RA
distinction, describing the actual action
performed by the examinee.

2- An alternative RA might read like this:

a- The student will study all four choices.
b- If a particular choice references a
particular line in the passage, the student will
study that line carefully.
c- He or she will reread the passage to
eliminate three choices.
d- Then the student will select the correct
answer from among the choices.

Either of these RAs
could work in conjunction
with a PA
similar to the following:

“1- The item stem poses a question about
the author’s viewpoints, which will require
inference from the text.

2- Choices ‘a’, ‘b’, and ‘d’ are distracters that
attribute to the passage a comment that the
author didn’t make, or which is taken out of
context and misinterpreted.

Choice ‘a’ refers to a comment the author
made, without actual reference in the text,
while choices ‘b’ and ‘d’ refer to some part of
the text (e.g., a line number, a paragraph, a
header, a title).

3- Choice ‘c’ will be the key or correct
response; it may use any of the locator
features given above (line number,
paragraph, header, title, etc.), or it can
simply refer to the whole passage.”

This PA/RA formula is a classic model of a
spec for multiple-choice items.

In this formula, all guidelines about the item
are in the PA: the entire description of its
stem, its choices, why the incorrect choices
are incorrect, and why the key is correct is
considered to be part of the prompt and not
the response.

The choices themselves seem to be part of the
examinee’s thinking.

In our multiple-choice item, the examinee will
probably double-check whether the author did
indeed say what is claimed in line 10 or near
the end and if so, whether it is being interpreted
correctly. In effect, the item itself is a kind of
outline of the examinee’s answering strategy; a
layout of the response.

Guidance about both the prompt and the
response is important in a test specification.

It is possible to fuse the PA and RA and simply
give clear specification guidance on both;
actually, we could create a new spec element
(the ‘PARA’) in which we can put all this
guidance.

The basic element of spec design is
producing samples and the guiding
language that goes with them.

Guiding language and samples constitute
a minimalist definition of a specification, in
an attempt to disentangle prompt from
response.

‘Event’ vs. ‘Procedure’
&
Specplates
as a universal design

A testing event is a single task or test
item, such as a multiple-choice item.

A procedure is a set of events or tasks
such as an oral interview or a portfolio
assessment for teacher observation.

Test developers organize items into a test using
a ‘table of specs’ that presents information at a
very global level:

- How many of each item type and skill are
needed?

- What special materials are required to develop
the test?

Specplates

A ‘specplate’ is a combination of the
words ‘specification’ and ‘template,’ a
model for a specification, and a
generative blueprint which itself
produces blueprints.

Over time, certain specs fuse into a higher-
order specification. A specplate is a guiding
tool to ensure that the new specifications
meet a common standard established by the
existing specs. One type of information that
might appear in a specplate is guidance on
task type.

PA (excerpt)

For a M.C. task on verb tense and voice
agreement: Each incorrect choice
(distracter) in the item must be incorrect
according to the focus of the item. One
distracter should be incorrect in tense,
another incorrect in voice, and the third
incorrect in both tense and voice.

PA Specplate (excerpt)

“When specifying the distracters, the PA
should contain the following language ‘Each
incorrect distracter in the item must be
incorrect according to the focus of the item.’
Immediately following that sentence, the PA
should clearly specify how each of the three
distracters is incorrect.”

“You are encouraged to employ (if feasible) the
dual-feature model of multiple-choice item
creation, namely:

Key: both of two features of the item are
correct (tense/voice)
Distracter 1: one of two key features of the
item is incorrect (tense/voice)
Distracter 2: the other of two key features of
the item is incorrect (tense/voice)
Distracter 3: both of two key features of the
item are incorrect (tense/voice).”

The ‘magic formula’ model of M.C. item
creation is: crafting an item for which, in order
to get the item right, examinees must do two
things correctly.
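
As a rough illustration, the dual-feature model can be enumerated mechanically: two features, each either correct or incorrect, yield one key and three distracters. The sketch below is not part of the original material; the function name `dual_feature_options` and the dictionary layout are assumptions made purely for illustration.

```python
from itertools import product


def dual_feature_options(feature_a: str, feature_b: str) -> list[dict]:
    """Enumerate the four options of a dual-feature multiple-choice item.

    Hypothetical helper: each option records its role and the features
    in which it is incorrect.
    """
    options = []
    # Every combination of (feature_a correct?, feature_b correct?).
    for a_ok, b_ok in product([True, False], repeat=2):
        wrong = [name for name, ok in ((feature_a, a_ok), (feature_b, b_ok)) if not ok]
        # Only the option with no incorrect feature is the key.
        role = "key" if not wrong else "distracter"
        options.append({"role": role, "incorrect_in": wrong})
    return options


options = dual_feature_options("tense", "voice")
for opt in options:
    print(opt)
```

The enumeration makes the point of the ‘magic formula’ visible: only one of the four combinations gets both features right, so the examinee must do two things correctly to reach the key.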

Once the specplate has been written, it can
serve as the starting point for new specs that
require those features. Rather than starting
from scratch each time, the specplate
generates the specification shell and important
details follow somewhat automatically.
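
One way to picture a specplate producing a specification shell is as a small function that stamps the required language into every new spec. This is only a sketch under stated assumptions: the `Spec` class, the `spec_from_specplate` function, and the field names are hypothetical, and only the required sentence is taken from the PA specplate excerpt above.

```python
from dataclasses import dataclass, field

# Sentence the PA specplate excerpt says every PA should contain.
REQUIRED_SENTENCE = (
    "Each incorrect distracter in the item must be incorrect "
    "according to the focus of the item."
)


@dataclass
class Spec:
    """Hypothetical minimal spec: guiding language plus samples."""
    title: str
    prompt_attribute: str
    samples: list[str] = field(default_factory=list)


def spec_from_specplate(title: str, distracter_rules: list[str]) -> Spec:
    """Generate a spec shell whose PA opens with the required sentence."""
    if len(distracter_rules) != 3:
        raise ValueError("the specplate excerpt asks for three distracters")
    details = " ".join(distracter_rules)
    return Spec(title=title, prompt_attribute=f"{REQUIRED_SENTENCE} {details}")


spec = spec_from_specplate(
    "M.C. task on verb tense and voice agreement",
    [
        "One distracter should be incorrect in tense.",
        "Another should be incorrect in voice.",
        "The third should be incorrect in both tense and voice.",
    ],
)
print(spec.prompt_attribute)
```

The shell starts with an empty list of samples; in practice the spec writers would add sample items and refine the guiding language from there.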

Ownership
Feeling ownership of a spec is part of human
nature because of a sense of investment in the
test-crafting process.

However, a well-crafted test is never owned
by a single individual. Thus, a simple
historical record of contributions is the best
way to attribute a spec to its various
authors.

Disagreement is sometimes inevitable in
specs design; yet, a compromise between
opposing positions is possible.

There is consensus that the faculty will
observe the test in action and decide after a
while whether more changes are needed.

Summary of Unit B4
The central focus of this unit was the nature
of test specs and their elements.

We have raised and tried to answer the
question: what are the essential minimum
components to specs beyond the bare
minimum of guiding language and samples?

In the conclusion to Unit A4 we listed the
following elements of a specification-driven
testing theory:

■ Specs exist.
■ Specs evolve.
■ The specs are not launched until ready.
■ Discussion leads to transparency.
■ All are welcome to discussion.

Unit C4:

We saw in Unit B4 that all specs share
two common features:
1- spec-generated sample items,
2- the guiding language that goes with them.

In this Unit, we will focus on some
design considerations that arise as a
spec evolves.

[V. 1: Guiding language on the scoring
scale]

The objective of this spec is for students to
produce a role-play task on the pragmatics
of making a complaint in a simple everyday
situation.
In a role-play with the teacher, students are
asked to plan and render a complaint about
something that has gone wrong.

Scoring of the interaction will be as follows:
1- not competent – the student displayed little
command of the situation pragmatics.
2- minimally competent – the student used
language of complaint, but the interaction was
hesitant and/or impolite.
3- competent – the student’s interactions were
smooth and generally fluent, and there was no
use of impolite language.
4- superb – the student’s interactions were
smooth and very fluent, and in addition, the
student displayed subtle command of nuance.

[Version 1, sample one]
You’ve recently purchased a radio; back home, you
discover that a part is missing from the box.

[Version 1, sample two]
After getting back home from shopping, you discover
that a jar of peanut butter is open and its seal is
punctured, so you’re worried that it may be unsafe to
eat.

In both cases, you want to return to the store to
resolve the situation with the manager.

(a) write out a plan of what you will say, then,
(b) role-play the conversation with your teacher.

[V. 2: Guiding language on the scoring scale]

1- not competent – the student displayed little command of
the pragmatics of the situation. If the student wrote a plan, it
was inadequate or not implemented.
2- minimally competent – the student used language of
complaint, but the interaction was hesitant and/or impolite.
The student’s plan may have been adequate, but the student
was unable to implement it.
3- competent – the student’s interactions were smooth and
generally fluent, and there was no evidence of impolite
language use. The student wrote a viable plan and generally
followed it during the interaction.
4- superb – the student’s interactions were smooth and very
fluent, and in addition, the student displayed subtle command
of nuance. The student wrote a viable plan and generally
followed it during the interaction.

After some time, the descriptor for Level 4 is
improved again, Levels 1-3 being unchanged:

[V. 3: Guiding language on the scoring
scale]

4- superb – the student’s interactions were
smooth and very fluent, and the student displayed
subtle command of nuance. He/she wrote a viable
plan and generally followed it during the
interaction. Alternatively, the student wrote little (or
no) plan, but seemed to be able to execute the
interaction in a commanding and nuanced manner.
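
To see at a glance which levels changed between versions, the plan-related wording of each level can be stored per version and compared. The following is a minimal sketch, not part of the original material: the names `PLAN_V2`, `PLAN_V3`, and `changed_levels` are hypothetical, and only the plan-related sentences of each descriptor are included.

```python
# Plan-related sentence of each level in Version 2 of the scale.
PLAN_V2 = {
    1: "If the student wrote a plan, it was inadequate or not implemented.",
    2: "The student's plan may have been adequate, but the student was "
       "unable to implement it.",
    3: "The student wrote a viable plan and generally followed it during "
       "the interaction.",
    4: "The student wrote a viable plan and generally followed it during "
       "the interaction.",
}

# Version 3 only extends Level 4 with an alternative to the written plan.
ALTERNATIVE = (
    "Alternatively, the student wrote little (or no) plan, but seemed to be "
    "able to execute the interaction in a commanding and nuanced manner."
)
PLAN_V3 = {**PLAN_V2, 4: PLAN_V2[4] + " " + ALTERNATIVE}


def changed_levels(old: dict[int, str], new: dict[int, str]) -> list[int]:
    """Return the levels whose wording differs between two versions."""
    return sorted(level for level in new if old.get(level) != new[level])


print(changed_levels(PLAN_V2, PLAN_V3))  # prints [4]
```

Keeping each version as data gives the simple historical record of changes that the ownership discussion recommends.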

There are some interesting questions that
arise:

- What is the role of the written plan?
- Why have the instructors adapted the scoring scale
to reflect alternative use of the plan?
- Do you suspect that any changes might be coming
for level 3 on the scale?
- Do you suspect that the plan may prove to be an
optional testing task, in general?
- Do you think that the plan may prove unworkable?

Planning causes debate which in turn
causes change

A newcomer arrives at the faculty at a point
in time between Version 2 and Version 3, an
energetic instructor who plays the role of a
productive debater in meetings.

This new instructor asks, “Do we plan when
we do complaints in real life, and if yes, do
we write it?”

The newcomer causes the teachers to watch
carefully the use of this task in the next test
administration, and sure enough, there are high-
level students for whom the plan is irrelevant and
a waste of time.

New questions arise here:
- What obligations do teachers have to challenge
each other and help make tests better?
- What ownership should be given to this new
teacher or to any new teacher?

However… change stagnates

Gradually, teachers stop teaching the written
plan in their lessons, and most students do not
produce one during the test.

The instructors simply stop looking at the spec,
they stop using a written plan, and the task
evolves beyond reference to the spec.

Then, one teacher remembers to teach written plans
and the students feel they did better on the test
thanks to plan writing.

- Should students be welcome to discussions of test
evolution and change?
- Should teachers re-visit and re-affirm the wording of
the spec, which does permit a plan?
- Or should they follow their own instinct and ignore
this student feedback, encouraging role-plays without
written plans?
- Should teachers continue to heed the advice of their
‘energetic colleague’ and teach their students to do
such tasks without written plans, because that is more
authentic?

Application

Conduct a day-long reverse-engineering
workshop with your colleagues on a test
task.

1- Introduction and Welcome:
Orient your colleagues to selected tasks. The
goal of this part is not to revise the tasks but to
make sure they know what the tasks are.

Orient the participants to the basic design of
specs: samples and guiding language. Don’t show
actual specs because people will think that the
spec samples you show are how all specs should
be written. In addition to the critical analysis that is
the target of the day, you want an organic, bottom-
up growth of specs.

2- Group Phase 1:
Divide the participants into groups or pairs,
each being assigned the same set of tasks.
Ask each group to do straight reverse
engineering and write out what they think is
the guiding language for the tasks without
recommending any changes.

This should be followed by a report back.

3- Vent Your Spleen:
In the whole group, allow people to vent
about test tasks they have never liked –
tasks they did not analyze in Phase 1.

Based on the judgmental splenetic
discussion that will certainly result, select a
new set of tasks, and proceed to the next
step.

4- Group Phase 2:
Divide the participants into groups, each having
to do critical reverse engineering of some tasks
about which they feel particularly splenetic. The
goal is a set of specs that improve testing in
your situation. A report back should follow.

5- ‘What’s Next?’
The group discusses which specs stand a
reasonable chance of implementation. Not
everything that arises will be feasible. Some
things will be difficult to implement. But some
should survive.

Summary
This Unit was a practical application of Units A4
and B4, a way to drill the theoretical notions
and concepts that we have studied in both units.
The Unit proposes more exercises related to
validity, as in Unit C1.
