2. Speaking is ….
Important part of life
Important part of a language curriculum
-> Assessment needs to reflect that
However, it is challenging because …
So many dimensions, so little time
Links with personality, culture, context, etc.
3. Basic concepts in assessing speaking
Speaking tasks
Scoring and rating scales
Reliability and Validity
5. 1.1. Approaches in assessing speaking
1.2. Speaking tasks
1.3. Scoring
1.4. Reliability and Validity
6. Read the book (pp. 7-10) and summarize the key information.
Add more information you know about approaches in assessing speaking.
7. CONSTRUCT-BASED APPROACH vs. TASK-BASED APPROACH
Construct-based approach:
• Primarily focuses on the construct of language ability -> makes sure that the scores really reflect skills in speaking or spoken interaction
• Test developers need to define the construct with reference to theoretical models, course syllabuses, and/or careful needs analysis
Task-based approach:
• Uses tasks and language-use contexts as the first level of categorisation
• Test developers need to show that the content of the test tasks represents the demands of the corresponding task outside the test situation
1.1. APPROACHES IN ASSESSING SPEAKING
9. STAND-ALONE vs. INTEGRATED TASKS
Stand-alone:
• Avoid mixing extended reading, writing or listening activities
• Easy to score
Integrated:
• Deal with a task (a situation)
• Attend to examinees’ comprehension of the input material and its effect on fluency and quality of content
• Far less agreement about final ratings
10. Activity 1: Analyze the assessment approaches in
the following test questions:
1. IELTS speaking test questions
Part 1: Let’s talk about your home town or village.
• What kind of place is it?
• What’s the most interesting part of your town/village?
• What kind of jobs do the people in your town/village do?
• Would you say it’s a good place to live? (Why?)
11. 2. TOEFL iBT® speaking test questions
Question 1: Talk about a pleasant and memorable event that happened while you were in school. Explain why this event brings back fond memories.
3. VSTEP
Part 2:
Situation: You are having a birthday party and many of your
friends are invited. Three locations are suggested: at home,
in a restaurant, and in a karaoke bar. Which do you think is
the best place for the party?
12. 1.2. TEST MODE
Live:
• The most common way of assessing
• Two-directional (one-to-one / one-to-two)
• The construct assessed: spoken interaction
Tape-based:
• Monologue speaking tasks
• One-directional
• More clearly concerned with spoken production
14. DEFINITIONS
Nunan (1993: 59) defines a communicative task as:
“. . . a piece of classroom work which involves learners in comprehending,
manipulating, producing or interacting in the target language while their
attention is principally focused on meaning rather than form. . . . Minimally, a
task will consist of some input data and one or more related activities and
procedures. Input refers to the data that learners are to work on: it may be
linguistic (e.g. a radio broadcast), non-linguistic (e.g. a set of photographs),
or ‘hybrid’ (e.g. a road map). In addition, tasks will have, either explicitly or
implicitly (and in most cases these are implicit), goals, roles of teachers
and learners and a setting.”
15. Definition
- Input data: linguistic vs. non-linguistic, hybrid
- Goals: explicit vs implicit
- Setting: specific to each task
- Roles: of teachers and learners/ of examiner(s) and
examinee(s)
16. DEFINITIONS
Bachman & Palmer (1996: 44):
“Speaking tasks can be seen as activities that
involve speakers in using language for the purpose
of achieving a particular goal or objective in a
particular speaking situation.”
-> Goal-oriented language use
19. Features influencing task difficulty:
• The complexity of task materials
• Task familiarity
• Cognitive complexity and planning time
• Interlocutor effects
-> Effects tend to be small and difficult to predict
22. Speaking scores express how well the examinees can speak the language being tested.
Scales express the developers’ understanding of how good performances differ from weak ones; they form part of their definition of the construct assessed in the test.
Rating scale and scoring rubric are used interchangeably; both refer to “a set of generic descriptions of student performance which can be used to assign scores to an individual student’s performance in a systematic fashion” (Carr, 2011).
1.3. Scoring
23. Reliability:
“Reliability is usually defined as score consistency”
(AERA, 1999; Brown and Hudson, 2002; Henning,
1987)
Inter-rater reliability
Intra-rater reliability
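Inter-rater reliability is often quantified as the correlation between two raters’ scores for the same examinees. A minimal sketch in Python; the function name and the band scores for six examinees are invented for illustration, not taken from any cited source:

```python
from statistics import mean

def inter_rater_correlation(rater_a, rater_b):
    """Pearson correlation between two raters' scores for the same examinees.

    Values near 1.0 indicate the raters rank and score examinees consistently.
    """
    ma, mb = mean(rater_a), mean(rater_b)
    cov = sum((a - ma) * (b - mb) for a, b in zip(rater_a, rater_b))
    var_a = sum((a - ma) ** 2 for a in rater_a)
    var_b = sum((b - mb) ** 2 for b in rater_b)
    return cov / (var_a * var_b) ** 0.5

# Hypothetical band scores (1-9) awarded by two raters to six examinees
rater_a = [5, 6, 7, 4, 8, 6]
rater_b = [5, 7, 7, 4, 7, 6]
print(round(inter_rater_correlation(rater_a, rater_b), 2))
```

Intra-rater reliability can be checked the same way, by correlating one rater’s scores for the same performances on two occasions.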
24. Validity:
Refers to the meaningfulness of the scores, which defines a broad scope of concerns.
Aspects of validity:
• content coverage -> content analysis, data analysis
• correspondence between the test and the non-test activities -> claims & evidence
• the impact of the test on the stakeholders -> washback effects
29. Describe a journey that you remember well.
You should say:
• where you went
• how you travelled
• why you went on the journey
and explain why you remember this journey well.
31. Situation: You are having a birthday party and many of your
friends are invited. Three locations are suggested: at home, in a
restaurant, and in a karaoke bar. Which do you think is the best
place for the party?
Decision tasks
32. 1. What is your opinion on the way languages are taught in
schools?
2. How can the type of school you go to affect career success?
3. What changes do you think will happen in the classroom in
the near future?
Explain and
predict
34. Topic: Mobile phones are useful tools at schools.
[Mind map: mobile phones -> mobile office tools; means of communication; access to the Internet; your own ideas]
39. Luoma (2004), based on the relative amount of structure that the tasks provide for the test discourse, distinguishes two types of speaking tasks:
• Open-ended speaking tasks
• Structured speaking tasks
Carr (2011), based on the number of test takers involved in the assessment and on the interaction between the examiner and the examinee, distinguishes two types of speaking tasks:
• Interview
• Monologue
40. Brown (2004), based on micro- and macro-skills of speaking, distinguishes five types of speaking tasks:
• Imitative
• Intensive
• Responsive
• Interactive
• Extensive
42. What affects the possibility to
apply these task types in
speaking assessment in your
context?
43. Factors influencing the application of task types in speaking assessment:
• Test purpose
• Practical circumstances (time, place, materials)
• Construct-related information that the scores must deliver
(Luoma, 2004)
44. What should be done to develop speaking
tasks into a speaking test?
45. Developing speaking tasks – practical issues:
• Choosing what to test
• Writing task specifications
• Writing/producing the actual task materials and tasks:
- Find appropriate topics and scenarios
- Imagine the communication during the actual testing situation
- Create all the materials that are needed with the help of the task specifications
• The language of task instructions:
- It is important to keep the instructions and prompts simpler than the expected performance of the examinees
47. Task-related documents and materials:
the rubric and the instructions to examinees;
the task materials, which the examinees
use while performing the tasks (if relevant);
an interaction outline (interlocutor
frame), which gives guidelines or scripts for
examiners about the content and wording of
questions or prompts;
plans and instructions for administration.
48. Group work
Read the following interaction outline to
analyze:
- The questioning techniques presented
- The role of the examiner in the test
- The similarity and/or difference of this
interaction outline compared with the one
you often work with
49.
50. Interlocutor/examiner skills
• Comparability between different examiners to ensure fairness
• Keep the outline handy during the test, but do not let it affect the genuine interaction
53. Speaking scores express how well the examinees can speak the language being tested.
Holistic vs. analytic approaches to assessing speaking ability
54. Holistic approach: a test-taker’s performance on a task (or set of tasks) is judged in terms of its overall quality, or the overall level of ability displayed, and a single rating is given to the performance of the task.
Analytic approach: concerned with specific performance features or portions of the overall construct; each subscale is considered separately and receives its own rating.
(Luoma, 2004)
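The holistic/analytic distinction can be sketched in code. The criteria, weights, and scores below are invented for illustration, and a weighted mean is only one possible way to combine analytic subscores:

```python
# Analytic: each subscale is rated separately (hypothetical criteria and bands).
analytic_ratings = {"fluency": 6, "pronunciation": 5, "vocabulary": 7, "grammar": 5}

# Holistic: one overall judgement yields a single score.
holistic_score = 6

# A composite analytic score can be a weighted mean of the subscale ratings
# (weights here are invented; weighting is itself a design decision).
weights = {"fluency": 0.3, "pronunciation": 0.2, "vocabulary": 0.25, "grammar": 0.25}
composite = sum(analytic_ratings[c] * w for c, w in weights.items())
print(round(composite, 2))
```

Note that the analytic profile (strong vocabulary, weaker grammar) carries diagnostic information that the single holistic score does not.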
55. Drawing from your experience, what pros and cons do holistic and analytic scoring hold?
HOLISTIC // PROS
HOLISTIC // CONS
ANALYTIC // PROS
ANALYTIC // CONS
56. HOLISTIC SCORING vs. ANALYTIC SCORING
HOLISTIC – ADVANTAGES
• Requires fewer decisions, therefore faster (and, hence, maybe cheaper)
HOLISTIC – DISADVANTAGES
• Cannot provide information on separate aspects of language ability
• Difficult to assign a level
ANALYTIC – ADVANTAGES
• More useful in providing diagnostic information (in terms of which areas of a student’s speaking are stronger or weaker)
• Better able to describe the performance of second language writers and speakers
• Greater control over rater behavior
• Easier to use for new and inexperienced raters
• Better feedback on different aspects of a performance
ANALYTIC – DISADVANTAGES
• Difficult to weight the criteria
• Time-consuming
59. Rating process: rating procedures
Listen to the examinees
Study the rating scales Vs examinees’
performance
Determine the scores
Validate the scores
Finalize the scores
61. Rating scale and scoring rubric are used interchangeably; both refer to “a set of generic descriptions of student performance which can be used to assign scores to an individual student’s performance in a systematic fashion” (Carr, 2011).
Types: holistic vs. analytic, plus a combined type of holistic and analytic scales.
A rating scale has several levels or score bands, each of which has its own description of what performance looks like at that level.
The descriptors, or descriptions of performance within a particular score band, should be based on the construct definition that was written as part of the context and purpose specifications.
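The band/descriptor structure described here can be sketched as a simple lookup. The four bands and the descriptor wording are invented examples, not taken from any published scale:

```python
# Hypothetical 4-band holistic scale; every band carries its own descriptor.
rating_scale = {
    4: "Speaks fluently with only rare hesitation; wide vocabulary; errors are rare.",
    3: "Generally fluent; occasional hesitation; errors rarely impede communication.",
    2: "Frequent hesitation; limited vocabulary; errors sometimes impede communication.",
    1: "Very hesitant; very limited range; errors often impede communication.",
}

def score_report(band: int) -> str:
    """Attach the descriptor to an awarded band, as a rater's score report line."""
    return f"Band {band}: {rating_scale[band]}"

print(score_report(3))
```

An analytic scale would hold one such band-to-descriptor mapping per criterion (fluency, pronunciation, and so on).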
62. Activity: Here is an example of a rating
scale. Identify the terms that you have
learned.
63. Rating scale and scoring rubric are used interchangeably; both refer to “a set of generic descriptions of student performance which can be used to assign scores to an individual student’s performance in a systematic fashion”.
A rating scale has several levels or score bands, each of which has its own description of what performance looks like at that level.
The descriptors, or descriptions of performance within a particular score band, should be based on the construct definition that was written as part of the context and purpose specifications.
(Carr, 2011)
64. Activity: Here is an example of a rating
scale. Identify the terms that you have
learned.
66. SPEAKING SCORING GUIDE (Oregon Department of Education) – analytic scoring
[Example rating scale annotated with: scales, descriptors, components/criteria]
67. ACTIVITY:
Examine 3 scoring scales and decide:
1/ Which one is holistic/global scoring? Which one is analytic scoring?
2/ Which one is more user-friendly?
73. Adopting a rating scale
Adapting a rating scale
Developing a new rating scale
74. Will it work with the tasks that will be included in the test?
Do the descriptors for the various levels seem to describe the speaking ability of the students who will be taking the test?
Does the number of levels in the rating scale seem appropriate?
75. Adapting an existing rubric can involve:
- minor wording changes, such as adding or changing a word here and there
- extensive modifications resembling the creation of a completely new rating scale
76. DEVELOPING A NEW RATING SCALE
Concerns:
• Construct definition
• How many levels?
• How many criteria?
77. a/ Which phases to take?
Intuitive
Qualitative
Quantitative
(Taylor, 2011)
78. The scale is developed by one expert or by a team of
experts. Steps:
Review materials: existing scales, teaching materials,
curriculum documents, and other relevant source materials
Propose the criteria
Determine the number of scales
Develop the descriptors
Discuss and revise
Trial the scale
Stabilize the scale – V1
79. External expert review of the existing scale:
• Rank the descriptors in order of difficulty
• Trial the scale
• Discuss and revise the scale
• Stabilize the scale – V2
83. Validity:
Refers to the meaningfulness of the scores, which defines a broad scope of concerns.
Aspects of validity:
• content coverage -> content analysis, data analysis
• correspondence between the test and the non-test activities -> claims & evidence
• the impact of the test on the stakeholders -> washback effects
Editor's Notes
Activity 2: Group discussion
Discuss in groups of 4 or 5 to brainstorm ideas for the questions: What (concepts) should be included in speaking assessment?
Focus on approaches, the issue of difficulty, and fairness
Explain the term “language ability” and what it refers to in assessing speaking
Language competences: communicative competence, sociolinguistic competence, etc
Speaking skills: fluency, accurate use of vocabulary, accuracy in pronunciation, turn-taking skills, responding and initiating, etc
“Can-do” statements: describe an object, describe a person, express an opinion, etc
Activity 2: Discuss in groups of 4-6 people in 5 minutes, then 1 person reports to the class
Cognitive complexity: It is the extent to which a person differentiates and integrates an event
Luoma, 2004
Interactions between a number of task features, the ability of the examinee, the performance of the interlocutor and the conditions under which the tasks are performed
Luoma, 2004
The properties of the test discourse, the contents of the scoring criteria, and the performance of the raters in interpreting these and applying them to the performances
Focus on approaches, the issue of difficulty, and fairness
Only run through these contents briefly, since we focus on the task materials; leave out the examiner scripts for VSTEP because that part cannot be made public at this time.