2. Speaking is ….
Important part of life
Important part of a language curriculum
-> Assessment needs to reflect that
However, it is challenging because …
So many dimensions, so little time
Links with personality, culture, context, etc.
3. Basic concepts in assessing speaking
Speaking tasks
Scoring and rating scales
Reliability and Validity
5. 1.1. Approaches in assessing speaking
1.2. Speaking tasks
1.3. Scoring
1.4. Reliability and Validity
6. Read the book (pp. 7-10) and summarize the key information.
Add more information you know about approaches in assessing speaking.
7. CONSTRUCT-BASED APPROACH vs. TASK-BASED APPROACH
Construct-based approach:
• Primarily focuses on the construct of language ability -> makes sure that the scores really reflect skills in speaking or spoken interaction
• Test developers need to define the construct with reference to theoretical models, course syllabuses, and/or careful needs analysis
Task-based approach:
• Uses tasks and language-use contexts as the first level of categorisation
• Test developers need to show that the content of the test tasks represents the demands of the corresponding task outside the test situation
1.1. APPROACHES IN ASSESSING SPEAKING
9. STAND-ALONE vs. INTEGRATED TASKS
Stand-alone:
• Avoid mixing extended reading, writing or listening activities
• Easy to score
Integrated:
• Deal with a task (a situation)
• Attend to examinees’ comprehension of the input material and its effect on fluency and quality of content
• Far less agreement about final ratings
10. Activity 1: Analyze the assessment approaches in
the following test questions:
1. IELTS speaking test questions
Part 1: Let’s talk about your home town or village.
• What kind of place is it?
• What’s the most interesting part of your town/village?
• What kind of jobs do the people in your town/village do?
• Would you say it’s a good place to live? (Why?)
11. 2. TOEFL iBT® speaking test questions
Question 1: Talk about a pleasant and memorable event that happened while you were in school. Explain why this event brings back fond memories.
3. VSTEP
Part 2:
Situation: You are having a birthday party and many of your
friends are invited. Three locations are suggested: at home,
in a restaurant, and in a karaoke bar. Which do you think is
the best place for the party?
12. 1.2. TEST MODE
Live:
• The most common way of assessing
• Two-directional (one-to-one / one-to-two)
• The construct assessed: spoken interaction
Tape-based:
• Monologue speaking tasks
• One-directional
• More clearly concerned with spoken production
14. DEFINITIONS
Nunan (1993: 59) defines a communicative task as:
“. . . a piece of classroom work which involves learners in comprehending,
manipulating, producing or interacting in the target language while their
attention is principally focused on meaning rather than form. . . . Minimally, a
task will consist of some input data and one or more related activities and
procedures. Input refers to the data that learners are to work on: it may be
linguistic (e.g. a radio broadcast), non-linguistic (e.g. a set of photographs),
or ‘hybrid’ (e.g. a road map). In addition, tasks will have, either explicitly or
implicitly (and in most cases these are implicit), goals, roles of teachers
and learners and a setting.”
15. Definition
- Input data: linguistic vs. non-linguistic, hybrid
- Goals: explicit vs implicit
- Setting: specific to each task
- Roles: of teachers and learners/ of examiner(s) and
examinee(s)
16. DEFINITIONS
Bachman & Palmer (1996: 44):
“Speaking tasks can be seen as activities that
involve speakers in using language for the purpose
of achieving a particular goal or objective in a
particular speaking situation.”
-> Goal-oriented language use
19. Features influencing task difficulty:
• The complexity of task materials
• Task familiarity
• Cognitive complexity and planning time
• Interlocutor effects
-> Effects tend to be small and difficult to predict
22. Speaking scores express how well the examinees can speak the language being tested.
Scales express the developers’ understanding of how good performances differ from weak ones; they form part of their definition of the construct assessed in the test.
Rating scale and scoring rubric are used interchangeably; both refer to “a set of generic descriptions of student performance which can be used to assign scores to an individual student’s performance in a systematic fashion” (Carr, 2011).
1.3. Scoring
23. Reliability:
“Reliability is usually defined as score consistency”
(AERA, 1999; Brown and Hudson, 2002; Henning,
1987)
Inter-rater reliability
Intra-rater reliability
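Inter-rater reliability is often quantified as the correlation between two raters’ scores for the same examinees. A minimal sketch in Python; the function name and the band scores for six examinees are invented for illustration, not taken from any cited source:

```python
from statistics import mean

def inter_rater_correlation(rater_a, rater_b):
    """Pearson correlation between two raters' scores for the same examinees.

    Values near 1.0 indicate the raters rank and score examinees consistently.
    """
    ma, mb = mean(rater_a), mean(rater_b)
    cov = sum((a - ma) * (b - mb) for a, b in zip(rater_a, rater_b))
    var_a = sum((a - ma) ** 2 for a in rater_a)
    var_b = sum((b - mb) ** 2 for b in rater_b)
    return cov / (var_a * var_b) ** 0.5

# Hypothetical band scores (1-9) awarded by two raters to six examinees
rater_a = [5, 6, 7, 4, 8, 6]
rater_b = [5, 7, 7, 4, 7, 6]
print(round(inter_rater_correlation(rater_a, rater_b), 2))
```

Intra-rater reliability can be checked the same way, by correlating one rater’s scores for the same performances on two occasions.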
24. Validity:
Refers to the meaningfulness of the scores, which defines a broad scope of concerns.
Aspects of validity:
• content coverage -> content analysis, data analysis
• correspondence between the test and the non-test activities -> claims & evidence
• the impact of the test on the stakeholders -> washback effects
29. Describe a journey that you remember well.
You should say:
• where you went
• how you travelled
• why you went on the journey
and explain why you remember this journey well.
31. Situation: You are having a birthday party and many of your
friends are invited. Three locations are suggested: at home, in a
restaurant, and in a karaoke bar. Which do you think is the best
place for the party?
Decision tasks
32. 1. What is your opinion on the way languages are taught in
schools?
2. How can the type of school you go to affect career success?
3. What changes do you think will happen in the classroom in
the near future?
Explain and
predict
34. Topic: Mobile phones are useful tools at schools.
[Mind map: mobile phones -> mobile office tools; means of communication; access to the Internet; your own ideas]
39. Luoma (2004), based on the relative amount of structure that the tasks provide for the test discourse, distinguishes two types of speaking tasks:
• Open-ended speaking tasks
• Structured speaking tasks
Carr (2011), based on the number of test takers involved in the assessment and on the interaction between the examiner and the examinee, distinguishes two types of speaking tasks:
• Interview
• Monologue
40. Brown (2004), based on micro- and macro-skills of speaking, distinguishes five types of speaking tasks:
• Imitative
• Intensive
• Responsive
• Interactive
• Extensive
42. What affects the possibility to
apply these task types in
speaking assessment in your
context?
43. Factors influencing the application of task types in speaking assessment:
• Test purpose
• Practical circumstances (time, place, materials)
• Construct-related information that the scores must deliver
(Luoma, 2004)
44. What should be done to develop speaking
tasks into a speaking test?
45. Developing speaking tasks – practical issues:
• Choosing what to test
• Writing task specifications
• Writing/producing the actual task materials and tasks:
- Find appropriate topics and scenarios
- Imagine the communication during the actual testing situation
- Create all the materials that are needed with the help of the task specifications
• The language of task instructions:
- It is important to keep the instructions and prompts simpler than the expected performance of the examinees
47. Task-related documents and materials:
the rubric and the instructions to examinees;
the task materials, which the examinees
use while performing the tasks (if relevant);
an interaction outline (interlocutor
frame), which gives guidelines or scripts for
examiners about the content and wording of
questions or prompts;
plans and instructions for administration.
48. Group work
Read the following interaction outline to
analyze:
- The questioning techniques presented
- The role of the examiner in the test
- The similarity and/or difference of this
interaction outline compared with the one
you often work with
49.
50. Interlocutor/examiner skills
• Comparability between different examiners to ensure fairness
• Keep the outline handy during the test, but do not let it affect the genuine interaction
53. Speaking scores express how well the examinees can speak the language being tested.
Holistic vs. analytic approaches to assessing speaking ability
54. Holistic approach: a test-taker’s performance on a task (or set of tasks) is judged in terms of its overall quality, or the overall level of ability displayed, and a single rating is given to the performance of the task.
Analytic approach: concerned with specific performance features or portions of the overall construct; each subscale is considered separately and receives its own rating.
(Luoma, 2004)
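The holistic/analytic distinction can be sketched in code. The criteria, weights, and scores below are invented for illustration, and a weighted mean is only one possible way to combine analytic subscores:

```python
# Analytic: each subscale is rated separately (hypothetical criteria and bands).
analytic_ratings = {"fluency": 6, "pronunciation": 5, "vocabulary": 7, "grammar": 5}

# Holistic: one overall judgement yields a single score.
holistic_score = 6

# A composite analytic score can be a weighted mean of the subscale ratings
# (weights here are invented; weighting is itself a design decision).
weights = {"fluency": 0.3, "pronunciation": 0.2, "vocabulary": 0.25, "grammar": 0.25}
composite = sum(analytic_ratings[c] * w for c, w in weights.items())
print(round(composite, 2))
```

Note that the analytic profile (strong vocabulary, weaker grammar) carries diagnostic information that the single holistic score does not.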
55. Drawing from your experience, what pros and cons do holistic and analytic scoring hold?
HOLISTIC // PROS
HOLISTIC // CONS
ANALYTIC // PROS
ANALYTIC // CONS
56. HOLISTIC SCORING vs. ANALYTIC SCORING
HOLISTIC – ADVANTAGES
• Requires fewer decisions, therefore faster (and, hence, maybe cheaper)
HOLISTIC – DISADVANTAGES
• Cannot provide information on separate aspects of language ability
• Difficult to assign a level
ANALYTIC – ADVANTAGES
• More useful in providing diagnostic information (in terms of which areas of a student’s speaking are stronger or weaker)
• Better able to describe the performance of second language writers and speakers
• Greater control over rater behavior
• Easier to use for new and inexperienced raters
• Better feedback on different aspects of a performance
ANALYTIC – DISADVANTAGES
• Difficult to weight the criteria
• Time-consuming
59. Rating process: rating procedures
Listen to the examinees
Study the rating scales Vs examinees’
performance
Determine the scores
Validate the scores
Finalize the scores
61. Rating scale and scoring rubric are used interchangeably; both refer to “a set of generic descriptions of student performance which can be used to assign scores to an individual student’s performance in a systematic fashion” (Carr, 2011).
Types: holistic vs. analytic, plus a combined type of holistic and analytic scales.
A rating scale has several levels or score bands, each of which has its own description of what performance looks like at that level.
The descriptors, or descriptions of performance within a particular score band, should be based on the construct definition that was written as part of the context and purpose specifications.
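The band/descriptor structure described here can be sketched as a simple lookup. The four bands and the descriptor wording are invented examples, not taken from any published scale:

```python
# Hypothetical 4-band holistic scale; every band carries its own descriptor.
rating_scale = {
    4: "Speaks fluently with only rare hesitation; wide vocabulary; errors are rare.",
    3: "Generally fluent; occasional hesitation; errors rarely impede communication.",
    2: "Frequent hesitation; limited vocabulary; errors sometimes impede communication.",
    1: "Very hesitant; very limited range; errors often impede communication.",
}

def score_report(band: int) -> str:
    """Attach the descriptor to an awarded band, as a rater's score report line."""
    return f"Band {band}: {rating_scale[band]}"

print(score_report(3))
```

An analytic scale would hold one such band-to-descriptor mapping per criterion (fluency, pronunciation, and so on).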
62. Activity: Here is an example of a rating
scale. Identify the terms that you have
learned.
63. Rating scale and scoring rubric are used interchangeably; both refer to “a set of generic descriptions of student performance which can be used to assign scores to an individual student’s performance in a systematic fashion”.
A rating scale has several levels or score bands, each of which has its own description of what performance looks like at that level.
The descriptors, or descriptions of performance within a particular score band, should be based on the construct definition that was written as part of the context and purpose specifications.
(Carr, 2011)
64. Activity: Here is an example of a rating
scale. Identify the terms that you have
learned.
66. SPEAKING SCORING GUIDE (Oregon Department of Education) – analytic scoring
[Example rating scale annotated with: scales, descriptors, components/criteria]
67. ACTIVITY:
Examine 3 scoring scales and decide:
1/ Which one is holistic/global scoring? Which one is analytic scoring?
2/ Which one is more user-friendly?
73. Adopting a rating scale
Adapting a rating scale
Developing a new rating scale
74. Will it work with the tasks that will be included in the test?
Do the descriptors for the various levels seem to describe the speaking ability of the students who will be taking the test?
Does the number of levels in the rating scale seem appropriate?
75. Adapting an existing rubric can involve:
- minor wording changes, such as adding or changing a word here and there
- extensive modifications resembling the creation of a completely new rating scale
76. DEVELOPING A NEW RATING SCALE
Concerns:
• Construct definition
• How many levels?
• How many criteria?
77. a/ Which phases to take?
Intuitive
Qualitative
Quantitative
(Taylor, 2011)
78. The scale is developed by one expert or by a team of
experts. Steps:
Review materials: existing scales, teaching materials,
curriculum documents, and other relevant source materials
Propose the criteria
Determine the number of scales
Develop the descriptors
Discuss and revise
Trial the scale
Stabilize the scale – V1
79. External expert review of the existing scale:
• Rank the descriptors in order of difficulty
• Trial the scale
• Discuss and revise the scale
• Stabilize the scale – V2
83. Validity:
Refers to the meaningfulness of the scores, which defines a broad scope of concerns.
Aspects of validity:
• content coverage -> content analysis, data analysis
• correspondence between the test and the non-test activities -> claims & evidence
• the impact of the test on the stakeholders -> washback effects
Editor's Notes
Activity 2: Group discussion
Discuss in groups of 4 or 5 to brainstorm ideas for the questions: What (concepts) should be included in speaking assessment?
Focus on approaches, the issue of difficulty, and fairness
Explain the term “language ability” and what it refers to in assessing speaking
Language competences: communicative competence, sociolinguistic competence, etc
Speaking skills: fluency, accurate use of vocabulary, accuracy in pronunciation, turn-taking skills, responding and initiating, etc
“Can-do” statements: describe an object, describe a person, express an opinion, etc
Activity 2: Discuss in groups of 4-6 people in 5 minutes, then 1 person reports to the class
Cognitive complexity: It is the extent to which a person differentiates and integrates an event
Luoma, 2004
Interactions between a number of task features, the ability of the examinee, the performance of the interlocutor and the conditions under which the tasks are performed
Luoma, 2004
The properties of the test discourse, the contents of the scoring criteria, and the performance of the raters in interpreting these and applying them to the performances
Focus on approaches, the issue of difficulty, and fairness
Only run through these contents briefly, since we focus on the task materials; leave out the examiner scripts for VSTEP because that part cannot be made public at this time.