Unit A4: Test
Specifications
and Designs
Presented by:
Amir Hamid Forough Ameri
ahfameri@gmail.com
October 2015
Introduction
Test specifications – usually called ‘specs’ – are generative explanatory
documents for the creation of test tasks.
Specs tell us the nuts and bolts of:
• how to phrase the test items,
• how to structure the test layout,
• how to locate the passages,
• and how to make a host of difficult choices as we prepare test materials.
More importantly, they tell us the rationale behind the various choices that we make.
• The idea of specs is rather old: Ruch (1924) may be the earliest proponent of something like test specs.
• Fulcher and Davidson believe that specs are actually a common-sense notion in test development: planning seems wise in test authorship.
• Specs are often called blueprints. Blueprints are used to build structures, and from blueprints many equivalent structures can be erected.
• The classic utility of specs lies in test equivalence. Suppose we had a particular test task and we wanted an equivalent task with:
• the same difficulty level,
• the same testing objective, but
• different content.
• We want this to vary our test without varying its results: we simply want a new version of the same test with the same assurance of reliability and validity. This was the original purpose of specs.
• Equivalence, reliability and validity are all assured because, in addition to the mechanical blueprint-like function of specs, they also serve as a formal record of critical dialogue.
PLANNING IN TEST AUTHORING
 How much can we actually plan in any test?
According to Ruch (1924), detailed rules of procedure in the
construction of an objective examination can hardly be
formulated.
The type of questions must be decided on the basis of such
facts as:
 the school subject concerned,
 the purposes of the examination,
 the length and reliability of the proposed examination,
 preferences of teachers and pupils, …
• Kehoe (1995) presents a series of guidelines for creating
multiple-choice test items.
• These guidelines are spec-like in their advice:
1. Before writing the stem, identify the one point to be tested by
that item. In general, the stem should not pose more than one
problem.
2. Construct the stem to be either an incomplete statement or a
direct question, avoiding stereotyped phraseology.
Example:
In a recent survey, eighty per cent of drivers were found to
wear seat belts. Which of the following could be true?
(a) Eighty of one hundred drivers surveyed were wearing seat
belts.
(b) Twenty of one hundred drivers surveyed were wearing seat
belts.
(c) One hundred drivers were surveyed, and all were wearing seat
belts.
(d) One hundred drivers were surveyed, and none were wearing
seat belts.
GUIDING LANGUAGE VERSUS SAMPLES
All test specifications have two components: guiding language and samples.
Note: guiding language comprises all parts of the test spec other than the sample itself.
For the seat-belt sample above, guiding language might include:
[1] This is a four-option multiple-choice test question.
[2] The stem shall be a statement followed by a question about
the statement.
[3] Each choice shall be plausible against real-world knowledge,
and each choice shall be internally grammatical.
[4] The key shall be the only inference that is feasible from the
statement in the stem.
CONGRUENCE (OR FIT-TO-SPEC)
• Congruence (also called fit-to-spec) is the degree to which a new item
fits an existing spec.
• The next step is to induce guiding language such that items equivalent to the seat-belt question can be written. For example:
• The vast majority of parents (in a recent survey) favour stricter attendance
regulations at their children’s schools. Which of the following could be true?
• (a) Most parents want stricter attendance rules.
• (b) Many parents want stricter attendance rules.
• (c) Only a few parents think current attendance rules are acceptable.
• (d) Some parents think current attendance rules are acceptable.
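The fit-to-spec idea can be sketched in code. Below is a hypothetical, partial congruence checker (not from the original text): only guiding-language rules [1] (four options) and [2] (statement followed by a question) are mechanically checkable here; rules [3] (plausibility) and [4] (unique feasible inference) still require human judgement.

```python
# Hypothetical sketch: a partial, machine-checkable congruence (fit-to-spec) check.
# Rules [3] and [4] of the guiding language are left to human reviewers.

def check_congruence(stem: str, options: list) -> list:
    """Return a list of detected spec violations (empty list = congruent so far)."""
    problems = []
    if len(options) != 4:            # rule [1]: four-option multiple choice
        problems.append("item must have exactly four options")
    if not stem.rstrip().endswith("?"):   # rule [2]: statement followed by a question
        problems.append("stem must end with a question about the statement")
    return problems

# The attendance item from above, congruent with the seat-belt spec:
item = {
    "stem": ("The vast majority of parents (in a recent survey) favour stricter "
             "attendance regulations at their children's schools. "
             "Which of the following could be true?"),
    "options": ["Most parents want stricter attendance rules.",
                "Many parents want stricter attendance rules.",
                "Only a few parents think current attendance rules are acceptable.",
                "Some parents think current attendance rules are acceptable."],
}
print(check_congruence(item["stem"], item["options"]))  # → []
```

A real spec would pair such automated checks with the critical dialogue the unit describes; the code only filters out obvious misfits.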
HOW DO TEST QUESTIONS ORIGINATE?
• Reverse engineering
• Archetypes
• Spec writing is an organic process.
• Reverse engineering and archetypes can cause the spec to grow, to evolve and to better represent what its development team wishes to accomplish.
• We can follow this organic process by use of an audit trail (Li, 2006), which is a narrative of how we feel the test has improved.
REVERSE ENGINEERING
 Most tests are created from tests we have already seen or
have already experienced (perhaps as test takers).
 Often, test creation starts by looking at existing tests – a
process known as reverse engineering.
 Reverse engineering (RE) is an idea of ancient origin; the
name was coined by Davidson and Lynch (2002):
 RE is an analytical process of test creation that begins with an
actual test question and infers the guiding language that drives
it, such that equivalent items can be generated.
There are five types of RE:
•Straight RE: this is when you infer guiding
language about existing items without changing
the existing items at all.
•The purpose is solely to produce equivalent test
questions.
• Historical RE: this is straight RE across several
existing versions of a test.
• If the archives at your teaching institution
contain tests that have changed and evolved,
you can do RE on each version to try to
understand how and why the tests changed.
• Critical RE: perhaps the most common form of RE. As we analyse an item, we think critically:
• Are we testing what we want?
• Do we wish to make changes in our test
design?
• Test deconstruction RE: the term ‘test deconstruction’ was coined by Elatia (2003).
• Whether critical or straight, whether historical or not, this form of RE provides insight beyond the test setting. We may discover larger realities:
• Why, for instance, would our particular test setting so value close reading for students in the seat-belt and attendance items?
• What role does close inferential reading play in the school setting?
• Parallel RE: In some cases, teachers are asked to produce
tests according to external influences, what Davidson and
Lynch (2002) call the ‘mandate’.
• Teachers may feel compelled to design tests that adhere to
these external standards.
• If we obtain sample test questions from several teachers which
(the teachers tell us) measure the same thing, and then
perform straight RE on the samples, and then compare the
resulting specs, we are using RE as a tool to determine
parallelism.
• 1. Charles sent the package __________ Chicago.
*a. to  b. on  c. at  d. with
• 2. Mary put the paper __________ the folder.
a. at  b. for  *c. in  d. to
• 3. Steven turned __________ the radio.
*a. on  b. at  c. with  d. to
• At first glance, the items all seem to test the English grammar of
prepositions. Item 3, however, seems somewhat different. It is
probably a test of two-word or “phrasal” verbs. If we reverse-
engineered a spec from these items, we might have to decide
whether item 3 is actually in a different skill domain and should be
reverse-engineered into a different spec. (Davidson & Lynch, 2002)
ARCHETYPES
• An ‘archetype’ is a canonical item or task; it is the typical
way to measure a particular target skill.
• If we step back and look at an item we have just written, we will often see echoes of an item or task that we have used in the past, that we have done (as test takers) or that we have studied in a testing textbook.
• Perhaps all supposed spec-driven testing is actually a form of
reverse engineering: we try to write a blueprint to fit item or
task types – archetypes – that are already in our experience.
TOWARDS SPEC-DRIVEN THEORY
• Specs exist. Somehow, somewhere, the system is able to
articulate the generative guiding language behind any sample
test items or tasks.
• Written specs have two great advantages:
 firstly, they spark creative critical dialogue about the testing process;
 secondly, they are wonderful training tools when newcomers
join the organization.
• Specs evolve. They grow and change, in creative engineering-
like response to changes in theory, in curriculum, in funding,
and so on.
• The specs (and the test they generate) are not launched until
ready. The organization works hard to discuss and to try out
test specs and their items and tasks, withholding a rush to
production, examining all evidence available;
• Discussion happens, and that leads to transparency. Spec-driven testing can lead to ‘shared authority, collaboration, and involvement of different stake-holders’.
• And this is a discussion to which all are welcome.
Unit A5: Writing
Items and Tasks
INTRODUCTION
• One of the most common mistakes made in language testing is for the test
writer to begin the process by writing a set of items or tasks that are
intuitively felt to be relevant to the test takers.
• Instead, an iterative process should be used: one in which all tasks are undertaken in a cyclical fashion.
• So test specifications are not set until the final test is ready to be rolled out in
actual use.
• A well-built test can be adapted and changed even after it is operational, provided that we have a clear record (specifications) of how it was built in the first place and that evidence is collected to support the link between the task and the construct.
• A test task is a device that allows the tester to collect evidence. This
evidence is a response from the test taker. The ‘response as evidence’
indicates that we are using the responses to make inferences about the
ability of the test taker to use language in the domains and situations defined
in the test specifications.
EVIDENCE-CENTRED DESIGN (ECD)
• It is important that we see the tasks or items that we design for tests as part of a larger picture, and one approach to doing this systematically is evidence-centred design (ECD): a methodology for designing assessments that underscores the central role of evidentiary reasoning in assessment design. ECD is based on three premises:
(1) An assessment must be built around the important knowledge in the domain of
interest;
(2) The chain of reasoning from what participants say and do in assessments
to inferences about what they know, must be based on the principles of
evidentiary reasoning;
(3) Purpose must be the driving force behind design decisions, which reflect
constraints, resources and conditions of use. (Mislevy et al., 2003: 20)
• Evidentiary reasoning underpins the validity argument: the argument shows the reasoning that supports the inferences we make from test scores to what we claim those scores mean.
• Evidentiary reasoning is the reasoning that leads from evidence
to the evaluation of the strength of the validity claim.
• All tests are indirect. From test performance we obtain a score,
and from the score we draw inferences about the constructs the
test is designed to measure. It is therefore a ‘construct-centered
approach’, as described by Messick (1994).
• The way in which we do things reflects how we think we know things.
Peirce (1878) argued that there were four ways of knowing:
■ Tenacity: we believe what we do because we have always believed this and not
questioned it. → What we do is therefore based on what we have always done.
■ Authority: we believe what we do because these beliefs come from an authoritative
source. → What we do therefore follows the practice dictated by this source.
■ A-priori: what we believe appears reasonable to us when we think about it, because
it feels intuitively right. → What we do is therefore what we think is reasonable and
right.
■ Scientific: what we believe is established by investigating what happens in the world.
→ What we do is therefore based on methods that lead to an increase in
knowledge.
→ECD treats knowledge as scientific because it is primarily a method that leads
to the test designer understanding more about relations between variables for
a particular assessment context.
The structure of ECD
ECD claims to provide a very systematic way of thinking about the process of
assessment and the place of the task within that process.
ECD is considered to be a ‘framework’ in the sense that it is structured and formal
and thus enables ‘the actual work of designing and implementing assessments’
(Mislevy et al., 2003: 4) in a way that makes a validity argument more explicit.
It is sometimes referred to as a conceptual assessment framework (CAF).
Within this framework are a number of models, and these are defined as design
objects. These design objects help us to think about how to go about the practical work
of designing a test.
Within ECD-style test specification there are six models or design objects:
1. Student model. This comprises a statement of the particular mix of knowledge,
skills or abilities about which we wish to make claims as a result of the test. In other
words, it is the list of constructs that are relevant to a particular testing situation. This is
the highest-level model, and needs to be designed before any other models can be
addressed, because it defines what we wish to claim about an individual test taker: what
are we testing?
2. Evidence models. Once we have selected constructs for the student model, we
need to ask what evidence we need to collect in order to make inferences from
performance to underlying knowledge or ability: what evidence do we need to test the
construct(s)? In ECD the evidence is frequently referred to as a work product, which
means whatever comes from what the test takers do. It includes two components:
a) evaluation component: The focus here is the evidentiary interrelationships being
drawn among characteristics of students, of what they say and do, and of task and real-
world situations.
b) measurement component that links the observable variables to the student
model by specifying how we score the evidence. This turns what we observe into the
score from which we make inferences.
3. Task models: How do we collect the evidence? Task models therefore describe the
situations in which test takers respond to items or tasks that generate the evidence we
need. Task models minimally comprise three elements: the presentation material,
or input; the work products, or what the test takers actually do; and finally, the task
model variables that describe task features. Task features are those elements that tell
us what the task looks like, and which parts of the task are likely to make it more or
less difficult.
4. Presentation model. Items and tasks can be presented in many different formats.
A text and set of reading items may be presented in paper and pencil format, or on
a computer. The presentation model describes how these will be laid out and
presented to the test takers.
5. Assembly model. An assembly model accounts for how the student model,
evidence models and task models work together. It does this by specifying two
elements: targets and constraints.
 A target is the reliability with which each construct in a student model should be
measured.
 A constraint relates to the mix of items or tasks on the test that must be included in
order to represent the domain adequately: how much do we need to test?
6. Delivery model. This final model is not independent of the others, but explains
how they will work together to deliver the actual test – for example, how the modules
will operate if they are delivered in computer-adaptive mode, or as set paper and
pencil forms. Of course, changes at this level will also impact on other models and
how they are designed. This model would also deal with issues that are relevant at
the level of the entire test, such as test security and the timing of sections of the test
(Mislevy et al., 2003: 13). However, it also contains four processes, referred to as the
delivery architecture (see further discussion in Unit A8): the presentation process,
response processing, summary scoring and activity selection.
Models in the conceptual assessment framework of ECD (adapted from Mislevy et al., 2003): Student Models, Evidence Models, Task Models, Assembly Model, Presentation Model, Delivery Model.
Thank You
Critical literacy and second language learning luke and dooley
 
Context culture .... m. wendt
Context culture .... m. wendtContext culture .... m. wendt
Context culture .... m. wendt
 
Thesis summary by amir hamid forough ameri
Thesis summary by amir hamid forough ameriThesis summary by amir hamid forough ameri
Thesis summary by amir hamid forough ameri
 
The role of corrective feedback in second language learning
The role of corrective feedback in second language learningThe role of corrective feedback in second language learning
The role of corrective feedback in second language learning
 
Standards based classroom assessments of english proficiency
Standards based classroom  assessments of english proficiencyStandards based classroom  assessments of english proficiency
Standards based classroom assessments of english proficiency
 
Reliability bachman 1990 chapter 6
Reliability bachman 1990 chapter 6Reliability bachman 1990 chapter 6
Reliability bachman 1990 chapter 6
 
Reliability and dependability by neil jones
Reliability and dependability by neil jonesReliability and dependability by neil jones
Reliability and dependability by neil jones
 
Language testing the social dimension
Language testing  the social dimensionLanguage testing  the social dimension
Language testing the social dimension
 
Extroversion introversion
Extroversion introversionExtroversion introversion
Extroversion introversion
 
Developing a comprehensive, empirically based research framework for classroo...
Developing a comprehensive, empirically based research framework for classroo...Developing a comprehensive, empirically based research framework for classroo...
Developing a comprehensive, empirically based research framework for classroo...
 
Classroom assessment, glenn fulcher
Classroom assessment, glenn fulcherClassroom assessment, glenn fulcher
Classroom assessment, glenn fulcher
 
Behavioral view of motivation
Behavioral view of motivationBehavioral view of motivation
Behavioral view of motivation
 

Último

Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentationcamerronhm
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Association for Project Management
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfDr Vijay Vishwakarma
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.pptRamjanShidvankar
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxPooja Bhuva
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17Celine George
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...Nguyen Thanh Tu Collection
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxAmanpreet Kaur
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxDr. Sarita Anand
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxRamakrishna Reddy Bijjam
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptxMaritesTamaniVerdade
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxCeline George
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...pradhanghanshyam7136
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxVishalSingh1417
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxmarlenawright1
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structuredhanjurrannsibayan2
 

Último (20)

Spatium Project Simulation student brief
Spatium Project Simulation student briefSpatium Project Simulation student brief
Spatium Project Simulation student brief
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptx
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptx
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
 

Test specifications and designs session 4

  • 7. Kehoe (1995) presents a series of guidelines for creating multiple-choice test items. These guidelines are spec-like in their advice:
1. Before writing the stem, identify the one point to be tested by that item. In general, the stem should not pose more than one problem.
2. Construct the stem to be either an incomplete statement or a direct question, avoiding stereotyped phraseology.
  • 8. Example: In a recent survey, eighty per cent of drivers were found to wear seat belts. Which of the following could be true?
(a) Eighty of one hundred drivers surveyed were wearing seat belts.
(b) Twenty of one hundred drivers surveyed were wearing seat belts.
(c) One hundred drivers were surveyed, and all were wearing seat belts.
(d) One hundred drivers were surveyed, and none were wearing seat belts.
  • 9. GUIDING LANGUAGE VERSUS SAMPLES All test specifications have two components: guiding language and sample items.
  • 10. Note: Guiding language comprises all parts of the test spec other than the sample itself. For the above seat-belt sample, guiding language might include:
[1] This is a four-option multiple-choice test question.
[2] The stem shall be a statement followed by a question about the statement.
[3] Each choice shall be plausible against real-world knowledge, and each choice shall be internally grammatical.
[4] The key shall be the only inference that is feasible from the statement in the stem.
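The two-part structure of a spec, guiding language plus a sample item, can be sketched as a simple data structure. This is an illustrative sketch only: the class and field names below are invented for this example and are not part of any published spec format.

```python
from dataclasses import dataclass


@dataclass
class TestSpec:
    """A minimal test specification: guiding language plus one sample item."""
    title: str
    guiding_language: list[str]  # numbered rules such as [1]-[4] above
    sample_item: str             # an item that exemplifies the rules


# The seat-belt spec from the slides, expressed in this hypothetical structure.
seatbelt_spec = TestSpec(
    title="Inference from a survey statement (four-option MC)",
    guiding_language=[
        "This is a four-option multiple-choice test question.",
        "The stem shall be a statement followed by a question about it.",
        "Each choice shall be plausible and internally grammatical.",
        "The key shall be the only inference feasible from the stem.",
    ],
    sample_item="In a recent survey, eighty per cent of drivers were found "
                "to wear seat belts. Which of the following could be true?",
)

print(len(seatbelt_spec.guiding_language))  # 4 rules
```

Keeping the guiding language and the sample together in one object mirrors the claim above that a spec is generative: an item writer reads the rules, consults the sample, and produces a new, equivalent item.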
  • 11. CONGRUENCE (OR FIT-TO-SPEC) • Congruence (also called fit-to-spec) is the degree to which a new item fits an existing spec. • Our next step above was to induce some guiding language such that items equivalent to the seat-belt question could be written. For example:
• The vast majority of parents (in a recent survey) favour stricter attendance regulations at their children’s schools. Which of the following could be true?
• (a) Most parents want stricter attendance rules.
• (b) Many parents want stricter attendance rules.
• (c) Only a few parents think current attendance rules are acceptable.
• (d) Some parents think current attendance rules are acceptable.
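A fit-to-spec judgement is normally made by human reviewers, but its mechanical part can be sketched as a checklist: each guiding-language rule becomes a check over a candidate item. The function, field names and the crude end-of-stem check below are hypothetical illustrations, not a published congruence procedure.

```python
def fits_spec(item: dict) -> bool:
    """Crude congruence check against the seat-belt guiding language.

    Each entry corresponds loosely to one rule; rules needing human
    judgement (plausibility, grammaticality) are not automatable here.
    """
    checks = [
        len(item["choices"]) == 4,                        # rule [1]: four options
        item["stem"].rstrip().endswith("could be true?"),  # rule [2]: statement + question (proxy)
        item["key"] in item["choices"],                    # the key must be one of the choices
    ]
    return all(checks)


# The attendance item from the slide, in this hypothetical format.
attendance_item = {
    "stem": ("The vast majority of parents favour stricter attendance "
             "regulations at their children's schools. "
             "Which of the following could be true?"),
    "choices": ["a", "b", "c", "d"],
    "key": "a",
}

print(fits_spec(attendance_item))  # True
```

The point of the sketch is the direction of the check: the spec comes first, and each new item is tested against it, not the other way around.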
  • 12. HOW DO TEST QUESTIONS ORIGINATE? • Reverse engineering • Archetypes
  • 13. • Spec writing is an organic process.
  • 14. • These can cause the spec to grow, to evolve and to better represent what its development team wishes to accomplish. • We can follow this organic process by use of an audit trail (Li, 2006), which is a narrative of how we feel the test has improved.
  • 15. REVERSE ENGINEERING  Most tests are created from tests we have already seen or have already experienced (perhaps as test takers).  Often, test creation starts by looking at existing tests – a process known as reverse engineering.  Reverse engineering (RE) is an idea of ancient origin; the name was coined by Davidson and Lynch (2002):  RE is an analytical process of test creation that begins with an actual test question and infers the guiding language that drives it, such that equivalent items can be generated.
  • 16. There are five types of RE: •Straight RE: this is when you infer guiding language about existing items without changing the existing items at all. •The purpose is solely to produce equivalent test questions.
  • 17. • Historical RE: this is straight RE across several existing versions of a test. • If the archives at your teaching institution contain tests that have changed and evolved, you can do RE on each version to try to understand how and why the tests changed.
  • 18. • Critical RE: perhaps the most common form of RE, as we analyse an item, we think critically: • Are we testing what we want? • Do we wish to make changes in our test design?
  • 19. • Test deconstruction RE: • The term ‘test deconstruction’ was coined by Elatia (2003) • whether critical or straight, whether historical or not, it provides insight beyond the test setting. We may discover larger realities. • why, for instance, would our particular test setting so value close reading for students in the seat-belt and attendance items? • What role does close inferential reading have to the school setting?
  • 20. • Parallel RE: In some cases, teachers are asked to produce tests according to external influences, what Davidson and Lynch (2002) call the ‘mandate’. • Teachers may feel compelled to design tests that adhere to these external standards. • If we obtain sample test questions from several teachers which (the teachers tell us) measure the same thing, and then perform straight RE on the samples, and then compare the resulting specs, we are using RE as a tool to determine parallelism.
  • 21.
1. Charles sent the package __________ Chicago. *a. to  b. on  c. at  d. with
2. Mary put the paper __________ the folder. a. at  b. for  *c. in  d. to
3. Steven turned __________ the radio. *a. on  b. at  c. with  d. to
At first glance, the items all seem to test the English grammar of prepositions. Item 3, however, seems somewhat different. It is probably a test of two-word or “phrasal” verbs. If we reverse-engineered a spec from these items, we might have to decide whether item 3 is actually in a different skill domain and should be reverse-engineered into a different spec. (Davidson & Lynch, 2002)
  • 22. ARCHETYPES • An ‘archetype’ is a canonical item or task; it is the typical way to measure a particular target skill. • If we then step back and look at that item, we will often see echoes of an item or task that we have used in the past, that we have done (as test takers) or that we have studied in a testing textbook. • Perhaps all supposed spec-driven testing is actually a form of reverse engineering: we try to write a blueprint to fit item or task types – archetypes – that are already in our experience.
  • 23. TOWARDS SPEC-DRIVEN THEORY • Specs exist. Somehow, somewhere, the system is able to articulate the generative guiding language behind any sample test items or tasks. • Written specs have two great advantages:  firstly, they spark creative critical dialogue about the testing  secondly, they are wonderful training tools when newcomers join the organization. • Specs evolve. They grow and change, in creative engineering- like response to changes in theory, in curriculum, in funding, and so on.
  • 24. • The specs (and the tests they generate) are not launched until ready. The organization works hard to discuss and to try out test specs and their items and tasks, resisting a rush to production and examining all available evidence. • Discussion happens, and that leads to transparency. Spec-driven testing can lead to ‘shared authority, collaboration, and involvement of different stakeholders’. • And this is a discussion to which all are welcome.
  • 26. INTRODUCTION • One of the most common mistakes made in language testing is for the test writer to begin the process by writing a set of items or tasks that are intuitively felt to be relevant to the test takers. • Instead, an iterative process, one in which all tasks are undertaken in a cyclical fashion, should be used. • Test specifications are therefore not set until the final test is ready to be rolled out in actual use. • A well-built test can be adapted and changed even after it is operational and as things change, provided that we have a clear record (specifications) of how it was built in the first place, and evidence is collected to support the link between the task and the construct. • A test task is a device that allows the tester to collect evidence. This evidence is a response from the test taker. The ‘response as evidence’ notion indicates that we are using the responses to make inferences about the ability of the test taker to use language in the domains and situations defined in the test specifications.
  • 27. EVIDENCE-CENTRED DESIGN (ECD) • It is important that we see the tasks or items that we design for tests as part of a larger picture, and one approach to doing this in a systematic way is ECD: ECD is a methodology for designing assessments that underscores the central role of evidentiary reasoning in assessment design. ECD is based on three premises: (1) An assessment must build around the important knowledge in the domain of interest; (2) The chain of reasoning from what participants say and do in assessments to inferences about what they know, must be based on the principles of evidentiary reasoning; (3) Purpose must be the driving force behind design decisions, which reflect constraints, resources and conditions of use. (Mislevy et al., 2003: 20)
  • 28. • Evidentiary reasoning takes the form of a validity argument. → The argument shows the reasoning that supports the inferences we make from test scores to what we claim those scores mean. • Evidentiary reasoning is the reasoning that leads from evidence to the evaluation of the strength of the validity claim. • All tests are indirect. From test performance we obtain a score, and from the score we draw inferences about the constructs the test is designed to measure. It is therefore a ‘construct-centered approach’, as described by Messick (1994).
  • 29. • The way in which we do things reflects how we think we know things. Peirce (1878) argued that there were four ways of knowing: ■ Tenacity: we believe what we do because we have always believed this and not questioned it. → What we do is therefore based on what we have always done. ■ Authority: we believe what we do because these beliefs come from an authoritative source. → What we do therefore follows the practice dictated by this source. ■ A-priori: what we believe appears reasonable to us when we think about it, because it feels intuitively right. → What we do is therefore what we think is reasonable and right. ■ Scientific: what we believe is established by investigating what happens in the world. → What we do is therefore based on methods that lead to an increase in knowledge. →ECD treats knowledge as scientific because it is primarily a method that leads to the test designer understanding more about relations between variables for a particular assessment context.
  • 30. The structure of ECD ECD claims to provide a very systematic way of thinking about the process of assessment and the place of the task within that process. ECD is considered to be a ‘framework’ in the sense that it is structured and formal and thus enables ‘the actual work of designing and implementing assessments’ (Mislevy et al., 2003: 4) in a way that makes a validity argument more explicit. It is sometimes referred to as a conceptual assessment framework (CAF). Within this framework are a number of models, and these are defined as design objects. These design objects help us to think about how to go about the practical work of designing a test. Within ECD-style test specification there are six models or design objects:
  • 31. 1. Student model. This comprises a statement of the particular mix of knowledge, skills or abilities about which we wish to make claims as a result of the test. In other words, it is the list of constructs that are relevant to a particular testing situation. This is the highest-level model, and needs to be designed before any other models can be addressed, because it defines what we wish to claim about an individual test taker: what are we testing? 2. Evidence models. Once we have selected constructs for the student model, we need to ask what evidence we need to collect in order to make inferences from performance to underlying knowledge or ability: what evidence do we need to test the construct(s)? In ECD the evidence is frequently referred to as a work product, which means whatever comes from what the test takers do. It includes two components: a) Evaluation component: the focus here is on the evidentiary interrelationships drawn among characteristics of students, of what they say and do, and of task and real-world situations. b) Measurement component: this links the observable variables to the student model by specifying how we score the evidence. This turns what we observe into the score from which we make inferences.
  • 32. 3. Task models: How do we collect the evidence? Task models therefore describe the situations in which test takers respond to items or tasks that generate the evidence we need. Task models minimally comprise three elements: the presentation material, or input; the work products, or what the test takers actually do; and finally, the task model variables that describe task features. Task features are those elements that tell us what the task looks like, and which parts of the task are likely to make it more or less difficult. 4. Presentation model. Items and tasks can be presented in many different formats. A text and set of reading items may be presented in paper and pencil format, or on a computer. The presentation model describes how these will be laid out and presented to the test takers. 5. Assembly model. An assembly model accounts for how the student model, evidence models and task models work together. It does this by specifying two elements: targets and constraints.
  • 33.  A target is the reliability with which each construct in a student model should be measured.  A constraint relates to the mix of items or tasks on the test that must be included in order to represent the domain adequately: how much do we need to test? 6. Delivery model. This final model is not independent of the others, but explains how they will work together to deliver the actual test – for example, how the modules will operate if they are delivered in computer-adaptive mode, or as set paper and pencil forms. Of course, changes at this level will also impact on other models and how they are designed. This model would also deal with issues that are relevant at the level of the entire test, such as test security and the timing of sections of the test (Mislevy et al., 2003: 13). However, it also contains four processes, referred to as the delivery architecture (see further discussion in Unit A8). These are the presentation process, response processing, summary scoring and activity selection.
  • 34. [Diagram] Models in the conceptual assessment framework of ECD (adapted from Mislevy et al., 2003): Student Models → Evidence Models → Task Models → Assembly Model → Presentation Model → Delivery Model.
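The six ECD design objects described above can be sketched as plain data structures to show how they nest: the student model comes first, and the assembly and delivery models tie the rest together. The class and field names below follow the slide descriptions but are illustrative only; they are not part of any ECD software or of Mislevy et al.'s formal notation.

```python
from dataclasses import dataclass


@dataclass
class StudentModel:
    constructs: list[str]          # what are we testing?


@dataclass
class EvidenceModel:
    work_product: str              # what the test taker produces
    scoring_rule: str              # measurement component: evidence -> score


@dataclass
class TaskModel:
    presentation_material: str     # input shown to the test taker
    work_product: str              # expected response
    task_variables: dict           # features affecting difficulty


@dataclass
class AssemblyModel:
    targets: dict                  # reliability target per construct
    constraints: dict              # required mix of items/tasks


@dataclass
class ConceptualAssessmentFramework:
    student: StudentModel
    evidence: list[EvidenceModel]
    tasks: list[TaskModel]
    assembly: AssemblyModel
    presentation: str              # layout / format shown to test takers
    delivery: str                  # e.g. fixed form vs. computer-adaptive


# A toy CAF built from the seat-belt example; all values are invented.
caf = ConceptualAssessmentFramework(
    student=StudentModel(constructs=["inferential reading"]),
    evidence=[EvidenceModel("selected option", "1 point for the key")],
    tasks=[TaskModel("survey statement + question", "selected option",
                     {"options": 4})],
    assembly=AssemblyModel(targets={"inferential reading": 0.8},
                           constraints={"mc_items": 20}),
    presentation="paper and pencil",
    delivery="fixed form",
)

print(caf.student.constructs[0])  # inferential reading
```

The nesting makes the dependency order in the text concrete: no evidence or task model can be filled in until the student model's construct list exists.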