It’s a myth:
High stakes cause test score inflation
Richard P. Phelps
researchED 2017 National Conference
7 October, 2017
Brooklyn, NY
Educational testing in the US: early 1980s
researchED, Brooklyn High stakes & test score inflation 7 October,
2017
Student testing with stakes reintroduced, late 1970s and early 1980s
Debra P. v. Turlington
“Truth in testing” laws
Educational testing in the US: 1980s
Residency in rural, poor Appalachia, 1980s
Surprised by claims that the state and school district scored “above average” on national tests
Investigated; found that all US states claimed to be “above average”
John J. Cannell, M.D.
“Welcome to Lake Wobegon, where all the women
are strong, all the men are good-looking, and all
the children are above average.”
- Garrison Keillor, A Prairie Home Companion
Cannell’s suspects
• Lax security
• Outdated or invalid norms
• Deliberate educator manipulation (i.e., cheating)
US Education Establishment Responds
“While supporting Cannell’s general finding … our analyses lead us to conclusions that are different, and certainly less sensational, than the ones he reached.”
- Linn, Graue, Sanders, CRESST, 1990
“There are many reasons for the Lake Wobegon Effect, most of which are less sinister than those emphasized by Cannell.”
- Linn, CRESST, 2000
CRESST’s Lake Wobegon suspects
• Outdated or invalid norms
• High stakes, which induce “teaching to the test” (i.e., test coaching) under pressure
“We know that tests that are used for accountability tend to be taught to in ways that produce inflated scores.”
- Daniel Koretz, CRESST, 1992
“Corruption of indicators is a continuing problem where tests are used for accountability or other high-stakes purposes.”
- Robert Linn, CRESST, 2000
CRESST counters Cannell’s Lake Wobegon study with its own, 1991
Students took the same test for a few years, and scores rose. Then they took a “competing” test the district had used before, and scores fell.
CRESST 1991 “Generalization” Study
Unnamed school district
Unnamed tests
Neither replicable nor falsifiable
A conference presentation; not peer-reviewed.
CRESST 1991 “Generalization” Study
3 tests in the study
1. Annual NRT (norm-referenced test)
2. Parallel form
3. A “competing” NRT
1991 CRESST “Generalization” Study
1991 CRESST “Generalization” Study
School district test was only “perceived to be high stakes.”
1991 CRESST “Generalization” Study
Study’s assumptions
1. Publication of aggregate results = “high stakes”
2. “Competing” NRTs should get same results
3. “Test coaching” improves scores
4. Low-stakes test scores are reliable and can be used
to benchmark unreliable high stakes scores
5. High-stakes cause test-score inflation?
Jim Popham “high stakes” definition, 1987
“... Such tests include the many statewide achievement tests whose results are reported by local newspapers on a school-by-school or district-by-district basis.”
1. Publication of aggregate results = high stakes?
Jim Popham “high stakes” definition, 1992
A test “subject to legal scrutiny.”
Tests such as those used “for employment, licensure, or a high school graduation requirement.”
1. Publication of aggregate results = high stakes?
“High-stakes test. A test used to provide results that have important, direct consequences for examinees, programs, or institutions involved in the testing.” (p. 176)
“Low-stakes test. A test used to provide results that have only minor or indirect consequences for examinees, programs, or institutions involved in the testing.” (p. 178)
Standards for Educational and Psychological Testing
1. Publication of aggregate results = high stakes?
“...tests taken to obtain admission to an educational program or taken during and at the conclusion of a program to obtain a qualification.”
“…high-stakes decisions, such as whether a student will move on to the next grade level or receive a diploma.”
1. Publication of aggregate results = high stakes?
A high-stakes test is a test with important consequences
for the test taker. Passing has important benefits, such as
a high school diploma, a scholarship, or a license to
practice a profession.
Wikipedia
1. Publication of aggregate results = high stakes?
2. Research: Comparability of different tests
Scores comparable: ?
Scores not comparable:
NRTs: Freeman, Kuhs, Porter, Floden, Schmidt, Schwille (1983); Debra P. v. Turlington (1984); Cohen, Spillane (1993); La Marca, Redfield, Winter, Bailey, Despriet (2000); Wainer (2011)
Standards: Archbald (1994); Buckendahl, Plake, Impara, Irwin (2000); Bhola, Impara, Buckendahl (2003); Phelps (2005)
CRTs: Massell, Kirst, Hoppe (1997); Wiley, Hembry, Buckendahl, Forte, Towles, Nebelsick-Gullett (2015)
3. Research: Effects of test coaching
It works
Significant score increase from learning format tricks
Alderman & Powers (1980); Samson (1985); Scruggs (1985); Roznowski & Bassett (1992); McMann (1994); Holmes, Keffer (1995); Camel & Chung (2002); Filizola (2008)
4. Research: Low-stakes test reliability
Reliable (“no incentive to manipulate scores”):
Kiplinger, Linn (1992); O’Neil, Sugrue, Baker (1995)*; Hout, Elliott (2011)
* 1 of 2 groups
Not reliable (student effort varies; scores easy to manipulate):
Rothe (1947); Jennings (1953); Uguroglu, Walberg (1979); Taylor & White (1981); Arvey, et al. (1990); Schmit, Ryan (1992); Brown & Walberg (1993); Kim, McLean (1995); Wolf, Smith (1995); Wolf, Smith, DiPaulo (1996); Schiel (1996); Sundre (1999); Sundre, Moore (2002); Sundre, Wise (2003); DeMars (2000); Wise (2006a, 2006b); Wise, DeMars (2005, 2005, 2006, 2010); Wise, et al. (2009); Hoyt (2001); Eklof (2006, 2007, 2010); etc.
“…for consequential exams, the average score on the motivation scale was quite high with a low standard deviation. Essentially, most of the students were displaying uniformly high levels of motivation (i.e., ceiling effect). However, for the nonconsequential groups, motivation played an important role in predicting test performance. The overall motivation scores for the no consequence groups were lower than the motivation for the consequential groups, with much greater variability.”
- Cole, Bergin, Whittaker (2008), p. 612
4. Research: Low-stakes test reliability
5. High stakes cause test score inflation?
Then why is there no score inflation with certification and licensure tests?
More left-out-variable bias
CRESST’s Linn (2000) cites higher gains on a federal anti-poverty program’s pre-post testing over 9 months than over 12 months as evidence of inflation
Cannell found score inflation in elementary school tests in dozens of states; none of those tests had high stakes.
Cannell also found score inflation in secondary school tests in dozens of states; only one had high stakes.
Test Score Inflation Occurs Where Security Is Lax
Cannell’s test categorizations confirmed
Confusions from misinformation
1. Tests sample from larger domains
2. Campbell’s Law
3. “Teaching to the test” & “Narrowing the curriculum”
4. Incentives and causes
5. Educators face many incentives; “high stakes” only one
6. Today’s tests have much higher stakes than past tests
7. No one wants to be responsible for test security
1. Tests only sample larger domains
"Tests are about making a measurement, and generally, tests
are trying to measure something huge." — Daniel Koretz
TRUE of many tests, e.g., NRTs, aptitude tests, IQ tests
NOT TRUE of well-done standards-based tests
2. Campbell’s Law — a truism
"The more any quantitative social indicator is used for social decision-
making, the more subject it will be to corruption pressures and the more apt it
will be to distort and corrupt the social processes it is intended to monitor."
Social indicators can be beneficial for:
- understanding
- monitoring progress
- benchmarking
- setting goals
- improving processes
3. Teaching to the test; narrowing the curriculum
4. Incentives and causes
Question: Do high stakes present an incentive to cheat on tests?
Answer: Of course they do.
5. Educators face many incentives
Test “stakes” are just one of many incentives.
6. Today’s tests have higher stakes
Exactly the opposite is true.
Koretz: the stakes on state tests in the 1980s and 1990s were “chicken feed” compared to those on today’s tests.
7. No one inside education wishes to be responsible for test security
… including test development firms.
Large-scale test, tight security
Large-scale test, lax security
Harms of disinformation
1. Acceptance of low standard for research as valid
2. Unfairly discredits useful evaluation tool
3. Test security (in U.S.) remains shoddy
4. Teachers given mixed messages
5. Corruption of Test Standards barely averted
6. Now spreading worldwide
1. Acceptance of a very low quality standard for popular research results
CRESST studies:
- no controls
- secret test
- secret location
- secret definitions
Non-replicable, non-falsifiable
2. Uniquely useful evaluation tool is discredited
…and, in the US, the only objective measure available to the public (i.e., not under the control of insiders).
3. Test security (in U.S.) remains shoddy
ACT, SAT, PARCC, and SBAC tests are now administered statewide by schools, on varying dates. Testing organizations save money and hassle, and gain customers, by outsourcing (or ignoring) test security.
4. Teachers given mixed messages
“Teaching to the test” is unethical; don’t do it! Teach content beyond the standards.
“Teaching to the test” works! You and your students will be better off if you do it!
5. Standards corruption barely averted
6. Disinformation spreading worldwide
• Motive alone is not sufficient if test security is tight.
• Means and opportunity exist only in the absence of security measures and of form and item rotation.
Artificial test score gains (score inflation) are caused by lax security; they require means and opportunity.
Test Security in South Carolina:
“Unlike their other two tests,
… teachers are allowed to look at test booklets,
… teachers may obtain test booklets before the day of testing,
… booklets are not sealed, and
… testing is not routinely monitored by state officials.
… Outside test proctors are not used,
… test questions have not been rotated every year, and
… answer sheets have not been scanned for suspicious erasures or
analyzed for cluster variance.
… There are no state regulations that govern test security and test
administration for norm-referenced testing done independently
in the local school districts.”
Cannell’s score-inflated test
Test Security in South Carolina:
“South Carolina also administers a graduation exam and a criterion
referenced test, both of which have significant security
measures.
… Teachers are not allowed to look at either of these two test
booklets,
… teachers may not obtain booklets before the day of testing,
… the graduation test booklets are sealed,
… testing is routinely monitored by state officials,
… special education students are generally included in all tests,
… outside test proctors administer the graduation exam, and
… most test questions are rotated every year on the criterion
referenced test.”
Tests not in Cannell’s study
Lessons Learned
If terms can be defined arbitrarily, and not specified, any
research result is possible.
Cleverly disguised falsehoods and obfuscation can be well rewarded in US education schools (e.g., with endowed professorships at Harvard and Stanford).
US education: Research quality standards extremely low for popular results; impossibly high for unpopular results
http://nonpartisaneducation.org/Review/Articles/v6n3.htm
richard@nonpartisaneducation.org

Mattingly "AI & Prompt Design" - Introduction to Machine Learning"
 
4.9.24 Social Capital and Social Exclusion.pptx
4.9.24 Social Capital and Social Exclusion.pptx4.9.24 Social Capital and Social Exclusion.pptx
4.9.24 Social Capital and Social Exclusion.pptx
 
How to Manage Buy 3 Get 1 Free in Odoo 17
How to Manage Buy 3 Get 1 Free in Odoo 17How to Manage Buy 3 Get 1 Free in Odoo 17
How to Manage Buy 3 Get 1 Free in Odoo 17
 
Employablity presentation and Future Career Plan.pptx
Employablity presentation and Future Career Plan.pptxEmployablity presentation and Future Career Plan.pptx
Employablity presentation and Future Career Plan.pptx
 
Spearman's correlation,Formula,Advantages,
Spearman's correlation,Formula,Advantages,Spearman's correlation,Formula,Advantages,
Spearman's correlation,Formula,Advantages,
 
Sulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their usesSulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their uses
 
Congestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationCongestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentation
 
Objectives n learning outcoms - MD 20240404.pptx
Objectives n learning outcoms - MD 20240404.pptxObjectives n learning outcoms - MD 20240404.pptx
Objectives n learning outcoms - MD 20240404.pptx
 
Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdf
 
prashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Professionprashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Profession
 
Scientific Writing :Research Discourse
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research Discourse
 
DiskStorage_BasicFileStructuresandHashing.pdf
DiskStorage_BasicFileStructuresandHashing.pdfDiskStorage_BasicFileStructuresandHashing.pdf
DiskStorage_BasicFileStructuresandHashing.pdf
 
Introduction to Research ,Need for research, Need for design of Experiments, ...
Introduction to Research ,Need for research, Need for design of Experiments, ...Introduction to Research ,Need for research, Need for design of Experiments, ...
Introduction to Research ,Need for research, Need for design of Experiments, ...
 
DBMSArchitecture_QueryProcessingandOptimization.pdf
DBMSArchitecture_QueryProcessingandOptimization.pdfDBMSArchitecture_QueryProcessingandOptimization.pdf
DBMSArchitecture_QueryProcessingandOptimization.pdf
 
4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx
 
Unit :1 Basics of Professional Intelligence
Unit :1 Basics of Professional IntelligenceUnit :1 Basics of Professional Intelligence
Unit :1 Basics of Professional Intelligence
 

It's a myth: High stakes cause test score inflation

  • 1. It’s a myth: High stakes cause test score inflation. Richard P. Phelps, researchED 2017 National Conference, 7 October 2017, Brooklyn, NY
  • 2. Educational testing in the US: early 1980s
  • 3. Educational testing in the US: 1980s. Student testing with stakes reintroduced in the late 1970s and early 1980s: Debra P. v. Turlington; “Truth in testing” laws
  • 4. John J. Cannell, M.D.: residency in rural, poor Appalachia, 1980s. Surprised by claims that the state and school district scored “above average” on national tests. Investigated: all US states claimed to be “above average.”
  • 5. “Welcome to Lake Wobegon, where all the women are strong, all the men are good-looking, and all the children are above average.” - Garrison Keillor, A Prairie Home Companion
  • 6. Cannell’s suspects: lax security; outdated or invalid norms; deliberate educator manipulation (i.e., cheating)
  • 7. US Education Establishment Responds
  • 8. “While supporting Cannell’s general finding … our analyses lead us to conclusions that are different, and certainly less sensational, than the ones he reached.” - Linn, Graue, Sanders, CRESST, 1990. “There are many reasons for the Lake Wobegon Effect, most of which are less sinister than those emphasized by Cannell.” - Linn, CRESST, 2000
  • 9. CRESST’s Lake Wobegon suspects: outdated or invalid norms; high stakes that induce “teaching to the test” (i.e., test coaching) under pressure
  • 10. “We know that tests that are used for accountability tend to be taught to in ways that produce inflated scores.” - Daniel Koretz, CRESST, 1992. “Corruption of indicators is a continuing problem where tests are used for accountability or other high-stakes purposes.” - Robert Linn, CRESST, 2000
  • 11. CRESST counters Cannell’s Lake Wobegon study with its own, 1991. Students took a test for a few years, and scores rose. Then they took a “competing test” the district had used before, and scores fell.
  • 12. CRESST 1991 “Generalization” Study. Unnamed school district; unnamed tests; neither replicable nor falsifiable; a conference presentation, not peer-reviewed.
  • 13. CRESST 1991 “Generalization” Study. Three tests in the study: (1) the annual NRT (norm-referenced test); (2) a parallel form; (3) a “competing” NRT
  • 14. 1991 CRESST “Generalization” Study
  • 15. 1991 CRESST “Generalization” Study. The school district test was only “perceived to be high stakes.”
  • 16. 1991 CRESST “Generalization” Study. The study’s assumptions: (1) publication of aggregate results = “high stakes”; (2) “competing” NRTs should get the same results; (3) “test coaching” improves scores; (4) low-stakes test scores are reliable and can be used to benchmark unreliable high-stakes scores; (5) high stakes cause test-score inflation?
  • 17. Assumption 1. Publication of aggregate results = high stakes? Jim Popham’s “high stakes” definition, 1987: “... Such tests include the many statewide achievement tests whose results are reported by local newspapers on a school-by-school or district-by-district basis.”
  • 18. Assumption 1. Publication of aggregate results = high stakes? Jim Popham’s “high stakes” definition, 1992: a test “subject to legal scrutiny.” Tests such as those used “for employment, licensure, or a high school graduation requirement.”
  • 19. Assumption 1. Publication of aggregate results = high stakes? Standards for Educational and Psychological Testing: “High-stakes test. A test used to provide results that have important, direct consequences for examinees, programs, or institutions involved in the testing.” (p. 176) “Low-stakes test. A test used to provide results that have only minor or indirect consequences for examinees, programs, or institutions involved in the testing.” (p. 178)
  • 20. Assumption 1. Publication of aggregate results = high stakes? “...tests taken to obtain admission to an educational program or taken during and at the conclusion of a program to obtain a qualification.” “…high-stakes decisions, such as whether a student will move on to the next grade level or receive a diploma.”
  • 21. Assumption 1. Publication of aggregate results = high stakes? Wikipedia: “A high-stakes test is a test with important consequences for the test taker. Passing has important benefits, such as a high school diploma, a scholarship, or a license to practice a profession.”
  • 22. Assumption 2. Research: Comparability of different tests. Scores comparable? None found. Scores not comparable: NRTs: Freeman, Kuhs, Porter, Floden, Schmidt, Schwille (1983); Debra P. v. Turlington (1984); Cohen, Spillane (1993); La Marca, Redfield, Winter, Bailey, and Despriet (2000); Wainer (2011). Standards: Archbald (1994); Buckendahl, Plake, Impara, Irwin (2000); Bhola, Impara, Buckendahl (2003); Phelps (2005). CRTs (criterion-referenced tests): Massell, Kirst, Hoppe (1997); Wiley, Hembry, Buckendahl, Forte, Towles, Nebelsick-Gullett (2015).
  • 23. Assumption 3. Research: Effects of test coaching. It works: significant score increase from learning format tricks. Alderman & Powers (1980); Samson (1985); Scruggs (1985); Roznowski & Bassett (1992); McMann (1994); Holmes, Keffer (1995); Camel & Chung (2002); Filizola (2008)
  • 24. Assumption 4. Research: Low-stakes test reliability. Reliable (“no incentive to manipulate scores”): Kiplinger, Linn (1992); O’Neil, Sugrue, Baker (1995)*; Hout, Elliott (2011). (*1 of 2 groups.) Not reliable (student effort varies; scores easy to manipulate): Rothe (1947); Jennings (1953); Uguroglu, Walberg (1979); Taylor & White (1981); Arvey, et al. (1990); Schmit, Ryan (1992); Brown & Walberg (1993); Kim, McLean (1995); Wolf, Smith (1995); Wolf, Smith, DiPaulo (1996); Schiel (1996); Sundre (1999); Sundre, Moore (2002); Sundre, Wise (2003); DeMars (2000); Wise (2006a, 2006b); Wise, DeMars (2005, 2005, 2006, 2010); Wise, et al. (2009); Hoyt (2001); Eklof (2006, 2007, 2010); etc.
  • 25. Assumption 4. Research: Low-stakes test reliability. “…for consequential exams, the average score on the motivation scale was quite high with a low standard deviation. Essentially, most of the students were displaying uniformly high levels of motivation (i.e., ceiling effect). However, for the nonconsequential groups, motivation played an important role in predicting test performance. The overall motivation scores for the no consequence groups were lower than the motivation for the consequential groups, with much greater variability.” - Cole, Bergin, Whittaker (2008), p. 612
  • 26. Assumption 5. High stakes cause test score inflation? Then why is there no score inflation with certification and licensure tests?
  • 27. More left-out-variable bias: CRESST’s Linn (2000) cites higher gains over 9 months than over 12 months on a federal anti-poverty program’s pre-post testing as evidence of inflation.
  • 28. Test Score Inflation Occurs where Security is Lax. Cannell found score inflation in elementary school tests in dozens of states; none of those tests had high stakes. Cannell also found score inflation in secondary school tests in dozens of states; only one had high stakes.
  • 29. Cannell’s test categorizations confirmed
  • 30. Confusions from misinformation: (1) tests sample from larger domains; (2) Campbell’s Law; (3) “teaching to the test” and “narrowing the curriculum”; (4) incentives and causes; (5) educators face many incentives, and “high stakes” is only one; (6) today’s tests have much higher stakes than past tests; (7) no one wants to be responsible for test security
  • 31. Confusion 1. Tests only sample larger domains. “Tests are about making a measurement, and generally, tests are trying to measure something huge.” - Daniel Koretz. TRUE of many tests, e.g., NRTs, aptitude tests, IQ tests; NOT TRUE of well-done standards-based tests.
  • 32. Confusion 2. Campbell’s Law, a truism. “The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.” Social indicators can be beneficial: for understanding, monitoring progress, benchmarking, setting goals, and process improvements.
  • 33. Confusion 3. Teaching to the test; narrowing the curriculum
  • 34. Confusion 4. Incentives and causes. Question: Do high stakes present an incentive to cheat on tests? Answer: Of course they do.
  • 35. Confusion 5. Educators face many incentives. The incentive of test “stakes” is just one.
  • 36. Confusion 6. Today’s tests have higher stakes. Exactly the opposite is true. Koretz: state tests of the 1980s and 1990s were “chicken feed” compared to today’s tests.
  • 37. Confusion 7. No one inside education wishes to be responsible for test security … including test development firms.
  • 38. Large-scale test, tight security
  • 39. Large-scale test, lax security
  • 40. Harms of disinformation: (1) acceptance of a low standard for research as valid; (2) unfairly discredits a useful evaluation tool; (3) test security (in U.S.) remains shoddy; (4) teachers given mixed messages; (5) corruption of Test Standards barely averted; (6) now spreading worldwide
  • 41. Harm 1. Acceptance of a very low quality standard for popular research results. CRESST studies: no controls; secret test; secret location; secret definitions. Non-replicable, non-falsifiable.
  • 42. Harm 2. Uniquely useful evaluation tool is discredited …and, in the US, the only objective measure available to the public (i.e., not under the control of insiders).
  • 43. Harm 3. Test security (in U.S.) remains shoddy. ACT, SAT, PARCC, and SBAC are now administered statewide by schools, on varying dates. Testing firms save money and hassle, and gain customers, by outsourcing (or ignoring) test security.
  • 44. Harm 4. Teachers given mixed messages. “Teaching to the test” is unethical; don’t do it! Teach content beyond the standards. Versus: “Teaching to the test works! You and your students will be better off if you do it!”
  • 45. Harm 5. Standards corruption barely averted
  • 46. Harm 6. Disinformation spreading worldwide
  • 47. Artificial test score gains (score inflation) are caused by lax security; they require means and opportunity. Motive alone is not sufficient if test security is tight. Means and opportunity exist only in the absence of security measures and of form and item rotation.
  • 48. Test security in South Carolina, Cannell’s score-inflated test: “Unlike their other two tests, … teachers are allowed to look at test booklets, … teachers may obtain test booklets before the day of testing, … booklets are not sealed, and … testing is not routinely monitored by state officials. … Outside test proctors are not used, … test questions have not been rotated every year, and … answer sheets have not been scanned for suspicious erasures or analyzed for cluster variance. … There are no state regulations that govern test security and test administration for norm-referenced testing done independently in the local school districts.”
  • 49. Test security in South Carolina, tests not in Cannell’s study: “South Carolina also administers a graduation exam and a criterion referenced test, both of which have significant security measures. … Teachers are not allowed to look at either of these two test booklets, … teachers may not obtain booklets before the day of testing, … the graduation test booklets are sealed, … testing is routinely monitored by state officials, … special education students are generally included in all tests, … outside test proctors administer the graduation exam, and … most test questions are rotated every year on the criterion referenced test.”
  • 50. Lessons learned. If terms can be defined arbitrarily, and not specified, any research result is possible. Cleverly disguised falsehoods and obfuscation can be well rewarded in US education schools (e.g., with endowed professorships at Harvard and Stanford). US education: research quality standards are extremely low for popular results and impossibly high for unpopular results.
  • 51. http://nonpartisaneducation.org/Review/Articles/v6n3.htm richard@nonpartisaneducation.org
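
Slide 27 (and speaker note 24) argue that comparing a 9-month gain with a 12-month gain leaves out a variable: the summer. The arithmetic can be sketched in Python under loudly hypothetical assumptions. Growth is counted in grade-equivalent months, and one instructional month yields one month of growth. Both intervals are assumed to contain the same 9 months of instruction: the 9-month interval is taken to run fall to spring and the 12-month interval spring to spring. The 12-month interval also adds roughly one month of summer loss, the average the notes attribute to Harris Cooper's meta-analysis. This is an illustration, not a reanalysis of the program's data.

```python
# Back-of-envelope check of the 9-month vs. 12-month comparison (slide 27).
# All constants are illustrative assumptions, not data from the study.

INSTRUCTIONAL_MONTHS = 9               # months of schooling in either interval (assumed)
GROWTH_PER_INSTRUCTIONAL_MONTH = 1.0   # grade-equivalent months gained per month taught (assumed)
SUMMER_LOSS_MONTHS = 1.0               # average net summer loss (figure cited in note 24)


def expected_gain(includes_summer: bool) -> float:
    """Expected pre-post gain in grade-equivalent months, assuming no score inflation."""
    gain = INSTRUCTIONAL_MONTHS * GROWTH_PER_INSTRUCTIONAL_MONTH
    if includes_summer:
        # The 12-month interval spans the summer, so subtract the forgetting.
        gain -= SUMMER_LOSS_MONTHS
    return gain


gain_9_month = expected_gain(includes_summer=False)   # e.g., fall to spring
gain_12_month = expected_gain(includes_summer=True)   # e.g., spring to spring
gap = gain_9_month - gain_12_month

print(f"Expected 9-month gain:  {gain_9_month:.1f} months")
print(f"Expected 12-month gain: {gain_12_month:.1f} months")
print(f"Gap explained by summer loss alone: {gap:.1f} months")
```

Under these assumptions, the expected gap equals the one month of difference the notes report, with no score inflation needed to explain it.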

Editor's Notes

  1. State/local determination of curricula, textbooks, courses, and sequencing of courses; constitutionally, states determine education laws. Tests were developed by commercial firms; most test developers were also textbook publishers. Tests were purchased “off the shelf,” based on generalized curricula. Norm-referenced tests used national norms. School personnel administered the tests.
  2. Student testing with stakes was reintroduced after the perhaps too-liberal 1960s. Debra P. v. Turlington and the move to standards-based tests. “Truth in testing” laws.
  3. From Flat Top, West Virginia, Dr. Cannell formed an organization of friends and relatives called “Friends for Education” and investigated. They surveyed 50 state education departments and many school districts. They also “stung” testing firms by posing as local educators, and discovered that test salespersons were very willing to help them artificially boost their scores.
  4. Lake Wobegon is a fictional town in Minnesota, the setting of a radio comedy show, “where all the children are above average.” Cannell’s findings came to be called the “Lake Wobegon Effect.”
  5. Showing test items to teachers beforehand; keeping test forms around for years; misleading reporting; etc. Much of the “cheating” was unintentional. One test publisher even recommended that teachers examine the test booklets beforehand.
  6. Since 1980, CRESST has been the only federally funded research center focused on education standards and student testing. A West Virginia country doctor embarrassed them. They have been the best-funded, most highly visible US testing researchers for decades. They have assumed control of the testing research function in other high-profile organizations (e.g., the National Research Council, the National Academy of Education).
  7. CRESST has a HUGE amount of power and influence. Few are willing to confront them, and the few who do pay a price. Cannell was portrayed as a crank sensationalist who accidentally stumbled onto something he didn’t understand. Given that he was not a testing expert, it was now time for the professionals to investigate things. Meanwhile, Cannell needed to complete his medical degree, and moved on.
  8. Cannell’s findings implied widespread and casual corruption in US education administration. The obvious solution was external control of educational testing. CRESST successfully spun a threat to the status quo into an argument for maintaining education establishment control.
  9. No ifs, ands, or buts according to CRESST for the past 30 years. Dan Koretz has just published another book reiterating the same themes. High stakes cause test score inflation. We “know” this.
  10. The study’s logic: teaching to the test, caused by high stakes, must have caused artificial test score gains that do not generalize to a “similar” test. What about curricular alignment? …or test security? There were no controls for either.
  11. To this day, Koretz claims the identity of the school district must be protected. Protected from what is unknown.
  12. To demonstrate that teachers were teaching to a familiar test form, the logical course of action would have been to administer a parallel form and compare results. In fact, they did that and found no difference in results. But they identified this comparison as a control for the effect of motivation, not a test of the score inflation hypothesis. The comparison that allegedly reveals test score inflation was made between the current test and a “competing test,” i.e., some other firm’s NRT.
  13. Read the study casually and it seems to be a careful, highly technical, scientific study. It employs the tone and language one would expect. Read it in detail, and it makes little sense. Perhaps it has been so successful because it is so convoluted.
  14. Koretz later revealed that the only “stakes” were… publication of the aggregate results.
  15. Jim Popham wrote that there were two types of high-stakes tests. His first type would be familiar to us all. There is absolutely nothing official or consensual about this definition. Jim Popham is just one guy, and this definition was just one guy’s idea at that one time.
  16. But Popham’s 1987 definition was untenable. With “truth in testing” laws, aggregate results for all systemwide tests were now reported. According to his 1987 definition, all systemwide tests were high stakes, so the definition was meaningless. By 1992, five years later, it would seem that he had changed his mind.
  17. At least in the US, the most official definition is that of the Standards published jointly by the American Psychological Association, the American Educational Research Association, and the National Council on Measurement in Education. Like most definitions of the term, it includes two notions: a substantial effect and a direct effect.
  18. Two more relatively official renderings of the definition.
  19. Even Wikipedia defines the term the way most of us would.
  20. The second assumption of the 1991 CRESST generalization study. Here are citations to research comparing tests on the same general topic but developed by different organizations. I could find none claiming comparability. Even Common Core tests are no longer comparable: there is up to a 30% difference in test content where states have the same standards but separately create and revise frameworks, blueprints, and test items. “If you want to measure change you should not change the measure.”
  21. The third assumption of the 1991 CRESST generalization study. Test coaching research is more mixed. Most studies find some small effect. Some coaching is good: students need to be familiar with the format, for example, particularly if it is new and unfamiliar. And some test prep is more subject-matter learning than drill on format and old items.
  22. The fourth assumption of the 1991 CRESST generalization study: one can use low-stakes tests as reliable benchmarks because “there is no incentive to manipulate scores.” The experiments finding low-stakes tests NOT reliable found moderate to high effect sizes by varying test incentives (i.e., stakes). The mean in one meta-analysis was 0.59, a fairly large effect. The three studies on the “reliable” side were all conducted by CRESST or CRESST affiliates.
  23. Indeed, high-stakes scores are more reliable than low-stakes scores. This simply reflects common sense: students, particularly teenagers, are reluctant to put much effort into a test that doesn’t count. This is a quote from one study.
  24. One more CRESST study. The 9-month interval would be taught by the same teacher, who supposedly had an incentive to inflate scores. Two different teachers were involved in the 12-month interval (ergo, no incentive to raise test scores). The difference was 1 month’s achievement. The comparison does not consider the 3 summer months of forgetting; Harris Cooper’s meta-analysis of summer learning loss studies averaged 1 month of loss. Mean test scores by district were required to be reported. That’s all; there were no stakes.
  25. Cannell surveyed all the US states and many school districts. He asked about all their systemwide tests. In state after state, district after district, the NRTs (the score-inflated tests) had no stakes and lax security. Rather, standards-based tests, not comparable across states, were the tests with stakes. Of the hundreds of tests, the “Lake Wobegon” tests (the score-inflated tests, the NRTs) had the lowest stakes.
  26. A few years later, I conducted a study for the US Congress that surveyed all US states and over 600 school districts about all their systemwide testing. The testing environment was the same as Cannell found, in that almost all NRTs were administered without stakes. By contrast, almost all standards-based tests had stakes.
  27. Confusions from the Lake Wobegon Effect misinformation spread by CRESST.
  28. In the standards-based tests I have helped to develop myself, at least one test item was written to each standard, and rather literally. If a standard read “Students should be able to add two digit numbers,” there was at least one test item in which students were asked to add two-digit numbers, …and so on. Every standard is tested. The test is NOT a sample of a much larger domain; it is a census of a well-circumscribed domain. Last week, Koretz stated that a 12th-grade test of mathematics covered 12 years of math with just 42 test items. This is misleading. Certainly, this test did not include 1st-grade test items, 2nd-grade test items, and so on. The tests containing those items were already taken long ago. The 12th-grade test covers 12th-grade standards.
  29. Testing opponents frequently cite Campbell’s Law to justify the complete elimination of high-stakes or external testing. By the same logic, should we not eliminate all occupational testing? How about eliminating the criminal code? …the police? Besides, Campbell’s Law applies as well to all other evaluation methods, including those of the teacher and the school. There is simply no escaping it. J.J. Cannell’s work shows it is as true for no-stakes tests as for high-stakes tests.
  30. Not necessarily related: you can have one without the other.
  31. Asked why he robbed banks, bank robber Willie Sutton replied, “Because that’s where the money is.” Banks present an incentive to rob them simply by having a lot of money inside. But most banks are never robbed.
  32. Tight security is expensive and time-consuming. Lax test security is convenient and inexpensive. Increasing test score trends help education administrators politically and professionally.
  33. Koretz “chicken feed” statement. Most states in the 1980s and 1990s had high-stakes graduation exams or grade-promotion exams, or both, as does most of the world. With No Child Left Behind, many of those tests have disappeared. Current federally required tests have NO stakes for students.
  34. Test security is a thankless responsibility, fraught with risks and complaints and, in the United States, lawsuits. It is like being a referee, only worse, because fewer people understand the work involved. So long as there is no external control, any control is internal.
  35. Here is a photograph of Indian Army proctors administering a test to new recruits. Notice that the test-takers are allowed few places to hide cheating materials, and are separated by a distance unfavorable to reading others’ test answers. One often finds very high levels of test security for professional selection tests – those already in the profession want to work with the best job candidates.
  36. Here is another photograph from India, this time of a school. The reporter posting this photo tells us that the people climbing the walls are family members helping students inside the building with answers on a test. CRESST studies treat test security as an irrelevant factor in their studies. The contrast of these 2 photos illustrates how relevant a factor test security is.
  37. The “high stakes cause test score inflation” myth is not just an innocent pet theory. It carries harmful consequences.
  38. I’m told that when Koretz is pressed on the issue, he admits that there could be other explanations for the test score inflation in the 1991 CRESST study. When not pressed, however, he declares high stakes to be the definite cause. Contrary evidence is suppressed, sometimes even declared nonexistent, and whistleblowers are discredited.
  39. CRESST asserts that test scores from high-stakes tests are unreliable and untrustworthy: ambiguous at best, possibly meaningless. In fact, they are more reliable than the low-stakes tests that CRESST promotes. With the exception of just a few states, the US has no inspectorate system.
  40. Thirty years after Cannell showed that security was lax for all school tests but the ACT and SAT, which were administered securely by ACT and SAT themselves, the ACT, SAT, PARCC, and SBAC are now being administered internally by schools, too. The security guidelines that testing firms give to school personnel are typically a few dozen pages long; it is impossible that untrained educators will follow them consistently.
  41. Pity the poor teachers, ethically obligated to teach the legally mandated standards. They may feel pressure from administrators for improper teaching to the test (e.g., drilling on format and old test items). Meanwhile, Koretz wants them to teach “to a larger domain.”
  42. Many test developers adhere to the Standards for Educational and Psychological Testing religiously. Court judges use them for reference. They are the equivalent of construction codes for home builders or ethics codes for professionals. To my knowledge, only one person in the world objected to draft standards that included all the CRESST disinformation about test policies. He was successful, …for the moment.
  43. The OECD recently conducted a completely one-sided study on testing. World Bank: for 30 years it has told only one side of the story; its testing office is run by professional colleagues of CRESST. The popularity of international tests grows, with strong incentives to cheat in some countries.
  44. When investigating crimes, police detectives look for means, motive, and opportunity. High stakes may provide a motive, but they do not provide means or opportunity. And you need all three.
  45. SHOW Cannell’s book.
  46. Fin
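
Speaker note 22 cites a mean effect size of 0.59 from experiments that varied test stakes. An effect size of this kind is usually a standardized mean difference: the gap between two group means divided by a standard deviation. A minimal Python sketch of Cohen's d with a pooled standard deviation follows. This is one common estimator, assumed here for illustration; the meta-analysis may have used another (e.g., Hedges' g). The score lists are invented and do not come from any cited study.

```python
import math
import statistics


def cohens_d(group_a: list[float], group_b: list[float]) -> float:
    """Cohen's d: (mean_a - mean_b) divided by the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two scores")
    var_a = statistics.variance(group_a)  # sample variance, n - 1 denominator
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


# Hypothetical scores, invented for illustration only.
with_consequences = [48.0, 50.0, 52.0]
without_consequences = [47.0, 49.0, 51.0]

d = cohens_d(with_consequences, without_consequences)
print(f"d = {d:.2f}")  # prints "d = 0.50"
```

Read this way, d = 0.59 would mean that the group tested with consequences averaged about 0.59 pooled standard deviations above the group tested without them. By Cohen's conventional benchmarks (0.2 small, 0.5 medium, 0.8 large), that falls between medium and large.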