NCCE Bylsma
1. MAKING SENSE OF THE NEW
ACCOUNTABILITY INDEX AND
STUDENT GROWTH PERCENTILES
Dr. Pete Bylsma
Director, Assessment/Student Information Services
Renton School District
(Past President, Washington Educational Research Association - WERA)
Dr. Glenn Malone
Executive Director of Assessment, Accountability & Student Success
Puyallup School District
(WERA President-Elect)
NCCE Conference
March 12, 2014
2. Describe changes in federal accountability that
prompted changes in old Index and required
student growth measures
Describe old and new Achievement Index that
rates schools (assigns labels, identifies high and
low performers, basis for State Board of
Education/OSPI recognition)
Describe & critique the new student growth
percentile measure (SGP) used in the new index
(and potentially used in staff evaluations)
SESSION OBJECTIVES
3. AYP under NCLB started in 2002, state discarded its
existing accountability system
• AYP used 9 student groups, reading/math proficiency
and participation, graduation rate
• 37 “cells” possible for schools, 111 for district
• Gradually increasing goal, all groups must meet
standard by 2014
• “Conjunctive” model – not making it in one area means
not making AYP
• Escalating negative sanctions when not making AYP, but
only for Title I schools
Why Change Accountability System?
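The "conjunctive" rule described on this slide can be sketched in a few lines. This is an illustration only: the cell names and pass/fail values are hypothetical, not actual AYP results.

```python
# Hypothetical AYP cells: True means the cell met its target.
cells = {
    ("all students", "reading proficiency"): True,
    ("all students", "math proficiency"): True,
    ("ELL", "reading proficiency"): False,  # a single missed cell
    ("low income", "math participation"): True,
}

def makes_ayp(cells):
    """Conjunctive rule: every cell must meet its target."""
    return all(cells.values())

print(makes_ayp(cells))  # False
```

One missed cell out of many is enough to produce the "not making AYP" label.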
4. • System is too complicated, invalid, and unrealistic
– Different “rules” than those used by state
• Larger minimum N, margin of error, excludes some students
– Negative label applied when missing one goal,
ELLs must take test despite not knowing English
– Conjunctive model means all will eventually “fail”
• Resulted in unintended side effects
– Focus on “bubble kids,” narrowing curriculum, some
states lowered standards so all can pass by 2014
Problems with AYP System
5. AYP waiver approved in 2012, some rules no longer
apply
• Do not need to have all students meet standard by 2014
• Do not need to set aside Title I funds
• School choice or supplemental services not required
• Still looks at reading & math percent meeting standard,
95% participation rate, graduation rates
Annual Measurable Objectives (AMO) is new measure
• Each subgroup in each school has its own annual targets
• Targets use a 2011 baseline, must cut in half the
“proficiency gap” (difference between baseline and 100%
meeting standard) by 2017
New Federal Accountability Rules
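A worked sketch of the AMO arithmetic may help. The function name `amo_targets` is hypothetical, and the equal annual increments are an assumption: the slide gives only the 2011 baseline and the 2017 endpoint.

```python
def amo_targets(baseline_pct, start_year=2011, end_year=2017):
    """Return {year: target} that halves the proficiency gap by end_year.

    Assumption: equal annual increments (not specified in the slides).
    """
    gap = 100.0 - baseline_pct             # the "proficiency gap"
    final_target = baseline_pct + gap / 2  # gap cut in half
    years = end_year - start_year
    step = (final_target - baseline_pct) / years
    return {start_year + i: round(baseline_pct + step * i, 1)
            for i in range(years + 1)}

targets = amo_targets(40.0)  # subgroup with 40% meeting standard in 2011
print(targets[2017])  # 70.0
```

A subgroup starting at 40% has a 60-point gap, so its 2017 target is 70%. A subgroup starting at 80% would only need to reach 90%.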
7. Instead of “not making AYP,” lowest performing
schools are now identified for more support
3 types of “Persistently Low Achieving” schools
• Priority: Bottom 5% in “all students” category
• Focus: Bottom 10% of all subgroups
(Asian, black, Hispanic, white, low income, ELL, special
education)
• Emerging: Schools close to becoming Priority or Focus
(next lowest 5%/10%)
No grade-band distinctions
(elementary, middle, high, comprehensive, alternative are all in
the same rankings)
Revised Federal Accountability “Sanctions”
8. System to identify low performing schools is badly
flawed
• Applies only to Title I schools, must have N > 30 for
three years
• To identify Focus and Emerging schools, all subgroups
are combined and ranked together
• In 2012, every Focus and Emerging school (186 total)
was identified based on ELL or SpEd subgroups (or
both)*
If a school has a large ELL and/or SpEd population and is Title I, the
odds of identification are very high
*A few alternative schools were also identified for low graduation rates
Revised Federal Accountability “Sanctions”
9. Educational accountability systems require:
(1) measures of effectiveness
(2) goals to guide improvement efforts
(3) reports that provide useful information to
policymakers, educators, and parents
(4) a set of consequences that recognize exemplary
performance and support those needing more help
In response to flawed AYP system, the State Board of
Education created an Accountability Index in 2009 to
provide a better measure of school effectiveness
Accountability Systems
10. Original Accountability Index*
Five Outcomes
Results from 4 assessments (reading, writing, math, science)
aggregated together from all grades and all students, extended
graduation rate for all students, minimum N = 10
Four Indicators
1. Achievement by non-low income students
(% meeting standard/ext. grad rate)
2. Achievement by low income students (eligible for FRL)
3. Achievement vs. Peers (make “apples to apples” comparisons by
controlling for percent ELL, low-income, special ed, gifted, mobility)
4. Improvement (change in Learning Index from previous year)
Creates a 5x4 matrix with 20 outcomes,
each rated on a scale of 1-7
* Required by Legislature in 2009 (ESHB 2261)
11. Original Accountability Index Matrix
(multiple measures using available state data)
Outcomes
Indicator                  Reading  Writing  Math  Science  Ext. G.R.  Avg.
Non-low inc. achievement
Low inc. ach.
Ach. vs. peers
Improvement
Average                                                                Index *
* Simple average of all rated cells (compensatory model)
12. Index Benchmarks and Ratings
Achievement of non-low income and low income students
Reading, Writing, Math, Science (% met standard)
% MET STANDARD    RATING
90 – 100%         7
80 – 89.9%        6
70 – 79.9%        5
60 – 69.9%        4
50 – 59.9%        3
40 – 49.9%        2
< 40%             1
Ext. grad rate
RATE              RATING
> 95%             7
90 – 95%          6
85 – 89.9%        5
80 – 84.9%        4
75 – 79.9%        3
70 – 75%          2
< 70%             1
Achievement vs. Peers
Reading, Writing, Math, Science (Learning Index)
DIFFERENCE IN LEARNING INDEX    RATING
> .20                           7
.151 to .20                     6
.051 to .15                     5
-.05 to .05                     4
-.051 to -.15                   3
-.151 to -.20                   2
< -.20                          1
Ext. grad rate
DIFFERENCE IN RATE    RATING
> 12                  7
6.1 to 12             6
3.1 to 6              5
-3 to 3               4
-3.1 to -6            3
-6.1 to -12           2
< -12                 1
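The % MET STANDARD scale on this slide is a simple banded lookup. A minimal sketch follows; the function name `achievement_rating` is hypothetical, and treating each band as "at or above its lower bound" is an assumption about how values such as 89.95 would be handled.

```python
def achievement_rating(pct_met_standard):
    """Map percent meeting standard to the 1-7 rating from the slide."""
    # (lower bound of band, rating), highest band first
    cutoffs = [(90, 7), (80, 6), (70, 5), (60, 4), (50, 3), (40, 2)]
    for floor, rating in cutoffs:
        if pct_met_standard >= floor:
            return rating
    return 1  # below 40%

print(achievement_rating(92.5))  # 7
print(achievement_rating(58.7))  # 3
```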
13. Improvement
Reading, Writing, Math, Science (Learning Index)
CHANGE IN LEARNING INDEX    RATING
> .15                       7
.101 to .15                 6
.051 to .10                 5
-.05 to .05                 4
-.051 to -.10               3
-.101 to -.15               2
< -.15                      1
Ext. grad rate
CHANGE IN RATE    RATING
> 6               7
4.1 to 6          6
2.1 to 4          5
-2 to 2           4
-2.1 to -4        3
-4.1 to -6        2
< -6              1
Index Benchmarks and Ratings
• No Improvement rating given when performing at a
very high level (sensitive to “ceiling” effect)
• Index excluded ELL results in the first 3 years of
enrollment (ELLs must still take tests, most exit in 3 years)
14. Achievement vs. Peers
• Recognizes context affects outcomes
• Makes “apples to apples” comparisons (“statistical
neighbors”) to control for 5 student variables
(percent ELL, low-income, special education, mobile, gifted)
• Separate analysis for each type of school
(e.g., elementary, middle, high, multiple grades)
• Non-regular schools do not receive a “peer” rating
16. Five Tier Names and Ranges
Schools/districts
assigned to a “tier”
based on index score
(but some applied A-F labels
to these tiers)
Tier Index Range
Exemplary 5.50 – 7.00
Very Good 5.00 – 5.49
Good 4.00 – 4.99
Fair 2.50 – 3.99
Struggling 1.00 – 2.49
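The tier ranges translate directly into a lookup. A minimal sketch, with `tier_for` as a hypothetical helper name:

```python
# (lower bound of range, tier name), highest tier first
TIERS = [
    (5.50, "Exemplary"),
    (5.00, "Very Good"),
    (4.00, "Good"),
    (2.50, "Fair"),
    (1.00, "Struggling"),
]

def tier_for(index):
    """Return the tier name for an index score between 1.00 and 7.00."""
    if not 1.00 <= index <= 7.00:
        raise ValueError("index must be between 1.00 and 7.00")
    for floor, name in TIERS:
        if index >= floor:
            return name

print(tier_for(4.37))  # Good
```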
17. Example - XXX High School
Index (Good)
Indicator            Reading  Writing  Math   Science  Grad Rate  Average
Non-low inc. ach.    7        7        3      3        6          5.20
Low-inc. ach.        6        7        2      2        6          4.60
Ach. vs. peers       4        4        4      4        6          4.40
Improvement          5        2        1      4        3          3.00
Average              5.50     5.00     2.50   3.25     6.00       4.37
Indicator            Reading  Writing  Math   Science  Grad Rate
Non-low inc. ach.*   92.5     93.7     58.7   56.5     94.9
Low-inc. ach.*       87.2     91.8     44.8   40.8     94.2
Ach. vs. peers**     +.05     +.01     +.03   +.05     +10.3
Improvement**        +.09     -.14     -.26   -.04     -2.5
* Percent meeting standard for content areas, extended graduation rate
** All students, content areas measured using the Learning Index
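The example's averages can be checked with a short sketch of the "simple average of all rated cells" rule. One caveat, flagged as an inference rather than something the slides state: the published graduation rate average (6.00) and index (4.37) are reproduced only if the graduation rate Improvement cell (shown as 3) is left out of the averages. That would be consistent with the rule that no Improvement rating is given at very high performance levels. The code marks that cell as unrated (`None`) on that assumption.

```python
# Ratings by indicator: Reading, Writing, Math, Science, Grad Rate
ratings = {
    "Non-low inc. ach.": [7, 7, 3, 3, 6],
    "Low-inc. ach.":     [6, 7, 2, 2, 6],
    "Ach. vs. peers":    [4, 4, 4, 4, 6],
    # Assumption: the grad-rate Improvement cell is treated as unrated
    "Improvement":       [5, 2, 1, 4, None],
}

def mean_of_rated(values):
    """Average only the cells that received a rating."""
    rated = [v for v in values if v is not None]
    return sum(rated) / len(rated)

all_cells = [v for row in ratings.values() for v in row]
index = mean_of_rated(all_cells)
grad_avg = mean_of_rated([row[4] for row in ratings.values()])

print(round(index, 2), round(grad_avg, 2))  # 4.37 6.0
```

Because the model is compensatory, the low math ratings pull the index down but do not by themselves determine the school's label.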
19. Washington Achievement Awards
OSPI/SBE used 2-year averages from Accountability Index
• Overall Excellence Award uses the Index score (top 5% by grade band)
• Special Recognition given “on the edges” when 2-year average is > 6.00
Language arts, math, science, graduation rate, Improvement
Outcomes
Indicator: Reading, Writing, Math, Science, G.R., Average
Non-low inc. achievement: Compare (1)
Low inc. ach.
Ach. vs. peers
Improvement: 6.00
Average: 6.00 6.00 6.00 6.00 Top 5% (1)
(1) Overall Excellence is granted only if the average difference in the
income gap and the race/ethnicity gap (using a separate matrix) is < 2.5
20. • Federal NCLB waiver required a change to the
current Index – it must include subgroups and a
growth measure
• Merges two different accountability systems
(state and federal) into one system
• Has no relationship with AMOs!
• New index is much more complicated, has
different rules compared to previous index
New Accountability Index
21. • Included in waiver proposal to U.S. Dept. of Education
(waiver still not approved)
• Includes all subgroups (race/ethnicity, programs)
• N > 20 across grade band (not grade)
• New rating scales (1-10) and more “labels”
• No Peer rating
• Growth based on SGPs, not grade band improvement in Levels
• Includes all ELL results (including results of students who exited program)
• Basis for identifying low-performing schools (federal acct.)
• Sanctions also apply to non-Title I schools
• Preliminary analyses show high correlation with school % FRL
-.53 (elementary) -.45 (middle) -.60 (high)
New Accountability Index
26. 6 Labels, Norm-referenced
• Exemplary: Top 5% of schools using overall index, must have
60% students proficient in all tested subjects (given recognition)
• Very Good: Next 15% of schools
• Good: Next 30% of schools
• Fair: Next 30% of schools
• Underperforming: Next 5% of schools + 10% with large
achievement gaps
• Priority: Lowest 5% of index
27. Proposed Priority, Focus, Emerging
• Includes all schools, not just Title I
• Uses Index to identify schools rather than stacked
rankings
Priority system uses the overall index value
– Bottom 5% are Priority (“Struggling”)
– Next 5% from the bottom are Emerging Priority
Focus system uses index value for each subgroup in each school
– Bottom 10% are Focus
– Next 10% from the bottom are Emerging Focus
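The proposed Priority identification can be sketched as a ranking on the overall index. Everything specific here is hypothetical: the school names, the index values, and the way cut counts are rounded and ties are broken, none of which the slides specify.

```python
def classify_priority(index_by_school):
    """Label the bottom 5% 'Priority' and the next 5% 'Emerging Priority'."""
    ranked = sorted(index_by_school, key=index_by_school.get)  # lowest first
    n = len(ranked)
    cut_priority = round(n * 0.05)  # assumption: round to nearest school
    cut_emerging = round(n * 0.10)
    labels = {}
    for pos, school in enumerate(ranked):
        if pos < cut_priority:
            labels[school] = "Priority"
        elif pos < cut_emerging:
            labels[school] = "Emerging Priority"
        else:
            labels[school] = None  # not identified
    return labels

# 20 hypothetical schools with evenly spaced index values
schools = {f"School {i:02d}": 1.0 + 0.3 * i for i in range(20)}
labels = classify_priority(schools)
print(labels["School 00"], labels["School 01"])
```

The Focus system would apply the same idea to each subgroup's index value, using 10% cuts instead of 5%.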
28. Getting Off the Priority / Focus List*
• For 3 consecutive years in Math and Reading:
– Meet or exceed AMOs for all subgroups
– Have at least 95% participation for all subgroups
– Not be in the bottom 5% (or 10% for Focus)
– Decrease % of students in all groups scoring Level 1 or
2 in reading and math. Improvement % must be
comparable to top 30% of Title I schools
• OSPI determines sufficient progress has been
made
* Unclear how Emerging schools get off list
29. New Emphasis on Student Growth
• Federal waiver submitted in 2011 requires a student
growth measure for the Index and for teacher and
principal evaluations
• Index has growth measure but “weak legislation”
regarding use of state test results in growth measure puts
waiver in jeopardy
• OSPI amended waiver in July 2013 and requires student
growth to be a “substantial factor” in 3 of 8 teacher and
principal criteria – brinksmanship occurring right now
• Many ways to measure growth, State Board only
considered Student Growth Percentile (SGP)
31. Measuring Student Growth
• Growth, in its simplest form, is a
comparison of the assessment results of a
student or group of students between two
points in time where a positive difference
would imply growth.
32. Student Growth Percentiles
• Problem: Current state assessment system was
not designed to measure student growth
– Only selected grades and subjects are tested
– Difficulty in passing the test varies from one year to the
next (e.g., the high school reading and writing HSPE is easy to
pass because the bar was lowered due to the graduation requirement)
• State’s Solution: Use a norm-referenced system
that ranks the rate of student growth
33. Student Growth Percentiles
• SGPs compare the growth rates of students who
were at the same scale score level the previous
year (their “academic peers”)
Example: A student earning an SGP of 80 performed as well as or
better than 80 percent of the students who had the same scale
score the previous year
• Does not compare the growth rate of all students
to each other or compare the achievement to all
students (the usual way to give percentiles)
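The "academic peers" idea can be illustrated with a deliberately simplified sketch. It is not the state's method: operational SGPs are typically estimated with quantile regression over prior scores, while this version just ranks a student among those with an identical prior-year scale score. The function name `simple_sgp` and the cohort data are hypothetical.

```python
def simple_sgp(student, students):
    """Percentile rank of a student's current score among academic peers.

    `students` is a list of (prior_score, current_score) tuples.
    Simplification: peers are students with the same prior scale score.
    """
    prior, current = student
    peers = [cur for pri, cur in students if pri == prior]
    at_or_below = sum(1 for cur in peers if cur <= current)
    pct = 100 * at_or_below / len(peers)
    return max(1, min(99, round(pct)))  # keep within the 1-99 range

# Ten students who all scored 400 last year, plus two who scored 350
cohort = [(400, 380 + 10 * i) for i in range(10)] + [(350, 340), (350, 360)]

print(simple_sgp((400, 450), cohort))  # 80
```

The two students who scored 350 last year never enter the comparison for the 400 group, which is the point of the bullet above: the ranking is within peers, not across all students.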
34. Student Growth Percentiles
• SGP trajectory predicts where students will perform in
the future, based on their previous growth rate and
students who were at the same scale score level the
previous year
• OSPI groups students into three categories
High Growth Top 1/3 67th to 99th percentile
Typical Growth Middle 1/3 34th to 66th percentile
Low Growth Bottom 1/3 1st to 33rd percentile
• The median SGP for a class, grade, school or district
is the “score” (school median SGP is used in the new Index)
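The three growth categories and the median "score" are straightforward to compute. A minimal sketch with hypothetical SGP values for one school:

```python
from statistics import median

def growth_category(sgp):
    """Classify an SGP (1-99) using the three OSPI bands on this slide."""
    if sgp >= 67:
        return "High Growth"
    if sgp >= 34:
        return "Typical Growth"
    return "Low Growth"

school_sgps = [12, 35, 48, 52, 70, 88, 91]  # hypothetical student SGPs
median_sgp = median(school_sgps)

print(median_sgp, growth_category(median_sgp))  # 52 Typical Growth
```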
35. SGP Student Data
Student Growth Percentile (SGP) results are
available to the public on the OSPI State
Longitudinal Data System (SLDS) website 1
• From OSPI homepage, select “K-12 Data &
Reports” on right side
• Select “Static Data Files”
• Select “Assessment” menu item, scroll down
to find the SGP files and reports
1 http://data.k12.wa.us/PublicDWP/Web/WashingtonWeb/Home.aspx
36. SGP School Data
Available to the public on the OSPI State
Longitudinal Data System (SLDS) website
http://data.k12.wa.us/PublicDWP/Web/WashingtonWeb/Home.aspx
• From OSPI’s homepage, click on the “K-12 Data &
Reports” button on the right-hand side, then click on
“Static Data Files”
• Under the “Assessment” menu item, you can scroll
down to find the SGP files and reports
40. SGPs on OSPI’s Web site
Three types of SGP files available to public
• Bubble chart with all schools, with district’s
schools identified (hover over bubble for results)
• Individual school results by subgroup
(compared to district and state for three years)
• Excel file with all results for all schools and
district (Renton’s file has > 5000 rows and 20 columns)
43. Problems with SGP
1. Results can be misleading
Percentile rank is not based on all students, so the 50th
percentile is not the middle of the entire distribution, just
those who had the same scale score the previous year
2. SGPs do not provide a measure of adequate
(enough) growth or a year’s worth of growth
A student can be at the 50th percentile and not make a year’s
worth of growth or enough growth to meet expectations
upon graduation; another student can be at the 50th
percentile and make more than a year’s worth of growth
46. Problems with SGP
3. Results may not reflect an accurate measure of
student growth or educator effectiveness
• SGPs are “highly unstable” and “problematic” for students
with very high and low scores because there are relatively
few students with those scores to obtain stable rankings1
• No standard errors reported
• Does not control for differences in the student population
4. Results are not available in a timely manner
5. SGPs are new and harder to understand than current
metrics
1 Castellano, K. and Ho, A. (2013). A Practitioner’s Guide to Growth
Models. Washington, DC: Council of Chief State School Officers
47. Alternative Measure of Student Growth
• Criterion-referenced approach
• Students are compared to their own growth, not the
growth rate of others
• Encourages cooperation because score doesn’t
depend on how other students perform
• Can be computed quickly and easily – doesn’t require
a minimum number of students and doesn’t depend
on how other students perform
• Uses familiar data and concepts, makes it easy to
understand
49. [Figure: scatterplot] 2013 Achievement and Growth from 2012
(Math, Grade 4 and Change from Grade 3)
X-axis: Change in Scale Score from Grade 3 (2012), -100 to 100
Y-axis: 2013 Grade 4 Math Scale Score, 300 to 500
Quadrants: Leading, Slipping, Gaining, Lagging
Quadrant shares: 15.6% (N=142), 50.4% (N=460), 5.9% (N=54), 28.1% (N=257)
Performance levels: Level 4 (Exceeds standard) above 439; Level 3 (Meets
standard) 400-439; Level 2 (Below standard) 375-399; Level 1 (Far below
standard) below 375
Average change in scale score: +6.5 (413.1 to 419.6), N = 913, R² = .58
56.3% of the students made at least one year gain (change in scale score > 0)
Each dot represents a student who was enrolled in the district in both 2012 and 2013
(scores below 300 were marked as 300, scores above 500 were marked as 500)
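The four quadrant labels in the chart can be expressed as a small classifier. The definitions below are an assumption based on the axes and level bands, not something the slide spells out: "meeting standard" is taken as a 2013 scale score of 400 or higher, and "gain" as a change in scale score greater than 0.

```python
MEETS_STANDARD = 400  # Level 3 starts at 400 in the chart's level bands

def quadrant(score_2013, change):
    """Assumed mapping of (achievement, growth) to the chart's four labels."""
    meets = score_2013 >= MEETS_STANDARD
    gained = change > 0
    if meets and gained:
        return "Leading"
    if meets:
        return "Slipping"
    if gained:
        return "Gaining"
    return "Lagging"

print(quadrant(420, 10))   # Leading
print(quadrant(380, -12))  # Lagging
```

This measure needs only each student's own two scores, so a district can compute it without statewide data.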
50. Change in Math Scale Scores, 2011 to 2012
Non-Low Income Low Income (FRL)
43% made 1+ years gain    60% made 1+ years gain
51. Limitations to Alternative Measure
• Proficiency cut scores vary slightly from grade to
grade
It’s harder to meet standard in some grades compared to
others (like having an easy teacher one year and a hard
teacher the next)
• No “vertical scale” to measure absolute growth
Smarter Balanced assessments will have a vertical scale and
cut scores that align with college/career readiness
For more details, see WERA Educational Journal, Winter 2014
article, “Using SGPs to Measure Student Growth: Context,
Characteristics, and Cautions” www.wera-web.org
ESHB 2261 – SBE must develop an Accountability Index to identify schools/districts for recognition and additional state support
It took two years of detailed conversation with data experts and a wide range of stakeholders to come up with this system, which was easier to understand and more valid than the federal accountability system (NCLB, AYP).
5 outcomes across the top, 4 indicators down the left-hand side (averages computed for each row and column)
Simple average of the 20 “inner” cells is the index number (bottom right corner)
Elementary/middle schools have 16 cells (vs 37 cells)
HS and district have 20 cells (vs 45 and 119 cells)
Not really “peers” in a strict sense of the term – look at percentages of students in certain categories.
Multiple regression to determine a “predicted level” of achievement: positive scores are “beating the odds,” negative scores are underperforming.
This explains how the Achievement vs Peers indicator works using the elementary math index results from 2007.
You’re familiar with scatterplots and how student performance declines as the level of poverty increases. The heavy black trend line is the predicted Learning Index level for schools with that level of low-income students.
School A and B have about the same Learning Index (about 2.5). However, one has > 85% FRL and the other has < 25% FRL. The distance to the heavy black line is their “score” when adjusting for socioeconomic status. School A is almost .4 above the line and would be given a 7 for its rating compared to its peers, while School B is almost .4 below the line and would be given a 1 for its rating compared to its peers.
Let’s assume this scatterplot represents the results when adjusting for all 4 variables, not just one. The thick dotted trend lines reflect the cutpoints for the highest and lowest ratings (-.20 to +.20). All schools above the upper line would be rated a 7; all schools below the bottom dotted red line are more than .20 points below the predicted level and would get a 1. The other dotted lines are the other cutpoints.
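The School A / School B comparison can be sketched as a regression residual. This is a simplified illustration: it fits a line on one predictor (percent FRL) with ordinary least squares, whereas the notes describe multiple regression on several student variables. All school data are hypothetical. The rating cutpoints are the Learning Index difference bands from the benchmarks slide.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def peer_rating(diff):
    """Map actual-minus-predicted Learning Index to the 1-7 peer rating."""
    if diff > 0.20:
        return 7
    if diff > 0.15:
        return 6
    if diff > 0.05:
        return 5
    if diff >= -0.05:
        return 4
    if diff >= -0.15:
        return 3
    if diff >= -0.20:
        return 2
    return 1

# Hypothetical schools: (percent FRL, Learning Index)
schools = {
    "School A": (88, 2.5), "School B": (24, 2.5),
    "School C": (0, 3.2), "School D": (20, 2.95), "School E": (40, 2.7),
    "School F": (60, 2.45), "School G": (80, 2.2), "School H": (100, 1.95),
}
slope, intercept = fit_line([frl for frl, _ in schools.values()],
                            [li for _, li in schools.values()])

def difference(name):
    """Actual Learning Index minus the level predicted from percent FRL."""
    frl, actual = schools[name]
    return actual - (intercept + slope * frl)

for name in ("School A", "School B"):
    print(name, peer_rating(difference(name)))
```

Schools A and B have the same Learning Index, but A sits well above the trend line for its poverty level and B well below it, so they land at opposite ends of the rating scale.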
4. Since districts do not have access to student-level results statewide, they cannot compute SGP results on their own. The state must compute and report the results. OSPI published 2013 student, school, and district SGP results in December (> 3 months after school began).